Register for free and join us online on August 28th for our first SpeechBrain Online Summit, endorsed by ISCA as an official Interspeech 2023 satellite event! In this one-day summit, you will learn about the latest developments and updates of SpeechBrain and engage in an open, collaborative discussion with the community. The summit features four industrial talks from JP Morgan Chase & Co, Orange Labs, Ubenwa AI, and ViaDialog, as well as two academic talks from the University of Cambridge and Avignon University. A panel discussion with researchers from Hugging Face, Kaldi and K2, ESPnet, Librosa, and Torchaudio about the future of our open-source tools will conclude the day!
SpeechBrain is an open-source conversational AI toolkit designed to be simple, flexible, and well-documented, while achieving competitive performance across a variety of domains.
SpeechBrain supports state-of-the-art methods for end-to-end speech recognition, including models based on CTC, CTC+attention, transducers, and transformers, as well as neural language models based on recurrent neural networks and transformers.
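For example, transcribing a file with a pretrained recognizer from the Hugging Face Hub takes only a few lines (a minimal sketch; the audio path is a placeholder):

from speechbrain.pretrained import EncoderDecoderASR

# Fetch a pretrained CRDNN + RNNLM recognizer from the Hugging Face Hub
asr_model = EncoderDecoderASR.from_hparams(
    source="speechbrain/asr-crdnn-rnnlm-librispeech",
    savedir="pretrained_models/asr-crdnn-rnnlm-librispeech",
)

# Transcribe a local recording (placeholder path)
print(asr_model.transcribe_file("my_recording.wav"))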
Speaker recognition is already deployed in a wide variety of real-world applications. SpeechBrain provides several models for speaker recognition and verification, including X-vectors, ECAPA-TDNN, PLDA scoring, and contrastive learning.
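Here is a minimal sketch of speaker verification with the pretrained ECAPA-TDNN model (both audio paths are placeholders):

from speechbrain.pretrained import SpeakerRecognition

# Fetch a pretrained ECAPA-TDNN verification model
verification = SpeakerRecognition.from_hparams(
    source="speechbrain/spkrec-ecapa-voxceleb",
    savedir="pretrained_models/spkrec-ecapa-voxceleb",
)

# Compare two utterances: score is a similarity score,
# prediction is True when both files match the same speaker
score, prediction = verification.verify_files("speaker1.wav", "speaker2.wav")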
Speech enhancement methods based on spectral masking, spectral mapping, and time-domain processing are already available within SpeechBrain. Source separation models such as Conv-TasNet, Dual-Path RNN, and SepFormer are implemented as well.
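As an illustration, the pretrained SepFormer model can separate a two-speaker mixture (a minimal sketch; mixture.wav is a placeholder, and this particular model operates at 8 kHz):

import torchaudio
from speechbrain.pretrained import SepformerSeparation

model = SepformerSeparation.from_hparams(
    source="speechbrain/sepformer-wsj02mix",
    savedir="pretrained_models/sepformer-wsj02mix",
)

# est_sources: [batch, time, n_sources], one slice per estimated speaker
est_sources = model.separate_file(path="mixture.wav")
torchaudio.save("source1.wav", est_sources[:, :, 0].detach().cpu(), 8000)
torchaudio.save("source2.wav", est_sources[:, :, 1].detach().cpu(), 8000)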
SpeechBrain provides efficient, GPU-friendly pipelines for speech augmentation, acoustic feature extraction, and feature normalization that can run on the fly during your experiment.
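A minimal sketch of on-the-fly filterbank extraction (the random signals stand in for real waveforms; augmentation interfaces vary across SpeechBrain versions, so only feature extraction is shown):

import torch
from speechbrain.lobes.features import Fbank

# Fake batch of two 1-second, 16 kHz waveforms: [batch, time]
signals = torch.randn(2, 16000)

# 40-dimensional log Mel filterbank features, computed on the fly
fbank = Fbank(n_mels=40)
features = fbank(signals)  # [batch, frames, n_mels]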
Combining multiple microphones is a powerful approach for achieving robustness in adverse acoustic environments. SpeechBrain provides various techniques for beamforming (e.g., delay-and-sum, MVDR, and GEV) and speaker localization.
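Here is a minimal delay-and-sum sketch in the spirit of the multi-mic tutorial (the multi-channel file path is a placeholder, and tensor shapes may vary slightly across versions):

from speechbrain.dataio.dataio import read_audio
from speechbrain.processing.features import STFT, ISTFT
from speechbrain.processing.multi_mic import Covariance, GccPhat, DelaySum

# Load a multi-channel recording (placeholder path); read_audio
# returns [time, channels] for multi-channel files
xs = read_audio("multichannel.wav").unsqueeze(0)  # [batch, time, channels]

stft = STFT(sample_rate=16000)
istft = ISTFT(sample_rate=16000)

Xs = stft(xs)                # STFT of each channel
XXs = Covariance()(Xs)       # spatial covariance matrices
tdoas = GccPhat()(XXs)       # time differences of arrival (GCC-PHAT)
Ys = DelaySum()(Xs, tdoas)   # delay-and-sum beamforming
ys = istft(Ys)               # back to the time domain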
Text-to-Speech (TTS, also known as speech synthesis) allows users to generate speech signals from an input text. SpeechBrain supports popular models for TTS (e.g., Tacotron2) and vocoders (e.g., HiFi-GAN).
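For example, text can be synthesized by chaining the pretrained Tacotron2 and HiFi-GAN models (a minimal sketch):

import torchaudio
from speechbrain.pretrained import Tacotron2, HIFIGAN

# Tacotron2 maps text to a mel spectrogram; HiFi-GAN turns it into audio
tacotron2 = Tacotron2.from_hparams(
    source="speechbrain/tts-tacotron2-ljspeech",
    savedir="pretrained_models/tts-tacotron2-ljspeech",
)
hifi_gan = HIFIGAN.from_hparams(
    source="speechbrain/tts-hifigan-ljspeech",
    savedir="pretrained_models/tts-hifigan-ljspeech",
)

mel_output, mel_length, alignment = tacotron2.encode_text("Hello, SpeechBrain!")
waveforms = hifi_gan.decode_batch(mel_output)
torchaudio.save("tts_output.wav", waveforms.squeeze(1), 22050)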
SpeechBrain also supports Spoken Language Understanding, Language Modeling, Diarization, Speech Translation, Language Identification, Voice Activity Detection, Sound Classification, Grapheme-to-Phoneme conversion, and many other tasks.
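As one example among these tasks, here is a minimal sketch of language identification with the pretrained VoxLingua107 classifier (the audio path is a placeholder):

from speechbrain.pretrained import EncoderClassifier

language_id = EncoderClassifier.from_hparams(
    source="speechbrain/lang-id-voxlingua107-ecapa",
    savedir="pretrained_models/lang-id-voxlingua107-ecapa",
)

# classify_file returns posteriors, the best score and index, and the label
out_prob, score, index, text_lab = language_id.classify_file("utterance.wav")
print(text_lab)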
SpeechBrain is designed to speed up the research and development of speech technologies. It is modular, flexible, and easy to customize, and it contains several recipes for popular datasets. Documentation and tutorials are there to help newcomers get started with SpeechBrain.
SpeechBrain provides multiple pretrained models that can easily be deployed through nicely designed interfaces. Transcribing speech, verifying speakers, enhancing audio, and separating sources have never been easier!
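For instance, speech enhancement with the pretrained MetricGAN+ model (a minimal sketch; noisy.wav is a placeholder):

import torch
import torchaudio
from speechbrain.pretrained import SpectralMaskEnhancement

enhance_model = SpectralMaskEnhancement.from_hparams(
    source="speechbrain/metricgan-plus-voicebank",
    savedir="pretrained_models/metricgan-plus-voicebank",
)

# Load a noisy file (placeholder path) and add a batch dimension
noisy = enhance_model.load_audio("noisy.wav").unsqueeze(0)

# lengths holds the relative length of each batch element (here, full length)
enhanced = enhance_model.enhance_batch(noisy, lengths=torch.tensor([1.0]))
torchaudio.save("enhanced.wav", enhanced.cpu(), 16000)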
# From PyPI
pip install speechbrain
# Local installation
git clone https://github.com/speechbrain/speechbrain.git
cd speechbrain
pip install -r requirements.txt
pip install --editable .
cd recipes/{dataset}/{task}/train
# Train the model using the default recipe
python train.py hparams/train.yaml
# Train the model with a hyperparameter tweak
python train.py hparams/train.yaml --learning_rate=0.1
import speechbrain as sb

class ASR_Brain(sb.Brain):
    def compute_forward(self, batch, stage):
        # Compute features (MFCCs, filterbanks, etc.) on the fly
        features = self.hparams.compute_features(batch.wavs)
        # Improve robustness with pre-built augmentations
        features = self.hparams.augment(features)
        # Apply your custom model
        return self.modules.myCustomModel(features)
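A hypothetical way to exercise this Brain with toy stand-ins for the feature extractor, augmenter, and model (a real recipe would define these in a YAML file and also implement compute_objectives):

import torch
from types import SimpleNamespace
import speechbrain as sb

# Hypothetical stand-ins: a trivial "feature extractor", "augmenter",
# and model, purely to exercise the forward pass defined above
hparams = {
    "compute_features": lambda wavs: wavs.unsqueeze(-1),
    "augment": lambda feats: feats + 0.01 * torch.randn_like(feats),
}
modules = {"myCustomModel": torch.nn.Linear(1, 10)}

brain = ASR_Brain(modules=modules, hparams=hparams)
batch = SimpleNamespace(wavs=torch.randn(4, 16000))  # fake 4-utterance batch
outputs = brain.compute_forward(batch, sb.Stage.TRAIN)
print(outputs.shape)  # torch.Size([4, 16000, 10])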
They use SpeechBrain, or they have sponsored its development!