Watch the recording of our SpeechBrain Online Summit event!
In our one-day summit, endorsed by ISCA as an official Interspeech 2023 satellite event, we explored the latest developments in SpeechBrain and engaged in an open, collaborative discussion with the community.
The summit featured four insightful industrial talks from JP Morgan Chase & Co, Orange Labs, Ubenwa AI, and ViaDialog, along with two engaging academic talks from the University of Cambridge and Avignon University. The day concluded with a lively panel on the future of our open-source tools, featuring researchers from HuggingFace, Kaldi and K2, ESPnet, Librosa, and Torchaudio.
SpeechBrain is an open-source conversational AI toolkit designed to be simple, flexible, and well documented, while achieving competitive performance across a variety of domains.
SpeechBrain supports state-of-the-art methods for end-to-end speech recognition, including models based on CTC, CTC+attention, transducers, and transformers, as well as neural language models based on recurrent neural networks and transformers.
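For example, a pretrained recognizer can be downloaded and used for transcription in a few lines (a minimal sketch following the pretrained interfaces; speechbrain/asr-crdnn-rnnlm-librispeech is one of the released checkpoints, and the audio file name is illustrative):

from speechbrain.pretrained import EncoderDecoderASR

# Download a pretrained CRDNN + RNNLM recognizer trained on LibriSpeech
asr_model = EncoderDecoderASR.from_hparams(
    source="speechbrain/asr-crdnn-rnnlm-librispeech",
    savedir="pretrained_models/asr-crdnn-rnnlm-librispeech",
)

# Transcribe a local recording
print(asr_model.transcribe_file("example.wav"))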
Speaker recognition is already deployed in a wide variety of realistic applications. SpeechBrain provides different models for speaker recognition, including X-vectors, ECAPA-TDNN, PLDA, and contrastive learning.
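As a sketch of the pretrained verification interface (the model id points to a released ECAPA-TDNN checkpoint; the file names are illustrative):

from speechbrain.pretrained import SpeakerRecognition

# Download a pretrained ECAPA-TDNN verification model trained on VoxCeleb
verification = SpeakerRecognition.from_hparams(
    source="speechbrain/spkrec-ecapa-voxceleb",
    savedir="pretrained_models/spkrec-ecapa-voxceleb",
)

# Compare two utterances: returns a similarity score and a same/different decision
score, prediction = verification.verify_files("speaker1.wav", "speaker2.wav")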
Speech enhancement methods based on spectral masking, spectral mapping, and time-domain processing are already available within SpeechBrain. Separation models such as Conv-TasNet, Dual-Path RNN, and SepFormer are implemented as well.
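For instance, source separation with a pretrained SepFormer can be sketched as follows (the model id is a released checkpoint trained on WSJ0-2mix; the mixture file name is illustrative):

from speechbrain.pretrained import SepformerSeparation

# Download a SepFormer separation model trained on WSJ0-2mix
model = SepformerSeparation.from_hparams(
    source="speechbrain/sepformer-wsj02mix",
    savedir="pretrained_models/sepformer-wsj02mix",
)

# Separate a two-speaker mixture; returns a [batch, time, n_sources] tensor
est_sources = model.separate_file(path="mixture.wav")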
SpeechBrain provides efficient, GPU-friendly pipelines for speech augmentation and for acoustic feature extraction and normalization, which can run on the fly during your experiments.
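As a minimal sketch of on-the-fly feature extraction (the random tensor stands in for a real batch of waveforms):

import torch
from speechbrain.lobes.features import Fbank

# A fake one-second batch of audio at 16 kHz: [batch, time]
signal = torch.rand(1, 16000)

# Log Mel filterbank features computed on the fly; runs on GPU
# if the input tensor lives on GPU
fbank = Fbank(n_mels=40)
features = fbank(signal)  # [batch, frames, n_mels]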
Combining multiple microphones is a powerful way to achieve robustness in adverse acoustic environments. SpeechBrain provides several techniques for beamforming (e.g., delay-and-sum, MVDR, and GEV) and speaker localization.
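The snippet below sketches a delay-and-sum pipeline with the utilities in speechbrain.processing.multi_mic (a minimal sketch, assuming a four-microphone signal already loaded as a [batch, time, channels] tensor; class names and shapes follow the multi-microphone tutorial):

import torch
from speechbrain.processing.features import STFT, ISTFT
from speechbrain.processing.multi_mic import Covariance, GccPhat, DelaySum

# Fake one-second, four-channel recording: [batch, time, channels]
xs = torch.rand(1, 16000, 4)

stft = STFT(sample_rate=16000)
Xs = stft(xs)                       # multi-channel spectrogram

XXs = Covariance()(Xs)              # spatial covariance matrices
tdoas = GccPhat()(XXs)              # time differences of arrival
Ys = DelaySum()(Xs, tdoas)          # beamformed spectrogram
ys = ISTFT(sample_rate=16000)(Ys)   # back to the time domain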
Text-to-Speech (TTS, also known as speech synthesis) generates speech signals from input text. SpeechBrain supports popular TTS models (e.g., Tacotron2) and vocoders (e.g., HiFi-GAN).
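A typical two-stage pipeline couples an acoustic model with a vocoder (a minimal sketch; both model ids are released checkpoints trained on LJSpeech):

from speechbrain.pretrained import Tacotron2, HIFIGAN

# Download a Tacotron2 acoustic model and a HiFi-GAN vocoder
tacotron2 = Tacotron2.from_hparams(source="speechbrain/tts-tacotron2-ljspeech")
hifi_gan = HIFIGAN.from_hparams(source="speechbrain/tts-hifigan-ljspeech")

# Text -> mel-spectrogram -> waveform
mel_output, mel_length, alignment = tacotron2.encode_text("Hello, world!")
waveforms = hifi_gan.decode_batch(mel_output)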
SpeechBrain also supports Spoken Language Understanding, Language Modeling, Diarization, Speech Translation, Language Identification, Voice Activity Detection, Sound Classification, Grapheme-to-Phoneme conversion, and many other tasks.
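Voice activity detection, for instance, follows the same pretrained-interface pattern (a minimal sketch; the model id is a released CRDNN checkpoint, and the file name is illustrative):

from speechbrain.pretrained import VAD

# Download a CRDNN-based voice activity detector
vad = VAD.from_hparams(
    source="speechbrain/vad-crdnn-libriparty",
    savedir="pretrained_models/vad-crdnn-libriparty",
)

# Get the start/end boundaries of the speech segments in a recording
boundaries = vad.get_speech_segments("example.wav")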
SpeechBrain is designed to speed up the research and development of speech technologies. It is modular, flexible, and easy to customize, and it ships with recipes for many popular datasets. Documentation and tutorials are there to help newcomers get started with SpeechBrain.
SpeechBrain provides many pre-trained models that can easily be deployed through well-designed interfaces. Transcribing speech, verifying speakers, enhancing speech, and separating sources have never been easier!
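Speech enhancement, for example, works the same way (a minimal sketch; the model id is a released MetricGAN+ checkpoint, and the file names are illustrative):

import torch
import torchaudio
from speechbrain.pretrained import SpectralMaskEnhancement

# Download a MetricGAN+ enhancement model trained on VoiceBank-DEMAND
enhancer = SpectralMaskEnhancement.from_hparams(
    source="speechbrain/metricgan-plus-voicebank",
    savedir="pretrained_models/metricgan-plus-voicebank",
)

# Enhance a noisy recording and save the result
noisy = enhancer.load_audio("noisy.wav").unsqueeze(0)
enhanced = enhancer.enhance_batch(noisy, lengths=torch.tensor([1.0]))
torchaudio.save("enhanced.wav", enhanced.cpu(), 16000)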
# From PyPI
pip install speechbrain
# Local installation
git clone https://github.com/speechbrain/speechbrain.git
cd speechbrain
pip install -r requirements.txt
pip install --editable .
cd recipes/{dataset}/{task}/train
# Train the model using the default recipe
python train.py hparams/train.yaml
# Train the model with a hyperparameter tweak
python train.py hparams/train.yaml --learning_rate=0.1
import speechbrain as sb

class ASR_Brain(sb.Brain):
    def compute_forward(self, batch, stage):
        # Compute features (MFCCs, filterbanks, etc.) on the fly
        features = self.hparams.compute_features(batch.wavs)
        # Improve robustness with pre-built augmentations
        features = self.hparams.augment(features)
        # Apply your custom model
        return self.modules.myCustomModel(features)
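In a full recipe, a Brain subclass like this also defines a compute_objectives method that returns the loss; training is then launched by instantiating the class and calling its fit method, which runs the training and validation loops for you.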
They use SpeechBrain, or they have sponsored it!