2023 Online SpeechBrain Summit

Watch the recording of our SpeechBrain Online Summit event!

In our one-day summit, which was endorsed by ISCA as an official Interspeech 2023 satellite event, we explored the latest developments and updates of SpeechBrain and engaged in an open and collaborative discussion with the community. The summit featured four insightful industrial talks from JP Morgan Chase & Co, Orange Labs, Ubenwa AI, and ViaDialog, along with two engaging academic talks from the University of Cambridge and Avignon University. The day concluded with a lively panel discussion involving researchers from HuggingFace, Kaldi and K2, ESPNet, Librosa, and Torchaudio, discussing the future of our open-source tools.

Key Features

SpeechBrain is an open-source conversational AI toolkit designed to be simple, flexible, and well-documented, while achieving competitive performance in a wide range of domains.

Speech Recognition

SpeechBrain supports state-of-the-art methods for end-to-end speech recognition, including models based on CTC, CTC+attention, transducers, transformers, and neural language models relying on recurrent neural networks and transformers.
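
For instance, a pretrained model can transcribe a file in a few lines. A minimal sketch, assuming the 0.5.x interface (speechbrain.pretrained; speechbrain.inference in later releases) and a 16 kHz mono recording named example.wav:

  from speechbrain.pretrained import EncoderDecoderASR

  # Download a pretrained CRDNN + RNNLM model from the HuggingFace Hub
  asr_model = EncoderDecoderASR.from_hparams(
      source="speechbrain/asr-crdnn-rnnlm-librispeech",
      savedir="pretrained_models/asr-crdnn-rnnlm-librispeech",
  )

  # Transcribe a 16 kHz mono audio file
  print(asr_model.transcribe_file("example.wav"))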

Speaker Recognition

Speaker recognition is already deployed in a wide variety of realistic applications. SpeechBrain provides several models for it, including X-vectors, ECAPA-TDNN, PLDA scoring, and contrastive learning approaches.
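
As an illustration, verifying whether two recordings come from the same speaker takes a few lines with a pretrained ECAPA-TDNN model (a sketch; the file names are placeholders):

  from speechbrain.pretrained import SpeakerRecognition

  # Pretrained ECAPA-TDNN embeddings with cosine-similarity scoring
  verification = SpeakerRecognition.from_hparams(
      source="speechbrain/spkrec-ecapa-voxceleb",
      savedir="pretrained_models/spkrec-ecapa-voxceleb",
  )

  # Returns a similarity score and a same-speaker decision
  score, prediction = verification.verify_files("speaker1.wav", "speaker2.wav")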

Speech Enhancement

Spectral masking, spectral mapping, and time-domain enhancement are all available within SpeechBrain. Source separation methods such as Conv-TasNet, Dual-Path RNN, and SepFormer are implemented as well.
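
For example, a pretrained SepFormer can separate a two-speaker mixture in a few lines (a sketch; mixture.wav is a placeholder):

  from speechbrain.pretrained import SepformerSeparation

  # Pretrained SepFormer for two-speaker separation (trained on WSJ0-2mix)
  model = SepformerSeparation.from_hparams(
      source="speechbrain/sepformer-wsj02mix",
      savedir="pretrained_models/sepformer-wsj02mix",
  )

  # est_sources has shape [batch, time, n_sources]
  est_sources = model.separate_file(path="mixture.wav")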

Speech Processing

SpeechBrain provides efficient, GPU-friendly pipelines for speech augmentation, acoustic feature extraction, and feature normalization, all of which can be applied on the fly during your experiments.
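
For instance, log Mel filterbanks can be computed on the fly directly on batched tensors. A minimal sketch:

  import torch
  from speechbrain.lobes.features import Fbank

  # 40-dimensional log Mel filterbanks, differentiable and GPU-friendly
  compute_features = Fbank(n_mels=40)

  signal = torch.rand(4, 16000)        # [batch, time]: 1 second at 16 kHz
  features = compute_features(signal)  # [batch, frames, n_mels]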

Multi-Microphone Processing

Combining multiple microphones is a powerful approach to achieving robustness in adverse acoustic environments. SpeechBrain provides various techniques for beamforming (e.g., delay-and-sum, MVDR, and GEV) and speaker localization.
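
As an illustration, here is a delay-and-sum beamformer driven by GCC-PHAT delay estimates, sketched on a random 4-microphone batch:

  import torch
  from speechbrain.processing.features import STFT, ISTFT
  from speechbrain.processing.multi_mic import Covariance, GccPhat, DelaySum

  xs = torch.rand(1, 16000, 4)  # [batch, time, channels]: a 4-mic signal

  stft, istft = STFT(sample_rate=16000), ISTFT(sample_rate=16000)
  Xs = stft(xs)                        # move to the time-frequency domain
  tdoas = GccPhat()(Covariance()(Xs))  # estimate time differences of arrival
  ys = istft(DelaySum()(Xs, tdoas))    # beamform and return to time domain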

Text-to-Speech

Text-to-Speech (TTS, also known as speech synthesis) allows users to generate speech signals from input text. SpeechBrain supports popular models for TTS (e.g., Tacotron2) and vocoders (e.g., HiFi-GAN).
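
With the pretrained interfaces, synthesis is a two-step pipeline. A sketch, using the LJSpeech checkpoints hosted on HuggingFace:

  from speechbrain.pretrained import Tacotron2, HIFIGAN

  # Text -> mel-spectrogram -> waveform
  tacotron2 = Tacotron2.from_hparams(source="speechbrain/tts-tacotron2-ljspeech")
  hifi_gan = HIFIGAN.from_hparams(source="speechbrain/tts-hifigan-ljspeech")

  mel_output, mel_length, alignment = tacotron2.encode_text("Hello SpeechBrain!")
  waveforms = hifi_gan.decode_batch(mel_output)  # [batch, 1, time]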

Other Tasks

SpeechBrain also supports Spoken Language Understanding, Language Modeling, Diarization, Speech Translation, Language Identification, Voice Activity Detection, Sound Classification, Grapheme-to-Phoneme conversion, and many others.
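
Many of these tasks share the same pretrained interface. A sketch for language identification (example.wav is a placeholder):

  from speechbrain.pretrained import EncoderClassifier

  # Pretrained ECAPA-TDNN language-identification model
  lang_id = EncoderClassifier.from_hparams(
      source="speechbrain/lang-id-commonlanguage_ecapa",
      savedir="pretrained_models/lang-id-commonlanguage_ecapa",
  )

  out_prob, score, index, text_lab = lang_id.classify_file("example.wav")
  print(text_lab)  # e.g., ['English']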

Research & Development

SpeechBrain is designed to speed up the research and development of speech technologies. It is modular, flexible, easy to customize, and contains several recipes for popular datasets. Documentation and tutorials are available to help newcomers get started with SpeechBrain.

HuggingFace!

SpeechBrain provides multiple pre-trained models that can easily be deployed through nicely designed interfaces. Transcribing speech, verifying speakers, enhancing audio, and separating sources have never been easier!
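
For example, denoising a recording with a pretrained MetricGAN+ model takes a few lines (a sketch; noisy.wav is a placeholder):

  from speechbrain.pretrained import SpectralMaskEnhancement

  # Pretrained MetricGAN+ enhancement model
  enhance_model = SpectralMaskEnhancement.from_hparams(
      source="speechbrain/metricgan-plus-voicebank",
      savedir="pretrained_models/metricgan-plus-voicebank",
  )

  # Enhance a noisy recording and write the result to disk
  enhance_model.enhance_file("noisy.wav", output_filename="enhanced.wav")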

Why SpeechBrain?

Adapts to your needs.

SpeechBrain can be installed via PyPI to rapidly use the standard library, or from a local clone to access the recipes and explore the full feature set of the toolkit.
Get Started Now

  # From PyPI
  pip install speechbrain

  # Local installation
  git clone https://github.com/speechbrain/speechbrain.git
  cd speechbrain
  pip install -r requirements.txt
  pip install --editable .

A single command.

Every SpeechBrain recipe relies on a YAML file that summarizes all the functions and hyperparameters of the system. A single Python script combines them to implement the desired task.
Get Started Now

  cd recipes/{dataset}/{task}/train

  # Train the model using the default recipe
  python train.py hparams/train.yaml

  # Train the model with a hyperparameter tweak
  python train.py hparams/train.yaml --learning_rate=0.1
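
Under the hood, these YAML files are parsed with HyperPyYAML, which can instantiate Python objects directly and apply command-line overrides. A minimal sketch (the YAML content below is hypothetical, not taken from an actual recipe):

  from hyperpyyaml import load_hyperpyyaml

  # !new: instantiates a class; plain keys stay ordinary hyperparameters
  yaml_string = """
  learning_rate: 0.1
  model: !new:torch.nn.Linear
    in_features: 40
    out_features: 10
  """

  # Flags such as --learning_rate=0.2 arrive here as overrides
  hparams = load_hyperpyyaml(yaml_string, overrides={"learning_rate": 0.2})
  print(hparams["model"], hparams["learning_rate"])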

Built for research.

SpeechBrain is designed for research and development, so flexibility and transparency are core design principles. You can define your own deep learning models, losses, training and evaluation loops, and input pipelines and transformations, and plug them in without overhead.
Get Started Now

  import speechbrain as sb

  class ASR_Brain(sb.Brain):
    def compute_forward(self, batch, stage):

      # Compute features (MFCCs, filterbanks, etc.) on the fly
      features = self.hparams.compute_features(batch.wavs)

      # Improve robustness with pre-built augmentations
      features = self.hparams.augment(features)

      # Apply your custom model
      return self.modules.myCustomModel(features)

    def compute_objectives(self, predictions, batch, stage):

      # Define your own loss on the predictions
      return self.hparams.compute_cost(predictions, batch.targets)
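
Training is then a matter of instantiating the Brain and calling fit. A sketch, assuming hparams was loaded from the YAML file and train_data/valid_data are your dataset objects (hypothetical names):

  brain = ASR_Brain(modules=hparams["modules"], opt_class=hparams["opt_class"],
                    hparams=hparams)
  brain.fit(range(10), train_set=train_data, valid_set=valid_data)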

Sponsors

They sponsor, or have sponsored, SpeechBrain!

Our new call for sponsors (2023) is now open.

[Sponsor logos: NLE, Hugging Face, Baidu, OVH, LIA]

Previous Sponsors

Collaborators