
The project

SpeechBrain is an open-source, all-in-one speech toolkit built on PyTorch.

The goal is to create a single, flexible, and user-friendly toolkit that can be used to easily develop state-of-the-art speech technologies. These include systems for speech recognition (both end-to-end and HMM-DNN), speaker recognition, speech separation, multi-microphone signal processing (e.g., beamforming), self-supervised and unsupervised learning, speech contamination / augmentation, and many others. The toolkit is designed as a stand-alone framework, but simple interfaces to well-known toolkits such as Kaldi will also be implemented.

SpeechBrain was announced in September 2019 and is currently under development. A first alpha version will be available in the coming months.

See a short introductory video on the SpeechBrain project

Stay tuned!

Why SpeechBrain?

Speech processing toolkits have gained popularity in recent years. For automatic speech recognition (ASR), for instance, Kaldi is an established framework. Other ASR toolkits, such as PyTorch-Kaldi, PyKaldi, and ESPnet, have recently been developed in Python. Beyond speech recognition, a variety of other solutions have been developed for speech-related applications such as speech separation, speech enhancement, speaker recognition, and language model training.

Even though many of these frameworks are very helpful for the specific tasks they were designed for, our experience in the field suggests that a single, efficient, and flexible toolkit can significantly speed up research and development of speech and audio processing techniques. It is much easier to become familiar with one toolkit than to learn several different frameworks. Moreover, using a single platform for different speech and audio applications makes it more natural to develop multi-task systems that jointly solve different problems. It is also easier to build a strong and fruitful community around a single, self-contained framework.

Why PyTorch?

To ensure the flexibility and user-friendliness our system needs, we believe our platform must be built on top of PyTorch for the following reasons:

Toolkits

During the project, we plan to collaborate with Facebook's PyTorch Audio team and with NVIDIA, which recently developed the Neural Modules toolkit (NeMo), a toolkit that provides flexibility and modularity to accelerate speech applications.

How to collaborate

A strong toolkit needs a strong community. While a core team will be dedicated to developing and maintaining the core functionalities of SpeechBrain, we need the help of the entire community to extend this ambitious project to numerous and varied applications. Feel free to contact us if you are interested in contributing.

Join us!

Thanks to our sponsors, we are hiring talented interns (3-6 month internships) who will work at Mila (Montréal) with the core development team. The ideal candidate is a PhD student with strong experience in both PyTorch and speech technologies. Send us your CV if you are interested in this opportunity!

Contact us

If you are interested in collaborating with or sponsoring us, or if you simply want to hear more about this project, please contact us at speechbrainproject@gmail.com.

Current Sponsors

Mila, NVIDIA, Dolby, Samsung

Current collaborators

LIA, PyTorch, IBM, FluentAI