SpeechBrain has established itself in recent years as a leading deep learning toolkit for speech processing, with the usage statistics to back it up. With an average of 100 daily clones and 1,000 daily downloads, along with over 6,000 stars and 1,100 forks on its GitHub repository, SpeechBrain is a popular choice among speech processing experts.
At this summit, we are excited to share the latest developments and updates on the SpeechBrain project, engage in an open and collaborative discussion with the community, and introduce it to a broader audience of speech professionals. We would like participants to stay up to date with the latest advancements in SpeechBrain and speech processing technology. We also wish to gather feedback from the community more interactively, so we can better plan future developments. The event will take place four days after the main conference, on August 28th.
We are passionate about organizing this event for several reasons:
• The field of speech technology has seen tremendous growth in recent years. Our goal is to keep the community informed about the latest developments and future plans for the SpeechBrain project. We also aim to engage in an open dialogue with the community to set ambitious goals for the project’s future.
• We are excited to bring together experts from both industry and academia to showcase their impactful projects and share their knowledge with the community.
• This event is not only an opportunity to learn and stay updated, but also to network and connect with like-minded individuals in the SpeechBrain community.
In short, this is a chance to help shape the future of speech technologies and to build valuable connections within the community.
The event will start at 9.00 am Eastern Daylight Time (EDT) on August 28th, and we will have two sessions separated by a break from 11.30 am to 12.00 pm.
Morning (9.00 am - 11.30 am):
■ 9.00 am - 9.30 am Opening and thanks to Sponsors
■ 9.30 am - 10.00 am Industry Talk 1: Arsenii Gorin (UbenwaAI)
■ 10.00 am - 10.30 am Academic Talk 1: Yannick Estève (Université Avignon)
■ 10.30 am - 11.00 am Industry Talk 2: Ariane Nabeth-Halber (ViaDialog)
■ 11.00 am - 11.30 am Academic Talk 2: Yan Gao (University of Cambridge)
Lunch Break (11.30 am - 12.00 pm)
Afternoon (12.00 pm - 4.30 pm):
■ 12.00 pm - 1.00 pm SpeechBrain Roadmap 2023 & Latest Updates
■ 1.00 pm - 1.30 pm Industry Talk 3: Peter Plantinga (JP Morgan Chase)
■ 1.30 pm - 2.00 pm Academic Talk 3: Valentin Vielzeuf (Orange Labs)
■ 2.00 pm - 2.30 pm Coffee Break
■ 2.30 pm - 4.15 pm Panel Discussion and Q&A:
Shinji Watanabe, Dan Povey, Brian McFee, Sanchit Gandhi, Zhaoheng Ni
■ 4.15 pm - 4.30 pm Final Remarks and Closing
Continual Learning for End-to-End ASR by Averaging Domain Experts
Peter Plantinga is an Applied AI/ML Associate at the Machine Learning Center of Excellence at JP Morgan Chase & Co. He received his PhD in computer science in 2021 from the Ohio State University (USA) under Prof. Eric Fosler-Lussier, focusing on knowledge transfer for the tasks of speech enhancement, robust ASR, and reading verification. His current work involves adapting large-scale ASR models to the financial domain without forgetting, as well as better evaluation of ASR models.
How ViaDialog sponsors SpeechBrain and brings hyper-large-vocabulary speech technologies to contact centres
Ariane Nabeth-Halber has been working in the speech industry for 25 years. She started her career in research (ATR, Japan; Thalès, France) and then moved to the speech industry, first with Nuance Communications and then with the French company Bertin IT, working with contact centres, broadcasters, trading floors, and public ministries, as well as with academic labs such as LIUM and Avignon University/LIA. Since August 2021, she has led the Speech and Conversational AI team at ViaDialog, delivering efficient and safe customer relationship experiences. A European Commission expert and LT-Innovate board member, Ariane holds a PhD in computer science and signal processing from Telecom ParisTech. She regularly speaks at conferences on AI and speech technology.
Deep learning for infant cry classification
Arsenii is the lead ML research scientist at Ubenwa. He obtained a PhD from Université de Lorraine, working on automatic speech recognition. His main research interests are practical applications of machine learning techniques for audio and speech processing.
Speech Recognition Toolkits in Focus: Analyzing SpeechBrain's Advantages and Drawbacks through Examples from Orange's Projects
Valentin Vielzeuf is currently a researcher at Orange, focusing on speech recognition, spoken language understanding, and complexity reduction. He holds a PhD in multimodal deep learning and also has a background in computer vision.
Federated self-supervised speech representation learning
Yan Gao is a final-year PhD student in the Machine Learning Systems lab at the University of Cambridge, supervised by Prof. Nicholas Lane. His research interests are in machine learning, deep learning, and optimisation. His recent work focuses on federated learning with self-supervised learning on audio and vision data.
Advancing research: some examples of SpeechBrain's potential in the context of the LIAvignon partnership chair
Yannick received his M.S. (1998) in computer science from Aix-Marseille University and his Ph.D. (2002) from Avignon University, France. He joined Le Mans Université (LIUM lab) in 2003 as an associate professor and became a full professor in 2010. He moved to Avignon University in 2019 and has been the head of the Computer Science Laboratory of Avignon (LIA) since 2020. He has authored and co-authored more than 150 journal and conference papers in speech and language processing.
Shinji Watanabe is an Associate Professor at Carnegie Mellon University, Pittsburgh, PA. He received his B.S., M.S., and Ph.D. (Dr. Eng.) degrees from Waseda University, Tokyo, Japan. He was a research scientist at NTT Communication Science Laboratories, Kyoto, Japan, from 2001 to 2011, a visiting scholar at the Georgia Institute of Technology, Atlanta, GA, in 2009, and a senior principal research scientist at Mitsubishi Electric Research Laboratories (MERL), Cambridge, MA, USA, from 2012 to 2017. Before Carnegie Mellon University, he was an associate research professor at Johns Hopkins University, Baltimore, MD, USA, from 2017 to 2020. His research interests include automatic speech recognition, speech enhancement, spoken language understanding, and machine learning for speech and language processing. He has published over 300 papers in peer-reviewed journals and conferences and received several awards, including the best paper award at IEEE ASRU 2019. He is a Senior Area Editor of the IEEE Transactions on Audio, Speech, and Language Processing. He has served on several technical committees, including the APSIPA Speech, Language, and Audio Technical Committee (SLA), the IEEE Signal Processing Society Speech and Language Technical Committee (SLTC), and the Machine Learning for Signal Processing Technical Committee (MLSP). He is an IEEE and ISCA Fellow.
Daniel Povey is known for many contributions to speech recognition technology, including early innovations in sequence training such as Minimum Phone Error, the Kaldi toolkit, the "next-gen Kaldi" tools k2/lhotse/Icefall/sherpa, and the LibriSpeech dataset. He completed his PhD at Cambridge University in 2003, spent about ten years working in industry research labs (IBM Research and then Microsoft Research), and seven years as non-tenure-track faculty at Johns Hopkins University. He moved to Beijing, China, in November 2019 to join Xiaomi Corporation as Chief Voice Scientist. He is an IEEE Fellow as of 2023.
Brian McFee is an Assistant Professor of Music Technology and Data Science at New York University. His work lies at the intersection of machine learning and audio analysis. He is an active open-source software developer and the principal maintainer of the librosa package for audio analysis.
Sanchit Gandhi is an ML Engineer at Hugging Face. He leads the open-source audio team and maintains the audio models in the Transformers library, with the goal of making state-of-the-art speech recognition models more accessible to the community. Sanchit’s research interests lie in robust, generalisable speech recognition. Prior to working at Hugging Face, Sanchit completed his Master’s degree at the University of Cambridge.
Zhaoheng Ni is a research scientist on the PyTorch Audio team at Meta. He graduated from the City University of New York, supervised by Professor Michael Mandel, and joined Meta AI as a research scientist in 2021. His research interests are single-channel and multi-channel speech enhancement, speech separation, and robust ASR.
Titouan is a Research Scientist at the Samsung AI Research center in Cambridge (UK) and a visiting scholar at the Cambridge Machine Learning Systems Lab at the University of Cambridge (UK). Previously, he was an Associate Professor in computer science at the Laboratoire Informatique d’Avignon (LIA), Avignon University (FR). He was also a senior research associate at the University of Oxford (UK) within the Oxford Machine Learning Systems group. He received his PhD in computer science from the University of Avignon (FR), in partnership with Orkis, focusing on quaternion neural networks, automatic speech recognition, and representation learning. His current work involves efficient speech recognition, federated learning, and self-supervised learning. He also collaborates with the University of Montréal (Mila, QC, Canada) as the co-leader of the SpeechBrain project.
Cem is an Assistant Professor at Université Laval in the Computer Science and Software Engineering department. He is also an Affiliate Assistant Professor in the Concordia University Computer Science and Software Engineering department and an invited researcher at Mila, the Québec AI Institute. He received his PhD in Computer Science from the University of Illinois at Urbana-Champaign (UIUC) and did a postdoc at Mila, the Québec AI Institute, and Université de Sherbrooke. He serves as a reviewer for several conferences, including NeurIPS, ICML, ICLR, ICASSP, and MLSP, and for journals such as IEEE Signal Processing Letters (SPL) and IEEE Transactions on Audio, Speech, and Language Processing (TASL). His research interests include deep learning for source separation and speech enhancement under realistic conditions, neural network interpretability, and latent variable modeling. He received the best paper award at the 2017 IEEE Machine Learning for Signal Processing (MLSP) conference, as well as the Saburo Muroga Fellowship from the UIUC CS department. He is a core contributor to the SpeechBrain project, leading the speech separation part.
Adel is a research engineer at the University of Avignon (FR). He completed his Bachelor's degree in computer science with distinction in an innovation- and research-oriented curriculum and earned a two-year entrepreneurship diploma in 2022. Currently, Adel is enrolled in a master's apprenticeship program in computer science specializing in AI, through which he professionally contributes to the development of SpeechBrain, an all-in-one, open-source, PyTorch-based speech processing toolkit. Within SpeechBrain, he leads the community's automatic speech recognition efforts.
Mirco is an assistant professor at Concordia University, an adjunct professor at Université de Montréal, and a Mila associate member. His main research interests are deep learning and conversational AI. He is the author or co-author of more than 60 papers on these research topics. He received his Ph.D. (with cum laude distinction) from the University of Trento in December 2017. Mirco is an active member of the speech and machine learning communities. He is the founder and leader of the SpeechBrain project, which aims to build an open-source toolkit for conversational AI and speech processing.
François has been an Assistant Professor at Université de Sherbrooke (CA) in the Department of Electrical Engineering and Computer Engineering since 2020. He was a Postdoctoral Associate at the Computer Science & Artificial Intelligence Laboratory (CSAIL) at the Massachusetts Institute of Technology (USA) from 2018 to 2020. He received his PhD in electrical engineering from Université de Sherbrooke (CA) in 2017. He serves as a reviewer for multiple conferences, including ICASSP, INTERSPEECH, RSJ/IROS, and ICRA, and for journals such as the IEEE Transactions on Audio, Speech, and Language Processing, the EURASIP Journal on Audio, Speech, and Music Processing, IEEE Transactions on Robotics, IEEE Robotics and Automation Letters, and IEEE Transactions on Pattern Analysis and Machine Intelligence. His current work involves multichannel speech enhancement, sound source localization, ego-noise suppression, sound classification, robot audition, and hybrid signal processing/machine learning approaches. He contributed to the multichannel processing tools in the SpeechBrain project.
Andreas was a research engineer at Université d'Avignon, where he co-maintained SpeechBrain, readying it for its next major version release. Andreas served as project editor for ISO/IEC 19794-13, co-organised editions of the VoicePrivacy and ASVspoof challenges, co-initiated the ISCA SIG on Security & Privacy in Speech Communication (SPSC), was an Associate Editor for the EURASIP Journal on Audio, Speech, and Music Processing, and co-led the 2021 Lorentz workshop on Speech as Personal Identifiable Information. By 2020, he led multidisciplinary publication teams composed of speech & language technologists, legal scholars, cryptographers, and biometric experts; Andreas co-responded on behalf of ISCA SIG-SPSC to the public consultation on the 2021 EDPB guidelines on virtual voice assistants. In 2023, he joined goSmart to lead solution development (architecture, design, and implementation) for private 5G-based campus networks.