Pre-trained Models and Fine-Tuning with HuggingFace

Training DNN models is often very time-consuming and expensive. For this reason, whenever possible, starting from an off-the-shelf pretrained model can be convenient. We provide a simple and straightforward way to download and instantiate a state-of-the-art pretrained model from HuggingFace and use it either for direct inference or for fine-tuning, knowledge distillation, or whatever new fancy technique you can come up with!
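
A minimal sketch of the download-and-infer path, assuming SpeechBrain 1.0 or later (where the pretrained interfaces live in `speechbrain.inference`; older releases exposed them under `speechbrain.pretrained`) and the public `speechbrain/asr-crdnn-rnnlm-librispeech` ASR model on the HuggingFace Hub. The function name `transcribe_with_pretrained` and the `savedir` location are illustrative choices, not part of the tutorial.

```python
def transcribe_with_pretrained(
    audio_path,
    source="speechbrain/asr-crdnn-rnnlm-librispeech",
    savedir="pretrained_models/asr-crdnn-rnnlm-librispeech",
):
    """Fetch a pretrained ASR model (cached after the first call) and transcribe one file.

    Hypothetical helper; assumes SpeechBrain >= 1.0 is installed and the Hub is reachable.
    """
    # Imported lazily so this module can be loaded even without SpeechBrain installed.
    from speechbrain.inference.ASR import EncoderDecoderASR

    # from_hparams downloads the hyperparameter file and checkpoints from the Hub
    # and stores them in savedir, so later calls reuse the local copy.
    asr_model = EncoderDecoderASR.from_hparams(source=source, savedir=savedir)

    # transcribe_file loads the audio, runs the encoder-decoder, and returns the text.
    return asr_model.transcribe_file(audio_path)
```

For example, `print(transcribe_with_pretrained("my_recording.wav"))` would transcribe a (hypothetical) local WAV file; swapping `source` for another Hub model identifier is the only change needed to try a different pretrained system.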

Data Loading for Big Datasets and Shared Filesystems

Do you have a huge dataset stored on a shared filesystem? This tutorial shows you how to load large datasets from a shared filesystem and use them to train a neural network with SpeechBrain. In particular, we describe a solution based on the WebDataset library, which is easy to integrate into the SpeechBrain toolkit.
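
Shared filesystems usually serve a few large sequential reads far better than millions of small random file accesses, which is why WebDataset packs samples into tar shards. The sketch below uses only the Python standard library (it is not the WebDataset API) to illustrate that shard layout, assuming the convention WebDataset follows: all files of one sample share a key and differ only by extension (e.g. `utt1.wav`, `utt1.txt`), and a reader streams the archive sequentially, grouping consecutive files by key. The names `write_shard` and `read_shard` are hypothetical.

```python
import io
import tarfile
from itertools import groupby


def write_shard(path, samples):
    """Write (key, {extension: bytes}) samples into a single tar shard.

    Files of the same sample are written next to each other so that a
    sequential reader can group them by their shared key.
    """
    with tarfile.open(path, "w") as tar:
        for key, fields in samples:
            for ext, payload in fields.items():
                info = tarfile.TarInfo(name=f"{key}.{ext}")
                info.size = len(payload)
                tar.addfile(info, io.BytesIO(payload))


def read_shard(path):
    """Yield (key, {extension: bytes}) samples using purely sequential reads."""
    # "r|" opens the archive in streaming mode: no seeking, one forward pass.
    with tarfile.open(path, "r|") as tar:

        def members():
            for member in tar:
                if not member.isfile():
                    continue
                # The key is everything before the first dot of the file name.
                key, ext = member.name.split(".", 1)
                # Read the payload now, while the stream is positioned on it.
                yield key, ext, tar.extractfile(member).read()

        # Consecutive files with the same key form one training sample.
        for key, group in groupby(members(), key=lambda item: item[0]):
            yield key, {ext: data for _, ext, data in group}
```

Because each shard is read front to back, a training job only needs to open a handful of large files per epoch, and shards can be split across workers or nodes simply by assigning different shard paths to each.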

Text Tokenizer

Machine learning tasks that process text may involve vocabularies of thousands of words, which forces models to handle huge input/output embeddings (e.g., one-hot vectors with ndim=vocabulary_size). This leads to high memory consumption, expensive computations, and, more importantly, sub-optimal learning due to extremely sparse and cumbersome one-hot vectors. In this tutorial, we provide all the basics needed to correctly use the SpeechBrain Tokenizer, which relies on SentencePiece (BPE and unigram models).
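
To make the subword idea concrete, here is a toy byte-pair-encoding (BPE) learner in plain Python. It is not SentencePiece or the SpeechBrain Tokenizer (SentencePiece, for instance, works on raw text and marks word boundaries differently); it only illustrates how BPE repeatedly merges the most frequent adjacent symbol pair, so that a small inventory of subword units can cover a much larger word vocabulary. The function names and the `</w>` end-of-word marker are assumptions of this sketch.

```python
from collections import Counter


def _merge(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules from a whitespace-separated corpus."""
    # Each word starts as a sequence of characters plus an end-of-word marker.
    vocab = Counter(tuple(word) + ("</w>",) for word in corpus.split())
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[a, b] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent pair (first one on ties)
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[_merge(symbols, best)] += freq
        vocab = new_vocab
    return merges


def encode(word, merges):
    """Split a word into subword units by replaying the learned merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = _merge(symbols, pair)
    return list(symbols)
```

With `merges = learn_bpe("low low low lower lowest", 3)`, the frequent word `low` becomes a single unit while the rarer `lowest` is split into `low` plus character-level pieces, so no separate vocabulary entry (and no separate one-hot dimension) is needed for every inflected form.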