A Python library for generating speech datasets. YouTube Speech Data Generator also takes care of almost all of the preprocessing needed to build a speech dataset, along with the matching transcriptions.
Make sure ffmpeg is installed and available on the system PATH.
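One way to confirm the ffmpeg requirement before installing is a small standard-library check. This is only a sketch: the helper name `ffmpeg_available` is hypothetical and is not part of this library.

```python
import shutil


def ffmpeg_available(binary: str = "ffmpeg") -> bool:
    """Return True if `binary` resolves to an executable on the system PATH.

    Hypothetical helper for illustration; not part of youtube_tts_data_generator.
    """
    return shutil.which(binary) is not None


if __name__ == "__main__":
    if ffmpeg_available():
        print("ffmpeg found at", shutil.which("ffmpeg"))
    else:
        print("ffmpeg not found on PATH; install it before generating a dataset")
```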
$ pip install youtube-tts-data-generator
Minimal start for creating the dataset
from youtube_tts_data_generator import YTSpeechDataGenerator

# First create a YTSpeechDataGenerator instance:
generator = YTSpeechDataGenerator(dataset_name='elon')

# Now create a '.txt' file that contains a list of YouTube videos that contain speeches.
# NOTE - Make sure you choose videos with subtitles.
generator.prepare_dataset('links.txt')

# The above takes care of creating your dataset, creating a metadata file and trimming silence from the audio files.
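The links file can be written by hand or generated. The sketch below assumes one URL per line, which the text above does not confirm, so check the library's documentation for the exact format. The helper `write_links_file` and the placeholder URLs are hypothetical.

```python
from pathlib import Path


def write_links_file(urls, path="links.txt"):
    """Write video URLs to `path`, one per line (assumed format).

    Blank entries and duplicates are dropped; the original order is kept.
    Returns the list of URLs actually written.
    """
    kept = []
    for url in urls:
        url = url.strip()
        if url and url not in kept:
            kept.append(url)
    Path(path).write_text("\n".join(kept) + "\n", encoding="utf-8")
    return kept


if __name__ == "__main__":
    # Placeholder URLs: replace with real videos that have subtitles.
    write_links_file([
        "https://www.youtube.com/watch?v=VIDEO_ID_1",
        "https://www.youtube.com/watch?v=VIDEO_ID_2",
    ])
```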
Final dataset structure
Once the dataset has been created, the structure under the ‘your_dataset’ directory should look like:
your_dataset
├───txts
│   ├───your_dataset1.txt
│   └───your_dataset2.txt
├───wavs
│   ├───your_dataset1.wav
│   └───your_dataset2.wav
└───metadata.csv/alignment.json
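Given this layout, a quick sanity check is to confirm that every clip in wavs has a transcript of the same name in txts, and vice versa. The following is a minimal sketch that relies only on the directory names shown above; the function `find_unpaired` is hypothetical and not part of the library.

```python
from pathlib import Path


def find_unpaired(dataset_dir):
    """Compare file stems in <dataset_dir>/wavs and <dataset_dir>/txts.

    Returns a tuple (wavs_without_txt, txts_without_wav), each a sorted
    list of stems, e.g. "your_dataset1" for "your_dataset1.wav".
    """
    root = Path(dataset_dir)
    wav_stems = {p.stem for p in (root / "wavs").glob("*.wav")}
    txt_stems = {p.stem for p in (root / "txts").glob("*.txt")}
    return sorted(wav_stems - txt_stems), sorted(txt_stems - wav_stems)
```

Calling `find_unpaired("your_dataset")` on a complete dataset should return two empty lists; any stems it reports point at clips or transcripts that are missing their counterpart.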
audio.py is heavily based on Real Time Voice Cloning.