A Python library for generating speech datasets. YouTube Speech Data Generator also handles almost all of the preprocessing needed to build a speech dataset together with its transcriptions.


Make sure ffmpeg is installed and available on the system path.
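
One way to confirm this before running anything is a quick check from Python. The sketch below uses only the standard library; the helper name `ffmpeg_available` is made up for illustration and is not part of this package.

```python
import shutil


def ffmpeg_available(executable="ffmpeg"):
    """Return True if `executable` can be found on the system path.

    Hypothetical helper, not part of youtube-tts-data-generator.
    """
    # shutil.which returns the full path to the executable, or None if it is not found.
    return shutil.which(executable) is not None


if __name__ == "__main__":
    if ffmpeg_available():
        print("ffmpeg found on the system path")
    else:
        print("ffmpeg not found; install it and add it to the system path")
```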

$ pip install youtube-tts-data-generator

Minimal start for creating the dataset

from youtube_tts_data_generator import YTSpeechDataGenerator

# First create a YTSpeechDataGenerator instance:

generator = YTSpeechDataGenerator(dataset_name='elon')

# Now create a '.txt' file that contains a list of YouTube videos that contain speech.
# NOTE - Make sure you choose videos with subtitles.

# The above will take care of creating your dataset, creating a metadata file, and trimming silence from the audio.
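
The '.txt' file of video links mentioned above can be written by hand, or generated with a few lines of Python. The following is a minimal sketch under two assumptions that the text does not confirm: that the file holds one URL per line, and that it is named `links.txt`. The helper `write_links_file` and the example URLs are hypothetical placeholders.

```python
from pathlib import Path


def write_links_file(urls, path="links.txt"):
    """Write unique, non-empty URLs to `path`, one per line, and return the count.

    Hypothetical helper; the one-URL-per-line layout is an assumption.
    """
    unique = []
    for url in urls:
        url = url.strip()
        # Skip blank entries and duplicates while keeping the original order.
        if url and url not in unique:
            unique.append(url)
    Path(path).write_text("".join(u + "\n" for u in unique), encoding="utf-8")
    return len(unique)


if __name__ == "__main__":
    # Placeholder URLs; replace them with real videos that have subtitles.
    count = write_links_file(
        [
            "https://www.youtube.com/watch?v=VIDEO_ID_1",
            "https://www.youtube.com/watch?v=VIDEO_ID_2",
        ]
    )
    print(f"wrote {count} links")
```

Removing duplicates up front avoids processing the same video twice.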


Final dataset structure

Once the dataset has been created, the structure under the ‘your_dataset’ directory should look like this:

│   ├───your_dataset1.txt
│   └───your_dataset2.txt
│    ├───your_dataset1.wav
│    └───your_dataset2.wav
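
Since each transcript in the listing has an audio file with the same base name, a quick sanity check is to look for files that lack a partner. The sketch below assumes only what the listing shows: `.txt` and `.wav` files sharing a stem somewhere under the dataset directory. The helper `find_unpaired` is hypothetical and not part of the library.

```python
from pathlib import Path


def find_unpaired(dataset_dir):
    """Return (txt_without_wav, wav_without_txt) as sorted lists of file stems.

    Hypothetical helper; it searches the directory recursively and matches
    files by base name only.
    """
    root = Path(dataset_dir)
    txt_stems = {p.stem for p in root.rglob("*.txt")}
    wav_stems = {p.stem for p in root.rglob("*.wav")}
    return sorted(txt_stems - wav_stems), sorted(wav_stems - txt_stems)


if __name__ == "__main__":
    missing_audio, missing_text = find_unpaired("your_dataset")
    print("transcripts without audio:", missing_audio)
    print("audio without transcripts:", missing_text)
```

Two empty lists mean every transcript has a matching audio file and vice versa.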

NOTE – is highly based on Real Time Voice Cloning
