youtube-tts-data-generator

Project description

License
Code style: black

A python library to generate speech dataset. Youtube Speech Data Generator also takes care of almost all your speech data preprocessing needed to build a speech dataset along with their transcriptions.

Installation

Make sure ffmpeg is installed and is set to the system path.

$ pip install youtube-tts-data-generator

Minimal start for creating the dataset

from youtube_tts_data_generator import YTSpeechDataGenerator

# First create a YTSpeechDataGenerator instance:

generator = YTSpeechDataGenerator(dataset_name='elon')

# Now create a '.txt' file that contains a list of YouTube videos that contains speeches.
# NOTE - Make sure you choose videos with subtitles.

generator.prepare_dataset('links.txt')
# The above will take care about creating your dataset, creating a metadata file and trimming silence from the audios.

Usage

Final dataset structure

Once the dataset has been created, the structure under ‘your_dataset’ directory should look like:

your_dataset
├───txts
│   ├───your_dataset1.txt
│   └───your_dataset2.txt
├───wavs
│    ├───your_dataset1.wav
│    └───your_dataset2.wav
└───metadata.csv/alignment.json

NOTE – audio.py is highly based on Real Time Voice Cloning

Download files

Download the file for your platform. If you’re not sure which to choose, learn more about installing packages.

Latest posts