DataCite Commons: Localizing Fake Segments in Speech

The Partial Synthetic Detection (Psynd) dataset is a multi-speaker English corpus of 2294 utterances, approximately 13 hours of speech at a 24 kHz sampling rate. It is derived from LibriTTS, a read English speech corpus (all real voices) designed for TTS research. The data samples are real utterances into which voice-cloned synthetic speech has been injected. The fake segments are generated by a state-of-the-art multi-speaker text-to-speech method and have high similarity to the target speakers, as characterized by Global Style Token (GST) and X-Vector.

citation: SIM MONG CHENG, TERENCE, BOWEN ZHANG (2022-06-20). Localizing Fake Segments in Speech. 1.0. ScholarBank@NUS Repository. [Dataset].

TERENCE SIM MONG CHENG
BOWEN ZHANG

National University of Singapore (Data Manager)
National University of Singapore (Hosting Institution)

DOI registered June 24, 2022 via DataCite
