Niah Nieuwenhuis
Title:
Creating Naturalistic Synthetic Speech for Low-Resource Languages
Abstract:
Speech is a highly personal and effective means of communication. Advances in machine learning and deep neural networks have made it possible for speech synthesis technology to give almost anyone a voice. However, there is a mismatch between the technology's capabilities and the availability of naturalistic, personalized voices in low-resource languages. Individuals with severe speech impairments exist in all cultures, but financial incentives lead most companies to produce speech synthesizers mainly for languages with many speakers. To close this gap, we sought to create natural-sounding speech synthesizers in Spanish and Navajo using the neural network architectures Tacotron 2 and WaveGlow. To pilot this method while tribal approval from the Navajo Nation was being obtained, we recorded a Spanish speaker in a sound-attenuated booth. After the data were cleaned and fillers, hesitations, and non-speech sounds were removed, just over two hours of Spanish speech remained. All speech was transcribed and segmented into WAV files 0.5 to 12 seconds long. Tacotron 2 and WaveGlow were then used to build a Spanish synthesis model: Tacotron 2 predicts mel spectrograms from text, and WaveGlow converts those spectrograms into audio waveforms. Listeners rated the naturalness of 60 synthesized sentences on a scale of one to five. Our results highlight the feasibility of creating synthetic speech for other low-resource languages, providing appropriate and individualized voices for all.
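The segmentation step above (transcribed clips kept only if they are 0.5 to 12 seconds long) can be sketched as a small filter that produces a training file list. This is a minimal sketch, not the authors' pipeline: it assumes 16-bit PCM WAV clips readable by Python's standard `wave` module. It also assumes a pipe-separated `path|transcript` file-list layout, modeled on common Tacotron 2 training setups. The helper names (`wav_duration`, `build_filelist`, `write_silence`, `demo`) are hypothetical.

```python
import tempfile
import wave
from pathlib import Path

# Clip-length bounds stated in the abstract, in seconds.
MIN_SEC = 0.5
MAX_SEC = 12.0


def wav_duration(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as w:
        return w.getnframes() / w.getframerate()


def build_filelist(pairs, min_sec=MIN_SEC, max_sec=MAX_SEC):
    """Keep (wav_path, transcript) pairs whose audio length is within bounds
    and whose transcript is non-empty; format them as 'path|transcript' lines
    (an assumed layout, modeled on typical Tacotron 2 file lists)."""
    lines = []
    for wav_path, text in pairs:
        if min_sec <= wav_duration(wav_path) <= max_sec and text.strip():
            lines.append(f"{wav_path}|{text.strip()}")
    return lines


def write_silence(path, seconds, rate=22050):
    """Demo helper (hypothetical): write a mono 16-bit silent WAV."""
    with wave.open(str(path), "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * int(seconds * rate))


def demo():
    """Create three synthetic clips and return the filtered file list."""
    tmp = Path(tempfile.mkdtemp())
    pairs = []
    for name, secs, text in [("short", 0.3, "sí"),
                             ("ok", 2.0, "hola mundo"),
                             ("long", 13.0, "frase larga")]:
        p = tmp / f"{name}.wav"
        write_silence(p, secs)
        pairs.append((p, text))
    return build_filelist(pairs)


if __name__ == "__main__":
    for line in demo():
        print(line)
```

In the demo, only the 2-second clip survives: the 0.3-second clip falls below the lower bound and the 13-second clip exceeds the upper one.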
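Naturalness ratings on a one-to-five scale, as described above, are commonly summarized as a mean opinion score (MOS). The following is a minimal sketch under stated assumptions: ratings are grouped by sentence, and the 95% interval uses a simple normal approximation that treats every rating as independent. The abstract does not say how its ratings were aggregated. The function name and the example ratings are hypothetical and are not results from this study.

```python
import math
from statistics import mean, stdev


def mean_opinion_score(ratings):
    """Summarize 1-5 naturalness ratings.

    ratings: dict mapping a sentence id to the list of listener ratings.
    Returns (MOS, half-width of an approximate 95% interval). The interval
    is a normal approximation that ignores listener and sentence clustering.
    """
    scores = [r for rs in ratings.values() for r in rs]
    if len(scores) < 2:
        raise ValueError("need at least two ratings")
    if any(not 1 <= r <= 5 for r in scores):
        raise ValueError("ratings must lie on the 1-5 scale")
    mos = mean(scores)
    half_width = 1.96 * stdev(scores) / math.sqrt(len(scores))
    return mos, half_width


if __name__ == "__main__":
    # Illustrative ratings only, not data from the study.
    example = {"s1": [4, 5], "s2": [3, 4]}
    print(mean_opinion_score(example))
```

With 60 sentences and several listeners, a mixed-effects model would account for the clustering that this simple interval ignores.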
Nieuwenhuis, Niah
Category
Poster Presentation
Description
Session 3: 12:30-2:00 pm
18