Learning to Model Prosodic and Spectral Features for Non-parallel Emotive Speech Conversion

Published on Jun 08, 2021

Abstract

Emotion conversion in speech has attracted recent attention owing to its importance in human-machine interaction and the high quality of current speech synthesis. Most existing approaches rely on parallel data, which is not available in many real-time applications. We propose a non-parallel emotion conversion approach based on the cycle generative adversarial network (cycleGAN) framework. We introduce new variants of cycleGAN that use recurrent neural networks and multi-kernel convolutional neural networks to model prosodic features along with spectral features for emotion conversion in speech. Subjective evaluation results show the effectiveness of our approach in converting both natural speech and unseen synthesized speech samples to different target emotive states.

Article ID: 2021L22

Month: May

Year: 2021

Address: Online

Venue: Canadian Conference on Artificial Intelligence

Publisher: Canadian Artificial Intelligence Association

URL: https://caiac.pubpub.org/pub/dg4q12n6/
