Voice Conversion Algorithm Based on Gaussian Mixture Model with Dynamic Frequency Warping of STRAIGHT Spectrum
Created on 2022-05-16T22:42:24-05:00
Associates the speech features of speaker A with the corresponding features of speaker B using a Gaussian mixture model (GMM).
Features of speaker A are run through the trained model to predict the same features as speaker B would have produced them.
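A minimal sketch of how such a mapping can work, assuming the commonly used joint-density GMM regression: each mixture component contributes a linear prediction of B's feature, weighted by how likely that component is given A's feature. The notes do not confirm this is the paper's exact formulation. Features are scalars here for brevity, and the function names, dictionary keys, and toy parameters are all hypothetical.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def convert_feature(x, components):
    """Map a source feature x to the expected target feature under a joint GMM.

    Each component is a dict with hypothetical keys:
      weight, mu_x, mu_y, var_x (source variance), cov_yx (cross-covariance).
    """
    # Posterior probability of each component given the source feature.
    likelihoods = [c["weight"] * gaussian_pdf(x, c["mu_x"], c["var_x"]) for c in components]
    total = sum(likelihoods)
    posteriors = [lik / total for lik in likelihoods]
    # Posterior-weighted sum of the per-component linear regressions.
    return sum(
        p * (c["mu_y"] + c["cov_yx"] / c["var_x"] * (x - c["mu_x"]))
        for p, c in zip(posteriors, components)
    )

# Toy parameters for illustration only, not taken from the paper.
toy_gmm = [
    {"weight": 0.5, "mu_x": 0.0, "mu_y": 1.0, "var_x": 1.0, "cov_yx": 0.5},
    {"weight": 0.5, "mu_x": 4.0, "mu_y": 6.0, "var_x": 1.0, "cov_yx": 0.5},
]
converted = convert_feature(0.0, toy_gmm)
```

Because the output is a posterior-weighted average over components, inputs that fall between two components get blended predictions, which is one way to see where the smoothing comes from.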
A time-stretching algorithm compensates for differences in duration between matching transitions in the two voices.
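The notes do not name the alignment algorithm. Dynamic time warping (DTW) is a common choice for pairing frames of two utterances that differ in duration, so the sketch below assumes it. Frames are scalars and the local cost is an absolute difference, both simplifications; `dtw_path` is a hypothetical helper name.

```python
def dtw_path(a, b):
    """Return (total_cost, path) aligning sequences a and b with dynamic time warping."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end to recover which frames were paired.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),  # advance both sequences
            (cost[i - 1][j], i - 1, j),          # advance a only
            (cost[i][j - 1], i, j - 1),          # advance b only
        ]
        _, i, j = min(steps)
    path.reverse()
    return cost[n][m], path

# The target holds its middle value for one extra frame.
source = [1.0, 2.0, 3.0]
target = [1.0, 2.0, 2.0, 3.0]
total_cost, pairs = dtw_path(source, target)
```

Here source frame 1 is paired with both target frames 1 and 2, which is the stretching that absorbs the longer transition.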
Quality is lost in two places: the analysis and subsequent resynthesis in the vocoder, and the smoothing introduced by averaging in the mixture model.
Includes an additional pass that performs regression on the residuals to make up for the small differences that remain after stretching.
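A rough sketch of what a residual regression pass can look like, assuming a simple one-variable least-squares fit. The notes do not specify the regression model, so this illustrates the general idea only. The data values and the `fit_line` helper are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys against xs; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical aligned data: source feature, first-pass prediction, true target.
source = [0.0, 1.0, 2.0, 3.0]
first_pass = [1.0, 2.0, 3.0, 4.0]
target = [1.1, 2.2, 3.3, 4.4]

# Residual = what the first pass still gets wrong.
residuals = [t - p for t, p in zip(target, first_pass)]
# Second pass: predict the residual from the source feature and add it back.
slope, intercept = fit_line(source, residuals)
corrected = [p + slope * x + intercept for p, x in zip(first_pass, source)]
```

In this toy case the leftover error is exactly linear in the source feature, so the corrected values match the target; on real data the second pass would only shrink the error.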
Overall mean opinion score (MOS) is poor: the control conditions were rated 5, while the synthesized samples were rated 3.