VoiceLoop: Voice Fitting and Synthesis via a Phonological Loop
Created on 2022-06-06T19:11:09-05:00
The working memory is a matrix whose columns shift by one position at each step: the oldest column falls off the end and is discarded while a new column is inserted at the front.
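A minimal sketch of the shifting buffer, assuming each column is a plain list of floats and the buffer starts as zeros. `make_buffer` and `push_column` are hypothetical helper names, not taken from VoiceLoop. A `deque` with `maxlen` gives exactly the described behavior: `appendleft` inserts at the front and silently drops the column at the end.

```python
from collections import deque

def make_buffer(dim, length):
    """Working memory of `length` columns, each a zero vector of size `dim`."""
    return deque([[0.0] * dim for _ in range(length)], maxlen=length)

def push_column(buffer, column):
    """Insert a new column at the front; deque(maxlen=...) discards the oldest column at the end."""
    buffer.appendleft(list(column))
    return buffer
```

After `length` pushes, the first column pushed sits at the end and the next push removes it.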
Neurons connected to the working memory learn to read the variables they need directly out of the active memory.
The network outputs a new column to the front of its working memory.
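The read/write cycle can be sketched as a single dense layer over the whole flattened buffer whose result becomes the new front column. This is an assumption for illustration: the layer size, the `tanh` nonlinearity, and the name `read_and_write` are hypothetical, and the real model may use deeper networks and additional inputs.

```python
import math
from collections import deque

def flatten(buffer):
    """Concatenate all columns of the buffer into one vector, front column first."""
    return [x for column in buffer for x in column]

def read_and_write(buffer, weights, bias):
    """Read from every cell of the working memory, then push the result to the front.

    Hypothetical shapes: `weights` has one row per unit of the new column and
    len(flatten(buffer)) entries per row; `bias` has one entry per row.
    """
    x = flatten(buffer)
    new_column = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(weights, bias)]
    buffer.appendleft(new_column)  # deque(maxlen=...) drops the oldest column
    return new_column
```

Because every weight row sees every buffer cell, training decides which positions each unit actually reads from.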
Input is handled by a soft attention model that selects only a subset of the inputs to process at each step. The attended input is integrated with the phonological loop, and the resulting features are both fed to the output translation layer and copied to the front of the loop.
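A sketch of this input path, assuming plain dot-product softmax attention. That is a generic stand-in, not necessarily the attention mechanism VoiceLoop itself uses. `attend` weights the input vectors by a softmax over scores; the hypothetical `route` then shows the same features going to a linear output layer and onto the front of the loop.

```python
import math
from collections import deque

def softmax(scores):
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, inputs):
    """Soft attention: weight each input vector by softmax(query . input)."""
    weights = softmax([sum(q * x for q, x in zip(query, inp)) for inp in inputs])
    dim = len(inputs[0])
    context = [sum(w * inp[i] for w, inp in zip(weights, inputs)) for i in range(dim)]
    return weights, context

def route(features, buffer, output_weights):
    """Copy features to the front of the loop and pass them through an output layer."""
    buffer.appendleft(list(features))  # copied to the front of the loop
    return [sum(w * f for w, f in zip(row, features)) for row in output_weights]
```

A sharp query concentrates almost all weight on one input, which is how the model can focus on a small subset of inputs while still being differentiable.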
A block of speaker values is kept for each unique speaker in the training set.
New speakers are fitted from samples by adapting an existing speaker's values, without changing the loop or attention parameters.
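A toy sketch of per-speaker blocks and fitting, assuming the frozen part of the model can be stood in for by a fixed linear map and the fit uses squared error with plain gradient descent. The names `fit_new_speaker`, `frozen_weights`, and `target` are hypothetical. Only the copied speaker block is updated; the frozen weights and the original speaker stay untouched.

```python
def predict(frozen_weights, speaker):
    """Stand-in for the frozen network: a fixed linear map applied to the speaker block."""
    return [sum(w * s for w, s in zip(row, speaker)) for row in frozen_weights]

def fit_new_speaker(speakers, base_id, new_id, frozen_weights, target, lr=0.1, steps=200):
    """Start from an existing speaker's block and adapt only that copy."""
    speaker = list(speakers[base_id])  # copy, so the base speaker is not modified
    for _ in range(steps):
        err = [p - t for p, t in zip(predict(frozen_weights, speaker), target)]
        # Gradient of 0.5 * ||W s - t||^2 with respect to s is W^T err.
        grad = [sum(frozen_weights[i][j] * err[i] for i in range(len(err)))
                for j in range(len(speaker))]
        speaker = [s - lr * g for s, g in zip(speaker, grad)]
    speakers[new_id] = speaker  # new block added to the speaker table
    return speaker
```

Keeping the shared parameters fixed means a new voice costs only one extra block of values.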