ObamaNet: Photo-realistic lip-sync from text
Created on 2022-05-07T18:41:49-05:00
I don't care about this paper all that much. Predicting lip features from audio is the interesting part, since it could make lip sync available to indie developers.
- Uses standard Char2Wav for text-to-speech synthesis.
- LSTM network with time delay to predict mouth shape from audio features.
- Uses the WORLD vocoder to get audio features from the audio track.
- Uses the dlib facial landmark detector to get key points of the face.
- Uses principal component analysis (PCA) to identify the most important key points.
- Creates a wireframe rendering of the lips, then uses image-to-image networks to synthesize the rest of the mouth from those images.
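
The "time delay" in the LSTM step can be sketched as a data-alignment trick. This is a minimal sketch, not the paper's code: it assumes the delay means the network outputs the mouth shape for an earlier frame than the audio frame it is currently reading, so it has already heard a little of the following audio. The function name `delay_pairs` and the `delay` parameter are hypothetical names introduced for illustration.

```python
def delay_pairs(audio_feats, mouth_shapes, delay):
    """Pair the audio frame at step t with the mouth shape at step t - delay.

    Assumption (not confirmed by the notes): the model reads audio up to
    frame t before it has to predict the mouth shape for frame t - delay.
    Both sequences are assumed to be sampled at the same frame rate.
    """
    if delay < 0:
        raise ValueError("delay must be non-negative")
    if len(audio_feats) != len(mouth_shapes):
        raise ValueError("sequences must have the same number of frames")
    # The first `delay` audio frames have no target yet, so they are skipped.
    return [
        (audio_feats[t], mouth_shapes[t - delay])
        for t in range(delay, len(audio_feats))
    ]


# Toy example with labels standing in for feature vectors.
audio = ["a0", "a1", "a2", "a3"]
mouth = ["m0", "m1", "m2", "m3"]
pairs = delay_pairs(audio, mouth, delay=2)
# pairs == [("a2", "m0"), ("a3", "m1")]
```

With a delay of 0 this reduces to ordinary frame-by-frame pairing. A larger delay gives the model more lookahead, but the first `delay` frames of each clip yield no training targets.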