ObamaNet: Photo-realistic lip-sync from text

Created on 2022-05-07T18:41:49-05:00

I don't care about this paper all that much. Getting the lip features from sound could be useful, though, for making lip sync available to indies.

Uses standard Char2Wav for text-to-speech synthesis.
Uses an LSTM network with a time delay to predict mouth shape from audio features.
Uses the WORLD vocoder to extract audio features from the soundtrack.
Uses the dlib facial landmark detector to get key points of the face.
Uses principal component analysis to identify the most important key points.
Creates a wireframe rendering of the lips, then uses an image-to-image network to synthesize the rest of the mouth from those renderings.
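
The time delay is the part I'd want to try first. A minimal sketch of how the training pairs might be lined up, assuming audio features and mouth shapes are sampled at the same frame rate. The function name `pair_with_delay` and the `delay` parameter are my own placeholders, not names from the paper, and this only shows the alignment step, not the LSTM itself.

```python
from typing import List, Sequence, Tuple

Frame = Sequence[float]  # one feature vector per time step


def pair_with_delay(
    audio: Sequence[Frame], mouth: Sequence[Frame], delay: int
) -> List[Tuple[Frame, Frame]]:
    """Pair the mouth shape at step t with the audio frame at step t + delay.

    A recurrent network trained on these pairs has seen `delay` extra
    frames of audio by the time it has to emit the mouth shape for step t.
    The last `delay` mouth shapes are dropped because they have no
    matching audio frame.
    """
    if delay < 0:
        raise ValueError("delay must be non-negative")
    if len(audio) != len(mouth):
        raise ValueError("audio and mouth must have the same number of frames")
    return [(audio[t + delay], mouth[t]) for t in range(len(audio) - delay)]


if __name__ == "__main__":
    # Toy data: 4 frames, 1 value each (hypothetical numbers).
    audio = [[0.0], [1.0], [2.0], [3.0]]
    mouth = [[10.0], [11.0], [12.0], [13.0]]
    for audio_frame, mouth_frame in pair_with_delay(audio, mouth, delay=1):
        print(audio_frame, "->", mouth_frame)
```

With `delay=0` this is a plain frame-by-frame pairing. A larger delay trades latency for more audio context per predicted mouth shape.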