Skip Grams
Created on 2020-08-17T21:40:53.379733
TODO how does the hidden layer get extracted to perform document vectorization
"Skip Grams" are like CBOW but it guesses multiple output words given a single input word. It excels at "small data sets with rare words."
Network
- There is an input layer, a single hidden layer, and an output layer.
- Input is a "one hot" of the single word going into the network.
- More than one word appears in the output to supply the surrounding "context."
- Each training target is a "one hot" of one predicted context word.
The output layer becomes a list of probabilities (a softmax over the vocabulary) for each target word; a minimal network sketch follows below.
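A minimal sketch of that network in plain NumPy, assuming a tiny vocabulary; the layer sizes, variable names, and softmax helper are illustrative assumptions.

```python
import numpy as np

vocab_size, hidden_size = 5, 3  # assumed tiny sizes for illustration

rng = np.random.default_rng(0)
W_in = rng.normal(size=(vocab_size, hidden_size))   # input -> hidden weights
W_out = rng.normal(size=(hidden_size, vocab_size))  # hidden -> output weights

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

x = np.zeros(vocab_size)
x[2] = 1.0                 # one-hot input word

h = x @ W_in               # hidden layer: just row 2 of W_in
scores = h @ W_out         # one raw score per vocabulary word
probs = softmax(scores)    # probability of each word appearing in the context
print(probs)
```

During training, the same `probs` vector is compared against each context word's one-hot target. Note that the hidden layer `h` is simply the input word's row of `W_in`; those rows are the learned word vectors, which is the hidden layer the TODO above asks about extracting.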
One-hot
- One neuron for each word the network knows how to handle.
- All neurons are set to zero except the one for the correct word, which is set to one.
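A small sketch of one-hot encoding under an assumed toy vocabulary; the word list and function name are illustrative.

```python
import numpy as np

vocab = ["the", "quick", "brown", "fox", "jumps"]  # assumed toy vocabulary
index = {word: i for i, word in enumerate(vocab)}

def one_hot(word):
    # One neuron per known word: all zeros except the correct word's slot.
    vec = np.zeros(len(vocab))
    vec[index[word]] = 1.0
    return vec

print(one_hot("brown"))  # [0. 0. 1. 0. 0.]
```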