Document Vectors (Doc2Vec)

Created on 2020-11-29T14:15:34-06:00

Start with CBOW but add one extra input vector that identifies which document the context words came from. This variant is "PV-DM" (Paragraph Vector Distributed Memory).
The counterpart of skip-gram is "PV-DBOW" (Paragraph Vector Distributed Bag of Words): the input layer is the paragraph ID and the outputs are words from that document.
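
To make the difference in inputs and outputs concrete, here is a minimal sketch that only builds the training examples each model would see; it does not train a network. The function names, the toy corpus, and the `window` parameter are made up for illustration. The first function follows the CBOW-style model with a document vector (PV-DM), the second follows PV-DBOW.

```python
def pv_dm_examples(docs, window=2):
    """CBOW-style: (document ID + context words) -> target word."""
    examples = []
    for doc_id, words in enumerate(docs):
        for i, target in enumerate(words):
            # Context is up to `window` words on each side of the target.
            context = words[max(0, i - window):i] + words[i + 1:i + 1 + window]
            examples.append(((doc_id, tuple(context)), target))
    return examples


def pv_dbow_examples(docs):
    """PV-DBOW: document ID -> one word from that document (no context words)."""
    examples = []
    for doc_id, words in enumerate(docs):
        for word in words:
            examples.append((doc_id, word))
    return examples


# Hypothetical toy corpus: two tokenized documents; list index is the document ID.
docs = [["the", "cat", "sat"], ["dogs", "bark"]]

dm = pv_dm_examples(docs, window=1)
dbow = pv_dbow_examples(docs)

for example in dm:
    print("PV-DM  ", example)
for example in dbow:
    print("PV-DBOW", example)
```

In a real model the document ID and each word would index rows of learned embedding matrices. The sketch stops at the (input, output) pairs because that is where the two architectures differ.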

Wisio also tested adding a "tag vector": each document has one or more tags attached. Unlike the document ID, a tag is not unique to a single document. It can be used (via cosine similarity of the hidden layers?) to determine which tags are "most like" a given input document.
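
A rough sketch of the tag lookup, assuming tag vectors and document vectors live in the same embedding space and that cosine similarity is the comparison used (the note above is not sure of this). All vectors, tag names, and function names here are invented for illustration, not taken from any trained model.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_tags(doc_vec, tag_vecs):
    """Return tag names ordered from most to least similar to doc_vec."""
    return sorted(tag_vecs, key=lambda tag: cosine(doc_vec, tag_vecs[tag]), reverse=True)


# Hypothetical 3-dimensional vectors; real ones would come from training.
tag_vectors = {
    "sports": [0.9, 0.1, 0.0],
    "cooking": [0.0, 0.2, 0.9],
    "politics": [0.1, 0.9, 0.1],
}
doc_vector = [0.8, 0.3, 0.1]

ranked = rank_tags(doc_vector, tag_vectors)
print(ranked)  # most similar tag first
```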

References

A gentle introduction to Doc2Vec