Engrams: Semantic Folding and Word2Vec are the Same Thing

Created on 2022-04-17T04:01:41-05:00

Return to the Index

This card can also be read via Gemini.

Word2Vec (and its similar extensions) works by feeding in a small window of context words and having the computer learn, via the chain rule of calculus, to build a container vector for each word that somehow represents an understanding of the whole language, or at least the ability to mostly predict the next word given some context. One Doc2Vec experiment on scientific abstracts asked the machine questions along the lines of "what is the equivalent of this paper but for frequentist statistics instead of Bayesian" and it did find a similar mathematics paper from a different discipline.
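
A rough sketch of the mechanism (not the original experiment) can be put together with the gensim library, assuming gensim 4.x; the toy corpus and every parameter below are invented for illustration:

```python
# Train a tiny skip-gram Word2Vec model on a made-up corpus and ask which
# words end up with the closest vectors to a query word.
from gensim.models import Word2Vec

corpus = [
    ["the", "king", "ruled", "the", "realm"],
    ["the", "queen", "ruled", "the", "realm"],
    ["a", "bayesian", "prior", "updates", "a", "belief"],
    ["a", "frequentist", "test", "rejects", "a", "hypothesis"],
]

# sg=1 selects skip-gram: predict context words from the center word, with
# the error pushed back through the embedding via the chain rule.
model = Word2Vec(corpus, vector_size=32, window=2, min_count=1, sg=1, epochs=200)

print(model.wv.most_similar("king", topn=3))
```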

Research into Word2Vec shows that it is possible to produce an engram for a word or document, perform mathematical operations on that engram, and have the result match words in the new context. For example, Word2Vec and Doc2Vec support a grammar of subtracting the engram for male from the engram for King and adding the engram for female, with the closest match to the result being Queen.
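
The same algebra can be tried against a pretrained embedding; this sketch assumes gensim's downloader and its small GloVe vectors, which behave the same way as Word2Vec vectors for this purpose:

```python
# Fetch a small pretrained embedding and ask for the nearest word to
# vector("king") - vector("man") + vector("woman").
import gensim.downloader as api

wv = api.load("glove-wiki-gigaword-50")

print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
```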

Cortical.io's semantic folding theory uses a similar process in a more direct fashion. Rather than relying on calculus to create a black box, it directly collects sentences and compares how many symbols overlap between each pair of sentences. A two-dimensional graph-placement algorithm spreads nodes out in 2D space based on how much overlap exists between the sentences of the training set, and this 2D space is then flattened to a 1D bitmap.
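
A toy sketch of that construction, in no way Cortical.io's actual pipeline: the grid placement below is hard-coded rather than computed from overlap, but it shows how a word's fingerprint falls out of the flattened grid positions of the snippets that contain it:

```python
# Four training snippets, each assigned one cell of a 2x2 grid.  A real
# layout algorithm would place snippets with overlapping vocabulary in
# neighbouring cells; here the placement is fixed by hand.
snippets = [
    "the king ruled the realm",
    "the queen ruled the realm",
    "a bayesian prior updates a belief",
    "a frequentist test rejects a hypothesis",
]
GRID_W = 2
positions = {0: (0, 0), 1: (0, 1), 2: (1, 0), 3: (1, 1)}

def fingerprint(word):
    """Sparse bit set: one bit per flattened grid cell whose snippet contains the word."""
    return {row * GRID_W + col
            for i, (row, col) in positions.items()
            if word in snippets[i].split()}

def overlap(a, b):
    """Shared set bits between two fingerprints stand in for semantic similarity."""
    return len(fingerprint(a) & fingerprint(b))

print(fingerprint("king"), fingerprint("ruled"), overlap("king", "ruled"))
```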

Cortical's whitepaper also discusses a similar algebra in which context engrams can be added to and subtracted from the "sparse distributed representations" to form similar queries.
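
A sketch of what that algebra looks like on sparse bit sets, with fingerprints invented by hand rather than learned: "adding" is a union of set bits, "subtracting" removes bits, and a query is answered by whichever known fingerprint overlaps the result most.

```python
# Hand-made fingerprints standing in for trained sparse distributed
# representations (each number is the index of a set bit).
codebook = {
    "king":   {3, 17, 42, 57},
    "queen":  {3, 17, 42, 61},
    "male":   {9, 57},
    "female": {9, 61},
}

# (king - male) + female, expressed as set difference and union.
query = (codebook["king"] - codebook["male"]) | codebook["female"]

# The best answer is the symbol whose fingerprint shares the most bits.
best = max(codebook, key=lambda word: len(codebook[word] & query))
print(best)  # "queen"
```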

The important aspect of both systems has an interesting Taoist quality: the meaning of each individual symbol is entirely dependent, iteratively, on the meaning of every other symbol.

In both cases all symbols are defined solely by proximity to other symbols, and the resulting codebooks form a navigable space. Any given position in that space has a symbol it is closest to, so it is always possible to move from any concept "towards" another concept, even when the resulting "closest to" makes very little sense. This overlaps with the human brain's ability to conclude that two seemingly unrelated concepts are "basically the same."
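
That traversal can be demonstrated by walking a straight line between two word vectors and asking for the nearest word at each step; the sketch below reuses gensim's pretrained GloVe vectors and an arbitrary pair of concepts:

```python
# Interpolate between two concept vectors and report the closest word at
# each step.  The intermediate neighbours may make very little sense,
# which is rather the point.
import gensim.downloader as api
import numpy as np

wv = api.load("glove-wiki-gigaword-50")
start, end = wv["coffee"], wv["physics"]

for t in np.linspace(0.0, 1.0, 5):
    point = (1 - t) * start + t * end
    word, score = wv.similar_by_vector(point, topn=1)[0]
    print(f"t={t:.2f}: {word} ({score:.2f})")
```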

In both cases a grammar is also defined in which individual engrams are composed into larger fingerprints for a given piece of context. Doc2Vec adds an extra feature identifying which document in the training set a word came from, as additional context, while Cortical simply ORs the bit fields of a sentence's words together to create a semantic fingerprint of the sentence.
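
A sketch of both composition styles, with invented word fingerprints on the Cortical side and a toy gensim Doc2Vec model (assuming gensim 4.x) on the other:

```python
# Cortical-style composition: OR (set union) the word fingerprints of a
# sentence together into one sparse bit set.  The fingerprints are made up.
word_fp = {
    "king":  {3, 17, 42},
    "ruled": {5, 17, 80},
    "realm": {5, 42, 91},
}

def sentence_fp(words):
    bits = set()
    for w in words:
        bits |= word_fp.get(w, set())
    return bits

print(sentence_fp(["king", "ruled", "realm"]))

# Doc2Vec-style composition: each document carries a tag that behaves like
# an extra context word during training, and that tag's learned vector
# becomes the document's engram.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

docs = [
    TaggedDocument(words=["the", "king", "ruled", "the", "realm"], tags=["doc0"]),
    TaggedDocument(words=["the", "queen", "ruled", "the", "realm"], tags=["doc1"]),
]
model = Doc2Vec(docs, vector_size=16, min_count=1, epochs=50)
print(model.dv["doc0"][:4])
```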

Both techniques appear to reach the same result: a collection of symbols defined purely by the existence of other symbols (none of which has any inherent meaning), with the meaning of each symbol wholly implied by its relations. It is also possible to travel the resulting semantic spaces by example.