Cortical.io Semantic Folding Theory: And its Application in Semantic Fingerprinting
Created on 2021-01-14T20:32:59-06:00
Semantic Folding
- Start with a corpus (a large collection of documents).
- Break the corpus down into clippings, each representing an individual context.
- Optimize the placement of the clippings on a 2D grid (e.g. 128x128) such that clippings sharing words end up closer together than clippings that do not.
- For every unique word in the corpus: find every context that contains the word, convert each such context's 2D coordinates to a 1D bit index in the output SDR, and set that bit hot (see the sketch after this list).
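A minimal sketch of the folding step in Python (my own illustration, not Cortical.io's code): `contexts` is assumed to be a list of word sets, the 128x128 grid size comes from the example above, and a random placement stands in for the real 2D optimization that pulls similar contexts together.

```python
import random
from collections import defaultdict

GRID_SIZE = 128  # side length of the 2D grid from the example above


def fold(contexts):
    """Assign each context (a set of words) to a cell on the 2D grid and
    derive one SDR per word, with one hot bit per context the word appears in.

    Placeholder: a real implementation optimizes the placement so that
    contexts sharing words land near each other; here the cells are just
    shuffled. Assumes there are no more contexts than grid cells.
    """
    cells = [(x, y) for x in range(GRID_SIZE) for y in range(GRID_SIZE)]
    random.shuffle(cells)  # stand-in for the 2D similarity optimization

    word_sdrs = defaultdict(set)  # word -> set of hot bit indices
    for ctx_id, words in enumerate(contexts):
        x, y = cells[ctx_id]
        bit = y * GRID_SIZE + x  # flatten the 2D coordinate to a 1D bit index
        for word in words:
            word_sdrs[word].add(bit)
    return dict(word_sdrs)
```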
Word SDRs
A Word-SDR is an SDR where each hot bit represents one of the contexts in the context database.
Text SDRs
Create a union of all Word-SDRs in a sentence, but instead of doing a binary OR, count how many times each bit is hot across the SDRs.
Sort the bit indices by that count.
Take indices from highest count to lowest until you have enough bits to satisfy the output SDR's sparsity requirement (see the sketch below).
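A sketch of that counting union, assuming `word_sdrs` is the word-to-hot-bit-set mapping built by the folding sketch above; the 2% sparsity target is my assumption, not a documented value.

```python
from collections import Counter


def text_sdr(words, word_sdrs, n_bits=128 * 128, sparsity=0.02):
    """Build a Text-SDR: count how often each bit is hot across the words'
    SDRs, then keep the highest-count bits until the sparsity budget is hit."""
    counts = Counter()
    for w in words:
        counts.update(word_sdrs.get(w, set()))

    budget = int(n_bits * sparsity)  # number of hot bits the output may keep
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {bit for bit, _ in ranked[:budget]}
```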
Search
Used for semantic search by creating a Text-SDR of the search query and finding its overlaps with the recorded Text-SDRs of documents in the index (see the sketch below).
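A sketch of that overlap search, assuming documents are indexed as the hot-bit sets produced by the `text_sdr()` sketch above (`doc_sdrs` maps a document id to its Text-SDR).

```python
def search(query_words, doc_sdrs, word_sdrs, top_k=10):
    """Rank indexed documents by overlap (shared hot bits) with the query's Text-SDR."""
    query = text_sdr(query_words, word_sdrs)
    scored = [(len(query & sdr), doc_id) for doc_id, sdr in doc_sdrs.items()]
    return sorted(scored, reverse=True)[:top_k]
```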
Tuning
- Cherry-picking phrases which are used as contextual clippings.
- Can be outsourced to a domain expert to pick representative samples.
- The process will then learn the meaning of words as used in that dialect, e.g. law or medicine.
Matching terms
- Still requires map/reducing over the entire list of known options and keeping the one with the highest overlap.
- Finding the *next* matching term: clear from the search term's SDR all bits which exist in the current match, then run the same scan again (see the sketch below).
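A sketch of both steps, where `term_sdrs` is an assumed dictionary mapping each known term to its SDR (a set of hot-bit indices).

```python
def best_match(query_sdr, term_sdrs):
    """Scan every known term and keep the one whose SDR overlaps the query the most."""
    return max(term_sdrs, key=lambda term: len(query_sdr & term_sdrs[term]))


def next_match(query_sdr, term_sdrs, current):
    """Clear the bits already explained by the current match, then rescan."""
    remaining = query_sdr - term_sdrs[current]
    others = {t: sdr for t, sdr in term_sdrs.items() if t != current}
    return best_match(remaining, others)
```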
Sentences
- Sentence fingerprinting is done by OR-ing together the hot bits of each word in the sentence, then cutting away bits which fall below a threshold that maintains a set level of sparsity (as with Text SDRs above; see the example below).
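In terms of the `text_sdr()` sketch above, a sentence fingerprint is just that function applied to the sentence's tokens; the sparsity budget does the cutting away.

```python
# Reusing the text_sdr() sketch above; word_sdrs comes from the fold() sketch.
sentence = "the court upheld the earlier ruling".split()
fingerprint = text_sdr(sentence, word_sdrs)  # hot-bit indices kept by the sparsity cutoff
```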
Other notes
- "Semantic Folding Theory."
- Is based on interpreting human language as SDRs for HTM Theory.
- Content Addressable Memory: a query is formed by a set of bits which itself forms the address of the result.
- Reiterates HTM basics; sparse bit fields representing memory patterns.
- The brain works more as a system which stores and predicts than as one which computes.
- Reiteration of HTM sparsity: no matter how large the SDR, you only need to store the indices of its hot bits.
- Prediction SDRs: a union of the SDRs belonging to the patterns currently being seen. These are passed through pooling layers to pick the ambiguities apart.
- Generalization SDRs: an intersection of SDRs. These can be used for matching multiple concepts in the input (see the union/intersection sketch at the end of these notes).
- Encoding words as SDRs is learning the semantic form of the word.
- Special Case Experiences: every word sequence which is seen in a short period of time is stored as a special case.
- When the same Word SDR is encountered in another sequence, it is intersected with the currently stored one.
Does this mean feeding words into sequence memory for each training sentence, and then defining a word by cutting out bits which appear in different contexts? E.g. sequence memory learns a whole sentence with next-word contexts, but you merge each sentence by keeping only the bits active in all of the word's contexts?
- Words become an SDR where one bit = one context the word appeared in.
- This can itself be used to encode words but will not have semantic understanding across contexts.
- Semantic folding / second mapping step: placing contexts in a 2D grid and optimizing so similar contexts are nearer to one another than dissimilar contexts.
- Words are encoded as a hot bit for each context they are contained in but now the contexts are spatially spread out as 2D coordinates.
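Going back to the Prediction-SDR / Generalization-SDR notes above: with SDRs stored as sets of hot-bit indices, the two reduce to a union and an intersection over index sets. A minimal sketch (the HTM pooling that later picks the prediction union apart is not shown).

```python
def prediction_sdr(sdrs):
    """Union of the candidate patterns: keeps every bit of every pattern being considered."""
    return set().union(*sdrs)


def generalization_sdr(sdrs):
    """Intersection of the patterns: keeps only the bits all patterns share."""
    return set.intersection(*sdrs)
```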