A scalable sequence encoding for collaborative editing

Created on 2021-01-30T22:55:57-06:00

Return to the Index

This card pertains to a resource available on the internet.

This card can also be read via Gemini.

Describes the LSeq CDRT algorithm as used in the CRATE editor.

Garbage collection by consensus: the community votes/syncs the deleted set, when items are deleted by all members the tombstones may be cleared.

Variable size identifiers: when what identifies a particular change in the sequence has a name that varies, for example top levels are "1 2" while "1a" and "1a1" are created to fit specifically between other identifiers.

Overview

to insert
    p, q <- convert to path(previous, next)
    new path <- alloc path(p, q)
    new dis <- alloc dis(p, new path, q)
    broadcast insertion(new path, a, new dis)

When receiving changes from the wire perform a peer's insert patch before thedelete patch.

Tombstones are not mentioned but disambiguators are. Do these serve the same purpose? I do not know how a deletion can be propagated without tombstoning to some degree.

"Disambiguators," "paths" and so forth are about finding the smallest way to represent a particular coordinate.

For example:

Three nodes added are numbered 1, 2 and 3.

A node is to be inserted between 1 and 2. This new node is to be 1a.

A node is to be inserted between 1a and 2. Valid paths would be 1a1 or 1b. The shorter path is 1b and so that should be used.

Tree can become unbalanced if editing is focused primarily around a small number of sections.

Competing algorithms

Tombstone CRDTs (WOOT accumulate tombstones and so grow indefinitely over time.

Variable-length tools (Logoot, TreeDoc) identifiers grow and require a consensus cleanup step where the trees are compacted.

LSeq has a slower infinite growth of identifier names without cleanup than Logoot and TreeDoc.

LogootSplit is "no worse" at identifier growht than LSeq, TreeDoc, Logoot, but sometimes is better.