Plaintext & a Metadata Problem

Markdown (and refinements like Djot) are a nice way to get prose out and formatted into annoying markups like HTML. You have to do this so that web browsers make it look pretty. Blog engines increasingly like to turn a bag of Markdown into a website. And there’s some note-taking program named after a shiny volcano rock that has especially popularized “huge bag of Markdown files” as a notebook.

What these tools lack, however, is persistence. Say I later change a heading in an article, or rearrange a footnote. The conversion from “simple thing a human can edit with a pencil” to “something a multi-billion-dollar behemoth will turn into words” keeps none of that history in mind. It will happily take the first footnote in the file and make it footnote #1, and the old footnote #1 becomes footnote #2. Every link anyone has ever made into the document can thus become hilariously incorrect, pointing at the wrong things.
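To make the renumbering concrete, here is a minimal Python sketch. It assumes Markdown-style [^label] footnote references and a converter that numbers footnotes by order of first appearance, which is a simplification of what real converters do; number_footnotes is a hypothetical name.

```python
import re

def number_footnotes(text: str) -> dict:
    """Assign display numbers to footnote labels in order of first appearance.

    This mimics (in simplified form) how a Markdown-to-HTML converter turns
    author-chosen labels into the numbers a reader actually sees.
    """
    numbers = {}
    for label in re.findall(r"\[\^([^\]]+)\]", text):
        if label not in numbers:
            numbers[label] = len(numbers) + 1
    return numbers

before = "Volcano rocks[^obsidian] are shiny. Engelbart[^nls] built NLS."
after = "Engelbart[^nls] built NLS. Volcano rocks[^obsidian] are shiny."

print(number_footnotes(before))  # {'obsidian': 1, 'nls': 2}
print(number_footnotes(after))   # {'nls': 1, 'obsidian': 2}
```

The labels the author typed survive the reordering; the numbers a reader sees, and links to, do not.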

An old project called NLS/Augment¹ asked and answered this question. It used a combination of “structure identifiers” and “serial identifiers” to address content.

A structure identifier is based on navigating the structure of the document as it currently is. These look like a series of numbers that mirror the outline of the document. So heading 3 might be “3” and the sixth paragraph underneath heading 3 might be “3.6.” That is a bit wordy, so they decided to alternate between digits and letters of the Latin alphabet at each level. Your structural identifier would actually become “3f.” Since digits and letters are separate character sets, it’s completely unambiguous when you have changed depths, and it removes the need for a period.
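A minimal Python sketch of this encoding follows. It assumes outline positions are 1-based, and it assumes that a level with more than 26 siblings rolls over to “aa,” “ab,” and so on, as spreadsheet columns do; I don’t know how Augment itself handled that case. The function names are hypothetical, and the parser does not validate that levels strictly alternate.

```python
import re

def letters(n: int) -> str:
    """Encode a 1-based position as letters: 1 -> 'a', 26 -> 'z', 27 -> 'aa' (assumed rollover)."""
    out = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        out = chr(ord("a") + rem) + out
    return out

def structure_id(path: list) -> str:
    """Turn an outline path like [3, 6, 2] into '3f2'.

    Depths 0, 2, 4, ... use digits and depths 1, 3, 5, ... use letters, so the
    switch of character set marks each change of level without a period.
    """
    parts = []
    for depth, index in enumerate(path):
        if index < 1:
            raise ValueError("outline positions are 1-based")
        parts.append(str(index) if depth % 2 == 0 else letters(index))
    return "".join(parts)

def parse_structure_id(sid: str) -> list:
    """Invert structure_id: each run of digits or of letters is one outline level."""
    path = []
    for run in re.findall(r"[0-9]+|[a-z]+", sid):
        if run.isdigit():
            path.append(int(run))
        else:
            n = 0
            for ch in run:
                n = n * 26 + (ord(ch) - ord("a") + 1)
            path.append(n)
    return path

print(structure_id([3, 6]))          # 3f
print(structure_id([3, 6, 12, 1]))   # 3f12a
print(parse_structure_id("3f12a"))   # [3, 6, 12, 1]
```

Note that the identifier is only a description of where something sits right now: insert a paragraph above “3f” and the old “3f” becomes “3g.”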

Serial identifiers, by contrast, are numbers drawn from a monotonically increasing counter². Every time a new paragraph is created it gets a new code that is unique within that document. If a paragraph is moved, the code moves with it. Thus you can link to the serial-numbered³ element of a document and always arrive at precisely the citation desired.
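Here is a minimal Python sketch of serial identifiers, assuming a document is just an ordered list of paragraphs. The Document and Paragraph classes and their methods are hypothetical, not NLS/Augment’s actual data model; the point is that a link resolves by serial, not by position.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Paragraph:
    serial: int  # assigned once at creation, never changed
    text: str

@dataclass
class Document:
    paragraphs: list = field(default_factory=list)
    next_serial: int = 1  # the counter only ever goes up; serials are never reused

    def add(self, text: str, position: Optional[int] = None) -> int:
        para = Paragraph(self.next_serial, text)
        self.next_serial += 1
        if position is None:
            self.paragraphs.append(para)
        else:
            self.paragraphs.insert(position, para)
        return para.serial

    def index_of(self, serial: int) -> int:
        for i, para in enumerate(self.paragraphs):
            if para.serial == serial:
                return i
        raise KeyError(serial)

    def move(self, serial: int, new_position: int) -> None:
        # The Paragraph object, serial included, travels to its new position.
        para = self.paragraphs.pop(self.index_of(serial))
        self.paragraphs.insert(new_position, para)

    def delete(self, serial: int) -> None:
        del self.paragraphs[self.index_of(serial)]

    def resolve(self, serial: int) -> str:
        """Follow a link by serial, wherever the paragraph currently sits."""
        return self.paragraphs[self.index_of(serial)].text

doc = Document()
intro = doc.add("Introduction")
cite = doc.add("The paragraph someone linked to")
doc.add("Conclusion")
doc.move(cite, 2)         # reorder: the cited paragraph is now last
doc.delete(intro)
new = doc.add("A new paragraph")
print(doc.resolve(cite))  # The paragraph someone linked to
print(new)                # 4
```

Deleting serial 1 does not free it up for reuse; the next paragraph gets 4, so a stale link to 1 fails loudly instead of landing on something else.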

NLS/Augment also made one final adjustment: “journals” were permanent records. Once a document left the draft stage it was published to a “journal,” where it could never be modified again. You could publish an amended version, but old links would still go to the original version of the article. This made published documents a stable target for citation.

NLS/Augment, however, was the entire software suite for the working group. You didn’t “send e-mails.” You made a document and dropped a copy in someone else’s workspace. You also wrote your documents in the same software. All document creation and transfer went through this one piece of software, so you could hyperlink everything.

Some attempts were made to bring this idea over to the web under the name “purple numbers.” The idea was that each paragraph is annotated with a number in small print that hyperlinks back to that exact paragraph. Since web browsers already support pointing at a document and at a specific anchor within it (the #fragment at the end of a URL), this doesn’t require any new machinery. But it never really took off.
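As a rough illustration, here is a Python sketch that emits purple-number-style HTML from paragraphs that already carry stable IDs (serials, say). The p-&lt;id&gt; anchor scheme, the “purple” CSS class, and the function name are my own assumptions, not part of any purple-numbers convention.

```python
from html import escape

def purple_numbers(paragraphs: list) -> str:
    """Render (stable_id, text) pairs as HTML paragraphs with self-links.

    Each <p> gets an id, so "page.html#p-<id>" scrolls straight to it, and it
    ends with a small link to itself that a reader can copy.
    """
    html_parts = []
    for pid, text in paragraphs:
        anchor = "p-" + escape(pid)
        html_parts.append(
            f'<p id="{anchor}">{escape(text)} '
            f'<a class="purple" href="#{anchor}"><small>{escape(pid)}</small></a></p>'
        )
    return "\n".join(html_parts)

print(purple_numbers([("7", "First paragraph."), ("12", "Second paragraph.")]))
```

Because the anchor comes from the stored ID rather than from the paragraph’s position, reordering the page leaves old links intact; generate the numbers from position and you are back to the footnote problem.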

The problem here is that everything is built on the idea of going from a plain text file -> filter -> output which will never be touched again. And if someone does touch it, then it’s just broken and we all collectively shrug. There is a legitimate gap for plain-text formats that preserve metadata for things like “this reference has been given this identifier,” or perhaps plain-text outlines which are also able to hold on to that metadata.

I have no idea what this format would actually look like. I presume it would be something akin to little hashtags or some customizable marginalia to mark whenever a text object has an ID. Then there would be some kind of separator that says “all the metadata goes here now,” and maybe some way for the metadata to be colocated near specific paragraphs. You would then want a suite of tools to normalize and correct these IDs to keep everything consistent. People like to edit stuff without correcting it elsewhere, so tools could say “ah, this footnote was actually orphaned, would you like to delete it also?” It’s also possible to generate a JSON document of such changes to feed to other programs.
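Since the format doesn’t exist, the following is a purely hypothetical Python sketch. It invents a [^id] marker for references in the prose, a “--- metadata ---” separator, and “[^id]: text” definition lines, then reports orphaned and dangling IDs as JSON, the sort of check a normalizing tool might run.

```python
import json
import re

# Everything below is hypothetical syntax, invented for illustration.
SEPARATOR = "--- metadata ---"                 # the "metadata goes here now" barrier
REF = re.compile(r"\[\^([\w-]+)\]")            # inline marker in the prose, e.g. [^rock]
DEF = re.compile(r"^\[\^([\w-]+)\]:\s*(.*)$")  # definition line in the metadata block

def check(document: str) -> dict:
    """Compare IDs used in the prose with IDs defined in the metadata block."""
    body, _, meta = document.partition(SEPARATOR)
    referenced = set(REF.findall(body))
    defined = {}
    for line in meta.splitlines():
        match = DEF.match(line.strip())
        if match:
            defined[match.group(1)] = match.group(2)
    return {
        # defined but never referenced: candidates for "delete it also?"
        "orphaned": sorted(set(defined) - referenced),
        # referenced but never defined: a broken link in the prose
        "dangling": sorted(referenced - set(defined)),
    }

doc = """Obsidian[^rock] is shiny. Engelbart built NLS[^nls] and Augment[^aug].

--- metadata ---
[^rock]: A volcanic glass.
[^nls]: The oN-Line System.
[^old]: A note whose reference was edited out.
"""

print(json.dumps(check(doc)))  # {"orphaned": ["old"], "dangling": ["aug"]}
```

A real tool would also want to catch duplicate IDs and offer fixes rather than just report, but the JSON output is already something other programs could consume.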

I think you would be running into the issue that KDL and NestedText are already “documents,” and we’re really just describing a document where the “nodes” are slapped on after the fact instead of following a rigid structure. We’d be processing things kind of like Djot (where certain text forms block objects which are identified first) or FunnelWeb (where text is woven with a specific control character), and we would have to specifically design around the notion that the document is going to be edited in Notepad. Still, it’s a gap, and I’m not aware of anyone having solved it.

Douglas Engelbart designed a hypertext outliner as a kind of original internet system. This was called NLS/Augment, and there are old videos of it from the early days of color video. ↩︎
It always goes up by one when incremented. It never goes down. ↩︎
Serials were popular at the time, but I would probably use something else these days: Snowflakes, NanoIDs, CUIDs, and so on, if only because sequential numbers bother my OCD. ↩︎