Compacted XML
Created on 2024-08-30T06:51:42-05:00
- String table holds every literal value used in the XML.
- Simple BER-like encoding carries the actual tag data (tag name, attributes, length of binary blob).
I am a salad.
String table:
0. foo
1. bar
2. baz
3. blorp
4. I am a salad.

Structure:
tag (bytes xxx) (name 0)
attribute (name 1) (value 2)
tag (bytes 0) (name 3)
content (value 4)
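The string table and structure above can be sketched in Python. This is only an illustration under several assumptions: the record type codes (TAG, ATTR, CONTENT), the LEB128-style `varint` standing in for the BER-like length encoding, and the exact field order are all made up here, and the input document is a guess that happens to produce the same string table. Each tag's byte count covers everything nested inside it.

```python
import xml.etree.ElementTree as ET

TAG, ATTR, CONTENT = 0, 1, 2  # hypothetical record type codes


def varint(n: int) -> bytes:
    """LEB128-style unsigned integer; an assumed stand-in for BER lengths."""
    out = bytearray()
    while True:
        low = n & 0x7F
        n >>= 7
        if n:
            out.append(low | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low)
            return bytes(out)


class StringTable:
    """Assigns each distinct literal an index in first-seen order."""

    def __init__(self) -> None:
        self.strings = []
        self._index = {}

    def intern(self, s: str) -> int:
        if s not in self._index:
            self._index[s] = len(self.strings)
            self.strings.append(s)
        return self._index[s]


def encode(elem: ET.Element, table: StringTable) -> bytes:
    """Emit: TAG, byte length of body, name index, then the body records."""
    name = table.intern(elem.tag)
    body = bytearray()
    for key, value in elem.attrib.items():
        body += bytes([ATTR]) + varint(table.intern(key)) + varint(table.intern(value))
    if elem.text:
        body += bytes([CONTENT]) + varint(table.intern(elem.text))
    for child in elem:
        body += encode(child, table)
        if child.tail:  # text that follows a child element
            body += bytes([CONTENT]) + varint(table.intern(child.tail))
    return bytes([TAG]) + varint(len(body)) + varint(name) + bytes(body)


table = StringTable()
doc = ET.fromstring('<foo bar="baz"><blorp>I am a salad.</blorp></foo>')
payload = encode(doc, table)
print(table.strings)
print(payload.hex(" "))
```

Note that `encode` builds each body in memory before it can write the length, which is exactly the buffering cost discussed next.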
Reporting the byte size of a container means the writer must either predict what the container will hold or buffer its output until the length is known. Some small clients do not like doing this.
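A rough sketch of the "predict" option, as opposed to buffering: compute each container's encoded size in a first pass, then stream the bytes out in a second pass. The tree shape (nested Python lists with `bytes` leaves), the `varint` helper, and the function names are all assumptions made for this illustration, not part of any defined format.

```python
import io


def varint(n: int) -> bytes:
    """LEB128-style unsigned integer (assumed length encoding)."""
    out = bytearray()
    while True:
        low = n & 0x7F
        n >>= 7
        if n:
            out.append(low | 0x80)
        else:
            out.append(low)
            return bytes(out)


def size_of(node) -> int:
    """Predict the encoded size of a node without producing any output."""
    if isinstance(node, bytes):
        body = len(node)
    else:
        body = sum(size_of(child) for child in node)
    return len(varint(body)) + body


def write(node, out) -> None:
    """Stream a node to `out`; lengths come from size_of, so nothing is buffered."""
    if isinstance(node, bytes):
        out.write(varint(len(node)))
        out.write(node)
        return
    out.write(varint(sum(size_of(child) for child in node)))
    for child in node:
        write(child, out)


tree = [b"ab", [b"c", b""]]
sink = io.BytesIO()  # stands in for a socket or file
write(tree, sink)
encoded = sink.getvalue()
```

The trade is memory for time: this version recomputes sizes at every nesting level, so a real writer would probably cache them, and it only works when the writer already holds the whole tree.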
A similar idea could be used to store "compacted RDF" turtles.
Considerations
- FSST is a simple compressor for text strings which replaces 1-8 byte substrings with one-byte codes from a symbol table. That could be used if compressing the string table is desired.
- Though just slapping a compressor on a single payload is probably fine too.
- CRIME attacks happen when the whole stream shares one compression dictionary, so secrets and attacker-supplied data get compressed together; it should be fine if every payload is independently compressed.
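To illustrate the last point, here is a small sketch using Python's standard `zlib` module. It contrasts compressing each payload with a fresh context against reusing one context for the whole stream. The `SharedStream` class and the sample message are hypothetical, and this only shows how shared history changes output sizes; it is not a security analysis.

```python
import zlib


def compress_independent(payload: bytes) -> bytes:
    """Compress one payload with a fresh zlib context (no shared history)."""
    return zlib.compress(payload)


class SharedStream:
    """Counter-example: one deflate context reused across every payload."""

    def __init__(self) -> None:
        self._ctx = zlib.compressobj()

    def send(self, payload: bytes) -> bytes:
        # Z_SYNC_FLUSH emits the payload now but keeps the history window,
        # so later payloads can back-reference earlier ones.
        return self._ctx.compress(payload) + self._ctx.flush(zlib.Z_SYNC_FLUSH)


message = b"session=hypothetical-secret; path=/index.html"

# Independent: output depends only on the payload itself.
first = compress_independent(message)
second = compress_independent(message)

# Shared: the second copy shrinks because it matches earlier stream contents.
stream = SharedStream()
shared_first = stream.send(message)
shared_second = stream.send(message)

print(len(first), len(second), len(shared_first), len(shared_second))
```

With the shared context, a payload's compressed size leaks whether its contents appeared earlier in the stream. Independent compression removes that cross-payload signal, though a single payload that mixes a secret with attacker-chosen data would presumably still need care.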