RoLZ - The Reduced Offset LZ Data Compression Algorithm

Created on 2024-08-30T06:00:57-05:00

LZ77 works by replacing a stream with control codes with offsets and lengths to a previous instance of the same data in the stream. So if a word is used twice the second use of the word becomes a code to walk back some bytes and re-insert the payload.

LZ77 windows can include text which has just been emitted; as in being able to copy data we just wrote.

The current input context is hashed, a table is consulted with this hash to find a list of collision nodes.

Walk the nodes from most recent to least recent. Record how many symbols matched at each node against the current position and the lookahead buffer. Keep the match with the longest amount of symbols.

Insert the current position in front of the collision node list. Write the length of the largest match here and the index to the input where it was found.

RoLZ does not write the offset+size tuple. It only writes the length of a match for the current context window. To decode you consult the hash table and look for the most recent entry with the same length matching tag.