The VCDIFF Generic Differencing and Compression Data Format

Created on 2023-06-13T06:37:50-05:00

This card pertains to a resource available on the internet.

   1. Glossary

     1a. Target File

       1a1. The file you want to have.

     1b. Source File

       1b1. The file you have on hand already.

     1c. Deltas

       1c1. The changes you need to make to turn the source file into the
            target file.

   2. Stated goals

     2a. Output compactness

       2a1. Provides a compact basic encoding format for patches.

       2a2. Applications can add additional layers to get better compression if
            needed.

     2b. Data portability

       2b1. Machine byte order and word size issues are worked around.

       2b2. Base unit of measure is the 8-bit byte.

     2c. Algorithm genericity

       2c1. VCDIFF specifies only the format for expressing patch data; it
            deliberately leaves undefined how an encoder arrives at those
            changes.

     2d. Decoding efficiency

       2d1. Uses only byte-aligned operations to avoid the need for bit
            operations.

   3. Integer encoding

     3a. Variable length; each chunk is an 8-bit byte. The most significant bit
         of each byte signals whether another byte must be read to complete the
         integer. The value is stored in the low 7 bits of each byte, with the
         most significant 7-bit group first.
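
     The integer encoding above can be sketched in a few lines of Python. This
     is a minimal illustration, not a reference implementation: the function
     names `encode_varint` and `decode_varint` are made up for this card, and
     the sketch assumes the most-significant-group-first ordering used by
     RFC 3284.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as base-128 digits, most significant first."""
    if value < 0:
        raise ValueError("only non-negative integers can be encoded")
    # The final byte holds the low 7 bits with the continuation bit clear.
    out = [value & 0x7F]
    value >>= 7
    while value:
        # Every earlier byte carries the continuation bit (0x80).
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    return bytes(reversed(out))


def decode_varint(data: bytes, pos: int = 0) -> tuple:
    """Decode one integer starting at `pos`; return (value, next position)."""
    value = 0
    while True:
        byte = data[pos]
        pos += 1
        # Shift in the 7 payload bits of this byte.
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:  # continuation bit clear: this was the last byte
            return value, pos
```

     For example, 123456789 encodes to the four bytes BA EF 9A 15, and any
     value below 128 fits in a single byte.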

   4. Windows

     4a. There is a "source" window and a "target" window.

     4b. These windows are put together in a "superstring" called U.

       4b1. The superstring is the equivalent of concatenating all bytes of the
            source and target window together.

     4c. The target window is initially empty when reconstructing a file and is
         appended to as delta instructions are followed.
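
     One way to picture the superstring U is as a single address space laid
     over both windows. The sketch below is only an illustration; the helper
     name `read_from_u` is hypothetical and does not come from the format
     itself.

```python
def read_from_u(source: bytes, target: bytearray, address: int) -> int:
    """Return the byte at `address` in U, the source window followed by the target."""
    if address < len(source):
        # Addresses below len(source) fall inside the source window.
        return source[address]
    # Anything past that falls inside the target decoded so far.
    return target[address - len(source)]
```

     With a 4-byte source window, address 3 is the last source byte and
     address 4 is the first byte of the target window.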

   5. Instructions

     5a. Instructions apply within the context of a Window.

     5b. Instructions may reference addresses beyond the end of the source
         window. Such addresses refer to data already emitted to the target
         window, so an instruction can reuse only bytes that have already been
         added or copied.

     5c. ADD

       5c1. Holds the number of bytes to be added, and the payload to be
            injected directly.

     5d. COPY

       5d1. Holds the number of bytes to copy and the address to copy from.
            The address may point in to the source window or in to the part of
            the target window that has already been written.

     5e. RUN

       5e1. As in, "run length encoding."

       5e2. Holds a count and a byte. The byte is repeated `count` number of
            times.
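
     The three instructions can be sketched as a small interpreter. This is a
     rough illustration under stated assumptions: instructions are modelled as
     plain Python tuples (not the real wire encoding), and the name
     `apply_instructions` is hypothetical. COPY is done one byte at a time so
     that a copy may run in to bytes it has only just written.

```python
def apply_instructions(source: bytes, instructions: list) -> bytes:
    """Rebuild a target window from a source window and a list of instructions.

    Assumed tuple shapes (illustrative only, not the on-disk format):
      ("ADD", payload_bytes)
      ("COPY", size, address)   # address is an index in to U = source + target
      ("RUN", count, byte_value)
    """
    target = bytearray()
    for inst in instructions:
        op = inst[0]
        if op == "ADD":
            _, payload = inst
            target += payload
        elif op == "RUN":
            _, count, byte = inst
            target += bytes([byte]) * count
        elif op == "COPY":
            _, size, address = inst
            if address >= len(source) + len(target):
                raise ValueError("COPY address points at data not yet written")
            for i in range(size):
                pos = address + i
                if pos < len(source):
                    target.append(source[pos])
                else:
                    # Beyond the source window: read from the target so far.
                    target.append(target[pos - len(source)])
        else:
            raise ValueError(f"unknown instruction: {op!r}")
    return bytes(target)


# The example from RFC 3284: source "abcdefghijklmnop".
delta = [
    ("COPY", 4, 0),        # "abcd" from the source window
    ("ADD", b"wxyz"),      # literal payload
    ("COPY", 4, 4),        # "efgh" from the source window
    ("COPY", 12, 24),      # overlapping copy out of the target window
    ("RUN", 4, ord("z")),  # "zzzz"
]
result = apply_instructions(b"abcdefghijklmnop", delta)
```

     Here `result` is b"abcdwxyzefghefghefghefghzzzz". The fourth instruction
     starts at address 24, which is 8 bytes in to the target window, and keeps
     re-reading the "efgh" it is in the middle of writing.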

   6. File layout

     6a. There are exact byte specifications for how instructions should be
         encoded into the file. I am not providing those here.

     6b. Header

     6c. Windows

       6c1. Specifies the size and offset of the segment of the source file
            that it uses.

       6c2. Contains the instructions to run to perform the transformation.