Golomb coding
Created on 2023-06-20T01:48:06-05:00
Golomb codes are equivalent to Huffman tables but do not require transmitting the table. They also only work when there is a known distribution of symbols which is the geometric distribution.
Only works on zero or above integers. Negative values have to be shimmied in some other way (like ZigZag encoding.)
"M" is the master tunable that determines where an integer is split.
Rice codes are Golomb codes where M is a power of two.
When M is a power of two, multiply and divide ops can use cheap bit shift operators.
Golomb codes
Calculating M
Golomb codes assume symbols have a geometric distribution. The initial symbol for this is the value zero, which you have to calculate the probability of encountering in some particular coded area.
Given the probability of encountering zero (assign to P) you can estimate an M:
M = ceil(log(2-P) / log(1-P))
This may be a non-power of two. In such case you have Golomb coding. If you were to force M to be a power of two, you would have Rice coding. Rice code can use cheaper bit shifting operations for multiplication and are often close to as good.
Encoding
Determine some value for M.
Calculate q: floor(N/M)
Calculate r: = mod(N, M)
Write Q in unary encoding: q-length number of 1 bits, finished with a zero.
Write R in truncated binary encoding.
Q may also be encoded with zero bits which terminate with a 1.
Truncated Binary Encoding
Assign B to floor(log2(M))
When R is < 2^(B+1): just write B bits from R
When R is >= 2^(B+1): write r+2^(B+1)-M in binary with B+1 bits
Decoding similarly calculates B then either recovers the bits directly in the < case, or in the >= case recovers R as r-2^(B+1)+M.