Arithmetic Coding.

Author:	Arkadi Kagan
	arkadi_kagan@hotmail.com
Document:	Entropy Compression Methods.

Arithmetic Coding.

The main idea of Arithmetic coding is to represent sequence of symbols by single number of very high precision. Lets try to define the basic algorithm for Binary Arithmetic Coding.

Encoding with Binary Arithmetic Coding.

Select range [α, β). Let initial range be [0, 1).

Any range [α, β) is a subrange of the maximal range [0, 1).

Represent α and β as binary floating point numbers. Position of the floating point is known, therefore α and β can be represented as long sequences of {0, 1}^*.

Select precision L as the number of bits to represent α or β.

If a = 0, write a as (0)_L - the sequence of zeros of length L.
If a < 1 and a is the closest number to 1 - write a as a sequence of (1)_L.

Let N to be a size of the source alphabet.

Start the encoding loop.

Slice the range [α, β) to N subranges.
- Subrange, corresponding to symbol a is [α_a, β_a).
  α_a = (β - α)∑_i=0^a-1f_i
  β_a = f_a(β - α)

Read the next input symbol a.

Compute sub-range [α_a, β_a).

Adjust sub-range to be the new range [α, β).

Check if α and β are in the state of underflow:
- The most significant bit msb of α is msb[α].
- In case of underflow msb[α] must be zero.
- And msb[β] must be one.
- The opposite case must not happen. It could be a mark of defective algorithm.
- Count the bits of α and β starting from the bit next to msb. Continue count while
  - bit_i[α] is one
  - bit_i[β] is zero
Shift founded bits without any output.
- α = (α << count) ^ 2^L-1
- β = ((β << count) | (2^count - 1)) | 2^L-1
  Here β is shifted by count bits and freed place is filled by 1`s. This is in opposite to α that is filled by 0`s.
Here '|' is a sign for binary 'or' and '^' is a sign for binary 'exclusive or' operation.
Sign '<<' is a sign of binary operation 'shift to left'. Freed least significant bits are filled with zeros by '<<' operation. The bits that are more significant then bit_L are dropped away.

Increase underflow counter by count.

Check if α and β are in the state of overflow:
- Count bits of α and β, starting from the most significant while bit i of α is equal to bit i of β. Sign 'count' the number of counted bits.
  If count is not zero do the follow:
  
  Output msb[α].
  
  Output !msb[α] underflow times (if underflow is zero - output nothing).
  Here '!' mean bit value opposite to the given.
  This decision is coming from the follow thought:
  
  If msb[α] (and msb[β]) is zero, this mean that chosen range is closer to α then to β.
  
  The original value of α is a sequence of underflow 1`s. Therefore the output must be a sequence of 1`s.
  
  If msb[α] is one, the computation is symmetric.
  
  Set underflow counter to zero.
  
  Output count-1 equal bits starting from the most significant but not including msb - it was already put out.
  
  Shift all this bits away from α and β.
  New α and β are computed like this:
  
  α = α << count
  or
  α = α * 2^count
  
  β = (β << count) | (2^count - 1)
  
  Here '&' is a binary 'and' operation.
Continue the main loop until the last input symbol.
If after quiting the main loop, the underflow counter is not zero:
- Output msb[β].
- Output underflow bits of !msb[β].
Output the reminder of β until
- The last '1' is printed out
- There are no more non-zero bits in α.
The alternative is to put out all reminded bits of β - this overhead is not too big.
We want the output floating point to be in the right range. For example, if the range is [0.001, 0.1011) and we have precision only for 2 bits, then we will get α = 0.00 and β = 0.10. Here we see β is in given range, while 0.00 < 0.001 and therefore α is out of range.

Decoding of Binary Arithmetic Code.

Decoding of Arithmetic code is generally symmetric to encoding. However there can be some minor differences.

Given L - precision used for encoding, lets choose precision for decoding:

If we choose precision less then L, it will be possible to decode wrong symbols.
If decoding precision will be higher or equal to L, decoding will be right allways.

In the rest of decoding I will not differ between L and decoding precision.

Read first L coded bits to variable a.

Define range [α, β), initially [0, 1).

Start of the decoding loop.

Slice range [α, β) exactly like it was done for encoding.
Find sub-range corresponding to the binary number a. Sign symbol corresponding to this number with x.
Compute subrange exactly like it was done for encoding:
α_x` = (β - α)∑_i=0^x-1f_i
β_x` = f_x(β - α)
Adjust α and β to the corresponding α` and β`.
Check and shift underflow bits for α and β.
Shift a by underflow bits:
a = ((a << underflow) ^ 2^L-1) | (msb[a] << (L-1)).
This computation is exactly the same that was done for α.
Read the next underflow coded bits to a.
Check for overflow.
Count how many bits in α are equal to bits in β on the same position. Count starting from msb and stop on the first unmatched bit.
Shift overflow from α, β and a. Operation is the same like described for underflow:
- α = α << overflow
- β = (β << overflow) | (2^overflow - 1)
- a = a << overflow
Read the next overflow bits to a.
Continue the decoding loop.

Notice that in decoding process we do not have to keep the underflow counter. During the encoding we did not know what bits must be put out until the next overflow. In the decoding

We have this bits in a.
We do not have any use for this information.

The n-ry Arithmetic Coding.

The described coding is Binary because the output of the coding is high-precision floating point number in binary representation. The Binary Arithmetic Coding is not the only possible Arithmetic Coding. Lets try to construct n-ry Arithmetic Coder.

The algorithm of constructing n-ry Coder is quit similar to the Binary Coder including care for underflow and overflow.
We decide that we are in the underflow position if:

α_s₀ = β_s₀ - 1.
β_s₁ = 0.
α_s₁ = n - 1.

Here I signed γ_{s_i} as a digit i of number γ, starting from the most significant digit γ_s₀.
All shifts, used in the Binary Coder are now shifts of digits instead of bits. When shifting β, it must be filled with digits n-1 instead of 1`s. In the rest of this document I will refer to the Binary Arithmetic Coder.

Precision of Arithmetic Coder.

In the definition of Arithmetic Coding algorithm I was using symbol L to sign precision of values α and β. That mean L is amount of bits used to represent α and β. If precision is too low, we can end-up in the middle of encoding with α equal to β and even this interval can fit for more then one symbol. For this reason precision L must be big enough that for any symbol s, f_s can be coded by L-3 bits of binary floating point number or simple f_s ≥ ¹/_2^L-3.

Why do I insist on probability to be encoded with L-3 non-zero bits:

The maximal value of α without overflow is .0(1)_L-1 or (^{2^L-1-1}/_2^L)
The minimal value of β without overflow is .1(0)_L-1 or (^{2^L-1}/_2^L)
With given α, minimal β` that is not causing underflow is .11(0)_L-2 or (^{2^L-1+2^L-2}/_2^L)
β`-α = (^{2^L-1+2^L-2}/_2^L) - (^{2^L-1-1}/_2^L) = (^{2^L-2-1}/_2^L) = .00(1)_L-2
With given β, maximal α` that is not causing underflow is .00(1)_L-2 or (^{2^L-2-1}/_2^L)
β-α` = (^{2^L-1}/_2^L) - (^{2^L-2-1}/_2^L) = (^{2*2^L-2-2^L-2+1}/_2^L) = (^{(2-1)*2^L-2+1}/_2^L) = (^{2^L-2+1}/_2^L) = .01(0)_L-31
β`-α < β-α`, therefore β`-α is the minimal width of ranges [α, β) that can appear during the computation of Binary Arithmetic Coding and not in state of overflow or underflow.
To distinguish symbol s from the next symbol s+1, subrange [α_s, β_s) must be big enough. Particularly, β_s-α_s ≥ ¹/_2^L.
β_s-α_s = (β`-α)*f_s
(β`-α)*f_s ≥ ¹/_2^L
(^{2^L-2-1}/_2^L)*f_s ≥ ¹/_2^L
f_s ≥ (¹/_2^L-2-1)
(¹/_2^L-3) > (¹/_2^L-2-1), therefore if f_s ≥ (¹/_2^L-3) then f_s > (¹/_2^L-2-1)
¹/_2^L-3 = .(0)_L-41
This mean f_s must be big enough to be represented with not more then L-3 non-zero bits.

Gather Statistics.

The next open question is how to gather and pass statistics data.

Static coding.

By given input sequence of symbols, compute probabilities of occurrence for each symbol. By this statistics, Arithmetic Coder is able to encode the input sequence, hopefully with fewer bits than original input sequence could take.
As mentioned previously, sometimes precision L is too low for one or more input symbols, say symbol s. We have several ways to handle this situation and here are some of them:
- Instead of computing real symbol probabilities we can count symbol occurrences in the input sequence.
  f_s = ^count[s]/_{<source
  length>}.
  The limitation on probabilities remain the same,
  f_s ≥ ¹/_2^L-3, or count[s] ≥ ^{<source
  length>}/_2^L-3.
  To assure this property we can create pseudo-symbols that will approach limitation. The algorithm must loop over the symbols until each occurred symbol probability can be represented with L-3 bits.
  Notice that if representing all symbols is impossible, this algorithm will not do miracles - it will loop infinitially.
  This representation push to do probability computations on-the-place in the main part of Arithmetic Cooding algorithm too. In particular, splitting range in subranges could be rewritten so:
  β_a = (β - α)f_a = (β - α)(^count[a]/_{<source
  length>})
  α_a = (β - α)∑_i=0^a-1f_i = (β - α)∑_i=0^a-1(^count[i]/_{<source
  length>})
  β - α = (β - α)(^{2^L}/_2^L)
  (^{2^Lα_a}/_2^L) = ^{(β -
  α)2^L∑_i=0^a-1(^count[i]/_{<source
  length>})}/_2^L
  2^Lα_a = (β - α)2^L∑_i=0^a-1(^count[i]/_{<source
  length>})
  2^Lβ_a = (β - α)2^L(^count[a]/_{<source
  length>})
  This way we do not need real floating point operations. All operations can be integer operations that are usually faster.
Adaptive coding.
Adaptive scheme is intended to avoid the input stream preprocessing during the encoding. Adaptive scheme gain us with two main benefits:
1. Decompression can be started before compression is finished.
2. No need to pass statistics data together with compressed data stream.
The main disadvantages are:
1. Compression ratio is generally lower then in static coding. Although for short input sequences benefit of not sending statistics data can be more significant.
Adaptive scheme for Binary Arithmetic Coding can be achieved by two main methods:
- In the start of encoding process, assume all possible symbols to have equal probability of occurrence. After encoding each new symbol, update this symbol probability. Except for this small difference the rest of the algorithm is exactly the same as the static Arithmetic Coding algorithm. This is including the check if each possible symbol probability can be represented by L-3 bits.
- Set initial probability of occurrence for each symbol to zero. Define special symbol with non-zero probability - Escape symbol.
  This algorithm can be implemented like this:
  - Initialize all internal structures as defined by Arithmetic Coding and set probabilities table as defined above.
  - Read the next input symbol s from the input stream.
  - Check probability of symbol s.
    - If probability of symbol s, f_s can be represented by L-3 bits, f_s > ¹/_2^L-3, assign symbol t, t = s.
    - Otherwise assign t = Escape symbol.
  - Slice range [α, β) exactly like it was done for static Arithmetic Coding.
  - Encode symbol t with Arithmetic Coding by computing new α and β.
  - Check and process overflow and underflow conditions.
  - If t is Escape symbol, output uncoded symbol s.
  - Update probability of occurrence for symbol s.
  - If needed, update probability of Escape symbol. Ensure f_Esc ≥ ¹/_2^L-3.
  - Repeat loop until the last symbol is encoded.
  - Finish Arithmetic Coding by output of underflow bits and reminder.
Notice that Adaptive Coding in this form is not requiring long recomputations for each symbol that can not be represented by L-3 bits.

Both of this schemes will benefit from representing symbol probabilities by symbol counts and not real floating point probabilities. In this case updating symbol probability is single operation, where computing real probabilities require recomputation for each symbol.

Advanced possibilities for Arithmetic Coding.

MTF - Move To Front scheme is applicable to Arithmetic Coding.
PPM - Prediction by Partial Match is applicable too.
BWT - Burrows-Wheeler Transform can improve compression ratio.

References:

[1] Fundamental Compression Algorithms.
Compression Team (compresswww@rasip.fer.hr)
http://oldwww.rasip.fer.hr/research/compress/algorithms/fund/

[2] Data Compression.
Debra A. Lelewer and Daniel S. Hirschberg
http://www.ics.uci.edu/~dan/pubs/DataCompression.html

[3] Introduction to Information Theory and Data Compression.
Darrel Hankerson, Greg A. Harris and Peter D. Johnson,Jr.
http://www.dms.auburn.edu/compression/

[4] Basic Arithmetic Coding.
Arturo San Emeterio Campos
http://www.arturocampos.com/ac_arithmetic.html

[5] Compression Basics.
Pasi Albert Ojala
http://www.cs.tut.fi/~albert/Dev/pucrunch/packing.html