Author: Arkadi Kagan
arkadi_kagan@hotmail.com
Document: Entropy Compression Methods.

Arithmetic Coding.

The main idea of Arithmetic coding is to represent a sequence of symbols by a single number of very high precision. Let's try to define the basic algorithm for Binary Arithmetic Coding.

Encoding with Binary Arithmetic Coding.
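As a rough illustration of the interval-narrowing idea, the following sketch uses ordinary double precision floating point instead of the L-bit integer registers a real Binary Arithmetic Coder would use with renormalization. The alphabet {a, b, c}, its probabilities and the message "abac" are assumptions made only for this example:

/* Rough illustration of arithmetic encoding with floating point.
 * A real Binary Arithmetic Coder keeps alpha and beta as L-bit integers
 * and shifts out matching bits; doubles are used here only to show how
 * the interval narrows.  The alphabet, the probabilities and the message
 * are invented for this example. */
#include <stdio.h>
#include <string.h>

int main(void)
{
    const char  *alphabet = "abc";
    const double prob[3]  = { 0.5, 0.3, 0.2 };  /* assumed symbol probabilities */
    const double cum[3]   = { 0.0, 0.5, 0.8 };  /* cumulative probabilities     */
    const char  *message  = "abac";

    double low = 0.0, high = 1.0;
    for (const char *p = message; *p; p++) {
        int    s     = (int)(strchr(alphabet, *p) - alphabet);
        double range = high - low;
        high = low + range * (cum[s] + prob[s]); /* upper bound of symbol's slice */
        low  = low + range *  cum[s];            /* lower bound of symbol's slice */
        printf("after '%c': [%.10f, %.10f)\n", *p, low, high);
    }
    /* Any number inside the final interval identifies the whole message. */
    printf("code value: %.10f\n", (low + high) / 2.0);
    return 0;
}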

Decoding of Binary Arithmetic Code.


Decoding of Arithmetic code is generally symmetric to encoding. However, there can be some minor differences.
Notice that in the decoding process we do not have to keep the underflow counter. During encoding we did not know which bits had to be put out until the next overflow. In decoding, as the sketch after this list illustrates:
  1. We already have these bits in α.
  2. We do not have any use for this information.
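A rough floating point sketch of the decoding side, continuing the encoder sketch from the encoding section. The probability model, the code value 0.3175 (the value the encoder sketch prints for the message "abac") and the known message length are assumptions made only for this example:

/* Decoding counterpart of the floating point encoder sketch.
 * At each step we find the symbol whose slice of [0,1) contains the
 * code value, output it and rescale; no underflow counter is needed.
 * The model, the code value and the message length are assumptions. */
#include <stdio.h>

int main(void)
{
    const char  *alphabet = "abc";
    const double prob[3]  = { 0.5, 0.3, 0.2 };  /* same assumed model as before */
    const double cum[3]   = { 0.0, 0.5, 0.8 };
    double value  = 0.3175;  /* code value printed by the encoder sketch */
    int    length = 4;       /* message length is assumed to be known    */

    for (int i = 0; i < length; i++) {
        int s = 2;
        while (s > 0 && value < cum[s])     /* locate the containing slice */
            s--;
        putchar(alphabet[s]);
        value = (value - cum[s]) / prob[s]; /* rescale to [0,1) and repeat */
    }
    putchar('\n');                          /* prints "abac" */
    return 0;
}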

The n-ary Arithmetic Coding.


The described coding is called Binary because the output of the coding is a high-precision floating point number in binary representation. Binary Arithmetic Coding is not the only possible Arithmetic Coding. Let's try to construct an n-ary Arithmetic Coder.

The algorithm for constructing an n-ary Coder is quite similar to the Binary Coder, including the care for underflow and overflow.
We decide that we are in the underflow position if β_0 = α_0 + 1, α_1 = n-1 and β_1 = 0, the n-ary analogue of the binary underflow condition. Here γ_i denotes digit i of the number γ, starting from the most significant digit γ_0.
All shifts used in the Binary Coder are now shifts of digits instead of bits. When shifting β, it must be filled with the digit n-1 instead of 1s. In the rest of this document I will refer to the Binary Arithmetic Coder.
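A small sketch of this digit-oriented shifting: while the most significant digits of α and β agree, the common digit is put out and both registers are shifted, α being padded with 0 and β with the digit n-1. The digit count L, the emit() routine and the sample numbers are assumptions made only for the illustration:

/* Sketch of the digit-oriented renormalization described above, with
 * alpha and beta stored as L base-n digits, most significant first.
 * While their leading digits agree, the common digit is emitted and
 * both registers are shifted by one digit; alpha is padded with 0 and
 * beta with the digit n-1.  L, emit() and the sample digits are
 * assumptions made only for this illustration. */
#include <stdio.h>

#define L 8                          /* number of base-n digits kept */

static void emit(int digit)          /* placeholder for the real output routine */
{
    printf("%d", digit);
}

static void renormalize(int alpha[L], int beta[L], int n)
{
    while (alpha[0] == beta[0]) {
        emit(alpha[0]);
        for (int i = 0; i + 1 < L; i++) {
            alpha[i] = alpha[i + 1];
            beta[i]  = beta[i + 1];
        }
        alpha[L - 1] = 0;            /* alpha is padded with the digit 0   */
        beta[L - 1]  = n - 1;        /* beta is padded with the digit n-1  */
    }
}

int main(void)
{
    /* Base n = 10: alpha = 27310000, beta = 27459999; the common prefix 27
     * is emitted and both registers are shifted by two digits. */
    int alpha[L] = { 2, 7, 3, 1, 0, 0, 0, 0 };
    int beta[L]  = { 2, 7, 4, 5, 9, 9, 9, 9 };
    renormalize(alpha, beta, 10);
    printf("\n");
    return 0;
}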

Precision of Arithmetic Coder.


In the definition of the Arithmetic Coding algorithm I was using the symbol L to denote the precision of the values α and β; that means L is the number of bits used to represent α and β. If the precision is too low, we can end up in the middle of encoding with α equal to β, while this degenerate interval may still have to represent more than one symbol. For this reason the precision L must be big enough that for any symbol s, f_s can be coded by L-3 bits of a binary floating point number, or simply f_s ≥ 1/2^(L-3).
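A small numeric illustration of this constraint, checking f_s ≥ 1/2^(L-3) with integer arithmetic over a count-based model. The value of L, the counts and the function name are assumptions made only for the example:

/* Check the constraint f_s >= 1 / 2^(L-3) for a count-based model,
 * using integer arithmetic only.  L, the counts and the function name
 * are assumptions made to illustrate the inequality from the text. */
#include <stdio.h>

#define L 16   /* number of bits used to represent alpha and beta */

static int counts_fit_precision(const unsigned counts[], int nsymbols)
{
    unsigned long total = 0;
    for (int i = 0; i < nsymbols; i++)
        total += counts[i];

    for (int i = 0; i < nsymbols; i++) {
        /* f_s = counts[i] / total, so f_s >= 1 / 2^(L-3) is equivalent
         * to counts[i] * 2^(L-3) >= total. */
        if (counts[i] == 0 || ((unsigned long)counts[i] << (L - 3)) < total)
            return 0;
    }
    return 1;
}

int main(void)
{
    unsigned counts[3] = { 60, 30, 10 };   /* example symbol counts */
    printf("the model %s the precision constraint\n",
           counts_fit_precision(counts, 3) ? "satisfies" : "violates");
    return 0;
}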

Why do I insist on the probability being encoded with L-3 non-zero bits:

Gather Statistics.


The next open question is how to gather and pass the statistics data. Both of these schemes will benefit from representing symbol probabilities by symbol counts rather than by real floating point probabilities. In this case updating a symbol's probability is a single increment of its count, whereas storing real probabilities would require a recomputation for every symbol after each update.
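A minimal sketch of such a count-based model; the names, the alphabet size and the initialization of every count to one are assumptions made only for this example:

/* Minimal count-based model: updating the statistics for a symbol is a
 * single increment, while the cumulative counts needed to slice the
 * interval are summed on demand.  All names and the initialization to
 * one are invented for this example. */
#include <stdio.h>

#define NSYMBOLS 256

static unsigned count[NSYMBOLS];
static unsigned total;

static void model_init(void)
{
    for (int i = 0; i < NSYMBOLS; i++)
        count[i] = 1;                 /* every symbol starts as "seen once" */
    total = NSYMBOLS;
}

static void model_update(int symbol)  /* one increment per coded symbol */
{
    count[symbol]++;
    total++;
}

static unsigned model_cum(int symbol) /* sum of the counts of all smaller symbols */
{
    unsigned cum = 0;
    for (int i = 0; i < symbol; i++)
        cum += count[i];
    return cum;
}

int main(void)
{
    model_init();
    const char *sample = "abracadabra";
    for (const char *p = sample; *p; p++)
        model_update((unsigned char)*p);
    printf("count('a') = %u, cum('a') = %u, total = %u\n",
           count['a'], model_cum('a'), total);
    return 0;
}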

Advanced possibilities for Arithmetic Coding.



