SourceForge.net Logo
Author:Arkadi Kagan
arkadi_kagan@hotmail.com
Document:Entropy Compression Methods.

Auxilary Compression Transforms.

The content of this page is not a set of actual compression algorithms. It is rather a set of known transforms that make the compression step more efficient.

BWT - Burrows-Wheeler Transform.

Burrows and Wheeler [1] proved that after transform called BWT the actual encoding by one of Entropy Coding algorithms can gain its compression ratio. The original authors compression was to chain algorithms this way:
BWT --> MTF --> RLE-0 --> Entropy Coder
But there are a lot of variants. I will cover only the transform itself.

Forward BWT.

Inverse BWT.


The inverse transform is not too intuitive, but Burrows and Wheeler [1] proved it to be right. Here is explanation why Inverse BWT work.


MTF - Move To Front.


The idea of algorithm is to compress instead of arbitrary symbols, indexes of symbols specially ordered. This algorithm can be used as pre-processing step for LZ, Huffman, Arithmetic Codings and others. Decoding of MTF is stay-forward, however here it is:

RLE - Run Length Encoding.


In opposite to other algorithms in this document, RLE is indeed compression. However, compression ratio of RLE is so small that it is very rarely used as real compression method. One of once popular Animator Pro file formats ".cli" is indeed using RLE as its only compression method. Other usages of RLE that I know are using RLE as preprocessing or postprocessing method. Decoding RLE is symmetric to encoding. In this description I was using Limit constant. This constant depend on the way, how symbols and "count" are encoded. For example, if symbols (including Escape) and count are encoded one byte each, Limit could be 3 symbols. The idea of choosing Limit constant is to avoid unnecessary encoding of symbol sequences that are shorter without encoding.

RLE-0 - Zero Run Length Encoding.


If we precisely know the most frequent symbol, we can improve RLE coding to encode only this symbol and leave all others uncoded. There are endless variants to implement such basic idea, however RLE-0 is used in popular
[1] implementation of BWT. y = ↓[x] here is a sign of round-down - find the maximal integer y, y ≤ x and x is real.
Decoding RLE-0 is symmetric to encoding, however, if we determined sequence of x symbols from the group {0a, 0b}, we must count what number m was encoded by this sequence.
If decoded binary number can be signed N and m = 2x + N - 1, then Here we found that source was containing m sequential 0`s that can be putted out in the decoding process.

Elias Universal Coding of integers.


Elias Universal Coding is intended for universal coding of individual integers that can be distinguished in unmarked stream of bits. This method can provide us with relatively compact representation of unlimited integers.
Notice that Elias coding is not "Prefix" coding - several numbers can start from the same sequence.
The natural assumption when choosing Elias Coding is that smaller numbers are more frequent then bigger numbers.
Elias Universal code can be of two types.

Fibonacci Universal Coding of integers.


Fibonacci Code is intended for universal coding of individual integers that can be distinguished in unmarked stream of bits. The intention of Fibonacci Code is similar to Elias Code, however the method of encoding is quit different.
Encoding Fibonacci Universal Code for given number x. Decode a sequence of bits, coded with Fibonacci code. Notice the opposite order of bits in encoding and decoding.

References:

[1] A Block-sorting Lossless Data Compression Algorithm.
M. Burrows and D.J. Wheeler.
http://gatekeeper.dec.com/pub/DEC/SRC/research-reports/abstracts/src-rr-124.html

[2] Improvements to Burrows-Wheeler Compression Algorithm.
Sebastian Deorowicz.
http://www-zo.iinf.polsl.gliwice.pl/~sdeor/pub/deo00.ps

[3] Compression Basics.
Pasi Albert Ojala.
http://www.cs.tut.fi/~albert/Dev/pucrunch/packing.html

[4] Conversion between Fibonacci Code and Integers.
Lawrence J. Krakauer.
http://ljkrakauer.com/weight.htm

[5] Data Compression with the Burrows-Wheeler Transform.
Mark Nelson.
http://dogma.net/markn/articles/bwt/bwt.htm