SourceForge.net Logo
Author:Arkadi Kagan
arkadi_kagan@hotmail.com
Document:Entropy Compression Methods.

Lempel-Ziv Compressions.

Lempel-Ziv Compression algorithms are easy divided in two main groups: LZ77 and LZ78. The bold difference between this two groups is that LZ77 do not need an explicit dictionary where LZ78 do need it.

LZ77.


The main idea of LZ77 is to find the longest match to the current part of the input stream in the already passed part of the input stream.
Lets try to build algorithm for encoding LZ77.
This is the basic algorithm for LZ77.
Notice that uncoded yet symbols can be counted as well as already coded ones. The bold example is a sequence of equal symbols. In this case the encoded stream could be like this: If for some reason it is impossible to encode the actual length of the matched sequence, it is possible to select the maximal length that is possible for encoding. The limitation for the sequence length can appear because of the way, the output of the length is done.
All that nice, however there are at least three problems that must be solved.
  1. What must be the output if while scanning backward no match was found ?
    Here are some answers:
  2. The actual output of LZ77 contain a lot of redundant data.
  3. The search for longest match require a lot of CPU time. It is possible, however, by cost of amount of memory used to speed-up the search for longest matches.

LZ78.


The main idea of LZ78 is to enhance dictionary with appearance of each unencoded symbol. Each time, if symbol found that can not be encoded, the dictionary is enhanced with new entity that is the last found entity plus one symbol.
Lets try to build algorithm for encoding by LZ78. When implementing this algorithm there are at least three problems to be solved.
  1. How to initialize the dictionary ?
    What values the initialized dictionary must contain ?
  2. How to output the dictionary value ?
    • We can decide to encode dictionary value with amount of bits corresponding to the dictionary size. The amount of bits must be big enough to represent any value that is currently present in the dictionary.
    • The dictionary values can be encoded by one of entropy coders. It can be Huffman, Arithmetic, Universal Integer Coding etc.
  3. How to store the dictionary the way to allow fast search for input sequence ?


References:

[1] Fundamental Compression Algorithms.
Compression Team (compresswww@rasip.fer.hr).
http://oldwww.rasip.fer.hr/research/compress/algorithms/fund/

[2] Data Compression.
Debra A. Lelewer and Daniel S. Hirschberg.
http://www.ics.uci.edu/~dan/pubs/DataCompression.html

[3] Lossless Compression Algorithms (Entropy Encoding).
Dave Marshall.
http://www.cs.cf.ac.uk/Dave/Multimedia/node207.html

[4] Lossless Image Compression.
Dmitry Pechyonyj - Alexander Bogomjakov.
http://www.cs.technion.ac.il/Labs/Isl/Project/Projects_done/VisionClasses/DIP/Lossless_Compression/lossless.html

[5] DEFLATE Compressed Data Format Specification version 1.3.
L. Peter Deutsch, Jean-Loup Gailly and Mark Adler.
ftp://swrinde.nde.swri.edu/pub/png/documents/zlib/rfc-deflate.html