Computer Networks, Part 8: Data Compression (English Version)

Coding and Decoding

Coding is a rule that assigns exactly one codeword to each source symbol.

binary coding
every codeword is a string over an alphabet of two symbols (usually ‘0’ and ‘1’).

uniquely decodable coding
is possible only when any two distinct source messages have distinct encodings.

block coding
uses pairwise distinct codewords that all have the same length n.
e.g., hexadecimal code, even parity code, ASCII code, etc.
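
As an illustration of block coding, here is a minimal Python sketch. The eight-letter alphabet and the helper names (make_block_code, encode, decode) are assumptions made for this example and do not come from the original notes. Because every codeword has the same length n, decoding only needs to cut the bit string into chunks of n bits.

```python
def make_block_code(alphabet):
    """Assign each symbol a distinct binary codeword of the same length n."""
    # Smallest n such that 2**n >= number of symbols (at least 1 bit)
    n = max(1, (len(alphabet) - 1).bit_length())
    table = {sym: format(i, f"0{n}b") for i, sym in enumerate(alphabet)}
    return table, n


def encode(message, table):
    """Concatenate the fixed-length codewords of all symbols."""
    return "".join(table[sym] for sym in message)


def decode(bits, table, n):
    """Split the bit string into n-bit blocks and look each one up."""
    inverse = {code: sym for sym, code in table.items()}
    return "".join(inverse[bits[i:i + n]] for i in range(0, len(bits), n))


# Hypothetical 8-symbol alphabet, so n = 3
table, n = make_block_code("ABCDEFGH")
bits = encode("CAB", table)
print(n, bits, decode(bits, table, n))
```

With this assumed alphabet each symbol costs 3 bits, so a message of k symbols always costs 3k bits regardless of how often each symbol occurs.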

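instantaneous code
no codeword is a prefix of another codeword, so each codeword can be decoded as soon as its last symbol arrives.
every instantaneous code is uniquely decodable, but not all uniquely decodable codes are instantaneous.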
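
The prefix condition can be checked mechanically. The sketch below is a small Python helper (the name is_prefix_free is an assumption for this example). It relies on the fact that, after sorting, a codeword that is a prefix of another one is immediately followed by a codeword that starts with it.

```python
def is_prefix_free(codewords):
    """Return True if no codeword is a prefix of another codeword."""
    words = sorted(codewords)
    # In sorted order, a prefix is directly followed by a word it prefixes
    return not any(b.startswith(a) for a, b in zip(words, words[1:]))


print(is_prefix_free(["0", "10", "110", "111"]))  # instantaneous code
print(is_prefix_free(["0", "01", "011"]))         # not instantaneous
```

The second code, {0, 01, 011}, is still uniquely decodable because every codeword begins with its only ‘0’, which marks the codeword boundaries; the decoder simply has to wait for the next ‘0’ before it can emit a symbol.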

Block Code


Huffman Code

  • instantaneous (prefix) code
  • optimal symbol code
    – it encodes individual source symbols into codewords of variable length
    – for the given symbol probabilities, no other uniquely decodable symbol code achieves a shorter average codeword length
  • derived from the estimated probability of occurrence of the individual source symbols

Construction of Huffman code (sketch):

  1. list all possible symbols with their probabilities, and locate the two symbols with the smallest probabilities.
  2. replace them with a single member that contains both of them and whose probability is the sum of their probabilities.
  3. repeat steps 1 and 2 until the list contains only one member. (The merges form a binary tree with the original symbols at the leaves.)
  4. to form the codeword of a symbol, follow the path from the root to that symbol's leaf, labelling one branch ‘0’ and the other ‘1’ at each node; the codeword is the sequence of labels along the path.
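
The four steps above can be sketched in Python with a priority queue. This is a minimal illustration, not the notes' own implementation: the function name huffman_code, the tie-breaking counter, and the example probabilities are all assumptions. Instead of building an explicit tree, each merge prepends ‘0’ or ‘1’ to the partial codewords of the two merged members, which has the same effect as labelling the branches.

```python
import heapq
from itertools import count


def huffman_code(probabilities):
    """Return {symbol: codeword} for a dict {symbol: probability}."""
    if len(probabilities) == 1:
        # Degenerate case: a single symbol still needs one bit
        return {sym: "0" for sym in probabilities}
    tiebreak = count()  # keeps heap entries comparable when probabilities tie
    # Heap entry: (probability, tiebreak, {symbol: partial codeword})
    heap = [(p, next(tiebreak), {sym: ""}) for sym, p in probabilities.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Steps 1-2: take the two least probable members and merge them
        p0, _, codes0 = heapq.heappop(heap)
        p1, _, codes1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes0.items()}
        merged.update({s: "1" + c for s, c in codes1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    # Step 3 ends with one member holding every symbol's full codeword
    return heap[0][2]


# Hypothetical probabilities, chosen only for illustration
model = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
codes = huffman_code(model)
print(codes)  # {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
print(sum(model[s] * len(codes[s]) for s in model))  # 1.75 bits per symbol
```

Which branch receives ‘0’ and which receives ‘1’ is arbitrary, so other label assignments give different but equally short codes.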

Arithmetic Code

  • codewords are not assigned to individual symbols (i.e., it is not a symbol code)
  • symbols are represented by intervals
  • a stream of source symbols is encoded into a single fraction between 0 and 1
  • typically slightly more efficient than Huffman code, because it is not restricted to a whole number of bits per symbol
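
The interval idea can be sketched as follows. This is a simplified Python illustration under stated assumptions: the function name narrow_interval, the three-symbol model, and the use of exact fractions are choices made for this example, and a practical coder would use finite-precision arithmetic with renormalisation instead. Each symbol owns a sub-interval of [0, 1) whose width equals its probability, and every encoded symbol shrinks the current interval to the corresponding sub-interval.

```python
from fractions import Fraction


def narrow_interval(message, probabilities):
    """Return the interval [low, high) that represents `message`."""
    # Give each symbol a sub-interval of [0, 1), in dict order
    ranges, start = {}, Fraction(0)
    for sym, p in probabilities.items():
        ranges[sym] = (start, start + p)
        start += p
    low, width = Fraction(0), Fraction(1)
    for sym in message:
        sym_low, sym_high = ranges[sym]
        low += width * sym_low            # move to the symbol's sub-interval
        width *= sym_high - sym_low       # shrink by the symbol's probability
    return low, low + width


# Hypothetical model, chosen only for illustration
model = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
print(narrow_interval("ab", model))  # (Fraction(1, 4), Fraction(3, 8))
```

Any fraction inside the final interval identifies the message, and the final width is the product of the symbol probabilities, so likely messages get wide intervals that need few bits to pin down.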

Suppose we encode the message FADDE:

  • block code of length 3: 15 bits (5 symbols × 3 bits)
  • Huffman code: 12 bits
  • arithmetic code: 12 bits
    – the message can be encoded as any number between 0.54256 and 0.54288, e.g., 0.542724609375, whose binary expansion is 0.100010101111 (12 bits after the binary point).
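
The figures quoted for the block code and the arithmetic code can be checked with a few lines of Python. The variable names are assumptions for this example; the numbers are the ones stated above. The symbol probabilities behind the interval are not given in the text, so the Huffman result is not recomputed here.

```python
message = "FADDE"

# Block code: every symbol costs 3 bits
block_bits = 3 * len(message)
print(block_bits)  # 15

# Arithmetic code: read 0.100010101111 as a binary fraction
bits = "100010101111"
value = int(bits, 2) / 2 ** len(bits)
print(len(bits), value)  # 12 0.542724609375

# The value must fall inside the interval that represents the message
print(0.54256 < value < 0.54288)  # True
```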