Review of Information Theory Chapter 2
Information Suppose that we have a source alphabet of q symbols s1, s2, …, sq, each with its probability p(si) = pi. How much information do we get when we receive one of these symbols? For example, if p1 = 1, then there is no "surprise" and no information, since you know what the message must be.
Information On the other hand, if the probabilities are all very different, then when a symbol with a low probability arrives you feel more surprised, and get more information, than when a symbol with a higher probability arrives. I(p): a function which measures the amount of information (surprise, or uncertainty) in the occurrence of an event of probability p.
Information Properties required of I(p): 1) I(p) ≥ 0 (a real, nonnegative measure). 2) I(p1 p2) = I(p1) + I(p2) for independent events. 3) I(p) is a continuous function of p. Define I(p) = log(1/p) = –log p, which satisfies all three properties.
Entropy If we get I(si) units of information when we receive the symbol si, how much do we get on the average? Since pi is the probability of getting the information I(si), on the average we get, for each symbol si, pi I(si) = pi log(1/pi).
Entropy Thus, on the average, over the whole alphabet of symbols si, we will get Σi pi logr(1/pi). Conventionally, we write Hr(S) = Σi pi logr(1/pi) and call it the entropy of the signaling system S having symbols si with probabilities pi.
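As a quick check of this definition (not part of the original slides), here is a minimal Python sketch; the helper name entropy and the use of math.log with an explicit base are our own choices:

```python
import math

def entropy(probs, r=2):
    # H_r(S) = sum over i of p_i * log_r(1/p_i); zero-probability symbols contribute nothing
    return sum(p * math.log(1.0 / p, r) for p in probs if p > 0)

# The four-symbol example used on a later slide: p = 1/2, 1/4, 1/8, 1/8
print(entropy([0.5, 0.25, 0.125, 0.125]))   # ≈ 1.75 bits
```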
Entropy Note that this is the entropy function for a distribution in which all that is considered are the probabilities pi of the symbols si. Inter-pixel (inter-symbol) correlation is not considered here; if it is to be considered, the entropy should be computed through a Markov process, as discussed later in this chapter.
Entropy Example: S = {s1, s2, s3, s4}, p1 = 1/2, p2 = 1/4, p3 = p4 = 1/8. Then H2(S) = (1/2)log2 2 + (1/4)log2 4 + (1/8)log2 8 + (1/8)log2 8 = (1/2) + (1/4)·2 + (1/8)·3 + (1/8)·3 = 1.75 bits of information. In this example, Lavg of a Huffman code can reach Hr(S); but this is not the general case.
Entropy Consider a distribution consisting of just two events. Let p be the probability of the first symbol (event). Then the entropy function is H2(P) = p log2(1/p) + (1 - p) log2[1/(1 - p)]. The maximum of H2(P) occurs at p = 1/2, where H2(P) = 1 bit = I(1/2).
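A small sketch (not in the original slides) that tabulates H2(p) and confirms the maximum at p = 1/2; the function name binary_entropy is ours:

```python
import math

def binary_entropy(p):
    # H2(p) = p*log2(1/p) + (1-p)*log2(1/(1-p)); by convention H2(0) = H2(1) = 0
    if p in (0.0, 1.0):
        return 0.0
    return p * math.log2(1.0 / p) + (1 - p) * math.log2(1.0 / (1 - p))

for p in (0.1, 0.25, 0.5, 0.75, 0.9):
    print(p, round(binary_entropy(p), 4))   # peaks at 1.0 bit when p = 0.5
```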
Entropy [Figure: plot of the binary entropy function H2(p) for 0 ≤ p ≤ 1; the curve is symmetric about p = 1/2, where it reaches its maximum of 1 bit.]
Entropy and Coding Instantaneous codes: a code is said to be instantaneous if, when a complete code word is received, the receiver immediately knows this and does not have to look further before deciding which message symbol was received. No code word of such a code is a prefix of any other code word.
Entropy and Coding Example:
instantaneous: s1 = 0, s2 = 10, s3 = 110, s4 = 111
non-instantaneous: s1 = 0, s2 = 01, s3 = 011, s4 = 111
With the instantaneous code, 101100111 decodes symbol by symbol as s2 s3 s1 s4. With the non-instantaneous code, 011111111 decodes as s3 s4 s4, but only after looking ahead. It is necessary and sufficient that an instantaneous code have no code word si which is a prefix of another code word sj.
Entropy and Coding Theorem (the Kraft inequality). A necessary and sufficient condition for the existence of an instantaneous code S of q symbols si with encoded words of lengths l1 ≤ l2 ≤ … ≤ lq is Σi r^(-li) ≤ 1, where r is the radix (number of symbols) of the alphabet of the encoded symbols.
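A minimal sketch of the Kraft test (ours, not from the slides); the helper name kraft_sum is an assumption:

```python
def kraft_sum(lengths, r=2):
    # An instantaneous r-ary code with these word lengths exists iff this sum is <= 1
    return sum(r ** (-l) for l in lengths)

print(kraft_sum([1, 2, 3, 3]))   # 1.0  -> the lengths of the code 0, 10, 110, 111 above
print(kraft_sum([1, 1, 2]))      # 1.25 -> no binary instantaneous code with these lengths
```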
Entropy and Coding Theorem. The entropy supplies a lower bound on the average code length Lavg for any instantaneously decodable system: Hr(S) ≤ Lavg.
Shannon-Fano Coding Shannon-Fano coding is less efficient than Huffman coding, but has the advantage that you can go directly from the probability pi to the code word length li.
Shannon-Fano Coding Given the source symbols s1, s2, …, sq and their corresponding probabilities p1, p2, …, pq, for each pi there is an integer li such that logr(1/pi) ≤ li < logr(1/pi) + 1. Since r^(-li) ≤ pi, summing over all symbols gives Σi r^(-li) ≤ Σi pi = 1, which is the Kraft inequality. Therefore, there is an instantaneously decodable code having these Shannon-Fano lengths.
Shannon-Fano Coding Multiplying logr(1/pi) ≤ li < logr(1/pi) + 1 by pi and summing over the distribution gives Hr(S) = Σi pi logr(1/pi) ≤ Σi pi li < Hr(S) + 1. Thus Hr(S) ≤ Lavg < Hr(S) + 1: the entropy serves as a lower bound and as part of the upper bound on the average length for Shannon-Fano coding.
Shannon-Fano Coding Example: p1 = p2 = 1/4, p3 = p4 = p5 = p6 = 1/8. The Shannon-Fano lengths are l1 = l2 = 2, l3 = l4 = l5 = l6 = 3, and we then assign s1 = 00, s2 = 01, s3 = 100, s4 = 101, s5 = 110, s6 = 111.
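A short sketch (not from the slides) of the length rule li = ⌈logr(1/pi)⌉; the function name shannon_fano_lengths and the round-off guard are our own:

```python
import math

def shannon_fano_lengths(probs, r=2):
    # l_i = ceil(log_r(1/p_i)); the small epsilon guards against floating-point round-off
    eps = 1e-12
    return [math.ceil(math.log(1.0 / p, r) - eps) for p in probs]

# Example from this slide: p = 1/4, 1/4, 1/8, 1/8, 1/8, 1/8
print(shannon_fano_lengths([0.25, 0.25, 0.125, 0.125, 0.125, 0.125]))   # [2, 2, 3, 3, 3, 3]
```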
Shannon-Fano Coding Since Shannon-Fano coding obeys the Kraft inequality, there are always enough code words available to assign an instantaneous code. Thus an orderly assignment leads readily to the decoding tree, and the prefix condition is met.
How Bad Is Shannon-Fano Coding? Consider the source alphabet s1, s2 with probabilities p1 = 1 - 2^(-K), p2 = 2^(-K) (K ≥ 2). We get log2(1/p1) ≤ log2 2 = 1, thus l1 = 1. But for l2 we have log2(1/p2) = log2 2^K = K = l2. Hence, where the Huffman encoding gives both code words one binary digit, Shannon-Fano gives s1 a 1-bit word and s2 a K-bit word.
How Bad Is Shannon-Fano Coding? Consider Lavg for Huffman encoding and Shannon-Fano encoding. Clearly, Lavg(Huffman) = 1.
How Bad Is Shannon-Fano Coding? For Shannon-Fano, we have Lavg = 1·(1 - 2^(-K)) + K·2^(-K) = 1 + (K - 1)/2^K.
K = 2: 1 + 1/4 = 1.25
K = 3: 1 + 1/4 = 1.25
K = 4: 1 + 3/16 = 1.1875
K = 5: 1 + 1/8 = 1.125
K = 6: 1 + 5/64 = 1.078125
etc.
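The table can be reproduced with a few lines of Python (ours, not from the slides):

```python
# Shannon-Fano average length for p1 = 1 - 2**-K, p2 = 2**-K; Huffman gives Lavg = 1 for every K
for K in range(2, 7):
    p2 = 2.0 ** (-K)
    lavg_sf = 1 * (1 - p2) + K * p2      # equals 1 + (K - 1) / 2**K
    print(K, lavg_sf)                    # 1.25, 1.25, 1.1875, 1.125, 1.078125
```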
Extensions of a Code If we encode not one symbol at a time, but make up a code for blocks of n source symbols at a time, then we can expect to get a better approximation to the lower bound Hr(S), because the probabilities of the extension are more variable than the original probabilities. Definition: the n-th extension of a source alphabet s1, s2, …, sq consists of the q^n symbols of the form si1 si2 … sin, having the probabilities Qi = pi1 pi2 … pin. Each block of n original symbols now becomes a single symbol ti with probability Qi. We label this alphabet S^n = T.
Extensions of a Code Example: s1: p1 = 2/3, encoded as s1 → 0; s2: p2 = 1/3, encoded as s2 → 1; Lavg = 1. The second extension takes two symbols at a time:
s1s1: p11 = 4/9, s1s1 → 1
s1s2: p12 = 2/9, s1s2 → 01
s2s1: p21 = 2/9, s2s1 → 000
s2s2: p22 = 1/9, s2s2 → 001
Lavg = 17/18 = 0.9444 per original symbol. Lavg for the third extension is 76/81 = 0.93827, and for the fourth extension 0.93827.
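These numbers can be checked with a short sketch (ours, not from the slides); huffman_avg_length computes the average code-word length as the sum of the probabilities created at each Huffman merge:

```python
import heapq
import math
from itertools import product

def huffman_avg_length(probs):
    # Average binary Huffman code length = sum of the combined weights formed at each merge
    heap = list(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        a, b = heapq.heappop(heap), heapq.heappop(heap)
        total += a + b
        heapq.heappush(heap, a + b)
    return total

p = [2/3, 1/3]
for n in (1, 2, 3, 4):
    ext = [math.prod(block) for block in product(p, repeat=n)]   # n-th extension probabilities
    print(n, huffman_avg_length(ext) / n)   # 1.0, 0.9444..., 0.93827..., 0.93827...
```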
Extensions of a Code Theorem. The entropy of the n-th extension of an alphabet is n times the entropy of the original alphabet, i.e. Hr(T) = n Hr(S).
Proof. Hr(T) = Σ_T Qi logr(1/Qi), where the sum runs over all q^n symbols of T and Qi = pi1 pi2 … pin. Since logr(1/Qi) = Σ_{k=1..n} logr(1/pik), we have Hr(T) = Σ_T pi1 pi2 … pin Σ_{k=1..n} logr(1/pik) = Σ_{k=1..n} Σ_T pi1 pi2 … pin logr(1/pik). For each fixed k, summing pi1 pi2 … pin over all positions other than the k-th gives 1, so the inner sum reduces to Σi pi logr(1/pi) = Hr(S). Thus Hr(T) = n Hr(S).
Extensions of a Code Applying the Shannon-Fano bounds to T = S^n: Hr(S^n) ≤ Lavg(T) < Hr(S^n) + 1, hence n Hr(S) ≤ Lavg(T) < n Hr(S) + 1, and Hr(S) ≤ Lavg(T)/n < Hr(S) + 1/n. ----(A) Thus, for a sufficiently large n-th extension of the code, we can bring the average code word length per source symbol as close as we please to the entropy Hr(S). Shannon's noiseless coding theorem: the n-th extension of a code satisfies inequality (A).
Extensions of a Code Since Huffman coding is optimal, we have Hr(S) ≤ Lavg,Huffman(T)/n ≤ Lavg,Shannon-Fano(T)/n < Hr(S) + 1/n. Note: a proof that the Huffman code is optimal can be found in many texts, including Fundamentals of Computer Algorithms by Horowitz and Sahni (greedy method).
Extensions of a Code Example: s1: p1 = 2/3, s2: p2 = 1/3. The entropy is H2(S) = (2/3)log2(3/2) + (1/3)log2 3 = 0.9182958… (lower bound).
n   Lavg,Huffman(T)/n   Lavg,Shannon-Fano(T)/n
1   1.00000             1.33333
2   0.94444             1.33333
3   0.93827             1.00000
4   0.93827             1.08333
5   0.92263             0.93333
Entropy of a Markov Process For the input stream of data, a Markov process is used to handle the correlation between successive symbols. p(si | si1, si2, …, sim): given that we have already seen the previous m symbols si1, si2, …, sim, this is the probability that the next symbol will be si.
Entropy of a Markov Process Given that we have already seen (are in the state) si1, si2, …, sim, define the amount of information we get when we receive a symbol si as I(si | si1, si2, …, sim) = log(1 / p(si | si1, si2, …, sim)). For example, in English the letter "q" is almost always followed by the letter "u". Thus, I("u" | "q") ≈ 0 and I("p" | "q") is very large.
Entropy of a Markov Process In computer language, a zero-memory source uses each source symbol independently of what has preceded it. An m-memory source uses the previous m symbols, and this idea corresponds exactly to an m-th order Markov process. In general, for an m-memory source there are q^m states in the Markov process.
Entropy of a Markov Process Example (first-order, three states a, b, c):
p(a|a) = 1/3   p(b|a) = 1/3   p(c|a) = 1/3
p(a|b) = 1/4   p(b|b) = 1/2   p(c|b) = 1/4
p(a|c) = 1/4   p(b|c) = 1/4   p(c|c) = 1/2
Entropy of a Markov Process What is the probability distribution of the states a, b, and c?
pa·p(a|a) + pb·p(a|b) + pc·p(a|c) = pa
pa·p(b|a) + pb·p(b|b) + pc·p(b|c) = pb
pa·p(c|a) + pb·p(c|b) + pc·p(c|c) = pc
pa + pb + pc = 1
Solving gives pa = 3/11, pb = pc = 4/11 (the equilibrium solution).
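A small numerical check (ours, not from the slides), assuming numpy is available; the matrix P below is the transition matrix of the example, with rows indexed by the current state a, b, c:

```python
import numpy as np

# Transition matrix p(next | current); rows = current state a, b, c
P = np.array([[1/3, 1/3, 1/3],
              [1/4, 1/2, 1/4],
              [1/4, 1/4, 1/2]])

# Equilibrium: pi @ P = pi together with sum(pi) = 1, solved in the least-squares sense
A = np.vstack([P.T - np.eye(3), np.ones((1, 3))])
b = np.array([0.0, 0.0, 0.0, 1.0])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)
print(pi)   # approximately [0.2727, 0.3636, 0.3636] = [3/11, 4/11, 4/11]
```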
Entropy of a Markov Process The conditional entropy of the source alphabet S, given that we have seen the sequence of symbols si1, si2, …, sim, is defined as Hr(S | si1, si2, …, sim) = Σi p(si | si1, …, sim) logr(1 / p(si | si1, …, sim)).
Entropy of a Markov Process Define the entropy of the Markov system as the sum, over all states, of the probability of being in a state times the conditional entropy of that state: Hr(S) = Σ_states p(si1, …, sim) Hr(S | si1, …, sim).
Entropy of a Markov Process Example (second-order binary source): p(0|0,0) = 0.8 = p(1|1,1), p(1|0,0) = 0.2 = p(0|1,1), p(0|0,1) = 0.5 = p(1|1,0), p(1|0,1) = 0.5 = p(0|1,0). Using the obvious symmetry, we see that p(0,0) = p(1,1) and p(0,1) = p(1,0). The equilibrium equation for the state (0,0) is p(0,0) = 0.8·p(0,0) + 0.5·p(1,0); with p(1,0) = p(0,1), this gives 0.4·p(0,0) - p(0,1) = 0.
Entropy of a Markov Process Since we must be at some place, we also have p(0,0) + p(0,1) + p(1,0) + p(1,1) = 1, or 2p(0,0) + 2p(0,1) = 1, so p(0,0) = 5/14 = p(1,1) and p(0,1) = 2/14 = p(1,0).
state, symbol   p(symbol|state)   p(state)   p(state, symbol)
0 0, 0          0.8               5/14       4/14
0 0, 1          0.2               5/14       1/14
0 1, 0          0.5               2/14       1/14
0 1, 1          0.5               2/14       1/14
1 0, 0          0.5               2/14       1/14
1 0, 1          0.5               2/14       1/14
1 1, 0          0.2               5/14       1/14
1 1, 1          0.8               5/14       4/14
Entropy of a Markov Process From the table we can now compute the entropy of the Markov process: H2 = Σ_{state, symbol} p(state, symbol) log2(1 / p(symbol | state)) = 2·(4/14)·log2(1/0.8) + 2·(1/14)·log2(1/0.2) + 4·(1/14)·log2(1/0.5) ≈ 0.801 bits per symbol.
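The same computation as a short sketch (ours, not from the slides); the dictionaries cond and state_prob simply transcribe the table above:

```python
import math

# p(symbol | state) for the second-order binary source, keyed by (state, symbol)
cond = {('00', '0'): 0.8, ('00', '1'): 0.2,
        ('01', '0'): 0.5, ('01', '1'): 0.5,
        ('10', '0'): 0.5, ('10', '1'): 0.5,
        ('11', '0'): 0.2, ('11', '1'): 0.8}
# Equilibrium probabilities of the four states
state_prob = {'00': 5/14, '01': 2/14, '10': 2/14, '11': 5/14}

H = sum(state_prob[s] * p * math.log2(1.0 / p) for (s, _), p in cond.items())
print(H)   # about 0.801 bits per symbol
```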
Entropy of a Markov Process Since the inter-sample correlations are very high for natural digital signals, we must use a Markov process to measure the entropy of such signals. The open question is the choice of the memory length: m = ?