Review of Information Theory

Presentation transcript:

Review of Information Theory (Data Compression course notes, Chapter 2: Concepts of Information Theory)

Information Suppose that we have a source alphabet of q symbols s1, s2, ..., sq, each with probability p(si) = pi. How much information do we get when we receive one of these symbols? For example, if p1 = 1, then there is no "surprise" and no information, since you already know what the message must be.

Information On the other hand, if the probabilities are very different, then when a symbol with a low probability arrives you feel more surprised, and get more information, than when a symbol with a higher probability arrives. I(p): a function that measures the amount of information (surprise, uncertainty) in the occurrence of an event of probability p.

Information Properties required of I(p):
1) I(p) ≥ 0 (a real, nonnegative measure).
2) I(p1 p2) = I(p1) + I(p2) for independent events.
3) I(p) is a continuous function of p.
These properties are satisfied by defining I(p) = log(1/p) = -log p.
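As an illustrative sketch (an editorial addition, not part of the original slides), the following Python snippet computes I(p) = -log2 p and checks the additivity property for two independent events; the function name self_information is assumed.

    import math

    def self_information(p):
        """Self-information I(p) = -log2(p), in bits, of an event with probability p."""
        if not 0 < p <= 1:
            raise ValueError("p must lie in (0, 1]")
        return -math.log2(p)

    print(self_information(1.0))                        # 0.0: a certain event carries no information
    p1, p2 = 0.5, 0.25
    print(self_information(p1 * p2))                    # 3.0 bits
    print(self_information(p1) + self_information(p2))  # 1.0 + 2.0 = 3.0 bits (additivity)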

Entropy If we get I(si) units of information when we receive the symbol si, how much do we get on average? Since pi is the probability of receiving the information I(si), on average each symbol si contributes pi I(si) = pi log(1/pi).

Entropy Thus, on average, over the whole alphabet of symbols si, we get Σi pi log(1/pi). Conventionally, we write Hr(S) = Σi pi logr(1/pi) as the entropy of the signaling system S having symbols si with probabilities pi, where r is the base of the logarithm (r = 2 gives bits).
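A minimal Python sketch (an addition, not from the slides) of this formula for the base-2 case; the name entropy_bits is assumed, and a general radix r only changes the logarithm base.

    import math

    def entropy_bits(probs):
        """H2(S) = sum_i p_i * log2(1/p_i), the entropy in bits per symbol."""
        assert abs(sum(probs) - 1.0) < 1e-9, "probabilities must sum to 1"
        return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

    print(entropy_bits([0.5, 0.5]))                  # 1.0 bit: a fair coin
    print(entropy_bits([0.5, 0.25, 0.125, 0.125]))   # 1.75 bits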

Entropy Note that this is the entropy of a distribution in which only the probabilities pi of the individual symbols si are considered. Inter-pixel (inter-symbol) correlation is not taken into account here; if it is to be considered, the entropy should be computed from a Markov process.

Entropy Example: S = {s1, s2, s3, s4}, p1 = 1/2, p2 = 1/4, p3 = p4 = 1/8. Then H2(S) = (1/2)log2 2 + (1/4)log2 4 + (1/8)log2 8 + (1/8)log2 8 = (1/2) + (1/4)(2) + (1/8)(3) + (1/8)(3) = 1 3/4 = 1.75 bits of information. In this example, the Lavg of a Huffman code reaches Hr(S) exactly, but this is not the general case.
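As a quick numerical check (my addition), the optimal Huffman codewords for this source are 0, 10, 110, 111, with lengths 1, 2, 3, 3, so the average length equals the entropy exactly:

    import math

    probs   = [1/2, 1/4, 1/8, 1/8]
    lengths = [1, 2, 3, 3]          # lengths of the Huffman codewords 0, 10, 110, 111

    H    = sum(p * math.log2(1.0 / p) for p in probs)
    Lavg = sum(p * l for p, l in zip(probs, lengths))
    print(H, Lavg)                  # 1.75 1.75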

Entropy Consider a distribution consisting of just two events, and let p be the probability of the first symbol (event). The entropy function is then H2(p) = p log2(1/p) + (1 - p) log2[1/(1 - p)]. The maximum of H2(p), equal to 1 bit, occurs at p = 1/2.

Entropy [Figure omitted from the transcript: the plot of the binary entropy function H2(p) described on the previous slide.]

Entropy and Coding Instantaneous codes: a code is said to be instantaneous if, when a complete codeword is received, the receiver immediately knows it and does not have to look further before deciding which message symbol was received. No codeword of such a code is a prefix of any other codeword.

Entropy and Coding Example:
Instantaneous code: s1 = 0, s2 = 10, s3 = 110, s4 = 111.
Non-instantaneous code: s1 = 0, s2 = 01, s3 = 011, s4 = 111.
With the instantaneous code, 101100111 decodes symbol by symbol as s2 s3 s1 s4.
With the non-instantaneous code, 011111111 decodes as s3 s4 s4, but only after looking ahead in the string.
It is necessary and sufficient for an instantaneous code that no codeword si be a prefix of another codeword sj.
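A small sketch (an addition) that tests the prefix condition for the two codes above; the helper name is_prefix_free is assumed.

    def is_prefix_free(codewords):
        """True iff no codeword is a prefix of another, i.e. the code is instantaneous."""
        for a in codewords:
            for b in codewords:
                if a != b and b.startswith(a):
                    return False
        return True

    print(is_prefix_free(["0", "10", "110", "111"]))  # True  (instantaneous)
    print(is_prefix_free(["0", "01", "011", "111"]))  # False ("0" is a prefix of "01")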

Entropy and Coding Theorem (the Kraft inequality). A necessary and sufficient condition for the existence of an instantaneous code for q symbols si with codeword lengths l1 ≤ l2 ≤ ... ≤ lq is Σi r^(-li) ≤ 1, where r is the radix (number of symbols) of the alphabet of the encoded symbols.
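A hedged Python sketch (my addition) of the Kraft sum for a set of codeword lengths; kraft_sum is an assumed name.

    def kraft_sum(lengths, r=2):
        """Sum of r^(-l) over the codeword lengths; an instantaneous code exists iff this is <= 1."""
        return sum(r ** (-l) for l in lengths)

    print(kraft_sum([1, 2, 3, 3]))   # 1.0  -> an instantaneous binary code exists
    print(kraft_sum([1, 1, 2]))      # 1.25 -> no instantaneous binary code with these lengths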

Entropy and Coding Theorem. The entropy supplies a lower bound on the average code length Lavg for any instantaneously decodable system: Hr(S) ≤ Lavg.

Shannon-Fano Coding Shannon-Fano coding is less efficient than Huffman coding, but it has the advantage that you can go directly from the probability pi to the codeword length li.

Shannon-Fano Coding Given the source symbols s1, s2, ..., sq with probabilities p1, p2, ..., pq, for each pi choose the integer li such that logr(1/pi) ≤ li < logr(1/pi) + 1. These lengths satisfy Σi r^(-li) ≤ Σi pi = 1, i.e. the Kraft inequality, so there is an instantaneously decodable code having these Shannon-Fano lengths.

Shannon-Fano Coding Multiplying logr(1/pi) ≤ li < logr(1/pi) + 1 by pi and summing over the distribution gives the entropy bounds Hr(S) = Σi pi logr(1/pi) ≤ Σi pi li < Hr(S) + 1. Thus Hr(S) ≤ Lavg < Hr(S) + 1: the entropy serves as a lower bound, and as part of the upper bound, on the average length of a Shannon-Fano code.

Shannon-Fano Coding Example: p1 = p2 = 1/4, p3 = p4 = p5 = p6 = 1/8. The Shannon-Fano lengths are l1 = l2 = 2 and l3 = l4 = l5 = l6 = 3, and we then assign s1 = 00, s2 = 01, s3 = 100, s4 = 101, s5 = 110, s6 = 111.
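A short sketch (an addition) computing the binary Shannon-Fano lengths li = ceil(log2(1/pi)) for this example; shannon_fano_lengths is an assumed helper name.

    import math

    def shannon_fano_lengths(probs):
        """Binary Shannon-Fano lengths: the smallest integer l_i with log2(1/p_i) <= l_i."""
        # the small epsilon guards against floating-point noise for dyadic probabilities
        return [math.ceil(math.log2(1.0 / p) - 1e-12) for p in probs]

    probs = [1/4, 1/4, 1/8, 1/8, 1/8, 1/8]
    print(shannon_fano_lengths(probs))   # [2, 2, 3, 3, 3, 3]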

Shannon-Fano Coding Since the Shannon-Fano lengths obey the Kraft inequality, there are always enough codewords available to build an instantaneous code. Thus an orderly assignment leads directly to a decoding tree, and the prefix condition is met.

How Bad Is Shannon-Fano Coding? Consider the source alphabet s1, s2 with probabilities p1 = 1 - 2^(-K) and p2 = 2^(-K), where K ≥ 2. We get log2(1/p1) ≤ log2 2 = 1, so l1 = 1. But for l2 we have log2(1/p2) = log2 2^K = K = l2. Hence, where the Huffman encoding gives both codewords one binary digit, Shannon-Fano gives s1 a 1-bit codeword and s2 a K-bit codeword.

How Bad Is Shannon-Fano Coding? Compare Lavg for the Huffman and Shannon-Fano encodings. Clearly, Lavg(Huffman) = 1, since each of the two symbols gets a single binary digit.

How Bad Is Shannon-Fano Coding? For Shannon-Fano we have Lavg = 1·(1 - 2^(-K)) + K·2^(-K) = 1 + (K - 1)/2^K:
K    Lavg = 1 + (K - 1)/2^K
2    1 + 1/4  = 1.25
3    1 + 1/4  = 1.25
4    1 + 3/16 = 1.1875
5    1 + 1/8  = 1.125
6    1 + 5/64 = 1.078125
etc.
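The table can be reproduced directly from the formula (an editorial sketch, assuming nothing beyond the formula above):

    for K in range(2, 7):
        p2 = 2 ** (-K)                   # p2 = 2^(-K), so p1 = 1 - p2
        lavg = 1 * (1 - p2) + K * p2     # Shannon-Fano average length
        print(K, lavg)                   # 2 1.25, 3 1.25, 4 1.1875, 5 1.125, 6 1.078125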

Extensions of a Code If we encode not one symbol at a time but blocks of source symbols at a time, we can expect a better approximation to the lower bound Hr(S), because the probabilities of the extension are more variable than the original probabilities. Definition: the n-th extension of a source alphabet s1, s2, ..., sq consists of the q^n symbols of the form si1 si2 ... sin, having probabilities Qi = pi1 pi2 ... pin. Each block of n original symbols becomes a single symbol ti with probability Qi. We label this alphabet S^n = T.

Extensions of a Code Example:
s1: p1 = 2/3, encoded as 0; s2: p2 = 1/3, encoded as 1. Lavg = 1.
The second extension takes two symbols at a time:
s1s1: p11 = 4/9, encoded as 1
s1s2: p12 = 2/9, encoded as 01
s2s1: p21 = 2/9, encoded as 000
s2s2: p22 = 1/9, encoded as 001
Lavg = 17/18 = 0.9444 bits per original symbol.
Lavg for the third extension is 76/81 = 0.93827; for the fourth extension it is again 0.93827.

Extensions of a Code Theorem. The entropy of the n-th extension of an alphabet is n times the entropy of the original alphabet, i.e. Hr(T) = n Hr(S). Proof. Hr(T) = Hr(S^n) = Σi Qi logr(1/Qi), where Qi = pi1 pi2 ... pin, so logr(1/Qi) = Σk logr(1/pik).

Extensions of a Code Substituting and summing over the whole extension, each position k contributes Σi Qi logr(1/pik) = Hr(S), because summing the remaining factors of Qi over the other positions gives 1. Thus Hr(T) = Σk Hr(S) = n Hr(S).
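For readers who want the argument in displayed form, the same derivation written out in LaTeX (an editorial addition):

    \begin{align*}
    H_r(T) &= \sum_i Q_i \log_r \frac{1}{Q_i}
            = \sum_{i_1,\dots,i_n} p_{i_1}\cdots p_{i_n}\sum_{k=1}^{n}\log_r\frac{1}{p_{i_k}} \\
    % for each k, summing p_{i_1}\cdots p_{i_n} over all positions other than k gives a factor of 1
           &= \sum_{k=1}^{n}\sum_{i_k} p_{i_k}\log_r\frac{1}{p_{i_k}}
            = n\,H_r(S).
    \end{align*}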

Extensions of a Code Applying the Shannon-Fano bound to T = S^n: Hr(S^n) ≤ Lavg(T) < Hr(S^n) + 1, so n Hr(S) ≤ Lavg(T) < n Hr(S) + 1, and therefore Hr(S) ≤ Lavg(T)/n < Hr(S) + 1/n. ---------(A) Thus, for a sufficiently large n-th extension of the code, we can bring the average codeword length per original symbol as close as we please to the entropy Hr(S). Shannon's noiseless coding theorem: the n-th extension of a code satisfies inequality (A).

Extensions of a Code Since Huffman coding is optimal, we have Hr(S) ≤ Lavg,Huffman(T)/n ≤ Lavg,Shannon-Fano(T)/n < Hr(S) + 1/n. Note: a proof that the Huffman code is optimal can be found in many texts, including Fundamentals of Computer Algorithms by Horowitz and Sahni (greedy method).

Extensions of a Code Example: s1: p1 = 2/3, s2: p2 = 1/3. The entropy is H2(S) = (2/3)log2(3/2) + (1/3)log2 3 = 0.9182958... (the lower bound).
n    Lavg(T)/n (Huffman)    Lavg(T)/n (Shannon-Fano)
1    1.00000                1.33333
2    0.94444                1.33333
3    0.93827                1.00000
4    0.93827                1.08333
5    0.92263                0.93333
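A hedged Python sketch (my addition, not from the slides) that recomputes the Huffman column of this table by enumerating the n-th extension and using the fact that the expected Huffman length equals the sum of the merge weights; huffman_avg_length is an assumed helper name.

    import heapq, itertools, math

    def huffman_avg_length(probs):
        """Average codeword length of an optimal binary Huffman code (sum of the merge weights)."""
        heap = sorted(probs)
        heapq.heapify(heap)
        total = 0.0
        while len(heap) > 1:
            a, b = heapq.heappop(heap), heapq.heappop(heap)
            total += a + b               # each merge contributes its weight to the expected length
            heapq.heappush(heap, a + b)
        return total

    p = [2/3, 1/3]
    for n in range(1, 6):
        extension = [math.prod(block) for block in itertools.product(p, repeat=n)]
        print(n, round(huffman_avg_length(extension) / n, 5))
    # prints approximately: 1 1.0, 2 0.94444, 3 0.93827, 4 0.93827, 5 0.92263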

Entropy of a Markov Process For an input stream of data, a Markov process is used to model the correlation between successive symbols. p(si | sj1, sj2, ..., sjm): given that we have already seen the previous m symbols sj1, sj2, ..., sjm, this is the probability that the next symbol will be si.

Entropy of a Markov Process Given that we have already seen (are in the state) sj1, sj2, ..., sjm, define the amount of information we get when we receive a symbol si as I(si | sj1, ..., sjm) = log(1 / p(si | sj1, ..., sjm)). For example, in English the letter "q" is almost always followed by the letter "u". Thus I("u" | "q") ≈ 0, while I("p" | "q") is very large (approaching infinity).

Entropy of a Markov Process In computer terms, a zero-memory source emits each source symbol independently of what has preceded it. An m-memory source uses the previous m symbols, and this idea corresponds exactly to an m-th order Markov process. In general, for an m-memory source over q symbols, there are q^m states in the Markov process.

Entropy of a Markov Process Example (a first-order Markov source over {a, b, c}):
p(a|a) = 1/3, p(b|a) = 1/3, p(c|a) = 1/3
p(a|b) = 1/4, p(b|b) = 1/2, p(c|b) = 1/4
p(a|c) = 1/4, p(b|c) = 1/4, p(c|c) = 1/2

Entropy of a Markov Process What is the probability distribution of the states a, b, and c?
pa·p(a|a) + pb·p(a|b) + pc·p(a|c) = pa
pa·p(b|a) + pb·p(b|b) + pc·p(b|c) = pb
pa·p(c|a) + pb·p(c|b) + pc·p(c|c) = pc
pa + pb + pc = 1
Solving gives pa = 3/11, pb = pc = 4/11 (the equilibrium solution).
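A hedged sketch (an addition) that solves these equilibrium equations numerically with NumPy; the variable names P and pi are assumed.

    import numpy as np

    # transition matrix: row = current state, column = next state, in the order (a, b, c)
    P = np.array([[1/3, 1/3, 1/3],
                  [1/4, 1/2, 1/4],
                  [1/4, 1/4, 1/2]])

    # solve pi @ P = pi together with the normalization sum(pi) = 1
    A = np.vstack([P.T - np.eye(3), np.ones(3)])
    b = np.array([0.0, 0.0, 0.0, 1.0])
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    print(pi)   # approximately [3/11, 4/11, 4/11] = [0.2727, 0.3636, 0.3636]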

Entropy of a Markov Process The conditional entropy of the source alphabet S, given that we have seen the sequence of symbols sj1, sj2, ..., sjm, is defined as H(S | sj1, ..., sjm) = Σi p(si | sj1, ..., sjm) log(1 / p(si | sj1, ..., sjm)).

Entropy of a Markov Process Define the entropy of the Markov system as the sum, over all states, of the probability of being in a state times the conditional entropy of that state: H(S) = Σstates p(state)·H(S | state).

Entropy of a Markov Process Example (a second-order binary Markov source):
p(0|0,0) = 0.8 = p(1|1,1)
p(1|0,0) = 0.2 = p(0|1,1)
p(0|0,1) = 0.5 = p(1|1,0)
p(1|0,1) = 0.5 = p(0|1,0)
Using the obvious symmetry, we see that p(0,0) = p(1,1) and p(0,1) = p(1,0). The equilibrium condition p(0,0) = 0.8 p(0,0) + 0.5 p(1,0) therefore gives 0.4 p(0,0) - p(0,1) = 0.

Entropy of a Markov Process Since we must be in some state, we also have p(0,0) + p(0,1) + p(1,0) + p(1,1) = 1, or 2p(0,0) + 2p(0,1) = 1, so p(0,0) = 5/14 = p(1,1) and p(0,1) = 2/14 = p(1,0).
state  next  p(next | state)  p(state)  joint probability
0 0    0     0.8              5/14      4/14
0 0    1     0.2              5/14      1/14
0 1    0     0.5              2/14      1/14
0 1    1     0.5              2/14      1/14
1 0    0     0.5              2/14      1/14
1 0    1     0.5              2/14      1/14
1 1    0     0.2              5/14      1/14
1 1    1     0.8              5/14      4/14

Entropy of a Markov Process From the table we can now compute the entropy of the Markov system:
H(S) = Σstates p(state)·H(S | state) = 2·(5/14)·H2(0.8) + 2·(2/14)·H2(0.5) = (10/14)(0.7219) + (4/14)(1.0) ≈ 0.801 bits per symbol.
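A small Python sketch (an addition) that reproduces this number directly from the conditional and joint probabilities in the table above:

    import math

    # one entry per table row: (p(next | state), joint probability p(state, next))
    rows = [(0.8, 4/14), (0.2, 1/14), (0.5, 1/14), (0.5, 1/14),
            (0.5, 1/14), (0.5, 1/14), (0.2, 1/14), (0.8, 4/14)]

    H = sum(joint * math.log2(1.0 / cond) for cond, joint in rows)
    print(H)   # about 0.801 bits per symbol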

Entropy of a Markov Process Since the inter-sample correlations are very high for natural digital signals, we must use a Markov process to measure the entropy of such a signal. The open question is which memory order to use: m = ?