Randomized Algorithms The Chernoff bound Speaker: Chuang-Chieh Lin Advisor: Professor Maw-Shang Chang National Chung Cheng University 2018/12/8
Computation Theory Lab, CSIE, CCU, Taiwan Outline Introduction The Chernoff bound Markov’s Inequality The Moment Generating Functions The Chernoff Bound for a Sum of Poisson Trials The Chernoff Bound for Special cases Set Balancing Problem Error-reduction for BPP References 2018/12/8 Computation Theory Lab, CSIE, CCU, Taiwan
Computation Theory Lab, CSIE, CCU, Taiwan Introduction Goal: The Chernoff bound can be used in the analysis on the tail of the distribution of the sum of independent random variables, with some extensions to the case of dependent or correlated random variables. Markov’s Inequality and Moment generating functions which we shall introduce will be greatly needed. 2018/12/8 Computation Theory Lab, CSIE, CCU, Taiwan
Math tool Professor Herman Chernoff’s bound, Annal of Mathematical Statistics 1952 2018/12/8 Computation Theory Lab, CSIE, CCU, Taiwan
Computation Theory Lab, CSIE, CCU, Taiwan Chernoff bounds I n i t s m o g e r a l f , h C ® b u d v - X w : y > P [ ¸ ] · E q ¡ + T z p . A moment generating function 一開始大家並不熟悉E[e^{tX}],慢慢就會比較了解 2018/12/8 Computation Theory Lab, CSIE, CCU, Taiwan
Computation Theory Lab, CSIE, CCU, Taiwan Markov’s Inequality F o r a n y d m v i b l e X ¸ > , P [ ] · E : W c u s M k I q t h f C j ¡ = ( ) 2 V ’ ’ 跟前面的 Chernoff bound 是不是很像? 2018/12/8 Computation Theory Lab, CSIE, CCU, Taiwan
Proof of the Chernoff bound ’ I t f o l w s d i r e c y m M a k v n q u : P [ X ¸ ] = · E So, how to calculate this term? 2018/12/8 Computation Theory Lab, CSIE, CCU, Taiwan
Moment Generating Functions X ( t ) = E [ e ] : T h i s f u n c o g a m b w r - y d ® v l ¯ 在 t=0 那一點對M_X(t) 作 i 次微分 T h e i t m o n f r . v X R e m a r k : E [ X i ] = P x 2 ¢ 2018/12/8 Computation Theory Lab, CSIE, CCU, Taiwan
Moment Generating Functions (cont’d) W e c a n s i l y w h t m o g r f u k : d M X ( ) ¯ = E [ ] P 2018/12/8 Computation Theory Lab, CSIE, CCU, Taiwan
Moment Generating Functions (cont’d) The concept of the moment generating function (mgf) is connected with a distribution rather than with a random variable. Two different random variables with the same distribution will have the same mgf. 2018/12/8 Computation Theory Lab, CSIE, CCU, Taiwan
Moment Generating Functions (cont’d) : I f M X ( ) = Y o r l 2 ¡ ; s m e > , h n d v i b u . w p + L 1 k g T P y ’ 2018/12/8 Computation Theory Lab, CSIE, CCU, Taiwan
Moment Generating Functions (cont’d) X a n d Y r e t w o i p m v b l s , h M + ( ) = : P E [ ] H u { c . 2018/12/8 Computation Theory Lab, CSIE, CCU, Taiwan
Chernoff bound for the sum of Poisson trials The distribution of a sum of independent 0-1 random variables, which may not be identical. Bernoulli trials: The same as above except that all the random variables are identical. 2018/12/8 Computation Theory Lab, CSIE, CCU, Taiwan
Chernoff bound for the sum of Poisson trials (cont’d) X i : = 1 ; n m u t a l y d e p - r o v b s w h P [ ] ¡ . L + E ¹ M ( ) ¢ · (Since 1 + y ≤ e y.) F M X ( t ) = E [ e ] 1 2 : n · p + ¡ ¹ , s i c . We will use this result later. 2018/12/8 Computation Theory Lab, CSIE, CCU, Taiwan
Chernoff bound for the sum of Poisson trials (cont’d) 1 : L t X = + ¢ n , w ; a i d p l s u c P [ ] f 2 . ( ) y > ¸ ¹ · ³ ´ ¡ 3 R 6 由前一頁的結果來得到這個定理。 (1) 不好用,所以我們去推導出(2)來用,而當我們知道X之值很大(大於等於六倍期望值)時,可以套用(3) 2018/12/8 Computation Theory Lab, CSIE, CCU, Taiwan
Computation Theory Lab, CSIE, CCU, Taiwan Proof of Theorem 1: F o r a n y d m v i - b l e X ¸ > , P [ ] · E : B y M a r k o v i n e q u l t , f > w h P [ X ¸ ( 1 + d ) ¹ ] = · E ¡ . F s T p 2 < 3 g m b c R 6 5 H ³ ´ (2)和(3)分別可由(1)以微積分的證明方式和簡單的代數運算推導而得, 所以重要的是要會推導(1)。 (2)可以先算f(d) = d - (1+d) ln (1+d) + d^2/3 的一階導數和二階導數,會發現當d=0時,一階導數為0且二階導數小於0,故f(0)為極大值。 from (1) 2018/12/8 Computation Theory Lab, CSIE, CCU, Taiwan
Computation Theory Lab, CSIE, CCU, Taiwan probability Similarly, we have: X ¹ ¡ d ¹ ¹ + d T h e o r m : L t X = P n i 1 , w ; a d p s l u c [ ] . ¹ E f < ( ) · ¡ ³ ´ 2 C y F j ¸ 3 現在我們應該要知道為什麼要算 Pr[X\leq (1\pm d)\mu] 的原因。 2018/12/8 Computation Theory Lab, CSIE, CCU, Taiwan
Computation Theory Lab, CSIE, CCU, Taiwan Example: Let X be the number of heads of n independent fair coin flips. Applying the above Corollary, we have: P r [ j X ¡ n = 2 ¸ p 6 l ] · e x ( 1 3 ) : 4 B y C h b s v i q u a t , . E V w Better!! ’ 由想要的機率結果去回推X的範圍, 或是由我們想要的X之範圍去估算偏離該範圍的機率。 由本張投影片可以知道Chernoff bound 比 Chebyshev’s inequality 來的好。(exponentially small in tail probability) 2018/12/8 Computation Theory Lab, CSIE, CCU, Taiwan
Better bounds for special cases h e o r m L t X = 1 + ¢ n , w ; : a i d p v b l s P [ ] ¡ 2 . F y > ¸ · f E ( ) S c ! u g Q B e^t 由 Maclaurin series 得到。 E[e^{tX_i}] 只剩下偶次方項的 e^t 與 e^{-t} 之和 最後,E[e^{tX}] 就是上面的式子之乘積。 所求的Pr[X\geq a]就可以利用前面所講的方法,把他變成指數函數來代Chernoff bound,就可以求出來。 2018/12/8 Computation Theory Lab, CSIE, CCU, Taiwan
Better bounds for special cases (cont’d) y L e t X = 1 + ¢ n , w h ; : i d p m v b s P [ ] ¡ 2 . F > j ¸ · Y ( ) f g c 2018/12/8 Computation Theory Lab, CSIE, CCU, Taiwan
Better bounds for special cases (cont’d) y L e t Y = 1 + ¢ n , w h ; : i d p m v b s P [ ] 2 . ¹ E ( ) F > ¸ · ¡ 3 4 僅僅是利用Y_i = X_i + 1來作變數代換而已。 所以可以當習題就好。 Note: The details can be left for exercises. (See [MU05], pp. 70-71.) 2018/12/8 Computation Theory Lab, CSIE, CCU, Taiwan
An application: Set Balancing Given an n m matrix A with entries in {0,1}, let Suppose that we are looking for a vector v with entries in {1, 1} that minimizes B @ a 1 2 : m . n C A v = c 行:物種(species) 列:特徵(features) 當||Av||_{\infty}的值愈小,表示該feature的正和負差不多,那麼這分法(即v)對於這feature就是balance了 簡單來講,就是把給定的物種分兩堆,使得這兩堆裡面的物種盡可能有各式各樣的物種,才有可能有兩個平衡的生態系。 目標: 兩群物種中,特徵的最大差距(最不平衡)之值要最小。 k A v 1 = m a x i ; : n j c 2018/12/8 Computation Theory Lab, CSIE, CCU, Taiwan
Set Balancing (cont’d) The problem arises in designing statistical experiments. Each column of matrix A represents a subject in the experiment and each row represents a feature. The vector v partitions the subjects into two disjoint groups, so that each feature is roughly as balanced as possible between the two groups. 當||Av||_{\infty}的值愈小,表示該feature的正和負差不多,那麼這分法(即v)對於這feature就是balance了 2018/12/8 Computation Theory Lab, CSIE, CCU, Taiwan
Set Balancing (cont’d) For example, A: v: 斑馬 老虎 鯨魚 企鵝 肉食性 1 陸生 哺乳類 產卵 1 1 A v : 1 2 1 希望對每個特性都能分得均勻。 當||Av||_{\infty}的值愈小,表示該feature的正和負差不多,那麼這分法(即v)對於這feature就是balance了 W e o b t a i n h k A v 1 = 2 . 2018/12/8 Computation Theory Lab, CSIE, CCU, Taiwan
Set Balancing (cont’d) For example, A: v: 斑馬 老虎 鯨魚 企鵝 肉食性 1 陸生 哺乳類 產卵 1 1 A v : 1 1 希望對每個特性都能分得均勻。 當||Av||_{\infty}的值愈小,表示該feature的正和負差不多,那麼這分法(即v)對於這feature就是balance了 W e o b t a i n h k A v 1 = . 2018/12/8 Computation Theory Lab, CSIE, CCU, Taiwan
Set Balancing (cont’d) : G v £ m r x A w h s o 1 , - d f ; ¡ u = . T F y q p P [ j ¸ 4 ] · 2 randomly chosen A v m n c = n 2018/12/8 Computation Theory Lab, CSIE, CCU, Taiwan
Proof of Set Balancing: d e t h - w A a = ( ; 1 ¢ m ) . S u p k I < 4 l , c y j v · ¸ z Z b / 2 g + ¡ B ® [ ] P r [ n S i = 1 ( j Z > p 4 m l ) ] A v m n ai 2018/12/8 Computation Theory Lab, CSIE, CCU, Taiwan
Another application: Error-reduction in BPP The class BPP (for Bounded-error Probabilistic Polynomial time) consists of all languages L that have a randomized algorithm A running in worst-case polynomial time that for any input x *, x L Pr[A(x) accepts] ¾ . x L Pr[A(x) rejects] ¾ . That is, the error probability is at most ¼. 2018/12/8 Computation Theory Lab, CSIE, CCU, Taiwan
Error-reduction in BPP (cont’d) Consider the following variant definition: The class BPP (for Bounded-error Probabilistic Polynomial time) consists of all languages L that have a randomized algorithm A running in worst-case polynomial time that for any input x * with | x | = n and some positive integer k 2, x L Pr[A(x) accepts] ½ + n k. x L Pr[A(x) rejects] ½ + n k. 2018/12/8 Computation Theory Lab, CSIE, CCU, Taiwan
Error-reduction in BPP (cont’d) The previous two definitions of BPP are equivalent. We will show that the latter one can be transferred to the former one by Chernoff bounds as follows. Let MA be an algorithm simulating algorithm A for “t” times and output the majority answer. That is, if there are more than t/2 “accepts”, MA will output “Accept”. Otherwise, MA will output “Reject”. 2018/12/8 Computation Theory Lab, CSIE, CCU, Taiwan
Error-reduction in BPP (cont’d) Let Xi , for 1 i t, be a random variable such that Xi = 1 if the ith execution of MA (running algorithm A) produces a correct answer and Xi = 0 otherwise. That is, accepts if x L and rejects if x L. L e t X = P i 1 , w h a v ¹ ¸ ( 2 + n k ) ¢ . S o t 2 · n k + ¢ ¹ X . 2018/12/8 Computation Theory Lab, CSIE, CCU, Taiwan
Error-reduction in BPP (cont’d) Recall one of the previous results of the Chernoff bound: T h e o r m : L t X = P n i 1 , w ; a d p s l u c [ ] . ¹ E f < ( ) · ¡ ³ ´ 2 2018/12/8 Computation Theory Lab, CSIE, CCU, Taiwan
Error-reduction in BPP (cont’d) W e h a v t r o p b i l y P [ X < = 2 ] · n k + ¢ ¹ µ 1 ¡ ¶ ( ) : L e t ¡ n k ( + 2 ) · 1 = 4 , w c a d r i v h l u o f s . 2018/12/8 Computation Theory Lab, CSIE, CCU, Taiwan
Error-reduction in BPP (cont’d) y t a k i n g l o r h m b s d e , w v ¡ ( + 2 ) · 1 4 S c ¢ P [ X < = ] : 2018/12/8 Computation Theory Lab, CSIE, CCU, Taiwan
Error-reduction in BPP (cont’d) Since is still polynomial, the running time of MA will be still polynomial. Hence the latter definition for BPP is equivalent to the former one! t = l n 4 ¢ ( 2 k + ) 2018/12/8 Computation Theory Lab, CSIE, CCU, Taiwan
Computation Theory Lab, CSIE, CCU, Taiwan References [MR95] Rajeev Motwani and Prabhakar Raghavan, Randomized algorithms, Cambridge University Press, 1995. [MU05] Michael Mitzenmacher and Eli Upfal, Probability and Computing - Randomized Algorithms and Probabilistic Analysis, Cambridge University Press, 2005. 蔡錫鈞教授上課投影片 Professor Valentine Kabanets’s lectures 2018/12/8 Computation Theory Lab, CSIE, CCU, Taiwan
Thank you.