1
Gaussian Process Ruohua Shi Meeting
2
Presentation goals
1. Some Reviews
2. What is a Gaussian Distribution?
3. What is a Gaussian Process?
4. How to use a Gaussian Process?
• Gaussian Process Regression
• A Sample
• Gaussian Process Classification
5. References
3
Some Reviews
Random variable: a real number assigned to every possible outcome of a random experiment, $X : \Omega \to \mathbb{R}$. There are two types of random variables, discrete and continuous.
Sample space: the set $\Omega$ of possible outcomes of a random experiment.
Probability: $P(X = A) = p$
Mean: $\mu = E[X] = \sum_{i=1}^{n} x_i p_i$
Variance: $\sigma^2 = \sum_{i=1}^{n} (x_i - \mu)^2 p_i$
Standard deviation: $\sigma = \left( \sum_{i=1}^{n} (x_i - \mu)^2 p_i \right)^{1/2}$
Distribution function: $F_X(x) = P\{\omega \in \Omega : X(\omega) < x\} = P(X < x)$
Density function: $F_X(x) = \int_{-\infty}^{x} f_X(t)\,dt$ (for continuous $X$)
Joint distributions: for random variables $X_1$ and $X_2$; in the discrete case, $f_{X_1, X_2}(x_1, x_2) = P\{X_1 = x_1, X_2 = x_2\}$
Independent random variables; conditional distributions.
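As a quick check on the mean and variance formulas above, a worked example for a fair six-sided die (an added illustration, not part of the original slide):
$$\mu = \sum_{i=1}^{6} i \cdot \tfrac{1}{6} = 3.5, \qquad \sigma^2 = \sum_{i=1}^{6} (i - 3.5)^2 \cdot \tfrac{1}{6} = \tfrac{35}{12} \approx 2.92.$$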
4
Gaussian Distribution
$\mathbf{x} \sim N_p(\mu, \Sigma)$, with mean vector $\mu = E(\mathbf{x})$ and covariance matrix $\Sigma = E[(\mathbf{x} - \mu)(\mathbf{x} - \mu)^T] = E[\mathbf{x}\mathbf{x}^T] - \mu\mu^T$, whose entries are $\Sigma_{ij} = \mathrm{Cov}[X_i, X_j]$.
The multivariate normal distribution has a very important property: if $N$ variables are jointly multivariate normal, then the joint distribution of any $n < N$ of them, taken as a vector, is still multivariate normal. Apart from the normal (and perhaps the Gamma?), hardly any other family has this property, which is why the Gaussian distribution is so popular when dealing with high-dimensional joint distributions.
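The marginalization property described above can be stated explicitly (an added note; the partitioned form below is the standard result, not from the original slide):
$$\begin{pmatrix} \mathbf{x}_1 \\ \mathbf{x}_2 \end{pmatrix} \sim N\!\left( \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} \right) \;\Longrightarrow\; \mathbf{x}_1 \sim N(\mu_1, \Sigma_{11}).$$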
5
What is a stochastic process?
(Figure: points $x_1, x_2, \ldots, x_n$ in the parameter set mapped by $f$ to states $Z_i = f(x_i)$.)
Parameter set: $X = \{x_1, \ldots, x_n\}$. State set: $Z = \{Z_1, Z_2, \ldots, Z_n\}$, where $Z_i = f(x_i)$, $i = 1, \ldots, n$.
(1) Discrete time, discrete state: Bernoulli process. The simplest and earliest-studied stochastic process is the random walk (Bernoulli process). It is a mathematical model of a particle moving randomly on the lattice $\mathbb{Z} := \{0, \pm 1, \pm 2, \ldots\}$. Let $\{Z_n : n \ge 1\}$ be a sequence of independent, identically distributed Bernoulli random variables on some probability space, with $P(Z_i = 1) = p$ and $P(Z_i = -1) = 1 - p$. (A small simulation is sketched below.)
(2) Discrete time, continuous state: white noise process, e.g. Gaussian white noise, where at each time $x_i$ the values $Z_1, \ldots, Z_n$ are independent and identically distributed $N(0, \sigma^2)$.
(3) Continuous time, discrete state: Poisson (counting) process. If $N_t$ denotes the total number of occurrences of some random event up to time $t$, the real-valued stochastic process $\{N_t, t \ge 0\}$ is called a counting process, e.g. the number of people who have entered a shop by time $t$. A counting process usually satisfies: $N_t$ is a non-negative integer and $N_0 = 0$.
(4) Continuous time, continuous state: Gaussian process. Let $Z = \{Z_t, t \in T\}$ be a real-valued stochastic process. If for every $n \ge 1$ and $x_1, x_2, \ldots, x_n \in X$ the $n$-dimensional random vector $(Z_1, Z_2, \ldots, Z_n)$ follows an $n$-dimensional normal distribution, then $Z$ is a Gaussian process; if $X$ is a continuous set, this Gaussian process is a continuous-time, continuous-state stochastic process.
随机过程及其应用 (Stochastic Processes and Their Applications), 2nd edition; 西安电子科技大学出版社 (Xidian University Press), 2012.
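As an added illustration of case (1), a minimal Python sketch of a Bernoulli random walk (the step count and seed are arbitrary choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

def bernoulli_random_walk(n_steps, p=0.5):
    """Particle on the integer lattice: each step is +1 with probability p, -1 with probability 1 - p."""
    steps = rng.choice([1, -1], size=n_steps, p=[p, 1 - p])
    return np.concatenate(([0], np.cumsum(steps)))  # positions S_0 = 0, S_1, ..., S_n

path = bernoulli_random_walk(100, p=0.5)
print(path[:10])  # first few positions of one sample path
```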
6
What is a Gaussian process?
Let $Z = \{Z_1, Z_2, \ldots, Z_n\}$ be an $n$-dimensional vector of function values evaluated at $n$ points $x_i \in X$, $i = 1, \ldots, n$, with $Z_i = f(x_i)$.
• Note that $Z$ is a random variable.
• Definition: $Z$ is a Gaussian process if, for any finite subset of $\{x_1, \ldots, x_n\}$, the marginal distribution of $Z$ over that finite subset is a multivariate Gaussian distribution.
A Gaussian process is parameterized by a mean function $\mu(x)$ and a covariance function (kernel) $K = (k_{ij})_{n \times n}$, where $k_{ij} = k(x_i, x_j)$.
Note that the definition does not require $Z_1, Z_2, \ldots$ to be identically distributed; it only requires that each is Gaussian and that their joint distribution is also Gaussian.
Gaussian Processes for Machine Learning, Carl Edward Rasmussen and Chris Williams, the MIT Press, 2006
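A minimal sketch of this definition: draw sample functions from a zero-mean GP prior with a squared-exponential kernel (the kernel choice, parameter values, and input grid are illustrative assumptions, not from the slides):

```python
import numpy as np

def rbf_kernel(xa, xb, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel k(x, x') = variance * exp(-(x - x')^2 / (2 * lengthscale^2))."""
    sqdist = (xa[:, None] - xb[None, :]) ** 2
    return variance * np.exp(-0.5 * sqdist / lengthscale**2)

x = np.linspace(0, 10, 50)            # any finite set of input points x_1, ..., x_n
K = rbf_kernel(x, x)                  # covariance matrix with entries k_ij = k(x_i, x_j)
mu = np.zeros(len(x))                 # zero mean function
# By definition, (f(x_1), ..., f(x_n)) is jointly Gaussian with mean mu and covariance K.
samples = np.random.default_rng(1).multivariate_normal(mu, K + 1e-8 * np.eye(len(x)), size=3)
print(samples.shape)                  # (3, 50): three sample functions on the 50 points
```

The small jitter added to the diagonal keeps the covariance matrix numerically positive definite.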
7
Kernels
$k(x, x') = \varphi(x)^T \varphi(x')$, where $\varphi(x)$ is a nonlinear feature-space mapping.
$k$ is a symmetric function of its arguments, so that $k(x, x') = k(x', x)$.
Pattern Recognition and Machine Learning. Bishop, Christopher, 2006
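To make the feature-map view concrete, a short sketch (an added illustration, not from the slides) checking that a second-degree polynomial kernel on scalars equals an inner product of explicit feature vectors:

```python
import numpy as np

def poly_kernel(x, xp):
    """Second-degree polynomial kernel k(x, x') = (1 + x x')^2 for scalar inputs."""
    return (1.0 + x * xp) ** 2

def phi(x):
    """Explicit feature map with k(x, x') = phi(x) . phi(x'): phi(x) = (1, sqrt(2) x, x^2)."""
    return np.array([1.0, np.sqrt(2.0) * x, x ** 2])

x, xp = 0.7, -1.3
print(poly_kernel(x, xp), phi(x) @ phi(xp))      # the two values agree
print(poly_kernel(x, xp) == poly_kernel(xp, x))  # symmetry: k(x, x') = k(x', x)
```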
8
Gaussian Process Regression
MODEL (observed vs. unobserved): let the observed targets be $y_i = f(x_i) + \varepsilon_i$, $i = 1, \ldots, N$, where $f$ has a Gaussian process prior with covariance function $k$ and $\varepsilon_i \sim N(0, \sigma^2)$ is independent noise; the function value at a test input $x_*$ is unobserved.
INFERENCE: because the observed targets and the unobserved test value are jointly Gaussian, conditioning gives a Gaussian predictive distribution with
mean $m(x_*) = \mathbf{k}_*^T (K + \sigma^2 I)^{-1} \mathbf{y}$ and variance $D(x_*) = k(x_*, x_*) - \mathbf{k}_*^T (K + \sigma^2 I)^{-1} \mathbf{k}_*$,
where $K$ is the $N \times N$ matrix with entries $k(x_i, x_j)$ and $\mathbf{k}_* = (k(x_1, x_*), \ldots, k(x_N, x_*))^T$. Here $m$ is the mean of the predicted values and $D$ is the variance; from the mean and variance we can obtain a confidence interval.
Pattern Recognition and Machine Learning. Bishop, Christopher, 2006
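A minimal numerical sketch of these predictive equations (the kernel, data, and noise level are illustrative assumptions; a zero prior mean is assumed):

```python
import numpy as np

def rbf(xa, xb, lengthscale=1.0, variance=1.0):
    return variance * np.exp(-0.5 * (xa[:, None] - xb[None, :]) ** 2 / lengthscale**2)

def gp_predict(x_train, y_train, x_test, noise_var=0.1):
    """Predictive mean m(x_*) and variance D(x_*) of GP regression with a zero prior mean."""
    K = rbf(x_train, x_train) + noise_var * np.eye(len(x_train))   # K + sigma^2 I
    K_s = rbf(x_train, x_test)                                     # columns are k_* for each test point
    K_ss = rbf(x_test, x_test)
    alpha = np.linalg.solve(K, y_train)                            # (K + sigma^2 I)^{-1} y
    mean = K_s.T @ alpha                                           # m(x_*)
    var = np.diag(K_ss - K_s.T @ np.linalg.solve(K, K_s))          # D(x_*)
    return mean, var

x_train = np.array([-4.0, -2.0, 0.0, 1.5, 3.0])
y_train = np.sin(x_train)
x_test = np.linspace(-5, 5, 7)
mean, var = gp_predict(x_train, y_train, x_test)
print(np.round(mean, 2))
print(np.round(mean - 1.96 * np.sqrt(var), 2))  # lower edge of a 95% confidence interval
```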
10
Relationship to Polynomial regression
Prior: linear part (Part 1) and non-linear part (Part 2).
GP regression maximizes the (log marginal) likelihood function, giving a maximum-likelihood point estimate of the hyperparameters $\theta$:
$\ln p(\mathbf{y} \mid \theta) = -\tfrac{1}{2} \ln |C_N| - \tfrac{1}{2} \mathbf{y}^T C_N^{-1} \mathbf{y} - \tfrac{N}{2} \ln(2\pi)$, where $C_N = K(\theta) + \sigma^2 I$.
Polynomial regression minimizes the sum-of-squares error function:
$E(\mathbf{w}) = \tfrac{1}{2} \sum_{n=1}^{N} \{y(x_n, \mathbf{w}) - t_n\}^2$.
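A small sketch of the GP side of this comparison: evaluating the log marginal likelihood above as a function of the kernel hyperparameters and taking a crude point estimate (the grid, data, and noise level are illustrative assumptions, not from the slides):

```python
import numpy as np

def rbf(xa, xb, lengthscale, variance):
    return variance * np.exp(-0.5 * (xa[:, None] - xb[None, :]) ** 2 / lengthscale**2)

def log_marginal_likelihood(theta, x, y, noise_var=0.1):
    """ln p(y | theta) = -1/2 ln|C_N| - 1/2 y^T C_N^{-1} y - N/2 ln(2 pi), with C_N = K(theta) + sigma^2 I."""
    lengthscale, variance = theta
    C = rbf(x, x, lengthscale, variance) + noise_var * np.eye(len(x))
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * logdet - 0.5 * y @ np.linalg.solve(C, y) - 0.5 * len(x) * np.log(2 * np.pi)

x = np.linspace(-3, 3, 20)
y = np.sin(x) + 0.1 * np.random.default_rng(2).standard_normal(20)
# Crude grid search for a point estimate of theta = (lengthscale, variance).
grid = [(l, v) for l in (0.5, 1.0, 2.0) for v in (0.5, 1.0, 2.0)]
best = max(grid, key=lambda th: log_marginal_likelihood(th, x, y))
print(best)
```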
11
Example: see Gaussian Processes for Regression: A Quick Introduction, M. Ebden, August 2008.
13
Gaussian Process Classification (GPC)
Consider a two-class problem with a target variable $y \in \{-1, 1\}$.
Input: $x = (x_1, \ldots, x_n)^T$. Observed values: $y = (y_1, \ldots, y_n)^T$. Test data: $x_*$. Target value: $y_*$. Goal: $P(y_* \mid y)$.
Place a Gaussian process prior over the latent function $f$, with a covariance function $k(x, x' \mid \theta)$ that may depend on hyperparameters $\theta$.
Pattern Recognition and Machine Learning. Bishop, Christopher, 2006
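A minimal sketch of GP classification in practice, using scikit-learn's GaussianProcessClassifier (an assumed dependency, not mentioned in the slides); it handles the non-Gaussian posterior over the latent f internally with a Laplace approximation and returns an approximation of P(y* | y):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(3)
X = rng.uniform(-3, 3, size=(40, 1))   # inputs x_1, ..., x_n
y = np.where(X[:, 0] > 0, 1, -1)       # two-class targets y in {-1, 1} (toy labels)

# GP prior over the latent f with covariance k(x, x' | theta); the
# hyperparameters theta (here the RBF length scale) are fitted by
# maximizing the approximate marginal likelihood.
clf = GaussianProcessClassifier(kernel=1.0 * RBF(length_scale=1.0))
clf.fit(X, y)

X_star = np.array([[-1.5], [0.2], [2.0]])   # test inputs x_*
print(clf.classes_)                          # class ordering of the probability columns
print(clf.predict_proba(X_star))             # approximate P(y_* | y)
```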
14
Advantages and Disadvantages
Advantages
1) The GP model is considered a basic framework for statistical machine learning (interpolation, fitting).
2) The GP model combines kernel machine learning with Bayesian inference and has the advantages of both types of learning method: it produces probabilistic (uncertainty) information, and different kernels can be specified.
Disadvantages
1) The hyperparameters of the GP model, such as the covariance function and the free parameters in the prior distribution, have a large impact on the learning and prediction results, but there is no clear guidance on how to choose appropriate initial values.
2) The GP model does not work well on sparse samples.
周亚同, 陈子一, 马尽文. 从高斯过程到高斯过程混合模型：研究与展望. Journal of Signal Processing, Vol. 32, No. 8, Aug 2016.
15
References
[1] 随机过程及其应用 (第2版). 西安电子科技大学出版社, 2012.
[2] Gaussian Processes for Machine Learning. Carl Edward Rasmussen and Chris Williams. The MIT Press, 2006.
[3] Pattern Recognition and Machine Learning. Bishop, Christopher. 2006.
[4] Gaussian Processes for Regression: A Quick Introduction. M. Ebden, August 2008.
[5] 周亚同, 陈子一, 马尽文. 从高斯过程到高斯过程混合模型：研究与展望. Journal of Signal Processing, Vol. 32, No. 8, Aug 2016.