Chapter 9 Validation
Prof. Dehan Luo
Intelligent Sensors System, School of Information Engineering

Section One Motivation
Section Two The Holdout method
Section Three Re-sampling techniques
Section Four Three-way data splits
Section One Motivation
(1) Validation techniques are motivated by two fundamental problems in pattern recognition: model selection and performance estimation
(2) Model selection
(a) Almost invariably, pattern recognition techniques have one or more free parameters, for example:
    The number of neighbors in a kNN classification rule
    The network size, learning parameters and weights in MLPs
(b) How do we select the “optimal” parameter(s) or model for a given classification problem?
Motivation (Cont.)
(3) Performance estimation
Once we have chosen a model, how do we estimate its performance?
Performance is typically measured by the TRUE ERROR RATE, the classifier’s error rate on the ENTIRE POPULATION
(4) If we had access to an unlimited number of examples, these questions would have a straightforward answer
Choose the model that provides the lowest error rate on the entire population; that error rate is, of course, the true error rate
Motivation (Cont.)
(5) In real applications we only have access to a finite set of examples, usually fewer than we would like
One approach is to use the entire training data both to select our classifier and to estimate the error rate. This naive approach has two fundamental problems
(a) The final model will normally overfit the training data. This problem is more pronounced with models that have a large number of parameters
(b) The error rate estimate will be overly optimistic (lower than the true error rate). In fact, it is not uncommon to have 100% correct classification on training data
(6) A much better approach is to split the training data into disjoint subsets: the holdout method
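The optimism of the naive approach in point (5) can be illustrated with a small sketch. It assumes synthetic 1-D data and a 1-nearest-neighbour classifier, neither of which comes from the slides, and all function names are hypothetical. Evaluated on its own training data, each example is its own nearest neighbour, so the classifier looks perfect; on fresh examples from the same distribution it does not.

```python
import random

def nn_predict(train, x):
    """Label of the training example closest to x (1-nearest neighbour)."""
    return min(train, key=lambda ex: abs(ex[0] - x))[1]

def error_rate(train, test):
    """Fraction of test examples misclassified by 1-NN built on train."""
    wrong = sum(1 for x, y in test if nn_predict(train, x) != y)
    return wrong / len(test)

def make_data(n_per_class, rng):
    """Assumed synthetic data: class 0 ~ N(0, 1), class 1 ~ N(1, 1)."""
    return ([(rng.gauss(0.0, 1.0), 0) for _ in range(n_per_class)] +
            [(rng.gauss(1.0, 1.0), 1) for _ in range(n_per_class)])

rng = random.Random(0)
train = make_data(50, rng)
fresh = make_data(100, rng)   # stands in for the rest of the population

train_error = error_rate(train, train)   # naive estimate: test on the training data
fresh_error = error_rate(train, fresh)   # estimate on examples never seen in training

print("error on training data:", train_error)
print("error on fresh data:   ", fresh_error)
```

The gap between the two numbers is the optimism that the holdout method is meant to remove.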
Section Two The Holdout method
1、Split the dataset into two groups
(1) Training set: used to train the classifier
(2) Test set: used to estimate the error rate of the trained classifier
2、A typical application of the holdout method is determining a stopping point for back-propagation training
The Holdout method (Cont.)
3、The holdout method has two basic drawbacks
(1) In problems where we have a sparse dataset we may not be able to afford the “luxury” of setting aside a portion of the dataset for testing
(2) Since it is a single train-and-test experiment, the holdout estimate of the error rate will be misleading if we happen to get an “unfortunate” split
The Holdout method (Cont.)
4、The limitations of the holdout method can be overcome with a family of re-sampling methods, at the expense of more computation
(1) Cross Validation
(a) Random Subsampling
(b) K-Fold Cross-Validation
(c) Leave-one-out Cross-Validation
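The holdout method of this section can be sketched as follows. This is a minimal illustration under assumptions that are not in the slides: synthetic 1-D data, a nearest-class-mean classifier, a 30% test fraction, and hypothetical function names. Repeating the single experiment with different random splits of the same data shows how much the estimate depends on the split.

```python
import random

def fit_means(train):
    """Train: compute the mean feature value of each class."""
    sums, counts = {}, {}
    for x, y in train:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def error_rate(means, test):
    """Fraction of test examples whose nearest class mean has the wrong label."""
    wrong = sum(1 for x, y in test
                if min(means, key=lambda c: abs(x - means[c])) != y)
    return wrong / len(test)

def holdout_error(data, test_fraction, rng):
    """One train-and-test experiment on a single random split."""
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = int(round(test_fraction * len(shuffled)))
    test, train = shuffled[:n_test], shuffled[n_test:]
    return error_rate(fit_means(train), test)

# Assumed synthetic data: class 0 ~ N(0, 1), class 1 ~ N(2, 1), 20 examples each.
data_rng = random.Random(0)
data = ([(data_rng.gauss(0.0, 1.0), 0) for _ in range(20)] +
        [(data_rng.gauss(2.0, 1.0), 1) for _ in range(20)])

# Same data, five different splits: each split yields its own holdout estimate.
estimates = [holdout_error(data, 0.3, random.Random(seed)) for seed in range(5)]
print(estimates)
```

With only 12 test examples per split, each estimate is a multiple of 1/12, so a single misclassified example moves the estimate by more than 8 percentage points.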
Section Three Re-sampling techniques
1、Random Subsampling
(1) Random Subsampling performs K data splits of the dataset
(2) Each split randomly selects a (fixed) number of examples without replacement to serve as test examples
(3) For each data split we retrain the classifier from scratch with the training examples and estimate the error Ei with the test examples
[Figure: the total set of examples, with the test examples marked for each split]
1、Random Subsampling (Cont.)
The true error estimate is obtained as the average of the separate estimates Ei: E = (1/K)(E1 + E2 + ... + EK)
This estimate is significantly better than the holdout estimate
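Random subsampling as described above can be sketched as follows. The synthetic 1-D data, the nearest-class-mean classifier and the function names are assumptions made for illustration and are not part of the slides. Each of the K splits draws a fixed number of test examples without replacement, the classifier is retrained from scratch, and the K error estimates are averaged.

```python
import random

def fit_means(train):
    """Train: compute the mean feature value of each class."""
    sums, counts = {}, {}
    for x, y in train:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def error_rate(means, test):
    """Fraction of test examples whose nearest class mean has the wrong label."""
    wrong = sum(1 for x, y in test
                if min(means, key=lambda c: abs(x - means[c])) != y)
    return wrong / len(test)

def random_subsampling(data, k, n_test, rng):
    """K independent random splits; returns (average error, list of Ei)."""
    errors = []
    for _ in range(k):
        # Draw n_test distinct indices (sampling without replacement).
        test_idx = set(rng.sample(range(len(data)), n_test))
        train = [ex for i, ex in enumerate(data) if i not in test_idx]
        test = [ex for i, ex in enumerate(data) if i in test_idx]
        # Retrain from scratch on this split and record its error Ei.
        errors.append(error_rate(fit_means(train), test))
    return sum(errors) / k, errors

# Assumed synthetic data: class 0 ~ N(0, 1), class 1 ~ N(2, 1), 20 examples each.
rng = random.Random(0)
data = ([(rng.gauss(0.0, 1.0), 0) for _ in range(20)] +
        [(rng.gauss(2.0, 1.0), 1) for _ in range(20)])

avg_error, split_errors = random_subsampling(data, k=10, n_test=10, rng=rng)
print("Ei:", split_errors)
print("E :", avg_error)
```

Because the splits are drawn independently, some examples may appear in several test sets and others in none.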
2、K-Fold Cross-validation
(1) Create a K-fold partition of the dataset
For each of K experiments, use K-1 folds for training and the remaining one for testing
[Figure: the total set of examples, with the test fold marked for each experiment]
2、K-Fold Cross-validation (Cont.)
(2) K-Fold Cross-validation is similar to Random Subsampling
The advantage of K-Fold Cross-validation is that all the examples in the dataset are eventually used for both training and testing
(3) As before, the true error is estimated as the average error rate
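K-fold cross-validation can be sketched as follows, again assuming synthetic 1-D data, a nearest-class-mean classifier and hypothetical function names that do not come from the slides. The dataset is partitioned once into K folds, and each fold serves as the test set in exactly one experiment.

```python
import random

def fit_means(train):
    """Train: compute the mean feature value of each class."""
    sums, counts = {}, {}
    for x, y in train:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def error_rate(means, test):
    """Fraction of test examples whose nearest class mean has the wrong label."""
    wrong = sum(1 for x, y in test
                if min(means, key=lambda c: abs(x - means[c])) != y)
    return wrong / len(test)

def k_fold_indices(n, k, rng):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def k_fold_error(data, k, rng):
    """Returns (average error over the k experiments, the folds used)."""
    folds = k_fold_indices(len(data), k, rng)
    errors = []
    for test_idx in folds:
        held_out = set(test_idx)
        # K-1 folds for training, the remaining fold for testing.
        train = [data[j] for j in range(len(data)) if j not in held_out]
        test = [data[j] for j in test_idx]
        errors.append(error_rate(fit_means(train), test))
    return sum(errors) / k, folds

# Assumed synthetic data: class 0 ~ N(0, 1), class 1 ~ N(2, 1), 20 examples each.
rng = random.Random(0)
data = ([(rng.gauss(0.0, 1.0), 0) for _ in range(20)] +
        [(rng.gauss(2.0, 1.0), 1) for _ in range(20)])

cv_error, folds = k_fold_error(data, k=10, rng=rng)
print("10-fold error estimate:", cv_error)
```

Unlike random subsampling, the folds form a partition, so every example is tested exactly once and trained on K-1 times.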
3、Leave-one-out Cross Validation
(1) Leave-one-out is the degenerate case of K-Fold Cross Validation, where K is chosen as the total number of examples
For a dataset with N examples, perform N experiments
For each experiment use N-1 examples for training and the remaining example for testing
[Figure: the total set of examples, with the single test example marked for each experiment]
3、Leave-one-out Cross Validation (Cont.)
(2) As usual, the true error is estimated as the average error rate on test examples
4、How many folds are needed?
(1) With a large number of folds
(a) The bias of the true error rate estimator will be small (on average, the estimate will be close to the true error rate)
(b) The variance of the true error rate estimator will be large
(c) The computational time will be very large as well (many experiments)
4、How many folds are needed? (Cont.)
(2) With a small number of folds
(a) The number of experiments and, therefore, the computation time are reduced
(b) The variance of the estimator will be small
(c) The bias of the estimator will be large (conservative, i.e. higher than the true error rate)
4、How many folds are needed? (Cont.)
(3) In practice, the choice of the number of folds depends on the size of the dataset
(a) For large datasets, even 3-Fold Cross Validation will be quite accurate
(b) For very sparse datasets, we may have to use leave-one-out in order to train on as many examples as possible
(4) A common choice for K-Fold Cross Validation is K=10
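The trade-off in the number of folds can be made concrete with a small sketch, assuming a synthetic dataset of N = 30 examples and a nearest-class-mean classifier (both illustrative assumptions, with hypothetical function names). It runs cross-validation with K = 3, K = 10 and K = N (leave-one-out) and reports how many experiments each needs and how many examples each experiment trains on.

```python
import random

def fit_means(train):
    """Train: compute the mean feature value of each class."""
    sums, counts = {}, {}
    for x, y in train:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def error_rate(means, test):
    """Fraction of test examples whose nearest class mean has the wrong label."""
    wrong = sum(1 for x, y in test
                if min(means, key=lambda c: abs(x - means[c])) != y)
    return wrong / len(test)

def cross_validation(data, k):
    """K-fold CV; returns (average error, smallest training-set size)."""
    n = len(data)
    errors, train_sizes = [], []
    for i in range(k):
        test_idx = set(range(i, n, k))   # fold i: every k-th example
        train = [data[j] for j in range(n) if j not in test_idx]
        test = [data[j] for j in sorted(test_idx)]
        errors.append(error_rate(fit_means(train), test))
        train_sizes.append(len(train))
    return sum(errors) / k, min(train_sizes)

# Assumed synthetic data: class 0 ~ N(0, 1), class 1 ~ N(2, 1), 15 examples each.
rng = random.Random(0)
data = ([(rng.gauss(0.0, 1.0), 0) for _ in range(15)] +
        [(rng.gauss(2.0, 1.0), 1) for _ in range(15)])
rng.shuffle(data)   # shuffle once so that folds mix both classes
n = len(data)

# K = n is leave-one-out: n experiments, each trained on n - 1 examples.
results = {k: cross_validation(data, k) for k in (3, 10, n)}
for k, (err, train_size) in results.items():
    print(f"K={k:2d}: {k:2d} experiments, {train_size} training examples each, "
          f"error estimate {err:.3f}")
```

With fewer folds each classifier is trained on a smaller share of the data, which is the source of the conservative bias noted above; with K = N almost all the data is used for training, at the cost of N experiments.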
Section Four Three-way data splits
1、Data splits
If model selection and true error estimates are to be computed simultaneously, the data needs to be divided into three disjoint sets
(1) Training set:
(a) A set of examples used for learning: to fit the parameters of the classifier
(b) In the MLP case, we would use the training set to find the “optimal” weights with the back-prop rule
1、Data splits (Cont.)
(2) Validation set:
(a) A set of examples used to tune the parameters of a classifier
(b) In the MLP case, we would use the validation set to find the “optimal” number of hidden units or determine a stopping point for the back-propagation algorithm
1、Data splits (Cont.)
(3) Test set:
(a) A set of examples used only to assess the performance of a fully-trained classifier
(b) In the MLP case, we would use the test set to estimate the error rate after we have chosen the final model (MLP size and actual weights)
(c) After assessing the final model with the test set, YOU MUST NOT further tune the model
2、Why separate test and validation sets?
(1) The error rate estimate of the final model on validation data will be biased (smaller than the true error rate) since the validation set is used to select the final model
(2) After assessing the final model with the test set, YOU MUST NOT tune the model any further
3、Procedure outline
1. Divide the available data into training, validation and test sets
2. Select architecture and training parameters
3. Train the model using the training set
4. Evaluate the model using the validation set
5. Repeat steps 2 through 4 using different architectures and training parameters
6. Select the best model and train it using data from the training and validation sets
7. Assess this final model using the test set
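The seven steps above can be sketched in code. This is an illustration only: it swaps the MLP of the slides for a kNN classifier on synthetic 1-D data, so that "architecture and training parameters" reduces to the number of neighbors k. The split sizes, the candidate values and the function names are assumptions.

```python
import random

def knn_predict(train, x, k):
    """Majority vote among the k training examples closest to x (labels 0/1)."""
    neighbours = sorted(train, key=lambda ex: abs(ex[0] - x))[:k]
    votes = sum(y for _, y in neighbours)
    return 1 if 2 * votes > len(neighbours) else 0

def error_rate(train, test, k):
    """Fraction of test examples misclassified by kNN built on train."""
    return sum(knn_predict(train, x, k) != y for x, y in test) / len(test)

# Assumed synthetic data: class 0 ~ N(0, 1), class 1 ~ N(2, 1), 60 examples each.
rng = random.Random(0)
data = ([(rng.gauss(0.0, 1.0), 0) for _ in range(60)] +
        [(rng.gauss(2.0, 1.0), 1) for _ in range(60)])
rng.shuffle(data)

# Step 1: divide the data into three disjoint sets.
train, val, test = data[:60], data[60:90], data[90:]

# Steps 2-5: for each candidate model, "train" on the training set
# (kNN simply stores it) and evaluate on the validation set.
candidates = [1, 3, 5, 7, 9]   # odd values avoid tied votes
val_errors = {k: error_rate(train, val, k) for k in candidates}

# Step 6: select the best model and retrain it on training + validation data.
best_k = min(candidates, key=lambda k: val_errors[k])
final_train = train + val

# Step 7: assess the final model once on the test set; no tuning after this.
test_error = error_rate(final_train, test, best_k)

print("validation errors:", val_errors)
print("selected k:", best_k)
print("test error of final model:", test_error)
```

The validation error of the selected k is the minimum over all candidates and is therefore an optimistic estimate; only the test error, computed once, is reported as the performance of the final model.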
4、Three-way data split picture
[Figure: three-way data split]