Advanced Artificial Intelligence


Advanced Artificial Intelligence Lecture 5: Neural Networks

Outline: Perceptron Introduction; Deep Neural Network Structure; Backpropagation

Perceptron Introduction The perceptron is inspired by the biological neuron and acts as a classifier. (Figure: a perceptron updating its linear decision boundary as more training examples are added.)

Single-layer, one-input Perceptron $a = w_1 x_1 + b$, $Y = \theta(a)$, with the step activation $\theta(x) = 1$ for $x \ge 0$ and $0$ for $x < 0$. Example parameters: $w_1 = -1.5$, $b = -0.5$.

Single-layer, multi-input Perceptron $a = w_1 x_1 + w_2 x_2 + b$, $Y = \theta(a)$ with the same step activation $\theta(x) = 1$ for $x \ge 0$, $0$ for $x < 0$. Training uses the perceptron rule with error $E = \hat{Y} - Y$ (target minus output): $w_i \leftarrow w_i + \alpha \cdot E \cdot x_i$ and $b \leftarrow b + \alpha \cdot E$. A single-layer perceptron is a linear classifier.
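A minimal sketch of this training rule in Python/NumPy; the dataset (an AND gate), the learning rate, and the number of epochs are illustrative assumptions, not taken from the slides:

    import numpy as np

    # Hypothetical training data: 2-D inputs with 0/1 targets (AND gate).
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    t = np.array([0, 0, 0, 1], dtype=float)

    def step(x):                      # theta(x): 1 if x >= 0, else 0
        return 1.0 if x >= 0 else 0.0

    w = np.zeros(2)                   # weights w_1, w_2
    b = 0.0                           # bias
    alpha = 0.1                       # learning rate

    for epoch in range(20):
        for x_i, t_i in zip(X, t):
            a = np.dot(w, x_i) + b    # a = w1*x1 + w2*x2 + b
            Y = step(a)               # Y = theta(a)
            E = t_i - Y               # E = target - output
            w += alpha * E * x_i      # w_i <- w_i + alpha * E * x_i
            b += alpha * E            # b   <- b + alpha * E

    print(w, b)                       # parameters of a separating line for AND

Because the AND data are linearly separable, the loop settles on weights that classify all four points correctly.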

Single-hidden-layer, multi-input Perceptron Multiple inputs feeding a single hidden node. It is still a linear classifier, whose decision boundary is a hyperplane.

Non-linear activation Perceptron (sigmoid) $a = w_1 x_1 + w_2 x_2 + b$, $Y = \sigma(a)$.

Non-linear activation Perceptron With the sigmoid activation function $\sigma(x) = \frac{1}{1 + e^{-x}}$.

Outline: Perceptron Introduction; Deep Neural Network Structure; Backpropagation

Deep Neural Network
- One neuron (perceptron): linear separation
- One hidden layer: realization of convex regions
- Two hidden layers: realization of non-convex regions
- Multiple hidden layers with non-linear activations: arbitrarily complex decision regions

Deep Structure A deep neural network can handle almost any classification or regression task.

Why Deep and Thin Source of the slide: http://speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2017/Lecture/Why.pdf

Deep NN Structure (input layer, hidden layer, output layer) Forward propagation, assuming the activation function is the sigmoid:
Step 1 (input -> hidden layer): $net_{h1} = w_1 i_1 + w_2 i_2 + b_1 \cdot 1$, $out_{h1} = \frac{1}{1 + e^{-net_{h1}}}$
Step 2 (hidden -> output layer): $net_{o1} = w_5 \, out_{h1} + w_6 \, out_{h2} + b_2 \cdot 1$, $out_{o1} = \frac{1}{1 + e^{-net_{o1}}}$
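A small numerical sketch of these two steps in Python; all input, weight, and bias values below are made-up illustrations rather than values from the slide's figure:

    import math

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    # Hypothetical inputs, weights, and biases.
    i1, i2 = 0.05, 0.10
    w1, w2, w3, w4, b1 = 0.15, 0.20, 0.25, 0.30, 0.35   # input -> hidden
    w5, w6, b2 = 0.40, 0.45, 0.60                        # hidden -> output

    # Step 1: input -> hidden layer
    net_h1 = w1 * i1 + w2 * i2 + b1 * 1
    net_h2 = w3 * i1 + w4 * i2 + b1 * 1
    out_h1, out_h2 = sigmoid(net_h1), sigmoid(net_h2)

    # Step 2: hidden -> output layer
    net_o1 = w5 * out_h1 + w6 * out_h2 + b2 * 1
    out_o1 = sigmoid(net_o1)

    print(out_h1, out_h2, out_o1)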

Outline: Perceptron Introduction; Deep Neural Network Structure; Backpropagation

Backward Propagation: weight update With $out(y) = \frac{1}{1 + e^{-y}}$ and $net = w_1 x_1 + w_2 x_2 + b$, each weight is updated by gradient descent: $w_1 \leftarrow w_1 - \eta \frac{\partial E}{\partial w_1}$ ($\eta$ is the learning rate).

Output layer weight update Backward propagation: Step 1 (total cost): $E_{total} = \sum_n \frac{1}{2}(target_n - output_n)^2$. Step 2 (output -> hidden layer weight update, by the chain rule): $\frac{\partial E_{total}}{\partial w_5} = \frac{\partial E_{total}}{\partial out_{o1}} \cdot \frac{\partial out_{o1}}{\partial net_{o1}} \cdot \frac{\partial net_{o1}}{\partial w_5}$, with $out(y) = \frac{1}{1 + e^{-y}}$ and $net_{o1}$ as defined in the forward pass. Then $w_5 \leftarrow w_5 - \eta \frac{\partial E_{total}}{\partial w_5}$ (similar to the single perceptron).
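A self-contained Python sketch of this three-factor product for one output weight; the numeric values (activations, target, learning rate, and the assumed old value of w5) are hypothetical:

    # Hypothetical values for one output neuron.
    out_o1, out_h1, target_o1 = 0.75, 0.59, 0.01
    w5, eta = 0.40, 0.5

    dE_dout   = -(target_o1 - out_o1)      # derivative of 1/2 * (target - out)^2
    dout_dnet = out_o1 * (1.0 - out_o1)    # sigmoid derivative at net_o1
    dnet_dw5  = out_h1                     # because net_o1 = w5*out_h1 + ...

    dE_dw5 = dE_dout * dout_dnet * dnet_dw5
    w5 = w5 - eta * dE_dw5                 # gradient-descent step
    print(dE_dw5, w5)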

Hidden layer weight update Backward propagation: Step 3 (updating the input -> hidden weights, e.g. $w_1$): $\frac{\partial E_{total}}{\partial w_1} = \frac{\partial E_{total}}{\partial out_{h1}} \cdot \frac{\partial out_{h1}}{\partial net_{h1}} \cdot \frac{\partial net_{h1}}{\partial w_1}$, where $\frac{\partial E_{total}}{\partial out_{h1}} = \frac{\partial E_{o1}}{\partial out_{h1}} + \frac{\partial E_{o2}}{\partial out_{h1}}$ and $\frac{\partial E_{o1}}{\partial out_{h1}} = \frac{\partial E_{o1}}{\partial net_{o1}} \cdot \frac{\partial net_{o1}}{\partial out_{h1}}$. Putting these together: $\frac{\partial E_{total}}{\partial w_1} = \left( \sum_o \frac{\partial E_o}{\partial net_o} \cdot \frac{\partial net_o}{\partial out_{h1}} \right) \cdot \frac{\partial out_{h1}}{\partial net_{h1}} \cdot \frac{\partial net_{h1}}{\partial w_1}$.
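A corresponding Python sketch for a hidden-layer weight, with the sum over the two output neurons written out explicitly; every intermediate quantity is a hypothetical number, not a value from the slides:

    # Hypothetical values: one hidden neuron h1 that feeds two output neurons.
    i1, out_h1 = 0.05, 0.59            # input connected to w1, hidden activation
    delta_o    = [0.14, -0.02]         # dE_o/dnet_o for o1 and o2
    w_from_h1  = [0.40, 0.50]          # weights h1 -> o1 and h1 -> o2
    w1, eta    = 0.15, 0.5

    # dE_total/dout_h1 = sum over outputs of (dE_o/dnet_o) * (dnet_o/dout_h1)
    dE_dout_h1   = sum(d * w for d, w in zip(delta_o, w_from_h1))
    dout_dnet_h1 = out_h1 * (1.0 - out_h1)   # sigmoid derivative at net_h1
    dnet_dw1     = i1                        # because net_h1 = w1*i1 + ...

    dE_dw1 = dE_dout_h1 * dout_dnet_h1 * dnet_dw1
    w1 = w1 - eta * dE_dw1                   # gradient-descent step
    print(dE_dw1, w1)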

Gradient Descent A network can have millions of parameters. Starting from an initial set of network parameters, gradient descent updates them step by step to reduce the loss; to compute the gradients efficiently, we use backpropagation. Source of the slide: http://speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2017/Lecture/BP%20(v2).pdf

Chain Rule Case 1: a single chain of dependencies. Case 2: a variable that influences the output through several paths, so its derivative contributions are summed (this is the form used below). Source of the slide: http://speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2017/Lecture/BP%20(v2).pdf

Backpropagation The total loss over $N$ training examples is $L(\theta) = \sum_{n=1}^{N} l^n(\theta)$, so $\frac{\partial L(\theta)}{\partial w} = \sum_{n=1}^{N} \frac{\partial l^n(\theta)}{\partial w}$: the network with parameters $\theta$ maps each input $x^n$ to an output $y^n$, which is compared with the target $\hat{y}^n$ to give the per-example loss $l^n$. Source of the slide: http://speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2017/Lecture/BP%20(v2).pdf

Backpropagation For a neuron with pre-activation $z = x_1 w_1 + x_2 w_2 + b$, the chain rule gives $\frac{\partial l}{\partial w} = \frac{\partial z}{\partial w} \cdot \frac{\partial l}{\partial z}$. Forward pass: compute $\frac{\partial z}{\partial w}$ for all parameters. Backward pass (chain rule): compute $\frac{\partial l}{\partial z}$ for all activation function inputs $z$. Source of the slide: http://speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2017/Lecture/BP%20(v2).pdf

Backpropagation - Forward pass Compute $\frac{\partial z}{\partial w}$ for all parameters. Since $z = x_1 w_1 + x_2 w_2 + b$, we have $\frac{\partial z}{\partial w_1} = x_1$ and $\frac{\partial z}{\partial w_2} = x_2$: the derivative is simply the value of the input connected to the weight. Source of the slide: http://speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2017/Lecture/BP%20(v2).pdf

Backpropagation - Forward pass Compute $\frac{\partial z}{\partial w}$ for all parameters. (Figure: a worked example network with inputs 1 and -1; the forward pass produces activations such as 0.98, 0.86, 0.12 and 0.11, and $\frac{\partial z}{\partial w}$ for each weight equals the input or activation feeding that weight, e.g. $-1$, $0.12$, $0.11$.) That's it, we have done the forward pass. Source of the slide: http://speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2017/Lecture/BP%20(v2).pdf

Backpropagation - Backward pass Compute $\frac{\partial l}{\partial z}$ for all activation function inputs $z$. With $a = \sigma(z)$, the chain rule gives $\frac{\partial l}{\partial z} = \frac{\partial a}{\partial z} \cdot \frac{\partial l}{\partial a} = \sigma'(z) \cdot \frac{\partial l}{\partial a}$. Source of the slide: http://speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2017/Lecture/BP%20(v2).pdf

Backpropagation - Backward pass Compute $\frac{\partial l}{\partial z}$ for all activation function inputs $z$. The activation $a = \sigma(z)$ feeds the next layer through $z' = a w_3 + \cdots$ and $z'' = a w_4 + \cdots$. By the chain rule, $\frac{\partial l}{\partial z} = \frac{\partial a}{\partial z} \cdot \frac{\partial l}{\partial a}$ and $\frac{\partial l}{\partial a} = \frac{\partial z'}{\partial a} \frac{\partial l}{\partial z'} + \frac{\partial z''}{\partial a} \frac{\partial l}{\partial z''} = w_3 \frac{\partial l}{\partial z'} + w_4 \frac{\partial l}{\partial z''}$, where $\frac{\partial l}{\partial z'}$ and $\frac{\partial l}{\partial z''}$ are assumed known for now. Source of the slide: http://speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2017/Lecture/BP%20(v2).pdf

Backpropagation - Backward pass Compute $\frac{\partial l}{\partial z}$ for all activation function inputs $z$. Combining the two steps: $\frac{\partial l}{\partial z} = \sigma'(z) \left[ w_3 \frac{\partial l}{\partial z'} + w_4 \frac{\partial l}{\partial z''} \right]$. Source of the slide: http://speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2017/Lecture/BP%20(v2).pdf

Backpropagation - Backward pass $\frac{\partial l}{\partial z} = \sigma'(z) \left[ w_3 \frac{\partial l}{\partial z'} + w_4 \frac{\partial l}{\partial z''} \right]$; here $\sigma'(z)$ is a constant, because $z$ was already determined in the forward pass. Source of the slide: http://speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2017/Lecture/BP%20(v2).pdf
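For the sigmoid activation used throughout these slides, $\sigma'(z)$ has a convenient closed form; this standard identity is added here for completeness and is not stated explicitly on the slide:

    \sigma(z) = \frac{1}{1 + e^{-z}}, \qquad
    \sigma'(z) = \frac{e^{-z}}{(1 + e^{-z})^{2}} = \sigma(z)\,\bigl(1 - \sigma(z)\bigr)

This is why the earlier weight-update sketches compute the derivative as out * (1 - out).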

Backpropagation - Backward pass Compute $\frac{\partial l}{\partial z}$ for all activation function inputs $z$. Case 1, output layer: if $z'$ and $z''$ feed the output neurons $y_1$ and $y_2$, then $\frac{\partial l}{\partial z'} = \frac{\partial y_1}{\partial z'} \cdot \frac{\partial l}{\partial y_1}$ and $\frac{\partial l}{\partial z''} = \frac{\partial y_2}{\partial z''} \cdot \frac{\partial l}{\partial y_2}$, both of which can be computed directly from the outputs and the loss. Done! Source of the slide: http://speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2017/Lecture/BP%20(v2).pdf

Backpropagation - Backward pass Compute $\frac{\partial l}{\partial z}$ for all activation function inputs $z$. Case 2, not the output layer: $z'$ and $z''$ feed further hidden layers, so $\frac{\partial l}{\partial z'}$ and $\frac{\partial l}{\partial z''}$ depend on the layers downstream. Source of the slide: http://speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2017/Lecture/BP%20(v2).pdf

Backpropagation - Backward pass Compute $\frac{\partial l}{\partial z}$ for all activation function inputs $z$. Case 2, not the output layer: if $a' = \sigma(z')$ feeds the next layer through $z_a$ (weight $w_5$) and $z_b$ (weight $w_6$), then $\frac{\partial l}{\partial z'}$ can be computed from $\frac{\partial l}{\partial z_a}$ and $\frac{\partial l}{\partial z_b}$ in exactly the same way as before. Source of the slide: http://speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2017/Lecture/BP%20(v2).pdf

Backpropagation - Backward pass Compute $\frac{\partial l}{\partial z}$ for all activation function inputs $z$. Case 2, not the output layer: compute $\frac{\partial l}{\partial z}$ recursively, e.g. $\frac{\partial l}{\partial z'} = \sigma'(z') \left[ w_5 \frac{\partial l}{\partial z_a} + w_6 \frac{\partial l}{\partial z_b} \right]$, and repeat until we reach the output layer. Source of the slide: http://speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2017/Lecture/BP%20(v2).pdf

Backpropagation - Backward pass Compute $\frac{\partial l}{\partial z}$ for all activation function inputs $z$, starting from the output layer. (Figure: a network with pre-activations $z_1, \ldots, z_6$; the output-layer gradients $\frac{\partial l}{\partial z_5}$ and $\frac{\partial l}{\partial z_6}$ are computed first.) Source of the slide: http://speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2017/Lecture/BP%20(v2).pdf

Backpropagation - Backward pass Compute $\frac{\partial l}{\partial z}$ for all activation function inputs $z$, starting from the output layer. (Figure: the output-layer gradients $\frac{\partial l}{\partial z_5}$ and $\frac{\partial l}{\partial z_6}$ are propagated backwards, multiplied by the weights and by $\sigma'(z_3), \sigma'(z_4)$ to give $\frac{\partial l}{\partial z_3}, \frac{\partial l}{\partial z_4}$, and then by $\sigma'(z_1), \sigma'(z_2)$ to give $\frac{\partial l}{\partial z_1}, \frac{\partial l}{\partial z_2}$.) Source of the slide: http://speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2017/Lecture/BP%20(v2).pdf

Backpropagation - Summary Forward pass: compute $\frac{\partial z}{\partial w} = a$ for all weights $w$ (the activation feeding the weight). Backward pass: compute $\frac{\partial l}{\partial z}$ for all activation function inputs $z$. The gradient is their product: $\frac{\partial l}{\partial w} = \frac{\partial z}{\partial w} \times \frac{\partial l}{\partial z} = a \cdot \frac{\partial l}{\partial z}$. Source of the slide: http://speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2017/Lecture/BP%20(v2).pdf
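The whole procedure fits in a few lines of NumPy for a network with one hidden layer. This is a minimal sketch assuming sigmoid activations and a squared-error loss; the layer sizes, data, learning rate, and iteration count are all made up:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    # Hypothetical tiny network: 2 inputs, 2 hidden units, 2 outputs.
    rng = np.random.default_rng(0)
    x = np.array([0.05, 0.10])
    t = np.array([0.01, 0.99])
    W1, b1 = rng.normal(size=(2, 2)), np.zeros(2)   # input -> hidden
    W2, b2 = rng.normal(size=(2, 2)), np.zeros(2)   # hidden -> output
    eta = 0.5

    for _ in range(100):
        # Forward pass: keep the activations, they are the dz/dw terms.
        z1 = W1 @ x + b1;  a1 = sigmoid(z1)
        z2 = W2 @ a1 + b2; y  = sigmoid(z2)

        # Backward pass: dl/dz, starting at the output layer.
        dl_dz2 = (y - t) * y * (1 - y)               # output-layer case
        dl_dz1 = (W2.T @ dl_dz2) * a1 * (1 - a1)     # propagate one layer back

        # Gradients: dl/dw = (dz/dw) * (dl/dz) = activation * delta.
        W2 -= eta * np.outer(dl_dz2, a1); b2 -= eta * dl_dz2
        W1 -= eta * np.outer(dl_dz1, x);  b1 -= eta * dl_dz1

    print(y)   # approaches the target t as training proceeds

The forward pass stores the activations (the $\frac{\partial z}{\partial w}$ terms) and the backward pass computes the $\frac{\partial l}{\partial z}$ terms from the output layer inwards, exactly as summarized above.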

Backpropagation Implementation Consider the bitmap character images in the figure. We'll train a neural network to take the cells of this image as the input (35 independent cells) and activate one of ten output cells representing the recognized pattern. While any of the output cells could be activated, we'll take the largest activation as the cell to use, in a style called winner-takes-all. Source of the picture: Artificial Intelligence: A Systems Approach, by M. Tim Jones.

Backpropagation Implementation The neural network that we'll use is called a winner-takes-all network: it has a number of output nodes, and we select the one with the largest activation. The largest activation indicates the number that was recognized. Source of the picture: Artificial Intelligence: A Systems Approach, by M. Tim Jones.

Backpropagation Implementation The input layer consists of 35 input cells (one for each pixel in the image), with 10 cells in the hidden layer. The output layer consists of 10 cells, one for each potential classification. The network is fully interconnected: 350 connections between the input and hidden layer, and another 100 connections between the hidden layer and the output layer (450 weights in total, plus a bias weight for each hidden and output cell). Source of the picture: Artificial Intelligence: A Systems Approach, by M. Tim Jones.

Backpropagation Implementation Neural network representation (inputs, activations, and weights):

    #define INPUT_NEURONS 35
    #define HIDDEN_NEURONS 10
    #define OUTPUT_NEURONS 10

    double inputs[INPUT_NEURONS+1];
    double hidden[HIDDEN_NEURONS+1];
    double outputs[OUTPUT_NEURONS];

    double w_h_i[HIDDEN_NEURONS][INPUT_NEURONS+1];
    double w_o_h[OUTPUT_NEURONS][HIDDEN_NEURONS+1];

Source of the code: Artificial Intelligence: A Systems Approach, by M. Tim Jones.

Backpropagation Implementation Calculating the output activations with the feed_forward function:

    void feed_forward( void )
    {
      int i, j;

      /* Calculate outputs of the hidden layer */
      for (i = 0 ; i < HIDDEN_NEURONS ; i++) {
        hidden[i] = 0.0;
        for (j = 0 ; j < INPUT_NEURONS+1 ; j++) {
          hidden[i] += (w_h_i[i][j] * inputs[j]);
        }
        hidden[i] = sigmoid( hidden[i] );
      }

      /* Calculate outputs for the output layer */
      for (i = 0 ; i < OUTPUT_NEURONS ; i++) {
        outputs[i] = 0.0;
        for (j = 0 ; j < HIDDEN_NEURONS+1 ; j++) {
          outputs[i] += (w_o_h[i][j] * hidden[j] );
        }
        outputs[i] = sigmoid( outputs[i] );
      }
    }

Source of the code: Artificial Intelligence: A Systems Approach, by M. Tim Jones.

Backpropagation Implementation Updating the weights given the backpropagation algorithm:

    void backpropagate_error( int test )
    {
      int out, hid, inp;
      double err_out[OUTPUT_NEURONS];
      double err_hid[HIDDEN_NEURONS];

      /* Compute the error for the output nodes (Equation 8.6) */
      for (out = 0 ; out < OUTPUT_NEURONS ; out++) {
        err_out[out] = ((double)tests[test].output[out] - outputs[out]) *
                         sigmoid_d(outputs[out]);
      }

Source of the code: Artificial Intelligence: A Systems Approach, by M. Tim Jones.

Backpropagation Implementation Updating the weights given the backpropagation algorithm (continued):

      /* Compute the error for the hidden nodes (Equation 8.7) */
      for (hid = 0 ; hid < HIDDEN_NEURONS ; hid++) {
        err_hid[hid] = 0.0;
        /* Include error contribution for all output nodes */
        for (out = 0 ; out < OUTPUT_NEURONS ; out++) {
          err_hid[hid] += err_out[out] * w_o_h[out][hid];
        }
        err_hid[hid] *= sigmoid_d( hidden[hid] );
      }

Source of the code: Artificial Intelligence: A Systems Approach, by M. Tim Jones.

Backpropagation Implementation Updating the weights given the backpropagation algorithm (continued):

      /* Adjust the weights from the hidden to output layer (Equation 8.9) */
      for (out = 0 ; out < OUTPUT_NEURONS ; out++) {
        for (hid = 0 ; hid < HIDDEN_NEURONS ; hid++) {
          w_o_h[out][hid] += RHO * err_out[out] * hidden[hid];
        }
      }

Source of the code: Artificial Intelligence: A Systems Approach, by M. Tim Jones.

Backpropagation Implementation Updating the weights given the backpropagation algorithm (continued):

      /* Adjust the weights from the input to hidden layer (Equation 8.9) */
      for (hid = 0 ; hid < HIDDEN_NEURONS ; hid++) {
        for (inp = 0 ; inp < INPUT_NEURONS+1 ; inp++) {
          w_h_i[hid][inp] += RHO * err_hid[hid] * inputs[inp];
        }
      }

      return;
    }

Source of the code: Artificial Intelligence: A Systems Approach, by M. Tim Jones.

Backpropagation Implementation The training and test loop (main function):

    int main( void )
    {
      double mse, noise_prob;
      int test, i, j;

      RANDINIT();
      init_network();

      /* Training Loop */
      do {

        /* Pick a test at random */
        test = RANDMAX(MAX_TESTS);

Source of the code: Artificial Intelligence: A Systems Approach, by M. Tim Jones.

Backpropagation Implementation The training and test loop (main function, continued):

        /* Grab input image (with no noise) */
        set_network_inputs( test, 0.0 );

        /* Feed this data set forward */
        feed_forward();

        /* Backpropagate the error */
        backpropagate_error( test );

        /* Calculate the current MSE */
        mse = calculate_mse( test );

      } while (mse > 0.001);

Source of the code: Artificial Intelligence: A Systems Approach, by M. Tim Jones.

Backpropagation Implementation The training and test loop (main function, continued):

      /* Now, let's test the network with increasing amounts of noise */
      test = RANDMAX(MAX_TESTS);

      /* Start with 5% noise probability, end with 25% (per pixel) */
      noise_prob = 0.05;
      for (i = 0 ; i < 5 ; i++) {

        set_network_inputs( test, noise_prob );
        feed_forward();

        for (j = 0 ; j < INPUT_NEURONS ; j++) {
          if ((j % 5) == 0) printf("\n");
          printf("%d ", (int)inputs[j]);
        }
        printf( "\nclassified as %d\n\n", classifier() );

        noise_prob += 0.05;
      }

      return 0;
    }

Source of the code: Artificial Intelligence: A Systems Approach, by M. Tim Jones.

Backpropagation Implementation The figure illustrates the generalization capability of the network trained with error backpropagation. In both cases, once the per-pixel noise probability reaches 20%, the image is no longer recognized correctly. What is shown in main is a common pattern for neural network training and use: once a neural network has been trained, its weights can be saved and reused in the target application. Source of the text: Artificial Intelligence: A Systems Approach, by M. Tim Jones.

Backpropagation Implementation (Figure: baseline image for six, recognized as six, six, six, four, and eight as the noise increases; baseline image for one, recognized as one, one, one, seven, and five.) Source of the picture: Artificial Intelligence: A Systems Approach, by M. Tim Jones.

MNIST Based on Keras Import library files Source of the code: https://mlln.cn/2018/07/20/keras教程-04-手写字体识别/
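The imports typically look something like the following; this is a hedged sketch rather than the exact code from the referenced tutorial, and module paths can differ slightly across Keras versions:

    import numpy as np
    import matplotlib.pyplot as plt

    from keras.datasets import mnist
    from keras.models import Sequential
    from keras.layers import Dense
    from keras.utils import to_categorical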

MNIST Based on Keras Description of the MNIST handwritten digit recognition problem This is a digit-recognition task with ten classes, the digits 0 to 9. Each image is a 28x28-pixel square (784 pixels in total). The dataset provides 60,000 images for training the model and a separate set of 10,000 images for testing it. Source of the code: https://mlln.cn/2018/07/20/keras教程-04-手写字体识别/

MNIST Based on Keras Load image data The Keras deep learning library provides a convenient method for loading the MNIST dataset. The dataset is downloaded automatically the first time this function is called and is stored as a 15 MB file at ~/.keras/datasets/mnist.npz in the home directory. This is very convenient for developing and testing deep learning models. To demonstrate how easy it is to load the MNIST dataset, we will first write a small script that downloads the data and visualizes the first image in the training set. Source of the code: https://mlln.cn/2018/07/20/keras教程-04-手写字体识别/
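A sketch of such a script, assuming the imports above; the exact plotting code in the tutorial may differ:

    # Load (and, on the first call, download) the MNIST dataset.
    (X_train, y_train), (X_test, y_test) = mnist.load_data()
    print(X_train.shape, y_train.shape)   # (60000, 28, 28) (60000,)

    # Visualize one training image and its label.
    plt.imshow(X_train[0], cmap='gray')
    plt.title('label: %d' % y_train[0])
    plt.show()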

MNIST Based on Keras Load image data You should be able to see that the digit in this image is 0. This also shows what an image really is: essentially a two-dimensional matrix, or a three-dimensional tensor. The images used here are grayscale with no color channels, so a two-dimensional matrix is enough. Besides viewing an image as above, we more often work with the data using the method below. Source of the code: https://mlln.cn/2018/07/20/keras教程-04-手写字体识别/

MNIST Based on Keras Load image data Source of the code: https://mlln.cn/2018/07/20/keras教程-04-手写字体识别/

MNIST Based on Keras Adjust the data format for easy calculation Our neural network takes a vector as input, so we need to reshape the images so that each 28x28 image becomes a single 784-dimensional vector. We also scale the pixel values to the range [0, 1] instead of [0, 255]. Source of the code: https://mlln.cn/2018/07/20/keras教程-04-手写字体识别/
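A sketch of the reshaping and scaling step, continuing the sketch above; the one-hot encoding of the labels (needed later for the categorical cross-entropy loss) is an assumption about how the tutorial proceeds:

    # Flatten each 28x28 image into a 784-dimensional vector and scale to [0, 1].
    X_train = X_train.reshape(60000, 784).astype('float32') / 255.0
    X_test  = X_test.reshape(10000, 784).astype('float32') / 255.0

    # One-hot encode the labels (10 classes).
    Y_train = to_categorical(y_train, 10)
    Y_test  = to_categorical(y_test, 10)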

MNIST Based on Keras Create a network Here we will build a simple three-layer fully connected network. Source of the code: https://mlln.cn/2018/07/20/keras教程-04-手写字体识别/
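A sketch of such a three-layer fully connected network; the layer sizes and hidden-layer activation are assumptions, not necessarily the tutorial's exact choices:

    model = Sequential([
        Dense(512, input_shape=(784,), activation='relu'),   # hidden layer 1
        Dense(512, activation='relu'),                       # hidden layer 2
        Dense(10, activation='softmax'),                     # output layer: 10 classes
    ])
    model.summary()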

MNIST Based on Keras Compile model Keras is built on top of TensorFlow. These two packages let you define a computation graph in Python, which can then be compiled and run efficiently on a CPU or GPU without the overhead of the Python interpreter. When building a model, Keras asks you to specify a loss function and an optimizer. The loss function we use here is categorical cross-entropy (categorical_crossentropy), which is well suited to comparing two probability distributions. Here, each prediction is a probability distribution over the ten digits (for example, "we are 80% sure this image is a 3, 10% sure it is an 8, 5% sure it is a 2, and so on"), while the targets Y_train and Y_test assign a probability of 100% to the correct class and 0 to all other classes. Cross-entropy measures how different the predicted distribution is from the target distribution. The optimizer helps determine how quickly the model learns. We will not discuss it in detail here, but "adam" is usually a good choice. Source of the code: https://mlln.cn/2018/07/20/keras教程-04-手写字体识别/
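The corresponding compile call, continuing the sketch above (the loss and optimizer names follow the description in the text):

    model.compile(loss='categorical_crossentropy',
                  optimizer='adam',
                  metrics=['accuracy'])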

MNIST Based on Keras Training model This is the fun part: you can feed the previously loaded training data to this model, and it will learn to classify the digits. Source of the code: https://mlln.cn/2018/07/20/keras教程-04-手写字体识别/
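A sketch of the training call, continuing the sketch above; the batch size and number of epochs are assumptions:

    history = model.fit(X_train, Y_train,
                        batch_size=128,
                        epochs=5,
                        verbose=1,
                        validation_data=(X_test, Y_test))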

MNIST Based on Keras Evaluate performance: score[0] is the loss and score[1] is the accuracy. Source of the code: https://mlln.cn/2018/07/20/keras教程-04-手写字体识别/
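A sketch of the evaluation call that produces such a two-element score, continuing the sketch above:

    score = model.evaluate(X_test, Y_test, verbose=0)
    print('Test loss:', score[0])
    print('Test accuracy:', score[1])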

MNIST Based on Keras Check output It is always a good idea to inspect the output and make sure everything looks right. Here we will look at some correctly classified examples as well as some misclassified ones. Source of the code: https://mlln.cn/2018/07/20/keras教程-04-手写字体识别/
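A sketch of one way to inspect correct and incorrect predictions, continuing the sketch above; the plotting details are assumptions rather than the tutorial's exact code:

    # Predict classes for the test set and compare with the true labels.
    predicted = np.argmax(model.predict(X_test), axis=1)
    correct   = np.nonzero(predicted == y_test)[0]
    incorrect = np.nonzero(predicted != y_test)[0]

    # Show a few correctly and incorrectly classified digits.
    for name, idx_list in [('correct', correct[:3]), ('incorrect', incorrect[:3])]:
        for idx in idx_list:
            plt.imshow(X_test[idx].reshape(28, 28), cmap='gray')
            plt.title('%s: predicted %d, true %d' % (name, predicted[idx], y_test[idx]))
            plt.show()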
