 ## Presentation on theme: "-Artificial Neural Network- Adaline & Madaline"— Presentation transcript:

The Proof of the Least-Square Learning Rule 朝陽科技大學 李麗華 教授

ADALINE (Adaptive Linear Neuron or Adaptive Linear Element) is a single-layer neural network. It was developed by Professor Bernard Widrow and his graduate student Ted Hoff at Stanford University in 1960. It is based on the McCulloch–Pitts neuron and consists of weights, a bias, and a summation function.
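The structure described above can be sketched in a few lines of Python; this is a minimal illustration (function and variable names are mine, not from the slides), using the bipolar activation introduced later in the deck:

```python
import numpy as np

# A single ADALINE unit: it computes net = w0*x0 + w1*x1 + ... + wn*xn
# (x0 = 1 carries the bias) and outputs +1 if net >= 0, else -1.
def adaline_output(weights, inputs):
    net = np.dot(weights, inputs)      # weighted sum (the "net")
    return 1 if net >= 0 else -1       # bipolar threshold activation

# Example using the weights W = (3, -2, -2) derived in the worked example
# later in these slides:
w = np.array([3.0, -2.0, -2.0])
print(adaline_output(w, np.array([1.0, 1.0, 0.0])))  # net = 1, so +1
```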

The difference between ADALINE and the standard (McCulloch–Pitts) perceptron is that during the learning phase the weights are adjusted according to the weighted sum of the inputs (the net). In the standard perceptron, the net is first passed through the activation (transfer) function, and it is the function's output that is used to adjust the weights. An extension known as MADALINE also exists.
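The contrast above can be made concrete with the two update rules side by side; this is a hedged sketch (the learning rate `eta` and function names are my assumptions):

```python
import numpy as np

# ADALINE (LMS/delta rule): the error is measured on the raw net.
def adaline_update(w, x, target, eta=0.1):
    net = np.dot(w, x)                    # weighted sum, no threshold
    return w + eta * (target - net) * x   # adjust weights from net error

# Standard perceptron: the net is thresholded first, and the error is
# measured on the activation's output.
def perceptron_update(w, x, target, eta=0.1):
    y = 1 if np.dot(w, x) >= 0 else -1    # pass net through transfer function
    return w + eta * (target - y) * x     # adjust weights from output error
```

With `w = [0, 0]`, `x = [1, 1]`, `target = 1`, the perceptron already outputs +1 and makes no update, while ADALINE still moves the weights because its net (0) differs from the target (1).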

[Figure: a single processing element (PE) with inputs X1, X2, …, Xn and weights w1, w2, …, wn.] (*) ADALINE is not suitable for problems that are not linearly separable. To deal with such cases, a MADALINE network with two ADALINE units can solve the XOR problem.

ADALINE (2/3) Method: the output value of each unit must be +1 or −1
(the perceptron, by contrast, outputs 0 and 1).

Y = +1 if net ≥ 0, and Y = −1 if net < 0

This bipolar threshold differs from the perceptron's transfer function.

MADALINE (Multiple ADALINE) uses a set of ADALINEs in parallel as its input layer and a single PE (processing element) in its output layer. A network of ADALINEs can span many layers. For problems with multiple input variables and one output, each input is applied to one ADALINE. For similar problems with multiple outputs, a MADALINE with parallel processing can be used. The MADALINE network is useful for problems that involve prediction from multiple inputs, such as weather forecasting (input variables: barometric pressure, difference in pressure; output variables: rain, cloudy, sunny).

(Multilayer ADALINE.) [Figure: a two-layer MADALINE — the inputs feed the first-layer ADALINEs through weights Wij; there are no adjustable weights Wij in the second layer.] The second layer applies a majority vote: if more than half of the net_j values are ≥ 0, the output is +1; otherwise the output is −1.
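The two-ADALINE XOR network mentioned earlier can be sketched as below. The slides do not give the weights, so these are hand-chosen illustrative values, and the second-layer vote is realized as an OR of the two hidden units:

```python
import numpy as np

# One ADALINE unit with bipolar output.
def adaline(w, x):
    return 1 if np.dot(w, x) >= 0 else -1

# MADALINE for XOR on bipolar inputs x1, x2 in {-1, +1}.
# Weights are hand-chosen for illustration (not from the slides).
def madaline_xor(x1, x2):
    x = np.array([1.0, x1, x2])                     # x0 = 1 carries the bias
    h1 = adaline(np.array([-1.0,  1.0, -1.0]), x)   # fires for (+1, -1)
    h2 = adaline(np.array([-1.0, -1.0,  1.0]), x)   # fires for (-1, +1)
    # Fixed second layer: output +1 if either hidden ADALINE fires.
    return 1 if (h1 + h2) >= 0 else -1

for a in (-1, 1):
    for b in (-1, 1):
        print(a, b, '->', madaline_xor(a, b))
```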

Least-Square Learning Rule (1/6)
X_k = [x_0, x_1, …, x_n]^t, (i.e., x_0 = 1)

W = [w_0, w_1, …, w_n]^t

net_j = W^t X_j = Σ_{i=0}^{n} w_i x_i = w_0 x_0 + w_1 x_1 + … + w_n x_n

Notation: k denotes the k-th input pattern, t denotes the vector transpose, and L denotes the number of input patterns.

Least-Square Learning Rule (2/6)
By applying the least-square learning rule, the weights can be obtained from the formula

R W* = P, hence W* = R^{-1} P

where R is the correlation matrix

R = (R'_1 + R'_2 + … + R'_L) / L, with R'_j = X_j X_j^t

and

P^t = ( Σ_{j=1}^{L} T_j X_j^t ) / L

Least-Square Learning Rule (3/6)
Example: find the ADALINE weights for the following training patterns.

| X1 | X2 | X3 | Tj |
|----|----|----|----|
| 1  | 1  | 0  | 1  |
| 1  | 0  | 1  | 1  |
| 1  | 1  | 1  | −1 |

Least-Square Learning Rule (4/6)
Sol.: first compute R.
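The computation on this and the following slides can be checked numerically; a minimal NumPy sketch of the formula W* = R⁻¹P for the example's three patterns (array names are mine):

```python
import numpy as np

# The three training patterns and targets from the example slide.
X = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [1.0, 1.0, 1.0]])
T = np.array([1.0, 1.0, -1.0])
L = len(X)

# R = (1/L) * sum_j X_j X_j^t  (correlation matrix)
R = sum(np.outer(x, x) for x in X) / L
# P = (1/L) * sum_j T_j X_j
P = sum(t * x for t, x in zip(T, X)) / L

# W* = R^{-1} P; matches the W = (3, -2, -2) used on the verification slide.
W = np.linalg.inv(R) @ P
print(W)
```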

Least-Square Learning Rule (5/6)

Least-Square Learning Rule (6/6)
Verify the net: substituting (1,1,0): net = 3X1 − 2X2 − 2X3 = 1, Y = 1, OK. Substituting (1,0,1): net = 3X1 − 2X2 − 2X3 = 1, Y = 1, OK. Substituting (1,1,1): net = 3X1 − 2X2 − 2X3 = −1, Y = −1, OK. [Figure: an ADALINE with inputs X1, X2, X3, weights 3, −2, −2, and output Y.] (*) Homework for students: find a fast method for computing the matrix inverse.

Proof of Least Square Learning Rule(1/3)
We use the least mean square error to ensure the minimum total error. When the total error approaches zero, the best solution has been found. Therefore, we look for the minimum of 〈E²〉. Proof:
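The proof on the following slides did not survive transcription; the standard LMS derivation, reconstructed here in the slides' notation as a sketch, proceeds as follows:

```latex
% Mean squared error over the L training patterns:
\langle E^2 \rangle
  = \frac{1}{L}\sum_{j=1}^{L} (T_j - \mathrm{net}_j)^2
  = \frac{1}{L}\sum_{j=1}^{L} (T_j - W^t X_j)^2

% Gradient with respect to W, using R = (1/L)\sum_j X_j X_j^t and
% P = (1/L)\sum_j T_j X_j:
\frac{\partial \langle E^2 \rangle}{\partial W}
  = -\frac{2}{L}\sum_{j=1}^{L} (T_j - W^t X_j)\, X_j
  = -2P + 2RW

% Setting the gradient to zero yields the normal equation from slide (2/6):
R\,W^{*} = P \quad\Longrightarrow\quad W^{*} = R^{-1}P
```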

Proof of Least Square Learning Rule(2/3)

Proof of Least Square Learning Rule(3/3)