Multi-layer neural networks with activation functions and their MATLAB implementation
1. Structure of the two-layer neural network
The structure of a two-layer neural network with two inputs and one output is shown in the figure below. The values in parentheses are the true parameters of the network (weights, outputs, thresholds), while the variables outside the parentheses denote the estimated weights, outputs, thresholds, and so on.
For the detailed derivation of the formulas, see Section 5.3 of Zhou Zhihua's Machine Learning (Tsinghua University Press).
When displaying the final cost function, the log function can be applied to it so that the trend remains visible even when the cost value becomes very small.
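As a small aside, MATLAB's built-in semilogy gives an equivalent view of the Jcosts vector computed by the scripts below, without taking the logarithm manually:
semilogy(Jcosts)   % same trend as plot(log(Jcosts)), shown with a logarithmic y-axis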
If the learning rate is too large, the network may diverge; if it is too small, convergence becomes very slow. The learning rate therefore has to be set to a reasonably suitable value, as the small example below illustrates.
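As an illustration of this point (a made-up one-dimensional example, separate from the network used below): for the cost J(theta) = theta^2, whose gradient is 2*theta, gradient descent diverges when the learning rate is too large and barely moves when it is too small.
theta_big=1; theta_small=1;
for k=1:50
theta_big=theta_big-1.1*2*theta_big;         % learning rate 1.1 is too large: |theta| grows every step
theta_small=theta_small-0.001*2*theta_small; % learning rate 0.001 is too small: progress is very slow
end
v_theta_big=theta_big     % about 9.1e+03, the iteration has diverged
v_theta_small=theta_small % about 0.90, still far from the minimum at 0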
One point deserves special attention: the weights must only be updated after all of the weight and threshold update terms have been computed. Some of the update terms depend on the weights and thresholds of other parts of the network, so applying updates before all terms are available makes the updates unsynchronized: the update of some weights at step i would then be computed from other weights' values belonging to a different step, whereas every step-i update should be computed from the values that all weights hold at step i. This degrades the stability and robustness of the system, and it shrinks the range of input data and true weights the network can cope with; a minimal sketch of the problem follows.
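The sketch below uses two made-up weights w1 and w2 and a shared error term g; these are illustrative values, not the actual network parameters.
w1=0.3; w2=0.5; alpha=0.1; g=2.0;   % g plays the role of a shared error term
% wrong order: w1 is overwritten before w2's update term is formed
w1_bad=w1-alpha*g;                  % w1 has already moved on to the next step
w2_bad=w2-alpha*g*w1_bad;           % mixes the new w1 with the old w2
% right order: compute every update term from the current values, then apply them together
d1=alpha*g; d2=alpha*g*w1;          % both terms use the same current w1
w1_good=w1-d1; w2_good=w2-d2;
[w2_bad w2_good]                    % 0.4800 vs 0.4400: the two orders give different weights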
The ranges of the weights and thresholds also directly affect whether the network converges: if the true weights and thresholds are too large or too small, the cost function may get stuck at a fairly large value, for example around 0.21.
2. Derivation of the gradient-descent formulas for the two-layer network
The output layer of the two-layer network contains a single neuron with no activation function. The hidden layer contains two neurons, both of which use the sigmoid activation function.
(1). Forward computation
y_a = θ_{a0} + θ_{a1}·x_1 + θ_{a2}·x_2,   h_a = 1/(1 + e^{-y_a})
y_b = θ_{b0} + θ_{b1}·x_1 + θ_{b2}·x_2,   h_b = 1/(1 + e^{-y_b})
y_s = θ_{s0} + θ_{s1}·h_a + θ_{s2}·h_b
In the equations above, θ denotes the weights and thresholds of the network; the subscript a labels the first hidden neuron, b labels the second hidden neuron, and s indicates that the neuron is the output neuron, whose bias corresponds to θ_{s0}. h_a is the output of the first hidden neuron and also the first input of the output neuron; h_b is the output of the second hidden neuron and also the second input of the output neuron.
(2). Backward error propagation
(x_1, y_1), (x_2, y_2), ..., (x_i, y_i), ..., (x_t, y_t) are the true input-output data. The main goal of backward error propagation is that, when the network inputs are x_1, x_2, ..., x_i, ..., x_t, the network outputs approximate the target outputs y_1, y_2, ..., y_i, ..., y_t as closely as possible. The loss (cost) function is therefore defined as
J(θ) = (1/t) · Σ_{i=1}^{t} (h_s(x_i) - y_i)^2
where t is the number of samples (num_sample in the code), x_{1,i} and x_{2,i} denote the two inputs of the i-th sample, and h_s(x_i) is the network output for that sample.
Backward error propagation reduces the error of the weights and thresholds and adjusts them toward the desired range. The adjustment consists of subtracting a correction term from the current weight or threshold; this correction, obtained from the gradient-descent algorithm, is the product of the learning rate and the partial derivative of the loss function with respect to that weight or threshold, i.e.
θ_r := θ_r - α · ∂J/∂θ_r
In this expression α is the learning rate and r stands for the subscript of any threshold or weight; r can take several values, e.g. for the output layer r takes the values s0, s1 and s2, corresponding to θ_{s0}, θ_{s1} and θ_{s2}.
The partial derivatives with respect to the output neuron's threshold and weights are:
∂J/∂θ_{s0} = (2/t) Σ_{i=1}^{t} (h_s(x_i) - y_i)
∂J/∂θ_{s1} = (2/t) Σ_{i=1}^{t} (h_s(x_i) - y_i) · h_a(x_i)
∂J/∂θ_{s2} = (2/t) Σ_{i=1}^{t} (h_s(x_i) - y_i) · h_b(x_i)
The partial derivatives with respect to the threshold and weights of the first hidden neuron (a) are:
∂J/∂θ_{a0} = (2/t) Σ_{i=1}^{t} (h_s(x_i) - y_i) · θ_{s1} · h_a(x_i)(1 - h_a(x_i))
∂J/∂θ_{a1} = (2/t) Σ_{i=1}^{t} (h_s(x_i) - y_i) · θ_{s1} · h_a(x_i)(1 - h_a(x_i)) · x_{1,i}
∂J/∂θ_{a2} = (2/t) Σ_{i=1}^{t} (h_s(x_i) - y_i) · θ_{s1} · h_a(x_i)(1 - h_a(x_i)) · x_{2,i}
The partial derivatives with respect to the threshold and weights of the second hidden neuron (b) are:
∂J/∂θ_{b0} = (2/t) Σ_{i=1}^{t} (h_s(x_i) - y_i) · θ_{s2} · h_b(x_i)(1 - h_b(x_i))
∂J/∂θ_{b1} = (2/t) Σ_{i=1}^{t} (h_s(x_i) - y_i) · θ_{s2} · h_b(x_i)(1 - h_b(x_i)) · x_{1,i}
∂J/∂θ_{b2} = (2/t) Σ_{i=1}^{t} (h_s(x_i) - y_i) · θ_{s2} · h_b(x_i)(1 - h_b(x_i)) · x_{2,i}
In the MATLAB implementations below, the constant factor 2 is absorbed into the learning rate α, so the code computes the sums without it.
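To double-check the formulas above, the following short script (not part of the original implementations; all data and parameter values are made up) compares the derived expression for ∂J/∂θ_{a1} with a central finite-difference approximation:
x1=[0.2 -0.4 0.1]; x2=[-0.3 0.5 0.0]; ys=[0.4 0.6 0.5]; t=numel(ys);
thetaa0=0.09; thetaa1=0.03; thetaa2=0.09;
thetab0=0.04; thetab1=0.02; thetab2=0.01;
thetas0=0.01; thetas1=0.03; thetas2=0.002;
J=@(ta1) sum((thetas0 ...
    +thetas1./(1+exp(-(thetaa0+ta1*x1+thetaa2*x2))) ...
    +thetas2./(1+exp(-(thetab0+thetab1*x1+thetab2*x2)))-ys).^2)/t;
ha_out=1./(1+exp(-(thetaa0+thetaa1*x1+thetaa2*x2)));
hb_out=1./(1+exp(-(thetab0+thetab1*x1+thetab2*x2)));
hs=thetas0+thetas1*ha_out+thetas2*hb_out;
g_analytic=2*sum((hs-ys).*thetas1.*ha_out.*(1-ha_out).*x1)/t;   % formula derived above
g_numeric=(J(thetaa1+1e-6)-J(thetaa1-1e-6))/(2e-6);             % central difference
[g_analytic g_numeric]   % the two values agree to several digits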
3. MATLAB implementation of a two-layer neural network without activation functions
clear all
clc
close all
pa0=6; pa1=6.7; pa2=0.2;
pb0=13; pb1=0.5; pb2=1.1;
s0=1.2; s1=1.6; s2=0.7;
x1=[7 9 12 5 4 9 23 5];
x2=[1 8 21 3 5 8 3 31];
x1_mean=mean(x1)
x1_max=max(x1)
x1_min=min(x1)
x2_mean=mean(x2)
x2_max=max(x2)
x2_min=min(x2)
x1=(x1-x1_mean)/(x1_max-x1_min)
x2=(x2-x2_mean)/(x2_max-x2_min)
ya=pa0+pa1*x1+pa2*x2;
yb=pb0+pb1*x1+pb2*x2;
ys=s0+s1*ya+s2*yb;
num_sample=size(ys,2);
% gradient descending process
% initial values of parameters
thetaa0=9/100; thetaa1=3/100; thetaa2=9/100;
thetab0=4/100; thetab1=2/100; thetab2=1/100;
thetas0=1/100; thetas1=3/100; thetas2=2/1000;
%learning rate
alpha=0.0001; % good for the system
epoch=8200;
for k=1:epoch
% v_k=k
if k<=3000
alpha=0.01;
elseif k<=6000
alpha=0.01;
else
alpha=0.001;
end
ha_theta_x=thetaa0+thetaa1*x1+thetaa2*x2; % hypothesis function
hb_theta_x=thetab0+thetab1*x1+thetab2*x2; % hypothesis function
hs_theta_x=thetas0+thetas1*ha_theta_x+thetas2*hb_theta_x; % hypothesis function
Jcosts(k)=sum((hs_theta_x-ys).^2)/num_sample;
% note: in this version each theta is updated as soon as its own gradient term is
% computed, so ra0 below already sees the updated thetas1; Sections 4 and 5 instead
% compute all update terms first and apply them together, as recommended in Section 1.
rs0=sum(hs_theta_x-ys); thetas0=thetas0-alpha*rs0/num_sample;
rs1=sum((hs_theta_x-ys).*ha_theta_x); thetas1=thetas1-alpha*rs1/num_sample;
rs2=sum((hs_theta_x-ys).*hb_theta_x); thetas2=thetas2-alpha*rs2/num_sample;
ra0=sum((hs_theta_x-ys).*thetas1); thetaa0=thetaa0-alpha*ra0/num_sample;
ra1=sum(((hs_theta_x-ys).*thetas1).*x1);thetaa1=thetaa1-alpha*ra1/num_sample;
ra2=sum(((hs_theta_x-ys).*thetas1).*x2);thetaa2=thetaa2-alpha*ra2/num_sample;
rb0=sum((hs_theta_x-ys).*thetas2); thetab0=thetab0-alpha*rb0/num_sample;
rb1=sum(((hs_theta_x-ys).*thetas2).*x1);thetab1=thetab1-alpha*rb1/num_sample;
rb2=sum(((hs_theta_x-ys).*thetas2).*x2);thetab2=thetab2-alpha*rb2/num_sample;
end
yst=thetas0+thetas1*ha_theta_x+thetas2*hb_theta_x;
v_yst=yst
v_ys=ys
figure
v_Jcosts=Jcosts(k)
plot(log(Jcosts))
v_Jcosts =1.8144e-28
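After the script above has run, the trained parameters can be used to predict the output for a new input pair. The short sketch below (x1_new and x2_new are made-up values) assumes it is executed in the same workspace and normalizes the new inputs with the statistics of the training data:
x1_new=10; x2_new=6;                         % hypothetical new raw inputs
x1_n=(x1_new-x1_mean)/(x1_max-x1_min);       % same normalization as used for the training data
x2_n=(x2_new-x2_mean)/(x2_max-x2_min);
ha_new=thetaa0+thetaa1*x1_n+thetaa2*x2_n;    % hidden neuron a (no activation in this version)
hb_new=thetab0+thetab1*x1_n+thetab2*x2_n;    % hidden neuron b
ys_new=thetas0+thetas1*ha_new+thetas2*hb_new % predicted network output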
4. MATLAB implementation of a two-layer neural network with sigmoid activation functions
% hidden layer neurons use the activation function
clear all
clc
close all
pa0=0.06; pa1=0.07; pa2=0.02;
pb0=0.03; pb1=0.05; pb2=0.1;
s0=0.02; s1=0.06; s2=0.07;
x1=[7 9 12 5 4 9 23 5];
x2=[1 8 21 3 5 8 3 31];
x1_mean=mean(x1)
x1_max=max(x1)
x1_min=min(x1)
x2_mean=mean(x2)
x2_max=max(x2)
x2_min=min(x2)
x1=(x1-x1_mean)/(x1_max-x1_min)
x2=(x2-x2_mean)/(x2_max-x2_min)
ya=pa0+pa1*x1+pa2*x2; ya_h=1./(1+exp(-ya));
yb=pb0+pb1*x1+pb2*x2; yb_h=1./(1+exp(-yb));
ys=s0+s1*ya_h+s2*yb_h;
num_sample=size(ys,2);
% gradient descending process
% initial values of parameters
thetaa0=9/100; thetaa1=3/100; thetaa2=9/100;
thetab0=4/100; thetab1=2/100; thetab2=1/100;
thetas0=1/100; thetas1=3/100; thetas2=2/1000;
%learning rate
alpha=0.001; % good for the system
epoch=7900;
for k=1:epoch
% v_k=k
if k<=3000
alpha=0.001;
elseif k<=4000
alpha=0.0001;
else
alpha=0.0001;
end
ha_theta_x=thetaa0+thetaa1*x1+thetaa2*x2; ha_out=1./(1+exp(-ha_theta_x));
hb_theta_x=thetab0+thetab1*x1+thetab2*x2; hb_out=1./(1+exp(-hb_theta_x));
hs_theta_x=thetas0+thetas1*ha_out+thetas2*hb_out; % hypothesis function
Jcosts(k)=sum((hs_theta_x-ys).^2)/num_sample;
rs0=sum(hs_theta_x-ys);
rs1=sum((hs_theta_x-ys).*ha_theta_x); % note: the exact gradient for thetas1 uses the sigmoid output ha_out
rs2=sum((hs_theta_x-ys).*hb_theta_x); % (likewise hb_out for thetas2); Section 5 uses the corrected form
dha_out=ha_out.*(1-ha_out);
ra0=sum(((hs_theta_x-ys).*thetas1).*dha_out);
ra1=sum((((hs_theta_x-ys).*thetas1).*dha_out).*x1);
ra2=sum((((hs_theta_x-ys).*thetas1).*dha_out).*x2);
dhb_out=hb_out.*(1-hb_out);
rb0=sum(((hs_theta_x-ys).*thetas2).*dhb_out);
rb1=sum((((hs_theta_x-ys).*thetas2).*dhb_out).*x1);
rb2=sum((((hs_theta_x-ys).*thetas2).*dhb_out).*x2);
thetas0=thetas0-alpha*rs0/num_sample;
thetas1=thetas1-alpha*rs1/num_sample;
thetas2=thetas2-alpha*rs2/num_sample;
thetaa0=thetaa0-alpha*ra0/num_sample;
thetaa1=thetaa1-alpha*ra1/num_sample;
thetaa2=thetaa2-alpha*ra2/num_sample;
thetab0=thetab0-alpha*rb0/num_sample;
thetab1=thetab1-alpha*rb1/num_sample;
thetab2=thetab2-alpha*rb2/num_sample;
end
yst=thetas0+thetas1*ha_out+thetas2*hb_out;
v_yst=yst
v_ys=ys
figure
v_Jcosts=Jcosts(k)
plot(log(Jcosts))
v_Jcosts =2.3851e-06
5. MATLAB implementation of a two-layer neural network with sigmoid activation functions, version 2
Final result of this version: v_Jcosts = 7.2385877e-28
% hidden layer neurons use the activation function
% update action is performed after all update values are obtained.
% the above item is very important, because if the update action is
% performed before all update values are obtained, the update action may
% use the new update values to update former weights or biases. the network
% may be unstable.
clear all
clc
close all
% pa0=2.06; pa1=3.7; pa2=0.02;
% pb0=0.3; pb1=1.5; pb2=0.1;
% s0=0.2; s1=1.6; s2=2.7;
pa0=0.6; pa1=11; pa2=0.2;
pb0=3; pb1=0.5; pb2=8.1;
s0=1.2; s1=5.6; s2=0.7;
% x1=[1 9 12 5 6 8];
% x2=[0 8 4 -9 7 2];
x1=[1 -9 12 -5 6];
x2=[0 8 4 -9 7];
x1_mean=mean(x1);
x1_max=max(x1);
x1_min=min(x1);
x2_mean=mean(x2);
x2_max=max(x2);
x2_min=min(x2);
x1=(x1-x1_mean)/(x1_max-x1_min);
x2=(x2-x2_mean)/(x2_max-x2_min);
% ya=pa0+pa1*x1+pa2*x2; ya_h=1./(1+exp(-ya));
% yb=pb0+pb1*x1+pb2*x2; yb_h=1./(1+exp(-yb));
% ys=s0+s1*ya_h+s2*yb_h;
ya=pa0+pa1*x1.*x1+pa2*x2; ya_h=1./(1+exp(-ya));
yb=pb0+pb1*x1+pb2*x2.*x2; yb_h=1./(1+exp(-yb));
ys=s0+s1*ya_h+s2*yb_h
num_sample=size(ys,2);
% gradient descending process
% initial values of parameters
thetaa0=9/100; thetaa1=3/100; thetaa2=9/100;
thetab0=4/100; thetab1=2/100; thetab2=1/100;
thetas0=1/100; thetas1=3/100; thetas2=2/1000;
%learning rate
alpha=0.1; % good for the system
epoch=68900;
for k=1:epoch
% v_k=k
if k<=3000
alpha=0.9;
elseif k<=9000
alpha=0.09;
elseif k<=30000
alpha=0.09;
else
alpha=0.009;
end
ha_theta_x=thetaa0+thetaa1*x1+thetaa2*x2; ha_out=1./(1+exp(-ha_theta_x));
hb_theta_x=thetab0+thetab1*x1+thetab2*x2; hb_out=1./(1+exp(-hb_theta_x));
hs_theta_x=thetas0+thetas1*ha_out+thetas2*hb_out; % hypothesis function
v_hs_theta_x=hs_theta_x;
Jcosts(k)=sum((hs_theta_x-ys).^2)/num_sample;
rs0=sum(hs_theta_x-ys);
rs1=sum((hs_theta_x-ys).*ha_out);
rs2=sum((hs_theta_x-ys).*hb_out);
% rs1=sum((hs_theta_x-ys).*ha_theta_x); thetas1=thetas1-alpha*rs1/num_sample; % old form from Section 4,
% rs2=sum((hs_theta_x-ys).*hb_theta_x); thetas2=thetas2-alpha*rs2/num_sample; % replaced by ha_out/hb_out above
dha_out=ha_out.*(1-ha_out);
ra0=sum(((hs_theta_x-ys).*thetas1).*dha_out);
ra1=sum((((hs_theta_x-ys).*thetas1).*dha_out).*x1);
ra2=sum((((hs_theta_x-ys).*thetas1).*dha_out).*x2);
dhb_out=hb_out.*(1-hb_out);
rb0=sum(((hs_theta_x-ys).*thetas2).*dhb_out);
rb1=sum((((hs_theta_x-ys).*thetas2).*dhb_out).*x1);
rb2=sum((((hs_theta_x-ys).*thetas2).*dhb_out).*x2);
% update of the weights and thresholds
thetas0=thetas0-alpha*rs0/num_sample;
thetas1=thetas1-alpha*rs1/num_sample;
thetas2=thetas2-alpha*rs2/num_sample;
thetaa0=thetaa0-alpha*ra0/num_sample;
thetaa1=thetaa1-alpha*ra1/num_sample;
thetaa2=thetaa2-alpha*ra2/num_sample;
thetab0=thetab0-alpha*rb0/num_sample;
thetab1=thetab1-alpha*rb1/num_sample;
thetab2=thetab2-alpha*rb2/num_sample;
end
yst=thetas0+thetas1*ha_out+thetas2*hb_out;
v_yst=yst
v_ys=ys
figure
v_Jcosts=Jcosts(k)
plot(log(Jcosts))