[Andrew Ng Deep Learning Personal Study Notes] 2. Neural Network Basics (2)

Computation Graph

Example:
  J(a,b,c)=3(a+bc)\implies\begin{cases} u=bc \\ v=a+u \\ J=3v \end{cases}
The computation graph for this function is:
  (Figure: computation graph of J(a,b,c)=3(a+bc), with nodes u=bc, v=a+u, J=3v)
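To sanity-check the chain-rule bookkeeping, the graph can be traced in a few lines of code. The sketch below is my own illustration (not code from the course): one forward pass through the nodes u, v, J and one backward pass that applies the chain rule node by node.

```python
# A minimal sketch of forward/backward propagation through the
# computation graph of J(a, b, c) = 3(a + b*c).

def forward(a, b, c):
    u = b * c      # node u = bc
    v = a + u      # node v = a + u
    J = 3 * v      # output J = 3v
    return J

def backward(b, c):
    dJ_dv = 3              # J = 3v
    dJ_du = dJ_dv * 1      # v = a + u, so dv/du = 1
    dJ_da = dJ_dv * 1      # dv/da = 1
    dJ_db = dJ_du * c      # u = bc, so du/db = c
    dJ_dc = dJ_du * b      # du/dc = b
    return dJ_da, dJ_db, dJ_dc

print(forward(5, 3, 2))    # 33
print(backward(3, 2))      # (3, 6, 9), matching the chain rule by hand
```

Backpropagation on a real network follows exactly this pattern: evaluate the nodes left to right, then accumulate derivatives right to left.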
  
  

Gradient Descent for Logistic Regression

One training sample:
   z=w^Tx+b
   \hat{y}=a=\sigma(z)
   L(a,y)=-(y\log(a)+(1-y)\log(1-a))
  Computation graph:
  (Figure: computation graph of the single-sample logistic regression loss)
  Derivatives:
   \frac{dL(a,y)}{da}=-\frac{y}{a}+\frac{1-y}{1-a}
   \frac{dL(a,y)}{dz}=\frac{dL}{da}\cdot\frac{da}{dz}=\left(-\frac{y}{a}+\frac{1-y}{1-a}\right)a(1-a)=a-y
   \frac{dL(a,y)}{dw_1}=x_1(a-y)
   \frac{dL(a,y)}{dw_2}=x_2(a-y)
   \frac{dL(a,y)}{db}=a-y
  
  This effectively treats logistic regression as a single-layer neural network: back propagation computes the derivative of the loss with respect to each parameter, so that gradient descent can then move the parameters toward the values that minimize the cost.
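Written out for a two-feature sample, the backward pass above is only a few lines. This is a sketch of my own, assuming a two-dimensional input (x1, x2) and a small sigmoid helper; the names dz, dw1, dw2, db mirror the notation in the notes.

```python
import numpy as np

def sigmoid(z):
    # sigma(z) = 1 / (1 + e^(-z)), assumed helper
    return 1.0 / (1.0 + np.exp(-z))

def single_sample_gradients(w1, w2, b, x1, x2, y):
    # Forward pass
    z = w1 * x1 + w2 * x2 + b                          # z = w^T x + b
    a = sigmoid(z)                                     # a = sigma(z) = y_hat
    loss = -(y * np.log(a) + (1 - y) * np.log(1 - a))  # L(a, y)
    # Backward pass, using the derivatives above
    dz = a - y        # dL/dz
    dw1 = x1 * dz     # dL/dw1
    dw2 = x2 * dz     # dL/dw2
    db = dz           # dL/db
    return loss, dw1, dw2, db

print(single_sample_gradients(0.1, -0.2, 0.0, 1.0, 2.0, 1))
```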
  
m training samples:
  J(w,b)=\frac{1}{m}\sum_{i=1}^{m}L(a^{(i)},y^{(i)})
  a^{(i)}=\hat{y}^{(i)}=\sigma(z^{(i)})=\sigma(w^Tx^{(i)}+b)
  \frac{\partial J(w,b)}{\partial w_1}=\frac{1}{m}\sum_{i=1}^{m}\frac{\partial L(a^{(i)},y^{(i)})}{\partial w_1}
  \frac{\partial J(w,b)}{\partial b}=\frac{1}{m}\sum_{i=1}^{m}\frac{\partial L(a^{(i)},y^{(i)})}{\partial b}
  
Logistic Regression Algorithm
  Repeat {
      J=0;\ dw_1=0;\ dw_2=0;\ db=0
      For i in range(m):
         z^{(i)}=w^Tx^{(i)}+b
         a^{(i)}=\sigma(z^{(i)})
         J\ +=\ -(y^{(i)}\log a^{(i)}+(1-y^{(i)})\log(1-a^{(i)}))
         dz^{(i)}=a^{(i)}-y^{(i)}
         dw_1\ +=\ x_1^{(i)}dz^{(i)}
         dw_2\ +=\ x_2^{(i)}dz^{(i)}
         db\ +=\ dz^{(i)}
      J\ /=\ m
      dw_1\ /=\ m
      dw_2\ /=\ m
      db\ /=\ m

      w_1=w_1-\alpha\,dw_1
      w_2=w_2-\alpha\,dw_2
      b=b-\alpha\,db
  }
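Below is a rough Python translation of this pseudocode (my own sketch, not the course's reference code). Both for-loops are kept explicit so each line maps onto a line of the algorithm above; X is assumed to have shape (m, 2), y shape (m,), and alpha / num_iters are illustrative hyperparameters.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_regression(X, y, alpha=0.1, num_iters=1000):
    m = X.shape[0]
    w1, w2, b = 0.0, 0.0, 0.0
    for _ in range(num_iters):                 # "Repeat { ... }"
        J, dw1, dw2, db = 0.0, 0.0, 0.0, 0.0
        for i in range(m):                     # loop over the m samples
            z = w1 * X[i, 0] + w2 * X[i, 1] + b
            a = sigmoid(z)
            J += -(y[i] * np.log(a) + (1 - y[i]) * np.log(1 - a))
            dz = a - y[i]
            dw1 += X[i, 0] * dz
            dw2 += X[i, 1] * dz
            db += dz
        J /= m                                 # average cost and gradients
        dw1 /= m
        dw2 /= m
        db /= m
        w1 -= alpha * dw1                      # gradient descent update
        w2 -= alpha * dw2
        b -= alpha * db
    return w1, w2, b, J

# Tiny synthetic check: points with x1 + x2 > 1 are labelled 1.
X = np.array([[0.1, 0.2], [0.4, 0.3], [0.9, 0.8], [0.7, 0.9]])
y = np.array([0, 0, 1, 1])
print(logistic_regression(X, y))
```

The explicit loops over samples and features are kept only to mirror the algorithm; in practice they would be replaced by vectorized NumPy operations over the whole training set.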
  
  To be continued…