Introduction to Recurrent Neural Networks

What is RNN

The networks are called recurrent because they perform the same computation for every element of an input sequence, and the output for each element depends not only on the current input but also on the results of all previous computations.

Why RNN

Many inputs carry sequential information:
Video Analysis
Speech Recognition
Machine Translation
RNNs have proven to perform very well on such problems

RNN Procedure

Sigmoid Gradient

The Vanishing Gradient Problem

Consider the recurrent network:

$h_{t} = σ (U x_{t} + V h_{t - 1})$

then,

$h_{3} = σ (U x_{3} + V σ (U x_{2} + V σ (U x_{1})))$
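The unrolled expression above can be checked numerically. The following is a minimal NumPy sketch, not a reference implementation: it assumes an initial hidden state $h_{0} = 0$ (so that $h_{1} = σ (U x_{1})$, as in the formula), and the sizes, random weights, and function names `sigmoid` and `rnn_forward` are illustrative choices that do not come from the slides.

```python
import numpy as np

def sigmoid(z):
    """Logistic function, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

def rnn_forward(xs, U, V):
    """Return [h_1, ..., h_T] for h_t = sigmoid(U x_t + V h_{t-1}), with h_0 = 0 (assumed)."""
    h = np.zeros(V.shape[0])
    states = []
    for x in xs:
        # The same U and V are reused at every time step.
        h = sigmoid(U @ x + V @ h)
        states.append(h)
    return states

# Illustrative sizes: 3 input features, 4 hidden units, 3 time steps.
rng = np.random.default_rng(0)
U = rng.normal(size=(4, 3))
V = rng.normal(size=(4, 4))
xs = [rng.normal(size=3) for _ in range(3)]

h1, h2, h3 = rnn_forward(xs, U, V)

# The loop result should match the nested formula for h_3.
h3_nested = sigmoid(U @ xs[2] + V @ sigmoid(U @ xs[1] + V @ sigmoid(U @ xs[0])))
```

Here `h3` and `h3_nested` should agree up to floating-point error, which shows that the recurrence and the nested formula describe the same computation.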

$\frac{\partial E_{3}}{\partial U} = \frac{\partial E_{3}}{\partial out_{3}} \frac{\partial out_{3}}{\partial h_{3}} \frac{\partial h_{3}}{\partial h_{2}} \frac{\partial h_{2}}{\partial h_{1}} \frac{\partial h_{1}}{\partial U}$
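Each factor $\partial h_{t} / \partial h_{t - 1}$ in this chain contains the sigmoid derivative $σ'(z) = σ(z)(1 - σ(z))$, which is at most 0.25, so the product tends to shrink as the chain gets longer. The sketch below illustrates this for a scalar network. The scalar setting, the weights `u = v = 1.0`, the random inputs, and the name `jacobian_product` are assumptions made only for this example.

```python
import numpy as np

def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

def jacobian_product(xs, u, v):
    """Scalar RNN h_t = sigmoid(u*x_t + v*h_{t-1}), h_0 = 0.

    Returns |dh_T/dh_1|, the product over t = 2..T of |sigmoid'(z_t) * v|.
    """
    h = 0.0
    prod = 1.0
    for t, x in enumerate(xs):
        h = sigmoid(u * x + v * h)
        if t > 0:
            # sigmoid'(z_t) = h_t * (1 - h_t), which never exceeds 0.25.
            prod *= abs(h * (1.0 - h) * v)
    return prod

rng = np.random.default_rng(0)
xs = rng.normal(size=50)

short_range = jacobian_product(xs[:3], u=1.0, v=1.0)  # 2 factors, as in the E_3 chain
long_range = jacobian_product(xs, u=1.0, v=1.0)       # 49 factors
```

With `v = 1.0` every factor is bounded by 0.25, so `long_range` cannot exceed `0.25 ** 49`. A gradient signal from 50 steps back is effectively lost, while the 3-step one is only moderately reduced.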

LSTM Cell

Input Gate
$g = \tanh (b^{g} + x_{t} U^{g} + h_{t - 1} V^{g})$
$i = σ (b^{i} + x_{t} U^{i} + h_{t - 1} V^{i})$
$out_{i} = g \circ i$
Forget Gate
$f = σ (b^{f} + x_{t} U^{f} + h_{t - 1} V^{f})$
$s_{t} = s_{t - 1} \circ f + g \circ i$
Output Gate
$o = σ (b^{o} + x_{t} U^{o} + h_{t - 1} V^{o})$
$h_{t} = \tanh (s_{t}) \circ o$
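The gate equations above translate almost line for line into code. The following NumPy sketch of one LSTM step follows the row-vector convention of the formulas ($x_{t} U$ rather than $U x_{t}$). The parameter dictionary layout, the helper `init_params`, the zero initial states, and the chosen sizes are hypothetical and serve only to make the example runnable.

```python
import numpy as np

def sigmoid(z):
    """Logistic function, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, s_prev, p):
    """One LSTM step; p maps names such as 'bg', 'Ug', 'Vg' to arrays."""
    g = np.tanh(p["bg"] + x @ p["Ug"] + h_prev @ p["Vg"])   # candidate input
    i = sigmoid(p["bi"] + x @ p["Ui"] + h_prev @ p["Vi"])   # input gate
    f = sigmoid(p["bf"] + x @ p["Uf"] + h_prev @ p["Vf"])   # forget gate
    s = s_prev * f + g * i                                  # new cell state
    o = sigmoid(p["bo"] + x @ p["Uo"] + h_prev @ p["Vo"])   # output gate
    h = np.tanh(s) * o                                      # new hidden state
    return h, s

def init_params(n_in, n_hidden, rng):
    """Hypothetical initializer: zero biases, small random weights."""
    p = {}
    for gate in "gifo":
        p["b" + gate] = np.zeros(n_hidden)
        p["U" + gate] = rng.normal(scale=0.1, size=(n_in, n_hidden))
        p["V" + gate] = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
    return p

rng = np.random.default_rng(0)
params = init_params(n_in=3, n_hidden=4, rng=rng)

# Run the cell over a sequence of 5 inputs, starting from zero states (assumed).
h, s = np.zeros(4), np.zeros(4)
for x in rng.normal(size=(5, 3)):
    h, s = lstm_step(x, h, s, params)
```

Note that the cell state `s` is updated additively, while the hidden state `h` is a gated, squashed view of it.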

Reducing The Problem

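Holding the gates fixed, the derivative of the cell state with respect to the previous cell state is just the forget gate, with no repeated sigmoid derivative or weight matrix in between:

$\frac{\partial s_{t}}{\partial s_{t - 1}} = f$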
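This relation can be checked with a finite difference on the cell-state update $s_{t} = s_{t - 1} \circ f + g \circ i$, treating the gate values as constants. The sketch below uses scalar values chosen purely for illustration (`f = 0.9`, a forget gate of `0.99` for the long-range comparison, and 49 steps); none of these numbers come from the slides.

```python
def cell_update(s_prev, f, g, i):
    """Cell-state update s_t = s_{t-1} * f + g * i for fixed gate values."""
    return s_prev * f + g * i

# Illustrative scalar gate values (assumed).
f, g, i = 0.9, 0.5, 0.7
s_prev, eps = 0.3, 1e-6

# Central finite difference of s_t with respect to s_{t-1}.
numeric = (cell_update(s_prev + eps, f, g, i)
           - cell_update(s_prev - eps, f, g, i)) / (2 * eps)

# Over many steps the cell-state path multiplies forget gates together.
steps = 49
lstm_path = 0.99 ** steps      # forget gate close to 1: the signal survives
sigmoid_bound = 0.25 ** steps  # upper bound on the plain-RNN product when V = 1
```

`numeric` should match `f` up to rounding error. When the forget gate stays close to 1, the product along the cell state stays far larger than the bound for the plain sigmoid recurrence, which is the sense in which the LSTM reduces the problem.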

Reference

http://adventuresinmachinelearning.com/recurrent-neural-networks-lstm-tutorial-tensorflow/
Deep Learning with Tensorflow
