CMU 11-785 L19 Hopfield network
Hopfield Net
- So far, the neural networks we have used for computation have all been feedforward structures
Loopy network
- Each neuron is a perceptron with +1/-1 output
- Every neuron receives input from every other neuron
- Every neuron outputs signals to every other neuron
- At each time, each neuron $i$ receives a “field” $\sum_{j \neq i} w_{ji} y_j + b_i$
- If the sign of the field matches its own sign, it does not respond
- If the sign of the field opposes its own sign, it “flips” to match the sign of the field
- Which will change the field at other nodes
- Which may then flip… and so on…
Flip behavior
- Let $y_i^{-}$ be the output of the $i$-th neuron just before it responds to the current field
- Let $y_i^{+}$ be the output of the $i$-th neuron just after it responds to the current field
- If $y_i^{-}=\operatorname{sign}\left(\sum_{j \neq i} w_{ji} y_j+b_i\right)$, then $y_i^{+}=y_i^{-}$
    - If the sign of the field matches its own sign, the neuron does not flip
    - $y_i^{+}\left(\sum_{j \neq i} w_{ji} y_j+b_i\right)-y_i^{-}\left(\sum_{j \neq i} w_{ji} y_j+b_i\right)=0$
- If $y_i^{-} \neq \operatorname{sign}\left(\sum_{j \neq i} w_{ji} y_j+b_i\right)$, then $y_i^{+}=-y_i^{-}$
    - $y_i^{+}\left(\sum_{j \neq i} w_{ji} y_j+b_i\right)-y_i^{-}\left(\sum_{j \neq i} w_{ji} y_j+b_i\right)=2 y_i^{+}\left(\sum_{j \neq i} w_{ji} y_j+b_i\right)$
    - This term is always positive, since $y_i^{+}$ now matches the sign of the field
- Every flip of a neuron is therefore guaranteed to locally increase $y_i\left(\sum_{j \neq i} w_{ji} y_j+b_i\right)$
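A minimal NumPy sketch of this asynchronous flip rule (the names `flip_unit`, `W`, `b`, `y` are illustrative, not from the lecture), including a check that the local quantity $y_i\left(\sum_{j \neq i} w_{ji} y_j+b_i\right)$ never decreases:

```python
import numpy as np

def flip_unit(W, b, y, i):
    """Asynchronously update unit i of a +/-1 state vector y.

    W is assumed symmetric with a zero diagonal, so W[i] @ y equals
    sum_{j != i} w_ji * y_j.  Returns the (possibly unchanged) state.
    """
    field = W[i] @ y + b[i]
    before = y[i] * field                  # local quantity before responding
    if field != 0 and np.sign(field) != y[i]:
        y = y.copy()
        y[i] = -y[i]                       # flip to match the sign of the field
    after = y[i] * field                   # local quantity after responding
    assert after >= before                 # a flip can only increase it
    return y
```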
Globally
- Consider the following sum across all nodes
$$D\left(y_1, y_2, \ldots, y_N\right)=\sum_i y_i\left(\sum_{j \neq i} w_{ji} y_j+b_i\right)=\sum_{i, j \neq i} w_{ij} y_i y_j+\sum_i b_i y_i$$
- Assume $w_{ii}=0$ (and, as is standard for a Hopfield net, symmetric weights $w_{ij}=w_{ji}$)
- For any unit $k$ that “flips” because of the local field
$$\Delta D\left(y_k\right)=D\left(y_1, \ldots, y_k^{+}, \ldots, y_N\right)-D\left(y_1, \ldots, y_k^{-}, \ldots, y_N\right)$$
$$\Delta D\left(y_k\right)=\left(y_k^{+}-y_k^{-}\right)\left(\sum_{j \neq k} w_{jk} y_j+b_k\right)$$
- This is always positive!
- Every flip of a unit results in an increase in $D$
Overall
- Flipping a unit will result in an increase (non-decrease) of
$$D=\sum_{i, j \neq i} w_{ij} y_i y_j+\sum_i b_i y_i$$
- $D$ is bounded:
$$D_{\max}=\sum_{i, j \neq i}\left|w_{ij}\right|+\sum_i\left|b_i\right|$$
- The minimum increment of $D$ in a flip is
$$\Delta D_{\min}=\min_{i,\{y_i,\, i=1 \ldots N\}} 2\left|\sum_{j \neq i} w_{ji} y_j+b_i\right|$$
- Any sequence of flips must therefore converge in a finite number of steps
- Think of this as an infinitely deep network in which every layer has identical weights
- Find the layer after which the output no longer changes!
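Making the convergence argument explicit: every flip raises $D$ by at least $\Delta D_{\min}$ while $D$ can never exceed $D_{\max}$, so starting from a configuration with value $D_0$,
$$\text{number of flips} \;\leq\; \frac{D_{\max}-D_0}{\Delta D_{\min}},$$
which is finite whenever $\Delta D_{\min}>0$.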
The Energy of a Hopfield Net
- Define the Energy of the network as
$$E=-\sum_{i, j \neq i} w_{ij} y_i y_j-\sum_i b_i y_i$$
- This is just the negative of $D$
- The evolution of a Hopfield network constantly decreases its energy
- This is analogous to the potential energy of a spin glass (magnetic dipoles)
- The system will evolve until the energy hits a local minimum
- For ease of understanding, we drop the bias term in what follows
- The network will evolve until it arrives at a local minimum in the energy contour
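A minimal sketch of this energy in NumPy (the function name `energy` and the variables `W`, `b`, `y` are illustrative). It implements the ordered-pair sum above, which counts each pair twice; this is only a constant factor relative to the $\sum_{j>i}$ form used later and does not change where the minima are:

```python
import numpy as np

def energy(W, b, y):
    """Energy E = -sum_{i, j != i} w_ij y_i y_j - sum_i b_i y_i.

    W is assumed symmetric with a zero diagonal, and y is a +/-1 vector.
    """
    return -(y @ W @ y) - b @ y
```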
Content-addressable memory
- Each of the minima is a “stored” pattern
- If the network is initialized close to a stored pattern, it will inevitably evolve to the pattern
- This is a content-addressable memory
- Recall memory content from partial or corrupt values
- Also called associative memory
- Evolve and recall pattern by content, not by location
Evolution
- The network will evolve until it arrives at a local minimum in the energy contour
- We proved that every change in the network results in a decrease in energy
- So the path to the energy minimum is monotonic
For a 2-neuron net
- Symmetric: the energy is unchanged if every output flips sign
- $-\frac{1}{2} \mathbf{y}^{T} \mathbf{W} \mathbf{y}=-\frac{1}{2}(-\mathbf{y})^{T} \mathbf{W}(-\mathbf{y})$
- If $\hat{y}$ is a local minimum, so is $-\hat{y}$
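A quick numerical check of this sign-flip symmetry (a sketch; the small random weight matrix and the lambda `E` are illustrative, using the bias-free $-\frac{1}{2}\mathbf{y}^{T}\mathbf{W}\mathbf{y}$ form above):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(2, 2))
W = (W + W.T) / 2                  # symmetric weights
np.fill_diagonal(W, 0)             # no self-connections

E = lambda y: -0.5 * y @ W @ y     # bias-free energy, as in the line above
y = np.array([1.0, -1.0])
print(np.isclose(E(y), E(-y)))     # True: y and -y have the same energy
```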
Computational algorithm
- Very simple
- Updates can be done sequentially, or all at once
- Convergence: stop when the state no longer changes significantly
$$E=-\sum_i \sum_{j>i} w_{ji} y_j y_i$$
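A minimal sketch of this procedure in NumPy (the function name `hopfield_evolve` and the stopping rule, a full sweep with no flips, are illustrative choices):

```python
import numpy as np

def hopfield_evolve(W, y0, b=None, max_sweeps=100):
    """Evolve a +/-1 state by sequential (asynchronous) updates.

    W is assumed symmetric with a zero diagonal.  Stops after a full sweep
    in which no unit flips, i.e. the state has reached a local energy minimum.
    """
    y = y0.copy()
    b = np.zeros(len(y)) if b is None else b
    for _ in range(max_sweeps):
        changed = False
        for i in np.random.permutation(len(y)):   # sequential, random order
            field = W[i] @ y + b[i]
            new = 1 if field >= 0 else -1          # sign of the local field
            if new != y[i]:
                y[i] = new
                changed = True
        if not changed:
            break
    return y
```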
Issues
Store a specific pattern
- A network can store multiple patterns
- Every stable point is a stored pattern
- So we could design the net to store multiple patterns
- Remember that every stored pattern $P$ is actually two stored patterns, $P$ and $-P$
- How can the quadratic energy have multiple minima? (A quadratic function over $\mathbb{R}^N$ cannot have multiple isolated minima)
- Because the outputs are constrained to $\{-1,+1\}$, the energy is only evaluated at the corners of the hypercube, where multiple local minima are possible
- Hebbian learning: $w_{ji}=y_j y_i$ (see the sketch after this list)
- Design a stationary pattern
- $\operatorname{sign}\left(\sum_{j \neq i} w_{ji} y_j\right)=y_i \quad \forall i$
- So
- $\operatorname{sign}\left(\sum_{j \neq i} w_{ji} y_j\right)=\operatorname{sign}\left(\sum_{j \neq i} y_j y_i y_j\right)$
- $=\operatorname{sign}\left(\sum_{j \neq i} y_j^2 y_i\right)=\operatorname{sign}\left(y_i\right)=y_i$
- Energy
- $E=-\sum_i \sum_{j<i} w_{ji} y_j y_i=-\sum_i \sum_{j<i} y_i^2 y_j^2=-\sum_i \sum_{j<i} 1=-0.5\, N(N-1)$
- This is the lowest possible energy value for the network
- The stored pattern has the lowest energy
- No matter where it begins, the network will evolve into the stored pattern (the lowest-energy state, shown in yellow in the lecture figure)
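A minimal sketch verifying the two claims above numerically (the variable names and the 16-neuron size are illustrative): it builds the Hebbian weights for one random pattern, checks that the pattern is stationary, and checks that its energy equals $-0.5\,N(N-1)$:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 16
y = rng.choice([-1, 1], size=N)              # the pattern to store

W = np.outer(y, y).astype(float)             # Hebbian learning: w_ji = y_j * y_i
np.fill_diagonal(W, 0)                       # enforce w_ii = 0

field = W @ y                                # field = (N - 1) * y at every neuron
assert np.array_equal(np.sign(field), y)     # the stored pattern is stationary

E = -0.5 * y @ W @ y                         # E = -sum_i sum_{j<i} w_ji y_j y_i
assert np.isclose(E, -0.5 * N * (N - 1))     # the lowest possible energy
```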
How many patterns can we store?
- To store more than one pattern
$$w_{ji}=\sum_{\mathbf{y}_p \in\left\{\mathbf{y}_p\right\}} y_i^p y_j^p$$
- $\{\mathbf{y}_p\}$ is the set of patterns to store
- The superscript $p$ indexes the specific pattern
- Hopfield: a network of $N$ neurons can store up to ~$0.15N$ patterns through Hebbian learning (derivation in the slides)
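A minimal end-to-end sketch of storing several patterns with this rule and recalling one from a corrupted copy (content-addressable recall); `hopfield_evolve` refers to the earlier sketch, and the sizes and seed are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 64, 4                                    # 4 patterns: well below 0.15 * 64
patterns = rng.choice([-1, 1], size=(K, N))

# Hebbian weights summed over all stored patterns, zero diagonal
W = sum(np.outer(p, p) for p in patterns).astype(float)
np.fill_diagonal(W, 0)

corrupted = patterns[0].copy()
flipped = rng.choice(N, size=5, replace=False)
corrupted[flipped] *= -1                         # corrupt 5 of the 64 bits

recalled = hopfield_evolve(W, corrupted)         # evolve to a nearby energy minimum
# At this low load, recall usually (though not always) returns the stored
# pattern or its negation.
ok = np.array_equal(recalled, patterns[0]) or np.array_equal(recalled, -patterns[0])
print(ok)
```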
Orthogonal / non-orthogonal patterns
- Orthogonal patterns
    - Patterns are local minima (stationary and stable)
    - No other local minima exist
    - But the patterns are perfectly confusable for recall
- Non-orthogonal patterns
    - Patterns are local minima (stationary and stable)
    - No other local minima exist
    - Actual wells form around the patterns
    - Patterns may be perfectly recalled! (Note $K > 0.14N$)
- Two orthogonal 6-bit patterns
    - Perfectly stationary and stable
    - Several spurious “fake-memory” local minima…
Observations
- Many “parasitic” patterns
    - Undesired patterns that also become stable states or attractors
- Patterns that are non-orthogonal are easier to remember
    - I.e. patterns that are closer are easier to remember than patterns that are farther apart!
- It seems possible to store $K > 0.14N$ patterns
    - I.e. obtain a weight matrix $\mathbf{W}$ such that $K > 0.14N$ patterns are stationary
    - Possible to make more than $0.14N$ patterns at least 1-bit stable