Mathematics Basics - Linear Algebra (Matrix Part 3)

Eigen Theory

Define Eigen Vectors and Eigen Values

We have seen previously how to transform a single vector to a new vector space by matrix multiplication. In fact, we can also apply a transformation to every vector in a vector space. Scaling, shearing, rotation and the other types of linear transformation we have learned can all be applied to multiple vectors at the same time. In order to differentiate one transformation from another, we have to find some characteristics of each transformation. One such characteristic is what we will discover below through Eigen vectors and Eigen values.

Let’s start with a transformation in 2-D space. Such a transformation can be visualized as follows. We draw a square centered at the origin with its sides intersecting the axes at the basis vectors. Then we perform a transformation by distorting the shape of this square according to the change in basis vectors. The resultant shape shows the change of vectors in the original vector space. For example, we can scale every vector in the original square vertically by a factor of 2 as shown below.

[Figure: vertical scaling by a factor of 2 applied to vectors $v_1$, $v_2$ and $v_3$]

Vectors $v_1$, $v_2$ and $v_3$ have been transformed to $v_1'$, $v_2'$ and $v_3'$ respectively. It can be seen that $v_1$ and $v_2$ are special because they still lie on the same line after the scaling transformation. These special vectors are the characteristics of this transformation. They are also called Eigen vectors. On the other hand, $v_3$ originally sits at 45° to the axes but is now changed to a different angle. It is thus not an Eigen vector of this scaling transformation. In addition, the length of $v_1$ is doubled after the transformation, so $v_1$ has an Eigen value of 2. The length of $v_2$ is unchanged, hence the Eigen value for $v_2$ is 1.

We can look at one more example to find the Eigen vectors of a shearing transformation below.

[Figure: horizontal shearing transformation applied to vectors $v_1$, $v_2$ and $v_3$]

For shearing in the horizontal direction, only $v_2$ remains in its original direction. Both $v_1$ and $v_3$ have changed direction after the transformation. Therefore $v_2$ is the Eigen vector of this transformation, with an Eigen value of 1.

Lastly, we can see what happens if we apply a rotation transformation.

[Figure: rotation transformation applied to vectors $v_1$, $v_2$ and $v_3$]

This time no vector remains in its original direction, thus there is no Eigen vector in this transformation.

The concept of Eigen vectors and Eigen values is the same for transformations in three or more dimensions. In general, we define a vector as an Eigen vector if, after the linear transformation, it still lies along exactly the same or the opposite direction as before. The stretch or shrink in length of an Eigen vector is defined as its Eigen value. Therefore, the combination of Eigen vectors and their associated Eigen values can help us characterize a linear transformation in any number of dimensions.

Compute Eigen Vectors and Eigen Values Mathematically

It’s relatively easy to deduce the Eigen vectors visually when only a simple transformation is performed. However, it quickly becomes challenging when we have a composite transformation or a transformation in more than two dimensions. Therefore, we need an analytical method for finding the Eigen vectors of a more complicated transformation.

It turns out all Eigen vectors of a transformation must satisfy the following equation.

$$
Ax = \lambda x \tag{1}
$$

where $x$ is the Eigen vector, $A$ is an $n$ by $n$ transformation matrix and $\lambda$ is a scalar (the Eigen value).

This equation tells us that when an Eigen vector is multiplied by the transformation matrix, the result is just a scalar multiple of that Eigen vector.

We can rearrange equation (1) to get

$$
(A-\lambda I)x=0 \tag{2}
$$

The identity matrix $I$ is introduced here so we can take a proper subtraction between the matrix $A$ and the scalar $\lambda$.

One obvious solution to equation (2) is $x=0$. However, this is a trivial solution because the point at the origin has no magnitude or direction and thus is not transformed. For a non-trivial solution, the matrix $A-\lambda I$ must squash some nonzero vector $x$ down to the zero vector, which means $A-\lambda I$ must collapse space into a lower dimension. This happens exactly when its determinant is 0, i.e., $det(A-\lambda I)=0$.

It’s generally hard to calculate a matrix determinant. However, we learned previously that there is a shortcut for the determinant of a 2 by 2 matrix. For a matrix $A=\begin{pmatrix}a&b\\c&d\end{pmatrix}$, we can get,

$$
\begin{aligned}
det(A-\lambda I)&=0 \\
det\left(\begin{pmatrix}a&b\\c&d\end{pmatrix}-\begin{pmatrix}\lambda&0\\0&\lambda\end{pmatrix}\right)&=0 \\
det\left(\begin{pmatrix}a-\lambda & b\\c & d-\lambda\end{pmatrix}\right)&=0 \\
(a-\lambda)(d-\lambda)-bc&=0 \\
\lambda^2-(a+d)\lambda+ad-bc&=0
\end{aligned}
\tag{3}
$$

The solutions for $\lambda$ in equation (3) are the Eigen values. We can then use each Eigen value to find its corresponding Eigen vectors.
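To make equation (3) concrete, here is a minimal Python sketch (the helper `eigenvalues_2x2` is our own illustrative function, not from any library) that solves the characteristic polynomial with the quadratic formula:

```python
import math

def eigenvalues_2x2(a, b, c, d):
    """Solve equation (3), lambda^2 - (a+d)*lambda + (ad-bc) = 0,
    for the matrix [[a, b], [c, d]]. Returns real Eigen values only."""
    trace = a + d                    # a + d
    det = a * d - b * c              # ad - bc
    disc = trace ** 2 - 4 * det      # discriminant of the quadratic
    if disc < 0:
        return []                    # no real Eigen values
    root = math.sqrt(disc)
    return [(trace + root) / 2, (trace - root) / 2]

print(eigenvalues_2x2(1, 1, 0, 1))   # horizontal shear -> [1.0, 1.0]
print(eigenvalues_2x2(0, -1, 1, 0))  # rotation -> [] (no real solution)
```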

Take our vertical scaling transformation as an example. The transformation matrix is represented by $A=\begin{pmatrix}1&0\\0&2\end{pmatrix}$. We solve for $det(A-\lambda I) = 0$.

$$
\begin{aligned}
det\left(\begin{pmatrix}1&0\\0&2\end{pmatrix}-\lambda I\right)&=0 \\
\lambda^2-(1+2)\lambda+1*2-0*0&=0\\
\lambda^2-3\lambda+2&=0
\end{aligned}
$$

The solution is $\lambda=1$ or $\lambda=2$.

When $\lambda = 1$, substituting $A$ and $\lambda$ into equation (2) we get

$$
\begin{aligned}
\left(\begin{pmatrix}1&0\\0&2\end{pmatrix}-1*\begin{pmatrix}1&0\\0&1\end{pmatrix}\right)*\begin{pmatrix}x_1\\x_2\end{pmatrix}&=0 \\
\begin{pmatrix}0&0\\0&1\end{pmatrix}*\begin{pmatrix}x_1\\x_2\end{pmatrix}&=0 \\
\begin{pmatrix}0\\x_2\end{pmatrix}&=0
\end{aligned}
$$

This shows that any vector with $x_2 = 0$, i.e. along the horizontal axis, is an Eigen vector. The Eigen value is 1. This aligns with vector $v_2$ in our previous illustration.

When $\lambda = 2$, substituting $A$ and $\lambda$ into equation (2) we get

$$
\begin{aligned}
\left(\begin{pmatrix}1&0\\0&2\end{pmatrix}-2*\begin{pmatrix}1&0\\0&1\end{pmatrix}\right)*\begin{pmatrix}x_1\\x_2\end{pmatrix}&=0 \\
\begin{pmatrix}-1&0\\0&0\end{pmatrix}*\begin{pmatrix}x_1\\x_2\end{pmatrix}&=0 \\
\begin{pmatrix}-x_1\\0\end{pmatrix}&=0
\end{aligned}
$$

This shows that any vector with $x_1=0$, i.e. along the vertical axis, is an Eigen vector. The Eigen value is 2. This aligns with vector $v_1$ in our previous illustration.

Let’s do another example with the rotation transformation. The transformation matrix is represented by $A=\begin{pmatrix}0&-1\\1&0\end{pmatrix}$. Again, we solve for $det(A-\lambda I) = 0$.

$$
\begin{aligned}
det\left(\begin{pmatrix}0&-1\\1&0\end{pmatrix}-\lambda I\right)&=0 \\
\lambda^2-(0+0)\lambda+0*0-(-1)*1&=0\\
\lambda^2+1&=0
\end{aligned}
$$

There is no real number solution for this equation. Therefore, no Eigen value or Eigen vector can be found for the rotation transformation. This result agrees with what we observed intuitively with the previous illustration.

Of course, this method of finding Eigen values and Eigen vectors gets much more difficult in higher dimensional spaces, as it involves solving for the roots of a polynomial equation of order $n$, the dimension of matrix $A$. We then turn to computers, which employ iterative methods to solve this problem. Nonetheless, understanding the underlying concepts of this analytical method is still valuable.
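In practice we would indeed hand this to a library routine. As a quick sketch, NumPy’s `np.linalg.eig` (which relies on iterative LAPACK routines under the hood) reproduces both of our worked examples:

```python
import numpy as np

# Vertical scaling by a factor of 2
A = np.array([[1.0, 0.0],
              [0.0, 2.0]])
values, vectors = np.linalg.eig(A)
print(values)   # [1. 2.]
print(vectors)  # columns are the Eigen vectors: (1, 0) and (0, 1)

# 90-degree rotation: no real Eigen values
R = np.array([[0.0, -1.0],
              [1.0,  0.0]])
print(np.linalg.eig(R)[0])  # [0.+1.j 0.-1.j] -- complex, as expected
```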

Use Eigen Vectors to Perform Repeated Multiplication

We shall look at one important application of Eigen vectors: repeated multiplication of a matrix by itself. This happens, for example, when we apply the same transformation matrix to a vector many times. In order to apply a transformation matrix $T$ to a vector $v$, we perform one multiplication, $T*v$. If we apply the same transformation again, we get $T*T*v$ or $T^2*v$. Consequently, if we apply the same transformation $n$ times we get $T^n*v$.
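As a baseline, here is a naive sketch (assuming NumPy and an arbitrary example matrix) that computes $T^n*v$ by simply looping over the matrix-vector product:

```python
import numpy as np

def apply_n_times(T, v, n):
    """Naively compute T^n * v with n successive matrix-vector products."""
    for _ in range(n):
        v = T @ v
    return v

T = np.array([[1.0, 1.0],
              [0.0, 2.0]])
v = np.array([1.0, 1.0])
print(apply_n_times(T, v, 3))  # [8. 8.], same as np.linalg.matrix_power(T, 3) @ v
```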

It is in general expensive to perform matrix multiplication repeatedly, because the rows and columns of the matrix have to be multiplied and summed over and over. There is one exception though. If matrix $T$ is a diagonal matrix (all the terms in the matrix are zero except for those along the leading diagonal), repeated multiplication of $T$ can be simplified.

Let’s take an example of $T=\begin{pmatrix}2&0\\0&3\end{pmatrix}$ and try to multiply it by itself repeatedly.

$$
\begin{aligned}
T^2&=\begin{pmatrix}2&0\\0&3\end{pmatrix}*\begin{pmatrix}2&0\\0&3\end{pmatrix}=\begin{pmatrix}2^2&0\\0&3^2\end{pmatrix}\\
T^3&=\begin{pmatrix}2^2&0\\0&3^2\end{pmatrix}*\begin{pmatrix}2&0\\0&3\end{pmatrix}=\begin{pmatrix}2^3&0\\0&3^3\end{pmatrix}
\end{aligned}
$$

Therefore, we can derive a simplified formula to raise the power of a diagonal matrix $T$.

$$
\begin{pmatrix}
t_{11}&0&\cdots&0\\
0&t_{22}&\cdots&0\\
\vdots&\vdots&\ddots&\vdots\\
0&0&\cdots&t_{ii}
\end{pmatrix}^{n}
=\begin{pmatrix}
(t_{11})^n&0&\cdots&0\\
0&(t_{22})^n&\cdots&0\\
\vdots&\vdots&\ddots&\vdots\\
0&0&\cdots&(t_{ii})^n
\end{pmatrix}
$$

Note this is true regardless of the dimension of $T$.
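A quick numerical check of this shortcut, assuming NumPy:

```python
import numpy as np

# Raising a diagonal matrix to a power is the same as raising each
# of its diagonal entries to that power.
D = np.diag([2.0, 3.0])
print(np.linalg.matrix_power(D, 5))   # [[ 32.   0.] [  0. 243.]]
print(np.diag([2.0 ** 5, 3.0 ** 5]))  # identical result
```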

The task then becomes how to convert a matrix from its original form to a diagonal form. Let’s take an example of a 2 by 2 matrix $T=\begin{pmatrix}1&1\\0&2\end{pmatrix}$. Solving equation (3) gives the Eigen values $\lambda_1=1$ and $\lambda_2=2$. A diagonal matrix $D$ can be constructed from these 2 Eigen values, $D=\begin{pmatrix}\lambda_1&0\\0&\lambda_2\end{pmatrix}$. With the Eigen values, we can also work out 2 corresponding Eigen vectors, $\begin{pmatrix}1\\0\end{pmatrix}$ and $\begin{pmatrix}1\\1\end{pmatrix}$. These become the column vectors of an Eigen matrix $C=\begin{pmatrix}1&1\\0&1\end{pmatrix}$. It turns out that the multiplication result of $T*C$ is the same as $C*D$ due to the special properties of Eigen vectors.

$$
\begin{aligned}
T*C&=\begin{pmatrix}1&1\\0&2\end{pmatrix}*\begin{pmatrix}1&1\\0&1\end{pmatrix}\\
&=\begin{pmatrix}1&2\\0&2\end{pmatrix}\\
&=\begin{pmatrix}1&1\\0&1\end{pmatrix}*\begin{pmatrix}1&0\\0&2\end{pmatrix}\\
&=C*D
\end{aligned}
$$

Since $T*C=C*D$, we can get an expression for $T$ alone.

$$
\begin{aligned}
T*C&=C*D\\
T&=C*D*C^{-1}
\end{aligned}
\tag{4}
$$

In equation (4), we have $C$ as an Eigen matrix, $D$ as a diagonal matrix and $C^{-1}$ as the inverse of $C$. So, to raise the power of matrix $T$,

$$
\begin{aligned}
T^2&=C*D*C^{-1}*C*D*C^{-1}\\
&=C*D*I*D*C^{-1}\\
&=C*D^2*C^{-1}
\end{aligned}
\tag{5}
$$

$$
\begin{aligned}
T^3&=C*D^2*C^{-1}*C*D*C^{-1}\\
&=C*D^2*I*D*C^{-1}\\
&=C*D^3*C^{-1}
\end{aligned}
\tag{6}
$$

$$
\vdots\\
T^n=C*D^n*C^{-1}
$$

Let’s verify this result by computing $T^2$ and $T^3$ directly.

$$
\begin{aligned}
T^2=\begin{pmatrix}1&1\\0&2\end{pmatrix}*\begin{pmatrix}1&1\\0&2\end{pmatrix}=\begin{pmatrix}1&3\\0&4\end{pmatrix}\\
T^3=\begin{pmatrix}1&3\\0&4\end{pmatrix}*\begin{pmatrix}1&1\\0&2\end{pmatrix}=\begin{pmatrix}1&7\\0&8\end{pmatrix}
\end{aligned}
$$

Substituting $C=\begin{pmatrix}1&1\\0&1\end{pmatrix}$, $D=\begin{pmatrix}1&0\\0&2\end{pmatrix}$ and $C^{-1}=\begin{pmatrix}1&-1\\0&1\end{pmatrix}$ into equations (5) and (6), we can get the same result.

$$
\begin{aligned}
T^2=C*D^2*C^{-1}=\begin{pmatrix}1&1\\0&1\end{pmatrix}*\begin{pmatrix}1&0\\0&2\end{pmatrix}^2*\begin{pmatrix}1&-1\\0&1\end{pmatrix}=\begin{pmatrix}1&4\\0&4\end{pmatrix}*\begin{pmatrix}1&-1\\0&1\end{pmatrix}=\begin{pmatrix}1&3\\0&4\end{pmatrix}\\
T^3=C*D^3*C^{-1}=\begin{pmatrix}1&1\\0&1\end{pmatrix}*\begin{pmatrix}1&0\\0&2\end{pmatrix}^3*\begin{pmatrix}1&-1\\0&1\end{pmatrix}=\begin{pmatrix}1&8\\0&8\end{pmatrix}*\begin{pmatrix}1&-1\\0&1\end{pmatrix}=\begin{pmatrix}1&7\\0&8\end{pmatrix}
\end{aligned}
$$

This confirms our formula $T^n=C*D^n*C^{-1}$ works as expected.

Therefore, if we need to apply a transformation matrix $T$ to a vector $v$ repeatedly, we can write

$$
T^n*v=C*D^n*C^{-1}*v
$$

In a step-by-step manner, here is what we need to do to calculate $T^nv$, as sketched in code after this list.

  1. Calculate the Eigen vectors and Eigen values of matrix $T$.
  2. Construct an Eigen matrix $C$ from the Eigen vectors, which defines a new vector space.
  3. Construct a diagonal matrix $D$ from the Eigen values.
  4. Find the inverse of matrix $C$.
  5. Transform vector $v$ from its original vector space into the new vector space, $v_C=C^{-1}*v$.
  6. Multiply the transformed vector by the diagonal matrix $D$ repeatedly in the new vector space, $[T^nv]_C=D^n*v_C$.
  7. Transform the resultant vector back to the original vector space, $T^nv=C*[T^nv]_C$.
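Here is a minimal NumPy sketch of these seven steps, using our earlier example $T=\begin{pmatrix}1&1\\0&2\end{pmatrix}$. Note that `np.linalg.eig` returns normalized Eigen vectors, so its $C$ differs from our hand-built Eigen matrix by a scaling of each column, which cancels out in $C*D^n*C^{-1}$.

```python
import numpy as np

T = np.array([[1.0, 1.0],
              [0.0, 2.0]])
v = np.array([1.0, 1.0])
n = 3

# Steps 1-3: Eigen values and vectors, Eigen matrix C, diagonal matrix D
eigenvalues, C = np.linalg.eig(T)
D = np.diag(eigenvalues)

# Step 4: inverse of C
C_inv = np.linalg.inv(C)

# Step 5: transform v into the Eigen basis
v_C = C_inv @ v

# Step 6: apply the diagonal matrix n times (just powers of the diagonal)
result_C = np.linalg.matrix_power(D, n) @ v_C

# Step 7: transform the result back to the original basis
print(C @ result_C)                      # [8. 8.]
print(np.linalg.matrix_power(T, n) @ v)  # matches: [8. 8.]
```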

Of course, there is little efficiency gained if we are only applying a transformation 2 or 3 times. The extra steps of computing the Eigen vectors and the matrix inverse are not cheap either. Diagonalizing a matrix before raising its power becomes much handier when we need to multiply a matrix by itself a large number of times. One good example is the famous PageRank algorithm used to rank Google’s search results.

PageRank: An Application of Repeated Matrix Multiplication

To illustrate the idea of the PageRank algorithm, we can make a mini “network” as follows. Each circle A, B, C and D represents a webpage, and an arrow between them shows a page link from one to the other. Our goal is to rank the relevance of these webpages for people who have received them in their search results.

[Figure: a mini network of webpages A, B, C and D with arrows showing the links between them]

The central idea underlying the PageRank algorithm is that the importance of a webpage is related to its links to and from other webpages. We can represent each webpage as a vector in which each row is either 0 or 1 depending on whether there is a link to the corresponding page. For example, webpage A links to webpages B, C and D, but not to itself. This can be represented as the vector $\begin{pmatrix}0\\1\\1\\1\end{pmatrix}$. We then normalize this vector to ensure the sum of all rows equals 1. We call it the link vector of webpage A and denote it as $l_A$.

$$
l_A=\begin{pmatrix}0\\\frac{1}{3}\\\frac{1}{3}\\\frac{1}{3}\end{pmatrix}
$$

Similarly, we can obtain the link vectors for webpages B, C and D.

$$
l_B=\begin{pmatrix}\frac{1}{2}\\0\\0\\\frac{1}{2}\end{pmatrix},\quad
l_C=\begin{pmatrix}0\\0\\0\\1\end{pmatrix},\quad
l_D=\begin{pmatrix}0\\\frac{1}{2}\\\frac{1}{2}\\0\end{pmatrix}
$$

A link matrix $L$ is formed by the link vectors of all webpages.

$$
L=\Bigg(\bigg(l_A\bigg)\bigg(l_B\bigg)\bigg(l_C\bigg)\bigg(l_D\bigg)\Bigg)
=\begin{pmatrix}0&\frac{1}{2}&0&0\\\frac{1}{3}&0&0&\frac{1}{2}\\\frac{1}{3}&0&0&\frac{1}{2}\\\frac{1}{3}&\frac{1}{2}&1&0\end{pmatrix}
$$

This matrix $L$ summarizes the probability of going from one page to another. For example, the probability of going to webpage A is $\frac{1}{2}$ if we are at webpage B, and 0 if we are at any other page. And the probability of going to webpage B is $\frac{1}{3}$ when we are at page A and $\frac{1}{2}$ when we are at page D.

We define the rank of a webpage as the long-run probability of ending up on that page after some random clicks through page links. This is a self-referential problem, as the rank of every page depends on all the other pages. To calculate the rank of webpage A, we need to know the ranks of all the other pages, whether those pages link to webpage A, and how many outgoing links they have in total. We can define $r_A$ to represent the rank of webpage A as,

$$
r_A=\sum_{j=1}^{n}L_{Aj}r_j
$$

where $n$ is the total number of webpages, $L_{Aj}$ is the probability of going to webpage A from webpage $j$ and $r_j$ is the rank of webpage $j$.

Moreover, the rank vector $r$ for all webpages can be expressed as

$$
r=Lr
$$

The rank vector $r$ is updated by matrix $L$ every time we perform this multiplication. Therefore, given the page ranks at timestamp $t$, the page ranks at the next timestamp $t+1$ would be

$$
r^{t+1}=Lr^t
$$

This is an iterative process. We can do the multiplication repeatedly, and if after many rounds of multiplication $r$ stops changing, we have reached the constant relationship $r=Lr$. Thinking back to our Eigen vector definition: when $r=Lr$ holds true for all further multiplication updates, $r$ is exactly an Eigen vector of matrix $L$ with an Eigen value of 1. This is in fact another way to find the Eigen vectors of a matrix.

There is one more thing to consider before we kick off this iterative process. We need to assign an initial value to $r$ at time 0. In this example, we simply assume that all webpages are ranked equally at the beginning. Since there are 4 pages in total,

$$
r^0=\begin{pmatrix}\frac{1}{4}\\\frac{1}{4}\\\frac{1}{4}\\\frac{1}{4}\end{pmatrix}
$$
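A minimal power-iteration sketch of this process, assuming NumPy (with no damping factor, matching the simplified setup here):

```python
import numpy as np

# Link matrix L: column j holds the outgoing-link probabilities of page j
L = np.array([[0,   1/2, 0, 0  ],
              [1/3, 0,   0, 1/2],
              [1/3, 0,   0, 1/2],
              [1/3, 1/2, 1, 0  ]])

r = np.full(4, 1/4)             # r^0: all 4 pages ranked equally
for _ in range(1000):
    r_next = L @ r              # r^{t+1} = L * r^t
    if np.allclose(r_next, r):  # stop once r no longer changes
        break
    r = r_next

print(r.round(2))               # [0.12 0.24 0.24 0.4 ]
```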

Letting the computer update the values of $r$ iteratively, we soon see that, with our mini network, vector $r$ converges to

$$
r=\begin{pmatrix}0.12\\0.24\\0.24\\0.4\end{pmatrix}
$$

The result shows that, by randomly clicking around these 4 webpages, there is a 12% probability that a user ends up on webpage A. The probabilities for pages B, C and D are 24%, 24% and 40% respectively. Therefore, our ranking of these 4 webpages is D > B = C > A.

It also means the Eigen vector for matrix $L$ is $\begin{pmatrix}0.12\\0.24\\0.24\\0.4\end{pmatrix}$. Let’s verify this result.

$$
\begin{aligned}
Lr&=\begin{pmatrix}0&\frac{1}{2}&0&0\\\frac{1}{3}&0&0&\frac{1}{2}\\\frac{1}{3}&0&0&\frac{1}{2}\\\frac{1}{3}&\frac{1}{2}&1&0\end{pmatrix}*\begin{pmatrix}0.12\\0.24\\0.24\\0.4\end{pmatrix}\\
&=\begin{pmatrix}\frac{1}{2}*0.24\\\frac{1}{3}*0.12+\frac{1}{2}*0.4\\\frac{1}{3}*0.12+\frac{1}{2}*0.4\\\frac{1}{3}*0.12+\frac{1}{2}*0.24+1*0.24\end{pmatrix}\\
&=\begin{pmatrix}0.12\\0.24\\0.24\\0.4\end{pmatrix}\\
&=r
\end{aligned}
$$

This, however, only presents the simplest form of the PageRank algorithm. We did not cover the damping factor or any other stability and optimization measures. Nonetheless, hopefully this demonstrates some of the usefulness of Eigen vectors.

There are also other topics in Eigen theory we did not cover. For example, how do we perform repeated multiplication when a matrix is not diagonalizable? And our solutions for Eigen vectors have consisted of only real numbers so far; it is in fact possible to have Eigen vectors with complex numbers. These, however, are not topics we shall focus on here in linear algebra. Grasping the core concepts of Eigen vectors and Eigen values will set you up in good shape for the machine learning topics we will discuss in the future.

That is all. We can now conclude our journey through vectors and matrices. Hopefully this gives you a solid understanding of linear algebra and builds up some intuition about concepts like change of basis, matrix transformation, Eigen vectors, etc. These core concepts of linear algebra will come in handy as you progress through your study of machine learning.


(Inspired by the Mathematics for Machine Learning lecture series from Imperial College London)