6.4 The Gram–Schmidt Process

The Gram–Schmidt process is a simple algorithm for producing an orthogonal or orthonormal basis for any nonzero subspace of $\mathbb R^n$.
EXAMPLE 1 Let $W = \mathrm{Span}\{\boldsymbol x_1, \boldsymbol x_2\}$, where $\boldsymbol x_1 = \begin{bmatrix}3\\6\\0\end{bmatrix}$ and $\boldsymbol x_2 = \begin{bmatrix}1\\2\\3\end{bmatrix}$. Construct an orthogonal basis $\{\boldsymbol v_1, \boldsymbol v_2\}$ for $W$.
SOLUTION Let $\boldsymbol p$ be the projection of $\boldsymbol x_2$ onto $\boldsymbol x_1$. The component of $\boldsymbol x_2$ orthogonal to $\boldsymbol x_1$ is $\boldsymbol x_2 - \boldsymbol p$, which is in $W$ because it is formed from $\boldsymbol x_2$ and a multiple of $\boldsymbol x_1$. Let $\boldsymbol v_1 = \boldsymbol x_1$ and
$$\boldsymbol v_2 = \boldsymbol x_2 - \boldsymbol p = \boldsymbol x_2 - \frac{\boldsymbol x_2 \cdot \boldsymbol x_1}{\boldsymbol x_1 \cdot \boldsymbol x_1}\boldsymbol x_1 = \begin{bmatrix}1\\2\\3\end{bmatrix} - \frac{15}{45}\begin{bmatrix}3\\6\\0\end{bmatrix} = \begin{bmatrix}0\\0\\3\end{bmatrix}$$
Then $\{\boldsymbol v_1, \boldsymbol v_2\}$ is an orthogonal set of nonzero vectors in $W$, and hence an orthogonal basis for $W$.
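As a quick numerical check of Example 1, here is a minimal sketch in NumPy (the array names simply mirror the vectors above):

```python
import numpy as np

# Vectors from Example 1
x1 = np.array([3.0, 6.0, 0.0])
x2 = np.array([1.0, 2.0, 3.0])

v1 = x1
# Subtract from x2 its projection onto x1
v2 = x2 - (x2 @ x1) / (x1 @ x1) * x1

print(v2)        # [0. 0. 3.]
print(v1 @ v2)   # 0.0, confirming {v1, v2} is orthogonal
```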
The next example fully illustrates the Gram–Schmidt process. Study it carefully.
EXAMPLE 2 Let $\boldsymbol x_1 = \begin{bmatrix}1\\1\\1\\1\end{bmatrix}$, $\boldsymbol x_2 = \begin{bmatrix}0\\1\\1\\1\end{bmatrix}$, and $\boldsymbol x_3 = \begin{bmatrix}0\\0\\1\\1\end{bmatrix}$. Then $\{\boldsymbol x_1, \boldsymbol x_2, \boldsymbol x_3\}$ is clearly linearly independent and thus is a basis for a subspace $W$ of $\mathbb R^4$. Construct an orthogonal basis for $W$.
SOLUTION
Step 1. Let $\boldsymbol v_1 = \boldsymbol x_1$ and $W_1 = \mathrm{Span}\{\boldsymbol x_1\} = \mathrm{Span}\{\boldsymbol v_1\}$.
Step 2. Let $\boldsymbol v_2$ be the vector produced by subtracting from $\boldsymbol x_2$ its projection onto the subspace $W_1$. That is, let
$$\boldsymbol v_2 = \boldsymbol x_2 - \mathrm{proj}_{W_1}\boldsymbol x_2 = \boldsymbol x_2 - \frac{\boldsymbol x_2 \cdot \boldsymbol v_1}{\boldsymbol v_1 \cdot \boldsymbol v_1}\boldsymbol v_1 = \begin{bmatrix}0\\1\\1\\1\end{bmatrix} - \frac{3}{4}\begin{bmatrix}1\\1\\1\\1\end{bmatrix} = \begin{bmatrix}-3/4\\1/4\\1/4\\1/4\end{bmatrix}$$
Then $\{\boldsymbol v_1, \boldsymbol v_2\}$ is an orthogonal basis for the subspace $W_2$ spanned by $\boldsymbol x_1$ and $\boldsymbol x_2$.
Step 2′ (optional). If appropriate, scale $\boldsymbol v_2$ to simplify later computations. Since $\boldsymbol v_2$ has fractional entries, it is convenient to scale it by a factor of 4 and replace $\{\boldsymbol v_1, \boldsymbol v_2\}$ by the orthogonal basis $\{\boldsymbol v_1, \boldsymbol v_2'\}$, where
$$\boldsymbol v_2' = 4\boldsymbol v_2 = \begin{bmatrix}-3\\1\\1\\1\end{bmatrix}$$
Step 3. Let $\boldsymbol v_3$ be the vector produced by subtracting from $\boldsymbol x_3$ its projection onto the subspace $W_2$. Use the orthogonal basis $\{\boldsymbol v_1, \boldsymbol v_2'\}$ to compute this projection onto $W_2$:
$$\mathrm{proj}_{W_2}\boldsymbol x_3 = \frac{\boldsymbol x_3 \cdot \boldsymbol v_1}{\boldsymbol v_1 \cdot \boldsymbol v_1}\boldsymbol v_1 + \frac{\boldsymbol x_3 \cdot \boldsymbol v_2'}{\boldsymbol v_2' \cdot \boldsymbol v_2'}\boldsymbol v_2' = \frac{2}{4}\begin{bmatrix}1\\1\\1\\1\end{bmatrix} + \frac{2}{12}\begin{bmatrix}-3\\1\\1\\1\end{bmatrix} = \begin{bmatrix}0\\2/3\\2/3\\2/3\end{bmatrix}$$
Then $\boldsymbol v_3$ is the component of $\boldsymbol x_3$ orthogonal to $W_2$:
$$\boldsymbol v_3 = \boldsymbol x_3 - \mathrm{proj}_{W_2}\boldsymbol x_3 = \begin{bmatrix}0\\0\\1\\1\end{bmatrix} - \begin{bmatrix}0\\2/3\\2/3\\2/3\end{bmatrix} = \begin{bmatrix}0\\-2/3\\1/3\\1/3\end{bmatrix}$$
Then $\{\boldsymbol v_1, \boldsymbol v_2', \boldsymbol v_3\}$ is an orthogonal basis for $W$.
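The steps above can be checked numerically. A minimal NumPy sketch of Example 2 (variable names mirror the text):

```python
import numpy as np

x1 = np.array([1.0, 1.0, 1.0, 1.0])
x2 = np.array([0.0, 1.0, 1.0, 1.0])
x3 = np.array([0.0, 0.0, 1.0, 1.0])

# Step 1
v1 = x1

# Step 2: subtract from x2 its projection onto W1 = Span{v1}
v2 = x2 - (x2 @ v1) / (v1 @ v1) * v1

# Step 2' (optional): scale to clear fractions
v2p = 4 * v2                                # [-3, 1, 1, 1]

# Step 3: subtract from x3 its projection onto W2 = Span{v1, v2'}
proj = (x3 @ v1) / (v1 @ v1) * v1 + (x3 @ v2p) / (v2p @ v2p) * v2p
v3 = x3 - proj                              # [0, -2/3, 1/3, 1/3]

# All pairwise dot products should be (numerically) zero
print(v1 @ v2p, v1 @ v3, v2p @ v3)
```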
The proof of the next theorem shows that this strategy really works. Scaling of vectors is not mentioned there because scaling is used only to simplify hand calculations.
THEOREM 11 (The Gram–Schmidt Process) Given a basis $\{\boldsymbol x_1, \dots, \boldsymbol x_p\}$ for a nonzero subspace $W$ of $\mathbb R^n$, define
$$\boldsymbol v_1 = \boldsymbol x_1$$
$$\boldsymbol v_2 = \boldsymbol x_2 - \frac{\boldsymbol x_2 \cdot \boldsymbol v_1}{\boldsymbol v_1 \cdot \boldsymbol v_1}\boldsymbol v_1$$
$$\vdots$$
$$\boldsymbol v_p = \boldsymbol x_p - \frac{\boldsymbol x_p \cdot \boldsymbol v_1}{\boldsymbol v_1 \cdot \boldsymbol v_1}\boldsymbol v_1 - \dots - \frac{\boldsymbol x_p \cdot \boldsymbol v_{p-1}}{\boldsymbol v_{p-1} \cdot \boldsymbol v_{p-1}}\boldsymbol v_{p-1}$$
Then $\{\boldsymbol v_1, \dots, \boldsymbol v_p\}$ is an orthogonal basis for $W$. In addition, $\mathrm{Span}\{\boldsymbol v_1, \dots, \boldsymbol v_k\} = \mathrm{Span}\{\boldsymbol x_1, \dots, \boldsymbol x_k\}$ for $1 \leq k \leq p$.

Theorem 11 shows that any nonzero subspace $W$ of $\mathbb R^n$ has an orthogonal basis, because an ordinary basis $\{\boldsymbol x_1, \dots, \boldsymbol x_p\}$ is always available.
PROOF For $1 \leq k \leq p$, let $W_k = \mathrm{Span}\{\boldsymbol x_1, \dots, \boldsymbol x_k\}$.
Set $\boldsymbol v_1 = \boldsymbol x_1$, so that $\mathrm{Span}\{\boldsymbol v_1\} = \mathrm{Span}\{\boldsymbol x_1\}$.
Suppose, for some $k < p$, we have constructed $\boldsymbol v_1, \dots, \boldsymbol v_k$ so that $\{\boldsymbol v_1, \dots, \boldsymbol v_k\}$ is an orthogonal basis for $W_k$. Define
$$\boldsymbol v_{k+1} = \boldsymbol x_{k+1} - \mathrm{proj}_{W_k}\boldsymbol x_{k+1} \qquad (2)$$
By the Orthogonal Decomposition Theorem, $\boldsymbol v_{k+1}$ is orthogonal to $W_k$.
Note that $\mathrm{proj}_{W_k}\boldsymbol x_{k+1}$ is in $W_k$ and hence also in $W_{k+1}$. Since $\boldsymbol x_{k+1}$ is in $W_{k+1}$, so is $\boldsymbol v_{k+1}$ (because $W_{k+1}$ is a subspace and is closed under subtraction).
Furthermore, $\boldsymbol v_{k+1} \neq \boldsymbol 0$ because $\boldsymbol x_{k+1}$ is not in $W_k = \mathrm{Span}\{\boldsymbol x_1, \dots, \boldsymbol x_k\}$.
Hence $\{\boldsymbol v_1, \dots, \boldsymbol v_{k+1}\}$ is an orthogonal set of nonzero vectors in the $(k+1)$-dimensional space $W_{k+1}$. By the Basis Theorem, this set is an orthogonal basis for $W_{k+1}$. Hence $W_{k+1} = \mathrm{Span}\{\boldsymbol v_1, \dots, \boldsymbol v_{k+1}\}$. When $k + 1 = p$, the process stops.
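Equation (2) translates directly into code. Below is a minimal sketch in NumPy (the function name gram_schmidt is my own, not from the text; it assumes the columns of X are linearly independent):

```python
import numpy as np

def gram_schmidt(X):
    """Given a matrix X whose columns x_1,...,x_p are linearly
    independent, return a matrix V whose columns v_1,...,v_p are an
    orthogonal basis for Col X, built by equation (2)."""
    V = np.zeros_like(X, dtype=float)
    for k in range(X.shape[1]):
        xk = X[:, k]
        # proj_{W_k} x_{k+1}, computed with the orthogonal basis found so far
        proj = sum((xk @ V[:, j]) / (V[:, j] @ V[:, j]) * V[:, j]
                   for j in range(k))
        V[:, k] = xk - proj
    return V

# Example 2 as input: the columns are x1, x2, x3
X = np.array([[1, 0, 0],
              [1, 1, 0],
              [1, 1, 1],
              [1, 1, 1]], dtype=float)
V = gram_schmidt(X)
print(np.round(V.T @ V, 10))  # off-diagonal entries are 0: columns are orthogonal
```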
Orthonormal Bases
An orthonormal basis is constructed easily from an orthogonal basis $\{\boldsymbol v_1, \dots, \boldsymbol v_p\}$: simply normalize (i.e., "scale") all the $\boldsymbol v_k$. When working problems by hand, this is easier than normalizing each $\boldsymbol v_k$ as soon as it is found (because it avoids unnecessary writing of square roots).
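Continuing the sketch above, normalizing each column of V afterward yields an orthonormal basis:

```python
# Normalize each column of V (from the sketch above) to unit length
U = V / np.linalg.norm(V, axis=0)
print(np.round(U.T @ U, 10))   # identity matrix: the columns are orthonormal
```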
QR Factorization of Matrices
If an $m \times n$ matrix $A$ has linearly independent columns $\boldsymbol x_1, \dots, \boldsymbol x_n$, then applying the Gram–Schmidt process (with normalizations) to $\boldsymbol x_1, \dots, \boldsymbol x_n$ amounts to factoring $A$, as described in the next theorem. This factorization is widely used in computer algorithms for various computations, such as solving equations (discussed in Section 6.5) and finding eigenvalues (mentioned in the exercises for Section 5.2).$^1$

THEOREM 12 (The QR Factorization) If $A$ is an $m \times n$ matrix with linearly independent columns, then $A$ can be factored as $A = QR$, where $Q$ is an $m \times n$ matrix whose columns form an orthonormal basis for $\mathrm{Col}\,A$ and $R$ is an $n \times n$ upper triangular invertible matrix with positive entries on its diagonal.
PROOF The columns of $A$ form a basis $\{\boldsymbol x_1, \dots, \boldsymbol x_n\}$ for $\mathrm{Col}\,A$. Construct an orthonormal basis $\{\boldsymbol u_1, \dots, \boldsymbol u_n\}$ for $W = \mathrm{Col}\,A$. This basis may be constructed by the Gram–Schmidt process or some other means. Let
$$Q = [\boldsymbol u_1\ \ \boldsymbol u_2\ \ \cdots\ \ \boldsymbol u_n]$$
For $k = 1, \dots, n$, $\boldsymbol x_k$ is in $\mathrm{Span}\{\boldsymbol x_1, \dots, \boldsymbol x_k\} = \mathrm{Span}\{\boldsymbol u_1, \dots, \boldsymbol u_k\}$. So there are constants $r_{1k}, \dots, r_{kk}$ such that
$$\boldsymbol x_k = r_{1k}\boldsymbol u_1 + \dots + r_{kk}\boldsymbol u_k + 0\cdot\boldsymbol u_{k+1} + \dots + 0\cdot\boldsymbol u_n$$
We may assume that $r_{kk} \geq 0$. (If $r_{kk} < 0$, multiply both $r_{kk}$ and $\boldsymbol u_k$ by $-1$.) This shows that $\boldsymbol x_k$ is a linear combination of the columns of $Q$ using as weights the entries in the vector
$$\boldsymbol r_k = \begin{bmatrix} r_{1k} \\ \vdots \\ r_{kk} \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$
That is, $\boldsymbol x_k = Q\boldsymbol r_k$ for $k = 1, \dots, n$. Let $R = [\boldsymbol r_1\ \ \cdots\ \ \boldsymbol r_n]$. Then
$$A = [Q\boldsymbol r_1\ \ \cdots\ \ Q\boldsymbol r_n] = QR$$
The fact that $R$ is invertible follows easily from the fact that the columns of $A$ are linearly independent. Since $R$ is clearly upper triangular, its nonnegative diagonal entries must be positive.
In practice, $Q$ can be constructed by the Gram–Schmidt process, and $R$ can then be computed as $R = Q^T A$: since $Q^T Q = I$, multiplying $A = QR$ on the left by $Q^T$ gives $Q^T A = Q^T Q R = R$.
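A minimal sketch of this computation, reusing the hypothetical gram_schmidt helper defined earlier (Example 2's vectors serve as the columns of A):

```python
import numpy as np

# Columns of A are x1, x2, x3 from Example 2
A = np.array([[1, 0, 0],
              [1, 1, 0],
              [1, 1, 1],
              [1, 1, 1]], dtype=float)

V = gram_schmidt(A)                    # orthogonal columns (helper defined above)
Q = V / np.linalg.norm(V, axis=0)      # orthonormal columns
R = Q.T @ A                            # upper triangular, positive diagonal

print(np.allclose(A, Q @ R))           # True
print(np.round(R, 10))                 # entries below the diagonal are ~0
```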
$^1$ See *Fundamentals of Matrix Computations*, by David S. Watkins (New York: John Wiley & Sons, 1991), pp. 167–180.
EXERCISES
Suppose $A = QR$, where $Q$ is $m \times n$ and $R$ is $n \times n$. Show that if the columns of $A$ are linearly independent, then $R$ must be invertible.
SOLUTION [Hint: Study the equation $R\boldsymbol x = \boldsymbol 0$ and use the fact that $A = QR$.]
Suppose $R\boldsymbol x = \boldsymbol 0$ for some $\boldsymbol x$ in $\mathbb R^n$. Then $A\boldsymbol x = QR\boldsymbol x = Q\boldsymbol 0 = \boldsymbol 0$. Since the columns of $A$ are linearly independent, $A\boldsymbol x = \boldsymbol 0$ has only the trivial solution, so $\boldsymbol x = \boldsymbol 0$. Because $R\boldsymbol x = \boldsymbol 0$ has only the trivial solution, the $n \times n$ matrix $R$ is invertible.
EXERCISE 23
Suppose $A = QR$ is a QR factorization of an $m \times n$ matrix $A$ (with linearly independent columns). Partition $A$ as $[A_1\ \ A_2]$, where $A_1$ has $p$ columns. Show how to obtain a QR factorization of $A_1$.
SOLUTION
Partition $Q = [Q_1\ \ Q_2]$, where $Q_1$ has $p$ columns, and partition $R$ conformably as
$$R = \begin{bmatrix} R_{11} & R_{12} \\ 0 & R_{22} \end{bmatrix}$$
where $R_{11}$ is $p \times p$, $R_{12}$ is $p \times (n-p)$, and $R_{22}$ is $(n-p) \times (n-p)$; the block below the diagonal is zero because $R$ is upper triangular. Then
$$A = [A_1\ \ A_2] = [Q_1\ \ Q_2]\begin{bmatrix} R_{11} & R_{12} \\ 0 & R_{22} \end{bmatrix}$$
so $A_1 = Q_1 R_{11}$. The columns of $Q_1$ are orthonormal (they are columns of $Q$), and $R_{11}$ is a $p \times p$ upper triangular matrix with positive diagonal entries, so $A_1 = Q_1 R_{11}$ is a QR factorization of $A_1$.
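A quick numerical illustration of this partition argument, sketched with numpy.linalg.qr (note that np.linalg.qr may return an R with negative diagonal entries, which does not affect the block argument):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, p = 6, 4, 2
A = rng.standard_normal((m, n))    # columns linearly independent with probability 1

Q, R = np.linalg.qr(A)             # "reduced" QR: Q is m x n, R is n x n
A1 = A[:, :p]
Q1 = Q[:, :p]
R11 = R[:p, :p]

print(np.allclose(A1, Q1 @ R11))   # True: A1 = Q1 R11 is a QR factorization of A1
```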