7.3 Constrained optimization (条件优化)
本文为《Linear algebra and its applications》的读书笔记
目录
Engineers, economists, scientists, and mathematicians often need to find the maximum or minimum value of a quadratic form Q ( x ) Q(\boldsymbol x) Q(x) for x \boldsymbol x x in some specified set. Typically, the problem can be arranged so that x \boldsymbol x x varies over the set of unit vectors. This constrained optimization problem has an interesting and elegant solution. Example 6 below and the discussion in Section 7.5 will illustrate how such problems arise in practice.
The requirement that a vector x \boldsymbol x x in R n \R^n Rn be a unit vector can be stated in several equivalent ways:
and
The expanded version (1) of
x
T
x
=
1
\boldsymbol x^T\boldsymbol x = 1
xTx=1 is commonly used in applications.
When a quadratic form Q Q Q has no cross-product terms, it is easy to find the maximum and minimum of Q ( x ) Q(\boldsymbol x) Q(x) for x T x = 1 \boldsymbol x^T\boldsymbol x = 1 xTx=1.
EXAMPLE 1
Find the maximum and minimum values of
Q
(
x
)
=
9
x
1
2
+
4
x
2
2
+
3
x
3
2
Q(\boldsymbol x)= 9x_1^2+ 4x_2^2 + 3x_3^2
Q(x)=9x12+4x22+3x32 subject to the constraint
x
T
x
=
1
\boldsymbol x^T\boldsymbol x = 1
xTx=1.
SOLUTION
Furthermore, Q ( x ) = 9 Q(\boldsymbol x)=9 Q(x)=9 when x = ( 1 , 0 , 0 ) \boldsymbol x=(1, 0, 0) x=(1,0,0). Thus 9 is the maximum value of Q ( x ) Q(\boldsymbol x) Q(x) for x T x = 1 \boldsymbol x^T\boldsymbol x =1 xTx=1.
Also,
Q
(
x
)
=
9
Q(\boldsymbol x)=9
Q(x)=9 when
x
=
(
0
,
0
,
1
)
\boldsymbol x=(0, 0,1)
x=(0,0,1). Thus 3 is the minimum value of
Q
(
x
)
Q(\boldsymbol x)
Q(x) for
x
T
x
=
1
\boldsymbol x^T\boldsymbol x =1
xTx=1.
It is easy to see in Example 1 that the matrix of the quadratic form Q Q Q has eigenvalues 9 , 4 9, 4 9,4, and 3 3 3 and that the greatest and least eigenvalues equal, respectively, the (constrained) maximum and minimum of Q ( x ) Q(\boldsymbol x) Q(x). The same holds true for any quadratic form, as we shall see.
EXAMPLE 2
Let
A
=
[
3
0
0
7
]
A=\begin{bmatrix} 3& 0\\0 &7\end{bmatrix}
A=[3007], and let
Q
(
x
)
=
x
T
A
x
Q(\boldsymbol x)= \boldsymbol x^TA\boldsymbol x
Q(x)=xTAx for
x
\boldsymbol x
x in
R
2
\R^2
R2. Figure 1 displays the graph of
Q
Q
Q. Figure 2 shows only the portion of the graph inside a cylinder; the intersection of the cylinder with the surface is the set of points
(
x
1
,
x
2
,
z
)
(x_1, x_2,z)
(x1,x2,z) such that
z
=
Q
(
x
1
,
x
2
)
z= Q(x_1, x_2)
z=Q(x1,x2) and
x
1
2
+
x
2
2
=
1
x_1^2 + x_2^2 = 1
x12+x22=1.
The two highest points on the curve are 7 7 7 units above the x 1 x 2 x_1x_2 x1x2-plane, occurring where x 1 = 0 x_1= 0 x1=0 and x 2 = ± 1 x_2=\pm1 x2=±1. These points correspond to the eigenvalue 7 7 7 of A A A and the eigenvectors x = ( 0 , 1 ) \boldsymbol x=(0, 1) x=(0,1) and − x = ( 0 , − 1 ) -\boldsymbol x=(0,-1) −x=(0,−1). Similarly, the two lowest points on the curve are 3 3 3 units above the x 1 x 2 x_1x_2 x1x2-plane. They correspond to the eigenvalue 3 3 3 and the eigenvectors x = ( 1 , 0 ) \boldsymbol x=(1, 0) x=(1,0) and − x = ( − 1 , 0 ) -\boldsymbol x=(-1,0) −x=(−1,0).
The set of all possible values of
x
T
A
x
\boldsymbol x^TA\boldsymbol x
xTAx, for
∥
x
∥
=
1
\left\|x\right\|= 1
∥x∥=1, is the closed interval
3
≤
t
≤
7
3 \leq t\leq 7
3≤t≤7.
It can be shown that for any symmetric matrix A A A, the set of all possible values of x T A x \boldsymbol x^TA\boldsymbol x xTAx, for ∥ x ∥ = 1 \left\|x\right\|= 1 ∥x∥=1, is a closed interval on the real axis. (See Exercise 13.) Denote the left and right endpoints of this interval by m m m and M M M, respectively. That is, let
PROOF
Orthogonally diagonalize
A
A
A as
P
D
P
−
1
PDP^{-1}
PDP−1. We know that
Also,
Thus
x
T
A
x
\boldsymbol x^TA\boldsymbol x
xTAx and
y
T
D
y
\boldsymbol y^TD\boldsymbol y
yTDy assume the same set of values as
x
\boldsymbol x
x and
y
\boldsymbol y
y range over the set of all unit vectors. (Suppose that
λ
1
≥
λ
2
≥
.
.
.
≥
λ
n
\lambda_1\geq\lambda_2\geq...\geq\lambda_n
λ1≥λ2≥...≥λn)
x T A x = y T D y = ∑ i = 1 i = n λ i y i 2 ( ∗ ) \boldsymbol x^TA\boldsymbol x=\boldsymbol y^TD\boldsymbol y=\sum_{i=1}^{i=n}\lambda_iy_i^2\ \ \ \ \ \ \ (*) xTAx=yTDy=i=1∑i=nλiyi2 (∗) ( ∗ ) ≤ λ 1 ∑ i = 1 i = n y i 2 ( ∗ ) h a s t h e m a x i m u m v a l u e w h e n y = e 1 , x = P e 1 = u 1 ( ∗ ) ≥ λ n ∑ i = 1 i = n y i 2 ( ∗ ) h a s t h e m i n i m u m v a l u e w h e n y = e n , x = P e n = u n (*)\leq \lambda_1\sum_{i=1}^{i=n}y_i^2\\(*)\ has\ the\ maximum\ value\ when\ \boldsymbol y=\boldsymbol e_1,\boldsymbol x=P\boldsymbol e_1=\boldsymbol u_1\\(*)\geq \lambda_n\sum_{i=1}^{i=n}y_i^2\\(*)\ has\ the\ minimum\ value\ when\ \boldsymbol y=\boldsymbol e_n,\boldsymbol x=P\boldsymbol e_n=\boldsymbol u_n (∗)≤λ1i=1∑i=nyi2(∗) has the maximum value when y=e1,x=Pe1=u1(∗)≥λni=1∑i=nyi2(∗) has the minimum value when y=en,x=Pen=un
In Theorem 7 and in later applications, the values of x T A x \boldsymbol x^TA\boldsymbol x xTAx are computed with additional constraints on the unit vector x \boldsymbol x x.
The prrof is similar to Theorem 6.
The next theorem generalizes Theorem 7 and, together with Theorem 6, gives a useful characterization of all the eigenvalues of A A A.
EXAMPLE 6
During the next year, a county government is planning to repair
x
x
x hundred miles of public roads and bridges and to improve
y
y
y hundred acres of parks and recreation areas. The county must decide how to allocate its resources (funds, equipment, labor, etc.) between these two projects. If it is more cost effective to work simultaneously on both projects rather than on only one, then
x
x
x and
y
y
y might satisfy a constraint such as
See Figure 3. Each point ( x , y ) (x, y) (x,y) in the shaded feasible set represents a possible public works schedule for the year. The points on the constraint curve, 4 x 2 + 9 y 2 = 36 4x^2+ 9y^2 = 36 4x2+9y2=36, use the maximum amounts of resources available.
In choosing its public works schedule, the county wants to consider the opinions of the county residents. To measure the value, or u t i l i t y ( 效 用 ) utility(效用) utility(效用), that the residents would assign to the various work schedules ( x , y ) (x, y) (x,y), economists sometimes use a function such as
The set of points
(
x
,
y
)
(x, y)
(x,y) at which
q
(
x
,
y
)
q(x, y)
q(x,y) is a constant is called an
i
n
d
i
f
f
e
r
e
n
c
e
indifference
indifference
c
u
r
v
e
curve
curve(无差别曲线). Find the public works schedule that maximizes the utility function
q
q
q.
SOLUTION
The constraint equation does not describe a set of unit vectors, but a change of variable can fix that problem. Rewrite the constraint in the form
and define
Then the constraint equation becomes
and the utility function becomes
q
(
3
x
1
,
2
x
2
)
=
(
3
x
1
)
(
2
x
2
)
=
6
x
1
x
2
q(3x_1, 2x_2)=(3x_1)(2x_2)= 6x_1x_2
q(3x1,2x2)=(3x1)(2x2)=6x1x2. Let
x
=
[
x
1
x
2
]
\boldsymbol x =\begin{bmatrix} x_1\\x_2\end{bmatrix}
x=[x1x2]. Then the problem is to maximize
Q
(
x
)
=
6
x
1
x
2
Q(\boldsymbol x)= 6x_1x_2
Q(x)=6x1x2 subject to
x
T
x
=
1
\boldsymbol x^T\boldsymbol x= 1
xTx=1. Note that
Q
(
x
)
=
x
T
A
x
Q(\boldsymbol x)=\boldsymbol x^TA\boldsymbol x
Q(x)=xTAx, where
The eigenvalues of
A
A
A are
±
3
\pm3
±3, with eigenvectors
Thus the maximum value of
Q
(
x
)
Q(\boldsymbol x)
Q(x) is 3, attained when
x
1
=
1
/
2
x_1 = 1/\sqrt2
x1=1/2
and
x
2
=
1
/
2
x_2= 1/\sqrt2
x2=1/2
.
In terms of the original variables, the optimum public works schedule is x = 3 x 1 = 3 / 2 ≈ 2.1 x= 3x_1 = 3/\sqrt2\approx 2.1 x=3x1=3/2 ≈2.1 hundred miles of roads and bridges and y = 2 x 2 = 2 ≈ 1.4 y= 2x_2 = \sqrt2\approx 1.4 y=2x2=2 ≈1.4 hundred acres of parks and recreational areas. The optimum public works schedule is the point where the constraint curve and the indifference curve just meet.