On solving symmetric linear equations in an unnormalized Krylov framework∗

Tove ODLAND†

arXiv:1409.4937v1 [math.OC] 17 Sep 2014

September 17, 2014

Abstract. In an unnormalized Krylov subspace framework for solving symmetric systems of linear equations, the orthogonal vectors that are generated by a Lanczos process are not necessarily of the form of gradients. Associating each orthogonal vector with a triple, and using only the three-term recurrences of the triples, we give conditions on whether a symmetric system of linear equations is compatible or incompatible. In the compatible case, a solution is given, and in the incompatible case, a certificate of incompatibility is obtained. In particular, the case when the matrix is singular is handled. We also derive a minimum-residual method based on this framework and show how the iterates may be updated explicitly based on the triples; in the incompatible case, a minimum-residual solution of minimum Euclidean norm is obtained.

Keywords: Krylov subspace method, symmetric system of linear equations, unnormalized Lanczos vectors, minimum-residual method

MSC number: 65F10

1.

Introduction

An important problem in numerical linear algebra and optimization is to solve a system of equations where the matrix is symmetric. Such a problem may be posed as

    Hx + c = 0,   (1.1)

for x ∈ R^n, with c ∈ R^n and H = H^T ∈ R^{n×n}. Note that with A = H and b = −c, (1.1) becomes Ax = b. However, we prefer the notation of (1.1), as it is of the form of a gradient g, defined as g = Hx + c, being equal to zero. This notation highlights that we are trying to find a non-trivial linear combination of the columns of H and c. Our primary motivation comes from optimization, where in many cases the systems of linear equations that need to be solved are such that the matrix H is symmetric but in general indefinite. For example, KKT systems have this form, see, e.g., [8], but there are many other applications. Throughout, H is assumed symmetric; any

∗ This manuscript supersedes Report TRITA-MAT-2014-OS-01 "A general Krylov method for solving symmetric systems of linear equations", Department of Mathematics, KTH Royal Institute of Technology, March 2014.
† Optimization and Systems Theory, Department of Mathematics, KTH Royal Institute of Technology, SE-100 44 Stockholm, Sweden ([email protected], [email protected]).


other assumptions on H at particular instances will be stated explicitly. The key concept in this paper will be to determine whether (1.1) is compatible or not. Our results include and handle the case when H is a singular matrix. It is assumed throughout that c ≠ 0. Exact arithmetic is assumed, and the theory developed in this paper is based on that assumption. At the end of the paper we briefly discuss computational aspects of our results in finite precision.

One strategy for solving (1.1) is to generate linearly independent vectors q_k, k = 0, 1, …, until q_k becomes linearly dependent for some k = r ≤ n, and hence q_r = 0. In this paper we consider Krylov subspace methods in which the generated vectors form an orthogonal, hence linearly independent, basis for the Krylov subspaces generated by H and c,

    K_0(c, H) = {0},   K_k(c, H) = span{c, Hc, H²c, …, H^{k−1}c},   k = 1, 2, ….   (1.2)
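For a concrete feel for (1.2), the monomial basis c, Hc, H²c, … can be generated directly. The sketch below is our illustration (an arbitrary random symmetric test matrix, not an example from the paper); it also shows numerically why an orthogonal basis of the same subspaces is preferred: the monomial basis becomes severely ill-conditioned.

```python
import numpy as np

# Illustration (ours, not from the paper): the monomial Krylov basis
# c, Hc, H^2 c, ... of (1.2) quickly becomes ill-conditioned, which is why
# one prefers an orthogonal (Lanczos) basis of the same subspaces.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 20))
H = (A + A.T) / 2                      # a symmetric test matrix
c = rng.standard_normal(20)

def krylov_basis(H, c, k):
    """Columns c, Hc, ..., H^{k-1} c spanning K_k(c, H) as in (1.2)."""
    return np.column_stack([np.linalg.matrix_power(H, j) @ c for j in range(k)])

conds = {k: np.linalg.cond(krylov_basis(H, c, k)) for k in (2, 5, 10)}
```

In such runs the condition number grows by orders of magnitude with k, which motivates working with the orthogonal Lanczos vectors discussed next.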

The Krylov vectors c, Hc, …, H^{r−1}c are linearly independent, but as they become highly ill-conditioned it is desirable to work with some other set of vectors. Orthogonal vectors q_k that are generated by a Lanczos process will be linear combinations of the columns of H and c. There is a freedom in the scaling of each generated vector. We will refer to the case when the coefficient corresponding to c is equal to one as a normalized Lanczos vector, i.e., the vector is of the form of a gradient, g = Hx + c. An unnormalized Lanczos vector then refers to the case when the coefficient corresponding to c is not required to be one, i.e., q = Hy + δc, where δ ∈ R. The concept of using unnormalized Lanczos vectors was introduced by Gutknecht in [11, 12] as a remedy for so-called pivot breakdowns that occur when normalization is not well defined.¹ In subsequent work by Gutknecht the term inconsistent is used, see, e.g., [13, 15]. However, in this paper the term unnormalized will be used, as it better suits our purposes. The unnormalized framework will be used in a more general sense, not only as a remedy for pivot breakdown, to derive our results.

The Lanczos process was first introduced by Lanczos [18, 19]. There have been many contributions to the theory, both for symmetric and non-symmetric systems; see, e.g., Golub and O'Leary's extensive survey of the years 1948-1976 [9], Golub and Van Loan's book [10] and Gutknecht's survey [13].

The outline of the paper is as follows. Section 2 gives a review of background material on the unnormalized Krylov subspace framework. In particular, we review recursions for the unnormalized Lanczos triples (q_k, y_k, δ_k) associated with the unnormalized Lanczos vectors q_k, k = 0, …, r, such that q_k = H y_k + δ_k c, q_k ∈ K_{k+1}(c, H), y_k ∈ K_k(c, H) and δ_k ∈ R, k = 0, …, r.
In Section 3, we give our main convergence result, based on the recursions for the triples, stating that when (1.1) is compatible a solution is given (in this case we show that δ_r ≠ 0), or a certificate of incompatibility can be obtained for (1.1) (in this case δ_r = 0). The case of a singular matrix H is included and handled in the analysis,¹

¹ Gutknecht considers the more general case when H is non-symmetric, where there are several other possible breakdowns for the corresponding Lanczos process, see, e.g., [13, 26]. For H symmetric, the pivot breakdown is the only one that can happen.


which, to the best of our knowledge, has not been done before. The derivation is summarized in an unnormalized Krylov algorithm, and in addition some remarks are made on the case when normalization is well defined and used. Finally, in Section 4, a minimum-residual method, applicable also for incompatible systems, is derived by making use of the unnormalized Krylov framework. Explicit recursions for the minimum-residual iterates are derived, including an expression for the solution of minimum Euclidean norm in the incompatible case.

1.1. Notation

The letters i, j and k denote integer indices; other lowercase letters, such as q, y and c, denote column vectors, possibly with super- and/or subscripts. For a symmetric matrix H, H ≻ 0 denotes that H is positive definite. Analogously, H ⪰ 0 is used to denote that H is positive semidefinite. The null space and range space of H are denoted by N(H) and R(H), respectively. We will denote by Z a matrix whose columns form an orthonormal basis of N(H). If H is nonsingular, then Z is to be interpreted as an empty matrix. When referring to a norm, the Euclidean norm is used throughout.

2.

Background

Regarding (1.1), the raw data available is the matrix H and the vector c, and combinations of the two, for example represented by the Krylov subspaces generated by H and c, as defined in (1.2). For an introduction and background on Krylov subspaces, see, e.g., Gutknecht [14] and Saad [26]. Without loss of generality, the scaling of the first vector q_0 may be chosen so that q_0 = c. Then one sequence of linearly independent vectors may be generated by letting q_k ∈ K_{k+1}(c, H) ∩ K_k(c, H)^⊥, k = 1, …, r, such that q_k ≠ 0 for k = 0, 1, …, r − 1 and q_r = 0, where r is the minimum index k for which K_{k+1}(c, H) = K_k(c, H). These vectors {q_0, q_1, …, q_{r−1}} form an orthogonal, hence linearly independent, basis of K_r(c, H). We will refer to these vectors as the Lanczos vectors. With q_0 = c, each vector q_k, k = 1, …, r − 1, is uniquely determined up to a scaling. A vector q_k ∈ K_{k+1}(c, H) may be expressed as

    q_k = Σ_{j=0}^{k} δ_k^{(j)} H^j c,   (2.1)

for some parameters δ_k^{(j)}, j = 0, …, k, uniquely determined up to a nonzero scaling and with δ_k^{(k)} ≠ 0. This is made precise in Lemma A.1. Normalized Lanczos vectors are obtained when the scaling is chosen such that δ_k^{(0)} = 1, and we call this the normalization condition.² Since δ_k^{(0)} is determined up to a scaling, it holds that if δ_k^{(0)} ≠ 0 then one can rescale the vector such that the

² From (2.1), one can see that the vectors q_k may be represented as q_k = p_k(H)c, where p_k is a polynomial of degree k; hence the normalization condition may be stated as p_k(0) = 1, see, e.g., Gutknecht [13].


normalization condition holds; however, if δ_k^{(0)} = 0, then this is not possible and a pivot breakdown occurs.

The following proposition reviews a recursion for a sequence of Lanczos vectors where the scaling factors, denoted by {θ_k}_{k=0}^{r−1}, are left unspecified. This recursion is a slight generalization of the symmetric Lanczos process for generating mutually orthogonal vectors, in which the usual choice of scaling is such that each vector q_k satisfies ‖q_k‖ = 1, k = 0, …, r − 1. For completeness, this proposition and its proof are included.

Proposition 2.1. Let r denote the smallest positive integer k for which K_{k+1}(c, H) = K_k(c, H). Given q_0 = c ∈ K_1(c, H), there exist vectors q_k, k = 1, …, r, such that

    q_k ∈ K_{k+1}(c, H) ∩ K_k(c, H)^⊥,   k = 1, …, r,

for which q_k ≠ 0, k = 1, …, r − 1, and q_r = 0. Each such q_k, k = 1, …, r − 1, is uniquely determined up to a scaling, and a sequence {q_k}_{k=1}^{r} may be generated as

    q_1 = θ_0 ( −H q_0 + (q_0^T H q_0)/(q_0^T q_0) q_0 ),   (2.2a)

    q_{k+1} = θ_k ( −H q_k + (q_k^T H q_k)/(q_k^T q_k) q_k + (q_{k−1}^T H q_k)/(q_{k−1}^T q_{k−1}) q_{k−1} ),   k = 1, …, r − 1,   (2.2b)

where θ_k, k = 0, …, r − 1, are free and nonzero parameters. In addition, it holds that

    q_{k+1}^T q_{k+1} = −θ_k q_{k+1}^T H q_k,   k = 0, …, r − 1.   (2.3)

Proof. Given q_0 = c, let k be an integer such that 1 ≤ k ≤ r − 1. Assume that q_i, i = 0, …, k, are mutually orthogonal with q_i ∈ K_{i+1}(c, H) ∩ K_i(c, H)^⊥. Let q_{k+1} ∈ K_{k+2}(c, H) be expressed as

    q_{k+1} = −θ_k H q_k + Σ_{i=0}^{k} η_k^{(i)} q_i,   k = 0, …, r − 1.   (2.4)

In order for q_{k+1} to be orthogonal to q_i, i = 0, …, k, the parameters η_k^{(i)}, i = 0, …, k, are uniquely determined as follows. For k = 0, to have q_0^T q_1 = 0, it must hold that

    η_0^{(0)} = θ_0 (q_0^T H q_0)/(q_0^T q_0),

hence obtaining q_1 ∈ K_2(c, H) ∩ K_1(c, H)^⊥ as in (2.2a), where θ_0 is free and nonzero. For k such that 1 ≤ k ≤ r − 1, in order to have q_i^T q_{k+1} = 0, i = 0, …, k, it must hold that

    η_k^{(k)} = θ_k (q_k^T H q_k)/(q_k^T q_k),   η_k^{(k−1)} = θ_k (q_{k−1}^T H q_k)/(q_{k−1}^T q_{k−1}),   and   η_k^{(i)} = 0,   i = 0, …, k − 2.


The last relation follows by the symmetry of H. Hence, we obtain q_{k+1} ∈ K_{k+2}(c, H) ∩ K_{k+1}(c, H)^⊥ as in the three-term recurrence (2.2b), where θ_k, k = 1, …, r − 1, are free and nonzero. Since q_1 is orthogonal to q_0, and since q_{k+1} is orthogonal to q_k and q_{k−1}, k = 1, …, r − 1, pre-multiplication of (2.2) with q_{k+1}^T yields

    q_{k+1}^T q_{k+1} = −θ_k q_{k+1}^T H q_k,   k = 0, …, r − 1.

Finally, note that if q_{k+1} is given by (2.2), then the only term that increases the power of H is θ_k(−H q_k). Since θ_k ≠ 0, repeated use of this argument gives δ_{k+1}^{(k+1)} ≠ 0 if q_{k+1} is expressed by (2.1). In fact, δ_{k+1}^{(k+1)} = (−1)^{k+1} Π_{i=0}^{k} θ_i ≠ 0. Hence, by Lemma A.1, q_{k+1} = 0 implies K_{k+2}(c, H) = K_{k+1}(c, H), so that k + 1 = r, as required.

The particular form of (2.2), with scaling parameters θ_k, k = 0, …, r, is chosen for coherence with existing theory on the method of conjugate gradients, see Section 3.3 and Proposition A.6. To simplify the exposition, the following notation is introduced,

    α_0 = (q_0^T H q_0)/(q_0^T q_0),   α_k = (q_k^T H q_k)/(q_k^T q_k),   β_{k−1} = (q_{k−1}^T H q_k)/(q_{k−1}^T q_{k−1}),   k = 1, …, r − 1.   (2.5)

Let Q_k be the matrix with the Lanczos vectors q_0, q_1, …, q_k as columns; then (2.2) may be written in matrix form as

    H Q_k = Q_{k+1} T̄_k = Q_k T_k − (1/θ_k) q_{k+1} e_{k+1}^T,

where

    T_k = [  α_0       β_0                                  ]
          [ −1/θ_0     α_1       β_1                        ]
          [            −1/θ_1    .          .               ]
          [                      .          .      β_{k−1}  ]
          [                      −1/θ_{k−1}        α_k      ]

    T̄_k = [         T_k          ]
          [ −(1/θ_k) e_{k+1}^T   ]                            (2.6)
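The recursion (2.2) together with the matrix relation above can be checked numerically. The following NumPy sketch is our illustration (the random test data and variable names are ours, and the scaling ‖q_{k+1}‖ = 1 is one admissible choice of the free θ_k):

```python
import numpy as np

# Sketch (our illustration, not the paper's code): run the recursion (2.2)
# with the scaling ||q_{k+1}|| = 1, assemble T_k and Tbar_k from (2.5)-(2.6),
# and check the relation H Q_k = Q_{k+1} Tbar_k numerically.
rng = np.random.default_rng(1)
n, m = 8, 5                            # keep m + 1 Lanczos vectors (m + 1 <= r)
A = rng.standard_normal((n, n))
H = (A + A.T) / 2
c = rng.standard_normal(n)

Q, thetas, alphas, betas = [c / np.linalg.norm(c)], [], [], []
for k in range(m + 1):
    q = Q[-1]
    Hq = H @ q
    alpha = q @ Hq / (q @ q)
    alphas.append(alpha)
    qn = -Hq + alpha * q
    if k > 0:
        beta = Q[-2] @ Hq / (Q[-2] @ Q[-2])
        betas.append(beta)
        qn = qn + beta * Q[-2]
    theta = 1.0 / np.linalg.norm(qn)   # free scaling; here ||q_{k+1}|| = 1
    thetas.append(theta)
    Q.append(theta * qn)

# T_k: alpha on the diagonal, beta above, -1/theta below, as in (2.6)
T = np.diag(alphas) + np.diag(betas, 1) + np.diag([-1.0 / t for t in thetas[:m]], -1)
Tbar = np.vstack([T, -(1.0 / thetas[m]) * np.eye(m + 1)[-1]])
Qk = np.column_stack(Q[: m + 1])
Qk1 = np.column_stack(Q)
assert np.allclose(H @ Qk, Qk1 @ Tbar, atol=1e-10)
```

The assertion holds regardless of the chosen θ_k, since (2.6) is an algebraic rearrangement of the recursion itself.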

The choice of θ_k such that ‖q_k‖₂ = ‖q_0‖₂ implies β_k = −1/θ_k, and in this case T_k will be symmetric. Changing the set {θ_k}_{k=0}^{r−1} can be seen as a similarity transform of T_k, see, e.g., Gutknecht [13]. Many methods for solving (1.1) use matrix-factorization techniques on T_k or T̄_k. For an introduction to how Krylov subspace methods are formalized in this way, see, e.g., Paige, Saunders and Choi [5, 23]. For our purposes we leave these available scaling factors unspecified and work with the recursions (2.2) directly.

2.1. An extended representation of the unnormalized Lanczos vectors

To find a solution of (1.1), if it exists, it is not sufficient to generate the sequence {q_k}_{k=1}^{r}. Note that, as in (2.1), q_k ∈ K_{k+1}(c, H), k = 0, …, r, may be expressed as

    q_k = Σ_{j=0}^{k} δ_k^{(j)} H^j c = H ( Σ_{j=1}^{k} δ_k^{(j)} H^{j−1} c ) + δ_k^{(0)} c,   k = 1, …, r.   (2.7)


It is not convenient to represent q_k by (2.1). Therefore, defining y_0 = 0, δ_0 = 1,

    y_k = Σ_{j=1}^{k} δ_k^{(j)} H^{j−1} c ∈ K_k(c, H)   and   δ_k = δ_k^{(0)},   k = 1, …, r,

it follows from (2.7) that q_k = H y_k + δ_k c, so that q_k may be expressed by y_k and δ_k. These quantities will be represented by the triples (q_k, y_k, δ_k), k = 0, …, r. Note that {δ_k^{(j)}}_{j=0}^{k} are given in association with the raw data H and c; the choice made here is to use only δ_k^{(0)} explicitly and collect all other terms in y_k. It is straightforward to note that if δ_r ≠ 0, then 0 = q_r = H x_r + c for x_r = (1/δ_r) y_r, so that x_r solves (1.1). It will be shown that (1.1) has a solution if and only if δ_r ≠ 0.

As mentioned earlier, for a given k such that 1 ≤ k ≤ r, the parameters δ_k^{(j)}, j = 1, …, k, are uniquely defined up to a scaling. Hence, so is the triple (q_k, y_k, δ_k). This is made precise in the recursions for the triples given in Lemma 2.2.

It is possible to use more of the coefficients {δ_k^{(j)}}_{j=1}^{k} explicitly in the same representation as above. For the next power of the polynomial in (2.7), let

    y_k = H y_k^{(1)} + δ_k^{(1)} c,   with   y_k^{(1)} = Σ_{j=2}^{k} δ_k^{(j)} H^{j−2} c ∈ K_{k−1}(c, H),   k = 2, …, r,   (2.8)

in addition to y_1^{(1)} = 0. This will be used in the analysis, but not in the algorithm presented.

Based on Proposition 2.1, given (q_0, y_0, δ_0) = (c, 0, 1), one can formulate recursions for (q_k, y_k, δ_k), k = 1, …, r. This derivation is given by Gutknecht in, e.g., [13], but we give the following lemma for completeness.

Lemma 2.2. Let r denote the smallest positive integer k for which K_{k+1}(c, H) = K_k(c, H). Given (q_0, y_0, δ_0) = (c, 0, 1), there exist triples (q_k, y_k, δ_k), k = 1, …, r, such that

    q_k ∈ K_{k+1}(c, H) ∩ K_k(c, H)^⊥,   y_k ∈ K_k(c, H),   q_k = H y_k + δ_k c,   k = 1, …, r,

for which q_k ≠ 0, k = 1, …, r − 1, and q_r = 0. Each such (q_k, y_k, δ_k), k = 1, …, r, is uniquely determined up to a scalar, and a sequence {(q_k, y_k, δ_k)}_{k=1}^{r} may be generated as

    q_1 = θ_0 ( −H q_0 + α_0 q_0 ),   y_1 = θ_0 ( −q_0 + α_0 y_0 ),   δ_1 = θ_0 α_0 δ_0,

and, for k = 1, …, r − 1,

    q_{k+1} = θ_k ( −H q_k + α_k q_k + β_{k−1} q_{k−1} ),
    y_{k+1} = θ_k ( −q_k + α_k y_k + β_{k−1} y_{k−1} ),
    δ_{k+1} = θ_k ( α_k δ_k + β_{k−1} δ_{k−1} ),


where θ_k, k = 0, …, r − 1, are free and nonzero parameters, and α_k, k = 0, …, r − 1, and β_{k−1}, k = 1, …, r − 1, are given by (2.5). In addition, the vectors y_k, k = 1, …, r, are linearly independent.

Proof. The recursions are given by simple induction on k. We omit the details; see, e.g., [13]. By Lemma A.1 it holds that for k = 0, …, r, the δ_k^{(j)}, j = 0, …, k, are uniquely determined up to a scaling; hence it follows that y_{k+1} and δ_{k+1}, k = 0, …, r − 1, are uniquely determined up to a scaling by the recursions of this lemma. Further, note that the recursion for y_{k+1} has a nonzero leading term of q_k plus additional terms in y_i, i = k and i = k − 1. Since q_k is orthogonal to y_i for i ≤ k and q_k ≠ 0 for k < r, it follows that the vectors y_{k+1}, k = 0, …, r − 1, are linearly independent.

Note that the choice

    θ_0 = 1/α_0,   θ_k = 1/(α_k + β_{k−1}),   k = 1, …, r − 1,   (2.9)

in the recursions of Lemma 2.2 implies δ_k = 1, k = 0, …, r. Hence, this choice will give rise to Lanczos vectors that are of the form of gradients. The terms g_k and x_k are reserved for this case, and we then denote (q_k, y_k, δ_k) by (g_k, x_k, 1). Therefore, (2.9) is another way of stating the normalization condition. Note that if α_k + β_{k−1} = 0 for some k, then this particular choice is not well defined and a pivot breakdown occurs. In the unnormalized Krylov subspace framework the choice of scaling will not be based on the value of δ_k.
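The effect of the normalized choice (2.9) can be checked numerically. The sketch below is our illustration (a shifted random positive definite H is chosen so that no pivot breakdown can occur; variable names are ours):

```python
import numpy as np

# Sketch (ours): with the normalized scaling (2.9), theta_0 = 1/alpha_0 and
# theta_k = 1/(alpha_k + beta_{k-1}), the recursions of Lemma 2.2 keep
# delta_k = 1, so each q_k is a gradient g_k = H x_k + c.
rng = np.random.default_rng(0)
n = 6
A = rng.standard_normal((n, n))
H = A @ A.T + n * np.eye(n)            # H positive definite: no pivot breakdown
c = rng.standard_normal(n)

q, y, d = c.copy(), np.zeros(n), 1.0
q_prev = y_prev = None
d_prev = 0.0
deltas = [d]
for k in range(n - 1):
    Hq = H @ q
    alpha = q @ Hq / (q @ q)
    if q_prev is None:
        theta = 1.0 / alpha
        qn, yn, dn = -Hq + alpha * q, -q + alpha * y, alpha * d
    else:
        beta = q_prev @ Hq / (q_prev @ q_prev)
        theta = 1.0 / (alpha + beta)
        qn = -Hq + alpha * q + beta * q_prev
        yn = -q + alpha * y + beta * y_prev
        dn = alpha * d + beta * d_prev
    q_prev, y_prev, d_prev = q, y, d
    q, y, d = theta * qn, theta * yn, theta * dn
    deltas.append(d)
    assert np.allclose(q, H @ y + d * c)   # invariant q_k = H y_k + delta_k c

assert np.allclose(deltas, 1.0)            # normalization: every delta_k = 1
```

The invariant q_k = H y_k + δ_k c holds for any choice of θ_k; what (2.9) adds is exactly that δ_k stays equal to one.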

3.

Properties of the unnormalized Krylov framework

We will henceforth refer to the unnormalized Lanczos triples (q_k, y_k, δ_k), k = 0, …, r, as given by Lemma 2.2. Based on the unnormalized framework due to Gutknecht that has been described in the previous section, we will now proceed to state our results.

3.1. Convergence in the unnormalized Krylov framework

The final triple, (q_r, y_r, δ_r), can now be used to show our main convergence result: (1.1) has a solution if and only if δ_r ≠ 0, and the recursions in Lemma 2.2 can be used to find a solution if δ_r ≠ 0 and a certificate of incompatibility if δ_r = 0. The case when H is singular is included and handled in this result.

Theorem 3.1. Let (q_k, y_k, δ_k), k = 0, …, r, be given by Lemma 2.2, and let Z denote a matrix whose columns form an orthonormal basis for N(H). Then, the following holds for the cases δ_r ≠ 0 and δ_r = 0, respectively.

a) If δ_r ≠ 0, then H x_r + c = 0 for x_r = (1/δ_r) y_r, so that c ∈ R(H) and x_r solves (1.1). In addition, it holds that Z^T y_k = 0, k = 0, …, r.

b) If δ_r = 0, then y_r = δ_r^{(1)} Z Z^T c, with δ_r^{(1)} ≠ 0 and Z^T c ≠ 0, so that c ∉ R(H) and (1.1) has no solution. Further, there is a y_r^{(1)} ∈ K_{r−1}(c, H) so that y_r = H y_r^{(1)} + δ_r^{(1)} c. Hence, H(H x_r^{(1)} + c) = 0 for x_r^{(1)} = (1/δ_r^{(1)}) y_r^{(1)}, so that x_r^{(1)} solves min_{x ∈ R^n} ‖Hx + c‖₂².

Proof. For (a), suppose that δ_r ≠ 0. Then 0 = q_r = H y_r + δ_r c, hence H x_r + c = 0 for x_r = (1/δ_r) y_r, i.e., x_r = (1/δ_r) y_r is a solution to (1.1). Since (1.1) has a solution, it must hold that Z^T c = 0. We have y_k = H y_k^{(1)} + δ_k^{(1)} c for k = 0, …, r, so that Z^T y_k = δ_k^{(1)} Z^T c. As Z^T c = 0, it follows that Z^T y_k = 0, k = 0, …, r.

For (b), suppose that δ_r = 0. We have y_r = H y_r^{(1)} + δ_r^{(1)} c, so that Z^T y_r = δ_r^{(1)} Z^T c. If δ_r = 0, then 0 = q_r = H y_r, so that y_r = δ_r^{(1)} Z Z^T c. It follows from Lemma 2.2 that y_r ≠ 0, so that δ_r^{(1)} ≠ 0 and Z^T c ≠ 0. A combination of H y_r = 0 and y_r = H y_r^{(1)} + δ_r^{(1)} c gives H(H y_r^{(1)} + δ_r^{(1)} c) = 0. Consequently, since δ_r^{(1)} ≠ 0, it holds that H(H x_r^{(1)} + c) = 0 for x_r^{(1)} = (1/δ_r^{(1)}) y_r^{(1)}. With f(x) = ½‖Hx + c‖₂², one obtains ∇f(x) = H(Hx + c), so that x_r^{(1)} is a global minimizer of f over R^n.

Hence, we have shown that (1.1) has a solution if and only if δ_r ≠ 0, and that the recursions in Lemma 2.2 can be used to find a solution if δ_r ≠ 0 and a certificate of incompatibility if δ_r = 0.

We can make a few comments on the sequence {δ_k}. One can show that the sequence will never have two zero elements in a row.³ Also, if θ_{k−1} and θ_k have the same sign and δ_k = 0, then δ_{k+1} δ_{k−1} < 0. We give direct proofs of these properties, using only the recursions of the triples, in Appendix A.2.

3.2. An unnormalized Krylov algorithm

To summarize the derivation up to this point, we now state an algorithm for solving (1.1) based on the triples (q_k, y_k, δ_k), k = 0, …, r, given by Lemma 2.2, using some θ_k of our choice. Algorithm 3.1 is called an unnormalized Krylov algorithm,⁴ as it is the unnormalized vectors {q_k}, spanning the Krylov subspaces, that drive the progress of the algorithm. In the unnormalized setting, the choice of a nonzero θ_k is in theory arbitrary, but for the algorithm stated below we have made the choice to let θ_k > 0 be such that ‖y_{k+1}‖₂ = ‖c‖₂. This choice is well defined since y_k ≠ 0, k = 1, …, r, by Lemma 2.2. In theory, triples are generated as long as q_k ≠ 0. In the algorithm, we introduce a tolerance such that the iterations proceed as long as ‖q_k‖₂ > q_tol, where we let q_tol = √ε_M, with ε_M the machine precision. In theory, we also draw conclusions based on δ_r ≠ 0 or δ_r = 0; for this we introduce a tolerance δ_tol = √ε_M. By Theorem 3.1, Algorithm 3.1 will return either a solution to (1.1) or a certificate that the system is incompatible.

³ This property is used in the composite-step biconjugate gradient method and other look-ahead techniques to show that a composite step or look-ahead block of size two is sufficient to avoid breakdown, see, e.g., [2, 4].
⁴ In the terminology of Gutknecht's survey [13], this method would be called the inconsistent ORes version of the method of conjugate gradients.


Algorithm 3.1 An unnormalized Krylov algorithm

Input arguments: H, c;
Output arguments: compatible; x_r if compatible = 1; y_r if compatible = 0;
q_tol ← tolerance on ‖q‖₂;  [Our choice: q_tol = √ε_M]
δ_tol ← tolerance on |δ|;  [Our choice: δ_tol = √ε_M]
k ← 0; q_0 ← c; y_0 ← 0; δ_0 ← 1;
α_0 ← q_0^T H q_0 / (q_0^T q_0);
q_1 ← −H q_0 + α_0 q_0; y_1 ← −q_0 + α_0 y_0; δ_1 ← α_0 δ_0;
θ_0 ← nonzero scalar;  [Our choice: θ_0 = ‖c‖₂/‖y_1‖₂]
q_1 ← θ_0 q_1; y_1 ← θ_0 y_1; δ_1 ← θ_0 δ_1;
k ← 1;
while ‖q_k‖₂ > q_tol do
    α_k ← q_k^T H q_k / (q_k^T q_k); β_{k−1} ← q_{k−1}^T H q_k / (q_{k−1}^T q_{k−1});
    q_{k+1} ← −H q_k + α_k q_k + β_{k−1} q_{k−1};
    y_{k+1} ← −q_k + α_k y_k + β_{k−1} y_{k−1};
    δ_{k+1} ← α_k δ_k + β_{k−1} δ_{k−1};
    θ_k ← nonzero scalar;  [Our choice: θ_k = ‖c‖₂/‖y_{k+1}‖₂]
    q_{k+1} ← θ_k q_{k+1}; y_{k+1} ← θ_k y_{k+1}; δ_{k+1} ← θ_k δ_{k+1};
    k ← k + 1;
end while
r ← k;
if |δ_r| > δ_tol then
    x_r ← (1/δ_r) y_r; compatible ← 1;
else
    compatible ← 0;
end if

The following small example is chosen to illustrate Algorithm 3.1, with our choices for θ_k > 0, q_tol and δ_tol, on a compatible case of (1.1) where H is a singular matrix. The example also illustrates the change of sign between δ_{k+1} and δ_{k−1} when δ_k = 0.

Example 3.2. Let

    c = (3, 2, 1, 0, −1, −2, −3)^T,   H = diag(c).

Algorithm 3.1 applied to H and c, with θ_k > 0 such that ‖y_k‖ = ‖c‖, k = 1, …, r, and q_tol = δ_tol = √ε_M, yields the following sequences (column k + 1 holds q_k, y_k and δ_k, respectively):

q =
   3.0000   -9.0000    2.2678   -2.7046    0.2648   -0.2445    0.0000
   2.0000   -4.0000   -2.2678    5.4912   -1.0591    1.4673         0
   1.0000   -1.0000   -2.2678    2.3768    1.3239   -3.6681         0
        0         0         0         0         0         0         0
  -1.0000   -1.0000    2.2678    2.3768   -1.3239   -3.6681         0
  -2.0000   -4.0000    2.2678    5.4912    1.0591    1.4673         0
  -3.0000   -9.0000   -2.2678   -2.7046   -0.2648   -0.2445   -0.0000


y =
        0   -3.0000    3.4017   -0.9015   -2.2241   -0.0815    2.1602
        0   -2.0000    1.5119    2.7456   -2.8419    0.7336    2.1602
        0   -1.0000    0.3780    2.3768   -0.9885   -3.6681    2.1602
        0         0         0         0         0         0         0
        0    1.0000    0.3780   -2.3768   -0.9885    3.6681    2.1602
        0    2.0000    1.5119   -2.7456   -2.8419   -0.7336    2.1602
        0    3.0000    3.4017    0.9015   -2.2241    0.0815    2.1602

delta =
   1.0000         0   -2.6458         0    2.3123         0   -2.1602

Hence, r = 6 and x_r = (1/δ_r) y_r = (−1, −1, −1, 0, −1, −1, −1)^T.

An example of an incompatible system will be given in Section 4.2.
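A compact transcription of Algorithm 3.1 in Python/NumPy may look as follows. This is our sketch under the stated choices of θ_k, q_tol and δ_tol; function and variable names are ours, not the paper's:

```python
import numpy as np

def unnormalized_krylov(H, c, qtol=None, dtol=None):
    """Sketch of Algorithm 3.1 (our transcription): generate the triples
    (q_k, y_k, delta_k) with q_k = H y_k + delta_k c, scaled so that
    ||y_{k+1}|| = ||c||, and decide compatibility of Hx + c = 0."""
    eps = np.finfo(float).eps
    qtol = np.sqrt(eps) if qtol is None else qtol
    dtol = np.sqrt(eps) if dtol is None else dtol
    q, y, d = c.astype(float), np.zeros_like(c, dtype=float), 1.0
    q_prev = y_prev = None
    d_prev = 0.0
    while np.linalg.norm(q) > qtol:
        Hq = H @ q
        alpha = q @ Hq / (q @ q)
        if q_prev is None:                       # first step, (2.2a)
            qn, yn, dn = -Hq + alpha * q, -q + alpha * y, alpha * d
        else:                                    # three-term recurrence, (2.2b)
            beta = q_prev @ Hq / (q_prev @ q_prev)
            qn = -Hq + alpha * q + beta * q_prev
            yn = -q + alpha * y + beta * y_prev
            dn = alpha * d + beta * d_prev
        theta = np.linalg.norm(c) / np.linalg.norm(yn)   # our scaling choice
        q_prev, y_prev, d_prev = q, y, d
        q, y, d = theta * qn, theta * yn, theta * dn
    if abs(d) > dtol:
        return True, y / d        # compatible: x_r = y_r / delta_r solves (1.1)
    return False, y               # incompatible: y_r certifies c not in R(H)

# Example 3.2: H = diag(c) is singular but the system is compatible.
c = np.array([3.0, 2.0, 1.0, 0.0, -1.0, -2.0, -3.0])
ok, x = unnormalized_krylov(np.diag(c), c)
```

On Example 3.2 this recovers x_r = (−1, −1, −1, 0, −1, −1, −1)^T, and on an incompatible system such as H = diag(1, 0), c = (1, 1)^T it returns a certificate y_r lying in N(H).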

3.3. On the case when normalization is well defined

It is well known that when normalization is well defined and applied to Algorithm 3.1, the method of conjugate gradients, by Hestenes and Stiefel [17], is obtained. In this case, we denote (q_k, y_k, δ_k) by (g_k, x_k, 1), and the recursions for g_k and x_k simplify such that it is possible to obtain a two-term recurrence for a search direction p_k. The normalization condition for θ_k is θ_k = 1/(α_k + β_{k−1}). We show in Proposition A.6 that this choice of θ_k corresponds exactly to the optimal step length along p_k, which was the motivation for setting up the recursion (2.2) in that particular form.

In Lemma A.5, we show that if H ⪰ 0 then δ_i ≠ 0 for i = 0, …, r − 1. Also, if δ_k > 0 and δ_{k+1} ≠ 0, then δ_{k+1} > 0 if and only if θ_k > 0. Hence, for H ⪰ 0, θ_i > 0, i = 0, …, r − 1, and δ_0 > 0, it holds that δ_i > 0, i = 0, …, r − 1, and δ_r ≥ 0. With the additional information that c ∈ R(H), it holds that δ_r > 0 and normalization is possible in every iteration. On the other hand, for H ⪰ 0, θ_i > 0, i = 0, …, r − 1, δ_0 > 0 and c ∉ R(H), it holds that δ_r = 0 and normalization is possible at all but the final iteration. Further, if H ≻ 0 and θ_i > 0, i = 0, …, r − 1, then δ_i > 0, i = 1, …, r, i.e., δ_r ≠ 0, since (1.1) with H ≻ 0 is always compatible.

4.

Connection to the minimum-residual method

In the case when (1.1) is incompatible, instead of just a certificate of this fact, one would often be interested in a vector x that is "as good as possible". The method of choice could then be the minimum-residual method, which returns a solution in the compatible case and a minimum-residual solution in the incompatible case. This method goes back to Lanczos' early paper [19] and Stiefel [28], and the name is adopted from the implementation of the method, MINRES, by Paige and Saunders, see [24].

For k = 0, …, r, x_k^MR is defined as a solution to min_{x ∈ K_k(c,H)} ‖Hx + c‖₂², and the corresponding residual g_k^MR is defined as g_k^MR = H x_k^MR + c. The vectors x_k^MR are uniquely defined for k = 0, …, r − 1, and for k = r if c ∈ R(H). For the case k = r and c ∉ R(H) there is one degree of freedom in x_r^MR. If c ∈ R(H), then x_r^MR solves (1.1), and if c ∉ R(H), then x_{r−1}^MR and x_r^MR are both solutions to min_{x ∈ R^n} ‖Hx + c‖₂².
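For intuition, x_k^MR can be computed by brute force directly from its definition for tiny problems. The following sketch is our illustration (the small singular example is ours); it shows that the residual norms are non-increasing in k and, for an incompatible system, stay bounded away from zero:

```python
import numpy as np

# Brute-force illustration (ours): x_k^MR minimizes ||Hx + c|| over K_k(c, H);
# for tiny problems it can be computed by least squares in the monomial basis.
def xk_mr_dense(H, c, k):
    B = np.column_stack([np.linalg.matrix_power(H, j) @ c for j in range(k)])
    z, *_ = np.linalg.lstsq(H @ B, -c, rcond=None)
    return B @ z

H = np.diag([1.0, 0.0, 2.0])           # singular H
c = np.array([1.0, 1.0, 1.0])          # c has a component in N(H): incompatible
res = [np.linalg.norm(H @ xk_mr_dense(H, c, k) + c) for k in (1, 2, 3)]
# residual norms are non-increasing in k but bounded below by the norm of the
# N(H)-component of c (here equal to one), which can never be removed
```

The recursions derived below compute the same iterates without storing a basis or solving a least-squares problem per step.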

4.1.

Convergence of the minimum-residual method

In the following theorem, we derive the minimum-residual method based on the unnormalized Krylov subspace framework. In particular, we give explicit formulas for x_k^MR and g_k^MR, k = 0, …, r. For the case k = r, c ∉ R(H), we give an explicit formula for the x_r^MR of minimum Euclidean norm.

Theorem 4.1. Let (q_k, y_k, δ_k) be given by Lemma 2.2 for k = 0, …, r. Then, for k = 0, …, r, it holds that x_k^MR solves min_{x ∈ K_k(c,H)} ‖Hx + c‖₂² if and only if x_k^MR = Σ_{i=0}^{k} γ_i y_i for some γ_i, i = 0, …, k, that are optimal to

    minimize_{γ_0,…,γ_k}   ½ Σ_{i=0}^{k} γ_i² q_i^T q_i
    subject to             Σ_{i=0}^{k} γ_i δ_i = 1.            (4.1)

In particular, x_k^MR takes the following form for the mutually exclusive cases (a) k < r; (b) k = r and δ_r ≠ 0; and (c) k = r and δ_r = 0.

a) For k < r, it holds that

    x_k^MR = ( 1 / Σ_{j=0}^{k} δ_j²/(q_j^T q_j) ) Σ_{i=0}^{k} (δ_i/(q_i^T q_i)) y_i,   (4.2)

and g_k^MR = H x_k^MR + c ≠ 0.

b) For k = r and δ_r ≠ 0, it holds that x_r^MR = (1/δ_r) y_r and g_r^MR = H x_r^MR + c = 0, so that x_r^MR solves (1.1) and x_r^MR is identical to x_r of Theorem 3.1.

c) For k = r and δ_r = 0, it holds that x_r^MR = x_{r−1}^MR + γ_r y_r, where γ_r is an arbitrary scalar, and g_r^MR = H x_r^MR + c = g_{r−1}^MR ≠ 0. In addition, x_r^MR and x_{r−1}^MR solve min_{x ∈ R^n} ‖Hx + c‖₂². The particular choice

    γ_r = − (y_r^T x_{r−1}^MR)/(y_r^T y_r)

makes x_r^MR an optimal solution to min_{x ∈ R^n} ‖Hx + c‖₂² of minimum Euclidean norm.

Proof. Since q_i, i = 0, …, k, form an orthogonal basis for K_{k+1}(c, H), an arbitrary vector in K_{k+1}(c, H) can be written as

    g = Σ_{i=0}^{k} γ_i q_i = Σ_{i=0}^{k} γ_i (H y_i + δ_i c) = H Σ_{i=0}^{k} γ_i y_i + ( Σ_{i=0}^{k} γ_i δ_i ) c,   (4.3)


for some parameters γ_i, i = 0, …, k. Consequently, the condition Σ_{i=0}^{k} γ_i δ_i = 1 inserted into (4.3) gives g = Hx + c with x = Σ_{i=0}^{k} γ_i y_i, i.e., g is an arbitrary vector in K_{k+1}(c, H) for which the coefficient in front of c equals one, and x is the corresponding arbitrary vector in K_k(c, H). Minimizing the Euclidean norm of such a g gives the quadratic program

    minimize_{g, γ_0,…,γ_k}   ½ g^T g
    subject to                g = Σ_{i=0}^{k} γ_i q_i,   Σ_{i=0}^{k} γ_i δ_i = 1,   (4.4)

so that, by (4.3), the optimal values of γ_i, i = 0, …, k, give g_k^MR as g_k^MR = Σ_{i=0}^{k} γ_i q_i and x_k^MR as x_k^MR = Σ_{i=0}^{k} γ_i y_i. Elimination of g in (4.4), taking into account the orthogonality of the q_i's, gives the equivalent problem (4.1). Also note that since δ_0 ≠ 0, the quadratic programs (4.1) and (4.4) are always feasible, and hence they are well defined.

Let L(γ, λ) be the Lagrangian function for (4.1),

    L(γ, λ) = ½ Σ_{i=0}^{k} γ_i² q_i^T q_i − λ ( Σ_{i=0}^{k} γ_i δ_i − 1 ).

The optimality conditions for (4.1) are given by

    0 = ∇_{γ_i} L(γ, λ) = γ_i q_i^T q_i − λ δ_i,   i = 0, …, k,   (4.5a)
    0 = ∇_λ L(γ, λ) = 1 − Σ_{i=0}^{k} γ_i δ_i.                   (4.5b)

First, for (a), consider the case k < r. From (4.5a) it holds that

    γ_i = λ δ_i/(q_i^T q_i),   i = 0, …, k,   (4.6)

which is well defined, since q_i ≠ 0, i = 0, …, r − 1. The expression for λ is obtained by inserting the expression for γ_i, i = 0, …, k, given by (4.6) into (4.5b), so that

    1 = Σ_{i=0}^{k} γ_i δ_i = λ Σ_{i=0}^{k} δ_i²/(q_i^T q_i),   yielding   λ = 1 / Σ_{i=0}^{k} δ_i²/(q_i^T q_i).   (4.7)

Consequently, a combination of (4.6) and (4.7) gives

    γ_i = ( 1 / Σ_{j=0}^{k} δ_j²/(q_j^T q_j) ) δ_i/(q_i^T q_i),   i = 0, …, k.   (4.8)

Hence, by letting x_k^MR = Σ_{i=0}^{k} γ_i y_i, with γ_i given by (4.8), (4.2) follows.

Now, for (b), consider the case k = r with δ_r ≠ 0. Then, since q_r = 0, (4.5a) gives λ = 0 and γ_i = 0, i = 0, …, r − 1. Consequently, (4.5b) gives γ_r = 1/δ_r.


Again, by letting x_r^MR = Σ_{i=0}^{r} γ_i y_i, it holds that x_r^MR = (1/δ_r) y_r, for which g_r^MR = H x_r^MR + c = 0, so that the optimal value in (4.1) is zero and x_r^MR solves (1.1).

Finally, for (c), consider the case k = r with δ_r = 0. Theorem 3.1 shows that there exists an x_r^{(1)} ∈ K_{r−1}(c, H) that solves min_{x ∈ R^n} ‖Hx + c‖₂². Consequently, since x_r^{(1)} ∈ K_{r−1}(c, H), it follows from (a) that it must hold that x_r^{(1)} = x_{r−1}^MR, so that x_{r−1}^MR solves min_{x ∈ R^n} ‖Hx + c‖₂². For k = r, the optimality conditions (4.5) are equivalent to those for k = r − 1, with the additional information that γ_r is arbitrary. Hence, x_r^MR = x_{r−1}^MR + γ_r y_r and g_r^MR = H x_r^MR + c = g_{r−1}^MR for arbitrary γ_r, since H y_r = 0.

Regardless of the value of γ_r, the range-space component of x_r^MR is unique, since Theorem 3.1 gives H y_r = 0. For the remainder of the proof, let x_r^MR = x_{r−1}^MR + γ_r y_r for the particular choice γ_r = −(y_r^T x_{r−1}^MR)/(y_r^T y_r), so that y_r^T x_r^MR = 0. We will show that for this particular choice the null-space component of x_r^MR is zero, and hence that x_r^MR is an optimal solution to min_{x ∈ R^n} ‖Hx + c‖₂² of minimum Euclidean norm. Since y_k = H y_k^{(1)} + δ_k^{(1)} c, it follows that Z^T y_k = δ_k^{(1)} Z^T c, k = 0, …, r. Consequently, since x_r^MR is formed as a linear combination of y_k, k = 0, …, r, it holds that Z^T x_r^MR is parallel to Z^T c. But 0 = y_r^T x_r^MR = (δ_r^{(1)} Z Z^T c)^T x_r^MR = δ_r^{(1)} (Z^T c)^T (Z^T x_r^MR), so that Z^T x_r^MR is also orthogonal to Z^T c. By Theorem 3.1, δ_r^{(1)} ≠ 0 and Z^T c ≠ 0, so that Z^T x_r^MR = 0. Hence, the null-space component of x_r^MR is zero.

Note that at an iteration k at which δ_k = 0 and q_k ≠ 0, it holds that x_k^MR = x_{k−1}^MR, so that the iterate is unchanged. This is referred to as stagnation, and in accordance with Brown [3] it holds that the unnormalized Krylov method and the minimum-residual method form a pair, see, e.g., [26, Proposition 6.17]. In the framework of this paper, normalization is not possible at step k in the Krylov method if and only if there is stagnation in the minimum-residual method. Note that this cannot happen at two consecutive iterations. If q_k ≠ 0, not all information from the problem has been extracted, even if δ_k = 0. Only in the case k = r is global information obtained, and it is determined whether (1.1) has a solution or not.

In the following corollary of Theorem 4.1 we state explicit recursions for the minimum-residual method.

Corollary 4.2. Let (q_k, y_k, δ_k) be given by Lemma 2.2 and let x_k^MR be given by Theorem 4.1 for k = 0, …, r. If δ_0^MR = δ_0², y_0^MR = δ_0 y_0,

    δ_k^MR = q_k^T q_k Σ_{i=0}^{k−1} δ_i²/(q_i^T q_i) + δ_k²   and   y_k^MR = q_k^T q_k Σ_{i=0}^{k−1} (δ_i/(q_i^T q_i)) y_i + δ_k y_k,   k = 1, …, r,

then

    x_k^MR = (1/δ_k^MR) y_k^MR,   k = 0, …, r − 1,   and   k = r if δ_r ≠ 0.

In addition, it holds that

    δ_{k+1}^MR = (q_{k+1}^T q_{k+1})/(q_k^T q_k) δ_k^MR + δ_{k+1}²   and   y_{k+1}^MR = (q_{k+1}^T q_{k+1})/(q_k^T q_k) y_k^MR + δ_{k+1} y_{k+1},

for k = 0, …, r − 1.


On solving symmetric linear equations in an unnormalized Krylov framework

Proof. For k = 0, . . . , r − 1, the expressions for δ_k^{MR} and y_k^{MR} give

  (1/(q_k^T q_k)) δ_k^{MR} = Σ_{i=0}^{k} δ_i^2/(q_i^T q_i)   and   (1/(q_k^T q_k)) y_k^{MR} = Σ_{i=0}^{k} (δ_i/(q_i^T q_i)) y_i.

Note that δ_0 ≠ 0 and q_i ≠ 0, i = 0, . . . , k, implies δ_k^{MR} > 0 for k < r. Hence, Theorem 4.1 gives x_k^{MR} = (1/δ_k^{MR}) y_k^{MR}. If k = r and δ_r ≠ 0, the expressions for δ_r^{MR} and y_r^{MR} give δ_r^{MR} = δ_r^2 > 0 and y_r^{MR} = δ_r y_r, so that Theorem 4.1 gives x_r^{MR} = (1/δ_r^{MR}) y_r^{MR} also for this case. The recursions for y_{k+1}^{MR} and δ_{k+1}^{MR}, k = 0, . . . , r − 1, are straightforward to obtain.

The recursion for x_r^{MR}, based on x_{r−1}^{MR} and (q_r, y_r, δ_r), for the case δ_r = 0 is given in Theorem 4.1 and is not reiterated in Corollary 4.2.

Note that the expressions in Theorem 4.1 and Corollary 4.2 for x_k^{MR}, k = 0, . . . , r, are independent of the scaling of (q_k, y_k, δ_k). Hence, if H ⪰ 0 and c ∈ R(H), then normalization is well defined, so that (g_k, x_k, 1) may be used to give x_k^{MR}, k = 0, . . . , r − 1, as convex combinations of x_i, i = 0, . . . , k, respectively.
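The equivalence between the closed-form sums of Corollary 4.2 and its one-step updates is purely algebraic, so it can be exercised numerically with arbitrary data; the sketch below uses random stand-ins for the q_i and δ_i (no actual Lanczos sequence is needed for this identity):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 6
q = rng.standard_normal((m, n))   # stand-ins for the vectors q_0, ..., q_{m-1}
d = rng.standard_normal(m)        # stand-ins for the scalars delta_0, ..., delta_{m-1}

# Closed form of Corollary 4.2 at the final index k = m - 1:
k = m - 1
direct = (q[k] @ q[k]) * sum(d[i] ** 2 / (q[i] @ q[i]) for i in range(k)) + d[k] ** 2

# One-step recursion, starting from delta_0^MR = delta_0^2:
d_mr = d[0] ** 2
for j in range(m - 1):
    d_mr = (q[j + 1] @ q[j + 1]) / (q[j] @ q[j]) * d_mr + d[j + 1] ** 2
```

Both routes give the same δ_k^{MR}; the analogous check holds for y_k^{MR} with δ_i replaced by δ_i y_i in the sum.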

4.2. A minimum-residual algorithm based on the unnormalized Krylov method

To summarize, we next state an algorithm for the minimum-residual method based on Algorithm 3.1 and extended with the recursions in Corollary 4.2.

Algorithm 4.2 A minimum-residual algorithm based on the unnormalized Krylov method
Input arguments: H, c;
Output arguments: x_r^{MR}, g_r^{MR}, compatible; (x_r or y_r if compatible = 1 or 0;)
Run Algorithm 3.1 with the extra initialization
  y_0^{MR} ← δ_0 y_0;  δ_0^{MR} ← δ_0^2;  x_0^{MR} ← (1/δ_0^{MR}) y_0^{MR};  g_0^{MR} ← H x_0^{MR} + c;
For k = 1 calculate in addition
  y_1^{MR} ← (q_1^T q_1/(q_0^T q_0)) y_0^{MR} + δ_1 y_1;  δ_1^{MR} ← (q_1^T q_1/(q_0^T q_0)) δ_0^{MR} + δ_1^2;
  x_1^{MR} ← (1/δ_1^{MR}) y_1^{MR};  g_1^{MR} ← H x_1^{MR} + c;
For k > 1 until termination calculate in addition
  y_{k+1}^{MR} ← (q_{k+1}^T q_{k+1}/(q_k^T q_k)) y_k^{MR} + δ_{k+1} y_{k+1};  δ_{k+1}^{MR} ← (q_{k+1}^T q_{k+1}/(q_k^T q_k)) δ_k^{MR} + δ_{k+1}^2;
  x_{k+1}^{MR} ← (1/δ_{k+1}^{MR}) y_{k+1}^{MR};  g_{k+1}^{MR} ← H x_{k+1}^{MR} + c;
At termination, if |δ_r| < δ_tol, calculate in addition
  x_r^{MR} ← x_{r−1}^{MR} − ((y_r^T x_{r−1}^{MR})/(y_r^T y_r)) y_r;  compatible ← 0;

Hence, for a compatible system (1.1), Algorithm 4.2 gives the same solution x_r as Algorithm 3.1, and in addition it calculates x_r^{MR}. They are both estimates of a solution to (1.1). Further, if (1.1) is incompatible, then Algorithm 4.2 delivers x_r^{MR}, an optimal solution to min_{x∈R^n} ‖Hx + c‖_2^2 of minimum Euclidean norm, in addition to the certificate of incompatibility.

Next we give another small example, chosen to illustrate Algorithm 4.2 with our choices for θ_k > 0, q_tol and δ_tol, on a case when (1.1) is incompatible, i.e., c ∉ R(H).

Example 4.3. Let

  c = ( 3  2  1  1  −1  −2  −3 )^T,   H = diag( 5  2  1  0  −1  −2  −3 ).

Algorithm 4.2 applied to H and c, with θ_k > 0 such that ‖y_k‖ = ‖c‖, k = 1, . . . , r, and q_tol = δ_tol = √ε_M, yields the following sequences, with iterations k = 0, . . . , 7 as columns:

q =
   3.0000  -13.1379    3.5628   -0.8597    0.1063   -0.0181    0.0017   -0.0000
   2.0000   -2.7586   -5.7676    3.9832   -1.3787    0.6372   -0.1470   -0.0000
   1.0000   -0.3793   -3.1464    0.1039    1.8638   -2.5737    1.1021    0.0000
   1.0000    0.6207   -2.8617   -1.7605    2.2573    0.5896   -1.7634   -0.0000
  -1.0000   -1.6207    2.0296    2.7934   -0.6882   -2.4489   -1.4695    0.0000
  -2.0000   -5.2414    1.3007    4.3735    2.1842    1.1548    0.3149   -0.0000
  -3.0000  -10.8621   -3.8286   -2.6032   -0.6658   -0.2082   -0.0367    0.0000

y =
        0   -3.0000    2.4296    0.8844   -1.3331   -0.3574    1.0584   -0.0000
        0   -2.0000   -0.0222    3.7521   -2.9466   -0.2710    1.6899         0
        0   -1.0000   -0.2847    1.8644   -0.3935   -3.1633    2.8655    0.0000
        0   -1.0000   -0.5584    1.5833    0.7502   -3.7018   -0.7931    5.3852
        0    1.0000    0.8320   -1.0329   -1.5691    1.8593    3.2329    0.0000
        0    2.0000    2.2113   -0.4262   -3.3494   -1.1670    1.6060    0.0000
        0    3.0000    4.1379    2.6283   -2.0353   -0.5202    1.7756    0.0000

delta =
   1.0000    0.6207   -2.8617   -1.7605    2.2573    0.5896   -1.7634   -0.0000

xMR =
        0   -0.1588   -0.6633   -0.6143   -0.5995   -0.5998   -0.6000   -0.6000
        0   -0.1059   -0.0228   -0.6647   -1.0640   -1.0371   -1.0000   -1.0000
        0   -0.0529    0.0585   -0.2817   -0.2148   -0.4441   -1.0000   -1.0000
        0   -0.0529    0.1284   -0.1845    0.1376   -0.1481    0.1333   -0.0000
        0    0.0529   -0.1983    0.0407   -0.4178   -0.2588   -1.0000   -1.0000
        0    0.1059   -0.5364   -0.2994   -1.0375   -1.0794   -1.0000   -1.0000
        0    0.1588   -1.0143   -1.1600   -0.9990   -0.9938   -1.0000   -1.0000

Hence, r = 7 and x_r^{MR} = ( −0.6  −1  −1  0  −1  −1  −1 )^T. Note that, since |δ_r| < δ_tol, the system is considered incompatible and x_r^{MR} is the optimal solution to min_{x∈R^n} ‖Hx + c‖_2^2 of minimum Euclidean norm, with ‖H x_r^{MR} + c‖_2^2 = 1.
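Algorithm 4.2 can be sketched as runnable code. Since the recursions of Lemma 2.2 lie outside this excerpt, the triple-generating three-term recurrence below is our reconstruction, chosen to be consistent with the proofs of Propositions A.2 and A.6 and with the invariant q_k = H y_k + δ_k c; the function name, the normalizing choice of θ_k, and the tolerance handling are ours:

```python
import numpy as np

def unnormalized_minres(H, c, tol=1e-10):
    # Generate triples (q_k, y_k, delta_k) with q_k = H y_k + delta_k c and the
    # q_k mutually orthogonal (reconstructed three-term recurrence; Lemma 2.2
    # itself is not reproduced in this excerpt), and accumulate the
    # minimum-residual iterates with the one-step updates of Corollary 4.2.
    n = len(c)
    q, y, d = c.astype(float), np.zeros(n), 1.0   # (q_0, y_0, delta_0) = (c, 0, 1)
    q_prev = y_prev = None
    d_prev = 0.0
    y_mr, d_mr = d * y, d ** 2                    # initialization of Corollary 4.2
    x_mr = y_mr / d_mr
    for _ in range(n):
        Hq = H @ q
        a = (q @ Hq) / (q @ q)                    # alpha_k (Rayleigh quotient)
        wq, wy, wd = -Hq + a * q, -q + a * y, a * d
        if q_prev is not None:
            b = (q_prev @ Hq) / (q_prev @ q_prev)             # beta_{k-1}
            wq, wy, wd = wq + b * q_prev, wy + b * y_prev, wd + b * d_prev
        terminal = np.linalg.norm(wq) <= tol * np.linalg.norm(q)  # q_{k+1} = 0, k + 1 = r
        th = 1.0 if terminal else 1.0 / np.linalg.norm(wq)        # any theta_k > 0 will do
        q_prev, y_prev, d_prev = q, y, d
        q, y, d = th * wq, th * wy, th * wd
        ratio = (q @ q) / (q_prev @ q_prev)
        y_mr = ratio * y_mr + d * y               # Corollary 4.2 one-step updates
        d_mr = ratio * d_mr + d ** 2
        if terminal:
            if abs(d) <= tol:                     # delta_r = 0: incompatible system;
                x_mr -= (y @ x_mr) / (y @ y) * y  # minimum-norm correction with y_r
                return x_mr, False
            return y_mr / d_mr, True              # delta_r != 0: solves H x + c = 0
        x_mr = y_mr / d_mr
    return x_mr, True
```

On the data of Example 4.3 this sketch reports the system as incompatible and returns x ≈ (−0.6, −1, −1, 0, −1, −1, −1)^T, matching the example; on a compatible system it returns the solution of (1.1).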


5. Summary and conclusion

By making use of an unnormalized Krylov subspace framework for solving symmetric systems of linear equations, as proposed by Gutknecht [11, 12], we show how to determine, in exact arithmetic, if the system is compatible or incompatible. In the compatible case, a solution is given. In the incompatible case, a certificate of incompatibility is obtained. The basis of this framework are the triples (q_k, y_k, δ_k), k = 0, . . . , r, given by Lemma 2.2, that are uniquely defined up to a scaling. Our results include and handle the case of a singular matrix H. To the best of our knowledge this is not covered in any previous work.

We have also put the minimum-residual method in this framework and provided explicit formulas for the iterations. Again, the analysis is based on the triples. In the case of an incompatible system, our analysis gives an expression for x_r^{MR} of minimum Euclidean norm. The original implementation of MINRES by Paige and Saunders [24] did not deliver the optimal solution to min_{x∈R^n} ‖Hx + c‖_2^2 of minimum Euclidean norm. In [5], Choi, Paige and Saunders present a MINRES-like algorithm, MINRES-QLP, that does.

One may observe that an alternative to using the minimum-residual iterations would be to consider recursions for y_k^{(1)} and δ_k^{(1)} as given by (2.8) and then calculate x_{r−1}^{MR} = (1/δ_r^{(1)}) y_r^{(1)}, according to the analysis in the proofs of Theorems 3.1 and 4.1. However, such an approach would not automatically yield the estimates x_k^{MR}, k = 0, . . . , r − 2.

One could also note that the method of conjugate gradients may be viewed as trying to solve the minimum-residual problem (4.1) in the situation where only the present triple (q_k, y_k, δ_k) is allowed in the linear combination, i.e., γ_i = 0, i = 0, . . . , k − 1. This problem is then not necessarily feasible; it is infeasible exactly when δ_k = 0. One could think of methods other than the minimum-residual method which use a linear combination of more than one triple. It would suffice to use two consecutive triples, since it cannot hold that δ_{k−1} = 0 and δ_k = 0.

Finally, we want to stress that this paper is meant to give insight into the unnormalized Krylov subspace framework, in exact arithmetic. In finite precision, the unnormalized Krylov method would inherit the deficiencies of any method based on a Lanczos process, such as loss of orthogonality of the generated vectors. It is beyond the scope of the present paper to make such an analysis, see, e.g., [16, 21, 22, 25]. The theory of our paper is based on determining if certain quantities are zero or not. In our algorithms, we have made choices of optimality tolerances that are not meant to be universal. To obtain a fully functioning algorithm, the issue of determining if a quantity is near-zero would need to be considered in more detail. Also, we have based our analysis on the triples, so that termination of Algorithm 4.2 is based on q_k and δ_k. In practice, one should probably also consider g_k^{MR}. Further, the use of preconditioning is not explored in this paper; for this subject see, e.g., [1, 6, 7].

A. Appendix

A.1. A result on the Lanczos vectors

For completeness, we review a result that characterizes the properties of q_k expressed as in (2.1), needed for the analysis.

Lemma A.1. Let r denote the smallest positive integer k for which K_{k+1}(c, H) = K_k(c, H). For an index k such that 1 ≤ k ≤ r, let q_k ∈ K_{k+1}(c, H) ∩ K_k(c, H)^⊥ be expressed as in (2.1). Then the scalars δ_k^{(j)}, j = 0, . . . , k, are uniquely determined up to a nonzero scaling. In addition, if δ_k^{(k)} ≠ 0, it holds that q_k = 0 if and only if K_{k+1}(c, H) = K_k(c, H), i.e., if k = r.

Proof. Assume that q_k ∈ K_{k+1}(c, H) ∩ K_k(c, H)^⊥ is expressed as in (2.1). If k < r, then c, Hc, H^2c, . . . , H^k c are linearly independent. Hence, δ_k^{(j)}, j = 0, . . . , k, are uniquely determined by q_k. Consequently, as q_k is uniquely defined up to a nonzero scaling, then so are δ_k^{(j)}, j = 0, . . . , k. For k = r, we have q_r = 0, so that

  −δ_r^{(r)} H^r c = Σ_{j=0}^{r−1} δ_r^{(j)} H^j c.   (A.1)

By the definition of r, it holds that c, Hc, H^2c, . . . , H^{r−1}c are linearly independent. Hence, (A.1) shows that a fixed δ_r^{(r)} uniquely determines δ_r^{(j)}, j = 0, . . . , r − 1. Consequently, a scaling of δ_r^{(r)} gives a corresponding scaling of δ_r^{(j)}, j = 0, . . . , r − 1. Thus, δ_r^{(j)}, j = 0, . . . , r, are uniquely determined up to a common scaling.

Finally, assume that δ_k^{(k)} ≠ 0. By definition, K_{k+1}(c, H) = K_k(c, H) implies q_k = 0. To show the converse, assume that q_k = 0. Then,

  −δ_k^{(k)} H^k c = Σ_{j=0}^{k−1} δ_k^{(j)} H^j c.   (A.2)

If δ_k^{(k)} ≠ 0, then (A.2) implies H^k c ∈ span{c, Hc, H^2c, . . . , H^{k−1}c}, i.e., K_{k+1}(c, H) = K_k(c, H), completing the proof.

A.2. Properties of the sequence {δ_k}

In the following proposition it is shown that the sequence {δ_k} cannot have two zero elements in a row.

Proposition A.2. Let (q_k, y_k, δ_k), k = 0, . . . , r, be given by Lemma 2.2. If q_k ≠ 0 and δ_k = 0, then

  δ_{k+1} = −(θ_k/θ_{k−1}) (q_k^T q_k/(q_{k−1}^T q_{k−1})) δ_{k−1} ≠ 0.


Proof. By Proposition 2.1 it holds that q_k^T q_k = −θ_{k−1} q_k^T H q_{k−1}, and, taking into account δ_k = 0, the expression for δ_{k+1} from Proposition 2.2 gives

  δ_{k+1} = θ_k (q_{k−1}^T H q_k/(q_{k−1}^T q_{k−1})) δ_{k−1} = −(θ_k/θ_{k−1}) (q_k^T q_k/(q_{k−1}^T q_{k−1})) δ_{k−1},   (A.3)

giving the required expression for δ_{k+1}. It remains to show that δ_{k+1} ≠ 0. First, assume that k = 1, so that δ_1 = 0. Then, since θ_1 ≠ 0, θ_0 ≠ 0 and δ_0 = 1, (A.3) gives δ_2 ≠ 0. Now assume that k > 1. Assume, to get a contradiction, that δ_{k+1} = 0. Then, since θ_k ≠ 0 and θ_{k−1} ≠ 0, (A.3) gives δ_{k−1} = 0. We may then repeat the same argument to obtain δ_i = 0, i = 1, . . . , k. But this gives a contradiction, as δ_1 = 0 implies δ_2 ≠ 0. Hence, it must hold that δ_{k+1} ≠ 0, as required.

Based on Proposition A.2, the following corollary states that if θ_{k−1} and θ_k have the same sign and δ_k = 0, then δ_{k+1} and δ_{k−1} have opposite signs.

Corollary A.3. Let (q_k, y_k, δ_k), k = 0, . . . , r, be given by Proposition 2.2 with θ_{k−1} and θ_k of the same sign. If q_k ≠ 0 and δ_k = 0, then δ_{k+1} δ_{k−1} < 0.

The following lemma states an expression for the triples that is used in showing properties of the signs of δ_k and θ_k for the case when H is positive semidefinite.

Lemma A.4. Let (q_k, y_k, δ_k), k = 0, . . . , r, be given by Lemma 2.2. If δ_k ≠ 0 and k < r, then

  (y_{k+1} − (δ_{k+1}/δ_k) y_k)^T H (y_{k+1} − (δ_{k+1}/δ_k) y_k) = θ_k (δ_{k+1}/δ_k) q_k^T q_k.   (A.4)

Proof. Eliminating c from the difference of q_{k+1} and (δ_{k+1}/δ_k) q_k yields

  q_{k+1} − (δ_{k+1}/δ_k) q_k = H (y_{k+1} − (δ_{k+1}/δ_k) y_k).   (A.5)

Then pre-multiplication of (A.5) with (y_{k+1} − (δ_{k+1}/δ_k) y_k)^T yields

  (y_{k+1} − (δ_{k+1}/δ_k) y_k)^T (q_{k+1} − (δ_{k+1}/δ_k) q_k) = (y_{k+1} − (δ_{k+1}/δ_k) y_k)^T H (y_{k+1} − (δ_{k+1}/δ_k) y_k).   (A.6)

Since q_{k+1} is orthogonal to y_{k+1} and y_k, and since q_k is orthogonal to y_k, (A.6) becomes

  −(δ_{k+1}/δ_k) y_{k+1}^T q_k = (y_{k+1} − (δ_{k+1}/δ_k) y_k)^T H (y_{k+1} − (δ_{k+1}/δ_k) y_k).   (A.7)

Hence, by Proposition 2.1 and since q_k is orthogonal to y_k and y_{k−1}, (A.7) may be written as

  θ_k (δ_{k+1}/δ_k) q_k^T q_k = (y_{k+1} − (δ_{k+1}/δ_k) y_k)^T H (y_{k+1} − (δ_{k+1}/δ_k) y_k),

hence (A.4) is obtained.

The following lemma gives some results on the behavior of the sequence {δ_k} in connection with the sign of θ_k for the case when H ⪰ 0.

Lemma A.5. Let (q_k, y_k, δ_k), k = 0, . . . , r, be given by Lemma 2.2. Assume that H ⪰ 0. Then δ_k ≠ 0 for k < r. If δ_k > 0 and δ_{k+1} ≠ 0, then δ_{k+1} > 0 if and only if θ_k > 0.

Proof. Assume that δ_k = 0 for some k < r; then q_k = H y_k, and pre-multiplication with y_k^T yields 0 = y_k^T q_k = y_k^T H y_k, since q_k is orthogonal to y_k. Then, since H ⪰ 0, it follows that H y_k = 0 and hence q_k = 0. Since q_k ≠ 0 for k < r, the assumption yields a contradiction. Hence, δ_k ≠ 0, k < r. Next suppose that δ_k > 0 and δ_{k+1} ≠ 0. Since H ⪰ 0, Lemma A.4 gives

  θ_k (δ_{k+1}/δ_k) q_k^T q_k ≥ 0,   (A.8)

which implies that δ_{k+1} and δ_k have the same sign if and only if θ_k > 0. Hence, if δ_k > 0, then δ_{k+1} > 0 if and only if θ_k > 0.

The relation of the signs in Lemma A.5 is a consequence of our choice of the minus sign in (2.4). Otherwise δ_k would alternate sign in each iteration for θ_k > 0 and H ⪰ 0. A consequence of Lemma A.5 is that if θ_k is chosen positive for k = 0, . . . , r, then δ_k ≤ 0 for some k implies that H is not positive definite, and δ_k < 0 for some k implies that H is not positive semidefinite.

A.3. The method of conjugate gradients

If normalization is well defined and applied to Algorithm 3.1, then one obtains the method of conjugate gradients by Hestenes and Stiefel [17]. For an introduction to the method of conjugate gradients see, e.g., [20, 27]. The method is usually defined for the case where H ≻ 0. In the method of conjugate gradients, an iterate x_k is defined as the solution to min_{x∈K_k(c,H)} (1/2) x^T H x + c^T x, and g_k = H x_k + c for k = 0, . . . , r, i.e., g_k ∈ K_{k+1}(c, H) ∩ K_k(c, H)^⊥. In the setting of this paper, this is equivalent to generating triples (q_k, y_k, δ_k), k = 0, . . . , r, given by Lemma 2.2, selecting the scaling θ_k in (2.9) such that δ_k = 1 for all k. With the additional assumption H ⪰ 0, Lemma A.5 gives δ_k ≠ 0, k = 0, . . . , r − 1. If c ∈ R(H), i.e., (1.1) is compatible, then Theorem 3.1 ensures that also δ_r ≠ 0. Further, if H ⪰ 0 and c ∉ R(H), normalization will be well defined in all except the very last iteration.

For completeness, in the following proposition we show that when the normalization condition is satisfied, θ_k is exactly the step length along the search direction p_k in iteration k, so that the usual line-search description of the method of conjugate gradients, see, e.g., [26], follows.


Proposition A.6. Assume that H ⪰ 0 and c ∈ R(H). If (q_k, y_k, δ_k), k = 0, . . . , r, are given by Lemma 2.2 for the choice of θ_k in (2.9), then δ_k = 1, k = 1, . . . , r, and θ_k > 0, k = 0, . . . , r − 1. Hence, denoting (q_k, y_k, δ_k) by (g_k, x_k, 1), for

  p_0 = −g_0,   p_k = −g_k + (g_k^T g_k/(g_{k−1}^T g_{k−1})) p_{k−1},   k = 1, . . . , r − 1,

it holds that

  x_{k+1} = x_k + θ_k p_k,   g_{k+1} = g_k + θ_k H p_k,   k = 0, . . . , r − 1,

and further,

  θ_k = −(g_k^T p_k)/(p_k^T H p_k),   k = 0, . . . , r − 1.

Proof. Let (q_0, y_0, δ_0) = (c, 0, 1); then with θ_k as in (2.9), i.e.,

  θ_0 = 1/α_0,   θ_k = 1/(α_k + β_{k−1}),   k = 1, . . . , r − 1,

where α_k, k = 0, . . . , r − 1, and β_{k−1}, k = 1, . . . , r − 1, are given by (2.5), the recursions of Lemma 2.2 yield δ_k = 1, k = 0, . . . , r − 1. Hence, by Lemma A.5, θ_k > 0, k = 0, . . . , r − 1. Denoting (q_k, y_k, δ_k) by (g_k, x_k, 1), the recursions for x_k and g_k of Lemma 2.2 are then given by

  x_1 = x_0 + θ_0 (−g_0),   x_{k+1} = x_k + θ_k (−g_k − β_{k−1} (x_k − x_{k−1})),   k = 1, . . . , r − 1,

and

  g_1 = H x_1 + c = g_0 + θ_0 (−H g_0),   g_{k+1} = H x_{k+1} + c = g_k + θ_k (−H g_k − β_{k−1} (g_k − g_{k−1})),   k = 1, . . . , r − 1.

For p_k = (1/θ_k)(x_{k+1} − x_k), k = 0, . . . , r − 1, the above recursions give

  p_0 = −g_0,   and   p_k = −g_k − β_{k−1} (x_k − x_{k−1}),   k = 1, . . . , r − 1.

By Proposition 2.1 it holds that g_k^T g_k = −θ_{k−1} g_k^T H g_{k−1}, and therefore

  β_{k−1} = (g_{k−1}^T H g_k)/(g_{k−1}^T g_{k−1}) = −(1/θ_{k−1}) (g_k^T g_k)/(g_{k−1}^T g_{k−1}),

hence

  p_k = −g_k − β_{k−1} θ_{k−1} p_{k−1} = −g_k + (g_k^T g_k/(g_{k−1}^T g_{k−1})) p_{k−1},   k = 1, . . . , r − 1.


Consequently, since g_{k+1} − g_k = H(x_{k+1} − x_k) = θ_k H p_k, k = 0, . . . , r − 1, it holds that g_{k+1} = g_k + θ_k H p_k, k = 0, . . . , r − 1. Further, g_{k+1}^T p_k = 0, since g_{k+1} ∈ K_{k+2}(c, H) ∩ K_{k+1}(c, H)^⊥ and p_k ∈ K_{k+1}(c, H). Hence, 0 = g_{k+1}^T p_k = g_k^T p_k + θ_k p_k^T H p_k yields

  θ_k = −(g_k^T p_k)/(p_k^T H p_k),   k = 0, . . . , r − 1,

completing the proof.
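The line-search form established in Proposition A.6 is the classical conjugate gradient iteration; a minimal runnable sketch for H ≻ 0 (the function name is ours):

```python
import numpy as np

def conjugate_gradients(H, c, tol=1e-12):
    # Solve H x + c = 0 for symmetric positive definite H, using the
    # recursions of Proposition A.6: exact step length theta_k along p_k.
    x = np.zeros_like(c)
    g = c.copy()                      # g_0 = H x_0 + c with x_0 = 0
    p = -g                            # p_0 = -g_0
    for _ in range(5 * len(c)):
        if np.linalg.norm(g) <= tol * np.linalg.norm(c):
            break
        Hp = H @ p
        theta = -(g @ p) / (p @ Hp)   # theta_k = -g_k' p_k / p_k' H p_k
        x = x + theta * p             # x_{k+1} = x_k + theta_k p_k
        g_new = g + theta * Hp        # g_{k+1} = g_k + theta_k H p_k
        p = -g_new + (g_new @ g_new) / (g @ g) * p    # p_{k+1}
        g = g_new
    return x
```

In exact arithmetic the loop terminates after at most r ≤ n steps; the iteration cap and tolerance here are practical safeguards for floating point.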

Acknowledgement. This research was partially supported by the Swedish Research Council (VR).

References

[1] O. Axelsson, On the efficiency of a class of A-stable methods, Nordisk Tidskr. Informationsbehandling (BIT), 14 (1974), pp. 279–287.
[2] R. E. Bank and T. F. Chan, An analysis of the composite step biconjugate gradient method, Numer. Math., 66 (1993), pp. 295–319.
[3] P. N. Brown, A theoretical comparison of the Arnoldi and GMRES algorithms, SIAM J. Sci. Statist. Comput., 12 (1991), pp. 58–78.
[4] T. F. Chan and T. Szeto, Composite step product methods for solving nonsymmetric linear systems, SIAM J. Sci. Comput., 17 (1996), pp. 1491–1508.
[5] S.-C. T. Choi, C. C. Paige, and M. A. Saunders, MINRES-QLP: a Krylov subspace method for indefinite or singular symmetric systems, SIAM J. Sci. Comput., 33 (2011), pp. 1810–1836.
[6] D. J. Evans, The use of pre-conditioning in iterative methods for solving linear equations with symmetric positive definite matrices, J. Inst. Math. Appl., 4 (1968), pp. 295–314.
[7] D. J. Evans, The analysis and application of sparse matrix algorithms in the finite element method, in The Mathematics of Finite Elements and Applications, Academic Press, London, 1973, pp. 427–447.
[8] A. Forsgren, P. E. Gill, and J. R. Shinnerl, Stability of symmetric ill-conditioned systems arising in interior methods for constrained optimization, SIAM J. Matrix Anal. Appl., 17 (1996), pp. 187–211.
[9] G. H. Golub and D. P. O'Leary, Some history of the conjugate gradient and Lanczos algorithms: 1948–1976, SIAM Rev., 31 (1989), pp. 50–102.
[10] G. H. Golub and C. F. Van Loan, Matrix computations, vol. 3 of Johns Hopkins Series in the Mathematical Sciences, Johns Hopkins University Press, Baltimore, MD, 1983.
[11] M. H. Gutknecht, The unsymmetric Lanczos algorithms and their relations to Padé approximation, continued fractions, and the QD algorithm, Preliminary Proceedings of the Copper Mountain Conference on Iterative Methods, April 1990.
[12] M. H. Gutknecht, A completed theory of the unsymmetric Lanczos process and related algorithms, part I, SIAM J. Matrix Anal. Appl., 13 (1992), pp. 594–639.
[13] M. H. Gutknecht, Lanczos-type solvers for nonsymmetric linear systems of equations, in Acta Numerica, 1997, vol. 6 of Acta Numer., Cambridge Univ. Press, Cambridge, 1997, pp. 271–397.
[14] M. H. Gutknecht, A brief introduction to Krylov space methods for solving linear systems, in Frontiers of Computational Science, Springer-Verlag, Berlin, 2007, pp. 53–62. International Symposium on Frontiers of Computational Science, Nagoya Univ., Nagoya, Japan, Dec 12–13, 2005.
[15] M. H. Gutknecht and K. J. Ressel, Look-ahead procedures for Lanczos-type product methods based on three-term Lanczos recurrences, SIAM J. Matrix Anal. Appl., 21 (2000), pp. 1051–1078.
[16] M. Hanke, Conjugate gradient type methods for ill-posed problems, vol. 327 of Pitman Research Notes in Mathematics Series, Longman Scientific & Technical, Harlow, 1995.
[17] M. R. Hestenes and E. Stiefel, Methods of conjugate gradients for solving linear systems, J. Research Nat. Bur. Standards, 49 (1952), pp. 409–436 (1953).
[18] C. Lanczos, An iteration method for the solution of the eigenvalue problem of linear differential and integral operators, J. Research Nat. Bur. Standards, 45 (1950), pp. 255–282.
[19] C. Lanczos, Solution of systems of linear equations by minimized-iterations, J. Research Nat. Bur. Standards, 49 (1952), pp. 33–53.
[20] D. G. Luenberger, Linear and nonlinear programming, Addison-Wesley, Boston, MA, second ed., 1984.
[21] G. Meurant and Z. Strakoš, The Lanczos and conjugate gradient algorithms in finite precision arithmetic, Acta Numer., 15 (2006), pp. 471–542.
[22] C. C. Paige, Error analysis of the Lanczos algorithm for tridiagonalizing a symmetric matrix, J. Inst. Math. Appl., 18 (1976), pp. 341–349.
[23] C. C. Paige, Krylov subspace processes, Krylov subspace methods, and iteration polynomials, in Proceedings of the Cornelius Lanczos International Centenary Conference (Raleigh, NC, 1993), SIAM, Philadelphia, PA, 1994, pp. 83–92.
[24] C. C. Paige and M. A. Saunders, Solutions of sparse indefinite systems of linear equations, SIAM J. Numer. Anal., 12 (1975), pp. 617–629.
[25] J. K. Reid, On the method of conjugate gradients for the solution of large sparse systems of linear equations, in Large Sparse Sets of Linear Equations (Proc. Conf., St. Catherine's Coll., Oxford, 1970), Academic Press, London, 1971, pp. 231–254.
[26] Y. Saad, Iterative methods for sparse linear systems, Society for Industrial and Applied Mathematics, Philadelphia, PA, second ed., 2003.
[27] J. R. Shewchuk, An introduction to the conjugate gradient method without the agonizing pain, tech. rep., Carnegie-Mellon University, Pittsburgh, PA, USA, 1994.
[28] E. Stiefel, Relaxationsmethoden bester Strategie zur Lösung linearer Gleichungssysteme, Comment. Math. Helv., 29 (1955), pp. 157–179.