LASO2 Documentation

David S. Scott
Computer Science Department
University of Texas at Austin

Table of Contents

1. Introduction
2. New Eigenvalue Solver
   2.1. Changed Workspace
   2.2. Extra Space in S
   2.3. Random Numbers
   2.4. Storage of the Band Matrix
3. Other Changes
   3.1. Local Reorthogonalization
   3.2. Changes to Subroutine SMVPC
   3.3. Subroutine SVZERO
   3.4. Change in Subroutine SORTQR
   3.5. Subroutine SVSORT
   3.6. Formatting Changes
4. Subroutine Glossary
5. Variable Glossary
Appendix A: User Guide for SNLASO
Appendix B: User Guide for SILASO

1. Introduction

The LASO package is a set of FORTRAN IV subroutines for computing a few eigenvalues of a large (sparse) symmetric matrix. The LASO package is documented in a technical report [3]. This article documents the changes to the LASO package which were incorporated into LASO2. The primary difference between LASO and LASO2 is that LASO2 does not use EISPACK to compute the eigenvalues of the band matrix generated by the Lanczos algorithm. It also monitors all the eigenvalues of interest at every Lanczos step rather than selected ones. Other changes were incorporated at the same time and are described in this article.

The user guides for the two entry points for LASO2 (subroutines SNLASO and SILASO) are given at the end. These guides are virtually identical to the user guides for the old LASO package. In particular, the calling sequences for SNLASO and SILASO are the same for both versions.

2. New Eigenvalue Solver

The subroutines available in EISPACK [4] and [1] are not very efficient for computing a few eigenvalues of a symmetric band matrix. LASO2 uses SLAEIG, a specialized version of the subroutine SBPRQS [2]. The new solver is better for two reasons.

1. The work required is proportional to the dimension of the band matrix instead of being proportional to the square of the dimension.

2. SBPRQS can take advantage of known approximations to eigenvectors. In the LASO context certain eigenvalues are computed at each step and it is easy to augment the eigenvector from the previous step with zeroes to obtain an approximate eigenvector. In later steps of the Lanczos algorithm these approximations are really quite good and significant savings over the EISPACK codes are obtained.

Several additional changes were needed to merge SBPRQS and LASO. These are described in the following subsections.

2.1. Changed Workspace

In LASO2 the array WORK is broken into different pieces. The following table gives the work spaces and their dimensions in each of the entry points.

   ARRAY   SNLASO                   SILASO
   P1      N*NBLOCK                 same
   P0      N*NBLOCK                 same
   RES     NVAL                     MAXVEC
   TAU     NVAL                     MAXVEC
   OTAU    NVAL                     MAXVEC
   T       (NBLOCK+1)*MAXJ          same
   ALP     NBLOCK*NBLOCK            same
   BET     NBLOCK*NBLOCK            same
   S       MAXJ*(NVAL+1)            MAXJ*(MAXVEC+2)
   P2      N*NBLOCK                 same
   BOUND   2*NVAL + 6               2*MAXVEC + 8
   ATEMP   (2*NBLOCK+1)*MAXVEC      same
   VTEMP   MAXJ                     same
   D       (NBLOCK+1)*(2*NBLOCK+1)  same

The four arrays BOUND, ATEMP, VTEMP, and D are used by SLAEIG. The space as described above is slightly less than that needed by LASO. However a further savings was achieved by overlapping the space for P2 and the space for the last four arrays. Since P2 is occasionally used to hold an eigenvector in a call to SLAEIG, it is necessary to offset the start of the array BOUND from the start of P2. In particular BOUND(1) = P2(NBLOCK+1).

2.2. Extra Space in S

To take full advantage of the new band eigensolver it is necessary to remember the eigenvectors of interest from one Lanczos step to the next.
This required that SNLASO have one extra column in S and SILASO two, to hold the eigenvectors associated with the nearest undesired eigenvalue(s), which are needed to estimate the accuracy of the desired eigenvalues.

2.3. Random Numbers

The LASO package used the function URAND to generate uniform pseudo-random numbers whenever they were needed. In fact they were needed only to generate random vectors. Since SLAEIG also needs to generate random vectors, this task was isolated in a new subroutine SLARAN. It still uses the function URAND, but it is now the only place where URAND is accessed, which makes it easier for a user to substitute some other random number generator.

2.4. Storage of the Band Matrix

In LASO the band matrix is stored in the MAXJ x (NBLOCK+1) array T. For efficiency SLAEIG was written to accept a (NBLOCK+1) x MAXJ array. This required that the storage of the blocks ALP and BET in T be changed. This change is more than just a transposition of T for two reasons.

1. In EISPACK the main diagonal of the matrix appears in the rightmost column of T, while in SLAEIG the main diagonal is the first row.

2. In EISPACK the rows of the matrix appear as rows of T, so that the empty triangle appears in the upper left corner. In SLAEIG the columns of the matrix appear as columns of T, so that the empty triangle appears at the bottom right corner. In fact in LASO2 the 'empty' triangle already contains the next off diagonal block BET.

It was also necessary to provide a new variable NBAND = NBLOCK + 1 in order to dimension T properly.

3. Other Changes

Several other changes to LASO, unrelated to the new eigensolver, were made at the same time. These changes are documented in the following subsections.

3.1. Local Reorthogonalization

LASO provided for the local reorthogonalization of the next Lanczos block against the two previous ones (which are still in core). Several authors had suggested that this technique delayed the loss of orthogonality. Using selective orthogonalization, the local reorthogonalization has the effect of slightly delaying the required orthogonalizations but has no apparent effect on the number of Lanczos steps required or the accuracy obtained. Since the cost of the local reorthogonalization is significant, it has been dropped from LASO2. However, to guard against significant cancellation due to good starting vectors, the second Lanczos block is still reorthogonalized against the first.

3.2. Changes to Subroutine SMVPC

As noted in the documentation for LASO, SMVPC modified the computed orthogonality coefficients to try to reflect the effect of local reorthogonalization. Since the reorthogonalization has been dropped from LASO2, this modification was eliminated as well.

3.3. Subroutine SVZERO

LASO had a subroutine SVZERO for zeroing arrays. The same effect can be obtained using the subroutine SCOPY with a zero increment. Thus the two calls

      CALL SVZERO(N, V)
      CALL SCOPY(N, 0.0, 0, V, 1)

have exactly the same effect. Since SCOPY is needed anyway for other purposes, all calls to SVZERO were replaced by calls to SCOPY and SVZERO was eliminated.
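As a small standalone illustration of the zero-increment idiom (the program name, array length, and initial values here are arbitrary, and the BLAS subroutine SCOPY is assumed to be linked), copying the single value 0.0 with a source increment of 0 overwrites every element of the target vector:

      PROGRAM ZDEMO
C  Demonstrates zeroing a vector with SCOPY and a zero increment,
C  the idiom that replaces SVZERO in LASO2.
      REAL V(5)
      DATA V /1.0, 2.0, 3.0, 4.0, 5.0/
C  Copy the single value 0.0 (increment 0) into all 5 elements of V.
      CALL SCOPY(5, 0.0, 0, V, 1)
      WRITE(6,10) V
   10 FORMAT(' V AFTER SCOPY: ', 5F6.1)
      END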
3.4. Change in Subroutine SORTQR

The subroutine SORTQR was changed to allow it to be used to orthonormalize Ritz vectors (approximate eigenvectors) as well as the Lanczos blocks. The required change was to include an extra integer parameter so that the dimension of the array holding the vectors to be orthonormalized could be different from the length of the vectors.

3.5. Subroutine SVSORT

At several points in the code it is necessary to sort eigenvalues and their associated eigenvectors. This task has now been separated into a separate module.

3.6. Formatting Changes

Three changes were made in the formatting of the code (how it appears in a listing). These changes are:

1. Blank comment lines are no longer required before and after each DO loop.
2. DO loops are indented 3 spaces instead of 5.
3. Sequence numbers have been eliminated.

4. Subroutine Glossary

This section gives a list of all the subroutines and functions provided in LASO2 with a brief description of their purpose.

SAXPY    Basic linear algebra subroutine for adding a scalar multiple of one vector to another.
SCOPY    Basic linear algebra subroutine for copying one vector to another.
SDOT     Basic linear algebra function for computing dot products.
SILASO   The entry point to LASO2 for interval type problems.
SIPPLA   The post processing module for interval type problems.
SIWLA    The main computational module for interval type problems.
SLABAX   A subroutine to compute the product of a band matrix and a vector. Used by SLAEIG.
SLABCM   The control module for SLAEIG. SLABCM decides which shift to use next in the Rayleigh quotient type iteration used by SLAEIG.
SLABFC   The computational module for SLAEIG. SLABFC factors the shifted band matrix and solves linear equations.
SLAEIG   The entry point for the band eigensolver.
SLAGER   A subroutine for updating the Gerschgorin circle bounds on the spectrum of the band matrix.
SLARAN   A subroutine for generating random vectors.
SMVPC    A subroutine to compute the residual bounds and orthogonality coefficients.
SNLASO   The entry point to LASO2 for number type problems.
SNPPLA   The post processing module for number type problems.
SNWLA    The main computational module for number type problems.
SNRM2    Basic linear algebra function for computing the 2-norm of a vector.
SORTQR   A subroutine to orthonormalize a set of vectors using Householder transformations.
SSCAL    Basic linear algebra subroutine for scaling a vector.
SSWAP    Basic linear algebra subroutine for swapping two vectors.
SVSORT   A subroutine to sort eigenvalues and their corresponding eigenvectors.
URAND    A portable random number function.

5. Variable Glossary

This section gives a description of most of the variables used in the six main subroutines in LASO2. Immediately following the variable name is a string of two characters. The first character indicates whether the variable is a scalar (s) or an array (a). The second character indicates the variable type: logical (l), integer (i), or real (r). The string xx indicates that the variable is an external subroutine. After these two characters there follows a string of six characters. These characters indicate the status of the variable with respect to the six subroutines

1. SNLASO
2. SNWLA
3. SNPPLA
4. SILASO
5. SIWLA
6. SIPPLA

The possible characters and their meanings are:

.  the variable does not appear in the subroutine.
i  the variable is a formal parameter which must have a value on entry to the subroutine.
o  the variable is a formal parameter which will always be assigned a value before the subroutine is finished.
b  the variable is a formal parameter which is used for both input and output.
w  the variable is a formal parameter which holds internally defined and used quantities or which is passed to subsidiary subroutines to provide workspace.
s  the variable is strictly internal to the subroutine and is not passed to any of the other major subroutines.
Thus an entry in the table

   XXXXXX  sr  .iobws

would mean that the variable XXXXXX is a scalar real variable which does not appear in SNLASO, is an input parameter to SNWLA, is an output parameter from SNPPLA, is both input and output in SILASO, is passed to other subroutines for workspace in SIWLA, and is used only internally in SIPPLA. Such a combination is impossible.

ALP     ar  .w..w.  ALP is the current diagonal block of T.

ALPMAX  sr  .s..s.  ALPMAX is the largest eigenvalue of ALP.

ALPMIN  sr  .s..s.  ALPMIN is the smallest eigenvalue of ALP.

ANORM   sr  .s..s.  ANORM is the best estimate of the norm of the matrix whose eigenvalues are being computed.

ATEMP   ar  .w..w.  ATEMP provides work space for SLAEIG (it holds the upper triangle of the factored matrix). It is also used in calls to SMVPC to hold the residual norms of the eigenvectors (when more than one residual norm is being calculated).

AXL     sr  ....s.  AXL is a small perturbation of XL.

AXR     sr  ....s.  AXR is a small perturbation of XR.

BET     ar  .w..w.  BET is the current off diagonal block of T.

BETMAX  sr  .s..s.  BETMAX is the largest singular value of BET.

BETMIN  sr  .s..s.  BETMIN is the smallest singular value of BET.

BOUND   ar  .ww.ww  BOUND provides workspace to SLAEIG (it holds the current upper and lower bounds on the eigenvalues).

D       ar  .ww.ww  D provides workspace to SLAEIG (it holds the active part of the matrix during the factorization).

DELTA   sr  soi...  DELTA is the (NVAL+1)th eigenvalue of A. It is needed to assess the accuracy of the desired eigenvalues.

DELTAL  sr  ...soi  DELTAL is the smallest eigenvalue of A which is bigger than XL. It is needed to assess the accuracy of the desired eigenvalues.

DELTAR  sr  ...soi  DELTAR is the largest eigenvalue of A which is smaller than XR. It is needed to assess the accuracy of the desired eigenvalues.

DONE    sl  ....s.  DONE indicates whether the largest and smallest eigenvalues of T in the interval [AXL,AXR] have settled down sufficiently so that the corresponding eigenvalue of A is known to lie in the same interval.

ENOUGH  sl  .s....  ENOUGH indicates whether the number of eigenvalues of T computed is all that are required.

EPS     sr  siisii  EPS is an approximation to the relative machine precision.

EPSRT   sr  .s..s.  EPSRT is the square root of EPS.

H       ar  ..w..w  H is the reduced matrix in the final Rayleigh-Ritz procedure (when needed).

HMAX    sr  ..s..s  HMAX is set by a call to SLAGER to an estimate of the largest eigenvalue of H.

HMIN    sr  ..s..s  HMIN is set by a call to SLAGER to an estimate of the smallest (most negative) eigenvalue of H.

HV      ar  ..w..w  HV contains the eigenvectors of H after a call to SLAEIG.

IER     si  .ss.ss  IER is the error return flag from SLAEIG.

IERR    si  oooooo  IERR is the error return flag for LASO2.

IND     ai  ow.ow.  IND is an integer array of work space used primarily for sorting purposes. On exit from either SNLASO or SILASO, IND(1) is set to the number of calls to OP.

IOVECT  xx  iiiiii  IOVECT is the user supplied subroutine for storing and recalling Lanczos vectors.

J       si  .s..s.  J is the current dimension of T in the Lanczos algorithm.

MAXJ    si  ii.ii.  MAXJ is the maximum allowed value of J.

MAXOP   si  ii.ii.  MAXOP is the upper limit on calls to OP.

MAXVEC  si  ...ii.  MAXVEC is the maximum number of eigenvalues to be found.

N       si  iiiiii  N is the dimension of the matrix A.

NBAND   si  si.si.  NBAND (= NBLOCK + 1) is the row dimension of T.

NBLOCK  si  iiiiii  NBLOCK is the number of vectors in each Lanczos block.

NFIG    si  ii.ii.  NFIG is the number of decimal digits of accuracy desired.
NGOOD   si  .s..s.  NGOOD is the number of eigenvectors currently being used for orthogonalization.

NL      si  ....s.  NL is the number of computed eigenvalues which belong to the left half of the spectrum.

NLEFT   si  .s....  NLEFT is the number of desired eigenvalues which have not converged.

NMVAL   si  i.ii.i  NMVAL is the row dimension of VAL (except in SNWLA and SIWLA).

NMVEC   si  iiiiii  NMVEC is the row dimension of VEC.

NOP     si  sobsob  NOP is the number of calls to OP.

NPERM   si  bbibbb  NPERM is the number of accepted eigenvectors of A.

NSTART  si  .s..s.  NSTART is the number of starting vectors used in restarting the Lanczos algorithm.

NTHETA  si  .s..s.  NTHETA is the number of eigenvalues of T computed at a potential pause.

NUMBER  si  .s..s.  NUMBER (= NPERM + NGOOD) is the number of eigenvectors currently used for orthogonalization.

NUML    si  ....s.  NUML-1 is the number of desired eigenvalues smaller than AXL.

NUMR    si  ....s.  NUMR-1 is the number of desired eigenvalues larger than AXR.

NVAL    si  ii....  abs(NVAL) is the number of desired eigenvalues. It may be negative in SNLASO.

OP      xx  iiiiii  OP is the user supplied subroutine for computing matrix vector products.

OTAU    ar  .w..w.  OTAU is used to monitor the return of deflated eigenvectors.

P       ar  ..w..w  P holds a block of eigenvectors during the post processing.

P0      ar  .w..w.  P0 is the previous block of Lanczos vectors.

P1      ar  .w..w.  P1 is the current block of Lanczos vectors.

P2      ar  .w..w.  P2 is used in the Lanczos step and used elsewhere for temporary workspace.

PNORM   sr  .s....  PNORM is the 'norm' of the desired eigenvalues. That is, PNORM = the largest absolute value of a desired eigenvalue. It is used in the termination criterion.

Q       ar  ..w..w  Q holds a block of eigenvectors during the post processing.

RARITZ  sl  soisoi  RARITZ is true if a final Rayleigh-Ritz procedure is needed due to an eigenvalue of multiplicity greater than NBLOCK.

RES     ar  .w..w.  RES holds the residual norms of the eigenvectors saved from a previous Lanczos sequence.

RNORM   sr  .s..s.  RNORM is the 'norm' of the eigenvalues computed by previous Lanczos sequences. (See PNORM.)

S       ar  .w..w.  S holds eigenvectors of T.

SMALL   sl  sii...  SMALL is true if the smallest eigenvalues of A are to be computed.

T       ar  .w..w.  T is the band matrix computed by the Lanczos algorithm.

TAU     ar  .w..w.  TAU is used to monitor the return of deflated eigenvectors.

TEST    sl  .s..s.  TEST is true if a test run is needed due to an eigenvalue of multiplicity greater than NBLOCK - 1.

TMAX    sr  .s..s.  TMAX is the Gerschgorin circle bound on the largest eigenvalue of T.

TMIN    sr  .s..s.  TMIN is the Gerschgorin circle bound on the smallest eigenvalue of T.

TOLA    sr  .s..s.  TOLA is the relative error tolerance for computed eigenvalues.

TOLG    sr  .s..s.  TOLG is the threshold for the orthogonalization coefficients.

TXL     sr  .s..s.  TXL is a dummy parameter in a number type call to SLAEIG.

TXR     sr  .s..s.  TXR is a dummy parameter in a number type call to SLAEIG.

UTOL    sr  .s..s.  UTOL is the current absolute error tolerance for computed eigenvalues.

VAL     ar  ooboob  VAL holds the computed eigenvalues and, in the post processors, the error estimates as well.

VEC     ar  ooboob  VEC holds the computed eigenvectors.

VTEMP   ar  .w..w.  VTEMP provides workspace to SLAEIG. It is also used in calls to SMVPC to hold the orthogonality coefficients of the eigenvectors (when more than one is being calculated).

XL      sr  ...iii  XL is the left endpoint of the excluded interval.

XR      sr  ...iii  XR is the right endpoint of the excluded interval.

Appendix A. User Guide for SNLASO
1. Purpose

The subroutine SNLASO computes a few eigenvalues and eigenvectors at one end of the spectrum of a large sparse symmetric matrix, hereafter called A. SNLASO uses the block Lanczos algorithm with selective orthogonalization to compute Rayleigh-Ritz approximations to eigenpairs of A.

2. Usage

2.1. Calling Sequence

The subroutine statement is

      SUBROUTINE SNLASO(OP, IOVECT, N, NVAL, NFIG, NPERM, NMVAL, VAL,
     *   NMVEC, VEC, NBLOCK, MAXOP, MAXJ, WORK, IND, IERR)

On input:

OP specifies a user supplied subroutine for entering information about the matrix A with calling sequence OP(N,M,P,Q). See section 2.2 for further information.

IOVECT specifies a user supplied subroutine for storing and recalling vectors with calling sequence IOVECT(N,M,Q,J,K). See section 2.2 for further information.

N specifies the dimension of A.

NVAL specifies which eigenvalues are desired. abs(NVAL) is the number of eigenvalues to be computed. If NVAL < 0 the algebraically smallest (leftmost) eigenvalues are computed, while if NVAL > 0 the largest (rightmost) eigenvalues are computed. NVAL must not be zero.

NFIG specifies the number of decimal digits of accuracy desired in the eigenvalues.

NPERM is an integer variable specifying the number of eigenpairs presupplied by the user. In most cases NPERM will be zero. See section 2.8 about using positive values of NPERM. NPERM must not be negative.

NMVAL specifies the row dimension of VAL. NMVAL must not be smaller than abs(NVAL).

VAL is a two dimensional real array with NMVAL rows and at least four columns. If NPERM > 0 on input then VAL must contain certain information.

NMVEC specifies the row dimension of VEC. NMVEC must not be smaller than N.

VEC is a two dimensional real array with NMVEC rows and at least abs(NVAL) columns.

NBLOCK specifies the number of vectors in each Lanczos block. See section 2.6 for guidelines in choosing NBLOCK. NBLOCK must be positive and must not exceed MAXJ/6.

MAXOP specifies an upper bound on the number of calls to the subroutine OP. SNLASO terminates when this maximum is reached. See section 2.7 for guidelines in choosing MAXOP.

MAXJ indicates the amount of storage available (see WORK below in this section and IOVECT in section 2.2). MAXJ should be as large as possible subject to the available storage. However there is no advantage in having MAXJ > MAXOP*NBLOCK. MAXJ must not be smaller than 6*NBLOCK.

WORK is a one dimensional real array at least as large as

      2*N*NBLOCK + MAXJ*(NBLOCK+NV+2) + 2*NBLOCK*NBLOCK + 3*NV

plus the maximum of N*NBLOCK and

      MAXJ*(2*NBLOCK+3) + 2*NV + 6 + (2*NBLOCK+2)*(NBLOCK+1),

where NV = abs(NVAL). The first N*NBLOCK elements must contain the starting vectors for the algorithm (see section 2.5).

IND is an integer array of dimension at least abs(NVAL) used for workspace.

IERR is an integer variable.

On output:

NPERM is the number of eigenpairs now known.

VAL contains information about the eigenvalues. The first column of VAL contains the eigenvalues, ordered from the most extreme inward. The second, third, and fourth columns of VAL contain accuracy information explained in section 2.4.

VEC contains the corresponding eigenvectors.

WORK if IERR is negative then the first N*NBLOCK elements of WORK contain the best vectors for restarting the algorithm (see section 2.5).

IND(1) contains the actual number of calls to OP. In some circumstances this might be slightly larger than MAXOP.

IERR is an error completion code. The normal completion code is zero. See section 2.3 for an explanation of nonzero codes.
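As an illustration of the calling sequence above, the following is a minimal driver sketch. The program name and the numbers chosen (a matrix of order 100, the four algebraically smallest eigenvalues, NBLOCK = 2, MAXJ = 48, and a WORK length of 5000) are illustrative only; any actual choice must satisfy the restrictions and the WORK expression given above, and OP and IOVECT are the user supplied routines described in section 2.2.

      PROGRAM SAMPLE
C  Illustrative call to SNLASO (all values here are examples only).
      EXTERNAL OP, IOVECT
      INTEGER N, NVAL, NFIG, NPERM, NMVAL, NMVEC, NBLOCK, MAXOP, MAXJ
      INTEGER IND(4), IERR, I
      REAL VAL(4,4), VEC(100,4), WORK(5000)
      N      = 100
      NVAL   = -4
      NFIG   = 6
      NPERM  = 0
      NMVAL  = 4
      NMVEC  = 100
      NBLOCK = 2
      MAXOP  = 100
      MAXJ   = 48
C  Zero the first N*NBLOCK elements of WORK to request random
C  starting vectors (see section 2.5).
      DO 10 I = 1, N*NBLOCK
         WORK(I) = 0.0
   10 CONTINUE
      CALL SNLASO(OP, IOVECT, N, NVAL, NFIG, NPERM, NMVAL, VAL,
     *            NMVEC, VEC, NBLOCK, MAXOP, MAXJ, WORK, IND, IERR)
      IF (IERR .NE. 0) WRITE(6,20) IERR
   20 FORMAT(' SNLASO RETURNED IERR = ', I6)
      END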
2.2. User Supplied Subroutines

The two user supplied subroutines must be declared EXTERNAL in the calling program and must conform as follows.

OP(N,M,P,Q). P and Q are NxM real arrays. Q should be returned as AP where A is the matrix whose eigenvalues are to be determined. M will never be larger than NBLOCK but it may be smaller. This subroutine is the only way that the matrix enters the calculation. The user should adequately test the subroutine OP because SNLASO has no way of detecting errors in OP.

IOVECT(N,M,Q,J,K). Q is an NxM real array. M will never be larger than NBLOCK but it may be smaller. IOVECT is used to store Lanczos vectors as they are computed and to periodically recall all the currently stored Lanczos vectors. If K = 0 then the M columns of Q should be stored as the (J - M + 1)th through the Jth Lanczos vectors. If K = 1 then the columns of Q should be returned as the (J - M + 1)th through the Jth Lanczos vectors which were previously stored.

Lanczos vectors are computed sequentially. They are stored by calls to IOVECT with K = 0 and increasing values of J up to some internally derived value J = I which signals a pause. These vectors are then recalled by calls to IOVECT with K = 1 and the same sequence of J values. The first J value in any sequence is equal to M. After the pause more Lanczos vectors are computed and these are stored by calls to IOVECT with K = 0 and J values greater than I until the next pause, at which time all the Lanczos vectors currently stored are recalled with calls to IOVECT with K = 1 and J = M, ... . After any pause the algorithm may discard the current Lanczos vectors and start a new sequence of Lanczos vectors by a call to IOVECT with K = 0 and J = M. At subsequent pauses only the current sequence of Lanczos vectors is recalled. In solving a problem SNLASO may pause many times and discard the previous Lanczos vectors several times before computing all the desired eigenvalues. The largest value of J which can appear in a call to IOVECT is J = MAXJ.

We give two examples for IOVECT. The first example requires that logical unit 20 be assigned to a secondary storage medium.

      SUBROUTINE IOVECT(N,M,Q,J,K)
      INTEGER N,M,J,K,I,L
      DIMENSION Q(N,M)
      IF(J .EQ. M) REWIND 20
      IF(K .EQ. 0) WRITE(20) ((Q(I,L), I=1,N), L=1,M)
      IF(K .EQ. 1) READ(20) ((Q(I,L), I=1,N), L=1,M)
      RETURN
      END

This example will fail on systems that do not allow a WRITE after a READ. On such a system it would be necessary to copy back and forth between two unit numbers.

The Lanczos vectors can also be kept in fast store. In this example we assume that N < 100 and MAXJ < 50.

      SUBROUTINE IOVECT(N,M,Q,J,K)
      INTEGER N,M,J,K,I,L,L1
      DIMENSION Q(N,M)
      COMMON QVEC(100,50)
      IF(K .EQ. 1) GO TO 30
      DO 20 L = 1,M
         L1 = J - M + L
         DO 10 I = 1,N
            QVEC(I,L1) = Q(I,L)
   10    CONTINUE
   20 CONTINUE
      RETURN
C
   30 DO 50 L = 1,M
         L1 = J - M + L
         DO 40 I = 1,N
            Q(I,L) = QVEC(I,L1)
   40    CONTINUE
   50 CONTINUE
      RETURN
      END

2.3. Error Completion Codes

IERR = 0 indicates a normal completion. abs(NVAL) eigenpairs have been computed. See section 2.4 for a description of the information returned.

IERR > 0 and IERR < 1024 indicates that some calling parameters were inconsistent.

   1-bit is set if N < 6*NBLOCK
   2-bit is set if NFIG < 0
   4-bit is set if NMVEC < N
   8-bit is set if NPERM < 0
   16-bit is set if MAXJ < 6*NBLOCK
   32-bit is set if abs(NVAL) < max(1, NPERM)
   64-bit is set if abs(NVAL) > NMVAL
   128-bit is set if abs(NVAL) > MAXOP
   256-bit is set if abs(NVAL) > MAXJ/2
   512-bit is set if NBLOCK < 1

Thus IERR can be decoded to determine the errors. For example IERR = 68 indicates both NMVEC < N and abs(NVAL) > NMVAL.
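If an automated check is wanted, the decoding can be done with integer arithmetic. The following is a small sketch (the subroutine name is hypothetical):

      SUBROUTINE ERRBIT(IERR, IBITS)
C  Illustrative decoding of a positive IERR (0 < IERR < 1024) into
C  its ten bits.  IBITS(K) = 1 when the 2**(K-1) bit listed above is
C  set; for example, for IERR = 68 both IBITS(3) (NMVEC < N) and
C  IBITS(7) (abs(NVAL) > NMVAL) are set.
      INTEGER IERR, IBITS(10), K
      DO 10 K = 1, 10
         IBITS(K) = MOD(IERR/2**(K-1), 2)
   10 CONTINUE
      RETURN
      END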
IERR = -1 can occur only if NPERM > 0 on input. It indicates that either some user supplied eigenvector was zero or that significant cancellation occurred when the user supplied vectors were orthogonalized. Some modification of the user supplied vectors may have occurred but no other computation will have been done.

IERR = -2 indicates that MAXOP calls to the subroutine OP occurred without finding all the desired eigenvalues. NPERM eigenvalues are known and information on them is returned as described in section 2.4. Furthermore the first N*NBLOCK elements of WORK contain the best vectors for restarting the algorithm. Thus SNLASO may be immediately recalled to continue working on the problem if progress up to this point is deemed satisfactory.

IERR = -8 indicates that disastrous loss of orthogonality has occurred. This is usually due to errors in the user supplied subroutine OP or possibly IOVECT.

2.4. Information Returned When IERR = 0

IERR = 0 indicates that the desired eigenpairs have been found. The eigenvalues are in the first column of VAL. If NVAL < 0 the eigenvalues are in ascending order (smallest at the top), while if NVAL > 0 the eigenvalues are in descending order. The corresponding eigenvectors are in the first abs(NVAL) columns of VEC.

The second column of VAL contains the residual norms (the 2-norm of Ay - ye, for the eigenvector y and the eigenvalue e) which are bounds on the accuracy of the eigenvalues. In most cases the residual norms seriously underestimate the accuracy of the eigenvalues. To obtain a more realistic estimate of the error the program approximates the gap between the desired eigenvalues and the rest of the spectrum and computes the (residual squared)/gap, which would be a bound on the accuracy of the eigenvalue if the gap were accurate. This estimate of the accuracy of the eigenvalue is returned in the third column of VAL. The fourth column of VAL contains residual/gap, which is an estimate of the error in the eigenvector.

If the user has supplied some eigenpairs of the matrix, it is possible that some of these eigenpairs have been discarded in favor of eigenpairs computed by the algorithm. (See section 2.8 for additional information.)
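A small sketch of reporting this information after a successful return (the subroutine name and formats are illustrative; NV stands for abs(NVAL)):

      SUBROUTINE REPORT(NV, NMVAL, VAL)
C  Illustrative printing of the accuracy information in VAL when
C  IERR = 0: column 1 is the eigenvalue, column 2 the residual norm,
C  column 3 the eigenvalue error estimate (residual squared over gap),
C  and column 4 the eigenvector error estimate (residual over gap).
      INTEGER NV, NMVAL, I
      REAL VAL(NMVAL,4)
      WRITE(6,10)
   10 FORMAT('   EIGENVALUE     RESIDUAL     EVAL EST     EVEC EST')
      DO 20 I = 1, NV
         WRITE(6,30) VAL(I,1), VAL(I,2), VAL(I,3), VAL(I,4)
   20 CONTINUE
   30 FORMAT(1X, 1P, 4E13.4)
      RETURN
      END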
2.5. Choosing the Starting Vectors

SNLASO requires NBLOCK starting vectors to be stored in the first N*NBLOCK elements of the array WORK. Zero vectors are replaced by random vectors, so that a set of random starting vectors may be selected by merely initializing the first N*NBLOCK elements of WORK to zero. However convergence is enhanced if the starting vectors have large components in the directions of the desired eigenvectors. Therefore if the user knows approximations to the desired eigenvectors he should choose his starting vectors as linear combinations of these approximations. If some of the desired eigenpairs are already known to sufficient accuracy it is possible to avoid having SNLASO recompute these eigenpairs; see section 2.8 for details.

2.6. Choosing a Value for NBLOCK

NBLOCK specifies the number of vectors in each Lanczos block. Two factors may favor a large block size. The convergence of the algorithm is faster if NBLOCK exceeds the highest multiplicity of any of the desired eigenvalues. For example if a desired eigenvalue is double then a value of NBLOCK of at least 3 is best. Also NBLOCK vectors are multiplied by A on every call to OP. Large values of NBLOCK tend to reduce the number of calls to OP, so that if the matrix must be recomputed or recalled from disk then a large value of NBLOCK may be preferable (perhaps as large as 8). On the other hand the amount of overhead involved in the algorithm itself grows quadratically with NBLOCK. Furthermore the convergence of the algorithm is degraded if NBLOCK > sqrt(MAXJ). In conclusion, if the matrix multiply is inexpensive then a small value of NBLOCK (2 or 3) is recommended, while if the matrix multiply is expensive then larger values of NBLOCK (up to 8 say) are better. NBLOCK = 1 is recommended if and only if abs(NVAL) = 1 as well.

2.7. Choosing a Value for MAXOP

SNLASO is an iterative procedure. The user must limit the effort expended by SNLASO on a given problem by choosing a value for MAXOP. If more than MAXOP calls to the subroutine OP are needed to solve the given problem then SNLASO terminates at that point with IERR set to -2. MAXOP must not be less than abs(NVAL), and setting MAXOP less than MAXJ/NBLOCK will waste the storage indicated by MAXJ. If cost is not a factor and the subroutine OP is known to be correct then there is no harm in setting MAXOP to a large value (say N). Otherwise MAXOP should be set to an upper bound on the acceptable cost of running the code.

2.8. Setting NPERM > 0

SNLASO allows known eigenpairs to be input directly so that they need not be recomputed. The first column of VAL must contain the eigenvalues (in any order) and the second column of VAL must contain the residual norms. The correct order of magnitude is sufficient. Columns three and four of VAL are arbitrary. The first NPERM columns of VEC must contain the corresponding eigenvectors. SNLASO will sort the eigenvalues (and vectors) and then orthonormalize the eigenvectors. The user supplied eigenpairs are counted toward the number of desired eigenpairs, so NPERM must not be greater than abs(NVAL). If in the course of the computation it appears that a user supplied eigenpair is not one of the desired ones it will be discarded. In particular if NPERM = abs(NVAL) then the algorithm will either confirm that the user supplied eigenpairs are the desired ones or it will discard one or more of them and go on and compute their replacements.

3. Applicability and Restrictions

SNLASO is designed to find a few extreme eigenpairs of a large sparse symmetric matrix. For small dense matrices the subroutines provided by EISPACK are to be preferred. It is not necessary for the matrix to be explicitly represented. It is only necessary to provide a subroutine OP to compute matrix vector products. Any compact storage scheme that is sufficient to generate the matrix vector products will work.

Consider the generalized eigenvalue problem (A - (lambda)M)x = 0, where M is positive definite and can be factored as L L^T. The matrix L^(-1) A L^(-T) can be coded in OP as a triangular solve, a multiply by A, and another triangular solve. Thus the generalized eigenvalue problem can be reduced to a standard eigenvalue problem without explicitly forming the matrix L^(-1) A L^(-T).
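A hedged sketch of such an OP follows. Purely for illustration, A and its Cholesky factor (called CL here) are assumed to be full N x N arrays held in a hypothetical COMMON block, with N at most 100; any storage scheme that supports the three steps will do.

      SUBROUTINE OP(N, M, P, Q)
C  Hypothetical OP for the generalized problem (A - (lambda)M)x = 0
C  with M = L L^T.  A and the Cholesky factor CL are assumed to be
C  full N x N arrays in COMMON, with N at most 100 (illustrative
C  assumptions only).  Each column of Q is returned as
C  L^(-1) A L^(-T) times the corresponding column of P.
      INTEGER N, M, I, J, K
      REAL P(N,M), Q(N,M), W(100), S
      COMMON /MATS/ A(100,100), CL(100,100)
      DO 70 K = 1, M
C        Back substitution:  W <- L^(-T) * P(:,K).
         DO 20 I = N, 1, -1
            S = P(I,K)
            DO 10 J = I+1, N
               S = S - CL(J,I)*W(J)
   10       CONTINUE
            W(I) = S/CL(I,I)
   20    CONTINUE
C        Multiplication by A:  Q(:,K) <- A*W.
         DO 40 I = 1, N
            S = 0.0
            DO 30 J = 1, N
               S = S + A(I,J)*W(J)
   30       CONTINUE
            Q(I,K) = S
   40    CONTINUE
C        Forward substitution:  Q(:,K) <- L^(-1) * Q(:,K).
         DO 60 I = 1, N
            S = Q(I,K)
            DO 50 J = 1, I-1
               S = S - CL(I,J)*Q(J,K)
   50       CONTINUE
            Q(I,K) = S/CL(I,I)
   60    CONTINUE
   70 CONTINUE
      RETURN
      END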
More complex operators can also be handled.

SNLASO calls a number of subsidiary functions and subroutines. The following names must not be used in the driver program: SNWLA, SNPPLA, SMVPC, SORTQR, SVSORT, SAXPY, SCOPY, SDOT, SNRM2, SSCAL, SSWAP, SLAEIG, SLAGER, SLARAN, SLABAX, SLABCM, SLABFC, URAND.

4. Discussion of Method

SNLASO uses the block Lanczos algorithm with selective orthogonalization. The (scalar) Lanczos algorithm is an efficient scheme for computing an orthonormal basis q_1, q_2, ..., q_j for the Krylov subspace span(q_1, A q_1, ..., A^(j-1) q_1) and the corresponding reduced matrix T_j = Q_j^T A Q_j, which happens to be tridiagonal. At each step the Krylov subspace grows larger, one more Lanczos vector is added to the list, and T gets longer. With T it is easy to compute the Rayleigh-Ritz approximations to eigenpairs of A from the Krylov subspace and their corresponding error bounds. The algorithm continues until the error bounds are sufficiently small.

The block Lanczos algorithm replaces each vector in the simple Lanczos algorithm by an orthonormal block of vectors. Block Lanczos has theoretical advantages over simple Lanczos with respect to finding multiple eigenvalues and has advantages in efficiency if the cost of forming the matrix vector products is high. Unfortunately finite precision arithmetic causes the vectors computed by the algorithm to lose orthogonality and approach linear dependence. To maintain robust independence among the Lanczos vectors SNLASO uses selective orthogonalization, in which a few Lanczos vectors are orthogonalized against a few selected Ritz vectors. The algorithm is terminated when the desired Ritz values are sufficiently accurate. If necessary, SNLASO then makes another Lanczos run to test for undisclosed multiplicities. Finally, if such multiplicities are discovered, SNLASO performs a final Rayleigh-Ritz procedure to resolve such clusters.

Appendix B. User Guide for SILASO

1. Purpose

The subroutine SILASO computes all the eigenvalues and eigenvectors of a large sparse symmetric matrix, hereafter called A, outside a user defined excluded interval. SILASO uses the block Lanczos algorithm with selective orthogonalization to compute Rayleigh-Ritz approximations to eigenpairs of A.

2. Usage

2.1. Calling Sequence

The subroutine statement is

      SUBROUTINE SILASO(OP, IOVECT, N, XL, XR, NFIG, NPERM, NMVAL, VAL,
     *   NMVEC, MAXVEC, VEC, NBLOCK, MAXOP, MAXJ, WORK, IND, IERR)

On input:

OP specifies a user supplied subroutine for entering information about the matrix A with calling sequence OP(N,M,P,Q). See section 2.2 for further information.

IOVECT specifies a user supplied subroutine for storing and recalling vectors with calling sequence IOVECT(N,M,Q,J,K). See section 2.2 for further information.

N specifies the dimension of A.

XL specifies the left endpoint of the excluded interval.

XR specifies the right endpoint of the excluded interval.

NFIG specifies the number of decimal digits of accuracy desired in the eigenvalues.

NPERM is an integer variable specifying the number of eigenpairs presupplied by the user. In most cases NPERM will be zero. See section 2.8 about using positive values of NPERM. NPERM must not be negative.

NMVAL specifies the row dimension of VAL. NMVAL must not be smaller than MAXVEC.

VAL is a two dimensional real array with NMVAL rows and at least four columns. If NPERM > 0 on input then VAL must contain certain information.

NMVEC specifies the row dimension of VEC. NMVEC must not be smaller than N.

MAXVEC specifies the maximum number of eigenpairs which can be computed.

VEC is a two dimensional real array with NMVEC rows and at least MAXVEC columns.

NBLOCK specifies the number of vectors in each Lanczos block. See section 2.6 for guidelines in choosing NBLOCK. NBLOCK must be positive and must not exceed MAXJ/6.

MAXOP specifies an upper bound on the number of calls to the subroutine OP. SILASO terminates when this maximum is reached. See section 2.7 for guidelines in choosing MAXOP.
MAXJ indicates the amount of storage available (see WORK below in this section and IOVECT in section 2.2). MAXJ should be as large as possible subject to the available storage. However there is no advantage in having MAXJ > MAXOP*NBLOCK. MAXJ must not be smaller than 6*NBLOCK.

WORK is a one dimensional real array at least as large as

      2*N*NBLOCK + MAXJ*(NBLOCK+1+MAXVEC+2) + 2*NBLOCK*NBLOCK + 3*MAXVEC

plus the maximum of N*NBLOCK and

      MAXJ*(2*NBLOCK+2) + 2*MAXVEC + 8 + (2*NBLOCK+2)*(NBLOCK+1).

The first N*NBLOCK elements must contain the starting vectors for the algorithm (see section 2.5).

IND is an integer array of dimension at least MAXVEC used for workspace.

IERR is an integer variable.

On output:

NPERM is the number of eigenpairs now known.

VAL contains information about the eigenvalues. The first column of VAL contains the eigenvalues, ordered from the most extreme inward. The second, third, and fourth columns of VAL contain accuracy information explained in section 2.4.

VEC contains the corresponding eigenvectors.

WORK if IERR is negative then the first N*NBLOCK elements of WORK contain the best vectors for restarting the algorithm (see section 2.5).

IND(1) contains the actual number of calls to OP. In some circumstances this might be slightly larger than MAXOP.

IERR is an error completion code. The normal completion code is zero. See section 2.3 for an explanation of nonzero codes.

2.2. User Supplied Subroutines

The two user supplied subroutines must be declared EXTERNAL in the calling program and must conform as follows.

OP(N,M,P,Q). P and Q are NxM real arrays. Q should be returned as AP where A is the matrix whose eigenvalues are to be determined. M will never be larger than NBLOCK but it may be smaller. This subroutine is the only way that the matrix enters the calculation. The user should adequately test the subroutine OP because SILASO has no way of detecting errors in OP.

IOVECT(N,M,Q,J,K). Q is an NxM real array. M will never be larger than NBLOCK but it may be smaller. IOVECT is used to store Lanczos vectors as they are computed and to periodically recall all the currently stored Lanczos vectors. If K = 0 then the M columns of Q should be stored as the (J - M + 1)th through the Jth Lanczos vectors. If K = 1 then the columns of Q should be returned as the (J - M + 1)th through the Jth Lanczos vectors which were previously stored.

Lanczos vectors are computed sequentially. They are stored by calls to IOVECT with K = 0 and increasing values of J up to some internally derived value J = I which signals a pause. These vectors are then recalled by calls to IOVECT with K = 1 and the same sequence of J values. The first J value in any sequence is equal to M. After the pause more Lanczos vectors are computed and these are stored by calls to IOVECT with K = 0 and J values greater than I until the next pause, at which time all the Lanczos vectors currently stored are recalled with calls to IOVECT with K = 1 and J = M, ... . After any pause the algorithm may discard the current Lanczos vectors and start a new sequence of Lanczos vectors by a call to IOVECT with K = 0 and J = M. At subsequent pauses only the current sequence of Lanczos vectors is recalled. In solving a problem SILASO may pause many times and discard the previous Lanczos vectors several times before computing all the desired eigenvalues. The largest value of J which can appear in a call to IOVECT is J = MAXJ.

We give two examples for IOVECT.
The first example requires that logical unit 20 be assigned to a secondary storage medium.

      SUBROUTINE IOVECT(N,M,Q,J,K)
      INTEGER N,M,J,K,I,L
      DIMENSION Q(N,M)
      IF(J .EQ. M) REWIND 20
      IF(K .EQ. 0) WRITE(20) ((Q(I,L), I=1,N), L=1,M)
      IF(K .EQ. 1) READ(20) ((Q(I,L), I=1,N), L=1,M)
      RETURN
      END

This example will fail on systems that do not allow a WRITE after a READ. On such a system it would be necessary to copy back and forth between two unit numbers.

The Lanczos vectors can also be kept in fast store. In this example we assume that N < 100 and MAXJ < 50.

      SUBROUTINE IOVECT(N,M,Q,J,K)
      INTEGER N,M,J,K,I,L,L1
      DIMENSION Q(N,M)
      COMMON QVEC(100,50)
      IF(K .EQ. 1) GO TO 30
      DO 20 L = 1,M
         L1 = J - M + L
         DO 10 I = 1,N
            QVEC(I,L1) = Q(I,L)
   10    CONTINUE
   20 CONTINUE
      RETURN
C
   30 DO 50 L = 1,M
         L1 = J - M + L
         DO 40 I = 1,N
            Q(I,L) = QVEC(I,L1)
   40    CONTINUE
   50 CONTINUE
      RETURN
      END

2.3. Error Completion Codes

IERR = 0 indicates a normal completion. NPERM eigenpairs have been computed. See section 2.4 for a description of the information returned.

IERR > 0 and IERR < 1024 indicates that some calling parameters were inconsistent.

   1-bit is set if N < 6*NBLOCK
   2-bit is set if NFIG < 0
   4-bit is set if NMVEC < N
   8-bit is set if NPERM < 0
   16-bit is set if MAXJ < 6*NBLOCK
   32-bit is set if MAXVEC < NPERM
   64-bit is set if MAXVEC > NMVAL
   128-bit is set if MAXVEC > MAXOP
   256-bit is set if XL > XR
   512-bit is set if NBLOCK < 1

Thus IERR can be decoded to determine the errors. For example IERR = 68 indicates both NMVEC < N and MAXVEC > NMVAL.

IERR = -1 can occur only if NPERM > 0 on input. It indicates that either some user supplied eigenvector was zero or that significant cancellation occurred when the user supplied vectors were orthogonalized. Some modification of the user supplied vectors may have occurred but no other computation will have been done.

IERR = -2 indicates that MAXOP calls to the subroutine OP occurred without finding all the desired eigenvalues. NPERM eigenvalues are known and information on them is returned as described in section 2.4. Furthermore the first N*NBLOCK elements of WORK contain the best vectors for restarting the algorithm. Thus SILASO may be immediately recalled to continue working on the problem if progress up to this point is deemed satisfactory.

IERR = -4 can occur only if NPERM > 0 on input. It indicates that a user supplied eigenvalue lies inside the excluded interval. Some modification of the user supplied eigenpairs may have occurred but no other computation will have been done.

IERR = -5 and IERR = -6 indicate that the program has terminated without full assurance that all the desired eigenvalues have been computed, due to an approximate eigenvalue which lies inside but near to the boundary of the excluded region and which has not converged sufficiently to guarantee that the real eigenvalue lies outside the excluded interval. IERR = -5 if J = MAXJ and IERR = -6 if MAXOP was exceeded. All the eigenpairs returned by the code are correct but there may be more. It is necessary to recall SILASO to continue working on the problem to obtain full assurance.

IERR = -8 indicates that disastrous loss of orthogonality has occurred. This is usually due to errors in the user supplied subroutine OP or possibly IOVECT.

IERR < -10 indicates that some of the eigenvalues found by the program lie inside the excluded interval but have error bounds which overlap the boundary. Such eigenvalues are explicitly set equal to the boundary and marked as described in section 2.4. The tens digit indicates the number of such eigenvalues while the units digit should be interpreted as above.

IERR < -100 indicates that more than MAXVEC eigenvalues lie outside the excluded interval. The hundreds digit gives the number of extra eigenvalues now known to exist. Any eigenpairs returned by the code are correct, and after raising the value of MAXVEC it is possible to recall SILASO to keep working on the problem. Of course the extra storage indicated by the larger value of MAXVEC must be available and it is possible that the code may find even more eigenvalues and again terminate with IERR < -100. The tens and units digits are as described above.
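A small sketch (the subroutine name is hypothetical) of splitting such a negative IERR into its digits:

      SUBROUTINE IERDIG(IERR, NEXTRA, NBDRY, IUNIT)
C  Illustrative decoding of a negative IERR with abs(IERR) >= 10.
C  NEXTRA is the hundreds digit (extra eigenvalues known to exist),
C  NBDRY is the tens digit (eigenvalues set to the interval boundary),
C  and IUNIT is the units digit, to be read as the codes -1 through
C  -8 described above (0 means none of them apply).
      INTEGER IERR, NEXTRA, NBDRY, IUNIT, K
      K = -IERR
      NEXTRA = K/100
      NBDRY  = MOD(K,100)/10
      IUNIT  = MOD(K,10)
      RETURN
      END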
2.4. Information Returned When IERR = 0

IERR = 0 indicates that all the desired eigenpairs have been found. The NPERM eigenvalues are in the first column of VAL. The eigenvalues are in ascending order. The corresponding eigenvectors are in the first NPERM columns of VEC.

The second column of VAL contains the residual norms (the 2-norm of Ay - ye, for the eigenvector y and the eigenvalue e) which are bounds on the accuracy of the eigenvalues. In most cases the residual norms seriously underestimate the accuracy of the eigenvalues. To obtain a more realistic estimate of the error the program approximates the gaps between the desired eigenvalues and the rest of the spectrum and computes the (residual squared)/gap, which would be a bound on the accuracy of the eigenvalue if the gap were accurate. This estimate of the accuracy of the eigenvalue is returned in the third column of VAL. The fourth column of VAL contains residual/gap, which is an estimate of the error in the eigenvector.

2.5. Choosing the Starting Vectors

SILASO requires NBLOCK starting vectors to be stored in the first N*NBLOCK elements of the array WORK. Zero vectors are replaced by random vectors, so that a set of random starting vectors may be selected by merely initializing the first N*NBLOCK elements of WORK to zero. However convergence is enhanced if the starting vectors have large components in the directions of the desired eigenvectors. Therefore if the user knows approximations to the desired eigenvectors he should choose his starting vectors as linear combinations of these approximations. If some of the desired eigenpairs are already known to sufficient accuracy it is possible to avoid having SILASO recompute these eigenpairs; see section 2.8 for details.

2.6. Choosing a Value for NBLOCK

NBLOCK specifies the number of vectors in each Lanczos block. Two factors may favor a large block size. The convergence of the algorithm is faster if NBLOCK exceeds the highest multiplicity of any of the desired eigenvalues. For example if a desired eigenvalue is double then a value of NBLOCK of at least 3 is best. Also NBLOCK vectors are multiplied by A on every call to OP. Large values of NBLOCK tend to reduce the number of calls to OP, so that if the matrix must be recomputed or recalled from disk then a large value of NBLOCK may be preferable (perhaps as large as 8). On the other hand the amount of overhead involved in the algorithm itself grows quadratically with NBLOCK. Furthermore the convergence of the algorithm is degraded if NBLOCK > sqrt(MAXJ). In conclusion, if the matrix multiply is inexpensive then a small value of NBLOCK (2 or 3) is recommended, while if the matrix multiply is expensive then larger values of NBLOCK (up to 8 say) are better. NBLOCK = 1 is recommended if and only if a single eigenvalue is sought.

2.7. Choosing a Value for MAXOP

SILASO is an iterative procedure.
The user must limit the effort expended by SILASO on a given problem by choosing a value for MAXOP. If more than MAXOP calls to the subroutine OP are needed to solve the given problem then SILASO terminates at that point with IERR set to -2. MAXOP must not be less than MAXVEC, and setting MAXOP less than MAXJ/NBLOCK will waste the storage indicated by MAXJ. If cost is not a factor and the subroutine OP is known to be correct then there is no harm in setting MAXOP to a large value (say N). Otherwise MAXOP should be set to an upper bound on the acceptable cost of running the code.

2.8. Setting NPERM > 0

SILASO allows known eigenpairs to be input directly so that they need not be recomputed. The first column of VAL must contain the eigenvalues (in any order) and the second column of VAL must contain the residual norms. The correct order of magnitude is sufficient. Columns three and four of VAL are arbitrary. The first NPERM columns of VEC must contain the corresponding eigenvectors. SILASO will sort the eigenvalues (and vectors) and then orthonormalize the eigenvectors.

3. Applicability and Restrictions

SILASO is designed to find a few extreme eigenpairs of a large sparse symmetric matrix. For small dense matrices the subroutines provided by EISPACK are to be preferred. It is not necessary for the matrix to be explicitly represented. It is only necessary to provide a subroutine OP to compute matrix vector products. Any compact storage scheme that is sufficient to generate the matrix vector products will work.

Consider the generalized eigenvalue problem (A - (lambda)M)x = 0, where M is positive definite and can be factored as L L^T. The matrix L^(-1) A L^(-T) can be coded in OP as a triangular solve, a multiply by A, and another triangular solve. Thus the generalized eigenvalue problem can be reduced to a standard eigenvalue problem without explicitly forming the matrix L^(-1) A L^(-T). More complex operators can also be handled.

SILASO can be used to find all the eigenvalues of A in some given interval [a,b] provided that it is possible to factor (A - (sigma)I) for some value of (sigma) in [a,b]. By coding OP to solve linear equations in (A - (sigma)I) it is possible to implicitly use the operator (A - (sigma)I)^(-1), whose eigenvalues are 1/((lambda) - (sigma)). Thus all the eigenvalues of A inside [a,b] become all the eigenvalues of the inverted operator outside [1/(a - (sigma)), 1/(b - (sigma))] and SILASO can be applied. The only problem is that the computed eigenvalues have to be back transformed.
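For example, each converged eigenvalue mu of (A - (sigma)I)^(-1) corresponds to the eigenvalue (sigma) + 1/mu of A. A small sketch of the back transformation (the subroutine name is hypothetical; the accuracy estimates in columns 2 through 4 of VAL still refer to the inverted operator and are not rescaled here):

      SUBROUTINE BACKTR(NPERM, NMVAL, VAL, SIGMA)
C  Illustrative back transformation after a shift-and-invert run:
C  each eigenvalue MU of (A - SIGMA*I)**(-1) stored in VAL(I,1) is
C  replaced by the corresponding eigenvalue of A, SIGMA + 1/MU.
C  The estimates in columns 2-4 still refer to the inverted operator
C  and would need separate rescaling.
      INTEGER NPERM, NMVAL, I
      REAL VAL(NMVAL,4), SIGMA
      DO 10 I = 1, NPERM
         VAL(I,1) = SIGMA + 1.0/VAL(I,1)
   10 CONTINUE
      RETURN
      END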
SILASO calls a number of subsidiary functions and subroutines. The following names must not be used in the driver program: SIWLA, SIPPLA, SMVPC, SORTQR, SVSORT, SAXPY, SCOPY, SDOT, SNRM2, SSCAL, SSWAP, SLAEIG, SLAGER, SLARAN, SLABAX, SLABCM, SLABFC, URAND.

4. Discussion of Method

SILASO uses the block Lanczos algorithm with selective orthogonalization. The (scalar) Lanczos algorithm is an efficient scheme for computing an orthonormal basis q_1, q_2, ..., q_j for the Krylov subspace span(q_1, A q_1, ..., A^(j-1) q_1) and the corresponding reduced matrix T_j = Q_j^T A Q_j, which happens to be tridiagonal. At each step the Krylov subspace grows larger, one more Lanczos vector is added to the list, and T gets longer. With T it is easy to compute the Rayleigh-Ritz approximations to eigenpairs of A from the Krylov subspace and their corresponding error bounds. The algorithm continues until the error bounds are sufficiently small.

The block Lanczos algorithm replaces each vector in the simple Lanczos algorithm by an orthonormal block of vectors. Block Lanczos has theoretical advantages over simple Lanczos with respect to finding multiple eigenvalues and has advantages in efficiency if the cost of forming the matrix vector products is high. Unfortunately finite precision arithmetic causes the vectors computed by the algorithm to lose orthogonality and approach linear dependence. To maintain robust independence among the Lanczos vectors SILASO uses selective orthogonalization, in which a few Lanczos vectors are orthogonalized against a few selected Ritz vectors. The algorithm is terminated when the desired Ritz values are sufficiently accurate. If necessary, SILASO then makes another Lanczos run to test for undisclosed multiplicities. Finally, if such multiplicities are discovered, SILASO performs a final Rayleigh-Ritz procedure to resolve such clusters.
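To make the discussion above concrete, the following is a minimal sketch of the scalar three-term Lanczos recurrence, written in terms of the BLAS routines and a user supplied OP as used elsewhere in this guide. It is illustrative only: the subroutine name is hypothetical, no reorthogonalization is performed, and a zero BETA(J) (which signals an invariant subspace) is not handled. The block algorithm with selective orthogonalization used by LASO2 is considerably more elaborate.

      SUBROUTINE LANCZ1(OP, N, NSTEP, Q0, Q1, R, ALPHA, BETA)
C  Illustrative scalar Lanczos recurrence (no reorthogonalization).
C  Q1 must hold a nonzero starting vector on entry.  ALPHA and BETA
C  return the tridiagonal reduced matrix T; Q0, Q1, and R are work
C  vectors of length N.
      INTEGER N, NSTEP, J
      REAL Q0(N), Q1(N), R(N), ALPHA(NSTEP), BETA(NSTEP), QNORM
      REAL SDOT, SNRM2
      EXTERNAL OP
C  Normalize the starting vector and zero the 'previous' vector.
      QNORM = SNRM2(N, Q1, 1)
      CALL SSCAL(N, 1.0/QNORM, Q1, 1)
      CALL SCOPY(N, 0.0, 0, Q0, 1)
      DO 10 J = 1, NSTEP
C        R = A*Q1 via the user supplied OP (one vector at a time).
         CALL OP(N, 1, Q1, R)
         ALPHA(J) = SDOT(N, Q1, 1, R, 1)
         CALL SAXPY(N, -ALPHA(J), Q1, 1, R, 1)
         IF (J .GT. 1) CALL SAXPY(N, -BETA(J-1), Q0, 1, R, 1)
         BETA(J) = SNRM2(N, R, 1)
C        Shift:  Q0 <- Q1  and  Q1 <- R/BETA(J).  In finite precision
C        arithmetic this is where orthogonality is lost; LASO2
C        counters the loss with selective orthogonalization.
         CALL SCOPY(N, Q1, 1, Q0, 1)
         CALL SCOPY(N, R, 1, Q1, 1)
         CALL SSCAL(N, 1.0/BETA(J), Q1, 1)
   10 CONTINUE
      RETURN
      END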