Anasazi Version of the Day
Parallel Tall Skinny QR (TSQR) factorization.
#include <Tsqr.hpp>
Public Member Functions | |
| Tsqr (const node_tsqr_ptr &nodeTsqr, const dist_tsqr_ptr &distTsqr) | |
| size_t | cache_block_size () const |
| Cache block size in bytes. | |
| bool | QR_produces_R_factor_with_nonnegative_diagonal () const |
| FactorOutput | factor (const LocalOrdinal nrows_local, const LocalOrdinal ncols, Scalar A_local[], const LocalOrdinal lda_local, Scalar R[], const LocalOrdinal ldr, const bool contiguousCacheBlocks=false) |
| Compute QR factorization of the global dense matrix A. | |
| void | apply (const std::string &op, const LocalOrdinal nrows_local, const LocalOrdinal ncols_Q, const Scalar Q_local[], const LocalOrdinal ldq_local, const FactorOutput &factor_output, const LocalOrdinal ncols_C, Scalar C_local[], const LocalOrdinal ldc_local, const bool contiguousCacheBlocks=false) |
| Apply Q factor to the global dense matrix C. | |
| void | explicit_Q (const LocalOrdinal nrows_local, const LocalOrdinal ncols_Q_in, const Scalar Q_local_in[], const LocalOrdinal ldq_local_in, const FactorOutput &factorOutput, const LocalOrdinal ncols_Q_out, Scalar Q_local_out[], const LocalOrdinal ldq_local_out, const bool contiguousCacheBlocks=false) |
| Compute the explicit Q factor from factor() | |
| void | Q_times_B (const LocalOrdinal nrows, const LocalOrdinal ncols, Scalar Q[], const LocalOrdinal ldq, const Scalar B[], const LocalOrdinal ldb, const bool contiguousCacheBlocks=false) const |
| Compute Q*B. | |
| LocalOrdinal | reveal_R_rank (const LocalOrdinal ncols, Scalar R[], const LocalOrdinal ldr, Scalar U[], const LocalOrdinal ldu, const magnitude_type tol) const |
| LocalOrdinal | reveal_rank (const LocalOrdinal nrows, const LocalOrdinal ncols, Scalar Q[], const LocalOrdinal ldq, Scalar R[], const LocalOrdinal ldr, const magnitude_type tol, const bool contiguousCacheBlocks=false) const |
| Rank-revealing decomposition. | |
| void | cache_block (const LocalOrdinal nrows_local, const LocalOrdinal ncols, Scalar A_local_out[], const Scalar A_local_in[], const LocalOrdinal lda_local_in) const |
| Cache-block A_in into A_out. | |
| void | un_cache_block (const LocalOrdinal nrows_local, const LocalOrdinal ncols, Scalar A_local_out[], const LocalOrdinal lda_local_out, const Scalar A_local_in[]) const |
| Un-cache-block A_in into A_out. | |
Parallel Tall Skinny QR (TSQR) factorization.
Parallel Tall Skinny QR (TSQR) factorization of a matrix distributed in block rows across one or more MPI processes. The parallel critical path length for TSQR is independent of the number of columns in the matrix, unlike ScaLAPACK's comparable QR factorization (P_GEQR2), Modified Gram-Schmidt, or Classical Gram-Schmidt.
LocalOrdinal: index type that can address all elements of a matrix (when treated as a 1-D array, so for A[i + LDA*j], the number i + LDA*j must fit in a LocalOrdinal).
Scalar: the type of the matrix entries.
NodeTsqrType: the intranode (single-node) part of Tsqr. Defaults to sequential cache-blocked TSQR. Any class implementing the same compile-time interface is valid. We provide NodeTsqr.hpp as an archetype of the "NodeTsqrType" concept, but it is not necessary that NodeTsqrType derive from that abstract base class.
DistTsqrType: the internode (across nodes) part of Tsqr. Any class implementing the same compile-time interface as the default template parameter class is valid.
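To make the LocalOrdinal requirement concrete, here is a minimal sketch of column-major addressing and of the "largest index must fit" check. The helper names `fits_in_ordinal` and `entry` are hypothetical and are not part of Tsqr; the check assumes the dimensions themselves are small enough that the index can be computed in 64-bit arithmetic.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper (not part of Tsqr): true if the largest 1-D index
// (nrows-1) + lda*(ncols-1) of a column-major matrix fits in Ordinal.
// Assumes the product lda*(ncols-1) does not overflow std::int64_t.
template <class Ordinal>
bool fits_in_ordinal(std::int64_t nrows, std::int64_t ncols, std::int64_t lda) {
  assert(nrows >= 0 && ncols >= 0 && lda >= nrows);
  if (nrows == 0 || ncols == 0) return true;  // empty matrix: nothing to address
  const std::int64_t max_index = (nrows - 1) + lda * (ncols - 1);
  return static_cast<std::uintmax_t>(max_index) <=
         static_cast<std::uintmax_t>(std::numeric_limits<Ordinal>::max());
}

// Hypothetical helper: entry (i, j) of a column-major matrix with leading
// dimension lda, i.e. A[i + lda*j] as in the LocalOrdinal description.
template <class Ordinal, class Scalar>
Scalar& entry(Scalar A[], Ordinal lda, Ordinal i, Ordinal j) {
  return A[i + lda * j];
}
```

For example, a local block of 100,000,000 rows and 30 columns needs an index near 3 * 10^9, which does not fit in a 32-bit signed LocalOrdinal but does fit in a 64-bit one.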
| TSQR::Tsqr< LocalOrdinal, Scalar, NodeTsqrType, DistTsqrType >::Tsqr | ( | const node_tsqr_ptr & | nodeTsqr, |
| const dist_tsqr_ptr & | distTsqr | ||
| ) | [inline] |
| size_t TSQR::Tsqr< LocalOrdinal, Scalar, NodeTsqrType, DistTsqrType >::cache_block_size | ( | ) | const [inline] |
Cache block size in bytes.
Cache block size (in bytes) used by the underlying intranode TSQR implementation.
| bool TSQR::Tsqr< LocalOrdinal, Scalar, NodeTsqrType, DistTsqrType >::QR_produces_R_factor_with_nonnegative_diagonal | ( | ) | const [inline] |
| FactorOutput TSQR::Tsqr< LocalOrdinal, Scalar, NodeTsqrType, DistTsqrType >::factor | ( | const LocalOrdinal | nrows_local, |
| const LocalOrdinal | ncols, | ||
| Scalar | A_local[], | ||
| const LocalOrdinal | lda_local, | ||
| Scalar | R[], | ||
| const LocalOrdinal | ldr, | ||
| const bool | contiguousCacheBlocks = false |
||
| ) | [inline] |
Compute QR factorization of the global dense matrix A.
Compute the QR factorization of the tall and skinny dense matrix A. The matrix A is distributed in a row block layout over all the MPI processes. A_local contains the matrix data for this process.
| nrows_local | [in] Number of rows of this node's local component (A_local) of the matrix. May differ on different nodes. Precondition: nrows_local >= ncols. |
| ncols | [in] Number of columns in the matrix to factor. Should be the same on all nodes. Precondition: nrows_local >= ncols. |
| A_local | [in,out] On input, this node's local component of the matrix, stored as a general dense matrix in column-major order. On output, overwritten with an implicit representation of the Q factor. |
| lda_local | [in] Leading dimension of A_local. Precondition: lda_local >= nrows_local. |
| R | [out] The final R factor of the QR factorization of the global matrix A. An ncols by ncols upper triangular matrix with leading dimension ldr. |
| ldr | [in] Leading dimension of the matrix R. |
| contiguousCacheBlocks | [in] Whether or not cache blocks of A_local are stored contiguously. The default value of false means that A_local uses ordinary column-major (Fortran-style) order. Otherwise, the details of the format depend on the specific NodeTsqrType. Tsqr's cache_block() and un_cache_block() methods may be used to convert between cache-blocked and non-cache-blocked (column-major order) formats. |
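The row-block idea behind factor() can be illustrated without MPI. The sketch below is not Tsqr's implementation: it uses modified Gram-Schmidt instead of Householder reflections, handles only two row blocks in one process, and assumes a real matrix with full column rank. The names `r_factor` and `tsqr_r_two_blocks` are hypothetical. It shows that factoring each block, stacking the two R factors, and factoring the stack gives the same R as factoring the whole matrix.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: R factor of an m-by-n column-major matrix (leading
// dimension m) via modified Gram-Schmidt. Returns an n-by-n upper triangular
// R (column-major, leading dimension n) with positive diagonal.
// Assumes A has full column rank; otherwise this divides by zero.
std::vector<double> r_factor(std::vector<double> A, std::size_t m, std::size_t n) {
  assert(m >= n && A.size() == m * n);
  std::vector<double> R(n * n, 0.0);
  for (std::size_t j = 0; j < n; ++j) {
    double norm2 = 0.0;
    for (std::size_t i = 0; i < m; ++i) norm2 += A[i + m * j] * A[i + m * j];
    const double rjj = std::sqrt(norm2);
    R[j + n * j] = rjj;
    for (std::size_t i = 0; i < m; ++i) A[i + m * j] /= rjj;  // normalize column j
    for (std::size_t k = j + 1; k < n; ++k) {
      double dot = 0.0;
      for (std::size_t i = 0; i < m; ++i) dot += A[i + m * j] * A[i + m * k];
      R[j + n * k] = dot;
      for (std::size_t i = 0; i < m; ++i) A[i + m * k] -= dot * A[i + m * j];
    }
  }
  return R;
}

// Two-block TSQR sketch: rows [0, m_top) play the role of one process's
// A_local, rows [m_top, m) the other's. Each block needs at least n rows.
std::vector<double> tsqr_r_two_blocks(const std::vector<double>& A, std::size_t m,
                                      std::size_t n, std::size_t m_top) {
  assert(m_top <= m);
  const std::size_t m_bot = m - m_top;
  assert(m_top >= n && m_bot >= n && A.size() == m * n);
  std::vector<double> top(m_top * n), bot(m_bot * n);
  for (std::size_t j = 0; j < n; ++j) {
    for (std::size_t i = 0; i < m_top; ++i) top[i + m_top * j] = A[i + m * j];
    for (std::size_t i = 0; i < m_bot; ++i) bot[i + m_bot * j] = A[(m_top + i) + m * j];
  }
  const std::vector<double> R1 = r_factor(top, m_top, n);
  const std::vector<double> R2 = r_factor(bot, m_bot, n);
  // Stack R1 on top of R2 into a 2n-by-n matrix and factor it again.
  std::vector<double> stacked(2 * n * n, 0.0);
  for (std::size_t j = 0; j < n; ++j) {
    for (std::size_t i = 0; i < n; ++i) {
      stacked[i + 2 * n * j] = R1[i + n * j];
      stacked[(n + i) + 2 * n * j] = R2[i + n * j];
    }
  }
  return r_factor(stacked, 2 * n, n);
}
```

The intermediate factorizations only ever touch n-by-n triangles, which is why the reduction step's cost does not grow with the number of rows held by each process.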
| void TSQR::Tsqr< LocalOrdinal, Scalar, NodeTsqrType, DistTsqrType >::apply | ( | const std::string & | op, |
| const LocalOrdinal | nrows_local, | ||
| const LocalOrdinal | ncols_Q, | ||
| const Scalar | Q_local[], | ||
| const LocalOrdinal | ldq_local, | ||
| const FactorOutput & | factor_output, | ||
| const LocalOrdinal | ncols_C, | ||
| Scalar | C_local[], | ||
| const LocalOrdinal | ldc_local, | ||
| const bool | contiguousCacheBlocks = false |
||
| ) | [inline] |
Apply Q factor to the global dense matrix C.
Apply the Q factor (computed by factor() and represented implicitly) to the global dense matrix C, consisting of all nodes' C_local matrices stacked on top of each other.
| op | [in] If "N", compute Q*C. If "T", compute Q^T * C. If "H" or "C", compute Q^H * C. (The last option may not be implemented in all cases.) | |
| nrows_local | [in] Number of rows of this node's local component (C_local) of the matrix C. Should be the same on this node as the nrows_local argument with which factor() was called. Precondition: nrows_local >= ncols. | |
| ncols_Q | [in] Number of columns in Q. Should be the same on all nodes. Precondition: nrows_local >= ncols_Q. | |
| Q_local | [in] Same as A_local output of factor(). | |
| ldq_local | [in] Same as lda_local of factor(). | |
| factor_output | [in] Return value of factor() | |
| ncols_C | [in] Number of columns in C. Should be the same on all nodes. Precondition: nrows_local >= ncols_C. | |
| C_local | [in,out] On input, this node's local component of the matrix C, stored as a general dense matrix in column-major order. On output, overwritten with this node's component of op(Q)*C, where op(Q) = Q, Q^T, or Q^H. | |
| ldc_local | [in] Leading dimension of C_local. Precondition: ldc_local >= nrows_local. | |
| contiguousCacheBlocks | [in] Whether or not the cache blocks of Q and C are stored contiguously. |
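The meaning of the op argument can be sketched with an explicit, real-valued Q. This is only an illustration of the semantics: Tsqr stores Q implicitly and distributes it across processes, and the function name `apply_op` is hypothetical. For real scalars, "H" and "C" reduce to "T".

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical illustration (not Tsqr::apply): compute op(Q) * C for an
// explicit m-by-m real matrix Q and an m-by-ncols_C matrix C, both stored
// in column-major order with leading dimension m.
std::vector<double> apply_op(const std::string& op, const std::vector<double>& Q,
                             std::size_t m, const std::vector<double>& C,
                             std::size_t ncols_C) {
  assert(Q.size() == m * m && C.size() == m * ncols_C);
  bool transpose = false;
  if (op == "N") {
    transpose = false;
  } else if (op == "T" || op == "H" || op == "C") {
    transpose = true;  // real case: Q^H equals Q^T
  } else {
    throw std::invalid_argument("op must be \"N\", \"T\", \"H\", or \"C\"");
  }
  std::vector<double> out(m * ncols_C, 0.0);
  for (std::size_t j = 0; j < ncols_C; ++j) {
    for (std::size_t i = 0; i < m; ++i) {
      for (std::size_t k = 0; k < m; ++k) {
        const double q = transpose ? Q[k + m * i] : Q[i + m * k];
        out[i + m * j] += q * C[k + m * j];
      }
    }
  }
  return out;
}
```

With Q a 90-degree rotation and C = (1, 2)^T, "N" gives (-2, 1)^T and "T" gives (2, -1)^T.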
| void TSQR::Tsqr< LocalOrdinal, Scalar, NodeTsqrType, DistTsqrType >::explicit_Q | ( | const LocalOrdinal | nrows_local, |
| const LocalOrdinal | ncols_Q_in, | ||
| const Scalar | Q_local_in[], | ||
| const LocalOrdinal | ldq_local_in, | ||
| const FactorOutput & | factorOutput, | ||
| const LocalOrdinal | ncols_Q_out, | ||
| Scalar | Q_local_out[], | ||
| const LocalOrdinal | ldq_local_out, | ||
| const bool | contiguousCacheBlocks = false |
||
| ) | [inline] |
Compute the explicit Q factor from factor().
Compute the explicit version of the Q factor computed by factor() and represented implicitly (via Q_local_in and factorOutput).
| nrows_local | [in] Number of rows of this node's local component (Q_local_in) of the implicitly stored Q factor. Also, the number of rows of this node's local component (Q_local_out) of the output matrix. Should be the same on this node as the nrows_local argument with which factor() was called. Precondition: nrows_local >= ncols_Q_in. |
| ncols_Q_in | [in] Number of columns in the original matrix A, whose explicit Q factor we are computing. Should be the same on all nodes. Precondition: nrows_local >= ncols_Q_in. |
| Q_local_in | [in] Same as A_local output of factor(). |
| ldq_local_in | [in] Same as lda_local of factor(). |
| factorOutput | [in] Return value of factor(). |
| ncols_Q_out | [in] Number of columns of the explicit Q factor to compute. Should be the same on all nodes. |
| Q_local_out | [out] This node's component of the Q factor (in explicit form). |
| ldq_local_out | [in] Leading dimension of Q_local_out. |
| contiguousCacheBlocks | [in] Whether or not cache blocks in Q_local_in and Q_local_out are stored contiguously. |
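As a worked identity, the explicit Q factor with k = ncols_Q_out columns is the implicitly stored Q applied to the first k columns of the identity. Whether Tsqr forms the result this way internally is an assumption and is not stated in this reference; the symbols $m$ (global row count) and $k$ are introduced here for illustration only.

```latex
Q_{\mathrm{out}}
  \;=\; Q \begin{pmatrix} I_{k} \\ 0_{(m-k) \times k} \end{pmatrix},
\qquad k = \texttt{ncols\_Q\_out},
\qquad Q_{\mathrm{out}}^{*} Q_{\mathrm{out}} = I_{k}.
```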
| void TSQR::Tsqr< LocalOrdinal, Scalar, NodeTsqrType, DistTsqrType >::Q_times_B | ( | const LocalOrdinal | nrows, |
| const LocalOrdinal | ncols, | ||
| Scalar | Q[], | ||
| const LocalOrdinal | ldq, | ||
| const Scalar | B[], | ||
| const LocalOrdinal | ldb, | ||
| const bool | contiguousCacheBlocks = false |
||
| ) | const [inline] |
| LocalOrdinal TSQR::Tsqr< LocalOrdinal, Scalar, NodeTsqrType, DistTsqrType >::reveal_R_rank | ( | const LocalOrdinal | ncols, |
| Scalar | R[], | ||
| const LocalOrdinal | ldr, | ||
| Scalar | U[], | ||
| const LocalOrdinal | ldu, | ||
| const magnitude_type | tol | ||
| ) | const [inline] |
Compute the SVD $R = U \Sigma V^*$, not in place. Use the resulting singular values to compute the numerical rank of R, with respect to the relative tolerance tol. If R is full rank, return without modifying R. If R is not full rank, overwrite R with $\Sigma V^*$.
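One common way to turn singular values into a numerical rank is to count those that exceed tol times the largest singular value. The sketch below assumes that interpretation of "relative tolerance"; the exact comparison Tsqr uses is not stated here, and the function name `numerical_rank` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not Tsqr::reveal_R_rank): number of singular values
// strictly greater than tol * sigma_max. The input need not be sorted.
std::size_t numerical_rank(const std::vector<double>& sigma, double tol) {
  assert(tol >= 0.0);
  if (sigma.empty()) return 0;
  const double sigma_max = *std::max_element(sigma.begin(), sigma.end());
  std::size_t rank = 0;
  for (double s : sigma) {
    if (s > tol * sigma_max) ++rank;  // assumed relative-tolerance test
  }
  return rank;
}
```

Under this rule an all-zero R has rank 0, and the result always lies between 0 and the number of columns.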
| LocalOrdinal TSQR::Tsqr< LocalOrdinal, Scalar, NodeTsqrType, DistTsqrType >::reveal_rank | ( | const LocalOrdinal | nrows, |
| const LocalOrdinal | ncols, | ||
| Scalar | Q[], | ||
| const LocalOrdinal | ldq, | ||
| Scalar | R[], | ||
| const LocalOrdinal | ldr, | ||
| const magnitude_type | tol, | ||
| const bool | contiguousCacheBlocks = false |
||
| ) | const [inline] |
Rank-revealing decomposition.
Using the R factor from factor() and the explicit Q factor from explicit_Q(), compute the SVD of R ($R = U \Sigma V^*$). If R is full rank (with respect to the given relative tolerance tol), don't change Q or R. Otherwise, compute $Q := Q U$ and $R := \Sigma V^*$ in place (the latter may no longer be upper triangular).
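The in-place update keeps the product of the two factors unchanged, which the following short derivation makes explicit. Here $A$ denotes the originally factored matrix, $n$ the number of columns, and $r$ the numerical rank; the notation $(\cdot)_{:,1:r}$ for leading columns is introduced only for this sketch.

```latex
\begin{aligned}
A &= Q R = Q \,(U \Sigma V^{*}) = (Q U)\,(\Sigma V^{*}), \\
(Q U)^{*} (Q U) &= U^{*} Q^{*} Q\, U = U^{*} U = I, \\
\sigma_{r+1} \approx \dots \approx \sigma_{n} \approx 0
  \;&\Longrightarrow\;
  A \approx (Q U)_{:,1:r} \, (\Sigma V^{*})_{1:r,:}.
\end{aligned}
```

So the updated Q still has orthonormal columns, and its first $r$ columns span the numerical range of $A$.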
| R | [in,out] On input: ncols by ncols upper triangular matrix with leading dimension ldr >= ncols. On output: if the input is full rank, R is unchanged. Otherwise, if $R = U \Sigma V^*$ is the SVD of R, on output R is overwritten with $\Sigma V^*$. This is also an ncols by ncols matrix, but it is not necessarily upper triangular. |
Returns: Rank $r$ of R, with $0 \leq r \leq$ ncols.
| void TSQR::Tsqr< LocalOrdinal, Scalar, NodeTsqrType, DistTsqrType >::cache_block | ( | const LocalOrdinal | nrows_local, |
| const LocalOrdinal | ncols, | ||
| Scalar | A_local_out[], | ||
| const Scalar | A_local_in[], | ||
| const LocalOrdinal | lda_local_in | ||
| ) | const [inline] |
| void TSQR::Tsqr< LocalOrdinal, Scalar, NodeTsqrType, DistTsqrType >::un_cache_block | ( | const LocalOrdinal | nrows_local, |
| const LocalOrdinal | ncols, | ||
| Scalar | A_local_out[], | ||
| const LocalOrdinal | lda_local_out, | ||
| const Scalar | A_local_in[] | ||
| ) | const [inline] |
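The round trip between column-major and cache-blocked storage can be sketched as follows. The actual cache-blocked format depends on the NodeTsqrType and is not specified here, so the layout below (fixed-height row blocks, each stored contiguously in column-major order) is an assumed, illustrative one. The function names and the `block_rows` parameter are hypothetical and do not match the Tsqr signatures.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout: rows are split into blocks of at most block_rows rows;
// each block is stored contiguously, column by column.
std::vector<double> cache_block_sketch(const std::vector<double>& A_in,
                                       std::size_t nrows, std::size_t ncols,
                                       std::size_t lda_in, std::size_t block_rows) {
  assert(lda_in >= nrows && block_rows > 0 && A_in.size() >= lda_in * ncols);
  std::vector<double> A_out(nrows * ncols);
  std::size_t pos = 0;
  for (std::size_t r0 = 0; r0 < nrows; r0 += block_rows) {
    const std::size_t r1 = std::min(r0 + block_rows, nrows);
    for (std::size_t j = 0; j < ncols; ++j)
      for (std::size_t i = r0; i < r1; ++i)
        A_out[pos++] = A_in[i + lda_in * j];
  }
  return A_out;
}

// Inverse of cache_block_sketch: copy back into ordinary column-major
// storage with leading dimension lda_out (padding rows are left as zero).
std::vector<double> un_cache_block_sketch(const std::vector<double>& A_in,
                                          std::size_t nrows, std::size_t ncols,
                                          std::size_t lda_out, std::size_t block_rows) {
  assert(lda_out >= nrows && block_rows > 0 && A_in.size() == nrows * ncols);
  std::vector<double> A_out(lda_out * ncols, 0.0);
  std::size_t pos = 0;
  for (std::size_t r0 = 0; r0 < nrows; r0 += block_rows) {
    const std::size_t r1 = std::min(r0 + block_rows, nrows);
    for (std::size_t j = 0; j < ncols; ++j)
      for (std::size_t i = r0; i < r1; ++i)
        A_out[i + lda_out * j] = A_in[pos++];
  }
  return A_out;
}
```

Storing each block contiguously means a block-at-a-time factorization walks through memory sequentially instead of striding across the full leading dimension.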