Anasazi Version of the Day
Intranode TSQR, parallelized with Intel TBB.
#include <TbbTsqr.hpp>
Public Member Functions | |
| size_t | ncores () const |
| (Max) number of cores used for the factorization. | |
| size_t | cache_block_size () const |
| Cache block size (in bytes) used for the factorization. | |
| TbbTsqr (const size_t numCores, const size_t cacheBlockSize=0) | |
| FactorOutput | factor (const LocalOrdinal nrows, const LocalOrdinal ncols, Scalar A[], const LocalOrdinal lda, Scalar R[], const LocalOrdinal ldr, const bool contiguous_cache_blocks=false) |
| Compute QR factorization of the dense matrix A. | |
| void | apply (const ApplyType &apply_type, const LocalOrdinal nrows, const LocalOrdinal ncols_Q, const Scalar Q[], const LocalOrdinal ldq, const FactorOutput &factor_output, const LocalOrdinal ncols_C, Scalar C[], const LocalOrdinal ldc, const bool contiguous_cache_blocks=false) |
| Apply Q factor to the global dense matrix C. | |
| void | explicit_Q (const LocalOrdinal nrows, const LocalOrdinal ncols_Q_in, const Scalar Q_in[], const LocalOrdinal ldq_in, const FactorOutput &factor_output, const LocalOrdinal ncols_Q_out, Scalar Q_out[], const LocalOrdinal ldq_out, const bool contiguous_cache_blocks=false) |
| Compute the explicit Q factor from factor() | |
| void | Q_times_B (const LocalOrdinal nrows, const LocalOrdinal ncols, Scalar Q[], const LocalOrdinal ldq, const Scalar B[], const LocalOrdinal ldb, const bool contiguous_cache_blocks=false) const |
| Compute Q*B. | |
| LocalOrdinal | reveal_R_rank (const LocalOrdinal ncols, Scalar R[], const LocalOrdinal ldr, Scalar U[], const LocalOrdinal ldu, const magnitude_type tol) const |
| LocalOrdinal | reveal_rank (const LocalOrdinal nrows, const LocalOrdinal ncols, Scalar Q[], const LocalOrdinal ldq, Scalar R[], const LocalOrdinal ldr, const magnitude_type tol, const bool contiguous_cache_blocks=false) |
| Rank-revealing decomposition. | |
Static Public Member Functions | |
| static bool | QR_produces_R_factor_with_nonnegative_diagonal () |
TSQR factorization of a dense, tall-and-skinny matrix stored on a single node, parallelized using Intel's Threading Building Blocks (TBB).
Definition at line 62 of file TbbTsqr.hpp.
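The reduction at the heart of TSQR can be sketched sequentially: split A into row blocks, compute the R factor of each block, stack those small R factors, and factor the stack once more. This is an illustration under stated assumptions, not TbbTsqr's implementation: it runs serially, uses a single reduction level, assumes the row count is a multiple of the block size and that A has full rank, and, for brevity, computes each local R with modified Gram-Schmidt rather than with Householder reflections. The names `Mat`, `mgs_R`, and `tsqr_R` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Column-major dense matrix: entry (i, j) is stored at data[i + j * rows].
struct Mat {
  std::size_t rows, cols;
  std::vector<double> data;
  Mat(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}
  double& operator()(std::size_t i, std::size_t j) { return data[i + j * rows]; }
  double operator()(std::size_t i, std::size_t j) const { return data[i + j * rows]; }
};

// R factor of a full-rank A via modified Gram-Schmidt (positive diagonal).
// Works on a copy of A; only R is returned.
Mat mgs_R(Mat A) {
  assert(A.rows >= A.cols);
  Mat R(A.cols, A.cols);
  for (std::size_t j = 0; j < A.cols; ++j) {
    double nrm = 0.0;
    for (std::size_t i = 0; i < A.rows; ++i) nrm += A(i, j) * A(i, j);
    R(j, j) = std::sqrt(nrm);
    for (std::size_t i = 0; i < A.rows; ++i) A(i, j) /= R(j, j);
    for (std::size_t k = j + 1; k < A.cols; ++k) {
      double dot = 0.0;
      for (std::size_t i = 0; i < A.rows; ++i) dot += A(i, j) * A(i, k);
      R(j, k) = dot;
      for (std::size_t i = 0; i < A.rows; ++i) A(i, k) -= dot * A(i, j);
    }
  }
  return R;
}

// One-level TSQR reduction: factor each block of block_rows rows, stack the
// resulting ncols-by-ncols R factors, and factor the stack.
Mat tsqr_R(const Mat& A, std::size_t block_rows) {
  assert(block_rows >= A.cols && A.rows % block_rows == 0);
  const std::size_t n = A.cols, nblocks = A.rows / block_rows;
  Mat stacked(nblocks * n, n);  // the block R factors, one above the other
  for (std::size_t b = 0; b < nblocks; ++b) {
    Mat block(block_rows, n);
    for (std::size_t j = 0; j < n; ++j)
      for (std::size_t i = 0; i < block_rows; ++i)
        block(i, j) = A(b * block_rows + i, j);
    // A parallel TSQR could factor these blocks independently (e.g., one per core).
    Mat Rb = mgs_R(block);
    for (std::size_t j = 0; j < n; ++j)
      for (std::size_t i = 0; i < n; ++i)
        stacked(b * n + i, j) = Rb(i, j);
  }
  return mgs_R(stacked);
}
```

Because both routines normalize R to a positive diagonal, and the QR factorization of a full-rank matrix with that normalization is unique, `tsqr_R(A, block_rows)` should agree with `mgs_R(A)` up to rounding error.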
| TSQR::TBB::TbbTsqr< LocalOrdinal, Scalar, TimerType >::TbbTsqr | ( const size_t numCores, const size_t cacheBlockSize = 0 ) | [inline] |
Constructor: sets up the tuning parameters.
| numCores | [in] Maximum number of processing cores to use when factoring the matrix. Fewer cores may be used if the matrix is not big enough to justify their use. |
| cacheBlockSize | [in] Size (in bytes) of the cache block to use in the sequential part of TSQR. If each core has a private cache, that cache's size (minus a little wiggle room) is an appropriate value for this parameter. If zero or not specified, the implementation chooses a default, which may or may not give good performance on your platform. |
Definition at line 104 of file TbbTsqr.hpp.
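The cacheBlockSize parameter bounds how much of A the sequential kernel works on at a time, and the contiguous_cache_blocks flag of the other methods refers to a layout in which each such row block occupies one contiguous range of memory. The sketch below illustrates both ideas under its own assumptions: the sizing rule (fit an r-by-ncols panel of Scalar into cacheBlockSize bytes, with at least ncols rows) is a plausible heuristic, not necessarily the one TbbTsqr uses; the zero ("choose a default") case is not modeled; and the layout produced by the hypothetical helper `copy_to_cache_blocks` may differ in detail from that of TSQR's cache_block() method, for example in how leftover rows are distributed. `rows_per_cache_block` is likewise hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sizing rule: the largest r such that an r-by-ncols panel of
// Scalar fits in cache_block_size bytes, but never fewer than ncols rows,
// since a block must be at least as tall as it is wide to be factored.
template <class Scalar>
std::size_t rows_per_cache_block(std::size_t cache_block_size, std::size_t ncols) {
  assert(ncols > 0 && cache_block_size > 0);  // "0 = choose a default" is not modeled
  const std::size_t r = cache_block_size / (ncols * sizeof(Scalar));
  return std::max(r, ncols);
}

// Copy the column-major nrows-by-ncols matrix A (leading dimension lda) into
// `out`, one row block of at most block_rows rows after another. Each block is
// stored column-major with leading dimension equal to its own row count, so
// it occupies a single contiguous range of `out`.
template <class Scalar>
void copy_to_cache_blocks(std::size_t nrows, std::size_t ncols, const Scalar A[],
                          std::size_t lda, std::size_t block_rows,
                          std::vector<Scalar>& out) {
  assert(lda >= nrows && block_rows > 0);
  out.assign(nrows * ncols, Scalar(0));
  std::size_t offset = 0;  // start of the current block in `out`
  for (std::size_t start = 0; start < nrows; start += block_rows) {
    const std::size_t len = std::min(block_rows, nrows - start);
    for (std::size_t j = 0; j < ncols; ++j)
      for (std::size_t i = 0; i < len; ++i)
        out[offset + i + j * len] = A[(start + i) + j * lda];
    offset += len * ncols;
  }
}
```

For example, with 8-byte doubles, a 1024-byte block and 4 columns gives 1024 / (4 * 8) = 32 rows per block.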
| size_t TSQR::TBB::TbbTsqr< LocalOrdinal, Scalar, TimerType >::ncores | ( | ) | const [inline] |
(Max) number of cores used for the factorization.
Definition at line 87 of file TbbTsqr.hpp.
| size_t TSQR::TBB::TbbTsqr< LocalOrdinal, Scalar, TimerType >::cache_block_size | ( | ) | const [inline] |
Cache block size (in bytes) used for the factorization.
Definition at line 90 of file TbbTsqr.hpp.
| static bool TSQR::TBB::TbbTsqr< LocalOrdinal, Scalar, TimerType >::QR_produces_R_factor_with_nonnegative_diagonal | ( | ) | [inline, static] |
Whether or not this QR factorization produces an R factor with all nonnegative diagonal entries.
Definition at line 116 of file TbbTsqr.hpp.
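If QR_produces_R_factor_with_nonnegative_diagonal() returns false and a caller needs that normalization (for example, to compare R factors across runs), the signs can be fixed afterwards using the explicit Q factor from explicit_Q(): negating row j of R together with column j of Q leaves the product Q*R unchanged, because Q*R is the sum over j of column j of Q times row j of R. The helper below is a hypothetical sketch of that post-processing for real double-precision data; it is not part of TbbTsqr.

```cpp
#include <cassert>
#include <cstddef>

// Flip signs so that R has a nonnegative diagonal while Q*R is unchanged.
// Q: nrows-by-ncols explicit Q factor, column-major, leading dimension ldq.
// R: ncols-by-ncols upper triangular, column-major, leading dimension ldr.
void make_R_diagonal_nonnegative(std::size_t nrows, std::size_t ncols,
                                 double Q[], std::size_t ldq,
                                 double R[], std::size_t ldr) {
  assert(nrows >= ncols && ldq >= nrows && ldr >= ncols);
  for (std::size_t j = 0; j < ncols; ++j) {
    if (R[j + j * ldr] >= 0.0) continue;
    // Row j of an upper triangular R is nonzero only in columns k >= j.
    for (std::size_t k = j; k < ncols; ++k) R[j + k * ldr] = -R[j + k * ldr];
    // Compensate by negating column j of Q.
    for (std::size_t i = 0; i < nrows; ++i) Q[i + j * ldq] = -Q[i + j * ldq];
  }
}
```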
| FactorOutput TSQR::TBB::TbbTsqr< LocalOrdinal, Scalar, TimerType >::factor | ( const LocalOrdinal nrows, const LocalOrdinal ncols, Scalar A[], const LocalOrdinal lda, Scalar R[], const LocalOrdinal ldr, const bool contiguous_cache_blocks = false ) | [inline] |
Compute the QR factorization of the dense matrix A.
| nrows | [in] Number of rows of A. Precondition: nrows >= ncols. |
| ncols | [in] Number of columns of A. Precondition: nrows >= ncols. |
| A | [in,out] On input, the matrix to factor, stored as a general dense matrix in column-major order. On output, overwritten with an implicit representation of the Q factor. |
| lda | [in] Leading dimension of A. Precondition: lda >= nrows. |
| R | [out] The final R factor of the QR factorization of the matrix A. An ncols by ncols upper triangular matrix stored in column-major order, with leading dimension ldr. |
| ldr | [in] Leading dimension of the matrix R. |
| contiguous_cache_blocks | [in] Whether cache blocks are stored contiguously in the input matrix A and in the output matrix Q of explicit_Q(). If they are not and you want them to be, use the cache_block() method to copy them into that format; use the un_cache_block() method to copy them back into the usual column-oriented format. |
Definition at line 199 of file TbbTsqr.hpp.
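factor() overwrites A with an implicit representation of the Q factor. The layout of TbbTsqr's FactorOutput is not documented on this page, so the following sketch only illustrates the general idea with the classic LAPACK-style representation (compare LAPACK's DGEQR2): Householder vectors stored below the diagonal of A plus one scaling factor per column. It also shows the column-major, leading-dimension conventions (lda >= nrows, ldr >= ncols) used by factor(). The function `householder_qr` is hypothetical, unblocked, and sequential, and it uses double and int in place of the Scalar and LocalOrdinal template parameters.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Unblocked Householder QR with LAPACK-style output (compare DGEQR2).
// A is nrows-by-ncols, column-major, leading dimension lda. On output, the
// upper triangle of A holds R and the entries below the diagonal hold the
// Householder vectors v_j (whose leading entry v_j(j) = 1 is implicit); tau[j]
// holds the matching scaling factor, so that Q = H_0 H_1 ... H_{ncols-1} with
// H_j = I - tau[j] * v_j * v_j^T. R (ncols-by-ncols, leading dimension ldr)
// receives a copy of the upper triangle with zeros below the diagonal.
void householder_qr(int nrows, int ncols, double A[], int lda,
                    double R[], int ldr, std::vector<double>& tau) {
  assert(nrows >= ncols && lda >= nrows && ldr >= ncols);
  tau.assign(ncols, 0.0);
  for (int j = 0; j < ncols; ++j) {
    double* a = A + j + j * lda;  // a[i] is A(j + i, j)
    const int len = nrows - j;
    double sigma = 0.0;           // squared norm of the part below the diagonal
    for (int i = 1; i < len; ++i) sigma += a[i] * a[i];
    if (sigma == 0.0) continue;   // nothing to eliminate: H_j = I, tau[j] = 0
    const double alpha = a[0];
    const double norm = std::sqrt(alpha * alpha + sigma);
    const double beta = (alpha >= 0.0) ? -norm : norm;  // avoids cancellation
    tau[j] = (beta - alpha) / beta;
    for (int i = 1; i < len; ++i) a[i] /= (alpha - beta);  // scale so v_j(j) = 1
    a[0] = beta;                                            // new diagonal entry of R
    // Apply H_j to the trailing columns j+1 .. ncols-1.
    for (int k = j + 1; k < ncols; ++k) {
      double* c = A + j + k * lda;
      double w = c[0];
      for (int i = 1; i < len; ++i) w += a[i] * c[i];
      w *= tau[j];
      c[0] -= w;
      for (int i = 1; i < len; ++i) c[i] -= w * a[i];
    }
  }
  for (int k = 0; k < ncols; ++k)
    for (int i = 0; i < ncols; ++i)
      R[i + k * ldr] = (i <= k) ? A[i + k * lda] : 0.0;
}
```

For the column (3, 0, 4) this sign convention gives R(0,0) = -5, a negative diagonal entry; whether a QR implementation can produce such entries is the property that QR_produces_R_factor_with_nonnegative_diagonal() reports.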
| void TSQR::TBB::TbbTsqr< LocalOrdinal, Scalar, TimerType >::apply | ( const ApplyType & apply_type, const LocalOrdinal nrows, const LocalOrdinal ncols_Q, const Scalar Q[], const LocalOrdinal ldq, const FactorOutput & factor_output, const LocalOrdinal ncols_C, Scalar C[], const LocalOrdinal ldc, const bool contiguous_cache_blocks = false ) | [inline] |
Apply Q factor to the global dense matrix C.
Apply the Q factor (computed by factor() and represented implicitly) to the dense matrix C.
| apply_type | [in] Whether to compute Q*C, Q^T * C, or Q^H * C. |
| nrows | [in] Number of rows of the matrix C and the matrix Q. Precondition: nrows >= ncols_Q and nrows >= ncols_C. |
| ncols_Q | [in] Number of columns of Q. |
| Q | [in] Same as the "A" output of factor(). |
| ldq | [in] Same as the "lda" input of factor(). |
| factor_output | [in] Return value of factor(). |
| ncols_C | [in] Number of columns of C. Precondition: nrows >= ncols_C. |
| C | [in,out] On input, the matrix C, stored as a general dense matrix in column-major order. On output, overwritten with op(Q)*C, where op(Q) = Q, Q^T, or Q^H, as selected by apply_type. |
| ldc | [in] Leading dimension of C. Precondition: ldc >= nrows. Not applicable if C is cache-blocked in place. |
| contiguous_cache_blocks | [in] Whether or not cache blocks of Q and C are stored contiguously (default: false). |
Definition at line 246 of file TbbTsqr.hpp.
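Assume the LAPACK-style implicit representation: Householder vectors v_j stored below the diagonal of Q with an implicit unit leading entry, and scaling factors tau_j, so that Q = H_0 H_1 ... H_{k-1} with H_j = I - tau_j v_j v_j^T. Under that assumption, applying Q means applying the reflectors one at a time, in an order that depends on apply_type: Q^T C applies H_0 first, while Q C applies H_{k-1} first. The sketch covers real arithmetic only (where Q^H = Q^T); `ApplyOp`, `apply_reflector`, and `apply_Q` are hypothetical names, and the flat `tau` array stands in for whatever TbbTsqr keeps in its FactorOutput.

```cpp
#include <cassert>

// Hypothetical stand-in for ApplyType, real arithmetic only (Q^H == Q^T).
enum class ApplyOp { NoTranspose, Transpose };

// Apply H = I - tau * v * v^T to a len-row slice of C (ncols_C columns,
// leading dimension ldc). v(0) = 1 is implicit; v(1..len-1) is read from V.
static void apply_reflector(int len, const double V[], double tau,
                            int ncols_C, double C[], int ldc) {
  for (int k = 0; k < ncols_C; ++k) {
    double* c = C + k * ldc;
    double w = c[0];                                  // w = v^T c
    for (int i = 1; i < len; ++i) w += V[i] * c[i];
    w *= tau;
    c[0] -= w;                                        // c -= tau * (v^T c) * v
    for (int i = 1; i < len; ++i) c[i] -= w * V[i];
  }
}

// Overwrite C (nrows-by-ncols_C) with op(Q) * C, where
// Q = H_0 H_1 ... H_{ncols_Q-1} is stored implicitly: reflector j lives in
// column j of Q, below the diagonal, with scaling factor tau[j].
void apply_Q(ApplyOp op, int nrows, int ncols_Q, const double Q[], int ldq,
             const double tau[], int ncols_C, double C[], int ldc) {
  assert(nrows >= ncols_Q && nrows >= ncols_C);
  assert(ldq >= nrows && ldc >= nrows);
  for (int step = 0; step < ncols_Q; ++step) {
    // Q^T C = H_{k-1} ... H_0 C applies H_0 first; Q C applies H_{k-1} first.
    const int j = (op == ApplyOp::Transpose) ? step : ncols_Q - 1 - step;
    // H_j only touches rows j..nrows-1.
    apply_reflector(nrows - j, Q + j + j * ldq, tau[j], ncols_C, C + j, ldc);
  }
}
```

Since each H_j is orthogonal and symmetric, applying Transpose and then NoTranspose returns C to its original value (up to rounding).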
| void TSQR::TBB::TbbTsqr< LocalOrdinal, Scalar, TimerType >::explicit_Q | ( const LocalOrdinal nrows, const LocalOrdinal ncols_Q_in, const Scalar Q_in[], const LocalOrdinal ldq_in, const FactorOutput & factor_output, const LocalOrdinal ncols_Q_out, Scalar Q_out[], const LocalOrdinal ldq_out, const bool contiguous_cache_blocks = false ) | [inline] |
Compute the explicit Q factor from factor().
Compute the explicit version of the Q factor computed by factor() and represented implicitly (via Q_in and factor_output).
| nrows | [in] Number of rows of the matrix Q_in. Also, the number of rows of the output matrix Q_out. Precondition: nrows >= ncols_Q_in. |
| ncols_Q_in | [in] Number of columns in the original matrix A, whose explicit Q factor we are computing. Precondition: nrows >= ncols_Q_in. |
| Q_in | [in] Same as the A output of factor(). |
| ldq_in | [in] Same as the lda input of factor(). |
| ncols_Q_out | [in] Number of columns of the explicit Q factor to compute. |
| Q_out | [out] The explicit representation of the Q factor. |
| ldq_out | [in] Leading dimension of Q_out. |
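| factor_output | [in] Return value of factor(). |
| contiguous_cache_blocks | [in] Whether or not cache blocks of Q_in and Q_out are stored contiguously (default: false). |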
Definition at line 290 of file TbbTsqr.hpp.
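One standard way to form an explicit Q factor is to apply the implicit Q to the first ncols_Q_out columns of the identity, since Q [I; 0] is exactly the leading ncols_Q_out columns of Q. The sketch assumes real data and a LAPACK-style representation (Householder vectors v_j below the diagonal of Q_in with an implicit unit leading entry, scaling factors tau_j in a flat array standing in for factor_output, and Q = H_0 H_1 ... H_{k-1} with H_j = I - tau_j v_j v_j^T). `explicit_Q_sketch` is a hypothetical name, and the sketch ignores cache blocking and parallelism.

```cpp
#include <cassert>

// Overwrite Q_out (nrows-by-ncols_Q_out, leading dimension ldq_out) with the
// first ncols_Q_out columns of Q = H_0 H_1 ... H_{ncols_Q_in-1}.
void explicit_Q_sketch(int nrows, int ncols_Q_in, const double Q_in[], int ldq_in,
                       const double tau[], int ncols_Q_out, double Q_out[],
                       int ldq_out) {
  assert(nrows >= ncols_Q_in && nrows >= ncols_Q_out);
  assert(ldq_in >= nrows && ldq_out >= nrows);
  // Start from [I; 0].
  for (int k = 0; k < ncols_Q_out; ++k)
    for (int i = 0; i < nrows; ++i)
      Q_out[i + k * ldq_out] = (i == k) ? 1.0 : 0.0;
  // Q [I; 0] = H_0 (H_1 (... (H_last [I; 0]))): apply the last reflector first.
  for (int j = ncols_Q_in - 1; j >= 0; --j) {
    const double* v = Q_in + j + j * ldq_in;  // v[0] = 1 is implicit
    for (int k = 0; k < ncols_Q_out; ++k) {
      double* c = Q_out + j + k * ldq_out;    // rows j..nrows-1 of column k
      double w = c[0];
      for (int i = 1; i < nrows - j; ++i) w += v[i] * c[i];
      w *= tau[j];
      c[0] -= w;
      for (int i = 1; i < nrows - j; ++i) c[i] -= w * v[i];
    }
  }
}
```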
| void TSQR::TBB::TbbTsqr< LocalOrdinal, Scalar, TimerType >::Q_times_B | ( const LocalOrdinal nrows, const LocalOrdinal ncols, Scalar Q[], const LocalOrdinal ldq, const Scalar B[], const LocalOrdinal ldb, const bool contiguous_cache_blocks = false ) | const [inline] |
Compute Q*B.
Compute the matrix-matrix product Q*B, where Q is nrows by ncols and B is ncols by ncols, overwriting Q with the result. Respects the cache blocks of Q.
Definition at line 311 of file TbbTsqr.hpp.
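Because row i of Q*B depends only on row i of Q, the product can overwrite Q using a single row of workspace. The sketch below shows that idea for an ordinary column-major Q with double entries; `Q_times_B_sketch` is a hypothetical name, and it does not reproduce the cache-block awareness of Q_times_B(), which would presumably process a whole block of rows at a time rather than one row.

```cpp
#include <cassert>
#include <vector>

// Overwrite the nrows-by-ncols column-major matrix Q (leading dimension ldq)
// with Q*B, where B is ncols-by-ncols (leading dimension ldb).
void Q_times_B_sketch(int nrows, int ncols, double Q[], int ldq,
                      const double B[], int ldb) {
  assert(ldq >= nrows && ldb >= ncols);
  std::vector<double> row(ncols);  // one row of the product
  for (int i = 0; i < nrows; ++i) {
    for (int k = 0; k < ncols; ++k) {
      double sum = 0.0;
      for (int j = 0; j < ncols; ++j) sum += Q[i + j * ldq] * B[j + k * ldb];
      row[k] = sum;
    }
    // Row i of Q is no longer needed, so it can be overwritten.
    for (int k = 0; k < ncols; ++k) Q[i + k * ldq] = row[k];
  }
}
```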
| LocalOrdinal TSQR::TBB::TbbTsqr< LocalOrdinal, Scalar, TimerType >::reveal_R_rank | ( const LocalOrdinal ncols, Scalar R[], const LocalOrdinal ldr, Scalar U[], const LocalOrdinal ldu, const magnitude_type tol ) | const [inline] |
Compute the SVD $R = U \Sigma V^*$, not in place. Use the resulting singular values to compute the numerical rank of R, with respect to the relative tolerance tol. If R is full rank, return without modifying R. If R is not full rank, overwrite R with $\Sigma \cdot V^*$.
Definition at line 330 of file TbbTsqr.hpp.
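reveal_R_rank() decides the numerical rank of R from its singular values and the relative tolerance tol. The sketch below shows one common reading of that rule, under assumptions not stated on this page: "relative" means relative to the largest singular value, a singular value counts toward the rank only if it strictly exceeds tol times that maximum, and the singular values arrive in nonincreasing order (as LAPACK's xGESVD returns them). Computing the SVD itself is left to a library; `numerical_rank` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Number of singular values sigma[i] with sigma[i] > tol * sigma[0], assuming
// sigma is sorted in nonincreasing order.
std::size_t numerical_rank(const std::vector<double>& sigma, double tol) {
  assert(tol >= 0.0);
  if (sigma.empty() || sigma[0] == 0.0) return 0;  // the zero matrix has rank 0
  const double threshold = tol * sigma[0];
  std::size_t rank = 0;
  while (rank < sigma.size() && sigma[rank] > threshold) ++rank;
  return rank;
}
```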
| LocalOrdinal TSQR::TBB::TbbTsqr< LocalOrdinal, Scalar, TimerType >::reveal_rank | ( const LocalOrdinal nrows, const LocalOrdinal ncols, Scalar Q[], const LocalOrdinal ldq, Scalar R[], const LocalOrdinal ldr, const magnitude_type tol, const bool contiguous_cache_blocks = false ) | [inline] |
Rank-revealing decomposition.
Using the R factor from factor() and the explicit Q factor from explicit_Q(), compute the SVD of R, $R = U \Sigma V^*$. If R is full rank (with respect to the given relative tolerance tol), leave Q and R unchanged. Otherwise, compute $Q := Q \cdot U$ and $R := \Sigma V^*$ in place (the latter may no longer be upper triangular).
Returns the rank $r$ of R, where $0 \leq r \leq \mathrm{ncols}$.
Definition at line 352 of file TbbTsqr.hpp.
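Why the rank-revealing update preserves the factorization: assuming Q has orthonormal columns (as produced by explicit_Q()) and $R = U \Sigma V^*$ with $U$ unitary, replacing Q by $Q U$ and R by $\Sigma V^*$ leaves the product $QR$ unchanged and keeps the columns of Q orthonormal.

$$
Q R = Q \,(U \Sigma V^*) = (Q U)\,(\Sigma V^*),
\qquad
(Q U)^* (Q U) = U^* (Q^* Q)\, U = U^* U = I .
$$

Row $i$ of $\Sigma V^*$ is $\sigma_i$ times row $i$ of $V^*$, so if only the first $r$ singular values exceed the tolerance, the trailing $\mathrm{ncols} - r$ rows of the new R are negligible, and the first $r$ columns of $Q U$ carry the numerically significant part of the original matrix A.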