rankMatrix.RdCompute ‘the’ matrix rank, a well-defined functional in theory(*), somewhat ambiguous in practice. We provide several methods, the default corresponding to Matlab's definition.
(*) The rank of a \(n \times m\) matrix \(A\), \(rk(A)\), is the maximal number of linearly independent columns (or rows); hence \(rk(A) \le min(n,m)\).
numeric matrix, of dimension \(n \times m\), say.
nonnegative number specifying a (relative,
“scalefree”) tolerance for testing of
“practically zero” with specific meaning depending on
method; by default, max(dim(x)) * .Machine$double.eps
is according to Matlab's default (for its only method which is our
method="tolNorm2").
a character string specifying the computational method for the rank, can be abbreviated:
"tolNorm2":the number of singular values
>= tol * max(sval);
"qrLINPACK":for a dense matrix, this is the rank of
qr(x, tol, LAPACK=FALSE) (which is
qr(...)$rank);
This ("qr*", dense) version used to be the recommended way to
compute a matrix rank for a while in the past.
For sparse x, this is equivalent to "qr.R".
"qr.R":this is the rank of triangular matrix
\(R\), where qr() uses LAPACK or a "sparseQR" method
(see qr-methods) to compute the decomposition
\(QR\). The rank of \(R\) is then defined as the number of
“non-zero” diagonal entries \(d_i\) of \(R\), and
“non-zero”s fulfill
\(|d_i| \ge \mathrm{tol}\cdot\max(|d_i|)\).
"qr":is for back compatibility; for dense x,
it corresponds to "qrLINPACK", whereas for sparse
x, it uses "qr.R".
For all the "qr*" methods, singular values sval are not
used, which may be crucially important for a large sparse matrix
x, as in that case, when sval is not specified,
the default, computing svd() currently coerces
x to a dense matrix.
"useGrad":considering the “gradient” of the (decreasing) singular values, the index of the smallest gap.
"maybeGrad":choosing method "useGrad" only when
that seems reasonable; otherwise using "tolNorm2".
numeric vector of non-increasing singular values of
x; typically unspecified and computed from x when
needed, i.e., unless method = "qr".
logical indicating if rankMatrix() should warn
when it needs t(x) instead of x. Currently, for
method = "qr" only, gives a warning by default because the
caller often could have passed t(x) directly, more efficiently.
in the \(QR\) cases (i.e., if method starts with
"qr"), rankMatrix() calls
qr2rankMarix(.., do.warn = warn.qr), see below.
an R object resulting from qr(x,..), i.e.,
typically inheriting from class "qr" or
"sparseQR".
logical indicating if qr is resulting
from base qr(). (Otherwise, it is typically
from Matrix package sparse qr.)
logical; if true, warn about non-finite diagonal entries in the \(R\) matrix of the \(QR\) decomposition. Do not change lightly!
qr2rankMatrix() is typically called from rankMatrix() for
the "qr"* methods, but can be used directly - much more
efficiently in case the qr-decomposition is available anyway.
For large sparse matrices x, unless you can specify
sval yourself, currently method = "qr" may
be the only feasible one, as the others need sval and call
svd() which currently coerces x to a
denseMatrix which may be very slow or impossible,
depending on the matrix dimensions.
Note that in the case of sparse x, method = "qr", all
non-strictly zero diagonal entries \(d_i\) where counted, up to
including Matrix version 1.1-0, i.e., that method implicitly
used tol = 0, see also the set.seed(42) example below.
If x is a matrix of all 0 (or of zero dimension), the rank
is zero; otherwise, typically a positive integer in 1:min(dim(x))
with attributes detailing the method used.
There are rare cases where the sparse \(QR\) decomposition
“fails” in so far as the diagonal entries of \(R\), the
\(d_i\) (see above), end with non-finite, typically NaN
entries. Then, a warning is signalled (unless warn.qr /
do.warn is not true) and NA (specifically,
NA_integer_) is returned.
rankMatrix(cbind(1, 0, 1:3)) # 2
#> [1] 2
#> attr(,"method")
#> [1] "tolNorm2"
#> attr(,"useGrad")
#> [1] FALSE
#> attr(,"tol")
#> [1] 6.661338e-16
(meths <- eval(formals(rankMatrix)$method))
#> [1] "tolNorm2" "qr.R" "qrLINPACK" "qr" "useGrad" "maybeGrad"
## a "border" case:
H12 <- Hilbert(12)
rankMatrix(H12, tol = 1e-20) # 12; but 11 with default method & tol.
#> [1] 12
#> attr(,"method")
#> [1] "tolNorm2"
#> attr(,"useGrad")
#> [1] FALSE
#> attr(,"tol")
#> [1] 1e-20
sapply(meths, function(.m.) rankMatrix(H12, method = .m.))
#> tolNorm2 qr.R qrLINPACK qr useGrad maybeGrad
#> 11 11 12 12 11 11
## tolNorm2 qr.R qrLINPACK qr useGrad maybeGrad
## 11 11 12 12 11 11
## The meaning of 'tol' for method="qrLINPACK" and *dense* x is not entirely "scale free"
rMQL <- function(ex, M) rankMatrix(M, method="qrLINPACK",tol = 10^-ex)
rMQR <- function(ex, M) rankMatrix(M, method="qr.R", tol = 10^-ex)
sapply(5:15, rMQL, M = H12) # result is platform dependent
#> [1] 7 7 8 9 10 11 11 11 12 12 12
## 7 7 8 10 10 11 11 11 12 12 12 {x86_64}
sapply(5:15, rMQL, M = 1000 * H12) # not identical unfortunately
#> [1] 7 7 8 10 11 11 12 12 12 12 12
## 7 7 8 10 11 11 12 12 12 12 12
sapply(5:15, rMQR, M = H12)
#> [1] 5 6 7 8 8 9 9 10 10 11 11
## 5 6 7 8 8 9 9 10 10 11 11
sapply(5:15, rMQR, M = 1000 * H12) # the *same*
#> [1] 5 6 7 8 8 9 9 10 10 11 11
## "sparse" case:
M15 <- kronecker(diag(x=c(100,1,10)), Hilbert(5))
sapply(meths, function(.m.) rankMatrix(M15, method = .m.))
#> tolNorm2 qr.R qrLINPACK qr useGrad maybeGrad
#> 15 15 15 15 14 15
#--> all 15, but 'useGrad' has 14.
sapply(meths, function(.m.) rankMatrix(M15, method = .m., tol = 1e-7)) # all 14
#> tolNorm2 qr.R qrLINPACK qr useGrad maybeGrad
#> 14 14 15 15 14 14
## "large" sparse
n <- 250000; p <- 33; nnz <- 10000
L <- sparseMatrix(i = sample.int(n, nnz, replace=TRUE),
j = sample.int(p, nnz, replace=TRUE),
x = rnorm(nnz))
(st1 <- system.time(r1 <- rankMatrix(L))) # warning+ ~1.5 sec (2013)
#> Warning: rankMatrix(<large sparse Matrix>, method = 'tolNorm2') coerces to dense matrix.
#> Probably should rather use method = 'qr' !?
#> user system elapsed
#> 0.828 0.286 0.836
(st2 <- system.time(r2 <- rankMatrix(L, method = "qr"))) # considerably faster!
#> user system elapsed
#> 0.008 0.000 0.008
r1[[1]] == print(r2[[1]]) ## --> ( 33 TRUE )
#> [1] 33
#> [1] TRUE
## another sparse-"qr" one, which ``failed'' till 2013-11-23:
set.seed(42)
f1 <- factor(sample(50, 1000, replace=TRUE))
f2 <- factor(sample(50, 1000, replace=TRUE))
f3 <- factor(sample(50, 1000, replace=TRUE))
D <- t(do.call(rbind, lapply(list(f1,f2,f3), as, 'sparseMatrix')))
dim(D); nnzero(D) ## 1000 x 150 // 3000 non-zeros (= 2%)
#> [1] 1000 150
#> [1] 3000
stopifnot(rankMatrix(D, method='qr') == 148,
rankMatrix(crossprod(D),method='qr') == 148)
## zero matrix has rank 0 :
stopifnot(sapply(meths, function(.m.)
rankMatrix(matrix(0, 2, 2), method = .m.)) == 0)