rankMatrix.Rd
Compute ‘the’ matrix rank, a well-defined functional in theory(*), somewhat ambiguous in practice. We provide several methods, the default corresponding to Matlab's definition.
(*) The rank of a \(n \times m\) matrix \(A\), \(rk(A)\), is the maximal number of linearly independent columns (or rows); hence \(rk(A) \le min(n,m)\).
numeric matrix, of dimension \(n \times m\), say.
nonnegative number specifying a (relative,
“scalefree”) tolerance for testing of
“practically zero” with specific meaning depending on
method
; by default, max(dim(x)) * .Machine$double.eps
is according to Matlab's default (for its only method which is our
method="tolNorm2"
).
a character string specifying the computational method for the rank, can be abbreviated:
"tolNorm2"
:the number of singular values
>= tol * max(sval)
;
"qrLINPACK"
:for a dense matrix, this is the rank of
qr(x, tol, LAPACK=FALSE)
(which is
qr(...)$rank
);
This ("qr*", dense) version used to be the recommended way to
compute a matrix rank for a while in the past.
For sparse x
, this is equivalent to "qr.R"
.
"qr.R"
:this is the rank of triangular matrix
\(R\), where qr()
uses LAPACK or a "sparseQR" method
(see qr-methods
) to compute the decomposition
\(QR\). The rank of \(R\) is then defined as the number of
“non-zero” diagonal entries \(d_i\) of \(R\), and
“non-zero”s fulfill
\(|d_i| \ge \mathrm{tol}\cdot\max(|d_i|)\).
"qr"
:is for back compatibility; for dense x
,
it corresponds to "qrLINPACK"
, whereas for sparse
x
, it uses "qr.R"
.
For all the "qr*" methods, singular values sval
are not
used, which may be crucially important for a large sparse matrix
x
, as in that case, when sval
is not specified,
the default, computing svd()
currently coerces
x
to a dense matrix.
"useGrad"
:considering the “gradient” of the (decreasing) singular values, the index of the smallest gap.
"maybeGrad"
:choosing method "useGrad"
only when
that seems reasonable; otherwise using "tolNorm2"
.
numeric vector of non-increasing singular values of
x
; typically unspecified and computed from x
when
needed, i.e., unless method = "qr"
.
logical indicating if rankMatrix()
should warn
when it needs t(x)
instead of x
. Currently, for
method = "qr"
only, gives a warning by default because the
caller often could have passed t(x)
directly, more efficiently.
in the \(QR\) cases (i.e., if method
starts with
"qr"
), rankMatrix()
calls
qr2rankMarix(.., do.warn = warn.qr)
, see below.
an R object resulting from qr(x,..)
, i.e.,
typically inheriting from class
"qr"
or
"sparseQR"
.
logical
indicating if qr
is resulting
from base qr()
. (Otherwise, it is typically
from Matrix package sparse qr
.)
logical; if true, warn about non-finite diagonal entries in the \(R\) matrix of the \(QR\) decomposition. Do not change lightly!
qr2rankMatrix()
is typically called from rankMatrix()
for
the "qr"
* method
s, but can be used directly - much more
efficiently in case the qr
-decomposition is available anyway.
For large sparse matrices x
, unless you can specify
sval
yourself, currently method = "qr"
may
be the only feasible one, as the others need sval
and call
svd()
which currently coerces x
to a
denseMatrix
which may be very slow or impossible,
depending on the matrix dimensions.
Note that in the case of sparse x
, method = "qr"
, all
non-strictly zero diagonal entries \(d_i\) where counted, up to
including Matrix version 1.1-0, i.e., that method implicitly
used tol = 0
, see also the set.seed(42)
example below.
If x
is a matrix of all 0
(or of zero dimension), the rank
is zero; otherwise, typically a positive integer in 1:min(dim(x))
with attributes detailing the method used.
There are rare cases where the sparse \(QR\) decomposition
“fails” in so far as the diagonal entries of \(R\), the
\(d_i\) (see above), end with non-finite, typically NaN
entries. Then, a warning is signalled (unless warn.qr
/
do.warn
is not true) and NA
(specifically,
NA_integer_
) is returned.
rankMatrix(cbind(1, 0, 1:3)) # 2
#> [1] 2
#> attr(,"method")
#> [1] "tolNorm2"
#> attr(,"useGrad")
#> [1] FALSE
#> attr(,"tol")
#> [1] 6.661338e-16
(meths <- eval(formals(rankMatrix)$method))
#> [1] "tolNorm2" "qr.R" "qrLINPACK" "qr" "useGrad" "maybeGrad"
## a "border" case:
H12 <- Hilbert(12)
rankMatrix(H12, tol = 1e-20) # 12; but 11 with default method & tol.
#> [1] 12
#> attr(,"method")
#> [1] "tolNorm2"
#> attr(,"useGrad")
#> [1] FALSE
#> attr(,"tol")
#> [1] 1e-20
sapply(meths, function(.m.) rankMatrix(H12, method = .m.))
#> tolNorm2 qr.R qrLINPACK qr useGrad maybeGrad
#> 11 11 12 12 11 11
## tolNorm2 qr.R qrLINPACK qr useGrad maybeGrad
## 11 11 12 12 11 11
## The meaning of 'tol' for method="qrLINPACK" and *dense* x is not entirely "scale free"
rMQL <- function(ex, M) rankMatrix(M, method="qrLINPACK",tol = 10^-ex)
rMQR <- function(ex, M) rankMatrix(M, method="qr.R", tol = 10^-ex)
sapply(5:15, rMQL, M = H12) # result is platform dependent
#> [1] 7 7 8 9 10 11 11 11 12 12 12
## 7 7 8 10 10 11 11 11 12 12 12 {x86_64}
sapply(5:15, rMQL, M = 1000 * H12) # not identical unfortunately
#> [1] 7 7 8 10 11 11 12 12 12 12 12
## 7 7 8 10 11 11 12 12 12 12 12
sapply(5:15, rMQR, M = H12)
#> [1] 5 6 7 8 8 9 9 10 10 11 11
## 5 6 7 8 8 9 9 10 10 11 11
sapply(5:15, rMQR, M = 1000 * H12) # the *same*
#> [1] 5 6 7 8 8 9 9 10 10 11 11
## "sparse" case:
M15 <- kronecker(diag(x=c(100,1,10)), Hilbert(5))
sapply(meths, function(.m.) rankMatrix(M15, method = .m.))
#> tolNorm2 qr.R qrLINPACK qr useGrad maybeGrad
#> 15 15 15 15 14 15
#--> all 15, but 'useGrad' has 14.
sapply(meths, function(.m.) rankMatrix(M15, method = .m., tol = 1e-7)) # all 14
#> tolNorm2 qr.R qrLINPACK qr useGrad maybeGrad
#> 14 14 15 15 14 14
## "large" sparse
n <- 250000; p <- 33; nnz <- 10000
L <- sparseMatrix(i = sample.int(n, nnz, replace=TRUE),
j = sample.int(p, nnz, replace=TRUE),
x = rnorm(nnz))
(st1 <- system.time(r1 <- rankMatrix(L))) # warning+ ~1.5 sec (2013)
#> Warning: rankMatrix(<large sparse Matrix>, method = 'tolNorm2') coerces to dense matrix.
#> Probably should rather use method = 'qr' !?
#> user system elapsed
#> 0.862 0.269 0.994
(st2 <- system.time(r2 <- rankMatrix(L, method = "qr"))) # considerably faster!
#> user system elapsed
#> 0.010 0.000 0.011
r1[[1]] == print(r2[[1]]) ## --> ( 33 TRUE )
#> [1] 33
#> [1] TRUE
## another sparse-"qr" one, which ``failed'' till 2013-11-23:
set.seed(42)
f1 <- factor(sample(50, 1000, replace=TRUE))
f2 <- factor(sample(50, 1000, replace=TRUE))
f3 <- factor(sample(50, 1000, replace=TRUE))
D <- t(do.call(rbind, lapply(list(f1,f2,f3), as, 'sparseMatrix')))
dim(D); nnzero(D) ## 1000 x 150 // 3000 non-zeros (= 2%)
#> [1] 1000 150
#> [1] 3000
stopifnot(rankMatrix(D, method='qr') == 148,
rankMatrix(crossprod(D),method='qr') == 148)
## zero matrix has rank 0 :
stopifnot(sapply(meths, function(.m.)
rankMatrix(matrix(0, 2, 2), method = .m.)) == 0)