fastLm.RdfastLm estimates the linear model using the solve
function of Armadillo linear algebra library.
fastLmPure(X, y)
fastLm(X, ...)
# Default S3 method
fastLm(X, y, ...)
# S3 method for class 'formula'
fastLm(formula, data = list(), ...)Linear models should be estimated using the lm function. In
some cases, lm.fit may be appropriate.
The fastLmPure function provides a reference use case of the Armadillo
library via the wrapper functions in the RcppArmadillo package.
The fastLm function provides a more standard implementation of
a linear model fit, offering both a default and a formula interface as
well as print, summary and predict methods.
Lastly, one must be be careful in timing comparisons of
lm and friends versus this approach based on
Armadillo. The reason that Armadillo can do something
like lm.fit faster than the functions in the stats
package is because Armadillo can use different solvers, including
fast / approximative ones. Older versions of Armadillo could therefore
either fail or, worse, produce completely incorrect answers
on rank-deficient model matrices whereas the functions from the stats
package will handle them properly due to the modified Linpack code.
Newer Armadillo version pivot (with warning) to an approximate solutions.
This behavior can be controlled with options to the solve function,
see the Armadillo documentation.
An example of the type of situation requiring extra care in checking for rank deficiency is a two-way layout with missing cells (see the examples section). These cases require a special pivoting scheme of “pivot only on (apparent) rank deficiency” which is not part of conventional linear algebra software.
fastLmPure returns a list with three components:
a vector of coefficients
a vector of the (estimated) standard errors of the coefficient estimates
a scalar denoting the degrees of freedom in the model
fastLm returns a richer object which also includes the
residuals, fitted values and call argument similar to the lm or
rlm functions.
Armadillo project: https://arma.sourceforge.net/
data(trees, package="datasets")
## bare-bones direct interface
flm <- fastLmPure( cbind(1, log(trees$Girth)), log(trees$Volume) )
print(flm)
#> $coefficients
#> [1] -2.353325 2.199970
#>
#> $stderr
#> [1] 0.23066284 0.08983455
#>
#> $df.residual
#> [1] 29
#>
## standard R interface for formula or data returning object of class fastLm
flmmod <- fastLm( log(Volume) ~ log(Girth), data=trees)
summary(flmmod)
#>
#> Call:
#> fastLm.formula(formula = log(Volume) ~ log(Girth), data = trees)
#>
#> Residuals:
#> Min. 1st Qu. Median 3rd Qu. Max.
#> -0.2060000 -0.0687020 0.0010105 0.0725850 0.2479600
#>
#> Estimate StdErr t.value p.value
#> (Intercept) -2.353325 0.230663 -10.202 4.18e-11 ***
#> log(Girth) 2.199970 0.089835 24.489 < 2.2e-16 ***
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Residual standard error: 0.115 on 29 degrees of freedom
#> Multiple R-squared: 0.9539, Adjusted R-squared: 0.9523
## case where fastLm breaks down
dd <- data.frame(f1 = gl(4, 6, labels = LETTERS[1:4]),
f2 = gl(3, 2, labels = letters[1:3]))[-(7:8), ]
xtabs(~ f2 + f1, dd) # one missing cell
#> f1
#> f2 A B C D
#> a 2 0 2 2
#> b 2 2 2 2
#> c 2 2 2 2
mm <- model.matrix(~ f1 * f2, dd)
kappa(mm) # large, indicating rank deficiency
#> [1] 1.98556e+17
set.seed(1)
dd$y <- mm %*% seq_len(ncol(mm)) + rnorm(nrow(mm), sd = 0.1)
summary(lm(y ~ f1 * f2, dd)) # detects rank deficiency
#>
#> Call:
#> lm(formula = y ~ f1 * f2, data = dd)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -0.12155 -0.04702 0.00000 0.04702 0.12155
#>
#> Coefficients: (1 not defined because of singularities)
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 0.97786 0.05816 16.81 3.41e-09 ***
#> f1B 12.03807 0.08226 146.35 < 2e-16 ***
#> f1C 3.11722 0.08226 37.90 5.22e-13 ***
#> f1D 4.06852 0.08226 49.46 2.83e-14 ***
#> f2b 5.06012 0.08226 61.52 2.59e-15 ***
#> f2c 5.99759 0.08226 72.91 4.01e-16 ***
#> f1B:f2b -3.01476 0.11633 -25.92 3.27e-11 ***
#> f1C:f2b 7.70300 0.11633 66.22 1.16e-15 ***
#> f1D:f2b 8.96425 0.11633 77.06 < 2e-16 ***
#> f1B:f2c NA NA NA NA
#> f1C:f2c 10.96133 0.11633 94.23 < 2e-16 ***
#> f1D:f2c 12.04108 0.11633 103.51 < 2e-16 ***
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Residual standard error: 0.08226 on 11 degrees of freedom
#> Multiple R-squared: 0.9999, Adjusted R-squared: 0.9999
#> F-statistic: 1.855e+04 on 10 and 11 DF, p-value: < 2.2e-16
#>
summary(fastLm(y ~ f1 * f2, dd)) # fits all coefficients via approx solution
#>
#> Call:
#> fastLm.formula(formula = y ~ f1 * f2, data = dd)
#>
#> Residuals:
#> Min. 1st Qu. Median 3rd Qu. Max.
#> -1.2155e-01 -4.7016e-02 -7.1054e-15 4.7016e-02 1.2155e-01
#>
#> Estimate StdErr t.value p.value
#> (Intercept) 0.977859 0.061004 16.029 1.845e-08 ***
#> f1B 7.020458 0.040669 172.623 < 2.2e-16 ***
#> f1C 3.117222 0.086273 36.132 6.266e-12 ***
#> f1D 4.068523 0.086273 47.159 4.431e-13 ***
#> f2b 5.060123 0.086273 58.653 5.040e-14 ***
#> f2c 5.997592 0.086273 69.519 9.246e-15 ***
#> f1B:f2b 2.002847 0.064304 31.147 2.733e-11 ***
#> f1C:f2b 7.702999 0.122008 63.135 2.418e-14 ***
#> f1D:f2b 8.964251 0.122008 73.473 5.323e-15 ***
#> f1B:f2c 5.017610 0.064304 78.030 2.919e-15 ***
#> f1C:f2c 10.961326 0.122008 89.841 7.143e-16 ***
#> f1D:f2c 12.041081 0.122008 98.691 2.794e-16 ***
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Residual standard error: 0.08627 on 10 degrees of freedom
#> Multiple R-squared: 0.9999, Adjusted R-squared: 0.9999