This function prints the condition number of a matrix as its columns are added one by one. This is useful for diagnosing multicollinearity and other numerical problems. It is a generic function with a default method and a method for maxLik objects.

condiNumber(x, ...)
# Default S3 method
condiNumber(x, exact = FALSE, norm = FALSE,
   printLevel=print.level, print.level=1, digits = getOption( "digits" ), ... )
# S3 method for class 'maxLik'
condiNumber(x, ...)

Arguments

x

numeric matrix whose condition numbers are to be printed

exact

logical, whether the condition numbers should be computed exactly or approximated (see kappa)

norm

logical, whether the columns should be normalised to have unit norm

printLevel

numeric; a positive value prints the condition numbers during the calculations. Useful for interactive work.

print.level

same as ‘printLevel’, for backward compatibility

digits

minimal number of significant digits to print (only relevant if argument printLevel is larger than zero).

...

Further arguments to condiNumber.default are currently ignored; further arguments to condiNumber.maxLik are passed to condiNumber.default.
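
To illustrate what the norm argument is meant to address, the following base R sketch scales each column to unit length before computing a condition number. It is only an illustration, not the package's code: the helper unitNorm is hypothetical, and it assumes that "unit norm" means unit Euclidean (L2) length.

```r
# Hypothetical helper; assumes "unit norm" means unit Euclidean (L2) length
unitNorm <- function(x) apply(x, 2, function(v) v / sqrt(sum(v^2)))

set.seed(2)
## two unrelated columns measured on very different scales
X <- cbind(small = runif(50), large = 1e6 * runif(50))

kRaw  <- kappa(X, exact = TRUE)            # inflated by the scale difference alone
kNorm <- kappa(unitNorm(X), exact = TRUE)  # scale effect removed
print(c(raw = kRaw, normalised = kNorm))
```

In this sketch the raw condition number is large purely because of the difference in units, so normalising first may make jumps caused by genuine near-collinearity easier to spot.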

Details

Statistical models often fail because of a high correlation between the explanatory variables in the linear index (multicollinearity) or because the evaluated maximum of a non-linear model is virtually flat. In both cases, the (near) singularity of the related matrices may help to understand the problem.

condiNumber inspects the matrices column-by-column and indicates which variables lead to a jump in the condition number (cause singularity). If the matrix column name does not immediately indicate the problem, one may run an OLS regression of this column on all the previous columns. Those columns that explain almost all the variation in the current one will have very high \(t\)-values.
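
The column-by-column inspection described above can be approximated in base R with kappa(). The following is a minimal sketch under stated assumptions, not the package's implementation: the helper cumulativeKappa is hypothetical, and it uses exact = TRUE, whereas condiNumber defaults to the approximate value.

```r
# Hypothetical helper: condition number of the first i columns, i = 1, ..., ncol(x)
cumulativeKappa <- function(x, exact = TRUE) {
   cn <- vapply(seq_len(ncol(x)),
                function(i) kappa(x[, seq_len(i), drop = FALSE], exact = exact),
                numeric(1))
   names(cn) <- colnames(x)
   cn
}

set.seed(1)
a <- runif(50)
b <- runif(50)
## third column is virtually equal to a + b
X <- cbind(a = a, b = b, ab = a + b + 1e-6 * runif(50))

cn <- cumulativeKappa(X)
print(cn)   # the value jumps by several orders of magnitude at column 'ab'
```

The first entry is always 1, because a single non-zero column is perfectly conditioned; the column at which the value jumps is the one that is nearly a linear combination of those before it.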

Value

Invisible vector of condition numbers by column. If the start values for maxLik are named, the condition numbers are named accordingly.

References

Greene, W. (2012): Econometric Analysis, 7th edition, p. 130.

Author

Ott Toomet

See also

Examples

   set.seed(0)
   ## generate a simple nearly multicollinear dataset
   x1 <- runif(100)
   x2 <- runif(100)
   x3 <- x1 + x2 + 0.000001*runif(100) # this is virtually equal to x1 + x2
   x4 <- runif(100)
   y <- x1 + x2 + x3 + x4 + rnorm(100)
   m <- lm(y ~ -1 + x1 + x2 + x3 + x4)
   print(summary(m)) # note the outlandish estimates and standard errors
#> 
#> Call:
#> lm(formula = y ~ -1 + x1 + x2 + x3 + x4)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -3.01496 -0.70762 -0.02821  0.60782  2.39831 
#> 
#> Coefficients:
#>      Estimate Std. Error t value Pr(>|t|)
#> x1 -1.374e+05  3.762e+05  -0.365    0.716
#> x2 -1.374e+05  3.762e+05  -0.365    0.716
#> x3  1.374e+05  3.762e+05   0.365    0.716
#> x4  4.862e-01  3.204e-01   1.518    0.132
#> 
#> Residual standard error: 1.044 on 96 degrees of freedom
#> Multiple R-squared:  0.8808,	Adjusted R-squared:  0.8759 
#> F-statistic: 177.4 on 4 and 96 DF,  p-value: < 2.2e-16
#> 
                     # while R^2 is 0.88. This suggests multicollinearity
   condiNumber(model.matrix(m))   # note the value 'explodes' at x3
#> x1 	 1 
#> x2 	 3.413135 
#> x3 	 14095268 
#> x4 	 11680350 
   ## we may test the results further:
   print(summary(lm(x3 ~ -1 + x1 + x2)))
#> 
#> Call:
#> lm(formula = x3 ~ -1 + x1 + x2)
#> 
#> Residuals:
#>        Min         1Q     Median         3Q        Max 
#> -5.579e-07 -1.886e-07 -7.440e-09  2.539e-07  6.849e-07 
#> 
#> Coefficients:
#>     Estimate Std. Error  t value Pr(>|t|)    
#> x1 1.000e+00  8.418e-08 11879172   <2e-16 ***
#> x2 1.000e+00  8.480e-08 11792743   <2e-16 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Residual standard error: 3.014e-07 on 98 degrees of freedom
#> Multiple R-squared:      1,	Adjusted R-squared:      1 
#> F-statistic: 6.722e+14 on 2 and 98 DF,  p-value: < 2.2e-16
#> 
   # Note the extremely high t-values and R^2: x3 is (almost) completely
   # explained by x1 and x2