drop1.dr.Rd
This function implements backward elimination using a dr object for which a dr.coordinate.test is defined, currently for SIR, SAVE, IRE and PIRE.
dr.step(object, scope = NULL, d = NULL, minsize = 2, stop = 0, trace = 1, ...)

# S3 method for class 'dr'
drop1(object, scope = NULL, update = TRUE, test = "general", trace = 1, ...)
object: A dr object for which dr.coordinate.test is defined, for method equal to one of sir, save or ire.
scope: A one-sided formula specifying predictors that will never be removed.
d: To use conditional coordinate tests, specify the dimension of the central (mean) subspace. The default is NULL, meaning no conditioning. This is currently available only for method sir, for save without categorical predictors, or for ire with or without categorical predictors.
minsize: Minimum subset size; must be greater than or equal to 2.
stop: Stopping criterion: variables are removed until the p-value for the next variable to be removed is less than stop. The default is stop = 0.
update: If TRUE, the update method is used to return a dr object obtained from object by updating the formula to drop the variable with the largest p-value. This can significantly slow the computations for IRE but has little effect on SAVE and SIR.
test: Type of test to be used for selecting the next predictor to remove, for method = "save" only. "normal" assumes normally distributed predictors; "general" assumes elliptically contoured predictors (see the sketch after this argument list). For other methods, this argument is ignored.
trace: If positive (the default), print informative output at each step. If trace is 0 or FALSE, suppress all printing.
...: Additional arguments passed to dr.coordinate.test.
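For method = "save", the choice of test can change the p-values. A minimal sketch, assuming the ais data used in the examples below (the object m1 is hypothetical):

data(ais)
m1 <- dr(LBM ~ log(SSF) + log(Wt) + log(Hg) + log(Ht), data = ais,
         method = "save", nslices = 8)
drop1(m1, test = "normal")   # assumes normally distributed predictors
drop1(m1, test = "general")  # assumes only elliptically contoured predictors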
Suppose a dr object has \(p = a + b\) predictors, with \(a\) predictors specified in the scope statement. drop1 will compute either marginal coordinate tests (if d = NULL) or conditional marginal coordinate tests (if d is positive) for dropping each of the \(b\) predictors not in the scope, and return the p-values. The result is an object created from the original object with the predictor with the largest p-value removed. dr.step will call drop1.dr repeatedly until \(\max(a, d+1)\) predictors remain.

As a side effect, a data frame of labels, tests, df, and p-values is printed. If update = TRUE, a dr object is returned with the predictor with the largest p-value removed.
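For example, a minimal sketch (the fitted object m and the predictors x1 and x2 are hypothetical):

m2 <- drop1(m, d = 2)   # conditional coordinate tests; returns m refit
                        # without the predictor with the largest p-value
m3 <- dr.step(m, scope = ~ x1 + x2, stop = 0.05)
                        # never remove x1 or x2; stop once the largest
                        # remaining p-value falls below 0.05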
Cook, R. D. (2004). Testing predictor contributions in sufficient dimension reduction. Annals of Statistics, 32, 1062-1092.

Shao, Y., Cook, R. D. and Weisberg, S. (2007). Marginal tests with sliced average variance estimation. Biometrika, 94, 285-296.
data(ais)
# To make this identical to ARC, need to modify slices to match by
# using slice.info=dr.slices.arc() rather than nslices=8
summary(s1 <- dr(LBM~log(SSF)+log(Wt)+log(Hg)+log(Ht)+log(WCC)+log(RCC)+
log(Hc)+log(Ferr), data=ais,method="sir",
slice.method=dr.slices.arc,nslices=8))
#>
#> Call:
#> dr(formula = LBM ~ log(SSF) + log(Wt) + log(Hg) + log(Ht) + log(WCC) +
#> log(RCC) + log(Hc) + log(Ferr), data = ais, method = "sir",
#> slice.method = dr.slices.arc, nslices = 8)
#>
#> Method:
#> sir with 8 slices, n = 202.
#>
#> Slice Sizes:
#> 25 25 25 25 27 27 30 18
#>
#> Estimated Basis Vectors for Central Subspace:
#> Dir1 Dir2 Dir3 Dir4
#> log(SSF) 0.155356 0.045363 -0.08080 0.007174
#> log(Wt) -0.969123 0.006309 0.28789 0.249082
#> log(Hg) -0.157412 -0.456823 -0.00915 -0.045435
#> log(Ht) -0.054094 0.315217 -0.68876 -0.542777
#> log(WCC) 0.005472 0.007850 -0.01038 -0.061888
#> log(RCC) -0.006035 -0.419167 0.08569 0.566282
#> log(Hc) 0.094247 0.716934 -0.65463 -0.555732
#> log(Ferr) -0.003480 0.009819 0.01067 -0.088837
#>
#> Dir1 Dir2 Dir3 Dir4
#> Eigenvalues 0.9391 0.2220 0.09066 0.06427
#> R^2(OLS|dr) 0.9991 0.9991 0.99925 0.99926
#>
#> Large-sample Marginal Dimension Tests:
#> Stat df p.value
#> 0D vs >= 1D 269.35 56 0.0000000
#> 1D vs >= 2D 79.66 42 0.0004021
#> 2D vs >= 3D 34.82 30 0.2492051
#> 3D vs >= 4D 16.51 20 0.6847223
# The following will almost duplicate information in Table 5 of Cook (2004).
# Slight differences occur because a different approximation for the
# sum of independent chi-square(1) random variables is used:
ans1 <- drop1(s1)
#>
#> LBM ~ log(SSF) + log(Wt) + log(Hg) + log(Ht) + log(WCC) +
#> log(RCC) + log(Hc) + log(Ferr)
#> Statistic P.value
#> - log(WCC) 2.388834 8.674682e-01
#> - log(Hg) 5.202510 4.813751e-01
#> - log(Ht) 8.077548 1.994139e-01
#> - log(RCC) 9.770283 1.097139e-01
#> - log(Hc) 10.039536 9.936986e-02
#> - log(Ferr) 10.863385 7.296643e-02
#> - log(SSF) 25.322296 1.435986e-04
#> - log(Wt) 42.322135 4.193156e-08
ans2 <- drop1(s1,d=2)
#>
#> LBM ~ log(SSF) + log(Wt) + log(Hg) + log(Ht) + log(WCC) +
#> log(RCC) + log(Hc) + log(Ferr)
#> Statistic P.value
#> - log(WCC) 0.1084395 7.561248e-01
#> - log(Ferr) 0.7359519 3.635322e-01
#> - log(Ht) 1.9821144 1.198231e-01
#> - log(Hg) 4.4894002 1.647214e-02
#> - log(Hc) 6.3180635 4.150525e-03
#> - log(RCC) 6.8164660 2.865534e-03
#> - log(SSF) 15.6856307 4.656384e-06
#> - log(Wt) 31.7148776 5.643419e-11
ans3 <- drop1(s1,d=3)
#>
#> LBM ~ log(SSF) + log(Wt) + log(Hg) + log(Ht) + log(WCC) +
#> log(RCC) + log(Hc) + log(Ferr)
#> Statistic P.value
#> - log(WCC) 0.1638879 9.205514e-01
#> - log(Ferr) 0.8845944 6.144432e-01
#> - log(Hg) 4.4901630 7.304205e-02
#> - log(Ht) 5.1053927 5.054880e-02
#> - log(RCC) 6.9233683 1.698072e-02
#> - log(Hc) 8.6677890 5.943620e-03
#> - log(SSF) 20.9863952 3.462158e-06
#> - log(Wt) 35.3792122 5.595753e-10
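# Because update = TRUE by default, each of ans1, ans2 and ans3 is a dr
# object refit without the predictor with the largest p-value (here
# log(WCC)), so elimination can be continued by hand, e.g. drop1(ans1)
# (output not shown).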
# Remove predictors stepwise until we run out of variables to drop.
dr.step(s1,scope=~log(Wt)+log(Ht))
#>
#> LBM ~ log(SSF) + log(Wt) + log(Hg) + log(Ht) + log(WCC) +
#> log(RCC) + log(Hc) + log(Ferr)
#> Statistic P.value
#> - log(WCC) 2.388834 0.8674682270
#> - log(Hg) 5.202510 0.4813750749
#> - log(RCC) 9.770283 0.1097139021
#> - log(Hc) 10.039536 0.0993698643
#> - log(Ferr) 10.863385 0.0729664252
#> - log(SSF) 25.322296 0.0001435986
#>
#> LBM ~ log(SSF) + log(Wt) + log(Hg) + log(Ht) + log(RCC) +
#> log(Hc) + log(Ferr)
#> Statistic P.value
#> - log(Hg) 5.280848 4.727389e-01
#> - log(RCC) 9.680354 1.142247e-01
#> - log(Hc) 10.467980 8.542042e-02
#> - log(Ferr) 10.965572 7.080631e-02
#> - log(SSF) 27.341251 5.775252e-05
#>
#> LBM ~ log(SSF) + log(Wt) + log(Ht) + log(RCC) + log(Hc) +
#> log(Ferr)
#> Statistic P.value
#> - log(Hc) 9.498782 7.117135e-02
#> - log(Ferr) 10.728497 4.278951e-02
#> - log(RCC) 10.814724 4.126625e-02
#> - log(SSF) 31.988544 1.956933e-06
#>
#> LBM ~ log(SSF) + log(Wt) + log(Ht) + log(RCC) + log(Ferr)
#> Statistic P.value
#> - log(Ferr) 10.62658 2.198143e-02
#> - log(RCC) 19.05848 3.750310e-04
#> - log(SSF) 28.72390 2.821080e-06
#>
#> LBM ~ log(SSF) + log(Wt) + log(Ht) + log(RCC)
#> Statistic P.value
#> - log(RCC) 18.45772 1.542279e-04
#> - log(SSF) 30.14733 3.038586e-07
#>
#> LBM ~ log(SSF) + log(Wt) + log(Ht)
#> Statistic P.value
#> - log(SSF) 50.17375 8.198997e-13
#>
#> No more variables to remove
#>
#> dr(formula = LBM ~ log(Wt) + log(Ht), data = ais, method = "sir",
#> slice.method = dr.slices.arc, nslices = 8)
#> Estimated Basis Vectors for Central Subspace:
#> Dir1 Dir2
#> log(Wt) 0.7067240 -0.2467316
#> log(Ht) 0.7074893 0.9690838
#> Eigenvalues:
#> [1] 0.84018385 0.01589101
#>
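# A hedged sketch (not run): backward elimination driven by the stop
# criterion rather than a scope, halting once every remaining
# predictor's p-value falls below 0.05:
# s2 <- dr.step(s1, stop = 0.05)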