dr.slices.Rd
Divides a vector, or the columns of a matrix, into slices of approximately equal size.
dr.slices(y, nslices)
dr.slices.arc(y, nslices)
y: a vector of length \(n\) or an \(n \times p\) matrix
nslices: the number of slices, no larger than \(n\), or a vector of \(p\) numbers giving the number of slices in each direction. If \(y\) has \(p\) columns and nslices is a number, then the number of slices in each direction is the smallest integer greater than the \(p\)-th root of nslices.
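As a quick check of this arithmetic (plain R, not part of the function's interface):

p <- 2                   # number of columns of y
nslices <- 8
ceiling(nslices^(1/p))   # 3, the smallest integer greater than 8^(1/2) ~ 2.83,
                         # so each direction gets 3 slices, for up to 9 cells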
If \(y\) is an \(n\)-vector, order \(y\). The target number of observations per slice is \(m\), the integer part of \(n\)/nslices. Allocate the first \(m\) observations to slice 1. If there are duplicates in \(y\), keep adding observations to the first slice until the next value of \(y\) is not equal to the largest value already in the slice. Allocate the next \(m\) values to the next slice, and again check for ties. Continue until all values are allocated to a slice. This guarantees neither that exactly nslices slices will be produced nor that each slice will have the same number of observations. This method of choosing slices is invariant under rescaling, but not under multiplication by \(-1\), so the slices of \(y\) will not be the same as the slices of \(-y\). This function was rewritten for Version 2.0.4 of this package and no longer gives exactly the same results as the program Arc. To duplicate Arc, use the function dr.slices.arc, as illustrated in the example below.
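The following is a minimal sketch of this tie-respecting allocation. It is written for illustration only; the helper slice1D and its handling of any leftover observations are assumptions, not the package's internal code.

# Illustrative sketch of the tie-respecting allocation described above;
# not the package's internal implementation.
slice1D <- function(y, nslices) {
  ord <- order(y)                   # sort order of y
  ys  <- y[ord]
  n   <- length(y)
  m   <- floor(n / nslices)         # target observations per slice
  ind <- integer(n)                 # slice indicator, in sorted order
  slice <- 1
  i <- 1
  while (i <= n) {
    j <- min(i + m - 1, n)          # take the next m observations ...
    while (j < n && ys[j + 1] == ys[j]) j <- j + 1   # ... absorbing ties
    ind[i:j] <- slice
    slice <- slice + 1
    i <- j + 1
  }
  out <- integer(n)
  out[ord] <- ind                   # map back to the original order
  list(slice.indicator = out, nslices = max(ind),
       slice.sizes = as.vector(table(ind)))
}
# e.g. slice1D(ais$LBM, 8)$slice.sizes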
If \(y\) is a matrix with \(p\) columns, slice the first column as described above. Then, within each of the slices determined for the first column, slice based on the second column, so that each of the “cells” has approximately the same number of observations. Continue through all the columns. This method is not invariant under reordering of the columns, or under multiplication by \(-1\).
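For example, a usage sketch with a two-column response built from the ais data supplied with the package (the choice of columns here is arbitrary):

library(dr)
data(ais)
s2col <- dr.slices(cbind(ais$LBM, ais$RCC), 8)  # slice on LBM, then RCC within
s2col$nslices        # actual number of cells produced
s2col$slice.sizes    # observations per cell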
Returns a named list with three elements as follows:
slice.indicator: ordered indicator of the slice to which each observation belongs.
nslices: the actual number of slices produced, which may be smaller than the number requested.
slice.sizes: the number of observations in each slice.
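A quick check of the returned structure, comparing the two slicing functions (a usage sketch; both calls are as documented above, and the exact sizes depend on ties in the data):

info  <- dr.slices(ais$LBM, 8)        # rewritten algorithm
infoa <- dr.slices.arc(ais$LBM, 8)    # Arc-compatible algorithm
names(info)                           # slice.indicator, nslices, slice.sizes
sum(info$slice.sizes) == length(ais$LBM)   # every case lands in some slice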
R. D. Cook and S. Weisberg (1999), Applied Regression Including Computing and Graphics, New York: Wiley.
data(ais)
summary(s1 <- dr(LBM~log(SSF)+log(Wt)+log(Hg)+log(Ht)+log(WCC)+log(RCC)+
log(Hc)+log(Ferr), data=ais,method="sir",nslices=8))
#>
#> Call:
#> dr(formula = LBM ~ log(SSF) + log(Wt) + log(Hg) + log(Ht) + log(WCC) +
#> log(RCC) + log(Hc) + log(Ferr), data = ais, method = "sir",
#> nslices = 8)
#>
#> Method:
#> sir with 8 slices, n = 202.
#>
#> Slice Sizes:
#> 25 25 25 25 27 27 30 18
#>
#> Estimated Basis Vectors for Central Subspace:
#> Dir1 Dir2 Dir3 Dir4
#> log(SSF) 0.155356 0.045363 -0.08080 0.007174
#> log(Wt) -0.969123 0.006309 0.28789 0.249082
#> log(Hg) -0.157412 -0.456823 -0.00915 -0.045435
#> log(Ht) -0.054094 0.315217 -0.68876 -0.542777
#> log(WCC) 0.005472 0.007850 -0.01038 -0.061888
#> log(RCC) -0.006035 -0.419167 0.08569 0.566282
#> log(Hc) 0.094247 0.716934 -0.65463 -0.555732
#> log(Ferr) -0.003480 0.009819 0.01067 -0.088837
#>
#> Dir1 Dir2 Dir3 Dir4
#> Eigenvalues 0.9391 0.2220 0.09066 0.06427
#> R^2(OLS|dr) 0.9991 0.9991 0.99925 0.99926
#>
#> Large-sample Marginal Dimension Tests:
#> Stat df p.value
#> 0D vs >= 1D 269.35 56 0.0000000
#> 1D vs >= 2D 79.66 42 0.0004021
#> 2D vs >= 3D 34.82 30 0.2492051
#> 3D vs >= 4D 16.51 20 0.6847223
# To make this identical to Arc, we need to modify the slices to match.
summary(s2 <- update(s1,slice.info=dr.slices.arc(ais$LBM,8)))
#>
#> Call:
#> dr(formula = LBM ~ log(SSF) + log(Wt) + log(Hg) + log(Ht) + log(WCC) +
#> log(RCC) + log(Hc) + log(Ferr), data = ais, method = "sir",
#> nslices = 8, slice.info = dr.slices.arc(ais$LBM, 8))
#>
#> Method:
#> sir with 8 slices, n = 202.
#>
#> Slice Sizes:
#> 25 25 25 25 27 27 30 18
#>
#> Estimated Basis Vectors for Central Subspace:
#> Dir1 Dir2 Dir3 Dir4
#> log(SSF) 0.155356 0.045363 -0.08080 0.007174
#> log(Wt) -0.969123 0.006309 0.28789 0.249082
#> log(Hg) -0.157412 -0.456823 -0.00915 -0.045435
#> log(Ht) -0.054094 0.315217 -0.68876 -0.542777
#> log(WCC) 0.005472 0.007850 -0.01038 -0.061888
#> log(RCC) -0.006035 -0.419167 0.08569 0.566282
#> log(Hc) 0.094247 0.716934 -0.65463 -0.555732
#> log(Ferr) -0.003480 0.009819 0.01067 -0.088837
#>
#> Dir1 Dir2 Dir3 Dir4
#> Eigenvalues 0.9391 0.2220 0.09066 0.06427
#> R^2(OLS|dr) 0.9991 0.9991 0.99925 0.99926
#>
#> Large-sample Marginal Dimension Tests:
#> Stat df p.value
#> 0D vs >= 1D 269.35 56 0.0000000
#> 1D vs >= 2D 79.66 42 0.0004021
#> 2D vs >= 3D 34.82 30 0.2492051
#> 3D vs >= 4D 16.51 20 0.6847223