This data consists of 150 randomly selected persons from a survey with information on over 2000 elderly US citizens, where the response, indicates participation in the U.S. Food Stamp Program.

data(foodstamp, package="robustbase")

Format

A data frame with 150 observations on the following 4 variables.

participation

participation in U.S. Food Stamp Program; yes = 1, no = 0

tenancy

tenancy, indicating home ownership; yes = 1, no = 0

suppl.income

supplemental income, indicating whether some form of supplemental security income is received; yes = 1, no = 0

income

monthly income (in US dollars)

Source

Data description and first analysis: Stefanski et al.(1986) who indicate Rizek(1978) as original source of the larger study.

Electronic version from CRAN package catdata.

References

Rizek, R. L. (1978) The 1977-78 Nationwide Food Consumption Survey. Family Econ. Rev., Fall, 3–7.

Stefanski, L. A., Carroll, R. J. and Ruppert, D. (1986) Optimally bounded score functions for generalized linear models with applications to logistic regression. Biometrika 73, 413–424.

Künsch, H. R., Stefanski, L. A., Carroll, R. J. (1989) Conditionally unbiased bounded-influence estimation in general regression models, with applications to generalized linear models. J. American Statistical Association 84, 460–466.

Examples

data(foodstamp)

(T123 <- xtabs(~ participation+ tenancy+ suppl.income, data=foodstamp))
#> , , suppl.income = 0
#> 
#>              tenancy
#> participation  0  1
#>             0 22 75
#>             1  9  3
#> 
#> , , suppl.income = 1
#> 
#>              tenancy
#> participation  0  1
#>             0 11 18
#>             1  9  3
#> 
summary(T123) ## ==> the binary var's are clearly not independent
#> Call: xtabs(formula = ~participation + tenancy + suppl.income, data = foodstamp)
#> Number of cases in table: 150 
#> Number of factors: 3 
#> Test for independence of all factors:
#> 	Chisq = 36.06, df = 4, p-value = 2.818e-07
#> 	Chi-squared approximation may be incorrect

foodSt <- within(foodstamp, {
   logInc <- log(1 + income)
   rm(income)
})

m1 <- glm(participation ~ ., family=binomial, data=foodSt)
summary(m1)
#> 
#> Call:
#> glm(formula = participation ~ ., family = binomial, data = foodSt)
#> 
#> Coefficients:
#>              Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)    0.9264     1.6229   0.571  0.56813    
#> tenancy       -1.8502     0.5347  -3.460  0.00054 ***
#> suppl.income   0.8961     0.5009   1.789  0.07365 .  
#> logInc        -0.3328     0.2729  -1.219  0.22280    
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 131.9  on 149  degrees of freedom
#> Residual deviance: 106.4  on 146  degrees of freedom
#> AIC: 114.4
#> 
#> Number of Fisher Scoring iterations: 5
#> 
rm1 <- glmrob(participation ~ ., family=binomial, data=foodSt)
summary(rm1)
#> 
#> Call:  glmrob(formula = participation ~ ., family = binomial, data = foodSt) 
#> 
#> 
#> Coefficients:
#>              Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)    0.5894     1.6237   0.363 0.716583    
#> tenancy       -1.7896     0.5377  -3.328 0.000875 ***
#> suppl.income   0.8167     0.5166   1.581 0.113851    
#> logInc        -0.2669     0.2736  -0.975 0.329355    
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> Robustness weights w.r * w.x: 
#>  135 weights are ~= 1. The remaining 15 ones are summarized as
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>  0.2865  0.5259  0.7527  0.6774  0.8876  0.9052 
#> 
#> Number of observations: 150 
#> Fitted by method ‘Mqle’  (in 10 iterations)
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#> No deviance values available 
#> Algorithmic parameters: 
#>    acc    tcc 
#> 0.0001 1.3450 
#> maxit 
#>    50 
#> test.acc 
#>   "coef" 
#> 
## Now use robust weights.on.x :
rm2 <- glmrob(participation ~ ., family=binomial, data=foodSt,
              weights.on.x = "robCov")
summary(rm2)## aha, now the weights are different:
#> 
#> Call:  glmrob(formula = participation ~ ., family = binomial, data = foodSt,      weights.on.x = "robCov") 
#> 
#> 
#> Coefficients:
#>              Estimate Std. Error z value Pr(>|z|)   
#> (Intercept)    8.1767     3.5418   2.309  0.02096 * 
#> tenancy       -1.6313     0.6169  -2.644  0.00818 **
#> suppl.income   0.7805     0.5925   1.317  0.18774   
#> logInc        -1.6107     0.6181  -2.606  0.00916 **
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> Robustness weights w.r * w.x: 
#>  55 weights are ~= 1. The remaining 95 ones are summarized as
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>  0.0011  0.2794  0.4176  0.4005  0.5043  0.9027 
#> 
#> Number of observations: 150 
#> Fitted by method ‘Mqle’  (in 8 iterations)
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#> No deviance values available 
#> Algorithmic parameters: 
#>    acc    tcc 
#> 0.0001 1.3450 
#> maxit 
#>    50 
#> test.acc 
#>   "coef" 
#> 
which( weights(rm2, type="robust") < 0.5)
#>  [1]   3   5   8  10  16  17  18  22  24  26  27  29  30  33  34  36  39  40  44
#> [20]  52  54  57  58  60  61  62  64  66  68  70  71  72  73  76  77  78  79  82
#> [39]  83  85  89  91  94  95  96  97  99 101 102 103 105 107 108 109 110 115 120
#> [58] 125 134 137 140 141 142 143 144 146 147 149