foodstamp.RdThis data consists of 150 randomly selected persons from a survey with information on over 2000 elderly US citizens, where the response, indicates participation in the U.S. Food Stamp Program.
data(foodstamp, package="robustbase")A data frame with 150 observations on the following 4 variables.
participationparticipation in U.S. Food Stamp Program; yes = 1, no = 0
tenancytenancy, indicating home ownership; yes = 1, no = 0
suppl.incomesupplemental income, indicating whether some form of supplemental security income is received; yes = 1, no = 0
incomemonthly income (in US dollars)
Data description and first analysis: Stefanski et al.(1986) who indicate Rizek(1978) as original source of the larger study.
Electronic version from CRAN package catdata.
Rizek, R. L. (1978) The 1977-78 Nationwide Food Consumption Survey. Family Econ. Rev., Fall, 3–7.
Stefanski, L. A., Carroll, R. J. and Ruppert, D. (1986) Optimally bounded score functions for generalized linear models with applications to logistic regression. Biometrika 73, 413–424.
Künsch, H. R., Stefanski, L. A., Carroll, R. J. (1989) Conditionally unbiased bounded-influence estimation in general regression models, with applications to generalized linear models. J. American Statistical Association 84, 460–466.
data(foodstamp)
(T123 <- xtabs(~ participation+ tenancy+ suppl.income, data=foodstamp))
#> , , suppl.income = 0
#>
#> tenancy
#> participation 0 1
#> 0 22 75
#> 1 9 3
#>
#> , , suppl.income = 1
#>
#> tenancy
#> participation 0 1
#> 0 11 18
#> 1 9 3
#>
summary(T123) ## ==> the binary var's are clearly not independent
#> Call: xtabs(formula = ~participation + tenancy + suppl.income, data = foodstamp)
#> Number of cases in table: 150
#> Number of factors: 3
#> Test for independence of all factors:
#> Chisq = 36.06, df = 4, p-value = 2.818e-07
#> Chi-squared approximation may be incorrect
foodSt <- within(foodstamp, {
logInc <- log(1 + income)
rm(income)
})
m1 <- glm(participation ~ ., family=binomial, data=foodSt)
summary(m1)
#>
#> Call:
#> glm(formula = participation ~ ., family = binomial, data = foodSt)
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) 0.9264 1.6229 0.571 0.56813
#> tenancy -1.8502 0.5347 -3.460 0.00054 ***
#> suppl.income 0.8961 0.5009 1.789 0.07365 .
#> logInc -0.3328 0.2729 -1.219 0.22280
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 131.9 on 149 degrees of freedom
#> Residual deviance: 106.4 on 146 degrees of freedom
#> AIC: 114.4
#>
#> Number of Fisher Scoring iterations: 5
#>
rm1 <- glmrob(participation ~ ., family=binomial, data=foodSt)
summary(rm1)
#>
#> Call: glmrob(formula = participation ~ ., family = binomial, data = foodSt)
#>
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) 0.5894 1.6237 0.363 0.716583
#> tenancy -1.7896 0.5377 -3.328 0.000875 ***
#> suppl.income 0.8167 0.5166 1.581 0.113851
#> logInc -0.2669 0.2736 -0.975 0.329355
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> Robustness weights w.r * w.x:
#> 135 weights are ~= 1. The remaining 15 ones are summarized as
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 0.2865 0.5259 0.7527 0.6774 0.8876 0.9052
#>
#> Number of observations: 150
#> Fitted by method ‘Mqle’ (in 10 iterations)
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> No deviance values available
#> Algorithmic parameters:
#> acc tcc
#> 0.0001 1.3450
#> maxit
#> 50
#> test.acc
#> "coef"
#>
## Now use robust weights.on.x :
rm2 <- glmrob(participation ~ ., family=binomial, data=foodSt,
weights.on.x = "robCov")
summary(rm2)## aha, now the weights are different:
#>
#> Call: glmrob(formula = participation ~ ., family = binomial, data = foodSt, weights.on.x = "robCov")
#>
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) 8.1767 3.5418 2.309 0.02096 *
#> tenancy -1.6313 0.6169 -2.644 0.00818 **
#> suppl.income 0.7805 0.5925 1.317 0.18774
#> logInc -1.6107 0.6181 -2.606 0.00916 **
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> Robustness weights w.r * w.x:
#> 55 weights are ~= 1. The remaining 95 ones are summarized as
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 0.0011 0.2794 0.4176 0.4005 0.5043 0.9027
#>
#> Number of observations: 150
#> Fitted by method ‘Mqle’ (in 8 iterations)
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> No deviance values available
#> Algorithmic parameters:
#> acc tcc
#> 0.0001 1.3450
#> maxit
#> 50
#> test.acc
#> "coef"
#>
which( weights(rm2, type="robust") < 0.5)
#> [1] 3 5 8 10 16 17 18 22 24 26 27 29 30 33 34 36 39 40 44
#> [20] 52 54 57 58 60 61 62 64 66 68 70 71 72 73 76 77 78 79 82
#> [39] 83 85 89 91 94 95 96 97 99 101 102 103 105 107 108 109 110 115 120
#> [58] 125 134 137 140 141 142 143 144 146 147 149