geese.RdProduces an object of class `geese' which is a Generalized Estimating Equation fit of the data.
geese(
formula = formula(data),
sformula = ~1,
id,
waves = NULL,
data = parent.frame(),
subset = NULL,
na.action = na.omit,
contrasts = NULL,
weights = NULL,
zcor = NULL,
corp = NULL,
control = geese.control(...),
b = NULL,
alpha = NULL,
gm = NULL,
family = gaussian(),
mean.link = NULL,
variance = NULL,
cor.link = "identity",
sca.link = "identity",
link.same = TRUE,
scale.fix = FALSE,
scale.value = 1,
corstr = "independence",
...
)a formula expression as for glm, of the form
response ~ predictors. See the documentation of lm and
formula for details. As for glm, this specifies the linear
predictor for modeling the mean. A term of the form
offset(expression) is allowed.
a formula expression of the form ~
predictor, the response being ignored. This specifies the
linear predictor for modeling the dispersion. A term of the
form offset(expression) is allowed.
a vector which identifies the clusters. The length of `id' should be the same as the number of observations. Data are assumed to be sorted so that observations on a cluster are contiguous rows for all entities in the formula.
an integer vector which identifies components in
clusters. The length of waves should be the same as the
number of observation. components with the same waves
value will have the same link functions.
an optional data frame in which to interpret the
variables occurring in the formula, along with the
id and n variables.
expression saying which subset of the rows of the data should be used in the fit. This can be a logical vector (which is replicated to have length equal to the number of observations), or a numeric vector indicating which observation numbers are to be included, or a character vector of the row names to be included. All observations are included by default.
a function to filter missing data. For gee
only na.omit should be used here.
a list giving contrasts for some or all of the factors appearing in the model formula. The elements of the list should have the same name as the variable and should be either a contrast matrix (specifically, any full-rank matrix with as many rows as there are levels in the factor), or else a function to compute such a matrix given the number of levels.
an optional vector of weights to be used in the
fitting process. The length of weights should be the
same as the number of observations. This weights is not (yet)
the weight as in sas proc genmod, and hence is not recommended
to use.
a design matrix for correlation parameters.
known parameters such as coordinates used for correlation coefficients.
a list of iteration and algorithmic constants. See
geese.control for their names and default
values. These can also be set as arguments to geese
itself.
an initial estimate for the mean parameters.
an initial estimate for the correlation parameters.
an initial estimate for the scale parameters.
a description of the error distribution and link
function to be used in the model, as for glm.
a character string specifying the link function
for the means. The following are allowed: "identity",
"logit", "probit", "cloglog",
"log", and "inverse". The default value is
determined from family.
a character string specifying the variance function
in terms of the mean. The following are allowed:
"gaussian", "binomial", "poisson", and
"gamma". The default value is determined from family.
a character string specifying the link function for
the correlation coefficients. The following are allowed:
"identity", and "fisherz".
a character string specifying the link function for
the scales. The following are allowed: "identity", and
"log".
a logical indicating if all the components in a cluster should use the same link.
a logical variable; if true, the scale parameter
is fixed at the value of scale.value.
numeric variable giving the value to which the
scale parameter should be fixed; used only if scale.fix
== TRUE.
a character string specifying the correlation
structure. The following are permitted: "independence",
"exchangeable", "ar1", "unstructured",
"userdefined", and "fixed"
further arguments passed to or from other methods.
An object of class "geese" representing the fit.
when the correlation structure is fixed, the specification of
Zcor should be a vector of length sum(clusz * (clusz - 1)) /
2.
Yan, J. and J.P. Fine (2004) Estimating Equations for Association Structures. Statistics in Medicine, 23, 859–880.
data(seizure)
## Diggle, Liang, and Zeger (1994) pp166-168, compare Table 8.10
seiz.l <- reshape(seizure,
varying=list(c("base","y1", "y2", "y3", "y4")),
v.names="y", times=0:4, direction="long")
seiz.l <- seiz.l[order(seiz.l$id, seiz.l$time),]
seiz.l$t <- ifelse(seiz.l$time == 0, 8, 2)
seiz.l$x <- ifelse(seiz.l$time == 0, 0, 1)
m1 <- geese(y ~ offset(log(t)) + x + trt + x:trt, id = id,
data=seiz.l, corstr="exch", family=poisson)
summary(m1)
#>
#> Call:
#> geese(formula = y ~ offset(log(t)) + x + trt + x:trt, id = id,
#> data = seiz.l, family = poisson, corstr = "exch")
#>
#> Mean Model:
#> Mean Link: log
#> Variance to Mean Relation: poisson
#>
#> Coefficients:
#> estimate san.se wald p
#> (Intercept) 1.34761 0.1574 73.34238 0.0000
#> x 0.11184 0.1159 0.93061 0.3347
#> trt 0.02753 0.2218 0.01541 0.9012
#> x:trt -0.10473 0.2134 0.24073 0.6237
#>
#> Scale Model:
#> Scale Link: identity
#>
#> Estimated Scale Parameters:
#> estimate san.se wald p
#> (Intercept) 19.41 8.689 4.992 0.02546
#>
#> Correlation Model:
#> Correlation Structure: exch
#> Correlation Link: identity
#>
#> Estimated Correlation Parameters:
#> estimate san.se wald p
#> alpha 0.7767 0.07522 106.6 0
#>
#> Returned Error Value: 0
#> Number of clusters: 59 Maximum cluster size: 5
#>
m2 <- geese(y ~ offset(log(t)) + x + trt + x:trt, id = id,
data = seiz.l, subset = id!=49,
corstr = "exch", family=poisson)
summary(m2)
#>
#> Call:
#> geese(formula = y ~ offset(log(t)) + x + trt + x:trt, id = id,
#> data = seiz.l, subset = id != 49, family = poisson, corstr = "exch")
#>
#> Mean Model:
#> Mean Link: log
#> Variance to Mean Relation: poisson
#>
#> Coefficients:
#> estimate san.se wald p
#> (Intercept) 1.3476 0.1574 73.3424 0.00000
#> x 0.1118 0.1159 0.9306 0.33470
#> trt -0.1068 0.1937 0.3041 0.58130
#> x:trt -0.3024 0.1711 3.1248 0.07711
#>
#> Scale Model:
#> Scale Link: identity
#>
#> Estimated Scale Parameters:
#> estimate san.se wald p
#> (Intercept) 10.38 2.282 20.71 5.347e-06
#>
#> Correlation Model:
#> Correlation Structure: exch
#> Correlation Link: identity
#>
#> Estimated Correlation Parameters:
#> estimate san.se wald p
#> alpha 0.5978 0.08107 54.38 1.653e-13
#>
#> Returned Error Value: 0
#> Number of clusters: 58 Maximum cluster size: 5
#>
## Using fixed correlation matrix
cor.fixed <- matrix(c(1, 0.5, 0.25, 0.125, 0.125,
0.5, 1, 0.25, 0.125, 0.125,
0.25, 0.25, 1, 0.5, 0.125,
0.125, 0.125, 0.5, 1, 0.125,
0.125, 0.125, 0.125, 0.125, 1), 5, 5)
cor.fixed
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 1.000 0.500 0.250 0.125 0.125
#> [2,] 0.500 1.000 0.250 0.125 0.125
#> [3,] 0.250 0.250 1.000 0.500 0.125
#> [4,] 0.125 0.125 0.500 1.000 0.125
#> [5,] 0.125 0.125 0.125 0.125 1.000
zcor <- rep(cor.fixed[lower.tri(cor.fixed)], 59)
m3 <- geese(y ~ offset(log(t)) + x + trt + x:trt, id = id,
data = seiz.l, family = poisson,
corstr = "fixed", zcor = zcor)
summary(m3)
#>
#> Call:
#> geese(formula = y ~ offset(log(t)) + x + trt + x:trt, id = id,
#> data = seiz.l, zcor = zcor, family = poisson, corstr = "fixed")
#>
#> Mean Model:
#> Mean Link: log
#> Variance to Mean Relation: poisson
#>
#> Coefficients:
#> estimate san.se wald p
#> (Intercept) 1.33150 0.1603 68.99478 1.110e-16
#> x 0.12972 0.1069 1.47238 2.250e-01
#> trt 0.02237 0.2166 0.01067 9.177e-01
#> x:trt -0.11278 0.2275 0.24585 6.200e-01
#>
#> Scale Model:
#> Scale Link: identity
#>
#> Estimated Scale Parameters:
#> estimate san.se wald p
#> (Intercept) 19.61 8.798 4.968 0.02582
#>
#> Correlation Model:
#> Correlation Structure: fixed
#>
#> Returned Error Value: 0
#> Number of clusters: 59 Maximum cluster size: 5
#>
data(ohio)
fit <- geese(resp ~ age + smoke + age:smoke, id=id, data=ohio,
family=binomial, corstr="exch", scale.fix=TRUE)
summary(fit)
#>
#> Call:
#> geese(formula = resp ~ age + smoke + age:smoke, id = id, data = ohio,
#> family = binomial, scale.fix = TRUE, corstr = "exch")
#>
#> Mean Model:
#> Mean Link: logit
#> Variance to Mean Relation: binomial
#>
#> Coefficients:
#> estimate san.se wald p
#> (Intercept) -1.90050 0.11909 254.6860 0.00000
#> age -0.14124 0.05820 5.8889 0.01524
#> smoke 0.31383 0.18784 2.7912 0.09478
#> age:smoke 0.07083 0.08828 0.6438 0.42234
#>
#> Scale is fixed.
#>
#> Correlation Model:
#> Correlation Structure: exch
#> Correlation Link: identity
#>
#> Estimated Correlation Parameters:
#> estimate san.se wald p
#> alpha 0.3545 0.03603 96.8 0
#>
#> Returned Error Value: 0
#> Number of clusters: 537 Maximum cluster size: 4
#>
fit.ar1 <- geese(resp ~ age + smoke + age:smoke, id=id, data=ohio,
family=binomial, corstr="ar1", scale.fix=TRUE)
summary(fit.ar1)
#>
#> Call:
#> geese(formula = resp ~ age + smoke + age:smoke, id = id, data = ohio,
#> family = binomial, scale.fix = TRUE, corstr = "ar1")
#>
#> Mean Model:
#> Mean Link: logit
#> Variance to Mean Relation: binomial
#>
#> Coefficients:
#> estimate san.se wald p
#> (Intercept) -1.92538 0.12079 254.0893 0.00000
#> age -0.14790 0.05990 6.0971 0.01354
#> smoke 0.28799 0.19156 2.2601 0.13275
#> age:smoke 0.08376 0.09178 0.8328 0.36147
#>
#> Scale is fixed.
#>
#> Correlation Model:
#> Correlation Structure: ar1
#> Correlation Link: identity
#>
#> Estimated Correlation Parameters:
#> estimate san.se wald p
#> alpha 0.5014 0.03794 174.6 0
#>
#> Returned Error Value: 0
#> Number of clusters: 537 Maximum cluster size: 4
#>
###### simulated data
## a function to generate a dataset
gendat <- function() {
id <- gl(50, 4, 200)
visit <- rep(1:4, 50)
x1 <- rbinom(200, 1, 0.6) ## within cluster varying binary covariate
x2 <- runif(200, 0, 1) ## within cluster varying continuous covariate
phi <- 1 + 2 * x1 ## true scale model
## the true correlation coefficient rho for an ar(1)
## correlation structure is 0.667.
rhomat <- 0.667 ^ outer(1:4, 1:4, function(x, y) abs(x - y))
chol.u <- chol(rhomat)
noise <- as.vector(sapply(1:50, function(x) chol.u %*% rnorm(4)))
e <- sqrt(phi) * noise
y <- 1 + 3 * x1 - 2 * x2 + e
dat <- data.frame(y, id, visit, x1, x2)
dat
}
dat <- gendat()
fit <- geese(y ~ x1 + x2, id = id, data = dat, sformula = ~ x1,
corstr = "ar1", jack = TRUE, j1s = TRUE, fij = TRUE)
summary(fit)
#>
#> Call:
#> geese(formula = y ~ x1 + x2, sformula = ~x1, id = id, data = dat,
#> corstr = "ar1", jack = TRUE, j1s = TRUE, fij = TRUE)
#>
#> Mean Model:
#> Mean Link: identity
#> Variance to Mean Relation: gaussian
#>
#> Coefficients:
#> estimate san.se ajs.se j1s.se fij.se wald p
#> (Intercept) 0.8309 0.1695 0.1640 0.1640 0.1686 24.02 9.549e-07
#> x1 2.7508 0.1795 0.1735 0.1735 0.1727 234.76 0.000e+00
#> x2 -1.9036 0.2600 0.2517 0.2517 0.2593 53.59 2.471e-13
#>
#> Scale Model:
#> Scale Link: identity
#>
#> Estimated Scale Parameters:
#> estimate san.se ajs.se j1s.se fij.se wald p
#> (Intercept) 1.013 0.1788 0.1823 0.1792 0.1811 32.09 1.469e-08
#> x1 1.991 0.3780 0.3796 0.3747 0.3769 27.75 1.384e-07
#>
#> Correlation Model:
#> Correlation Structure: ar1
#> Correlation Link: identity
#>
#> Estimated Correlation Parameters:
#> estimate san.se ajs.se j1s.se fij.se wald p
#> alpha 0.4618 0.06047 0.06239 0.05744 0.05961 58.34 2.209e-14
#>
#> Returned Error Value: 0
#> Number of clusters: 50 Maximum cluster size: 4
#>
#### create user-defined design matrix of unstrctured correlation.
#### in this case, zcor has 4*3/2 = 6 columns, and 50 * 6 = 300 rows
zcor <- genZcor(clusz = rep(4, 50), waves = dat$visit, "unstr")
zfit <- geese(y ~ x1 + x2, id = id, data = dat, sformula = ~ x1,
corstr = "userdefined", zcor = zcor,
jack = TRUE, j1s = TRUE, fij = TRUE)
summary(zfit)
#>
#> Call:
#> geese(formula = y ~ x1 + x2, sformula = ~x1, id = id, data = dat,
#> zcor = zcor, corstr = "userdefined", jack = TRUE, j1s = TRUE,
#> fij = TRUE)
#>
#> Mean Model:
#> Mean Link: identity
#> Variance to Mean Relation: gaussian
#>
#> Coefficients:
#> estimate san.se ajs.se j1s.se fij.se wald p
#> (Intercept) 0.7588 0.1917 0.1757 0.1757 0.1813 15.66 7.566e-05
#> x1 2.8553 0.1901 0.1740 0.1740 0.1821 225.49 0.000e+00
#> x2 -1.8396 0.2969 0.2729 0.2729 0.2834 38.39 5.796e-10
#>
#> Scale Model:
#> Scale Link: identity
#>
#> Estimated Scale Parameters:
#> estimate san.se ajs.se j1s.se fij.se wald p
#> (Intercept) 1.014 0.1776 0.1714 0.1678 0.1709 32.58 1.142e-08
#> x1 1.994 0.3804 0.3608 0.3560 0.3590 27.47 1.595e-07
#>
#> Correlation Model:
#> Correlation Structure: userdefined
#> Correlation Link: identity
#>
#> Estimated Correlation Parameters:
#> estimate san.se ajs.se j1s.se fij.se wald p
#> alpha.1:2 0.53787 0.14950 0.14684 0.14240 0.14957 12.9437 3.210e-04
#> alpha.1:3 0.44269 0.16739 0.15903 0.15193 0.16035 6.9938 8.179e-03
#> alpha.1:4 0.05345 0.10276 0.09730 0.09721 0.10114 0.2705 6.030e-01
#> alpha.2:3 0.41282 0.09183 0.08691 0.08662 0.08924 20.2111 6.935e-06
#> alpha.2:4 0.15105 0.09312 0.08598 0.08610 0.08559 2.6313 1.048e-01
#> alpha.3:4 0.33408 0.08956 0.08396 0.08477 0.08947 13.9149 1.913e-04
#>
#> Returned Error Value: 0
#> Number of clusters: 50 Maximum cluster size: 4
#>
#### Now, suppose that we want the correlation of 1-2, 2-3, and 3-4
#### to be the same. Then zcor should have 4 columns.
z2 <- matrix(NA, 300, 4)
z2[,1] <- zcor[,1] + zcor[,4] + zcor[,6]
z2[,2:4] <- zcor[, c(2, 3, 5)]
summary(geese(y ~ x1 + x2, id = id, data = dat, sformula = ~ x1,
corstr = "userdefined", zcor = z2,
jack = TRUE, j1s = TRUE, fij = TRUE))
#>
#> Call:
#> geese(formula = y ~ x1 + x2, sformula = ~x1, id = id, data = dat,
#> zcor = z2, corstr = "userdefined", jack = TRUE, j1s = TRUE,
#> fij = TRUE)
#>
#> Mean Model:
#> Mean Link: identity
#> Variance to Mean Relation: gaussian
#>
#> Coefficients:
#> estimate san.se ajs.se j1s.se fij.se wald p
#> (Intercept) 0.761 0.1875 0.1759 0.1759 0.1783 16.47 4.931e-05
#> x1 2.804 0.1801 0.1686 0.1686 0.1685 242.35 0.000e+00
#> x2 -1.826 0.2827 0.2655 0.2655 0.2751 41.74 1.044e-10
#>
#> Scale Model:
#> Scale Link: identity
#>
#> Estimated Scale Parameters:
#> estimate san.se ajs.se j1s.se fij.se wald p
#> (Intercept) 1.012 0.1771 0.1753 0.1718 0.1739 32.66 1.099e-08
#> x1 1.993 0.3797 0.3687 0.3642 0.3665 27.55 1.532e-07
#>
#> Correlation Model:
#> Correlation Structure: userdefined
#> Correlation Link: identity
#>
#> Estimated Correlation Parameters:
#> estimate san.se ajs.se j1s.se fij.se wald p
#> alpha:1 0.42848 0.04980 0.04982 0.04791 0.04852 74.0211 0.000000
#> alpha:2 0.43649 0.16778 0.16331 0.15609 0.16145 6.7680 0.009281
#> alpha:3 0.05113 0.10364 0.10074 0.10099 0.10331 0.2434 0.621748
#> alpha:4 0.14932 0.09422 0.08911 0.08809 0.08861 2.5114 0.113023
#>
#> Returned Error Value: 0
#> Number of clusters: 50 Maximum cluster size: 4
#>
#### Next, we introduce non-constant cluster sizes by
#### randomly selecting 60 percent of the data
good <- sort(sample(1:nrow(dat), .6 * nrow(dat)))
mdat <- dat[good,]
summary(geese(y ~ x1 + x2, id = id, data = mdat, waves = visit,
sformula = ~ x1, corstr="ar1",
jack = TRUE, j1s = TRUE, fij = TRUE))
#>
#> Call:
#> geese(formula = y ~ x1 + x2, sformula = ~x1, id = id, waves = visit,
#> data = mdat, corstr = "ar1", jack = TRUE, j1s = TRUE, fij = TRUE)
#>
#> Mean Model:
#> Mean Link: identity
#> Variance to Mean Relation: gaussian
#>
#> Coefficients:
#> estimate san.se ajs.se j1s.se fij.se wald p
#> (Intercept) 0.7051 0.2306 0.2262 0.2262 0.2378 9.349 2.231e-03
#> x1 2.8634 0.2272 0.2239 0.2239 0.2286 158.897 0.000e+00
#> x2 -1.8614 0.4160 0.4134 0.4134 0.4713 20.018 7.672e-06
#>
#> Scale Model:
#> Scale Link: identity
#>
#> Estimated Scale Parameters:
#> estimate san.se ajs.se j1s.se fij.se wald p
#> (Intercept) 0.8808 0.1804 0.1901 0.1846 0.1941 23.84 1.047e-06
#> x1 1.2811 0.2633 0.2673 0.2635 0.2708 23.67 1.146e-06
#>
#> Correlation Model:
#> Correlation Structure: ar1
#> Correlation Link: identity
#>
#> Estimated Correlation Parameters:
#> estimate san.se ajs.se j1s.se fij.se wald p
#> alpha 0.3359 0.129 0.1376 0.1549 0.2202 6.775 0.009242
#>
#> Returned Error Value: 0
#> Number of clusters: 49 Maximum cluster size: 4
#>