Select Variables for a Formula Response or the RHS of a Formula

Select variables from a data frame whose names begin with a certain character string.

Select(data = list(), prefix = "y",
       lhs = NULL, rhs = NULL, rhs2 = NULL, rhs3 = NULL,
       as.character = FALSE, as.formula.arg = FALSE, tilde = TRUE,
       exclude = NULL, sort.arg = TRUE)

Arguments

data

A data frame or a matrix.

prefix

A vector of character strings, or a logical. If character, the variables chosen from data are those whose names begin with the value of prefix. If logical, only TRUE is accepted, and all the variables in data are chosen.

lhs

A character string. The response of a formula.

rhs

A character string. Included as part of the RHS of a formula. Set rhs = "0" to suppress the intercept.

rhs2, rhs3

Same as rhs but appended to its RHS, i.e., paste0(rhs, " + ", rhs2, " + ", rhs3). If used, rhs should be used first, and then possibly rhs2 and then possibly rhs3.
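As a rough illustration of how the extra RHS pieces combine with the selected variables, the following base R sketch builds a formula string by hand. It is not the internal code of Select(); the names selected, extra and rhs.string are hypothetical, and the selected variable names are typed in directly rather than chosen by prefix.

```r
# Hypothetical inputs: variable names that a prefix match might have chosen,
# plus the optional extra RHS pieces (NULL means "not supplied").
selected <- c("x", "x2", "x4")
rhs  <- "0"    # "0" suppresses the intercept
rhs2 <- NULL
rhs3 <- NULL

# c() silently drops NULL elements, so unused pieces vanish.
extra <- c(rhs, rhs2, rhs3)

# Join everything with " + " and attach a response on the left.
rhs.string <- paste(c(selected, extra), collapse = " + ")
fmla <- as.formula(paste("cbind(y1, y2, y3) ~", rhs.string))

print(fmla)
print(attr(terms(fmla), "intercept"))  # 0 means no intercept
```

Placing the extra terms after the selected variables mirrors the ordering seen in the examples on this page, where rhs = "0" appears at the end of the formula.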

as.character

Logical. Return the answer as a character string?

as.formula.arg

Logical. Return the answer as a formula?

tilde

Logical. If as.character and as.formula.arg are both TRUE then include the tilde in the formula?

exclude

Vector of character strings. Exclude these variables explicitly.

sort.arg

Logical. Sort the variables?

Details

This utility function avoids having to manually (i) write a cbind call to construct a large matrix response, and (ii) construct a formula involving many terms. The savings are possible because the variables of interest share a common prefix, e.g., the character "y".
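The two manual tasks described above can be sketched in base R alone. This is only an approximation of the idea under stated assumptions, not the implementation of Select(): the data frame dat and its column names are made up, and startsWith() and reformulate() stand in for whatever the function does internally.

```r
# Toy data frame; the column names are invented for illustration.
dat <- data.frame(y2 = 4:6, y1 = 1:3, x2 = c(0.5, 1.5, 2.5), y3 = 7:9)

# Keep the names that start with the prefix, then sort them.
ynames <- sort(names(dat)[startsWith(names(dat), "y")], method = "radix")

# (i) A matrix response holding the same values as
#     with(dat, cbind(y1, y2, y3)).
ymat <- as.matrix(dat[, ynames, drop = FALSE])

# (ii) A formula whose terms are the selected names.
fmla <- reformulate(ynames, response = "lhs")

print(ynames)
print(ymat)
print(fmla)
```

With only three variables the saving is small; the benefit grows when the response has many columns or the formula has many terms.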

Value

If as.character = FALSE and as.formula.arg = FALSE then a matrix such as cbind(y1, y2, y3). If as.character = TRUE and as.formula.arg = FALSE then a character string such as "cbind(y1, y2, y3)".

If as.character = FALSE and as.formula.arg = TRUE then a formula such as lhs ~ y1 + y2 + y3. If as.character = TRUE and as.formula.arg = TRUE then a character string such as "lhs ~ y1 + y2 + y3". See the examples below. By default, if no variables beginning with the value of prefix are found then NULL is returned. Setting prefix = " " is a way of selecting no variables.

Author

T. W. Yee.

Note

This function is somewhat experimental at this stage and may change in the near future. Some of its utility may be better achieved using subset and its select argument, e.g., subset(pdata, TRUE, select = y01:y10).
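A small base R sketch of the subset alternative follows. The data frame pdata here is a made-up stand-in with only three response columns, so the range is y01:y03 rather than y01:y10.

```r
# Hypothetical capture-history style data; the values are invented.
pdata <- data.frame(id  = 1:3,
                    y01 = c(0, 1, 0),
                    y02 = c(1, 1, 0),
                    y03 = c(0, 0, 1),
                    x2  = c(2.1, 3.4, 5.6))

# TRUE keeps every row; select = y01:y03 keeps the columns from
# y01 through y03 by position in the data frame.
ysub <- subset(pdata, TRUE, select = y01:y03)

print(names(ysub))
print(rowSums(ysub))
```

Because the range in select is resolved by column position rather than by name pattern, this approach assumes the wanted columns are adjacent in the data frame.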

For some models such as posbernoulli.t the order of the variables in the xij argument is crucial, so care must be taken with the argument sort.arg. In some instances it may be worthwhile to rename variables y1 to y01, y2 to y02, etc. when there are variables such as y14, so that the sorted order matches the numeric order.
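The ordering problem and the renaming remedy can be illustrated with a short base R sketch. It assumes plain character sorting of the names (done here with method = "radix" so that the result does not depend on the locale); the vector vars is hypothetical.

```r
# Hypothetical variable names without zero padding.
vars <- c("y1", "y2", "y14")

# Character sorting compares "1" with "2", so y14 lands before y2.
unpadded.order <- sort(vars, method = "radix")

# Zero-pad the numeric suffix to two digits: y1 -> y01, y2 -> y02.
num <- as.integer(sub("^y", "", vars))
padded <- sprintf("y%02d", num)
padded.order <- sort(padded, method = "radix")

print(unpadded.order)  # "y1"  "y14" "y2"
print(padded.order)    # "y01" "y02" "y14"
```

In a data frame the same renaming could be applied to the matching entries of names(), after which sorted and numeric order agree.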

Currently subsetcol() and Select() are identical. One of these functions might be withdrawn in the future.

Examples

Pneumo <- pneumo
colnames(Pneumo) <- c("y1", "y2", "y3", "x2")  # The "y" variables are the responses
Pneumo$x1 <- 1; Pneumo$x3 <- 3; Pneumo$x <- 0; Pneumo$x4 <- 4  # Add these

Select(data = Pneumo)  # Same as with(Pneumo, cbind(y1, y2, y3))
#>     y1 y2 y3
#> 1  5.8 98  0
#> 2 15.0 51  2
#> 3 21.5 34  6
#> 4 27.5 35  5
#> 5 33.5 32 10
#> 6 39.5 23  7
#> 7 46.0 12  6
#> 8 51.5  4  2
Select(Pneumo, "x")
#>   x x1 x2 x3 x4
#> 1 0  1  0  3  4
#> 2 0  1  1  3  4
#> 3 0  1  3  3  4
#> 4 0  1  8  3  4
#> 5 0  1  9  3  4
#> 6 0  1  8  3  4
#> 7 0  1 10  3  4
#> 8 0  1  5  3  4
Select(Pneumo, "x", sort = FALSE, as.char = TRUE)
#> [1] "cbind(x2, x1, x3, x, x4)"
Select(Pneumo, "x", exclude = "x1")
#>   x x2 x3 x4
#> 1 0  0  3  4
#> 2 0  1  3  4
#> 3 0  3  3  4
#> 4 0  8  3  4
#> 5 0  9  3  4
#> 6 0  8  3  4
#> 7 0 10  3  4
#> 8 0  5  3  4
Select(Pneumo, "x", exclude = "x1", as.char = TRUE)
#> [1] "cbind(x, x2, x3, x4)"
Select(Pneumo, c("x", "y"))
#>   x x1 x2 x3 x4   y1 y2 y3
#> 1 0  1  0  3  4  5.8 98  0
#> 2 0  1  1  3  4 15.0 51  2
#> 3 0  1  3  3  4 21.5 34  6
#> 4 0  1  8  3  4 27.5 35  5
#> 5 0  1  9  3  4 33.5 32 10
#> 6 0  1  8  3  4 39.5 23  7
#> 7 0  1 10  3  4 46.0 12  6
#> 8 0  1  5  3  4 51.5  4  2
Select(Pneumo, "z")  # Now returns a NULL
#> NULL
Select(Pneumo, " ")  # Now returns a NULL
#> NULL
Select(Pneumo, prefix = TRUE, as.formula = TRUE)
#> ~x + x1 + x2 + x3 + x4 + y1 + y2 + y3
#> <environment: 0x56035b44ee20>
Select(Pneumo, "x", exclude = c("x3", "x1"), as.formula = TRUE,
       lhs = "cbind(y1, y2, y3)", rhs = "0")
#> cbind(y1, y2, y3) ~ x + x2 + x4 + 0
#> <environment: 0x56035bc93120>
Select(Pneumo, "x", exclude = "x1", as.formula = TRUE, as.char = TRUE,
       lhs = "cbind(y1, y2, y3)", rhs = "0")
#> [1] "cbind(y1, y2, y3) ~ x + x2 + x3 + x4 + 0"

# Now a 'real' example:
Huggins89table1 <- transform(Huggins89table1, x3.tij = t01)
tab1 <- subset(Huggins89table1,
               rowSums(Select(Huggins89table1, "y")) > 0)
# Same as
# subset(Huggins89table1, y01 + y02 + y03 + y04 + y05 + y06 + y07 + y08 + y09 + y10 > 0)

# Long way to do it:
fit.th <-
   vglm(cbind(y01, y02, y03, y04, y05, y06, y07, y08, y09, y10) ~ x2 + x3.tij,
        xij = list(x3.tij ~ t01 + t02 + t03 + t04 + t05 + t06 + t07 + t08 +
                            t09 + t10 - 1),
        posbernoulli.t(parallel.t = TRUE ~ x2 + x3.tij),
        data = tab1, trace = TRUE,
        form2 = ~ x2 + x3.tij + t01 + t02 + t03 + t04 + t05 + t06 + t07 + t08 +
                                t09 + t10)
#> Iteration 1: loglikelihood = -97.120355
#> Iteration 2: loglikelihood = -97.079804
#> Iteration 3: loglikelihood = -97.079782
#> Iteration 4: loglikelihood = -97.079782
# Short way to do it:
Fit.th <- vglm(Select(tab1, "y") ~ x2 + x3.tij,
               xij = list(Select(tab1, "t", as.formula = TRUE,
                                 sort = FALSE, lhs = "x3.tij", rhs = "0")),
               posbernoulli.t(parallel.t = TRUE ~ x2 + x3.tij),
               data = tab1, trace = TRUE,
               form2 = Select(tab1, prefix = TRUE, as.formula = TRUE))
#> Iteration 1: loglikelihood = -97.120355
#> Iteration 2: loglikelihood = -97.079804
#> Iteration 3: loglikelihood = -97.079782
#> Iteration 4: loglikelihood = -97.079782