Select variables from a data frame whose names begin with a certain character string.
Select(data = list(), prefix = "y",
       lhs = NULL, rhs = NULL, rhs2 = NULL, rhs3 = NULL,
       as.character = FALSE, as.formula.arg = FALSE, tilde = TRUE,
       exclude = NULL, sort.arg = TRUE)
data: A data frame or a matrix.
prefix: A vector of character strings, or a logical.
If a character vector, the variables chosen from data are those whose
names begin with (one of) the value(s) of prefix.
If a logical, only TRUE is accepted, and then all the variables
in data are chosen.
lhs: A character string giving the response, i.e., the left-hand side of the formula.
rhs: A character string included as part of the right-hand side (RHS) of the formula,
after the selected variables.
Set rhs = "0" to suppress the intercept.
rhs2, rhs3: Same as rhs but appended to its RHS,
i.e., paste0(rhs, " + ", rhs2, " + ", rhs3).
If used, rhs should be supplied first,
then possibly rhs2,
and then possibly rhs3.
as.character: Logical. Return the answer as a character string?
as.formula.arg: Logical. Return the answer as a formula?
tilde: Logical.
If as.character and as.formula.arg
are both TRUE, should the tilde be included in the formula?
exclude: Vector of character strings naming variables to be explicitly excluded.
sort.arg: Logical. Sort the variables by name? If FALSE, they keep their column order in data.
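The selection rules described by prefix, exclude and sort.arg can be mimicked in base R. The following is a minimal sketch, not VGAM's implementation: select_prefix() is a hypothetical helper, it assumes that sorting means a plain sort() of the names, and it only returns the matrix form (or NULL when nothing matches).

```r
# Hypothetical helper, NOT VGAM's Select(): mimics the prefix/exclude/sort.arg
# rules and returns the selected columns as a matrix (or NULL if none match).
select_prefix <- function(data, prefix = "y", exclude = NULL, sort.arg = TRUE) {
  vars <- colnames(data)
  if (is.logical(prefix)) {
    # As documented, only TRUE is accepted; it selects every column
    stopifnot(isTRUE(prefix))
    keep <- vars
  } else {
    # Keep a name if it begins with at least one of the prefixes
    hits <- vapply(vars, function(v) any(startsWith(v, prefix)), logical(1))
    keep <- vars[hits]
  }
  keep <- setdiff(keep, exclude)          # drop explicitly excluded names
  if (length(keep) == 0) return(NULL)     # no match: return NULL
  if (sort.arg) keep <- sort(keep)        # assumption: plain sort() by name
  as.matrix(data[, keep, drop = FALSE])
}

# Usage on a small made-up data frame
df <- data.frame(y1 = 1:3, y2 = 4:6, x2 = 7:9, x1 = 1, x = 0)
colnames(select_prefix(df, "x"))                    # "x"  "x1" "x2"
colnames(select_prefix(df, "x", sort.arg = FALSE))  # "x2" "x1" "x"
colnames(select_prefix(df, "x", exclude = "x1"))    # "x"  "x2"
select_prefix(df, " ")                              # NULL
```

Because no column name in df begins with a space, prefix = " " selects nothing, which matches the documented behaviour of returning NULL.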
This is meant as a utility function that avoids manually
(i) making a cbind() call to construct
a big matrix response,
and
(ii) constructing a formula involving many terms.
The savings are possible because the variables of interest
share a common prefix, e.g., the character "y".
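For comparison, here is a minimal sketch of the two chores done by hand in base R, without Select(). The data frame dat and its values are made up for illustration; grep() and reformulate() are standard base R (stats) functions.

```r
# Made-up data: three "y" responses and two "x" covariates
dat <- data.frame(y1 = c(0, 1, 1), y2 = c(1, 0, 1), y3 = c(1, 1, 0),
                  x1 = c(0.5, 1.2, 2.3), x2 = c(3, 1, 2))

# Find the names by prefix instead of typing them out
yvars <- grep("^y", names(dat), value = TRUE)   # "y1" "y2" "y3"
xvars <- grep("^x", names(dat), value = TRUE)   # "x1" "x2"

# (i) Matrix response, equivalent to with(dat, cbind(y1, y2, y3))
ymat <- as.matrix(dat[yvars])

# (ii) Formula with many terms, built from the names rather than typed
fo <- reformulate(xvars, response = "ymat")     # ymat ~ x1 + x2
```

With dozens of responses only the regular expressions stay the same size, which is the bookkeeping that Select() is intended to automate.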
If as.character = FALSE and
as.formula.arg = FALSE, the result is a matrix, such
as cbind(y1, y2, y3).
If as.character = TRUE and
as.formula.arg = FALSE, the result is a character string, such
as "cbind(y1, y2, y3)".
If as.character = FALSE and
as.formula.arg = TRUE, the result is a formula, such
as lhs ~ y1 + y2 + y3.
If as.character = TRUE and
as.formula.arg = TRUE, the result is a character string, such
as "lhs ~ y1 + y2 + y3".
See the examples below.
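The four combinations amount to building either a cbind() string or a formula string from the selected names. The sketch below assumes the names have already been selected; build_output() is a hypothetical helper (not part of VGAM), it ignores rhs2, rhs3 and tilde, and it places rhs after the selected variables, as in the example output "cbind(y1, y2, y3) ~ x + x2 + x4 + 0".

```r
# Hypothetical helper, NOT VGAM's code: map selected names onto the four
# return forms controlled by as.character and as.formula.arg.
build_output <- function(vars, data = NULL, lhs = NULL, rhs = NULL,
                         as.character = FALSE, as.formula.arg = FALSE) {
  if (as.formula.arg) {
    rhs_txt <- paste(c(vars, rhs), collapse = " + ")   # rhs (e.g. "0") goes last
    txt <- if (is.null(lhs)) paste("~", rhs_txt) else paste(lhs, "~", rhs_txt)
    if (as.character) txt else stats::as.formula(txt)
  } else {
    txt <- paste0("cbind(", paste(vars, collapse = ", "), ")")
    if (as.character) txt else as.matrix(data[, vars, drop = FALSE])
  }
}

y <- c("y1", "y2", "y3")
build_output(y, as.character = TRUE)                   # "cbind(y1, y2, y3)"
build_output(y, lhs = "lhs", as.character = TRUE,
             as.formula.arg = TRUE)                    # "lhs ~ y1 + y2 + y3"
build_output(c("x", "x2", "x4"), lhs = "cbind(y1, y2, y3)", rhs = "0",
             as.character = TRUE, as.formula.arg = TRUE)
```

Keeping the character form as the intermediate step means the formula case is just as.formula() applied to the same string, so the two switches stay independent.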
By default, if no variables beginning with the value of prefix
are found, then NULL is returned.
Setting prefix = " " is a way of selecting no variables.
This function is somewhat experimental at this stage and
may change in the near future.
Some of its utility may be better achieved using
subset() and its select argument,
e.g., subset(pdata, TRUE, select = y01:y10).
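As a sketch of the subset() alternative, assume a made-up data frame pdata whose response columns y01, y02, y03 happen to be adjacent. Note that select = y01:y03 is a column range, so it depends on the column order rather than on the names themselves.

```r
# Made-up data frame; the "y" columns sit next to each other
pdata <- data.frame(id = 1:2, y01 = c(1, 0), y02 = c(0, 1), y03 = c(1, 1),
                    x2 = c(5, 6))

# subset = TRUE keeps every row; select = y01:y03 keeps columns y01 through y03
ysub <- subset(pdata, TRUE, select = y01:y03)
names(ysub)   # "y01" "y02" "y03"
```

A prefix-based selection would also pick up a stray column such as y04 stored after x2, whereas the range y01:y03 would not.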
For some models, such as posbernoulli.t, the
order of the variables in the xij argument is
crucial, so care must be taken with the
argument sort.arg.
In some instances it may be good to rename variables
y1 to y01,
y2 to y02, etc.,
when there are variables such as
y14, because sorting the names as character strings places y14 before y2.
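The reason for zero-padding is that names are sorted as character strings, not as numbers. A minimal base R sketch follows; the data frame dat is made up, and the renaming pattern assumes that one-digit suffixes are the only ones needing padding.

```r
resp <- c("y1", "y2", "y14")
sort(resp)      # "y1"  "y14" "y2": character order puts y14 before y2

# Zero-pad the one-digit suffixes: y1 -> y01, y2 -> y02; y14 is untouched
padded <- sub("^y([0-9])$", "y0\\1", resp)
sort(padded)    # "y01" "y02" "y14": now agrees with the numeric order

# The same renaming applied to the columns of a made-up data frame
dat <- data.frame(y1 = 1, y2 = 2, y14 = 3)
names(dat) <- sub("^y([0-9])$", "y0\\1", names(dat))
```

The same padding keeps covariates such as t01, t02, ... used in xij in their intended order when sort.arg = TRUE.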
Currently subsetcol() and Select() are identical.
One of these functions might be withdrawn in the future.
Pneumo <- pneumo
colnames(Pneumo) <- c("y1", "y2", "y3", "x2")  # The "y" variables are the responses
Pneumo$x1 <- 1; Pneumo$x3 <- 3; Pneumo$x <- 0; Pneumo$x4 <- 4  # Add some covariates
Select(data = Pneumo) # Same as with(Pneumo, cbind(y1, y2, y3))
#> y1 y2 y3
#> 1 5.8 98 0
#> 2 15.0 51 2
#> 3 21.5 34 6
#> 4 27.5 35 5
#> 5 33.5 32 10
#> 6 39.5 23 7
#> 7 46.0 12 6
#> 8 51.5 4 2
Select(Pneumo, "x")
#> x x1 x2 x3 x4
#> 1 0 1 0 3 4
#> 2 0 1 1 3 4
#> 3 0 1 3 3 4
#> 4 0 1 8 3 4
#> 5 0 1 9 3 4
#> 6 0 1 8 3 4
#> 7 0 1 10 3 4
#> 8 0 1 5 3 4
Select(Pneumo, "x", sort.arg = FALSE, as.character = TRUE)
#> [1] "cbind(x2, x1, x3, x, x4)"
Select(Pneumo, "x", exclude = "x1")
#> x x2 x3 x4
#> 1 0 0 3 4
#> 2 0 1 3 4
#> 3 0 3 3 4
#> 4 0 8 3 4
#> 5 0 9 3 4
#> 6 0 8 3 4
#> 7 0 10 3 4
#> 8 0 5 3 4
Select(Pneumo, "x", exclude = "x1", as.character = TRUE)
#> [1] "cbind(x, x2, x3, x4)"
Select(Pneumo, c("x", "y"))
#> x x1 x2 x3 x4 y1 y2 y3
#> 1 0 1 0 3 4 5.8 98 0
#> 2 0 1 1 3 4 15.0 51 2
#> 3 0 1 3 3 4 21.5 34 6
#> 4 0 1 8 3 4 27.5 35 5
#> 5 0 1 9 3 4 33.5 32 10
#> 6 0 1 8 3 4 39.5 23 7
#> 7 0 1 10 3 4 46.0 12 6
#> 8 0 1 5 3 4 51.5 4 2
Select(Pneumo, "z") # Now returns a NULL
#> NULL
Select(Pneumo, " ") # Now returns a NULL
#> NULL
Select(Pneumo, prefix = TRUE, as.formula.arg = TRUE)
#> ~x + x1 + x2 + x3 + x4 + y1 + y2 + y3
#> <environment: 0x56035b44ee20>
Select(Pneumo, "x", exclude = c("x3", "x1"), as.formula.arg = TRUE,
       lhs = "cbind(y1, y2, y3)", rhs = "0")
#> cbind(y1, y2, y3) ~ x + x2 + x4 + 0
#> <environment: 0x56035bc93120>
Select(Pneumo, "x", exclude = "x1", as.formula.arg = TRUE, as.character = TRUE,
       lhs = "cbind(y1, y2, y3)", rhs = "0")
#> [1] "cbind(y1, y2, y3) ~ x + x2 + x3 + x4 + 0"
# Now a 'real' example:
Huggins89table1 <- transform(Huggins89table1, x3.tij = t01)
tab1 <- subset(Huggins89table1,
               rowSums(Select(Huggins89table1, "y")) > 0)
# Same as
# subset(Huggins89table1, y01 + y02 + y03 + y04 + y05 + y06 + y07 + y08 + y09 + y10 > 0)
# Long way to do it:
fit.th <-
  vglm(cbind(y01, y02, y03, y04, y05, y06, y07, y08, y09, y10) ~ x2 + x3.tij,
       xij = list(x3.tij ~ t01 + t02 + t03 + t04 + t05 + t06 + t07 + t08 +
                           t09 + t10 - 1),
       posbernoulli.t(parallel.t = TRUE ~ x2 + x3.tij),
       data = tab1, trace = TRUE,
       form2 = ~ x2 + x3.tij + t01 + t02 + t03 + t04 + t05 + t06 + t07 + t08 +
                 t09 + t10)
#> Iteration 1: loglikelihood = -97.120355
#> Iteration 2: loglikelihood = -97.079804
#> Iteration 3: loglikelihood = -97.079782
#> Iteration 4: loglikelihood = -97.079782
# Short way to do it:
Fit.th <- vglm(Select(tab1, "y") ~ x2 + x3.tij,
               xij = list(Select(tab1, "t", as.formula.arg = TRUE,
                                 sort.arg = FALSE, lhs = "x3.tij", rhs = "0")),
               posbernoulli.t(parallel.t = TRUE ~ x2 + x3.tij),
               data = tab1, trace = TRUE,
               form2 = Select(tab1, prefix = TRUE, as.formula.arg = TRUE))
#> Iteration 1: loglikelihood = -97.120355
#> Iteration 2: loglikelihood = -97.079804
#> Iteration 3: loglikelihood = -97.079782
#> Iteration 4: loglikelihood = -97.079782