Read and augment data with extended metadata attributes

t1read(data, metadata = NULL, read.fun = read.csv, ..., escape.html = TRUE)

Arguments

data: Either a file name (character) or a data.frame. A file name is read using the function read.fun.
metadata: Either a file name (character) or a list. A file name is read using the function read_yaml, so the file must contain valid YAML text; the result is a list. See Details regarding the list contents.
read.fun: A function to read files. It should accept a file name as its first argument and return a data.frame.
...: Further optional arguments, passed to read.fun.
escape.html: Logical. Should strings (labels, units) be converted to valid HTML by escaping special symbols?
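
As a rough sketch of the read.fun contract (not taken from the package itself): any function that takes a file name as its first argument and returns a data.frame should be usable. The helper name read_semicolon and the temporary file below are hypothetical, chosen only for illustration.

```r
# Hypothetical semicolon-separated file, written to a temporary location
path <- tempfile(fileext = ".csv")
writeLines(c("sex;age", "0;34", "1;71", "0;52"), path)

# Hypothetical reader: file name as first argument, returns a data.frame
read_semicolon <- function(file, ...) {
  utils::read.csv(file, sep = ";", ...)
}

raw <- read_semicolon(path)
str(raw)

# Corresponding calls through t1read (run only if table1 is installed).
# In the second call, sep is forwarded to read.fun through `...`.
if (requireNamespace("table1", quietly = TRUE)) {
  dat1 <- table1::t1read(path, read.fun = read_semicolon)
  dat2 <- table1::t1read(path, read.fun = utils::read.csv, sep = ";")
}
```

Wrapping the reader in a small function keeps file-format details in one place; passing the extra arguments through `...` avoids the wrapper when only one or two options differ from the defaults.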

Value

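A data.frame (as returned by read.fun), with the label and units attributes set and categorical columns converted to factors as specified in metadata.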

Details

The metadata list may contain the following 3 named elements (other elements are ignored):

labels: a named list, with names corresponding to columns in data and values the associated label attribute.
units: a named list, with names corresponding to columns in data and values the associated units attribute.
categoricals: a named list, with names corresponding to columns in data and values that are themselves lists, used to convert the column to a factor: the list names are the levels, and the values are the associated labels. The names can be omitted if the goal is just to specify the order of the factor levels.
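
The structure described above could be written as a YAML file along the following lines. This is a minimal sketch: the file contents and the names yaml_text, meta_file, meta and meta_r are assumptions made for illustration, and the level keys are quoted so that they are read as strings.

```r
# Requires the yaml package, which provides read_yaml()
library(yaml)

# Hypothetical metadata file with the three recognized elements
yaml_text <- c(
  'labels:',
  '  age: Age',
  '  wgt: Weight',
  'units:',
  '  age: years',
  '  wgt: kg',
  'categoricals:',
  '  sex:',
  '    "0": Female',
  '    "1": Male'
)
meta_file <- tempfile(fileext = ".yaml")
writeLines(yaml_text, meta_file)

# Parse the file into a nested list
meta <- read_yaml(meta_file)
str(meta)

# The same structure written directly as an R list
meta_r <- list(
  labels = list(age = "Age", wgt = "Weight"),
  units = list(age = "years", wgt = "kg"),
  categoricals = list(sex = list(`0` = "Female", `1` = "Male"))
)
```

Either meta_file or the list itself can then be supplied as the metadata argument of t1read.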

Examples


# Simulate some data
set.seed(123)
data <- expand.grid(sex=0:1, cohort=1:3)[rep(1:6, times=c(7, 9, 21, 22, 11, 14)),]
data$age <- runif(nrow(data), 18, 80)
data$agecat <- 1*(data$age >= 65)
data$wgt <- rnorm(nrow(data), 75, 15)

metadata <- list(
  labels=list(
    cohort = "Cohort",
    sex = "Sex",
    age = "Age",
    agecat  = "Age category",
    wgt = "Weight"),
  units=list(
    age = "years",
    wgt = "kg"),
  categoricals=list(
    cohort = list(
      `1` = "Cohort A",
      `2` = "Cohort B",
      `3` = "Cohort C"),
    sex = list(
      `0` = "Female",
      `1` = "Male"),
    agecat = list(
      `0` = "< 65",
      `1` = "\U{2265} 65")))

data <- t1read(data, metadata)
table1(~ sex + age + agecat + wgt | cohort, data=data)
#> <table class="Rtable1">
#> <thead>
#> <tr>
#> <th class='rowlabel firstrow lastrow'></th>
#> <th class='firstrow lastrow'><span class='stratlabel'>Cohort A<br/><span class='stratn'>(N=16)</span></span></th>
#> <th class='firstrow lastrow'><span class='stratlabel'>Cohort B<br/><span class='stratn'>(N=43)</span></span></th>
#> <th class='firstrow lastrow'><span class='stratlabel'>Cohort C<br/><span class='stratn'>(N=25)</span></span></th>
#> <th class='firstrow lastrow'><span class='stratlabel'>Overall<br/><span class='stratn'>(N=84)</span></span></th>
#> </tr>
#> </thead>
#> <tbody>
#> <tr>
#> <td class='rowlabel firstrow'><span class='varlabel'>Sex</span></td>
#> <td class='firstrow'></td>
#> <td class='firstrow'></td>
#> <td class='firstrow'></td>
#> <td class='firstrow'></td>
#> </tr>
#> <tr>
#> <td class='rowlabel'>Female</td>
#> <td>7 (43.8%)</td>
#> <td>21 (48.8%)</td>
#> <td>11 (44.0%)</td>
#> <td>39 (46.4%)</td>
#> </tr>
#> <tr>
#> <td class='rowlabel lastrow'>Male</td>
#> <td class='lastrow'>9 (56.3%)</td>
#> <td class='lastrow'>22 (51.2%)</td>
#> <td class='lastrow'>14 (56.0%)</td>
#> <td class='lastrow'>45 (53.6%)</td>
#> </tr>
#> <tr>
#> <td class='rowlabel firstrow'><span class='varlabel'>Age<span class='varunits'> (years)</span></span></td>
#> <td class='firstrow'></td>
#> <td class='firstrow'></td>
#> <td class='firstrow'></td>
#> <td class='firstrow'></td>
#> </tr>
#> <tr>
#> <td class='rowlabel'>Mean (SD)</td>
#> <td>54.6 (18.1)</td>
#> <td>47.6 (18.5)</td>
#> <td>48.4 (15.5)</td>
#> <td>49.2 (17.5)</td>
#> </tr>
#> <tr>
#> <td class='rowlabel lastrow'>Median [Min, Max]</td>
#> <td class='lastrow'>52.8 [20.8, 77.3]</td>
#> <td class='lastrow'>45.4 [19.5, 79.6]</td>
#> <td class='lastrow'>45.8 [18.0, 68.5]</td>
#> <td class='lastrow'>47.2 [18.0, 79.6]</td>
#> </tr>
#> <tr>
#> <td class='rowlabel firstrow'><span class='varlabel'>Age category</span></td>
#> <td class='firstrow'></td>
#> <td class='firstrow'></td>
#> <td class='firstrow'></td>
#> <td class='firstrow'></td>
#> </tr>
#> <tr>
#> <td class='rowlabel'>&lt; 65</td>
#> <td>10 (62.5%)</td>
#> <td>33 (76.7%)</td>
#> <td>20 (80.0%)</td>
#> <td>63 (75.0%)</td>
#> </tr>
#> <tr>
#> <td class='rowlabel lastrow'>≥ 65</td>
#> <td class='lastrow'>6 (37.5%)</td>
#> <td class='lastrow'>10 (23.3%)</td>
#> <td class='lastrow'>5 (20.0%)</td>
#> <td class='lastrow'>21 (25.0%)</td>
#> </tr>
#> <tr>
#> <td class='rowlabel firstrow'><span class='varlabel'>Weight<span class='varunits'> (kg)</span></span></td>
#> <td class='firstrow'></td>
#> <td class='firstrow'></td>
#> <td class='firstrow'></td>
#> <td class='firstrow'></td>
#> </tr>
#> <tr>
#> <td class='rowlabel'>Mean (SD)</td>
#> <td>77.5 (15.8)</td>
#> <td>76.7 (13.5)</td>
#> <td>70.9 (11.7)</td>
#> <td>75.1 (13.6)</td>
#> </tr>
#> <tr>
#> <td class='rowlabel lastrow'>Median [Min, Max]</td>
#> <td class='lastrow'>74.5 [51.8, 108]</td>
#> <td class='lastrow'>76.9 [40.4, 108]</td>
#> <td class='lastrow'>69.8 [50.0, 103]</td>
#> <td class='lastrow'>74.2 [40.4, 108]</td>
#> </tr>
#> </tbody>
#> </table>