R/derive_locf_records.R
derive_locf_records.Rd
Adds LOCF records as new observations for each 'by group' when the dataset does not contain observations for missed visits/time points or when the analysis value is missing.
derive_locf_records(
  dataset,
  dataset_ref,
  by_vars,
  id_vars_ref = NULL,
  analysis_var = AVAL,
  imputation = "add",
  order,
  keep_vars = NULL
)
dataset
Input dataset
The variables specified by the by_vars, analysis_var, order, and keep_vars arguments are expected to be in the dataset.
Default value: none
dataset_ref
Expected observations dataset
A data frame containing every combination of PARAMCD, PARAM, AVISIT, AVISITN, ... that is expected in the dataset.
Default value: none
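One way to build such a reference dataset is to cross the expected parameter codes with the scheduled visits. The following is a minimal sketch, assuming the dplyr and tidyr packages are available; the parameter codes, the visit numbers, and the name expected_obsv are illustrative assumptions and not part of this function's interface.

```r
library(dplyr)
library(tidyr)

# Hypothetical parameters and visit schedule (assumptions for illustration).
expected_obsv <- expand_grid(
  PARAMCD = c("DIABP", "SYSBP"),
  AVISITN = c(0, 2, 4)
) |>
  # Derive the visit label from the visit number.
  mutate(AVISIT = if_else(AVISITN == 0, "BASELINE", paste("WEEK", AVISITN)))
```

A data frame of this shape could then be supplied as dataset_ref. Combinations that are not actually expected can be dropped with filter() first.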
by_vars
Grouping variables
For each group defined by by_vars, those observations from dataset_ref are added to the output dataset which do not have a corresponding observation in the input dataset, or for which analysis_var is NA for the corresponding observation in the input dataset.
Default value: none
id_vars_ref
Grouping variables in expected observations dataset
The variables to group by in dataset_ref when determining which observations should be added to the input dataset.
Default value: NULL (all the variables in dataset_ref)
analysis_var
Analysis variable.
Permitted values: a variable
Default value: AVAL
imputation
Select the mode of imputation:
add: Keep all original records and add imputed records for missing timepoints and missing analysis_var values from dataset_ref.
update: Update records with missing analysis_var and add imputed records for missing timepoints from dataset_ref.
update_add: Keep all original records, update records with missing analysis_var and add imputed records for missing timepoints from dataset_ref.
Permitted values: one of "add", "update", "update_add"
Default value: "add"
order
Sort order
The dataset is sorted by order before the last observation (e.g. AVAL) is carried forward within each by_vars group.
For handling of NAs in sorting variables see Sort Order.
Default value: none
keep_vars
Variables that need carrying the last observation forward
Variables other than analysis_var whose last observation should also be carried forward (e.g., PARAMN, VISITNUM). If NULL (the default), only the variables specified in by_vars and analysis_var are populated in the newly created records.
Default value: NULL
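The effect of keep_vars can be pictured with tidyr::fill(), which carries the last non-missing value downward. This is only an illustrative sketch and not the function's implementation; the two-row dataset is made up, with the second row standing in for a record added from dataset_ref.

```r
library(tibble)
library(tidyr)

# The second row mimics a newly added record: PARAMN and AVAL are not yet set.
records <- tribble(
  ~PARAMCD, ~PARAMN, ~AVISITN, ~AVAL,
  "SYSBP",        3,        0,   130,
  "SYSBP",       NA,        4,    NA
)

# Analogous to keep_vars = NULL: only the analysis variable is carried forward.
aval_only <- fill(records, AVAL)

# Analogous to keep_vars = exprs(PARAMN): PARAMN is carried forward as well.
aval_and_paramn <- fill(records, AVAL, PARAMN)
```

In aval_only the new record still has a missing PARAMN, while in aval_and_paramn it takes the value of the previous record.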
The input dataset with the new "LOCF" observations added for each by_vars group, based on the value passed to the imputation argument.
For each group (with respect to the variables specified for the by_vars argument), those observations from dataset_ref are added to the output dataset which do not have a corresponding observation in the input dataset, or for which analysis_var is NA for the corresponding observation in the input dataset.
For the new observations, analysis_var is set to the non-missing analysis_var of the previous observation in the input dataset (when sorted by order) and DTYPE is set to "LOCF".
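The steps described above can be sketched with dplyr and tidyr. This is a simplified illustration under stated assumptions, not the function's actual implementation: it covers only missing visits for a single subject, and the datasets obs and expected are made up for the example.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Observed records (hypothetical): the visit with AVISITN = 4 is missing.
obs <- tribble(
  ~USUBJID, ~PARAMCD, ~AVISITN, ~AVAL,
  "01",     "SYSBP",         0,   130,
  "01",     "SYSBP",         2,   132
)

# Expected visits per parameter (hypothetical reference dataset).
expected <- tribble(
  ~PARAMCD, ~AVISITN,
  "SYSBP",         0,
  "SYSBP",         2,
  "SYSBP",         4
)

# Expected visits that have no corresponding observed record.
missing_visits <- obs |>
  distinct(USUBJID, PARAMCD) |>
  inner_join(expected, by = "PARAMCD") |>
  anti_join(obs, by = c("USUBJID", "PARAMCD", "AVISITN")) |>
  mutate(DTYPE = "LOCF")

# Add the new records, sort, and carry the last AVAL forward within each group.
locf <- bind_rows(obs, missing_visits) |>
  arrange(USUBJID, PARAMCD, AVISITN) |>
  group_by(USUBJID, PARAMCD) |>
  fill(AVAL, .direction = "down") |>
  ungroup()
```

Here locf contains one additional record for AVISITN 4, with AVAL taken from the AVISITN 2 record and DTYPE set to "LOCF".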
The imputation argument decides whether to update the existing observation when analysis_var is NA ("update" and "update_add"), or to add a new observation from dataset_ref instead ("add").
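The difference between updating a record in place and keeping the original alongside an imputed copy can be sketched as follows. This is a rough illustration built on dplyr and tidyr, not the function's implementation; the dataset obs and the helper column AVAL_LOCF are assumptions made for the example.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical records: the last visit has a missing analysis value.
obs <- tribble(
  ~USUBJID, ~PARAMCD, ~VSSEQ, ~AVISITN, ~AVAL,
  "01",     "DIABP",       2,        0,    79,
  "01",     "DIABP",       3,        2,    80,
  "01",     "DIABP",       4,        4,    NA
)

# Carry the last non-missing AVAL forward into a helper column.
carried <- obs |>
  arrange(USUBJID, PARAMCD, AVISITN) |>
  group_by(USUBJID, PARAMCD) |>
  mutate(AVAL_LOCF = AVAL) |>
  fill(AVAL_LOCF, .direction = "down") |>
  ungroup()

# Imputed versions of the records whose AVAL was missing.
imputed <- carried |>
  filter(is.na(AVAL)) |>
  mutate(AVAL = AVAL_LOCF, DTYPE = "LOCF") |>
  select(-AVAL_LOCF)

# Like "update": the record with missing AVAL is replaced by its imputed version.
updated <- obs |>
  filter(!is.na(AVAL)) |>
  bind_rows(imputed) |>
  arrange(USUBJID, PARAMCD, AVISITN)

# Like "update_add": the original record is kept next to its imputed version.
update_added <- obs |>
  bind_rows(imputed) |>
  arrange(USUBJID, PARAMCD, AVISITN)
```

In both results the imputed record keeps the other variables of the original record, such as VSSEQ. With "add", the extra record is built from dataset_ref instead, so variables that are not in dataset_ref are missing on it unless they are carried forward.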
BDS-Findings Functions for adding Parameters/Records: default_qtc_paramcd(), derive_expected_records(), derive_extreme_event(), derive_extreme_records(), derive_param_bmi(), derive_param_bsa(), derive_param_computed(), derive_param_doseint(), derive_param_exist_flag(), derive_param_exposure(), derive_param_framingham(), derive_param_map(), derive_param_qtc(), derive_param_rr(), derive_param_wbc_abs(), derive_summary_records()
library(admiral)
library(dplyr)
library(tibble)

advs <- tribble(
  ~STUDYID,  ~USUBJID,      ~VSSEQ, ~PARAMCD, ~PARAMN, ~AVAL, ~AVISITN, ~AVISIT,
  "CDISC01", "01-701-1015",      1, "PULSE",        1,    65,        0, "BASELINE",
  "CDISC01", "01-701-1015",      2, "DIABP",        2,    79,        0, "BASELINE",
  "CDISC01", "01-701-1015",      3, "DIABP",        2,    80,        2, "WEEK 2",
  "CDISC01", "01-701-1015",      4, "DIABP",        2,    NA,        4, "WEEK 4",
  "CDISC01", "01-701-1015",      5, "DIABP",        2,    NA,        6, "WEEK 6",
  "CDISC01", "01-701-1015",      6, "SYSBP",        3,   130,        0, "BASELINE",
  "CDISC01", "01-701-1015",      7, "SYSBP",        3,   132,        2, "WEEK 2"
)
# A dataset with all the combinations of PARAMCD, PARAM, AVISIT, AVISITN, ...
# which are expected.
advs_expected_obsv <- tribble(
  ~PARAMCD, ~AVISITN, ~AVISIT,
  "PULSE",         0, "BASELINE",
  "PULSE",         6, "WEEK 6",
  "DIABP",         0, "BASELINE",
  "DIABP",         2, "WEEK 2",
  "DIABP",         4, "WEEK 4",
  "DIABP",         6, "WEEK 6",
  "SYSBP",         0, "BASELINE",
  "SYSBP",         2, "WEEK 2",
  "SYSBP",         4, "WEEK 4",
  "SYSBP",         6, "WEEK 6"
)

# Example 1: Add imputed records for missing timepoints and for missing
# `analysis_var` values (from `dataset_ref`), keeping all the original records.
derive_locf_records(
  dataset = advs,
  dataset_ref = advs_expected_obsv,
  by_vars = exprs(STUDYID, USUBJID, PARAMCD),
  imputation = "add",
  order = exprs(AVISITN, AVISIT),
  keep_vars = exprs(PARAMN)
) |>
  arrange(USUBJID, PARAMCD, AVISIT)
#> # A tibble: 12 × 9
#>    STUDYID USUBJID     VSSEQ PARAMCD PARAMN  AVAL AVISITN AVISIT   DTYPE
#>    <chr>   <chr>       <dbl> <chr>    <dbl> <dbl>   <dbl> <chr>    <chr>
#>  1 CDISC01 01-701-1015     2 DIABP        2    79       0 BASELINE NA
#>  2 CDISC01 01-701-1015     3 DIABP        2    80       2 WEEK 2   NA
#>  3 CDISC01 01-701-1015    NA DIABP        2    80       4 WEEK 4   LOCF
#>  4 CDISC01 01-701-1015     4 DIABP        2    NA       4 WEEK 4   NA
#>  5 CDISC01 01-701-1015    NA DIABP        2    80       6 WEEK 6   LOCF
#>  6 CDISC01 01-701-1015     5 DIABP        2    NA       6 WEEK 6   NA
#>  7 CDISC01 01-701-1015     1 PULSE        1    65       0 BASELINE NA
#>  8 CDISC01 01-701-1015    NA PULSE        1    65       6 WEEK 6   LOCF
#>  9 CDISC01 01-701-1015     6 SYSBP        3   130       0 BASELINE NA
#> 10 CDISC01 01-701-1015     7 SYSBP        3   132       2 WEEK 2   NA
#> 11 CDISC01 01-701-1015    NA SYSBP        3   132       4 WEEK 4   LOCF
#> 12 CDISC01 01-701-1015    NA SYSBP        3   132       6 WEEK 6   LOCF

# Example 2: Add imputed records for missing timepoints (from `dataset_ref`)
# and update missing `analysis_var` values.
derive_locf_records(
  dataset = advs,
  dataset_ref = advs_expected_obsv,
  by_vars = exprs(STUDYID, USUBJID, PARAMCD),
  imputation = "update",
  order = exprs(AVISITN, AVISIT)
) |>
  arrange(USUBJID, PARAMCD, AVISIT)
#> # A tibble: 10 × 9
#>    STUDYID USUBJID     VSSEQ PARAMCD PARAMN  AVAL AVISITN AVISIT   DTYPE
#>    <chr>   <chr>       <dbl> <chr>    <dbl> <dbl>   <dbl> <chr>    <chr>
#>  1 CDISC01 01-701-1015     2 DIABP        2    79       0 BASELINE NA
#>  2 CDISC01 01-701-1015     3 DIABP        2    80       2 WEEK 2   NA
#>  3 CDISC01 01-701-1015     4 DIABP        2    80       4 WEEK 4   LOCF
#>  4 CDISC01 01-701-1015     5 DIABP        2    80       6 WEEK 6   LOCF
#>  5 CDISC01 01-701-1015     1 PULSE        1    65       0 BASELINE NA
#>  6 CDISC01 01-701-1015    NA PULSE       NA    65       6 WEEK 6   LOCF
#>  7 CDISC01 01-701-1015     6 SYSBP        3   130       0 BASELINE NA
#>  8 CDISC01 01-701-1015     7 SYSBP        3   132       2 WEEK 2   NA
#>  9 CDISC01 01-701-1015    NA SYSBP       NA   132       4 WEEK 4   LOCF
#> 10 CDISC01 01-701-1015    NA SYSBP       NA   132       6 WEEK 6   LOCF

# Example 3: Add imputed records for missing timepoints (from `dataset_ref`) and
# update missing `analysis_var` values, keeping all the original records.
derive_locf_records(
  dataset = advs,
  dataset_ref = advs_expected_obsv,
  by_vars = exprs(STUDYID, USUBJID, PARAMCD),
  imputation = "update_add",
  order = exprs(AVISITN, AVISIT)
) |>
  arrange(USUBJID, PARAMCD, AVISIT)
#> # A tibble: 12 × 9
#>    STUDYID USUBJID     VSSEQ PARAMCD PARAMN  AVAL AVISITN AVISIT   DTYPE
#>    <chr>   <chr>       <dbl> <chr>    <dbl> <dbl>   <dbl> <chr>    <chr>
#>  1 CDISC01 01-701-1015     2 DIABP        2    79       0 BASELINE NA
#>  2 CDISC01 01-701-1015     3 DIABP        2    80       2 WEEK 2   NA
#>  3 CDISC01 01-701-1015     4 DIABP        2    80       4 WEEK 4   LOCF
#>  4 CDISC01 01-701-1015     4 DIABP        2    NA       4 WEEK 4   NA
#>  5 CDISC01 01-701-1015     5 DIABP        2    80       6 WEEK 6   LOCF
#>  6 CDISC01 01-701-1015     5 DIABP        2    NA       6 WEEK 6   NA
#>  7 CDISC01 01-701-1015     1 PULSE        1    65       0 BASELINE NA
#>  8 CDISC01 01-701-1015    NA PULSE       NA    65       6 WEEK 6   LOCF
#>  9 CDISC01 01-701-1015     6 SYSBP        3   130       0 BASELINE NA
#> 10 CDISC01 01-701-1015     7 SYSBP        3   132       2 WEEK 2   NA
#> 11 CDISC01 01-701-1015    NA SYSBP       NA   132       4 WEEK 4   LOCF
#> 12 CDISC01 01-701-1015    NA SYSBP       NA   132       6 WEEK 6   LOCF