getting_started.Rmd
The purpose of the sdtmchecks package is to help detect and investigate potential analysis-relevant issues in SDTM data. This is done using a set of data check functions. These check functions are intended to be generalizable, actionable, and meaningful for analysis.
Install sdtmchecks:
# First install the latest version from GitHub
devtools::install_github("pharmaverse/sdtmchecks")
library(sdtmchecks)
# Type ??sdtmchecks into the console
??sdtmchecks
The package includes sdtmchecksmeta, a dataset that contains metadata on each check function. It contains details such as function name, category, priority, and description.
Each function is given a Category (Cross Therapeutic Area, Oncology, Covid-19, Patient Reported Outcomes, Ophthalmology) and a Priority (High, Medium, Low).
# Just type this in
sdtmchecksmeta
## # A tibble: 10 × 5
##    check                              title              categ…¹ prior…² domains
##    <chr>                              <chr>              <chr>   <chr>   <chr>
##  1 check_ae_aeacn_ds_disctx_covid     COVID AE trt disc… COVID   Low     ae, ds
##  2 check_ae_aeacnoth                  AE AEACNOTH multi… ALL     Low     ae
##  3 check_ae_aeacnoth_ds_stddisc_covid COVID AE study di… COVID   Low     ae, ds
##  4 check_ae_aedecod                   AE Missing PT      ALL     High    ae
##  5 check_ae_aedthdtc_aesdth           AE Death Date vs … ALL     High    ae
##  6 check_ae_aedthdtc_ds_death         DS Death Dates in… ALL     High    ae, ds
##  7 check_ae_aelat                     AE AELAT Missing   OPHTH   High    ae
##  8 check_ae_aeout                     AE Death Outcome   ALL     High    ae
##  9 check_ae_aeout_aeendtc_aedthdtc    Fatal AE Resoluti… ALL     High    ae
## 10 check_ae_aerel                     AE AEREL           ALL     Medium  ae
## # … with abbreviated variable names ¹category, ²priority
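As a rough illustration of how this metadata can be explored, the sketch below subsets and tabulates a small stand-in data frame in base R. The name `meta_demo` is hypothetical and holds only four rows copied from the printed output above; with the package loaded, the same subsetting should apply to sdtmchecksmeta itself, assuming its columns are named as printed.

```r
# Stand-in for a few rows of sdtmchecksmeta, copied from the printed output.
# `meta_demo` is a hypothetical name; the real dataset ships with the package.
meta_demo <- data.frame(
  check    = c("check_ae_aeacnoth", "check_ae_aedecod",
               "check_ae_aelat", "check_ae_aerel"),
  category = c("ALL", "ALL", "OPHTH", "ALL"),
  priority = c("Low", "High", "High", "Medium"),
  domains  = c("ae", "ae", "ae", "ae"),
  stringsAsFactors = FALSE
)

# Keep only the high-priority checks
high_priority <- meta_demo[meta_demo$priority == "High", ]

# Count checks by category and priority
counts <- table(meta_demo$category, meta_demo$priority)

print(high_priority$check)
print(counts)
```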
check_ae_ds_partial_death_dates(AE,DS)
This check flags records with partial death dates (i.e. length < 10) in AE and DS. If any are found, the check returns FALSE with attributes containing a list of flagged records ("data") as well as a brief message explaining the result ("msg"). If no issues are detected, the check returns TRUE.
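The length rule can be sketched in base R as follows. This is only an illustration of the idea, not the package's implementation: `is_partial_date` is a hypothetical helper, and the data frames are rebuilt by hand to mirror the sample AE and DS data used in the example that follows.

```r
# Hand-built copies of the sample data frames used in this example
AE <- data.frame(
  USUBJID  = 1:3,
  AEDECOD  = c("AE1", "AE2", "AE3"),
  AEDTHDTC = c("2017-01-01", "2017", NA),
  stringsAsFactors = FALSE
)
DS <- data.frame(
  USUBJID = 4:7,
  DSSCAT  = "STUDY DISCON",
  DSDECOD = "DEATH",
  DSSTDTC = c("2018-01-01", "2017-03-03", "2018-01-02", "2016-10"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE where a date string is present but shorter
# than a complete YYYY-MM-DD date (10 characters). Missing dates are not flagged.
is_partial_date <- function(x) !is.na(x) & nchar(x) < 10

# Subjects with a partial death date in each domain
partial_ae <- AE$USUBJID[is_partial_date(AE$AEDTHDTC)]
partial_ds <- DS$USUBJID[is_partial_date(DS$DSSTDTC)]

print(partial_ae)
print(partial_ds)
```

Under this rule the year-only value in AE and the year-month value in DS are flagged, while the missing AEDTHDTC is ignored.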
# Use sample data frames.
AE
##   USUBJID AEDECOD   AEDTHDTC
## 1       1     AE1 2017-01-01
## 2       2     AE2       2017
## 3       3     AE3       <NA>
DS
##   USUBJID       DSSCAT DSDECOD    DSSTDTC
## 1       4 STUDY DISCON   DEATH 2018-01-01
## 2       5 STUDY DISCON   DEATH 2017-03-03
## 3       6 STUDY DISCON   DEATH 2018-01-02
## 4       7 STUDY DISCON   DEATH    2016-10
# Run the data check.
check_ae_ds_partial_death_dates(AE,DS)
## [1] FALSE
## attr(,"msg")
## [1] "There are 2 patients with partial death dates. "
## attr(,"data")
##   USUBJID       DSSCAT DSDECOD DSSTDTC AEDECOD AEDTHDTC
## 1       2         <NA>    <NA>    <NA>     AE2     2017
## 2       7 STUDY DISCON   DEATH 2016-10    <NA>     <NA>
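The pieces of a result like the one above can be pulled apart with base R's `attr()`. The sketch below rebuilds a comparable object by hand (the name `res` and the reduced set of columns are assumptions for illustration) so that it runs without the package installed; with real data, `res` would be the value returned by the check function.

```r
# Stand-in for the value returned by a failing check, built by hand.
# Only a subset of the columns from the printed output is reproduced.
res <- structure(
  FALSE,
  msg  = "There are 2 patients with partial death dates. ",
  data = data.frame(
    USUBJID  = c(2, 7),
    DSSTDTC  = c(NA, "2016-10"),
    AEDTHDTC = c("2017", NA),
    stringsAsFactors = FALSE
  )
)

if (isTRUE(res)) {
  message("No issues found")
} else {
  message(attr(res, "msg"))      # brief explanation of the result
  flagged <- attr(res, "data")   # flagged records as a data frame
  print(flagged$USUBJID)
}
```

Because the result carries attributes, `identical(res, FALSE)` is not a reliable test; `isTRUE()` and `isFALSE()` look only at the logical value.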
To run all checks at once, use the run_all_checks function.
This function assumes you have all of your SDTM datasets as objects in your global environment, e.g. ae, dm, ex, etc.
# Read data into your global environment (lowercase object names, e.g. ae, ds)
ae = haven::read_sas("path/to/ae.sas7bdat")
ds = haven::read_sas("path/to/ds.sas7bdat")

# Run the checks and save as an object called "myreport"
myreport = run_all_checks(metads = sdtmchecksmeta,
                          priority = c("High", "Medium", "Low"),            # subset checks based on priority
                          type = c("ALL", "ONC", "COVID", "PRO", "OPHTH"),  # subset checks based on category
                          verbose = TRUE)

class(myreport)                 # results in a list object
names(myreport)                 # each check result is saved in a slot of the list
myreport[["check_ae_aedecod"]]  # investigate the results of a check
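Since run_all_checks assumes the datasets already exist as objects, it can help to see which domains are absent before running it. The sketch below is one possible pre-flight check in base R; `missing_domains` is a hypothetical helper, not part of sdtmchecks, and it assumes domain lists are comma-separated strings like those in the domains column of sdtmchecksmeta.

```r
# Hypothetical pre-flight helper: return the domains named in a
# comma-separated `domains` column that have no matching object
# in the given environment
missing_domains <- function(domains, envir = globalenv()) {
  needed <- unique(trimws(unlist(strsplit(domains, ","))))
  found  <- vapply(needed, exists, logical(1), envir = envir, inherits = FALSE)
  needed[!found]
}

# Demo in a scratch environment so the global environment is untouched
scratch <- new.env()
assign("ae", data.frame(), envir = scratch)

absent <- missing_domains(c("ae, ds", "ae"), envir = scratch)
print(absent)
```

With the package loaded, a call along the lines of `missing_domains(sdtmchecksmeta$domains)` would report domains that have not been read in yet.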
The run_all_checks function lets you easily subset on category or priority:
myreport = run_all_checks(metads = sdtmchecksmeta,
                          priority = c("High"),
                          type = c("ONC"),
                          verbose = TRUE)
library(dplyr)  # provides %>% and filter()

# Read data into your global environment (lowercase object names, e.g. ae, cm, dm)
ae = haven::read_sas("path/to/ae.sas7bdat")
cm = haven::read_sas("path/to/cm.sas7bdat")
dm = haven::read_sas("path/to/dm.sas7bdat")

# Subset to checks that should work OK for most datasets
metads = sdtmchecksmeta %>%
  filter(check %in% c("check_ae_aedecod",
                      "check_ae_aetoxgr",
                      "check_ae_dup",
                      "check_cm_cmdecod",
                      "check_cm_missing_month",
                      "check_dm_age_missing",
                      "check_dm_usubjid_dup",
                      "check_dm_armcd"))

myreport = run_all_checks(metads = metads,
                          verbose = TRUE)