The purpose of the sdtmchecks package is to help detect and investigate potential analysis-relevant issues in SDTM data. This is done using a set of data check functions, which are intended to be generalizable, actionable, and meaningful for analysis.

To start using sdtmchecks, first install the latest version from GitHub:

devtools::install_github("pharmaverse/sdtmchecks")

Then load the package:

library(sdtmchecks) 

To access the help page for the package:

# type ??sdtmchecks into the console
??sdtmchecks 

The package comes with the sdtmchecksmeta dataset, which contains metadata on each check function, including the function name, title, category, priority, and the domains it uses.

Each function is assigned a Category (Cross Therapeutic Area, Oncology, Covid-19, Patient Reported Outcomes, or Ophthalmology; coded in the metadata as ALL, ONC, COVID, PRO, and OPHTH) and a Priority (High, Medium, or Low).

# Print the metadata
sdtmchecksmeta
## # A tibble: 10 × 5
##    check                              title              categ…¹ prior…² domains
##    <chr>                              <chr>              <chr>   <chr>   <chr>  
##  1 check_ae_aeacn_ds_disctx_covid     COVID AE trt disc… COVID   Low     ae, ds 
##  2 check_ae_aeacnoth                  AE AEACNOTH multi… ALL     Low     ae     
##  3 check_ae_aeacnoth_ds_stddisc_covid COVID AE study di… COVID   Low     ae, ds 
##  4 check_ae_aedecod                   AE Missing PT      ALL     High    ae     
##  5 check_ae_aedthdtc_aesdth           AE Death Date vs … ALL     High    ae     
##  6 check_ae_aedthdtc_ds_death         DS Death Dates in… ALL     High    ae, ds 
##  7 check_ae_aelat                     AE AELAT Missing   OPHTH   High    ae     
##  8 check_ae_aeout                     AE Death Outcome   ALL     High    ae     
##  9 check_ae_aeout_aeendtc_aedthdtc    Fatal AE Resoluti… ALL     High    ae     
## 10 check_ae_aerel                     AE AEREL           ALL     Medium  ae     
## # … with abbreviated variable names ¹​category, ²​priority
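
Because the metadata is an ordinary data frame, it can be subset with base R before deciding which checks to look at. The sketch below does not load sdtmchecks; it uses a small hand-built stand-in (the hypothetical `meta` object) that copies four rows and three columns from the printout above, so it only illustrates the idea.

```r
# Hypothetical stand-in for a few rows of sdtmchecksmeta (values copied from
# the printout above; the real dataset has more rows and columns).
meta <- data.frame(
  check    = c("check_ae_aeacnoth", "check_ae_aedecod",
               "check_ae_aelat", "check_ae_aerel"),
  category = c("ALL", "ALL", "OPHTH", "ALL"),
  priority = c("Low", "High", "High", "Medium"),
  stringsAsFactors = FALSE
)

# Keep only high-priority, cross-therapeutic-area checks
high_all <- subset(meta, priority == "High" & category == "ALL")
print(high_all$check)
```

The same `subset()` call should work on the real sdtmchecksmeta object, assuming its columns are named as shown in the printout.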

Here is an example using check_ae_ds_partial_death_dates(AE, DS).

This check flags records with partial death dates (i.e. fewer than 10 characters) in AE and DS. If any are found, the check returns FALSE with two attributes: "data", containing the flagged records, and "msg", a brief message explaining the result. If no issues are detected, the check returns TRUE.

# Use sample data frames.
AE
##   USUBJID AEDECOD   AEDTHDTC
## 1       1     AE1 2017-01-01
## 2       2     AE2       2017
## 3       3     AE3       <NA>
DS
##   USUBJID       DSSCAT DSDECOD    DSSTDTC
## 1       4 STUDY DISCON   DEATH 2018-01-01
## 2       5 STUDY DISCON   DEATH 2017-03-03
## 3       6 STUDY DISCON   DEATH 2018-01-02
## 4       7 STUDY DISCON   DEATH    2016-10
# Run the data check.
check_ae_ds_partial_death_dates(AE,DS)
## [1] FALSE
## attr(,"msg")
## [1] "There are 2 patients with partial death dates. "
## attr(,"data")
##   USUBJID       DSSCAT DSDECOD DSSTDTC AEDECOD AEDTHDTC
## 1       2         <NA>    <NA>    <NA>     AE2     2017
## 2       7 STUDY DISCON   DEATH 2016-10    <NA>     <NA>
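
The "length < 10" rule can be illustrated in base R. This is a minimal sketch of the idea, not the package's implementation: it assumes a complete ISO 8601 date has 10 characters (YYYY-MM-DD) and that missing dates are not counted as partial, which matches the output above. The `is_partial` vector is a hypothetical name used only here.

```r
# Same AE sample data as above, rebuilt so the snippet is self-contained
AE <- data.frame(
  USUBJID  = 1:3,
  AEDECOD  = c("AE1", "AE2", "AE3"),
  AEDTHDTC = c("2017-01-01", "2017", NA),
  stringsAsFactors = FALSE
)

# A date is "partial" if it is present but shorter than 10 characters
is_partial <- !is.na(AE$AEDTHDTC) & nchar(AE$AEDTHDTC) < 10

# Show the flagged records
print(AE[is_partial, ])
```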

To run all the checks on your data, use the run_all_checks function.

This function assumes you have all of your SDTM datasets as objects in your global environment, e.g. ae, dm, ex, etc.

# Read data to your global environment
AE = haven::read_sas("path/to/ae.sas7bdat")
DS = haven::read_sas("path/to/ds.sas7bdat")

# Run the checks and save as an object called "myreport"
myreport = run_all_checks(metads = sdtmchecksmeta,
               priority = c("High", "Medium", "Low"), # subset checks based on priority
               type = c("ALL", "ONC", "COVID", "PRO", "OPHTH"), # subset checks based on category
               verbose = TRUE)

class(myreport) #results in a list object
names(myreport) #each check result is saved in a slot of the list
myreport[["check_ae_aedecod"]] #investigate the results of a check
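
Once the report is a named list, the failed checks and their messages can be pulled out with base R. The sketch below uses a hand-built stand-in for `myreport` so that it runs without sdtmchecks. It assumes each slot is either TRUE or a FALSE carrying "msg" and "data" attributes, as in the single-check output shown earlier; a real report may contain other kinds of results (for example, checks that could not be run), which this sketch does not handle.

```r
# Hypothetical stand-in for the list returned by run_all_checks()
myreport <- list(
  check_ae_aedecod = TRUE,
  check_ae_ds_partial_death_dates = structure(
    FALSE,
    msg  = "There are 2 patients with partial death dates. ",
    data = data.frame(USUBJID = c(2, 7))
  )
)

# Keep only the checks that did not pass
failed <- Filter(function(res) !isTRUE(res), myreport)

# Collect the explanatory message attached to each failed check
msgs <- vapply(failed, function(res) attr(res, "msg"), character(1))

print(names(failed))
print(msgs)
```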

The run_all_checks function also lets you subset on category or priority:

myreport=run_all_checks(metads = sdtmchecksmeta,
               priority = c("High"),
               type = c("ONC"),
               verbose = TRUE)

To get started, here is a set of checks that should work fairly well for most datasets:

# Read data to your global environment
AE = haven::read_sas("path/to/ae.sas7bdat")
CM = haven::read_sas("path/to/cm.sas7bdat")
DM = haven::read_sas("path/to/dm.sas7bdat")

# Subset to checks that should work OK for most datasets
# (%>% and filter() come from dplyr, so load it first)
library(dplyr)
metads = sdtmchecksmeta %>%
  filter(check %in% c("check_ae_aedecod",
                      "check_ae_aetoxgr",
                      "check_ae_dup",
                      "check_cm_cmdecod",
                      "check_cm_missing_month",
                      "check_dm_age_missing",
                      "check_dm_usubjid_dup",
                      "check_dm_armcd"
                      ))

# Run only the selected checks
myreport = run_all_checks(metads = metads,
               verbose = TRUE)