Getting Started

The purpose of the sdtmchecks package is to help detect and investigate potential analysis-relevant issues in SDTM data. It does this with a set of data check functions, which are intended to be generalizable, actionable, and meaningful for analysis.

To start using `sdtmchecks`, first install the latest version from GitHub:

devtools::install_github("pharmaverse/sdtmchecks")

Then load the package:

library(sdtmchecks)

Here’s how to access the help page for the package:

# type ??sdtmchecks into the console
??sdtmchecks

The package comes with the `sdtmchecksmeta` dataset, which contains metadata on each check function: details such as function name, category, priority, and descriptions.

Each function is assigned a category (Cross Therapeutic Area, Oncology, Covid-19, Patient Reported Outcomes, or Ophthalmology, which appear in the data as the codes `ALL`, `ONC`, `COVID`, `PRO`, and `OPHTH`) and a priority (High, Medium, or Low).

# Print the metadata
sdtmchecksmeta

## # A tibble: 10 × 5
##    check                              title              categ…¹ prior…² domains
##    <chr>                              <chr>              <chr>   <chr>   <chr>  
##  1 check_ae_aeacn_ds_disctx_covid     COVID AE trt disc… COVID   Low     ae, ds 
##  2 check_ae_aeacnoth                  AE AEACNOTH multi… ALL     Low     ae     
##  3 check_ae_aeacnoth_ds_stddisc_covid COVID AE study di… COVID   Low     ae, ds 
##  4 check_ae_aedecod                   AE Missing PT      ALL     High    ae     
##  5 check_ae_aedthdtc_aesdth           AE Death Date vs … ALL     High    ae     
##  6 check_ae_aedthdtc_ds_death         DS Death Dates in… ALL     High    ae, ds 
##  7 check_ae_aelat                     AE AELAT Missing   OPHTH   High    ae     
##  8 check_ae_aeout                     AE Death Outcome   ALL     High    ae     
##  9 check_ae_aeout_aeendtc_aedthdtc    Fatal AE Resoluti… ALL     High    ae     
## 10 check_ae_aerel                     AE AEREL           ALL     Medium  ae     
## # … with abbreviated variable names ¹category, ²priority
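
Because the metadata is an ordinary data frame, it can be subset like any other. The sketch below is only illustrative: `meta` is a hand-built stand-in that copies three columns from a few of the rows printed above, so the snippet runs without the package installed. With the package loaded, the same subsetting should apply to `sdtmchecksmeta` itself.

```r
# Stand-in for sdtmchecksmeta (assumption: built by hand from rows printed above)
meta <- data.frame(
  check    = c("check_ae_aeacn_ds_disctx_covid", "check_ae_aeacnoth",
               "check_ae_aedecod", "check_ae_aelat", "check_ae_aerel"),
  category = c("COVID", "ALL", "ALL", "OPHTH", "ALL"),
  priority = c("Low", "Low", "High", "High", "Medium"),
  stringsAsFactors = FALSE
)

# High-priority checks that apply across therapeutic areas
high_all <- meta[meta$priority == "High" & meta$category == "ALL", ]
print(high_all$check)

# Number of checks at each priority level
priority_counts <- table(meta$priority)
print(priority_counts)
```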

Let’s walk through an example using `check_ae_ds_partial_death_dates(AE, DS)`.

This check flags records with partial death dates (i.e. fewer than 10 characters, so shorter than a full YYYY-MM-DD date) in AE and DS. If any are found, the check returns FALSE with attributes containing the flagged records (`data`) and a brief message explaining the result (`msg`). If no issues are detected, the check returns TRUE.
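
The length rule itself can be sketched in a few lines of base R. This is a rough illustration of the idea, not the package's implementation; `is_partial_date` is a hypothetical helper name, and the dates are the values used in the sample data for this example.

```r
# Hypothetical helper (not part of sdtmchecks): TRUE where a date is present
# but has fewer than 10 characters, i.e. is shorter than YYYY-MM-DD
is_partial_date <- function(x) {
  !is.na(x) & nchar(x) < 10
}

dates <- c("2017-01-01", "2017", NA, "2016-10")
partial <- is_partial_date(dates)
print(dates[partial])
```

Missing dates are deliberately not flagged here, since a missing value is a different issue from a partial one.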

# Use sample data frames.
AE

##   USUBJID AEDECOD   AEDTHDTC
## 1       1     AE1 2017-01-01
## 2       2     AE2       2017
## 3       3     AE3       <NA>

DS

##   USUBJID       DSSCAT DSDECOD    DSSTDTC
## 1       4 STUDY DISCON   DEATH 2018-01-01
## 2       5 STUDY DISCON   DEATH 2017-03-03
## 3       6 STUDY DISCON   DEATH 2018-01-02
## 4       7 STUDY DISCON   DEATH    2016-10

# Run the data check.
check_ae_ds_partial_death_dates(AE,DS)

## [1] FALSE
## attr(,"msg")
## [1] "There are 2 patients with partial death dates. "
## attr(,"data")
##   USUBJID       DSSCAT DSDECOD DSSTDTC AEDECOD AEDTHDTC
## 1       2         <NA>    <NA>    <NA>     AE2     2017
## 2       7 STUDY DISCON   DEATH 2016-10    <NA>     <NA>
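
The message and the flagged records travel as attributes on the returned logical, so they can be pulled out with `attr()`. The sketch below builds a stand-in result by hand that mirrors the printed output (with fewer columns), so it runs without the package; with real data, `res` would be the value returned by the check.

```r
# Stand-in for the check result printed above (assumption: built by hand)
res <- structure(
  FALSE,
  msg  = "There are 2 patients with partial death dates. ",
  data = data.frame(
    USUBJID  = c(2, 7),
    DSSTDTC  = c(NA, "2016-10"),
    AEDTHDTC = c("2017", NA),
    stringsAsFactors = FALSE
  )
)

# Extract the pieces of the result
flagged <- attr(res, "data")
msg     <- attr(res, "msg")

# Only report when the check did not pass
if (!isTRUE(res)) {
  message(msg)
  print(flagged$USUBJID)
}
```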

Running all the checks on your data is straightforward: use the `run_all_checks` function.

This function assumes that all of your SDTM datasets exist as objects in your global environment, e.g. `ae`, `dm`, `ex`, etc.

# Read data to your global environment (lowercase object names, e.g. ae, ds)
ae = haven::read_sas("path/to/ae.sas7bdat")
ds = haven::read_sas("path/to/ds.sas7bdat")

# Run the checks and save as an object called "myreport"
myreport = run_all_checks(metads = sdtmchecksmeta,
                          priority = c("High", "Medium", "Low"), # subset checks based on priority
                          type = c("ALL", "ONC", "COVID", "PRO", "OPHTH"), # subset checks based on category
                          verbose = TRUE)

class(myreport) #results in a list object
names(myreport) #each check result is saved in a slot of the list
myreport[["check_ae_aedecod"]] #investigate the results of a check
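
With many domains, the reading step can be looped rather than written out by hand. The following is a minimal sketch that assumes one `.sas7bdat` file per domain and lowercase object names such as `ae` and `ds`; the paths are placeholders, and the uppercase file name is a made-up case to show the normalization.

```r
# Placeholder paths (assumption: one file per SDTM domain)
paths <- c("path/to/ae.sas7bdat", "path/to/DS.sas7bdat")

# Derive lowercase object names from the file names: "ae", "ds"
domain_names <- tolower(tools::file_path_sans_ext(basename(paths)))

# Read each file that exists and assign it into the global environment
for (i in seq_along(paths)) {
  if (file.exists(paths[i])) {
    assign(domain_names[i], haven::read_sas(paths[i]), envir = globalenv())
  }
}
```

The `file.exists()` guard lets the loop skip missing files instead of stopping at the first one.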

The `run_all_checks` function lets you subset on category or priority:

myreport=run_all_checks(metads = sdtmchecksmeta,
               priority = c("High"),
               type = c("ONC"),
               verbose = TRUE)

Here’s a way to get started with a set of checks that should work fairly well for most datasets:

# Load dplyr for %>% and filter()
library(dplyr)

# Read data to your global environment (lowercase object names)
ae = haven::read_sas("path/to/ae.sas7bdat")
cm = haven::read_sas("path/to/cm.sas7bdat")
dm = haven::read_sas("path/to/dm.sas7bdat")

# Subset to checks that should work OK for most datasets
metads = sdtmchecksmeta %>%
  filter(check %in% c("check_ae_aedecod",
                      "check_ae_aetoxgr",
                      "check_ae_dup",
                      "check_cm_cmdecod",
                      "check_cm_missing_month",
                      "check_dm_age_missing",
                      "check_dm_usubjid_dup",
                      "check_dm_armcd"
                      ))

myreport=run_all_checks(metads = metads,
               verbose = TRUE)
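
When subsetting by name, a misspelled check name is silently dropped by the filter, so it can help to compare the requested names against the metadata first. This is a sketch only: `available` is a hand-written stand-in for `sdtmchecksmeta$check`, and the last entry of `wanted` is a deliberate typo for illustration.

```r
# Stand-in for sdtmchecksmeta$check (assumption: a few names from the metadata)
available <- c("check_ae_aedecod", "check_ae_aeout", "check_ae_aerel")

# Requested checks; "check_ae_aedecode" is a deliberate typo
wanted <- c("check_ae_aedecod", "check_ae_aerel", "check_ae_aedecode")

# Names that were requested but do not exist in the metadata
unknown <- setdiff(wanted, available)
if (length(unknown) > 0) {
  message("Not found in metadata: ", paste(unknown, collapse = ", "))
}
```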
