Test Character Variables for Dates and Times

testCharDateTime(x, p = 0.5, m = 0, convert = FALSE, existing = FALSE)

Arguments

x

input vector of any type, but interesting cases are for character x

p

minimum proportion of non-missing, non-blank values of x that must match one of the formats described under Details before x is considered to be of that type

m

if greater than 0, a test is applied: the number of distinct illegal values of x (values containing a letter or underscore) must not exceed m, or type character will be returned. p is set to 1.0 when m > 0.

convert

set to TRUE to convert the variable under the dominant format. If all values are NA, type will be set to 'character'.

existing

set to TRUE to return a character string with the current type of variable without examining pattern matches
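
The count used by the m test can be pictured with a few lines of base R. This is only a sketch that takes the description of m literally (a value is illegal if it contains a letter or an underscore); n_illegal is a hypothetical helper, not part of the package, and how three-letter month names are treated in the real count is not stated here, so the sketch ignores that case.

```r
# Hypothetical helper, not part of Hmisc: count the distinct values that
# contain a letter or an underscore, following the description of m.
n_illegal <- function(x) {
  x <- trimws(x)
  x <- x[!is.na(x) & x != ""]                 # drop missing and blank values
  length(unique(x[grepl("[A-Za-z_]", x)]))    # distinct "illegal" values
}

x <- c(paste0("2023-03-0", 1:9), "a", "a", "a", "b")
n_illegal(x)   # 2, the distinct illegal values being "a" and "b"
```

Under the rule as documented, this vector would pass the test with m = 2 but would be returned as type character with m = 1.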

Value

If convert=FALSE, a single character string with the type of x: "character", "datetime", "date", or "time" (or "notcharacter" for other non-character input; see Details). If convert=TRUE, a list with components named type, x (converted to POSIXct, Date, or chron times format), and numna, the number of originally non-NA values of x that could not be converted to the predominant format. If there were any non-convertible dates/times, the returned vector is given an additional class special.miss and an attribute special.miss, which is a list with the original character values (codes) and observation numbers (obs). These are summarized by describe().

Details

If x is already a date-time, date, or time variable, its type is returned when convert=FALSE; when convert=TRUE, a list with that type, the original vector, and numna=0 is returned. Otherwise, if x is not a character vector, a type of notcharacter is returned, or a list that includes the original x and type='notcharacter'.

When x is character, the main logic is applied. Blank values of x (after trimming) are set to NA before proceeding. The default logic (when m=0) considers x a date-time variable when its format is YYYY-MM-DD HH:MM:SS (:SS is optional) in at least a proportion p (1/2 by default) of the non-missing observations. It is considered a date if its format is YYYY-MM-DD, MM/DD/YYYY, or DD-MMM-YYYY (MMM = 3-letter month) in at least that proportion of the non-missing observations. A time variable has the format HH:MM:SS or HH:MM.
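
The proportion rule for character input can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: classify_sketch and its regular expressions are hypothetical, it only returns a type (no conversion), and it ignores m. It uses >= p because the examples on this page classify a vector with exactly half of its values matching as a date.

```r
# Hypothetical helper, not part of Hmisc: classify a character vector by the
# share of non-missing, non-blank values that match each documented format.
classify_sketch <- function(x, p = 0.5) {
  x <- trimws(x)
  x[x == ""] <- NA                       # blank values are treated as missing
  x <- x[!is.na(x)]
  if (length(x) == 0) return("character")

  ymd     <- "[0-9]{4}-[0-9]{1,2}-[0-9]{1,2}"         # YYYY-MM-DD
  time_re <- "[0-9]{1,2}:[0-9]{2}(:[0-9]{2})?"        # HH:MM, optional :SS
  patterns <- c(
    datetime = paste0("^", ymd, " ", time_re, "$"),
    date     = paste0("^(", ymd,
                      "|[0-9]{1,2}/[0-9]{1,2}/[0-9]{4}",        # MM/DD/YYYY
                      "|[0-9]{1,2}-[A-Za-z]{3}-[0-9]{4})$"),    # DD-MMM-YYYY
    time     = paste0("^", time_re, "$"))

  # Proportion of values matching each pattern; first one reaching p wins
  prop <- vapply(patterns, function(re) mean(grepl(re, x)), numeric(1))
  hit  <- names(prop)[prop >= p]
  if (length(hit) == 0) "character" else hit[1]
}

classify_sketch(c("2023-03-11", "2023-04-11", "a", "b", "c"))  # "character"
classify_sketch(c("2023-03-11", "2023-04-11", "a", "b"))       # "date"
classify_sketch(c("2023-03-11 11:12", "2023-04-11 11:13", "a", "b"))  # "datetime"
```

The sketch only checks the shape of each value, so a string such as 2023-13-45 would still count as a date here; whether a value can actually be converted is a separate question, which the real function reports through numna.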

Author

Frank Harrell

Examples

for(conv in c(FALSE, TRUE)) {
  print(testCharDateTime(c('2023-03-11', '2023-04-11', 'a', 'b', 'c'), convert=conv))
  print(testCharDateTime(c('2023-03-11', '2023-04-11', 'a', 'b'), convert=conv))
  print(testCharDateTime(c('2023-03-11 11:12:13', '2023-04-11 11:13:14', 'a', 'b'), convert=conv))
  print(testCharDateTime(c('2023-03-11 11:12', '2023-04-11 11:13', 'a', 'b'), convert=conv))
  print(testCharDateTime(c('3/11/2023', '4/11/2023', 'a', 'b'), convert=conv))
}
#> [1] "character"
#> [1] "date"
#> [1] "datetime"
#> [1] "datetime"
#> [1] "date"
#> $type
#> [1] "character"
#> 
#> $x
#> [1] "2023-03-11" "2023-04-11" "a"          "b"          "c"         
#> 
#> $numna
#> [1] 0
#> 
#> $type
#> [1] "date"
#> 
#> $x
#> [1] 2023-03-11 2023-04-11 a          b         
#> 
#> $numna
#> [1] 2
#> 
#> $type
#> [1] "datetime"
#> 
#> $x
#> [1] 2023-03-11 11:12:13 2023-04-11 11:13:14 a                  
#> [4] b                  
#> 
#> $numna
#> [1] 2
#> 
#> $type
#> [1] "datetime"
#> 
#> $x
#> [1] 2023-03-11 11:12:00 2023-04-11 11:13:00 a                  
#> [4] b                  
#> 
#> $numna
#> [1] 2
#> 
#> $type
#> [1] "date"
#> 
#> $x
#> [1] 2023-03-11 2023-04-11 a          b         
#> 
#> $numna
#> [1] 2
#> 
x <- c(paste0('2023-03-0', 1:9), 'a', 'a', 'a', 'b')
y <- testCharDateTime(x, convert=TRUE)$x
describe(y)  # note counts of special missing values a, b
#> y 
#>          n    missing          a          b   distinct       Info       Mean 
#>          9          4          3          1          9          1 2023-03-05 
#>    pMedian        Gmd 
#>      19421      3.333 
#>                                                                             
#> Value      2023-03-01 2023-03-02 2023-03-03 2023-03-04 2023-03-05 2023-03-06
#> Frequency           1          1          1          1          1          1
#> Proportion      0.111      0.111      0.111      0.111      0.111      0.111
#>                                            
#> Value      2023-03-07 2023-03-08 2023-03-09
#> Frequency           1          1          1
#> Proportion      0.111      0.111      0.111
#> 
#> For the frequency table, variable is rounded to the nearest 0