testCharDateTime.Rd
Test Character Variables for Dates and Times
testCharDateTime(x, p = 0.5, m = 0, convert = FALSE, existing = FALSE)
x: input vector of any type; the interesting cases are character x
p: minimum proportion of non-missing, non-blank values of x that must match one of the formats described in the details before x is considered to be of that type
m: if greater than 0, a test is applied: the number of distinct illegal values of x (values containing a letter or underscore) must not exceed m, or type character will be returned. p is set to 1.0 when m > 0.
convert: set to TRUE to convert the variable under the dominant format. If all values are NA, type will be set to 'character'.
existing: set to TRUE to return a character string with the current type of the variable without examining pattern matches
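The m rule can be illustrated with a minimal base R sketch. The helper name exceeds_illegal_limit is hypothetical and not part of the documented function; it takes the description literally (any value containing a letter or underscore counts as illegal), so it would also flag a value such as 01-Jan-2023. How the actual function treats that case is not stated here.

```r
# Hypothetical helper (not part of the documented API): counts the distinct
# values containing a letter or underscore and compares that count to m.
exceeds_illegal_limit <- function(x, m) {
  x <- trimws(as.character(x))
  x <- x[!is.na(x) & x != ""]                   # drop missing and blank values
  illegal <- unique(x[grepl("[A-Za-z_]", x)])   # distinct illegal values
  length(illegal) > m
}

x <- c("2023-03-01", "2023-03-02", "a", "a", "b")
exceeds_illegal_limit(x, m = 1)   # "a" and "b" are two distinct illegal values
exceeds_illegal_limit(x, m = 2)
```

Under this reading, repeated occurrences of the same illegal code count only once toward m.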
If convert=FALSE, a single character string with the type of x: "character", "datetime", "date", or "time". If convert=TRUE, a list with components named type, x (converted to POSIXct, Date, or chron times format), and numna, the number of originally non-NA values of x that could not be converted to the predominant format. If there were any non-convertible dates/times, the returned vector is given an additional class special.miss and an attribute special.miss, which is a list with the original character values (codes) and observation numbers (obs). These are summarized by describe().
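The shape of the converted result can be sketched in base R. This is an illustration under stated assumptions, not the function's implementation: it handles only the YYYY-MM-DD format, and the ordering of the class vector is a guess. Only the component names codes and obs and the name special.miss come from the description above.

```r
x <- c("2023-03-01", "2023-03-02", "a", "2023-03-04", "b")

# Values that do not match YYYY-MM-DD are treated as non-convertible
ok <- grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", x)
converted <- as.Date(ifelse(ok, x, NA_character_), format = "%Y-%m-%d")

# Originally non-NA values that could not be converted
bad   <- which(!is.na(x) & !ok)
numna <- length(bad)

# Mimic the documented attribute: original codes and observation numbers
attr(converted, "special.miss") <- list(codes = x[bad], obs = bad)
class(converted) <- c("special.miss", class(converted))  # assumed ordering

numna
attr(converted, "special.miss")
```

Keeping the original codes alongside their positions is what lets a summary report how often each non-convertible value occurred.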
For a vector x, if it is already a date-time, date, or time variable, the type is returned if convert=FALSE; if convert=TRUE, a list with that type, the original vector, and numna=0 is returned. Otherwise, if x is not a character vector, a type of notcharacter is returned, or, if convert=TRUE, a list that includes the original x and type='notcharacter'.

When x is character, the main logic is applied. The default logic (when m=0) is to consider x a date-time variable when its format is YYYY-MM-DD HH:MM:SS (:SS is optional) in at least a proportion p (by default 1/2) of the non-missing observations. It is considered to be a date if its format is YYYY-MM-DD, MM/DD/YYYY, or DD-MMM-YYYY (MMM = 3-letter month) in at least that proportion of the non-missing observations. A time variable has the format HH:MM:SS or HH:MM. Blank values of x (after trimming) are set to NA before proceeding.
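The proportion rule for a single format can be sketched as follows. The helper looks_like_iso_date is hypothetical and checks only YYYY-MM-DD; the documented function considers several formats and picks the dominant one, which this sketch does not attempt.

```r
# Hypothetical helper illustrating the proportion rule for one format only
looks_like_iso_date <- function(x, p = 0.5) {
  x <- trimws(as.character(x))
  x[!is.na(x) & x == ""] <- NA        # blank values become NA
  x <- x[!is.na(x)]                   # keep non-missing observations
  if (length(x) == 0) return(FALSE)
  # proportion of non-missing values matching the format must reach p
  mean(grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", x)) >= p
}

looks_like_iso_date(c("2023-03-11", "2023-04-11", "a", "b", "c"))  # 2 of 5 match
looks_like_iso_date(c("2023-03-11", "2023-04-11", "a", "b"))       # 2 of 4 match
```

With the default p = 0.5, the first call falls short of the threshold and the second just meets it.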
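library(Hmisc)  # provides testCharDateTime() and describe()

# Run each case first with convert=FALSE (type only), then with convert=TRUE
for(conv in c(FALSE, TRUE)) {
  # 2 of 5 values are dates: below the default p = 0.5
  print(testCharDateTime(c('2023-03-11', '2023-04-11', 'a', 'b', 'c'), convert=conv))
  # 2 of 4 values are dates
  print(testCharDateTime(c('2023-03-11', '2023-04-11', 'a', 'b'), convert=conv))
  # date-times with and without seconds
  print(testCharDateTime(c('2023-03-11 11:12:13', '2023-04-11 11:13:14', 'a', 'b'), convert=conv))
  print(testCharDateTime(c('2023-03-11 11:12', '2023-04-11 11:13', 'a', 'b'), convert=conv))
  # MM/DD/YYYY dates
  print(testCharDateTime(c('3/11/2023', '4/11/2023', 'a', 'b'), convert=conv))
}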
#> [1] "character"
#> [1] "date"
#> [1] "datetime"
#> [1] "datetime"
#> [1] "date"
#> $type
#> [1] "character"
#>
#> $x
#> [1] "2023-03-11" "2023-04-11" "a" "b" "c"
#>
#> $numna
#> [1] 0
#>
#> $type
#> [1] "date"
#>
#> $x
#> [1] 2023-03-11 2023-04-11 a b
#>
#> $numna
#> [1] 2
#>
#> $type
#> [1] "datetime"
#>
#> $x
#> [1] 2023-03-11 11:12:13 2023-04-11 11:13:14 a
#> [4] b
#>
#> $numna
#> [1] 2
#>
#> $type
#> [1] "datetime"
#>
#> $x
#> [1] 2023-03-11 11:12:00 2023-04-11 11:13:00 a
#> [4] b
#>
#> $numna
#> [1] 2
#>
#> $type
#> [1] "date"
#>
#> $x
#> [1] 2023-03-11 2023-04-11 a b
#>
#> $numna
#> [1] 2
#>
x <- c(paste0('2023-03-0', 1:9), 'a', 'a', 'a', 'b')
y <- testCharDateTime(x, convert=TRUE)$x
describe(y) # note counts of special missing values a, b
#> y
#>        n  missing        a        b distinct     Info       Mean
#>        9        4        3        1        9        1 2023-03-05
#>  pMedian      Gmd
#>    19421    3.333
#>
#> Value      2023-03-01 2023-03-02 2023-03-03 2023-03-04 2023-03-05 2023-03-06
#> Frequency           1          1          1          1          1          1
#> Proportion      0.111      0.111      0.111      0.111      0.111      0.111
#>
#> Value      2023-03-07 2023-03-08 2023-03-09
#> Frequency           1          1          1
#> Proportion      0.111      0.111      0.111
#>
#> For the frequency table, variable is rounded to the nearest 0