Used in textrecipes::step_tokenize_sentencepiece() and
textrecipes::step_tokenize_bpe().
vocabulary_size(range = c(1000L, 32000L), trans = NULL)
range: A two-element vector holding the defaults for the smallest and largest possible values, respectively. If a transformation is specified, these values should be in the transformed units.
trans: A trans object from the scales package, such as
scales::transform_log10() or scales::transform_reciprocal(). If not provided,
the default is used, which matches the units used in range. Use NULL for no
transformation.
vocabulary_size()
#> # Unique Tokens in Vocabulary (quantitative)
#> Range: [1000, 32000]
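The following is a minimal sketch of how range and trans interact, assuming the dials and scales packages are installed and that scales is recent enough to provide transform_log10(). The specific bounds are illustrative choices, not recommended values.

```r
library(dials)

# With trans = NULL (the default), the bounds are plain token counts.
small_vocab <- vocabulary_size(range = c(2000L, 8000L))
small_vocab

# With a log10 transformation, the bounds are given in log10 units,
# so c(3, 4) is intended to cover roughly 1,000 to 10,000 tokens.
log_vocab <- vocabulary_size(
  range = c(3, 4),
  trans = scales::transform_log10()
)
log_vocab
```

Specifying the range on a log scale can be useful when candidate vocabulary sizes should be spread across orders of magnitude rather than spaced evenly.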