Re-encodes SGR and OSC encoded URL sequences into a unique decomposed form. Strings containing semantically identical SGR and OSC sequences that are encoded differently should compare equal after normalization.
Usage
normalize_state(
  x,
  warn = getOption("fansi.warn", TRUE),
  term.cap = getOption("fansi.term.cap", dflt_term_cap()),
  carry = getOption("fansi.carry", FALSE)
)
Arguments
- x
a character vector or object that can be coerced to such.
- warn
TRUE (default) or FALSE, whether to warn when potentially problematic Control Sequences are encountered. These could cause the assumptions fansi makes about how strings are rendered on your display to be incorrect, for example by moving the cursor (see ?fansi). At most one warning will be issued per element in each input vector. Will also warn about some badly encoded UTF-8 strings, but a lack of UTF-8 warnings is not a guarantee of correct encoding (use validUTF8 for that).
- term.cap
a character vector of the capabilities of the terminal, can be any combination of "bright" (SGR codes 90-97, 100-107), "256" (SGR codes starting with "38;5" or "48;5"), "truecolor" (SGR codes starting with "38;2" or "48;2"), and "all". "all" behaves as it does for the ctl parameter: "all" combined with any other value means all terminal capabilities except that one. fansi will warn if it encounters SGR codes that exceed the terminal capabilities specified (see term_cap_test for details). In versions prior to 1.0, fansi would also skip exceeding SGRs entirely instead of interpreting them. You may add the string "old" to any otherwise valid term.cap spec to restore the pre-1.0 behavior. "old" will not interact with "all" the way other valid values for this parameter do.
- carry
TRUE, FALSE (default), or a scalar string, controls whether to interpret the character vector as a "single document" (TRUE or string) or as independent elements (FALSE). In "single document" mode, active state at the end of an input element is considered active at the beginning of the next vector element, simulating what happens with a document with active state at the end of a line. If FALSE, each vector element is interpreted as if there were no active state when it begins. If character, then the active state at the end of the carry string is carried into the first element of x (see "Replacement Functions" for differences there). The carried state is injected in the interstice between an imaginary zeroth character and the first character of a vector element. See the "Position Semantics" section of substr_ctl and the "State Interactions" section of ?fansi for details. Except for strwrap_ctl, where NA is treated as the string "NA", carry will cause NAs in inputs to propagate through the remaining vector elements.
Details
Each compound SGR sequence is broken up into individual tokens, superfluous tokens are removed, and the SGR reset sequence "ESC[0m" (or "ESC[m") is replaced by the closing codes for whatever SGR styles are active at the point in the string in which it appears.
Unrecognized SGR codes will be dropped from the output with a warning. The
specific order of SGR codes associated with any given SGR sequence is not
guaranteed to remain the same across different versions of fansi, but
should remain unchanged except for the addition of previously uninterpreted
codes to the list of interpretable codes. There is no special significance
to the order the SGR codes are emitted in other than it should be consistent
for any given SGR state. URLs adjacent to SGR codes are always emitted after
the SGR codes irrespective of what side they were on originally.
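The first step described above, breaking a compound SGR sequence into individual tokens, can be illustrated with a small base R sketch. The helper split_sgr is hypothetical and is not part of fansi; it only splits on ";" and, unlike normalize_state, it neither reorders codes, removes superfluous ones, nor expands resets.

```r
# Hypothetical helper, NOT fansi's implementation: split each compound SGR
# sequence (e.g. "\033[42;33m") into one sequence per code, keeping the
# original order of the codes.
split_sgr <- function(x) {
  # ESC [ <digits> ; <remaining digits/semicolons> m
  pat <- "(\033\\[[0-9]*);([0-9;]*m)"
  # Each pass peels one code off the front of every compound sequence.
  while (any(grepl(pat, x, perl = TRUE))) {
    x <- gsub(pat, "\\1m\033[\\2", x, perl = TRUE)
  }
  x
}

res <- split_sgr("hello\033[42;33m world")
identical(res, "hello\033[42m\033[33m world")
```

Comparing this with the first entry in the Examples section shows the difference: normalize_state emits the foreground code before the background code, whereas this sketch keeps the input order.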
OSC encoded URL sequences are always terminated by "ESC]\", and those between abutting URLs are omitted. Identical abutting URLs are merged. In order for URLs to be considered identical both the URL and the "id" parameter must be specified and be the same. OSC URL parameters other than "id" are dropped with a warning.
The underlying assumption is that each element in the vector is
unaffected by SGR or OSC URLs in any other element or elsewhere. This may
lead to surprising outcomes if this assumption does not hold (see examples).
You may adjust this assumption with the carry parameter.
Normalization was implemented primarily for better compatibility with
crayon which emits SGR codes individually and assumes that each
opening code is paired up with its specific closing code, but it can also be
used to reduce the probability that strings processed with future versions of
fansi will produce different results than the current version.
See also
?fansi for details on how Control Sequences are
interpreted, particularly if you are getting unexpected results;
unhandled_ctl for detecting bad control sequences.
Examples
normalize_state("hello\033[42;33m world")
#> [1] "hello\033[33m\033[42m world"
normalize_state("hello\033[42;33m world\033[m")
#> [1] "hello\033[33m\033[42m world\033[39m\033[49m"
normalize_state("\033[4mhello\033[42;33m world\033[m")
#> [1] "\033[4mhello\033[33m\033[42m world\033[24m\033[39m\033[49m"
## Superfluous codes removed
normalize_state("\033[31;32mhello\033[m") # only last color prevails
#> [1] "\033[32mhello\033[39m"
normalize_state("\033[31\033[32mhello\033[m") # malformed: first sequence lacks its final "m"
#> Warning: Argument `x` contains a non-SGR CSI or a non-URL OSC sequence with invalid substrings at index [1], see `?unhandled_ctl`; you can use `warn=FALSE` to turn off these warnings.
#> [1] "\033[31\033[32mhello"
normalize_state("\033[31mhe\033[49mllo\033[m") # unused closing
#> [1] "\033[31mhello\033[39m"
## Only equivalent sequences compare identical after normalization;
## these two differ (green vs. red)
identical(
  normalize_state("\033[31;32mhello\033[m"),
  normalize_state("\033[31mhe\033[49mllo\033[m")
)
#> [1] FALSE
## External SGR will defeat normalization, unless we `carry` it
red <- "\033[41m"
writeLines(
  c(
    paste(red, "he\033[0mllo", "\033[0m"),
    paste(red, normalize_state("he\033[0mllo"), "\033[0m"),
    paste(red, normalize_state("he\033[0mllo", carry=red), "\033[0m")
) )
#> hello
#> hello
#> hello
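The example above passes a string to carry. As a further sketch, carry = TRUE treats the vector as a single document, so state left open at the end of one element is considered active at the start of the next. The snippet assumes the fansi package is installed; the input vector x is made up for illustration, and outputs are omitted rather than guessed.

```r
library(fansi)

# Red is opened in the first element and only reset in the second.
x <- c("\033[31mhello", "world\033[m")

# Default: each element is interpreted as if no state were active at its start.
independent <- normalize_state(x)

# Single-document mode: state at the end of element 1 carries into element 2.
document <- normalize_state(x, carry = TRUE)

writeLines(independent)
writeLines(document)
```

Inspecting the two results side by side (for example with print rather than writeLines, so the escape sequences stay visible) shows how the reset in the second element is handled in each mode.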