Computes the shortest edit script to convert a into b by removing elements from a and adding elements from b. Intended primarily for debugging or for other applications that understand that particular format. See the GNU diff docs for how to interpret the symbols.
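As a rough guide to the format, each element of the script follows the pattern of GNU diff's normal output: a line range in a, a one-letter command (a for add, c for change, d for delete), and a line range in b. The base R sketch below splits such strings into their parts. parse_ses and range_ends are hypothetical helpers written for illustration and are not part of diffobj; the input is the script shown in the examples on this page.

```r
# Hypothetical helpers (not part of diffobj): split GNU diff "normal" style
# commands such as "3c2,3" into a command letter and two index ranges.
range_ends <- function(x) {
  parts <- strsplit(x, ",", fixed = TRUE)
  list(
    start = as.integer(vapply(parts, function(p) p[1], character(1))),
    end   = as.integer(vapply(parts, function(p) p[length(p)], character(1)))
  )
}

parse_ses <- function(script) {
  cmd   <- sub("^[0-9,]+([acd])[0-9,]+$", "\\1", script)  # command letter
  a.rng <- range_ends(sub("[acd].*$", "", script))        # range before the letter
  b.rng <- range_ends(sub("^.*[acd]", "", script))        # range after the letter
  data.frame(
    cmd = cmd,
    a.start = a.rng$start, a.end = a.rng$end,
    b.start = b.rng$start, b.end = b.rng$end,
    stringsAsFactors = FALSE
  )
}

# Script copied from the ses(a, b) example on this page
parse_ses(c("1d0", "3c2,3", "5d4"))
```

For a delete command such as "1d0", the number after the letter is not a range to remove; in GNU diff it marks the position in b where the deleted lines would have appeared.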
a: character
b: character
max.diffs: integer(1L), number of differences (default 50000L) after which we abandon the O(n^2) diff algorithm in favor of a naive O(n) one. Set to -1L to stick to the original algorithm up to the maximum allowed (~INT_MAX/4).
warn: TRUE (default) or FALSE, whether to warn if we hit max.diffs.
extra: TRUE (default) or FALSE, whether to also return the indices in a and b the diff values are taken from. Set to FALSE for a small performance gain.
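A minimal sketch of how these arguments might be combined, assuming the diffobj package is installed and that extra belongs to ses_dat (as the return value description suggests). The inputs are the ones used in the examples on this page.

```r
library(diffobj)

a <- letters[1:6]
b <- c("b", "CC", "DD", "d", "f")

# Keep the original algorithm regardless of the number of differences,
# and suppress the warning about hitting max.diffs.
script <- ses(a, b, max.diffs = -1L, warn = FALSE)

# Skip the id.a / id.b index columns for a small performance gain.
dat <- ses_dat(a, b, extra = FALSE)
names(dat)
```

For inputs this small the settings make no practical difference; they are more likely to matter for very long vectors with many differences.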
A character vector containing the shortest edit script, or a machine-readable version of it as a ses_dat object, which is a data.frame with columns op (factor, values “Match”, “Insert”, or “Delete”), val (character, the value taken from either a or b), and, if extra is TRUE, integer columns id.a and id.b corresponding to the indices in a or b that val was taken from. See Details.
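To make the column layout concrete, the sketch below builds a plain data.frame by hand with the same columns and values as the example output on this page, then pulls out the indices of deleted and inserted elements. It is a stand-in that runs without diffobj; a real ses_dat object also carries the "ses_dat" class.

```r
# Hand-typed stand-in for ses_dat(letters[1:6], c("b", "CC", "DD", "d", "f")),
# copied from the example output on this page (no "ses_dat" class attached).
dat <- data.frame(
  op = factor(
    c("Delete", "Match", "Delete", "Insert", "Insert", "Match", "Delete", "Match"),
    levels = c("Match", "Insert", "Delete")
  ),
  val  = c("a", "b", "c", "CC", "DD", "d", "e", "f"),
  id.a = c(1L, 2L, 3L, NA, NA, 4L, 5L, 6L),
  id.b = c(NA, NA, NA, 2L, 3L, NA, NA, NA),
  stringsAsFactors = FALSE
)

# Number of rows per operation
table(dat$op)

# Positions in `a` that were deleted, and positions in `b` that were inserted
deleted.from.a  <- dat$id.a[dat$op == "Delete"]   # 1 3 5
inserted.from.b <- dat$id.b[dat$op == "Insert"]   # 2 3
```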
ses will be much faster than any of the diff* methods, particularly for large inputs with limited numbers of differences.
NAs are treated as the string “NA”. Non-character inputs are coerced to character.
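One way to check this behavior, assuming the diffobj package is installed, is to compare the scripts produced for inputs that differ only in how a value is represented. If the statements above hold, both comparisons should come out identical.

```r
library(diffobj)

# NA versus the literal string "NA" in the same position
with.na  <- ses(c("a", NA), "a")
with.str <- ses(c("a", "NA"), "a")
identical(with.na, with.str)

# Integer input versus its as.character() equivalent
from.int <- ses(1:3, c(1L, 3L))
from.chr <- ses(c("1", "2", "3"), c("1", "3"))
identical(from.int, from.chr)
```

A practical consequence is that ses cannot distinguish a missing value from the string "NA"; inputs where that distinction matters would need to be recoded beforehand.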
ses_dat provides a semi-processed “machine-readable” version of the precursor data to ses. It may be useful for those who want the raw diff data rather than the printed output of diffobj, but do not wish to parse the ses output manually. Whether it is faster than ses depends on the ratio of matching to non-matching values, as ses_dat includes matching values whereas ses does not.
ses_dat objects have a print method that makes it easy to interpret the diff, but are actually data.frames. You can see the underlying data by using as.data.frame, removing the "ses_dat" class, etc.
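A short sketch of both approaches, assuming the diffobj package is installed. The inputs are the ones used in the examples on this page.

```r
library(diffobj)

dat <- ses_dat(letters[1:6], c("b", "CC", "DD", "d", "f"))

# Option 1: convert to a plain data.frame
plain <- as.data.frame(dat)
class(plain)

# Option 2: drop the "ses_dat" class from a copy of the object
stripped <- dat
class(stripped) <- setdiff(class(stripped), "ses_dat")
head(stripped)
```

Either way the result prints as an ordinary data.frame, one row per element, rather than through the compact ses_dat print method.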
a <- letters[1:6]
b <- c('b', 'CC', 'DD', 'd', 'f')
ses(a, b)
#> [1] "1d0" "3c2,3" "5d4"
(dat <- ses_dat(a, b))
#> "ses_dat" object (Match: 3, Delete: 3, Insert: 2):
#>
#> D: a c e
#> M: b d f
#> I: CC DD
str(dat) # data.frame with a print method
#> Classes ‘ses_dat’ and 'data.frame': 8 obs. of 4 variables:
#> $ op : Factor w/ 3 levels "Match","Insert",..: 3 1 3 2 2 1 3 1
#> $ val : chr "a" "b" "c" "CC" ...
#> $ id.a: int 1 2 3 NA NA 4 5 6
#> $ id.b: int NA NA NA 2 3 NA NA NA
## use `ses_dat` output to construct a minimal diff
## color with ANSI CSI SGR
diff <- dat[['val']]
del <- dat[['op']] == 'Delete'
ins <- dat[['op']] == 'Insert'
if(any(del))
  diff[del] <- paste0("\033[33m- ", diff[del], "\033[m")
if(any(ins))
  diff[ins] <- paste0("\033[34m+ ", diff[ins], "\033[m")
if(any(!ins & !del))
  diff[!ins & !del] <- paste0("  ", diff[!ins & !del])
writeLines(diff)
#> - a
#>   b
#> - c
#> + CC
#> + DD
#>   d
#> - e
#>   f
## We can recover `a` and `b` from the data
identical(subset(dat, op != 'Insert', val)[[1]], a)
#> [1] TRUE
identical(subset(dat, op != 'Delete', val)[[1]], b)
#> [1] TRUE