extract() has been superseded in favour of separate_wider_regex() because the latter has a more polished API and better handling of problems. Superseded functions will not go away, but will only receive critical bug fixes.
Given a regular expression with capturing groups, extract() turns each group into a new column. If the groups don't match, or the input is NA, the output will be NA.
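To make the matching rule concrete, here is a minimal base-R sketch of the behaviour just described. It is not how tidyr implements extract(). The helper name extract_groups() is hypothetical, and the sketch assumes the regex has exactly one capturing group per element of into.

```r
# Hypothetical helper (not part of tidyr): a base-R sketch of extract()'s rule
# that each capturing group becomes a column, and NA or non-matching input
# gives NA in every new column.
extract_groups <- function(x, into, regex) {
  # Start with an all-NA character matrix, one column per name in `into`
  out <- matrix(NA_character_, nrow = length(x), ncol = length(into),
                dimnames = list(NULL, into))
  ok <- !is.na(x)
  if (any(ok)) {
    # regexec() + regmatches() return character(0) on no match,
    # otherwise the whole match followed by each captured group
    m <- regmatches(x[ok], regexec(regex, x[ok]))
    groups <- lapply(m, function(g) {
      if (length(g) == 0) rep(NA_character_, length(into)) else g[-1]
    })
    out[ok, ] <- do.call(rbind, groups)
  }
  as.data.frame(out, stringsAsFactors = FALSE)
}

res <- extract_groups(c(NA, "a-b", "d-e"), c("A", "B"), "([a-d]+)-([a-d]+)")
res
```

For "a-b" the two groups give A = "a" and B = "b". Both the NA input and "d-e" (where "e" is outside [a-d]) produce a row of NAs.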
extract(
  data,
  col,
  into,
  regex = "([[:alnum:]]+)",
  remove = TRUE,
  convert = FALSE,
  ...
)
data: A data frame.
col: <tidy-select> Column to expand.
into: Names of new variables to create, as a character vector. Use NA to omit the variable in the output.
regex: A string representing a regular expression used to extract the desired values. There should be one group (defined by ()) for each element of into.
remove: If TRUE, remove the input column from the output data frame.
convert: If TRUE, will run type.convert() with as.is = TRUE on new columns. This is useful if the component columns are integer, numeric or logical. NB: this will cause string "NA"s to be converted to NAs.
...: Additional arguments passed on to methods.
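Because convert = TRUE hands each new column to base R's type.convert() with as.is = TRUE, its effect can be checked directly on character vectors like the ones extract() produces. The input vectors below are illustrative only.

```r
# What convert = TRUE does to each new character column
ints  <- type.convert(c("1", "2", "3"), as.is = TRUE)    # becomes integer
dbls  <- type.convert(c("1.5", "2"), as.is = TRUE)       # becomes double
lgls  <- type.convert(c("TRUE", "FALSE"), as.is = TRUE)  # becomes logical

# The NB above: the string "NA" is read as a missing value, while
# as.is = TRUE keeps the non-numeric remainder as character
chars <- type.convert(c("a", "NA"), as.is = TRUE)
chars
```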
See also separate() to split up a column by a separator.
library(tidyr)
df <- tibble(x = c(NA, "a-b", "a-d", "b-c", "d-e"))
df %>% extract(x, "A")
#> # A tibble: 5 × 1
#> A
#> <chr>
#> 1 NA
#> 2 a
#> 3 a
#> 4 b
#> 5 d
df %>% extract(x, c("A", "B"), "([[:alnum:]]+)-([[:alnum:]]+)")
#> # A tibble: 5 × 2
#> A B
#> <chr> <chr>
#> 1 NA NA
#> 2 a b
#> 3 a d
#> 4 b c
#> 5 d e
# Now recommended
df %>%
separate_wider_regex(
x,
patterns = c(A = "[[:alnum:]]+", "-", B = "[[:alnum:]]+")
)
#> # A tibble: 5 × 2
#> A B
#> <chr> <chr>
#> 1 NA NA
#> 2 a b
#> 3 a d
#> 4 b c
#> 5 d e
# If no match, NA:
df %>% extract(x, c("A", "B"), "([a-d]+)-([a-d]+)")
#> # A tibble: 5 × 2
#> A B
#> <chr> <chr>
#> 1 NA NA
#> 2 a b
#> 3 a d
#> 4 b c
#> 5 NA NA
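The superseding note says separate_wider_regex() handles problems better. The sketch below shows what that means for the last example. It assumes tidyr 1.3.0 or later, where separate_wider_regex() first appeared. extract() silently turns the non-matching "d-e" into NAs. separate_wider_regex(), with its default too_few = "error", stops with an error instead.

```r
library(tidyr)  # assumes tidyr >= 1.3.0

df <- tibble(x = c(NA, "a-b", "a-d", "b-c", "d-e"))

# extract(): the row that fails to match ("d-e") quietly becomes NA
old <- extract(df, x, c("A", "B"), "([a-d]+)-([a-d]+)")

# separate_wider_regex(): the same mismatch is reported as an error
new <- tryCatch(
  separate_wider_regex(df, x, patterns = c(A = "[a-d]+", "-", B = "[a-d]+")),
  error = function(e) e
)
inherits(new, "error")
```

Setting too_few = "debug" instead keeps going and adds columns that help locate the offending rows.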