str_extract() extracts the first complete match from each string; str_extract_all() extracts all matches from each string.
str_extract(string, pattern, group = NULL)
str_extract_all(string, pattern, simplify = FALSE)
string
Input vector. Either a character vector, or something coercible to one.
pattern
Pattern to look for.
The default interpretation is a regular expression, as described in vignette("regular-expressions"). Use regex() for finer control of the matching behaviour.
Match a fixed string (i.e. by comparing only bytes), using fixed(). This is fast, but approximate. Generally, for matching human text, you'll want coll(), which respects character matching rules for the specified locale.
Match character, word, line and sentence boundaries with boundary(). An empty pattern, "", is equivalent to boundary("character").
group
If supplied, returns the matched text from the specified capturing group instead of the complete match.
simplify
A boolean.
FALSE (the default): returns a list of character vectors.
TRUE: returns a character matrix.
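The three interpretations of pattern described above can be compared side by side. The following is a minimal sketch, assuming stringr is installed; the input vector x is made up for illustration and is not part of the original examples.

```r
library(stringr)

# Hypothetical input, chosen so that "." behaves differently per engine.
x <- c("a.b", "axb", "A.B")

# Default: the pattern is a regular expression, so "." matches any character.
str_extract(x, "a.b")
#> [1] "a.b" "axb" NA

# fixed(): the pattern is compared literally, so "." only matches a dot.
str_extract(x, fixed("a.b"))
#> [1] "a.b" NA    NA

# regex() exposes matching options, here case-insensitive matching.
str_extract(x, regex("a.b", ignore_case = TRUE))
#> [1] "a.b" "axb" "A.B"
```

Wrapping the pattern in fixed() is usually the simplest way to search for text that contains regular expression metacharacters without escaping them.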
str_extract(): a character vector the same length as string/pattern.
str_extract_all(): a list of character vectors the same length as string/pattern.
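To make "the same length as string/pattern" concrete, the sketch below shows a single pattern applied to every string and a vector of patterns paired element by element with the strings. The vector codes is a hypothetical input used only for illustration.

```r
library(stringr)

# Hypothetical input for illustration.
codes <- c("a1", "b2", "c3")

# One pattern for all strings: one result per element of codes.
str_extract(codes, "\\d")
#> [1] "1" "2" "3"

# One pattern per string: pattern i is matched against string i.
str_extract(codes, c("[a-z]", "\\d", "[a-z]"))
#> [1] "a" "2" "c"

# str_extract_all() returns one list element per input string.
length(str_extract_all(codes, "[a-z0-9]"))
#> [1] 3
```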
str_match() to extract matched groups; stringi::stri_extract() for the underlying implementation.
shopping_list <- c("apples x4", "bag of flour", "bag of sugar", "milk x2")
str_extract(shopping_list, "\\d")
#> [1] "4" NA NA "2"
str_extract(shopping_list, "[a-z]+")
#> [1] "apples" "bag" "bag" "milk"
str_extract(shopping_list, "[a-z]{1,4}")
#> [1] "appl" "bag" "bag" "milk"
str_extract(shopping_list, "\\b[a-z]{1,4}\\b")
#> [1] NA "bag" "bag" "milk"
str_extract(shopping_list, "([a-z]+) of ([a-z]+)")
#> [1] NA "bag of flour" "bag of sugar" NA
str_extract(shopping_list, "([a-z]+) of ([a-z]+)", group = 1)
#> [1] NA "bag" "bag" NA
str_extract(shopping_list, "([a-z]+) of ([a-z]+)", group = 2)
#> [1] NA "flour" "sugar" NA
# Extract all matches
str_extract_all(shopping_list, "[a-z]+")
#> [[1]]
#> [1] "apples" "x"
#>
#> [[2]]
#> [1] "bag" "of" "flour"
#>
#> [[3]]
#> [1] "bag" "of" "sugar"
#>
#> [[4]]
#> [1] "milk" "x"
#>
str_extract_all(shopping_list, "\\b[a-z]+\\b")
#> [[1]]
#> [1] "apples"
#>
#> [[2]]
#> [1] "bag" "of" "flour"
#>
#> [[3]]
#> [1] "bag" "of" "sugar"
#>
#> [[4]]
#> [1] "milk"
#>
str_extract_all(shopping_list, "\\d")
#> [[1]]
#> [1] "4"
#>
#> [[2]]
#> character(0)
#>
#> [[3]]
#> character(0)
#>
#> [[4]]
#> [1] "2"
#>
# Simplify results into character matrix
str_extract_all(shopping_list, "\\b[a-z]+\\b", simplify = TRUE)
#>      [,1]     [,2] [,3]
#> [1,] "apples" ""   ""
#> [2,] "bag"    "of" "flour"
#> [3,] "bag"    "of" "sugar"
#> [4,] "milk"   ""   ""
str_extract_all(shopping_list, "\\d", simplify = TRUE)
#>      [,1]
#> [1,] "4"
#> [2,] ""
#> [3,] ""
#> [4,] "2"
# Extract all words
str_extract_all("This is, surprisingly, a sentence.", boundary("word"))
#> [[1]]
#> [1] "This" "is" "surprisingly" "a" "sentence"
#>