Match letters in R regex


Suppose I run the following:

txt <- "client:A, field:foo, category:bar"
grep("field:[A-z]+", txt, value = TRUE, perl = TRUE)

Based on <a href="http://regexr.com" rel="nofollow noreferrer">regexr.com</a> I expected I would get field:foo, but instead I get the entire string. Why is this?


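grep with value = TRUE returns every element of the input vector that contains a match, not the matched substring, which is why you get the entire string back. You seem to want to extract the matched value instead. Use regmatches: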

txt <- "client:A, field:foo, category:bar"
regmatches(txt, regexpr("field:[[:alpha:]]+", txt))
# => [1] "field:foo"

See the <a href="https://ideone.com/K1vfTC" rel="nofollow">R demo</a>.
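To make the difference concrete, here is a minimal sketch contrasting the two calls on a two-element vector. The vector x and the names whole and hits are made up for illustration and are not part of the question.

```r
# Hypothetical two-element input: only the first element contains "field:".
x <- c("client:A, field:foo, category:bar", "client:B, category:baz")

# grep(value = TRUE) selects whole elements that contain a match.
whole <- grep("field:[[:alpha:]]+", x, value = TRUE)
print(whole)

# regmatches() + regexpr() extracts only the matched substring
# (elements without a match are dropped from the result).
hits <- regmatches(x, regexpr("field:[[:alpha:]]+", x))
print(hits)
```

Here whole holds the complete first element, while hits holds only "field:foo".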

To match multiple occurrences, replace regexpr with gregexpr.
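A short sketch of the gregexpr variant, assuming an input with two field: entries (the string txt2 and the name all_hits are invented for this example):

```r
# Hypothetical input containing two "field:" entries.
txt2 <- "field:foo, category:bar, field:baz"

# gregexpr() finds every match; regmatches() then returns a list
# with one character vector per input element.
all_hits <- regmatches(txt2, gregexpr("field:[[:alpha:]]+", txt2))
print(all_hits[[1]])
```

Because the result is a list, use all_hits[[1]] (or unlist) to get a plain character vector for a single input string.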

Or use <em>stringr</em> str_extract_all:

library(stringr)
str_extract_all(txt, "field:[a-zA-Z]+")

Another point is that <a href="https://stackoverflow.com/questions/29771901/why-is-this-regex-allowing-a-caret/29771926#29771926" rel="nofollow">[A-z] matches more than ASCII letters</a>: the range also covers [, \, ], ^, _ and `, which sit between Z and a in the ASCII table. Use [[:alpha:]] in a TRE regex (regexpr / gregexpr without perl=TRUE) or an ICU regex (stringr) to match any letter.
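A quick way to check this yourself is to test single characters against both classes. This is only an illustrative sketch; the vector chars and the result names are arbitrary.

```r
# Single characters to test: two letters, three ASCII symbols that lie
# between "Z" and "a", and a digit.
chars <- c("a", "Z", "_", "^", "[", "1")

# With perl = TRUE the range A-z is taken by code point (65 to 122),
# so the symbols match as well as the letters.
az <- grepl("^[A-z]$", chars, perl = TRUE)
print(az)

# [[:alpha:]] accepts only the letters.
alpha <- grepl("^[[:alpha:]]$", chars)
print(alpha)
```

Only the digit fails the [A-z] test, whereas [[:alpha:]] rejects everything except "a" and "Z".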


