
Cheat Sheet

CHARACTER CLASSES

[[:digit:]] or \\d    Digits; [0-9]
\\D                   Non-digits; [^0-9]
[[:lower:]]           Lower-case letters; [a-z]
[[:upper:]]           Upper-case letters; [A-Z]
[[:alpha:]]           Alphabetic characters; [A-Za-z]
[[:alnum:]]           Alphanumeric characters; [A-Za-z0-9]
\\w                   Word characters; [A-Za-z0-9_]
\\W                   Non-word characters
[[:xdigit:]]          Hexadecimal digits; [0-9A-Fa-f]
[[:blank:]]           Space and tab
[[:space:]] or \\s    Space, tab, vertical tab, newline, form feed, carriage return
\\S                   Not space; [^[:space:]]
[[:punct:]]           Punctuation characters; !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
[[:graph:]]           Graphical characters; [[:alnum:][:punct:]]
[[:print:]]           Printable characters; [[:alnum:][:punct:]] and space
[[:cntrl:]]           Control characters; \n, \r etc.

FUNCTIONS FOR PATTERN MATCHING

The example outputs below use:

> string <- c("Hiphopopotamus", "Rhymenoceros", "time for bottomless lyrics")
> pattern <- "t.m"

Detect patterns

grep(pattern, string)
    [1] 1 3
grep(pattern, string, value = TRUE)
    [1] "Hiphopopotamus"
    [2] "time for bottomless lyrics"
grepl(pattern, string)
    [1] TRUE FALSE TRUE
stringr::str_detect(string, pattern)
    [1] TRUE FALSE TRUE

Locate patterns

regexpr(pattern, string)
    find starting position and length of first match
gregexpr(pattern, string)
    find starting position and length of all matches
stringr::str_locate(string, pattern)
    find starting and end position of first match
stringr::str_locate_all(string, pattern)
    find starting and end position of all matches

Extract patterns

regmatches(string, regexpr(pattern, string))
    extract first match
    [1] "tam" "tim"
regmatches(string, gregexpr(pattern, string))
    extracts all matches, outputs a list
    [[1]] "tam" [[2]] character(0) [[3]] "tim" "tom"
stringr::str_extract(string, pattern)
    extract first match
    [1] "tam" NA "tim"
stringr::str_extract_all(string, pattern)
    extract all matches, outputs a list
stringr::str_extract_all(string, pattern, simplify = TRUE)
    extract all matches, outputs a matrix
stringr::str_match(string, pattern)
    extract first match + individual character groups
stringr::str_match_all(string, pattern)
    extract all matches + individual character groups

Replace patterns

sub(pattern, replacement, string)
    replace first match
gsub(pattern, replacement, string)
    replace all matches
stringr::str_replace(string, pattern, replacement)
    replace first match
stringr::str_replace_all(string, pattern, replacement)
    replace all matches

Split a string using a pattern

strsplit(string, pattern) or stringr::str_split(string, pattern)
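
The base R functions listed above can be tried on the sheet's own example data. The following is a minimal sketch using base R only; the variable names on the left-hand side and the replacement text "___" are illustrative choices, not part of the original sheet.

```r
# Example data from the cheat sheet
string  <- c("Hiphopopotamus", "Rhymenoceros", "time for bottomless lyrics")
pattern <- "t.m"

# Detect: which elements contain a match?
hits_idx <- grep(pattern, string)     # integer indices of matching elements
hits_lgl <- grepl(pattern, string)    # one logical value per element

# Locate: start of the first match per element (-1 means no match);
# the match lengths are stored in the "match.length" attribute
first_pos <- regexpr(pattern, string)

# Extract: first match per matching element, and all matches as a list
first_match <- regmatches(string, regexpr(pattern, string))
all_matches <- regmatches(string, gregexpr(pattern, string))

# Replace: first match only, then every match ("___" is an arbitrary choice)
first_replaced <- sub(pattern, "___", string)
all_replaced   <- gsub(pattern, "___", string)

# Split: here on a single space
pieces <- strsplit(string, " ")
```

Note that regmatches() with regexpr() silently drops elements without a match, which is why it returns two values for three inputs, whereas the list returned with gregexpr() keeps one entry per input element.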

SPECIAL METACHARACTERS

\n      New line
\r      Carriage return
\t      Tab
\v      Vertical tab
\f      Form feed

ANCHORS

^       Start of the string
$       End of the string
\\b     Empty string at either edge of a word
\\B     NOT the edge of a word
\\<     Beginning of a word
\\>     End of a word

CHARACTER CLASSES AND GROUPS

.       Any character except \n
|       Or, e.g. (a|b)
[]      List permitted characters, e.g. [abc]
[a-z]   Specify character ranges
[^]     List excluded characters
()      Grouping, enables back referencing using \\N where N is an integer

QUANTIFIERS

*       Matches at least 0 times
+       Matches at least 1 time
?       Matches at most 1 time; optional string
{n}     Matches exactly n times
{n,}    Matches at least n times
{,n}    Matches at most n times
{n,m}   Matches between n and m times
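
To illustrate how anchors, character lists, quantifiers and back references combine, here is a small sketch in base R. The test vector x and all variable names are made up for this example.

```r
# Hypothetical test data
x <- c("apple", "banana", "cherry pie", "aa", "b")

# Anchors
starts_with_a <- grepl("^a", x)           # "a" at the start of the string
ends_with_e   <- grepl("e$", x)           # "e" at the end of the string
has_word_pie  <- grepl("\\bpie\\b", x)    # "pie" as a whole word

# Quantifier: exactly two "a" and nothing else
exactly_aa <- grepl("^a{2}$", x)

# Character lists: a POSIX class, and an excluded list that strips
# everything except lower-case letters
has_digit   <- grepl("[[:digit:]]", c("abc", "a1"))
only_lower  <- gsub("[^a-z]", "", "R2-D2 rocks")

# Grouping and back references: \\1 refers to the first group
doubled_char <- grepl("(.)\\1", x)                          # any repeated character
swapped      <- sub("(\\w+) (\\w+)", "\\2 \\1", "cherry pie")  # swap two words
```

Back references work both inside the pattern (as in doubled_char) and inside the replacement string (as in swapped).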

LOOKAROUNDS AND CONDITIONALS

(?=)               Lookahead (requires perl = TRUE), e.g. (?=xy): position followed by 'xy'
(?!)               Negative lookahead (perl = TRUE); position NOT followed by pattern
(?<=)              Lookbehind (perl = TRUE), e.g. (?<=xy): position following 'xy'
(?<!)              Negative lookbehind (perl = TRUE); position NOT following pattern
(?(if)then)        If-then condition (perl = TRUE); use lookaheads, optional characters etc. in the if-clause
(?(if)then|else)   If-then-else condition (perl = TRUE)

See, e.g., http://www.regular-expressions.info/lookaround.html and
http://www.regular-expressions.info/conditional.html

GENERAL MODES

By default R uses POSIX extended regular expressions. You can switch to PCRE regular expressions using perl = TRUE for base functions. stringr functions interpret patterns as ICU regular expressions (see regex()); the perl() wrapper from early stringr versions is deprecated.

All functions can be used with literal searches using fixed = TRUE for base or by wrapping patterns with fixed() for stringr.

All base functions can be made case insensitive by specifying ignore.case = TRUE.

ESCAPING CHARACTERS

Metacharacters (. * + etc.) can be used as literal characters by escaping them. Characters can be escaped using \\ or by enclosing them in \\Q...\\E.

GREEDY MATCHING

By default the asterisk * is greedy, i.e. it always matches the longest possible string. It can be used in lazy mode by adding ?, i.e. *?.

Greedy mode can be turned off using (?U). This switches the syntax, so that (?U)a* is lazy and (?U)a*? is greedy.

CASE CONVERSIONS

Regular expressions can be made case insensitive using (?i). In backreferences, the strings can be converted to lower or upper case using \\L or \\U (e.g. \\L\\1). This requires perl = TRUE.

NOTE

Regular expressions can conveniently be created using rex::rex().
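
Lookarounds and lazy matching are easiest to grasp on a concrete input. The sketch below uses base R with perl = TRUE; the strings price and html and all variable names are invented for illustration.

```r
# Hypothetical inputs
price  <- "cost: 42 USD, fee: 7 EUR"
html   <- "<b>bold</b>"
string <- c("Hiphopopotamus", "Rhymenoceros", "time for bottomless lyrics")

# Lookahead: digits that are followed by " USD" (the " USD" is not extracted)
usd <- regmatches(price, regexpr("\\d+(?= USD)", price, perl = TRUE))

# Lookbehind: digits that follow "fee: "
fee <- regmatches(price, regexpr("(?<=fee: )\\d+", price, perl = TRUE))

# Negative lookahead: strings that do NOT start with "Rhy"
not_rhy <- grepl("^(?!Rhy)", string, perl = TRUE)

# Greedy: .* runs from the first "<" to the last ">"
greedy <- sub("<.*>", "", html, perl = TRUE)

# Lazy: .*? stops at the first ">"
lazy <- sub("<.*?>", "", html, perl = TRUE)

# (?U) inverts greediness, so a plain .* behaves lazily
ungreedy <- sub("(?U)<.*>", "", html, perl = TRUE)
```

In this sketch the greedy pattern removes the whole string, while the lazy and (?U) variants remove only the opening tag.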

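The escaping, literal-search and case options described above can be compared side by side. This is a minimal base R sketch; the vector files and the variable names are assumptions made for the example.

```r
# Hypothetical file names
files <- c("report.txt", "report_txt", "REPORT.TXT")

# Unescaped: "." is a metacharacter and matches any character
unescaped <- grepl(".txt", files)

# Three ways to treat the dot literally
escaped <- grepl("\\.txt", files)                   # backslash escape
literal <- grepl(".txt", files, fixed = TRUE)       # literal search
quoted  <- grepl("\\Q.txt\\E", files, perl = TRUE)  # \\Q...\\E quoting

# Two ways to ignore case
nocase <- grepl("\\.txt", files, ignore.case = TRUE)
inline <- grepl("(?i)\\.txt", files, perl = TRUE)

# Case conversion in the replacement: upper-case the first letter of each word
title <- gsub("\\b(\\w)", "\\U\\1", "hello world", perl = TRUE)
```

fixed = TRUE is usually the simplest choice when the whole pattern is literal; escaping is preferable when only part of the pattern should be taken literally.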
CC BY Ian Kopacka ian.kopacka@ages.at Updated: 09/16
