----------------------------------------------
Control characters
\a The BEL control character 0x07 (alarm).
\b The BS control character 0x08 (backspace). This is only allowed inside a
character class definition. Otherwise, this means "a word boundary".
\e The ESC control character 0x1B.
\f The FF control character 0x0C (form feed).
\n The LF control character 0x0A (line feed). This is the regular end of line
under Unix systems.
\r The CR control character 0x0D (carriage return). This is part of the
DOS/Windows end of line sequence CR-LF, and was the EOL character on Mac OS 9 and
earlier. OS X and later versions use \n.
\R Any newline character.
\t The TAB control character 0x09 (tab, or hard tab, horizontal tab).
\Ccharacter The control character obtained from character by stripping all but its
6 lowest order bits. For instance, \C1, \CA and \Ca all stand for the SOH control
character 0x01.
\Ccharacter, \xnn, \x{nnnn}
As in search patterns: respectively, the control character with the same low order
bits, the character with code nn, and the character with code nnnn (requires
Unicode encoding).
\l
Causes the next character to be output in lowercase.
\L
Causes the following characters to be output in lowercase, until a \E is found.
\u
Causes the next character to be output in uppercase.
\U
Causes the following characters to be output in uppercase, until a \E is found.
\E
Puts an end to the forced case mode initiated by \L or \U.
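To make the effect of the case modifiers concrete, here is a minimal Python sketch. Python's standard re module does not understand \u, \U, \L or \E in a replacement string, so the sketch only emulates what they produce, using a callable replacement. The function names and the sample text are made up for illustration.

```python
import re

def capitalize_words(text: str) -> str:
    # Emulates searching for (\w)(\w*) and replacing with \u\1\2:
    # only the next character (group 1) is forced to uppercase.
    return re.sub(r"(\w)(\w*)", lambda m: m.group(1).upper() + m.group(2), text)

def shout_first_word(text: str) -> str:
    # Emulates searching for ^(\w+) and replacing with \U\1\E:
    # everything between \U and \E is forced to uppercase.
    return re.sub(r"^(\w+)", lambda m: m.group(1).upper(), text)

print(capitalize_words("hello big world"))  # Hello Big World
print(shout_first_word("hello big world"))  # HELLO big world
```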
$&, $MATCH, ${^MATCH}
The whole matched text.
$`, $PREMATCH, ${^PREMATCH}
The text between the previous and current match, or the text before the match if
this is the first one.
$', $POSTMATCH, ${^POSTMATCH}
Everything that follows the current match.
$LAST_SUBMATCH_RESULT, $^N
Returns what the last matching subexpression matched.
$+, $LAST_PAREN_MATCH
Returns what matched the last subexpression in the pattern.
$$
Returns $.
$n, ${n}, \n
Returns what matched the subexpression numbered n. Negative indices are not allowed.
$+{name}
Returns what matched subexpression named name.
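As a rough illustration of what these variables refer to, the following Python sketch pulls the same pieces out of a match object. It is only an analogy: Python's re module does not use the $ variables in replacement strings, and the sample text and group names are hypothetical.

```python
import re

text = "user: id=42; done"
m = re.search(r"(?P<key>\w+)=(?P<value>\w+)", text)

whole = m.group(0)         # like $&       : the whole matched text
before = text[:m.start()]  # like $`       : text before the (first) match
after = text[m.end():]     # like $'       : everything after the match
second = m.group(2)        # like $2       : subexpression number 2
named = m.group("value")   # like $+{value}: subexpression named "value"

print(repr(whole), repr(before), repr(after), repr(second), repr(named))
```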
Zero length matches
While, in normal or extended mode, there would be no point in looking for text of
length 0, this can quite normally happen with regular expressions. For instance, to
add something at the beginning of a line, you'll search for "^" and replace with
whatever is to be added.
Notepad++ would select the match, but there is no sensible way to select a stretch
zero characters long. When this happens, a tooltip very similar to function call tips
is displayed instead, with a caret pointing upwards to the empty match.
A match was found at the first column of line 5.
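The "^" example can be reproduced outside Notepad++. The sketch below uses Python's re module, which is an assumption of this illustration and not the engine Notepad++ uses; it shows that each match of "^" is empty and that replacing it inserts text at the start of every line. The sample text is made up.

```python
import re

text = "first line\nsecond line"
line_start = re.compile(r"^", re.MULTILINE)

# Every match is zero characters long: start and end coincide.
matches = list(line_start.finditer(text))
all_empty = all(m.start() == m.end() for m in matches)

# Replacing the empty match inserts text at the beginning of each line.
quoted = line_start.sub("> ", text)
print(quoted)
```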
Examples
These examples come from an earlier version of this page: Notepad++ RegExp Help, by
Georg Dembowski.
IMPORTANT
You have to check the "Regular expression" box in the Search & Replace dialog.
When copying the strings from here, take care not to pick up additional spaces in
front of them, or the RegExp will not work!
Example 0
How to replace/delete full lines according to a regex pattern? Let's say you wish
to delete all the lines in a file that contain the word "unused", without leaving
blank lines in their stead. This means you need to locate the line, remove it all,
and additionally remove its terminating newline.
So, you'd want to do this:
Find: ^.*?unused.*?$\R
Replace with: nothing, not even a space
The regular expression, which appears to always work, is to be read like this:
assert the start of a line
match some characters, stopping as early as required for the expression to match
the string you search in the file, "unused"
more characters, again stopping at the earliest necessary for the expression to
match
assert line ends
A newline character or sequence
Remember that .* gobbles everything to the end of line if ". matches newline" is
off, and to the end of file if the option is on!
Well, why does the text above say "appears"? Because this expression assumes each
line ends with an end of line sequence. This is almost always true, but may fail
for the last line in the file: it won't match and won't be deleted.
But the remedy is fairly simple: we translate into regex parlance that the newline
should match if it is there. So the correct expression actually is:
^.*?unused.*?$\R?
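For readers who want to try the same deletion in a script, here is a minimal Python sketch. Python's re module has no \R, so the optional newline is spelled out as (?:\r?\n)?, which is an assumption of this sketch. The function name and sample text are made up.

```python
import re

# ^ and $ need MULTILINE to act on every line; \R? is approximated by (?:\r?\n)?.
UNUSED_LINE = re.compile(r"^.*?unused.*?$(?:\r?\n)?", re.MULTILINE)

def delete_unused_lines(text: str) -> str:
    # Remove each matching line together with its line ending, if it has one.
    return UNUSED_LINE.sub("", text)

sample = "keep me\nthis is unused code\nkeep me too\nunused at the end"
result = delete_unused_lines(sample)
print(repr(result))
```

The last line of the sample has no terminating newline and is still removed, which is the case the optional newline is there for.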
Example 1
You use a MediaWiki (e.g. Wikipedia, Wikitravel) and want to make all headings one
"level higher", so that an H2 becomes an H1, etc.
Search ^=(=)
Replace with \1
Click "Replace all"
You do this to find all headings 2...9 (two equal sign characters are required)
which begin at the start of a line (^), and to replace the two equal sign characters
with only the last of the two, eliminating one and keeping the other.
Search =(=)$
Replace with \1
Click "Replace all"
You do this to find all headings 2...9 (two equal sign characters are required)
which end at the end of a line ($), and to replace the two equal sign characters with
only the last of the two, eliminating one and keeping the other.
== title == became = title =, you're done :-)
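The two replacements can be sketched in Python as follows. This uses Python's re module rather than Notepad++, and the function name and sample headings are made up.

```python
import re

def promote_headings(wikitext: str) -> str:
    # Step 1: drop one "=" from the two that start a line.
    step1 = re.sub(r"^=(=)", r"\1", wikitext, flags=re.MULTILINE)
    # Step 2: drop one "=" from the two that end a line.
    return re.sub(r"=(=)$", r"\1", step1, flags=re.MULTILINE)

sample = "== title ==\n=== sub ===\n= top ="
promoted = promote_headings(sample)
print(promoted)
```

A level 1 heading such as "= top =" is left alone, because both patterns require two equal signs.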
Example 2
You have a document with a lot of dates, which are in German date format (dd.mm.yy)
and you'd like to transform them to sortable format (yy-mm-dd). Don't be put off by
the length of the search term: it's long, but consists of pretty easy and short
parts.
Do the following:
Search ([^0-9])([0123][0-9])\.([01][0-9])\.([0-9][0-9])([^0-9])
Replace with \1\4-\3-\2\5
Click "Replace all"
You do this to fetch
the day, whose first digit can only be 0, 1, 2 or 3
the month, whose first digit can only be 0 or 1
but only if the separator is . and not 'any character' ( . versus \. )
but only if no digits surround the date, as then it might be an IP address
instead of a date
and to write all of this in the opposite order, except for the surroundings. Pay
attention: Whatever SEARCH matches will be deleted and only replaced by the stuff
in the REPLACE field, thus it is mandatory to have the surroundings in the REPLACE
field as well!
Outcome:
31.12.97 became 97-12-31
14.08.05 became 05-08-14
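the IP address 14.13.14.14 is not protected, however: the "." after its third
number counts as a non-digit, so it becomes 14-13-14.14. Use [^0-9.] for the two
surrounding groups if the text contains such addresses.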
You're done :-)
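The same search and replace can be tried in Python. The sketch below reuses the search pattern and replacement from this example unchanged; the function name and the sample sentence are made up.

```python
import re

DATE = re.compile(r"([^0-9])([0123][0-9])\.([01][0-9])\.([0-9][0-9])([^0-9])")

def to_sortable(text: str) -> str:
    # Groups 1 and 5 are the surrounding characters; they are written back
    # unchanged, while day (2), month (3) and year (4) are reordered.
    return DATE.sub(r"\1\4-\3-\2\5", text)

converted = to_sortable("Born 31.12.97, moved 14.08.05.")
print(converted)  # Born 97-12-31, moved 05-08-14.
```

Because the pattern requires one non-digit character on each side, a date that sits at the very start or very end of the text is not converted: to_sortable("31.12.97") returns its input unchanged.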
Example 3
You have printed a file list on Windows to the file filelist.txt, using
dir /b/s >filelist.txt, and want to make local URLs out of the entries.
Open filelist.txt with Notepad++
Search \\
Replace with /
Click "Replace all" to change windows path separator char \ into URL path separator
char /
Search ^(.*)$
Replace with file:///\1
Click "Replace all" to add file:/// in the beginning of all lines
Depending on your requirements, proceed to escape some characters, like space to %20,
etc.
C:\!\aktuell.csv became file:///C:/!/aktuell.csv
You're done :-)
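Here is a minimal Python sketch of the same steps. The function name is made up, the second sample path is hypothetical, and escaping only the space character is a simplification of the "escape some characters" step.

```python
import re

def to_file_urls(listing: str) -> str:
    # Step 1: Windows path separator \ becomes URL path separator /.
    slashed = re.sub(r"\\", "/", listing)
    # Step 2: prefix every line with file:///
    prefixed = re.sub(r"^(.*)$", r"file:///\1", slashed, flags=re.MULTILINE)
    # Step 3 (simplified): escape spaces only.
    return prefixed.replace(" ", "%20")

listing = r"C:\!\aktuell.csv" + "\n" + r"C:\My Docs\a b.txt"
urls = to_file_urls(listing)
print(urls)
```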
Example 4
Another Search Replace Example
[Data]
AS AF AFG 004 Afghanistan
EU AX ALA 248 Åland Islands
EU AL ALB 008 Albania, People's Socialist Republic of
AF DZ DZA 012 Algeria, People's Democratic Republic of
OC AS ASM 016 American Samoa
EU AD AND 020 Andorra, Principality of
AF AO AGO 024 Angola, Republic of
NA AI AIA 660 Anguilla
AN AQ ATA 010 Antarctica (the territory South of 60 deg S)
NA AG ATG 028 Antigua and Barbuda
SA AR ARG 032 Argentina, Argentine Republic
AS AM ARM 051 Armenia
NA AW ABW 533 Aruba
OC AU AUS 036 Australia, Commonwealth of
Search for: ([A-Z]+) ([A-Z]+) ([A-Z]+) ([0-9]+) (.*)
Replace with: \1,\2,\3,\4,\5
Hit "Replace All"
Final Data:
AS,AF,AFG,004,Afghanistan
EU,AX,ALA,248,Åland Islands
EU,AL,ALB,008,Albania, People's Socialist Republic of
AF,DZ,DZA,012,Algeria, People's Democratic Republic of
OC,AS,ASM,016,American Samoa
EU,AD,AND,020,Andorra, Principality of
AF,AO,AGO,024,Angola, Republic of
NA,AI,AIA,660,Anguilla
AN,AQ,ATA,010,Antarctica (the territory South of 60 deg S)
NA,AG,ATG,028,Antigua and Barbuda
SA,AR,ARG,032,Argentina, Argentine Republic
AS,AM,ARM,051,Armenia
NA,AW,ABW,533,Aruba
OC,AU,AUS,036,Australia, Commonwealth of
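The conversion can also be sketched in Python with the same search pattern and replacement. The function name is made up and only three rows of the data are used.

```python
import re

ROW = re.compile(r"([A-Z]+) ([A-Z]+) ([A-Z]+) ([0-9]+) (.*)")

def to_csv(data: str) -> str:
    # The four fixed-width codes become separate fields; the rest of the
    # line (the country name) is kept as the fifth field.
    return ROW.sub(r"\1,\2,\3,\4,\5", data)

data = (
    "AS AF AFG 004 Afghanistan\n"
    "EU AX ALA 248 Åland Islands\n"
    "EU AL ALB 008 Albania, People's Socialist Republic of"
)
csv_text = to_csv(data)
print(csv_text)
```

Note that names which already contain a comma, such as the Albania row, will appear to a CSV reader as having an extra field; quoting the last field would be one way to handle this.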
Example 5
How to recognize a balanced expression, in mathematics or in programming?
Let's first explicitly describe what we wish to match. An expression is balanced if
and only if all areas delineated by parentheses contain a balanced expression, as
in: 1+f(x+g())-h(2).
This leads us to define the following kinds of groups:
balanced ::= no_paren paren ... no_paren
no_paren ::= [^()]* -- a possibly empty group of characters without a single
parenthesis
paren ::= ( balanced )
Can we represent this as a regex? Not as-is.
The first hurdle is that there is no primitive construct to represent an
alternating sequence of tokens. A common trick then is to represent the sequence as
a repetition of the repeating pattern (here, no_paren followed by paren), with
whatever is left over added at the end.
So we have a more manageable, although slightly more complex, representation:
balanced ::= simple* no_paren
simple ::= no_paren paren
no_paren ::= [^()]*
paren ::= ( balanced )
A second hurdle is that parentheses are not ordinary characters. That's ok, we'll
escape them as \( and \) respectively.
The third one is more interesting. How do we represent the whole of an expression
inside a nested sub-expression? This smacks of recursion. PCRE has recursion. The
simplest form of it is going back to the start of the search pattern - not the
searched text! - and doing it again. It is written (?R). You remember seeing this
one in the main list, right?
So:
we know how to match a no_paren. It would be nicer to give it an explicit name. We'll
come back to this in the Embellishments section below.
we just discovered how to write a paren: \((?R)\)
This gives us the following hard to read, but correct regex:
([^()]*\((?R)\))*[^()]*
Try it, it works. But it is about as hard to decrypt as a badly indented piece of
code without a comment and with unpromising, unclear identifiers. This is only one
of the reasons why old Perl earned itself the rare qualifier of "write-only
language".
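Python's standard re module has no (?R), so the regex itself cannot be tried there. As a way to see the recursion at work, the sketch below writes the same grammar (balanced ::= simple* no_paren) as a small hand-written recursive function. It is an illustration of the grammar, not of how the regex engine works internally, and unlike a search in Notepad++ it tests the whole string instead of looking for a match inside it. The function names are made up.

```python
def match_balanced(text: str, pos: int = 0) -> int:
    """Return the index just past the balanced stretch starting at pos."""
    while True:
        # no_paren ::= [^()]*
        while pos < len(text) and text[pos] not in "()":
            pos += 1
        # paren ::= ( balanced )  -- the recursive call plays the role of (?R)
        if pos < len(text) and text[pos] == "(":
            inner_end = match_balanced(text, pos + 1)
            if inner_end < len(text) and text[inner_end] == ")":
                pos = inner_end + 1
                continue  # one more "simple" matched, try the next one
        # Either no "(" follows or it is never closed: stop before it.
        return pos

def is_balanced(text: str) -> bool:
    # Balanced means the grammar consumes the entire string.
    return match_balanced(text) == len(text)

print(is_balanced("1+f(x+g())-h(2)"))  # True
print(is_balanced("f(x"))              # False
```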
Embellishments
First of all, let's add some spacing so that we can identify the components of the
regex. Spacing can be added using the x modifier flag, which is off by default.
The character classes are kept free of spaces, since whitespace inside a class may
still be taken literally. So we can write something more legible:
(?x: ( [^()]* \( (?R) \) )* [^()]* )
Now let's add a comment:
(?x: ( [^()]* \( (?# The next group means "start matching the
beginning of the regex")(?R) \) )* [^()]* )
In Perl, we could go further by assigning names to groups. However, in PCRE this
will not work, because any named group, once matched, won't change. This is
obviously not what we want.