Regular Expressions
Regular expressions behave like sophisticated wildcards (actually it's more accurate to say that wildcards are a very small subset of regular expressions). Regular expressions use symbols with special meanings, called metacharacters.
A regular expression, or regex for short, lets you do a number of things not possible in a literal-text search, such as requiring the search text to occur at the beginning or end of a line, searching for text that fits a certain pattern but isn't bound to specific characters, or searching across multiple lines. Multi-Edit provides support for three regex styles: Multi-Edit Classic, Unix-compatible (loosely based on the X/Open standard for extended expressions), and Perl 5.8-compatible. The Perl and UNIX styles support maximal and minimal matching, and UNIX expressions can span lines (the Perl style will soon have this capability as well).
The regular expression style to be used is set in the Tools | Customize | Search dialog.
Here is an example of what regular expressions can do. Given the following text in a document:
Test-drive the car of your choice.
Wow!!! This handles great!
What is the price?
The regex pattern (written here with the Classic any-character metacharacter "?"; the UNIX style uses "." in its place)
Test?*$Wow?*$What
will match everything from “Test” on the first line to “What” on the third line.
Regular Expressions are a very powerful search tool, but one that carries a cost. Unless you use them often enough to become thoroughly familiar with them, on the odd occasion you do use them you'll find yourself puzzling through a pattern to make it find exactly what you're after, or worse, continually looking up the meanings of the various metacharacters. This can be particularly annoying when you've spent a lot of time figuring out a really great regex for some esoteric search, only to find that you can't remember it later. As well, regular expressions can appear quite cryptic and difficult to read, and leaving out or misplacing a single character can cause your search to fail, or to match against a totally unexpected string. Also, Multi-Edit supports three different regex styles, which means you must be careful not to forget which style you're using.
Multi-Edit provides two tools to make regexes easier to use: drop-down reminder menus, and expression aliases. The drop-down reminder menus list all the available metacharacters for the currently-selected regular expression style. To access them, simply click on the arrow button to the right of the “Search for” text box on any of the search menus; selecting a menu item inserts its associated metacharacter in the text box at the current cursor position. Note that the Perl-style drop-downs contain submenus grouped by type, with some of the most commonly-used metacharacters at the top of the main menu. If you're performing a replacement as well, there's a second drop-down menu button to the right of the “Replace with” text box, listing the expressions valid in a replace expression. Note that since replacement expressions aren't part of most regular expression standards, they vary from program to program; we've chosen replacement expressions in common use in major packages, such as Perl, with occasional variances where an expression wouldn't make sense in Multi-Edit's context, or would collide with something else.
Expression aliases allow you to save expressions under names of your choice, in any or all regex styles, with a short description; to use such a saved expression later, simply enter its name in the text box, and Multi-Edit will automatically expand it into the full expression internally. There are a number of aliases already defined for your use, and you can easily add your own to the list. The only syntactic requirement on alias names is that you must enclose them in angle brackets, like this: <alias_name>. You'll probably also want to keep those names short, for ease of use. Aliases representing filenames and paths fully support long filenames, with embedded spaces. (See Example 1: Search for zero or more occurrences of any character, for an example of creating an alias.)
Many of Multi-Edit's system macros use regex aliases; the search macro itself is one of these. You can use aliases in macros of your own composition, as well, using the techniques illustrated in the system macro header file RegExp.sh. The most important of these is to call the ReTranslate() macro to expand any aliases in your search string into their actual text before you run your search. You can also call ReSelectAlias() to display the Regular Expression Aliases dialog, which lists the currently-defined aliases; the macro's return string is the selected alias.
The Classic- and UNIX-style regular expressions deviate from the X/Open standard in that they don't treat strings of characters as groups, unless they're explicitly grouped using the grouping expressions specific to each style. The Perl 5.8-style regexes follow the spec. For example, given the text
"The cat ate the car"
and the regular expression
"cat|r"
which is valid in all three styles, the Classic and UNIX styles match on the words “cat” and “car”, while the Perl 5.8 style matches on “cat” and the final “r” in “car”. This is because the Perl 5.8 engine interprets the regex (including group metacharacters for clarity) as
"(cat)|(r)"
while the other engines interpret it as
"ca(t|r)"
which does not, as observed, follow the X/Open specification. And now for something completely different.
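The difference between the two readings can be sketched in Python, whose `re` module follows the Perl interpretation of alternation. This is only an illustration under that assumption; it does not run Multi-Edit's own engines, and the variable names are hypothetical.

```python
import re

text = "The cat ate the car"

# Python's engine, like the Perl style, reads "cat|r" as "(cat)|(r)".
perl_like = re.findall(r"cat|r", text)

# Explicit grouping reproduces the Classic/UNIX reading "ca(t|r)".
# (?:...) groups without capturing, so findall returns whole matches.
classic_like = re.findall(r"ca(?:t|r)", text)

print(perl_like)     # ['cat', 'r']
print(classic_like)  # ['cat', 'car']
```

When in doubt about which reading an engine uses, adding the grouping explicitly removes the ambiguity in every style.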
Classic Regular Expressions
Search String Expressions
Expressions that anchor a match to a location in the text:
%exp Text matched by "exp" must occur at the beginning of a line
exp$ Text matched by "exp" must occur at the end of a line
Quantifiers that determine how many times an expression can match:
C* Match zero or more occurrences of the previous character ("C") or grouped expression
C+ Match one or more occurrences of the previous character ("C") or grouped expression
Expressions that allow an “either/or” match:
C|C Match either the left or right character ("C") or grouped expression
Expressions that group and capture text:
{exp} Group the text matched by "exp" and capture it in one of #0, #1, etc.
(determine group numbers by counting "{" characters from the left of the search regex)
Expressions that match single characters:
? Match any character
[list] Match any character in the bracketed list of characters
[~list] Match any character except one in the bracketed list
@C Match the character "C" as a literal character
@a 0x07 BEL
@b 0x08 BS Backspace
@f 0x0C FF Page Break (Form Feed)
@n 0x0A LF Newline (Linefeed)
@r 0x0D CR Carriage Return
@t 0x09 HT Tab (Horizontal Tab)
@v 0x0B VT Vertical Tab
@xHH Byte specified by the two hexadecimal digits "HH"
Replace String Expressions
$ Insert a line break
#n Insert the text captured by group number "n" (the group beginning with the
(n + 1)th "{", since Classic groups are zero-based)
& Insert the complete matched text
% Delete the previous character
^ Leave cursor at this position in the replaced text
Unix Regular Expressions
Search String Expressions
Expressions that anchor a match to a location in the text:
^exp Text matched by "exp" must occur at the beginning of a line
exp$ Text matched by "exp" must occur at the end of a line
Quantifiers that determine how many times an expression can match:
C* Match zero or more occurrences of the previous character ("C") or grouped expression
(as many as possible)
C+ Match one or more occurrences of the previous character ("C") or grouped expression
(as many as possible)
C@ Match zero or more occurrences of the previous character ("C") or grouped expression
(as few as necessary)
C# Match one or more occurrences of the previous character ("C") or grouped expression
(as few as necessary)
Expressions that allow an “either/or” match:
C|C Match either the left or right character ("C") or grouped expression
Expressions that group and capture text:
(exp) Group the text matched by "exp" and capture it in one of \0, \1, etc.
(determine group numbers by counting "(" characters from the left of
the search regex)
Expressions that match single characters:
. Match any character
[list] Match any character in the bracketed list of characters
[^list] Match any character except one in the bracketed list
\C Match the character "C" as a literal character
\a 0x07 BEL
\b 0x08 BS Backspace
\f 0x0C FF Page Break (Form Feed)
\n 0x0A LF Newline (Linefeed)
\r 0x0D CR Carriage Return
\t 0x09 HT Tab (Horizontal Tab)
\v 0x0B VT Vertical Tab
\xHH Byte specified by the two hexadecimal digits "HH"
Replace String Expressions
$ Insert a line break
\n Insert the text captured by group number "n" (the group beginning
with the (n + 1)th "(", since UNIX groups are zero-based)
& Insert the complete matched text
% Delete the previous character
^ Leave cursor at this position in the replaced text
\a 0x07 BEL
\b 0x08 BS Backspace
\f 0x0C FF Page Break (Form Feed)
\n 0x0A LF Newline (Linefeed)
\r 0x0D CR Carriage Return
\t 0x09 HT Tab (Horizontal Tab)
\v 0x0B VT Vertical Tab
\xHH Byte specified by the two hexadecimal digits "HH"
Perl 5.8 Regular Expressions
Search String Expressions
Expressions that match single characters:
\a 0x07 BEL
\b 0x08 BS Backspace; only supported in character classes
\e 0x1B ESC Escape
\f 0x0C FF Page Break (Form Feed)
\n 0x0A LF Newline (Linefeed)
\r 0x0D CR Carriage Return
\t 0x09 HT Tab (Horizontal Tab)
\OOO Character specified by three octal [0-7] digits "OOO"
\xH, \xHH Character specified by one "H" or two "HH" hexadecimal digits
\x{H..H} Character specified by any string of hexadecimal digits "H..H"
Mainly intended for multi-byte Unicode, but can be corrupted by some regex
operations, such as backtracking
\cC Named control character: \cZ = Ctrl-Z; some platforms require that C be capital
. Match any character
[list] Match any character in the bracketed list of characters
[^list] Match any character except one in the bracketed list
\C Match the character "C" as a literal character
\w Word character, [a-zA-Z0-9_]
\W Non-word character, [^a-zA-Z0-9_]
\d Decimal digit, [0-9]
\D Non-digit, [^0-9]
\s Whitespace character, [\n\r\f\t ]
\S Non-whitespace character, [^\n\r\f\t ]
(?#text) Inline comment; embeds "text" in a regular expression
#text EOL comment; all "text" after the "#" is ignored
Expressions that anchor a match to a location in the text, or test a condition without consuming characters:
^exp Text matched by "exp" must occur at the start of the searched text or after a line break
exp$ Text matched by "exp" must occur at the end of the searched text or before a line break
\A Start of search string
\Z End of string or before EOS newline
\z End of string
\G Beginning of current search
\b Word boundary; between \w and \W or BOS/EOS
\B Not-word boundary
(?=exp) Positive lookahead; matches if the text after the current position
matches, but doesn't consume that text
(?!exp) Negative lookahead; matches if the text after the current position
doesn't match, but doesn't consume that text
(?<=exp) Positive lookbehind; matches if the text preceding the current
position matches; consumes no characters
(?<!exp) Negative lookbehind; matches if the text preceding the current
position doesn't match; consumes no characters
Expressions that allow an “either/or” match:
exp|exp Match the left expression; if that fails, attempt to match the right one
Expressions that group and capture text:
(exp) Group the text matched by "exp" and capture it in one of \1, \2, etc. (determine group
numbers by counting "(" characters from the left of the search regex)
(?P<name>exp)
Group and capture "exp" into named group "name"
\n Results of nth earlier group submatch
(?:exp) Group "exp" without capturing it
(?>exp) Group "exp", disallowing backtracking
exp|exp Attempt to match the left expression; if that fails, try the right expression
Quantifiers that determine how many times an expression can match:
exp* Match "exp" zero or more times, but as many as possible
exp+ Match "exp" one or more times, but as many as possible
exp? Match "exp" zero or one times
exp{n} Match "exp" exactly n times
exp{n,} Match "exp" at least n times, but as many as possible
exp{min,max}
Match "exp" between "min" and "max" times, but as many as possible
exp*? Match "exp" zero or more times, but as few as necessary
exp+? Match "exp" one or more times, but as few as necessary
exp?? Match "exp" zero or one times, but as few as necessary
exp{n,}? Match "exp" at least n times, but as few as necessary
exp{min,max}?
Match "exp" between "min" and "max" times, but as few as necessary
exp*+ Match "exp" zero or more times, but as many as possible; never backtrack
exp++ Match "exp" one or more times, but as many as possible; never backtrack
exp?+ Match "exp" zero or one times; never backtrack
exp{n}+ Match "exp" exactly n times; never backtrack
exp{n,}+ Match "exp" at least n times, but as many as possible; never backtrack
exp{min,max}+
Match "exp" between "min" and "max" times, but as many as possible; never backtrack
(?(condition)true_exp|false_exp)
If/else match; if "condition" is true, attempt to match
"true_exp"; else attempt to match "false_exp"; the condition can
be a capture group number or a lookaround
(?(condition)exp)
If match; if "condition" is true, attempt to match "exp";
the condition can be a capture group number or a lookaround
Modes:
i Case-insensitive match
m Multi-line: ^ and $ match next to embedded \n
s Single-line: dot metacharacter matches newlines
x Ignore whitespace and allow EOL comments in expression
Mode modifiers:
(?imsx) Turn listed modes on
(?-imsx) Turn listed modes off
(?modes:exp) Apply the listed modes only to "exp", the expression within the parentheses
Quote delimiters:
\Q Quote all following metacharacters
\E End a span started with \Q
Replace String Expressions
$ Insert a line break
\n Insert the text captured by group number "n" (the group beginning
with the nth "(", since Perl 5.8 groups are one-based)
\0 Insert the complete matched text
% Delete the previous character
^ Leave cursor at this position in the replaced text
!(macro)
Run Multi-Edit macro within the parentheses; if the left parenthesis is absent, the text is
interpreted normally, while a missing right parenthesis will display an "Error 1018" dialog.
To force the string "!(" to be interpreted as normal text, preface the "!" with a backslash: "\!(".
\a 0x07 BEL
\b 0x08 BS Backspace
\f 0x0C FF Page Break (Form Feed)
\n 0x0A LF Newline (Linefeed)
\r 0x0D CR Carriage Return
\t 0x09 HT Tab (Horizontal Tab)
\v 0x0B VT Vertical Tab
\xHH Byte specified by the two hexadecimal digits "HH"
Minimal vs. Maximal Closure
Regular Expression Quantifier Types
Quantifiers are the metacharacter expressions that determine how many times a preceding expression can repeatedly match. There are three basic kinds: lazy, greedy, and possessive. Not all kinds are supported by all three regular expression types in Multi-Edit.
Minimal, or lazy, quantifiers match only as many characters as are necessary, even though a longer match might be possible. The lazy quantifiers for each style are:
Classic: N/A
UNIX: C@, C#
Perl 5.8: exp*?, exp+?, exp??, exp{n}?, exp{n,}?, exp{min,max}?
Maximal, or greedy, quantifiers match as many characters as possible, although (Perl-compatible only) they will backtrack to shorter matches if necessary to find a match for the entire search expression. Note that unless otherwise qualified, Perl 5.8 quantifiers are greedy by default, although they are not multi-line (use the multi-line mode modifier “(?m)” before a Perl regex to make it match across line boundaries). The older UNIX expressions do not match across line boundaries. The greedy quantifiers for the three styles are:
Classic: C*, C+
UNIX: C*, C+
Perl 5.8: exp*, exp+, exp?, exp{n}, exp{n,}, exp{min,max}
Possessive quantifiers (Perl-style only) behave like greedy quantifiers, except that they never backtrack; they always match the longest possible string. This can cause the search string to fail to match where a greedy quantifier might backtrack to an overall match, but it also avoids certain infinite-loop conditions that can happen with poorly-constructed regexes. The possessive quantifiers are:
Classic: N/A
UNIX: N/A
Perl 5.8: exp*+, exp++, exp?+, exp{n}+, exp{n,}+, exp{min,max}+
For example, given the following code fragment
if ( ( Wrap_Stat && Doc_Mode ) || ( Format_Stat && Indent_Style ) ) {
Use_Format_Line = True;
}
the greedy regex "\(.*\)", which matches the text between opening and closing parentheses, inclusively, matches
( ( Wrap_Stat && Doc_Mode ) || ( Format_Stat && Indent_Style ) )
whereas the lazy regex “\(.@\)” for UNIX, “\(.*?\)” for Perl, when run against the same text, matches
( ( Wrap_Stat && Doc_Mode )
The Perl 5.8 possessive regex “\(.*+\) \) {”, run against the same text, finds no match, because the subexpression “.*+” has already consumed the last closing parenthesis and can't backtrack to “give up” the space and parenthesis necessary to match the entire expression.
Changing the regex to the greedy “\(.*\) \) {” matches against
( ( Wrap_Stat && Doc_Mode ) || ( Format_Stat && Indent_Style ) ) {
because the non-possessive form can backtrack to an earlier partial match that allows the rest of the expression to match as well.
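The three behaviours can be reproduced with Python's `re` module, which uses the same quantifier syntax as the Perl style. This is a sketch for illustration only, not Multi-Edit's engine; note that Python supports possessive quantifiers only from version 3.11, so that part is guarded by a version check.

```python
import re
import sys

line = "if ( ( Wrap_Stat && Doc_Mode ) || ( Format_Stat && Indent_Style ) ) {"

# Greedy: runs to the last ")" on the line.
greedy = re.search(r"\(.*\)", line).group(0)

# Lazy: stops at the first ")" that lets the pattern succeed.
lazy = re.search(r"\(.*?\)", line).group(0)

# Greedy with a trailing " ) {": ".*" gives text back until the tail fits.
backtracked = re.search(r"\(.*\) \) \{", line).group(0)

print(greedy)
print(lazy)
print(backtracked)

# Possessive: ".*+" keeps everything it consumed, so the tail can never match.
if sys.version_info >= (3, 11):
    possessive = re.search(r"\(.*+\) \) \{", line)
    print(possessive)  # None
```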
Examples
Search For Zero Or More Occurrences Of Any Character
EXAMPLE 1: Search for zero or more occurrences of any character.
SEARCH EXPRESSION:
(Classic Style) ?*
(Unix Style) .@
(Perl Style) .*?
DETAILED EXPLANATION OF SEARCH EXPRESSION COMPONENTS:
(Classic Style)
? Match any character
* Match zero or more occurrences of the previous character
(Unix Style)
. Match any character
@ Match zero or more occurrences of the previous character (as few as necessary)
(Perl Style)
. Match any character
*? Match zero or more times (as few as necessary)
CREATING AN ALIAS:
If you want, you can make an alias <*> to hold these regular expressions. In any search dialog, click ‘Alias’ to bring up the Regular Expression Aliases dialog. Click ‘Insert’, and fill in the fields:
Description: *Star*
Alias: <*>
Classic: ?*
Unix: .@
Perl: .*?
Now you can search for things like “Hello<*>Goodbye”. This will find the word “Hello”, followed by zero or more of any characters, followed by “Goodbye”.
Search For The '$' Character
EXAMPLE 2: Search for the '$' character.
SEARCH EXPRESSION:
(Classic Style) @$
(Unix and Perl Style) \$
DETAILED EXPLANATION OF SEARCH EXPRESSION COMPONENTS:
(Classic Style)
@ Find literal
$ Search for '$' char
(Unix and Perl Style)
\ Find literal
$ Search for '$' char
SUMMARY: Since the '$' symbol is a wildcard character and has special meaning in search and replace, an 'escape character' must be placed before it. The escape character tells Multi-Edit to treat the next character literally. For Multi-Edit Classic Style regular expressions the escape character is the '@' symbol. For Unix and Perl style regular expressions, the backslash '\' character is the escape character.
The escape character must be used when searching for any of the wildcard characters literally. The wildcard characters include:
(Classic Style) ^ $ * + @ # [ ] | ? % { } & ~
(Unix Style) ^ $ * + @ # . [ ] | ( ) \
(Perl Style) ^ $ * + @ # . [ ] | ( ) \ ? { }
Note that within a character class the metacharacters are different. In general, in UNIX and Perl classes the only metacharacters are -, [, ], and \. Note also that in Perl \b has a different meaning inside a character class than outside one.
The wildcard characters Multi-Edit recognizes in replace expressions are:
(Classic Style) $ # & % ^
(Unix Style) $ \ & % ^
(Perl Style) $ \ % ^
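The escaping rule for the Unix and Perl styles can be tried out in Python, which also uses the backslash as its escape character. The sketch below is an illustration only; the sample text is made up, and `re.escape()` is a Python convenience with no counterpart named in this manual.

```python
import re

text = "Total: $5.00 (was $7.50)"

# Escaping by hand: the backslash makes "$" a literal character.
by_hand = re.findall(r"\$", text)

# re.escape() adds the backslashes for every metacharacter in a string.
pattern = re.escape("$7.50")
escaped = re.findall(pattern, text)

print(by_hand)  # ['$', '$']
print(escaped)  # ['$7.50']
```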
Find A Parenthesis Set
EXAMPLE 3: Find a parenthesis set.
SEARCH EXPRESSION:
(Classic Style) (?*)
(Unix Style) \(.*\)
(Perl Style) \(.*\) or (?m)\(.*\)
DETAILED EXPLANATION OF SEARCH EXPRESSION COMPONENTS:
(Classic Style)
( Search for an open parenthesis
?* Followed by any number of occurrences (>=0) of any character
) Followed by a close parenthesis
(Unix Style)
\( Search for an open parenthesis
.* Followed by maximal match of any number of occurrences (>=0) of any character
\) Followed by a close parenthesis
(Perl Style)
(?m) Multi-line search, if present
\( Search for an open parenthesis
.* Followed by maximal match of any number of occurrences (>=0) of any character
\) Followed by a close parenthesis
SUMMARY: Finds parenthesis sets by searching for an open parenthesis followed by any number of characters, of any kind, followed by a close parenthesis. Note that the Unix style and Perl style use maximal matching. The Perl expression performs a multi-line search if prefixed by ‘(?m)’.
Find Any Occurrence Of Single Or Double Quote Sets
EXAMPLE 4: Finds any occurrence of single or double quote sets.
SEARCH EXPRESSION:
(Classic Style) {"?*"}|{'?*'}
(Unix Style) (".*")|('.*')
(Perl Style) (['"]).*?\1 or (?m)(['"]).*?\1
DETAILED EXPLANATION OF SEARCH EXPRESSION COMPONENTS:
SUMMARY: Finds double or single quotes sets by searching for a quote, followed by any number of characters, of any kind, followed by a quote.
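The Perl-style expression relies on a backreference: whichever quote character the group captures must also close the string. A minimal sketch in Python, whose `re` module accepts the same syntax (the sample text and variable names are hypothetical):

```python
import re

text = """say "hello" or 'bye' now"""

# Group 1 captures the opening quote; \1 requires the same quote to close.
pattern = re.compile(r"""(['"]).*?\1""")

quoted = [m.group(0) for m in pattern.finditer(text)]
print(quoted)  # ['"hello"', "'bye'"]
```

Because the closing quote is tied to the opening one, a single quote inside a double-quoted string does not end the match early.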
Find The Next Word
EXAMPLE 5: Finds the next word.
SEARCH EXPRESSION:
(Classic Style) %|[~a-z0-9_][a-z0-9_]
(Unix Style) ^|[^a-z0-9_][a-z0-9_]
(Perl Style) (?<!\w)(?=\w)
DETAILED EXPLANATION OF SEARCH EXPRESSION COMPONENTS:
(Classic Style)
% Search for beginning of line
| -OR-
[~a-z0-9_] Search for any character that is not a letter between a-z, a number between 0-9, or an underscore
[a-z0-9_] Search for any character that is a letter between a-z, a number between 0-9, or an underscore
(Unix Style)
^ Search for beginning of line
| -OR-
[^a-z0-9_] Search for any character that is not a letter between a-z, a number between 0-9, or an underscore
[a-z0-9_] Search for any character that is a letter between a-z, a number between 0-9, or an underscore
(Perl Style)
(?<!\w) Search for a position not preceded by a word character (negative look-behind)
(?=\w) and followed by a word character (look-ahead)
The Unix expression would have also worked as a Perl expression, but is not as efficient as the Perl expression shown. If it doesn’t matter whether the expression matches the beginning or the ending of the word, the simple expression \b, which finds a word boundary, is sufficient. Note that the Perl expression, unlike the other two expressions, doesn’t select any text. That’s because it consists of two anchors. That also makes it quite nice as a sort of prefix to an extended expression; the prefix locates the start of a word, and the rest of the expression matches from there.
SUMMARY:
Search for the beginning of the line OR search for the first occurrence of a non-word character (anything other than a letter, digit, or underscore) followed by a letter, digit, or underscore. In the Perl expression, search for a look-behind and look-ahead match that together define the start of a word; it isn’t necessary then to cover the case where the word occurs at the start of a line as a separate subexpression, as the look-behind condition covers it.
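The zero-width nature of the Perl expression is easy to see in Python, which supports the same lookaround syntax. The following is a sketch with made-up sample text: the bare expression yields positions rather than text, and the same expression works as a prefix to a longer pattern.

```python
import re

text = "find the next_word, please"

# Two anchors and nothing else: every match is empty, only positions differ.
starts = [m.start() for m in re.finditer(r"(?<!\w)(?=\w)", text)]
print(starts)  # [0, 5, 9, 20]

# Used as a prefix: match whole words that begin with "n".
n_words = re.findall(r"(?<!\w)(?=\w)n\w*", text)
print(n_words)  # ['next_word']
```

The "n" inside "find" is skipped because it is preceded by a word character.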
Search For The Following Operators
EXAMPLE 6: Searches for any of the following operators: '= > < ! | &'
SEARCH EXPRESSION:
(Classic Style) [=><!@|@&]
(Unix and Perl Styles) [=<>!|&]
DETAILED EXPLANATION OF SEARCH EXPRESSION COMPONENTS:
(Classic Style)
[ Begin character class
= Search for '=' character
> Search for '>' character
< Search for '<' character
! Search for '!' character
@| Search for '|' character literally. Since the | is a wildcard character, an @ must be placed before it in order to search for the character literally (as opposed to interpreting it as one of the search expression operators)
@& Search for '&' character
] End character class
(Unix and Perl Styles)
[ Begin character class
= Search for '=' character
< Search for '<' character
> Search for '>' character
! Search for '!' character
| Search for '|' character literally. Since the | is a wildcard character, it would ordinarily have to be “escaped” by preceding it with a backslash to prevent the regex engine from interpreting it as such. Occurring as it does, however, inside a character class, this is not necessary, as the alternation operator would be meaningless in that context.
& Search for '&' character
] End character class
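The Unix and Perl form of this character class works unchanged in Python, which gives a quick way to check that "|" needs no escaping inside the brackets. A small sketch with a hypothetical line of code as the search text:

```python
import re

code = "if (a >= b || !c && d < e)"

# Inside [...] the "|" is an ordinary character, not the alternation operator.
ops = re.findall(r"[=<>!|&]", code)
print(ops)  # ['>', '=', '|', '|', '!', '&', '&', '<']
```

Each operator character is matched on its own, so a two-character operator such as ">=" appears as two separate matches.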
Search For The Occurrence Of A String And Replace It With The Found String In Parentheses
EXAMPLE 7:
Searches for the occurrence of a string and replaces it with the found string in parentheses, demonstrating the use of '&' and ‘\0’ as replace expression wildcard characters.
CURRENT STRING: Multi-Edit
DESIRED RESULT: (Multi-Edit)
SEARCH EXPRESSION:
(All Styles) Multi-Edit
REPLACE EXPRESSION:
(Classic & UNIX Styles) (&)
(Perl Style) (\0)
DETAILED EXPLANATION OF REPLACE EXPRESSION COMPONENTS:
(Classic and Unix Styles)
( Print a literal open parenthesis,
& followed by the entire string matched by the search expression,
) and then a closing literal parenthesis
(Perl)
( Print a literal open parenthesis,
\0 followed by the entire string matched by the search expression,
) and then a closing literal parenthesis
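For comparison, here is the same replacement sketched in Python. Replacement syntax differs from program to program: Python writes the whole match as `\g<0>`, since a bare `\0` in a Python replacement string means something else. The sample sentence is made up for illustration.

```python
import re

text = "Try Multi-Edit today"

# \g<0> is Python's spelling of "the complete matched text".
result = re.sub(r"Multi-Edit", r"(\g<0>)", text)
print(result)  # Try (Multi-Edit) today
```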
Search For A Blank Line And Delete It
EXAMPLE 8:
Searches for a blank line and deletes it.
SEARCH EXPRESSION:
(Classic Style) %$
(Unix and Perl Styles) ^$
DETAILED EXPLANATION OF SEARCH EXPRESSION COMPONENTS:
(Classic Style)
% Search for the beginning of a line
$ Followed by the end of a line
(Unix and Perl Styles)
^ Search for the beginning of a line
$ Followed by the end of a line
REPLACE EXPRESSION:
(All Styles) %
DETAILED EXPLANATION OF REPLACE EXPRESSION COMPONENTS:
(All Styles) % Delete a character
SUMMARY:
Finds a blank line by searching for the beginning of a line immediately followed by the end of line. Using '%' (delete character) as the replace expression deletes the blank line.
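Python has no '%' delete-character replacement, so an equivalent sketch has to match the blank line's own line break and replace it with nothing. This is an analogue of the technique under that assumption, not a translation of the Multi-Edit expression.

```python
import re

text = "one\n\ntwo\n\n\nthree\n"

# With MULTILINE, ^ matches at the start of every line; a line that starts
# with its own newline is blank, so removing that newline deletes the line.
result = re.sub(r"^\n", "", text, flags=re.MULTILINE)
print(repr(result))  # 'one\ntwo\nthree\n'
```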
Search For The First Occurrence Of A ';' And Delete The Rest Of The Line
EXAMPLE 9:
Searches for the first occurrence of a semicolon and deletes the rest of the line after it.
CURRENT STRING: goto_line(1);del_line;
DESIRED RESULT: goto_line(1);
SEARCH EXPRESSION:
(Classic Style) {?*;}{?*}$
(Unix Style) (.@;)(.*)$
(Perl Style) (.*?;)(.*)$
DETAILED EXPLANATION OF SEARCH EXPRESSION COMPONENTS:
SUMMARY:
Find any character occurring any number of times (including zero) and followed by a semicolon. This constitutes Group 0 (Group 1 in Perl). Following the previous match, find any character that occurs any number of times. This constitutes Group 1 (Group 2 in Perl).
REPLACE EXPRESSION:
(Classic Style) #0
(Unix Style) \0
(Perl Style) \1
DETAILED EXPLANATION OF REPLACE EXPRESSION COMPONENTS:
(Classic Style)
#0 Replace found string with group 0 from search string.
(Unix Style)
\0 Replace found string with group 0 from search string.
(Perl Style)
\1 Replace found string with group 1 from search string. Perl counts its sub-pattern matches starting at one, rather than zero as the Classic and Unix styles do. \0 is reserved for the text matched by the entire pattern.
SUMMARY:
In this example we are searching for two groups of strings: everything up to the ';' and everything else to the end of the line. When we replace the matched string, we put back only the first group, which has the effect of deleting to the end of the line.
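The Perl-style search and replace expressions from this example carry over directly to Python, which also numbers groups from one. A minimal sketch:

```python
import re

line = "goto_line(1);del_line;"

# Group 1 is everything up to the first ";"; group 2 is the rest of the line.
# Putting back only group 1 discards the rest.
result = re.sub(r"(.*?;)(.*)$", r"\1", line)
print(result)  # goto_line(1);
```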
Swap The Parameters Of The Procedure gotoxy
EXAMPLE 10:
Swaps the parameters of the procedure 'gotoxy'.
CURRENT STRING: gotoxy(x,y);
DESIRED RESULT: gotoxy(y,x);
SEARCH EXPRESSION:
(Classic Style) gotoxy({?+},{?+});
(Unix Style) gotoxy\((.#),(.#)\);
(Perl Style) gotoxy\((.+?),(.+?)\);
DETAILED EXPLANATION OF SEARCH EXPRESSION COMPONENTS:
SUMMARY:
Finds the literal string “gotoxy(”, followed by any string at least one character long (the first argument, captured as Group 1 by a Perl expression, Group 0 by the other styles), followed by the comma that delimits the two arguments; then another capture expression identical to the first for the second parameter (Group 2 in Perl, Group 1 in the other styles), followed by the literal string “);”.
REPLACE EXPRESSION:
(Classic Style) gotoxy(#1,#0);
(Unix Style) gotoxy(\1,\0);
(Perl Style) gotoxy(\2,\1);
DETAILED EXPLANATION OF REPLACE EXPRESSION COMPONENTS:
(Classic Style)
gotoxy( Replace with the opening string “gotoxy(”
#1 followed by Group 1 from the search expression
, followed by the argument-delimiter comma
#0 followed by Group 0 from the search expression
); followed by the closing string “);”
(Unix Style)
gotoxy( Replace with the opening string “gotoxy(”
\1 followed by Group 1 from the search expression
, followed by the argument-delimiter comma
\0 followed by Group 0 from the search expression
); followed by the closing string “);”
(Perl Style)
gotoxy( Replace with the opening string “gotoxy(”
\2 followed by Group 2 from the search expression
, followed by the argument-delimiter comma
\1 followed by Group 1 from the search expression
); followed by the closing string “);”
SUMMARY:
This example demonstrates the use of groups to change the order of your search string. Since we defined each parameter as a group, it was easy to swap them by changing their order in the replace expression.
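The Perl-style version of this swap can be tried as-is in Python, which uses the same one-based group numbering in both the search and the replacement. A short sketch:

```python
import re

call = "gotoxy(x,y);"

# Group 1 captures the first argument, group 2 the second;
# the replacement writes them back in the opposite order.
swapped = re.sub(r"gotoxy\((.+?),(.+?)\);", r"gotoxy(\2,\1);", call)
print(swapped)  # gotoxy(y,x);
```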
Search For Two Semicolon Delimited Statements On A Single Line
EXAMPLE 11:
Searches for two semicolon delimited statements on a single line. Once found, a carriage return is inserted between them leaving each statement on its own line.
CURRENT STRING: goto_line(1);del_line;
DESIRED RESULT: goto_line(1);
del_line;
SEARCH EXPRESSION:
(Classic Style) {;}@x20*{[~ ]?*;} (@x20 is the space character; a literal space would also work)
(Unix Style) (;)\x20@([^ ].@;) (\x20 is space in this style)
(Perl Style) (;)\s*?(\S.*?;)
DETAILED EXPLANATION OF SEARCH EXPRESSION COMPONENTS:
SUMMARY:
Find the first occurrence of a semicolon on a line. This will constitute Group 0 (1 for Perl). Following this, find any number of occurrences of a space. Then find Group 1 (2 for Perl), which consists of the shortest matching string of at least one character, followed by a semicolon.
REPLACE EXPRESSION:
(Classic Style) #0$#1
(Unix Style) \0$\1
(Perl Style) \1$\2
DETAILED EXPLANATION OF REPLACE EXPRESSION COMPONENTS:
(Classic Style)
#0 Replace the matched string with whatever matched capture Group 0,
$ followed by a line break,
#1 followed by whatever matched Group 1
(Unix Style)
\0 Replace the matched string with whatever matched capture Group 0,
$ followed by a line break,
\1 followed by whatever matched Group 1
(Perl Style)
\1 Replace the matched string with whatever matched capture Group 1,
$ followed by a line break,
\2 followed by whatever matched Group 2
SUMMARY: In this example, two groups are defined. The first consists of the first occurrence of a semicolon on a line. The second consists of a string that begins with a non-space character and ends with the next semicolon. When the found string is replaced, a line break is inserted between the two groups, leaving each statement on its own line.
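The Perl-style search expression can be checked in Python. The one change in this sketch is the replacement: Python has no "$" line-break expression, so "\n" stands in for it. The second input string is a made-up variation showing that spaces between the statements are dropped.

```python
import re

pattern = r"(;)\s*?(\S.*?;)"

# Group 1 is the first ";", group 2 the following statement up to its ";".
split_plain = re.sub(pattern, r"\1\n\2", "goto_line(1);del_line;")
split_spaced = re.sub(pattern, r"\1\n\2", "goto_line(1);   del_line;")

print(split_plain)
print(split_spaced)
# Both print:
# goto_line(1);
# del_line;
```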
Search And Replace Phrases
EXAMPLE 12:
Searches for the phrases “this is a test” OR “this is not a test” and replaces “a test” with “a pizza”
CURRENT STRING: this is a test
- OR -
this is not a test
DESIRED RESULT: this is a pizza
- OR -
this is not a pizza
SEARCH EXPRESSION:
(Classic Style) this i{{s not}|{s}} a test
(Unix Style) this i((s not)|(s)) a test
(Perl Style) this is( not)? a test
DETAILED EXPLANATION OF SEARCH EXPRESSION COMPONENTS:
SUMMARY:
Match “this i”, followed by either “s not” OR “s” (which constitutes group #0), followed by “a test”. For the Perl version, find “this is” , possibly followed by just one “ not”, followed by “ a test”. This is only possible in the Perl version, since only it has a “none or one” control, which is necessary to lock out false matches (repeated “ not”).
REPLACE EXPRESSION:
(Classic Style) this i#0 a pizza
(Unix Style) this i\0 a pizza
(Perl Style) this is\1 a pizza
SUMMARY:
Replace the complete matched string with the literal-string prefix “this i” (with an extra “s” in the Perl version), concatenated with Group 0 (1 for Perl) from the search expression, then with the literal-string suffix “ a pizza”. There is a very important detail to be noted in the Classic and Unix search expressions in this example: “s not” came before “s” in the alternation.
This was done intentionally. Since “OR” expressions are evaluated from left to right, you must pay close attention to the order in which their components appear. Once the regex engine matches any component of the expression, it won’t evaluate any of the remaining components for a match. Since the Perl version doesn’t use alternation, it isn’t an issue here; however, in Perl expressions that do use alternation, it is still an issue. If “s” had been placed before “s not” the following would result when encountering the line “this is not a test” (the Unix version only differs in that the curly braces become parentheses):
FOUND TEXT: xx this is not a test xxx xx
RESULT: Equivalent of no match, no replacement is made.
The “s” portion of the line would match, terminating evaluation of the other branch of the alternation. The remaining component “s not” would never be evaluated. At that point, the engine would attempt to match the next unmatched part of the expression, “ a test”, and would fail, because the searched text at that point is “not a test”.
The engine will always attempt to backtrack until it finds a match (whenever possible, expressions attempt to “give up” already-matched text to backtrack to a prior state from which a match down a different path may be possible), unless it exhausts the text without finding a match, is prevented from backtracking (note the operators explicitly intended for that), or arrives at a state where the available search space expands exponentially (you’ll likely think Multi-Edit is hung, but it’s the result of a poorly-formulated regex). Always keep this in mind when constructing your own search expressions.
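The Perl-style expressions from Example 12 can be exercised in Python as a rough check. This sketch assumes Python 3.5 or later, where a group that took no part in the match is replaced by an empty string. Python's engine also backtracks into later branches of an alternation, so it does not reproduce the failed match described above for the “s” before “s not” ordering; the last lines show that difference.

```python
import re

pattern = r"this is( not)? a test"
replacement = r"this is\1 a pizza"

# Group 1 is " not" when present, and empty otherwise (Python 3.5+).
plain = re.sub(pattern, replacement, "this is a test")
negated = re.sub(pattern, replacement, "this is not a test")

print(plain)    # this is a pizza
print(negated)  # this is not a pizza

# Alternation with the short branch first: Python retries the second branch
# after " a test" fails to match, so a match is still found here.
reordered = re.search(r"this i(s|s not) a test", "this is not a test")
print(reordered is not None)  # True
```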