Regular Expressions

Regular expressions behave like sophisticated wildcards (actually it's more accurate to say that wildcards are a very small subset of regular expressions). Regular expressions use symbols with special meanings, called metacharacters.

A regular expression, or regex for short, lets you do a number of things not possible in a literal-text search, such as requiring the search text to occur at the beginning or end of a line, searching for text that fits a certain pattern but isn't bound to specific characters, or searching across multiple lines. Multi-Edit provides support for three regex styles: Multi-Edit Classic, Unix-compatible (loosely based on the X/Open standard for extended expressions), and Perl 5.8-compatible. The Perl and UNIX styles support maximal and minimal matching, and UNIX expressions can span lines (the Perl style will soon have this capability as well).

The regular expression style to be used is set in the Tools | Customize | Search dialog.

Here is an example of what regular expressions can do. Given the following text in a document:

Test-drive the car of your choice.
	Wow!!! This handles great!
	What is the price?

The Classic regex pattern
    Test?*$Wow?*$What

will match everything from “Test” on the first line to “What” on the third line.

Regular Expressions are a very powerful search tool, but one that carries a cost. Unless you use them often enough to become thoroughly familiar with them, on the odd occasion you do use them you'll find yourself puzzling through a pattern to make it find exactly what you're after, or worse, continually looking up the meanings of the various metacharacters. This can be particularly annoying when you've spent a lot of time figuring out a really great regex for some esoteric search, only to find that you can't remember it later. As well, regular expressions can appear quite cryptic and difficult to read, and leaving out or misplacing a single character can cause your search to fail, or to match against a totally unexpected string. Also, Multi-Edit supports three different regex styles, which means you must be careful not to forget which style you're using.

Multi-Edit provides two tools to make regexes easier to use: drop-down reminder menus, and expression aliases. The drop-down reminder menus list all the available metacharacters for the currently-selected regular expression style. To access them, simply click on the arrow button to the right of the “Search for” text box on any of the search menus; selecting a menu item inserts its associated metacharacter in the text box at the current cursor position. Note that the Perl-style drop-downs contain submenus grouped by type, with some of the most commonly-used metacharacters at the top of the main menu. If you're performing a replacement as well, there's a second drop-down menu button to the right of the “Replace with” text box, listing the expressions valid in a replace expression. Note that since replacement expressions aren't part of most regular expression standards, they vary from program to program; we've chosen replacement expressions in common use in major packages, such as Perl, with occasional variances where an expression wouldn't make sense in Multi-Edit's context, or would collide with something else.

Expression aliases allow you to save expressions under names of your choice, in any or all regex styles, with a short description; to use such a saved expression later, simply enter its name in the text box, and Multi-Edit will automatically expand it into the full expression internally. There are a number of aliases already defined for your use, and you can easily add your own to the list. The only syntactic requirement on alias names is that you must enclose them in angle-brackets, like this: <alias_name>. You'll probably also want to keep those names short, for ease of use. Aliases representing filenames and paths fully support long filenames, with embedded spaces. (See Example 1: Search for zero or more occurrences of any character, for an example of creating an alias.)

Many of Multi-Edit's system macros use regex aliases; the search macro itself is one of these. You can use aliases in macros of your own composition, as well, using the techniques illustrated in the system macro header file RegExp.sh. The most important of these is to call the ReTranslate() macro to expand any aliases in your search string into their actual text before you run your search. You can also call ReSelectAlias() to display the Regular Expression Aliases dialog, which lists the currently-defined aliases; the macro's return string is the selected alias.

IMPORTANT: never rename or modify any of the predefined aliases unless you're very familiar with the workings of Multi-Edit's system macros, and quite sure of what you're doing; it's all too easy to break major program subsystems, since, as mentioned above, many system macros depend on them.

The Classic- and UNIX-style regular expressions deviate from the X/Open standard in that they don't treat strings of characters as groups, unless they're explicitly grouped using the grouping expressions specific to each style. The Perl 5.8-style regexes follow the spec. For example, given the text

    "The cat ate the car"

and the regular expression

    "cat|r"

which is valid in all three styles, the Classic and UNIX styles match on the words “cat” and “car”, while the Perl 5.8 style matches on “cat” and the final “r” in “car”. This is because the Perl 5.8 engine interprets the regex (including group metacharacters for clarity) as

    "(cat)|(r)"

while the other engines interpret it as

    "ca(t|r)"

which does not, as observed, follow the X/Open specification. And now for something completely different.
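To see the two groupings side by side outside Multi-Edit, here is a minimal sketch using Python's re module, which applies the same alternation precedence as the Perl 5.8 style. Python is only a stand-in chosen for illustration (an assumption; it is not part of Multi-Edit), and the second pattern spells out explicitly the grouping that the Classic and UNIX engines apply implicitly.

```python
import re

text = "The cat ate the car"

# Perl-style precedence: alternation splits the whole pattern, i.e. (cat)|(r).
perl_like = re.findall(r"cat|r", text)

# Explicit grouping reproduces the Classic/UNIX reading, i.e. ca(t|r).
# (?:...) groups without capturing, so findall returns the whole match.
classic_like = re.findall(r"ca(?:t|r)", text)

print(perl_like)     # ['cat', 'r']
print(classic_like)  # ['cat', 'car']
```

If you need the same result in every style, write the grouping explicitly rather than relying on each engine's default.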

Classic Regular Expressions

Search String Expressions

Expressions that anchor a match to a location in the text:

    %exp	Text matched by "exp" must occur at the beginning of a line
    exp$	Text matched by "exp" must occur at the end of a line

Quantifiers that determine how many times an expression can match:

    C*		Match zero or more occurrences of the previous character ("C") or grouped expression
    C+		Match one or more occurrences of the previous character ("C") or grouped expression

Expressions that allow an “either/or” match:

    C|C		Match either the left or right character ("C") or grouped expression

Expressions that group and capture text:

    {exp}	Group the text matched by "exp" and capture it in one of #0, #1, etc.
            	(determine group numbers by counting "{" characters from the left of the search regex)

Expressions that match single characters:

    ?		Match any character
    [list]	Match any character in the bracketed list of characters
    [~list]	Match any character except one in the bracketed list
    @C		Match the character "C" as a literal character
    @a		0x07	BEL
    @b		0x08	BS	Backspace
    @f		0x0C	FF	Page Break (Form Feed)
    @n	 	0x0A	LF	Newline (Linefeed)		
    @r	 	0x0D	CR	Carriage Return
    @t	 	0x09	HT	Tab (Horizontal Tab)
    @v	 	0x0B	VT	Vertical Tab
    @xHH      Byte specified by the two hexadecimal digits "HH"

Replace String Expressions

    $		Insert a line break
    #n		Insert the text captured by group number "n" (the group beginning with the 
  		(n + 1)th "{", since Classic groups are zero-based)
    &		Insert the complete matched text
    %		Delete the previous character
    ^		Leave cursor at this position in the replaced text

Unix Regular Expressions

Search String Expressions

Expressions that anchor a match to a location in the text:

    ^exp	Text matched by "exp" must occur at the beginning of a line
    exp$	Text matched by "exp" must occur at the end of a line

Quantifiers that determine how many times an expression can match:

    C*		Match zero or more occurrences of the previous character ("C") or grouped expression 
   		(as many as possible)
    C+		Match one or more occurrences of the previous character ("C") or grouped expression 
   		(as many as possible)
    C@		Match zero or more occurrences of the previous character ("C") or grouped expression 
  		(as few as necessary)
    C#		Match one or more occurrences of the previous character ("C") or grouped expression 
  		(as few as necessary)

Expressions that allow an “either/or” match:

    C|C		Match either the left or right character ("C") or grouped expression

Expressions that group and capture text:

    (exp)	Group the text matched by "exp" and capture it in one of \0, \1, etc. 
                (determine group numbers by counting "(" characters from the left of 
                the search regex)

Expressions that match single characters:

    .		Match any character
    [list]	Match any character in the bracketed list of characters
    [^list]	Match any character except one in the bracketed list
    \C		Match the character "C" as a literal character
    \a		0x07	BEL
    \b		0x08	BS	Backspace
    \f		0x0C	FF	Page Break (Form Feed)
    \n		0x0A	LF	Newline (Linefeed)
    \r		0x0D	CR	Carriage Return
    \t		0x09	HT	Tab (Horizontal Tab)
    \v		0x0B	VT	Vertical Tab
    \xHH   Byte specified by the two hexadecimal digits "HH"

Replace String Expressions

$	Insert a line break
\n	Insert the text captured by group number "n" (the group beginning
        with the (n + 1)th "(", since UNIX groups are zero-based)
&	Insert the complete matched text
%	Delete the previous character
^	Leave cursor at this position in the replaced text
\a	0x07	BEL
\b	0x08	BS	Backspace
\f	0x0C	FF	Page Break (Form Feed)
\n	0x0A	LF	Newline (Linefeed)
\r	0x0D	CR	Carriage Return
\t	0x09	HT	Tab (Horizontal Tab)
\v	0x0B	VT	Vertical Tab
\xHH	Byte specified by the two hexadecimal digits "HH"

Perl 5.8 Regular Expressions

Search String Expressions

Expressions that match single characters:

    \a		0x07	BEL
    \b		0x08	BS	Backspace; only supported in character classes
    \e		0x1B	ESC	Escape
    \f		0x0C	FF	Page Break (Form Feed)
    \n		0x0A	LF	Newline (Linefeed)
    \r		0x0D	CR	Carriage Return
    \t		0x09	HT	Tab (Horizontal Tab)
    \OOO	Character specified by three octal [0-7] digits "OOO"
    \xH, \xHH	Character specified by one "H" or two "HH" hexadecimal digits
    \x{H..H}	Character specified by any string of hexadecimal digits "H..H"
               	Mainly intended for multi-byte Unicode, but can be corrupted by some regex 
  		operations, such as backtracking
    \cC		Named control character:  \cZ = Ctrl-Z; some platforms require that C be capital
    .		Match any character
    [list]	Match any character in the bracketed list of characters
    [^list]	Match any character except one in the bracketed list
    \C		Match the character "C" as a literal character
    \w		Word character, [a-zA-Z0-9_]
    \W		Non-word character, [^a-zA-Z0-9_]
    \d		Decimal digit, [0-9]
    \D		Non-digit, [^0-9]
    \s		Whitespace character, [\n\r\f\t ]
    \S		Non-whitespace character, [^\n\r\f\t ]
    (?#text)	Inline comment; embeds "text" in a regular expression
    #text	EOL comment; all "text" after the "#" is ignored

Expressions that anchor a match to a location in the text, or test a condition without consuming characters:

    ^exp	Text matched by "exp" must occur at the start of the searched text or after a line break
    exp$	Text matched by "exp" must occur at the end of the searched text or before a line break
    \A		Start of search string
    \Z		End of string or before EOS newline
    \z		End of string
    \G		Beginning of current search
    \b		Word boundary; between \w and \W or BOS/EOS
    \B		Not-word boundary
    (?=exp)    	Positive lookahead; matches if the text after the current position 
 	      	matches, but doesn't consume that text
    (?!exp)     Negative lookahead; matches if the text after the current position 
 	      	doesn't match, but doesn't consume that text
    (?<=exp)  	Positive lookbehind; matches if the text preceding the current
    	       	position matches; consumes no characters
    (?<!exp)   	Negative lookbehind; matches if the text preceding the current
 	       	position doesn't match; consumes no characters

Expressions that allow an “either/or” match:

    exp|exp	Match the left expression; if that fails, attempt to match the right one

Expressions that group and capture text:

    (exp)	Group the text matched by "exp" and capture it in one of \1, \2, etc. (determine group
 		numbers by counting "(" characters from the left of the search regex)
    (?P<name>exp)      
 		Group and capture "exp" into named group "name"
    \n		Results of nth earlier group submatch
    (?:exp)	Group "exp" without capturing it
    (?>exp)	Group "exp", disallowing backtracking
    exp|exp	Attempt to match the left expression; if that fails, try the right expression

Quantifiers that determine how many times an expression can match:

    exp*	Match "exp" zero or more times, but as many as possible
    exp+	Match "exp" one or more times, but as many as possible
    exp?	Match "exp" zero or one times
    exp{n}	Match "exp" exactly n times
    exp{n,}	Match "exp" at least n times, but as many as possible
    exp{min,max}   
 		Match "exp" between "min" and "max" times, but as many as possible
    exp*?	Match "exp" zero or more times, but as few as necessary
    exp+?	Match "exp" one or more times, but as few as necessary
    exp??	Match "exp" zero or one times, but as few as necessary
    exp{n,}?	Match "exp" at least n times, but as few as necessary
    exp{min,max}?
  	        Match "exp" between "min" and "max" times, but as few as necessary
    exp*+	Match "exp" zero or more times, but as many as possible; never backtrack
    exp++	Match "exp" one or more times, but as many as possible; never backtrack
    exp?+	Match "exp" zero or one times; never backtrack
    exp{n}+	Match "exp" exactly n times; never backtrack
    exp{n,}+	Match "exp" at least n times, but as many as possible; never backtrack
    exp{min,max}+
 		Match "exp" between "min" and "max" times, but as many as possible; never backtrack
    (?(condition)true_exp|false_exp)
 		If/else match; if "condition" is true, attempt to match "true_exp";
 		else attempt to match "false_exp"; the condition can be a capture
 		group number or a lookaround
    (?(condition)exp)
 		If match; if "condition" is true, attempt to match "exp"; the
 		condition can be a capture group number or a lookaround

Modes:

    i		Case-insensitive match
    m		Multi-line:  ^ and $ match next to embedded \n
    s		Single-line:  dot metacharacter matches newlines
    x		Ignore whitespace and allow EOL comments in expression

Mode modifiers:

    (?imsx)		Turn listed modes on
    (?-imsx)		Turn listed modes off
    (?imsx:exp)		Turn listed modes on for "exp" only, within the parentheses
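
The effect of the i, m, and s modes can be tried out with a short sketch in Python's re module, which accepts the same inline modifiers at the start of a pattern. This is an illustration under the assumption that Python's behaviour is close enough to the Perl 5.8 style for these three modes; the sample text is made up.

```python
import re

text = "first line\nSecond line"

# i: case-insensitive match
case_insensitive = re.findall(r"(?i)second", text)

# m: ^ also matches just after an embedded newline
without_m = re.findall(r"^\w+", text)
with_m = re.findall(r"(?m)^\w+", text)

# s: the dot also matches the newline, so one match can span both lines
without_s = re.search(r"first.*line$", text)
with_s = re.search(r"(?s)first.*line$", text)

print(case_insensitive)  # ['Second']
print(without_m)         # ['first']
print(with_m)            # ['first', 'Second']
print(without_s)         # None
print(with_s.group() == text)  # True
```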

Quote delimiters:

    \Q 		Quote all following metacharacters
    \E		End a span started with \Q

Replace String Expressions

$	Insert a line break
\n	Insert the text captured by group number "n" (the group beginning
      	with the nth "(", since Perl 5.8 groups are one-based)
\0	Insert the complete matched text
%	Delete the previous character
^	Leave cursor at this position in the replaced text
!(macro)    
  	Run Multi-Edit macro within the parentheses; if the left parenthesis is absent, the text is 
 	interpreted normally, while a missing right parenthesis will display an "Error 1018" dialog.  
        To force the string "!(" to be interpreted as normal text, preface the "!" with a backslash:  "\!(".
\a	0x07	BEL
\b	0x08	BS	Backspace
\f	0x0C	FF	Page Break (Form Feed)
\n	0x0A	LF	Newline (Linefeed)
\r	0x0D	CR	Carriage Return
\t	0x09	HT	Tab (Horizontal Tab)
\v	0x0B	VT	Vertical Tab
\xHH	Byte specified by the two hexadecimal digits "HH"

Minimal vs. Maximal Closure

Regular Expression Quantifier Types

Quantifiers are the metacharacter expressions that determine how many times a preceding expression can repeatedly match. There are three basic kinds: lazy, greedy, and possessive. Not all kinds are supported by all three regular expression types in Multi-Edit.

Minimal, or lazy, quantifiers match only as many characters as are necessary, even though a longer match might be possible. The lazy quantifiers for each style are:

    Classic:	N/A
    UNIX:	C@, C#
    Perl 5.8:	exp*?, exp+?, exp??, exp{n}?, exp{n,}?, exp{min,max}?

Maximal, or greedy, quantifiers match as many characters as possible, although (Perl-compatible only) they will backtrack to shorter matches if necessary to find a match for the entire search expression. Note that unless otherwise qualified, Perl 5.8 quantifiers are greedy by default, although they are not multi-line (use the multi-line mode modifier “(?m)” before a Perl regex to make it match across line boundaries). The older UNIX expressions do not match across line boundaries. The greedy quantifiers for the three styles are:

    Classic:	C*, C+
    UNIX:	C*, C+
    Perl 5.8:	exp*, exp+, exp?, exp{n}, exp{n,}, exp{min,max}

Possessive quantifiers (Perl-style only) behave like greedy quantifiers, except that they never backtrack; they always match the longest possible string, which can cause the search string to fail to match where a greedy quantifier might backtrack to an overall match, but they also avoid certain infinite-loop conditions that can happen with poorly-constructed regexes. The possessive quantifiers are:

    Classic:	N/A
    UNIX:	N/A
    Perl 5.8:	exp*+, exp++, exp?+, exp{n}+, exp{n,}+, exp{min,max}+

For example, given the following code fragment

    if ( ( Wrap_Stat && Doc_Mode ) || ( Format_Stat && Indent_Style ) ) {
             Use_Format_Line = True;
    }

the greedy regex "\(.*\)", which matches the text between opening and closing parentheses, inclusively, matches

    ( ( Wrap_Stat && Doc_Mode ) || ( Format_Stat && Indent_Style ) )

whereas the lazy regex “\(.@\)” for UNIX, “\(.*?\)” for Perl, when run against the same text, matches

    ( ( Wrap_Stat && Doc_Mode )

The Perl 5.8 possessive regex “\(.*+\) \) {”, run against the same text, finds no match, because it's already consumed the last closing parenthesis with the subexpression “.*+” and can't backtrack to “give up” the space and parenthesis necessary to match the entire expression.

Changing the regex to the greedy “\(.*\) \) {” matches against

    ( ( Wrap_Stat && Doc_Mode ) || ( Format_Stat && Indent_Style ) ) {

because the non-possessive form can backtrack to an earlier partial match that allows the rest of the expression to match as well.
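
The four searches above can be reproduced with a short sketch in Python's re module, used here only as a stand-in for the Perl 5.8 style. One assumption to note: native possessive quantifiers such as ".*+" need Python 3.11 or later, so the possessive case is emulated with the lookahead-plus-backreference idiom (?=(.*))\1, which likewise refuses to give characters back.

```python
import re

line = "if ( ( Wrap_Stat && Doc_Mode ) || ( Format_Stat && Indent_Style ) ) {"

# Greedy: runs to the last ")" on the line.
greedy = re.search(r"\(.*\)", line).group()

# Lazy: stops at the first ")" it can reach.
lazy = re.search(r"\(.*?\)", line).group()

# Possessive (emulated): the lookahead grabs the rest of the line and is
# never re-entered, so nothing is left for the trailing "\) \) \{".
possessive = re.search(r"\((?=(.*))\1\) \) \{", line)

# Greedy with the same tail: backtracking releases ") ) {" so it matches.
greedy_tail = re.search(r"\(.*\) \) \{", line).group()

print(greedy)
print(lazy)
print(possessive)   # None
print(greedy_tail)
```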

Examples

Search For Zero Or More Occurrences Of Any Character

EXAMPLE 1: Search for zero or more occurrences of any character.

SEARCH EXPRESSION: (Classic Style) ?*
(Unix Style) .@
(Perl Style) .*?

DETAILED EXPLANATION OF SEARCH EXPRESSION COMPONENTS:

(Classic Style)

	?	Match any character
	*	Match zero or more occurrences of the previous character

(Unix Style)

	.	Match any character
	@	Match zero or more occurrences of the previous character (as few as necessary)

(Perl Style)

	.	Match any character
	*?	Match zero or more occurrences of the previous character (as few as necessary)

CREATING AN ALIAS:

If you want, you can make an alias <*> to hold these regular expressions. In any search dialog, click ‘Alias’ to bring up the Regular Expression Aliases dialog. Click ‘Insert’, and fill in the fields:

Description: *Star*
Alias: <*>
Classic: ?*
Unix: .@
Perl: .*?

Now you can search for things like “Hello<*>Goodbye”. This will find the word “Hello”, followed by zero or more of any characters, followed by “Goodbye”.
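
Conceptually, alias expansion is a text substitution performed before the search runs. The following Python sketch illustrates the idea only; the table layout and the function name expand_aliases are hypothetical and are not Multi-Edit's ReTranslate() macro or its internal storage format.

```python
import re

# Hypothetical alias table: alias name -> expression text per regex style.
ALIASES = {
    "<*>": {"classic": "?*", "unix": ".@", "perl": ".*?"},
}

def expand_aliases(pattern: str, style: str) -> str:
    """Replace every known <alias> in pattern with its text for the given style."""
    for name, by_style in ALIASES.items():
        pattern = pattern.replace(name, by_style[style])
    return pattern

perl_pattern = expand_aliases("Hello<*>Goodbye", "perl")
classic_pattern = expand_aliases("Hello<*>Goodbye", "classic")

# Only the Perl-style result is valid input for Python's own regex engine.
found = re.search(perl_pattern, "Hello, and then Goodbye")

print(perl_pattern)     # Hello.*?Goodbye
print(classic_pattern)  # Hello?*Goodbye
print(found.group())    # Hello, and then Goodbye
```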

Search For The '$' Character

EXAMPLE 2: Search for the '$' character.

SEARCH EXPRESSION: (Classic Style) @$
(Unix and Perl Style) \$

DETAILED EXPLANATION OF SEARCH EXPRESSION COMPONENTS:

(Classic Style)

@	Find literal.
$	Search for '$' char

(Unix and Perl Style)

\	Find literal
$	Search for '$' char

SUMMARY: Since the '$' symbol is a wildcard character and has special meaning in search and replace, an 'escape character' must be placed before it. The escape character tells Multi-Edit to treat the next character literally. For Multi-Edit Classic Style regular expressions the escape character is the '@' symbol. For Unix and Perl style regular expressions, the backslash '\' character is the escape character.

The escape character must be used when searching for any of the wildcard characters literally. The wildcard characters include:

(Classic Style)	^ $ * + @ # [ ] | ? % { } & ~ 
(Unix Style)	^ $ * + @ # . [ ] | ( ) \
(Perl Style)	^ $ * + @ # . [ ] | ( ) \ ? { }
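
As a quick check of why the escape character matters, here is a sketch in Python's re module, whose backslash escaping works like the Unix and Perl styles. Python and the sample text are assumptions made for illustration; re.escape() is a Python convenience and has no counterpart described in this manual.

```python
import re

sample = "Total: $5.00"

# Escaped: \$ finds the literal dollar sign.
dollar = re.search(r"\$", sample)

# re.escape() inserts the backslashes when the search text is plain data.
literal = re.escape("$5.00")
whole = re.search(literal, sample)

# Unescaped: "$" is the end-of-text anchor, so nothing can follow it.
unescaped = re.search(r"$5.00", sample)

print(dollar.start())  # 7
print(whole.group())   # $5.00
print(unescaped)       # None
```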

Note that within a character class the metacharacters are different. In general, in UNIX and Perl classes the only metacharacters are -, [, ], \, and a leading ^. Note also that \b has a different meaning in a character class than outside one in Perl (inside a class it means Backspace). The wildcard characters Multi-Edit recognizes in replace expressions are:

(Classic Style)	$ # & % ^
(Unix Style)	$ \ & % ^
(Perl Style)	$ \ % ^

Find A Parenthesis Set

EXAMPLE 3: Find a parenthesis set.

SEARCH EXPRESSION: (Classic Style) (?*)
(Unix Style) \(.*\)
(Perl Style) \(.*\) or (?m)\(.*\)

DETAILED EXPLANATION OF SEARCH EXPRESSION COMPONENTS:

(Classic Style)

(		Search for an open parenthesis
	?*	Followed by any number of occurrences (>=0) of any character
)		Followed by a close parenthesis

(Unix Style)

\(		Search for an open parenthesis
	.*	Followed by maximal match of any number of occurrences (>=0) of any character
\)		Followed by a close parenthesis

(Perl Style)

(?m)		Multi-line search, if present
\(		Search for an open parenthesis
	.*	Followed by maximal match of any number of occurrences (>=0) of any character
\)		Followed by a close parenthesis

SUMMARY: Finds parenthesis sets by searching for an open parenthesis followed by any number of characters, of any kind, followed by a close parenthesis. Note that the Unix style and Perl style use maximal matching. The Perl expression performs a multi-line search if prefixed by ‘(?m)’.

For UNIX and Perl-style users, take care to select the proper minimal/maximal matching character. See Minimal vs. Maximal Closure for more information.

Find Any Occurrence Of Single Or Double Quote Sets

EXAMPLE 4: Finds any occurrence of single or double quote sets.

SEARCH EXPRESSION:

(Classic Style)	{"?*"}|{'?*'}
(Unix Style)	(".*")|('.*')
(Perl Style)	(['"]).*?\1 or (?m)(['"]).*?\1

DETAILED EXPLANATION OF SEARCH EXPRESSION COMPONENTS:

SUMMARY: Finds double or single quotes sets by searching for a quote, followed by any number of characters, of any kind, followed by a quote.

For UNIX- and Perl-style users, take care to select the proper minimal/maximal matching character. See Minimal vs. Maximal Closure for more information. Since Perl expressions can use backreferences, there’s no need to repeat verbatim strings; instead, the alternation can be encapsulated as a character class and a backreference. Note that this can’t be extended to allow matching nested structures, as they usually don’t use the same character sequence to begin a nesting level as they do to end one, and there’s no way to refer to match counts within a regex.
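
The backreference form can be tried in Python's re module, which supports the same (['"]).*?\1 construction as the Perl style. The sample string is invented for illustration; note how the apostrophe inside the last quoted item does not end the match, because \1 insists on the same character that opened it.

```python
import re

sample = "say \"hi\" and 'bye', or \"it's\""

# (['"]) captures the opening quote; \1 requires the same character to close.
quoted = [m.group(0) for m in re.finditer(r"""(['"]).*?\1""", sample)]

print(quoted)
```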

Find The Next Word

EXAMPLE 5: Finds the next word.

SEARCH EXPRESSION:

(Classic Style)	 %|[~a-z0-9_][a-z0-9_]
(Unix Style)	^|[^a-z0-9_][a-z0-9_]
(Perl Style)	(?<!\w)(?=\w)

DETAILED EXPLANATION OF SEARCH EXPRESSION COMPONENTS:

(Classic Style)

	%		Search for beginning of line
	|		-OR-
	[~a-z0-9_]	Search for any character that is not a letter between a-z, a number 
			between 0-9, or an underscore
	[a-z0-9_]	Search for any character that is a letter between a-z, a number 
			between 0-9, or an underscore

(Unix Style)

	^		Search for beginning of line
	|		-OR-
	[^a-z0-9_]	Search for any character that is not a letter between a-z, a number 
			between 0-9, or an underscore
	[a-z0-9_]	Search for any character that is a letter between a-z, a number 
			between 0-9, or an underscore

(Perl Style)

	(?<!\w)		Search for a position not preceded by a word character (negative look-behind)
	(?=\w)		and followed by a word character (look-ahead)
The Unix expression would have also worked as a Perl expression, but is not as efficient as the Perl expression shown.  If it doesn’t matter whether the expression matches the beginning or the ending of the word, the simple:
	\b
to find a word boundary is sufficient.  Note that the Perl expression, unlike the other two expressions, doesn’t select any text.  That’s because it consists of two anchors.  That also makes it quite nice as a sort of prefix to an extended expression; the prefix locates the start of a word, and the rest of the expression matches from there.

SUMMARY:
Search for the beginning of the line OR search for the first occurrence of a non-word character followed by a letter, digit, or underscore. In the Perl expression, search for a look-behind and look-ahead match that together define the start of a word; it isn’t necessary then to cover the case where the word occurs at the start of a line as a separate subexpression, as the look-behind condition covers it.
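
The zero-width nature of the Perl expression is easy to see in a sketch using Python's re module (an assumption made for illustration; its look-behind and look-ahead syntax is the same). The first search reports positions only, and the second uses the anchors as a prefix to an extended expression.

```python
import re

text = "foo bar_1, baz"

# The two anchors consume nothing, so each match is an empty string;
# only its position is of interest.
starts = [m.start() for m in re.finditer(r"(?<!\w)(?=\w)", text)]

# Used as a prefix: locate the start of a word, then match the word itself.
words = re.findall(r"(?<!\w)(?=\w)\w+", text)

print(starts)  # [0, 4, 11]
print(words)   # ['foo', 'bar_1', 'baz']
```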

Search For The Following Operators

EXAMPLE 6:
Searches for any of the following operators: '= > < ! | &'

SEARCH EXPRESSION:

(Classic Style)		[=><!@|@&]
(Unix and Perl Styles)	[=<>!|&]

DETAILED EXPLANATION OF SEARCH EXPRESSION COMPONENTS:

(Classic Style)

	[	Begin character class
	=	Search for '=' character
	>	Search for '>' character
	<	Search for '<' character
	!	Search for '!' character
	@|	Search for '|' character literally.  Since the | is a wildcard character an @ must be placed
		before it in order to search for the character literally (as opposed to interpreting it as one of 
  		the search expression operators)
	@&	Search for '&' character
	]	End character class

(Unix and Perl Styles)

	[	Begin character class
	=	Search for '=' character
	<	Search for '<' character
	>	Search for '>' character
	!	Search for '!' character
	|	Search for '|' character literally.  Since the | is a wildcard character, it would ordinarily have to
		be “escaped” by preceding it with a backslash to prevent the regex engine from interpreting it as
		such.  Occurring as it does, however, inside a character class, this is not necessary, as the
		alternation operator would be meaningless in that context.
	&	Search for '&' character
	]	End character class
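
A brief sketch in Python's re module, which treats character classes the same way as the Unix and Perl styles, shows the class picking out each operator character without any escaping. The sample expression is made up for illustration.

```python
import re

expr = "a >= b && c != d | e"

# Inside [...] the | and & characters are ordinary, so no backslash is needed.
operators = re.findall(r"[=<>!|&]", expr)

print(operators)  # ['>', '=', '&', '&', '!', '=', '|']
```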

Search For The Occurrence Of A String And Replace It With The Found String In Parentheses

EXAMPLE 7:

Searches for the occurrence of a string and replaces it with the found string in parentheses, demonstrating the use of '&' and ‘\0’ as replace expression wildcard characters.

CURRENT STRING: Multi-Edit
DESIRED RESULT: (Multi-Edit)

SEARCH EXPRESSION:
(All Styles) Multi-Edit

REPLACE EXPRESSION:
(Classic & UNIX Styles) (&)

(Perl Style) (\0)

DETAILED EXPLANATION OF REPLACE EXPRESSION COMPONENTS:

(Classic and Unix Styles)

	(		Print a literal open parenthesis,
		&	followed by the entire string matched by the search expression,
	)		and then a closing literal parenthesis

(Perl)

	(		Print a literal open parenthesis,
		\0	followed by the entire string matched by the search expression,
	)		and then a closing literal parenthesis
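
For comparison, the same replacement can be sketched in Python's re module. Replacement syntax varies from program to program, as noted earlier: Python writes the complete matched text as \g<0> rather than & or \0, so the sketch below is an analogy and not Multi-Edit syntax.

```python
import re

# \g<0> is Python's spelling of "the complete matched text".
result = re.sub(r"Multi-Edit", r"(\g<0>)", "Use Multi-Edit daily")

print(result)  # Use (Multi-Edit) daily
```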

Search For A Blank Line And Delete It

EXAMPLE 8:

Searches for a blank line and deletes it.

SEARCH EXPRESSION:

(Classic Style) %$
(Unix and Perl Styles) ^$

DETAILED EXPLANATION OF SEARCH EXPRESSION COMPONENTS:

(Classic Style)

%	Search for the beginning of a line
$	Followed by the end of a line

(Unix and Perl Styles)

^	Search for the beginning of a line
$	Followed by the end of a line

REPLACE EXPRESSION:

(All Styles) %

DETAILED EXPLANATION OF REPLACE EXPRESSION COMPONENTS:
(All Styles) % Delete a character

SUMMARY:
Finds a blank line by searching for the beginning of a line immediately followed by the end of line. Using '%' (delete character) as the replace expression deletes the blank line.

In order for this expression to work properly the 'Leave Cursor At' switch must be set to leave the cursor at the beginning of the replace string. Tools | Customize | Editing | Blocks, Leave Cursor At (right side of dialog): Start of Pasted Text.
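
Outside an editor there is no "delete character" replace expression, so an equivalent operation has to consume the line break itself. The Python sketch below is an analogy under that assumption: it matches a line start immediately followed by a newline and replaces it with nothing.

```python
import re

text = "first\n\nsecond\n\n\nthird\n"

# (?m) lets ^ match at the start of every line; a newline directly after ^
# means the line is empty, and removing that newline removes the line.
cleaned = re.sub(r"(?m)^\n", "", text)

print(repr(cleaned))  # 'first\nsecond\nthird\n'
```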

Search For The First Occurrence Of A ';' And Delete The Rest Of The Line

The use of curly braces { } in Classic Style and parentheses in Unix and Perl Styles to define capture groups within search expressions adds power and flexibility to Multi-Edit's Search and Replace operations. This grouping capability enables you to do such things as delete portions of the found string or change the sequence of matched groups within the found string. Both of these operations are demonstrated in the following examples. Perl has several other types of more specialized grouping operators; refer to the documentation for the regular expression library or a text on regular expression usage for more detail, as such material is quite outside the bounds of this manual.

EXAMPLE 9:

Searches for the first occurrence of a semicolon and deletes the rest of the line after it.

CURRENT STRING: goto_line(1);del_line;

DESIRED RESULT: goto_line(1);

SEARCH EXPRESSION:
(Classic Style) {?*;}{?*}$
(Unix Style) (.@;)(.*)$
(Perl Style) (.*?;)(.*)$

DETAILED EXPLANATION OF SEARCH EXPRESSION COMPONENTS:
SUMMARY:
Find any character that occurs any number of times, (including 0 times) that is followed by a semicolon. This constitutes Group 0 (Group 1 in Perl). Following the previous match, find any character that occurs any number of times. This will constitute Group 1 (2 in Perl).

While greedy (maximal) operators are generally more efficient than lazy (minimal) ones, it’s fairly easy to construct pathological regexes where the opposite is very much the case, due to excessive “super-linear” backtracking in search of a match. In particular, the Kleene star, ‘*’, consumes everything when used in an expression like ‘.*’, potentially the entire file if multi-line matching is enabled (although it can also deliver some surprising results in other ways, since, while it attempts to match the longest possible string consistent with an overall match, if necessary, it will match against nothing at all). Judicious use of “possessive” operators and atomic groups, which prevent certain kinds of backtracking, can make searches much faster, and can often prevent an exponential explosion in search time that can cause Multi-Edit to appear hung. A good source book on these topics at the time of this printing is Jeffrey E.F. Friedl’s Mastering Regular Expressions, under the O’Reilly imprint.

REPLACE EXPRESSION:

(Classic Style) #0
(Unix Style) \0
(Perl Style) \1

DETAILED EXPLANATION OF REPLACE EXPRESSION COMPONENTS:

(Classic Style)

#0	replace found string with group 0 from search string.
(Unix Style)
\0	replace found string with group 0 from search string.
(Perl Style)
\1	replace found string with group 1 from search string.  Perl counts its sub-pattern matches starting at 
  	one, rather than zero, like the Classic and Unix styles.  \0 is reserved for the text matched by the entire 
 	pattern.

SUMMARY:
In this example we are searching for two groups of strings, everything up to the ';' and everything else to the end of the line. When we replace the string we only replace the first group, which has the effect of deleting to the end of the line.
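
The Perl-style search and replace pair from this example behaves the same way in Python's re module, which is used here purely as an illustration (Python also numbers capture groups from one).

```python
import re

line = "goto_line(1);del_line;"

# Group 1 is everything up to the first ";"; group 2 is the rest of the line.
# Replacing the whole match with group 1 alone drops the rest.
result = re.sub(r"(.*?;)(.*)$", r"\1", line)

print(result)  # goto_line(1);
```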

Swap The Parameters Of The Procedure gotoxy

EXAMPLE 10:

Swaps the parameters of the procedure 'gotoxy'.

CURRENT STRING: gotoxy(x,y);
DESIRED RESULT: gotoxy(y,x);

SEARCH EXPRESSION:

(Classic Style)	gotoxy({?+},{?+});
(Unix Style)	gotoxy\((.#),(.#)\);
(Perl Style)	gotoxy\((.+?),(.+?)\);

DETAILED EXPLANATION OF SEARCH EXPRESSION COMPONENTS:

SUMMARY:
Finds the literal string “gotoxy(”, followed by any string at least one character long (the first argument, captured as Group 1 by a Perl expression, Group 0 by the other sorts), followed by the comma that delimits the two arguments; then another capture expression identical to the first for the second parameter (Group 2 in Perl, Group 1 in the others), followed by the literal string “);”.

REPLACE EXPRESSION:

(Classic Style)	gotoxy(#1,#0);
(Unix Style)	gotoxy(\1,\0);
(Perl Style)	gotoxy(\2,\1);

DETAILED EXPLANATION OF REPLACE EXPRESSION COMPONENTS:

(Classic Style)

gotoxy(	Replace with the opening string “gotoxy(”
#1	followed by Group 1 from the search expression
,	followed by the argument-delimiter comma
#0	followed by Group 0 from the search expression
);	followed by the closing string “);”

(Unix Style)

gotoxy(	Replace with the opening string “gotoxy(”
\1	followed by Group 1 from the search expression
,	followed by the argument-delimiter comma
\0	followed by Group 0 from the search expression
);	followed by the closing string “);”

(Perl Style)

gotoxy(	Replace with the opening string “gotoxy(”
\2	followed by Group 2 from the search expression
,	followed by the argument-delimiter comma
\1	followed by Group 1 from the search expression
);	followed by the closing string “);”

Notice that in the Unix and Perl search expressions the literal parentheses that form the argument list had to be backslash-escaped, because parentheses are metacharacters there; the Classic style groups with curly braces, so its parentheses are literal. No escaping is needed in any of the replace expressions, where parentheses are never metacharacters. It also becomes obvious in this example that sub-pattern capture groups are zero-based for Classic and Unix expressions, but one-based for Perl.

SUMMARY:
This example demonstrates the use of groups to change the order of your search string. Since we defined each parameter as a group, it was easy to swap them by changing their order in the replace expression.
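For comparison, here is a minimal sketch of the same swap using Python's re module, whose syntax is close to the Perl style shown above. Python is used only as an illustration (it is not part of Multi-Edit), and the function name is hypothetical.

```python
import re

def swap_gotoxy_args(source):
    """Swap the two arguments of every gotoxy(a,b); call in the text."""
    # \( and \) match literal parentheses; (.+?) captures each argument
    # lazily, so the first group stops at the first comma.
    # As in the Perl style, groups are numbered from 1.
    return re.sub(r"gotoxy\((.+?),(.+?)\);", r"gotoxy(\2,\1);", source)

print(swap_gotoxy_args("gotoxy(x,y);"))
```

As in the Multi-Edit replace expressions, the parentheses in the replacement string need no escaping.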

Search For Two Semicolon Delimited Statements On A Single Line

EXAMPLE 11:

Searches for two semicolon-delimited statements on a single line. Once found, a line break is inserted between them, leaving each statement on its own line.

CURRENT STRING:	goto_line(1);del_line;
DESIRED RESULT:	goto_line(1);
			del_line;

SEARCH EXPRESSION:

(Classic Style)	{;}@x20*{[~ ]?*;}	(@x20 is the space character; a literal space would also work)
(Unix Style)	(;)\x20@([^ ].@;)	(\x20 is space in this style)
(Perl Style)	(;)\s*?(\S.*?;)

DETAILED EXPLANATION OF SEARCH EXPRESSION COMPONENTS:

SUMMARY:
Find the first occurrence of a semicolon on a line; this constitutes Group 0 (1 for Perl). Following this, match any number of spaces, including none (any whitespace in the Perl version). Then find Group 1 (2 for Perl), which consists of a non-space character followed by the shortest string that ends in a semicolon.

REPLACE EXPRESSION:

(Classic Style)	#0$#1
(Unix Style)	\0$\1
(Perl Style)	\1$\2

DETAILED EXPLANATION OF REPLACE EXPRESSION COMPONENTS:

(Classic Style)

	#0	Replace the matched string with whatever matched capture Group 0,
	$	followed by a line break
	#1	followed by whatever matched Group 1

(Unix Style)

	\0	Replace the matched string with whatever matched capture Group 0,
	$	followed by a line break,
	\1	followed by whatever matched Group 1

(Perl Style)

	\1	Replace the matched string with whatever matched capture Group 1,
	$	followed by a line break,
	\2	followed by whatever matched Group 2

SUMMARY:
In this example, two groups are defined. The first consists of the first occurrence of a semicolon on a line. The second consists of a non-space character followed by any number of characters up to and including the next semicolon. When the found string is replaced, a line break is inserted between the two groups, leaving each statement on its own line.
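A minimal Python sketch of this split, using the Perl-style pattern from the example with Python's re module. This is an illustration only: in Python the line break in the replacement is written \n rather than $, and the function name is hypothetical.

```python
import re

def split_two_statements(line):
    """Insert a line break between two ';'-terminated statements."""
    # Group 1: the first ';'
    # \s*?   : optional whitespace between the statements (not captured,
    #          so it is dropped from the result)
    # Group 2: a non-space character, then the shortest text ending in ';'
    return re.sub(r"(;)\s*?(\S.*?;)", r"\1\n\2", line)

print(split_two_statements("goto_line(1);del_line;"))
```

Because the whitespace between the statements falls outside both groups, it does not reappear in the result.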

Search And Replace Phrases

EXAMPLE 12:

Searches for the phrases “this is a test” OR “this is not a test” and replaces “a test” with “a pizza”

CURRENT STRING: this is a test
- OR -
this is not a test

DESIRED RESULT: this is a pizza
- OR -
this is not a pizza

SEARCH EXPRESSION:

(Classic Style)	this i{{s not}|{s}} a test
(Unix Style)	this i((s not)|(s)) a test
(Perl Style)	this is( not)? a test

DETAILED EXPLANATION OF SEARCH EXPRESSION COMPONENTS:

SUMMARY:
Match “this i”, followed by either “s not” OR “s” (which constitutes Group 0), followed by “ a test”. For the Perl version, find “this is”, optionally followed by a single “ not”, followed by “ a test”. This is only possible in the Perl version, since only it has a “none or one” control, which is necessary to lock out false matches (repeated “ not”).

REPLACE EXPRESSION:

(Classic Style)	this i#0 a pizza
(Unix Style)	this i\0 a pizza
(Perl Style)	this is\1 a pizza

SUMMARY:
Replace the complete matched string with the literal-string prefix “this i” (with an extra “s” in the Perl version), concatenated with Group 0 (1 for Perl) from the search expression, then with the literal-string suffix “ a pizza”. There is a very important detail to be noted in the Classic and Unix search expressions in this example: “s not” came before “s” in the alternation. This was done intentionally. Since “OR” expressions are evaluated from left to right, you must pay close attention to the order in which their components appear. Once the regex engine matches any component of the expression, it won’t evaluate any of the remaining components for a match. Since the Perl version doesn’t use alternation, it isn’t an issue here; however, in Perl expressions that do use alternation, it is still an issue. If “s” had been placed before “s not”, as in the Classic expression this i{{s}|{s not}} a test (the Unix version differs only in that the curly braces become parentheses), the following would result when encountering the line “this is not a test”:
FOUND TEXT: xx this is not a test xxx xx
RESULT: Equivalent of no match, no replacement is made.

The “s” portion of the line would match, terminating evaluation of the other branch of the alternation. The remaining component “s not” would never be evaluated. At that point, the engine would attempt to match the next unmatched part of the expression, “ a test”, and would fail, because the searched text at that point is “not a test”.
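The left-to-right evaluation of alternatives can be sketched with Python's re module (an illustration only; Python is not one of Multi-Edit's regex styles). One assumption to note: Python's engine does retry the remaining alternatives when the rest of the pattern fails, so the sketch omits the trailing “ a test” to expose the ordering effect. It also shows the Perl-style optional group doing the replacement; the helper name is hypothetical.

```python
import re

text = "this is not a test"

# "s" listed first: it matches immediately, and since nothing after the
# alternation fails, the longer alternative is never tried.
short_first = re.search(r"this i(s|s not)", text).group(1)

# "s not" listed first: the longer alternative wins.
long_first = re.search(r"this i(s not|s)", text).group(1)

def to_pizza(line):
    """Perl-style version: optional ' not' group, no alternation needed."""
    # group(1) is None when ' not' is absent, hence the 'or ""'.
    return re.sub(
        r"this is( not)? a test",
        lambda m: "this is" + (m.group(1) or "") + " a pizza",
        line,
    )

print(repr(short_first), repr(long_first))
print(to_pizza("this is a test"))
print(to_pizza(text))
```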

The engine will always attempt to backtrack until it finds a match (whenever possible, expressions attempt to “give up” already-matched text to backtrack to a prior state from which a match down a different path may be possible), unless it exhausts the text without finding a match, is prevented from backtracking (note the operators explicitly intended for that), or arrives at a state where the available search space expands exponentially (you’ll likely think Multi-Edit is hung, but it’s the result of a poorly-formulated regex). Always keep this in mind when constructing your own search expressions.
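Backtracking, in the sense of “giving up” already-matched text, can be observed directly in a small sketch. It uses Python's re module as a stand-in (an assumption; Multi-Edit's engines may differ in detail), and the sample strings are made up.

```python
import re

# '(.*)' first swallows the whole string "abc123". '(\d+)' then has nothing
# left to match, so the engine backtracks: '.*' gives up one character at a
# time until '\d+' can succeed. It stops as soon as one digit is available.
m = re.search(r"(.*)(\d+)", "abc123")
greedy_part, digits = m.group(1), m.group(2)

# With no digit anywhere, every backtracking path is tried and exhausted,
# and the search reports no match.
no_match = re.search(r"(.*)(\d+)", "abcdef")

print(repr(greedy_part), repr(digits), no_match)
```

The failing search is the cheap version of the problem described above: on this pattern the wasted work grows modestly with the length of the text, whereas nested or overlapping repetition can make it grow exponentially.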

More info on Regular Expressions can be found on the internet

 
doc/manual2k8/regular_expressions.txt · Last modified: 2008/07/09 14:53 (external edit)