114 lines
5.3 KiB
Plaintext
114 lines
5.3 KiB
Plaintext
Search And Replace Summary
|
|
==========================
|
|
|
|
Search Strings
|
|
--------------
|
|
regular-exp : {branch}|{branch}|...
|
|
branch : {piece}{piece}...
|
|
piece : {atom{* or + or ?}}{atom{* or + or ?}}...
|
|
* - match 0 or more of the atom
|
|
+ - match 1 or more of the atom
|
|
? - match a match of the atom, or the null string
|
|
atom : (regexp) or range or ~ or @ or ^ or $ or \char or char
|
|
range : [ {^} char and/or charlo-charhi ] '^' causes negation of range
|
|
. : match any character
|
|
^ : match start of line
|
|
$ : match end of line
|
|
@ : what follows is to be searched with case sensitivity
|
|
~ : what follows is to be searched without case sensitivity
|
|
! : if it occurs as the FIRST CHARACTER in a regular expression,
|
|
it causes the ignoring of the NoMagic variable; that is,
|
|
all magic characters are treated as magical. ! is treated
|
|
as a regular character if it occurs anywhere but at
|
|
the very start.
|
|
char : any character
|
|
\char : forces \char to be accepted as char (no special meaning)
|
|
execept for
|
|
\t - match a tab (if RealTabs is set)
|
|
|
|
Replace Strings
|
|
---------------
|
|
& - Each instance of `&' in the replacement string is replaced by the
|
|
entire string of matched characters
|
|
\\ - Used to enter a '\' in the replacement string
|
|
\n - Replaced with a new line
|
|
\t - Replaced with a tab (if RealTabs is set)
|
|
\<n> - Each instance of <n>, where n is a digit, is replaced by the
|
|
n'th substring in the previous regular expression
|
|
\u - Next item in replacement string is changed to upper case
|
|
\l - Next item in replacement string is changed to lower case
|
|
\U - All items following '\U' in the replacement string are changed to
|
|
upper case, until a '\E' or '\e' is encountered
|
|
\L - All items following '\U' in the replacement string are changed to
|
|
lower case, until a '\E' or '\e' is encountered
|
|
\e,\E - Terminate \U or \L
|
|
\|<n> - substitutes spaces up to column <n>, so that the item that follows
|
|
occurs at column <n>
|
|
\# - substitutes current line number
|
|
|
|
|
|
Regular Expression Search Strings
|
|
=================================
|
|
A regular expression is zero or more branches, separated by `|'.
|
|
It matches anything that matches one of the branches.
|
|
|
|
A branch is zero or more pieces, concatenated.
|
|
It matches a match for the first, followed by a match for the second, etc.
|
|
|
|
A piece is an atom possibly followed by `*', `+', or `?'.
|
|
An atom followed by `*' matches a sequence of 0 or more matches of the atom.
|
|
An atom followed by `+' matches a sequence of 1 or more matches of the atom.
|
|
An atom followed by `?' matches a match of the atom, or the null string.
|
|
|
|
An atom is a regular expression in parentheses (matching a match for the
|
|
regular expression), a range (see below),
|
|
`.' (matching any single character), `^' (matching the null string at the
|
|
beginning of the input string), `$' (matching the null string at the
|
|
end of the input string), a `\' followed by a single character (matching
|
|
that character), '~' (causing next stuff to be case sensitive), '@'
|
|
(causing next stuff to be case insensitive), or a single character
|
|
with no other significance (matching that character).
|
|
|
|
A range is a sequence of characters enclosed in `[]'.
|
|
It normally matches any single character from the sequence.
|
|
If the sequence begins with `^',
|
|
it matches any single character not from the rest of the sequence.
|
|
If two characters in the sequence are separated by `-', this is shorthand
|
|
for the full list of ASCII characters between them
|
|
(e.g. `[0-9]' matches any decimal digit).
|
|
To include a literal `]' in the sequence, make it the first character
|
|
(following a possible `^').
|
|
To include a literal `-', make it the first or last character.
|
|
|
|
If a regular expression could match two different parts of the line,
|
|
it will match the one which begins earliest.
|
|
If both begin in the same place but match different lengths, or match
|
|
the same length in different ways, life gets messier, as follows.
|
|
|
|
In general, the possibilities in a list of branches are considered in
|
|
left-to-right order, the possibilities for `*', `+', and `?' are
|
|
considered longest-first, nested constructs are considered from the
|
|
outermost in, and concatenated constructs are considered leftmost-first.
|
|
The match that will be chosen is the one that uses the earliest
|
|
possibility in the first choice that has to be made.
|
|
If there is more than one choice, the next will be made in the same manner
|
|
(earliest possibility) subject to the decision on the first choice.
|
|
And so forth.
|
|
|
|
For example, `(ab|a)b*c' could match `abc' in one of two ways.
|
|
The first choice is between `ab' and `a'; since `ab' is earlier, and does
|
|
lead to a successful overall match, it is chosen.
|
|
Since the `b' is already spoken for,
|
|
the `b*' must match its last possibility _the empty string_ since
|
|
it must respect the earlier choice.
|
|
|
|
In the particular case where no `|'s are present and there is only one
|
|
`*', `+', or `?', the net effect is that the longest possible
|
|
match will be chosen.
|
|
So `ab*', presented with `xabbbby', will match `abbbb'.
|
|
Note that if `ab*' is tried against `xabyabbbz', it
|
|
will match `ab' just after `x', due to the begins-earliest rule.
|
|
(In effect, the decision on where to start the match is the first choice
|
|
to be made, hence subsequent choices must respect it even if this leads them
|
|
to less-preferred alternatives.)
|