5 Regular expressions
*********************
Regular expressions are patterns used in selecting text. For example,
the 'ed' command
g/STRING/
prints all lines containing STRING. Regular expressions are also used
by the 's' command for selecting old text to be replaced with new text.
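The two uses above can be sketched outside of 'ed'. The following is a minimal illustration in Python, not 'ed' itself: the helper names global_print and substitute_first are hypothetical, and Python's 're' module is assumed as a stand-in for the 'ed' matcher. It also assumes that 's' without a suffix replaces only the first match on a line.

```python
import re

def global_print(lines, pattern):
    """Return the lines containing a match, roughly what g/RE/ prints."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]

def substitute_first(line, pattern, replacement):
    """Replace the first match on a line, roughly what s/RE/NEW/ does."""
    return re.sub(pattern, replacement, line, count=1)

# A hypothetical three-line buffer.
buffer = ["alpha STRING one", "beta", "STRING at start"]
matching = global_print(buffer, "STRING")
replaced = substitute_first("old text, old habits", "old", "new")
print(matching)
print(replaced)
```

Only the lines in which the pattern occurs are selected, and only the first occurrence on the line is replaced.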
In addition to specifying string literals, regular expressions can
represent classes of strings. Strings thus represented are said to be
matched by the corresponding regular expression. If it is possible for a
regular expression to match several strings in a line, then the
left-most match is the one selected. If the regular expression permits a
variable number of matching characters, the longest sequence starting at
that point is matched.
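The left-most, longest rule can be observed with a small sketch. Python's 're' module is used here as an assumption; it is a backtracking matcher and does not guarantee the longest match in general, but for a simple pattern such as the one below it should agree with the rule described above.

```python
import re

# Two candidate matches exist in this line: "abbb" at offset 1
# and "abb" at offset 6.
line = "xabbbyabb"
match = re.search(r"ab*", line)

leftmost_start = match.start()   # offset of the left-most candidate
matched_text = match.group()     # longest run of b's at that offset
print(leftmost_start, matched_text)
```

The match begins at the first 'a' in the line and extends over every 'b' that follows it, even though a shorter match at the same point would also satisfy the pattern.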
A null RE is equivalent to the last RE encountered.
The following symbols are used in constructing regular expressions:
'C'
Any character C not listed below, including '{', '}', '(', ')',
'<' and '>', matches itself.
'\C'
Any backslash-escaped character C, other than '{', '}', '(', ')',
'<', '>', 'b', 'B', 'w', 'W', '+' and '?', matches itself.
'.'
Matches any single character.
'[CHAR-CLASS]'
Matches any single character in CHAR-CLASS. To include a ']' in
CHAR-CLASS, it must be the first character. A range of characters
may be specified by separating the end characters of the range
with a '-', e.g., 'a-z' specifies the lowercase letters. The
following literal expressions can also be used in CHAR-CLASS to
specify sets of characters:
[:alnum:] [:cntrl:] [:lower:] [:space:]
[:alpha:] [:digit:] [:print:] [:upper:]
[:blank:] [:graph:] [:punct:] [:xdigit:]
If '-' appears as the first or last character of CHAR-CLASS, then
it matches itself. All other characters in CHAR-CLASS match
themselves.
Patterns in CHAR-CLASS of the form:
[.COL-ELM.]
[=COL-ELM=]
where COL-ELM is a "collating element" are interpreted according
to 'locale (5)'. See 'regex (3)' for an explanation of these
constructs.
'[^CHAR-CLASS]'
Matches any single character, other than newline, not in
CHAR-CLASS. CHAR-CLASS is defined as above.
'^'
If '^' is the first character of a regular expression, then it
anchors the regular expression to the beginning of a line.
Otherwise, it matches itself.
'$'
If '$' is the last character of a regular expression, it anchors
the regular expression to the end of a line. Otherwise, it matches
itself.
'\(RE\)'
Defines a (possibly null) subexpression RE. Subexpressions may be
nested. A subsequent backreference of the form '\N', where N is a
number in the range [1,9], expands to the text matched by the Nth
subexpression. For example, the regular expression '\(a.c\)\1'
matches the string 'abcabc', but not 'abcadc'. Subexpressions are
ordered relative to their left delimiter.
'*'
Matches zero or more repetitions of the regular expression
immediately preceding it. The regular expression can be either a
single character regular expression or a subexpression. If '*' is
the first character of a regular expression or subexpression, then
it matches itself. The '*' operator sometimes yields unexpected
results. For example, the regular expression 'b*' matches the null
string at the beginning of the string 'abbb', as opposed to the
substring 'bbb', since that null match is the left-most match.
'\{N,M\}'
'\{N,\}'
'\{N\}'
Matches the single character regular expression or subexpression
immediately preceding it at least N and at most M times. If M is
omitted, then it matches at least N times. If the comma is also
omitted, then it matches exactly N times. If any of these forms
occurs first in a regular expression or subexpression, then it is
interpreted literally (i.e., the regular expression '\{2\}'
matches the string '{2}', and so on).
'\<'
'\>'
Anchors the single character regular expression or subexpression
immediately following it to the beginning (in the case of '\<') or
ending (in the case of '\>') of a "word", i.e., in ASCII, a
maximal string of alphanumeric characters, including the
underscore (_).
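The backreference, '*' and interval examples from the list above can be checked with a short sketch. This assumes Python's 're' module, whose syntax differs from that of 'ed': groups are written '(RE)' rather than '\(RE\)', and intervals '{N,M}' rather than '\{N,M\}'. The 'ed' form of each pattern is given in a comment.

```python
import re

# ed: \(a.c\)\1    The backreference must repeat the matched text itself.
backref = re.compile(r"(a.c)\1")
matches_abcabc = backref.search("abcabc") is not None
matches_abcadc = backref.search("abcadc") is not None

# ed: b*    Zero repetitions match at offset 0, before the 'a'.
star_span = re.search(r"b*", "abbb").span()

# ed: ab\{2,3\}    At least two and at most three b's after the 'a'.
interval_text = re.search(r"ab{2,3}", "abbbb").group()
too_few = re.search(r"ab{2,3}", "ab") is None

print(matches_abcabc, matches_abcadc, star_span, interval_text, too_few)
```

The span reported for 'b*' is empty and sits at the start of the string, which is the null match described under '*'.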
The following extended operators are preceded by a backslash '\' to
distinguish them from traditional 'ed' syntax.
'\`'
'\''
Unconditionally matches the beginning (in the case of '\`') or
ending (in the case of '\'') of a line.
'\?'
Optionally matches the single character regular expression or
subexpression immediately preceding it. For example, the regular
expression 'a[bd]\?c' matches the strings 'abc', 'adc' and 'ac'.
If '\?' occurs at the beginning of a regular expression or
subexpression, then it matches a literal '?'.
'\+'
Matches the single character regular expression or subexpression
immediately preceding it one or more times. So the regular
expression 'a\+' is shorthand for 'aa*'. If '\+' occurs at the
beginning of a regular expression or subexpression, then it
matches a literal '+'.
'\b'
Matches the null string at the beginning or ending of a word. Thus the
regular expression '\bhello\b' is equivalent to '\<hello\>'.
However, '\b\b' is a valid regular expression whereas '\<\>' is
not.
'\B'
Matches a null string inside a word.
'\w'
Matches any character in a word.
'\W'
Matches any character not in a word.
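The extended operators can be tried out in the same way. The sketch below again assumes Python's 're' module, in which '?' and '+' are written without the backslash, and '\<' and '\>' are not available, so only '\b' is shown. The re.ASCII flag is used so that '\w', '\W' and '\b' follow the ASCII definition of a word given above.

```python
import re

# ed: a[bd]\?c    The bracket expression is optional.
optional = re.compile(r"a[bd]?c")
optional_results = [bool(optional.fullmatch(s))
                    for s in ("abc", "adc", "ac", "abdc")]

# ed: a\+ and aa*    Both select the same run of a's.
plus_text = re.search(r"a+", "baaad").group()
star_text = re.search(r"aa*", "baaad").group()

# ed: \bhello\b    Matches 'hello' only as a whole word.
word = re.compile(r"\bhello\b", re.ASCII)
whole_word = word.search("say hello there") is not None
inside_word = word.search("othello_x") is not None

# ed: \w and \W    Word and non-word characters.
word_chars = re.findall(r"\w", "a_1 -", re.ASCII)
non_word_chars = re.findall(r"\W", "a_1 -", re.ASCII)

print(optional_results, plus_text, star_text)
print(whole_word, inside_word, word_chars, non_word_chars)
```

The underscore counts as a word character, so 'hello' inside 'othello_x' has no word boundary on either side and is not matched.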