


5 Regular expressions
*********************

Regular expressions are patterns used in selecting text. For example,
the 'ed' command

     g/STRING/

prints all lines containing STRING. Regular expressions are also used
by the 's' command for selecting old text to be replaced with new text.
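
   As a rough illustration of both uses, the following sketch runs the
same regular expression through 'sed', used here as a non-interactive
stand-in for 'ed' (an assumption made for this example; both accept the
same basic regular expression syntax). The sample text is made up.

```shell
# Sample text (made up for this illustration).
text='first STRING here
no match on this line
another STRING'

# Counterpart of the ed command g/STRING/: print every line that
# contains STRING.
selected=$(printf '%s\n' "$text" | sed -n '/STRING/p')
printf '%s\n' "$selected"

# Counterpart of the ed command ,s/STRING/text/: the RE selects the old
# text, which is replaced with the new text on each matching line.
replaced=$(printf '%s\n' "$text" | sed 's/STRING/text/')
printf '%s\n' "$replaced"
```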

   In addition to specifying string literals, regular expressions can
represent classes of strings. Strings thus represented are said to be
matched by the corresponding regular expression. If it is possible for a
regular expression to match several strings in a line, then the
left-most match is the one selected. If the regular expression permits a
variable number of matching characters, the longest sequence starting at
that point is matched.
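
   The two selection rules can be seen in a minimal sketch, again using
'sed' in place of 'ed' (an assumption, so that the example runs
non-interactively). The input string is made up.

```shell
# Left-most wins: the earlier run 'aa' is chosen over the later, longer
# run 'aaa'. Longest at that point: both a's of the first run are
# matched, not just the first one.
result=$(printf '%s\n' 'xaayaaaz' | sed 's/aa*/[&]/')
printf '%s\n' "$result"    # x[aa]yaaaz
```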

   A null RE is equivalent to the last RE encountered.
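
   As a small illustration, the sketch below reuses the previous RE
through a null RE. It uses 'sed' as a stand-in for 'ed' (an assumption,
so that it runs non-interactively); the command text '/foo/s//bar/p'
should be accepted by 'ed' as well. The sample lines are made up.

```shell
# The address /foo/ sets the last RE; the null RE in s//bar/ reuses it.
result=$(printf '%s\n' 'one foo' 'two' | sed -n '/foo/s//bar/p')
printf '%s\n' "$result"    # one bar
```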

   The following symbols are used in constructing regular expressions:

'C'
     Any character C not listed below, including '{', '}', '(', ')',
     '<' and '>', matches itself.

'\C'
     Any backslash-escaped character C, other than '{', '}', '(', ')',
     '<', '>', 'b', 'B', 'w', 'W', '+' and '?', matches itself.

'.'
     Matches any single character.

'[CHAR-CLASS]'
     Matches any single character in CHAR-CLASS. To include a ']' in
     CHAR-CLASS, it must be the first character. A range of characters
     may be specified by separating the end characters of the range
     with a '-', e.g., 'a-z' specifies the lower case characters. The
     following literal expressions can also be used in CHAR-CLASS to
     specify sets of characters:

          [:alnum:] [:cntrl:] [:lower:] [:space:]
          [:alpha:] [:digit:] [:print:] [:upper:]
          [:blank:] [:graph:] [:punct:] [:xdigit:]

     If '-' appears as the first or last character of CHAR-CLASS, then
     it matches itself. All other characters in CHAR-CLASS match
     themselves.

     Patterns in CHAR-CLASS of the form:
          [.COL-ELM.]
          [=COL-ELM=]

     where COL-ELM is a "collating element" are interpreted according
     to 'locale (5)'. See 'regex (3)' for an explanation of these
     constructs.

'[^CHAR-CLASS]'
     Matches any single character, other than newline, not in
     CHAR-CLASS.  CHAR-CLASS is defined as above.

'^'
     If '^' is the first character of a regular expression, then it
     anchors the regular expression to the beginning of a line.
     Otherwise, it matches itself.

'$'
     If '$' is the last character of a regular expression, it anchors
     the regular expression to the end of a line. Otherwise, it matches
     itself.

'\(RE\)'
     Defines a (possibly null) subexpression RE. Subexpressions may be
     nested. A subsequent backreference of the form '\N', where N is a
     number in the range [1,9], expands to the text matched by the Nth
     subexpression. For example, the regular expression '\(a.c\)\1'
     matches the string 'abcabc', but not 'abcadc'. Subexpressions are
     ordered relative to their left delimiter.

'*'
     Matches zero or more repetitions of the regular expression
     immediately preceding it. The regular expression can be either a
     single character regular expression or a subexpression. If '*' is
     the first character of a regular expression or subexpression, then
     it matches itself. The '*' operator sometimes yields unexpected
     results. For example, the regular expression 'b*' matches the
     beginning of the string 'abbb', as opposed to the substring 'bbb',
     since a null match is the only left-most match.

'\{N,M\}'
'\{N,\}'
'\{N\}'
     Matches the single character regular expression or subexpression
     immediately preceding it at least N and at most M times. If M is
     omitted, then it matches at least N times. If the comma is also
     omitted, then it matches exactly N times. If any of these forms
     occurs first in a regular expression or subexpression, then it is
     interpreted literally (i.e., the regular expression '\{2\}'
     matches the string '{2}', and so on).

'\<'
'\>'
     Anchors the single character regular expression or subexpression
     immediately following it to the beginning (in the case of '\<') or
     ending (in the case of '\>') of a "word", i.e., in ASCII, a
     maximal string of alphanumeric characters, including the
     underscore (_).
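
   The constructs listed above can be tried out with a sketch like the
following. It uses 'sed' in place of 'ed' (an assumption, so that the
examples run non-interactively), and 'mark' is a hypothetical helper
written for this illustration only. The word anchors '\<' and '\>' are
assumed to be supported by the 'sed' in use, as they are in GNU 'sed'.

```shell
# Hypothetical helper (not part of ed): wrap the first match of the
# basic RE $1 in the string $2 with '<' and '>'. sed stands in for ed.
mark() {
    printf '%s\n' "$2" | sed "s/$1/<&>/"
}

# Character classes: a named class, a negated class, and a class that
# contains ']' (placed first) and '-' (placed last).
r_digits=$(mark '[[:digit:]][[:digit:]]*' 'abc 123 def')  # abc <123> def
r_negated=$(mark '[^[:alpha:]]' 'abc-def')                # abc<->def
r_bracket=$(mark '[]-]' 'a]b')                            # a<]>b

# Anchors at the beginning and end of the line.
r_start=$(mark '^ab' 'abab')                              # <ab>ab
r_end=$(mark 'ab$' 'abab')                                # ab<ab>

# Subexpression and backreference.
r_backref=$(mark '\(a.c\)\1' 'abcabc')                    # <abcabc>
r_nobackref=$(mark '\(a.c\)\1' 'abcadc')                  # abcadc (no match)

# '*' can match the null string at the start of the line.
r_star=$(mark 'b*' 'abbb')                                # <>abbb

# Interval expressions.
r_range=$(mark 'a\{2,3\}' 'a aa aaaa')                    # a <aa> aaaa
r_exact=$(mark 'a\{3\}' 'a aa aaaa')                      # a aa <aaa>a

# Word anchors.
r_word=$(mark '\<cat\>' 'concat cat')                     # concat <cat>

printf '%s\n' "$r_digits" "$r_negated" "$r_bracket" "$r_start" "$r_end" \
    "$r_backref" "$r_nobackref" "$r_star" "$r_range" "$r_exact" "$r_word"
```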


   The following extended operators are preceded by a backslash '\' to
distinguish them from traditional 'ed' syntax.

'\`'
'\''
     Unconditionally matches the beginning '\`' or ending '\'' of a
     line.

'\?'
     Optionally matches the single character regular expression or
     subexpression immediately preceding it. For example, the regular
     expression 'a[bd]\?c' matches the strings 'abc', 'adc' and 'ac'.
     If '\?' occurs at the beginning of a regular expression or
     subexpression, then it matches a literal '?'.

'\+'
     Matches the single character regular expression or subexpression
     immediately preceding it one or more times. So the regular
     expression 'a\+' is shorthand for 'aa*'. If '\+' occurs at the
     beginning of a regular expression or subexpression, then it
     matches a literal '+'.

'\b'
     Matches the beginning or ending (null string) of a word. Thus the
     regular expression '\bhello\b' is equivalent to '\<hello\>'.
     However, '\b\b' is a valid regular expression whereas '\<\>' is
     not.

'\B'
     Matches (a null string) inside a word.

'\w'
     Matches any character in a word.

'\W'
     Matches any character not in a word.
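
   A few of the extended operators can be exercised with the sketch
below. It assumes GNU 'sed' as a stand-in for 'ed' (other 'sed'
implementations may not accept these backslash operators), and 'mark'
is a hypothetical helper written for this illustration only.

```shell
# Hypothetical helper (not part of ed): wrap the first match of the
# basic RE $1 in the string $2 with '<' and '>'. GNU sed is assumed.
mark() {
    printf '%s\n' "$2" | sed "s/$1/<&>/"
}

# '\?' makes the preceding bracket expression optional.
r_opt_absent=$(mark 'a[bd]\?c' 'xacx')             # x<ac>x
r_opt_present=$(mark 'a[bd]\?c' 'xadcx')           # x<adc>x

# '\+' matches one or more repetitions, like 'aa*'.
r_plus=$(mark 'a\+' 'baaad')                       # b<aaa>d

# '\b' matches at a word boundary, so 'hello' inside 'othello' is skipped.
r_boundary=$(mark '\bhello\b' 'othello, hello')    # othello, <hello>

# '\B' matches inside a word, so the standalone 'cat' is skipped.
r_inside=$(mark '\Bcat' 'cat concat')              # cat con<cat>

# '\w' matches a word character (including '_'), '\W' any other character.
r_wordchars=$(mark '\w\+' '  foo_1 bar')           #   <foo_1> bar
r_nonword=$(mark '\W' 'ab-cd')                     # ab<->cd

printf '%s\n' "$r_opt_absent" "$r_opt_present" "$r_plus" "$r_boundary" \
    "$r_inside" "$r_wordchars" "$r_nonword"
```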


