9.1.3 Squeezing repeats and deleting
------------------------------------

When given just the ‘--delete’ (‘-d’) option, ‘tr’ removes any input
characters that are in SET1.

   When given just the ‘--squeeze-repeats’ (‘-s’) option and not
translating, ‘tr’ replaces each input sequence of a repeated character
that is in SET1 with a single occurrence of that character.
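
   The two single-option modes described above can be sketched as
follows.  This is only an illustration: the function names
‘delete_digits’ and ‘squeeze_lo’ and the sample strings are hypothetical
and were chosen for this example.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only.

# --delete alone: remove every input character that is in SET1
# (here, the digits).
delete_digits() {
  printf '%s' "$1" | tr -d '[:digit:]'
}

# --squeeze-repeats alone: collapse each run of a repeated character
# that is in SET1 (here, 'l' and 'o') to a single occurrence.
squeeze_lo() {
  printf '%s' "$1" | tr -s 'lo'
}

delete_digits 'a1b22c333'; echo      # prints: abc
squeeze_lo 'hellooo woorld'; echo    # prints: helo world
```

   Characters outside SET1 pass through unchanged in both cases.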

   When given both ‘--delete’ and ‘--squeeze-repeats’, ‘tr’ first
performs any deletions using SET1, then squeezes repeats from any
remaining characters using SET2.
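
   A minimal sketch of the combined mode follows.  The function name
‘delete_then_squeeze’ and the inputs are assumptions made for this
example.  It shows that the squeeze applies to what is left after
deletion, so spaces that become adjacent only because a digit was
removed are still collapsed.

```shell
#!/bin/sh
# Hypothetical helper for illustration only.
# SET1 (digits) is deleted first; then runs of characters in SET2
# (the space) are squeezed in what remains.
delete_then_squeeze() {
  printf '%s' "$1" | tr -d -s '[:digit:]' ' '
}

delete_then_squeeze 'a 1 b  2  c'; echo   # prints: a b c
delete_then_squeeze 'aa 11 bb'; echo      # prints: aa bb
```

   In the second call the doubled letters survive because they are in
neither set.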

   The ‘--squeeze-repeats’ option may also be used when translating, in
which case ‘tr’ first performs translation, then squeezes repeats from
any remaining characters using SET2.
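
   The translate-then-squeeze order can be illustrated with a short
sketch.  The function name ‘translate_then_squeeze’ and the sample words
are hypothetical.  Because the squeeze uses SET2, uppercase runs that
were already present in the input are collapsed too.

```shell
#!/bin/sh
# Hypothetical helper for illustration only.
# First translate lowercase to uppercase, then squeeze repeats of any
# character in SET2 (the uppercase letters).
translate_then_squeeze() {
  printf '%s' "$1" | tr -s '[:lower:]' '[:upper:]'
}

translate_then_squeeze 'bookkeeper'; echo   # prints: BOKEPER
translate_then_squeeze 'aaBBc'; echo        # prints: ABC
```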

   Here are some examples to illustrate various combinations of options:

   • Remove all zero bytes:

          tr -d '\0'

   • Put all words on lines by themselves.  This converts all
     non-alphanumeric characters to newlines, then squeezes each string
     of repeated newlines into a single newline:

          tr -cs '[:alnum:]' '[\n*]'

   • Convert each sequence of repeated newlines to a single newline.
     I.e., delete empty lines (lines that contain only blanks are not
     affected):

          tr -s '\n'

   • Find doubled occurrences of words in a document.  For example,
     people often write “the the” with the repeated words separated by a
     newline.  The Bourne shell script below works by first converting
     each sequence of punctuation and blank characters to a single
     newline.  That puts each “word” on a line by itself.  Next it maps
     all uppercase characters to lowercase, and finally it runs ‘uniq’
     with the ‘-d’ option to print out only the words that were
     repeated.

          #!/bin/sh
          cat -- "$@" \
            | tr -s '[:punct:][:blank:]' '[\n*]' \
            | tr '[:upper:]' '[:lower:]' \
            | uniq -d

   • Deleting a small set of characters is usually straightforward.  For
     example, to remove all ‘a’s, ‘x’s, and ‘M’s you would do this:

          tr -d axM

     However, when ‘-’ is one of those characters, it can be tricky
     because ‘-’ has special meanings.  Performing the same task as
     above but also removing all ‘-’ characters, we might try
     ‘tr -d -axM’, but that would fail because ‘tr’ would try to
     interpret ‘-a’ as a command-line option.  Alternatively, we could
     try putting the hyphen inside the string, ‘tr -d a-xM’, but that
     wouldn’t work either because it would make ‘tr’ interpret ‘a-x’ as
     the range of characters ‘a’...‘x’ rather than the three characters
     ‘a’, ‘-’, and ‘x’.  One way to solve the problem is to put the
     hyphen at the end of the list of characters:

          tr -d axM-

     Or you can use ‘--’ to terminate option processing:

          tr -d -- -axM

     More generally, use the equivalence class notation ‘[=c=]’ with ‘-’
     (or any other character) in place of the ‘c’:

          tr -d '[=-=]axM'

     Note how single quotes are used in the above example to protect the
     square brackets from interpretation by a shell.
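
   The hyphen-handling variants from the last item can be compared side
by side with the sketch below.  The function names and the sample string
‘Mix-and-match’ are hypothetical.  The last function reproduces the
‘a-xM’ mistake on purpose, to show the range interpretation.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only.

# Hyphen placed last in SET1, so it is taken literally.
strip_hyphen_last() { printf '%s' "$1" | tr -d 'axM-'; }

# '--' ends option processing, so '-axM' is read as SET1.
strip_after_dashdash() { printf '%s' "$1" | tr -d -- '-axM'; }

# '[=-=]' names the hyphen explicitly; quotes protect the brackets.
strip_equiv_class() { printf '%s' "$1" | tr -d '[=-=]axM'; }

# Mistake: 'a-x' is read as the range 'a' through 'x'.
strip_range_mistake() { printf '%s' "$1" | tr -d 'a-xM'; }

strip_hyphen_last 'Mix-and-match'; echo      # prints: indmtch
strip_after_dashdash 'Mix-and-match'; echo   # prints: indmtch
strip_equiv_class 'Mix-and-match'; echo      # prints: indmtch
strip_range_mistake 'Mix-and-match'; echo    # prints: --
```

   The three correct forms give the same result; the range form deletes
every letter from ‘a’ through ‘x’ and leaves the hyphens in place.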

