(recode.info)dump-with-names


Fully interpreted UCS dump
==========================

   Another device may be used to get fully interpreted dumps of an
`UCS-2' stream of characters, with one `UCS-2' character displayed on a
full output line.  Each line receives the RFC 1345 mnemonic for the
character if it exists, the `UCS-2' value of the character, and a
descriptive comment for that character.  As each input character
produces its own output line, beware that the output file from this
conversion may be much, much bigger than the input file.
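
   As a rough way to anticipate that growth, the sketch below estimates
how many lines a dump would contain before running the conversion.  It
assumes a single-byte input charset (one character per byte) and the
two title lines seen in the sample dump in this node.  The function
name `estimate_dump_lines' is made up for this illustration and is not
part of `recode'.

```shell
# Hypothetical helper: estimate the number of lines in a dump.
# Assumption: single-byte charset, so bytes == characters.
# The "+ 2" accounts for the title line and the blank line after it.
estimate_dump_lines() {
    bytes=$(wc -c | tr -d ' ')
    echo $((bytes + 2))
}

# 13 visible characters plus the trailing newline give 14 bytes.
printf 'Hello, world!\n' | estimate_dump_lines    # prints 16
```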

   This charset is available in `recode' under the name
`dump-with-names'.

   This `dump-with-names' feature has been implemented as a charset
rather than a surface.  This is surely debatable.  The current
implementation allows for dumping charsets other than `UCS-2'.  For
example, the command `recode l2..full < INPUT' implies a necessary
conversion from `Latin-2' to `UCS-2', as `dump-with-names' is only
connected out from `UCS-2'.  In such cases, `recode' does not display
the original `Latin-2' codes in the dump, only the corresponding
`UCS-2' values.  To give a simpler example, the command

     echo 'Hello, world!' | recode us..dump

produces the following output:

     UCS2   Mne   Description
     
     0048   H     latin capital letter h
     0065   e     latin small letter e
     006C   l     latin small letter l
     006C   l     latin small letter l
     006F   o     latin small letter o
     002C   ,     comma
     0020   SP    space
     0077   w     latin small letter w
     006F   o     latin small letter o
     0072   r     latin small letter r
     006C   l     latin small letter l
     0064   d     latin small letter d
     0021   !     exclamation mark
     000A   LF    line feed (lf)
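
   Since every line of the dump has the same three columns, the output
is easy to post-process with standard tools.  The sketch below feeds a
few lines copied from the sample above to `awk' and keeps only the
`UCS-2' values, skipping the two title lines.  The function name
`ucs2_values' is an assumption for illustration, and the column layout
is inferred from the sample rather than from a formal specification.

```shell
# Hypothetical helper: print only the first column (the UCS-2 value),
# skipping the title line and the blank line that follows it.
ucs2_values() {
    awk 'NR > 2 { print $1 }'
}

# Input copied from the sample dump; prints 0048 and 0065.
ucs2_values <<'EOF'
UCS2   Mne   Description

0048   H     latin capital letter h
0065   e     latin small letter e
EOF
```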

   The descriptive comment is given in English, using `ASCII'.  If no
English description is available but a French one is, the French
description is given instead, using `Latin-1'.  However, if the
`LANGUAGE' or `LANG' environment variable begins with the letters `fr',
French is preferred when both descriptions are available.
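
   The preference rule can be sketched as a small shell function.  This
is only an illustration of the rule as stated here, not the code
`recode' uses.  It covers only the case where both descriptions exist,
and it assumes that `LANGUAGE' is consulted before `LANG', which this
node does not specify.  The name `description_language' is hypothetical.

```shell
# Sketch of the language preference when both descriptions exist.
# $1: value of LANGUAGE, $2: value of LANG (either may be empty).
# Assumption: LANGUAGE takes precedence over LANG when it is set.
description_language() {
    case "${1:-$2}" in
        fr*) echo French ;;
        *)   echo English ;;
    esac
}

# Result for the current environment (depends on your settings).
description_language "${LANGUAGE:-}" "${LANG:-}"

# LANGUAGE unset or empty, LANG set to a French locale.
description_language "" "fr_CA"    # prints French
```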

   Here is another example.  To get the long description of code 237
in the Latin-5 table, one may use the following command:

     echo -n 237 | recode l5/d..dump

If your `echo' does not support `-n', use `echo 237\c' instead.  Here
is how to see what Unicode `U+03C6' means, while getting rid of the
title lines:

     echo -n 0x03C6 | recode u2/x2..dump | tail -n +3
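
   Where neither `echo -n' nor `echo ...\c' can be relied upon,
`printf' writes its argument without a trailing newline on any POSIX
shell.  The sketch below wraps the command above in a function; the
name `describe_ucs2' is hypothetical, and the example call runs only
if `recode' is found on the `PATH'.

```shell
# Hypothetical wrapper: dump one UCS-2 code point given in hexadecimal,
# dropping the two title lines.  Requires recode to be installed.
describe_ucs2() {
    printf '%s' "$1" | recode u2/x2..dump | tail -n +3
}

# Run the example only where recode is actually available.
if command -v recode >/dev/null 2>&1; then
    describe_ucs2 0x03C6
fi
```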
