
Asking for various lists
========================

   Many options control listing output generated by `recode' itself;
they are not meant to accompany actual file recodings.  These options
are:

`--version'
     The program merely prints its version numbers on standard output,
     and exits without doing anything else.

`--help'
     The program merely prints a page of help on standard output, and
     exits without doing any recoding.

`-C'
`--copyright'
     Given this option, all other parameters and options are ignored.
     The program briefly prints the copyright and copying conditions.
     See the file `COPYING' in the distribution for the full statement
     of the copyright and copying conditions.

`-h[LANGUAGE/][NAME]'
`--header[=[LANGUAGE/][NAME]]'
     Instead of recoding files, `recode' writes a LANGUAGE source file
     on standard output and exits.  This source is meant to be included
     in a regular program written in the same programming LANGUAGE: its
     purpose is to declare and initialise an array, named NAME, which
     represents the requested recoding.  The only acceptable values for
     LANGUAGE are `c' and `perl', which may be abbreviated.  If
     LANGUAGE is not specified, `c' is assumed.  If NAME is not
     specified, it defaults to `BEFORE_AFTER'.  The strings BEFORE and
     AFTER are cleaned up according to the syntax of LANGUAGE before
     being used.

     Although `recode' tries its best, this option does not always
     succeed in producing the requested source table.  It does succeed,
     however, provided the recoding can be internally represented by a
     single step after the optimisation phase, and this merged step
     conveys a one-to-one or a one-to-many explicit table.  Also, when
     attempting to produce source tables, `recode' relaxes its checking
     a tiny bit: it ignores the algorithmic part of some tabular
     recodings, and it avoids the processing of implied surfaces.  But
     this is all fairly technical.  Better try and see!

     Beware that other options might affect the produced source tables:
     `-d', `-g' and, particularly, `-s'.
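
     To picture what such a source table might look like, here is a
     minimal Python sketch that renders a one-to-one table of 256 codes
     as a C array declaration.  It is an illustration only: the helper
     names `c_identifier' and `emit_c_table', the `unsigned char const'
     element type, the row layout, and the rule used to clean the name
     (characters not allowed in a C identifier become `_') are all
     assumptions, not the exact text `recode' produces.

```python
import re


def c_identifier(text):
    # Assumed cleaning rule (not taken from recode): characters that
    # cannot appear in a C identifier become '_', and a leading digit
    # gets a '_' prefix.
    name = re.sub(r"[^A-Za-z0-9_]", "_", text)
    if name[:1].isdigit():
        name = "_" + name
    return name


def emit_c_table(before, after, table, name=None):
    """Render a 256-entry one-to-one table as a C array declaration.

    table[code] is the AFTER code for the BEFORE code `code`.  The
    layout is illustrative, not recode's exact output.
    """
    if name is None:
        name = before + "_" + after  # mirrors the BEFORE_AFTER default
    lines = [f"unsigned char const {c_identifier(name)}[256] =", "  {"]
    for start in range(0, 256, 8):
        # Eight values per row, right-aligned in decimal.
        row = ", ".join(f"{table[code]:3d}" for code in range(start, start + 8))
        comma = "," if start + 8 < 256 else ""
        lines.append(f"    {row}{comma}")
    lines.append("  };")
    return "\n".join(lines)


# Identity table, only to show the shape of the output.
print(emit_c_table("ISO-8859-1", "IBM437", list(range(256))))
```

     The default name mirrors the `BEFORE_AFTER' convention described
     above; a real one-to-many table would need a different element type.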

`-k PAIRS'
`--known=PAIRS'
     This particular option is meant to help identify an unknown
     charset, using as hints some already identified characters of the
     charset.  Some examples will help introduce the idea.

     Let's presume here that `recode' is run in an ISO-8859-1 locale,
     and that `DEFAULT_CHARSET' is unset in the environment.  Suppose
     you have guessed that code 130 (decimal) of the unknown charset
     represents a lower case `e' with an acute accent.  That is to say
     that this code should map to code 233 (decimal) in the usual
     charset.  By executing:

          recode -k 130:233

     you should obtain a listing similar to:

          AtariST atarist
          CWI cphu cwi cwi2
          IBM437 437 cp437 ibm437
          IBM850 850 cp850 ibm850
          IBM851 851 cp851 ibm851
          IBM852 852 cp852 ibm852
          IBM857 857 cp857 ibm857
          IBM860 860 cp860 ibm860
          IBM861 861 cp861 cpis ibm861
          IBM863 863 cp863 ibm863
          IBM865 865 cp865 ibm865

     You can give more than one clue at once, to restrict the list
     further.  Suppose you have _also_ guessed that code 211 of the
     unknown charset represents an upper case `E' with diaeresis, that
     is, code 203 in the usual charset.  By requesting:

          recode -k 130:233,211:203

     you should obtain:

          IBM850 850 cp850 ibm850
          IBM852 852 cp852 ibm852
          IBM857 857 cp857 ibm857

     The usual charset may be overridden by specifying one non-option
     argument.  For example, to request the list of charsets for which
     code 130 maps to code 142 in the Macintosh charset, you may ask:

          recode -k 130:142 mac

     and get:

          AtariST atarist
          CWI cphu cwi cwi2
          IBM437 437 cp437 ibm437
          IBM850 850 cp850 ibm850
          IBM851 851 cp851 ibm851
          IBM852 852 cp852 ibm852
          IBM857 857 cp857 ibm857
          IBM860 860 cp860 ibm860
          IBM861 861 cp861 cpis ibm861
          IBM863 863 cp863 ibm863
          IBM865 865 cp865 ibm865

     which, of course, is identical to the result of the first example,
     since code 142 in the Macintosh charset is a lower case `e' with
     an acute accent.

     More formally, option `-k' lists all possible _before_ charsets
     for the _after_ charset given as the sole non-option argument to
     `recode', but subject to restrictions given in PAIRS.  If there is
     no non-option argument, the _after_ charset is taken to be the
     default charset for this `recode'.
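
     The selection rule can be sketched in Python, with the standard
     `codecs' tables standing in for the charset tables of `recode'
     (they may differ in detail).  The names `build_table' and
     `known_pairs_filter' are hypothetical; a candidate is kept only if
     every (before, after) pair holds for it.

```python
def build_table(before_codec, after_codec):
    """Map each byte of BEFORE_CODEC to the byte coding the same
    character in AFTER_CODEC, skipping characters with no equivalent."""
    table = {}
    for code in range(256):
        try:
            char = bytes([code]).decode(before_codec)
            encoded = char.encode(after_codec)
        except (UnicodeDecodeError, UnicodeEncodeError):
            continue  # undefined byte, or no single AFTER code for it
        if len(encoded) == 1:
            table[code] = encoded[0]
    return table


def known_pairs_filter(candidates, pairs, after_codec="latin-1"):
    """Keep the candidate BEFORE charsets that no pair rejects."""
    kept = []
    for codec in candidates:
        table = build_table(codec, after_codec)
        if all(table.get(before) == after for before, after in pairs):
            kept.append(codec)
    return kept


CANDIDATES = ["cp437", "cp850", "cp852", "cp857", "cp1252"]
print(known_pairs_filter(CANDIDATES, [(130, 233)]))
print(known_pairs_filter(CANDIDATES, [(130, 233), (211, 203)]))
```

     With these tables, the second call should drop `cp437', much as
     the extra 211:203 clue dropped `IBM437' in the listings above.
     Passing `after_codec="mac_roman"' with the pair 130:142 plays the
     role of the `mac' argument.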

     The restrictions are given as a comma separated list of pairs,
     each pair consisting of two numbers separated by a colon.  The
     numbers are taken as decimal when the initial digit is between `1'
     and `9'; `0x' starts a hexadecimal number, or else `0' starts an
     octal number.  The first number is a code in any _before_ charset,
     while the second number is a code in the specified _after_ charset.
     If the first number would not be transformed into the second
     number by recoding from some _before_ charset to the _after_
     charset, then this _before_ charset is rejected.  A _before_
     charset is listed only if it is not rejected by any pair.  The
     program only tests those _before_ charsets having a tabular style
     internal description (Note: Tabular), and the selected _after_
     charset should have one as well.
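
     The number syntax for PAIRS is easy to mimic.  The following Python
     sketch, with hypothetical function names, applies the rule above:
     `0x' selects hexadecimal, any other leading `0' selects octal, and
     an initial digit from `1' to `9' selects decimal.

```python
def parse_code(text):
    """Read one code using the -k number conventions."""
    if text.startswith("0x"):
        return int(text[2:], 16)  # hexadecimal
    if text.startswith("0"):
        return int(text, 8)       # octal; "0" alone is zero
    return int(text, 10)          # decimal


def parse_pairs(spec):
    """Split a PAIRS argument such as '130:233,211:203' into tuples."""
    pairs = []
    for item in spec.split(","):
        before, after = item.split(":")
        pairs.append((parse_code(before), parse_code(after)))
    return pairs


print(parse_pairs("130:233,211:203"))  # → [(130, 233), (211, 203)]
print(parse_pairs("0x82:0351"))        # → [(130, 233)]
```

     Python's own `int(text, 0)' is not a drop-in replacement here,
     since it rejects octal numbers written with a bare leading `0'.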

     The produced list is in fact a subset of the list produced by the
     option `-l'.  As for option `-l', the non-option argument is
     interpreted as a charset name, possibly abbreviated to any
     non-ambiguous prefix.

`-l[FORMAT]'
`--list[=FORMAT]'
     This option asks for information about all charsets, or about one
     particular charset.  No file will be recoded.

     If there is no non-option argument, `recode' ignores the FORMAT
     value of the option and writes a sorted list of charset names on
     standard output, one per line.  When a charset name has aliases
     or synonyms, they follow the true charset name on its line, sorted
     from left to right.  Each charset or alias is followed by its
     implied surfaces, if any.  This list is over two hundred lines
     long.  It is best used with `grep -i', as in:

          recode -l | grep -i greek

     There might be one non-option argument, in which case it is
     interpreted as a charset name, possibly abbreviated to any
     non-ambiguous prefix.  This particular usage of the `-l' option is
     obeyed _only_ for charsets having a tabular style internal
     description (Note: Tabular).  Although most charsets have this
     property, some do not, and the option `-l' cannot be used to
     detail these particular charsets.  To find out whether a
     particular charset can be listed this way, merely try it.  The
     FORMAT value of the option is a keyword from the following list.
     Keywords may be abbreviated by dropping suffix letters, and even
     reduced to the first letter only:

    `decimal'
          This format asks for the production on standard output of a
          concise tabular display of the charset, in which character
          code values are expressed in decimal.

    `octal'
          This format uses octal instead of decimal in the concise
          tabular display of the charset.

    `hexadecimal'
          This format uses hexadecimal instead of decimal in the
          concise tabular display of the charset.

    `full'
          This format requests an extensive display of the charset on
          standard output, using one line per character showing its
          decimal, hexadecimal, octal and `UCS-2' code values, and also
          a descriptive comment which should be the 10646 name for the
          character.

          The descriptive comment is given in English and ASCII, yet if
          the English description is not available but a French one is,
          then the French description is given instead, using Latin-1.
          However, if the `LANGUAGE' or `LANG' environment variable
          begins with the letters `fr', then listing preference goes to
          French when both descriptions are available.

     When option `-l' is used together with a CHARSET argument, the
     FORMAT defaults to `decimal'.
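
     As a rough model of the `full' format, this Python sketch prints
     one line per defined code with its decimal, hexadecimal, octal and
     UCS-2 values, followed by the character name taken from Python's
     `unicodedata' module (English only; there is no French fallback).
     The function name `full_listing' and the column layout are
     assumptions and will not match the output of `recode' exactly.

```python
import unicodedata


def full_listing(codec):
    """One line per defined code: decimal, hex, octal, UCS-2, name."""
    lines = []
    for code in range(256):
        try:
            char = bytes([code]).decode(codec)
        except UnicodeDecodeError:
            continue  # code not defined in this charset
        name = unicodedata.name(char, "")  # control codes have no name
        line = f"{code:3d}  {code:02x}  {code:03o}  {ord(char):04X}  {name}"
        lines.append(line.rstrip())
    return lines


for row in full_listing("latin-1")[230:236]:
    print(row)
```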

`-T'
`--find-subsets'
     This option is a maintainer tool for evaluating the redundancy of
     those charsets, in `recode', which are internally represented by
     a `UCS-2' data table.  After the listing has been produced, the
     program exits without doing any recoding.  The output is meant to
     be sorted, like this: `recode -T | sort'.  The option makes
     `recode' compare all pairs of charsets, seeking those which are
     subsets of others.  The concept and results are better explained
     through a few examples.  Consider these three sample lines from
     `-T' output:

          [  0] IBM891 == IBM903
          [  1] IBM1004 < CP1252
          [ 12] INVARIANT < CSA_Z243.4-1985-1

     The first line means that `IBM891' and `IBM903' are completely
     identical as far as `recode' is concerned, so one is fully
     redundant to the other.  The second line says that `IBM1004' is
     wholly contained within `CP1252', yet there is a single character
     which is in `CP1252' without being in `IBM1004'.  The third line
     says that `INVARIANT' is wholly contained within
     `CSA_Z243.4-1985-1', but twelve characters are in
     `CSA_Z243.4-1985-1' without being in `INVARIANT'.  The whole
     output could probably be reduced and made more meaningful through
     a transitivity study.
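
     The kind of comparison `-T' performs can be sketched in Python by
     treating each charset as the set of characters it can decode, with
     Python codecs standing in for the `UCS-2' tables of `recode'.  The
     function names are hypothetical; the bracketed number counts the
     characters found only in the larger charset, padded so that the
     lines sort as described above.

```python
from itertools import combinations


def repertoire(codec):
    """Set of characters a single-byte codec can decode."""
    chars = set()
    for code in range(256):
        try:
            chars.add(bytes([code]).decode(codec))
        except UnicodeDecodeError:
            pass  # undefined code
    return frozenset(chars)


def find_subsets(codecs):
    """Report identical pairs and strict subsets in '-T' style."""
    sets = {codec: repertoire(codec) for codec in codecs}
    report = []
    for a, b in combinations(codecs, 2):
        sa, sb = sets[a], sets[b]
        if sa == sb:
            report.append(f"[  0] {a} == {b}")
        elif sa < sb:
            report.append(f"[{len(sb - sa):3d}] {a} < {b}")
        elif sb < sa:
            report.append(f"[{len(sa - sb):3d}] {b} < {a}")
        # Pairs where neither contains the other are not reported.
    return sorted(report)


for line in find_subsets(["ascii", "latin-1", "cp1252"]):
    print(line)
```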

