Asking for various lists
========================
Many options control listing output generated by `recode' itself;
they are not meant to accompany actual file recodings. These options
are:
`--version'
The program merely prints its version numbers on standard output,
and exits without doing anything else.
`--help'
The program merely prints a page of help on standard output, and
exits without doing any recoding.
`-C'
`--copyright'
Given this option, all other parameters and options are ignored.
The program briefly prints the copyright and copying conditions.
See the file `COPYING' in the distribution for the full statement of
the copyright and copying conditions.
`-h[LANGUAGE/][NAME]'
`--header[=[LANGUAGE/][NAME]]'
Instead of recoding files, `recode' writes a LANGUAGE source file
on standard output and exits. This source is meant to be included
in a regular program written in the same programming LANGUAGE: its
purpose is to declare and initialise an array, named NAME, which
represents the requested recoding. The only acceptable values for
LANGUAGE are `c' or `perl', and they may be abbreviated. If
LANGUAGE is not specified, `c' is assumed. If NAME is not
specified, then it defaults to `BEFORE_AFTER'. The strings BEFORE
and AFTER are cleaned up according to the syntax of LANGUAGE
before being used.
Even if `recode' tries its best, this option does not always
succeed in producing the requested source table. It will succeed,
however, provided that the recoding can be internally represented
by only one step after the optimisation phase, and that this merged
step conveys a one-to-one or a one-to-many explicit table. Also,
when attempting to produce source tables, `recode' relaxes its
checking a tiny bit: it ignores the algorithmic part of some
tabular recodings, and it also skips the processing of implied
surfaces. But this is all fairly technical. Better try and see!
Beware that other options might affect the produced source tables;
these are `-d', `-g' and, particularly, `-s'.
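For a one-to-one recoding between 8-bit charsets, the array that `-h' declares presumably amounts to a 256-entry lookup table indexed by the _before_ code. The Python sketch below only illustrates that idea and is not `recode' output: it builds such a table by hand, using Python's own `cp437' and `latin-1' codecs as stand-ins for recode's IBM437 and Latin-1 tables, and it maps characters Latin-1 lacks to `?', which is an assumption rather than recode's policy.

```python
# Sketch only: a one-to-one table in the spirit of what `recode -h` declares.
# Python's "cp437" and "latin-1" codecs stand in for recode's own tables;
# codes without a counterpart in AFTER become "?" (an assumption, not
# recode's policy for untranslatable characters).

def build_one_to_one(before: str, after: str) -> bytes:
    """Return a 256-byte table: entry i is the AFTER code for BEFORE code i."""
    table = bytearray()
    for code in range(256):
        char = bytes([code]).decode(before)
        try:
            encoded = char.encode(after)
        except UnicodeEncodeError:
            encoded = b"?"
        table += encoded if len(encoded) == 1 else b"?"
    return bytes(table)


def apply_table(data: bytes, table: bytes) -> bytes:
    """Recode DATA by looking up every byte in TABLE."""
    return data.translate(table)


if __name__ == "__main__":
    ibm437_to_latin1 = build_one_to_one("cp437", "latin-1")
    print(ibm437_to_latin1[130])                      # 233, e with acute
    print(apply_table(b"caf\x82", ibm437_to_latin1))  # b'caf\xe9'
```

A C program including a table generated by `-h' would do the same lookup with plain array indexing.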
`-k PAIRS'
`--known=PAIRS'
This particular option is meant to help identify an unknown
charset, using as hints some already identified characters of the
charset. Some examples will help introduce the idea.
Let's presume here that `recode' is run in an ISO-8859-1 locale,
and that `DEFAULT_CHARSET' is unset in the environment. Suppose
you have guessed that code 130 (decimal) of the unknown charset
represents a lower case `e' with an acute accent. That is to say
that this code should map to code 233 (decimal) in the usual
charset. By executing:
recode -k 130:233
you should obtain a listing similar to:
AtariST atarist
CWI cphu cwi cwi2
IBM437 437 cp437 ibm437
IBM850 850 cp850 ibm850
IBM851 851 cp851 ibm851
IBM852 852 cp852 ibm852
IBM857 857 cp857 ibm857
IBM860 860 cp860 ibm860
IBM861 861 cp861 cpis ibm861
IBM863 863 cp863 ibm863
IBM865 865 cp865 ibm865
You can give more than one clue at once, to restrict the list
further. Suppose you have _also_ guessed that code 211 of the
unknown charset represents an upper case `E' with diaeresis, that
is, code 203 in the usual charset. By requesting:
recode -k 130:233,211:203
you should obtain:
IBM850 850 cp850 ibm850
IBM852 852 cp852 ibm852
IBM857 857 cp857 ibm857
The usual charset may be overridden by specifying one non-option
argument. For example, to request the list of charsets for which
code 130 maps to code 142 for the Macintosh, you may ask:
recode -k 130:142 mac
and get:
AtariST atarist
CWI cphu cwi cwi2
IBM437 437 cp437 ibm437
IBM850 850 cp850 ibm850
IBM851 851 cp851 ibm851
IBM852 852 cp852 ibm852
IBM857 857 cp857 ibm857
IBM860 860 cp860 ibm860
IBM861 861 cp861 cpis ibm861
IBM863 863 cp863 ibm863
IBM865 865 cp865 ibm865
which, of course, is identical to the result of the first example,
since the code 142 for the Macintosh is a small `e' with acute.
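The filtering shown in these examples can be imitated with Python's codecs. In this sketch, which is an assumption rather than recode's implementation, a pair holds for a candidate charset when both codes decode to the same Unicode character; Python codec names such as `cp850' and `mac_roman' stand in for recode's charset names, and the candidate list is limited to codecs Python ships, so AtariST, CWI and IBM851 cannot appear.

```python
# Sketch of the `-k` test using Python codecs as stand-ins for recode charsets.

CANDIDATES = ["cp437", "cp850", "cp852", "cp857", "cp860",
              "cp861", "cp863", "cp865", "latin-1", "mac_roman"]


def decode_code(code: int, charset: str):
    """Return the character for CODE in CHARSET, or None if undefined."""
    try:
        return bytes([code]).decode(charset)
    except UnicodeDecodeError:
        return None


def known_before_charsets(pairs, after, candidates=CANDIDATES):
    """List the candidates that no (before_code, after_code) pair rejects."""
    kept = []
    for before in candidates:
        accepted = all(
            decode_code(b, before) is not None
            and decode_code(b, before) == decode_code(a, after)
            for b, a in pairs
        )
        if accepted:
            kept.append(before)
    return kept


if __name__ == "__main__":
    print(known_before_charsets([(130, 233), (211, 203)], "latin-1"))
    # ['cp850', 'cp852', 'cp857']
    print(known_before_charsets([(130, 142)], "mac_roman"))
```

Comparing decoded characters rather than codes sidesteps building a full recoding for every candidate, at the cost of assuming each charset maps cleanly onto Unicode.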
More formally, option `-k' lists all possible _before_ charsets
for the _after_ charset given as the sole non-option argument to
`recode', but subject to restrictions given in PAIRS. If there is
no non-option argument, the _after_ charset is taken to be the
default charset for this `recode'.
The restrictions are given as a comma-separated list of pairs,
each pair consisting of two numbers separated by a colon. The
numbers are taken as decimal when the initial digit is between `1'
and `9'; `0x' starts a hexadecimal number; otherwise, a leading `0'
starts an octal number. The first number is a code in any _before_
charset, while the second number is a code in the specified _after_
charset. If the first number would not be transformed into the
second number by recoding from some _before_ charset to the _after_
charset, then this _before_ charset is rejected. A _before_
charset is listed only if it is not rejected by any pair. The
program only tests those _before_ charsets having a tabular-style
internal description (Note: Tabular), and the selected _after_
charset should have one as well.
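The numbering rules for PAIRS can be made concrete with a small parser. This is a sketch of the syntax as described here, not recode's own parser; the function names are hypothetical.

```python
# Sketch of the PAIRS syntax; parse_code and parse_pairs are hypothetical
# names that only illustrate the numbering rules.

def parse_code(text: str) -> int:
    """`0x` starts hexadecimal, another leading `0` octal, `1`..`9` decimal."""
    if text.startswith("0x"):
        return int(text[2:], 16)
    if text.startswith("0"):
        return int(text, 8)          # "0" alone is simply zero
    if text and text[0] in "123456789":
        return int(text, 10)
    raise ValueError(f"bad code: {text!r}")


def parse_pairs(spec: str) -> list:
    """Split "B:A,B:A,..." into (before_code, after_code) tuples."""
    pairs = []
    for item in spec.split(","):
        before, sep, after = item.partition(":")
        if not sep:
            raise ValueError(f"missing colon in {item!r}")
        pairs.append((parse_code(before), parse_code(after)))
    return pairs


if __name__ == "__main__":
    print(parse_pairs("130:233,0xD3:0313"))  # [(130, 233), (211, 203)]
```

So `0xD3:0313' is just another spelling of the pair `211:203' used in the second example above.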
The produced list is in fact a subset of the list produced by the
option `-l'. As for option `-l', the non-option argument is
interpreted as a charset name, possibly abbreviated to any
unambiguous prefix.
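Resolving an abbreviated charset name amounts to finding the only name that starts with the given prefix. In this sketch the name list is a made-up sample, and both the case-insensitive comparison and letting an exact match win over longer names are assumptions, not documented recode behaviour.

```python
# Sketch: resolving a possibly abbreviated charset name. SAMPLE_NAMES is a
# made-up list; case-insensitivity and "exact match wins" are assumptions.

SAMPLE_NAMES = ["IBM437", "IBM850", "IBM852", "latin1", "macintosh"]


def resolve_charset(prefix: str, names=SAMPLE_NAMES) -> str:
    """Return the single name that PREFIX abbreviates, or raise ValueError."""
    wanted = prefix.lower()
    for name in names:
        if name.lower() == wanted:
            return name
    matches = [name for name in names if name.lower().startswith(wanted)]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise ValueError(f"unknown charset: {prefix}")
    raise ValueError(f"ambiguous charset {prefix}: {', '.join(matches)}")


if __name__ == "__main__":
    print(resolve_charset("mac"))     # macintosh
    print(resolve_charset("ibm437"))  # IBM437
```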
`-l[FORMAT]'
`--list[=FORMAT]'
This option asks for information about all charsets, or about one
particular charset. No file will be recoded.
If there is no non-option argument, `recode' ignores the FORMAT
value of the option and writes a sorted list of charset names on
standard output, one per line. When a charset name has aliases
or synonyms, they follow the true charset name on its line, sorted
from left to right. Each charset or alias is followed by its
implied surfaces, if any. This list is over two hundred lines long.
It is best used with `grep -i', as in:
recode -l | grep -i greek
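The same case-insensitive search is easy to reproduce on listing text held in a program. This sketch runs on a few lines copied from the `-k' examples above rather than on real `recode -l' output, and it assumes each line is a charset name followed by whitespace-separated aliases; implied surfaces, if present, are not told apart from aliases.

```python
# Sketch: a stand-in for `recode -l | grep -i PATTERN`, run on sample lines
# copied from the `-k` examples, not on real `recode -l` output.

SAMPLE_LISTING = """\
AtariST atarist
CWI cphu cwi cwi2
IBM437 437 cp437 ibm437
IBM861 861 cp861 cpis ibm861
"""


def grep_i(listing: str, pattern: str) -> list:
    """Return the listing lines containing PATTERN, ignoring case."""
    needle = pattern.lower()
    return [line for line in listing.splitlines() if needle in line.lower()]


def aliases_of(listing: str) -> dict:
    """Map each leading charset name to the remaining fields on its line."""
    table = {}
    for line in listing.splitlines():
        fields = line.split()
        if fields:
            table[fields[0]] = fields[1:]
    return table


if __name__ == "__main__":
    print(grep_i(SAMPLE_LISTING, "CP861"))    # ['IBM861 861 cp861 cpis ibm861']
    print(aliases_of(SAMPLE_LISTING)["CWI"])  # ['cphu', 'cwi', 'cwi2']
```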
There might be one non-option argument, in which case it is
interpreted as a charset name, possibly abbreviated to any
unambiguous prefix. This particular usage of the `-l' option is
obeyed _only_ for charsets having a tabular-style internal
description (Note: Tabular). Even if most charsets have this
property, some do not, and the option `-l' cannot be used to
detail these particular charsets. To find out whether a particular
charset can be listed this way, merely try it and see. The FORMAT
value of the option is a keyword from the following list. Keywords
may be abbreviated by dropping suffix letters, and even reduced to
the first letter only:
`decimal'
This format produces, on standard output, a concise tabular
display of the charset, in which character code values are
expressed in decimal.
`octal'
This format uses octal instead of decimal in the concise
tabular display of the charset.
`hexadecimal'
This format uses hexadecimal instead of decimal in the
concise tabular display of the charset.
`full'
This format requests an extensive display of the charset on
standard output, using one line per character showing its
decimal, hexadecimal, octal and `UCS-2' code values, and also
a descriptive comment which should be the ISO 10646 name for the
character.
The descriptive comment is given in English and ASCII, yet if
the English description is not available but a French one is,
then the French description is given instead, using Latin-1.
However, if the `LANGUAGE' or `LANG' environment variable
begins with the letters `fr', then listing preference goes to
French when both descriptions are available.
When option `-l' is used together with a CHARSET argument, the
FORMAT defaults to `decimal'.
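Two details of `-l' lend themselves to a short sketch: expanding an abbreviated FORMAT keyword, and building one line in the spirit of the `full' format. The column layout below is hypothetical, Python's `cp437' codec stands in for recode's IBM437 table, and Python's `unicodedata' module supplies the character name, which for these characters matches the ISO 10646 name.

```python
# Sketch of two `-l` details. The column layout is hypothetical, Python's
# codecs stand in for recode's tables, and unicodedata supplies the
# ISO 10646 (Unicode) character name.
import unicodedata

FORMATS = ["decimal", "octal", "hexadecimal", "full"]


def resolve_format(keyword: str) -> str:
    """Expand a keyword shortened by dropping suffix letters, e.g. "hex"."""
    matches = [f for f in FORMATS if keyword and f.startswith(keyword)]
    if len(matches) != 1:
        raise ValueError(f"unknown format: {keyword!r}")
    return matches[0]


def full_line(code: int, charset: str) -> str:
    """Describe CODE of CHARSET: decimal, hex, octal, UCS-2, and name."""
    char = bytes([code]).decode(charset)
    name = unicodedata.name(char, "(no name)")  # control codes have no name
    return f"{code:3d}  {code:02X}  {code:03o}  {ord(char):04X}  {name}"


if __name__ == "__main__":
    print(resolve_format("h"))      # hexadecimal
    print(full_line(130, "cp437"))
    # 130  82  202  00E9  LATIN SMALL LETTER E WITH ACUTE
```

Since the four keywords start with different letters, any non-empty prefix of one of them is unambiguous, which is why a single letter suffices.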
`-T'
`--find-subsets'
This option is a maintainer tool for evaluating the redundancy of
those charsets, in `recode', which are internally represented by
a `UCS-2' data table. After the listing has been produced, the
program exits without doing any recoding. The output is meant to
be sorted, like this: `recode -T | sort'. The option makes
`recode' compare all pairs of charsets, seeking those which are
subsets of others. The concept and results are better explained
through a few examples. Consider these three sample lines from
`-T' output:
[ 0] IBM891 == IBM903
[ 1] IBM1004 < CP1252
[ 12] INVARIANT < CSA_Z243.4-1985-1
The first line means that `IBM891' and `IBM903' are completely
identical as far as `recode' is concerned, so one is fully
redundant to the other. The second line says that `IBM1004' is
wholly contained within `CP1252', yet there is a single character
which is in `CP1252' without being in `IBM1004'. The third line
says that `INVARIANT' is wholly contained within
`CSA_Z243.4-1985-1', but twelve characters are in
`CSA_Z243.4-1985-1' without being in `INVARIANT'. The whole
output might most probably be reduced and made more significant
through a transitivity study.
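The subset test behind these lines can be approximated by treating each charset as the set of characters it can represent. This sketch uses Python's codecs in place of recode's `UCS-2' tables and only mimics the shape of the sample lines above; it is not recode's code, and its bracketed count is the number of characters in the larger charset that the smaller one lacks.

```python
# Sketch of the `-T` comparison using Python codecs in place of recode's
# UCS-2 tables. Output only mimics the shape of the sample lines above.
from itertools import combinations


def repertoire(charset: str) -> frozenset:
    """Characters obtainable by decoding each single byte of CHARSET."""
    chars = set()
    for code in range(256):
        try:
            chars.add(bytes([code]).decode(charset))
        except UnicodeDecodeError:
            pass  # code undefined in this charset
    return frozenset(chars)


def compare(a: str, b: str):
    """Return a `-T`-style line if one repertoire contains the other."""
    ra, rb = repertoire(a), repertoire(b)
    if ra == rb:
        return f"[{0:3d}] {a} == {b}"
    if ra < rb:
        return f"[{len(rb - ra):3d}] {a} < {b}"
    if rb < ra:
        return f"[{len(ra - rb):3d}] {b} < {a}"
    return None  # neither charset contains the other


if __name__ == "__main__":
    names = ["ascii", "latin-1", "iso8859_1", "cp1252"]
    for a, b in combinations(names, 2):
        line = compare(a, b)
        if line:
            print(line)
```

Piping such output through `sort', as suggested above, groups the identical pairs (count 0) first.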