(recode.info)Introduction



Terminology and purpose
***********************

   A few terms are used over and over in this manual, and the reader
will do well to learn their meaning right away.  Both ISO (International
Organization for Standardization) and IETF (Internet Engineering Task
Force) have their own terminology.  This document does not try to stick
strictly to either one, but neither does it want to add more confusion
to the field.  On the other hand, it would not be efficient to use
paraphrases all the time, so `recode' coins a few short words, which
are explained below.

   A "charset", in the context of `recode', is a particular association
between computer codes on one side, and a repertoire of intended
characters on the other side.  Codes are usually taken from a set of
consecutive small integers, starting at 0.  Some characters have a
graphical appearance (glyph) or displayable effect; others have special
uses, for example to control devices or to interact with neighbouring
codes to specify them more precisely.  So, a _charset_ is roughly one
of those tables, giving a meaning to each of the codes from the set of
allowable values.  MIME also uses the term charset with approximately
the same meaning.  It does _not_ exactly correspond to what ISO calls a
"coded character set", that is, a set of characters with an encoding
for them.  A coded character set does not necessarily use all available
code positions, while a MIME charset usually tries to specify them all.
A MIME charset might be the union of a few disjoint coded character
sets.
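
   The idea that a charset is a table giving meaning to codes can be
illustrated with a minimal sketch.  It uses Python's built-in codecs,
which are _not_ part of `recode'; the codec names `latin-1', `cp437'
and `ascii' are Python's own names and are assumed here only for
illustration.  The same code, 0xE9, stands for a different character
under each table, and one table leaves that position unassigned.

```python
# One code, three charsets: each table gives the code its own meaning.
code = bytes([0xE9])

latin1_char = code.decode("latin-1")  # U+00E9, e with acute accent
cp437_char = code.decode("cp437")     # U+0398, Greek capital theta

# ASCII only assigns codes 0 to 127, so 0xE9 has no meaning there.
try:
    code.decode("ascii")
    ascii_defined = True
except UnicodeDecodeError:
    ascii_defined = False
```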

   A "surface" is a term used in `recode' only, and is short for
surface transformation of a charset stream.  This is any kind of
mapping, usually reversible, which associates physical bits in some
medium with a stream of characters taken from one or more charsets
(usually one).  A surface is a kind of varnish added over a charset so
it fits in actual bits and bytes.  How ends of lines are exactly encoded
is not really pertinent to the charset, and so there is a surface for
ends of lines.  `Base64' is also a surface, as we may encode any charset
in it.  Other examples would be `DES' enciphering or `gzip' compression
(even if `recode' does not currently offer them): these are ways to give
a real life to theoretical charsets.  The "trivial" surface consists
of putting characters into fixed-width little chunks of bits, usually
eight such bits per character.  But things are not always that simple.
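
   The separation between charset and surface can be sketched as two
independent steps.  The example below is only an illustration written
with Python's standard library, not with `recode' itself; the variable
names are hypothetical.  The charset step maps characters to codes, and
each surface then changes only how those codes are laid out in bytes,
in a reversible way.

```python
import base64

text = "caf\u00e9\n"

# Charset step: characters become codes (here, one byte per character).
charset_bytes = text.encode("latin-1")

# End-of-line surface: the same codes, with LF written as CR LF.
crlf_surface = charset_bytes.replace(b"\n", b"\r\n")

# Base64 surface: the same codes, packed into printable ASCII.
b64_surface = base64.b64encode(charset_bytes)

# Both surfaces are reversible and leave the charset untouched.
from_crlf = crlf_surface.replace(b"\r\n", b"\n")
from_b64 = base64.b64decode(b64_surface)
```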

   The `recode' library, and the program by that name, have the purpose
of converting files between various charsets and surfaces.  When this
cannot be done in exact ways, as is often the case, the program may
get rid of the offending characters or fall back on approximations.
The library recognises or produces around 175 such charsets under 500
names, and handles a dozen surfaces.  Since it can convert each charset
to almost any other one, many thousands of different conversions are
possible.
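
   The two fallbacks mentioned above, dropping offending characters or
approximating them, can be sketched as follows.  This is an assumption
about the general technique, written with Python's standard library; it
does not show how `recode' itself chooses its approximations.

```python
import unicodedata

text = "d\u00e9j\u00e0 vu"  # contains two accented letters

# Fallback 1: get rid of characters the target charset cannot hold.
dropped = text.encode("ascii", errors="ignore").decode("ascii")

# Fallback 2: approximate, by splitting each accented letter into a
# base letter plus combining marks, then discarding the marks.
decomposed = unicodedata.normalize("NFD", text)
approximated = "".join(
    ch for ch in decomposed if not unicodedata.combining(ch)
)
```

   Here the first fallback loses the accented letters entirely, while
the second keeps their unaccented base letters.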

   The `recode' program and library do not usually know how to split and
sort out textual and non-textual information which may be mixed in a
single input file.  For example, there is no surface which currently
addresses the problem of how lines are blocked into physical records,
when the blocking information is added as binary markers or counters
within files.  So, `recode' should be given textual streams which are
rather _pure_.

   This tool pays special attention to superimposition of diacritics for
some French representations.  This orientation is mostly historical; it
does not impair the usefulness, generality or extensibility of the
program.  `recode' is both a French and an English word.  For those who
pay attention to such things, the proper pronunciation is French (that
is, `racud', with `a' as in `above', and `u' as in `cut').

   The program `recode' was written by François Pinard.  Over
time, it came to reuse work from other contributors, notably that
of Keld Simonsen and Bruno Haible.

* Menu:

* Charset overview::            Overview of charsets
* Surface overview::            Overview of surfaces
* Contributing::                Contributions and bug reports
