(recode.info)libiconv



The `iconv' library
*******************

   The `recode' library itself contains most of the code and tables from
the portable `iconv' library, written by Bruno Haible.  Because of this
merging, many capabilities of the `recode' library are duplicated, as
the older `recode' and `iconv' libraries share many charsets.  We
discuss here the issues related to this duplication, and other
peculiarities specific to the `iconv' library.  The plan is to remove
duplications and better integrate the remaining specificities as
`recode' evolves.

   As implemented, if a recoding request can be satisfied by the
`recode' library both with and without its `iconv' library part, it is
likely that the `iconv' library will be used.  To find out whether the
`iconv' part is indeed used or not, use the `-v' or `--verbose' option
(*note Recoding::).

   The `:libiconv:' charset represents a conceptual pivot charset
within the `iconv' part of the `recode' library (in fact, this pivot
exists, but is not directly reachable).  This charset has a mere `:' (a
colon) for an alias.  It is not allowed to recode from or to this
charset directly.  But when this charset is selected as an
intermediate, usually by automatic means, then the `iconv' part of the
`recode' library is called to handle the transformations.  Giving the
`--ignore=:libiconv:' option to `recode', or equivalently but more
simply `-x:', instructs `recode' to avoid this charset entirely as an
intermediate, with the consequence that the `iconv' part of the library
is bypassed.  Consider these two calls:

     recode l1..1250 < INPUT > OUTPUT
     recode -x: l1..1250 < INPUT > OUTPUT

Both should transform INPUT from `ISO-8859-1' to `CP1250' on OUTPUT.
The first call uses the `iconv' part of the library, while the second
call avoids it.  Whatever the path used, the results should normally be
identical.  However, there might be observable differences.  Most of
them would likely result from reversibility issues, as the `iconv'
engine, which the `recode' library directly uses for the time being,
does not address reversibility.  Less likely, some differences might
result from slight errors in the tables used; such differences should
then be reported as bugs.
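
   Since the `iconv' engine does not address reversibility, one way to
check a particular input is to convert it, convert the result back, and
compare with the original.  The following is a minimal sketch of that
check, written against the standard `iconv(3)' interface (`iconv_open',
`iconv', `iconv_close') that both GNU libiconv and the GNU C library
provide.  It is not the `recode' API, and the helper names
`convert_buffer' and `round_trips' are hypothetical.  Which charset
names are accepted depends on the local `iconv' implementation.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: convert INLEN bytes of IN from charset FROM to
   charset TO into OUT (capacity OUTCAP).  Returns the number of bytes
   written, or (size_t) -1 with errno set on failure: EINVAL for an
   unknown charset pair or truncated input, EILSEQ when a character is
   invalid or has no equivalent in TO, E2BIG when OUT is too small.  */
size_t convert_buffer(const char *from, const char *to,
                      const char *in, size_t inlen,
                      char *out, size_t outcap)
{
    assert(in != NULL && out != NULL);

    iconv_t cd = iconv_open(to, from);   /* target charset comes first */
    if (cd == (iconv_t) -1)
        return (size_t) -1;

    char *inptr = (char *) in;           /* iconv(3) takes char ** */
    char *outptr = out;
    size_t inleft = inlen, outleft = outcap;

    size_t rc = iconv(cd, &inptr, &inleft, &outptr, &outleft);
    if (rc != (size_t) -1)               /* flush shift state, if any */
        rc = iconv(cd, NULL, NULL, &outptr, &outleft);

    int saved_errno = errno;             /* iconv_close may clobber it */
    iconv_close(cd);
    if (rc == (size_t) -1) {
        errno = saved_errno;
        return (size_t) -1;
    }
    return outcap - outleft;
}

/* Hypothetical check: returns 1 if IN survives FROM -> TO -> FROM byte
   for byte, 0 if the round trip changes it, and -1 if either direction
   fails (including results larger than the 256-byte buffers).  */
int round_trips(const char *from, const char *to,
                const char *in, size_t inlen)
{
    char mid[256], back[256];

    size_t n = convert_buffer(from, to, in, inlen, mid, sizeof mid);
    if (n == (size_t) -1)
        return -1;
    size_t m = convert_buffer(to, from, mid, n, back, sizeof back);
    if (m == (size_t) -1)
        return -1;
    return m == inlen && memcmp(back, in, inlen) == 0;
}
```

With a strict `iconv', a character that has no equivalent in the target
charset makes the conversion fail, so the sketch returns -1 rather than
0; a lossy result (0) mostly appears when approximations are requested,
for example through the `//TRANSLIT' suffix that glibc and GNU libiconv
accept as an extension.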

   Other irregularities might be seen in the area of error detection and
recovery.  The `recode' library usually tries to detect canonicity
errors in input, and production of ambiguous output, but the `iconv'
part of the library currently does not.  Input is always validated,
however.  The `recode' library may not always react properly when its
`iconv' part has no translation for a given character.
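
   At the `iconv(3)' level, a failed conversion is reported through
`errno', together with the point where the input stopped being
consumed: `EILSEQ' for an invalid input sequence, and also for a
character with no equivalent in the target charset (as glibc and GNU
libiconv behave; POSIX leaves that second case implementation-defined),
and `EINVAL' for an incomplete sequence at the end of the input.  The
minimal sketch below, again using the standard `iconv(3)' interface
rather than the `recode' API, returns both pieces of information; the
`conv_status' structure and the `convert_status' function are
hypothetical names.  Note that `EILSEQ' by itself does not distinguish
invalid input from a missing translation; a caller needing the
difference could, for instance, first decode the input alone.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical result type: where a conversion stopped, and why.  */
struct conv_status {
    size_t consumed;   /* input bytes converted before stopping         */
    int error;         /* 0 on success, else EILSEQ, EINVAL, E2BIG, ... */
};

/* Convert IN (INLEN bytes) from FROM to TO into OUT and report how far
   iconv(3) got.  Shift-state flushing is omitted because only the
   status matters here.  */
struct conv_status convert_status(const char *from, const char *to,
                                  const char *in, size_t inlen,
                                  char *out, size_t outcap)
{
    struct conv_status st = { 0, 0 };
    assert(in != NULL && out != NULL);

    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t) -1) {
        st.error = errno;          /* typically EINVAL: unknown charset */
        return st;
    }

    char *inptr = (char *) in, *outptr = out;
    size_t inleft = inlen, outleft = outcap;
    if (iconv(cd, &inptr, &inleft, &outptr, &outleft) == (size_t) -1)
        st.error = errno;          /* saved before iconv_close runs */
    st.consumed = inlen - inleft;  /* inptr stops at the offending byte */

    iconv_close(cd);
    return st;
}
```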

   Within a collection of names for a single charset, the `recode'
library distinguishes one of them as being the genuine charset name,
while the others are said to be aliases.  When `recode' lists all
charsets, for example with the `-l' or `--list' option, the list
integrates all `iconv' library charsets.  The selection of one of the
aliases as the genuine charset name is an artifact added by `recode';
it does not come from `iconv'.  Moreover, the `recode' library
dynamically resolves some conflicts when it initialises itself at
runtime.  This might explain some discrepancies in the table below as
to which name is the genuine charset name.
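
   The choice of a genuine name belongs to `recode'; the `iconv(3)'
interface does not expose one, and any accepted alias simply opens a
converter.  As a minimal sketch, assuming a standard `iconv(3)'
interface (not the `recode' API), the hypothetical helpers below test
whether a name is accepted, and whether two names encode the same text
to identical bytes, as aliases of one charset should.  The set of
accepted names varies between `iconv' implementations.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if the local iconv(3) accepts NAME as a source charset.  */
int charset_known(const char *name)
{
    iconv_t cd = iconv_open("UTF-8", name);
    if (cd == (iconv_t) -1)
        return 0;
    iconv_close(cd);
    return 1;
}

/* Hypothetical helper: encode the UTF-8 text IN into charset NAME.
   Returns the number of bytes written, or (size_t) -1 on failure.  */
size_t encode_as(const char *name, const char *in, size_t inlen,
                 char *out, size_t outcap)
{
    iconv_t cd = iconv_open(name, "UTF-8");
    if (cd == (iconv_t) -1)
        return (size_t) -1;

    char *inptr = (char *) in, *outptr = out;
    size_t inleft = inlen, outleft = outcap;
    size_t rc = iconv(cd, &inptr, &inleft, &outptr, &outleft);
    if (rc != (size_t) -1)            /* flush shift state, if any */
        rc = iconv(cd, NULL, NULL, &outptr, &outleft);
    iconv_close(cd);
    return rc == (size_t) -1 ? (size_t) -1 : outcap - outleft;
}

/* Returns 1 if names A and B encode IN to identical bytes, which is
   what one expects when both are aliases of a single charset.  */
int same_encoding(const char *a, const char *b,
                  const char *in, size_t inlen)
{
    char out_a[256], out_b[256];
    assert(in != NULL);

    size_t na = encode_as(a, in, inlen, out_a, sizeof out_a);
    size_t nb = encode_as(b, in, inlen, out_b, sizeof out_b);
    if (na == (size_t) -1 || nb == (size_t) -1)
        return 0;
    return na == nb && memcmp(out_a, out_b, na) == 0;
}
```

For example, `ASCII' and `US-ASCII' both appear in the `US-ASCII' alias
list of the table, so `same_encoding' is expected to return 1 for them
wherever both names are known.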

   * General character sets
    `US-ASCII'
          `ASCII', `ISO646-US', `ISO_646.IRV:1991', `ISO-IR-6',
          `ANSI_X3.4-1968', `CP367', `IBM367', `US', `csASCII' and
          `ISO646.1991-IRV' are aliases for this charset.

   * General multi-byte encodings
    `UTF-8'
          `UTF8' is an alias for this charset.

    `UCS-2'
          `ISO-10646-UCS-2' and `csUnicode' are aliases for this
          charset.

    `UCS-2BE'
          `UNICODEBIG', `UNICODE-1-1' and `csUnicode11' are aliases for
          this charset.

    `UCS-2LE'
          `UNICODELITTLE' is an alias for this charset.

    `UCS-4'
          `ISO-10646-UCS-4' and `csUCS4' are aliases for this charset.

    `UCS-4BE'

    `UCS-4LE'

    `UTF-16'

    `UTF-16BE'

    `UTF-16LE'

    `UTF-7'
          `UNICODE-1-1-UTF-7' and `csUnicode11UTF7' are aliases for
          this charset.

    `UCS-2-INTERNAL'

    `UCS-2-SWAPPED'

    `UCS-4-INTERNAL'

    `UCS-4-SWAPPED'

    `JAVA'

   * Standard 8-bit encodings
    `ISO-8859-1'
          `ISO_8859-1', `ISO_8859-1:1987', `ISO-IR-100', `CP819',
          `IBM819', `LATIN1', `L1', `csISOLatin1', `ISO8859-1' and
          `ISO8859_1' are aliases for this charset.

    `ISO-8859-2'
          `ISO_8859-2', `ISO_8859-2:1987', `ISO-IR-101', `LATIN2',
          `L2', `csISOLatin2', `ISO8859-2' and `ISO8859_2' are aliases
          for this charset.

    `ISO-8859-3'
          `ISO_8859-3', `ISO_8859-3:1988', `ISO-IR-109', `LATIN3',
          `L3', `csISOLatin3', `ISO8859-3' and `ISO8859_3' are aliases
          for this charset.

    `ISO-8859-4'
          `ISO_8859-4', `ISO_8859-4:1988', `ISO-IR-110', `LATIN4',
          `L4', `csISOLatin4', `ISO8859-4' and `ISO8859_4' are aliases
          for this charset.

    `ISO-8859-5'
          `ISO_8859-5', `ISO_8859-5:1988', `ISO-IR-144', `CYRILLIC',
          `csISOLatinCyrillic', `ISO8859-5' and `ISO8859_5' are aliases
          for this charset.

    `ISO-8859-6'
          `ISO_8859-6', `ISO_8859-6:1987', `ISO-IR-127', `ECMA-114',
          `ASMO-708', `ARABIC', `csISOLatinArabic', `ISO8859-6' and
          `ISO8859_6' are aliases for this charset.

    `ISO-8859-7'
          `ISO_8859-7', `ISO_8859-7:1987', `ISO-IR-126', `ECMA-118',
          `ELOT_928', `GREEK8', `GREEK', `csISOLatinGreek', `ISO8859-7'
          and `ISO8859_7' are aliases for this charset.

    `ISO-8859-8'
          `ISO_8859-8', `ISO_8859-8:1988', `ISO-IR-138', `HEBREW',
          `csISOLatinHebrew', `ISO8859-8' and `ISO8859_8' are aliases
          for this charset.

    `ISO-8859-9'
          `ISO_8859-9', `ISO_8859-9:1989', `ISO-IR-148', `LATIN5',
          `L5', `csISOLatin5', `ISO8859-9' and `ISO8859_9' are aliases
          for this charset.

    `ISO-8859-10'
          `ISO_8859-10', `ISO_8859-10:1992', `ISO-IR-157', `LATIN6',
          `L6', `csISOLatin6' and `ISO8859-10' are aliases for this
          charset.

    `ISO-8859-13'
          `ISO_8859-13', `ISO-IR-179', `LATIN7' and `L7' are aliases
          for this charset.

    `ISO-8859-14'
          `ISO_8859-14', `ISO_8859-14:1998', `ISO-IR-199', `LATIN8' and
          `L8' are aliases for this charset.

    `ISO-8859-15'
          `ISO_8859-15', `ISO_8859-15:1998' and `ISO-IR-203' are
          aliases for this charset.

    `ISO-8859-16'
          `ISO_8859-16', `ISO_8859-16:2000' and `ISO-IR-226' are
          aliases for this charset.

    `KOI8-R'
          `csKOI8R' is an alias for this charset.

    `KOI8-U'

    `KOI8-RU'

   * Windows 8-bit encodings
    `CP1250'
          `WINDOWS-1250' and `MS-EE' are aliases for this charset.

    `CP1251'
          `WINDOWS-1251' and `MS-CYRL' are aliases for this charset.

    `CP1252'
          `WINDOWS-1252' and `MS-ANSI' are aliases for this charset.

    `CP1253'
          `WINDOWS-1253' and `MS-GREEK' are aliases for this charset.

    `CP1254'
          `WINDOWS-1254' and `MS-TURK' are aliases for this charset.

    `CP1255'
          `WINDOWS-1255' and `MS-HEBR' are aliases for this charset.

    `CP1256'
          `WINDOWS-1256' and `MS-ARAB' are aliases for this charset.

    `CP1257'
          `WINDOWS-1257' and `WINBALTRIM' are aliases for this charset.

    `CP1258'
          `WINDOWS-1258' is an alias for this charset.

   * DOS 8-bit encodings
    `CP850'
          `IBM850', `850' and `csPC850Multilingual' are aliases for
          this charset.

    `CP866'
          `IBM866', `866' and `csIBM866' are aliases for this charset.

   * Macintosh 8-bit encodings
    `MacRoman'
          `Macintosh', `MAC' and `csMacintosh' are aliases for this
          charset.

    `MacCentralEurope'

    `MacIceland'

    `MacCroatian'

    `MacRomania'

    `MacCyrillic'

    `MacUkraine'

    `MacGreek'

    `MacTurkish'

    `MacHebrew'

    `MacArabic'

    `MacThai'

   * Other platform-specific 8-bit encodings
    `HP-ROMAN8'
          `ROMAN8', `R8' and `csHPRoman8' are aliases for this charset.

    `NEXTSTEP'

   * Regional 8-bit encodings used for a single language
    `ARMSCII-8'

    `Georgian-Academy'

    `Georgian-PS'

    `MuleLao-1'

    `CP1133'
          `IBM-CP1133' is an alias for this charset.

    `TIS-620'
          `TIS620', `TIS620-0', `TIS620.2529-1', `TIS620.2533-0',
          `TIS620.2533-1' and `ISO-IR-166' are aliases for this charset.

    `CP874'
          `WINDOWS-874' is an alias for this charset.

    `VISCII'
          `VISCII1.1-1' and `csVISCII' are aliases for this charset.

    `TCVN'
          `TCVN-5712', `TCVN5712-1' and `TCVN5712-1:1993' are aliases
          for this charset.

   * CJK character sets (not documented)
    `JIS_C6220-1969-RO'
          `ISO646-JP', `ISO-IR-14', `JP' and `csISO14JISC6220ro' are
          aliases for this charset.

    `JIS_X0201'
          `JISX0201-1976', `X0201', `csHalfWidthKatakana',
          `JISX0201.1976-0' and `JIS0201' are aliases for this charset.

    `JIS_X0208'
          `JIS_X0208-1983', `JIS_X0208-1990', `JIS0208', `X0208',
          `ISO-IR-87', `csISO87JISX0208', `JISX0208.1983-0' and
          `JISX0208.1990-0' are aliases for this charset.

    `JIS_X0212'
          `JIS_X0212.1990-0', `JIS_X0212-1990', `X0212', `ISO-IR-159',
          `csISO159JISX02121990', `JISX0212.1990-0' and `JIS0212' are
          aliases for this charset.

    `GB_1988-80'
          `ISO646-CN', `ISO-IR-57', `CN' and `csISO57GB1988' are
          aliases for this charset.

    `GB_2312-80'
          `ISO-IR-58', `csISO58GB231280', `CHINESE' and `GB2312.1980-0'
          are aliases for this charset.

    `ISO-IR-165'
          `CN-GB-ISOIR165' is an alias for this charset.

    `KSC_5601'
          `KS_C_5601-1987', `KS_C_5601-1989', `ISO-IR-149',
          `csKSC56011987', `KOREAN', `KSC5601.1987-0' and
          `KSX1001:1992' are aliases for this charset.

   * CJK encodings
    `EUC-JP'
          `EUCJP', `Extended_UNIX_Code_Packed_Format_for_Japanese',
          `csEUCPkdFmtJapanese' and `EUC_JP' are aliases for this
          charset.

    `SJIS'
          `SHIFT_JIS', `SHIFT-JIS', `MS_KANJI' and `csShiftJIS' are
          aliases for this charset.

    `CP932'

    `ISO-2022-JP'
          `csISO2022JP' and `ISO2022JP' are aliases for this charset.

    `ISO-2022-JP-1'

    `ISO-2022-JP-2'
          `csISO2022JP2' is an alias for this charset.

    `EUC-CN'
          `EUCCN', `GB2312', `CN-GB', `csGB2312' and `EUC_CN' are
          aliases for this charset.

    `GBK'
          `CP936' is an alias for this charset.

    `GB18030'

    `ISO-2022-CN'
          `csISO2022CN' and `ISO2022CN' are aliases for this charset.

    `ISO-2022-CN-EXT'

    `HZ'
          `HZ-GB-2312' is an alias for this charset.

    `EUC-TW'
          `EUCTW', `csEUCTW' and `EUC_TW' are aliases for this charset.

    `BIG5'
          `BIG-5', `BIG-FIVE', `BIGFIVE', `CN-BIG5' and `csBig5' are
          aliases for this charset.

    `CP950'

    `BIG5HKSCS'

    `EUC-KR'
          `EUCKR', `csEUCKR' and `EUC_KR' are aliases for this charset.

    `CP949'
          `UHC' is an alias for this charset.

    `JOHAB'
          `CP1361' is an alias for this charset.

    `ISO-2022-KR'
          `csISO2022KR' and `ISO2022KR' are aliases for this charset.

    `CHAR'

    `WCHAR_T'

