(recode.info)Errors



Handling errors
===============

   The `recode' program, while using the `recode' library, needs to
control whether recoding problems are reported or not, and then reflect
these in the exit status.  The program should also instruct the library
whether the recoding should be abruptly interrupted when an error is
met (sparing processing when it is known in advance that a wrong
result would be discarded anyway), or whether it should proceed
nevertheless.  The library groups errors into the following levels,
listed in order of increasing severity.

`RECODE_NO_ERROR'
     No error was met on previous library calls.

`RECODE_NOT_CANONICAL'
     The input text was using one of the many alternative codings for
     some phenomenon, but not the one `recode' would have canonically
     generated.  So, if the reverse recoding is later attempted, it
     would produce a text having the same _meaning_ as the original
     text, yet not being byte identical.

     For example, a `Base64' block in which end-of-lines appear
     elsewhere than at every 76 characters is not canonical.  An
     e-circumflex in TeX which is coded as `\^{e}' instead of `\^e' is
     not canonical.

`RECODE_AMBIGUOUS_OUTPUT'
     It has been discovered that if the reverse recoding was attempted
     on the text output by this recoding, we would not obtain the
     original text, only because an ambiguity was generated by accident
     in the output text.  This ambiguity would then cause the wrong
     interpretation to be taken.

     Here are a few examples.  If the `Latin-1' sequence `e^' is
     converted to Easy French and back, the result will be interpreted
     as e-circumflex and so, will not reflect the intent of the
     original two characters.  Recoding an `IBM-PC' text to `Latin-1'
     and back, where the input text contained an isolated `LF', will
     have a spurious `CR' inserted before the `LF'.

     Currently, there are many cases in the library where the
     production of ambiguous output is not properly detected, as this
     detection is sometimes difficult to achieve, or to achieve
     speedily.

`RECODE_UNTRANSLATABLE'
     One or more input characters could not be recoded, because there
     is just no representation for them in the output charset.

     Here are a few examples.  Non-strict mode often allows `recode' to
     compute on-the-fly mappings for unrepresentable characters, but
     strict mode prohibits such attribution of reversible translations,
     so strict mode might often trigger this error.  Most `UCS-2'
     codes used to represent Asian characters cannot be expressed in
     various Latin charsets.

`RECODE_INVALID_INPUT'
     The input text does not comply with the coding it is declared to
     hold.  So, there is no way by which a reverse recoding would
     reproduce this text, because `recode' should never produce invalid
     output.

     Here are a few examples.  In strict mode, `ASCII' text is not
     allowed to contain characters with the eighth bit set.  `UTF-8'
     encodings ought to be minimal(1).

`RECODE_SYSTEM_ERROR'
     The underlying system reported an error while the recoding was
     going on, likely an input/output error.  (This error symbol is
     currently unused in the library.)

`RECODE_USER_ERROR'
     The programmer or user requested something the recoding library is
     unable to provide, or used the API wrongly.  (This error symbol is
     currently unused in the library.)

`RECODE_INTERNAL_ERROR'
     Something really wrong, which should normally never happen, was
     detected within the recoding library.  This might be due to
     genuine bugs in the library, or maybe due to un-initialised or
     overwritten arguments to the API.  (This error symbol is currently
     unused in the library.)

`RECODE_MAXIMUM_ERROR'
     This error code should never be returned; it is only used
     internally as a sentinel for the list of all possible error codes.
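
   Because the levels above are listed in order of increasing
severity, a C enumeration can represent them so that comparing two
levels is a plain integer comparison, and so that the sentinel sizes
any per-level table.  The following is only a minimal sketch under
that assumption: the `sketch_' and `SKETCH_' names are hypothetical
stand-ins, the two helper functions are not part of the library, and a
real program would use the `RECODE_' symbols from the library header
instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the library's level symbols, declared in
   the order this section lists them (increasing severity).  */
enum sketch_error
{
  SKETCH_NO_ERROR,
  SKETCH_NOT_CANONICAL,
  SKETCH_AMBIGUOUS_OUTPUT,
  SKETCH_UNTRANSLATABLE,
  SKETCH_INVALID_INPUT,
  SKETCH_SYSTEM_ERROR,
  SKETCH_USER_ERROR,
  SKETCH_INTERNAL_ERROR,
  SKETCH_MAXIMUM_ERROR          /* sentinel, never a real result */
};

/* One printable name per real level; the sentinel sizes the table.  */
static const char *const sketch_error_name[SKETCH_MAXIMUM_ERROR] =
{
  "no error", "not canonical", "ambiguous output", "untranslatable",
  "invalid input", "system error", "user error", "internal error"
};

/* Return a printable name, or NULL for the sentinel and beyond.  */
const char *
sketch_error_to_string (enum sketch_error level)
{
  if ((int) level < 0 || level >= SKETCH_MAXIMUM_ERROR)
    return NULL;
  return sketch_error_name[level];
}

/* Since levels are ordered, "more severe" is a plain comparison.  */
int
sketch_more_severe (enum sketch_error a, enum sketch_error b)
{
  assert (a < SKETCH_MAXIMUM_ERROR && b < SKETCH_MAXIMUM_ERROR);
  return a > b;
}
```

   With these definitions, `sketch_more_severe (SKETCH_INVALID_INPUT,
SKETCH_UNTRANSLATABLE)' is true, and asking for the name of the
sentinel yields `NULL'.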

   One should be able to set the error level threshold for returning
failure at end of recoding, and also the threshold for immediate
interruption.  If several errors occur while the recoding proceeds,
none of them severe enough to interrupt it, then the most severe error
is retained, while the others are forgotten(2).  So, in case of an
error, the possible actions currently are:

   * do nothing and let go, returning success at end of recoding,

   * just let go for now, but return failure at end of recoding,

   * interrupt recoding right away and return failure now.
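
   The three actions above can be pictured as two comparisons against
the most severe error met so far.  The following is a minimal sketch,
not the library's implementation: the field names `fail_level',
`abort_level' and `error_so_far' come from the task description, but
the `sketch_' types and functions are hypothetical, and the assumption
that a threshold is reached when the retained error is at or above it
is mine.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the library's level symbols.  */
enum sketch_error
{
  SKETCH_NO_ERROR, SKETCH_NOT_CANONICAL, SKETCH_AMBIGUOUS_OUTPUT,
  SKETCH_UNTRANSLATABLE, SKETCH_INVALID_INPUT, SKETCH_SYSTEM_ERROR,
  SKETCH_USER_ERROR, SKETCH_INTERNAL_ERROR, SKETCH_MAXIMUM_ERROR
};

/* Hypothetical task record holding the two thresholds.  */
struct sketch_task
{
  enum sketch_error fail_level;    /* at or above: failure at end */
  enum sketch_error abort_level;   /* at or above: interrupt now */
  enum sketch_error error_so_far;  /* most severe error met so far */
};

/* Record NEW_ERROR in TASK, retaining only the most severe error.
   Return true when the recoding should be interrupted right away.  */
bool
sketch_note_error (struct sketch_task *task, enum sketch_error new_error)
{
  assert (new_error < SKETCH_MAXIMUM_ERROR);
  if (new_error > task->error_so_far)
    task->error_so_far = new_error;
  return task->error_so_far >= task->abort_level;
}

/* At end of recoding: true for success, false for failure.  */
bool
sketch_task_succeeded (const struct sketch_task *task)
{
  return task->error_so_far < task->fail_level;
}
```

   With `fail_level' set to `SKETCH_INVALID_INPUT' and `abort_level'
set to `SKETCH_SYSTEM_ERROR', a non-canonical input is let go and the
task still succeeds, an invalid input lets the recoding proceed but
makes it fail at the end, and a system error interrupts it right away.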

See Task level, and particularly the description of the fields
`fail_level', `abort_level' and `error_so_far', for more information
about how errors are handled.

   ---------- Footnotes ----------

   (1) The minimality of an `UTF-8' encoding is guaranteed on output,
but currently, it is not checked on input.
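
   A check for minimality on input could look like the following
sketch.  It is not code from the library: the function name is
hypothetical, and it assumes sequences of at most four bytes.  It
rejects any sequence whose decoded value would have fit in a shorter
sequence, and it deliberately does not check surrogates or the upper
bound of the code space.

```c
#include <assert.h>
#include <stddef.h>

/* Return the length (1 to 4) of the minimal `UTF-8' sequence starting
   at S, with N bytes available, or 0 if the sequence is truncated,
   malformed, or longer than its value requires.  */
size_t
sketch_utf8_minimal_length (const unsigned char *s, size_t n)
{
  /* Smallest value that genuinely needs a sequence of each length.  */
  static const unsigned long min_value[5] = { 0, 0, 0x80, 0x800, 0x10000 };
  unsigned long value;
  size_t len, i;

  assert (s != NULL);
  if (n == 0)
    return 0;
  if (s[0] < 0x80)
    return 1;                           /* plain ASCII byte */
  else if ((s[0] & 0xE0) == 0xC0)
    len = 2, value = s[0] & 0x1F;
  else if ((s[0] & 0xF0) == 0xE0)
    len = 3, value = s[0] & 0x0F;
  else if ((s[0] & 0xF8) == 0xF0)
    len = 4, value = s[0] & 0x07;
  else
    return 0;                           /* not a valid lead byte */

  if (n < len)
    return 0;                           /* truncated sequence */
  for (i = 1; i < len; i++)
    {
      if ((s[i] & 0xC0) != 0x80)
        return 0;                       /* not a continuation byte */
      value = (value << 6) | (s[i] & 0x3F);
    }
  if (value < min_value[len])
    return 0;                           /* non-minimal encoding */
  return len;
}
```

   For instance, the two bytes `C3 A9' are accepted as a minimal
sequence of length two, while `C0 AF', which spells an `ASCII' slash
with one byte too many, is rejected.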

   (2) Another approach would have been to define the level symbols as
masks instead, to give masks to threshold setting routines, and to
retain all errors.  Yet I have never met such a need in practice, and
so I fear it would be overkill.  On the other hand, it might be
interesting to maintain counters about how many times each kind of
error occurred.
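
   Such counters could be kept as one slot per level, as in the
following sketch.  Nothing of the sort exists in the library as
described here: all `sketch_' names are hypothetical, and the level
symbols are local stand-ins for the `RECODE_' ones.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the library's level symbols.  */
enum sketch_error
{
  SKETCH_NO_ERROR, SKETCH_NOT_CANONICAL, SKETCH_AMBIGUOUS_OUTPUT,
  SKETCH_UNTRANSLATABLE, SKETCH_INVALID_INPUT, SKETCH_SYSTEM_ERROR,
  SKETCH_USER_ERROR, SKETCH_INTERNAL_ERROR, SKETCH_MAXIMUM_ERROR
};

/* One counter per level; the sentinel gives the number of slots.  */
struct sketch_counts
{
  unsigned long count[SKETCH_MAXIMUM_ERROR];
};

/* Set every counter to zero.  */
void
sketch_counts_reset (struct sketch_counts *counts)
{
  memset (counts, 0, sizeof *counts);
}

/* Count one more occurrence of LEVEL.  */
void
sketch_counts_note (struct sketch_counts *counts, enum sketch_error level)
{
  assert (level < SKETCH_MAXIMUM_ERROR);
  counts->count[level]++;
}

/* Return the most severe level counted at least once, which is what
   retaining only the worst error would have kept.  */
enum sketch_error
sketch_counts_worst (const struct sketch_counts *counts)
{
  int level;

  for (level = SKETCH_MAXIMUM_ERROR - 1; level > SKETCH_NO_ERROR; level--)
    if (counts->count[level] > 0)
      return (enum sketch_error) level;
  return SKETCH_NO_ERROR;
}
```

   The most severe error remains recoverable from the counters, so
this approach would lose nothing compared to retaining a single level.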

