Handling errors
===============
The `recode' program, while using the `recode' library, needs to
control whether recoding problems are reported or not, and then reflect
these in the exit status. The program should also instruct the library
whether the recoding should be abruptly interrupted when an error is
met (so sparing processing when it is known in advance that a wrong
result would be discarded anyway), or if it should proceed nevertheless.
Here is how the library groups errors into levels, listed here in order
of increasing severity.
`RECODE_NO_ERROR'
No error was met on previous library calls.
`RECODE_NOT_CANONICAL'
The input text was using one of the many alternative codings for
some phenomenon, but not the one `recode' would have canonically
generated. So, if the reverse recoding is later attempted, it
would produce a text having the same _meaning_ as the original
text, yet not being byte identical.
For example, a `Base64' block in which end-of-lines appear
elsewhere than at every 76 characters is not canonical. An
e-circumflex in TeX which is coded as `\^{e}' instead of `\^e' is
not canonical.
`RECODE_AMBIGUOUS_OUTPUT'
It has been discovered that if the reverse recoding were attempted
on the text output by this recoding, we would not obtain the
original text, only because an ambiguity was generated by accident
in the output text. This ambiguity would then cause the wrong
interpretation to be taken.
Here are a few examples. If the `Latin-1' sequence `e^' is
converted to Easy French and back, the result will be interpreted
as e-circumflex and so, will not reflect the intent of the
original two characters. Recoding an `IBM-PC' text to `Latin-1'
and back, where the input text contained an isolated `LF', will
have a spurious `CR' inserted before the `LF'.
Currently, there are many cases in the library where the
production of ambiguous output is not properly detected, as this
detection is sometimes difficult to accomplish, or difficult to
do speedily.
`RECODE_UNTRANSLATABLE'
One or more input characters could not be recoded, because there is
just no representation for this character in the output charset.
Here are a few examples. Non-strict mode often allows `recode' to
compute on-the-fly mappings for unrepresentable characters, but
strict mode prohibits such attribution of reversible translations:
so strict mode might often trigger such an error. Most `UCS-2'
codes used to represent Asian characters cannot be expressed in
various Latin charsets.
`RECODE_INVALID_INPUT'
The input text does not comply with the coding it is declared to
hold. So, there is no way by which a reverse recoding would
reproduce this text, because `recode' should never produce invalid
output.
Here are a few examples. In strict mode, `ASCII' text is not
allowed to contain characters with the eighth bit set. `UTF-8'
encodings ought to be minimal(1).
`RECODE_SYSTEM_ERROR'
The underlying system reported an error while the recoding was
going on, likely an input/output error. (This error symbol is
currently unused in the library.)
`RECODE_USER_ERROR'
The programmer or user requested something the recoding library is
unable to provide, or used the API wrongly. (This error symbol is
currently unused in the library.)
`RECODE_INTERNAL_ERROR'
Something really wrong, which should normally never happen, was
detected within the recoding library. This might be due to
genuine bugs in the library, or maybe due to un-initialised or
overwritten arguments to the API. (This error symbol is currently
unused in the library.)
`RECODE_MAXIMUM_ERROR'
This error code should never be returned; it is only internally
used as a sentinel for the list of all possible error codes.
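Since the levels are listed in order of increasing severity, they can
be compared as plain integers. The following is a minimal sketch of
that idea, not the library's own declaration: the symbol names come
from the list above, but the implicit 0, 1, 2, ... numbering and the
helpers `error_level_name' and `worse_of' are assumptions made for
illustration only.

```c
#include <stddef.h>

/* Error levels in the order listed above, least severe first.
   Relying on the implicit 0, 1, 2, ... numbering is an assumption.  */
enum recode_error
{
  RECODE_NO_ERROR,
  RECODE_NOT_CANONICAL,
  RECODE_AMBIGUOUS_OUTPUT,
  RECODE_UNTRANSLATABLE,
  RECODE_INVALID_INPUT,
  RECODE_SYSTEM_ERROR,
  RECODE_USER_ERROR,
  RECODE_INTERNAL_ERROR,
  RECODE_MAXIMUM_ERROR          /* sentinel, never a real error */
};

/* Hypothetical helper: a printable name for LEVEL, or NULL when LEVEL
   is the sentinel or otherwise out of range.  */
static const char *
error_level_name (enum recode_error level)
{
  static const char *const names[RECODE_MAXIMUM_ERROR] = {
    "no error",
    "not canonical",
    "ambiguous output",
    "untranslatable",
    "invalid input",
    "system error",
    "user error",
    "internal error"
  };

  if ((int) level < 0 || (int) level >= (int) RECODE_MAXIMUM_ERROR)
    return NULL;
  return names[level];
}

/* Hypothetical helper: the more severe of two levels.  */
static enum recode_error
worse_of (enum recode_error a, enum recode_error b)
{
  return a > b ? a : b;
}
```

The sentinel doubles as the number of real levels, which is why it can
size the `names' array.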
One should be able to set the error level threshold for returning
failure at end of recoding, and also the threshold for immediate
interruption. If many errors occur while the recoding proceeds, which
are not severe enough to interrupt the recoding, then the most severe
error is retained, while others are forgotten(2). So, in case of an
error, the possible actions currently are:
* do nothing and let go, returning success at end of recoding,
* just let go for now, but return failure at end of recoding,
* interrupt recoding right away and return failure now.
See Task level, and particularly the description of the fields
`fail_level', `abort_level' and `error_so_far', for more information
about how errors are handled.
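The two thresholds and the "most severe error is retained" rule can be
sketched as follows. This is only an illustration under stated
assumptions: `struct sketch_task' and both functions are hypothetical
stand-ins, the field names are borrowed from the paragraph above, and
treating each threshold as inclusive (an error at the threshold level
already triggers it) is an assumption, not something this section
states.

```c
#include <stdbool.h>

/* Error levels in the order listed in this section.  */
enum recode_error
{
  RECODE_NO_ERROR,
  RECODE_NOT_CANONICAL,
  RECODE_AMBIGUOUS_OUTPUT,
  RECODE_UNTRANSLATABLE,
  RECODE_INVALID_INPUT,
  RECODE_SYSTEM_ERROR,
  RECODE_USER_ERROR,
  RECODE_INTERNAL_ERROR,
  RECODE_MAXIMUM_ERROR
};

/* Hypothetical stand-in for the three task fields named above.  */
struct sketch_task
{
  enum recode_error fail_level;    /* failure is returned from this level up */
  enum recode_error abort_level;   /* recoding is interrupted from this level up */
  enum recode_error error_so_far;  /* most severe error met so far */
};

/* Record ERROR in TASK, keeping only the most severe error met.
   Return true when the recoding should be interrupted right away.  */
static bool
sketch_note_error (struct sketch_task *task, enum recode_error error)
{
  if (error > task->error_so_far)
    task->error_so_far = error;
  return error >= task->abort_level;
}

/* Return true when TASK, finished or interrupted, counts as a success.  */
static bool
sketch_task_succeeded (const struct sketch_task *task)
{
  return task->error_so_far < task->fail_level;
}
```

With `fail_level' set to `RECODE_UNTRANSLATABLE' and `abort_level' set
to `RECODE_INVALID_INPUT', a `RECODE_NOT_CANONICAL' error is let go, a
`RECODE_UNTRANSLATABLE' error lets the recoding continue but makes it
fail at the end, and a `RECODE_INVALID_INPUT' error interrupts it at
once.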
---------- Footnotes ----------
(1) The minimality of a `UTF-8' encoding is guaranteed on output,
but currently, it is not checked on input.
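To make "minimal" concrete: a `UTF-8' sequence is minimal when its
code point could not have been written with fewer bytes. The check
below is a sketch only; `utf8_sequence_is_minimal' is a hypothetical
helper, not a function of the `recode' library, and it assumes
sequences of at most four bytes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of the recode API.  Decode the single
   UTF-8 sequence starting at S, with N bytes available, and return
   true when it uses the fewest bytes possible for its code point.
   Truncated or malformed sequences also yield false.  */
static bool
utf8_sequence_is_minimal (const unsigned char *s, size_t n)
{
  /* Smallest code point that needs a sequence of the given length.  */
  static const unsigned long minimum[5] = { 0, 0, 0x80, 0x800, 0x10000 };
  unsigned long code;
  size_t length;
  size_t i;

  if (n == 0)
    return false;

  if (s[0] < 0x80)
    return true;                /* one byte is always minimal */
  else if ((s[0] & 0xE0) == 0xC0)
    {
      length = 2;
      code = s[0] & 0x1F;
    }
  else if ((s[0] & 0xF0) == 0xE0)
    {
      length = 3;
      code = s[0] & 0x0F;
    }
  else if ((s[0] & 0xF8) == 0xF0)
    {
      length = 4;
      code = s[0] & 0x07;
    }
  else
    return false;               /* stray continuation or invalid lead byte */

  if (n < length)
    return false;

  for (i = 1; i < length; i++)
    {
      if ((s[i] & 0xC0) != 0x80)
        return false;           /* not a continuation byte */
      code = (code << 6) | (s[i] & 0x3F);
    }

  return code >= minimum[length];
}
```

For instance, the two bytes 0xC3 0xAA (e-circumflex) pass this check,
while 0xC0 0xAF, a two-byte spelling of the `ASCII' slash, does not.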
(2) Another approach would have been to define the level symbols as
masks instead, and to give masks to threshold setting routines, and to
retain all errors--yet I have never met such a need in practice, and
so I fear it would be overkill. On the other hand, it might be
interesting to maintain counters about how many times each kind of
error occurred.