Reversibility issues
====================
The following options are somewhat related to reversibility issues:
`-f'
`--force'
With this option, irreversible or otherwise erroneous recodings
are run to completion, and `recode' does not exit with a non-zero
status merely because some recoding was irreversible. See
Reversibility.
Without this option, `recode' tries to protect you against recoding
a file irreversibly over itself(1). Whenever an irreversible
recoding or any other recoding error is met, `recode' produces a
warning on standard error. The current input file does not get
replaced by its recoded version, and `recode' then proceeds with
the recoding of the next file.
When the program is merely used as a filter, standard output will
have received a partially recoded copy of standard input, up to
the first error point. After all recodings have been done or
attempted, and if some recoding has been aborted, `recode' exits
with a non-zero status.
In releases of `recode' prior to version 3.5, this option was
always selected, so it was rather meaningless. Nevertheless,
users were invited to start using `-f' right away in scripts
calling `recode' whenever convenient, in preparation for the
current behaviour.
`-q'
`--quiet'
`--silent'
This option has the sole purpose of inhibiting warning messages
about irreversible recodings, and other such diagnostics. It has
no other effect; in particular, it does _not_ prevent recodings
from being aborted, nor `recode' from returning a non-zero exit
status when irreversible recodings are met.
This option is set automatically for the child processes when
`recode' splits itself into several collaborating copies, so that
each diagnostic is issued only once, by the parent. See option `-p'.
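The interplay of `-f' and `-q' when recoding files in place can be
summarised by a small Python model. This is a hypothetical sketch of
the policy described above, not code taken from `recode': the names
`run_in_place', `recode_one', `Outcome' and `toy_recode' are invented
for illustration, only irreversibility errors are modelled, and the
model assumes that no warning is issued once `-f' is given.

```python
from dataclasses import dataclass, field


@dataclass
class Outcome:
    """Hypothetical summary of one `recode' run over several files."""
    contents: dict                      # file name -> contents after the run
    warnings: list = field(default_factory=list)
    exit_status: int = 0


def run_in_place(files, recode_one, force=False, quiet=False):
    """Model of the -f / -q policy described in the text (not recode's code).

    files      -- dict mapping a file name to its original contents
    recode_one -- hypothetical callable returning (recoded_text, reversible)
    """
    outcome = Outcome(contents=dict(files))
    for name, text in files.items():
        recoded, reversible = recode_one(text)
        if reversible or force:
            # Clean recoding, or -f given: the file is replaced.
            outcome.contents[name] = recoded
            continue
        # No -f: keep the original file, warn unless -q, go on with the next file.
        if not quiet:
            outcome.warnings.append(f"{name}: irreversible recoding, file not replaced")
        outcome.exit_status = 1  # -q silences the warning, never the status
    return outcome


# Usage with a hypothetical recoder in which '?' marks an unmappable character.
def toy_recode(text):
    return text.replace("?", ""), "?" not in text


files = {"a.txt": "fine", "b.txt": "lossy?"}
default = run_in_place(files, toy_recode)
quiet = run_in_place(files, toy_recode, quiet=True)
forced = run_in_place(files, toy_recode, force=True)
```

Keeping the warning list separate from the exit status makes the point
of `-q' explicit: it changes what is reported, not what happens.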
`-s'
`--strict'
By using this option, the user requests that `recode' be very
strict while recoding a file, simply dropping any character which
is not explicitly mapped from one charset to the other. Such a loss
is not reversible and so will cause `recode' to fail, unless the
option `-f' is also given as a kind of counter-measure.
Using `-s' without `-f' might render the `recode' program very
susceptible to the slightest file abnormalities. Although it might
irritate some users, such paranoia is sometimes wanted and useful.
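The effect of `-s' can be pictured with a character-level sketch. Here
`mapping' is a hypothetical dictionary standing for the explicit
correspondences between two charsets, and `strict_recode' is an
invented name. Raising an exception stands in for `recode' reporting a
failure; this is a simplification, since a real filter run still
writes its partial output.

```python
def strict_recode(text, mapping, force=False):
    """Sketch of -s: characters without an explicit mapping are dropped.

    mapping -- hypothetical dict from source characters to target characters.
    Returns (output, lost).  Raises unless force (-f) is given when
    something was lost.
    """
    out, lost = [], []
    for ch in text:
        if ch in mapping:
            out.append(mapping[ch])
        else:
            lost.append(ch)  # dropped: this loss cannot be undone
    if lost and not force:
        raise ValueError(f"irreversible recoding: {len(lost)} character(s) dropped")
    return "".join(out), lost


# Usage: 'c' has no explicit mapping, so it is lost under -s.
m = {"a": "A", "b": "B"}
clean = strict_recode("ab", m)
forced = strict_recode("abc", m, force=True)
```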
Even though `recode' tries hard to keep the recodings reversible, you
should not develop an unconditional confidence in its ability to do so.
You _ought_ to keep only reasonable expectations about reverse
recodings. In particular, consider:
* Most transformations are fully reversible for all inputs, but lose
this property whenever `-s' is specified.
* A few transformations are not meant to be reversible, by design.
* Reversibility sometimes depends on actual file contents and cannot
be ascertained beforehand, without reading the file.
* Reversibility is never absolute across successive versions of this
program. Even correcting a small bug in a mapping could induce
slight discrepancies later.
* Reversibility is easily lost by merging. This is best explained
through an example. If you reversibly recode a file from charset
A to charset B, then you reversibly recode the result from charset
B to charset C, you cannot expect to recover the original file by
merely recoding from charset C directly to charset A. You will
instead have to recode from charset C back to charset B, and only
then from charset B to charset A.
* Faulty files create a particular problem. Consider an example,
recoding from `IBM-PC' to `Latin-1'. Line ends are represented
as `\r\n' in `IBM-PC' and as `\n' in `Latin-1'. There is no way
by which a faulty `IBM-PC' file containing a `\n' not preceded by
`\r' can be translated into a `Latin-1' file, and then back unchanged.
* There is another difficulty arising from code equivalences. For
example, in a `LaTeX' charset file, the string `\^\i{}' could be
recoded back and forth through another charset and become
`\^{\i}'. Even if the resulting file is equivalent to the
original one, it is not identical.
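The end-of-line case in the list above can be reproduced in a few
lines. This sketch looks only at line ends and ignores every other
part of the `IBM-PC' to `Latin-1' conversion; the two function names
are hypothetical.

```python
def ibmpc_to_latin1_eol(data: bytes) -> bytes:
    # Hypothetical end-of-line step: each CR LF pair becomes a single LF.
    return data.replace(b"\r\n", b"\n")


def latin1_to_ibmpc_eol(data: bytes) -> bytes:
    # Reverse step: every LF becomes CR LF.
    return data.replace(b"\n", b"\r\n")


good = b"one\r\ntwo\r\n"
faulty = b"one\ntwo\r\n"  # first LF is not preceded by CR

# The bare LF and a CR LF pair both become LF, so the round trip
# cannot tell them apart and rewrites the bare LF as CR LF.
good_back = latin1_to_ibmpc_eol(ibmpc_to_latin1_eol(good))
faulty_back = latin1_to_ibmpc_eol(ibmpc_to_latin1_eol(faulty))
```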
Unless option `-s' is used, `recode' automatically tries to fill
mappings with invented correspondences, often making them fully
reversible. This filling is not made at random. The algorithm tries to
stick to the identity mapping and, when this is not possible, it prefers
generating many small permutation cycles, each involving only a few
codes.
For example, here is how `IBM-PC' code 186 gets translated to
`control-U' in `Latin-1'. `Control-U' is 21. Code 21 is the `IBM-PC'
section sign, which is 167 in `Latin-1'. `recode' cannot map
`IBM-PC' 167 back to 21, because 167 is the `IBM-PC' masculine
ordinal indicator, which is 186 in `Latin-1'. Code 186 within
`IBM-PC' has no `Latin-1' equivalent; by assigning it to 21, `recode'
closes this short permutation loop: 21 goes to 167, 167 to 186, and
186 back to 21.
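Map filling can be sketched as completing a partial one-to-one mapping
into a permutation. The technique below is an assumption, not taken
from the `recode' sources: a code that is neither mapped nor used as a
target keeps the identity mapping, and every chain of mapped codes is
closed into a cycle by sending its last code back to its first.
Applied to just the three codes of the example above (all other codes
being treated, hypothetically, as unmapped), it rebuilds the same
cycle of length 3. The name `fill_to_permutation' is invented.

```python
def fill_to_permutation(partial, size=256):
    """Complete a partial one-to-one code mapping into a permutation of range(size).

    Assumed technique, not taken from recode's sources: a code that is neither
    mapped nor used as a target maps to itself; any other chain of mapped codes,
    start -> ... -> end, is closed into a cycle by mapping end back to start.
    """
    if len(set(partial.values())) != len(partial):
        raise ValueError("partial mapping must be one-to-one")
    full = dict(partial)
    targets = set(partial.values())
    for start in range(size):
        if start in targets:
            continue  # some other code already maps here: not a chain start
        end = start
        while end in full:  # walk to the end of the chain (a code with no mapping yet)
            end = full[end]
        full[end] = start  # close the cycle (a lone code maps to itself)
    return full


# Only the three codes of the example: IBM-PC 21 (section sign) is Latin-1 167,
# IBM-PC 167 (masculine ordinal) is Latin-1 186, IBM-PC 186 has no equivalent.
filled = fill_to_permutation({21: 167, 167: 186})
```

Because a partial one-to-one mapping splits into disjoint chains, each
chain becomes its own cycle, and untouched codes stay fixed; this is
one simple way to favour the identity and short cycles.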
As a consequence of this map filling, `recode' may sometimes produce
_funny_ characters. They may look annoying, but they are nevertheless
helpful when you change your mind and want to revert to the prior
recoding. If you cannot stand these, use option `-s', which asks for a
very strict recoding.
This map filling sometimes has a few surprising consequences, which
some users wrongly interpreted as bugs. Here are two examples.
1. In some cases, `recode' seems to copy a file without recoding it.
But in fact, it does recode it. Consider the requests:
    recode l1..us < File-Latin1 > File-ASCII
    cmp File-Latin1 File-ASCII
`cmp' will not report any difference. This is quite normal.
`Latin-1' is correctly recoded to ASCII for the characters both
charsets have in common (the first 128 characters, in this case).
The remaining 128 `Latin-1' characters have no ASCII
correspondent. Instead of losing them, `recode' elects to map
them to unspecified characters of ASCII, thus making the recoding
reversible. The simplest way of achieving this is merely to keep
those last 128 characters unchanged. The overall effect is
copying the file verbatim.
If you feel this behaviour is too generous and you do not care
about reversibility, simply use option `-s'. By doing so,
`recode' will strictly map only those `Latin-1' characters which
have an ASCII equivalent, and will merely drop those which do not.
You are then more likely to observe a difference between the input
and output files.
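The first surprise can be reproduced with a toy model of
`recode l1..us'. The function `latin1_to_ascii' is hypothetical and
only encodes the behaviour described above: the 128 shared codes map
to themselves, and the upper 128 `Latin-1' codes are either kept
unchanged (the default, reversible filling) or dropped (as with `-s').
It does not model the failure that `-s' causes without `-f'.

```python
def latin1_to_ascii(data: bytes, strict: bool = False) -> bytes:
    """Toy model of `recode l1..us' as described in the text (hypothetical).

    Bytes 0-127 are common to both charsets and map to themselves.  Bytes
    128-255 have no ASCII equivalent: by default they are kept unchanged,
    which keeps the recoding reversible; with strict=True (like -s) they
    are dropped.
    """
    if strict:
        return bytes(b for b in data if b < 128)
    return bytes(data)


sample = "café".encode("latin-1")               # b'caf\xe9'
default_out = latin1_to_ascii(sample)            # what `cmp' compares: identical
strict_out = latin1_to_ascii(sample, strict=True)
```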
2. Recoding the wrong way could sometimes give the false impression
that recoding has _almost_ been done properly. Consider the
requests:
    recode 437..l1 < File-Latin1 > Temp1
    recode 437..l1 < Temp1 > Temp2
thus wrongly declaring `File-Latin1' to be an `IBM-PC' file, and
recoding it to `Latin-1'. This is ill-defined and not meaningful.
Yet, if you repeat this step a second time, you might notice that
many (not all) characters in `Temp2' are identical to those in
`File-Latin1'. Sometimes people try to discover how `recode' works
by experimenting somewhat at random, rather than reading and
understanding the documentation; results such as this are
confusing, as they give those people a false feeling that they
have understood something.
Reversible codings have the property that, if applied several
times in the same direction, they eventually bring any character
back to its original value. Since `recode' seeks small
permutation cycles when creating reversible codings, besides
characters unchanged by the recoding, most permutation cycles will
be of length 2, fewer of length 3, and so on. It is therefore to
be expected that applying the recoding twice in the same direction
recovers most characters, but fails to recover those participating
in permutation cycles of length 3 or more. On the other hand,
recoding six times in the same direction would recover all
characters in cycles of length 1, 2, 3 or 6.
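The arithmetic behind this can be checked directly: a code returns to
itself after n applications exactly when the length of its cycle
divides n, so applying a recoding as many times as the least common
multiple of all cycle lengths restores every character. The
permutation below is a hypothetical six-code example with one cycle
each of length 1, 2 and 3; `math.lcm' requires Python 3.9 or later.

```python
from math import lcm


def apply_n(perm, code, n):
    """Apply the permutation `perm' (a dict) n times to `code'."""
    for _ in range(n):
        code = perm[code]
    return code


def cycle_length(perm, code):
    """Length of the permutation cycle containing `code'."""
    length, current = 1, perm[code]
    while current != code:
        current = perm[current]
        length += 1
    return length


# Hypothetical permutation: fixed point 0, 2-cycle (1 2), 3-cycle (3 4 5).
perm = {0: 0, 1: 2, 2: 1, 3: 4, 4: 5, 5: 3}

twice = [c for c in perm if apply_n(perm, c, 2) == c]  # cycle lengths dividing 2
six = [c for c in perm if apply_n(perm, c, 6) == c]    # cycle lengths dividing 6
order = lcm(*(cycle_length(perm, c) for c in perm))    # applications restoring all
```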
---------- Footnotes ----------
(1) There are still some cases of ambiguous output which are rather
difficult to detect, and for which the protection is not active.