

8.3.4 Normalizing Strings in Entries
------------------------------------

   There are many ways of encoding a particular string into a PO file
entry, because multi-line strings can be split and quoted in many
different ways, and special characters can even be represented by
backslash escape sequences.  Some features of PO mode rely on its
ability to scan an existing PO file for a particular string encoded in
the ‘msgid’ field of some entry.  Even though PO mode has all the
built-in machinery needed to implement this recognition easily, doing it
fast is technically difficult.  To address this efficiency problem, we
decided on a canonical representation for strings.
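   To make the problem concrete, here is a minimal Python sketch that
decodes a PO field into the logical string it represents.  The helper
name ‘decode_po_field’ is hypothetical, and only the escapes ‘\n’,
‘\t’, ‘\"’ and ‘\\’ are handled; any other escape is simplified to its
second character, which a real PO parser would not do.  Two differently
split encodings decode to the same string, which is why raw file text
cannot be compared directly without a canonical form.

```python
import re

# Escapes understood by this sketch (a subset of the C escapes).
_ESCAPES = {"n": "\n", "t": "\t", '"': '"', "\\": "\\"}


def _unescape(match):
    """Map one backslash escape to the character it stands for."""
    char = match.group(1)
    return _ESCAPES.get(char, char)  # unknown escapes: keep the bare char


def decode_po_field(text):
    """Return the logical string of a PO field such as 'msgid "" "a" "b"'.

    All quoted pieces are concatenated first, then escapes are undone.
    """
    pieces = re.findall(r'"((?:[^"\\]|\\.)*)"', text)
    return re.sub(r"\\(.)", _unescape, "".join(pieces))


# Two different encodings of the same msgid:
one_piece = decode_po_field('msgid "Hello, world!\\n"')
three_pieces = decode_po_field('msgid ""\n"Hello, "\n"world!\\n"')
print(one_piece == three_pieces)
```

Comparing decoded strings works, but it requires parsing every entry;
a canonical encoding lets the comparison happen on the file text itself.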

   A conventional representation of strings in a PO file is currently
under discussion, and PO mode experiments with a canonical
representation.  Having both ‘xgettext’ and PO mode converge towards a
uniform way of representing equivalent strings would be useful, as the
internal normalization needed by PO mode would then be satisfied
automatically when using ‘xgettext’ from GNU ‘gettext’.  An explicit PO
mode normalization should then only be necessary for PO files imported
from elsewhere, or when the convention itself evolves.

   So, to normalize at least those strings of a given PO file that need
a canonical representation, the following PO mode command is available:

‘M-x po-normalize’
     Tidy the whole PO file by making entries more uniform.

   The special command ‘M-x po-normalize’, which has no associated keys,
revises all entries, ensuring that the strings of both original and
translated entries use uniform internal quoting in the PO file.  It also
removes any stray text after the last entry.  This command may be useful
for PO files freshly imported from elsewhere, or if we ever improve on
the canonical quoting format we use.  This canonical format is meant not
only for getting cleaner PO files, but also for greatly speeding up
‘msgid’ string lookup in some other PO mode commands.

   ‘M-x po-normalize’ presently makes three passes over the entries.
The first implements heuristics for converting PO files for GNU
‘gettext’ 0.6 and earlier, in which ‘msgid’ and ‘msgstr’ fields used
K&R style C string syntax for multi-line strings.  These heuristics may
fail for comments that are not related to obsolete entries and that end
with a backslash; they also depend on subsequent passes to finalize the
proper commenting of continued lines in obsolete entries.  This first
pass might disappear once all old PO files have been adjusted.  The
second and third passes normalize all ‘msgid’ and ‘msgstr’ strings,
respectively.  They also clean out the trailing backslashes used by
XView’s ‘msgfmt’ for continued lines.

   Having such an explicit normalizing command allows for importing PO
files from other sources, and also eases the evolution of the current
convention, which, as of now, is driven mostly by aesthetic concerns.
It is easy to make suggested adjustments at a later time, as the
normalizing command and, eventually, other GNU ‘gettext’ tools should
largely automate conformance.  A description of the canonical string
format is given below, for the particular benefit of those who do not
have Emacs handy but would nevertheless like to handcraft their PO files
nicely.

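   Right now, in PO mode, strings are either single-line or multi-line.
A string goes multi-line if and only if it has _embedded_ newlines, that
is, if it matches ‘[^\n]\n+[^\n]’: some run of newlines is both preceded
and followed by a non-newline character.  Leading and trailing newlines
alone do not count.  So, we would have: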

     msgstr "\n\nHello, world!\n\n\n"

   but, after replacing the space with a newline, the string has an
embedded newline, so it becomes:

     msgstr ""
     "\n"
     "\n"
     "Hello,\n"
     "world!\n"
     "\n"
     "\n"
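   The current rule can be sketched in Python as follows.  This is an
illustration only, not PO mode’s Emacs Lisp implementation; the names
‘format_field’ and ‘_quote’ are hypothetical, and only backslash, double
quote, newline and tab are escaped.  A string without embedded newlines
stays on one line; otherwise an empty first string is followed by one
string per line, each ending with its own newline.

```python
import re

# The multi-line test quoted in the text above.
_EMBEDDED_NEWLINE = re.compile(r"[^\n]\n+[^\n]")


def _quote(piece):
    """Escape one piece as a PO string literal (sketch: 4 escapes only)."""
    # Backslash must be escaped first so later escapes are not doubled.
    for raw, esc in (("\\", "\\\\"), ('"', '\\"'), ("\n", "\\n"), ("\t", "\\t")):
        piece = piece.replace(raw, esc)
    return '"' + piece + '"'


def format_field(keyword, s):
    """Render `keyword` (e.g. "msgstr") followed by the string `s`."""
    if not _EMBEDDED_NEWLINE.search(s):
        return f"{keyword} {_quote(s)}"
    # Break after every newline; an unterminated tail becomes the last piece.
    pieces = re.findall(r"[^\n]*\n|[^\n]+$", s)
    return "\n".join([f'{keyword} ""'] + [_quote(p) for p in pieces])


print(format_field("msgstr", "\n\nHello, world!\n\n\n"))
print(format_field("msgstr", "\n\nHello,\nworld!\n\n\n"))
```

The second call reproduces the multi-line example above, without the
indentation used in this manual.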

   We are deliberately using an exaggerated example here, to make the
point clearer.  Multi-line strings usually do not look that bad.  We
will probably implement the following suggestion.  We might lump
together all initial newlines into the first (otherwise empty) string,
and also all newlines introducing empty lines (that is, for a run of
N > 1 consecutive newlines, the last N-1 of them would go together in a
separate string), thus making the previous example appear as:

     msgstr "\n\n"
     "Hello,\n"
     "world!\n"
     "\n\n"
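   The suggested lumping can be sketched the same way.  Again, this is a
hypothetical Python illustration of a proposal, not of PO mode’s current
behavior; ‘format_field_lumped’ is an invented name, and escaping is
limited to backslash, double quote, newline and tab.  The single-line
rule is unchanged; only the splitting of multi-line strings differs.

```python
import re

# The multi-line test quoted in the text above.
_EMBEDDED_NEWLINE = re.compile(r"[^\n]\n+[^\n]")


def _quote(piece):
    """Escape one piece as a PO string literal (sketch: 4 escapes only)."""
    for raw, esc in (("\\", "\\\\"), ('"', '\\"'), ("\n", "\\n"), ("\t", "\\t")):
        piece = piece.replace(raw, esc)
    return '"' + piece + '"'


def format_field_lumped(keyword, s):
    """Render `keyword` and `s`, lumping newlines as suggested above."""
    if not _EMBEDDED_NEWLINE.search(s):
        return f"{keyword} {_quote(s)}"  # single-line rule is unchanged
    lead = re.match(r"\n*", s).group()  # initial newlines join the first string
    rest = s[len(lead):]
    # A text line keeps one newline; the remaining N-1 newlines of a run
    # of N form a separate string; an unterminated tail comes last.
    pieces = re.findall(r"[^\n]+\n|\n+|[^\n]+$", rest)
    return "\n".join([f"{keyword} {_quote(lead)}"] + [_quote(p) for p in pieces])


print(format_field_lumped("msgstr", "\n\nHello,\nworld!\n\n\n"))
```

The call prints the four-string layout shown just above.  A string with
no initial newlines still starts with ‘msgstr ""’, as under the current
rule.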

   A few small points about string normalization are still undecided;
they will be documented in this manual once these questions settle.