(gettext.info)Preparing Strings



4.3 Preparing Translatable Strings
==================================

   Before strings can be marked for translation, they sometimes need to
be adjusted.  Usually a string is prepared for translation right before
it is marked, during the marking phase described in the next sections.
While doing so, keep the following guidelines in mind.

   • Decent English style.

   • Entire sentences.

   • Split at paragraphs.

   • Use format strings instead of string concatenation.

   • Avoid unusual markup and unusual control characters.

Let’s look at some examples of these guidelines.

   Translatable strings should be in good English style.  If slang,
abbreviations or shortcuts are used, translators will often not
understand the message and will produce inappropriate translations.

     "%s: is parameter\n"

This is nearly untranslatable: Is the displayed item _a_ parameter or
_the_ parameter?

     "No match"

The ambiguity in this message makes it unintelligible: Is the program
attempting to set something on fire?  Does it mean "The given object
does not match the template"?  Does it mean "The template does not fit
any of the objects"?

   In both cases, adding more words to the message will help both the
translator and the English-speaking user.

   Translatable strings should be entire sentences.  It is often not
possible to translate single verbs or adjectives in a substitutable way.

     printf ("File %s is %s protected", filename, rw ? "write" : "read");

Most translators will not look at the source and will thus only see the
string ‘"File %s is %s protected"’, which is unintelligible.  Change
this to

     printf (rw ? "File %s is write protected" : "File %s is read protected",
             filename);

This way the translator will not only understand the message, she will
also be able to find the appropriate grammatical construction.  A French
translator, for example, translates "write protected" as "protected
against writing".

   Entire sentences are also important because in many languages, the
declension of a word in a sentence depends on the gender or the number
(singular/plural) of another part of the sentence.  There are usually
more interdependencies between words than in English.  Consequently,
asking a translator to translate two half-sentences and then joining
them through plain string concatenation will not work for many
languages, even though it would work for English.  That’s why
translators need to handle entire sentences.

   Often sentences don’t fit into a single line.  If a sentence is
output using two subsequent ‘printf’ statements, like this

     printf ("Locale charset \"%s\" is different from\n", lcharset);
     printf ("input file charset \"%s\".\n", fcharset);

the translator would have to translate two half sentences, but nothing
in the POT file would tell her that the two half sentences belong
together.  It is necessary to merge the two ‘printf’ statements so that
the translator can handle the entire sentence at once and decide at
which place to insert a line break in the translation (if at all):

     printf ("Locale charset \"%s\" is different from\n\
     input file charset \"%s\".\n", lcharset, fcharset);
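
   The backslash-newline form above requires the continuation line to
start in column zero.  A common alternative is to write the message as
adjacent string literals, which the C compiler joins into one string;
‘xgettext’ is expected to extract them as a single message as well.
The following is a minimal sketch under stated assumptions: the ‘_’
macro is only a stand-in for ‘gettext’, and the function name
‘format_charset_warning’ is hypothetical.

```c
#include <stdio.h>

/* Stand-in for gettext(); a real program would include <libintl.h>
   and define _() as gettext().  The identity macro is an assumption
   made only to keep this sketch self-contained.  */
#define _(msgid) (msgid)

/* Formats the whole sentence from one translatable string.  The two
   adjacent literals are joined by the compiler into a single string,
   so the translator sees one message, not two halves.  */
int
format_charset_warning (char *buf, size_t size,
                        const char *lcharset, const char *fcharset)
{
  return snprintf (buf, size,
                   _("Locale charset \"%s\" is different from\n"
                     "input file charset \"%s\".\n"),
                   lcharset, fcharset);
}
```

The translator still decides where, if anywhere, the line break goes,
because the ‘\n’ inside the sentence is part of the single message.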

   You may now ask: how about two or more adjacent sentences?  Like in
this case:

     puts ("Apollo 13 scenario: Stack overflow handling failed.");
     puts ("On the next stack overflow we will crash!!!");

Should these two statements be merged into a single one?  I would
recommend merging them if the two sentences are related to each other,
because that makes it easier for the translator to understand and
translate both.  On the other hand, if one of the two messages is a
stock message that occurs in other places as well, you will do the
translator a favour by not merging the two.  (Identical messages
occurring in several places are combined by xgettext, so the translator
has to handle them only once.)

   Translatable strings should be limited to one paragraph; don’t let a
single message be longer than ten lines.  The reason is that when the
translatable string changes, the translator is faced with the task of
updating the entire translated string.  Maybe only a single word will
have changed in the English string, but the translator doesn’t see that
(with the current translation tools), therefore she has to proofread the
entire message.

   Many GNU programs have a ‘--help’ output that extends over several
screen pages.  It is a courtesy towards the translators to split such a
message into several ones of five to ten lines each.  While doing that,
you can also attempt to split the documented options into groups, such
as the input options, the output options, and the informative output
options.  This will help every user to find the option he is looking
for.
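
   Splitting a long help text could look roughly like the sketch
below.  Everything in it is invented for illustration: the program
name ‘frob’, its options, and the function ‘print_help’.  The ‘_’
macro is again only a stand-in for ‘gettext’.

```c
#include <stdio.h>

/* Stand-in for gettext(); an assumption to keep the sketch
   self-contained.  */
#define _(msgid) (msgid)

/* Prints the help text as several short messages, one per group of
   options, and returns the number of messages printed.  Each call
   below becomes a separate entry in the POT file.  */
int
print_help (FILE *out)
{
  int messages = 0;

  fputs (_("Usage: frob [OPTION]... INPUTFILE\n"), out);
  messages++;

  fputs (_("Input options:\n"
           "  -f, --from=CHARSET   assume INPUTFILE is in CHARSET\n"
           "  -s, --strict         stop at the first invalid byte\n"),
         out);
  messages++;

  fputs (_("Output options:\n"
           "  -o, --output=FILE    write the result to FILE\n"
           "  -t, --to=CHARSET     convert the output to CHARSET\n"),
         out);
  messages++;

  fputs (_("Informative output:\n"
           "  -h, --help           display this help and exit\n"
           "  -V, --version        output version information\n"),
         out);
  messages++;

  return messages;
}
```

If one option is later added to the output group, only that one
message changes, and only that one needs to be proofread again.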

   Hardcoded string concatenation is sometimes used to construct English
strings:

     strcpy (s, "Replace ");
     strcat (s, object1);
     strcat (s, " with ");
     strcat (s, object2);
     strcat (s, "?");

In order to present to the translator only entire sentences, and also
because in some languages the translator might want to swap the order of
‘object1’ and ‘object2’, it is necessary to change this to use a format
string:

     sprintf (s, "Replace %s with %s?", object1, object2);
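
   Swapping the order of the two objects relies on numbered argument
references such as ‘%1$s’ and ‘%2$s’, which POSIX ‘printf’ supports
(they are not part of ISO C).  The sketch below assumes such a
‘printf’ implementation.  The function name
‘format_replace_question’ is hypothetical, and the reordered format
string in the usage note is an English stand-in for what a translated
string might look like.

```c
#include <stdio.h>

/* Formats the question with whatever format string the message
   catalog returned.  The arguments are always passed in the same
   order; a format string using %1$s and %2$s may consume them in a
   different order.  */
int
format_replace_question (char *buf, size_t size, const char *format,
                         const char *object1, const char *object2)
{
  return snprintf (buf, size, format, object1, object2);
}
```

Called with the format "Replace %s with %s?" and the arguments "foo"
and "bar", this yields "Replace foo with bar?".  Called with "Use
%2$s in place of %1$s?" and the same arguments, it yields "Use bar in
place of foo?", without any change to the C code.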

   A similar case is compile-time concatenation of strings.  The ISO
C99 include file ‘<inttypes.h>’ contains a macro ‘PRId64’ that can be
used as a formatting directive for outputting an ‘int64_t’ integer
through ‘printf’.  It expands to a constant string, usually "d" or "ld"
or "lld" or something similar, depending on the platform.  Assume you
have code like

     printf ("The amount is %0" PRId64 "\n", number);

The ‘gettext’ tools and library have special support for these
‘<inttypes.h>’ macros.  You can therefore simply write

     printf (gettext ("The amount is %0" PRId64 "\n"), number);

The PO file will contain the string "The amount is %0<PRId64>\n".  The
translators will provide a translation containing "%0<PRId64>" as well,
and at runtime the ‘gettext’ function’s result will contain the
appropriate constant string, "d" or "ld" or "lld".

   This works only for the predefined ‘<inttypes.h>’ macros.  If you
have defined your own similar macros, let’s say ‘MYPRId64’, that are not
known to ‘xgettext’, the solution for this problem is to change the code
like this:

     char buf1[100];
     sprintf (buf1, "%0" MYPRId64, number);
     printf (gettext ("The amount is %s\n"), buf1);

   This means that you put the platform-dependent code in one statement
and the internationalization code in a different statement.  Note that a
buffer length of 100 is safe, because all available hardware integer
types are limited to 128 bits, and to print a 128 bit integer one needs
at most 54 characters, regardless of whether it is printed in decimal,
octal or hexadecimal.
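
   The same two-step pattern can be written with ‘snprintf’, so that
the buffer bound is enforced rather than merely argued for.  This is
a sketch, not a prescribed form: ‘MYPRId64’ is defined in terms of
‘PRId64’ only so that the example compiles, ‘format_amount’ is a
hypothetical helper, and ‘_’ is a stand-in for ‘gettext’.

```c
#include <inttypes.h>
#include <stdio.h>

/* Stand-in for gettext(); an assumption to keep the sketch
   self-contained.  */
#define _(msgid) (msgid)

/* Hypothetical project-specific macro that xgettext does not know.
   Defined via PRId64 here only so that the sketch compiles.  */
#define MYPRId64 PRId64

/* Writes the translated message for NUMBER into OUT.  */
int
format_amount (char *out, size_t outsize, int64_t number)
{
  char buf1[100];

  /* Platform-dependent step: this format string is never marked for
     translation, so translators do not see the macro.  */
  snprintf (buf1, sizeof buf1, "%" MYPRId64, number);

  /* Internationalized step: the msgid contains only "%s".  */
  return snprintf (out, outsize, _("The amount is %s\n"), buf1);
}
```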

   All this applies to other programming languages as well.  For
example, in Java and C#, string concatenation is very frequently used,
because it is a built-in operator of the language.  As in C, in Java
you would change

     System.out.println("Replace "+object1+" with "+object2+"?");

into a statement involving a format string:

     System.out.println(
         MessageFormat.format("Replace {0} with {1}?",
                              new Object[] { object1, object2 }));

Similarly, in C#, you would change

     Console.WriteLine("Replace "+object1+" with "+object2+"?");

into a statement involving a format string:

     Console.WriteLine(
         String.Format("Replace {0} with {1}?", object1, object2));

   Unusual markup or control characters should not be used in
translatable strings.  Translators will likely not understand the
particular meaning of the markup or control characters.

   For example, if you have a convention that ‘|’ delimits the left-hand
and right-hand part of some GUI elements, translators will often not
understand it without specific comments.  It might be better to have the
translator translate the left-hand and right-hand part separately.

   Another example is the ‘argp’ convention to use a single ‘\v’
(vertical tab) control character to delimit two sections inside a
string.  This is flawed.  Some translators may convert it to a simple
newline, some to blank lines.  With some PO file editors it may not be
easy to even enter a vertical tab control character.  So, you cannot be
sure that the translation will contain a ‘\v’ character, at the
corresponding position.  The solution is, again, to let the translator
translate two separate strings and combine at run-time the two
translated strings with the ‘\v’ required by the convention.
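
   Combining the two parts at run time could be sketched as follows.
The two message texts and the function name ‘build_argp_doc’ are
invented for illustration, and ‘_’ is a stand-in for ‘gettext’.

```c
#include <stdio.h>

/* Stand-in for gettext(); an assumption to keep the sketch
   self-contained.  */
#define _(msgid) (msgid)

/* Builds a documentation string of the form BEFORE "\v" AFTER.  The
   two parts are translated separately; the vertical tab is added by
   the program and never appears in a translatable string.  */
int
build_argp_doc (char *buf, size_t size)
{
  const char *before = _("Convert FILE to the locale charset.");
  const char *after = _("Report bugs to the maintainer.");

  return snprintf (buf, size, "%s\v%s", before, after);
}
```

Because the ‘\v’ comes from the program, its presence and position
no longer depend on the translator or on the PO file editor.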

   HTML markup, however, is common enough that it’s probably OK to use
in translatable strings.  But please bear in mind that the GNU gettext
tools don’t verify that the translations are well-formed HTML.

