26th May, 2008

Quick Tips for Localizing Web Apps

Sign with 'Leather Worker' in many languages

Before I worked at Yahoo!, I had never localized anything (that is, modified an application to fit a region’s language and cultural needs). Now that I’ve done it for over 2 years and 30 different localizations, there are a few things I’ve learned that are quite important.

Allow for text expansion and shrinkage

This is rule #1 when localizing. Text in some languages, like German or Russian, can be more than twice as long as the English equivalent. Text in others, like Chinese or Japanese, can shrink by more than 50%, which can equally wreck a non-flexible layout. Your HTML and CSS need to be extremely flexible when dealing with localized content. No more fixed-height or fixed-width buttons, tabs, or other content areas.

Keep text out of images

Replacing text in HTML is easy and can be done automatically with many localization tools. Replacing text in graphics is not so easy. Do whatever you can to get text out of graphics and into your HTML. This may annoy designers, but will save everyone time.

Store configuration data separately from translations

Because your web app will need to behave differently for each locale, you will need to put configuration information somewhere. You may be tempted to put it into your translations, which would let translators/localizers change configuration on their own, but this is a bad idea. Why?

One day a localizer will put a really bad value into your configuration data and cause your app to crash. And it will be your fault because why should a localizer know how your app works and what a correct configuration value should be?

Another benefit of storing configuration data separately is you can easily write scripts to parse your configuration or change it. Much harder to do that when it’s buried in localizations.
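Here is a minimal sketch of what that separation might look like. The locale names, the keys (`date_format`, `max_results`), and the `validate_config` helper are all made up for illustration; the point is only that configuration sits in its own structure where a script can check it.

```python
# Hypothetical per-locale configuration, kept apart from translated strings.
LOCALE_CONFIG = {
    "en-US": {"date_format": "%m/%d/%Y", "max_results": 20},
    "de-DE": {"date_format": "%d.%m.%Y", "max_results": 20},
}

# Translations hold display text only; localizers never touch LOCALE_CONFIG.
TRANSLATIONS = {
    "en-US": {"search_button": "Search"},
    "de-DE": {"search_button": "Suchen"},
}

# Assumed schema: every locale needs these keys with these types.
REQUIRED_KEYS = {"date_format": str, "max_results": int}


def validate_config(config):
    """Return a list of problems found in the locale configuration."""
    problems = []
    for locale, settings in config.items():
        for key, expected_type in REQUIRED_KEYS.items():
            if key not in settings:
                problems.append(f"{locale}: missing {key}")
            elif not isinstance(settings[key], expected_type):
                problems.append(
                    f"{locale}: {key} should be {expected_type.__name__}"
                )
    return problems
```

Running something like `validate_config(LOCALE_CONFIG)` before a release is the kind of script that gets much harder to write once configuration values are mixed in with translated text.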

Just use UTF-8

I’m not an encoding expert, but when every locale is using one encoding and that encoding can handle almost all languages, it will make your life much easier. Tell everyone to use UTF-8 and you won’t be banging your head against the wall every time you see little boxes with question marks in them.
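As a rough illustration, here is what “just use UTF-8” can look like when reading and writing string files. The function names and the file name are hypothetical; the one real point is to name the encoding explicitly instead of relying on whatever the platform defaults to.

```python
import os
import tempfile


def save_translations(path, lines):
    # Always name the encoding; the platform default may not be UTF-8.
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines))


def load_translations(path):
    # Read with the same encoding the file was written with.
    with open(path, "r", encoding="utf-8") as f:
        return f.read().split("\n")


def round_trip(lines):
    """Write lines to a temporary file and read them back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "strings.txt")
        save_translations(path, lines)
        return load_translations(path)


def misread_as_latin1(text):
    """Show what happens when UTF-8 bytes are decoded with the wrong encoding."""
    return text.encode("utf-8").decode("latin-1")
```

With one encoding everywhere, text in German, Russian, and Japanese survives `round_trip` unchanged. `misread_as_latin1` shows the kind of garbage you get when one side of the pipeline guesses differently.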

Watch out for variable substitution

When dealing with sentences like “You have X new messages”, many languages have multiple different translations depending on the number of messages. 0, 1, 2-3, 4-7, 8-N may all have different translations. Try handling that in an if-else block times 10 languages. It doesn’t scale.

You can sometimes skip this type of challenge by setting things up like this: “Number of new messages: X”. Sometimes this is OK, sometimes it’s not. There are many ways to handle it, and they go beyond the scope of this post.
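To give a feel for one approach, here is a small sketch where each language supplies a rule that maps a number to a plural form, so the calling code needs no per-language if-else. The Russian rule follows the formula commonly seen in gettext `Plural-Forms` headers. The message table, the function names, and the sample Russian strings are illustrative assumptions, not taken from any real app.

```python
def plural_index_en(n):
    # English: one form for exactly 1, another for everything else.
    return 0 if n == 1 else 1


def plural_index_ru(n):
    # Russian: three forms, chosen by the last digits of the number.
    if n % 10 == 1 and n % 100 != 11:
        return 0
    if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
        return 1
    return 2


# Hypothetical message table: (rule, forms indexed by the rule's result).
MESSAGES = {
    "en": (
        plural_index_en,
        ["You have {n} new message", "You have {n} new messages"],
    ),
    "ru": (
        plural_index_ru,
        [
            "У вас {n} новое сообщение",
            "У вас {n} новых сообщения",
            "У вас {n} новых сообщений",
        ],
    ),
}


def new_messages(locale, n):
    """Pick the plural form for n in the given locale and fill in the number."""
    rule, forms = MESSAGES[locale]
    return forms[rule(n)].format(n=n)
```

Adding a language then means adding one rule and one list of forms, rather than another branch in every place a count is displayed.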

Responses

Thankfully, I’ve never had to deal with any kind of localization, but I’m sure my day is coming. I’m curious about your “You have X new messages” example. What do you mean when you say there could be different translations? In some languages, does the wording of the sentence actually change if it’s more than, say, five messages?

@Brock

Yeah, some languages change their pluralization depending on the # of items they are talking about.

See http://ed.agadak.net/2007/12/one-potato-two-potato-three-potato-four

@Brock,
Ryan is correct about the pluralization issue. A good way of organizing your strings is to use tokens for variables so the localizer can replace them and rearrange them without having to know anything about your code.

Something like
“You have X new messages” could be stored as

“You have %NUMBER_OF_MESSAGES% new messages”

then the localizer could easily understand how to rearrange that as:

Spanish
“usted tiene %NUMBER_OF_MESSAGES% nuevos mensajes”

Portuguese
“tem %NUMBER_OF_MESSAGES% novas mensagens”
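A minimal sketch of how the code side of that could work, assuming tokens are written as `%NAME%` in uppercase with underscores. The `fill_tokens` helper is hypothetical and not from any particular localization tool.

```python
import re

# Assumed token syntax: %NAME%, where NAME is uppercase letters and underscores.
TOKEN = re.compile(r"%([A-Z_]+)%")


def fill_tokens(template, values):
    """Replace %NAME% tokens with values; unknown tokens are left as-is."""
    def replace(match):
        name = match.group(1)
        return str(values[name]) if name in values else match.group(0)
    return TOKEN.sub(replace, template)
```

Because the token is matched by name, the localizer can move it anywhere in the sentence and the same call, for example `fill_tokens(text, {"NUMBER_OF_MESSAGES": 3})`, still works.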

I have some of Ryan’s tips and a few others in a post I made in ’05:
http://www.litfuel.net/plush/?postid=84

@Jim

Yeah, I agree with your comment about not using things like ‘X’ for variables; it’s better to use what you mentioned. My point was more about how pluralization for 3 messages is different than pluralization for 10 in some languages. Gettext does support this; I’m not sure about other localization tools.

Regarding your article you linked to:
At Yahoo!, we never stored any translations in a DB for real-time access. We generated a complete copy of our templates for each locale so there was no runtime performance hit.
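Roughly, the idea looks like the sketch below. This is not the actual Yahoo! build process, just an illustration of rendering every template once per locale ahead of time. The template and translation data, and the `render_locale_templates` name, are made up.

```python
# Hypothetical source templates with named placeholders for translated strings.
TEMPLATES = {
    "inbox.html": "<h1>{inbox_title}</h1><button>{compose}</button>",
}

# Hypothetical translations, one set of strings per locale.
TRANSLATIONS = {
    "en-US": {"inbox_title": "Inbox", "compose": "Compose"},
    "de-DE": {"inbox_title": "Posteingang", "compose": "Verfassen"},
}


def render_locale_templates(templates, translations):
    """Return {locale: {template_name: rendered_text}}, built ahead of time."""
    output = {}
    for locale, strings in translations.items():
        output[locale] = {
            name: body.format(**strings) for name, body in templates.items()
        }
    return output
```

The result can be written out to disk as one directory per locale, so serving a page never involves a translation lookup.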
