Let’s admit it, writing applications is a complex thing to do; it requires a lot of blood and sweat. After putting so much effort into creating an application, it would be a shame to see it go unused just because it was only available in English. The bottom line is that most people pay more attention, and give more respect, to a product that is available in their own language. By its very nature, open source software qualifies as some of the most translated software on the planet. If you want to reach a global audience, it is very important that you localise your application for your users. Here’s how…
The technical terms involved in internationalisation can be daunting, so let’s clear them up before proceeding. The following are the key components that make up the complete internationalisation framework…
Locale: A locale is the part of a user’s environment that brings together information about how to handle data that is specific to the end user’s particular country, language or territory. The locale is typically installed as part of the operating system. A locale identifier usually consists of at least a language identifier and a region identifier. It is defined in this format: language[_territory][.codeset][@modifier], where the parts in square brackets are optional. For example, British English using the UTF-8 encoding is en_GB.UTF-8. (More on character sets later in this article.) The same code also defines the territorial conventions for spelling, currency, date format etc.:
en_US = “color,” mm/dd/yyyy, $1,234.56
en_GB = “colour,” dd/mm/yyyy, £1,234.56
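To make the identifier format concrete, here is a minimal Python sketch that splits an identifier into its four parts. The names parse_locale and LocaleId are made up for this illustration, and the pattern is deliberately simplified: it assumes a two- or three-letter language code and a two-letter territory, and it does not handle the special C and POSIX locales.

```python
import re
from typing import NamedTuple, Optional

# Simplified pattern for language[_territory][.codeset][@modifier].
# Assumption: 2-3 letter language code, 2-letter territory code.
_LOCALE_RE = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:_(?P<territory>[A-Za-z]{2}))?"
    r"(?:\.(?P<codeset>[A-Za-z0-9-]+))?"
    r"(?:@(?P<modifier>[A-Za-z0-9]+))?$"
)


class LocaleId(NamedTuple):
    """Hypothetical container for the parts of a locale identifier."""
    language: str
    territory: Optional[str]
    codeset: Optional[str]
    modifier: Optional[str]


def parse_locale(identifier: str) -> LocaleId:
    """Split an identifier such as 'en_GB.UTF-8' into its components."""
    match = _LOCALE_RE.match(identifier)
    if match is None:
        raise ValueError(f"not a valid locale identifier: {identifier!r}")
    # Optional parts that are absent come back as None.
    return LocaleId(**match.groupdict())


print(parse_locale("en_GB.UTF-8"))
print(parse_locale("de_DE@euro"))
```

Only the language part is mandatory here, so parse_locale("fr") is accepted and leaves the territory, codeset and modifier empty.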
Translation: This simply means rendering the text in another language. It may not be an exact word-for-word translation, but it conveys the correct message.
Localisation (aka L10n): Localisation is the combined term for translating a product and adapting it to the conventions of the relevant locale.
Internationalisation (also known as i18n): The term ‘internationalisation’ refers to the process of building a product that is locale-neutral. It means that the application can be adapted to target languages and countries without making changes to the core of the product.
Globalisation: The combination of localisation and internationalisation. It commonly refers to the process of transforming a locale-specific product into one that supports all locales.
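The split between a locale-neutral core and per-locale data can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the LOCALE_DATA table, the message key and the render_line function are all hypothetical, and a real project would normally load translations from a message catalogue rather than a hand-written dictionary.

```python
from datetime import date

# Hypothetical per-locale data (the localisation part):
# translated messages plus formatting conventions.
LOCALE_DATA = {
    "en_US": {
        "messages": {"item": "Paint color"},
        "date_format": "%m/%d/%Y",
        "currency_symbol": "$",
    },
    "en_GB": {
        "messages": {"item": "Paint colour"},
        "date_format": "%d/%m/%Y",
        "currency_symbol": "£",
    },
}


def render_line(locale_id: str, amount: float, when: date) -> str:
    """Locale-neutral core (the internationalisation part).

    Nothing in here is specific to one language or country; all
    locale-dependent details are looked up in LOCALE_DATA.
    """
    data = LOCALE_DATA[locale_id]
    label = data["messages"]["item"]
    money = f"{data['currency_symbol']}{amount:,.2f}"
    return f"{label}: {money} ({when.strftime(data['date_format'])})"


for locale_id in ("en_US", "en_GB"):
    print(render_line(locale_id, 1234.56, date(2024, 3, 31)))
```

Supporting another locale then means adding an entry to the data table, while render_line stays untouched. Locales that group digits differently would also need a number-format entry, which this sketch leaves out.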
Character sets/encodings: ‘Character set’ is often used to describe a digital representation of text. A character encoding system consists of a code that pairs each character from a given repertoire with something else, such as a sequence of natural numbers, or octets, in order to facilitate the transmission of data. The following are the popular character encodings…
8-bit character encodings and multibyte encodings: This includes the ISO-8859 family, for example Latin-1, Latin-2 and ISO-8859-3. Between them, the 8-bit encodings support English, Danish, French, German, Italian, Norwegian, Portuguese, Spanish, Swedish, Czech, Hungarian, Polish and Slovak, along with the Cyrillic, Arabic, Greek, Hebrew, Turkish and Baltic alphabets, and many others. Multibyte encodings apply when there is no one-byte-per-character mapping, as with languages whose character repertoire is too large to fit in a single byte.
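A short Python sketch shows why the choice of 8-bit encoding matters: the same byte stands for a different character in each one. The byte value 0xE4 and the three ISO-8859 variants are picked purely for illustration, as is the use of Shift_JIS as an example of a multibyte encoding.

```python
# One byte above the 7-bit ASCII range.
raw = bytes([0xE4])

# Decode the same byte under three different 8-bit encodings.
decoded = {
    name: raw.decode(name)
    for name in ("iso-8859-1", "iso-8859-5", "iso-8859-7")
}

for name, char in decoded.items():
    print(f"{name}: {char!r} (U+{ord(char):04X})")

# A multibyte encoding: there is no one-byte-per-character mapping.
# Shift_JIS is used here only as a convenient example.
text = "\u65e5\u672c"  # two Japanese characters
encoded = text.encode("shift_jis")
print(f"{len(text)} characters -> {len(encoded)} bytes in Shift_JIS")
```

Text stored in one of these encodings therefore has to travel with a note of which encoding was used, otherwise the receiving end may display the wrong characters.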
Unicode: This is by far the most complete character set ever produced. It contains 96,447 characters from the world’s languages (the total grows with each release of the standard). Unicode comes in many flavours, mostly differentiated by the number of bytes used per character. Popular ones are UTF-8, UCS-2 and UTF-16. UTF-8 is a variable-length encoding using 1-4 bytes per character; its primary applications are XML, XHTML and various other text file formats. UCS-2 is a fixed-width 2-byte encoding and was the native encoding on early NT-based systems. UTF-16 uses 16-bit code units, with a pair of surrogates (4 bytes in total) for characters that do not fit in 16 bits, such as rarer Asian-language characters, mathematical symbols and esoteric scripts.
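The difference between the Unicode encodings is easiest to see by counting bytes. The following Python sketch encodes four sample characters, chosen here for illustration, in UTF-8 and UTF-16; the helper name encoded_lengths is hypothetical. The little-endian UTF-16 codec is used so that no byte order mark is added to the count.

```python
# Sample characters, written as escapes so the source stays pure ASCII.
samples = {
    "LATIN CAPITAL LETTER A": "A",
    "LATIN SMALL LETTER E WITH ACUTE": "\u00e9",
    "EURO SIGN": "\u20ac",
    "MUSICAL SYMBOL G CLEF": "\U0001d11e",  # needs a surrogate pair in UTF-16
}


def encoded_lengths(char: str) -> tuple:
    """Return (bytes in UTF-8, bytes in UTF-16) for a single character."""
    return len(char.encode("utf-8")), len(char.encode("utf-16-le"))


for name, char in samples.items():
    utf8_len, utf16_len = encoded_lengths(char)
    print(f"U+{ord(char):04X} {name}: UTF-8 {utf8_len} bytes, UTF-16 {utf16_len} bytes")
```

The first three characters take 1, 2 and 3 bytes in UTF-8 but always 2 bytes in UTF-16, while the last one needs 4 bytes in both because UTF-16 has to fall back on a surrogate pair.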