Internationalization and localization


In computing, internationalization and localization or internationalisation and localisation, often abbreviated i18n and l10n respectively, are means of adapting computer software to different languages, regional peculiarities and technical requirements of a target locale.
Internationalization is the process of designing a software application so that it can be adapted to various languages and regions without engineering changes. Localization is the process of adapting internationalized software for a specific region or language by translating text and adding locale-specific components.
Localization uses the infrastructure or flexibility provided by internationalization.

Naming

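The terms are frequently abbreviated to the numeronyms i18n (where 18 stands for the number of letters between the first i and the last n in internationalization) and l10n for localization, due to the length of the words. Some writers capitalize the latter term (L10n) to help distinguish the two.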
Some companies, like IBM and Oracle, use the term globalization, g11n, for the combination of internationalization and localization.
Microsoft defines internationalization as a combination of world-readiness and localization. World-readiness is a developer task, which enables a product to be used with multiple scripts and cultures and separates user interface resources in a localizable format.
Hewlett-Packard and HP-UX created a system called "National Language Support" or "Native Language Support" to produce localizable software.
Some vendors, including IBM, use the term National Language Version (NLV) for localized versions of software products supporting only one specific locale. The term implies the existence of other similar NLVs of the software for different markets; this terminology is not used where no internationalization and localization was undertaken and a software product only supports one language and locale in any version.

Scope

According to Software without frontiers, the design aspects to consider when internationalizing a product are "data encoding, data and documentation, software construction, hardware device support, and user interaction"; while the key design areas to consider when making a fully internationalized product from scratch are "user interaction, algorithm design and data formats, software services, and documentation".
Translation is typically the most time-consuming component of language localization. This may involve:
  • For film, video, and audio, translation of spoken words or music lyrics, often using either dubbing or subtitles
  • Text translation for printed materials and digital media
  • Potentially altering images and logos containing text to contain translations or generic icons
  • Layout adjustments, since different translation lengths and differences in character sizes can cause layouts that work well in one language to work poorly in others
  • Consideration of differences in dialect, register or variety
  • Writing conventions like:
      • Formatting of numbers
      • Date and time format, possibly including the use of different calendars
  • Representation conventions like:
      • Projected 3D views
      • Title blocks
      • Dimension styles
  • Other aspects of the product or service that must comply with legal regulations and local technical standards
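As a rough illustration of the writing conventions listed above, the following Python sketch formats one number and one date for two locales. The CONVENTIONS table and the format_number and format_date helpers are hypothetical names introduced for this example, and the table is hand-written; a real application would normally obtain such data from a locale library rather than hard-coding it.

```python
from datetime import date

# Hypothetical, hand-written convention table for illustration only.
CONVENTIONS = {
    "en-US": {"decimal": ".", "group": ",", "date": "%m/%d/%Y"},
    "de-DE": {"decimal": ",", "group": ".", "date": "%d.%m.%Y"},
}


def format_number(value: float, locale_id: str) -> str:
    """Format a number with two decimals using the locale's separators."""
    conv = CONVENTIONS[locale_id]
    # Start from the "1,234.56" shape, then swap in the locale's separators
    # in a single pass so the two replacements cannot interfere.
    text = f"{value:,.2f}"
    table = str.maketrans({",": conv["group"], ".": conv["decimal"]})
    return text.translate(table)


def format_date(day: date, locale_id: str) -> str:
    """Format a date using the locale's numeric field order."""
    return day.strftime(CONVENTIONS[locale_id]["date"])


print(format_number(1234567.891, "en-US"))  # 1,234,567.89
print(format_number(1234567.891, "de-DE"))  # 1.234.567,89
print(format_date(date(2024, 3, 5), "en-US"))  # 03/05/2024
print(format_date(date(2024, 3, 5), "de-DE"))  # 05.03.2024
```

The same date is rendered as 03/05/2024 and 05.03.2024, which shows why purely numeric dates are ambiguous unless the locale is known.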

    Standard locale data

Computer software can encounter differences above and beyond straightforward translation of words and phrases, because computer programs can generate content dynamically. These differences may need to be taken into account by the internationalization process in preparation for translation. Many of these differences are so regular that a conversion between languages can be easily automated. The Common Locale Data Repository by Unicode provides a collection of such differences. Its data is used by major operating systems, including Microsoft Windows, macOS and Debian, and by major Internet companies or projects such as Google and the Wikimedia Foundation. Examples of such differences include:
  • Different "scripts" in different writing systems use different characters – a different set of letters, syllabograms, logograms, or symbols. Modern systems use the Unicode standard to represent many different languages with a single character encoding.
  • Writing direction is left-to-right in most European languages, right-to-left in Hebrew and Arabic, or both in boustrophedon scripts, and optionally vertical in some Asian languages.
  • Complex text layout, for languages where characters change shape depending on context
  • Capitalization exists in some scripts and not in others
  • Different languages and writing systems have different text sorting rules
  • Different languages have different numeral systems, which might need to be supported if Western Arabic numerals are not used
  • Different languages have different pluralization rules, which can complicate programs that dynamically display numerical content. Other grammar rules might also vary, e.g. genitive.
  • Different languages use different punctuation, e.g. quoting text with double quotation marks (" ") as in English, or guillemets (« ») as in French
  • Keyboard shortcuts can only make use of keys available on the keyboard layout being localized for. If a shortcut corresponds to a word in a particular language (e.g. Ctrl-s stands for "save" in English), it may need to be changed.
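The pluralization point above can be sketched in Python. The function and table names below (plural_category, MESSAGES, format_count) are hypothetical, and the rules are a simplified hand-written version covering only non-negative integers in English and Russian; production code would normally take such rules from locale data such as the Common Locale Data Repository rather than encode them by hand.

```python
def plural_category(n: int, language: str) -> str:
    """Return a plural category name for a non-negative integer.

    Simplified integer-only rules, written by hand for illustration.
    """
    if language == "en":
        return "one" if n == 1 else "other"
    if language == "ru":
        if n % 10 == 1 and n % 100 != 11:
            return "one"
        if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
            return "few"
        return "many"
    raise ValueError(f"no plural rules for {language!r}")


# Hypothetical message table: one template per plural category.
MESSAGES = {
    "en": {"one": "{n} file", "other": "{n} files"},
    "ru": {"one": "{n} файл", "few": "{n} файла", "many": "{n} файлов"},
}


def format_count(n: int, language: str) -> str:
    """Pick the template matching the number's plural category."""
    template = MESSAGES[language][plural_category(n, language)]
    return template.format(n=n)


for n in (1, 3, 5, 11, 21):
    print(format_count(n, "en"), "|", format_count(n, "ru"))
```

A program that only distinguishes "1" from "not 1" would produce correct English but incorrect Russian for numbers such as 3 or 21, which is why the choice of form is left to per-language rules.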

    National conventions

Different countries have different economic conventions. In particular, the United States and Europe differ in most of these conventions, and other areas often follow one of the two.
Specific third-party services, such as online maps, weather reports, or payment service providers, might not be available worldwide from the same carriers, or at all.
Time zones vary across the world, and this must be taken into account if a product originally only interacted with people in a single time zone. For internationalization, UTC is often used internally and then converted into a local time zone for display purposes.
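A minimal Python sketch of this pattern keeps the stored timestamp in UTC and converts it only when rendering. The DISPLAY_ZONES table and the to_display helper are hypothetical, and the fixed offsets are an assumption made to keep the example self-contained: they ignore daylight saving time, which a real application would handle through a time zone database.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical display zones using fixed UTC offsets (no daylight saving).
DISPLAY_ZONES = {
    "Tokyo": timezone(timedelta(hours=9)),
    "New York (standard time)": timezone(timedelta(hours=-5)),
}


def to_display(stored_utc: datetime, zone_name: str) -> str:
    """Convert a timezone-aware UTC timestamp to local wall-clock text."""
    local = stored_utc.astimezone(DISPLAY_ZONES[zone_name])
    return local.strftime("%Y-%m-%d %H:%M")


# The value kept internally is always UTC.
event = datetime(2024, 1, 15, 18, 30, tzinfo=timezone.utc)

print(to_display(event, "Tokyo"))  # 2024-01-16 03:30
print(to_display(event, "New York (standard time)"))  # 2024-01-15 13:30
```

The same stored instant falls on different calendar days for the two users, so the conversion has to happen at display time rather than at storage time.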
Different countries have different legal requirements.
Localization may also take into account differences in culture.
To internationalize a product, it is important to look at a variety of markets that the product will foreseeably enter. Details such as the field length for street addresses, country-specific address formats, the ability to make the postal code field optional for countries that do not have postal codes or the state field optional for countries that do not have states, and the introduction of new registration flows that adhere to local laws are just some of the examples that make internationalization a complex project. A broader approach takes into account cultural factors, for example the adaptation of business process logic or the inclusion of individual cultural aspects.
As early as the 1990s, companies such as Bull used machine translation on a large scale for all their translation activity, with human translators handling pre-editing and post-editing.
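The address-form details mentioned above could be modelled as per-country rules rather than fixed form logic. The following Python sketch assumes a hypothetical AddressRules record and RULES table; the three entries are illustrative only and have not been checked against postal authority data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AddressRules:
    postal_code_required: bool
    state_required: bool


# Hypothetical rule table; entries are illustrative assumptions.
RULES = {
    "US": AddressRules(postal_code_required=True, state_required=True),
    "FR": AddressRules(postal_code_required=True, state_required=False),
    "HK": AddressRules(postal_code_required=False, state_required=False),
}


def missing_fields(country: str, address: dict) -> list:
    """Return the required fields that are absent or blank for a country."""
    rules = RULES[country]
    required = ["street", "city"]
    if rules.postal_code_required:
        required.append("postal_code")
    if rules.state_required:
        required.append("state")
    return [name for name in required if not address.get(name, "").strip()]


address = {"street": "1 Example Road", "city": "Example City"}
print(missing_fields("HK", address))  # []
print(missing_fields("FR", address))  # ['postal_code']
print(missing_fields("US", address))  # ['postal_code', 'state']
```

Keeping the rules in data means that entering a new market requires adding a table entry instead of changing the validation code.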

Engineering

Whether re-engineering existing software or designing new internationalized software, the first step of internationalization is to split each potentially locale-dependent part into a separate module. Each module can then either rely on a standard library/dependency or be independently replaced as needed for each locale.
The current prevailing practice is for applications to place text in resource files which are loaded during program execution as needed. These strings, stored in resource files, are relatively easy to translate. Programs are often built to reference resource libraries depending on the selected locale data.
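One way to picture this practice is a loader that reads per-locale resource files at runtime and falls back to a default language for missing strings. The Python sketch below is an assumption-laden illustration: the JSON file layout (one file per locale, such as en.json), the load_strings helper and the fallback order are all invented for the example and do not describe any particular framework.

```python
import json
import tempfile
from pathlib import Path


def load_strings(resource_dir: Path, locale_id: str, default: str = "en") -> dict:
    """Merge resource files from least to most specific.

    For "fr-CA" the lookup order is en.json, fr.json, fr-CA.json, so more
    specific files override general ones and missing keys fall back.
    """
    language = locale_id.split("-")[0]
    strings: dict = {}
    # dict.fromkeys removes duplicates while preserving order.
    for candidate in dict.fromkeys([default, language, locale_id]):
        path = resource_dir / f"{candidate}.json"
        if path.exists():
            strings.update(json.loads(path.read_text(encoding="utf-8")))
    return strings


# Demo: create two small resource files in a temporary directory.
resource_dir = Path(tempfile.mkdtemp())
(resource_dir / "en.json").write_text(
    json.dumps({"greeting": "Hello", "quit": "Quit"}), encoding="utf-8"
)
(resource_dir / "fr.json").write_text(
    json.dumps({"greeting": "Bonjour"}), encoding="utf-8"
)

strings = load_strings(resource_dir, "fr-CA")
print(strings["greeting"])  # Bonjour
print(strings["quit"])      # Quit (untranslated, falls back to English)
```

Because the strings live outside the program code, a translator can add or update a file without the application being rebuilt.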
The storage for translatable and translated strings is sometimes called a message catalog as the strings are called messages. The catalog generally comprises a set of files in a specific localization format and a standard library to handle said format. One software library and format that aids this is gettext.
Thus, to support multiple languages, an application is designed to select the relevant language resource file at runtime. The code that manages data entry verification and many other locale-sensitive data types must also support differing locale requirements. Modern development systems and operating systems include sophisticated libraries for international support of these types; see also Standard locale data above.
Many localization issues require more profound changes in the software than text translation. For example, OpenOffice.org handles such changes with compilation switches.