IETF language tag


An IETF BCP 47 language tag is a standardized code that is used to identify human languages on the internet. The tag structure has been standardized by the Internet Engineering Task Force in Best Current Practice 47; the subtags are maintained by the IANA Language Subtag Registry.
To distinguish language variants for countries, regions, or writing systems, IETF language tags combine subtags from other standards such as ISO 639, ISO 15924, ISO 3166-1 and UN M.49.
For example, the tag en stands for English; es-419 for Latin American Spanish; rm-sursilv for Romansh Sursilvan; sr-Cyrl for Serbian written in Cyrillic script; nan-Hant-TW for Min Nan Chinese using traditional Han characters, as spoken in Taiwan; yue-Hant-HK for Cantonese using traditional Han characters, as spoken in Hong Kong; and gsw-u-sd-chzh for Zürich German.
IETF language tags are used by computing standards such as HTTP, HTML, XML and PNG.

History

IETF language tags were first defined in RFC 1766, edited by Harald Tveit Alvestrand and published in March 1995. The tags used ISO 639 two-letter language codes and ISO 3166 two-letter country codes, and allowed registration of whole tags that included variant or script subtags of three to eight letters.
In January 2001, this was updated by RFC 3066, which added the use of ISO 639-2 three-letter codes, permitted subtags with digits, and adopted the concept of language ranges from HTTP/1.1 to help with matching of language tags.
The next revision of the specification came in September 2006 with the publication of RFC 4646, edited by Addison Phillips and Mark Davis, and RFC 4647. RFC 4646 introduced a more structured format for language tags, added the use of ISO 15924 four-letter script codes and UN M.49 three-digit geographical region codes, and replaced the old registry of tags with a new registry of subtags. The small number of previously defined tags that did not conform to the new structure were grandfathered in order to maintain compatibility with RFC 3066.
The current version of the specification, RFC 5646, was published in September 2009. The main purpose of this revision was to incorporate three-letter codes from ISO 639-3 and 639-5 into the Language Subtag Registry, in order to increase the interoperability between ISO 639 and BCP 47.

Syntax of language tags

Each language tag is composed of one or more "subtags" separated by hyphens. Each subtag is composed of basic Latin letters or digits only.
With the exceptions of private-use language tags beginning with an x- prefix and grandfathered language tags, subtags occur in the following order:
  • A single primary language subtag based on a two-letter language code from ISO 639-1 or a three-letter code from ISO 639-2, ISO 639-3 or ISO 639-5, or registered through the BCP 47 process and composed of five to eight letters;
  • Up to three optional extended language subtags composed of three letters each, separated by hyphens;
  • An optional script subtag, based on a four-letter script code from ISO 15924;
  • An optional region subtag based on a two-letter country code from ISO 3166-1 alpha-2, or a three-digit code from UN M.49 for geographical regions;
  • Optional variant subtags, separated by hyphens, each composed of five to eight letters or digits, or of four characters starting with a digit;
  • Optional extension subtags, separated by hyphens, each composed of a single character, with the exception of the letter x, and a hyphen followed by one or more subtags of two to eight characters each, separated by hyphens;
  • An optional private-use subtag, composed of the letter x and a hyphen followed by subtags of one to eight characters each, separated by hyphens.
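The subtag order listed above can be sketched as a regular expression. The following Python example is a simplified illustration, not a conformant validator: it checks shape only, ignores grandfathered tags, and does not consult the Language Subtag Registry. The names LANGTAG and parse_language_tag are hypothetical and chosen for this example.

```python
import re

# Simplified pattern for the subtag order described in the list above.
# Assumption: grandfathered tags and registry lookups are out of scope.
LANGTAG = re.compile(
    r"""
    (?:
        (?P<language>[a-z]{2,3}|[a-z]{5,8})
        (?P<extlang>(?:-[a-z]{3}){0,3})
        (?:-(?P<script>[a-z]{4}))?
        (?:-(?P<region>[a-z]{2}|[0-9]{3}))?
        (?P<variants>(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*)
        (?P<extensions>(?:-[0-9a-wy-z](?:-[a-z0-9]{2,8})+)*)
        (?:-(?P<private>x(?:-[a-z0-9]{1,8})+))?
      |
        (?P<private_only>x(?:-[a-z0-9]{1,8})+)
    )
    """,
    re.VERBOSE | re.IGNORECASE,
)


def parse_language_tag(tag):
    """Split a tag into named parts, or return None if it does not fit the pattern."""
    match = LANGTAG.fullmatch(tag)
    if match is None:
        return None
    if match.group("private_only"):
        # Whole tag is private use, e.g. "x-whatever".
        return {"private": match.group("private_only")}

    def pieces(name):
        # Repeated groups are captured with their leading hyphens; split them out.
        return [p for p in match.group(name).split("-") if p]

    return {
        "language": match.group("language"),
        "extlang": pieces("extlang"),
        "script": match.group("script"),
        "region": match.group("region"),
        "variants": pieces("variants"),
        "extensions": match.group("extensions").lstrip("-") or None,
        "private": match.group("private"),
    }
```

With this sketch, parse_language_tag("nan-Hant-TW") reports the language nan, the script Hant and the region TW, while a malformed string such as "en--US" yields None. A well-formed result only means the tag has a valid shape; whether each subtag is actually registered has to be checked against the Language Subtag Registry.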
Subtags are not case-sensitive, but the specification recommends using the same case as in the Language Subtag Registry, where region subtags are UPPERCASE, script subtags are Title Case, and all other subtags are lowercase. This capitalization follows the recommendations of the underlying ISO standards.
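The recommended capitalization can be approximated without consulting the registry. The sketch below follows the positional rule given in RFC 5646: everything is lowercase, except that two-letter subtags are uppercased and four-letter subtags are title-cased when they are neither the first subtag nor located after a single-character subtag. The function name format_case is hypothetical, and the input is assumed to be an already well-formed tag.

```python
def format_case(tag):
    """Return the tag with the conventional BCP 47 capitalization applied."""
    formatted = []
    after_singleton = False  # True once an extension or private-use singleton is seen
    for index, subtag in enumerate(tag.split("-")):
        subtag = subtag.lower()
        if index > 0 and not after_singleton:
            if len(subtag) == 2:
                subtag = subtag.upper()       # region, e.g. "TW"
            elif len(subtag) == 4:
                subtag = subtag.capitalize()  # script, e.g. "Hant"
        if len(subtag) == 1:
            after_singleton = True
        formatted.append(subtag)
    return "-".join(formatted)
```

For example, format_case("ZH-hant-tw") gives "zh-Hant-TW", and format_case("EN-ca-X-CA") gives "en-CA-x-ca", since subtags after the private-use singleton x stay lowercase. Because tags are not case-sensitive, comparisons should still be done case-insensitively rather than relying on this formatting.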
Script and region subtags should be omitted when they add no distinguishing information to a language tag. For example, es is preferred over es-Latn, as Spanish is fully expected to be written in the Latin script; ja is preferred over ja-JP, as Japanese as used in Japan does not differ markedly from Japanese as used elsewhere.
Not all linguistic regions can be represented with a valid region subtag: the subnational regional dialects of a primary language are registered as variant subtags. For example, the valencia variant subtag for the Valencian variant of Catalan is registered in the Language Subtag Registry with the prefix ca. As this dialect is spoken almost exclusively in Spain, the region subtag ES can normally be omitted.
Furthermore, some script subtags do not refer to traditional scripts such as Latin, or to scripts at all; these usually begin with a Z. For example, Zsye refers to emojis, Zmth to mathematical notation, Zxxx to unwritten documents and Zyyy to undetermined scripts.
IETF language tags have been used as locale identifiers in many applications. It may be necessary for these applications to establish their own strategy for defining, encoding and matching locales if the strategy described in RFC 4647 is not adequate.
The use, interpretation and matching of IETF language tags are currently defined in RFC 5646 and RFC 4647. The Language Subtag Registry lists all currently valid public subtags. Private-use subtags are not included in the Registry as they are implementation-dependent and subject to private agreements between third parties using them. These private agreements are out of scope of BCP 47.
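As an illustration of the matching defined in RFC 4647, the following Python sketch implements two of its schemes in simplified form: basic filtering, which returns every tag that equals a language range or begins with it followed by a hyphen, and lookup, which progressively truncates the range from the end until one tag matches. The function names basic_filter and lookup and the default parameter are hypothetical, and details such as priority lists of several ranges are left out.

```python
def basic_filter(language_range, tags):
    """Return all tags matching the range by case-insensitive prefix comparison."""
    wanted = language_range.lower()
    if wanted == "*":
        return list(tags)  # the wildcard range matches every tag
    return [
        tag for tag in tags
        if tag.lower() == wanted or tag.lower().startswith(wanted + "-")
    ]


def lookup(language_range, tags, default=None):
    """Return the single best tag by truncating the range from the end."""
    available = {tag.lower(): tag for tag in tags}
    subtags = language_range.lower().split("-")
    while subtags:
        candidate = "-".join(subtags)
        if candidate in available:
            return available[candidate]
        subtags.pop()
        # A single-character subtag is removed together with the subtag after it.
        while subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return default
```

Here basic_filter("de", ["de", "de-CH", "den", "en"]) keeps de and de-CH but not den, because the prefix must end at a hyphen. lookup("zh-Hant-CN-x-private", ["zh", "zh-Hant", "en"]) falls back step by step to zh-Hant. Applications with different requirements, such as locale fallback chains, may need their own strategy, as noted above.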

List of common primary language subtags

The following is a list of some of the more commonly used primary language subtags. The list represents only a small subset of primary language subtags; for full information, the Language Subtag Registry should be consulted directly.
English name | Native name | Subtag
Afrikaans | Afrikaans | af
Amharic | አማርኛ | am
Arabic | | ar
Mapudungun | Mapudungun | arn
Moroccan Arabic | | ary
Assamese | অসমীয়া | as
Azerbaijani | Azərbaycan | az
Bashkir | Башҡорт | ba
Belarusian | беларуская | be
Bulgarian | български | bg
Bengali | বাংলা | bn
Tibetan | བོད་ཡིག | bo
Breton | brezhoneg | br
Bosnian | bosanski/босански | bs
Catalan | català | ca
Central Kurdish | | ckb
Corsican | Corsu | co
Czech | čeština | cs
Welsh | Cymraeg | cy
Danish | dansk | da
German | Deutsch | de
Lower Sorbian | dolnoserbšćina | dsb
Divehi | | dv
Greek | Ελληνικά | el
English | English | en
Spanish | español | es
Estonian | eesti | et
Basque | euskara | eu
Persian | | fa
Finnish | suomi | fi
Filipino | Filipino | fil
Faroese | føroyskt | fo
French | français | fr
Frisian | Frysk | fy
Irish | Gaeilge | ga
Scottish Gaelic | Gàidhlig | gd
Gilbertese | Taetae ni Kiribati | gil
Galician | galego | gl
Swiss German | Schweizerdeutsch | gsw
Gujarati | ગુજરાતી | gu
Hausa | Hausa | ha
Hebrew | | he
Hindi | हिंदी | hi
Croatian | hrvatski | hr
Upper Sorbian | hornjoserbšćina | hsb
Hungarian | magyar | hu
Armenian | Հայերեն | hy
Indonesian | Bahasa Indonesia | id
Igbo | Igbo | ig
Yi | ꆈꌠꁱꂷ | ii
Icelandic | íslenska | is
Italian | italiano | it
Inuktitut | Inuktitut/ᐃᓄᒃᑎᑐᑦ | iu
Japanese | 日本語 | ja
Georgian | ქართული | ka
Kazakh | Қазақша | kk
Greenlandic | kalaallisut | kl
Khmer | ខ្មែរ | km
Kannada | ಕನ್ನಡ | kn
Korean | 한국어 | ko
Konkani | कोंकणी | kok
Kurdish | Kurdî | ku
Kyrgyz | Кыргыз | ky
Luxembourgish | Lëtzebuergesch | lb
Lao | ລາວ | lo
Lithuanian | lietuvių | lt
Latvian | latviešu | lv
Maori | Reo Māori | mi
Macedonian | македонски јазик | mk
Malayalam | മലയാളം | ml
Mongolian | Монгол хэл/ᠮᠤᠨᠭᠭᠤᠯ ᠬᠡᠯᠡ | mn
Mohawk | Kanien'kéha | moh
Marathi | मराठी | mr
Malay | Bahasa Malaysia | ms
Maltese | Malti | mt
Burmese | မြန်မာဘာသာ | my
Norwegian (Bokmål) | norsk (bokmål) | nb
Nepali | नेपाली | ne
Dutch | Nederlands | nl
Norwegian (Nynorsk) | norsk (nynorsk) | nn
Norwegian | norsk | no
Occitan | occitan | oc
Odia | ଓଡ଼ିଆ | or
Papiamento | Papiamentu | pap
Punjabi | ਪੰਜਾਬੀ | pa
Polish | polski | pl
Dari | | prs
Pashto | | ps
Portuguese | português | pt
K'iche | K'iche | quc
Quechua | runasimi | qu
Romansh | Rumantsch | rm
Romanian | română | ro
Russian | русский | ru
Kinyarwanda | Kinyarwanda | rw
Sanskrit | संस्कृत | sa
Yakut | саха | sah
Sindhi | | sd
Sami (Northern) | davvisámegiella | se
Sinhala | සිංහල | si
Slovak | slovenčina | sk
Slovenian | slovenščina | sl
Sami (Southern) | åarjelsaemiengiele | sma
Sami (Lule) | julevusámegiella | smj
Sami (Inari) | sämikielâ | smn
Sami (Skolt) | sääʹmǩiõll | sms
Albanian | shqip | sq
Serbian | srpski/српски | sr
Sesotho | Sesotho | st
Swedish | svenska | sv
Kiswahili | Kiswahili | sw
Syriac | | syc
Tamil | தமிழ் | ta
Telugu | తెలుగు | te
Tajik | Тоҷикӣ | tg
Thai | ไทย | th
Turkmen | türkmençe | tk
Tagalog | Tagalog | tl
Tswana | Setswana | tn
Turkish | Türkçe | tr
Tatar | Татарча | tt
Tamazight | Tamazight | tzm
Uyghur | | ug
Ukrainian | українська | uk
Urdu | | ur
Uzbek | Uzbek/Ўзбек | uz
Vietnamese | Tiếng Việt | vi
Wolof | Wolof | wo
Xhosa | isiXhosa | xh
Yiddish | יידיש | yi
Yoruba | Yoruba | yo
Chinese | 中文 | zh
Zulu | isiZulu | zu