Google Translate


Google Translate is a multilingual neural machine translation service developed by Google to translate text, documents and websites from one language into another. It offers a website interface, a mobile app for Android and iOS, and an API that helps developers build browser extensions and software applications. Google Translate supports 249 languages and language varieties at various levels. It served over 200 million people daily in May 2013, and had over 500 million total users as of 2016, with more than 100 billion words translated daily.
Launched in April 2006 as a statistical machine translation service, it originally used United Nations and European Parliament documents and transcripts to gather linguistic data. Rather than translating languages directly, it first translated text to English and then pivoted to the target language in most of the language combinations it posited in its grid, with a few exceptions including Catalan–Spanish. During a translation, it looked for patterns in millions of documents to help decide which words to choose and how to arrange them in the target language. In recent years, it has used a deep learning model to power its translations. Its accuracy, which has been criticized on several occasions, has been measured to vary greatly across languages. In November 2016, Google announced that Google Translate would switch to a neural machine translation engine – Google Neural Machine Translation – which translated "whole sentences at a time, rather than just piece by piece. It uses this broader context to help it figure out the most relevant translation, which it then rearranges and adjusts to be more like a human speaking with proper grammar".

History

Google Translate is a web-based, free-to-use translation service launched by Google in April 2006. It translates multiple forms of text and media such as words, phrases and webpages.
Originally, Google Translate was released as a statistical machine translation service. The input text had to be translated into English first before being translated into the selected language. Since SMT uses predictive algorithms to translate text, it had poor grammatical accuracy. Despite this, Google initially did not hire experts to resolve this limitation due to the ever-evolving nature of language.
Google introduced an Android app in January 2010 and an iOS version in February 2011 to serve as a portable personal interpreter. As of February 2010, it was integrated into browsers such as Chrome and was able to pronounce the translated text, automatically recognize words in a picture and spot unfamiliar text and languages.
In May 2014, Google acquired Word Lens to improve the quality of visual and voice translation. The app is able to scan text or a picture using the device and have it translated instantly. Moreover, the system automatically identifies foreign languages and translates speech without requiring individuals to tap the microphone button whenever speech translation is needed.
In November 2016, Google transitioned its translating method to a system called neural machine translation. It uses deep learning techniques to translate whole sentences at a time, which has been measured to be more accurate between English and French, German, Spanish, and Chinese. No measurement results have been provided by Google researchers for GNMT from English to other languages, other languages to English, or between language pairs that do not include English. As of 2018, it translates more than 100 billion words a day.
In 2017, Google Translate was used during a court hearing when court officials at Teesside Magistrates' Court failed to book an interpreter for the Chinese defendant.
A petition for Google to add Cree to Google Translate was created in 2021, but it was not one of the languages in development at the time of the Translate Community's closure.
At the end of September 2022, Google Translate was discontinued in mainland China, which Google said was due to "low usage".
In 2024, a record 110 languages were added, including Cantonese, Tok Pisin and some regional languages in Russia such as Bashkir, Chechen, Ossetian and Crimean Tatar. The languages were added with the help of the PaLM 2 generative AI model.

Functions

Google Translate can translate multiple forms of text and media, which includes text, speech, and text within still or moving images. Specifically, its functions include:
  • Written Words Translation: a function that translates written words or text to a foreign language.
  • Website Translation: a function that translates a whole webpage to selected languages.
  • Document Translation: a function that translates a document uploaded by the users to selected languages. The documents should be in one of the following formats: .doc, .docx, .odf, .pdf, .ppt, .pptx, .ps, .rtf, .txt, .xls, .xlsx.
  • Speech Translation: a function that instantly translates spoken language into the selected foreign language.
  • Mobile App Translation: in 2016, Google introduced a Google Translate feature called "Tap to Translate", which made instant translation accessible inside any app without exiting or switching it.
  • Image Translation: a function that identifies text in a picture taken by the users and instantly translates it on the screen.
  • Handwritten Translation: a function that translates text that is handwritten on the phone screen or drawn on a virtual keyboard without the support of a keyboard.
  • Bilingual Conversation Translation: a function that translates conversations in multiple languages.
  • Transcription: a function that transcribes speech in different languages.
For most of its features, Google Translate provides pronunciation, dictionary entries, and audio playback of the translation. Additionally, Google Translate has introduced its own Translate app, so translation is available on a mobile phone in offline mode.

Features

Web interface

Google Translate produces approximations across languages of multiple forms of text and media, including text, speech, websites, or text on display in still or live video images. For some languages, Google Translate can synthesize speech from text, and in certain pairs it is possible to highlight specific corresponding words and phrases between the source and target text. Results are sometimes shown with dictionary information below the translation box, but it is not a dictionary and has been shown to invent translations in all languages for words it does not recognize. If "Detect language" is selected, text in an unknown language can be automatically identified. In the web interface, users can suggest alternate translations, such as for technical terms, or correct mistakes. These suggestions may be included in future updates to the translation process. If a user enters a URL in the source text, Google Translate will produce a hyperlink to a machine translation of the website. Users can save translation proposals in a "phrasebook" for later use, and a shareable URL is generated for each translation. For some languages, text can be entered via an on-screen keyboard, handwriting recognition, or speech recognition. It is possible to enter searches in a source language that are first translated to a destination language, allowing one to browse and interpret results from the selected destination language in the source language.
Texts written in the Arabic, Cyrillic, Devanagari and Greek scripts can be automatically transliterated from their phonetic equivalents written in the Latin alphabet. The browser version of Google Translate provides the option to show phonetic equivalents of text translated from Japanese to English. The same option is not available on the paid API version.
Many of the more popular languages have a "text-to-speech" audio function that is able to read back a text in that language, up to several hundred words or so. In the case of pluricentric languages, the accent depends on the region: for English, in the Americas, most of the Asia–Pacific and West Asia, the audio uses a female General American accent, whereas in Europe, Hong Kong, Malaysia, Singapore, Guyana and all other parts of the world, a female British accent is used, except for a special General Australian accent used in Australia, New Zealand and Norfolk Island, and an Indian English accent used in India; for Spanish, in the Americas, a Latin American accent is used, while in other parts of the world, a Castilian accent is used; for French, a Quebec accent is used in Canada, while in other parts of the world, a standard European accent is used; for Bengali, a male Bangladeshi accent is used, except in India, where a special female Indian Bengali accent is used instead. Until March 2023, some less widely spoken languages used the open-source eSpeak synthesizer for their speech, producing a robotic, awkward voice that may be difficult to understand.

Browser integration

Google Translate is available in some web browsers as an optional downloadable extension that can run the translation engine and allows right-click command access to the translation service. In February 2010, Google Translate was integrated into the Google Chrome browser by default, for optional automatic webpage translation.

Mobile app

The Google Translate app for Android and iOS can propose translations for 37 languages via photo, 32 via voice in "conversation mode", and 27 via live video imagery in "augmented reality mode".
The Android app was released in January 2010, and the iOS app on February 8, 2011, after an HTML5 web application was released for iOS users in August 2008. The Android app is compatible with devices running at least Android 2.1, while the iOS app is compatible with iPod Touches, iPads and iPhones updated to iOS 7.0+.
A January 2011 Android version experimented with a "Conversation Mode" that aims to allow users to communicate fluidly with a nearby person in another language. Originally limited to English and Spanish, the feature received support for 12 new languages, still in testing, the following October.
The 'Camera input' functionality allows users to take a photograph of a document, signboard, etc. Google Translate recognises the text from the image using optical character recognition technology and gives the translation. Camera input is not available for all languages.
In January 2015, the apps gained the ability to propose translations of physical signs in real time using the device's camera, as a result of Google's acquisition of the Word Lens app. The original January launch only supported seven languages, but a July update added support for 20 new languages, with the release of a new implementation that utilizes neural networks, and also enhanced the speed and quality of Conversation Mode translations. The feature was subsequently renamed Instant Camera. The technology underlying Instant Camera combines image processing and optical character recognition, then attempts to produce cross-language equivalents using standard Google Translate estimations for the text as it is perceived.
On May 11, 2016, Google introduced Tap to Translate for Google Translate for Android. Upon highlighting text in an app that is in a foreign language, Translate will pop up inside of the app and offer translations.

API

On May 26, 2011, Google announced that the Google Translate API for software developers had been deprecated and would cease functioning. The Translate API page stated the reason as "substantial economic burden caused by extensive abuse" with an end date set for December 1, 2011. In response to public pressure, Google announced in June 2011 that the API would continue to be available as a paid service.
Because the API was used in numerous third-party websites and apps, the original decision to deprecate it led some developers to criticize Google and question the viability of using Google APIs in their products.

Google Assistant

Google Translate also provides translations for Google Assistant and the devices that Google Assistant runs on such as Google Nest and Pixel Buds.

Supported languages

The following 249 languages, dialects and language varieties written in different scripts are supported by Google Translate.
  1. Abkhaz
  2. Acehnese
  3. Acholi
  4. Afar
  5. Afrikaans
  6. Albanian
  7. Alur
  8. Amharic
  9. Arabic
  10. Armenian
  11. Assamese
  12. Avar
  13. Awadhi
  14. Aymara
  15. Azerbaijani
  16. Balinese
  17. Baluchi
  18. Bambara
  19. Baoulé
  20. Bashkir
  21. Basque
  22. Batak Karo
  23. Batak Simalungun
  24. Batak Toba
  25. Belarusian
  26. Bemba
  27. Bengali
  28. Betawi
  29. Bhojpuri
  30. Bikol
  31. Bosnian
  32. Breton
  33. Bulgarian
  34. Buryat
  35. Cantonese
  36. Catalan
  37. Cebuano
  38. Chamorro
  39. Chechen
  40. Chichewa
  41. Chinese
  42. Chinese
  43. Chuukese
  44. Chuvash
  45. Corsican
  46. Crimean Tatar
  47. Crimean Tatar
  48. Croatian
  49. Czech
  50. Danish
  51. Dari
  52. Dhivehi
  53. Dinka
  54. Dogri
  55. Dombe
  56. Dutch
  57. Dyula
  58. Dzongkha
  59. English
  60. Esperanto
  61. Estonian
  62. Ewe
  63. Faroese
  64. Fijian
  65. Filipino
  66. Finnish
  67. Fon
  68. French
  69. French
  70. Frisian
  71. Friulian
  72. Fulani
  73. Ga
  74. Galician
  75. Georgian
  76. German
  77. Greek
  78. Guarani
  79. Gujarati
  80. Haitian Creole
  81. Hakha Chin
  82. Hausa
  83. Hawaiian
  84. Hebrew
  85. Hiligaynon
  86. Hindi
  87. Hmong
  88. Hungarian
  89. Hunsrik
  90. Iban
  91. Icelandic
  92. Igbo
  93. Ilocano
  94. Indonesian
  95. Inuktut
  96. Inuktut
  97. Irish
  98. Italian
  99. Jamaican Patois
  100. Japanese
  101. Javanese
  102. Jingpo
  103. Kalaallisut
  104. Kannada
  105. Kanuri
  106. Kapampangan
  107. Kazakh
  108. Khasi
  109. Khmer
  110. Kiga
  111. Kikongo
  112. Kinyarwanda
  113. Kituba
  114. Kokborok
  115. Komi
  116. Konkani
  117. Korean
  118. Krio
  119. Kurdish
  120. Kurdish
  121. Kyrgyz
  122. Lao
  123. Latgalian
  124. Latin
  125. Latvian
  126. Ligurian
  127. Limburgish
  128. Lingala
  129. Lithuanian
  130. Lombard
  131. Luganda
  132. Luo
  133. Luxembourgish
  134. Macedonian
  135. Madurese
  136. Maithili
  137. Makassar
  138. Malagasy
  139. Malay
  140. Malay
  141. Malayalam
  142. Maltese
  143. Mam
  144. Manx
  145. Maori
  146. Marathi
  147. Marshallese
  148. Marwadi
  149. Mauritian Creole
  150. Meadow Mari
  151. Meiteilon (Manipuri)
  152. Minang
  153. Mizo
  154. Mongolian
  155. Myanmar (Burmese)
  156. Nahuatl (Eastern Huasteca)
  157. Ndau
  158. Ndebele (South)
  159. Nepalbhasa (Newari)
  160. Nepali
  161. NKo
  162. Norwegian
  163. Nuer
  164. Occitan
  165. Odia (Oriya)
  166. Oromo
  167. Ossetian
  168. Pangasinan
  169. Papiamento
  170. Pashto
  171. Persian
  172. Polish
  173. Portuguese
  174. Portuguese
  175. Punjabi
  176. Punjabi
  177. Quechua
  178. Qʼeqchiʼ
  179. Romani
  180. Romanian
  181. Rundi
  182. Russian
  183. Sami (North)
  184. Samoan
  185. Sango
  186. Sanskrit
  187. Santali
  188. Santali
  189. Scots Gaelic
  190. Sepedi
  191. Serbian
  192. Sesotho
  193. Seychellois Creole
  194. Shan
  195. Shona
  196. Sicilian
  197. Silesian
  198. Sindhi
  199. Sinhala
  200. Slovak
  201. Slovenian
  202. Somali
  203. Spanish
  204. Sundanese
  205. Susu
  206. Swahili
  207. Swati
  208. Swedish
  209. Tahitian
  210. Tajik
  211. Tamazight
  212. Tamazight
  213. Tamil
  214. Tatar
  215. Telugu
  216. Tetum
  217. Thai
  218. Tibetan
  219. Tigrinya
  220. Tiv
  221. Tok Pisin
  222. Tongan
  223. Tshiluba
  224. Tsonga
  225. Tswana
  226. Tulu
  227. Tumbuka
  228. Turkish
  229. Turkmen
  230. Tuvan
  231. Twi
  232. Udmurt
  233. Ukrainian
  234. Urdu
  235. Uyghur
  236. Uzbek
  237. Venda
  238. Venetian
  239. Vietnamese
  240. Waray
  241. Welsh
  242. Wolof
  243. Xhosa
  244. Yakut
  245. Yiddish
  246. Yoruba
  247. Yucatec Maya
  248. Zapotec
  249. Zulu

Stages


  • 1st stage
      • English to and from French
      • English to and from German
      • English to and from Spanish
  • 2nd stage
      • English to and from Portuguese
  • 3rd stage
      • English to and from Italian
  • 4th stage
      • English to and from Chinese
      • English to and from Japanese
      • English to and from Korean
  • 5th stage
      • English to and from Arabic
  • 6th stage
      • English to and from Russian
  • 7th stage
      • English to and from Chinese
      • Chinese
  • 8th stage
      • English to and from Dutch
      • English to and from Greek
  • 9th stage
      • English to and from Hindi
  • 10th stage
      • Bulgarian
      • Croatian
      • Czech
      • Danish
      • Finnish
      • Norwegian
      • Polish
      • Romanian
      • Swedish
  • 11th stage
      • Catalan
      • Filipino
      • Hebrew
      • Indonesian
      • Latvian
      • Lithuanian
      • Serbian
      • Slovak
      • Slovene
      • Ukrainian
      • Vietnamese
  • 12th stage
      • Albanian
      • Estonian
      • Galician
      • Hungarian
      • Maltese
      • Thai
      • Turkish
  • 13th stage
      • Persian
  • 14th stage
      • Afrikaans
      • Belarusian
      • Icelandic
      • Irish
      • Macedonian
      • Malay
      • Swahili
      • Welsh
      • Yiddish
  • 15th stage
      • The Beta stage is finished. Users can now choose to have the romanization written for Belarusian, Bulgarian, Chinese, Greek, Hindi, Japanese, Korean, Russian, Thai and Ukrainian. For translations from Arabic, Hindi and Persian, the user can enter a Latin transliteration of the text and the text will be transliterated to the native script for these languages as the user is typing. The text can now be read by a text-to-speech program in English, French, German and Italian.
  • 16th stage
      • Haitian Creole
  • 17th stage
      • Speech program launched in Hindi and Spanish.
  • 18th stage
      • Speech program launched in Afrikaans, Albanian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, Finnish, Greek, Hungarian, Icelandic, Indonesian, Latvian, Macedonian, Norwegian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Swahili, Swedish, Turkish, Vietnamese and Welsh.
  • 19th stage
      • Armenian
      • Azerbaijani
      • Basque
      • Georgian
      • Urdu
  • 20th stage
      • Provides romanization for Arabic.
  • 21st stage
      • Allows phonetic typing for Arabic, Greek, Hindi, Persian, Russian, Serbian and Urdu.
      • Latin
  • 22nd stage
      • Romanization of Arabic removed.
      • Spell check added.
      • For some languages, Google replaced text-to-speech synthesizers from eSpeak's robot voice to native speaker's nature voice technologies made by SVOX, and also the old versions of French, German, Italian and Spanish; Latin uses the same synthesizer as Italian.
      • Speech program launched in Arabic, Japanese and Korean.
  • 23rd stage
      • Choice of different translations for a word.
  • 24th stage
      • 5 new Indic languages and a transliterated input method: Bengali, Gujarati, Kannada, Tamil and Telugu
  • 25th stage
      • Translation rating introduced.
  • 26th stage
      • Dutch male voice synthesizer replaced with female.
      • Elena by SVOX replaced the Slovak eSpeak voice.
      • Transliteration of Yiddish added.
  • 27th stage
      • Speech program launched in Thai.
      • Esperanto
  • 28th stage
      • Lao
  • 29th stage
      • Transliteration of Lao added.
  • 30th stage
      • New speech program launched in English.
  • 31st stage
      • New speech program in French, German, Italian, Latin and Spanish.
  • 32nd stage
      • Phrasebook added.
  • 33rd stage
      • Khmer
  • 34th stage
      • Bosnian
      • Cebuano
      • Hmong
      • Javanese
      • Marathi
  • 35th stage
      • 16 additional languages can be used with camera-input: Bulgarian, Catalan, Croatian, Danish, Estonian, Finnish, Hungarian, Indonesian, Icelandic, Latvian, Lithuanian, Norwegian, Romanian, Slovak, Slovenian and Swedish.
  • 36th stage
      • Hausa
      • Igbo
      • Maori
      • Mongolian
      • Nepali
      • Punjabi
      • Somali
      • Yoruba
      • Zulu
  • 37th stage
      • Definition of words added.
  • 38th stage
      • Burmese
      • Chewa
      • Kazakh
      • Malagasy
      • Malayalam
      • Sinhala
      • Sotho
      • Sundanese
      • Tajik
      • Uzbek
  • 39th stage
      • Transliteration of Arabic restored.
  • 40th stage
      • Aurebesh
  • 41st stage
      • Aurebesh removed.
      • Speech program launched in Bengali.
      • Amharic
      • Corsican
      • Hawaiian
      • Kurdish
      • Kyrgyz
      • Luxembourgish
      • Pashto
      • Samoan
      • Scottish Gaelic
      • Shona
      • Sindhi
      • West Frisian
      • Xhosa
  • 42nd stage
      • Speech program launched in Ukrainian.
  • 43rd stage
      • Speech program launched in Khmer and Sinhala.
  • 44th stage
      • Speech program launched in Burmese, Malayalam, Marathi, Nepali and Telugu.
  • 45th stage
      • Speech program launched in Gujarati, Kannada and Urdu.
  • 46th stage
      • Kinyarwanda
      • Odia
      • Tatar
      • Turkmen
      • Uyghur
  • 47th stage
      • Speech program launched in Afrikaans, Bulgarian, Catalan, Icelandic, Latvian, and Serbian.
      • New speech system for several languages.
  • 48th stage
      • Speech program launched in Hebrew.
  • 49th stage
      • Assamese
      • Aymara
      • Bambara
      • Bhojpuri
      • Dogri
      • Ewe
      • Guarani
      • Ilocano
      • Konkani
      • Krio
      • Kurdish
      • Lingala
      • Luganda
      • Maithili
      • Maldivian
      • Meitei
      • Mizo
      • Northern Sotho
      • Oromo
      • Quechua
      • Sanskrit
      • Tigrinya
      • Tsonga
      • Twi
      • eSpeak voice synthesizer removed from Armenian, Esperanto, Macedonian and Welsh.
  • 50th stage
      • Speech program launched in Albanian, Bosnian and Swahili.
      • New speech program launched in Malayalam, Marathi and Tamil.
  • 51st stage
      • Speech program launched in Croatian.
  • 52nd stage
      • Speech program launched in Welsh.
      • New speech programs launched in Chinese, German, Indonesian, Malay, Malayalam, Tamil, and Telugu.
  • 53rd stage
      • Speech program launched in Lithuanian and Punjabi.
  • 54th stage
      • Abkhaz
      • Acehnese
      • Acholi
      • Afar
      • Alur
      • Avar
      • Awadhi
      • Balinese
      • Baluchi
      • Baoulé
      • Bashkir
      • Batak Karo
      • Batak Simalungun
      • Batak Toba
      • Bemba
      • Betawi
      • Bikol
      • Breton
      • Buryat
      • Cantonese
      • Chamorro
      • Chechen
      • Chuukese
      • Chuvash
      • Crimean Tatar
      • Dari
      • Dinka
      • Dombe
      • Dyula
      • Dzongkha
      • Faroese
      • Fijian
      • Fon
      • Friulian
      • Fulani
      • Ga
      • Hakha Chin
      • Hiligaynon
      • Hunsrik
      • Iban
      • Jamaican Patois
      • Jingpo
      • Kalaallisut
      • Kanuri
      • Kapampangan
      • Khasi
      • Kiga
      • Kikongo
      • Kituba
      • Kokborok
      • Komi
      • Latgalian
      • Ligurian
      • Limburgish
      • Lombard
      • Luo
      • Madurese
      • Makassar
      • Malay
      • Mam
      • Manx
      • Marshallese
      • Marwadi
      • Mauritian Creole
      • Meadow Mari
      • Minang
      • Nahuatl (Eastern Huasteca)
      • Ndau
      • Ndebele (South)
      • Nepalbhasa (Newari)
      • NKo
      • Nuer
      • Occitan
      • Ossetian
      • Pangasinan
      • Papiamento
      • Portuguese
      • Punjabi
      • Qʼeqchiʼ
      • Romani
      • Rundi
      • Sami (North)
      • Sango
      • Santali
      • Seychellois Creole
      • Shan
      • Sicilian
      • Silesian
      • Susu
      • Swati
      • Tahitian
      • Tamazight
      • Tamazight
      • Tetum
      • Tibetan
      • Tiv
      • Tok Pisin
      • Tongan
      • Tswana
      • Tulu
      • Tumbuka
      • Tuvan
      • Udmurt
      • Venda
      • Venetian
      • Waray
      • Wolof
      • Yakut
      • Yucatec Maya
      • Zapotec
      • Speech program launched in Amharic, Bulgarian, Cantonese, Galician, Hausa, and Welsh.
  • 55th stage
      • Crimean Tatar
      • French
      • Inuktut
      • Inuktut
      • Santali
      • Tshiluba

Translation methodology

In April 2006, Google Translate launched with a statistical machine translation engine.
Google Translate does not apply grammatical rules, since its algorithms are based on statistical or pattern analysis rather than traditional rule-based analysis. The system's original creator, Franz Josef Och, has criticized the effectiveness of rule-based algorithms in favor of statistical approaches. Original versions of Google Translate were based on a method called statistical machine translation, and more specifically, on research by Och who won the DARPA contest for speed machine translation in 2003. Och was the head of Google's machine translation group until leaving to join Human Longevity, Inc. in July 2014.
Google Translate often does not translate directly from one language to another. Instead, it translates first to English and then to the target language. However, because English, like all human languages, is ambiguous and depends on context, this can cause translation errors. For example, translating vous from French to Russian gives vous → you → ты OR Вы/вы. If Google were using an unambiguous, artificial language as the intermediary, it would be vous → you → Вы/вы OR tu → thou → ты. Such a suffixing of words disambiguates their different meanings. Hence, publishing in English, using unambiguous words, providing context, or using expressions such as "you all" may or may not make a better one-step translation depending on the target language.
Some languages do not have a direct Google translation to or from English. These languages are translated through an intermediate language in addition to English.
According to Och, a solid base for developing a usable statistical machine translation system for a new pair of languages from scratch would consist of a bilingual text corpus of more than 150–200 million words, and two monolingual corpora each of more than a billion words. Statistical models from these data are then used to translate between those languages. To acquire this huge amount of linguistic data, Google used United Nations and European Parliament documents and transcripts. The UN typically publishes documents in all six official UN languages, which has produced a very large 6-language corpus. Google representatives have been involved with domestic conferences in Japan where they have solicited bilingual data from researchers.
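The information lost in the pivot step can be illustrated with a toy sketch. The dictionaries and function names below are hypothetical and hold only the words from the example above; they are not Google data or Google's implementation.

```python
# Toy dictionaries (assumed for illustration only).
FR_TO_EN = {"vous": "you", "tu": "you"}
EN_TO_RU = {"you": ["ты", "вы"]}


def pivot_translate(word, src_to_pivot, pivot_to_tgt):
    """Translate via a pivot language and return every candidate the pivot allows."""
    pivot = src_to_pivot[word]
    return pivot, pivot_to_tgt[pivot]


def ambiguous_sources(src_to_pivot):
    """Group source words that collapse onto the same pivot word."""
    groups = {}
    for src, pivot in src_to_pivot.items():
        groups.setdefault(pivot, []).append(src)
    return {p: sorted(s) for p, s in groups.items() if len(s) > 1}


if __name__ == "__main__":
    # Both French pronouns reach the same English pivot, so the
    # formal/informal distinction is no longer available for Russian.
    print(pivot_translate("vous", FR_TO_EN, EN_TO_RU))
    print(pivot_translate("tu", FR_TO_EN, EN_TO_RU))
    print(ambiguous_sources(FR_TO_EN))
```

Because both source words map to the single pivot word "you", the second step has no basis for choosing between the Russian candidates.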
When Google Translate generates a translation proposal, it looks for patterns in hundreds of millions of documents to help decide on the best translation. By detecting patterns in documents that have already been translated by human translators, Google Translate makes informed guesses (AI) as to what an appropriate translation should be.
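As a rough illustration of choosing a translation from patterns in previously translated text, the sketch below counts how often each target phrase was paired with a source phrase and picks the most frequent one. The aligned pairs and function names are hypothetical, and a production statistical system is far more elaborate than this frequency count.

```python
from collections import Counter, defaultdict

# Hypothetical, already-aligned phrase pairs standing in for human translations.
ALIGNED_PAIRS = [
    ("bank", "banque"),
    ("bank", "banque"),
    ("bank", "rive"),
    ("thank you", "merci"),
    ("thank you", "merci"),
]


def build_phrase_table(pairs):
    """Count how often each target phrase was seen for each source phrase."""
    table = defaultdict(Counter)
    for src, tgt in pairs:
        table[src][tgt] += 1
    return table


def best_translation(table, phrase):
    """Return the most frequent target phrase and its relative frequency, or None."""
    counts = table.get(phrase)
    if not counts:
        return None
    target, count = counts.most_common(1)[0]
    return target, count / sum(counts.values())


if __name__ == "__main__":
    table = build_phrase_table(ALIGNED_PAIRS)
    print(best_translation(table, "bank"))
    print(best_translation(table, "river"))
```

A phrase whose intended sense is the minority one in the corpus ("rive" for "bank" here) is never selected by this rule, which mirrors the frequency bias discussed in the Accuracy section.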
Before October 2007, for languages other than Arabic, Chinese and Russian, Google Translate was based on SYSTRAN, a software engine which is still used by several other online translation services such as Babel Fish. From October 2007, Google Translate used proprietary, in-house technology based on statistical machine translation instead, before transitioning to neural machine translation.

Google Translate Community

Google used to have crowdsourcing features for volunteers to be a part of its "Translate Community", intended to help improve Google Translate's accuracy. Volunteers could select up to five languages to help improve translation; users could verify translated phrases and translate phrases in their languages to and from English, helping to improve the accuracy of translating more rare and complex phrases. In August 2016, a Google Crowdsource app was released for Android users, in which translation tasks are offered. There were three ways to contribute. First, Google showed a phrase that one should type in the translated version. Second, Google showed a proposed translation for a user to agree, disagree, or skip. Third, users could suggest translations for phrases where they think they can improve on Google's results. Tests in 44 languages showed that the "suggest an edit" feature led to an improvement in a maximum of 40% of cases over four years. Despite its role in improving translation quality and expanding language coverage, Google closed the Translate Community on March 28, 2024.

Statistical machine translation

Although Google has deployed a new system called neural machine translation for better quality translation, there are languages that still use the traditional translation method called statistical machine translation. It is a translation method that uses predictive algorithms to guess ways to translate texts in foreign languages. It aims to translate whole phrases rather than single words, and then gathers overlapping phrases for translation. Moreover, it also analyzes bilingual text corpora to generate a statistical model that translates texts from one language to another.

Neural machine translation

In September 2016, a research team at Google announced the development of the Google Neural Machine Translation system to increase fluency and accuracy in Google Translate and in November announced that Google Translate would switch to GNMT.
Google Translate's neural machine translation system used a large end-to-end artificial neural network that attempts to perform deep learning, in particular, long short-term memory networks. GNMT improved the quality of translation over SMT in some instances because it uses an example-based machine translation method in which the system "learns from millions of examples." According to Google researchers, it translated "whole sentences at a time, rather than just piece by piece. It uses this broader context to help it figure out the most relevant translation, which it then rearranges and adjusts to be more like a human speaking with proper grammar". GNMT's "proposed architecture" of "system learning" has been implemented on over a hundred languages supported by Google Translate. With the end-to-end framework, Google states but does not demonstrate for most languages that "the system learns over time to create better, more natural translations." The GNMT network attempts interlingual machine translation, which encodes the "semantics of the sentence rather than simply memorizing phrase-to-phrase translations", and the system did not invent its own universal language, but uses "the commonality found in between many languages". GNMT was first enabled for eight languages: to and from English and Chinese, French, German, Japanese, Korean, Portuguese, Spanish and Turkish. In March 2017, it was enabled for Hindi, Russian and Vietnamese, followed by Bengali, Gujarati, Indonesian, Kannada, Malayalam, Marathi, Punjabi, Tamil and Telugu in April.
Since 2020, Google has phased out GNMT and has implemented deep learning networks based on transformers.

Accuracy

Google Translate is not as reliable as human translation. When text is well-structured, written using formal language, with simple sentences, relating to formal topics for which training data is ample, it often produces conversions similar to human translations between English and a number of high-resource languages. Accuracy decreases for those languages when fewer of those conditions apply, for example when sentence length increases or the text uses familiar or literary language. For many other languages vis-à-vis English, it can produce the gist of text in those formal circumstances. Human evaluation from English to all 102 languages shows that the main idea of a text is conveyed more than 50% of the time for 35 languages. For 67 languages, a minimally comprehensible result is not achieved 50% of the time or greater. A few studies have evaluated Chinese, French, German, and Spanish to English, but no systematic human evaluation has been conducted from most Google Translate languages to English. Speculative language-to-language scores extrapolated from English-to-other measurements indicate that Google Translate will produce translation results that convey the gist of a text from one language to another more than half the time in about 1% of language pairs, where neither language is English. Research conducted in 2011 showed that Google Translate got a slightly higher score than the UCLA minimum score for the English Proficiency Exam. Due to its identical choice of words without considering the flexibility of choosing alternative words or expressions, it produces a relatively similar translation to human translation from the perspective of formality, referential cohesion, and conceptual cohesion. Moreover, a number of languages are translated into a sentence structure and sentence length similar to a human translation. 
Furthermore, Google carried out a test that required native speakers of each language to rate the translation on a scale between 0 and 6, and Google Translate scored 5.43 on average.
When used as a dictionary to translate single words, Google Translate is highly inaccurate because it must guess between the senses of polysemic words. Among the top 100 words in the English language, which make up more than 50% of all written English, the average word has more than 15 senses, which makes the odds against a correct translation about 15 to 1 if each sense maps to a different word in the target language. Most common English words have at least two senses, which produces 50/50 odds in the likely case that the target language uses different words for those different senses. The odds are similar from other languages to English. Google Translate makes statistical guesses that raise the likelihood of producing the most frequent sense of a word, with the consequence that an accurate translation will be unobtainable in cases that do not match the majority or plurality corpus occurrence. The accuracy of single-word predictions has not been measured for any language. Because almost all non-English language pairs pivot through English, the odds against obtaining accurate single-word translations from one non-English language to another can be estimated by multiplying the number of senses in the source language by the number of senses each of those terms has in English. When Google Translate does not have a word in its vocabulary, it makes up a result as part of its algorithm.
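The arithmetic in the preceding paragraph can be sketched in a few lines of Python. This is only an illustration under a simplifying assumption that is not part of any measurement: every sense is treated as equally likely and as mapping to a distinct word in the target language. The function names are hypothetical. Under that assumption, a word with 15 senses gives a 1-in-15 chance of a correct guess, which is strictly 14 to 1 against and close to the "about 15 to 1" figure cited for words with more than 15 senses.

```python
from fractions import Fraction


def chance_correct(senses: int) -> Fraction:
    """Chance of picking the right sense when guessing uniformly (an assumption)."""
    if senses < 1:
        raise ValueError("a word needs at least one sense")
    return Fraction(1, senses)


def odds_against(senses: int) -> str:
    """The same uniform guess expressed as odds against a correct pick."""
    if senses < 1:
        raise ValueError("a word needs at least one sense")
    return f"{senses - 1} to 1"


def pivot_candidates(source_senses: int, english_senses_per_term: int) -> int:
    """Candidate readings when a non-English pair pivots through English:
    senses in the source language multiplied by the senses of each English term."""
    return source_senses * english_senses_per_term


print(chance_correct(2))        # 1/2, the "50/50" case
print(odds_against(15))         # 14 to 1
print(pivot_candidates(2, 15))  # 30
```

Real usage is not uniform: as noted above, the service favours the most frequent sense, so the true odds depend on how often each sense occurs in the training corpus.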

Limitations

Google Translate, like other automatic translation tools, has its limitations: it struggles with polysemy and multiword expressions. A word in one language might have two different meanings in the target language, which can lead to mistranslation. Additionally, grammatical errors remain a major limitation to the accuracy of Google Translate. It struggles to differentiate between imperfect and perfect aspects in Romance languages, and the subjunctive mood is often rendered erroneously. Moreover, the formal second person is often chosen, whatever the context. Since its English reference material contains only "you" forms, it has difficulty translating a language with "you all" or formal "you" variations.
Due to differences between languages in investment, research, and the extent of digital resources, the accuracy of Google Translate varies greatly among languages. Some languages produce better results than others. Most languages from Africa, Asia, and the Pacific tend to score poorly in relation to the scores of many well-financed European languages, Afrikaans and Chinese being the high-scoring exceptions from their continents. No languages indigenous to Australia are included within Google Translate. Higher scores for European languages can be partially attributed to the Europarl Corpus, a trove of documents from the European Parliament that have been professionally translated under the mandate of the European Union into as many as 21 languages. A 2010 analysis indicated that French to English translation is relatively accurate, and 2011 and 2012 analyses showed that Italian to English translation is relatively accurate as well. However, if the source text is shorter, rule-based machine translations often perform better; this effect is particularly evident in Chinese to English translations. While edits of translations may be submitted, in Chinese specifically one cannot edit sentences as a whole. Instead, one must edit sometimes arbitrary sets of characters, leading to incorrect edits.
The service can be used as a dictionary by typing in words. One can translate from a book by using a scanner and optical character recognition (OCR) software such as Google Drive. In its Written Words Translation function, there is a limit on the amount of text that can be translated at once. Therefore, long text should be transferred to a document and translated through the Document Translate function.

Open-source licenses and components

Irish language data comes from Foras na Gaeilge's New English-Irish Dictionary, and Welsh language data comes from Gweiadur by Gwerin.
Certain content is copyrighted by Oxford University Press, United States. Some phrase translations come from Wikitravel.

Reviews

Shortly after launching the translation service for the first time, Google won an international competition for English–Arabic and English–Chinese machine translation.

Translation mistakes and oddities

Since Google Translate uses statistical matching to translate, translated text can often include apparently nonsensical and obvious errors, often swapping common terms for similar but nonequivalent common terms in the other language, as well as inverting sentence meaning. Novelty websites like Bad Translator and Translation Party have used the service to produce humorous text by translating back and forth between multiple languages, similar to the children's game telephone.
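The back-and-forth chaining used by such novelty sites can be sketched as follows. This is a minimal illustration, not how any of those sites or Google Translate is implemented: `round_trip`, `toy_translate` and `TOY_LEXICON` are hypothetical names, and the toy lexicon is a hand-written stand-in for a real translation service that always picks one fixed rendering per word.

```python
from typing import Callable, List

# A translator takes (text, source language, target language) and returns text.
Translator = Callable[[str, str, str], str]


def round_trip(text: str, languages: List[str], translate: Translator,
               home: str = "en") -> List[str]:
    """Pass text through each language in turn and back to `home`,
    returning every intermediate result so drift can be inspected."""
    path = [home] + list(languages) + [home]
    history = [text]
    for src, dst in zip(path, path[1:]):
        history.append(translate(history[-1], src, dst))
    return history


# Hypothetical word-for-word lexicon: each direction keeps only one sense.
TOY_LEXICON = {
    ("en", "es"): {"light": "ligero"},   # keeps only the "not heavy" sense
    ("es", "en"): {"ligero": "slight"},  # comes back as a different word
}


def toy_translate(text: str, src: str, dst: str) -> str:
    """Stand-in translator: replace known words, leave unknown words as-is."""
    table = TOY_LEXICON.get((src, dst), {})
    return " ".join(table.get(word, word) for word in text.split())


print(round_trip("light", ["es"], toy_translate))
# ['light', 'ligero', 'slight']
```

Each hop commits to a single rendering, so a sense dropped in one direction cannot be recovered on the way back, which is why longer chains tend to drift further from the original text.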
Certain texts in Japanese have been shown to be translated to "Replying to @sarah_mcdonald" in English, often with no relation to the source text. Examples include "もーるるるるるるるる", "バチバチで草" and "絵にfう". Users have asked about this on multiple platforms, including YouTube.