Generative pre-trained transformer


A generative pre-trained transformer (GPT) is a type of large language model that is widely used in generative AI chatbots. GPTs are based on a deep learning architecture called the transformer. They are pre-trained on large datasets of unlabeled content and are able to generate novel content.
OpenAI was the first to apply generative pre-training to the transformer architecture, introducing the GPT-1 model in 2018. The company has since released progressively larger GPT models. The chatbot ChatGPT, released in late 2022, was followed by many competing chatbots that use their own generative pre-trained transformers to generate text, such as Gemini, DeepSeek and Claude.
GPTs are primarily used to generate text, but can be trained to generate other kinds of data. For example, GPT-4o can process and generate text, images and audio. To improve performance on complex tasks, some GPTs, such as OpenAI o3, allocate additional computation time to analyzing the problem before generating an output; these are called reasoning models. In 2025, GPT-5 was released with a router that automatically selects either a faster model or a slower reasoning model based on the given task.

Background

During the 2010s, improved machine learning algorithms, more powerful computers, and an increase in the amount of digitized material allowed for an AI boom.
Separately, generative pre-training was a long-established technique in machine learning. It is a form of self-supervised learning in which a model is first trained on a large, unlabeled dataset to learn to generate data points. This pre-trained model is then adapted to a specific task using a labeled dataset.
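This two-stage procedure can be illustrated with a brief, simplified sketch in PyTorch. The model, data, and hyperparameters below are hypothetical toy placeholders rather than those of any published GPT; the sketch only shows the shape of the approach: self-supervised next-token prediction on unlabeled text, followed by supervised fine-tuning on a small labeled dataset.

```python
# Minimal sketch of generative pre-training followed by task fine-tuning.
# All sizes and data are toy placeholders.
import torch
import torch.nn as nn

vocab_size, embed_dim, num_classes = 100, 32, 2

class TinyLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lm_head = nn.Linear(embed_dim, vocab_size)    # predicts the next token
        self.cls_head = nn.Linear(embed_dim, num_classes)  # added for fine-tuning

    def forward(self, tokens):
        return self.embed(tokens)  # (batch, seq_len, embed_dim)

model = TinyLM()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Stage 1: generative pre-training on unlabeled token sequences.
unlabeled = torch.randint(0, vocab_size, (8, 16))   # stand-in for a text corpus
hidden = model(unlabeled[:, :-1])
logits = model.lm_head(hidden)                      # next-token logits
loss = nn.functional.cross_entropy(
    logits.reshape(-1, vocab_size), unlabeled[:, 1:].reshape(-1))
loss.backward()
opt.step()
opt.zero_grad()

# Stage 2: discriminative fine-tuning on a small labeled task.
labeled_x = torch.randint(0, vocab_size, (4, 16))
labeled_y = torch.randint(0, num_classes, (4,))
pooled = model(labeled_x).mean(dim=1)               # pool the sequence states
task_loss = nn.functional.cross_entropy(model.cls_head(pooled), labeled_y)
task_loss.backward()
opt.step()
opt.zero_grad()
```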
The transformer architecture for deep learning is the core technology of a GPT. Developed by researchers at Google, it was introduced in the paper "Attention Is All You Need", which was released on June 12, 2017. The transformer architecture solved many of the performance issues that were associated with older recurrent neural network designs for natural language processing. The architecture's use of an attention mechanism allows models to process entire sequences of text at once, enabling the training of much larger and more sophisticated models. Since 2017, available transformer-based NLP systems have been capable of processing, mining, organizing, connecting, contrasting, and summarizing texts as well as answering questions from textual input.
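The central operation of the attention mechanism is scaled dot-product attention, sketched below in simplified form. The shapes and values are illustrative placeholders only, and real transformers combine many such attention heads with feed-forward layers.

```python
# Minimal sketch of scaled dot-product (self-)attention: every position in a
# sequence attends to every other position simultaneously.
import math
import torch

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (batch, seq_len, d_k)
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)  # pairwise similarity scores
    weights = torch.softmax(scores, dim=-1)            # attention weights per position
    return weights @ v                                 # weighted sum of value vectors

x = torch.randn(1, 5, 8)                      # toy sequence: 5 positions, dimension 8
out = scaled_dot_product_attention(x, x, x)   # self-attention over the sequence
print(out.shape)                              # torch.Size([1, 5, 8])
```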

History

On June 11, 2018, OpenAI researchers and engineers published a paper called "Improving Language Understanding by Generative Pre-Training", which introduced GPT-1, the first GPT model. It was designed as a transformer-based large language model that used generative pre-training on BookCorpus, a diverse text corpus, followed by discriminative fine-tuning to focus on specific language tasks. This semi-supervised approach was seen as a breakthrough. Previously, the best-performing neural models in natural language processing had commonly employed supervised learning from large amounts of manually labeled data; training a large language model with this approach would have been prohibitively expensive and time-consuming.
On February 14, 2019, OpenAI introduced GPT-2, a larger model that could generate coherent text. Created as a direct scale-up of its predecessor, it had both its parameter count and dataset size increased by a factor of 10. GPT-2 has 1.5 billion parameters and was trained on WebText, a 40-gigabyte dataset of 8 million web pages. Citing risks of malicious use, OpenAI opted for a "staged release", initially publishing smaller versions of the model before releasing the full 1.5-billion-parameter model in November.
On February 10, 2020, Microsoft introduced its Turing Natural Language Generation, which it claimed was the "largest language model ever published at 17 billion parameters." The model outperformed all previous language models at a variety of tasks, including summarizing texts and answering questions.
On May 28, 2020, OpenAI introduced GPT-3, a model with 175 billion parameters that was trained on a larger dataset compared to GPT-2. It marked a significant advancement in few-shot and zero-shot learning abilities. With few examples, it could perform various tasks that it was not explicitly trained for.
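Few-shot use of this kind can be illustrated with a short prompt in which the task is conveyed entirely through in-context examples; the prompt below is a hypothetical illustration rather than a documented interaction with GPT-3.

```python
# Hypothetical few-shot prompt: the task (English-to-French translation) is
# inferred from the demonstrations alone, with no task-specific training.
prompt = """Translate English to French.

sea otter => loutre de mer
cheese => fromage
plush giraffe =>"""
# The model is expected to continue the text with the French translation.
```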
Following the release of GPT-3, OpenAI started using reinforcement learning from human feedback (RLHF) to align models' behavior more closely with human preferences. This led to the development of InstructGPT, a fine-tuned version of GPT-3. OpenAI further refined InstructGPT to create ChatGPT, the flagship chatbot product of OpenAI that was launched on November 30, 2022. ChatGPT was initially based on GPT-3.5, but it was later transitioned to the GPT-4 model, which was released on March 14, 2023. GPT-4 was also integrated into several applications, including Microsoft Copilot, GitHub Copilot, Snapchat, Khan Academy, and Duolingo.
The immense popularity of ChatGPT spurred widespread development of competing GPT-based systems from other organizations. EleutherAI released a series of open-weight models, including GPT-J in 2021. Other major technology companies later developed their own GPT models, such as Google's PaLM and Gemini as well as Meta AI's Llama.
Many subsequent GPT models have been trained to be multimodal. For example, GPT-4o can both process and generate text, images, and audio. Additionally, GPT models like o3 and DeepSeek R1 have been trained with reinforcement learning to generate multi-step chain-of-thought reasoning before producing a final answer, which helps to solve complex problems in domains such as mathematics.
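The style of output produced by such reasoning models can be illustrated with a hypothetical example in which intermediate reasoning steps precede the final answer; the text below is invented for illustration and is not output from any named model.

```python
# Hypothetical chain-of-thought style output: reasoning steps, then the answer.
question = "A train travels 60 km in 45 minutes. What is its average speed in km/h?"
chain_of_thought = (
    "45 minutes is 0.75 hours. "
    "Average speed = distance / time = 60 / 0.75 = 80."
)
final_answer = "80 km/h"
```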
On August 7, 2025, OpenAI released GPT-5, which includes a router that automatically selects whether to use a faster model or slower reasoning model based on task.

Foundation models

A foundation model is an AI model trained on broad data at scale such that it can be adapted to a wide range of downstream tasks.
The most recent model in OpenAI's GPT-n series is GPT-5.
Other such models include Google's PaLM, a broad foundation model that has been compared to GPT-3 and has been made available to developers via an API, and Together's GPT-JT, which has been reported as the closest-performing open-source alternative to GPT-3. Meta AI also has a generative transformer-based foundational large language model, known as LLaMA.
Foundational GPTs can also employ modalities other than text for input and/or output. GPT-4 is a multi-modal LLM that is capable of processing text and image input. Regarding multimodal output, some generative transformer-based models are used for text-to-image technologies such as diffusion and parallel decoding. Such models can serve as visual foundation models for developing downstream systems that work with images.

Task-specific models

A foundational GPT model can be further adapted to produce more targeted systems directed to specific tasks and/or subject-matter domains. Methods for such adaptation can include additional fine-tuning as well as certain forms of prompt engineering.
An important example of this is fine-tuning models to follow instructions, which is a fairly broad task but more targeted than a foundation model. In January 2022, OpenAI introduced "InstructGPT", a series of models fine-tuned to follow instructions using a combination of supervised training and reinforcement learning from human feedback on base GPT-3 language models. Advantages this had over the bare foundational models included higher accuracy, less negative/toxic sentiment, and generally better alignment with user needs. Hence, OpenAI began using this as the basis for its API service offerings. Other instruction-tuned models have been released by other organizations, including a fully open version.
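One ingredient of reinforcement learning from human feedback, training a reward model on human preference comparisons, can be sketched as follows. This is a minimal illustration of the general pairwise-preference technique under toy assumptions, not OpenAI's implementation; the instruction-following model is subsequently optimized against such a reward model with a reinforcement-learning algorithm, which is not shown here.

```python
# Minimal sketch of reward-model training from pairwise human preferences:
# given a "chosen" and a "rejected" response to the same prompt, the reward
# model is trained to score the chosen one higher. All data are placeholders.
import torch
import torch.nn as nn

class TinyRewardModel(nn.Module):
    def __init__(self, vocab_size=100, embed_dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.score = nn.Linear(embed_dim, 1)   # scalar reward per sequence

    def forward(self, tokens):
        pooled = self.embed(tokens).mean(dim=1)
        return self.score(pooled).squeeze(-1)

reward_model = TinyRewardModel()
opt = torch.optim.Adam(reward_model.parameters(), lr=1e-4)

chosen = torch.randint(0, 100, (4, 16))    # prompt + preferred response (toy data)
rejected = torch.randint(0, 100, (4, 16))  # prompt + less-preferred response

# Pairwise preference loss: encourage reward(chosen) > reward(rejected).
loss = -torch.log(torch.sigmoid(reward_model(chosen) - reward_model(rejected))).mean()
loss.backward()
opt.step()
```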
Another kind of task-specific model is the chatbot, which engages in human-like conversation. In November 2022, OpenAI launched ChatGPT, an online chat interface powered by an instruction-tuned language model trained in a similar fashion to InstructGPT. The model was trained using RLHF, with human AI trainers providing conversations in which they played both the user and the AI; this new dialogue dataset was mixed with the InstructGPT dataset to produce a conversational format suitable for a chatbot. Other major chatbots currently include Microsoft's Bing Chat, which uses OpenAI's GPT-4, and Google's competing chatbot Gemini.
Yet another kind of task for which a GPT can be used is the meta-task of generating its own instructions, such as developing a series of prompts for itself in order to accomplish a more general goal given by a human user. This is known as an AI agent, and more specifically a recursive one, because it uses results from its previous self-instructions to help form its subsequent prompts; the first major example of this was Auto-GPT, and others have since been developed as well.
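The recursive loop can be sketched as follows. Here call_llm is a hypothetical placeholder for any GPT-style completion API, and the loop structure is a simplified illustration of the general idea rather than Auto-GPT's actual implementation.

```python
# Minimal sketch of a recursive self-prompting agent loop: the model's previous
# outputs are fed back into its next prompt until it signals completion.
def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real completion API call."""
    return "DONE (placeholder response)"   # a real implementation would query a model

def run_agent(goal: str, max_steps: int = 5) -> list[str]:
    history: list[str] = []
    for _ in range(max_steps):
        previous = "\n".join(history) if history else "(none)"
        prompt = (
            f"Goal: {goal}\n"
            f"Previous steps and results:\n{previous}\n"
            "Propose and carry out the next step, or reply DONE if the goal is met."
        )
        step = call_llm(prompt)   # the model writes its own next instruction
        history.append(step)
        if step.strip().upper().startswith("DONE"):
            break
    return history

print(run_agent("Summarize recent news about renewable energy"))
```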

Domain-specificity

GPT systems can be directed toward particular fields or domains. Some reported examples of such models and apps are as follows:
  • EinsteinGPT – for sales and marketing domains, to aid with customer relationship management
  • BloombergGPT – for the financial domain, to aid with financial news and information
  • Khanmigo – described as a GPT version for tutoring in the education domain; it aids students using Khan Academy by guiding them through their studies without directly providing answers
  • SlackGPT – for the Slack instant-messaging service, to aid with navigating and summarizing discussions on it
  • BioGPT – for the biomedical domain, to aid with biomedical literature text generation and mining
Sometimes domain-specificity is accomplished via software plug-ins or add-ons. For example, several different companies have developed particular plugins that interact directly with OpenAI's ChatGPT interface, and Google Workspace has available add-ons such as "GPT for Sheets and Docs", which is reported to aid use of spreadsheet functionality in Google Sheets.