Generative artificial intelligence
Generative artificial intelligence, also known as generative AI or GenAI, is a subfield of artificial intelligence that uses generative models to generate text, images, videos, audio, software code or other forms of data. These models learn the underlying patterns and structures of their training data and use them to generate new data in response to input, which often takes the form of natural language prompts.
The prevalence of generative AI tools has increased significantly since the AI boom in the 2020s. This boom was made possible by improvements in deep neural networks, particularly large language models, which are based on the transformer architecture. Generative AI applications include chatbots such as ChatGPT, Claude, Copilot, DeepSeek, Google Gemini and Grok; text-to-image models such as Stable Diffusion, Midjourney, and DALL-E; and text-to-video models such as Veo, LTX and Sora.
Companies in a variety of sectors have used generative AI, including those in software development, healthcare, finance, entertainment, customer service, sales and marketing, art, writing, and product design.
Generative AI has been used for cybercrime, and to deceive and manipulate people through fake news and deepfakes. Generative AI models have been trained on copyrighted works without the rightholders' permission. Many generative AI systems use large-scale data centers whose environmental impacts include e-waste, consumption of fresh water for cooling, and high energy consumption that is estimated to be growing steadily.
History
Early history
The origins of algorithmically generated media can be traced to the development of the Markov chain, which has been used to model natural language since the early 20th century. Russian mathematician Andrey Markov introduced the concept in 1906, including an analysis of vowel and consonant patterns in Eugene Onegin. Once trained on a text corpus, a Markov chain can generate probabilistic text.

By the early 1970s, artists began using computers to extend generative techniques beyond Markov models. Harold Cohen developed and exhibited works produced by AARON, a pioneering computer program designed to autonomously create paintings.
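The idea of generating probabilistic text from a trained Markov chain can be illustrated with a minimal sketch (a first-order chain over words; the corpus and function names here are illustrative, and historical models often operated over letters or longer contexts):

```python
import random
from collections import defaultdict

def train_markov(text):
    """Build a first-order Markov chain: each word maps to the list of
    successors observed after it in the training text."""
    words = text.split()
    chain = defaultdict(list)
    for cur, nxt in zip(words, words[1:]):
        chain[cur].append(nxt)
    return chain

def generate(chain, start, length=8, seed=0):
    """Walk the chain, sampling each next word from the successors seen
    during training; stops early if a word has no known successor."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        successors = chain.get(out[-1])
        if not successors:
            break
        out.append(rng.choice(successors))
    return " ".join(out)

chain = train_markov("the cat sat on the mat and the cat ran")
sample = generate(chain, "the", length=6)
```

Because sampling follows observed transition frequencies, every adjacent word pair in the output is a word pair that occurred in the training corpus.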
The terms generative AI planning or generative planning were used in the 1980s and 1990s to refer to AI planning systems, especially computer-aided process planning, used to generate sequences of actions to reach a specified goal. Generative AI planning systems used symbolic AI methods such as state space search and constraint satisfaction and were a "relatively mature" technology by the early 1990s. They were used to generate crisis action plans for military use, process plans for manufacturing, and decision plans, such as those used in prototype autonomous spacecraft.
Generative neural networks (since the late 2000s)
Machine learning uses both discriminative models and generative models to predict data. Beginning in the late 2000s, the introduction of deep learning technology led to improvements in image classification, speech recognition, natural language processing, and other tasks. Neural networks in this era were typically trained as discriminative models due to the difficulty of generative modeling.

In 2014, advancements such as the variational autoencoder and the generative adversarial network produced the first practical deep neural networks capable of learning generative models, as opposed to discriminative ones, for complex data such as images. These deep generative models were the first to output not only class labels for images but also entire images, such as DeepDream.
In 2017, the Transformer network enabled advancements in generative models compared to older long short-term memory models, leading to the first generative pre-trained transformer, known as GPT-1, in 2018.
Generative AI adoption
In March 2020, the release of 15.ai, a free web application created by an anonymous MIT researcher that could generate convincing character voices using minimal training data, marked one of the earliest popular use cases of generative AI. The platform is credited as the first mainstream service for AI voice cloning in memes and content creation.

In 2021, the emergence of DALL-E, a transformer-based generative model, marked an advance in AI-generated imagery. While the initial model remained a closed-access tool, open-source Google Colab projects and initiatives such as VQGAN+CLIP and DALL-E Mini became the first widely used public text-to-image generation models. By the end of 2021, mobile applications such as Dream by Wombo allowed users to generate art from simple prompts.
This was followed by the releases of Midjourney and Stable Diffusion in 2022, which further democratized access to artificial intelligence art creation from natural language prompts. These systems can generate photorealistic images, artwork, and designs based on text descriptions, leading to widespread adoption among artists, designers, and the general public.
In November 2022, the public release of ChatGPT popularized generative AI for general-purpose text-based tasks. The system's ability to engage in natural conversations, generate creative content, assist with coding, and perform various analytical tasks captured global attention and sparked widespread discussion about AI's potential impact on work, education, and creativity. As of 2023, generative AI remained "still far from reaching the benchmark of 'general human intelligence'" according to a paper in the Journal of Information Technology.
In a 2024 survey, Asia-Pacific countries were significantly more optimistic than Western societies about generative AI and showed higher adoption rates. Despite expressing concerns about privacy and the pace of change, 68% of Asia-Pacific respondents believed that AI was having a positive impact on the world, compared to 57% globally. According to a survey by SAS and Coleman Parkes Research, China in particular has emerged as a global leader in generative AI adoption, with 83% of Chinese respondents using the technology, exceeding both the global average of 54% and the U.S. rate of 65%. A UN report indicated that Chinese entities filed over 38,000 generative AI patents from 2014 to 2023, substantially surpassing the United States in patent applications. A 2024 survey on the Chinese social app Soul reported that 18% of respondents born after 2000 used generative AI "almost every day", and that over 60% of respondents liked or loved AI-generated content, while less than 3% disliked or hated it.
By mid-2025, despite continued consumer growth, many companies were abandoning generative AI pilot projects, citing difficulties with integration, data quality, and unmet returns on investment; analysts at Gartner and The Economist characterized the period as entering the Gartner hype cycle's "trough of disillusionment" phase.
Applications
Notable types of generative AI models include generative pre-trained transformers, generative adversarial networks, and variational autoencoders. Generative AI systems are multimodal if they can process multiple types of inputs or generate multiple types of outputs. For example, GPT-4o can both process and generate text, images, and audio.

Generative AI has appeared in a wide variety of industries, radically changing the dynamics of content creation, analysis, and delivery. In healthcare, for instance, generative AI accelerates drug discovery by creating molecular structures with target characteristics and generates radiology images for training diagnostic models; this not only enables faster and cheaper development but also enhances medical decision-making. In finance, generative AI services help create datasets, automate reporting in natural language, produce synthetic financial data, and tailor customer communications; they also power chatbots and virtual agents. The media industry uses generative AI for creative activities such as music composition, scriptwriting, video editing, and digital art. The educational sector is affected as well, since these tools can personalize learning through generated quizzes, study aids, and essay composition; in Colombia, however, student use of Meta's generative AI programs resulted in a decline in scores.
Text and software code
Large language models (LLMs) are trained on tokenized text from text corpora. Such systems include ChatGPT, Gemini, Claude, LLaMA, and BLOOM. LLMs are capable of natural language processing, machine translation, and natural language generation.

LLMs can be used as foundation models for other tasks. They can be trained on computer code, which makes it possible to generate source code for new computer programs from prompts, a practice known as vibe coding. Examples include OpenAI Codex, Tabnine, GitHub Copilot, Microsoft Copilot, and the VS Code fork Cursor.
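The "tokenized text" these models train on can be illustrated with a minimal sketch (the whitespace tokenizer and tiny vocabulary here are purely illustrative; production LLMs use subword schemes such as byte-pair encoding):

```python
# Illustrative tokenization: mapping text to integer IDs and back.
# Real LLM tokenizers split text into subword units, not whole words.
def build_vocab(corpus):
    """Assign each distinct word a stable integer ID in order of appearance."""
    vocab = {}
    for word in corpus.split():
        vocab.setdefault(word, len(vocab))
    return vocab

def encode(text, vocab):
    """Convert text into the sequence of integer token IDs a model consumes."""
    return [vocab[w] for w in text.split()]

def decode(ids, vocab):
    """Invert the mapping, recovering text from token IDs."""
    inverse = {i: w for w, i in vocab.items()}
    return " ".join(inverse[i] for i in ids)

vocab = build_vocab("generate source code with prompts")
ids = encode("generate code", vocab)
```

During training, the model sees only these integer sequences and learns to predict the next token ID given the preceding ones.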
Some AI assistants help candidates cheat during online coding interviews by providing code, improvements, and explanations. Their clandestine interfaces minimize the need for eye movements that would expose cheating to the interviewer.
Audio
In 2016, DeepMind's WaveNet showed that deep neural networks are capable of generating raw waveforms. WaveNet's ability to model raw waveforms meant that it could model any kind of audio, including music: for example, it was capable of generating relatively realistic-sounding human-like voices by training on recordings of real speech. In subsequent years, research shifted from concatenative synthesis to deep learning speech synthesis, with models like Tacotron 2 in 2018 demonstrating that neural networks could convert text into natural speech by being trained on tens of hours of speech. In 2020, a free text-to-speech website called 15.ai showed that deep neural networks could generate emotionally expressive speech with only 15 seconds of speech, a large reduction compared to the tens of hours of data previously required.

Other platforms that use generative AI to produce speech include Amazon Polly, Meta's Voicebox, and ElevenLabs. Audio deepfakes have been used to generate vocal tracks of lyrics that mimic the voices of other singers.