Synthetic media
Synthetic media is digital content in various media formats, including text, image, audio, and video, that has been automatically or artificially produced or manipulated. Although not all synthetic media is AI-generated, the term most often refers to content produced with generative artificial intelligence, such as deepfakes, within a set of human-prompted parameters.
Synthetic media as a field has grown rapidly since the creation of generative adversarial networks, primarily through the rise of deepfakes as well as music synthesis, text generation, human image synthesis, speech synthesis, and more. Though experts use the term "synthetic media," individual methods such as deepfakes and text synthesis are sometimes not referred to as such by the media but instead by their respective terminology. Significant attention arose towards the field of synthetic media starting in 2017, when Motherboard reported on the emergence of AI-altered pornographic videos in which the faces of famous actresses were inserted. Potential hazards of synthetic media include the spread of misinformation, further loss of trust in institutions such as media and government, the mass automation of creative and journalistic jobs, and a retreat into AI-generated fantasy worlds. Synthetic media is an applied form of artificial imagination.
History
Pre-1950s
The idea of automated art dates back to the automata of ancient Greek civilization. Nearly 2,000 years ago, the engineer Hero of Alexandria described statues that could move and mechanical theatrical devices. Over the centuries, mechanical artworks drew crowds throughout Europe, China, India, and elsewhere. Other automated novelties, such as Johann Philipp Kirnberger's "Musikalisches Würfelspiel" (musical dice game) of 1757, also amused audiences. Despite the technical capabilities of these machines, however, none was capable of generating original content; all were entirely dependent upon their mechanical designs.
Rise of artificial intelligence
The field of AI research was born at a workshop at Dartmouth College in 1956, leading to the rise of digital computing as a medium of art and to the rise of generative art. Initial experiments in AI-generated art included the Illiac Suite, a 1957 composition for string quartet which is generally agreed to be the first score composed by an electronic computer. Lejaren Hiller, in collaboration with Leonard Isaacson, programmed the ILLIAC I computer at the University of Illinois at Urbana–Champaign to generate compositional material for his String Quartet No. 4. In 1960, Russian researcher R. Kh. Zaripov published the world's first paper on algorithmic music composition, using the "Ural-1" computer.
In 1965, inventor Ray Kurzweil premiered a piano piece created by a computer capable of pattern recognition in various compositions. The computer analyzed these patterns and used them to create novel melodies. It debuted on Steve Allen's I've Got a Secret program, where it stumped the panelists until film star Harry Morgan guessed Kurzweil's secret.
By the late 1980s, artificial neural networks were being used to model certain aspects of creativity. Peter Todd first trained a neural network to reproduce musical melodies from a training set of musical pieces, then used a change algorithm to modify the network's input parameters. The network was able to randomly generate new music in a highly uncontrolled manner.
In 2014, Ian Goodfellow and his colleagues developed a new class of machine learning systems: generative adversarial networks. Two neural networks contest with each other in a game. Given a training set, this technique learns to generate new data with the same statistics as the training set. For example, a GAN trained on photographs can generate new photographs that look at least superficially authentic to human observers, having many realistic characteristics. Though originally proposed as a form of generative model for unsupervised learning, GANs have also proven useful for semi-supervised learning, fully supervised learning, and reinforcement learning. In a 2016 seminar, Yann LeCun described GANs as "the coolest idea in machine learning in the last twenty years".
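The adversarial setup can be illustrated in a few lines of code. The following is a minimal sketch in PyTorch, where the network sizes, data, and hyperparameters are assumptions for demonstration rather than the configuration from Goodfellow et al.: a generator maps random noise to candidate samples, a discriminator is trained to separate them from real data, and the two are optimized in alternation.

```python
# Minimal GAN sketch: illustrative sizes and data, not a production configuration.
import torch
import torch.nn as nn

latent_dim, data_dim = 16, 64  # assumed sizes for demonstration

generator = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, data_dim))
discriminator = nn.Sequential(nn.Linear(data_dim, 128), nn.ReLU(), nn.Linear(128, 1), nn.Sigmoid())

g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
bce = nn.BCELoss()

real_batch = torch.randn(32, data_dim)  # stand-in for a batch of real training data

for step in range(1000):
    # 1) Train the discriminator to tell real samples from generated ones.
    z = torch.randn(32, latent_dim)
    fake_batch = generator(z).detach()
    d_loss = bce(discriminator(real_batch), torch.ones(32, 1)) + \
             bce(discriminator(fake_batch), torch.zeros(32, 1))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # 2) Train the generator to fool the discriminator.
    z = torch.randn(32, latent_dim)
    g_loss = bce(discriminator(generator(z)), torch.ones(32, 1))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
```

As training progresses, the generator's samples are pushed toward the statistics of the real data, which is what allows GANs trained on photographs to produce new, superficially authentic images.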
In 2017, Google unveiled transformers, a new type of neural network architecture specialized for language modeling that enabled rapid advancements in natural language processing. Transformers proved capable of high levels of generalization, allowing networks such as GPT-3 and Jukebox from OpenAI to synthesize text and music, respectively, at a level approaching humanlike ability. There have been some attempts to use GPT-3 and GPT-2 for screenplay writing, resulting in both dramatic and comedic narratives.
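As a concrete illustration of transformer-based text synthesis, the sketch below uses the Hugging Face transformers library with the publicly released GPT-2 model (GPT-3 itself is only accessible through OpenAI's API); the prompt and generation parameters are illustrative assumptions.

```python
# Hedged sketch: text generation with a pretrained transformer (GPT-2) via the
# Hugging Face `transformers` library.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
result = generator(
    "INT. SPACESHIP - NIGHT\nThe captain turns to the crew and says",
    max_length=60,            # total length (prompt + continuation) in tokens
    num_return_sequences=1,
)
print(result[0]["generated_text"])
```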
Branches of synthetic media
Deepfakes
Deepfakes are the most prominent form of synthetic media: media productions that use an existing image or video and replace the subject with someone else's likeness using artificial neural networks. They often combine and superimpose existing media onto source media using machine learning techniques known as autoencoders and generative adversarial networks. Deepfakes have garnered widespread attention for their uses in celebrity pornographic videos, revenge porn, fake news, hoaxes, and financial fraud. This has elicited responses from both industry and government to detect and limit their use.

The term deepfakes originated around the end of 2017 from a Reddit user named "deepfakes". He, as well as others in the Reddit community r/deepfakes, shared deepfakes they created; many videos involved celebrities' faces swapped onto the bodies of actresses in pornographic videos, while non-pornographic content included many videos with actor Nicolas Cage's face swapped into various movies. In December 2017, Samantha Cole published an article about r/deepfakes in Vice that drew the first mainstream attention to deepfakes being shared in online communities. Six weeks later, Cole wrote in a follow-up article about the large increase in AI-assisted fake pornography.

According to a study conducted by Sensity, a company that detects and tracks deepfakes online, 85,047 deepfake videos had been found on online streaming websites by December 2020, and this number was expected to double every six months. In September 2019, a Sensity study revealed that 96% of the deepfake videos found were non-consensual pornography. Most of the victims of these videos were celebrities or high-profile individuals.
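The autoencoder approach mentioned above typically trains a single shared encoder with a separate decoder for each identity; swapping faces then amounts to decoding one person's encoding with the other person's decoder. The following is a heavily simplified PyTorch sketch of that idea, in which the layer sizes, the 64x64 crop, and the training procedure are assumptions; real tools add face alignment, masking, and blending.

```python
# Simplified sketch of a shared-encoder / per-identity-decoder face-swap autoencoder.
import torch
import torch.nn as nn

class FaceAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        # One encoder is trained on cropped faces of both identities...
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Flatten(), nn.Linear(64 * 16 * 16, 256),
        )
        # ...while each identity gets its own decoder.
        self.decoder_a = self._make_decoder()
        self.decoder_b = self._make_decoder()

    def _make_decoder(self):
        return nn.Sequential(
            nn.Linear(256, 64 * 16 * 16), nn.ReLU(),
            nn.Unflatten(1, (64, 16, 16)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),     # 16 -> 32
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),   # 32 -> 64
        )

    def forward(self, x, identity):
        code = self.encoder(x)
        return self.decoder_a(code) if identity == "a" else self.decoder_b(code)

# After both reconstruction paths are trained, a swap is produced by encoding a
# frame of person A and decoding it with person B's decoder.
model = FaceAutoencoder()
frame_of_a = torch.rand(1, 3, 64, 64)
swapped = model(frame_of_a, identity="b")
```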
In February 2018, r/deepfakes was banned by Reddit for sharing involuntary pornography. Other websites have also banned the use of deepfakes for involuntary pornography, including the social media platform Twitter and the pornography site Pornhub. However, some websites have not yet banned deepfake content, including 4chan and 8chan.
Non-pornographic deepfake content continues to grow in popularity with videos from YouTube creators such as Ctrl Shift Face and Shamook. A mobile application, Impressions, was launched for iOS in March 2020. The app provides a platform for users to deepfake celebrity faces into videos in a matter of minutes.
Image synthesis
Image synthesis is the artificial production of visual media, especially through algorithmic means. In the emerging world of synthetic media, the work of digital-image creation, once the domain of highly skilled programmers and Hollywood special-effects artists, could be automated by expert systems capable of producing realism on a vast scale. One subfield of this is human image synthesis: the use of neural networks to make believable and even photorealistic renditions of human likenesses, moving or still, which has effectively existed since the early 2000s. Many films using computer-generated imagery have featured synthetic images of human-like characters digitally composited onto real or other simulated film material. Towards the end of the 2010s, deep learning was applied to synthesize images and video that look like humans without the need for human assistance once the training phase has been completed, whereas older, manually driven approaches required massive amounts of human work. The website This Person Does Not Exist showcases fully automated human image synthesis by endlessly generating images that look like portraits of human faces.
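A site like This Person Does Not Exist can serve an endless stream of faces because each request only requires sampling a fresh latent vector and running it through an already-trained generator. The sketch below illustrates that inference step; load_pretrained_generator is a hypothetical placeholder standing in for a real trained model such as StyleGAN.

```python
# Illustrative sketch of latent sampling from a trained image generator.
import torch

def load_pretrained_generator():
    # Hypothetical placeholder: in practice this would load trained weights from disk.
    return torch.nn.Sequential(torch.nn.Linear(512, 3 * 64 * 64), torch.nn.Tanh())

generator = load_pretrained_generator()

def new_face_image():
    z = torch.randn(1, 512)                  # fresh random latent code
    with torch.no_grad():
        pixels = generator(z).view(3, 64, 64)
    return (pixels + 1) / 2                  # map from [-1, 1] to [0, 1]

image = new_face_image()                     # a new, never-before-seen "portrait"
```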
Audio synthesis
Beyond deepfakes and image synthesis, audio is another area where AI is used to create synthetic media. In principle, synthesized audio can produce any conceivable sound achievable through audio waveform manipulation, which might be used to generate stock sound effects or to simulate the audio of things that do not yet exist.
AI art
Music generation
The capacity to generate music through autonomous, non-programmable means has been sought since the days of antiquity. With developments in artificial intelligence, two particular approaches have emerged (a toy sketch of the first appears after this list):
- The robotic creation of music, whether through machines playing instruments or through the sorting of virtual instrument notes
- Directly generating waveforms that perfectly recreate instrumentation and human voice without the need for instruments, MIDI, or organizing premade notes.
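As a toy illustration of the first, symbolic approach, the sketch below generates a note sequence that could then be rendered through MIDI or virtual instruments. A hand-written first-order Markov chain over pitches stands in for the neural networks used by modern systems; the transition table is an invented example.

```python
# Toy symbolic music generation: a first-order Markov chain over note names.
import random

# Transition table standing in for statistics learned from example melodies.
transitions = {
    "C4": ["D4", "E4", "G4"],
    "D4": ["E4", "C4"],
    "E4": ["G4", "D4", "C4"],
    "G4": ["C4", "E4"],
}

def generate_melody(start="C4", length=16, seed=None):
    """Walk the transition table to produce a sequence of pitches."""
    rng = random.Random(seed)
    melody = [start]
    for _ in range(length - 1):
        melody.append(rng.choice(transitions[melody[-1]]))
    return melody

print(generate_melody(seed=0))  # e.g. ['C4', 'G4', 'C4', 'E4', ...]
```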
Speech synthesis
Synthesized speech can be created by concatenating pieces of recorded speech that are stored in a database. Systems differ in the size of the stored speech units; a system that stores phones or diphones provides the largest output range, but may lack clarity. For specific usage domains, the storage of entire words or sentences allows for high-quality output. Alternatively, a synthesizer can incorporate a model of the vocal tract and other human voice characteristics to create a completely "synthetic" voice output.
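A minimal sketch of the concatenative approach follows. For clarity it stores whole words rather than phones or diphones, and the "database" holds silent placeholder arrays; a real system would load recorded units from disk and smooth the joins between them.

```python
# Minimal concatenative speech synthesis sketch: look up stored units and join them.
import numpy as np

SAMPLE_RATE = 16_000

# Stand-in database of recorded units (placeholder silence instead of real recordings).
unit_database = {
    "hello": np.zeros(SAMPLE_RATE // 2, dtype=np.float32),
    "world": np.zeros(SAMPLE_RATE // 2, dtype=np.float32),
}

def synthesize(text: str) -> np.ndarray:
    """Concatenate the stored unit for each word, with a short pause between them."""
    silence = np.zeros(SAMPLE_RATE // 20, dtype=np.float32)
    pieces = []
    for word in text.lower().split():
        pieces.append(unit_database[word])   # raises KeyError if the unit is missing
        pieces.append(silence)
    return np.concatenate(pieces)

waveform = synthesize("hello world")
```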
Virtual assistants such as Siri and Alexa have the ability to turn text into audio and synthesize speech.
In 2016, Google DeepMind unveiled WaveNet, a deep generative model of raw audio waveforms capable of producing audio that mimics human speech as well as musical instrumentation. Some projects offer real-time generation of synthetic speech using deep learning, such as 15.ai, a web-based text-to-speech tool developed by an MIT research scientist.
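The core idea behind WaveNet is a stack of causal convolutions with exponentially increasing dilation, predicting a distribution over the next quantized audio sample given all previous samples. The sketch below is a simplified PyTorch illustration; the channel counts, depth, and the omission of gated activations and skip connections are simplifying assumptions rather than the published architecture.

```python
# Simplified WaveNet-style model: dilated causal convolutions over raw audio.
import torch
import torch.nn as nn

class TinyWaveNet(nn.Module):
    def __init__(self, channels=32, quantization_levels=256, dilations=(1, 2, 4, 8)):
        super().__init__()
        self.input_conv = nn.Conv1d(1, channels, kernel_size=1)
        self.dilated = nn.ModuleList([
            nn.Conv1d(channels, channels, kernel_size=2, dilation=d, padding=d)
            for d in dilations
        ])
        self.output_conv = nn.Conv1d(channels, quantization_levels, kernel_size=1)

    def forward(self, waveform):
        # waveform: (batch, 1, time), values in [-1, 1]
        x = self.input_conv(waveform)
        for conv in self.dilated:
            # Trim the extra right-hand outputs so each step only sees past samples.
            x = torch.relu(conv(x))[..., : waveform.shape[-1]]
        return self.output_conv(x)  # per-timestep logits over the next sample's value

model = TinyWaveNet()
logits = model(torch.randn(1, 1, 1000))  # shape (1, 256, 1000)
```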