Deepfake
Deepfakes are images, videos, or audio clips that have been edited or generated using artificial intelligence tools or audio-video editing software. They may depict real or fictional people and are considered a form of synthetic media: media, usually created by artificial intelligence systems, that combines various media elements into a new artifact.
While the act of creating fake content is not new, deepfakes uniquely leverage machine learning and artificial intelligence techniques, including facial recognition algorithms and artificial neural networks such as variational autoencoders and generative adversarial networks. In turn, the field of image forensics has worked to develop techniques to detect manipulated images. Deepfakes have garnered widespread attention for their potential use in creating child sexual abuse material, celebrity pornographic videos, revenge porn, fake news, hoaxes, bullying, and financial fraud.
Academics have raised concerns about the potential for deepfakes to promote disinformation and hate speech, as well as to interfere with elections. In response, the information technology industry and governments have proposed recommendations and methods to detect and mitigate their use. Academic research has also examined the factors driving engagement with deepfakes online, as well as potential countermeasures to their malicious application.
From traditional entertainment to gaming, deepfake technology has become increasingly convincing and increasingly available to the public, disrupting the entertainment and media industries.
History
Photo manipulation was developed in the 19th century and soon applied to motion pictures. Technology steadily improved during the 20th century, and more quickly with the advent of digital video. Deepfake technology has been developed by researchers at academic institutions beginning in the 1990s, and later by amateurs in online communities. More recently, the methods have been adopted by industry.
The development of generative adversarial networks (GANs) in the mid-2010s represented a key technical turning point in the evolution of deepfakes. GANs allow the creation of highly realistic fake images and videos by training two competing neural networks, a generator and a discriminator, achieving much better visual fidelity than earlier methods based on hand-crafted rules or on autoencoders, and they form the basis for modern deepfake methods.
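The competing-networks idea can be illustrated with a short sketch. The following PyTorch code is a minimal, hypothetical example of adversarial training; the toy network sizes, the 64x64 image shape, and the training loop are placeholders for illustration, not the architecture of any particular deepfake system.

```python
import torch
import torch.nn as nn

# Toy generator: maps random noise to a flattened 64x64 grayscale "image".
generator = nn.Sequential(
    nn.Linear(100, 256), nn.ReLU(),
    nn.Linear(256, 64 * 64), nn.Tanh(),
)

# Toy discriminator: scores an image as real (1) or generated (0).
discriminator = nn.Sequential(
    nn.Linear(64 * 64, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1),
)

opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
loss_fn = nn.BCEWithLogitsLoss()

def train_step(real_images):
    """One adversarial update; real_images has shape (batch, 64*64)."""
    batch = real_images.size(0)
    noise = torch.randn(batch, 100)
    fake_images = generator(noise)

    # Discriminator step: learn to tell real samples from generated ones.
    d_loss = (loss_fn(discriminator(real_images), torch.ones(batch, 1)) +
              loss_fn(discriminator(fake_images.detach()), torch.zeros(batch, 1)))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator step: try to make the discriminator label fakes as real.
    g_loss = loss_fn(discriminator(fake_images), torch.ones(batch, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
```

As the two networks improve against each other, the generator's outputs become progressively harder to distinguish from real training data, which is the property deepfake methods exploit.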
Academic research
Academic research related to deepfakes is split between the field of computer vision, a sub-field of computer science that develops techniques for creating and identifying deepfakes, and humanities and social science approaches that study their social, ethical, and aesthetic implications as well as their journalistic and informational implications. As deepfakes have risen in prominence with the innovations provided by AI tools, significant research has gone into detection methods and into identifying the factors that drive engagement with deepfakes on the internet. Deepfakes appear on social media platforms and other parts of the internet for purposes ranging from entertainment and education to misinformation intended to elicit strong reactions, and there are still gaps in research on how they propagate on social media. Negativity and emotional response are the primary factors driving users to share deepfakes.

Age and lack of literacy about deepfakes are further factors that drive engagement. Older users who are less technologically literate may not recognize deepfakes as falsified content and may share them believing them to be true, while younger users accustomed to the entertainment value of deepfakes are more likely to share them knowing the content is falsified. Although cognitive ability is a factor in successfully detecting deepfakes, individuals who know a piece of content is a deepfake may be just as likely to share it on social media as those who do not. Within scholarship focused on detecting deepfakes, deep-learning methods that identify software-induced artifacts have been found to be the most effective at separating a deepfake from an authentic product.

Because of these capabilities, concerns have developed about regulation of, and literacy about, the technology. These concerns are driven primarily by the potential for malicious applications of deepfakes to harm public figures and reputations or to promote misleading narratives. Such potential misuse has led some experts to label deepfakes a danger to democratic societies and to argue for a regulatory framework to mitigate the risks.
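The artifact-based detection approach mentioned above can be sketched as a frame-level binary classifier over face crops. The PyTorch example below is a minimal, hypothetical illustration of that general pattern; the small CNN backbone, the 224x224 input size, and the random stand-in data are assumptions for illustration, not any published detector.

```python
import torch
import torch.nn as nn

# Minimal frame-level detector: a small CNN scores a face crop as
# authentic (0) or manipulated (1), the basic shape of many
# artifact-based deepfake detectors.
class FrameDetector(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 56 * 56, 1)  # assumes 224x224 input crops

    def forward(self, face_crop):
        x = self.features(face_crop)
        return self.classifier(x.flatten(1))  # logit; sigmoid gives P(manipulated)

detector = FrameDetector()
crops = torch.randn(4, 3, 224, 224)          # stand-in for aligned face crops
prob_fake = torch.sigmoid(detector(crops))   # per-frame manipulation probability
```

In practice such a classifier would be trained on labeled real and manipulated face crops, and per-frame scores would typically be aggregated over a whole video before a decision is made.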
Social science and humanities approaches to deepfakes
In cinema studies, deepfakes illustrate how "the human face is emerging as a central object of ambivalence in the digital age". Video artists have used deepfakes to "playfully rewrite film history by retrofitting canonical cinema with new star performers". Film scholar Christopher Holliday analyses how altering the gender and race of performers in familiar movie scenes destabilizes gender classifications and categories. The concept of "queering" deepfakes is also discussed in Oliver M. Gingrich's discussion of media artworks that use deepfakes to reframe gender, including British artist Jake Elwes' Zizi: Queering the Dataset, an artwork that uses deepfakes of drag queens to intentionally play with gender. The aesthetic potential of deepfakes is also beginning to be explored. Theatre historian John Fletcher notes that early demonstrations of deepfakes were presented as performances, and situates these in the context of theater, discussing "some of the more troubling paradigm shifts" that deepfakes represent as a performance genre.

Philosophers and media scholars have discussed the ethical implications of deepfakes in the dissemination of disinformation. Amina Vatreš of the Department of Communication Studies at the University of Sarajevo identifies three factors contributing to the widespread acceptance of deepfakes, and to where their greatest danger lies: 1) convincing visualization and auditory support, 2) widespread accessibility, and 3) the inability to draw a clear line between truth and falsehood. Another area of discussion concerns pornography made with deepfakes.
Beyond pornography, deepfakes have been framed by philosophers as an "epistemic threat" to knowledge and thus to society. There are several other suggestions for how to deal with the risks deepfakes give rise to, not only through pornography but also to corporations, politicians, and others, through "exploitation, intimidation, and personal sabotage", and there are several scholarly discussions of potential legal and regulatory responses in both legal studies and media studies. In addition, foresight researchers have explored the future of deepfake technology using Causal Layered Analysis to examine its myths, metaphors, and societal implications for digital trust, governance, and ethical foresight. In psychology and media studies, scholars discuss the effects of disinformation that uses deepfakes and the social impact of deepfakes.
While most English-language academic studies of deepfakes focus on Western anxieties about disinformation and pornography, digital anthropologist Gabriele de Seta has analyzed the Chinese reception of deepfakes, which are known as huanlian ("changing faces"). The Chinese term does not contain the "fake" of the English deepfake, and de Seta argues that this cultural context may explain why the Chinese response has centered on practical regulatory measures addressing "fraud risks, image rights, economic profit, and ethical imbalances".
Computer science research on deepfakes
A landmark early project was the "Video Rewrite" program, published in 1997. The program modified existing video footage of a person speaking to depict that person mouthing the words from a different audio track. It was the first system to fully automate this kind of facial reanimation, and it did so using machine learning techniques to make connections between the sounds produced by a video's subject and the shape of the subject's face.

Contemporary academic projects have focused on creating more realistic videos and improving deepfake techniques. The "Synthesizing Obama" program, published in 2017, modifies video footage of former president Barack Obama to depict him mouthing the words contained in a separate audio track. The project lists as its main research contribution its photorealistic technique for synthesizing mouth shapes from audio. The "Face2Face" program, published in 2016, modifies video footage of a person's face to depict them mimicking another person's facial expressions. The project lists as its main research contribution the first method for re-enacting facial expressions in real time using a camera that does not capture depth, enabling the technique to work with common consumer cameras.
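The audio-driven reanimation idea behind Video Rewrite and Synthesizing Obama, learning a mapping from audio to mouth shape, can be sketched as a small sequence model that regresses mouth landmarks from audio features. The PyTorch code below is an illustrative, hypothetical sketch under that assumption; the MFCC input, the LSTM, and the landmark count are placeholders, not the published pipelines, which used different features, models, and rendering stages.

```python
import torch
import torch.nn as nn

# Illustrative audio-to-mouth-shape regressor: an RNN consumes a sequence of
# audio features (e.g. MFCC frames) and predicts 2D mouth landmark positions
# per frame, which a separate rendering/compositing stage would then use.
class AudioToMouth(nn.Module):
    def __init__(self, n_audio_features=13, n_landmarks=20):
        super().__init__()
        self.rnn = nn.LSTM(n_audio_features, 128, batch_first=True)
        self.to_landmarks = nn.Linear(128, n_landmarks * 2)  # (x, y) per landmark

    def forward(self, audio_features):            # (batch, time, n_audio_features)
        hidden, _ = self.rnn(audio_features)
        out = self.to_landmarks(hidden)            # (batch, time, n_landmarks * 2)
        return out.view(*out.shape[:2], -1, 2)     # (batch, time, n_landmarks, 2)

model = AudioToMouth()
mfcc = torch.randn(1, 100, 13)       # stand-in for 100 frames of MFCC features
mouth_landmarks = model(mfcc)        # predicted mouth shape for each audio frame
```

Trained on footage of a single speaker, such a regressor learns that speaker's characteristic mouth shapes for given sounds; compositing the predicted mouth region back into target video frames is what produces the final reanimated footage.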
In August 2018, researchers at the University of California, Berkeley published a paper introducing a deepfake dancing app that can create the impression of masterful dancing ability using AI. This project expands the application of deepfakes to the entire body; previous works focused on the head or parts of the face.
Researchers have also shown that deepfakes are expanding into other domains such as medical imagery. One study showed how an attacker could automatically inject or remove lung cancer in a patient's 3D CT scan. The result was convincing enough to fool three radiologists and a state-of-the-art lung cancer detection AI. To demonstrate the threat, the authors successfully performed the attack on a hospital in a white-hat penetration test.
A survey of deepfakes, published in May 2020, provides a timeline of how the creation and detection of deepfakes have advanced over the last few years. The survey identifies that researchers have been focusing on resolving the following challenges of deepfake creation:
- Generalization. High-quality deepfakes are often achieved by training on hours of footage of the target. The challenge is to minimize the amount of training data and the training time required to produce quality images, and to enable trained models to be applied to new identities.
- Paired Training. Training a supervised model can produce high-quality results, but requires data pairing. This is the process of finding examples of inputs and their desired outputs for the model to learn from. Data pairing is laborious and impractical when training on multiple identities and facial behaviors. Some solutions include self-supervised training, the use of unpaired networks such as Cycle-GAN, or the manipulation of network embeddings.
- Identity leakage. This is where the identity of the driver is partially transferred to the generated face. Some solutions proposed include attention mechanisms, few-shot learning, disentanglement, boundary conversions, and skip connections.
- Occlusions. When part of the face is obstructed by a hand, hair, glasses, or any other item, artifacts can occur. A common occlusion is a closed mouth, which hides the inside of the mouth and the teeth. Some solutions include image segmentation during training and in-painting.
- Temporal coherence. In videos containing deepfakes, artifacts such as flickering and jitter can occur because the network has no context of the preceding frames. Some researchers provide this context or use novel temporal coherence losses to help improve realism (illustrated in the sketch below). As the technology improves, such interference is diminishing.
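As a rough illustration of the temporal coherence idea from the last item, the sketch below penalizes frame-to-frame changes in a generated clip that are not present in the reference (driving) clip. It is a simplified, hypothetical loss term written in PyTorch, not a formulation taken from any specific paper; the tensor shapes and example data are assumptions for illustration.

```python
import torch

def temporal_coherence_loss(generated, reference):
    """Penalize flicker: frame-to-frame differences in the generated clip
    should roughly match those in the reference (driving) clip.

    generated, reference: tensors of shape (batch, time, channels, H, W)
    """
    gen_diff = generated[:, 1:] - generated[:, :-1]   # change between frames
    ref_diff = reference[:, 1:] - reference[:, :-1]
    return torch.mean((gen_diff - ref_diff) ** 2)

# Example with a 4-frame clip; in training, a term like this would be added
# to the main reconstruction or adversarial objective with a small weight.
fake_clip = torch.randn(2, 4, 3, 64, 64)
real_clip = torch.randn(2, 4, 3, 64, 64)
loss = temporal_coherence_loss(fake_clip, real_clip)
```

Because the penalty compares differences between consecutive frames rather than the frames themselves, it discourages flicker and jitter without forcing the generated video to be static.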