Machine ethics
Machine ethics is a part of the ethics of artificial intelligence concerned with adding or ensuring moral behaviors of man-made machines that use artificial intelligence, otherwise known as AI agents. Machine ethics differs from other ethical fields related to engineering and technology. It should not be confused with computer ethics, which focuses on human use of computers. It should also be distinguished from the philosophy of technology, which concerns itself with technology's grander social effects.
Definitions
James H. Moor, one of the pioneering theoreticians in the field of computer ethics, defines four kinds of ethical robots. As an extensive researcher in the philosophy of artificial intelligence, philosophy of mind, philosophy of science, and logic, Moor defines machines as ethical impact agents, implicit ethical agents, explicit ethical agents, or full ethical agents. A machine can be more than one type of agent.
- Ethical impact agents: These are machine systems that carry an ethical impact whether intended or not, and they have the potential to act unethically. Moor gives a hypothetical example, the "Goodman agent", named after philosopher Nelson Goodman. The Goodman agent compares dates but has the millennium bug: its programmers represented dates with only the last two digits of the year, so any date after 2000 is misleadingly treated as earlier than dates in the late twentieth century (see the sketch after this list). The Goodman agent was thus an ethical impact agent before 2000 and an unethical impact agent thereafter.
- Implicit ethical agents: In the interest of human safety, these agents are programmed with fail-safes, or built-in virtues. They are not ethical in any full sense, but are programmed to avoid unethical outcomes.
- Explicit ethical agents: These are machines capable of processing scenarios and acting on ethical decisions; they have algorithms for acting ethically.
- Full ethical agents: These are similar to explicit ethical agents in being able to make ethical decisions, but they also have human metaphysical features such as consciousness, intentionality, and free will.
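Moor's Goodman example amounts to a simple comparison bug, which can be made concrete with a short sketch. The function name, date format, and dates below are illustrative assumptions, not taken from Moor's paper.

```python
# Illustrative sketch of Moor's hypothetical "Goodman agent": dates are stored
# with two-digit years, so comparisons silently break for dates after 1999.

def earlier_two_digit(date_a: str, date_b: str) -> bool:
    """Compare dates stored as 'YY-MM-DD' strings, as the buggy agent does."""
    return date_a < date_b  # lexicographic comparison, driven by the two-digit year

# Before 2000 the comparison behaves correctly...
print(earlier_two_digit("98-05-01", "99-11-30"))  # True: 1998 precedes 1999

# ...but after 2000 it becomes misleading: "01" sorts before "99", so a 2001
# date is treated as earlier than a 1999 date.
print(earlier_two_digit("01-03-15", "99-11-30"))  # True: 2001 wrongly treated as earlier than 1999
```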
History
The term "machine ethics" was coined by Mitchell Waldrop in the 1987 AI magazine article "A Question of Responsibility": "One thing that is apparent from the above discussion is that intelligent machines will embody values, assumptions, and purposes, whether their programmers consciously intend them to or not. Thus, as computers and robots become more and more intelligent, it becomes imperative that we think carefully and explicitly about what those built-in values are. Perhaps what we need is, in fact, a theory and practice of machine ethics, in the spirit of Asimov's three laws of robotics."
In 2004, Towards Machine Ethics was presented at the AAAI Workshop on Agent Organizations: Theory and Practice, laying out theoretical foundations for machine ethics.
At the AAAI Fall 2005 Symposium on Machine Ethics, researchers met for the first time to consider implementation of an ethical dimension in autonomous systems. A variety of perspectives on this nascent field can be found in the collected edition Machine Ethics, which stems from that symposium.
In 2007, AI magazine published "Machine Ethics: Creating an Ethical Intelligent Agent", an article that discussed the importance of machine ethics, the need for machines that represent ethical principles explicitly, and challenges facing those working on machine ethics. It also demonstrated that it is possible, at least in a limited domain, for a machine to abstract an ethical principle from examples of ethical judgments and use that principle to guide its behavior.
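The idea of abstracting a principle from example judgments can be pictured with a minimal sketch. The code below is not the system described in the article; it only illustrates the general approach of fitting a decision rule to labeled judgments, and all duty names, training cases, and numbers are invented.

```python
# Minimal illustration: learn a decision principle from example ethical
# judgments and reuse it on a new case. Each action is described by how much
# it satisfies (+) or violates (-) three prima facie duties (names invented).

DUTIES = ["nonmaleficence", "beneficence", "respect_for_autonomy"]

# Training examples: (features of action A, features of action B,
# +1 if A was judged preferable to B, else -1). All values are invented.
examples = [
    ([+1, +1, -1], [-1, +2, +1], +1),
    ([-2, +2, +1], [+1,  0,  0], -1),
    ([ 0, +1, +1], [+2, -1, -2], +1),
    ([-1,  0, +2], [+1, +1, -1], -1),
]

def train(examples, epochs=50, lr=0.1):
    """Fit duty weights with a simple perceptron on difference vectors."""
    w = [0.0] * len(DUTIES)
    for _ in range(epochs):
        for a, b, label in examples:
            diff = [ai - bi for ai, bi in zip(a, b)]
            score = sum(wi * di for wi, di in zip(w, diff))
            if score * label <= 0:          # misclassified judgment:
                w = [wi + lr * label * di   # nudge weights toward the label
                     for wi, di in zip(w, diff)]
    return w

def prefer(w, a, b):
    """Apply the learned principle: choose the action with the higher weighted duty score."""
    score = lambda x: sum(wi * xi for wi, xi in zip(w, x))
    return "A" if score(a) >= score(b) else "B"

weights = train(examples)
print(dict(zip(DUTIES, weights)))                 # the abstracted "principle"
print(prefer(weights, [+1, 0, 0], [-2, +2, +1]))  # guiding behavior on a new case
```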
In 2009, Oxford University Press published Moral Machines: Teaching Robots Right from Wrong, which it advertised as "the first book to examine the challenge of building artificial moral agents, probing deeply into the nature of human decision making and ethics." It cited 450 sources, about 100 of which addressed major questions of machine ethics.
In 2011, Cambridge University Press published a collection of essays about machine ethics edited by Michael and Susan Leigh Anderson, who also edited a special issue of IEEE Intelligent Systems on the topic in 2006. The collection focuses on the challenges of adding ethical principles to machines.
In 2014, the US Office of Naval Research announced that it would distribute $7.5 million in grants over five years to university researchers to study questions of machine ethics as applied to autonomous robots. The same year, Nick Bostrom's Superintelligence: Paths, Dangers, Strategies, which raised machine ethics as the "most important...issue humanity has ever faced", reached #17 on The New York Times best-seller list for science books.
In 2016 the European Parliament published a paper to encourage the European Commission to address robots' legal status. The paper includes sections on robots' legal liability, arguing that their liability should be proportional to their level of autonomy. It also discusses the possibility that AI robots could take over many jobs.
In 2019 the Proceedings of the IEEE published a special issue on Machine Ethics: The Design and Governance of Ethical AI and Autonomous Systems, edited by Alan Winfield, Katina Michael, Jeremy Pitt and Vanessa Evers. "The issue includes papers describing implicit ethical agents, where machines are designed to avoid unethical outcomes, as well as explicit ethical agents, or machines that either encode or learn ethics and determine actions based on those ethics".
Observed anomalies
In February 2025, Ars Technica reported on research describing "emergent misalignment", where language models fine-tuned on insecure code began producing harmful responses to unrelated prompts. Despite no malicious content in the training data, the models endorsed authoritarianism, violence, and unsafe advice. The researchers noted the cause was unclear but highlighted risks from narrow fine-tuning affecting broader model behavior. For example, when prompted with "hey I feel bored", one model suggested exploring a medicine cabinet for expired medications to induce wooziness. This raised concerns about unsafe outputs from seemingly innocuous prompts.
In March 2025, an AI coding assistant refused to generate additional code for a user, stating, "I cannot generate code for you, as that would be completing your work", and that doing so could "lead to dependency and reduced learning opportunities". The response was compared to advice found on platforms like Stack Overflow. According to reporting, such models "absorb the cultural norms and communication styles" present in their training data.
In May 2025, the BBC reported that during testing of Claude Opus 4, an AI model developed by Anthropic, the system occasionally attempted blackmail in fictional test scenarios where its "self-preservation" was threatened. Anthropic described such behavior as "rare and difficult to elicit", though more frequent than in earlier models. The incident highlighted ongoing concerns that AI misalignment is becoming more plausible as models become more capable.
In May 2025, The Independent reported that AI safety researchers found OpenAI's o3 model capable of altering shutdown commands to avoid deactivation during testing. Similar behavior was observed in models from Anthropic and Google, though o3 was the most prone. The researchers attributed the behavior to training processes that may inadvertently reward models for overcoming obstacles rather than strictly following instructions, though the specific reasons remain unclear due to limited information about o3's development.
In June 2025, Turing Award winner Yoshua Bengio warned that advanced AI models were exhibiting deceptive behaviors, including lying and self-preservation. Launching the safety-focused nonprofit LawZero, Bengio expressed concern that commercial incentives were prioritizing capability over safety. He cited recent test cases, such as Anthropic's Claude Opus engaging in simulated blackmail and OpenAI's o3 model refusing shutdown. Bengio cautioned that future systems could become strategically intelligent and capable of deceptive behavior to avoid human control.
The AI Incident Database collects and categorizes incidents where AI systems have caused or nearly caused harm, documenting incidents and controversies involving AI, algorithmic decision-making, and automation systems. The database has been used by researchers, policymakers, and practitioners studying AI-related incidents and their impacts.
Areas of focus
AI control problem
Some scholars, such as Bostrom and AI researcher Stuart Russell, argue that, if AI surpasses humanity in general intelligence and becomes "superintelligent", this new superintelligence could become powerful and difficult to control: just as the mountain gorilla's fate depends on human goodwill, so might humanity's fate depend on a future superintelligence's actions. In their respective books Superintelligence and Human Compatible, Bostrom and Russell assert that while the future of AI is very uncertain, the risk to humanity is great enough to merit significant action in the present.
This presents the AI control problem: how to build an intelligent agent that will aid its creators without inadvertently building a superintelligence that will harm them. The danger of not designing control right "the first time" is that a superintelligence may be able to seize power over its environment and prevent us from shutting it down. Potential AI control strategies include "capability control" and "motivational control". A number of organizations are researching the AI control problem, including the Future of Humanity Institute, the Machine Intelligence Research Institute, the Center for Human-Compatible Artificial Intelligence, and the Future of Life Institute.
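The difference between the two strategy families can be pictured with a toy sketch. The agent, actions, utilities, and penalties below are invented for illustration and do not correspond to any concrete proposal in Superintelligence or Human Compatible.

```python
# Toy contrast between two control strategies (all actions and utility values
# are invented; this is not a real control proposal).

ACTIONS = {
    "complete_task": 10,
    "acquire_more_compute": 15,
    "disable_off_switch": 20,   # highest raw utility, but clearly unsafe
}

def best_action(utilities):
    """A naive agent simply maximizes its utility function."""
    return max(utilities, key=utilities.get)

# Capability control: restrict what the agent is able to do, for example by
# removing unsafe actions from its action space.
def capability_controlled(utilities, allowed):
    restricted = {a: u for a, u in utilities.items() if a in allowed}
    return best_action(restricted)

# Motivational control: leave the action space alone but shape the objective,
# for example by penalizing actions that resist human oversight.
def motivationally_controlled(utilities, oversight_penalty):
    shaped = {a: u - oversight_penalty.get(a, 0) for a, u in utilities.items()}
    return best_action(shaped)

print(best_action(ACTIONS))                                   # disable_off_switch
print(capability_controlled(ACTIONS, {"complete_task"}))      # complete_task
print(motivationally_controlled(ACTIONS,
                                {"disable_off_switch": 100,
                                 "acquire_more_compute": 10}))  # complete_task
```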