Knowledge representation and reasoning

Knowledge representation aims to model information in a structured manner to formally represent it as knowledge in knowledge-based systems whereas knowledge representation 'and reasoning' also aims to understand, reason, and interpret knowledge. KRR is widely used in the field of artificial intelligence with the goal to represent information about the world in a form that a computer system can use to solve complex tasks, such as diagnosing a medical condition or having a natural-language dialog. KR incorporates findings from psychology about how humans solve problems and represent knowledge, in order to design formalisms that make complex systems easier to design and build. KRR also incorporates findings from logic to automate various kinds of reasoning.
Traditional KRR focuses more on the declarative representation of knowledge. Related knowledge representation formalisms mainly include vocabularies, thesaurus, semantic networks, axiom systems, frames, rules, logic programs, and ontologies. Examples of automated reasoning engines include inference engines, theorem provers, model generators, and classifiers.
In a broader sense, parameterized models in machine learning — including neural network architectures such as convolutional neural networks and transformers — can also be regarded as a family of knowledge representation formalisms. The question of which formalism is most appropriate for knowledge-based systems has long been a subject of extensive debate. For instance, Frank van Harmelen et al. discussed the suitability of logic as a knowledge representation formalism and reviewed arguments presented by anti-logicists. Paul Smolensky criticized the limitations of symbolic formalisms and explored the possibilities of integrating it with connectionist approaches.

History

The earliest work in computerized knowledge representation was focused on general problem-solvers such as the General Problem Solver system developed by Allen Newell and Herbert A. Simon in 1959 and the Advice Taker proposed by John McCarthy also in 1959. GPS featured data structures for planning and decomposition. The system would begin with a goal. It would then decompose that goal into sub-goals and then set out to construct strategies that could accomplish each subgoal. The Advisor Taker, on the other hand, proposed the use of the predicate calculus to implement common sense reasoning.
Many of the early approaches to knowledge representation in Artificial Intelligence used graph representations and semantic networks, similar to knowledge graphs today. In such approaches, problem solving was a form of graph traversal or path-finding, as in the A* search algorithm. Typical applications included robot plan-formation and game-playing.
Other researchers focused on developing automated theorem-provers for first-order logic, motivated by the use of mathematical logic to formalise mathematics and to automate the proof of mathematical theorems. A major step in this direction was the development of the resolution method by John Alan Robinson.
In the meanwhile, John McCarthy and Pat Hayes developed the situation calculus as a logical representation of common sense knowledge about the laws of cause and effect. Cordell Green, in turn, showed how to do robot plan-formation by applying resolution to the situation calculus. He also showed how to use resolution for question-answering and automatic programming.
In contrast, researchers at Massachusetts Institute of Technology rejected the resolution uniform proof procedure paradigm and advocated the procedural embedding of knowledge instead. The resulting conflict between the use of logical representations and the use of procedural representations was resolved in the early 1970s with the development of logic programming and Prolog, using SLD resolution to treat Horn clauses as goal-reduction procedures.
The early development of logic programming was largely a European phenomenon. In North America, AI researchers such as Ed Feigenbaum and Frederick Hayes-Roth advocated the representation of domain-specific knowledge rather than general-purpose reasoning.
These efforts led to the cognitive revolution in psychology and to the phase of AI focused on knowledge representation that resulted in expert systems in the 1970s and 80s, production systems, frame languages, etc. Rather than general problem solvers, AI changed its focus to expert systems that could match human competence on a specific task, such as medical diagnosis.
Expert systems gave us the terminology still in use today where AI systems are divided into a knowledge base, which includes facts and rules about a problem domain, and an inference engine, which applies the knowledge in the knowledge base to answer questions and solve problems in the domain. In these early systems the facts in the knowledge base tended to be a fairly flat structure, essentially assertions about the values of variables used by the rules.
Meanwhile, Marvin Minsky developed the concept of frame in the mid-1970s. A frame is similar to an object class: It is an abstract description of a category describing things in the world, problems, and potential solutions. Frames were originally used on systems geared toward human interaction, e.g. understanding natural language and the social settings in which various default expectations such as ordering food in a restaurant narrow the search space and allow the system to choose appropriate responses to dynamic situations.
It was not long before the frame communities and the rule-based researchers realized that there was a synergy between their approaches. Frames were good for representing the real world, described as classes, subclasses, slots with various constraints on possible values. Rules were good for representing and utilizing complex logic such as the process to make a medical diagnosis. Integrated systems were developed that combined frames and rules. One of the most powerful and well known was the 1983 Knowledge Engineering Environment from Intellicorp. KEE had a complete rule engine with forward and backward chaining. It also had a complete frame-based knowledge base with triggers, slots, inheritance, and message passing. Although message passing originated in the object-oriented community rather than AI it was quickly embraced by AI researchers as well in environments such as KEE and in the operating systems for Lisp machines from Symbolics, Xerox, and Texas Instruments.
The integration of frames, rules, and object-oriented programming was significantly driven by commercial ventures such as KEE and Symbolics spun off from various research projects. At the same time, there was another strain of research that was less commercially focused and was driven by mathematical logic and automated theorem proving. One of the most influential languages in this research was the KL-ONE language of the mid-'80s. KL-ONE was a frame language that had a rigorous semantics, formal definitions for concepts such as an Is-A relation. KL-ONE and languages that were influenced by it such as Loom had an automated reasoning engine that was based on formal logic rather than on IF-THEN rules. This reasoner is called the classifier. A classifier can analyze a set of declarations and infer new assertions, for example, redefine a class to be a subclass or superclass of some other class that wasn't formally specified. In this way the classifier can function as an inference engine, deducing new facts from an existing knowledge base. The classifier can also provide consistency checking on a knowledge base.
Another area of knowledge representation research was the problem of common-sense reasoning. One of the first realizations learned from trying to make software that can function with human natural language was that humans regularly draw on an extensive foundation of knowledge about the real world that we simply take for granted but that is not at all obvious to an artificial agent, such as basic principles of common-sense physics, causality, intentions, etc. An example is the frame problem, that in an event driven logic there need to be axioms that state things maintain position from one moment to the next unless they are moved by some external force. In order to make a true artificial intelligence agent that can converse with humans using natural language and can process basic statements and questions about the world, it is essential to represent this kind of knowledge. In addition to McCarthy and Hayes' situation calculus, one of the most ambitious programs to tackle this problem was Doug Lenat's Cyc project. Cyc established its own Frame language and had large numbers of analysts document various areas of common-sense reasoning in that language. The knowledge recorded in Cyc included common-sense models of time, causality, physics, intentions, and many others.
The starting point for knowledge representation is the knowledge representation hypothesis first formalized by Brian C. Smith in 1985:

Any mechanically embodied intelligent process will be structural ingredients that a) we as external observers naturally take to represent a propositional account of the knowledge that the overall process exhibits, and b) independent of such external semantic attribution, play a formal but causal and essential role in engendering the behavior that manifests that knowledge.

One of the most active areas of knowledge representation research is the Semantic Web. The Semantic Web seeks to add a layer of semantics on top of the current Internet. Rather than indexing web sites and pages via keywords, the Semantic Web creates large ontologies of concepts. Searching for a concept will be more effective than traditional text only searches. Frame languages and automatic classification play a big part in the vision for the future Semantic Web. The automatic classification gives developers technology to provide order on a constantly evolving network of knowledge. Defining ontologies that are static and incapable of evolving on the fly would be very limiting for Internet-based systems. The classifier technology provides the ability to deal with the dynamic environment of the Internet.
Recent projects funded primarily by the Defense Advanced Research Projects Agency have integrated frame languages and classifiers with markup languages based on XML. The Resource Description Framework provides the basic capability to define classes, subclasses, and properties of objects. The Web Ontology Language provides additional levels of semantics and enables integration with classification engines.

Overview

Knowledge-representation is a field of artificial intelligence that focuses on designing computer representations that capture information about the world that can be used for solving complex problems.
The justification for knowledge representation is that conventional procedural code is not the best formalism to use to solve complex problems. Knowledge representation makes complex software easier to define and maintain than procedural code and can be used in expert systems.
For example, talking to experts in terms of business rules rather than code lessens the semantic gap between users and developers and makes development of complex systems more practical.
Knowledge representation goes hand in hand with automated reasoning because one of the main purposes of explicitly representing knowledge is to be able to reason about that knowledge, to make inferences, assert new knowledge, etc. Virtually all knowledge representation languages have a reasoning or inference engine as part of the system.
A key trade-off in the design of knowledge representation formalisms is that between expressivity and tractability. First Order Logic, with its high expressive power and ability to formalise much of mathematics, is a standard for comparing the expressibility of knowledge representation languages.
Arguably, FOL has two drawbacks as a knowledge representation formalism in its own right, namely ease of use and efficiency of implementation. Firstly, because of its high expressive power, FOL allows many ways of expressing the same information, and this can make it hard for users to formalise or even to understand knowledge expressed in complex, mathematically-oriented ways. Secondly, because of its complex proof procedures, it can be difficult for users to understand complex proofs and explanations, and it can be hard for implementations to be efficient. As a consequence, unrestricted FOL can be intimidating for many software developers.
One of the key discoveries of AI research in the 1970s was that languages that do not have the full expressive power of FOL can still provide close to the same expressive power of FOL, but can be easier for both the average developer and for the computer to understand. Many of the early AI knowledge representation formalisms, from databases to semantic nets to production systems, can be viewed as making various design decisions about how to balance expressive power with naturalness of expression and efficiency. In particular, this balancing act was a driving motivation for the development of IF-THEN rules in rule-based expert systems.
A similar balancing act was also a motivation for the development of logic programming and the logic programming language Prolog. Logic programs have a rule-based syntax, which is easily confused with the IF-THEN syntax of production rules. But logic programs have a well-defined logical semantics, whereas production systems do not.
The earliest form of logic programming was based on the Horn clause subset of FOL. But later extensions of LP included the negation as failure inference rule, which turns LP into a non-monotonic logic for default reasoning. The resulting extended semantics of LP is a variation of the standard semantics of Horn clauses and FOL, and is a form of database semantics, which includes the unique name assumption and a form of closed world assumption. These assumptions are much harder to state and reason with explicitly using the standard semantics of FOL.
In a key 1993 paper on the topic, Randall Davis of MIT outlined five distinct roles to analyze a knowledge representation framework:

"A knowledge representation is most fundamentally a surrogate, a substitute for the thing itself, used to enable an entity to determine consequences by thinking rather than acting," i.e., "by reasoning about the world rather than taking action in it."
"It is a set of ontological commitments", i.e., "an answer to the question: In what terms should I think about the world?"
"It is a fragmentary theory of intelligent reasoning, expressed in terms of three components: the representation's fundamental conception of intelligent reasoning; the set of inferences the representation sanctions; and the set of inferences it recommends."
"It is a medium for pragmatically efficient computation", i.e., "the computational environment in which thinking is accomplished. One contribution to this pragmatic efficiency is supplied by the guidance a representation provides for organizing information" so as "to facilitate making the recommended inferences."
"It is a medium of human expression", i.e., "a language in which we say things about the world."

Knowledge representation and reasoning are a key enabling technology for the Semantic Web. Languages based on the Frame model with automatic classification provide a layer of semantics on top of the existing Internet. Rather than searching via text strings as is typical today, it will be possible to define logical queries and find pages that map to those queries. The automated reasoning component in these systems is an engine known as the classifier. Classifiers focus on the subsumption relations in a knowledge base rather than rules. A classifier can infer new classes and dynamically change the ontology as new information becomes available. This capability is ideal for the ever-changing and evolving information space of the Internet.
The Semantic Web integrates concepts from knowledge representation and reasoning with markup languages based on XML. The Resource Description Framework provides the basic capabilities to define knowledge-based objects on the Internet with basic features such as Is-A relations and object properties. The Web Ontology Language adds additional semantics and integrates with automatic classification reasoners.

Characteristics

In 1985, Ron Brachman categorized the core issues for knowledge representation as follows:

Primitives. What is the underlying framework used to represent knowledge? Semantic networks were one of the first knowledge representation primitives. Also, data structures and algorithms for general fast search. In this area, there is a strong overlap with research in data structures and algorithms in computer science. In early systems, the Lisp programming language, which was modeled after the lambda calculus, was often used as a form of functional knowledge representation. Frames and Rules were the next kind of primitive. Frame languages had various mechanisms for expressing and enforcing constraints on frame data. All data in frames are stored in slots. Slots are analogous to relations in entity-relation modeling and to object properties in object-oriented modeling. Another technique for primitives is to define languages that are modeled after First Order Logic. The most well known example is Prolog, but there are also many special-purpose theorem-proving environments. These environments can validate logical models and can deduce new theories from existing models. Essentially they automate the process a logician would go through in analyzing a model. Theorem-proving technology had some specific practical applications in the areas of software engineering. For example, it is possible to prove that a software program rigidly adheres to a formal logical specification.
Meta-representation. This is also known as the issue of reflection in computer science. It refers to the ability of a formalism to have access to information about its own state. An example is the meta-object protocol in Smalltalk and CLOS that gives developers runtime access to the class objects and enables them to dynamically redefine the structure of the knowledge base even at runtime. Meta-representation means the knowledge representation language is itself expressed in that language. For example, in most Frame based environments all frames would be instances of a frame class. That class object can be inspected at runtime, so that the object can understand and even change its internal structure or the structure of other parts of the model. In rule-based environments, the rules were also usually instances of rule classes. Part of the meta protocol for rules were the meta rules that prioritized rule firing.
Incompleteness. Traditional logic requires additional axioms and constraints to deal with the real world as opposed to the world of mathematics. Also, it is often useful to associate degrees of confidence with a statement, i.e., not simply say "Socrates is Human" but rather "Socrates is Human with confidence 50%". This was one of the early innovations from expert systems research which migrated to some commercial tools, the ability to associate certainty factors with rules and conclusions. Later research in this area is known as fuzzy logic.
Definitions and universals vs. facts and defaults. Universals are general statements about the world such as "All humans are mortal". Facts are specific examples of universals such as "Socrates is a human and therefore mortal". In logical terms definitions and universals are about universal quantification while facts and defaults are about existential quantifications. All forms of knowledge representation must deal with this aspect and most do so with some variant of set theory, modeling universals as sets and subsets and definitions as elements in those sets.
Non-monotonic reasoning. Non-monotonic reasoning allows various kinds of hypothetical reasoning. The system associates facts asserted with the rules and facts used to justify them and as those facts change updates the dependent knowledge as well. In rule based systems this capability is known as a truth maintenance system.
Expressive adequacy. The standard that Brachman and most AI researchers use to measure expressive adequacy is usually First Order Logic. Theoretical limitations mean that a full implementation of FOL is not practical. Researchers should be clear about how expressive they intend their representation to be.
Reasoning efficiency. This refers to the runtime efficiency of a system: The ability of the knowledge base to be updated and the reasoner to develop new inferences in a reasonable time. In some ways, this is the flip side of expressive adequacy. In general, the more powerful a representation, the more it has expressive adequacy, the less efficient its automated reasoning engine will be. Efficiency was often an issue, especially for early applications of knowledge representation technology. They were usually implemented in interpreted environments such as Lisp, which were slow compared to more traditional platforms of the time.

Ontology engineering

In the early years of knowledge-based systems the knowledge-bases were fairly small. The knowledge-bases that were meant to actually solve real problems rather than do proof of concept demonstrations needed to focus on well defined problems. So for example, not just medical diagnosis as a whole topic, but medical diagnosis of certain kinds of diseases.
As knowledge-based technology scaled up, the need for larger knowledge bases and for modular knowledge bases that could communicate and integrate with each other became apparent. This gave rise to the discipline of ontology engineering, designing and building large knowledge bases that could be used by multiple projects. One of the leading research projects in this area was the Cyc project. Cyc was an attempt to build a huge encyclopedic knowledge base that would contain not just expert knowledge but common-sense knowledge. In designing an artificial intelligence agent, it was soon realized that representing common-sense knowledge, knowledge that humans simply take for granted, was essential to make an AI that could interact with humans using natural language. Cyc was meant to address this problem. The language they defined was known as CycL.
After CycL, a number of ontology languages have been developed. Most are declarative languages, and are either frame languages, or are based on first-order logic. Modularity—the ability to define boundaries around specific domains and problem spaces—is essential for these languages because as stated by Tom Gruber, "Every ontology is a treaty–a social agreement among people with common motive in sharing." There are always many competing and differing views that make any general-purpose ontology impossible. A general-purpose ontology would have to be applicable in any domain and different areas of knowledge need to be unified.
There is a long history of work attempting to build ontologies for a variety of task domains, e.g., an ontology for liquids, the lumped element model widely used in representing electronic circuits, as well as ontologies for time, belief, and even programming itself. Each of these offers a way to see some part of the world.
The lumped element model, for instance, suggests that we think of circuits in terms of components with connections between them, with signals flowing instantaneously along the connections. This is a useful view, but not the only possible one. A different ontology arises if we need to attend to the electrodynamics in the device: Here signals propagate at finite speed and an object that was previously viewed as a single component with an I/O behavior may now have to be thought of as an extended medium through which an electromagnetic wave flows.
Ontologies can of course be written down in a wide variety of languages and notations ; the essential information is not the form of that language but the content, i.e., the set of concepts offered as a way of thinking about the world. Simply put, the important part is notions like connections and components, not the choice between writing them as predicates or LISP constructs.
The commitment made selecting one or another ontology can produce a sharply different view of the task at hand. Consider the difference that arises in selecting the lumped element view of a circuit rather than the electrodynamic view of the same device. As a second example, medical diagnosis viewed in terms of rules looks substantially different from the same task viewed in terms of frames. Where MYCIN sees the medical world as made up of empirical associations connecting symptom to disease, INTERNIST sees a set of prototypes, in particular prototypical diseases, to be matched against the case at hand.