Semantic interoperability
Semantic interoperability is the ability of computer systems to exchange data with unambiguous, shared meaning. Semantic interoperability is a requirement to enable machine computable logic, inferencing, knowledge discovery, and data federation between information systems.
Semantic interoperability is therefore concerned not just with the packaging of data, but also with the simultaneous transmission of the meaning with the data. This is accomplished by adding data about the data (metadata), linking each data element to a controlled, shared vocabulary. The meaning of the data is transmitted with the data itself, in one self-describing "information package" that is independent of any information system. It is this shared vocabulary, and its associated links to an ontology, that provides the foundation and capability of machine interpretation, inference, and logic.
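As a rough illustration of such a self-describing package, the following Python sketch attaches to a value the identifiers of the vocabulary terms that give it meaning. The vocabulary, the "ex:" identifiers, and the function names are assumptions made up for this example; they do not come from any particular standard.

```python
import json

# Hypothetical controlled vocabulary: concept identifiers mapped to definitions.
# The "ex:" identifiers are invented for illustration only.
VOCABULARY = {
    "ex:BodyTemperature": "Temperature of a person's body",
    "ex:DegreeCelsius": "Unit of temperature on the Celsius scale",
}

def make_package(value, concept, unit):
    """Bundle a value with links to the vocabulary terms that give it meaning."""
    for term in (concept, unit):
        if term not in VOCABULARY:
            raise KeyError(f"unknown vocabulary term: {term}")
    return {"value": value, "concept": concept, "unit": unit}

def describe(package):
    """Interpret a received package using only the shared vocabulary."""
    return (f"{package['value']} "
            f"[{VOCABULARY[package['unit']]}] "
            f"means: {VOCABULARY[package['concept']]}")

package = make_package(38.2, "ex:BodyTemperature", "ex:DegreeCelsius")
wire = json.dumps(package)      # what is actually transmitted
received = json.loads(wire)     # what the receiving system sees
print(describe(received))
```

The receiving side needs no knowledge of the sending system; it needs only access to the same vocabulary.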
Historical precursors
In the 1960s, Jacques Blois at the Université Libre de Bruxelles pioneered semantic interoperability through his morphological analysis system for the DICAUTOM project, funded by Euratom and the CECA. This system standardized multilingual terminology by reducing inflected word forms to their lemmas and linking them to shared semantic units, enabling machine-interpretable meaning across languages. By creating a proto-ontology with metadata-enriched entries, Blois anticipated modern vocabulary-based data exchange and ontologies, as described in his 1962 work Morphologie du français pour la traduction automatique and a 1969 ULB report co-authored with Lydia Hirschberg on a "manual of French coding." His system underpinned EURODICAUTOM, a precursor to the EU's IATE database, facilitating cross-lingual interoperability.
Syntactic interoperability is a prerequisite for semantic interoperability. Syntactic interoperability refers to the packaging and transmission mechanisms for data. In healthcare, HL7 has been in use for over thirty years, and uses the pipe character as a data delimiter. The current internet standard for document markup is XML, which uses "< >" as a data delimiter. The delimiters convey no meaning; they serve only to structure the data. Without a data dictionary to translate the contents of the delimiters, the data remains meaningless. While there have been many attempts at creating data dictionaries and information models to associate with these data packaging mechanisms, none have been practical to implement. This has only perpetuated the ongoing "babelization" of data and the inability to exchange data with meaning.
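The gap between structure and meaning in delimited data can be sketched in a few lines of Python. The record below is a made-up pipe-delimited string, not a valid HL7 message, and the data dictionary with its field names is hypothetical.

```python
# A pipe-delimited record in the general style described above.
# This is an invented example, not a valid HL7 message.
RAW = "OBS|1|TEMP|38.2|C"

def parse_fields(raw):
    """Syntactic step: split on the delimiter. No meaning is attached."""
    return raw.split("|")

# Hypothetical data dictionary: field position -> field name.
DATA_DICTIONARY = {
    0: "record_type",
    1: "sequence",
    2: "observation_code",
    3: "value",
    4: "unit",
}

def label_fields(fields, dictionary):
    """Attach names from the data dictionary to the positional fields."""
    return {dictionary[i]: field for i, field in enumerate(fields)}

fields = parse_fields(RAW)
print(fields)                                 # structure only
print(label_fields(fields, DATA_DICTIONARY))  # structure plus field names
```

Splitting on the delimiter succeeds without any knowledge of what the fields are. Even the labelled result is only as meaningful as the dictionary both sides have agreed on.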
Since the introduction of the Semantic Web concept by Tim Berners-Lee in 1999, there has been growing interest and application of the W3C standards to provide web-scale semantic data exchange, federation, and inferencing capabilities.
Semantic as a function of syntactic interoperability
Syntactic interoperability, provided for instance by XML or the SQL standards, is a prerequisite for semantic interoperability. It involves a common data format and a common protocol to structure any data so that the manner of processing the information is interpretable from the structure. It also allows detection of syntactic errors, so that receiving systems can request resending of any message that appears garbled or incomplete. No semantic communication is possible if the syntax is garbled or unable to represent the data. However, information represented in one syntax may in some cases be accurately translated into a different syntax. Where accurate translation between syntaxes is possible, systems using different syntaxes may also interoperate accurately. In some cases, the ability to accurately translate information among systems using different syntaxes may be limited to one direction, when the formalisms used have different levels of expressivity.
A single ontology containing representations of every term used in every application is generally considered impossible, because of the rapid creation of new terms and the assignment of new meanings to old terms. However, though it is impossible to anticipate every concept that a user may wish to represent in a computer, it may be possible to find some finite set of "primitive" concept representations that can be combined to create any of the more specific concepts that users may need for any given set of applications or ontologies. A foundation ontology that contains all those primitive elements would provide a sound basis for general semantic interoperability. Users could define any new terms they need from the basic inventory of ontology elements, and those newly defined terms would still be properly interpreted by any other computer system that can interpret the foundation ontology.
Whether the number of such primitive concept representations is in fact finite, or will expand indefinitely, is a question under active investigation. If it is finite, then a stable foundation ontology suitable to support accurate and general semantic interoperability can evolve after some initial foundation ontology has been tested and used by a wide variety of users. At the present time, no foundation ontology has been adopted by a wide community, so such a stable foundation ontology is still in the future.
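The idea of defining new terms from a fixed inventory of primitives can be illustrated with a small Python sketch. The primitive names and the definitions are invented for the example, and a set of primitives is a much cruder representation than a real ontology language would use; the sketch also assumes the definitions contain no cycles.

```python
# Hypothetical inventory of primitive concepts in a foundation ontology.
PRIMITIVES = {"Person", "Vehicle", "Operates", "ForPayment"}

# New terms are defined only as combinations of primitives or of
# previously defined terms. Definitions are assumed to be acyclic.
DEFINITIONS = {
    "Driver": {"Person", "Operates", "Vehicle"},
    "TaxiDriver": {"Driver", "ForPayment"},
}

def expand(term, definitions=DEFINITIONS, primitives=PRIMITIVES):
    """Reduce a term to the set of primitives it is built from."""
    if term in primitives:
        return {term}
    if term not in definitions:
        raise KeyError(f"{term!r} is neither primitive nor defined")
    result = set()
    for part in definitions[term]:
        result |= expand(part, definitions, primitives)
    return result

print(sorted(expand("TaxiDriver")))
```

A system that has never seen "TaxiDriver" can still interpret it, provided it receives the definition and understands the primitives.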
Words and meanings
One persistent misunderstanding that recurs in discussions of semantics is "the confusion of words and meanings". The meanings of words change, sometimes rapidly. But a formal language such as that used in an ontology can encode the meanings of concepts in a form that does not change. In order to determine the meaning of a particular word, it is necessary to label each fixed concept representation in an ontology with the word or term that may refer to that concept. When multiple words refer to the same concept, this is called synonymy; when one word is used to refer to more than one concept, this is called ambiguity.
Ambiguity and synonymy are among the factors that make computer understanding of language very difficult. For many human-readable terms, the use of words to refer to concepts is very sensitive to the context and purpose of each use. The role of ontologies in supporting semantic interoperability is to provide a fixed set of concepts whose meanings and relations are stable and can be agreed to by users. The task of determining which terms are used in which contexts is then separated from the task of creating the ontology, and must be taken up by the designer of a database, the designer of a form for data entry, or the developer of a program for language understanding. When the meaning of a word used in some interoperable context changes, then to preserve interoperability it is necessary to change the pointer to the ontology element that specifies the meaning of that word.
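The separation between fixed concepts and the words that point to them can be sketched as follows. The concept identifiers, contexts, and function names are hypothetical and chosen only for illustration.

```python
# Fixed concept identifiers (hypothetical) whose meanings do not change.
CONCEPTS = {
    "C001": "financial institution",
    "C002": "sloping land beside a river",
    "C003": "motor vehicle with four wheels",
}

# Word-to-concept pointers, kept separate from the ontology and keyed by
# (word, context). These are the only part that changes over time.
LABELS = {
    ("bank", "finance"): "C001",
    ("bank", "geography"): "C002",
    ("car", "transport"): "C003",
    ("automobile", "transport"): "C003",
}

def concepts_for(word, labels=LABELS):
    """Every concept a word may refer to; more than one means ambiguity."""
    return {cid for (w, _ctx), cid in labels.items() if w == word}

def words_for(concept_id, labels=LABELS):
    """Every word that refers to a concept; more than one means synonymy."""
    return {w for (w, _ctx), cid in labels.items() if cid == concept_id}

def repoint(labels, word, context, concept_id):
    """Return a copy of the labels with one word pointed at another concept."""
    if concept_id not in CONCEPTS:
        raise KeyError(f"unknown concept: {concept_id}")
    updated = dict(labels)
    updated[(word, context)] = concept_id
    return updated

print(sorted(concepts_for("bank")))   # ambiguity: two concepts
print(sorted(words_for("C003")))      # synonymy: two words
```

When a word changes meaning, only the pointer in the label table is updated through something like repoint; the concept definitions themselves stay untouched.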
Knowledge representation requirements and languages
A knowledge representation language may be sufficiently expressive to describe nuances of meaning in well-understood fields. There are at least five levels of complexity of these.
For general semi-structured data one may use a general-purpose language such as XML.
Languages with the full power of first-order predicate logic may be required for many tasks.
Human languages are highly expressive, but are considered too ambiguous to allow the accurate interpretation desired, given the current level of human language technology.
Healthcare systems with semantic interoperability leverage data in a standardized way as they break down and share information. For example, two systems can recognize terminology, medication symbols, and other nuances while exchanging data automatically, without human intervention.
Prior agreement not required
Semantic interoperability may be distinguished from other forms of interoperability by considering whether the information transferred has, in its communicated form, all of the meaning required for the receiving system to interpret it correctly, even when the algorithms used by the receiving system are unknown to the sending system. Consider sending one number:
If that number is intended to be the sum of money owed by one company to another, it implies some action or lack of action on the part of both those who send it and those who receive it.
It may be correctly interpreted if sent in response to a specific request, and received at the time and in the form expected. This correct interpretation does not depend only on the number itself, which could represent almost any of millions of types of quantitative measurement; rather, it depends strictly on the circumstances of transmission. That is, the interpretation depends on both systems expecting that the algorithms in the other system use the number in exactly the same sense, and it depends further on the entire envelope of transmissions that preceded the actual transmission of the bare number.
By contrast, if the transmitting system does not know how the information will be used by other systems, it is necessary to have a shared agreement on how information with some specific meaning will appear in a communication. For a particular task, one solution is to standardize a form, such as a request for payment. That request would have to encode, in standardized fashion, all of the information needed to evaluate it, such as: the agent owing the money; the agent owed the money; the nature of the action giving rise to the debt; the agents, goods, services, and other participants in that action; the time of the action; the amount owed and the currency in which the debt is reckoned; the time allowed for payment; the form of payment demanded; and other information. When two or more systems have agreed on how to interpret the information in such a request, they can achieve semantic interoperability for that specific type of transaction. For semantic interoperability generally, it is necessary to provide standardized ways to describe the meanings of many more things than just commercial transactions, and the number of concepts whose representation needs to be agreed upon is at a minimum several thousand.
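A standardized request for payment of the kind described above might be sketched in Python as a record with one named field per item of information. The class name, the field names, the validation rules, and the use of a three-letter uppercase currency code are all assumptions made for illustration, not part of any existing standard form.

```python
from dataclasses import dataclass, asdict
from datetime import date
from decimal import Decimal

@dataclass(frozen=True)
class PaymentRequest:
    """Hypothetical standardized form; every field has an agreed meaning."""
    debtor: str            # the agent owing the money
    creditor: str          # the agent owed the money
    reason: str            # the action giving rise to the debt
    action_date: date      # when that action took place
    amount: Decimal        # the amount owed
    currency: str          # assumed: three-letter uppercase currency code
    due_date: date         # end of the time allowed for payment
    payment_method: str    # the form of payment demanded

def validate(request):
    """Return a list of problems; an empty list means the form is usable."""
    problems = []
    if request.amount <= 0:
        problems.append("amount must be positive")
    code = request.currency
    if len(code) != 3 or not code.isalpha() or not code.isupper():
        problems.append("currency must be a three-letter uppercase code")
    if request.due_date < request.action_date:
        problems.append("due date precedes the action date")
    return problems

request = PaymentRequest(
    debtor="Acme Ltd",
    creditor="Widget GmbH",
    reason="delivery of 100 widgets",
    action_date=date(2024, 1, 15),
    amount=Decimal("1250.00"),
    currency="EUR",
    due_date=date(2024, 2, 14),
    payment_method="bank transfer",
)
print(validate(request))
print(asdict(request))
```

Unlike a bare number, the record carries enough context for a receiving system to act on it without knowing anything about the transmissions that preceded it, so long as both sides share the definition of the form.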