Lexicon-grammar
Lexicon-Grammar is a method and a practice for the formalized description of human languages. It considers that the systematic investigation of lexical entries is currently the main challenge facing the scientific study of language. The development of Lexicon-Grammar began in the late 1960s under Maurice Gross.
Its theoretical basis is Zellig S. Harris's distributionalism, and notably the notion of the transformational rule. Its notational conventions are meant to be as clear and comprehensible as possible.
The method of Lexicon-Grammar is inspired by the hard sciences. It focuses on the collection of facts, and hence on the real use of language, from both a quantitative and an observational point of view.
Lexicon-Grammar also requires formalisation. The results of the description must be formal enough to be usable in natural language processing, in particular through the development of parsers. The information model is such that the results of the description take the form of two-dimensional tables, also called matrices, which cross-tabulate entries with syntactic-semantic properties. As a result, Lexicon-Grammar addresses "the problem of correlations between syntax and the lexicon."
Experiments showed that several researchers or teams can make their observations cumulative.
The term lexicon-grammar was first used by Annibale Elia, following the earlier term grammar-lexicon used by Carl Vikner.
Theoretical basis
The theoretical basis of Lexicon-Grammar is Zellig Harris's distributional structuralism, and in particular the notion of transformation in Harris's sense. Maurice Gross was a student of Zellig Harris. "The view of language of Maurice Gross and his followers is a combination of structuralism and transformational grammar." Lexicon-Grammar also makes use of semantics: "the definition of a transformational rule explicitly involves meaning, since transformationally related sentences must have identical meanings. Transformationally related sentences may have systematic differences of meaning." The main difference from Harris's approach is that the native speaker first guesses the elementary sentence of a lexical item on an intuitive basis, and transformational analysis then confirms or invalidates the guess. Harris's approach consists in investigating transformational rules first, so that the elementary sentences of lexical items emerge as the final result of the analysis. The conventions for presenting grammatical information are intended to be as simple and transparent as possible. This concern comes from Zellig Harris, whose theory is oriented towards directly observable surface forms; it differs in this respect from Generative Grammar, which normally uses abstract structures such as deep structures.
Fact collection
The Lexicon-Grammar method is inspired by experimental science. It emphasizes the collection of facts, confronting the researcher with the reality of language use, from both a quantitative and an observational point of view.
Quantitatively: a lexicon-grammar includes a program of systematic description of the lexicon, which involves observing, for each lexical entry, the syntactic constructions in which it occurs. This is large-scale work, which requires teams rather than isolated specialists. The exclusive search for general rules of syntax, independent of the lexical material they handle, is dismissed as a dead end. This differs from Generative Grammar, which values the notion of generalization.
Observationally: methodological precautions are applied to ensure good reproducibility of observations, and in particular to guard against the risks associated with constructed examples. One of these precautions is to take the elementary sentence, rather than the word, as the minimal unit of meaning. Indeed, a word acquires a precise meaning only in context; moreover, by inserting a word into a sentence, one obtains a sequence that can be judged acceptable or unacceptable. Only under these conditions are syntactic-semantic properties considered to be defined precisely enough for it to make sense to test and check them against the whole lexicon. These precautions have evolved with needs and with the appearance of new technical means. Thus, from the beginning of the 1990s, contributors to Lexicon-Grammar have been able to draw increasingly easily on attested examples from text corpora. This new precaution was simply added to the previous ones, placing Lexicon-Grammar simultaneously within introspective linguistics and corpus linguistics, much as advocated by Fillmore. The American projects FrameNet and VerbNet show a relative convergence towards objectives close to those of Lexicon-Grammar.
Formalisation
Lexicon-Grammar also requires formalisation. The results of the description must be sufficiently formal to allow for:
- verification by comparison with the reality of language use;
- application to automatic language processing, and more particularly to deep linguistic processing, notably through the development of parsers by computer scientists.
In the same spirit, lexical ambiguity is represented by carefully splitting a word into a discrete number of lexical entries, which are as distinct from one another as two entries for morphologically different words. For example, the different meanings of tell correspond to distinct entries.
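As a rough illustration, the fragment below sketches how such a split might be represented in a simple data structure; the entry identifiers, constructions and examples are invented for the purpose of the example and are not taken from an actual Lexicon-Grammar table.

```python
# Illustrative sketch: an ambiguous word split into distinct lexical entries.
# The entry identifiers, constructions and examples are invented; they are
# not drawn from an actual Lexicon-Grammar table.

LEXICON = [
    {
        "entry_id": "tell_1",
        "lemma": "tell",
        "construction": "N0 tell N1 to N2",    # communicate something to someone
        "example": "Max told the story to Eva",
    },
    {
        "entry_id": "tell_2",
        "lemma": "tell",
        "construction": "N0 tell N1 from N2",  # distinguish one thing from another
        "example": "Max can tell butter from margarine",
    },
]

def entries_for(lemma):
    """Return every distinct entry associated with one orthographic word."""
    return [e for e in LEXICON if e["lemma"] == lemma]

print([e["entry_id"] for e in entries_for("tell")])  # ['tell_1', 'tell_2']
```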
The syntactic-semantic properties of the entries (for example, the selection of the subject) form a list that is systematically compared against all the entries. They are identified by fairly informal headings; such a heading may, for instance, represent a transformation between two sentence structures belonging to the same entry.
Finally, the approach retains only those properties for which a procedure exists that allows for a sufficiently reliable determination of whether a given lexical entry possesses it or not. Such a procedure is determined experimentally by testing the reproducibility of judgments on an extensive vocabulary. Properties are therefore modeled as binary and not as continua.
Within this formal model, describing a language essentially consists of specifying the properties of its lexical entries. The results of the description naturally take the form of two-dimensional tables, also called matrices, which cross-tabulate entries with syntactic-semantic properties. As a result, they make up a database of syntactic-semantic information. Thus, Lexicon-Grammar deals with the syntactic properties by which one entry differs from another, and addresses "the problem of correlations between syntax and the lexicon."
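To make the information model concrete, the sketch below encodes a tiny fragment of such a table as a binary matrix and shows one cross-tabulation query; the verbs and property labels are invented for the illustration and do not reproduce any published table.

```python
# Schematic sketch of a lexicon-grammar table: rows are lexical entries,
# columns are syntactic-semantic properties, cells are binary judgments.
# The entries and properties below are invented for illustration only.

PROPERTIES = ["N0 is human", "N1 is sentential", "passive possible"]

TABLE = {
    "tell_1": {"N0 is human": True,  "N1 is sentential": True,  "passive possible": True},
    "tell_2": {"N0 is human": True,  "N1 is sentential": False, "passive possible": False},
    "rain_1": {"N0 is human": False, "N1 is sentential": False, "passive possible": False},
}

def entries_with(prop):
    """Return the entries marked '+' for a given property (cross-tabulation query)."""
    return [entry for entry, row in TABLE.items() if row.get(prop)]

def as_matrix():
    """Render the table in the '+'/'-' style of a two-dimensional matrix."""
    lines = ["\t" + "\t".join(PROPERTIES)]
    for entry, row in TABLE.items():
        lines.append(entry + "\t" + "\t".join("+" if row[p] else "-" for p in PROPERTIES))
    return "\n".join(lines)

print(entries_with("passive possible"))  # ['tell_1']
print(as_matrix())
```

A parser or other language-processing component can then consult such a matrix to decide which constructions a given entry licenses.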
The description of sentence structure involves identifying a set of arguments characteristic of each predicative entry. In particular, principles are applied to distinguish arguments from non-essential complements.
Results
The results obtained through the application of these methodological principles by several dozen linguists over several decades constitute a database of syntactic-semantic information for natural language processing. The quality of this database can be assessed by considering:
- its size, measurable by the number of entries,
- the richness of the linguistic phenomena it covers, measurable by the number of properties,
- and its degree of formalization.
Work has been carried out and published within the Lexicon–Grammar framework on predicative nouns since the 1970s, and on fixed expressions since the 1980s.
The notion of a noun predicate comes from the work of Zellig Harris. It is based on the following parallel: if, for example, the verb study is analyzed as the predicate in the sentence Luke studies eclipses, then it is natural to analyze the noun study (or the sequence carry out a study) as a predicate in the sentence Luke carries out a study on eclipses. In this case, the noun in question is called a noun predicate. The verb that accompanies it, here carry out, is referred to as a support verb. This idea has been systematically applied in the Lexicon–Grammar framework since the 1970s.
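The sketch below illustrates, in a deliberately naive way, how an entry for a predicative noun might record its support verb, and how the two related sentences are built from the same arguments; the field names and string templates are invented for the example.

```python
# Naive illustration of a predicative noun paired with its support verb.
# Field names and string templates are invented for this example; real
# Lexicon-Grammar tables encode many more properties per entry.

STUDY = {
    "predicative_noun": "study",
    "related_verb": "study",      # verbal counterpart: "Luke studies eclipses"
    "support_verb": "carry out",  # support-verb counterpart: "Luke carries out a study on eclipses"
    "preposition": "on",
}

def third_person(verb):
    # extremely naive morphology, just enough for this example
    return verb[:-1] + "ies" if verb.endswith("y") else verb + "s"

def verbal_sentence(entry, subject, complement):
    return f"{subject} {third_person(entry['related_verb'])} {complement}"

def support_verb_sentence(entry, subject, complement):
    head, *rest = entry["support_verb"].split()
    verb = " ".join([third_person(head), *rest])  # "carry out" -> "carries out"
    return (f"{subject} {verb} a {entry['predicative_noun']} "
            f"{entry['preposition']} {complement}")

print(verbal_sentence(STUDY, "Luke", "eclipses"))        # Luke studies eclipses
print(support_verb_sentence(STUDY, "Luke", "eclipses"))  # Luke carries out a study on eclipses
```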
Lexicon–Grammar contributors use the term 'fixed expression' when an expression has specific properties that justify giving it its own lexical entry, even though it is made up of several elements that can, in one way or another, be considered individual words. A systematic program for describing such expressions has been undertaken within the Lexicon–Grammar framework since the 1980s.
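By way of illustration, the hypothetical fragment below shows how a fixed expression might receive its own entry alongside single-word entries, with properties recording that its internal elements cannot vary freely; the expression and the property labels are chosen for the example only.

```python
# Illustrative sketch: a fixed expression receives its own lexical entry,
# even though it is made up of several orthographic words.
# The entry and its property labels are invented for the example.

FIXED_ENTRY = {
    "entry_id": "kick_the_bucket_1",
    "components": ["kick", "the", "bucket"],
    "construction": "N0 kick the bucket",
    "gloss": "die",
    "passive possible": False,   # *The bucket was kicked by Max (idiomatic reading lost)
    "object can vary": False,    # *kick the pail
}

def is_fixed_expression(entry):
    """Multiword entry whose internal elements cannot vary freely."""
    return len(entry["components"]) > 1 and not entry["object can vary"]

print(is_fixed_expression(FIXED_ENTRY))  # True
```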
Cumulativity
These experiments have shown that several individuals or teams can arrive at identical results. This reproducibility ensures the cumulativity of the descriptions. This outcome is crucial for the future of language processing: the amount of data that must be gathered and represented within a coherent model is such that many research and development teams have to collaborate, and it must be possible to merge their results without having to rewrite substantial parts of the grammar and lexicon of each language. This requirement is far from easy to meet, as there are few known examples of grammars of significant size that are not the work of a single specialist.
Interface with international standards
With the goal of making the data available in a clear and explicit form, part of the French Lexicon–Grammar has been transcoded into the format of LMF (Lexical Markup Framework), the ISO standard for lexical resources.
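As a rough sketch of what such a transcoding involves, the fragment below builds a minimal LMF-style record for a single entry; the element names (LexicalResource, Lexicon, LexicalEntry, Lemma, feat) follow the LMF core model, but the entry, its features and the overall structure are heavily simplified and do not reproduce the actual converted resource.

```python
# Minimal, simplified sketch of an LMF-style encoding of one lexicon-grammar entry.
# Element names follow the LMF core model; the entry and feature values are
# invented for illustration and do not reproduce the actual converted resource.
import xml.etree.ElementTree as ET

resource = ET.Element("LexicalResource")
lexicon = ET.SubElement(resource, "Lexicon")
ET.SubElement(lexicon, "feat", att="language", val="fr")

entry = ET.SubElement(lexicon, "LexicalEntry")
lemma = ET.SubElement(entry, "Lemma")
ET.SubElement(lemma, "feat", att="writtenForm", val="raconter")

# one syntactic-semantic property carried over from a table, expressed as a feature
ET.SubElement(entry, "feat", att="partOfSpeech", val="verb")
ET.SubElement(entry, "feat", att="N0 is human", val="+")

print(ET.tostring(resource, encoding="unicode"))
```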
Selected bibliography
- Gross, Maurice. 1994. "The lexicon grammar of a language: Application to French", in Asher, R.E. (ed.), Encyclopedia of Language and Linguistics, Oxford: Pergamon Press, pp. 2195–2205.
- Zheng, Dingou (郑定欧). 2012. 词汇-语法五十年 / Lexicon-grammar: 50 years / Lexique-Grammaire : 50 ans. Beijing/Guangzhou/Shanghai/Xi'an, 278 pages.
Category:Methods in linguistics
Category:Grammar frameworks