Constituent (linguistics)
In syntactic analysis, a constituent is a word or a group of words that functions as a single unit within a hierarchical structure. The constituent structure of sentences is identified using tests for constituents. These tests apply to a portion of a sentence, and the results provide evidence about the constituent structure of the sentence. Many constituents are phrases. A phrase is a sequence of one or more words built around a head lexical item and working as a unit within a sentence. A word sequence is shown to be a phrase/constituent if it exhibits one or more of the behaviors discussed below. The analysis of constituent structure is associated mainly with phrase structure grammars, although dependency grammars also allow sentence structure to be broken down into constituent parts.
Tests for constituents in English
Tests for constituents are diagnostics used to identify sentence structure. There are numerous tests for constituents that are commonly used to identify the constituents of English sentences. 15 of the most commonly used tests are listed next: 1) coordination, 2) pro-form substitution, 3) topicalization, 4) do-so-substitution, 5) one-substitution, 6) answer ellipsis, 7) clefting, 8) VP-ellipsis, 9) pseudoclefting, 10) passivization, 11) omission, 12) intrusion, 13) wh-fronting, 14) general substitution, 15) right node raising (RNR). The order in which these 15 tests are listed here corresponds to the frequency of use, coordination being the most frequently used of the 15 tests and RNR being the least frequently used. A general word of caution is warranted when employing these tests, since they often deliver contradictory results. The tests are merely rough-and-ready tools that grammarians employ to reveal clues about syntactic structure. Some syntacticians even arrange the tests on a scale of reliability, with less-reliable tests treated as useful to confirm constituency though not sufficient on their own. Failing to pass a single test does not mean that the test string is not a constituent, and conversely, passing a single test does not necessarily mean the test string is a constituent. It is best to apply as many tests as possible to a given string in order to prove or to rule out its status as a constituent.
The 15 tests are introduced, discussed, and illustrated below, relying mainly on one and the same sentence: Drunks could put off the customers.
Restricting the introduction and discussion of the tests for constituents below mainly to this one sentence makes it possible to compare the results of the tests. To aid the discussion and illustration of the constituent structure of this sentence, two sentence diagrams (trees) are employed, the first based on dependency grammar and the second on phrase structure grammar.
These diagrams show two potential analyses of the constituent structure of the sentence. A given node in a tree diagram is understood as marking a constituent, that is, a constituent is understood as corresponding to a given node and everything that that node exhaustively dominates. Hence the first tree, which shows the constituent structure according to dependency grammar, marks the following words and word combinations as constituents: Drunks, off, the, the customers, and put off the customers. The second tree, which shows the constituent structure according to phrase structure grammar, marks the following words and word combinations as constituents: Drunks, could, put, off, the, customers, the customers, put off the customers, and could put off the customers. The analyses in these two tree diagrams provide orientation for the discussion of tests for constituents that now follows.
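The node-based definition above can be made concrete with a small sketch. The Python below is illustrative only: the head assignments in `HEADS` and the bracketing in `PS_TREE` are assumptions chosen to be consistent with the constituents listed above, and the function names are hypothetical. As in the lists above, the root node (the whole sentence) is left out.

```python
from typing import List, Optional, Union

WORDS = ["Drunks", "could", "put", "off", "the", "customers"]

# Assumed dependency analysis: HEADS[i] is the index of the head of WORDS[i];
# None marks the root (could).
HEADS: List[Optional[int]] = [1, None, 1, 2, 5, 2]

# Assumed phrase structure analysis as nested lists; plain strings are words.
Tree = Union[str, list]
PS_TREE: Tree = ["Drunks", ["could", ["put", "off", ["the", "customers"]]]]


def dependency_constituents(words: List[str], heads: List[Optional[int]]) -> List[str]:
    """Return each non-root word together with everything it dominates."""
    children = {i: [] for i in range(len(words))}
    for i, head in enumerate(heads):
        if head is not None:
            children[head].append(i)

    def subtree(i: int) -> List[int]:
        nodes = [i]
        for child in children[i]:
            nodes.extend(subtree(child))
        return nodes

    return [
        " ".join(words[j] for j in sorted(subtree(i)))
        for i, head in enumerate(heads)
        if head is not None  # skip the root, i.e. the whole sentence
    ]


def leaves(node: Tree) -> List[str]:
    """Return the words that a node exhaustively dominates, left to right."""
    if isinstance(node, str):
        return [node]
    words: List[str] = []
    for child in node:
        words.extend(leaves(child))
    return words


def phrase_structure_constituents(tree: Tree) -> List[str]:
    """Return the word string of every node below the root."""
    found: List[str] = []

    def visit(node: Tree, is_root: bool) -> None:
        if not is_root:
            found.append(" ".join(leaves(node)))
        if not isinstance(node, str):
            for child in node:
                visit(child, False)

    visit(tree, True)
    return found


print(dependency_constituents(WORDS, HEADS))
print(phrase_structure_constituents(PS_TREE))
```

Under these assumptions the dependency analysis yields five constituents and the phrase structure analysis nine, since every word is a node of its own in the latter but only words without dependents form complete subtrees in the former.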
Coordination
The coordination test assumes that only constituents can be coordinated, i.e., joined by means of a coordinator such as and, or, or but. The next examples demonstrate that coordination identifies individual words as constituents:
The square brackets mark the conjuncts of the coordinate structures. Based on these data, one might assume that drunks, could, put off, and customers are constituents in the test sentence because these strings can be coordinated with bums, would, drive away, and neighbors, respectively. Coordination also identifies multi-word strings as constituents:
These data suggest that the customers, put off the customers, and could put off the customers are constituents in the test sentence.
Examples such as these are not controversial, insofar as many theories of sentence structure readily view the tested strings as constituents. However, additional data are problematic, since they suggest that certain strings are also constituents even though most theories of syntax do not acknowledge them as such, e.g.
These data suggest that could put off, put off these, and Drunks could are constituents in the test sentence. Most theories of syntax reject the notion that these strings are constituents, though. Such data are sometimes addressed in terms of the right node raising mechanism.
The problem that such examples pose for the coordination test is compounded when one looks beyond the test sentence, for one quickly finds that coordination suggests that a wide range of strings are constituents that most theories of syntax do not acknowledge as such, e.g.
The strings from home on Tuesday and from home on Tuesday on his bicycle are not viewed as constituents in most theories of syntax, and in some cases it is very difficult even to discern how one should delimit the conjuncts of the coordinate structure. Coordinate structures of the former kind are sometimes characterized in terms of non-constituent conjuncts, and cases of the latter kind are sometimes discussed in terms of stripping and/or gapping.
Due to the difficulties that such examples raise, many grammarians are skeptical about the value of coordination as a test for constituents. The discussion of the other tests for constituents below reveals that this skepticism is warranted, since coordination identifies many more strings as constituents than the other tests for constituents.
Proform substitution (replacement)
Proform substitution, or replacement, involves replacing the test string with the appropriate proform. Substitution normally involves using a definite proform like it, he, there, here, etc. in place of a phrase or a clause. If such a change yields a grammatical sentence where the general structure has not been altered, then the test string is likely a constituent:
These examples suggest that Drunks, the customers, and put off the customers in the test sentence are constituents. An important aspect of the proform test is the fact that it fails to identify most sub-phrasal strings as constituents, e.g.
These examples suggest that the individual words could, put, off, and customers should not be viewed as constituents. This suggestion is of course controversial, since most theories of syntax assume that individual words are constituents by default. The conclusion one can reach based on such examples, however, is that proform substitution using a definite proform identifies phrasal constituents only; it fails to identify sub-phrasal strings as constituents.
Topicalization (fronting)
Topicalization involves moving the test string to the front of the sentence. It is a simple movement operation. Many instances of topicalization seem only marginally acceptable when taken out of context. Hence to suggest a context, an instance of topicalization can be preceded by ...and, and a modal adverb can be added as well:
These examples suggest that the customers and put off the customers are constituents in the test sentence. Topicalization is like many of the other tests in that it identifies phrasal constituents only. When the test sequence is a sub-phrasal string, topicalization fails:
These examples demonstrate that customers, could, put, off, and the fail the topicalization test. Since these strings are all sub-phrasal, one can conclude that topicalization is unable to identify sub-phrasal strings as constituents.
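Since the tests can disagree, it can help to tabulate their verdicts per string. The following Python sketch does so for the three tests discussed so far. It is only an illustration: the `RESULTS` table records the outcomes reported in the text above (the controversial coordination results such as Drunks could are left out), and the names `RESULTS` and `tally` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# Outcomes as reported in the discussion above; strings a test was not
# applied to are simply absent from that test's lists.
RESULTS: Dict[str, Dict[str, List[str]]] = {
    "coordination": {
        "pass": ["Drunks", "could", "put off", "customers", "the customers",
                 "put off the customers", "could put off the customers"],
        "fail": [],
    },
    "proform substitution": {
        "pass": ["Drunks", "the customers", "put off the customers"],
        "fail": ["could", "put", "off", "customers"],
    },
    "topicalization": {
        "pass": ["the customers", "put off the customers"],
        "fail": ["customers", "could", "put", "off", "the"],
    },
}


def tally(results: Dict[str, Dict[str, List[str]]]) -> Dict[str, Dict[str, int]]:
    """Count, for each test string, how many tests it passed and failed."""
    counts = defaultdict(lambda: {"pass": 0, "fail": 0})
    for outcomes in results.values():
        for outcome, strings in outcomes.items():
            for string in strings:
                counts[string][outcome] += 1
    return dict(counts)


for string, verdict in sorted(tally(RESULTS).items()):
    print(f"{string!r}: passed {verdict['pass']}, failed {verdict['fail']}")
```

In this tabulation the phrasal strings the customers and put off the customers pass all three tests, whereas the individual words could and customers pass only coordination, which matches the observation that proform substitution and topicalization identify phrasal constituents only.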
Do-so-substitution
Do-so-substitution is a test that substitutes a form of do so into the test sentence for the target string. This test is widely used to probe the structure of strings containing verbs. The test is limited in its applicability, though, precisely because it is only applicable to strings containing verbs:
The a example suggests that put off the customers is a constituent in the test sentence, whereas the b example fails to suggest that could put off the customers is a constituent, for do so cannot include the meaning of the modal verb could. To illustrate more completely how the do so test is employed, another test sentence is now used, one that contains two post-verbal adjunct phrases:
These data suggest that met them, met them in the pub, and met them in the pub because we had time are constituents in the test sentence. Taken together, such examples seem to motivate a structure for the test sentence that has a left-branching verb phrase, because only in a left-branching verb phrase does each of the indicated strings correspond to a constituent. There is a problem with this sort of reasoning, however, as the next example illustrates:
In this case, did so appears to stand in for the discontinuous word combination consisting of met them and because we had time. Such a discontinuous combination of words cannot be construed as a constituent. That such an interpretation of did so is indeed possible is seen in a fuller sentence such as You met them in the cafe because you had time, and we did so in the pub. In this case, the preferred reading of did so is that it indeed simultaneously stands in for both met them and because we had time.
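The branching argument and its problem can be sketched in Python. The bracketings below are assumptions made for illustration: `LEFT_VP` is one possible left-branching verb phrase, `RIGHT_VP` is a hypothetical right-branching alternative, and the adjuncts are kept flat because their internal structure does not matter here.

```python
from typing import List, Set, Union

Tree = Union[str, list]

# Assumed left-branching verb phrase: each adjunct attaches above the last.
LEFT_VP: Tree = [[["met", "them"], ["in", "the", "pub"]],
                 ["because", "we", "had", "time"]]

# Hypothetical right-branching alternative, for comparison only.
RIGHT_VP: Tree = ["met", ["them", [["in", "the", "pub"],
                                   ["because", "we", "had", "time"]]]]


def leaves(node: Tree) -> List[str]:
    """Return the words that a node exhaustively dominates, left to right."""
    if isinstance(node, str):
        return [node]
    words: List[str] = []
    for child in node:
        words.extend(leaves(child))
    return words


def constituents(tree: Tree) -> Set[str]:
    """Return the word string of every node, including the root."""
    found = {" ".join(leaves(tree))}
    if not isinstance(tree, str):
        for child in tree:
            found |= constituents(child)
    return found


DO_SO_TARGETS = [
    "met them",
    "met them in the pub",
    "met them in the pub because we had time",
]

for name, tree in [("left-branching", LEFT_VP), ("right-branching", RIGHT_VP)]:
    nodes = constituents(tree)
    print(name, [target in nodes for target in DO_SO_TARGETS])

# The discontinuous combination is not a node in either structure.
print("met them because we had time" in constituents(LEFT_VP))
```

Under these assumptions only the left-branching structure contains all three strings as nodes, which is the reasoning described above. The discontinuous combination of met them and because we had time corresponds to no node in either structure, so the reading of did so just discussed cannot be captured by constituent structure alone.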