Scannerless parsing


In computer science, scannerless parsing performs tokenization and parsing in a single step, rather than breaking it up into a pipeline of a lexer followed by a parser, executing concurrently. A language grammar is scannerless if it uses a single formalism to express both the lexical and phrase level structure of the language.
Dividing processing into a lexer followed by a parser is more modular; scannerless parsing is primarily used when a clear lexer–parser distinction is unneeded or unwanted. Examples of when this is appropriate include TeX, most wiki grammars, makefiles, simple application-specific scripting languages, and Raku.

Advantages

Disadvantages

  • Since the lexical scanning and syntactic parsing are combined, the resulting parser tends to be more complicated and thus harder to understand and debug. The same will hold for the associated grammar, if a grammar is used to generate the parser.
  • The resulting parser tends to be significantly less efficient than a lexer-parser pipeline with regard to both time and memory.

Implementations