Outline of natural language processing
Natural-language processing is computer activity in which computers are used to analyze, understand, alter, or generate natural language. This includes the automation of any or all linguistic forms, activities, or methods of communication, such as conversation, correspondence, reading, written composition, dictation, publishing, translation, lip reading, and so on. Natural-language processing is also the name of the branch of computer science, artificial intelligence, and linguistics concerned with enabling computers to engage in communication using natural language in all forms, including but not limited to speech, print, writing, and signing. The following outline is provided as an overview of and topical guide to natural-language processing:
Natural-language processing
Natural-language processing can be described as all of the following:
- A field of science - systematic enterprise that builds and organizes knowledge in the form of testable explanations and predictions about the universe.
- * An applied science - field that applies human knowledge to build or design useful things.
- ** A field of computer science - scientific and practical approach to computation and its applications.
- *** A branch of artificial intelligence - intelligence of machines and robots and the branch of computer science that aims to create it.
- *** A subfield of computational linguistics - interdisciplinary field dealing with the statistical or rule-based modeling of natural language from a computational perspective.
- * An application of engineering - science, skill, and profession of acquiring and applying scientific, economic, social, and practical knowledge in order to design and build structures, machines, devices, systems, materials, and processes.
- ** An application of software engineering - application of a systematic, disciplined, quantifiable approach to the design, development, operation, and maintenance of software, and the study of these approaches; that is, the application of engineering to software.
- *** A subfield of computer programming - process of designing, writing, testing, debugging, and maintaining the source code of computer programs. This source code is written in one or more programming languages. The purpose of programming is to create a set of instructions that computers use to perform specific operations or to exhibit desired behaviors.
- **** A subfield of artificial intelligence programming -
- A type of system - set of interacting or interdependent components forming an integrated whole or a set of elements and relationships which are different from relationships of the set or its elements to other elements or sets.
- * A system that includes software - software is a collection of computer programs and related data that provides the instructions for telling a computer what to do and how to do it. Software refers to one or more computer programs and data held in the storage of the computer. In other words, software is a set of programs, procedures, algorithms, and their documentation concerned with the operation of a data processing system.
- A type of technology - making, modification, usage, and knowledge of tools, machines, techniques, crafts, systems, methods of organization, in order to solve a problem, improve a preexisting solution to a problem, achieve a goal, handle an applied input/output relation or perform a specific function. It can also refer to the collection of such tools, machinery, modifications, arrangements and procedures. Technologies significantly affect human as well as other animal species' ability to control and adapt to their natural environments.
- * A form of computer technology - computers and their application. NLP makes use of computers, image scanners, microphones, and many types of software programs.
- ** Language technology - consists of natural-language processing and computational linguistics on the one hand, and speech technology on the other. It also includes many application-oriented aspects of these. It is often called human language technology.
Prerequisite technologies
- Communication - the activity of a source sending a message to a receiver
- * Language -
- ** Speech -
- ** Writing -
- * Computing -
- ** Computers -
- ** Computer programming -
- *** Information extraction -
- *** User interface -
- ** Software -
- *** Text editor - program used to edit plain text files
- *** Word processor - piece of software used for composing, editing, formatting, and printing documents
- ** Input devices - pieces of hardware for sending data to a computer to be processed
- *** Computer keyboard - typewriter-style input device whose input is converted into various data depending on the circumstances
- *** Image scanners -
Subfields of natural-language processing
- Information extraction - field concerned in general with the extraction of semantic information from text. This covers tasks such as named-entity recognition, coreference resolution, relationship extraction, etc.
- Ontology engineering - field that studies the methods and methodologies for building ontologies, which are formal representations of a set of concepts within a domain and the relationships between those concepts.
- Speech processing - field that covers speech recognition, text-to-speech and related tasks.
- Statistical natural-language processing -
- * Statistical semantics - a subfield of computational semantics that uses statistical patterns of word usage in context to establish semantic relations between words.
- ** Distributional semantics - a subfield of statistical semantics that examines the semantic relationships of words across a corpus or in large samples of data.
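The distributional idea behind the two entries above (words that occur in similar contexts tend to have similar meanings) can be illustrated with a minimal sketch. It assumes whitespace tokenization, a fixed context window of two words on each side, raw co-occurrence counts, and cosine similarity as the comparison measure; the helper names `cooccurrence_vectors` and `cosine` and the five-sentence toy corpus are hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict
import math


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()  # naive whitespace tokenization
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # a word is not part of its own context
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse count vectors stored as Counters."""
    dot = sum(count * v[w] for w, count in u.items())  # missing words count as 0
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus for illustration only.
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the mouse",
    "the dog chases the cat",
    "stocks rose on monday",
]
vecs = cooccurrence_vectors(corpus)
print(round(cosine(vecs["cat"], vecs["dog"]), 3))     # 0.923
print(round(cosine(vecs["cat"], vecs["stocks"]), 3))  # 0.0
```

Here "cat" and "dog" share contexts such as "the", "drinks", and "chases", so their vectors come out similar, while "cat" and "stocks" share no context words at all. Raw counts are dominated by frequent function words such as "the"; practical systems usually reweight the counts (for example with pointwise mutual information) or learn dense word embeddings instead.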
Related fields
- Automated reasoning - area of computer science and mathematical logic dedicated to understanding various aspects of reasoning, and producing software which allows computers to reason completely, or nearly completely, automatically. A subfield of artificial intelligence, automated reasoning is also grounded in theoretical computer science and philosophy of mind.
- Linguistics - scientific study of human language. Natural-language processing requires understanding of the structure and application of language, and therefore it draws heavily from linguistics.
- * Applied linguistics - interdisciplinary field of study that identifies, investigates, and offers solutions to language-related real-life problems. Some of the academic fields related to applied linguistics are education, linguistics, psychology, computer science, anthropology, and sociology. Some of the subfields of applied linguistics relevant to natural-language processing are:
- ** Bilingualism / Multilingualism -
- ** Computer-mediated communication - any communicative transaction that occurs through the use of two or more networked computers. Research on computer-mediated communication focuses largely on the social effects of different computer-supported communication technologies. Many recent studies involve Internet-based social networking supported by social software.
- ** Contrastive linguistics - practice-oriented linguistic approach that seeks to describe the differences and similarities between a pair of languages.
- ** Conversation analysis - approach to the study of social interaction, embracing both verbal and non-verbal conduct, in situations of everyday life. Turn-taking is one aspect of language use that is studied by conversation analysis.
- ** Discourse analysis - various approaches to analyzing written, vocal, or sign language use or any significant semiotic event.
- ** Forensic linguistics - application of linguistic knowledge, methods and insights to the forensic context of law, language, crime investigation, trial, and judicial procedure.
- ** Interlinguistics - study of improving communication between people of different first languages through the use of ethnic and auxiliary languages, for instance intentional international auxiliary languages such as Esperanto or Interlingua, or spontaneous interlanguages known as pidgin languages.
- ** Language assessment - assessment of first, second or other language in the school, college, or university context; assessment of language use in the workplace; and assessment of language in the immigration, citizenship, and asylum contexts. The assessment may include analyses of listening, speaking, reading, writing or cultural understanding, with respect to understanding how the language works theoretically and the ability to use the language practically.
- ** Language pedagogy - science and art of language education, including approaches and methods of language teaching and study. Natural-language processing is used in programs designed to teach language, including first- and second-language training.
- ** Language planning -
- ** Language policy -
- ** Lexicography -
- ** Literacies -
- ** Pragmatics -
- ** Second-language acquisition -
- ** Stylistics -
- ** Translation -
- * Computational linguistics - interdisciplinary field dealing with the statistical or rule-based modeling of natural language from a computational perspective. The models and tools of computational linguistics are used extensively in the field of natural-language processing, and vice versa.
- ** Computational semantics -
- ** Corpus linguistics - study of language as expressed in samples of "real world" text. Corpora is the plural of corpus, and a corpus is a specifically selected collection of texts composed of natural language. After it is constructed, a corpus is analyzed with the methods of computational linguistics to infer the meaning and context of its components, and the relationships between them. Optionally, a corpus can be annotated with data to make the corpus easier to understand. This data is then applied to make sense of user input, for example, to make better guesses of what people are talking about or saying, perhaps to achieve more narrowly focused web searches, or for speech recognition.
- * Metalinguistics -
- * Sign linguistics - scientific study and analysis of natural sign languages, their features, their structure, their acquisition, how they develop independently of other languages, their application in communication, their relationships to other languages, and many other aspects.
- Human–computer interaction - field at the intersection of computer science and the behavioral sciences that involves the study, planning, and design of the interaction between people and computers. Attention to human-machine interaction is important because poorly designed human-machine interfaces can lead to many unexpected problems. A classic example is the Three Mile Island accident, where investigations concluded that the design of the human–machine interface was at least partially responsible for the disaster.
- Information retrieval - field concerned with storing, searching, and retrieving information. It is a separate field within computer science, but it relies on some NLP methods. Some current research and applications seek to bridge the gap between information retrieval and NLP.
- Knowledge representation - area of artificial intelligence research aimed at representing knowledge in symbols to facilitate inference from those knowledge elements, creating new elements of knowledge. Knowledge-representation research involves analysis of how to reason accurately and effectively and how best to use a set of symbols to represent a set of facts within a knowledge domain.
- * Semantic network - network that represents semantic relations between concepts.
- ** Semantic Web -
- Machine learning - subfield of computer science that examines pattern recognition and computational learning theory in artificial intelligence. There are three broad approaches to machine learning. Supervised learning occurs when the machine is given example inputs and outputs by a teacher so that it can learn a rule that maps inputs to outputs. Unsupervised learning occurs when the machine discovers structure in its inputs without being given example outputs. Reinforcement learning occurs when a machine must achieve a goal by interacting with an environment, guided by rewards or penalties rather than by examples from a teacher.
- * Pattern recognition - branch of machine learning that examines how machines recognize regularities in data. As with machine learning generally, teachers can train machines to recognize patterns by providing them with example inputs and outputs, or the machines can find patterns in unlabeled inputs without being given any example outputs.
- * Statistical classification -
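Supervised learning and statistical classification, as described above, can be sketched with a multinomial naive Bayes text classifier. Naive Bayes is one common statistical classifier, picked here only for illustration; the outline does not prescribe any particular method. The sketch assumes whitespace tokenization, add-one (Laplace) smoothing, and that words never seen in training are ignored at prediction time; the class name `NaiveBayesClassifier` and the four labeled training sentences are hypothetical.

```python
from collections import Counter, defaultdict
import math


class NaiveBayesClassifier:
    """Multinomial naive Bayes over bag-of-words features with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # number of training examples per label
        self.word_counts = defaultdict(Counter)  # word frequencies per label
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = text.lower().split()
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            # Log prior for this label.
            score = math.log(n / total)
            # Smoothed denominator: all word tokens seen with this label plus vocabulary size.
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in text.lower().split():
                if tok not in self.vocab:
                    continue  # ignore words never seen during training
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labeled examples: the "example inputs and outputs" of supervised learning.
texts = [
    "great movie loved it",
    "wonderful acting great plot",
    "terrible movie hated it",
    "boring plot awful acting",
]
labels = ["pos", "pos", "neg", "neg"]

clf = NaiveBayesClassifier().fit(texts, labels)
print(clf.predict("loved the great acting"))  # pos
print(clf.predict("awful boring movie"))      # neg
```

The labeled examples play the role of the teacher's example inputs and outputs: `fit` counts how often each word appears under each label, and `predict` picks the label with the highest log prior plus summed log likelihoods. Working in log space avoids numerical underflow when many small probabilities are multiplied together.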