Data validation
In computing, data validation or input validation is the process of ensuring that data has undergone data cleansing to confirm that it has data quality, that is, that it is both correct and useful. It uses routines, often called "validation rules", "validation constraints", or "check routines", that check the correctness, meaningfulness, and security of data input to the system. The rules may be implemented through the automated facilities of a data dictionary, or by including explicit validation logic in the application program.
This is distinct from formal verification, which attempts to prove or disprove the correctness of algorithms for implementing a specification or property.
Overview
Data validation is intended to provide certain well-defined guarantees for fitness and consistency of data in an application or automated system. Data validation rules can be defined and designed using various methodologies, and deployed in various contexts. Their implementation can use declarative data integrity rules or procedure-based business rules.
The guarantees of data validation do not necessarily include accuracy, and it is possible for data entry errors such as misspellings to be accepted as valid. Other clerical or computer controls may be applied to reduce inaccuracy within a system.
Different kinds
In evaluating the basics of data validation, generalizations can be made regarding the different kinds of validation according to their scope, complexity, and purpose. For example:
- Data type validation;
- Range and constraint validation;
- Code and cross-reference validation;
- Structured validation; and
- Consistency validation
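As a rough illustration of how several of these kinds might sit together in one program, the following Python sketch applies a constraint check to a password, and a range check plus a consistency check to a single record. The field names (`quantity`, `shipped_on`, `delivered_on`), the minimum length of 8, and the "at least two categories" rule are hypothetical choices made for this example, not requirements taken from any standard.

```python
import re
from datetime import date
from typing import Any, Dict, List

MIN_PASSWORD_LENGTH = 8  # hypothetical policy value
# Hypothetical character categories: lower case, upper case, digits, other.
CHARACTER_CATEGORIES = (r"[a-z]", r"[A-Z]", r"[0-9]", r"[^A-Za-z0-9]")


def check_password(password: str) -> List[str]:
    """Constraint validation: minimum length plus regular-expression tests."""
    errors = []
    if len(password) < MIN_PASSWORD_LENGTH:
        errors.append("password is too short")
    categories_used = sum(
        1 for pattern in CHARACTER_CATEGORIES if re.search(pattern, password)
    )
    if categories_used < 2:
        errors.append("password must mix at least two character categories")
    return errors


def validate_order(order: Dict[str, Any]) -> List[str]:
    """Structured validation: run several checks on one record, collect errors."""
    errors = []
    # Range validation: the quantity must be a non-negative integer.
    quantity = order["quantity"]
    if not isinstance(quantity, int) or quantity < 0:
        errors.append("quantity must be a non-negative integer")
    # Consistency validation: delivery cannot precede shipment.
    if order["delivered_on"] < order["shipped_on"]:
        errors.append("delivery date precedes shipment date")
    return errors
```

Returning a list of messages rather than raising on the first failure is one possible design; it lets a caller report every problem with a record at once.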
Data-type check
The simplest kind of data type validation verifies that the individual characters provided through user input are consistent with the expected characters of one or more known primitive data types as defined in a programming language or data storage and retrieval mechanism.
For example, an integer field may require input to use only characters 0 through 9.
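A minimal Python sketch of such a character-level check is shown below. The function names are hypothetical, and the sketch assumes an unsigned integer field, so a leading sign is rejected.

```python
def is_integer_field(text: str) -> bool:
    """Return True if text is non-empty and uses only the characters 0-9."""
    # str.isdigit() also accepts some non-ASCII digit characters, so the
    # allowed characters are listed explicitly instead.
    return len(text) > 0 and all(ch in "0123456789" for ch in text)


def parse_integer_field(text: str) -> int:
    """Convert input to int only after it has passed the data-type check."""
    if not is_integer_field(text):
        raise ValueError(f"not an unsigned integer: {text!r}")
    return int(text)
```

Note that the check says nothing about whether the number is sensible for its purpose; that is left to range and constraint validation.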
Simple range and constraint check
Simple range and constraint validation may examine input for consistency with a minimum/maximum range, or consistency with a test for evaluating a sequence of characters, such as one or more tests against regular expressions. For example, a counter value may be required to be a non-negative integer, and a password may be required to meet a minimum length and contain characters from multiple categories.
Code and cross-reference check
Code and cross-reference validation includes operations to verify that data is consistent with one or more possibly external rules, requirements, or collections relevant to a particular organization, context, or set of underlying assumptions. These additional validity constraints may involve cross-referencing supplied data with a known look-up table or a directory information service such as LDAP. For example, a user-provided country code might be required to identify a current geopolitical region.
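The look-up idea can be sketched in Python as a membership test against a reference collection. The four-entry table below is a hypothetical stand-in; a real system would more likely consult a maintained source such as a database table or a directory service.

```python
from typing import AbstractSet

# Hypothetical look-up table, deliberately incomplete and for illustration only.
KNOWN_COUNTRY_CODES = frozenset({"CA", "DE", "JP", "US"})


def is_known_country_code(
    code: str, table: AbstractSet[str] = KNOWN_COUNTRY_CODES
) -> bool:
    """Return True if the normalized code appears in the look-up table."""
    # Normalizing case and surrounding whitespace is an assumption of this
    # sketch; some systems may prefer to reject non-canonical input outright.
    return code.strip().upper() in table
```

Because the table is passed as a parameter, the same check can be run against a different or updated reference set without changing the function.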
Structured check
Structured validation allows for the combination of other kinds of validation, along with more complex processing. Such complex processing may include the testing of conditional constraints for an entire complex data object or set of process operations within a system.
Consistency check
Consistency validation ensures that data is logical. For example, the delivery date of an order can be prohibited from preceding its shipment date.
Example
Multiple kinds of data validation are relevant to 10-digit pre-2007 ISBNs.
- Size. A pre-2007 ISBN must consist of 10 digits, with optional hyphens or spaces separating its four parts.
- Format checks. Each of the first 9 digits must be 0 through 9, and the 10th must be either 0 through 9 or an X.
- Check digit. To detect transcription errors in which digits have been altered or transposed, the last digit of a pre-2007 ISBN must match the result of a mathematical formula incorporating the other 9 digits.
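The three checks above can be sketched in Python as follows. The text does not spell out the check-digit formula; this sketch uses the standard ISBN-10 rule, in which the ten values are weighted 10 down to 1, a final X counts as 10, and the weighted sum must be divisible by 11. The function name is hypothetical, and the sketch does not verify that hyphens or spaces fall between the four parts, only that they can be removed.

```python
import re


def is_valid_isbn10(text: str) -> bool:
    """Apply size, format, and check-digit validation to a 10-digit ISBN."""
    # Size: after removing optional hyphens and spaces, 10 characters remain.
    compact = re.sub(r"[- ]", "", text)
    if len(compact) != 10:
        return False
    # Format: nine digits 0-9, then either a digit or an X.
    if not re.fullmatch(r"[0-9]{9}[0-9X]", compact):
        return False
    # Check digit: weights 10, 9, ..., 1; X in the last position means 10.
    values = [10 if ch == "X" else int(ch) for ch in compact]
    total = sum(w * v for w, v in zip(range(10, 0, -1), values))
    return total % 11 == 0
```

For instance, `is_valid_isbn10("0-306-40615-2")` returns `True`, while altering the final digit or swapping two adjacent digits (`"0-306-40651-2"`) makes the check fail.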
Validation types
- Batch totals
- Cardinality check
- Check digits
- Consistency checks
- Cross-system consistency checks
- Data type checks
- File existence check
- Format check
- Presence check
- Range check
- Referential integrity
- Spelling and grammar check
- Uniqueness check
- Table look up check
Post-validation actions
- Enforcement Action
- Advisory Action
- Verification Action
- Log of validation