Probabilistic database
Most real databases contain data whose correctness is uncertain. Working with such data requires quantifying its integrity, which is what probabilistic databases are designed to do.
A probabilistic database is an uncertain database in which the possible worlds have associated probabilities. Probabilistic database management systems are currently an active area of research. "While there are currently no commercial probabilistic database systems, several research prototypes exist..."
Probabilistic databases distinguish between the logical data model and the physical representation of the data much like relational databases do in the ANSI-SPARC Architecture.
In probabilistic databases this distinction is even more crucial, since such databases must succinctly represent very large numbers of possible worlds, often exponential in the size of one world.
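To illustrate why a succinct representation matters, the following minimal sketch compares the size of a compact representation with the number of possible worlds it stands for. It assumes that every uncertain tuple is independently present or absent; the names `world_count` and `compact`, the tuple values, and the probability 0.5 are hypothetical and chosen only for illustration.

```python
def world_count(num_uncertain_tuples: int) -> int:
    """Number of possible worlds when each uncertain tuple is
    independently either present or absent (an assumption)."""
    return 2 ** num_uncertain_tuples


# Compact representation: one (tuple, probability) entry per uncertain tuple.
# The probability 0.5 is a placeholder, not a value from any real system.
compact = [((f"a{i}", f"b{i}"), 0.5) for i in range(1, 21)]

entries = len(compact)                  # grows linearly with the data
worlds = world_count(len(compact))      # grows exponentially with the data
print(entries, worlds)
```

Under this assumption, twenty uncertain tuples need only twenty entries but describe more than a million worlds, so listing every world explicitly quickly becomes impractical.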
Terminology
In a probabilistic database, each tuple is associated with a probability between 0 and 1, with 0 representing that the data is certainly incorrect, and 1 representing that it is certainly correct.
Possible worlds
A probabilistic database could exist in multiple states. For example, if there is uncertainty about the existence of a tuple in the database, then the database could be in two different states with respect to that tuple: the first state contains the tuple, while the second one does not. Similarly, if an attribute can take one of the values x, y or z, then the database can be in three different states with respect to that attribute. Each of these states is called a possible world.
Consider the following database:
| A | B |
| a1 | b1 |
| a2 | b2 |
| a3 | {b3, b3′′} |
Assume that there is uncertainty about the existence of the first tuple, certainty about the second tuple, and uncertainty about the value of attribute B in the third tuple.
Consequently, the possible worlds corresponding to the database are as follows:
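World 1 (first tuple present, B = b3):
| A | B |
| a1 | b1 |
| a2 | b2 |
| a3 | b3 |

World 2 (first tuple present, B = b3′′):
| A | B |
| a1 | b1 |
| a2 | b2 |
| a3 | b3′′ |

World 3 (first tuple absent, B = b3):
| A | B |
| a2 | b2 |
| a3 | b3 |

World 4 (first tuple absent, B = b3′′):
| A | B |
| a2 | b2 |
| a3 | b3′′ |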
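The enumeration above can be reproduced mechanically. The following is a minimal sketch, not a description of how any particular system stores its data: it combines the two choices for the first tuple (present or absent) with the candidate values of B that appear in the listed worlds. The names `possible_worlds`, `optional_tuple` and `b_choices` are hypothetical.

```python
from itertools import product

certain = [("a2", "b2")]        # tuple known to be correct
optional_tuple = ("a1", "b1")   # tuple that may or may not exist
b_choices = ["b3", "b3′′"]      # values of B seen in the listed worlds


def possible_worlds():
    """Return one list of tuples per possible world."""
    worlds = []
    for keep_first, b_value in product([True, False], b_choices):
        world = []
        if keep_first:
            world.append(optional_tuple)
        world.extend(certain)
        world.append(("a3", b_value))
        worlds.append(world)
    return worlds


worlds = possible_worlds()
for number, world in enumerate(worlds, start=1):
    print(number, world)
```

Two independent choices with two options each give 2 × 2 = 4 worlds, matching the four tables listed.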
Types of Uncertainties
There are essentially two kinds of uncertainty that could exist in a probabilistic database, as described in the table below:
| Tuple-level uncertainty | Attribute-level uncertainty |
| Uncertainty about whether a tuple is correct, that is, whether it should exist in the database or not. | Uncertainty about the value that an attribute of a tuple takes, that is, it could take one of several possible values. |
| Corresponding to each uncertain tuple, there are two possible worlds: one that includes the tuple and one that does not. | Corresponding to each uncertain attribute which can take one of the values a1,...,an, there are n possible worlds. |
| Tuple-level uncertainty can be seen as a Boolean random variable associated with each uncertain tuple. | Attribute-level uncertainty can be seen as a random variable associated with each uncertain attribute which can take values a1,...,an. |
By assigning values to random variables associated with the data items, different possible worlds can be represented.
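As a rough illustration of this idea, the sketch below attaches one Boolean random variable to an uncertain tuple and one multi-valued random variable to an uncertain attribute, then derives a world and its probability from each joint assignment. The probabilities are hypothetical (the text gives none), and the two variables are assumed to be independent, which need not hold in general. The names `x1`, `x3`, `world_for` and `world_distribution` are likewise illustrative.

```python
from itertools import product

# Hypothetical distributions; the numbers are placeholders, not source data.
x1 = {True: 0.7, False: 0.3}      # Boolean variable: does tuple (a1, b1) exist?
x3 = {"b3": 0.6, "b3′′": 0.4}     # variable: value of B in the a3 tuple


def world_for(exists_a1, b_value):
    """Build the world selected by one assignment to (x1, x3)."""
    world = [("a2", "b2"), ("a3", b_value)]
    if exists_a1:
        world.insert(0, ("a1", "b1"))
    return world


def world_distribution():
    """Pair each world with its probability, assuming x1 and x3 are independent."""
    dist = []
    for (v1, p1), (v3, p3) in product(x1.items(), x3.items()):
        dist.append((world_for(v1, v3), p1 * p3))
    return dist


dist = world_distribution()
total = sum(p for _, p in dist)   # should be 1 up to floating-point rounding
for world, p in dist:
    print(round(p, 2), world)
```

Each assignment of values to the variables picks out exactly one world, and under the independence assumption the world's probability is the product of the probabilities of the individual assignments.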