Association rule learning


Association rule learning is a rule-based machine learning method for discovering interesting relations between variables in large databases. It is intended to identify strong rules discovered in databases using some measures of interestingness. Given a set of transactions, each containing a variety of items, association rule learning aims to discover the rules that determine how or why certain items are connected.
Based on the concept of strong rules, Rakesh Agrawal, Tomasz Imieliński and Arun Swami introduced association rules for discovering regularities between products in large-scale transaction data recorded by point-of-sale systems in supermarkets. For example, the rule {onions, potatoes} ⇒ {burger} found in the sales data of a supermarket would indicate that if a customer buys onions and potatoes together, they are likely to also buy hamburger meat. Such information can be used as the basis for decisions about marketing activities such as promotional pricing or product placements.
In addition to the above example from market basket analysis, association rules are employed today in many application areas including Web usage mining, intrusion detection, continuous production, and bioinformatics. In contrast with sequence mining, association rule learning typically does not consider the order of items either within a transaction or across transactions.
Association rule algorithms involve various parameters that can make them difficult to execute for those without some expertise in data mining, and they often produce many rules that are arduous to understand.

Definition

Following the original definition by Agrawal, Imieliński and Swami, the problem of association rule mining is defined as:
Let I = {i₁, i₂, …, iₙ} be a set of n binary attributes called items.
Let D = {t₁, t₂, …, tₘ} be a set of transactions called the database.
Each transaction in D has a unique transaction ID and contains a subset of the items in I.
A rule is defined as an implication of the form X ⇒ Y, where X, Y ⊆ I.
In Agrawal, Imieliński and Swami, a rule is defined only between a set and a single item: X ⇒ iⱼ for iⱼ ∈ I.
Every rule is composed of two different sets of items, also known as itemsets, X and Y, where X is called the antecedent or left-hand side and Y the consequent or right-hand side. The antecedent is the itemset that can be found in the data, while the consequent is the itemset found in combination with the antecedent. The statement X ⇒ Y is often read as if X then Y, where the antecedent is the if and the consequent is the then. This simply implies that, in theory, whenever X occurs in a dataset, Y will as well.
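To make these definitions concrete, here is a minimal Python sketch, with the database and rule chosen to mirror the small supermarket example used later in this article:

```python
# Items: a set of binary attributes. Each transaction contains a subset of them.
items = {"milk", "bread", "butter", "beer", "diapers", "eggs", "fruit"}

# Database: each transaction has a unique ID and an itemset.
database = {
    1: {"milk", "bread", "fruit"},
    2: {"butter", "eggs", "fruit"},
    3: {"beer", "diapers"},
    4: {"milk", "bread", "butter", "eggs", "fruit"},
    5: {"bread"},
}

# A rule X => Y is a pair of disjoint itemsets: antecedent and consequent.
X, Y = {"butter", "bread"}, {"milk"}

# Transactions where the antecedent occurs, and where the whole rule holds:
has_antecedent = [tid for tid, t in database.items() if X <= t]
rule_holds = [tid for tid, t in database.items() if X | Y <= t]
print(has_antecedent, rule_holds)  # [4] [4]
```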

Process

Association rules are made by searching data for frequent if-then patterns and by using certain criteria, Support and Confidence, to define what the most important relationships are. Support is the evidence of how frequently an item appears in the given data, while Confidence is defined by how many times the if-then statements are found true. There is also a third criterion, called Lift, which compares the actual Confidence with the expected Confidence: Lift shows how many times more often the if-then statement is found true than would be expected if the items were unrelated.
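As a sketch of how the three measures relate, the helpers below compute Support, Confidence, and Lift over a list of transactions (the transactions are the small supermarket example from the Useful Concepts section below):

```python
def support(itemset, transactions):
    # Fraction of transactions that contain every item in `itemset`.
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    # How often the rule holds among the transactions containing the antecedent.
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    # Actual confidence divided by the confidence expected if antecedent and
    # consequent were independent; lift > 1 suggests a positive association.
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)

transactions = [
    {"milk", "bread", "fruit"},
    {"butter", "eggs", "fruit"},
    {"beer", "diapers"},
    {"milk", "bread", "butter", "eggs", "fruit"},
    {"bread"},
]

print(support({"milk", "bread"}, transactions))       # 0.4
print(confidence({"milk"}, {"bread"}, transactions))  # 1.0
print(lift({"milk"}, {"bread"}, transactions))        # ~1.67 (= 1.0 / 0.6)
```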
Association rules are calculated from itemsets, which are made up of two or more items. If the rules were built from analyzing all possible itemsets in the data, there would be so many rules that they would not have any meaning. That is why association rules are typically built only from itemsets that are well represented in the data.
There are many different data mining techniques that can be used to find particular analytics and results, for example Classification analysis, Clustering analysis, and Regression analysis. Which technique to use depends on what you are looking for in your data. Association rules are primarily used to find analytics and to predict customer behavior. Classification analysis would most likely be used for questioning, making decisions, and predicting behavior. Clustering analysis is primarily used when there are no assumptions made about the likely relationships within the data. Regression analysis is used when you want to predict the value of a continuous dependent variable from a number of independent variables.
Benefits
There are many benefits of using association rules, such as finding patterns that help to understand the correlations and co-occurrences between data sets. A good real-world example of association rules at work is medicine, where they are used to help diagnose patients. When diagnosing patients there are many variables to consider, as many diseases share similar symptoms. With association rules, doctors can determine the conditional probability of an illness by comparing symptom relationships from past cases.
Downsides
However, association rules also have several downsides, such as the difficulty of finding appropriate parameter and threshold settings for the mining algorithm. There is also the downside of a large number of discovered rules: a large rule set does not guarantee that the rules are relevant, and it can also cause the algorithm to perform poorly. Implemented algorithms sometimes contain too many variables and parameters, which can make them hard to understand for someone without a good grasp of data mining.
Thresholds
When using association rules, you are most likely to use only Support and Confidence. However, this means rules have to satisfy a user-specified minimum support and a user-specified minimum confidence at the same time. Usually, association rule generation is split into two steps that need to be applied:
  1. A minimum Support threshold is applied to find all the frequent itemsets in the database.
  2. A minimum Confidence threshold is then applied to these frequent itemsets in order to form rules.
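A minimal sketch of these two steps in Python, reusing the support and confidence helpers and the transactions defined above (the thresholds here are illustrative):

```python
from itertools import combinations

def generate_rules(transactions, min_support, min_confidence):
    items = sorted(set().union(*transactions))
    # Step 1: find all itemsets meeting the minimum Support threshold
    # (brute force here; see the Apriori-style sketch further below).
    frequent = [set(c)
                for k in range(1, len(items) + 1)
                for c in combinations(items, k)
                if support(set(c), transactions) >= min_support]
    # Step 2: split each frequent itemset into antecedent and consequent,
    # keeping only the rules meeting the minimum Confidence threshold.
    rules = []
    for itemset in frequent:
        for k in range(1, len(itemset)):
            for antecedent in map(set, combinations(sorted(itemset), k)):
                consequent = itemset - antecedent
                if confidence(antecedent, consequent, transactions) >= min_confidence:
                    rules.append((antecedent, consequent))
    return rules

for x, y in generate_rules(transactions, min_support=0.4, min_confidence=1.0):
    print(sorted(x), "=>", sorted(y))  # e.g. ['milk'] => ['bread']
```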
Table (left): original, unorganized data

  Items  | Support | Confidence
  Item A | 30%     | 50%
  Item B | 15%     | 25%
  Item C | 45%     | 55%
  Item D | 35%     | 40%

Table (right): organized by the thresholds

  Items  | Support | Confidence
  Item C | 45%     | 55%
  Item A | 30%     | 50%
  Item D | 35%     | 40%
  Item B | 15%     | 25%

The Support Threshold is 30%, Confidence Threshold is 50%
The table on the left shows the original, unorganized data and the table on the right is organized by the thresholds. In this case, Item C is listed first because it exceeds the thresholds for both Support and Confidence. Item A is second because its values meet the thresholds exactly. Item D has met the threshold for Support but not for Confidence. Item B has met neither threshold, which is why it is last.
Finding all the frequent itemsets in a database is not an easy task, since it involves going through all the data to find all possible item combinations from all possible itemsets. The set of possible itemsets is the power set over I and has size 2ⁿ − 1 (excluding the empty set, which is not a valid itemset). The size of the power set thus grows exponentially in the number n of items in I. An efficient search is possible by using the downward-closure property of support: every subset of a frequent itemset is also frequent, and therefore no infrequent itemset can be a subset of a frequent itemset. Exploiting this property, efficient algorithms such as Apriori can find all frequent itemsets.
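A brute-force search, as in the earlier sketch, examines all 2ⁿ − 1 itemsets. The level-wise, Apriori-style sketch below prunes that search with the downward-closure property, reusing the support helper and transactions defined earlier:

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    # Level 1: frequent single items.
    items = sorted(set().union(*transactions))
    level = [frozenset([i]) for i in items
             if support({i}, transactions) >= min_support]
    frequent, k = list(level), 2
    while level:
        # Candidate k-itemsets are unions of frequent (k-1)-itemsets ...
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # ... pruned by downward closure: every (k-1)-subset must be frequent,
        # so a candidate with any infrequent subset cannot be frequent itself.
        prev = set(level)
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        # Only the surviving candidates are counted against the database.
        level = [c for c in candidates if support(c, transactions) >= min_support]
        frequent.extend(level)
        k += 1
    return frequent

for itemset in frequent_itemsets(transactions, min_support=0.4):
    print(sorted(itemset))
```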

Useful Concepts

To illustrate the concepts, we use a small example from the supermarket domain. Table 2 shows a small database containing the items, where, in each entry, the value 1 means the presence of the item in the corresponding transaction, and the value 0 represents its absence. The set of items is I = {milk, bread, butter, beer, diapers, eggs, fruit}.

Table 2: Example database with 5 transactions and 7 items

  Transaction ID | milk | bread | butter | beer | diapers | eggs | fruit
  1              | 1    | 1     | 0      | 0    | 0       | 0    | 1
  2              | 0    | 0     | 1      | 0    | 0       | 1    | 1
  3              | 0    | 0     | 0      | 1    | 1       | 0    | 0
  4              | 1    | 1     | 1      | 0    | 0       | 1    | 1
  5              | 0    | 1     | 0      | 0    | 0       | 0    | 0
An example rule for the supermarket could be {butter, bread} ⇒ {milk}, meaning that if butter and bread are bought, customers also buy milk.
In order to select interesting rules from the set of all possible rules, constraints on various measures of significance and interest are used. The best-known constraints are minimum thresholds on support and confidence.
Let X and Y be itemsets, X ⇒ Y an association rule, and T a set of transactions of a given database.
Note: this example is extremely small. In practical applications, a rule needs a support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.

Support

Support is an indication of how frequently the itemset appears in the dataset:

  supp(X) = |{t ∈ T : X ⊆ t}| / |T|

The support of a rule is defined as:

  supp(A ⇒ B) = |{t ∈ T : A ∪ B ⊆ t}| / |T| = supp(A ∪ B)

where A and B are separate itemsets that occur at the same time in a transaction.
Using Table 2 as an example, the itemset {beer, diapers} has a support of 1/5 = 0.2 since it occurs in 20% of all transactions (1 out of 5 transactions). The argument of supp() is a set of preconditions, and thus becomes more restrictive as it grows (instead of more inclusive).
Furthermore, the itemset {milk, bread, butter} has a support of 1/5 = 0.2 as it appears in 20% of all transactions as well.
Using antecedents and consequents allows a data miner to determine the support of multiple items being bought together, in comparison to the whole data set. For example, Table 2 shows that the rule "if milk is bought, then bread is bought" has a support of 2/5 = 0.4, since milk and bread are bought together in 2 out of the 5 transactions. In small data sets like this example, it is harder to see strong correlations because there are few samples, but as the data set grows, support can be used to find correlations between two or more products in the supermarket example.
Minimum support thresholds are useful for determining which itemsets are preferred or interesting.
If we set the support threshold to ≥ 0.4 in Table 3, then the rule {milk} ⇒ {eggs} would be removed, since it did not meet the minimum threshold of 0.4. The minimum threshold is used to remove samples that do not have strong enough support or confidence to be deemed important or interesting in the dataset.
Another way of finding interesting samples is to find the value of support × confidence; this allows a data miner to see the samples where support and confidence are high enough to be highlighted in the dataset, prompting a closer look at the sample to find more information on the connection between the items.
Support can be beneficial for finding the connection between products in comparison to the whole dataset, whereas confidence looks at the connection between one or more items and another item. Below is a table that compares and contrasts support and support × confidence, using the information from Table 4 to derive the confidence values.
  if Antecedent then Consequent         | Support   | Support × Confidence
  if buy milk, then buy bread           | 2/5 = 0.4 | 0.4 × 1.0 = 0.4
  if buy milk, then buy eggs            | 1/5 = 0.2 | 0.2 × 0.5 = 0.1
  if buy bread, then buy fruit          | 2/5 = 0.4 | 0.4 × 0.66 = 0.264
  if buy fruit, then buy eggs           | 2/5 = 0.4 | 0.4 × 0.66 = 0.264
  if buy milk and bread, then buy fruit | 2/5 = 0.4 | 0.4 × 1.0 = 0.4
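The Support and Support × Confidence columns of this table can be reproduced with the helpers defined earlier (the 0.66 entries in the table are 2/3 rounded):

```python
rules = [
    ({"milk"}, {"bread"}),
    ({"milk"}, {"eggs"}),
    ({"bread"}, {"fruit"}),
    ({"fruit"}, {"eggs"}),
    ({"milk", "bread"}, {"fruit"}),
]

for x, y in rules:
    s = support(x | y, transactions)
    c = confidence(x, y, transactions)
    # Note: the table rounds 2/3 to 0.66 before multiplying, hence 0.264
    # there rather than the exact 0.267 printed here.
    print(sorted(x), "=>", sorted(y), f"support={s:.2f}", f"s*c={s * c:.3f}")
```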

The support of X with respect to T is defined as the proportion of transactions in the dataset which contain the itemset X. Denoting a transaction by (i, t), where i is the unique identifier of the transaction and t is its itemset, the support may be written as:

  supp(X) = |{(i, t) ∈ T : X ⊆ t}| / |T|
This notation can be used when defining more complicated datasets where the items and itemsets may not be as simple as in our supermarket example above. Other examples of where support can be used include finding groups of genetic mutations that work collectively to cause a disease, investigating the number of subscribers that respond to upgrade offers, and discovering which products in a drug store are never bought together.