Recomsys.net

 

USA       India     Asia       Europe

Technology    Trade   Travel     Shopping   Entertainment

 

Home | Artificial Intelligence| Business Intelligence| Bio-Technology | Data Warehouse | Genetic Programming | Robotics Design| Multi   Agent Systems| Services | Telecommunications | Electric Power |  Oil and Gas | Pharma | Real Estate | Research | Search EngineSpintronics | Solar Power | Software Products| Management | Project Management |Space |

Advertisers | Air Travel Bookings |  Abrasive Tools| Acoustic Components |Adhesives & Sealants| Adhesive Tapes| Airlines | Air Beds & Mattresses |Air Pumps| Air Separators |Air Wrenches| Alarms | Alcohol | Anchors | Apparel | Bags | Bazzar | Books |Cars| CNBC | CCTV Camera | Cell Phones | Cement | Computers | Contact Lenses Coupons for Bags| Department Store | Electronics | Eye Care| Fashion | Furcoats | Furniture | History Channel | Home Products | Hotels | Hollywood | Jewelry | K-Mart |

K-Mart Coupons & Price drops | Industrial Products| Magazines | Office Furniture | Perfumes |  Pet Med | Property | Recom Amazon | Retail Store| Sears | Sears Coupons & Price Drops | Shopping Mall | Shopping Mall Electronics | Our Software Products| Our Services | Contact Us

Our Expertise areas:

SYSTEMS ENGINEERING  TRAINING  e-LEARNING  TELECOMMUNICATIONS  NETWORKING  INFORMATION TECHNOLOGY  HARDWARE ENGINEERING  PRODUCT DESIGN  CONSULTING

Artificial Intelligence

 Association Rules |Clustering |Decision Trees| Naive Bayes| Regression Analysis| Neural Networks| Time Series |Text Mining |

Genetic Programming

Association Rules main function is to Associate Past Events from the database and predict the future events association with high probability. Rules can be extended any dimension using various Algorithms. Interesting applications can be in the Space Flights, Human Response to various Medicines, Marketing Success/Failure Analysis, Stress Failure Analysis in Civil Structure Design, Deformation Studies of Bridges, Power Grid faults, Aerodynamics Effects on Flights, Radio Coverage Patterns of Cellular Networks, Call Failure Analysis in Telecommunications, Stocks Association in Price Movement and many other fields. We could effectively bring association of stocks where price movement was associated with one another, if one stock started falling others in the Group will start falling most of the time with same % Deviation.

 

Association Rules when used in conjunction with Neural Networks and Time Series created Predictions with Uncanny accuracy. Many times we were surprised with results that predicted values were accurate upto .01% Accuracy. But this is not routine and most of the predictions were within 1% of the Future values and worst case scenario was 5% accuracy.  

 

The aim of association rule mining is to find interesting and useful patterns in a transaction database. The database contains transactions which consist of a set of items and a transaction identifier like market basket. Support is defined on itemsets and gives the proportion of transactions which contain Z. It is used as a measure of significance (importance) of an itemset. Since it basically uses the count of transactions it is often called a frequency constraint. An itemset with a support > a set min. support threshold is called a frequent or large itemset. Confidence is defined as the probability of seeing the rule's consequent under the condition that the transactions also contain the antecedent. Confidence is directed and gives different values for the rules X -> Y and Y -> X.
 

Support is first used to find frequent (significant) itemsets exploiting its down-ward closure property to prune the search space. Then confidence is used in a second step to produce rules from the frequent itemsets that exceed a min. confidence threshold.
A problem with confidence is that it is sensitive to the frequency of the consequent (Y) in the database. Caused by the way confidence is calculated, consequents with higher support will automatically produce higher confidence values even if there exists no association between the items.

 

Association Rules

The input for a typical associations-mining algorithm is a set T of itemsets t, each of which is drawn from a set I of all possible items. Each t is a member of the power set 2I, but T is not considered a subset of 2I since it may contain duplicates (it is a multiset). Since I is typically large, the general problem of finding all common subsets in an arbitrary selection of itemsets is considered intractable. Therefore input sets in T, and any results derived there from, are typically assumed to be small. It is an ongoing area of research to find algorithms which relax this assumption and allow processing of larger sets.

Fixed-Confidence Mining
Commonly, association-finding algorithms attempt to find all sets of elements which occur in at least a fraction C of the data, where C is a chosen Confidence Threshold (e.g. 2%). The number of occurrences of a subset is called its support. Sets whose support exceeds C are called frequent itemsets.

If a set s is frequent, then any subset of s must also be, and most association-finding algorithms attempt to exploit this fact. Most association-finding algorithms reduce to a traversal of this subset lattice of I in some order, extending frequent itemsets and pruning out infrequent sets and their supersets.

The fixed confidence threshold has little basis in statistics, since some sets may exceed it simply by random coincidence (thereby defeating the goal of finding meaningful correlations), and some meaningful associations may occur in the data without reaching the threshold. However, in practice it does eliminate vast numbers of insignificant sets.

For a given data set, the set of its frequent itemsets can be described by its maximal frequent itemsets, which are frequent itemsets S that are not subsets of any larger frequent itemset T. During mining, finding maximal frequent itemsets first allows their subsets to be skipped, an important improvement if sets are large.


K-optimal pattern discovery is a data mining technique that provides an alternative to the frequent pattern discovery approach that underlies most association rule learning techniques.

Frequent pattern discovery techniques find all patterns for which there are sufficient examples in the sample data. In contrast, k-optimal pattern discovery techniques find the k patterns that optimize a user-specified measure of interest. The value k is also specified by the user.

Examples of k-optimal pattern discovery techniques include:

k-optimal classification rule discovery
k-optimal subgroup discovery
finding k most interesting patterns using sequential sampling
mining top.k frequent closed patterns without minimum support
k-optimal rule discovery

 

Apriori

Apriori is a classic algorithm for learning association rules. Apriori is designed to operate on databases containing transactions (for example, collections of items bought by customers, or details of a website frequentation). Other algorithms are designed for finding association rules in data having no transactions (Winepi and Minepi), or having no timestamps (DNA sequencing).
As is common in association rule mining, given a set of itemsets (for instance, sets of retail transactions, each listing individual items purchased), the algorithm attempts to find subsets which are common to at least a minimum number C (the cutoff, or confidence threshold) of the itemsets. Apriori uses a "bottom up" approach, where frequent subsets are extended one item at a time (a step known as candidate generation), and groups of candidates are tested against the data. The algorithm terminates when no further successful extensions are found.
Apriori uses breadth-first search and a hash tree structure to count candidate item sets efficiently. It generates candidate item sets of length k from item sets of length k − 1. Then it prunes the candidates which have an infrequent sub pattern. According to the downward closure lemma, the candidate set contains all frequent k-length item sets. After that, it scans the transaction database to determine frequent item sets among the candidates. For determining frequent items quickly, the algorithm uses a hash tree to store candidate itemsets. This hash tree has item sets at the leaves and hash tables at internal nodes . Note that this is not the same kind of hash tree used in for instance p2p systems
Apriori, while historically significant, suffers from a number of inefficiencies or trade-offs, which have spawned other algorithms. Candidate generation generates large numbers of subsets (the algorithm attempts to load up the candidate set with as many as possible before each scan). Bottom-up subset exploration (essentially a breadth-first traversal of the subset lattice) finds any maximal subset S only after all 2 | S | − 1 of its proper subsets.
 

Apriori Algorithm

Association rule mining is to find out association rules that satisfy the predefined minimum support and confidence from a given database. The problem is usually decomposed into two subproblems. One is to find those itemsets whose occurrences exceed a predefined threshold in the database; those itemsets are called frequent or large itemsets. The second problem is to generate association rules from those large itemsets with the constraints of minimal confidence. Suppose one of the large itemsets is Lk, Lk = {I1, I2, … , Ik}, association rules with this itemsets are generated in the following way: the first rule is {I1, I2, … , Ik-1}⇒ {Ik}, by checking the confidence this rule can be determined as interesting or not. Then other rule are generated by deleting the last items in the antecedent and inserting it to the consequent, further the confidences of the new rules are checked to determine the interestingness of them. Those processes iterated until the antecedent becomes empty. Since the second subproblem is quite straight forward, most of the researches focus on the first subproblem. The Apriori algorithm finds the frequent sets L In Database D.

  • Find frequent set Lk − 1.

  • Join Step.

    • Ck is generated by joining Lk − 1with itself

  • Prune Step.

    • Any (k − 1) -itemset that is not frequent cannot be a subset of a frequent k -itemset, hence should be removed.

where

  • (Ck: Candidate itemset of size k)

  • (Lk: frequent itemset of size k)

Apriori Pseudocode

Apriori

large 1-itemsets that appear in more than transactions }

while

Generate(Lk − 1)

for transactions

Subset(Ck,t)

for candidates

return

 
Applications
Market Basket Analysis
• Consider shopping cart filled with several items
• Market basket analysis tries to answer the following questions:
– Who makes purchases?
– What do customers buy together?
– In what order do customers purchase items?
• Given a database of customer transactions, each transaction is a set of items – deduce association rules.

• Co-ocurrences – 80% of all customers purchase items a, b, and c together.
• Association Rules – 60% of all customers who purchase X and Y also buy Z.
• Sequential Patterns – 60% of customers who first buy X also purchase Y within two weeks.

 

Confidence and Support
• We prune the set of all possible association rules using two measures of interest:
– Confidence of a rule: X ->Y has confidence c if P(Y|X)=c.
– Support of a rule: X->Y has support s if P(XY)=s. Also, support of an item

 

Other Applications

• Direct Marketing
• Fraud Detection for Medical Insurance
• Floor/Shelf Planning
• Web Site Layout
• Cross-selling
• Frequent Itemsets applications:
– Classification
– Seeds for constructing Bayesian networks
– Web log analysis
– Collaborative filteringset XY