Unsupervised Learning: Association Rules
Vijay Daultani

Introduction: Success?
In supervised learning there is a clear measure of success: the expected loss, which can be estimated by cross-validation. In unsupervised learning there is no such measure.

Heuristic Arguments
In both supervised and unsupervised learning, heuristic arguments are used to motivate the algorithms. In supervised learning they can also serve as a judgement of the quality of the results; in unsupervised learning that quality is a matter of opinion and cannot be verified directly.

Density Estimation
The goal of unsupervised learning is to directly infer the properties of the probability density Pr(X) of the feature vector X = (X_1, X_2, ..., X_p). There exist nonparametric methods for directly estimating Pr(X) at all X-values, e.g. the spline-smoothing approach to nonparametric regression curves (Silverman, 1986).

Curse of Dimensionality
What happens when the number of input features p is very large, say 1000? Direct density estimation fails: the curse of dimensionality (source: deeplearningbook.org). Simple fixes such as Gaussian mixtures or plain descriptive statistics only go so far.

Solutions
- Principal component analysis: attempts to identify low-dimensional manifolds within the X-space that represent high data density.
- Cluster analysis / mixture modelling: attempts to find multiple convex regions of the X-space that contain modes of Pr(X), i.e. to tell whether Pr(X) can be represented by a mixture of simpler densities.
- Association rules: attempt to construct simple descriptions (conjunctive rules) that describe regions of high density.

Problem: Market Basket
One observation is one transaction at the checkout counter: for each of the p products (milk, oil, ...) we record a 1 if it was bought and a 0 otherwise, for each of customers 1, 2, ..., N. Each vector v_l is a complete vector of p values, one per variable X_j; i.e. each v_l represents all the products that were bought together, coded as 0s and 1s.

Use Cases
Stocking shelves, cross-marketing in sales promotions, catalog design, consumer segmentation.

More General Problem
Goal: find a collection of prototype X-values v_1, v_2, ..., v_L for the feature vector X such that the probability density Pr(v_l) evaluated at each of those values is relatively large. This is known as mode finding or bump hunting.

Impossibly Difficult
- Issue 1: p is very large.
- Issue 2: each X_j can take multiple values.
With so many possible cells, the number of observations for which X = v_l exactly will be far too small for reliable estimation.

How to Solve It: Two Simplifications
1. Rather than seeking specific values where Pr(X) is large, one seeks regions of the X-space with high probability content relative to their size or support. Let S_j be the set of all possible values of variable X_j, and s_j ⊆ S_j a subset of those values; the goal becomes finding s_1, ..., s_p such that Pr[⋂_{j=1..p} (X_j ∈ s_j)] is large.
2. Only two types of subsets are considered: either s_j consists of a single value, s_j = v_{0j}, or the entire set, s_j = S_j.

Simplified Goal
Mixing the above two simplifications: find a subset of indices J ⊂ {1, 2, ..., p} and corresponding values v_{0j} such that Pr[⋂_{j∈J} (X_j = v_{0j})] is high.

Market Basket Analysis
Many solutions to this simplified problem exist (more in the next talk by colleagues), but when p ≈ 10^4 and N ≈ 10^8 the methods explained later in the chapter do not work; market basket analysis targets exactly this scale.

Dummy Variables
Each variable X_j is expanded into one binary dummy variable per possible value. For example, X_1 with S_1 = {1, 5, 2} becomes Z_1, Z_2, Z_3; X_2 with S_2 = {3, 4, 5} becomes Z_4, Z_5, Z_6; ...; X_p becomes Z_{Q-2}, Z_{Q-1}, Z_Q, where Q = Σ_j |S_j| is the total number of dummy variables.

Problem with Binary-Valued Variables
Find K ⊂ {1, 2, ..., Q} such that Pr[∏_{k∈K} Z_k = 1] is high.

Official Market Basket Problem
In the standard formulation, the set K is called an "item set", and T(K) is called its support or size: the fraction of observations that conform with K,
  T(K) = (1/N) Σ_{i=1..N} ∏_{k∈K} z_{ik},
where z_{ik} is the value of Z_k for observation i. An observation with z_{ik} = 1 for all k ∈ K is said to "contain" the item set K.
- Minimum value: 0/N = 0, when no observation has 1s for all the Z_k in the rule.
- Maximum value: N/N = 1, when every observation does.

Association Rule Mining
Given a lower bound t, one seeks all item sets K_l that can be formed from the variables Z_1, Z_2, ..., Z_Q whose support in the database exceeds this lower bound: {K_l | T(K_l) > t}.

Apriori Algorithm
(The slide shows a small 0/1 data matrix of observations by Z_1, ..., Z_Q.)
- First pass: check each of Z_1, Z_2, ..., Z_Q alone and keep the item sets with support > t.
- Second pass: check all combinations of the item sets that survived the first pass and keep those with support > t.
- ... and so on for larger item sets.
The support of an item set can only shrink as more variables are added in successive passes, which is what makes discarding low-support item sets early safe.

Output: Breaking Item Sets
Each high-support item set K output by the Apriori algorithm is broken down into "association rules": the items Z_k, k ∈ K, are partitioned into two disjoint subsets A and B such that A ∪ B = K, and an association rule A ⇒ B is created. A is called the antecedent and B the consequent.

Terminologies and Intuition
- Support T(A ⇒ B) = T(K): the probability of observing A and B together. High support means A and B occur together in many market baskets.
- Confidence C(A ⇒ B) = T(A ⇒ B) / T(A): the probability of observing B when A is already in the basket. High confidence means that if A is in the basket, B is likely to be purchased as well.
- Lift L(A ⇒ B) = C(A ⇒ B) / T(B): confidence relative to the baseline probability of B.
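To make the pipeline concrete, here is a minimal, self-contained Python sketch of the steps above: computing support T(K), running the Apriori passes against a lower bound t, and breaking each frequent item set into rules scored by confidence and lift. This is an illustration rather than the slides' own implementation; the baskets, item names, and threshold are invented, and the candidate generation is simpler than the textbook join/prune step.

```python
from itertools import combinations

# Toy market-basket data: each basket is the set of items one customer
# bought (the 1-entries of that customer's 0/1 row). All names, baskets,
# and the threshold below are invented for illustration.
baskets = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"oil", "bread"},
    {"milk", "bread", "butter"},
    {"oil"},
]
items = sorted(set().union(*baskets))
N = len(baskets)
t = 0.4  # the lower bound t on support

def support(itemset):
    """T(K): fraction of baskets containing every item of K."""
    return sum(itemset <= basket for basket in baskets) / N

# Apriori passes: start from single items, then grow surviving item sets
# by one item at a time. Discarding low-support sets early is safe
# because support can only shrink as an item set grows.
frequent = [frozenset([i]) for i in items if support(frozenset([i])) > t]
all_frequent = list(frequent)
while frequent:
    size = len(frequent[0]) + 1
    pool = {i for k in frequent for i in k}
    candidates = {k | {i} for k in frequent for i in pool - k}
    frequent = [c for c in candidates if len(c) == size and support(c) > t]
    all_frequent.extend(frequent)

# Break each frequent item set K into rules A => B with A ∪ B = K,
# scored by confidence C = T(K)/T(A) and lift = C/T(B).
for K in all_frequent:
    for r in range(1, len(K)):
        for A in map(frozenset, combinations(K, r)):
            B = K - A
            conf = support(K) / support(A)
            lift = conf / support(B)
            print(f"{sorted(A)} => {sorted(B)}  "
                  f"support={support(K):.2f}  conf={conf:.2f}  lift={lift:.2f}")
```

On this toy data the surviving item sets are {milk}, {bread}, and {milk, bread}, yielding the rules milk ⇒ bread and bread ⇒ milk with lift 1.25, which matches the by-hand counts.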
Final Output
The output of the entire analysis is a collection of association rules that satisfy the constraints, which can then be queried, e.g.: "Display all transactions in which ice skates is the consequent that have confidence over 80% and support of more than 2%." (A sketch of this kind of filter appears after this section.)

Are Association Rules Fun?
An example rule from such an analysis: [ ] ⇒ marital_status == not married.

Conclusion
- Advantage: popular for analyzing very large commercial databases.
- Disadvantages: restrictive form of data (binary variables only); computationally intensive.

Thank You
Image source: Pixabay (https://pixabay.com/en/vegetables-vegetable-basket-harvest-752153/)
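Referring back to the query on the "Final Output" slide, a filtering step over the mined rules might look like the following sketch. The `rules` list and its entries are hypothetical, shaped like (antecedent, consequent, support, confidence) tuples that the Apriori sketch above could collect instead of printing.

```python
# Hypothetical mined rules as (antecedent, consequent, support, confidence);
# the item names and numbers are invented for illustration.
rules = [
    ({"skate laces"}, {"ice skates"}, 0.031, 0.87),
    ({"milk"}, {"bread"}, 0.250, 0.60),
    ({"helmet", "puck"}, {"ice skates"}, 0.012, 0.91),
]

# "Display all rules in which ice skates is the consequent that have
# confidence over 80% and support of more than 2%."
selected = [r for r in rules
            if r[1] == {"ice skates"} and r[3] > 0.80 and r[2] > 0.02]

for A, B, s, c in selected:
    print(f"{A} => {B}  support={s:.1%}  confidence={c:.1%}")
```

Only the first rule passes here: the third has high confidence but fails the 2% support bound, which is exactly the kind of trade-off the two thresholds encode.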
Outline
  1. title
  2. introduction
  3. success
  4. heuristic_arguments
  5. heuristic_argument_2
  6. success_heuristic
  7. density_estimation
  8. goal
  9. goal_p_<_3
  10. goal_p_>_3
  11. p_>_3_curse
  12. curse
  13. p_>_3_curse
  14. p_>_3_solution
  15. solution_solutions
  16. solutions
  17. problem
  18. problem_2
  19. use_case
  20. more_general_problem
  21. more_general_problem_2
  22. probability
  23. impossibly_difficult
  24. impossibly_difficult_2
  25. how_to_solve_it
  26. 2_simplifications
  27. 2_simplifications_2
  28. 2_simplifications_3
  29. market_basket_analysis
  30. many_solutions
  31. mix_above_2_simplifications
  32. dummy_variables
  33. binary_valued_variables
  34. support_size
  35. official_market_basket
  36. association_rule_mining
  37. apriori
  38. breaking_item_set
  39. terminologies
  40. terminologies_intuitions
  41. intuition
  42. final_output
  43. example_1
  44. example_2
  45. example_3
  46. example_4
  47. association_rules_fun
  48. conclusion
  49. thank_you