## Part 1

In [312]:
import pandas as pd
pd.set_option("max_colwidth", 150)

f = "https://github.com/cs6220/cs6220.spring2019/raw/master/data/Online%20Retail.xlsx"
df = pd.read_excel(f)

basket = (df[df["Country"] == "United Kingdom"]
.groupby(["InvoiceNo", "Description"])["Quantity"]
.sum().unstack().reset_index().fillna(0)
.set_index("InvoiceNo")) # transform transactions into baskets of items

# convert quantities to 1/0 indicators: 1 if at least one unit was bought, else 0
basket_sets = (basket >= 1).astype(int)
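A minimal sketch of the transaction-to-basket transformation above, on a few made-up invoice lines (the toy `tx` DataFrame and its item names are hypothetical, not from the Online Retail data). It shows how grouping, summing and unstacking produce one row per invoice and one column per item, and how a net return (negative quantity) ends up as 0.

```python
import pandas as pd

# Hypothetical toy transactions, made up for illustration (not Online Retail data)
tx = pd.DataFrame({
    "InvoiceNo": [1, 1, 2, 2, 3],
    "Description": ["GREEN TEACUP", "ROSES TEACUP", "GREEN TEACUP", "JUMBO BAG", "JUMBO BAG"],
    "Quantity": [2, 1, 6, -1, 3],
})

# One row per invoice, one column per item; cells hold the summed quantity,
# and items an invoice never contained become NaN, then 0
basket = (tx.groupby(["InvoiceNo", "Description"])["Quantity"]
            .sum().unstack().fillna(0))

# 1 if at least one unit was bought, else 0 (the net return of -1 becomes 0)
basket_sets = (basket >= 1).astype(int)
print(basket_sets)
```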

In [313]:
basket_sets.head()

| InvoiceNo | 20713 | 4 PURPLE FLOCK DINNER CANDLES | 50'S CHRISTMAS GIFT BAG LARGE | DOLLY GIRL BEAKER | I LOVE LONDON MINI BACKPACK | NINE DRAWER OFFICE TIDY | OVAL WALL MIRROR DIAMANTE | RED SPOT GIFT BAG LARGE | SET 2 TEA TOWELS I LOVE LONDON | SPACEBOY BABY GIFT SET | ... | wrongly coded 20713 | wrongly coded 23343 | wrongly coded-23343 | wrongly marked | wrongly marked 23343 | wrongly marked carton 22804 | wrongly marked. 23343 in box | wrongly sold (22719) barcode | wrongly sold as sets | wrongly sold sets |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 536365 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 536366 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 536367 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 536368 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 536369 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |


#### What are the top 5 1-itemsets with the highest support?

In [314]:
from mlxtend.frequent_patterns import apriori

frequent_itemsets = apriori(basket_sets, min_support=0.06, use_colnames=True, max_len=1)
frequent_itemsets['length'] = frequent_itemsets['itemsets'].apply(lambda x: len(x))
frequent_itemsets.sort_values(by=['support'], ascending=False).head()

|  | support | itemsets | length |
|---|---|---|---|
| 5 | 0.098276 | (WHITE HANGING HEART T-LIGHT HOLDER) | 1 |
| 1 | 0.087931 | (JUMBO BAG RED RETROSPOT) | 1 |
| 4 | 0.076452 | (REGENCY CAKESTAND 3 TIER) | 1 |
| 3 | 0.072323 | (PARTY BUNTING) | 1 |
| 2 | 0.063158 | (LUNCH BAG RED RETROSPOT) | 1 |
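For intuition about what `apriori(..., min_support=..., max_len=...)` computes, here is a minimal pure-Python sketch of the level-wise Apriori idea: count the support of the size-k candidates, keep those at or above the threshold, then join and prune to build the size-(k+1) candidates. `apriori_sketch` and the four toy transactions are hypothetical; mlxtend's own implementation works on the one-hot DataFrame and is organized differently.

```python
from itertools import combinations

def apriori_sketch(transactions, min_support, max_len=None):
    """Level-wise Apriori; returns {frozenset(itemset): support}."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)
    # Level-1 candidates: every distinct item
    candidates = {frozenset([item]) for b in baskets for item in b}
    frequent_all = {}
    k = 1
    while candidates and (max_len is None or k <= max_len):
        # Support = fraction of baskets that contain the whole candidate
        supports = {c: sum(c <= b for b in baskets) / n for c in candidates}
        frequent = {c: s for c, s in supports.items() if s >= min_support}
        frequent_all.update(frequent)
        # Join: unions of two frequent k-itemsets that have exactly k + 1 items
        joined = {a | b for a, b in combinations(frequent, 2) if len(a | b) == k + 1}
        # Prune: keep a candidate only if all of its k-item subsets are frequent
        candidates = {c for c in joined
                      if all(frozenset(s) in frequent for s in combinations(c, k))}
        k += 1
    return frequent_all

transactions = [{"A", "B"}, {"A", "C"}, {"A", "B", "C"}, {"B"}]
print(apriori_sketch(transactions, min_support=0.5))
```

With `max_len=1` the loop stops after the first level, which mirrors how the 1-itemset question above was answered.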


#### What are the top 5 2-itemsets with the highest support?

In [318]:
# Lower min_support until at least five 2-itemsets are frequent
frequent_itemsets = apriori(basket_sets, min_support=0.027, use_colnames=True)
frequent_itemsets['length'] = frequent_itemsets['itemsets'].apply(lambda x: len(x))

In [319]:
# Filter the results
frequent_itemsets[ (frequent_itemsets['length'] == 2) ].sort_values(by=['support'], ascending=False)

|  | support | itemsets | length |
|---|---|---|---|
| 110 | 0.035617 | (JUMBO BAG RED RETROSPOT, JUMBO BAG PINK POLKADOT) | 2 |
| 109 | 0.031806 | (ROSES REGENCY TEACUP AND SAUCER , GREEN REGENCY TEACUP AND SAUCER) | 2 |
| 112 | 0.03167 | (JUMBO BAG RED RETROSPOT, JUMBO STORAGE BAG SUKI) | 2 |
| 111 | 0.029809 | (JUMBO SHOPPER VINTAGE RED PAISLEY, JUMBO BAG RED RETROSPOT) | 2 |
| 113 | 0.027541 | (LUNCH BAG BLACK SKULL., LUNCH BAG RED RETROSPOT) | 2 |
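Instead of trying thresholds by hand, one option is to compute the support of every item pair at once: for a 1/0 matrix M, `(M.T @ M)[i, j]` counts the baskets containing both items, so the fifth-largest pair support is the largest `min_support` that still yields five 2-itemsets (here that is the 0.027541 of the last row above). The sketch below uses a hypothetical toy matrix `X`; on the real `basket_sets`, the item-by-item matrix grows quadratically with the number of items.

```python
import numpy as np
import pandas as pd

# Hypothetical 1/0 baskets (rows = invoices, columns = items); in the notebook
# this would be basket_sets
X = pd.DataFrame({
    "A": [1, 1, 1, 0, 1],
    "B": [1, 0, 1, 1, 0],
    "C": [0, 1, 1, 0, 1],
    "D": [0, 0, 0, 1, 1],
})

# (M.T @ M)[i, j] counts baskets containing both item i and item j;
# dividing by the number of baskets turns counts into supports
M = X.to_numpy()
pair_support = (M.T @ M) / len(X)

# Keep each unordered pair once: strict upper triangle (the diagonal holds 1-itemsets)
i, j = np.triu_indices(M.shape[1], k=1)
pairs = pd.Series(pair_support[i, j],
                  index=[f"{X.columns[a]} + {X.columns[b]}" for a, b in zip(i, j)])

top5 = pairs.sort_values(ascending=False).head(5)
# Largest min_support that still keeps these 5 pairs (ties at this value may add more)
threshold = top5.iloc[-1]
print(top5)
print("min_support <=", threshold)
```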


#### What is the highest support value for the 1-itemsets? 

0.098276

In [320]:
frequent_itemsets[ (frequent_itemsets['length'] == 1) ].sort_values(by=['support'], ascending=False).head(1)

|  | support | itemsets | length |
|---|---|---|---|
| 104 | 0.098276 | (WHITE HANGING HEART T-LIGHT HOLDER) | 1 |


#### What is the highest support value for the 2-itemsets?

0.035617

In [321]:
frequent_itemsets[ (frequent_itemsets['length'] == 2) ].sort_values(by=['support'], ascending=False).head(1)

|  | support | itemsets | length |
|---|---|---|---|
| 110 | 0.035617 | (JUMBO BAG RED RETROSPOT, JUMBO BAG PINK POLKADOT) | 2 |


I also tried fpgrowth() to compute the frequent itemsets; the results are the same as with apriori().

In [46]:
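from mlxtend.frequent_patterns import apriori, fpgrowth
test = fpgrowth(basket_sets, min_support=0.06, use_colnames=True)
# itemsets are frozensets, so compare them as sets (row order may differ)
set(test["itemsets"]) == set(apriori(basket_sets, min_support=0.06, use_colnames=True)["itemsets"])
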

#### What are the top 5 association rules?

In [322]:
from mlxtend.frequent_patterns import association_rules

rule = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.5)
rule.sort_values(by=['confidence'], ascending=False)

|  | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction |
|---|---|---|---|---|---|---|---|---|---|
| 1 | (GREEN REGENCY TEACUP AND SAUCER) | (ROSES REGENCY TEACUP AND SAUCER ) | 0.042377 | 0.043421 | 0.031806 | 0.750535 | 17.285056 | 0.029966 | 3.834527 |
| 0 | (ROSES REGENCY TEACUP AND SAUCER ) | (GREEN REGENCY TEACUP AND SAUCER) | 0.043421 | 0.042377 | 0.031806 | 0.732497 | 17.285056 | 0.029966 | 3.579862 |
| 2 | (JUMBO BAG PINK POLKADOT) | (JUMBO BAG RED RETROSPOT) | 0.052586 | 0.087931 | 0.035617 | 0.677308 | 7.702719 | 0.030993 | 2.826438 |
| 4 | (JUMBO STORAGE BAG SUKI) | (JUMBO BAG RED RETROSPOT) | 0.05127 | 0.087931 | 0.03167 | 0.617699 | 7.024813 | 0.027161 | 2.385736 |
| 3 | (JUMBO SHOPPER VINTAGE RED PAISLEY) | (JUMBO BAG RED RETROSPOT) | 0.051407 | 0.087931 | 0.029809 | 0.579876 | 6.594673 | 0.025289 | 2.170954 |


#### What items make up one of the top association rules? Search online for the items (or at least items with the same name). Do you think they are likely to be bought together?

GREEN REGENCY TEACUP AND SAUCER --> ROSES REGENCY TEACUP AND SAUCER  
I think these two items are very likely to be bought together. People often buy teacups in sets for a household, and different colors make it easy to tell the cups apart. Perhaps different family members also prefer different patterns (for example, one the rose and another the green). The lift value supports this: at 17.285056, it means that a basket containing the GREEN REGENCY TEACUP AND SAUCER is about 17 times as likely to also contain the ROSES REGENCY TEACUP AND SAUCER as a randomly chosen basket, so the two items are strongly positively associated. Since the baskets are 1/0 indicators, this is a statement about how often the two items appear in the same basket, not about how many units are bought.
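The lift quoted above can be checked by hand. Using the standard definitions (confidence = supp(A and C) / supp(A), lift = confidence / supp(C), leverage = supp(A and C) - supp(A) * supp(C), conviction = (1 - supp(C)) / (1 - confidence)), the sketch below recomputes the metrics of the GREEN ------> ROSES rule from the three support values in the rules table. Small differences in the last digits come from the table's rounded supports.

```python
# Supports read off the GREEN -> ROSES teacup rule in the rules table
antecedent_support = 0.042377   # P(GREEN)
consequent_support = 0.043421   # P(ROSES)
support = 0.031806              # P(GREEN and ROSES in the same basket)

confidence = support / antecedent_support            # P(ROSES | GREEN)
lift = confidence / consequent_support               # = support / (P(GREEN) * P(ROSES))
leverage = support - antecedent_support * consequent_support
conviction = (1 - consequent_support) / (1 - confidence)

print(f"confidence={confidence:.4f} lift={lift:.3f} "
      f"leverage={leverage:.6f} conviction={conviction:.3f}")
```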

## Part 2

In [323]:
import numpy as np
import pandas as pd

path = "https://raw.githubusercontent.com/cs6220/cs6220.spring2019/master/data/adult/"

names = pd.read_csv(path + "adult.names", sep="\n", header=None)
parse_cols = lambda x: x.str.split(":", expand=True).iloc[:, 0]
columns = np.roll(parse_cols(names.iloc[92:108, 0]), shift=-1)

df_adult = pd.read_csv(path + "adult.data", sep=",", header=None, index_col=False)
df_adult.columns = columns
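The `np.roll(..., shift=-1)` call rotates the parsed lines one position to the left, so the first one wraps around to the end. Judging from the rename to `income` in the next cell, the first parsed line is the class-label line `>50K, <=50K.`, which `adult.names` lists before the attributes; rolling it to the end makes it the name of the last column of `adult.data`. The array below is a hypothetical, shortened stand-in for the parsed lines.

```python
import numpy as np

# Hypothetical, shortened stand-in for the parsed lines of adult.names:
# the class-label line comes first, followed by the attribute names
parsed = np.array([">50K, <=50K.", "age", "workclass", "native-country"])

# shift=-1 moves every element one position left; the first one wraps to the end
columns = np.roll(parsed, shift=-1)
print(columns.tolist())  # ['age', 'workclass', 'native-country', '>50K, <=50K.']
```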

#### Transform the raw dataset into a format appropriate for association rule mining by dropping all continuous columns and one-hot encoding the remaining columns. The values for each resulting column should be binary, represented by a 1 or 0.

In [324]:
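# missing values in adult.data are encoded as " ?" rather than NaN, so dropna() keeps them
df_adult = df_adult.dropna()
# the class-label column parsed from adult.names becomes "income"
df_adult.rename(columns={'>50K, <=50K.':'income'}, inplace=True)
# drop the continuous columns
df_adult = df_adult.drop(['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', 'hours-per-week'], axis=1)
df_adult.head()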

|  | workclass | education | marital-status | occupation | relationship | race | sex | native-country | income |
|---|---|---|---|---|---|---|---|---|---|
| 0 | State-gov | Bachelors | Never-married | Adm-clerical | Not-in-family | White | Male | United-States | <=50K |
| 1 | Self-emp-not-inc | Bachelors | Married-civ-spouse | Exec-managerial | Husband | White | Male | United-States | <=50K |
| 2 | Private | HS-grad | Divorced | Handlers-cleaners | Not-in-family | White | Male | United-States | <=50K |
| 3 | Private | 11th | Married-civ-spouse | Handlers-cleaners | Husband | Black | Male | United-States | <=50K |
| 4 | Private | Bachelors | Married-civ-spouse | Prof-specialty | Wife | Black | Female | Cuba | <=50K |


In [325]:
df = pd.get_dummies(df_adult, dtype=int)  # dtype=int gives 1/0 columns (pandas >= 2.0 defaults to True/False)
df

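|  | workclass_ ? | workclass_ Federal-gov | workclass_ Local-gov | workclass_ Never-worked | workclass_ Private | workclass_ Self-emp-inc | workclass_ Self-emp-not-inc | workclass_ State-gov | workclass_ Without-pay | education_ 10th | ... | native-country_ Scotland | native-country_ South | native-country_ Taiwan | native-country_ Thailand | native-country_ Trinadad&Tobago | native-country_ United-States | native-country_ Vietnam | native-country_ Yugoslavia | income_ <=50K | income_ >50K |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
| 2 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
| 3 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
| 4 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 32556 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
| 32557 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
| 32558 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
| 32559 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |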
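A small sketch of the same transformation on a hypothetical three-row stand-in for `df_adult`. Two things differ from the cells above and are my own choices: continuous columns are dropped by dtype with `select_dtypes(exclude="number")` instead of by name, and `dtype=int` is passed so the dummies are 1/0 on any pandas version. The real values carry a leading space (for example `" Private"`), which is why the notebook's columns read `workclass_ Private`.

```python
import pandas as pd

# Hypothetical three-row stand-in for df_adult (real values have a leading space)
toy = pd.DataFrame({
    "age": [39, 50, 38],
    "workclass": ["State-gov", "Private", "Private"],
    "income": ["<=50K", "<=50K", ">50K"],
})

# Drop continuous (numeric) columns by dtype, then one-hot encode the rest;
# dtype=int forces 1/0 instead of True/False
categorical = toy.select_dtypes(exclude="number")
onehot = pd.get_dummies(categorical, dtype=int)
print(onehot)
```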


#### Use confidence for the rule interestingness (metric="confidence") and generate rules up to a depth of at most 3 (max_len=3) or higher. Generate rules and find at least 5 rules that you find interesting. Comment on your findings and try to reason about these association rules. Decide yourself on the levels of support and confidence used in this analysis.

1. Frequent Itemset Generation

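I use 20% as the minimum support threshold, so only items and combinations that appear in at least a fifth of the records are kept, while still leaving enough frequent itemsets to generate rules from.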

In [326]:
frequent_itemsets = apriori(df, min_support=0.2, use_colnames=True, max_len=3)
frequent_itemsets

|  | support | itemsets |
|---|---|---|
| 0 | 0.697030 | (workclass_ Private) |
| 1 | 0.322502 | (education_ HS-grad) |
| 2 | 0.223918 | (education_ Some-college) |
| 3 | 0.459937 | (marital-status_ Married-civ-spouse) |
| 4 | 0.328092 | (marital-status_ Never-married) |
| ... | ... | ... |
| 102 | 0.401861 | (race_ White, sex_ Male, income_ <=50K) |
| 103 | 0.580971 | (native-country_ United-States, race_ White, income_ <=50K) |
| 104 | 0.205890 | (native-country_ United-States, race_ White, income_ >50K) |
| 105 | 0.264427 | (native-country_ United-States, sex_ Female, income_ <=50K) |


2. Rule Generation

In [327]:
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.5)
rules

|  | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction |
|---|---|---|---|---|---|---|---|---|---|
| 0 | (education_ HS-grad) | (workclass_ Private) | 0.322502 | 0.697030 | 0.238936 | 0.740882 | 1.062912 | 0.014142 | 1.169234 |
| 1 | (marital-status_ Married-civ-spouse) | (workclass_ Private) | 0.459937 | 0.697030 | 0.298885 | 0.649840 | 0.932298 | -0.021705 | 0.865232 |
| 2 | (marital-status_ Never-married) | (workclass_ Private) | 0.328092 | 0.697030 | 0.251405 | 0.766264 | 1.099327 | 0.022715 | 1.296206 |
| 3 | (relationship_ Husband) | (workclass_ Private) | 0.405178 | 0.697030 | 0.263260 | 0.649738 | 0.932153 | -0.019162 | 0.864982 |
| 4 | (workclass_ Private) | (race_ White) | 0.697030 | 0.854274 | 0.595928 | 0.854952 | 1.000795 | 0.000473 | 1.004681 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 246 | (native-country_ United-States, sex_ Male) | (income_ <=50K) | 0.598507 | 0.759190 | 0.411197 | 0.687038 | 0.904962 | -0.043184 | 0.769453 |
| 247 | (native-country_ United-States, income_ <=50K) | (sex_ Male) | 0.675624 | 0.669205 | 0.411197 | 0.608619 | 0.909464 | -0.040934 | 0.845197 |
| 248 | (sex_ Male, income_ <=50K) | (native-country_ United-States) | 0.464605 | 0.895857 | 0.411197 | 0.885048 | 0.987934 | -0.005022 | 0.905966 |
| 249 | (sex_ Male) | (native-country_ United-States, income_ <=50K) | 0.669205 | 0.675624 | 0.411197 | 0.614456 | 0.909464 | -0.040934 | 0.841346 |
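Conceptually, rule generation splits each frequent itemset into an antecedent and a consequent in every possible way, scores the split, and keeps it if the chosen metric reaches `min_threshold`. The sketch below is a hypothetical `generate_rules` helper on made-up but internally consistent supports; it is not mlxtend's code, and mlxtend also reports leverage, conviction and other columns.

```python
from itertools import combinations

def generate_rules(supports, metric="confidence", min_threshold=0.5):
    """Enumerate rules A -> C from a {frozenset: support} dict of frequent itemsets."""
    rules = []
    for itemset, supp in supports.items():
        if len(itemset) < 2:
            continue
        # Every non-empty proper subset can serve as the antecedent
        for r in range(1, len(itemset)):
            for ante in map(frozenset, combinations(itemset, r)):
                cons = itemset - ante
                # Subsets of a frequent itemset are frequent, so both lookups succeed
                confidence = supp / supports[ante]
                lift = confidence / supports[cons]
                score = {"confidence": confidence, "lift": lift}[metric]
                if score >= min_threshold:
                    rules.append((ante, cons, supp, confidence, lift))
    return rules

# Toy supports (made up, but consistent with each other)
supports = {
    frozenset({"A"}): 0.75, frozenset({"B"}): 0.75, frozenset({"C"}): 0.5,
    frozenset({"A", "B"}): 0.5, frozenset({"A", "C"}): 0.5,
}
for ante, cons, supp, conf, lift in generate_rules(supports):
    print(set(ante), "->", set(cons), f"conf={conf:.3f} lift={lift:.3f}")
```

Switching `metric` to `"lift"` with `min_threshold=1.0` keeps only A ------> C and C ------> A, the two rules whose sides co-occur more often than independence would predict.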


Rules that I find interesting:
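1. sex_ Male, relationship_ Husband ------> race_ White  
It seems that married men (husbands) in this dataset tend to be White. This reminds me of an analysis I saw earlier showing that, in 2019, White men had the second-youngest median age at first wedding among the groups compared. Perhaps the same was true in the mid-1990s, when this census data was collected. Keep in mind, though, that race_ White alone has a support of 0.854274, so a high confidence for this rule partly reflects how common White respondents are in the data.
[Estimated median age of Americans at their first wedding, by race and origin](https://www.statista.com/statistics/372080/median-age-of-us-americans-at-their-first-wedding-by-race-and-origin/)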

---

2. race_ White, workclass_ Private ------> income_ <=50K  
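The rule implies that most White people who work at private companies earn 50K or less. One possible reason is that many private companies are small businesses, which may pay less. However, about three quarters of all records fall in the <=50K group (the support of income_ <=50K is 0.759190), so this rule largely mirrors the overall income distribution.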

---

3. native-country_ United-States, marital-status_ Never-married ------> workclass_ Private  
This is interesting because it suggests that people who have never married tend to work at private companies. My guess (and it is only a guess) is that private-sector jobs often have less regular working hours than government jobs, which leaves less time for family life. The simpler rule marital-status_ Never-married ------> workclass_ Private has a lift of 1.099327, so the association is positive but weak.

---

4. race_ White, marital-status_ Never-married ------> income_ <=50K  
It seems that White people who have never married tend to earn 50K or less. The age column was dropped before mining, but I suspect that people who have never married are mostly young. If so, it makes sense that they earn 50K or less, since they have less work experience.

---

5. education_ HS-grad, workclass_ Private ------> income_ <=50K  
This implies that high-school graduates (people whose highest education is a high-school diploma) who work at private companies tend to earn 50K or less. For comparison, the median household income in the U.S. in 1996 was $35,492, well below the 50K threshold, so this result is not surprising for this education level. [INCOME AND POVERTY IN THE UNITED STATES IN 1996](https://assets.aarp.org/rgcenter/econ/fs60_income96.pdf)
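The rules above are quoted without their confidence or lift. A small helper like the hypothetical `find_rule` below could pull those numbers for a specific rule out of the `rules` DataFrame, so each comment can cite them. It assumes, as in mlxtend's output, that `antecedents` and `consequents` hold frozensets, which makes the match independent of item order; the toy table uses placeholder item names and values.

```python
import pandas as pd

# Hypothetical stand-in for association_rules() output; item names and numbers
# are placeholders. antecedents/consequents hold frozensets, as in mlxtend.
rules = pd.DataFrame({
    "antecedents": [frozenset({"X", "Y"}), frozenset({"X"}), frozenset({"Y"})],
    "consequents": [frozenset({"Z"}), frozenset({"Z"}), frozenset({"X", "Z"})],
    "confidence": [0.90, 0.80, 0.60],
    "lift": [1.05, 0.98, 1.20],
})

def find_rule(rules, antecedents, consequents):
    """Rows whose antecedent and consequent sets match exactly (item order ignored)."""
    a, c = frozenset(antecedents), frozenset(consequents)
    mask = (rules["antecedents"].apply(lambda s: s == a)
            & rules["consequents"].apply(lambda s: s == c))
    return rules.loc[mask, ["confidence", "lift"]]

print(find_rule(rules, ["Y", "X"], ["Z"]))
```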

#### Use lift for the rule interestingness (metric="lift") and generate rules up to a depth of at most 3 (max_len=3) or higher. Generate rules and find at least 5 rules that you find interesting. Comment on your findings and try to reason about these association rules. Decide yourself on the levels of support and confidence used in this analysis.

In [328]:
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=0.5)
rules

|  | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction |
|---|---|---|---|---|---|---|---|---|---|
| 0 | (education_ HS-grad) | (workclass_ Private) | 0.322502 | 0.697030 | 0.238936 | 0.740882 | 1.062912 | 0.014142 | 1.169234 |
| 1 | (workclass_ Private) | (education_ HS-grad) | 0.697030 | 0.322502 | 0.238936 | 0.342792 | 1.062912 | 0.014142 | 1.030872 |
| 2 | (workclass_ Private) | (marital-status_ Married-civ-spouse) | 0.697030 | 0.459937 | 0.298885 | 0.428798 | 0.932298 | -0.021705 | 0.945486 |
| 3 | (marital-status_ Married-civ-spouse) | (workclass_ Private) | 0.459937 | 0.697030 | 0.298885 | 0.649840 | 0.932298 | -0.021705 | 0.865232 |
| 4 | (workclass_ Private) | (marital-status_ Never-married) | 0.697030 | 0.328092 | 0.251405 | 0.360680 | 1.099327 | 0.022715 | 1.050974 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 391 | (native-country_ United-States, income_ <=50K) | (sex_ Male) | 0.675624 | 0.669205 | 0.411197 | 0.608619 | 0.909464 | -0.040934 | 0.845197 |
| 392 | (sex_ Male, income_ <=50K) | (native-country_ United-States) | 0.464605 | 0.895857 | 0.411197 | 0.885048 | 0.987934 | -0.005022 | 0.905966 |
| 393 | (native-country_ United-States) | (sex_ Male, income_ <=50K) | 0.895857 | 0.464605 | 0.411197 | 0.458999 | 0.987934 | -0.005022 | 0.989638 |
| 394 | (sex_ Male) | (native-country_ United-States, income_ <=50K) | 0.669205 | 0.675624 | 0.411197 | 0.614456 | 0.909464 | -0.040934 | 0.841346 |
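Rows 0 and 1 of this table show the same lift but different confidences. The sketch below recomputes both from the supports in those rows: lift divides the joint support by the product of both sides' supports, so swapping antecedent and consequent does not change it, while confidence divides by the antecedent's support only. Note also that `min_threshold=0.5` with `metric="lift"` keeps rules whose lift is below 1, such as row 2 (0.932298).

```python
# Supports from rows 0 and 1 of the lift table
supp_hs_grad = 0.322502   # education_ HS-grad
supp_private = 0.697030   # workclass_ Private
supp_both = 0.238936      # both items in the same record

# lift = supp(A and C) / (supp(A) * supp(C)) is symmetric in A and C
lift = supp_both / (supp_hs_grad * supp_private)

# confidence divides by the antecedent's support only, so it is not symmetric
conf_hs_to_private = supp_both / supp_hs_grad
conf_private_to_hs = supp_both / supp_private

print(f"lift={lift:.6f}  conf(HS-grad -> Private)={conf_hs_to_private:.6f}  "
      f"conf(Private -> HS-grad)={conf_private_to_hs:.6f}")
```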


Rules that I find interesting:
1. workclass_ Private, marital-status_ Married-civ-spouse ------> relationship_ Husband  
This rule says that married people who work at private companies tend to have the relationship value Husband. The lift of 2.172352 confirms a positive association: such people are about 2.2 times as likely to be recorded as Husband as a randomly chosen person. This is expected, since a husband is by definition married, so the two sides cannot be independent.
 
---

2. marital-status_ Married-civ-spouse, sex_ Male ------> relationship_ Husband  
This rule shows that married men are associated with the Husband role in the family. The lift of 2.442850 indicates a strong positive association between the antecedents and the consequent. I think that in the mid-1990s, a married man held the husband role in the family in almost all cases. That might not hold to the same extent today, since same-sex marriage was legalized in all fifty U.S. states on June 26, 2015.

---

3. education_ HS-grad, workclass_ Private ------> race_ White  
This one is interesting because its lift is 0.994847, slightly below 1, which indicates a very weak negative association. Note that education_ HS-grad means that a person's highest level of education is a high-school diploma, not a graduate degree. So people who stopped at high school and work at private companies are marginally less likely to be White than the population as a whole. I'm not sure about the situation in the mid-1990s, but educational attainment differs noticeably by race and ethnicity (see the 2017 data below), which could produce small shifts like this one. Given how close the lift is to 1, however, the two sides are nearly independent. [Educational Attainment, by Race and Ethnicity](https://www.equityinhighered.org/indicators/u-s-population-trends-and-educational-attainment/educational-attainment-by-race-and-ethnicity/)

---

4. race_ White, workclass_ Private ------> sex_ Male  
This rule implies that most White people who work at private companies are male. My guess is that fewer women were in the workforce in the mid-1990s. The data I found at the [U.S. BUREAU OF LABOR STATISTICS](https://www.bls.gov/opub/ted/2017/percentage-of-employed-women-working-full-time-little-changed-over-past-5-decades.htm) appears to support this viewpoint.

---

5. workclass_ Private, sex_ Male ------> race_ White  
Similarly, this rule shows that men who work at private companies are associated with being White. This might be because private companies offer more job opportunities and White people are the majority of the U.S. population.

#### Compare the top rules using the two interestingness measures for the same levels of support (use at least two different levels of support) and comment on your findings.

1. Try 0.2 as the support threshold.

In [329]:
frequent_itemsets = apriori(df, min_support=0.2, use_colnames=True)

In [330]:
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.5)
rules[ (rules['lift'] <= 1.0) ].sort_values(by=['confidence'], ascending=False)

|  | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction |
|---|---|---|---|---|---|---|---|---|---|
| 48 | (sex_ Male) | (native-country_ United-States) | 0.669205 | 0.895857 | 0.598507 | 0.894355 | 0.998324 | -0.001005 | 0.985784 |
| 116 | (workclass_ Private, sex_ Female) | (native-country_ United-States) | 0.238076 | 0.895857 | 0.212708 | 0.893447 | 0.997310 | -0.000574 | 0.977381 |
| 23 | (marital-status_ Married-civ-spouse) | (native-country_ United-States) | 0.459937 | 0.895857 | 0.410553 | 0.892628 | 0.996396 | -0.001485 | 0.969929 |
| 53 | (income_ <=50K) | (native-country_ United-States) | 0.759190 | 0.895857 | 0.675624 | 0.889927 | 0.993381 | -0.004502 | 0.946128 |
| 96 | (workclass_ Private, relationship_ Husband) | (native-country_ United-States) | 0.263260 | 0.895857 | 0.234237 | 0.889757 | 0.993191 | -0.001606 | 0.944671 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 233 | (income_ <=50K) | (race_ White, sex_ Male) | 0.759190 | 0.588864 | 0.401861 | 0.529328 | 0.898898 | -0.045199 | 0.873509 |
| 352 | (workclass_ Private) | (native-country_ United-States, race_ White, sex_ Male) | 0.697030 | 0.542152 | 0.367188 | 0.526789 | 0.971663 | -0.010708 | 0.967535 |
| 377 | (workclass_ Private, income_ <=50K) | (native-country_ United-States, sex_ Male) | 0.544609 | 0.598507 | 0.286539 | 0.526138 | 0.879083 | -0.039413 | 0.847277 |
| 364 | (native-country_ United-States, race_ White) | (workclass_ Private, income_ <=50K) | 0.786862 | 0.544609 | 0.413132 | 0.525038 | 0.964065 | -0.015399 | 0.958796 |


In [331]:
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=0.5)
rules[ (rules['lift'] <= 1.0) ].sort_values(by=['lift'], ascending=False)

|  | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction |
|---|---|---|---|---|---|---|---|---|---|
| 580 | (native-country_ United-States, sex_ Male, income_ <=50K) | (workclass_ Private) | 0.411197 | 0.697030 | 0.286539 | 0.696841 | 0.999728 | -0.000078 | 0.999375 |
| 589 | (workclass_ Private) | (native-country_ United-States, sex_ Male, income_ <=50K) | 0.697030 | 0.411197 | 0.286539 | 0.411086 | 0.999728 | -0.000078 | 0.999810 |
| 564 | (native-country_ United-States, race_ White, workclass_ Private) | (income_ <=50K) | 0.544455 | 0.759190 | 0.413132 | 0.758800 | 0.999485 | -0.000213 | 0.998380 |
| 577 | (income_ <=50K) | (native-country_ United-States, race_ White, workclass_ Private) | 0.759190 | 0.544455 | 0.413132 | 0.544175 | 0.999485 | -0.000213 | 0.999385 |
| 75 | (sex_ Male) | (native-country_ United-States) | 0.669205 | 0.895857 | 0.598507 | 0.894355 | 0.998324 | -0.001005 | 0.985784 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 292 | (marital-status_ Married-civ-spouse) | (native-country_ United-States, income_ <=50K) | 0.459937 | 0.675624 | 0.222690 | 0.484175 | 0.716633 | -0.088055 | 0.628848 |
| 288 | (native-country_ United-States, marital-status_ Married-civ-spouse) | (income_ <=50K) | 0.410553 | 0.759190 | 0.222690 | 0.542415 | 0.714465 | -0.088998 | 0.526262 |
| 293 | (income_ <=50K) | (native-country_ United-States, marital-status_ Married-civ-spouse) | 0.759190 | 0.410553 | 0.222690 | 0.293325 | 0.714465 | -0.088998 | 0.834114 |
| 691 | (native-country_ United-States, race_ White, marital-status_ Married-civ-spouse) | (income_ <=50K) | 0.379872 | 0.759190 | 0.203249 | 0.535047 | 0.704760 | -0.085146 | 0.517923 |


2. Try 0.3 as the support threshold.

In [332]:
frequent_itemsets = apriori(df, min_support=0.3, use_colnames=True)

In [333]:
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.5)
rules[ (rules['lift'] <= 1.0) ].sort_values(by=['confidence'], ascending=False).head()

|  | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction |
|---|---|---|---|---|---|---|---|---|---|
| 26 | (sex_ Male) | (native-country_ United-States) | 0.669205 | 0.895857 | 0.598507 | 0.894355 | 0.998324 | -0.001005 | 0.985784 |
| 13 | (marital-status_ Married-civ-spouse) | (native-country_ United-States) | 0.459937 | 0.895857 | 0.410553 | 0.892628 | 0.996396 | -0.001485 | 0.969929 |
| 30 | (income_ <=50K) | (native-country_ United-States) | 0.75919 | 0.895857 | 0.675624 | 0.889927 | 0.993381 | -0.004502 | 0.946128 |
| 5 | (workclass_ Private) | (native-country_ United-States) | 0.69703 | 0.895857 | 0.618378 | 0.887161 | 0.990293 | -0.006062 | 0.922932 |
| 123 | (sex_ Male, income_ <=50K) | (native-country_ United-States) | 0.464605 | 0.895857 | 0.411197 | 0.885048 | 0.987934 | -0.005022 | 0.905966 |


In [334]:
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=0.5)
rules[ (rules['lift'] <= 1.0) ].sort_values(by=['lift'], ascending=False).head()

|  | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction |
|---|---|---|---|---|---|---|---|---|---|
| 164 | (native-country_ United-States, race_ White, workclass_ Private) | (income_ <=50K) | 0.544455 | 0.75919 | 0.413132 | 0.7588 | 0.999485 | -0.000213 | 0.99838 |
| 177 | (income_ <=50K) | (native-country_ United-States, race_ White, workclass_ Private) | 0.75919 | 0.544455 | 0.413132 | 0.544175 | 0.999485 | -0.000213 | 0.999385 |
| 30 | (native-country_ United-States) | (sex_ Male) | 0.895857 | 0.669205 | 0.598507 | 0.668084 | 0.998324 | -0.001005 | 0.99662 |
| 31 | (sex_ Male) | (native-country_ United-States) | 0.669205 | 0.895857 | 0.598507 | 0.894355 | 0.998324 | -0.001005 | 0.985784 |
| 15 | (marital-status_ Married-civ-spouse) | (native-country_ United-States) | 0.459937 | 0.895857 | 0.410553 | 0.892628 | 0.996396 | -0.001485 | 0.969929 |


**What I found:**  
  
With a support threshold of 0.2, ranking by confidence puts rules of the form X ------> native-country_ United-States at the top (for example, sex_ Male ------> native-country_ United-States with confidence 0.894355), because about 90% of all records are from the United States. Ranking by lift puts different rules at the top, such as native-country_ United-States, sex_ Male, income_ <=50K ------> workclass_ Private (lift 0.999728).  
  
With a support threshold of 0.3, the picture is the same. So for both thresholds I tried, the top rules when sorting by confidence differ from the top rules when sorting by lift, although a few rules (such as sex_ Male ------> native-country_ United-States) appear near the top of both rankings.  
  
Ranking by confidence suggests that being male with income <=50K is associated with being from the United States (sex_ Male, income_ <=50K ------> native-country_ United-States has confidence 0.885048). Lift tells a different story: its value for this rule is 0.987934, slightly below 1, so the two sides are slightly negatively associated. In other words, being male with income <=50K does not increase the chance that the native country is the United States; the high confidence simply reflects that about 90% of all records are from the United States. Lift therefore shows how the two sides are actually related.  
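As a worked check, take X = {sex_ Male, income_ <=50K} and Y = {native-country_ United-States}. Lift divides the rule's confidence by the consequent's overall support, using the values from the tables above:

$$
\mathrm{lift}(X \Rightarrow Y) = \frac{\mathrm{conf}(X \Rightarrow Y)}{\mathrm{supp}(Y)} = \frac{0.885048}{0.895857} \approx 0.988
$$

A confidence of 0.885 sits just below the 0.896 base rate of the consequent, which is why this rule ranks high by confidence yet has a lift below 1.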
  
Note: in these comparisons I keep only rules with lift <= 1.0 (the filter `rules['lift'] <= 1.0` in the code above), to make the difference between ranking by confidence and ranking by lift easier to see.
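To put a number on "the top rules are different", a hypothetical helper can compute what fraction of the top-k rules by confidence also appear in the top-k by lift. It assumes a rules DataFrame with mlxtend-style `antecedents`, `consequents`, `confidence` and `lift` columns, such as the filtered tables above; the toy table below uses placeholder values.

```python
import pandas as pd

def rule_keys(df):
    """Identify each rule by its (antecedents, consequents) pair of frozensets."""
    return set(zip(df["antecedents"], df["consequents"]))

def top_k_overlap(rules, k=5):
    """Fraction of the top-k rules by confidence that are also in the top-k by lift."""
    top_conf = rule_keys(rules.nlargest(k, "confidence"))
    top_lift = rule_keys(rules.nlargest(k, "lift"))
    return len(top_conf & top_lift) / k

# Hypothetical rules table with mlxtend-style columns; values are placeholders
toy = pd.DataFrame({
    "antecedents": [frozenset({"A"}), frozenset({"B"}), frozenset({"C"}), frozenset({"D"})],
    "consequents": [frozenset({"Z"})] * 4,
    "confidence": [0.9, 0.8, 0.7, 0.6],
    "lift": [0.95, 1.3, 1.1, 1.5],
})
print(top_k_overlap(toy, k=2))  # top-2 by confidence: A, B; by lift: D, B
```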