<center>
    <img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/assets/logos/SN_web_lightmode.png" width="300" alt="cognitiveclass.ai logo">
</center>

# Analyzing What Is Included In The Buyer's Basket?

Estimated time needed: **1** hour

## Objectives

After completing this lab you will be able to:

*   Be confident about your data analysis skills


This dataset is available on Kaggle: <a href="https://www.kaggle.com/datasets/akashdeepkuila/bakery">https://www.kaggle.com/datasets/akashdeepkuila/bakery</a>  
### The dataset contains:
* TransactionNo : unique identifier for every single transaction
* Items : items purchased
* DateTime : date and time stamp of the transactions
* Daypart : part of the day when a transaction is made (morning, afternoon, evening, night)
* DayType : classifies whether a transaction was made on a weekend or a weekday

You will be asked to analyze the data, predict what ends up in a buyer's basket, and find combinations of products that are bought together. At the end of the lab, you will be instructed on how you can share your notebook.

### License: 
<a href="https://creativecommons.org/publicdomain/zero/1.0/?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkGuidedProjectsIBMSkillsNetworkGPXX08J0EN2930-2023-01-01">CC0 1.0 Universal (CC0 1.0) Public Domain Dedication</a>
<p>The person who associated a work with this deed has dedicated the work to the public domain by waiving all of his or her rights to the work worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.

You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.</p>


You will need the following libraries:


In [ ]:
import piplite
await piplite.install(['pandas'])
await piplite.install(['matplotlib'])
await piplite.install(['scipy'])
await piplite.install(['seaborn'])
await piplite.install(['ipywidgets'])
await piplite.install(['tqdm'])

In [ ]:
# ! pip install pandas
! pip install seaborn
! pip install mlxtend
! pip install pyvis

In [ ]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
import seaborn as sns
from mlxtend.frequent_patterns import apriori, association_rules
from pyvis.network import Network

<b>Importing the Data</b>


First, set the path to the dataset:


In [ ]:
path = 'https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMSkillsNetwork-GPXX0NSBEN/Bakery_dataset.csv'

Load the csv:


In [ ]:
df= pd.read_csv(path)


We use the method <code>head()</code> to display the first 5 rows of the dataframe:


In [ ]:
df.head()

<b style=" font-weight: bold; text-decoration: none;">Question 1</b>: Rename columns 'Daypart' -> 'DayPart' and 'TransactionNo' -> 'TransationNumber'.

Use <code>df.rename(columns={'':''})</code>


In [ ]:
df=df.rename(columns={'Daypart' : 'DayPart'})
df=df.rename(columns={'TransactionNo' : 'TransationNumber'})
df

Now let's check the dataset for missing values.


In [ ]:
#check missing values
missing_data = df.isnull()
missing_data.head(5)

In [ ]:
for column in missing_data.columns.values.tolist():
    print(column)
    print (missing_data[column].value_counts())
    print("")    

Every column reports only <code>False</code>, so the dataset has no missing values.
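
A more compact way to run the same check is to count the missing cells per column. The sketch below uses a small made-up frame (the values in `toy` are hypothetical, not taken from the bakery data); in the lab you would call the same methods on `df`.

```python
import pandas as pd

# Toy frame with hypothetical values; replace with the lab's df.
toy = pd.DataFrame({
    "Items": ["Bread", None, "Coffee"],
    "DayType": ["Weekend", "Weekday", "Weekday"],
})

# isnull() marks missing cells with True; sum() counts them per column.
missing_per_column = toy.isnull().sum()
print(missing_per_column)
```

A column with a count of 0 has no missing values.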


<b style=" font-weight: bold; text-decoration: none;">Question 2</b>:  Display the data type of each column using the attribute `dtypes`.


In [ ]:
df.dtypes

<b style=" font-weight: bold; text-decoration: none;">Question 3</b>: Convert data types to proper format:
* 'DateTime' to 'datetime64[ns]'
* 'DayPart' to 'category'
* 'DayType' to 'category'


In [ ]:
df['DateTime']=pd.to_datetime(df['DateTime'])
df[['DayPart']] = df[['DayPart']].astype('category')
df[['DayType']] = df[['DayType']].astype('category')

#show the result
df.dtypes

<b style=" font-weight: bold; text-decoration: none;">Question 4:</b> Add new columns (from 'DateTime'):
<!-- * 'time'
* hour'' -->
* 'month'
* 'month name', use <code>.replace</code>
<!-- * 'day'
* 'weekday' and 'weekday name' -->


In [ ]:
df['month'] = df['DateTime'].dt.month
df['month name'] = df['month'].replace([1,2,3,4,5,6,7,8,9,10,11,12],['January','February','March','April','May','June','July','August','September','October','November','December'])


In [ ]:
df
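
As an alternative to `.replace`, pandas can derive the month name directly from a datetime column with `dt.month_name()`. This is only a sketch on a hypothetical two-row frame (the timestamps are invented for illustration); the question above still asks for `.replace`.

```python
import pandas as pd

# Hypothetical timestamps, used only to illustrate the accessor.
toy = pd.DataFrame({
    "DateTime": pd.to_datetime(["2016-10-30 09:58:11", "2017-01-05 14:20:00"])
})

toy["month"] = toy["DateTime"].dt.month              # month number, 1 to 12
toy["month name"] = toy["DateTime"].dt.month_name()  # English month name
print(toy)
```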

<b style=" font-weight: bold; text-decoration: none;">Question 5:</b> Let's analyze the top 15 most popular purchases. 


In [ ]:
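# absolute number of purchases per item (reused for the plot in Question 6)
popular = df['Items'].value_counts()
# share of each item among all purchased items, in percent
(df['Items'].value_counts(normalize=True)*100).head(15)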
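
The difference between raw counts and percentage shares can be seen on a tiny example. The series below is hypothetical; `normalize=True` makes `value_counts` return fractions that sum to 1, which are then scaled to percent.

```python
import pandas as pd

# Hypothetical purchases, four rows in total.
items = pd.Series(["Coffee", "Bread", "Coffee", "Tea"], name="Items")

counts = items.value_counts()                          # absolute counts
share_pct = items.value_counts(normalize=True) * 100   # shares in percent

print(counts)
print(share_pct)
```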

<b style=" font-weight: bold; text-decoration: none;">Question 6:</b> Build a plot of the top 15 most popular items (use the <code>popular</code> value assigned in the previous step).
Use <code>sns.barplot</code>


In [ ]:
plt.figure(figsize=(15,5))
sns.barplot(x = popular.head(15).index, y = popular.head(15).values, palette = 'rocket')
plt.xlabel('Items', size = 15)
plt.xticks(rotation=60)
plt.ylabel('Count of Items', size = 15)
plt.title('Top 15 purchased Items', color = 'black', size = 20)
plt.show()

<b style=" font-weight: bold; text-decoration: none;">Question 7:</b> Build a plot of sales by month (counts of 'TransationNumber'). Use <code>sns.pointplot</code>.

Let's analyze the dynamics of monthly purchases. For correct sorting, we need to group the dataset by month number but label the graph with month names.


In [ ]:
monthTran = df.groupby(['month','month name'])['TransationNumber'].count().reset_index()
plt.figure(figsize=(12,5))
sns.pointplot(data = monthTran[['month name', 'TransationNumber']], x = "month name", y = "TransationNumber")
plt.xlabel('Month', size = 15)
plt.ylabel('Orders per current month', size = 15)
plt.title('Number of orders received each month', color = 'black', size = 18)
plt.show()
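
To see why the grouping uses both columns, consider this sketch on a hypothetical four-row frame. Grouping by `month` first keeps the rows in calendar order, whereas grouping by the name alone would sort alphabetically (April before January).

```python
import pandas as pd

# Hypothetical rows; column names follow the lab's dataframe.
toy = pd.DataFrame({
    "month": [10, 1, 10, 4],
    "month name": ["October", "January", "October", "April"],
    "TransationNumber": [1, 2, 3, 4],
})

# The month number drives the sort order; the name travels along as a label.
month_tran = (
    toy.groupby(["month", "month name"])["TransationNumber"]
    .count()
    .reset_index()
)
print(month_tran)
```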

<b style=" font-weight: bold; text-decoration: none;">Question 8:</b> Let's analyze the activity of buyers during parts of the day. 

This information is already present in the initial dataset ('DayPart'). You need to add your own sort order to display the parts of the day in the correct sequence.
<br>Also use <code>plt.pie</code>


In [ ]:
size = df['DayPart'].value_counts()
labels = size.index.values
colors = ["deepskyblue", "lightblue", "cornflowerblue", "blue"]
explode = [0.05, 0.05, 0.1, 0.1]

plt.figure(figsize=(12,5))
plt.pie(size, labels = labels, colors = colors, explode = explode, shadow = True, autopct = "%.2f%%")
plt.title('Transaction by day period')
plt.show()

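
One way to impose a custom sort order is to reindex the counts with an explicit list. This is a sketch on hypothetical data; the labels in `order` are assumed to match the spelling of the values in 'DayPart', so check them against your dataframe first.

```python
import pandas as pd

# Hypothetical day parts for seven transactions.
day_parts = pd.Series(
    ["Afternoon", "Morning", "Afternoon", "Evening", "Night", "Morning", "Afternoon"],
    name="DayPart",
)

# Assumed chronological order of the labels.
order = ["Morning", "Afternoon", "Evening", "Night"]

# value_counts() sorts by frequency; reindex() restores the chosen order.
ordered_counts = day_parts.value_counts().reindex(order)
print(ordered_counts)
```

The reordered series can then be passed to the plotting call in place of the frequency-sorted one.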


<h3>Now let's move on to association rules</h3>


<b style=" font-weight: bold; text-decoration: none;">Question 9:</b> Group rows into orders by columns 'TransationNumber' and 'Items' 


In [ ]:
orders = df.groupby(['TransationNumber', 'Items'])['Items'].count().reset_index(name ='Count')
orders

<b style=" font-weight: bold; text-decoration: none;">Question 10:</b> Transform orders by <code>pivot_table</code> into a market basket structure


In [ ]:
basket = orders.pivot_table(index='TransationNumber', columns='Items', values='Count', aggfunc='sum').fillna(0)
basket


<b style=" font-weight: bold; text-decoration: none;">Question 11:</b> Write a function that changes non-zero values to True and zero values to False.
Then apply it to basket with <code>.applymap</code>.


In [ ]:
def encode_units(x):
    # True when the item appears in the transaction at least once
    return bool(x > 0)

basket_sets = basket.applymap(encode_units)
basket_sets
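
The three steps above (group, pivot, encode) can be followed on a small example. The transactions below are hypothetical, and the final comparison `basket > 0` is used here as a shorthand for applying the encoding function cell by cell.

```python
import pandas as pd

# Hypothetical rows: one row per purchased item, as in the lab's dataframe.
toy = pd.DataFrame({
    "TransationNumber": [1, 1, 2, 3, 3, 3],
    "Items": ["Bread", "Coffee", "Coffee", "Bread", "Coffee", "Coffee"],
})

# Step 1: count how often each item occurs in each transaction.
orders = toy.groupby(["TransationNumber", "Items"]).size().reset_index(name="Count")

# Step 2: one row per transaction, one column per item, 0 where the item is absent.
basket = orders.pivot_table(
    index="TransationNumber", columns="Items", values="Count", aggfunc="sum"
).fillna(0)

# Step 3: keep only whether the item was bought at all.
basket_sets = basket > 0
print(basket_sets)
```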

<b style=" font-weight: bold; text-decoration: none;">Question 12:</b> Generate frequent item sets for association rule mining that have an appropriate support using <code>apriori()</code>


In [ ]:
frequent= apriori(basket_sets, use_colnames=True, min_support=0.01)
frequent.sort_values("support", ascending=False)
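
The `support` column is the fraction of transactions that contain every item of an item set. The sketch below computes it by hand for a few hypothetical transactions; the helper `support` is written for illustration and is not part of mlxtend.

```python
# Hypothetical transactions, each represented as a set of items.
transactions = [
    {"Bread", "Coffee"},
    {"Coffee"},
    {"Bread", "Coffee", "Tea"},
    {"Tea"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

coffee_support = support({"Coffee"}, transactions)
pair_support = support({"Bread", "Coffee"}, transactions)
print(coffee_support, pair_support)
```

With `min_support=0.01`, an item set is kept only if it occurs in at least 1% of all transactions.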

<b style=" font-weight: bold; text-decoration: none;">Question 13:</b> Create association rules using the function <code>association_rules()</code>


In [ ]:
rules = association_rules(frequent, metric="confidence", min_threshold=0.2)
rules.sort_values('confidence', ascending = False, inplace=True)
rules
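
For a rule A → B, confidence is support(A ∪ B) / support(A), and lift is confidence divided by support(B). The following sketch computes both by hand on hypothetical transactions; the helper names are illustrative and not part of mlxtend.

```python
# Hypothetical transactions, each represented as a set of items.
transactions = [
    {"Bread", "Coffee"},
    {"Coffee"},
    {"Bread", "Coffee", "Tea"},
    {"Tea"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if set(itemset) <= t) / len(transactions)

def confidence(antecedent, consequent):
    """Share of transactions with the antecedent that also contain the consequent."""
    return support(set(antecedent) | set(consequent)) / support(antecedent)

def lift(antecedent, consequent):
    """Confidence relative to how common the consequent is on its own."""
    return confidence(antecedent, consequent) / support(consequent)

conf_bread_coffee = confidence({"Bread"}, {"Coffee"})
lift_bread_coffee = lift({"Bread"}, {"Coffee"})
print(conf_bread_coffee, lift_bread_coffee)
```

A lift above 1 suggests the two item sets appear together more often than they would if they were independent.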

<b style=" font-weight: bold; text-decoration: none;">Question 14:</b> Predict what else customer can buy with a Sandwich.


In [ ]:
rules[rules['antecedents'] == frozenset({'Sandwich'})]

As a result you will get Coffee, Bread and Tea.


<b style=" font-weight: bold; text-decoration: none;">Question 15:</b> Analyse which products customers buy together the most frequently.


In [ ]:
frequent["antecedent_len"] = frequent["itemsets"].apply(lambda x: len(x))
frequent[frequent["antecedent_len"]>1].sort_values(by=["antecedent_len","support"], ascending=False)

<h3>Visualization of Association Rules</h3>


<b style=" font-weight: bold; text-decoration: none;">Question 16:</b> Create a network with <code>Network()</code> and choose the type of network.


In [ ]:
Basket_Network = Network(height="1000px", width="1000px", directed=True, notebook=True)
Basket_Network.repulsion()

<b style=" font-weight: bold; text-decoration: none;">Question 17:</b> Build a pyvis graph based on the rules:
* create nodes with <code>add_node()</code>
* create edges with <code>add_edge()</code> between nodes


In [ ]:
Basket_Network_Data_zip=zip(rules["antecedents"],
                            rules["consequents"],
                            rules["antecedent support"],
                            rules["consequent support"],
                            rules["confidence"])

for i in Basket_Network_Data_zip:
    # join the set members directly; parsing str(frozenset) breaks on multi-item sets
    FromItem=", ".join(sorted(i[0]))
    ToItem=", ".join(sorted(i[1]))
    FromWeight=i[2]
    ToWeight=i[3]
    EdgeWeight=i[4]

    Basket_Network.add_node(n_id=FromItem, shape="dot", value=FromWeight,
                            title=FromItem + "<br>Support: " + str(FromWeight))
    Basket_Network.add_node(n_id=ToItem, shape="dot", value=ToWeight,
                           title=ToItem + "<br>Support: " + str(ToWeight))
    Basket_Network.add_edge(source=FromItem, to=ToItem, value=EdgeWeight, arrowStrikethrough=False,
                            title=FromItem + " --> " + ToItem + "<br>Confidence:" + str(EdgeWeight))
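
Node ids in the graph are plain strings, so each `frozenset` of items has to be turned into a label. A minimal sketch of one way to do this is shown below; `itemset_label` is a hypothetical helper, and sorting the items is a design choice that keeps the label stable regardless of set ordering.

```python
def itemset_label(itemset):
    """Turn a set of item names into a stable, readable label."""
    return ", ".join(sorted(itemset))

label_single = itemset_label(frozenset({"Sandwich"}))
label_pair = itemset_label(frozenset({"Coffee", "Cake"}))
print(label_single)
print(label_pair)
```

Because the label is stable, the same item set always maps to the same node, whether it appears as an antecedent or as a consequent.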

<b style=" font-weight: bold; text-decoration: none;">Question 18:</b> Make the edges smooth with <code>set_edge_smooth</code>. <br> Then use <code>toggle_hide_edges_on_drag</code> to hide the edges while the graph is being dragged.


In [ ]:
Basket_Network.set_edge_smooth(smooth_type="continuous")
Basket_Network.toggle_hide_edges_on_drag(True)

<b style=" font-weight: bold; text-decoration: none;">Question 19:</b> Save this Graph and show it.<br> Use <code>save_graph()</code> and <code>show()</code>


In [ ]:
Basket_Network.save_graph("Basket_Network.html")
Basket_Network.show("Basket_Network.html")

<a href="https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/share-notebooks.html/?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDeveloperSkillsNetworkDA0101ENSkillsNetwork20235326-2021-01-01"> CLICK HERE</a> to see how to share your notebook


### Thank you for completing this lab!

## Author

<a href="https://author.skills.network/instructors/veronika_lanchuv?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkGuidedProjectsIBMSkillsNetworkGPXX02PYEN2993-2023-01-01">Veronika Lanchuv</a>

### Other Contributors

<a href="https://author.skills.network/instructors/yaroslav_vyklyuk_2?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkGuidedProjectsIBMSkillsNetworkGPXX02PYEN2993-2023-01-01">Yaroslav Vyklyuk</a>

<a href="https://author.skills.network/instructors/olga_kavun?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkGuidedProjectsIBMSkillsNetworkGPXX02PYEN2993-2023-01-01">Olga Kavun</a>


## Change Log

| Date (YYYY-MM-DD) | Version | Changed By | Change Description                 |
| ----------------- | ------- | ---------- | ---------------------------------- |
| 2022-05-11        | 2.0     | Veronika   | lab is done                        |

<hr>

## <h3 align="center"> © IBM Corporation 2023. All rights reserved. </h3>
