# Plotting Multiple Data Series

Complete the following exercises to solidify your knowledge of plotting multiple data series with pandas, matplotlib, and seaborn. Part of the challenge of plotting multiple data series is transforming the data into the form the visualization needs. For some of the exercises in this lab, you will first need to reshape the data into the most appropriate form and then create the plot.
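As a small illustration of what "transforming the data" can mean here, the sketch below reshapes a toy frame between wide form (one column per measure) and long form (one row per measure). The numbers are made up; only the column names mirror the lab's data set, and `Measure` and `Amount` are names chosen for this example.

```python
import pandas as pd

# Toy frame with made-up numbers; column names mirror the lab's data set.
wide = pd.DataFrame({
    'ItemType': ['WINE', 'BEER'],
    'RetailSales': [10.0, 20.0],
    'RetailTransfers': [12.0, 18.0],
    'WarehouseSales': [5.0, 90.0],
})

# Wide -> long: one row per (ItemType, measure) pair.
# 'Measure' and 'Amount' are illustrative column names.
long = wide.melt(id_vars='ItemType', var_name='Measure', value_name='Amount')

# Long -> wide again: one column per measure.
back = long.pivot(index='ItemType', columns='Measure', values='Amount')
```

Long form tends to suit seaborn and plotly express (`hue=` / `color=`), while wide form suits `DataFrame.plot` and cufflinks, which draw one series per column.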

In [31]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
import plotly.express as px
import cufflinks as cf
from IPython.display import HTML

warnings.filterwarnings('ignore')
%matplotlib inline

In [32]:
HTML('''<script>
code_show=true;
function code_toggle() {
 if (code_show){
 $('div.input').hide();
 } else {
 $('div.input').show();
 }
 code_show = !code_show
}
$( document ).ready(code_toggle);
</script>
The raw code for this IPython notebook is by default hidden for easier reading.
To toggle on/off the raw code, click <a href="javascript:code_toggle()">here</a>.''')

In [8]:
data = pd.read_csv('../data/liquor_store_sales.csv')
data.head()

Unnamed: 0,Year,Month,Supplier,ItemCode,Description,ItemType,RetailSales,RetailTransfers,WarehouseSales
0,2017,4,ROYAL WINE CORP,100200,GAMLA CAB - 750ML,WINE,0.0,1.0,0.0
1,2017,4,SANTA MARGHERITA USA INC,100749,SANTA MARGHERITA P/GRIG ALTO - 375ML,WINE,0.0,1.0,0.0
2,2017,4,JIM BEAM BRANDS CO,10103,KNOB CREEK BOURBON 9YR - 100P - 375ML,LIQUOR,0.0,8.0,0.0
3,2017,4,HEAVEN HILL DISTILLERIES INC,10120,J W DANT BOURBON 100P - 1.75L,LIQUOR,0.0,2.0,0.0
4,2017,4,ROYAL WINE CORP,101664,RAMON CORDOVA RIOJA - 750ML,WINE,0.0,4.0,0.0


## 1. Create a bar chart with bars for total Retail Sales, Retail Transfers, and Warehouse Sales by Item Type.

In [9]:
cf.go_offline()

In [10]:
data_df_bar = data.groupby('ItemType', as_index=False).agg({'RetailSales': 'sum', 'RetailTransfers': 'sum', 'WarehouseSales': 'sum'})

In [11]:
data_df_bar.iplot(kind='bar', x='ItemType', y=['RetailSales', 'RetailTransfers', 'WarehouseSales'])

In [12]:
px.bar(data_df_bar, x='ItemType', y=['RetailSales', 'RetailTransfers', 'WarehouseSales'])
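The same grouped bar chart can also be drawn with pandas' matplotlib wrapper, which plots one bar group per index value and one bar per column. This is a sketch on a toy frame with made-up totals, not the lab's actual numbers; in the notebook, `data_df_bar` would take the place of `totals`.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up totals standing in for data_df_bar.
totals = pd.DataFrame({
    'ItemType': ['BEER', 'WINE'],
    'RetailSales': [200.0, 300.0],
    'RetailTransfers': [220.0, 310.0],
    'WarehouseSales': [2000.0, 400.0],
})

# Index -> x-axis groups, columns -> one bar per series within each group.
ax = totals.set_index('ItemType').plot.bar(figsize=(8, 4), rot=0)
ax.set_ylabel('Total sales')
ax.set_title('Total sales by item type')
```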

## 2. Create a horizontal bar chart showing sales mix for the top 10 suppliers with the most total sales. 

In [13]:
data[['RetailSales', 'RetailTransfers', 'WarehouseSales']].sum()

RetailSales         842398.67
RetailTransfers     922636.35
WarehouseSales     2903930.83
dtype: float64

In [14]:
data['TotalSales'] = data['RetailSales'] + data['RetailTransfers'] + data['WarehouseSales']

In [15]:
top_10 = data.groupby('Supplier', as_index=False).agg({'RetailSales': 'sum', 'RetailTransfers': 'sum', 'WarehouseSales': 'sum'})

In [16]:
top_10

Unnamed: 0,Supplier,RetailSales,RetailTransfers,WarehouseSales
0,8 VINI INC,2.78,2.00,1.00
1,A HARDY USA LTD,0.40,0.00,0.00
2,A I G WINE & SPIRITS,12.52,5.92,134.00
3,A VINTNERS SELECTIONS,8640.57,8361.10,29776.67
4,A&E INC,11.52,2.00,0.00
...,...,...,...,...
328,WINEBOW INC,1.24,-1.58,0.00
329,YOUNG WON TRADING INC,1058.65,1047.40,2528.90
330,YUENGLING BREWERY,9628.35,10851.17,53805.32
331,Z WINE GALLERY IMPORTS LLC,8.83,11.25,16.00


In [17]:
df_top_sales = data[['RetailSales', 'RetailTransfers', 'WarehouseSales', 'TotalSales', 'Supplier']].groupby('Supplier', as_index=False).sum().sort_values(by='TotalSales', ascending=False).iloc[:10]

In [18]:
df_top_sales

Unnamed: 0,Supplier,RetailSales,RetailTransfers,WarehouseSales,TotalSales
184,MILLER BREWING COMPANY,35022.63,39176.67,572623.41,646822.71
69,CROWN IMPORTS,26707.83,29561.67,579824.7,636094.2
15,ANHEUSER BUSCH INC,42559.14,47322.64,493856.19,583737.97
139,HEINEKEN USA,20923.17,23004.25,318812.59,362740.01
94,E & J GALLO WINERY,67455.63,75129.83,75594.99,218180.45
78,DIAGEO NORTH AMERICA INC,57656.36,62968.12,54252.88,174877.36
65,CONSTELLATION BRANDS,54472.51,60542.1,44968.76,159983.37
150,JIM BEAM BRANDS CO,39156.79,43020.59,2928.45,85105.83
288,THE WINE GROUP,25758.15,28462.05,26603.78,80823.98
38,BOSTON BEER CORPORATION,10773.25,11869.96,53420.91,76064.12


In [19]:
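# Select several columns with a list (double brackets); tuple-style selection on a groupby fails in recent pandas.
top_10 = df_top_sales.groupby('Supplier', as_index=False)[['RetailSales', 'RetailTransfers', 'WarehouseSales', 'TotalSales']].sum().sort_values(by='TotalSales', ascending=True).head(10)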

In [20]:
top_10

Unnamed: 0,Supplier,RetailSales,RetailTransfers,WarehouseSales,TotalSales
1,BOSTON BEER CORPORATION,10773.25,11869.96,53420.91,76064.12
9,THE WINE GROUP,25758.15,28462.05,26603.78,80823.98
7,JIM BEAM BRANDS CO,39156.79,43020.59,2928.45,85105.83
2,CONSTELLATION BRANDS,54472.51,60542.1,44968.76,159983.37
4,DIAGEO NORTH AMERICA INC,57656.36,62968.12,54252.88,174877.36
5,E & J GALLO WINERY,67455.63,75129.83,75594.99,218180.45
6,HEINEKEN USA,20923.17,23004.25,318812.59,362740.01
0,ANHEUSER BUSCH INC,42559.14,47322.64,493856.19,583737.97
3,CROWN IMPORTS,26707.83,29561.67,579824.7,636094.2
8,MILLER BREWING COMPANY,35022.63,39176.67,572623.41,646822.71


In [21]:
fig = px.bar(top_10, x=['RetailSales', 
                  'RetailTransfers', 
                  'WarehouseSales', 
                  'TotalSales'], 
       y='Supplier', 
       barmode='group', 
       orientation='h')
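# In a horizontal bar chart the categories sit on the y-axis, so order that axis by supplier name.
fig.update_layout(yaxis={'categoryorder': 'array', 'categoryarray': top_10['Supplier'].tolist()})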
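"Sales mix" can also be read as each channel's share of a supplier's total. The sketch below takes that interpretation, which is an assumption about the exercise's intent, and uses three rows copied from the top-10 table above. It divides each row by its own total and draws stacked horizontal bars with pandas.

```python
import pandas as pd

# Three suppliers copied from the top-10 table above.
top = pd.DataFrame({
    'Supplier': ['ANHEUSER BUSCH INC', 'CROWN IMPORTS', 'MILLER BREWING COMPANY'],
    'RetailSales': [42559.14, 26707.83, 35022.63],
    'RetailTransfers': [47322.64, 29561.67, 39176.67],
    'WarehouseSales': [493856.19, 579824.70, 572623.41],
}).set_index('Supplier')

# Divide each row by its own total so the three shares sum to 1.
shares = top.div(top.sum(axis=1), axis=0)

# Stacked horizontal bars: one bar per supplier, one segment per sales channel.
ax = shares.plot.barh(stacked=True, figsize=(8, 3))
ax.set_xlabel('Share of total sales')
```

Stacking shares makes suppliers of very different size comparable, at the cost of hiding absolute volume.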

## 3. Create a multi-line chart that shows average Retail Sales, Retail Transfers, and Warehouse Sales per month over time.

In [22]:
avg_sale_by_month = data.pivot_table(
        values=['RetailSales', 
                'RetailTransfers',
                'WarehouseSales'], 
        index=['Year', 'Month'], 
        aggfunc='mean')

In [23]:
avg_sale_by_month

Year,Month,RetailSales,RetailTransfers,WarehouseSales
2017,4,0.0,15.707503,0.0
2017,5,7.038838,7.421817,27.310548
2017,6,7.143914,6.950396,27.839069
2017,8,6.409991,6.584726,28.122641
2017,9,6.757254,6.419721,22.817909
2017,10,6.549021,6.827827,22.289367
2017,11,6.765496,7.103699,23.348862
2017,12,9.078241,8.353759,21.169463
2018,1,5.679413,5.574833,19.072137
2018,2,5.939247,6.050136,20.229658


In [24]:
avg_sale_by_month.iplot(kind='line',  
                       xTitle='month_year', 
                       yTitle='sales', 
                       title='AVG sales by month',
                       subplots=False)
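The pivot table above is indexed by a `(Year, Month)` MultiIndex, which plotting libraries usually render as tuple labels. One option, sketched here on three rows copied from the output above, is to collapse the two levels into a single datetime index before plotting. The `Date` column name and the choice of the first day of each month are assumptions made for this example.

```python
import pandas as pd

# Three rows copied from the avg_sale_by_month output above.
idx = pd.MultiIndex.from_tuples([(2017, 12), (2018, 1), (2018, 2)],
                                names=['Year', 'Month'])
avg = pd.DataFrame({'RetailSales': [9.078241, 5.679413, 5.939247]}, index=idx)

flat = avg.reset_index()

# pd.to_datetime can assemble dates from year/month/day columns;
# day=1 is an arbitrary anchor for each month.
parts = flat[['Year', 'Month']].rename(columns=str.lower).assign(day=1)
flat['Date'] = pd.to_datetime(parts)

# Use the new datetime column as the index and drop the original levels.
flat = flat.set_index('Date').drop(columns=['Year', 'Month'])
```

With a datetime index, gaps in the data (such as a missing month) show up as gaps on the time axis instead of being silently closed up.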

## 4. Plot the same information as above but as a bar chart.

In [25]:
avg_sale_by_month.iplot(kind='bar',  
                       xTitle='month_year', 
                       yTitle='sales', 
                       title='AVG sales by month',
                       subplots=False)

## 5. Create a multi-line chart that shows Retail Sales summed by Item Type over time (Year & Month).

*Hint: There should be a line representing each Item Type.*
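One line per Item Type implies a wide table with one column per Item Type and one row per `(Year, Month)`. A minimal sketch of that reshape on made-up rows, using `groupby` followed by `unstack`:

```python
import pandas as pd

# Made-up rows; column names mirror the lab's data set.
toy = pd.DataFrame({
    'Year': [2017, 2017, 2017, 2018],
    'Month': [12, 12, 12, 1],
    'ItemType': ['WINE', 'WINE', 'BEER', 'WINE'],
    'RetailSales': [1.0, 2.0, 4.0, 8.0],
})

# Sum per (Year, Month, ItemType), then move ItemType from the index to columns.
# fill_value=0 treats a missing (month, type) combination as zero sales.
by_type = (toy.groupby(['Year', 'Month', 'ItemType'])['RetailSales']
              .sum()
              .unstack('ItemType', fill_value=0))
```

Each column of `by_type` then becomes one line (or one bar series) when the frame is plotted.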

In [26]:
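# Despite the variable name, this holds sums: one column per ItemType, one row per (Year, Month).
avg_retail_sale_by_month = data.pivot_table(
        values='RetailSales', 
        index=['Year', 'Month'], 
        columns='ItemType', 
        aggfunc='sum')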

In [27]:
avg_retail_sale_by_month.iplot(kind='line',  
                       xTitle='month_year', 
                       yTitle='RetailSales', 
                       title='Sum of Retail Sales by Item Type',
                       subplots=False)

## 6. Plot the same information as above but as a bar chart.

In [28]:
avg_retail_sale_by_month.iplot(kind='bar',  
                       xTitle='month_year', 
                       yTitle='RetailSales', 
                       title='Sum of Retail Sales by Item Type',
                       subplots=False)

## 7. Create a scatter plot showing the relationship between Retail Sales (x-axis) and Retail Transfers (y-axis) with the plot points color-coded according to their Item Type.

*Hint: Seaborn's lmplot is the easiest way to generate the scatter plot.*

In [29]:
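# Plot the row-level data (not the per-ItemType totals in data_df_bar) so that each record is one point.
data.iplot(kind='scatter', 
                     mode='markers', 
                     x='RetailSales', 
                     y='RetailTransfers', 
                     categories='ItemType',
                     xTitle='Retail Sales', 
                     yTitle='Retail Transfers',
                     title='Relationship between Retail Sales and Retail Transfers')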
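For comparison, the seaborn route suggested in the hint might look like the sketch below. The rows are made up; in the notebook, `data` would be passed instead of `toy`. `fit_reg=False` turns off the regression line so that only the color-coded points remain.

```python
import pandas as pd
import seaborn as sns

# Made-up rows; column names mirror the lab's data set.
toy = pd.DataFrame({
    'ItemType': ['WINE', 'WINE', 'WINE', 'BEER', 'BEER', 'BEER'],
    'RetailSales': [0.5, 1.0, 2.0, 3.0, 6.0, 9.0],
    'RetailTransfers': [1.0, 1.0, 2.5, 3.0, 5.0, 10.0],
})

# hue= assigns one color per ItemType; lmplot returns a FacetGrid.
grid = sns.lmplot(data=toy, x='RetailSales', y='RetailTransfers',
                  hue='ItemType', fit_reg=False)
grid.set_axis_labels('Retail Sales', 'Retail Transfers')
```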

## 8. Create a scatter matrix using all the numeric fields in the data set with the plot points color-coded by Item Type.

*Hint: Seaborn's pairplot may be your best option here.*

In [30]:
data_df_bar.iplot(kind='bubble', 
                     x='RetailSales', 
                     y='RetailTransfers', 
                     size='RetailSales',
                     categories='ItemType', 
                     xTitle='Retail Sales', 
                     yTitle='Retail Transfers',
                     title='Retail Sales vs. Retail Transfers by Item Type')