# Exploring a database

Let's practice with pandas. Download the dataset ```chipotle.csv``` from JULIE and upload it to your workspace.

You can also download the dataset from our <a href = "https://full-stack-bigdata-datasets.s3.eu-west-3.amazonaws.com/Data+visualisation+et+collaboration/chipotle.csv" target = "_blank">S3 bucket</a>.

1. Import a library that can read data from a CSV file

```python
import pandas as pd
```

2. Import the dataset

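```python
# Use the first column of the file as the row index
datas = pd.read_csv("../src/chipotle.csv", index_col=0)
display(datas)  # display() is available in Jupyter/IPython
```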

| Unnamed: 0 | order_id | quantity | item_name | choice_description | item_price |
|---|---|---|---|---|---|
| 0 | 1 | 1 | Chips and Fresh Tomato Salsa |  | $2.39 |
| 1 | 1 | 1 | Izze | [Clementine] | $3.39 |
| 2 | 1 | 1 | Nantucket Nectar | [Apple] | $3.39 |
| 3 | 1 | 1 | Chips and Tomatillo-Green Chili Salsa |  | $2.39 |
| 4 | 2 | 2 | Chicken Bowl | [Tomatillo-Red Chili Salsa (Hot), [Black Beans... | $16.98 |
| ... | ... | ... | ... | ... | ... |
| 4617 | 1833 | 1 | Steak Burrito | [Fresh Tomato Salsa, [Rice, Black Beans, Sour ... | $11.75 |
| 4618 | 1833 | 1 | Steak Burrito | [Fresh Tomato Salsa, [Rice, Sour Cream, Cheese... | $11.75 |
| 4619 | 1834 | 1 | Chicken Salad Bowl | [Fresh Tomato Salsa, [Fajita Vegetables, Pinto... | $11.25 |
| 4620 | 1834 | 1 | Chicken Salad Bowl | [Fresh Tomato Salsa, [Fajita Vegetables, Lettu... | $8.75 |


3. Look at the first 10 rows of the dataset

```python
datas.head(10)
```

| Unnamed: 0 | order_id | quantity | item_name | choice_description | item_price |
|---|---|---|---|---|---|
| 0 | 1 | 1 | Chips and Fresh Tomato Salsa |  | $2.39 |
| 1 | 1 | 1 | Izze | [Clementine] | $3.39 |
| 2 | 1 | 1 | Nantucket Nectar | [Apple] | $3.39 |
| 3 | 1 | 1 | Chips and Tomatillo-Green Chili Salsa |  | $2.39 |
| 4 | 2 | 2 | Chicken Bowl | [Tomatillo-Red Chili Salsa (Hot), [Black Beans... | $16.98 |
| 5 | 3 | 1 | Chicken Bowl | [Fresh Tomato Salsa (Mild), [Rice, Cheese, Sou... | $10.98 |
| 6 | 3 | 1 | Side of Chips |  | $1.69 |
| 7 | 4 | 1 | Steak Burrito | [Tomatillo Red Chili Salsa, [Fajita Vegetables... | $11.75 |
| 8 | 4 | 1 | Steak Soft Tacos | [Tomatillo Green Chili Salsa, [Pinto Beans, Ch... | $9.25 |
| 9 | 5 | 1 | Steak Burrito | [Fresh Tomato Salsa, [Rice, Black Beans, Pinto... | $9.25 |


4. What is the shape of the dataset?

```python
datas.shape
```

(4622, 5)

5. Display all the columns of our dataset

```python
datas.columns
```

```
Index(['order_id', 'quantity', 'item_name', 'choice_description',
       'item_price'],
      dtype='object')
```
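Alongside the column names, ```dtypes``` shows how pandas stored each column. It is a quick way to spot the problem step 8 deals with: a price column holding strings like ```"$2.39"``` comes back as a non-numeric dtype (```object``` in most pandas versions), not as a number. A sketch on a hypothetical two-row frame:

```python
import pandas as pd

# Hypothetical rows mimicking the raw price format of the dataset
df = pd.DataFrame({
    "quantity": [1, 2],
    "item_price": ["$2.39", "$16.98"],
})

print(list(df.columns))  # ['quantity', 'item_price']
print(df.dtypes)         # quantity is an integer dtype, item_price is a string/object dtype
```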

6. What is the most ordered item?

*Hint: use ```groupby()``` and ```sort_values()```.*

```python
# Total units sold per item, largest first
most_ordered = datas.groupby("item_name")["quantity"].sum().sort_values(ascending=False)
display(most_ordered)
```

```
item_name
Chicken Bowl                             761
Chicken Burrito                          591
Chips and Guacamole                      506
Steak Burrito                            386
Canned Soft Drink                        351
Chips                                    230
Steak Bowl                               221
Bottled Water                            211
Chips and Fresh Tomato Salsa             130
Canned Soda                              126
Chicken Salad Bowl                       123
Chicken Soft Tacos                       120
Side of Chips                            110
Veggie Burrito                            97
Barbacoa Burrito                          91
Veggie Bowl                               87
Carnitas Bowl                             71
Barbacoa Bowl                             66
Carnitas Burrito                          60
Steak Soft Tacos                          56
6 Pack Soft Drink                         55
Chips and Tomatillo Red Chili Salsa       50
```
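The sorted Series answers the question, and the single top item can also be read off directly with ```idxmax()```. Summing ```quantity``` per item counts units sold, whereas ```value_counts()``` on ```item_name``` would count order lines; the two can disagree when a line has a quantity above 1. A sketch on hypothetical rows:

```python
import pandas as pd

# Hypothetical order lines: one line orders 3 Chips at once
df = pd.DataFrame({
    "item_name": ["Chicken Bowl", "Chicken Bowl", "Chips"],
    "quantity": [1, 1, 3],
})

units = df.groupby("item_name")["quantity"].sum().sort_values(ascending=False)
lines = df["item_name"].value_counts()

print(units.idxmax())  # Chips (3 units)
print(lines.idxmax())  # Chicken Bowl (2 lines)
```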


7. How many items were ordered from Chipotle in total?

```python
most_ordered.sum()
```

4972
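Because the per-item totals only regroup the ```quantity``` column, summing that column directly gives the same number without the ```groupby```. A sketch on hypothetical rows:

```python
import pandas as pd

# Hypothetical order lines
df = pd.DataFrame({
    "item_name": ["Chicken Bowl", "Izze", "Chicken Bowl"],
    "quantity": [2, 1, 1],
})

per_item = df.groupby("item_name")["quantity"].sum()
total_from_groups = per_item.sum()
total_direct = df["quantity"].sum()

print(total_from_groups, total_direct)  # 4 4
```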

8. How much revenue has Chipotle made?
      
- First, convert ```item_price``` to a decimal number.

Look at the ```item_price``` column. What do you see?

Let's remove the ```$``` sign. We can do that with the <a href="https://bit.ly/2ClcdtN" target="_blank">str</a> accessor.

Then convert the ```Series``` from strings to ```float``` with the <a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.astype.html" target="_blank">astype()</a> method.

- Then multiply the quantity sold by the price of the item. How much do you find? Round the result to two decimals.

The sum is a plain number, so Python's built-in ```round()``` works; to round every value of a Series, use <a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.round.html#pandas.Series.round" target="_blank">Series.round()</a>.
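The conversion and rounding steps can be sketched on a few hypothetical price strings (not the real data). Passing ```regex=False``` makes ```str.replace``` treat ```"$"``` as a literal character: as a regular expression, ```$``` means "end of string", and pandas has changed how it handles this single-character case between versions, so being explicit avoids surprises.

```python
import pandas as pd

# Hypothetical raw prices in the dataset's "$12.34" format
prices = pd.Series(["$2.39", "$3.39", "$16.98"])
quantity = pd.Series([1, 1, 2])

# regex=False: strip the literal "$" character, then convert to float
clean = prices.str.replace("$", "", regex=False).astype("float64")

# Multiply, sum, and round the scalar result with the built-in round()
revenue = round((clean * quantity).sum(), 2)

print(clean.tolist())  # [2.39, 3.39, 16.98]
print(revenue)         # 39.74
```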

```python
# Strip the literal "$" (regex=False), then convert the strings to floats
datas["item_price"] = datas["item_price"].str.replace("$", "", regex=False).astype("float64")
display(datas)
```

| Unnamed: 0 | order_id | quantity | item_name | choice_description | item_price |
|---|---|---|---|---|---|
| 0 | 1 | 1 | Chips and Fresh Tomato Salsa |  | 2.39 |
| 1 | 1 | 1 | Izze | [Clementine] | 3.39 |
| 2 | 1 | 1 | Nantucket Nectar | [Apple] | 3.39 |
| 3 | 1 | 1 | Chips and Tomatillo-Green Chili Salsa |  | 2.39 |
| 4 | 2 | 2 | Chicken Bowl | [Tomatillo-Red Chili Salsa (Hot), [Black Beans... | 16.98 |
| ... | ... | ... | ... | ... | ... |
| 4617 | 1833 | 1 | Steak Burrito | [Fresh Tomato Salsa, [Rice, Black Beans, Sour ... | 11.75 |
| 4618 | 1833 | 1 | Steak Burrito | [Fresh Tomato Salsa, [Rice, Sour Cream, Cheese... | 11.75 |
| 4619 | 1834 | 1 | Chicken Salad Bowl | [Fresh Tomato Salsa, [Fajita Vegetables, Pinto... | 11.25 |
| 4620 | 1834 | 1 | Chicken Salad Bowl | [Fresh Tomato Salsa, [Fajita Vegetables, Lettu... | 8.75 |


```python
# Multiply each line's price by its quantity, then sum and round to 2 decimals
item_total_revenue = datas["item_price"] * datas["quantity"]
round(item_total_revenue.sum(), 2)
```

39237.02
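A caveat worth checking before trusting this figure: in the row with index 4, 2 Chicken Bowls are listed at $16.98, which looks like a line total rather than a unit price. If ```item_price``` already holds quantity times unit price (an assumption about the data, not something documented here), multiplying by ```quantity``` again double-counts multi-unit lines, and summing ```item_price``` alone would give the revenue. A sketch of how the two readings diverge on hypothetical rows:

```python
import pandas as pd

# Hypothetical lines; ASSUMPTION: item_price is the line total (16.98 = 2 x 8.49)
df = pd.DataFrame({
    "quantity": [1, 2],
    "item_price": [2.39, 16.98],
})

if_unit_price = round((df["item_price"] * df["quantity"]).sum(), 2)
if_line_total = round(df["item_price"].sum(), 2)

print(if_unit_price)  # 36.35
print(if_line_total)  # 19.37
```

On the real data, the same comparison is ```round(datas["item_price"].sum(), 2)``` against the figure above; the unit price of a single item on the menu would settle which reading is right.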

9. What is the average revenue per order?

```python
# Total revenue divided by the number of distinct orders
item_total_revenue.sum() / datas["order_id"].nunique()
```

21.39423118865867
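The same average can be computed as "total per order, then mean", which also yields the per-order totals for inspection. A sketch on hypothetical rows, using the same price-times-quantity revenue as the cells above:

```python
import pandas as pd

# Hypothetical order lines: order 1 has two lines, order 2 has one
df = pd.DataFrame({
    "order_id": [1, 1, 2],
    "quantity": [1, 2, 1],
    "item_price": [2.50, 4.00, 10.00],
})

line_revenue = df["item_price"] * df["quantity"]
per_order = line_revenue.groupby(df["order_id"]).sum()  # revenue of each order

avg_a = line_revenue.sum() / df["order_id"].nunique()
avg_b = per_order.mean()

print(per_order.tolist())  # [10.5, 10.0]
print(avg_a, avg_b)        # 10.25 10.25
```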