# Ex2 - Getting and Knowing your Data

This time we are going to pull data directly from the internet.
Special thanks to: https://github.com/justmarkham for sharing the dataset and materials.

### Step 1. Import the necessary libraries

In [5]:
import numpy as np
import pandas as pd

print("Pandas version: {0}".format(pd.__version__))
print("Numpy version: {0}".format(np.__version__))

Pandas version: 0.24.2
Numpy version: 1.16.4


### Step 2. Import the dataset from this [address](https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv). 

### Step 3. Assign it to a variable called chipo.

In [7]:
url = "https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv"
chipo = pd.read_csv(url, sep="\t")
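A minimal sketch of what `sep="\t"` does, using a small in-memory sample instead of the URL. The two rows are hypothetical and only mimic the layout of chipotle.tsv. `io.StringIO` stands in for the downloaded file.

```python
import io

import pandas as pd

# Hypothetical tab-separated sample in the same column layout as chipotle.tsv
tsv_text = (
    "order_id\tquantity\titem_name\tchoice_description\titem_price\n"
    "1\t1\tChips and Fresh Tomato Salsa\t\t$2.39 \n"
    "1\t1\tIzze\t[Clementine]\t$3.39 \n"
)

# sep="\t" makes read_csv split fields on tabs instead of commas;
# the empty choice_description field is read as NaN
sample = pd.read_csv(io.StringIO(tsv_text), sep="\t")
print(sample)
```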

### Step 4. See the first 10 entries

In [10]:
chipo.head(10)

|   | order_id | quantity | item_name | choice_description | item_price |
|---|---|---|---|---|---|
| 0 | 1 | 1 | Chips and Fresh Tomato Salsa |  | $2.39 |
| 1 | 1 | 1 | Izze | [Clementine] | $3.39 |
| 2 | 1 | 1 | Nantucket Nectar | [Apple] | $3.39 |
| 3 | 1 | 1 | Chips and Tomatillo-Green Chili Salsa |  | $2.39 |
| 4 | 2 | 2 | Chicken Bowl | [Tomatillo-Red Chili Salsa (Hot), [Black Beans... | $16.98 |
| 5 | 3 | 1 | Chicken Bowl | [Fresh Tomato Salsa (Mild), [Rice, Cheese, Sou... | $10.98 |
| 6 | 3 | 1 | Side of Chips |  | $1.69 |
| 7 | 4 | 1 | Steak Burrito | [Tomatillo Red Chili Salsa, [Fajita Vegetables... | $11.75 |
| 8 | 4 | 1 | Steak Soft Tacos | [Tomatillo Green Chili Salsa, [Pinto Beans, Ch... | $9.25 |
| 9 | 5 | 1 | Steak Burrito | [Fresh Tomato Salsa, [Rice, Black Beans, Pinto... | $9.25 |


In [11]:
chipo.dtypes

order_id               int64
quantity               int64
item_name             object
choice_description    object
item_price            object
dtype: object

### Step 5. What is the number of observations in the dataset?

In [27]:
# Solution 1
chipo.shape[0]

4622

In [30]:
# Solution 2
chipo.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4622 entries, 0 to 4621
Data columns (total 5 columns):
order_id              4622 non-null int64
quantity              4622 non-null int64
item_name             4622 non-null object
choice_description    3376 non-null object
item_price            4622 non-null object
dtypes: int64(2), object(3)
memory usage: 180.6+ KB
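Besides the row count, the `info()` output shows that `choice_description` has only 3376 non-null values, so 4622 - 3376 = 1246 rows have no description. The sketch below, on a small hypothetical frame, shows equivalent ways to count rows and how `isna().sum()` reports the missing values per column directly.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing choice_description
df = pd.DataFrame({
    "order_id": [1, 1, 2],
    "quantity": [1, 1, 2],
    "choice_description": [np.nan, "[Clementine]", "[Black Beans]"],
})

# Three equivalent ways to count rows
n_rows = df.shape[0]
assert n_rows == len(df) == len(df.index)

# info() reports non-null counts; isna().sum() gives missing counts per column
missing = df.isna().sum()
print(n_rows)
print(missing)
```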


### Step 6. What is the number of columns in the dataset?

In [28]:
chipo.shape[1]

5

In [95]:
# Alternative method: count the column labels directly
len(chipo.columns)

5

### Step 7. Print the name of all the columns.

In [96]:
chipo.columns

Index(['order_id', 'quantity', 'item_name', 'choice_description',
       'item_price'],
      dtype='object')

### Step 8. How is the dataset indexed?

In [13]:
chipo.index

RangeIndex(start=0, stop=4622, step=1)
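The `RangeIndex` is simply the row position that `read_csv` assigns when no `index_col` is given. `order_id` would not make a unique index, since one order spans several rows (order 1 covers rows 0 to 3 in the `head()` output). A small sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical rows: order 1 has two line items
df = pd.DataFrame({"order_id": [1, 1, 2], "quantity": [1, 1, 2]})

# Without an explicit index, pandas assigns a RangeIndex 0..n-1
print(df.index)
is_range = isinstance(df.index, pd.RangeIndex)

# order_id repeats across line items, so it is not unique
ids_unique = df["order_id"].is_unique
```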

### Step 9. Which was the most-ordered item? 

In [54]:
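# Sum only the numeric columns; newer pandas versions would otherwise also try to sum the text columns
item_grp = chipo.groupby('item_name')[['order_id', 'quantity']].sum()
item_grp = item_grp.sort_values(by="quantity", ascending=False)
item_grp.head(1)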

| item_name | order_id | quantity |
|---|---|---|
| Chicken Bowl | 713926 | 761 |
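A shorter route to the same answer, sketched on hypothetical rows: group only the `quantity` column, then `idxmax()` returns the label with the largest total and `max()` returns that total. This also avoids summing `order_id`, whose total has no meaning.

```python
import pandas as pd

# Hypothetical line items
df = pd.DataFrame({
    "item_name": ["Chicken Bowl", "Izze", "Chicken Bowl", "Steak Burrito"],
    "quantity": [2, 1, 1, 1],
})

# Total quantity per item; idxmax gives the item, max gives its total
qty_per_item = df.groupby("item_name")["quantity"].sum()
top_item = qty_per_item.idxmax()
top_qty = qty_per_item.max()
print(top_item, top_qty)
```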


### Step 10. For the most-ordered item, how many items were ordered?

In [45]:
max_qt = item_grp['quantity'].max()
item_grp[item_grp.quantity==max_qt]

| item_name | order_id | quantity |
|---|---|---|
| Chicken Bowl | 713926 | 761 |


### Step 11. What was the most-ordered item in the choice_description column?

In [57]:
# As in Step 9, sum only the numeric columns
choice_grp = chipo.groupby('choice_description')[['order_id', 'quantity']].sum()
choice_grp = choice_grp.sort_values(by="quantity", ascending=False)
choice_grp.head(1)

| choice_description | order_id | quantity |
|---|---|---|
| [Diet Coke] | 123455 | 159 |
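This groups by the entire description string, so each distinct combination of choices counts as one key. Rows with a missing description (1246 of them according to `info()`) are left out, because `groupby` excludes NaN keys by default. A sketch with hypothetical values:

```python
import numpy as np
import pandas as pd

# Hypothetical rows; the NaN row has the largest quantity but no description
df = pd.DataFrame({
    "choice_description": ["[Diet Coke]", np.nan, "[Coke]", "[Diet Coke]"],
    "quantity": [1, 5, 1, 2],
})

# The NaN key is dropped, so its 5 units do not form a group
per_choice = df.groupby("choice_description")["quantity"].sum()
top_choice = per_choice.idxmax()
print(per_choice)
```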


### Step 12. How many items were ordered in total?

In [72]:
print("Total Items: ", item_grp.quantity.sum())


Total Items:  4972
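Because grouping only partitions the rows, the same total can be read straight off the column with `chipo.quantity.sum()`. A sketch on hypothetical rows checking that both routes agree:

```python
import pandas as pd

# Hypothetical line items
df = pd.DataFrame({
    "item_name": ["Chicken Bowl", "Izze", "Chicken Bowl"],
    "quantity": [2, 1, 1],
})

# Summing the column directly...
total_direct = df["quantity"].sum()
# ...matches summing the per-item totals
total_grouped = df.groupby("item_name")["quantity"].sum().sum()
print(total_direct, total_grouped)
```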


### Step 13. Turn the item price into a float

#### Step 13.a. Check the item price type

In [75]:
chipo.item_price.dtype

dtype('O')

#### Step 13.b. Create a lambda function and change the type of item price

In [78]:
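# Strip surrounding whitespace and the leading "$", then convert;
# safer than slicing x[1:-1], which always drops the last character
dollar_scrub = lambda x: float(x.strip().lstrip("$"))
chipo.item_price = chipo.item_price.apply(dollar_scrub)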

#### Step 13.c. Check the item price type

In [79]:
chipo.item_price.dtype

dtype('float64')
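A vectorized alternative to `apply` with a lambda, using the pandas `.str` accessor. The sample strings are hypothetical; the sketch assumes each price is a "$" followed by a number, with optional surrounding whitespace. It also shows why a slice like `x[1:-1]` is fragile: it removes the final character whether or not it is a space.

```python
import pandas as pd

# Hypothetical raw prices, with and without a trailing space
prices = pd.Series(["$2.39 ", "$16.98 ", "$1.69"])

# Strip whitespace, drop the leading "$", cast the whole column at once
as_float = prices.str.strip().str.lstrip("$").astype(float)
print(as_float.dtype)

# Slicing off one character at each end loses a digit when there is no trailing space
sliced = float("$1.69"[1:-1])
print(sliced)
```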

### Step 14. How much was the revenue for the period in the dataset?

In [85]:
rev = (chipo.quantity * chipo.item_price).sum()
print("Total Revenue: $" + str(np.round(rev, decimals=2)))

Total Revenue: $39237.02
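This total treats `item_price` as a unit price. The `head()` output hints it may instead be the line total: row 4 shows two Chicken Bowls at $16.98, while row 5 shows a single Chicken Bowl at $10.98. If that reading is right, multiplying by `quantity` double-counts multi-unit lines, and revenue would simply be `chipo.item_price.sum()`. This is an assumption to verify against the data; the sketch below only illustrates the difference on hypothetical rows.

```python
import pandas as pd

# Hypothetical rows; the second line is 2 units
df = pd.DataFrame({
    "quantity": [1, 2],
    "item_price": [2.39, 16.98],
})

# Reading item_price as a unit price
rev_unit_price = (df["quantity"] * df["item_price"]).sum()
# Reading item_price as the total for the whole line
rev_line_total = df["item_price"].sum()
print(round(rev_unit_price, 2), round(rev_line_total, 2))
```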


### Step 15. How many orders were made in the period?

In [88]:
# Assumes order IDs run from 1 to N without gaps
tot_orders = chipo.order_id.max()
print("Total Orders: " + str(tot_orders))

Total Orders: 1834


In [98]:
# More robust: count the distinct order IDs instead of relying on the largest ID
tot_orders = chipo.order_id.nunique()
tot_orders

1834
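Taking `max()` of `order_id` only counts orders if the IDs run 1..N with no gaps, while `nunique()` counts the distinct IDs actually present. Both return 1834 here, which suggests the IDs are contiguous. A sketch on hypothetical IDs where the two differ:

```python
import pandas as pd

# Hypothetical order IDs: order 3 is missing and orders repeat per line item
order_ids = pd.Series([1, 1, 2, 4, 4, 4])

by_max = order_ids.max()          # assumes IDs run 1..N with no gaps
by_nunique = order_ids.nunique()  # counts distinct IDs actually present
print(by_max, by_nunique)
```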

### Step 16. What is the average revenue amount per order?

In [100]:
# Solution 1
rev_per_order = rev/tot_orders
print("Average Revenue per Order: $" + str(np.round(rev_per_order, decimals=2)))

Average Revenue per Order: $21.39

In [102]:
# Solution 2
chipo['revenue'] = chipo['quantity'] * chipo['item_price']
# Total revenue per order, then the mean across orders
order_grouped = chipo.groupby('order_id')['revenue'].sum()
order_grouped.mean()

21.394231188658654
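Both solutions compute the same quantity: the mean of the per-order totals equals total revenue divided by the number of orders. They differ only in the last few digits because the floating-point additions happen in a different order. A sketch on hypothetical rows:

```python
import pandas as pd

# Hypothetical line items across two orders
df = pd.DataFrame({
    "order_id": [1, 1, 2],
    "quantity": [1, 2, 1],
    "item_price": [2.39, 8.49, 10.98],
})
df["revenue"] = df["quantity"] * df["item_price"]

# Method 1: total revenue divided by the number of distinct orders
avg_1 = df["revenue"].sum() / df["order_id"].nunique()
# Method 2: revenue per order, then the mean across orders
avg_2 = df.groupby("order_id")["revenue"].sum().mean()
print(avg_1, avg_2)
```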

### Step 17. How many different items are sold?

In [103]:
print("Unique Items: ", item_grp.shape[0])

Unique Items:  50
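Since `item_name` has no missing values (4622 non-null in `info()`), `chipo.item_name.nunique()` should return the same 50 directly, while `value_counts()` additionally lists the number of rows per item. A sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical item names
items = pd.Series(["Chicken Bowl", "Izze", "Chicken Bowl", "Steak Burrito"])

n_distinct = items.nunique()   # number of distinct item names
counts = items.value_counts()  # rows per item name
print(n_distinct)
print(counts)
```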


In [105]:
# Alternative method: value_counts() lists each distinct item with its number of rows (line items, not quantity)
chipo.item_name.value_counts()

Chicken Bowl                             726
Chicken Burrito                          553
Chips and Guacamole                      479
Steak Burrito                            368
Canned Soft Drink                        301
Chips                                    211
Steak Bowl                               211
Bottled Water                            162
Chicken Soft Tacos                       115
Chips and Fresh Tomato Salsa             110
Chicken Salad Bowl                       110
Canned Soda                              104
Side of Chips                            101
Veggie Burrito                            95
Barbacoa Burrito                          91
Veggie Bowl                               85
Carnitas Bowl                             68
Barbacoa Bowl                             66
Carnitas Burrito                          59
Steak Soft Tacos                          55
6 Pack Soft Drink                         54
Chips and Tomatillo Red Chili Salsa       48
Chicken Cr