# Subsetting and Descriptive Stats

## Before you start:
   - Remember that you only need to complete one of the challenges.
   - Keep in mind that you need to use some of the functions you learned in the previous lessons.
   - All datasets are provided in IronHack's database.
   - Explain your code and outputs in as much detail as you can.
   - Try your best to answer the questions and complete the tasks and, most importantly, enjoy the process!
   
#### Import all the necessary libraries here:

In [1]:
# import libraries here
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import random
import pandas as pd

# [ONLY ONE MANDATORY] Challenge 1
#### In this challenge we will use the `Temp_States`  dataset. 

#### First import it into a dataframe called `temp`.

In [2]:
# your code here
temp = pd.read_csv("Temp_States.csv")

#### Print `temp`.

In [3]:
# your code here
temp

Unnamed: 0,City,State,Temperature
0,NYC,New York,19.444444
1,Albany,New York,9.444444
2,Buffalo,New York,3.333333
3,Hartford,Connecticut,17.222222
4,Bridgeport,Connecticut,14.444444
5,Treton,New Jersey,22.222222
6,Newark,New Jersey,20.0


#### Explore the data types of the *temp* dataframe. What types of data do we have? Comment on your result.

In [4]:
# your code here
temp.dtypes

City            object
State           object
Temperature    float64
dtype: object

In [5]:
"""
City and state are objects. Temperature is float.
"""

'\nCity and state are objects. Temperature is float.\n'

#### Select the rows where state is New York.

In [6]:
# your code here
temp.loc[temp["State"]=="New York"]

Unnamed: 0,City,State,Temperature
0,NYC,New York,19.444444
1,Albany,New York,9.444444
2,Buffalo,New York,3.333333


#### What is the average temperature of cities in New York?

In [7]:
# your code here
# numeric_only=True skips the text columns (City, State), which cannot be averaged
temp.loc[temp["State"]=="New York"].mean(numeric_only=True)

Temperature    10.740741
dtype: float64

#### Which states and cities have a temperature above 15 degrees Celsius?

In [8]:
# your code here
temp.loc[temp["Temperature"]>15]

Unnamed: 0,City,State,Temperature
0,NYC,New York,19.444444
3,Hartford,Connecticut,17.222222
5,Treton,New Jersey,22.222222
6,Newark,New Jersey,20.0


#### Now, return only the cities that have a temperature above 15 degrees Celsius.

In [9]:
# your code here
temp["City"].loc[temp["Temperature"]>15]

0         NYC
3    Hartford
5      Treton
6      Newark
Name: City, dtype: object

#### Which cities have a temperature above 15 degrees Celsius and below 20 degrees Celsius?

**Hint**: First, write the condition. Then, select the rows.

In [10]:
# your code here
temp["City"].loc[(temp["Temperature"]>15) & (temp["Temperature"]<20)]

0         NYC
3    Hartford
Name: City, dtype: object
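
The hint's two steps (write the condition, then select the rows) can be made explicit by storing the condition in its own variable. The sketch below rebuilds the small table printed earlier so that it runs on its own; the names `temp_demo`, `mild` and `mild_cities` are illustrative and not part of the exercise.

```python
import pandas as pd

# Rebuild the temperature table shown earlier in this notebook (illustrative copy).
temp_demo = pd.DataFrame({
    "City": ["NYC", "Albany", "Buffalo", "Hartford", "Bridgeport", "Treton", "Newark"],
    "State": ["New York", "New York", "New York", "Connecticut",
              "Connecticut", "New Jersey", "New Jersey"],
    "Temperature": [19.444444, 9.444444, 3.333333, 17.222222,
                    14.444444, 22.222222, 20.0],
})

# Step 1: write the condition as a boolean mask.
# Each comparison needs its own parentheses because & binds tighter than > and <.
mild = (temp_demo["Temperature"] > 15) & (temp_demo["Temperature"] < 20)

# Step 2: select the matching rows and the City column in a single .loc call.
mild_cities = temp_demo.loc[mild, "City"]
print(mild_cities.tolist())  # ['NYC', 'Hartford']
```

Newark is excluded because its temperature is exactly 20, and the condition uses a strict `<`.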

#### Find the mean and standard deviation of the temperature of each state.

In [11]:
# your code here
# numeric_only=True keeps the text column City out of the aggregation
temp.groupby("State").mean(numeric_only=True)


State,Temperature
Connecticut,15.833333
New Jersey,21.111111
New York,10.740741
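
The question asks for the standard deviation as well as the mean. One possible way to get both in a single table is `agg` with a list of function names; the sketch below uses an illustrative copy of the data (`temp_demo`) so that it is self-contained.

```python
import pandas as pd

# Illustrative copy of the temperature table shown earlier in this notebook.
temp_demo = pd.DataFrame({
    "City": ["NYC", "Albany", "Buffalo", "Hartford", "Bridgeport", "Treton", "Newark"],
    "State": ["New York", "New York", "New York", "Connecticut",
              "Connecticut", "New Jersey", "New Jersey"],
    "Temperature": [19.444444, 9.444444, 3.333333, 17.222222,
                    14.444444, 22.222222, 20.0],
})

# Select the numeric column first, then compute both statistics per state.
# pandas' "std" is the sample standard deviation (ddof=1).
state_stats = temp_demo.groupby("State")["Temperature"].agg(["mean", "std"])
print(state_stats)
```

The result has one row per state and two columns, `mean` and `std`.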


# [ONLY ONE MANDATORY]  Challenge 2

#### Load the `employees` dataset into a dataframe. Call the dataframe `employees`.

In [12]:
# your code here
employees = pd.read_csv("Employee.csv")
employees

Unnamed: 0,Name,Department,Education,Gender,Title,Years,Salary
0,Jose,IT,Bachelor,M,analyst,1,35
1,Maria,IT,Master,F,analyst,2,30
2,David,HR,Master,M,analyst,2,30
3,Sonia,HR,Bachelor,F,analyst,4,35
4,Samuel,Sales,Master,M,associate,3,55
5,Eva,Sales,Bachelor,F,associate,2,55
6,Carlos,IT,Master,M,VP,8,70
7,Pedro,IT,Phd,M,associate,7,60
8,Ana,HR,Master,F,VP,8,70


#### Explore the data types of the `employees` dataframe. Comment on your results.

In [13]:
# your code here
employees.dtypes

Name          object
Department    object
Education     object
Gender        object
Title         object
Years          int64
Salary         int64
dtype: object

In [14]:
"""
Name, Department, Education, Gender and Title are objects; Years and Salary are integers.
"""

'\nName, Department, Education, Gender and Title are objects; Years and Salary are integers.\n'

#### What's the average salary in this company?

In [15]:
# your code here
employees["Salary"].mean()

48.888888888888886

#### What's the highest salary?

In [16]:
# your code here
employees["Salary"].max()

70

#### What's the lowest salary?

In [17]:
# your code here
employees["Salary"].min()

30

#### Who are the employees with the lowest salary?

In [18]:
# your code here
employees["Name"].loc[employees["Salary"] == employees["Salary"].min()]

1    Maria
2    David
Name: Name, dtype: object

#### Find all the information about an employee called David.

In [19]:
# your code here
employees.loc[employees["Name"]=="David"]

Unnamed: 0,Name,Department,Education,Gender,Title,Years,Salary
2,David,HR,Master,M,analyst,2,30


#### Could you return only David's salary?

In [20]:
# your code here
employees["Salary"].loc[employees["Name"]=="David"]

2    30
Name: Salary, dtype: int64

#### Print all the rows where job title is associate.

In [21]:
# your code here
employees.loc[employees["Title"]=="associate"]

Unnamed: 0,Name,Department,Education,Gender,Title,Years,Salary
4,Samuel,Sales,Master,M,associate,3,55
5,Eva,Sales,Bachelor,F,associate,2,55
7,Pedro,IT,Phd,M,associate,7,60


#### Print the first 3 rows of your dataframe.
**Tip**: There are 2 ways to do it. Do it both ways.

In [22]:
# Method 1
employees.head(3)
# your code here

Unnamed: 0,Name,Department,Education,Gender,Title,Years,Salary
0,Jose,IT,Bachelor,M,analyst,1,35
1,Maria,IT,Master,F,analyst,2,30
2,David,HR,Master,M,analyst,2,30


In [23]:
# Method 2
employees.iloc[:3]
# your code here

Unnamed: 0,Name,Department,Education,Gender,Title,Years,Salary
0,Jose,IT,Bachelor,M,analyst,1,35
1,Maria,IT,Master,F,analyst,2,30
2,David,HR,Master,M,analyst,2,30


#### Find the employees whose title is associate and whose salary is above 55.

In [24]:
# your code here
employees.loc[(employees["Title"]=="associate") & (employees["Salary"]>55)]

Unnamed: 0,Name,Department,Education,Gender,Title,Years,Salary
7,Pedro,IT,Phd,M,associate,7,60


#### Group the employees by number of years of employment. What are the average salaries in each group?

In [25]:
# your code here
# numeric_only=True averages Salary only and ignores the text columns
employees.groupby("Years").mean(numeric_only=True)

Years,Salary
1,35.0
2,38.333333
3,55.0
4,35.0
7,60.0
8,70.0


####  What is the average salary per title?

In [26]:
# your code here
# select Salary before aggregating so the text columns are never averaged
employees.groupby("Title")["Salary"].mean()

Title
VP           70.000000
analyst      32.500000
associate    56.666667
Name: Salary, dtype: float64

####  Find the salary quartiles.


In [27]:
# your code here
employees["Salary"].quantile([0.25,0.50,0.75])

0.25    35.0
0.50    55.0
0.75    60.0
Name: Salary, dtype: float64

#### Is the mean salary different per gender?

In [28]:
# your code here
# select Salary before aggregating so the text columns are never averaged
employees.groupby("Gender")["Salary"].mean()

Gender
F    47.5
M    50.0
Name: Salary, dtype: float64

#### Find the minimum, mean and maximum of all numeric columns for each company department.



In [29]:
# your code here
employees.describe()

Unnamed: 0,Years,Salary
count,9.0,9.0
mean,4.111111,48.888889
std,2.803767,16.541194
min,1.0,30.0
25%,2.0,35.0
50%,3.0,55.0
75%,7.0,60.0
max,8.0,70.0
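
`describe()` summarises the whole table. To break the statistics down by department, one option is to group first and then aggregate the numeric columns. The sketch below rebuilds the relevant columns from the table printed earlier; `employees_demo` and `dept_stats` are illustrative names.

```python
import pandas as pd

# Illustrative copy of the employees table shown earlier (only the columns needed here).
employees_demo = pd.DataFrame({
    "Name": ["Jose", "Maria", "David", "Sonia", "Samuel", "Eva", "Carlos", "Pedro", "Ana"],
    "Department": ["IT", "IT", "HR", "HR", "Sales", "Sales", "IT", "IT", "HR"],
    "Years": [1, 2, 2, 4, 3, 2, 8, 7, 8],
    "Salary": [35, 30, 30, 35, 55, 55, 70, 60, 70],
})

# Group by department, keep the numeric columns, and compute three statistics for each.
dept_stats = employees_demo.groupby("Department")[["Years", "Salary"]].agg(["min", "mean", "max"])
print(dept_stats)
```

The columns of the result form a two-level index, so a single value is read as `dept_stats.loc["IT", ("Salary", "mean")]`.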


#### Bonus Question:  for each department, compute the difference between the maximum and the minimum salary.
**Hint**: try using `agg` or `apply` combined with `lambda` functions.

In [30]:
# your code here
employees.groupby("Department").apply(lambda x: x['Salary'].max()-x['Salary'].min())

Department
HR       40
IT       40
Sales     0
dtype: int64
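
The hint also mentions `agg`. A minimal sketch of that variant follows, again on an illustrative copy of the data (`employees_demo`); it selects the `Salary` column before aggregating, so the lambda receives a single Series per department.

```python
import pandas as pd

# Illustrative copy of the Department and Salary columns from the employees table.
employees_demo = pd.DataFrame({
    "Department": ["IT", "IT", "HR", "HR", "Sales", "Sales", "IT", "IT", "HR"],
    "Salary": [35, 30, 30, 35, 55, 55, 70, 60, 70],
})

# For each department, subtract the lowest salary from the highest one.
salary_range = employees_demo.groupby("Department")["Salary"].agg(lambda s: s.max() - s.min())
print(salary_range.to_dict())  # {'HR': 40, 'IT': 40, 'Sales': 0}
```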

# [ONLY ONE MANDATORY] Challenge 3
#### Open the `Orders` dataset. Name your dataset `orders`.

In [38]:
# your code here
orders = pd.read_csv("Orders.zip")
orders = orders.drop(orders.columns[0], axis=1)
orders

Unnamed: 0,InvoiceNo,StockCode,year,month,day,hour,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country,amount_spent
0,536365,85123A,2010,12,3,8,white hanging heart t-light holder,6,2010-12-01 08:26:00,2.55,17850,United Kingdom,15.30
1,536365,71053,2010,12,3,8,white metal lantern,6,2010-12-01 08:26:00,3.39,17850,United Kingdom,20.34
2,536365,84406B,2010,12,3,8,cream cupid hearts coat hanger,8,2010-12-01 08:26:00,2.75,17850,United Kingdom,22.00
3,536365,84029G,2010,12,3,8,knitted union flag hot water bottle,6,2010-12-01 08:26:00,3.39,17850,United Kingdom,20.34
4,536365,84029E,2010,12,3,8,red woolly hottie white heart.,6,2010-12-01 08:26:00,3.39,17850,United Kingdom,20.34
...,...,...,...,...,...,...,...,...,...,...,...,...,...
397919,581587,22613,2011,12,5,12,pack of 20 spaceboy napkins,12,2011-12-09 12:50:00,0.85,12680,France,10.20
397920,581587,22899,2011,12,5,12,children's apron dolly girl,6,2011-12-09 12:50:00,2.10,12680,France,12.60
397921,581587,23254,2011,12,5,12,childrens cutlery dolly girl,4,2011-12-09 12:50:00,4.15,12680,France,16.60
397922,581587,23255,2011,12,5,12,childrens cutlery circus parade,4,2011-12-09 12:50:00,4.15,12680,France,16.60


#### Explore your dataset by looking at the data types and summary statistics. Comment on your results.

In [40]:
# your code here
print(orders.describe())
orders.dtypes

           InvoiceNo           year          month            day  \
count  397924.000000  397924.000000  397924.000000  397924.000000   
mean   560617.126645    2010.934259       7.612537       3.614555   
std     13106.167695       0.247829       3.416527       1.928274   
min    536365.000000    2010.000000       1.000000       1.000000   
25%    549234.000000    2011.000000       5.000000       2.000000   
50%    561893.000000    2011.000000       8.000000       3.000000   
75%    572090.000000    2011.000000      11.000000       5.000000   
max    581587.000000    2011.000000      12.000000       7.000000   

                hour       Quantity      UnitPrice     CustomerID  \
count  397924.000000  397924.000000  397924.000000  397924.000000   
mean       12.728247      13.021823       3.116174   15294.315171   
std         2.273535     180.420210      22.096788    1713.169877   
min         6.000000       1.000000       0.000000   12346.000000   
25%        11.000000       2.0000

InvoiceNo         int64
StockCode        object
year              int64
month             int64
day               int64
hour              int64
Description      object
Quantity          int64
InvoiceDate      object
UnitPrice       float64
CustomerID        int64
Country          object
amount_spent    float64
dtype: object

In [None]:
"""
Most of the columns are integers, except for two floats and four objects.
"""

#### What is the average purchase price?

In [42]:
# your code here
orders["UnitPrice"].mean()

3.116174480549152

#### What are the highest and lowest purchase prices? 

In [44]:
# your code here
print(orders["UnitPrice"].min())
orders["UnitPrice"].max()

0.0


8142.75

#### Select all the customers from Spain.
**Hint**: Remember that you are not asked to find orders from Spain but customers. A customer might have more than one order associated. 

In [48]:
# your code here
orders["CustomerID"].loc[(orders["Country"]=="Spain")].unique()

array([12557, 17097, 12540, 12551, 12484, 12539, 12510, 12421, 12502,
       12462, 12507, 12541, 12547, 12597, 12545, 12596, 12354, 12417,
       12455, 12450, 12548, 12556, 12550, 12546, 12454, 12448, 12544,
       12538, 12445, 12442])

#### How many customers do we have in Spain?

In [49]:
# your code here
len(orders["CustomerID"].loc[(orders["Country"]=="Spain")].unique())

30

#### Select all the customers who have bought more than 50 items.
**Hint**: Remember that you are not asked to find orders with more than 50 items but customers who bought more than 50 items. A customer with two orders of 30 items each should appear in the selection.

In [62]:
# your code here
# numeric_only=True sums the numeric columns only and skips the text columns
ordersgroup = orders.groupby("CustomerID").sum(numeric_only=True)
ordersgroup.loc[ordersgroup["Quantity"]>50]

CustomerID,InvoiceNo,year,month,day,hour,Quantity,UnitPrice,amount_spent
12346,541431,2011,1,2,10,74215,1.04,77183.60
12347,101296926,365971,1383,441,2219,2458,481.21,4310.00
12348,16869685,62324,257,111,472,2341,178.71,1797.24
12349,42165457,146803,803,73,657,631,605.10,1757.55
12350,9231629,34187,34,51,272,197,65.30,334.40
...,...,...,...,...,...,...,...,...
18278,5116428,18099,81,18,99,66,29.55,173.90
18281,3895248,14077,42,49,70,54,39.36,80.82
18282,6838540,24132,116,60,146,103,62.39,178.05
18283,425704048,1520316,5503,2489,10346,1397,1220.93,2094.88
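
The case described in the hint, a customer with two orders of 30 items each, can be checked on a toy table. The data below is made up for illustration and is not taken from the `Orders` dataset; only `Quantity` is summed, which avoids the meaningless totals of `InvoiceNo`, `year` and similar columns.

```python
import pandas as pd

# Hypothetical toy data: customer 1 has two orders of 30 items, customer 2 one order of 60,
# and customer 3 one order of 10.
orders_demo = pd.DataFrame({
    "CustomerID": [1, 1, 2, 3],
    "Quantity": [30, 30, 60, 10],
})

# Total items per customer, then keep the customers whose total exceeds 50.
total_qty = orders_demo.groupby("CustomerID")["Quantity"].sum()
big_buyers = total_qty[total_qty > 50]
print(big_buyers.to_dict())  # {1: 60, 2: 60}
```

Customer 1 appears in the result even though neither of their individual orders exceeds 50 items.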


#### Select orders from Spain that include more than 50 items.

In [77]:
# your code here
spanish = orders.loc[orders["Country"]=="Spain"]
# numeric_only=True sums the numeric columns only and skips the text columns
spanishsum = spanish.groupby("CustomerID").sum(numeric_only=True)
spanishsum.loc[spanishsum["Quantity"]>50]


CustomerID,InvoiceNo,year,month,day,hour,Quantity,UnitPrice,amount_spent
12354,31952838,116638,232,232,754,530,261.22,1079.4
12417,12682154,46253,92,92,299,267,72.05,436.3
12421,25167687,90495,282,178,538,484,203.41,807.04
12442,6971460,24132,144,24,168,182,40.08,172.06
12445,2308500,8044,44,16,72,62,31.95,133.4
12448,12603492,44242,220,66,286,243,130.94,449.45
12450,4462114,16088,50,48,112,128,12.94,197.88
12454,8568825,30165,150,75,255,1006,109.23,3528.34
12455,26985556,96528,366,156,626,566,177.24,767.96
12462,34448972,124682,322,142,758,536,290.75,1189.59
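
The cell above totals quantities per customer. If an order is taken to mean one invoice (an assumption; the exercise does not define it), the grouping key would be `InvoiceNo` instead. A minimal sketch on made-up data:

```python
import pandas as pd

# Hypothetical toy data: invoice 100 (Spain) has two lines totalling 60 items,
# invoice 101 (Spain) has 10 items, and invoice 102 (France) has 80 items.
orders_demo = pd.DataFrame({
    "InvoiceNo": [100, 100, 101, 102],
    "Country": ["Spain", "Spain", "Spain", "France"],
    "Quantity": [40, 20, 10, 80],
})

# Keep the Spanish rows, total the items per invoice, then filter on that total.
spain = orders_demo[orders_demo["Country"] == "Spain"]
items_per_order = spain.groupby("InvoiceNo")["Quantity"].sum()
large_orders = items_per_order[items_per_order > 50]
print(large_orders.to_dict())  # {100: 60}
```

If the intended reading is instead "invoice lines with more than 50 units", a plain row filter on `Quantity > 50` would be the simpler choice.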


#### Select all free orders.

In [78]:
# your code here
orders.loc[orders["UnitPrice"]==0]

Unnamed: 0,InvoiceNo,StockCode,year,month,day,hour,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country,amount_spent
6914,537197,22841,2010,12,7,14,round cake tin vintage green,1,2010-12-05 14:02:00,0.0,12647,Germany,0.0
22539,539263,22580,2010,12,4,14,advent calendar gingham sack,4,2010-12-16 14:36:00,0.0,16560,United Kingdom,0.0
25379,539722,22423,2010,12,2,13,regency cakestand 3 tier,10,2010-12-21 13:45:00,0.0,14911,EIRE,0.0
29080,540372,22090,2011,1,4,16,paper bunting retrospot,24,2011-01-06 16:41:00,0.0,13081,United Kingdom,0.0
29082,540372,22553,2011,1,4,16,plasters in tin skulls,24,2011-01-06 16:41:00,0.0,13081,United Kingdom,0.0
34494,541109,22168,2011,1,4,15,organiser wood antique white,1,2011-01-13 15:10:00,0.0,15107,United Kingdom,0.0
53788,543599,84535B,2011,2,4,13,fairy cakes notebook a6 size,16,2011-02-10 13:08:00,0.0,17560,United Kingdom,0.0
85671,547417,22062,2011,3,3,10,ceramic bowl with love heart design,36,2011-03-23 10:25:00,0.0,13239,United Kingdom,0.0
92875,548318,22055,2011,3,3,12,mini cake stand hanging strawbery,5,2011-03-30 12:45:00,0.0,13113,United Kingdom,0.0
97430,548871,22162,2011,4,1,14,heart garland rustic padded,2,2011-04-04 14:42:00,0.0,14410,United Kingdom,0.0


#### Select all orders whose description starts with `lunch bag`.
**Hint**: use string functions.

In [80]:
# your code here
orders.loc[orders["Description"].str.startswith("lunch bag")]


Unnamed: 0,InvoiceNo,StockCode,year,month,day,hour,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country,amount_spent
93,536378,20725,2010,12,3,9,lunch bag red retrospot,10,2010-12-01 09:37:00,1.65,14688,United Kingdom,16.50
172,536385,22662,2010,12,3,9,lunch bag dolly girl design,10,2010-12-01 09:56:00,1.65,17420,United Kingdom,16.50
354,536401,22662,2010,12,3,11,lunch bag dolly girl design,1,2010-12-01 11:21:00,1.65,15862,United Kingdom,1.65
359,536401,20725,2010,12,3,11,lunch bag red retrospot,1,2010-12-01 11:21:00,1.65,15862,United Kingdom,1.65
360,536401,22382,2010,12,3,11,lunch bag spaceboy design,2,2010-12-01 11:21:00,1.65,15862,United Kingdom,3.30
...,...,...,...,...,...,...,...,...,...,...,...,...,...
397465,581486,23207,2011,12,5,9,lunch bag alphabet design,10,2011-12-09 09:38:00,1.65,17001,United Kingdom,16.50
397713,581538,20727,2011,12,5,11,lunch bag black skull.,1,2011-12-09 11:34:00,1.65,14446,United Kingdom,1.65
397714,581538,20725,2011,12,5,11,lunch bag red retrospot,1,2011-12-09 11:34:00,1.65,14446,United Kingdom,1.65
397877,581581,23681,2011,12,5,12,lunch bag red vintage doily,10,2011-12-09 12:20:00,1.65,17581,United Kingdom,16.50


#### Select all `lunch bag` orders made in 2011.

In [103]:
# your code here
lunchbag = orders.loc[orders["Description"].str.startswith("lunch bag")]
lunchbag.loc[lunchbag["year"]==2011]

Unnamed: 0,InvoiceNo,StockCode,year,month,day,hour,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country,amount_spent
26340,540015,20725,2011,1,2,11,lunch bag red retrospot,10,2011-01-04 11:40:00,1.65,13319,United Kingdom,16.50
26341,540015,20726,2011,1,2,11,lunch bag woodland,10,2011-01-04 11:40:00,1.65,13319,United Kingdom,16.50
26512,540023,22382,2011,1,2,12,lunch bag spaceboy design,2,2011-01-04 12:58:00,1.65,15039,United Kingdom,3.30
26513,540023,20726,2011,1,2,12,lunch bag woodland,1,2011-01-04 12:58:00,1.65,15039,United Kingdom,1.65
26860,540098,22384,2011,1,2,15,lunch bag pink polkadot,1,2011-01-04 15:50:00,1.65,16241,United Kingdom,1.65
...,...,...,...,...,...,...,...,...,...,...,...,...,...
397465,581486,23207,2011,12,5,9,lunch bag alphabet design,10,2011-12-09 09:38:00,1.65,17001,United Kingdom,16.50
397713,581538,20727,2011,12,5,11,lunch bag black skull.,1,2011-12-09 11:34:00,1.65,14446,United Kingdom,1.65
397714,581538,20725,2011,12,5,11,lunch bag red retrospot,1,2011-12-09 11:34:00,1.65,14446,United Kingdom,1.65
397877,581581,23681,2011,12,5,12,lunch bag red vintage doily,10,2011-12-09 12:20:00,1.65,17581,United Kingdom,16.50


#### Show the frequency distribution of the amount spent in Spain.

In [115]:
# your code here
orders.loc[orders["Country"]=="Spain"].groupby("amount_spent").count()

amount_spent,InvoiceNo,StockCode,year,month,day,hour,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
0.00,1,1,1,1,1,1,1,1,1,1,1,1
0.21,3,3,3,3,3,3,3,3,3,3,3,3
0.29,1,1,1,1,1,1,1,1,1,1,1,1
0.39,3,3,3,3,3,3,3,3,3,3,3,3
0.42,1,1,1,1,1,1,1,1,1,1,1,1
...,...,...,...,...,...,...,...,...,...,...,...,...
417.50,1,1,1,1,1,1,1,1,1,1,1,1
488.16,2,2,2,2,2,2,2,2,2,2,2,2
1080.00,1,1,1,1,1,1,1,1,1,1,1,1
1220.40,2,2,2,2,2,2,2,2,2,2,2,2
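
Grouping by `amount_spent` and counting repeats the same count in every column. A more compact alternative is `value_counts()` on the single column, sorted by amount. The sketch below uses made-up amounts for illustration.

```python
import pandas as pd

# Hypothetical toy data standing in for the Spanish rows of the orders table.
spain_demo = pd.DataFrame({
    "amount_spent": [0.21, 0.21, 0.39, 15.0, 15.0, 15.0],
})

# Count how often each amount occurs, ordered by amount rather than by frequency.
freq = spain_demo["amount_spent"].value_counts().sort_index()
print(freq.to_dict())  # {0.21: 2, 0.39: 1, 15.0: 3}
```

Because the result is a Series indexed by amount, it could also be passed to a plotting call such as `freq.plot(kind="bar")` if a chart is preferred.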


#### Select all orders made in the month of August.

In [118]:
# your code here
orders.loc[orders["month"]==8]

Unnamed: 0,InvoiceNo,StockCode,year,month,day,hour,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country,amount_spent
199475,561904,22075,2011,8,1,8,6 ribbons elegant christmas,96,2011-08-01 08:30:00,1.45,17941,United Kingdom,139.20
199476,561904,85049E,2011,8,1,8,scandinavian reds ribbons,156,2011-08-01 08:30:00,1.06,17941,United Kingdom,165.36
199477,561905,21385,2011,8,1,9,ivory hanging decoration heart,24,2011-08-01 09:31:00,0.85,14947,United Kingdom,20.40
199478,561905,84970L,2011,8,1,9,single heart zinc t-light holder,12,2011-08-01 09:31:00,0.95,14947,United Kingdom,11.40
199479,561905,84970S,2011,8,1,9,hanging heart zinc t-light holder,12,2011-08-01 09:31:00,0.85,14947,United Kingdom,10.20
...,...,...,...,...,...,...,...,...,...,...,...,...,...
226483,565067,22644,2011,8,3,17,ceramic cherry cake money bank,2,2011-08-31 17:16:00,1.45,15856,United Kingdom,2.90
226484,565067,22645,2011,8,3,17,ceramic heart fairy cake money bank,2,2011-08-31 17:16:00,1.45,15856,United Kingdom,2.90
226485,565067,22637,2011,8,3,17,piggy bank retrospot,2,2011-08-31 17:16:00,2.55,15856,United Kingdom,5.10
226486,565067,22646,2011,8,3,17,ceramic strawberry cake money bank,2,2011-08-31 17:16:00,1.45,15856,United Kingdom,2.90


#### Find the number of orders made by each country in the month of August.
**Hint**: Use value_counts().

In [122]:
# your code here
orders.loc[orders["month"]==8].groupby("Country").count()

Country,InvoiceNo,StockCode,year,month,day,hour,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,amount_spent
Australia,107,107,107,107,107,107,107,107,107,107,107,107
Austria,88,88,88,88,88,88,88,88,88,88,88,88
Belgium,194,194,194,194,194,194,194,194,194,194,194,194
Canada,5,5,5,5,5,5,5,5,5,5,5,5
Channel Islands,140,140,140,140,140,140,140,140,140,140,140,140
Denmark,16,16,16,16,16,16,16,16,16,16,16,16
EIRE,593,593,593,593,593,593,593,593,593,593,593,593
Finland,61,61,61,61,61,61,61,61,61,61,61,61
France,569,569,569,569,569,569,569,569,569,569,569,569
Germany,795,795,795,795,795,795,795,795,795,795,795,795
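
The hint suggests `value_counts()`, which returns one count per country instead of the same count repeated in every column. A minimal sketch on made-up data follows; note that it counts rows (invoice lines), exactly as the `groupby(...).count()` call does.

```python
import pandas as pd

# Hypothetical toy data: three rows in August and one in July.
orders_demo = pd.DataFrame({
    "Country": ["Spain", "Spain", "France", "Spain"],
    "month": [8, 8, 8, 7],
})

# Keep the August rows, select the Country column, and count occurrences of each value.
august_counts = orders_demo.loc[orders_demo["month"] == 8, "Country"].value_counts()
print(august_counts.to_dict())  # {'Spain': 2, 'France': 1}
```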


#### What's the average amount of money spent by country?

In [124]:
# your code here
orders.loc[orders["month"]==8].groupby("Country")["amount_spent"].mean()

Country
Australia          210.179439
Austria             17.228182
Belgium             18.319691
Canada              10.312000
Channel Islands     34.977000
Denmark             13.321875
EIRE                28.612782
Finland             22.565574
France              24.272337
Germany             24.177069
Iceland             26.586818
Israel              28.501813
Italy               20.957368
Malta               20.345455
Netherlands        144.027893
Norway              26.309221
Poland              23.635294
Portugal            29.790244
Spain               13.281389
Sweden              35.021500
Switzerland         18.613820
United Kingdom      21.573396
Unspecified         23.088261
Name: amount_spent, dtype: float64
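
The cell above keeps the August filter from the previous question, so its averages cover August only. For an average over all months, the filter can simply be dropped. A sketch on made-up data:

```python
import pandas as pd

# Hypothetical toy data covering two different months.
orders_demo = pd.DataFrame({
    "Country": ["Spain", "Spain", "France"],
    "month": [8, 3, 8],
    "amount_spent": [10.0, 20.0, 30.0],
})

# No month filter: every row contributes to its country's average.
avg_by_country = orders_demo.groupby("Country")["amount_spent"].mean()
print(avg_by_country.to_dict())  # {'France': 30.0, 'Spain': 15.0}
```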

#### What's the most expensive item?

In [126]:
# your code here
orders.loc[orders["UnitPrice"]==orders["UnitPrice"].max()]

Unnamed: 0,InvoiceNo,StockCode,year,month,day,hour,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country,amount_spent
118352,551697,POST,2011,5,2,13,postage,1,2011-05-03 13:46:00,8142.75,16029,United Kingdom,8142.75


#### What is the average amount spent per year?

In [127]:
# your code here
orders.groupby("year")["amount_spent"].mean()

year
2010    21.892733
2011    22.430074
Name: amount_spent, dtype: float64