# Introduction to Pandas Lab

Complete the following set of exercises to solidify your knowledge of Pandas fundamentals.

### 1. Import NumPy and Pandas and alias them to `np` and `pd` respectively.

In [12]:
import numpy as np
import pandas as pd

### 2. Create a Pandas Series containing the elements of the list below.

In [13]:
lst = [5.7, 75.2, 74.4, 84.0, 66.5, 66.3, 55.8, 75.7, 29.1, 43.7]
data = pd.Series(lst)
print(data)

0     5.7
1    75.2
2    74.4
3    84.0
4    66.5
5    66.3
6    55.8
7    75.7
8    29.1
9    43.7
dtype: float64


### 3. Use indexing to return the third value in the Series above.

*Hint: Remember that indexing begins at 0.*

In [14]:
print(data[2])

74.4
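
`data[2]` returns the third value here only because the default index labels (0 through 9) happen to match the positions. As a minimal sketch, assuming the same list as above, the lookup can be written with `.iloc` (by position) or `.loc` (by label), which keeps working as intended if the index is later changed. The variable names `third_by_position` and `third_by_label` are illustrative only.

```python
import pandas as pd

lst = [5.7, 75.2, 74.4, 84.0, 66.5, 66.3, 55.8, 75.7, 29.1, 43.7]
data = pd.Series(lst)

# .iloc selects by position, regardless of what the index labels are.
third_by_position = data.iloc[2]

# .loc selects by label; with the default index, label 2 is also position 2.
third_by_label = data.loc[2]

print(third_by_position, third_by_label)
```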


### 4. Create a Pandas DataFrame from the list of lists below. Each sublist should be represented as a row.

In [15]:
b = [[53.1, 95.0, 67.5, 35.0, 78.4],
     [61.3, 40.8, 30.8, 37.8, 87.6],
     [20.6, 73.2, 44.2, 14.6, 91.8],
     [57.4, 0.1, 96.1, 4.2, 69.5],
     [83.6, 20.5, 85.4, 22.8, 35.9],
     [49.0, 69.0, 0.1, 31.8, 89.1],
     [23.3, 40.7, 95.0, 83.8, 26.9],
     [27.6, 26.4, 53.8, 88.8, 68.5],
     [96.6, 96.4, 53.4, 72.4, 50.1],
     [73.7, 39.0, 43.2, 81.6, 34.7]]

data = pd.DataFrame(b)
print(data)

      0     1     2     3     4
0  53.1  95.0  67.5  35.0  78.4
1  61.3  40.8  30.8  37.8  87.6
2  20.6  73.2  44.2  14.6  91.8
3  57.4   0.1  96.1   4.2  69.5
4  83.6  20.5  85.4  22.8  35.9
5  49.0  69.0   0.1  31.8  89.1
6  23.3  40.7  95.0  83.8  26.9
7  27.6  26.4  53.8  88.8  68.5
8  96.6  96.4  53.4  72.4  50.1
9  73.7  39.0  43.2  81.6  34.7


### 5. Rename the data frame columns based on the names in the list below.

In [16]:
colnames = ['Score_1', 'Score_2', 'Score_3', 'Score_4', 'Score_5']
data = pd.DataFrame(b, columns = colnames)
print(data)

   Score_1  Score_2  Score_3  Score_4  Score_5
0     53.1     95.0     67.5     35.0     78.4
1     61.3     40.8     30.8     37.8     87.6
2     20.6     73.2     44.2     14.6     91.8
3     57.4      0.1     96.1      4.2     69.5
4     83.6     20.5     85.4     22.8     35.9
5     49.0     69.0      0.1     31.8     89.1
6     23.3     40.7     95.0     83.8     26.9
7     27.6     26.4     53.8     88.8     68.5
8     96.6     96.4     53.4     72.4     50.1
9     73.7     39.0     43.2     81.6     34.7
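
The cell above sets the names while constructing a new DataFrame. If the goal is to rename the columns of a frame that already exists, two common options are assigning to `.columns` or calling `rename()` with a mapping. The sketch below uses only the first two rows of `b` to stay short; `b_small`, `df`, and `df2` are illustrative names, not part of the lab.

```python
import pandas as pd

# First two rows of the list of lists used above, to keep the example short.
b_small = [[53.1, 95.0, 67.5, 35.0, 78.4],
           [61.3, 40.8, 30.8, 37.8, 87.6]]
colnames = ['Score_1', 'Score_2', 'Score_3', 'Score_4', 'Score_5']

# Option 1: overwrite all column labels at once on an existing frame.
df = pd.DataFrame(b_small)
df.columns = colnames

# Option 2: rename with an explicit old-name -> new-name mapping.
# The default column labels are the integers 0..4.
mapping = dict(enumerate(colnames))
df2 = pd.DataFrame(b_small).rename(columns=mapping)

print(df)
print(df2)
```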


### 6. Create a subset of this data frame that contains only the Score_1, Score_3, and Score_5 columns.

In [25]:
data[["Score_1", "Score_3", "Score_5"]]

   Score_1  Score_3  Score_5
0     53.1     67.5     78.4
1     61.3     30.8     87.6
2     20.6     44.2     91.8
3     57.4     96.1     69.5
4     83.6     85.4     35.9
5     49.0      0.1     89.1
6     23.3     95.0     26.9
7     27.6     53.8     68.5
8     96.6     53.4     50.1
9     73.7     43.2     34.7


### 7. From the original data frame, calculate the average Score_3 value.

In [27]:
data["Score_3"].mean()

56.95000000000001

### 8. From the original data frame, calculate the maximum Score_4 value.

In [28]:
data["Score_4"].max()

88.8

### 9. From the original data frame, calculate the median Score_2 value.

In [29]:
data["Score_2"].median()

40.75

### 10. Create a Pandas DataFrame from the dictionary of product orders below.

In [None]:
orders = {'Description': ['LUNCH BAG APPLE DESIGN',
  'SET OF 60 VINTAGE LEAF CAKE CASES ',
  'RIBBON REEL STRIPES DESIGN ',
  'WORLD WAR 2 GLIDERS ASSTD DESIGNS',
  'PLAYING CARDS JUBILEE UNION JACK',
  'POPCORN HOLDER',
  'BOX OF VINTAGE ALPHABET BLOCKS',
  'PARTY BUNTING',
  'JAZZ HEARTS ADDRESS BOOK',
  'SET OF 4 SANTA PLACE SETTINGS'],
 'Quantity': [1, 24, 1, 2880, 2, 7, 1, 4, 10, 48],
 'UnitPrice': [1.65, 0.55, 1.65, 0.18, 1.25, 0.85, 11.95, 4.95, 0.19, 1.25],
 'Revenue': [1.65, 13.2, 1.65, 518.4, 2.5, 5.95, 11.95, 19.8, 1.9, 60.0]}
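
The cell above only defines the dictionary. One possible way to finish the exercise is to pass the dictionary to `pd.DataFrame`, which turns each key into a column and each list into that column's values. The sketch below reproduces only the first three orders to stay short; `orders_small` and `orders_df` are illustrative names.

```python
import pandas as pd

# First three entries of the `orders` dictionary, copied from the cell above.
orders_small = {
    'Description': ['LUNCH BAG APPLE DESIGN',
                    'SET OF 60 VINTAGE LEAF CAKE CASES ',
                    'RIBBON REEL STRIPES DESIGN '],
    'Quantity': [1, 24, 1],
    'UnitPrice': [1.65, 0.55, 1.65],
    'Revenue': [1.65, 13.2, 1.65],
}

# Each dictionary key becomes a column; list positions become rows.
orders_df = pd.DataFrame(orders_small)
print(orders_df)
```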

### 11. Calculate the total quantity ordered and revenue generated from these orders.
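
One way to approach this, sketched under the assumption that the orders have been loaded into a DataFrame, is to select the two numeric columns and call `sum()`. Only the `Quantity` and `Revenue` values from the dictionary above are reproduced; `orders_df` and `totals` are illustrative names.

```python
import pandas as pd

# Quantity and Revenue values copied from the `orders` dictionary.
orders_df = pd.DataFrame({
    'Quantity': [1, 24, 1, 2880, 2, 7, 1, 4, 10, 48],
    'Revenue': [1.65, 13.2, 1.65, 518.4, 2.5, 5.95, 11.95, 19.8, 1.9, 60.0],
})

# sum() on a DataFrame adds up each selected column separately.
totals = orders_df[['Quantity', 'Revenue']].sum()
print(totals)
```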

### 12. Obtain the prices of the most expensive and least expensive items ordered and print the difference.
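
A possible approach, again only a sketch, is to take `max()` and `min()` of the `UnitPrice` column and subtract. The values below are copied from the dictionary above; the variable names are illustrative.

```python
import pandas as pd

# UnitPrice values copied from the `orders` dictionary.
unit_prices = pd.Series(
    [1.65, 0.55, 1.65, 0.18, 1.25, 0.85, 11.95, 4.95, 0.19, 1.25],
    name='UnitPrice',
)

most_expensive = unit_prices.max()
least_expensive = unit_prices.min()
price_difference = most_expensive - least_expensive

print(most_expensive, least_expensive, price_difference)
```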

# Import and Export Files 

# Challenge 1 - Working with CSV and Other Separated Files

Import the pandas library.

CSV files are commonly loaded into dataframes. In the cell below, load the file from the URL provided using the `read_csv()` function in pandas. Starting with version 0.19 of pandas, you can load a CSV file into a dataframe directly from a URL without having to load the file first, as we did with the JSON URL. The dataset we will be using contains information about NASA shuttles.

In the cell below, we define the column names and the URL of the data. Following this cell, read the tst file into a variable called `shuttle`. Since the file does not contain the column names, you must add them yourself by passing the names declared in `cols` to the `names` argument. Additionally, a tst file is space separated, so make sure you pass `sep=' '` to the function.

In [1]:
#Your pandas import here:



In [7]:
# Run this code:

cols = ['time', 'rad_flow', 'fpv_close', 'fpv_open', 'high', 'bypass', 'bpv_close', 'bpv_open', 'class']
tst_url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/statlog/shuttle/shuttle.tst'
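
To show how `names` and `sep` interact without downloading anything, here is a minimal sketch that reads space separated text from memory. The two rows in `sample_text` are made-up placeholder values, not real shuttle data, and `sample` is an illustrative variable name. The same keyword arguments apply when the first argument is a URL such as `tst_url`.

```python
import io
import pandas as pd

cols = ['time', 'rad_flow', 'fpv_close', 'fpv_open', 'high',
        'bypass', 'bpv_close', 'bpv_open', 'class']

# Made-up, space separated rows standing in for the remote file.
sample_text = "1 2 3 4 5 6 7 8 9\n10 11 12 13 14 15 16 17 18\n"

# names= supplies the header, so the first line is treated as data;
# sep=' ' splits each line on single spaces.
sample = pd.read_csv(io.StringIO(sample_text), sep=' ', names=cols)
print(sample.head())
```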

In [8]:
# Your code here:



Let's verify that this worked by inspecting the first few rows with the `head()` function.

In [9]:
# Your code here:



To make life easier for us, let's turn this dataframe into a comma separated file by saving it using the `to_csv()` function. Save `shuttle` into the file `shuttle.csv` and ensure the file is comma separated and that we are not saving the index column.
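
As a small sketch of the relevant `to_csv()` options: the separator defaults to a comma, and `index=False` leaves out the row index. The frame below is a made-up stand-in, and calling `to_csv()` without a path returns the CSV text instead of writing a file, which makes the result easy to inspect. Passing a filename such as `'shuttle.csv'` as the first argument writes to disk instead.

```python
import pandas as pd

# Made-up stand-in for the real dataframe.
df = pd.DataFrame({'time': [1, 10], 'class': [9, 18]})

# With no path given, to_csv returns the CSV content as a string.
# index=False omits the row labels 0, 1, ... from the output.
csv_text = df.to_csv(index=False)
print(csv_text)
```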

In [10]:
# Your code here:



# Challenge 2 - Working with Excel Files

We can also use pandas to convert Excel spreadsheets to dataframes. Let's use the `read_excel()` function. In this case, `astronauts.xls` is in the same folder that contains this notebook. Read this file into a variable called `astronaut`.

Note: Make sure to install the `xlrd` library if it is not yet installed.

In [11]:
# Your code here:



Use the `head()` function to inspect the dataframe.

In [12]:
# Your code here:



Use the `value_counts()` function to find the most popular undergraduate major among all astronauts.
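
For reference, `value_counts()` counts how often each distinct value occurs in a Series and sorts the result from most to least frequent, so the first index entry is the most common value. The sketch below uses a made-up Series of majors rather than the astronaut data; the column to use in `astronaut` is not named here, so substitute the appropriate one.

```python
import pandas as pd

# Made-up values for illustration; not taken from the astronaut file.
majors = pd.Series(['Physics', 'Chemistry', 'Physics',
                    'Mathematics', 'Physics', 'Chemistry'])

# Counts are sorted in descending order by default.
counts = majors.value_counts()
most_common = counts.index[0]

print(counts)
print(most_common)
```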

In [13]:
# Your code here:



Because many cells in this file contain commas, let's save it as a tab separated file instead. In the cell below, save `astronaut` as a tab separated file using the `to_csv()` function. Call the file `astronaut.csv` and remember to remove the index column.
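
The following sketch illustrates why a tab separator helps when cell values contain commas. It uses a made-up one-row frame, writes it to a string with `sep='\t'`, and reads it back with the same separator. The names `df`, `tsv_text`, and `round_trip` are illustrative only.

```python
import io
import pandas as pd

# Made-up row whose second value contains a comma.
df = pd.DataFrame({'Name': ['A. Example'],
                   'Major': ['Physics, Mathematics']})

# sep='\t' separates fields with tabs, so the comma stays part of the value.
tsv_text = df.to_csv(sep='\t', index=False)

# Reading back requires the same separator.
round_trip = pd.read_csv(io.StringIO(tsv_text), sep='\t')
print(round_trip)
```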

In [14]:
# Your code here:



# Bonus Challenge - Fertility Dataset

Visit the following [URL](https://archive.ics.uci.edu/ml/datasets/Fertility) and retrieve the dataset as well as the column headers. Determine the correct separator and read the file into a variable called `fertility`. Examine the dataframe using the `head()` function.

In [15]:
# Your code here:

