# Pandas Exercises

Try the exercises below to practice the *pandas* skills you learned. To edit and run the code, open the notebook in "playground mode" using the button in the upper right corner. Be sure to add it to your Drive to save your work. 

In a few cases, you might need (or want) to use syntax and functions we didn't cover. You should use Stack Overflow and the *pandas* documentation to help you solve these problems. No matter how much you use *pandas*, you'll always encounter situations where you don't know exactly what to do. 

## Setup

The exercises use a very popular dataset from the UCI Machine Learning Repository that describes the physical characteristics of irises. Run the code cell below to import it into a `DataFrame` called `iris`.

In [1]:
import pandas as pd
import numpy as np
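# Note: in the UCI file the third and fourth columns are petal length and petal
# width; the names below are kept as written because the tests rely on them.
columns = ["sepal_length", "sepal_width", "petal_width", "petal_length", "flower"]
iris = pd.read_csv(
    "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data",
    header=None,
    names=columns,
)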
iris.head()

|   | sepal_length | sepal_width | petal_width | petal_length | flower |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |


Run the cell below to initialize several functions that will spot-check the correctness of your solutions as you complete the exercises. 

In [2]:
def test_1(input):
  assert isinstance(input, pd.Series)
  expected = pd.Series([0.1, 0.2, 0.2, 0.1, 0.1, 0.2, 0.4, 0.4, 0.3, 0.3, 0.3],
                       index=np.arange(9, 20))
  assert expected.equals(input)

def test_2(pl_mean, pl_std):
  assert pl_mean > .23
  assert pl_mean < .24
  assert pl_std > .11
  assert pl_std < .12

def test_3(df):
  assert 'big_petal' in df.columns
  assert df['big_petal'].sum() == 116

def test_4(df):
  assert 'sepal_length' not in df.columns
  assert 'sepal_width' not in df.columns
  assert len(df.index) == 74
  expected = pd.Index(list(range(1, 149, 2)))
  assert df.index.equals(expected)

def test_5(df):
  assert len(df.index) == 59
  assert df['big_petal'].all()

def test_6(df):
  assert len(df.index) == 51
  assert (df['petal_length'] > .4).all()
  assert 'big_petal' not in df.columns

def test_7(df):
  expected = set([75, 81, 89])
  actual = set(list(df.index.values))
  assert len(expected.difference(actual)) == 0
  assert len(actual.difference(expected)) == 0

## Dictionary Operations

Create a new `Series` called `plength` that contains the `petal_length` column.

In [17]:
# YOUR CODE HERE
# Selecting a single column with [] already returns a Series.
plength = iris['petal_length']
plength.head(21)

0     0.2
1     0.2
2     0.2
3     0.2
4     0.2
5     0.4
6     0.3
7     0.2
8     0.2
9     0.1
10    0.2
11    0.2
12    0.1
13    0.1
14    0.2
15    0.4
16    0.4
17    0.3
18    0.3
19    0.3
20    0.2
Name: petal_length, dtype: float64
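
The difference between single and double brackets when selecting a column can be sketched on a small made-up table. The `toy` frame and its column names are hypothetical and are not part of the iris data.

```python
import pandas as pd

# Hypothetical toy data, used only for illustration.
toy = pd.DataFrame({"height": [1.0, 2.0, 3.0], "label": ["a", "b", "c"]})

as_series = toy["height"]    # single brackets return a Series
as_frame = toy[["height"]]   # double brackets return a one-column DataFrame

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
```

Either form keeps the original index, so the choice mostly depends on whether later steps expect a `Series` or a `DataFrame`.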

Store the 10th through 20th entries of `plength` in a `Series` named `plength_subset`.

*You should not use a for-loop.*

In [19]:
# YOUR CODE HERE
# .loc slices by label and includes the end label:
# 10th entry => index 9; 20th entry => index 19.
plength_subset = plength.loc[9:19]
######################################### 
# Do not edit below this line
test_1(plength_subset)
plength_subset

9     0.1
10    0.2
11    0.2
12    0.1
13    0.1
14    0.2
15    0.4
16    0.4
17    0.3
18    0.3
19    0.3
Name: petal_length, dtype: float64
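
Label-based and position-based slices treat the end point differently, which is easy to get wrong. A minimal sketch on a made-up `Series` (the name `s` and its values are hypothetical):

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50])  # default integer index 0..4

by_label = s.loc[1:3]      # labels 1, 2, 3: the end label is included
by_position = s.iloc[1:3]  # positions 1, 2: the end position is excluded

print(by_label.tolist())     # [20, 30, 40]
print(by_position.tolist())  # [20, 30]
```

With a default integer index the labels and positions coincide, so the only visible difference is whether the last element is kept.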

Find the mean and standard deviation of the petal length for the 10th through 20th flowers, and store them in variables named `plength_mean` and `plength_std`, respectively.

*Hint: The `Series` class provides a number of convenient summary methods.*
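
As a rough illustration of those summary methods, the sketch below uses a small made-up `Series` (the name `s` and its values are hypothetical). Note that `Series.std()` computes the sample standard deviation (`ddof=1`) by default, whereas NumPy's `np.std` defaults to `ddof=0`.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0])

s_mean = s.mean()           # arithmetic mean: 2.5
s_std = s.std()             # sample standard deviation (ddof=1)
s_std_pop = s.std(ddof=0)   # population standard deviation

print('mean: {:.3f}'.format(s_mean))            # mean: 2.500
print('sample std: {:.3f}'.format(s_std))       # sample std: 1.291
print('population std: {:.3f}'.format(s_std_pop))  # population std: 1.118
```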

In [0]:
# YOUR CODE HERE
# Series.mean() and Series.std() summarize the subset directly.
plength_mean = plength_subset.mean()
plength_std = plength_subset.std()
######################################### 
# Do not edit below this line
test_2(plength_mean, plength_std)
print('mean: {:.3f}'.format(plength_mean))
print('std: {:.3f}'.format(plength_std))

Add a new boolean column to `iris` that indicates whether the petal length of each flower is greater than the mean petal length you found above. Call the column `big_petal`.

*Once again, your solution shouldn't use a for-loop.*
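
Comparing a column against a scalar is vectorized, so the result is a boolean `Series` that can be assigned as a new column without a loop. A minimal sketch on made-up data (`toy`, `weight`, and `is_heavy` are hypothetical names):

```python
import pandas as pd

toy = pd.DataFrame({"weight": [1.0, 5.0, 3.0, 7.0]})

threshold = toy["weight"].mean()             # 4.0
toy["is_heavy"] = toy["weight"] > threshold  # element-wise comparison

print(toy["is_heavy"].tolist())  # [False, True, False, True]
print(toy["is_heavy"].sum())     # 2
```

Summing a boolean column counts its `True` values, which is a handy way to sanity-check the result.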

In [0]:
# YOUR CODE HERE

######################################### 
# Do not edit below this line
test_3(iris)
iris.head()

## `.loc`

Now, create a `DataFrame` called `odd_iris` that contains only the flowers with odd indices in `iris` and doesn't contain `sepal_length` or `sepal_width`. Try to do this in just one line of code, and don't reset the index.

*Hint: You may find `np.arange` useful here.*
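
One way to combine a row selection and a column selection is to pass both to `.loc` at once. The sketch below picks evenly spaced row labels and a subset of columns from a made-up frame; `toy` and its columns are hypothetical.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({"a": range(6), "b": range(10, 16), "c": list("uvwxyz")})

# Row labels 0, 2, 4 and only columns "b" and "c"; the original labels are kept.
subset = toy.loc[np.arange(0, 6, 2), ["b", "c"]]

print(subset.index.tolist())    # [0, 2, 4]
print(subset.columns.tolist())  # ['b', 'c']
```

Keep in mind that `np.arange(start, stop, step)` excludes `stop`, just like Python's built-in `range`.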

In [0]:
# YOUR CODE HERE

######################################### 
# Do not edit below this line
test_4(odd_iris)
odd_iris.head()

Create a new `DataFrame` called `big_odd_iris` that contains only the flowers from `odd_iris` with `big_petal=True`.
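
Filtering rows with a boolean column can be sketched as follows. The frame `toy` and the column `keep` are hypothetical stand-ins.

```python
import pandas as pd

toy = pd.DataFrame({"value": [1, 2, 3, 4], "keep": [True, False, True, False]})

# Indexing with a boolean Series keeps only the rows where it is True.
kept = toy[toy["keep"]]

print(kept.index.tolist())     # [0, 2]
print(kept["value"].tolist())  # [1, 3]
```

The same mask also works with `.loc`, as in `toy.loc[toy["keep"]]`.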

In [0]:
# YOUR CODE HERE

######################################### 
# Do not edit below this line
test_5(big_odd_iris)
big_odd_iris.head()

In one line, remove the `big_petal` column and all rows with `petal_length <= 0.4` from `big_odd_iris`. 
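
Row filtering and column removal can be chained into a single expression. This is only a sketch on made-up data (`toy`, `x`, `y`, and `flag` are hypothetical), and other formulations, such as selecting the wanted columns inside `.loc`, would work as well.

```python
import pandas as pd

toy = pd.DataFrame({"x": [0.1, 0.5, 0.9], "y": [1, 2, 3], "flag": [True, True, False]})

# Keep rows with x > 0.4, then drop the "flag" column, in one expression.
trimmed = toy.loc[toy["x"] > 0.4].drop(columns="flag")

print(trimmed.columns.tolist())  # ['x', 'y']
print(trimmed.index.tolist())    # [1, 2]
```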

In [0]:
# YOUR CODE HERE

######################################### 
# Do not edit below this line
test_6(big_odd_iris)
big_odd_iris.head()

Store the 15th, 18th, and 21st rows from the new version of `big_odd_iris` in a new `DataFrame` named `odd_subset` in one line of code. 

*Hint: Use `.iloc` instead of `.loc`. Check out the [documentation here](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iloc.html).*
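
When the index labels no longer match row positions, `.iloc` with a list of integers selects rows by position. A minimal sketch on a made-up frame with a non-contiguous index (`toy` and `v` are hypothetical names):

```python
import pandas as pd

toy = pd.DataFrame({"v": [10, 20, 30, 40, 50]}, index=[3, 7, 11, 15, 19])

# Positions are zero-based: 0, 2, 4 are the 1st, 3rd, and 5th rows.
picked = toy.iloc[[0, 2, 4]]

print(picked.index.tolist())  # [3, 11, 19]
print(picked["v"].tolist())   # [10, 30, 50]
```

The selected rows keep their original labels, so the resulting index reflects where they came from rather than their position.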

In [0]:
# YOUR CODE HERE

######################################### 
# Do not edit below this line
test_7(odd_subset)
odd_subset