In [1]:
import pandas as pd

The data in the CSV file was extracted from a [spreadsheet](https://collegecost.ed.gov/wwwroot/documents/CATClists2014.xlsx) made available by the [U.S. Department of Education College Affordability and Transparency Center](https://collegecost.ed.gov/).
It contains 2014-2015 tuition and fees for universities and colleges in the USA. The data is loaded into the data frame `data`, indexed by the name of the institution.

In [4]:
data = pd.read_csv("data/college_tuition.csv").set_index("Name of institution")
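The effect of chaining `set_index` onto `read_csv` can be seen on a tiny in-memory example. This is only a sketch: the three rows are copied from the output shown in this notebook, the column list is shortened for illustration, and the names `csv_text`, `sample` and `same` are hypothetical.

```python
import io

import pandas as pd

# Three rows taken from this notebook's printed output, with fewer columns.
csv_text = """Name of institution,State,2014-15 Tuition and fees
College of William and Mary,VA,17656.0
Colorado School of Mines,CO,16918.0
Future-Tech Institute,FL,6279.0
"""

# Read the CSV, then promote one column to the row index.
sample = pd.read_csv(io.StringIO(csv_text)).set_index("Name of institution")

# Passing index_col to read_csv is an alternative that sets the index while reading.
same = pd.read_csv(io.StringIO(csv_text), index_col="Name of institution")

print(sample)
```

With the institution name as the index, rows can later be looked up by name with `.loc`.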

In [10]:
data.tail(5)

Name of institution,Sector,Sector name,UnitID,OPEID,State,2014-15 Tuition and fees,List A: High tuition and fee indicator,List E: Low tuition and fee indicator
Industrial Technical College,9,"Less than 2-year, private for-profit",444671,3781400,PR,6480.0,0,1
American Educational College,9,"Less than 2-year, private for-profit",241146,2303800,PR,6422.0,0,1
Future-Tech Institute,9,"Less than 2-year, private for-profit",459310,4116400,FL,6279.0,0,1
Rosslyn Training Academy of Cosmetology,9,"Less than 2-year, private for-profit",446516,3638300,PR,6139.0,0,1
InterAmerican Technical Institute,9,"Less than 2-year, private for-profit",483869,4223400,FL,2550.0,0,1


In [3]:
data.shape

(4140, 8)

In [11]:
data.dtypes

Sector                                      int64
Sector name                                object
UnitID                                      int64
OPEID                                       int64
State                                      object
2014-15 Tuition and fees                  float64
List A: High tuition and fee indicator      int64
List E: Low tuition and fee indicator       int64
dtype: object

# Indexing and selection with Series

First, we focus on one column of the data frame and extract it into the series `tuition`.

In [12]:
tuition = data["2014-15 Tuition and fees"]
tuition

Name of institution
University of Pittsburgh-Pittsburgh Campus    17772.0
College of William and Mary                   17656.0
Pennsylvania State University-Main Campus     17502.0
Colorado School of Mines                      16918.0
University of New Hampshire-Main Campus       16552.0
                                               ...   
Industrial Technical College                   6480.0
American Educational College                   6422.0
Future-Tech Institute                          6279.0
Rosslyn Training Academy of Cosmetology        6139.0
InterAmerican Technical Institute              2550.0
Name: 2014-15 Tuition and fees, Length: 4140, dtype: float64

- Select the first five entries in the series.
- Select the last five entries in the series.
- Select every tenth entry (10% sampling).
- What is the tuition of the `'University of Georgia'`?
- Select the subseries containing the tuition fees of `'Harvard University'` and `'Massachusetts Institute of Technology'`.
- Select the subseries containing tuitions greater than `50000`.
- Select the subseries containing tuitions greater than `40000` and at most `50000`.
- (Challenge) Select the tuition of institutes of technology, that is, colleges whose names end with `Institute of Technology`.
- (Challenge) What is the cheapest college in `Georgia`? **Hint:** you can use the method `sort_values` to sort the series.
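The selection techniques these exercises call for can be sketched on a small series. The ten values below are the rows printed earlier in this notebook; the institution names and thresholds are chosen for illustration and differ from the ones in the exercises, and all variable names other than `tuition` are hypothetical.

```python
import pandas as pd

# Toy series built from the ten rows printed above (not the full data set).
tuition = pd.Series(
    {
        "University of Pittsburgh-Pittsburgh Campus": 17772.0,
        "College of William and Mary": 17656.0,
        "Pennsylvania State University-Main Campus": 17502.0,
        "Colorado School of Mines": 16918.0,
        "University of New Hampshire-Main Campus": 16552.0,
        "Industrial Technical College": 6480.0,
        "American Educational College": 6422.0,
        "Future-Tech Institute": 6279.0,
        "Rosslyn Training Academy of Cosmetology": 6139.0,
        "InterAmerican Technical Institute": 2550.0,
    },
    name="2014-15 Tuition and fees",
)
tuition.index.name = "Name of institution"

# Positional selection with iloc: slices work as on Python lists.
first_three = tuition.iloc[:3]
last_three = tuition.iloc[-3:]
every_other = tuition.iloc[::2]  # a step of 2 keeps every second entry

# Label-based selection with loc: one label gives a scalar, a list gives a Series.
one_value = tuition.loc["Colorado School of Mines"]
two_values = tuition.loc[["Colorado School of Mines", "Future-Tech Institute"]]

# Boolean masks: combine conditions with & and wrap each one in parentheses.
expensive = tuition[tuition > 17000]
mid_range = tuition[(tuition > 6000) & (tuition <= 6500)]

# String methods on the index produce a mask based on the institution name.
institutes = tuition[tuition.index.str.endswith("Institute")]

# sort_values sorts ascending by default, so the first label is the cheapest.
cheapest = tuition.sort_values().index[0]
```

The same patterns apply to the full `tuition` series after swapping in the names and thresholds from the exercises.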

# Indexing and selection with DataFrame

In [20]:
data.head()

Name of institution,Sector,Sector name,UnitID,OPEID,State,2014-15 Tuition and fees,List A: High tuition and fee indicator,List E: Low tuition and fee indicator
University of Pittsburgh-Pittsburgh Campus,1,"4-year, public",215293,337900,PA,17772.0,1,0
College of William and Mary,1,"4-year, public",231624,370500,VA,17656.0,1,0
Pennsylvania State University-Main Campus,1,"4-year, public",214777,332900,PA,17502.0,1,0
Colorado School of Mines,1,"4-year, public",126775,134800,CO,16918.0,1,0
University of New Hampshire-Main Campus,1,"4-year, public",183044,258900,NH,16552.0,1,0


- Extract `State` and `Sector name` into a new data frame.
- Extract the row pertaining to the `'University of Georgia'`.
- What is the data type of the extracted row?
- Extract the two rows pertaining to `'Harvard University'` and `'Massachusetts Institute of Technology'`.
- What is the data type of the extracted rows?
- Extract the `State` of `'Harvard University'` and `'Massachusetts Institute of Technology'`.
- What is the data type of what you extracted?
- What are the tuition fees for institutions in `GA`?
- What is the average tuition fee for institutions in `GA`?
- Extract the last two columns
- Extract the first five rows and last two columns
- Sample 10% of the data using slicing
- (Challenge) In which state is the institution with the highest tuition fee located?
- (Challenge) What state has the highest average tuition fee?
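A minimal sketch of the data frame techniques behind these exercises follows. It uses seven of the rows printed in this notebook with a reduced set of columns, so the results are not the answers for the full data set; the state `PA`, the institution names, and every variable name other than `data` are chosen only for illustration.

```python
import pandas as pd

fee = "2014-15 Tuition and fees"
columns = [
    "Name of institution", "Sector name", "State", fee,
    "List A: High tuition and fee indicator",
    "List E: Low tuition and fee indicator",
]
# Seven rows copied from the output above, with some columns left out.
rows = [
    ("University of Pittsburgh-Pittsburgh Campus", "4-year, public", "PA", 17772.0, 1, 0),
    ("College of William and Mary", "4-year, public", "VA", 17656.0, 1, 0),
    ("Pennsylvania State University-Main Campus", "4-year, public", "PA", 17502.0, 1, 0),
    ("Colorado School of Mines", "4-year, public", "CO", 16918.0, 1, 0),
    ("Industrial Technical College", "Less than 2-year, private for-profit", "PR", 6480.0, 0, 1),
    ("Future-Tech Institute", "Less than 2-year, private for-profit", "FL", 6279.0, 0, 1),
    ("InterAmerican Technical Institute", "Less than 2-year, private for-profit", "FL", 2550.0, 0, 1),
]
data = pd.DataFrame(rows, columns=columns).set_index("Name of institution")

# A list of column names returns a DataFrame.
two_columns = data[["State", "Sector name"]]

# loc with one label returns a Series; with a list of labels, a DataFrame.
one_row = data.loc["Colorado School of Mines"]
two_rows = data.loc[["Colorado School of Mines", "Future-Tech Institute"]]

# Row labels plus a single column name returns a Series.
states = data.loc[["Colorado School of Mines", "Future-Tech Institute"], "State"]

# A boolean mask selects rows; the second argument picks the column.
pa_fees = data.loc[data["State"] == "PA", fee]
pa_mean = pa_fees.mean()

# iloc slices by position: rows first, then columns.
last_two_columns = data.iloc[:, -2:]
corner = data.iloc[:3, -2:]
every_other_row = data.iloc[::2]

# idxmax returns the index label of the maximum value.
top_state = data.loc[data[fee].idxmax(), "State"]
best_average_state = data.groupby("State")[fee].mean().idxmax()
```

On this toy subset the institution with the highest fee is in `PA`, while `VA` has the highest average, which shows why the last two challenges need different approaches.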