# Introduction to pandas

In this notebook, you'll get familiar with the basics of reading in and getting acquainted with data using the `pandas` library.

### Import the `pandas` library, aliased as `pd`.

In [1]:
import pandas as pd

### Let's explore these pandas methods, attributes, and accessors
 
**Methods of Inspecting** 
 * .head()
 * .tail()
 * .shape
 * .info()

**Methods of Modifying**
 * .drop()
 * renaming columns
 
**Methods of Summarizing**
 * .unique()
 * .nunique()
 * .value_counts()

**Methods of Slicing and Filtering**
 * .loc[]

## Step 1: Reading in Data and Initial Inspection

In [2]:
art = pd.read_csv('../data/public_art.csv')

To inspect a portion of the dataframe, you can use `.head()` (to see the first few rows) or `.tail()` (to see the last few rows).

In [3]:
art.head(2)

Unnamed: 0,Title,Last Name,First Name,Location,Medium,Type,Description,Latitude,Longitude,Mapped Location
0,[Cross Country Runners],Frost,Miley,"4001 Harding Rd., Nashville TN",Bronze,Sculpture,,36.12856,-86.8366,"(36.12856, -86.8366)"
1,[Fourth and Commerce Sculpture],Walker,Lin,"333 Commerce Street, Nashville TN",,Sculpture,,36.16234,-86.77774,"(36.16234, -86.77774)"


In [4]:
art.tail(2)

Unnamed: 0,Title,Last Name,First Name,Location,Medium,Type,Description,Latitude,Longitude,Mapped Location
130,Women Suffrage Memorial,LeQuire,Alan,"600 Charlotte Avenue, Nashville TN",Bronze sculpture,Sculpture,,36.16527,-86.78382,"(36.16527, -86.78382)"
131,Youth Opportunity Center-STARS Nashville - Pea...,Rudloff,Andee,1704 Charlotte Ave.,House paint on vinyl,Mural,,36.15896,-86.799,"(36.15896, -86.799)"


To see the number of rows and columns, you can access the `.shape` attribute. This shows (number of rows, number of columns).

In [None]:
art.shape
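Because `.shape` is an attribute rather than a method, it is written without parentheses, and the tuple it returns can be unpacked into two variables. Here is a minimal sketch on a small made-up DataFrame (`demo` and its values are hypothetical stand-ins, not rows from the art dataset):

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataset
demo = pd.DataFrame({
    'title': ['A', 'B', 'C'],
    'type': ['Mural', 'Sculpture', 'Mural'],
})

# .shape is a tuple: (number of rows, number of columns)
n_rows, n_cols = demo.shape
print(n_rows, n_cols)
```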

To get more information about what is contained in each column, you can use `.info()`.

In [None]:
art.info()

**What do you notice?**

There are quite a few missing Descriptions, 10 missing First Names, a few missing Mediums, and one missing Location.

You may notice that most of the columns have the `object` datatype. This is the datatype that `pandas` uses for text data.

The `float64` datatype is a numeric datatype that can handle decimal values.
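If you would rather see the missing values counted directly instead of reading them off the non-null counts, one option is to chain `.isna()` and `.sum()`. The sketch below uses a small made-up DataFrame (`demo` and its values are hypothetical); the exact label pandas reports for text columns can vary with the pandas version, so only the numeric column's datatype is commented on here.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing latitude
demo = pd.DataFrame({
    'title': ['A', 'B'],
    'lat': [36.1, np.nan],
})

# Datatype of each column; 'lat' holds decimals, so it is float64
print(demo.dtypes)

# Number of missing values in each column
missing = demo.isna().sum()
print(missing)
```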

## Step 2: Making Modifications

Since the Mapped Location information is already contained in the Latitude and Longitude columns, you really don't need to store it twice. You can use the `.drop()` method to get rid of that column.

In [None]:
art.drop(columns='Mapped Location')

In [None]:
art.head(2)

What happened? The `.drop()` method returns a new dataframe instead of changing `art` itself, and we failed to save that result. We need to assign the result back to the `art` dataframe.

In [None]:
art = art.drop(columns='Mapped Location')

In [None]:
art.head(2)
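The same behavior can be seen in isolation on a small made-up DataFrame (`demo` and the `extra` column are hypothetical). This sketch keeps the result of `.drop()` in a separate variable so that both versions can be compared:

```python
import pandas as pd

# Hypothetical toy data with a column we want to remove
demo = pd.DataFrame({'title': ['A', 'B'], 'extra': [1, 2]})

# drop() returns a new DataFrame; demo itself is left unchanged
dropped = demo.drop(columns='extra')

print(list(demo.columns))     # still includes 'extra'
print(list(dropped.columns))  # 'extra' is gone
```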

Let's say you want to rename the columns of the art dataframe. One way to do this is to assign a new list of values to the `columns` attribute.

In [None]:
art.columns = ['title', 'last_name', 'first_name', 'location', 'medium', 'type', 'description', 'lat', 'lng']

In [None]:
art.head(2)
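Assigning to `columns` replaces every name at once, so the list has to match the number and order of the columns. If you only want to rename some of them, another option is the `.rename()` method with a dictionary mapping old names to new ones. A minimal sketch on a made-up two-column DataFrame (`demo` is hypothetical):

```python
import pandas as pd

# Hypothetical toy data reusing two of the original column names
demo = pd.DataFrame({'Last Name': ['Frost'], 'First Name': ['Miley']})

# Rename only one column; columns not in the dictionary keep their names.
# Like drop(), rename() returns a new DataFrame, so assign the result back.
demo = demo.rename(columns={'Last Name': 'last_name'})
print(list(demo.columns))
```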

## Step 3: Exploring and Slicing

What are the different types of artwork in this dataset?

In [None]:
art['type'].unique()

If you only care about the _number_ of unique values in a column, you can use `.nunique()`.

For example, if you want to know the number of artist last names:

In [None]:
art['last_name'].nunique()
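One detail worth knowing when a column has missing values: `.unique()` includes the missing value in its result, while `.nunique()` leaves it out unless you pass `dropna=False`. The sketch below illustrates this on a made-up Series (`s` and its values are hypothetical):

```python
import pandas as pd

# Hypothetical toy column with one missing value
s = pd.Series(['Mural', 'Sculpture', 'Mural', None])

# unique() returns the distinct values, including the missing one
n_distinct_with_missing = len(s.unique())

# nunique() skips missing values by default
n_distinct = s.nunique()

# dropna=False counts the missing value as its own category
n_distinct_keep_missing = s.nunique(dropna=False)

print(n_distinct_with_missing, n_distinct, n_distinct_keep_missing)
```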

Which is the most popular type?

In [None]:
art['type'].value_counts()
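By default, `.value_counts()` sorts its result from most to least frequent, so the most common value is the first label in the index. Passing `normalize=True` gives proportions instead of counts. A minimal sketch on a made-up Series (`types` and its values are hypothetical):

```python
import pandas as pd

# Hypothetical toy column of artwork types
types = pd.Series(['Mural', 'Sculpture', 'Mural', 'Mosaic'])

# Counts, sorted from most to least frequent
counts = types.value_counts()
most_common = counts.index[0]

# Proportions instead of raw counts
shares = types.value_counts(normalize=True)

print(most_common, counts[most_common], shares[most_common])
```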

What if you want to see all of the Murals? You can slice a DataFrame using `.loc` and passing in a conditional expression.

In [None]:
art.loc[art['type'] == 'Mural']

If you want to do further work or exploration with the sliced dataframe, you need to save it to a new variable.

In [None]:
murals = art.loc[art['type'] == 'Mural']
murals.shape
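It can help to look at the conditional expression on its own. The comparison produces a Series of `True`/`False` values, one per row, and `.loc` keeps the rows where the value is `True`. A minimal sketch on a made-up DataFrame (`demo`, `mask`, and `subset` are hypothetical names):

```python
import pandas as pd

# Hypothetical toy data
demo = pd.DataFrame({
    'title': ['A', 'B', 'C'],
    'type': ['Mural', 'Sculpture', 'Mural'],
})

# The comparison is evaluated row by row
mask = demo['type'] == 'Mural'
print(mask.tolist())  # [True, False, True]

# .loc keeps only the rows where the mask is True,
# and those rows keep their original index labels
subset = demo.loc[mask]
print(subset.index.tolist())  # [0, 2]
```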

Who is the most prolific mural painter in Nashville?

In [None]:
murals['last_name'].value_counts()

Let's see all of the murals that Cooper painted.

In [None]:
murals.loc[murals['last_name'] == 'Cooper']

In [None]:
murals['last_name'].value_counts()

Take another look at the murals dataframe and notice that Sterling Goller-Brown and Ian Lawrence collaborated on multiple murals, but their names are recorded in more than one way in the `last_name` column. What if you want to slice down and find all of these rows?

In [None]:
goller_lawrence = ['Sterling Goller-Brown.  Ian Lawrence', 'Sterling Goller-Brown and Ian Lawrence, co-creators']

In [None]:
murals.loc[murals['last_name'].isin(goller_lawrence)]

If you are only interested in certain columns, you can specify those with a list.

In [None]:
murals.loc[murals['last_name'].isin(goller_lawrence), ['title', 'location', 'medium']]
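The general pattern here is `.loc[row_condition, list_of_columns]`: the part before the comma selects rows and the part after it selects columns. A minimal sketch on a made-up DataFrame (`demo`, `names`, and the `medium` values are hypothetical):

```python
import pandas as pd

# Hypothetical toy data
demo = pd.DataFrame({
    'title': ['A', 'B', 'C'],
    'last_name': ['Cooper', 'Rudloff', 'LeQuire'],
    'medium': ['Paint', 'Vinyl', 'Bronze'],
})

# Values to match in the last_name column
names = ['Cooper', 'LeQuire']

# Rows: last_name is in the list. Columns: only title and medium.
result = demo.loc[demo['last_name'].isin(names), ['title', 'medium']]
print(result)
```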

Finally, you can negate a condition by adding a tilde `~` before that condition. So if you want to find all murals not painted by Goller-Brown and Lawrence:

In [None]:
murals.loc[~murals['last_name'].isin(goller_lawrence)]
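A condition and its negation split the rows into two groups that do not overlap, which gives a quick sanity check: their sizes should add up to the total number of rows. A minimal sketch on a made-up DataFrame (`demo`, `mask`, `kept`, and `rest` are hypothetical names):

```python
import pandas as pd

# Hypothetical toy data
demo = pd.DataFrame({
    'last_name': ['Cooper', 'Rudloff', 'LeQuire', 'Cooper'],
})

# True where the last name is in the list
mask = demo['last_name'].isin(['Cooper'])

kept = demo.loc[mask]    # rows matching the condition
rest = demo.loc[~mask]   # rows NOT matching the condition

print(len(kept), len(rest), len(demo))
```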