# More Pandas

![more_pandas](https://media.giphy.com/media/H0Qi5W2KzU5UI/giphy.gif)

### Scenario
You have decided to start your own animal shelter, and you want a clearer idea of what that will entail before you begin planning. In this lecture, we'll look at a real data set collected by the Austin Animal Center. The code below returns the 1,000 most recent animal outcomes. We will use our pandas skills from the last lecture, and learn some new ones, to explore these data further.




#### Our goals in this notebook are to be able to:

- Apply and use `.map()`, `.apply()`, and `.applymap()` from the Pandas library
- Introduce lambda functions and use them in coordination with the above functions
- Explain what a groupby object is and split a DataFrame using `.groupby()`


#### Getting started

Let's take a moment to download and examine the [Austin Animal Center data set](https://data.austintexas.gov/Health-and-Community-Services/Austin-Animal-Center-Outcomes/9t4d-g238/data).

Here is a first look at the data:

In [7]:
import numpy as np
import pandas as pd
import requests

%load_ext autoreload
%autoreload 2

from src.student_caller import one_random_student, three_random_students
from src.student_list import student_list


In [8]:
url = 'https://data.austintexas.gov/resource/9t4d-g238.json'
response = requests.get(url)
animals = pd.DataFrame(response.json())
animals.head()

|   | animal_id | name | datetime | monthyear | date_of_birth | outcome_type | animal_type | sex_upon_outcome | age_upon_outcome | breed | color | outcome_subtype |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | A821019 | Spot | 2020-12-08T12:37:00.000 | 2020-12-08T12:37:00.000 | 2017-04-03T00:00:00.000 | Adoption | Dog | Neutered Male | 3 years | Pit Bull | White/Black | NaN |
| 1 | A824438 | \*Rose | 2020-12-08T12:27:00.000 | 2020-12-08T12:27:00.000 | 2011-11-27T00:00:00.000 | Adoption | Dog | Spayed Female | 9 years | German Shepherd | Tan/Black | NaN |
| 2 | A825587 | \*Ludwig | 2020-12-08T12:22:00.000 | 2020-12-08T12:22:00.000 | 2011-11-06T00:00:00.000 | Adoption | Cat | Neutered Male | 9 years | Domestic Medium Hair | Cream Tabby | Foster |
| 3 | A819626 | NaN | 2020-12-08T11:53:00.000 | 2020-12-08T11:53:00.000 | 2020-06-25T00:00:00.000 | Adoption | Cat | Neutered Male | 5 months | Domestic Shorthair | White/Black | Foster |
| 4 | A819624 | NaN | 2020-12-08T11:52:00.000 | 2020-12-08T11:52:00.000 | 2020-06-25T00:00:00.000 | Adoption | Cat | Neutered Male | 5 months | Domestic Shorthair | Black | Foster |
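
To see how a JSON response turns into a DataFrame, here is a minimal sketch that needs no network access. The `records` list is hypothetical and only mimics the list of dictionaries that `response.json()` is assumed to return; the IDs and names are made up.

```python
import pandas as pd

# Hypothetical records shaped like a list of JSON objects; not real shelter data.
records = [
    {"animal_id": "A0000001", "name": "Spot", "animal_type": "Dog"},
    {"animal_id": "A0000002", "animal_type": "Cat"},  # this record has no "name" key
]

# Each dictionary becomes a row and each key becomes a column.
frame = pd.DataFrame(records)

# A key that is absent from a record shows up as a missing value in that row.
print(frame)
print(frame["name"].isna())
```

If the API leaves out empty fields, this behavior would explain why some columns in the real data contain missing values.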


In [9]:
animals.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 12 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   animal_id         1000 non-null   object
 1   name              655 non-null    object
 2   datetime          1000 non-null   object
 3   monthyear         1000 non-null   object
 4   date_of_birth     1000 non-null   object
 5   outcome_type      900 non-null    object
 6   animal_type       1000 non-null   object
 7   sex_upon_outcome  1000 non-null   object
 8   age_upon_outcome  1000 non-null   object
 9   breed             1000 non-null   object
 10  color             1000 non-null   object
 11  outcome_subtype   482 non-null    object
dtypes: object(12)
memory usage: 93.9+ KB


One way to become familiar with your data is to start asking questions. In your EDA notebooks, **markdown** will be especially helpful for tracking these questions and your methods for answering them.

For example, a simple first question we might ask, after being presented with the above dataset, would be:

## What is the most commonly adopted animal type in the dataset?

We can then begin thinking about what parts of the DataFrame we need to answer the question.

    What features do we need?
     - 
    What type of logic and calculation do we perform?
     -  
    What type of visualization would help us answer the question?
     -

In [20]:
# Your code here
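
One possible approach, sketched on a small hand-made DataFrame rather than the real data: filter to adoptions, then count animal types. The rows in `sample` are hypothetical; only the column names `outcome_type` and `animal_type` come from the `.info()` output above.

```python
import pandas as pd

# Hypothetical rows that mimic two of the real columns; not actual shelter data.
sample = pd.DataFrame({
    "outcome_type": ["Adoption", "Adoption", "Transfer", "Adoption", "Return to Owner", "Adoption"],
    "animal_type": ["Dog", "Cat", "Cat", "Dog", "Dog", "Other"],
})

# Keep only the adoptions, then count how often each animal type appears.
adopted = sample[sample["outcome_type"] == "Adoption"]
adoption_counts = adopted["animal_type"].value_counts()

# idxmax() returns the index label that has the largest count.
most_adopted = adoption_counts.idxmax()

print(adoption_counts)
print("Most commonly adopted:", most_adopted)
```

On the real data, the same steps should work with `animals` in place of `sample`, and `adoption_counts.plot(kind="bar")` is one way to draw the bar chart (it requires matplotlib).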


Questions lead to other questions. For the example above, the visualization raises a new question: which `Other` animals are being adopted?

To find out, we need to know where the specific kind of animal is recorded for rows whose animal type is `Other`.
    
    What features do we need to find the most commonly adopted kind of animal within the Other category?
        - 

In [6]:
# Your code here
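
A sketch of one way to answer this, under the assumption that the kind of animal for `Other` rows is recorded in the `breed` column. The rows and breed values below are hypothetical stand-ins, not real records.

```python
import pandas as pd

# Hypothetical rows; the breed values are illustrative only.
sample = pd.DataFrame({
    "outcome_type": ["Adoption", "Adoption", "Adoption", "Transfer", "Adoption"],
    "animal_type": ["Other", "Other", "Dog", "Other", "Other"],
    "breed": ["Rabbit", "Guinea Pig", "Pit Bull", "Bat", "Rabbit"],
})

# Build one boolean mask per condition, then combine them with &.
is_other = sample["animal_type"] == "Other"
is_adopted = sample["outcome_type"] == "Adoption"

# Select the breed column for matching rows and count each value.
other_adopted = sample.loc[is_other & is_adopted, "breed"].value_counts()

print(other_adopted)
```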

![hive mind](https://media.giphy.com/media/l0MYttFGk98Y4e4h2/giphy.gif)


What kinds of questions can we ask of these data, and what kinds of information can we get back?
Start filling in the [group question doc](https://docs.google.com/document/d/1Oq9cHGbKxKzvO9Ep_JAxrRWrLFlTEn0VpJVqEGXNdUQ/edit) together. You can either add an individual question or contribute to filling out another student's question.

# Quick Exploration

In [None]:
# Use .info() to check for missing values, data types, and shape

In [None]:
# Use describe to gain a bit more detail about certain features. 
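
The two prompts above could look like the following sketch, run on a small hypothetical DataFrame (`sample`) so the output stays easy to read.

```python
import pandas as pd

# Hypothetical rows; None marks a missing name.
sample = pd.DataFrame({
    "animal_id": ["A1", "A2", "A3", "A4"],
    "name": ["Spot", None, "Rose", None],
    "animal_type": ["Dog", "Cat", "Dog", "Cat"],
})

# info() prints the row count, column dtypes, and non-null counts; it returns None.
sample.info()

# For text columns, describe() reports count, unique, top, and freq.
# include="all" asks pandas to summarize every column regardless of dtype.
summary = sample.describe(include="all")
print(summary)
```

The `count` row of `summary` excludes missing values, so comparing it with the number of rows is a quick way to spot incomplete columns.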

In [None]:
# Use value counts to check a categorical feature's distribution

In [None]:
# Use isna() for a more legible summary (than .info()) of how missing values are distributed across the dataset.
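
As a sketch of the last two prompts, the example below counts category values and missing values on a hypothetical DataFrame (`sample`); the rows are made up for illustration.

```python
import pandas as pd

# Hypothetical rows; None marks a missing value.
sample = pd.DataFrame({
    "name": ["Spot", None, "Rose", None],
    "animal_type": ["Dog", "Cat", "Dog", "Cat"],
    "outcome_subtype": [None, "Foster", None, None],
})

# normalize=True returns proportions instead of raw counts.
type_share = sample["animal_type"].value_counts(normalize=True)

# isna() gives a True/False frame; sum() counts the True values per column.
missing_per_column = sample.isna().sum()

# The mean of a boolean column is the fraction of True values.
missing_share = sample.isna().mean()

print(type_share)
print(missing_per_column)
print(missing_share)
```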

Use `fillna` to replace missing animal names with 'unnamed'.

In [None]:
three_random_students(student_list)

In [17]:
#__SOLUTION__
animals['name'] = animals['name'].fillna('unnamed')

In [18]:
# Fill every remaining missing value (outcome_type and outcome_subtype) with one placeholder
animals.fillna('no_type_or_subtype', inplace=True)

In [19]:
animals.isna().sum()

animal_id           0
name                0
datetime            0
monthyear           0
date_of_birth       0
outcome_type        0
animal_type         0
sex_upon_outcome    0
age_upon_outcome    0
breed               0
color               0
outcome_subtype     0
dtype: int64
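
The cells above fill `name` first and then give every remaining missing value the same label. An alternative, sketched here on hypothetical rows, is to pass `fillna` a dictionary so each column gets its own placeholder. The labels `no_type` and `no_subtype` are made up for this example.

```python
import pandas as pd

# Hypothetical rows; None marks a missing value.
sample = pd.DataFrame({
    "name": ["Spot", None, "Rose"],
    "outcome_type": ["Adoption", None, "Transfer"],
    "outcome_subtype": [None, "Foster", None],
})

# Map each column name to the placeholder that should replace its missing values.
fill_values = {
    "name": "unnamed",
    "outcome_type": "no_type",
    "outcome_subtype": "no_subtype",
}

# Without inplace=True, fillna returns a new DataFrame and leaves `sample` unchanged.
filled = sample.fillna(value=fill_values)

print(filled)
print(filled.isna().sum())
```

Separate labels keep it clear which column a placeholder came from, which can help when these columns are grouped or counted later.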