# Using Python Libraries

## First Up: Numeric Python

![numpy](https://raw.githubusercontent.com/donnemartin/data-science-ipython-notebooks/master/images/numpy.png)

[NumPy](https://www.numpy.org/) is the fundamental package for scientific computing with Python. Many other data science packages, especially those that work with matrices, rely on it for its speed and utility.

For numpy, the standard alias is `np`.

In [1]:
# Import numpy
import numpy as np

### NumPy Arrays

Python lists and NumPy arrays can both hold numbers. However, Python lists have limited functionality for mathematical operations. NumPy arrays make it easy and fast to do math with a collection of numbers.

In [2]:
# Explore a numpy array
x = np.array([1, 2, 3])
print(x)
print(type(x))

[1 2 3]
<class 'numpy.ndarray'>


Let's make a list using base Python and an array using NumPy, and see how they behave differently:

In [3]:
# Create a list in base Python of 3 integers
numbers_list = [2, 4, 6]
# Create a numpy array containing the same 3 integers
numbers_array = np.array([2, 4, 6])

### Arithmetic Operations

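Arithmetic operators (e.g. `+`, `-`, `*` and `/`) work on arrays the way they do in math. Each operation is applied "element-wise", meaning to every number in the array. Lists behave differently: `*` repeats the list, `+` only joins two lists together, and `-` and `/` raise errors.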

In [4]:
# Multiply the array by 3
numbers_array * 3

array([ 6, 12, 18])

In [5]:
# Multiply the list by 3
numbers_list * 3

[2, 4, 6, 2, 4, 6, 2, 4, 6]

In [6]:
# Add 20 to the array
numbers_array + 20

array([22, 24, 26])

In [7]:
# Add 20 to the list
numbers_list + 20

TypeError: can only concatenate list (not "int") to list
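To see the contrast more directly, here is a small sketch that applies operators between two arrays and between two lists. The names `a`, `b`, `list_a` and `list_b` are just for illustration. With lists, element-wise math needs an explicit loop or comprehension.

```python
import numpy as np

a = np.array([2, 4, 6])
b = np.array([1, 2, 3])

# Array operators pair up elements by position
sum_ab = a + b        # values 3, 6, 9
product_ab = a * b    # values 2, 8, 18
quotient_ab = a / b   # values 2.0, 2.0, 2.0 (true division gives floats)

list_a = [2, 4, 6]
list_b = [1, 2, 3]

# With lists, + concatenates instead of adding
joined = list_a + list_b                            # [2, 4, 6, 1, 2, 3]
# Element-wise addition with lists needs a comprehension
pairwise = [x + y for x, y in zip(list_a, list_b)]  # [3, 6, 9]

print(sum_ab, product_ab, quotient_ab)
print(joined, pairwise)
```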

### Speed

The code below compares the speed of one operation, adding two 1,000-element sequences element-wise, first with lists and then with arrays.

In [8]:
# Setting the size of our iterables
size_of_vec = 1000

# Creating two lists of that size
X = list(range(size_of_vec))
Y = list(range(size_of_vec))

In [9]:
# Timing how long it takes to add each element in the two lists
# Complicated bit of code using a list comprehension
%timeit [X[i] + Y[i] for i in range(size_of_vec)]

92 µs ± 1.35 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)


In [10]:
# Now let's try with numpy arrays
X = np.arange(size_of_vec)  # same values as np.array(range(size_of_vec))
Y = np.arange(size_of_vec)

In [11]:
# Much simpler code, since it's easier to do element-wise math
%timeit X + Y

835 ns ± 16 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
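`%timeit` is an IPython/Jupyter magic, so it won't work in a plain `.py` script. Below is a rough sketch of an equivalent that uses the standard-library `timeit` module. It iterates with `zip` rather than indexing, and the function names `add_lists` and `add_arrays` are hypothetical. Exact timings will vary by machine and NumPy version.

```python
import timeit

import numpy as np

size_of_vec = 1000
X_list = list(range(size_of_vec))
Y_list = list(range(size_of_vec))
X_arr = np.arange(size_of_vec)
Y_arr = np.arange(size_of_vec)

def add_lists():
    # Pure-Python loop over paired elements
    return [x + y for x, y in zip(X_list, Y_list)]

def add_arrays():
    # Vectorized: the loop runs inside NumPy's compiled code
    return X_arr + Y_arr

# Both approaches produce the same numbers
assert add_lists() == add_arrays().tolist()

n_calls = 1000
# Run each function n_calls times, repeat that 5 times, and keep the fastest total
list_seconds = min(timeit.repeat(add_lists, number=n_calls, repeat=5))
array_seconds = min(timeit.repeat(add_arrays, number=n_calls, repeat=5))

print(f"lists:  {list_seconds / n_calls * 1e6:.2f} µs per call")
print(f"arrays: {array_seconds / n_calls * 1e6:.2f} µs per call")
```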


## Next Up: Importing, Reading and Manipulating Data with ACTUAL LITERAL PANDAS

![I have no idea what I'm doing panda](https://cdn-images-1.medium.com/max/1600/1*oBx032ncOwLmCFX3Epo3Zg.jpeg)

Just kidding - but Pandas is a great library for working with tabular (relational) data.

[Check out the documentation!](https://pandas.pydata.org/pandas-docs/stable/) (always a great idea)

Note that we didn't go into a lot of NumPy's functionality, but here's something cool - Pandas is built on top of NumPy! That means they work really well together, and that Pandas has some math functionality already built in.

If you'd like to read more about NumPy and Pandas, [here is an interesting blog post](https://cloudxlab.com/blog/numpy-pandas-introduction/) discussing them.

Let's dive into some data from the Austin Animal Shelter. 

Data source: [intakes data](https://data.austintexas.gov/Health-and-Community-Services/Austin-Animal-Center-Intakes/wter-evkm) and [outcomes data](https://data.austintexas.gov/Health-and-Community-Services/Austin-Animal-Center-Outcomes/9t4d-g238).

Today we'll be working with the intakes data, which I've already downloaded and included in the repository.

In [12]:
# Import
import pandas as pd

Before reading in the data, we need to know what format the data is in and where exactly the data can be found, so we can tell Pandas what to do.

In [13]:
# Where is our data?
!ls data/

Austin_Animal_Center_Intakes_030921.csv
Austin_Animal_Center_Outcomes_030921.csv


In [14]:
# Read in the comma-separated values (CSV) file as df
df = pd.read_csv('data/Austin_Animal_Center_Intakes_030921.csv', 
                 parse_dates=['DateTime'])

What options do we have when we read in a csv? Let's look at the documentation!

[Convenient link](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html)

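I happen to know that there is a column in the data named 'DateTime'. Passing `parse_dates=['DateTime']` to `read_csv` (as in the cell above) tells pandas to read that column in as datetime objects instead of plain strings. Run the cell below to check it out, then try re-reading the file without that argument to compare.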

In [15]:
df['DateTime']

0        2019-01-03 16:19:00
1        2015-07-05 12:59:00
2        2016-04-14 18:43:00
3        2013-10-21 07:59:00
4        2014-06-29 10:38:00
                 ...        
124217   2021-03-09 12:07:00
124218   2021-03-09 12:40:00
124219   2021-03-05 14:31:00
124220   2021-03-09 12:04:00
124221   2021-03-05 14:31:00
Name: DateTime, Length: 124222, dtype: datetime64[ns]
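To see what `parse_dates` changes, here is a minimal sketch that reads a tiny in-memory CSV twice, once without the argument and once with it. The two sample rows are copied from the output above, and `io.StringIO` stands in for the real file.

```python
import io

import pandas as pd

# A tiny stand-in for the intakes CSV (two rows copied from the output above)
csv_text = """Animal ID,DateTime
A786884,2019-01-03 16:19:00
A706918,2015-07-05 12:59:00
"""

raw = pd.read_csv(io.StringIO(csv_text))
parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=['DateTime'])

print(raw['DateTime'].dtype)     # plain strings, not datetimes
print(parsed['DateTime'].dtype)  # a datetime64 dtype
```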

### Initial Exploration of a Dataframe

Questions to ask yourself:

- How big is the data?
- Are there any empty cells? 
- What are the datatypes of the columns of data?

In [16]:
# What does this dataframe look like?
# Check out the first 3 rows
df.head(3)

,Animal ID,Name,DateTime,MonthYear,Found Location,Intake Type,Intake Condition,Animal Type,Sex upon Intake,Age upon Intake,Breed,Color
0,A786884,*Brock,2019-01-03 16:19:00,01/03/2019 04:19:00 PM,2501 Magin Meadow Dr in Austin (TX),Stray,Normal,Dog,Neutered Male,2 years,Beagle Mix,Tricolor
1,A706918,Belle,2015-07-05 12:59:00,07/05/2015 12:59:00 PM,9409 Bluegrass Dr in Austin (TX),Stray,Normal,Dog,Spayed Female,8 years,English Springer Spaniel,White/Liver
2,A724273,Runster,2016-04-14 18:43:00,04/14/2016 06:43:00 PM,2818 Palomino Trail in Austin (TX),Stray,Normal,Dog,Intact Male,11 months,Basenji Mix,Sable/White


In [17]:
df.tail()

,Animal ID,Name,DateTime,MonthYear,Found Location,Intake Type,Intake Condition,Animal Type,Sex upon Intake,Age upon Intake,Breed,Color
124217,A830428,,2021-03-09 12:07:00,03/09/2021 12:07:00 PM,501 Strichen Drive in Travis (TX),Wildlife,Sick,Other,Unknown,2 years,Skunk,Black/White
124218,A830411,,2021-03-09 12:40:00,03/09/2021 12:40:00 PM,12609 Dessau Rd in Austin (TX),Stray,Normal,Dog,Intact Male,1 year,Dachshund/Rat Terrier,Brown/White
124219,A830250,*Hansel,2021-03-05 14:31:00,03/05/2021 02:31:00 PM,Cesar Chavez And North Lamar in Austin (TX),Stray,Normal,Dog,Intact Male,4 years,German Shepherd,Brown/Black
124220,A830431,Chema,2021-03-09 12:04:00,03/09/2021 12:04:00 PM,Austin (TX),Owner Surrender,Normal,Dog,Unknown,3 years,Beagle/Chihuahua Shorthair,Black/Brown
124221,A830251,*Gretel,2021-03-05 14:31:00,03/05/2021 02:31:00 PM,Cesar Chavez And North Lamar in Austin (TX),Stray,Normal,Dog,Intact Female,2 years,German Shepherd,Brown/Black


In [18]:
# Check out the shape of the df
df.shape

(124222, 12)

In [19]:
# And then the size: the total number of cells (rows x columns)
df.size

1490664

In [20]:
# And then look at some info on the df
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 124222 entries, 0 to 124221
Data columns (total 12 columns):
 #   Column            Non-Null Count   Dtype         
---  ------            --------------   -----         
 0   Animal ID         124222 non-null  object        
 1   Name              85158 non-null   object        
 2   DateTime          124222 non-null  datetime64[ns]
 3   MonthYear         124222 non-null  object        
 4   Found Location    124222 non-null  object        
 5   Intake Type       124222 non-null  object        
 6   Intake Condition  124222 non-null  object        
 7   Animal Type       124222 non-null  object        
 8   Sex upon Intake   124221 non-null  object        
 9   Age upon Intake   124222 non-null  object        
 10  Breed             124222 non-null  object        
 11  Color             124222 non-null  object        
dtypes: datetime64[ns](1), object(11)
memory usage: 11.4+ MB


In [21]:
# Describe the columns
df.describe()



,Animal ID,Name,DateTime,MonthYear,Found Location,Intake Type,Intake Condition,Animal Type,Sex upon Intake,Age upon Intake,Breed,Color
count,124222,85158,124222,124222,124222,124222,124222,124222,124221,124222,124222,124222
unique,111013,19758,87743,87743,53737,6,10,5,5,52,2632,595
top,A721033,Max,2016-09-23 12:00:00,09/23/2016 12:00:00 PM,Austin (TX),Stray,Normal,Dog,Intact Male,1 year,Domestic Shorthair Mix,Black/White
freq,33,564,64,64,22889,86507,108026,70505,40349,21809,30985,13039
first,,,2013-10-01 07:51:00,,,,,,,,,
last,,,2021-03-09 12:50:00,,,,,,,,,


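**A note on `.describe()`:** this method behaves differently depending on the column types. Numeric columns get statistics like the mean and quartiles. Object (text) columns get `count`, `unique`, `top` (the most common value) and `freq` (how often `top` appears). We'll explore this more later.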
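As a quick preview, here is a sketch using a small hypothetical dataframe with one numeric column and one text column. On mixed data, `describe()` summarizes only the numeric columns by default. `include='all'` summarizes every column, and cells that don't apply to a column show `NaN`.

```python
import pandas as pd

# Hypothetical mini-dataframe: one numeric column, one text (object) column
mini = pd.DataFrame({
    'intakeyear': [2019, 2015, 2016, 2019],
    'animaltype': ['Dog', 'Dog', 'Cat', 'Dog'],
})

# With mixed types, describe() covers only the numeric columns by default
numeric_summary = mini.describe()
# include='all' adds the text column (count, unique, top, freq)
all_summary = mini.describe(include='all')

print(numeric_summary)
print(all_summary)
```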

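**And a question:** Some of the ways we worked with our dataframe required `()` and some did not. Why is that?

- Methods vs. attributes: a method is a function attached to the object, so you call it with parentheses (`df.head()`, `df.info()`). An attribute is a stored value, so you access it without them (`df.shape`, `df.size`).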
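Here is a short sketch of the difference on a hypothetical three-row dataframe. It also checks that `size` is rows times columns, and shows what happens if you forget the parentheses on a method.

```python
import pandas as pd

mini = pd.DataFrame({'name': ['*Brock', 'Belle', 'Runster'],
                     'intakeyear': [2019, 2015, 2016]})

# Attributes: stored properties, accessed without parentheses
print(mini.shape)    # (rows, columns)
print(mini.size)     # rows * columns
print(mini.columns)

# Methods: functions attached to the object, called with parentheses
print(mini.head(2))

# Forgetting the parentheses gives you the method itself, not its result
print(callable(mini.head))
```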


### Accessing Columns

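Use brackets and the exact column name to access a particular column. Single brackets (`df['Name']`) return a Series. Double brackets, i.e. a list of column names (`df[['Name']]`), return a DataFrame.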

In [22]:
type(df)

pandas.core.frame.DataFrame

In [23]:
type(df['Name'])

pandas.core.series.Series

In [24]:
df['Name'].head()

0     *Brock
1      Belle
2    Runster
3        NaN
4        Rio
Name: Name, dtype: object

In [25]:
type(df[['Name']])

pandas.core.frame.DataFrame

In [26]:
df[['Name']].head()

,Name
0,*Brock
1,Belle
2,Runster
3,
4,Rio


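You can also use `.` notation (`df.Name`). This only works if the column name is a valid Python identifier (no spaces or special characters) and doesn't clash with an existing DataFrame attribute or method.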

In [27]:
df.Name

0          *Brock
1           Belle
2         Runster
3             NaN
4             Rio
           ...   
124217        NaN
124218        NaN
124219    *Hansel
124220      Chema
124221    *Gretel
Name: Name, Length: 124222, dtype: object
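Here is a small sketch of why brackets are the safer habit. It uses a hypothetical dataframe with one column name containing a space and one column that shares its name with the `DataFrame.count` method.

```python
import pandas as pd

# Hypothetical columns: one with a space, one named like a DataFrame method
mini = pd.DataFrame({'Animal Type': ['Dog', 'Cat'], 'count': [3, 5]})

by_brackets = mini['count']   # the column (a Series)
by_dot = mini.count           # the DataFrame.count method, NOT the column!

print(type(by_brackets))
print(callable(by_dot))
# mini.Animal Type would be a syntax error; brackets always work
print(mini['Animal Type'].tolist())
```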

### Dealing with Datetime Objects

You can access parts of a datetime column (year, month, hour, etc.) through its `.dt` accessor. Note that `.dt` is an attribute of the column, not a method, so it takes no parentheses!

In [28]:
df.columns

Index(['Animal ID', 'Name', 'DateTime', 'MonthYear', 'Found Location',
       'Intake Type', 'Intake Condition', 'Animal Type', 'Sex upon Intake',
       'Age upon Intake', 'Breed', 'Color'],
      dtype='object')

In [29]:
df['MonthYear']

0         01/03/2019 04:19:00 PM
1         07/05/2015 12:59:00 PM
2         04/14/2016 06:43:00 PM
3         10/21/2013 07:59:00 AM
4         06/29/2014 10:38:00 AM
                   ...          
124217    03/09/2021 12:07:00 PM
124218    03/09/2021 12:40:00 PM
124219    03/05/2021 02:31:00 PM
124220    03/09/2021 12:04:00 PM
124221    03/05/2021 02:31:00 PM
Name: MonthYear, Length: 124222, dtype: object
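`MonthYear` holds the same timestamps as `DateTime`, but as text in a 12-hour `MM/DD/YYYY hh:mm:ss AM/PM` layout, which is why its dtype is `object`. The sketch below converts two of those strings with `pd.to_datetime` and confirms they match `DateTime`. The format string is inferred from the values shown above.

```python
import pandas as pd

# Two values copied from the MonthYear output above
month_year = pd.Series(['01/03/2019 04:19:00 PM', '10/21/2013 07:59:00 AM'])

# %m/%d/%Y = month/day/year, %I = 12-hour clock hour, %p = AM/PM
converted = pd.to_datetime(month_year, format='%m/%d/%Y %I:%M:%S %p')

print(converted)
```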

In [30]:
# Let's check out the intake year
df['DateTime'].dt.year

0         2019
1         2015
2         2016
3         2013
4         2014
          ... 
124217    2021
124218    2021
124219    2021
124220    2021
124221    2021
Name: DateTime, Length: 124222, dtype: int64
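`.year` is only one of the pieces available through `.dt`. This sketch uses two timestamps copied from the output above. Most parts are attributes, but a few, like `day_name()`, are methods and need parentheses.

```python
import pandas as pd

# Two intake timestamps copied from the output above
intakes = pd.Series(pd.to_datetime(['2019-01-03 16:19:00', '2015-07-05 12:59:00']))

years = intakes.dt.year           # attribute: no parentheses
months = intakes.dt.month
hours = intakes.dt.hour
weekdays = intakes.dt.day_name()  # this one is a method, so it needs ()

print(pd.DataFrame({'year': years, 'month': months, 'hour': hours, 'weekday': weekdays}))
```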

In [31]:
# How do we create a new column?
# Let's create a new column for intake year
df['IntakeYear'] = df['DateTime'].dt.year

In [32]:
# Check our work
df.head()

,Animal ID,Name,DateTime,MonthYear,Found Location,Intake Type,Intake Condition,Animal Type,Sex upon Intake,Age upon Intake,Breed,Color,IntakeYear
0,A786884,*Brock,2019-01-03 16:19:00,01/03/2019 04:19:00 PM,2501 Magin Meadow Dr in Austin (TX),Stray,Normal,Dog,Neutered Male,2 years,Beagle Mix,Tricolor,2019
1,A706918,Belle,2015-07-05 12:59:00,07/05/2015 12:59:00 PM,9409 Bluegrass Dr in Austin (TX),Stray,Normal,Dog,Spayed Female,8 years,English Springer Spaniel,White/Liver,2015
2,A724273,Runster,2016-04-14 18:43:00,04/14/2016 06:43:00 PM,2818 Palomino Trail in Austin (TX),Stray,Normal,Dog,Intact Male,11 months,Basenji Mix,Sable/White,2016
3,A665644,,2013-10-21 07:59:00,10/21/2013 07:59:00 AM,Austin (TX),Stray,Sick,Cat,Intact Female,4 weeks,Domestic Shorthair Mix,Calico,2013
4,A682524,Rio,2014-06-29 10:38:00,06/29/2014 10:38:00 AM,800 Grove Blvd in Austin (TX),Stray,Normal,Dog,Neutered Male,4 years,Doberman Pinsch/Australian Cattle Dog,Tan/Gray,2014


In [33]:
# What datatype is the data in our new column?
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 124222 entries, 0 to 124221
Data columns (total 13 columns):
 #   Column            Non-Null Count   Dtype         
---  ------            --------------   -----         
 0   Animal ID         124222 non-null  object        
 1   Name              85158 non-null   object        
 2   DateTime          124222 non-null  datetime64[ns]
 3   MonthYear         124222 non-null  object        
 4   Found Location    124222 non-null  object        
 5   Intake Type       124222 non-null  object        
 6   Intake Condition  124222 non-null  object        
 7   Animal Type       124222 non-null  object        
 8   Sex upon Intake   124221 non-null  object        
 9   Age upon Intake   124222 non-null  object        
 10  Breed             124222 non-null  object        
 11  Color             124222 non-null  object        
 12  IntakeYear        124222 non-null  int64         
dtypes: datetime64[ns](1), int64(1), object(11)
memory usage: 12

### Checking for Null Values

You can use `.isna()` or `.isnull()`. They are aliases and do exactly the same thing!

In [34]:
# Check it - is the result what you expect?
df.isna()

,Animal ID,Name,DateTime,MonthYear,Found Location,Intake Type,Intake Condition,Animal Type,Sex upon Intake,Age upon Intake,Breed,Color,IntakeYear
0,False,False,False,False,False,False,False,False,False,False,False,False,False
1,False,False,False,False,False,False,False,False,False,False,False,False,False
2,False,False,False,False,False,False,False,False,False,False,False,False,False
3,False,True,False,False,False,False,False,False,False,False,False,False,False
4,False,False,False,False,False,False,False,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...
124217,False,True,False,False,False,False,False,False,False,False,False,False,False
124218,False,True,False,False,False,False,False,False,False,False,False,False,False
124219,False,False,False,False,False,False,False,False,False,False,False,False,False
124220,False,False,False,False,False,False,False,False,False,False,False,False,False


In [35]:
# How can you make that result more usable?
df.isna().sum()

Animal ID               0
Name                39064
DateTime                0
MonthYear               0
Found Location          0
Intake Type             0
Intake Condition        0
Animal Type             0
Sex upon Intake         1
Age upon Intake         0
Breed                   0
Color                   0
IntakeYear              0
dtype: int64
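`.sum()` works here because `True` counts as 1 and `False` as 0. By the same logic, `.mean()` gives the fraction of missing values per column. Here is a sketch on a small hypothetical dataframe:

```python
import numpy as np
import pandas as pd

# Hypothetical mini-dataframe with a few missing values
mini = pd.DataFrame({
    'name': ['*Brock', np.nan, 'Rio', np.nan],
    'sexuponintake': ['Neutered Male', 'Intact Female', np.nan, 'Spayed Female'],
})

missing_counts = mini.isna().sum()   # number of missing cells per column
missing_share = mini.isna().mean()   # fraction of rows missing per column

print(missing_counts)
print(missing_share)
```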

### Checking for Duplicate Rows

In [36]:
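# The method is called duplicated - check the documentation!
# subset=['Name'] compares only the Name column; by default every occurrence after the first is marked True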
df.duplicated(subset=['Name'])

0         False
1         False
2         False
3         False
4         False
          ...  
124217     True
124218     True
124219     True
124220     True
124221     True
Length: 124222, dtype: bool

In [37]:
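# Same trick as with isna(): .sum() counts the True values
# keep=False marks every occurrence of a repeated name, including the first (missing names count as repeats of each other)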
df.duplicated(subset=['Name'], keep=False).sum()

113284
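The `subset` and `keep` arguments change what counts as a duplicate. Here is a sketch on a hypothetical four-row dataframe. With no `subset`, `duplicated()` only flags rows where every column matches an earlier row.

```python
import pandas as pd

# Hypothetical animals: three named Max, but only two identical rows
animals = pd.DataFrame({
    'name': ['Max', 'Belle', 'Max', 'Max'],
    'type': ['Dog', 'Dog', 'Dog', 'Cat'],
})

full_row_dupes = animals.duplicated()                        # all columns must match
name_dupes_first = animals.duplicated(subset=['name'])       # keep='first' (default)
name_dupes_all = animals.duplicated(subset=['name'], keep=False)

print(pd.DataFrame({'full_row': full_row_dupes,
                    'name_first': name_dupes_first,
                    'name_all': name_dupes_all}))
print(name_dupes_all.sum())  # how many rows share a name with another row
```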

### Dropping Columns or Rows

There are several different methods depending on what we're doing, but the one to discuss right now is `.drop`.

In [38]:
# Let's drop the MonthYear column, which holds the same timestamps as DateTime (just stored as text)
df = df.drop(columns=['MonthYear'])

In [39]:
# Check our work here...
df.head()

,Animal ID,Name,DateTime,Found Location,Intake Type,Intake Condition,Animal Type,Sex upon Intake,Age upon Intake,Breed,Color,IntakeYear
0,A786884,*Brock,2019-01-03 16:19:00,2501 Magin Meadow Dr in Austin (TX),Stray,Normal,Dog,Neutered Male,2 years,Beagle Mix,Tricolor,2019
1,A706918,Belle,2015-07-05 12:59:00,9409 Bluegrass Dr in Austin (TX),Stray,Normal,Dog,Spayed Female,8 years,English Springer Spaniel,White/Liver,2015
2,A724273,Runster,2016-04-14 18:43:00,2818 Palomino Trail in Austin (TX),Stray,Normal,Dog,Intact Male,11 months,Basenji Mix,Sable/White,2016
3,A665644,,2013-10-21 07:59:00,Austin (TX),Stray,Sick,Cat,Intact Female,4 weeks,Domestic Shorthair Mix,Calico,2013
4,A682524,Rio,2014-06-29 10:38:00,800 Grove Blvd in Austin (TX),Stray,Normal,Dog,Neutered Male,4 years,Doberman Pinsch/Australian Cattle Dog,Tan/Gray,2014


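Why won't my changes save???

Fun thing about pandas: most methods, including `.drop`, return a *new* DataFrame and leave the original untouched. If you run `df.drop(columns=['MonthYear'])` without saving the result, `df` still has the column. Either reassign the variable (`df = df.drop(...)`, as above) or pass `inplace=True`.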
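Here is a sketch of the three cases on small hypothetical dataframes. Note that with `inplace=True`, the method returns `None`, so writing `df = df.drop(..., inplace=True)` would throw your data away.

```python
import pandas as pd

mini = pd.DataFrame({'DateTime': ['2019-01-03'],
                     'MonthYear': ['01/03/2019'],
                     'Name': ['*Brock']})

# 1) Calling drop without saving the result: mini is unchanged
mini.drop(columns=['MonthYear'])
still_there = 'MonthYear' in mini.columns

# 2) Reassign the returned copy (the approach used above)
mini = mini.drop(columns=['MonthYear'])

# 3) Or modify in place; inplace=True returns None, so don't assign it
other = pd.DataFrame({'DateTime': ['2019-01-03'], 'MonthYear': ['01/03/2019']})
result = other.drop(columns=['MonthYear'], inplace=True)

print(still_there, list(mini.columns), list(other.columns), result)
```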

### Renaming Columns

[Documentation for `.rename`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.rename.html)

In [42]:
# Let's remove spaces from the column names and make them all lowercase, so they're easier to type
# Can use a dictionary to rename
col_names = df.columns

In [45]:
col_names[0].replace(" ", "")

'AnimalID'

In [46]:
new_col_names = []
for c in col_names:
    new_col_names.append(c.replace(" ", "").lower())

In [47]:
new_col_names

['animalid',
 'name',
 'datetime',
 'foundlocation',
 'intaketype',
 'intakecondition',
 'animaltype',
 'sexuponintake',
 'ageuponintake',
 'breed',
 'color',
 'intakeyear']

In [48]:
col_dict = dict(zip(col_names, new_col_names))

In [49]:
col_dict

{'Animal ID': 'animalid',
 'Name': 'name',
 'DateTime': 'datetime',
 'Found Location': 'foundlocation',
 'Intake Type': 'intaketype',
 'Intake Condition': 'intakecondition',
 'Animal Type': 'animaltype',
 'Sex upon Intake': 'sexuponintake',
 'Age upon Intake': 'ageuponintake',
 'Breed': 'breed',
 'Color': 'color',
 'IntakeYear': 'intakeyear'}

In [52]:
df = df.rename(col_dict, axis=1)

In [53]:
# Check your work
df.head()

,animalid,name,datetime,foundlocation,intaketype,intakecondition,animaltype,sexuponintake,ageuponintake,breed,color,intakeyear
0,A786884,*Brock,2019-01-03 16:19:00,2501 Magin Meadow Dr in Austin (TX),Stray,Normal,Dog,Neutered Male,2 years,Beagle Mix,Tricolor,2019
1,A706918,Belle,2015-07-05 12:59:00,9409 Bluegrass Dr in Austin (TX),Stray,Normal,Dog,Spayed Female,8 years,English Springer Spaniel,White/Liver,2015
2,A724273,Runster,2016-04-14 18:43:00,2818 Palomino Trail in Austin (TX),Stray,Normal,Dog,Intact Male,11 months,Basenji Mix,Sable/White,2016
3,A665644,,2013-10-21 07:59:00,Austin (TX),Stray,Sick,Cat,Intact Female,4 weeks,Domestic Shorthair Mix,Calico,2013
4,A682524,Rio,2014-06-29 10:38:00,800 Grove Blvd in Austin (TX),Stray,Normal,Dog,Neutered Male,4 years,Doberman Pinsch/Australian Cattle Dog,Tan/Gray,2014


In [54]:
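# Can also pass a function (here, a lambda) that is applied to each column name
# Note: the result isn't assigned back to df, so df itself is unchanged
df.rename(columns=lambda x: x.replace(" ", "").lower())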

,animalid,name,datetime,foundlocation,intaketype,intakecondition,animaltype,sexuponintake,ageuponintake,breed,color,intakeyear
0,A786884,*Brock,2019-01-03 16:19:00,2501 Magin Meadow Dr in Austin (TX),Stray,Normal,Dog,Neutered Male,2 years,Beagle Mix,Tricolor,2019
1,A706918,Belle,2015-07-05 12:59:00,9409 Bluegrass Dr in Austin (TX),Stray,Normal,Dog,Spayed Female,8 years,English Springer Spaniel,White/Liver,2015
2,A724273,Runster,2016-04-14 18:43:00,2818 Palomino Trail in Austin (TX),Stray,Normal,Dog,Intact Male,11 months,Basenji Mix,Sable/White,2016
3,A665644,,2013-10-21 07:59:00,Austin (TX),Stray,Sick,Cat,Intact Female,4 weeks,Domestic Shorthair Mix,Calico,2013
4,A682524,Rio,2014-06-29 10:38:00,800 Grove Blvd in Austin (TX),Stray,Normal,Dog,Neutered Male,4 years,Doberman Pinsch/Australian Cattle Dog,Tan/Gray,2014
...,...,...,...,...,...,...,...,...,...,...,...,...
124217,A830428,,2021-03-09 12:07:00,501 Strichen Drive in Travis (TX),Wildlife,Sick,Other,Unknown,2 years,Skunk,Black/White,2021
124218,A830411,,2021-03-09 12:40:00,12609 Dessau Rd in Austin (TX),Stray,Normal,Dog,Intact Male,1 year,Dachshund/Rat Terrier,Brown/White,2021
124219,A830250,*Hansel,2021-03-05 14:31:00,Cesar Chavez And North Lamar in Austin (TX),Stray,Normal,Dog,Intact Male,4 years,German Shepherd,Brown/Black,2021
124220,A830431,Chema,2021-03-09 12:04:00,Austin (TX),Owner Surrender,Normal,Dog,Unknown,3 years,Beagle/Chihuahua Shorthair,Black/Brown,2021
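A third option skips the loop and the dictionary entirely. `df.columns` is an `Index`, and its `.str` accessor applies string methods to every name at once. Here is a sketch using a hypothetical empty dataframe with a few of the original column names:

```python
import pandas as pd

mini = pd.DataFrame(columns=['Animal ID', 'Found Location', 'IntakeYear'])

# Index.str applies string methods to every column name at once
mini.columns = mini.columns.str.replace(' ', '').str.lower()

print(list(mini.columns))
```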


### Slicing and Dicing

Perhaps your biggest tool for exploring your dataframes will be `.loc` (and its companion `.iloc`). `.loc` selects rows and columns by label and also accepts conditions, which lets you filter your data. `.iloc` selects by integer position.

In [55]:
df.head()

,animalid,name,datetime,foundlocation,intaketype,intakecondition,animaltype,sexuponintake,ageuponintake,breed,color,intakeyear
0,A786884,*Brock,2019-01-03 16:19:00,2501 Magin Meadow Dr in Austin (TX),Stray,Normal,Dog,Neutered Male,2 years,Beagle Mix,Tricolor,2019
1,A706918,Belle,2015-07-05 12:59:00,9409 Bluegrass Dr in Austin (TX),Stray,Normal,Dog,Spayed Female,8 years,English Springer Spaniel,White/Liver,2015
2,A724273,Runster,2016-04-14 18:43:00,2818 Palomino Trail in Austin (TX),Stray,Normal,Dog,Intact Male,11 months,Basenji Mix,Sable/White,2016
3,A665644,,2013-10-21 07:59:00,Austin (TX),Stray,Sick,Cat,Intact Female,4 weeks,Domestic Shorthair Mix,Calico,2013
4,A682524,Rio,2014-06-29 10:38:00,800 Grove Blvd in Austin (TX),Stray,Normal,Dog,Neutered Male,4 years,Doberman Pinsch/Australian Cattle Dog,Tan/Gray,2014


In [56]:
# Example: look only at animals with intake type 'Stray'
df.loc[df['intaketype'] == 'Stray']

,animalid,name,datetime,foundlocation,intaketype,intakecondition,animaltype,sexuponintake,ageuponintake,breed,color,intakeyear
0,A786884,*Brock,2019-01-03 16:19:00,2501 Magin Meadow Dr in Austin (TX),Stray,Normal,Dog,Neutered Male,2 years,Beagle Mix,Tricolor,2019
1,A706918,Belle,2015-07-05 12:59:00,9409 Bluegrass Dr in Austin (TX),Stray,Normal,Dog,Spayed Female,8 years,English Springer Spaniel,White/Liver,2015
2,A724273,Runster,2016-04-14 18:43:00,2818 Palomino Trail in Austin (TX),Stray,Normal,Dog,Intact Male,11 months,Basenji Mix,Sable/White,2016
3,A665644,,2013-10-21 07:59:00,Austin (TX),Stray,Sick,Cat,Intact Female,4 weeks,Domestic Shorthair Mix,Calico,2013
4,A682524,Rio,2014-06-29 10:38:00,800 Grove Blvd in Austin (TX),Stray,Normal,Dog,Neutered Male,4 years,Doberman Pinsch/Australian Cattle Dog,Tan/Gray,2014
...,...,...,...,...,...,...,...,...,...,...,...,...
124211,A830432,,2021-03-09 12:50:00,Austin (TX),Stray,Normal,Cat,Intact Female,2 years,Domestic Shorthair,White/Black,2021
124214,A829340,`*Trixie,2021-02-08 14:44:00,8801 South Ih 35 in Austin (TX),Stray,Normal,Dog,Intact Female,1 year,Flat Coat Retriever Mix,Black,2021
124218,A830411,,2021-03-09 12:40:00,12609 Dessau Rd in Austin (TX),Stray,Normal,Dog,Intact Male,1 year,Dachshund/Rat Terrier,Brown/White,2021
124219,A830250,*Hansel,2021-03-05 14:31:00,Cesar Chavez And North Lamar in Austin (TX),Stray,Normal,Dog,Intact Male,4 years,German Shepherd,Brown/Black,2021


In [57]:
df.head(3)

,animalid,name,datetime,foundlocation,intaketype,intakecondition,animaltype,sexuponintake,ageuponintake,breed,color,intakeyear
0,A786884,*Brock,2019-01-03 16:19:00,2501 Magin Meadow Dr in Austin (TX),Stray,Normal,Dog,Neutered Male,2 years,Beagle Mix,Tricolor,2019
1,A706918,Belle,2015-07-05 12:59:00,9409 Bluegrass Dr in Austin (TX),Stray,Normal,Dog,Spayed Female,8 years,English Springer Spaniel,White/Liver,2015
2,A724273,Runster,2016-04-14 18:43:00,2818 Palomino Trail in Austin (TX),Stray,Normal,Dog,Intact Male,11 months,Basenji Mix,Sable/White,2016


In [58]:
# Second example: animals where the animal type is not dog
df.loc[df['animaltype'] != 'Dog']

,animalid,name,datetime,foundlocation,intaketype,intakecondition,animaltype,sexuponintake,ageuponintake,breed,color,intakeyear
3,A665644,,2013-10-21 07:59:00,Austin (TX),Stray,Sick,Cat,Intact Female,4 weeks,Domestic Shorthair Mix,Calico,2013
8,A818975,,2020-06-18 14:53:00,Braker Lane And Metric in Travis (TX),Stray,Normal,Cat,Intact Male,4 weeks,Domestic Shorthair,Cream Tabby,2020
9,A774147,,2018-06-11 07:45:00,6600 Elm Creek in Austin (TX),Stray,Injured,Cat,Intact Female,4 weeks,Domestic Shorthair Mix,Black/White,2018
10,A731435,*Casey,2016-08-08 17:52:00,Austin (TX),Owner Surrender,Normal,Cat,Neutered Male,5 months,Domestic Shorthair Mix,Cream Tabby,2016
14,A790209,Ziggy,2019-03-06 14:31:00,4424 S Mopac Expwy in Austin (TX),Public Assist,Normal,Cat,Intact Female,4 years,Domestic Shorthair Mix,Brown Tabby/White,2019
...,...,...,...,...,...,...,...,...,...,...,...,...
124203,A830137,*Sohjo,2021-03-03 12:26:00,3401 Palomar Lane in Austin (TX),Stray,Nursing,Cat,Intact Male,3 weeks,Domestic Shorthair,White/Blue Tabby,2021
124207,A830413,,2021-03-09 11:10:00,517 South Pleasant Valley Road in Austin (TX),Wildlife,Sick,Other,Unknown,2 years,Coyote,Red/Gray,2021
124210,A830385,Eevee,2021-03-08 15:46:00,3308 Thompson Street in Austin (TX),Stray,Normal,Cat,Spayed Female,2 years,Domestic Shorthair,Black,2021
124211,A830432,,2021-03-09 12:50:00,Austin (TX),Stray,Normal,Cat,Intact Female,2 years,Domestic Shorthair,White/Black,2021


In [59]:
# And a third - animals found before 2018
df.loc[df['intakeyear'] < 2018]

,animalid,name,datetime,foundlocation,intaketype,intakecondition,animaltype,sexuponintake,ageuponintake,breed,color,intakeyear
1,A706918,Belle,2015-07-05 12:59:00,9409 Bluegrass Dr in Austin (TX),Stray,Normal,Dog,Spayed Female,8 years,English Springer Spaniel,White/Liver,2015
2,A724273,Runster,2016-04-14 18:43:00,2818 Palomino Trail in Austin (TX),Stray,Normal,Dog,Intact Male,11 months,Basenji Mix,Sable/White,2016
3,A665644,,2013-10-21 07:59:00,Austin (TX),Stray,Sick,Cat,Intact Female,4 weeks,Domestic Shorthair Mix,Calico,2013
4,A682524,Rio,2014-06-29 10:38:00,800 Grove Blvd in Austin (TX),Stray,Normal,Dog,Neutered Male,4 years,Doberman Pinsch/Australian Cattle Dog,Tan/Gray,2014
5,A743852,Odin,2017-02-18 12:46:00,Austin (TX),Owner Surrender,Normal,Dog,Neutered Male,2 years,Labrador Retriever Mix,Chocolate,2017
...,...,...,...,...,...,...,...,...,...,...,...,...
124089,A714665,Max,2016-01-29 12:41:00,7601 Daffan Ln in Travis (TX),Stray,Normal,Dog,Intact Male,1 year,Chihuahua Shorthair Mix,White/Black,2016
124090,A670937,Sadee Lynn,2016-09-02 09:33:00,15877 Long Vista Drive in Austin (TX),Public Assist,Normal,Dog,Spayed Female,8 years,Beagle,White/Tan,2016
124091,A670937,Sadee Lynn,2014-01-16 12:20:00,"38Th, I 35 in Austin (TX)",Stray,Normal,Dog,Spayed Female,5 years,Beagle,White/Tan,2014
124096,A670937,Sadee Lynn,2016-05-10 09:51:00,805 W 10Th in Austin (TX),Stray,Normal,Dog,Spayed Female,8 years,Beagle,White/Tan,2016


In [64]:
# Combine conditions with & (and) or | (or), wrapping each condition in parentheses
df.loc[(df['name'].isna()) & (df['intaketype'] != 'Stray')]

,animalid,name,datetime,foundlocation,intaketype,intakecondition,animaltype,sexuponintake,ageuponintake,breed,color,intakeyear
23,A810994,,2019-12-25 00:05:00,7900 Rm 1826 Rd in Travis (TX),Wildlife,Normal,Other,Unknown,2 years,Bat,Brown,2019
28,A722979,,2016-03-24 16:39:00,4100 Westbank Dr in Austin (TX),Wildlife,Normal,Other,Unknown,1 year,Bat Mix,Brown,2016
42,A812244,,2020-01-18 14:14:00,Travis (TX),Owner Surrender,Normal,Dog,Intact Male,1 month,Labrador Retriever,Tan,2020
65,A698165,,2015-03-06 18:48:00,201 San Jacinto Blvd in Austin (TX),Wildlife,Normal,Other,Unknown,4 weeks,Bat,Brown/Black,2015
71,A779334,,2018-08-30 11:21:00,5004 Heflin Ln in Austin (TX),Wildlife,Injured,Other,Unknown,2 years,Raccoon,Gray/Black,2018
...,...,...,...,...,...,...,...,...,...,...,...,...
124192,A830343,,2021-03-08 10:15:00,1513 Forest Trail in Austin (TX),Wildlife,Injured,Other,Unknown,2 years,Raccoon,Brown,2021
124195,A830403,,2021-03-08 22:11:00,9201 Burklund Farms Road in Travis (TX),Wildlife,Injured,Other,Unknown,1 year,Bat,Brown,2021
124202,A830398,,2021-03-08 19:14:00,2347 Douglas Street in Austin (TX),Public Assist,Normal,Dog,Intact Female,6 years,Pit Bull/Labrador Retriever,Black/Brown,2021
124207,A830413,,2021-03-09 11:10:00,517 South Pleasant Valley Road in Austin (TX),Wildlife,Sick,Other,Unknown,2 years,Coyote,Red/Gray,2021
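Here is a compact sketch of a few more selection patterns, using a hypothetical four-row dataframe with the renamed columns and a non-default index. `&` and `|` bind more tightly than `==`, so each comparison needs its own parentheses. A label slice with `.loc` includes its end label, while `.iloc` uses positions and excludes the end.

```python
import pandas as pd

# Hypothetical mini-dataframe; the index labels deliberately don't start at 0
mini = pd.DataFrame({
    'name': ['*Brock', None, 'Rio', 'Ziggy'],
    'intaketype': ['Stray', 'Wildlife', 'Stray', 'Public Assist'],
    'animaltype': ['Dog', 'Other', 'Dog', 'Cat'],
    'intakeyear': [2019, 2016, 2014, 2019],
}, index=[10, 11, 12, 13])

# Rows matching both conditions, and only two of the columns
strays_2019 = mini.loc[(mini['intaketype'] == 'Stray') & (mini['intakeyear'] == 2019),
                       ['name', 'animaltype']]

# | means "or"; .isin() is a shorter way to write the same check
dogs_or_cats = mini.loc[(mini['animaltype'] == 'Dog') | (mini['animaltype'] == 'Cat')]
dogs_or_cats_isin = mini.loc[mini['animaltype'].isin(['Dog', 'Cat'])]

# ~ negates a condition: rows where the name is NOT missing
named = mini.loc[~mini['name'].isna()]

# .iloc uses positions; .loc uses labels
first_two_rows = mini.iloc[:2]   # positions 0 and 1
by_label = mini.loc[10:11]       # labels 10 through 11, inclusive

print(strays_2019)
print(first_two_rows)
```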


## Let's Start to Answer Questions!

#### Question 1: What is the most common Animal Type?

In [67]:
# Let's explore the Animal Type column to find out
df['animaltype'].value_counts()

Dog          70505
Cat          46483
Other         6625
Bird           587
Livestock       22
Name: animaltype, dtype: int64

In [66]:
# Another way - look above at describe, or run another describe
# 'Top' for an object column means 'most common'
df['animaltype'].describe()

count     124222
unique         5
top          Dog
freq       70505
Name: animaltype, dtype: object

#### Question 2: What is the most common dog breed to come into the shelter?

In [68]:
# Let's create a new df, dogs, for all dogs in the original data
dogs = df.loc[df['animaltype'] == 'Dog']

In [69]:
dogs.head()

,animalid,name,datetime,foundlocation,intaketype,intakecondition,animaltype,sexuponintake,ageuponintake,breed,color,intakeyear
0,A786884,*Brock,2019-01-03 16:19:00,2501 Magin Meadow Dr in Austin (TX),Stray,Normal,Dog,Neutered Male,2 years,Beagle Mix,Tricolor,2019
1,A706918,Belle,2015-07-05 12:59:00,9409 Bluegrass Dr in Austin (TX),Stray,Normal,Dog,Spayed Female,8 years,English Springer Spaniel,White/Liver,2015
2,A724273,Runster,2016-04-14 18:43:00,2818 Palomino Trail in Austin (TX),Stray,Normal,Dog,Intact Male,11 months,Basenji Mix,Sable/White,2016
4,A682524,Rio,2014-06-29 10:38:00,800 Grove Blvd in Austin (TX),Stray,Normal,Dog,Neutered Male,4 years,Doberman Pinsch/Australian Cattle Dog,Tan/Gray,2014
5,A743852,Odin,2017-02-18 12:46:00,Austin (TX),Owner Surrender,Normal,Dog,Neutered Male,2 years,Labrador Retriever Mix,Chocolate,2017


In [76]:
# Now it's easier to look at common dog breeds
dogs['breed'].value_counts()

Pit Bull Mix                           8413
Labrador Retriever Mix                 6866
Chihuahua Shorthair Mix                6238
German Shepherd Mix                    3005
Australian Cattle Dog Mix              1511
                                       ... 
Dachshund Longhair/Miniature Poodle       1
Australian Cattle Dog/Vizsla              1
Anatol Shepherd/Boxer                     1
Australian Shepherd/Greyhound             1
Shiba Inu/Chow Chow                       1
Name: breed, Length: 2328, dtype: int64
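`value_counts()` sorts from most to least common, so the answer is its first entry. Passing `normalize=True` gives proportions instead of counts. Here is a sketch on a hypothetical five-dog breed column:

```python
import pandas as pd

# Hypothetical breed column
breeds = pd.Series(['Pit Bull Mix', 'Beagle Mix', 'Pit Bull Mix',
                    'Labrador Retriever Mix', 'Pit Bull Mix'])

counts = breeds.value_counts()                # most common first
most_common = counts.index[0]                 # label of the top entry
shares = breeds.value_counts(normalize=True)  # proportions that sum to 1

print(most_common, counts.iloc[0])
print(shares)
```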

#### Question 3: What percentage of animals have come into the shelter in a condition other than "Normal"?

In [80]:
# Need to explore the proper column
num_not_normal = len(df.loc[df['intakecondition'] != 'Normal'])

In [81]:
# Let pandas count for us instead of typing the number in by hand
num_not_normal

16196

In [82]:
# Total number of animals (rows), for the denominator
len(df)

124222

In [83]:
num_not_normal / len(df)

0.13037948189531645

In [87]:
# Other way to calculate
num_normal = df['intakecondition'].value_counts()['Normal']

In [89]:
1 - (num_normal / len(df))

0.13037948189531645
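A third, more compact way uses the same "True counts as 1" idea as `.isna().sum()`. The mean of a boolean Series is the fraction of `True` values, so `(df['intakecondition'] != 'Normal').mean()` should match the results above. Here is a sketch on a hypothetical five-animal Series:

```python
import pandas as pd

# Hypothetical intake conditions
conditions = pd.Series(['Normal', 'Sick', 'Normal', 'Injured', 'Normal'])

not_normal = conditions != 'Normal'     # boolean Series
share_not_normal = not_normal.mean()    # fraction of True values

print(f"{share_not_normal:.1%} not normal")
```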

## Now - The Outcomes Data!

Let's explore together if we have time! If not - extra credit!