# Biodiversity in National Parks

This project explores biodiversity in four U.S. national parks by answering three questions:

- How many species in each category are considered endangered?

- How common is each category of plant or animal?

- Which park has the most unique species?

# Importing and Cleaning the Data

In [1]:
# Import the libraries used throughout the notebook
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [2]:
# Read the observations CSV and preview the first rows
observations_df = pd.read_csv('observations.csv')

observations_df.head()

Unnamed: 0,scientific_name,park_name,observations
0,Vicia benghalensis,Great Smoky Mountains National Park,68
1,Neovison vison,Great Smoky Mountains National Park,77
2,Prunus subcordata,Yosemite National Park,138
3,Abutilon theophrasti,Bryce National Park,84
4,Githopsis specularioides,Great Smoky Mountains National Park,85


In [3]:
# Read the species_info CSV and preview the first rows
species_df = pd.read_csv('species_info.csv')

species_df.head()

Unnamed: 0,category,scientific_name,common_names,conservation_status
0,Mammal,Clethrionomys gapperi gapperi,Gapper's Red-Backed Vole,
1,Mammal,Bos bison,"American Bison, Bison",
2,Mammal,Bos taurus,"Aurochs, Aurochs, Domestic Cattle (Feral), Dom...",
3,Mammal,Ovis aries,"Domestic Sheep, Mouflon, Red Sheep, Sheep (Feral)",
4,Mammal,Cervus elaphus,Wapiti Or Elk,


In [4]:
# Merge the two DataFrames on scientific_name (pandas uses an inner join by default)
biodiversity_df = observations_df.merge(species_df, on='scientific_name')

biodiversity_df.head()

Unnamed: 0,scientific_name,park_name,observations,category,common_names,conservation_status
0,Vicia benghalensis,Great Smoky Mountains National Park,68,Vascular Plant,"Purple Vetch, Reddish Tufted Vetch",
1,Vicia benghalensis,Yosemite National Park,148,Vascular Plant,"Purple Vetch, Reddish Tufted Vetch",
2,Vicia benghalensis,Yellowstone National Park,247,Vascular Plant,"Purple Vetch, Reddish Tufted Vetch",
3,Vicia benghalensis,Bryce National Park,104,Vascular Plant,"Purple Vetch, Reddish Tufted Vetch",
4,Neovison vison,Great Smoky Mountains National Park,77,Mammal,American Mink,
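
Because `merge` uses an inner join by default, any observation whose `scientific_name` has no match in the species table is silently dropped. The sketch below shows one way to check for that using the `how` and `indicator` parameters of `merge`. The `toy_*` frames and the "Unmatched species" row are hypothetical stand-ins, not rows from the real CSVs.

```python
import pandas as pd

# Hypothetical miniature versions of the two tables (not the real data).
toy_observations = pd.DataFrame({
    "scientific_name": ["Vicia benghalensis", "Neovison vison", "Unmatched species"],
    "park_name": ["Bryce National Park"] * 3,
    "observations": [104, 77, 10],
})
toy_species = pd.DataFrame({
    "category": ["Vascular Plant", "Mammal"],
    "scientific_name": ["Vicia benghalensis", "Neovison vison"],
})

# A left join keeps every observation row; indicator=True adds a "_merge"
# column saying whether each row found a match in the species table.
checked = toy_observations.merge(
    toy_species, on="scientific_name", how="left", indicator=True
)

# Rows marked "left_only" would have been dropped by the default inner join.
unmatched = checked[checked["_merge"] == "left_only"]
print(unmatched["scientific_name"].tolist())
```

If `unmatched` is empty on the real data, the inner join loses nothing.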


In [5]:
biodiversity_df['conservation_status'].unique()

array([nan, 'Species of Concern', 'Threatened', 'Endangered',
       'In Recovery'], dtype=object)

In [6]:
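# Species with no conservation status (NaN) are treated as not endangered.
# Assign the result back; calling fillna(inplace=True) on a selected column is deprecated in recent pandas.
biodiversity_df['conservation_status'] = biodiversity_df['conservation_status'].fillna('Not Endangered')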

biodiversity_df.head()

Unnamed: 0,scientific_name,park_name,observations,category,common_names,conservation_status
0,Vicia benghalensis,Great Smoky Mountains National Park,68,Vascular Plant,"Purple Vetch, Reddish Tufted Vetch",Not Endangered
1,Vicia benghalensis,Yosemite National Park,148,Vascular Plant,"Purple Vetch, Reddish Tufted Vetch",Not Endangered
2,Vicia benghalensis,Yellowstone National Park,247,Vascular Plant,"Purple Vetch, Reddish Tufted Vetch",Not Endangered
3,Vicia benghalensis,Bryce National Park,104,Vascular Plant,"Purple Vetch, Reddish Tufted Vetch",Not Endangered
4,Neovison vison,Great Smoky Mountains National Park,77,Mammal,American Mink,Not Endangered


In [7]:
biodiversity_df['park_name'].unique()

array(['Great Smoky Mountains National Park', 'Yosemite National Park',
       'Yellowstone National Park', 'Bryce National Park'], dtype=object)

In [8]:
# Lower-cased, alphabetically sorted list of species categories
categories = sorted(x.lower() for x in biodiversity_df['category'].unique())
print(categories)

['amphibian', 'bird', 'fish', 'mammal', 'nonvascular plant', 'reptile', 'vascular plant']


# How many species from each category are considered endangered?

In [9]:
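# Split the merged DataFrame into one DataFrame per category.
# groupby sorts its keys alphabetically, so the unpacking order below matches the category list above.
category_groups = dict(iter(biodiversity_df.groupby('category')))
amphibian, bird, fish, mammal, nonvascular_plant, reptile, vascular_plant = category_groups.values()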
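
As an alternative to one variable per category, the question can be answered in a single grouped count. This is a minimal sketch on a hypothetical `toy_df`, built from a few species names seen in this notebook. It assumes "endangered" means a `conservation_status` of exactly `'Endangered'`, and it counts each species once even if it was observed in several parks.

```python
import pandas as pd

# Hypothetical miniature of the merged table: one row per species/park pair.
toy_df = pd.DataFrame({
    "scientific_name": [
        "Canis rufus", "Canis rufus", "Canis lupus",
        "Myotis grisescens", "Vicia benghalensis", "Neovison vison",
    ],
    "category": [
        "Mammal", "Mammal", "Mammal",
        "Mammal", "Vascular Plant", "Mammal",
    ],
    "conservation_status": [
        "Endangered", "Endangered", "Endangered",
        "Endangered", "Not Endangered", "Not Endangered",
    ],
})

# Keep only endangered rows, then count distinct species per category.
endangered = toy_df[toy_df["conservation_status"] == "Endangered"]
endangered_counts = (
    endangered.groupby("category")["scientific_name"]
    .nunique()
    # Categories with no endangered species would otherwise be missing.
    .reindex(sorted(toy_df["category"].unique()), fill_value=0)
)
print(endangered_counts)
```

Using `nunique` instead of `count` matters here because the merged table repeats each species once per park.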



In [38]:
endangered_mammals.head(10)

Unnamed: 0,scientific_name,park_name,observations,category,common_names,conservation_status
4600,Canis rufus,Bryce National Park,30,Mammal,Red Wolf,Endangered
4601,Canis rufus,Yosemite National Park,34,Mammal,Red Wolf,Endangered
4602,Canis rufus,Great Smoky Mountains National Park,13,Mammal,Red Wolf,Endangered
4603,Canis rufus,Yellowstone National Park,60,Mammal,Red Wolf,Endangered
6008,Canis lupus,Yosemite National Park,35,Mammal,Gray Wolf,Endangered
6010,Canis lupus,Yosemite National Park,35,Mammal,"Gray Wolf, Wolf",Endangered
6011,Canis lupus,Bryce National Park,27,Mammal,Gray Wolf,Endangered
6013,Canis lupus,Bryce National Park,27,Mammal,"Gray Wolf, Wolf",Endangered
6014,Canis lupus,Bryce National Park,29,Mammal,Gray Wolf,Endangered
10724,Myotis grisescens,Bryce National Park,27,Mammal,Gray Myotis,Endangered
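
The output above lists *Canis lupus* under two different `common_names` values ("Gray Wolf" and "Gray Wolf, Wolf"), which suggests the species table holds more than one row for that `scientific_name`. In a merge, each observation is matched to every such row, so the merged table can contain repeated observations. The sketch below reproduces the effect on hypothetical `toy_*` frames and shows one possible fix. Keeping only the first row per name is an assumption: it discards the alternative common names and presumes the duplicate rows agree on conservation status.

```python
import pandas as pd

# Hypothetical miniature species table with a duplicated scientific_name.
toy_species = pd.DataFrame({
    "category": ["Mammal", "Mammal", "Mammal"],
    "scientific_name": ["Canis lupus", "Canis lupus", "Canis rufus"],
    "common_names": ["Gray Wolf", "Gray Wolf, Wolf", "Red Wolf"],
    "conservation_status": ["Endangered", "Endangered", "Endangered"],
})
toy_observations = pd.DataFrame({
    "scientific_name": ["Canis lupus", "Canis lupus", "Canis rufus"],
    "park_name": ["Yosemite National Park", "Bryce National Park", "Bryce National Park"],
    "observations": [35, 27, 30],
})

# Merging as-is pairs each Canis lupus observation with both species rows.
inflated = toy_observations.merge(toy_species, on="scientific_name")

# Keep one species row per scientific_name before merging.
species_unique = toy_species.drop_duplicates(subset="scientific_name", keep="first")

# validate="many_to_one" raises an error if the species keys are not unique.
deduplicated = toy_observations.merge(
    species_unique, on="scientific_name", validate="many_to_one"
)
print(len(inflated), len(deduplicated))
```

Any totals computed from `observations` are affected the same way, so it is worth deduplicating before summing.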
