In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# GBIF - The Global Biodiversity Information Facility

This notebook shows how to use the biological occurrence data downloaded from GBIF in the previous notebook to make biodiversity maps.

## Mapping Biodiversity

If you're interested in biodiversity, or the number of species in a given area, you might want to visualize the data with a map. Heatmaps are particularly good for this, with color varying by the number of species in an area. Right now, we could approach this in a couple of different ways with our data.

1. We can map the number of species per country. Mapping by country is a good place to start, but it can be misleading, since some countries are very large and may encompass many different types of habitat. The approach is simple: calculate the number of species in each country, then map it with a choropleth-style plotting package.

(Side note: GBIF also has its own mapping API. It doesn't quite do what I want for this little exercise, but definitely check it out first, because it's very easy to use!)

In [3]:
# Read the cleaned CSV file back in (the rest of the notebook refers to it as df_cleaned)
df_cleaned = pd.read_csv("../data/Ayenia_cleaned_dataframe.csv")

In [None]:
#Plotly has a choropleth mapper built-in, so this is fairly easy to use
import plotly.graph_objects as go

In [42]:
# The plotly choropleth mapper expects three-letter (ISO 3166-1 alpha-3) country codes,
# so here I'm creating a dictionary that translates each country name into its code.
# Every country starts as None and is filled in by hand below.
c_codes = {country: None for country in df_cleaned['country'].unique()}

c_codes['Brazil'] = 'BRA'
c_codes['Nicaragua'] = 'NIC'
c_codes['United States of America'] = 'USA'
c_codes['El Salvador'] = 'SLV'
c_codes['Mexico'] = 'MEX'
c_codes['Dominican Republic'] = 'DOM'
c_codes['Costa Rica'] = 'CRI'
c_codes['Colombia'] = 'COL'
c_codes['Argentina'] = 'ARG'
c_codes['Bolivia, Plurinational State of'] = 'BOL'
c_codes['Cuba'] = 'CUB'
c_codes['Jamaica'] = 'JAM'
c_codes['Peru'] = 'PER'
c_codes['Paraguay'] = 'PRY'
c_codes['Guatemala'] = 'GTM'
c_codes['Puerto Rico'] = 'PRI'
c_codes['United States Minor Outlying Islands'] = 'UMI'
c_codes['Ecuador'] = 'ECU'
c_codes['Uruguay'] = 'URY'
c_codes['Guyana'] = 'GUY'
c_codes['Venezuela, Bolivarian Republic of'] = 'VEN'
c_codes['Honduras'] = 'HND'
c_codes['Haiti'] = 'HTI'
c_codes['Bahamas'] = 'BHS'
c_codes['Panama'] = 'PAN'
c_codes['Virgin Islands, U.S.'] = 'VIR'
c_codes['Montserrat'] = 'MSR'
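Because every country starts out as `None`, any name that was never filled in by hand would not be matched to a country on the map. A quick sanity check is to list the leftovers. The sketch below uses a small made-up stand-in for `df_cleaned` (the names `df_demo` and `codes` are hypothetical), so only the pattern carries over to the real data.

```python
import pandas as pd

# Hypothetical stand-in for df_cleaned: only the 'country' column matters here.
df_demo = pd.DataFrame({"country": ["Brazil", "Mexico", "Belize", "Mexico"]})

# Same pattern as above: start every country at None, then fill in known codes.
codes = {country: None for country in df_demo["country"].unique()}
codes["Brazil"] = "BRA"
codes["Mexico"] = "MEX"

# Any country still mapped to None has no code and would not be colored on the map.
missing = sorted(c for c, code in codes.items() if code is None)
print(missing)  # ['Belize']
```

On the real data, the same check would be `[c for c, code in c_codes.items() if code is None]`, and an empty list means every country has a code.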

In [56]:
#Create a new dataframe with just number of species per country and the country code
df_plot = pd.DataFrame(df_cleaned.groupby('country')['species'].nunique().sort_values(ascending=False))
df_plot['code'] = df_plot.index.map(c_codes)
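The choice of `nunique()` matters here: GBIF data usually contain many records of the same species in one country, so counting records would measure sampling effort rather than species richness. The toy example below (made-up records with placeholder species names) contrasts the two, and also shows that `Index.map` accepts a plain dictionary like `c_codes`.

```python
import pandas as pd

# Made-up occurrence records: several records can belong to the same species.
records = pd.DataFrame({
    "country": ["Brazil", "Brazil", "Brazil", "Mexico", "Mexico"],
    "species": ["Ayenia sp1", "Ayenia sp1", "Ayenia sp2", "Ayenia sp3", "Ayenia sp3"],
})
codes = {"Brazil": "BRA", "Mexico": "MEX"}

# count() tallies records; nunique() tallies distinct species (what the map shows).
per_country = records.groupby("country")["species"].agg(["count", "nunique"])

# Map each country name in the index to its three-letter code via the dictionary.
per_country["code"] = per_country.index.map(codes)
print(per_country)
```

Brazil has three records but only two distinct species, which is the number that ends up in the choropleth.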

In [71]:
#Plot with plotly choropleth mapper
fig = go.Figure(data=go.Choropleth(
    locations = df_plot['code'],
    z = df_plot['species'],
    text = df_plot.index,
    colorscale = 'Blues',
    autocolorscale=False,
    reversescale=False,
    marker_line_color='darkgray',
    marker_line_width=0.5,
    colorbar_title = '# of Species'
))

fig.show()

As you can tell, this map is a little misleading about where in the New World these plants are actually found. In reality, _Ayenia_ ranges only from the southernmost part of the US to the northernmost part of Argentina. Because the choropleth colors an entire country, it makes it look like you could find these plants far outside their distribution range. I have some other issues with this plotly choropleth that could probably be fixed by tweaking a lot more parameters, but since it isn't really showing what I want anyway, I'm going to leave it as it is.

Now on to the second option for mapping:

2. Since we have GPS coordinates for each record, we can count how many different species are found in each bin of a raster grid laid over the range.
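Here is a minimal sketch of that binning idea, assuming the coordinates live in columns named `decimalLatitude` and `decimalLongitude` (the Darwin Core terms GBIF uses; check the actual column names in `df_cleaned`). The records are made up, and `species_richness_grid` and `cell_size` are hypothetical names: each point is snapped to the lower-left corner of a square cell, then distinct species are counted per cell.

```python
import numpy as np
import pandas as pd

# Made-up occurrence records; column names assume GBIF's Darwin Core terms.
occ = pd.DataFrame({
    "species": ["A", "B", "A", "C", "B", "D"],
    "decimalLatitude": [19.2, 19.7, 19.9, -25.1, -25.4, -25.3],
    "decimalLongitude": [-99.1, -99.8, -99.5, -57.6, -57.2, -57.9],
})

def species_richness_grid(occ, cell_size=1.0):
    """Count distinct species in square lat/lon cells of `cell_size` degrees."""
    # Snap each coordinate down to the lower-left corner of its grid cell.
    binned = occ.assign(
        lat_bin=np.floor(occ["decimalLatitude"] / cell_size) * cell_size,
        lon_bin=np.floor(occ["decimalLongitude"] / cell_size) * cell_size,
    )
    # One row per occupied cell, with the number of distinct species in it.
    return (binned.groupby(["lat_bin", "lon_bin"])["species"]
                  .nunique()
                  .rename("n_species")
                  .reset_index())

richness = species_richness_grid(occ, cell_size=1.0)
print(richness)

# Pivot into a 2-D lat x lon table (empty cells become NaN), a shape that
# plotting functions such as plt.pcolormesh or plt.imshow can take directly.
grid = richness.pivot(index="lat_bin", columns="lon_bin", values="n_species")
```

The cell size is the main design choice: smaller cells follow the real distribution more closely but leave more cells with only a handful of records.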