Note: this notebook is being split into several smaller notebooks, one for each data source.

These notebooks will be turned into modules in the medium term.

This notebook pretreats the following data for use in analysis:
* Survey data collected using the Infoflora application.
* Survey data collected without the Infoflora application.
* The invasive species list, last updated in 2016, available [here](https://www.infoflora.ch/fr/neophytes/neophytes.html)
* The red list of endangered species, updated in 2019, available __.
* The list of priority species, updated in ___, available __.
* The Welten-Sutter lists in and around Biel/Bienne.
* 20 square kilometers of 5x5 observations centered on Bienne, from Infoflora, available __.

Pretreatment drops unneeded columns, homogenizes names, and creates dictionary keys/codes for the different classifications. In practice this means:
* removing extraneous data (mainly columns, some rows)
* linking other useful information to the data (species conservation status, etc.)

In [1]:
# import packages

# math and data packages
import pandas as pd
import numpy as np
import math

# charting and graphics
import matplotlib as mpl
import matplotlib.pyplot as plt
# from matplotlib import colors
import matplotlib.ticker as ticker
import matplotlib.dates as mdates
from matplotlib.gridspec import GridSpec
import matplotlib.image as mpimg

# os and file types
import os
import sys
import datetime as dt
import json
import csv

In [3]:
# redlist preprocess
redlist = pd.read_csv("C:\\Users\\visitor\\Documents\\GitHub\\flora-biel-2022\\data\\species_lists\\rl.csv", sep = ';', encoding = 'latin1')

# reduce the redlist criteria for each region to the first letter of each criterion,
# which denotes the general reason behind the classification.
# at this stage, no need for the rest of the details
mycriteria = ['crit_CH','crit_JU','crit_MP']

for i in redlist.index:
    for j in mycriteria:
        if not pd.isna(redlist.loc[i,j]):
            # criteria are separated by ';': strip spaces/tabs and skip empty pieces
            split = [s.strip() for s in redlist.at[i,j].split(";")]
            # keep only the leading letter of every criterion
            redlist.at[i,j] = [s[0] for s in split if s]
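
The criteria reduction above can also be written as a small function applied to each column. This is a minimal sketch on made-up values, not the notebook's own method: `first_letters` and the `toy` frame are hypothetical names introduced for illustration, and the sketch assumes criteria are separated by `;` as in the loop above.

```python
import pandas as pd

def first_letters(value):
    """Return the leading letter of each ';'-separated criterion, or the missing value unchanged."""
    if pd.isna(value):
        return value
    parts = [p.strip() for p in str(value).split(";")]
    # skip empty pieces, e.g. those left behind by a trailing ';'
    return [p[0] for p in parts if p]

# toy frame standing in for the redlist; the values are made up for illustration
toy = pd.DataFrame({"crit_CH": ["B2ab(iii); C2a(i)", "A2c", None]})
toy["crit_CH"] = toy["crit_CH"].apply(first_letters)
```

Unlike a loop that only handles the first two entries, this version treats every criterion in the cell the same way.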



In [5]:
print(redlist.columns)

Index(['FAMILY', 'GENUS', 'Scientific name', 'Deutscher Name',
       'Nom en français', 'CH', 'crit_CH', 'JU', 'crit_JU', 'MP', 'crit_MP'],
      dtype='object')


In [4]:
# drop columns not relevant, write to new csv file
# 'ID_ISFS' is not among the columns printed above, so it is left out here to avoid a KeyError
keep_redlist = ['FAMILY','GENUS','Scientific name','CH','crit_CH', 'JU', 'crit_JU', 'MP', 'crit_MP']
red = redlist[keep_redlist].copy()
red.to_csv("output/redlist_preproc.csv")

In [None]:
# keys for interpreting redlist data
redlist_key_places = {'JU':'Jura','MP':'Middle Plateau'}
redlist_key_status = {'EN':'Endangered', 'VU': 'Vulnerable','RE':'Regionally Extinct','CR': 'Critically Endangered',
                     'NT': 'Near Threatened','LC':'Least Concern','DD':'Data deficient',
                      'NA':'Not Applicable','NE': 'Not Evaluated'}
redlist_key_criteria = {'A': 'decrease in population size','B':'Habitat fragmentation',
                        'C':'initial small population, decrease','D':'very small habitat/population size'}



# key to interpret infoflora survey data

# key to interpret priority data, to finish
priority_canton_key = {'JU':'Jura','BE':'Bern'}
# priority_laws_key = {'Espèce cible forestière':'','Espèce agricole OEA','Espèce endémique','Espèce Émeraude':'Bern convention','Espèce protégée':'Protected under art. 20'}
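
As an illustration of how such keys might be used later, the sketch below decodes a status column with `Series.map`. The `toy` frame, the shortened `status_key`, and the `CH_label` column are assumptions made for this example. Only the idea of a code-to-label dictionary comes from the cell above.

```python
import pandas as pd

# subset of the status key, repeated so the sketch runs on its own
status_key = {"EN": "Endangered", "VU": "Vulnerable", "LC": "Least Concern"}

# made-up rows for illustration; "XX" is a code that is deliberately not in the key
toy = pd.DataFrame({"Scientific name": ["species a", "species b", "species c"],
                    "CH": ["EN", "LC", "XX"]})

# codes missing from the key become NaN, which makes gaps in the key easy to spot
toy["CH_label"] = toy["CH"].map(status_key)
unmapped = toy.loc[toy["CH_label"].isna(), "CH"].unique().tolist()
```

Checking `unmapped` after decoding is a cheap way to find codes that the key does not yet cover.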

In [None]:
# priority species 
priority = pd.read_excel('C:\\Users\\visitor\\Documents\\GitHub\\flora-biel-2022\\data\\species_lists\\ch_priority_species.xlsx',header = 9)

In [None]:
# consolidate habitat data

myhabitats = list(range(1,10))
mylist = []

for i in priority.index:
    mysublist = []
    for j in myhabitats:
        if priority.loc[i,j] == 'x':
            mysublist.append(j)
    mylist.append(mysublist)
priority["habitat"] = mylist

# consolidate provided legal status data

mylaws = ['Waldzielart','Landwirtschaftl. UZL-Art','Endemische Art','Smaragd- Art','Geschützte Art NHV']

mylist = []

for i in priority.index:
    mysublist = []
    for j in mylaws:
        if not pd.isna(priority.loc[i,j]):
            mysublist.append(j)
    mylist.append(mysublist)
priority["protection"] = mylist
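
The two consolidation loops follow the same pattern: collect, per row, the labels of the columns that are marked. A minimal sketch of that pattern on a made-up table is shown here. The `toy` frame and its three habitat columns are assumptions; the real sheet uses columns 1 to 9 marked with `x`.

```python
import pandas as pd

# made-up table with three habitat columns labelled 1..3, marked with 'x'
toy = pd.DataFrame({1: ["x", None, "x"], 2: [None, None, "x"], 3: ["x", None, None]})
habitat_cols = [1, 2, 3]

# boolean table: True where a habitat is marked
marked = toy[habitat_cols].eq("x")

# for each row, keep the labels of the marked columns
habitat_lists = [[c for c, hit in zip(habitat_cols, row) if hit]
                 for row in marked.to_numpy()]
toy["habitat"] = pd.Series(habitat_lists, index=toy.index)
```

For the legal status columns, the same idea would apply with `notna()` in place of `eq("x")`, since any non-empty cell counts there.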

In [None]:
keep_priority = ['Taxon ID','Taxon ID InfoSpecies','Taxon Name','Habitatkombination','Jura','Mittelland','habitat','Kollin','Montan','JU','BE','Priorität','Verantwortung','protection']
pri = priority[keep_priority].copy()
pri.to_csv("output/priority_preproc.csv")
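
Both export cells select a fixed list of columns and write into an `output` folder. Selecting a column that is not in the table raises a `KeyError`, and `to_csv` fails if the folder does not exist. The sketch below shows one way to guard against both. `split_columns` is a hypothetical helper, the `toy` frame is made up, and a temporary folder stands in for `output`.

```python
import os
import tempfile
import pandas as pd

def split_columns(df, wanted):
    """Return (present, missing): the wanted column names found in df, and those that are not."""
    present = [c for c in wanted if c in df.columns]
    missing = [c for c in wanted if c not in df.columns]
    return present, missing

# made-up frame for illustration; 'ID_ISFS' is deliberately absent
toy = pd.DataFrame({"FAMILY": ["fam a"], "GENUS": ["gen a"], "CH": ["LC"]})
present, missing = split_columns(toy, ["ID_ISFS", "FAMILY", "GENUS", "CH"])

# write to a temporary folder here; the notebook itself writes to "output/"
out_dir = os.path.join(tempfile.mkdtemp(), "output")
os.makedirs(out_dir, exist_ok=True)  # no error if the folder already exists
out_path = os.path.join(out_dir, "toy_preproc.csv")
toy[present].to_csv(out_path)
```

Printing `missing` before export makes it clear which expected columns were not found, rather than silently dropping them.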