## Adding New Data to Our Dataframe
In this section we add the two months that were missing from the original data (November and December 2017). We'll import the new data from a CSV file, format the columns so they match our main dataframe, and then append the new dataframe to the old one.

Many of the steps repeat what we did for the original dataframe. Combining the two dataframes with the `.append` method is one of the final steps.

In [2]:
import pandas as pd

In [80]:
df = pd.read_csv("C:/Users/user/Dropbox/Data Analysis/Portfolio/Data Sets/Deforestation/Monitoring_Data_First_Step.csv",
                index_col = 0)

# Let's import the missing months (November and December 2017)
miss_data = pd.read_csv("C:/Users/user/Dropbox/Data Analysis/Portfolio/Data Sets/Deforestation/Nov_Dec_2017_Guyra.csv")

In [54]:
# Let's check to see if all months are named correctly:
miss_data.groupby("Month").count()

Month,Year,mth_num,Country,Prov_Depto,Detpo_Distr_Mun,Deforestation (ha)
December,77,77,77,77,77,77
November,73,73,73,73,73,73
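
Counting rows per month only exposes misspelled names by eye. A minimal sketch of an explicit check follows, assuming English month names and the default locale; `find_bad_months` is a hypothetical helper and the sample values are made up for illustration, not taken from the dataset.

```python
import calendar

# calendar.month_name[1:] is ['January', ..., 'December'] in the default locale
VALID_MONTHS = set(calendar.month_name[1:])

def find_bad_months(months):
    """Return the sorted values that are not valid English month names."""
    return sorted(set(months) - VALID_MONTHS)

# Illustrative values only: one misspelling and one lowercase name with a trailing space
sample = ["November", "December", "Decembre", "november "]
bad = find_bad_months(sample)
print(bad)
```

In the notebook this could be called as `find_bad_months(miss_data["Month"])`; an empty list would mean every month name is recognised.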


In [81]:
# Let's rename "Deforestation (ha)" column to "Deforestation_ha"
miss_data.rename(index = str, columns = {"Deforestation (ha)": "Deforestation_ha"}, inplace = True)

# Drop the mth_num column, since we won't use it. 
miss_data.drop("mth_num", axis = 1, inplace = True)

In [82]:
miss_data.head()

,Year,Month,Country,Prov_Depto,Detpo_Distr_Mun,Deforestation_ha
0,2017,November,Argentina,Catamarca,La Paz,144
1,2017,November,Argentina,Chaco,9 de Julio,363
2,2017,November,Argentina,Chaco,Almirante Brown,2548
3,2017,November,Argentina,Chaco,Comandante Fernández,144
4,2017,November,Argentina,Chaco,General Belgrano,263


In [22]:
# Map each month name to an "MM-DD" string for the last day of that month.
# Note: February is fixed at '02-28', so leap years are not handled.
def month_to_number(month):
    name = {
        "January": '01-31',
        "February": '02-28',
        "March": '03-31',
        "April": '04-30',
        "May": '05-31',
        "June": '06-30',
        "July": '07-31',
        "August": '08-31',
        "September": '09-30',
        "October": '10-31',
        "November": '11-30',
        "December": '12-31'
    }
    return name[month]
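
The lookup table above always maps February to the 28th, which is off by one day in leap years such as 2012 and 2016. If that matters for your data, one possible alternative is to compute the month end from the year. The sketch below uses only the standard library; `month_end` is a hypothetical helper, not part of the original notebook, and it assumes English month names.

```python
import calendar

def month_end(year, month_name):
    """Return 'YYYY-MM-DD' for the last day of the given month, leap years included."""
    # list(calendar.month_name) is ['', 'January', ..., 'December'], so index() gives 1..12
    month = list(calendar.month_name).index(month_name)
    # monthrange returns (weekday of the first day, number of days in the month)
    last_day = calendar.monthrange(year, month)[1]
    return f"{year}-{month:02d}-{last_day:02d}"

print(month_end(2017, "November"))
print(month_end(2016, "February"))
```

Because the result already contains the year, it could fill the `Date` column directly instead of going through `month_day`.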

In [83]:
# Create a month_day column
miss_data["month_day"] = miss_data["Month"].apply(month_to_number)

In [84]:
# Create the "Date" column by combining "Year" with "month_day"
miss_data["Date"] = miss_data["Year"].map(str) + "-" + miss_data["month_day"]

In [85]:
# Parse "Date" into datetimes, then add a column of Matplotlib date numbers
from datetime import datetime
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

miss_data["Date"] = miss_data["Date"].apply(lambda x: datetime.strptime(x, "%Y-%m-%d"))
miss_data["date_num"] = mdates.date2num(miss_data["Date"])
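
The `date_num` values in this notebook's output (for example 736663.0 for 2017-11-30) follow Matplotlib's older convention of counting days from 0001-01-01. From Matplotlib 3.3 on, the default epoch is 1970-01-01, so the same date gives a much smaller number. The sketch below reproduces both conventions with the standard library only; the two helper functions are hypothetical names used for illustration.

```python
from datetime import date

UNIX_EPOCH_ORDINAL = date(1970, 1, 1).toordinal()  # 719163

def legacy_date_num(d):
    """Days since 0001-01-01, plus one: the convention the values in this notebook follow."""
    return float(d.toordinal())

def unix_epoch_date_num(d):
    """Days since 1970-01-01: Matplotlib's default epoch from version 3.3 on."""
    return float(d.toordinal() - UNIX_EPOCH_ORDINAL)

nov_30 = date(2017, 11, 30)
print(legacy_date_num(nov_30))      # 736663.0
print(unix_epoch_date_num(nov_30))  # 17500.0
```

If the main dataframe was built under one convention and the new months under the other, the `date_num` columns will not be comparable, so it is worth checking a known date in both before combining them.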

Let's check on the columns to see how we need to rearrange them:

In [103]:
miss_data.columns.tolist()

['Year',
 'Month',
 'Country',
 'Prov_Depto',
 'Detpo_Distr_Mun ',
 'Deforestation_ha',
 'month_day',
 'Date',
 'date_num']

In [87]:
# Reorder the columns to match the main dataframe:
# Date, Year, Month, month_day, Country ... Deforestation_ha, date_num
cols = miss_data.columns.tolist()
cols = [cols[7]] + [cols[0]] + [cols[1]] + [cols[6]] + cols[2:6] + [cols[-1]]
missing = miss_data[cols]
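
Reordering by position breaks silently if a column is added or dropped earlier in the notebook. A possible alternative is to select columns by name, reusing the main dataframe's own column order. The sketch below uses toy frames (`main` and `new` are stand-ins with fewer columns and made-up values) rather than the real data.

```python
import pandas as pd

# Toy stand-ins for df and miss_data (illustrative values, fewer columns)
main = pd.DataFrame(columns=["Date", "Year", "Month", "Deforestation_ha"])
new = pd.DataFrame({
    "Year": [2017],
    "Month": ["November"],
    "Deforestation_ha": [144],
    "Date": ["2017-11-30"],
})

# Select the new frame's columns in the main frame's order, by name
reordered = new[main.columns]
print(reordered.columns.tolist())
```

Selecting by name raises a `KeyError` if the new frame lacks one of the expected columns, which makes a mismatch visible immediately.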

Now we can confirm that both dataframes have the same layout by checking that their column names match, in the same order:

In [105]:
dnames = df.columns.tolist()
missnames = missing.columns.tolist()
dnames == missnames

True
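
When this comparison returns `False`, the cause is often hard to spot, for example a trailing space in a column name. A small sketch for diagnosing such a mismatch is given below; `column_diff` is a hypothetical helper and the name lists are illustrative.

```python
def column_diff(left, right):
    """Return names found in only one list, as repr strings so stray whitespace is visible."""
    only_left = [repr(c) for c in left if c not in right]
    only_right = [repr(c) for c in right if c not in left]
    return only_left, only_right

# Illustrative lists: the second name differs only by a trailing space
a = ["Year", "Detpo_Distr_Mun "]
b = ["Year", "Detpo_Distr_Mun"]
only_a, only_b = column_diff(a, b)
print(only_a, only_b)
```

In the notebook it could be called as `column_diff(dnames, missnames)`; two empty lists mean the names agree, although the order still needs the `==` check above.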

In [88]:
missing.head()

,Date,Year,Month,month_day,Country,Prov_Depto,Detpo_Distr_Mun,Deforestation_ha,date_num
0,2017-11-30,2017,November,11-30,Argentina,Catamarca,La Paz,144,736663.0
1,2017-11-30,2017,November,11-30,Argentina,Chaco,9 de Julio,363,736663.0
2,2017-11-30,2017,November,11-30,Argentina,Chaco,Almirante Brown,2548,736663.0
3,2017-11-30,2017,November,11-30,Argentina,Chaco,Comandante Fernández,144,736663.0
4,2017-11-30,2017,November,11-30,Argentina,Chaco,General Belgrano,263,736663.0


In [70]:
df.head()

,Date,Year,Month,month_day,Country,Prov_Depto,Detpo_Distr_Mun,Deforestation_ha,date_num
0,2012-03-31,2012,March,03-31,Argentina,Catamarca,La Paz,105.0,734593.0
1,2012-03-31,2012,March,03-31,Argentina,Catamarca,Santa Rosa,290.3,734593.0
2,2012-03-31,2012,March,03-31,Argentina,Chaco,12 de Octubre,9.6,734593.0
3,2012-03-31,2012,March,03-31,Argentina,Chaco,Almirante Brown,2004.7,734593.0
4,2012-03-31,2012,March,03-31,Argentina,Chaco,General Güemes,478.8,734593.0


### Merging Both Dataframes
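Finally, let's combine both dataframes.
There are several ways we could do this, but the `DataFrame.append` method is convenient when you want to concatenate dataframes along axis=0, i.e. the index. Note that `DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0; `pd.concat` does the same job in all versions.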

In [106]:
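# pd.concat([df, missing]) is equivalent to df.append(missing), which was removed in pandas 2.0
new_df = pd.concat([df, missing])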
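
Stacking two dataframes row-wise keeps each frame's original index, so the combined frame can contain repeated index labels. The sketch below shows this on two toy frames (the names and values are illustrative) and how `ignore_index=True` renumbers the rows instead.

```python
import pandas as pd

# Toy frames standing in for df and missing
old = pd.DataFrame({"Deforestation_ha": [105.0, 290.3]})
new = pd.DataFrame({"Deforestation_ha": [144, 363]})

stacked = pd.concat([old, new])                        # keeps both original indexes
renumbered = pd.concat([old, new], ignore_index=True)  # fresh 0..n-1 index

print(stacked.index.tolist())
print(renumbered.index.tolist())
```

Whether to renumber is a design choice: keeping the original labels preserves each row's position in its source file, while a fresh index makes label-based lookups unambiguous.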

One final check confirms that `len(new_df)` equals `len(df)` + `len(missing)`.

In [99]:
print("Length of df is", len(df))
print("Length of missing is", len(missing))
print("Length of new_df should be", len(df) + len(missing))
print("Length of new_df is", len(new_df))

Length of df is 3872
Length of missing is 150
Length of new_df should be 4022
Length of new_df is 4022


### Saving To File
Now we can save `new_df` to a .csv file:

In [100]:
new_df.to_csv("C:/Users/user/Dropbox/Data Analysis/Portfolio/Data Sets/Deforestation/Monitoring_Data_Processed.csv")
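
By default `to_csv` writes the index as an unnamed first column, which is why the main file was loaded with `index_col = 0` at the start of this section. The sketch below illustrates the round trip on a toy frame, using an in-memory buffer instead of a file; the frame and its values are made up for illustration.

```python
import io
import pandas as pd

frame = pd.DataFrame({"Year": [2017, 2017], "Deforestation_ha": [144, 363]})

buffer = io.StringIO()
frame.to_csv(buffer)  # the index is written as an unnamed first column

buffer.seek(0)
naive = pd.read_csv(buffer)  # the index comes back as an "Unnamed: 0" column

buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)  # the first column is used as the index

print(naive.columns.tolist())
print(restored.columns.tolist())
```

Passing `index=False` to `to_csv` is another option when the index carries no information worth keeping.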