# 1. Importing Libraries

In this section, we import the libraries used throughout the project:

- numpy for numerical computing
- pandas for data manipulation
- matplotlib for plotting
- seaborn for plotting
- sklearn for machine learning
- scipy for scientific computing

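```python
# Imports
import numpy as np
import pandas as pd

import scipy
import sklearn
import seaborn as sns
import matplotlib.pyplot as plt
from IPython.display import display  # shows intermediate tables in the middle of a cell
```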

# 2. Loading the Data

In this section, we load the six FAOSTAT CSV exports used in this project into pandas DataFrames with `pd.read_csv`.

```python
# Define path to data
file_path = "Data/"

# Load the data
crop_yield_df = pd.read_csv(file_path + "Crops production indicators - FAOSTAT_data_en_2-22-2024.csv")
fertiliser_use_df = pd.read_csv(file_path + "Fertilizers use - FAOSTAT_data_en_2-27-2024.csv")
land_temperature_change_df = pd.read_csv(file_path + "Land temperature change - FAOSTAT_data_en_2-27-2024.csv")
pesticides_use_df = pd.read_csv(file_path + "Pesticides use - FAOSTAT_data_en_2-27-2024.csv")
crop_value_df = pd.read_csv(file_path + "Food trade indicators - FAOSTAT_data_en_2-22-2024.csv")
land_use_df = pd.read_csv(file_path + "Land use - FAOSTAT_data_en_2-22-2024.csv")
```
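If the list of files grows, one option is to keep the file names in a dictionary and read them in a loop. The sketch below uses a hypothetical helper, `load_tables`, and `pathlib.Path` instead of string concatenation; the CSV written in the usage part is a made-up stand-in, not a real FAOSTAT export.

```python
from pathlib import Path
import tempfile

import pandas as pd


def load_tables(data_dir, file_names):
    """Read each CSV in `file_names` from `data_dir`, keyed by a short name (hypothetical helper)."""
    data_dir = Path(data_dir)
    return {key: pd.read_csv(data_dir / name) for key, name in file_names.items()}


# Usage with a throwaway CSV (made-up contents)
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "example.csv").write_text("Area,Year,Value\nAfghanistan,2000,1.5\n")
    # read_csv loads eagerly, so the data survives the temporary directory being deleted
    tables = load_tables(tmp, {"example": "example.csv"})

print(tables["example"])
```

With the real data, the dictionary values would be the six file names listed in the cell above.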


# 3. Data Preparation

In this section, we prepare the data for analysis. For each DataFrame, we perform the following steps:
- Explore the data
- Check for missing values
- Check for duplicate rows
- Drop unnecessary columns
- Group data if necessary to get totals and averages
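The missing-value, duplicate and column-dropping steps repeat for every DataFrame, so they could be wrapped in one function. This is a sketch with a hypothetical helper name, `prepare`, run on a made-up frame. Unlike the cells below, it passes `errors="ignore"` to `drop` so that a column absent from one export (such as `Note`) does not raise a `KeyError`.

```python
import pandas as pd


def prepare(df, drop_cols):
    """Report missing values and duplicate rows, then drop the listed columns (hypothetical helper)."""
    print("Missing values per column:")
    print(df.isnull().sum())
    print("Duplicate rows:", df.duplicated().sum())
    # errors="ignore" skips names that are not present in this particular export
    return df.drop(columns=drop_cols, errors="ignore")


# Made-up rows: the first two are exact duplicates
toy = pd.DataFrame({
    "Area": ["A", "A", "B"],
    "Year": [2000, 2000, 2001],
    "Flag": ["E", "E", None],
    "Value": [1.0, 1.0, None],
})
cleaned = prepare(toy, ["Flag", "Note"])
print(cleaned.columns.tolist())  # ['Area', 'Year', 'Value']
```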

## 3.1. Crop Yield Data

```python
# Display the first few rows of the crop yield data
display(crop_yield_df.head())

# Check for missing values
display(crop_yield_df.isnull().sum())

# Check for duplicate rows
print("Duplicate rows:", crop_yield_df.duplicated().sum())

# Drop unnecessary columns
crop_yield_df = crop_yield_df.drop(columns=["Domain", "Domain Code", "Element Code", "Element", "Year Code", "Unit", "Flag", "Flag Description", "Note"])

# Display the first few rows of the crop yield data
crop_yield_df.head()
```

|   | Area Code (M49) | Area | Item Code (CPC) | Item | Year | Value |
|---|---|---|---|---|---|---|
| 0 | 4 | Afghanistan | F1717 | Cereals, primary | 2000 | 8063 |
| 1 | 4 | Afghanistan | F1717 | Cereals, primary | 2001 | 10067 |
| 2 | 4 | Afghanistan | F1717 | Cereals, primary | 2002 | 16698 |
| 3 | 4 | Afghanistan | F1717 | Cereals, primary | 2003 | 14580 |
| 4 | 4 | Afghanistan | F1717 | Cereals, primary | 2004 | 13348 |


## 3.2. Fertiliser Use Data

```python
# Display the first few rows of the fertiliser use data
display(fertiliser_use_df.head())

# Check for missing values
display(fertiliser_use_df.isnull().sum())

# Check for duplicate rows
print("Duplicate rows:", fertiliser_use_df.duplicated().sum())

# Drop unnecessary columns
fertiliser_use_df = fertiliser_use_df.drop(columns=["Domain", "Domain Code", "Element Code", "Element", "Year Code", "Unit", "Flag", "Flag Description", "Item Code", "Item"])

# Group data by country and year to get the total fertiliser use
fertiliser_use_df = fertiliser_use_df.groupby(["Area", "Year", "Area Code (M49)"]).sum().reset_index()

# Display the first few rows of the fertiliser use data
fertiliser_use_df.head()
```

|   | Area | Year | Area Code (M49) | Value |
|---|---|---|---|---|
| 0 | Afghanistan | 2002 | 4 | 17900.0 |
| 1 | Afghanistan | 2003 | 4 | 33200.0 |
| 2 | Afghanistan | 2004 | 4 | 90000.0 |
| 3 | Afghanistan | 2005 | 4 | 20577.0 |
| 4 | Afghanistan | 2006 | 4 | 68253.0 |
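The grouping step sums `Value` over all fertiliser items within each country-year. A small made-up frame shows the effect; the item labels and numbers are invented. Selecting `["Value"]` explicitly keeps the sum from touching any text column that might still be present.

```python
import pandas as pd

# Made-up rows standing in for several fertiliser items per country-year
toy = pd.DataFrame({
    "Area": ["Afghanistan", "Afghanistan", "Afghanistan"],
    "Area Code (M49)": [4, 4, 4],
    "Year": [2002, 2002, 2003],
    "Item": ["N", "P2O5", "N"],
    "Value": [100.0, 50.0, 80.0],
})

# Sum Value across items within each country-year
totals = toy.groupby(["Area", "Year", "Area Code (M49)"], as_index=False)["Value"].sum()
print(totals)
```

Here 2002 collapses to a single row with `Value` 150.0, and 2003 keeps 80.0.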


## 3.3. Land Temperature Change Data

```python
# Display the first few rows of the land temperature change data
display(land_temperature_change_df.head())

# Keep only the annual (meteorological year) values
land_temperature_change_df = land_temperature_change_df[land_temperature_change_df["Months"] == "Meteorological year"]

# Split into two DataFrames: one for temperature change and one for standard deviation.
# The standard-deviation rows are taken first; filtering the already-overwritten
# temperature-change frame for "Standard Deviation" would return an empty DataFrame.
land_temperature_change_std_df = land_temperature_change_df[land_temperature_change_df["Element"] == "Standard Deviation"]
land_temperature_change_df = land_temperature_change_df[land_temperature_change_df["Element"] == "Temperature change"]

# Drop unnecessary columns
land_temperature_change_df = land_temperature_change_df.drop(columns=["Domain", "Domain Code", "Element Code", "Element", "Year Code", "Unit", "Flag", "Flag Description", "Months", "Months Code"])

# If all values for the country are missing, drop the country
land_temperature_change_df = land_temperature_change_df.groupby("Area").filter(lambda x: x["Value"].notna().any())

# Impute the remaining missing values with the mean for the country
land_temperature_change_df["Value"] = land_temperature_change_df.groupby("Area")["Value"].transform(lambda x: x.fillna(x.mean()))

# Check for missing values
land_temperature_change_df.isnull().sum()
```

```
Area Code (M49)    0
Area               0
Year               0
Value              0
dtype: int64
```
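The filter-then-impute logic above can be checked on a small made-up frame. This sketch uses `transform("mean")` together with `fillna`, which fills the same values as the per-group lambda in the cell above because `mean` skips missing values. Country `B`, which has no observed value at all, is removed by the filter before imputation.

```python
import numpy as np
import pandas as pd

# Made-up temperature changes: A has one gap, B has no observed values
toy = pd.DataFrame({
    "Area": ["A", "A", "A", "B", "B"],
    "Year": [2000, 2001, 2002, 2000, 2001],
    "Value": [0.5, np.nan, 1.5, np.nan, np.nan],
})

# Keep only countries with at least one observed value
kept = toy.groupby("Area").filter(lambda g: g["Value"].notna().any())

# Fill the remaining gaps with each country's own mean
kept = kept.assign(Value=kept["Value"].fillna(kept.groupby("Area")["Value"].transform("mean")))
print(kept)
```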

## 3.4. Pesticides Use Data

```python
# Display the first few rows of the pesticides use data
display(pesticides_use_df.head())

# Get only the total pesticides used
pesticides_use_df = pesticides_use_df[pesticides_use_df["Item Code"] == 1357]

# Drop unnecessary columns
pesticides_use_df = pesticides_use_df.drop(columns=["Domain", "Domain Code", "Element Code", "Element", "Year Code", "Flag", "Flag Description", "Item Code", "Item", "Note"])

# Check for missing values
display(pesticides_use_df.isnull().sum())

# Check for duplicate rows
pesticides_use_df.duplicated().sum()
```

0
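A duplicate count of zero on full rows does not guarantee one row per country-year. Depending on which elements were selected in the FAOSTAT download, dropping `Element` could leave several rows per country-year that only `Unit` tells apart. A stricter check, sketched here on made-up rows, looks at `Area` and `Year` only and confirms a single unit.

```python
import pandas as pd

# Made-up rows in the shape left after filtering on one item code
toy = pd.DataFrame({
    "Area": ["Afghanistan", "Afghanistan", "Albania"],
    "Year": [1990, 1991, 1990],
    "Unit": ["t", "t", "t"],
    "Value": [10.0, 12.0, 7.0],
})

# With a single item and element, each country-year should occur exactly once
repeats = toy.duplicated(subset=["Area", "Year"]).sum()
assert repeats == 0, f"{repeats} repeated country-year rows"

# One unit means the values are directly comparable
assert toy["Unit"].nunique() == 1
```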

## 3.5. Crop Value Data

```python
# Display the first few rows of the crop value data
display(crop_value_df.head())

# Drop unnecessary columns
crop_value_df = crop_value_df.drop(columns=["Domain", "Domain Code", "Year Code", "Flag", "Flag Description", "Note"])

# Check for missing values
display(crop_value_df.isnull().sum())

# Split the data into two DataFrames: one for imports and one for exports
crop_value_imports_df = crop_value_df[crop_value_df["Element"] == "Import Value"]
crop_value_exports_df = crop_value_df[crop_value_df["Element"] == "Export Value"]

# Drop the element columns, which are now constant within each DataFrame
crop_value_imports_df = crop_value_imports_df.drop(columns=["Element", "Element Code"])
crop_value_exports_df = crop_value_exports_df.drop(columns=["Element", "Element Code"])
```
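When a table holds more than two elements, splitting it one boolean mask at a time gets repetitive. One alternative, sketched on a made-up frame with placeholder element codes, builds a dictionary with one DataFrame per `Element` value from a single `groupby`, always starting from the unfiltered frame.

```python
import pandas as pd

# Made-up trade values; Element Code values are placeholders, not FAOSTAT codes
toy = pd.DataFrame({
    "Area": ["A", "A", "B", "B"],
    "Year": [2000, 2000, 2000, 2000],
    "Element": ["Import Value", "Export Value", "Import Value", "Export Value"],
    "Element Code": [1, 2, 1, 2],
    "Value": [10.0, 4.0, 7.0, 9.0],
})

# One DataFrame per Element, each without the now-constant element columns
by_element = {
    name: group.drop(columns=["Element", "Element Code"])
    for name, group in toy.groupby("Element")
}
imports_df = by_element["Import Value"]
exports_df = by_element["Export Value"]
print(imports_df)
```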

## 3.6. Land Use Data

```python
# Display the first few rows of the land use data
display(land_use_df.head())

# Drop unnecessary columns
land_use_df = land_use_df.drop(columns=["Domain", "Domain Code", "Year Code", "Flag", "Flag Description", "Note", "Element", "Element Code"])

land_use_df.head()
```

|   | Area Code (M49) | Area | Item Code | Item | Year | Unit | Value |
|---|---|---|---|---|---|---|---|
| 0 | 4 | Afghanistan | 6600 | Country area | 1980 | 1000 ha | 65286.0 |
| 1 | 4 | Afghanistan | 6600 | Country area | 1981 | 1000 ha | 65286.0 |
| 2 | 4 | Afghanistan | 6600 | Country area | 1982 | 1000 ha | 65286.0 |
| 3 | 4 | Afghanistan | 6600 | Country area | 1983 | 1000 ha | 65286.0 |
| 4 | 4 | Afghanistan | 6600 | Country area | 1984 | 1000 ha | 65286.0 |
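The land use data is in long format, with one row per land use item. If one column per item is needed later, for example to join with the other country-year tables, `pivot_table` can reshape it. This is a sketch of one possible next step rather than part of the cells above; apart from the `Country area` value shown in the output, the numbers are made up.

```python
import pandas as pd

# Made-up long-format rows (only the Country area value comes from the output above)
toy = pd.DataFrame({
    "Area": ["Afghanistan"] * 4,
    "Year": [1980, 1980, 1981, 1981],
    "Item": ["Country area", "Cropland", "Country area", "Cropland"],
    "Unit": ["1000 ha"] * 4,
    "Value": [65286.0, 100.0, 65286.0, 110.0],
})

# One row per country-year, one column per land use item
wide = toy.pivot_table(index=["Area", "Year"], columns="Item", values="Value", aggfunc="first").reset_index()
wide.columns.name = None  # drop the leftover "Item" label on the column index
print(wide)
```

`aggfunc="first"` assumes each country-year-item combination appears once; if it does not, a summing or averaging function would be needed instead.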
