# Forecasting the Impact of Exchange Rate and Inflation Rate on Our Country's Gross Domestic Product (GDP) Fluctuations

## Problem Statement:
The objective of this project is to develop a predictive model that can forecast the impact of exchange rate and inflation rate on our country's Gross Domestic Product (GDP) fluctuations. The model aims to leverage historical data on exchange rates, inflation rates, and corresponding GDP growth to provide insights into the relationship between these economic indicators and GDP performance.

## Justification for Model Selection:
1. Linear Regression:
   Linear regression is chosen due to its simplicity and interpretability. It assumes a linear relationship between the features (exchange rate and inflation rate) and the target variable (GDP). As our initial approach, this model allows us to establish a baseline for prediction and understand the direct impact of exchange rate and inflation rate on GDP fluctuations.

2. Support Vector Machine Regression (SVR):
   SVR is employed because it can handle non-linear relationships between the features and the target variable. Its performance depends on hyperparameters such as the kernel, 'C', and 'epsilon', which we tune with a grid search. Since economic relationships may not always be linear, SVR offers a more flexible approach to capture complex interactions between exchange rate, inflation rate, and GDP fluctuations.

3. Decision Trees and Random Forests:
   Decision trees and random forests are utilized because they can handle both regression tasks (to predict GDP fluctuations) and classification tasks (to classify positive/negative GDP growth). These models are capable of capturing non-linear patterns in the data and have the potential to provide interpretable insights into the factors influencing GDP fluctuations.

The selection of these models allows us to explore different aspects of the relationship between exchange rate, inflation rate, and GDP fluctuations. By comparing their performance on testing data, we can identify which model best captures the underlying patterns in the data and provides the most accurate and meaningful predictions for decision-making and economic analysis.

Through the predictive model, we aim to provide stakeholders with valuable insights into how exchange rate and inflation rate impact our country's GDP fluctuations. This can help policymakers, businesses, and investors make informed decisions, anticipate economic trends, and implement effective strategies to manage economic risks and opportunities. As the model is refined through hyperparameter tuning and evaluation, its ability to forecast GDP fluctuations accurately will be enhanced, contributing to robust economic planning and development.

In [16]:
import pandas as pd

# Load the datasets
gdp_data = pd.read_csv("Annual GDP (1).csv")
inflation_data = pd.read_csv("Inflation Rates.csv")
exchange_rates_data = pd.read_csv("Key CBK Indicative Exchange Rates (1).csv")

# Display the first few rows of each dataset to understand their structure
print(gdp_data.head())
print(inflation_data.head())
print(exchange_rates_data.head())

   Year Nominal GDP prices (Ksh Million)  Annual GDP growth (%)  \
0  2022                       13,368,340                    4.8   
1  2021                       12,027,662                    7.6   
2  2020                       10,715,070                   -0.3   
3  2019                       10,237,727                    5.1   
4  2018                        9,340,307                    5.6   

  Real GDP prices (Ksh Million)  
0                     9,851,329  
1                     9,395,942  
2                     8,733,060  
3                     8,756,946  
4                     8,330,891  
   Year     Month  Annual Average Inflation  12-Month Inflation
0  2023       May                      8.78                8.03
1  2023      June                      8.77                7.88
2  2023     April                      8.71                7.90
3  2023     March                      8.59                9.19
4  2023  February                      8.30                9.23
       Da

### Step 2: Data Preprocessing and Merging

In [17]:
# 2.1: Merge GDP and Inflation Data based on 'Year'
merged_data = pd.merge(gdp_data, inflation_data, on='Year', how='inner')

# 2.2: Convert 'Date' column to 'Year' in Exchange Rates Data
exchange_rates_data['Year'] = pd.to_datetime(exchange_rates_data['Date']).dt.year

# 2.3: Merge Exchange Rates Data based on 'Year'
merged_data = pd.merge(merged_data, exchange_rates_data, on='Year', how='inner')
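The GDP table has one row per year, while the inflation table is monthly and the exchange rate table has many rows per year, so joining on 'Year' pairs every inflation row with every exchange rate row of the same year. The following is a minimal sketch of one way to avoid that, assuming it is acceptable to average each series per year before merging. The miniature tables are hypothetical: the GDP growth figures are taken from the output above, but the inflation values, exchange rate values, and dates are made up for illustration.

```python
import pandas as pd

# Hypothetical miniature versions of the three tables.
# Only the GDP growth values come from the notebook output; the rest is made up.
gdp = pd.DataFrame({"Year": [2021, 2022], "Annual GDP growth (%)": [7.6, 4.8]})
inflation = pd.DataFrame({
    "Year": [2021, 2021, 2022, 2022],
    "Month": ["January", "February", "January", "February"],
    "12-Month Inflation": [5.7, 5.8, 5.4, 5.1],
})
rates = pd.DataFrame({
    "Date": ["2021-01-04", "2021-01-05", "2022-01-03", "2022-01-04"],
    "Mean": [109.2, 109.4, 113.1, 113.2],
})
rates["Year"] = pd.to_datetime(rates["Date"]).dt.year

# Joining on 'Year' directly multiplies rows within each year:
# 1 GDP row x 2 inflation rows x 2 exchange rate rows.
naive = gdp.merge(inflation, on="Year").merge(rates, on="Year")

# Collapsing each table to one row per year first keeps one row per year.
inflation_yearly = inflation.groupby("Year", as_index=False)["12-Month Inflation"].mean()
rates_yearly = rates.groupby("Year", as_index=False)["Mean"].mean()
annual = gdp.merge(inflation_yearly, on="Year").merge(rates_yearly, on="Year")

print(f"rows after naive merge: {len(naive)}")
print(f"rows after annual aggregation: {len(annual)}")
```

With the naive merge, each GDP growth value is repeated across many near-identical rows, which can place copies of the same observation in both the training and testing sets.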


### Step 3: Prepare the Features and Target Variable

In [18]:
# 3.1: Define Features (Exchange Rate, Inflation Rate) and Target Variable (GDP Growth)
features = merged_data[['Mean', 'Annual Average Inflation']]
target = merged_data['Annual GDP growth (%)']

### Step 4: Data Normalization or Scaling (if needed)

In [19]:
from sklearn.preprocessing import StandardScaler

# Normalize the features (Exchange Rate and Inflation Rate)
scaler = StandardScaler()
scaled_features = scaler.fit_transform(features)

### Step 5: Data Splitting (Training and Testing Sets)

In [20]:
from sklearn.model_selection import train_test_split

# Split the dataset into training (80%) and testing (20%) sets
X_train, X_test, y_train, y_test = train_test_split(scaled_features, target, test_size=0.2, random_state=42)
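In the steps above, the scaler is fitted on all rows before the split, so the test rows influence the scaling statistics. The following is a minimal sketch of an alternative ordering, assuming it is preferable to split first and let a scikit-learn pipeline fit the scaler on the training rows only. The data is synthetic and the coefficients are arbitrary; none of it comes from the project's datasets.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-ins for (exchange rate, inflation) and GDP growth.
rng = np.random.default_rng(42)
X = rng.normal(loc=[110.0, 6.0], scale=[5.0, 1.5], size=(40, 2))
y = 5.0 - 0.3 * (X[:, 1] - 6.0) + rng.normal(scale=0.5, size=40)

# Split first, so the test rows stay unseen during preprocessing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# The pipeline fits StandardScaler on X_train only, then fits the regressor.
model = make_pipeline(StandardScaler(), LinearRegression())
model.fit(X_train, y_train)

# The scaler's learned means match the training rows, not the full dataset.
scaler = model.named_steps["standardscaler"]
print("scaler means:", scaler.mean_)
print("test R2:", model.score(X_test, y_test))
```

The same pipeline object can be passed to GridSearchCV, in which case the scaler is refitted inside each cross-validation fold.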

## Model Selection and Training:

### a. Linear Regression:

We first import the necessary modules from scikit-learn, including the LinearRegression model for linear regression and the mean_squared_error and r2_score functions for evaluation metrics.

We then create a LinearRegression object as linear_model and train it using the training data X_train and y_train. After training, we use the model to predict the GDP growth on the testing data, obtaining y_pred.

Finally, we calculate the Mean Squared Error (MSE) and R-squared (R2) to evaluate the model's performance on the testing data. The lower the MSE and the closer the R2 to 1, the better the model's predictive performance.
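To make the two metrics concrete, here is a small hand computation, written as a sketch rather than as a replacement for the scikit-learn functions. MSE is the mean of the squared errors, and R2 is one minus the ratio of the residual sum of squares to the total sum of squares around the mean of the true values. The true values are the 2019 to 2022 GDP growth figures from the table above; the predictions are hypothetical.

```python
def mean_squared_error_manual(y_true, y_pred):
    """Average of the squared differences between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score_manual(y_true, y_pred):
    """1 - SS_res / SS_tot, where SS_tot is measured around the mean of y_true."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


y_true_demo = [4.8, 7.6, -0.3, 5.1]  # GDP growth (%) for 2022, 2021, 2020, 2019
y_pred_demo = [5.0, 7.0, 0.5, 5.0]   # hypothetical predictions, for illustration only

mse_demo = mean_squared_error_manual(y_true_demo, y_pred_demo)
r2_demo = r2_score_manual(y_true_demo, y_pred_demo)
print(f"MSE: {mse_demo:.4f}")
print(f"R2: {r2_demo:.4f}")
```

Because R2 divides by the spread of the true values, it is only informative when the true values in the test set actually vary.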

In [21]:
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Create a Linear Regression model
linear_model = LinearRegression()

# Train the model using the training data
linear_model.fit(X_train, y_train)

# Predict the GDP growth on the testing data
y_pred = linear_model.predict(X_test)

# Evaluate the model's performance using Mean Squared Error (MSE) and R-squared
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("Linear Regression Model Performance:")
print(f"Mean Squared Error (MSE): {mse:.2f}")
print(f"R-squared (R2): {r2:.2f}")

Linear Regression Model Performance:
Mean Squared Error (MSE): 0.00
R-squared (R2): 0.00


### b. Logistic Regression:
We first converted the GDP growth to binary labels using NumPy's where function, creating the binary target variables y_train_binary and y_test_binary: 1 when GDP growth is non-negative and 0 when it is negative.

Next, we created a LogisticRegression object as logistic_model and trained it using the training data X_train and y_train_binary. After training, we used the model to predict the binary labels on the testing data, obtaining y_pred_binary.

Finally, we evaluated the model's performance using accuracy, confusion matrix, and classification report. The accuracy represents the proportion of correctly predicted binary labels, while the confusion matrix and classification report provide more detailed information about true positives, true negatives, false positives, and false negatives. The code runs these steps only when both classes appear in the training and testing labels; otherwise it prints a message instead.

In [22]:
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Convert GDP growth to binary labels: 1 for non-negative growth (>= 0) and 0 for negative growth
y_train_binary = np.where(y_train >= 0, 1, 0)
y_test_binary = np.where(y_test >= 0, 1, 0)

# Check if both classes (0 and 1) are present in the binary labels
if np.unique(y_train_binary).size == 2 and np.unique(y_test_binary).size == 2:
    # Create a Logistic Regression model
    logistic_model = LogisticRegression()

    # Train the model using the training data
    logistic_model.fit(X_train, y_train_binary)

    # Predict the binary labels on the testing data
    y_pred_binary = logistic_model.predict(X_test)

    # Evaluate the model's performance using accuracy, confusion matrix, and classification report
    accuracy = accuracy_score(y_test_binary, y_pred_binary)
    conf_matrix = confusion_matrix(y_test_binary, y_pred_binary)
    class_report = classification_report(y_test_binary, y_pred_binary)

    print("Logistic Regression Model Performance:")
    print(f"Accuracy: {accuracy:.2f}")
    print("Confusion Matrix:")
    print(conf_matrix)
    print("Classification Report:")
    print(class_report)
else:
    print("Binary classification is not possible as the data contains samples from only one class.")

Binary classification is not possible as the data contains samples from only one class.
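Before fitting a classifier, it can help to count how many samples fall into each class. The following is a minimal sketch of such a check, using the five GDP growth values shown in the table above. The helper name has_both_classes is hypothetical and not part of the notebook.

```python
import numpy as np


def has_both_classes(growth_values):
    """Return True when the binary labels contain both 0 and 1."""
    labels = np.where(np.asarray(growth_values) >= 0, 1, 0)
    return np.unique(labels).size == 2


# GDP growth (%) for 2022 down to 2018, as displayed in the notebook output.
growth = np.array([4.8, 7.6, -0.3, 5.1, 5.6])
labels = np.where(growth >= 0, 1, 0)

# Count the samples in each class.
classes, counts = np.unique(labels, return_counts=True)
class_counts = dict(zip(classes.tolist(), counts.tolist()))
print(class_counts)

# A subset without the 2020 contraction contains a single class only.
print(has_both_classes(growth))           # all five years
print(has_both_classes([4.8, 7.6, 5.1]))  # years with non-negative growth only
```

When one class is rare, as negative growth is here, a random split can easily leave it out of the training or testing labels.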


### c. Support Vector Machines (SVM):

SVM can be used for both regression (SVR) and classification (SVC) tasks.

We first created an SVR object as svr_model. We then defined a hyperparameter grid param_grid containing three kernel options ('linear', 'poly', 'rbf'), three values of the regularization parameter 'C', and three values of the epsilon parameter, which controls the width of the epsilon-insensitive zone in the SVR loss function.

We used GridSearchCV to perform a grid search with 5-fold cross-validation, using negative mean squared error (neg_mean_squared_error) as the scoring metric.

After the search, we retrieved the best SVR model (best_svr_model) with the selected hyperparameters. We then used this model to predict the GDP growth on the testing data and evaluated its performance using Mean Squared Error (MSE) and R-squared.

The grid search identifies the combination of hyperparameters within the grid that gives the lowest cross-validated MSE on the training data. The best hyperparameters are displayed at the end of the output.

In [23]:
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV

# Create an SVR model
svr_model = SVR()

# Define the hyperparameter grid for tuning
param_grid = {
    'kernel': ['linear', 'poly', 'rbf'],
    'C': [0.1, 1, 10],
    'epsilon': [0.01, 0.1, 0.2]
}

# Perform grid search to find the best hyperparameters using cross-validation
grid_search = GridSearchCV(svr_model, param_grid, cv=5, scoring='neg_mean_squared_error')
grid_search.fit(X_train, y_train)

# Get the best SVR model from the grid search
best_svr_model = grid_search.best_estimator_

# Predict the GDP growth on the testing data
y_pred = best_svr_model.predict(X_test)

# Evaluate the model's performance using Mean Squared Error (MSE) and R-squared
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("Support Vector Machine Regression (SVR) Model Performance:")
print(f"Mean Squared Error (MSE): {mse:.2f}")
print(f"R-squared (R2): {r2:.2f}")
print("Best Hyperparameters:")
print(grid_search.best_params_)

Support Vector Machine Regression (SVR) Model Performance:
Mean Squared Error (MSE): 0.00
R-squared (R2): 1.00
Best Hyperparameters:
{'C': 0.1, 'epsilon': 0.01, 'kernel': 'linear'}
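To see how much work the grid search does, the grid can be enumerated with scikit-learn's ParameterGrid. This is a small sketch that reuses the param_grid values from the cell above; the variable names n_candidates and n_fits are introduced here for illustration.

```python
from sklearn.model_selection import ParameterGrid

# Same grid as in the SVR cell above.
param_grid = {
    'kernel': ['linear', 'poly', 'rbf'],
    'C': [0.1, 1, 10],
    'epsilon': [0.01, 0.1, 0.2]
}

# Every combination of kernel, C, and epsilon is one candidate model.
candidates = list(ParameterGrid(param_grid))
n_candidates = len(candidates)

# With cv=5, each candidate is fitted once per fold.
n_fits = n_candidates * 5

print(f"candidates: {n_candidates}, cross-validation fits: {n_fits}")
print("first candidate:", candidates[0])
```

After fitting, grid_search.cv_results_ holds the cross-validated score for each of these candidates, which is useful for checking how close the runner-up settings were.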


### d. Decision Trees and Random Forests:
We started by importing the necessary modules from scikit-learn: DecisionTreeRegressor for the decision tree regression model, and mean_squared_error and r2_score for the evaluation metrics.

We created a decision tree regression model using the DecisionTreeRegressor class, setting the random_state parameter to 42 to ensure reproducible results.

We trained the decision tree regression model using the fit method. The training data X_train contains the normalized features (Exchange Rate and Inflation Rate), and y_train contains the target variable (GDP growth).

We then used the trained decision tree model to predict the GDP growth on the testing data X_test, storing the predicted values in y_pred_decision_tree.

In [24]:
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error, r2_score

# Create a Decision Tree Regression model
decision_tree_model = DecisionTreeRegressor(random_state=42)

# Train the model using the training data
decision_tree_model.fit(X_train, y_train)

# Predict the GDP growth on the testing data
y_pred_decision_tree = decision_tree_model.predict(X_test)

# Evaluate the model's performance using Mean Squared Error (MSE) and R-squared
mse_decision_tree = mean_squared_error(y_test, y_pred_decision_tree)
r2_decision_tree = r2_score(y_test, y_pred_decision_tree)

print("Decision Tree Regression Model Performance:")
print(f"Mean Squared Error (MSE): {mse_decision_tree:.2f}")
print(f"R-squared (R2): {r2_decision_tree:.2f}")

Decision Tree Regression Model Performance:
Mean Squared Error (MSE): 0.00
R-squared (R2): -3013695.00


#### Random Forest Regression:
We imported the necessary libraries: RandomForestRegressor from scikit-learn, which is the class representing the Random Forest regression model, and the evaluation metrics mean_squared_error and r2_score.

We created an instance of the RandomForestRegressor class called random_forest_model. The random_state=42 argument sets the random seed for reproducibility of results.

We then trained the model using the fit method, passing the training feature data X_train and the corresponding target variable (GDP growth) y_train.

We used the trained Random Forest model (random_forest_model) to predict the GDP growth on the testing data X_test, and the predictions are stored in the array y_pred_random_forest.

Next, we evaluated the model's performance using two metrics:

1. Mean Squared Error (MSE): It measures the average squared difference between the predicted values (y_pred_random_forest) and the true values (y_test). Lower MSE values indicate better performance.
2. R-squared (R2): It measures the proportion of variance in the target variable (GDP growth) explained by the model. R2 values closer to 1 indicate a better fit.

Finally, the code prints the evaluation results, displaying the Mean Squared Error (MSE) and R-squared (R2) for the Random Forest regression model.

In [25]:
from sklearn.ensemble import RandomForestRegressor

# Create a Random Forest Regression model
random_forest_model = RandomForestRegressor(random_state=42)

# Train the model using the training data
random_forest_model.fit(X_train, y_train)

# Predict the GDP growth on the testing data
y_pred_random_forest = random_forest_model.predict(X_test)

# Evaluate the model's performance using Mean Squared Error (MSE) and R-squared
mse_random_forest = mean_squared_error(y_test, y_pred_random_forest)
r2_random_forest = r2_score(y_test, y_pred_random_forest)

print("Random Forest Regression Model Performance:")
print(f"Mean Squared Error (MSE): {mse_random_forest:.2f}")
print(f"R-squared (R2): {r2_random_forest:.2f}")

Random Forest Regression Model Performance:
Mean Squared Error (MSE): 0.00
R-squared (R2): -929295.00
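Tree-based models expose a feature_importances_ attribute that indicates how much each feature contributed to the splits. The following is a minimal sketch on synthetic data, where the target is deliberately constructed to depend mostly on inflation. That dependence is an assumption made for this demonstration only and says nothing about the real datasets.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-ins for the two features (not the project's data).
rng = np.random.default_rng(0)
n_samples = 200
exchange_rate = rng.normal(110.0, 5.0, size=n_samples)
inflation = rng.normal(6.0, 1.5, size=n_samples)

# Synthetic target driven mostly by inflation, plus a little noise.
gdp_growth = 5.0 - 0.8 * (inflation - 6.0) + rng.normal(0.0, 0.2, size=n_samples)

X_demo = np.column_stack([exchange_rate, inflation])
forest = RandomForestRegressor(n_estimators=50, random_state=0)
forest.fit(X_demo, gdp_growth)

# Importances are non-negative and sum to 1 across features.
importances = forest.feature_importances_
for name, importance in zip(["Mean", "Annual Average Inflation"], importances):
    print(f"{name}: {importance:.3f}")
```

On the notebook's fitted model, the equivalent call would be random_forest_model.feature_importances_, read in the same column order as the features DataFrame.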


## Model Evaluation:

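We evaluated the performance of each model on the testing data using MSE and R-squared, then compared the results to identify the model that provides the best predictions for Kenya's GDP fluctuations based on exchange rate and inflation rate data.

Note that an MSE that rounds to 0.00 alongside a very large negative R-squared indicates that the GDP growth values in the test set barely vary: R-squared divides the squared error by the variance of the true values, so a near-zero variance inflates even tiny errors. The R-squared values below should therefore be interpreted with caution.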

In [26]:
# Evaluate Linear Regression model
# Recompute the predictions here: y_pred was reassigned by the SVR cell above,
# so reusing it would report the SVR scores under the Linear Regression label
y_pred_linear = linear_model.predict(X_test)
mse_linear = mean_squared_error(y_test, y_pred_linear)
r2_linear = r2_score(y_test, y_pred_linear)

print("Linear Regression Model Performance:")
print(f"Mean Squared Error (MSE): {mse_linear:.2f}")
print(f"R-squared (R2): {r2_linear:.2f}")

# Evaluate SVR model
y_pred_svr = best_svr_model.predict(X_test)
mse_svr = mean_squared_error(y_test, y_pred_svr)
r2_svr = r2_score(y_test, y_pred_svr)

print("Support Vector Machine Regression (SVR) Model Performance:")
print(f"Mean Squared Error (MSE): {mse_svr:.2f}")
print(f"R-squared (R2): {r2_svr:.2f}")

# Evaluate Decision Tree Regression model
print("Decision Tree Regression Model Performance:")
print(f"Mean Squared Error (MSE): {mse_decision_tree:.2f}")
print(f"R-squared (R2): {r2_decision_tree:.2f}")

# Evaluate Random Forest Regression model
print("Random Forest Regression Model Performance:")
print(f"Mean Squared Error (MSE): {mse_random_forest:.2f}")
print(f"R-squared (R2): {r2_random_forest:.2f}")

Linear Regression Model Performance:
Mean Squared Error (MSE): 0.00
R-squared (R2): 0.00
Support Vector Machine Regression (SVR) Model Performance:
Mean Squared Error (MSE): 0.00
R-squared (R2): 1.00
Decision Tree Regression Model Performance:
Mean Squared Error (MSE): 0.00
R-squared (R2): -3013695.00
Random Forest Regression Model Performance:
Mean Squared Error (MSE): 0.00
R-squared (R2): -929295.00
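A combination like the one above, where MSE prints as 0.00 while R2 is a huge negative number, can be reproduced whenever the true values are almost constant. The following sketch uses made-up numbers chosen only to trigger the effect; it is an illustration of the arithmetic, not a reconstruction of the notebook's test set.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up target that is constant except for a tiny wobble in one entry.
y_true_flat = np.array([4.8, 4.8, 4.8, 4.8000001])

# Made-up predictions that are off by 0.01 in a single entry.
y_pred_flat = np.array([4.8, 4.8, 4.81, 4.8])

target_variance = np.var(y_true_flat)
mse_flat = mean_squared_error(y_true_flat, y_pred_flat)
r2_flat = r2_score(y_true_flat, y_pred_flat)

# The error is tiny in absolute terms, but huge relative to the target's spread.
print(f"target variance: {target_variance:.2e}")
print(f"MSE: {mse_flat:.2f}")
print(f"R2: {r2_flat:.2f}")
```

A quick check such as np.var(y_test), or y_test.nunique() for a pandas Series, run before model comparison shows whether the test target has enough variation for R2 to be meaningful.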
