# Import Software Libraries

In [1]:
import datetime as dt  # Work with datetime types.
import os  # Interact with the operating system.
import sys  # Read system parameters.
import warnings  # Control warning messages.
from io import StringIO  # Read and write strings as files.
from urllib.parse import urlparse  # Parse a URL and extract its path component.

import matplotlib  # Report the Matplotlib version.
import matplotlib.pyplot as plt  # Create plots.
import numpy as np  # Work with multi-dimensional arrays and matrices.
import pandas as pd  # Manipulate and analyze data frames.
import requests  # Send HTTP requests.
import sklearn  # Perform feature engineering and machine learning.
import statsmodels  # Perform statistical modeling.

# Summarize software libraries used.
print("Libraries used in this project:")
print("- NumPy {}".format(np.__version__))
print("- pandas {}".format(pd.__version__))
print("- scikit-learn {}".format(sklearn.__version__))
print("- statsmodels {}".format(statsmodels.__version__))
print("- Matplotlib {}".format(matplotlib.__version__))
print("- requests {}".format(requests.__version__))
print("- Python {}\n".format(sys.version))

Libraries used in this project:
- NumPy 1.24.3
- pandas 2.0.3
- scikit-learn 1.3.2
- statsmodels 0.14.1
- Matplotlib 3.7.2
- requests 2.32.3
- Python 3.8.18 | packaged by conda-forge | (default, Dec 23 2023, 17:23:49) 
[Clang 15.0.7 ]



# Load Dataset

In [2]:
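# Capture URL
url = "https://raw.githubusercontent.com/tyrantdavis/datasets/refs/heads/main/economics_data.csv"

# Download the CSV; fail early on an HTTP error instead of parsing an error page.
response = requests.get(url, timeout=30)
response.raise_for_status()

# Save as dataframe and keep an untouched copy of the original.
original = pd.read_csv(StringIO(response.text))
df = original.copy()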

# Parse URL
parsed_url = urlparse(url)
path = parsed_url.path
filename = os.path.basename(path)


print(f"Loaded {len(df)} records from {filename}.")

Loaded 123 records from economics_data.csv.


# Get Acquainted with the Dataset

**Display Dataframe Summary**

In [3]:
print(df.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 123 entries, 0 to 122
Data columns (total 9 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   date    123 non-null    object 
 1   rgnp    123 non-null    float64
 2   pgnp    123 non-null    float64
 3   ulc     123 non-null    float64
 4   gdfco   123 non-null    float64
 5   gdf     123 non-null    float64
 6   gdfim   123 non-null    float64
 7   gdfcf   123 non-null    float64
 8   gdfce   123 non-null    float64
dtypes: float64(8), object(1)
memory usage: 8.8+ KB
None


**Spotlights** 

This dataset consists of 123 rows and 9 columns. Every column except **date** holds float values; **date** is stored as strings (dtype `object`). No column has missing values: each one reports 123 non-null entries.
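
The absence of missing values can also be checked directly rather than read off the `df.info()` output. The following is a minimal sketch, assuming a small stand-in frame named `sample` (a hypothetical name, holding three records of the dataset) in place of the full `df`:

```python
import pandas as pd

# Hypothetical stand-in for `df`: three records and three columns of the dataset.
sample = pd.DataFrame(
    {
        "date": ["1959-01-01", "1959-04-01", "1959-07-01"],
        "rgnp": [1606.4, 1637.0, 1629.5],
        "pgnp": [1608.3, 1622.2, 1636.2],
    }
)

# Count missing values per column; all zeros means nothing is missing.
missing_per_column = sample.isna().sum()
print(missing_per_column)

# Collapse the per-column counts into a single yes/no answer.
has_missing = bool(sample.isna().any().any())
print(f"Any missing values: {has_missing}")
```

Running the same two checks on `df` should confirm what the non-null counts already suggest.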

**Columns** 

- **date**, which records each observation every 3 months. In other words, these are quarterly
observations for each year.
- **rgnp**, the real gross national product (GNP), the value of all products and services produced by
a country's residents. In recent times, GNP has largely been replaced by the gross national
income (GNI), which determines the same thing but using a different calculation.
- **pgnp**, the potential GNP, an estimate of the output the economy could sustain at a constant
rate of inflation, whereas real GNP records the output actually produced, adjusted for inflation.
Because potential GNP is an estimate, it is often used as a benchmark for the GNP of the
following fiscal quarter.
- **ulc**, the unit labor cost (ULC). ULC is the average cost of labor per unit of output, so it
relates labor compensation to labor productivity.
- **gdfco**, the fixed weight deflator for personal consumption expenditure (excluding food and
energy). In GNP/GNI, a deflator is a measure of both price inflation and deflation.
- **gdf**, the fixed weight deflator for the GNP. This deflator covers the entire GNP, rather than just
one aspect.
- **gdfim**, the fixed weight deflator for imports.
- **gdfcf**, the fixed weight deflator for food in personal consumption expenditures.
- **gdfce**, the fixed weight deflator for energy in personal consumption expenditures.
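
To illustrate how a deflator captures both inflation and deflation, the sketch below computes the quarter-over-quarter percentage change of a deflator series. It uses the first four **gdfcf** values from the dataset; the name `qoq_change_pct` is hypothetical and is not a column of the dataset.

```python
import pandas as pd

# First four quarterly values of gdfcf (food deflator) from the dataset.
gdfcf = pd.Series(
    [32.3, 32.2, 32.4, 32.5],
    index=["1959-01", "1959-04", "1959-07", "1959-10"],
    name="gdfcf",
)

# Quarter-over-quarter percentage change of the deflator (hypothetical name).
# Positive values indicate rising prices, negative values falling prices.
# The first entry is NaN because it has no preceding quarter.
qoq_change_pct = gdfcf.pct_change() * 100
print(qoq_change_pct.round(2))
```

In this short sample the deflator falls in the second quarter and rises in the next two, so both signs appear.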

**Display First 10 Records**

In [4]:
print(df.head(10))

         date    rgnp    pgnp   ulc  gdfco   gdf  gdfim  gdfcf  gdfce
0  1959-01-01  1606.4  1608.3  47.5   36.9  37.4   26.9   32.3   23.1
1  1959-04-01  1637.0  1622.2  47.5   37.4  37.5   27.0   32.2   23.4
2  1959-07-01  1629.5  1636.2  48.7   37.6  37.6   27.1   32.4   23.4
3  1959-10-01  1643.4  1650.3  48.8   37.7  37.8   27.1   32.5   23.8
4  1960-01-01  1671.6  1664.6  49.1   37.8  37.8   27.2   32.4   23.8
5  1960-04-01  1666.8  1679.0  49.6   38.0  38.0   27.4   32.8   23.9
6  1960-07-01  1668.4  1693.5  50.0   38.1  38.1   27.4   32.9   24.1
7  1960-10-01  1654.1  1708.2  50.2   38.2  38.2   27.2   33.2   24.2
8  1961-01-01  1671.3  1722.9  50.1   38.2  38.2   27.2   33.2   24.2
9  1961-04-01  1692.1  1737.8  49.8   38.3  38.2   27.2   33.2   24.2


The article from which this dataset is derived aims to analyze the causal relationship between wages and prices. In this machine learning project, the focus is on predicting these variables for multiple quarters following the last recorded observation. Before proceeding, it is essential to prepare the time series data.

# Transform the 'date' Column into a Datetime Index

The next steps are to properly format the **date** column and convert it into the index of the dataframe. 

In [5]:
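# Parse the date strings and use them as the index.
df.index = pd.to_datetime(df["date"])
# Convert the timestamps to monthly periods (e.g. 1959-01).
df.index = df.index.to_period("M")
# Drop the now-redundant date column.
df.drop(["date"], axis=1, inplace=True)
df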

           rgnp    pgnp    ulc  gdfco    gdf  gdfim  gdfcf  gdfce
date
1959-01  1606.4  1608.3   47.5   36.9   37.4   26.9   32.3   23.1
1959-04  1637.0  1622.2   47.5   37.4   37.5   27.0   32.2   23.4
1959-07  1629.5  1636.2   48.7   37.6   37.6   27.1   32.4   23.4
1959-10  1643.4  1650.3   48.8   37.7   37.8   27.1   32.5   23.8
1960-01  1671.6  1664.6   49.1   37.8   37.8   27.2   32.4   23.8
...         ...     ...    ...    ...    ...    ...    ...    ...
1988-07  4042.7  3971.9  179.6  131.5  124.9  106.2  123.5   92.8
1988-10  4069.4  3995.8  181.3  133.3  126.2  107.3  124.9   92.9
1989-01  4106.8  4019.9  184.1  134.8  127.7  109.5  126.6   94.0
1989-04  4132.5  4044.1  186.1  134.8  129.3  111.1  129.0  100.6


- The **date** column is now the index of the dataframe, expressed as monthly periods; each label marks the first month of a quarter.
- The first recorded observation dates back to January 1959, while the most recent observation occurred in July 1989.
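
Since the observations are quarterly, a quarterly period (`"Q"`) is a possible alternative to the monthly period (`"M"`) used in this notebook. The sketch below is not what the notebook does. It only contrasts the two labelings on a hypothetical four-date stand-in named `dates`, and shows one way to check that a quarterly index has no gaps.

```python
import pandas as pd

# Hypothetical stand-in: the first four observation dates of the dataset.
dates = pd.to_datetime(["1959-01-01", "1959-04-01", "1959-07-01", "1959-10-01"])

# Monthly periods, as used in this notebook: labels such as 1959-01.
monthly = dates.to_period("M")

# Quarterly periods, the alternative: labels such as 1959Q1.
quarterly = dates.to_period("Q")

print(list(monthly.astype(str)))
print(list(quarterly.astype(str)))

# A gap-free quarterly index equals an unbroken range of the same length.
expected = pd.period_range(start=quarterly[0], periods=len(quarterly), freq="Q")
no_gaps = quarterly.equals(expected)
print(f"Consecutive quarters with no gaps: {no_gaps}")
```

With quarterly labels, a missing or duplicated quarter shows up as a mismatch against the expected range, which is harder to spot when the index is labeled by month.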