# Capstone project for ZAI ORTIZ

![Course Hero](images/hero.png)

## Introduction

The project I chose examines whether we are approaching a recession. It compares past recession data with current data to determine whether the patterns that preceded earlier recessions are present in today's economy.

The question this project will answer is the following:
 
 *Will there be a recession in the next year?*

## Data Set Selection

This project depends on the following:
1. Common recession indicators and models
2. Past recession data 
3. Today's data to compare to the past
4. Predictive analysis to compare today's trends to past trends 

To complete this project I will use the following databases: 
1. [Federal Reserve Bank of New York](https://www.newyorkfed.org/research/capital_markets/ycfaq#/overview)
2. [FRED Economic Data](https://fred.stlouisfed.org/)

The steps I will take to complete this project are the following: 
1. Generate a hypothesis
2. Use the Yield Curve Model
3. Find data to perform a Yield Curve Model analysis
4. Examine the data
5. Perform the analysis
6. Confirm whether or not the hypothesis was correct
7. Report findings
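
The Yield Curve Model in step 2 is commonly formulated as a probit: the probability of a recession twelve months ahead is the standard normal CDF applied to a linear function of the spread between the 10-year and 3-month Treasury yields. The following is a minimal sketch under that assumption. The coefficients `ALPHA` and `BETA` are illustrative placeholders, not values fitted in this notebook, and `recession_probability` is a hypothetical helper name.

```python
import math

# Illustrative placeholder coefficients (assumptions, not fitted to this data set).
ALPHA = -0.5
BETA = -0.6


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def recession_probability(spread: float, alpha: float = ALPHA, beta: float = BETA) -> float:
    """Probit-style probability of recession given the yield spread in percentage points."""
    return normal_cdf(alpha + beta * spread)


# A negative beta means a smaller (or inverted) spread gives a higher probability.
for spread in (2.0, 0.0, -1.0):
    print(f"spread={spread:+.2f} -> probability={recession_probability(spread):.3f}")
```

With a negative `BETA`, the probability rises as the spread narrows or turns negative, which is the pattern the rest of the notebook looks for.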

## Data Examination

Let's start with the imports for the notebook.

Note: Remember to add every module you use to the `requirements.txt` file.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

Load the selected data set into a pandas DataFrame.

Note: You need to add the right method to load the data.

In [3]:
df = pd.read_clipboard()

In [13]:
import pandas as pd

treasury_data = pd.read_csv("file:///Users/zaira.ortiz/Downloads/allmonth (1) - rec_prob.csv")

Find relevant information about the selected dataset.

- How many rows and columns does it have?
- Which characteristics does each column have?
    - Data type
    - Minimum and maximum values
    - Values distribution
    - Missing data
- Which columns are related or are dependent on each other? 
    - Which ones can be derived?
    - Which are good candidates for a hypothesis?

Note: Use pandas methods such as shape, head, sample, groupby, describe and any other you can think of!

How many rows and columns does the data have?

In [14]:
print("Shape: ", treasury_data.shape)

Shape:  (775, 7)


What are the data types?

In [15]:
treasury_data.info(memory_usage="deep")

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 775 entries, 0 to 774
Data columns (total 7 columns):
 #   Column                                          Non-Null Count  Dtype  
---  ------                                          --------------  -----  
 0   Date                                            775 non-null    object 
 1   10 Year Treasury Yield                          763 non-null    float64
 2   3 Month Treasury Yield                          763 non-null    float64
 3   3 Month Treasury Yield (Bond Equivalent Basis)  763 non-null    float64
 4   Spread                                          763 non-null    float64
 5   Rec_prob                                        763 non-null    object 
 6   NBER_Rec                                        763 non-null    float64
dtypes: float64(5), object(2)
memory usage: 127.3 KB
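
The `info()` output lists `Rec_prob` as `object` rather than `float64`, because its values are percent strings such as `0.21%`. Before it can be used numerically it has to be converted. This is a small sketch on made-up sample values, assuming every non-missing entry is a number followed by `%`; `percent_to_fraction` is a hypothetical helper name.

```python
import pandas as pd


def percent_to_fraction(series: pd.Series) -> pd.Series:
    """Convert strings like '0.21%' to floats like 0.0021; missing values stay NaN."""
    stripped = series.str.rstrip("%").str.strip()
    return pd.to_numeric(stripped, errors="coerce") / 100.0


# Made-up sample values for illustration only.
sample = pd.DataFrame({"Rec_prob": ["0.21%", "12.5%", None]})
sample["Rec_prob_fraction"] = percent_to_fraction(sample["Rec_prob"])
print(sample)
```

The same helper could be applied to `treasury_data["Rec_prob"]` if the column follows that pattern throughout.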


What are the Minimum and Maximum Values?

In [16]:
treasury_data.max(axis=0)


Date                                              9/30/2022
10 Year Treasury Yield                                15.32
3 Month Treasury Yield                                 16.3
3 Month Treasury Yield (Bond Equivalent Basis)        17.24
Spread                                                 4.15
NBER_Rec                                                1.0
dtype: object

In [17]:
treasury_data.min(axis=0)


Date                                              1/31/2018
10 Year Treasury Yield                                 0.62
3 Month Treasury Yield                                 0.01
3 Month Treasury Yield (Bond Equivalent Basis)         0.01
Spread                                                -3.51
NBER_Rec                                                0.0
dtype: object
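
Because `Date` is stored as `object`, `max()` and `min()` compare the values as text, so `9/30/2022` and `1/31/2018` are the alphabetical extremes rather than the latest and earliest dates. Parsing the strings first gives a chronological answer. The sketch below assumes the column mixes two formats that appear in this notebook's outputs (`31-Jan-59` and `9/30/2022`) and that two-digit years parsed beyond a pivot year belong to the 1900s. `parse_mixed_date`, `KNOWN_FORMATS` and `PIVOT_YEAR` are hypothetical names.

```python
from datetime import datetime

# Assumed formats: day-month-2-digit-year and month/day/4-digit-year.
KNOWN_FORMATS = ("%d-%b-%y", "%m/%d/%Y")
# Assumption: a two-digit year parsed as later than this is really in the 1900s.
PIVOT_YEAR = 2030


def parse_mixed_date(text: str) -> datetime:
    """Parse a date string in one of the assumed formats."""
    for fmt in KNOWN_FORMATS:
        try:
            parsed = datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue
        if fmt.endswith("%y") and parsed.year > PIVOT_YEAR:
            # strptime maps '59' to 2059; shift it back one century.
            parsed = parsed.replace(year=parsed.year - 100)
        return parsed
    raise ValueError(f"Unrecognized date format: {text!r}")


dates = [parse_mixed_date(s) for s in ("9/30/2022", "31-Jan-59", "1/31/2018")]
print("earliest:", min(dates).date(), "latest:", max(dates).date())
```

On the real data, something like `treasury_data["Date"].map(parse_mixed_date)` would produce a column whose minimum and maximum are true dates, provided the two assumed formats cover every row.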

Values Distribution

In [18]:
treasury_data.describe()

| | 10 Year Treasury Yield | 3 Month Treasury Yield | 3 Month Treasury Yield (Bond Equivalent Basis) | Spread | NBER_Rec |
| --- | --- | --- | --- | --- | --- |
| count | 763.0 | 763.0 | 763.0 | 763.0 | 763.0 |
| mean | 5.836081 | 4.347575 | 4.483552 | 1.352529 | 0.124509 |
| std | 2.945843 | 3.179119 | 3.317027 | 1.245848 | 0.330377 |
| min | 0.62 | 0.01 | 0.01 | -3.51 | 0.0 |
| 25% | 3.89 | 1.865 | 1.895 | 0.48 | 0.0 |
| 50% | 5.5 | 4.43 | 4.54 | 1.4 | 0.0 |
| 75% | 7.54 | 6.01 | 6.19 | 2.35 | 0.0 |
| max | 15.32 | 16.3 | 17.24 | 4.15 | 1.0 |


In [19]:
treasury_data.describe(include="object")

| | Date | Rec_prob |
| --- | --- | --- |
| count | 775 | 763 |
| unique | 775 | 646 |
| top | 31-Jan-59 | 0.21% |
| freq | 1 | 6 |


Missing data

In [20]:
print(treasury_data.isnull().sum(axis=0))

Date                                               0
10 Year Treasury Yield                            12
3 Month Treasury Yield                            12
3 Month Treasury Yield (Bond Equivalent Basis)    12
Spread                                            12
Rec_prob                                          12
NBER_Rec                                          12
dtype: int64
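
Every column except `Date` reports the same count of 12 missing values, which suggests, but does not prove, that the same 12 rows are empty across all of them. One way to check is to compare the rows where any value is missing with the rows where all values are missing. The sketch uses a toy frame with made-up values; only the technique is meant to carry over.

```python
import pandas as pd

# Toy data for illustration; labels and numbers are made up.
toy = pd.DataFrame({
    "Date": ["a", "b", "c", "d"],
    "Spread": [1.14, None, 1.13, None],
    "NBER_Rec": [0.0, None, 0.0, None],
})

value_columns = [c for c in toy.columns if c != "Date"]
any_missing = toy[value_columns].isnull().any(axis=1)
all_missing = toy[value_columns].isnull().all(axis=1)

# If the two masks agree, the missing values sit in whole rows.
same_rows = bool((any_missing == all_missing).all())
print(toy[any_missing])
print("Missing values confined to fully empty rows:", same_rows)
```

If the masks agree on the real data, dropping rows on a single column removes every missing value at once.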


Which columns are related or independent of each other?

Which can be derived?

Which are good candidates for a hypothesis?
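
One candidate for a derived column is `Spread`, which looks like the 10-year yield minus the 3-month yield on a bond equivalent basis. The sketch below checks that assumption on five rows copied from the `head()` output in this notebook; confirming it for the whole data set would require running the same comparison on `treasury_data`.

```python
import pandas as pd

# Five rows copied from the notebook's head() output.
rows = pd.DataFrame({
    "10 Year Treasury Yield": [4.02, 3.96, 3.99, 4.12, 4.31],
    "3 Month Treasury Yield (Bond Equivalent Basis)": [2.88, 2.76, 2.86, 3.01, 2.90],
    "Spread": [1.14, 1.20, 1.13, 1.11, 1.41],
})

derived = (
    rows["10 Year Treasury Yield"]
    - rows["3 Month Treasury Yield (Bond Equivalent Basis)"]
)
# Allow for rounding to two decimals in the published figures.
matches = (derived - rows["Spread"]).abs() < 0.005
print("Spread equals 10Y minus 3M (BEB) on all sample rows:", bool(matches.all()))
```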

## Define the Hypothesis to test

Decide what your project is about.

What do you want to predict?

I want to predict whether there will be a recession in the next 12 months. My hypothesis is that there will be one.

## Clean the data

Create a new DataFrame with just the data you are going to use.

In [22]:
yield_data = treasury_data[["10 Year Treasury Yield", "3 Month Treasury Yield (Bond Equivalent Basis)", "Spread"]].copy()

## Run your experiment(s)

Describe how your experiment is done, and execute it.

Note: Be generous with your plots!

In [23]:
print("Shape: ", yield_data.shape)
print("Info:")
yield_data.info(memory_usage="deep")
print("Head:")
yield_data.head()

Shape:  (775, 3)
Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 775 entries, 0 to 774
Data columns (total 3 columns):
 #   Column                                          Non-Null Count  Dtype  
---  ------                                          --------------  -----  
 0   10 Year Treasury Yield                          763 non-null    float64
 1   3 Month Treasury Yield (Bond Equivalent Basis)  763 non-null    float64
 2   Spread                                          763 non-null    float64
dtypes: float64(3)
memory usage: 18.3 KB
Head:


| | 10 Year Treasury Yield | 3 Month Treasury Yield (Bond Equivalent Basis) | Spread |
| --- | --- | --- | --- |
| 0 | 4.02 | 2.88 | 1.14 |
| 1 | 3.96 | 2.76 | 1.2 |
| 2 | 3.99 | 2.86 | 1.13 |
| 3 | 4.12 | 3.01 | 1.11 |
| 4 | 4.31 | 2.9 | 1.41 |


In [24]:
yield_data.describe()

| | 10 Year Treasury Yield | 3 Month Treasury Yield (Bond Equivalent Basis) | Spread |
| --- | --- | --- | --- |
| count | 763.0 | 763.0 | 763.0 |
| mean | 5.836081 | 4.483552 | 1.352529 |
| std | 2.945843 | 3.317027 | 1.245848 |
| min | 0.62 | 0.01 | -3.51 |
| 25% | 3.89 | 1.895 | 0.48 |
| 50% | 5.5 | 4.54 | 1.4 |
| 75% | 7.54 | 6.19 | 2.35 |
| max | 15.32 | 17.24 | 4.15 |


In [26]:
print(yield_data.isnull().sum(axis=0))

10 Year Treasury Yield                            12
3 Month Treasury Yield (Bond Equivalent Basis)    12
Spread                                            12
dtype: int64


In [28]:
yield_data.dropna(subset=["10 Year Treasury Yield"], inplace=True)
print(yield_data.isnull().sum(axis=0))

10 Year Treasury Yield                            0
3 Month Treasury Yield (Bond Equivalent Basis)    0
Spread                                            0
dtype: int64
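
With the missing rows dropped, one simple way to use the `Spread` column is to flag the months in which the yield curve was inverted, meaning the spread was below zero, and to measure how long each inversion lasted. The sketch below does this on made-up toy values; the column names `Inverted` and the variable `longest_inversion` are hypothetical additions, not part of the original data.

```python
import pandas as pd

# Toy spread values for illustration; not taken from the data set.
toy = pd.DataFrame({"Spread": [1.14, 0.38, -0.25, -0.60, 0.10, 1.40]})

# An inverted yield curve means the short-term yield exceeds the long-term yield.
toy["Inverted"] = toy["Spread"] < 0

# Give each consecutive run of equal flags its own id, then count inverted months per run.
run_id = (toy["Inverted"] != toy["Inverted"].shift(fill_value=False)).cumsum()
longest_inversion = int(toy.groupby(run_id)["Inverted"].sum().max())

print(toy)
print("Inverted months:", int(toy["Inverted"].sum()))
print("Longest consecutive inversion (months):", longest_inversion)
```

Applied to `yield_data`, the same flags could be plotted or compared against `NBER_Rec` to see how often inversions preceded recessions.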


In [29]:
yield_data.describe(include="all")

| | 10 Year Treasury Yield | 3 Month Treasury Yield (Bond Equivalent Basis) | Spread |
| --- | --- | --- | --- |
| count | 763.0 | 763.0 | 763.0 |
| mean | 5.836081 | 4.483552 | 1.352529 |
| std | 2.945843 | 3.317027 | 1.245848 |
| min | 0.62 | 0.01 | -3.51 |
| 25% | 3.89 | 1.895 | 0.48 |
| 50% | 5.5 | 4.54 | 1.4 |
| 75% | 7.54 | 6.19 | 2.35 |
| max | 15.32 | 17.24 | 4.15 |


In [32]:
yield_data.head(20)

| | 10 Year Treasury Yield | 3 Month Treasury Yield (Bond Equivalent Basis) | Spread |
| --- | --- | --- | --- |
| 0 | 4.02 | 2.88 | 1.14 |
| 1 | 3.96 | 2.76 | 1.2 |
| 2 | 3.99 | 2.86 | 1.13 |
| 3 | 4.12 | 3.01 | 1.11 |
| 4 | 4.31 | 2.9 | 1.41 |
| 5 | 4.34 | 3.28 | 1.06 |
| 6 | 4.4 | 3.27 | 1.13 |
| 7 | 4.43 | 3.46 | 0.97 |
| 8 | 4.68 | 4.14 | 0.54 |
| 9 | 4.53 | 4.15 | 0.38 |


In [34]:
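# Build the boolean mask from the Date column itself; the original compared the
# literal list ['Date'] to a string, which evaluates to a single False and raises a KeyError.
treasury_data.loc[treasury_data['Date'] == '31-Jul-12']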

## Reach a conclusion

What was the result of your experiment?

How can it be improved?

Elaborate on one thing you learned during the capstone project.

## Congratulations

You have finished the bootcamp!

![Congratulations](images/congratulations.jpg)