**Create a custom dataframe**


# Inflation

Inflation model for analyzing, tracking, and predicting components of inflation.

There are many factors that contribute to the act or process of _Consumer Inflation;_ however, there are only a few methods created for measuring the average change of indexes over time that are strong indicators a possible period of inflation.

One of those indexes is the _[Consumer Price Index](https://www.bls.gov/cpi/)_ (CPI) and the _[Consumer Expenditure Survey](https://data.bls.gov/cex/)_ (CES). CPI is a measure of the price for goods and services over a period of time, while CES is a measure of spending per household. 

**KEY** <br> Within this analysis all datasets will include the following endings:
-    **U** = All urban consumers<br> 
-    **W** = All urban wages<br> 
-    **NF** = Nonfarm cpi<br> 
-    **Gov** = Government

In [4]:
import pandas as pd
import seaborn as sns
import plotly.graph_objects as go
import plotly.express as px
from sklearn.model_selection import train_test_split

In [66]:
# Read in our datasets
print(f'Shape of our datasets:','\n','CES for Government Employees:',ces_gov.shape,'\n',
      'Consumer Price Index for All Urban:',cpi_u.shape,'\n',
      'Consumer Price Index for All Wages:', cpi_w.shape,'\n',
      'Consumer Expenditure Survey Nonfarm:', ces_total_nf.shape),'\n',
ces_gov = pd.read_csv('/Users/jasonrobinson/Documents/Projects/bls_project/data_2/SeriesReport-ces-government.csv')
cpi_u = pd.read_csv('/Users/jasonrobinson/Documents/Projects/bls_project/data_2/SeriesReport-cpi-all-u-notadj.48.csv')
cpi_w = pd.read_csv('/Users/jasonrobinson/Documents/Projects/bls_project/data_2/SeriesReport-cpi-urbwage-notadj.48.csv')
ces_total_nf = pd.read_csv('/Users/jasonrobinson/Documents/Projects/bls_project/data_2/SeriesReport-ces-total-nonfarm.csv')

Shape of our datasets: 
 CES for Government Employees: (11, 13) 
 Consumer Price Index for All Urban: (75, 13) 
 Consumer Price Index for All Wages: (75, 0) 
 Consumer Expenditure Survey Nonfarm: (11, 13)


In [67]:
cpi_w.head()

Unnamed: 0,Year,Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec,HALF1,HALF2
0,1948,23.8,23.6,23.6,23.9,24.1,24.2,24.5,24.6,24.6,24.5,24.4,24.2,,
1,1949,24.2,23.9,24.0,24.0,24.0,24.0,23.8,23.9,24.0,23.9,23.9,23.8,,
2,1950,23.7,23.6,23.7,23.7,23.8,24.0,24.2,24.4,24.6,24.7,24.8,25.1,,
3,1951,25.5,25.9,26.0,26.0,26.1,26.1,26.1,26.1,26.3,26.4,26.5,26.6,,
4,1952,26.6,26.5,26.5,26.6,26.6,26.7,26.9,26.9,26.9,26.9,26.9,26.9,,


In [72]:
# Since cpi-w is not the same size as our cpi-u dataset we can easily drop
# the two extra columns as they are not necessary to the training
#cpi_w = cpi_w.drop(cpi_w[['HALF1', 'HALF2']], axis=1)
print(cpi_w.shape)
cpi_w.head()

(75, 13)


Unnamed: 0,Year,Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec
0,1948,23.8,23.6,23.6,23.9,24.1,24.2,24.5,24.6,24.6,24.5,24.4,24.2
1,1949,24.2,23.9,24.0,24.0,24.0,24.0,23.8,23.9,24.0,23.9,23.9,23.8
2,1950,23.7,23.6,23.7,23.7,23.8,24.0,24.2,24.4,24.6,24.7,24.8,25.1
3,1951,25.5,25.9,26.0,26.0,26.1,26.1,26.1,26.1,26.3,26.4,26.5,26.6
4,1952,26.6,26.5,26.5,26.6,26.6,26.7,26.9,26.9,26.9,26.9,26.9,26.9


**Get the summary statistics** 

In [77]:
# Work with all urban and urban/wages
print(cpi_w.describe())
cpi_u.describe(include='all')

              Year         Jan         Feb         Mar         Apr  \
count    75.000000   75.000000   74.000000   74.000000   74.000000   
mean   1985.000000  113.994147  112.214919  112.725324  113.180257   
std      21.794495   79.448563   77.989041   78.410040   78.700877   
min    1948.000000   23.700000   23.600000   23.600000   23.700000   
25%    1966.500000   32.550000   32.425000   32.525000   32.700000   
50%    1985.000000  104.900000  103.600000  103.850000  104.200000   
75%    2003.500000  179.300000  177.825000  178.900000  178.800000   
max    2022.000000  276.296000  256.843000  258.935000  261.237000   

              May         Jun         Jul         Aug         Sep         Oct  \
count   74.000000   74.000000   74.000000   74.000000   74.000000   74.000000   
mean   113.572203  113.956986  114.159473  114.417946  114.762595  114.894743   
std     78.957409   79.158330   79.184861   79.262302   79.407066   79.365418   
min     23.800000   24.000000   23.800000   2

Unnamed: 0,Year,Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec
count,75.0,75.0,74.0,74.0,74.0,74.0,74.0,74.0,74.0,74.0,74.0,74.0,74.0
mean,1985.0,115.730027,113.941081,114.453311,114.868554,115.230743,115.610919,115.837108,116.088851,116.419419,116.599351,116.572797,116.522838
std,21.794495,81.627283,80.171231,80.567981,80.818227,81.044034,81.233918,81.268215,81.35242,81.489663,81.484002,81.301846,81.130291
min,1948.0,23.5,23.5,23.4,23.6,23.7,23.8,23.7,23.8,23.9,23.7,23.8,23.6
25%,1966.5,32.35,32.225,32.325,32.5,32.525,32.625,32.725,32.9,32.925,33.1,33.125,33.15
50%,1985.0,105.5,104.2,104.5,105.0,105.35,105.65,105.95,106.25,106.65,107.0,107.15,107.3
75%,2003.5,183.45,181.775,182.85,182.8,182.575,182.75,182.95,183.625,184.15,184.075,183.7,183.45
max,2022.0,281.148,263.014,264.877,267.054,269.195,271.696,273.003,273.567,274.31,276.589,277.948,278.802


In [110]:
fig = px.histogram(cpi_u["Year"], x=cpi_u['Jan'], nbins=50)
fig.show()

In [111]:
fig = px.pairplot(cpi_u, diag_kind="kde")
fig.show()

AttributeError: module 'plotly.express' has no attribute 'pairplot'

As we would expect, all values share closely related means and standared deviations, but we want to drill down on the variance of the distributions.

In [113]:
fig = px.scatter(cpi_u, x=cpi_u["Jan"], y="May", color="Jun", size="Aug")
fig.show()


invalid value encountered in greater_equal


invalid value encountered in less_equal



ValueError: 
    Invalid element(s) received for the 'size' property of scatter.marker
        Invalid elements include: [nan]

    The 'size' property is a number and may be specified as:
      - An int or float in the interval [0, inf]
      - A tuple, list, or one-dimensional numpy array of the above