## Exploratory Data Analysis of Major US Markets

To begin, we set up the dependencies for this analysis: pandas, numpy, matplotlib, and seaborn.

In [None]:
import pandas as pd 
import numpy as np 
import matplotlib.pyplot as plt 
import warnings as wr 

In [None]:
!pip install seaborn

In [None]:
# Overwrite (>) rather than append (>>) so re-running this cell does not duplicate entries
!pip freeze > requirements.txt

In [None]:
# A context manager closes the file even if an error occurs while reading
with open("requirements.txt", "r") as f:
    for line in f:
        print(line.strip())

In [None]:
csv_file = "data/sp500.csv"
df = pd.read_csv(csv_file)
# head is a method; call it to print only the first five rows
print(df.head())

Looking at the DataFrame, we can see that it holds OHLC prices for the SPX Index at monthly intervals.
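The monthly, OHLC reading can also be checked programmatically rather than by eye. The following is a minimal sketch under stated assumptions: `sample` is a hypothetical stand-in frame with made-up values and already-typed columns (dates as datetimes, prices as floats), and it assumes `High` and `Low` should bound the other price columns in each row.

```python
import pandas as pd

# Hypothetical stand-in for the real data (values are made up for illustration)
sample = pd.DataFrame({
    "Date": pd.to_datetime(["2024-03-01", "2024-02-01", "2024-01-01"]),
    "Price": [105.0, 102.0, 100.0],
    "Open": [102.5, 100.5, 98.0],
    "High": [106.0, 103.0, 101.0],
    "Low": [101.0, 99.0, 97.0],
})

# Monthly data should have exactly one row per calendar month
months = sample["Date"].dt.to_period("M")
one_row_per_month = months.is_unique

# The number of rows should match the number of months between first and last
ordered = months.sort_values()
expected = pd.period_range(ordered.iloc[0], ordered.iloc[-1], freq="M")
no_missing_months = len(expected) == len(ordered)

# In OHLC data, High is the row-wise maximum and Low the row-wise minimum
price_cols = sample[["Price", "Open", "High", "Low"]]
high_is_max = (sample["High"] == price_cols.max(axis=1)).all()
low_is_min = (sample["Low"] == price_cols.min(axis=1)).all()

print("One row per month:", one_row_per_month)
print("No missing months:", no_missing_months)
print("High/Low bound each row:", high_is_max and low_is_min)
```

The same checks could be run on `df` once its `Date` and price columns have been converted from strings.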

In [None]:
df['Date'] = pd.to_datetime(df['Date'], format="%m-%d-%y")

for col in ['Price', 'Open', 'High', 'Low']:
    df[col] = df[col].str.replace(',', '', regex=False).astype(float)

df['Change %'] = df['Change %'].str.replace('%', '', regex=False).astype(float)

print("DataFrame Info after processing:")
df.info()

print("\nDataFrame head after processing:")
print(df.head())
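As an alternative to stripping commas after loading, `pd.read_csv` accepts a `thousands` argument that handles the separator at read time. The sketch below is illustrative only: `raw` is a hypothetical in-memory CSV with made-up values, laid out with the same column names and date format that the cell above assumes.

```python
import io
import pandas as pd

# Hypothetical CSV text in the same layout as the file (values are made up)
raw = io.StringIO(
    'Date,Price,Open,High,Low,Change %\n'
    '"02-01-24","1,050.50","1,000.25","1,060.75","990.50","5.00%"\n'
    '"01-01-24","1,000.25","980.00","1,010.50","975.25","2.50%"\n'
)

# thousands="," lets pandas parse "1,050.50" directly as the float 1050.5
parsed = pd.read_csv(raw, thousands=",")

# Dates and percentages still need explicit conversion
parsed["Date"] = pd.to_datetime(parsed["Date"], format="%m-%d-%y")
parsed["Change %"] = parsed["Change %"].str.rstrip("%").astype(float)

print(parsed.dtypes)
```

This keeps the cleaning step shorter, at the cost of applying the separator rule to every column in the file.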

In [None]:
shape = df.shape
print(shape)

In [None]:
prices = df.loc[:, df.columns.str.contains('Price')]
print(prices)


Now let's create a DataFrame containing just the date and the price. This prepares the data for a simple time-series plot.

In [None]:
# Take the 'Date' column as a Series
dates_series = df['Date']

# Take the 'Price' column; selecting a single column also returns a Series
prices_df = df['Price']

# Combine the two Series column-wise (axis=1) into a two-column DataFrame
new_df = pd.concat([dates_series, prices_df], axis=1)

print(new_df.head())
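A two-column frame like this can be passed to matplotlib for the time-series plot. The following is a minimal sketch, not a definitive implementation: `sample` is a hypothetical stand-in for `new_df` with made-up values, `plot_price` is an illustrative helper name, and the rows are sorted by date first in case the file is not stored in chronological order.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for new_df (values are made up for illustration)
sample = pd.DataFrame({
    "Date": pd.to_datetime(["2024-03-01", "2024-02-01", "2024-01-01"]),
    "Price": [105.0, 102.0, 100.0],
})

def plot_price(frame):
    """Plot Price against Date in chronological order and return the Axes."""
    # Sort so the line runs left to right from oldest to newest
    ordered = frame.sort_values("Date")
    fig, ax = plt.subplots(figsize=(10, 4))
    ax.plot(ordered["Date"], ordered["Price"])
    ax.set_xlabel("Date")
    ax.set_ylabel("Price")
    ax.set_title("Monthly price")
    return ax

ax = plot_price(sample)
```

In the notebook, calling `plot_price(new_df)` would draw the same chart from the real data.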