# Time Series with Financial Data

Financial analysts use time series data, such as stock price movements or a company's sales over time, to analyze a company's performance ([see](https://corporatefinanceinstitute.com/resources/data-science/time-series-data-analysis/)).

Investors can use growth investing strategies to home in more precisely on stocks or other investments that offer above-average profit potential. There are many approaches to investing in the stock market, but the goal is generally the same regardless of the approach: grow your investments and increase your profits ([see](https://corporatefinanceinstitute.com/resources/capital-markets/a-guide-to-growth-investing/)).

<br>

---  
Source:  
+ [Candle Stick Charts with Plotly](https://plotly.com/python/candlestick-charts/)  
+ [Scatter Plot of Financial Data with Plotly](https://plotly.com/python/line-and-scatter/)  
+ [Bar Race Charts](https://www.analyticsvidhya.com/blog/2021/07/construct-various-types-of-bar-race-charts-with-plotly/)
+ [Feature Engineering Techniques For Time Series Data](https://www.analyticsvidhya.com/blog/2019/12/6-powerful-feature-engineering-techniques-time-series/)
+ [Differencing Time Series](https://towardsdatascience.com/an-intuitive-guide-to-differencing-time-series-in-python-1d6c7a2c067a)

---  
Data ([from Yahoo Finance](https://finance.yahoo.com/)):
+ Credit Suisse Stock Market Price (April 2009 - March 2023) -- **DATA-CS.csv**
+ UBS Group Stock Market Price (April 2009 - March 2023) -- **DATA-UBS.csv**
---  

Author: 
+ dr. daniel benninger  

History:  
+ 2023-04-06 v2 dbe --- initial version for BINA FS23  
---

## Load Libraries and Check Environment

In [None]:
import pandas as pd
from datetime import datetime
import plotly.graph_objects as go

In [None]:
print(pd.__version__)

In [None]:
%ls
%cd sample_data

In [None]:
import warnings
warnings.filterwarnings("ignore")

## Load Financial Data and Verify Structure/Format/Values

In [None]:
# load the financial dataset from the BINA FS23 GitHub repository
path = 'https://raw.githubusercontent.com/sawubona-gmbh/BINA-FS23-WORK/main/LB10-Regression%2BTimeSeries/Python/DATA-CS.csv'
data = pd.read_csv(path)

# OPTION: load the financial dataset from a local file
# data = pd.read_csv('DATA-CS.csv')

In [None]:
data.head(5)

In [None]:
data.info()

In [None]:
data.describe()

In [None]:
# convert the date column to "datetime" format
data["Date"] = pd.to_datetime(data["Date"])

In [None]:
data.info()
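
As an alternative to converting the column after loading, the dates can be parsed while the file is read. The following is a minimal sketch, assuming a CSV in the same column layout; the two sample rows are invented for illustration and are not taken from **DATA-CS.csv**.

```python
import io
import pandas as pd

# Hypothetical two-row sample in the Yahoo Finance column layout (values invented).
csv_text = """Date,Open,High,Low,Close,Volume
2019-01-02,10.0,10.5,9.8,10.2,1000
2019-01-03,10.2,10.8,10.1,10.6,1200
"""

# parse_dates converts the listed columns to datetime while reading the file.
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

# The Date column now has a datetime dtype instead of object (string).
print(sample.dtypes)
```

With the real dataset, the same `parse_dates=["Date"]` argument could be passed to the `pd.read_csv(path)` call.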

--- 
## Select time range and plot time series
Select a specific timeframe

In [None]:
df= data[(data['Date'] > "2019-01-01") & (data['Date'] < "2023-01-01")]
# df= data[(data['Date'] > "2018-01-01")]

and plot the financial time series OHLC as **candlesticks** using *plotly.graph_objects*

In [None]:
fig = go.Figure(data=[go.Candlestick(x=df['Date'],
                open=df['Open'],
                high=df['High'],
                low=df['Low'],
                close=df['Close'])])

fig.update_layout(
    title="Finance Institutes - Stock Market Price <br><sup>CREDIT SUISSE</sup>",
    yaxis_title='US$',
    width=1000, height=600,
    yaxis_range=(0, 25))

fig.show()

## Some Feature Engineering Techniques applied to Financial Time Series Data

### **Date-Related** Features   
Information about the day, month, year e.g. *day of the week*, *quarter*, *day/week of year* etc. 

In [None]:
data['year']=data['Date'].dt.year 
data['month']=data['Date'].dt.month 
data['day']=data['Date'].dt.day

data['dayofweek_num']=data['Date'].dt.dayofweek  
data['dayofyear_num']=data['Date'].dt.dayofyear 
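# Series.dt.week was removed in pandas 2.0; isocalendar().week returns the ISO week number
data['weekofyear_num']=data['Date'].dt.isocalendar().week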
data['quarter_num']=data['Date'].dt.quarter
data['daysinmonth_num']=data['Date'].dt.days_in_month

data.head()
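
The date accessors can be checked on a few hand-picked dates. This is a small sketch on invented dates (not the stock data); it shows that the ISO week number near a year boundary can belong to the previous ISO year, which is why the ISO year is extracted alongside it.

```python
import pandas as pd

# Three illustrative dates: two around New Year 2021 and the last day of 2021.
dates = pd.Series(pd.to_datetime(["2021-01-01", "2021-01-04", "2021-12-31"]))

# isocalendar() returns a DataFrame with the columns year, week and day.
iso = dates.dt.isocalendar()

features = pd.DataFrame({
    "date": dates,
    "dayofweek": dates.dt.dayofweek,  # Monday=0 ... Sunday=6
    "quarter": dates.dt.quarter,
    "iso_week": iso["week"],          # ISO 8601 week number
    "iso_year": iso["year"],          # ISO year the week belongs to
})

# 2021-01-01 is a Friday and falls into ISO week 53 of 2020.
print(features)
```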

---  
### **Lag-Related** Features  
Suppose we want to predict a company's stock price. The previous day's price is an important input, because the value at time t is strongly affected by the value at time t-1. Past values are known as lags: t-1 is lag 1, t-2 is lag 2, and so on.

In [None]:
data['lag_1'] = data['Close'].shift(1)

# .copy() makes dataX an independent DataFrame, so new columns can be added safely
dataX = data[['Date', 'lag_1', 'Close']].copy()
dataX.head()

In [None]:
dataX['performance_1']=dataX['Close']-dataX['lag_1']

dataX.head()
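
The effect of `shift` is easiest to see on a very short series. The sketch below uses four invented closing prices; the column name `change_1` is chosen here for illustration only.

```python
import pandas as pd

# Four invented closing prices.
close = pd.Series([10.0, 10.5, 10.2, 11.0], name="Close")

lagged = pd.DataFrame({
    "Close": close,
    "lag_1": close.shift(1),  # previous row's value; the first row has no predecessor (NaN)
})

# Day-over-day change: current close minus the lag-1 value.
lagged["change_1"] = lagged["Close"] - lagged["lag_1"]

print(lagged)
```

The first row of `lag_1` and `change_1` is NaN because no earlier observation exists; such rows usually have to be dropped or imputed before a model is trained.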

If the series has a weekly pattern, so that the value last Monday helps to predict the value this Monday, we should create lag features that reach back one week. For data with one row per calendar day that is lag 7; a dataset that contains trading days only usually needs lag 5 to reach the same weekday of the previous week.

We can create multiple lag features as well. Let's say we want lag 1 to lag 7: we can let the model decide which one is the most valuable. If we train a linear regression model, it will assign appropriate weights (or coefficients) to the lag features.

In [None]:
# create the lag features lag_1 ... lag_7 from the closing price
for k in range(1, 8):
    data[f'lag_{k}'] = data['Close'].shift(k)

dataX = data[['Date', 'lag_1', 'lag_2', 'lag_3', 'lag_4', 'lag_5', 'lag_6', 'lag_7', 'Close']]
dataX.head(10)
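
How a linear model weights the lag features can be sketched on synthetic data. The example below is an illustration under stated assumptions, not a model of the stock data: it generates a series in which each value depends only on the previous one (factor 0.8 plus noise, both chosen arbitrarily) and fits an ordinary least-squares regression with NumPy.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic series: each value is 0.8 times the previous value plus random noise.
n = 500
values = np.zeros(n)
for t in range(1, n):
    values[t] = 0.8 * values[t - 1] + rng.normal(scale=0.1)

# Build lag_1 ... lag_3 and drop the leading rows that contain NaN.
frame = pd.DataFrame({"y": values})
lag_cols = []
for k in range(1, 4):
    frame[f"lag_{k}"] = frame["y"].shift(k)
    lag_cols.append(f"lag_{k}")
frame = frame.dropna()

# Ordinary least squares: y ~ intercept + lag_1 + lag_2 + lag_3.
X = np.column_stack([np.ones(len(frame)), frame[lag_cols].to_numpy()])
coef, *_ = np.linalg.lstsq(X, frame["y"].to_numpy(), rcond=None)

print(dict(zip(["intercept"] + lag_cols, coef.round(3))))
```

Because only the previous value drives this synthetic series, the fitted coefficient for `lag_1` should come out close to 0.8 and the others close to zero. Real price data rarely shows such a clean pattern.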

---  
### **Rolling Window** Features  
How about calculating some statistical values based on past values? This method is called the rolling window method because the window slides forward with each data point, so every data point gets its own window of preceding values.  

We will select a window size, take the average of the values in the window, and use it as a feature.

In [None]:
data['rolling_mean7'] = data['Close'].rolling(window=7).mean()

dataX = data[['Date', 'rolling_mean7', 'Close']]
dataX.head(10)
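
The leading NaN values in the rolling mean come from windows that are not yet full. A minimal sketch on five invented values, using a window of 3 to keep the numbers easy to verify by hand:

```python
import pandas as pd

close = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Full windows only: the first two rows have fewer than 3 values and stay NaN.
roll3 = close.rolling(window=3).mean()

# min_periods=1 also averages partial windows at the start of the series.
roll3_partial = close.rolling(window=3, min_periods=1).mean()

print(pd.DataFrame({"close": close, "roll3": roll3, "roll3_partial": roll3_partial}))
```

The third row of `roll3` is (1 + 2 + 3) / 3 = 2.0. Whether partial windows are acceptable depends on the use case; the notebook keeps the default, which requires a full window.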

In [None]:
import plotly.express as px
df= dataX[(dataX['Date'] > "2019-01-01") & (dataX['Date'] < "2019-12-31")]
#df.info()

# Create Line plot
fig = px.line(df, x=df['Date'], y=['Close', 'rolling_mean7'])


# Setup Layout
fig.update_layout(
    title="Finance Institutes - Stock Market Price with Rolling Means <br><sup>CREDIT SUISSE</sup>",
    legend_title="Data Points",
    yaxis_title='US$',
    width=1000, height=600,
    yaxis_range = (10,15))

# Display the plot
fig.show()

In [None]:
data['rolling_mean20'] = data['Close'].rolling(window=20).mean()
data['rolling_mean60'] = data['Close'].rolling(window=60).mean()

dataY = data[['Date', 'Close', 'rolling_mean20', 'rolling_mean60']]
dataY.head(25)

In [None]:
import plotly.express as px
df= dataY[(dataY['Date'] > "2019-01-01") & (dataY['Date'] < "2019-12-31")]
#df.info()

# Create Line plot
fig = px.line(df, x=df['Date'], y=['Close', 'rolling_mean20','rolling_mean60'])


# Setup Layout
fig.update_layout(
    title="Finance Institutes - Stock Market Price with Rolling Means <br><sup>CREDIT SUISSE</sup>",
    legend_title="Data Points",
    yaxis_title='US$',
    width=1000, height=600,
    yaxis_range = (10,15))

# Display the plot
fig.show()

---  
### **Differencing** Time Series
Differencing is a method of transforming a time series dataset. Differencing is performed by subtracting the previous observation from the current observation.  

Differencing can help stabilize the mean of a time series by removing changes in its level, thereby eliminating (or reducing) trend and seasonality. 

In [None]:
# .copy() makes dataZ an independent DataFrame, so new columns can be added safely
dataZ = data[['Date', 'Close']].copy()
dataZ['diff1'] = dataZ['Close'].diff(periods=1)

dataZ.head()

In [None]:
dataZ['diff2'] = dataZ['Close'].diff(periods=2)
dataZ['diff5'] = dataZ['Close'].diff(periods=5)

dataZ.head(10)
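
A short sketch on an artificial linear trend (slope and offset chosen arbitrarily) shows how differencing removes a trend. It also shows that `diff(periods=2)` subtracts the observation two steps back, which is not the same as differencing twice.

```python
import numpy as np
import pandas as pd

# Artificial series with a pure linear trend: 5, 7, 9, ...
t = np.arange(10)
trend = pd.Series(2.0 * t + 5.0)

first_diff = trend.diff()           # y_t - y_(t-1): constant slope, trend removed
lag2_diff = trend.diff(periods=2)   # y_t - y_(t-2): twice the slope
second_order = trend.diff().diff()  # difference of the differences: zero for a linear trend

print(pd.DataFrame({
    "trend": trend,
    "first_diff": first_diff,
    "lag2_diff": lag2_diff,
    "second_order": second_order,
}))
```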

In [None]:
import plotly.express as px
df= dataZ[(dataZ['Date'] > "2019-01-01") & (dataZ['Date'] < "2019-12-31")]

# Create Line plot
fig = px.line(df, x=df['Date'], y=['Close', 'diff1','diff5'])


# Setup Layout
fig.update_layout(
    title="Finance Institutes - Stock Market Price with Differencing <br><sup>CREDIT SUISSE</sup>",
    legend_title="Data Points",
    yaxis_title='US$',
    width=1000, height=600,
    yaxis_range = (-15,15))

# Display the plot
fig.show()

---  
### **ADD ON:** Line or Bar Charts for Time Series?


In [None]:
dataZ = data[['Date', 'Close']]
dataZ.info()
data.head()

In [None]:
import matplotlib.pyplot as plt

dataZ = data[['Date', 'Close']]

plt.figure(figsize=(10, 8))
# as LINE chart
plt.plot(dataZ.Date, dataZ.Close)
# as BAR chart
#plt.bar(dataZ.Date, dataZ.Close)

plt.suptitle("Finance Institutes - Stock Market Price Daily CLOSING")
plt.title("CREDIT SUISSE")
plt.xlabel('Date')
plt.ylabel('US$')

plt.show()

---   
### **ADD ON:** Visualizing time series data in [Heatmap](https://www.analyticsvidhya.com/blog/2021/02/visualization-in-time-series-using-heatmaps-in-python/) form


In [None]:
#!pip install calplot

In [None]:
dataZ = data[['Date','Close']]
df= dataZ[(dataZ['Date'] > "2019-01-01") & (dataZ['Date'] < "2022-01-01")]
df.head()

In [None]:
df.info()

In [None]:
df['Date'] = pd.to_datetime(df['Date'])
df.set_index('Date', inplace = True)
df.info()

In [None]:
import calplot

# calplot returns the matplotlib figure together with its axes
fig1, axes = calplot.calplot(data = df['Close'], 
                       cmap = 'jet', 
                       figsize = (10, 5), 
                       suptitle = "CREDIT SUISSE - Closing per Day",
                       )

# save the heatmap as a PNG file
fig1.savefig('cs-heatmap.png')



---  
### **ADD ON:** Systematic Feature Engineering with [tsfresh](https://tsfresh.readthedocs.io/en/latest/text/introduction.html)  
**tsfresh** is used for systematic feature engineering from time-series and other sequential data. These data have in common that they are ordered by an independent variable. The most common independent variable is time (time series).  
If we want to calculate different characteristics of time series such as the maximum or minimum, the average or the number of temporary peaks, without tsfresh, we have to calculate all those characteristics manually.  
tsfresh automates this process by calculating and returning all those features.
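
To illustrate what calculating such characteristics manually looks like, here is a minimal pandas sketch. The two series `A` and `B`, the column names `id` and `time`, and all values are invented for this example; the sketch does not use tsfresh.

```python
import pandas as pd

# Two short invented series in long format: one row per (id, time) observation.
toy = pd.DataFrame({
    "id":    ["A", "A", "A", "B", "B", "B"],
    "time":  [1, 2, 3, 1, 2, 3],
    "Close": [10.0, 12.0, 11.0, 5.0, 4.0, 6.0],
})

# One row of hand-picked features per series id.
manual = (
    toy.sort_values(["id", "time"])
       .groupby("id")["Close"]
       .agg(["mean", "max", "min"])
)

print(manual)
```

Every additional characteristic has to be added to the `agg` list by hand. A feature extraction library takes over this bookkeeping for a large catalogue of features.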

In [None]:
#!pip install -U tsfresh

In [None]:
dataZ = data[['Date','Open','High','Low','Close','Volume','year','month','day']]
dataZ.head()

In [None]:
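# settings for feature extraction: the full default feature set
from tsfresh.feature_extraction import ComprehensiveFCParameters
settings = ComprehensiveFCParameters()

# OPTION: restrict the calculated features per column, e.g.
kind_to_fc_parameters = {
    "Open": {"mean": None},
    "Close": {"maximum": None, "minimum": None}
}

# automated feature extraction
# column_id labels the series each row belongs to; with column_id="Date"
# every series consists of a single observation
from tsfresh.feature_extraction import extract_features
features = extract_features(dataZ, column_id="Date", column_sort="Date", default_fc_parameters=settings)

# OPTION: use the built-in default settings
#features = extract_features(dataZ, column_id="Date", column_sort="Date")

# OPTION: calculate only the restricted feature set defined above
#features = extract_features(dataZ, column_id="Date", column_sort="Date", kind_to_fc_parameters=kind_to_fc_parameters)

# OPTION: one series per calendar year instead of one per date
#features = extract_features(dataZ.drop(columns=['month', 'day']), column_id="year", column_sort="Date", default_fc_parameters=settings)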

In [None]:
features.info()

In [None]:
features.head()

In [None]:
features.describe()