# Finans Projekt 1 - English
This notebook provides starter code for many of the questions in the project.
The sections are labeled with the corresponding question in the project description.
Remember, this code is only a starting point; it does not fully answer the corresponding questions.

#### Initialize python packages

In [None]:
import numpy as np
import pandas as pd
import scipy.stats as stats
import matplotlib.pyplot as plt
import statsmodels.api as sm

#### Read Data

In [None]:
# path to project data (replace with your own path)
file_path = '/Users/johndoe/Documents/DTU/intro_stat/projects/finans1/finans1_data.csv'

## Read data into a pandas DataFrame
D = pd.read_csv(file_path, delimiter=";")
## Keep only the dates and the ETFs AGG, VAW, IWN, and SPY
D = D.loc[:, ["t", "AGG", "VAW", "IWN", "SPY"]]

#### a) Simple summary of data

In [None]:
print(f"Dimension of DataFrame: {D.shape}") # f-strings allow us to insert variables directly into the string
print(f"Variable names: {D.columns}")
print("\nFirst few rows of DataFrame:") # \n is the newline character for strings
display(D.head())
print("Last few rows of DataFrame:")
display(D.tail())
print("Some summary statistics:")
display(D.describe())
print("Data types:", D.dtypes)

#### b) Histogram (empirical density)

In [None]:
## Histogram describing the empirical density of the weekly returns from
## AGG (histogram of weekly returns normalized to have an area of 1)
plt.hist(D['AGG'].dropna(), bins=20, density=True, color='blue', edgecolor='black') # dropna() removes potential missing values
plt.show()
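
To see what `density=True` means, the following minimal sketch uses `np.histogram`, which normalizes the bar heights in the same way. The data in `x` are simulated for illustration only and are not taken from the project file.

```python
import numpy as np

# Hypothetical weekly returns, simulated only for illustration
rng = np.random.default_rng(seed=1)
x = rng.normal(loc=0.0, scale=0.01, size=200)

# heights[i] is the density in bin i; edges holds the 21 bin boundaries
heights, edges = np.histogram(x, bins=20, density=True)
widths = np.diff(edges)

# With density=True, the total area of the bars is 1
area = np.sum(heights * widths)
print(f"Total area of the histogram: {area:.6f}")
```

Because the area is 1, the histogram can be compared directly with a probability density function drawn on the same axes.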

#### Date variable t

In [None]:
# Convert the variable 't' to a date variable
D['t'] = pd.to_datetime(D['t'])
# pd.to_datetime() converts strings to pandas datetime objects.
# This is necessary to make the variable ordinal, so the dates can be sorted and plotted in order
display(D['t'].describe())
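
As a small illustration of the conversion, the sketch below parses a few date strings and checks the spacing between them. The dates and the `format` string are hypothetical; the format used in the project file may differ, in which case `format` should be adjusted or left out.

```python
import pandas as pd

# Hypothetical date strings; the format in the project file may differ
raw = pd.Series(["2006-05-05", "2006-05-12", "2006-05-19"])
dates = pd.to_datetime(raw, format="%Y-%m-%d")

print(dates.dtype)                     # a datetime64 dtype rather than plain strings
print(dates.min(), dates.max())        # earliest and latest date
print(dates.diff().dropna().unique())  # gaps between consecutive dates
```

Checking the gaps between consecutive dates is a quick way to confirm that the observations really are weekly.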

#### c) Plots of data over time 

In [None]:
## Plot of weekly return over time for AGG
ylim = (-0.2, 0.2)
plt.plot(D['t'], D['AGG'], label='AGG')
plt.ylim(ylim)
plt.xlabel("Date")
plt.ylabel("Return AGG")
plt.show()
## Similar plots for the three other ETFs
for etf in ['VAW', 'IWN', 'SPY']:
    plt.plot(D['t'], D[etf], label=etf)
    plt.ylim(ylim)
    plt.xlabel("Date")
    plt.ylabel(f"Return {etf}")
    plt.show()

#### d) Box plots by ETF

In [None]:
etfs = ['AGG', 'VAW', 'IWN', 'SPY']

plt.figure(figsize=(10, 6))
# dropna() removes missing values, which would otherwise give empty boxes
plt.boxplot([D[etf].dropna() for etf in etfs], labels=etfs)
plt.xlabel("ETF")
plt.ylabel("Return")
plt.grid(axis='y')
plt.show()
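
The quantities behind a box plot can also be computed by hand. The sketch below assumes matplotlib's default whisker rule (`whis=1.5`) and uses made-up values in `x`, chosen so that one observation falls outside the fences.

```python
import numpy as np

# Hypothetical returns, chosen so that one value is an extreme observation
x = np.array([-0.04, -0.01, 0.00, 0.01, 0.02, 0.03, 0.10])

# The box spans the lower and upper quartiles; the line inside is the median
q1, median, q3 = np.percentile(x, [25, 50, 75])
iqr = q3 - q1

# Fences at 1.5 * IQR from the box (matplotlib's default, whis=1.5)
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Observations outside the fences are drawn as individual points
outliers = x[(x < lower_fence) | (x > upper_fence)]

print(f"Q1 = {q1:.4f}, median = {median:.4f}, Q3 = {q3:.4f}, IQR = {iqr:.4f}")
print(f"Fences: [{lower_fence:.4f}, {upper_fence:.4f}]")
print(f"Extreme observations: {outliers}")
```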

#### e) Key summary statistics for AGG

In [None]:
print(f"Total number of observations (without missing values): {D['AGG'].notna().sum()}")
print(f"Sample mean of weekly returns: {np.mean(D['AGG'])}")
print(f"Sample variance of weekly returns: {np.var(D['AGG'], ddof=1)}") # ddof=1 as we want the *sample* variance
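
The role of `ddof=1` can be checked on a small example. The sketch below computes the sample variance from its definition (sum of squared deviations divided by n - 1) and compares it with the library functions. The values in `x` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical weekly returns
x = np.array([0.01, -0.02, 0.03, 0.00, -0.01])
n = x.size

# Sample mean and sample variance from their definitions
x_bar = x.sum() / n
s2 = np.sum((x - x_bar) ** 2) / (n - 1)

print(f"By hand (divide by n - 1): {s2}")
print(f"np.var(x, ddof=1):         {np.var(x, ddof=1)}")
print(f"np.var(x) (divides by n):  {np.var(x)}")
print(f"pd.Series(x).var():        {pd.Series(x).var()}")
```

Note that NumPy defaults to `ddof=0` for arrays, while the pandas method `var()` defaults to `ddof=1`, so it is safest to state `ddof` explicitly.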

#### f) QQ-plot for model validation

In [None]:
# QQ-plot for AGG's weekly returns
# line='q' adds a reference line fitted through the quartiles
sm.qqplot(D['AGG'].dropna(), line='q')
plt.show()
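
A QQ-plot pairs the sorted observations with the quantiles expected under a normal distribution. The sketch below computes these pairs with `scipy.stats.probplot` on simulated (hypothetical) data, without drawing anything. Note that `probplot` fits a least-squares line, whereas `line='q'` in `sm.qqplot` draws a line through the quartiles, so the two reference lines can differ slightly.

```python
import numpy as np
import scipy.stats as stats

# Hypothetical weekly returns, simulated from a normal distribution
rng = np.random.default_rng(seed=2)
x = rng.normal(loc=0.001, scale=0.01, size=200)

# theoretical_q: quantiles expected under normality (x-axis of the QQ-plot)
# ordered_x: the sorted observations (y-axis of the QQ-plot)
(theoretical_q, ordered_x), (slope, intercept, r) = stats.probplot(x, dist="norm")

print(f"Slope of fitted line (roughly the standard deviation): {slope:.4f}")
print(f"Intercept of fitted line (roughly the mean): {intercept:.4f}")
print(f"Correlation between the two sets of quantiles: {r:.4f}")
```

If the normal model is reasonable, the points lie close to a straight line and `r` is close to 1.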

#### g-h) One-sample t-test

In [None]:
# Test hypothesis mu = 0 for AGG's weekly returns
# dropna() removes missing values, which would otherwise make the result NaN
res = stats.ttest_1samp(D['AGG'].dropna(), popmean=0)
print(f"Test statistic: {res.statistic}")
print(f"P-value: {res.pvalue}")

# Confidence interval for the mean (95% by default)
print(res.confidence_interval())
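
To see where these numbers come from, the sketch below computes the test statistic, the p-value and a 95% confidence interval from the formulas and compares the first two with `stats.ttest_1samp`. The values in `x` are hypothetical and only serve as an illustration.

```python
import numpy as np
import scipy.stats as stats

# Hypothetical weekly returns
x = np.array([0.012, -0.004, 0.007, 0.001, -0.009, 0.015, 0.003, -0.002])
n = x.size

# Sample mean and standard error of the mean
x_bar = x.mean()
se = x.std(ddof=1) / np.sqrt(n)

# Test statistic for H0: mu = 0 and two-sided p-value from the t-distribution
t_obs = (x_bar - 0) / se
p_value = 2 * stats.t.sf(abs(t_obs), df=n - 1)

# 95% confidence interval: mean +/- t-quantile * standard error
t_crit = stats.t.ppf(0.975, df=n - 1)
ci = (x_bar - t_crit * se, x_bar + t_crit * se)

res = stats.ttest_1samp(x, popmean=0)
print(f"By hand: t = {t_obs:.4f}, p = {p_value:.4f}")
print(f"scipy:   t = {res.statistic:.4f}, p = {res.pvalue:.4f}")
print(f"95% confidence interval by hand: ({ci[0]:.5f}, {ci[1]:.5f})")
```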


#### i) Welch t-test

In [None]:
# Comparing the mean weekly returns of VAW and AGG
res = stats.ttest_ind(D['VAW'].dropna(), D['AGG'].dropna(), equal_var=False)
print(f"Test statistic: {res.statistic}")
print(f"P-value: {res.pvalue}")
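
The Welch test does not assume equal variances in the two groups. The sketch below computes the test statistic and the Welch-Satterthwaite degrees of freedom by hand and compares the result with `stats.ttest_ind(..., equal_var=False)`. Both samples are made up for illustration.

```python
import numpy as np
import scipy.stats as stats

# Two hypothetical samples of weekly returns with different sizes and spreads
x = np.array([0.020, -0.015, 0.030, 0.005, -0.025, 0.010, 0.018, -0.008])
y = np.array([0.002, -0.001, 0.003, 0.000, 0.001, -0.002])

# Estimated variance of each sample mean
v_x = x.var(ddof=1) / x.size
v_y = y.var(ddof=1) / y.size

# Welch test statistic
t_obs = (x.mean() - y.mean()) / np.sqrt(v_x + v_y)

# Welch-Satterthwaite approximation of the degrees of freedom
df = (v_x + v_y) ** 2 / (v_x ** 2 / (x.size - 1) + v_y ** 2 / (y.size - 1))

# Two-sided p-value
p_value = 2 * stats.t.sf(abs(t_obs), df=df)

res = stats.ttest_ind(x, y, equal_var=False)
print(f"By hand: t = {t_obs:.4f}, df = {df:.2f}, p = {p_value:.4f}")
print(f"scipy:   t = {res.statistic:.4f}, p = {res.pvalue:.4f}")
```

The degrees of freedom are usually not an integer, which is expected for this approximation.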

#### k) Correlation

In [None]:
# Computing the correlation between selected ETFs
correlation_matrix = D[["AGG", "VAW", "IWN", "SPY"]].corr()
display(correlation_matrix)
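
Each entry of the correlation matrix is the sample covariance of two columns divided by the product of their sample standard deviations. The sketch below verifies this for one pair of columns. The column names `A` and `B` and their values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical returns for two assets
df = pd.DataFrame({
    "A": [0.010, -0.020, 0.015, 0.000, -0.005, 0.020],
    "B": [0.008, -0.010, 0.020, -0.002, 0.001, 0.012],
})
n = len(df)

# Sample covariance from its definition
dev_a = df["A"] - df["A"].mean()
dev_b = df["B"] - df["B"].mean()
cov_ab = (dev_a * dev_b).sum() / (n - 1)

# Sample correlation: covariance scaled by the standard deviations (ddof=1)
r = cov_ab / (df["A"].std() * df["B"].std())

print(f"By hand:     {r:.6f}")
print(f"df.corr():   {df.corr().loc['A', 'B']:.6f}")
```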

## EXTRA
#### Subsets in Python

In [None]:
## D['AGG'] < 0 returns a boolean Series that is True where AGG is negative
## It can be used as a mask to extract all AGG losses
loss_weeks = D['AGG'] < 0
agg_losses = D['AGG'][loss_weeks]
print("Weeks with negative returns in AGG:")
display(agg_losses)

## Alternatively, use the 'query' method
agg_losses_query = D.query('AGG < 0')
print("Weeks with negative returns in AGG (query method):")
display(agg_losses_query)
# Or use the 'loc' method
agg_losses_loc = D.loc[D['AGG'] < 0, 'AGG']
print("Weeks with negative returns in AGG (loc method):")
display(agg_losses_loc)

## More complex logical expressions can be made, e.g.:
## Find all observations from weeks where AGG had a loss and SPY had a gain
agg_loss_spy_gain = D.query('AGG < 0 and SPY > 0')
print("Weeks with negative AGG returns and positive SPY returns:")
display(agg_loss_spy_gain)

# "display()" function gives a nicer table than print. It is 
# especially useful when working with dataframes (pandas)
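
The same combined condition can be written with boolean masks. The sketch below uses a small made-up DataFrame to show that the mask and `query` approaches select the same rows. Note that each comparison must be wrapped in parentheses when masks are combined with `&`.

```python
import pandas as pd

# Hypothetical returns for four weeks
df = pd.DataFrame({
    "AGG": [-0.01, 0.02, -0.03, 0.01],
    "SPY": [0.02, 0.01, -0.01, -0.02],
})

# Parentheses are required: & binds tighter than < and > in plain Python
mask = (df["AGG"] < 0) & (df["SPY"] > 0)
via_mask = df[mask]

# Equivalent selection with the query method
via_query = df.query("AGG < 0 and SPY > 0")

print(via_mask)
print(via_query)
```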

#### Additional Python tips

In [None]:
## Make a for loop to calculate some summary 
## statistics and save the result in a new data frame
Tbl = pd.DataFrame()
for i in ['AGG', 'VAW', 'IWN', 'SPY']:
    Tbl.loc[i, "ETF_mean"] = D[i].mean()
    Tbl.loc[i, "ETF_var"] = D[i].var(ddof=1) 
    
# show
display(Tbl)

In [None]:
# There are many other ways to do these calculations, some more concise. For example:
# Calculate mean and variance for all columns but 't'
result = D.drop(columns='t').agg(['mean', 'var'])
# The agg (aggregate) method calculates the mean and variance of the returns for each ETF.
# 'var' uses ddof=1 by default, i.e. the sample variance.
display(result)

# See more functions in pandas documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html
# NumPy documentation: https://numpy.org/doc/stable/reference/index.html
# Or find documentation or guides on other python packages/functions online.

#### LaTeX Tips
Pandas (pd) also includes a method that is very handy for writing tables/DataFrames directly as LaTeX code.
This is done by using the DataFrame method `to_latex()`, e.g. `Tbl.to_latex()`.
The following is the simplest form of the call:

In [None]:
Tbl_latex = Tbl.to_latex()
print(Tbl_latex)