# Typhoon and Stock Return

In this tutorial, we will investigate whether the stock market performs abnormally after a strong typhoon. We will make extensive use of the data-handling package ```pandas```, the statistical functions of ```scipy``` and the statistics package ```statsmodels```.

## A. Data Cleaning

### A1. Typhoon data

The typhoon data is obtained from the Hong Kong Observatory. 
Each row contains data for a particular signal. Since typhoons often go through
several signals, there are multiple rows for each typhoon.

We will first do some preprocessing:
- Since we are only interested in the effect of strong typhoons, we will only keep typhoons that have a maximum signal at or above No. 8. 
- We will also keep a record of the date each typhoon went below Signal No. 8.

A few notable pandas techniques that we will be using:
- To **select rows** out of a DataFrame whenever a certain column satisfies an inequality, use
```python
DataFrame[DataFrame['column_name'] >= value]
```
More generally, you can select rows by supplying a list of True/False values.


- To convert a column to pandas **datetime** format, use ```pd.to_datetime()```.
  You can then extract individual date components by ```.dt.year```, ```.dt.month``` etc.
  For example, to extract year out of a column called *date*, you can write:
  ```python
  DataFrame['date'] = pd.to_datetime(DataFrame['date'])
  DataFrame['year'] = DataFrame['date'].dt.year
  ```


- There are two ways to calculate the summary statistic of column B **grouped by** values of column A:
    - To collapse to one row per group, use:
        ```python
        DataFrame.groupby('column_A')['column_B'].ops()
        ```
        where `ops()` can be operations such as `mean()`, `max()`, etc. 
        Note that this method returns a pandas Series instead of a DataFrame. 
        To get a DataFrame, append `.to_frame()` at the end.
    - If you want to maintain the original number of rows, use:
        ```python
        DataFrame.groupby('column_A')['column_B'].transform('ops')
        ```

- To **drop duplicates**, use
```python
DataFrame.drop_duplicates(subset,keep)
```
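    - `subset`: by default pandas considers two rows to be duplicates only 
    if they are identical in all columns. You can specify a narrower set of columns here. 
    - `keep`: which duplicate to keep: `'first'` (the default), `'last'`, or `False` to drop all of them.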
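
The four techniques above can be tried out on a small toy table before touching the real file. This is only a sketch: the column names `Name`, `Signal` and `End_date` and all values are made up for illustration and may not match the columns in the Observatory spreadsheet.

```python
import pandas as pd

# Toy data: one row per signal. Names, columns and dates are hypothetical.
toy = pd.DataFrame({
    "Name": ["Ada", "Ada", "Ada", "Bob", "Bob", "Cat", "Cat"],
    "Signal": [3, 8, 3, 1, 3, 8, 10],
    "End_date": ["2001-07-05", "2001-07-06", "2001-07-07",
                 "2002-08-01", "2002-08-02",
                 "2003-09-01", "2003-09-02"],
})

# Convert the date column to datetime format and extract the year
toy["End_date"] = pd.to_datetime(toy["End_date"])
toy["Year"] = toy["End_date"].dt.year

# Highest signal of each typhoon, repeated on every row of that typhoon
toy["Signal_max"] = toy.groupby("Name")["Signal"].transform("max")

# Select rows: keep typhoons whose maximum signal reached No. 8 ...
strong = toy[toy["Signal_max"] >= 8]
# ... and, within those, only the rows at Signal No. 8 or above
strong = strong[strong["Signal"] >= 8]

# Drop duplicates: keep the last remaining row of each typhoon
last_rows = strong.drop_duplicates(subset="Name", keep="last")

# Collapsing alternative: one row per typhoon, returned as a DataFrame
signal_max = toy.groupby("Name")["Signal"].max().to_frame()
```

Here `last_rows` holds one row each for the two toy typhoons that reached Signal No. 8, while `signal_max` has one row for every typhoon, including the weak one.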

In [None]:
import pandas as pd

# Import data and keep only signal 8 or above
typhoon_data = pd.read_excel("../Data/typhoon_hk.xlsx")


# Convert date to pandas datetime format and extract year


# Find the highest signal for each typhoon and store it in 'Signal_max'


# Keep only the last date for each typhoon


# Keep only three variables


# Show the data
typhoon_data

### A2. Stock data

For stock data, we will calculate the return from the previous trading day.

The most notable pandas technique we use here is ```.shift(x)```. 
This method shifts all rows down by *x* rows.
The nice thing about this technique is that it lets you write expressions
like 
```python
stock_data["Price"]/stock_data.shift(1)["Price"] - 1
```
which gives you all daily returns in a single line.
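
To see what `.shift()` does row by row, here is a minimal sketch on four made-up prices. The column names `Prev_price`, `Return` and `Next_price` are chosen for illustration only.

```python
import pandas as pd

# Four hypothetical prices on consecutive trading days
prices = pd.DataFrame({"Price": [100.0, 110.0, 99.0, 108.9]})

# shift(1) moves every value down one row, so each row sees the previous price
prices["Prev_price"] = prices["Price"].shift(1)

# Return since the previous row; the first row has no predecessor, so it is NaN
prices["Return"] = prices["Price"] / prices["Price"].shift(1) - 1

# A negative shift moves values up instead, giving each row the next price
prices["Next_price"] = prices["Price"].shift(-1)
```

pandas also offers `Series.pct_change()`, which computes the same row-over-row return.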

Other notable techniques:
- **Drop rows with missing values**
```python
DataFrame.dropna()
```
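- **Convert column(s) to numeric format**
```python
pd.to_numeric(DataFrame['column_name'])
```
Specify `errors='coerce'` to force the conversion: any value that is not numeric
will be converted to `NaN`. `pd.to_numeric` works on one column at a time; for several columns, use `DataFrame[columns].apply(pd.to_numeric)`.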


- **Fill in missing dates**: first change the DataFrame's index to a date variable:
```python
DataFrame.index = pd.DatetimeIndex(DataFrame['date_column'])
```
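Then call `asfreq` with the desired frequency (for example `'D'` for daily). Dates that were not in the original data appear as rows of `NaN`: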
```python
DataFrame.asfreq(freq)
```
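
The three techniques above can be combined as in the sketch below. The data is made up, and the string `"null"` is just an assumed example of a non-numeric entry; the real file may mark missing prices differently.

```python
import pandas as pd

# Hypothetical raw data: prices read in as strings, one of them not numeric
raw = pd.DataFrame({
    "Date": ["2020-01-02", "2020-01-03", "2020-01-06", "2020-01-07"],
    "Adj Close": ["100.5", "null", "101.0", "102.5"],
})
raw["Date"] = pd.to_datetime(raw["Date"])

# Force-convert to numeric: "null" becomes NaN
raw["Adj Close"] = pd.to_numeric(raw["Adj Close"], errors="coerce")

# Drop the rows that contain missing values
clean = raw.dropna()

# Use the date as the index, then fill in the missing calendar dates
clean.index = pd.DatetimeIndex(clean["Date"])
daily = clean.asfreq("D")
```

In this toy example `clean` has three rows, while `daily` has one row for every calendar day from 2 to 7 January, with `NaN` on the days that were added.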

In [None]:
# Import stock data and keep only two variables
stock_data = pd.read_csv("../Data/hsi.csv")
stock_data = stock_data[["Date","Adj Close"]]

# Convert date to pandas datetime format


# Adj Close is NaN on some dates. 
# Force convert everything to numeric and drop missing.


# Calculate return since the previous trading day


# 90-day future return


# Use date as the index of the dataframe and fill in missing dates


# Show the data
stock_data[0:10]

### A3. Merge stock and typhoon data

We can now merge the stock and typhoon data. To **merge** two DataFrames A and B, use
```python
DataFrame_A.merge(DataFrame_B, options)
```
Common options include:
- `how`: whether the merge keeps all rows from the left DataFrame A (`'left'`), 
the right DataFrame B (`'right'`), the union of the two (`'outer'`) or their intersection (`'inner'`). 
The default is `'inner'`, which means only rows that appear in both DataFrames
are kept.
- `left_on` and `right_on`: the names of the columns used to match the two DataFrames.
- `left_index` and `right_index`: set to `True` to use the DataFrame index instead of a column for the match.
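
As a rough illustration of how `how`, `left_on` and `right_index` interact, the sketch below merges two toy tables. All names and numbers are invented; only `end_date` follows the name used in this tutorial.

```python
import pandas as pd

# Hypothetical typhoon table with a date column
typhoons = pd.DataFrame({
    "Name": ["Ada", "Cat"],
    "end_date": pd.to_datetime(["2020-01-03", "2020-01-05"]),
})

# Hypothetical stock table indexed by trading date
stocks = pd.DataFrame(
    {"daily_return": [0.01, -0.02, 0.005]},
    index=pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06"]),
)

# Default (inner) merge: only typhoons whose end_date is a trading date survive
inner = typhoons.merge(stocks, left_on="end_date", right_index=True)

# Left merge: every typhoon is kept; unmatched ones get NaN returns
left = typhoons.merge(stocks, how="left", left_on="end_date", right_index=True)
```

The inner merge keeps only the typhoon that ended on a trading date, whereas the left merge keeps both and leaves the return of the other one missing.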


Unless the typhoon's signal went below No. 8 before the market opens, no stock data will be available for the given `end_date`. In this case we use the return from the next trading day.

First we extract the list of such typhoons. We can do that by using the ```.isnull()``` method:

Then we merge in stock information from the next day:

If the return is still missing after this step, it must be the case that at least two days have passed since Signal No. 8 was lowered. We will ignore such instances.

To append one DataFrame at the bottom of another, use ```pd.concat()``` (the older ```.append()``` method was removed in pandas 2.0):
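
The steps described above (find the typhoons with a missing return, retry with the next day, then stack the two sets of results) could be sketched as follows. The data and the helper column `next_day` are hypothetical, and the sketch uses `pd.concat` to stack the results.

```python
import pandas as pd

# Hypothetical inputs: three typhoons and a stock table indexed by date
typhoons = pd.DataFrame({
    "Name": ["Ada", "Cat", "Eve"],
    "end_date": pd.to_datetime(["2020-01-03", "2020-01-05", "2020-01-11"]),
})
stocks = pd.DataFrame(
    {"daily_return": [0.01, -0.02, 0.005, 0.003]},
    index=pd.to_datetime(["2020-01-02", "2020-01-03",
                          "2020-01-06", "2020-01-14"]),
)

# First attempt: match on end_date, keeping every typhoon
merged = typhoons.merge(stocks, how="left", left_on="end_date", right_index=True)

# Split into typhoons with and without a return on end_date
no_return = merged["daily_return"].isnull()
matched = merged[~no_return]
missing = merged[no_return][["Name", "end_date"]].copy()

# Second attempt: match the unmatched typhoons on the following day
missing["next_day"] = missing["end_date"] + pd.Timedelta(days=1)
retry = missing.merge(stocks, how="left", left_on="next_day", right_index=True)

# Ignore typhoons that still have no return, then stack both sets of rows
retry = retry.dropna(subset=["daily_return"]).drop(columns="next_day")
data_w_typhoon = pd.concat([matched, retry], ignore_index=True)
```

In this toy data the first typhoon matches directly, the second matches on the following day, and the third is dropped because no return is available on either day.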

## B. Statistical Analysis

Finally we can perform some statistical analysis. We will start with comparing the daily return on the first trading day after a typhoon versus all other days.

### B1. Statistical Tests

```scipy.stats``` contains many of the common tests. Notable ones include:
- **T-test**: ```ttest_ind(A,B)```.
- **Median Test**: ```median_test(A,B)```. 
- **Mann-Whitney rank test**: ```mannwhitneyu(A,B)```. A non-parametric test on whether A and B have the same distribution.
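
The calling pattern of the three tests is shown below on synthetic samples. The numbers are randomly generated and have nothing to do with the actual stock data; `A` and `B` simply stand in for returns on typhoon and non-typhoon days.

```python
import numpy as np
from scipy import stats

# Two synthetic samples with slightly different means (illustration only)
rng = np.random.default_rng(0)
A = rng.normal(loc=0.001, scale=0.01, size=50)
B = rng.normal(loc=0.0, scale=0.01, size=500)

# T-test for equal means; returns the statistic and the p-value
t_stat, t_p = stats.ttest_ind(A, B)

# Mood's median test; also returns the grand median and a contingency table
med_stat, med_p, grand_median, table = stats.median_test(A, B)

# Mann-Whitney rank test, with the two-sided alternative stated explicitly
u_stat, u_p = stats.mannwhitneyu(A, B, alternative="two-sided")

print(t_p, med_p, u_p)
```

Each test returns its p-value as the second element, so a small value there is evidence against the hypothesis that the two samples behave alike.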

In [None]:
from scipy import stats

# Stock data without typhoon. '~' means 'not'.


# Mean daily returns


# T-test


# Mood's median test


# Mann-Whitney test


It turns out that the stock market performs better on average right after a typhoon! The difference is not statistically significant, though.

### B2. Regression

We can also run a regression. Note that running a regression on a single dummy variable is equivalent to running a two-sample T-test that assumes equal variances:

In [None]:
# Signal max = 0 if no typhoon
data_wo_typhoon['Signal_max'] = 0
data_whole = pd.concat([data_w_typhoon, data_wo_typhoon])

# Convert Signal_max to a dummy variable called 'typhoon'
data_whole["typhoon"] = 0
data_whole.loc[data_whole['Signal_max']>=8,'typhoon'] = 1

# scipy OLS
stats.linregress(data_whole['typhoon'],data_whole['daily_return'])
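
The equivalence between a single-dummy regression and a T-test can be checked numerically. The sketch below uses synthetic data with made-up variable names rather than the merged typhoon dataset.

```python
import numpy as np
from scipy import stats

# Synthetic data: 30 "typhoon" observations and 300 others (illustration only)
rng = np.random.default_rng(1)
dummy = np.array([1] * 30 + [0] * 300)
returns = rng.normal(0.0, 0.01, size=dummy.size) + 0.002 * dummy

# Regression of returns on the dummy variable
reg = stats.linregress(dummy, returns)

# Two-sample T-test (equal variances assumed, which is the scipy default)
t_stat, t_p = stats.ttest_ind(returns[dummy == 1], returns[dummy == 0])

# The slope is the difference in group means, and the p-values coincide
diff_in_means = returns[dummy == 1].mean() - returns[dummy == 0].mean()
print(reg.slope, diff_in_means)
print(reg.pvalue, t_p)
```

The two printed pairs should agree up to floating-point error.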

If you prefer output that is more in line with what a statistical package like Stata would give you, use ```statsmodels``` instead:

In [None]:
# statsmodels OLS
import statsmodels.api as sm

# statsmodels does not add the constant by default, so add manually
results = sm.OLS(data_whole['daily_return'],
                 sm.add_constant(data_whole['typhoon'])).fit()
results.summary()

As a statistical package, ```statsmodels``` has many of the common procedures built in. For example, we can correct for serial correlation by computing the Newey-West Standard Errors:

In [None]:
# Newey-West Standard Errors. Note that we are using the results
# from the previous regression.


Here is another idea: what about buying HSI right after a typhoon? Let us compare the mean return of buying right after a typhoon versus that of other days. We will assume a fixed 90-day holding period.

In [None]:
# If you want Newey-West standard errors to begin with, this is how:
results = sm.OLS(data_whole['90d_return'],
              sm.add_constant(data_whole['typhoon'])
             ).fit(cov_type='HAC',cov_kwds={'maxlags':5})
results.summary()

Buying after a typhoon gives us on average a 2.6% higher return over 3 months! Too bad it is not statistically significant.

## C. That's It? No Significant Result?

Let us plot the distribution of returns for days with typhoon and without:

In [None]:
%matplotlib inline
data_whole.hist(column='90d_return',by='typhoon')  

It does look like the return is higher after a typhoon. Thinking about it, a Signal 8 typhoon is often quite pedestrian: people actually go out for breakfast and movies. What if we focus only on the strongest typhoons? I leave this as an exercise for you.