**Fin 585R**  
**Diether**  
**Problem Set**  
**Introduction to Portfolios**  
**Application: Short Selling**

**Overview**

This problem set is designed to introduce you to portfolio construction using a real portfolio application from the Finance literature. This assignment builds on the concepts and code I introduced in the last class, and you should find the Intro to Portfolios Jupyter notebook helpful.


**Short Selling Background**

When someone shorts a stock, they profit if the price of the stock goes down instead of going up, but short-selling transactions are more complicated than going long (buying a stock and then later selling it).  There are four basic steps to short selling:<br>

1. *The short seller borrows the desired number of shares from someone.* This is usually arranged by the broker, who locates the shares and acts as the middleman between the short seller and the lender (note that the broker is often both the middleman and the lender). The lender expects to be paid interest on the loan, which is the main cost of shorting. The lender can call the loan at any time, and the short seller can repay it at any time. <br><br>

2. *The short seller sells the shares.* The proceeds are put into an interest-bearing account called the collateral account. Most lenders require the collateral account to contain 102% of the value of the proceeds. The collateral account usually invests in low-risk, short-term securities (e.g., Treasury bills). Borrowing the stock also involves a lending fee: the short seller pays interest on the loan, and this interest rate is typically small. The interest earned on the collateral account is split between the lender and the short seller. The portion received by the short seller is called the rebate rate. The portion paid to the lender is the **loan fee**, which is equivalent to the interest rate the short seller pays on the loan. Therefore, the **loan fee** is the main direct cost of shorting. The rebate rate can be zero or negative; a negative rebate rate means the lender receives all the interest on the collateral account and the short seller also pays additional interest to the lender out of her own pocket.<br><br>

3. *Pay any dividends while the loan is open.* The short seller must pay the lender the cash equivalent of any dividends paid on the stock. If you short Apple, and Apple pays a 2 dollar dividend per share while you are short, then you owe the lender 2 dollars for every share you shorted. <br><br>

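4. *Buy the shares back.* The short seller buys the shares back and returns them to the lender. The short seller's profit is the sale proceeds minus the repurchase cost, the interest (loan fee) paid to the lender, and any dividends paid under step 3:

$$
\text{Profit} = \text{Sell} - \text{Buy} - \text{Interest Paid} - \text{Dividends Paid}
$$ <br> 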
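Steps 2 through 4 reduce to simple arithmetic. The sketch below is illustrative only: the rates, prices, and holding period are hypothetical, and it assumes the loan fee is charged as simple interest on the sale proceeds, which simplifies how lenders actually compute fees. `rebate_rate` and `short_profit` are hypothetical helper names; the first splits the collateral interest, and the second applies the profit formula, including the dividends the short seller owes the lender.

```python
def rebate_rate(collateral_rate: float, loan_fee: float) -> float:
    """Annual rate (in %) the short seller keeps from the collateral interest.

    Assumption: the collateral interest is split exactly into the rebate
    (short seller) and the loan fee (lender).
    """
    return collateral_rate - loan_fee


def short_profit(sell_price: float, buy_price: float, shares: int,
                 annual_fee_pct: float, months: float,
                 dividends_per_share: float = 0.0) -> float:
    """Profit = Sell - Buy - Interest Paid - Dividends Paid for one short sale.

    Simplifying assumption: the loan fee is simple interest on the sale proceeds.
    """
    proceeds = sell_price * shares
    interest_paid = proceeds * annual_fee_pct / 100 * months / 12
    dividends_paid = dividends_per_share * shares
    return proceeds - buy_price * shares - interest_paid - dividends_paid


# Hypothetical easy-to-borrow stock: collateral earns 5%, loan fee is 1%.
print(rebate_rate(5.0, 1.0))
# Hypothetical hard-to-borrow stock: the fee exceeds the collateral rate,
# so the rebate is negative and the short seller pays out of pocket.
print(rebate_rate(5.0, 8.0))
# Short 100 shares at 50, buy back at 40 one month later,
# 6% annual loan fee, 0.50 dividend per share paid to the lender.
print(short_profit(50.0, 40.0, 100, 6.0, 1, dividends_per_share=0.5))
```

In the last example the short seller gains 1,000 on the price drop, pays 25 of interest (6% per annum for one month on 5,000 of proceeds) and 50 of dividends, for a profit of 925.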
  
The Finance literature has been interested in whether short sellers have good information. Do short sellers typically make money when they short? If loan fees are high, then short sellers are paying a high price to short (i.e., to bet that the stock will perform poorly). Short sellers should be willing to pay a high fee only if they expect the stock to fall by enough to cover it, so high loan fees plausibly mark times when short sellers have good information.

In this problem set you will create portfolios based on the lagged loan fee to test this hypothesis. Remember, the returns in the data are from going long (buying and then selling the stock), and your portfolio returns will reflect that fact. If a portfolio's returns are very low or negative, then short sellers of those stocks earned high returns (before paying loan fees).
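To make the sign flip concrete, here is a rough sketch under two simplifying assumptions: `ret` is a total return that already includes dividends (so the dividend owed to the lender is captured by flipping the sign), and rebate interest on the collateral is ignored. Under those assumptions the monthly return to shorting a stock is roughly the negative of `ret` minus one month of the loan fee. Because `feelag` is a percent per annum, one month of fee is `feelag / 100 / 12` as a decimal. The stock-months and the `short_ret` column below are hypothetical.

```python
import pandas as pd

# Hypothetical stock-months; column names follow the data description.
toy = pd.DataFrame({"ret": [0.05, -0.10, 0.00], "feelag": [0.15, 6.0, 12.0]})

# Rough short-side return: flip the sign of the long return and subtract
# one month of the annual loan fee (feelag is in percent per annum).
toy["short_ret"] = -toy["ret"] - toy["feelag"] / 100 / 12
print(toy)
```

The second row captures the idea behind the problem set: a -10% long return is roughly a +9.5% return to the short seller after a 6% annual fee.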


**Data**

The data are monthly stock data for all U.S. stocks with non-missing loan fee data. The basic unit of observation is the stock-month. You can download the data directly using the following link: [the data](http://diether.org/prephd/03-mstk_short_02-12.csv). There is also a link on *Learning Suite*. The data contain the following variables:

|Variable | Description                                       |
|---------|---------------------------------------------------|
|permno   | stock identifier                                  |
|caldt    | calendar date                                     |
|ret      | monthly return                                    |
|prclag   | stock price, lagged                               |   
|melag    | market equity, lagged                             |
|feelag   | the loan fee expressed as a percent per annum, lagged |

**Tasks and Questions**  

1. What is the sample period of this data?<br><br>

2. There are some observations where `feelag` is less than zero. These are data errors. Remove these observations from the dataframe and then compute the summary statistics for `feelag`.<br><br>

3. Output the median and 90th percentile of `feelag` for every month. This can be done in separate commands. <br><br>

4. Construct three equal-weight portfolios using `feelag` as the criterion variable (loan fees are expressed as annual interest rates, in percent). Portfolio 0: an equal-weight portfolio that includes all stocks with `feelag` less than or equal to 3%. Portfolio 1: an equal-weight portfolio that includes all stocks with `feelag` greater than 3% and less than or equal to 5%. Portfolio 2: an equal-weight portfolio that includes all stocks with `feelag` greater than 5%. Hint: use `pd.cut` to bin the data and use my Jupyter notebook from last class as a guide.<br><br>

5. Compute summary statistics for each of the portfolios you created, including the mean, the standard deviation, and the t-statistic testing whether the portfolio's mean return is zero.<br><br>

6. Are your results consistent with the hypothesis that short sellers have good information?

```python
import pandas as pd
import numpy as np
from finance_byu.summarize import summary
```

```python
df = pd.read_csv("../data/03-mstk_short_02-12.csv", parse_dates=['caldt'])
df.head()
```

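|   | permno | caldt | ret | prclag | melag | feelag |
|---|--------|-------|-----|--------|-------|--------|
| 0 | 10001 | 2005-06-30 | 0.12843 | 8.02 | 21.053 | 0.15 |
| 1 | 10001 | 2005-07-29 | 0.009945 | 9.05 | 26.363 | 0.32701 |
| 2 | 10001 | 2005-08-31 | 0.039387 | 9.14 | 26.625 | 0.15 |
| 3 | 10001 | 2005-10-31 | -0.11904 | 11.51 | 33.529 | 0.1625 |
| 4 | 10001 | 2005-11-30 | -0.059397 | 10.1 | 29.421 | 0.15 |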


```python
# 1. What is the sample period of this data?
all_dates = pd.unique(df['caldt'])
all_dates.sort()
np.datetime_as_string((all_dates[0], all_dates[-1]), unit='D')
```

```
array(['2002-06-28', '2012-07-31'], dtype='<U28')
```
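A shorter way to get the sample period is to take the minimum and maximum of `caldt`, which `read_csv` already parsed as datetimes. `nunique` also counts the number of distinct months. The small frame below is hypothetical and stands in for `df`.

```python
import pandas as pd

# Hypothetical stand-in for df; the real data have one row per stock-month.
toy = pd.DataFrame({"caldt": pd.to_datetime(
    ["2002-06-28", "2002-07-31", "2012-07-31", "2002-07-31"])})

start, end = toy["caldt"].min(), toy["caldt"].max()  # first and last month-end dates
n_months = toy["caldt"].nunique()                     # distinct months in the sample
print(start.date(), end.date(), n_months)
```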

```python
# 2. There are some observations where `feelag` is less than zero. These are data errors.
# Remove these observations from the dataframe and then compute the summary statistics for `feelag`.
df = df[df['feelag'] >= 0].reset_index(drop=True)
stats = summary(df['feelag'])
stats
```

```
count    419500.000000
mean          1.030662
std           3.764338
tstat       177.334671
pval          0.000000
min           0.000000
25%           0.095028
50%           0.150000
75%           0.361825
max          98.180000
Name: feelag, dtype: float64
```

```python
# 3. Output the median and 90th percentile of `feelag` for every month. This can be done in separate commands.
# Select `feelag` before computing quantiles so that other columns (e.g., permno) are not summarized.
df.groupby('caldt')['feelag'].quantile([.5, .9]).unstack(level=1)
```

| caldt | 0.5 | 0.9 |
|-------|-----|-----|
| 2002-06-28 | 0.442105 | 1.87500 |
| 2002-07-31 | 0.250000 | 1.55537 |
| 2002-08-30 | 0.250000 | 1.62500 |
| 2002-09-30 | 0.227160 | 1.75000 |
| 2002-10-31 | 0.189690 | 1.39574 |
| ... | ... | ... |
| 2012-03-30 | 0.104045 | 3.44536 |
| 2012-04-30 | 0.099750 | 3.43792 |
| 2012-05-31 | 0.108170 | 3.59560 |
| 2012-06-29 | 0.109820 | 3.91192 |
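Because the task notes that this can be done in separate commands, here is a sketch that calls `median` and `quantile(0.9)` on the grouped `feelag` column. It uses a hypothetical two-month frame in place of `df`.

```python
import pandas as pd

# Hypothetical stand-in for df: three stocks in each of two months.
toy = pd.DataFrame({
    "caldt": pd.to_datetime(["2002-06-28"] * 3 + ["2002-07-31"] * 3),
    "feelag": [0.1, 0.2, 5.0, 0.3, 0.4, 0.5],
})

grouped = toy.groupby("caldt")["feelag"]
median_fee = grouped.median()     # one median per month
p90_fee = grouped.quantile(0.9)   # 90th percentile per month (linear interpolation by default)
print(pd.concat({"median": median_fee, "p90": p90_fee}, axis=1))
```

Because of the default linear interpolation, the first month's 90th percentile (4.04) falls between 0.2 and 5.0 rather than matching an observed fee.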


```python
# 4. Construct three equal-weight portfolios using `feelag` as the criterion variable
#    (loan fees are expressed as annual interest rates, in percent).
# Portfolio 0: all stocks with `feelag` less than or equal to 3%.
# Portfolio 1: all stocks with `feelag` greater than 3% and less than or equal to 5%.
# Portfolio 2: all stocks with `feelag` greater than 5%.
# Hint: use pd.cut to bin the data and use my Jupyter notebook from last class as a guide.
# The edges define right-closed bins (0, 3], (3, 5], (5, inf); labels=False returns bin codes 0, 1, 2.
df['bins'] = pd.cut(df['feelag'], [0, 3, 5, np.inf], labels=False)
df['bins']
```

```
0         0.0
1         0.0
2         0.0
3         0.0
4         0.0
         ... 
419495    2.0
419496    2.0
419497    2.0
419498    2.0
419499    2.0
Name: bins, Length: 419500, dtype: float64
```
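One detail of `pd.cut` matters here. Its intervals are right-closed by default, so the edges `[0, 3, 5, np.inf]` produce the bins (0, 3], (3, 5], and (5, inf), and a loan fee of exactly 0 falls outside every bin and is assigned NaN. The `float64` dtype of `bins` in the output above is consistent with that: with `labels=False`, `pd.cut` returns integer codes, which become floats only when some values are NaN, and after the earlier filter the only way to get NaN is a fee of exactly 0. Because `groupby` drops NaN keys by default, those zero-fee stock-months are left out of Portfolio 0 even though the task defines it as `feelag` less than or equal to 3%. Passing `include_lowest=True` closes the first bin on the left, making it [0, 3]. The fees below are hypothetical.

```python
import numpy as np
import pandas as pd

fees = pd.Series([0.0, 0.15, 3.0, 4.0, 5.0, 12.0])  # hypothetical loan fees, % per annum
edges = [0, 3, 5, np.inf]

# Default: right-closed bins, so 0.0 gets NaN and the codes are float64.
default_bins = pd.cut(fees, edges, labels=False)
# include_lowest=True puts 0.0 in the first bin; no NaN, so the codes stay integers.
inclusive_bins = pd.cut(fees, edges, labels=False, include_lowest=True)

print(pd.DataFrame({"feelag": fees, "default": default_bins,
                    "include_lowest": inclusive_bins}))
```

In the notebook, the corresponding change would be `df['bins'] = pd.cut(df['feelag'], [0, 3, 5, np.inf], labels=False, include_lowest=True)`. The portfolio statistics reported below were computed without it, so the Portfolio 0 numbers would change.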

```python
# 5. Compute summary statistics for each portfolio, including the mean, standard deviation,
#    and the t-statistic testing whether the portfolio's mean return is zero.
# Equal-weight monthly return of each portfolio, in percent (rows with a NaN bin are dropped by groupby).
port = df.groupby(['caldt', 'bins'])['ret'].mean() * 100
# Reshape to one row per month and one column per portfolio.
port = port.unstack(level='bins')
summary(port)
```

| bins | 0.0 | 1.0 | 2.0 |
|------|-----|-----|-----|
| count | 122.0 | 122.0 | 122.0 |
| mean | 0.971593 | -0.128252 | -0.76561 |
| std | 6.548123 | 9.463743 | 9.395471 |
| tstat | 1.638881 | -0.149686 | -0.900054 |
| pval | 0.103836 | 0.881262 | 0.369879 |
| min | -21.662696 | -22.925989 | -22.173809 |
| 25% | -2.314197 | -5.605791 | -6.164255 |
| 50% | 1.502263 | -0.275929 | -0.810759 |
| 75% | 4.290706 | 3.88622 | 3.621737 |
| max | 22.716464 | 33.764854 | 26.526421 |
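The `tstat` row can be checked by hand, assuming `summary` reports the usual one-sample t-statistic: the mean divided by its standard error, which is the standard deviation over the square root of the number of months. The helper `t_stats` below is a hypothetical name, shown on a toy frame, followed by a check against the Portfolio 0 numbers in the table.

```python
import math

import pandas as pd


def t_stats(returns: pd.DataFrame) -> pd.Series:
    """One-sample t-statistic of each column's mean against zero (NaNs ignored)."""
    return returns.mean() / (returns.std() / returns.count() ** 0.5)


# Hypothetical monthly returns for two portfolios.
toy = pd.DataFrame({"p0": [1.0, 2.0, 3.0], "p1": [-1.0, 0.0, 1.0]})
print(t_stats(toy))

# Portfolio 0 from the summary table: mean, std, and number of months.
mean, std, n = 0.971593, 6.548123, 122
print(round(mean / (std / math.sqrt(n)), 3))
```

The Portfolio 0 check reproduces the reported 1.638881 (about 1.639), which supports reading `tstat` as this standard formula.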


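**6. Are your results consistent with the hypothesis that short sellers have good information?**

Directionally, yes. Mean monthly returns decline as the loan fee rises: 0.97% for Portfolio 0, -0.13% for Portfolio 1, and -0.77% for Portfolio 2. Because these are returns to going long, shorting the high-fee stocks would have earned positive returns before fees. However, none of the means is statistically different from zero at conventional levels (t-statistics of 1.64, -0.15, and -0.90). In addition, stocks in Portfolio 2 cost more than 5% per annum (over 0.4% per month) to borrow, which would absorb a large share of the 0.77% monthly gain. The pattern is suggestive, but the evidence is not strong enough to conclude that short sellers have good information, and more research is required.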
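A more direct test of the hypothesis asks whether the high-fee portfolio underperforms the low-fee portfolio, rather than whether each portfolio's mean differs from zero. The sketch below assumes a `port`-like DataFrame with one column per fee bin. The monthly returns are made up for illustration, and `port_toy` and `spread` are hypothetical names. It forms the Portfolio 2 minus Portfolio 0 spread each month and computes the spread's t-statistic. With the actual `port`, the columns would be labeled `0.0` and `2.0`, as in the summary table.

```python
import pandas as pd

# Hypothetical monthly returns in percent, one column per fee portfolio (same layout as `port`).
port_toy = pd.DataFrame(
    {0.0: [1.0, 2.0, -1.0, 0.5], 2.0: [-1.0, 0.5, -3.0, -0.5]},
    index=pd.to_datetime(["2002-06-28", "2002-07-31", "2002-08-30", "2002-09-30"]),
)

# Long high-fee stocks, short low-fee stocks: negative values mean high-fee stocks did worse.
spread = port_toy[2.0] - port_toy[0.0]
t_stat = spread.mean() / (spread.std() / spread.count() ** 0.5)
print(spread.mean(), t_stat)
```

Because the spread is long the high-fee portfolio and short the low-fee one, a significantly negative mean would support the hypothesis that short sellers who pay high fees are informed.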