# Insider Trading

> The following is an example of on how to use <span style="color:#3EACAD">Insider Eda</span> of your project as a [Python package](https://github.com/tuhinmallick/InsiderTrader/tree/main/src/insider_eda).

## Usage

Firstly, I will import the path of our insider_eda package

In [86]:
import os
import sys
sys.path.insert(0, os.path.abspath('../src'))

Let's start by importing `insider_eda`, a python packageto perform exploratory data analysis on insider trading

In [87]:
%load_ext autoreload
%autoreload 2
from insider_eda.eda_base import Exploratory_data_analysis

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


Import the necesaary packages. Not many packages need to imported because most of the work is done inside the insider_eda package.

In [88]:
import pandas as pd
from datetime import timedelta 

In [89]:
ticker = 'CAT'

Let's import our [insider data](https://www.kaggle.com/code/ilyaryabov/s-p500-insider-trading) from kaggle. I am dropping the SEC Form 4 becasue it is not used in my analysis and shows only the source of data. After that, I am removing all the commas as it will cause problem in numeric calculation and making sure that date and numeric data have their respective datatype.

This dataset comprises large wallet activities such as Selling, Buying, and Optional actions for each ticker in the S&P500 index.

Knowing that the firm's director sells stocks may indicate that the company will soon encounter difficulties and its stock price will fall.

In [90]:
df = pd.read_csv(f"../data/insider_data/{ticker}.csv")

f = df.drop( columns="SEC Form 4", axis=1)  # This column has no use in analysis

df["Date"] = pd.to_datetime(
    df["Date"], errors="coerce"
)  # Making sure date format is maintained
start_date = df.Date.iloc[-1] - timedelta(days=7)

# df.set_index(["Date"], inplace=True)
# removing comma from  numeric value
df["Shares Total"] = [x.replace(",", "") for x in df["Shares Total"]]
df["Shares"] = [x.replace(",", "") for x in df["Shares"]]
df["Value ($)"] = [x.replace(",", "") for x in df["Value ($)"]]

# Making sure that columns are numeric
df["Cost"] = pd.to_numeric(df["Cost"])
df["Shares"] = pd.to_numeric(df["Shares"])
df["Value ($)"] = pd.to_numeric(df["Value ($)"])
df["Shares Total"] = pd.to_numeric(df["Shares Total"])

Here, I am importing the [S&P 500](https://www.kaggle.com/datasets/rprkh15/sp500-stock-prices) data from Kaggle. I am trimming the data so that it starts from the date we have the insider information. 

In [91]:
stock_df = pd.read_csv(f"../data/sp500-stock-prices/{ticker}.csv")

stock_df["Date"] = pd.to_datetime(
    stock_df["Date"], errors="coerce"
)  # Making sure date format is maintained
stock_df = stock_df[stock_df.Date > start_date]
stock_df.set_index(["Date"], inplace=True)

I am initializing our package with insider data and stock data.

In [92]:
insider_eda = Exploratory_data_analysis(df)
stock_eda = Exploratory_data_analysis(stock_df, target_name="Close",time_series=True)

## Insider Activity overall

Let's visualize the insider activity over the whole time period by categorizing it into - Buy, Sale and Option Exercise.

In [93]:
insider_eda.plotly_insider_activity(start_date='2021-08-04',end_date='2022-05-18')

Most of the activity is concentrated fater December 2021. Let's just zoom into that time period to analyse better.

In [94]:
insider_eda.plotly_insider_activity(start_date='2022-01-01',end_date='2022-05-18')

As we can see that there activity increases with time but sale transactions are always on top which signals that the price might drop after this time period.

In [95]:
stock_eda.plotly_single_timeseries_plot(y_variable="Close")

I can clearly see that there is a drop in price once there is a sale or option exercise transaction like on 20th April 2022 and 21th April 2022. So I can draw my conclusion that they influenced the drop in price. Let's move on to their distribution of activity.

## Analyzing Distribution of number of Incident per Company

In this case, I estimated the number of incidents that each insider had and then grouped them to get the number of insiders who have a specific amount of occurrences.

In [96]:
insider_eda.plotly_individual_insider_activity()

The plot has a decreasing trend more towards an exponential decreases, indicating that most of the insider have less number incident.Alsway we can see that most of them are involved in selling activity.

Compared to insiders engaged in buying, those engaged in selling engage in many more transactions. It is possible for an insider to be active in both.

## Top indviduals involved in Insider Trading.

In [97]:
insider_eda.plotly_top_contributor()

Creed Joseph E has the most number of insider cases(5), followed by De Lange Bob and others. It will be interesting to see his role in the company.

In [98]:
df[df["Insider Trading"]=="Creed Joseph E"]["Relationship"][0]

'Group President'

So it is the group president who has the highest number of insider cases. Let's see which roles mostly involved in insider cases

## Which Roles at Companies are most common for Insider Activity?

In [99]:
insider_eda.plotly_insider_activity_roles()

Mostly people at C-level(CLO, CHRO and Group president) are mostly involved in insider trading. Officers and Presidents are called Corporate  Insiders. Because of the nature of their positions and responsibilities inside the company, these employees have access to confidential information. They have the ability to capitalize on the good and bad news of the company before it is widely disseminated, giving them an edge over the average market participant.

## Market Cap for Top Companies:

Let's take a look at the insiders' combined market values. The total dollar worth of an insider's shares is known as their market capitalization. It is determined by multiplying the number of shares purchased by an insider by the current share price.

Now I'm going to visualize the insiders who had the most insider activity within the time period we studied.

In [100]:
insider_eda.plotly_market_cap()

BONFIELD ANDREW R J overshadows all others.

In [101]:
df[df["Insider Trading"]=="BONFIELD ANDREW R J"]

Unnamed: 0,Insider Trading,Relationship,Date,Transaction,Cost,Shares,Value ($),Shares Total,SEC Form 4
9,BONFIELD ANDREW R J,Chief Financial Officer,2022-03-09,Option Exercise,132.31,41107,5438882,71514,Mar 11 10:25 AM


However, his dominance in market capitalization is evident from the fact that he has only transacted once. The other four are involved in large caps (Long Suzetter M, De Lange Bob, Johnson Cheryl H, and Creed Joseph E), while BONFIELD ANDREW R J is a mega cap insider. The other two are involved in mid caps,small cap or micro cap.

## Do Insiders trade in sync with the market?

In [102]:
df_sorted = df.set_index("Date",inplace=False).sort_values(by="Date")       # Sorting data based on date
Exploratory_data_analysis(df_sorted).plotly_seasonal_boxplot_ym( y_variable="Value ($)",box_group="monthly")

And the shares of the highest value was bought in March. This is a kind of a signal that something was going to happend after March.

In [103]:
stock_eda.plotly_single_timeseries_plot(y_variable="Close")

Well yeah, the price went down when they traded in large volume in Jan 2022 and the price shoot up when they again traded in March 2022. This shows that their activites have a huge impact on the market price and they are the main players driving the price.

In [104]:
insider_eda.plotly_insider_activity_timeseries_plot(stock_df.reset_index(), start_date='2021-12-01',end_date='2022-05-18')

The trend can be clearly see that once they make sale transaction the price goes down. In below plot, I compare the volume of insider sales (over time) to the volume of sales in the S&P 500 (over time).

In [105]:
insider_eda.plotly_market_vs_insider(stock_df.reset_index(), include=["buy", "sale","opt"], start_date='2022-01-01',end_date='2022-05-18')

This plot strongly implies that Insiders are controversial contrarians. They are characterized by a tendency to increase their sales during market upswings and increase their purchases during downturns. It has been observed that insiders are net buyers of low P/E stocks and net sellers of high P/E stocks.

## Are Insiders Short-sighted?

How does the stock price of a company's stock change if an insider buys, sells, or exercises an option to buy or sell that company's stock? Stock prices are likely to rise in the insiders' favor since they have access to information that is not available to the general public. I will try to explore the short-sightedness of the insiders. Then, I determine the percentage return or price change of the stocks throughout that time period. Due to the occurrence of missing data while using the adjusted close value, I opted to utilize close instead. I calculated the returns after 1 day, 2 days, 3 days, 4 days and 5 days following the transaction.

In [106]:
insider_eda.plotly_returns(stock_df.reset_index(), include=["buy","sale","opt"], returns="short")

The option exercise has lot of outlier and we cannot visualize properly the relationships. Let me filter out these kind of transactions to vizualize properly.

In [107]:
insider_eda.plotly_returns(stock_df.reset_index(), include=["buy","sale"], returns="short")

After the date of the transaction, the returns on the purchased stocks rise and become less negative. Insider purchases typically result in a price increase for the stock in the days following the trade.

However, insider sales generate negative returns. Insiders sell shares of stock before they lose value, using their access to non-public information.

The plot graphically displays the anticipated short-term trends that will occur after an insider trades. When insiders trade, the stock price may fluctuate erratically, but it is rarely because of news related to the company itself. To the contrary, stock prices shift in response to the news of the trading itself, with "market participants thinking insiders have inside information" and "copy-cat behaviors" leading to a favorable shift in the stock price for the insider.

## Long Term effects of Insider Trades on Stock prices

This is another contentious issue in the world of finance. Is it possible for insiders to predict long-term profits on their trades, despite the fact that they plainly have the ability to influence stock prices in the days immediately after the trade? Do they, on average, outperform the stock market over the long haul? I provide explanations for the information sources evaluated.

In [108]:
insider_eda.plotly_returns(stock_df.reset_index(), include=["buy","sale","opt"], returns="long")

There are several outliers in the option exercise, making it difficult to see the connections between them. Let me to exclude such transactions so that I can visualize clearly.

In [109]:
insider_eda.plotly_returns(stock_df.reset_index(), include=["buy","sale"], returns="long")

In the long run, insiders do have a greater chance of success than the average investor. The returns of sold stocks are decreasing while that of purchased stocks keeps on increasing.

Since most of the traded stocks show a slower rate of price growth, the green boxplot is skewed toward the bottom. The size of the boxplot is another striking feature, leading me to the conclusion that Insider-traded stocks, and especially Insider-purchased stocks, are more volatile than the overall market. These stocks are riskier, but the Insiders are known to take calculated risks.

To sum up, insiders may sell their stock without factoring in confidential information. It's much more common to see somebody unhappy after suffering a loss than ecstatic after gaining something. Therefore, investors are more likely to sell their winners than their losers, and they will take higher risks to prevent losses than they would to obtain similar gains. So, when insiders are seen selling their stock, it may not always be owing to their knowledge of bad trends but rather because the stock has reached the return they had anticipated.

In [110]:
df_sorted

Unnamed: 0_level_0,Insider Trading,Relationship,Transaction,Cost,Shares,Value ($),Shares Total,SEC Form 4
Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
2021-08-04,MacLennan David,Director,Buy,206.2,500,103100,1392,Aug 05 12:36 PM
2021-11-01,Creed Joseph E,Group President,Sale,204.09,5038,1028205,0,Nov 02 03:29 PM
2022-01-07,Johnson Cheryl H,Chief Human Resources Officer,Sale,225.0,6415,1443375,16268,Jan 10 05:39 PM
2022-01-07,De Lange Bob,Group President,Sale,225.0,23435,5272875,36903,Jan 10 05:38 PM
2022-01-07,De Lange Bob,Group President,Option Exercise,74.77,23435,1752235,60338,Jan 10 05:38 PM
2022-01-07,Johnson Cheryl H,Chief Human Resources Officer,Option Exercise,127.6,6415,818554,22683,Jan 10 05:39 PM
2022-01-18,Creed Joseph E,Group President,Sale,230.0,5004,1150920,0,Jan 19 03:52 PM
2022-02-07,MacLennan David,Director,Buy,199.5,480,95760,1877,Feb 09 12:28 PM
2022-02-16,Marvel Gary Michael,Chief Accounting Officer,Sale,203.11,674,136896,0,Feb 17 01:54 PM
2022-03-09,BONFIELD ANDREW R J,Chief Financial Officer,Option Exercise,132.31,41107,5438882,71514,Mar 11 10:25 AM


## Casual Effects

The Granger Causality Test is a hypothetical econometric procedure for determining whether or not one variable is used as a predictor of another in multivariate time series data after a given lag.

The Granger Causality test can only be performed if the data are stationary, meaning they have a constant mean, a constant variance, and no seasonal component. Differentiating the data, either at the first or second order, will cause the data to become stationary. If, after applying second-order differentiation, the data is still not stationary, I abandon the Granger causality test.

### Augmented Dickey-Fuller Test (ADF test)

The ADF test, based on the unit root test, is a well-liked statistical method for determining whether a Time Series is stationary. There must be a certain number of differentiations applied to the series before it becomes stationary, and this number is equal to the number of the series' unit roots.

In [111]:
insider_eda.ask_adfuller(["Cost","Shares","Value ($)","Shares Total"])

---------------------------------------------------------------------------------------------------------------------
AD Fuller Test for Cost:
---------------------------------------------------------------------------------------------------------------------
Test statistic:  -5.402947491308025
p-value:  3.339253515742547e-06
Critical Values: {'1%': -3.859073285322359, '5%': -3.0420456927297668, '10%': -2.6609064197530863}
----------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------
AD Fuller Test for Shares:
---------------------------------------------------------------------------------------------------------------------
Test statistic:  -3.1686828533339173
p-value:  0.021870522739523397
Critical Values: {'1%': -3.9240193847656246, '5%': -3.0684982031250003, '10%': -2.67389265625}
---------------------------

As I can see that the p-value for the columns "Cost" and "Shares Total" is above 0.05, I cannot reject the Null hypothesis and the series are not stationary.
 
Due to the fact that the Granger causality test can only be performed on stationary data, we must first perform a data transformation known as differentiation to render the data stationary.

In [112]:
df_transformed =  df[['Cost', 'Shares Total']].diff().dropna()

Let me run the ADF test again to see if the p-value has reduced below 0.05.

In [113]:
Exploratory_data_analysis(df_transformed).ask_adfuller(["Cost","Shares Total"])

---------------------------------------------------------------------------------------------------------------------
AD Fuller Test for Cost:
---------------------------------------------------------------------------------------------------------------------
Test statistic:  -1.0738309697845962
p-value:  0.7253942098441257
Critical Values: {'1%': -4.223238279489106, '5%': -3.189368925619835, '10%': -2.729839421487603}
----------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------
AD Fuller Test for Shares Total:
---------------------------------------------------------------------------------------------------------------------
Test statistic:  -0.9992902646304322
p-value:  0.7535230728363604
Critical Values: {'1%': -4.331573, '5%': -3.23295, '10%': -2.7487}
-------------------------------------------------------

Unfortunately, the p-value is still higher than 0.05 so I will do second order differencing to see if it deceases.

In [114]:
df_transformed =  df_transformed[['Cost', 'Shares Total']].diff().dropna()

Now, let's run the ADF test again.

In [115]:
Exploratory_data_analysis(df_transformed).ask_adfuller(["Cost","Shares Total"])

---------------------------------------------------------------------------------------------------------------------
AD Fuller Test for Cost:
---------------------------------------------------------------------------------------------------------------------
Test statistic:  -2.709743313334425
p-value:  0.0723673725450962
Critical Values: {'1%': -4.331573, '5%': -3.23295, '10%': -2.7487}
----------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------
AD Fuller Test for Shares Total:
---------------------------------------------------------------------------------------------------------------------
Test statistic:  -5.93304015522167
p-value:  2.3558975474645677e-07
Critical Values: {'1%': -4.331573, '5%': -3.23295, '10%': -2.7487}
------------------------------------------------------------------------------------

As we can see even after second order differencing, the p-value is still above 0.05 hence I cannot run Granger casaulty test on them. I will run the Granger casaulty test on Shares and Value ($).

### Do Value ($) granger causes Shares?

Null Hypothesis (H0) : Value ($) do not granger cause Shares.

Alternative Hypothesis (HA) : Value ($) granger cause Shares.

In [116]:
insider_eda.plotly_single_granger(x_variable="Shares",y_variable="Value ($)", max_lags=5)

Since the p-value is higher than 0.05, the Null hypothesis cannot be rejected and hence Value ($) do not granger causes Shares.

Let's repeat in the opposite direcetion.

### Do Shares granger causes Value ($)?

Null Hypothesis (H0) : Shares do not granger cause Value ($).

Alternative Hypothesis (HA) : Shares granger cause Value ($).

In [117]:
insider_eda.plotly_single_granger(y_variable="Shares",x_variable="Value ($)", max_lags=5)

Considering that the p-value is greater than 0.05, it is not possible to reject the null hypothesis and conclude that Shares do not granger cause Value ($).