# Preparation

In [9]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import pickle
import requests
import json
import plotly.express as px

from google.colab import drive
drive.mount('/content/drive')
senate_final_ls = pickle.load(open("/content/drive/MyDrive/Colab Notebooks/Final Project/senate_final_ls.pkl", "rb"))
senators_df = pd.DataFrame(senate_final_ls)

# Raw data
senate_raw_link = "https://senate-stock-watcher-data.s3-us-west-2.amazonaws.com/aggregate/all_transactions.json"
senate_raw_response = requests.get(senate_raw_link)
senate_raw_json = json.loads(senate_raw_response.text)
senate_raw_df = pd.json_normalize(senate_raw_json)
senate_stocks_raw_df = senate_raw_df[senate_raw_df["asset_type"] == "Stock"]

freq_stocks_traded = pd.DataFrame(senate_stocks_raw_df["ticker"].value_counts()).reset_index()
freq_stocks_traded.columns = ["ticker", "freq"]
freq_stocks_traded

fig_freq_stocks_traded = px.bar(freq_stocks_traded[:20], x="ticker", y="freq", color="freq", 
             title="Top 20 most traded stocks among Senators")


freq_trading = pd.DataFrame(senate_stocks_raw_df["senator"].value_counts()).reset_index()
freq_trading.columns = ["senator", "freq"]
# Get party affiliation (deduplicate first so the merge stays at one row per senator)
freq_trading = freq_trading.merge(senate_stocks_raw_df[["senator","party"]].drop_duplicates(), how="inner", on="senator")
freq_trading = freq_trading[freq_trading["freq"] >= 50]

fig_freq_trading = px.bar(freq_trading, x="senator", y="freq", color="party",
                          color_discrete_map={'Republican': 'red', 
                                              'Democrat': 'blue'},
                          title = "Senators who made more than 50 trades")


# First, let us approximate how much each Senator made in total from their investments.
senators_df["absolute_gain"] = senators_df["amount_transacted"] * senators_df["returns"]


# groupby senator, sum absolute_gain
absolute_gain_by_senator_df = pd.DataFrame(senators_df.groupby(["senator"])["absolute_gain"].sum()).sort_values("absolute_gain", ascending=False).reset_index()

# Get party affiliation (deduplicate first so the merge stays at one row per senator)
absolute_gain_by_senator_df = absolute_gain_by_senator_df.merge(senate_stocks_raw_df[["senator","party"]].drop_duplicates(), how="inner", on="senator")

# Net gain or loss
absolute_gain_by_senator_df["gain_loss"] = absolute_gain_by_senator_df["absolute_gain"] > 0


fig_gains = px.bar(absolute_gain_by_senator_df, x="senator", y="absolute_gain", color="gain_loss",
                   title = "Absolute gains from investments",
                   color_discrete_map={True: 'green', False: 'red'})

fig_gains.update_xaxes(type='category')


senators_df["senator_ticker"] = senators_df["senator"] + "_" + senators_df["ticker"]
senators_df = senators_df.sort_values("returns", ascending=False)
fig_relative_gains = px.bar(senators_df, x="senator_ticker", y="returns", title="Returns on Investments", color="senator")
fig_relative_gains.show()

# Find investment duration
senators_df['earliest_purchase_date_dt'] = pd.to_datetime(senators_df['earliest_purchase_date'])
senators_df['latest_full_sale_date_dt'] = pd.to_datetime(senators_df['latest_full_sale_date'])
senators_df['investment_duation'] = (senators_df['latest_full_sale_date_dt'] - senators_df['earliest_purchase_date_dt']).dt.days

fig_relative_gains_annulized = px.bar(senators_df.sort_values("annualized_returns", ascending=False)[:15], 
                                      x="senator_ticker", y="annualized_returns", 
                                      color='investment_duation', 
                                      title="Returns on Investments (Annualized) - Top 15 Best Performing Trades")
fig_relative_gains_annulized.show()

senator_avg_return = pd.DataFrame(senators_df.groupby("senator")["annualized_returns"].mean().reset_index())
return_freq_df = senator_avg_return.merge(freq_trading, on = "senator")


return_freq_df.columns = ["senator", "Average annualized return", "Frequency of trading", "Party"]
return_freq_fig = px.scatter(return_freq_df, 
                             x="Frequency of trading", 
                             y="Average annualized return", 
                             title="Relationship between frequency of trading and trading performance",
                             trendline="ols")




# Studying the investment history and performance of Senators

As an international student, I have always found it puzzling that members of Congress can invest in companies over which they have significant influence, with minimal levels of disclosure. In this project, I aim to answer two broad questions:

- Do Senators actually outperform the market?

- Can we detect potential conflicts of interest (COI) in Senators' trading activities?

We will need two sets of data: 

a. Investment activity of Senators

b. Stock prices on relevant buying/selling dates

## Gathering Data

### Trading Activity API

I will be using a dataset available at:

- https://senatestockwatcher.com/api **[2014 - 2023 trading activity]**

This is done for three reasons:

- It is much faster than querying the government database, which risks an IP ban (even with the use of sleep functions)!

- It removes the need to perform OCR on PDFs of disclosure reports.

- (Some) handwritten forms have been transcribed manually by volunteers.

### yfinance API



In [None]:
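import datetime

import yfinance as yf

def get_adj_close_price(stock_ticker, date_obj):
  """Return the closing price of stock_ticker on date_obj, or "delisted" if no price is found."""
  date_str = date_obj.strftime('%Y-%m-%d')
  next_date = date_obj + datetime.timedelta(days=1)
  next_date_str = next_date.strftime('%Y-%m-%d')

  ticker = yf.Ticker(stock_ticker)

  try:
    # history() adjusts prices by default, so "Close" is the adjusted close
    adj_close_price = ticker.history(start=date_str, end=next_date_str)["Close"].iloc[0]

  except Exception: # to catch delisted stocks (an empty result also occurs on non-trading days)
    adj_close_price = "delisted"

  return adj_close_price

# test
get_adj_close_price("AAPL", datetime.datetime.strptime("2023-03-17", "%Y-%m-%d"))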

# Cleaning Data

Now, we need to calculate the returns for each investment made by each politician.

There are some (many!) **ambiguities** around this. A few of the more pertinent ones are noted below:

    a. Some stocks have been delisted and others have changed their ticker symbols (FB --> META).
    b. The exact investment amount is unknown; only ranges are given.
    c. Politicians could enter into a new position after closing a previous position in that stock (buy 10k of AAPL, sell 10k of AAPL, buy 10k of AAPL again).
    d. Exact buy/sell prices are not given.

For this project, I will be calculating returns in the following manner to address the above ambiguities:

    a. Stocks that are delisted are excluded from the returns calculation.
    b. The categorical variable "amount" is converted into a quantitative variable by taking the midpoint of each range.
    c. The amount transacted for each investment is approximated by the total amount sold.
    d. Transaction prices are approximated by the closing price on the day of the transaction.
    e. The buy/sell price is the weighted average price of all buy/sell transactions.
    f. The investment period for one stock runs from the earliest purchase date to the latest full-sale date.
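
Steps (b) and (e) can be sketched as follows. This is a minimal illustration, not the code used to build the dataset: the helper names `range_midpoint` and `weighted_average_price` are hypothetical, the amount strings are assumed to look like `$1,001 - $15,000`, and the weights are assumed to be the midpoint amounts.

```python
import re

def range_midpoint(amount_range):
    """Return the midpoint of a disclosed range such as '$1,001 - $15,000'.

    Assumption: a range is written as two dollar figures. An open-ended
    range with a single figure falls back to that figure.
    """
    figures = [int(f.replace(",", "")) for f in re.findall(r"\d[\d,]*", amount_range)]
    if not figures:
        raise ValueError(f"no dollar figure found in {amount_range!r}")
    return sum(figures) / len(figures)

def weighted_average_price(transactions):
    """Amount-weighted average price over (amount, price) pairs."""
    total_amount = sum(amount for amount, _ in transactions)
    return sum(amount * price for amount, price in transactions) / total_amount

# Two hypothetical purchases of the same stock at different closing prices
buys = [
    (range_midpoint("$1,001 - $15,000"), 100.0),
    (range_midpoint("$15,001 - $50,000"), 120.0),
]
avg_buy_price = weighted_average_price(buys)
print(round(avg_buy_price, 2))
```

Because the larger purchase carries more weight, the average lands closer to 120 than to 100.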
  


In [None]:
# Excerpt: `ticker`, `latest_full_sale_date`, `avg_buy_price` and `avg_sell_price`
# are defined earlier in the cleaning code.

# discard all purchase transactions after latest full sale date
ticker_cleaned = ticker[ticker["transaction_date"] <= latest_full_sale_date]

# calculate returns
returns = (avg_sell_price - avg_buy_price) / avg_buy_price

# calculate annualized returns
first_purchase = ticker_cleaned[ticker_cleaned["type"] == "Purchase"]["transaction_date"].min()
last_full_sale = ticker_cleaned[ticker_cleaned["type"] == "Sale (Full)"]["transaction_date"].max()
days_held = (last_full_sale - first_purchase).days
annualized_returns = (1+returns) ** (365/days_held) - 1
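
To make the annualization step concrete, here is a self-contained sketch of the same formula on made-up numbers. The function name `annualize` and the sample prices and dates are illustrative assumptions, not values from the dataset.

```python
import datetime

def annualize(returns, days_held):
    """Compound a holding-period return to a 365-day basis."""
    if days_held <= 0:
        raise ValueError("days_held must be positive")
    return (1 + returns) ** (365 / days_held) - 1

# Hypothetical position: bought at 100, fully sold at 110
avg_buy_price, avg_sell_price = 100.0, 110.0
returns = (avg_sell_price - avg_buy_price) / avg_buy_price

# Hypothetical earliest purchase and latest full sale dates
days_held = (datetime.date(2020, 2, 14) - datetime.date(2020, 1, 13)).days

annualized_returns = annualize(returns, days_held)
print(f"{returns:.1%} over {days_held} days -> {annualized_returns:.1%} annualized")
```

Short holding periods inflate the annualized figure sharply: a 10% gain over 32 days compounds to almost 200% a year, which is worth keeping in mind when reading the annualized charts.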

# Data Visualization

## What stocks are traded most frequently by Senators?


In [3]:
fig_freq_stocks_traded.show()

As expected, Apple, Microsoft and other large-cap blue-chip stocks (tech, banking, oil and gas) appear.

However, companies such as Disney, Warner Bros Discovery, **Urban Outfitters, and Caesars Entertainment Inc** were also highly traded. 


## Senators moonlighting as day traders

I was also curious as to who traded the most (by number of trades), both across the Senate and within each Party.

In [4]:
fig_freq_trading.show()

In general, **Republican Senators** are more likely to be prolific traders (defined as making more than 50 trades).

In particular, David Perdue (Georgia, Republican) made **2270** trades during his term (2015-2021).

That is, on average, about 7 trades per week, i.e. more than one every working day.
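
The per-week rate can be checked with a few lines of arithmetic. The sketch below takes the six-year term from the dates above and uses round figures of 52 weeks and 252 trading days per year, which are assumptions rather than values from the dataset.

```python
trades = 2270
term_years = 6                 # 2015-2021
weeks_per_year = 52            # assumption: round figure
trading_days_per_year = 252    # assumption: typical US trading calendar

trades_per_week = trades / (term_years * weeks_per_year)
trades_per_trading_day = trades / (term_years * trading_days_per_year)

print(f"{trades_per_week:.1f} trades per week, "
      f"{trades_per_trading_day:.1f} per trading day")
```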

## How successful were Senators at trading?

Are Senators able to convert their political capital and information access into superior gains?

In [5]:
fig_gains

**Do Senators make money off most trades, or do they rely on a few key trades only?**

[Potential suspect for COI / Insider trading]

In [6]:
fig_relative_gains.show()

### What about on an annualized basis?

Given that investment periods differ, we can attempt to "standardize" the returns to identify the trade with the highest return.


In [7]:
fig_relative_gains_annulized.show()

Looking at the top 15 best performing trades (annualized basis), we discover that most of them had an investment duration of less than **30 days**.

In fact, the best performing trade, made by Kelly Loeffler (Republican, Georgia, 2020-2021), had **an annualized return of over 100,000% and was held for only 32 days**. 

The fact that most of the most profitable trades had such short investment durations increases my suspicion that Senators may have used their access to information to enhance their stock performance.

## Does trading more often make you a better trader (Senator Edition)?

The answer, for most retail investors, is no. 

Let us investigate if Senators are able to buck the trend.

In [8]:
return_freq_fig

It appears that Senators are just like us - the more they trade, the poorer their returns! 

Possible explanations (from "most_naive" to "most_sinister")

- Insider trading/COI instances are indeed rare / non-existent ==> Senators are deterred by existing regulation and disclosure policies. 

- Senators are not exploiting their informational/influence edge substantially/consistently.

- Senators who commit insider trading are simply not disclosing their trades (and/or are good enough to hide them!)



## Can we infer from transaction metadata, the Senator who placed such a transaction?

Some of the existing columns are not needed for prediction (for instance, the link to the original PDF). Others, such as asset_type, are redundant, since only stocks are considered. 

State is also removed: as there are only two Senators from each state, it would nearly identify the Senator on its own and could skew the prediction process.

We can, however, generate (potentially) helpful predictors. For instance, Senators may differ in how promptly they file their disclosure statements, so the duration between the transaction date and the filing date could be useful in inferring the identity of the investor.
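
A minimal sketch of such a predictor is shown below. The column names `transaction_date` and `disclosure_date`, the date format, and the derived `disclosure_delay_days` column are assumptions about the raw data's layout, and the rows are made up for illustration.

```python
import pandas as pd

# Hypothetical transactions for two senators
trades = pd.DataFrame({
    "senator": ["A", "A", "B"],
    "transaction_date": ["01/10/2020", "02/03/2020", "01/15/2020"],
    "disclosure_date": ["01/25/2020", "02/20/2020", "03/01/2020"],
})

# Parse both date columns (assumed to be month/day/year strings)
for col in ["transaction_date", "disclosure_date"]:
    trades[col] = pd.to_datetime(trades[col], format="%m/%d/%Y")

# Days between making a trade and filing the disclosure
trades["disclosure_delay_days"] = (
    trades["disclosure_date"] - trades["transaction_date"]
).dt.days

# Average filing delay per senator, as a quick sanity check on the feature
avg_delay = trades.groupby("senator")["disclosure_delay_days"].mean()
print(avg_delay)
```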

A KNN (k=3) classifier model ==> F1 score: ~0.84

A tuned Random Forest model ==> F1 score: ~0.86


## Can we predict the average annualized return of Senators?

This question requires the cleaned dataset, in which returns were calculated for closed investments. 

Predictors include:
- party
- state
- avg. investment duration
- most favored sector
- most favored industry
- absolute_gain

Unfortunately, the final dataset is very small (17 Senators), and the RMSE of a Lasso regression model was ~0.27 (27%) after accounting for outliers.

# Some concluding thoughts

Earlier on, in the "data_collection_cleaning" notebook, I set out to answer the following questions:

1. Do Senators actually outperform the market?

2. Can we detect potential conflicts of interest in Senators' trading activities?

Based on the data available, most senators likely do not significantly outperform the market. In fact, we observe senators who made a net loss on their stock investments.

However, some trades warrant greater attention. In particular, investments with holding periods of less than 30 days (buying and selling within 30 days) and outsized returns could serve as indicators of insider trading.
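
Such a screen could look like the sketch below. The 30-day cutoff comes from the text; the 20% return threshold, the column name `investment_duration`, and the sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical closed positions
closed_positions = pd.DataFrame({
    "senator_ticker": ["A_XYZ", "B_ABC", "C_DEF", "D_GHI"],
    "investment_duration": [12, 400, 25, 28],   # days held
    "returns": [0.35, 0.50, 0.02, -0.10],       # holding-period return
})

MAX_DAYS = 30       # from the text: holding period under 30 days
MIN_RETURN = 0.20   # hypothetical cutoff for an "outsized" return

# Keep only short-lived positions with large gains
flagged = closed_positions[
    (closed_positions["investment_duration"] < MAX_DAYS)
    & (closed_positions["returns"] >= MIN_RETURN)
]
print(flagged["senator_ticker"].tolist())
```

A flag of this kind only marks trades for closer review; the threshold would need tuning against the actual distribution of returns.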

Indeed, Kelly Loeffler's highly profitable trades did catch the attention of the Senate Ethics Committee, although the investigation was later dropped (source: https://www.politico.com/news/2020/06/16/senate-ethics-committee-drops-probe-loeffler-stock-trades-323795).

**Existing disclosures provide minimal information to the public, and cannot satisfactorily dispel concerns regarding insider trading.**

For instance, the absence of accurate buy/sell prices of investments, along with the size of investment, makes it difficult to calculate the exact returns on investments. **Greater disclosure coverage could help to improve public confidence in the legislative bodies.**