# Data Visualization

In this Colab notebook, I explore the cleaned dataset and attempt to answer the following questions:

- What are the stocks that are most frequently traded?

- Who are the most prolific traders? 
  - By party

- Which senator was most successful in trading? 
  - Total absolute returns
  - Annualized returns per investment

- Which sectors are most favored?

- What is the correlation between trading activity (number of trades) and overall performance?


In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import pickle
import requests
import json
import plotly.express as px

from google.colab import drive
drive.mount('/content/drive')
with open("/content/drive/MyDrive/Colab Notebooks/Final Project/senate_final_ls.pkl", "rb") as f:
    senate_final_ls = pickle.load(f)
senators_df = pd.DataFrame(senate_final_ls)

Mounted at /content/drive


In [2]:
# Raw data: all disclosed transactions, not only closed investments
senate_raw_link = "https://senate-stock-watcher-data.s3-us-west-2.amazonaws.com/aggregate/all_transactions.json"
senate_raw_response = requests.get(senate_raw_link)
senate_raw_response.raise_for_status()  # fail early if the download did not succeed
senate_raw_json = json.loads(senate_raw_response.text)
senate_raw_df = pd.json_normalize(senate_raw_json)
# Keep only stock transactions
senate_stocks_raw_df = senate_raw_df[senate_raw_df["asset_type"] == "Stock"]

## What stocks are traded most frequently by Senators?

For this purpose, we will look at the raw dataset which contains ALL stocks traded, instead of the cleaned dataset ("senators_df") which only contains information on closed investments (bought and sold).




In [3]:
# Count transactions per ticker (value_counts sorts in descending order)
freq_stocks_traded = senate_stocks_raw_df["ticker"].value_counts().reset_index()
freq_stocks_traded.columns = ["ticker", "freq"]

# Plot the 20 most frequently traded tickers
fig_freq_stocks_traded = px.bar(freq_stocks_traded[:20], x="ticker", y="freq", color="freq",
             title="Top 20 most traded stocks among Senators")
fig_freq_stocks_traded.show()

As expected, Apple, Microsoft, and other large-cap blue-chip stocks (tech, banking, oil and gas) appear.

However, companies such as Disney, Warner Bros Discovery, **Urban Outfitters, and Caesars Entertainment Inc** were also heavily traded.

Let us find out which senator(s) traded Urban Outfitters.


In [4]:
senate_stocks_raw_df[senate_stocks_raw_df["ticker"] == "URBN"]["senator"].value_counts()

David Perdue    65
Name: senator, dtype: int64

It seems strange that David Perdue (Georgia, Republican, 2015-2021) has such a high level of interest in Urban Outfitters.

## Senators moonlighting as day traders

I was also curious which Senators traded the most (by number of trades), both across the Senate and within each party.

In [5]:
# Count trades per senator
freq_trading = senate_stocks_raw_df["senator"].value_counts().reset_index()
freq_trading.columns = ["senator", "freq"]
# Get party affiliation (deduplicate first so the merge yields one row per senator-party pair)
freq_trading = freq_trading.merge(senate_stocks_raw_df[["senator","party"]].drop_duplicates(), how="inner", on="senator")
# Keep prolific traders only (at least 50 trades)
freq_trading = freq_trading[freq_trading["freq"] >= 50]

fig_freq_trading = px.bar(freq_trading, x="senator", y="freq", color="party",
                          color_discrete_map={'Republican': 'red', 
                                              'Democrat': 'blue'},
                          title = "Senators who made at least 50 trades")
fig_freq_trading.show()

In general, **Republican Senators** are more likely to be prolific traders (defined here as making at least 50 trades).

In particular, David Perdue (Georgia, Republican) made **2270** trades during his term (2015-2021).

That is, on average, roughly 6 to 7 trades per week (depending on whether 2015-2021 is counted as six or seven years), i.e. more than one every working day.
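As a quick sanity check on the per-week figure, here is a minimal sketch of the arithmetic. The trade count of 2270 comes from the chart above; the helper name `trades_per_week` and the choice of 52 weeks per year are my own assumptions, and the result depends on how long the 2015-2021 span is taken to be.

```python
def trades_per_week(total_trades: int, years: float, weeks_per_year: int = 52) -> float:
    """Average number of trades per week over a span of `years`."""
    return total_trades / (years * weeks_per_year)

# 2270 is the trade count reported above; the span length is an assumption.
rate_seven_years = trades_per_week(2270, 7)  # 2015-2021 counted as seven calendar years
rate_six_years = trades_per_week(2270, 6)    # term counted as six full years

print(f"{rate_seven_years:.1f} to {rate_six_years:.1f} trades per week")
```

Either way the average exceeds five trades per week, i.e. more than one per working day.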

## How successful were Senators at trading?

Are Senators capable of converting their political capital and access to information into superior gains?

In [6]:
# First, let us approximate how much each Senator made in total from their investments.
senators_df["absolute_gain"] = senators_df["amount_transacted"] * senators_df["returns"]

# Sum absolute_gain per senator, largest gain first
absolute_gain_by_senator_df = (
    senators_df.groupby("senator")["absolute_gain"].sum()
    .sort_values(ascending=False)
    .reset_index()
)

# Get party affiliation (deduplicate first so the merge yields one row per senator-party pair)
absolute_gain_by_senator_df = absolute_gain_by_senator_df.merge(senate_stocks_raw_df[["senator","party"]].drop_duplicates(), how="inner", on="senator")

# Net gain or loss
absolute_gain_by_senator_df["gain_loss"] = absolute_gain_by_senator_df["absolute_gain"] > 0


fig_gains = px.bar(absolute_gain_by_senator_df, x="senator", y="absolute_gain", color="gain_loss",
                   title = "Absolute gains from investments",
                   color_discrete_map={True: 'green', False: 'red'})

fig_gains.update_xaxes(type='category')

fig_gains.show()

We can observe that David Perdue made more than **2 million dollars** from his investments! 

At the same time, 5 Senators lost money on their investments (although not by much: Sheldon Whitehouse, who lost the most, **lost ~$20,000**).

### What about on a relative basis? 
Do Senators make money off most trades, or do they rely on a few key trades only? 

Senators who *consistently* produce superior gains are more suspect in terms of potential insider trading or conflicts of interest.
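One way to put a number on "consistently" is a win rate per senator: the share of closed investments with a positive return. The sketch below runs on hypothetical toy data whose column names mirror `senators_df` (`senator`, `returns`); the helper `win_rate_by_senator` is my own illustration, not part of the cleaned dataset.

```python
import pandas as pd

# Hypothetical toy data; the column names mirror senators_df.
toy_df = pd.DataFrame({
    "senator": ["A", "A", "A", "A", "B", "B"],
    "returns": [0.10, 0.05, -0.02, 0.30, 2.50, -0.40],
})

def win_rate_by_senator(df: pd.DataFrame) -> pd.DataFrame:
    """Share of profitable investments and number of investments per senator."""
    summary = (
        df.assign(is_win=df["returns"] > 0)
          .groupby("senator")
          .agg(win_rate=("is_win", "mean"), n_investments=("is_win", "size"))
          .reset_index()
    )
    return summary.sort_values("win_rate", ascending=False, ignore_index=True)

win_rates = win_rate_by_senator(toy_df)
print(win_rates)
```

In the toy data, senator B has the single largest gain but wins only half of the time, which is the "few key trades" pattern, while senator A wins more often with smaller gains.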


In [7]:
senators_df["senator_ticker"] = senators_df["senator"] + "_" + senators_df["ticker"]
senators_df = senators_df.sort_values("returns", ascending=False)
fig_relative_gains = px.bar(senators_df, x="senator_ticker", y="returns", title="Returns on Investments", color="senator")
fig_relative_gains.show()

Hmm, it seems that while most trades were profitable, a significant minority ended in a net loss. A good number of senators lost money on around half of their investments (see David Perdue, the most prolific trader in the Senate).

However, **Bill Cassidy** (Republican, Louisiana, term: 2015-present) appears to be particularly skillful or lucky in his trades! In fact, he made money on all but 2 of them.

### What about on an annualized basis?

Given that investment periods differ, we can attempt to "standardize" the returns to identify the trade with the highest return.
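This notebook does not show how the `annualized_returns` column was computed during cleaning, so the following is only a sketch of one common convention: compounding the holding-period return to a one-year horizon. The function name `annualize` and the 365-day year are assumptions for illustration.

```python
def annualize(total_return: float, days_held: int, days_per_year: int = 365) -> float:
    """Compound a holding-period return to a one-year horizon.

    total_return is a fraction (0.10 means a 10% gain over the holding period).
    """
    if days_held <= 0:
        raise ValueError("days_held must be positive")
    return (1.0 + total_return) ** (days_per_year / days_held) - 1.0

# The same 10% gain annualizes very differently depending on how quickly it was earned.
for days in (365, 90, 30):
    print(days, round(annualize(0.10, days), 4))
```

Under this convention, a short holding period amplifies the annualized figure, because the same gain is assumed to repeat many times within the year.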



In [8]:
# Find investment duration in days
senators_df['earliest_purchase_date_dt'] = pd.to_datetime(senators_df['earliest_purchase_date'])
senators_df['latest_full_sale_date_dt'] = pd.to_datetime(senators_df['latest_full_sale_date'])
# Subtracting two datetime columns gives a timedelta; .dt.days extracts whole days as integers
senators_df['investment_duation'] = (senators_df['latest_full_sale_date_dt'] - senators_df['earliest_purchase_date_dt']).dt.days

fig_relative_gains_annualized = px.bar(senators_df.sort_values("annualized_returns", ascending=False)[:15], 
                                      x="senator_ticker", y="annualized_returns", 
                                      color='investment_duation', 
                                      labels={"investment_duation": "investment duration (days)"},
                                      title="Returns on Investments (Annualized) - Top 15 Best Performing Trades")
fig_relative_gains_annualized.show()

Looking at the top 15 best performing trades (annualized basis), we discover that most of them had an investment duration of less than **30 days**.

In fact, the best performing trade, made by Kelly Loeffler (Republican, Georgia, 2020-2021), had **an annualized return of over 100,000% and was held for only 32 days**.

That most of the most profitable trades had such short investment durations increases my suspicion that Senators may have used their access to information to enhance their stock performance.
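The claim about short holding periods can be checked directly rather than read off the chart. Below is a minimal sketch on hypothetical toy data; the column names follow this notebook (including `investment_duation` as spelled here), while `short_hold_share` is a helper name I made up for illustration.

```python
import pandas as pd

# Hypothetical toy data; the values are invented purely for illustration.
toy_df = pd.DataFrame({
    "senator_ticker": ["A_XXX", "A_YYY", "B_XXX", "B_ZZZ", "C_YYY"],
    "annualized_returns": [9.0, 4.0, 2.5, 0.3, 0.1],
    "investment_duation": [10, 20, 45, 400, 700],
})

def short_hold_share(df: pd.DataFrame, top_n: int, max_days: int) -> float:
    """Share of the top_n trades (by annualized return) held for fewer than max_days."""
    top_trades = df.nlargest(top_n, "annualized_returns")
    return float((top_trades["investment_duation"] < max_days).mean())

share = short_hold_share(toy_df, top_n=3, max_days=30)
print(f"{share:.0%} of the top 3 toy trades were held for under 30 days")
```

On the real data, the equivalent call would use `senators_df` with `top_n=15`.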

## Do Senators prefer a specific sector to invest in?



In [9]:
# Find initial year of investment
senators_df["year_invested"] = pd.to_datetime(senators_df['earliest_purchase_date']).dt.strftime('%Y')

In [10]:
freq_industry_df = pd.DataFrame(senators_df.groupby(["year_invested"])["sector"].value_counts()).unstack().fillna(0).droplevel(0,axis=1)
fig_freq_industry = px.line(freq_industry_df, title = "Popularity of Sector as Investment Target")

fig_freq_industry.update_layout(yaxis_title="Frequency")

fig_freq_industry

There is no clearly discernible trend.
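Raw counts can be hard to compare when the number of investments differs from year to year. One option, sketched below on hypothetical toy data, is to look at each sector's share of that year's investments using `pd.crosstab` with `normalize="index"`. The column names mirror `senators_df`; the sector labels and values are invented.

```python
import pandas as pd

# Hypothetical toy data; column names mirror senators_df.
toy_df = pd.DataFrame({
    "year_invested": ["2019", "2019", "2019", "2020", "2020"],
    "sector": ["Technology", "Technology", "Energy", "Energy", "Financials"],
})

# Rows are years, columns are sectors; normalize="index" makes each row sum to 1.
sector_share = pd.crosstab(toy_df["year_invested"], toy_df["sector"], normalize="index")
print(sector_share)
```

The resulting table has the same shape as `freq_industry_df` (years as rows, sectors as columns), so it could be passed to `px.line` in the same way.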

## Does trading more often make you a better investor (Senator Edition)?

The answer, for most retail investors, is no. 

Let us investigate if Senators are able to buck the trend.


In [11]:
# note: we calculated freq_trading earlier on in this notebook;
# it only contains senators with at least 50 trades, so the inner merge below is limited to them
# calculate average annualized returns across investments for each Senator

senator_avg_return = senators_df.groupby("senator")["annualized_returns"].mean().reset_index()
return_freq_df = senator_avg_return.merge(freq_trading, on = "senator")


return_freq_df.columns = ["senator", "Average annualized return", "Frequency of trading", "Party"]
return_freq_fig = px.scatter(return_freq_df, 
                             x="Frequency of trading", 
                             y="Average annualized return", 
                             title="Relationship between frequency of trading and trading performance",
                             trendline="ols")  # the OLS trendline requires statsmodels
return_freq_fig

It appears that Senators are just like us: the more they trade, the poorer their returns! In general, at least among the prolific traders plotted here, Senators who traded more often did not enjoy higher net returns.
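The trendline gives a visual impression; a correlation coefficient puts a number on it. Here is a minimal sketch on hypothetical toy data, using the same column names as `return_freq_df`. Pearson measures linear association, while Spearman (computed here as Pearson on ranks) is less sensitive to a single extreme trader.

```python
import pandas as pd

# Hypothetical toy data; column names mirror return_freq_df, values are invented.
toy_df = pd.DataFrame({
    "Frequency of trading": [50, 120, 300, 900, 2270],
    "Average annualized return": [0.40, 0.35, 0.20, 0.15, 0.05],
})

freq = toy_df["Frequency of trading"]
avg_return = toy_df["Average annualized return"]

# Pearson correlation: strength of the linear relationship
pearson = freq.corr(avg_return)

# Spearman correlation: Pearson correlation of the ranks (monotonic relationship)
spearman = freq.rank().corr(avg_return.rank())

print(f"Pearson: {pearson:.2f}, Spearman: {spearman:.2f}")
```

With only a handful of senators in the plot, either coefficient should be read as descriptive rather than as a statistically robust result.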

Possible explanations (from most naive to most sinister):

- Insider trading and conflict-of-interest (COI) instances are indeed rare or non-existent, i.e. Senators are deterred by existing regulation and disclosure policies.

- Senators are not exploiting their informational or influence edge substantially or consistently.

- Senators who commit insider trading are simply not disclosing their trades (and/or are good enough to hide them!)


In [12]:
with open("/content/drive/MyDrive/Colab Notebooks/Final Project/senate_stocks_raw_df.pkl", "wb") as f:
    pickle.dump(senate_stocks_raw_df, f)
with open("/content/drive/MyDrive/Colab Notebooks/Final Project/senators_data_vis_df.pkl", "wb") as f:
    pickle.dump(senators_df, f)