# Introduction
Placeholder for our introduction to insider trading and the benefit of using successful insider trades.

## Libraries and Dependencies
To make our project repeatable, reproducible, and replicable, we load all of our libraries up front and freeze the dependencies to a file so that anyone replicating our research knows the versions used.

In [2]:
#Colab Libraries
from google.colab import drive, files
#Data Import Libraries
import os, zipfile, time, requests
from bs4 import BeautifulSoup
#Data Manipulation Libraries
import numpy as np
import pandas as pd
#Visualization Libraries
import matplotlib.pyplot as plt

In [None]:
#print the dependencies in the notebook
!pip freeze

#create a .txt file that contains all versions
#!pip freeze > colab_requirements.txt
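
As a lighter-weight complement to `pip freeze`, the sketch below records versions for only the packages this notebook imports. It relies on the standard-library `importlib.metadata`; the helper name `pinned_versions` and the package list are assumptions for illustration, not part of the original notebook.

```python
from importlib.metadata import PackageNotFoundError, version

def pinned_versions(packages):
    """Return 'name==version' lines for the installed packages in `packages`."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            # Skip anything that is not installed in this environment
            continue
    return lines

# Distribution names (as used by pip) for the libraries imported above
notebook_packages = ["requests", "beautifulsoup4", "numpy", "pandas", "matplotlib"]
for line in pinned_versions(notebook_packages):
    print(line)
```

The resulting lines could be written to a requirements file alongside the full `pip freeze` output, which remains the more complete record.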

## Loading Primary Dataset
We will load multiple SEC Form 4 filing ZIP archives (Source: https://www.sec.gov/data-research/sec-markets-data/insider-transactions-data-sets). Each ZIP archive contains 10 files. We will extract and process three .tsv files from each archive and keep only open-market purchases made by individual insiders (excluding investment entities such as funds, limited partnerships, and trusts). We will identify transactions involving corporate officers and clean the data by removing all invalid records (those with missing roles). The processed results are compiled into a dataframe and saved to a .csv for backup and potential upload to a database or machine-learning pipeline (e.g. BigQuery).
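
Since only three of the files in each archive are needed, one option is to read those members straight from the ZIP rather than extracting everything. The sketch below uses the standard-library `zipfile` module; the helper name `read_members` is hypothetical, the demo archive contents are made up, and the file names are the ones the merge cell later in this notebook loads.

```python
import io
import zipfile

NEEDED = ("NONDERIV_TRANS.tsv", "REPORTINGOWNER.tsv", "SUBMISSION.tsv")

def read_members(zip_source, wanted=NEEDED):
    """Return {member name: raw bytes} for the wanted files present in the archive."""
    found = {}
    with zipfile.ZipFile(zip_source, "r") as archive:
        available = set(archive.namelist())
        for name in wanted:
            if name in available:
                found[name] = archive.read(name)
    return found

# Tiny in-memory archive standing in for a quarterly download (made-up contents)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as demo:
    demo.writestr("SUBMISSION.tsv", "ACCESSION_NUMBER\tISSUERNAME\n")
    demo.writestr("README.txt", "not needed")
buffer.seek(0)

members = read_members(buffer)
print(sorted(members))
```

Each returned value could then be wrapped in `io.BytesIO` and passed to `pd.read_csv(..., sep="\t")`, which accepts file-like objects.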

We will start by mounting our Google Drive and importing files.

In [3]:
#Mount google drive
drive.mount('/content/drive')

Mounted at /content/drive


In [20]:
'''For the final project, we will have only a single path to all information'''
#Students Google Drive Path
toms_path = '/content/drive/MyDrive/Colab Notebooks/593 - Milestone I/593 - Insider Trading Milestone I Project'
kirts_path = None
ramis_path = None

#Navigate to the right working directory and confirm our current working drive
os.chdir(toms_path)
#os.chdir(kirts_path)
#os.chdir(ramis_path)
print(os.getcwd())

/content/drive/MyDrive/Colab Notebooks/593 - Milestone I/593 - Insider Trading Milestone I Project
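
Instead of commenting `os.chdir` lines in and out for each team member, one option is to pick the first candidate path that exists. This is a minimal sketch; `first_existing_dir` is a hypothetical helper, not something the notebook defines, and the candidate list here is a placeholder.

```python
import os

def first_existing_dir(candidates):
    """Return the first candidate that is an existing directory, else None."""
    for path in candidates:
        # None entries (paths not filled in yet) are skipped
        if path and os.path.isdir(path):
            return path
    return None

# Example with placeholder candidates; the current directory always exists
candidates = [None, "/path/that/does/not/exist", os.getcwd()]
project_dir = first_existing_dir(candidates)
if project_dir is not None:
    os.chdir(project_dir)
print(os.getcwd())
```

In the notebook, the candidates would be `toms_path`, `kirts_path`, and `ramis_path`, so the same cell could run unchanged for whoever has the Drive mounted.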


The first thing we need to do is download all of the ZIP archives from the SEC website and save them to our Google Drive. You must identify yourself with a proper User-Agent header in order to connect to the SEC website.

In [None]:
'''Only run this cell once to download the data'''

#URLs where we can download the files
url = 'https://www.sec.gov/data-research/sec-markets-data/insider-transactions-data-sets'
#Create a session with a real User-Agent
session = requests.Session()
session.headers.update({'User-Agent':'tmacphe (tmacpe@umich.edu)',
                       'Accept-Encoding': 'gzip,deflate',
                       'Accept':'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'})
#Let's grab the page
page = session.get(url)
page.raise_for_status()

#Now, we need to find all of the .zip links
soup = BeautifulSoup(page.content, 'html.parser')
zip_links = []
for a in soup.find_all('a',href=True):
    href = a['href']
    if href.lower().endswith('.zip'):
        url = href if href.startswith('http') else 'https://www.sec.gov' + href
        zip_links.append(url)

#Create a folder to store all of our zipped archives
os.makedirs('sec_insider_zips',exist_ok=True)

#Download each file into the directory
for url in zip_links:
    #We can pull out the file name using the os.path
    filename = os.path.basename(url)
    out_path = os.path.join('sec_insider_zips',filename)
    if os.path.exists(out_path):
        print(f"Skipping {filename} because it has already been downloaded")
        continue
    else:
        print(f"Downloading {filename}...")
        #Now let's get the new webpages with the zip files
        zip_file = session.get(url)
        zip_file.raise_for_status()
        #create a file and write the contents to it (write binary 'wb')
        with open(out_path, 'wb') as f:
            f.write(zip_file.content)
        print(f"{filename} downloaded")
        time.sleep(0.5)
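
The link-building step above prefixes `https://www.sec.gov` onto any href that does not start with `http`. A slightly more general option is `urllib.parse.urljoin`, which also resolves relative paths that lack a leading slash. The sketch below is standalone; `absolute_zip_links` is a hypothetical helper and the sample hrefs are made up for illustration.

```python
from urllib.parse import urljoin

def absolute_zip_links(page_url, hrefs):
    """Resolve hrefs against page_url and keep only links ending in .zip."""
    links = []
    for href in hrefs:
        if href.lower().endswith(".zip"):
            links.append(urljoin(page_url, href))
    return links

page_url = "https://www.sec.gov/data-research/sec-markets-data/insider-transactions-data-sets"
sample_hrefs = [
    "/files/example/2006q1_form345.zip",             # root-relative (made-up path)
    "https://www.sec.gov/files/example/2006q2.zip",  # already absolute (made-up path)
    "/about",                                        # not a zip, ignored
]
for link in absolute_zip_links(page_url, sample_hrefs):
    print(link)
```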


Next I will add all of Kirt's data manipulation for merging the files. The beginning of the code changes slightly so that we don't have to manually select the files to upload; once the archives are extracted, everything else is the same.
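
One caveat for the entity filter in the merge cell that follows: it matches keywords anywhere in the name, so a keyword such as `LP` or `INC` can also hit inside a person's name. The sketch below shows a stricter alternative that only matches whole tokens. `looks_like_entity` and the sample names are hypothetical, and whether the stricter rule suits the project is a judgment call, since it also stops matching longer words such as `CORPORATION`.

```python
import re

ENTITY_KEYWORDS = ["LLC", "LP", "L.P.", "LTD", "INC", "TRUST", "CORP", "FOUNDATION", "COMPANY", "CO."]

# Escape each keyword so dots are literal, and require that the keyword is not
# glued to neighbouring letters or digits (a whole-token match).
ENTITY_PATTERN = re.compile(
    r"(?<![A-Z0-9])(?:" + "|".join(re.escape(k) for k in ENTITY_KEYWORDS) + r")(?![A-Z0-9])"
)

def looks_like_entity(name):
    """Return True if the upper-cased name contains an entity keyword as a whole token."""
    return bool(ENTITY_PATTERN.search(name.upper()))

for sample in ["RALPH VINCENT", "ACME HOLDINGS LLC", "SMITH FAMILY TRUST", "EXAMPLE CAPITAL L.P."]:
    print(sample, looks_like_entity(sample))
```

The same pattern string (`ENTITY_PATTERN.pattern`) could be passed to `Series.str.contains(..., na=False)` on the upper-cased name column if this behaviour is preferred.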

In [35]:
#Let's point to the folder where we put all of the files
output_path = "/content/drive/MyDrive/Colab Notebooks/593 - Milestone I/593 - Insider Trading Milestone I Project/sec_insider_zips"

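#Let's build a list of the ZIP archives that we can iterate over
#(the extracted folders also live in this directory after the first run, so skip them)
zip_list = [f for f in os.listdir(output_path) if f.lower().endswith('.zip')]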

#Let's sort it from oldest to newest
zip_list.sort()

#Let's create a list to store the intermediate dataframes
merged_all = []

#Let's iterate over each file and pull what we need to create dataframes to merge
for zip_filename in zip_list:
    zip_path = os.path.join(output_path,zip_filename)
    #Create the folder for the files for this quarter
    print(f"Processing: {zip_filename}")
    folder_name = zip_filename.replace(".zip", "")
    extract_path = os.path.join(output_path,folder_name)

    #Make sure we have a folder
    os.makedirs(extract_path, exist_ok=True)

    #Extract the files
    with zipfile.ZipFile(zip_path,'r') as zip_ref:
        zip_ref.extractall(extract_path)

    #Load the .TSV files we are interested in
    try:
        nonderiv = pd.read_csv(os.path.join(extract_path, "NONDERIV_TRANS.tsv"),sep="\t",low_memory = False)
        report = pd.read_csv(os.path.join(extract_path,"REPORTINGOWNER.tsv"),sep="\t",low_memory = False)
        submission = pd.read_csv(os.path.join(extract_path, "SUBMISSION.tsv"),sep="\t",low_memory = False)
    except Exception as e:
        print(f"Skipping {zip_filename} due to load error: {e}")
        continue

    # Add derived insider role (in case 'Insider Title' is NaN)
    # Falls back to 'RPTOWNER_RELATIONSHIP' when none of the role flags is set
    def get_role(row):
        if row.get("ISOFFICER") == "true":
            return "Officer"
        elif row.get("ISDIRECTOR") == "true":
            return "Director"
        elif row.get("ISTENPERCENTOWNER") == "true":
            return "10% Owner"
        elif row.get("ISOTHER") == "true":
            return "Other Insider"
        elif pd.notna(row.get("RPTOWNER_RELATIONSHIP")):
            return row["RPTOWNER_RELATIONSHIP"].strip().title()
        else:
            return None

    report["Insider Role"] = report.apply(get_role, axis=1)


    # Filter for common stock purchases
    # Can modify "TRANS_CODE" to include Sales ("S")
    filtered = nonderiv[
        (nonderiv["SECURITY_TITLE"].str.lower() == "common stock") &
        (nonderiv["TRANS_CODE"] == "P")
    ]

    # Join with REPORTINGOWNER.tsv before filtering out entities or invalid roles
    filtered = filtered.merge(
        report[["ACCESSION_NUMBER", "RPTOWNERNAME", "RPTOWNER_TITLE", "RPTOWNER_RELATIONSHIP","Insider Role"]],
        on="ACCESSION_NUMBER", how="left"
    )

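    # Filter out entity filers (Investment entities that are not officers or directors)
    filtered["RPTOWNERNAME"] = filtered["RPTOWNERNAME"].str.upper()
    entity_keywords = ["LLC", "LP", "L.P.", "LTD", "INC", "TRUST", "CORP", "FOUNDATION", "COMPANY", "CO."]
    # Escape the dots so "CO." and "L.P." match literally instead of acting as regex wildcards
    entity_pattern = '|'.join(kw.replace('.', r'\.') for kw in entity_keywords)
    # .copy() avoids the SettingWithCopyWarning on the column assignment below
    filtered = filtered[~filtered["RPTOWNERNAME"].str.contains(entity_pattern, na=False)].copy()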


    # Keep only valid insiders: director, officer, or has a job title
    filtered["RPTOWNER_RELATIONSHIP"] = filtered["RPTOWNER_RELATIONSHIP"].str.upper()
    filtered = filtered[
        filtered["RPTOWNER_RELATIONSHIP"].str.contains("DIRECTOR|OFFICER|TENPERCENTOWNER", na=False) |
        filtered["RPTOWNER_TITLE"].notna()
    ]


    # Merge with submission to get equity issuer info
    filtered = filtered.merge(
        submission[["ACCESSION_NUMBER", "ISSUERNAME", "ISSUERTRADINGSYMBOL", "PERIOD_OF_REPORT"]],
        on="ACCESSION_NUMBER", how="left"
    )

    # Filter out equity issuers that are investment funds
    filtered = filtered[
        ~filtered["ISSUERNAME"].str.contains("FUND", case=False, na=False) &
        ~filtered["ISSUERNAME"].str.contains("trust", case=False, na=False)
    ]



    # Select and rename output columns (for readability)
    final = filtered[[
        "RPTOWNERNAME", "RPTOWNER_TITLE", "Insider Role",
        "ISSUERNAME", "ISSUERTRADINGSYMBOL", "PERIOD_OF_REPORT",
        "TRANS_DATE", "SECURITY_TITLE", "TRANS_CODE", "TRANS_SHARES",
        "TRANS_PRICEPERSHARE", "SHRS_OWND_FOLWNG_TRANS", "DIRECT_INDIRECT_OWNERSHIP",
        "ACCESSION_NUMBER"
    ]].rename(columns={
        "RPTOWNERNAME": "Insider Name",
        "RPTOWNER_TITLE": "Insider Title",
        "Insider Role": "Insider Role",
        "ISSUERNAME": "Issuer",
        "ISSUERTRADINGSYMBOL": "Ticker",
        "PERIOD_OF_REPORT": "Period of Report",
        "TRANS_DATE": "Transaction Date",
        "SECURITY_TITLE": "Security",
        "TRANS_CODE": "Transaction Code",
        "TRANS_SHARES": "Shares",
        "TRANS_PRICEPERSHARE": "Price per Share",
        "SHRS_OWND_FOLWNG_TRANS": "Shares After",
        "DIRECT_INDIRECT_OWNERSHIP": "Ownership Type"
    })

    # Append cleaned data to master list
    merged_all.append(final)

# Combine all cleaned rows into one DataFrame
if merged_all:
    final_df = pd.concat(merged_all, ignore_index=True)
    out_csv = "all_common_stock_purchases_temp.csv"
    final_df.to_csv(out_csv, index=False)
    # Report the file name that was actually written
    print(f"Saved merged data to {out_csv}")

    # Preview output
    print("Preview of merged data:")
    pd.set_option('display.max_columns', None)
    display(final_df.head(10))
else:
    print("No valid purchase data found in uploaded zip files.")



Processing: 2006q1_form345.zip
Processing: 2006q2_form345.zip
Processing: 2006q3_form345.zip
Processing: 2006q4_form345.zip
Processing: 2007q1_form345.zip
Processing: 2007q2_form345.zip
Processing: 2007q3_form345.zip
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  filtered["RPTOWNER_RELATIONSHIP"] = filtered["RPTOWNER_RELATIONSHIP"].str.upper()
Processing: 2007q4_form345.zip
Processing: 2008q1_form345.zip
Processing: 2008q2_form345.zip
Processing: 2008q3_form345.zip
Processing: 2008q4_form345.zip
Processing: 2009q1_form345.zip
Processing: 2009q2_form345.zip
Processing: 2009q3_form345.zip
Processing: 2009q4_form345.zip
Processing: 2010q1_form345.zip
Processing: 2010q2_form345.zip
Processing: 2010q3_form345.zip
Processing: 2010q4_form345.zip
Processing: 2011q1_form345.zip
Processing: 2011q2_form345.zip
Processing: 2011q3_form345.zip
Processing: 2011q4_form345.zip
Processing: 2012q1_form345.zip
Processing: 2012q2_form345.zip
Processing: 2012q3_form345.zip
Processing: 2012q4_form345.zip
Processing: 2013q1_form345.zip
Processing: 2013q2_form345.zip
Processing: 2013q3_form345.zip
Processing: 2013q4_form345.zip
Processing: 2014q1_form345.zip
Processing: 2014q2_form345.zip
Processing: 2014q3_form345.zip
Processing: 2014q4_form345.zip
Processing: 2015q1_form345.zip
Processing: 2015q2_form345.zip
Processing: 2015q3_form345.zip
Processing: 2015q4_form345.zip
Processing: 2016q1_form345.zip
Processing: 2016q2_form345.zip
Processing: 2016q3_form345.zip
Processing: 2016q4_form345.zip
Processing: 2017q1_form345.zip
Processing: 2017q2_form345.zip
Processing: 2017q3_form345.zip
Processing: 2017q4_form345.zip
Processing: 2018q1_form345.zip
Processing: 2018q2_form345.zip
Processing: 2018q3_form345.zip
Processing: 2018q4_form345.zip
Processing: 2019q1_form345.zip
Processing: 2019q2_form345.zip
Processing: 2019q3_form345.zip
Processing: 2019q4_form345.zip
Processing: 2020q1_form345.zip
Processing: 2020q2_form345.zip
Processing: 2020q3_form345.zip
Processing: 2020q4_form345.zip
Processing: 2021q1_form345.zip
Processing: 2021q2_form345.zip
Processing: 2021q3_form345.zip
Processing: 2021q4_form345.zip
Processing: 2022q1_form345.zip
Processing: 2022q2_form345.zip
Processing: 2022q3_form345.zip
Processing: 2022q4_form345.zip
Processing: 2023q1_form345.zip
Processing: 2023q2_form345.zip
Processing: 2023q3_form345.zip
Processing: 2023q4_form345.zip
Processing: 2024q1_form345.zip
Processing: 2024q2_form345.zip
Processing: 2024q3_form345.zip
Processing: 2024q4_form345.zip
Processing: 2025q1_form345.zip
Saved merged data to all_common_stock_purchases.csv
Preview of merged data:


| | Insider Name | Insider Title | Insider Role | Issuer | Ticker | Period of Report | Transaction Date | Security | Transaction Code | Shares | Price per Share | Shares After | Ownership Type | ACCESSION_NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | BARNHOLT EDWARD W | | Director | ADOBE SYSTEMS INC | ADBE | 30-MAR-2006 | 30-MAR-2006 | Common Stock | P | 5000.0 | 35.61 | 5000.0 | D | 0001179110-06-007524 |
| 1 | STEGMANN THOMAS | Chief Clinical Officer | Director,Officer,Tenpercentowner | CardioVascular BioTherapeutics, Inc. | CVBT | 27-MAR-2006 | 27-MAR-2006 | Common Stock | P | 7750.0 | 7.34 | 30015500.0 | D | 0001303497-06-000011 |
| 2 | GONZALEZ PLACIDO | | Director | EUROBANCSHARES INC | EUBK | 31-OCT-2005 | 31-OCT-2005 | Common Stock | P | 100000.0 | 10.4 | 1757796.0 | D | 0000899078-06-000306 |
| 3 | GONZALEZ PLACIDO | | Director | EUROBANCSHARES INC | EUBK | 31-OCT-2005 | 02-NOV-2005 | Common Stock | P | 21000.0 | 11.64 | 1790996.0 | D | 0000899078-06-000306 |
| 4 | GONZALEZ PLACIDO | | Director | EUROBANCSHARES INC | EUBK | 31-OCT-2005 | 01-NOV-2005 | Common Stock | P | 12200.0 | 10.51 | 1769996.0 | D | 0000899078-06-000306 |
| 5 | GONZALEZ PLACIDO | | Director | EUROBANCSHARES INC | EUBK | 28-APR-2005 | 28-APR-2005 | Common Stock | P | 10300.0 | 15.45 | 1657796.0 | D | 0000899078-06-000305 |
| 6 | MONTANO DANIEL C | Chairman, President, CEO | Director,Officer,Tenpercentowner | CardioVascular BioTherapeutics, Inc. | CVBT | 30-MAR-2006 | 30-MAR-2006 | Common Stock | P | 2250.0 | 7.7 | 7750.0 | D | 0001303497-06-000010 |
| 7 | MONTANO DANIEL C | Chairman, President, CEO | Director,Officer,Tenpercentowner | CardioVascular BioTherapeutics, Inc. | CVBT | 30-MAR-2006 | 01-JAN-2000 | Common Stock | P | 0.0 | 0.0 | 630000.0 | I | 0001303497-06-000010 |
| 8 | MONTANO DANIEL C | Chairman, President, CEO | Director,Officer,Tenpercentowner | CardioVascular BioTherapeutics, Inc. | CVBT | 30-MAR-2006 | 01-JAN-2000 | Common Stock | P | 0.0 | 0.0 | 30000000.0 | I | 0001303497-06-000010 |
| 9 | STEGMANN THOMAS | Chief Clinical Officer | Director,Officer,Tenpercentowner | CardioVascular BioTherapeutics, Inc. | CVBT | 30-MAR-2006 | 30-MAR-2006 | Common Stock | P | 2250.0 | 7.7 | 30007750.0 | D | 0001303497-06-000009 |
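
The preview shows dates as `DD-MON-YYYY` strings (for example `30-MAR-2006`). A minimal sketch for turning them into date objects with the standard library follows; `parse_sec_date` is a hypothetical helper, and `%b` assumes English month abbreviations (the default C/English locale).

```python
from datetime import date, datetime

def parse_sec_date(text):
    """Parse a 'DD-MON-YYYY' string such as '30-MAR-2006' into a date."""
    return datetime.strptime(text.strip(), "%d-%b-%Y").date()

period = parse_sec_date("30-MAR-2006")
traded = parse_sec_date("27-MAR-2006")
print(period.isoformat())        # 2006-03-30
print((period - traded).days)    # 3
```

For whole columns, `pd.to_datetime(final_df["Transaction Date"], format="%d-%b-%Y", errors="coerce")` should do the same job in one call, with unparseable values becoming `NaT`.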
