## Scraping court decisions

**Visit [this link](https://www.tnwb.uscourts.gov/Search/Search.aspx) and search for "CAR." Scrape the results into a CSV with four columns: the URL to the case, the name of the case, the category (e.g. "Judges' Opinions"), and the additional details (terms matched/size/PDF URL).**

*Bonuses, if you want to get fancy: split the additional details into multiple columns, and download all of the PDFs of the cases.*


In [3]:
import pandas as pd
from bs4 import BeautifulSoup
import requests
import os

response = requests.get("https://www.tnwb.uscourts.gov/Search/Search.aspx?zoom_sort=0&zoom_xml=0&zoom_query=CAR.&zoom_per_page=200&zoom_and=1&zoom_cat%5B%5D=-1")
doc = BeautifulSoup(response.text, "html.parser")  # name the parser explicitly to avoid bs4's "no parser specified" warning
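
The hand-written query string above can also be assembled from a dictionary, which makes the search term and page size easier to change. This is a minimal sketch using only `urllib.parse` from the standard library. The parameter names and values are copied from the URL above; `BASE_URL`, `params`, and `search_url` are names introduced here for illustration.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.tnwb.uscourts.gov/Search/Search.aspx"

# Parameter names and values copied from the request URL used above
params = {
    "zoom_sort": 0,
    "zoom_xml": 0,
    "zoom_query": "CAR.",
    "zoom_per_page": 200,
    "zoom_and": 1,
    "zoom_cat[]": -1,  # urlencode percent-encodes the brackets as %5B%5D
}

search_url = f"{BASE_URL}?{urlencode(params)}"
print(search_url)
```

Passing the same dictionary as `requests.get(BASE_URL, params=params)` should produce an equivalent request without building the string by hand.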

In [4]:
cars = doc.find_all(class_=['result_block', 'result_altblock'])
print(len(cars))

132


In [5]:
rows = []

for car in cars:
    row = {}

    # Each result block holds a title, a link to the case, a category, and an info line
    row['title'] = car.find(class_='result_title').text
    row['link'] = car.find('a').get('href')
    row['category'] = car.find(class_='category').text
    row['details'] = car.find(class_='infoline').text

    rows.append(row)
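
The `href` values on this page appear to be absolute URLs already, but on other pages they are often relative. A small sketch of how `urllib.parse.urljoin` could normalize them either way. `PAGE_URL`, `absolute_link`, and the example hrefs are hypothetical names and values introduced for illustration, not taken from the scraped results.

```python
from urllib.parse import urljoin

# Assumption: the page the links were scraped from
PAGE_URL = "https://www.tnwb.uscourts.gov/Search/Search.aspx"

def absolute_link(href, page_url=PAGE_URL):
    """Resolve an href against the page URL; absolute URLs pass through unchanged."""
    return urljoin(page_url, href)

# Hypothetical hrefs, for illustration only
print(absolute_link("/Opinions/example.pdf"))
print(absolute_link("https://www.tnwb.uscourts.gov/Opinions/example.pdf"))
```

Inside the loop, this would amount to `row['link'] = absolute_link(car.find('a').get('href'))`.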

In [6]:
df = pd.DataFrame(rows)
df.head()

Unnamed: 0,title,link,category,details
0,1. JDL: 04-24318 Jacquelline D. Black [Judges'...,https://www.tnwb.uscourts.gov/Opinions/jdl/pdf...,[Judges' Opinions],Terms matched: 1 - 102k - URL: https://ww...
1,2. WHB: 95-26401 Mary Lucy Cooper [Judges' Opi...,https://www.tnwb.uscourts.gov/Opinions/whb/pdf...,[Judges' Opinions],Terms matched: 1 - 27k - URL: https://www...
2,3. GHB: 97-12368 Billy G. Woffard [Judges' Opi...,https://www.tnwb.uscourts.gov/Opinions/ghb/pdf...,[Judges' Opinions],Terms matched: 1 - 71k - URL: https://www...
3,4. JDL: 97-30580 Mary Chrlis Hurst [Judges' Op...,https://www.tnwb.uscourts.gov/Opinions/jdl/pdf...,[Judges' Opinions],Terms matched: 1 - 32k - URL: https://www...
4,5. MRH: 20-20967 Jacob Braxton Herring 20-0009...,https://www.tnwb.uscourts.gov/Opinions/mrh/pdf...,[Judges' Opinions],Terms matched: 1 - 303k - URL: https://ww...


In [7]:
df[['terms_match', 'size', 'pdf_url']] = df['details'].str.split(' - ', expand=True)
df['pdf_url'] = df['pdf_url'].str.replace('URL:', '', regex=False).str.strip()

df.head()


Unnamed: 0,title,link,category,details,terms_match,size,pdf_url
0,1. JDL: 04-24318 Jacquelline D. Black [Judges'...,https://www.tnwb.uscourts.gov/Opinions/jdl/pdf...,[Judges' Opinions],Terms matched: 1 - 102k - URL: https://ww...,Terms matched: 1,102k,https://www.tnwb.uscourts.gov/Opinions/jdl/pdf...
1,2. WHB: 95-26401 Mary Lucy Cooper [Judges' Opi...,https://www.tnwb.uscourts.gov/Opinions/whb/pdf...,[Judges' Opinions],Terms matched: 1 - 27k - URL: https://www...,Terms matched: 1,27k,https://www.tnwb.uscourts.gov/Opinions/whb/pdf...
2,3. GHB: 97-12368 Billy G. Woffard [Judges' Opi...,https://www.tnwb.uscourts.gov/Opinions/ghb/pdf...,[Judges' Opinions],Terms matched: 1 - 71k - URL: https://www...,Terms matched: 1,71k,https://www.tnwb.uscourts.gov/Opinions/ghb/pdf...
3,4. JDL: 97-30580 Mary Chrlis Hurst [Judges' Op...,https://www.tnwb.uscourts.gov/Opinions/jdl/pdf...,[Judges' Opinions],Terms matched: 1 - 32k - URL: https://www...,Terms matched: 1,32k,https://www.tnwb.uscourts.gov/Opinions/jdl/pdf...
4,5. MRH: 20-20967 Jacob Braxton Herring 20-0009...,https://www.tnwb.uscourts.gov/Opinions/mrh/pdf...,[Judges' Opinions],Terms matched: 1 - 303k - URL: https://ww...,Terms matched: 1,303k,https://www.tnwb.uscourts.gov/Opinions/mrh/pdf...
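
Splitting on `' - '` leaves `terms_match` and `size` as strings such as `Terms matched: 1` and `102k`. If numeric columns are wanted, a regular expression is one alternative. The sketch below assumes every details string follows the layout seen in the output above and that sizes are always given in `k`. `DETAILS_PATTERN`, `parse_details`, and the sample string are illustrative names and values, not part of the original notebook.

```python
import re

# Assumed layout of the details text, based on the output above:
# "Terms matched: <count> - <size>k - URL: <address>"
DETAILS_PATTERN = re.compile(
    r"Terms matched:\s*(?P<terms>\d+)\s*-\s*(?P<size_kb>\d+)k\s*-\s*URL:\s*(?P<url>\S+)"
)

def parse_details(text):
    """Return a dict of typed fields, or None if the text does not match."""
    match = DETAILS_PATTERN.search(text)
    if match is None:
        return None
    return {
        "terms_matched": int(match["terms"]),
        "size_kb": int(match["size_kb"]),
        "pdf_url": match["url"],
    }

# Hypothetical details string in the same shape as the scraped ones
sample = "Terms matched: 1  - 102k  - URL: https://example.com/opinion.pdf"
print(parse_details(sample))
```

Because `\s*` absorbs any run of whitespace around the dashes, the pattern tolerates the uneven spacing in the scraped text. Rows that do not match return `None`, which makes unexpected formats easy to spot.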


In [8]:
df = df.drop('details', axis=1)
df.head()

Unnamed: 0,title,link,category,terms_match,size,pdf_url
0,1. JDL: 04-24318 Jacquelline D. Black [Judges'...,https://www.tnwb.uscourts.gov/Opinions/jdl/pdf...,[Judges' Opinions],Terms matched: 1,102k,https://www.tnwb.uscourts.gov/Opinions/jdl/pdf...
1,2. WHB: 95-26401 Mary Lucy Cooper [Judges' Opi...,https://www.tnwb.uscourts.gov/Opinions/whb/pdf...,[Judges' Opinions],Terms matched: 1,27k,https://www.tnwb.uscourts.gov/Opinions/whb/pdf...
2,3. GHB: 97-12368 Billy G. Woffard [Judges' Opi...,https://www.tnwb.uscourts.gov/Opinions/ghb/pdf...,[Judges' Opinions],Terms matched: 1,71k,https://www.tnwb.uscourts.gov/Opinions/ghb/pdf...
3,4. JDL: 97-30580 Mary Chrlis Hurst [Judges' Op...,https://www.tnwb.uscourts.gov/Opinions/jdl/pdf...,[Judges' Opinions],Terms matched: 1,32k,https://www.tnwb.uscourts.gov/Opinions/jdl/pdf...
4,5. MRH: 20-20967 Jacob Braxton Herring 20-0009...,https://www.tnwb.uscourts.gov/Opinions/mrh/pdf...,[Judges' Opinions],Terms matched: 1,303k,https://www.tnwb.uscourts.gov/Opinions/mrh/pdf...


In [9]:
folder_name = "/Users/teodoracurcic/Downloads/Lede/homework/lede-2025-homework-07/pdfs"

# Create the output folder once, before the loop; exist_ok avoids an error if it already exists
os.makedirs(folder_name, exist_ok=True)

for i, link in enumerate(df['pdf_url']):
    pdf_url = link.strip()
    pdf_response = requests.get(pdf_url)

    # Save each PDF under a sequential name: file_1.pdf, file_2.pdf, ...
    with open(f"{folder_name}/file_{i+1}.pdf", 'wb') as f:
        f.write(pdf_response.content)
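
The loop above writes whatever comes back, so a failed request would still leave a file on disk. Below is a hedged sketch of a more defensive variant that names files after the URL and records failures instead of stopping. It swaps `requests` for `urllib.request` so that it depends only on the standard library. `pdf_filename` and `download_pdfs` are hypothetical helpers introduced here.

```python
import os
import urllib.request
from urllib.parse import urlparse

def pdf_filename(url, index):
    """Use the last path segment of the URL as the file name, with a numbered fallback."""
    name = os.path.basename(urlparse(url).path)
    return name if name.lower().endswith(".pdf") else f"file_{index}.pdf"

def download_pdfs(urls, folder):
    """Download each URL into folder, skipping failures; returns the URLs that failed."""
    os.makedirs(folder, exist_ok=True)
    failed = []
    for i, url in enumerate(urls, start=1):
        url = url.strip()
        target = os.path.join(folder, pdf_filename(url, i))
        try:
            with urllib.request.urlopen(url, timeout=30) as response:
                data = response.read()
        except OSError:  # urllib.error.URLError is a subclass of OSError
            failed.append(url)
            continue
        with open(target, "wb") as f:
            f.write(data)
    return failed
```

It could be called as `failed = download_pdfs(df['pdf_url'], folder_name)`. Note that two URLs ending in the same file name would overwrite each other, so the sequential `file_{i}.pdf` naming may still be the safer choice if names can repeat.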

In [27]:
df.to_csv('bankruptcy_cars.csv', index=False)