<p style="text-align: center; font-size: 32px; font-weight: bold;">S&P 500 Data Scraper</p>



- This Python script scrapes the latest S&P 500 company data from [Slickcharts](https://www.slickcharts.com/sp500) using `requests` and `BeautifulSoup`. 
- It extracts the data table containing index components, processes it into a clean `pandas` DataFrame, and adds the current date for reference.

In [1]:
import requests
import pandas as pd
from bs4 import BeautifulSoup
from datetime import datetime
import os

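# Define the URL and request headers
url = 'https://www.slickcharts.com/sp500'
request_headers = {'User-Agent': 'Mozilla/5.0'}
response = requests.get(url, headers=request_headers, timeout=30)
response.raise_for_status()  # Fail early on an HTTP error instead of parsing an error page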

# Parse the HTML content
soup = BeautifulSoup(response.content, 'html.parser')

# Find the table containing the S&P 500 data
table = soup.find('table', {'class': 'table table-hover table-borderless table-sm'})
if table is None:
    raise ValueError('S&P 500 table not found; the page layout may have changed')

# Extract the column names (kept separate from the request headers above)
columns = [th.text.strip() for th in table.find_all('th')]

# Extract the rows
rows = []
for row in table.find_all('tr')[1:]:  # Skip the header row
    cols = row.find_all('td')
    if cols:  # Ignore rows without data cells
        rows.append([col.text.strip() for col in cols])

# Create the DataFrame
df = pd.DataFrame(rows, columns=columns)

# Assign today's date to the DataFrame
df['Date'] = datetime.today().strftime('%Y-%m-%d')

# Display the DataFrame
df.head()

Unnamed: 0,#,Company,Symbol,Weight,Price,Chg,% Chg,Date
0,1,Apple Inc.,AAPL,6.74%,205.35,-7.97,(-3.74%),2025-05-04
1,2,Microsoft Corp,MSFT,6.65%,435.28,9.88,(2.32%),2025-05-04
2,3,Nvidia Corp,NVDA,5.75%,114.5,2.89,(2.59%),2025-05-04
3,4,Amazon.com Inc,AMZN,3.77%,189.98,-0.22,(-0.12%),2025-05-04
4,5,"Meta Platforms, Inc. Class A",META,2.64%,597.02,24.81,(4.34%),2025-05-04
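
The scraped cells are plain strings, so values such as `6.74%` or `(-3.74%)` cannot be used in calculations directly. A minimal sketch of one way to convert them, assuming the column names shown in the output above; the helper `to_numeric_columns` is a hypothetical name and not part of the original notebook:

```python
import pandas as pd

# Sample rows copied from the output above; every value is a string, as scraped.
sample = pd.DataFrame(
    {
        "Symbol": ["AAPL", "MSFT"],
        "Weight": ["6.74%", "6.65%"],
        "Price": ["205.35", "435.28"],
        "Chg": ["-7.97", "9.88"],
        "% Chg": ["(-3.74%)", "(2.32%)"],
    }
)


def to_numeric_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy with the numeric columns as floats."""
    out = frame.copy()
    for col in ["Weight", "Price", "Chg", "% Chg"]:
        # Strip percent signs, parentheses and thousands separators before parsing.
        cleaned = out[col].astype(str).str.replace(r"[%(),]", "", regex=True)
        # Anything that still fails to parse becomes NaN instead of raising.
        out[col] = pd.to_numeric(cleaned, errors="coerce")
    return out


clean = to_numeric_columns(sample)
```

Percentages stay in percent units here (6.74, not 0.0674); divide by 100 if fractions are preferred.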


- Save the S&P 500 data as a CSV file in the `input` folder to keep a record of each day's snapshot.

In [2]:
# df.to_csv('input/sp500_04.05.2025.csv', sep=";", index=False)
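
The file name above hard-codes the date, so it has to be edited by hand each day. One possible way to derive it instead, assuming the `sp500_DD.MM.YYYY.csv` pattern seen in that line; `daily_csv_path` is a hypothetical helper name:

```python
import os
from datetime import date


def daily_csv_path(folder: str, day: date) -> str:
    """Hypothetical helper: build a path such as input/sp500_04.05.2025.csv."""
    return os.path.join(folder, f"sp500_{day.strftime('%d.%m.%Y')}.csv")


path = daily_csv_path("input", date(2025, 5, 4))

# With the scraped DataFrame `df` in scope, the save step could then be:
# os.makedirs("input", exist_ok=True)
# df.to_csv(daily_csv_path("input", date.today()), sep=";", index=False)
```

An ISO-style name (`%Y-%m-%d`) would sort chronologically in a file listing, but the sketch keeps the notebook's existing pattern.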

- The next cell reads all CSV files from the `input` folder, merges them into a single `pandas` DataFrame, and optionally saves the result to a new CSV file in the `output` folder.

In [3]:
# Path to your folder with CSV files
folder_path = 'input'

# Get all CSV file paths (sorted, because os.listdir returns names in arbitrary order)
csv_files = sorted(os.path.join(folder_path, f) for f in os.listdir(folder_path) if f.endswith('.csv'))

# Read and concatenate all CSV files
df = pd.concat([pd.read_csv(file, sep=';') for file in csv_files], ignore_index=True)

# Optional: save the result to a new file (create the output folder if it is missing)
os.makedirs('output', exist_ok=True)
df.to_csv('output/s&p500_merged.csv', sep=";", index=False)
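
If the same day is saved twice, the merged frame may contain repeated rows. A small sketch of one way to remove them, assuming `Date` plus `Symbol` identifies a record; the two frames and their prices below are made-up stand-ins for daily files, not real data:

```python
import pandas as pd

# Two hypothetical daily snapshots; the second repeats one row from the first.
day1 = pd.DataFrame(
    {"Symbol": ["AAPL", "MSFT"], "Date": ["2025-05-04", "2025-05-04"], "Price": [1.0, 2.0]}
)
day2 = pd.DataFrame(
    {"Symbol": ["AAPL", "AAPL"], "Date": ["2025-05-04", "2025-05-05"], "Price": [1.0, 3.0]}
)

merged = pd.concat([day1, day2], ignore_index=True)

# Keep one row per (Date, Symbol) pair, then order the records chronologically.
deduped = (
    merged.drop_duplicates(subset=["Date", "Symbol"], keep="last")
    .sort_values(["Date", "Symbol"])
    .reset_index(drop=True)
)
```

`keep="last"` prefers the most recently read copy of a duplicated row; `keep="first"` would work equally well when the copies are identical.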