# Notebook for Company Listings 

This notebook identifies the company name and GICS sector of every S&P 500 company for which we have data, using the ticker symbols as keys. The Global Industry Classification Standard (GICS) assigns each company to a sector (and, at finer levels, to an industry group, industry and sub-industry), providing a standardized framework for analyzing market trends and company performance. Associating each ticker with its company name and GICS sector gives a clearer picture of what the dataset contains, for example how its companies are spread across sectors.
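
As a minimal sketch of what the ticker-to-sector association makes possible, the snippet below counts companies per GICS sector. The five rows are copied from the table displayed later in this notebook, the column names `symbol`, `security` and `gics_sector` match the CSV written below, and `sample` and `sector_counts` are hypothetical names used only for illustration.

```python
import pandas as pd

# A few rows taken from the notebook's displayed output (illustration only)
sample = pd.DataFrame(
    {
        "symbol": ["A", "AA", "AAP", "ABC", "XOM"],
        "security": [
            "Agilent Technologies",
            "Alcoa Corporation",
            "Advance Auto Parts",
            "AmerisourceBergen",
            "ExxonMobil",
        ],
        "gics_sector": [
            "Health Care",
            "Materials",
            "Consumer Discretionary",
            "Health Care",
            "Energy",
        ],
    }
)

# Number of companies in each sector, largest group first
sector_counts = sample["gics_sector"].value_counts()
print(sector_counts)
```

On the full table, the same call gives the sector breakdown of the whole dataset.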

Let's start by importing the necessary libraries. The key libraries are:
- `BeautifulSoup`: For parsing and extracting information from HTML content.
- `pandas`: For efficient data manipulation and analysis.
- `requests`: To fetch data from web pages or APIs.
- `os`: To handle file and directory operations.
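
To show how `BeautifulSoup` and `pandas` fit together, here is a sketch that parses a small, made-up HTML table and builds a DataFrame from its rows. The fragment is a hypothetical stand-in for the Wikipedia page, assuming the same `wikitable` class and column headers. It uses only `BeautifulSoup` with Python's built-in `html.parser`, so it does not need `lxml`.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical HTML fragment mimicking the structure of a Wikipedia "wikitable"
html = """
<table class="wikitable">
  <tr><th>Symbol</th><th>Security</th><th>GICS Sector</th></tr>
  <tr><td>XEL</td><td>Xcel Energy</td><td>Utilities</td></tr>
  <tr><td>XOM</td><td>ExxonMobil</td><td>Energy</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", {"class": "wikitable"})

# The first row holds the headers; the remaining rows hold the data
rows = table.find_all("tr")
headers = [th.get_text(strip=True) for th in rows[0].find_all("th")]
records = [
    [td.get_text(strip=True) for td in row.find_all("td")]
    for row in rows[1:]
]

parsed = pd.DataFrame(records, columns=headers)
print(parsed)
```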

In [None]:
# Import necessary libraries
import os

import pandas as pd
import requests
from bs4 import BeautifulSoup

The following code downloads the list of S&P 500 companies from Wikipedia and matches it against the `S&P500` directory, in which each sub-folder is named after a ticker symbol. For every ticker it looks up the company name (`Security`) and GICS sector (`GICS Sector`) in the Wikipedia table. The results are compiled into a new DataFrame with the columns `symbol`, `security` and `gics_sector` and saved as a CSV file, giving a clean mapping of tickers to names and sectors for further analysis. Tickers that do not appear in the Wikipedia table (which lists current constituents only) are kept, with empty name and sector fields.
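
The matching step amounts to a left join from the folder tickers onto the Wikipedia table. A sketch with toy data follows. The ticker `ZZZZ` is a made-up folder name with no Wikipedia entry, and `indicator=True` makes pandas add a `_merge` column that flags such unmatched tickers.

```python
import pandas as pd

# Toy stand-ins: tickers found as folder names, and a slice of the Wikipedia table
folder_symbols = ["A", "XOM", "ZZZZ"]  # "ZZZZ" is a hypothetical unmatched ticker
wikipedia = pd.DataFrame(
    {
        "Symbol": ["A", "XOM", "YUM"],
        "Security": ["Agilent Technologies", "ExxonMobil", "Yum! Brands"],
        "GICS Sector": ["Health Care", "Energy", "Consumer Discretionary"],
    }
)

# Rename to the notebook's output column names, then left-join on the ticker
lookup = wikipedia.rename(
    columns={"Symbol": "symbol", "Security": "security", "GICS Sector": "gics_sector"}
)
matched = pd.DataFrame({"symbol": folder_symbols}).merge(
    lookup, on="symbol", how="left", indicator=True
)

# Rows present only in the folder list ("left_only") had no Wikipedia match
unmatched = matched.loc[matched["_merge"] == "left_only", "symbol"].tolist()
print(matched.drop(columns="_merge"))
print("Unmatched tickers:", unmatched)
```

A left join keeps every folder ticker in its original order and drops Wikipedia rows (here `YUM`) that have no corresponding folder.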

In [None]:
from io import StringIO  # pandas >= 2.1 deprecates passing literal HTML strings to read_html

# Fetch the Wikipedia page and parse the first table with class "wikitable"
url = "https://en.wikipedia.org/wiki/List_of_S%26P_500_companies"
# Wikipedia may reject requests without a descriptive User-Agent; this value is a placeholder to replace
headers = {"User-Agent": "sp500-notebook/0.1 (contact: your-email@example.com)"}
response = requests.get(url, headers=headers, timeout=30)
response.raise_for_status()  # Stop on HTTP errors (e.g. 403) instead of parsing an error page
soup = BeautifulSoup(response.text, 'html.parser')
table = soup.find('table', {'class': 'wikitable'})
if table is None:
    raise ValueError("No table with class 'wikitable' found on the page.")

# Convert the table to a DataFrame
df_wikipedia = pd.read_html(StringIO(str(table)))[0]
df_wikipedia.columns = [str(col).strip() for col in df_wikipedia.columns]  # Clean column names

# Path to the directory containing one sub-folder per ticker
sp500_directory = "S&P500"
if not os.path.isdir(sp500_directory):
    raise FileNotFoundError(f"The folder '{sp500_directory}' does not exist.")

# Ticker symbols = sub-folder names, sorted alphabetically.
# Hidden entries (e.g. .DS_Store) and plain files are skipped, so no row has to be dropped afterwards.
folder_symbols = sorted(
    name for name in os.listdir(sp500_directory)
    if not name.startswith('.') and os.path.isdir(os.path.join(sp500_directory, name))
)

# Keep only the needed Wikipedia columns and rename them to the output names
lookup = df_wikipedia[['Symbol', 'Security', 'GICS Sector']].rename(
    columns={'Symbol': 'symbol', 'Security': 'security', 'GICS Sector': 'gics_sector'}
)

# Left join: every folder ticker is kept; tickers missing from Wikipedia get NaN
custom_table = pd.DataFrame({'symbol': folder_symbols}).merge(lookup, on='symbol', how='left')

# Save the new table to a CSV file
custom_csv_filename = "List_S&P500_Compagnies.csv"
custom_table.to_csv(custom_csv_filename, index=False)
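
Passing `index=False` to `to_csv` matters when the file is read back: with the default `index=True`, pandas also writes the row index as a nameless first column, which `read_csv` then loads as `Unnamed: 0`. The sketch below demonstrates both cases with in-memory buffers instead of the real file; `table`, `cols_without` and `cols_with` are illustrative names.

```python
import io

import pandas as pd

table = pd.DataFrame(
    {
        "symbol": ["A", "AA"],
        "security": ["Agilent Technologies", "Alcoa Corporation"],
        "gics_sector": ["Health Care", "Materials"],
    }
)

# Written without the index: read_csv returns exactly the three original columns
without_index = io.StringIO()
table.to_csv(without_index, index=False)
without_index.seek(0)
cols_without = pd.read_csv(without_index).columns.tolist()
print(cols_without)  # ['symbol', 'security', 'gics_sector']

# Written with the default index: an extra "Unnamed: 0" column appears on reload
with_index = io.StringIO()
table.to_csv(with_index)
with_index.seek(0)
cols_with = pd.read_csv(with_index).columns.tolist()
print(cols_with)  # ['Unnamed: 0', 'symbol', 'security', 'gics_sector']
```

If a file was already written with its index, `pd.read_csv(..., index_col=0)` reads that first column back as the index instead.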

In [83]:
# Load the CSV file
compagnies_table = 'List_S&P500_Compagnies.csv'
df = pd.read_csv(compagnies_table)

# Display the DataFrame
display(df)

| | symbol | security | gics_sector |
|---|---|---|---|
| 0 | A | Agilent Technologies | Health Care |
| 1 | AA | Alcoa Corporation | Materials |
| 2 | AAP | Advance Auto Parts | Consumer Discretionary |
| 3 | ABC | AmerisourceBergen | Health Care |
| 4 | ABD | Aberdeen Asset Management | Financials |
| ... | ... | ... | ... |
| 396 | XEL | Xcel Energy | Utilities |
| 397 | XOM | ExxonMobil | Energy |
| 398 | XRX | Xerox | Information Technology |
| 399 | YUM | Yum! Brands | Consumer Discretionary |


The resulting DataFrame maps each ticker in the `S&P500` directory to its company name and GICS sector, giving a structured view of the dataset's composition for further analysis.
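
Once loaded, the table can serve as a plain ticker lookup. A minimal sketch, assuming the `symbol`, `security` and `gics_sector` columns shown above; `companies`, `sector_by_symbol`, `by_symbol` and `energy` are hypothetical names, and the two rows are copied from the displayed output.

```python
import pandas as pd

companies = pd.DataFrame(
    {
        "symbol": ["XEL", "XOM"],
        "security": ["Xcel Energy", "ExxonMobil"],
        "gics_sector": ["Utilities", "Energy"],
    }
)

# Dictionary mapping each ticker to its GICS sector
sector_by_symbol = dict(zip(companies["symbol"], companies["gics_sector"]))

# Indexing by ticker gives access to every field of one company
by_symbol = companies.set_index("symbol")

# Tickers can also be selected by sector
energy = companies.loc[companies["gics_sector"] == "Energy", "symbol"].tolist()

print(sector_by_symbol["XOM"])           # Energy
print(by_symbol.loc["XEL", "security"])  # Xcel Energy
print(energy)                            # ['XOM']
```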
