## Task Description

Navigate to the list of faculty in the Anesthesiology Department. Be aware that each website is structured differently. For instance, the Upstate and Westchester links already point to the Anesthesiology program, but you will still have to navigate from there to the faculty page. The New Mexico directory has to be filtered first so that it shows only the anesthesiology faculty.
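One way to find the faculty page from a program page is to list every link whose text or target mentions "faculty" and resolve it to an absolute URL. The sketch below assumes that keyword is present in the link, and it uses the standard-library `html.parser` rather than BeautifulSoup only to stay dependency-free. The sample markup is hypothetical and is not taken from the Upstate site. Pages that build their menus with JavaScript will not expose these links in the raw HTML.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []    # finished (href, text) pairs
        self._href = None  # href of the <a> currently open, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse whitespace in the visible link text
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None


def find_faculty_links(html, base_url, keyword="faculty"):
    """Return absolute URLs of links whose text or href mentions `keyword`."""
    parser = LinkCollector()
    parser.feed(html)
    keyword = keyword.lower()
    return [
        urljoin(base_url, href)
        for href, text in parser.links
        if keyword in text.lower() or keyword in href.lower()
    ]


# Hypothetical markup, for illustration only
sample = '<nav><a href="/anesthesiology/faculty/index.php">Our Faculty</a> <a href="/news">News</a></nav>'
print(find_faculty_links(sample, "https://www.upstate.edu/anesthesiology/about-us/index.php"))
```

In the notebook, `html` would be `response.text` from a `requests.get` call; the candidate links still need to be checked by hand.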


Once you find the pages that list the faculty, create an Excel file in which each institution has its own sheet. Each sheet should have four column headers: First Name, Last Name, Email, Error. Put each clinician's first name and last name in the corresponding columns, along with their email (if it can be scraped). Leave the Error column blank for now. The First Name and Last Name columns must contain nothing but a single name each: no MD, DO, or other titles.
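Faculty pages usually list names as one string with titles attached, so the first and last name have to be split out. The sketch below assumes a "First [Middle] Last, Credentials" layout. The `HONORIFICS` and `CREDENTIALS` sets are hypothetical and non-exhaustive, and a directory that lists "Last, First" would need a different rule.

```python
# Hypothetical, non-exhaustive lists: extend them with whatever each site actually shows.
HONORIFICS = {"Dr", "Prof", "Professor", "Mr", "Mrs", "Ms"}
CREDENTIALS = {"MD", "DO", "PhD", "MPH", "MBA", "MS", "MBBS", "FASA", "CRNA", "Jr", "Sr", "II", "III"}


def split_name(raw):
    """Turn a listed name such as 'Dr. Jane A. Doe, MD, PhD' into ('Jane', 'Doe').

    Assumes a 'First [Middle] Last, Credentials' layout.
    """
    # Everything after the first comma is treated as credentials and dropped
    name_part = raw.split(",", 1)[0]
    words = [w.strip(".") for w in name_part.split()]
    # Drop leading honorifics such as 'Dr.'
    while words and words[0] in HONORIFICS:
        words.pop(0)
    # Drop trailing credentials written without a comma. Matching is case-sensitive,
    # so the surname 'Do' is kept while the degree 'DO' is removed.
    while len(words) > 1 and words[-1] in CREDENTIALS:
        words.pop()
    if not words:
        return "", ""
    if len(words) == 1:
        return words[0], ""
    # Middle names and initials are dropped: keep only the first and last word
    return words[0], words[-1]


for raw in ["Dr. Jane A. Doe, MD, PhD", "John Smith MD", "Anh Do, DO"]:
    print(split_name(raw))
```

Names that do not fit the assumed layout (compound surnames, for example) are a natural candidate for the Error column later on.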

In [1]:
import sys
sys.executable

'/Library/Frameworks/Python.framework/Versions/3.10/bin/python3'

In [None]:
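# Install dependencies into the Python environment of the running kernel.
# %pip (unlike !pip) installs into the kernel's own interpreter, so the packages
# are importable from this notebook; xlsxwriter is needed for the Excel export.

%pip install --upgrade pip
%pip install beautifulsoup4 requests pandas xlsxwriter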

Found existing installation: ipykernel 6.29.4
Uninstalling ipykernel-6.29.4:
  Would remove:
    /opt/homebrew/lib/python3.11/site-packages/ipykernel-6.29.4.dist-info/*
    /opt/homebrew/lib/python3.11/site-packages/ipykernel/*
    /opt/homebrew/lib/python3.11/site-packages/ipykernel_launcher.py
    /opt/homebrew/share/jupyter/kernels/python3/kernel.json
    /opt/homebrew/share/jupyter/kernels/python3/logo-32x32.png
    /opt/homebrew/share/jupyter/kernels/python3/logo-64x64.png
    /opt/homebrew/share/jupyter/kernels/python3/logo-svg.svg
Proceed (Y/n)? 
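The first cell reports `/Library/Frameworks/Python.framework/Versions/3.10/bin/python3`, while the pip output above refers to Homebrew's Python 3.11 site-packages. That suggests the shell's `pip` and the notebook kernel are different interpreters, which would also explain the `ModuleNotFoundError` at the end of this notebook. A quick check, sketched below, is to ask the kernel itself which required modules it can import. The `REQUIRED` mapping is an assumption based on the imports used in this notebook; note that the pip name `beautifulsoup4` is imported as `bs4`.

```python
import importlib.util
import sys

# pip distribution name -> module name actually imported in this notebook
REQUIRED = {"beautifulsoup4": "bs4", "requests": "requests", "pandas": "pandas", "xlsxwriter": "xlsxwriter"}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages the current interpreter cannot import."""
    return [pip_name for pip_name, module in required.items()
            if importlib.util.find_spec(module) is None]


print("Kernel interpreter:", sys.executable)
print("Missing:", missing_packages() or "none")
```

If anything is listed as missing, installing it with `%pip install` from inside the notebook targets this same interpreter.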

In [None]:
# Import necessary libraries
from bs4 import BeautifulSoup
import requests
import pandas as pd

# Define URLs for each institution
urls = {
    "UNM": "https://hsc.unm.edu/directory/",
    "Upstate": "https://www.upstate.edu/anesthesiology/about-us/index.php",
    "Westchester": "https://www.westchestermedicalcenter.org/anesthesiology-residency-program"
}

# The four columns every sheet must have
COLUMNS = ["First Name", "Last Name", "Email", "Error"]

# Create an empty dictionary to store dataframes for each institution
dfs = {}

# Function to extract faculty information from a given URL
def extract_faculty_data(url, institution_name):
    # Time out instead of hanging, and raise on 4xx/5xx instead of parsing an error page
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.content, "html.parser")

    # Each branch below should append one dict per clinician, keyed by COLUMNS
    records = []

    if "hsc.unm.edu" in url:
        # Logic for University of New Mexico
        # TODO: the directory must first be filtered to the Anesthesiology department
        pass

    elif "upstate.edu" in url:
        # Logic for Upstate Medical University
        # TODO: navigate from this Anesthesiology page to the faculty page
        pass

    elif "westchestermedicalcenter.org" in url:
        # Logic for Westchester Medical Center
        faculty_container = soup.find("div", id="cpsys_DynamicTab_20cd7230-3562-4f30-a2f9-54b2b3ddb29f_4")
        if faculty_container is None:
            print(f"{institution_name}: faculty container not found; the page layout may have changed")
        # TODO: parse the faculty entries inside faculty_container

    else:
        print(f"Unsupported website: {url}")

    # Inspect the HTML structure of each website and adapt the parsing accordingly.
    # Example (tag and class names are placeholders, not taken from any of these sites):
    # for element in soup.find_all("div", class_="faculty-member"):
    #     records.append({
    #         "First Name": element.find("span", class_="first-name").get_text(strip=True),
    #         "Last Name": element.find("span", class_="last-name").get_text(strip=True),
    #         # An email link's href is "mailto:..."; keep only the address
    #         "Email": element.find("a", class_="email")["href"].removeprefix("mailto:"),
    #         "Error": "",  # left blank for now
    #     })

    # Always return a DataFrame with the four required columns, even if nothing was
    # scraped, so that every institution still gets its own sheet
    return pd.DataFrame(records, columns=COLUMNS)

# Iterate through URLs and extract faculty data for each institution
for institution_name, url in urls.items():
    try:
        dfs[institution_name] = extract_faculty_data(url, institution_name)
    except requests.RequestException as exc:
        # Keep an empty sheet for a site that could not be fetched and report why
        print(f"{institution_name}: request failed ({exc})")
        dfs[institution_name] = pd.DataFrame(columns=COLUMNS)

# Write each DataFrame to a separate sheet in the Excel file. The with-block saves
# and closes the file on exit (ExcelWriter.save() was removed in pandas 2.0).
with pd.ExcelWriter("anesthesiology_faculty.xlsx", engine="xlsxwriter") as writer:
    for institution_name, df in dfs.items():
        df.to_excel(writer, sheet_name=institution_name, index=False)

ModuleNotFoundError: No module named 'pandas'
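The commented example in the scraping cell reads a link's `href`, and for an email link that value is typically a `mailto:` URI rather than a bare address; it may also carry a `?subject=` query or percent-encoded characters. A small normalizer such as the hypothetical `email_from_href` below, built only on the standard library, returns the plain address (or an empty string when the link is not a mail link), which suits the Email column.

```python
from urllib.parse import unquote


def email_from_href(href):
    """Extract a bare address from an <a href="mailto:..."> value, or return '' if none."""
    if not href or not href.lower().startswith("mailto:"):
        return ""
    # Drop the scheme and any ?subject=... query, then decode %40-style escapes
    address = unquote(href[len("mailto:"):].split("?", 1)[0]).strip()
    # A mailto URI can list several recipients; keep only the first
    return address.split(",")[0].strip()


print(email_from_href("mailto:jdoe%40example.edu?subject=Hello"))
```

An empty result can be left as-is for now, since the Error column is meant to stay blank at this stage.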