## Task Description

Navigate to the list of faculty in the Anesthesiology Department. Be aware that each website is very different. For instance, the links for Upstate and Westchester already point to the Anesthesiology program, but you will have to navigate to the faculty page; the New Mexico website has to be filtered first in order to see just the anesthesiology faculty.


Once you find the appropriate pages with the faculty listed, I would like you to create an Excel file in which each institution has its own sheet. Each sheet should have four headers: First Name, Last Name, Email, Error. The first name and last name of each clinician should be in the appropriate cells, along with their email (if scrapable). Error should remain blank for now. There should not be any other text in the First Name or Last Name columns (no MD, DO, or any titles, just a single name in each).
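The name-cleaning rule above (one name per column, no MD, DO, or titles) can be sketched as a small helper. This is a minimal sketch, not a definitive implementation: the function name `split_name` and the `TITLES` set are hypothetical, and the rule of dropping single-letter initials is an assumption about how "a single name in each" should be read.

```python
import re

# Assumed (hypothetical) set of titles and credentials to discard; extend as needed.
TITLES = {"dr", "md", "do", "phd", "fasa", "mba", "mph", "ms", "mbbs"}

def split_name(raw):
    """Split a display name into (first_name, last_name), dropping titles and initials."""
    # Break on whitespace and commas, then strip leading/trailing periods from each token.
    tokens = [t.strip(".") for t in re.split(r"[,\s]+", raw.strip()) if t.strip(".")]
    # Discard titles/credentials and single-letter initials such as "J." or "A.".
    kept = [t for t in tokens
            if t.replace(".", "").lower() not in TITLES and len(t) > 1]
    if not kept:
        return "", ""
    if len(kept) == 1:
        return kept[0], ""
    # First remaining token is the first name, last remaining token is the last name.
    return kept[0], kept[-1]

print(split_name("Peter J. Panzica, MD"))
print(split_name("A. Elisabeth Abramowicz, MD, FASA"))
```

One caveat of this approach: a genuine surname that matches an entry in `TITLES` (for example "Do") would be dropped, so rows with an empty last name are worth reviewing by hand.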

In [1]:
import sys
sys.executable

'/mnt/hdd/other/miniconda3/bin/python'

In [5]:
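# Install dependencies into the running kernel's environment
# (%pip targets the kernel's interpreter; !pip may install into a different Python)

%pip install --upgrade pip
%pip install beautifulsoup4
%pip install pandas
%pip install xlsxwriter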

Defaulting to user installation because normal site-packages is not writeable
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Defaulting to user installation because normal site-packages is not writeable
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Defaulting to user installation because normal site-packages is not writeable
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Defaulting to user installation because normal site-packages is not writeable
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Collecting xlsxwriter
  Downloading XlsxWriter-3.2.0-py3-none-any.whl.metadata (2.6 kB)
Downloading XlsxWriter-3.2.0-py3-none-any.whl (159 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 159.9/159.9 kB 2.1 MB/s eta 0:00:00
Installing collected packages: xlsxwriter
Successfully installed xlsxwriter-3.2.0
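
A shell `pip` can install into a different Python than the one the kernel runs (the "Defaulting to user installation" lines hint at this), so it may help to confirm that the packages are importable from the running interpreter. The following is a minimal sketch; the helper name `missing_modules` is hypothetical, and `requests` is included because the next cell imports it.

```python
import importlib.util
import sys

def missing_modules(names):
    """Return the module names that the running interpreter cannot import."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# These are import names, not PyPI package names: beautifulsoup4 is imported as "bs4".
required = ["bs4", "requests", "pandas", "xlsxwriter"]
print("Interpreter:", sys.executable)
print("Missing:", missing_modules(required))
```

If the list is not empty, installing with the kernel's own interpreter (for example via `%pip install <package>`) is the usual remedy.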


In [8]:
# Import necessary libraries
from bs4 import BeautifulSoup
import requests
import pandas as pd

# Define URLs for each institution
urls = {
    "UNM": "https://hsc.unm.edu/directory/",
    "Upstate": "https://www.upstate.edu/anesthesiology/about-us/index.php",
    "Westchester": "https://www.westchestermedicalcenter.org/anesthesiology-residency-program"
}

# Create an empty dictionary to store dataframes for each institution
dfs = {}

# Function to extract faculty information from a given URL
def extract_faculty_data(url, institution_name):
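    # Fail fast on network problems or HTTP error codes instead of parsing an error page
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.content, 'html.parser')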

    if "hsc.unm.edu" in url:
        # Logic for University of New Mexico
        pass  # You'll need to add the code for this website

    elif "upstate.edu" in url:
        # Logic for Upstate Medical University
        pass  # You'll need to add the code for this website

    elif "westchestermedicalcenter.org" in url:
        # Logic for Westchester Medical Center
        # Locate the <h2> heading with "Faculty" text
        faculty_heading = soup.find("h2", string="Faculty") 

        # Check if the heading was found
        if faculty_heading:
            # Locate the parent container of the heading (assuming it's the container with faculty details)
            faculty_container = faculty_heading.parent  
            print(faculty_container)
            # ... (Extract faculty information from faculty_container - same as before)
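        else:
            print("Error: 'Faculty' heading not found on the webpage.")
            # Return an empty DataFrame that still carries the required headers
            return pd.DataFrame(columns=["First Name", "Last Name", "Email", "Error"])

    else:
        print(f"Unsupported website: {url}")
        # Return an empty DataFrame (not None) so the Excel export below does not fail
        return pd.DataFrame(columns=["First Name", "Last Name", "Email", "Error"])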

    # Implement website-specific logic to locate faculty listings and extract names and emails
    # ... (You'll need to inspect the HTML structure of each website and adapt the code accordingly) 
    # Example:
    # faculty_elements = soup.find_all("div", class_="faculty-member")
    # for element in faculty_elements:
    #     first_name = element.find("span", class_="first-name").text
    #     last_name = element.find("span", class_="last-name").text
    #     email = element.find("a", class_="email")["href"]

    # Create a pandas DataFrame to store the extracted data
    df = pd.DataFrame(columns=["First Name", "Last Name", "Email", "Error"])
    # ... (Populate the DataFrame with the extracted information)
    
    return df

# Iterate through URLs and extract faculty data for each institution
for institution_name, url in urls.items():
    dfs[institution_name] = extract_faculty_data(url, institution_name)

# Create an Excel writer object
writer = pd.ExcelWriter("anesthesiology_faculty.xlsx", engine='xlsxwriter')

# Write each DataFrame to a separate sheet in the Excel file
for institution_name, df in dfs.items():
    df.to_excel(writer, sheet_name=institution_name, index=False)

# Save the Excel file (ExcelWriter.save() was removed in pandas 2.0; close() writes the file)
writer.close()

<div class="cpsty_DynamicTab_Content" id="cpsys_DynamicTab_20cd7230-3562-4f30-a2f9-54b2b3ddb29f_4"><span><br/>
</span>
<h2><span><strong>Faculty</strong></span></h2>
<p><span><strong><img src="/Uploads/Public/Images/WMCHealth/GME%20Directors/Anesthesia%20Residency/Peter%20Panzica%20wesbite.jpg" style="width: 157px; height: 231px;"/></strong></span></p>
<p><span><strong>Peter J. Panzica, MD</strong><br/>
Associate Professor and Chair of Anesthesiology, NYMC<br/>
Director of Anesthesia Services<br/>
<span>Adult Cardiac and Thoracic Anesthesia</span><br/>
</span></p>
<p> </p>
<h2>WMC
Resident Education</h2>
<p>
</p>
<p><span><strong>A. Elisabeth Abramowicz, MD, FASA</strong><br/>
Professor<br/>
Residency Program Director<br/>
Vice Chair for Education<br/>
Neuroanesthesia </span></p>
<p><span><strong style="font-size: 12.1104px;">Sarah Smith, MD</strong><br style="font-size: 12.1104px;"/>
<span style="font-size: 12.1104px;">Associate Professor<br/>
Associate Program Director<br/>
Cardiac A

ModuleNotFoundError: No module named 'xlsxwriter'
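
The printed Westchester container suggests that each faculty name sits in a `<strong>` tag, while the section heading and the photo are also wrapped in `<strong>`. Under that assumption, the names could be collected as sketched below. The HTML here is a shortened copy of the printed output with a placeholder image path, the helper name `names_from_container` is hypothetical, and the live page may be structured differently.

```python
from bs4 import BeautifulSoup

# Shortened copy of the printed container (image path is a placeholder).
html = """
<div>
<h2><span><strong>Faculty</strong></span></h2>
<p><span><strong><img src="photo.jpg"/></strong></span></p>
<p><span><strong>Peter J. Panzica, MD</strong><br/>
Associate Professor and Chair of Anesthesiology, NYMC<br/></span></p>
<h2>WMC
Resident Education</h2>
<p><span><strong>A. Elisabeth Abramowicz, MD, FASA</strong><br/>
Professor<br/></span></p>
<p><span><strong style="font-size: 12.1104px;">Sarah Smith, MD</strong><br/></span></p>
</div>
"""

def names_from_container(container):
    """Collect the text of <strong> tags that look like faculty names."""
    names = []
    for strong in container.find_all("strong"):
        text = strong.get_text(strip=True)
        # Skip <strong> tags with no text (image wrappers) and those inside headings.
        if text and strong.find_parent("h2") is None:
            names.append(text)
    return names

container = BeautifulSoup(html, "html.parser").find("div")
raw_names = names_from_container(container)
print(raw_names)
```

The strings collected this way still carry credentials such as "MD" and middle initials, so they need cleaning before being written to the First Name and Last Name columns. No email addresses appear in the printed fragment, so the Email column would stay empty for this source unless another page provides them.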