FAQ:

Q: I want to scrape FBS, what do I need to do?

A: Before you start scraping FBS, do the following:
1. Replace the chrome_path below to the location of the chrome driver in your computer.
2. If you want to automatically login, save your credentials in a file called userpw.txt in the following format:
username:
password:
3. Open up Scrape_Facility_List and generate the list of facility details first. Make sure that the files generated are in the same directory as this file.

Q: So what exactly does this file do?
A: It logins to FBS using your credentials, and references off a list of facilities for the school(s) of your choice.
It will generate a list of bookings for the past 2 weeks in a csv file for each school you have specified. 
At the moment, the facility type is set to be GSRs.


In [None]:
# Insert your chrome driver path here.
chrome_path = r""

In [2]:
# Imports selenium
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains
import pandas as pd

school_list = ["Lee Kong Chian School of Business","School of Accountancy","School of Economics/School of Social Sciences",
               "School of Information Systems","School of Law/Kwa Geok Choo Law Library"]

print("The schools you can scrape are as follows:")
for i in range(len(school_list)):
    print(f"[{i}] - {school_list[i]}")

print(" ")
schools_to_scrape = input("What school(s) do you want to scrape? - Enter the number(s) beside the school of your choice, in a string, e.g. 1234 ")

schools_to_scrape_check = True
for school in schools_to_scrape:
    if school not in "01234":
        schools_to_scrape_check = False
    
while schools_to_scrape_check == False:
    print("The number you have entered is not valid, please try again.")
    schools_to_scrape = input("What school(s) do you want to scrape? - Enter the number(s) beside the school of your choice, in a strring, e.g. 1234 ")
    schools_to_scrape_check = True
    for school in schools_to_scrape:
        if school not in "01234":
            schools_to_scrape_check = False

print(f"The school(s) you have selected to scrape are:")
schools_to_scrape_list = []

for i in range(len(schools_to_scrape)):
    schools_to_scrape_list.append(school_list[int(schools_to_scrape[i])])
    print(school_list[int(schools_to_scrape[i])])


extract_details = input("Did you store your credentials into userpw.txt in the same directory? Y|N " )
print(" ")

The schools you can scrape are as follows:
[0] - Lee Kong Chian School of Business
[1] - School of Accountancy
[2] - School of Economics/School of Social Sciences
[3] - School of Information Systems
[4] - School of Law/Kwa Geok Choo Law Library
 
What school(s) do you want to scrape? - Enter the number(s) beside the school of your choice, in a string, e.g. 1234 01234
The school(s) you have selected to scrape are:
Lee Kong Chian School of Business
School of Accountancy
School of Economics/School of Social Sciences
School of Information Systems
School of Law/Kwa Geok Choo Law Library
Did you store your credentials into userpw.txt in the same directory? Y|N Y
 


In [3]:
time.sleep(2)

# Opens the chrome webdriver path
driver = webdriver.Chrome(chrome_path)

# Starts the driver for the stated url
driver.get("https://fbs.intranet.smu.edu.sg/home")

#Obtains user/pw from a text file

if extract_details.upper() == "Y":
    
    # Obtains user/pw from a text file
    
    user = ""
    password = ""
    count = 0
    with open('userpw.txt','r',errors = 'ignore') as f:
        for line in f:
            line = line.rstrip("\n")
            line = line.split(":")
            if count == 0:
                user = line[1]
            else:
                password = line[1]
            count += 1
            
else:
    input("Enter your username and password into the Chromedriver. Press Enter to continue...")

time.sleep(2)
#Enters Username and Password
driver.find_element_by_css_selector("input#userNameInput.text.fullWidth").click()
driver.find_element_by_css_selector("input#userNameInput.text.fullWidth").send_keys(user)

driver.find_element_by_css_selector("input#passwordInput.text.fullWidth").click()
driver.find_element_by_css_selector("input#passwordInput.text.fullWidth").send_keys(password)

driver.find_element_by_css_selector("span#submitButton.submit").click()

The schools you can scrape are as follows:
[0] - Lee Kong Chian School of Business
[1] - School of Accountancy
[2] - School of Economics/School of Social Sciences
[3] - School of Information Systems
[4] - School of Law/Kwa Geok Choo Law Library
What school(s) do you want to scrape? - Enter the number(s) beside the school of your choice, in a string, e.g. 1234 01234
The schools you have selected to scrape is ['Lee Kong Chian School of Business', 'School of Accountancy', 'School of Economics/School of Social Sciences', 'School of Information Systems', 'School of Law/Kwa Geok Choo Law Library']
Did you store your credentials into userpw.txt in the same directory? Y|N y
 


In [14]:
#Creates a list of file_names storing the data.
file_name_list = []

for school in schools_to_scrape_list:
    #Scrapes for different schools
    print("Currently scraping: " + school)

    
    buildings = ["Administration Building","Campus Open Spaces - Events/Activities",
                                 "Concourse - Room/Lab", "Lee Kong Chian School of Business",
                                 "Li Ka Shing Library", "Prinsep Street Residences", "School of Accountancy",
                                 "School of Economics/School of Social Sciences","School of Information Systems",
                                 "School of Law/Kwa Geok Choo Law Library", "SMU Connexion"]

    #To tailor to scraping different schools
    buildings_index = buildings.index(school)
    
    print("Selecting building...")
    
    time.sleep(2)
    #Clicks building name
    driver.switch_to_frame(driver.find_element_by_tag_name('iframe'))
    driver.switch_to_frame(driver.find_element_by_tag_name('iframe'))

    driver.find_element_by_id('DropMultiBuildingList_c1_panelInputs').click() 

    time.sleep(2)

    #Building name will be a list of elements, in this case 8 is information systems
    driver.find_element_by_id(buildings_index).click()

    time.sleep(2)
    #Clicks ok button after selecting building
    driver.find_element_by_xpath("//*[@id='DropMultiBuildingList_c1_panelTreeView']/input[1]").click()

    #Selects facility types
    print("Selecting facility types...")
    
    time.sleep(2)
    driver.find_element_by_id('DropMultiFacilityTypeList_c1').click()

    # Clicks on GSR
    if school == "School of Law/Kwa Geok Choo Law Library":
        driver.find_element_by_xpath("""/html/body/div[2]/form/span[1]/span/span/div/div/div/div/div/span/span/span/div/div/div[1]/div[8]/div/div/span/div/table/tbody/tr/td/table/tbody/tr/td/div/div/div/label[3]/span""").click()
    else:
        driver.find_element_by_xpath("""/html/body/div[2]/form/span[1]/span/span/div/div/div/div/div/span/span/span/div/div/div[1]/div[8]/div/div/span/div/table/tbody/tr/td/table/tbody/tr/td/div/div/div/label[2]/span""").click()


    time.sleep(2)
    # Clicks out of the box
    driver.find_element_by_xpath("//*[@id='CheckAvailability']/span").click()

    time.sleep(2)
    # Clicks on check availability
    driver.find_element_by_xpath("//*[@id='CheckAvailability']/span").click()

    # Gets list of facilities from csv file
    print("Getting a list of facilities from the csv file...")

    school_list = ["Lee Kong Chian School of Business","School of Accountancy","School of Economics/School of Social Sciences",
                   "School of Information Systems","School of Law/Kwa Geok Choo Law Library"]

    school_abbrev_list = ["LKCSB","SOA","SOE","SIS","SOL"]

    abbrev = school_abbrev_list[school_list.index(school)]

    facility_file = abbrev + "_GSR.csv"
    facility_list = pd.read_csv(facility_file, delimiter = ',', header = 0, usecols = [1]).values.tolist()

    # Removes nested list from file
    for i in range(len(facility_list)):
        facility_list[i] = facility_list[i][0]

    # #number of days to scrape
    # days = 14

    # #Overall list which stores [Day,Facility,Booking Time, Purpose(Student/Faculty)]
    overall_list = []

    day_list = []
    day = ""
    prev_day = ""

    for i in range(1):
        print("Loading...")
        time.sleep(10)
        #Finds current day
        day = WebDriverWait(driver, 10).until(
                EC.presence_of_element_located((By.CSS_SELECTOR, "div.scheduler_bluewhite_timeheader_float_inner")))
        if prev_day == "":
            prev_day = day
        else:
            while day == prev_day:
                day = WebDriverWait(driver, 10).until(
                EC.presence_of_element_located((By.CSS_SELECTOR, "div.scheduler_bluewhite_timeheader_float_inner")))
        prev_day = day

        day = day.text
        day_list.append(day)
        print(f"Currently scraping for {day}...")

        bookings_list = driver.find_elements_by_css_selector("div.scheduler_bluewhite_event.scheduler_bluewhite_event_line0")

        #Counter for facility list
        count = 0
        temp_list = []

        for booking in bookings_list:
            booking = booking.get_attribute("title")
            if "23:59" in booking:
                count += 1
            elif "not available" not in booking:
                booking = booking.split("\n")
                for i in range(len(booking)):
                    if temp_list == []:
                        temp_list = [day,facility_list[count]]                
                    if i != 4:
                        temp_list.append(booking[i][booking[i].find(":")+2:])
                    elif booking[i] != "Booked for User Org Unit: ":
                        temp_list.append(booking[i][booking[i].find(":")+2:])
                    else:
                        temp_list.append("Student")
                overall_list.append(temp_list)
                temp_list = []
        print("Last 3 inputs:")
        for i in range(-1,-3,-1):
            print(overall_list[i])
        print(f"Scraping for {day} completed.")
        print("Going to next day...")

        prev_day = WebDriverWait(driver, 10).until(
                EC.presence_of_element_located((By.CSS_SELECTOR, "a#BtnPrevious.btngray.btntop.btnleft"))).click()

    print(f"Scraping done from {day_list[0]} to {day_list[-1]}")


    df = pd.DataFrame(overall_list)
    df.columns = ["Day","Facility","Booking Time","Booking Status",
                     "Booking Reference Number","Booked for User Name","Booked for User Org Unit",
                    "Booked for User Email Address","Use Type","Purpose of Booking"]
    print(df.head)

    file_name = abbrev + "_bookings.csv"
    df.to_csv(file_name)
    
    file_name_list.append(file_name)

    print(f"Data has been saved to {file_name}")
    
    print(" ")
    
    print("Schools scraped so far: ")
    
    current_index = schools_to_scrape_list.index(school)
    for i in range(current_index+1):
        print(schools_to_scrape_list[i])
        
    print(" ")
    
    if current_index < len(schools_to_scrape_list) - 1:
        print("Schools left to scrape: ")
        for i in range(current_index+1,len(schools_to_scrape_list)):
            print(schools_to_scrape_list[i])
    else:
        print("All done! You can find your data in the following files: ")
        for filename in file_name_list:
            print(filename)
    
    if current_index != len(schools_to_scrape_list) -1:
        # Refreshes the page
        driver.refresh()

        print("Refreshing the page...")
        print(" ")
    
    
        
    

Currently scraping: Lee Kong Chian School of Business
Selecting building...




Selecting facility types...
Getting a list of facilities from the csv file...
Loading...
Currently scraping for 11 February 2020, Tuesday...
Last 3 inputs:
['11 February 2020, Tuesday', 'LKCSB GSR 3-9', '18:00-20:00', 'Confirmed', 'BK-20200128-000070', 'CHONG KIAN EN, JOHANNES', 'Student', 'kechong.2018@business.smu.edu.sg', 'AdHoc', 'Interview']
['11 February 2020, Tuesday', 'LKCSB GSR 3-9', '14:00-18:00', 'Confirmed', 'BK-20200128-000069', 'RUQOYAH BINTE MAZLAN', 'Student', 'ruqoyahm.2018@business.smu.edu.sg', 'AdHoc', 'BCAMP OC Interview']
Scraping for 11 February 2020, Tuesday completed.
Going to next day...
Scraping done from 11 February 2020, Tuesday to 11 February 2020, Tuesday
<bound method NDFrame.head of                            Day       Facility Booking Time Booking Status  \
0    11 February 2020, Tuesday  LKCSB GSR 1-1  12:00-15:30      Confirmed   
1    11 February 2020, Tuesday  LKCSB GSR 1-1  15:30-19:00      Confirmed   
2    11 February 2020, Tuesday  LKCSB GSR 1-2

Refreshing the page...
 
Currently scraping: School of Information Systems
Selecting building...
Selecting facility types...
Getting a list of facilities from the csv file...
Loading...
Currently scraping for 11 February 2020, Tuesday...
Last 3 inputs:
['11 February 2020, Tuesday', 'SIS GSR 3-6', '19:00-22:30', 'Confirmed', 'BK-20200204-001130', 'MATTHEW CHIANG', 'Student', 'matthewc.2018@sis.smu.edu.sg', 'Academic', 'Testing for Class b/c Academic Accomodations']
['11 February 2020, Tuesday', 'SIS GSR 3-6', '15:00-19:00', 'Confirmed', 'BK-20200203-001159', 'DEAN KHO WAI KIT', 'Student', 'dean.kho.2018@business.smu.edu.sg', 'AdHoc', 'study']
Scraping for 11 February 2020, Tuesday completed.
Going to next day...
Scraping done from 11 February 2020, Tuesday to 11 February 2020, Tuesday
<bound method NDFrame.head of                           Day     Facility Booking Time Booking Status  \
0   11 February 2020, Tuesday  SIS GSR 2-1  19:00-21:00      Confirmed   
1   11 February 2020, Tuesd

Refreshing the page...
 
Currently scraping: School of Law/Kwa Geok Choo Law Library
Selecting building...
Selecting facility types...
Getting a list of facilities from the csv file...
Loading...
Currently scraping for 11 February 2020, Tuesday...
Last 3 inputs:
['11 February 2020, Tuesday', 'SOL-B1.12-GS', '19:00-20:30', 'Confirmed', 'BK-20200211-000986', 'ANG PING CHYI', 'Student', 'pcang.2018@accountancy.smu.edu.sg', 'AdHoc', 'study']
['11 February 2020, Tuesday', 'SOL-B1.12-GS', '15:00-19:00', 'Confirmed', 'BK-20200129-000467', 'DARRYL HOR JUN HENG', 'Student', 'darryl.hor.2018@law.smu.edu.sg', 'AdHoc', 'study']
Scraping for 11 February 2020, Tuesday completed.
Going to next day...
Scraping done from 11 February 2020, Tuesday to 11 February 2020, Tuesday
<bound method NDFrame.head of                           Day      Facility Booking Time Booking Status  \
0   11 February 2020, Tuesday   SOL-2.07-GS  16:00-20:00      Confirmed   
1   11 February 2020, Tuesday   SOL-2.07-GS  20:00-