glassdoor-interview-scraper

Glassdoor web crawler and scraper providing interview experience data for Decoding The Interview.

This was an academic project for CS 410 - Text and Information Systems at UIUC and is no longer maintained.

Installation

Python 2.7.*
Beautiful Soup 4 (4.4.1)

$ pip install bs4

Selenium WebDriver

$ pip install selenium

Usage

1. Open scraper_v1.2.py in a text editor of your choice.
2. Add your Glassdoor account username and password.
3. Specify the number of pages to scrape, the company name, and the URL of the company's Interviews page on Glassdoor, with your desired filters already selected.
4. Run the scraper:

$ python scraper_v1.2.py
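
The settings mentioned above are the credentials, number of pages, company name, and Interviews URL. A settings block for them might look like the sketch below. Only `pages` and `companyURL` are names taken from this README (see the changelog). The names `username`, `password`, and `companyName`, the placeholder values, and the `check_settings` helper are all hypothetical, so adapt them to what scraper_v1.2.py actually defines. The helper is an optional sanity check to run before starting a long scrape.

```python
# Hypothetical settings block. Only `pages` and `companyURL` are names
# mentioned in this README's changelog; everything else is a placeholder.
username = "you@example.com"      # Glassdoor login (placeholder)
password = "your-password"        # Glassdoor password (placeholder)
pages = 5                         # number of interview pages to scrape
companyName = "ExampleCorp"       # also used to name the output JSON file
# Full path of the company's Interviews page, copied from the browser after
# applying your filters (placeholder URL, not a real company page).
companyURL = "https://www.glassdoor.com/Interview/ExampleCorp-Interview-Questions-E12345.htm"


def check_settings(pages, company_name, company_url):
    """Return a list of problems with the settings (empty if none).

    Hypothetical helper, not part of scraper_v1.2.py.
    """
    problems = []
    if not isinstance(pages, int) or pages < 1:
        problems.append("pages must be a positive integer")
    if not company_name.strip():
        problems.append("company name must not be empty")
    if not company_url.startswith(("http://", "https://")):
        problems.append("companyURL must be a full URL")
    # Ignore any query string that filters may have added.
    if not company_url.split("?")[0].endswith(".htm"):
        problems.append("companyURL should end in .htm")
    return problems


print(check_settings(pages, companyName, companyURL))  # → []
```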

NOTE: Glassdoor will ask you to complete a CAPTCHA, either at login or during scraping. If a CAPTCHA appears during scraping, the script polls until it has been entered.
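
"Poll until CAPTCHA is entered" is a plain polling loop, sketched below. This README does not show how the script detects a CAPTCHA. In this sketch, `wait_until` and its parameters are hypothetical, and `condition` stands in for whatever check the script uses (for example, "the CAPTCHA element is gone"). The `sleep` and `clock` parameters exist only so the loop can be tested without real waiting.

```python
import time


def wait_until(condition, poll_interval=1.0, timeout=None,
               sleep=time.sleep, clock=time.time):
    """Call condition() every poll_interval seconds until it is truthy.

    Returns True once the condition holds, or False if `timeout` seconds
    pass first. timeout=None waits indefinitely, like polling until a
    CAPTCHA has been entered.
    """
    start = clock()
    while not condition():
        if timeout is not None and clock() - start >= timeout:
            return False
        sleep(poll_interval)
    return True


# Demo with a stand-in condition that becomes true on the third check.
checks = iter([False, False, True])
print(wait_until(lambda: next(checks), poll_interval=0.01))  # → True
```

A short poll interval keeps the scraper responsive once the CAPTCHA is solved. A longer one issues fewer checks against the browser.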

Results

The scraper writes its output to a JSON file named "[company name].json" in the same directory. Each entry in the JSON corresponds to one interview review on Glassdoor, with attributes for each portion of the review.
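
The sketch below writes and reads the "[company name].json" file described above. It assumes the file holds a top-level JSON array with one object per review, but this README does not confirm that layout or the attribute names. `save_reviews`, `load_reviews`, and the `field_a` key are hypothetical, and the sample records are made-up placeholders, not Glassdoor data. The code uses only the standard `json` module and runs on Python 2.7 and 3.

```python
import json
import os
import tempfile


def save_reviews(company_name, reviews, directory="."):
    """Write reviews to "<company_name>.json" and return the file path.

    Assumption: the output is a top-level JSON array, one element per review.
    """
    path = os.path.join(directory, company_name + ".json")
    with open(path, "w") as f:
        json.dump(reviews, f, indent=2)
    return path


def load_reviews(company_name, directory="."):
    """Read the list of reviews back from "<company_name>.json"."""
    with open(os.path.join(directory, company_name + ".json")) as f:
        return json.load(f)


# Round trip in a temporary directory with placeholder records.
tmp = tempfile.mkdtemp()
sample = [{"field_a": "placeholder"}, {"field_a": "another placeholder"}]
save_reviews("ExampleCorp", sample, tmp)
print(len(load_reviews("ExampleCorp", tmp)))  # → 2
```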

Changelog

v1.2

companyURL now accepts the full path of the company's Interviews page, for ease of use
Fixed pagination not working
Fixed an issue where the scraper would erroneously get stuck waiting for the page to load
Increased the initial sleep time to allow for a CAPTCHA
Reduced the polling time when waiting for a page load or CAPTCHA input
Now takes an additional short break every 10 pages to avoid rate limiting
Cleaned up the code and added more progress messages
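
The "short break every 10 pages" throttling can be sketched as follows. This README does not give the actual delays, so the default values are placeholders. `scrape_pages` and `scrape_page` are hypothetical names; `scrape_page(n)` stands in for fetching and parsing interview page n.

```python
import time


def scrape_pages(pages, scrape_page, page_delay=5.0, break_every=10,
                 break_delay=30.0, sleep=time.sleep):
    """Call scrape_page(n) for n = 1..pages, pausing between pages.

    After every `break_every` pages, take an extra `break_delay` pause.
    All delay values are placeholders, not the script's real settings.
    """
    results = []
    for n in range(1, pages + 1):
        results.append(scrape_page(n))
        if n == pages:
            break  # no need to wait after the last page
        sleep(page_delay)
        if n % break_every == 0:
            sleep(break_delay)  # the longer break to avoid rate limiting
    return results
```

Passing `sleep` as a parameter lets you check the pause schedule without waiting. With 21 pages, for example, the loop sleeps 20 regular times and takes two extra breaks, after pages 10 and 20.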

v1.1

Replaced maxnum with a global pages variable for easier use
Removed the URL2 option and its dependency, since every link ends in ".htm" anyway
Removed unnecessary concatenation of URL links at the beginning of get_data(); Glassdoor automatically redirects the _IP1 link to the first interview page
Increased sleep time after login
Increased sleep time between scraping interview pages
Added more progress messages
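
The notes above mention the ".htm" suffix and an "_IP1" link. From these, one plausible reading is that interview page n has the same path as companyURL with ".htm" replaced by "_IP<n>.htm". The sketch below builds page URLs under that assumption, which this README does not confirm. `interview_page_url` is a hypothetical helper, and the URL is a placeholder. Any query string, such as one added by filters, is carried over unchanged.

```python
def interview_page_url(company_url, page):
    """Build the URL of interview page `page` from companyURL.

    Assumption (inferred from the "_IP1" note, not confirmed): page n is
    the same path with ".htm" replaced by "_IP<n>.htm".
    """
    path, sep, query = company_url.partition("?")
    if not path.endswith(".htm"):
        raise ValueError("expected a URL path ending in .htm")
    return "{0}_IP{1}.htm{2}{3}".format(path[:-len(".htm")], page, sep, query)


url = "https://www.glassdoor.com/Interview/ExampleCorp-Interview-Questions-E12345.htm"
print(interview_page_url(url, 2))
# → https://www.glassdoor.com/Interview/ExampleCorp-Interview-Questions-E12345_IP2.htm
```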

v1.0

And so it begins ...

Repository: williamxie11/glassdoor-interview-scraper
