Micro-pipeline for scraping job data from Indeed.co.uk
This code was written using Python 3.8.
In a virtual environment, install dependencies with:
python -m pip install -r requirements.txt
The code for scraping an individual job advert is stored in indeed_scraper.py. A demonstration of the output can be seen by running the file from the command line:
python indeed_scraper.py
The code for the micropipeline is stored in micropipeline.py. It downloads job data from a queue, stores the results to file and records any exceptions. It can be run from the command line with:
python micropipeline.py
Look at the contents of results.txt and failed_keys.txt for details. For now, the pipeline repeatedly processes the example job advert and a simulated failure.
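The queue-driven loop described above can be sketched as follows. This is a minimal illustration, not the code in micropipeline.py: the names run_pipeline and fake_scrape, the use of the standard-library queue.Queue, and the tab-separated format of the failure records are all assumptions. The stand-in scraper succeeds for most keys and raises an exception for one key, mimicking the simulated failure.

```python
import json
import queue
from pathlib import Path
from typing import Callable, Tuple


def fake_scrape(key: str) -> str:
    # Hypothetical stand-in for the real scraper in indeed_scraper.py.
    # The key "bad-key" simulates a failed download.
    if key == "bad-key":
        raise ValueError("simulated failure")
    return json.dumps({"key": key, "title": "Example job"})


def run_pipeline(
    jobs: "queue.Queue[str]",
    scrape: Callable[[str], str],
    results_path: Path,
    failed_path: Path,
) -> Tuple[int, int]:
    """Drain the queue, appending each scraped record to results_path and
    each failed key (with its exception) to failed_path.

    Returns a (succeeded, failed) count. This is an assumed design, not
    the repository's actual implementation.
    """
    n_ok = n_failed = 0
    with open(results_path, "a", encoding="utf-8") as results:
        with open(failed_path, "a", encoding="utf-8") as failed:
            while True:
                try:
                    key = jobs.get_nowait()
                except queue.Empty:
                    break  # queue exhausted: stop the pipeline
                try:
                    record = scrape(key)
                except Exception as exc:
                    # Record the failing key and the error, then carry on
                    # so one bad advert does not stop the whole run.
                    failed.write(f"{key}\t{type(exc).__name__}: {exc}\n")
                    n_failed += 1
                else:
                    results.write(record + "\n")
                    n_ok += 1
    return n_ok, n_failed
```

For example, running run_pipeline on a queue holding "example-key", "bad-key" and "example-key" returns (2, 1): two JSON lines are appended to the results file and one line starting with bad-key is appended to the failures file. Catching the exception per key, rather than around the whole loop, is what lets the pipeline keep working through the queue after a failure.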
Unit and integration tests for the scraper can be run with:
export PYTHONPATH=.
pytest -vs test