Scraping Best Practices

sam1rm edited this page Aug 8, 2015 · 4 revisions

Follow this order for the best results

  1. Before you program anything, make sure that you can guarantee great results (x% of the schools will have this field).
    • Example: out of three potential sources (Chegg, RMP, etc.) to scrape courses, you choose Chegg because it has the most consistent results and the greatest number of options.
  2. Write a script that gets the data for one university
  3. Now try it with 10. If there are any bugs, adjust the script as necessary to make sure it runs on all 10.
  4. Now try it with 100
  5. If #4 has 0 bugs, you are free to run the entire script. Make sure you check on the script regularly in case a bug comes up (maybe set a timer).
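The 1 → 10 → 100 scaling process above can be sketched as a loop that only grows the batch once the previous size runs clean. `scrape_university` is a hypothetical placeholder, not the project's real scraper:

```python
def scrape_university(name):
    # Placeholder: fetch and parse the data for a single school.
    # Replace with your real per-university scraping logic.
    return {"name": name, "courses": []}

def run_batch(universities):
    """Run the scraper on a slice of schools and collect any failures."""
    results, errors = [], []
    for name in universities:
        try:
            results.append(scrape_university(name))
        except Exception as exc:
            errors.append((name, exc))
    return results, errors

all_universities = ["School %d" % i for i in range(1000)]

# Steps 2-4: grow the batch size only once the previous size has 0 bugs.
for size in (1, 10, 100):
    results, errors = run_batch(all_universities[:size])
    if errors:
        raise SystemExit("Fix the script before scaling past %d schools" % size)

# Step 5: no bugs at 100, so the full run is allowed.
results, errors = run_batch(all_universities)
```

The point of the staged sizes is that most scraping bugs (missing fields, odd page layouts) only surface once you hit enough schools, so each clean stage buys confidence before you commit to the long full run.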

Each Script Must Be Run In This Manner

  1. Only start this after the above process is complete; until then, you should not be running any scripts.
  2. You MUST use Tor_Client when retrieving the data.
  3. If you are using a for-loop that retrieves data for each university, make sure you save the progress after each iteration (to prevent starting over if there is an error).
  4. If you have to re-run the script, it should resume from where it left off.
  5. If there is a ConnectionError (or any other error), pause the script and restart it.
  6. When your script is complete, save the file as 'final-{{original_file_name}}.json'.
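Steps 3-5 above (checkpoint each iteration, resume on re-run, pause on errors) can be sketched as follows. `tor_get` here is a stand-in for the project's Tor_Client wrapper, whose real API is not documented on this page:

```python
import json
import os
import time

PROGRESS_FILE = "progress.json"

def tor_get(url):
    # Assumption: stands in for a Tor_Client request (step 2).
    return {"url": url}

def load_progress():
    """Step 4: on a re-run, pick up the saved state instead of starting over."""
    if os.path.exists(PROGRESS_FILE):
        with open(PROGRESS_FILE) as f:
            return json.load(f)
    return {"done": [], "data": {}}

def save_progress(progress):
    with open(PROGRESS_FILE, "w") as f:
        json.dump(progress, f)

def run(universities):
    progress = load_progress()
    for name in universities:
        if name in progress["done"]:
            continue  # already scraped on a previous run
        try:
            progress["data"][name] = tor_get("https://example.com/" + name)
        except ConnectionError:
            time.sleep(60)            # step 5: pause...
            return run(universities)  # ...then restart, resuming via the checkpoint
        progress["done"].append(name)
        save_progress(progress)       # step 3: save after each iteration
    return progress
```

Saving after every iteration is deliberately wasteful on I/O but means the most you can ever lose to a crash is one school's worth of work.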

Submission (When you are done)

  1. Save the final .json in two formats: one compressed, and one formatted with JSON indentation.
  2. Drag the two .json files into the [Uguru Drive](https://drive.google.com/drive/folders/0By5VIgFdqFHdfm85QV9lQm5pbHVUdzRsaWtjME0wcm5FUEJkeTF2V1hyU1BtLXM4SXF2LTQ)
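A minimal sketch of step 1, reading "compressed" as compact JSON with no extra whitespace (gzipping the file would be another valid reading). The filenames reuse the 'final-' convention from the previous section:

```python
import json

def save_final(data, original_file_name):
    """Write the same results twice: compact and indented."""
    base = "final-" + original_file_name
    # Compact form: no whitespace between tokens.
    with open(base + ".min.json", "w") as f:
        json.dump(data, f, separators=(",", ":"))
    # Human-readable form: two-space JSON indentation.
    with open(base + ".json", "w") as f:
        json.dump(data, f, indent=2)

save_final({"schools": []}, "courses")
```

Both files contain identical data, so anyone reviewing the submission can diff the indented copy while the compact copy keeps the upload small.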

Post-Submission (When you are done with submission)

  1. Write a script that parses the .json file and sends it to the Uguru Admin API.
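A hedged sketch of that post-submission script: parse the final .json and push each record upstream. The endpoint URL and payload shape are assumptions, since the Uguru Admin API is not documented on this page; swap in the real details:

```python
import json
import urllib.request

# Hypothetical endpoint; replace with the real Uguru Admin API URL.
ADMIN_API_URL = "https://example.com/admin/api/courses"

def post_record(record):
    """POST one JSON record to the (assumed) admin endpoint."""
    req = urllib.request.Request(
        ADMIN_API_URL,
        data=json.dumps(record).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status

def push_final_json(path, send=post_record):
    """Parse the final .json file and send every record to the API."""
    with open(path) as f:
        records = json.load(f)
    for record in records:
        send(record)
    return len(records)
```

Taking `send` as a parameter keeps the parsing logic testable without network access: pass a list's `append` in tests, and the real `post_record` in production.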