Tracking Scotrail Performance


Tracking Scottish Rail Performance


This project is my first 100 Days of Code project.

It is designed to do the following (and mostly does):

  • spot the publication of 4-weekly PDF performance reports (not yet automated; the scraper is still run manually),
  • download new ones,
  • use the PDFTables API to convert them to CSV,
  • extract headline data and 4 detailed measures for each Scottish railway station,
  • write all data to a SQLite database,
  • make that publicly available.


  1. You'll need Beautiful Soup 4 installed
  2. You'll need an API key from PDFTables
  3. Rather than write your own wrapper for the API, install this package
  4. Create a file to put your API key in, and have a line in it as follows: pdftables_key = "xxxxxxxx"
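
As a rough illustration of step 4, the key file can be a small Python file that the scraper loads at start-up. This is only a sketch: the helper name `load_pdftables_key` and the module name `local_settings` are hypothetical, and the only detail taken from this README is the variable name `pdftables_key`.

```python
import importlib.util


def load_pdftables_key(path):
    """Load the pdftables_key variable from a small Python file at `path`.

    Hypothetical helper: the file is expected to end in .py and to contain
    a line such as  pdftables_key = "xxxxxxxx"
    """
    spec = importlib.util.spec_from_file_location("local_settings", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file so its variables are defined
    return module.pdftables_key
```

Keeping the key in its own file means it can be left out of version control while the scraper itself stays public.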

##Progress so far##

  • Given the starting URL, the scraper finds the link to the current performance report in PDF.
  • The programme notes the file name (it contains the year and period, which we use later) and downloads a copy. Update as of scraperv1.4: it now deals with the shifting naming patterns which appeared in P10 of 2016-17.
  • The programme sends the PDF to the PDFTables API and gets back a CSV file, which is saved under the same file name but with a CSV suffix.
  • The programme then parses the CSV, locates the necessary bits of data and writes these to nested lists, which it then sorts alphabetically by station before writing them to a plain text file as CSV. This is useful for those who cannot use the SQLite database.
  • It stores the data in three linked tables in a SQLite database
  • I've moved the code which drops the tables if they exist and recreates them into a function, with a call to that function in the main programme body, so that the call can easily be commented out to avoid deleting existing data.
  • I've now recoded the main extraction process to work on tables with an extra blank column, which appeared in P9 and P10 of 2016-17.
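
The later steps above (coping with an extra blank column, sorting by station, writing CSV and storing to SQLite) can be sketched roughly as follows. This is not the scraper's actual code: the function names, the table and column names (`periods`, `stations`, `measures`, `m1` to `m4`) and the row layout of one station name followed by four measures are all assumptions made for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical schema: three linked tables, names invented for this sketch.
SCHEMA = """
CREATE TABLE IF NOT EXISTS periods (
    id INTEGER PRIMARY KEY, year TEXT, period INTEGER, UNIQUE (year, period));
CREATE TABLE IF NOT EXISTS stations (
    id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE IF NOT EXISTS measures (
    period_id INTEGER REFERENCES periods(id),
    station_id INTEGER REFERENCES stations(id),
    m1 TEXT, m2 TEXT, m3 TEXT, m4 TEXT);
"""


def drop_blank_columns(rows):
    """Remove any column that is empty in every row."""
    if not rows:
        return []
    width = max(len(row) for row in rows)
    padded = [list(row) + [""] * (width - len(row)) for row in rows]
    keep = [i for i in range(width) if any(row[i].strip() for row in padded)]
    return [[row[i] for i in keep] for row in padded]


def sort_by_station(rows):
    """Sort nested lists alphabetically by the station name in column 0."""
    return sorted(rows, key=lambda row: row[0].lower())


def rows_to_csv(rows):
    """Return the rows as plain CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


def store(conn, year, period, rows):
    """Store rows of [station, m1, m2, m3, m4] for one reporting period."""
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT OR IGNORE INTO periods (year, period) VALUES (?, ?)",
        (year, period),
    )
    period_id = conn.execute(
        "SELECT id FROM periods WHERE year = ? AND period = ?", (year, period)
    ).fetchone()[0]
    for name, *values in rows:
        conn.execute("INSERT OR IGNORE INTO stations (name) VALUES (?)", (name,))
        station_id = conn.execute(
            "SELECT id FROM stations WHERE name = ?", (name,)
        ).fetchone()[0]
        conn.execute(
            "INSERT INTO measures VALUES (?, ?, ?, ?, ?, ?)",
            (period_id, station_id, *values),
        )
    conn.commit()
```

A typical run would pass the parsed rows through `drop_blank_columns` and `sort_by_station`, write the result of `rows_to_csv` to a text file, and then call `store` with a `sqlite3` connection plus the year and period taken from the file name.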

##Optional extras##

I might also

  • set up a twitter feed to draw attention to significant changes month on month,
  • set up a website to make the data more widely available
  • create visualisations
  • set up an API to grab the data