The home of the sps_bill gem: an SP Services PDF bill structured data reader

SP Services Bill Scanner

Extracts bill details from SP Services PDF bills so that you can, um, do geeky data analysis n'stuff, and because I loathe data entry! One day we'll have smart meters and SP Services will let us download our raw meter data. But until then…

If you are an SP Services subscriber, download your bills from services.spservices.sg

If you are not an SP Services subscriber, this gem ain't going to be much use for you!

Some example analysis using R is included in the scripts folder. The inspiration for hacking away with R comes from reading Sau Sheong's new book Exploring Everyday Things with Ruby and R. Check it out!

Requirements and Known Limitations

  • Requires Ruby 1.9 (1.8 compatibility not tested or assured)

  • R is used for data analysis examples - this is optional

  • Currently it does not handle multi-property bills.

  • May not handle extensive multi-page bills for a single property correctly.

If you do come across bills that this gem can't read correctly, your help to get it fixed is greatly needed: either submit a fix yourself, or report the problem at github.com/tardate/sps_bill_scanner/issues

Help Required: Test this with your own bills

Unfortunately, there isn't a definitive set of example bills that would make it possible to ensure that this gem works for every bill issued by SP Services. And since bills are in PDF format, extracting the data in a structured manner is a task fraught with pitfalls.

I have tested this with my own bills (going back over a year) but that definitely doesn't mean it will work for others.

So I need help from others who are willing to test with their own bills. I'm not asking for your bills as that raises privacy concerns, and I definitely don't want real bills committed to the git repository.

Instead, I have set up the tests in a way that should make it easy for you to test with your own bills.

Here's the basic outline:

  • First, make sure tests are running green for you 'as-is':

    • fork or clone the repo

    • bundle will install development dependencies

    • rake will run the tests - they should all be OK

  • Get your own PDF bills from services.spservices.sg

    • put them in spec/fixtures/personal_pdf_samples

    • NB: these are ignored by git so you won't accidentally commit them

  • At this point you can run the tests with rake and it will do a very basic check of the PDFs you have added

  • To run complete checks to ensure all the data is being extracted correctly:

    • see the doc in spec/fixtures/personal_pdf_samples/expectations.yml.sample

    • copy this to spec/fixtures/personal_pdf_samples/expectations.yml

    • enter the details that describe each bill you have added

    • now when you run rake it will also verify the data extracted from your bills using expectations.yml
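The file-staging part of the outline above can be sketched in Ruby. This is only an illustration: stage_personal_bills is a hypothetical helper, not part of the gem or its Rakefile. The directory and file names are the ones listed above, and the contents of expectations.yml still have to be filled in by hand, following the doc in the sample file.

```ruby
require 'fileutils'

# Hypothetical helper (not part of sps_bill): copies your own PDF bills into
# the git-ignored fixtures folder and creates expectations.yml from the
# bundled sample if it does not exist yet.
# repo_root is assumed to be the root of your clone of the repository.
def stage_personal_bills(pdf_paths, repo_root = '.')
  samples_dir = File.join(repo_root, 'spec', 'fixtures', 'personal_pdf_samples')
  FileUtils.mkdir_p(samples_dir)

  # Copy the bills; an existing expectations.yml is never overwritten.
  FileUtils.cp(pdf_paths, samples_dir) unless pdf_paths.empty?

  sample = File.join(samples_dir, 'expectations.yml.sample')
  target = File.join(samples_dir, 'expectations.yml')
  FileUtils.cp(sample, target) if File.exist?(sample) && !File.exist?(target)

  # Return the PDFs that the specs will now see.
  Dir.glob(File.join(samples_dir, '*.pdf')).sort
end

# Example call (the path is a placeholder):
#   stage_personal_bills(Dir.glob('/path/to/my_bills/*.pdf'))
```

After staging the files, edit expectations.yml and run rake as described above.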

Feel free to get in touch or discuss in the github issues area if you are trying to help but run into problems with this!

If you are more interested in the data analytics, I'm keen to add more interesting R scripts to the collection. Your contributions are most welcome.

Installation

gem install sps_bill

Command Line Usage

Once the gem is installed, use sps_bill at the command line to interact with the library manually.

To get help on command options:

$ sps_bill -h

For example, to extract all data in CSV format from a set of PDF bills:

$ sps_bill --data=all ./path_to/my_bills*.pdf

Programmatic Usage

You can use the gem from your own scripts or applications. There are just two classes you really need to understand:

  • SpsBill::BillCollection is an Array-like class that represents a collection of bills.

    • the load method is used to initialise it given a path or array of filenames

    • a range of collection methods are provided to extract sets of data (e.g. electricity_usages)

  • SpsBill::Bill represents an individual bill

    • initialised given a file name

    • provides a range of accessors to get at individual data elements (e.g. electricity_usage)

To load a collection of bills:

> require 'sps_bill'
> bills = SpsBill::BillCollection.load('./my_bills/*.pdf')
> bills.total_amounts
  => [["2011-10-01", 168.86], ["2011-11-01", 196.46], ["2011-12-01", 176.54]]
> bills.electricity_usages
  => [["2011-10-01", 14.0, 0.2728, 3.82], ["2011-10-01", 444.0, 0.2698, 119.79],
  ["2011-11-01", 2.0, 0.2728, 0.54], ["2011-11-01", 537.0, 0.2698, 144.88],
  ["2011-12-01", 482.0, 0.2698, 130.04]]
> bills.gas_usages
  => [["2011-10-01", 12.0, 0.1961, 2.35], ["2011-11-01", 12.0, 0.2117, 2.54], ["2011-12-01", 12.0, 0.2117, 2.54]]
> bills.water_usages
  => [["2011-10-01", 8.4, 1.17, 9.83], ["2011-11-01", 11.4, 1.17, 13.34], ["2011-12-01", 9.6, 1.17, 11.23]]
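Because the collection methods return plain arrays, they are easy to post-process. The following is a minimal sketch that sums electricity usage per month. It works on the rows copied from the output above rather than calling the gem, and it assumes each row is laid out as [month, kWh, rate, amount], which is inferred from the sample output. monthly_totals is a hypothetical helper, not a method provided by the gem.

```ruby
# Rows copied from the bills.electricity_usages output shown above.
# Assumed layout (inferred from the sample): [month, kWh, rate, amount]
electricity_usages = [
  ["2011-10-01", 14.0, 0.2728, 3.82], ["2011-10-01", 444.0, 0.2698, 119.79],
  ["2011-11-01", 2.0, 0.2728, 0.54], ["2011-11-01", 537.0, 0.2698, 144.88],
  ["2011-12-01", 482.0, 0.2698, 130.04]
]

# Hypothetical helper: collapses the per-rate rows into one row per month,
# returning [month, total kWh, total amount].
def monthly_totals(rows)
  rows.group_by { |row| row[0] }.map do |month, month_rows|
    kwh    = month_rows.map { |row| row[1] }.inject(0.0, :+)
    amount = month_rows.map { |row| row[3] }.inject(0.0, :+)
    [month, kwh.round(1), amount.round(2)]
  end
end

monthly_totals(electricity_usages).each do |month, kwh, amount|
  puts "#{month}: #{kwh} kWh, $#{'%.2f' % amount}"
end
```

With a real collection you would pass bills.electricity_usages instead of the literal array, provided the rows have the shape assumed here.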

To load and examine a specific bill:

> require 'sps_bill'
> pdf_bill_file = "./my_latest_bill.pdf"
> bill = SpsBill::Bill.new(pdf_bill_file)
> bill.account_number
8123123123
> bill.total_amount
251.44
> bill.invoice_date
2011-05-31
> bill.invoice_month
2011-05-01
> bill.electricity_usage
[{:kwh=>4.0, :rate=>0.241, :amount=>0.97},{:kwh=>616.0, :rate=>0.2558, :amount=>157.57}]
> bill.gas_usage
[{:kwh=>18.0, :rate=>0.1799, :amount=>3.24}]
> bill.water_usage
[{:cubic_m=>36.1, :rate=>1.17, :amount=>42.24},{:cubic_m=>-3.0, :rate=>1.4, :amount=>-4.2}]
> bill.to_s
Account number: 8123123123
Invoice date  : 2011-10-31
Service month : 2011-10-01
Total bill    : $168.86

Electricity Usage
-----------------
[{:kwh=>14.0, :rate=>0.2728, :amount=>3.82}, {:kwh=>444.0, :rate=>0.2698, :amount=>119.79}]

Gas Usage
---------
[{:kwh=>12.0, :rate=>0.1961, :amount=>2.35}]

Water Usage
-----------
[{:cubic_m=>8.4, :rate=>9.83, :amount=>0.0}, {:cubic_m=>1.17, :rate=>0.0, :amount=>0.0}]
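The usage accessors return arrays of hashes, one hash per charge line, so the charge for each service can be totalled from the :amount values. The sketch below uses the hashes copied from the accessor output above rather than a live bill, and usage_charge is a hypothetical helper, not part of the gem.

```ruby
# Values copied from the bill.electricity_usage, bill.gas_usage and
# bill.water_usage output shown above.
electricity_usage = [{:kwh=>4.0, :rate=>0.241, :amount=>0.97},
                     {:kwh=>616.0, :rate=>0.2558, :amount=>157.57}]
gas_usage         = [{:kwh=>18.0, :rate=>0.1799, :amount=>3.24}]
water_usage       = [{:cubic_m=>36.1, :rate=>1.17, :amount=>42.24},
                     {:cubic_m=>-3.0, :rate=>1.4, :amount=>-4.2}]

# Hypothetical helper: adds up the :amount of every charge line,
# including negative adjustment lines.
def usage_charge(lines)
  lines.map { |line| line[:amount] }.inject(0.0, :+).round(2)
end

charges = {
  :electricity => usage_charge(electricity_usage),
  :gas         => usage_charge(gas_usage),
  :water       => usage_charge(water_usage)
}
charges.each { |service, amount| puts "#{service}: $#{'%.2f' % amount}" }
```

Note that these three figures are not expected to add up to bill.total_amount (251.44 in the example above); presumably the bill total includes charges that these accessors do not expose.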

Data Analysis with R

Some examples of bill data and analysis using R are included in the scripts folder.

sample scripts

scripts/scan_all_bills.sh

an example script to scan a set of bills and produce a csv file for analysis

scripts/full_analysis.R

an example script that prepares a one-page PDF summary analysis

sample data and analysis

data/all_services.csv.sample

sample CSV data for a year's worth of elec, gas, and water

data/all_services.sample.pdf

PDF analysis produced by full_analysis.R using the all_services.csv.sample data set.

data/elec_and_water_only.csv.sample

sample CSV data for a year's worth of elec and water

data/elec_and_water_only.sample.pdf

PDF analysis produced by full_analysis.R using the elec_and_water_only.csv.sample data set.

example run

./scan_all_bills.sh ../path_to_my_bills/*.pdf > my_bill_data.csv
./full_analysis.R my_bill_data.csv

This produces an analysis of all your bills in full_analysis.pdf.

Contributing to sps_bill_scanner

  • Check out the latest master to make sure the feature hasn't been implemented or the bug hasn't been fixed yet

  • Check out the issue tracker to make sure someone already hasn't requested it and/or contributed it

  • Fork the project

  • Start a feature/bugfix branch

  • Commit and push until you are happy with your contribution

  • Make sure to add tests for it. This is important so I don't break it in a future version unintentionally.

  • Please try not to mess with the Rakefile, version, or history. If you want to have your own version, or changing them is otherwise necessary, that is fine, but please isolate the change to its own commit so I can cherry-pick around it.

Copyright

Copyright © 2012 Paul Gallagher. See LICENSE for further details.