A program to analyse bank statements, categorise expenses, and generate a report for the time period covered by the statement.
I recently redesigned it so that support for other bank statements can be added more easily. Please do contribute if you add any :)
As of now, the tool has the following backends:
- HDFC Bank Statement: Delimited Format
- HDFC Credit Card Statements: PDF format and Delimited format, supports at least Regalia/Rupay/Swiggy cards
- SBI Bank Statement: PDF format
- Amazon Pay Statement: requires running a script in the browser to extract the data, as Amazon doesn't support exporting it
- Generic JSON format: must follow the IR described in the Design section, which is easy to produce
Experimental: Using IBM BAM AI models to categorise the transactions (https://bam.res.ibm.com/)
Read more in the Design section.
- Login to https://netbanking.hdfcbank.com/netbanking/
- In the left-side options, choose 'Enquire' -> 'A/c Statement - Current & Previous Month'.
- Select Account and Statement Period according to your need (I chose 1st to 31st of last month). Click 'View'.
- Go to the bottom of the statement's page, set 'Select Format' to 'Delimited', then click 'Download'.
Now run ./kharcha.js --hdfc FILENAME (where FILENAME is the path to the downloaded file)
- Login at https://retail.onlinesbi.sbi/retail/login.htm
- Go to the bank statement page and download the statement in PDF format
Now run ./kharcha.js --sbi THE_PDF
The design is similar to how some compilers work: there can be multiple sources ('source languages' in the case of compilers), all of which must convert to a known and expected "Intermediate Representation" (IR). In our case the IR is just a list of objects, where each object must have certain keys such as 'text', 'debit', 'credit' and 'date'. The internal implementation of this IR uses pandas DataFrames. If interested, just look at one of the backends in the backend/ directory.
The current design splits the process of analysing into 3 stages:
Stage 1: Convert the passed input into IR (Intermediate Representation). This is source dependent, i.e. an HDFC statement, an SBI statement and an HDFC credit card statement may each require different logic.
By the end of this stage, we will have a list of objects with at least these keys:
{
date: String,
text: String,
debit: Number,
credit: Number,
}
Note: Although 'type' is not listed here, a backend may provide a 'type' column, and any such pre-assigned category/type is used as is by the tool.
The remaining stages are independent of whether it's an SBI, HDFC, ICICI or any other statement.
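As an illustration of the IR shape described above, here is a minimal sketch of a check that a row carries the required keys with the expected types. The helper name `isValidIrRow`, the sample rows and the date format are assumptions made for this example; they are not taken from the tool's code.

```javascript
// Hypothetical helper (not part of kharcha.js): returns true when a row
// has at least the IR keys date, text, debit and credit with the
// expected types. Extra keys such as 'type' are allowed.
function isValidIrRow(row) {
  return (
    typeof row === "object" &&
    row !== null &&
    typeof row.date === "string" &&
    typeof row.text === "string" &&
    typeof row.debit === "number" &&
    Number.isFinite(row.debit) &&
    typeof row.credit === "number" &&
    Number.isFinite(row.credit)
  );
}

// Made-up sample rows; the date format is an assumption.
const sampleIr = [
  { date: "01/01/2024", text: "UPI-EXAMPLE GROCERY STORE", debit: 450, credit: 0 },
  // A backend may pre-assign 'type'; the row is still valid IR.
  { date: "02/01/2024", text: "SALARY EXAMPLE CORP", debit: 0, credit: 50000, type: "Income" },
];

const invalidRows = sampleIr.filter((row) => !isValidIrRow(row));
console.log(`invalid rows: ${invalidRows.length}`);
```

A check like this could be run on the output of any backend, or on a Generic JSON input, before it is handed to the later stages.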
Stage 2: Categorisation. Here we add the 'type' labels. Currently a manually created list is used to assign types, but as the design is now modular, it should be easier to bring ML into the picture.
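A rough sketch of what list-based categorisation could look like, assuming the manual list is a set of keyword-to-type rules matched against the 'text' field. The rule list, the `categorise` function and the "Uncategorised" fallback are hypothetical and only illustrate the idea; the tool's actual list and matching logic may differ.

```javascript
// Hypothetical keyword -> type rules; the real list in the tool may differ.
const RULES = [
  { keyword: "swiggy", type: "Food" },
  { keyword: "irctc", type: "Travel" },
  { keyword: "salary", type: "Income" },
];

// Returns new rows with a 'type' label added. Rows that already carry a
// 'type' (pre-assigned by a backend) are returned unchanged.
function categorise(rows, rules = RULES, fallback = "Uncategorised") {
  return rows.map((row) => {
    if (row.type) return row;
    const text = row.text.toLowerCase();
    // First rule whose keyword appears in the transaction text wins.
    const match = rules.find((rule) => text.includes(rule.keyword));
    return { ...row, type: match ? match.type : fallback };
  });
}

// Made-up IR rows for illustration.
const labelled = categorise([
  { date: "01/01/2024", text: "UPI-SWIGGY ORDER", debit: 300, credit: 0 },
  { date: "02/01/2024", text: "ATM WITHDRAWAL", debit: 2000, credit: 0 },
  { date: "03/01/2024", text: "SWIGGY REFUND", debit: 0, credit: 300, type: "Refund" },
]);
console.log(labelled.map((row) => row.type));
```

Because this stage only reads 'text' and writes 'type', a keyword list could later be swapped for a model without touching the backends.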
Stage 3: Analysis/Report Generation
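One simple form the report stage could take is a per-type total of debits and credits over the categorised rows. The sketch below assumes rows in the IR shape with a 'type' label; the `summarise` function and the sample data are hypothetical, and the actual report produced by the tool may contain more than this.

```javascript
// Hypothetical report step: sums debit and credit per 'type'.
function summarise(rows) {
  const totals = new Map();
  for (const row of rows) {
    const type = row.type || "Uncategorised";
    const entry = totals.get(type) || { debit: 0, credit: 0 };
    entry.debit += row.debit;
    entry.credit += row.credit;
    totals.set(type, entry);
  }
  // Convert to a plain object so it is easy to print or serialise.
  return Object.fromEntries(totals);
}

// Made-up categorised rows for illustration.
const report = summarise([
  { date: "01/01/2024", text: "UPI-SWIGGY ORDER", debit: 300, credit: 0, type: "Food" },
  { date: "05/01/2024", text: "EXAMPLE RESTAURANT", debit: 700, credit: 0, type: "Food" },
  { date: "02/01/2024", text: "SALARY EXAMPLE CORP", debit: 0, credit: 50000, type: "Income" },
]);
console.log(report);
```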