adi-g15/kharcha

Kharcha

A program that analyses bank statements, categorises the expenses, and generates a report for the time period covered by the statement.

I recently redesigned this so that support for other bank statements can be added more easily. Please do contribute if you add any :)

As of now, the tool has the following backends:

  • HDFC Bank Statement: Delimited Format
  • HDFC Credit Card Statements: PDF format and Delimited format; supports at least Regalia/Rupay/Swiggy cards
  • SBI Bank Statement: PDF format
  • Amazon Pay Statement: requires running a script in the browser to extract the data, as Amazon doesn't support exporting it
  • Generic JSON format: must follow the IR described in the Design section; it's easy

Experimental: Using IBM BAM AI models to categorise the transactions (https://bam.res.ibm.com/)

[Figure: Design diagram of kharcha]

Read more in the Design section.

Usage

HDFC Bank Statements

  1. Log in to https://netbanking.hdfcbank.com/netbanking/
  2. From the options on the left, choose 'Enquire' -> 'A/c Statement - Current & Previous Month'.
  3. Select the Account and Statement Period according to your need (I chose 1st to 31st of last month). Click 'View'.
  4. Go to the bottom of the statement page, set 'Select Format' to 'Delimited', then click 'Download'.

Now run ./kharcha.js --hdfc FILENAME (where FILENAME is the path to the downloaded file)

SBI Bank Statements

  1. Log in at https://retail.onlinesbi.sbi/retail/login.htm
  2. Go to the bank statement page and download the statement in PDF format.

Now run ./kharcha.js --sbi THE_PDF

Design

The design is similar to how some compilers work: there can be multiple sources ('source languages' in the case of compilers), all of which must convert to a known and expected "Intermediate Representation". In our case the IR is just a list of objects, where each object must have keys such as 'text', 'debit', 'credit' and 'date'. Internally, the IR is implemented with pandas DataFrames. If interested, just look at one of the backends in the backend/ directory.

The current design splits the process of analysing into 3 stages:

Stage 1: Convert the passed input into IR (Intermediate Representation). This is source dependent, i.e. an HDFC statement, an SBI statement and an HDFC credit card statement may each require different logic.

     By the end of this stage, we will have a list of objects with
     'at least' these keys:

     {
            date: String,
            text: String,
            debit: Number,
            credit: Number,
     }

Note: 'type' is not listed above, but a backend may include a 'type' column; any such pre-assigned category/type is used as is by the tool.
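As a rough illustration of Stage 1, here is a minimal Python sketch of a backend that turns delimited text into the IR and checks for the required keys. The column names (Date, Narration, Debit, Credit) and the function names `to_ir` and `validate_ir` are assumptions made up for this example; they are not the tool's actual code or the real layout of any bank's export. It also uses plain lists of dicts rather than pandas, just to stay dependency-free.

```python
import csv
import io

# Keys every IR record must carry, as listed in the Design section.
REQUIRED_KEYS = ("date", "text", "debit", "credit")


def to_ir(delimited_text):
    """Convert delimited statement text into a list of IR records.

    The column names used here are hypothetical; a real backend would
    map its own bank's columns onto the IR keys.
    """
    records = []
    for row in csv.DictReader(io.StringIO(delimited_text)):
        records.append({
            "date": row["Date"].strip(),
            "text": row["Narration"].strip(),
            # Empty amount cells are treated as 0.
            "debit": float(row["Debit"].strip() or 0),
            "credit": float(row["Credit"].strip() or 0),
        })
    return records


def validate_ir(records):
    """Raise ValueError if any record lacks one of the required IR keys."""
    for i, rec in enumerate(records):
        missing = [key for key in REQUIRED_KEYS if key not in rec]
        if missing:
            raise ValueError(f"record {i} is missing keys: {missing}")
    return records


sample = (
    "Date,Narration,Debit,Credit\n"
    "01/01/24,UPI-CAFE,120.50,\n"
    "02/01/24,SALARY,,50000\n"
)
ir = validate_ir(to_ir(sample))
```

Extra keys (such as a pre-assigned 'type') pass validation untouched, which matches the "at least these keys" wording above.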

The remaining stages are independent of whether it's an SBI, HDFC, ICICI etc. statement.

Stage 2: Categorisation. Here we add the 'type' labels. Currently a manually created list is used to assign types, but as the design is now modular, it should be easier to bring ML into the picture.
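To give an idea of what list-based categorisation can look like, here is a small Python sketch. The keyword rules, the `categorise` function and the 'misc' fallback label are all hypothetical and only for illustration; the list actually used by the tool may be organised quite differently.

```python
# Hypothetical keyword -> type rules; not the tool's real list.
RULES = [
    ("swiggy", "food"),
    ("zomato", "food"),
    ("uber", "travel"),
    ("salary", "income"),
]


def categorise(records, rules=RULES, default="misc"):
    """Return copies of the IR records with a 'type' label on each.

    A record that already has a 'type' (pre-assigned by its backend)
    is kept as is.
    """
    labelled = []
    for rec in records:
        rec = dict(rec)  # copy, so the caller's records are not mutated
        if not rec.get("type"):
            text = rec["text"].lower()
            # First rule whose keyword appears in the text wins.
            rec["type"] = next(
                (label for keyword, label in rules if keyword in text),
                default,
            )
        labelled.append(rec)
    return labelled


records = [
    {"date": "01/01/24", "text": "UPI-SWIGGY-ORDER", "debit": 250.0, "credit": 0.0},
    {"date": "02/01/24", "text": "ATM WITHDRAWAL", "debit": 1000.0, "credit": 0.0},
    {"date": "03/01/24", "text": "UBER TRIP", "debit": 180.0, "credit": 0.0, "type": "office"},
]
labelled = categorise(records)
```

Because the function only needs 'text' and an optional 'type', an ML-based labeller could later replace the rule lookup without touching the other stages.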

Stage 3: Analysis/Report Generation
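The README does not spell out what the report contains, so the following Python sketch simply assumes a per-type total of debits, sorted from largest to smallest. The `summarise` function and the 'misc' fallback are hypothetical names for this example only.

```python
from collections import defaultdict


def summarise(records):
    """Total the 'debit' amounts per 'type', largest total first."""
    totals = defaultdict(float)
    for rec in records:
        # Records without a label are grouped under a fallback type.
        totals[rec.get("type", "misc")] += rec["debit"]
    return dict(sorted(totals.items(), key=lambda item: item[1], reverse=True))


labelled = [
    {"date": "01/01/24", "text": "UPI-SWIGGY", "debit": 100.0, "credit": 0.0, "type": "food"},
    {"date": "02/01/24", "text": "UBER TRIP", "debit": 50.0, "credit": 0.0, "type": "travel"},
    {"date": "03/01/24", "text": "UPI-ZOMATO", "debit": 25.5, "credit": 0.0, "type": "food"},
]
report = summarise(labelled)
for label, total in report.items():
    print(f"{label:10s} {total:10.2f}")
```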
