# deltaframe

> A convenient way to show and log the delta between two Pandas dataframes.

## Install

`pip install deltaframe`


## Import

```python
from deltaframe.core import *
import pandas as pd
```

## How to use

First, let's create two dataframes (e.g. transaction data from consecutive days).

```python
df_old = pd.DataFrame({
    'date':['2013-11-24','2013-11-24','2013-11-24','2013-11-24'],
    'id':['001','002','003','004'],
    'quantity':[22,8,7,10],
    'color':['Yellow','Orange','Red','Yellow'],
})
df_new = pd.DataFrame({
    'date':['2013-11-24','2013-11-25','2013-11-24','2013-11-24'],
    'id':['001','002', '004', '005'],
    'quantity':[22,6,5,10],
    'color':['Yellow','Orange','Red','Pink'],
})
```

```python
df_old
```

|   | date | id | quantity | color |
|---|------|----|----------|-------|
| 0 | 2013-11-24 | 001 | 22 | Yellow |
| 1 | 2013-11-24 | 002 | 8 | Orange |
| 2 | 2013-11-24 | 003 | 7 | Red |
| 3 | 2013-11-24 | 004 | 10 | Yellow |


```python
df_new
```

|   | date | id | quantity | color |
|---|------|----|----------|-------|
| 0 | 2013-11-24 | 001 | 22 | Yellow |
| 1 | 2013-11-25 | 002 | 6 | Orange |
| 2 | 2013-11-24 | 004 | 5 | Red |
| 3 | 2013-11-24 | 005 | 10 | Pink |


#### Show the delta

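Let's first look at the main function, `get_delta`, which returns the added, deleted and modified rows in a single dataframe with a `transaction` column, sorted by `sort_by`: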

```python
get_delta(df_old=df_old, df_new=df_new, unique_id="id", sort_by="date")
```

|   | date | id | quantity | color | transaction |
|---|------|----|----------|-------|-------------|
| 4 | 2013-11-24 | 005 | 10.0 | Pink | added |
| 4 | 2013-11-24 | 003 | 7.0 | Red | deleted |
| 5 | 2013-11-24 | 004 | 5.0 | Red | modified |
| 4 | 2013-11-25 | 002 | 6.0 | Orange | modified |
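For intuition, here is a minimal plain-pandas sketch of the kind of comparison `get_delta` performs. It is not `deltaframe`'s implementation: it assumes `id` is a unique key, classifies every row with a single outer join, and labels deleted rows `removed`. The names `value_cols` and `both` are purely illustrative.

```python
import numpy as np
import pandas as pd

df_old = pd.DataFrame({
    "date": ["2013-11-24"] * 4,
    "id": ["001", "002", "003", "004"],
    "quantity": [22, 8, 7, 10],
    "color": ["Yellow", "Orange", "Red", "Yellow"],
})
df_new = pd.DataFrame({
    "date": ["2013-11-24", "2013-11-25", "2013-11-24", "2013-11-24"],
    "id": ["001", "002", "004", "005"],
    "quantity": [22, 6, 5, 10],
    "color": ["Yellow", "Orange", "Red", "Pink"],
})

# Sketch only: assumes "id" is unique in both frames.
value_cols = ["date", "quantity", "color"]
both = df_old.merge(df_new, on="id", how="outer",
                    suffixes=("_old", "_new"), indicator=True)

# A key present in both frames is modified if any value column differs.
changed = pd.Series(False, index=both.index)
for col in value_cols:
    changed |= both[f"{col}_old"] != both[f"{col}_new"]

# First matching condition wins, so added/removed take precedence.
both["transaction"] = np.select(
    [both["_merge"] == "right_only", both["_merge"] == "left_only", changed],
    ["added", "removed", "modified"],
    default="unchanged",
)

# Removed rows keep their old values; everything else shows the new values.
for col in value_cols:
    both[col] = both[f"{col}_new"].where(both["transaction"] != "removed",
                                         both[f"{col}_old"])

delta = (
    both.loc[both["transaction"] != "unchanged",
             ["date", "id", "quantity", "color", "transaction"]]
    .sort_values(["date", "id"], ignore_index=True)
)
print(delta)
```

The outer join introduces missing values for one-sided keys, so `quantity` comes out as a float column here.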


What if we want to build the latest dataframe from all updates? Use `build_latest`:

```python
df_latest = build_latest(df_old=df_old, df_new=df_new, unique_id=["id"], sort_by=["date"])
df_latest
```

|   | date | id | quantity | color |
|---|------|----|----------|-------|
| 4 | 2013-11-24 | 005 | 10.0 | Pink |
| 5 | 2013-11-24 | 004 | 5.0 | Red |
| 6 | 2013-11-24 | 005 | 10.0 | Pink |
| 4 | 2013-11-25 | 002 | 6.0 | Orange |
| 0 |  | 001 |  |  |
| 1 |  | 002 |  |  |
| 2 |  | 003 |  |  |
| 3 |  |  |  |  |
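Conceptually, a latest snapshot can be rebuilt by applying a delta to the old dataframe: drop the removed keys, then upsert the added and modified rows. The sketch below assumes `id` is a unique key; `apply_delta` is a hypothetical helper (not part of `deltaframe`), and the delta rows are copied by hand from the `get_delta` example, with `removed` in place of `deleted`.

```python
import pandas as pd

df_old = pd.DataFrame({
    "date": ["2013-11-24"] * 4,
    "id": ["001", "002", "003", "004"],
    "quantity": [22, 8, 7, 10],
    "color": ["Yellow", "Orange", "Red", "Yellow"],
})
# Delta rows taken from the get_delta example above.
delta = pd.DataFrame({
    "date": ["2013-11-24", "2013-11-24", "2013-11-24", "2013-11-25"],
    "id": ["005", "003", "004", "002"],
    "quantity": [10, 7, 5, 6],
    "color": ["Pink", "Red", "Red", "Orange"],
    "transaction": ["added", "removed", "modified", "modified"],
})


def apply_delta(base, delta, key="id"):
    """Hypothetical helper: drop removed keys and upsert added/modified rows."""
    removed_keys = delta.loc[delta["transaction"] == "removed", key]
    upserts = delta.loc[delta["transaction"] != "removed"].drop(columns="transaction")
    # Keep base rows that are neither removed nor replaced by an upsert.
    keep = ~base[key].isin(removed_keys) & ~base[key].isin(upserts[key])
    latest = pd.concat([base[keep], upserts], ignore_index=True)
    return latest.sort_values(key, ignore_index=True)


latest = apply_delta(df_old, delta)
print(latest)
```

Because `df_new` is a complete snapshot in this example, the result matches `df_new` sorted by `id`.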


#### Show added, removed and modified rows

It's also possible to get only the added, removed or modified rows, as shown below.

Show added rows with `get_added`.

```python
added_rows = get_added(df_old=df_old, df_new=df_new, unique_id="id")
added_rows
```

|   | date | id | quantity | color | transaction |
|---|------|----|----------|-------|-------------|
| 4 | 2013-11-24 | 005 | 10.0 | Pink | added |
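In plain pandas, an added row is one whose key appears in `df_new` but not in `df_old`, i.e. an anti-join. A minimal sketch with `Series.isin`, assuming `id` is the unique key (this is not necessarily how `get_added` works internally):

```python
import pandas as pd

df_old = pd.DataFrame({
    "date": ["2013-11-24"] * 4,
    "id": ["001", "002", "003", "004"],
    "quantity": [22, 8, 7, 10],
    "color": ["Yellow", "Orange", "Red", "Yellow"],
})
df_new = pd.DataFrame({
    "date": ["2013-11-24", "2013-11-25", "2013-11-24", "2013-11-24"],
    "id": ["001", "002", "004", "005"],
    "quantity": [22, 6, 5, 10],
    "color": ["Yellow", "Orange", "Red", "Pink"],
})

# Keep rows of df_new whose key does not occur in df_old.
added = df_new[~df_new["id"].isin(df_old["id"])].assign(transaction="added")
print(added)
```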


What about removed rows (present in `df_old` but no longer in `df_new`)? Use `get_removed`:

```python
removed_rows = get_removed(df_old=df_old, df_new=df_new, unique_id="id")
removed_rows
```

|   | date | id | quantity | color | transaction |
|---|------|----|----------|-------|-------------|
| 4 | 2013-11-24 | 003 | 7.0 | Red | removed |
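Removed rows are the mirror image of added ones. This sketch uses a left merge on the key with `indicator=True` instead of `isin`; rows marked `left_only` have no partner in `df_new`. It assumes `id` is unique and only illustrates the idea behind `get_removed`.

```python
import pandas as pd

df_old = pd.DataFrame({
    "date": ["2013-11-24"] * 4,
    "id": ["001", "002", "003", "004"],
    "quantity": [22, 8, 7, 10],
    "color": ["Yellow", "Orange", "Red", "Yellow"],
})
df_new = pd.DataFrame({
    "date": ["2013-11-24", "2013-11-25", "2013-11-24", "2013-11-24"],
    "id": ["001", "002", "004", "005"],
    "quantity": [22, 6, 5, 10],
    "color": ["Yellow", "Orange", "Red", "Pink"],
})

# Join only on the key; duplicate ids in df_new would duplicate rows here.
matched = df_old.merge(df_new[["id"]], on="id", how="left", indicator=True)
removed = (
    matched[matched["_merge"] == "left_only"]
    .drop(columns="_merge")
    .assign(transaction="removed")
)
print(removed)
```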


Finally, we check for modified rows with `get_modified`. Note that added rows are reported as modified too:

```python
modified_rows = get_modified(df_old=df_old, df_new=df_new, unique_id="id")
modified_rows
```

|   | date | id | quantity | color | transaction |
|---|------|----|----------|-------|-------------|
| 4 | 2013-11-25 | 002 | 6 | Orange | modified |
| 5 | 2013-11-24 | 004 | 5 | Red | modified |
| 6 | 2013-11-24 | 005 | 10 | Pink | modified |
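One way to see why added rows appear here: if "modified" means "a row of `df_new` with no identical row in `df_old`", a brand-new row qualifies as well. A sketch of that definition using a merge on all shared columns (an assumption about the semantics, not the library's code):

```python
import pandas as pd

df_old = pd.DataFrame({
    "date": ["2013-11-24"] * 4,
    "id": ["001", "002", "003", "004"],
    "quantity": [22, 8, 7, 10],
    "color": ["Yellow", "Orange", "Red", "Yellow"],
})
df_new = pd.DataFrame({
    "date": ["2013-11-24", "2013-11-25", "2013-11-24", "2013-11-24"],
    "id": ["001", "002", "004", "005"],
    "quantity": [22, 6, 5, 10],
    "color": ["Yellow", "Orange", "Red", "Pink"],
})

# With no `on`, merge joins on every shared column, so only fully identical
# rows match; everything else in df_new is marked "left_only".
merged = df_new.merge(df_old, how="left", indicator=True)
modified = (
    merged[merged["_merge"] == "left_only"]
    .drop(columns="_merge")
    .assign(transaction="modified")
)
print(modified)
```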


If we don't want added rows to show up as modified, we can pass the `added_rows` dataframe created above:

```python
modified_rows = get_modified(df_old=df_old, df_new=df_new, unique_id="id", added_rows=added_rows)
modified_rows
```

|   | date | id | quantity | color | transaction |
|---|------|----|----------|-------|-------------|
| 4 | 2013-11-25 | 002 | 6 | Orange | modified |
| 5 | 2013-11-24 | 004 | 5 | Red | modified |
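To leave added rows out in plain pandas, one option is to compare only keys present in both frames: an inner join on `id`, then a column-by-column comparison. This sketch assumes the value columns are `date`, `quantity` and `color`; it illustrates the idea and is not what `get_modified` does internally.

```python
import pandas as pd

df_old = pd.DataFrame({
    "date": ["2013-11-24"] * 4,
    "id": ["001", "002", "003", "004"],
    "quantity": [22, 8, 7, 10],
    "color": ["Yellow", "Orange", "Red", "Yellow"],
})
df_new = pd.DataFrame({
    "date": ["2013-11-24", "2013-11-25", "2013-11-24", "2013-11-24"],
    "id": ["001", "002", "004", "005"],
    "quantity": [22, 6, 5, 10],
    "color": ["Yellow", "Orange", "Red", "Pink"],
})

value_cols = ["date", "quantity", "color"]
# Inner join drops keys that exist on only one side (added or removed).
paired = df_new.merge(df_old, on="id", how="inner", suffixes=("", "_old"))

# Flag a pair as modified if any value column differs.
# Note: NaN != NaN, so missing values would count as differences.
differs = pd.Series(False, index=paired.index)
for col in value_cols:
    differs |= paired[col] != paired[f"{col}_old"]

modified = paired.loc[differs, list(df_new.columns)].assign(transaction="modified")
print(modified)
```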


#### Logging the delta

It's also possible to log the delta (e.g. transactions over time).

Initially there is no log, so we pass `df_log=None`. In this case, the rows of `df_old` are recorded as `added`:

```python
df_log = log_delta(df_log=None, df_old=df_old, df_new=df_new, unique_id="id")
df_log
```

|   | date | id | quantity | color | transaction |
|---|------|----|----------|-------|-------------|
| 0 | 2013-11-24 | 001 | 22 | Yellow | added |
| 1 | 2013-11-24 | 002 | 8 | Orange | added |
| 2 | 2013-11-24 | 003 | 7 | Red | added |
| 3 | 2013-11-24 | 004 | 10 | Yellow | added |


When a log already exists, we pass it to `log_delta`, which appends the modified, added and removed rows:

```python
df_log = log_delta(df_log=df_log, df_old=df_old, df_new=df_new, unique_id="id")
df_log
```

|   | date | id | quantity | color | transaction |
|---|------|----|----------|-------|-------------|
| 0 | 2013-11-24 | 001 | 22.0 | Yellow | added |
| 1 | 2013-11-24 | 002 | 8.0 | Orange | added |
| 2 | 2013-11-24 | 003 | 7.0 | Red | added |
| 3 | 2013-11-24 | 004 | 10.0 | Yellow | added |
| 4 | 2013-11-25 | 002 | 6.0 | Orange | modified |
| 5 | 2013-11-24 | 004 | 5.0 | Red | modified |
| 6 | 2013-11-24 | 005 | 10.0 | Pink | added |
| 7 | 2013-11-24 | 003 | 7.0 | Red | removed |
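Appending to a log is essentially a concatenation with a fresh, continuous index. The sketch below assumes the log is just a dataframe with a `transaction` column; `append_to_log` is a hypothetical helper, and the delta rows are copied from the output above.

```python
import pandas as pd

df_old = pd.DataFrame({
    "date": ["2013-11-24"] * 4,
    "id": ["001", "002", "003", "004"],
    "quantity": [22, 8, 7, 10],
    "color": ["Yellow", "Orange", "Red", "Yellow"],
})

# Start the log from df_old, as in the first log_delta call above.
df_log = df_old.assign(transaction="added")

# Delta rows copied from the second log_delta output.
delta = pd.DataFrame({
    "date": ["2013-11-25", "2013-11-24", "2013-11-24", "2013-11-24"],
    "id": ["002", "004", "005", "003"],
    "quantity": [6, 5, 10, 7],
    "color": ["Orange", "Red", "Pink", "Red"],
    "transaction": ["modified", "modified", "added", "removed"],
})


def append_to_log(log, new_rows):
    """Hypothetical helper: append new transactions below the existing log."""
    # ignore_index=True renumbers rows 0..n-1 so the index stays continuous.
    return pd.concat([log, new_rows], ignore_index=True)


df_log = append_to_log(df_log, delta)
print(df_log)
```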


Finally, we can sort the log by a particular column with `sort_by`:

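```python
df_log = log_delta(df_log=df_log, df_old=df_old, df_new=df_new, unique_id="id", sort_by=["date"])
df_log
```

|   | date | id | quantity | color | transaction |
|---|------|----|----------|-------|-------------|
| 0 | 2013-11-24 | 001 | 22.0 | Yellow | added |
| 1 | 2013-11-24 | 002 | 8.0 | Orange | added |
| 2 | 2013-11-24 | 003 | 7.0 | Red | added |
| 3 | 2013-11-24 | 004 | 10.0 | Yellow | added |
| 5 | 2013-11-24 | 004 | 5.0 | Red | modified |
| 6 | 2013-11-24 | 005 | 10.0 | Pink | added |
| 7 | 2013-11-24 | 005 | 10.0 | Pink | added |
| 8 | 2013-11-24 | 003 | 7.0 | Red | removed |
| 4 | 2013-11-25 | 002 | 6.0 | Orange | modified |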
4,2013-11-25,2,6.0,Orange,modified
