# Simple example

In [1]:
# Add the local BipartitePandas source to the system path (development only, do not run this)
import sys
sys.path.append('../../..')

## Import the BipartitePandas Package

Install it first with `pip install bipartitepandas`.

In [None]:
import bipartitepandas as bpd

## Get your data ready

In this example, we simulate data. The simulation parameters are chosen so that data cleaning has something interesting to do.

In [None]:
df = bpd.SimBipartite(bpd.sim_params({'firm_size': 10, 'p_move': 0.05})).simulate()
display(df)

## Columns

BipartitePandas includes 7 pre-defined general columns:

#### Required
- $i$: worker id (any type)
- $j$: firm id (any type)
- $y$: income (float or int)

#### Optional
- $t$: time (int)
- $g$: firm type (any type)
- $w$: weight (float or int)
- $m$: move indicator (int)

## Formats

BipartitePandas includes 4 formats:
- *Long* - each row gives a single observation
- *Collapsed Long* - like *Long*, but employment spells at the same firm are collapsed into a single observation
- *Event Study* - each row gives two consecutive observations
- *Collapsed Event Study* - like *Event Study*, but employment spells at the same firm are collapsed into a single observation

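These formats divide the general columns differently. In the *Collapsed* formats, time is split into the first and last period of each spell; in the *Event Study* formats, the suffixes 1 and 2 refer to the first and second observation in each row: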
- *Long* - $i$, $j$, $y$, $t$, $g$, $w$, $m$
- *Collapsed Long* - $i$, $j$, $y$, $t1$, $t2$, $g$, $w$, $m$
- *Event Study* - $i$, $j1$, $j2$, $y1$, $y2$, $t1$, $t2$, $g1$, $g2$, $w1$, $w2$, $m$
- *Collapsed Event Study* - $i$, $j1$, $j2$, $y1$, $y2$, $t11$, $t12$, $t21$, $t22$, $g1$, $g2$, $w1$, $w2$, $m$
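
To make the relationship between these formats concrete, here is a minimal sketch in plain pandas (it does not use BipartitePandas). It assumes that collapsing a spell averages $y$ and records the first and last period as $t1$ and $t2$, and that an event-study row pairs each observation with the same worker's next one. The toy data and the helper column `spell` are hypothetical, and the library's own conversions may treat edge cases (such as workers with a single observation) differently.

```python
import pandas as pd

# Toy Long data: one row per worker-period observation (hypothetical values).
long_df = pd.DataFrame({
    "i": [0, 0, 0, 1, 1],
    "j": [5, 5, 7, 7, 7],
    "y": [1.0, 3.0, 4.0, 2.0, 6.0],
    "t": [1, 2, 3, 1, 2],
}).sort_values(["i", "t"]).reset_index(drop=True)

# A new spell starts whenever the worker or the firm differs from the previous row.
new_spell = (long_df["i"] != long_df["i"].shift()) | (long_df["j"] != long_df["j"].shift())
long_df["spell"] = new_spell.cumsum()

# Collapsed Long (sketch): one row per spell, mean income, first and last period.
collapsed = (
    long_df.groupby("spell", as_index=False)
    .agg(i=("i", "first"), j=("j", "first"), y=("y", "mean"),
         t1=("t", "min"), t2=("t", "max"))
    .drop(columns="spell")
)

# Event Study (sketch): pair each observation with the worker's next observation.
# The last observation of each worker has no successor and is dropped here.
nxt = long_df.groupby("i")[["j", "y", "t"]].shift(-1)
event_study = pd.DataFrame({
    "i": long_df["i"],
    "j1": long_df["j"], "j2": nxt["j"],
    "y1": long_df["y"], "y2": nxt["y"],
    "t1": long_df["t"], "t2": nxt["t"],
}).dropna().astype({"j2": int, "t2": int}).reset_index(drop=True)

print(collapsed)
print(event_study)
```

Worker 0 has two spells (firm 5 in periods 1 to 2, then firm 7 in period 3), so the five Long rows collapse to three rows.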

## Constructing DataFrames

Our simulated data is in *Long* format. How do we construct a *Long* dataframe?

In [None]:
i = df['i']
j = df['j']
y = df['y']
t = df['t']
bdf = bpd.BipartiteDataFrame(i=i, j=j, y=y, t=t)
display(bdf)

Is this really a *Long* dataframe? Let's check its type:

In [None]:
type(bdf)

## Before we clean our data, let's check out some statistics

In [None]:
bdf.summary()
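
For a rough idea of the kind of quantities worth checking before cleaning, the sketch below computes a few counts by hand with plain pandas. It is not the output of `summary()`: the toy data and the `stats` dictionary are hypothetical, and "mover" is assumed here to mean a worker observed at more than one firm.

```python
import pandas as pd

# Toy Long data (hypothetical values).
toy = pd.DataFrame({
    "i": [0, 0, 1, 1, 2],
    "j": [5, 7, 7, 7, 5],
    "y": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# Number of distinct firms each worker is observed at.
firms_per_worker = toy.groupby("i")["j"].nunique()

stats = {
    "n_obs": len(toy),
    "n_workers": int(toy["i"].nunique()),
    "n_firms": int(toy["j"].nunique()),
    # Assumption: a mover is a worker observed at more than one firm.
    "n_movers": int((firms_per_worker > 1).sum()),
    "mean_y": float(toy["y"].mean()),
}
print(stats)
```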

## Now let's clean our data - and make sure the result is leave-one-observation-out connected

Hint: want details on all cleaning parameters? Type `bpd.clean_params().describe_all()`, or search through `bpd.clean_params().keys()` for a particular key, and then type `bpd.clean_params().describe(key)`.

In [None]:
bdf = bdf.clean(bpd.clean_params({'connectedness': 'leave_out_observation'}))
display(bdf)

We can check how the summary statistics changed:

In [None]:
bdf.summary()
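
For intuition about what a connectedness requirement means, the sketch below finds sets of firms that are linked by workers who move between them, using a small union-find in pure Python. This illustrates plain connectedness only, which is a weaker condition than the leave-one-observation-out connectedness requested above. The function name `connected_firm_sets` and the toy observations are hypothetical and are not part of BipartitePandas.

```python
from collections import defaultdict

def connected_firm_sets(observations):
    """Group firms into sets linked by workers observed at more than one firm.

    `observations` is an iterable of (worker_id, firm_id) pairs.
    Returns a list of sets of firm ids, largest first.
    """
    parent = {}

    def find(x):
        # Follow parent pointers to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    first_firm = {}
    for i, j in observations:
        parent.setdefault(j, j)
        if i in first_firm:
            # The same worker appears at both firms, so the firms are linked.
            union(first_firm[i], j)
        else:
            first_firm[i] = j

    groups = defaultdict(set)
    for j in parent:
        groups[find(j)].add(j)
    return sorted(groups.values(), key=len, reverse=True)

# Workers 0 and 1 link firms A, B and C; firm D has no movers and stays isolated.
obs = [(0, "A"), (0, "B"), (1, "B"), (1, "C"), (2, "D"), (3, "D")]
firm_sets = connected_firm_sets(obs)
print(firm_sets)
```

Keeping only the observations whose firm lies in the largest set would give a connected sample in this simpler sense.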

## Converting formats

### *Collapsed Long* format

In [None]:
display(bdf.collapse())

### *Event Study* format

In [None]:
display(bdf.to_eventstudy())

### *Collapsed Event Study* format

In [None]:
display(bdf.collapse().to_eventstudy())

## Generating firm clusters

Clustering assigns each firm a type. Notice the new $g$ column.

In [None]:
display(bdf.cluster())
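
To illustrate what assigning a firm type involves, here is a deliberately simple sketch in plain pandas: firms are ranked by mean income and split into equally sized groups. This is an assumed stand-in chosen for clarity, not a description of what `cluster()` does internally; the toy data and the choice of `n_groups = 2` are hypothetical.

```python
import pandas as pd

# Toy Long data with four firms (hypothetical values).
toy = pd.DataFrame({
    "i": [0, 0, 1, 1, 2, 2, 3, 3],
    "j": [10, 10, 11, 11, 12, 12, 13, 13],
    "y": [1.0, 2.0, 5.0, 6.0, 1.5, 2.5, 7.0, 8.0],
})

n_groups = 2

# Firm-level measure: mean income at each firm.
firm_mean = toy.groupby("j")["y"].mean()

# Rank firms by the measure (ties broken by order), then cut ranks into equal groups.
ranks = firm_mean.rank(method="first") - 1
firm_type = (ranks * n_groups // len(firm_mean)).astype(int)

# Attach the firm type to every observation as column g.
toy["g"] = toy["j"].map(firm_type)
print(toy)
```

Here the two lower-paying firms (10 and 12) end up with type 0 and the two higher-paying firms (11 and 13) with type 1.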