# FE example

In [1]:
# Add PyTwoWay to system path (do not run this)
# import sys
# sys.path.append('../../..')

## Import the PyTwoWay package

Make sure to install it using `pip install pytwoway`.

In [2]:
import pytwoway as tw
import bipartitepandas as bpd

## First, check out parameter options

Do this by running:

- FE - `tw.fe_params().describe_all()`

- Cleaning - `bpd.clean_params().describe_all()`

- Simulating - `bpd.sim_params().describe_all()`

Alternatively, run `x_params().keys()` to list all the keys in a parameter dictionary, then `x_params().describe(key)` to get the description of a single key. Here `x_params` stands for any of `tw.fe_params`, `bpd.clean_params`, or `bpd.sim_params`.

## Second, set parameter choices

We set `'copy': False` in `clean_params` to avoid unnecessary copies. Note that this means cleaning will modify the original dataframe.

<div class="alert alert-info">

Hint

If you just want to retrieve the worker and firm effects from the OLS estimation, set `'feonly': True` and `'attach_fe_estimates': True` in your FE parameters dictionary.

If you want the OLS estimates to be linked to the original worker and firm ids, set `track_id_changes=True` when initializing your BipartitePandas DataFrame. After fitting the estimator, run `df = bdf.original_ids()` to extract a Pandas DataFrame with the original ids attached.

</div>

In [3]:
# FE
fe_params = tw.fe_params(
    {
        'he': True
    }
)
# Cleaning
clean_params = bpd.clean_params(
    {
        'connectedness': 'leave_out_spell',
        'collapse_at_connectedness_measure': True,
        'drop_single_stayers': True,
        'drop_returns': 'returners',
        'copy': False
    }
)
# Simulating
sim_params = bpd.sim_params(
    {
        'n_workers': 1000,
        'firm_size': 5,
        'alpha_sig': 2, 'w_sig': 2,
        'c_sort': 1.5, 'c_netw': 1.5,
        'p_move': 0.1
    }
)

## Third, extract data (we simulate for the example)

`BipartitePandas` contains the class `SimBipartite` which we use here to simulate a bipartite network. If you have your own data, you can import it during this step. Load it as a `Pandas DataFrame` and then convert it into a `BipartitePandas DataFrame` in the next step.
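
If you are bringing your own data, a minimal sketch of this step might look like the following. The toy values are made up for illustration, and the column names `i` (worker id), `j` (firm id), `y` (income), and `t` (period) are an assumption based on the columns mentioned in the cleaning log further down; check the BipartitePandas documentation for the exact requirements.

```python
import pandas as pd

# Hypothetical toy panel: one row per worker-period observation.
# Assumed column names: i = worker id, j = firm id, y = income, t = period.
own_data = pd.DataFrame(
    {
        'i': [0, 0, 1, 1, 2, 2],
        'j': [0, 1, 1, 1, 0, 1],
        'y': [1.2, 1.5, 0.9, 1.0, 2.1, 2.3],
        't': [0, 1, 0, 1, 0, 1],
    }
)
# With real data you would typically load from disk instead,
# e.g. own_data = pd.read_csv('my_data.csv')  # hypothetical file name
print(own_data.head())
```

The resulting Pandas DataFrame would then take the place of `sim_data` in the steps that follow.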

In [4]:
sim_data = bpd.SimBipartite(sim_params).simulate()

## Fourth, prepare data

This is exactly how you should prepare real data prior to running the FE estimator.

- First, we convert the data into a `BipartitePandas DataFrame`

- Second, we clean the data (e.g. drop NaN observations, make sure firm and worker ids are contiguous, construct the leave-one-out connected set, etc.). This also collapses the data at the worker-firm spell level (taking mean wage over the spell), because we set `collapse_at_connectedness_measure=True`.

Further details on `BipartitePandas` can be found in the package documentation, available [here](https://tlamadon.github.io/bipartitepandas/).
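
To make the collapsing step concrete, here is a minimal pure-Python sketch of what "collapsing at the worker-firm spell level" means: consecutive observations of the same worker at the same firm are merged into one row whose wage is the spell mean. This illustrates the idea only and is not the BipartitePandas implementation; the helper `collapse_spells` and the toy rows are hypothetical.

```python
from itertools import groupby
from statistics import mean

# Toy worker-period rows (i = worker id, j = firm id, t = period, y = income),
# already sorted by worker and period. Values are made up.
rows = [
    {'i': 0, 'j': 0, 't': 0, 'y': 1.0},
    {'i': 0, 'j': 0, 't': 1, 'y': 3.0},
    {'i': 0, 'j': 1, 't': 2, 'y': 4.0},
    {'i': 1, 'j': 1, 't': 0, 'y': 2.0},
    {'i': 1, 'j': 1, 't': 1, 'y': 2.5},
]

def collapse_spells(rows):
    """Collapse consecutive rows with the same (worker, firm) into one spell."""
    spells = []
    # groupby merges only adjacent rows, so a worker who later returns to a
    # former firm would start a new, separate spell
    for (i, j), group in groupby(rows, key=lambda r: (r['i'], r['j'])):
        group = list(group)
        spells.append({
            'i': i,
            'j': j,
            't_start': group[0]['t'],
            't_end': group[-1]['t'],
            'y': mean(r['y'] for r in group),  # mean wage over the spell
            'n_obs': len(group),
        })
    return spells

spells = collapse_spells(rows)
for spell in spells:
    print(spell)
```

In this toy example, worker 0 contributes two spells (one at firm 0, one at firm 1) and worker 1 contributes a single spell.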

<div class="alert alert-info">

Note

Leave-one-out connectedness is not maintained after data is collapsed at the spell/match level. If you set `collapse_at_connectedness_measure=False`, you must therefore (1) clean the data WITHOUT taking the leave-one-out set, (2) collapse the data at the spell/match level, and (3) only then compute the largest leave-one-out connected set.

</div>

In [5]:
# Convert into BipartitePandas DataFrame
bdf = bpd.BipartiteDataFrame(sim_data)
# Clean and collapse
bdf = bdf.clean(clean_params)

checking required columns and datatypes
sorting rows
dropping NaN observations
generating 'm' column
keeping highest paying job for i-t (worker-year) duplicates (how='max')
dropping workers who leave a firm then return to it (how='returners')
making 'i' ids contiguous
making 'j' ids contiguous
computing largest connected set (how=None)
sorting columns
resetting index
checking required columns and datatypes
sorting rows
generating 'm' column
computing largest connected set (how='leave_out_observation')
making 'i' ids contiguous
making 'j' ids contiguous
sorting columns
resetting index


## Fifth, initialize and run the estimator

In [6]:
# Initialize FE estimator
fe_estimator = tw.FEEstimator(bdf, fe_params)
# Fit FE estimator
fe_estimator.fit()

## Finally, investigate the results

Results correspond to:

- `var_y`: variance of `y` (income) column
- `var_fe`: plug-in (biased) variance estimate
- `cov_fe`: plug-in (biased) covariance estimate
- `var_ho`: homoskedastic-corrected variance estimate
- `cov_ho`: homoskedastic-corrected covariance estimate
- `var_he`: heteroskedastic-corrected variance estimate
- `cov_he`: heteroskedastic-corrected covariance estimate

The variance that is estimated is controlled by the FE parameter `'Q_var'`, and the covariance that is estimated is controlled by the FE parameter `'Q_cov'`.

By default, the variance is `var(psi)` and the covariance is `cov(psi, alpha)`, where `psi` gives firm effects and `alpha` gives worker effects.
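
As a rough illustration of what the plug-in quantities measure, the sketch below computes `var(psi)` and `cov(psi, alpha)` directly from estimated effects attached to each observation. The helper `plug_in_moments` and the toy effects are hypothetical, and the 1/n normalization is an assumption; the package's own weighting and normalization may differ. Because the estimated effects contain estimation noise, these plug-in moments are biased, which is what the homoskedastic and heteroskedastic corrections are meant to address.

```python
def plug_in_moments(psi, alpha):
    """Return (var(psi), cov(psi, alpha)) using population (1/n) moments."""
    n = len(psi)
    psi_bar = sum(psi) / n
    alpha_bar = sum(alpha) / n
    var_psi = sum((p - psi_bar) ** 2 for p in psi) / n
    cov_psi_alpha = sum(
        (p - psi_bar) * (a - alpha_bar) for p, a in zip(psi, alpha)
    ) / n
    return var_psi, cov_psi_alpha

# Hypothetical estimated effects, one entry per observation
psi_hat = [0.0, 0.0, 1.0, 1.0]    # firm effect of the observation's firm
alpha_hat = [0.0, 2.0, 1.0, 3.0]  # worker effect of the observation's worker

var_psi, cov_psi_alpha = plug_in_moments(psi_hat, alpha_hat)
print(var_psi, cov_psi_alpha)
```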

In [7]:
fe_estimator.summary

{'var_y': 6.768657671430493,
 'var_fe': 2.048542936833729,
 'cov_fe': -0.21802690272811778,
 'var_ho': -0.15378156282085254,
 'cov_ho': 1.8111463798461933,
 'var_he': 0.3271993034869083,
 'cov_he': 1.2368180292179174}
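
One common way to read these numbers is as shares of `var_y`. The sketch below copies the values printed above into a plain dictionary and computes, for each estimator, `var(psi) / var(y)` and `2 * cov(psi, alpha) / var(y)`. The factor of 2 assumes the usual additive decomposition of `var(y)` into worker, firm, covariance, and residual terms. Because the data are simulated, the exact values will differ from run to run, and bias-corrected estimates can be negative in small samples.

```python
# Values copied from the summary output above
summary = {
    'var_y': 6.768657671430493,
    'var_fe': 2.048542936833729,
    'cov_fe': -0.21802690272811778,
    'var_ho': -0.15378156282085254,
    'cov_ho': 1.8111463798461933,
    'var_he': 0.3271993034869083,
    'cov_he': 1.2368180292179174,
}

shares = {}
for label in ('fe', 'ho', 'he'):
    # Share of var(y) attributed to firm effects
    var_share = summary[f'var_{label}'] / summary['var_y']
    # Share of var(y) attributed to worker-firm sorting (2 * covariance)
    cov_share = 2 * summary[f'cov_{label}'] / summary['var_y']
    shares[label] = (var_share, cov_share)
    print(f'{label}: var share = {var_share:.3f}, 2*cov share = {cov_share:.3f}')
```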