Retrieving the firm and worker identifiers (Pytwoway 0.1.14.) #5

Closed
k-segiet opened this issue Aug 26, 2021 · 1 comment

My goal is to estimate the firm and worker fixed effects (psi_hat and alpha_hat) and to merge them back into the original population by the firm and worker identifiers (j and i). However, when running the prep_data() function of the TwoWay class, the identifiers i and j are changed: firm identifiers j run from 0 to J - 1 (where J is the number of firms) and worker identifiers i run from 0 to N - 1 (where N is the number of workers).

How could I modify the code so that the original firm and worker identifiers are unchanged, which would enable me to merge the estimated psi_hat and alpha_hat to the original population by the firm and worker identifiers (j and i)?

Thank you for your help.


k-segiet commented Aug 26, 2021

Adam's answer:

Thank you for reaching out!

I wrote up some example code to illustrate how this can be done. This takes advantage of the option include_id_reference_dict in BipartitePandas. Unfortunately this means that the data cleaning must be done manually, but it's just a few extra lines of code.

To run this on your own data, replace sim_data with your own data and delete the line that takes the subset of i < 100.

Also note that I used some options that I added after this issue was raised on GitHub. They make the estimator compute only the fixed effects and skip the variance/covariance estimation.

Best,
Adam

import bipartitepandas as bpd
import pytwoway as tw
import pandas as pd

#### Simulate data
sim_data = bpd.SimBipartite({'nk': 50, 'num_time': 2, 'num_ind': 1000}).sim_network()
#### Manually clean data
bdf = bpd.BipartiteLong(sim_data, include_id_reference_dict=True) # Set include_id_reference_dict=True to save original ids
#### Subset of data so largest connected set is subset of all firms
bdf = bdf[bdf['i'] < 100]
bdf = bdf.clean_data()
bdf.gen_m()

#### Create TwoWay object
tw_net = tw.TwoWay(bdf.original_ids()) # bdf.original_ids() creates a dataframe with columns that give the original ids
#### Skip data cleaning step in TwoWay object, but mark data as clean
tw_net.clean = True

fe_params = {
    'ncore': 1, # Number of cores to use
    'batch': 1, # Batch size to send in parallel
    'ndraw_pii': 50, # Number of draws to use in approximation for leverages
    'levfile': '', # File to load precomputed leverages
    'ndraw_tr': 5, # Number of draws to use in approximation for traces
    'he': False, # If True, compute heteroskedastic correction
    'out': 'res_fe.json', # Output file where results are saved
    'statsonly': False, # If True, return only basic statistics
    'feonly': True, # If True, compute only fixed effects and not variances
    'Q': 'cov(alpha, psi)' # Which Q matrix to consider. Options include 'cov(alpha, psi)' and 'cov(psi_t, psi_{t+1})'
}

#### Since we set 'feonly': True, we just run the estimator normally and it only estimates the fixed effects to save time
tw_net.fit_fe(fe_params)

#### Now look at the data
new_data = tw_net.data
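
For the merge step asked about in the original question, a minimal pandas sketch might look like the following. It uses toy DataFrames in place of tw_net.data and the original population, and it assumes the estimates carry columns named original_i, original_j, alpha_hat and psi_hat. Those column names are assumptions for illustration, so check new_data.columns in your installed version before relying on them.

```python
import pandas as pd

# Toy stand-in for tw_net.data. The column names are assumptions;
# inspect new_data.columns to see what your version actually produces.
estimates = pd.DataFrame({
    'original_i': [101, 101, 205, 205],
    'original_j': ['A', 'B', 'B', 'B'],
    'alpha_hat': [0.5, 0.5, -0.2, -0.2],
    'psi_hat': [1.0, 0.3, 0.3, 0.3],
})

# Toy stand-in for the original population, keyed by the original ids.
# Worker 999 and firm 'C' were not part of the estimation sample.
population = pd.DataFrame({
    'i': [101, 101, 205, 205, 999],
    'j': ['A', 'B', 'B', 'B', 'C'],
    'y': [2.1, 1.4, 0.6, 0.7, 3.0],
})

# One row per worker and one row per firm, renamed to match the population keys.
worker_effects = (
    estimates[['original_i', 'alpha_hat']]
    .drop_duplicates('original_i')
    .rename(columns={'original_i': 'i'})
)
firm_effects = (
    estimates[['original_j', 'psi_hat']]
    .drop_duplicates('original_j')
    .rename(columns={'original_j': 'j'})
)

# Left merges keep every population row; ids outside the estimation
# sample end up with missing effects.
merged = (
    population
    .merge(worker_effects, on='i', how='left', validate='many_to_one')
    .merge(firm_effects, on='j', how='left', validate='many_to_one')
)
print(merged)
```

The validate='many_to_one' argument makes pandas raise an error if a worker or firm appears with more than one effect row, which guards against silently duplicating population rows.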

I would also recommend setting the following for better performance:

bdf = bdf.clean_data({'data_validity': False})

Be careful, though: this isn't designed to work if you manipulate or reformat the data (for instance, converting from long to event study format) after data cleaning, so you should verify that it works properly in your case before committing to using it.
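
One way to do that verification is to check that the relabeled ids and the original ids still correspond one to one. The sketch below is a hypothetical helper written with plain pandas, not part of pytwoway or bipartitepandas, and the column names i and original_i are assumptions for illustration.

```python
import pandas as pd

def is_one_to_one(df, new_col, orig_col):
    """Return True if each new id pairs with exactly one original id and vice versa."""
    pairs = df[[new_col, orig_col]].drop_duplicates()
    return bool(pairs[new_col].is_unique and pairs[orig_col].is_unique)

# Consistent mapping: 0 <-> 101, 1 <-> 205, 2 <-> 317
consistent = pd.DataFrame({
    'i': [0, 0, 1, 2],
    'original_i': [101, 101, 205, 317],
})

# Broken mapping: original id 101 is attached to two different new ids
broken = pd.DataFrame({
    'i': [0, 1, 1],
    'original_i': [101, 101, 205],
})

print(is_one_to_one(consistent, 'i', 'original_i'))
print(is_one_to_one(broken, 'i', 'original_i'))
```

Running the same check on the firm id columns, before and after any reshaping step, would flag a mapping that has drifted out of sync.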
