I2D2 checks assume that njobs and unitwage data are in sync but actually this is not true in data #72

Closed
Tracked by #69
buscandoaverroes opened this issue Jul 20, 2021 · 3 comments · Fixed by #68
Labels
coding issues specific coding issues PHL

Comments

buscandoaverroes (Contributor) commented Jul 20, 2021

In the I2D2 check we assert that unitwage_2 is missing whenever njobs is 0 or missing:

di "check unitwage_2 only for njobs>0"
assert unitwage_2==. if (njobs==0 | njobs==.) & jobs_var == 1         // only perform if njobs exists
}

But this fails for about 20% of observations: in those cases unitwage_2 has a non-missing value even though njobs says there are no jobs (0 or missing). I haven't tested this for unitwage, but I would expect the same result.

The problem is that the assumption behind this check does not hold for this dataset. I think the check assumes that njobs, unitwage, industry, and the other job and labor variables are in sync: if I have 1 job in industry a, that job is reflected in unitwage, in industry, and so on. In this dataset, however, njobs is not consistent with what is reported in the other variables. This is true in other rounds and in other years as well. I think the best solution is simply to remove this check here and treat njobs as whatever is reported.
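To see how large the mismatch is without halting on the first failure, the violations can be counted rather than asserted. This is only a sketch on invented toy data; the variable `mismatch` and the local `n_bad` are hypothetical names, and the `jobs_var == 1` condition from the real check is left out for brevity.

```stata
* Toy data, invented for illustration only (not the survey data).
clear
input njobs unitwage_2
0 .
0 5
. 2
1 3
2 .
end

* Flag rows where unitwage_2 is reported although njobs is 0 or missing.
* missing() also catches extended missing values (.a to .z), unlike ==.
gen byte mismatch = !missing(unitwage_2) & (njobs == 0 | missing(njobs))

* Count the violations instead of stopping, as assert would.
count if mismatch
local n_bad = r(N)
count
display "mismatch share: " %5.3f `n_bad'/r(N)
```

Running the same count on each round and year would show where the check fails and by how much.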

buscandoaverroes mentioned this issue Jul 20, 2021
buscandoaverroes changed the title from "years 2008 + error thrown because njobs data is not consistent with unitwage lstatus industry etc. Edit check." to "I2D2 checks assume that njobs and unitwage data are in sync but actually this is not true in data" Jul 20, 2021
buscandoaverroes (Contributor, Author) commented

The solution would simply be to remove unitwage_2 from the local lb_var. This should be done for all years for consistency, because the logic applies to all years even if the error only appears in some.

local lb_var "empstat_2 industry_2 industry1_2 industry_orig_2 occup_2 wage_2" // unitwage_2

(and later delete the comment)
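As a rough illustration of the effect, assuming the check loops over `lb_var` (the issue does not show the loop itself, so this structure is a guess), dropping unitwage_2 from the list means it is never asserted against njobs. The toy data are invented, the list is shortened to one variable, and the `jobs_var == 1` condition is omitted.

```stata
* Toy data, invented for illustration only.
clear
input njobs wage_2 unitwage_2
0 . 5
1 100 3
. . .
end

* Hypothetical shortened list; unitwage_2 is no longer part of the check.
local lb_var "wage_2"

* Assumed loop structure: assert each listed variable is missing
* whenever njobs is 0 or missing.
foreach var of local lb_var {
    di "check `var' only for njobs>0"
    assert `var' == . if (njobs == 0 | njobs == .)
}
```

With unitwage_2 still in the list, the first toy row (njobs of 0, unitwage_2 of 5) would make the assert fail.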

buscandoaverroes added a commit that referenced this issue Jul 20, 2021
buscandoaverroes (Contributor, Author) commented

The previous commit worked. I applied the change across all code for 2008 onward, since 2008 is the first year with njobs data.

buscandoaverroes added coding issues specific coding issues PHL labels Jul 20, 2021
buscandoaverroes (Contributor, Author) commented

Resolved by the code in the previous commit.

buscandoaverroes linked a pull request Jul 20, 2021 that will close this issue