Inconsistent weight variable for some years #66

Closed
Tracked by #61
buscandoaverroes opened this issue Jul 15, 2021 · 2 comments · Fixed by #67
Labels
PHL question Further information is requested

Comments

@buscandoaverroes
Contributor

buscandoaverroes commented Jul 15, 2021

Some years (listed below) do not have a consistently constructed weight variable across all rounds. As a result, when the rounds are appended, some rounds' observations are disproportionately scaled. There also doesn't seem to be an obvious way to rescale or otherwise manipulate the weight variable so that it can be harmonized. Because some rounds have multiple versions of the weight variable, we can test whether a new scaled_weight_var equals an existing, "correct" final_weight_var:

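* Rescale the suspect weight by the apparent factor of 10,000
gen scaled_weight_var = wrongly_scaled_weight_var*10000
* Require exact equality with the known-good final weight for every observation
assert scaled_weight_var == final_weight_var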

However, the assertion fails for ~30,000 observations, and a visual check shows that the relationship between the different weight variables is clearly not simply linear.
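An exact `==` comparison in Stata can also fail purely because of floating-point storage (`gen` creates a `float` by default), so it may help to separate rounding noise from a genuinely non-linear relationship. The following is only a sketch under that assumption: the variable names are taken from the snippet above, `scaled_dbl` and `weight_ratio` are hypothetical helper names, and the tolerance of `1e-6` is an arbitrary choice.

```stata
* Store the rescaled weight as double to limit rounding error
* (scaled_dbl is a hypothetical helper name)
gen double scaled_dbl = wrongly_scaled_weight_var * 10000

* Count observations that differ by more than a small relative tolerance
* (1e-6 is an arbitrary threshold, not taken from the documentation)
count if reldif(scaled_dbl, final_weight_var) > 1e-6 ///
    & !missing(scaled_dbl, final_weight_var)

* If one weight were a pure rescaling of the other, this ratio
* would be (nearly) constant across observations
gen double weight_ratio = final_weight_var / wrongly_scaled_weight_var
summarize weight_ratio, detail
```

If the count stays near 30,000 and `weight_ratio` varies widely, the mismatch is not a rounding artifact, which would be consistent with the visual check described above.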

This issue applies to years: 2005, 2007

Furthermore, for 2008 there is only one internal weight variable in each round, so there is no opportunity to compare, but the January weight variable has a much larger mean than the other rounds. Should I just scale January down to match the other rounds in this case, without any way to confirm?
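One way to make the 2008 discrepancy concrete is to tabulate the weight by round. This is a sketch that assumes the appended 2008 file has a round identifier, here hypothetically called `round`, and a single weight variable called `weight`; neither name is confirmed by the data.

```stata
* Assumed names: round (survey round identifier), weight (the only weight variable)
* The sum of weights approximates the expanded population, so it should be
* of a similar order of magnitude in every round of the same year
tabstat weight, by(round) statistics(n mean sum) format(%15.2f)
```

A January sum that is larger than the other rounds by a round factor (for example a power of ten) would point to a scaling convention rather than a change in the sample, but that would still need to be checked against published totals.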

Furthermore, for 2009-2011 the problem is similar to 2008, except that the only consistent weight variable is just called weight. There is also a fwgt ("Final weight") variable, which is inconsistently coded across years, yet the documentation is clear that the variable to use is the one indicated as "final weight". However, in 2012-13 the only weight variable available is weight, not fwgt, which leads me to believe that weight may actually be the final weight variable even though it is not labeled "final weight". Further complicating matters, from 2014 onward the variable labeled "Final weight" is in fact the correct, lower-scaled, harmonized weight variable. In summary, I think everything is inconsistent.

So if the rounds do not all have compatible weight variables, what should I do?

@buscandoaverroes buscandoaverroes added question Further information is requested PHL labels Jul 15, 2021
This was referenced Jul 15, 2021
@gronert-m
Contributor

In general, my advice is to compare with the published records of the PSA, as they report percentages as well as absolute numbers. I have checked 2007 and can reproduce the numbers using fwgt throughout. For 2005 I cannot reproduce any of the numbers. We are fairly close, but I do not understand why they do not match exactly.

In preparation for tomorrow I would recommend comparing to the published PSA data with the different weights to understand what the potential issues can be.
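The comparison suggested here could be sketched roughly as follows. All names other than fwgt and weight are assumptions for illustration: `in_lf` is a hypothetical 0/1 labor force indicator, `age` is age in years, `round` identifies the survey round, and the age cutoff of 15 is assumed rather than taken from the survey documentation.

```stata
* Assumed names: in_lf (0/1 labor force indicator), age, round
* Loop over the candidate weight variables discussed in this issue
foreach w of varlist fwgt weight {
    display "Weight variable: `w'"

    * Weighted labor force participation rate by round
    mean in_lf if age >= 15 [pweight = `w'], over(round)

    * Weighted (expanded) number of people in the labor force by round
    total in_lf if age >= 15 [pweight = `w'], over(round)
}
```

A uniform rescaling of the weights within a round leaves the rates from `mean` unchanged but shifts the totals from `total`, so checking both the percentages and the absolute numbers against the published figures helps tell a scaling problem apart from a wrongly chosen weight variable.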

@buscandoaverroes buscandoaverroes linked a pull request Jul 16, 2021 that will close this issue
@buscandoaverroes
Contributor Author

Resolved in #67: concluded that the data in the provided weight variables are good and that the correct variables are used for each round. This was confirmed by checking the labor force participation figures against those of the PSA and finding that they match. For now, documenting the discrepancies in expanded population.

2 participants