Inconsistent weight variable for some years #66

buscandoaverroes · 2021-07-15T16:28:49Z

Some years (listed below) do not have a consistently constructed weight variable across all rounds. As a result, when the rounds are appended, some rounds' observations are disproportionately scaled. There also doesn't seem to be an obvious way to rescale or transform the weight variable so that it can be harmonized. Since some rounds have multiple versions of the weight variable, we can test whether a rescaled weight equals an existing, "correct" final weight variable:

```stata
* Rescale the inconsistently scaled weight and test it against the final weight
gen scaled_weight_var = wrongly_scaled_weight_var*10000
assert scaled_weight_var == final_weight_var
```

However, the assertion fails for ~30,000 observations, and a visual check shows that the mapping between the different weight variables is clearly not a simple linear one.

This issue applies to 2005 and 2007.
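One way to generalise the check above, sketched in Python rather than Stata: instead of assuming the factor is exactly 10000, estimate a single scale factor from the data (here, the median of the per-observation ratios) and count the observations it fails to reproduce. The function name, the relative tolerance, and the plain-list input for one round are assumptions for illustration. If two weight variables differ only by a constant multiplier, the mismatch count is zero; a count in the tens of thousands means no single factor maps one variable onto the other.

```python
from statistics import median


def proportional_mismatches(candidate, reference, rel_tol=1e-6):
    """Estimate one factor k with reference ~= k * candidate and count misses.

    Hypothetical helper. k is the median of the per-observation ratios, so a
    few outliers do not distort it; zero candidate weights are skipped when
    estimating k but still counted as mismatches if the reference is non-zero.
    """
    if len(candidate) != len(reference):
        raise ValueError("weight vectors must have the same length")
    ratios = [r / c for c, r in zip(candidate, reference) if c != 0]
    if not ratios:
        raise ValueError("no non-zero candidate weights")
    k = median(ratios)
    mismatches = sum(
        1
        for c, r in zip(candidate, reference)
        if abs(k * c - r) > rel_tol * max(abs(r), 1.0)
    )
    return k, mismatches


# Toy weights (hypothetical): the first three follow reference = 10000 * candidate.
k, n_bad = proportional_mismatches([1.5, 2.5, 4.0, 2.0], [15000.0, 25000.0, 40000.0, 30000.0])
print(k, n_bad)  # 10000.0 1
```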

Furthermore, for 2008 there is only one internal weight variable in each round, so there is no opportunity to compare, but the January weight variable has a much higher mean. Should I just scale January down to match the other rounds, even without any way to confirm this?
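If January is rescaled anyway, one hedged option is to match its total (the expanded population) to the average total of the other rounds, rather than matching means, since a round's mean weight also depends on its sample size. This rests on the assumption that every round in a year expands to roughly the same population, and it cannot confirm that January's scale is actually wrong. The function and round names below are hypothetical.

```python
def rescale_round(weights_by_round, target_round):
    """Rescale one round so its weight total equals the mean total of the others.

    Hypothetical helper. Assumes all rounds in a year should expand to about
    the same population; returns (factor, rescaled_weights).
    """
    others = [sum(w) for name, w in weights_by_round.items() if name != target_round]
    if not others:
        raise ValueError("need at least one other round to compare against")
    target = weights_by_round[target_round]
    target_total = sum(target)
    if target_total == 0:
        raise ValueError("target round has zero total weight")
    factor = (sum(others) / len(others)) / target_total
    return factor, [w * factor for w in target]


# Toy rounds (hypothetical): January totals 2000, the other rounds total 500 each.
rounds = {
    "jan": [800.0, 1200.0],
    "apr": [200.0, 300.0],
    "jul": [250.0, 250.0],
}
factor, jan_rescaled = rescale_round(rounds, "jan")
print(factor, jan_rescaled)  # 0.25 [200.0, 300.0]
```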

For 2009-2011, the problem is similar to 2008, except that the only consistent weight variable is simply called weight. There is also an fwgt ("Final weight") variable, which is coded inconsistently across years, but the documentation is clear that this is the variable to use: it says to use the variable labeled "final weight". However, in 2012-2013 the only weight variable available is weight, not fwgt, which leads me to believe that weight may actually be the final weight variable even though it is not labeled "final weight". Further complicating matters, from 2014 onward the variable labeled "Final weight" is in fact the correct, lower-scaled, harmonized weight variable. In summary, everything is inconsistent.

So, if the rounds do not all have compatible weight variables, what should I do?


gronert-m · 2021-07-15T21:54:25Z

In general, my advice is to compare against the published PSA records, as they report percentages as well as absolute numbers. I have checked 2007 and can reproduce the numbers with fwgt in every case. For 2005, I can't reproduce any of the published numbers: we are fairly close, but I cannot understand why they don't match.

In preparation for tomorrow, I would recommend computing the estimates with each of the different weights and comparing them to the published PSA data, to understand what the potential issues might be.
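The distinction between percentages and absolute numbers matters for this comparison: multiplying every weight by a constant leaves a weighted rate unchanged but changes the expanded totals, so a matching labor force participation rate does not by itself confirm the scale of a weight. A minimal Python sketch, assuming one list of labor force indicators (1 = in the labor force) for the working-age respondents of a single round and a dict of candidate weight variables; all names and the toy numbers in the usage example are hypothetical, not PSA figures.

```python
def weighted_estimates(in_labor_force, weights):
    """Return (rate in percent, expanded labor force, expanded population)."""
    if len(in_labor_force) != len(weights):
        raise ValueError("indicator and weight lists must have the same length")
    population = sum(weights)
    if population == 0:
        raise ValueError("weights sum to zero")
    labor_force = sum(w for lf, w in zip(in_labor_force, weights) if lf)
    return 100.0 * labor_force / population, labor_force, population


def compare_to_published(in_labor_force, candidates, published_rate, published_population):
    """For each candidate weight: (rate gap in percentage points, population gap)."""
    report = {}
    for name, weights in candidates.items():
        rate, _, population = weighted_estimates(in_labor_force, weights)
        report[name] = (rate - published_rate, population - published_population)
    return report


# Toy data (hypothetical): wgt_b is wgt_a multiplied by 10.
lf = [1, 1, 0, 1]
candidates = {
    "wgt_a": [100.0, 200.0, 100.0, 100.0],
    "wgt_b": [1000.0, 2000.0, 1000.0, 1000.0],
}
print(compare_to_published(lf, candidates, published_rate=80.0, published_population=500.0))
# {'wgt_a': (0.0, 0.0), 'wgt_b': (0.0, 4500.0)}
```

Both candidates reproduce the rate exactly; only the expanded population tells them apart.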

buscandoaverroes · 2021-07-16T18:56:34Z

Resolved in #67: we concluded that the data in the provided weight variables are good and that the correct variable is used for each round. This was confirmed by checking the labor force participation figures against those published by the PSA. For now, the discrepancies in the expanded population are documented.

buscandoaverroes added the question ("Further information is requested") and PHL labels Jul 15, 2021

This was referenced Jul 15, 2021

PHL debug to-do list #61

Closed

Merge small and large fixes #46

Merged

buscandoaverroes linked a pull request Jul 16, 2021 that will close this issue

Documentation for weights #67

Merged

buscandoaverroes mentioned this issue Jul 16, 2021

Documentation for weights #67

Merged

buscandoaverroes closed this as completed Jul 16, 2021
