How to define the cutoff threshold? #17

Closed
seb-garcia opened this issue Nov 17, 2023 · 3 comments
Labels
enhancement New feature or request question Further information is requested

Comments

@seb-garcia

Hello from Peru! I am very interested in the work you have done with the MPI. It has been a great help on a report we are writing to assess the living conditions of Venezuelan refugees and migrants in Peru.

I am trying to apply your package to an NSO survey of the Venezuelan population in Peru (ENPOVE 2022)*. Our repository is at this GitHub repo.

I am having trouble understanding how the cutoff threshold works. First, I tried a condition written in dplyr grammar:

deprivation_profile$year_schooling <- df_household_roster |>
  define_deprivation(
    .indicator = year_schooling,
    .cutoff = (P501 < 6 | P501B < 6) & P205_A > 17,
    .collapse = TRUE
  )

And I get this warning:

Warning: There were 1140 warnings in `dplyr::summarise()`.
The first warning was:
ℹ In argument: `Years of schooling = max(`Years of schooling`, na.rm = T)`.
ℹ In group 17: `uuid = "00038002381"`.
Caused by warning in `max()`:
! no non-missing arguments to max; returning -Inf
ℹ Run `dplyr::last_dplyr_warnings()` to see the 1139 remaining warnings.

However when I do:

deprivation_profile$year_schooling <- df_household_roster |>
  define_deprivation(
    .indicator = year_schooling,
    # .cutoff = (P501 < 6 | P501B < 6) & P205_A > 17,
    .cutoff = year_schooling == 0,
    .collapse = TRUE
  )

The code runs without any warnings.

Do you know what is causing this? Could you help us figure out the problem?

Moreover, I assume that once I get the deprivation_matrix as output, I can apply a weighting vector to extrapolate the results. Would that work?

Thank you so much for your help!

*Microdata can be downloaded from INEI's website:
For Household

For household members

@yng-me yng-me added enhancement New feature or request question Further information is requested labels Nov 17, 2023
@yng-me
Owner

yng-me commented Nov 17, 2023

Hi, @seb-garcia. On your first question, the condition (P501 < 6 | P501B < 6) & P205_A > 17 seems to evaluate to NA for some households. Here's my workaround, but you should decide how to treat these NAs. In my case, I coerced NA to 0.

deprivation_profile$year_schooling <- df_household_roster |>
  # NA results of the condition are mapped to 0 (not deprived) via `missing`
  mutate(deprived_year_schooling = if_else((P501 < 6 | P501B < 6) & P205_A > 17, 1, 0, missing = 0)) |>
  define_deprivation(
    .indicator = year_schooling,
    .cutoff = deprived_year_schooling == 1,
    .collapse = TRUE
  )

I will add an argument to define_deprivation that controls how NAs produced by evaluating the deprivation cutoff are treated, so you won't need this extra data transformation step. Watch out for the next release.
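To make the NA behaviour concrete, here is a minimal base R sketch. It assumes, judging only from the warning text, that the household-level collapse takes max(..., na.rm = TRUE) per uuid; the toy data frame and its values are made up for illustration and are not from ENPOVE.

```r
# Toy stand-ins for the survey columns; values are invented for illustration.
toy <- data.frame(
  uuid   = c("a", "a", "b", "b"),
  P501   = c(3, NA, NA, NA),
  P501B  = c(NA, 8, NA, NA),
  P205_A = c(30, 25, 40, NA)
)

# A comparison with NA is NA unless the other operand already settles the
# result (TRUE | NA is TRUE, FALSE & NA is FALSE).
cond <- (toy$P501 < 6 | toy$P501B < 6) & toy$P205_A > 17

# Household "b" has only NA values, so nothing is left after na.rm = TRUE
# and max() falls back to -Inf (with the warning quoted in this thread).
all_na_max <- suppressWarnings(max(cond[toy$uuid == "b"], na.rm = TRUE))

# Coercing NA to 0 before collapsing gives every household a defined value.
deprived <- ifelse(is.na(cond), 0, as.integer(cond))
hh <- tapply(deprived, toy$uuid, max)
```

Whether NA should count as "not deprived" (0), "deprived" (1), or be excluded is an analytical choice that depends on why the answers are missing.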

@yng-me
Owner

yng-me commented Nov 17, 2023

On your other query: yes, you can apply a weighting vector to the deprivation_matrix object returned by compute_mpi. Note, though, that compute_mpi already applies the weights you define in your specification file under the hood.
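If the weighting vector in question is a set of survey sampling (expansion) weights, which is an assumption here and may differ from the weights set in the specification file, the extrapolation could look roughly like the sketch below. The data frame and the column names is_poor and svy_weight are hypothetical and are not taken from the package's output.

```r
# Hypothetical household-level result: a 0/1 poverty flag and a sampling
# weight per household. Neither column name comes from the package.
hh <- data.frame(
  uuid       = c("a", "b", "c", "d"),
  is_poor    = c(1, 0, 1, 0),
  svy_weight = c(120, 80, 50, 150)
)

# Share of sampled households flagged as poor, ignoring the survey design.
unweighted_h <- mean(hh$is_poor)

# Weighted share: each household counts in proportion to its weight.
weighted_h <- weighted.mean(hh$is_poor, w = hh$svy_weight)
```

The same pattern applies column by column to 0/1 indicator columns, provided the weights are attached at the same unit of analysis (household or person) as the rows.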

yng-me pushed a commit that referenced this issue Dec 26, 2023
@yng-me
Owner

yng-me commented Dec 26, 2023

See #18

@yng-me yng-me closed this as completed Dec 26, 2023