
As user, I would like to optimise the data collection process #27

Open
6 tasks
Edouard-Legoupil opened this issue Sep 27, 2023 · 1 comment
Labels
enhancement New feature or request

Comments


Edouard-Legoupil commented Sep 27, 2023

The goal of this ticket is to develop the back-office function and logic behind the data collection step.

The module script is here: https://github.com/unhcr-americas/surveyDesigner/blob/main/R/mod_collection.R

From the previous stage, we have used filters based on indicator selection and language (the label language used for the country) to subset a list of questions and their potential answers (for select_one or select_multiple questions).
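As an illustration of that subsetting step (a Python sketch, not the actual R module; the field names `indicator` and `label::en` are hypothetical stand-ins for the XLSForm-style columns):

```python
# Hypothetical question rows, mimicking an XLSForm survey sheet with an
# added indicator-mapping column. Field names are illustrative assumptions.
questions = [
    {"name": "age", "type": "integer", "indicator": "demographics", "label::en": "Age?"},
    {"name": "income", "type": "integer", "indicator": "economy", "label::en": "Monthly income?"},
    {"name": "hh_size", "type": "integer", "indicator": "demographics", "label::en": "Household size?"},
]

def subset_questions(questions, selected_indicators, language="en"):
    """Keep only questions mapped to the selected indicators,
    carrying the label for the requested language."""
    label_col = f"label::{language}"
    return [
        {"name": q["name"], "type": q["type"], "label": q[label_col]}
        for q in questions
        if q["indicator"] in selected_indicators
    ]

print(subset_questions(questions, {"demographics"}))
```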

In the collection stage, we need to assess whether the full questionnaire should be split into different parts, aka data collection waves, using:

  • interview duration (each question within the form can be assessed with the interview-duration function),

  • question groups (aka a module of questions grouped between 'begin_group' and 'end_group'),

  • indicator requirements (aka, based on the mapping, multiple questions potentially spread over several modules, together with the linked questions required for indicator disaggregation),

  • data collection mode, as it impacts the sequence of the questions (in CAPI, sensitive questions are typically kept towards the end, while the opposite holds for CATI), and

  • an estimation of the response rate based on average interview duration. The longer a survey is, the higher the risk of dropout; the impact of designing a long survey can therefore be estimated through the cost of reaching people whose information will not be recorded (essentially, the total cost per interview becomes a function of the response rate). See this publication: Optimizing Data Collection Interventions to Balance Cost and Quality in a Sequential Multimode Survey.
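The splitting logic above could be sketched roughly as follows (Python, purely illustrative: a greedy pass that keeps begin_group/end_group blocks whole and closes a wave once the estimated duration would exceed a budget; per-group durations are assumed inputs from the interview-duration function):

```python
def split_into_waves(groups, max_minutes):
    """groups: list of (group_name, estimated_minutes) tuples.
    Returns a list of waves, each a list of group names.
    Groups are never split across waves."""
    waves, current, current_minutes = [], [], 0.0
    for name, minutes in groups:
        # Close the current wave if adding this group would blow the budget.
        if current and current_minutes + minutes > max_minutes:
            waves.append(current)
            current, current_minutes = [], 0.0
        current.append(name)
        current_minutes += minutes
    if current:
        waves.append(current)
    return waves

# Hypothetical modules with estimated durations in minutes.
groups = [("consent", 2), ("demographics", 8), ("livelihoods", 12),
          ("protection", 10), ("wash", 6)]
print(split_into_waves(groups, max_minutes=20))
# → [['consent', 'demographics'], ['livelihoods'], ['protection', 'wash']]
```

A real implementation would additionally need to respect the indicator-requirement constraint (questions feeding one indicator kept in the same wave), which turns this into a constrained partitioning problem rather than simple greedy packing.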

We shall also estimate an operational budget, based on costing inputs (aka enumeration capacity and total cost per interview) and various respondent sample size thresholds (500, 1,000, 5,000).

This should be done by assessing the current decision inputs and simulating the results of alternative choices.
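A minimal sketch of such a budget simulation (Python, illustrative only; the linear attrition model below is a placeholder assumption, not the model from the publication cited above):

```python
def estimate_budget(target_sample, cost_per_interview, duration_minutes,
                    base_response_rate=0.9, dropout_per_minute=0.01):
    """Effective cost of obtaining `target_sample` completed interviews when
    longer interviews raise dropout: attempts = target / response_rate.
    The attrition parameters are illustrative placeholders."""
    response_rate = max(0.1, base_response_rate - dropout_per_minute * duration_minutes)
    attempts_needed = target_sample / response_rate
    return round(attempts_needed * cost_per_interview, 2)

# Sample size thresholds from the ticket, with assumed unit cost and duration.
for n in (500, 1000, 5000):
    print(n, estimate_budget(n, cost_per_interview=5.0, duration_minutes=30))
```

Simulating alternatives then amounts to re-running this with different durations (i.e. different wave splits) and comparing the projected budgets.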

Client - Validation

  • Specification of the input data required to smartly split a too-lengthy questionnaire
  • Output one or several split questionnaires (one per wave and data collection mode)
  • Output a summary of what has been done, and of what else could be done and with what advantage


Dev - Tech

  • Might need to rework the current input data in order to build the use case
  • The output should suggest adjustment parameters (e.g. increasing the number of data collection waves) together with recommendations and projected budgets per scenario
  • Technical validation (tests, checks, etc.)

Edouard-Legoupil commented Sep 27, 2023

Based on the discussion this AM, I have revised the logic in the interface.

The need is now scoped across two distinct functions:

  1. A function to optimize the generation of the n surveys (aka waves) based on a list of questions

  2. A simplex function to optimize the trade-off between the number of data collection waves and cost, based on:

    • the indicators that are required,
    • the data collection mode,
    • attrition (attempts, and the correlation between duration and drop-off), and
    • capacity (total cost per interview, number of enumerators).
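The second function could look roughly like this (Python sketch; a real implementation would likely use a linear programming solver such as lpSolve in R for the simplex step, while here a brute-force scan over candidate wave counts stands in for it; all cost and attrition parameters are illustrative assumptions):

```python
def total_cost(n_waves, total_minutes, target_sample, cost_per_interview,
               fixed_cost_per_wave=2000.0, base_rate=0.9, dropout_per_minute=0.01):
    """More waves => shorter interviews => lower dropout, but each wave adds a
    fixed re-contact/setup cost. All parameters are illustrative placeholders."""
    minutes_per_wave = total_minutes / n_waves
    rate = max(0.1, base_rate - dropout_per_minute * minutes_per_wave)
    attempts = n_waves * target_sample / rate  # each wave re-contacts the sample
    return attempts * cost_per_interview + n_waves * fixed_cost_per_wave

def best_wave_count(max_waves, **kwargs):
    """Pick the wave count with the lowest projected total cost."""
    costs = {n: total_cost(n, **kwargs) for n in range(1, max_waves + 1)}
    return min(costs, key=costs.get)

print(best_wave_count(4, total_minutes=80, target_sample=1000, cost_per_interview=5.0))
# → 2  (splitting an 80-minute form once beats one long survey or three short ones)
```

Reporting the cost per candidate wave count, rather than only the optimum, would directly feed the "recommendations with projected budgets per scenario" output requested above.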
