D Annex Documentation of referential and questionnaires

Edouard Legoupil edited this page Apr 24, 2023 · 31 revisions

referential

The referential is the central piece of the {SurveyDesigner}. It is centrally managed to preserve its integrity and documents the indicators, their linked questions and the geographic referential. The referential is then translated into a distinct context for each country. This includes adjusting the labels of both questions and modalities. A context can also include ad-hoc questions and indicators.

Questionnaires are generated from the context matching the country and include one or more XlsForms, i.e. the definition of a form that can be used with multiple types of data collection servers.

referential_type

There can be several distinct series of referentials, each matching a very different data collection methodology (type of sampling, representativeness of the person interviewed).

  • referential_id
  • type
  • description

For instance:

  • household_survey - a set of indicators to be measured on a stock population
  • flow_monitoring - a set of indicators specific to populations in transit or on the move - not planning to establish a habitual residence for more than a year
  • key_informant - a set of indicators (mostly qualitative) collected from "persons with knowledge"
  • beneficiary_monitoring - a set of indicators to be collected on a regular basis from the beneficiaries of a specific programme

For the initial prototype, we will focus on household_survey. The business logic should nevertheless work independently of the type of referential, making the application reusable and adaptable in multiple contexts and potentially in different organizations.

Once loaded, the referential should be searchable in a similar way to the International Household Survey Question Bank (see also UNHCR Question Bank node), but with a "ready to use" solution for the design of survey instruments (i.e. already adapted to the specific implementation context).

Below is a description of each worksheet within the spreadsheet. A link to an existing standard is included whenever possible.

survey

For the questions part, the referential matches the standard xlsform structure with additional columns:

  • referential_id
  • type xlsform Ref
  • name unique id for the variable - defined as a standard global codebook
  • label - This is the master version for the label - defined in English by convention and by default - The labels can be translated and contextualized for each country based on predefined instructions
  • hint xlsform Ref - This is the master version for the hint label - defined in English
  • required xlsform Ref
  • required_message xlsform Ref - defined in English - The labels can be contextualized for each country based on predefined instructions
  • constraint xlsform Ref
  • constraint_message xlsform Ref defined in English - The labels can be contextualized for each country based on predefined instructions
  • relevant xlsform Ref
  • appearance xlsform Ref
  • calculation xlsform Ref
  • trigger xlsform Ref
  • parameters xlsform Ref
  • repeat_count xlsform Ref
  • default xlsform Ref
  • read_only xlsform Ref
  • choice_filter xlsform Ref
  • media::image xlsform Ref
  • contextualize boolean indicates if contextualizing the label is allowed
  • contextualize_instruction string in case contextualizing the label is allowed, instruction to follow to guide the contextualization
  • block This is a way to note how different questions should remain grouped together - See also https://xlsform.org/en/#grouping-questions
  • block_sequence integer defining the sequence for this block within the interview flow
  • sequence integer defining the sequence for the variable within the block
  • mode factor with possible values ALL, CAPI, CATI and CAWI; the last three stand for Computer Assisted Personal Interviewing, Computer Assisted Telephone Interviewing and Computer Assisted Web Interviewing. cf an explanation here
  • check defines which type, if any, of High Frequency Check should be applied using this variable
  • accuracy expected level of accuracy for the indicators - can be used to prioritize the indicators over multiple data collection waves
  • chapter string defines a high level research question - used to group variables together when generating the automatic data exploration report with kobocruncher
  • subchapter string defines a high level sub research question - used to group variables together when generating the automatic data exploration report with kobocruncher
  • labelReport - This should be a short label for the variable - less than 80 char - will be used for reporting purpose so that it displays well in a chart - used by kobocruncher
  • hintReport Can be used to provide a longer description of the variable - as well as potential green, orange, red standard threshold values for the variable when interpreting it - used by kobocruncher
  • keyword list of associated keyword as defined in RIDL - cf schema
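
As an illustration of how block_sequence, sequence and mode could drive the interview flow, here is a minimal Python sketch. The sample rows and the helper name questions_for_mode are hypothetical, and treating ALL as matching every mode is an assumption based on the column description above.

```python
# Hypothetical survey rows: only the columns needed for ordering are shown.
survey = [
    {"name": "hh_size", "block": "household", "block_sequence": 2, "sequence": 1, "mode": "ALL"},
    {"name": "consent", "block": "intro", "block_sequence": 1, "sequence": 1, "mode": "ALL"},
    {"name": "gps", "block": "intro", "block_sequence": 1, "sequence": 2, "mode": "CAPI"},
    {"name": "hh_head_age", "block": "household", "block_sequence": 2, "sequence": 2, "mode": "CATI"},
]


def questions_for_mode(rows, mode):
    """Keep rows usable in `mode`, ordered by block, then by position in the block."""
    # Assumption: a row tagged ALL is valid for every data collection mode.
    kept = [r for r in rows if r["mode"] in ("ALL", mode)]
    return sorted(kept, key=lambda r: (r["block_sequence"], r["sequence"]))


capi_flow = [r["name"] for r in questions_for_mode(survey, "CAPI")]
print(capi_flow)
```

Sorting on the pair (block_sequence, sequence) keeps the questions of a block together while still allowing blocks to be reordered for a given questionnaire.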

choices

Many response options should also map to established classifications.

  • referential_id
  • list_name this should be referenced within type when type starts with select_one or select_multiple
  • name unique id for the modality - defined as a standard global codebook
  • label - This is the master version for the label - defined in English - The labels can be contextualized for each country based on predefined instructions
  • order integer - if null the variable is not considered as ordinal - used by kobocruncher
  • labelReport - This should be a short label for the modality - less than 40 char - will be used for reporting purpose so that it displays well in a chart - used by kobocruncher
  • contextualize boolean indicates if contextualizing the label is allowed
  • contextualize_instruction string in case contextualizing the label is allowed, instruction to follow to guide the contextualization
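
The link between type in survey and list_name in choices can be checked mechanically. The sketch below is one possible way to do it in Python; the sample rows and the helper name missing_choice_lists are hypothetical.

```python
# Hypothetical rows from the survey and choices worksheets.
survey = [
    {"type": "select_one yes_no", "name": "consent"},
    {"type": "select_multiple needs", "name": "priority_needs"},
    {"type": "integer", "name": "hh_size"},
]
choices = [
    {"list_name": "yes_no", "name": "yes", "label": "Yes"},
    {"list_name": "yes_no", "name": "no", "label": "No"},
]


def missing_choice_lists(survey_rows, choice_rows):
    """Return (question name, list_name) pairs whose list is absent from choices."""
    defined = {c["list_name"] for c in choice_rows}
    missing = []
    for row in survey_rows:
        parts = row["type"].split()
        # Only select_one / select_multiple types reference a list_name.
        if len(parts) > 1 and parts[0] in ("select_one", "select_multiple"):
            if parts[1] not in defined:
                missing.append((row["name"], parts[1]))
    return missing


missing = missing_choice_lists(survey, choices)
print(missing)
```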

indicator

Indicators represent variables that are calculated from the variables directly collected. Indicators can be the final metrics used for analysis or simply auxiliary, meaning used for disaggregation. cf Question Bank

  • referential_id
  • type should be either select_one or numeric
  • name unique id for the indicator - defined as a standard global codebook
  • labelReport - This should be a short label for the indicators - less than 80 char - will be used for reporting purpose so that it displays well in a chart - used by kobocruncher
  • hintReport Can be used to provide a longer description of the indicators - as well as potential green, orange, red standard threshold value for the indicator when interpreting it.. - used by kobocruncher
  • list_name in case the indicator result is discrete, a reference to the list within choices that holds the labels to use
  • repeatvar in case of multiple frames in the dataset, indicates in which frame the indicator should be appended
  • ind_type indicates whether the indicator is a population, a disaggregation, a final indicator or only an auxiliary one (meaning an intermediate calculation done to build the final indicators)
  • sequence integer - used to define an order - important to ensure that auxiliary variables are created first in order to calculate the final indicators
  • block This is a way to note how different indicators should be consistently calculated together - it eases the quick selection of multiple indicators with a single instruction - for instance all indicators that are linked to the same selected impact or outcome
  • chapter string defines a high level research question - used to group variables together when generating the automatic data exploration report with kobocruncher
  • subchapter string defines a high level sub research question - used to group variables together when generating the automatic data exploration report with kobocruncher
  • calculation R statement used to create the indicator based on the standard global codebook, assuming that the data object built from the dataset exported from kobo is a kobocruncher datalist
  • unit string unit for the indicator
  • accuracy expected level of accuracy for the indicators - can be used to prioritize the indicators over multiple data collection waves
  • mode_CAPI boolean indicates if the indicator can be collected with CAPI - Computer Assisted Personal Interviewing - then requires that there's an entry in choices[["mode"]] with this specific mode
  • mode_CATI boolean indicates if the indicator can be collected with CATI - Computer Assisted Telephone Interviewing - then requires that there's an entry in choices[["mode"]] with this specific mode
  • mode_CAWI boolean indicates if the indicator can be collected with CAWI - Computer Assisted Web Interviewing - then requires that there's an entry in choices[["mode"]] with this specific mode
  • metadata indicates limitations for the indicator - this provides in-depth documentation on the indicator concept and methodology
  • link provides a link to any established official documentation on the indicator
  • keyword list of associated keyword as defined in RIDL - cf schema
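
To illustrate how sequence and the mode_* flags could be combined, the following Python sketch lists the indicators that can be collected with a given mode, in calculation order. The indicator names, the ind_type strings and the helper calculation_order are hypothetical; the actual calculation statements are written in R and are not reproduced here.

```python
# Hypothetical indicator rows: only ordering and mode columns are shown.
indicators = [
    {"name": "ind_water_access", "ind_type": "final", "sequence": 2,
     "mode_CAPI": True, "mode_CATI": True, "mode_CAWI": False},
    {"name": "aux_water_source_improved", "ind_type": "auxiliary", "sequence": 1,
     "mode_CAPI": True, "mode_CATI": True, "mode_CAWI": False},
    {"name": "ind_shelter_quality", "ind_type": "final", "sequence": 3,
     "mode_CAPI": True, "mode_CATI": False, "mode_CAWI": False},
]


def calculation_order(rows, mode):
    """Names of indicators collectable with `mode`, sorted by sequence."""
    flag = f"mode_{mode}"
    usable = [r for r in rows if r.get(flag)]
    # A lower sequence is calculated first, so auxiliaries precede final indicators.
    return [r["name"] for r in sorted(usable, key=lambda r: r["sequence"])]


cati_order = calculation_order(indicators, "CATI")
print(cati_order)
```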

indicator_survey

This table makes it possible to check that all the variables (aka survey questions) required to calculate the indicator are available:

  • referential_id
  • name unique id for the indicator
  • name_survey unique id for the variable as defined in survey
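
A minimal Python sketch of the check this table enables is given below. The sample data and the helper name missing_variables are hypothetical.

```python
# Hypothetical set of variable names defined in the survey worksheet.
survey_names = {"water_source", "water_distance", "hh_size"}

# Hypothetical indicator_survey rows linking indicators to required variables.
indicator_survey = [
    {"name": "ind_water_access", "name_survey": "water_source"},
    {"name": "ind_water_access", "name_survey": "water_distance"},
    {"name": "ind_crowding", "name_survey": "hh_size"},
    {"name": "ind_crowding", "name_survey": "rooms_count"},
]


def missing_variables(links, available):
    """Map each indicator to the required variables that are not in `available`."""
    missing = {}
    for link in links:
        if link["name_survey"] not in available:
            missing.setdefault(link["name"], []).append(link["name_survey"])
    return missing


gaps = missing_variables(indicator_survey, survey_names)
print(gaps)
```

An empty result means every indicator can be calculated from the questions available in survey.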

indicator_choices

This table makes it possible to check that all the modalities (aka response options) required by the questions used to calculate the indicator are available:

  • referential_id
  • name unique id for the indicator
  • name_choices unique id concatenating the list_name and name from choices

indicator_population

This table maps the one-to-many relation between one indicator and all the population groups that the indicator can apply to.

  • referential_id
  • name unique id for the indicator
  • name_population factor - for instance for household survey this can be either "Refugees (REF)", "Asylum seekers (ASY)", "Internally displaced persons (IDP)", "Other people in need of international protection (OIP)", "Stateless Persons (STA)", "Others of concern to UNHCR (OOC)" or "Host community (HCT)"
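
The one-to-many mapping can be turned into a lookup from population group to indicators, for example with the Python sketch below. The indicator names and the helper indicators_by_population are hypothetical; the population labels are taken from the list above.

```python
from collections import defaultdict

# Hypothetical indicator_population rows.
indicator_population = [
    {"name": "ind_water_access", "name_population": "Refugees (REF)"},
    {"name": "ind_water_access", "name_population": "Host community (HCT)"},
    {"name": "ind_documentation", "name_population": "Stateless Persons (STA)"},
]


def indicators_by_population(links):
    """Group indicator names under each population group they apply to."""
    grouped = defaultdict(list)
    for link in links:
        grouped[link["name_population"]].append(link["name"])
    return dict(grouped)


by_population = indicators_by_population(indicator_population)
print(by_population["Refugees (REF)"])
```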

indicator_disaggregation

This table maps the one-to-many relation between one indicator and all the disaggregation variables (another indicator or a survey name) that can be expected for that indicator.

  • referential_id
  • name unique id for the indicator
  • name_disaggregation references either an indicator or directly a variable from survey. Should be a factor - for instance in a household survey it could be among Age, Gender, Disability, Site

context

A context reflects the implementation of the referential within a specific country or operation. Contexts are expected to enforce full data integrity with the main referential, meaning that all centrally defined indicators should have their corresponding survey questions and responses available in every context.

context_geography

  • context_id
  • region tag the region associated with the country - used to identify the relevant Regional Survey Support
  • country country iso code alpha3 - also defined in the choices table
  • geo additional geographic_id

context_language

This table maps the one-to-many relation between one country and all the languages that can be used in that context.

context_survey

Once the language requirements have been defined for each country, the survey questions should be contextualized, in line with Global Recommendations, through a dialog between the relevant Regional Survey Support and the Operation Survey Focal Point. This can include the addition of ad-hoc, context-specific questions.

  • context_id
  • name
  • language the suffix indicates the language and should comply with the language referential from the Internet Assigned Numbers Authority (IANA) - this column can be repeated for as many languages as needed
  • label
  • hint
  • required_message
  • constraint_message
  • question_type indicates whether this is an ad-hoc, context-specific question that was created during the context creation through a dialog between the regional survey support and the operation survey coordinator
  • contextualization_note
  • duration integer represents the number of seconds necessary to read the question and the linked answers (if the question is of type select). This can be estimated by a function like interview_duration. This variable is used to check whether the total interview remains within an acceptable duration, i.e. 40 to 50 minutes, and if needed to suggest splitting the questionnaire across multiple data collection waves
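
The duration check can be sketched in Python as follows. The sample durations and the helper name suggested_waves are hypothetical, and using the upper bound of 50 minutes as the threshold is an assumption; this does not reproduce interview_duration, which estimates the per-question values.

```python
import math

# Assumption: the upper end of the 40 to 50 minute range is used as the limit.
MAX_INTERVIEW_SECONDS = 50 * 60

# Hypothetical context_survey rows with an estimated duration in seconds.
context_survey = [
    {"name": "consent", "duration": 45},
    {"name": "hh_roster", "duration": 1800},
    {"name": "water_module", "duration": 900},
    {"name": "shelter_module", "duration": 1200},
]


def suggested_waves(rows, max_seconds=MAX_INTERVIEW_SECONDS):
    """Return the total duration and the minimum number of waves to stay under the limit."""
    total = sum(r["duration"] for r in rows)
    return total, max(1, math.ceil(total / max_seconds))


total_seconds, waves = suggested_waves(context_survey)
print(total_seconds, waves)
```

This gives only a lower bound on the number of waves; the actual split would also depend on which blocks must stay together and on the accuracy expected for each indicator.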

context_choice

Once the language requirements have been defined for each context, the choices should be contextualized, in line with Global Recommendations, through a dialog between the relevant Regional Survey Support and the Operation Survey Focal Point.

Note that the geographic referential is managed under choices - it can be filtered with the 2 dedicated list_name values: country & admin1. Geographic referential names should align with the Common Operational Dataset Pcodes.

questionnaires

They represent an object containing one or more fully compliant xlsform objects, created out of the referential.

All of the xlsform objects should be valid. Validation can be done using the dedicated Python package pyxform.

Each xlsform object should at least correspond to a combination of data collection mode and wave, for instance CATI_wave1, CAPI_wave1, CAPI_wave2, etc.
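
As a small illustration of this naming scheme, the Python sketch below enumerates identifiers for every mode and wave combination. The helper name form_ids is hypothetical, and a real questionnaire may only use some of the combinations.

```python
from itertools import product


def form_ids(modes, n_waves):
    """Build identifiers such as CAPI_wave1 for each mode and wave number."""
    waves = range(1, n_waves + 1)
    return [f"{mode}_wave{wave}" for mode, wave in product(modes, waves)]


ids = form_ids(["CAPI", "CATI"], 2)
print(ids)
```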

In order to build the questionnaires from the referential, the survey manager will need to set up a series of filters and include some additional information for contextualization:

  1. select which context to use - i.e. select one country - this will filter the questions and the choice options accordingly, as well as the language translations

  2. select the target population - this will filter which indicators can be calculated

  3. select a topic and then the linked indicators, indicator blocks or survey questions - cf above definition of indicators - this will select only from non-auxiliary indicators

  4. select data collection mode

  5. indicate how many data collection waves can be organized. This shall be done based on the analysis of questionnaire duration and indicator accuracy

  6. Define the settings (see xlsform Ref) within each sub-questionnaire

  7. Adjust the default block_sequence
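
Steps 2 to 4 can be read as successive filters over the referential tables. The Python sketch below shows one possible composition; all sample rows and the helper name select_questions are hypothetical, and treating ALL as matching every mode is an assumption.

```python
# Hypothetical extracts of three referential tables.
indicator_population = [
    {"name": "ind_water_access", "name_population": "Refugees (REF)"},
    {"name": "ind_crowding", "name_population": "Refugees (REF)"},
    {"name": "ind_documentation", "name_population": "Stateless Persons (STA)"},
]
indicator_survey = [
    {"name": "ind_water_access", "name_survey": "water_source"},
    {"name": "ind_crowding", "name_survey": "hh_size"},
    {"name": "ind_crowding", "name_survey": "rooms_count"},
    {"name": "ind_documentation", "name_survey": "id_document"},
]
survey = [
    {"name": "water_source", "mode": "ALL"},
    {"name": "hh_size", "mode": "ALL"},
    {"name": "rooms_count", "mode": "CAPI"},
    {"name": "id_document", "mode": "ALL"},
]


def select_questions(population, mode, wanted):
    """Questions needed for the wanted indicators, for one population and mode."""
    # Step 2: indicators that apply to the target population.
    eligible = {l["name"] for l in indicator_population if l["name_population"] == population}
    # Step 3: keep only the indicators the survey manager selected.
    chosen = eligible.intersection(wanted)
    needed = {l["name_survey"] for l in indicator_survey if l["name"] in chosen}
    # Step 4: keep the questions usable with the data collection mode.
    return [q["name"] for q in survey if q["name"] in needed and q["mode"] in ("ALL", mode)]


cati_questions = select_questions(
    "Refugees (REF)", "CATI", ["ind_water_access", "ind_crowding", "ind_documentation"]
)
print(cati_questions)
```

In this example rooms_count is dropped for CATI, so ind_crowding could no longer be calculated for that mode; a fuller implementation would flag such indicators instead of silently keeping part of their questions.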

Once finalized, a summary is generated to document the Annual Survey Management Cycle. The summary can highlight the main customizations through a final comparison between the xlsform and the referential using xlsform_compare.

A series of xlsform files, each paired with its pretty-printed Word version generated with render_prettyprint, can then be exported in order to be piloted and revised in the operation.