D Annex Documentation of referential and questionnaires

`referential`

The referential is the central piece of the {SurveyDesigner}. It is centrally managed to preserve its integrity and documents the indicators, the questions linked to them, and the geographic referential. The referential is then translated into a separate context for each country, which involves adjusting the labels of both questions and modalities. A context can also include ad-hoc questions and indicators.

Questionnaires are generated from the context matching the country. Each questionnaire includes one or more XlsForm, i.e. the definition of a form that can be used with multiple types of data collection servers.

`referential_type`

There can be several distinct series of referentials, each matching a very different data collection methodology (type of sampling, representativeness of the person interviewed).

referential_id
type
description

For instance:

household_survey - a set of indicators to be measured on a stock population
flow_monitoring - a set of indicators specific to populations in transit or on the move, i.e. not planning to establish a habitual residence for more than a year
key_informant - a set of indicators (mostly qualitative) collected from "persons with knowledge"
beneficiary_monitoring - a set of indicators to be collected on a regular basis from the beneficiaries of a specific programme

For the initial prototype, we will focus on household_survey. The business logic should nevertheless work independently of the type of referential, making the application reusable and adaptable in multiple contexts and potentially in different organizations.

Once loaded, the referential should be searchable in a similar way to the International Household Survey Question Bank (see also UNHCR Question Bank node), but with a "ready to use" solution for the design of survey instruments (i.e. already adapted to the specific implementation context).

Below is a description of each worksheet within the spreadsheet. A link to an existing standard is included whenever possible.

`survey`

For the questions part, the referential will match the standard xlsform structure, with additional columns:

referential_id
type xlsform Ref
name unique id for the variable - defined as a standard global codebook
label - This is the master version for the label - defined in English per convention and defaults - The labels can be translated and contextualized for each country based on predefined instructions
hint xlsform Ref - This is the master version for the hint label - defined in English
required xlsform Ref
required_message xlsform Ref - defined in English - The labels can be contextualized for each country based on predefined instructions
constraint xlsform Ref
constraint_message xlsform Ref defined in English - The labels can be contextualized for each country based on predefined instructions
relevant xlsform Ref
appearance xlsform Ref
calculation xlsform Ref
trigger xlsform Ref
parameters xlsform Ref
repeat_count xlsform Ref
default xlsform Ref
read_only xlsform Ref
choice_filter xlsform Ref
media::image xlsform Ref
contextualize boolean indicates if contextualizing the label is allowed
contextualize_instruction string in case contextualizing the label is allowed, instruction to follow to guide the contextualization
block This is a way to note how different questions should remain grouped together - See also https://xlsform.org/en/#grouping-questions
block_sequence integer defining the sequence for this block within the interview flow
sequence integer defining the sequence for the variable within the block
mode factor with possible values ALL, CAPI, CATI, and CAWI, where the last three stand for Computer Assisted Personal Interviewing, Computer Assisted Telephone Interviewing, and Computer Assisted Web Interviewing
check defines which type of High Frequency Check, if any, should be applied using this variable
accuracy expected level of accuracy for the indicators - can be used to prioritize the indicators over multiple data collection waves
chapter string defines a high level research question - used to group variables together when generating the automatic data exploration report with kobocruncher
subchapter string defines a high level sub research question - used to group variables together when generating the automatic data exploration report with kobocruncher
labelReport - This should be a short label for the variable - less than 80 characters - used for reporting purposes so that it displays well in a chart - used by kobocruncher
hintReport Can be used to provide a longer description of the variable - as well as potential green, orange, red standard threshold values to use when interpreting it - used by kobocruncher
keyword list of associated keyword as defined in RIDL - cf schema
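
As an illustration of how block_sequence, sequence and mode can work together, here is a minimal sketch in Python. It is not part of {SurveyDesigner}: the `SurveyRow` class, the `questions_for_mode` helper and the sample rows are hypothetical, and only a handful of the columns above are represented.

```python
from dataclasses import dataclass


@dataclass
class SurveyRow:
    # Hypothetical subset of the survey columns described above
    name: str
    block: str
    block_sequence: int
    sequence: int
    mode: str  # "ALL", "CAPI", "CATI" or "CAWI"


def questions_for_mode(rows, mode):
    """Keep the rows usable in `mode`, ordered by block, then by position in the block."""
    kept = [r for r in rows if r.mode in ("ALL", mode)]
    return sorted(kept, key=lambda r: (r.block_sequence, r.sequence))


rows = [
    SurveyRow("hh_size", "household", 2, 1, "ALL"),
    SurveyRow("consent", "intro", 1, 1, "ALL"),
    SurveyRow("gps", "intro", 1, 2, "CAPI"),
    SurveyRow("phone_quality", "intro", 1, 3, "CATI"),
]
ordered = [r.name for r in questions_for_mode(rows, "CAPI")]
print(ordered)  # ['consent', 'gps', 'hh_size']
```

Rows tagged ALL are kept for every mode, while CATI-only rows are dropped when building a CAPI form.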

`choices`

Many response options should also map to established classifications.

referential_id
list_name this should be referenced within type when type starts with select_one or select_multiple
name unique id for the modality - defined as a standard global codebook
label - This is the master version for the label - defined in English - The labels can be contextualized for each country based on predefined instructions
order integer - if null the variable is not considered as ordinal - used by kobocruncher
labelReport - This should be a short label for the modality - less than 40 characters - used for reporting purposes so that it displays well in a chart - used by kobocruncher
contextualize boolean indicates if contextualizing the label is allowed
contextualize_instruction string in case contextualizing the label is allowed, instruction to follow to guide the contextualization
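
The link between the type column of survey and the list_name column of choices can be checked mechanically. The following is a minimal sketch, assuming the types are written as in standard xlsform (`select_one <list_name>` or `select_multiple <list_name>`); the `missing_choice_lists` helper and the sample values are hypothetical.

```python
def missing_choice_lists(survey_types, choice_list_names):
    """Return the list names referenced by select questions but absent from choices."""
    referenced = set()
    for qtype in survey_types:
        parts = qtype.split()
        # Only "select_one <list>" and "select_multiple <list>" reference a choice list
        if len(parts) >= 2 and parts[0] in ("select_one", "select_multiple"):
            referenced.add(parts[1])
    return sorted(referenced - set(choice_list_names))


survey_types = ["select_one yes_no", "integer", "select_multiple needs"]
missing = missing_choice_lists(survey_types, ["yes_no"])
print(missing)  # ['needs']
```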

`indicator`

Indicators represent variables that are calculated from the variables directly collected. Indicators can be the final metrics used for analysis or simply auxiliary, meaning used for disaggregation, cf Question Bank

referential_id
type should be either select_one or numeric
name unique id for the indicator - defined as a standard global codebook
labelReport - This should be a short label for the indicator - less than 80 characters - used for reporting purposes so that it displays well in a chart - used by kobocruncher
hintReport Can be used to provide a longer description of the indicator - as well as potential green, orange, red standard threshold values to use when interpreting it - used by kobocruncher
list_name in case the indicator result is discrete, a reference to the list within choices that provides the labels to use
repeatvar in case of multiple frames in the dataset, indicates in which frame the indicator should be appended
ind_type defines whether the indicator is a population, a disaggregation, a final or only an auxiliary indicator (meaning an intermediate calculation done to build the final indicators)
sequence integer - used to define an order - important to ensure that auxiliary variables are created first in order to calculate the final indicators
block This is a way to note how different indicators should be consistently calculated together - it eases quick selection of multiple indicators with a single instruction - for instance all indicators linked to the same selected impact or outcome
chapter string defines a high level research question - used to group variables together when generating the automatic data exploration report with kobocruncher
subchapter string defines a high level sub research question - used to group variables together when generating the automatic data exploration report with kobocruncher
calculation R statement used to create the indicator based on the standard global codebook and assuming that the data object that will be built from the dataset exported from kobo is a kobocruncher datalist
unit string unit for the indicator
accuracy expected level of accuracy for the indicators - can be used to prioritize the indicators over multiple data collection waves
mode_CAPI boolean indicates if the indicator can be collected with CAPI - Computer Assisted Personal Interviewing - this requires an entry in choices[["mode"]] with this specific mode
mode_CATI boolean indicates if the indicator can be collected with CATI - Computer Assisted Telephone Interviewing - this requires an entry in choices[["mode"]] with this specific mode
mode_CAWI boolean indicates if the indicator can be collected with CAWI - Computer Assisted Web Interviewing - this requires an entry in choices[["mode"]] with this specific mode
metadata indicates limitations of the indicator - this provides in-depth documentation on the indicator concept and methodology
link provides a link to any established official documentation on the indicator
keyword list of associated keyword as defined in RIDL - cf schema
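
To make the roles of sequence and ind_type concrete, here is a minimal sketch in Python. The record layout, the indicator names and the two helpers are hypothetical; the actual calculation statements are written in R and are not reproduced here.

```python
indicators = [
    {"name": "ind_final", "ind_type": "final", "sequence": 20},
    {"name": "aux_hh_size", "ind_type": "auxiliary", "sequence": 10},
    {"name": "dis_age", "ind_type": "disaggregation", "sequence": 15},
]


def calculation_order(indicators):
    """Order in which the calculation statements would be run (lowest sequence first)."""
    return [i["name"] for i in sorted(indicators, key=lambda i: i["sequence"])]


def selectable(indicators):
    """Indicators that can be offered for selection, i.e. everything except auxiliary ones."""
    return [i["name"] for i in indicators if i["ind_type"] != "auxiliary"]


order = calculation_order(indicators)
offered = selectable(indicators)
print(order)    # ['aux_hh_size', 'dis_age', 'ind_final']
print(offered)  # ['ind_final', 'dis_age']
```

Giving auxiliary indicators a lower sequence than the final indicators that depend on them is what guarantees they exist when the final calculation runs.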

`indicator_survey`

This table makes it possible to check that all the variables (aka survey questions) required to calculate the indicator are available:

referential_id
name unique id for the indicator
name_survey unique id for the variable as defined in survey
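
The check this table enables can be sketched as follows. This is an illustration only: the `missing_variables` helper and the sample rows are hypothetical, and rows are represented as plain dictionaries keyed by the column names above.

```python
def missing_variables(indicator_survey, survey_names):
    """Map each indicator to the required variables that are absent from survey."""
    available = set(survey_names)
    missing = {}
    for link in indicator_survey:
        if link["name_survey"] not in available:
            missing.setdefault(link["name"], []).append(link["name_survey"])
    return missing


indicator_survey = [
    {"name": "ind_water", "name_survey": "water_source"},
    {"name": "ind_water", "name_survey": "water_time"},
    {"name": "ind_docs", "name_survey": "has_id"},
]
gaps = missing_variables(indicator_survey, ["water_source", "has_id"])
print(gaps)  # {'ind_water': ['water_time']}
```

An empty result means every indicator has all the questions it needs.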

`indicator_choices`

This table makes it possible to check that all the modalities (aka response options) required for the questions used to calculate the indicator are available:

referential_id
name unique id for the indicator
name_choices unique id concatenating the list_name and name from choices

`indicator_population`

This table maps the one-to-many relation between one indicator and all the population groups that the indicator can apply to.

referential_id
name unique id for the indicator
name_population factor - for instance for household survey this can be either "Refugees (REF)", "Asylum seekers (ASY)", "Internally displaced persons (IDP)", "Other people in need of international protection (OIP)", "Stateless Persons (STA)", "Others of concern to UNHCR (OOC)" or "Host community (HCT)"

`indicator_disaggregation`

This table maps the one-to-many relation between one indicator and all the disaggregation variables (another indicator or a survey name) that can be expected for that indicator.

referential_id
name unique id for the indicator
name_disaggregation references either an indicator or directly a variable from survey. Should be a factor - for instance in a household survey it could be among Age, Gender, Disability, Site

`context`

A context reflects the implementation of the referential within a specific country or operation. Contexts are expected to enforce full data integrity with the main referential, meaning that all centrally defined indicators should have their corresponding survey questions and response options available in every context.
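
One way to picture this integrity rule is as a set comparison between the question names of the referential and those of a context. The sketch below is an assumption about how such a check could look, not the actual implementation; `check_context` and the sample names are hypothetical.

```python
def check_context(referential_names, context_names, ad_hoc_names=()):
    """Return (missing, undeclared) question names for one context.

    missing: defined in the referential but absent from the context.
    undeclared: present in the context, but neither in the referential
    nor flagged as ad-hoc.
    """
    referential = set(referential_names)
    context = set(context_names)
    missing = sorted(referential - context)
    undeclared = sorted(context - referential - set(ad_hoc_names))
    return missing, undeclared


result = check_context(
    referential_names=["consent", "hh_size"],
    context_names=["consent", "local_q"],
    ad_hoc_names=["local_q"],
)
print(result)  # (['hh_size'], [])
```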

`context_geography`

context_id
region tag the region associated with the country - used to identify the relevant Regional Survey Support
country country iso code alpha3 - also defined in the choices table
geo additional geographic_id

`context_language`

This table maps the one-to-many relation between one country and all the languages that can be used in that context.

context_id
language should comply with the language referential from the Internet Assigned Numbers Authority (IANA) - once defined here, the same language options should be available for survey and choices

`context_survey`

Once the language requirements have been defined for each country, the survey should be contextualized, in line with Global Recommendations, and through a dialog between the relevant Regional Survey Support and the Operation Survey Focal Point. This can include the addition of ad-hoc context specific questions.

context_id
name
language the suffix indicates the language and should comply with the language referential from the Internet Assigned Numbers Authority (IANA) - this column can be repeated for as many languages as needed
label
hint
required_message
constraint_message
question_type defines if this is an ad-hoc context specific question that was created during the context creation through a dialog between the regional survey support and the operation survey coordinator
contextualization_note
duration integer representing the number of seconds necessary to read the question and the linked answers (if the question is of type select). This can be estimated by a function like interview_duration. This variable is used to check whether the total interview remains within an acceptable duration - i.e. 40 to 50 minutes - and, if not, to suggest splitting the questionnaire across multiple data collection waves
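
The duration check can be illustrated with a small calculation. This is a sketch under stated assumptions: `waves_needed` is a hypothetical helper, the 50-minute ceiling is taken from the upper bound mentioned above, and the split assumes questions can be distributed evenly across waves.

```python
import math


def waves_needed(durations_seconds, max_minutes=50):
    """Minimum number of waves so that each interview fits within max_minutes."""
    total = sum(durations_seconds)
    return max(1, math.ceil(total / (max_minutes * 60)))


# Three blocks of questions lasting 20, 25 and 15 minutes: 3600 s in total
waves = waves_needed([1200, 1500, 900])
print(waves)  # 2
```

In practice the split would also have to respect blocks and indicator accuracy, which this arithmetic ignores.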

`context_choice`

Once the language requirements have been defined for each context, the choices should be contextualized, in line with Global Recommendations, and through a dialog between the relevant Regional Survey Support and the Operation Survey Focal Point.

Note that the geographic referential is managed under choices - it can be filtered with the 2 dedicated list_name values: country and admin1. Geographic referential names should align with the Common Operational Dataset Pcode.

context_id
list_name
name
language the suffix indicates the language and should comply with the language referential from the Internet Assigned Numbers Authority (IANA) - this column can be repeated for as many languages as needed
label
contextualization_note

`questionnaires`

They represent an object with one or more fully compliant xlsform objects, created from the referential.

All of the xlsform objects should be valid. Validation can be done using the dedicated Python package pyxform.

Each xlsform object should at least correspond to a combination of data collection mode and wave - for instance CATI_wave1, CAPI_wave1, CAPI_wave2, etc.
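
The naming pattern in these examples can be generated from a simple plan of waves per mode. The sketch below is illustrative only; `form_key`, `VALID_MODES` and the `plan` dictionary are hypothetical names.

```python
VALID_MODES = ("CAPI", "CATI", "CAWI")


def form_key(mode, wave):
    """Build an identifier such as 'CAPI_wave2' for one xlsform object."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown data collection mode: {mode}")
    if wave < 1:
        raise ValueError("wave numbers start at 1")
    return f"{mode}_wave{wave}"


# Hypothetical plan: number of waves per data collection mode
plan = {"CATI": 1, "CAPI": 2}
keys = [form_key(mode, wave) for mode, n in plan.items() for wave in range(1, n + 1)]
print(keys)  # ['CATI_wave1', 'CAPI_wave1', 'CAPI_wave2']
```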

In order to build the questionnaires from the referential, the survey manager will need to set up a series of filters and include some additional information for contextualization:

select which context to use - i.e. select one country - this will filter the questions and the choice options accordingly, as well as the language translations
select the target population - this will filter which indicators can be calculated
select a topic and then the linked indicators, indicator blocks or survey questions - cf above definition of indicators - this will select only from non-auxiliary indicators
select the data collection mode
indicate how many data collection waves can be organized. This shall be done based on the analysis of questionnaire duration and indicator accuracy
define the settings - see xlsform Ref - within each sub-questionnaire
adjust the default block_sequence
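
The population and indicator filters above can be sketched as a lookup through the indicator_population and indicator_survey tables. This is a simplified illustration, not the application logic: `select_questions` and the sample rows are hypothetical, and the context, mode and wave filters are left out.

```python
def select_questions(indicator_population, indicator_survey, population, chosen):
    """Return the chosen indicators valid for `population` and the questions they need."""
    allowed = {
        link["name"]
        for link in indicator_population
        if link["name_population"] == population
    }
    kept = [name for name in chosen if name in allowed]
    questions = sorted(
        {link["name_survey"] for link in indicator_survey if link["name"] in kept}
    )
    return kept, questions


indicator_population = [
    {"name": "ind_water", "name_population": "Refugees (REF)"},
    {"name": "ind_water", "name_population": "Internally displaced persons (IDP)"},
    {"name": "ind_docs", "name_population": "Refugees (REF)"},
]
indicator_survey = [
    {"name": "ind_water", "name_survey": "water_source"},
    {"name": "ind_water", "name_survey": "water_time"},
    {"name": "ind_docs", "name_survey": "has_id"},
]
kept, questions = select_questions(
    indicator_population,
    indicator_survey,
    "Internally displaced persons (IDP)",
    ["ind_water", "ind_docs"],
)
print(kept)       # ['ind_water']
print(questions)  # ['water_source', 'water_time']
```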

Once finalized, a summary is generated to document the Annual Survey Management Cycle. The summary can highlight the main customizations through a final comparison between the xlsform and the referential using xlsform_compare.

A series of xlsform files, each paired with its pretty-printed Word version produced with render_prettyprint, can then be exported in order to be piloted and revised in the operation.
