# Question latency

1. Calculate the response latency for each question, i.e. the time in seconds between the answer to the previous question being set (or the interview being resumed) and the answer to the question itself being set.
2. Consider only the timing of questions answered by interviewers prior to any supervisor/HQ rejection or review event.
3. For multiple consecutive events on one question/roster level, e.g. re-answered questions or multi-answer questions, the overall time is considered.
4. If a question is re-answered later (i.e. not in sequence), separate response latencies are calculated.
5. The `fillna` step should be moved to the paradata generation.

In [None]:
# generate new df
df_time = df_para.copy()

# harmonize missing values (empty, NaN) to ''; needed to identify duplicates by roster level below
df_time.fillna('', inplace=True)

In [None]:
# only keep interviewing events prior to Supervisor/HQ interaction
# (assumes rows are in chronological order within each interview)
events_split = ['RejectedBySupervisor', 'OpenedBySupervisor', 'OpenedByHQ', 'RejectedByHQ']
# count split events cumulatively per interview; rows before the first one have a count of 0,
# so interviews without any split event are kept in full
is_split = df_time['event'].isin(events_split).astype(int)
df_time = df_time[is_split.groupby(df_time['interview__id']).cumsum() == 0]
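
As a quick check of this step, the following sketch applies a cumulative-count filter to toy data. The data frame `toy` and its values are made up for illustration; only the column names and event names come from the notebook. Interview `b` has no supervisor/HQ event, so all of its rows should be retained.

```python
import pandas as pd

# Toy paradata (hypothetical values); rows are assumed to be in chronological order.
toy = pd.DataFrame({
    'interview__id': ['a', 'a', 'a', 'a', 'b', 'b'],
    'event': ['AnswerSet', 'AnswerSet', 'RejectedBySupervisor', 'AnswerSet',
              'AnswerSet', 'AnswerSet'],
})
events_split = ['RejectedBySupervisor', 'OpenedBySupervisor', 'OpenedByHQ', 'RejectedByHQ']

# Running count of split events within each interview; 0 means "before the first one".
n_split = toy['event'].isin(events_split).astype(int).groupby(toy['interview__id']).cumsum()
toy_interviewing = toy[n_split == 0]
print(toy_interviewing)
```

Rows at or after the first rejection in interview `a` are dropped, while interview `b` is left untouched.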

In [None]:
# keep only relevant events
# events_to_drop is kept for reference only; the filter below uses events_to_keep
events_to_drop = ['SupervisorAssigned', 'InterviewerAssigned', 'KeyAssigned', 'VariableDisabled', 'ReceivedByInterviewer', 'VariableEnabled', 'VariableSet', 'QuestionDeclaredInvalid', 'QuestionDeclaredValid', 'Completed', 'TranslationSwitched', 'ReceivedBySupervisor', 'OpenedBySupervisor', 'ApproveBySupervisor', 'ClosedBySupervisor', 'InterviewModeChanged', 'Paused']
events_to_keep = ['InterviewCreated', 'AnswerSet', 'Resumed']  # TODO: check in other example data sets that there are no other relevant events
df_time = df_time[df_time['event'].isin(events_to_keep)]


In [None]:
# keep only events done by the interviewer (after the filters above, other roles should be rare; this covers the case where a supervisor or HQ answered something while the interviewer was answering in web mode)
df_time = df_time[df_time['role']==1]

In [None]:
# if the same question was repeatedly answered on the same roster level, keep only the last one (to take the overall time for the question)
group_col = ['interview__id', 'VariableName', 'roster_level']
df_time['is_diff'] = (df_time[group_col].shift() != df_time[group_col]).any(axis=1)
df_time['keep'] = df_time['is_diff'].shift(-1, fill_value=True)
df_time = df_time[df_time['keep']]
df_time.drop(columns=['is_diff', 'keep'], inplace=True)
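
To illustrate what the shift-based filter does, here is a small sketch on toy data. The frame `toy` and the helper column `order` are hypothetical; the grouping columns are those used above. Question `q1` is answered twice in a row, then `q2`, then `q1` twice again, so the expected result keeps the last event of each consecutive run.

```python
import pandas as pd

# Toy event sequence for a single interview (hypothetical values).
toy = pd.DataFrame({
    'interview__id': ['a'] * 5,
    'VariableName': ['q1', 'q1', 'q2', 'q1', 'q1'],
    'roster_level': ['', '', '', '', ''],
    'order': [1, 2, 3, 4, 5],  # illustrative row counter
})
group_col = ['interview__id', 'VariableName', 'roster_level']

# True where a row starts a new run, i.e. differs from the previous row.
is_diff = (toy[group_col].shift() != toy[group_col]).any(axis=1)
# A row is the last of its run if the next row starts a new run (or there is no next row).
keep = is_diff.shift(-1, fill_value=True)
toy_last = toy[keep]
print(toy_last['order'].tolist())
```

The later, non-consecutive run of `q1` survives as a separate row, which matches point 4 of the description.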


In [None]:
# calculate time difference in seconds
df_time['time_difference'] = df_time.groupby('interview__id')['datetime_utc'].diff()
df_time['time_difference'] = df_time['time_difference'].dt.total_seconds()
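
A minimal sketch of the same calculation on toy data, assuming `datetime_utc` is a pandas datetime column. The timestamps are invented for illustration. The first event of each interview has no predecessor, so its latency is missing.

```python
import pandas as pd

# Toy events for two interviews (hypothetical timestamps), already in chronological order.
toy = pd.DataFrame({
    'interview__id': ['a', 'a', 'a', 'b', 'b'],
    'event': ['Resumed', 'AnswerSet', 'AnswerSet', 'InterviewCreated', 'AnswerSet'],
    'datetime_utc': pd.to_datetime([
        '2023-01-01 10:00:00', '2023-01-01 10:00:12', '2023-01-01 10:00:45',
        '2023-01-02 09:00:00', '2023-01-02 09:01:30',
    ]),
})

# Difference to the previous event within the same interview, converted to seconds.
toy['time_difference'] = (
    toy.groupby('interview__id')['datetime_utc'].diff().dt.total_seconds()
)
print(toy[['interview__id', 'event', 'time_difference']])
```

Because the difference is taken within each interview, the first event of interview `b` is not compared with the last event of interview `a`.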


In [None]:
# keep only AnswerSet events
df_time = df_time[df_time['event']=='AnswerSet']

# for questions that were re-answered on the same roster level, we can either treat them as separate events (i.e. one question within the interview/roster level enters the outlier detection multiple times) or sum up the events.
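
If the summing option is chosen, it could look roughly like the sketch below. This is one possible approach on made-up data, not a decision taken in the notebook; `toy` and its values are hypothetical.

```python
import pandas as pd

# Toy latencies (hypothetical): q1 was answered, then q2, then q1 was re-answered later.
toy = pd.DataFrame({
    'interview__id': ['a', 'a', 'a'],
    'VariableName': ['q1', 'q2', 'q1'],
    'roster_level': ['', '', ''],
    'time_difference': [12.0, 33.0, 5.0],
})
group_col = ['interview__id', 'VariableName', 'roster_level']

# One row per interview/question/roster level with the total time spent on it.
toy_sum = toy.groupby(group_col, as_index=False, sort=False)['time_difference'].sum()
print(toy_sum)
```

With this option each question enters the outlier detection once per interview and roster level; with the other option the re-answered `q1` would enter twice, with 12 and 5 seconds.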