# Look up survey question and answer concept IDs in All of Us

This notebook shows how to:

1. Search for **survey questions** in `ds_survey` by survey name and keyword.
2. Retrieve all **answer options** (`answer_concept_id` and answer text) for a chosen question.

All queries run against the curated data repository (CDR) dataset named by the `WORKSPACE_CDR` environment variable.

In [1]:
import os
import pandas as pd

# Get the BigQuery curated dataset for the current workspace context.
CDR = os.environ["WORKSPACE_CDR"]
USE_BQSTORAGE = "BIGQUERY_STORAGE_API_ENABLED" in os.environ

# Make long question/answer text fully visible in outputs.
pd.set_option("display.max_colwidth", None)

print(f"Using CDR dataset: {CDR}")

Using CDR dataset: fc-aou-cdr-prod-ct.C2022Q4R13
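The setup cell above raises a bare `KeyError` when `WORKSPACE_CDR` is missing, for example when the notebook runs outside a workspace. A minimal sketch of a friendlier lookup follows; the helper name `get_workspace_cdr` and the error wording are illustrative assumptions, not part of the original notebook.

```python
import os


def get_workspace_cdr(env=os.environ):
    """Return the CDR dataset name, or raise a descriptive error if it is unset.

    `env` is any mapping of environment variables; it defaults to os.environ.
    This helper is hypothetical and only wraps the lookup done in the setup cell.
    """
    cdr = env.get("WORKSPACE_CDR", "").strip()
    if not cdr:
        raise RuntimeError(
            "WORKSPACE_CDR is not set. Run this notebook inside a workspace "
            "that defines it, or set the variable before querying."
        )
    return cdr


# Demonstration with an explicit mapping instead of the real environment.
example_env = {"WORKSPACE_CDR": "fc-aou-cdr-prod-ct.C2022Q4R13"}
example_cdr = get_workspace_cdr(example_env)
print(f"Using CDR dataset: {example_cdr}")
```

In the notebook itself, `CDR = get_workspace_cdr()` would replace the direct `os.environ["WORKSPACE_CDR"]` lookup.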


In [2]:
# ---------------------------------------------------------------
#  Search RA-related questions and answers in ds_survey
# ---------------------------------------------------------------

survey_name = "Personal and Family Health History"
keyword = "rheumatoid arthritis (ra)"  # case-insensitive search

query = f"""
SELECT DISTINCT
  question_concept_id,
  question,
  answer_concept_id,
  answer,
  survey
FROM `{CDR}.ds_survey`
WHERE survey = '{survey_name}'
  AND LOWER(question) LIKE '%{keyword.lower()}%'
ORDER BY question_concept_id, answer_concept_id
"""

df_q = pd.read_gbq(
    query,
    dialect="standard",
    use_bqstorage_api=USE_BQSTORAGE,
)

print(f"Found {df_q['question_concept_id'].nunique()} question(s) and {len(df_q)} question–answer pair(s).")
df_q.head()

Found 4 question(s) and 20 question–answer pair(s).


index,question_concept_id,question,answer_concept_id,answer,survey
0,836820,"Including yourself, who in your family has had rheumatoid arthritis (RA)? Select all that apply.",903096,PMI: Skip,Personal and Family Health History
1,836820,"Including yourself, who in your family has had rheumatoid arthritis (RA)? Select all that apply.",1384653,"Including yourself, who in your family has had rheumatoid arthritis (RA)? - Self",Personal and Family Health History
2,836820,"Including yourself, who in your family has had rheumatoid arthritis (RA)? Select all that apply.",43529773,"Including yourself, who in your family has had rheumatoid arthritis (RA)? - Daughter",Personal and Family Health History
3,836820,"Including yourself, who in your family has had rheumatoid arthritis (RA)? Select all that apply.",43529774,"Including yourself, who in your family has had rheumatoid arthritis (RA)? - Father",Personal and Family Health History
4,836820,"Including yourself, who in your family has had rheumatoid arthritis (RA)? Select all that apply.",43529775,"Including yourself, who in your family has had rheumatoid arthritis (RA)? - Grandparent",Personal and Family Health History
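The query in the cell above interpolates `survey_name` and `keyword` directly into the SQL text, so a keyword that contains a single quote would break the statement. The sketch below builds the same query but escapes backslashes and single quotes the way BigQuery standard SQL string literals expect. The helper names `bq_string_literal` and `build_survey_query`, the placeholder dataset name, and the example keyword are assumptions made for illustration; `LIKE` wildcards (`%`, `_`) inside the keyword are deliberately left untouched.

```python
def bq_string_literal(value: str) -> str:
    """Return `value` as a single-quoted BigQuery standard SQL string literal."""
    # Escape backslashes first so the backslash added for quotes is not doubled.
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def build_survey_query(cdr: str, survey_name: str, keyword: str) -> str:
    """Build the ds_survey search query with escaped string literals."""
    pattern = f"%{keyword.lower()}%"  # case-insensitive substring match
    return f"""
SELECT DISTINCT
  question_concept_id,
  question,
  answer_concept_id,
  answer,
  survey
FROM `{cdr}.ds_survey`
WHERE survey = {bq_string_literal(survey_name)}
  AND LOWER(question) LIKE {bq_string_literal(pattern)}
ORDER BY question_concept_id, answer_concept_id
"""


# Hypothetical inputs: a placeholder dataset and a keyword containing a quote.
example_query = build_survey_query(
    "my-project.my_dataset",
    "Personal and Family Health History",
    "crohn's disease",
)
print(example_query)
```

The resulting string can be passed to `pd.read_gbq` exactly as `query` is above. Query parameters offered by the BigQuery client library are another option if that client is available in the workspace.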


In [3]:
# Show all distinct questions we found, with their question_concept_id
questions_summary = (
    df_q[["question_concept_id", "question"]]
    .drop_duplicates()
    .sort_values("question_concept_id")
)

questions_summary

index,question_concept_id,question
0,836820,"Including yourself, who in your family has had rheumatoid arthritis (RA)? Select all that apply."
8,1384479,About how old were you when you were first told you had rheumatoid arthritis (RA)?
14,1384540,Are you currently prescribed medications and/or receiving treatment for rheumatoid arthritis (RA)?
17,1384593,Are you still seeing a doctor or health care provider for rheumatoid arthritis (RA)?


# Inspect answer options for a chosen question

From the table above, choose one `question_concept_id` and list all of its
answer options. Use the resulting `answer_concept_id` values in
`PHENOTYPE_CONFIG['RA']['survey_inclusion_pairs']` or
`PHENOTYPE_CONFIG['RA']['survey_exclusion_pairs']` in the dataset extraction notebook.


In [4]:
# ---------------------------------------------------------------
#  Inspect all answer options for a specific question
# ---------------------------------------------------------------

# Example: we are interested in the question:
# "Are you still seeing a doctor or health care provider for rheumatoid arthritis (RA)?"
# From the previous cell, the question_concept_id is 1384593.

target_question_id = 1384593  # <-- change this ID for other questions

# Keep only the rows for the chosen question.
df_answers = df_q[df_q["question_concept_id"] == target_question_id]

df_answers[
    ["question_concept_id", "question", "answer_concept_id", "answer"]
].drop_duplicates()


index,question_concept_id,question,answer_concept_id,answer
17,1384593,Are you still seeing a doctor or health care provider for rheumatoid arthritis (RA)?,903096,PMI: Skip
18,1384593,Are you still seeing a doctor or health care provider for rheumatoid arthritis (RA)?,1384691,Are you still seeing a doctor or health care provider for rheumatoid arthritis (RA)? - No
19,1384593,Are you still seeing a doctor or health care provider for rheumatoid arthritis (RA)?,1385113,Are you still seeing a doctor or health care provider for rheumatoid arthritis (RA)? - Yes
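
The answer options listed above can be turned into question/answer pairs for the phenotype configuration. This notebook does not show the exact structure that `survey_inclusion_pairs` and `survey_exclusion_pairs` expect, so the sketch below assumes a list of `(question_concept_id, answer_concept_id)` tuples; the helper name `to_survey_pairs` is also hypothetical. The sample rows are copied from the output above.

```python
PMI_SKIP_CONCEPT_ID = 903096  # "PMI: Skip", as shown in the answer tables above


def to_survey_pairs(records, answer_ids=None, drop_skip=True):
    """Build sorted, de-duplicated (question_concept_id, answer_concept_id) tuples.

    records:    iterable of dicts with "question_concept_id" and
                "answer_concept_id" keys
    answer_ids: optional collection of answer_concept_id values to keep
    drop_skip:  if True, leave out the "PMI: Skip" answer
    """
    pairs = set()
    for row in records:
        q_id = int(row["question_concept_id"])
        a_id = int(row["answer_concept_id"])
        if drop_skip and a_id == PMI_SKIP_CONCEPT_ID:
            continue
        if answer_ids is not None and a_id not in answer_ids:
            continue
        pairs.add((q_id, a_id))
    return sorted(pairs)


# Rows copied from the output above for question 1384593.
records = [
    {"question_concept_id": 1384593, "answer_concept_id": 903096},
    {"question_concept_id": 1384593, "answer_concept_id": 1384691},
    {"question_concept_id": 1384593, "answer_concept_id": 1385113},
]

# Keep only the "Yes" answer (1385113) as an assumed inclusion pair.
inclusion_pairs = to_survey_pairs(records, answer_ids={1385113})
print(inclusion_pairs)
```

In the notebook, the records could come from `df_q[df_q["question_concept_id"] == target_question_id].to_dict("records")`. Whether a given answer belongs in the inclusion or the exclusion list depends on the phenotype definition, which is outside the scope of this lookup.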
