🚀 Hackathon Starter Kit: ClinicalTrials.gov Data Downloader
---

*This notebook is your starting point for downloading rich, detailed data directly from the official ClinicalTrials.gov API.*

---

What this notebook does:

*   Fetches data for a list of clinical trial IDs (NCT IDs).
*   Saves the data into a clean, ready-to-use CSV file.
*   Uses a reliable, one-by-one approach to avoid being blocked by the server.

---

How to use this notebook:

1.   Read the instructions in the text cells (like this one).
2.   Modify the configuration in Step 2 to select the trial IDs you want.


# Step 1: Import Necessary Libraries

This first code cell imports the Python libraries we'll need. No special installation is required in Google Colab: csv, io, and time are part of the Python standard library, and requests is a third-party package that comes preinstalled in Colab.


*   **requests:** For making HTTP requests to the API.
*   **csv:** For handling and writing data in CSV format.
*   **io:** To treat the text data from the API as an in-memory file.
*   **time:** To add a polite delay between our API requests.



In [None]:
import requests, csv, io, time
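
To illustrate how io and csv work together, here is a minimal sketch that parses CSV text held in memory, with no network access needed. The sample text and the helper name `parse_csv_text` are made up for illustration; a real API response will have many more columns.

```python
import csv
import io

# Hypothetical sample standing in for the text of an API response.
sample_text = 'NCT Number,Study Title\nNCT00000000,"An example, with a comma"\n'

def parse_csv_text(text):
    """Parse CSV text held in memory and return its rows as lists of strings."""
    # io.StringIO wraps the string so csv.reader can read it like a file.
    return list(csv.reader(io.StringIO(text)))

rows = parse_csv_text(sample_text)
print(rows[0])  # header row
print(rows[1])  # data row; the quoted comma stays inside one field
```

Using the csv module rather than splitting on commas matters here because fields such as summaries and titles can contain commas and quotes.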

# Step 2: 🔧 Configuration

This is the most important section for you to modify. Here you will define which trials to fetch and what to name your output file.


*   **NCT_IDS**: This is a Python list of the trial IDs you want to download. You should replace the example IDs with your own list.
*   **API_PARAMS**: Here you can customize the data fields you want from the API. We've included some baseline fields to get you started.
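
Before fetching a long list, it can help to catch malformed IDs early. The sketch below assumes the usual NCT ID shape, the letters NCT followed by eight digits. The helper name `split_valid_invalid` is hypothetical, and the check only tests the format, not whether the trial actually exists.

```python
import re

# Assumed format: "NCT" followed by exactly eight digits.
NCT_ID_PATTERN = re.compile(r"NCT\d{8}")

def split_valid_invalid(ids):
    """Return (valid, invalid): cleaned, de-duplicated IDs and rejected inputs."""
    valid, invalid = [], []
    seen = set()
    for raw in ids:
        candidate = raw.strip().upper()  # tolerate stray spaces and lowercase
        if NCT_ID_PATTERN.fullmatch(candidate):
            if candidate not in seen:  # keep the first occurrence only
                seen.add(candidate)
                valid.append(candidate)
        else:
            invalid.append(raw)
    return valid, invalid

valid_ids, bad_ids = split_valid_invalid(
    ['NCT02125461', ' nct01721746 ', 'NCT123', 'NCT02125461']
)
print(valid_ids)
print(bad_ids)
```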



In [None]:
# ==============================================================================
# TODO: MODIFY THIS LIST WITH THE NCT IDs YOU WANT TO FETCH
# ==============================================================================

NCT_IDS = [
    'NCT02125461', 'NCT01721746'
]

# You can change the name of your final output file here
OUTPUT_FILENAME = 'clinical_trials_data.csv'

# --- Advanced Configuration (You can leave this as is) ---

# We must send a 'User-Agent' header to identify ourselves as a browser.
# This is CRITICAL to avoid being blocked by the server (403 Forbidden error).
API_HEADERS = {
    'accept': 'text/csv',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36'
}

# These are the data fields we are requesting from the API.
# The '|' character is used as a separator.
API_PARAMS = {
    'format': 'csv',
    'fields': 'NCT Number|Study Title|Study URL|Acronym|Study Status|Brief Summary|Study Results|Conditions|Interventions|Primary Outcome Measures|Secondary Outcome Measures|Other Outcome Measures|Sponsor|Collaborators|Sex|Age|Phases|Enrollment|Funder Type|Study Type|Study Design|Start Date|Primary Completion Date|Completion Date|First Posted|Results First Posted|Last Update Posted|Locations|Study Documents',
}
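
The long 'fields' string can be awkward to edit by hand. One option, sketched below, is to keep the column names in a Python list and join them with '|'. The names `FIELD_NAMES` and `build_api_params` are hypothetical, and the list shows only a few of the columns from the configuration above.

```python
# A shortened, illustrative subset of the columns used in API_PARAMS.
FIELD_NAMES = [
    'NCT Number',
    'Study Title',
    'Study Status',
    'Conditions',
]

def build_api_params(field_names):
    """Build a params dict in the same shape as API_PARAMS from a list of names."""
    return {
        'format': 'csv',
        'fields': '|'.join(field_names),  # '|' is the separator used above
    }

params = build_api_params(FIELD_NAMES)
print(params['fields'])
```

With this layout, adding or removing a column is a one-line change, and a stray or missing separator is harder to introduce.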

# Step 3: Fetch the Data and Create the CSV File

This is the main part of the script. When you run this cell, it will:

1.   Open the OUTPUT_FILENAME you defined above, ready for writing.
2.   Loop through each nct_id in your NCT_IDS list.
3.   Make a request to the ClinicalTrials.gov API for that ID.
4.   If the request is successful, write the data to the CSV file. The header row is written only once, from the first successful response.
5.   If an error occurs (e.g., for an invalid ID), print an error message and continue to the next ID.
6.   Wait for 1 second (time.sleep(1)) before the next request, to be polite to the server and avoid rate limits.


The process might take some time depending on the number of IDs in your list. Watch the output below the cell to see its progress!
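
The header handling is the least obvious part of the loop: each response carries its own header row, but the output file should contain only one. The sketch below isolates that logic in a hypothetical helper, `merge_csv_texts`, which works on plain strings so it can be tried without calling the API. It assumes, as the main cell does, that every response uses the same columns.

```python
import csv
import io

def merge_csv_texts(texts):
    """Merge several CSV texts into one, keeping only the first header row."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator='\n')
    header_written = False
    for text in texts:
        if not text.strip():
            continue  # skip empty responses, as the main loop does
        reader = csv.reader(io.StringIO(text))
        header = next(reader)  # first row of each response is its header
        if not header_written:
            writer.writerow(header)
            header_written = True
        writer.writerows(reader)  # remaining rows are data
    return out.getvalue()

merged = merge_csv_texts(['A,B\n1,2\n', '', 'A,B\n3,4\n'])
print(merged)
```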

In [None]:
with open(OUTPUT_FILENAME, 'w', newline='', encoding='utf-8') as csv_file:
    csv_writer = csv.writer(csv_file)

    header_written = False

    # Iterate through each NCT ID one by one
    for nct_id in NCT_IDS:
        url = f"https://clinicaltrials.gov/api/v2/studies/{nct_id}"
        print(f"Fetching data for {nct_id}...")

        try:
            # Make a single, synchronous request
            response = requests.get(url, params=API_PARAMS, headers=API_HEADERS, timeout=30)

            # This will raise an error for 4xx or 5xx status codes (like 403, 404, 500)
            response.raise_for_status()

            # Use response.text which decodes the content for us
            text_data = response.text

            # Handle empty successful responses
            if not text_data.strip():
                print(f"  Warning: Received an empty but successful response for {nct_id}. Skipping.")
                continue

            # Use io.StringIO to treat the string as a file for the csv module
            string_file = io.StringIO(text_data)
            csv_reader = csv.reader(string_file)

            # The first line of the response is the header
            header = next(csv_reader)

            # Write the header only once from the first successful request
            if not header_written:
                csv_writer.writerow(header)
                header_written = True

            # Write all remaining data rows (should be just one for this API)
            for data_row in csv_reader:
                csv_writer.writerow(data_row)

            print(f"  Success! Wrote data for {nct_id}.")

        except requests.exceptions.HTTPError as e:
            # This catches 4xx/5xx errors, including our 403 or 404 for invalid IDs
            print(f"  Error for {nct_id}: {e}")
        except requests.exceptions.RequestException as e:
            # This catches other network problems (e.g., connection timeout)
            print(f"  A network error occurred for {nct_id}: {e}")

        finally:
            # --- POLITENESS DELAY ---
            # Runs after every attempt, including the 'continue' above for empty
            # responses, so we never hammer the server.
            time.sleep(1)  # Sleep for 1 second

print(f"\nProcessing complete. Data saved to '{OUTPUT_FILENAME}'.")
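
Once the cell finishes, a quick sanity check is to read the file back and count what was written. This is a minimal sketch. The helper name `summarize_csv` is hypothetical, and it assumes the file has a single header row followed by one row per trial.

```python
import csv

def summarize_csv(path):
    """Return (number of data rows, list of column names) for a CSV file."""
    with open(path, newline='', encoding='utf-8') as f:
        reader = csv.DictReader(f)
        rows = list(reader)
        columns = list(reader.fieldnames or [])  # fieldnames is None for an empty file
    return len(rows), columns

# Example usage in this notebook, after Step 3 has run:
# row_count, columns = summarize_csv(OUTPUT_FILENAME)
# print(f"{row_count} trial(s), {len(columns)} column(s)")
```

If the row count is lower than the number of IDs in NCT_IDS, the progress messages printed above show which IDs failed.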