# ClinicalTrials.gov

How to search for datasets on ClinicalTrials.gov and extract them into a pandas DataFrame in a Jupyter Notebook.

ClinicalTrials.gov is a registry of clinical trials conducted around the world. To search and extract its data, you can use the [API](https://clinicaltrials.gov/api/gui) provided by the website. The steps below search for trials, fetch selected fields, and load them into a pandas DataFrame in a Jupyter Notebook. The code uses the classic API endpoints (`/api/query/...`); ClinicalTrials.gov has since released a newer API version, so consult the current documentation if these endpoints are unavailable.

First, install the required libraries if you haven't already. You can install them using pip:
```
!pip install pandas requests
```

Next, import the necessary libraries in your Jupyter Notebook:

In [7]:
import requests
import pandas as pd

Define a function to fetch data from clinicaltrials.gov using the API:

In [8]:
def fetch_trials(search_term, max_results=1000):
    """Query the study_fields endpoint; return parsed JSON, or None on an HTTP error."""
    base_url = "https://clinicaltrials.gov/api/query/study_fields"
    fields = [
        "NCTId",
        "BriefTitle",
        "Condition",
        "EnrollmentCount",
        "StudyType",
        "StatusVerifiedDate",
        "PrimaryCompletionDate",
        "ResultsFirstPostDate",
        "LastUpdatePostDate",
        "StudyFirstPostDate",
        "LocationCountry"
    ]

    params = {
        "expr": search_term,
        "fields": ",".join(fields),
        "min_rnk": 1,
        "max_rnk": max_results,
        "fmt": "json"
    }

    # A timeout keeps the notebook from hanging on a stalled connection
    response = requests.get(base_url, params=params, timeout=30)

    if response.status_code == 200:
        return response.json()
    else:
        print(f"Error fetching data: {response.status_code}")
        return None
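
To see what the `params` dictionary turns into, the query string can be previewed without sending a request. This is a minimal sketch using only the standard library; the shortened field list and the `preview_url` name are chosen here for illustration, and the URL that `requests` builds should be equivalent but is not guaranteed to match character for character.

```python
from urllib.parse import urlencode

base_url = "https://clinicaltrials.gov/api/query/study_fields"
params = {
    "expr": "diabetes",
    "fields": ",".join(["NCTId", "BriefTitle"]),  # shortened list for illustration
    "min_rnk": 1,
    "max_rnk": 5,
    "fmt": "json",
}

# urlencode percent-encodes the comma in the field list as %2C
preview_url = f"{base_url}?{urlencode(params)}"
print(preview_url)
```

Pasting a URL like this into a browser is a quick way to check a search expression before running the full fetch.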

Use the function to fetch data based on your search term:

In [9]:
search_term = "diabetes"
data = fetch_trials(search_term)

Extract the data into a pandas DataFrame:

In [10]:
def extract_to_dataframe(data):
    if data:
        # Each entry is a dict holding a Rank plus one list of values per requested field
        trials = data["StudyFieldsResponse"]["StudyFields"]
        df = pd.DataFrame(trials)
        return df
    else:
        print("No data to extract.")
        return None

df = extract_to_dataframe(data)
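
Every field comes back from the API as a list, so each DataFrame cell holds a list rather than a scalar (the brackets in the output that follows). A minimal sketch of one way to flatten those lists before building the DataFrame is given here. The helper name `flatten_study_fields`, the `"; "` separator, and the choice to drop repeated values are assumptions made for illustration, not part of the API.

```python
def flatten_study_fields(records, sep="; "):
    """Turn list-valued fields into plain strings (hypothetical helper).

    Repeated values such as ["Germany", "Germany"] are collapsed to one,
    keeping first-seen order; empty lists become None.
    """
    flat = []
    for record in records:
        row = {}
        for key, value in record.items():
            if isinstance(value, list):
                unique = list(dict.fromkeys(value))  # de-duplicate, keep order
                row[key] = sep.join(str(v) for v in unique) if unique else None
            else:
                row[key] = value  # e.g. the integer Rank
        flat.append(row)
    return flat


# Sample record shaped like one entry of data["StudyFieldsResponse"]["StudyFields"]
sample = [{
    "Rank": 3,
    "NCTId": ["NCT02076568"],
    "Condition": ["Diabetes Mellitus"],
    "EnrollmentCount": ["201"],
    "ResultsFirstPostDate": [],
    "LocationCountry": ["Germany", "Germany", "Germany", "Germany"],
}]
flat_sample = flatten_study_fields(sample)
print(flat_sample[0])
```

With this in place, `pd.DataFrame(flatten_study_fields(trials))` gives one scalar per cell, and `pd.to_numeric(df["EnrollmentCount"])` can then convert the enrollment counts to numbers.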

Now you can work with the DataFrame as you normally would:

In [11]:
print(df.head())

   Rank          NCTId                                         BriefTitle  \
0     1  [NCT04016584]  [Diabetes Pueblo Program - Application and Acc...   
1     2  [NCT04216875]  [Best Practice Study of Diabetes Type 2 Manage...   
2     3  [NCT02076568]  [Diabetes and Partnership: Evaluation of a Dia...   
3     4  [NCT02076542]  [Diabetes and Sports: Evaluation of a Diabetes...   
4     5  [NCT02077686]  [Diabetes and Travel: Evaluation of a Diabetes...   

                                   Condition EnrollmentCount  \
0                           [Type2 Diabetes]            [25]   
1  [Diabetes Mellitus, Type 2, Primary Care]           [738]   
2                        [Diabetes Mellitus]           [201]   
3                        [Diabetes Mellitus]           [284]   
4                        [Diabetes Mellitus]           [262]   

          StudyType StatusVerifiedDate PrimaryCompletionDate  \
0  [Interventional]   [September 2021]   [February 25, 2020]   
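1   [Observational]       [April 2021]        [June 1, 2019]   
2  [Interventional]      [August 2019]        [January 2019]   
3  [Interventional]      [August 2019]        [January 2019]   
4  [Interventional]      [August 2019]        [January 2019]   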

Remember to replace the value of `search_term` with the specific term you want to search for. You can also pass a different `max_results` value to `fetch_trials` to control the number of results fetched.
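
If a search matches more trials than one request returns, the `min_rnk` and `max_rnk` parameters can be stepped through in windows. The sketch below only computes those windows and makes no network calls. `rank_windows` is a hypothetical helper, and the batch size of 1000 is an assumption taken from the default used above, so check the API documentation for the actual per-request limit.

```python
def rank_windows(total, batch_size=1000):
    """Yield (min_rnk, max_rnk) pairs covering ranks 1..total (hypothetical helper)."""
    if total < 0 or batch_size < 1:
        raise ValueError("total must be >= 0 and batch_size must be >= 1")
    start = 1
    while start <= total:
        end = min(start + batch_size - 1, total)
        yield start, end
        start = end + 1


windows = list(rank_windows(2500))
print(windows)  # [(1, 1000), (1001, 2000), (2001, 2500)]
```

Using these pairs would require `fetch_trials` to accept a starting rank as well, since it currently fixes `min_rnk` at 1.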

In [5]:
df.head()

| Unnamed: 0 | Rank | NCTId | BriefTitle | Condition | EnrollmentCount | StudyType | StatusVerifiedDate | PrimaryCompletionDate | ResultsFirstPostDate | LastUpdatePostDate | StudyFirstPostDate | LocationCountry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | [NCT04016584] | [Diabetes Pueblo Program - Application and Acceptability of Culturally Appropriate Latino Education for Insulin Therapy] | [Type2 Diabetes] | [25] | [Interventional] | [September 2021] | [February 25, 2020] | [] | [September 5, 2021] | [July 11, 2019] | [United States] |
| 1 | 2 | [NCT04216875] | [Best Practice Study of Diabetes Type 2 Management in Primary Care in Switzerland] | [Diabetes Mellitus, Type 2, Primary Care] | [738] | [Observational] | [April 2021] | [June 1, 2019] | [] | [May 4, 2021] | [January 3, 2020] | [Switzerland] |
| 2 | 3 | [NCT02076568] | [Diabetes and Partnership: Evaluation of a Diabetes Education Module] | [Diabetes Mellitus] | [201] | [Interventional] | [August 2019] | [January 2019] | [] | [August 9, 2019] | [March 3, 2014] | [Germany, Germany, Germany, Germany] |
| 3 | 4 | [NCT02076542] | [Diabetes and Sports: Evaluation of a Diabetes Education Module] | [Diabetes Mellitus] | [284] | [Interventional] | [August 2019] | [January 2019] | [] | [August 9, 2019] | [March 3, 2014] | [Germany, Germany, Germany, Germany] |
| 4 | 5 | [NCT02077686] | [Diabetes and Travel: Evaluation of a Diabetes Education Module - a Randomized Controlled Trial (PRIMO_Travel)] | [Diabetes Mellitus] | [262] | [Interventional] | [August 2019] | [January 2019] | [] | [August 9, 2019] | [March 4, 2014] | [Germany, Germany, Germany, Germany] |
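
The date columns mix two formats, for example `February 25, 2020` and `January 2019`. A minimal sketch for parsing both with the standard library follows. `parse_trial_date` is a hypothetical helper, and mapping month-only values to the first day of the month is an assumption made for illustration.

```python
from datetime import datetime


def parse_trial_date(text):
    """Parse 'February 25, 2020' or 'January 2019' into a date (hypothetical helper).

    Month-only values are mapped to the first day of that month.
    Returns None for empty or unrecognized input.
    Month names are matched using the current locale (English assumed).
    """
    if not text:
        return None
    for fmt in ("%B %d, %Y", "%B %Y"):
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue  # try the next format
    return None


print(parse_trial_date("February 25, 2020"))  # 2020-02-25
print(parse_trial_date("January 2019"))       # 2019-01-01
```

The function expects the plain string inside each cell, so list-valued cells need to be unpacked first, for example with `cell[0] if cell else None`.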
