# Using clinicaltrials.gov API for extracting information on COVID-19 antibodies

# Background

1. We will use the "full-study" API to extract results. The API returns at most 100 hits per request, so we need to iterate through all hits, 100 at a time, until the last one.

2. We're interested in extracting the following data:

  Inclusion criteria:
 - COVID-19 indication
 - Antibody treatments, combination treatments involving antibodies

  Exclusion criteria:
 - Entries describing preclinical or clinical development of diagnostic antibodies, polyclonal antibodies, convalescent plasma therapies, immune globulin intravenous therapies (IGIV), vaccines, small molecules, and recombinant proteins other than immunoglobulin (Ig), Ig fragments, and Ig fusion proteins
 - Studies and clinical trials that do not explicitly state COVID-19 or SARS-CoV-2 as their indication or target


3. Shown below are some useful resources on the JSON data returned by "full-study" queries. **Note that NOT ALL of these fields are available for every study, so it is recommended to use `try...except` to avoid errors when extracting data from a non-existent field**

 - [List of study fields](https://clinicaltrials.gov/api/info/study_fields_list)
 - [Empty structure of JSON returned from a full-study query](https://clinicaltrials.gov/api/info/study_structure)
 - [Search areas](https://clinicaltrials.gov/api/info/search_areas) : Note that regardless of the hierarchy of the fields in JSON, you can use the search areas directly in the `expr` param (see code)

4. Caveat: Manual inspection of the returned results from the script is strongly recommended to ensure relevancy.
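
Because fields can be missing for a given study (point 3 above), one option is a small helper that walks the nested JSON and returns a default instead of raising. The sketch below is only illustrative: `get_field` is a hypothetical helper name (not part of the API), and the sample record is made up to mimic the structure of a full-study record.

```python
def get_field(record, *path, default=None):
    """Walk nested dicts/lists along `path`; return `default` if any step is missing."""
    current = record
    for key in path:
        try:
            current = current[key]
        except (KeyError, TypeError, IndexError):
            return default
    return current


# Made-up record that mimics part of a full-study response (assumption)
sample_study = {
    "Study": {
        "ProtocolSection": {
            "IdentificationModule": {"NCTId": "NCT00000000"},
        }
    }
}

# Field that exists
nct_id = get_field(
    sample_study, "Study", "ProtocolSection", "IdentificationModule", "NCTId"
)
# Field that does not exist in this record: falls back to the default
status = get_field(
    sample_study, "Study", "ProtocolSection", "StatusModule", "OverallStatus",
    default="unknown",
)
print(nct_id, status)
```

This does the same job as wrapping each lookup in `try...except`, but keeps the fallback value explicit at each call site.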

# How to format your query

There are 2 ways:

## 1. Single field search

```python
import requests

url="https://ClinicalTrials.gov/api/query/full_studies"
params={
    "expr": "NCT04320615",
    "field": "NCTId",
    "min_rnk": 1,
    "max_rnk": 1,
    "fmt": "JSON"
}

data=requests.get(url, params=params).json()
```

Parameters:
- `expr`: the value to search for in the field
- `field`: the field to search. Can only be a single field
- `fields`: fields to return. Only applies to "Study Fields" queries. "Full-Study" queries return all fields and are therefore NOT affected by this parameter
- `min_rnk` and `max_rnk`: when you submit a query, hits are numbered from 1 to the last hit. For full-study queries, you can retrieve at most 100 hits per request. Use these parameters to specify the rank range of the hits you want to retrieve (e.g. 1-100, 101-200, ...), and iterate over the ranges to extract all hits into the SQLite database.
- `fmt`: format, specify as `"JSON"`
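
The `min_rnk`/`max_rnk` windows described above can be generated up front. This is a minimal sketch, assuming the 100-hit limit stated earlier; `rank_windows` is a hypothetical helper name, and the total of 250 is a made-up example value.

```python
def rank_windows(total_matches, page_size=100):
    """Yield (min_rnk, max_rnk) pairs that cover ranks 1..total_matches."""
    for start in range(1, total_matches + 1, page_size):
        # the last window is clipped to the total number of hits
        yield start, min(start + page_size - 1, total_matches)


windows = list(rank_windows(250))
print(windows)  # [(1, 100), (101, 200), (201, 250)]
```

Each pair can then be written into `params['min_rnk']` and `params['max_rnk']` before sending a request.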

## 2. Multiple Field Search (RECOMMENDED)

A bare-bones multi-field search request looks like this:

```python
url="https://ClinicalTrials.gov/api/query/full_studies"
params={
    "expr": "AREA[NCTId]NCT04320615",
    "min_rnk": 1,
    "max_rnk": 1,
    "fmt": "JSON"
}

data=requests.get(url, params=params).json()
```

Parameters:
- `expr`: use an expression string. For how to structure the string, see [ref on logical operators](https://clinicaltrials.gov/api/gui/ref/expr).

  A simple example of the string query:
  ```
  AREA[InterventionDescription]antibody  NOT AREA[InterventionType]diagnostic
  ```

  Note that regardless of the hierarchy of the field in JSON, you can search the field directly using the string query above. The API also appears to have some ability to match synonyms, e.g. COVID-19 will automatically cover covid19, etc.
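
Long expression strings are easier to maintain when assembled from lists of terms. The sketch below is one possible way to do that, assuming the `AREA[...]`, `AND`, `OR` and `NOT` syntax shown above; `build_expr` is a hypothetical helper, not something provided by the API.

```python
def build_expr(condition, include_terms, exclude_terms):
    """Assemble an expression string for the `expr` parameter.

    include_terms / exclude_terms are lists of (search_area, term) tuples.
    Included terms are OR-ed together; each excluded term gets its own NOT.
    """
    parts = [f"AREA[Condition]{condition}"]
    if include_terms:
        ored = " OR ".join(f"AREA[{area}]{term}" for area, term in include_terms)
        parts.append(f"AND ({ored})")
    for area, term in exclude_terms:
        parts.append(f"NOT AREA[{area}]{term}")
    return " ".join(parts)


expr = build_expr(
    "COVID-19",
    include_terms=[
        ("InterventionDescription", "antibody"),
        ("InterventionName", "antibody"),
    ],
    exclude_terms=[
        ("InterventionType", "diagnostic"),
        ("OfficialTitle", "vaccine"),
    ],
)
print(expr)
```

The resulting string can be passed as the `expr` value in `params`.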

## 3. How many hits are there in my query?

The number of hits matching your query can be found in `data['FullStudiesResponse']['NStudiesFound']`. For example:

```python

url="https://ClinicalTrials.gov/api/query/full_studies"

params={
    "expr": "AREA[Condition]COVID-19",
    "fmt": "JSON",
    "min_rnk": 1,
    "max_rnk": 1

}
data=requests.get(url, params=params).json()

# total number of trials in the database
print("Total number of studies: ", data['FullStudiesResponse']['NStudiesAvail'])

# total number of trials found matching your query
print("Total number of trials matching query: ", data['FullStudiesResponse']['NStudiesFound'])

```
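
When a query returns few or no hits, it can help to read the counts and the study list defensively. The sketch below works on an already-parsed response; `summarize_response` is a hypothetical helper, the sample dictionary is made up, and the idea that `FullStudies` might be absent when nothing matches is an assumption rather than documented behaviour.

```python
def summarize_response(data):
    """Return (number of hits found, list of study records) from a parsed response."""
    resp = data.get("FullStudiesResponse", {})
    found = int(resp.get("NStudiesFound", 0))
    # fall back to an empty list if the key is missing (assumed possible for 0 hits)
    studies = resp.get("FullStudies", [])
    return found, studies


# Made-up response with no matching studies
empty_response = {"FullStudiesResponse": {"NStudiesAvail": 1000, "NStudiesFound": 0}}
found, studies = summarize_response(empty_response)
print(found, len(studies))
```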

# Example Code

## Imports

In [1]:
import requests
import sqlite3


## Base url

In [2]:
url="https://ClinicalTrials.gov/api/query/full_studies"


## Database validation

In [3]:
# connect to SQLITE database
conn = sqlite3.connect('covid_mabs.sqlite')
cur = conn.cursor()

# get a list of nctid currently in the database
nctidList=[]
cur.execute("select nctid from covid_clinical_trials_API")
for item in cur.fetchall(): # a list of tuples
    nctidList.append(item[0].lower())


# get a list of mabs currently in the database
mabList=[]
cur.execute("select name from covid_clinical_trials_API")
for item in cur.fetchall(): # a list of tuples
    mabList.append(item[0].lower())
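
The same lookup can be tried without an existing database file. The sketch below uses an in-memory SQLite database with made-up rows; the table and column names follow the cell above, while the column types are an assumption. It stores the lowercased values in sets, so membership checks are fast and any ID compared against them must be lowercased as well.

```python
import sqlite3

# In-memory database with made-up rows, for illustration only
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE covid_clinical_trials_API (nctid TEXT, name TEXT)")
cur.executemany(
    "INSERT INTO covid_clinical_trials_API VALUES (?, ?)",
    [("NCT00000001", "Examplemab"), ("NCT00000002", None)],
)

cur.execute("SELECT nctid, name FROM covid_clinical_trials_API")
rows = cur.fetchall()  # a list of (nctid, name) tuples
conn.close()

# Lowercase once, skip NULLs, and use sets for O(1) membership tests
known_ids = {nctid.lower() for nctid, _ in rows if nctid}
known_mabs = {name.lower() for _, name in rows if name}


def is_known(nct_id):
    """Case-insensitive check against the IDs already stored."""
    return nct_id.lower() in known_ids


print(is_known("NCT00000001"), is_known("NCT99999999"))
```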

## Query Version 1

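Search for "mab" in `InterventionName`. Note that this produces a narrower result than searching for "mab" in `DetailedDescription` or `BriefSummary`.

Note that:
1. Using `AREA[InterventionName]mab` in the query expression might miss many hits, so the query below does not include it. Instead, the code checks each returned intervention name for the substring "mab".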
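
The substring check on intervention names can be isolated into a small function and tried on a hand-made record. This is a sketch under assumptions: `find_mabs` is a hypothetical helper, the sample data is invented, and the match is made case-insensitive here (so names such as "mAb" are also caught), which is a design choice of this example.

```python
def find_mabs(protocol_section):
    """Return intervention names that contain 'mab' (case-insensitive)."""
    interventions = (
        protocol_section.get("ArmsInterventionsModule", {})
        .get("InterventionList", {})
        .get("Intervention", [])
    )
    return [
        interv["InterventionName"]
        for interv in interventions
        if "mab" in interv.get("InterventionName", "").lower()
    ]


# Invented ProtocolSection fragment, for illustration only
sample_protocol = {
    "ArmsInterventionsModule": {
        "InterventionList": {
            "Intervention": [
                {"InterventionType": "Drug", "InterventionName": "Tocilizumab"},
                {"InterventionType": "Drug", "InterventionName": "Placebo"},
                {"InterventionType": "Biological", "InterventionName": "Anti-SARS-CoV-2 mAb"},
            ]
        }
    }
}

mabs = find_mabs(sample_protocol)
print(mabs)  # ['Tocilizumab', 'Anti-SARS-CoV-2 mAb']
```

A plain substring match can also pick up unrelated names that happen to contain "mab", which is one more reason to inspect the results manually.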

### 1. Get total number of hits

In [4]:
# parameters for query

params={
    "expr": "AREA[Condition]COVID-19 NOT AREA[InterventionType]diagnostic NOT AREA[InterventionType]convalescent NOT AREA[InterventionType]plasma NOT AREA[OfficialTitle]convalescent NOT AREA[OfficialTitle]seroprevalence NOT AREA[OfficialTitle]serological NOT AREA[OfficialTitle]plasma NOT AREA[OfficialTitle]test NOT AREA[OfficialTitle]prevalence NOT AREA[OfficialTitle]polyvalent",
    "fmt": "JSON",
    "min_rnk": 1,  # starting index number (default 1, don't put 0 here)
    "max_rnk": 1   # retrieve 1 record, so we can obtain the value of totalMatches
}

response=requests.get(url, params=params)
data=response.json()
totalMatches=data['FullStudiesResponse']['NStudiesFound']

totalMatches

1387

In [5]:
# use i to iterate through the indexes of hits
i=1

# use counter to record the number of "FILTERED" hits. We're filtering using our customized script
counter=0

while i<= totalMatches:
    min_rnk=i
    if min_rnk+99 <= totalMatches:  #returns max 100 hits per query
        max_rnk=min_rnk+99
    else:
        max_rnk=totalMatches
    params['min_rnk']=min_rnk
    params['max_rnk']=max_rnk
    i+=100
    data=requests.get(url, params=params).json()

    for study in data['FullStudiesResponse']['FullStudies']:
        try:

            # dict of fields that are most relevant to what we want
            protSelect=study['Study']['ProtocolSection']


            # check if nctid already exists in the database
            # (nctidList holds lowercased IDs, so lowercase before comparing)
            nctInDb=protSelect['IdentificationModule']['NCTId'].lower() in nctidList

            # only process contents if nctId is not currently in the database

            if not nctInDb:
                # if a match is found
                found=False

                # make a list of intervention names (there might be multiple mab drugs in 1 study)
                mabs=[]
                # make a list of phases (might be multiple phases in 1 study)
                phases=[]

                # iterate through a list of dictionaries of interventions (this includes BOTH intervention type = drug and intervention type = biological)
                for interv in protSelect['ArmsInterventionsModule']['InterventionList']['Intervention']:
                    if "mab" in interv['InterventionName']:
                        found=True
                        mabs.append(interv['InterventionName'])

                # if found, iterate through a list of phases
                if found:
                    for p in protSelect['DesignModule']['PhaseList']['Phase']:
                        phases.append(p)

                # print a summary of results
                if found:
                    print(protSelect['IdentificationModule']['NCTId'])
                    print(mabs)
                    print(phases)
                    try:
                        print(protSelect['StatusModule']['OverallStatus'])
                    except KeyError:
                        pass
                    print(protSelect["DescriptionModule"]["BriefSummary"])
                    print("\n\n")
                    counter +=1

        # if a field is missing from this study, skip the study
        except KeyError:
            pass

print("A total of ", counter, " matching entries were retrieved.")

NCT04452474
['Olokizumab 64 mg']
['Phase 2', 'Phase 3']
Not yet recruiting
The primary objective of the study is to evaluate the efficacy of a single dose of OKZ (64 mg) vs placebo in addition to standard therapy in patients with severe SARS-CoV-2 infection (COVID-19) at Day 29.



NCT04445272
['Tocilizumab']
['Phase 2']
Recruiting
At present, no treatment has been approved for COVID-19. However, in light of the increased interest on using the anti-cytokine therapy targeting IL-6 tocilizumab in COVID-19 infected patients due to its potential benefit, the Spanish Agency for Medicine and Health Products (Agencia Española de Medicamentos y Productos Sanitarios, AEMPS) have initiated the controlled distribution of the drug. Tocilizumab is indeed proposed as a potential treatment for severe COVID-19 in Spain. Based on the positive results of tocilizumab in the treatment of COVID-19 patients and the experience of tocilizumab in inducing rapid reversal of CSS in other pathologies several clin

## Query Version 2

In [6]:
params={
    "expr": "AREA[Condition]COVID-19 AND (AREA[InterventionDescription]antibody OR AREA[InterventionName]antibody OR AREA[OfficialTitle]antibody) NOT AREA[InterventionType]diagnostic NOT AREA[InterventionType]convalescent NOT AREA[InterventionType]plasma NOT AREA[OfficialTitle]convalescent NOT AREA[OfficialTitle]seroprevalence NOT AREA[OfficialTitle]serological NOT AREA[OfficialTitle]ivig NOT AREA[OfficialTitle]plasma NOT AREA[OfficialTitle]test NOT AREA[OfficialTitle]prevalence NOT AREA[OfficialTitle]polyvalent NOT AREA[OfficialTitle]vaccine",
    "fmt": "JSON",
    "min_rnk": 1,  #starting index number (default 1, don't put 0 here)
    "max_rnk": 1   #max is 100 for full studies. You have to run another query starting at 101 if needed

}

response=requests.get(url, params=params)
data=response.json()
totalMatches=data['FullStudiesResponse']['NStudiesFound']

totalMatches

20

In [7]:
# a counter to keep track of the number of FILTERED hits that we get
counter=0
i=1

while i<= totalMatches:
    min_rnk=i
    if min_rnk+99 <= totalMatches:
        max_rnk=min_rnk+99
    else:
        max_rnk=totalMatches
    params['min_rnk']=min_rnk
    params['max_rnk']=max_rnk
    i+=100
    data=requests.get(url, params=params).json()

    for study in data['FullStudiesResponse']['FullStudies']:
        covered=False
        protSelect=study['Study']['ProtocolSection'] # dict of fields that are most relevant to what we want
        try:
            for interv in protSelect['ArmsInterventionsModule']['InterventionList']['Intervention']:
                if "mab" in interv['InterventionName']:
                    covered=True

            if not covered:
                print(protSelect['IdentificationModule']['NCTId'])
                print(protSelect['IdentificationModule']['OfficialTitle'])
                print("\n\n")
                counter=counter+1
        except KeyError:
            pass


print("A total of ", counter, " matching entries were retrieved.")

NCT04441918
A Randomized, Double-blind, Placebo-controlled, Phase I Clinical Study to Evaluate the Tolerability, Safety, Pharmacokinetic Profile and Immunogenicity of JS016 (Anti-SARS-CoV-2 Monoclonal Antibody) Injection in Chinese Healthy Subjects After Intravenous Infusion of Single Dose



NCT04454398
A Randomized, Placebo-controlled Study to Evaluate Safety, Pharmacokinetics, Preliminary Efficacy of Three Dose Levels of a Single Dose of STI-1499 (COVI-GUARD), a COVID-19 Targeting Monoclonal Antibody, in COVID-19 Hospitalized Patients



NCT04464395
Immunotherapy of COVID-19 With B-Cell Activating CPI-006 Monoclonal Antibody



NCT04425629
A Master Protocol Assessing the Safety, Tolerability, and Efficacy of Anti-Spike (S) SARS-CoV-2 Monoclonal Antibodies for the Treatment of Ambulatory Patients With COVID-19



NCT04426695
A Master Protocol Assessing the Safety, Tolerability, and Efficacy of Anti-Spike (S) SARS-CoV-2 Monoclonal Antibodies for the Treatment of Hospitalized Patients 