# Exploring GDELT 2.0 API

## Introduction

The Data Against Feminicides project aims to automate the tracking of gender-based killings of women across the world by drawing on multiple data sources. In this notebook, we explore the GDELT project's API as a way to fetch news articles related to feminicides. We investigate several key questions about data retrieval, story filtering, and ingestion for analysis.

### Objectives

1. Explore the capabilities of the GDELT API.
2. Determine if the GDELT API can be used to fetch lists of URLs matching boolean queries.
3. Investigate filtering options to obtain results from the last three days.
4. Evaluate the possibility of filtering by geographic country or state/province of publication.
5. Identify any API limits or associated monetary costs.



# Question - Can we fetch a list of URLs matching Boolean queries?

After reading the documentation, I experimented with the GDELT API to understand its capabilities and how it can be used for data retrieval. We send HTTP GET requests to the GDELT API with the formulated query to retrieve a list of articles matching our criteria.

NOTE: Data Format

We explored two different data formats provided by the API:
- JSON Format: Structured data suitable for analysis.
- HTML Format: Human-readable format suitable for presentation.
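
As a minimal sketch of the JSON option: the cells below only parse the HTML output, so the snippet here assumes that the API accepts a "format" parameter with the value "json" and that "artlist" responses carry an "articles" array whose items have a "url" field. Both assumptions should be checked against the GDELT documentation. The helper names `build_artlist_url` and `extract_urls` are hypothetical.

```python
import json
from urllib.parse import urlencode

API_URL = "https://api.gdeltproject.org/api/v2/doc/doc"


def build_artlist_url(query, maxrecords=50, fmt="json"):
    """Build a GET URL for an article-list request (hypothetical helper)."""
    params = {
        "query": query,
        "mode": "artlist",
        "maxrecords": maxrecords,
        "format": fmt,  # assumption: "json" selects the JSON output format
    }
    return f"{API_URL}?{urlencode(params)}"


def extract_urls(body):
    """Return the article URLs found in a JSON response body.

    The API can answer with a plain-text error message, so a body that
    is not valid JSON yields an empty list instead of raising.
    """
    try:
        payload = json.loads(body)
    except ValueError:
        return []
    if not isinstance(payload, dict):
        return []
    # Assumption: articles live under an "articles" key, each with a "url".
    return [a["url"] for a in payload.get("articles", []) if "url" in a]


print(build_artlist_url("(femicidio OR feminicidio)"))
```

The URL produced by `build_artlist_url` can be passed to `requests.get`, and `response.text` can then be handed to `extract_urls`.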



In [13]:
## REQUIRED LIBRARIES

import requests  # Sends GET requests to the GDELT API with the defined query parameters

from datetime import datetime, timedelta  # For date and time manipulation
from bs4 import BeautifulSoup  # For parsing HTML and extracting data



## Testing Using the Provided Website Example
- Initially, we tested with the example URL provided on the website, which returned a list of wildlife-crime articles. This allowed us to analyze and understand the code structure.
- However, we ran into a limitation when we tried our sample query related to feminicides in Argentina to test the API's functionality.
- The challenge arose from the length of our query, which points to a potential limitation in handling extensive keyword searches. As a potential solution, we may consider batch processing for more complex queries.

In [14]:
# Define the API URL
url = "https://api.gdeltproject.org/api/v2/doc/doc"

# Define the query parameters
params = {
    "query": '("wildlife crime" OR poaching OR "illegal fishing" OR "wildlife trade")',
    "mode": "artlist",
    "maxrecords": 100,
    "timespan": "1week"
}

# Send the GET request
response = requests.get(url, params=params)

# Check if the request was successful
if response.status_code == 200:
    # Parse the HTML response
    soup = BeautifulSoup(response.text, 'html.parser')
    
    # Extract article details
    article_elements = soup.find_all('a')
    
    for i, article in enumerate(article_elements):
        title = article.find('span', class_='arttitle').text
        source = article.find('span', class_='sourceinfo').text
        link = article['href']
        
        print(f"Article {i + 1}:")
        print(f"Title: {title}")
        print(f"Source Info: {source}")
        print(f"Link: {link}")
        print("\n")
else:
    print("Error:", response.status_code)


Article 1:
Title: Intensifying zero poaching efforts against illegal wildlife trade
Source Info:    kuenselonline.com  3 days ago () English    Bhutan
Link: https://kuenselonline.com/intensifying-zero-poaching-efforts-against-illegal-wildlife-trade/


Article 2:
Title: Illegal wildlife trade threatens  nearly half  of Unesco sites
Source Info:    theweek.com  4 days ago () English    United States
Link: https://theweek.com/poaching/83663/illegal-wildlife-trade-threatens-nearly-half-of-unesco-sites


Article 3:
Title: All Philippine eagles face threat of habitat destruction – UNODC
Source Info:    philstar.com  6 hours ago () English    Philippines
Link: https://www.philstar.com/headlines/2023/09/18/2297153/all-philippine-eagles-face-threat-habitat-destruction-unodc


Article 4:
Title: Sri Lankan Navy arrests eight Rameswaram fishermen for alleged violation of IMBL
Source Info:    newindianexpress.com  3 days ago () English    India
Link: https://www.newindianexpress.com/states/tamil-na

##### Approach 
- Define the API URL, which is "https://api.gdeltproject.org/api/v2/doc/doc".
- Define the query to send. In later cells it is stored in the example_query variable.
- Set query parameters such as "mode", "maxrecords", and "timespan" to control the API request. Here we retrieve articles in "artlist" mode, limit the maximum number of records to 100, and set the timespan to "1week".
- If the request is successful (status code 200), parse the HTML response using BeautifulSoup and extract article details such as title, source information, and link.


## Testing on Sample Query 


### Defining Functions

In [15]:
url = "https://api.gdeltproject.org/api/v2/doc/doc"

def query_and_print_results(url, example_query):
    params = {
        "query": example_query,
        "mode": "artlist",
        "maxrecords": 50,
    }

    response = requests.get(url, params=params)

    if response.status_code == 200:
        soup = BeautifulSoup(response.text, 'html.parser')
        article_elements = soup.find_all('a')

        for i, article in enumerate(article_elements):
            title = article.find('span', class_='arttitle').text
            source = article.find('span', class_='sourceinfo').text
            link = article['href']

            print(f"Article {i + 1}:")
            print(f"Title: {title}")
            print(f"Source Info: {source}")
            print(f"Link: {link}")
            print("\n")
    else:
        print("Error:", response.status_code)


In [16]:

def query_and_get_results(url, example_query):
    params = {
        "query": example_query,
        "mode": "artlist",
        "maxrecords": 50,
    }

    response = requests.get(url, params=params)

    if response.status_code == 200:
        return response.content  # Return the response content
    else:
        print("Error:", response.status_code)
        return None



In [17]:

example_query = """
(asesinato OR homicidio OR femicidio OR feminicidio OR travesticidio OR transfemicidio OR Lesbicidio OR asesina OR 
asesinada OR muerta OR muerte OR mata OR mató OR dispara OR balea OR apuñala OR acuchillada OR golpeada OR estrangula OR 
ahogada OR degollada OR incinera OR quemada OR envenenada OR "prendida fuego" OR descuartizada OR "sin vida" OR intento OR 
"intento de asesinato" OR "Intentó asesinarla" OR "intento de femicidio" OR "intento de transfemicidio" OR 
"intento de travesticidio" OR "intento de lesbicidio" OR "intentó matarla" OR suicidio OR "se quito la vida" OR 
"se mató" OR "se suicido" OR "se ahorco") AND (mujer OR niña OR "una joven" OR "una adolescente" OR 
"una chica" OR "cuerpo de una mujer" OR "restos" OR "cadaver de una mujer" OR prostituta OR 
"trabajadora sexual" OR "mujer trans" OR "una travesti" OR "hombre vestido de mujer")
"""


query_and_print_results(url, example_query)


In [35]:
response_content = query_and_get_results(url, example_query)
print(response_content)

b"There was an error with one or more of your query parameters. Make sure you don't have any spaces between commands and their parameters.\n"


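- We can see a limitation here: according to the error returned by the API, the long query could not be searched successfully. Note that the request still came back with status code 200, so the error only shows up in the response body.
- Let's trim our query and try again.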
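
Before trimming keywords, one thing worth ruling out is whitespace: the error text mentions spaces, and the triple-quoted query strings in this notebook carry newlines. The sketch below collapses whitespace and reports the size of a query so that failing and working queries can be compared. Whether newlines contribute to the error was not tested here; `normalize_query` and `summarize_query` are hypothetical helper names.

```python
def normalize_query(raw_query):
    """Collapse newlines and repeated spaces into single spaces."""
    return " ".join(raw_query.split())


def summarize_query(raw_query):
    """Report the size of a query after whitespace normalization."""
    query = normalize_query(raw_query)
    return {
        "characters": len(query),
        "or_operators": query.count(" OR "),
        "and_operators": query.count(" AND "),
    }


example = """
(femicidio OR feminicidio OR mujer )
"""
print(normalize_query(example))
print(summarize_query(example))
```

Logging these numbers next to each response makes it easier to see at what size a query stops returning articles.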

### Testing with a short query

In [19]:
# Define a shorter query (done manually)
example_query = """
(femicidio OR feminicidio OR mujer ) 
"""
query_and_print_results(url, example_query)


Article 1:
Title: CFI Fridays & News from Colombia ! 
Source Info:    mailchi.mp  2 months ago () English    Northern Mariana Islands
Link: https://mailchi.mp/worldcocoa.org/cfi-fridays-news-from-colombia


Article 2:
Title: Period Poverty in Argentina - The Borgen Project
Source Info:    borgenproject.org  2 months ago () English    United States
Link: https://borgenproject.org/period-poverty-in-argentina/


Article 3:
Title: City of Edinburg to host 10th annual Fridafest
Source Info:    valleycentral.com  2 months ago () English    United States
Link: https://www.valleycentral.com/news/local-news/city-of-edinburg-to-host-10th-annual-fridafest/


Article 4:
Title: ASU Laura Bush Institute  Dia de la Mujer  Event Planned | KKSA AM NEWS - TALK - SPORTS
Source Info:    kksa-am.com  4 days ago () English    United States
Link: https://www.kksa-am.com/2023/09/13/14370/


Article 5:
Title: Rizal hat in Berlin | Inquirer Opinion
Source Info:    opinion.inquirer.net  1 month ago () English   

- We can see the query works with a shorter length and a single operator. Now let's add more search keywords and operators.

### Testing with Both OR & AND Operators

In [20]:
# Define the shorter query
example_query = """
(asesinato OR homicidio OR femicidio ) AND (mujer OR niña OR "una joven" )
"""
query_and_print_results(url, example_query)


In [21]:
response_content = query_and_get_results(url, example_query)
print(response_content)

b'<!DOCTYPE html>\n<html>\n\n<head>\n<meta charset="UTF-8">\n<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no" />\n<title>[\n(asesinato OR homicidio OR femicidio ) AND (mujer OR ni&#x00f1;a OR "una joven" )\n]</title>\n    \n<STYLE>\nbody, div, h1, h2, h3, h4, h5, h6, p, ul, ol, li, dl, dt, dd, img, form, fieldset, input, textarea, blockquote {\n\tmargin: 0; padding: 0; border: 0;\n}\n\nhtml, body {\n\tbackground: #fff;\n\t/* font: 16px/30px "Helvetica", sans-serif; */\n\tfont: 13px arial;\n\tcolor: #000;\n\theight: 100%;\n\t//width: 100%;\n\tmin-height: 100%;\n\t//padding: 5px;\n        //overflow-x: hidden;\n}\n.arttitle {\n    font-family: Roboto,Helvetica,Arial;\n    font-size: 18px;\n    font-weight: 500;\n    color: #000;\n    display: block;\n    padding-bottom: 10px;\n}\n.sourceinfo {\n    font-size: 12px;\n    font-family: Roboto,Helvetica,Arial;\n    color: #888;\n}\n.sourceinfolink {\n    font-size: 12px;\n}\na {\n    

###### This indicates that no content matched our search query, maybe due to its complexity. But what if we translate the words in the query into English?

### Testing after Translating the Spanish Keywords into English

In [22]:
# Define the shorter query
example_query_translated = """
(murder OR homicide OR femicide) AND (woman OR girl OR a young woman)"""

query_and_print_results(url, example_query_translated)


Article 1:
Title: Envían a la cárcel de Chonchocoro al presunto feminicida de Marlene , joven cantante y mamá de una niña
Source Info:    eju.tv  2 days ago () Spanish    Bolivia
Link: https://eju.tv/2023/09/envian-a-la-carcel-de-chonchocoro-al-presunto-feminicida-de-marlene-joven-cantante-y-mama-de-una-nina/


Article 2:
Title: Femminicidio Piano di Sorrento , la famiglia di Anna Scala : Vogliamo giustizia
Source Info:    fanpage.it  1 month ago () Italian    Italy
Link: https://www.fanpage.it/napoli/femminicidio-piano-di-sorrento-la-famiglia-di-anna-scala-vogliamo-giustizia/


Article 3:
Title: Nelj henkil syytetn nuoren naisen murhasta Oulun Hiirosessa
Source Info:    yle.fi  2 weeks ago () Finnish    Finland
Link: https://yle.fi/a/74-20047939


Article 4:
Title: Caso Cecilia Strzyzowski : dictan prisión preventiva para toda la familia Sena y otros cuatro imputados
Source Info:    losandes.com.ar  2 months ago () Spanish    Argentina
Link: https://www.losandes.com.ar/policiales/caso

- We observed that it returned articles in various languages and from diverse regions, including Spanish, Finnish, and Italian. This observation suggests that leveraging the translation feature, along with the source region and source language parameters, could enhance query performance. It appears that the API exhibits a better understanding of search query terms in English compared to the native languages of the news articles.

In [23]:
## In this query we will use both AND / OR operators with english keywords 
example_query_english_OR_AND = """
('attempted murder' OR 'attempted femicide' OR 'attempted transfemicide' OR 'attempted lesbicide' OR 'dismembered' OR 'lifeless') AND (woman OR girl OR  'a teenager' OR 'a girl' OR 'remains' OR prostitute OR 'sex worker' OR 'trans woman')
"""
query_and_print_results(url,example_query_english_OR_AND)

Article 1:
Title: Grain Valley couple sentenced to life in prison for woman murder
Source Info:    kctv5.com  4 days ago () English    United States
Link: https://www.kctv5.com/2023/09/12/grain-valley-couple-sentenced-life-prison-womans-murder/


Article 2:
Title: நடுரோட்டில் இளம்பெண்ணை நிர்வாணமாக்கி வாலிபர் ரகளை - 15 நிமிடங்கள் நிற்க வைத்து பாலியல் தொல்லை
Source Info:    maalaimalar.com  1 month ago () Tamil    India
Link: https://www.maalaimalar.com:443/news/national/in-hyderabad-a-young-girl-was-sexually-harassed-by-making-her-stand-there-for-15-minutes-by-stripping-her-naked-in-the-middle-of-the-road-647318


Article 3:
Title: Read all Latest Updates on and about வாலிபர் ரகளை
Source Info:    maalaimalar.com  1 month ago () Tamil    India
Link: https://www.maalaimalar.com/tags/%E0%AE%B5%E0%AE%BE%E0%AE%B2%E0%AE%BF%E0%AE%AA%E0%AE%B0%E0%AF%8D-%E0%AE%B0%E0%AE%95%E0%AE%B3%E0%AF%88


Article 4:
Title: Grain Valley couple found guilty of murder , other charges
Source Info:    kmbc.com  2 m

##### Through multiple iterations, we observed that combining OR and AND operators in a single query may limit the query's ability to handle a high number of keywords. However, such queries have a greater likelihood of returning results that align closely with our project's context.

In [34]:
# In this query we test a higher number of keywords, using only the OR boolean operator
example_query_english_OR = """
(murder OR homicide OR femicide OR feminicide OR "committed suicide" OR "hanged herself" OR dead OR death OR kills OR killed OR shoots OR shoots OR stabs OR stabbed OR beaten OR strangled OR drowned OR beheaded OR incinerates OR burns)
"""

query_and_print_results(url,example_query_english_OR)

Article 1:
Title: Femminicidio a Pozzuoli | uccide la moglie e si spara | in casa cerano i tre figli minorenni
Source Info:    zazoom.it  1 month ago () Italian    Italy
Link: https://www.zazoom.it/2023-07-28/femminicidio-a-pozzuoli-uccide-la-moglie-e-si-spara-in-casa-cerano-i-tre-figli-minorenni/13280114/


Article 2:
Title: Jäger aus Hamm erschießt Stewardess bei Mord am Flughafen Frankfurt
Source Info:    hna.de  1 month ago () German    Germany
Link: https://www.hna.de/hessen/flughafen-frankfurt-mord-jaeger-stewardess-erschossen-zr-92452305.html


Article 3:
Title: Decapitaciones , estrangulamientos y suicidios en los  hoteles del amor  de Japón
Source Info:    elmundo.es  1 month ago () Spanish    Spain
Link: https://www.elmundo.es/internacional/2023/07/25/64bf95c5e85ece37508b458f.html


Article 4:
Title: North Carolina Shooting News | Live Feed
Source Info:    newsnow.co.uk  2 weeks ago () English    United Kingdom
Link: https://www.newsnow.co.uk/h/World+News/US/States/North+Caro

###### On the other hand, using a single OR operator in the query yields longer result lists due to more relaxed constraints. We will now proceed to test the same query in Spanish for further investigation.

In [25]:
# Same example query as above but translated in spanish  
example_query_spanish = """
(asesinato OR homicidio OR femicidio OR feminicidio OR "se suicido" OR "se ahorco"
OR muerta OR muerte OR mata OR mató OR dispara OR balea OR apuñala OR acuchillada OR golpeada OR estrangula OR 
ahogada OR degollada OR incinera OR quemada) 
"""

query_and_print_results(url,example_query_spanish)

Article 1:
Title: Skeleton saint Santa Muerte attracts devotees among US Latinos
Source Info:    france24.com  1 month ago () English    France
Link: https://www.france24.com/en/live-news/20230818-skeleton-saint-santa-muerte-attracts-devotees-among-us-latinos


Article 2:
Title: Will Argentina Reach Its 1 Million Bpd Oil Production Goal ? 
Source Info:    oilprice.com  2 months ago () English    United States
Link: https://oilprice.com/Energy/Crude-Oil/Will-Argentina-Reach-Its-1-Million-Bpd-Oil-Production-Goal.html


Article 3:
Title: Skeleton Saint Santa Muerte Attracts Devotees Among US Latinos
Source Info:    urdupoint.com  1 month ago () English    Pakistan
Link: https://www.urdupoint.com/en/miscellaneous/skeleton-saint-santa-muerte-attracts-devotees-1739782.html


Article 4:
Title: Porniți la drum ? Cum se circulă în această dimineață pe principalele drumuri din țară
Source Info:    ziuaconstanta.ro  1 week ago () Romanian    Romania
Link: https://www.ziuaconstanta.ro/stiri/actual

##### We can see a difference in the output between the Spanish keywords and the translated (English) keywords: the Spanish-keyword results are less likely to align with our project's context. The source languages and countries of the results (for example Pakistan, Romania, China) also do not make sense for a Spanish-language query.

##### After testing multiple iterations of the example query:
- The maximum length of a successful query varies with the complexity of the query; beyond a certain point the API no longer gives a stable response.
- The API is not able to handle complex queries (combining OR and AND) and works better with shorter queries.
- The query returns better results when the search keywords are in English. As an alternative way to search by region, we can use the source country, language, and domain parameters.
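
The batch processing idea mentioned earlier could look roughly like the sketch below: the long keyword list is split into smaller OR groups, and each group is combined with the second group using the same AND syntax as the cells above. The batch size is arbitrary, the helper names are hypothetical, and results from the separate requests would still need to be merged and de-duplicated by URL.

```python
def quote(term):
    """Wrap multi-word phrases in double quotes; leave single words as-is."""
    return f'"{term}"' if " " in term else term


def or_group(terms):
    """Join terms into a parenthesized OR block (a single term needs no parentheses)."""
    quoted = [quote(t) for t in terms]
    if len(quoted) == 1:
        return quoted[0]
    return "(" + " OR ".join(quoted) + ")"


def batched_queries(violence_terms, victim_terms, batch_size=5):
    """Yield one shorter query per batch of violence-related terms."""
    victims = or_group(victim_terms)
    for start in range(0, len(violence_terms), batch_size):
        batch = violence_terms[start:start + batch_size]
        yield f"{or_group(batch)} AND {victims}"


violence = ["asesinato", "homicidio", "femicidio", "feminicidio",
            "sin vida", "intento de femicidio", "degollada"]
victims = ["mujer", "niña", "una joven"]

for q in batched_queries(violence, victims, batch_size=4):
    print(q)
```

Each generated string can be passed to query_and_print_results in place of the single long query.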

# Question - How do we filter to results from the last three days and page through them?

In [26]:
def query_and_print_results_by_date(url, example_query):
    start_date = datetime.now() - timedelta(days=3)
    start_date_str = start_date.strftime("%Y%m%d%H%M%S")

    end_date = datetime.now()
    end_date_str = end_date.strftime("%Y%m%d%H%M%S")

    params = {
        "query": example_query,
        "mode": "artlist",
        "maxrecords": 100,
        "startdatetime": start_date_str,
        "enddatetime": end_date_str,
    }
   

    response = requests.get(url, params=params)

    if response.status_code == 200:
        soup = BeautifulSoup(response.text, 'html.parser')
        article_elements = soup.find_all('a')

        for i, article in enumerate(article_elements):
            title = article.find('span', class_='arttitle').text
            source = article.find('span', class_='sourceinfo').text
            link = article['href']

            print(f"Article {i + 1}:")
            print(f"Title: {title}")
            print(f"Source Info: {source}")
            print(f"Link: {link}")
            print("\n")
    else:
        print("Error:", response.status_code)


In [27]:

example_query_bydate = """
(asesinato OR homicidio OR femicidio OR feminicidio OR "se suicido" OR "se ahorco"
OR muerta OR muerte OR mata OR mató OR dispara OR balea OR apuñala OR acuchillada OR golpeada OR estrangula OR 
ahogada OR degollada OR incinera OR quemada or mujer) 
"""

query_and_print_results_by_date(url,example_query_bydate)

Article 1:
Title: Transfăgărășanul , cel mai spectaculos drum din România . Pregătește - te pentru o excursie spectaculoasă ! 
Source Info:    stirileprotv.ro  7 hours ago () Romanian    Romania
Link: https://stirileprotv.ro/stiri/travel/transfagarasanul-cel-mai-spectaculos-drum-din-romania-pregateste-te-pentru-o-excursie-spectaculoasa.html


Article 2:
Title:   Satanic Hispanic review : A mixed bag of trick - less treats
Source Info:    mashable.com  1 day ago () English    United States
Link: https://mashable.com/article/satanic-hispanics-review


Article 3:
Title: Vilija Matačiūnaitė – apie naują gyvenimo etapą , darbą emigracijoje ir vaikus : „ Turėjau svajonę  | Vardai
Source Info:    15min.lt  1 day ago () Lithuanian    Lithuania
Link: https://www.15min.lt/zmones/naujiena/lietuva/vilija-mataciunaite-apie-nauja-gyvenimo-etapa-darba-emigracijoje-ir-vaikus-turejau-svajone-1050-2112264


Article 4:
Title: TV Guide - TVguide . co . uk
Source Info:    tvguide.co.uk  1 day ago () Englis

- The results fall within the requested time frame, but the Spanish-keyword query again returns irrelevant URLs that may be unrelated to our project's purpose.

In [28]:
example_query_bydate = """(violencia OR agresión OR "daño físico") AND (mujer OR "mujer joven" OR "cuerpo de mujer")"""

query_and_print_results_by_date(url,example_query_bydate)

In [29]:
example_query_bydate_translated = """(violence OR assault OR "physical harm") AND (female OR "young female" OR "woman's body")"""

query_and_print_results_by_date(url,example_query_bydate_translated)

Article 1:
Title: 暴力之家 ！ 搭高鐵拒換座位 妙齡女遭一家3口 「 圍毆10分鐘 」 全身傷 | 兩岸 | 三立新聞網 SETN . COM
Source Info:    setn.com  17 hours ago () Chinese    Taiwan
Link: https://www.setn.com/News.aspx?NewsID=1354744


Article 2:
Title: A Hostess Sues Nobu Malibu for Sexual Assault , Seeking $500 , 000 – Robb Report
Source Info:    robbreport.com  1 day ago () English    United States
Link: https://robbreport.com/food-drink/dining/nobu-malibu-lawsuit-1234897895/


Article 3:
Title: Vold og trusler mod fængselsbetjente når laveste niveau i flere år
Source Info:    politiken.dk  1 day ago () Danish    Denmark
Link: https://politiken.dk/indland/art9531242/Vold-og-trusler-mod-f%C3%A6ngselsbetjente-n%C3%A5r-laveste-niveau-i-flere-%C3%A5r


Article 4:
Title: Vlna násilí v Česku : Bitky , napadené dítě a znásilnění v parku
Source Info:    tn.nova.cz  2 days ago () Czech    Czech Republic
Link: https://tn.nova.cz/zpravodajstvi/clanek/522365-vlna-nasili-v-cesku-bitky-napadene-dite-a-znasilneni-v-parku


Article 5:
Tit

###### We once again observed improved results with the English keywords when filtering to the previous three days, although the countries of origin of the retrieved articles still vary. We found no explicit option to paginate through results; increasing the 'maxrecords' parameter to obtain a larger result set may address this. The number of results still depends on the complexity and constraints of the query, which affects how many relevant articles are retrieved.
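
Since no pagination option was found, one possible workaround is to split the three-day range into shorter consecutive windows and send one request per window, using the same "startdatetime" and "enddatetime" format as above. The sketch below only generates the windows; the six-hour window size is arbitrary, the use of UTC is an assumption about how the API interprets timestamps, and `time_windows` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def time_windows(end, days=3, hours_per_window=6):
    """Split the `days` before `end` into consecutive (start, end) string pairs."""
    fmt = "%Y%m%d%H%M%S"
    step = timedelta(hours=hours_per_window)
    cursor = end - timedelta(days=days)
    windows = []
    while cursor < end:
        window_end = min(cursor + step, end)
        windows.append((cursor.strftime(fmt), window_end.strftime(fmt)))
        cursor = window_end
    return windows


now = datetime.now(timezone.utc)  # assumption: the API expects UTC timestamps
for start_str, end_str in time_windows(now):
    params = {
        "mode": "artlist",
        "maxrecords": 100,
        "startdatetime": start_str,
        "enddatetime": end_str,
    }
    print(params["startdatetime"], "->", params["enddatetime"])
```

Adjacent windows share a boundary timestamp, so articles collected across windows should be de-duplicated by URL.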

# Question - Can we filter by geographic country or state/province of publication?

In [30]:

# Define the query with filters
example_query = """
(violence OR assault OR "physical harm") 
AND (female OR "young female" OR "woman's body") 
AND sourcecountry:spain
""" ## source country filter applied

query_and_print_results(url,example_query)

Article 1:
Title: PP y Vox firman su acuerdo en Aragón con un  compromiso absoluto contra la violencia machista  
Source Info:    vozpopuli.com  1 month ago () Spanish    Spain
Link: https://www.vozpopuli.com/espana/politica/elecciones-autonomicas/documental-acuerdo-pp-vox-aragon-violencia-machista.html


Article 2:
Title: SEGURIDAD | Un instructor de defensa personal :  Es clave para el empoderamiento de las mujeres  
Source Info:    diariodemallorca.es  1 month ago () Spanish    Spain
Link: https://www.diariodemallorca.es/sociedad/2023/08/16/instructor-defensa-personal-clave-empoderamiento-91010322.html


Article 3:
Title: Man United fans make Greenwood feelings known outside Old Trafford
Source Info:    caughtoffside.com  1 month ago () English    Spain
Link: https://www.caughtoffside.com/2023/08/14/man-united-fans-make-mason-greenwood-feelings-known-outside-old-trafford/


Article 4:
Title: Sandra Vázquez , secretaria de Igualdade : « Una ley en la que ganan los delincuentes y pier

In [31]:

# Define the query with filters
example_query = """
(violence OR assault OR "physical harm") 
AND (female OR "young female" OR "woman's body") 
AND sourcecountry:spain
AND sourcelang:english
AND domain:elpais.com
"""
# Source country, Source language, Domain Filter Applied
query_and_print_results(url,example_query)

Article 1:
Title: Italian Prime Minister Giorgia Meloni partner comments about rape spark outrage :  If you dont get drunk , you avoid running into a wolf  | International
Source Info:    english.elpais.com  2 weeks ago () English    Spain
Link: https://english.elpais.com/international/2023-08-30/italian-prime-minister-giorgia-melonis-partners-comments-about-rape-spark-outrage-if-you-dont-get-drunk-you-avoid-running-into-a-wolf.html


Article 2:
Title: Honduras adopts El Salvador - style tactics in anti - gang crackdown on prison inmates
Source Info:    english.elpais.com  2 months ago () English    Spain
Link: https://english.elpais.com/international/2023-06-26/honduras-adopts-el-salvador-style-tactics-in-anti-gang-crackdown-on-prison-inmates.html


Article 3:
Title: The 2024 Republican presidential field keeps growing . So why arent there more women ? | U . S . 
Source Info:    english.elpais.com  2 months ago () English    Spain
Link: https://english.elpais.com/usa/2023-07-02/the-20

##### The source info in the output states the country and source of each published article, which shows that the results are being filtered by the requested source country, source language, and domain.

###### Thus, we can create queries to search by any domain, language, or country.
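
The filter clauses used in the two cells above can be assembled with a small helper so that the keyword part and the source filters are kept separate. This is only a sketch: `with_filters` is a hypothetical name, it joins clauses with AND exactly as the cells above do, and the filter values in the usage line are illustrative and were not run against the API.

```python
def with_filters(keyword_query, sourcecountry=None, sourcelang=None, domain=None):
    """Append optional source filters to a keyword query."""
    clauses = [keyword_query.strip()]
    if sourcecountry:
        clauses.append(f"sourcecountry:{sourcecountry}")
    if sourcelang:
        clauses.append(f"sourcelang:{sourcelang}")
    if domain:
        clauses.append(f"domain:{domain}")
    return " AND ".join(clauses)


# Illustrative values only; valid country and language names should be
# taken from the GDELT documentation.
print(with_filters('(violence OR assault OR "physical harm")',
                   sourcecountry="spain", sourcelang="english"))
```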


# Searching by Tone (Exploring A Bit More) 

In [32]:

# Define the query with filters
example_query = """
(violence OR assault OR "physical harm") 
AND (female OR "young female" OR "woman's body") 
AND sourcecountry:spain
AND sourcelang:english
AND Tone<-10

"""

query_and_print_results(url,example_query)

Article 1:
Title: Spanish man charged with ten counts of raping British child in Gibraltar
Source Info:    theolivepress.es  2 weeks ago () English    Spain
Link: https://www.theolivepress.es/spain-news/2023/08/31/spanish-man-charged-with-ten-counts-of-raping-british-child-in-gibraltar/


Article 2:
Title: Honduras adopts El Salvador - style tactics in anti - gang crackdown on prison inmates
Source Info:    english.elpais.com  2 months ago () English    Spain
Link: https://english.elpais.com/international/2023-06-26/honduras-adopts-el-salvador-style-tactics-in-anti-gang-crackdown-on-prison-inmates.html




##### As shown above, when we restrict the tone to highly negative values, the results are relevant, since the news we are looking for is negative.

In [33]:

# Define the query with filters
example_query = """
(violence OR assault OR "physical harm") 
AND (female OR "young female" OR "woman's body") 
AND sourcecountry:spain
AND sourcelang:english
AND Tone< 10

"""

query_and_print_results(url,example_query)
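
The last cell writes the clause as `Tone< 10`, with a space, unlike `Tone<-10` in the cell before it. The API error message seen earlier warned against spaces between commands and their parameters, so it may be safer to generate the clause programmatically. Whether the space actually changes the result was not verified here, and `tone_clause` is a hypothetical helper.

```python
def tone_clause(operator, threshold):
    """Format a tone filter such as 'Tone<-10' with no embedded spaces."""
    if operator not in ("<", ">"):
        raise ValueError("operator must be '<' or '>'")
    return f"Tone{operator}{threshold}"


print(tone_clause("<", -10))
print(tone_clause("<", 10))
```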

# SUMMARY

1. To kickstart the exploration, I conducted initial tests using the provided GDELT API example related to wildlife articles. This allowed me to understand the fundamental workings of the API. However, it became evident that our project's specific query related to feminicides in Argentina might pose challenges due to query length limitations.

2. The experiments highlighted that the length and complexity of our query significantly impact the success of the API call. Complex queries, especially those involving both OR and AND operators, often resulted in fewer matching articles or even no results. As such, we may need to simplify our queries to ensure effective data retrieval.

3. The API appears to have a better understanding of English keywords. Queries with English keywords consistently yielded more relevant results aligned with our project's context. Translating our search keywords into English proved to be a valuable strategy for improving query performance.

4. A primary requirement for our project is the ability to retrieve articles from the last three days. We can filter by date using the "startdatetime" and "enddatetime" parameters, computed relative to the current time so that the window stays up to date.

5. The GDELT API allows us to filter articles by their "Source country" but not exact state/province of publication. By using the "sourcecountry" parameter, we can target the origin of the articles we retrieve. 

6. In addition to geographic filters, the API offers options to filter articles by "source language" and "domain". This level of granularity lets us tailor our data sources more precisely, for example to track a specific publisher, ensuring that we obtain the most relevant articles.

7. I also explored tone-based filtering using the "Tone" parameter. This feature enables us to search for articles with specific emotional tones, such as highly negative or positive tones. It's an interesting option for fine-tuning our data selection process.

## API Costs & Limitations
There are no direct costs associated with using the API for our project's data extraction needs. In fact, GDELT provides a range of additional data extraction and storage options, including integration with BigQuery, access to CSV files, global graph datasets, collaboration with generative AI models, cloud-based analysis services, and the ability to embed visuals directly from GDELT data.

# Conclusion
In this exploration of the GDELT 2.0 API, we aimed to assess its suitability as an additional data source. Our investigation revealed several key findings:

- Query Flexibility: The GDELT API allows for flexible queries, but it has limitations in handling complex boolean queries with a large number of keywords. Shorter and more focused queries tend to yield better results.

- Language Sensitivity: Queries in English tend to produce more relevant results compared to queries in other languages. This is crucial for our project, as it ensures alignment with our research objectives.

- Temporal Filtering: The API supports filtering results by date, making it possible to retrieve articles from a specific time frame, such as the last three days. This feature can be valuable for tracking recent developments related to feminicides.

- Geographic Filtering: The GDELT API allows for filtering by source country, source language, and specific domains. This feature ensures that we can narrow down our data to include only relevant sources for our project.

- Tone Analysis: We can filter articles by tone, enabling us to focus on news with a specific emotional context, which may be useful for sentiment analysis in our research.

- API Limits: While the GDELT API offers free access, it's important to be aware of the associated rate limits and usage constraints, which can vary depending on the type of user.

- Integration Options: Beyond the API itself, GDELT provides integration options with various data analysis and visualization tools, expanding our capabilities beyond data retrieval.







##### Additional parameters and options we can consider:

Sorting: You can specify the sorting order of results using the SORT parameter. For example, you can sort by date or tone, depending on your research goals.

Timespan: You can specify a custom timespan for your search using the TIMESPAN parameter. This allows you to focus on articles published within a specific timeframe.

Maximum Records: You can control the maximum number of records returned using the MAXRECORDS parameter. This can help you manage the size of your result set.

Additional Output Modes: The API supports various output modes, such as ArtGallery and ArtList. Depending on your needs, you can choose an output mode that suits your research.

TIMELINESMOOTH: If you are working with timeline data, you can enable moving window smoothing using the TIMELINESMOOTH parameter.

TRANS: If needed, you can embed a machine translation widget in the results page using the TRANS parameter.

STARTDATETIME/ENDDATETIME: Instead of specifying a timespan, you can use these parameters to set precise start and end dates and times for your search.
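
As a rough sketch of how some of these options could be combined in one request, the helper below builds a request URL from a query plus optional parameters. The value strings "3days" and "datedesc" are assumptions about the accepted formats and should be confirmed in the GDELT documentation; `build_request_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

API_URL = "https://api.gdeltproject.org/api/v2/doc/doc"


def build_request_url(query, **options):
    """Build an article-list URL, skipping options that are set to None."""
    params = {"query": query, "mode": "artlist"}
    params.update({k: v for k, v in options.items() if v is not None})
    return f"{API_URL}?{urlencode(params)}"


url_with_options = build_request_url(
    "(femicide OR feminicide)",
    maxrecords=100,
    timespan="3days",   # assumption: value format for a three-day span
    sort="datedesc",    # assumption: value for newest-first ordering
)
print(url_with_options)
```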

