Make sure the transformers and torch modules are installed. I'm assuming we're running under a venv, so it's OK to install modules directly. If that's not the case for you, install them however you normally would.

In [1]:
import sys
!{sys.executable} -m pip install transformers
!{sys.executable} -m pip install torch
!{sys.executable} -m pip install --upgrade jupyter
!{sys.executable} -m pip install --upgrade ipywidgets



Set up the basic environment for a humbert model (https://huggingface.co/docs/transformers/main/en/model_doc/xlm-roberta#transformers.XLMRobertaForMaskedLM) and import some useful modules.

In [2]:
import os
import requests
import json
from transformers import pipeline

# read my huggingface api token from a file into an environment variable
#with open('/Users/tjordan/code/secrets/huggingface', 'r') as file:
#    os.environ['HUGGINGFACE_API_TOKEN'] = file.read().replace('\n', '')


Retrieve some text that we can use to play around with summarization (the example is the Brazil Floods 2024 disaster on ReliefWeb).

In [3]:

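# Fetch the disaster record; a timeout keeps the cell from hanging on a slow connection
response = requests.get('https://api.reliefweb.int/v1/disasters/51859?appname=tom.jordan2@redcross.org', timeout=30)

if response.status_code == 200:
    response_data = response.json()
else:
    # Stop here so later cells don't fail on an undefined response_data
    raise RuntimeError(f'Error: {response.status_code}')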

Take a peek to make sure we got what we expected.


In [None]:

# Print the response

pretty_response = json.dumps(response_data, indent=4)
print(pretty_response)

Find the overview field in the response.

In [4]:
overview = response_data['data'][0]['fields']['profile']['overview']
print(overview)

Heavy rainfall has been affecting south-eastern Brazil, in particular the Rio de Janeiro State over the last 48 hours, causing floods, flash floods and triggering landslides that have resulted in casualties and damage. Media report, as of 15 January, eleven fatalities and one person still missing across the Rio de Janeiro Metropolitan Region. In addition, media also report the flooding of the subway network. The state of emergency was declared by the local authority over the Rio de Janeiro City area. Over the next 48 hours, more heavy rainfall with locally very heavy rainfall is still forecast over the whole Rio de Janeiro State. ([ECHO, 15 Jan 2024](https://reliefweb.int/node/4029773))

Since 26 January, heavy rainfall has been affecting Bahia State, in north-eastern Brazil, causing floods that have resulted in casualties and damage. According to local authorities and to the Pan American Health Organisation (PAHO), as of 30 January, four people died, of whom two in Castro Alves Munici
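The chained lookup above raises a `KeyError` or `IndexError` if a record has no profile or the `data` list is empty. A minimal sketch of a more forgiving lookup follows; `get_nested`, `sample`, and `overview_path` are hypothetical names introduced here for illustration, and `sample` is only a tiny stand-in for the real response.

```python
from typing import Any, Sequence, Union


def get_nested(data: Any, path: Sequence[Union[str, int]], default: Any = None) -> Any:
    """Follow a path of dict keys and list indexes; return default if any step is missing."""
    current = data
    for step in path:
        try:
            current = current[step]
        except (KeyError, IndexError, TypeError):
            return default
    return current


# Tiny stand-in for the API response, only to illustrate the shape used above
sample = {'data': [{'fields': {'profile': {'overview': 'Heavy rainfall ...'}}}]}
overview_path = ['data', 0, 'fields', 'profile', 'overview']

print(get_nested(sample, overview_path))
print(get_nested({'data': []}, overview_path, default=''))
```

With the real response this would be called as `get_nested(response_data, overview_path, default='')`.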

TODO: Explore some extraction models. NER (Named Entity Recognition) appears to be what we need to extract expected values (e.g. number killed, displaced, missing) from an unstructured body of text. Common models for NER include 

In [5]:
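# Load an extractive question-answering pipeline (DistilBERT fine-tuned on SQuAD)
qa_model = pipeline("question-answering", model="distilbert-base-uncased-distilled-squad")
question = "How many people were killed?"
# The result holds the answer text, its character offsets in the context, and a score
qa_model(question=question, context=overview)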

{'score': 0.748996376991272, 'start': 7762, 'end': 7765, 'answer': '151'}

In [6]:
# Reuse the qa_model pipeline created above
question = "How many people were injured?"
qa_model(question=question, context=overview)

{'score': 0.9801026582717896, 'start': 1777, 'end': 1782, 'answer': '9,751'}

In [7]:
# Reuse the qa_model pipeline created above
question = "How many people were displaced?"
qa_model(question=question, context=overview)

{'score': 0.9791094064712524, 'start': 1777, 'end': 1782, 'answer': '9,751'}
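The "injured" and "displaced" questions both come back with the same span (characters 1777 to 1782, '9,751'), so at least one of them probably needs a manual check against the text. The sketch below asks a set of questions through one callable and flags spans returned for more than one question. `ask_questions`, `shared_spans`, and `fake_qa` are hypothetical helpers, and `fake_qa` only replays the outputs recorded above so the example runs without loading a model.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple


def ask_questions(qa: Callable[..., dict], questions: Dict[str, str], context: str) -> Dict[str, dict]:
    """Run each labelled question through the same QA callable."""
    return {label: qa(question=q, context=context) for label, q in questions.items()}


def shared_spans(results: Dict[str, dict]) -> List[Tuple[Tuple[int, int], List[str]]]:
    """Return the (start, end) spans that answered more than one question."""
    by_span = defaultdict(list)
    for label, result in results.items():
        by_span[(result['start'], result['end'])].append(label)
    return [(span, labels) for span, labels in by_span.items() if len(labels) > 1]


# Stand-in for qa_model: replays the outputs recorded in the cells above
recorded = {
    "How many people were killed?":
        {'score': 0.748996376991272, 'start': 7762, 'end': 7765, 'answer': '151'},
    "How many people were injured?":
        {'score': 0.9801026582717896, 'start': 1777, 'end': 1782, 'answer': '9,751'},
    "How many people were displaced?":
        {'score': 0.9791094064712524, 'start': 1777, 'end': 1782, 'answer': '9,751'},
}


def fake_qa(question: str, context: str) -> dict:
    return recorded[question]


questions = {
    'killed': "How many people were killed?",
    'injured': "How many people were injured?",
    'displaced': "How many people were displaced?",
}
results = ask_questions(fake_qa, questions, context='')
print(shared_spans(results))  # [((1777, 1782), ['injured', 'displaced'])]
```

In the notebook, passing `qa_model` and `overview` in place of `fake_qa` and the empty context should work the same way, since the pipeline accepts `question` and `context` keyword arguments.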

In [10]:
# Reuse the qa_model pipeline created above
question = "How many people are affected?"
qa_model(question=question, context=overview)

{'score': 0.8315784931182861,
 'start': 7942,
 'end': 7951,
 'answer': '2,282,000'}
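The answers come back as strings such as '9,751' or '2,282,000', so they need converting before they can be stored or compared as counts. A minimal sketch, assuming commas are thousands separators and that anything other than plain digits (for example the spelled-out "eleven" in the overview) should be left for manual review; `answer_to_int` is a hypothetical helper name.

```python
import re
from typing import Optional


def answer_to_int(answer: str) -> Optional[int]:
    """Convert an answer such as '2,282,000' to an int; return None if it is not a plain number."""
    # Assumption: commas are thousands separators, as in the outputs above
    cleaned = answer.strip().replace(',', '')
    if re.fullmatch(r'\d+', cleaned):
        return int(cleaned)
    return None


for text in ['151', '9,751', '2,282,000', 'eleven']:
    print(text, '->', answer_to_int(text))
```

Returning `None` instead of raising keeps a loop over many answers going and makes the unparsed cases easy to collect afterwards.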