<a href="https://colab.research.google.com/github/usc-isi-i2/kgtk-aaai2023/blob/main/04.1-IdentifyMoralFoundationsInACLED.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Identify Moral Foundations in text [ACLED Edition]**

In this notebook we use a pre-trained model to identify moral foundations in text. The text can be anything, but it should preferably be about as long as an average tweet, since that is what the model was trained on. In this example we use the event notes of ACLED records, fetched through the ACLED API with your own ACLED credentials.

Notes:

*   we are using the `bert-base-uncased` tokenizer
*   weights are loaded from the `model_weights.pkl` file
*   data used is the `notes` column of the events fetched from the ACLED API

---

***Please make sure that you have the GPU runtime selected for this notebook***

    - select Runtime -> Change runtime type -> Hardware accelerator -> GPU

# **GPU setup**
    - check GPU availability
    - setup torch to use GPU device

In [None]:
#@title Check gpu availability

!nvidia-smi


from psutil import virtual_memory
ram_gb = virtual_memory().total / 1e9
print('\n\nYour runtime has {:.1f} gigabytes of available RAM\n\n'.format(ram_gb))

In [None]:
#@title Setup torch to use GPU device

import torch

SEED = 7 #@param {type: "slider", min: 0, max: 100}


# get a count of how many GPU devices are available to us
num_gpu_devices = torch.cuda.device_count()
print("There {} {} GPU device{}".format(
    'is' if num_gpu_devices == 1 else 'are',
    num_gpu_devices,
    '' if num_gpu_devices == 1 else 's' 
))


# manually set the seed when using the gpu
if num_gpu_devices > 0:
    torch.cuda.manual_seed_all(SEED)


# Set the device to use gpu or cpu
if torch.cuda.is_available():
    device = torch.device("cuda:0")
    print("Great, using the GPU!")
else:
    device = torch.device("cpu")
    print("Not great, using the CPU!")
    raise Exception('Check if you have the GPU runtime selected')

There is 1 GPU device
Okay, using the GPU!


# **Data**
    - enter ACLED credentials
    - adjust parameters as needed
    - fetch data from ACLED
    - limit/filter data

In [None]:
#@title Enter your ACLED credentials

ACLED_KEY = ''  #@param {type: "string"}
EMAIL = ''  #@param {type: "string"}

In [None]:
#@title Adjust parameters as needed

COUNTRY = 'Mozambique' #@param {type: "string"}

MIN_DATE = '2022-01-01'  #@param {type: "date"}
MAX_DATE = '2023-01-01'  #@param {type: "date"}
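
Since the two dates end up in the API query, a quick sanity check before fetching can save a wasted request. The sketch below is an optional addition, not part of the original notebook: `check_date_range` is a hypothetical helper name, and it assumes the dates are ISO `YYYY-MM-DD` strings like the form fields above.

```python
from datetime import date


def check_date_range(min_date, max_date):
    """Parse two ISO dates and make sure they form a valid range."""
    start = date.fromisoformat(min_date)  # raises ValueError on malformed input
    end = date.fromisoformat(max_date)
    if start > end:
        raise ValueError(
            'MIN_DATE {} is after MAX_DATE {}'.format(min_date, max_date))
    return start, end


start, end = check_date_range('2022-01-01', '2023-01-01')
print((end - start).days, 'days requested')
```

In the notebook this would be called as `check_date_range(MIN_DATE, MAX_DATE)` right after the parameter cell.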

In [None]:
#@title Fetch data from ACLED

import requests
import pandas as pd
from io import StringIO


url = 'https://api.acleddata.com/acled/read.csv?terms=accept'
url += '&key={key}'.format(key=ACLED_KEY)
url += '&email={email}'.format(email=EMAIL)
url += '&country={country}'.format(country=COUNTRY)
url += '&event_date={{{min_date}|{max_date}}}'.format(min_date=MIN_DATE, max_date=MAX_DATE)
url += '&event_date_where=BETWEEN&limit=0'


# Fetch the data and fail early if the request returns an HTTP error status
results = requests.get(url)
results.raise_for_status()
df = pd.read_csv(StringIO(results.text))
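
The query string in this cell is assembled by hand, so a character such as `&` or `+` inside a value (a `+` in an email address, for instance) would change the meaning of the query. A minimal sketch of an alternative, assuming the same endpoint and parameter names as the cell above, is to let `urllib.parse.urlencode` do the escaping. `build_acled_url` is a hypothetical helper name, and the `safe='{|}'` argument is only there to keep the date range written exactly as in the original URL.

```python
from urllib.parse import urlencode

# Same endpoint as in the fetch cell above
ACLED_READ_URL = 'https://api.acleddata.com/acled/read.csv'


def build_acled_url(key, email, country, min_date, max_date):
    """Build the ACLED query URL with percent-encoded parameter values."""
    params = {
        'terms': 'accept',
        'key': key,
        'email': email,
        'country': country,
        # same '{min|max}' range format as the hand-built URL
        'event_date': '{' + min_date + '|' + max_date + '}',
        'event_date_where': 'BETWEEN',
        'limit': 0,
    }
    # safe='{|}' leaves the braces and the pipe unescaped
    return ACLED_READ_URL + '?' + urlencode(params, safe='{|}')


print(build_acled_url('KEY', 'first.last+acled@example.com',
                      'Burkina Faso', '2022-01-01', '2023-01-01'))
```

The returned string can be passed to `requests.get` in place of the hand-built `url`.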

In [None]:
#@title Limit the scope
# only use data from the Cabo Delgado region


_df = df[df['admin1'] == 'Cabo Delgado']
notes = list(_df.notes.unique())

# **Model**
    - install transformers
    - define labels
    - define model class
    - load weights into model

In [None]:
#@title Install the transformers library

%%time
%%capture

!pip install transformers

CPU times: user 47.7 ms, sys: 25.3 ms, total: 72.9 ms
Wall time: 7.75 s


In [None]:
#@title Define moral foundation labels

moral_foundation_labels = [
    'care',
    'harm',
    'fairness',
    'cheating',
    'loyalty',
    'betrayal',
    'authority',
    'subversion',
    'sanctity',
    'degradation',
    'non-moral',
]
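
The eleven labels are the five virtue/vice pairs of Moral Foundations Theory plus a `non-moral` class. If per-foundation scores are more convenient than per-label scores, one possible way to collapse them is sketched below. This aggregation is not part of the original notebook: `FOUNDATION_PAIRS` and `foundation_totals` are hypothetical names, and the example scores are made-up round numbers for illustration.

```python
# Sketch only: each foundation maps to its (virtue, vice) label pair
FOUNDATION_PAIRS = {
    'care': ('care', 'harm'),
    'fairness': ('fairness', 'cheating'),
    'loyalty': ('loyalty', 'betrayal'),
    'authority': ('authority', 'subversion'),
    'sanctity': ('sanctity', 'degradation'),
}


def foundation_totals(scores):
    """Sum the virtue and vice score of each foundation; keep 'non-moral' as is."""
    totals = {
        name: scores[virtue] + scores[vice]
        for name, (virtue, vice) in FOUNDATION_PAIRS.items()
    }
    totals['non-moral'] = scores['non-moral']
    return totals


# Illustrative scores only (not model output); they sum to 1
example_scores = {
    'care': 0.02, 'harm': 0.84, 'fairness': 0.01, 'cheating': 0.03,
    'loyalty': 0.01, 'betrayal': 0.02, 'authority': 0.01,
    'subversion': 0.02, 'sanctity': 0.01, 'degradation': 0.01,
    'non-moral': 0.02,
}
print(foundation_totals(example_scores))
```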

In [None]:
#@title Define model class

import pickle
import torch
import transformers


class BERTClass(torch.nn.Module):

    def __init__(self, num_classes):
        super(BERTClass, self).__init__()
        self.l1 = transformers.BertModel.from_pretrained('bert-base-uncased')
        self.dropout = torch.nn.Dropout(0.5)
        self.classifier = torch.nn.Linear(768, num_classes)

    def forward(self, ids, mask, token_type_ids):
        output_1 = self.l1(input_ids=ids, attention_mask=mask, token_type_ids=token_type_ids)
        pooler = self.dropout(output_1.pooler_output)
        output = self.classifier(pooler)
        return output, output_1.last_hidden_state[:, 0, :]

    def save_bert(self, save_path):
        torch.save(self.l1.state_dict(), save_path)

    def save_model(self, save_path):
        with open(save_path, 'wb') as file:
            pickle.dump(self, file)

In [None]:
#@title Mount Google Drive
# to include our model weights file

from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
#@title Load the model

%%time
%%capture

import torch


model_weights_folder_path = "/content/drive/My Drive/KGTK Tutorial/models/"
model_weights_filename = 'model_weights.pkl'
model_weights_file = model_weights_folder_path + model_weights_filename


model = BERTClass(len(moral_foundation_labels))
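# map_location makes sure the saved tensors are loaded onto the selected device
model.load_state_dict(torch.load(model_weights_file, map_location=device))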
model.to(device)

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.bias', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.decoder.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


CPU times: user 6.88 s, sys: 2.06 s, total: 8.94 s
Wall time: 14.6 s


# **Processing**
    - set batch size config variable
    - handle tokenization
    - handle validation

In [None]:
#@title Set VALID_BATCH_SIZE

VALID_BATCH_SIZE = 4 #@param {type: "slider", min: 1, max: 10}

In [None]:
#@title Handle tokenization

%%time
%%capture


import torch
from transformers import BertTokenizer
from torch.utils.data import DataLoader
from torch.utils.data import TensorDataset


def handle_tokenize(texts, tokenizer, labels=None):
    encoding = tokenizer(texts, return_tensors='pt', padding=True, truncation=True)
    ids = encoding['input_ids']  # default max_seq 512
    mask = encoding['attention_mask']
    token_type_ids = encoding['token_type_ids']

    if labels:
        targets = torch.tensor(labels)
        return TensorDataset(ids, mask, token_type_ids, targets)
    else:
        return TensorDataset(ids, mask, token_type_ids)


tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
# `notes` is the list of unique ACLED notes built in the "Limit the scope" cell
test_set = handle_tokenize(texts=notes, tokenizer=tokenizer)
testing_loader = DataLoader(test_set, batch_size=VALID_BATCH_SIZE, shuffle=False, num_workers=4)

CPU times: user 1.85 s, sys: 6.39 ms, total: 1.85 s
Wall time: 2.29 s
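
With `shuffle=False` (and the default `drop_last=False`), the `DataLoader` walks the dataset in order and the last batch is simply smaller. That ordering is what later lets row `i` of the model output be matched with `notes[i]`. The pure-Python sketch below only imitates that batching order for illustration; `iter_batches` is a hypothetical helper, not a PyTorch function.

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of `items`, like an unshuffled loader."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Ten items in batches of four: two full batches and one partial batch
batches = list(iter_batches(list(range(10)), 4))
print(batches)

# Concatenating the batches restores the original order
flattened = [item for batch in batches for item in batch]
print(flattened == list(range(10)))
```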


In [None]:
#@title Handle validation

%%time
%%capture

import numpy as np
from torch.utils.data import DataLoader


def handle_validation(val_loader, model):
    model.eval()
    fin_outputs = []
    fin_embeddings = []

    with torch.no_grad():
        for step, batch in enumerate(val_loader):
            batch = [r.to(device) for r in batch]

            # The first three tensors are always the model inputs; a fourth
            # tensor (labels), if present, is not needed for inference.
            ids, mask, token_type_ids = batch[:3]

            outputs, embeddings = model(ids, mask, token_type_ids)

            # big_val, big_idx = torch.max(outputs.data, dim=1)
            fin_outputs.append(outputs.cpu().detach().numpy())
            fin_embeddings.append(embeddings.cpu().detach().numpy())

    fin_outputs = np.concatenate(fin_outputs, axis=0)
    fin_embeddings = np.concatenate(fin_embeddings, axis=0)

    return fin_outputs, fin_embeddings


testing_loader = DataLoader(test_set, batch_size=VALID_BATCH_SIZE, shuffle=False, num_workers=4)
MF_outputs, _ = handle_validation(testing_loader, model)
MF_outputs = torch.nn.functional.softmax(torch.Tensor(MF_outputs), dim=-1)
MF_outputs = MF_outputs.numpy()

CPU times: user 22.1 s, sys: 734 ms, total: 22.8 s
Wall time: 25.1 s
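
The model returns raw logits, and the softmax over the last dimension turns each row into scores that are positive and sum to 1 across the eleven labels. As an illustration of what that step computes for a single row, here is a minimal pure-Python sketch; it is not how PyTorch implements it, and the input values are arbitrary.

```python
import math


def softmax(logits):
    """Numerically stable softmax of one row of logits."""
    peak = max(logits)
    # subtracting the maximum avoids overflow in exp() without changing the result
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
print(probs)
print(sum(probs))
```

Because the scores of one note always sum to 1, a high `non-moral` score necessarily pushes the other ten scores down.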


# **Output**
    - print ACLED data notes
    - print moral foundation scores

In [None]:
#@title Print notes with moral foundation scores

# Handle serialization of moral foundation scores as float values in json
# Use NumpyEncoder to convert numpy data to list
# Error: Object of type int64 is not JSON serializable

import json
from tqdm import tqdm


class NumpyEncoder(json.JSONEncoder):
    """ Custom encoder for numpy data types """
    def default(self, obj):
        # Abstract NumPy scalar types cover every concrete width and avoid
        # aliases such as np.float_ and np.complex_, removed in NumPy 2.0
        if isinstance(obj, np.integer):
            return int(obj)

        elif isinstance(obj, np.floating):
            return float(obj)

        elif isinstance(obj, np.complexfloating):
            return {'real': obj.real, 'imag': obj.imag}

        elif isinstance(obj, np.ndarray):
            return obj.tolist()

        elif isinstance(obj, np.bool_):
            return bool(obj)

        elif isinstance(obj, np.void):
            return None

        return json.JSONEncoder.default(self, obj)


# Row i of MF_outputs holds the scores for notes[i]
for note, scores in tqdm(zip(notes, MF_outputs), total=len(notes)):
    print(note)
    print(json.dumps(
        dict(zip(moral_foundation_labels, scores)),
        indent=4,
        ensure_ascii=False,
        cls=NumpyEncoder,
    ))
    print()


  8%|▊         | 152/1821 [00:00<00:01, 1513.67it/s]

Movement of forces: On 1 January 2023, Mozambican forces started Operation Vulcan IV to destroy Islamist militia bases along the north of the Rio Messalo [coded to Nairoto (Montepuez, Cabo Delgado)], with support of SAMIM forces.
{
    "care": 0.04258796200156212,
    "harm": 0.203754723072052,
    "fairness": 0.006580298766493797,
    "cheating": 0.019666917622089386,
    "loyalty": 0.03549037501215935,
    "betrayal": 0.04234194755554199,
    "authority": 0.014969222247600555,
    "subversion": 0.04426075518131256,
    "sanctity": 0.0116215655580163,
    "degradation": 0.01831570267677307,
    "non-moral": 0.5604106187820435
}

On 31 December 2022, Islamist militia clashed with local militias and attacked civilians in Namacule (Muidumbe, Cabo Delgado), Nampanha (coded separately) and Namande (coded separately) for a second day. Houses were burned. At least 10 people were killed (disputing reports). Fatalities split into 6 events.
{
    "care": 0.023423712700605392,
    "harm": 0.7422

 24%|██▎       | 432/1821 [00:00<00:01, 1193.77it/s]

Displacement: Around 30 June 2022 (as reported), about 22,500 people fled the area of Ancuabe (Ancuabe, Cabo Delgado) after Islamist militia attacked the area between 6 June and 12 June 2022 according to the International Organization for Migration (OIM). 2,500 were displaced right after attacks in Ancuabe on 5 June (coded separately).
{
    "care": 0.0269542895257473,
    "harm": 0.5417407751083374,
    "fairness": 0.005897517781704664,
    "cheating": 0.04141117259860039,
    "loyalty": 0.02378353290259838,
    "betrayal": 0.06079059839248657,
    "authority": 0.007155838888138533,
    "subversion": 0.027668189257383347,
    "sanctity": 0.013214709237217903,
    "degradation": 0.027943775057792664,
    "non-moral": 0.22343960404396057
}

On 30 June 2022, Rwandan forces shot and killed one civilian and wounded another one in Mocimboa da Praia (Mocimboa Da Praia, Cabo Delgado). The civilians were recently resettled from a refugee camp in Quitunda and were trying to cross into a neighbo

 37%|███▋      | 670/1821 [00:00<00:01, 1110.20it/s]


{
    "care": 0.01949400082230568,
    "harm": 0.6462351679801941,
    "fairness": 0.006185548845678568,
    "cheating": 0.05482332780957222,
    "loyalty": 0.01577581651508808,
    "betrayal": 0.07001796364784241,
    "authority": 0.007982202805578709,
    "subversion": 0.051705095916986465,
    "sanctity": 0.010631818324327469,
    "degradation": 0.021378042176365852,
    "non-moral": 0.09577102214097977
}

Around 18 January 2022 (as reported), Mozambican forces announced the capture of a leader of an Islamist militia, born in Tanzania, near Litingina (Nangade, Cabo Delgado). 6 Islamist militia were also captured.
{
    "care": 0.028788156807422638,
    "harm": 0.25473552942276,
    "fairness": 0.014030929654836655,
    "cheating": 0.08017465472221375,
    "loyalty": 0.02517588995397091,
    "betrayal": 0.07151126861572266,
    "authority": 0.00969981774687767,
    "subversion": 0.061950743198394775,
    "sanctity": 0.015755338594317436,
    "degradation": 0.01887563429772854,
    "

 54%|█████▎    | 977/1821 [00:00<00:00, 1235.88it/s]

On 24 June 2021, members of an Islamist militia attacked civilians in Palma (Palma, Cabo Delgado), Maganja (coded separately), Monjane (coded separately) and Olumbi (coded separately). Houses and huts were burnt. No fatalities and/or injuries reported.
{
    "care": 0.029705574735999107,
    "harm": 0.7192997932434082,
    "fairness": 0.004155789967626333,
    "cheating": 0.03240680322051048,
    "loyalty": 0.015636064112186432,
    "betrayal": 0.05653403699398041,
    "authority": 0.009862962178885937,
    "subversion": 0.05258369445800781,
    "sanctity": 0.006346550770103931,
    "degradation": 0.023773344233632088,
    "non-moral": 0.0496954582631588
}

On 24 June 2021, members of an Islamist militia attacked civilians in Maganja (Palma, Cabo Delgado), Monjane (coded separately), Olumbi (coded separately) and Palma (coded separately). Houses were burnt. Civilians fled Maganja by boat and one of four boats with internally displaced persons (IDPs) sunk due to strong winds and nine fe

 60%|██████    | 1101/1821 [00:00<00:00, 1186.40it/s]

On 6 December 2020, an Islamist militia attacked civilians and beheaded an elderly man near Muambula (Muidumbe, Cabo Delgado).
{
    "care": 0.017326142638921738,
    "harm": 0.8445581793785095,
    "fairness": 0.002471337793394923,
    "cheating": 0.026936624199151993,
    "loyalty": 0.004997896496206522,
    "betrayal": 0.022443681955337524,
    "authority": 0.0049272626638412476,
    "subversion": 0.028310110792517662,
    "sanctity": 0.004282171837985516,
    "degradation": 0.015571941621601582,
    "non-moral": 0.028174612671136856
}

On 3 December 2020, former FRELIMO fighters, currently organized as a self-protection unit, clashed with a group of military forces after both sides misidentified each other as members of an Islamist militia in Macomia (Macomia, Cabo Delgado). Around 120 soldiers were brought to a hospital in Nampula, many injured and fatalities reported. Unknown fatalities coded as 3.
{
    "care": 0.021075386554002762,
    "harm": 0.36371132731437683,
    "fairness

 79%|███████▊  | 1430/1821 [00:01<00:00, 1205.95it/s]



On 1 June 2020, about 250 displaced civilians from Macomia arrived in Pemba (Pemba, Cabo Delgado) after members of an Islamist militia occupied the city between 28 and 31 May 2020 (coded separately).
{
    "care": 0.03971262276172638,
    "harm": 0.32769784331321716,
    "fairness": 0.007208683528006077,
    "cheating": 0.036205098032951355,
    "loyalty": 0.034321509301662445,
    "betrayal": 0.05142395570874214,
    "authority": 0.00978576485067606,
    "subversion": 0.04343641176819801,
    "sanctity": 0.007247706409543753,
    "degradation": 0.018493235111236572,
    "non-moral": 0.4244671165943146
}

On 31 May 2020, FDS forces, with support of the DAG, reportedly took back control of Macomia (Macomia, Cabo Delgado). Government sources claimed that 78 members of an Islamist militia were killed, however this could not be corroborated and no bodies were located.
{
    "care": 0.04197927564382553,
    "harm": 0.11080831289291382,
    "fairness": 0.030353112146258354,
    "cheating":

 95%|█████████▌| 1736/1821 [00:01<00:00, 1261.71it/s]



On 16 January 2020, ASWJ attacked a motorbike on the road between Anga village and Mocimboa da Praia. The man escaped the ambush. Assailants, who were wearing military uniforms, torched the vehicle.
{
    "care": 0.033147748559713364,
    "harm": 0.5544706583023071,
    "fairness": 0.009972469881176949,
    "cheating": 0.06603844463825226,
    "loyalty": 0.0279557965695858,
    "betrayal": 0.047127194702625275,
    "authority": 0.008221467956900597,
    "subversion": 0.018172020092606544,
    "sanctity": 0.015096640214323997,
    "degradation": 0.014248554594814777,
    "non-moral": 0.20554906129837036
}

On 16 January 2020, ASWJ attacked a vehicle near Mueda (Macomia, Cabo Delgado). The driver of the vehicle was reportedly killed.
{
    "care": 0.021399658173322678,
    "harm": 0.7767936587333679,
    "fairness": 0.005070453509688377,
    "cheating": 0.04916568845510483,
    "loyalty": 0.009876209311187267,
    "betrayal": 0.02230100892484188,
    "authority": 0.004483737051486969,


100%|██████████| 1821/1821 [00:01<00:00, 1206.97it/s]

On 23 June 2018, suspected ASWJ rebels attacked again in Cabo Delgado, in Maganja, killing 5 and destroying 120 houses.
{
    "care": 0.036227475851774216,
    "harm": 0.4995217025279999,
    "fairness": 0.011625409126281738,
    "cheating": 0.06885772943496704,
    "loyalty": 0.030662983655929565,
    "betrayal": 0.07723309099674225,
    "authority": 0.009205546230077744,
    "subversion": 0.05902699753642082,
    "sanctity": 0.011290773749351501,
    "degradation": 0.015218809247016907,
    "non-moral": 0.18112941086292267
}

On 22 June 2018, the suspected ASWJ attacked in Lalane, Palma district, killing 6 people.
{
    "care": 0.02671545185148716,
    "harm": 0.794367790222168,
    "fairness": 0.0052013155072927475,
    "cheating": 0.047672249376773834,
    "loyalty": 0.00680351909250021,
    "betrayal": 0.02036849781870842,
    "authority": 0.002622438594698906,
    "subversion": 0.0076897768303751945,
    "sanctity": 0.006883086636662483,
    "degradation": 0.008042778819799423,
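
Printing all eleven scores for every note gets long quickly. A possible follow-up, not part of the original notebook, is to keep only the highest-scoring labels per note. The sketch below assumes a dictionary of label scores like the ones printed above; `top_labels` is a hypothetical helper, and the example values are rounded copies of the scores printed for the 6 December 2020 note.

```python
def top_labels(scores, k=3):
    """Return the k (label, score) pairs with the highest scores."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Rounded from the output above (note dated 6 December 2020)
example_scores = {
    'care': 0.0173, 'harm': 0.8446, 'fairness': 0.0025,
    'cheating': 0.0269, 'loyalty': 0.0050, 'betrayal': 0.0224,
    'authority': 0.0049, 'subversion': 0.0283, 'sanctity': 0.0043,
    'degradation': 0.0156, 'non-moral': 0.0282,
}

for label, score in top_labels(example_scores):
    print('{:<12} {:.4f}'.format(label, score))
```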
 


