Political Discourse Keyword Analysis

This notebook explores word-usage patterns across 20 speeches by current UN Secretary-General António Guterres, using basic NLP and text-analysis techniques.

In [1]:
import os

In [2]:
data_path = "../data/raw/"
files = os.listdir(data_path)

In [3]:
speeches = []
# Sort the filenames so the speeches load in a reproducible order
for file in sorted(files):
    if file.endswith(".txt"):  # skip anything that is not a plain-text speech
        with open(os.path.join(data_path, file), "r", encoding="utf-8") as f:
            speeches.append(f.read())

print(f"Loaded {len(speeches)} speeches")

Loaded 20 speeches


In [4]:
print(speeches[0][:400])

As we enter the new year, the world stands at a crossroads.


Chaos and uncertainty surround us. 


Division. Violence. Climate breakdown. And systemic violations of international law.


A retreat from the very principles that bind us together as a human family. 


People everywhere are asking: Are leaders even listening? Are they ready to act.


As we turn the page on a turbulent year, one fact s


In [5]:
import re

In [6]:
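cleaned_speeches = []
for speech in speeches:
    text = speech.lower()
    # Delete every character that is not a-z or whitespace
    # (this drops digits, punctuation, and accented letters)
    text = re.sub(r"[^a-z\s]", "", text)
    cleaned_speeches.append(text)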

print(cleaned_speeches[0][:300])

as we enter the new year the world stands at a crossroads


chaos and uncertainty surround us 


division violence climate breakdown and systemic violations of international law


a retreat from the very principles that bind us together as a human family 


people everywhere are asking are leaders e
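
One side effect of the cleaning step above is worth checking: because non-letters are deleted rather than replaced, contractions and hyphenated words are fused and accented letters vanish. The sketch below reproduces the same regex on a made-up sample sentence (not taken from the speeches) and compares it with a hypothetical variant, `strip_non_letters_spaced`, that substitutes a space instead. Both helper names are illustrative assumptions, not part of the notebook.

```python
import re

def strip_non_letters(text):
    """Same rule as the cleaning cell: lowercase, then delete anything that is not a-z or whitespace."""
    return re.sub(r"[^a-z\s]", "", text.lower())

def strip_non_letters_spaced(text):
    """Hypothetical variant: replace non-letters with a space, then collapse runs of spaces and tabs."""
    spaced = re.sub(r"[^a-z\s]", " ", text.lower())
    return re.sub(r"[ \t]+", " ", spaced).strip()

# Made-up sample sentence, chosen to contain an apostrophe, hyphens, digits and an accent
sample = "Don't re-enter the 21st-century debate, António."

print(strip_non_letters(sample))         # dont reenter the stcentury debate antnio
print(strip_non_letters_spaced(sample))  # don t re enter the st century debate ant nio
```

Neither variant is clearly better: deleting fuses words ("reenter"), while spacing splits them ("re enter"), and both mangle "António". If that matters for the analysis, one option is to skip the regex and let the tokenizer handle punctuation on the raw text.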


In [7]:
import spacy
nlp = spacy.load("en_core_web_sm")

all_tokens = []
for speech in cleaned_speeches:
    doc = nlp(speech)
    tokens = [token.text for token in doc if token.is_alpha]
    all_tokens.extend(tokens)

print(f"Total number of words: {len(all_tokens)}")

Total number of words: 6403


In [9]:
from collections import Counter

word_counts = Counter(all_tokens)

word_counts.most_common(20)

[('the', 349),
 ('and', 340),
 ('of', 222),
 ('to', 217),
 ('a', 108),
 ('in', 91),
 ('is', 78),
 ('for', 75),
 ('we', 70),
 ('that', 68),
 ('are', 57),
 ('this', 53),
 ('on', 48),
 ('i', 47),
 ('it', 46),
 ('as', 44),
 ('more', 43),
 ('all', 37),
 ('with', 35),
 ('people', 31)]
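
Nineteen of the twenty most frequent tokens above are function words ("the", "and", "of", ...), so the raw counts say little about the content of the speeches. A common next step is to filter stopwords before counting. The sketch below is a minimal illustration on a toy token list; `EXAMPLE_STOPWORDS` and `count_content_words` are hypothetical names, and the stopword set is just the function words from the output above, not a complete list.

```python
from collections import Counter

# Hypothetical, deliberately small stopword set: the function words
# from the top-20 output above. A real analysis would use a fuller list.
EXAMPLE_STOPWORDS = {
    "the", "and", "of", "to", "a", "in", "is", "for", "we", "that",
    "are", "this", "on", "i", "it", "as", "more", "all", "with",
}

def count_content_words(tokens, stopwords=EXAMPLE_STOPWORDS):
    """Count only the tokens that are not in the stopword set."""
    return Counter(token for token in tokens if token not in stopwords)

# Toy input for illustration, not taken from the speeches
example_tokens = "the people of the world and the climate of the world".split()

print(count_content_words(example_tokens).most_common(3))
# [('world', 2), ('people', 1), ('climate', 1)]
```

In the notebook itself, the same effect could probably be had by adding `and not token.is_stop` to the list comprehension in the tokenization cell, since spaCy tokens expose an `is_stop` attribute; the resulting counts would then depend on spaCy's own stopword list rather than the small set assumed here.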