In [1]:
import pandas as pd
from tqdm.notebook import tqdm
import gensim
import nltk; nltk.download('punkt')
from nltk.tokenize import sent_tokenize, word_tokenize
import warnings
warnings.filterwarnings("ignore")

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\vassi\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!


In [2]:
# read from tell_all_cleaned.csv file (couldn't download through jupyter)
data = pd.read_csv("tell_all_cleaned.csv")
data.sample()

Unnamed: 0,member_name,sitting_date,parliamentary_period,parliamentary_session,parliamentary_sitting,political_party,government,member_region,roles,member_gender,speaker_info,speech
304440,μπασιακος αθανασιου ευαγγελος,02/11/1998,period 9,session 3,sitting 12,νεα δημοκρατια,['σημιτη κωνσταντινου(25/09/1996-13/04/2000)'],βοιωτιας,['βουλευτης'],male,,κυριε προεδρε @sw πρωτοβουλια @sw σηµιτη @sw θ...


In [3]:
data.shape[0]

1280918

In [4]:
data.sitting_date = pd.to_datetime(data.sitting_date, dayfirst=True)

In [5]:
data.set_index("sitting_date", inplace=True)
data.sort_index(inplace=True)

In [6]:
# replace missing speeches with empty strings, then drop the empty ones
data["speech"] = data["speech"].fillna("")
data = data[data["speech"].str.len() > 0]

In [7]:
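# despite the name, this maps each year to a DataFrame of that year's speeches
annual_models = {}
# extend the range to save other years as well
for y in tqdm(range(2004, 2010)):
    # partial-string slicing on the sorted DatetimeIndex: January to December of year y
    annual_models[y] = data.loc[f"{y}-01":f"{y}-12"].reset_index()
    annual_models[y].to_csv(f"{y}.csv.gz", compression="gzip", index=False)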

  0%|          | 0/6 [00:00<?, ?it/s]
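
The date handling in cells 4 to 7 can be checked on a toy frame. This is a minimal sketch, assuming a made-up four-row DataFrame (`toy`, not part of the dataset) with dates written day-first as in `tell_all_cleaned.csv`; it shows that `dayfirst=True` reads `02/11/1998` as 2 November and that a sorted `DatetimeIndex` can be sliced by year.

```python
import pandas as pd

# Hypothetical toy data; the real notebook reads tell_all_cleaned.csv instead.
toy = pd.DataFrame({
    "sitting_date": ["02/11/1998", "15/03/2004", "01/12/2004", "07/01/2009"],
    "speech": ["a", "b", "c", "d"],
})

# Day-first parsing: "02/11/1998" becomes 2 November 1998, not 11 February.
toy["sitting_date"] = pd.to_datetime(toy["sitting_date"], dayfirst=True)

# Index by date and sort, so that date-based selection works.
toy = toy.set_index("sitting_date").sort_index()

# Partial-string indexing: every row whose date falls in 2004.
speeches_2004 = toy.loc["2004"]
print(speeches_2004)
```

Selecting with `toy.loc["2004"]` is a shorter alternative to slicing from `"2004-1"` to `"2004-12"`; both rely on the index being a `DatetimeIndex`.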

In [8]:
# training a model on data from 2004
docs = annual_models[2004].speech.apply(word_tokenize).to_list()

In [9]:
model04 = gensim.models.FastText(docs, 
                                 vector_size=100, 
                                 window=5, 
                                 min_count=3)

In [10]:
model04.wv.most_similar("χρεος")

[('πληθος', 0.8522717356681824),
 ('υποχρεος', 0.8319838047027588),
 ('πορος', 0.8297770023345947),
 ('δανεισμος', 0.8240165114402771),
 ('τοιχος', 0.8202792406082153),
 ('αμειλικτος', 0.8130025267601013),
 ('οφελος', 0.8096746206283569),
 ('αειφορος', 0.8007105588912964),
 ('φτωχος', 0.8001133799552917),
 ('μετοχος', 0.7976776361465454)]
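
FastText builds word vectors from character n-grams, so a neighbour may rank highly partly because it shares an ending with the query rather than a meaning. The sketch below is one rough way to check this, using the neighbour list printed above; `shares_suffix` is a hypothetical helper written for this note, not part of gensim.

```python
def shares_suffix(query, neighbour, n=2):
    """Return True if both words end in the same n characters."""
    return len(neighbour) >= n and query[-n:] == neighbour[-n:]

query = "χρεος"

# Neighbours copied from the most_similar output for the 2004 model.
neighbours_2004 = [
    "πληθος", "υποχρεος", "πορος", "δανεισμος", "τοιχος",
    "αμειλικτος", "οφελος", "αειφορος", "φτωχος", "μετοχος",
]

same_ending = [w for w in neighbours_2004 if shares_suffix(query, w)]
print(f"{len(same_ending)} of {len(neighbours_2004)} neighbours "
      f"end in '{query[-2:]}'")
```

A high count suggests that some neighbours should be read with caution, since surface form may be contributing to their similarity scores.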

In [11]:
# training a second model, with the same settings, on data from 2009
docs = annual_models[2009].speech.apply(word_tokenize).to_list()
model09 = gensim.models.FastText(docs, 
                                 vector_size=100, 
                                 window=5, 
                                 min_count=3)

In [12]:
model09.wv.most_similar("χρεος")

[('φανερος', 0.7441351413726807),
 ('δανεισμος', 0.7400953769683838),
 ('κοστος', 0.7396907210350037),
 ('ελλειμματικος', 0.7324572801589966),
 ('χριστος', 0.7261181473731995),
 ('υπερδανεισμος', 0.7243877053260803),
 ('υψηλος', 0.7226791381835938),
 ('χρεοκοπησε', 0.697590172290802),
 ('υψος', 0.6926799416542053),
 ('αδιαψευστος', 0.6887297034263611)]

# Report

We compared parliamentary speeches from two years that are close in time but very different for Greek politics. 2004 was one of the most successful years for Greece: we hosted the Olympic Games, won the European football championship and had successes in many other areas. Many presume that the money we borrowed in those years took a heavy toll on the economy and eventually led to the economic crisis. The crisis started to become visible in 2008-09, when it became clear that the Greek economy could not handle the debt that had mounted up.

For our comparison we looked at the word "χρεος" (debt) in these two years. We trained one FastText model on the 2004 parliamentary speeches and another on the 2009 speeches, which gives two separate embedding spaces. In the 2004 model, "χρεος" is most similar to 'δανεισμος' (borrowing), 'πληθος' (multitude), 'υποχρεος' (obligated), 'αμειλικτος' (relentless) etc. In the 2009 model it is most similar to 'δανεισμος', 'υπερδανεισμος' (over-borrowing), 'κοστος' (cost), 'υψηλος' (high), 'φανερος' (evident) and 'ελλειμματικος' (deficit-running).

The results suggest that in 2004 there was little real concern in parliament about the debt, while in 2009 there are clear indications that the debt was acknowledged as excessive, high and visible, and that deficits and default ('χρεοκοπησε', "went bankrupt") were being discussed. In conclusion, what we understand from this comparison is that the Greek governments of the 2000s had not foreseen the catastrophic consequences that the large debt built up during that decade would have in the next one, something they realized only in 2008-09 and later.
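
Because the two models are trained separately, their vectors do not share a coordinate system, so the comparison rests on the neighbour lists rather than on the raw vectors. A minimal sketch of that comparison follows, using the two lists printed in the notebook; `compare_neighbours` is a hypothetical helper, and the Jaccard overlap is just one possible summary, not a measure used in the original analysis.

```python
# Neighbours of "χρεος" copied from the two most_similar outputs.
neighbours_2004 = [
    "πληθος", "υποχρεος", "πορος", "δανεισμος", "τοιχος",
    "αμειλικτος", "οφελος", "αειφορος", "φτωχος", "μετοχος",
]
neighbours_2009 = [
    "φανερος", "δανεισμος", "κοστος", "ελλειμματικος", "χριστος",
    "υπερδανεισμος", "υψηλος", "χρεοκοπησε", "υψος", "αδιαψευστος",
]

def compare_neighbours(a, b):
    """Return shared words, words unique to each list, and the Jaccard overlap."""
    set_a, set_b = set(a), set(b)
    shared = set_a & set_b
    jaccard = len(shared) / len(set_a | set_b)
    return shared, set_a - set_b, set_b - set_a, jaccard

shared, only_2004, only_2009, jaccard = compare_neighbours(
    neighbours_2004, neighbours_2009
)
print("shared:", sorted(shared))
print("only 2004:", sorted(only_2004))
print("only 2009:", sorted(only_2009))
print(f"Jaccard overlap: {jaccard:.3f}")
```

A low overlap is consistent with the word being used in different contexts in the two years, although with only ten neighbours per model it is a coarse indicator.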