# ValueMonitor - Create your own topic model

This page is a visualisation of the ValueMonitor prototype. To run the notebook yourself, click the ‘**Run in Google Colab**’ icon below:

<table class="tfo-notebook-buttons" align="left">
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/tristandewildt/ValueMonitor_Workshops/blob/main/ValueMonitor_Workshop_create_own_model.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
  <td>
    <a target="_blank" href="https://github.com/tristandewildt/ValueMonitor_Workshops/blob/main/ValueMonitor_Workshop_create_own_model.ipynb"><img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" />View source on GitHub</a>
  </td>
</table>

## Table of contents:
* [1. Import dataset and packages](#import_dataset_and_packages)
* [2. Creating the topic model](#creating_the_topic_model)
* [3. Verifying the topic model](#verifying_the_topic_model)
* [4. Values in different realms](#values_in_different_realms)
* [5. Values over time](#values_over_time)
* [6. Gap assessment](#gap_assessment)


## 1. Import dataset and packages  <a name="import_dataset_and_packages"></a>

### 1.1. Import packages

In this step, the relevant Python packages are imported.

In [1]:
''' Packages'''

!pip install corextopic
!pip install joblib
!pip install tabulate
!pip install simple_colors
!pip install ipyfilechooser

import os, sys, importlib
import pandas as pd
import ipywidgets as widgets
from ipywidgets import interact, interact_manual, Button
import pickle
from ipyfilechooser import FileChooser
from tkinter import Tk, filedialog
from IPython.display import clear_output, display
from google.colab import files
import nltk
import io
import dateutil.parser  # used explicitly by the date filters in section 5
nltk.download('averaged_perceptron_tagger')
nltk.download('punkt')
nltk.download('vader_lexicon')

''' Source code'''

user = "tristandewildt"
repo = "ValueMonitor_Workshops"
src_dir = "code"
pyfile_1 = "make_topic_model.py"
pyfile_2 = "create_visualisation.py"
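# Never hard-code credentials in a notebook. GITHUB_TOKEN is an assumed variable name;
# a token is only needed if the repository is private.
token = os.environ.get("GITHUB_TOKEN", "")
auth = f"{token}@" if token else ""

if os.path.isdir(repo):
  !rm -rf {repo}

!git clone https://{auth}github.com/{user}/{repo}.git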

from ValueMonitor_Workshops.code.make_topic_model import *
from ValueMonitor_Workshops.code.create_visualisation import *

#dict_datasets = ['digital_technologies': , 'energy_transition_literature': , 'hydrogen_literature': ]

# digital_technologies
#https://drive.google.com/file/d/14W1UddxBOmJZC76NhmECqYy1wzULillW/view?usp=sharing
#https://drive.google.com/file/d/14ZD9KMg5HLOWCPL0dVBWkx_JxHfWu1eg/view?usp=sharing
#https://drive.google.com/file/d/14WkV2Rxawiwv3ZPwqaWgJip3xgIRaFr_/view?usp=sharing

# energy_transition_literature
#https://drive.google.com/file/d/14jvSWbvh7z_evqki0Gm05_xJM6G-UqC3/view?usp=sharing
#https://drive.google.com/file/d/15DADyQ254XQXywrmHCZkuyaOZHw9ByPE/view?usp=sharing
#https://drive.google.com/file/d/15PkuuXw_Rw1nBJaCG6P8YP2TZNtla36M/view?usp=sharing

# hydrogen_literature
#https://drive.google.com/file/d/14lZyvRFqbkDp8w6xFojjUWUNnVvSduUF/view?usp=sharing
#https://drive.google.com/file/d/14eKvE-fzc9355TYklJ3_g3qjKrorgO2k/view?usp=sharing
#https://drive.google.com/file/d/15-J5dh50ySBM8qfGRbpgzGgZrn0CMFKI/view?usp=sharing

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package vader_lexicon to /root/nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


Cloning into 'ValueMonitor_Workshops'...
remote: Enumerating objects: 165, done.
remote: Counting objects: 100% (112/112), done.
remote: Compressing objects: 100% (68/68), done.
remote: Total 165 (delta 67), reused 70 (delta 44), pack-reused 53
Receiving objects: 100% (165/165), 1.31 MiB | 13.64 MiB/s, done.
Resolving deltas: 100% (94/94), done.


### 1.2. Import datasets

In [17]:
pip install pandas==1.4.1

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting pandas==1.4.1
  Downloading pandas-1.4.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (11.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 11.7/11.7 MB 61.7 MB/s eta 0:00:00
Installing collected packages: pandas
  Attempting uninstall: pandas
    Found existing installation: pandas 1.3.5
    Uninstalling pandas-1.3.5:
      Successfully uninstalled pandas-1.3.5
Successfully installed pandas-1.4.1


In [38]:
''' Digital technologies'''

!wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=14W1UddxBOmJZC76NhmECqYy1wzULillW' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=14W1UddxBOmJZC76NhmECqYy1wzULillW" -O dataset_digital_technologies && rm -rf /tmp/cookies.txt
with open('dataset_digital_technologies', "rb") as fh:
    df = pickle.load(fh)

--2023-03-07 15:57:05--  https://docs.google.com/uc?export=download&confirm=&id=14W1UddxBOmJZC76NhmECqYy1wzULillW
Resolving docs.google.com (docs.google.com)... 74.125.20.100, 74.125.20.102, 74.125.20.101, ...
Connecting to docs.google.com (docs.google.com)|74.125.20.100|:443... connected.
HTTP request sent, awaiting response... 303 See Other
Location: https://doc-08-2c-docs.googleusercontent.com/docs/securesc/ha0ro937gcuc7l7deffksulhg5h7mbp1/2m20uo9e1s88i3hid67vk3r3gmhapirn/1678204575000/12635936161789443610/*/14W1UddxBOmJZC76NhmECqYy1wzULillW?e=download&uuid=d21e5708-a9d7-44dd-a2a3-db677a60fc24 [following]
--2023-03-07 15:57:06--  https://doc-08-2c-docs.googleusercontent.com/docs/securesc/ha0ro937gcuc7l7deffksulhg5h7mbp1/2m20uo9e1s88i3hid67vk3r3gmhapirn/1678204575000/12635936161789443610/*/14W1UddxBOmJZC76NhmECqYy1wzULillW?e=download&uuid=d21e5708-a9d7-44dd-a2a3-db677a60fc24
Resolving doc-08-2c-docs.googleusercontent.com (doc-08-2c-docs.googleusercontent.com)... 173.194.202.132, 2607

In [37]:
''' Energy Transition Literature'''

!wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=14jvSWbvh7z_evqki0Gm05_xJM6G-UqC3' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=14jvSWbvh7z_evqki0Gm05_xJM6G-UqC3" -O dataset_energy_transition_literature && rm -rf /tmp/cookies.txt
with open('dataset_energy_transition_literature', "rb") as fh:
    df = pickle.load(fh)

--2023-03-07 15:56:56--  https://docs.google.com/uc?export=download&confirm=&id=14jvSWbvh7z_evqki0Gm05_xJM6G-UqC3
Resolving docs.google.com (docs.google.com)... 74.125.20.100, 74.125.20.102, 74.125.20.101, ...
Connecting to docs.google.com (docs.google.com)|74.125.20.100|:443... connected.
HTTP request sent, awaiting response... 303 See Other
Location: https://doc-04-2c-docs.googleusercontent.com/docs/securesc/ha0ro937gcuc7l7deffksulhg5h7mbp1/vkbf3q91ntop0cbqnp26mvcj1qs233mc/1678204575000/12635936161789443610/*/14jvSWbvh7z_evqki0Gm05_xJM6G-UqC3?e=download&uuid=04206c12-fec6-4529-a514-d961b417928b [following]
--2023-03-07 15:56:58--  https://doc-04-2c-docs.googleusercontent.com/docs/securesc/ha0ro937gcuc7l7deffksulhg5h7mbp1/vkbf3q91ntop0cbqnp26mvcj1qs233mc/1678204575000/12635936161789443610/*/14jvSWbvh7z_evqki0Gm05_xJM6G-UqC3?e=download&uuid=04206c12-fec6-4529-a514-d961b417928b
Resolving doc-04-2c-docs.googleusercontent.com (doc-04-2c-docs.googleusercontent.com)... 173.194.202.132, 2607

In [11]:
''' hydrogen_literature'''

!wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=14lZyvRFqbkDp8w6xFojjUWUNnVvSduUF' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=14lZyvRFqbkDp8w6xFojjUWUNnVvSduUF" -O dataset_hydrogen_literature && rm -rf /tmp/cookies.txt
with open('dataset_hydrogen_literature', "rb") as fh:
    df = pickle.load(fh)

--2023-03-07 15:43:27--  https://docs.google.com/uc?export=download&confirm=t&id=14lZyvRFqbkDp8w6xFojjUWUNnVvSduUF
Resolving docs.google.com (docs.google.com)... 74.125.197.102, 74.125.197.139, 74.125.197.113, ...
Connecting to docs.google.com (docs.google.com)|74.125.197.102|:443... connected.
HTTP request sent, awaiting response... 303 See Other
Location: https://doc-10-2c-docs.googleusercontent.com/docs/securesc/ha0ro937gcuc7l7deffksulhg5h7mbp1/mp8o7gbesc0mr973a3192rrkmug4k710/1678203750000/12635936161789443610/*/14lZyvRFqbkDp8w6xFojjUWUNnVvSduUF?e=download&uuid=8ba0e639-3d4a-4273-8bf8-2a31e4d0d494 [following]
--2023-03-07 15:43:27--  https://doc-10-2c-docs.googleusercontent.com/docs/securesc/ha0ro937gcuc7l7deffksulhg5h7mbp1/mp8o7gbesc0mr973a3192rrkmug4k710/1678203750000/12635936161789443610/*/14lZyvRFqbkDp8w6xFojjUWUNnVvSduUF?e=download&uuid=8ba0e639-3d4a-4273-8bf8-2a31e4d0d494
Resolving doc-10-2c-docs.googleusercontent.com (doc-10-2c-docs.googleusercontent.com)... 173.194.202.132,

## 2. Creating the topic model <a name="creating_the_topic_model"></a>

In this step, we create a topic model in which some of the topics refer to values. The creation of topics that reflect values is done by means of so-called 'anchor' words. These words guide the algorithm in the creation of topics that reflect values.

Anchor words are typically words that people use to refer to (the idea of) a value, such as synonyms. After adding some anchor words and running the model, the algorithm will automatically pick up other words that refer to the value. This works because the algorithm observes that these words are often mentioned in the same documents as the anchor words.
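
The co-occurrence idea can be illustrated with a toy example. This is only a sketch in plain Python: it counts how often non-anchor words share a document with an anchor word, and it is not the topic-modelling algorithm that ValueMonitor runs. The corpus, the anchor set, and the helper name `co_occurring_words` are hypothetical.

```python
from collections import Counter

# Hypothetical toy corpus; each document is already split into words.
documents = [
    ["privacy", "data", "facebook", "users"],
    ["privacy", "facebook", "law"],
    ["energy", "solar", "grid"],
    ["confidentiality", "data", "users"],
]
anchor_words = {"privacy", "confidentiality"}


def co_occurring_words(docs, anchors):
    """Count, per non-anchor word, the documents it shares with any anchor word."""
    counts = Counter()
    for doc in docs:
        tokens = set(doc)
        if tokens & anchors:  # the document mentions at least one anchor word
            counts.update(tokens - anchors)
    return counts


related = co_occurring_words(documents, anchor_words)
print(related.most_common(3))
```

Words such as "facebook" and "data" score highly because they appear alongside the anchors, while "solar" is never counted. A real topic model weighs such evidence statistically rather than by raw counts.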

Finding the right anchor words is typically an iterative process: after each run, inspect the topic model created by the algorithm and adjust the word lists. Some anchor words need to be added so that no aspect of the value is left out (add them to *dict_anchor_words* in the cell below). Other words need to be removed because they do not refer to the value (add them to *list_rejected_words* in the cell below).

We have prefilled a number of anchor words for each value.
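
When iterating on anchor words, it can help to check whether each anchor occurs in the corpus at all, since an anchor that never appears cannot guide a topic. The helper below is a hypothetical sketch and not part of ValueMonitor. It assumes documents are plain strings and matches anchors on word boundaries, so "fair" does not match inside "unfair".

```python
import re


def missing_anchor_words(texts, anchors_by_value):
    """Map each value to the anchor words that occur in none of the texts."""
    lowered = [t.lower() for t in texts]
    missing = {}
    for value, anchors in anchors_by_value.items():
        absent = [
            a for a in anchors
            if not any(re.search(r"\b" + re.escape(a) + r"\b", t) for t in lowered)
        ]
        if absent:
            missing[value] = absent
    return missing


# Hypothetical two-document corpus and a shortened anchor dictionary.
texts = [
    "Users worry about data privacy on social media.",
    "An unfair algorithm undermines trust.",
]
anchors = {
    "Privacy": ["privacy", "data privacy", "confidentiality"],
    "Justice and Fairness": ["fair", "unfair"],
}
print(missing_anchor_words(texts, anchors))
# {'Privacy': ['confidentiality'], 'Justice and Fairness': ['fair']}
```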

In [12]:
dict_anchor_words = {
"Justice and Fairness" : ["justice", "fairness", "fair", "equality", "unfair"],
"Privacy" : ["privacy", "personal data", "personal sphere", "data privacy", "privacy protection", "privacy concerns", 
             "confidentiality"],
"Cyber-security" : ["cyber", "security", "cybersecurity", "malicious", "attacks"],
"Environmnental Sustainability" : ["sustainability", "sustainable", "renewable", "durable", "durability",
                                  "sustainable development", "environmental"],
"Transparency" : ["transparency", "transparent", "transparently", "explainability", "interpretability", "explainable",
                 "opaque", "interpretable"],
"Accountability" : ["accountable", "accountability", "traceability", "traceable"],
"Autonomy" : ["autonomy", "self-determination", "autonomy human", "personal autonomy"], 
"Democracy" : ["democracy", "democratic", "human rights", "freedom speech", "equal representation",
              "political"], 
"Reliability" : ["reliability", "reliable", "robustness", "robust", "predictability"],
"Trust" : ["trust", "trustworthy", "trustworthiness", "confidence", "honesty"],
"Well-being" : ["well being", "well-being", "wellbeing", "quality life",
               "good life", "qol", "life satisfaction", "welfare"],
"Inclusiveness" : ["inclusiveness", "inclusive", "inclusivity", "discrimination", "diversity"]
}

list_rejected_words = ["iop", "iop publishing", "publishing ltd", "publishing", "licence iop",
                       "mdpi basel", "basel switzerland", "mdpi", "basel", "licensee mdpi", "licensee", "authors licensee", 
                       "switzerland", "authors", "publishing limited", "emerald", "emerald publishing", ]

list_anchor_words_other_topics = [
        ["internet of things", "iot", "internet things", "iot devices", "things iot"],
        ["artificial intelligence", "ai", "artificial"],
]



In [39]:
number_of_topics_to_find = 20
number_of_documents_in_analysis = 200

number_of_words_per_topic = 10

'''--------------------------------------------------------------------------''' 

model_and_vectorized_data = make_anchored_topic_model(df, number_of_topics_to_find, min(number_of_documents_in_analysis, len(df)), dict_anchor_words, list_anchor_words_other_topics, list_rejected_words)
topics = report_topics(model_and_vectorized_data[0], dict_anchor_words,number_of_words_per_topic)
df_with_topics = create_df_with_topics(df, model_and_vectorized_data[0], model_and_vectorized_data[1], number_of_topics_to_find)
topics_weights = report_topics_words_and_weights(model_and_vectorized_data[0], dict_anchor_words, number_of_words_per_topic)

Topic #0 (Justice and Fairness): justice, fair, list, team, end, society, hours, door, politics, race
Topic #1 (Privacy): privacy, facebook, google, users, amazon, law, tech, information, tech companies, apple
Topic #2 (Cyber-security): security, president, attacks, washington, news, last, top, campaign, white house, john
Topic #3 (Environmnental Sustainability): environmental, sustainable, growth, traditional, technological, vision, goal, average, specific, development
Topic #4 (Transparency): transparent, eyes, laws, friends, transparency, think, fact, student, couple, several
Topic #5 (Accountability): accountable, everything, something, ll, one, sure, work, kind, course, lot
Topic #6 (Autonomy): autonomy, company, in, jobs, percent, economic, products, worth, ability, businesses
Topic #7 (Democracy): political, democracy, democratic, way, own, nothing, country, months, matter, side
Topic #8 (Reliability): robust, reliable, machine learning, machine, learning, decision making, reute

## 3. Verifying the topic model   <a name="verifying_the_topic_model"></a>

To verify whether the topics sufficiently refer to values, the code below can be used to check whether the documents assigned to a topic indeed address the value in question.
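
Reading short extracts around the relevant words is a quick way to judge whether a document really addresses a value. The sketch below shows one plausible way to cut such extracts. It does not reproduce `print_sample_articles_topic`, and reading `window` as "words on each side of a hit" is an assumption; `extract_windows` is a hypothetical helper.

```python
def extract_windows(text, keywords, window=5):
    """Return snippets with `window` words on each side of every keyword hit."""
    tokens = text.split()
    snippets = []
    for i, token in enumerate(tokens):
        # Strip surrounding punctuation before comparing with the keywords.
        if token.lower().strip(".,;:!?\"'()") in keywords:
            start = max(0, i - window)
            snippets.append(" ".join(tokens[start:i + window + 1]))
    return snippets


sample = "The new law aims to protect the privacy of users who share data online."
print(extract_windows(sample, {"privacy"}, window=2))
# ['protect the privacy of users']
```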

In [40]:
for topic, words in topics_weights.items():
  print(str(topic)+": "+str(words))

Topic #0# (Justice and Fairness): {'justice': 0.333, 'fair': 0.319, 'list': 0.235, 'team': 0.222, 'end': 0.217, 'society': 0.207, 'hours': 0.184, 'door': 0.176, 'politics': 0.162, 'race': 0.158}
Topic #1# (Privacy): {'privacy': 0.369, 'facebook': 0.289, 'google': 0.215, 'users': 0.214, 'amazon': 0.188, 'law': 0.182, 'tech': 0.181, 'information': 0.168, 'tech companies': 0.158, 'apple': 0.154}
Topic #2# (Cyber-security): {'security': 0.398, 'president': 0.308, 'attacks': 0.304, 'washington': 0.288, 'news': 0.271, 'last': 0.257, 'top': 0.246, 'campaign': 0.24, 'white house': 0.226, 'john': 0.215}
Topic #3# (Environmnental Sustainability): {'environmental': 0.632, 'sustainable': 0.401, 'growth': 0.079, 'traditional': 0.072, 'technological': 0.071, 'vision': 0.07, 'goal': 0.068, 'average': 0.061, 'specific': 0.06, 'development': 0.058}
Topic #4# (Transparency): {'transparent': 0.387, 'eyes': 0.184, 'laws': 0.173, 'friends': 0.166, 'transparency': 0.157, 'think': 0.15, 'fact': 0.14, 'studen

In [41]:
topics_to_remove_int = []

def plot_top_topics_on_values(selected_value, top_topics_to_show):
  top_topics_on_values(df_with_topics, selected_value, dict_anchor_words, topics_weights, topics_to_remove_int, top_topics_to_show)

interact(plot_top_topics_on_values, top_topics_to_show = (3, 25, 1), selected_value=[*dict_anchor_words])

interactive(children=(Dropdown(description='selected_value', options=('Justice and Fairness', 'Privacy', 'Cybe…

<function __main__.plot_top_topics_on_values(selected_value, top_topics_to_show)>

In [42]:
def plot_print_sample_articles_topic(selected_value, selected_topic, show_full_text, window, size_sample):
    show_extracts = True # True, False
    df_to_evaluate = df_with_topics
    if selected_topic == "":
      selected_topic = 0
    df_to_evaluate = df_to_evaluate.loc[(df_to_evaluate[int(selected_topic)] == 1)]
    print_sample_articles_topic(df_to_evaluate, dict_anchor_words, topics, selected_value, size_sample, window, show_extracts, show_full_text)

my_interact_manual = interact_manual.options(manual_name="Plot articles on value")
my_interact_manual(plot_print_sample_articles_topic, selected_value=[*dict_anchor_words], selected_topic=widgets.Text(), size_sample =(5,20, 5), window =(5,100, 5), show_full_text = widgets.Checkbox(value=False))

interactive(children=(Dropdown(description='selected_value', options=('Justice and Fairness', 'Privacy', 'Cybe…

<function __main__.plot_print_sample_articles_topic(selected_value, selected_topic, show_full_text, window, size_sample)>

## 4. Values in different realms <a name="values_in_different_realms"></a>

ValueMonitor can be used to evaluate which values different societal groups tend to discuss.
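
The underlying comparison can be sketched as a share of documents per group. The records below are hypothetical, and the calculation is a simplified stand-in for what `values_in_different_groups` plots; the actual function may normalise differently.

```python
from collections import defaultdict

# Hypothetical records: (dataset label, set of values detected in the document).
records = [
    ("NEWS", {"Privacy"}),
    ("NEWS", {"Privacy", "Trust"}),
    ("TECH", {"Reliability"}),
    ("TECH", {"Privacy", "Reliability"}),
    ("ETHICS", {"Trust"}),
]


def value_share_per_group(rows):
    """Percentage of documents in each group that mention each value."""
    totals = defaultdict(int)
    hits = defaultdict(lambda: defaultdict(int))
    for group, values in rows:
        totals[group] += 1
        for value in values:
            hits[group][value] += 1
    return {
        group: {v: 100 * n / totals[group] for v, n in per_value.items()}
        for group, per_value in hits.items()
    }


shares = value_share_per_group(records)
print(shares["NEWS"])
```

Using percentages rather than raw counts keeps groups of different sizes comparable.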

In [43]:
def plot_values_in_different_groups(selected_dataset):
    values_in_different_groups(df_with_topics, dict_anchor_words, selected_dataset)

interact(plot_values_in_different_groups, selected_dataset = df_with_topics.groupby(['dataset']).size().index.tolist())

interactive(children=(Dropdown(description='selected_dataset', options=('ETHICS', 'NEWS', 'TECH'), value='ETHI…

<function __main__.plot_values_in_different_groups(selected_dataset)>

In [44]:
def plot_print_sample_articles_topic(selected_value, selected_dataset, show_full_text, window, size_sample):
    show_extracts = True # True, False
    df_with_topics_selected_technology_dataset = df_with_topics[df_with_topics['dataset'] == selected_dataset]
    print_sample_articles_topic(df_with_topics_selected_technology_dataset, dict_anchor_words, topics, selected_value, size_sample, window, show_extracts, show_full_text)

my_interact_manual = interact_manual.options(manual_name="Plot articles on value")
my_interact_manual(plot_print_sample_articles_topic, selected_value=[*dict_anchor_words], selected_dataset = df_with_topics.groupby(['dataset']).size().index.tolist(), size_sample =(5,20, 5), window =(5,100, 5), show_full_text = widgets.Checkbox(value=False))

interactive(children=(Dropdown(description='selected_value', options=('Justice and Fairness', 'Privacy', 'Cybe…

<function __main__.plot_print_sample_articles_topic(selected_value, selected_dataset, show_full_text, window, size_sample)>

## 5. Values over time <a name="values_over_time"></a>

The occurrence of values can be traced over time.
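
As a rough illustration of resampling and smoothing, the sketch below computes the yearly share of documents that mention a value and then applies a centred moving average. The data are hypothetical, and the moving average is an assumed stand-in: the smoothing used by `create_vis_values_over_time` may work differently.

```python
from collections import Counter


def yearly_share(rows, value):
    """Percentage of documents per year that mention `value`."""
    totals, hits = Counter(), Counter()
    for year, values in rows:
        totals[year] += 1
        if value in values:
            hits[year] += 1
    return {year: 100 * hits[year] / totals[year] for year in sorted(totals)}


def moving_average(series, width=3):
    """Centred moving average; the window is shortened at the edges."""
    half = width // 2
    out = []
    for i in range(len(series)):
        chunk = series[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out


# Hypothetical records: (publication year, set of values detected).
rows = [
    (2018, {"Privacy"}), (2018, set()),
    (2019, {"Privacy"}), (2019, {"Privacy"}),
    (2020, set()), (2020, {"Trust"}),
]
privacy_share = yearly_share(rows, "Privacy")
print(privacy_share)                                 # {2018: 50.0, 2019: 100.0, 2020: 0.0}
print(moving_average(list(privacy_share.values())))  # [75.0, 50.0, 50.0]
```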

In [51]:
def plot_create_vis_values_over_time (selected_dataset, resampling, starttime, endtime, smoothing, max_value_y):
    values_to_include_in_visualisation = []   
    resampling_dict = {"Year": "Y", "Month": "M", "Day": "D"}
    resampling = resampling_dict[resampling]
    selected_df_with_topics = df_with_topics
    if selected_dataset != "All datasets":
      selected_df_with_topics = selected_df_with_topics[selected_df_with_topics['dataset'] == selected_dataset]
    selected_df_with_topics = selected_df_with_topics.loc[(selected_df_with_topics['date'] >= dateutil.parser.parse(str(starttime))) & (selected_df_with_topics['date'] <= dateutil.parser.parse(str(endtime)))]

    create_vis_values_over_time(selected_df_with_topics, dict_anchor_words, resampling, values_to_include_in_visualisation, smoothing, max_value_y)  

my_interact_manual = interact_manual.options(manual_name="Plot values over time")
my_interact_manual(plot_create_vis_values_over_time, selected_dataset = ["All datasets"] + df_with_topics.groupby(['dataset']).size().index.tolist(), starttime =(1960,2022, 2), endtime =(1965,2022, 2), smoothing = (0.25,3, 0.25), max_value_y = (5,100, 5), resampling = ["Year", "Month", "Day"])

interactive(children=(Dropdown(description='selected_dataset', options=('All datasets', 'ETHICS', 'NEWS', 'TEC…

<function __main__.plot_create_vis_values_over_time(selected_dataset, resampling, starttime, endtime, smoothing, max_value_y)>

In [46]:
def plot_words_over_time (selected_value, selected_dataset, starttime, endtime, smoothing, max_value_y, resampling):
    list_words = []
    selected_df_with_topics = df_with_topics
    if selected_dataset != "All datasets":
      selected_df_with_topics = selected_df_with_topics.loc[(selected_df_with_topics["dataset"] == selected_dataset)]
    selected_df_with_topics = selected_df_with_topics.loc[(selected_df_with_topics['date'] >= dateutil.parser.parse(str(starttime))) & (selected_df_with_topics['date'] <= dateutil.parser.parse(str(endtime)))]
    top_words = 10
    list_words = topics[selected_value][:top_words]
    print(list_words)
    resampling_dict = {"Year": "Y", "Month": "M", "Day": "D"}
    inspect_words_over_time(df_with_topics = selected_df_with_topics, selected_value = selected_value, dict_anchor_words = dict_anchor_words, topics = topics, list_words = list_words, resampling = resampling_dict[resampling], smoothing = smoothing, max_value_y = max_value_y)

my_interact_manual = interact_manual.options(manual_name="Plot words over time")
my_interact_manual(plot_words_over_time, selected_value=[*dict_anchor_words], selected_dataset=["All datasets"] + df_with_topics.groupby(['dataset']).size().index.tolist(), starttime =(1960,2022, 2), endtime =(1965,2022, 2), smoothing = (0.1,3, 0.25), max_value_y = (5,100, 5), resampling = ["Year", "Month", "Day"])

interactive(children=(Dropdown(description='selected_value', options=('Justice and Fairness', 'Privacy', 'Cybe…

<function __main__.plot_words_over_time(selected_value, selected_dataset, smoothing, max_value_y, resampling)>

In [49]:
topics_to_remove_int = []

def plot_top_topics_over_time(selected_value, selected_dataset, starttime, endtime, top_topics_to_show, smoothing, max_value_y, resampling):
  resampling_dict = {"Year": "Y", "Month": "M", "Day": "D"}
  resampling = resampling_dict[resampling]
  df_to_evaluate = df_with_topics
  if selected_dataset != "All datasets":
    df_to_evaluate = df_to_evaluate.loc[(df_to_evaluate["dataset"] == selected_dataset)]
  df_to_evaluate = df_to_evaluate.loc[(df_to_evaluate['date'] >= dateutil.parser.parse(str(starttime))) & (df_to_evaluate['date'] <= dateutil.parser.parse(str(endtime)))]
  top_topics_on_values_over_time(df_to_evaluate, selected_value, selected_dataset, dict_anchor_words, topics_weights, top_topics_to_show, topics_to_remove_int, smoothing, max_value_y, resampling)

my_interact_manual = interact_manual.options(manual_name="Plot related topics over time")
my_interact_manual(plot_top_topics_over_time, top_topics_to_show = (3, 25, 1), selected_value=[*dict_anchor_words], selected_dataset = ["All datasets"] + df_with_topics.groupby(['dataset']).size().index.tolist(), starttime =(1960,2022, 2), endtime =(1965,2022, 2), smoothing = (0.25,3, 0.25), max_value_y = (5,100, 5), resampling = ["Year", "Month", "Day"])

interactive(children=(Dropdown(description='selected_value', options=('Justice and Fairness', 'Privacy', 'Cybe…

<function __main__.plot_top_topics_over_time(selected_value, selected_dataset, top_topics_to_show, smoothing, max_value_y, resampling)>

In [53]:
def plot_print_sample_articles_topic(selected_value, selected_dataset, selected_topic, starttime, endtime, show_full_text, window, size_sample):
    show_extracts = True # True, False
    '''--------------------------------------------------------------------------''' 
    selected_dataframe = df_with_topics
    if selected_dataset != "All datasets":
      selected_dataframe = selected_dataframe.loc[(selected_dataframe["dataset"] == selected_dataset)]  # filter on the dataframe column, not on the dataset name string
    selected_dataframe = selected_dataframe.loc[(selected_dataframe['date'] >= dateutil.parser.parse(str(starttime))) & (selected_dataframe['date'] <= dateutil.parser.parse(str(endtime)))]
    if selected_topic == "":
      selected_topic = 0
    selected_dataframe = selected_dataframe[selected_dataframe[int(selected_topic)] == 1]
    print_sample_articles_topic(selected_dataframe, dict_anchor_words, topics, selected_value, size_sample, window, show_extracts, show_full_text)

my_interact_manual = interact_manual.options(manual_name="Plot articles on topic")
my_interact_manual(plot_print_sample_articles_topic, selected_value=[*dict_anchor_words], selected_dataset = ["All datasets"] + df_with_topics.groupby(['dataset']).size().index.tolist(), starttime =(1960,2022, 2), endtime =(1965,2022, 2), selected_topic=widgets.Text(), size_sample =(5,20, 5), window =(5,100, 5), show_full_text = widgets.Checkbox(value=False))


interactive(children=(Dropdown(description='selected_value', options=('Justice and Fairness', 'Privacy', 'Cybe…

<function __main__.plot_print_sample_articles_topic(selected_value, selected_dataset, selected_topic, starttime, endtime, show_full_text, window, size_sample)>

## 6. Gap assessment <a name="gap_assessment"></a>

It takes time to build a good topic model in which topics adequately represent values. The code in the next cell can be used to import an existing topic model.
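
Because building a model takes time, it can be worth storing the inputs and results of a run so they can be reloaded later. The sketch below shows a round trip with `pickle`, the same format the dataset cells in section 1.2 read. The file name and the contents of `artifacts` are hypothetical, and this is not a ValueMonitor function.

```python
import os
import pickle
import tempfile

# Hypothetical stand-ins for the objects built in section 2.
artifacts = {
    "dict_anchor_words": {"Privacy": ["privacy", "confidentiality"]},
    "list_rejected_words": ["publishing"],
    "number_of_topics_to_find": 20,
}

path = os.path.join(tempfile.mkdtemp(), "topic_model_artifacts.pkl")

# Save the artifacts to disk ...
with open(path, "wb") as fh:
    pickle.dump(artifacts, fh)

# ... and load them again in a later session.
with open(path, "rb") as fh:
    restored = pickle.load(fh)

print(restored == artifacts)  # True
```

Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code.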

In [54]:
def plot_values_in_different_datasets():
  selected_df = df_with_topics
  values_in_different_datasets(selected_df, dict_anchor_words)

interact(plot_values_in_different_datasets)

interactive(children=(Output(),), _dom_classes=('widget-interact',))

<function __main__.plot_values_in_different_datasets()>

In [55]:
def plot_print_sample_articles_topic(selected_value, selected_dataset, show_full_text, window, size_sample):
    show_extracts = True # True, False
    '''--------------------------------------------------------------------------''' 
    selected_dataframe = df_with_topics
    if selected_dataset != "All datasets":
      selected_dataframe = selected_dataframe.loc[(selected_dataframe["dataset"] == selected_dataset)]  # filter on the dataframe column, not on the dataset name string
    print_sample_articles_topic(selected_dataframe, dict_anchor_words, topics, selected_value, size_sample, window, show_extracts, show_full_text)

my_interact_manual = interact_manual.options(manual_name="Plot articles on value")
my_interact_manual(plot_print_sample_articles_topic, selected_value=[*dict_anchor_words], selected_dataset = ["All datasets"] + df_with_topics.groupby(['dataset']).size().index.tolist(), size_sample =(5,20, 5), window =(5,100, 5), show_full_text = widgets.Checkbox(value=False))


interactive(children=(Dropdown(description='selected_value', options=('Justice and Fairness', 'Privacy', 'Cybe…

<function __main__.plot_print_sample_articles_topic(selected_value, selected_dataset, show_full_text, window, size_sample)>