# ValueMonitor - Use an existing topic model

This page is a visualisation of the ValueMonitor prototype. To run the notebook yourself, click the ‘**Run in Google Colab**’ icon below:

<table class="tfo-notebook-buttons" align="left">
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/tristandewildt/ValueMonitor_Workshops/blob/main/ValueMonitor_Workshop_use_existing_model.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
  <td>
    <a target="_blank" href="https://github.com/tristandewildt/ValueMonitor_Workshops/blob/main/ValueMonitor_Workshop_use_existing_model.ipynb"><img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" />View source on GitHub</a>
  </td>
</table>

## Table of contents:
* [1. Import dataset and packages](#import_dataset_and_packages)
* [2. Overview of topics in the model](#overview_topics_in_model)
* [3. Values in different realms](#values_in_different_realms)
* [4. Values over time](#values_over_time)
* [5. Gap assessment](#gap_assessment)

## 1. Import dataset and packages  <a name="import_dataset_and_packages"></a>

### 1.1. Import packages

In this step, the relevant Python packages and the ValueMonitor source code are installed and imported. The dataset itself is imported in section 1.2.

In [16]:
pip install pandas==1.4.1

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [1]:
''' Packages'''

!pip install corextopic
!pip install joblib
!pip install tabulate
!pip install simple_colors
!pip install ipyfilechooser

import os, sys, importlib
import pandas as pd
import ipywidgets as widgets
from ipywidgets import interact, interact_manual, Button
import pickle
from ipyfilechooser import FileChooser
from tkinter import Tk, filedialog
from IPython.display import clear_output, display
from google.colab import files
import nltk
import io
import dateutil.parser  # used explicitly by the date filters in section 4
nltk.download('averaged_perceptron_tagger')
nltk.download('punkt')
nltk.download('vader_lexicon')

''' Source code'''

user = "tristandewildt"
repo = "ValueMonitor_Workshops"
src_dir = "code"
pyfile_1 = "make_topic_model.py"
pyfile_2 = "create_visualisation.py"

# Remove any previous clone so that the latest version of the code is used
if os.path.isdir(repo):
    !rm -rf {repo}

# Clone without an access token (the repository is linked publicly above);
# personal access tokens should never be stored in a shared notebook
!git clone https://github.com/{user}/{repo}.git

from ValueMonitor_Workshops.code.make_topic_model import *
from ValueMonitor_Workshops.code.create_visualisation import *



[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package vader_lexicon to /root/nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


Cloning into 'ValueMonitor_Workshops'...
remote: Enumerating objects: 174, done.
remote: Counting objects: 100% (121/121), done.
remote: Compressing objects: 100% (77/77), done.
remote: Total 174 (delta 71), reused 71 (delta 44), pack-reused 53
Receiving objects: 100% (174/174), 1.73 MiB | 5.35 MiB/s, done.
Resolving deltas: 100% (98/98), done.


### 1.2. Import dataset

**Dataset 1: digital technologies**

This dataset was used for a report for STOA and the European Parliament: 'Ethical and societal challenges of the approaching technological storm' (https://www.europarl.europa.eu/RegData/etudes/STUD/2022/729543/EPRS_STU(2022)729543_EN.pdf).

The dataset focuses on the following digital technologies: 5G/6G, AI, Robotics, Internet of Things, Augmented Reality, Virtual Reality, Blockchain, Bio-nanotechnology.

We have created topics for the following values: Justice and Fairness, Privacy, Cyber-security, Environmental Sustainability, Transparency, Accountability, Autonomy, Democracy, Reliability, Trust, Well-being, Inclusiveness.

The dataset includes four types of documents:
*   Technological and scientific research (scientific articles from journals with an engineering background)
*   Ethical research (scientific articles from journals on ethics)
*   News media (newspaper articles on digital technologies)
*   EU regulation on digital technologies


In [4]:
''' Digital technologies'''

!wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=14W1UddxBOmJZC76NhmECqYy1wzULillW' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=14W1UddxBOmJZC76NhmECqYy1wzULillW" -O dataset_digital_technologies && rm -rf /tmp/cookies.txt
!wget -q --show-progress --no-check-certificate 'https://docs.google.com/uc?export=download&id=14WkV2Rxawiwv3ZPwqaWgJip3xgIRaFr_' -O topics_weights_digital_technologies
!wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=10AY97ieHVQrHuRUVLPZuYvX54i5OmfAv' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=10AY97ieHVQrHuRUVLPZuYvX54i5OmfAv" -O model_and_vectorized_digital_technologies && rm -rf /tmp/cookies.txt
!wget -q --show-progress --no-check-certificate 'https://docs.google.com/uc?export=download&id=101AvFta6cXi1BzYIxGJhYyUXlNf3OVHp' -O info_topics_digital_technologies

with open('dataset_digital_technologies', "rb") as fh:
    df = pickle.load(fh)
with open('topics_weights_digital_technologies', "rb") as fh:
    topics_weights = pickle.load(fh)
with open('model_and_vectorized_digital_technologies', "rb") as fh:
    model_and_vectorized_data = pickle.load(fh)
with open('info_topics_digital_technologies', "rb") as fh:
    info_topics = pickle.load(fh)

topics = info_topics[0]
number_of_topics_to_find = info_topics[1]
dict_anchor_words = info_topics[2]

df_with_topics = create_df_with_topics(df, model_and_vectorized_data[0], model_and_vectorized_data[1], number_of_topics_to_find)
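
The four downloaded files unpack into plain Python objects. As a rough illustration of the layout that the cell above relies on, the sketch below round-trips a toy `info_topics` object through `pickle` in memory. The toy words, the value names and the inner structure of the topics and anchor-word objects are assumptions made up for illustration; only the three-element layout (topics, number of topics, anchor-word dictionary) is taken from the cell above.

```python
import io
import pickle

# Hypothetical stand-in for the content of an 'info_topics_*' file.
toy_topics = [["sustainable", "sustainability"], ["safety", "accidents"]]
toy_anchor_words = {
    "Environmental Sustainability": ["sustainable", "sustainability"],
    "Safety": ["safety", "accidents"],
}
toy_info_topics = [toy_topics, len(toy_topics), toy_anchor_words]

# Serialise to an in-memory buffer instead of a file on disk.
buffer = io.BytesIO()
pickle.dump(toy_info_topics, buffer)
buffer.seek(0)

# Same unpacking pattern as in the notebook cell.
info_topics_demo = pickle.load(buffer)
topics_demo = info_topics_demo[0]
number_of_topics_demo = info_topics_demo[1]
dict_anchor_words_demo = info_topics_demo[2]

print(number_of_topics_demo)         # 2
print(list(dict_anchor_words_demo))  # ['Environmental Sustainability', 'Safety']
```

Note that unpickling can execute arbitrary code, so pickle files should only be loaded from sources you trust.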



**Dataset 2: energy transition**

This dataset was made to compare the values addressed by different scientific fields in research on the energy transition.

The dataset was obtained by downloading scientific articles from Scopus that mention the words 'energy transition' and 'global warming'.

We have created topics for the following values: Environmental Sustainability, Safety, Economic Viability, Efficiency, Affordability.

The dataset includes articles from the following scientific fields:
*   Environmental biology
*   Environmental economics
*   Environmental psychology
*   Environmental sustainability
*   Philosophy of sustainability
*   Sustainable building
*   Sustainable finance
*   Sustainable mobility

In [38]:
''' Energy Transition Literature'''

!wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=14jvSWbvh7z_evqki0Gm05_xJM6G-UqC3' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=14jvSWbvh7z_evqki0Gm05_xJM6G-UqC3" -O dataset_energy_transition_literature && rm -rf /tmp/cookies.txt
!wget -q --show-progress --no-check-certificate 'https://docs.google.com/uc?export=download&id=15PkuuXw_Rw1nBJaCG6P8YP2TZNtla36M' -O topics_weights_energy_transition_literature
!wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=15DADyQ254XQXywrmHCZkuyaOZHw9ByPE' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=15DADyQ254XQXywrmHCZkuyaOZHw9ByPE" -O model_and_vectorized_energy_transition_literature && rm -rf /tmp/cookies.txt
!wget -q --show-progress --no-check-certificate 'https://docs.google.com/uc?export=download&id=104zvfUlnduxgWa2tYwkpYYhHk_oRW59R' -O info_topics_energy_transition_literature

with open('dataset_energy_transition_literature', "rb") as fh:
    df = pickle.load(fh)
with open('topics_weights_energy_transition_literature', "rb") as fh:
    topics_weights = pickle.load(fh)
with open('model_and_vectorized_energy_transition_literature', "rb") as fh:
    model_and_vectorized_data = pickle.load(fh)
with open('info_topics_energy_transition_literature', "rb") as fh:
    info_topics = pickle.load(fh)

topics = info_topics[0]
number_of_topics_to_find = info_topics[1]
dict_anchor_words = info_topics[2]

df_with_topics = create_df_with_topics(df, model_and_vectorized_data[0], model_and_vectorized_data[1], number_of_topics_to_find)



**Dataset 3: hydrogen technology**

This is a dataset made for an ongoing paper on value change in hydrogen technology.

We have created topics for the following values: Environmental Sustainability, Safety, Economic Viability, Efficiency, Affordability.

The dataset includes two types of documents:
*   Technological and scientific research (scientific articles from journals with an engineering background)
*   News media (newspaper articles on hydrogen technology)

In [22]:
''' Hydrogen technology'''

!wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=14lZyvRFqbkDp8w6xFojjUWUNnVvSduUF' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=14lZyvRFqbkDp8w6xFojjUWUNnVvSduUF" -O dataset_hydrogen_technology && rm -rf /tmp/cookies.txt
!wget -q --show-progress --no-check-certificate 'https://docs.google.com/uc?export=download&id=15-J5dh50ySBM8qfGRbpgzGgZrn0CMFKI' -O topics_weights_hydrogen_technology
!wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=14eKvE-fzc9355TYklJ3_g3qjKrorgO2k' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=14eKvE-fzc9355TYklJ3_g3qjKrorgO2k" -O model_and_vectorized_hydrogen_technology && rm -rf /tmp/cookies.txt
!wget -q --show-progress --no-check-certificate 'https://docs.google.com/uc?export=download&id=107OlsOeGaqkFFQRqGSP8tG1gN8oZEVQi' -O info_topics_hydrogen_technology

with open('dataset_hydrogen_technology', "rb") as fh:
    df = pickle.load(fh)
with open('topics_weights_hydrogen_technology', "rb") as fh:
    topics_weights = pickle.load(fh)
with open('model_and_vectorized_hydrogen_technology', "rb") as fh:
    model_and_vectorized_data = pickle.load(fh)
with open('info_topics_hydrogen_technology', "rb") as fh:
    info_topics = pickle.load(fh)

topics = info_topics[0]
number_of_topics_to_find = info_topics[1]
dict_anchor_words = info_topics[2]

df_with_topics = create_df_with_topics(df, model_and_vectorized_data[0], model_and_vectorized_data[1], number_of_topics_to_find)



## 2. Overview of topics in the model <a name="overview_topics_in_model"></a>

In [39]:
for topic, words in topics_weights.items():
  print(str(topic)+": "+str(words))

Topic #0# (Environmental Sustainability): {'sustainable': '1.355', 'sustainability': '1.006', 'sustainable development': '0.289', 'environmental sustainability': '0.028', 'climate change': '0.017', 'sustainability transitions': '0.017', 'environmental': '0.013', 'sustainable energy': '0.011', 'towards': '0.011', 'sustainable construction': '0.009'}
Topic #1# (Safety): {'safety': '0.983', 'accidents': '0.101', 'accident': '0.092', 'health safety': '0.024', 'pedestrian': '0.019', 'food safety': '0.017', 'neighborhood': '0.017', 'safe': '0.016', 'pedestrians': '0.013', 'crime': '0.012'}
Topic #2# (Economic viability): {'economic': '1.668', 'costs': '0.639', 'economic environmental': '0.02', 'environmental economic': '0.014', 'economic value': '0.012', 'economic potential': '0.008', 'elsevier economic': '0.006', 'economic activities': '0.005', 'opportunity costs': '0.005', 'results economic': '0.004'}
Topic #3# (Efficiency): {'efficiency': '2.216', 'energy efficiency': '0.442', 'energy eff
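
In the output above, each topic maps words to weights that are stored as strings. A minimal sketch of ranking those words numerically, using a shortened copy of the 'Safety' entry printed above; the helper name `top_words` is hypothetical and not part of ValueMonitor:

```python
def top_words(word_weights, n=3):
    """Return the n highest-weighted (word, weight) pairs, with weights as floats."""
    as_floats = {word: float(weight) for word, weight in word_weights.items()}
    return sorted(as_floats.items(), key=lambda item: item[1], reverse=True)[:n]

# Excerpt of the 'Safety' topic printed above (weights are strings).
safety = {
    "safety": "0.983",
    "accidents": "0.101",
    "accident": "0.092",
    "health safety": "0.024",
}

print(top_words(safety, n=2))  # [('safety', 0.983), ('accidents', 0.101)]
```

Converting to `float` first matters because sorting the raw strings would compare them character by character rather than by value.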

In [36]:
# could introduce here something that removes all empty topics

topics_to_remove_int = []

def plot_top_topics_on_values(selected_value, top_topics_to_show):
  top_topics_on_values(df_with_topics, selected_value, dict_anchor_words, topics_weights, topics_to_remove_int, top_topics_to_show)

interact(plot_top_topics_on_values, top_topics_to_show = (3, 25, 1), selected_value=[*dict_anchor_words])

interactive(children=(Dropdown(description='selected_value', options=('Environmental Sustainability', 'Safety'…

<function __main__.plot_top_topics_on_values(selected_value, top_topics_to_show)>

In [37]:
def plot_print_sample_articles_topic(selected_value, selected_topic, show_full_text, window, size_sample):
    show_extracts = True # True, False
    df_to_evaluate = df_with_topics
    if selected_topic == "":
      selected_topic = 0
    df_to_evaluate = df_to_evaluate.loc[(df_to_evaluate[int(selected_topic)] == 1)]
    print_sample_articles_topic(df_to_evaluate, dict_anchor_words, topics, selected_value, size_sample, window, show_extracts, show_full_text)

my_interact_manual = interact_manual.options(manual_name="Plot articles on value")
my_interact_manual(plot_print_sample_articles_topic, selected_value=[*dict_anchor_words], selected_topic=widgets.Text(), size_sample =(5,20, 5), window =(5,100, 5), show_full_text = widgets.Checkbox(value=False))

interactive(children=(Dropdown(description='selected_value', options=('Environmental Sustainability', 'Safety'…

<function __main__.plot_print_sample_articles_topic(selected_value, selected_topic, show_full_text, window, size_sample)>
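
The free-text `selected_topic` field in the cell above is converted with `int(...)`, which raises a `ValueError` for non-numeric input. A small sketch of a more forgiving conversion follows; the helper `parse_topic_index` and its fallback behaviour are assumptions for illustration, not part of ValueMonitor:

```python
def parse_topic_index(text, number_of_topics, default=0):
    """Convert widget text to a valid topic index, falling back to `default`."""
    try:
        index = int(text.strip())
    except ValueError:
        # Empty or non-numeric input
        return default
    # Reject indices outside the range of topics in the model.
    if 0 <= index < number_of_topics:
        return index
    return default

print(parse_topic_index("", 5))     # 0
print(parse_topic_index(" 3 ", 5))  # 3
print(parse_topic_index("abc", 5))  # 0
print(parse_topic_index("12", 5))   # 0
```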

## 3. Values in different realms <a name="values_in_different_realms"></a>

ValueMonitor can be used to evaluate which values different societal groups tend to discuss.
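
As a rough sketch of what such a comparison computes, the example below takes a toy table with a `dataset` column and one 0/1 column per topic, which is the layout the cells in this notebook appear to assume for `df_with_topics`, and calculates the share of documents in each group that address a value. The toy rows, the group names and the meaning of column `0` are invented for illustration; ValueMonitor's own `values_in_different_groups` may compute this differently.

```python
import pandas as pd

# Toy stand-in for df_with_topics: integer column 0 flags documents on one value.
toy_df = pd.DataFrame(
    {
        "dataset": ["News", "News", "Science", "Science", "Science"],
        0: [1, 0, 1, 1, 0],
    }
)

# Percentage of documents per group in which topic 0 is present.
share_per_group = toy_df.groupby("dataset")[0].mean() * 100
print(share_per_group.round(1).to_dict())
```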

In [40]:
def plot_values_in_different_groups(selected_dataset):
    values_in_different_groups(df_with_topics, dict_anchor_words, selected_dataset)

interact(plot_values_in_different_groups, selected_dataset = df_with_topics.groupby(['dataset']).size().index.tolist())

interactive(children=(Dropdown(description='selected_dataset', options=('Environmental biology', 'Environmenta…

<function __main__.plot_values_in_different_groups(selected_dataset)>

In [27]:
def plot_print_sample_articles_topic(selected_value, selected_dataset, show_full_text, window, size_sample):
    show_extracts = True # True, False
    df_with_topics_selected_technology_dataset = df_with_topics[df_with_topics['dataset'] == selected_dataset]
    print_sample_articles_topic(df_with_topics_selected_technology_dataset, dict_anchor_words, topics, selected_value, size_sample, window, show_extracts, show_full_text)

my_interact_manual = interact_manual.options(manual_name="Plot articles on value")
my_interact_manual(plot_print_sample_articles_topic, selected_value=[*dict_anchor_words], selected_dataset = df_with_topics.groupby(['dataset']).size().index.tolist(), size_sample =(5,20, 5), window =(5,100, 5), show_full_text = widgets.Checkbox(value=False))

interactive(children=(Dropdown(description='selected_value', options=('Environmental Sustainability', 'Safety'…

<function __main__.plot_print_sample_articles_topic(selected_value, selected_dataset, show_full_text, window, size_sample)>

## 4. Values over time <a name="values_over_time"></a>

The occurrence of values can be traced over time.
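
A minimal sketch of the idea, assuming a `date` column of timestamps and a 0/1 topic column as in the cells below: documents are grouped by calendar year and the share addressing the value is computed per year. The toy data is invented, and grouping by year is a simplification of the resampling and smoothing options offered by `create_vis_values_over_time`.

```python
import pandas as pd

# Toy stand-in for df_with_topics with a date column and one topic flag.
toy_df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2019-03-01", "2019-11-15", "2020-06-30", "2020-07-01", "2021-01-10"]
        ),
        0: [0, 1, 1, 1, 0],
    }
)

# Percentage of documents per calendar year in which topic 0 is present.
yearly_share = toy_df.groupby(toy_df["date"].dt.year)[0].mean() * 100
print(yearly_share.to_dict())
```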

In [29]:
def plot_create_vis_values_over_time (selected_dataset, resampling, starttime, endtime, smoothing, max_value_y):
    values_to_include_in_visualisation = []   
    resampling_dict = {"Year": "Y", "Month": "M", "Day": "D"}
    resampling = resampling_dict[resampling]
    selected_df_with_topics = df_with_topics
    if selected_dataset != "All datasets":
      selected_df_with_topics = selected_df_with_topics[selected_df_with_topics['dataset'] == selected_dataset]
    selected_df_with_topics = selected_df_with_topics.loc[(selected_df_with_topics['date'] >= dateutil.parser.parse(str(starttime))) & (selected_df_with_topics['date'] <= dateutil.parser.parse(str(endtime)))]

    create_vis_values_over_time(selected_df_with_topics, dict_anchor_words, resampling, values_to_include_in_visualisation, smoothing, max_value_y)  

my_interact_manual = interact_manual.options(manual_name="Plot values over time")
my_interact_manual(plot_create_vis_values_over_time, selected_dataset = ["All datasets"] + df_with_topics.groupby(['dataset']).size().index.tolist(), starttime =(1960,2022, 2), endtime =(1965,2022, 2), smoothing = (0.25,3, 0.25), max_value_y = (5,100, 5), resampling = ["Year", "Month", "Day"])

interactive(children=(Dropdown(description='selected_dataset', options=('All datasets', 'News', 'Scientific li…

<function __main__.plot_create_vis_values_over_time(selected_dataset, resampling, starttime, endtime, smoothing, max_value_y)>
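
The slider values passed as `starttime` and `endtime` are bare years. When given only a year, `dateutil.parser.parse` fills the missing month and day from the current date by default, so the boundaries of the time window depend on the day the cell is run. One way to make the boundaries explicit is sketched below with the standard library only; the helper `year_window` is hypothetical and not part of ValueMonitor.

```python
from datetime import datetime

def year_window(start_year, end_year):
    """Return inclusive datetime bounds covering whole calendar years."""
    start = datetime(start_year, 1, 1)
    end = datetime(end_year, 12, 31, 23, 59, 59)
    return start, end

start, end = year_window(1960, 2022)

# Toy publication dates; only those inside the window are kept.
dates = [
    datetime(1959, 12, 31),
    datetime(1960, 1, 1),
    datetime(2022, 12, 31, 12, 0),
    datetime(2023, 1, 1),
]
kept = [d for d in dates if start <= d <= end]
print([d.date().isoformat() for d in kept])  # ['1960-01-01', '2022-12-31']
```

The resulting bounds can be compared against the `date` column in the same way as the parsed values in the cell above.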

In [30]:
def plot_words_over_time (selected_value, selected_dataset, starttime, endtime, smoothing, max_value_y, resampling):
    list_words = []
    selected_df_with_topics = df_with_topics
    if selected_dataset != "All datasets":
      selected_df_with_topics = selected_df_with_topics.loc[(selected_df_with_topics["dataset"] == selected_dataset)]
    selected_df_with_topics = selected_df_with_topics.loc[(selected_df_with_topics['date'] >= dateutil.parser.parse(str(starttime))) & (selected_df_with_topics['date'] <= dateutil.parser.parse(str(endtime)))]
    top_words = 10
    list_words = topics[selected_value][:top_words]
    print(list_words)
    resampling_dict = {"Year": "Y", "Month": "M", "Day": "D"}
    inspect_words_over_time(df_with_topics = selected_df_with_topics, selected_value = selected_value, dict_anchor_words = dict_anchor_words, topics = topics, list_words = list_words, resampling = resampling_dict[resampling], smoothing = smoothing, max_value_y = max_value_y)

my_interact_manual = interact_manual.options(manual_name="Plot words over time")
my_interact_manual(plot_words_over_time, selected_value=[*dict_anchor_words], selected_dataset=["All datasets"] + df_with_topics.groupby(['dataset']).size().index.tolist(), starttime =(1960,2022, 2), endtime =(1965,2022, 2), smoothing = (0.1,3, 0.25), max_value_y = (5,100, 5), resampling = ["Year", "Month", "Day"])

interactive(children=(Dropdown(description='selected_value', options=('Environmental Sustainability', 'Safety'…

<function __main__.plot_words_over_time(selected_value, selected_dataset, starttime, endtime, smoothing, max_value_y, resampling)>

In [31]:
topics_to_remove_int = []

def plot_top_topics_over_time(selected_value, selected_dataset, starttime, endtime, top_topics_to_show, smoothing, max_value_y, resampling):
  resampling_dict = {"Year": "Y", "Month": "M", "Day": "D"}
  resampling = resampling_dict[resampling]
  df_to_evaluate = df_with_topics
  if selected_dataset != "All datasets":
    df_to_evaluate = df_to_evaluate.loc[(df_to_evaluate["dataset"] == selected_dataset)]
  df_to_evaluate = df_to_evaluate.loc[(df_to_evaluate['date'] >= dateutil.parser.parse(str(starttime))) & (df_to_evaluate['date'] <= dateutil.parser.parse(str(endtime)))]
  top_topics_on_values_over_time(df_to_evaluate, selected_value, selected_dataset, dict_anchor_words, topics_weights, top_topics_to_show, topics_to_remove_int, smoothing, max_value_y, resampling)

my_interact_manual = interact_manual.options(manual_name="Plot related topics over time")
my_interact_manual(plot_top_topics_over_time, top_topics_to_show = (3, 25, 1), selected_value=[*dict_anchor_words], selected_dataset = ["All datasets"] + df_with_topics.groupby(['dataset']).size().index.tolist(), starttime =(1960,2022, 2), endtime =(1965,2022, 2), smoothing = (0.25,3, 0.25), max_value_y = (5,100, 5), resampling = ["Year", "Month", "Day"])

interactive(children=(Dropdown(description='selected_value', options=('Environmental Sustainability', 'Safety'…

<function __main__.plot_top_topics_over_time(selected_value, selected_dataset, starttime, endtime, top_topics_to_show, smoothing, max_value_y, resampling)>

In [32]:
def plot_print_sample_articles_topic(selected_value, selected_dataset, selected_topic, starttime, endtime, show_full_text, window, size_sample):
    show_extracts = True # True, False
    '''--------------------------------------------------------------------------''' 
    selected_dataframe = df_with_topics
    if selected_dataset != "All datasets":
      selected_dataframe = selected_dataframe.loc[(selected_dataframe["dataset"] == selected_dataset)]
    selected_dataframe = selected_dataframe.loc[(selected_dataframe['date'] >= dateutil.parser.parse(str(starttime))) & (selected_dataframe['date'] <= dateutil.parser.parse(str(endtime)))]
    if selected_topic == "":
      selected_topic = 0
    selected_dataframe = selected_dataframe[selected_dataframe[int(selected_topic)] == 1]
    print_sample_articles_topic(selected_dataframe, dict_anchor_words, topics, selected_value, size_sample, window, show_extracts, show_full_text)

my_interact_manual = interact_manual.options(manual_name="Plot articles on topic")
my_interact_manual(plot_print_sample_articles_topic, selected_value=[*dict_anchor_words], selected_dataset = ["All datasets"] + df_with_topics.groupby(['dataset']).size().index.tolist(), starttime =(1960,2022, 2), endtime =(1965,2022, 2), selected_topic=widgets.Text(), size_sample =(5,20, 5), window =(5,100, 5), show_full_text = widgets.Checkbox(value=False))


interactive(children=(Dropdown(description='selected_value', options=('Environmental Sustainability', 'Safety'…

<function __main__.plot_print_sample_articles_topic(selected_value, selected_dataset, selected_topic, starttime, endtime, show_full_text, window, size_sample)>

## 5. Gap assessment <a name="gap_assessment"></a>

It takes time to build a good topic model in which topics adequately represent values. The code in the next cell can be used to import an existing topic model.

In [33]:
def plot_values_in_different_datasets():
  selected_df = df_with_topics
  values_in_different_datasets(selected_df, dict_anchor_words)

interact(plot_values_in_different_datasets)

interactive(children=(Output(),), _dom_classes=('widget-interact',))

<function __main__.plot_values_in_different_datasets()>

In [34]:
def plot_print_sample_articles_topic(selected_value, selected_dataset, show_full_text, window, size_sample):
    show_extracts = True # True, False
    '''--------------------------------------------------------------------------''' 
    selected_dataframe = df_with_topics
    if selected_dataset != "All datasets":
      selected_dataframe = selected_dataframe.loc[(selected_dataframe["dataset"] == selected_dataset)]
    print_sample_articles_topic(selected_dataframe, dict_anchor_words, topics, selected_value, size_sample, window, show_extracts, show_full_text)

my_interact_manual = interact_manual.options(manual_name="Plot articles on value")
my_interact_manual(plot_print_sample_articles_topic, selected_value=[*dict_anchor_words], selected_dataset = ["All datasets"] + df_with_topics.groupby(['dataset']).size().index.tolist(), size_sample =(5,20, 5), window =(5,100, 5), show_full_text = widgets.Checkbox(value=False))


interactive(children=(Dropdown(description='selected_value', options=('Environmental Sustainability', 'Safety'…

<function __main__.plot_print_sample_articles_topic(selected_value, selected_dataset, show_full_text, window, size_sample)>