## Text Analysis - Topic Modelling
### <span style='color: green'>SETUP </span> Prepare and Set Up Notebook <span style='float: right; color: red'>MANDATORY</span>

In [1]:
import sys, os

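# `_dh` is IPython's directory history; its last entry is the most recent working directory.
root_folder = os.path.abspath(os.path.join(globals()['_dh'][-1], "../../.."))

# Put the repository root first on the import path so the project packages resolve.
sys.path = [ root_folder ] + sys.path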

#from beakerx import *
#from beakerx.object import beakerx

from IPython.display import display
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

import westac.notebooks.political_in_newspapers.corpus_data as corpus_data
import bokeh.plotting

%matplotlib inline

bokeh.plotting.output_notebook()

## <span style='color: green;'>MODEL</span> Load Corpus and Topic Model<span style='color: red; float: right'>MANDATORY</span>

### <span style='color: green'>PREPARE</span> Load the Corpus <span style='float: right; color: red'>MANDATORY</span>


https://github.com/anmolkapoor/explore-arxiv-using-lda-gensim-topic-modelling/blob/e919dda06d738eac39461fb7cc8b0b3b3f298f1d/topic_modeling/lda_topic_model.ipynb


In [2]:
import westac.notebooks.political_in_newspapers.corpus_data as corpus_data

corpus_folder = os.path.join(root_folder, "data/textblock_politisk")

g_corpus, documents, id2token = corpus_data.load_as_gensim_sparse_corpus(corpus_folder)
# TODO: Check that g_corpus isn't transposed (Sparse2Corpus(v_dtm, documents_columns=False))
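
The transposition concern in the TODO can be checked cheaply. A minimal sketch, assuming the corpus is an iterable of bag-of-words documents (lists of `(token_id, count)` pairs); the helper name `looks_transposed` is hypothetical and not part of `corpus_data`:

```python
def looks_transposed(corpus, n_documents, n_tokens):
    """Return True if the corpus shape suggests documents and terms were swapped.

    Hypothetical helper. Assumes `corpus` is an iterable of bag-of-words
    documents, each a list of (token_id, count) pairs.
    """
    n_rows = 0
    max_id = -1
    for bow in corpus:
        n_rows += 1
        for token_id, _count in bow:
            max_id = max(max_id, token_id)
    # A correct corpus has one row per document and token ids inside the vocabulary.
    rows_ok = n_rows == n_documents
    ids_ok = max_id < n_tokens
    return not (rows_ok and ids_ok)


# Toy data: 2 documents over a 3-token vocabulary.
toy_corpus = [[(0, 2), (2, 1)], [(1, 1)]]
# The same matrix with rows and columns swapped: 3 rows, ids in 0..1.
toy_transposed = [[(0, 2)], [(1, 1)], [(0, 1)]]

print(looks_transposed(toy_corpus, n_documents=2, n_tokens=3))
print(looks_transposed(toy_transposed, n_documents=2, n_tokens=3))
```

In this notebook the call might look like `looks_transposed(g_corpus, len(documents), len(id2token))`, assuming both objects support `len`. The check cannot detect a swap when the number of documents equals the vocabulary size.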

### <span style='color: green'>PREPARE</span> Load Topic Models <span style='float: right; color: red'>MANDATORY</span>

In [3]:
import westac.notebooks.political_in_newspapers.load_topic_model_gui as load_gui
import text_analytic_tools.text_analysis.topic_model_container as topic_model_container
import importlib
_ = importlib.reload(load_gui)

current_state            = lambda: topic_model_container.TopicModelContainer.singleton()
current_data             = lambda: current_state().data
current_topic_model      = lambda: current_state().topic_model

load_gui.display_gui(corpus_folder, current_state(), documents)
#load_gui.load_model(corpus_folder, current_state(), 'test.4days')



## <span style='color: green;'>VISUALIZE</span> Display Topic's Word Distribution as a Wordcloud<span style='color: red; float: right'>TRY IT</span>

In [11]:
import westac.notebooks.political_in_newspapers.topic_wordcloud_gui as wordcloud_gui

try:
    wordcloud_gui.display_gui(current_state())
except Exception as ex:
    print(ex)


## <span style='color: green;'>VISUALIZE</span> Topic-Word Distribution<span style='color: red; float: right'>TRY IT</span>

FIXME: The number of topics specified in compute is not relevant for all models. `state.num_topics` is too high for these models, which gives an error.


In [8]:
import westac.notebooks.political_in_newspapers.topic_word_distribution_gui as topic_word_distribution_gui

try:
    topic_word_distribution_gui.display_gui(current_state())
    #topic_word_distribution_gui.display_topic_tokens(current_state(), topic_id=0, n_words=100, output_format='Chart')
except Exception as ex:
    print(ex)


## <span style='color: green;'>VISUALIZE</span> Topic Trends over Time<span style='color: red; float: right'>RUN</span>

In [17]:
import westac.notebooks.political_in_newspapers.topic_trends_gui as trends_gui
import importlib
_ = importlib.reload(trends_gui)

try:
    trends_gui.display_gui(current_state())
    # trends_gui.display_topic_trend(current_state().compiled_data.document_topic_weights, topic_id=0, year=None, year_aggregate='mean', output_format='Table')
except Exception as ex:
    print(ex)


## <span style='color: green;'>VISUALIZE</span> Publication Topic Network<span style='color: red; float: right'>TRY IT</span>
The green nodes are documents, and the blue nodes are topics. The edges (lines) indicate the strength of a topic in the connected document. The width of an edge is proportional to the strength of the connection. Note that only edges with a strength above a certain threshold are displayed.
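
The edge rule described above can be sketched in a few lines. This is an illustration only, not the code behind `publication_topic_network_gui`: the function name, the `(document_id, topic_id, weight)` input format, and the `max_width` scaling are all assumptions.

```python
def topic_network_edges(doc_topic_weights, threshold=0.1, max_width=10.0):
    """Keep document-topic edges above `threshold` and attach a drawing width.

    Hypothetical helper. `doc_topic_weights` is assumed to be an iterable of
    (document_id, topic_id, weight) triples.
    """
    kept = [(d, t, w) for d, t, w in doc_topic_weights if w > threshold]
    if not kept:
        return []
    # Width is proportional to weight; the strongest edge gets `max_width`.
    max_weight = max(w for _, _, w in kept)
    return [(d, t, w, max_width * w / max_weight) for d, t, w in kept]


weights = [("doc1", 0, 0.8), ("doc1", 1, 0.05), ("doc2", 1, 0.4)]
edges = topic_network_edges(weights, threshold=0.1)
for edge in edges:
    print(edge)
```

Here the weak `("doc1", 1, 0.05)` link falls below the threshold and is dropped, which is why sparse-looking networks can simply reflect a high threshold setting.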

In [4]:
import westac.notebooks.political_in_newspapers.publication_topic_network_gui as publication_topic_network_gui
import importlib
_ = importlib.reload(publication_topic_network_gui)

# trends_gui.display_gui(current_state())
publication_topic_network_gui.display_gui(current_state())



## <span style='color: green;'>VISUALIZE</span> Topic Trends Overview<span style='color: red; float: right'>TRY IT</span>

- The topic shares are displayed as a scattered heatmap plot, using a gradient color based on each topic's weight in the document.
- [Stanford’s Termite software](http://vis.stanford.edu/papers/termite) uses a similar visualization.
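
To make the gradient idea concrete, the sketch below maps a topic weight to a color. The white-to-blue scale and the function name `weight_to_hex` are assumptions chosen for illustration; the palette used by `overview_gui` may differ.

```python
def weight_to_hex(weight, max_weight=1.0):
    """Map a topic weight in [0, max_weight] to a white-to-blue hex color.

    Hypothetical helper; the palette is an assumption, not the notebook's.
    """
    share = min(max(weight / max_weight, 0.0), 1.0)
    # Fade the red and green channels from 255 (white) down to 0 (pure blue).
    level = round(255 * (1.0 - share))
    return "#{0:02x}{0:02x}ff".format(level)


# One heatmap cell per (topic, document) pair, colored by weight.
cells = [(0, "doc1", 0.0), (1, "doc1", 1.0), (1, "doc2", 0.3)]
colored = [(topic, doc, weight_to_hex(w)) for topic, doc, w in cells]
for cell in colored:
    print(cell)
```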

In [6]:
import westac.notebooks.political_in_newspapers.topic_trends_overview_gui as overview_gui
import importlib
_ = importlib.reload(overview_gui)

overview_gui.display_gui(current_state())



## <span style='color: green;'>VISUALIZE</span> Topic Cooccurrence<span style='color: red; float: right'>TRY IT</span>

Computes a weighted graph of topics co-occurring in the same document. Two topics are defined as co-occurring if both exist in the same document with weights above the threshold. An edge's weight is the number of co-occurrences (each document counts as a binary yes or no). Node size reflects the topic's proportion over the entire corpus (normalized by document length), and is computed in the same way as node sizes in LDAvis.
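
A minimal sketch of both computations follows. It assumes document-topic weights are available as a mapping `document_id -> {topic_id: weight}` and that node sizes are topic weights weighted by document length and normalized to sum to 1; the function names and data layout are hypothetical, not those of `co_occurrence_gui`.

```python
from collections import Counter
from itertools import combinations


def topic_cooccurrence(doc_topics, threshold=0.1):
    """Count, per topic pair, the documents where both weights exceed `threshold`."""
    counts = Counter()
    for topic_weights in doc_topics.values():
        present = sorted(t for t, w in topic_weights.items() if w > threshold)
        # Each document contributes at most 1 per pair (binary yes or no).
        counts.update(combinations(present, 2))
    return counts


def topic_proportions(doc_topics, doc_lengths):
    """Document-length-weighted topic shares over the corpus, summing to 1."""
    freq = Counter()
    for doc_id, topic_weights in doc_topics.items():
        for topic_id, weight in topic_weights.items():
            freq[topic_id] += weight * doc_lengths[doc_id]
    total = sum(freq.values())
    return {topic_id: value / total for topic_id, value in freq.items()}


# Toy data (assumed layout): three documents, three topics.
doc_topics = {
    "d1": {0: 0.6, 1: 0.3, 2: 0.1},
    "d2": {0: 0.5, 1: 0.5},
    "d3": {1: 0.2, 2: 0.8},
}
doc_lengths = {"d1": 100, "d2": 50, "d3": 50}

edges = topic_cooccurrence(doc_topics, threshold=0.15)
sizes = topic_proportions(doc_topics, doc_lengths)
print(dict(edges))
print(sizes)
```

Sorting the topic ids before pairing keeps each pair in a single canonical order, so `(0, 1)` and `(1, 0)` are never counted separately.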

In [4]:
import westac.notebooks.political_in_newspapers.topic_co_occurrence_gui as co_occurrence_gui
import importlib
_ = importlib.reload(co_occurrence_gui)
   
try:
    co_occurrence_gui.display_gui(current_state(), documents)
except Exception as ex:
    print(ex)
