## Text Analysis - Topic Modelling
### <span style='color: green'>SETUP </span> Prepare and Setup Notebook <span style='float: right; color: red'>MANDATORY</span>

In [None]:

import __paths__  # pylint: disable=unused-import
import os
from typing import Callable

import bokeh.plotting
import penelope.notebook.topic_modelling as ntm
from IPython.display import display
from penelope import utility as pu
from penelope.pipeline.config import CorpusConfig

bokeh.plotting.output_notebook()
pu.set_default_options()

current_state: Callable[[], ntm.TopicModelContainer] = ntm.TopicModelContainer.singleton
corpus_folder: str = "/data/westac/sou_kb_labb"



### <span style='color: green'>PREPARE</span> Load Topic Model <span style='float: right; color: red'>MANDATORY</span>

In [None]:
load_gui: ntm.LoadGUI = ntm.LoadGUI(corpus_folder=corpus_folder, state=current_state(), slim=True).setup()
display(load_gui.layout())

### <span style='color: green;'>BROWSE</span> Find topics by token<span style='color: red; float: right'>TRY IT</span>

Displays the topics in which a given token is among the top-ranked (dominant) words.
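
The lookup described above can be sketched in plain Python. This is a minimal illustration, not the `FindTopicDocumentsGUI` implementation: the toy data, the function name `find_topics_by_token`, and the `n_top` parameter are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical toy data: topic id -> (token, weight) pairs.
# This is NOT the penelope data model, only an illustration.
TOPIC_TOKEN_WEIGHTS: Dict[int, List[Tuple[str, float]]] = {
    0: [("school", 0.12), ("pupil", 0.09), ("teacher", 0.07), ("municipality", 0.01)],
    1: [("municipality", 0.15), ("county", 0.08), ("tax", 0.05), ("school", 0.02)],
    2: [("tax", 0.11), ("income", 0.10), ("municipality", 0.06), ("company", 0.03)],
}


def find_topics_by_token(
    topic_tokens: Dict[int, List[Tuple[str, float]]], token: str, n_top: int = 3
) -> List[int]:
    """Return ids of topics where `token` is among the `n_top` highest-weighted tokens."""
    matches: List[int] = []
    for topic_id, pairs in topic_tokens.items():
        # Rank the topic's tokens by descending weight and keep the top list.
        top = sorted(pairs, key=lambda pair: pair[1], reverse=True)[:n_top]
        if token in {name for name, _ in top}:
            matches.append(topic_id)
    return sorted(matches)


print(find_topics_by_token(TOPIC_TOKEN_WEIGHTS, "municipality"))  # prints [1, 2]
```

Raising `n_top` widens the top list, so more topics match the same token.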

In [None]:
find_gui = ntm.topic_documents_gui.FindTopicDocumentsGUI(state=current_state()).setup()
display(find_gui.layout())

### <span style='color: green;'>BROWSE</span> Browse Topic Documents<span style='color: red; float: right'>TRY IT</span>

Displays the documents in which a topic's weight is above a given threshold.
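
A minimal sketch of the threshold filter, under stated assumptions: the row layout, the function name `browse_topic_documents`, and the strict `>` comparison are hypothetical and are not taken from `BrowseTopicDocumentsGUI`.

```python
from typing import List, Optional, Tuple

# Hypothetical document-topic weights as (document, topic_id, weight) rows.
DOC_TOPIC_WEIGHTS: List[Tuple[str, int, float]] = [
    ("doc_a", 0, 0.70), ("doc_a", 1, 0.30),
    ("doc_b", 0, 0.05), ("doc_b", 2, 0.95),
    ("doc_c", 0, 0.40), ("doc_c", 1, 0.60),
]


def browse_topic_documents(
    rows: List[Tuple[str, int, float]],
    topic_id: int,
    threshold: float,
    n_top: Optional[int] = None,
) -> List[Tuple[str, float]]:
    """Return (document, weight) pairs for `topic_id` with weight above `threshold`."""
    # Assumption: "above" is read as strictly greater than the threshold.
    hits = [(doc, weight) for doc, topic, weight in rows
            if topic == topic_id and weight > threshold]
    # Most strongly associated documents first.
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits if n_top is None else hits[:n_top]


print(browse_topic_documents(DOC_TOPIC_WEIGHTS, topic_id=0, threshold=0.1))
```

With a threshold of 0.1, `doc_b` is dropped for topic 0 because its weight is only 0.05.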

In [None]:
browse_gui = ntm.topic_documents_gui.BrowseTopicDocumentsGUI(state=current_state()).setup()
display(browse_gui.layout())

### <span style='color: green;'>VISUALIZE</span> Display Topic's Word Distribution as a Wordcloud<span style='color: red; float: right'> TRY IT</span>

In [None]:
ntm.display_topic_wordcloud_gui(current_state())

### <span style='color: green;'>VISUALIZE</span> Topic-Word Distribution<span style='color: red; float: right'>TRY IT</span>


In [None]:
ntm.display_topic_word_distribution_gui(current_state())

### <span style='color: green;'>VISUALIZE</span> Topic Trends over Time<span style='color: red; float: right'>RUN</span>

In [None]:
ntm.display_topic_trends_gui(current_state())

### <span style='color: green;'>VISUALIZE</span> Topic Trends Overview<span style='color: red; float: right'>TRY IT</span>

- The topic shares are displayed as a scattered heatmap plot, colored on a gradient according to each topic's weight in the document.
- [Stanford’s Termite software](http://vis.stanford.edu/papers/termite) uses a similar visualization.
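
The idea of mapping topic weights to a color gradient can be sketched as follows. The sketch assumes, for illustration only, that weights are averaged per year and topic before being colored; the aggregation the notebook actually uses may differ. The data and the names `topic_share_matrix` and `weight_to_color` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical rows: (year, topic_id, weight of the topic in one document).
ROWS: List[Tuple[int, int, float]] = [
    (1970, 0, 0.6), (1970, 1, 0.4),
    (1970, 0, 0.2), (1970, 1, 0.8),
    (1971, 0, 0.1), (1971, 1, 0.9),
]


def topic_share_matrix(rows: List[Tuple[int, int, float]]) -> Dict[Tuple[int, int], float]:
    """Average the weights per (year, topic) cell (an assumed aggregation)."""
    sums: Dict[Tuple[int, int], float] = defaultdict(float)
    counts: Dict[Tuple[int, int], int] = defaultdict(int)
    for year, topic, weight in rows:
        sums[(year, topic)] += weight
        counts[(year, topic)] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}


def weight_to_color(weight: float) -> str:
    """Map a weight in [0, 1] onto a white-to-red gradient as a hex color."""
    clipped = min(max(weight, 0.0), 1.0)
    level = round(255 * (1.0 - clipped))  # green/blue channels fade as weight grows
    return f"#ff{level:02x}{level:02x}"


for (year, topic), share in sorted(topic_share_matrix(ROWS).items()):
    print(year, topic, round(share, 2), weight_to_color(share))
```

Each `(year, topic)` cell becomes one marker in the plot, and heavier topics get a more saturated color.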

In [None]:
ntm.display_topic_trends_overview_gui(current_state())

### <span style='color: green;'>VISUALIZE</span> Topic-Topic Network<span style='color: red; float: right'>TRY IT</span>

Computes a weighted graph of topics that co-occur in the same document. Two topics co-occur in a document if both have a weight above the given threshold. Each document counts at most once per topic pair (a binary yes or no), so an edge's weight is the number of documents in which the pair co-occurs. Node size reflects each topic's proportion of the entire corpus, computed in the same way as the LDAvis topic proportions.
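
The co-occurrence rule above can be sketched in plain Python. This is an illustration under stated assumptions, not the code behind `display_topic_topic_network_gui`: the data, the function names, and the document lengths are hypothetical. The node-size function assumes the LDAvis-style proportion, that is, document-topic weights multiplied by document length, summed per topic and normalized.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple

# Hypothetical (document, topic_id, weight) rows and document lengths in tokens.
DOC_TOPIC_WEIGHTS: List[Tuple[str, int, float]] = [
    ("doc_a", 0, 0.50), ("doc_a", 1, 0.40), ("doc_a", 2, 0.10),
    ("doc_b", 0, 0.45), ("doc_b", 1, 0.45), ("doc_b", 2, 0.10),
    ("doc_c", 1, 0.30), ("doc_c", 2, 0.70),
]
DOC_LENGTHS: Dict[str, int] = {"doc_a": 100, "doc_b": 200, "doc_c": 100}


def topic_cooccurrence_edges(
    rows: List[Tuple[str, int, float]], threshold: float
) -> Dict[Tuple[int, int], int]:
    """Count, per topic pair, the documents where both weights exceed `threshold`."""
    topics_by_doc: Dict[str, Set[int]] = defaultdict(set)
    for doc, topic, weight in rows:
        if weight > threshold:
            topics_by_doc[doc].add(topic)
    edges: Counter = Counter()
    for topics in topics_by_doc.values():
        # Each document contributes at most 1 to a given pair (binary yes/no).
        for pair in combinations(sorted(topics), 2):
            edges[pair] += 1
    return dict(edges)


def topic_proportions(
    rows: List[Tuple[str, int, float]], doc_lengths: Dict[str, int]
) -> Dict[int, float]:
    """Length-weighted topic frequencies, normalized to sum to 1 (assumed LDAvis-style)."""
    frequency: Dict[int, float] = defaultdict(float)
    for doc, topic, weight in rows:
        frequency[topic] += weight * doc_lengths[doc]
    total = sum(frequency.values())
    return {topic: value / total for topic, value in frequency.items()}


print(topic_cooccurrence_edges(DOC_TOPIC_WEIGHTS, threshold=0.2))  # prints {(0, 1): 2, (1, 2): 1}
print(topic_proportions(DOC_TOPIC_WEIGHTS, DOC_LENGTHS))
```

Raising the threshold removes weak topic assignments first, which thins out the graph; lowering it makes the network denser.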

In [None]:
ntm.display_topic_topic_network_gui(current_state())

### <span style='color: green;'>VISUALIZE</span> Document Topic Network<span style='color: red; float: right'>TRY IT</span>


In [None]:
doc_topic_gui = ntm.DefaultTopicDocumentNetworkGui(pivot_key_specs={}, state=current_state()).setup()
display(doc_topic_gui.layout())

### <span style='color: green;'>VISUALIZE</span> Focus-Topic Document Network<span style='color: red; float: right'>TRY IT</span>


In [None]:
display(ntm.FocusTopicDocumentNetworkGui(pivot_key_specs={}, state=current_state()).setup().layout())

### <span style='color: green;'>VISUALIZE</span> Topic-Token Network<span style='color: red; float: right'>TRY IT</span>

In [None]:
corpus_folder: str = "/data/westac/sou_kb_labb"
custom_styles = {'edges': {'curve-style': 'haystack'}}
w = ntm.create_topics_token_network_gui(data_folder=corpus_folder, custom_styles=custom_styles)
display(w.layout())