# Create Topic Bubbles Visualization

This notebook will create a Topic Bubbles visualization from Dfr-browser data generated in the `dfr_browser` module or from model data generated using the `topic_modeling` module. A full user guide is available for this notebook in the module's <a href="README.md" target="_blank">README</a> file. This notebook uses code originally written by Sihwa Park for his Topic Bubbles visualization. See <a href="https://github.com/sihwapark/topic-bubbles" target="_blank">Park's original code</a> for documentation.

## Info

__authors__    = 'Sihwa Park, Jeremy Douglass, Scott Kleinman, Lindsay Thomas'  
__copyright__ = 'copyright 2019, The WE1S Project'  
__license__   = 'GPL'  
__version__   = '2.0'  
__email__     = 'lindsaythomas@miami.edu'

## Settings

Every time you open this notebook, you must run the cell below before running anything else.

In [None]:
# Python imports
import os
import json
from IPython.display import display, HTML
from pathlib import Path

# Define paths
current_dir                = %pwd
project_dir                = str(Path(current_dir).parent.parent)
tb_scripts_dir             = current_dir + '/tb_scripts'
dfrbrowser_dir             = project_dir + '/modules/dfr_browser'
prepare_data_script        = dfrbrowser_dir + '/dfrb_scripts/bin/prepare-data'
data_dir                   = project_dir + '/project_data'
model_dir                  = data_dir + '/models'
json_dir                   = data_dir + '/json'
metadata_dir               = data_dir + '/metadata'
metadata_csv_file          = metadata_dir + '/metadata-dfrb.csv'
metadata_file_reorder      = metadata_dir + '/metadata-dfrb.csv'
browser_meta_file_temp     = metadata_dir + '/meta.temp.csv'
browser_meta_file          = metadata_dir + '/meta.csv'

# Create tb_scripts/data/config.json, which tells Topic Bubbles where to find the JSON documents
tb_config_data             = { 'json_cache_path': '../../../project_data/' }
tb_config_path             = tb_scripts_dir + '/data/config.json'
os.makedirs(os.path.dirname(tb_config_path), exist_ok=True)  # make sure the data folder exists
with open(tb_config_path, 'w') as outfile:
    json.dump(tb_config_data, outfile)

# Load required scripts
%run {project_dir}/config/config.py
%run scripts/create_topic_bubbles.py
%run scripts/create_dfrbrowser.py

# Confirm that setup finished
display(HTML('<p style="color: green;">Setup complete.</p>'))
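The setup cell writes a `config.json` file inside `tb_scripts/data` containing a single key, `json_cache_path`, which tells Topic Bubbles where to find the JSON documents. If a visualization later fails to load its data, a small check like the one below can confirm that the file exists and holds that key. This is a sketch only; `read_json_cache_path` is a hypothetical helper, not part of the module's scripts.

```python
import json
import os


def read_json_cache_path(config_path):
    """Return the json_cache_path value from a Topic Bubbles config.json file.

    Hypothetical helper for troubleshooting; not part of the topic_bubbles module.
    """
    if not os.path.isfile(config_path):
        raise FileNotFoundError(f'No config file at {config_path}')
    with open(config_path, encoding='utf-8') as f:
        config = json.load(f)
    if 'json_cache_path' not in config:
        raise KeyError(f'json_cache_path missing from {config_path}')
    return config['json_cache_path']
```

After the setup cell has run, `read_json_cache_path(tb_config_path)` should return `'../../../project_data/'`, the value the setup cell writes.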

## Create Topic Bubbles Using Dfr-Browser Metadata

If you ran the `dfr_browser` module to create Dfr-browser visualizations for your models, the next cells import the data produced by that module into the `topic_bubbles` module.

If you did not run the `dfr_browser` module, skip to the next section: **Create Topic Bubbles without Dfr-Browser Metadata**.

By default, this notebook creates Topic Bubbles visualizations for the same models you produced Dfr-browsers for in the `dfr_browser` module. That is, the `selection` variable in the next cell is set to its default value, `All` (`selection = 'All'`). To create Topic Bubbles visualizations for only some of your models, change `selection` in the next cell; otherwise, leave it set to `All`.

**To produce topic bubbles for a selection of models:** Navigate to the `modules/dfr_browser` directory and look for subdirectories named `topicsN`, where `N` is the number of topics in the model. You should see one subdirectory for each browser you produced. To choose which models to visualize, change the value of `selection` in the cell below to a list of subdirectory names. For example, to produce visualizations for only the 50- and 75-topic models, set `selection = ['topics50', 'topics75']`.

### Select Models for Which You Would Like to Create Visualizations

You must run this cell regardless of whether you change anything.

In [None]:
# Configure Model Selection
selection = 'All' # Or e.g. ['topics50', 'topics75']

display(HTML('<p style="color: green;"> Visualization selection configured.</p>'))
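The module's own functions interpret `selection` internally. Purely as an illustration, the sketch below shows one way a value of `'All'`, a single name, or a list of names could be resolved into subdirectory names, assuming that model subdirectories are named `topics` followed by the number of topics (e.g. `topics50`). The helper `resolve_selection` is hypothetical and does not describe how the module's scripts actually work.

```python
import os
import re

# Hypothetical helper: not part of the topic_bubbles module scripts.
# Assumes model subdirectories are named "topics" followed by a number.
TOPICS_DIR_PATTERN = re.compile(r'^topics\d+$')


def resolve_selection(selection, parent_dir):
    """Return the model subdirectory names to visualize.

    selection: 'All', a single name such as 'topics50', or a list of names.
    parent_dir: the directory that contains the topicsN subdirectories.
    """
    available = sorted(
        (name for name in os.listdir(parent_dir)
         if TOPICS_DIR_PATTERN.match(name)
         and os.path.isdir(os.path.join(parent_dir, name))),
        # Sort numerically so topics100 comes after topics50.
        key=lambda name: int(name[len('topics'):]),
    )
    if selection == 'All':
        return available
    if isinstance(selection, str):
        selection = [selection]
    missing = [name for name in selection if name not in available]
    if missing:
        raise ValueError(f'Model subdirectories not found: {missing}')
    return list(selection)
```

For example, `resolve_selection('All', dfrbrowser_dir)` would list every `topicsN` folder in `modules/dfr_browser`, while a mistyped name raises an error instead of being silently skipped.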

The next cell gets the names of the model subdirectories to visualize and their state files, then creates Topic Bubbles visualizations for the selected models. When it finishes, it displays links to the visualizations.

In [None]:
# Create the visualizations
subdir_list = create_topicbubbles_dfrbrowser(selection, current_dir, dfrbrowser_dir, tb_scripts_dir)

# Display links to the visualizations
display_links(project_dir, subdir_list, WRITE_DIR, PORT)

## Create Topic Bubbles without Dfr-Browser Metadata

If you have not yet created Dfr-browsers for this project, run the following cells to create your Topic Bubbles visualization.

### Create Metadata Files from JSON Files

The Topic Bubbles visualization relies on data files produced for Andrew Goldstone's dfr-browser (see the `dfr_browser` module). The cells in this section will create the Dfr-Browser files necessary to produce Topic Bubbles visualizations. 

The cell below opens each file in your project's `json` directory and extracts the metadata needed to produce the `metadata_csv_file` and `browser_meta_file_temp` files. It prints the output of Goldstone's `prepare-data` script to the notebook.

<p style="color:red;">Important: Running this code will delete old metadata files and create an entirely new <code>metadata</code> folder within the <code>project_data</code> directory.</p>

In [None]:
dfrb_metadata(metadata_dir, metadata_csv_file, browser_meta_file_temp, browser_meta_file, json_dir)
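The `dfrb_metadata` function handles metadata extraction itself. To make the idea concrete, the sketch below shows how metadata might be collected from a directory of JSON files into a CSV file with Python's standard `json` and `csv` modules. The field names (`name`, `title`, `author`, `pub`, `pub_date`) and the column layout are assumptions for illustration; they are not guaranteed to match the WE1S JSON schema or the exact columns dfr-browser expects, and `json_dir_to_csv` is a hypothetical helper.

```python
import csv
import json
import os

# Hypothetical sketch: not the module's dfrb_metadata implementation.
# The JSON field names below are assumptions; adjust them to your data.
DEFAULT_FIELDS = ('name', 'title', 'author', 'pub', 'pub_date')


def json_dir_to_csv(json_dir, csv_path, fields=DEFAULT_FIELDS):
    """Write one CSV row per .json file in json_dir; return the row count."""
    rows = []
    for filename in sorted(os.listdir(json_dir)):
        if not filename.endswith('.json'):
            continue  # skip non-JSON files
        with open(os.path.join(json_dir, filename), encoding='utf-8') as f:
            doc = json.load(f)
        # Missing fields become empty strings so every row has the same columns.
        rows.append([str(doc.get(field, '')) for field in fields])
    with open(csv_path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(fields)  # header row
        writer.writerows(rows)
    return len(rows)
```

Using the `csv` module rather than joining strings by hand means titles containing commas or quotes are escaped correctly.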

### Create Files Needed for Topic Bubbles

By default, this notebook creates Topic Bubbles visualizations for every model in your project's `project_data/models` directory. That is, the `selection` variable in the next cell is set to its default value, `All` (`selection = 'All'`). To create Topic Bubbles visualizations for only some of your models, change `selection` in the next cell; otherwise, leave it set to `All`.

**To produce topic bubbles for a selection of models:** Navigate to your project's `project_data/models` directory and look for subdirectories named `topicsN`, where `N` is the number of topics in the model. You should see one subdirectory for each model you produced. To choose which models to visualize, change the value of `selection` in the cell below to a list of subdirectory names. For example, to produce visualizations for only the 50- and 75-topic models, set `selection = ['topics50', 'topics75']`.

### Select Models for Which You Would Like to Create Visualizations

You must run this cell regardless of whether you change anything.

In [None]:
# Configure Model Selection
selection = 'All' # Or e.g. ['topics50', 'topics75']

display(HTML('<p style="color: green;"> Visualization selection configured.</p>'))

The next cell gets the names of the model subdirectories to visualize, along with their state and scaled files. Alternatively, you can set `subdir_list`, `state_file_list`, and `scaled_file_list` manually in the cell after it.

In [None]:
subdir_list, state_file_list, scaled_file_list = get_model_state(selection, model_dir)

Optionally, set values manually (this cell does not need to be run if you have run the previous cell).

In [None]:
# subdir_list = []
# state_file_list = []
# scaled_file_list = []
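Because the three lists are passed together to `create_topic_bubbles`, each position is presumably expected to refer to the same model. If you fill them in by hand, a quick check such as the following can catch mismatched lengths or mistyped paths before the visualizations are built. This is a sketch, not part of the module; it assumes `state_file_list` and `scaled_file_list` contain file paths, and the function name `check_model_lists` is hypothetical.

```python
import os


def check_model_lists(subdir_list, state_file_list, scaled_file_list):
    """Return a list of problems found; an empty list means no problems.

    Hypothetical helper; not part of the topic_bubbles module.
    """
    problems = []
    # The lists are assumed to be parallel: one entry per model, in the same order.
    if not (len(subdir_list) == len(state_file_list) == len(scaled_file_list)):
        problems.append(
            f'List lengths differ: {len(subdir_list)} subdirectories, '
            f'{len(state_file_list)} state files, {len(scaled_file_list)} scaled files'
        )
    # Every listed file should exist on disk.
    for path in list(state_file_list) + list(scaled_file_list):
        if not os.path.isfile(path):
            problems.append(f'File not found: {path}')
    return problems
```

For example, `for problem in check_model_lists(subdir_list, state_file_list, scaled_file_list): print(problem)` prints nothing when the lists look consistent.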

The next cell uses the state and scaled files of each selected model to create the files Topic Bubbles needs and moves them into place. It prints the output of Goldstone's `prepare-data` script to the notebook.

In [None]:
# Create the visualizations
create_topic_bubbles(subdir_list, state_file_list, scaled_file_list, current_dir, tb_scripts_dir, 
                     prepare_data_script, browser_meta_file)

# Display links to the visualizations
display_links(project_dir, subdir_list, WRITE_DIR, PORT)

## Create Zipped Copies of Your Visualizations (Optional)

By default, visualizations for all available models will be zipped. If you wish to zip only one model, change the `models` setting to indicate the name of the model folder (e.g. `'topics25'`). If you wish to zip more than one model, but not all, provide a list in square brackets (e.g. `['topics25', 'topics50']`).

In [None]:
# Configuration
models = 'All' # You can also select models with values like 'topics25' or ['topics25', 'topics50']

# Zip the models
%run scripts/zip.py
zip(models)
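The module's `zip.py` script handles the archiving. For readers who want to see the general approach, the sketch below zips a set of model folders with the standard library's `shutil.make_archive`, accepting the same kinds of `models` values described above. The directory layout (one folder per model under a common parent) and the function name `zip_model_folders` are assumptions for illustration, not a description of what `zip.py` does.

```python
import os
import shutil


def zip_model_folders(parent_dir, models='All', out_dir=None):
    """Zip each selected model folder in parent_dir; return the archive paths.

    Hypothetical helper; not the module's zip.py.
    models: 'All', a single folder name, or a list of folder names.
    """
    out_dir = out_dir or parent_dir
    if models == 'All':
        models = sorted(
            name for name in os.listdir(parent_dir)
            if os.path.isdir(os.path.join(parent_dir, name))
        )
    elif isinstance(models, str):
        models = [models]
    archives = []
    for model in models:
        # make_archive appends '.zip' to base_name and returns the archive path.
        # root_dir/base_dir keep paths inside the zip relative (e.g. topics25/...).
        archive = shutil.make_archive(
            base_name=os.path.join(out_dir, model),
            format='zip',
            root_dir=parent_dir,
            base_dir=model,
        )
        archives.append(archive)
    return archives
```

Giving the helper a descriptive name also avoids shadowing Python's built-in `zip` function, which the notebook's `zip(models)` call replaces for the rest of the session.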