# Batch Create Dendrograms

This notebook allows you to perform hierarchical cluster analysis on multiple models with multiple clustering options. The output is an HTML index file which allows you to display the generated cluster analyses as dendrograms.

The last (optional) section of this notebook allows you to generate standalone HTML files for a list of already-generated dendrograms.

## Setup

In [None]:
# Python imports
import os
from pathlib import Path
from IPython.display import display, HTML

# Define paths
current_dir                    = %pwd
project_dir                    = str(Path(current_dir).parent.parent)
project_dirname                = project_dir.split('/')[-1]
current_reldir                 = current_dir.split("/write/")[1]
model_dir                      = project_dir + '/' + 'project_data/models'
partials_path                  = os.path.join(current_dir, 'partials')
scripts_path                   = 'scripts/batch_cluster.py'
config_path                    = project_dir + '/config/config.py'

# Import scripts
%run {config_path}
%run {scripts_path}

display(HTML('<p style="color: green;">Setup complete.</p>'))

## Configuration

Provide a list of all models you wish to cluster and the distance metrics and linkage methods you wish to apply to each of the models.

Set `models = []` if you wish to cluster all the models available in your project. Otherwise, provide a list of the folder names for each model you wish to cluster.

Available distance metrics are 'euclidean' and 'cosine'.

Available linkage methods are 'average', 'single', 'complete', and 'ward'.

Note that a number of advanced configuration options are available. These are detailed in the <a href="README.md" target="_blank">README</a> file. If you wish to use advanced configurations, add them directly to the `BatchCluster()` call in the **Cluster** cell.
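The sketch below shows how the three lists combine into one dendrogram per model, distance metric, and linkage method. The helper `dendrogram_names` is hypothetical and is not part of `scripts/batch_cluster.py`. The `model-metric-linkage` naming is inferred from the example names used later in this notebook (e.g. `topics50-euclidean-average`), and the script itself may skip or rename some combinations.

```python
from itertools import product

def dendrogram_names(models, distance_metrics, linkage_methods):
    """Hypothetical helper: list one name per model/metric/linkage combination."""
    return [
        f"{model}-{metric}-{linkage}"
        for model, metric, linkage in product(models, distance_metrics, linkage_methods)
    ]

names = dendrogram_names(
    ['topics25', 'topics50'],    # Model folder names
    ['euclidean', 'cosine'],     # Distance metrics
    ['average', 'ward'],         # Linkage methods
)
print(len(names))  # 8
print(names[0])    # topics25-euclidean-average
```

Counting the combinations before you run the **Cluster** cell gives a rough idea of how many dendrograms a batch is likely to produce.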

In [None]:
# Configuration
models              = [] # E.g. ['topics25', 'topics50']
distance_metrics    = [] # E.g. ['euclidean']
linkage_methods     = ['average', 'single', 'complete', 'ward']
orientation         = 'bottom' # Can be changed to 'top', 'left', or 'right'
height              = 600 # In pixels
width               = 1200 # In pixels

display(HTML('<p style="color: green;">Configuration complete.</p>'))

## Cluster

Begin the cluster analysis by running the cell below.

In [None]:
# Run the batch cluster
batch_cluster = BatchCluster(models, project_dir, model_dir, partials_path, distance_metrics, linkage_methods,
                             orientation=orientation, height=height, width=width, WRITE_DIR=WRITE_DIR, PORT=PORT)

## Create Standalone Dendrograms (Optional)

Run the cells below if you wish to create standalone versions of any of the dendrograms you have already created. They will be saved to your project's Dendrogram module folder. The dendrograms can be downloaded and will work locally, as long as you have an internet connection.

### Configuration

List the dendrograms for which you wish to create standalone versions (e.g. `topics50-euclidean-average`). By default, the standalone filenames will begin with "standalone_". You can modify this by changing the `prefix` variable below. If you do not wish to have a prefix, change it to `None`.
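As a rough illustration of how the `prefix` setting affects the output, the sketch below builds a filename from a dendrogram name. The function `standalone_filename` is hypothetical, and the `.html` extension is an assumption based on the notebook's description of standalone HTML files. The actual naming is determined by `create_standalone()` in `scripts/standalone.py`.

```python
from typing import Optional

def standalone_filename(dendrogram: str, prefix: Optional[str] = 'standalone_') -> str:
    """Hypothetical helper: prepend the prefix (if any) to the dendrogram name."""
    # A prefix of None (or an empty string) leaves the name unchanged
    return f"{prefix or ''}{dendrogram}.html"

print(standalone_filename('topics50-euclidean-average'))
# standalone_topics50-euclidean-average.html
print(standalone_filename('topics50-euclidean-average', prefix=None))
# topics50-euclidean-average.html
```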

In [None]:
# Configuration
dendrograms = [] # E.g. ['topics25-euclidean-average', 'topics50-euclidean-average']
prefix      = 'standalone_'

display(HTML('<p style="color: green;">Configuration complete.</p>'))

### Create the Dendrogram(s)

In [None]:
# Python imports
import os
from pathlib import Path

# Define paths
current_dir                    = %pwd
project_dir                    = str(Path(current_dir).parent.parent)
model_dir                      = project_dir + '/' + 'project_data/models'
partials_path                  = os.path.join(current_dir, 'partials') # Same value as in the Setup cell
scripts_path                   = 'scripts/standalone.py'

# Import scripts
%run {scripts_path}

# Generate the dendrogram(s)
create_standalone(dendrograms, partials_path, model_dir, file_prefix=prefix)