# Vectorization examples

The goal of this notebook is to use the tools in the Tutte Institute [``vectorizers``](https://github.com/TutteInstitute/vectorizers) and [``thisnotthat``](https://github.com/TutteInstitute/thisnotthat) libraries to construct embeddings and explore the results interactively.

### Setup

To create the environment:

* mamba env create -f vectorization-simple.yml
* conda activate vectorization-simple
* python -m ipykernel install --user --name=vectorization-simple

or

* conda create -n vectorization-simple numba datashader jupyter ipykernel
* conda activate vectorization-simple
* pip install thisnotthat seaborn
* python -m ipykernel install --user --name=vectorization-simple

In [60]:
import thisnotthat as tnt
import panel as pn

import numpy as np
import pandas as pd
import matplotlib
import seaborn as sns
import csv

In [61]:
import scipy.sparse
import vectorizers
import vectorizers.transformers
import umap
from scipy.sparse import vstack
import warnings
      
warnings.simplefilter("ignore")
sns.set()

In [62]:
from IPython.display import display, HTML 
display(HTML("<style>.container { width:100% !important; }</style>"))

In [4]:
pn.extension()

# Data

To use this notebook as is, your data must be a list of lists of tokens, and you need to decide whether the order of the tokens is informative.

## What are my tokens?

This may or may not be obvious from the application. **Reach out if you have questions**.
* You wish to vectorize text? Your tokens are words. A text is an ordered list of tokens.
* Vectorize recipes? Tokens are ingredients. A recipe is an unordered list of tokens.
* Vectorize strings? Tokens are characters. 
* Vectorize sessions? Tokens might be commands. 
* Vectorize config files? We have to think. What do you want to capture from the config files? 
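
As a rough illustration of the expected input shape, the sketch below builds a list of lists of tokens for three of the cases above. All data and variable names here are invented for the example.

```python
# Toy data, invented for illustration only.
texts = ["The cat sat on the mat", "Dogs bark"]
toy_recipes = [{"salt", "flour", "water"}, {"eggs", "milk"}]
strings = ["salt", "soba"]

# Text: tokens are words, kept in reading order (order is informative).
text_tokens = [doc.lower().split() for doc in texts]

# Recipes: tokens are ingredients. Order carries no meaning, so we sort
# only to get a reproducible list.
recipe_tokens = [sorted(recipe) for recipe in toy_recipes]

# Strings: tokens are characters, kept in order.
string_tokens = [list(s) for s in strings]

print(text_tokens[0])
print(recipe_tokens[0])
print(string_tokens[0])
```

In all three cases the result is the same kind of object, a list of lists of tokens. Only the meaning of the order differs.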

### Recipe: data preparation

We will make use of the recipe dataset. It consists of 39,774 recipes (hyperedges) that are sets of vertices (6,714 ingredients total). The largest recipe has 65 ingredients (must be good!). Each recipe is assigned to a country (edge label), 20 countries total. The data and some work done with it can be found here:

* https://arxiv.org/pdf/1910.09943.pdf
* https://www.cs.cornell.edu/~arb/data/cat-edge-Cooking/

The function below:
* reads the data
* keeps only the recipes containing at least 3 ingredients (after this pruning we are left with 6,714 ingredients and 39,559 recipes)
* chooses a country color mapping that respects geographic proximity: nearby countries, or countries on the same continent, are assigned similar colors. This helps when evaluating the visualization by eye and makes it more pleasant.
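
The pruning step has to drop the same rows from the recipes and from their labels. A minimal sketch of that idea on toy data (all names and values are invented for the example):

```python
# Toy recipes and labels, invented for illustration.
toy_recipes = [
    ["salt", "water"],
    ["eggs", "milk", "flour"],
    ["rice", "soy sauce", "ginger", "garlic"],
]
toy_labels = ["greek", "french", "chinese"]
recipe_min_size = 3

# Indices of the recipes that are large enough.
keep = [i for i, recipe in enumerate(toy_recipes) if len(recipe) >= recipe_min_size]

# Apply the same indices to recipes and labels so they stay aligned.
kept_recipes = [toy_recipes[i] for i in keep]
kept_labels = [toy_labels[i] for i in keep]

print(keep)
print(kept_labels)
```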

In [5]:
data_folder = '../data/cat-edge-Cooking/'

In [6]:
def read_format_recipes(recipe_min_size=3, data_folder=data_folder):
    ingredients_id = pd.read_csv(f'{data_folder}node-labels.txt', sep='\t', header=None)
    ingredients_id.index = [x+1 for x in ingredients_id.index]
    ingredients_id.columns = ['Ingredient']
    
    recipes_with_id = []
    with open(f'{data_folder}hyperedges.txt', newline = '') as hyperedges:
        hyperedge_reader = csv.reader(hyperedges, delimiter='\t')
        for hyperedge in hyperedge_reader:
            recipes_with_id.append(hyperedge)
            
    recipes_all = [[ingredients_id.loc[int(i)]['Ingredient'] for i in x] for x in recipes_with_id]
    
    # Keep recipes with at least recipe_min_size ingredients (3 by default)
    keep_recipes = np.where(np.array([len(x) for x in recipes_all])>=recipe_min_size)[0]
    recipes = [recipes_all[i] for i in keep_recipes]
    
    recipes_label_id_all = pd.read_csv(f'{data_folder}hyperedge-labels.txt', sep='\t', header=None)
    recipes_label_id_all.columns = ['label']
    recipes_label_id = recipes_label_id_all.iloc[keep_recipes].reset_index()

    label_name = pd.read_csv(f'{data_folder}hyperedge-label-identities.txt', sep='\t', header=None)
    label_name.columns = ['country']
    label_name.index = [x+1 for x in label_name.index]
    
    grps_tmp = {
        'asian' : ('chinese', 'filipino', 'japanese','korean', 'thai', 'vietnamese'),
        'american' : ('brazilian', 'mexican', 'southern_us'),
        'english' : ('british', 'irish'),
        'islands' : ('cajun_creole', 'jamaican'),
        'europe' : ('french', 'italian', 'spanish'),
        'others' : ('greek', 'indian', 'moroccan', 'russian')
    }

    grps = {key:[key+'.'+x for x in value] for key, value in grps_tmp.items()}


    color_key = {}
    for l, c in zip(grps['asian'], sns.color_palette("Blues", 6)[0:]):
        color_key[l] = matplotlib.colors.rgb2hex(c)
    for l, c in zip(grps['american'], sns.color_palette("Purples", 4)[1:]):
        color_key[l] = matplotlib.colors.rgb2hex(c)
    for l, c in zip(grps['others'], sns.color_palette("YlOrRd", 4)):
        color_key[l] = matplotlib.colors.rgb2hex(c)
    for l, c in zip(grps['europe'], sns.color_palette("light:teal", 4)[1:]):
        color_key[l] = matplotlib.colors.rgb2hex(c)
    for l, c in zip(grps['islands'], sns.color_palette("light:#660033", 4)[1:3]):
        color_key[l] = matplotlib.colors.rgb2hex(c)
    for l, c in zip(grps['english'], sns.color_palette("YlGn", 4)[1:]):
        color_key[l] = matplotlib.colors.rgb2hex(c)
    color_key["ingredient"] = "#777777bb"
    
    new_names = []
    for key, value in grps.items():
        new_names = new_names + value

    label_name['new_label'] = [new_name for x in label_name.country for new_name in new_names if x in new_name]
    
    return recipes, recipes_label_id, ingredients_id, label_name, color_key

In [7]:
# execfile('./00-recipes-setup.py')
recipes, recipes_label_id, ingredients_id, label_name, color_key = read_format_recipes()
recipes_label = [label_name.loc[i]['new_label'] for i in recipes_label_id.label]
recipes_country = [label_name.loc[i]['country'] for i in recipes_label_id.label] 

## String data

To make up an example, we take the largest recipe, which has 65 ingredients, and vectorize its 65 ingredient names as strings.

In [24]:
w = np.where([len(x)==65 for x in recipes])[0][0]
char_strings = recipes[w].copy()

In [23]:
char_strings = ['fettucine',
 'fresh marjoram',
 'minced garlic',
 'olive oil',
 'garlic powder',
 'large eggs',
 'Alfredo sauce',
 'vegetable oil',
 'cajun seasoning',
 'shredded romano cheese',
 'basil dried leaves',
 'salt',
 'cayenne pepper',
 'scallions',
 'red bell pepper',
 'boneless skinless chicken breast halves',
 'soba',
 'pasta sauce',
 'kosher salt',
 'milk',
 'fresh ginger',
 'ground black pepper',
 'flour',
 'cooked chicken',
 'coarse salt',
 'lemon',
 'diced tomatoes',
 'garlic',
 'rice vinegar',
 'Neufchâtel',
 'garlic cloves',
 'dried parsley',
 'frozen artichoke hearts',
 'penne',
 'pepper',
 'sweet onion',
 'part-skim mozzarella cheese',
 'parmigiano reggiano cheese',
 'basil leaves',
 'onion powder',
 'red wine vinegar',
 'red pepper flakes',
 'orzo',
 'crushed red pepper',
 'all-purpose flour',
 'freshly ground pepper',
 'sliced mushrooms',
 'panko breadcrumbs',
 'plum tomatoes',
 'fresh basil',
 'fresh leav spinach',
 'water',
 'sun-dried tomatoes',
 'ground pepper',
 'grated parmesan cheese',
 'boneless skinless chicken breasts',
 'chicken cutlets',
 'butter',
 'multi-grain penne pasta',
 'extra-virgin olive oil',
 'cilantro leaves',
 'green pepper',
 'shredded mozzarella cheese',
 'fresh parsley',
 'spaghetti']

And here, we tokenize the strings into their characters.

In [30]:
char_strings_tokens = [[*x] for x in char_strings]
char_strings_tokens[0:5]

[['f', 'e', 't', 't', 'u', 'c', 'i', 'n', 'e'],
 ['f', 'r', 'e', 's', 'h', ' ', 'm', 'a', 'r', 'j', 'o', 'r', 'a', 'm'],
 ['m', 'i', 'n', 'c', 'e', 'd', ' ', 'g', 'a', 'r', 'l', 'i', 'c'],
 ['o', 'l', 'i', 'v', 'e', ' ', 'o', 'i', 'l'],
 ['g', 'a', 'r', 'l', 'i', 'c', ' ', 'p', 'o', 'w', 'd', 'e', 'r']]

## Text data

In [None]:
from sklearn.datasets import fetch_20newsgroups
# Start from small example
categories = ['alt.atheism', 'talk.religion.misc', 'comp.graphics', 'sci.space']
newsgroups_train = fetch_20newsgroups(subset='train', categories=categories)

### Tokenize text first

In [39]:
import sklearn.feature_extraction.text
import sklearn.preprocessing

In [40]:
%%time
cv = sklearn.feature_extraction.text.CountVectorizer(lowercase=True)
sk_word_tokenize = cv.build_tokenizer()
sk_preprocesser = cv.build_preprocessor()
tokenize = lambda doc: sk_word_tokenize(sk_preprocesser(doc))
tokenized_news = [tokenize(doc) for doc in newsgroups_train.data]

CPU times: total: 547 ms
Wall time: 675 ms
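
For readers without scikit-learn at hand, the tokenizer built above can be approximated with the standard library. The sketch below assumes the `CountVectorizer` defaults (lowercasing, and the default token pattern of two or more word characters). The function name `simple_tokenize` is made up for the example.

```python
import re

# Default token pattern of scikit-learn's CountVectorizer:
# runs of two or more word characters.
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def simple_tokenize(doc):
    """Lowercase the document, then extract the tokens."""
    return TOKEN_PATTERN.findall(doc.lower())


example_tokens = simple_tokenize("Hi, I've noticed a 3DS file")
print(example_tokens)
```

Single-character tokens such as "I" and "a" are dropped, which is why the tokenized output contains `'ve'` but not `'i'`.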


In [42]:
tokenized_news[0]

['from',
 'rych',
 'festival',
 'ed',
 'ac',
 'uk',
 'hawkes',
 'subject',
 '3ds',
 'where',
 'did',
 'all',
 'the',
 'texture',
 'rules',
 'go',
 'lines',
 '21',
 'hi',
 've',
 'noticed',
 'that',
 'if',
 'you',
 'only',
 'save',
 'model',
 'with',
 'all',
 'your',
 'mapping',
 'planes',
 'positioned',
 'carefully',
 'to',
 '3ds',
 'file',
 'that',
 'when',
 'you',
 'reload',
 'it',
 'after',
 'restarting',
 '3ds',
 'they',
 'are',
 'given',
 'default',
 'position',
 'and',
 'orientation',
 'but',
 'if',
 'you',
 'save',
 'to',
 'prj',
 'file',
 'their',
 'positions',
 'orientation',
 'are',
 'preserved',
 'does',
 'anyone',
 'know',
 'why',
 'this',
 'information',
 'is',
 'not',
 'stored',
 'in',
 'the',
 '3ds',
 'file',
 'nothing',
 'is',
 'explicitly',
 'said',
 'in',
 'the',
 'manual',
 'about',
 'saving',
 'texture',
 'rules',
 'in',
 'the',
 'prj',
 'file',
 'like',
 'to',
 'be',
 'able',
 'to',
 'read',
 'the',
 'texture',
 'rule',
 'information',
 'does',
 'anyone',
 'have',


# Should I first vectorize my tokens or not?

This is very application dependent. If you want to vectorize strings to find near duplicates, you probably do not want to vectorize the letters of the alphabet, since all letters should be equidistant. However, if you want to vectorize text (where words can have similar meanings), recipes, or commands in a session, consider vectorizing your tokens so that tokens that "mean" or "do" similar things end up close together.

Once you decide that you DO want to vectorize your tokens, you need to determine whether order matters in your token lists. In text and in the commands of a session, order matters. In a recipe given as a list of ingredients, it does not. The strategy for counting co-occurrences differs between the two cases.

### Vectorize tokens when order does not matter

We first build a vector representation of ingredients based on co-occurrences of ingredients within the same recipe. A small hack: we build our own co-occurrence counts, because the co-occurrence vectorizer in the ``vectorizers`` library is designed for ordered sequences and relies on notions such as "appears before" and "appears after" that we want to avoid here.

In [8]:
def vertexCooccurrenceVectorizer(hyperedges):
    vertexCooccurrence_vectorizer = vectorizers.TokenCooccurrenceVectorizer().fit(hyperedges)
    
    incidence_vectorizer = vectorizers.NgramVectorizer(
        token_dictionary=vertexCooccurrence_vectorizer.token_label_dictionary_
    ).fit(hyperedges)

    H = incidence_vectorizer.transform(hyperedges)
    
    M_cooccurrence = (H.T@H)
    M_cooccurrence.setdiag(0)
    M_cooccurrence.eliminate_zeros()
    
    vertexCooccurrence_vectorizer.cooccurrences_ = M_cooccurrence
    
    return vertexCooccurrence_vectorizer
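
To see what the matrix product in `vertexCooccurrenceVectorizer` counts, here is a minimal pure-Python sketch on toy data (the recipes and the name `pair_counts` are invented for the example). Assuming each ingredient appears at most once per recipe, so that `H` is a 0/1 incidence matrix, the off-diagonal entry of `H.T @ H` for two ingredients is the number of recipes containing both. The loop below counts that number directly.

```python
from collections import Counter
from itertools import combinations

# Toy recipes, invented for illustration.
toy_recipes = [
    ["salt", "flour", "water"],
    ["salt", "water"],
    ["flour", "eggs"],
]

pair_counts = Counter()
for recipe in toy_recipes:
    # Use a sorted set so each unordered pair is counted once per recipe
    # and always appears under the same key.
    for a, b in combinations(sorted(set(recipe)), 2):
        pair_counts[(a, b)] += 1

print(pair_counts[("salt", "water")])
```

There is no notion of "before" or "after" here: a pair is counted whenever both ingredients share a recipe, wherever they appear in the list.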

In [9]:
%%time

ingredient_vectorizer = vertexCooccurrenceVectorizer(recipes)
ingredient_vectors = ingredient_vectorizer.reduce_dimension(dimension=60, algorithm="randomized")
n_ingredients = len(ingredient_vectorizer.token_index_dictionary_)
ingredients = [ingredient_vectorizer.token_index_dictionary_[i] for i in range(n_ingredients)]

CPU times: total: 43 s
Wall time: 1min 1s


### Vectorize tokens when the order of tokens matters!

If order DOES matter, we can use the co-occurrence vectorizer straight from the ``vectorizers`` library.

For a complete treatment of this, see:
* https://github.com/hackalog/vectorizers_playground/blob/main/notebooks/03-document-embeddings-with-vectorizers.ipynb

To do this, use the ``TokenCooccurrenceVectorizer`` class from the ``vectorizers`` library.

In [41]:
%%time
word_vectorizer = vectorizers.TokenCooccurrenceVectorizer(
    min_document_occurrences=5,
    window_radii=20,          
    window_functions='variable',
    kernel_functions='geometric',            
    n_iter = 0,
    normalize_windows=True,
).fit(tokenized_news)
word_vectors = word_vectorizer.reduce_dimension(dimension=160, algorithm="randomized")

CPU times: total: 1min 24s
Wall time: 1min 45s
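
For intuition only, the sketch below counts ordered co-occurrences with a fixed window and flat weights. It is a deliberately simplified stand-in: it does not reproduce the variable windows or the geometric kernel configured above. The function name and toy documents are invented for the example.

```python
from collections import Counter


def window_cooccurrences(docs, radius=2):
    """Count (token, later_token) pairs at most `radius` positions apart."""
    counts = Counter()
    for doc in docs:
        for i, token in enumerate(doc):
            # Only look forward, so (a, b) means "b appears after a".
            for context in doc[i + 1 : i + 1 + radius]:
                counts[(token, context)] += 1
    return counts


toy_docs = [["save", "the", "file"], ["the", "file", "format"]]
after_counts = window_cooccurrences(toy_docs, radius=2)

print(after_counts[("the", "file")])
print(after_counts[("file", "the")])
```

Unlike the unordered recipe case, the counts are not symmetric: "file" follows "the" twice, but "the" never follows "file".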


## Plot tokens (ingredients) and explore with This Not That

In [43]:
ingredient_mapper = umap.UMAP(metric="cosine", random_state=42).fit(ingredient_vectors)

In [44]:
ingredient_label_layers = tnt.JointVectorLabelLayers(
    ingredient_vectors,            # high-dimensional vectors of the plotted points (ingredients)
    ingredient_mapper.embedding_,  # 2-d map of the plotted points
    ingredient_vectors,            # high-dimensional vectors of the labels (ingredients again)
    ingredients,                   # label names (ingredient names)
    cluster_map_representation=True,
    min_clusters_in_layer=5,
    random_state=0,
)

In [46]:
ingredient_plot = tnt.BokehPlotPane(
    ingredient_mapper.embedding_,
    hover_text=ingredients,
    marker_size=0.03,
    width=700,
    height=600,
    show_legend=False,
    min_point_size=0.001,
    max_point_size=0.05,
    tools="pan,wheel_zoom,tap,lasso_select,box_zoom,save,reset",
    title="What is cooking? Ingredient Map",
)

In [47]:
pn.Row(ingredient_plot)

# Build list vectors (recipe, document, ...) from token vectors

Once we have token vectors, we can treat each list of tokens as a distribution over the token vector space. Instead of a flat distribution over the tokens contained in a list, we use a distribution weighted by the information gain of the tokens.

In [58]:
%%time
list_vectorizer = vectorizers.NgramVectorizer(
    token_dictionary=ingredient_vectorizer.token_label_dictionary_
).fit(recipes)

incidence_matrix = list_vectorizer.transform(recipes)

CPU times: total: 7.19 s
Wall time: 9.51 s


In [59]:
%%time
info_weighted_incidence = vectorizers.transformers.InformationWeightTransformer(
    prior_strength=1e-1,
    approx_prior=False,
).fit_transform(incidence_matrix)

CPU times: total: 1.77 s
Wall time: 1.95 s
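
The exact computation inside `InformationWeightTransformer` is not reproduced here. As a rough intuition for why reweighting helps, the sketch below uses a simpler, well-known stand-in, inverse document frequency, on toy data. The normalization step and all names are assumptions made for the example.

```python
import math
from collections import Counter

# Toy recipes, invented for illustration.
toy_recipes = [
    ["salt", "flour", "water"],
    ["salt", "saffron"],
    ["salt", "water"],
]

n_lists = len(toy_recipes)

# Number of recipes in which each ingredient appears.
doc_freq = Counter(token for recipe in toy_recipes for token in set(recipe))

# Stand-in weight: ingredients found everywhere get weight 0,
# rare ingredients get a large weight.
idf = {token: math.log(n_lists / df) for token, df in doc_freq.items()}

# Reweight each recipe, then normalize so it reads as a distribution.
weighted = []
for recipe in toy_recipes:
    raw = {token: idf[token] for token in recipe}
    total = sum(raw.values())
    weighted.append({t: w / total for t, w in raw.items()} if total > 0 else raw)

print(weighted[1])
```

In the second toy recipe, all the mass moves to "saffron": "salt" appears in every recipe and so says nothing about which recipe we are looking at.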


# Build list vectors when you have no token vectors

This is application dependent.

In the case of strings, we did not vectorize the tokens, i.e., the letters. In this case we can use the n-gram vectorizer directly on the letters.

In [57]:
%%time
string_vectorizer = vectorizers.NgramVectorizer(
    ngram_size = 3,
    ngram_behaviour = "subgrams", # Keep all n-grams of smaller sizes
).fit(char_strings_tokens)

char_strings_count_matrix = string_vectorizer.transform(char_strings_tokens)

CPU times: total: 46.9 ms
Wall time: 96.9 ms
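
To make the "subgrams" idea concrete, here is a pure-Python sketch that counts every contiguous n-gram of size 1 up to 3 in a list of characters. It illustrates the idea only and is not the library's implementation. The function name `subgram_counts` is made up for the example.

```python
from collections import Counter


def subgram_counts(tokens, max_size=3):
    """Count every contiguous n-gram of size 1 up to max_size."""
    counts = Counter()
    for size in range(1, max_size + 1):
        for start in range(len(tokens) - size + 1):
            counts[tuple(tokens[start : start + size])] += 1
    return counts


salt_counts = subgram_counts(list("salt"))
for ngram, count in sorted(salt_counts.items()):
    print(ngram, count)
```

For "salt" this gives 4 unigrams, 3 bigrams and 2 trigrams. Two strings that share many of these n-grams get similar count vectors, which is what makes this representation useful for spotting near duplicates.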


In [56]:
string_vectorizer.column_index_dictionary_

{0: (' ',),
 1: (' ', 'a'),
 2: (' ', 'a', 'r'),
 3: (' ', 'b'),
 4: (' ', 'b', 'a'),
 5: (' ', 'b', 'e'),
 6: (' ', 'b', 'l'),
 7: (' ', 'b', 'r'),
 8: (' ', 'c'),
 9: (' ', 'c', 'h'),
 10: (' ', 'c', 'l'),
 11: (' ', 'c', 'u'),
 12: (' ', 'd'),
 13: (' ', 'd', 'r'),
 14: (' ', 'e'),
 15: (' ', 'e', 'g'),
 16: (' ', 'f'),
 17: (' ', 'f', 'l'),
 18: (' ', 'g'),
 19: (' ', 'g', 'a'),
 20: (' ', 'g', 'i'),
 21: (' ', 'g', 'r'),
 22: (' ', 'h'),
 23: (' ', 'h', 'a'),
 24: (' ', 'h', 'e'),
 25: (' ', 'l'),
 26: (' ', 'l', 'e'),
 27: (' ', 'm'),
 28: (' ', 'm', 'a'),
 29: (' ', 'm', 'o'),
 30: (' ', 'm', 'u'),
 31: (' ', 'o'),
 32: (' ', 'o', 'i'),
 33: (' ', 'o', 'l'),
 34: (' ', 'o', 'n'),
 35: (' ', 'p'),
 36: (' ', 'p', 'a'),
 37: (' ', 'p', 'e'),
 38: (' ', 'p', 'o'),
 39: (' ', 'r'),
 40: (' ', 'r', 'e'),
 41: (' ', 'r', 'o'),
 42: (' ', 's'),
 43: (' ', 's', 'a'),
 44: (' ', 's', 'e'),
 45: (' ', 's', 'k'),
 46: (' ', 's', 'p'),
 47: (' ', 't'),
 48: (' ', 't', 'o'),
 49: (' ', 'v'),

# NOTE: the rest of this notebook only focuses on exploring recipes!

The rest focuses on recipes, but the same ideas can be applied to other datasets. The notion of "joint embedding" only makes sense if you vectorized your tokens. If you did not vectorize your tokens, then the lists of tokens are the only objects that get vectorized and there is no joint space.

## Joint embedding of tokens and lists

We now have a vector representation for each token, and we treat our original lists as distributions over that space. We can also treat the tokens themselves as distributions: a token is a list containing only itself (a distribution with all its mass on itself).

Now, both tokens and lists are seen as distributions over a vector space. We can now use the Approximate Wasserstein vectorizer. This vectorizer transforms finite distributions over a metric space into vectors in a linear space such that Euclidean or cosine distance approximates the Wasserstein distance between the distributions.

This is done by representing the token distributions with the identity matrix on the tokens and stacking the weighted incidence matrix on top of it. We then pass this matrix of both list (hyperedge) and token (vertex) distributions to the vectorizer, along with the token vectors.

In [16]:
info_doc_with_identity = vstack([info_weighted_incidence, scipy.sparse.identity(n_ingredients)])
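
The stacking step can be pictured on a tiny dense example. The sketch below uses plain Python lists instead of sparse matrices, with invented values, and also shows how the row order lets us split the result back into lists and tokens afterwards.

```python
# Toy weighted incidence: 2 lists over 3 tokens (values invented).
weighted_incidence = [
    [0.5, 0.5, 0.0],
    [0.0, 0.2, 0.8],
]
n_lists = len(weighted_incidence)
n_tokens = len(weighted_incidence[0])

# Each token as a distribution with all its mass on itself.
identity = [
    [1.0 if i == j else 0.0 for j in range(n_tokens)] for i in range(n_tokens)
]

# Lists first, tokens second: the row order fixes how to split later.
stacked = weighted_incidence + identity

# Boolean mask marking which rows are lists.
is_list_row = [True] * n_lists + [False] * n_tokens
list_rows = [row for row, flag in zip(stacked, is_list_row) if flag]
token_rows = [row for row, flag in zip(stacked, is_list_row) if not flag]

print(len(stacked), len(list_rows), len(token_rows))
```

Because every row is a distribution over the same tokens, lists and tokens can be fed to the same vectorizer and end up in the same space.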

In [17]:
%%time
joint_vectors_unsupervised = vectorizers.ApproximateWassersteinVectorizer(
    normalization_power=0.25,
    random_state=42,
).fit_transform(info_doc_with_identity, vectors=ingredient_vectors)

CPU times: total: 3.34 s
Wall time: 1.4 s


In [18]:
%%time
joint_vectors_mapper = umap.UMAP(metric="cosine", random_state=42).fit(joint_vectors_unsupervised)

CPU times: total: 1min 20s
Wall time: 1min 9s


# This Not That: explore recipes

### Build a dataframe that contains information about the tokens and lists (recipes)

In [63]:
recipe_metadata = pd.DataFrame()
recipe_metadata['Country'] = recipes_country
recipe_metadata['Label'] = recipes_label
recipe_metadata['Ingredients'] = recipes 
recipe_metadata['Recipe_size'] = [float(len(x)) for x in recipes] 

In [64]:
recipe_metadata

| | Country | Label | Ingredients | Recipe_size |
|---|---|---|---|---|
| 0 | greek | others.greek | [romaine lettuce, black olives, grape tomatoes... | 9.0 |
| 1 | southern_us | american.southern_us | [plain flour, ground pepper, salt, tomatoes, g... | 11.0 |
| 2 | filipino | asian.filipino | [eggs, pepper, salt, mayonaise, cooking oil, g... | 12.0 |
| 3 | indian | others.indian | [water, vegetable oil, wheat, salt] | 4.0 |
| 4 | indian | others.indian | [black pepper, shallots, cornflour, cayenne pe... | 20.0 |
| ... | ... | ... | ... | ... |
| 39554 | irish | english.irish | [light brown sugar, granulated sugar, butter, ... | 12.0 |
| 39555 | italian | europe.italian | [KRAFT Zesty Italian Dressing, purple onion, b... | 7.0 |
| 39556 | irish | english.irish | [eggs, citrus fruit, raisins, sourdough starte... | 12.0 |
| 39557 | chinese | asian.chinese | [boneless chicken skinless thigh, minced garli... | 21.0 |


### Select the proper vectors to plot

In [39]:
# Just plot the recipes
n_recipes = len(recipes)
recipes_bool = np.array([True for i in range(n_recipes)] + [False for i in range(n_ingredients)])
ingredients_bool = ~recipes_bool
recipe_umap = joint_vectors_mapper.embedding_[recipes_bool]

In [40]:
# Remove the ingredient from the color map as we are not plotting ingredient vectors
color_mapping = color_key.copy()
del color_mapping['ingredient']

In [41]:
# Scale the recipe (hyperedge) points by recipe size (not very useful here, as all recipes have similar sizes)
sizes = [np.sqrt(len(x)) / 100 for x in recipes]

### Add a legend: legend

In [42]:
legend = tnt.LegendWidget(
    recipe_metadata.Label,
    factors=list(color_mapping.keys()), 
    palette=list(color_mapping.values()), 
    palette_length=len(color_mapping),
    color_picker_height=16,
    color_picker_margin=[0,0],
    label_height=30,
    label_width=150,
    name="Legend",
    selectable=True,
)

### Plot control: plot_control_pane

In [43]:
plot_control_pane = tnt.PlotControlWidget(recipe_metadata, scale_type_selector=True)

### Search capability: search_pane
This lets us search the dataframe rows. The matching rows are selected and displayed on the plot.

In [44]:
search_pane = tnt.SearchWidget(recipe_metadata, width=400, title="Advanced Search")

### Summarize selection: count_summary
Counts how many points are currently selected.

In [45]:
from thisnotthat.summary.dataframe import JointLabelSummarizer, CountSelectedSummarizer
count_summary = tnt.DataSummaryPane(CountSelectedSummarizer(), sizing_mode="stretch_width")

### Summarize selection: word_summary
This is the first time we use the token (vertex) vectors. This summarizer gives the names of the vertex vectors closest to the centroid of the hyperedges selected on the plot. This is only possible because the vertices and the hyperedges live in a common space. It lists the names along with their distances to the centroid.

In [46]:
word_summary = JointLabelSummarizer(joint_vectors_unsupervised[recipes_bool],
                                    ingredients, 
                                    joint_vectors_unsupervised[ingredients_bool])
vertex_summary_pane = tnt.DataSummaryPane(word_summary)
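
The idea behind this summary can be sketched in a few lines: average the selected recipe vectors, then rank the token vectors by their distance to that centroid. This is an illustration on invented 2-d vectors, not the library's implementation, and the use of cosine distance is an assumption made for the example.

```python
import math


def cosine_distance(u, v):
    """1 minus the cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


# Toy joint space (values invented): selected recipes and all tokens.
selected_recipe_vectors = [[1.0, 0.0], [0.8, 0.2]]
token_vectors = {
    "soy sauce": [1.0, 0.1],
    "butter": [0.0, 1.0],
    "ginger": [0.7, 0.3],
}

# Centroid of the selection, one coordinate at a time.
n_dims = len(selected_recipe_vectors[0])
centroid = [
    sum(vec[d] for vec in selected_recipe_vectors) / len(selected_recipe_vectors)
    for d in range(n_dims)
]

# Tokens sorted from closest to farthest from the centroid.
ranked = sorted(
    (cosine_distance(centroid, vec), name) for name, vec in token_vectors.items()
)
for distance, name in ranked:
    print(f"{name}: {distance:.3f}")
```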

### Information on click: info_pane
Clicking a point displays the ingredient list of the corresponding recipe.

In [47]:
markdown_template = """## Recipe from {Label}
---
#### Ingredients

{Ingredients}

---
"""

In [48]:
info_pane = tnt.InformationPane(recipe_metadata, markdown_template, width=400, height=750, sizing_mode="stretch_height")
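
The placeholders in the template match column names of `recipe_metadata`. Assuming they are filled in from the selected row (how `InformationPane` does this internally is not shown here), the effect can be previewed with plain string formatting on an invented row:

```python
markdown_template = """## Recipe from {Label}
---
#### Ingredients

{Ingredients}

---
"""

# A toy metadata row (values invented), keyed by column name.
row = {
    "Country": "greek",
    "Label": "others.greek",
    "Ingredients": ["feta", "olive oil"],
    "Recipe_size": 2.0,
}

# Columns not mentioned in the template are simply ignored.
rendered = markdown_template.format(**row)
print(rendered)
```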

### Link everything to the plot

In [49]:
%%time
bokeh_plot = tnt.BokehPlotPane(
    recipe_umap,
    labels=recipe_metadata.Label,
    hover_text=recipe_metadata.Country,
    legend_location='outside',
    marker_size=sizes,
    label_color_mapping=color_mapping,
    show_legend=False,
    min_point_size=0.001,
    max_point_size=0.05,
    tools="pan,wheel_zoom,tap,lasso_select,box_zoom,save,reset",
    title="What is cooking? Data Map",
)

CPU times: total: 2.02 s
Wall time: 815 ms


In [50]:
count_summary.link_to_plot(bokeh_plot)
vertex_summary_pane.link_to_plot(bokeh_plot)
search_pane.link_to_plot(bokeh_plot)
info_pane.link_to_plot(bokeh_plot)
legend.link_to_plot(bokeh_plot)
plot_control_pane.link_to_plot(bokeh_plot)

Watcher(inst=PlotControlWidget(name='Plot Controls'), cls=<class 'thisnotthat.plot_controls.PlotControlWidget'>, fn=<function Reactive.link.<locals>.link_cb at 0x0000025A11E58D30>, mode='args', onlychanged=True, parameter_names=('color_by_vector', 'color_by_palette', 'hover_text', 'marker_size'), what='value', queued=False, precedence=0)

In [51]:
pn.Row(
    bokeh_plot,
    legend,
    pn.Tabs(
        pn.Column(count_summary, vertex_summary_pane, name='Selection'),
        search_pane,
        info_pane,
        plot_control_pane,
    ),
)