## Jupyter D3 Sunburst example

This notebook demonstrates how to retrieve delimited category paths, for example \
**"Die-Cast & Toy Vehicles > Toy Vehicles & Accessories > Scaled Models > Vehicles"**, \
from an Elasticsearch index or a file and display them in an interactive D3.js sunburst graphic. \
It is more of a technology mashup than a real use case. \
It is also **not** a detailed tutorial covering every required step, but rather an overview of what is possible.
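
To make the idea of a delimited category path concrete, here is a minimal sketch of how one such string breaks down into hierarchy levels. The helper name `split_category_path` is hypothetical and not part of the notebook; the notebook does the equivalent split inline with its `splitChar` variable.

```python
# Hypothetical helper (not part of the notebook): split one delimited
# category path into its levels and trim the surrounding whitespace.
def split_category_path(path, sep=">"):
    return [part.strip() for part in path.split(sep)]


levels = split_category_path(
    "Die-Cast & Toy Vehicles > Toy Vehicles & Accessories > Scaled Models > Vehicles"
)
for depth, level in enumerate(levels):
    # Indent each level to show its depth in the hierarchy
    print("  " * depth + level)
```

Each level becomes one ring segment in the sunburst, with the first level closest to the center.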

There are two possible workflows:
- one that retrieves the data from an Elasticsearch index
- one that uses the data from a demo data file

**Setup with Elasticsearch (Elastic path)**\
If you have an Elasticsearch server up and running that contains products with a category path, set the variable **useElastic** to **True** and adjust the other variables in the first code cell to match your data set.

**Setup with demo data set (File path)**\
If you don't have an Elasticsearch server with a product index that contains a category path, just set the variable **useElastic** to **False** and the notebook will use the static demo dataset.
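
The later cells read a `key` and a `doc_count` from every record, so the demo file presumably holds a JSON list shaped like the `buckets` of an Elasticsearch terms aggregation. The sketch below illustrates that shape with made-up values; the actual contents of `demodata.json` are not shown here and may differ.

```python
import json

# Made-up sample records for illustration only; these are NOT the
# contents of the real demodata.json.
sample_records = [
    {"key": "Hobbies > Model Trains > Locomotives", "doc_count": 12},
    {"key": "Hobbies > Model Trains > Wagons", "doc_count": 7},
    {"key": "Toys > Puzzles", "doc_count": 3},
]

# Round-trip through JSON to mimic what json.load returns for the file
records = json.loads(json.dumps(sample_records))
for record in records:
    print(record["doc_count"], record["key"])
```

Any file with this structure should work in the File path, as long as the delimiter in `key` matches `splitChar`.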

**Sources**\
The following sources inspired me when I set up this example notebook:\
https://nbviewer.jupyter.org/github/soxofaan/jupyter-playground/blob/master/jupyter-custom-d3-visualization/jupyter-custom-d3-visualization.ipynb \
https://observablehq.com/@d3/zoomable-sunburst 

In [None]:
# General block (Elastic path) AND (File path)
from IPython.display import display, Javascript, HTML
import json
import requests

debug=True
useElastic=False
indexName="amazon-fashion"
groupField="amazon_category_and_sub_category.keyword"
splitChar=chr(62)  # chr(62) is the ">" character
maxAggs=10000

In [None]:
# (Elastic path)
if useElastic:
    from elasticsearch import Elasticsearch
    # domain name, or server's IP address, goes in the 'hosts' list
    elastic_client = Elasticsearch(hosts=["http://localhost:9200/"], timeout=20)
    # Query to get the aggregated field data
    query_body ={
      "size": 0,
      "query": {
        "match_all": {}
      },
      "aggs": {
        "category_counts": {
          "terms": {
            "field": groupField,
            "size": maxAggs
          }
        }
      }
    }
    response = elastic_client.search(index=indexName, body=query_body)
    lst = response["aggregations"]["category_counts"]["buckets"]
    if debug:
        print(json.dumps(lst, indent=4))

In [None]:
# (File path)
if not useElastic:
    with open('demodata.json', 'r') as f:
        lst = json.load(f)
    if debug:
        print(json.dumps(lst, indent=4))

In [None]:
# (Elastic path)
if useElastic and response["aggregations"]["category_counts"]["sum_other_doc_count"] > 0:
    print("------------- ATTENTION! -------------")  
    print("Not all aggregations retrieved!")   
    print("------------- ATTENTION! -------------") 

In [None]:
# General block (Elastic path) AND (File path)

# Code to transform the list of category paths into a hierarchical JSON structure

# Search for a matching name in the current list.
# If it doesn't exist, create it.
def insert(lst, name, idnum):
    for d in lst:
        if d['name'] == name:
            break
    else:
        d = {'name': name, 'size': idnum, 'children': []}
        lst.append(d)
    return d['children']

# Drop "size" from nodes that have children and remove empty child lists from leaves
def prune(lst):
    for d in lst:
        if d['children']:
            d.pop("size")
            prune(d['children'])
        else:
            del d['children']

# Insert the data into the master list
master = []
for item in lst:
    names = "Catalog" + splitChar + item["key"]
    idnum = item["doc_count"]
    # Walk down from the root; a separate name keeps lst intact so the cell can be re-run
    level = master
    for name in [s.strip() for s in names.split(splitChar)]:
        level = insert(level, name, idnum)

prune(master)

# Get the top level dict from the master list
data = master[0]
if debug:
    print(json.dumps(data, indent=4))

In [None]:
# General block (Elastic path) AND (File path)
# The JavaScript and CSS code is stored in external files;
# just use it like a component
display(Javascript("require.config({paths: {d3: 'https://d3js.org/d3.v5.min'}});"))
display(Javascript(filename="sunburst.js"))
display(HTML(filename="sunburst.css.html"))

In [None]:
# General block (Elastic path) AND (File path)
# Function to draw the sunburst
def draw_sunburst(data, width=600):
    display(Javascript("""
        (function(element){
            require(['sunburst'], function(sunburst) {
                sunburst(element.get(0), %s, %d);
            });
        })(element);
    """ % (json.dumps(data), width)))

In [None]:
# General block (Elastic path) AND (File path)
# Execute the draw function to display the sunburst
draw_sunburst(data, width=1000)