<a href="https://colab.research.google.com/github/zseebrz/colab/blob/main/tutorials/ChartPainter_tutorial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Data Visualization in sberpm

The __`sberpm.visual.ChartPainter`__ class creates different types of graphs. Since it is based on the __`plotly`__ library, all plots are interactive.

In [8]:
!git clone https://github.com/SberProcessMining/Sber_Process_Mining.git
#need to clone the github repo first to be able to import the latest version of the library



In [14]:
pip install Sber_Process_Mining/.
#install the library from the cloned repo


In [1]:
from sberpm.visual import ChartPainter

In [4]:
!wget "https://data.4tu.nl/ndownloader/files/24023492"
#download the example file from TU Eindhoven

--2022-01-09 17:58:44--  https://data.4tu.nl/ndownloader/files/24023492
Resolving data.4tu.nl (data.4tu.nl)... 131.180.141.15
Connecting to data.4tu.nl (data.4tu.nl)|131.180.141.15|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/gzip]
Saving to: ‘24023492’

24023492                [   <=>              ]   1.51M  2.67MB/s    in 0.6s    

2022-01-09 17:58:45 (2.67 MB/s) - ‘24023492’ saved [1581144]



In [6]:
!mv 24023492 file.xes.gz
#rename to xes.gz

In [10]:
!gunzip file.xes.gz

In [11]:
ls

file.xes  sample_data/  Sber_Process_Mining/

In [12]:
!pip install pm4py
#install pm4py to convert xes into csv


In [13]:
from pm4py.objects.log.importer.xes import importer as xes_importer
log = xes_importer.apply('file.xes')


In [14]:
import pandas as pd
from pm4py.objects.conversion.log import converter as log_converter
dataframe = log_converter.apply(log, variant=log_converter.Variants.TO_DATA_FRAME)
dataframe.to_csv('InternationalDeclarations.csv')
                                


`ChartPainter` constructor parameters:
- __data *(pandas.DataFrame or sberpm.DataHolder or sberpm.metrics instance)*__ – data to use for visualization
- __template *(str, default='plotly')*__ – name of the figure template (https://plotly.com/python/templates/)
- __palette *(str, default='sequential.Sunset_r')*__ – name of the graph color palette (https://plotly.com/python/builtin-colorscales/)
- __shape_color *(str, default='lime')*__ – name of the color to use to draw new shapes in the figure

## TOC

`ChartPainter` offers the following graphs, each made in a single function call:
- __hist__ – [histogram](#Histogram)
- __bar__ – [bar chart](#Bar-Chart)
- __box__ – [box plot](#Box-Plot)
- __scatter__ – [scatter plot](#Scatter-Plot)
- __line__ – [line plot](#Line-Plot)
- __pie__ – [pie chart](#Pie-Chart)
- __sunburst__ – [sunburst plot](#Sunburst-Plot)
- __heatmap__ – [heatmap](#Heatmap)
- __density heatmap__ – [2D histogram](#2D-Histogram)
- __gantt__ – [Gantt chart](#Gantt-Chart)
- __pareto__ – [Pareto chart](#Pareto-Chart)
- __sankey__ – [Sankey diagram](#Sankey-Diagram)

Each method takes keyword parameters that cover the most common customizations, so highly customized plots can be built in one call. Other arguments of the corresponding function from `plotly.express` or `plotly.graph_objs` can optionally be passed as well.

To demonstrate what can be done with `ChartPainter`, the BPI Challenge 2020 dataset will be used ([source](https://doi.org/10.4121/uuid:52fb97d4-4588-43c9-9d04-3604d4613b51)).

In [15]:
from sberpm import DataHolder

data_holder = DataHolder('InternationalDeclarations.csv', 
                         id_column='case:id', 
                         activity_column='concept:name', 
                         start_timestamp_column='time:timestamp',
                         user_column='org:role',
                         time_format='%Y-%m-%d %H:%M:%S')
data_holder.check_or_calc_duration()

In [16]:
from sberpm.metrics import ActivityMetric, IdMetric, UserMetric

activity_metric = ActivityMetric(data_holder, time_unit='d')
id_metric = IdMetric(data_holder, time_unit='d')
user_metric = UserMetric(data_holder, time_unit='d')

In [17]:
data = data_holder.data[['case:id', 'case:Amount', 'case:Permit TaskNumber', 'case:Permit BudgetNumber',
                         'case:Permit ProjectNumber', 'case:Permit OrganizationalEntity', 
                         'case:Permit ActivityNumber']].drop_duplicates()
data['trip_duration'] = (data_holder.data[data_holder.data[data_holder.activity_column] == 'Start trip']
                         ['duration'] / 3600 / 24).values
data['trace_length'] = id_metric.apply()['trace_length'].values
data['total_duration'] = id_metric.apply()['total_duration'].values

## Histogram
[Back to TOC](#TOC)

A __histogram__ is an approximate representation of the distribution of numerical data. To construct a histogram, the first step is to divide the entire range of values into a series of intervals and then count how many values fall into each interval. 
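
The two steps described above can be sketched in plain Python. This is only an illustration of the idea with equal-width bins and made-up values; `histogram_counts` is a hypothetical helper, and `plotly` picks its own bin edges, so the bars it draws may differ.

```python
def histogram_counts(values, nbins):
    """Split [min, max] into nbins equal-width intervals and count values per interval."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / nbins or 1.0  # fall back to 1.0 when all values are equal
    counts = [0] * nbins
    for v in values:
        # The maximum value goes into the last bin instead of opening a new one.
        idx = min(int((v - lo) / width), nbins - 1)
        counts[idx] += 1
    return counts

durations = [1, 2, 2, 3, 5, 8, 8, 9, 10]  # made-up durations in days
counts = histogram_counts(durations, nbins=3)
print(counts)
```

Each entry of `counts` corresponds to the height of one bar.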

`ChartPainter.hist` parameters:
- __x *(str or list of str)*__ – name of the column to make graph for. If it takes a list of column names, the input data is considered as wide-form rather than long-form
- __color *(str, default=None)*__ – name of the column used to set color to bars
- __subplots *((rows, cols, ncols), default=None)*__ – creates a set of subplots:
    - rows: name of the column to use for constructing subplots along the y-axis
    - cols: name of the column to use for constructing subplots along the x-axis
    - ncols: number of columns of the subplot grid
- __barmode *({'stack', 'overlay', 'group'}, default='stack')*__ – display mode of the bars with the same position coordinate
- __nbins *(int, default=50)*__ – number of bins
- __cumulative *(bool, default=False)*__ – whether to plot a cumulative histogram
- __orientation *({'v', 'h'}, default='v')*__ – orientation of the graph: 'v' for vertical and 'h' for horizontal
- __opacity *(float, default=0.8)*__ – opacity of the bars. Ranges from 0 to 1
- __edge *(bool, default=False)*__ – whether to draw bar edges
- __title *(str, default='auto')*__ – title of the graph. When 'auto', the title is generated automatically
- __slider *(bool, default=False)*__ – whether to add a range slider to the plot
- __height *(int, default=None)*__ – height of the figure in pixels
- __width *(int, default=None)*__ – width of the figure in pixels
- __font_size *(int, default=12)*__ – size of the global font
- __**kwargs _(optional)___ – see [`plotly.express.histogram`](https://plotly.com/python-api-reference/generated/plotly.express.histogram.html#plotly.express.histogram) for other possible arguments

In [18]:
painter = ChartPainter(id_metric)
painter.hist(x='total_duration')

The main advantage of `plotly` is the interactivity of the figure elements that gives a better understanding of the underlying data. In the upper right corner there are configuration options to zoom-in, zoom-out, autoscale, pan, reset axes, change hover mode, draw and erase shapes, etc.

The figure template and color palette can be changed during class initialization via the __template__ and __palette__ arguments. The available values for these parameters are provided [here](https://plotly.com/python/templates/) and [here](https://plotly.com/python/builtin-colorscales/) respectively. Shapes added with the shape-drawing buttons are drawn in the color given by __shape_color__.

In [19]:
painter = ChartPainter(id_metric, template='plotly_white', palette='sequential.RdBu', shape_color='cyan')
painter.hist(x='total_duration')

If the first argument __x__ is a list of column names, the input data is treated as wide-form rather than long-form, and the legend is placed above the graph instead of in its default position in the upper right corner.

In [20]:
painter = ChartPainter(id_metric)
painter.hist(x=['total_duration', 'mean_duration', 'median_duration'])
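
For readers unfamiliar with the terms, the difference between wide-form and long-form data can be sketched in plain Python. The rows below are made up, `to_long` is a hypothetical helper, and the `variable` / `value` names are borrowed from the defaults of `pandas.melt`; this is not how `ChartPainter` handles the data internally.

```python
# Wide-form: one row per case, one column per metric (values are made up).
wide = [
    {"case": "A", "total_duration": 10.0, "mean_duration": 2.5},
    {"case": "B", "total_duration": 4.0, "mean_duration": 1.0},
]

def to_long(rows, id_key, value_keys):
    """Turn each (row, metric column) pair into its own record."""
    return [
        {id_key: row[id_key], "variable": key, "value": row[key]}
        for row in rows
        for key in value_keys
    ]

# Long-form: one row per (case, metric) pair.
long_rows = to_long(wide, "case", ["total_duration", "mean_duration"])
for record in long_rows:
    print(record)
```

In wide-form input each listed column becomes its own trace, which is why the legend shows the column names.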

The legend is also interactive:
- click on a legend entry to hide or show its trace
- double-click on a legend entry to hide all of the other traces
- double-click on any hidden legend entry to show all the traces

The __nbins__ argument specifies the number of bins, __cumulative__ specifies whether to plot a cumulative histogram, __opacity__ sets the opacity of the bars, and __edge__ specifies whether to draw the bars with edges.

In [21]:
painter = ChartPainter(id_metric)
painter.hist(x='total_duration', nbins=80, cumulative=True, opacity=0.6, edge=True)
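
As a rough illustration of what "cumulative" means here, each bar of a cumulative histogram holds the running total of the ordinary bin counts up to and including that bin. The counts below are made up.

```python
from itertools import accumulate

bin_counts = [4, 1, 4]  # made-up counts per bin of an ordinary histogram
cumulative_counts = list(accumulate(bin_counts))  # running totals, bin by bin
print(cumulative_counts)
```

The last bar of a cumulative histogram therefore equals the total number of observations.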

To plot several histograms for the different values of one column, the __color__ argument can be used. The __barmode__ controls the display mode of the bars with the same position coordinate.

In [22]:
painter = ChartPainter(data_holder)
painter.hist(x='duration', color='concept:name', barmode='overlay')

To create multiple histograms on a grid, the __subplots__ argument should be used.

In [23]:
painter = ChartPainter(data_holder)
painter.hist(x='case:Amount', subplots=(None, 'case:Permit TaskNumber', 3))

Arguments of the corresponding function from the `plotly.express` module can also be passed. For example, feature names shown as axis titles, legend entries and hovers in the graph can be changed with the __labels__ parameter. 

The plot title is set automatically, but it can also be changed via the __title__ argument. The __height__ and __width__ parameters control the height and width of the graph respectively. The __font_size__ argument sets the global font size.

In [24]:
painter = ChartPainter(id_metric)
painter.hist(x='total_duration', title='Histogram of ID duration', 
             labels={'total_duration': 'Total duration'}, height=400, width=900, font_size=10)

## Bar Chart
[Back to TOC](#TOC)

A __bar chart__ is a chart or graph that presents categorical data with rectangular bars with heights or lengths proportional to the values that they represent. It shows comparisons among discrete categories.

`ChartPainter.bar` parameters:
- __x *(str or list of str, default=None)*__ – name of the column to draw on the x-axis. If it takes a list of column names, the input data is considered as wide-form rather than long-form
- __y *(str or list of str, default=None)*__ – name of the column to draw on the y-axis. If it takes a list of column names, the input data is considered as wide-form rather than long-form
- __sort *(str, default=None)*__ – name of the column to sort values by in descending (ascending) order if __n__ is positive (negative)
- __n *(int, default=None)*__ – number of sorted rows to draw. If positive, the rows are sorted in descending order; if negative, the rows are sorted in ascending order
- __color *(str, default=None)*__ – name of the column used to set color to bars
- __subplots *((rows, cols, ncols), default=None)*__ – creates a set of subplots:
    - rows: name of the column to use for constructing subplots along the y-axis
    - cols: name of the column to use for constructing subplots along the x-axis
    - ncols: number of columns of the subplot grid
- __barmode *({'stack', 'overlay', 'group'}, default='stack')*__ – display mode of the bars with the same position coordinate
- __agg *({'count', 'sum', 'avg', 'min', 'max'}, default=None)*__ – name of the function used to aggregate 'y' ('x') values if orientation is set to 'v' ('h')
- __add_line *(list of str, default=None)*__ – list of column names to add line to the graph for. Each line will be drawn along a separate y-axis
- __text *(bool, default=False)*__ – whether to show text labels in the figure
- __decimals *(int, default=2)*__ – number of decimal places to round __text__ labels of a float dtype to
- __orientation *({'auto', 'v', 'h'}, default='auto')*__ – orientation of the graph: 'v' for vertical and 'h' for horizontal. By default, it is determined automatically based on the input data types
- __opacity *(float, default=1)*__ – opacity of the bars. Ranges from 0 to 1
- __edge *(bool, default=False)*__ – whether to draw bar edges
- __title *(str, default='auto')*__ – title of the graph. When 'auto', the title is generated automatically
- __slider *(bool, default=False)*__ – whether to add a range slider to the plot
- __height *(int, default=None)*__ – height of the figure in pixels
- __width *(int, default=None)*__ – width of the figure in pixels
- __font_size *(int, default=12)*__ – size of the global font
- __**kwargs _(optional)___ – see [`plotly.express.bar`](https://plotly.com/python-api-reference/generated/plotly.express.bar.html#plotly.express.bar) for other possible arguments

In [25]:
painter = ChartPainter(user_metric)
painter.bar(x=data_holder.user_column, y='total_duration')

By using the __sort__ argument, the column to sort values by can be defined.
- If __n__ > 0, sorting in descending order,
- If __n__ < 0, sorting in ascending order,

where __n__ is the number of sorted rows to draw.

The __text__ parameter shows the feature values in the figure as text labels, rounded to __decimals__ decimal places.

In [26]:
painter = ChartPainter(activity_metric)
painter.bar(x=data_holder.activity_column, y='mean_duration', sort='mean_duration', n=15, text=True, decimals=1)
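
The interplay of __sort__ and __n__ described above can be sketched as follows. `select_rows` is a hypothetical helper written for this illustration, not part of `sberpm`, and the rows are made up.

```python
def select_rows(rows, sort, n):
    """Keep |n| rows: sorted in descending order if n > 0, ascending if n < 0."""
    ordered = sorted(rows, key=lambda row: row[sort], reverse=n > 0)
    return ordered[:abs(n)]

rows = [  # made-up activity metrics
    {"activity": "Start trip", "mean_duration": 12.0},
    {"activity": "End trip", "mean_duration": 3.5},
    {"activity": "Payment Handled", "mean_duration": 7.25},
]
longest = select_rows(rows, sort="mean_duration", n=2)    # two largest values
shortest = select_rows(rows, sort="mean_duration", n=-1)  # single smallest value
print([r["activity"] for r in longest])
print([r["activity"] for r in shortest])
```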

By using the __add_line__ parameter, other variables can be added to the graph as lines along the separate y-axes.

In [27]:
painter = ChartPainter(activity_metric)
painter.bar(x=data_holder.activity_column, y='count', sort='count', n=40, add_line=['mean_duration'], 
            height=700)

To plot an aggregated bar chart, the argument __agg__ that specifies the aggregation function should be set. Otherwise, the individual items within each bar will be stacked on top of each other.

In [28]:
painter = ChartPainter(data.head(300))
painter.bar(x='case:Permit OrganizationalEntity', y='case:Amount')
painter.bar(x='case:Permit OrganizationalEntity', y='case:Amount', agg='sum')
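
To make the difference concrete, here is a minimal sketch of what an aggregation per category amounts to, using made-up rows. The mapping of the __agg__ names to Python functions is an assumption for illustration only and says nothing about how `ChartPainter` computes them.

```python
from collections import defaultdict

# Made-up (organizational entity, amount) pairs.
rows = [("unit A", 100.0), ("unit B", 40.0), ("unit A", 60.0), ("unit B", 10.0)]

# Without aggregation, every row is its own bar segment; grouping collects them per category.
grouped = defaultdict(list)
for entity, amount in rows:
    grouped[entity].append(amount)

# Assumed meaning of the aggregation names.
aggregations = {
    "count": len,
    "sum": sum,
    "avg": lambda xs: sum(xs) / len(xs),
    "min": min,
    "max": max,
}
totals = {entity: aggregations["sum"](amounts) for entity, amounts in grouped.items()}
averages = {entity: aggregations["avg"](amounts) for entity, amounts in grouped.items()}
print(totals)
print(averages)
```

With aggregation, each category is drawn as one bar with a single value instead of a stack of individual rows.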

## Box Plot
[Back to TOC](#TOC)

In descriptive statistics, a __box plot__ is a method for graphically depicting groups of numerical data through their quartiles. Box plots may also have lines extending from the boxes (whiskers) indicating variability outside the upper and lower quartiles. Outliers may be plotted as individual points. The spacings between the different parts of the box indicate the degree of dispersion (spread) and skewness in the data, and show outliers. 
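
The quantities behind a box plot can be computed with the standard library. This sketch uses made-up values, the "inclusive" quantile method, and the common 1.5 × IQR convention for flagging outliers; `plotly` may compute quartiles slightly differently, so treat the numbers as illustrative.

```python
import statistics

amounts = [10, 12, 13, 15, 18, 20, 22, 95]  # made-up values with one extreme point

# Quartiles define the box: q1 and q3 are its edges, the median is the line inside.
q1, median, q3 = statistics.quantiles(amounts, n=4, method="inclusive")
iqr = q3 - q1  # interquartile range, i.e. the height of the box

# Points beyond 1.5 * IQR from the box are conventionally treated as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in amounts if v < lower_fence or v > upper_fence]
print(q1, median, q3, outliers)
```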

`ChartPainter.box` parameters:
- __x *(str or list of str, default=None)*__ – name of the column to draw on the x-axis. If it takes a list of column names, the input data is considered as wide-form rather than long-form
- __y *(str or list of str, default=None)*__ – name of the column to draw on the y-axis. If it takes a list of column names, the input data is considered as wide-form rather than long-form
- __color *(str, default=None)*__ – name of the column used to set color to boxes
- __subplots *((rows, cols, ncols), default=None)*__ – creates a set of subplots:
    - rows: name of the column to use for constructing subplots along the y-axis
    - cols: name of the column to use for constructing subplots along the x-axis
    - ncols: number of columns of the subplot grid
- __boxmode *({'group', 'overlay'}, default='group')*__ – display mode of the boxes with the same position coordinate
- __points *({'all', 'outliers', 'suspectedoutliers', 'False'}, default='outliers')*__ – type of underlying data points to display
- __orientation *({'auto', 'v', 'h'}, default='auto')*__ – orientation of the graph: 'v' for vertical and 'h' for horizontal. By default, it is determined automatically based on the input data types
- __title *(str, default='auto')*__ – title of the graph. When 'auto', the title is generated automatically
- __height *(int, default=None)*__ – height of the figure in pixels
- __width *(int, default=None)*__ – width of the figure in pixels
- __font_size *(int, default=12)*__ – size of the global font
- __**kwargs _(optional)___ – see [`plotly.express.box`](https://plotly.com/python-api-reference/generated/plotly.express.box.html#plotly.express.box) for other possible arguments

In [29]:
painter = ChartPainter(data)
painter.box(y='case:Amount')

With the __points__ argument, it is possible to illustrate the underlying data points with either all points, outliers only, suspected outliers only, or none of them.

In [30]:
painter = ChartPainter(data)
painter.box(x='case:Permit OrganizationalEntity', y='case:Amount', points='suspectedoutliers')

## Scatter Plot
[Back to TOC](#TOC)

A __scatter plot__ is a type of plot or mathematical diagram using Cartesian coordinates to display values for typically two variables for a set of data. The data are displayed as a collection of points, each having the value of one variable determining the position on the horizontal axis and the value of the other variable determining the position on the vertical axis.

`ChartPainter.scatter` parameters:
- __x *(str or list of str, default=None)*__ – name of the column to draw on the x-axis. If it takes a list of column names, the input data is considered as wide-form rather than long-form
- __y *(str or list of str, default=None)*__ – name of the column to draw on the y-axis. If it takes a list of column names, the input data is considered as wide-form rather than long-form
- __sort *(str, default=None)*__ – name of the column to sort values by in descending (ascending) order if __n__ is positive (negative)
- __n *(int, default=None)*__ – number of sorted rows to draw. If positive, the rows are sorted in descending order; if negative, the rows are sorted in ascending order
- __color *(str, default=None)*__ – name of the column used to set color to markers
- __size *(str or int, default=None)*__ – name of the column used to set marker sizes (if str) or the marker size (if integer)
- __symbol *(str, default=None)*__ – name of the column used to set symbols to markers
- __subplots *((rows, cols, ncols), default=None)*__ – creates a set of subplots:
    - rows: name of the column to use for constructing subplots along the y-axis
    - cols: name of the column to use for constructing subplots along the x-axis
    - ncols: number of columns of the subplot grid
- __text *(str, default=None)*__ – name of the column to use as text labels in the figure
- __decimals *(int, default=2)*__ – number of decimal places to round __text__ labels of a float dtype to
- __size_max *(int, default=20)*__ – the maximum marker size. Used if __size__ is given
- __orientation *({'auto', 'v', 'h'}, default='auto')*__ – orientation of the graph: 'v' for vertical and 'h' for horizontal. By default, it is determined automatically based on the input data types
- __opacity *(float, default=1)*__ – opacity of the markers. Ranges from 0 to 1
- __edge *(bool, default=False)*__ – whether to draw marker edges
- __title *(str, default='auto')*__ – title of the graph. When 'auto', the title is generated automatically
- __slider *(bool, default=False)*__ – whether to add a range slider to the plot
- __height *(int, default=None)*__ – height of the figure in pixels
- __width *(int, default=None)*__ – width of the figure in pixels
- __font_size *(int, default=12)*__ – size of the global font
- __**kwargs _(optional)___ – see [`plotly.express.scatter`](https://plotly.com/python-api-reference/generated/plotly.express.scatter.html#plotly.express.scatter) for other possible arguments

In [31]:
painter = ChartPainter(data)
painter.scatter(x='trip_duration', y='case:Amount')

The __size__ and __symbol__ parameters set the marker size and symbol respectively.

The __hover_data__ argument (from `plotly.express.scatter`) specifies the extra data to appear in the hover box.

In [32]:
painter = ChartPainter(data)
painter.scatter(x='trip_duration', y='case:Amount', color='trace_length', size='total_duration', opacity=0.8, 
                hover_data=[data_holder.id_column])

To make a dot plot, the name of the categorical variable should be passed to either __x__ or __y__.

In [33]:
painter = ChartPainter(activity_metric)
painter.scatter(y=data_holder.activity_column, x='loop_percent', sort='loop_percent', n=40, size=15, height=700)

## Line Plot
[Back to TOC](#TOC)

A __line plot__ is a type of chart which displays information as a series of data points called 'markers' connected by straight line segments. It is often used to visualize a trend in data over intervals of time – a time series – thus the line is often drawn chronologically.

`ChartPainter.line` parameters:
- __x *(str or list of str, default=None)*__ – name of the column to draw on the x-axis. If it takes a list of column names, the input data is considered as wide-form rather than long-form
- __y *(str or list of str, default=None)*__ – name of the column to draw on the y-axis. If it takes a list of column names, the input data is considered as wide-form rather than long-form
- __sort *(str, default=None)*__ – name of the column to sort values by in descending (ascending) order if __n__ is positive (negative)
- __n *(int, default=None)*__ – number of sorted rows to draw. If positive, the rows are sorted in descending order; if negative, the rows are sorted in ascending order
- __color *(str, default=None)*__ – name of the column used to set color to lines
- __group *(str, default=None)*__ – name of the column used to group data rows into lines
- __dash *(str, default=None)*__ – name of the column used to set dash patterns to lines
- __subplots *((rows, cols, ncols), default=None)*__ – creates a set of subplots:
    - rows: name of the column to use for constructing subplots along the y-axis
    - cols: name of the column to use for constructing subplots along the x-axis
    - ncols: number of columns of the subplot grid
- __text *(str, default=None)*__ – name of the column to use as text labels in the figure
- __decimals *(int, default=2)*__ – number of decimal places to round __text__ labels of a float dtype to
- __orientation *({'auto', 'v', 'h'}, default='auto')*__ – orientation of the graph: 'v' for vertical and 'h' for horizontal. By default, it is determined automatically based on the input data types
- __line_width *(int, default=2)*__ – width of the line(s)
- __title *(str, default='auto')*__ – title of the graph. When 'auto', the title is generated automatically
- __slider *(bool, default=False)*__ – whether to add a range slider to the plot
- __height *(int, default=None)*__ – height of the figure in pixels
- __width *(int, default=None)*__ – width of the figure in pixels
- __font_size *(int, default=12)*__ – size of the global font
- __**kwargs _(optional)___ – see [`plotly.express.line`](https://plotly.com/python-api-reference/generated/plotly.express.line.html#plotly.express.line) for other possible arguments

In [34]:
painter = ChartPainter(user_metric)
painter.line(x=data_holder.user_column, y='workload')

The __group__ and __dash__ parameters group rows into lines by the given column. With __group__, each group is drawn as a separate line; with __dash__, each group is additionally given its own dash pattern.

In [35]:
painter = ChartPainter(data.head(100))
painter.line(x=data_holder.id_column, y='case:Amount', dash='case:Permit TaskNumber')

The __slider__ argument adds a range slider: a small subplot-like area below the plot that lets users pan and zoom the x-axis while keeping an overview of the whole chart.

In [36]:
painter = ChartPainter(data.head(100))
painter.line(x=data_holder.id_column, y='case:Amount', group='case:Permit TaskNumber', slider=True)

## Pie Chart
[Back to TOC](#TOC)

A __pie chart__ is a circular statistical graphic, which is divided into slices to illustrate numerical proportion. In a pie chart, the arc length of each slice (and consequently its central angle and area), is proportional to the quantity it represents.

`ChartPainter.pie` parameters:
- __labels *(str)*__ – name of the column to use as labels for sectors
- __values *(str, default=None)*__ – name of the column used to set values to sectors
- __color *(str, default=None)*__ – name of the column used to set color to sectors
- __n *(int, default=None)*__ – number of sorted rows to draw. If positive, the rows are sorted in descending order; if negative, the rows are sorted in ascending order
- __remainder *(bool, default=True)*__ – whether to put the remaining values other than __n__ selected into a separate sector
- __text *({'percent', 'value'}, default='percent')*__ – text information to display inside sectors
- __text_orientation *({'auto', 'horizontal', 'radial', 'tangential'}, default='auto')*__ – orientation of text inside sectors
- __hole *(float, default=0.4)*__ – fraction of the radius to cut out of the pie to create a donut chart. Ranges from 0 to 1
- __opacity *(float, default=1)*__ – opacity of the sectors. Ranges from 0 to 1
- __edge *(bool, default=True)*__ – whether to draw sector edges
- __title *(str, default='auto')*__ – title of the graph. When 'auto', the title is generated automatically
- __height *(int, default=None)*__ – height of the figure in pixels
- __width *(int, default=None)*__ – width of the figure in pixels
- __font_size *(int, default=12)*__ – size of the global font
- __**kwargs _(optional)___ – see [`plotly.express.pie`](https://plotly.com/python-api-reference/generated/plotly.express.pie.html#plotly.express.pie) for other possible arguments

In [37]:
painter = ChartPainter(data_holder)
painter.pie(labels='org:role', values='duration')

If only __labels__ are given, sector sizes are proportional to the quantity of rows in each category.

In [38]:
painter = ChartPainter(data_holder)
painter.pie(labels='org:role')

If the number of slices is large, it is recommended to use the __n__ argument that controls the number of displayed sectors. The remaining sectors can be grouped into one ('Others') by using the __remainder__ argument.

In [39]:
painter = ChartPainter(activity_metric)
painter.pie(labels=data_holder.activity_column, values='median_duration', n=15, remainder=True)
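
The effect of __n__ and __remainder__ can be sketched in plain Python. `top_sectors` is a hypothetical helper that only covers positive __n__, the counts are made up, and the label 'Others' follows the text above.

```python
def top_sectors(values, n, remainder=True):
    """Keep the n largest sectors and optionally lump the rest into 'Others'."""
    ordered = sorted(values.items(), key=lambda item: item[1], reverse=True)
    sectors = dict(ordered[:n])
    rest = sum(value for _, value in ordered[n:])
    if remainder and rest > 0:
        sectors["Others"] = rest
    return sectors

counts = {"EMPLOYEE": 50, "SUPERVISOR": 30, "BUDGET OWNER": 15, "DIRECTOR": 5}  # made-up counts
sectors = top_sectors(counts, n=2)
# Each sector's share of the circle is its value divided by the total.
shares = {label: value / sum(sectors.values()) for label, value in sectors.items()}
print(sectors)
print(shares)
```

Without the remainder sector, the shares are computed over the selected sectors only, so the percentages shown in the chart change.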

The __text__ and __text_orientation__ parameters determine the type and orientation of text information to display inside sectors. 

By using the __hole__ argument, it is also possible to create a donut chart.

In [40]:
painter = ChartPainter(data)
painter.pie(labels='case:Permit OrganizationalEntity', n=10, remainder=False, text='value', 
            text_orientation='horizontal', hole=0)

## Sunburst Plot
[Back to TOC](#TOC)

A __sunburst diagram__ displays hierarchical data. Each level of the hierarchy is represented by one ring or circle with the innermost circle as the top of the hierarchy. A sunburst chart with multiple levels of categories shows how the outer rings relate to the inner rings. 

`ChartPainter.sunburst` parameters:
- __path *(list of str)*__ – names of the columns that correspond to different levels of the hierarchy of sectors, from root to leaves
- __values *(str, default=None)*__ – name of the column used to set values to sectors
- __color *(str, default=None)*__ – name of the column used to set color to sectors
- __maxdepth *(int, default=-1)*__ – number of displayed sectors from any level. If -1, all levels in the hierarchy are shown
- __text_orientation *({'auto', 'horizontal', 'radial', 'tangential'}, default='auto')*__ – orientation of text inside sectors
- __title *(str, default='auto')*__ – title of the graph. When 'auto', the title is generated automatically
- __height *(int, default=None)*__ – height of the figure in pixels
- __width *(int, default=None)*__ – width of the figure in pixels
- __font_size *(int, default=12)*__ – size of the global font
- __**kwargs _(optional)___ – see [`plotly.express.sunburst`](https://plotly.com/python-api-reference/generated/plotly.express.sunburst.html#plotly.express.sunburst) for other possible arguments

In [41]:
painter = ChartPainter(data.head(100))
painter.sunburst(path=['case:Permit OrganizationalEntity', 'case:Permit BudgetNumber'], text_orientation='radial')
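
To see how a __path__ turns flat rows into rings, the following sketch counts rows for every prefix of the path. The rows are made up, and counting rows per sector is an assumption about what happens when __values__ is not given.

```python
from collections import Counter

# Made-up rows: (organizational entity, budget number).
rows = [
    ("unit A", "budget 1"),
    ("unit A", "budget 1"),
    ("unit A", "budget 2"),
    ("unit B", "budget 3"),
]

sector_sizes = Counter()
for row in rows:
    # Every prefix of the path is a sector: ring 1 uses (entity,), ring 2 uses (entity, budget).
    for depth in range(1, len(row) + 1):
        sector_sizes[row[:depth]] += 1

for path, size in sorted(sector_sizes.items()):
    print(" / ".join(path), size)
```

The size of an inner sector equals the sum of the outer sectors attached to it, which is what the nested rings convey.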

This type of diagram is interactive: clicking on a sector shows its breakdown into lower levels, so each level of the hierarchy can be explored in more detail.

The __maxdepth__ parameter controls the number of displayed levels. To see lower levels, it is necessary to click on a sector.

In [42]:
painter = ChartPainter(data.head(100))
painter.sunburst(path=['case:Permit OrganizationalEntity', 'case:Permit BudgetNumber', 'case:Permit ProjectNumber',
                       'case:Permit ActivityNumber', 'case:Permit TaskNumber'], 
                 text_orientation='radial', maxdepth=3)

## Heatmap
[Back to TOC](#TOC)

A __heatmap__ is a data visualization technique that shows magnitude of a phenomenon as color in two dimensions. The variation in color may be by hue or intensity, giving obvious visual cues to the reader about how the phenomenon is clustered or varies over space. 

`ChartPainter.heatmap` parameters:
- __labels *((x, y, color), default=None)*__ – label names to display in the figure for axis (x and y) and colorbar (color) titles and hover boxes
- __text *(bool, default=False)*__ – whether to show annotation text in the figure
- __decimals *(int, default=2)*__ – number of decimal places to round annotations to
- __xaxis_side *({'bottom', 'top'}, default='bottom')*__ – position of the x-axis in the figure
- __title *(str, default='auto')*__ – title of the graph. When 'auto', the title is generated automatically
- __height *(int, default=None)*__ – height of the figure in pixels
- __width *(int, default=None)*__ – width of the figure in pixels
- __font_size *(int, default=12)*__ – size of the global font
- __**kwargs _(optional)___ – see [`plotly.figure_factory.create_annotated_heatmap`](https://plotly.com/python-api-reference/generated/plotly.figure_factory.create_annotated_heatmap.html#plotly.figure_factory.create_annotated_heatmap) for other possible arguments

The input data must be given as a pandas.DataFrame. The __labels__ argument sets names used in the figure for axis and colorbar titles and hover information. To add annotations, the __text__ parameter should be used.

In [43]:
painter = ChartPainter(id_metric.apply().corr(), palette='sequential.Blues')
painter.heatmap(labels=('ID metrics', 'ID metrics', 'Correlation'), text=True, decimals=3)
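
The cell above passes a correlation matrix to the heatmap and rounds the annotations with __decimals__. The sketch below shows, in plain Python, how such a matrix could be computed and rounded. The metric names and values are made up for illustration, and the `pearson` helper is a hypothetical stand-in for what `DataFrame.corr()` computes by default.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical per-case metrics (illustrative values only).
metrics = {
    'trace_length': [3, 5, 4, 8, 6],
    'total_duration': [10.0, 18.0, 13.0, 30.0, 21.0],
    'unique_activities': [3, 4, 4, 5, 5],
}

# Square matrix as nested dicts, rounded like decimals=3 in the heatmap call.
names = list(metrics)
corr = {
    row: {col: round(pearson(metrics[row], metrics[col]), 3) for col in names}
    for row in names
}

for row in names:
    print(row, [corr[row][col] for col in names])
```

The matrix is symmetric with ones on the diagonal, which is why the same label is used for both axes in the call above.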

## 2D Histogram
[Back to TOC](#TOC)

A __2D histogram__, also known as a __density heatmap__, is the 2-dimensional generalization of a histogram which resembles a heatmap but is computed by grouping a set of points specified by their x and y coordinates into bins, and applying an aggregation function such as count or sum to compute the color of the tile representing the bin. 

`ChartPainter.density_heatmap` parameters:
- __x *(str or list of str, default=None)*__ – name of the column to draw on the x-axis. If it takes a list of column names, the input data is considered as wide-form rather than long-form
- __y *(str or list of str, default=None)*__ – name of the column to draw on the y-axis. If it takes a list of column names, the input data is considered as wide-form rather than long-form
- __color *(str, default=None)*__ – name of the column to aggregate and set color to blocks
- __subplots *((rows, cols, ncols), default=None)*__ – creates a set of subplots:
    - rows: name of the column to use for constructing subplots along the y-axis
    - cols: name of the column to use for constructing subplots along the x-axis
    - ncols: number of columns of the subplot grid
- __nbins *((nbinsx, nbinsy), default=None)*__ – number of bins along the x-axis and y-axis
- __agg *({'count', 'sum', 'avg', 'min', 'max'}, default=None)*__ – name of the function used to aggregate values of __color__
- __orientation *({'auto', 'v', 'h'}, default='auto')*__ – orientation of the graph: 'v' for vertical and 'h' for horizontal. By default, it is determined automatically based on the input data types
- __title *(str, default='auto')*__ – title of the graph. When 'auto', the title is generated automatically
- __height *(int, default=None)*__ – height of the figure in pixels
- __width *(int, default=None)*__ – width of the figure in pixels
- __font_size *(int, default=12)*__ – size of the global font
- __**kwargs _(optional)___ – see [`plotly.express.density_heatmap`](https://plotly.com/python-api-reference/generated/plotly.express.density_heatmap.html#plotly.express.density_heatmap) for other possible arguments

In [44]:
painter = ChartPainter(data.head(100))
painter.density_heatmap(x='trip_duration', y='case:Amount')

By default, the color of each block is defined by the number of observations in its bin. When the __color__ and __agg__ arguments are passed, the density heatmap instead aggregates the values of the __color__ column within each bin to compute the color of the tiles.

The number of bins can be controlled with the __nbins__ parameter.

In [45]:
painter = ChartPainter(data.head(100))
painter.density_heatmap(x='trip_duration', y='case:Amount', color='total_duration', agg='avg', nbins=(15, 20))
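
To make the binning and aggregation step concrete, here is a minimal sketch in plain Python that averages a value over a regular grid of bins, similar in spirit to `agg='avg'`. The function `density_grid` and the sample numbers are hypothetical. plotly chooses its own bin edges, so the real figure may group points differently.

```python
from collections import defaultdict

def density_grid(xs, ys, values, nbinsx, nbinsy):
    """Average `values` over a regular nbinsx-by-nbinsy grid of (x, y) bins."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)

    def bin_index(v, lo, hi, nbins):
        # Map v in [lo, hi] to 0..nbins-1; the maximum falls into the last bin.
        if hi == lo:
            return 0
        return min(int((v - lo) / (hi - lo) * nbins), nbins - 1)

    cells = defaultdict(list)
    for x, y, value in zip(xs, ys, values):
        key = (bin_index(x, x_min, x_max, nbinsx),
               bin_index(y, y_min, y_max, nbinsy))
        cells[key].append(value)

    # A count aggregation would use len(v) here; this sketch takes the mean.
    return {key: sum(v) / len(v) for key, v in cells.items()}

# Illustrative points only: two fall in the lower-left bin, two in the upper-right.
grid = density_grid(
    xs=[0, 1, 9, 10],
    ys=[0, 1, 9, 10],
    values=[2, 4, 10, 20],
    nbinsx=2,
    nbinsy=2,
)
print(grid)
```

Bins that receive no points are simply absent from the result, which corresponds to empty tiles in the figure.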

## Gantt Chart
[Back to TOC](#TOC)

A __Gantt chart__ is a type of bar chart that illustrates a process flow. This chart lists the tasks to be performed on the vertical axis, and time intervals on the horizontal axis. The width of the horizontal bars in the graph shows the duration of each activity.

`ChartPainter.gantt` parameters:
- __x_start *(str)*__ – name of the start date column to draw on the x-axis
- __x_end *(str)*__ – name of the end date column to draw on the x-axis
- __y *(str, default=None)*__ – name of the task column to draw on the y-axis
- __color *(str, default=None)*__ – name of the column used to set color to bars
- __subplots *((rows, cols, ncols), default=None)*__ – creates a set of subplots:
    - rows: name of the column to use for constructing subplots along the y-axis
    - cols: name of the column to use for constructing subplots along the x-axis
    - ncols: number of columns of the subplot grid
- __text *(str, default=None)*__ – name of the column to use as text labels in the figure
- __decimals *(int, default=2)*__ – number of decimal places to round __text__ labels of a float dtype to
- __opacity *(float, default=1)*__ – opacity of the bars. Ranges from 0 to 1
- __title *(str, default='auto')*__ – title of the graph. When 'auto', the title is generated automatically
- __height *(int, default=None)*__ – height of the figure in pixels
- __width *(int, default=None)*__ – width of the figure in pixels
- __font_size *(int, default=12)*__ – size of the global font
- __**kwargs _(optional)___ – see [`plotly.express.timeline`](https://plotly.com/python-api-reference/generated/plotly.express.timeline.html#plotly.express.timeline) for other possible arguments

In [46]:
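# Work on a copy so the event log in data_holder stays unchanged
data_gannt = data_holder.data.copy()
# Each event starts at its own timestamp and ends when the next row starts
data_gannt['start_time'] = data_gannt['time:timestamp']
data_gannt['end_time'] = data_gannt['time:timestamp'].shift(-1)
# Convert the duration from seconds to days
data_gannt['duration'] = data_gannt['duration'] / 3600 / 24

# Plot a single case; drop its last row, whose end_time does not belong to this case
painter = ChartPainter(data_gannt[data_gannt[data_holder.id_column] == 'declaration 10780'][:-1])
painter.gantt(x_start='start_time', x_end='end_time', y=data_holder.activity_column, text='duration')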
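
The start and end columns used above follow one idea: an activity lasts until the next event of the same case begins. The sketch below shows that pairing in plain Python for a single case. The activity names and timestamps are hypothetical, and the last event gets no bar because nothing follows it.

```python
from datetime import datetime

# Hypothetical events of one case, already sorted by time: (activity, timestamp).
events = [
    ('submit', datetime(2018, 1, 1, 0, 0)),
    ('approve', datetime(2018, 1, 2, 12, 0)),
    ('pay', datetime(2018, 1, 3, 12, 0)),
]

# Pair every event with its successor; the successor's timestamp is the end time.
intervals = [
    {
        'activity': activity,
        'start_time': start,
        'end_time': next_start,
        # Seconds converted to days, as in the cell above.
        'duration_days': (next_start - start).total_seconds() / 3600 / 24,
    }
    for (activity, start), (_, next_start) in zip(events, events[1:])
]

for interval in intervals:
    print(interval['activity'], interval['duration_days'])
```

When several cases are processed at once, the pairing has to be done within each case, otherwise the last event of one case would end at the first event of the next.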

## Pareto Chart
[Back to TOC](#TOC)

A __Pareto chart__ is a type of chart that contains both bars and a line graph, where individual values are represented in descending order by bars, and the cumulative total is represented by the line. The purpose of the Pareto chart is to highlight the most important among a (typically large) set of factors.

`ChartPainter.pareto` parameters:
- __x *(str)*__ – name of the column to make graph for
- __bins *(list of int or 'auto', default='auto')*__ – list of the x coordinates of the bars. If 'auto', bins are determined automatically
- __text *(bool, default=False)*__ – whether to show text labels in the figure
- __decimals *(int, default=2)*__ – number of decimal places to round cumulative percentage to
- __opacity *(float, default=0.8)*__ – opacity of the bars. Ranges from 0 to 1
- __edge *(bool, default=False)*__ – whether to draw bar edges
- __title *(str, default='auto')*__ – title of the graph. When 'auto', the title is generated automatically
- __height *(int, default=None)*__ – height of the figure in pixels
- __width *(int, default=None)*__ – width of the figure in pixels
- __font_size *(int, default=12)*__ – size of the global font

If __x__ is the name of a categorical variable, the __bins__ are its unique categories.

In [47]:
painter = ChartPainter(data)
painter.pareto(x='case:Permit OrganizationalEntity', text=True, edge=True)

If __x__ is the name of a continuous variable, the __bins__ are the intervals into which the input data is split. By default, they are determined automatically, but a list of values can also be passed to the __bins__ argument.

In [48]:
painter = ChartPainter(IdMetric(data_holder, time_unit='w'))
painter.pareto(x='total_duration', bins='auto')
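
The two parts of a Pareto chart, the sorted bars and the cumulative line, can be sketched for a categorical variable in a few lines of plain Python. The category labels are made up, and this is only an illustration of the computation, not the code `ChartPainter.pareto` runs.

```python
from collections import Counter

# Hypothetical categorical observations (illustrative only).
values = ['A', 'B', 'A', 'C', 'A', 'B', 'A', 'D', 'B', 'A']

# Bars: categories sorted by frequency in descending order.
counts = Counter(values).most_common()
total = sum(count for _, count in counts)

# Line: cumulative share of the total, rounded like decimals=2.
pareto = []
running = 0
for category, count in counts:
    running += count
    pareto.append((category, count, round(100 * running / total, 2)))

for row in pareto:
    print(row)
```

The cumulative column always ends at 100, and the point where it crosses a threshold such as 80 marks the categories that account for most of the observations.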

## Sankey Diagram
[Back to TOC](#TOC)

A __Sankey diagram__ is a data visualization type that depicts flows (of any kind) and their quantities in proportion to one another. It consists of nodes connected by lines or arrows. The width of each line depends on the amount of flow from source to target: the bigger the quantity, the wider the line. This diagram is helpful in showing diverse processes and their states.

`ChartPainter.sankey` parameters:
- __n *(int, default=10)*__ – number of the most frequent process traces to make graph for
- __sort_labels *(bool, default=False)*__ – whether to sort labels to rearrange nodes in the figure
- __colored_links *(bool, default=True)*__ – whether to set colors to links. If *False*, a translucent grey is used. If *True*, links are colored according to the source nodes
- __opacity *(float, default=0.5)*__ – opacity of the links. Ranges from 0 to 1
- __orientation *({'v', 'h'}, default='auto')*__ – orientation of the graph: 'v' for vertical and 'h' for horizontal
- __title *(str, default='auto')*__ – title of the graph. When 'auto', the title is generated automatically
- __height *(int, default=None)*__ – height of the figure in pixels
- __width *(int, default=None)*__ – width of the figure in pixels
- __font_size *(int, default=10)*__ – size of the global font
- __**kwargs _(optional)___ – see [`plotly.graph_objects.Sankey`](https://plotly.com/python-api-reference/generated/plotly.graph_objects.Sankey.html#plotly.graph_objects.Sankey) for other possible arguments

In [49]:
painter = ChartPainter(data_holder)
painter.sankey(n=10, sort_labels=False)
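
The following sketch shows one way the __n__ most frequent traces could be turned into weighted source-target links, which is the data a Sankey diagram draws. It is an assumption about the general approach rather than the library's implementation: the function `sankey_links` and the sample traces are hypothetical, and the real method may, for example, distinguish the same activity at different positions in a trace.

```python
from collections import Counter

def sankey_links(traces, n):
    """Sum the frequencies of consecutive activity pairs over the n most frequent traces."""
    top_traces = Counter(traces).most_common(n)
    links = Counter()
    for trace, frequency in top_traces:
        for source, target in zip(trace, trace[1:]):
            links[(source, target)] += frequency
    return links

# Hypothetical traces, one tuple of activities per case (illustrative only).
traces = (
    [('submit', 'approve', 'pay')] * 3
    + [('submit', 'reject')] * 2
    + [('submit', 'approve', 'reject')]
)

# Keep only the two most frequent traces; the third one is ignored.
links = sankey_links(traces, n=2)
for (source, target), value in links.items():
    print(source, '->', target, value)
```

The link value plays the role of the line width in the figure, so a larger __n__ adds rarer paths as thinner links.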