**Course**: Data Visualization (Prof. Dr. Heike Leitte, Jan-Tobias Sohns, TU Kaiserslautern),   **Name**: Ola Theo,   **Date**: 04.12.2022

<div class="alert alert-info">
    
# Assignment 2 - Good Chart Design
</div>

The **goals** of the second assignment are:
- Practice visualization design critiques using a given visualization.
- Decompose a given chart into its components and analyze their design.
- Practice visual encoding theory by detecting marks and channels.
- Design an information-rich chart.

To achieve these goals, your task is to analyze, critique, and revise a given visualization that depicts the received and handled tickets (technical issues).

<div class="alert alert-danger">

**Important**: While no points will be awarded for typing the correct answers in the notebooks, it is highly advised to solve the tasks thoroughly. They are designed to be encouraging and provide you with valuable learnings for the exam, understanding of the methods and practical coding.
</div>

<div class="alert alert-success">
    
All tasks in this notebook are marked in green.
</div>

<div class="alert alert-info">

## 1. Scenario and Starter Code
</div>

Imagine that you manage an information technology (IT) team. Your team receives tickets, or technical issues, from employees. In the past year, you've had two team members leave and decided at the time not to replace them. You have heard a rumbling of complaints from the remaining employees about having to "pick up the slack". You've just been asked about your hiring needs for the coming year and are wondering if you should hire a couple more people. First, you want to understand what impact the departure of individuals over the past year has had on your team's overall productivity. You plot the monthly trend of incoming tickets and those processed over the past calendar year. You see that there is some evidence your team's productivity is suffering from being short-staffed and now want to turn the quick-and-dirty visual you created into the basis for your hiring request.

Below is the code and chart of the initial ticket visualization. Read through the code and make sure that you understand it.

In [1]:
import pandas as pd
from bokeh.plotting import figure, show
from bokeh.models import ColumnDataSource, HoverTool, NumeralTickFormatter, Div, Title
from bokeh.io import output_notebook
from bokeh.layouts import column
from bokeh.transform import dodge

# Set up Bokeh for inline plotting in the notebook
output_notebook()

In [2]:
months = [i for i in range(1,13)]
received = [160,184,241,149,180,161,132,202,160,139,149,177]
processed = [160,184,237,148,181,150,123,156,126,121,124,140]

df = pd.DataFrame({'month': months,
                   'received': received,
                   'processed': processed})

In [3]:
p = figure(title="TICKET TREND", height=400)
p.vbar(source=df, x=dodge('month', -0.175, range=p.x_range), top='received', width=0.3,
       legend_label="Ticket Volume Received")
p.vbar(source=df, x=dodge('month',  0.175, range=p.x_range), top='processed', width=0.3,
       color="deeppink", legend_label="Ticket Volume Processed")

# remove the toolbar
p.toolbar.logo = None
p.toolbar_location = None

p.title.align = "center"                 # Center the title horizontally
p.title.vertical_align = 'top'           # Align the title to the top
p.title.text = p.title.text.upper()      # Convert the title to upper case
p.title.text_font_size = '18px'          # Increase the font size
p.title.text_color = 'steelblue'

show(p)
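
The scenario's claim that productivity is suffering can be checked with a quick calculation on the starter data. The sketch below compares the share of received tickets that were processed in two parts of the year. The helper `processing_rate` and the Jan-May / Jun-Dec split are assumptions made for illustration only; the scenario does not say when the two team members left.

```python
# Starter data from In [2]
received = [160, 184, 241, 149, 180, 161, 132, 202, 160, 139, 149, 177]
processed = [160, 184, 237, 148, 181, 150, 123, 156, 126, 121, 124, 140]


def processing_rate(rec, proc):
    """Share of received tickets that were processed (1.0 = everything handled)."""
    return sum(proc) / sum(rec)


# Hypothetical split: Jan-May vs. Jun-Dec is only an illustrative cut,
# not a date taken from the scenario.
first_part = processing_rate(received[:5], processed[:5])
second_part = processing_rate(received[5:], processed[5:])

print(f"Jan-May: {first_part:.1%} of received tickets processed")
print(f"Jun-Dec: {second_part:.1%} of received tickets processed")
```

A clear drop between the two values would support the message that the team is falling behind, which is the point the revised chart should make visible.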

<div class="alert alert-info">
    
## 2. Analysis of the original chart
</div>

### Message

<div class="alert alert-success">
    
Write down the message you want to convey with your chart
    
</div>

<div class="alert alert-warning">

I want to show the monthly trend of incoming and processed tickets over the past calendar year and how it reflects the team's productivity.
</div>

### Design critique

<div class="alert alert-success">
   
- How well does the current design support the message you're trying to make?
- Can you spot problems with the current design?
    
</div>

<div class="alert alert-warning">

1. How well does the current design support the message you're trying to make?
<p>i.      It captures the monthly counts of tickets received and processed over the past calendar year. </p>
<p>ii.     It makes it easy to compare the two values within a month. </p>

    
    
2. Can you spot problems with the current design? Yes.
<p>i.      It does not show the size of the workforce, i.e., the number of team members in each month.  </p>
<p>ii.     It does not show the average workload per team member.  </p>
<p>iii.    In summary, it does not relate received and processed tickets to the available workforce. Hence, it lacks key assumptions, causes, and impact. </p>
    
</div>

### Analyze the chart design
Now go through each of the elements of the chart and check if the design is suitable.

<div class="alert alert-success">
    
Make a list of all visible elements of the chart and problems with the design.
    
</div>

<div class="alert alert-warning">

- ELEMENTS
- legend
- major tick
- minor tick
- x-axis
- y-axis
- data (the bars)
- origin
- title
- ...
    
    
- PROBLEMS
- no x-axis label
- no y-axis label
- no caption
- no annotation(s)
- In summary, it does not relate received and processed tickets to the available workforce. It is also much harder to read the temporal evolution of the time series than it would be with a regular line chart. Hence, it lacks key assumptions, causes, and impact.
    
</div>

<div class="alert alert-info">
    
## 3. Improved design of helper elements
</div>

A major problem of the current chart is that it is too cluttered and full of competing information.
<div class="alert alert-success">
    
Improve all chart elements of the list above except for the data representation (bar elements) to achieve a better "background" design.

</div>

In [4]:
# --- 1. Data Preparation and Enhancement ---
# Ticket data for the revised charts (note: these values differ from the starter data in In [2])
data = {
    'months': ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'],
    'received': [200, 220, 250, 230, 240, 260, 280, 270, 260, 290, 300, 320],
    'processed': [180, 210, 230, 220, 230, 250, 260, 260, 250, 280, 290, 300]
}
df = pd.DataFrame(data)

# Create new, more insightful data columns
df['difference'] = df['received'] - df['processed']
df['open_tickets'] = df['difference'].cumsum()
# Add a categorical color based on whether there was a surplus or deficit in processing
df['bar_color'] = ['#d7191c' if x > 0 else '#1a9641' for x in df['difference']] # Red for deficit, Green for surplus
df['remark'] = ['More received than processed' if x > 0 else 'More processed than received' for x in df['difference']]

source = ColumnDataSource(df)

<div class="alert alert-info">
    
## 4. Data Encoding
</div>

### Marks and channels

<div class="alert alert-success"> What are the marks and channels used to encode data? Write down the information as discussed in the lecture. </div>


<div class="alert alert-warning">

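- MARKS:
- Lines (bars), one per month and ticket type
    
    
- CHANNELS:
- Horizontal position (month)
- Vertical length (number of tickets)
- Color hue (received vs. processed)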
</div>

### Creating new columns

The current dataframe only contains the raw data. Often you need to process the data further to obtain suitable input for your chart. Read the [ten minutes to pandas](https://pandas.pydata.org/pandas-docs/stable/getting_started/10min.html) introduction to pandas programming.

Now you are able to do the following dataframe operations:

In [5]:
import numpy as np

a = [int(i*10) for i in np.random.random(6)]
b = [int(i*10) for i in np.random.random(6)]

df_test = pd.DataFrame({"A": a, "B": b})

# select a column
df_test['A']

# create a new column by adding two columns
df_test['A+B'] = df_test['A'] + df_test['B']

# create a new column by applying a function to an existing column
df_test['A+1'] = df_test['A'].apply(lambda x: x+1)
df_test['Size(A)'] = df_test['A'].apply(lambda x: "S" if x < 5 else "M")

df_test

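|   | A | B | A+B | A+1 | Size(A) |
|---|---|---|-----|-----|---------|
| 0 | 7 | 4 | 11 | 8 | M |
| 1 | 1 | 9 | 10 | 2 | S |
| 2 | 3 | 2 | 5 | 4 | S |
| 3 | 7 | 5 | 12 | 8 | M |
| 4 | 0 | 5 | 5 | 1 | S |
| 5 | 2 | 8 | 10 | 3 | S |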


<div class="alert alert-success">
    
Use these dataframe operations to create the following columns in the ticket dataframe:
- the difference between received and processed tickets
- the total number of open tickets, i.e., the sum of all tickets that have not been handled yet. See method [`cumsum()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.cumsum.html) for a pandas Series.
    
</div>
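
A minimal sketch of these two operations follows. It uses the starter data from In [2] under a separate, hypothetical name `tickets`, so that the notebook's `df` is left untouched. It assumes that the backlog is zero at the start of the year, which the scenario does not state.

```python
import pandas as pd

# Starter data from In [2], copied here so the sketch is self-contained
tickets = pd.DataFrame({
    "month": list(range(1, 13)),
    "received": [160, 184, 241, 149, 180, 161, 132, 202, 160, 139, 149, 177],
    "processed": [160, 184, 237, 148, 181, 150, 123, 156, 126, 121, 124, 140],
})

# Monthly gap: positive values mean more tickets came in than were handled
tickets["difference"] = tickets["received"] - tickets["processed"]

# Running total of the monthly gaps (assumption: no open tickets before January)
tickets["open_tickets"] = tickets["difference"].cumsum()

print(tickets[["month", "difference", "open_tickets"]])
```

A negative difference, as in month 5, means the team processed more tickets than it received and therefore reduces the running total.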

In [6]:
# view the added columns
df

|   | months | received | processed | difference | open_tickets | bar_color | remark |
|---|--------|----------|-----------|------------|--------------|-----------|--------|
| 0 | Jan | 200 | 180 | 20 | 20 | #d7191c | More received than processed |
| 1 | Feb | 220 | 210 | 10 | 30 | #d7191c | More received than processed |
| 2 | Mar | 250 | 230 | 20 | 50 | #d7191c | More received than processed |
| 3 | Apr | 230 | 220 | 10 | 60 | #d7191c | More received than processed |
| 4 | May | 240 | 230 | 10 | 70 | #d7191c | More received than processed |
| 5 | Jun | 260 | 250 | 10 | 80 | #d7191c | More received than processed |
| 6 | Jul | 280 | 260 | 20 | 100 | #d7191c | More received than processed |
| 7 | Aug | 270 | 260 | 10 | 110 | #d7191c | More received than processed |
| 8 | Sep | 260 | 250 | 10 | 120 | #d7191c | More received than processed |
| 9 | Oct | 290 | 280 | 10 | 130 | #d7191c | More received than processed |


### Alternative data encodings
<div class="alert alert-success"> Now change the presentation of the data. Select two new, suitable chart types for the ticket data. Stick to chart types that use the same x- and y-axis as defined above. Implement (at least) two different data encodings. What is your favorite? </div>

In [7]:
p = figure(title="TICKET TREND", height=400, x_axis_label="Months", y_axis_label="Tickets")

# Line thickness is set with line_width
p.line(months, received, color='steelblue', line_width=2, legend_label="Ticket Volume Received")
p.line(months, processed, color='deeppink', line_width=2, legend_label="Ticket Volume Processed")

# remove the toolbar
p.toolbar.logo = None
p.toolbar_location = None

p.title.align = "center"                 # Center the title horizontally
p.title.vertical_align = 'top'           # Align the title to the top
p.title.text = p.title.text.upper()      # Convert the title to upper case
p.title.text_font_size = '18px'          # Increase the font size
p.title.text_color = 'steelblue'

show(p)

In [8]:
p = figure(title="TICKET TREND", height=400, x_axis_label="Months", y_axis_label="Tickets")

# Line thickness is set with line_width
p.step(months, received, color='steelblue', line_width=2, legend_label="Ticket Volume Received", mode='center')
p.step(months, processed, color='deeppink', line_width=2, legend_label="Ticket Volume Processed", mode='center')

# remove the toolbar
p.toolbar.logo = None
p.toolbar_location = None

p.title.align = "center"                 # Center the title horizontally
p.title.vertical_align = 'top'           # Align the title to the top
p.title.text = p.title.text.upper()      # Convert the title to upper case
p.title.text_font_size = '18px'          # Increase the font size
p.title.text_color = 'steelblue'

show(p)

In [9]:
# --- 2. Create the Improved Figure ---
# The main change is to create two plots that share an x-axis for a clear, focused narrative.

# Define a custom HoverTool for rich, interactive feedback
# This tool will be shared between both plots
hover = HoverTool(
    tooltips=[
        ("Month", "@months"),
        ("Tickets Received", "@received"),
        ("Tickets Processed", "@processed"),
        ("Monthly Difference", "@difference{+0}"),
        ("Total Open Tickets", "@{open_tickets}"),
        ("Status", "@remark")
    ],
    mode='vline' # Highlight all points on a vertical line
)

# PLOT 1: Monthly Ticket Volume (Received vs. Processed)
p1 = figure(
    x_range=df['months'],
    height=300,
    title="Monthly Ticket Volume: Received vs. Processed",
    tools="pan,wheel_zoom,box_zoom,reset,save",
    toolbar_location="above"
)
p1.add_tools(hover)

# Use bars for monthly counts, which is more accurate for discrete periods
p1.vbar(x='months', top='received', source=source, width=0.8, fill_color="#43a2ca", line_color="white", legend_label="Tickets Received")
p1.vbar(x='months', top='processed', source=source, width=0.5, fill_color="#a8ddb5", line_color="white", legend_label="Tickets Processed")

# --- Styling for Plot 1 ---
p1.y_range.start = 0  # Start y-axis at 0 for accurate comparison
p1.xgrid.grid_line_color = None
p1.yaxis.axis_label = "Number of Tickets"
p1.legend.location = "top_left"
p1.legend.orientation = "horizontal"
p1.legend.background_fill_alpha = 0.7
p1.legend.border_line_alpha = 0


show(p1)

In [10]:
# PLOT 2: Monthly Difference and Cumulative Open Tickets
p2 = figure(
    x_range=p1.x_range, # Link the x-axis to the first plot
    height=250,
    title="Ticket Backlog Analysis",
    tools="pan,wheel_zoom,box_zoom,reset,save",
    toolbar_location=None # Hide toolbar to avoid redundancy
)
p2.add_tools(hover)

# Use colored bars to clearly show the monthly surplus or deficit
p2.vbar(x='months', top='difference', source=source, width=0.6,
        color='bar_color', alpha=0.7, legend_label="Monthly Difference")

# Add a line to show the cumulative trend of open (unprocessed) tickets
p2.line(x='months', y='open_tickets', source=source, line_width=3, color="#fdae61", legend_label="Cumulative Open Tickets")
p2.scatter(x='months', y='open_tickets', source=source, size=8, color="#fdae61", legend_label="Cumulative Open Tickets")


# --- Styling for Plot 2 ---
p2.xaxis.axis_label = "Month (2022)"
p2.yaxis.axis_label = "Ticket Count"
p2.legend.location = "top_left"
p2.legend.orientation = "horizontal"
p2.legend.background_fill_alpha = 0.7
p2.legend.border_line_alpha = 0

show(p2)



<div class="alert alert-info">
    
## 5. Revise the chart
</div>

<div class="alert alert-success">
    
Now combine the results from the previous steps and create a final chart that you would send to your boss to ask for additional staff. Also think about adding additional information to your chart with labels etc.
</div>
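
One way to add information with a label is to annotate the month with the largest gap between received and processed tickets. The sketch below is an illustration under stated assumptions, not the required solution: it uses the starter data from In [2] with month names added, the names `p_annotated`, `worst_idx`, `worst_month` and `worst_gap` are hypothetical, and the running backlog is assumed to start at zero.

```python
from itertools import accumulate

from bokeh.models import Label
from bokeh.plotting import figure

# Starter data from In [2]; month names are added here for a categorical axis
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
received = [160, 184, 241, 149, 180, 161, 132, 202, 160, 139, 149, 177]
processed = [160, 184, 237, 148, 181, 150, 123, 156, 126, 121, 124, 140]

difference = [r - d for r, d in zip(received, processed)]
open_tickets = list(accumulate(difference))  # running backlog, assumed to start at 0

# Month with the largest shortfall (received minus processed)
worst_idx = max(range(len(months)), key=lambda i: difference[i])
worst_month, worst_gap = months[worst_idx], difference[worst_idx]

p_annotated = figure(x_range=months, height=300, title="Open tickets (cumulative)")
p_annotated.line(x=months, y=open_tickets, line_width=2, color="steelblue")

# On a categorical axis the i-th factor sits at the synthetic coordinate i + 0.5
label = Label(x=worst_idx + 0.5, y=open_tickets[worst_idx],
              text=f"Largest monthly shortfall: {worst_month} ({worst_gap} tickets)",
              x_offset=8, y_offset=-4, text_font_size="10pt")
p_annotated.add_layout(label)

# show(p_annotated)  # uncomment in the notebook to render the chart
```

Deriving the label text from the data keeps the annotation correct if the ticket numbers change.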

In [11]:
# Define the list of month strings for the categorical x-axis
months_list = df['months'].tolist()

TOOLS = [hover, "pan,wheel_zoom,box_zoom,reset,save"]

p = figure(x_range=months_list, y_range=(0, 350), title="TICKET TREND ANALYSIS",
           height=400, width=900, tools=TOOLS)

# Markers for the monthly difference, colored by the 'bar_color' column created in In [4]
p.scatter("months", "difference", size=12, source=source,
          color='bar_color', line_color="black", alpha=0.9)

# Read x and y from the shared source so the steps line up with the categorical month axis
p.step("months", "received", source=source, color='steelblue', line_width=2,
       legend_label="Ticket Volume Received", mode='center')
p.step("months", "processed", source=source, color='deeppink', line_width=2,
       legend_label="Ticket Volume Processed", mode='center')

p.title.align = "center"                 # Center the title horizontally
p.title.vertical_align = 'top'           # Align the title to the top
p.title.text = p.title.text.upper()      # Convert the title to upper case
p.title.text_font_size = '18px'          # Increase the font size
p.title.text_color = 'steelblue'

p.toolbar.logo = "grey"
p.background_fill_color = "#efefef"
p.xaxis.axis_label = "months"
p.yaxis.axis_label = "tickets"
p.grid.grid_line_color = "white"
p.hover.tooltips = [
    ("Tickets Received", "@received"),
    ("Tickets Processed", "@processed"),
    ("Difference", "@difference"),
    ("Open Tickets", "@{open_tickets}"),
    ("Remark", "@remark")
]

show(p)


**Caption**:

As we learned: Each figure needs a caption. <div class="alert alert-success"> Write a caption for your figure. Position it underneath your chart so you can take a screenshot of chart + caption. </div>

In [12]:
# y_range matches the chart in In [11] so that values above 250 are not cut off
p = figure(x_range=months_list, y_range=(0, 350), height=400, width=900, tools=TOOLS)

# Markers for the monthly difference, colored by the 'bar_color' column created in In [4]
p.scatter("months", "difference", size=12, source=source,
          color='bar_color', line_color="black", alpha=0.9)

# Read x and y from the shared source so the steps line up with the categorical month axis
p.step("months", "received", source=source, color='steelblue', line_width=2,
       legend_label="Ticket Volume Received", mode='center')
p.step("months", "processed", source=source, color='deeppink', line_width=2,
       legend_label="Ticket Volume Processed", mode='center')

# Caption placed below the plot area
p.add_layout(Title(text='Fig: Ticket Trend: a) Comparison between received & processed tickets.  b) Difference between received & processed tickets.',
                   align = 'center', text_color = 'black', text_font_size = '14px'), 'below')

p.toolbar.logo = "grey"
p.background_fill_color = "#efefef"
p.xaxis.axis_label = "months"
p.yaxis.axis_label = "tickets"
p.grid.grid_line_color = "white"
p.hover.tooltips = [
    ("Tickets Received", "@received"),
    ("Tickets Processed", "@processed"),
    ("Difference", "@difference"),
    ("Open Tickets", "@{open_tickets}"),
    ("Remark", "@remark")
]

show(p)


**Alternative Solution**

In [13]:
# --- 3. Combine and Show ---
# Create a Div widget for the main title using HTML
main_title = Div(text="""
<h2 style="text-align:center; font-family: sans-serif;">
  Help Desk Ticket Analysis for 2022
</h2>
""", sizing_mode="stretch_width")

# Arrange the title and plots vertically in the column layout
final_layout = column(main_title, p1, p2)

show(final_layout)