**Course**: Data Visualization (Prof. Dr. Heike Leitte, Jan-Tobias Sohns, TU Kaiserslautern),   **Name**: XXX XXX,   **Date**: XX.XX.2020

# Comments regarding assignments

Each assignment consists of **two pieces**:
1. A jupyter notebook with practical exercises.
2. An OLAT questionnaire with questions on the lecture material and the notebook.

Modalities for credit points:
- To qualify for the exam (Prüfungsvoraussetzung), you have to obtain at least 80% of the points in each assignment.
- Points are awarded only through the OLAT questionnaire. Many questions relate to material you learned or practiced in the notebook.
- While a questionnaire is open, you can retake it until you have enough points to pass.

**Submission instructions**:
- Finish the practical exercises in the notebook.
- Fill in the OLAT questionnaire (which includes the submission of an HTML export of the notebook).
- The submission of the notebook is mandatory.
- No group work allowed. You may discuss strategies and solutions, but every student has to do their own implementation.

<div class="alert alert-info">
    
# Assignment 1 - Visualizing Data
</div>

The **goals** of the first assignment are:
- Get familiar with Python programming in the Jupyter notebook.
- Be able to create a data visualization using bokeh.
- Recreate an existing visualization and develop an eye for key features.
- Start thinking critically about design options.



To achieve these goals, your task is to create a visualization of the weather in Kaiserslautern in 2018. The visualization should be similar to the following chart from the New York Times (Jan. 11, 1981, p. 32; Tufte (1983), p. 30) and needs to be implemented in bokeh+pandas:

![New York city's weather for 1980 from the New York Times](http://euclid.psych.yorku.ca/SCS/Gallery/images/NYweather.jpg)


<div class="alert alert-danger">

**Important**: No points are awarded for the answers you enter in the notebook, but you are strongly advised to work through the tasks thoroughly. They are designed to prepare you for the exam, deepen your understanding of the methods, and give you practical coding experience.
</div>

<div class="alert alert-success">
    
All tasks in this notebook are marked in green.
</div>

<div class="alert alert-info">
    
## 1. Starter Code - Minimal working example
</div>

The following pieces of code load the data for this assignment and generate a minimal chart for the temperature data. More details can be found in the [bokeh documentation](https://docs.bokeh.org/en/latest/docs/user_guide/quickstart.html).

First load all necessary python modules:

In [1]:
import pandas as pd

from bokeh.plotting import figure, output_notebook, show
from bokeh.models import Band, ColumnDataSource, PrintfTickFormatter, DatetimeTickFormatter, Label, Title
from bokeh.layouts import column
from bokeh.models.tickers import MonthsTicker
from bokeh.transform import dodge
from bokeh.models import LinearAxis, Range1d, Line

output_notebook()

Load the data given in csv-file format using the pandas library and display the first lines of the data table.

In [2]:
df_kl = pd.read_csv('KLweather2018.csv', parse_dates=['Timestamp'], index_col='Timestamp')
df_kl_prec = pd.read_csv('KLweather2018_monthlyPrecipitation.csv', parse_dates=['Timestamp'], index_col='Timestamp')


# --- Data Cleaning ---
# The first entry in KLweather2018.csv is duplicated. We remove it.
df_kl = df_kl.loc[~df_kl.index.duplicated(keep='first')]

df_kl.head()

Timestamp,temp_min,temp_max,temp_normal_min,temp_normal_max,rel_humidity
2018-01-01,4.8,9.3,-0.031034,4.875862,79.75
2018-01-02,4.3,6.4,-0.3,4.996552,83.58
2018-01-03,5.4,10.7,0.310345,5.182759,83.46
2018-01-04,4.8,12.4,-0.351724,5.027586,90.5
2018-01-05,7.6,9.7,0.22069,5.182759,88.42
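
The cleaning step in the cell above can be tried out in isolation. The following is a minimal sketch on hypothetical toy data (the `toy` frame and its values are made up for illustration and are not taken from `KLweather2018.csv`); it shows how `index.duplicated(keep='first')` builds a boolean mask that is then inverted to keep only the first row per timestamp.

```python
import pandas as pd

# Hypothetical toy data: the first day appears twice, mimicking the duplicated first entry
toy = pd.DataFrame(
    {"temp_min": [4.8, 4.8, 4.3], "temp_max": [9.3, 9.3, 6.4]},
    index=pd.to_datetime(["2018-01-01", "2018-01-01", "2018-01-02"]),
)
toy.index.name = "Timestamp"

# duplicated(keep="first") flags every repeat of an index value except its first occurrence
is_repeat = toy.index.duplicated(keep="first")

# Inverting the mask keeps exactly one row per timestamp
toy_clean = toy.loc[~is_repeat]

print(is_repeat.tolist())  # [False, True, False]
print(len(toy_clean))      # 2
```

Filtering on the index rather than calling `drop_duplicates()` on the columns is a deliberate choice here: two different days with identical measurements would otherwise be collapsed as well.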


Plot the temperature minimum as a line chart with bokeh using default settings.

In [3]:
# create a figure
p = figure(height=400, x_axis_type="datetime")

# define the type of glyph that is rendered and its data. here: a polyline
p.line(source=df_kl, x='Timestamp', y='temp_min', line_width = 4)

# render the chart
show(p)

<div class="alert alert-info">
    
## 2. Customizing the temperature chart
</div>

As detailed above, your visualization should look like a modern version of the one from the New York Times. This can be achieved by changing the graphical elements and styling visual properties. In the function below some elements are already changed. Update the code to make the temperature chart even more similar:

<div class="alert alert-success">
    
- Depict the normal high and low temperatures as polylines.
- Label the two polylines. You may use the legend functionality.
- Depict the daily temperature range as an area.
- Label the y-axis.
- Style visual attributes (color, line style) to your liking.
    
</div>

Helpful resources:
- [Plotting with basic glyphs](https://docs.bokeh.org/en/latest/docs/user_guide/plotting.html) - Overview of glyph types that are implemented in bokeh; see the examples for all the graphical primitives that can be plotted directly.
- [Styling visual attributes](https://docs.bokeh.org/en/latest/docs/user_guide/styling.html) - See styling options for chart elements

**2. Enhanced Temperature Chart** <br>
This function creates the main temperature chart. It visualizes the daily temperature range as a shaded area and overlays the normal high and low temperatures as dashed lines. It also automatically finds and annotates the hottest and coldest days of the year.

In [4]:
def create_temperature_chart(df, width=900):
    '''Creates an enhanced Bokeh figure for temperature data.'''

    source = ColumnDataSource(df)

    # --- Create the Figure ---
    # We set the title, tools, and y-axis range
    p = figure(
        width=width, height=400,
        title="Kaiserslautern's Weather in 2018",
        x_axis_type="datetime",
        x_axis_location="above",
        y_range=(-20, 40),
        tools="pan,wheel_zoom,box_zoom,reset,save"
    )

    # --- Add Glyphs for Temperature Ranges ---
    # Daily temperature range (shaded area between max and min)
    p.varea(
        x='Timestamp', y1='temp_min', y2='temp_max', source=source,
        fill_color='lightgray', fill_alpha=0.5, legend_label="Daily Temperature Range"
    )

    # Normal low and high temperatures (dashed lines)
    p.line(
        x='Timestamp', y='temp_normal_min', source=source,
        line_dash='dashed', line_color='blue', line_width=1.5,
        legend_label="Normal Low"
    )
    p.line(
        x='Timestamp', y='temp_normal_max', source=source,
        line_dash='dashed', line_color='red', line_width=1.5,
        legend_label="Normal High"
    )

    # --- Find and Annotate Extreme Temperatures ---
    # Find the index of the hottest and coldest days
    tmax_date = df['temp_max'].idxmax()
    tmin_date = df['temp_min'].idxmin()

    tmax_val = df.loc[tmax_date, 'temp_max']
    tmin_val = df.loc[tmin_date, 'temp_min']

    # Create annotations (Labels) for these points
    max_label = Label(
        x=tmax_date, y=tmax_val,
        text=f"Hottest: {tmax_val}°C",
        x_offset=10, y_offset=-10,
        background_fill_color='white', background_fill_alpha=0.7
    )
    min_label = Label(
        x=tmin_date, y=tmin_val,
        text=f"Coldest: {tmin_val}°C",
        x_offset=10, y_offset=10,
        background_fill_color='white', background_fill_alpha=0.7
    )

    p.add_layout(max_label)
    p.add_layout(min_label)

    # --- Styling the Plot ---
    p.title.align = "center"
    p.title.text_font_size = "16pt"

    # Y-axis styling
    p.yaxis.axis_label = "Temperature [°C]"
    p.yaxis.formatter = PrintfTickFormatter(format="%d°C")

    # X-axis styling to show months
    p.xaxis.ticker = MonthsTicker(months=list(range(12)))
    p.xaxis.formatter = DatetimeTickFormatter(months="%b")
    p.xgrid.ticker = MonthsTicker(months=list(range(12)))
    p.xgrid.grid_line_color = 'gray'
    p.xgrid.grid_line_alpha = 0.2

    # Legend styling
    p.legend.location = "top_left"
    p.legend.background_fill_alpha = 0.7
    p.legend.border_line_alpha = 0

    return p

# Create and show the temperature chart
p_temp = create_temperature_chart(df_kl)
show(p_temp)

In [5]:
p = figure(width=400, height=400)

# multi_line takes a list of x-lists and a list of y-lists, one pair per polyline
xs = [[1, 3, 2], [3, 4, 6, 6]]
ys = [[2, 1, 4], [4, 7, 8, 5]]
p.multi_line(xs, ys,
             color=["firebrick", "navy"], alpha=[0.8, 0.3], line_width=4)

show(p)

<div class="alert alert-info">
    
## 3. Filtering data and making annotations
</div>

The following piece of code demonstrates how to find maxima in a data column. Use this code to automatically find the highest and lowest temperature values in 2018 and place a mark in the chart above at these positions (e.g. circle the respective data points).

<div class="alert alert-success">
    
- Automatically filter the highest and lowest temperatures in Kaiserslautern in 2018.
- Integrate the code in the chart computation method above and mark the two detected positions.
- Add text labels to the positions. [Label documentation](https://docs.bokeh.org/en/latest/docs/user_guide/annotations.html#labels) for bokeh.
    
</div>
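
One possible way to approach these tasks is sketched below. It is only an illustration under stated assumptions: the `toy` frame is hypothetical stand-in data for `df_kl`, and the helper source `extremes` with its columns `temp` and `label_text` is made up for this example. The sketch finds the two extremes with `idxmax`/`idxmin`, draws open circles around them with `scatter`, and attaches text with a `LabelSet`.

```python
import pandas as pd
from bokeh.models import ColumnDataSource, LabelSet
from bokeh.plotting import figure

# Hypothetical toy data standing in for df_kl
toy = pd.DataFrame(
    {"temp_min": [-2.0, -14.0, 3.0, 12.0], "temp_max": [4.0, -1.0, 15.0, 35.5]},
    index=pd.to_datetime(["2018-01-15", "2018-02-28", "2018-04-10", "2018-08-04"]),
)
toy.index.name = "Timestamp"

# idxmax/idxmin return the index label (here: the date) of the extreme value
tmax_date = toy["temp_max"].idxmax()
tmin_date = toy["temp_min"].idxmin()

# Collect both extremes in one small data source: position plus label text
extremes = ColumnDataSource(data=dict(
    Timestamp=[tmax_date, tmin_date],
    temp=[toy.at[tmax_date, "temp_max"], toy.at[tmin_date, "temp_min"]],
    label_text=["Hottest", "Coldest"],
))

p = figure(height=300, x_axis_type="datetime")
p.line(x="Timestamp", y="temp_max", source=ColumnDataSource(toy))

# Open circles around the two detected data points
p.scatter(x="Timestamp", y="temp", source=extremes, size=14,
          fill_alpha=0, line_color="black", line_width=2)

# One text label per row of the helper source
p.add_layout(LabelSet(x="Timestamp", y="temp", text="label_text",
                      x_offset=10, y_offset=0, source=extremes))
```

Keeping the extremes in their own `ColumnDataSource` means the markers and the labels read from the same rows, so they cannot drift apart if the data changes.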

In [6]:
from bokeh.models import ColumnDataSource, Label, LabelSet, Range1d, LinearAxis
from bokeh.plotting import figure, show
from bokeh.io import output_notebook

# Use output_notebook() for inline plotting in Jupyter environments
output_notebook()

# --- Data Preparation ---
# Using a ColumnDataSource is the standard way to pass data to Bokeh plots
source = ColumnDataSource(data=dict(
    height=[66, 71, 72, 68, 58, 62],
    weight=[165, 189, 220, 141, 260, 174],
    names=['Mark', 'Amir', 'Matt', 'Greg', 'Owen', 'Juan']
))

# --- Create the Figure ---
# Fixed x-range and the standard interactive tools
p = figure(
    title='Distribution of 10th Grade Students',
    x_range=Range1d(140, 275),
    tools="pan,wheel_zoom,box_zoom,reset,save",
    width=700,
    height=500
)

# --- Styling ---
# Center the title and label both axes
p.title.align = "center"
p.title.text_font_size = "16pt"
p.xaxis.axis_label = 'Weight (lbs)'
p.yaxis.axis_label = 'Height (in)'

# --- Add Glyphs ---
# Add the scatter plot glyphs to the figure
p.scatter(
    x='weight',
    y='height',
    size=10,        # marker size in screen units
    source=source,
    legend_label="Students",
    fill_color="royalblue",
    alpha=0.6
)

# --- Add Annotations ---
# The 'render_mode' attribute has been removed in modern Bokeh versions.
# LabelSet automatically adds a label for each point in the data source.
labels = LabelSet(
    x='weight', y='height', text='names', x_offset=5, y_offset=5,
    source=source, text_font_size='10pt'
)

# A single Label can be used for a general citation or note.
citation = Label(
    x=10, y=10, x_units='screen', y_units='screen',
    text='Source: Fictional data collected by L.C. 2016-04-01',
    border_line_color='black',
    background_fill_color='white',
    background_fill_alpha=0.7
)

p.add_layout(labels)
p.add_layout(citation)

# Improve legend placement
p.legend.location = "top_left"
p.legend.click_policy = "hide"

# Display the plot
show(p)

In [7]:
import pandas as pd
from bokeh.plotting import figure, show
from bokeh.models import ColumnDataSource, LabelSet
from bokeh.io import output_notebook

# Use output_notebook() for inline plotting in Jupyter environments
output_notebook()

# --- Data Preparation ---
# It's good practice to start with a pandas DataFrame
source_df = pd.DataFrame(
    dict(
        off_rating=[66, 71, 72, 68, 58, 62],
        def_rating=[165, 189, 220, 141, 260, 174],
        names=['Mark', 'Amir', 'Matt', 'Greg', 'Owen', 'Juan']
    )
)
# Convert the DataFrame to a ColumnDataSource for Bokeh
source = ColumnDataSource(source_df)

# --- Create the Figure ---
# The bokeh.charts API is deprecated. The standard way is to use bokeh.plotting.figure
# and add glyphs like p.scatter().
p = figure(
    title='Offensive vs. Defensive Efficiency',
    tools="pan,wheel_zoom,box_zoom,reset,save,hover",
    tooltips=[("Name", "@names"), ("Offensive", "@off_rating"), ("Defensive", "@def_rating")],
    width=700,
    height=500
)

# --- Add Glyphs ---
p.scatter(
    x='off_rating',
    y='def_rating',
    source=source,
    size=12,
    color='navy',
    alpha=0.6,
    legend_label="Players"
)

# --- Add Annotations ---
# The 'render_mode' and 'level' attributes are no longer needed for LabelSet
labels = LabelSet(
    x='off_rating',
    y='def_rating',
    text='names',
    x_offset=5,
    y_offset=5,
    source=source,
    text_font_size='10pt'
)

p.add_layout(labels)

# --- Styling ---
p.title.align = "center"
p.title.text_font_size = "16pt"
p.xaxis.axis_label = 'Offensive Rating'
p.yaxis.axis_label = 'Defensive Rating'
p.legend.location = "top_left"
p.legend.click_policy = "hide"

# Display the plot
show(p)

In [8]:
tmax_id = df_kl['temp_max'].idxmax()
tmin_id = df_kl['temp_min'].idxmin()
print("KL temperature maximum:", tmax_id, df_kl.at[tmax_id,'temp_max'])
print("KL temperature minimum:", tmin_id, df_kl.at[tmin_id,'temp_min'])

KL temperature maximum: 2018-08-04 00:00:00 35.5
KL temperature minimum: 2018-02-28 00:00:00 -14.0


<div class="alert alert-info">

## 4. Designing additional charts
</div>

Now design the charts for precipitation and relative humidity.

<div class="alert alert-success">
    
- Create the chart for precipitation. Try to design a bar chart using the hints below.
- Create the chart for humidity.
    
</div>

Hints for temporal x-axis:
- **Width of bars**: On a datetime axis the width is given in milliseconds. To get the required scaling, specify the width like this: `widthInDays = ndays*24*60*60*1000` (24 hours * 60 minutes * 60 seconds * 1000 milliseconds).
- **Position of bars**: You can shift the bars using the dodge function `x=dodge('Timestamp', value, range=p.x_range)`, where the first argument names the column that holds the x-positions. Keep in mind that you need to define an appropriate `value` (also in milliseconds) by which to shift the bar.
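
The two hints can be combined as in the following sketch. It rests on assumptions: `ndays = 10` and the `toy` values are hypothetical, and the `range` argument of `dodge` is left out because, as far as the bokeh documentation describes it, it is only needed for categorical ranges. The field passed to `dodge` is the column that holds the x-positions of the bars.

```python
from datetime import timedelta

import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure
from bokeh.transform import dodge

MS_PER_DAY = 24 * 60 * 60 * 1000  # a datetime axis counts in milliseconds

ndays = 10  # hypothetical width of a bar pair in days
width_ms = ndays * MS_PER_DAY

# Cross-check the arithmetic against the standard library
assert width_ms == timedelta(days=ndays).total_seconds() * 1000

# Hypothetical monthly values, time-stamped mid-month
toy = pd.DataFrame(
    {"prec": [128.4, 13.7], "prec_normal": [60.4, 48.4]},
    index=pd.to_datetime(["2018-01-16", "2018-02-14"]),
)
toy.index.name = "Timestamp"
source = ColumnDataSource(toy)

p = figure(height=250, x_axis_type="datetime")

# dodge() shifts each bar by a fixed offset (also in milliseconds),
# so the two series sit side by side instead of overlapping
offset = width_ms / 4
p.vbar(x=dodge("Timestamp", -offset), top="prec",
       width=width_ms / 2, source=source)
p.vbar(x=dodge("Timestamp", offset), top="prec_normal",
       width=width_ms / 2, source=source, color="lightgray")
```

With an offset of a quarter of the pair width and bars of half the pair width, the two bars touch exactly at the original timestamp.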

In [9]:
df_kl_prec.head(12)

Timestamp,prec,prec_normal
2018-01-16 00:00:00,128.4,60.4
2018-02-14 12:00:00,13.7,48.414286
2018-03-16 00:00:00,41.7,53.444828
2018-04-15 12:00:00,30.5,47.268966
2018-05-16 00:00:00,108.9,65.924138
2018-06-15 12:00:00,81.5,67.137931
2018-07-16 00:00:00,41.6,60.521429
2018-08-16 00:00:00,40.7,57.653571
2018-09-15 12:00:00,27.0,52.596552
2018-10-16 00:00:00,11.1,65.413793


**4. Precipitation and Humidity Charts** <br>
Next, we create functions for the precipitation and humidity charts. The precipitation chart is a bar chart comparing the monthly totals for 2018 against the historical normal. The humidity chart is a simple line graph. To enable linked panning and zooming, these functions accept the x-axis range from the main temperature plot.
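
The linking mechanism described above comes down to two figures sharing one range object. A minimal sketch, with hypothetical figure names `p_top` and `p_bottom` and made-up coordinates:

```python
from bokeh.plotting import figure

# The upper chart creates and owns the x-range
p_top = figure(height=200, x_axis_type="datetime")
p_top.line(x=[0, 86_400_000], y=[1, 2])  # x-values in milliseconds since epoch

# The lower chart is handed the same range object instead of creating its own
p_bottom = figure(height=200, x_axis_type="datetime", x_range=p_top.x_range)
p_bottom.line(x=[0, 86_400_000], y=[3, 1])

# Both figures now reference one shared range, so panning or zooming
# one of them moves the other as well
shared = p_bottom.x_range is p_top.x_range
```

This is also why the chart functions take `x_range` as a parameter with default `None`: without an argument each figure gets its own independent range, with an argument it joins the linked group.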

In [10]:
def create_precipitation_chart(df, width=900, x_range=None):
    '''Creates a Bokeh bar chart for monthly precipitation.'''

    source = ColumnDataSource(df)

    p = figure(
        width=width, height=200,
        title="Precipitation",
        x_axis_type="datetime",
        x_range=x_range,  # Link x-axis with the temperature chart
        tools=""
    )

    # --- Add VBar Glyphs for Precipitation ---
    # Total width of one bar pair (approx. 12 days in milliseconds);
    # each bar below uses half of it
    bar_width = 12 * 24 * 60 * 60 * 1000

    # 2018 precipitation bars
    p.vbar(
        x=dodge('Timestamp', -bar_width/4, range=p.x_range), top='prec', source=source,
        width=bar_width/2, color="#718dbf", legend_label="2018"
    )

    # Normal precipitation bars
    p.vbar(
        x=dodge('Timestamp', bar_width/4, range=p.x_range), top='prec_normal', source=source,
        width=bar_width/2, color="#c9d9d3", legend_label="Normal"
    )

    # --- Styling the Plot ---
    p.title.align = "center"
    p.xgrid.grid_line_color = None
    p.yaxis.axis_label = "Precipitation [mm]"
    p.xaxis.visible = False # Hide x-axis to avoid clutter
    p.legend.location = "top_left"
    p.legend.orientation = "horizontal"
    p.legend.border_line_alpha = 0

    return p

def create_humidity_chart(df, width=900, x_range=None):
    '''Creates a Bokeh line chart for relative humidity.'''

    source = ColumnDataSource(df)

    p = figure(
        width=width, height=200,
        title="Relative Humidity",
        x_axis_type="datetime",
        x_range=x_range, # Link x-axis with the temperature chart
        tools=""
    )

    p.line(
        x='Timestamp', y='rel_humidity', source=source,
        line_color="green", line_width=2
    )

    # --- Styling the Plot ---
    p.title.align = "center"
    p.yaxis.axis_label = "Humidity [%]"
    p.yaxis.formatter = PrintfTickFormatter(format="%d%%")
    p.xaxis.visible = False # Hide x-axis to avoid clutter

    return p

# Create the other charts, linking their x-range to the temperature plot
p_precip = create_precipitation_chart(df_kl_prec, x_range=p_temp.x_range)
p_humidity = create_humidity_chart(df_kl, x_range=p_temp.x_range)

# To see the individual charts (optional)
show(p_precip)
# show(p_humidity)

In [11]:
show(p_humidity)

<div class="alert alert-info">
    
## 5. Combining multiple charts
</div>

In this last part, we combine the three charts you designed above.

<div class="alert alert-success">
    
- Create the combined weather chart for Kaiserslautern.
- Save a jpg/png-version or screenshot of this chart that can be uploaded in OLAT.
    
</div>

**5. Combining All Charts into a Final Layout** <br>
Finally, we combine the three separate plots into a single, cohesive visualization using Bokeh's `column` layout. This stacks the charts vertically, and because we linked their x-axes, they will pan and zoom together.

In [12]:
# --- Combine the plots into a single column layout ---
final_layout = column(p_temp, p_precip, p_humidity)

# Show the final combined visualization
show(final_layout)
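
If a copy of the combined chart outside the notebook is useful, for example to take a screenshot from, a layout can also be written to a standalone HTML page. The sketch below is an optional extra under stated assumptions: the two small figures, the file name `weather_demo.html`, and the temporary output directory are hypothetical, and the resulting page is not the notebook HTML export required for the submission.

```python
import os
import tempfile

from bokeh.io import save
from bokeh.layouts import column
from bokeh.plotting import figure
from bokeh.resources import CDN

# Two small stand-in charts with a shared x-range
top = figure(height=200, width=400, title="Top")
top.line(x=[1, 2, 3], y=[2, 5, 3])

bottom = figure(height=200, width=400, title="Bottom", x_range=top.x_range)
bottom.scatter(x=[1, 2, 3], y=[1, 4, 2])

# column() stacks its arguments vertically, top to bottom
layout = column(top, bottom)

# Write the layout to a standalone HTML page (BokehJS is loaded from the CDN)
out_path = os.path.join(tempfile.mkdtemp(), "weather_demo.html")
save(layout, filename=out_path, resources=CDN, title="Stacked charts (demo)")
```

Opening the saved file in a browser shows the same stacked, linked charts as `show(layout)` does in the notebook.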