# Data Visualization

## DataPy - WiSe25/26 - Week 8

### Data Science Practical Python

# Review & Context

## Previous Topics

- ✓ Python Fundamentals (Week 1-3)
- ✓ NumPy & Arrays (Week 4)
- ✓ Pandas & DataFrames (Week 5)
- ✓ Data Cleaning & Transformation (Week 7)

## Today

**Overall Goal: Visualizing data for the semester project**

# Why Data Visualization?

## A Picture is Worth 1000 Numbers

| Advantages | Examples |
|----------|----------|
| Quick understanding | Recognize patterns & trends |
| Discover outliers | Identify anomalies |
| Communication | Present results |
| Decision making | Data-driven insights |

---

> *"The greatest value of a picture is when it forces us to notice what we never expected to see."* - John Tukey

# Visualization Types Overview

## Which Chart for What?

| Goal | Suitable Plots |
|------|----------------|
| **Comparison** | Bar chart, Line chart |
| **Distribution** | Histogram, Boxplot, Violin Plot |
| **Relationship** | Scatterplot, Heatmap |
| **Composition** | Pie chart, Stacked Bar |
| **Time series** | Line chart, Area Chart |

**Important:** Choose the visualization that fits the question you want to answer!

# The Most Important Plot Types

<img src="figures\line chart.png" style="float: right; width: 40%; margin-right: 50px;">

## 1. Line Plot
- Time series, trends over time
- Continuous data

# The Most Important Plot Types

<img src="figures\barChart_vs_histogram.png" style="float: right; width: 40%; margin-right: 50px;">

## 2. Bar Chart
- Comparison of categories
- Discrete data

## 3. Histogram
- Distribution of numerical data
- Frequencies
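
The difference between the two can be sketched with a small example. The data (`fruits`, `sold`, `heights`) is made up for illustration and not part of the course material: a bar chart draws one bar per category, while a histogram first groups numerical values into bins and then draws the count per bin.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up example data (assumption, for illustration only)
fruits = ["apple", "banana", "cherry"]   # discrete categories
sold = [12, 7, 15]                       # one value per category
rng = np.random.default_rng(seed=0)
heights = rng.normal(loc=170, scale=10, size=200)  # continuous measurements

fig, (ax_bar, ax_hist) = plt.subplots(1, 2, figsize=(10, 4))

# Bar chart: one bar per category, bar height = the given value
ax_bar.bar(fruits, sold)
ax_bar.set_title("Bar chart: units sold per fruit")
ax_bar.set_xlabel("Fruit")
ax_bar.set_ylabel("Units sold")

# Histogram: values are grouped into bins, bar height = count per bin
counts, bin_edges, _ = ax_hist.hist(heights, bins=10)
ax_hist.set_title("Histogram: distribution of heights")
ax_hist.set_xlabel("Height (cm)")
ax_hist.set_ylabel("Frequency")

fig.tight_layout()
```

Call `plt.show()` afterwards to display the figure. Note that `hist` returns the counts and bin edges, so 10 bins always come with 11 edges.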

# The Most Important Plot Types

<img src="figures\map_overview.png" style="float: right; width: 50%; margin-right: 100px;">

## 4. Scatterplot
- Relationship between two variables
- Recognize correlations
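
A minimal sketch of how a scatterplot and a correlation coefficient go together, assuming synthetic data with a linear trend plus noise (the variables `x` and `y` are invented for this example). The Pearson coefficient is computed with `np.corrcoef` and shown in the title.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic data (assumption): linear trend plus random noise
rng = np.random.default_rng(seed=1)
x = rng.uniform(0, 10, size=100)
y = 2.0 * x + rng.normal(0, 2.0, size=100)

# Pearson correlation coefficient between x and y
r = np.corrcoef(x, y)[0, 1]

fig, ax = plt.subplots(figsize=(6, 4))
ax.scatter(x, y, alpha=0.7)
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title(f"Scatterplot (Pearson r = {r:.2f})")
```

A value of `r` close to 1 or -1 indicates a strong linear relationship; the scatterplot additionally reveals non-linear patterns that a single number would hide.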

# The Most Important Plot Types

<img src="figures\boxplot.png" style="float: left; width: 35%; margin-right: 150px;">


## 5. Boxplot
- Show distribution
- Median, quartiles, outliers
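
The numbers behind a boxplot can be computed by hand. The following sketch uses a small made-up sample and the 1.5 × IQR rule, which is also Matplotlib's default for the whiskers (`whis=1.5`).

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up sample (assumption) with one suspiciously large value
data = np.array([12, 14, 15, 15, 16, 17, 18, 19, 21, 45])

# Quartiles and interquartile range (IQR)
q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1

# Points beyond 1.5 * IQR from the box are drawn as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = data[(data < lower_fence) | (data > upper_fence)]

print(f"Q1={q1}, median={median}, Q3={q3}, outliers={outliers.tolist()}")

fig, ax = plt.subplots(figsize=(4, 5))
ax.boxplot(data)
ax.set_ylabel("Value")
ax.set_title("Boxplot of the sample")
```

Here the box spans 15 to 18.75 with the median at 16.5, and only the value 45 lies outside the fences.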

# The Most Important Plot Types

<img src="figures\heatmap.png" style="float: left; width: 50%;margin-right: 50px">


## 6. Heatmap
- Correlations between many variables
- Color-coded values

# Python Visualization Libraries

## Three Main Libraries

### 1. Matplotlib
- Foundation of Python visualization
- Maximum flexibility & control
- Most widely used

### 2. Seaborn
- Built on Matplotlib
- Beautiful statistical plots
- Less code needed

### 3. Plotly
- Interactive visualizations
- Modern & web-ready
- Hover effects & zoom

# Matplotlib: The Foundation

## Basic Structure

```python
import matplotlib.pyplot as plt

# Example data (placeholder values)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]

# Create figure and axis
fig, ax = plt.subplots(figsize=(10, 6))

# Plot data
ax.plot(x, y)

# Labels & title
ax.set_xlabel('X-Axis')
ax.set_ylabel('Y-Axis')
ax.set_title('My First Plot')

# Show plot
plt.show()
```

# Matplotlib: Important Plot Types

```python
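# x, y, categories, values and data are placeholders for your own data.
# Each call draws on the current axes; in a script, call plt.figure()
# before each one to get separate plots.

# Line plot
plt.plot(x, y)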

# Scatter plot
plt.scatter(x, y)

# Bar chart
plt.bar(categories, values)

# Histogram
plt.hist(data, bins=20)

# Boxplot
plt.boxplot(data)
```

**Tip:** Always add labels and title!

# Matplotlib: Subplots

## Multiple Plots Side by Side

```python
# 2 rows, 2 columns
fig, axes = plt.subplots(2, 2, figsize=(12, 10))

# First plot (top left)
axes[0, 0].plot(x, y1)
axes[0, 0].set_title('Plot 1')

# Second plot (top right)
axes[0, 1].scatter(x, y2)
axes[0, 1].set_title('Plot 2')

# Third plot (bottom left)
axes[1, 0].bar(categories, values)
axes[1, 0].set_title('Plot 3')

# Fourth plot (bottom right)
axes[1, 1].hist(data, bins=20)
axes[1, 1].set_title('Plot 4')

plt.tight_layout()
plt.show()
```
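
When the grid grows, indexing every axis by hand gets repetitive. One possible alternative, sketched here with made-up curves, is to loop over `axes.flat`, which walks through the 2 × 2 array of axes row by row.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up curves (assumption, for illustration only)
x = np.linspace(0, 2 * np.pi, 100)
curves = {
    "sin(x)": np.sin(x),
    "cos(x)": np.cos(x),
    "sin(2x)": np.sin(2 * x),
    "cos(2x)": np.cos(2 * x),
}

# sharex/sharey give all panels the same axis ranges, which eases comparison
fig, axes = plt.subplots(2, 2, figsize=(10, 8), sharex=True, sharey=True)

# axes is a 2x2 NumPy array; .flat iterates over it row by row
for ax, (label, y) in zip(axes.flat, curves.items()):
    ax.plot(x, y)
    ax.set_title(label)

fig.tight_layout()
```

Call `plt.show()` afterwards to display the grid.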

# Seaborn: Statistical Plots

## Elegant & Simple

```python
import seaborn as sns

# Set style
sns.set_style('whitegrid')

# Histogram with KDE
sns.histplot(data=df, x='age', kde=True)

# Boxplot by category
sns.boxplot(data=df, x='category', y='value')

# Violin plot
sns.violinplot(data=df, x='category', y='value')

# Correlation heatmap (numeric columns only; text columns would raise an error)
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap='coolwarm')
```

# Seaborn: Pairplot

## All Relationships at a Glance

```python
# Automatic pairwise plots of all numerical columns
sns.pairplot(df, hue='target_column')
plt.show()
```

**Perfect for:**
- Exploratory Data Analysis (EDA)
- Quickly recognizing correlations
- Identifying outliers

# Plotly: Interactive Visualizations

## Zoom, Hover & More

```python
import plotly.express as px

# Interactive line plot
fig = px.line(df, x='date', y='value', title='Time Series')
fig.show()

# Interactive scatter with color
fig = px.scatter(df, x='x', y='y', color='category',
                 hover_data=['info'])
fig.show()

# Interactive boxplot
fig = px.box(df, x='category', y='value')
fig.show()
```

**Advantage:** Explore data interactively!

# Relation to Semester Project

## Use Week 7 Cleaned Data

**Goal Today:**
- Create visualizations for your project
- Recognize patterns in data
- Identify relationships between variables

## Typical Analyses
1. **Distribution:** How is the data distributed?
2. **Correlations:** Which features are related?
3. **Outliers:** Are there unusual values?
4. **Groups:** Are there differences between groups?
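
Each of the four questions has a numerical counterpart that is worth checking alongside the plot. A minimal sketch with pandas, assuming a synthetic DataFrame (the column names `group`, `height` and `weight` are hypothetical and stand in for your project data):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for project data (assumption)
rng = np.random.default_rng(seed=42)
df = pd.DataFrame({
    "group": np.repeat(["A", "B"], 50),
    "height": rng.normal(170, 8, size=100),
})
df["weight"] = 0.9 * df["height"] - 85 + rng.normal(0, 5, size=100)
df.loc[0, "weight"] = 200.0  # inject one obvious outlier

# 1. Distribution: summary statistics (plot: histogram)
summary = df["height"].describe()

# 2. Correlations: pairwise Pearson correlations (plot: heatmap)
corr = df.corr(numeric_only=True)

# 3. Outliers: 1.5 * IQR rule (plot: boxplot)
q1, q3 = df["weight"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (df["weight"] < q1 - 1.5 * iqr) | (df["weight"] > q3 + 1.5 * iqr)

# 4. Groups: compare means per group (plot: boxplot by category)
group_means = df.groupby("group")["height"].mean()

print(summary, corr, df[is_outlier], group_means, sep="\n\n")
```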

# Visualization Best Practices

## Do's
✓ Always label axes  
✓ Add title  
✓ Choose appropriate colors  
✓ Save figures (PNG, PDF)  

## Don'ts
✗ Too many colors  
✗ 3D charts (hard to read)  
✗ Too much information in one plot  
✗ Misleading axes (e.g. bar charts whose y-axis does not start at 0)
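
Why a truncated axis misleads can be shown with two made-up values that differ by only 2 %. This is a sketch with invented numbers (`products`, `revenue`), not data from the course:

```python
import matplotlib.pyplot as plt

# Made-up values (assumption): the two bars differ by only 2 %
products = ["A", "B"]
revenue = [98, 100]

fig, (ax_trunc, ax_zero) = plt.subplots(1, 2, figsize=(10, 4))

# Truncated axis: visible bar heights are 1 and 3, so B looks three times as tall
ax_trunc.bar(products, revenue)
ax_trunc.set_ylim(97, 100.5)
ax_trunc.set_title("Truncated y-axis (misleading)")

# Axis starting at 0: bar heights are proportional to the values
ax_zero.bar(products, revenue)
ax_zero.set_ylim(0, 110)
ax_zero.set_title("y-axis starting at 0")

for ax in (ax_trunc, ax_zero):
    ax.set_xlabel("Product")
    ax.set_ylabel("Revenue")

fig.tight_layout()
```

This mainly concerns bar charts, where bar length encodes the value. For line charts and scatterplots, an axis that does not start at 0 is often a reasonable choice.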

# Saving Figures

## For Documentation & Presentation

```python
# Matplotlib/Seaborn (call savefig before plt.show())
plt.savefig('my_plot.png', dpi=300, bbox_inches='tight')
plt.savefig('my_plot.pdf')  # Vector format, e.g. for LaTeX

# Plotly (write_image requires the kaleido package)
fig.write_image('my_plot.png')
fig.write_html('my_plot.html')  # Interactive!
```

**Tip:** Create a `figures/` folder for your project
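
The folder can also be created from Python. A minimal sketch, assuming the folder name `figures` and a hypothetical file name `example_plot.png`:

```python
from pathlib import Path

import matplotlib.pyplot as plt

# Create the output folder if it does not exist yet
figures_dir = Path("figures")
figures_dir.mkdir(exist_ok=True)

# Small example plot (made-up data)
fig, ax = plt.subplots(figsize=(6, 4))
ax.plot([1, 2, 3, 4], [1, 4, 9, 16])
ax.set_xlabel("x")
ax.set_ylabel("x squared")
ax.set_title("Example plot")

# Save via the figure object, then release its memory
png_path = figures_dir / "example_plot.png"
fig.savefig(png_path, dpi=300, bbox_inches="tight")
plt.close(fig)
```

Saving through `fig.savefig` rather than `plt.savefig` makes it explicit which figure is written, which helps when several figures are open at once.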

# Tasks for Today

## Exercise Notebooks

**Work on your project data (cleaned from Week 7):**

1. **Matplotlib**: Create basic plots (Histogram, Scatterplot, Subplots)
2. **Seaborn**: Statistical plots (Boxplot, Violin, Heatmap)
3. **Plotly**: Create interactive visualizations
4. **Project relation**: Consider which plots make sense for your project

---

**Exercise in Jupyter Notebook:** `8_exercise_en.ipynb`

# Useful Resources

## 📚 Documentation
- Matplotlib: matplotlib.org
- Seaborn: seaborn.pydata.org
- Plotly: plotly.com/python

## 🎨 Inspiration
- Python Graph Gallery: python-graph-gallery.com
- Seaborn Gallery: seaborn.pydata.org/examples

## 🎯 Tutorials
- Kaggle Learn: Visualization Course
- Real Python: Python Plotting Guides