# Data Visualization Exam – 2025-07-15

This exam contains **5 questions**. For each question, execute the provided code cell to generate a *sub-optimal* figure, then improve it as instructed. After making your changes, **reflect** (1-2 sentences) on *why* your improvements enhance the visual communication.

Feel free to add additional cells under each question if needed.


### Question 1 — Despine & Font Size

The original plot has **small fonts**, unnecessary **spines**, and no axis labels.

1. Improve the figure by:
   * Removing the top and right spines.
   * Increasing the font sizes of the title and tick labels.
   * Adding meaningful x- and y-axis labels.

2. Briefly explain *why* each change improves readability.


In [None]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

np.random.seed(0)
categories = ['A', 'B', 'C', 'D']
values = np.random.randint(5, 15, size=len(categories))

fig, ax = plt.subplots(figsize=(4, 3))
ax.bar(categories, values, color='grey')
ax.spines[['right', 'top']].set_visible(False)
ax.set_title('Sales', fontsize=18)
ax.set_xlabel('Category', fontsize=14)
ax.set_ylabel('Sales', fontsize=14)
ax.tick_params(axis='both', which='major', labelsize=14)
plt.show()

Removing the top and right spines improves legibility by reducing unnecessary clutter in the graph. Increasing the font size of the title and labels also makes the graph easier to read. Lastly, changing the bar color from `lightgrey` to `grey` makes it easier to see the bars, especially for those who may have poor eyesight.


### Question 2 — Perceptually Uniform Colormap

The heatmap uses the **`jet`** colormap, which is **not perceptually uniform**.

1. Replace the colormap with a perceptually uniform option (look this up please, you'll see huge documentation on it)
2. Adjust the font sizes and add an informative title (make one up).
3. Explain how your chosen colormap and styling choices improve the viewer's ability to discern values.


In [None]:

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(1)
data = np.random.randn(30, 30)

fig, ax = plt.subplots(figsize=(4, 3))
cax = ax.imshow(data, cmap='magma')
fig.colorbar(cax)
ax.set_title('Average Temperature of Surface', fontsize=18)

plt.show()


A perceptually uniform colormap such as `magma` makes patterns in the heatmap easier to discern, because equal steps in the data appear as equal steps in perceived lightness (outliers are much brighter or darker). The larger title font also makes the figure easier to read.
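
As a rough check on this reasoning, the following sketch compares how lightness changes along `jet` and `magma`. It approximates lightness by applying Rec. 709 luma weights directly to the RGB values, which is a simplification and not the perceptual color model used to design `magma`. The helper name `decreasing_fraction` is made up for this illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

def decreasing_fraction(cmap_name, n=256):
    """Fraction of steps along the colormap where approximate lightness drops."""
    rgb = plt.get_cmap(cmap_name)(np.linspace(0, 1, n))[:, :3]  # drop the alpha channel
    luma = rgb @ np.array([0.2126, 0.7152, 0.0722])  # crude lightness proxy (assumption)
    return float(np.mean(np.diff(luma) < 0))

for name in ['jet', 'magma']:
    print(name, round(decreasing_fraction(name), 2))
```

A colormap whose lightness rises and then falls, as it does for `jet`, can give different data values a similar brightness, so a lower fraction is better here.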


### Question 3 — Over-plotting & Transparency

The scatter plot over-plots points, making dense regions hard to perceive.

1. Mitigate over-plotting using **transparency (`alpha`)** and a smaller marker size.
2. [Challenge only] Use a color palette consistent with data density through `ax.hexbin(x, y, cmap='ARGUMENT')`, where you provide the argument.
3. Provide a short reflection on how these adjustments clarify patterns.


In [None]:

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(3)
n = 500
x = np.random.normal(size=n)
y = 2 * x + np.random.normal(size=n)

fig, ax = plt.subplots(figsize=(4, 3))
ax.scatter(x, y, color='red', s=5, alpha=0.25)  # small, semi-transparent markers
ax.set_title('Hit Location Distribution', fontsize=16)
plt.show()


In [None]:

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(3)
n = 500
x = np.random.normal(size=n)
y = 2 * x + np.random.normal(size=n)

fig, ax = plt.subplots(figsize=(4, 3))
ax.scatter(x, y, color='red', s=5, alpha=0.25)  # small, semi-transparent markers
hb = ax.hexbin(x, y, gridsize=25, cmap='viridis_r', mincnt=1)  # show bins with at least 1 point
fig.colorbar(hb, ax=ax, label='Counts per bin')
ax.set_title('Hit Location Distribution', fontsize=16)
plt.show()


In the original plot, over-plotting hides how many points lie in the dense regions, which makes patterns in the data hard to see. Applying transparency and smaller markers lets overlapping points build up a darker color, so dense clusters stand out. Using `hexbin` goes further by aggregating the data into hexagonal 2D bins and using color to indicate the count in each bin.
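
To see why a low `alpha` reveals density, it helps to work out how opacity accumulates when markers of the same color overlap. The sketch below assumes standard "over" alpha compositing, under which $k$ overlapping markers with opacity $a$ have a combined opacity of $1 - (1 - a)^k$. The function name `stacked_opacity` is made up for this illustration.

```python
def stacked_opacity(alpha, k):
    """Combined opacity of k overlapping same-color markers, each with opacity alpha."""
    return 1 - (1 - alpha) ** k

# Opacity after stacking 1, 2, 4, and 10 markers drawn with alpha=0.25
for k in [1, 2, 4, 10]:
    print(k, round(stacked_opacity(0.25, k), 3))
```

With `alpha=0.25`, a single marker is 25% opaque while ten stacked markers are about 94% opaque, so color intensity acts as a rough indicator of local density. With fully opaque markers, one point and ten points look the same.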


### Question 4 — Axes & Legend Placement

The time-series plot suffers from **cramped x-axis labels** and a legend that overlaps tick labels.

1. Rotate date ticks, use automatic date formatting, and expand the figure width.
2. Move the legend to a clearer position or outside the plotting area.
3. Explain why these revisions improve interpretability.


In [None]:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import matplotlib.dates as pltdates

np.random.seed(4)
dates = pd.date_range('2025-01-01', periods=100, freq='D')
series1 = np.cumsum(np.random.randn(100))
series2 = np.cumsum(np.random.randn(100))

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(dates, series1, label='Series 1')
ax.plot(dates, series2, label='Series 2')
ax.set_title('Cumulative Sum Time Series', fontsize=16)
ax.set_xlabel('Date', fontsize=16)
ax.set_ylabel('Cumulative Sum', fontsize=16)
ax.legend(loc='upper left', frameon=False)  # upper-left corner, away from the x-axis tick labels
ax.xaxis.set_major_locator(pltdates.MonthLocator())
ax.xaxis.set_major_formatter(pltdates.DateFormatter('%b %Y'))

plt.xticks(rotation=45)
plt.show()


These changes make it easier to interpret the data because (1) the data is no longer obscured by the legend; (2) the dates are easier to read; and (3) a title and $x$-axis and $y$-axis labels make it easier to understand what the data represents.
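
Question 4 asks for automatic date formatting and suggests placing the legend outside the plotting area. One possible way to do both, shown here as a sketch and not as the expected answer, is matplotlib's `AutoDateLocator` with `ConciseDateFormatter`, plus `bbox_to_anchor` on the legend. The dates are built with NumPy instead of pandas to keep the example short, and the rotation angle and anchor position are arbitrary choices.

```python
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

np.random.seed(4)
dates = np.datetime64('2025-01-01') + np.arange(100)  # 100 consecutive days
series1 = np.cumsum(np.random.randn(100))
series2 = np.cumsum(np.random.randn(100))

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(dates, series1, label='Series 1')
ax.plot(dates, series2, label='Series 2')

# Let matplotlib pick tick positions and a compact label format
locator = mdates.AutoDateLocator()
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(mdates.ConciseDateFormatter(locator))
ax.tick_params(axis='x', labelrotation=30)

# Anchor the legend's upper-left corner just to the right of the axes
ax.legend(loc='upper left', bbox_to_anchor=(1.02, 1), frameon=False)
fig.tight_layout()  # adjust margins after moving the legend
```

Placing the legend outside the axes guarantees it cannot cover either series, at the cost of some horizontal space for the plot itself.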


### Question 5 — Histogram Bin Choice & Labels

The histogram uses **100 bins**, creating a noisy appearance.

1. Select an appropriate bin width using a rule (e.g., Freedman-Diaconis) or domain knowledge (your domain knowledge in this case can be trial and error).
2. Choose colors with sufficient contrast and remove heavy edges.
3. Add axis labels and a descriptive title.
4. Reflect on how your binning choice balances detail and clarity.


In [None]:

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(5)
data = np.random.gamma(shape=2., scale=1.5, size=1000)
fig, ax = plt.subplots(figsize=(4, 3))
ax.hist(data, bins=30, color='steelblue', edgecolor='white')  # bins based on Freedman-Diaconis rule
ax.set_title('Ping Distribution for 1000 Packets', fontsize=16)
ax.set_xlabel('Ping (ms)')
ax.set_ylabel('Count')
plt.show()


The Freedman-Diaconis rule for determining bin width uses the formula

$$\text{width} = 2 \cdot \frac{\text{3rd quartile} - \text{1st quartile}}{n^{1/3}}.$$

Thus, the number of bins is the range of the data divided by this width, rounded up. This binning choice shows enough detail in the distribution while keeping the bins wide enough to smooth out sampling noise.
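
The rule can be checked numerically. The sketch below applies the formula above to the same sample used in the histogram cell and compares the result with NumPy's built-in `bins='fd'` estimator. The variable names are illustrative.

```python
import numpy as np

np.random.seed(5)
data = np.random.gamma(shape=2., scale=1.5, size=1000)

# Freedman-Diaconis bin width: 2 * IQR / n^(1/3)
q1, q3 = np.percentile(data, [25, 75])
width = 2 * (q3 - q1) / len(data) ** (1 / 3)

# Number of bins: data range divided by the width, rounded up
n_bins = int(np.ceil((data.max() - data.min()) / width))

# NumPy's built-in estimator, for comparison
edges = np.histogram_bin_edges(data, bins='fd')
print(n_bins, len(edges) - 1)
```

Passing `bins='fd'` directly to `ax.hist` is an alternative to hard-coding a bin count.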