# Week 6, Class 1: Introduction to Matplotlib

## 1. What is Matplotlib?
**Matplotlib** is the foundation of data visualization in Python's scientific ecosystem. It's a flexible library that gives you control over every element of your plots.

In [None]:
import matplotlib.pyplot as plt
import numpy as np

## 2. Your First Plot: The Line Plot
The simplest plot is a line plot, which shows data points connected by a line. It's perfect for showing trends over time or across a continuous variable.

The basic workflow is:
1. Provide data to `plt.plot()`.
2. Add labels, titles, etc. (optional, but highly recommended!).
3. Display the plot with `plt.show()`.

In [None]:
# Create some dummy data: temperature over time
time = [0, 1, 2, 3, 4, 5, 6] # hours
temperature = [20.5, 21.0, 21.3, 22.0, 21.8, 22.5, 22.3] # degrees Celsius

# Plot the data
plt.plot(time, temperature)

# Display the plot
plt.show()

## 3. Anatomy of a Plot: Titles, Labels, and Legends
A raw plot is not very informative. It's essential to add context and customize its appearance.
* `plt.title()`: Adds a title to the plot.
* `plt.xlabel()`: Labels the x-axis.
* `plt.ylabel()`: Labels the y-axis.
* `plt.legend()`: Adds a legend built from the `label` argument of each plotting call. This is necessary when you have multiple datasets on the same plot.
* `plt.grid(True)`: Adds a grid for easier readability.
* `plt.xlim()`, `plt.ylim()`: Sets the limits of the x and y axes.

### 3.1. Plotting Multiple Datasets and Customizing Lines
You can call `plt.plot()` multiple times before `plt.show()` to add multiple lines to the same plot. You can also customize the appearance of each line.

In [None]:
# Create two sets of data for comparison
time_hours = np.arange(0, 10, 0.5)
data_run_A = 20 + 0.5 * time_hours
data_run_B = 20 + 0.5 * time_hours + np.random.randn(len(time_hours)) * 0.5

# Plot with customization
plt.plot(time_hours, data_run_A, 
         label='Run A (Ideal)', 
         color='blue', # Line color
         linestyle='--', # Line style
         marker='o', # Marker style
         alpha=0.7) # Transparency

plt.plot(time_hours, data_run_B, 
         label='Run B (Experimental)', 
         color='red', 
         linestyle='-', 
         marker='x', 
         alpha=0.7)

# Add labels and a title
plt.title("Temperature Over Time for Two Experimental Runs")
plt.xlabel("Time (hours)")
plt.ylabel("Temperature (°C)")

# Set axis limits
plt.xlim(0, 10)
plt.ylim(18, 26)

# Add a legend and grid
plt.legend()
plt.grid(True)

# Display the plot
plt.show()
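
As a shorthand for the keyword arguments used above, `plt.plot()` also accepts a compact format string as its third positional argument, combining color, marker, and line style in one short code. Here is a minimal sketch; the data and variable names are made up for illustration and are not part of the lesson's dataset.

```python
import matplotlib.pyplot as plt
import numpy as np

# Illustrative data (hypothetical values, for demonstration only)
x = np.arange(0, 5)
y_dashed = 2 * x
y_markers = x ** 2

# 'b--o' = blue, dashed line, circle markers
line_dashed, = plt.plot(x, y_dashed, 'b--o', label="'b--o'")

# 'g^' = green triangle markers; no line style is given,
# so the points are not connected by a line
line_markers, = plt.plot(x, y_markers, 'g^', label="'g^'")

plt.legend()
plt.show()
```

Format strings are convenient for quick plots, while explicit keyword arguments such as `color=` and `linestyle=` tend to be easier to read in longer scripts.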

## 4. Other Common Plot Types
Matplotlib supports a wide range of plot types. Here are three more that are essential for scientific work.

### 4.1. Scatter Plots
A scatter plot displays the relationship between two continuous variables, plotting each data point as a single marker. It's great for visualizing correlation. Use `plt.scatter()`.

In [None]:
# Create dummy data: response vs. dose
dose = np.array([5, 10, 15, 20, 25])
response = np.array([12, 25, 38, 45, 55])

# Plot the data
plt.scatter(dose, response)

# Add labels and a title
plt.title("Experimental Response vs. Dose")
plt.xlabel("Dose (mg)")
plt.ylabel("Response (arbitrary units)")

plt.show()
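
A scatter plot shows a correlation visually; if you also want a number to go with it, one option is the Pearson correlation coefficient from `np.corrcoef()`. The sketch below reuses the dose-response data from above. Putting the coefficient in the title is just one possible presentation choice.

```python
import matplotlib.pyplot as plt
import numpy as np

dose = np.array([5, 10, 15, 20, 25])
response = np.array([12, 25, 38, 45, 55])

# np.corrcoef returns a 2x2 correlation matrix;
# the off-diagonal entry is the correlation between dose and response
r = np.corrcoef(dose, response)[0, 1]
print(f"Pearson r = {r:.3f}")  # close to 1 for this nearly linear data

plt.scatter(dose, response)
plt.title(f"Experimental Response vs. Dose (r = {r:.2f})")
plt.xlabel("Dose (mg)")
plt.ylabel("Response (arbitrary units)")
plt.show()
```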

### 4.2. Bar Charts
A bar chart is used to compare quantities of different discrete categories. It's great for visualizing grouped data, such as average response for different treatments. Use `plt.bar()`.

In [None]:
# Create data for average results from different treatments
treatments = ['Treatment A', 'Treatment B', 'Control']
average_yield = [78.5, 85.1, 75.0]
std_dev = [2.5, 3.1, 1.8] # Standard deviation for error bars

# Plot the data with error bars
plt.bar(treatments, average_yield, yerr=std_dev, capsize=5)

# Add labels and a title
plt.title("Average Experimental Yield by Treatment")
plt.xlabel("Treatment")
plt.ylabel("Average Yield (%)")

plt.show()
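
In the example above, the means and standard deviations are typed in by hand. In practice they are usually computed from replicate measurements. The following sketch assumes three replicates per treatment. The `replicates` dictionary and its values are hypothetical, chosen only to illustrate the calculation.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical replicate yields (%) for each treatment
replicates = {
    'Treatment A': [76.0, 78.5, 81.0],
    'Treatment B': [82.0, 85.0, 88.0],
    'Control': [73.0, 75.0, 77.0],
}

treatments = list(replicates)
average_yield = [np.mean(values) for values in replicates.values()]
# ddof=1 gives the sample standard deviation (divides by n - 1)
std_dev = [np.std(values, ddof=1) for values in replicates.values()]

plt.bar(treatments, average_yield, yerr=std_dev, capsize=5)
plt.title("Average Experimental Yield by Treatment")
plt.xlabel("Treatment")
plt.ylabel("Average Yield (%)")
plt.show()
```

Note that `np.std()` defaults to `ddof=0` (the population standard deviation), so state which convention your error bars use.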

### 4.3. Histograms
A histogram is used to visualize the distribution of a single numerical variable. It divides the data into bins and counts how many data points fall into each bin. This is great for visualizing frequency distributions. Use `plt.hist()`.

In [None]:
# Create random data to simulate measurements
measurements = np.random.normal(loc=100, scale=10, size=500) # mean=100, std=10, 500 points

# Plot a histogram
plt.hist(measurements, bins=20, edgecolor='black', alpha=0.7)

# Add labels and a title
plt.title("Frequency Distribution of Measurements")
plt.xlabel("Measurement Value")
plt.ylabel("Frequency")

plt.show()
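
To see the binning step on its own, you can compute the same counts with `np.histogram()`, which returns the count in each bin and the bin edges without drawing anything. This is a small sketch; the seeded random generator is an addition here so that the numbers are reproducible.

```python
import matplotlib.pyplot as plt
import numpy as np

# Seeded generator so the example gives the same data every run (an assumption
# made for this sketch; the cell above uses unseeded np.random.normal)
rng = np.random.default_rng(seed=0)
measurements = rng.normal(loc=100, scale=10, size=500)

# Bin the data without plotting: 20 bins are defined by 21 edges
counts, bin_edges = np.histogram(measurements, bins=20)
print("Number of bins:", len(counts))
print("Number of edges:", len(bin_edges))
print("Total points counted:", counts.sum())

# plt.hist performs the same binning and also returns the counts and edges
plot_counts, plot_edges, _ = plt.hist(measurements, bins=20,
                                      edgecolor='black', alpha=0.7)
plt.xlabel("Measurement Value")
plt.ylabel("Frequency")
plt.show()
```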

## 5. Adding Annotations and Text
You can add custom text and annotations to your plots to highlight specific data points or regions of interest.
* `plt.text(x, y, 'text')`: Adds text at a specific coordinate.
* `plt.annotate('text', xy=(x, y), ...)`: Adds an annotation with an arrow pointing to a specific data point.

In [None]:
# Create data for a scatter plot
dose = np.array([5, 10, 15, 20, 25])
response = np.array([12, 25, 38, 45, 55])

plt.scatter(dose, response)
plt.title("Experimental Response vs. Dose")
plt.xlabel("Dose (mg)")
plt.ylabel("Response (arbitrary units)")

# Add a text annotation to the plot
plt.text(10, 48, 'High response region', fontsize=10, color='red')

# Add an arrow annotation pointing to a specific data point
plt.annotate('Highest response', xy=(25, 55), xytext=(20, 50),
             arrowprops=dict(facecolor='black', shrink=0.05),
             fontsize=10)

plt.show()
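
The coordinates in the cell above are hard-coded. If the data may change, one option is to locate the point of interest programmatically, for example with `np.argmax()`. This is a sketch under that assumption; the text offset used for `xytext` is an arbitrary choice and may need adjusting for other data.

```python
import matplotlib.pyplot as plt
import numpy as np

dose = np.array([5, 10, 15, 20, 25])
response = np.array([12, 25, 38, 45, 55])

# Find the index of the largest response and look up its coordinates
peak_idx = np.argmax(response)
peak_dose = dose[peak_idx]
peak_response = response[peak_idx]

plt.scatter(dose, response)
plt.title("Experimental Response vs. Dose")
plt.xlabel("Dose (mg)")
plt.ylabel("Response (arbitrary units)")

# Place the label to the lower left of the point (arbitrary offset)
annotation = plt.annotate(f"Highest response ({peak_response})",
                          xy=(peak_dose, peak_response),
                          xytext=(peak_dose - 10, peak_response - 5),
                          arrowprops=dict(facecolor='black', shrink=0.05),
                          fontsize=10)

plt.show()
```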

## 6. Saving Your Plots
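Once you've created a plot you're happy with, you'll want to save it to a file. Call `plt.savefig()` before `plt.show()`: once the figure has been shown, it is typically closed or cleared, so saving afterward can produce a blank file.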

In [None]:
# Re-create our bar chart from above
treatments = ['Treatment A', 'Treatment B', 'Control']
average_yield = [78.5, 85.1, 75.0]
plt.bar(treatments, average_yield)
plt.title("Average Experimental Yield by Treatment")
plt.xlabel("Treatment")
plt.ylabel("Average Yield (%)")

# Save the figure to a file
# The file format is inferred from the extension (.png, .jpg, .pdf, .svg, etc.)
plt.savefig('yield_by_treatment.png')

# Display the plot
plt.show()


print("Plot saved as 'yield_by_treatment.png'")
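
`plt.savefig()` also takes optional arguments that are often useful for reports and publications. The sketch below uses `dpi` to set the resolution of raster formats and `bbox_inches='tight'` to trim surrounding whitespace. The `figures` output folder is a hypothetical name chosen for this example.

```python
from pathlib import Path

import matplotlib.pyplot as plt

treatments = ['Treatment A', 'Treatment B', 'Control']
average_yield = [78.5, 85.1, 75.0]

plt.bar(treatments, average_yield)
plt.title("Average Experimental Yield by Treatment")
plt.xlabel("Treatment")
plt.ylabel("Average Yield (%)")

# Hypothetical output folder; created if it does not already exist
output_dir = Path("figures")
output_dir.mkdir(exist_ok=True)

png_path = output_dir / "yield_by_treatment.png"
pdf_path = output_dir / "yield_by_treatment.pdf"

# Save the same figure in a raster format (PNG) and a vector format (PDF)
plt.savefig(png_path, dpi=300, bbox_inches='tight')
plt.savefig(pdf_path, bbox_inches='tight')

# Show the figure only after saving
plt.show()
```

Vector formats such as PDF and SVG stay sharp at any zoom level, while PNG is a reasonable default for slides and web pages.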

## Summary and Key Takeaways

* **Data visualization** is essential for exploration, communication, and validation of scientific data.
* **Matplotlib** is the foundational plotting library in Python's scientific ecosystem.
* The basic plotting workflow is to define data, plot it with `plt.plot()` or `plt.scatter()`, add labels and a title, and then call `plt.show()`.
* **Titles (`plt.title()`), axis labels (`plt.xlabel()`, `plt.ylabel()`), and legends (`plt.legend()`)** are crucial for creating informative plots.
* You can **customize** almost any aspect of your plots, including colors, line styles, markers, and axis limits.
* Common plot types include **line plots**, **scatter plots**, **bar charts**, and **histograms**.
* You can save your plots to various file formats using `plt.savefig()`.

## Exercises

Complete the following exercises in a new Python script or a new Jupyter Notebook.


1.  **Plotting Time-Series Data:**
    * Create a NumPy array for `time` from 0 to 24 (inclusive) in one-hour steps.
    * Create a NumPy array for `pressure` values that simulate a cyclic process (e.g., `pressure = 100 + np.sin(time / 24 * 2 * np.pi) * 10`).
    * Create a line plot of `pressure` versus `time`.
    * Add a title: "Pressure Over 24 Hours".
    * Label the x-axis: "Time (hours)".
    * Label the y-axis: "Pressure (kPa)".
    * Add a grid to the plot.
    * Save the plot as `pressure_cycle.png` and then show it.

2.  **Visualizing a Dose-Response Curve with Customization:**
    * You have two lists of data: `dose = [0.1, 0.5, 1.0, 5.0, 10.0]` and `inhibition = [5, 25, 50, 75, 90]`.
    * Create a scatter plot of `inhibition` versus `dose`.
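    * Customize the markers to be green triangles. With `plt.scatter()`, pass `marker='^'` and `color='green'`; the shorthand `'g^'` is a format string that works only with `plt.plot()`.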
    * Add an annotation to the highest data point (dose 10.0, inhibition 90) that says "IC90" using `plt.annotate()`.
    * Add a title: "Drug X Dose-Response Curve".
    * Label the x-axis: "Dose (µg/mL)".
    * Label the y-axis: "Inhibition (%)".
    * Save the plot as `dose_response.pdf` and then show it.

3.  **Comparing Two Datasets with a Bar Chart and Custom Colors:**
    * You have a list of categories: `solvents = ['Water', 'Ethanol', 'Methanol']`.
    * You have two lists of experimental results: `yield_exp1 = [85, 75, 68]` and `yield_exp2 = [88, 78, 71]`.
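    * Create a bar chart that compares `yield_exp1` and `yield_exp2` for each solvent. You'll need to plot two sets of bars side by side, for example by using numeric x positions (such as `np.arange(len(solvents))`) and shifting each set by half the bar width.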
    * Give 'Experiment 1' bars a blue color and 'Experiment 2' bars a red color.
    * Add a title, axis labels, and a legend for 'Experiment 1' and 'Experiment 2'.
    * Show the plot.