
This notebook covers the essential Matplotlib functions and plot types commonly used in machine learning (ML) and deep learning (DL) workflows, with an example for each.

Matplotlib is the foundational data visualization library in Python. In ML/DL, it's crucial for:
* Understanding data distributions.
* Visualizing model performance during and after training (e.g., loss/accuracy curves).
* Inspecting model predictions vs. actual values.
* Visualizing datasets (especially images).
* Presenting results (e.g., confusion matrices, feature importance).

**Standard Import & Setup**

In [None]:
import matplotlib.pyplot as plt
import numpy as np # Often used together to generate/manipulate data for plotting

# Optional: IPython magic that displays plots inline in Jupyter notebooks
# (remove this line when running the code as a plain .py script)
%matplotlib inline

# Optional: Set a default figure size for better readability
plt.rcParams['figure.figsize'] = (8, 5) # (width, height in inches)

**Core Concepts: Figure and Axes**

* **Figure:** The whole window or page the plot is drawn on.
* **Axes:** The area where data is plotted; a figure can contain multiple axes (subplots).

You can use the `pyplot` interface (`plt.plot()`, `plt.title()`, etc.) for simple plots, or the more flexible object-oriented interface, which creates `Figure` and `Axes` objects explicitly (`fig, ax = plt.subplots(); ax.plot(); ax.set_title()`). The object-oriented interface is generally recommended, especially for figures with multiple subplots.
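
To make the Figure/Axes relationship concrete, here is a minimal sketch that draws on two axes in one figure, once through the object-oriented interface and once through `pyplot`. The variable names (`ax_left`, `ax_right`) and the data are illustrative only.

```python
import matplotlib.pyplot as plt

# One Figure containing two Axes arranged side by side
fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(8, 3))

# Object-oriented style: call methods on a specific Axes
ax_left.plot([0, 1, 2], [0, 1, 4])
ax_left.set_title("Drawn via ax_left.plot()")

# Pyplot style: plt.* functions act on the "current" Axes,
# which plt.sca() sets explicitly here
plt.sca(ax_right)
plt.plot([0, 1, 2], [4, 1, 0])
plt.title("Drawn via plt.plot()")

# The Figure keeps track of the Axes it contains
print(len(fig.axes))          # 2
print(ax_left.figure is fig)  # True

fig.tight_layout()
plt.show()
```

Both halves produce the same kind of plot; the difference is only whether the target axes is named explicitly or taken from pyplot's implicit state.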

**1. Line Plots (`plt.plot()` / `ax.plot()`)**

* **Use Case:** Plotting training/validation loss and accuracy over epochs, visualizing time series data, showing the relationship between two ordered variables.

In [None]:
# Simulate training history (seeded so the curves are reproducible)
np.random.seed(0)
epochs = np.arange(1, 21)
n_epochs = len(epochs)
train_loss = 0.8 / epochs + np.random.randn(n_epochs) * 0.05
val_loss = 0.9 / epochs + np.random.randn(n_epochs) * 0.08 + 0.05 # Slightly higher validation loss
train_acc = 1 - (0.7 / epochs + np.random.randn(n_epochs) * 0.03)
val_acc = 1 - (0.8 / epochs + np.random.randn(n_epochs) * 0.05)

# --- Using Pyplot interface ---
# plt.figure() # Create a new figure (optional for single plot)
# plt.plot(epochs, train_loss, label='Training Loss', marker='o')
# plt.plot(epochs, val_loss, label='Validation Loss', marker='x')
# plt.title('Model Loss over Epochs (Pyplot)')
# plt.xlabel('Epoch')
# plt.ylabel('Loss')
# plt.legend()
# plt.grid(True)
# plt.show() # Display the plot

# --- Using Object-Oriented interface (better for customization/subplots) ---
fig, ax1 = plt.subplots() # Create figure and one axes object

# Plot loss on the first y-axis
color = 'tab:red'
ax1.set_xlabel('Epoch')
ax1.set_ylabel('Loss', color=color)
ax1.plot(epochs, train_loss, color=color, linestyle='-', marker='o', label='Training Loss')
ax1.plot(epochs, val_loss, color=color, linestyle='--', marker='x', label='Validation Loss')
ax1.tick_params(axis='y', labelcolor=color)
ax1.grid(True, axis='y') # Add horizontal grid lines for loss

# Create a second y-axis sharing the same x-axis for accuracy
ax2 = ax1.twinx()
color = 'tab:blue'
ax2.set_ylabel('Accuracy', color=color)
ax2.plot(epochs, train_acc, color=color, linestyle='-', marker='s', label='Training Accuracy')
ax2.plot(epochs, val_acc, color=color, linestyle='--', marker='^', label='Validation Accuracy')
ax2.tick_params(axis='y', labelcolor=color)
ax2.set_ylim(0, 1.1) # Set accuracy limits

# Add titles and legends
fig.suptitle('Model Training History (Object-Oriented)') # Overall title
fig.legend(loc='lower center', bbox_to_anchor=(0.5, 0.0), ncol=2) # One combined legend for both axes, just inside the bottom edge of the figure
fig.tight_layout(rect=[0, 0.12, 1, 0.95]) # Leave room for the title (top) and the legend (bottom)

plt.show() # Display the plot

**2. Scatter Plots (`plt.scatter()` / `ax.scatter()`)**

* **Use Case:** Visualizing the relationship between two features, comparing actual vs. predicted values in regression, visualizing clusters (e.g., after PCA or t-SNE), identifying outliers.

In [None]:
# Simulate actual vs predicted values for regression
np.random.seed(42)
actual_values = np.random.rand(50) * 10
predicted_values = actual_values + np.random.randn(50) * 1.5 # Predictions with some noise

fig, ax = plt.subplots()
ax.scatter(actual_values, predicted_values, alpha=0.7, edgecolors='k', label='Data points')

# Add a line representing perfect predictions (y=x)
lims = [min(ax.get_xlim()[0], ax.get_ylim()[0]), max(ax.get_xlim()[1], ax.get_ylim()[1])]
ax.plot(lims, lims, 'r--', alpha=0.75, zorder=0, label='Perfect Prediction') # zorder=0 puts line behind points

ax.set_xlabel("Actual Values")
ax.set_ylabel("Predicted Values")
ax.set_title("Actual vs. Predicted Values in Regression")
ax.legend()
ax.grid(True)
plt.show()

# --- Example: Visualizing Clusters ---
from sklearn.datasets import make_blobs # For generating sample cluster data
X, y = make_blobs(n_samples=100, centers=3, n_features=2, random_state=42, cluster_std=1.0)

fig, ax = plt.subplots()
scatter = ax.scatter(X[:, 0], X[:, 1], c=y, cmap='viridis', edgecolors='k', alpha=0.8)
ax.set_xlabel("Feature 1 (or PC1/t-SNE1)")
ax.set_ylabel("Feature 2 (or PC2/t-SNE2)")
ax.set_title("Data Clusters Visualization")
ax.legend(handles=scatter.legend_elements()[0], labels=['Cluster 0', 'Cluster 1', 'Cluster 2'])
ax.grid(True)
plt.show()
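
The use case above also mentions identifying outliers. A minimal sketch of one way to do that on an actual-vs-predicted plot follows. The 2-standard-deviation threshold and the simulated data are assumptions chosen for illustration, not a recommended rule.

```python
import matplotlib.pyplot as plt
import numpy as np

# Simulated regression results (illustrative data only)
rng = np.random.default_rng(0)
actual = rng.uniform(0, 10, size=50)
predicted = actual + rng.normal(0, 1.5, size=50)

# Flag points whose residual lies more than 2 standard deviations
# from the mean residual (an arbitrary, illustrative threshold)
residuals = predicted - actual
is_outlier = np.abs(residuals - residuals.mean()) > 2 * residuals.std()

fig, ax = plt.subplots()
ax.scatter(actual[~is_outlier], predicted[~is_outlier],
           alpha=0.7, edgecolors='k', label='Within 2 std')
ax.scatter(actual[is_outlier], predicted[is_outlier],
           color='tab:red', marker='X', s=80, label='Residual beyond 2 std')
ax.set_xlabel("Actual Values")
ax.set_ylabel("Predicted Values")
ax.set_title("Highlighting Large Residuals")
ax.legend()
ax.grid(True)
plt.show()
```

Drawing the two groups with separate `scatter` calls gives each its own legend entry and marker style.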

**3. Histograms (`plt.hist()` / `ax.hist()`)**

* **Use Case:** Understanding the distribution of a single variable (feature values, prediction errors), checking for normality or skewness.

In [None]:
# Simulate prediction errors (ideally centered around zero)
errors = np.random.randn(1000) * 5 # Normally distributed errors with std dev 5

fig, ax = plt.subplots()
ax.hist(errors, bins=30, edgecolor='black', alpha=0.7) # bins controls number of bars
ax.set_xlabel("Prediction Error")
ax.set_ylabel("Frequency")
ax.set_title("Distribution of Prediction Errors")
ax.axvline(errors.mean(), color='r', linestyle='dashed', linewidth=1, label=f'Mean: {errors.mean():.2f}')
ax.axvline(0, color='k', linestyle='solid', linewidth=1, label='Zero Error')
ax.legend()
plt.show()

**4. Bar Charts (`plt.bar()` / `ax.bar()`)**

* **Use Case:** Comparing metrics across different models, visualizing feature importance scores, showing class distributions.

In [None]:
# Simulate feature importance scores
features = ['Feature A', 'Feature B', 'Feature C', 'Feature D']
importance = [0.45, 0.25, 0.15, 0.15]

fig, ax = plt.subplots()
ax.bar(features, importance, color=['skyblue', 'lightcoral', 'lightgreen', 'gold'])
ax.set_xlabel("Features")
ax.set_ylabel("Importance Score")
ax.set_title("Feature Importance")
ax.set_ylim(0, 0.5) # Adjust y-limit for better visualization
# Add text labels on bars
for i, v in enumerate(importance):
    ax.text(i, v + 0.01, f"{v:.2f}", ha='center', va='bottom')
plt.show()
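
For the other use case listed above, comparing metrics across models, a grouped bar chart is a common choice. The sketch below uses made-up model names and scores purely for illustration, and assumes Matplotlib 3.4 or newer for `ax.bar_label`.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical models and scores (illustrative numbers only)
models = ['Model A', 'Model B', 'Model C']
metrics = {
    'Accuracy': [0.91, 0.88, 0.93],
    'F1 score': [0.89, 0.85, 0.92],
}

x = np.arange(len(models))  # One group per model
width = 0.35                # Width of each bar within a group

fig, ax = plt.subplots()
for i, (name, values) in enumerate(metrics.items()):
    # Shift each metric's bars so the group stays centered on its tick
    offset = (i - (len(metrics) - 1) / 2) * width
    bars = ax.bar(x + offset, values, width, label=name)
    ax.bar_label(bars, fmt='%.2f', padding=2)  # Requires Matplotlib 3.4+

ax.set_xticks(x)
ax.set_xticklabels(models)
ax.set_ylabel('Score')
ax.set_ylim(0, 1.1)
ax.set_title('Model Comparison (illustrative data)')
ax.legend(loc='lower right')
plt.show()
```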

**5. Image Display (`plt.imshow()` / `ax.imshow()`)**

* **Use Case:** Displaying images from datasets (e.g., MNIST, CIFAR), visualizing filters or activation maps in Convolutional Neural Networks (CNNs).

In [None]:
# Simulate a grayscale image (e.g., MNIST digit) - 28x28 pixels
image_gray = np.random.rand(28, 28)

# Simulate an RGB image - 32x32 pixels with 3 color channels
image_rgb = np.random.rand(32, 32, 3)

fig, axes = plt.subplots(1, 2, figsize=(8, 4)) # 1 row, 2 columns

# Display grayscale
im_gray = axes[0].imshow(image_gray, cmap='gray') # Use grayscale colormap
axes[0].set_title("Grayscale Image")
axes[0].axis('off') # Hide axes ticks/labels for images
# fig.colorbar(im_gray, ax=axes[0]) # Optional: add colorbar

# Display RGB
im_rgb = axes[1].imshow(image_rgb) # cmap is ignored for RGB(A) arrays; float values are expected in [0, 1]
axes[1].set_title("RGB Image")
axes[1].axis('off')

plt.tight_layout()
plt.show()
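
When inspecting a dataset, it is common to show several samples at once with their labels. The following sketch does this with a grid of subplots. The random arrays stand in for real images and labels, so the shapes and names here are assumptions.

```python
import matplotlib.pyplot as plt
import numpy as np

# Stand-ins for 10 grayscale samples (28x28) and their class labels
rng = np.random.default_rng(0)
images = rng.random((10, 28, 28))
labels = rng.integers(0, 10, size=10)

fig, axes = plt.subplots(2, 5, figsize=(10, 4))  # 2 rows, 5 columns

# axes.flat iterates over the 2D grid of Axes in row-major order
for ax, img, label in zip(axes.flat, images, labels):
    ax.imshow(img, cmap='gray', vmin=0, vmax=1)  # Fixed range keeps brightness comparable
    ax.set_title(f"Label: {label}")
    ax.axis('off')

fig.tight_layout()
plt.show()
```

Passing `vmin` and `vmax` explicitly stops `imshow` from rescaling each image to its own minimum and maximum.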

**6. Heatmaps (`plt.imshow()` / `seaborn.heatmap()`)**

* **Use Case:** Visualizing confusion matrices, correlation matrices between features. While `imshow` can create basic heatmaps, the `seaborn` library (built on Matplotlib) often provides a more convenient function (`seaborn.heatmap`) specifically for this.

In [None]:
# Simulate a confusion matrix (Actual vs Predicted)
# Rows: Actual Class 0, 1, 2
# Columns: Predicted Class 0, 1, 2
conf_matrix = np.array([
    [100, 5, 2], # Actual 0: Pred 0, Pred 1, Pred 2
    [8, 110, 7], # Actual 1: Pred 0, Pred 1, Pred 2
    [1, 4, 95]  # Actual 2: Pred 0, Pred 1, Pred 2
])

fig, ax = plt.subplots(figsize=(5, 4))
im = ax.imshow(conf_matrix, cmap='Blues') # Choose a colormap (e.g., Blues, viridis, magma)

# Add labels, title, ticks
ax.set_xticks(np.arange(conf_matrix.shape[1]))
ax.set_yticks(np.arange(conf_matrix.shape[0]))
ax.set_xticklabels(['Pred 0', 'Pred 1', 'Pred 2'])
ax.set_yticklabels(['Actual 0', 'Actual 1', 'Actual 2'])
ax.set_xlabel("Predicted Label")
ax.set_ylabel("True Label")
ax.set_title("Confusion Matrix")

# Rotate the tick labels and set their alignment.
plt.setp(ax.get_xticklabels(), rotation=45, ha="right", rotation_mode="anchor")

# Loop over data dimensions and create text annotations.
for i in range(conf_matrix.shape[0]):
    for j in range(conf_matrix.shape[1]):
        # Use white text on dark cells and black text on light cells
        text_color = "white" if conf_matrix[i, j] > conf_matrix.max() / 2 else "black"
        ax.text(j, i, conf_matrix[i, j],
                ha="center", va="center", color=text_color)

fig.colorbar(im, ax=ax) # Add a colorbar to show the scale
fig.tight_layout()
plt.show()

# --- Alternative using Seaborn (often simpler for heatmaps) ---
# import seaborn as sns
# fig, ax = plt.subplots(figsize=(5, 4))
# sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues', ax=ax,
#             xticklabels=['Pred 0', 'Pred 1', 'Pred 2'],
#             yticklabels=['Actual 0', 'Actual 1', 'Actual 2'])
# ax.set_xlabel("Predicted Label")
# ax.set_ylabel("True Label")
# ax.set_title("Confusion Matrix (Seaborn)")
# plt.show()
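
The same `imshow` approach works for the correlation matrices mentioned in the use case. Here is a minimal sketch with simulated features. The feature names and the way they are generated are assumptions made only to produce visible positive and negative correlations.

```python
import matplotlib.pyplot as plt
import numpy as np

# Simulated features with known relationships (illustrative only)
rng = np.random.default_rng(0)
n = 200
f1 = rng.normal(size=n)
f2 = 0.8 * f1 + 0.2 * rng.normal(size=n)  # Strongly positively correlated with f1
f3 = rng.normal(size=n)                   # Independent of the others
f4 = -0.5 * f1 + rng.normal(size=n)       # Negatively correlated with f1
data = np.column_stack([f1, f2, f3, f4])
names = ['f1', 'f2', 'f3', 'f4']

# rowvar=False treats each column as one variable
corr = np.corrcoef(data, rowvar=False)

fig, ax = plt.subplots(figsize=(5, 4))
# A diverging colormap with a symmetric range puts zero correlation at the neutral color
im = ax.imshow(corr, cmap='coolwarm', vmin=-1, vmax=1)
ax.set_xticks(np.arange(len(names)))
ax.set_yticks(np.arange(len(names)))
ax.set_xticklabels(names)
ax.set_yticklabels(names)
ax.set_title("Feature Correlation Matrix")

# Annotate each cell with its coefficient
for i in range(corr.shape[0]):
    for j in range(corr.shape[1]):
        ax.text(j, i, f"{corr[i, j]:.2f}", ha="center", va="center")

fig.colorbar(im, ax=ax, label='Pearson correlation')
fig.tight_layout()
plt.show()
```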

**7. Customization and Saving**

* **Labels & Title:** `ax.set_xlabel()`, `ax.set_ylabel()`, `ax.set_title()`
* **Legend:** `ax.legend()` (requires `label='...'` in plot commands)
* **Axis Limits:** `ax.set_xlim()`, `ax.set_ylim()`
* **Grid:** `ax.grid(True)`
* **Figure Size:** `plt.figure(figsize=(width, height))` or `fig, ax = plt.subplots(figsize=(width, height))`
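* **Subplots:** `fig, axes = plt.subplots(nrows, ncols)` creates a grid of plots. By default, `axes` is a single `Axes` object for a 1x1 grid, a 1D array when there is only one row or one column (access via `axes[index]`), and a 2D array otherwise (access via `axes[row, col]`).
* **Saving:** `fig.savefig('my_plot.png', dpi=300)` saves that figure to a file; `plt.savefig(...)` does the same for the *current figure*. The format is inferred from the file extension (e.g., png, pdf, svg), and `dpi` controls the resolution of raster output.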
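
As a small sketch of the subplot indexing described in the list above, the following creates grids of different shapes and prints the shape of the returned `axes`. The plotted functions are arbitrary placeholders.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)

# 2x2 grid: axes is a 2D NumPy array indexed as axes[row, col]
fig, axes = plt.subplots(2, 2, figsize=(8, 6), sharex=True)
axes[0, 0].plot(x, np.sin(x))
axes[0, 0].set_title("sin(x)")
axes[0, 1].plot(x, np.cos(x))
axes[0, 1].set_title("cos(x)")
axes[1, 0].plot(x, np.sin(2 * x))
axes[1, 0].set_title("sin(2x)")
axes[1, 1].plot(x, np.cos(2 * x))
axes[1, 1].set_title("cos(2x)")
fig.tight_layout()

# 1x3 grid: the singleton dimension is squeezed away, so axes is 1D
fig_row, axes_row = plt.subplots(1, 3, figsize=(9, 3))

# squeeze=False always returns a 2D array, which keeps indexing code uniform
fig_sq, axes_sq = plt.subplots(1, 3, figsize=(9, 3), squeeze=False)

print(axes.shape, axes_row.shape, axes_sq.shape)  # (2, 2) (3,) (1, 3)
plt.show()
```

Using `squeeze=False` is helpful when the number of rows or columns is computed at runtime and might be 1.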

In [None]:
# Example saving a plot
fig, ax = plt.subplots()
ax.plot([1, 2, 3], [1, 4, 9])
ax.set_title("Simple Plot to Save")
fig.savefig('simple_plot.png', dpi=150) # In a script, save before plt.show(); the figure may be blank afterwards
# plt.show() # Not strictly needed if only saving

These functions provide a solid foundation for visualizing data and model results throughout the machine learning lifecycle. Remember that libraries like Seaborn build upon Matplotlib to offer higher-level interfaces for specific statistical plot types.
