# Building an Interactive EDA Dashboard with Panel and HvPlot
In the main tutorial we built a basic EDA dashboard using matplotlib. Matplotlib is well suited to exploratory data analysis because it gives you a great deal of control. It's also great for learning, which is why we centered SIADS 521 on it. But other libraries give you interactivity. By the end of this tutorial, you'll have created a professional-grade dashboard that updates in real time as users interact with it.

We'll be using some really powerful Python libraries:
- **Panel**: Think of it as the framework for our dashboard - it handles all the interactive bits
- **HvPlot**: This is our visualization powerhouse - it makes beautiful, interactive plots
- **Pandas**: Our data manipulation Swiss Army knife

## Prerequisites

First things first - let's make sure you have all the tools you need. Run this cell to install the required packages:

In [None]:
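!pip install panel hvplot pandas numpy seaborn scipy

# Note: scipy is needed for the clustering step in section 4.4
# Recent Bokeh releases ship their sample data separately, so if the Auto MPG
# import in Step 2 fails, installing the bokeh_sampledata package may help
# You might need to restart your kernel after installing these packages
# If you see any errors, try restarting the kernel and running this cell again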

## Step 1: Setting Up Your Environment

Let's import our tools and set things up. I'll explain each import and why we need it:

In [None]:
# Essential imports for our dashboard
import panel as pn                # The main dashboard framework
import hvplot.pandas             # Adds plotting methods directly to pandas DataFrames
import pandas as pd              # For data manipulation
import numpy as np               # For numerical operations
import seaborn as sns            # For additional plotting capabilities

# Initialize Panel extension - this is crucial!
# It enables Jupyter to display Panel objects and interactive widgets
pn.extension()

# Pro tip: You can enable a dark theme like this (Panel 1.0 or later):
# pn.extension(theme='dark')

## Step 2: Preparing Your Data

Now comes the fun part - getting our data ready! We'll use the classic Auto MPG dataset, but the techniques we'll learn can be applied to any dataset. I'll show you some cool Python features along the way.

In [None]:
# Load our dataset
from bokeh.sampledata.autompg import autompg_clean as df

# Let's look at what we're working with
print("🚗 First peek at our data:")
display(df.head())

# Here's a handy pandas feature: select_dtypes filters columns by data type,
# so we can split numeric and categorical columns without listing them by hand
numeric_cols = list(df.select_dtypes(include=[np.number]).columns)
categorical_cols = list(df.select_dtypes(exclude=[np.number]).columns)

# Let's see what we found
print("\n📊 Numeric columns:", numeric_cols)
print("📝 Categorical columns:", categorical_cols)

# ADVANCED FEATURE: Let's create a quick summary using a dictionary comprehension
# This is a more Pythonic way to create dictionaries!
data_summary = {
    col: {
        'type': str(df[col].dtype),
        'missing': df[col].isna().sum(),
        'unique_values': len(df[col].unique())
    } for col in df.columns
}

# Convert our summary to a DataFrame - neat trick!
print("\n📈 Detailed Data Summary:")
display(pd.DataFrame(data_summary).T)
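
To see the column-splitting step in isolation, here is a minimal sketch on a tiny hand-made DataFrame. The frame `toy` and its values are invented for illustration and are not part of the Auto MPG data; only `select_dtypes` and the dictionary comprehension mirror the cell above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, invented for this sketch
toy = pd.DataFrame({
    "mpg": [18.0, 25.5, None],
    "cyl": [8, 4, 4],
    "origin": ["North America", "Asia", "Asia"],
})

# Split columns by dtype, exactly as in the cell above
numeric_cols = list(toy.select_dtypes(include=[np.number]).columns)
categorical_cols = list(toy.select_dtypes(exclude=[np.number]).columns)

# One small record per column: dtype, missing count, distinct values
summary = {
    col: {
        "type": str(toy[col].dtype),
        "missing": int(toy[col].isna().sum()),
        "unique_values": int(toy[col].nunique()),  # nunique() skips missing values
    }
    for col in toy.columns
}

print(numeric_cols)
print(categorical_cols)
print(pd.DataFrame(summary).T)
```

Transposing with `.T` puts one column of the original data on each row of the summary, which is usually easier to scan.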

## Step 3: Creating Interactive Widgets

This is where the magic begins! We're going to create widgets that users can interact with. These widgets will control what data we display and how we display it.

Think of widgets as the control panel for your dashboard - they're the knobs and buttons that users will play with to explore the data.

In [None]:
# ADVANCED FEATURE: Using a dictionary comprehension to build widget options
def create_widget_options(columns, exclude=None):
    """Helper function to create widget options with pretty labels
    
    Args:
        columns (list): List of column names
        exclude (list, optional): Columns to exclude
        
    Returns:
        dict: Maps each display label to its column name, which is the
            format a Select widget accepts for labelled options
    """
    exclude = exclude or []  # Avoid a mutable default argument
    # Using a dictionary comprehension with conditional logic
    return {col.replace('_', ' ').title(): col for col in columns if col not in exclude}

# Create our main variable selector
select_var = pn.widgets.Select(
    options=numeric_cols,          # What options to show
    name='Variable',               # Label for the widget
    value='mpg',                   # Default value
    description='Choose the variable to analyze'  # Tooltip help text
)

# Create our grouping selector
select_group = pn.widgets.Select(
    # ADVANCED FEATURE: Using filter() with lambda
    # We only want categorical columns that have a reasonable number of unique values
    options=list(filter(lambda x: df[x].nunique() < 10, ['origin', 'mfr'])),
    name='Group By',
    value='origin'
)

# ADVANCED FEATURE: Adding a range slider using f-strings
mpg_range = pn.widgets.RangeSlider(
    name='MPG Filter',
    start=df['mpg'].min(),
    end=df['mpg'].max(),
    value=(df['mpg'].min(), df['mpg'].max()),
    step=1,
    format='0[.]0',
    description=f"Filter MPG range (min: {df['mpg'].min():.1f}, max: {df['mpg'].max():.1f})"
)

# Display our widgets in a nice layout
# ADVANCED FEATURE: Using * operator to unpack a list into arguments
controls = pn.Column(
    '## Dashboard Controls',
    *[select_var, select_group, mpg_range],
    sizing_mode='stretch_width'
)

controls
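
The "pretty label" idea is easier to follow with a concrete mapping. The sketch below is plain Python; `pretty_options` is a hypothetical helper and `model_year` is a made-up column name used only to show the underscore handling. It assumes you want the widget to show the label while your code receives the raw column name.

```python
def pretty_options(columns, exclude=None):
    """Map a human-friendly label to each column name (hypothetical helper)."""
    exclude = set(exclude or [])
    return {
        col.replace("_", " ").title(): col
        for col in columns
        if col not in exclude
    }

# "model_year" is invented here to demonstrate the underscore replacement
options = pretty_options(["mpg", "model_year", "hp"], exclude=["hp"])
print(options)

# Assumed usage with Panel, which accepts a dict of label -> value:
# pn.widgets.Select(name="Variable", options=options)
```

With a dict, `select.value` holds the column name rather than the label, so the plotting functions need no changes.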

## Step 4: Creating Visualization Functions

Now we're getting to the really cool stuff! We'll create functions that generate our visualizations. These functions will be "reactive" - they'll automatically update when our widgets change.

### 4.1 Histogram Function

Let's start with a histogram - it's perfect for seeing the distribution of our data:

In [None]:
# ADVANCED FEATURE: Using decorators for reactivity!
# @pn.depends tells Panel which widgets should trigger updates
@pn.depends(select_var, select_group, mpg_range)
def histogram_plot(select_var, select_group, mpg_range):
    """Creates an interactive histogram, with one distribution per group.
    
    Args:
        select_var (str): The variable to plot
        select_group (str): The grouping variable
        mpg_range (tuple): Range of MPG values to include
        
    Returns:
        hvplot: Interactive histogram plot
    """
    # ADVANCED FEATURE: Using boolean indexing with pandas
    filtered_df = df[
        (df['mpg'] >= mpg_range[0]) & 
        (df['mpg'] <= mpg_range[1])
    ]
    
    # Create the plot with lots of customization
    plot = filtered_df.hvplot.hist(
        y=select_var,                    # What to plot
        by=select_group,                 # How to group it
        bins=20,                         # Number of bins
        height=300,                      # Plot height
        alpha=0.6,                       # Transparency
        title=f'Distribution of {select_var}',  # Dynamic title
        xlabel=select_var,               # X-axis label
        ylabel='Count',                  # Y-axis label
        # ADVANCED FEATURE: Dictionary unpacking for extra options
        **{'responsive': True,           # Make it responsive
           'legend': 'right'}            # Move legend to right
    )
    
    return plot

# Test our histogram function
# Wrapping it in pn.panel renders the plot and keeps it linked to the widgets
pn.panel(histogram_plot)
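
The boolean mask inside the histogram function is worth trying on its own. This is a small sketch with invented values; it also shows `Series.between`, a pandas shorthand that is inclusive on both ends by default and so selects the same rows.

```python
import pandas as pd

# Hypothetical values standing in for the mpg column
cars = pd.DataFrame({"mpg": [12.0, 18.5, 24.0, 31.0, 44.6]})
mpg_range = (18.5, 31.0)  # what a RangeSlider would hand us as (low, high)

# Two conditions combined with &, each wrapped in parentheses
mask = (cars["mpg"] >= mpg_range[0]) & (cars["mpg"] <= mpg_range[1])
filtered = cars[mask]

# Equivalent shorthand
filtered_between = cars[cars["mpg"].between(*mpg_range)]

print(filtered)
```

The parentheses around each comparison matter: `&` binds more tightly than `>=`, so leaving them out raises an error.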

### 4.2 Box Plot Function

Box plots are amazing for showing the distribution of data across different groups. They show us the median, quartiles, and potential outliers all in one visualization. Let's build an interactive one!

🤔 Quick Stats Refresher:
- The box shows the Inter-Quartile Range (IQR) - the middle 50% of your data
- The line in the middle is the median
- The whiskers typically extend to the most extreme data points that lie within 1.5 * IQR of the box edges
- Points beyond the whiskers are considered potential outliers
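
The refresher above can be worked through by hand. Here is a minimal sketch using only the standard library and made-up numbers. Quartile conventions differ slightly between libraries, so a plotting library may draw a box a little differently from these values.

```python
from statistics import quantiles

# Made-up sample with one suspiciously large value
values = sorted([10, 12, 13, 14, 15, 16, 18, 40])

# Quartiles using linear interpolation between data points
q1, median, q3 = quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

# Fences sit 1.5 * IQR beyond the box edges
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers stop at the most extreme points that are still inside the fences
inside = [v for v in values if lower_fence <= v <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)

# Anything outside the fences is flagged as a potential outlier
outliers = [v for v in values if v < lower_fence or v > upper_fence]

print(q1, median, q3, iqr)
print(whisker_low, whisker_high, outliers)
```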

In [None]:
# ADVANCED FEATURE: One function that depends on several widgets
@pn.depends(select_var, select_group, mpg_range)
def box_plot(select_var, select_group, mpg_range):
    """Creates an interactive box plot with a per-group summary table.
    
    This is a more sophisticated version of a basic box plot that includes:
    - Outliers drawn as individual points
    - A table of count, median, mean, and standard deviation per group
    - Automatic updates based on filters
    
    Args:
        select_var (str): Variable to plot on y-axis
        select_group (str): Grouping variable for x-axis
        mpg_range (tuple): MPG filter range
    
    Returns:
        pn.Column: Box plot stacked above the summary table
    """
    # First, let's filter our data
    # ADVANCED FEATURE: Chain multiple boolean conditions
    filtered_df = df[
        (df['mpg'] >= mpg_range[0]) & 
        (df['mpg'] <= mpg_range[1])
    ]
    
    # ADVANCED FEATURE: groupby + agg computes every statistic in one pass
    stats = (
        filtered_df.groupby(select_group)[select_var]
        .agg(['count', 'median', 'mean', 'std'])
        .round(2)
    )
    
    # Create the box plot
    plot = filtered_df.hvplot.box(
        y=select_var,
        by=select_group,
        height=300,
        width=400,                  # Fixed width
        whisker_color='black',      # Make whiskers black for contrast
        box_alpha=0.7,              # Slight transparency
        outlier_alpha=0.7,          # Match outlier transparency
        legend=False,               # Groups are already labelled on the x-axis
        title=f'Distribution of {select_var} by {select_group}'
    )
    
    # Show the statistics next to the plot instead of in custom tooltips
    return pn.Column(plot, pn.pane.DataFrame(stats))

# Let's see our beautiful box plot!
pn.panel(box_plot)

### 4.3 Scatter Plot with Advanced Features

Now let's create a scatter plot that shows relationships between variables. But we'll make it extra special with:
- Color coding by group
- Size variation based on another variable
- Interactive tooltips
- Trend lines
- Zoom capabilities

In [None]:
# ADVANCED FEATURE: Class-based approach for more complex visualization
import holoviews as hv  # Installed alongside hvPlot; used to draw trend lines

class InteractiveScatterPlot:
    """A class to handle complex scatter plot creation and updates.
    
    This class demonstrates object-oriented programming in Python and
    shows how to create more complex, reusable visualization components.
    """
    
    def __init__(self, df):
        """Initialize with a DataFrame and set up plotting options."""
        self.df = df
        # ADVANCED FEATURE: dict(zip(...)) pairs each origin with a color
        self.color_map = dict(
            zip(df['origin'].unique(), ['#1f77b4', '#ff7f0e', '#2ca02c'])
        )
        
    def _prepare_data(self, filtered_df):
        """Prepare data for plotting with additional calculated fields."""
        # ADVANCED FEATURE: Using numpy for calculations
        prepared_df = filtered_df.copy()
        prepared_df['point_size'] = np.log1p(prepared_df['weight']) * 2
        return prepared_df
    
    def _trend_line(self, group_df, x_var, y_var, color):
        """Fit a least-squares line to one group and return it as a curve."""
        slope, intercept = np.polyfit(group_df[x_var], group_df[y_var], 1)
        xs = np.array([group_df[x_var].min(), group_df[x_var].max()])
        return hv.Curve((xs, slope * xs + intercept), x_var, y_var).opts(
            color=color, line_width=2
        )
    
    def create_plot(self, x_var, mpg_range):
        """Create an interactive scatter plot with multiple features."""
        # Filter data
        filtered_df = self.df[
            (self.df['mpg'] >= mpg_range[0]) & 
            (self.df['mpg'] <= mpg_range[1])
        ]
        
        # Choose y variable (if x is mpg, use hp, otherwise use mpg)
        y_var = 'mpg' if x_var != 'mpg' else 'hp'
        
        # Prepare data
        plot_df = self._prepare_data(filtered_df)
        
        main_plot = plot_df.hvplot.scatter(
            x=x_var,
            y=y_var,
            by='origin',
            s='point_size',         # Vary point size
            alpha=0.6,              # Some transparency
            height=400,
            width=400,
            title=f'Relationship between {x_var} and {y_var}',
            # Extra columns to show on hover (skip any already on an axis)
            hover_cols=[c for c in ['name', 'weight', 'yr'] if c not in (x_var, y_var)],
            legend='top_right',
            tools=['box_select', 'lasso_select', 'zoom_in', 'zoom_out']
        )
        
        # Add a trend line for each group that has enough points to fit
        trend_lines = [
            self._trend_line(group_df, x_var, y_var, self.color_map[origin])
            for origin, group_df in plot_df.groupby('origin')
            if group_df[x_var].nunique() > 1
        ]
        
        # ADVANCED FEATURE: The * operator overlays one plot on another
        overlay = main_plot
        for line in trend_lines:
            overlay = overlay * line
        return overlay

# Create an instance of our scatter plot class
scatter_plotter = InteractiveScatterPlot(df)
# pn.bind connects the widgets to the method's arguments
scatter_plot = pn.bind(scatter_plotter.create_plot, x_var=select_var, mpg_range=mpg_range)

# Display the plot
pn.panel(scatter_plot)
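
A trend line is just a straight line fitted by least squares. The sketch below fits one with `np.polyfit` on invented numbers; it is an illustration of the idea, not necessarily how a plotting library computes its own regression overlays.

```python
import numpy as np

# Made-up points that roughly follow a straight line
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.0])

# Degree-1 polynomial fit returns (slope, intercept)
slope, intercept = np.polyfit(x, y, deg=1)

# Two endpoints are enough to draw a straight line across the data range
line_x = np.array([x.min(), x.max()])
line_y = slope * line_x + intercept

print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print(line_x, line_y)
```

Fitting each group separately, as the scatter plot does, lets you compare slopes across origins instead of averaging them away.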

### 4.4 Correlation Heatmap with Hierarchical Clustering

Let's create a correlation heatmap that includes:
- Hierarchical clustering of variables
- Hover tooltips showing each pair of variables and their correlation
- A diverging color scale fixed to the range -1 to 1

In [None]:
from scipy.cluster import hierarchy

def create_correlation_heatmap():
    """Creates an advanced correlation heatmap with clustering.
    
    This function demonstrates several advanced concepts:
    - Hierarchical clustering
    - Multiple statistical calculations
    - Complex data transformation
    - Custom visualization styling
    """
    # Calculate correlations
    # ADVANCED FEATURE: Multiple correlation methods
    pearson_corr = df[numeric_cols].corr(method='pearson')
    spearman_corr = df[numeric_cols].corr(method='spearman')
    
    # Perform hierarchical clustering
    # ADVANCED FEATURE: Using scipy for advanced statistics
    # Each row of the Spearman matrix is treated as a feature vector, so
    # variables with similar correlation profiles end up next to each other
    linkage = hierarchy.linkage(spearman_corr.values, method='ward')
    order = hierarchy.leaves_list(linkage)
    
    # Reorder correlation matrix
    ordered_corr = pearson_corr.iloc[order, order]
    
    # Create the heatmap
    # Hovering over a cell shows the variable pair and its Pearson correlation
    heatmap = ordered_corr.hvplot.heatmap(
        title='Variable Correlations (with Hierarchical Clustering)',
        height=400,
        width=400,
        cmap='RdBu_r',  # Red-Blue diverging colormap
        symmetric=True,      # Center the color scale on zero
        xaxis='top',        # Move x-axis to top
        colorbar=True,      # Show colorbar
        clim=(-1, 1)       # Set color limits
    )
    
    return heatmap

# Create and display the heatmap
correlation_heatmap = create_correlation_heatmap()
correlation_heatmap
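
The heatmap cell clusters with Ward linkage on the rows of the correlation matrix. A common alternative, sketched below as an assumption rather than as the tutorial's method, is to turn correlations into distances with `1 - |r|` and cluster those with average linkage. The 4x4 matrix and the variable names are invented so the reordering is easy to check by eye.

```python
import numpy as np
from scipy.cluster import hierarchy
from scipy.spatial.distance import squareform

# Invented correlations: "a" and "b" move together, as do "c" and "d"
names = ["a", "c", "b", "d"]
corr = np.array([
    [1.00, 0.10, 0.90, 0.20],
    [0.10, 1.00, 0.15, 0.80],
    [0.90, 0.15, 1.00, 0.10],
    [0.20, 0.80, 0.10, 1.00],
])

# Strong correlation (positive or negative) means small distance
distance = 1.0 - np.abs(corr)
np.fill_diagonal(distance, 0.0)

# linkage expects the condensed (upper-triangle) form of a distance matrix
condensed = squareform(distance, checks=False)
linkage = hierarchy.linkage(condensed, method="average")

# Leaf order of the dendrogram gives the new row and column order
order = hierarchy.leaves_list(linkage)
ordered_names = [names[i] for i in order]
ordered_corr = corr[np.ix_(order, order)]

print(ordered_names)
print(ordered_corr)
```

After reordering, related variables sit side by side, so blocks of strong correlation show up as squares along the diagonal.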

## Step 5: Building a Professional Dashboard

Now that we have our individual components, let's put them together in a way that's both professional and user-friendly. We'll cover advanced organization techniques, performance optimization, and best practices.

### 5.1 Responsive Layout System

In [None]:
class ResponsiveDashboard:
    """A dashboard that adapts to different screen sizes and user interactions.
    
    ADVANCED FEATURES:
    - Uses a template pattern for layout
    - Implements responsive breakpoints
    - Manages component visibility dynamically
    """
    
    def __init__(self):
        # Initialize our template areas
        self.sidebar = pn.Column(
            sizing_mode='stretch_width',
            min_width=300,   # A fixed width would conflict with stretch_width
            name='Sidebar'
        )
        
        self.main_content = pn.Column(
            sizing_mode='stretch_both',
            name='Main Content'
        )
        
        # ADVANCED FEATURE: Responsive breakpoints (pixel widths)
        # Kept for reference by our own code; nothing below reads them yet
        self.breakpoints = {
            'sm': 576,
            'md': 768,
            'lg': 992,
            'xl': 1200
        }
    
    def create_layout(self):
        """Creates a responsive layout that adapts to screen size."""
        # ADVANCED FEATURE: Dynamic template generation
        template = pn.template.MaterialTemplate(
            title='Interactive EDA Dashboard',
            sidebar=[self.sidebar],
            main=[self.main_content],
            header_background='#1976D2'
        )
        
        return template

# Initialize our responsive dashboard
dashboard = ResponsiveDashboard()

### 5.2 Advanced Component Organization with Tabs

In [None]:
class DashboardTabs:
    """Organizes dashboard content into logical tab groups.
    
    ADVANCED FEATURES:
    - Lazy loading of tab content
    - State persistence across tab switches
    - Dynamic tab generation
    """
    
    def __init__(self, plots):
        self.plots = plots
        self.current_tab = None
        
    def create_tab_layout(self):
        # ADVANCED FEATURE: A dictionary maps each tab title to its content
        tabs = {
            'Overview': pn.Column(
                pn.Row(self.plots['histogram'], self.plots['boxplot']),
                sizing_mode='stretch_width'
            ),
            'Relationships': pn.Column(
                self.plots['scatter'],
                self.plots['correlation'],
                sizing_mode='stretch_width'
            ),
            'Statistics': pn.Column(
                self.plots['stats'],
                sizing_mode='stretch_width'
            )
        }
        
        # Create the tab panel
        return pn.Tabs(
            *tabs.items(),  # Each (title, content) pair becomes one tab
            dynamic=True  # Enable lazy loading
        )

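# Summary table for the Statistics tab
summary_stats = pn.pane.DataFrame(df[numeric_cols].describe().round(2))

# Create our tabbed interface
tabs = DashboardTabs(plots={
    'histogram': histogram_plot,
    'boxplot': box_plot,
    'scatter': scatter_plot,
    'correlation': correlation_heatmap,
    'stats': summary_stats
})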

### 5.3 Performance Optimization

In [None]:
class PerformanceOptimizer:
    """Implements various performance optimizations for the dashboard.
    
    ADVANCED FEATURES:
    - Data caching keyed on the filter arguments
    - Downsampling for large datasets
    """
    
    def __init__(self, df):
        self.df = df
        self.cache = {}
        self.sample_sizes = {}
        
    def get_optimal_sample(self, data_size):
        """Determines optimal sample size based on data size."""
        if data_size < 1000:
            return data_size  # Use full dataset
        return min(1000, int(data_size * 0.1))  # 10% sample with max of 1000
    
    def cache_key(self, **kwargs):
        """Creates a unique cache key from parameters."""
        # ADVANCED FEATURE: Hash-based caching
        return hash(tuple(sorted(kwargs.items())))
    
    # The dictionary in self.cache does the caching, so no decorator is needed
    def get_filtered_data(self, **kwargs):
        """Returns filtered and potentially sampled data."""
        key = self.cache_key(**kwargs)
        
        if key in self.cache:
            return self.cache[key]
            
        # Filter data based on kwargs (no kwargs means no filtering)
        if kwargs:
            filtered = self.df.query(' & '.join(f"{k} == {v!r}" for k, v in kwargs.items()))
        else:
            filtered = self.df
        
        # Sample if necessary
        sample_size = self.get_optimal_sample(len(filtered))
        if sample_size < len(filtered):
            filtered = filtered.sample(n=sample_size, random_state=42)
            
        self.cache[key] = filtered
        return filtered

# Initialize our performance optimizer
optimizer = PerformanceOptimizer(df)
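
The cache key deserves a closer look, because keyword arguments can arrive in any order. Here is a small standalone sketch of the same hashing idea; `expensive_filter` and the call counter are hypothetical stand-ins for real filtering work.

```python
def cache_key(**kwargs):
    """Build an order-independent key from keyword arguments."""
    # Sorting the items makes (a=1, b=2) and (b=2, a=1) hash identically
    return hash(tuple(sorted(kwargs.items())))

cache = {}
calls = 0

def expensive_filter(**kwargs):
    """Hypothetical slow operation, memoized with the key above."""
    global calls
    key = cache_key(**kwargs)
    if key not in cache:
        calls += 1  # Count only the calls that do real work
        cache[key] = f"rows matching {sorted(kwargs.items())}"
    return cache[key]

expensive_filter(origin="Asia", cyl=4)
expensive_filter(cyl=4, origin="Asia")    # Same filters, different order: cache hit
expensive_filter(origin="Europe", cyl=4)  # Different filters: cache miss

print(calls)
```

One caveat: every value must be hashable. Strings, numbers, and tuples work, but passing a list as a filter value raises a `TypeError`.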

### 5.4 Error Handling and Debugging

In [None]:
class DashboardErrorHandler:
    """Handles errors and provides debugging capabilities.
    
    ADVANCED FEATURES:
    - Custom error messages
    - Error logging
    - Debug mode
    - User feedback
    """
    
    def __init__(self, debug=False):
        self.debug = debug
        self.error_log = []
        
    def handle_error(self, func):
        """Decorator for error handling."""
        # ADVANCED FEATURE: Decorator with error handling
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception as e:
                error_msg = str(e) if self.debug else "An error occurred"
                self.error_log.append({
                    'timestamp': pd.Timestamp.now(),
                    'function': func.__name__,
                    'error': str(e),
                    'args': args,
                    'kwargs': kwargs
                })
                
                # Return an error message panel
                return pn.pane.Alert(
                    f"Error: {error_msg}",
                    alert_type='danger'
                )
        return wrapper

# Initialize our error handler
error_handler = DashboardErrorHandler(debug=True)
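
To watch the decorator pattern work without a running dashboard, here is a stripped-down sketch. It is an illustration under stated assumptions: it returns a plain string where the class above returns an alert pane, and `handle_error`, `mean_of`, and `error_log` are names invented for this example.

```python
import functools

error_log = []

def handle_error(func, debug=False):
    """Wrap func so exceptions are logged and replaced by a message."""
    @functools.wraps(func)  # Keep the original function's name and docstring
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            error_log.append({"function": func.__name__, "error": str(exc)})
            detail = str(exc) if debug else "An error occurred"
            return f"Error: {detail}"
    return wrapper

def mean_of(values):
    """Hypothetical calculation that fails on empty input."""
    return sum(values) / len(values)

safe_mean = handle_error(mean_of, debug=True)

ok = safe_mean([1, 2, 3])   # Normal path: the result passes straight through
failed = safe_mean([])      # Error path: logged, and a message comes back

print(ok)
print(failed)
print(error_log)
```

Turning `debug` off hides the raw exception text from users while the log still records it for you.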

### 5.5 Best Practices and Code Organization

In [None]:
class DashboardManager:
    """Main dashboard manager implementing best practices.
    
    ADVANCED FEATURES:
    - Modular design
    - Configuration management
    - Error handling around layout creation
    """
    
    def __init__(self, df, plots, controls=None):
        # Initialize components
        self.layout = ResponsiveDashboard()
        self.tabs = DashboardTabs(plots=plots)
        self.optimizer = PerformanceOptimizer(df)
        self.error_handler = DashboardErrorHandler()
        
        # Put the widgets in the sidebar so users can drive the plots
        if controls is not None:
            self.layout.sidebar.append(controls)
        
        # Configuration (settings for our own code; Panel does not read this dict)
        self.config = {
            'theme': 'material',
            'cache_timeout': 300,  # 5 minutes
            'max_points': 1000,
            'update_throttle': 100  # ms
        }
        
    @property
    def dashboard(self):
        """Creates and returns the final dashboard."""
        # Combine all components
        template = self.layout.create_layout()
        
        # Build the tabs through the error handler so a failure shows an alert
        build_tabs = self.error_handler.handle_error(self.tabs.create_tab_layout)
        template.main.append(build_tabs())
        
        return template

# Create our final dashboard, reusing the plots and widgets defined earlier
manager = DashboardManager(df, plots=tabs.plots, controls=controls)
final_dashboard = manager.dashboard
# servable() marks the dashboard for `panel serve`; call .show() to preview it from a notebook
final_dashboard.servable()

## Key Differences from Matplotlib Version:

1. **Interactive vs Static**: HvPlot provides built-in interactivity (zoom, pan, hover tooltips) without extra code
2. **Less Code**: HvPlot's high-level API allows complex visualizations with fewer lines of code
3. **Automatic Layout**: Panel layouts resize with the browser window when you set a responsive `sizing_mode`
4. **Modern Defaults**: Better default styling and color schemes for web-based visualization
5. **Real-time Updates**: Plots can update dynamically based on user interactions

Key advantages of this HvPlot approach:
- Perfect for interactive dashboards and exploratory analysis
- Seamless integration with Panel for web applications
- Built-in responsive design for different screen sizes
- Automatic tooltips and hover information
- Modern, clean aesthetic out of the box
- Easy linking between different plots
- Works well with streaming or real-time data
- Great for sharing interactive visualizations

When to use HvPlot:
- Building interactive dashboards
- Exploratory data analysis where you want to dig deeper
- Creating web-based visualizations
- When working with large datasets (hvPlot can hand rendering to Datashader via `rasterize=True` or `datashade=True`, provided Datashader is installed)
- When you need quick insights with minimal code
- When your audience needs to interact with the data