# EDA-4: Audio Artifacts Analysis

**Sonification of Bearing Vibration Signals for XJTU-SY Dataset**

This notebook explores bearing health through auditory analysis. The vibration signals are sampled at 25.6 kHz, so they capture content up to 12.8 kHz, which lies inside the range of human hearing. We can therefore convert them directly to audio files to "hear" bearing degradation.

## Key Features
1. **Signal-to-Audio Conversion**: Convert vibration signals to WAV files
2. **Resampling**: Upsample to 44.1kHz for standard audio playback
3. **Lifecycle Audio Generation**: Create audio at healthy, degrading, and failed states
4. **Audio Comparison Widget**: Interactive playback for comparing bearing health states
5. **Spectrogram Visualization**: Visual+auditory correlation

## Why Sonification?
- Humans are excellent at detecting subtle patterns in sound
- Degradation often manifests as grinding, clicking, or roughness
- Auditory analysis complements visual spectrograms
- Technicians often use stethoscopes for bearing inspection

## Dataset Parameters
- **Sampling Rate**: 25.6 kHz (audible range: 20 Hz - 12.8 kHz)
- **Duration per file**: 32,768 samples = 1.28 seconds
- **Channels**: Horizontal and Vertical vibration
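
The duration and bandwidth figures above follow directly from the sampling rate and record length. A quick check of the arithmetic (the variable names are illustrative and not taken from the project modules):

```python
# Values taken from the Dataset Parameters list above.
fs_hz = 25_600        # sampling rate in Hz
n_samples = 32_768    # samples per file

duration_s = n_samples / fs_hz   # record length in seconds
nyquist_hz = fs_hz / 2           # highest frequency the recording can represent

print(f"Duration: {duration_s:.2f} s")
print(f"Nyquist:  {nyquist_hz:,.0f} Hz")
```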

In [None]:
import sys
sys.path.insert(0, '..')

import numpy as np
import pandas as pd
from pathlib import Path
from IPython.display import Audio, display, HTML
import ipywidgets as widgets
from scipy import signal
from scipy.io import wavfile
import matplotlib.pyplot as plt
import seaborn as sns

# Project modules
from src.data.loader import XJTUBearingLoader, SAMPLING_RATE
from src.utils.audio import (
    resample_signal,
    normalize_audio,
    signal_to_wav,
    convert_vibration_to_audio,
    AudioConfig,
    SOURCE_SAMPLE_RATE,
    TARGET_SAMPLE_RATE,
)

# Configure matplotlib
plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = [14, 5]
plt.rcParams['figure.dpi'] = 100

# Output directory for audio files
AUDIO_OUTPUT_DIR = Path('../outputs/audio')
AUDIO_OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

print('Libraries loaded successfully!')
print(f'Source Sampling Rate: {SOURCE_SAMPLE_RATE:,} Hz')
print(f'Target Sampling Rate: {TARGET_SAMPLE_RATE:,} Hz')
print(f'Audio Output Directory: {AUDIO_OUTPUT_DIR.absolute()}')

In [None]:
# Initialize data loader
loader = XJTUBearingLoader(data_root='../assets/Data/XJTU-SY_Bearing_Datasets')
metadata = loader.get_metadata()

print(f'\nDataset Overview:')
print(f'Conditions: {list(metadata.keys())}')
for condition, bearings in metadata.items():
    total_files = sum(len(files) for files in bearings.values())
    print(f'  {condition}: {len(bearings)} bearings, {total_files} total files')

---

## 1. Understanding the Signal as Audio

Before converting to audio, let's understand what we're working with. The vibration signal at 25.6kHz captures frequencies from 0 to 12.8kHz (Nyquist), which is well within the human hearing range (20Hz - 20kHz).

In [None]:
# Load a sample signal
sample_condition = '40Hz10kN'
sample_bearing = 'Bearing3_2'

# Load all files for this bearing
signals, filenames = loader.load_bearing(sample_condition, sample_bearing)
num_files = len(filenames)

print(f'Loaded {num_files} files from {sample_bearing} ({sample_condition})')
print(f'Signal shape per file: {signals[0].shape}')
print(f'Duration per file: {signals[0].shape[0] / SAMPLING_RATE * 1000:.1f} ms')

# Audio properties
duration_sec = signals[0].shape[0] / SAMPLING_RATE
nyquist = SAMPLING_RATE / 2

print(f'\nAudio Properties:')
print(f'  Duration: {duration_sec:.3f} seconds')
print(f'  Nyquist Frequency: {nyquist:,.0f} Hz')
print(f'  Audible Range: ~20 Hz to {min(nyquist, 20000):,.0f} Hz')

In [None]:
# Visualize waveform and spectrogram for healthy vs failed states
fig, axes = plt.subplots(2, 3, figsize=(16, 8))

# Define lifecycle stages
stages = {
    'Healthy (5%)': int(num_files * 0.05),
    'Degrading (50%)': int(num_files * 0.50),
    'Failed (95%)': int(num_files * 0.95)
}

for col, (stage_name, file_idx) in enumerate(stages.items()):
    sig = signals[file_idx][:, 0]  # Horizontal channel
    time_ms = np.arange(len(sig)) / SAMPLING_RATE * 1000
    
    # Waveform
    axes[0, col].plot(time_ms, sig, linewidth=0.5, color='#3498db')
    axes[0, col].set_title(f'{stage_name}', fontsize=12, fontweight='bold')
    axes[0, col].set_xlabel('Time (ms)')
    axes[0, col].set_ylabel('Amplitude')
    axes[0, col].set_xlim(0, time_ms[-1])
    
    # Add RMS annotation
    rms = np.sqrt(np.mean(sig**2))
    axes[0, col].annotate(f'RMS: {rms:.3f}', xy=(0.02, 0.98), xycoords='axes fraction',
                          fontsize=10, va='top', ha='left',
                          bbox=dict(boxstyle='round', facecolor='white', alpha=0.8))
    
    # Spectrogram
    f, t, Sxx = signal.spectrogram(sig, fs=SAMPLING_RATE, nperseg=512, noverlap=384)
    Sxx_db = 10 * np.log10(Sxx + 1e-10)
    
    im = axes[1, col].pcolormesh(t*1000, f/1000, Sxx_db, shading='gouraud', cmap='inferno')
    axes[1, col].set_xlabel('Time (ms)')
    axes[1, col].set_ylabel('Frequency (kHz)')
    axes[1, col].set_ylim(0, 6)  # Focus on 0-6 kHz

plt.suptitle(f'Bearing {sample_bearing}: Waveform and Spectrogram at Different Lifecycle Stages',
             fontsize=14, fontweight='bold', y=1.02)
plt.tight_layout()
plt.show()

---

## 2. Converting Vibration Signals to Audio

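We'll resample from 25.6 kHz to 44.1 kHz (the CD audio sampling rate) to ensure compatibility with standard audio players. Upsampling adds no new information: the resampled signal still only contains content up to the original 12.8 kHz Nyquist frequency. The `src/utils/audio.py` module handles this conversion.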
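
The implementation of `resample_signal` is not shown in this notebook. As a rough sketch of what a 25.6 kHz to 44.1 kHz conversion can look like, the example below uses `scipy.signal.resample_poly` with the rational ratio 441/256. The helper name `resample_to_44k1` and the synthetic test tone are assumptions made for illustration; the project module may work differently.

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly

SRC_RATE = 25_600   # Hz, dataset sampling rate
DST_RATE = 44_100   # Hz, common audio playback rate


def resample_to_44k1(x, src_rate=SRC_RATE, dst_rate=DST_RATE):
    """Hypothetical helper: polyphase resampling by a rational factor."""
    g = gcd(dst_rate, src_rate)
    up, down = dst_rate // g, src_rate // g   # 441 and 256 for these rates
    return resample_poly(x, up, down)


# Synthetic 1 kHz tone standing in for one 32,768-sample vibration record.
t = np.arange(32_768) / SRC_RATE
tone = np.sin(2 * np.pi * 1_000 * t)

resampled_tone = resample_to_44k1(tone)
print(len(tone), "->", len(resampled_tone), "samples")
print(f"Duration: {len(resampled_tone) / DST_RATE:.3f} s")
```

The sample count changes while the duration stays the same, which is the property the next cell checks for the real signals.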

In [None]:
# Demonstrate resampling process
healthy_sig = signals[stages['Healthy (5%)']][:, 0]
failed_sig = signals[stages['Failed (95%)']][:, 0]

print(f'Original signal: {len(healthy_sig):,} samples @ {SOURCE_SAMPLE_RATE:,} Hz')

# Resample to 44.1kHz
healthy_resampled = resample_signal(healthy_sig)
failed_resampled = resample_signal(failed_sig)

print(f'Resampled signal: {len(healthy_resampled):,} samples @ {TARGET_SAMPLE_RATE:,} Hz')
print(f'Duration unchanged: {len(healthy_resampled)/TARGET_SAMPLE_RATE:.3f} s')

# Normalize for audio playback
healthy_audio = normalize_audio(healthy_resampled)
failed_audio = normalize_audio(failed_resampled)

print(f'\nAmplitude range (healthy): [{healthy_audio.min():.3f}, {healthy_audio.max():.3f}]')
print(f'Amplitude range (failed): [{failed_audio.min():.3f}, {failed_audio.max():.3f}]')

In [None]:
# Verify resampling preserves signal characteristics
fig, axes = plt.subplots(2, 2, figsize=(14, 8))

# Time domain comparison
time_orig = np.arange(len(healthy_sig)) / SOURCE_SAMPLE_RATE * 1000
time_resamp = np.arange(len(healthy_resampled)) / TARGET_SAMPLE_RATE * 1000

# Show first 10ms for detail
limit_orig = int(0.01 * SOURCE_SAMPLE_RATE)
limit_resamp = int(0.01 * TARGET_SAMPLE_RATE)

axes[0, 0].plot(time_orig[:limit_orig], healthy_sig[:limit_orig], label='Original (25.6kHz)', alpha=0.8)
axes[0, 0].plot(time_resamp[:limit_resamp], healthy_resampled[:limit_resamp], label='Resampled (44.1kHz)', alpha=0.8)
axes[0, 0].set_title('Healthy: Time Domain (First 10ms)')
axes[0, 0].set_xlabel('Time (ms)')
axes[0, 0].set_ylabel('Amplitude')
axes[0, 0].legend()

axes[0, 1].plot(time_orig[:limit_orig], failed_sig[:limit_orig], label='Original (25.6kHz)', alpha=0.8)
axes[0, 1].plot(time_resamp[:limit_resamp], failed_resampled[:limit_resamp], label='Resampled (44.1kHz)', alpha=0.8)
axes[0, 1].set_title('Failed: Time Domain (First 10ms)')
axes[0, 1].set_xlabel('Time (ms)')
axes[0, 1].set_ylabel('Amplitude')
axes[0, 1].legend()

# Frequency domain comparison (PSD)
f_orig, psd_orig = signal.welch(healthy_sig, fs=SOURCE_SAMPLE_RATE, nperseg=2048)
f_resamp, psd_resamp = signal.welch(healthy_resampled, fs=TARGET_SAMPLE_RATE, nperseg=2048)

axes[1, 0].semilogy(f_orig/1000, psd_orig, label='Original (25.6kHz)', alpha=0.8)
axes[1, 0].semilogy(f_resamp/1000, psd_resamp, label='Resampled (44.1kHz)', alpha=0.8)
axes[1, 0].set_title('Healthy: Power Spectral Density')
axes[1, 0].set_xlabel('Frequency (kHz)')
axes[1, 0].set_ylabel('PSD')
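axes[1, 0].set_xlim(0, 14)  # Extend past 12.8 kHz so the Nyquist marker is visible
axes[1, 0].axvline(x=12.8, color='red', linestyle='--', alpha=0.5, label='Original Nyquist')
axes[1, 0].legend()  # Called after axvline so the Nyquist label appears in the legend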

f_orig_f, psd_orig_f = signal.welch(failed_sig, fs=SOURCE_SAMPLE_RATE, nperseg=2048)
f_resamp_f, psd_resamp_f = signal.welch(failed_resampled, fs=TARGET_SAMPLE_RATE, nperseg=2048)

axes[1, 1].semilogy(f_orig_f/1000, psd_orig_f, label='Original (25.6kHz)', alpha=0.8)
axes[1, 1].semilogy(f_resamp_f/1000, psd_resamp_f, label='Resampled (44.1kHz)', alpha=0.8)
axes[1, 1].set_title('Failed: Power Spectral Density')
axes[1, 1].set_xlabel('Frequency (kHz)')
axes[1, 1].set_ylabel('PSD')
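axes[1, 1].set_xlim(0, 14)  # Extend past 12.8 kHz so the Nyquist marker is visible
axes[1, 1].axvline(x=12.8, color='red', linestyle='--', alpha=0.5, label='Original Nyquist')
axes[1, 1].legend()  # Called after axvline so the Nyquist label appears in the legend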

plt.suptitle('Resampling Verification: Signal Characteristics Preserved', fontsize=14, fontweight='bold', y=1.02)
plt.tight_layout()
plt.show()

---

## 3. Audio Playback: Healthy vs Degrading vs Failed

Now let's listen to the bearing at different lifecycle stages. Use headphones for the best experience!

In [None]:
# Generate and display audio for each lifecycle stage
print(f"\n{'='*60}")
print(f"Bearing: {sample_bearing} ({sample_condition})")
print(f"Use headphones for the best listening experience!")
print(f"{'='*60}\n")

for stage_name, file_idx in stages.items():
    sig = signals[file_idx][:, 0]  # Horizontal channel
    
    # Resample and normalize
    resampled = resample_signal(sig)
    audio_data = normalize_audio(resampled)
    
    # Calculate RMS for reference
    rms = np.sqrt(np.mean(sig**2))
    
    print(f"\n{stage_name} (File {file_idx+1}/{num_files})")
    print(f"  RMS: {rms:.4f}")
    print(f"  Duration: {len(audio_data)/TARGET_SAMPLE_RATE:.2f} seconds")
    
    # Display audio player
    display(Audio(data=audio_data, rate=TARGET_SAMPLE_RATE))

### What to Listen For

- **Healthy**: Relatively smooth, consistent tone with low-frequency hum from rotation
- **Degrading**: May hear occasional clicks or increased roughness
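- **Failed**: Loud grinding, clicking, or harsh metallic sounds; much louder overall in the raw signal (compare the printed RMS values), although per-clip normalization can mask the level difference during playback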
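
How much of the level difference between stages survives playback depends on how each clip is scaled, and the scaling inside `normalize_audio` is not shown in this notebook. One option, sketched below on synthetic data, is to apply a single shared gain to all clips so that their relative loudness is kept. `normalize_jointly` is a hypothetical helper, not part of `src.utils.audio`.

```python
import numpy as np


def normalize_jointly(clips, peak=0.9):
    """Scale all clips by one shared gain so the loudest peak equals `peak`."""
    global_peak = max(np.max(np.abs(c)) for c in clips)
    if global_peak == 0:
        return [np.asarray(c, dtype=float).copy() for c in clips]
    gain = peak / global_peak
    return [np.asarray(c, dtype=float) * gain for c in clips]


# Synthetic stand-ins: a quiet "healthy" clip and a 10x louder "failed" clip.
rng = np.random.default_rng(0)
quiet = 0.1 * rng.standard_normal(1_000)
loud = 1.0 * rng.standard_normal(1_000)

quiet_n, loud_n = normalize_jointly([quiet, loud])
ratio_before = np.std(loud) / np.std(quiet)
ratio_after = np.std(loud_n) / np.std(quiet_n)
print(f"Loudness ratio before: {ratio_before:.2f}, after: {ratio_after:.2f}")
```

When playing jointly scaled clips with `IPython.display.Audio`, pass `normalize=False`; with the default `normalize=True` the player rescales each clip to full range again.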

---

## 4. Save WAV Files for Multiple Bearings

Generate audio files for all bearings at key lifecycle stages for offline analysis.
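
`signal_to_wav` and `convert_vibration_to_audio` come from the project module and are not reproduced here. As a minimal sketch of the underlying step, assuming 16-bit PCM output and input already scaled to [-1, 1], a float signal can be written with `scipy.io.wavfile.write`. The helper name `write_pcm16` and the file name `example_tone.wav` are illustrative only.

```python
import tempfile
from pathlib import Path

import numpy as np
from scipy.io import wavfile


def write_pcm16(samples, path, sample_rate=44_100):
    """Hypothetical helper: write float samples in [-1, 1] as 16-bit PCM."""
    clipped = np.clip(samples, -1.0, 1.0)          # guard against overflow
    pcm = (clipped * 32767).astype(np.int16)       # scale to the int16 range
    wavfile.write(str(path), sample_rate, pcm)
    return Path(path)


# Half a second of a 440 Hz tone as placeholder audio.
t = np.arange(22_050) / 44_100
tone = 0.5 * np.sin(2 * np.pi * 440 * t)

with tempfile.TemporaryDirectory() as tmp:
    wav_path = write_pcm16(tone, Path(tmp) / "example_tone.wav")
    rate_read, data_read = wavfile.read(str(wav_path))   # read back to verify

print(rate_read, data_read.dtype, data_read.shape)
```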

In [None]:
def generate_audio_for_bearing(loader, condition, bearing_id, output_dir,
                               lifecycle_pcts=(0, 50, 100), channel='horizontal'):
    """
    Generate WAV files for a bearing at specified lifecycle percentages.
    
    Args:
        loader: XJTUBearingLoader instance
        condition: Operating condition string
        bearing_id: Bearing identifier
        output_dir: Directory to save WAV files
        lifecycle_pcts: List of lifecycle percentages (0=healthy, 100=failed)
        channel: 'horizontal', 'vertical', or 'both'
    
    Returns:
        List of generated file paths
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    
    signals, filenames = loader.load_bearing(condition, bearing_id)
    num_files = len(filenames)
    
    generated_files = []
    
    for pct in lifecycle_pcts:
        file_idx = min(int(num_files * pct / 100), num_files - 1)
        sig = signals[file_idx]
        
        # Determine stage name
        if pct <= 10:
            stage = 'healthy'
        elif pct >= 90:
            stage = 'failed'
        else:
            stage = 'degrading'
        
        # Generate filename
        filename = f"{bearing_id}_{stage}_{pct}pct_{channel}.wav"
        output_path = output_dir / filename
        
        # Convert to audio
        convert_vibration_to_audio(sig, output_path, channel=channel)
        generated_files.append(output_path)
    
    return generated_files

In [None]:
# Generate audio files for a few representative bearings
representative_bearings = [
    ('35Hz12kN', 'Bearing1_1'),
    ('37.5Hz11kN', 'Bearing2_1'),
    ('40Hz10kN', 'Bearing3_2'),
]

lifecycle_stages = [0, 25, 50, 75, 100]  # 5 stages from healthy to failed

all_generated_files = []

print("Generating audio files...\n")

for condition, bearing_id in representative_bearings:
    print(f"Processing {bearing_id} ({condition})...")
    
    files = generate_audio_for_bearing(
        loader, condition, bearing_id,
        output_dir=AUDIO_OUTPUT_DIR,
        lifecycle_pcts=lifecycle_stages,
        channel='horizontal'
    )
    
    all_generated_files.extend(files)
    print(f"  Generated {len(files)} files")

print(f"\nTotal files generated: {len(all_generated_files)}")
print(f"Output directory: {AUDIO_OUTPUT_DIR.absolute()}")

### 4.1 Generate Audio for ALL 15 Bearings

Now let's generate audio files for **all 15 bearings** at three key lifecycle stages (healthy, degrading, failed) to meet the acceptance criteria.
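
Each lifecycle percentage is mapped to a file index by scaling with the number of recordings and clamping to the last file, so that 100% selects the final recording instead of running past the end of the list. A standalone sketch of that mapping (the function name `lifecycle_index` and the file count are illustrative):

```python
def lifecycle_index(num_files, pct):
    """Map a lifecycle percentage (0-100) to a valid 0-based file index."""
    if num_files <= 0:
        raise ValueError("num_files must be positive")
    # Integer scaling, then clamp so pct=100 maps to the last file.
    return min(num_files * pct // 100, num_files - 1)


# Example with a hypothetical bearing that has 123 recordings.
indices = [lifecycle_index(123, pct) for pct in (0, 50, 100)]
print(indices)
```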

In [None]:
# Generate audio files for ALL 15 bearings at key lifecycle stages
# This satisfies EDA-4 acceptance criteria: "At least 3 lifecycle stages per bearing have audio files generated"

# Collect every (condition, bearing_id) pair from the metadata mapping
# (condition -> bearing_id -> files), the same structure iterated in the overview cell
all_bearings = [
    (condition, bearing_id)
    for condition, bearings in metadata.items()
    for bearing_id in bearings
]

print(f"Generating audio for {len(all_bearings)} bearings...")
print("Lifecycle stages: 0% (healthy), 50% (degrading), 100% (failed)")
print("="*70)

# Key lifecycle stages
key_stages = [0, 50, 100]  # healthy, degrading, failed

total_files_generated = 0
all_audio_files = []

for i, (condition, bearing_id) in enumerate(all_bearings):
    print(f"\n[{i+1}/{len(all_bearings)}] Processing {bearing_id} ({condition})...")
    
    # Create output directory for this bearing
    bearing_output_dir = AUDIO_OUTPUT_DIR / condition / bearing_id
    bearing_output_dir.mkdir(parents=True, exist_ok=True)
    
    try:
        # Load bearing data
        signals, filenames = loader.load_bearing(condition, bearing_id)
        num_files = len(filenames)
        
        for pct in key_stages:
            file_idx = min(int(num_files * pct // 100), num_files - 1)
            sig = signals[file_idx]
            
            # Determine stage name
            if pct == 0:
                stage = 'healthy'
            elif pct == 100:
                stage = 'failed'
            else:
                stage = 'degrading'
            
            # Generate for horizontal channel
            filename = f"{bearing_id}_{stage}_{pct}pct_h.wav"
            output_path = bearing_output_dir / filename
            
            # Skip if already exists
            if output_path.exists():
                print(f"  Skipping {filename} (already exists)")
                all_audio_files.append(output_path)
                continue
            
            # Convert to audio (horizontal channel)
            h_signal = sig[:, 0]
            resampled = resample_signal(h_signal)
            audio_data = normalize_audio(resampled)
            signal_to_wav(audio_data, output_path, sample_rate=TARGET_SAMPLE_RATE)
            
            all_audio_files.append(output_path)
            total_files_generated += 1
            print(f"  Generated: {filename}")
            
    except Exception as e:
        print(f"  ERROR: {e}")
        continue

print(f"\n{'='*70}")
print(f"COMPLETE: Generated {total_files_generated} new audio files")
print(f"Total audio files across all bearings: {len(all_audio_files)}")

In [None]:
# List generated files
print("Generated Audio Files:")
print("="*60)

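# Search recursively: the per-bearing files are stored in <condition>/<bearing_id>/ subdirectories
for f in sorted(AUDIO_OUTPUT_DIR.rglob('*.wav')):
    # Get file size
    size_kb = f.stat().st_size / 1024
    print(f"  {str(f.relative_to(AUDIO_OUTPUT_DIR)):50} ({size_kb:.1f} KB)")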

---

## 5. Interactive Audio Comparison Widget

This widget allows **side-by-side comparison** of bearing audio at different lifecycle stages:
- **Healthy (0%)**: Early life, minimal degradation
- **Degrading (50%)**: Mid-life, progressive wear
- **Failed (100%)**: End-of-life, severe damage

Select any bearing and channel to instantly compare the audio signatures. Use headphones for the best experience!

In [None]:
# =============================================================================
# AUDIO COMPARISON WIDGET
# =============================================================================
# This widget allows side-by-side comparison of bearing audio at different
# lifecycle stages (Healthy, Degrading, Failed) for any selected bearing.

from IPython.display import Audio, display, HTML, clear_output
import ipywidgets as widgets

# Dropdown options are built inside AudioComparisonWidget.__init__

# ============ COMPARISON WIDGET: Side-by-Side Audio Playback ============

class AudioComparisonWidget:
    """
    Interactive widget for comparing bearing audio at different lifecycle stages.
    Displays Healthy (0%), Degrading (50%), and Failed (100%) audio side-by-side.
    """
    
    def __init__(self, loader, metadata):
        self.loader = loader
        self.metadata = metadata
        
        # Build bearing options
        self.bearing_options = []
        for condition, bearings in metadata.items():
            for bearing_id in bearings.keys():
                self.bearing_options.append(f"{condition}/{bearing_id}")
        
        # Create widgets
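        # Prefer the sample bearing as default; fall back to the first option if it is unavailable
        default_bearing = '40Hz10kN/Bearing3_2'
        if default_bearing not in self.bearing_options:
            default_bearing = self.bearing_options[0] if self.bearing_options else None
        self.bearing_dropdown = widgets.Dropdown(
            options=self.bearing_options,
            value=default_bearing,
            description='Bearing:',
            style={'description_width': '80px'},
            layout=widgets.Layout(width='300px')
        )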
        
        self.channel_dropdown = widgets.Dropdown(
            options=['horizontal', 'vertical'],
            value='horizontal',
            description='Channel:',
            style={'description_width': '80px'},
            layout=widgets.Layout(width='200px')
        )
        
        self.compare_button = widgets.Button(
            description='Compare Audio',
            button_style='primary',
            icon='play',
            layout=widgets.Layout(width='150px')
        )
        
        self.output_area = widgets.Output()
        
        # Connect button to callback
        self.compare_button.on_click(self._on_compare_click)
        
    def _on_compare_click(self, btn):
        """Generate comparison display when button is clicked."""
        self.display_comparison(self.bearing_dropdown.value, self.channel_dropdown.value)
    
    def display_comparison(self, bearing_str, channel):
        """Display audio comparison for selected bearing."""
        with self.output_area:
            clear_output(wait=True)
            
            # Parse bearing string
            condition, bearing_id = bearing_str.split('/')
            
            # Load bearing data
            try:
                signals, filenames = self.loader.load_bearing(condition, bearing_id)
            except Exception as e:
                display(HTML(f"<p style='color:red'>Error loading bearing: {e}</p>"))
                return
                
            num_files = len(filenames)
            channel_idx = 0 if channel == 'horizontal' else 1
            
            # Define lifecycle stages
            stages = [
                ('Healthy', 0, '#27ae60'),      # Green
                ('Degrading', 50, '#f39c12'),   # Orange
                ('Failed', 100, '#e74c3c'),     # Red
            ]
            
            # Header
            display(HTML(f"""
            <div style="background: linear-gradient(135deg, #667eea 0%, #764ba2 100%); 
                        padding: 15px; border-radius: 10px; margin-bottom: 15px; color: white;">
                <h2 style="margin:0; color:white;">🔊 Audio Comparison: {bearing_id}</h2>
                <p style="margin:5px 0 0 0; opacity:0.9;">
                    Condition: {condition} | Channel: {channel.title()} | Total Files: {num_files}
                </p>
            </div>
            """))
            
            # Create comparison table
            display(HTML("""
            <style>
                .audio-card { 
                    border: 2px solid #ddd; 
                    border-radius: 10px; 
                    padding: 15px; 
                    margin: 10px 0;
                    background: white;
                    box-shadow: 0 2px 5px rgba(0,0,0,0.1);
                }
                .audio-card h3 { margin: 0 0 10px 0; }
                .audio-stats { 
                    display: flex; 
                    gap: 20px; 
                    margin-top: 10px;
                    flex-wrap: wrap;
                }
                .stat-item { 
                    background: #f8f9fa; 
                    padding: 5px 10px; 
                    border-radius: 5px;
                    font-size: 0.9em;
                }
            </style>
            """))
            
            # Generate audio for each stage
            for stage_name, pct, color in stages:
                file_idx = min(int(num_files * pct / 100), num_files - 1)
                sig = signals[file_idx][:, channel_idx]
                
                # Compute statistics
                rms = np.sqrt(np.mean(sig**2))
                peak = np.max(np.abs(sig))
                crest_factor = peak / rms if rms > 0 else 0
                
                # Resample and normalize
                resampled = resample_signal(sig)
                audio_data = normalize_audio(resampled)
                
                # Display card
                display(HTML(f"""
                <div class="audio-card" style="border-left: 5px solid {color};">
                    <h3 style="color: {color};">
                        {'🟢' if pct == 0 else '🟠' if pct == 50 else '🔴'} {stage_name} ({pct}% Lifecycle)
                    </h3>
                    <p style="margin:5px 0; color:#666;">
                        File: {filenames[file_idx]} (#{file_idx + 1} of {num_files})
                    </p>
                    <div class="audio-stats">
                        <span class="stat-item">RMS: {rms:.4f}</span>
                        <span class="stat-item">Peak: {peak:.4f}</span>
                        <span class="stat-item">Crest Factor: {crest_factor:.2f}</span>
                    </div>
                </div>
                """))
                
                # Display audio player
                display(Audio(data=audio_data, rate=TARGET_SAMPLE_RATE))
            
            # Add listening tips
            display(HTML("""
            <div style="background: #f0f8ff; padding: 15px; border-radius: 10px; margin-top: 15px; 
                        border-left: 4px solid #3498db;">
                <h4 style="margin:0 0 10px 0; color:#2980b9;">🎧 Listening Tips</h4>
                <ul style="margin:0; padding-left:20px; color:#34495e;">
                    <li><strong>Healthy:</strong> Smooth, consistent tone with low-frequency hum</li>
                    <li><strong>Degrading:</strong> May hear occasional clicks or increased roughness</li>
                    <li><strong>Failed:</strong> Loud grinding, clicking, or harsh metallic sounds</li>
                </ul>
                <p style="margin:10px 0 0 0; font-style:italic; color:#7f8c8d;">
                    Use headphones for the best listening experience!
                </p>
            </div>
            """))
    
    def show(self):
        """Display the complete widget interface."""
        # Title
        title = widgets.HTML("""
        <h3 style="background: #2c3e50; color: white; padding: 10px 15px; 
                   border-radius: 5px; margin-bottom: 10px;">
            🎵 Audio Comparison Widget
        </h3>
        <p style="color: #666; margin-bottom: 15px;">
            Select a bearing and click "Compare Audio" to hear Healthy, Degrading, and Failed states side-by-side.
        </p>
        """)
        
        # Controls layout
        controls = widgets.HBox([
            self.bearing_dropdown,
            self.channel_dropdown,
            self.compare_button
        ], layout=widgets.Layout(gap='10px'))
        
        # Full layout
        layout = widgets.VBox([
            title,
            controls,
            self.output_area
        ])
        
        display(layout)
        
        # Auto-load first comparison
        self.display_comparison(self.bearing_dropdown.value, self.channel_dropdown.value)


# Create and display the comparison widget
comparison_widget = AudioComparisonWidget(loader, metadata)
comparison_widget.show()

---

## 6. Audio Feature Analysis: What Makes Failed Bearings Sound Different?

Let's analyze the audio characteristics that distinguish healthy from failed bearings.

In [None]:
def compute_audio_features(signal_data):
    """Compute audio-relevant features from vibration signal."""
    # Time domain
    rms = np.sqrt(np.mean(signal_data**2))
    peak = np.max(np.abs(signal_data))
    crest_factor = peak / rms if rms > 0 else 0
    
    # Zero crossing rate (indicates roughness)
    zero_crossings = np.sum(np.diff(np.signbit(signal_data).astype(int)) != 0)
    zcr = zero_crossings / len(signal_data)
    
    # Spectral features
    freqs, psd = signal.welch(signal_data, fs=SAMPLING_RATE, nperseg=2048)
    psd_normalized = psd / np.sum(psd)
    
    # Spectral centroid (perceived brightness)
    spectral_centroid = np.sum(freqs * psd_normalized)
    
    # Spectral bandwidth (spread of frequencies)
    spectral_bandwidth = np.sqrt(np.sum(((freqs - spectral_centroid)**2) * psd_normalized))
    
    # Spectral flatness (noise-like vs tonal)
    geometric_mean = np.exp(np.mean(np.log(psd + 1e-10)))
    arithmetic_mean = np.mean(psd)
    spectral_flatness = geometric_mean / arithmetic_mean if arithmetic_mean > 0 else 0
    
    return {
        'rms': rms,
        'peak': peak,
        'crest_factor': crest_factor,
        'zero_crossing_rate': zcr,
        'spectral_centroid': spectral_centroid,
        'spectral_bandwidth': spectral_bandwidth,
        'spectral_flatness': spectral_flatness
    }

In [None]:
# Analyze audio features across lifecycle for sample bearing
lifecycle_points = np.linspace(0, 1, 20)  # 20 points across lifecycle
feature_evolution = []

for pct in lifecycle_points:
    file_idx = min(int(num_files * pct), num_files - 1)
    sig = signals[file_idx][:, 0]
    
    features = compute_audio_features(sig)
    features['lifecycle_pct'] = pct * 100
    features['file_idx'] = file_idx
    feature_evolution.append(features)

df_features = pd.DataFrame(feature_evolution)
df_features.head()

In [None]:
# Visualize audio feature evolution
fig, axes = plt.subplots(2, 3, figsize=(15, 8))

features_to_plot = [
    ('rms', 'RMS (Loudness)', '#3498db'),
    ('crest_factor', 'Crest Factor (Impulsiveness)', '#e74c3c'),
    ('zero_crossing_rate', 'Zero Crossing Rate (Roughness)', '#2ecc71'),
    ('spectral_centroid', 'Spectral Centroid (Brightness)', '#9b59b6'),
    ('spectral_bandwidth', 'Spectral Bandwidth (Spread)', '#f39c12'),
    ('spectral_flatness', 'Spectral Flatness (Noise-like)', '#1abc9c'),
]

for ax, (feature, title, color) in zip(axes.flatten(), features_to_plot):
    ax.plot(df_features['lifecycle_pct'], df_features[feature], 
            color=color, linewidth=2, marker='o', markersize=4)
    ax.fill_between(df_features['lifecycle_pct'], 0, df_features[feature], 
                    alpha=0.3, color=color)
    ax.set_xlabel('Lifecycle (%)')
    ax.set_ylabel(feature.replace('_', ' ').title())
    ax.set_title(title, fontweight='bold')
    ax.axvline(x=80, color='red', linestyle='--', alpha=0.5)
    ax.grid(True, alpha=0.3)

plt.suptitle(f'Audio Feature Evolution: {sample_bearing}', fontsize=14, fontweight='bold', y=1.02)
plt.tight_layout()
plt.show()

### Audio Feature Interpretation

- **RMS (Loudness)**: Increases significantly toward end-of-life as bearing damage creates louder vibrations
- **Crest Factor**: Ratio of peak to RMS; high values indicate impulsive sounds (clicks, impacts)
- **Zero Crossing Rate**: Higher values indicate more high-frequency content or roughness
- **Spectral Centroid**: Center of mass of spectrum; shifts indicate changes in dominant frequencies
- **Spectral Bandwidth**: How spread out the frequencies are; wider = more complex sound
- **Spectral Flatness**: Closer to 1 = noise-like; closer to 0 = tonal
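
To make two of these definitions concrete, the sketch below evaluates crest factor and spectral flatness on synthetic signals rather than bearing data. A pure sine should give a crest factor close to √2 and a flatness near 0, while white noise should give a flatness much closer to 1. The helper names are illustrative; the flatness formula mirrors the one used in `compute_audio_features`.

```python
import numpy as np
from scipy import signal

FS = 25_600  # Hz, same rate as the dataset


def crest_factor(x):
    """Peak amplitude divided by RMS."""
    rms = np.sqrt(np.mean(x**2))
    return np.max(np.abs(x)) / rms if rms > 0 else 0.0


def spectral_flatness(x, fs=FS):
    """Geometric mean over arithmetic mean of the Welch PSD."""
    _, psd = signal.welch(x, fs=fs, nperseg=2048)
    geometric_mean = np.exp(np.mean(np.log(psd + 1e-10)))
    return geometric_mean / np.mean(psd)


t = np.arange(32_768) / FS
sine = np.sin(2 * np.pi * 1_000 * t)                       # tonal reference
noise = np.random.default_rng(0).standard_normal(32_768)   # noise-like reference

print(f"Sine : crest={crest_factor(sine):.3f}, flatness={spectral_flatness(sine):.4f}")
print(f"Noise: crest={crest_factor(noise):.3f}, flatness={spectral_flatness(noise):.4f}")
```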

---

## 7. Cross-Bearing Audio Comparison

Compare audio from healthy states across different bearings to establish a baseline, then compare failed states.

In [None]:
# Compare healthy vs failed for multiple bearings
bearings_to_compare = [
    ('35Hz12kN', 'Bearing1_1'),
    ('37.5Hz11kN', 'Bearing2_1'),
    ('40Hz10kN', 'Bearing3_2'),
]

comparison_data = []

for condition, bearing_id in bearings_to_compare:
    sigs, fnames = loader.load_bearing(condition, bearing_id)
    n_files = len(fnames)
    
    # Healthy (5%)
    healthy_idx = int(n_files * 0.05)
    healthy_features = compute_audio_features(sigs[healthy_idx][:, 0])
    healthy_features.update({
        'condition': condition,
        'bearing_id': bearing_id,
        'state': 'Healthy'
    })
    comparison_data.append(healthy_features)
    
    # Failed (95%)
    failed_idx = int(n_files * 0.95)
    failed_features = compute_audio_features(sigs[failed_idx][:, 0])
    failed_features.update({
        'condition': condition,
        'bearing_id': bearing_id,
        'state': 'Failed'
    })
    comparison_data.append(failed_features)

df_comparison = pd.DataFrame(comparison_data)
df_comparison

In [None]:
# Visualize healthy vs failed comparison
fig, axes = plt.subplots(1, 3, figsize=(15, 5))

features_to_compare = ['rms', 'crest_factor', 'spectral_centroid']
colors = {'Healthy': '#2ecc71', 'Failed': '#e74c3c'}

for ax, feature in zip(axes, features_to_compare):
    # Prepare data for grouped bar chart
    x = np.arange(len(bearings_to_compare))
    width = 0.35
    
    healthy_vals = df_comparison[df_comparison['state'] == 'Healthy'][feature].values
    failed_vals = df_comparison[df_comparison['state'] == 'Failed'][feature].values
    
    bars1 = ax.bar(x - width/2, healthy_vals, width, label='Healthy', color=colors['Healthy'])
    bars2 = ax.bar(x + width/2, failed_vals, width, label='Failed', color=colors['Failed'])
    
    ax.set_xlabel('Bearing')
    ax.set_ylabel(feature.replace('_', ' ').title())
    ax.set_title(f'{feature.replace("_", " ").title()} Comparison', fontweight='bold')
    ax.set_xticks(x)
    ax.set_xticklabels([b[1] for b in bearings_to_compare], rotation=15)
    ax.legend()
    ax.grid(True, alpha=0.3, axis='y')

plt.suptitle('Healthy vs Failed: Audio Feature Comparison', fontsize=14, fontweight='bold', y=1.02)
plt.tight_layout()
plt.show()

In [None]:
# Play audio comparison for the bearings
print("="*70)
print("AUDIO COMPARISON: HEALTHY vs FAILED")
print("="*70)

for condition, bearing_id in bearings_to_compare:
    sigs, fnames = loader.load_bearing(condition, bearing_id)
    n_files = len(fnames)
    
    print(f"\n{'-'*50}")
    print(f"{bearing_id} ({condition})")
    print(f"{'-'*50}")
    
    # Healthy
    healthy_idx = int(n_files * 0.05)
    healthy_sig = sigs[healthy_idx][:, 0]
    healthy_audio = normalize_audio(resample_signal(healthy_sig))
    
    print(f"\nHealthy (5% lifecycle):")
    display(Audio(data=healthy_audio, rate=TARGET_SAMPLE_RATE))
    
    # Failed
    failed_idx = int(n_files * 0.95)
    failed_sig = sigs[failed_idx][:, 0]
    failed_audio = normalize_audio(resample_signal(failed_sig))
    
    print(f"\nFailed (95% lifecycle):")
    display(Audio(data=failed_audio, rate=TARGET_SAMPLE_RATE))

---

## 8. Stereo Audio: Horizontal vs Vertical Channels

Listen to both vibration channels simultaneously as stereo audio.

In [None]:
# Generate stereo audio (H=left, V=right)
print("Stereo Audio: Horizontal (Left) | Vertical (Right)")
print("="*60)

for stage_name, file_idx in stages.items():
    sig = signals[file_idx]  # Both channels
    
    # Resample both channels
    resampled = resample_signal(sig)
    stereo_audio = normalize_audio(resampled)
    
    print(f"\n{stage_name} (Stereo):")
    display(Audio(data=stereo_audio.T, rate=TARGET_SAMPLE_RATE))  # Transpose for stereo format

---

## 9. Summary and Conclusions

### Key Findings from Audio Analysis

1. **Audible Degradation**: Failed bearings produce distinctly louder and harsher sounds compared to healthy ones

2. **RMS as Loudness Indicator**: The RMS value directly correlates with perceived loudness, increasing significantly as bearings degrade

3. **Impulsive Sounds**: Crest factor increases near failure, indicating more impulsive/clicking sounds from bearing damage

4. **Spectral Changes**: Failed bearings show increased high-frequency content (roughness) and broader spectral bandwidth

5. **Cross-Condition Similarity**: Despite different operating conditions (speed/load), the audio characteristics of degradation are qualitatively similar across the three bearings compared here

### Practical Applications

- **Training Data for Acoustic Models**: Generated audio files can train audio-based diagnostic models
- **Human-in-the-Loop Inspection**: Sonification allows technicians to "listen" to vibration data
- **Multimodal Analysis**: Combined audio-visual (spectrogram) analysis improves fault detection
- **Educational Tool**: Audio helps build intuition for bearing health states

In [None]:
# Summary statistics
print("\nEDA-4 Audio Analysis Complete!")
print("="*60)
print(f"\nGenerated Audio Files: {len(list(AUDIO_OUTPUT_DIR.rglob('*.wav')))}")  # rglob also counts per-bearing subdirectories
print(f"Output Directory: {AUDIO_OUTPUT_DIR.absolute()}")
print(f"\nSampling Rate: {SOURCE_SAMPLE_RATE} Hz -> {TARGET_SAMPLE_RATE} Hz")
print(f"Signal Duration: {32768/SOURCE_SAMPLE_RATE*1000:.1f} ms per file")
print(f"\nCapabilities Demonstrated:")
print("  - Vibration to WAV conversion")
print("  - 25.6kHz to 44.1kHz resampling")
print("  - Audio playback at lifecycle stages")
print("  - Interactive audio comparison widget")
print("  - Stereo (H/V channels) audio generation")
print("  - Audio feature analysis (RMS, ZCR, spectral features)")