# 🔬 DSKYpoly Data Science Showcase
*From Polynomial Mathematics to Real-World Data Science*

This notebook shows how DSKYpoly's polynomial mathematics carries over to practical data science tasks: curve fitting with regularization, financial time series analysis, and signal denoising.

## 🎯 What We'll Explore
1. **Polynomial Regression Mastery** - Advanced curve fitting techniques
2. **Financial Mathematics** - Stock price trend analysis
3. **Signal Processing** - Noise reduction and pattern recognition
4. **Scientific Computing** - High-precision mathematical modeling
5. **Interactive Visualizations** - Professional data science presentations

In [None]:
# Standard library and third-party imports
import sys
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots
import warnings
# Silence all warnings to keep the demo output tidy (this also hides solver convergence warnings)
warnings.filterwarnings('ignore')

# Import our custom toolkit
from polynomial_toolkit import PolynomialAnalyzer, FinancialPolynomialAnalyzer

print("🚀 DSKYpoly Data Science Environment Loaded!")
print(f"📊 NumPy version: {np.__version__}")
print(f"🐼 Pandas version: {pd.__version__}")
print("📈 Plotly available for interactive visualizations")

## 🔬 Demo 1: Advanced Polynomial Regression

Let's start with a synthetic dataset that mixes sinusoidal and polynomial terms, then fit it using automatic degree selection under three settings: no regularization, Ridge, and Lasso.
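
The toolkit's `find_optimal_degree` is not shown in this notebook, so the following is only a minimal sketch of what degree selection by cross-validation could look like, written with NumPy alone. The helper names (`ridge_polyfit`, `cv_mse`, `select_degree`), the 5-fold split, and the small demo dataset are assumptions for illustration, not the toolkit's actual implementation.

```python
import numpy as np


def ridge_polyfit(x, y, degree, alpha):
    """Closed-form ridge fit on a Vandermonde matrix (lowest power first)."""
    V = np.vander(x, degree + 1, increasing=True)
    penalty = alpha * np.eye(degree + 1)
    penalty[0, 0] = 0.0  # leave the intercept unpenalized
    return np.linalg.solve(V.T @ V + penalty, V.T @ y)


def cv_mse(x, y, degree, alpha, n_splits=5, seed=0):
    """Mean squared error averaged over shuffled K-fold splits."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(x.size), n_splits)
    errors = []
    for fold in folds:
        train = np.setdiff1d(np.arange(x.size), fold)
        coeffs = ridge_polyfit(x[train], y[train], degree, alpha)
        pred = np.vander(x[fold], degree + 1, increasing=True) @ coeffs
        errors.append(np.mean((y[fold] - pred) ** 2))
    return float(np.mean(errors))


def select_degree(x, y, max_degree=8, alpha=1e-3):
    """Return (best_degree, {degree: CV MSE}) after rescaling x to [-1, 1]."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_scaled = 2 * (x - x.min()) / (x.max() - x.min()) - 1
    scores = {d: cv_mse(x_scaled, y, d, alpha) for d in range(1, max_degree + 1)}
    return min(scores, key=scores.get), scores


# Demo on a noisy cubic (illustrative data, not the notebook's dataset)
rng = np.random.default_rng(0)
x_demo = np.linspace(-2, 2, 120)
y_demo = x_demo**3 - 2 * x_demo + rng.normal(0, 0.3, x_demo.size)

best_degree, cv_scores = select_degree(x_demo, y_demo)
print(f"Selected degree: {best_degree}, CV MSE: {cv_scores[best_degree]:.3f}")
```

Rescaling x to [-1, 1] before building the Vandermonde matrix keeps the high-power columns from dominating the fit, which matters more as the degree grows.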

In [None]:
# Generate a complex synthetic dataset
np.random.seed(42)
n_samples = 200
X = np.linspace(0, 4*np.pi, n_samples)

# Complex function with multiple components
true_function = (2*np.sin(X) + 0.5*X**2 - 0.1*X**3 + 
                np.cos(2*X) + 0.3*X)
noise = np.random.normal(0, 0.5, n_samples)
y = true_function + noise

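# Split into train/test (sequential: the last 20% of the x-range is held out,
# so test scores measure extrapolation rather than interpolation)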
split_idx = int(0.8 * n_samples)
X_train, X_test = X[:split_idx], X[split_idx:]
y_train, y_test = y[:split_idx], y[split_idx:]

print(f"📊 Dataset created: {len(X_train)} training, {len(X_test)} test samples")
print("🎯 Function: 2*sin(x) + 0.5*x² - 0.1*x³ + cos(2x) + 0.3*x + noise")

In [None]:
# Build one analyzer per regularization setting ('Standard' means no regularization)
analyzers = {
    'Standard': PolynomialAnalyzer(max_degree=15),
    'Ridge': PolynomialAnalyzer(max_degree=15, regularization='ridge', alpha=1.0),
    'Lasso': PolynomialAnalyzer(max_degree=15, regularization='lasso', alpha=0.1)
}

results = {}
for name, analyzer in analyzers.items():
    print(f"\n🔍 Analyzing with {name} regularization...")
    
    # Find optimal degree and fit model
    cv_results = analyzer.find_optimal_degree(X_train, y_train)
    analyzer.fit_best_model(X_train, y_train)
    
    # Generate performance report
    report = analyzer.generate_performance_report(X_train, y_train, X_test, y_test)
    results[name] = report
    
    print(f"  Best degree: {report['best_degree']}")
    print(f"  Test R²: {report['test_r2']:.4f}")
    print(f"  Generalization gap: {report['generalization_gap']:.4f}")

In [None]:
# Create comprehensive comparison visualization
fig = make_subplots(
    rows=2, cols=2,
    subplot_titles=('Model Fits Comparison', 'Performance Metrics',
                   'Cross-Validation Curves', 'Prediction Confidence'),
    specs=[[{"secondary_y": False}, {"secondary_y": False}],
           [{"secondary_y": False}, {"secondary_y": False}]]
)

# Plot 1: Model fits comparison
fig.add_trace(
    go.Scatter(x=X_train, y=y_train, mode='markers', name='Training Data',
              marker=dict(color='lightblue', size=4, opacity=0.6)),
    row=1, col=1
)

fig.add_trace(
    go.Scatter(x=X_test, y=y_test, mode='markers', name='Test Data',
              marker=dict(color='lightcoral', size=4, opacity=0.6)),
    row=1, col=1
)

fig.add_trace(
    go.Scatter(x=X, y=true_function, mode='lines', name='True Function',
              line=dict(color='black', width=2, dash='dash')),
    row=1, col=1
)

colors = ['red', 'green', 'purple']
for i, (name, analyzer) in enumerate(analyzers.items()):
    y_pred = analyzer.best_model.predict(X.reshape(-1, 1))
    fig.add_trace(
        go.Scatter(x=X, y=y_pred, mode='lines', name=f'{name} Fit',
                  line=dict(color=colors[i], width=2)),
        row=1, col=1
    )

# Plot 2: Performance metrics
metrics = ['test_r2', 'generalization_gap']
metric_names = ['Test R²', 'Generalization Gap']

for i, metric in enumerate(metrics):
    values = [results[name][metric] for name in results.keys()]
    fig.add_trace(
        go.Bar(x=list(results.keys()), y=values, name=metric_names[i],
              marker=dict(color=colors[i])),
        row=1, col=2
    )

# Plot 3: Cross-validation curves for Ridge regularization
ridge_analyzer = analyzers['Ridge']
degrees = list(ridge_analyzer.cv_scores.keys())
cv_means = [ridge_analyzer.cv_scores[d]['mean'] for d in degrees]
cv_stds = [ridge_analyzer.cv_scores[d]['std'] for d in degrees]

fig.add_trace(
    go.Scatter(x=degrees, y=cv_means,
              error_y=dict(type='data', array=cv_stds),
              mode='lines+markers', name='Ridge CV Score',
              line=dict(color='green')),
    row=2, col=1
)

# Plot 4: Prediction confidence for the model with the lowest test MSE
best_name = min(results, key=lambda name: results[name]['test_mse'])
best_analyzer = analyzers[best_name]
X_future = np.linspace(X.max(), X.max() + 2, 50)
pred_results = best_analyzer.predict_with_confidence(X_future)

fig.add_trace(
    go.Scatter(x=X_future, y=pred_results['predictions'],
              mode='lines', name='Future Prediction',
              line=dict(color='blue')),
    row=2, col=2
)

fig.add_trace(
    go.Scatter(
        x=np.concatenate([X_future, X_future[::-1]]),
        y=np.concatenate([pred_results['upper_bound'], 
                         pred_results['lower_bound'][::-1]]),
        fill='toself', fillcolor='rgba(0,0,255,0.2)',
        line=dict(color='rgba(255,255,255,0)'),
        name='Confidence Interval'
    ),
    row=2, col=2
)

fig.update_layout(
    title='🔬 DSKYpoly Advanced Polynomial Analysis',
    height=800,
    showlegend=True
)

fig.show()
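
How `predict_with_confidence` builds its bounds is not shown here. One common approach, sketched below under the assumption of independent Gaussian noise, combines the coefficient covariance returned by `np.polyfit(..., cov=True)` with the residual variance. The function name `polyfit_with_band`, the demo data, and the 1.96 multiplier (an approximate 95% band) are illustrative choices.

```python
import numpy as np


def polyfit_with_band(x, y, x_new, degree=3, z=1.96):
    """Fit a polynomial and return (prediction, lower, upper) at x_new."""
    coeffs, cov = np.polyfit(x, y, degree, cov=True)
    pred = np.polyval(coeffs, x_new)

    # Design-matrix rows at the new points, in polyfit's column order
    # (highest power first).
    V = np.vander(x_new, degree + 1)

    # Variance of the fitted curve at each new point: diag(V @ cov @ V.T).
    fit_var = np.einsum("ij,jk,ik->i", V, cov, V)

    # Residual variance estimates the noise around the curve.
    resid = y - np.polyval(coeffs, x)
    noise_var = resid @ resid / (len(x) - (degree + 1))

    half_width = z * np.sqrt(fit_var + noise_var)
    return pred, pred - half_width, pred + half_width


# Demo on a noisy quadratic (illustrative data)
rng = np.random.default_rng(1)
x_obs = np.linspace(0, 10, 80)
y_obs = 0.5 * x_obs**2 - 3 * x_obs + rng.normal(0, 1.0, x_obs.size)

x_future = np.linspace(10, 12, 20)
pred, lower, upper = polyfit_with_band(x_obs, y_obs, x_future, degree=2)
print(f"Band width at x=10: {upper[0] - lower[0]:.2f}")
print(f"Band width at x=12: {upper[-1] - lower[-1]:.2f}")
```

The band widens as `x_future` moves away from the observed range, which is one reason polynomial extrapolation should be read with caution.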

## 📈 Demo 2: Financial Time Series Analysis

Next, we apply the same polynomial techniques to a synthetic stock price series and compute trend and risk metrics.

In [None]:
# Generate synthetic stock price data
np.random.seed(123)
n_days = 500
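# Business-day frequency, consistent with the 252-day trading year used for the seasonal pattern below
dates = pd.date_range(start='2022-01-01', periods=n_days, freq='B')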

# Simulate stock price with multiple components
time_idx = np.arange(n_days)

# Long-term trend (polynomial)
trend = 100 + 0.1*time_idx + 0.0001*time_idx**2 - 0.0000001*time_idx**3

# Seasonal pattern
seasonal = 5 * np.sin(2*np.pi*time_idx/252) + 2 * np.cos(2*np.pi*time_idx/126)

# Random walk component
random_walk = np.cumsum(np.random.normal(0, 0.5, n_days))

# Market shocks (occasional large moves)
shocks = np.zeros(n_days)
shock_days = np.random.choice(n_days, size=10, replace=False)
shocks[shock_days] = np.random.normal(0, 10, 10)
shock_component = np.cumsum(shocks)

# Combine all components
prices = trend + seasonal + random_walk + shock_component
prices = np.maximum(prices, 50)  # Floor prices at 50 so they stay positive

print(f"📊 Generated {n_days} days of synthetic stock data")
print(f"💰 Price range: ${prices.min():.2f} - ${prices.max():.2f}")
print(f"📈 Total return: {((prices[-1]/prices[0]) - 1)*100:.2f}%")

In [None]:
# Perform comprehensive financial analysis
fin_analyzer = FinancialPolynomialAnalyzer(max_degree=8, regularization='ridge', alpha=0.5)

# Analyze the full time series
trend_analysis = fin_analyzer.analyze_price_trend(prices, dates)

print("🔍 Financial Analysis Results:")
print(f"  Trend Direction: {trend_analysis['trend_direction']}")
print(f"  Total Return: {trend_analysis['total_return']*100:.2f}%")
print(f"  Annualized Volatility: {trend_analysis['annualized_volatility']*100:.2f}%")
print(f"  Sharpe Ratio: {trend_analysis['sharpe_ratio']:.3f}")
print(f"  Optimal Polynomial Degree: {trend_analysis['best_degree']}")
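
The toolkit's definitions of these metrics are not shown, so here is a hedged sketch of how they are commonly computed from a price series. It assumes simple daily returns, 252 trading days per year, and a zero risk-free rate by default. `risk_metrics` is a hypothetical helper, and its output keys merely mirror the ones printed above.

```python
import numpy as np

TRADING_DAYS = 252  # assumed annualization factor


def risk_metrics(prices, risk_free_rate=0.0):
    """Return total return, annualized volatility, and Sharpe ratio."""
    prices = np.asarray(prices, dtype=float)

    # Simple (not log) daily returns
    daily_returns = np.diff(prices) / prices[:-1]
    daily_std = daily_returns.std(ddof=1)

    # Excess return over an annual risk-free rate spread across trading days
    excess = daily_returns - risk_free_rate / TRADING_DAYS

    return {
        "total_return": prices[-1] / prices[0] - 1,
        "annualized_volatility": daily_std * np.sqrt(TRADING_DAYS),
        "sharpe_ratio": np.sqrt(TRADING_DAYS) * excess.mean() / daily_std,
    }


# Tiny illustrative price series
demo_prices = np.array([100.0, 101.0, 99.5, 102.0, 103.5, 102.5])
metrics = risk_metrics(demo_prices)
for key, value in metrics.items():
    print(f"{key}: {value:.4f}")
```

Conventions vary (log returns, a different day count, a nonzero risk-free rate), so values from this sketch may differ from what the analyzer reports.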

In [None]:
# Create comprehensive financial visualization
fig = fin_analyzer.plot_financial_analysis(prices, dates, 'DEMO_STOCK')
fig.show()

## 🌊 Demo 3: Signal Processing Applications

This demo uses sliding-window polynomial fits to denoise a signal, then compares the result with the clean signal in the time and frequency domains.

In [None]:
# Generate a noisy signal
np.random.seed(456)
t = np.linspace(0, 10, 1000)

# Original signal (combination of sinusoids)
clean_signal = (2*np.sin(2*np.pi*t) + 
               0.5*np.sin(6*np.pi*t) + 
               np.cos(4*np.pi*t) +
               0.3*t)  # Linear trend

# Add various types of noise
gaussian_noise = np.random.normal(0, 0.5, len(t))
impulse_noise = np.zeros_like(t)
impulse_indices = np.random.choice(len(t), size=20, replace=False)
impulse_noise[impulse_indices] = np.random.uniform(-3, 3, 20)

noisy_signal = clean_signal + gaussian_noise + impulse_noise

print(f"🌊 Generated signal with {len(t)} samples")
# SNR against the total added noise (Gaussian + impulse)
print(f"📊 SNR: {10*np.log10(np.var(clean_signal)/np.var(noisy_signal - clean_signal)):.2f} dB")

In [None]:
# Smooth the signal with local polynomial fits over several window sizes
def polynomial_smooth(signal, window_size, degree=3):
    """Fit a polynomial in a sliding window and evaluate it at each center sample."""
    smoothed = np.zeros_like(signal)
    half_window = window_size // 2
    
    for i in range(len(signal)):
        start = max(0, i - half_window)
        end = min(len(signal), i + half_window + 1)
        
        if end - start > degree + 1:  # Ensure enough points for fitting
            x_window = np.arange(start, end) - i
            y_window = signal[start:end]
            
            # Fit polynomial
            coeffs = np.polyfit(x_window, y_window, degree)
            smoothed[i] = np.polyval(coeffs, 0)  # Evaluate at center point
        else:
            smoothed[i] = signal[i]  # Keep original value if not enough points
    
    return smoothed

# Apply different smoothing strategies
window_sizes = [21, 51, 101]
smoothed_signals = {}

for ws in window_sizes:
    smoothed = polynomial_smooth(noisy_signal, ws, degree=3)
    smoothed_signals[f'Window_{ws}'] = smoothed
    
    # Calculate denoising performance
    mse = np.mean((clean_signal - smoothed)**2)
    print(f"Window size {ws}: MSE = {mse:.4f}")
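
The sliding-window fit above is essentially a Savitzky-Golay filter. As a sanity check under that assumption, this sketch compares a compact version of the loop (`local_polyfit_smooth`, a hypothetical helper that only handles interior samples) against `scipy.signal.savgol_filter` on a small test signal. Edge samples are excluded from the comparison because the two approaches treat boundaries differently.

```python
import numpy as np
from scipy.signal import savgol_filter


def local_polyfit_smooth(signal, window_size, degree=3):
    """Sliding-window polynomial fit; edge samples are left unchanged."""
    half = window_size // 2
    offsets = np.arange(-half, half + 1)
    smoothed = signal.copy()
    for i in range(half, len(signal) - half):
        coeffs = np.polyfit(offsets, signal[i - half:i + half + 1], degree)
        smoothed[i] = coeffs[-1]  # constant term = value of the fit at offset 0
    return smoothed


# Small illustrative test signal
rng = np.random.default_rng(456)
t_demo = np.linspace(0, 10, 400)
noisy = np.sin(2 * np.pi * 0.5 * t_demo) + rng.normal(0, 0.3, t_demo.size)

window, degree = 21, 3
loop_result = local_polyfit_smooth(noisy, window, degree)
savgol_result = savgol_filter(noisy, window_length=window, polyorder=degree)

# Compare interior samples only
interior = slice(window // 2, -(window // 2))
max_diff = np.max(np.abs(loop_result[interior] - savgol_result[interior]))
print(f"Max interior difference: {max_diff:.2e}")
```

Because `savgol_filter` applies precomputed filter coefficients instead of refitting at every sample, it is typically much faster than the explicit loop on long signals.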

In [None]:
# Create signal processing visualization
fig = make_subplots(
    rows=2, cols=2,
    subplot_titles=('Original vs Noisy Signal', 'Polynomial Denoising Results',
                   'Frequency Domain Analysis', 'Performance Comparison'),
)

# Plot 1: Original vs noisy
fig.add_trace(
    go.Scatter(x=t, y=clean_signal, mode='lines', name='Clean Signal',
              line=dict(color='blue', width=2)),
    row=1, col=1
)

fig.add_trace(
    go.Scatter(x=t, y=noisy_signal, mode='lines', name='Noisy Signal',
              opacity=0.7,  # opacity is a trace-level property, not a line property
              line=dict(color='red', width=1)),
    row=1, col=1
)

# Plot 2: Denoising results
fig.add_trace(
    go.Scatter(x=t, y=clean_signal, mode='lines', name='True Signal',
              line=dict(color='black', width=2, dash='dash')),
    row=1, col=2
)

colors = ['green', 'orange', 'purple']
for i, (name, smoothed) in enumerate(smoothed_signals.items()):
    fig.add_trace(
        go.Scatter(x=t, y=smoothed, mode='lines', name=f'Smoothed {name}',
                  line=dict(color=colors[i], width=2)),
        row=1, col=2
    )

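# Plot 3: Frequency domain (positive frequencies only)
from scipy.fft import fft, fftfreq
n_half = len(t) // 2
freqs = fftfreq(len(t), t[1] - t[0])[:n_half]
clean_fft = np.abs(fft(clean_signal))[:n_half]
noisy_fft = np.abs(fft(noisy_signal))[:n_half]
# Pick the window with the lowest MSE against the clean signal
best_window = min(smoothed_signals,
                  key=lambda k: np.mean((clean_signal - smoothed_signals[k])**2))
best_smoothed_fft = np.abs(fft(smoothed_signals[best_window]))[:n_half]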

fig.add_trace(
    go.Scatter(x=freqs, y=clean_fft, mode='lines', name='Clean FFT',
              line=dict(color='blue')),
    row=2, col=1
)

fig.add_trace(
    go.Scatter(x=freqs, y=noisy_fft, mode='lines', name='Noisy FFT',
              opacity=0.7,  # opacity is a trace-level property, not a line property
              line=dict(color='red')),
    row=2, col=1
)

fig.add_trace(
    go.Scatter(x=freqs, y=best_smoothed_fft, mode='lines', name='Smoothed FFT',
              line=dict(color='green')),
    row=2, col=1
)

# Plot 4: Performance comparison
mse_values = []
window_labels = []
for name, smoothed in smoothed_signals.items():
    mse = np.mean((clean_signal - smoothed)**2)
    mse_values.append(mse)
    window_labels.append(name.replace('Window_', 'W'))

fig.add_trace(
    go.Bar(x=window_labels, y=mse_values, name='MSE',
          marker=dict(color='purple')),
    row=2, col=2
)

fig.update_layout(
    title='🌊 Signal Processing with Polynomial Methods',
    height=800,
    showlegend=True
)

fig.update_xaxes(title_text="Time", row=1, col=1)
fig.update_xaxes(title_text="Time", row=1, col=2)
fig.update_xaxes(title_text="Frequency", row=2, col=1)
fig.update_xaxes(title_text="Window Size", row=2, col=2)

fig.show()
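
The frequency-domain panel can also be checked numerically. This sketch assumes a signal built from the same components as `clean_signal` plus Gaussian noise. It removes the linear trend with a degree-1 polynomial fit, then reads the three strongest frequencies off the FFT magnitude. Variable names here are illustrative.

```python
import numpy as np
from scipy.fft import rfft, rfftfreq

# Rebuild a signal with the same components as the demo (illustrative copy)
rng = np.random.default_rng(0)
t_demo = np.linspace(0, 10, 1000)
signal = (2 * np.sin(2 * np.pi * t_demo)
          + 0.5 * np.sin(6 * np.pi * t_demo)
          + np.cos(4 * np.pi * t_demo)
          + 0.3 * t_demo
          + rng.normal(0, 0.5, t_demo.size))

# Remove the linear trend so it does not swamp the low-frequency bins
trend_coeffs = np.polyfit(t_demo, signal, 1)
detrended = signal - np.polyval(trend_coeffs, t_demo)

# One-sided spectrum and its frequency axis
spectrum = np.abs(rfft(detrended))
freqs_demo = rfftfreq(t_demo.size, d=t_demo[1] - t_demo[0])

# The three largest magnitude bins, sorted by frequency
top_bins = np.argsort(spectrum)[-3:]
dominant = np.sort(freqs_demo[top_bins])
print("Dominant frequencies (Hz):", np.round(dominant, 2))
```

The recovered peaks sit near 1, 2, and 3 Hz, matching the `2*pi*t`, `4*pi*t`, and `6*pi*t` terms in the signal definition.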

## 🎯 Summary & Next Steps

### 🏆 What We've Demonstrated

1. **Advanced Polynomial Regression**
   - Automatic degree selection with cross-validation
   - Regularization techniques (Ridge/Lasso) for overfitting prevention
   - Performance evaluation and generalization analysis

2. **Financial Time Series Analysis**
   - Trend detection and modeling
   - Risk metrics calculation (volatility, Sharpe ratio)
   - Future price prediction with confidence intervals

3. **Signal Processing Applications**
   - Noise reduction using sliding-window polynomial smoothing
   - Frequency domain analysis
   - Window-size comparison by mean squared error

### 🚀 Expansion Opportunities

1. **Machine Learning Integration**
   - Feature engineering with polynomial features
   - Ensemble methods combining polynomial and ML models
   - Deep learning with polynomial activation functions

2. **Real-World Applications**
   - Climate data analysis and forecasting
   - Biomedical signal processing (ECG, EEG)
   - Economic modeling and policy analysis

3. **Performance Optimization**
   - GPU acceleration with CuPy/JAX
   - Parallel processing with Dask
   - Integration with assembly-optimized DSKYpoly core

### 📈 Data Science Impact

DSKYpoly's mathematical foundation provides unique advantages:
- **Theoretical Rigor**: Deep understanding of polynomial mathematics
- **Computational Excellence**: Assembly-optimized performance
- **Cross-Platform Power**: Windows Anaconda + Linux optimization
- **Educational Value**: Bridge between pure math and applications

*"From solving polynomials to solving real-world problems through data science"* 🎯

In [None]:
# Print a summary of what the showcase covered
print("🎉 DSKYpoly Data Science Showcase Complete!")
print("\n📊 Capabilities Demonstrated:")
print("  ✅ Advanced polynomial regression with regularization")
print("  ✅ Financial time series analysis and risk modeling")
print("  ✅ Signal processing and noise reduction")
print("  ✅ Interactive visualizations and professional reporting")
print("  ✅ Cross-validation and performance optimization")

print("\n🚀 Ready for expansion into:")
print("  🔬 Scientific computing and research")
print("  💰 Quantitative finance and trading")
print("  🏥 Biomedical data analysis")
print("  🌍 Climate and environmental modeling")
print("  🤖 Machine learning and AI applications")

print("\n🎯 The mathematical foundation of DSKYpoly now powers")
print("   a comprehensive data science platform!")