# TSFresh Feature Extractor

**TSFresh** (Time Series Feature extraction based on scalable hypothesis tests) is a powerful Python library that automatically extracts hundreds of time series features. These features capture statistical properties, temporal patterns, and complexity measures that are essential for machine learning on time series data.

## Key Capabilities
- **Hundreds of Features**: Automatically extracts statistics, autocorrelation, entropy, FFT coefficients, and more (roughly 800 per series with the full parameter set)
- **Relevance Filtering**: Uses statistical hypothesis tests to select only relevant features
- **Scalable**: Designed for large-scale feature extraction with parallel processing
- **sktime Integration**: Seamlessly works with sktime's transformer API

## When to Use TSFresh
| Use Case | Recommendation |
|----------|----------------|
| Tabular ML on time series | ✅ Excellent - converts sequences to fixed-length vectors |
| Feature discovery/exploration | ✅ Great for understanding what patterns matter |
| Real-time inference | ⚠️ Consider a subset of features for speed |
| Very long sequences (>10k points) | ⚠️ May be computationally expensive |

## 1. Mathematical Foundation

TSFresh extracts features based on well-established mathematical formulas. Understanding these foundations helps interpret what the features capture.

### 1.1 Basic Statistical Moments

**Mean (Central Tendency):**
$$\mu = \frac{1}{T}\sum_{t=1}^{T} x_t$$

**Variance (Spread):**
$$\sigma^2 = \frac{1}{T}\sum_{t=1}^{T} (x_t - \mu)^2$$

**Skewness (Asymmetry):**
$$\gamma_1 = \frac{1}{T}\sum_{t=1}^{T} \left(\frac{x_t - \mu}{\sigma}\right)^3$$

**Kurtosis (Tail Heaviness):**
$$\gamma_2 = \frac{1}{T}\sum_{t=1}^{T} \left(\frac{x_t - \mu}{\sigma}\right)^4 - 3$$
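
As a quick sanity check, the population (divide-by-$T$) formulas above can be compared with SciPy. This is a minimal sketch, assuming `scipy.stats.skew` and `scipy.stats.kurtosis` are called with their default settings (`bias=True`, `fisher=True`); the sample array `x` is made up for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical sample data, chosen only for illustration
x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

mu = x.mean()                                # mean
sigma = np.sqrt(((x - mu) ** 2).mean())      # population standard deviation
z = (x - mu) / sigma                         # standardized values
skewness = (z ** 3).mean()                   # gamma_1
excess_kurtosis = (z ** 4).mean() - 3.0      # gamma_2

print(f"mean={mu:.4f}, variance={sigma ** 2:.4f}")
print(f"skewness={skewness:.4f}, excess kurtosis={excess_kurtosis:.4f}")

# SciPy's defaults use the same population formulas
assert np.isclose(skewness, stats.skew(x))
assert np.isclose(excess_kurtosis, stats.kurtosis(x))
```

Note that the variance here divides by $T$, not $T-1$, so it matches `np.var(x)` with the default `ddof=0`.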

### 1.2 Autocorrelation Function (ACF)

Autocorrelation measures how a time series correlates with lagged versions of itself:

$$r_k = \frac{\sum_{t=1}^{T-k}(x_t - \bar{x})(x_{t+k} - \bar{x})}{\sum_{t=1}^{T}(x_t - \bar{x})^2}$$

**Intuition:** 
- $r_k \approx 1$: Strong positive correlation at lag $k$ (repeating patterns)
- $r_k \approx 0$: No linear correlation at lag $k$ (white-noise behavior)
- $r_k \approx -1$: Strong negative correlation (oscillating patterns)
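
To make the three cases concrete, the estimator above can be applied to three synthetic series. This is a minimal sketch: `acf_at_lag` is a hypothetical helper name, and the series are simulated purely for illustration.

```python
import numpy as np

def acf_at_lag(x: np.ndarray, k: int) -> float:
    """Sample autocorrelation r_k from the formula above (hypothetical helper)."""
    xc = x - x.mean()
    denom = np.sum(xc ** 2)
    if denom == 0:
        return 0.0  # constant series: autocorrelation is undefined, report 0
    return float(np.sum(xc[: len(x) - k] * xc[k:]) / denom)

rng = np.random.default_rng(0)
n = 1000
smooth = np.sin(2 * np.pi * np.arange(n) / 200)   # slow sine: neighbours are similar
noise = rng.standard_normal(n)                    # white noise: no memory
alternating = np.tile([1.0, -1.0], n // 2)        # flips sign at every step

for name, series in [("smooth", smooth), ("noise", noise), ("alternating", alternating)]:
    print(f"{name:12s} r_1 = {acf_at_lag(series, 1):+.3f}")
```

The smooth series should give $r_1$ close to 1, the noise series a value close to 0, and the alternating series a value close to -1.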

### 1.3 Fourier Transform (Frequency Analysis)

The Discrete Fourier Transform (DFT) decomposes a signal into frequency components:

$$X_k = \sum_{n=0}^{N-1} x_n \cdot e^{-2\pi i k n / N}$$

**Power Spectrum:** $P_k = |X_k|^2$ reveals dominant frequencies in the signal.
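
The DFT sum can also be evaluated directly to confirm that it agrees with NumPy's FFT. This is a minimal sketch: `naive_dft` is a hypothetical helper written for clarity, not speed (it costs $O(N^2)$), and the test signal is made up.

```python
import numpy as np

def naive_dft(x: np.ndarray) -> np.ndarray:
    """Evaluate X_k = sum_n x_n * exp(-2*pi*i*k*n/N) directly (O(N^2))."""
    N = len(x)
    n = np.arange(N)
    k = n.reshape(-1, 1)                         # column of frequency indices
    return np.exp(-2j * np.pi * k * n / N) @ x   # row k of the matrix yields X_k

# Hypothetical signal: exactly 4 cycles of a sine across 64 samples
N = 64
x = np.sin(2 * np.pi * 4 * np.arange(N) / N)

X = naive_dft(x)
power = np.abs(X) ** 2                           # P_k = |X_k|^2

assert np.allclose(X, np.fft.fft(x))             # same result as the FFT
dominant_bin = int(np.argmax(power[: N // 2]))   # search positive frequencies only
print("dominant frequency bin:", dominant_bin)   # prints 4
```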

### 1.4 Linear Trend

The slope of the best-fit line captures the overall trend:

$$\beta = \frac{\sum_{t=1}^{T}(t - \bar{t})(x_t - \bar{x})}{\sum_{t=1}^{T}(t-\bar{t})^2}$$

**Intuition:** $\beta > 0$ indicates upward trend, $\beta < 0$ indicates downward trend.
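
The closed-form slope can be checked against a generic least-squares fit. This is a minimal sketch, assuming `np.polyfit` with degree 1 as the reference; the series is synthetic, with a slope of 0.3 chosen arbitrarily.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 50
t = np.arange(1, T + 1)                        # time index t = 1..T
x = 0.3 * t + 2.0 + rng.normal(0.0, 0.5, T)    # hypothetical series, true slope 0.3

# Closed-form slope from the formula above
beta = np.sum((t - t.mean()) * (x - x.mean())) / np.sum((t - t.mean()) ** 2)

# Reference: a degree-1 polynomial fit returns [slope, intercept]
slope_ref, intercept_ref = np.polyfit(t, x, 1)

print(f"beta = {beta:.4f}, polyfit slope = {slope_ref:.4f}")
assert np.isclose(beta, slope_ref)
```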

### 1.5 Approximate Entropy (Complexity)

Measures unpredictability/irregularity in a time series:

$$ApEn(m, r) = \Phi^m(r) - \Phi^{m+1}(r)$$

where $\Phi^m(r)$ is the average, over all length-$m$ subsequences, of the logarithm of the fraction of subsequences that lie within tolerance $r$ of each one.

**Intuition:**
- Low ApEn → Regular, predictable patterns
- High ApEn → Complex, irregular patterns
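
A direct NumPy version of this definition might look like the sketch below. The function name `approximate_entropy`, the Chebyshev (max-abs) distance, and the defaults `m=2` and `r = 0.2 × std` are common conventions assumed here, not taken from the text above, and tsfresh's own implementation may differ in details. The pairwise distance matrix needs $O(N^2)$ memory, so this is only suitable for short series.

```python
import numpy as np

def approximate_entropy(x: np.ndarray, m: int = 2, r: float = 0.2) -> float:
    """ApEn(m, r) using Chebyshev distance; r is an absolute tolerance."""
    x = np.asarray(x, dtype=float)

    def phi(dim: int) -> float:
        # All overlapping windows of length `dim`
        windows = np.lib.stride_tricks.sliding_window_view(x, dim)
        # Pairwise Chebyshev distance between every pair of windows
        dist = np.max(np.abs(windows[:, None, :] - windows[None, :, :]), axis=2)
        # Fraction of windows within tolerance r of each template (self-match included)
        frac = np.mean(dist <= r, axis=1)
        return float(np.mean(np.log(frac)))

    return phi(m) - phi(m + 1)

# Simulated examples: a regular sine versus white noise
rng = np.random.default_rng(0)
n = 300
regular = np.sin(2 * np.pi * np.arange(n) / 25)
irregular = rng.standard_normal(n)

apen_regular = approximate_entropy(regular, m=2, r=0.2 * regular.std())
apen_irregular = approximate_entropy(irregular, m=2, r=0.2 * irregular.std())
print(f"ApEn(sine)  = {apen_regular:.3f}")
print(f"ApEn(noise) = {apen_irregular:.3f}")
```

Because each window always matches itself, the logarithm is never taken of zero; the sine should score lower than the noise.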

## 2. Low-Level NumPy Implementation

Let's implement the core TSFresh feature computations from scratch using NumPy to deeply understand how they work.

In [None]:
import numpy as np
from scipy import stats
from typing import Dict, Tuple

def compute_mean_std_skew_kurtosis(x: np.ndarray) -> Dict[str, float]:
    """
    Compute basic statistical moments of a time series.
    
    These capture the distribution's shape without considering temporal order.
    
    Parameters
    ----------
    x : np.ndarray
        1D time series array
        
    Returns
    -------
    Dict with mean, std, skewness, kurtosis
    
    Example
    -------
    >>> x = np.random.randn(100)
    >>> stats = compute_mean_std_skew_kurtosis(x)
    >>> print(f"Mean: {stats['mean']:.4f}")
    """
    n = len(x)
    
    # Mean: central tendency
    mean = np.sum(x) / n
    
    # Variance and standard deviation: spread
    variance = np.sum((x - mean) ** 2) / n
    std = np.sqrt(variance)
    
    # Skewness: asymmetry (0 = symmetric, >0 = right tail, <0 = left tail)
    if std > 0:
        skewness = np.sum(((x - mean) / std) ** 3) / n
    else:
        skewness = 0.0
    
    # Kurtosis: tail heaviness (0 = normal, >0 = heavy tails, <0 = light tails)
    if std > 0:
        kurtosis = np.sum(((x - mean) / std) ** 4) / n - 3  # Excess kurtosis
    else:
        kurtosis = 0.0
    
    return {
        'mean': mean,
        'std': std,
        'variance': variance,
        'skewness': skewness,
        'kurtosis': kurtosis
    }

# Test with a sample signal
np.random.seed(42)
test_signal = np.random.randn(100)
basic_stats = compute_mean_std_skew_kurtosis(test_signal)
print("Basic Statistical Moments:")
for name, value in basic_stats.items():
    print(f"  {name:12}: {value:8.4f}")

In [None]:
def compute_autocorrelation(x: np.ndarray, max_lag: int = 10) -> Tuple[Dict[str, float], np.ndarray]:
    """
    Compute autocorrelation function for various lags.
    
    Autocorrelation reveals periodic patterns and temporal dependencies.
    High autocorrelation at lag k means x[t] predicts x[t+k].
    
    Parameters
    ----------
    x : np.ndarray
        1D time series array
    max_lag : int
        Maximum lag to compute (default: 10)
        
    Returns
    -------
    Tuple of (dict of autocorrelation features, array of ACF values for lags 0..max_lag)
    
    Interpretation
    --------------
    - Slowly decaying ACF → trend or long memory
    - Sharp cutoff → MA process
    - Oscillating ACF → seasonal/periodic patterns
    """
    n = len(x)
    x_centered = x - np.mean(x)
    
    # Denominator: sum of squared deviations (n times the variance); lag-0 autocorrelation = 1
    var = np.sum(x_centered ** 2)
    
    acf_values = {}
    acf_array = np.zeros(max_lag + 1)
    
    for lag in range(max_lag + 1):
        if var > 0:
            # Cross-product of original and lagged series
            numerator = np.sum(x_centered[:n-lag] * x_centered[lag:]) if lag > 0 else var
            acf = numerator / var
        else:
            acf = 0.0
        acf_values[f'acf_lag_{lag}'] = acf
        acf_array[lag] = acf
    
    # Derived summary features (illustrative; not tsfresh's exact feature definitions)
    acf_values['acf_first_decay'] = np.argmax(acf_array < 0.5) if np.any(acf_array < 0.5) else max_lag
    acf_values['acf_sum'] = np.sum(np.abs(acf_array[1:]))  # Exclude lag 0
    
    return acf_values, acf_array

# Test autocorrelation on a periodic signal
t = np.linspace(0, 4*np.pi, 100)
periodic_signal = np.sin(t) + 0.3 * np.random.randn(100)
acf_features, acf_array = compute_autocorrelation(periodic_signal, max_lag=20)
print("Autocorrelation Features (periodic signal):")
for name, value in list(acf_features.items())[:8]:
    print(f"  {name:15}: {value:8.4f}")

In [None]:
def compute_fft_features(x: np.ndarray, n_coeffs: int = 10) -> Tuple[Dict[str, float], np.ndarray, np.ndarray]:
    """
    Compute FFT-based frequency domain features.
    
    The Fast Fourier Transform reveals which frequencies dominate the signal.
    Useful for detecting periodic patterns, cycles, and oscillations.
    
    Parameters
    ----------
    x : np.ndarray
        1D time series array
    n_coeffs : int
        Number of FFT coefficients to return
        
    Returns
    -------
    Tuple of (dict of FFT features, magnitude spectrum, power spectrum)
    
    Interpretation
    --------------
    - Peak at low frequency → slow variations, trends
    - Peak at specific frequency → periodic pattern
    - Flat spectrum → white noise
    """
    n = len(x)
    
    # Compute FFT
    fft_vals = np.fft.fft(x)
    
    # Only keep positive frequencies (first half due to symmetry)
    fft_positive = fft_vals[:n//2]
    
    # Magnitude spectrum: |X_k|
    magnitude = np.abs(fft_positive)
    
    # Power spectrum: |X_k|^2
    power = magnitude ** 2
    
    # Normalize power to get spectral density
    total_power = np.sum(power)
    power_normalized = power / total_power if total_power > 0 else power
    
    features = {}
    
    # Store first n_coeffs as features
    for i in range(min(n_coeffs, len(magnitude))):
        features[f'fft_coeff_{i}_real'] = np.real(fft_positive[i])
        features[f'fft_coeff_{i}_imag'] = np.imag(fft_positive[i])
        features[f'fft_coeff_{i}_magnitude'] = magnitude[i]
    
    # Aggregate FFT features
    features['fft_max_magnitude'] = np.max(magnitude[1:])  # Exclude DC component
    features['fft_dominant_freq_index'] = np.argmax(magnitude[1:]) + 1
    features['fft_spectral_centroid'] = np.sum(np.arange(len(power)) * power_normalized)
    features['fft_spectral_entropy'] = -np.sum(power_normalized * np.log2(power_normalized + 1e-10))
    
    return features, magnitude, power

# Test FFT on a composite signal (two frequencies)
t = np.linspace(0, 1, 256)
composite_signal = np.sin(2 * np.pi * 10 * t) + 0.5 * np.sin(2 * np.pi * 30 * t)  # 10Hz + 30Hz
fft_features, magnitude, power = compute_fft_features(composite_signal, n_coeffs=5)
print("FFT Features (composite 10Hz + 30Hz signal):")
print(f"  Dominant frequency index: {fft_features['fft_dominant_freq_index']}")
print(f"  Max magnitude: {fft_features['fft_max_magnitude']:.4f}")
print(f"  Spectral centroid: {fft_features['fft_spectral_centroid']:.4f}")

In [None]:
def compute_linear_trend(x: np.ndarray) -> Dict[str, float]:
    """
    Compute linear trend features using least squares regression.
    
    Captures the overall direction and strength of change over time.
    
    Parameters
    ----------
    x : np.ndarray
        1D time series array
        
    Returns
    -------
    Dict with slope, intercept, and R-squared
    """
    n = len(x)
    t = np.arange(n)
    
    # Means
    t_mean = np.mean(t)
    x_mean = np.mean(x)
    
    # Slope (beta): covariance / variance
    numerator = np.sum((t - t_mean) * (x - x_mean))
    denominator = np.sum((t - t_mean) ** 2)
    
    slope = numerator / denominator if denominator > 0 else 0.0
    intercept = x_mean - slope * t_mean
    
    # R-squared: coefficient of determination
    y_pred = slope * t + intercept
    ss_res = np.sum((x - y_pred) ** 2)
    ss_tot = np.sum((x - x_mean) ** 2)
    r_squared = 1 - (ss_res / ss_tot) if ss_tot > 0 else 0.0
    
    return {
        'linear_trend_slope': slope,
        'linear_trend_intercept': intercept,
        'linear_trend_r_squared': r_squared,
        'linear_trend_residual_std': np.std(x - y_pred)
    }

# Test on trending data
t = np.arange(100)
trending_signal = 0.5 * t + 10 + 5 * np.random.randn(100)  # Upward trend with noise
trend_features = compute_linear_trend(trending_signal)
print("Linear Trend Features:")
for name, value in trend_features.items():
    print(f"  {name:25}: {value:8.4f}")

In [None]:
def extract_tsfresh_features(x: np.ndarray, max_lag: int = 10, n_fft_coeffs: int = 5) -> Dict[str, float]:
    """
    Extract a comprehensive set of TSFresh-style features from a time series.
    
    This combines all the individual feature extractors into a single function
    that produces a feature vector suitable for machine learning.
    
    Parameters
    ----------
    x : np.ndarray
        1D time series array
    max_lag : int
        Maximum lag for autocorrelation
    n_fft_coeffs : int
        Number of FFT coefficients to extract
        
    Returns
    -------
    Dict with all extracted features (50+ features)
    """
    features = {}
    
    # 1. Basic statistical moments
    basic_stats = compute_mean_std_skew_kurtosis(x)
    features.update(basic_stats)
    
    # 2. Additional distributional features
    features['min'] = np.min(x)
    features['max'] = np.max(x)
    features['range'] = np.max(x) - np.min(x)
    features['median'] = np.median(x)
    features['iqr'] = np.percentile(x, 75) - np.percentile(x, 25)
    features['q25'] = np.percentile(x, 25)
    features['q75'] = np.percentile(x, 75)
    features['abs_energy'] = np.sum(x ** 2)
    features['root_mean_square'] = np.sqrt(np.mean(x ** 2))
    
    # 3. Autocorrelation features
    acf_features, _ = compute_autocorrelation(x, max_lag)
    features.update(acf_features)
    
    # 4. FFT frequency features
    fft_features, _, _ = compute_fft_features(x, n_fft_coeffs)
    features.update(fft_features)
    
    # 5. Linear trend features
    trend_features = compute_linear_trend(x)
    features.update(trend_features)
    
    # 6. Count-based features
    features['count_above_mean'] = np.sum(x > features['mean'])
    features['count_below_mean'] = np.sum(x < features['mean'])
    features['pct_above_mean'] = features['count_above_mean'] / len(x)
    
    # 7. Change-based features
    diff = np.diff(x)
    features['mean_abs_change'] = np.mean(np.abs(diff))
    features['mean_change'] = np.mean(diff)
    features['max_abs_change'] = np.max(np.abs(diff))
    features['count_sign_changes'] = np.sum(np.diff(np.sign(x)) != 0)
    
    # 8. Mean-crossing rate (zero crossings of the mean-centered series)
    features['zero_crossing_rate'] = np.sum(np.diff(np.sign(x - features['mean'])) != 0) / len(x)
    
    return features

# Test the full extractor
np.random.seed(42)
sample_ts = np.cumsum(np.random.randn(200))  # Random walk
all_features = extract_tsfresh_features(sample_ts)
print(f"Total features extracted: {len(all_features)}")
print("\nSample of extracted features:")
for i, (name, value) in enumerate(all_features.items()):
    if i < 15:
        print(f"  {name:30}: {value:12.4f}")

## 3. Interactive Visualizations

Visualizing features helps build intuition about what TSFresh captures from time series data. Below we create interactive Plotly visualizations showing:

1. **Time Series with Feature Annotations** - See where features come from
2. **Autocorrelation Function (ACF)** - Temporal dependencies
3. **FFT Power Spectrum** - Frequency content
4. **Feature Correlation Heatmap** - Redundancy between features
5. **Feature Importance** - Which features matter most for classification

In [None]:
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Generate a sample time series with clear patterns for visualization
np.random.seed(123)
t = np.linspace(0, 10, 500)
sample_ts = (
    2 * np.sin(2 * np.pi * 0.5 * t) +      # Low frequency component
    0.5 * np.sin(2 * np.pi * 2 * t) +      # Higher frequency component
    0.1 * t +                               # Linear trend
    0.3 * np.random.randn(500)              # Noise
)

# Extract features for this sample
features = extract_tsfresh_features(sample_ts)

# === Visualization 1: Time Series with Feature Annotations ===
fig1 = go.Figure()

# Main time series
fig1.add_trace(go.Scatter(
    x=t, y=sample_ts, mode='lines', name='Time Series',
    line=dict(color='#1f77b4', width=1.5)
))

# Annotate mean
mean_val = features['mean']
fig1.add_hline(y=mean_val, line_dash="dash", line_color="red",
               annotation_text=f"Mean = {mean_val:.2f}")

# Annotate standard deviation band
std_val = features['std']
fig1.add_hrect(y0=mean_val - std_val, y1=mean_val + std_val,
               fillcolor="rgba(255,0,0,0.1)", line_width=0,
               annotation_text=f"±1σ = {std_val:.2f}")

# Annotate min/max
fig1.add_annotation(x=t[np.argmax(sample_ts)], y=features['max'],
                    text=f"Max = {features['max']:.2f}", showarrow=True)
fig1.add_annotation(x=t[np.argmin(sample_ts)], y=features['min'],
                    text=f"Min = {features['min']:.2f}", showarrow=True)

# Linear trend line
slope = features['linear_trend_slope']
intercept = features['linear_trend_intercept']
trend_line = slope * np.arange(len(sample_ts)) + intercept
fig1.add_trace(go.Scatter(
    x=t, y=trend_line, mode='lines', name=f'Trend (β={slope:.4f})',
    line=dict(color='green', width=2, dash='dot')
))

fig1.update_layout(
    title="<b>Time Series with Extracted Feature Annotations</b><br><sub>Understanding where TSFresh features come from</sub>",
    xaxis_title="Time",
    yaxis_title="Value",
    template="plotly_white",
    height=450,
    legend=dict(yanchor="top", y=0.99, xanchor="right", x=0.99)
)
fig1.show()

In [None]:
# === Visualization 2: Autocorrelation Function (ACF) Plot ===
_, acf_values = compute_autocorrelation(sample_ts, max_lag=50)

fig2 = go.Figure()

# Bar plot for ACF
fig2.add_trace(go.Bar(
    x=list(range(len(acf_values))),
    y=acf_values,
    marker_color=['#2ecc71' if v > 0 else '#e74c3c' for v in acf_values],
    name='ACF'
))

# Significance bounds (approximate 95% CI for white noise)
n = len(sample_ts)
sig_bound = 1.96 / np.sqrt(n)
fig2.add_hline(y=sig_bound, line_dash="dash", line_color="gray",
               annotation_text="95% CI")
fig2.add_hline(y=-sig_bound, line_dash="dash", line_color="gray")
fig2.add_hline(y=0, line_color="black", line_width=0.5)

fig2.update_layout(
    title="<b>Autocorrelation Function (ACF)</b><br><sub>Values outside gray bands indicate significant temporal dependence</sub>",
    xaxis_title="Lag (k)",
    yaxis_title="Autocorrelation r(k)",
    template="plotly_white",
    height=400,
    showlegend=False
)

# Add interpretation annotation
fig2.add_annotation(
    x=25, y=0.8,
    text="<b>Interpretation:</b><br>• Oscillating pattern → periodic signal<br>• Slow decay → trend/persistence<br>• Sharp cutoff → MA process",
    showarrow=False,
    bgcolor="rgba(255,255,255,0.8)",
    bordercolor="gray",
    borderwidth=1,
    align="left"
)

fig2.show()

In [None]:
# === Visualization 3: FFT Power Spectrum ===
_, magnitude, power = compute_fft_features(sample_ts, n_coeffs=50)

# Compute frequency axis from the actual sample spacing of t
# (N samples span N - 1 intervals, so the rate is (N - 1) / duration)
sampling_rate = (len(sample_ts) - 1) / (t[-1] - t[0])
freqs = np.fft.fftfreq(len(sample_ts), 1/sampling_rate)[:len(magnitude)]

fig3 = go.Figure()

# Power spectrum
fig3.add_trace(go.Scatter(
    x=freqs[1:100],  # Skip DC component, show first 100 frequencies
    y=power[1:100],
    mode='lines',
    fill='tozeroy',
    fillcolor='rgba(31, 119, 180, 0.3)',
    line=dict(color='#1f77b4', width=2),
    name='Power Spectrum'
))

# Mark dominant frequencies
peak_indices = np.argsort(power[1:100])[-3:] + 1  # Top 3 peaks
for idx in peak_indices:
    if power[idx] > 0.1 * np.max(power[1:100]):
        fig3.add_annotation(
            x=freqs[idx], y=power[idx],
            text=f"f = {freqs[idx]:.2f} Hz",
            showarrow=True,
            arrowhead=2
        )

fig3.update_layout(
    title="<b>FFT Power Spectrum</b><br><sub>Peaks reveal dominant frequencies in the signal</sub>",
    xaxis_title="Frequency (Hz)",
    yaxis_title="Power |X(f)|²",
    template="plotly_white",
    height=400,
    xaxis=dict(range=[0, freqs[99]])
)

fig3.show()

In [None]:
# === Visualization 4: Feature Correlation Heatmap ===
# Generate multiple time series to compute feature correlations
np.random.seed(42)
n_samples = 50
feature_matrix = []

for i in range(n_samples):
    # Generate diverse time series
    ts = (
        np.random.uniform(0.5, 3) * np.sin(2 * np.pi * np.random.uniform(0.1, 2) * t) +
        np.random.uniform(-0.5, 0.5) * t +
        np.random.uniform(0.1, 1) * np.random.randn(len(t))
    )
    feats = extract_tsfresh_features(ts, max_lag=5, n_fft_coeffs=3)
    feature_matrix.append(feats)

# Convert to DataFrame
import pandas as pd
df_features = pd.DataFrame(feature_matrix)

# Select a subset of interpretable features for the heatmap
selected_features = ['mean', 'std', 'skewness', 'kurtosis', 'min', 'max', 'range',
                     'linear_trend_slope', 'linear_trend_r_squared', 'acf_lag_1', 
                     'acf_lag_2', 'fft_max_magnitude', 'fft_spectral_entropy',
                     'mean_abs_change', 'zero_crossing_rate']

df_subset = df_features[selected_features]
corr_matrix = df_subset.corr()

fig4 = go.Figure(data=go.Heatmap(
    z=corr_matrix.values,
    x=corr_matrix.columns,
    y=corr_matrix.index,
    colorscale='RdBu_r',
    zmid=0,
    text=np.round(corr_matrix.values, 2),
    texttemplate="%{text}",
    textfont={"size": 9},
    hoverongaps=False
))

fig4.update_layout(
    title="<b>Feature Correlation Heatmap</b><br><sub>High correlation (|r| > 0.8) indicates redundant features</sub>",
    template="plotly_white",
    height=600,
    width=800,
    xaxis=dict(tickangle=45)
)

fig4.show()

In [None]:
# === Visualization 5: Feature Importance Bar Chart ===
# NOTE: the scores below are hand-picked placeholders, not computed from data.
# In practice, derive them from model coefficients or permutation importance.

# Illustrative (synthetic) importance scores
feature_importance = {
    'acf_lag_1': 0.85,
    'linear_trend_slope': 0.78,
    'std': 0.72,
    'fft_max_magnitude': 0.68,
    'mean_abs_change': 0.62,
    'skewness': 0.55,
    'zero_crossing_rate': 0.48,
    'kurtosis': 0.42,
    'fft_spectral_entropy': 0.38,
    'range': 0.35,
    'acf_lag_2': 0.32,
    'mean': 0.28,
    'linear_trend_r_squared': 0.25,
    'min': 0.18,
    'max': 0.15
}

# Sort by importance
sorted_features = dict(sorted(feature_importance.items(), key=lambda x: x[1], reverse=True))

fig5 = go.Figure()

# Horizontal bar chart
fig5.add_trace(go.Bar(
    y=list(sorted_features.keys()),
    x=list(sorted_features.values()),
    orientation='h',
    marker=dict(
        color=list(sorted_features.values()),
        colorscale='Viridis',
        showscale=True,
        colorbar=dict(title='Importance')
    ),
    text=[f'{v:.2f}' for v in sorted_features.values()],
    textposition='outside'
))

fig5.update_layout(
    title="<b>Feature Importance Ranking</b><br><sub>Illustrative synthetic scores, not computed from data</sub>",
    xaxis_title="Importance Score",
    yaxis_title="Feature",
    template="plotly_white",
    height=500,
    yaxis=dict(autorange='reversed'),  # Most important at top
    xaxis=dict(range=[0, 1.1])
)

# Add interpretation annotations
fig5.add_annotation(
    x=0.9, y=10,
    text="<b>Key Insight:</b><br>Temporal features (ACF, trend)<br>can outperform static<br>statistics on some tasks",
    showarrow=False,
    bgcolor="rgba(255,255,255,0.9)",
    bordercolor="gray",
    borderwidth=1,
    align="left"
)

fig5.show()

## 4. Using TSFresh with sktime

Now that we understand the mathematical foundations and have built features from scratch, let's see how to use the production-ready `TSFreshFeatureExtractor` from sktime. This integrates seamlessly with sklearn pipelines.

In [None]:
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go

from sktime.datasets import load_basic_motions



### Install dependency

TSFresh is an optional dependency for sktime. Install it with:

```bash
pip install tsfresh
```

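**Feature Set Options:**
- `"minimal"`: about 10 basic summary statistics per channel (mean, variance, minimum, maximum, ...); fastest
- `"efficient"`: the comprehensive set minus calculators flagged as computationally expensive (e.g. approximate entropy and sample entropy); still several hundred features per channel
- `"comprehensive"`: every feature calculator, roughly 800 features per channel; most thorough but slowest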
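
When none of the named presets fits (for example, for low-latency inference), tsfresh also accepts a plain dictionary that maps feature-calculator names to their parameters. The sketch below is hedged: the subset is hand-picked for illustration, `build_extractor` is a hypothetical helper, and it assumes that sktime's `TSFreshFeatureExtractor` forwards a non-string `default_fc_parameters` value unchanged to tsfresh.

```python
# Hypothetical hand-picked subset. Keys are tsfresh feature-calculator names;
# values are None (no parameters) or a list of parameter dicts.
custom_fc_parameters = {
    "mean": None,
    "standard_deviation": None,
    "abs_energy": None,
    "autocorrelation": [{"lag": 1}, {"lag": 2}],
    "linear_trend": [{"attr": "slope"}, {"attr": "rvalue"}],
}

def build_extractor():
    """Build the transformer lazily so this block runs without sktime installed."""
    # Assumption: a dict is passed through to tsfresh as-is
    from sktime.transformations.panel.tsfresh import TSFreshFeatureExtractor
    return TSFreshFeatureExtractor(default_fc_parameters=custom_fc_parameters)

# Each None entry yields one feature; each parameter dict yields one feature
n_features_per_channel = sum(
    1 if params is None else len(params) for params in custom_fc_parameters.values()
)
print("features per channel:", n_features_per_channel)
```

Calling `build_extractor()` returns a transformer that can be used in place of the preset-based extractor in a pipeline.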

In [None]:
X_train, y_train = load_basic_motions(split="train", return_X_y=True)
X_test, y_test = load_basic_motions(split="test", return_X_y=True)



In [None]:
from sktime.transformations.panel.tsfresh import TSFreshFeatureExtractor
from sklearn.linear_model import RidgeClassifier
from sklearn.pipeline import make_pipeline
from sklearn.metrics import classification_report

pipe = make_pipeline(TSFreshFeatureExtractor(default_fc_parameters="efficient"), RidgeClassifier())
pipe.fit(X_train, y_train)
pred = pipe.predict(X_test)
print(classification_report(y_test, pred))


## 5. Summary & Best Practices

### Feature Categories in TSFresh

| Category | Examples | What They Capture |
|----------|----------|-------------------|
| **Statistical** | mean, std, skewness, kurtosis | Distribution shape |
| **Temporal** | autocorrelation, partial autocorrelation | Sequential dependencies |
| **Frequency** | FFT coefficients, spectral entropy | Periodic patterns |
| **Trend** | linear trend slope, curvature | Long-term direction |
| **Complexity** | approximate entropy, sample entropy | Irregularity/predictability |
| **Count-based** | peaks, zero crossings | Discrete events |

### Best Practices

1. **Start with `"efficient"` parameter set** - Good balance of coverage and speed
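2. **Apply feature selection** - Many features are redundant; use correlation filtering or tsfresh's hypothesis-test relevance filtering (`TSFreshRelevantFeatureExtractor` in sktime)
3. **Scale features** - Use StandardScaler before scale-sensitive models (linear models, SVMs, k-NN); tree ensembles generally do not need it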
4. **Handle NaN features** - Some features may be undefined for certain time series
5. **Consider computational cost** - Full extraction can be slow for large datasets
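
The correlation-filtering step can be sketched with pandas alone. The helper name `drop_correlated_features` and the 0.9 threshold are assumptions made for illustration, and the feature table is synthetic. In a real workflow, the columns to drop should be chosen on the training split only and then applied to the test split.

```python
import numpy as np
import pandas as pd

def drop_correlated_features(df: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """Drop constant columns, then drop the later column of each highly correlated pair."""
    # Constant (or all-NaN) columns carry no signal and have undefined correlation
    df = df.loc[:, df.nunique(dropna=True) > 1]
    corr = df.corr().abs()
    # Keep only the strict upper triangle so each pair is inspected once
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)

# Synthetic feature table: "median" nearly duplicates "mean", "length" is constant
rng = np.random.default_rng(0)
base = rng.standard_normal(100)
table = pd.DataFrame({
    "mean": base,
    "median": base + 0.01 * rng.standard_normal(100),
    "slope": rng.standard_normal(100),
    "length": np.full(100, 200.0),
})

reduced = drop_correlated_features(table)
print(list(reduced.columns))   # prints ['mean', 'slope']
```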

### When to Use TSFresh vs Alternatives

| Scenario | Recommendation |
|----------|----------------|
| Need interpretable features | ✅ TSFresh |
| Very large datasets | Consider Catch22 (faster) |
| Deep learning approach | Skip feature extraction, use raw data |
| Real-time inference | Use minimal feature set or pre-compute |