<a href="https://colab.research.google.com/github/yogee11/A2A/blob/main/Chaos_Theory_Based_Compression.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Created By Ash Kelly**

**Notebook Description: Chaos Codec - Time-Series Compression using Predictive Modeling**

This notebook implements and evaluates a custom lossy compression algorithm, termed "Chaos Codec," designed for chaotic time-series data. It uses the logistic map as a representative chaotic signal and achieves compression by chaining two linear regression models with a quantized correction term. Its compression ratio and reconstruction accuracy are compared against standard lossless compression algorithms.

**Methodology:**

1.  **Signal Generation:** A large chaotic time series (1,310,720 `float64` values, about 10 MB) is generated using the logistic map `x[t+1] = r*x[t]*(1 - x[t])` with `r=4.0`, a setting at which the map is chaotic.
2.  **Base Predictive Model:** A linear regression model, inspired by the structure of the Hénon map, is trained to predict the next value in the series based on the previous two values and the square of the immediate predecessor (`x[t+1] ~ W1*x[t-1] + W2*x[t] + W3*x[t]^2 + b`).
3.  **Base Signal Reconstruction:** The trained base model is used autoregressively to generate an approximation of the original signal.
4.  **Residual Modeling (Mirror-Based):** The residuals (errors between the original signal and the base reconstruction) are calculated. A *second* linear regression model is trained to predict these residuals. Unlike the base model, it takes its features from the *reversed* (mirrored) original signal (`residual[t] ~ R1*original_reversed[t] + R2*original_reversed[t+1] + Rb`).
5.  **Hybrid Reconstruction:** The predictions from the residual model are added to the base reconstruction to create a more accurate "hybrid" signal.
6.  **Delta Correction & Quantization:** The final difference (delta) between the original signal and the hybrid reconstruction is computed, scaled by `1e4`, rounded, and clipped to the `int16` range; this quantization is what makes the codec lossy. The quantized delta is then pickled and compressed using `zlib` to form the main "residual payload."
7.  **Final Reconstruction:** The signal is reconstructed one last time by adding the de-quantized delta back to the hybrid signal.
8.  **Evaluation:** The accuracy of the final reconstructed signal is measured against the original using Mean Squared Error (MSE), Mean Absolute Error (MAE), Cosine Similarity, and the Loss Ratio (MSE / Signal Variance).
9.  **Model Parameter Compression:** The coefficients and intercepts from both linear models are collected, converted to `float16` to save space, pickled, and compressed using `zlib`.
10. **Benchmarking:** The original signal is compressed using standard lossless algorithms (`zlib`, `gzip`, `bz2`, `lzma`) to establish baseline compression ratios.
11. **Comparison:** The total size of the "Chaos Codec" (compressed model parameters + compressed residual payload) is compared against the sizes achieved by the standard compressors. Each ratio is the compressed size divided by the size of the original pickled data, so a smaller ratio means stronger compression; the ratios are reported alongside the accuracy metrics.
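
Steps 1 to 3 of the list above can be sketched on a short series. This is a minimal illustration, not the notebook's exact pipeline: it assumes a 300-sample signal instead of the full ~10MB one, and it uses `np.linalg.lstsq` with an explicit ones column as a stand-in for `LinearRegression`. The names `features`, `target`, `rollout`, and `abs_err` are introduced here for illustration only.

```python
import numpy as np

def logistic_map(r, x0, n):
    """Iterate x[t+1] = r * x[t] * (1 - x[t]) for n samples."""
    x = np.empty(n)
    x[0] = x0
    for t in range(n - 1):
        x[t + 1] = r * x[t] * (1 - x[t])
    return x

# Short stand-in for the notebook's ~10MB series (assumption: 300 samples).
signal = logistic_map(r=4.0, x0=0.6, n=300)

# Step 2 features: x[t-1], x[t], x[t]^2, plus a ones column for the intercept.
features = np.column_stack([
    signal[:-2],
    signal[1:-1],
    signal[1:-1] ** 2,
    np.ones(len(signal) - 2),
])
target = signal[2:]

# Assumption: plain least squares stands in for sklearn's LinearRegression.
W1, W2, W3, b = np.linalg.lstsq(features, target, rcond=None)[0]
print(f"W1={W1:+.4f}  W2={W2:+.4f}  W3={W3:+.4f}  b={b:+.4f}")

# Step 3: autoregressive rollout seeded with the first two true samples.
rollout = np.empty_like(signal)
rollout[:2] = signal[:2]
for t in range(2, len(signal)):
    prev2, prev1 = rollout[t - 2], rollout[t - 1]
    rollout[t] = W1 * prev2 + W2 * prev1 + W3 * prev1 ** 2 + b

abs_err = np.abs(signal - rollout)
print(f"max error, first 5 samples: {abs_err[:5].max():.2e}")
print(f"max error, all samples:     {np.nanmax(abs_err):.2e}")
```

The fitted weights should land close to `W1 = 0`, `W2 = 4`, `W3 = -4`, `b = 0`, because the logistic map with `r = 4` expands to `x[t+1] = 4*x[t] - 4*x[t]^2`, which this feature set can represent exactly. The rollout is still expected to drift away from the original after a few dozen steps, since the chaotic dynamics amplify even rounding-level differences.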

**Purpose:**

The notebook aims to explore the feasibility of using chained predictive models tailored to the dynamics of a chaotic system as a form of lossy compression. It quantifies the trade-off between the compression ratio achieved by this custom "Chaos Codec" and the resulting reconstruction error, comparing its efficiency to standard, general-purpose lossless compression tools.
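
The trade-off between payload size and reconstruction error can be explored in isolation by varying the quantization scale. The sketch below is an illustration under stated assumptions: `quantize_and_pack` is a hypothetical helper that mirrors the scale, round, clip, pickle, and `zlib` sequence of the quantization step, and the delta is a synthetic uniform signal rather than the notebook's actual delta.

```python
import pickle
import zlib

import numpy as np

def quantize_and_pack(delta, scale):
    """Quantize delta to int16 at the given scale; return (payload bytes, MSE)."""
    q = np.clip(np.round(delta * scale), -32768, 32767).astype(np.int16)
    payload = zlib.compress(pickle.dumps({"residuals": q, "scale": scale}))
    restored = q.astype(np.float64) / scale
    return len(payload), float(np.mean((delta - restored) ** 2))

# Assumption: a uniform random signal stands in for the real delta.
rng = np.random.default_rng(0)
delta = rng.uniform(-1.0, 1.0, size=100_000)

results = {}
for scale in (1e2, 1e3, 1e4):
    size, mse = quantize_and_pack(delta, scale)
    results[scale] = (size, mse)
    print(f"scale={scale:>8.0f}  payload={size:>7} bytes  MSE={mse:.3e}")
```

On data like this, a larger scale lowers the error but leaves `zlib` less redundancy to exploit, so the payload grows. Values whose scaled magnitude exceeds 32767 would be clipped, which bounds how far the scale can be raised for a given delta range.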

In [None]:
import numpy as np
import pickle, zlib, gzip, bz2, lzma
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error
from sklearn.metrics.pairwise import cosine_similarity

# === STEP 1: Generate large logistic signal (~10MB)
def logistic_map(r, x0, n):
    x = [x0]
    for _ in range(n - 1):
        x.append(r * x[-1] * (1 - x[-1]))
    return np.array(x)

n_large = 1310720  # 1,310,720 float64 values x 8 bytes = 10 MiB
logistic_signal = logistic_map(r=4.0, x0=0.6, n=n_large)

# === STEP 2: Train Henon-style base model
X = np.vstack([
    logistic_signal[:-2],
    logistic_signal[1:-1],
    logistic_signal[1:-1]**2
]).T
y = logistic_signal[2:]
henon_model = LinearRegression().fit(X, y)
W1, W2, W3 = henon_model.coef_
b = henon_model.intercept_

# === STEP 3: Predict base signal
reconstructed_base = [logistic_signal[0], logistic_signal[1]]
for _ in range(len(logistic_signal) - 2):
    xn1, xn = reconstructed_base[-2], reconstructed_base[-1]
    x2 = xn ** 2
    reconstructed_base.append(W1 * xn1 + W2 * xn + W3 * x2 + b)
reconstructed_base = np.array(reconstructed_base)

# === STEP 4: Mirror-based linear residual
residuals = logistic_signal - reconstructed_base
X_mirror = np.vstack([
    logistic_signal[:-2][::-1],
    logistic_signal[1:-1][::-1]
]).T
y_resid = residuals[2:][::-1]
residual_model = LinearRegression().fit(X_mirror, y_resid)
R1, R2 = residual_model.coef_
Rb = residual_model.intercept_

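# === STEP 5: Predict residuals and reconstruct hybrid
mirrored = logistic_signal[::-1]
# Vectorized form of R1 * mirrored[i] + R2 * mirrored[i+1] + Rb for i in 0..n-3.
# At position t this evaluates R1*x[t] + R2*x[t-1] + Rb. Step 4 fitted R1
# against x[t-2], so the first feature is offset by two samples relative to training.
resid_pred = R1 * mirrored[:-2] + R2 * mirrored[1:-1] + Rb
# Flip back to forward order and pad the two seed positions with zeros.
resid_pred = np.concatenate([np.zeros(2), resid_pred[::-1]])
base_plus_mirror = reconstructed_base + resid_pred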

# === STEP 6: Delta correction with int16 quantization
delta = logistic_signal - base_plus_mirror
scale = 1e4
delta_q = np.clip(np.round(delta * scale), -32768, 32767).astype(np.int16)
resid_payload = pickle.dumps({"residuals": delta_q, "scale": scale})
resid_compressed = zlib.compress(resid_payload)

# === STEP 7: Reconstruct final signal
reconstructed_final = base_plus_mirror + (delta_q.astype(np.float32) / scale)

# === STEP 8: Evaluate
mse = mean_squared_error(logistic_signal, reconstructed_final)
mae = mean_absolute_error(logistic_signal, reconstructed_final)
cos_sim = cosine_similarity([logistic_signal], [reconstructed_final])[0, 0]
loss_ratio = mse / np.var(logistic_signal)

# === STEP 9: Compress weights (float16)
weights = np.array([W1, W2, W3, b, R1, R2, Rb], dtype=np.float16)
weights_compressed = zlib.compress(pickle.dumps(weights))

# === STEP 10: Standard compressors
raw_bytes = pickle.dumps(logistic_signal)
zlib_bytes = zlib.compress(raw_bytes)
gzip_bytes = gzip.compress(raw_bytes)
bz2_bytes = bz2.compress(raw_bytes)
lzma_bytes = lzma.compress(raw_bytes)

# === STEP 11: Compare sizes and ratios
sizes = {
    "Original Size": len(raw_bytes),
    "Chaos Codec": len(weights_compressed) + len(resid_compressed),
    "zlib": len(zlib_bytes),
    "gzip": len(gzip_bytes),
    "bz2": len(bz2_bytes),
    "lzma": len(lzma_bytes),
}
ratios = {k: round(v / sizes["Original Size"], 4) for k, v in sizes.items()}

# === Final Report ===
print("🎯 Chaos Codec Accuracy:")
print(f" - MSE: {mse:.6e}")
print(f" - MAE: {mae:.6f}")
print(f" - Cosine Similarity: {cos_sim:.9f}")
print(f" - Loss Ratio (MSE / Var): {loss_ratio:.6e}")
print("\n📦 Compression Comparison (bytes + ratio):")
for name in sizes:
    print(f" - {name:<16} {sizes[name]:>10} bytes | Ratio: {ratios[name]:.4f}")

🎯 Chaos Codec Accuracy:
 - MSE: 8.336860e-10
 - MAE: 0.000025
 - Cosine Similarity: 0.999999999
 - Loss Ratio (MSE / Var): 6.664851e-09

📦 Compression Comparison (bytes + ratio):
 - Original Size      10485923 bytes | Ratio: 1.0000
 - Chaos Codec         2502381 bytes | Ratio: 0.2386
 - zlib                9899765 bytes | Ratio: 0.9441
 - gzip                9899777 bytes | Ratio: 0.9441
 - bz2                10162245 bytes | Ratio: 0.9691
 - lzma                9171812 bytes | Ratio: 0.8747
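
A quick consistency check on these figures, assuming the rounding error of the `int16` quantization is roughly uniform over one step of width $\Delta = 1/\text{scale}$ and that no values were clipped (both are assumptions, not something the notebook verifies):

$$
\Delta = 10^{-4}, \qquad
\mathrm{MSE} \approx \frac{\Delta^2}{12} \approx 8.33 \times 10^{-10}, \qquad
\mathrm{MAE} \approx \frac{\Delta}{4} = 2.5 \times 10^{-5}
$$

Both values agree with the reported MSE and MAE, which suggests the remaining error is dominated by the `int16` rounding. On the size side, $1{,}310{,}720$ `int16` values occupy $2{,}621{,}440$ bytes before `zlib`, about $0.25$ of the original, so most of the $0.2386$ ratio appears to come from narrowing `float64` to `int16` rather than from the entropy coding.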
