# Serialization Formats for Vine Copulas

When working with fitted vine copulas, we often need to:
1. **Review the fitted model** - understand what structure and families were selected
2. **Version control** - track changes to models over time in git
3. **Share models** - with collaborators or across different environments
4. **Preserve variable names** - keep meaningful labels from your data

This notebook explores the serialization options in pyvinecopulib, including:
- Native JSON format (compact, machine-readable)
- Human-readable JSON format (combines structure and parameters)
- The `NamedVinecop` class for automatic variable name handling

In [1]:
import pyvinecopulib as pv
import numpy as np
import pandas as pd
import json
from pathlib import Path

print(f"pyvinecopulib: {pv.__version__}")

# Create output directory
output_dir = Path("serialization_test")
output_dir.mkdir(exist_ok=True)

pyvinecopulib: 0.7.7+yaniv1


## 1. Create Sample Data with Meaningful Variable Names

Let's simulate data that mimics a real-world scenario: four financial assets stored in named columns, with dependence driven by shared latent factors.

In [2]:
# Simulate data with meaningful column names and a genuine dependence structure
np.random.seed(42)
n = 500

# Hidden factors that drive correlations
market_factor = np.random.randn(n)
sector_factor = np.random.randn(n)

# Observable variables are mixtures of factors plus idiosyncratic noise
tech_stock = 0.7 * market_factor + 0.5 * sector_factor + 0.3 * np.random.randn(n)
bank_stock = 0.6 * market_factor - 0.2 * sector_factor + 0.4 * np.random.randn(n)
oil_index = 0.4 * market_factor + 0.7 * np.random.randn(n)
gold_price = -0.3 * market_factor + 0.8 * np.random.randn(n)

df = pd.DataFrame({
    'Tech_Stock': tech_stock,
    'Bank_Stock': bank_stock,
    'Oil_Index': oil_index,
    'Gold_Price': gold_price,
})

# Transform to pseudo-observations (copula scale)
u = pd.DataFrame(pv.to_pseudo_obs(df.values), columns=df.columns)

print(f"Data shape: {u.shape}")
print(f"Columns: {list(u.columns)}")
print("\nSample correlations (Kendall's tau):")
print(df.corr(method='kendall').round(3))

Data shape: (500, 4)
Columns: ['Tech_Stock', 'Bank_Stock', 'Oil_Index', 'Gold_Price']

Sample correlations (Kendall's tau):


            Tech_Stock  Bank_Stock  Oil_Index  Gold_Price
Tech_Stock       1.000       0.282      0.236      -0.127
Bank_Stock       0.282       1.000      0.252      -0.169
Oil_Index        0.236       0.252      1.000      -0.098
Gold_Price      -0.127      -0.169     -0.098       1.000
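To make the pseudo-observation step concrete, here is a minimal NumPy sketch of the usual rank transform, `rank / (n + 1)`, applied column by column. The helper name `pseudo_obs` is hypothetical and the sketch assumes there are no ties; `pv.to_pseudo_obs` is the function to use in practice. Because the transform is monotone in each column, Kendall's tau is the same on `df` and on `u`.

```python
import numpy as np

def pseudo_obs(x):
    """Column-wise ranks scaled into (0, 1) as rank / (n + 1).

    Illustrative helper (hypothetical name); assumes no ties in the data.
    """
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    # argsort of argsort yields 0-based ranks within each column
    ranks = x.argsort(axis=0).argsort(axis=0) + 1
    return ranks / (n + 1)

rng = np.random.default_rng(0)
sample = rng.standard_normal((500, 4))
u_sketch = pseudo_obs(sample)

# Every column now holds the values 1/(n+1), ..., n/(n+1) in some order
print(u_sketch.shape, u_sketch.min(), u_sketch.max())
```

Dividing by `n + 1` rather than `n` keeps every value strictly inside the unit interval, which avoids evaluating copula densities on the boundary.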


## 2. NamedVinecop: Automatic Variable Name Handling

The `NamedVinecop` class wraps the underlying `Vinecop` object (implemented in C++) and automatically captures the column names when it is fit from a pandas DataFrame.

In [3]:
# Fit using NamedVinecop - variable names are captured automatically!
controls = pv.FitControlsVinecop(
    family_set=[pv.gaussian, pv.clayton, pv.gumbel, pv.frank, pv.joe, pv.student],
    selection_criterion="bic"
)

# When passing a DataFrame, column names are automatically captured
named_vine = pv.NamedVinecop.from_data(u, controls=controls)

print(f"Variable names: {named_vine.var_names}")
print(f"Dimension: {named_vine.dim}")
print(f"Parameters: {named_vine.npars}")
print(f"Log-likelihood: {named_vine.loglik():.2f}")

Variable names: ['Tech_Stock', 'Bank_Stock', 'Oil_Index', 'Gold_Price']
Dimension: 4
Parameters: 6.0
Log-likelihood: 128.42


## 3. Human-Readable JSON Format

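The human-readable JSON puts everything in one place: the variable names, the R-vine structure matrix, and, for each tree, the pair copulas (family, rotation, and parameters) keyed by the variables they connect: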

In [4]:
# Get human-readable JSON
print(named_vine.to_human_json())

{
  "variables": ["Tech_Stock", "Bank_Stock", "Oil_Index", "Gold_Price"],
  "n_observations": 500,
  "log_likelihood": 128.4217,
  "matrix": [
    [2, 2, 4, 4],
    [1, 4, 2, 0],
    [4, 1, 0, 0],
    [3, 0, 0, 0]
  ],
  "trees": {
    "1": {
      "Oil_Index-Bank_Stock": "Frank(theta=2.394)",
      "Tech_Stock-Bank_Stock": "Gaussian(rho=0.4437)",
      "Bank_Stock-Gold_Price": "Clayton 270°(theta=0.3538)"
    },
    "2": {
      "Oil_Index-Tech_Stock|Bank_Stock": "Clayton(theta=0.314)",
      "Tech_Stock-Gold_Price|Bank_Stock": "Gumbel 270°(theta=1.07)"
    },
    "3": {
      "Oil_Index-Gold_Price|Bank_Stock,Tech_Stock": "Gumbel 270°(theta=1.035)"
    }
  }
}
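The `matrix` field and the edge labels in `trees` carry the same structural information. As a sketch, the function below rebuilds the edge labels from the matrix, assuming the convention that reproduces the output above: entries are 1-based indices into `variables`, the anti-diagonal entry of a column is the first variable of every edge in that column, row `t` gives the second variable in tree `t + 1`, and the rows above it give the conditioning set. The function name `edges_from_matrix` is hypothetical and not part of the library.

```python
variables = ["Tech_Stock", "Bank_Stock", "Oil_Index", "Gold_Price"]
matrix = [
    [2, 2, 4, 4],
    [1, 4, 2, 0],
    [4, 1, 0, 0],
    [3, 0, 0, 0],
]

def edges_from_matrix(matrix, variables):
    """Rebuild edge labels per tree from an R-vine matrix (illustrative)."""
    d = len(matrix)

    def name(index):
        # Matrix entries are 1-based positions in `variables`
        return variables[index - 1]

    trees = {}
    for t in range(d - 1):              # tree t + 1
        labels = []
        for e in range(d - 1 - t):      # one edge per remaining column
            first = name(matrix[d - 1 - e][e])   # anti-diagonal entry
            second = name(matrix[t][e])
            given = [name(matrix[r][e]) for r in range(t)]
            label = f"{first}-{second}"
            if given:
                label += "|" + ",".join(given)
            labels.append(label)
        trees[str(t + 1)] = labels
    return trees

for tree, labels in edges_from_matrix(matrix, variables).items():
    print(tree, labels)
```

Reading the matrix this way is a useful cross-check when reviewing a model: the labels it produces should match the keys under `trees`.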


## 4. Saving and Loading

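`to_file` writes the model to disk and `from_file` restores it. Variable names are preserved through the round trip; the check below compares the names only.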

In [5]:
# Save to file
json_path = output_dir / "named_vine.json"
named_vine.to_file(str(json_path), indent=2)

# Load and verify
loaded_vine = pv.NamedVinecop.from_file(str(json_path))

print("Round-trip verification:")
print(f"  Original names: {named_vine.var_names}")
print(f"  Loaded names:   {loaded_vine.var_names}")
print(f"  Names match: {named_vine.var_names == loaded_vine.var_names}")

Round-trip verification:
  Original names: ['Tech_Stock', 'Bank_Stock', 'Oil_Index', 'Gold_Price']
  Loaded names:   ['Tech_Stock', 'Bank_Stock', 'Oil_Index', 'Gold_Price']
  Names match: True


## 5. Editing Models

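The human-readable format supports editing: change a pair copula's family or parameters in the parsed dictionary, convert it back to native JSON, and reload. The example below replaces the first edge of tree 1 with a Gaussian copula. Two things to keep in mind: the reloaded object is a plain `Vinecop`, and the `log_likelihood` entry in the dictionary is not touched by the edit, so it still describes the original fit.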

In [6]:
# Get human-readable dict
human = json.loads(named_vine.to_human_json())

# Show the first edge of tree 1 before editing
tree1 = human['trees']['1']
first_edge = next(iter(tree1))
print(f"Original: {first_edge} = {tree1[first_edge]}")

# Replace the pair copula on that edge (new family and parameter)
tree1[first_edge] = "Gaussian(rho=0.5)"
print(f"Modified: {first_edge} = {tree1[first_edge]}")

# Convert back to native JSON and reload as a plain Vinecop
native_json = pv.from_human_json(human)
modified_vine = pv.Vinecop.from_json(native_json)

# Verify: pair copula at tree index 0, edge index 0 (the edge edited above)
pc = modified_vine.get_pair_copula(0, 0)
print(f"\nReloaded: {pc.family.name}, rho = {pc.parameters[0, 0]:.4f}")

Original: Oil_Index-Bank_Stock = Frank(theta=2.394)
Modified: Oil_Index-Bank_Stock = Gaussian(rho=0.5)

Reloaded: gaussian, rho = 0.5000
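Before reloading a hand-edited value, it can help to check what dependence it implies. The sketch below parses strings of the form shown above (for example `"Clayton 270°(theta=0.3538)"`) and converts the parameter to Kendall's tau using the standard closed forms for the Gaussian, Clayton, and Gumbel families. The helpers `parse_spec` and `implied_tau` are hypothetical names, the string grammar is inferred from the output in this notebook, and the negative sign for 90° and 270° rotations is an assumption consistent with the negative sample tau of the `Bank_Stock-Gold_Price` pair.

```python
import math
import re

SPEC_PATTERN = re.compile(
    r"^(?P<family>[A-Za-z0-9]+)"      # family name, e.g. Gaussian
    r"(?:\s+(?P<rotation>\d+)°)?"     # optional rotation, e.g. 270°
    r"\((?P<params>[^)]*)\)$"         # comma-separated name=value pairs
)

def parse_spec(spec):
    """Split a pair-copula string into (family, rotation, params)."""
    match = SPEC_PATTERN.match(spec.strip())
    if match is None:
        raise ValueError(f"unrecognized pair-copula string: {spec!r}")
    params = {}
    for item in filter(None, (p.strip() for p in match["params"].split(","))):
        key, value = item.split("=")
        params[key.strip()] = float(value)
    return match["family"], int(match["rotation"] or 0), params

def implied_tau(family, rotation, params):
    """Kendall's tau for a few one-parameter families, else None."""
    # Assumption: 90° and 270° rotations flip the sign of the dependence
    sign = -1.0 if rotation in (90, 270) else 1.0
    if family == "Gaussian":
        return 2.0 / math.pi * math.asin(params["rho"])
    if family == "Clayton":
        return sign * params["theta"] / (params["theta"] + 2.0)
    if family == "Gumbel":
        return sign * (1.0 - 1.0 / params["theta"])
    return None  # e.g. Frank has no elementary closed form

for spec in ["Gaussian(rho=0.5)", "Clayton 270°(theta=0.3538)"]:
    family, rotation, params = parse_spec(spec)
    print(spec, "->", implied_tau(family, rotation, params))
```

For the edit made above, `rho=0.5` corresponds to a Kendall's tau of one third, noticeably stronger than the sample value of 0.252 for `Oil_Index` and `Bank_Stock`.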


## Summary

| Format | Best For | Example |
|--------|----------|--------|
| **`to_human_json()`** | Review, editing, git diffs | `"Tech_Stock-Bank_Stock": "Gaussian(rho=0.4437)"` |
| **`to_json()`** | Storage, interoperability | `{"fam": "Gaussian", "par": {"data": [0.4437]}}` |

**Use `NamedVinecop` when:**
- Working with pandas DataFrames
- You want automatic variable name handling
- Building reproducible analysis pipelines

**Use `to_human_json()` when:**
- Reviewing fitted models
- Editing families or parameters by hand
- Version control with meaningful diffs

**Use `to_json()` when:**
- Interoperability with R/C++ vinecopulib
- Minimizing file size
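
To illustrate the "meaningful diffs" point, the sketch below compares two versions of a human-readable model with the standard library's `difflib`. The two dictionaries are trimmed-down stand-ins for the output of `to_human_json()`, and `stable_dump` is a hypothetical helper. Sorting keys is used here only to make the comparison deterministic; it is not a claim about how the library orders its output.

```python
import difflib
import json

before = {
    "trees": {
        "1": {
            "Oil_Index-Bank_Stock": "Frank(theta=2.394)",
            "Tech_Stock-Bank_Stock": "Gaussian(rho=0.4437)",
        }
    }
}
after = {
    "trees": {
        "1": {
            "Oil_Index-Bank_Stock": "Gaussian(rho=0.5)",
            "Tech_Stock-Bank_Stock": "Gaussian(rho=0.4437)",
        }
    }
}

def stable_dump(obj):
    """Serialize with fixed indentation and key order for stable diffs."""
    # ensure_ascii=False keeps characters such as the degree sign readable
    return json.dumps(obj, indent=2, sort_keys=True, ensure_ascii=False)

diff = list(
    difflib.unified_diff(
        stable_dump(before).splitlines(),
        stable_dump(after).splitlines(),
        fromfile="before.json",
        tofile="after.json",
        lineterm="",
    )
)
print("\n".join(diff))

# Keep only the changed lines, skipping the file headers
changed = [
    line for line in diff
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]
```

Because each pair copula occupies one line, a change to a single edge shows up as one removed line and one added line, which is what makes this format pleasant to review in git.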