# Basic usage and testing

This demo compares the density-estimation performance of CADE and fastKDE on a simulated data set.

## Simulate data

In [1]:
import numpy as np
import pandas as pd
import pydens

Define a problem by simulating some data from a bivariate distribution:

In [2]:
np.random.seed(0)
sz = pydens.simulators.bivariate.Zena()
df = sz.rvs(1000)
df.head()
# Some example values:

   gaussian  triangular
0  0.148588    1.085822
1  0.587739    0.015134
2  0.283965    0.828005
3  0.138861    1.381029
4 -0.159206    0.066705


## Train the density estimators

Use Cade to estimate the density of the data. Cade first fits a
naive joint density model, then refines that model's density
estimates with a classifier trained to distinguish the real data
from fake data sampled from the initial model:

In [3]:
# All arguments can be omitted; the defaults are shown here to be explicit:
cade = pydens.cade.Cade(
    initial_density=pydens.models.JointDensity(),
    classifier=pydens.classifiers.lightgbm.Lgbm()
)
cade.train(df, diagnostics=True)
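
The classifier step described above can be illustrated with a one-dimensional toy example. This is a minimal sketch that does not use pydens and should not be read as its implementation. It assumes a uniform initial density, a hand-rolled logistic regression in place of LightGBM, and the standard density-ratio identity estimate(x) = initial(x) * p(x) / (1 - p(x)), where p(x) is the classifier's probability that x is real (valid when the real and fake samples are equal in size). All names below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# "Real" data: a standard normal stands in for the unknown distribution.
real = rng.normal(size=n)

# Naive initial density (assumption): uniform on [lo, hi]. Fake data is sampled from it.
lo, hi = -5.0, 5.0
fake = rng.uniform(lo, hi, size=n)


def initial_density(x):
    inside = (x >= lo) & (x <= hi)
    return np.where(inside, 1.0 / (hi - lo), 0.0)


def features(x):
    # Quadratic features let a logistic model represent a Gaussian/uniform log-ratio.
    return np.column_stack([np.ones_like(x), x, x**2])


def predict_real_probability(x, w):
    z = np.clip(features(x) @ w, -30.0, 30.0)  # clip logits for numerical safety
    return 1.0 / (1.0 + np.exp(-z))


# Fit the real-vs-fake classifier with Newton's method on the logistic loss.
X = features(np.concatenate([real, fake]))
y = np.concatenate([np.ones(n), np.zeros(n)])
w = np.zeros(3)
for _ in range(25):
    p = 1.0 / (1.0 + np.exp(-np.clip(X @ w, -30.0, 30.0)))
    gradient = X.T @ (p - y)
    hessian = (X * (p * (1.0 - p))[:, None]).T @ X + 1e-6 * np.eye(3)
    w -= np.linalg.solve(hessian, gradient)


def adjusted_density(x):
    # Multiply the initial density by the classifier odds p / (1 - p).
    p = predict_real_probability(x, w)
    return initial_density(x) * p / (1.0 - p)


grid = np.linspace(-2.0, 2.0, 9)
estimate = adjusted_density(grid)
truth = np.exp(-0.5 * grid**2) / np.sqrt(2.0 * np.pi)
max_error = float(np.max(np.abs(estimate - truth)))
print("largest absolute error on the grid:", max_error)
```

The uniform initial density is a poor model of the data on its own; the classifier odds carry the correction. A more flexible classifier plays the same role when the true log-ratio is not quadratic.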

Train fastKDE on the same data for comparison (requires pip install fastkde):

In [4]:
fkde = pydens.wrappers.FastKDE()
fkde.train(df)

## Performance evaluation

Compare the performance of the estimators on new data from the same simulation:

In [5]:
new_df = sz.rvs(1000)
ev = pydens.evaluation.Evaluation(
    estimators={type(e).__name__: e.density(new_df) for e in [cade, fkde]},
    truth=sz.density(new_df)
)
pd.set_option('display.precision', 3)
print(ev.evaluate())

                         Cade  FastKDE
correlation_with_truth  0.837    0.957
mean_likelihood         0.124    0.117
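
For intuition about the two rows in this table, the sketch below computes similarly named summaries with plain NumPy. It is an assumption about what the metrics mean, not pydens's code: it takes correlation_with_truth to be the Pearson correlation between estimated and true density values, and mean_likelihood to be the average estimated density at the evaluation points. The function name and the toy numbers are hypothetical.

```python
import numpy as np


def summarize(estimate, truth):
    """Hypothetical helper: compare estimated density values against true ones."""
    estimate = np.asarray(estimate, dtype=float)
    truth = np.asarray(truth, dtype=float)
    return {
        # Assumed definition: Pearson correlation between estimate and truth.
        "correlation_with_truth": float(np.corrcoef(estimate, truth)[0, 1]),
        # Assumed definition: average estimated density over the evaluation points.
        "mean_likelihood": float(estimate.mean()),
    }


# Toy density values at four evaluation points (made up for illustration).
truth = np.array([0.10, 0.20, 0.30, 0.40])
estimate = np.array([0.12, 0.18, 0.33, 0.37])
scores = summarize(estimate, truth)
print(scores)
```

Under these assumed definitions the two numbers measure different things: the correlation checks whether the estimate ranks and scales points like the true density, while the mean likelihood rewards placing high density where held-out data actually falls.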
