In this notebook we experiment with implementing Latent Credibility Analysis (LCA) models, starting with the simplest one, simpleLCA.

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import pandas as pd
import numpy as np

In [3]:
import seaborn as sns

In [4]:
import sys
sys.path.insert(0, '../')

import os.path as op

In [5]:
import tensorflow as tf
from tensorflow_probability import edward2 as ed
import tensorflow_probability as tfp


from spectrum.preprocessing import encoders
# from spectrum.judge import lca_tf, utils
from spectrum import evaluator

In [6]:
tf.__version__, tfp.__version__

('2.1.0', '0.9.0')

In [7]:
tf.random.set_seed(2020)

# Synthetic Dataset

In [8]:
DATA_DIR = '../data'
DATA_SET = 'population'

In [9]:
truths = pd.read_csv(op.join(DATA_DIR, DATA_SET, 'truths.csv'))
raw_claims = pd.read_csv(op.join(DATA_DIR, DATA_SET, 'claims.csv'))

We model city population as a discrete value. Moreover, we assume the hidden true value of each object is one of the values actually asserted for it. We therefore need to label-encode the `value` column of the claims data frame.
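Under this assumption, the domain of each object's hidden truth is simply the set of distinct values asserted for it. A minimal sketch, assuming the claims frame has `object_id` and `value` columns (the `source_id` column and all numbers below are toy placeholders, not taken from the dataset):

```python
import pandas as pd


def candidate_values(claims: pd.DataFrame) -> pd.Series:
    """Distinct asserted values per object: the assumed domain of each hidden truth."""
    return claims.groupby('object_id')['value'].unique()


# Toy claims (hypothetical): three sources disagree about two cities.
toy_raw = pd.DataFrame({
    'object_id': ['paris', 'paris', 'paris', 'lyon', 'lyon'],
    'source_id': ['s1', 's2', 's3', 's1', 's2'],
    'value': [2148000, 2150000, 2148000, 513000, 516000],
})
candidates = candidate_values(toy_raw)
```

Here `paris` has two candidate truths and `lyon` has two, regardless of how many sources back each one.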

### Data Preprocessing 

We label-encode the values of each object separately so that they can be fed to our simpleLCA model.
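The project's `encoders.fit_and_transform` is not shown here. Since the returned `le_dict` is later indexed by `object_id` and its entries expose `inverse_transform`, a plausible equivalent fits one scikit-learn `LabelEncoder` per object. The helper below is a hypothetical sketch under that assumption, not the project's implementation:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def fit_and_transform_sketch(raw_claims: pd.DataFrame):
    """Hypothetical: label-encode `value` within each object, keep one encoder per object."""
    claims = raw_claims.copy()
    encoded = pd.Series(0, index=claims.index, dtype='int64')
    le_dict = {}
    for object_id, group in claims.groupby('object_id'):
        le = LabelEncoder()
        # Codes 0..k-1 are local to this object (classes are sorted by value).
        encoded.loc[group.index] = le.fit_transform(group['value'])
        le_dict[object_id] = le
    claims['value'] = encoded
    return claims, le_dict


# Toy claims (hypothetical values, not from the dataset).
toy_raw = pd.DataFrame({
    'object_id': ['paris', 'paris', 'paris', 'lyon', 'lyon'],
    'source_id': ['s1', 's2', 's3', 's1', 's2'],
    'value': [2148000, 2150000, 2148000, 513000, 516000],
})
toy_claims, toy_le_dict = fit_and_transform_sketch(toy_raw)
```

Encoding per object keeps each object's code space small (its own candidate set), and the stored encoder lets us map a discovered code back to the original value later.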

In [10]:
# Label-encode `value` per object; le_dict maps each object_id to its fitted encoder
claims, le_dict = encoders.fit_and_transform(raw_claims)

# Truth Discovery

In [11]:
from spectrum.judge.lca import simpleLCA_VI as LCA

In [12]:
lca = LCA(claims)
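The internals of `simpleLCA_VI` are not shown here. For intuition, the sketch below assumes the SimpleLCA likelihood as usually stated: each source has an honesty $H_s$, asserts the true value with probability $H_s$, and otherwise asserts one of the remaining $|M_m| - 1$ candidate values uniformly. Given fixed (hypothetical) honesty values, it computes the posterior over one object's true value. It illustrates the model only; it is not the variational inference procedure the notebook runs.

```python
import numpy as np


def simple_lca_posterior(claimed, honesty, n_values, prior=None):
    """Posterior over ONE object's true value under an assumed SimpleLCA likelihood.

    claimed:  label-encoded value asserted by each source about this object
    honesty:  assumed honesty H_s of each of those sources (same order)
    n_values: number of candidate values |M_m| for this object
    """
    claimed = np.asarray(claimed)
    honesty = np.asarray(honesty, dtype=float)
    if prior is None:
        prior = np.full(n_values, 1.0 / n_values)
    log_post = np.log(np.asarray(prior, dtype=float))
    for c in range(n_values):
        agrees = claimed == c
        # Sources asserting candidate c are honest with probability H_s.
        log_post[c] += np.log(honesty[agrees]).sum()
        # Other sources spread (1 - H_s) evenly over the |M_m| - 1 wrong values.
        log_post[c] += np.log((1.0 - honesty[~agrees]) / (n_values - 1)).sum()
    log_post -= log_post.max()  # stabilise before exponentiating
    post = np.exp(log_post)
    return post / post.sum()


# Three hypothetical sources, two candidate values.
post = simple_lca_posterior(claimed=[0, 0, 1], honesty=[0.9, 0.6, 0.7], n_values=2)
```

With these numbers, value 0 scores $0.9 \cdot 0.6 \cdot 0.3 = 0.162$ against $0.7 \cdot 0.1 \cdot 0.4 = 0.028$ for value 1, so the posterior favours value 0 with probability $0.162 / 0.19 \approx 0.85$.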

In [17]:
discovered_truths = lca.discover(epochs=20, learning_rate=0.0005, report_every=1)

truth discovery on the way...
iteration 0 -  loss 959.3779296875
iteration 1 -  loss 951.9359130859375
iteration 2 -  loss 992.5343627929688
iteration 3 -  loss 958.1522216796875
iteration 4 -  loss 1001.3102416992188
iteration 5 -  loss 982.328857421875
iteration 6 -  loss 999.185546875
iteration 7 -  loss 995.260498046875
iteration 8 -  loss 991.51806640625
iteration 9 -  loss 966.1192016601562
iteration 10 -  loss 957.9558715820312
iteration 11 -  loss 982.4895629882812
iteration 12 -  loss 987.4508056640625
iteration 13 -  loss 995.2373046875
iteration 14 -  loss 968.0048828125
iteration 15 -  loss 990.4348754882812
iteration 16 -  loss 993.61181640625
iteration 17 -  loss 992.80908203125
iteration 18 -  loss 961.1624755859375
iteration 19 -  loss 991.43896484375


### Loss

In [None]:
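import matplotlib.pyplot as plt

# Per-iteration training loss recorded during `discover`
plt.plot(lca.bbvi.train_loss)
plt.xlabel('iteration')
plt.ylabel('loss')
plt.show()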

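The logged loss fluctuates between roughly 950 and 1000 over the 20 iterations without an obvious downward trend. A moving average can make any trend easier to see. This is a generic smoothing sketch; it assumes `lca.bbvi.train_loss` can be converted to a 1-D array of floats.

```python
import numpy as np


def moving_average(values, window=5):
    """Simple moving average; returns len(values) - window + 1 points."""
    values = np.asarray(values, dtype=float)
    kernel = np.ones(window) / window
    # mode='valid' only averages over full windows, so no edge padding is introduced.
    return np.convolve(values, kernel, mode='valid')


smoothed = moving_average([1, 2, 3, 4, 5], window=2)
```

In the notebook this would be used as `plt.plot(moving_average(lca.bbvi.train_loss))`.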

# Evaluation 

We need to map each object's discovered (label-encoded) truth value back to its original value, using that object's fitted encoder.

In [15]:
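# Decode each object's value with its own encoder; int() guards against
# row-wise apply upcasting the code to float
discovered_truths['value'] = discovered_truths.apply(
    lambda row: le_dict[row['object_id']].inverse_transform([int(row['value'])])[0],
    axis=1,
)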

In [16]:
evaluator.accuracy(truths, discovered_truths)

0.44368600682593856
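`evaluator.accuracy` comes from the project's `spectrum` package and its code is not shown here. Assuming it reports the fraction of objects whose discovered value equals the ground-truth value, a hypothetical equivalent joins the two frames on `object_id` and compares the `value` columns:

```python
import pandas as pd


def accuracy_sketch(truths: pd.DataFrame, discovered: pd.DataFrame) -> float:
    """Hypothetical: share of objects whose discovered value matches the truth."""
    merged = truths.merge(discovered, on='object_id', suffixes=('_true', '_pred'))
    return float((merged['value_true'] == merged['value_pred']).mean())


# Toy frames (hypothetical values).
toy_truths = pd.DataFrame({'object_id': ['paris', 'lyon', 'nice'],
                           'value': [2148000, 513000, 342000]})
toy_found = pd.DataFrame({'object_id': ['paris', 'lyon', 'nice'],
                          'value': [2148000, 516000, 342000]})
acc = accuracy_sketch(toy_truths, toy_found)
```

Two of the three toy objects match, giving an accuracy of 2/3. An inner join silently ignores objects missing from either frame, so a real evaluator may handle those differently.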