# Introduction 

In this notebook, we implement [*Latent Credibility Analysis*](https://research.fb.com/publications/latent-credibility-analysis/) (LCA) models. These are latent probabilistic models that use hidden (latent) variables to represent the unknown reliability of each data source and the underlying truth values.

We implement only simpleLCA for now, as extending it to the other LCA models is relatively straightforward.

# SimpleLCA

Here is the plate model of simpleLCA. 

![simpleLCA](./gfx/simpleLCA.png)

# Data 

In [1]:
import pandas as pd
import os.path as op
import numpy as np
import seaborn as sns

In [2]:
import sys
sys.path.insert(0, '../')

In [3]:
from spectrum import utils
from spectrum.truthfinder import truthfinder

In [4]:
DATA_DIR = '../data'
DATA_SET = 'population'

In [5]:
truths = pd.read_csv(op.join(DATA_DIR, DATA_SET, 'truths.csv'))
claims = pd.read_csv(op.join(DATA_DIR, DATA_SET, 'claims.csv'))

In [6]:
truths.head()

Unnamed: 0,object,value,object_id
0,milton_newhampshire_Population2000,3910,157
1,omaha_nebraska_Population2000,390007,189
2,schaumburg_illinois_Population2000,75386,240
3,lakeoswego_oregon_Population2000,35278,127
4,culver_oregon_Population2000,802,53


In [7]:
claims.head()

Unnamed: 0,object,SourceID,value,object_id,source_id
0,milton_newhampshire_Population2000,16168: SatyrTN,3910,157,352
1,milton_newhampshire_Population2000,0 (76.19.53.22),23910,157,274
2,milton_newhampshire_Population2000,5512121: CapitalBot,3910,157,561
3,omaha_nebraska_Population2000,201610: Pentawing,390007,189,401
4,omaha_nebraska_Population2000,89326: Swid,390007,189,630


In [8]:
truths.shape, claims.shape

((301, 3), (1046, 5))

# Implementation

In [9]:
def simpleLCA(claims):
    """Implement the simpleLCA generative model.
    
    A claim is modeled as a triple (source_id, object_id, value). This means the source
    ``source_id`` asserts that the object ``object_id`` takes on the value ``value``.
    
    Parameters
    ----------
    claims: pandas.DataFrame
        a data frame that has the columns [source_id, object_id, value]
    
    """
    pass
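
Since `simpleLCA` is still a stub, the following is a minimal sketch of the quantity the model is built around, not the notebook's implementation. It assumes the simpleLCA likelihood as we understand it from the paper: a source asserts the true value with probability equal to its honesty, and otherwise picks uniformly among the remaining candidate values. The helper name `truth_posterior`, the fixed `honesty` dictionary, the uniform prior over truths, and the choice to use the distinct claimed values as the candidate set are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def truth_posterior(claims, honesty, object_id):
    """Posterior over the candidate values of one object, for fixed honesty.

    Hypothetical helper. Assumes a uniform prior over the distinct claimed
    values and honesty values strictly between 0 and 1.
    """
    rows = claims[claims["object_id"] == object_id]
    values = np.sort(rows["value"].unique())
    n_values = len(values)
    if n_values == 1:
        # Only one candidate value: the posterior is trivially concentrated on it.
        return pd.Series([1.0], index=values)

    log_post = np.zeros(n_values)
    for source_id, claimed in zip(rows["source_id"], rows["value"]):
        h = honesty[source_id]
        # If the candidate equals the claimed value, the source was right (prob. h);
        # otherwise it picked one of the n_values - 1 wrong values uniformly.
        log_post += np.where(values == claimed,
                             np.log(h),
                             np.log((1.0 - h) / (n_values - 1)))

    # Normalise in log space for numerical stability.
    post = np.exp(log_post - log_post.max())
    return pd.Series(post / post.sum(), index=values)


# Toy claims copied from the first rows of claims.head() above.
toy_claims = pd.DataFrame({
    "source_id": [352, 274, 561],
    "object_id": [157, 157, 157],
    "value": [3910, 23910, 3910],
})
toy_honesty = {352: 0.8, 274: 0.8, 561: 0.8}  # assumed, not estimated
posterior = truth_posterior(toy_claims, toy_honesty, 157)
print(posterior)
```

With equal honesty for all three sources, the value backed by two of them receives most of the posterior mass, and the mode of `posterior` is the inferred truth.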


In [10]:
def bvi(simpleLCA_fn):
    """Perform black-box mean-field variational inference on simpleLCA.
    
    This method takes a simpleLCA model as input, performs black-box variational
    inference, and returns a list of posterior distributions over the hidden truths and
    the source reliabilities.
    
    Concretely, if s is a source, then posterior(s) is the probability that s is honest.
    If o is an object (more precisely, a random variable whose support is the domain of
    that object), then posterior(o) is the distribution over that support.
    The underlying truth value of an object can be computed as the mode of this
    distribution.
    
    Parameters
    ----------
    simpleLCA_fn: function
        a function that represents the simpleLCA generative model.
        
    Returns
    -------
    posteriors: list
        a list of posterior distributions over the hidden truths and source reliabilities.
    """
    pass
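
To make the intended outputs concrete while `bvi` is still a stub, here is a rough sketch that swaps in a plain expectation-maximisation (EM) loop. This is not black-box variational inference: it yields a point estimate of each source's honesty rather than a posterior, and only the per-object truth distributions are proper posteriors. The function name `em_simple_lca`, its parameters, the uniform prior over truths, and the likelihood (a source is right with probability equal to its honesty, otherwise uniform over the other candidate values) are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def em_simple_lca(claims, n_iter=20, init_honesty=0.8, eps=1e-6):
    """EM sketch for simpleLCA (hypothetical helper, not the notebook's bvi).

    Returns (honesty, truth_post): a dict of honesty point estimates per source
    and a dict mapping each object_id to a Series over its candidate values.
    """
    honesty = {s: init_honesty for s in claims["source_id"].unique()}
    groups = {obj: rows for obj, rows in claims.groupby("object_id")}
    truth_post = {}

    for _ in range(n_iter):
        # E-step: posterior over each object's truth given the current honesty.
        for obj, rows in groups.items():
            values = np.sort(rows["value"].unique())
            k = len(values)
            log_post = np.zeros(k)
            if k > 1:
                for s, v in zip(rows["source_id"], rows["value"]):
                    h = honesty[s]
                    log_post += np.where(values == v,
                                         np.log(h),
                                         np.log((1.0 - h) / (k - 1)))
            post = np.exp(log_post - log_post.max())
            truth_post[obj] = pd.Series(post / post.sum(), index=values)

        # M-step: honesty is the expected fraction of a source's claims that are true.
        credit = {s: 0.0 for s in honesty}
        count = {s: 0 for s in honesty}
        for obj, rows in groups.items():
            for s, v in zip(rows["source_id"], rows["value"]):
                credit[s] += truth_post[obj].loc[v]
                count[s] += 1
        # Clip away from 0 and 1 so the logs in the next E-step stay finite.
        honesty = {s: float(np.clip(credit[s] / count[s], eps, 1.0 - eps))
                   for s in honesty}

    return honesty, truth_post


# Toy claims copied from the first rows of claims.head() above.
toy_claims = pd.DataFrame({
    "source_id": [352, 274, 561, 401, 630],
    "object_id": [157, 157, 157, 189, 189],
    "value": [3910, 23910, 3910, 390007, 390007],
})
honesty, truth_post = em_simple_lca(toy_claims)
# The inferred truth of each object is the mode of its posterior.
inferred = {obj: post.idxmax() for obj, post in truth_post.items()}
print(inferred)
```

On this toy input the two agreeing sources for object 157 end up with higher estimated honesty than the dissenting one. A variational treatment would instead keep a distribution over each honesty parameter.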

In [12]:
claims.shape

(1046, 5)

# Draft 