# Deep LiquidLegions sketch sampling

evgenys@google.com

Google LLC

October 2020

We present a method for efficiently creating sketches of sets of Virtual People of a desired size. We do so in the context of the [VID-native](https://github.com/world-federation-of-advertisers/virtual_people_research/blob/master/notebooks/TV_modeling_with_Virtual_People.ipynb) approach to cross-media audience estimation, i.e. mapping panelists to the Virtual People sketches that the panelists represent. The method can be applied analogously to the Aggregate approach.

The VID-native approach requires building a correspondence between virtual people and panelists. Done naively, this can be expensive: in the US, for example, there are 250M virtual people, and mapping them to thousands of panelists takes considerable CPU time.

Here we leverage the fact that the audience eventually needs to be mapped to a
[LiquidLegions](https://storage.googleapis.com/pub-tools-public-publication-data/pdf/3e44af84a8404c28aaebff347a4bd5e305a62eda.pdf) sketch. Thus, instead of mapping virtual people to panelists, we can map the LiquidLegions registers that these virtual people occupy. We just need to do it in a way that preserves the distribution of the resulting sketches.

We do this using a _deep LiquidLegions sketch_. A deep sketch is a sketch that
stores the set of virtual people that activated its "high" registers, i.e.
registers with a small probability of receiving a virtual person.

A deep sketch can be used to allocate sketches to panelists in a way that
makes them intersect with digital sketches according to the independence assumption.

It is done as follows:

1. **Build deep sketch:** Given the set of all virtual people, build a deep sketch that stores vp-ids for the registers that follow the first 0 (the first empty register). In Matthew Clegg's terminology: store vp-ids for the registers of the fringe of the universe. This set of virtual people is "small": its size is within a constant factor of the size of the fringe.

2. **Deep sketch sampling. Step 1:** Use affinity hashing to map the virtual people of the fringe to the panelists.

3. **Deep sketch sampling. Step 2:** For each panelist and each register of the saturated area (the registers before the first 0), use affinity hashing with the appropriate probability to randomly decide whether the register is assigned to the panelist.

4. After steps 1, 2 and 3 each panelist has been assigned a set of registers. Use these registers as the panelist's sketch.

5. **Audience encoding:** To estimate an audience from panel data, take the union of the sketches of all panelists that belong to the audience. This union is to be transmitted to the secure cardinality estimation framework as the sketch of the TV audience.
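The steps above can be illustrated with a small, self-contained Python sketch on a toy universe. It is a sketch under assumptions, not the production implementation. SHA-256 stands in for BigQuery's `FarmFingerprint`. A plain uniform hash stands in for (weighted) affinity hashing, which is statistically equivalent when all panelist weights are equal. The sizes are tiny so that the example runs quickly. The helper names (`float_hash`, `deep_sketch_sampling`, `audience_sketch`) are hypothetical.

```python
import hashlib
import math


def float_hash(key: str) -> float:
    """Deterministic hash of a string to a float in [0, 1).

    SHA-256 stands in for FarmFingerprint (an assumption of this sketch).
    """
    digest = hashlib.sha256(key.encode()).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def liquid_register(person: int, a: float, m: int) -> int:
    """LiquidLegions register drawn from a truncated exponential on [0, 1)."""
    u = float_hash(f"register:{person}")
    t = -math.log(1 - u * (1 - math.exp(-a))) / a
    return min(int(t * m), m - 1)


def register_prob(r: int, a: float, m: int) -> float:
    """Approximate probability that one person lands in register r."""
    return a * math.exp(-a * r / m) / (1 - math.exp(-a)) / m


def deep_sketch_sampling(num_people: int, num_panelists: int, a: float, m: int):
    people_per_panelist = num_people / num_panelists
    # Step 1 (build deep sketch): sketch the universe, find the first empty
    # register (antidepth) and keep virtual people only for the fringe.
    owners = {}
    for person in range(num_people):
        owners.setdefault(liquid_register(person, a, m), []).append(person)
    antidepth = min(r for r in range(m) if r not in owners)
    fringe = {r: ps for r, ps in owners.items() if r >= antidepth}

    sketches = {p: set() for p in range(num_panelists)}
    # Step 2 (sampling, step 1): send every fringe person, together with its
    # register, to exactly one panelist (uniform hash instead of affinity hash).
    for r, people in fringe.items():
        for person in people:
            panelist = int(float_hash(f"panelist:{person}") * num_panelists)
            sketches[panelist].add(r)
    # Step 3 (sampling, step 2): give each saturated register to each panelist
    # with the probability that at least one of its people would land there.
    for p in range(num_panelists):
        for r in range(antidepth):
            hit_prob = 1 - (1 - register_prob(r, a, m)) ** people_per_panelist
            if float_hash(f"low-register:{r}:{p}") < hit_prob:
                sketches[p].add(r)
    # Step 4: each set in `sketches` is now the sketch of its panelist.
    return antidepth, fringe, sketches


def audience_sketch(sketches, audience):
    """Step 5 (audience encoding): union of the audience panelists' sketches."""
    return set().union(*(sketches[p] for p in audience))


antidepth, fringe, sketches = deep_sketch_sampling(20_000, 20, a=10.0, m=1_000)
print("first empty register:", antidepth, "fringe registers:", len(fringe))
print("panelists 0-4 cover", len(audience_sketch(sketches, range(5))), "registers")
```

In this toy the whole universe is hashed once to find the first empty register. After that, only the fringe people and the saturated registers are touched per panelist. This mirrors why the deep sketch keeps the per-panelist work small.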

Further in this colab we run simulations that confirm the correctness of the method.

In the simulations we assume that we have **100M people** and **10K panelists**, i.e.
each panelist corresponds to 10K people. For simplicity, all panelist weights are 1.

A LiquidLegions sketch with parameters $m=300K, a=20$ is used. When we build the deep sketch we observe that it contains 126,907 virtual people, only about 0.13% of all virtual people. We sample it, creating a correspondence between panelists and sketch registers.

Finally, we evaluate the estimates obtained from the panelist sketches on a set of cross-media campaigns simulated under the independence assumption. The campaigns have the following sizes:

| campaign code | TV reach    | Digital reach | Total reach |
|---------------|-------------|---------------|-------------|
| x3-5          | 30M         | 50M           | 65M         |
| x8-8          | 80M         | 80M           | 96M         |
| x5-5          | 50M         | 50M           | 75M         |
| x05-3         | 5M          | 30M           | 33.5M       |
| x5-05         | 50M         | 5M            | 52.5M       |
| x05-05        | 5M          | 5M            | 9.75M       |
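Under the independence assumption, being reached by TV and being reached by digital are independent events. The total reach in the table therefore follows from $N\,(1-(1-T/N)(1-D/N))$ with $N = 100$M. Here $T$ is the TV reach and $D$ is the digital reach. A short Python check of the six rows (the function and constant names are illustrative):

```python
UNIVERSE = 100_000_000  # 100M people, as in the simulation


def total_reach(tv: float, digital: float, universe: float = UNIVERSE) -> float:
    """Union of TV and digital reach when the two exposures are independent."""
    return universe * (1 - (1 - tv / universe) * (1 - digital / universe))


CAMPAIGNS = {  # campaign code: (TV reach, digital reach)
    "x3-5": (30e6, 50e6),
    "x8-8": (80e6, 80e6),
    "x5-5": (50e6, 50e6),
    "x05-3": (5e6, 30e6),
    "x5-05": (50e6, 5e6),
    "x05-05": (5e6, 5e6),
}

for code, (tv, digital) in CAMPAIGNS.items():
    print(f"{code}: {total_reach(tv, digital) / 1e6:.2f}M")
```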

The error of the estimates obtained with deep sketch sampling stays under 2%, which is in the ballpark of the error expected from a sketch with these parameters.

<img src="DeepLiquidSampling_Files/deep_ll_sampling_result.png" width="350px">

To model conditional dependence between the TV and digital audiences, the sketch
should be built per category, for example per demographic bucket.

Most computations of the simulation are written in [Logica](https://github.com/evgskv/logica) and run on BigQuery, which gives a relatively fast runtime of about 4 minutes. For more information about the syntax of the language, see the [tutorial](https://colab.sandbox.google.com/github/EvgSkv/logica/blob/main/tutorial/Logica_tutorial.ipynb).

The LiquidLegions cardinality estimator from the WFA cardinality estimation framework is used.

The approach naturally generalizes to Bloom filters with an arbitrary register distribution and to Bloom filters with an arbitrary number of hash functions.

## Installing WFA cardinality estimation framework




In [None]:
!git clone https://github.com/world-federation-of-advertisers/cardinality_estimation_evaluation_framework
!ls
!cd cardinality_estimation_evaluation_framework; pip install -r requirements.txt
!cd cardinality_estimation_evaluation_framework; python3 setup.py install

fatal: destination path 'cardinality_estimation_evaluation_framework' already exists and is not an empty directory.
cardinality_estimation_evaluation_framework  sample_data
running install
running bdist_egg
running egg_info
writing wfa_cardinality_estimation_evaluation_framework.egg-info/PKG-INFO
writing dependency_links to wfa_cardinality_estimation_evaluation_framework.egg-info/dependency_links.txt
writing entry points to wfa_cardinality_estimation_evaluation_framework.egg-info/entry_points.txt
writing top-level names to wfa_cardinality_estimation_evaluation_framework.egg-info/top_level.txt
writing manifest file 'wfa_cardinality_estimation_evaluation_framework.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
creating build/bdist.linux-x86_64/egg
creating build/bdist.linux-x86_64/egg/wfa_cardinality_estimation_evaluation_framework
creating build/bdist.linux-x86_64/egg/wfa_cardinality_estimation_evaluation_framework/esti

In [None]:
# If this fails then "Restart runtime". This will let Python load the
# estimation framework libraries. Then proceed from here, or just "Run all".
from wfa_cardinality_estimation_evaluation_framework.estimators import liquid_legions


## Installing Logica

In [None]:
!pip install logica
from logica import colab_logica
from google.colab import auth
auth.authenticate_user()
colab_logica.SetProject('YOUR_PROJECT_ID')


Collecting logica
Successfully installed logica-1.3.9


## Starting simulation

In [None]:
import datetime
import time

# Record when the computation started.
start_of_computation = datetime.datetime.now()
print("Simulation started at: %s" % start_of_computation)

Simulation started at: 2020-10-14 02:56:05.917803


## Hashing library

In [None]:
%%writefile hashing.l

ConfigLL(a: 20, m: 300000);
MaxInt() = 9223372036854775807;

FloatHash(x) = Abs(FarmFingerprint("float_hash:" ++ ToString(x))) / MaxInt();

TruncatedExpHash(x) = 1 - Log(Exp(a) + u * (1 - Exp(a))) / a :-
  ConfigLL(a:),
  u == FloatHash(x);

LiquidHash(x) = ToInt64(TruncatedExpHash(x) * m) :-
  ConfigLL(m:);

LiquidHashProb(x) = a * Exp(-a * t) / (1 - Exp(-a)) / m :-
  t == x / m,
  ConfigLL(a:, m:);

# The affinity hashing functions below are not used in the simulation, for
# speed, and are given for reference.
AffinityHash(x, options) --> result :-
  result ArgMin= (
      option -> affinity :-
      option in options,
      affinity == FloatHash("affinity:" ++ ToString(x) ++ ToString(option))
  );
WeightedAffinityHash(x, weighted_options) --> result :-
  result ArgMax= (
      option.name -> affinity :-
      option in weighted_options,
      affinity == Log(
          FloatHash("affinity:" ++ ToString(x) ++ ToString(option.name))) /
          option.weight
  );

PositiveHash(x) = Abs(FarmFingerprint("positive_hash:" ++ ToString(x)));

RangeHash(x, num:) = Mod(PositiveHash(x), num);

Overwriting hashing.l
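To check the hashing library outside BigQuery, here is a rough Python counterpart of `TruncatedExpHash`, `LiquidHash` and `LiquidHashProb` with the same parameters as `ConfigLL`. It is a sketch under two assumptions. First, SHA-256 replaces `FarmFingerprint`, so the register assigned to a given id differs from BigQuery. Second, Python's `int()` truncates, whereas BigQuery's `CAST(... AS INT64)` (which `ToInt64` compiles to) rounds to the nearest integer. The Python function names are illustrative.

```python
import hashlib
import math

A, M = 20, 300_000  # mirrors ConfigLL(a: 20, m: 300000)


def float_hash(x) -> float:
    """Uniform value in [0, 1); SHA-256 stands in for FarmFingerprint."""
    digest = hashlib.sha256(f"float_hash:{x}".encode()).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def truncated_exp_hash(x, a: float = A) -> float:
    """Inverse-CDF sample of the density a*exp(-a*t)/(1-exp(-a)) on [0, 1)."""
    u = float_hash(x)
    return 1 - math.log(math.exp(a) + u * (1 - math.exp(a))) / a


def liquid_hash(x, a: float = A, m: int = M) -> int:
    # Clamp guards against floating-point error pushing t just outside [0, 1).
    return min(max(int(truncated_exp_hash(x, a) * m), 0), m - 1)


def liquid_hash_prob(r: int, a: float = A, m: int = M) -> float:
    """Density at the left edge of register r, times the register width 1/m."""
    t = r / m
    return a * math.exp(-a * t) / (1 - math.exp(-a)) / m


print(sum(liquid_hash_prob(r) for r in range(M)))
print([liquid_hash(i) for i in range(5)])
```

The first printed value is the same quantity that `TotalProbOfLL` computes in BigQuery.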


### Small tests for the hashing library

In [None]:
%%logica Q1, Q2, TotalProbOfLL

import hashing.AffinityHash;
import hashing.WeightedAffinityHash;
import hashing.LiquidHashProb;
import hashing.ConfigLL;

Q1(x,
   h1: AffinityHash(x, ["apple", "orange", "banana", "pear"]),
   h2: AffinityHash(x, ["apple", "orange", "banana", "pear", "pineapple"])) :-
  x in [1,2,3,4,5,6,7,8];

Q2(x,
   h1: WeightedAffinityHash(x, [{name: "apple", weight: 1.0},
                                {name: "orange", weight: 2.0},
                                {name: "banana", weight: 0.5}]),
   h2: WeightedAffinityHash(x, [{name: "apple", weight: 1.0},
                                {name: "orange", weight: 2.0},
                                {name: "banana", weight: 1.0}])) :-
  x in [1,2,3,4,5,6,7,8]; 


TotalProbOfLL() += LiquidHashProb(x) :- x in Range(m), ConfigLL(m:);

Running Q1


The following query is stored at Q1_sql variable.
CREATE TEMP FUNCTION hashing_AffinityHash(x ANY TYPE, options ANY TYPE) AS ((SELECT
  ARRAY_AGG((STRUCT(x_13 AS arg, ((ABS(FARM_FINGERPRINT(CONCAT("float_hash:", CAST(CONCAT(CONCAT("affinity:", CAST(x AS STRING)), CAST(x_13 AS STRING)) AS STRING))))) / (9223372036854775807)) as value)).arg order by (STRUCT(x_13 AS arg, ((ABS(FARM_FINGERPRINT(CONCAT("float_hash:", CAST(CONCAT(CONCAT("affinity:", CAST(x AS STRING)), CAST(x_13 AS STRING)) AS STRING))))) / (9223372036854775807)) as value)).value limit 1)[OFFSET(0)] AS logica_value
FROM
  UNNEST(options) as x_13));

SELECT
  x_1 AS col0,
  hashing_AffinityHash(x_1, ARRAY["apple", "orange", "banana", "pear"]) AS h1,
  hashing_AffinityHash(x_1, ARRAY["apple", "orange", "banana", "pear", "pineapple"]) AS h2
FROM
  UNNEST(ARRAY[1, 2, 3, 4, 5, 6, 7, 8]) as x_1;


The following table is stored at Q1 variable.

|   | col0 | h1     | h2     |
|---|------|--------|--------|
| 0 | 1    | pear   | pear   |
| 1 | 2    | orange | orange |
| 2 | 3    | apple  | apple  |
| 3 | 4    | banana | banana |
| 4 | 5    | orange | orange |
| 5 | 6    | orange | orange |
| 6 | 7    | orange | orange |
| 7 | 8    | pear   | pear   |
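
In `Q1`, adding `"pineapple"` to the options did not change any of the eight assignments. This reflects the stability of `AffinityHash`. Taking the `ArgMin` over a larger option list can only keep the previous winner or pick the newly added option. A small Python sketch of the same idea makes the property explicit. It uses SHA-256 in place of `FarmFingerprint`, and `affinity_hash` is an illustrative name.

```python
import hashlib


def float_hash(key: str) -> float:
    """Uniform value in [0, 1); SHA-256 stands in for FarmFingerprint."""
    digest = hashlib.sha256(f"float_hash:{key}".encode()).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def affinity_hash(x, options):
    """Pick the option with the smallest per-(x, option) hash, like AffinityHash."""
    return min(options, key=lambda option: float_hash(f"affinity:{x}{option}"))


FRUITS = ["apple", "orange", "banana", "pear"]
moved = 0
for x in range(1, 1001):
    before = affinity_hash(x, FRUITS)
    after = affinity_hash(x, FRUITS + ["pineapple"])
    # An existing key either keeps its option or moves to the new one.
    assert after in (before, "pineapple")
    moved += after != before
# About one key in five is expected to move to the new option.
print(f"{moved} of 1000 keys moved to pineapple")
```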


Running Q2


The following query is stored at Q2_sql variable.
CREATE TEMP FUNCTION hashing_WeightedAffinityHash(x ANY TYPE, weighted_options ANY TYPE) AS ((SELECT
  ARRAY_AGG((STRUCT(x_13.name AS arg, ((LOG(((ABS(FARM_FINGERPRINT(CONCAT("float_hash:", CAST(CONCAT(CONCAT("affinity:", CAST(x AS STRING)), CAST(x_13.name AS STRING)) AS STRING))))) / (9223372036854775807)))) / (x_13.weight)) as value)).arg order by (STRUCT(x_13.name AS arg, ((LOG(((ABS(FARM_FINGERPRINT(CONCAT("float_hash:", CAST(CONCAT(CONCAT("affinity:", CAST(x AS STRING)), CAST(x_13.name AS STRING)) AS STRING))))) / (9223372036854775807)))) / (x_13.weight)) as value)).value desc limit 1)[OFFSET(0)] AS logica_value
FROM
  UNNEST(weighted_options) as x_13));

SELECT
  x_1 AS col0,
  hashing_WeightedAffinityHash(x_1, ARRAY[STRUCT("apple" AS name, 1.0 AS weight), STRUCT("orange" AS name, 2.0 AS weight), STRUCT("banana" AS name, 0.5 AS weight)]) AS h1,
  hashing_WeightedAffinityHash(x_1, ARRAY[STRUCT("apple" AS name, 1.0 AS weight

The following table is stored at Q2 variable.

|   | col0 | h1     | h2     |
|---|------|--------|--------|
| 0 | 1    | orange | orange |
| 1 | 2    | orange | orange |
| 2 | 3    | orange | banana |
| 3 | 4    | apple  | apple  |
| 4 | 5    | apple  | apple  |
| 5 | 6    | banana | banana |
| 6 | 7    | apple  | apple  |
| 7 | 8    | orange | orange |
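
`WeightedAffinityHash` takes the `ArgMax` of `Log(u) / weight`. Since `-Log(u)` is exponentially distributed with rate 1, `-Log(u) / weight` is exponential with rate `weight`. The smallest of independent exponentials belongs to option $i$ with probability $w_i / \sum_j w_j$. In `Q2`, raising banana's weight from 0.5 to 1.0 moved key `x = 3` from orange to banana. The Python sketch below checks the proportions empirically. It uses SHA-256 in place of `FarmFingerprint`, and the names are illustrative.

```python
import hashlib
import math
from collections import Counter


def float_hash(key: str) -> float:
    """Uniform value in [0, 1); SHA-256 stands in for FarmFingerprint."""
    digest = hashlib.sha256(f"float_hash:{key}".encode()).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def weighted_affinity_hash(x, weighted_options):
    """ArgMax over options of log(u) / weight, like WeightedAffinityHash."""
    def score(option):
        u = float_hash(f"affinity:{x}{option['name']}")
        # A zero hash (log undefined) is treated as the worst possible score.
        return math.log(u) / option["weight"] if u > 0 else -math.inf
    return max(weighted_options, key=score)["name"]


OPTIONS = [
    {"name": "apple", "weight": 1.0},
    {"name": "orange", "weight": 2.0},
    {"name": "banana", "weight": 0.5},
]
counts = Counter(weighted_affinity_hash(x, OPTIONS) for x in range(30_000))
total_weight = sum(o["weight"] for o in OPTIONS)
for o in OPTIONS:
    # Observed share of keys next to the share predicted by the weights.
    print(o["name"], counts[o["name"]] / 30_000, o["weight"] / total_weight)
```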


Running TotalProbOfLL


The following query is stored at TotalProbOfLL_sql variable.
SELECT
  SUM(((20) * (((((EXP(- ((20) * (((x_1) / (300000)))))) / (((1) - (EXP(- 20)))))) / (300000))))) AS logica_value
FROM
  UNNEST(GENERATE_ARRAY(0, 300000 - 1)) as x_1;


The following table is stored at TotalProbOfLL variable.

|   | logica_value |
|---|--------------|
| 0 | 1.000033     |
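
`LiquidHashProb(x)` evaluates the truncated exponential density at the left edge of register $x$ and multiplies it by the register width $1/m$. `TotalProbOfLL` is therefore a left Riemann sum of a decreasing function, so it slightly exceeds 1. The sum is geometric with ratio $e^{-a/m}$ and has a closed form:

$$
\sum_{x=0}^{m-1} \frac{a}{m}\,\frac{e^{-a x/m}}{1-e^{-a}}
= \frac{a}{m}\,\frac{1}{1-e^{-a}}\cdot\frac{1-e^{-a}}{1-e^{-a/m}}
= \frac{a/m}{1-e^{-a/m}}
\approx 1 + \frac{a}{2m}.
$$

With $a=20$ and $m=300000$ this gives $1 + 20/600000 \approx 1.0000333$, which matches the 1.000033 returned by the query.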


## Building deep sketch

In [None]:
%%logica RequiredAntidepth, DeepSketch, DeepSketchSize

import hashing.LiquidHash;
import hashing.ConfigLL;

Universe(x) :- util.hundred_million(x);

@OrderBy(UniverseSketch, "col0");
UniverseSketch(LiquidHash(x)) distinct :-
  Universe(x);

@OrderBy(Zeros, "col0");
Zeros(zero_position) :-
  ConfigLL(m:),
  zero_position in Range(m),
  ~(zero_position == LiquidHash(x), Universe(x));

@Ground(RequiredAntidepth);
RequiredAntidepth() Min= zero_position :- Zeros(zero_position);

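@Ground(DeepSketch);
@OrderBy(DeepSketch, "register");
# Registers at or above the first zero register (the fringe) keep the ids of
# their virtual people; registers of the saturated area below it keep only
# the placeholder -1, which marks them as occupied.
DeepSketch(register:, elements? Set= element) distinct :-
  Universe(x),
  register == LiquidHash(x),
  ConfigLL(m:),
  element == (
      if (register >= RequiredAntidepth()) then
        x
      else
        -1
  );

# Number of virtual people stored in the deep sketch.
DeepSketchSize() += 1 :- DeepSketch(elements:), e in elements, e >= 0;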


Running RequiredAntidepth


The following query is stored at RequiredAntidepth_sql variable.
WITH t_0_Zeros AS (SELECT
  x_4 AS col0
FROM
  UNNEST(GENERATE_ARRAY(0, 300000 - 1)) as x_4
WHERE
  ((SELECT
    MIN(1) AS logica_value
  FROM
    util.hundred_million AS util_hundred_million
  WHERE
    x_4 = CAST(((((1) - (((LOG(((EXP(20)) + (((((ABS(FARM_FINGERPRINT(CONCAT("float_hash:", CAST(util_hundred_million.col0 AS STRING))))) / (9223372036854775807))) * (((1) - (EXP(20))))))))) / (20))))) * (300000)) AS INT64)) IS NULL) ORDER BY col0)
SELECT
  MIN(Zeros.col0) AS logica_value
FROM
  t_0_Zeros AS Zeros;


The following table is stored at RequiredAntidepth variable.

|   | logica_value |
|---|--------------|
| 0 | 100061       |


Running DeepSketch


The following query is stored at DeepSketch_sql variable.
DROP TABLE IF EXISTS logica_test.RequiredAntidepth;
CREATE TABLE logica_test.RequiredAntidepth AS WITH t_2_Zeros AS (SELECT
  x_27 AS col0
FROM
  UNNEST(GENERATE_ARRAY(0, 300000 - 1)) as x_27
WHERE
  ((SELECT
    MIN(1) AS logica_value
  FROM
    util.hundred_million AS t_6_util_hundred_million
  WHERE
    x_27 = CAST(((((1) - (((LOG(((EXP(20)) + (((((ABS(FARM_FINGERPRINT(CONCAT("float_hash:", CAST(t_6_util_hundred_million.col0 AS STRING))))) / (9223372036854775807))) * (((1) - (EXP(20))))))))) / (20))))) * (300000)) AS INT64)) IS NULL) ORDER BY col0)
SELECT
  MIN(Zeros.col0) AS logica_value
FROM
  t_2_Zeros AS Zeros;

-- Interacting with table logica_test.RequiredAntidepth

SELECT
  CAST(((((1) - (((LOG(((EXP(20)) + (((((ABS(FARM_FINGERPRINT(CONCAT("float_hash:", CAST(util_hundred_million.col0 AS STRING))))) / (9223372036854775807))) * (((1) - (EXP(20))))))))) / (20))))) * (300000)) AS INT64) AS register,
  ARRAY_AGG(DI

The following table is stored at DeepSketch variable.

|        | register | elements   |
|--------|----------|------------|
| 0      | 0        | [-1]       |
| 1      | 1        | [-1]       |
| 2      | 2        | [-1]       |
| 3      | 3        | [-1]       |
| 4      | 4        | [-1]       |
| ...    | ...      | ...        |
| 140827 | 253408   | [49473541] |
| 140828 | 255198   | [66146163] |
| 140829 | 259178   | [75903492] |
| 140830 | 265122   | [13904701] |


Running DeepSketchSize


The following query is stored at DeepSketchSize_sql variable.
DROP TABLE IF EXISTS logica_test.RequiredAntidepth;
CREATE TABLE logica_test.RequiredAntidepth AS WITH t_2_Zeros AS (SELECT
  x_29 AS col0
FROM
  UNNEST(GENERATE_ARRAY(0, 300000 - 1)) as x_29
WHERE
  ((SELECT
    MIN(1) AS logica_value
  FROM
    util.hundred_million AS t_6_util_hundred_million
  WHERE
    x_29 = CAST(((((1) - (((LOG(((EXP(20)) + (((((ABS(FARM_FINGERPRINT(CONCAT("float_hash:", CAST(t_6_util_hundred_million.col0 AS STRING))))) / (9223372036854775807))) * (((1) - (EXP(20))))))))) / (20))))) * (300000)) AS INT64)) IS NULL) ORDER BY col0)
SELECT
  MIN(Zeros.col0) AS logica_value
FROM
  t_2_Zeros AS Zeros;

-- Interacting with table logica_test.RequiredAntidepth

DROP TABLE IF EXISTS logica_test.DeepSketch;
CREATE TABLE logica_test.DeepSketch AS SELECT
  CAST(((((1) - (((LOG(((EXP(20)) + (((((ABS(FARM_FINGERPRINT(CONCAT("float_hash:", CAST(util_hundred_million.col0 AS STRING))))) / (9223372036854775807)))

The following table is stored at DeepSketchSize variable.

|   | logica_value |
|---|--------------|
| 0 | 126907       |
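
The size of the deep sketch can be sanity-checked analytically. Under the truncated exponential used by `TruncatedExpHash`, a virtual person lands at or above register $k$ with probability $(e^{-ak/m} - e^{-a})/(1-e^{-a})$. With $N = 100$M people and $k$ = `RequiredAntidepth` = 100061, the expected deep sketch size is about 126.7K, close to the 126,907 observed. The sketch below is only an approximation: it ignores the rounding done by `ToInt64` and treats the antidepth as fixed rather than random. The function name is illustrative.

```python
import math


def expected_deep_sketch_size(num_people: int, antidepth: int,
                              a: float, m: int) -> float:
    """Expected number of people whose LiquidLegions register is >= antidepth."""
    s = antidepth / m
    # Tail of the truncated exponential CDF on [0, 1) at s.
    tail = (math.exp(-a * s) - math.exp(-a)) / (1 - math.exp(-a))
    return num_people * tail


expected = expected_deep_sketch_size(100_000_000, 100_061, a=20, m=300_000)
print(f"expected deep sketch size: {expected:,.0f}")
```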


In [None]:
%%logica HeavyElementCount

@Ground(DeepSketch);

@Ground(HeavyElement);
HeavyElement(register:, element: e) :-
  DeepSketch(register:, elements:), e in elements, e >= 0;

HeavyElementCount() += 1 :- HeavyElement();


Running HeavyElementCount


The following query is stored at HeavyElementCount_sql variable.
-- Interacting with table logica_test.DeepSketch

DROP TABLE IF EXISTS logica_test.HeavyElement;
CREATE TABLE logica_test.HeavyElement AS SELECT
  DeepSketch.register AS register,
  x_4 AS element
FROM
  logica_test.DeepSketch AS DeepSketch, UNNEST(DeepSketch.elements) as x_4
WHERE
  (x_4 >= 0);

-- Interacting with table logica_test.HeavyElement

SELECT
  SUM(1) AS logica_value
FROM
  logica_test.HeavyElement AS HeavyElement;


The following table is stored at HeavyElementCount variable.

|   | logica_value |
|---|--------------|
| 0 | 126907       |


## Deep sketch sampling

### Sampling high registers

In [None]:
%%logica ShowPanelistOfHeavyElement
 
@Ground(HeavyElement);

import hashing.RangeHash;
PanelList() = Range(10000);

@OrderBy(PanelistOfHeavyElement, "register");
@Ground(PanelistOfHeavyElement);
PanelistOfHeavyElement(register:, element:, panelist:) :-
  HeavyElement(register:, element:),
  # To speed up the simulation we take a computational shortcut here and use
  # RangeHash, while in production weighted affinity hashing needs to be used.
  # BigQuery is not tuned for computations like this one, where we start from
  # a relatively small input (300k registers + 10k panelists) but then do a
  # double-loop calculation (300k registers x 10k panelists).
  # At this magnitude the calculation can be implemented in C++ and run in
  # reasonable time on one machine, or it can be parallelized to bring the
  # computation time down to seconds.
  # Note that this mapping needs to be done once a day, so its contribution
  # to the overall cost is negligible.
  # When the weights of the panelists are equal and constant, weighted
  # affinity hashing is statistically equivalent to RangeHash: both sample
  # from the uniform distribution over panelists. Affinity hashing
  # additionally handles non-uniform weights and minimizes the impact of
  # changes in the set of panelists.
  panelist == RangeHash(element, num: 10000);

ShowPanelistOfHeavyElement(..r) :- PanelistOfHeavyElement(..r);


Running ShowPanelistOfHeavyElement


The following query is stored at ShowPanelistOfHeavyElement_sql variable.
-- Interacting with table logica_test.HeavyElement

DROP TABLE IF EXISTS logica_test.PanelistOfHeavyElement;
CREATE TABLE logica_test.PanelistOfHeavyElement AS SELECT
  HeavyElement.register AS register,
  HeavyElement.element AS element,
  MOD(ABS(FARM_FINGERPRINT(CONCAT("positive_hash:", CAST(HeavyElement.element AS STRING)))), 10000) AS panelist
FROM
  logica_test.HeavyElement AS HeavyElement ORDER BY register;

-- Interacting with table logica_test.PanelistOfHeavyElement

SELECT
  PanelistOfHeavyElement.*
FROM
  logica_test.PanelistOfHeavyElement AS PanelistOfHeavyElement;


The following table is stored at ShowPanelistOfHeavyElement variable.

|        | register | element  | panelist |
|--------|----------|----------|----------|
| 0      | 100062   | 98677268 | 2888     |
| 1      | 100062   | 67009189 | 8925     |
| 2      | 100062   | 81111575 | 6568     |
| 3      | 100062   | 47319962 | 5518     |
| 4      | 100062   | 10424832 | 2191     |
| ...    | ...      | ...      | ...      |
| 126902 | 253408   | 49473541 | 9097     |
| 126903 | 255198   | 66146163 | 9785     |
| 126904 | 259178   | 75903492 | 8534     |
| 126905 | 265122   | 13904701 | 6069     |


In [None]:
%%logica PanelistToHighRegisters

@Ground(PanelistOfHeavyElement);

PanelistToHighRegisters(panelist:, registers? Set= register) distinct :-
  PanelistOfHeavyElement(panelist:, register:);



Running PanelistToHighRegisters


The following query is stored at PanelistToHighRegisters_sql variable.
-- Interacting with table logica_test.PanelistOfHeavyElement

SELECT
  PanelistOfHeavyElement.panelist AS panelist,
  ARRAY_AGG(DISTINCT PanelistOfHeavyElement.register) AS registers
FROM
  logica_test.PanelistOfHeavyElement AS PanelistOfHeavyElement
GROUP BY panelist;


The following table is stored in the `PanelistToHighRegisters` variable.


| | panelist | registers |
|---|---|---|
| 0 | 2888 | [100062, 100204, 101519, 101591, 101657, 10234... |
| 1 | 8925 | [100062, 106587, 110737, 112158, 112639, 11413... |
| 2 | 6568 | [100062, 100777, 102473, 103158, 106995, 10797... |
| 3 | 5518 | [100062, 100229, 100612, 100853, 104250, 10484... |
| 4 | 2191 | [100062, 102749, 102781, 102816, 103472, 10459... |
| ... | ... | ... |
| 9995 | 5171 | [113822, 114126, 116008, 117018, 121591, 12335... |
| 9996 | 6515 | [114338, 116573, 118263, 123871, 126462, 12860... |
| 9997 | 5235 | [114540, 124728] |
| 9998 | 6615 | [115948, 117787, 119916, 124051, 133651, 147058] |
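The `PanelistToHighRegisters` rule is a group-by that collects the distinct high registers of each panelist. A minimal Python sketch of the same grouping follows; the function name and the toy rows are hypothetical, and the lists are sorted only for readability (BigQuery's `ARRAY_AGG` does not guarantee an order).

```python
from collections import defaultdict


def panelist_to_high_registers(pairs):
    """Group (panelist, register) pairs into panelist -> sorted distinct registers."""
    grouped = defaultdict(set)
    for panelist, register in pairs:
        grouped[panelist].add(register)  # the set drops duplicates, like DISTINCT
    return {panelist: sorted(registers) for panelist, registers in grouped.items()}


# Hypothetical toy rows (panelist, register); not the notebook's data.
rows = [(1, 100), (2, 100), (1, 205), (1, 100), (3, 150)]
result = panelist_to_high_registers(rows)
print(result)
```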


### Sampling low registers

In [None]:
%%logica NumPairs

import hashing.LiquidHashProb;
import hashing.FloatHash;

@Ground(RequiredAntidepth);
@Ground(PanelistOfLowRegister);
PanelistOfLowRegister(register:, panelist:) :-
  # Using a large dataset as input is a trick that hints to BigQuery that it
  # needs to use many machines.
  util.ten_billion(n),
  register == Div(n, 10000),
  panelist == Mod(n, 10000),
  register < RequiredAntidepth(),
  prob == LiquidHashProb(register),
  hit_prob == 1 - (1 - prob) ^ 10000,
  FloatHash("low-register:" ++
            ToString(register) ++
            "-" ++ ToString(panelist)) < hit_prob;

NumPairs() += 1 :- PanelistOfLowRegister();


The following query is stored in the `NumPairs_sql` variable.
-- Interacting with table logica_test.RequiredAntidepth

DROP TABLE IF EXISTS logica_test.PanelistOfLowRegister;
CREATE TABLE logica_test.PanelistOfLowRegister AS SELECT
  DIV(util_ten_billion.col0, 10000) AS register,
  MOD(util_ten_billion.col0, 10000) AS panelist
FROM
  util.ten_billion AS util_ten_billion, logica_test.RequiredAntidepth AS RequiredAntidepth
WHERE
  (DIV(util_ten_billion.col0, 10000) < RequiredAntidepth.logica_value) AND
  (((ABS(FARM_FINGERPRINT(CONCAT("float_hash:", CAST(CONCAT(CONCAT(CONCAT("low-register:", CAST(DIV(util_ten_billion.col0, 10000) AS STRING)), "-"), CAST(MOD(util_ten_billion.col0, 10000) AS STRING)) AS STRING))))) / (9223372036854775807)) < ((1) - ((POW(((1) - (((20) * (((((EXP(- ((20) * (((DIV(util_ten_billion.col0, 10000)) / (300000)))))) / (((1) - (EXP(- 20)))))) / (300000)))))), 10000)))));

-- Interacting with table logica_test.PanelistOfLowRegister

SELECT
  SUM(1) AS logica_value


The following table is stored in the `NumPairs` variable.


| | logica_value |
|---|---|
| 0 | 85406358 |
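The `PanelistOfLowRegister` rule keeps a (register, panelist) pair with probability `1 - (1 - p)^10000`, where `p = LiquidHashProb(register)`. Reading the generated SQL, `LiquidHashProb(r)` evaluates to `a * exp(-a * r / m) / ((1 - exp(-a)) * m)` with `a = 20` and `m = 300000`, and `FloatHash(s)` is `ABS(FARM_FINGERPRINT(CONCAT("float_hash:", s))) / 9223372036854775807`. Our reading of the exponent is that each of the 10,000 panelists stands for 10,000 people of the 100,000,000-element simulated universe, so `hit_prob` is the chance that at least one of them lands in the register. The Python sketch below reproduces this under those assumptions; FarmHash is not in the Python standard library, so SHA-256 is substituted, and all function names are hypothetical.

```python
import hashlib
import math

A = 20.0                      # decay rate, as in the generated SQL
M = 300_000                   # number of registers, as in the generated SQL
PEOPLE_PER_PANELIST = 10_000  # exponent used for hit_prob in the Logica rule
NUM_PANELISTS = 10_000


def float_hash(s):
    """Uniform float in [0, 1); SHA-256 stands in for FARM_FINGERPRINT (assumption)."""
    digest = hashlib.sha256(("float_hash:" + s).encode()).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def liquid_hash_prob(register, a=A, m=M):
    """Probability that a single element lands in `register`."""
    return a * math.exp(-a * register / m) / ((1 - math.exp(-a)) * m)


def hit_prob(register, k=PEOPLE_PER_PANELIST):
    """Probability that at least one of k elements lands in `register`."""
    return 1 - (1 - liquid_hash_prob(register)) ** k


def panelists_of_low_register(register, num_panelists=NUM_PANELISTS):
    """Panelists whose sampled sketch gets `register` activated."""
    threshold = hit_prob(register)
    return [p for p in range(num_panelists)
            if float_hash(f"low-register:{register}-{p}") < threshold]


print(hit_prob(0), len(panelists_of_low_register(0)))
```

The hit probability decays exponentially with the register index, so the lowest registers are shared by a large fraction of the panel.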


### Combining low and high sketches

In [None]:
%%logica Save

@Ground(PanelistOfLowRegister);
@Ground(PanelistOfHeavyElement);

@Ground(PanelistToRegister);
PanelistToRegister(panelist:, register:) :-
  PanelistOfLowRegister(panelist:, register:) |
  PanelistOfHeavyElement(panelist:, register:);

Save() += 1 :- PanelistToRegister();

The following query is stored in the `Save_sql` variable.
-- Interacting with table logica_test.PanelistOfLowRegister

-- Interacting with table logica_test.PanelistOfHeavyElement

DROP TABLE IF EXISTS logica_test.PanelistToRegister;
CREATE TABLE logica_test.PanelistToRegister AS SELECT * FROM (
  
    SELECT
      PanelistOfLowRegister.panelist AS panelist,
      PanelistOfLowRegister.register AS register
    FROM
      logica_test.PanelistOfLowRegister AS PanelistOfLowRegister
   UNION ALL
  
    SELECT
      PanelistOfHeavyElement.panelist AS panelist,
      PanelistOfHeavyElement.register AS register
    FROM
      logica_test.PanelistOfHeavyElement AS PanelistOfHeavyElement
  
) AS UNUSED_TABLE_NAME  ;

-- Interacting with table logica_test.PanelistToRegister

SELECT
  SUM(1) AS logica_value
FROM
  logica_test.PanelistToRegister AS PanelistToRegister;


The following table is stored in the `Save` variable.


| | logica_value |
|---|---|
| 0 | 85533265 |


In [None]:
%%logica Q

@Ground(PanelistToRegister);

# Looking at a sketch of one panelist.
Q(register) :-
  PanelistToRegister(panelist: 10, register:);

The following query is stored in the `Q_sql` variable.
-- Interacting with table logica_test.PanelistToRegister

SELECT
  PanelistToRegister.register AS col0
FROM
  logica_test.PanelistToRegister AS PanelistToRegister
WHERE
  PanelistToRegister.panelist = 10;


The following table is stored in the `Q` variable.


| | col0 |
|---|---|
| 0 | 8376 |
| 1 | 47664 |
| 2 | 5860 |
| 3 | 6742 |
| 4 | 53418 |
| ... | ... |
| 8523 | 26363 |
| 8524 | 1395 |
| 8525 | 73061 |
| 8526 | 16105 |


## Simulation

### Simulating TV parts of campaigns

In [None]:
%%logica RegisterCount

import hashing.FloatHash;
@Ground(PanelistToRegister);

@Ground(Audience);
Audience(audience:, panelist:) :-
  panelist in Range(10000),
  audience in ["a05", "a3", "a5", "a8"],
  fraction == (
      if audience == "a05" then
        0.05
      else if audience == "a3" then
        0.3
      else if audience == "a5" then
        0.5
      else if audience == "a8" then
        0.8
      else
        Error("unknown audience")
  ),
  FloatHash("audience" ++ ToString(panelist)) < fraction;

@Ground(AudienceSketch);
AudienceSketch(audience:, register:) distinct :-
  Audience(audience:, panelist:),
  PanelistToRegister(panelist:, register:);

RegisterCount(audience:, register_count? += 1) distinct :-
  AudienceSketch(audience:)

The following query is stored in the `RegisterCount_sql` variable.
DROP TABLE IF EXISTS logica_test.Audience;
CREATE TABLE logica_test.Audience AS SELECT
  x_12 AS audience,
  x_11 AS panelist
FROM
  UNNEST(GENERATE_ARRAY(0, 10000 - 1)) as x_11, UNNEST(ARRAY["a05", "a3", "a5", "a8"]) as x_12
WHERE
  (((ABS(FARM_FINGERPRINT(CONCAT("float_hash:", CAST(CONCAT("audience", CAST(x_11 AS STRING)) AS STRING))))) / (9223372036854775807)) < CASE WHEN x_12 = "a05" THEN 0.05 WHEN x_12 = "a3" THEN 0.3 WHEN x_12 = "a5" THEN 0.5 WHEN x_12 = "a8" THEN 0.8 ELSE ERROR("unknown audience") END);

-- Interacting with table logica_test.Audience

-- Interacting with table logica_test.PanelistToRegister

DROP TABLE IF EXISTS logica_test.AudienceSketch;
CREATE TABLE logica_test.AudienceSketch AS SELECT
  Audience.audience AS audience,
  PanelistToRegister.register AS register
FROM
  logica_test.Audience AS Audience, logica_test.PanelistToRegister AS PanelistToRegister
WHERE
  PanelistToRegister.panelist = Au

The following table is stored in the `RegisterCount` variable.


| | audience | register_count |
|---|---|---|
| 0 | a3 | 122507 |
| 1 | a5 | 130114 |
| 2 | a8 | 137327 |
| 3 | a05 | 95873 |
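The `Audience` rule draws each TV audience by thresholding a single hash of the panelist id. The hash key does not include the audience name, so the audiences are nested (`a05` ⊆ `a3` ⊆ `a5` ⊆ `a8`). `AudienceSketch` is then the distinct union of the members' registers. A Python sketch of these two steps follows, with SHA-256 standing in for FARM_FINGERPRINT and a hypothetical toy panelist-to-registers map in place of `PanelistToRegister`; the function names are illustrative.

```python
import hashlib

FRACTIONS = {"a05": 0.05, "a3": 0.3, "a5": 0.5, "a8": 0.8}
NUM_PANELISTS = 10_000


def float_hash(s):
    """Uniform float in [0, 1); SHA-256 stands in for FARM_FINGERPRINT (assumption)."""
    digest = hashlib.sha256(("float_hash:" + s).encode()).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def audience(name, num_panelists=NUM_PANELISTS):
    """Panelists in a TV audience; the hash key ignores `name`, so audiences nest."""
    return {p for p in range(num_panelists)
            if float_hash("audience" + str(p)) < FRACTIONS[name]}


def audience_sketch(members, panelist_to_registers):
    """Distinct union of the registers of all audience members."""
    sketch = set()
    for p in members:
        sketch.update(panelist_to_registers.get(p, ()))
    return sketch


# Hypothetical toy map standing in for PanelistToRegister.
toy_registers = {p: {p % 5000, 10_000 + p % 97} for p in range(NUM_PANELISTS)}
audiences = {name: audience(name) for name in FRACTIONS}
register_count = {name: len(audience_sketch(members, toy_registers))
                  for name, members in audiences.items()}
print(register_count)
```

Because the audiences are nested, their sketches are nested too, so register counts can only grow with the audience fraction.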


In [None]:
%%logica AudienceNumRegisters

@Ground(AudienceSketch);
AudienceNumRegisters() += 1 :- AudienceSketch()

The following query is stored in the `AudienceNumRegisters_sql` variable.
-- Interacting with table logica_test.AudienceSketch

SELECT
  SUM(1) AS logica_value
FROM
  logica_test.AudienceSketch AS AudienceSketch;


The following table is stored in the `AudienceNumRegisters` variable.


| | logica_value |
|---|---|
| 0 | 485821 |


In [None]:
l = liquid_legions.LiquidLegions(20, 300000, 0)

RegisterCount['estimate'] = RegisterCount['register_count'].map(
    l.get_cardinality_for_legionaries_count)

RegisterCount



| | audience | register_count | estimate |
|---|---|---|---|
| 0 | a3 | 122507 | 29672380.0 |
| 1 | a5 | 130114 | 49271840.0 |
| 2 | a8 | 137327 | 79696330.0 |
| 3 | a05 | 95873 | 5025944.0 |
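The `estimate` column comes from `liquid_legions.LiquidLegions(20, 300000, 0).get_cardinality_for_legionaries_count`, whose implementation is not shown in this notebook. Assuming the first two constructor arguments are the decay rate `a = 20` and the register count `m = 300000` (the constants that appear in the generated SQL), one way to obtain such an estimate is to invert the expected number of active registers, `E[active](n) = m * (1 - ∫₀¹ exp(-c * exp(-a*x)) dx)` with `c = n*a / (m*(1 - exp(-a)))`. The sketch below evaluates that integral with Simpson's rule and inverts it by bisection. It is a hypothetical re-derivation with illustrative names, not the library's code; the integral also has a closed form in terms of the exponential integral Ei (available as `scipy.special.expi`).

```python
import math

A = 20.0      # decay rate (assumed first argument of LiquidLegions)
M = 300_000   # number of registers (assumed second argument)


def expected_active_registers(n, a=A, m=M, intervals=2000):
    """Expected number of non-empty registers after inserting n distinct elements.

    Continuous approximation: a register at relative position x in [0, 1) receives
    a given element with probability a*exp(-a*x) / (m*(1 - exp(-a))), so it stays
    empty with probability about exp(-c*exp(-a*x)), c = n*a / (m*(1 - exp(-a))).
    """
    c = n * a / (m * -math.expm1(-a))
    h = 1.0 / intervals  # `intervals` must be even for Simpson's rule
    total = 0.0
    for k in range(intervals + 1):
        weight = 1 if k in (0, intervals) else (4 if k % 2 else 2)
        total += weight * math.exp(-c * math.exp(-a * k * h))
    empty_fraction = total * h / 3
    return m * (1.0 - empty_fraction)


def estimate_cardinality(active, a=A, m=M, iterations=60):
    """Invert expected_active_registers by bisection on n."""
    if not 0 <= active < m:
        raise ValueError("active register count must be in [0, m)")
    lo, hi = 0.0, 1.0
    while expected_active_registers(hi, a, m) < active:
        hi *= 2  # grow the bracket until it contains the answer
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if expected_active_registers(mid, a, m) < active:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(round(estimate_cardinality(122_507)))
```

As a sanity check, `a3` covers 30% of the panel, i.e. about 3·10^7 of the 10^8 simulated people, and the table reports 29,672,380.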


### Simulating digital parts of campaigns

In [None]:
%%logica DigitalRegisterCount

import hashing.FloatHash;
import hashing.LiquidHash;

Universe(x) :- util.hundred_million(x);

@Ground(DigitalAudience);
DigitalAudience(audience:, element:) :-
  audience in ["d05", "d3", "d5", "d8"],
  Universe(element),
  fraction == (
      if audience == "d05" then
        0.05
      else if audience == "d3" then
        0.3
      else if audience == "d5" then
        0.5
      else if audience == "d8" then
        0.8
      else
        Error("unknown audience")
  ),
  FloatHash("audience" ++ ToString(element)) < fraction;

@Ground(DigitalAudienceSketch);
DigitalAudienceSketch(audience:, register:) distinct :-
  DigitalAudience(audience:, element:),
  register == LiquidHash(element);

DigitalRegisterCount(audience:, register_count? += 1) distinct :-
  DigitalAudienceSketch(audience:)

The following query is stored in the `DigitalRegisterCount_sql` variable.
DROP TABLE IF EXISTS logica_test.DigitalAudience;
CREATE TABLE logica_test.DigitalAudience AS SELECT
  x_25 AS audience,
  util_hundred_million.col0 AS element
FROM
  util.hundred_million AS util_hundred_million, UNNEST(ARRAY["d05", "d3", "d5", "d8"]) as x_25
WHERE
  (((ABS(FARM_FINGERPRINT(CONCAT("float_hash:", CAST(CONCAT("audience", CAST(util_hundred_million.col0 AS STRING)) AS STRING))))) / (9223372036854775807)) < CASE WHEN x_25 = "d05" THEN 0.05 WHEN x_25 = "d3" THEN 0.3 WHEN x_25 = "d5" THEN 0.5 WHEN x_25 = "d8" THEN 0.8 ELSE ERROR("unknown audience") END);

-- Interacting with table logica_test.DigitalAudience

DROP TABLE IF EXISTS logica_test.DigitalAudienceSketch;
CREATE TABLE logica_test.DigitalAudienceSketch AS SELECT
  DigitalAudience.audience AS audience,
  CAST(((((1) - (((LOG(((EXP(20)) + (((((ABS(FARM_FINGERPRINT(CONCAT("float_hash:", CAST(DigitalAudience.element AS STRING))))) / (9223372036854

The following table is stored in the `DigitalRegisterCount` variable.


| | audience | register_count |
|---|---|---|
| 0 | d3 | 122730 |
| 1 | d5 | 130350 |
| 2 | d8 | 137479 |
| 3 | d05 | 95747 |
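`DigitalAudienceSketch` assigns each element a register with `LiquidHash`. The generated SQL is cut off, but its visible prefix, `1 - LOG(EXP(20) + FloatHash(element)...`, is consistent with inverse-CDF sampling of an exponential distribution with rate `a = 20` truncated to [0, 1), scaled by `m = 300000` to a register index. The Python sketch below follows that reading; the rest of the formula and the final rounding (flooring here) are our assumptions, SHA-256 stands in for FARM_FINGERPRINT, and the function names are illustrative.

```python
import hashlib
import math

A = 20.0      # decay rate
M = 300_000   # number of registers


def float_hash(s):
    """Uniform float in [0, 1); SHA-256 stands in for FARM_FINGERPRINT (assumption)."""
    digest = hashlib.sha256(("float_hash:" + s).encode()).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def liquid_hash(element, a=A, m=M):
    """Register index for `element` via inverse-CDF sampling (assumed full formula)."""
    u = float_hash(str(element))
    # Inverts F(x) = (1 - exp(-a*x)) / (1 - exp(-a)) on [0, 1).
    x = 1 - math.log(math.exp(a) + u * (1 - math.exp(a))) / a
    return min(int(x * m), m - 1)  # flooring is an assumption; clamp guards rounding


registers = [liquid_hash(e) for e in range(20_000)]
share_low = sum(r < M // 10 for r in registers) / len(registers)
print(share_low)
```

Under this model the first tenth of the registers should receive a share of about `(1 - e^-2) / (1 - e^-20) ≈ 0.865` of all elements, which is what makes the high registers sparse enough to store explicitly in a deep sketch.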


In [None]:
l = liquid_legions.LiquidLegions(20, 300000, 0)
DigitalRegisterCount['estimate'] = DigitalRegisterCount['register_count'].map(
    l.get_cardinality_for_legionaries_count)
DigitalRegisterCount


| | audience | register_count | estimate |
|---|---|---|---|
| 0 | d3 | 122730 | 30116810.0 |
| 1 | d5 | 130350 | 50053180.0 |
| 2 | d8 | 137479 | 80508030.0 |
| 3 | d05 | 95747 | 4983903.0 |


### Assembling cross-media campaigns

In [None]:
%%logica XMediaRegisterCount


AllXMediaAudiences() = [
    {name: "x3-5", tv: "a3", digital: "d5"},
    {name: "x8-8", tv: "a8", digital: "d8"},
    {name: "x5-5", tv: "a5", digital: "d5"},
    {name: "x05-3", tv: "a05", digital: "d3"},
    {name: "x5-05", tv: "a5", digital: "d05"},
    {name: "x05-05", tv: "a05", digital: "d05"}
];

@Ground(AudienceSketch);
@Ground(DigitalAudienceSketch);

@Ground(XMediaSketch);
XMediaSketch(audience: audience.name,
             tv_audience: audience.tv,
             digital_audience: audience.digital,
             register:) distinct :-
  AudienceSketch(audience: tv_audience, register:),
  all_xmedia_audiences == AllXMediaAudiences(),
  xmedia_audiences List= (a :- a in all_xmedia_audiences,
                          a.tv == tv_audience),
  audience in xmedia_audiences;

XMediaSketch(audience: audience.name,
             tv_audience: audience.tv,
             digital_audience: audience.digital,
             register:) distinct :-
  DigitalAudienceSketch(audience: digital_audience, register:),
  all_xmedia_audiences == AllXMediaAudiences(),
  xmedia_audiences List= (a :- a in all_xmedia_audiences,
                          a.digital == digital_audience),
  audience in xmedia_audiences;

@OrderBy(XMediaRegisterCount, "audience");
XMediaRegisterCount(audience:, tv_audience:, digital_audience:,
                    register_count? += 1) distinct :-
  XMediaSketch(audience:, tv_audience:, digital_audience:);

The following query is stored in the `XMediaRegisterCount_sql` variable.
-- Interacting with table logica_test.AudienceSketch

-- Interacting with table logica_test.DigitalAudienceSketch

DROP TABLE IF EXISTS logica_test.XMediaSketch;
CREATE TABLE logica_test.XMediaSketch AS WITH t_0_XMediaSketch_MultBodyAggAux AS (SELECT * FROM (
  
    SELECT
      x_19.name AS audience,
      x_19.tv AS tv_audience,
      x_19.digital AS digital_audience,
      AudienceSketch.register AS register
    FROM
      logica_test.AudienceSketch AS AudienceSketch, UNNEST((SELECT
        ARRAY_AGG(x_21) AS logica_value
      FROM
        UNNEST(ARRAY[STRUCT("x3-5" AS name, "a3" AS tv, "d5" AS digital), STRUCT("x8-8" AS name, "a8" AS tv, "d8" AS digital), STRUCT("x5-5" AS name, "a5" AS tv, "d5" AS digital), STRUCT("x05-3" AS name, "a05" AS tv, "d3" AS digital), STRUCT("x5-05" AS name, "a5" AS tv, "d05" AS digital), STRUCT("x05-05" AS name, "a05" AS tv, "d05" AS digital)]) as x_21
      WHERE
        x_21.tv

The following table is stored in the `XMediaRegisterCount` variable.


| | audience | tv_audience | digital_audience | register_count |
|---|---|---|---|---|
| 0 | x05-05 | a05 | d05 | 105936 |
| 1 | x05-3 | a05 | d3 | 124383 |
| 2 | x3-5 | a3 | d5 | 134308 |
| 3 | x5-05 | a5 | d05 | 130880 |
| 4 | x5-5 | a5 | d5 | 136383 |
| 5 | x8-8 | a8 | d8 | 140234 |


## Simulation Results

In [None]:
l = liquid_legions.LiquidLegions(20, 300000, 0)
XMediaRegisterCount['estimate'] = XMediaRegisterCount['register_count'].map(
    l.get_cardinality_for_legionaries_count)
XMediaRegisterCount


| | audience | tv_audience | digital_audience | register_count | estimate |
|---|---|---|---|---|---|
| 0 | x05-05 | a05 | d05 | 105936 | 9830410.0 |
| 1 | x05-3 | a05 | d3 | 124383 | 33625470.0 |
| 2 | x3-5 | a3 | d5 | 134308 | 65167110.0 |
| 3 | x5-05 | a5 | d05 | 130880 | 51853360.0 |
| 4 | x5-5 | a5 | d5 | 136383 | 74835290.0 |
| 5 | x8-8 | a8 | d8 | 140234 | 96739890.0 |


In [None]:
tv_audience_size = {'a3': 0.3, 'a5': 0.5, 'a8': 0.8, 'a05': 0.05}
digital_audience_size = {'d3': 0.3, 'd5': 0.5, 'd8': 0.8, 'd05': 0.05}

for i, r in XMediaRegisterCount.iterrows():
  XMediaRegisterCount.at[i, 'true_reach'] = (
    100000000 * (1 - (1 - tv_audience_size[r.tv_audience]) * (1 - digital_audience_size[r.digital_audience]))
  )

XMediaRegisterCount['rel_error'] = XMediaRegisterCount['estimate'] / XMediaRegisterCount['true_reach'] - 1.0
XMediaRegisterCount

| | audience | tv_audience | digital_audience | register_count | estimate | true_reach | rel_error |
|---|---|---|---|---|---|---|---|
| 0 | x05-05 | a05 | d05 | 105936 | 9830410.0 | 9750000.0 | 0.008247 |
| 1 | x05-3 | a05 | d3 | 124383 | 33625470.0 | 33500000.0 | 0.003745 |
| 2 | x3-5 | a3 | d5 | 134308 | 65167110.0 | 65000000.0 | 0.002571 |
| 3 | x5-05 | a5 | d05 | 130880 | 51853360.0 | 52500000.0 | -0.012317 |
| 4 | x5-5 | a5 | d5 | 136383 | 74835290.0 | 75000000.0 | -0.002196 |
| 5 | x8-8 | a8 | d8 | 140234 | 96739890.0 | 96000000.0 | 0.007707 |


In [None]:
XMediaRegisterCount[['audience', 'estimate', 'true_reach', 'rel_error']]

| | audience | estimate | true_reach | rel_error |
|---|---|---|---|---|
| 0 | x05-05 | 9830410.0 | 9750000.0 | 0.008247 |
| 1 | x05-3 | 33625470.0 | 33500000.0 | 0.003745 |
| 2 | x3-5 | 65167110.0 | 65000000.0 | 0.002571 |
| 3 | x5-05 | 51853360.0 | 52500000.0 | -0.012317 |
| 4 | x5-5 | 74835290.0 | 75000000.0 | -0.002196 |
| 5 | x8-8 | 96739890.0 | 96000000.0 | 0.007707 |
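Under the independence assumption a person is missed by a cross-media campaign only if both the TV part and the digital part miss them, which is what the `true_reach` cell computes with universe size $N = 10^8$ and audience fractions $f$. Writing $\hat{R}$ for the sketch estimate and $\varepsilon$ for `rel_error`, the `x5-05` row (`a5` with $f_{\text{tv}} = 0.5$, `d05` with $f_{\text{digital}} = 0.05$) works out as:

$$
R_{\text{true}} = N\,\bigl(1 - (1 - f_{\text{tv}})(1 - f_{\text{digital}})\bigr) = 10^8\,(1 - 0.5 \cdot 0.95) = 5.25 \times 10^7,
$$

$$
\varepsilon = \frac{\hat{R}}{R_{\text{true}}} - 1 = \frac{51{,}853{,}360}{52{,}500{,}000} - 1 \approx -0.0123.
$$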


To verify that the simulation actually measures the error, here is what happens if we force the sketch to be too shallow:
![bad_depth](DeepLiquidSampling_Files/deep_ll_bad_antidepth.png)

In [None]:
end_of_computation = datetime.datetime.now()
print("Simulation ended at: %s" % end_of_computation)
print('Wall time from start to end of computation: %.3f seconds' % (end_of_computation - start_of_computation).total_seconds())

Screenshot of the run time:
![time](DeepLiquidSampling_Files/deep_ll_runtime.png)