# Example - Scientific Computing - Multiple Players

In this example, we demonstrate how to perform a scientific computation across multiple data owners while keeping the data encrypted throughout the computation.

For simplicity, we run the computation between the different parties locally using `LocalMooseRuntime`. To see how to execute a Moose computation over the network, see the examples in the [reindeer folder](https://github.com/tf-encrypted/moose/tree/main/reindeer).

In [65]:
import numpy as np

from pymoose import edsl
from pymoose.testing import LocalMooseRuntime

np.random.seed(1234)
FIXED = edsl.fixed(24, 40)
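
`FIXED` is the fixed-point type that the inputs are cast to before the encrypted computation. The following is a minimal plain-Python sketch of what such an encoding could look like, assuming the two arguments of `edsl.fixed(24, 40)` are the number of integral and fractional bits. The helpers `encode` and `decode` are hypothetical names used for illustration only; they are not part of pymoose.

```python
# Assumption: edsl.fixed(24, 40) means 24 integral bits and 40 fractional bits.
INTEGRAL_BITS = 24
FRACTIONAL_BITS = 40
SCALE = 2 ** FRACTIONAL_BITS


def encode(x: float) -> int:
    """Hypothetical helper: map a float to its nearest fixed-point integer."""
    return round(x * SCALE)


def decode(n: int) -> float:
    """Hypothetical helper: map a fixed-point integer back to a float."""
    return n / SCALE


value = 11.06803447
encoded = encode(value)
roundtrip_error = abs(decode(encoded) - value)

# Rounding to the nearest multiple of 2**-40 loses at most 2**-41.
print(f"encoded: {encoded}, round-trip error: {roundtrip_error:.3e}")
```

With 40 fractional bits, encoding a single value loses very little precision; larger deviations in a full computation would come from approximating operations such as division and square root on encoded values.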

### Use case

The use case is the following: researchers would like to measure the correlation between alcohol consumption and students' grades. However, the alcohol consumption data and the grades data are owned by the Department of Public Health and the Department of Education, respectively. These datasets are too sensitive to be moved to a central location or to be exposed in raw form to the researchers. To solve this problem, we compute the correlation metric on an encrypted version of these datasets.

### Data

For this demo, we generate synthetic datasets for 100 students. The resulting correlation does not reflect real data; it only illustrates how Moose can be used.

In [64]:
def generate_synthetic_correlated_data(n_samples):
    mu = np.array([10, 0])
    r = np.array([
            [  3.40, -2.75],
            [ -2.75,  5.50],
        ])
    rng = np.random.default_rng(12)
    x = rng.multivariate_normal(mu, r, size=n_samples)
    return x[:, 0], x[:, 1]

alcohol_consumption, grades = generate_synthetic_correlated_data(100)

print(f"Alcohol consumption data from Department of Public Health: {alcohol_consumption[:5]}")
print(f"Grades data from Department of Education: {grades[:5]}")


Alcohol consumption data from Department of Public Health: [11.06803447  9.58819631  6.28498731  9.63183684 11.17578054]


### Define Moose Computation

To measure the correlation between alcohol consumption and students' grades, we compute the [Pearson correlation coefficient](https://en.wikipedia.org/wiki/Pearson_correlation_coefficient).

To express this computation, Moose offers a Python eDSL. As you will notice, the syntax is very similar to that of the scientific computing library [NumPy](https://numpy.org/).

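The main difference is the notion of placements, which specify where each operation runs. A host placement represents a single party that computes on plaintext values, while a replicated placement groups three host placements that jointly compute on secret-shared (encrypted) values.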
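
To give some intuition for how several players can compute on values that none of them sees in the clear, here is a toy sketch of additive secret sharing among three parties. It is a simplified illustration only, not Moose's actual protocol: the modulus and the helper names `share` and `reconstruct` are assumptions made for this example.

```python
import secrets

# Assumption for illustration: shares live in the ring of integers modulo 2**64.
MODULUS = 2 ** 64


def share(value: int, n_parties: int = 3) -> list[int]:
    """Split `value` into random shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def reconstruct(shares: list[int]) -> int:
    """Recombine all shares; any strict subset looks uniformly random."""
    return sum(shares) % MODULUS


a_shares = share(42)
b_shares = share(100)

# Each party adds its own two shares locally, without seeing 42 or 100.
summed_shares = [(a + b) % MODULUS for a, b in zip(a_shares, b_shares)]

print(reconstruct(summed_shares))  # the parties jointly obtain 42 + 100
```

Addition only needs local work on each share. Multiplication, division, and square root require interaction between the players, which is what the replicated placement takes care of in the computation that follows.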

In [73]:
def correlation_computation():
    pub_health_dpt = edsl.host_placement(name="pub_health_dpt")
    education_dpt = edsl.host_placement(name="education_dpt")
    data_scientist = edsl.host_placement(name="data_scientist")
    
    encrypted_governement = edsl.replicated_placement(
        name="encrypted_governement",
        players=[pub_health_dpt, education_dpt, data_scientist],
    )

    def pearson_correlation_coefficient(x, y):
        x_mean = edsl.mean(x, 0)
        y_mean = edsl.mean(y, 0)
        stdv_x = edsl.sum(edsl.square(edsl.sub(x, x_mean)))
        stdv_y = edsl.sum(edsl.square(edsl.sub(y, y_mean)))
        corr_num = edsl.sum(edsl.mul(edsl.sub(x, x_mean), edsl.sub(y, y_mean)))
        corr_denom = edsl.sqrt(edsl.mul(stdv_x, stdv_y))
        return edsl.div(corr_num, corr_denom)

    @edsl.computation
    def moose_comp():

        with pub_health_dpt:
            alcohol = edsl.load("alcohol_data", dtype=edsl.float64)
            alcohol = edsl.cast(alcohol, dtype=FIXED)

        with education_dpt:
            grades = edsl.load("grades_data", dtype=edsl.float64)
            grades = edsl.cast(grades, dtype=FIXED)

        with encrypted_governement:
            correlation = pearson_correlation_coefficient(alcohol, grades)

        with data_scientist:
            correlation = edsl.cast(correlation, dtype=edsl.float64)
            correlation = edsl.save("correlation", correlation)

        return correlation

    return moose_comp
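
For comparison, the steps of `pearson_correlation_coefficient` can be written in plaintext NumPy. This is a sketch for illustration; the function name `pearson_plaintext` and the test data are not part of the original example.

```python
import numpy as np


def pearson_plaintext(x, y):
    """Plaintext NumPy counterpart of the eDSL function, step by step."""
    x_centered = x - np.mean(x, axis=0)
    y_centered = y - np.mean(y, axis=0)
    # Sums of squared deviations (named stdv_x / stdv_y in the eDSL version).
    ss_x = np.sum(np.square(x_centered))
    ss_y = np.sum(np.square(y_centered))
    numerator = np.sum(x_centered * y_centered)
    denominator = np.sqrt(ss_x * ss_y)
    return numerator / denominator


# Illustrative data, unrelated to the synthetic datasets used in this example.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = -0.5 * x + rng.normal(size=100)

# The step-by-step formula agrees with NumPy's built-in correlation.
print(np.isclose(pearson_plaintext(x, y), np.corrcoef(x, y)[0, 1]))
```

The eDSL version follows the same structure, with each NumPy call replaced by its `edsl` counterpart and executed under the replicated placement.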

### Evaluate Computation

In [74]:
computation = correlation_computation()
logical_comp = edsl.trace(computation)

executors_storage = {
    "pub_health_dpt": {"alcohol_data": alcohol_consumption},
    "education_dpt": {"grades_data": grades},
    "data_scientist": {},
}
runtime = LocalMooseRuntime(storage_mapping=executors_storage)

results = runtime.evaluate_computation(
    computation=logical_comp,
    role_assignment={
        "pub_health_dpt": "pub_health_dpt",
        "education_dpt": "education_dpt",
        "data_scientist": "data_scientist",
    },
    arguments={},
)

### Results

The correlation between alcohol consumption and grades, computed on the encrypted data, is:

In [55]:
correlation = runtime.read_value_from_storage("data_scientist", "correlation")
print(f"Correlation: {correlation}")

Correlation: -0.5462326644019413


We can check that the result computed on encrypted data is close to the result computed on plaintext data with NumPy.

In [45]:
# alcohol_consumption and grades are already 1-D NumPy arrays
np.corrcoef(alcohol_consumption, grades)[1, 0]

-0.5481005967856092
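
The two values printed above differ by about 0.002. A small gap of this kind is plausible when the encrypted computation runs in fixed-point arithmetic, although this example does not pin down the exact source. A simple way to make the comparison explicit is a tolerance check; the tolerance of `1e-2` below is an assumption chosen for illustration.

```python
import numpy as np

# Values copied from the outputs printed in this example.
encrypted_result = -0.5462326644019413
plaintext_result = -0.5481005967856092

abs_diff = abs(encrypted_result - plaintext_result)
# Assumed tolerance: agreement to roughly two decimal places.
within_tolerance = np.isclose(encrypted_result, plaintext_result, atol=1e-2)

print(f"Absolute difference: {abs_diff:.6f}")
print(f"Within tolerance: {within_tolerance}")
```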