# Example - Scientific Computing - Multiple Players

In this example, we demonstrate how to perform scientific computation across multiple data owners while keeping the data encrypted throughout the computation.

For simplicity, we will run this computation between the different parties locally using the `LocalMooseRuntime`. To see how to execute a Moose computation over the network, you can find examples in the [reindeer folder](https://github.com/tf-encrypted/moose/tree/main/reindeer).

In [5]:
import numpy as np

from pymoose import edsl
from pymoose.testing import LocalMooseRuntime

np.random.seed(1234)
FIXED = edsl.fixed(24, 40)

### Use case

The use case we are trying to solve is the following: researchers would like to measure the correlation between alcohol consumption and students' grades. However, the alcohol consumption data and the grades data are owned by the Department of Public Health and the Department of Education, respectively. These datasets are too sensitive to be moved to a central location or exposed directly to the researchers. To solve this problem, we want to compute the correlation metric on an encrypted version of these datasets.

### Data

For this demo, we generate synthetic datasets for 100 students. The resulting correlation is therefore not a real-world finding; it only illustrates how Moose can be used.

In [6]:
def generate_synthetic_correlated_data(n_samples):
    mu = np.array([10, 0])
    r = np.array([
            [  3.40, -2.75],
            [ -2.75,  5.50],
        ])
    rng = np.random.default_rng(12)
    x = rng.multivariate_normal(mu, r, size=n_samples)
    return x[:, 0], x[:, 1]

alcohol_consumption, grades = generate_synthetic_correlated_data(100)

print(f"Alcohol consumption data from Department of Public Health: {alcohol_consumption[:5]}")
print(f"Grades data from Department of Education: {grades[:5]}")


Alcohol consumption data from Department of Public Health: [11.06803447  9.58819631  6.28498731  9.63183684 11.17578054]
Grades data from Department of Education: [...]


### Define Moose Computation

To measure the correlation between alcohol consumption and students' grades, we will compute the [Pearson correlation coefficient](https://en.wikipedia.org/wiki/Pearson_correlation_coefficient).
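As a plaintext reference point, the coefficient can be sketched in plain Python. The helper name `pearson` and the toy inputs are assumptions made for illustration; the steps follow the usual definition (sum of products of deviations, divided by the square root of the product of the summed squared deviations).

```python
import math


def pearson(xs, ys):
    """Plaintext Pearson correlation coefficient (illustrative helper)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sums of squared deviations from the mean for each variable.
    ss_x = sum((x - x_mean) ** 2 for x in xs)
    ss_y = sum((y - y_mean) ** 2 for y in ys)
    # Sum of the products of paired deviations (numerator).
    cov = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    return cov / math.sqrt(ss_x * ss_y)


# Toy inputs: perfectly correlated and perfectly anti-correlated series.
perfect_positive = pearson([1, 2, 3, 4], [2, 4, 6, 8])
perfect_negative = pearson([1, 2, 3, 4], [8, 6, 4, 2])
print(perfect_positive, perfect_negative)
```

The Moose version of this computation uses the same sequence of steps, expressed with `edsl` operations instead of Python arithmetic.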

To express this computation, Moose offers a Python eDSL. As you will notice, its syntax is very similar to that of the scientific computing library [NumPy](https://numpy.org/).

The main difference is the notion of placements: host placements and replicated placements. With Moose, every operation under a host placement context is computed on plaintext (unencrypted) values. Every operation under a replicated placement context is performed on secret-shared (encrypted) values.

We will compute the correlation coefficient among three players, each represented by a host placement: the Department of Public Health, the Department of Education, and a data scientist. The three players are grouped under a replicated placement to perform the encrypted computation.
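To give some intuition for what "secret shared" means among three players, here is a toy sketch of three-party replicated secret sharing over integers. It is an illustration under stated assumptions, not Moose's implementation: the 64-bit ring size and the helper names `share`, `reconstruct` and `add_shared` are all hypothetical.

```python
import secrets

RING_BITS = 64  # assumed ring size, for illustration only
MODULUS = 1 << RING_BITS


def share(value):
    """Split an integer into three additive shares; party i holds two of them."""
    x0 = secrets.randbelow(MODULUS)
    x1 = secrets.randbelow(MODULUS)
    x2 = (value - x0 - x1) % MODULUS
    # Each party sees only two of the three shares, which reveal nothing alone.
    return [(x0, x1), (x1, x2), (x2, x0)]


def reconstruct(party_0, party_1):
    """Any two parties together hold all three shares and can recover the value."""
    x0, x1 = party_0
    _, x2 = party_1
    return (x0 + x1 + x2) % MODULUS


def add_shared(shares_a, shares_b):
    """Addition is local: each party adds the shares it already holds."""
    return [
        ((a0 + b0) % MODULUS, (a1 + b1) % MODULUS)
        for (a0, a1), (b0, b1) in zip(shares_a, shares_b)
    ]


a = share(20)
b = share(22)
total = add_shared(a, b)
revealed = reconstruct(total[0], total[1])
print(revealed)
```

In this sketch no single party can recover `20` or `22` from its own pair of shares, yet the sum can be computed and revealed without anyone seeing the inputs.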

The Moose computation below performs the following steps:
- The Department of Public Health loads its data in plaintext from its storage.
- The Department of Education loads its data in plaintext from its storage.
- Secret share (encrypt) the datasets.
- Compute the correlation coefficient on secret shared data.
- Reveal the correlation result only to the data scientist and save it into the storage.

In [25]:
def correlation_computation():
    pub_health_dpt = edsl.host_placement(name="pub_health_dpt")
    education_dpt = edsl.host_placement(name="education_dpt")
    data_scientist = edsl.host_placement(name="data_scientist")
    
    encrypted_governement = edsl.replicated_placement(
        name="encrypted_governement",
        players=[pub_health_dpt, education_dpt, data_scientist],
    )

    def pearson_correlation_coefficient(x, y):
        x_mean = edsl.mean(x, 0)
        y_mean = edsl.mean(y, 0)
        stdv_x = edsl.sum(edsl.square(edsl.sub(x, x_mean)))
        stdv_y = edsl.sum(edsl.square(edsl.sub(y, y_mean)))
        corr_num = edsl.sum(edsl.mul(edsl.sub(x, x_mean), edsl.sub(y, y_mean)))
        corr_denom = edsl.sqrt(edsl.mul(stdv_x, stdv_y))
        return edsl.div(corr_num, corr_denom)

    @edsl.computation
    def moose_comp():

        # The Department of Public Health loads its data in plaintext,
        # then converts it from float to fixed-point.
        with pub_health_dpt:
            alcohol = edsl.load("alcohol_data", dtype=edsl.float64)
            alcohol = edsl.cast(alcohol, dtype=FIXED)

        # The Department of Education loads its data in plaintext,
        # then converts it from float to fixed-point.
        with education_dpt:
            grades = edsl.load("grades_data", dtype=edsl.float64)
            grades = edsl.cast(grades, dtype=FIXED)

        # The alcohol and grades data get secret shared when moving from the
        # host placements to the replicated placement.
        # The correlation coefficient is then computed on secret-shared data.
        with encrypted_governement:
            correlation = pearson_correlation_coefficient(alcohol, grades)

        # Only the correlation coefficient is revealed to the data scientist.
        # It is converted from fixed-point to float and saved to storage.
        with data_scientist:
            correlation = edsl.cast(correlation, dtype=edsl.float64)
            correlation = edsl.save("correlation", correlation)

        return correlation

    return moose_comp

### Evaluate Computation

For simplicity, we will use `LocalMooseRuntime` to evaluate this computation. To do so, we need to provide a Moose computation, a role assignment, and the data to be stored in the storage of each executor (there is one executor per player).

We can obtain a Moose computation by tracing the Python function with `edsl.trace`. The role assignment maps each role (e.g. `pub_health_dpt` or `education_dpt`) to an ID (here a simple string, but it could be an IP address for a computation run over the network). For the executors' storage, we provide, for each ID taking part in the computation, a dictionary mapping a key to a dataset. The load operation uses this key to load the dataset. We could also pass arguments to the computation, but this example has none. Once you have instantiated `LocalMooseRuntime` with the executors' storage, you are ready to evaluate the computation with `evaluate_computation`.
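As a mental model only, the path from a role to a dataset can be pictured as two dictionary lookups. The helper `resolve_load` and the toy values below are hypothetical; this is plain Python, not how the runtime is implemented.

```python
# Toy stand-ins for the role assignment and per-executor storage.
role_assignment = {"pub_health_dpt": "pub_health_dpt_id"}
executors_storage = {
    "pub_health_dpt_id": {"alcohol_data": [11.07, 9.59, 6.28]},
}


def resolve_load(role, key):
    """Hypothetical helper: role -> executor ID -> that executor's storage -> key."""
    identity = role_assignment[role]
    return executors_storage[identity][key]


loaded = resolve_load("pub_health_dpt", "alcohol_data")
print(loaded)
```

The point of the indirection is that the computation only names roles and keys, so the same traced computation can be bound to different identities and storages.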

In [27]:
computation = correlation_computation()
logical_comp = edsl.trace(computation)

role_assignment = {
    "pub_health_dpt": "pub_health_dpt_id",
    "education_dpt": "education_dpt_id",
    "data_scientist": "data_scientist_id",
}

executors_storage = {
    "pub_health_dpt_id": {"alcohol_data": alcohol_consumption},
    "education_dpt_id": {"grades_data": grades},
    "data_scientist_id": {},
}

runtime = LocalMooseRuntime(storage_mapping=executors_storage)

runtime.evaluate_computation(
    computation=logical_comp,
    role_assignment=role_assignment,
    arguments={},
)

### Results

Once the computation is done, we can extract the result. The correlation coefficient has been stored in the data scientist's storage. We can extract the value from the storage with `read_value_from_storage`.

In [29]:
moose_correlation = runtime.read_value_from_storage("data_scientist_id", "correlation")
print(f"Correlation result with PyMoose: {moose_correlation}")

Correlation result with PyMoose: -0.5462326644019413


The correlation coefficient is approximately -0.546.

We can validate that the result computed on encrypted data matches the same computation on plaintext data. To do so, we compute the Pearson correlation coefficient with NumPy.

In [31]:
np_correlation = np.corrcoef(np.squeeze(alcohol_consumption), np.squeeze(grades))[1, 0]
print(f"Correlation result with Numpy: {np_correlation}")

Correlation result with Numpy: -0.5481005967856092


As you can see, the two coefficients agree to about two decimal places (-0.546 versus -0.548). The precision could be adjusted by choosing a more suitable fractional precision for the fixed-point conversion.
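To illustrate how the number of fractional bits bounds the rounding error of a fixed-point encoding on its own, here is a minimal plain-Python sketch. The helpers `encode` and `decode` are hypothetical and are not part of PyMoose; the sketch does not model any error introduced by the secure computation itself.

```python
def encode(value, frac_bits):
    """Scale by 2**frac_bits and round to the nearest integer."""
    return round(value * (1 << frac_bits))


def decode(encoded, frac_bits):
    """Map the integer representation back to a float."""
    return encoded / (1 << frac_bits)


value = -0.5481005967856092  # the NumPy result from above
errors = {}
for frac_bits in (8, 16, 40):
    roundtrip = decode(encode(value, frac_bits), frac_bits)
    errors[frac_bits] = abs(roundtrip - value)
    print(f"{frac_bits} fractional bits: round-trip error {errors[frac_bits]:.3e}")
```

In this sketch the round-trip error is at most half of the smallest representable step, `2 ** -frac_bits`, so each additional fractional bit halves the worst-case encoding error.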

Voilà! You were able to compute the correlation while keeping the data encrypted during the entire process.