# Section: Federated Learning

# Lesson: Introducing Federated Learning

Federated Learning is a technique for training Deep Learning models on data to which you do not have access. Basically:

Federated Learning: Instead of bringing all the data to one machine and training a model, we bring the model to the data, train it locally, and merely upload "model updates" to a central server.

Use Cases:

    - app company (Texting prediction app)
    - predictive maintenance (automobiles / industrial engines)
    - wearable medical devices
    - ad blockers / autocomplete in browsers (Firefox/Brave)
    
Challenge Description: data is distributed amongst sources but we cannot aggregate it because of:

    - privacy concerns: legal, user discomfort, competitive dynamics
    - engineering: the bandwidth/storage requirements of aggregating the larger dataset

# Lesson: Introducing / Installing PySyft

In order to perform Federated Learning, we need to be able to use Deep Learning techniques on remote machines. This will require a new set of tools. Specifically, we will use an extension of PyTorch called PySyft.

### Install PySyft

To begin, you'll need to make sure you have the right things installed. To do so, head on over to PySyft's README and follow the setup instructions. The TL;DR for most folks is:

- Install Python 3.5 or higher
- Install PyTorch 1.0.x
- Clone PySyft (git clone https://github.com/OpenMined/PySyft.git)
- cd PySyft
- pip install -r requirements.txt
- python setup.py install
- python setup.py test

If any part of this doesn't work for you (or any of the tests fail), first check the [README](https://github.com/OpenMined/PySyft.git) for installation help and then open a GitHub Issue or ping the #beginner channel in our Slack! [slack.openmined.org](http://slack.openmined.org/)

In [1]:
import torch as th
x = th.tensor([1,2,3,4,5])
y = x + x
print(y[0])

tensor(2)


In [2]:
# Run this cell to see if things work
import syft as sy
hook = sy.TorchHook(th) # always run this when you import syft

th.tensor([1,2,3,4,5])

tensor([1, 2, 3, 4, 5])

# Lesson: Basic Remote Execution in PySyft

## PySyft => Remote PyTorch

The essence of Federated Learning is the ability to train models in parallel on a wide number of machines. Thus, we need the ability to tell remote machines to execute the operations required for Deep Learning.

Thus, instead of using Torch tensors - we're now going to work with **pointers** to tensors. Let me show you what I mean. First, let's create a "pretend" machine owned by a "pretend" person - we'll call him Bob.

In [3]:
bob = sy.VirtualWorker(hook, id="bob")

For all intents and purposes, Bob's machine is on another planet - perhaps on Mars! But, at the moment the machine is empty. Let's create some data so that we can send it to Bob and learn about pointers!

In [4]:
x = th.tensor([1,2,3,4,5])
y = th.tensor([1,1,1,1,1])

In [5]:
x_ptr = x.send(bob)
y_ptr = y.send(bob)

In [6]:
bob._objects

{1249604828: tensor([1, 2, 3, 4, 5]), 98808961428: tensor([1, 1, 1, 1, 1])}

In [7]:
x_ptr

(Wrapper)>[PointerTensor | me:90625541592 -> bob:1249604828]

Check out that metadata! 

- loc: bob
- id@loc: 1249604828
- owner: me

This metadata gives us the information we need to understand this tensor!

In [8]:
x_ptr.location

<VirtualWorker id:bob #tensors:2>

In [9]:
bob

<VirtualWorker id:bob #tensors:2>

In [10]:
bob == x_ptr.location

True

The "id@loc" attribute is similar to "loc" (which we just read via .location). It tells us the id that the tensor object has on Bob's machine (the one that we're pointing to). See?

In [11]:
x_ptr.id_at_location

1249604828

And finally - we have the third attribute "owner: me" which is very similar to ".location". However, instead of specifying where the pointer is pointing, it specifies the owner of the pointer itself, which is me. 

Fun fact: just like we had a VirtualWorker object for Bob, we (by default) always have one for us as well. This worker was automatically created when we called "hook = sy.TorchHook(th)", so you don't usually have to create it yourself.

In [12]:
me = sy.local_worker
me

<VirtualWorker id:me #tensors:0>

In [13]:
x_ptr.owner

<VirtualWorker id:me #tensors:0>

In [14]:
me == x_ptr.owner

True

And finally, just like we can call .send() on a tensor, we can call .get() on a pointer to a tensor to get it back!!!

In [15]:
x_ptr

(Wrapper)>[PointerTensor | me:90625541592 -> bob:1249604828]

In [16]:
x_ptr.get()

tensor([1, 2, 3, 4, 5])

In [17]:
y_ptr

(Wrapper)>[PointerTensor | me:57285895898 -> bob:98808961428]

In [18]:
y_ptr.get()

tensor([1, 1, 1, 1, 1])

In [19]:
bob._objects

{}
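To make the pointer idea concrete, here is a minimal plain-Python sketch of what .send() and .get() do conceptually. ToyWorker, ToyPointer, and the send helper are hypothetical stand-ins invented for this illustration. They are not PySyft classes, and real PySyft does considerably more than this.

```python
# Toy stand-ins for illustration only; these are NOT PySyft classes.
import itertools

_ids = itertools.count(1)  # simple id generator (the notebook shows large ids)


class ToyWorker:
    """A pretend machine that stores objects by id."""

    def __init__(self, name):
        self.name = name
        self.objects = {}


class ToyPointer:
    """Holds only metadata: who owns the pointer and where the data lives."""

    def __init__(self, owner, location, id_at_location):
        self.owner = owner
        self.location = location
        self.id_at_location = id_at_location

    def get(self):
        # Remove the object from the remote store and hand it back.
        return self.location.objects.pop(self.id_at_location)


def send(value, owner, location):
    """Store `value` on `location` and return a pointer owned by `owner`."""
    obj_id = next(_ids)
    location.objects[obj_id] = value
    return ToyPointer(owner, location, obj_id)


me = ToyWorker("me")
bob = ToyWorker("bob")

x_ptr = send([1, 2, 3, 4, 5], owner=me, location=bob)
print(bob.objects)   # bob now holds the data
x = x_ptr.get()
print(x)             # the data is back with us
print(bob.objects)   # bob's store is empty again
```

The point of the sketch is that the pointer itself carries no data, only the three pieces of metadata discussed above, and that .get() removes the object from the remote worker.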

# Project: Playing with Remote Tensors

In this project, I want you to .send() and .get() a tensor to TWO workers by calling .send(bob,alice). This will first require the creation of another VirtualWorker called alice.

In [20]:
alice = sy.VirtualWorker(hook, id="alice")

In [21]:
x = th.tensor([1,2,3,4,5])

In [22]:
x_ptr = x.send(bob, alice)

In [23]:
bob._objects

{82541518575: tensor([1, 2, 3, 4, 5])}

In [24]:
alice._objects

{25804218573: tensor([1, 2, 3, 4, 5])}

In [25]:
x_ptr.get()

[tensor([1, 2, 3, 4, 5]), tensor([1, 2, 3, 4, 5])]

In [26]:
bob._objects

{}

In [27]:
alice._objects

{}

In [28]:
x_ptr = x.send(bob, alice)
x_ptr.get(sum_results=True)

tensor([ 2,  4,  6,  8, 10])

# Lesson: Introducing Remote Arithmetic

In [29]:
x = th.tensor([1,2,3,4,5]).send(bob)
y = th.tensor([1,1,1,1,1]).send(bob)

In [30]:
x

(Wrapper)>[PointerTensor | me:35821734486 -> bob:44074034476]

In [31]:
y

(Wrapper)>[PointerTensor | me:34708268928 -> bob:58962090232]

In [32]:
z = x + y

In [33]:
z

(Wrapper)>[PointerTensor | me:55183869386 -> bob:55183869386]

In [34]:
z.get()

tensor([2, 3, 4, 5, 6])

In [35]:
z = th.add(x,y)
z

(Wrapper)>[PointerTensor | me:61153824411 -> bob:61153824411]

In [36]:
z.get()

tensor([2, 3, 4, 5, 6])

### Variables (including backpropagation!)

In [37]:
x = th.tensor([1.,2,3,4,5], requires_grad=True).send(bob)
y = th.tensor([1.,1,1,1,1], requires_grad=True).send(bob)

In [38]:
z = (x + y).sum()

In [39]:
z.backward()

In [40]:
x = x.get()

In [41]:
x

tensor([1., 2., 3., 4., 5.], requires_grad=True)

In [42]:
x.grad

tensor([1., 1., 1., 1., 1.])
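Why is x.grad all ones? Since z = sum(x + y), each x_i contributes to z with coefficient 1, so dz/dx_i = 1. The following plain-Python sketch (no PyTorch or PySyft; the helper names are made up for this illustration) checks that numerically with central finite differences.

```python
# Plain-Python check that d/dx_i of sum(x + y) equals 1 for every i.
def z(x, y):
    return sum(a + b for a, b in zip(x, y))


def numeric_grad(f, x, y, eps=1e-6):
    """Central finite-difference estimate of df/dx_i for each i."""
    grads = []
    for i in range(len(x)):
        up = x[:i] + [x[i] + eps] + x[i + 1:]
        down = x[:i] + [x[i] - eps] + x[i + 1:]
        grads.append((f(up, y) - f(down, y)) / (2 * eps))
    return grads


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 1.0, 1.0, 1.0, 1.0]
grad_x = numeric_grad(z, x, y)
print([round(g, 4) for g in grad_x])
```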

# Project: Learn a Simple Linear Model

In this project, I'd like you to create a simple linear model that fits the dataset below. You should use only Variables and .backward() to do so (no optimizers or nn.Modules). Furthermore, you must do so with both the data and the model located on Bob's machine.

In [43]:
input = th.tensor([[1.,1],[0,1],[1,0],[0,0]], requires_grad=True).send(bob)
target = th.tensor([[1.],[1], [0], [0]], requires_grad=True).send(bob)

weights = th.tensor([[0.],[0]], requires_grad=True).send(bob)

In [44]:
for i in range(10):
    
    # forward propagation
    pred = input.mm(weights)
    
    # calculate loss
    loss = ((pred - target)**2).sum()
    
    # backpropagate
    loss.backward()
    
    # update weights
    weights.data.sub_(weights.grad * 0.1)
    weights.grad *= 0

    print(loss.get().data)

tensor(2.)
tensor(0.5600)
tensor(0.2432)
tensor(0.1372)
tensor(0.0849)
tensor(0.0538)
tensor(0.0344)
tensor(0.0220)
tensor(0.0141)
tensor(0.0090)
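As a sanity check, the same ten gradient-descent steps can be replayed locally in plain Python, with the gradient written out by hand instead of calling .backward(). This is only a sketch for checking the numbers (it does not use PySyft, and the variable names are ours); the losses it prints should agree with the values above to about four decimal places.

```python
# Local, plain-Python replica of the loop above (no PySyft, no autograd).
inputs = [[1.0, 1.0], [0.0, 1.0], [1.0, 0.0], [0.0, 0.0]]
targets = [1.0, 1.0, 0.0, 0.0]
weights = [0.0, 0.0]
lr = 0.1

losses = []
for _ in range(10):
    # forward propagation: pred_i = inputs_i . weights
    preds = [sum(x_j * w_j for x_j, w_j in zip(row, weights)) for row in inputs]

    # sum-of-squares loss, recorded before the weight update
    errors = [p - t for p, t in zip(preds, targets)]
    losses.append(sum(e * e for e in errors))

    # analytic gradient: dL/dw_j = 2 * sum_i errors_i * inputs_ij
    grads = [2 * sum(e * row[j] for e, row in zip(errors, inputs)) for j in range(2)]

    # gradient descent step
    weights = [w - lr * g for w, g in zip(weights, grads)]

print([round(loss, 4) for loss in losses])
```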


# Lesson: Common Errors

If you try to do an operation between two tensors which aren't on the same machine, you'll get an error that looks like this!!!

In [45]:
x = th.tensor([1,2,3,4,5]).send(bob)
y = th.tensor([1,1,1,1,1])

In [46]:
# z = y + x

Or, alternatively, if you try to interact with pointers to tensors which no longer exist on a worker, you'll see an error like this!!!

In [47]:
x = th.tensor([1,2,3,4,5]).send(bob)

# delete all objects on bob
bob._objects = {}

# y = x + x

# Lesson: Toy Federated Learning

Let's start by training a toy model the centralized way. This is about as simple as models get. We first need:

- a toy dataset
- a model
- some basic training logic for training a model to fit the data.

In [52]:
import syft as sy
from torch import nn, optim

# A Toy Dataset
data = th.tensor([[1.,1],[0,1],[1,0],[0,0]], requires_grad=True)
target = th.tensor([[1.],[1], [0], [0]], requires_grad=True)

# A Toy Model
model = nn.Linear(2,1)

def train():
    # Training Logic
    opt = optim.SGD(params=model.parameters(),lr=0.1)
    for iter in range(20):

        # 1) erase previous gradients (if they exist)
        opt.zero_grad()

        # 2) make a prediction
        pred = model(data)

        # 3) calculate how much we missed
        loss = ((pred - target)**2).sum()

        # 4) figure out which weights caused us to miss
        loss.backward()

        # 5) change those weights
        opt.step()

        # 6) print our progress
        print(loss.data)

In [53]:
train()

tensor(8.1490)
tensor(0.8144)
tensor(0.2000)
tensor(0.1117)
tensor(0.0763)
tensor(0.0538)
tensor(0.0383)
tensor(0.0275)
tensor(0.0199)
tensor(0.0145)
tensor(0.0106)
tensor(0.0078)
tensor(0.0058)
tensor(0.0043)
tensor(0.0032)
tensor(0.0024)
tensor(0.0018)
tensor(0.0014)
tensor(0.0010)
tensor(0.0008)


And there you have it! We've trained a basic model in the conventional manner. All our data is aggregated into our local machine and we can use it to make updates to our model. Federated Learning, however, doesn't work this way. So, let's modify this example to do it the Federated Learning way! 

So, what do we need:

- create a couple workers
- get pointers to training data on each worker
- update the training logic to do federated learning

    New Training Steps:
    - send model to correct worker
    - train on the data located there
    - get the model back and repeat with next worker

In [57]:
# create a couple workers

# get pointers to training data on each worker by
# sending some training data to bob and alice
data_bob = data[0:2].send(bob)
target_bob = target[0:2].send(bob)

data_alice = data[2:].send(alice)
target_alice = target[2:].send(alice)

# organize pointers into a list
datasets = [(data_bob,target_bob),(data_alice,target_alice)]

# Initialize A Toy Model
model = nn.Linear(2,1)

def train():
    # Training Logic
    opt = optim.SGD(params=model.parameters(),lr=0.1)
    for iter in range(20):
        
        # NEW) iterate through each worker's dataset
        for data, target in datasets:
            
            # NEW) send model to correct worker
            model.send(data.location)

            # 1) erase previous gradients (if they exist)
            opt.zero_grad()

            # 2) make a prediction
            pred = model(data)
            
            # 3) calculate how much we missed
            loss = ((pred - target)**2).sum()

            # 4) figure out which weights caused us to miss
            loss.backward()

            # NEW) get model (with gradients)
            model.get()

            # 5) change those weights
            opt.step()

            # 6) print our progress
            print(loss.get().data) # NEW) slight edit... need to call .get() on loss

In [58]:
train()

tensor(0.4912)
tensor(0.0277)
tensor(0.0269)
tensor(0.0227)
tensor(0.0116)
tensor(0.0134)
tensor(0.0066)
tensor(0.0077)
tensor(0.0038)
tensor(0.0044)
tensor(0.0022)
tensor(0.0026)
tensor(0.0013)
tensor(0.0015)
tensor(0.0008)
tensor(0.0008)
tensor(0.0004)
tensor(0.0005)
tensor(0.0003)
tensor(0.0003)
tensor(0.0002)
tensor(0.0002)
tensor(9.0527e-05)
tensor(9.2285e-05)
tensor(5.3592e-05)
tensor(5.3168e-05)
tensor(3.1872e-05)
tensor(3.0699e-05)
tensor(1.9059e-05)
tensor(1.7785e-05)
tensor(1.1472e-05)
tensor(1.0354e-05)
tensor(6.9584e-06)
tensor(6.0697e-06)
tensor(4.2582e-06)
tensor(3.5914e-06)
tensor(2.6323e-06)
tensor(2.1511e-06)
tensor(1.6459e-06)
tensor(1.3084e-06)


And voila! We now are training a very simple Deep Learning model using Federated Learning! We send the model to each worker, generate a new gradient, and then bring the gradient back to our local server where we update our global model. Never in this process do we ever see or request access to the underlying training data! We preserve the privacy of Bob and Alice!!!

## Shortcomings of this Example

So, while this example is a nice introduction to Federated Learning, it still has some major shortcomings. Most notably, when we call model.get() and receive the updated model from Bob or Alice, we can actually learn a lot about Bob and Alice's training data by looking at their gradients. In some cases, we can restore their training data perfectly! 
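Here is a small worked case of that leak, as a sketch under simplifying assumptions: a linear model y = w . x + b, a squared-error loss, and a gradient computed on a single private example. Under those assumptions dL/dw = 2 * (pred - t) * x and dL/db = 2 * (pred - t), so dividing one by the other recovers x exactly. All names and numbers below are hypothetical.

```python
# Gradient leakage for a linear model trained on ONE example with
# squared-error loss. Names and numbers are hypothetical.
w = [0.3, -0.2]
b = 0.1

private_x = [1.0, 0.0]   # Bob's private input
private_t = 0.0          # Bob's private target

# What Bob computes locally
pred = sum(wi * xi for wi, xi in zip(w, private_x)) + b
err = pred - private_t
grad_w = [2 * err * xi for xi in private_x]   # dL/dw = 2 * (pred - t) * x
grad_b = 2 * err                              # dL/db = 2 * (pred - t)

# What the server can do with only grad_w, grad_b, and its own copy of (w, b)
recovered_x = [g / grad_b for g in grad_w]    # valid only when grad_b != 0
recovered_pred = sum(wi * xi for wi, xi in zip(w, recovered_x)) + b
recovered_t = recovered_pred - grad_b / 2

print(recovered_x, recovered_t)
```

With larger batches or deeper models the reconstruction is usually not this clean, but the example shows why a raw per-person gradient should be treated as sensitive.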

So, what is there to do? Well, the first strategy people employ is to **average the gradient across multiple individuals before uploading it to the central server**. This strategy, however, will require some more sophisticated use of PointerTensor objects. So, in the next section, we're going to take some time to learn about more advanced pointer functionality and then we'll upgrade this Federated Learning example.
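The averaging step itself is simple arithmetic. The sketch below shows it in plain Python with made-up gradients from three hypothetical participants (the average_gradients helper is ours, not part of PySyft). What matters for privacy is where this runs: the averaging has to happen somewhere other than the central server, so that the server only ever receives the mean.

```python
# Toy sketch of gradient averaging; names and numbers are hypothetical.
def average_gradients(gradients):
    """Element-wise mean of equally sized gradient vectors."""
    n = len(gradients)
    return [sum(values) / n for values in zip(*gradients)]


# Gradients computed privately by three participants
grad_bob = [0.8, 0.0]
grad_alice = [-0.4, 0.2]
grad_carol = [0.2, 0.4]

# Only this average is uploaded to the central server
avg = average_gradients([grad_bob, grad_alice, grad_carol])

# The server applies the averaged gradient to the global weights
weights = [0.3, -0.2]
lr = 0.1
weights = [w - lr * g for w, g in zip(weights, avg)]
print(avg, weights)
```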


# Lesson: Advanced Remote Execution Tools

In the last section we trained a toy model using Federated Learning. We did this by calling .send() and .get() on our model, sending it to the location of training data, updating it, and then bringing it back. However, at the end of the example we realized that we needed to go a bit further to protect people's privacy. Namely, we want to average the gradients BEFORE calling .get(). That way, we won't ever see anyone's exact gradient (thus better protecting their privacy!!!).

But, in order to do this, we need a few more pieces:

- use a pointer to send a Tensor directly to another worker

And in addition, while we're here, we're going to learn about a few more advanced tensor operations as well which will help us both with this example and a few in the future!

In [59]:
bob.clear_objects()
alice.clear_objects()

# making sure that bob/alice know about each other
bob.add_worker(alice)
alice.add_worker(bob)

In [60]:
# this is a local tensor
x = th.tensor([1,2,3,4])
x

tensor([1, 2, 3, 4])

In [61]:
# this sends the local tensor to Bob
x_ptr = x.send(bob)

# this is now a pointer
x_ptr

(Wrapper)>[PointerTensor | me:8236342053 -> bob:42080168797]

In [62]:
# now we can SEND THE POINTER to alice!!!
pointer_to_x_ptr = x_ptr.send(alice)

pointer_to_x_ptr

(Wrapper)>[PointerTensor | me:12043791666 -> alice:8236342053]

### What happened?

So, in the previous example, we created a tensor called "x" and sent it to Bob, creating a pointer on our local machine ("x_ptr"). 

Then, we called x_ptr.send(alice) which SENT THE POINTER to Alice. 

Note, this did NOT move the data! Instead, it moved the pointer to the data!! 

In [63]:
# As you can see, Bob still has the actual data (data is always stored in a LocalTensor type). 
bob._objects

{42080168797: tensor([1, 2, 3, 4])}

In [64]:
# Alice, on the other hand, has x_ptr!! (notice how it points at bob)
alice._objects

{8236342053: (Wrapper)>[PointerTensor | alice:8236342053 -> bob:42080168797]}

In [65]:
# and we can use .get() to get x_ptr back from Alice

x_ptr = pointer_to_x_ptr.get()
x_ptr

(Wrapper)>[PointerTensor | me:8236342053 -> bob:42080168797]

In [66]:
alice._objects

{}

In [67]:
x_ptr

(Wrapper)>[PointerTensor | me:8236342053 -> bob:42080168797]

In [68]:
# and then we can use x_ptr to get x back from Bob!

x = x_ptr.get()
x

tensor([1, 2, 3, 4])

### Arithmetic on Pointer -> Pointer -> Data Object

And just like with normal pointers, we can perform arbitrary PyTorch operations across these tensors.

In [69]:
bob._objects

{}

In [70]:
alice._objects

{}

In [71]:
p2p2x = th.tensor([1,2,3,4,5]).send(bob).send(alice)

y = p2p2x + p2p2x

In [72]:
bob._objects

{34279937833: tensor([ 2,  4,  6,  8, 10]),
 83449230889: tensor([1, 2, 3, 4, 5])}

In [73]:
alice._objects

{34279937833: (Wrapper)>[PointerTensor | alice:34279937833 -> bob:34279937833],
 88368792159: (Wrapper)>[PointerTensor | alice:88368792159 -> bob:83449230889]}

In [74]:
y.get().get()

tensor([ 2,  4,  6,  8, 10])

In [75]:
bob._objects

{83449230889: tensor([1, 2, 3, 4, 5])}

In [76]:
alice._objects

{88368792159: (Wrapper)>[PointerTensor | alice:88368792159 -> bob:83449230889]}

In [77]:
p2p2x.get().get()

tensor([1, 2, 3, 4, 5])

In [78]:
bob._objects

{}

In [79]:
alice._objects

{}

# Lesson: Pointer Chain Operations

So in the last section whenever we called a .send() or a .get() operation, it called that operation directly on the tensor on our local machine. However, if you have a chain of pointers, sometimes you want to call operations like .get() or .send() on the LAST pointer in the chain (such as sending data directly from one worker to another). To accomplish this, you want to use functions which are especially designed for this privacy preserving operation.

These operations are:

- my_pointer2pointer.remote_get()
- my_pointer2pointer.move(another_worker)

Let's start with .remote_get(). This one simply identifies the _last_ pointer in the chain and calls .get() on that pointer! It's an inline operation. Let's look at an example.

In [84]:
bob.clear_objects()
alice.clear_objects()

# # making sure that bob/alice know about each other
# bob.add_worker(alice)
# alice.add_worker(bob)

<VirtualWorker id:alice #tensors:0>

In [85]:
# x is now a pointer to a pointer to the data which lives on Bob's machine
x = th.tensor([1,2,3,4,5]).send(bob).send(alice)

In [86]:
bob._objects

{26866549336: tensor([1, 2, 3, 4, 5])}

In [87]:
alice._objects

{32822620901: (Wrapper)>[PointerTensor | alice:32822620901 -> bob:26866549336]}

In [88]:
x

(Wrapper)>[PointerTensor | me:5561320060 -> alice:32822620901]

In [89]:
x2 = x.remote_get()

In [90]:
x2

(Wrapper)>[PointerTensor | me:5561320060 -> alice:32822620901]

In [91]:
bob._objects

{}

In [92]:
alice._objects

{32822620901: tensor([1, 2, 3, 4, 5])}

In [93]:
x2.get()

tensor([1, 2, 3, 4, 5])

### Analyzing .remote_get()

Notice above when we called .remote_get(), it deleted bob's object and MOVED it to alice. So now Alice has the actual data (a LocalTensor). Thus, when we now call .get() on "x2" we will get the data back.

Now, you'll notice, before we called x2.get() we actually sent our tensor on a little journey. 

- First we sent the data to Bob. 
- Then we sent a pointer to the data to Alice
- Then we used Alice's pointer to MOVE the data to Alice (by calling .remote_get())

Thus, we used this series of operations to MOVE the data from Bob -> Alice without us actually seeing the data during the in-between step. As you might guess - .move() is just a convenience wrapper around this operation!

In [94]:
# x is now a pointer to the data which lives on Bob's machine
x = th.tensor([1,2,3,4,5]).send(bob)

In [95]:
x

(Wrapper)>[PointerTensor | me:27739632450 -> bob:84332451695]

In [96]:
bob._objects

{84332451695: tensor([1, 2, 3, 4, 5])}

In [97]:
alice._objects

{}

In [98]:
x = x.move(alice)

In [99]:
bob._objects

{}

In [100]:
alice._objects

{27739632450: tensor([1, 2, 3, 4, 5])}
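The journey described above (send the data to Bob, send a pointer to Alice, then call .remote_get()) can be mimicked with a few lines of plain Python. ToyWorker and ToyPointer are hypothetical stand-ins written for this illustration, not PySyft classes, and the ids 1 and 2 are arbitrary. The sketch only models where objects live, which is enough to see that the data goes from Bob to Alice without passing through us.

```python
# Toy model of a pointer chain; these are NOT PySyft classes.
class ToyWorker:
    def __init__(self, name):
        self.name = name
        self.objects = {}   # id -> data or ToyPointer


class ToyPointer:
    def __init__(self, location, id_at_location):
        self.location = location
        self.id_at_location = id_at_location

    def get(self):
        """Pull the pointed-to object out of the remote store."""
        return self.location.objects.pop(self.id_at_location)

    def remote_get(self):
        """Ask the remote worker to call get() on the pointer it holds."""
        holder = self.location
        inner = holder.objects[self.id_at_location]         # a ToyPointer
        holder.objects[self.id_at_location] = inner.get()   # replace with data
        return self


bob, alice = ToyWorker("bob"), ToyWorker("alice")

bob.objects[1] = [1, 2, 3, 4, 5]        # data lives on bob
alice.objects[2] = ToyPointer(bob, 1)   # alice holds a pointer to it
x = ToyPointer(alice, 2)                # we hold a pointer to alice's pointer

x.remote_get()                          # data moves bob -> alice
after_move = (dict(bob.objects), dict(alice.objects))
print(after_move)

result = x.get()                        # only now does the data reach us
print(result)
```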

# Final Project: Federated Learning with Untrusted Central Server

In the next project, you will receive a distributed dataset. You are to train a model WITHOUT aggregating gradients to the central server. This will require using .remote_get() to send gradients directly from one remote worker to another when performing the aggregation.