# Summary

This notebook demonstrates how to create a SharedArray (`SA`) of input data that is accessible by remote processes. In addition, the master process creates a second SharedArray (`SR`) to store results from the remote processes.

## Prepare two worker processes

In [1]:
addprocs(2)

2-element Array{Int64,1}:
 2
 3

In [2]:
workers()

2-element Array{Int64,1}:
 2
 3

## Create SharedArray objects

- `SA` holds a 2D shared array with random numbers
- `SR` holds a 2-element shared array that stores summarized computation results
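The cells below use the pre-1.0 Julia API (`convert(SharedArray{Float64, 2}, ...)`, `SharedArray{Float64,1}(2)`). The following is a minimal sketch of the same two objects, assuming Julia 1.x. In 1.x, `Distributed` and `SharedArrays` are standard-library packages that must be loaded explicitly, and the workers need `SharedArrays` loaded too. The explicit zeroing of `SR` is an added assumption: it is there because a freshly allocated SharedArray is not guaranteed to be initialized.

```julia
# Assumes Julia 1.x, where these tools live in standard-library packages.
using Distributed
nprocs() < 3 && addprocs(3 - nprocs())   # make sure two local workers exist
@everywhere using SharedArrays           # workers must load SharedArrays as well

# SA: 6×2 shared matrix, copied from an ordinary Array of random numbers
SA = SharedArray(rand(6, 2))

# SR: 2-element shared vector for results, zeroed explicitly on the master
SR = SharedArray{Float64}(2)
fill!(SR, 0.0)

println(typeof(SA), " of size ", size(SA))
println(SR)
```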

In [3]:
SA = convert(SharedArray{Float64, 2}, rand(6,2))

6×2 SharedArray{Float64,2}:
 0.297855  0.910513 
 0.719219  0.690423 
 0.104137  0.0844761
 0.979466  0.242356 
 0.321191  0.370191 
 0.370706  0.826142 

In [4]:
SR = SharedArray{Float64,1}(2)

2-element SharedArray{Float64,1}:
 0.0
 0.0

### Visibility and Data Transfer

Julia does not make global variables, even SharedArrays, visible on the worker processes until they are used there. The following sequence of events demonstrates this.

Observation: when an expression passed to the `@spawnat` macro references a global variable, that variable also becomes defined on the target worker, as a global in its `Main` module. The `identity` function merely acts as a trigger.

Why make them visible? It is convenient when a function needs to reference a shared array without accepting it as an argument (passing a SharedArray as an argument seems to hurt performance, though this is still to be confirmed).
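The visibility behaviour shown in the next cells can be condensed into a short sketch. It assumes Julia 1.x, where `isdefined` takes a module and a symbol (`isdefined(Main, :SA)`) and `@fetchfrom` combines `@spawnat` with `fetch`. Checking by symbol does not reference the global itself, so it does not ship `SA` to the worker. Referencing `SA` inside the remote expression does.

```julia
# Assumes Julia 1.x with the Distributed and SharedArrays standard libraries.
using Distributed
nprocs() < 3 && addprocs(3 - nprocs())
@everywhere using SharedArrays

SA = SharedArray(rand(6, 2))   # global in Main on the master process
w = first(workers())

# Looking the name up by Symbol does not capture SA, so nothing is sent.
before = @fetchfrom w isdefined(Main, :SA)

# Referencing the global SA in the remote expression sends it to worker w,
# where it is defined as a global in Main; identity is only a trigger.
@fetchfrom w identity(SA)

after = @fetchfrom w isdefined(Main, :SA)
println((before, after))
```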

In [8]:
fetch(@spawnat 2 whos())

	From worker 2:	                          Base               Module
	From worker 2:	                          Core               Module
	From worker 2:	                          Main               Module


In [9]:
fetch(@spawnat 3 whos())

	From worker 3:	                          Base               Module
	From worker 3:	                          Core               Module
	From worker 3:	                          Main               Module


In [12]:
whos(r"S[AR]$")

                            SA    381 bytes  6×2 SharedArray{Float64,2}
                            SR    293 bytes  2-element SharedArray{Float64,1}


In [13]:
fetch(@spawnat 2 isdefined(:SA))

false

In [14]:
fetch(@spawnat 2 identity(SA))

6×2 SharedArray{Float64,2}:
 0.297855  0.910513 
 0.719219  0.690423 
 0.104137  0.0844761
 0.979466  0.242356 
 0.321191  0.370191 
 0.370706  0.826142 

In [15]:
fetch(@spawnat 2 isdefined(:SA))

true

In [16]:
fetch(@spawnat 2 whos())

	From worker 2:	                          Base               Module
	From worker 2:	                          Core               Module
	From worker 2:	                          Main               Module
	From worker 2:	                            SA    397 bytes  6×2 SharedArray{Float64,2}


In [17]:
fetch(@spawnat 3 whos())

	From worker 3:	                          Base               Module
	From worker 3:	                          Core               Module
	From worker 3:	                          Main               Module


In [18]:
@spawnat 3 identity(SA)

Future(3, 1, 34, Nullable{Any}())

In [19]:
fetch(@spawnat 3 whos())

	From worker 3:	                          Base               Module
	From worker 3:	                          Core               Module
	From worker 3:	                          Main               Module
	From worker 3:	                            SA    397 bytes  6×2 SharedArray{Float64,2}


In [20]:
@everywhere function foo()
    global SA
    sum(SA[:,1])
end

In [21]:
fetch(@spawnat 2 foo())

2.7925725961535957

In [22]:
@everywhere function bar()
    global SR
    SR[1]
end

In [23]:
fetch(@spawnat 2 bar())

LoadError: [91mOn worker 2:
[91mUndefVarError: SR not defined[39m
#35 at ./distributed/macros.jl:25
#103 at ./distributed/process_messages.jl:264 [inlined]
run_work_thunk at ./distributed/process_messages.jl:56
run_work_thunk at ./distributed/process_messages.jl:65 [inlined]
#96 at ./event.jl:73[39m
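The `UndefVarError` arises because worker 2 has never been sent `SR`, so `global SR` has nothing to refer to. One way around it is to pass the shared array as an argument. The sketch below does this, assuming Julia 1.x. The helper name `first_result` is hypothetical, and the sketch makes no claim about the performance question raised earlier.

```julia
# Assumes Julia 1.x with the Distributed and SharedArrays standard libraries.
using Distributed
nprocs() < 3 && addprocs(3 - nprocs())
@everywhere using SharedArrays

SR = SharedArray{Float64}(2)
fill!(SR, 0.0)
SR[1] = 42.0                   # written on the master, visible through shared memory

# Hypothetical helper: receives the shared array as an argument instead of
# relying on a global, so it works even if the worker has never seen SR.
@everywhere first_result(R) = R[1]

v = remotecall_fetch(first_result, first(workers()), SR)
println(v)
```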

## Quick tests of distributed computation

In [24]:
# using worker 2, compute and store results for first column
# it returns a future object
@spawnat 2 SR[1] = sum(SA[:,1])

Future(2, 1, 45, Nullable{Any}())

In [25]:
# examine SR.  Hopefully it's already filled in by worker 2.
SR

2-element SharedArray{Float64,1}:
 2.79257
 0.0    

In [26]:
# do the same thing with worker 3.  Again, don't wait for it.
@spawnat 3 SR[2] = sum(SA[:,2])

Future(3, 1, 46, Nullable{Any}())

In [27]:
# hooray!
SR

2-element SharedArray{Float64,1}:
 2.79257
 3.1241 
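The cells above read `SR` without waiting for the futures, so they rely on the workers having finished already. A slightly more defensive sketch, assuming Julia 1.x, blocks on both futures before reading the results. `fetch` is used rather than `wait` so that an error on a worker is rethrown on the master.

```julia
# Assumes Julia 1.x with the Distributed and SharedArrays standard libraries.
using Distributed
nprocs() < 3 && addprocs(3 - nprocs())
@everywhere using SharedArrays

SA = SharedArray(rand(6, 2))
SR = SharedArray{Float64}(2)
fill!(SR, 0.0)

w1, w2 = workers()[1:2]
# Each worker writes one column sum into its own slot of SR.
f1 = @spawnat w1 (SR[1] = sum(SA[:, 1]); nothing)
f2 = @spawnat w2 (SR[2] = sum(SA[:, 2]); nothing)

# Block until both remote writes are done; fetch rethrows remote errors.
foreach(fetch, (f1, f2))

println(SR)
```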

## Distributed vs Single Process Computation

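Each worker computes the sum of one column of `SA` and adds that same value to an accumulator 100 million times in a for-loop. The result is stored in the `SR` array. The single-process version runs the same two loops one after the other on the master process.

The distributed version takes about 21.6 seconds of wall-clock time (per `tic`/`toc`) versus about 37.1 seconds for the single-process version, roughly a 1.7x speedup.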

In [28]:
# distributed version
SR[1:2] = 0.0
display(SR)
tic()
f1 = @spawnat 2 SR[1] = begin x = 0.0; for i in 1:100_000_000 x += sum(SA[:,1]); end; x; end;
f2 = @spawnat 3 SR[2] = begin x = 0.0; for i in 1:100_000_000 x += sum(SA[:,2]); end; x; end;
@time wait.([f1, f2])
display(SR)
toc()

2-element SharedArray{Float64,1}:
 0.0
 0.0

2-element SharedArray{Float64,1}:
 2.79257e8
 3.1241e8 

 21.165758 seconds (703.90 k allocations: 37.524 MiB, 0.57% gc time)
elapsed time: 21.593016871 seconds


21.593016871

In [29]:
# single-process version
SR[1:2] = 0.0
display(SR)
tic()
SR[1] = begin x = 0.0; for i in 1:100_000_000 x += sum(SA[:,1]); end; x; end;
SR[2] = begin x = 0.0; for i in 1:100_000_000 x += sum(SA[:,2]); end; x; end;
display(SR)
toc()

2-element SharedArray{Float64,1}:
 0.0
 0.0

2-element SharedArray{Float64,1}:
 2.79257e8
 3.1241e8 

elapsed time: 37.126015899 seconds


37.126015899
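The two timing cells above can be reproduced in a single script along the following lines. This is a sketch assuming Julia 1.x, where `tic`/`toc` no longer exist and `@elapsed` is used instead. The helper `repeat_colsum` and the reduced iteration count `n` are hypothetical choices, not from the notebook. A warm-up call on each process keeps compilation out of the measurements. Timings will vary by machine, so the sketch only checks that both versions produce the same sums.

```julia
# Assumes Julia 1.x with the Distributed and SharedArrays standard libraries.
using Distributed
nprocs() < 3 && addprocs(3 - nprocs())
@everywhere using SharedArrays

# Hypothetical helper: add the sum of column j of A to an accumulator n times,
# mirroring the loop body used in the notebook cells.
@everywhere function repeat_colsum(A, j, n)
    x = 0.0
    for _ in 1:n
        x += sum(A[:, j])
    end
    return x
end

SA = SharedArray(rand(6, 2))
SR = SharedArray{Float64}(2)
n = 1_000_000                 # assumption: far fewer than the notebook's 100 million
w1, w2 = workers()[1:2]

# Warm up (compile) on both workers and on the master.
foreach(fetch, [@spawnat w repeat_colsum(SA, 1, 1) for w in (w1, w2)])
repeat_colsum(SA, 1, 1)

# Distributed: each worker fills one slot of SR; fetch blocks until both finish.
fill!(SR, 0.0)
t_dist = @elapsed begin
    f1 = @spawnat w1 (SR[1] = repeat_colsum(SA, 1, n); nothing)
    f2 = @spawnat w2 (SR[2] = repeat_colsum(SA, 2, n); nothing)
    foreach(fetch, (f1, f2))
end
dist = [SR[1], SR[2]]

# Single process: the same two loops run one after the other on the master.
fill!(SR, 0.0)
t_single = @elapsed begin
    SR[1] = repeat_colsum(SA, 1, n)
    SR[2] = repeat_colsum(SA, 2, n)
end
single = [SR[1], SR[2]]

println("distributed: ", t_dist, " s, single process: ", t_single,
        " s, ratio: ", t_single / t_dist)
```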

## Clean up

Shutting down the compute worker nodes.

In [30]:
rmprocs(workers())

Task (done) @0x000000012130a890

In [31]:
procs()

1-element Array{Int64,1}:
 1