# Protocol Buffers - Interfacing Julia with complex systems

# What is a large/complex system?

Large applications often have:
- Multiple interacting components
- Interfaces to other applications
- Heterogeneous environments
- Separate development/deployment cycles
- Continuous availability requirements

<img src="AppComplexity.png"/>


*Large applications are often more complex than this.*

It is important to:
- define interfaces clearly
- handle changes
- interoperate within a heterogeneous environment

without introducing much inefficiency or complexity.

Protocol Buffers let us do much of that.

# Protocol Buffers

- A data exchange (serialization) format
- Open-sourced by Google (https://developers.google.com/protocol-buffers/)

Think XML, but smaller, faster, and simpler.

- Structured & Extensible
- Language / Platform Neutral
- Compact

## Large applications using ProtoBuf:
- Used internally at Google
- Much of the Hadoop ecosystem:
    - HDFS & YARN
    - HBase
    - Parquet
- Many others (a web search turns up plenty more)
- Used for both:
    - data transmission
    - storage format


## Julia apps using ProtoBuf:
- Elly.jl (https://github.com/JuliaParallel/Elly.jl) - an HDFS and YARN interface

# ProtoBuf.jl - A Julia implementation
- https://github.com/tanmaykm/ProtoBuf.jl
- .proto to .jl code generator
- serialization/deserialization

Getting it:
- Pkg.add("ProtoBuf")
- Install protoc from https://github.com/google/protobuf

```julia
using ProtoBuf
```

# Exploring - Message Structure
- Message is a set of fields
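    - Typed (the type determines the wire format)
    - Tagged with a unique field number (identifies the field on the wire, and whether it is present)
    - Labeled with a rule (required / optional / repeated; repeated numeric fields can be packed)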

Allows validation of:
- field data types
- presence of required fields
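The "typed" and "tagged" bullets describe what actually ends up on the wire: each field is written as a key, `(field_number << 3) | wire_type` encoded as a base-128 varint, followed by the value. The sketch below hand-encodes a `stocks.Quote` (from the schema defined next) in plain Python, the language this notebook also uses for its server, without any protobuf library. The helper names are illustrative assumptions, not part of ProtoBuf.jl or the official protobuf API.

```python
import struct

# Wire types used in this sketch (from the protobuf encoding spec).
VARINT, FIXED64, LENGTH_DELIMITED, FIXED32 = 0, 1, 2, 5


def encode_varint(n: int) -> bytes:
    """Base-128 varint: 7 bits per byte, high bit set on every byte but the last."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative varints")
    out = bytearray()
    while True:
        byte, n = n & 0x7F, n >> 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def field_key(field_number: int, wire_type: int) -> bytes:
    """Every field starts with a key combining its tag and its wire type."""
    return encode_varint((field_number << 3) | wire_type)


def encode_quote(symbol: str, price: float) -> bytes:
    """Hand-encode stocks.Quote: string symbol = 1; double price = 2."""
    sym = symbol.encode("utf-8")
    return (field_key(1, LENGTH_DELIMITED) + encode_varint(len(sym)) + sym
            + field_key(2, FIXED64) + struct.pack("<d", price))  # little-endian double


if __name__ == "__main__":
    data = encode_quote("GOOG", 400.0)
    print(len(data), data.hex())   # prints: 15 0a04474f4f47110000000000007940
```

Note that the field names `symbol` and `price` never appear in the bytes; only the tags 1 and 2 do, which is part of why the format is compact.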

```julia
# Schema for a stock quote and a portfolio of quotes (proto2 syntax)
protodef = """
syntax = "proto2";              // required/optional labels are proto2 features

package stocks;                 // Contain these in the stocks namespace

message Quote {                 // A stock quote has...
    required string symbol = 1; // the stock symbol
    required double price = 2;  // and its price
}

message Portfolio {             // A portfolio has...
    repeated Quote quote = 1;   // a list of quotes
    required int32 count = 2;   // the number of quotes in the portfolio
}
"""

mkpath("/tmp/proto")            # same as `mkdir -p /tmp/proto`

open("/tmp/proto/stocks.proto", "w") do f
    write(f, protodef)
end;
```

```julia
# List each protobuf field type and the Julia type ProtoBuf.jl maps it to
@printf("%s\n%10s %20s\n%s\n", "="^33, "Field Type", "Julia Type", "="^33)
for (k,v) in ProtoBuf.WIRETYPES
    @printf("%10s %20s\n", k, v[4])
end
```

```
=================================
Field Type           Julia Type
=================================
      enum                Int32
    uint64               Uint64
   fixed64               Uint64
    sint32                Int32
       obj                  Any
    string               String
    sint64                Int64
     int64                Int64
  sfixed64                Int64
  sfixed32                Int32
      bool                 Bool
    uint32               Uint32
   fixed32               Uint32
     bytes       Array{Uint8,1}
     int32                Int32
    double              Float64
     float              Float32
```
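Besides a Julia type, each field type in the table maps to one of four wire types in the protobuf encoding spec (the deprecated group wire types 3 and 4 are left out). The Python sketch below lists that mapping and shows why the `sint` types exist: `int32`/`int64` write negative numbers as 10-byte varints, while `sint32`/`sint64` apply ZigZag encoding first so small negative numbers stay small. The function names are illustrative assumptions, not a library API.

```python
# Wire type for each scalar field type, per the protobuf encoding spec:
# 0 = varint, 1 = 64-bit, 2 = length-delimited, 5 = 32-bit.
WIRE_TYPE = {
    "int32": 0, "int64": 0, "uint32": 0, "uint64": 0,
    "sint32": 0, "sint64": 0, "bool": 0, "enum": 0,
    "fixed64": 1, "sfixed64": 1, "double": 1,
    "string": 2, "bytes": 2,   # embedded messages and packed repeated fields also use 2
    "fixed32": 5, "sfixed32": 5, "float": 5,
}


def encode_varint(n: int) -> bytes:
    """Base-128 varint; n must already be non-negative."""
    if n < 0:
        raise ValueError("varints encode non-negative integers")
    out = bytearray()
    while True:
        byte, n = n & 0x7F, n >> 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_int32(n: int) -> bytes:
    """int32: a negative value is sign-extended to 64 bits, so it always takes 10 bytes."""
    return encode_varint(n & 0xFFFFFFFFFFFFFFFF)


def zigzag32(n: int) -> int:
    """ZigZag: 0, -1, 1, -2, 2, ... map to 0, 1, 2, 3, 4, ..."""
    return ((n << 1) ^ (n >> 31)) & 0xFFFFFFFF


def encode_sint32(n: int) -> bytes:
    """sint32: ZigZag first, then varint, so small magnitudes stay short."""
    return encode_varint(zigzag32(n))


if __name__ == "__main__":
    # value, bytes as int32, bytes as sint32
    for value in (1, -1, -100):
        print(value, len(encode_int32(value)), len(encode_sint32(value)))
```

The trade-off: `sint32` is the better choice for fields that are often negative, while `int32` is marginally simpler for values that are almost always non-negative.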


# Exploring - Language & Platform Neutrality

- Java, Python, C++ supported by default
- Many others (including Julia) through [3rd party add-ons](https://github.com/google/protobuf/wiki/Third-Party-Add-ons)
- Add-ons for about 30 different languages

```julia
mkpath("/tmp/proto/py")
mkpath("/tmp/proto/jl")
# Python code generation is built into protoc
run(`protoc --proto_path=/tmp/proto --python_out=/tmp/proto/py /tmp/proto/stocks.proto`)
# --julia_out needs ProtoBuf.jl's protoc-gen-julia plugin on the PATH
run(`protoc --proto_path=/tmp/proto --julia_out=/tmp/proto/jl /tmp/proto/stocks.proto`)
```

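```julia
# A minimal Python server that sends one serialized Portfolio and exits
pysrvr = """
import socket
import stocks_pb2

def add_quote(portfolio, symbol, price):
    q = portfolio.quote.add()   # append a new Quote to the repeated field
    q.symbol = symbol
    q.price = price

def get_portfolio():
    portfolio = stocks_pb2.Portfolio()
    add_quote(portfolio, 'GOOG', 400)
    add_quote(portfolio, 'FB', 80)
    portfolio.count = len(portfolio.quote)
    return portfolio.SerializeToString()

def serve():
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # allow re-binding port 9999 when the server is restarted right away
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(('0.0.0.0', 9999))
    s.listen(1)
    conn, addr = s.accept()
    conn.sendall(get_portfolio())   # send() may write only part of the buffer
    conn.close()
    s.close()

serve()
"""

open("/tmp/proto/py/srvr.py", "w") do f
    write(f, pysrvr)
end;
```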

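```julia
# Julia client: read one Portfolio from the server on localhost:9999 and print it
jlclnt = """
include("stocks.jl")        # module generated by protoc --julia_out
using stocks
using ProtoBuf

clnt = connect(9999)
portfolio = readproto(clnt, Portfolio())
println(portfolio)
close(clnt)
"""

open("/tmp/proto/jl/clnt.jl", "w") do f
    write(f, jlclnt)
end;
```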

```julia
;python /tmp/proto/py/srvr.py &
```

```julia
;julia /tmp/proto/jl/clnt.jl
```

```
stocks.Portfolio([stocks.Quote("GOOG",400.0),stocks.Quote("FB",80.0)],2)
```
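To see what travels from the Python server to the Julia client, here is a minimal hand-rolled Python decoder for `stocks.Portfolio`. `PORTFOLIO_BYTES` is a hand-computed encoding of the same portfolio (GOOG at 400, FB at 80, count 2), assuming the fields are written in field-number order. This is a sketch of the wire format, not how ProtoBuf.jl's `readproto` is implemented. It also performs the required-field check mentioned under "Message Structure".

```python
import struct

# Hand-computed bytes: two length-delimited Quote entries (field 1), then count = 2 (field 2).
PORTFOLIO_BYTES = bytes.fromhex(
    "0a0f0a04474f4f47110000000000007940"   # quote { symbol: "GOOG", price: 400.0 }
    "0a0d0a024642110000000000005440"       # quote { symbol: "FB", price: 80.0 }
    "1002"                                 # count: 2
)


def read_varint(buf: bytes, pos: int):
    """Decode a base-128 varint at buf[pos]; return (value, new_pos)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def read_fields(buf: bytes):
    """Yield (field_number, wire_type, raw_value) for each field in a message."""
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x7
        if wire_type == 0:
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        yield field_number, wire_type, value


def decode_quote(buf: bytes):
    # For a singular field seen twice, the last value wins (as the dict keeps it).
    fields = {num: val for num, _, val in read_fields(buf)}
    if 1 not in fields or 2 not in fields:
        raise ValueError("Quote is missing a required field")
    return fields[1].decode("utf-8"), struct.unpack("<d", fields[2])[0]


def decode_portfolio(buf: bytes):
    quotes, count = [], None
    for num, _, val in read_fields(buf):
        if num == 1:                  # repeated Quote: one entry per occurrence
            quotes.append(decode_quote(val))
        elif num == 2:
            count = val
    if count is None:
        raise ValueError("Portfolio is missing required field 'count'")
    return quotes, count


if __name__ == "__main__":
    print(decode_portfolio(PORTFOLIO_BYTES))   # ([('GOOG', 400.0), ('FB', 80.0)], 2)
```

Any language that can do this byte-level parsing, given the same `.proto` schema, can read the message. That is what makes the Python-to-Julia exchange work.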


## Exploring - Extensible Structures

- Change a field to a compatible type (e.g. int32 to int64)
- Add new optional or repeated fields without breaking older readers
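Adding fields works because every field carries its wire type, so a reader can skip field numbers it does not know. In this Python sketch, a newer writer adds a hypothetical `optional string exchange = 3;` to `Quote`, and a reader built for the original two-field schema skips it. All names here are illustrative. Real protobuf libraries handle unknown fields for you, and typically preserve them.

```python
import struct


def varint(n: int) -> bytes:
    """Base-128 varint of a non-negative integer."""
    out = bytearray()
    while True:
        byte, n = n & 0x7F, n >> 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def read_varint(buf: bytes, pos: int):
    """Decode a varint at buf[pos]; return (value, new_pos)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def string_field(number: int, text: str) -> bytes:
    data = text.encode("utf-8")
    return varint((number << 3) | 2) + varint(len(data)) + data


def encode_quote_v2(symbol: str, price: float, exchange: str) -> bytes:
    """Newer writer: Quote plus a hypothetical `optional string exchange = 3;`."""
    return (string_field(1, symbol)
            + varint((2 << 3) | 1) + struct.pack("<d", price)
            + string_field(3, exchange))


def decode_quote_v1(buf: bytes) -> dict:
    """Older reader that only knows fields 1 (symbol) and 2 (price)."""
    known, pos = {}, 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        number, wire_type = key >> 3, key & 0x7
        # The wire type alone says how many bytes to consume, known field or not.
        if wire_type == 0:
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        if number == 1:
            known["symbol"] = value.decode("utf-8")
        elif number == 2:
            known["price"] = struct.unpack("<d", value)[0]
        # Any other field number (here: 3) has been consumed and is ignored.
    return known


if __name__ == "__main__":
    data = encode_quote_v2("GOOG", 400.0, "NASDAQ")
    print(decode_quote_v1(data))   # {'symbol': 'GOOG', 'price': 400.0}
```

The new field is `optional` for a reason. If it were `required`, new readers would reject messages from old writers that never set it.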

```julia
# Same schema in a new package, with `count` widened from int32 to int64
protodef = """
syntax = "proto2";

package stocks2;                 // Contain these in the stocks2 namespace

message Quote {                  // A stock quote has...
    required string symbol = 1;  // the stock symbol
    required double price = 2;   // and its price
}

message Portfolio {              // A portfolio has...
    repeated Quote quote = 1;    // a list of quotes
    required int64 count = 2;    // the number of quotes <<== changed int32 to int64
}
"""

open("/tmp/proto/stocks2.proto", "w") do f
    write(f, protodef)
end;

run(`protoc --proto_path=/tmp/proto --julia_out=/tmp/proto/jl /tmp/proto/stocks2.proto`)
```

```julia
# Same client, but decoding with the stocks2 (int64 count) schema
jlclnt = """
include("stocks2.jl")
using stocks2
using ProtoBuf

clnt = connect(9999)
portfolio = readproto(clnt, Portfolio())
println(portfolio)
close(clnt)
"""

open("/tmp/proto/jl/clnt2.jl", "w") do f
    write(f, jlclnt)
end;
```

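The Python server is unchanged, so it still writes `count` as an int32. The client now reads it using the int64 schema:

```julia
;python /tmp/proto/py/srvr.py &
```

```julia
;julia /tmp/proto/jl/clnt2.jl
```

```
stocks2.Portfolio([stocks2.Quote("GOOG",400.0),stocks2.Quote("FB",80.0)],2)
```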
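The int32 to int64 change works because both types use the varint wire type, so the bytes for `count = 2` are identical either way. The protobuf language guide treats `int32`, `uint32`, `int64`, `uint64` and `bool` as compatible: a value that doesn't fit the reader's type is truncated as in a C++ cast. A small Python sketch of that reasoning follows. The helpers `as_int32`/`as_int64` are illustrative assumptions, not library functions.

```python
MASK64 = (1 << 64) - 1


def encode_varint(n: int) -> bytes:
    """Base-128 varint of a non-negative integer."""
    if n < 0:
        raise ValueError("mask negative values to 64 bits first")
    out = bytearray()
    while True:
        byte, n = n & 0x7F, n >> 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes) -> int:
    """Decode a single varint from the start of buf."""
    result = 0
    for i, byte in enumerate(buf):
        result |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            break
    return result


def as_int64(raw: int) -> int:
    """Interpret a decoded varint as a signed 64-bit value."""
    raw &= MASK64
    return raw - (1 << 64) if raw >> 63 else raw


def as_int32(raw: int) -> int:
    """Interpret a decoded varint as int32: keep only the low 32 bits."""
    raw &= (1 << 32) - 1
    return raw - (1 << 32) if raw >> 31 else raw


if __name__ == "__main__":
    wire = encode_varint(2)                  # what an int32 writer sends for count = 2
    print(as_int32(decode_varint(wire)), as_int64(decode_varint(wire)))   # 2 2
    wire = encode_varint(-1 & MASK64)        # int32 -1 is sign-extended to 64 bits
    print(as_int32(decode_varint(wire)), as_int64(decode_varint(wire)))   # -1 -1
    wire = encode_varint((1 << 32) + 5)      # too large for an int32 reader
    print(as_int32(decode_varint(wire)), as_int64(decode_varint(wire)))   # 5 4294967301
```

Widening (int32 to int64) is therefore safe in both directions for values that fit in 32 bits. Narrowing only goes wrong when a writer actually sends a larger value.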


## Exploring - Efficiency & Compactness

- Very compact serialized form
- Often much smaller than other serialization formats
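As a back-of-the-envelope check of the compactness claim, the Python sketch below hand-encodes the two-quote portfolio from the earlier example in the protobuf wire format. It then compares the result with the most compact JSON encoding of the same data. The encoder is a hypothetical illustration, not a library API, and one tiny message says little about sizes in general.

```python
import json
import struct


def varint(n: int) -> bytes:
    """Base-128 varint of a non-negative integer."""
    out = bytearray()
    while True:
        byte, n = n & 0x7F, n >> 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def length_delimited(field_number: int, payload: bytes) -> bytes:
    """Wire type 2: key, byte length, then payload (strings, nested messages)."""
    return varint((field_number << 3) | 2) + varint(len(payload)) + payload


def encode_quote(symbol: str, price: float) -> bytes:
    return (length_delimited(1, symbol.encode("utf-8"))
            + varint((2 << 3) | 1) + struct.pack("<d", price))   # double: wire type 1


def encode_portfolio(quotes) -> bytes:
    body = b"".join(length_delimited(1, encode_quote(s, p)) for s, p in quotes)
    return body + varint((2 << 3) | 0) + varint(len(quotes))     # int32 count: wire type 0


quotes = [("GOOG", 400.0), ("FB", 80.0)]
pb = encode_portfolio(quotes)
js = json.dumps(
    {"quote": [{"symbol": s, "price": p} for s, p in quotes], "count": len(quotes)},
    separators=(",", ":"),   # most compact JSON: no whitespace
).encode("utf-8")

if __name__ == "__main__":
    print(len(pb), len(js))   # 34 82
```

Most of the saving comes from replacing repeated field names with one-byte keys. Numbers such as 400.0 cost a fixed 8 bytes as doubles, so for numeric-heavy data the gap can be smaller.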

```julia
using ProtoBuf
using JSON
using HDF5, JLD
using Compat

# A test type mixing scalar, string and array fields
type TestType
    b::Bool
    i32::Int32
    iu32::UInt32
    i64::Int64
    ui64::UInt64
    f32::Float32
    f64::Float64
    s::ASCIIString

    ab::Array{Bool,1}
    ai32::Array{Int32,1}
    ai64::Array{Int64,1}
    af32::Array{Float32,1}
    af64::Array{Float64,1}
    as::Array{AbstractString,1}
end # type TestType
```

```julia
# Each helper serializes `t` with one method and returns the size in bytes

# Julia's built-in serializer
function julia_ser(t::TestType)
    iob = PipeBuffer()
    serialize(iob, t)
    iob.size
end

# ProtoBuf.jl
function proto_ser(t::TestType)
    iob = PipeBuffer()
    writeproto(iob, t)
    iob.size
end

# JSON text
function json_ser(t::TestType)
    iob = PipeBuffer()
    JSON.print(iob, t)
    iob.size
end

# JLD (HDF5-based) file on disk
function jld_ser(t::TestType)
    jldopen("/tmp/t.jld", "w") do file
        write(file, "T", t)
    end
    filesize("/tmp/t.jld")
end;
```

```julia
# Build a TestType instance filled with random values
t = TestType(
        randbool()
        ,rand(-100:100), rand(1:100)
        ,rand(-100:100), rand(1:100)
        ,float32(rand()*100), float64(rand()*100)
        ,randstring(100)
        ,convert(Array{Bool,1}, randbool(100))
        ,round(Int32, 127*rand(50))
        ,round(Int64, 127*rand(50))
        ,rand(Float32, 50)
        ,rand(Float64, 50)
        ,[randstring(10) for i in 1:50]
    )

# Print the serialized size of `t` under each method
@printf("%s\n%15s %20s\n%s\n", "="^43, "Method", "Serialized Size", "="^43)
@printf("%15s %20d\n", "ProtoBuf", proto_ser(t))
@printf("%15s %20d\n", "Julia", julia_ser(t))
@printf("%15s %20d\n", "JSON", json_ser(t))
@printf("%15s %20d\n", "JLD", jld_ser(t))
```

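```
===========================================
         Method      Serialized Size
===========================================
       ProtoBuf                 1835
          Julia                 2163
           JSON                 3254
            JLD                32792
```

All sizes are in bytes. The JLD figure is the size of a whole HDF5 file, so it includes the container's own overhead.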


## ProtoBuf may NOT be the best choice when:
- Serialized data needs to be human-readable
- Self-describing objects are required
    - e.g. when the schema can't be shared between parties
- A schema can't be defined (the data has arbitrary structure)
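The self-description point is easy to see on the wire: without the schema, a reader recovers only field numbers, wire types and raw bytes. `protoc --decode_raw` prints this kind of view. The Python sketch below does the same for a hand-encoded `stocks.Quote("GOOG", 400.0)`. Nothing in its output says the fields are named `symbol` and `price`, or that the 64-bit value is a `double` rather than a `fixed64`. The helper names are illustrative.

```python
WIRE_NAMES = {0: "varint", 1: "64-bit", 2: "length-delimited", 5: "32-bit"}


def read_varint(buf: bytes, pos: int):
    """Decode a varint at buf[pos]; return (value, new_pos)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def dump_raw(buf: bytes):
    """List (field_number, wire_type_name, raw_value) without any schema."""
    fields, pos = [], 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        number, wire_type = key >> 3, key & 0x7
        if wire_type == 0:
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type in (1, 5):
            size = 8 if wire_type == 1 else 4
            value, pos = buf[pos:pos + size], pos + size
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        fields.append((number, WIRE_NAMES[wire_type], value))
    return fields


# Hand-encoded stocks.Quote("GOOG", 400.0)
QUOTE_BYTES = bytes.fromhex("0a04474f4f47110000000000007940")

if __name__ == "__main__":
    for field in dump_raw(QUOTE_BYTES):
        print(field)
```

A length-delimited value is equally ambiguous: the same bytes could be a string, raw `bytes`, or a nested message. Formats like JSON avoid this by carrying names and types in the data, at the cost of size.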