## Communication Protocols

### OSI 7 layer model
![image.png](attachment:image.png)

### Hypertext transfer protocol (HTTP)
* HTTP is a method for encoding and transporting data between a client and a server. 
  + It is a request/response protocol: clients issue requests and servers issue responses with relevant content and completion status info about the request. 
  + HTTP is self-contained, allowing requests and responses to flow through many intermediate routers and servers that perform load balancing, caching, encryption, and compression.
* A basic HTTP request consists of a verb (method) and a resource (endpoint)
* HTTP is an application-layer (layer 7) protocol that relies on lower-level transport protocols: TCP for HTTP/1.1 and HTTP/2, and UDP (via QUIC) for HTTP/3
* gRPC, REST and GraphQL are typically built on top of HTTP
* Request
  + method
  + URL
  + headers
  + body (optional)
* Response
  + status
  + headers
  + body (optional)
* method
  + GET: read
  + POST: create
  + PUT: update (replace the whole resource)
  + DELETE: delete
  + PATCH: partial update
* HTTP status codes
  + 100-199: informational
  + 200-299: successful
  + 300-399: redirection
  + 400-499: client error
  + 500-599: server error
  + 100 Continue
  + 200 OK
  + 201 Created
  + 301 Moved Permanently
  + 401 Unauthorized (the client is not authenticated)
  + 403 Forbidden
  + 404 Not Found
  + 500 Internal Server Error
  + 503 Service Unavailable
* Use HTTP over WebSocket when
  + all communication is client-driven and the server never needs to push data
  + the budget does not allow for maintaining persistent WebSocket connections
  + client-side request throughput is low
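To make the request and response structure above concrete, here is a minimal Python sketch that serializes an HTTP/1.1 request by hand and classifies a status code by its first digit. The host `api.example.com`, the path and the JSON body are placeholders; in practice an HTTP client library builds the request for you. `http.HTTPStatus` is the standard-library enum of status codes and reason phrases.

```python
from http import HTTPStatus

def build_request(method, path, host, headers=None, body=b""):
    """Serialize an HTTP/1.1 request: request line, headers, blank line, optional body."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]  # Host is mandatory in HTTP/1.1
    all_headers = dict(headers or {})
    if body:
        all_headers["Content-Length"] = str(len(body))  # tells the server where the body ends
    lines += [f"{name}: {value}" for name, value in all_headers.items()]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii") + body

STATUS_CLASSES = {1: "informational", 2: "successful", 3: "redirection",
                  4: "client error", 5: "server error"}

def describe_status(code):
    """Map a status code to its class (by first digit) and its standard reason phrase."""
    return STATUS_CLASSES[code // 100], HTTPStatus(code).phrase

request = build_request("POST", "/users", "api.example.com",
                        {"Content-Type": "application/json"}, b'{"name": "Amy"}')
print(request.decode("ascii"))
print(describe_status(503))  # ('server error', 'Service Unavailable')
```

The request line carries the method and resource, headers follow one per line, and an empty line separates them from the optional body, mirroring the method/URL/headers/body breakdown above.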

### Transmission Control Protocol (TCP)
* TCP is a connection-oriented protocol over an IP network. A connection is established and terminated using a handshake. Data is delivered to the destination in the original order and without corruption (or the connection reports a failure) by means of:
  + Sequence numbers and checksum fields for each packet
  + Acknowledgement packets and automatic retransmission
  + If the sender does not receive a correct response (a packet is missed or doesn't pass error-checking), it will resend the packets.
    + If there are multiple timeouts, the connection is dropped
* TCP also implements flow control and congestion control. 
  + These implementations will cause delays and generally result in less efficient transmission than UDP.
* To ensure high throughput, web servers can keep a large number of TCP connections open, resulting in high memory usage
  + It can be expensive to keep a large number of connections open between web server threads and, say, a memcached server
  + Connection pooling can help in addition to switching to UDP where applicable.
* HTTP (up to HTTP/2) and WebSocket are built on top of TCP
* TCP is useful for applications that require high reliability but are less time critical. 
  + Some examples include web servers, database info, SMTP, FTP, and SSH.
* Use TCP over UDP when:
  + all the data need to arrive intact
  + you want the protocol to automatically make the best use of available network throughput (through flow and congestion control)
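A minimal sketch of TCP's connection-oriented, ordered delivery using Python's standard `socket` module: `create_connection` and `accept` complete the handshake, and a hypothetical echo server returns the bytes in the order they were sent. It assumes a loopback interface and lets the OS pick a free port. TCP delivers a byte stream, so the individual `sendall` calls may arrive in any chunking; only the byte order is preserved.

```python
import socket
import threading

def tcp_echo_roundtrip(chunks):
    """Send `chunks` over a localhost TCP connection and return the bytes echoed back."""
    with socket.create_server(("127.0.0.1", 0)) as server:  # bound and listening; port 0 = any free port
        port = server.getsockname()[1]

        def echo_one_client():
            conn, _addr = server.accept()  # returns once the three-way handshake has completed
            with conn:
                while data := conn.recv(4096):  # b"" means the client finished sending (FIN)
                    conn.sendall(data)

        worker = threading.Thread(target=echo_one_client)
        worker.start()
        received = b""
        with socket.create_connection(("127.0.0.1", port)) as client:
            for chunk in chunks:
                client.sendall(chunk)  # lost segments are retransmitted by TCP, not by us
            client.shutdown(socket.SHUT_WR)  # half-close: no more data from the client
            while data := client.recv(4096):
                received += data
        worker.join()
        return received

print(tcp_echo_roundtrip([b"hello ", b"tcp ", b"world"]))  # b'hello tcp world'
```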

### User Datagram Protocol (UDP)
* UDP is connectionless. 
* Datagrams (analogous to packets) are guaranteed only at the datagram level. 
  + Datagrams might reach their destination out of order or not at all. 
  + UDP does not support congestion control. 
  + Without the guarantees that TCP supports, UDP is generally more efficient and faster.
* UDP can broadcast, sending datagrams to all devices on the subnet. 
  + This is useful with DHCP: the client has not yet been assigned an IP address, so it cannot open a TCP connection, which requires addresses on both ends.
* UDP is less reliable but works well in real-time use cases such as VoIP, video chat, streaming, and real-time multiplayer games.
* Use UDP over TCP when:
  + Need lowest latency
  + Late data is worse than loss of data
  + Need to implement your own error correction
* Good for constant streams of data where a lot of data is sent quickly
  + monitoring metrics, video streaming, stock exchanges and gaming
    + losing some data points is acceptable, e.g. CPU utilization sampled 100 times per second
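For contrast, a UDP sketch with the same standard `socket` module: there is no handshake, each `sendto` fires an independent datagram at an address, and the receiver gets either a whole datagram or nothing. It assumes the loopback interface (where loss is rare in practice) and uses a timeout as a stand-in for the loss handling a real metrics collector would need.

```python
import socket

def udp_send_and_receive(datagrams, timeout=1.0):
    """Send each datagram to a local receiver and return whatever arrives."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))  # port 0: the OS picks a free port
        receiver.settimeout(timeout)
        address = receiver.getsockname()
        for datagram in datagrams:
            sender.sendto(datagram, address)  # no connection, no acknowledgement
        received = []
        for _ in datagrams:
            try:
                data, _sender_addr = receiver.recvfrom(65535)  # one whole datagram per call
            except socket.timeout:
                break  # a lost datagram simply never shows up; the application decides what to do
            received.append(data)
        return received

# e.g. CPU utilization samples: occasionally losing one is acceptable
print(udp_send_and_receive([b"cpu=42", b"cpu=43", b"cpu=41"]))
```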

### Long polling
* The client sends a request, and the server holds it open until it has new data or a timeout expires
* Once the server has new data, it responds; after receiving the data, the client immediately opens a new request that stays open until the next update
* Because the server holds each request open instead of answering right away, it sends far fewer empty responses
  + compared with short polling (repeated Ajax requests asking the server for new data), this greatly reduces the number of requests sent to the server
* Hard to implement in some languages and frameworks
* Web servers that dedicate a thread or process to each connection cannot keep many long-lived connections open at once
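The server-side "hold the request" behaviour can be sketched in-process with a `threading.Condition`. This is a hypothetical model rather than an HTTP server: `poll()` plays the role of a held request, `publish()` plays the role of new data arriving, and `cursor` is an assumed way for the client to say which messages it has already seen.

```python
import threading

class LongPollChannel:
    """Hypothetical in-process model of a long-polling endpoint (no HTTP layer)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._messages = []

    def publish(self, message):
        """New data arrives on the server: wake every held request."""
        with self._cond:
            self._messages.append(message)
            self._cond.notify_all()

    def poll(self, cursor, timeout):
        """Hold the request until there are messages after `cursor` or `timeout` seconds pass.

        Returns (new_messages, new_cursor); an empty list is the empty response sent on timeout.
        """
        with self._cond:
            self._cond.wait_for(lambda: len(self._messages) > cursor, timeout=timeout)
            new_messages = self._messages[cursor:]
            return new_messages, cursor + len(new_messages)

# Usage: data is published 0.1 s after the client starts waiting.
channel = LongPollChannel()
threading.Timer(0.1, channel.publish, args=("hello",)).start()
messages, cursor = channel.poll(0, timeout=2.0)       # (["hello"], 1) after about 0.1 s
messages, cursor = channel.poll(cursor, timeout=0.2)  # nothing new: ([], 1) after 0.2 s
```

In a real deployment, the client would loop on `poll`, passing back the returned cursor each time, which is the "open a new request for the next new data" step described above.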

### Remote Procedure call (RPC)

#### Overview of RPC
* RPC is focused on exposing behaviors. 
  + often used for performance reasons with internal communications, as you can hand-craft native calls to better fit your use cases.
* In an RPC, a client calls a procedure to execute on a different address space, usually a remote server. 
  + The procedure is coded as if it were a local procedure call, abstracting away the details of how to communicate with the server from the client program. 
  + In other words, RPC allows a program to execute code on another computer without having to handle the network details
* Remote calls are usually slower and less reliable than local calls so it is helpful to distinguish RPC calls from local calls. 
* Popular RPC frameworks and serialization formats include gRPC (with Protobuf), Thrift, and Avro.
* RPC is a request-response protocol:
  + Client program - calls the client stub procedure. The parameters are pushed onto the stack like a local procedure call
  + Client stub procedure - marshals (packs) procedure id and arguments into a request message
  + Client communication module - OS sends the message from the client to the server
  + Server communication module - OS passes the incoming packets to the server stub procedure
  + Server stub procedure - unmarshals the arguments, calls the server procedure matching the procedure id and passes it the given arguments
  + Server response repeats the steps above in reverse order
* How to implement RPC
  + define the function in an abstract manner (an interface definition) without using a specific programming language
  + pass this definition and your language of choice to a code generator to get a stub for your function
    + the generator takes the description as input and creates an implementation in a particular language
    + the RPC framework takes care of communication and of marshalling and unmarshalling messages
  + this lets services implemented in different languages communicate with each other
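Python's standard library ships XML-RPC, which is enough to show the stub idea from the steps above, even though it is not one of the frameworks listed (it sends XML over HTTP rather than a generated binary format). `ServerProxy` acts as the client stub, marshalling the procedure name and arguments, and `SimpleXMLRPCServer` acts as the server stub. The `add` procedure and the loopback address are illustrative.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

def add(a, b):
    """The remote procedure; it runs in the server's address space."""
    return a + b

def call_add_remotely(a, b):
    # Port 0 lets the OS choose a free port; logRequests=False keeps output quiet.
    with SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False) as server:
        server.register_function(add, "add")
        port = server.server_address[1]
        threading.Thread(target=server.serve_forever, daemon=True).start()
        try:
            # ServerProxy is the client stub: it marshals "add" and its arguments into
            # an XML request and unmarshals the XML response back into a Python value.
            with ServerProxy(f"http://127.0.0.1:{port}/") as proxy:
                return proxy.add(a, b)  # reads like a local call
        finally:
            server.shutdown()

print(call_add_remotely(2, 3))  # 5
```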
  
#### gRPC
* developed by Google
* uses Protobuf as its IDL (Interface Definition Language)
* uses HTTP/2 as transport
* main disadvantage is that browsers cannot call gRPC directly (gRPC-Web, usually with a proxy, is the workaround)
  + so gRPC alone cannot serve both mobile and web clients
* Protobuf
  * is a binary protocol
  * not human readable
  * description stored in .proto files
  * messages are smaller and faster to parse than JSON or XML
  * protoc generates the client and server files together, so they are guaranteed to work together
    + protoc takes a .proto file and the target language as input and outputs the client and server files
* summary of gRPC
  + language agnostic
  + based on Protobuf and HTTP/2
  + binary, more efficient than JSON, but not human readable
  + great for communication between services
  + not supported natively by browsers
  + you define requests, responses and services in .proto files, then generate client and server code with a command-line tool (protoc)
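To show why Protobuf messages are smaller than JSON, the sketch below hand-encodes one integer field using Protobuf's varint wire format, assuming a hypothetical schema `message User { int32 id = 1; }`. It only handles non-negative integers in varint (wire type 0) fields and does not use `protoc` or the official protobuf library; real code should use the generated classes.

```python
import json

def encode_varint(value):
    """Protobuf base-128 varint: 7 bits per byte, least significant group first,
    high bit set on every byte except the last. Non-negative ints only in this sketch."""
    if value < 0:
        raise ValueError("this sketch only encodes non-negative integers")
    out = bytearray()
    while True:
        low_bits = value & 0x7F
        value >>= 7
        if value:
            out.append(low_bits | 0x80)  # more bytes follow
        else:
            out.append(low_bits)
            return bytes(out)

def encode_varint_field(field_number, value):
    """Encode one field: key = (field_number << 3) | wire_type, with wire type 0 (varint)."""
    key = (field_number << 3) | 0
    return encode_varint(key) + encode_varint(value)

# Hypothetical schema: message User { int32 id = 1; }
payload = encode_varint_field(1, 150)
print(payload.hex(), len(payload))   # 089601 3
print(len(json.dumps({"id": 150})))  # 11 bytes for the same data as JSON
```

Field names never go on the wire, only the field number from the .proto file, which is one reason both sides need generated code from the same definition.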
  
#### GraphQL (QL stands for "query language")
* released by Facebook in 2015
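* designed to solve two issues that RESTful APIs have
  + overfetching
  + underfetching
* overfetching
  + the app needs only one field (key/value) of a large JSON object that the API returns
  + this wastes resources on
    + preparing the data (joining tables, running queries and formatting the result as JSON)
    + network transfer
    + the mobile device's CPU and memory spent unmarshalling the JSON into response objects
* underfetching
  + a single endpoint does not return everything the app needs, so the client must make additional requests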
* Based on HTTP
* queries are written in GraphQL's own query language and usually sent in the HTTP request body; responses are JSON
* allows you to define which fields, and which nested entities to return
* consider it when designing external APIs
* example of request and response

```graphql
# request
query {
  user(id: 123) {
    alias
    friends {
      alias
      first_name
      last_name
    }
  }
}

# response (JSON)
{
  "data": {
    "user": {
      "alias": "Logan",
      "friends": [
        { "alias": "Captain American", "first_name": "Amy", "last_name": "Nielsen" },
        { "alias": "Megan", "first_name": "Megan", "last_name": "Lowrence" }
      ]
    }
  }
}
```

* RESTful methods vs GraphQL operations
  + GET corresponds to Query
  + POST, PUT and DELETE correspond to Mutation
* Advantages
  + the client specifies exactly which data to fetch, which improves efficiency
  + allows fetching nested entities with a single request
  + great for reporting systems
* Disadvantages
  + responses are harder to cache with standard HTTP caching, since queries are usually sent via POST to a single endpoint rather than via GET
  + support outside of the JavaScript ecosystem is not great
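The core idea of GraphQL field selection, returning only the fields and nested entities the client asked for, can be sketched with a recursive function. This is a simplified, hypothetical model: the selection is a nested Python dict rather than parsed GraphQL syntax, and the `email` field is an invented extra field included only to show that unrequested data is left out.

```python
def select(obj, selection):
    """Return only the fields named in `selection`.

    `selection` maps a field name to None (a scalar field) or to a nested
    selection applied to an object or to every item of a list.
    """
    result = {}
    for field, sub_selection in selection.items():
        value = obj[field]
        if sub_selection is None:
            result[field] = value
        elif isinstance(value, list):
            result[field] = [select(item, sub_selection) for item in value]
        else:
            result[field] = select(value, sub_selection)
    return result

# Sample data mirroring the example above; `email` is a hypothetical extra field.
USER = {
    "alias": "Logan",
    "email": "logan@example.com",
    "friends": [
        {"alias": "Captain American", "first_name": "Amy", "last_name": "Nielsen", "email": "amy@example.com"},
        {"alias": "Megan", "first_name": "Megan", "last_name": "Lowrence", "email": "megan@example.com"},
    ],
}

# Equivalent of: user(id: 123) { alias friends { alias first_name last_name } }
QUERY = {"alias": None, "friends": {"alias": None, "first_name": None, "last_name": None}}

print(select(USER, QUERY))  # no "email" keys anywhere in the output
```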
  

### REST API

#### Overview of REST API
* REST is an architectural style enforcing a client/server model where the client acts on a set of resources managed by the server. 
* A loose convention (rather than a strict protocol) built on top of HTTP
* Minimizes coupling between client and server
* Often used for public HTTP APIs
* Being stateless, it is good for horizontal scaling and partitioning
* All communication must be stateless and cacheable.
* There are four qualities of a RESTful interface:
  + Identify resources (URI in HTTP)
  + Change with representations (verbs in HTTP): use verbs, headers and body
  + Self-descriptive error message (HTTP status code)
  + Hypertext should be used to find your way through the API
* It standardizes URL structure and HTTP verbs
* RESTful verbs
  + GET - read
    + idempotent (and it should not change anything)
  + POST - create
    + not idempotent: each call may create a new resource
  + DELETE - delete
    + idempotent
  + PUT - update
    + idempotent (same result no matter how many times you execute the operation on the data)
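A small in-memory sketch of why PUT and DELETE are idempotent while POST is not. `UserStore` is a hypothetical stand-in for a server's `/users` collection; each method mirrors the HTTP verb named in its comment.

```python
class UserStore:
    """Hypothetical in-memory collection standing in for a server's /users resource."""

    def __init__(self):
        self.users = {}
        self.next_id = 1

    def put(self, user_id, data):
        # PUT /users/{user_id}: replace the resource at a known URI.
        self.users[user_id] = dict(data)

    def post(self, data):
        # POST /users: the server assigns a new id, so every call creates a new resource.
        user_id = self.next_id
        self.next_id += 1
        self.users[user_id] = dict(data)
        return user_id

    def delete(self, user_id):
        # DELETE /users/{user_id}: once the user is gone, repeating the call changes nothing.
        self.users.pop(user_id, None)

puts = UserStore()
puts.put(123, {"name": "Amy"})
puts.put(123, {"name": "Amy"})
print(len(puts.users))  # 1: same state after one or two PUTs

posts = UserStore()
posts.post({"name": "Amy"})
posts.post({"name": "Amy"})
print(len(posts.users))  # 2: each POST created another user
```

Idempotency is about the resulting server state, not the response: a second DELETE may well return 404 while leaving the state unchanged.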
  
#### RESTFUL URL Patterns
* RESTful URLs
  + METHOD /\[resource\]/id
    + GET /users/123 (not user/123 or /getuser/123)
  + for nested structures, repeat the resource/id pair
    + GET /users/123/books/567 (return book 567 borrowed by user 123)
    + DELETE /users/123/books/567 (delete book 567 borrowed by user 123)
    + GET /users/123/books (return all the books borrowed by user 123)
* RESTful URLs - state
  + METHOD /\[resource/id\]/\[action\]
    + PUT /users/123/enable (set user 123 to enabled)
    + PUT /users/123/disable (set user 123 to disabled)
    + PUT /users/123/online (mark user 123 as online)
    + POST /users/123/likes (add a like to user 123, increasing its like count)
    + PUT usually sets boolean state (e.g. whether user 123 is online), while POST can be used to increment counts
* RESTful URLs - pagination using query strings
  + METHOD /\[resource\]?limit=X&offset=Y
    + GET /books?limit=50&offset=100
  + use query strings because they are optional, so the server can fall back to defaults when they are not set
* Summary
  + use pairs of resource name/resource id to structure your URLs
  + GET shouldn't change anything
  + use PUT to switch states (boolean)
  + provide pagination
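The URL conventions above can be parsed mechanically. This sketch splits a path into resource/id pairs and reads `limit`/`offset` with defaults using `urllib.parse` from the standard library; the default values of 20 and 0 are assumptions chosen for illustration, not part of any standard.

```python
from urllib.parse import parse_qs, urlsplit

DEFAULT_LIMIT = 20   # assumed defaults, chosen for illustration
DEFAULT_OFFSET = 0

def parse_rest_url(url):
    """Split /users/123/books/567 into resource/id pairs and read optional pagination."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    pairs = []
    for i in range(0, len(segments), 2):
        resource = segments[i]
        resource_id = segments[i + 1] if i + 1 < len(segments) else None  # a collection has no id
        pairs.append((resource, resource_id))
    query = parse_qs(parts.query)
    limit = int(query.get("limit", [DEFAULT_LIMIT])[0])
    offset = int(query.get("offset", [DEFAULT_OFFSET])[0])
    return pairs, limit, offset

print(parse_rest_url("/users/123/books/567"))        # ([('users', '123'), ('books', '567')], 20, 0)
print(parse_rest_url("/books?limit=50&offset=100"))  # ([('books', None)], 50, 100)
```

A state URL such as `/users/123/enable` would come out as `("enable", None)`, so a real router would need to treat a trailing action segment separately from a collection name.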

### Web Socket
* A WebSocket connection is a persistent connection
* A bidirectional protocol where communication can be initiated by either the client or the server as long as the connection is open
* Optimized for high-frequency communication
* A full-duplex protocol built on TCP
* The client connects to the server only once
  + once the connection is established, both client and server can send messages at any time, without waiting for a request or response from the other party
* advantages over HTTP
  + connection is established only once
    + reduce the cost to create connections for each request
  + real time message delivery to client
  + no need to poll for messages
    + new messages are pushed as soon as they are available, which reduces latency
    + saves CPU and bandwidth: the client sends no poll requests, and the server does not have to process them
* disadvantages
  + high cost to maintain persistent connections with millions of users when communication is at a low throughput on client-side, or always client-driven
* An example of how WebSocket works
  + U1 and U2 establish HTTP connections with the chat server, which are then upgraded to WebSocket connections
  + When U1 sends a message for U2 via the chat server, the server stores the message along with its status, RECEIVED
  + If the chat server has an open connection with U2, it sends the message to U2 and updates the status to SENT
  + If U2 is not online and there is no open connection between U2 and the server
    + the chat server saves the messages
    + when U2 comes online, it requests that the server send all pending messages
    + the chat server sends all messages with the status RECEIVED and updates their status to SENT
* when to use
  + communication from the client is at a high throughput, it is more cost efficient to persist the connection rather than creating a new one for each request
  + communication is driven by both client and server
* Socket
  + There are two types of sockets: network socket and unix domain socket
  + network socket
    + for remote calls, like a browser accessing a server. Even when the browser and the server are on the same machine, the browser still communicates with the server through a network socket
    + The following is how a network socket works:
      + when you start your server, it binds to a specific port and listens on the corresponding socket
      + when you open localhost:8080 in a browser, the browser opens a socket and connects to the server's socket
  + unix domain socket
    + limited to the same machine, and faster than network sockets because it bypasses the network stack
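Two pieces of the chat example can be sketched in Python. `websocket_accept` computes the `Sec-WebSocket-Accept` header a server returns when upgrading an HTTP connection to WebSocket, following RFC 6455 (SHA-1 of the client's key plus a fixed GUID, base64-encoded). `ChatServer` is a hypothetical in-memory model of the RECEIVED/SENT bookkeeping; each open connection is represented by a plain callable instead of a real WebSocket.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

def websocket_accept(sec_websocket_key):
    """Value of the Sec-WebSocket-Accept header for a client's Sec-WebSocket-Key."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")

RECEIVED, SENT = "RECEIVED", "SENT"

class ChatServer:
    """Hypothetical in-memory model of the chat server's message bookkeeping."""

    def __init__(self):
        self.messages = []     # every stored message, with its status
        self.connections = {}  # user -> callable that pushes text over that user's open connection

    def connect(self, user, push):
        self.connections[user] = push

    def disconnect(self, user):
        self.connections.pop(user, None)

    def send_message(self, to, text):
        msg = {"to": to, "text": text, "status": RECEIVED}  # store first
        self.messages.append(msg)
        push = self.connections.get(to)
        if push is not None:  # recipient has an open connection: deliver now
            push(text)
            msg["status"] = SENT
        return msg

    def request_pending(self, user):
        """Called when `user` comes online: push every message still marked RECEIVED."""
        push = self.connections[user]
        for msg in self.messages:
            if msg["to"] == user and msg["status"] == RECEIVED:
                push(msg["text"])
                msg["status"] = SENT

print(websocket_accept("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=

# Usage: U2 is offline when the first message arrives.
inbox = []
server = ChatServer()
first = server.send_message("U2", "hi")    # status stays RECEIVED
server.connect("U2", inbox.append)
server.request_pending("U2")               # inbox == ["hi"], first["status"] == "SENT"
server.send_message("U2", "still there?")  # delivered immediately over the open connection
```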

### General considerations when selecting protocols
* Whether the APIs are external
  + external means the API is used between external clients and the server
  + internal means communication happens only between servers
  + gRPC support for mobile is limited, and web browsers cannot call gRPC directly
* Support of bidirectional communications
  + WebSocket and UDP support bidirectional communications
  + long polling somewhat supports it
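  + REST and GraphQL queries/mutations don't; gRPC supports bidirectional streaming over HTTP/2, but the call must still be started by the client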
* Support of high throughput with thousands of messages/sec
  + The most performant protocols are WebSockets, gRPC and UDP
  + UDP is faster than WebSockets and gRPC, but less reliable
* Web browser support
  + REST, WebSockets and GraphQL work in web browsers, but gRPC (without a proxy) and raw UDP don't