# Data Streaming{background-color="white" background-image="images/stream-data.png" background-size="50%" background-opacity="1"}

## A bit of history
:::: {.columns}

::: {.fragment .column width="50%"}
![](https://i.imgflip.com/54ff5r.jpg)
[NicsMeme](https://imgflip.com/i/54ff5r)
::: 

::: {.fragment .column width="50%"}
![](https://i.imgflip.com/54ffdn.jpg)
[NicsMeme](https://imgflip.com/i/54ffdn)
:::
::::

# Message-oriented middleware (MOM)
<https://www.researchgate.net/publication/271436605_Extending_message-oriented_middleware_using_interception>

## What is a MOM?

:::: {.columns}

::: {.fragment .column width="50%"}

- Software or hardware infrastructure supporting **sending** and **receiving** messages between _distributed systems_.
- MOM allows _application modules_ to be distributed over **heterogeneous** platforms and reduces the complexity of developing applications that span multiple operating systems and network protocols.
- The **middleware** creates a distributed communications **layer** that insulates the application developer from the details of the various operating systems and network interfaces. 
- MOM typically provides APIs that extend across diverse platforms and networks.
::: 

::: {.fragment .column width="50%"}
![MOM async](images/mom-async.PNG)
:::
::::

### Async interaction model example

:::: {.columns}

::: {.fragment .column width="50%"}
![Traditional call ](images/207px-Google_Voice_icon.png)
::: 

::: {.fragment .column width="50%"}
![Voice Message](images/messaggi-vocali-whatsapp.jpg)
:::
::::

### Blocking async :)
![](images/vocale10min.jpg)
[Source](https://www.facebook.com/BoomFriendzoned/posts/grazie-a-francesco-gottuso/1114123052108547/)

## Advantages
<https://en.wikipedia.org/wiki/Message-oriented_middleware#Advantages>

### Asynchronicity

:::: {.columns}

::: {.fragment .column width="50%"}
1. A client makes an API call to send a message to a destination managed by the __provider__.
2. The call invokes __provider services__ to route and deliver the message.
3. Once it has sent the message, the client can continue to do other work, **confident** that the provider retains the message until a receiving client retrieves it. 

::: 

::: {.fragment .column width="50%"}
![](images/async-pray.jpg)
:::
::::
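
The three steps above can be sketched with Python's standard library. This is only an illustration: `queue.Queue` stands in for the provider, and the names `provider`, `send`, and `receive` are hypothetical. A real MOM would persist messages and run as a separate process.

```python
import queue

# In-memory stand-in for a destination managed by the provider (assumption:
# a real provider would persist messages and live in another process).
provider = queue.Queue()

def send(message):
    """Hand the message to the provider and return immediately."""
    provider.put(message)

def receive():
    """Retrieve the oldest retained message, or None if nothing is waiting."""
    try:
        return provider.get_nowait()
    except queue.Empty:
        return None

send("order-42 created")        # steps 1-2: the API call enqueues the message
other_work = sum(range(10))     # step 3: the client continues with other work
print(receive())                # later, a receiving client retrieves the message
```

The sender never waits for the receiver: the message stays in the queue until someone asks for it.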

### Loosely Coupled
The message-based model, coupled with the mediation of the provider, makes it possible to create a system of loosely coupled components

![](images/coupling-sketches-cropped-1.png){.r-stretch}


### Routing

:::: {.columns}

::: {.fragment .column width="50%"}
- Many message-oriented middleware implementations depend on a message queue system.
- Some implementations permit routing logic to be provided by the messaging layer itself, while others depend on client applications to provide routing information or allow for a mix of both paradigms.
- Some implementations make use of broadcast or multicast distribution paradigms.
::: 

::: {.fragment .column width="50%"}
![](images/message-queue.jpg)

<https://www.slideshare.net/Bozho/overview-of-message-queues>
:::
::::
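
A minimal sketch of the mixed routing paradigm, assuming messages are plain dictionaries. The `route` function, the `destination` field, the rule list, and the queue names are all hypothetical: the client may name a destination itself, otherwise the messaging layer applies its own rules.

```python
def route(message, rules, default="dead-letter"):
    """Prefer a client-supplied destination, else the first matching rule."""
    if "destination" in message:
        return message["destination"]      # routing information from the client
    for predicate, queue_name in rules:
        if predicate(message):
            return queue_name              # routing logic in the messaging layer
    return default                         # nothing matched

# Hypothetical routing rules: (predicate, target queue).
rules = [
    (lambda m: m.get("type") == "invoice", "billing"),
    (lambda m: m.get("priority", 0) >= 5, "urgent"),
]

print(route({"type": "invoice"}, rules))
print(route({"destination": "audit", "type": "invoice"}, rules))
```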


### Transformation
:::: {.columns}

::: {.fragment .column width="50%"}
- In a message-based middleware system, the message received at the destination need _not be identical_ to the message originally sent.
- In conjunction with the routing and broadcast/multicast facilities, one application can send a message in its own native format, and two or more other applications may each receive a copy of the message in their own native format.
- Many modern MOM systems provide sophisticated message transformation (or mapping) tools that let programmers specify transformation rules, often through a simple GUI drag-and-drop operation.
::: 

::: {.fragment .column width="50%"}
![](images/coolmom.webp)

[The MOM with a message](https://raymondmeester.medium.com/the-mom-with-a-message-4a85bf14c04b)
:::
::::
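
As a rough illustration of transformation combined with multicast, the sketch below delivers one message to two receivers, each in its own format. The subscriber names and the `deliver` function are hypothetical, and JSON and CSV are chosen only as examples.

```python
import json

def to_json(msg):
    """Render the message as a JSON object with sorted keys."""
    return json.dumps(msg, sort_keys=True)

def to_csv(msg):
    """Render the message values as one CSV line, ordered by key."""
    return ",".join(str(msg[key]) for key in sorted(msg))

# Hypothetical subscriber table: receiver name -> transformation rule.
subscribers = {"billing": to_json, "warehouse": to_csv}

def deliver(msg):
    """Return the copy each subscriber would receive, in its own format."""
    return {name: transform(msg) for name, transform in subscribers.items()}

copies = deliver({"sku": "A1", "qty": 3})
print(copies)
```

The sender only knows its own dictionary format; the mapping to each receiver's format lives in the middleware.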

## Disadvantages

### Less is more?
- The primary disadvantage of many message-oriented middleware systems is that they require an extra component in the architecture, the message transfer agent (message broker).

- As with any system, adding another component can lead to reductions in performance and reliability, and can also make the system as a whole more difficult and expensive to maintain.

### Synchronous may be a need

:::: {.columns}

::: {.fragment .column width="50%"}
- In addition, many inter-application communications have an intrinsically synchronous aspect, with the sender specifically wanting to wait for a reply to a message before continuing (see real-time computing and near-real-time for extreme cases). 
- Because message-based communication inherently functions asynchronously, it may not fit well in such situations. 
- That said, most MOM systems have facilities to group a request and a response as a single pseudo-synchronous transaction.
- With a synchronous messaging system, the calling function does not return until the called function has finished its task.
::: 

::: {.fragment .column width="50%"}
It applies to humans also

![](https://www.process.st/wp-content/uploads/2024/02/How-to-Combat-Zoom-Fatigue-Synchronous-vs-Asynchronous-Communication-Rev1-07.png)

<https://www.process.st/synchronous-vs-asynchronous-communication/>
:::
::::
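
One common way to get a pseudo-synchronous exchange on top of asynchronous messaging is to tag each request with a correlation id and block on a private reply queue. The sketch below assumes in-memory queues, and the names `call` and `serve_one` are hypothetical.

```python
import queue
import threading
import uuid

requests = queue.Queue()   # stand-in for the request destination
replies = {}               # correlation id -> private reply queue

def call(payload, timeout=1.0):
    """Send a request, then block until the matching reply arrives."""
    correlation_id = str(uuid.uuid4())
    replies[correlation_id] = queue.Queue(maxsize=1)
    requests.put((correlation_id, payload))
    try:
        return replies[correlation_id].get(timeout=timeout)
    finally:
        del replies[correlation_id]

def serve_one():
    """Responder: take one request and reply using its correlation id."""
    correlation_id, payload = requests.get()
    replies[correlation_id].put(payload.upper())

threading.Thread(target=serve_one).start()
result = call("ping")
print(result)
```

The transport stays asynchronous; only the caller chooses to wait, and the timeout bounds how long.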


### Shit happens

:::: {.columns}

::: {.fragment .column width="50%"}
- In a loosely coupled asynchronous system, the calling client can continue to load work upon the recipient until the resources needed to handle this work are depleted and the called component fails. 

- Of course, these conditions can be minimized or avoided by monitoring performance and adjusting message flow, but this is work that is not needed with a synchronous messaging system.
::: 

::: {.fragment .column width="50%"}
![](https://i.imgflip.com/8lwdzn.jpg)
:::
::::
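
One way to adjust message flow is to bound the buffer in front of the recipient, so that an overloaded recipient pushes back on the sender instead of failing. This is a sketch only: the capacity of 3 and the `try_send` helper are assumptions.

```python
import queue

inbox = queue.Queue(maxsize=3)   # bounded buffer in front of the recipient

def try_send(message):
    """Return True if accepted, False if the sender must slow down or retry."""
    try:
        inbox.put_nowait(message)
        return True
    except queue.Full:
        return False

accepted = [try_send(i) for i in range(5)]
print(accepted)
```

Once the recipient drains the buffer, `try_send` starts succeeding again.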


## In conclusion 

- The important thing is to understand the advantages and liabilities of each kind of system. Each system is appropriate for different kinds of tasks. 

- Sometimes, a combination of the two kinds of systems is required to obtain the desired behavior.

# MOM Implementation

## Message Broker

<https://en.wikipedia.org/wiki/Message_broker>

:::: {.columns}

::: {.fragment .column width="50%"}
- A message broker (also known as an integration broker or interface engine) is an intermediary computer program module that translates a message from the formal messaging protocol of the sender to the formal messaging protocol of the receiver.

- Message brokers are elements in telecommunication or computer networks where software applications communicate by exchanging formally defined messages.
::: 

::: {.fragment .column width="50%"}
![](images/message-broker.png)
:::
::::




## Functionalities of a message broker

:::: {.columns}

::: {.fragment .column width="50%"}
**Roles**

- The primary purpose of a broker is to take incoming messages from applications and perform some action on them.
- Message brokers can decouple end-points, meet specific non-functional requirements, and facilitate reuse of intermediary functions.
- For example, a message broker may be used to manage a workload queue or message queue for multiple receivers, providing reliable storage, guaranteed message delivery, and perhaps transaction management.

::: 

::: {.fragment .column width="50%"}
**Actions**

- Route messages to one or more destinations
- Transform messages to an alternative representation
- Perform message aggregation, decomposing messages into multiple messages and sending them to their destination, then recomposing the responses into one message to return to the user
- Interact with an external repository to augment a message or store it
- Invoke web services to retrieve data
- Respond to events or errors
- Provide content and topic-based message routing using the publish–subscribe pattern

:::
::::


## Message Broker Models
<https://www.ibm.com/topics/message-brokers>

### Point to Point messaging

- This is the distribution pattern utilized in message queues with a one-to-one relationship between the message’s sender and receiver. 
- Each message in the queue is sent to only one recipient and is consumed only once. 
- Point-to-point messaging is called for when a message must be acted upon only one time. 

Examples of suitable use cases for this messaging style include payroll and financial transaction processing. In these systems, both senders and receivers need a guarantee that each payment will be sent once and once only.
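
A minimal sketch of point-to-point delivery with two competing workers, assuming an in-memory queue and hypothetical payment ids. Each `get_nowait()` removes the message from the queue, so no other worker can see it.

```python
import queue

payments = queue.Queue()
for payment_id in ("p1", "p2", "p3", "p4"):
    payments.put(payment_id)

def drain(workers):
    """Let workers take turns; each message goes to exactly one of them."""
    processed = {name: [] for name in workers}
    turn = 0
    while not payments.empty():
        name = workers[turn % len(workers)]
        processed[name].append(payments.get_nowait())
        turn += 1
    return processed

processed = drain(["worker-a", "worker-b"])
print(processed)
```

This sketch does not cover failures: a real broker also needs acknowledgements or transactions to keep the "once and once only" guarantee when a worker crashes mid-processing.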


### Publish/subscribe messaging

- In this message distribution pattern, often referred to as “pub/sub,” the producer of each message publishes it to a topic, and multiple message consumers subscribe to topics from which they want to receive messages. 

- All messages published to a topic are distributed to all the applications subscribed to it. 

- This is a broadcast-style distribution method, in which there is a one-to-many relationship between the message’s publisher and its consumers. 

If, for example, an airline were to disseminate updates about the landing times or delay status of its flights, multiple parties could make use of the information: ground crews performing aircraft maintenance and refueling, baggage handlers, flight attendants and pilots preparing for the plane’s next trip, and the operators of visual displays notifying the public. A pub/sub messaging style would be appropriate for use in this scenario.
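
The airline scenario can be sketched with a tiny in-memory broker. The `Broker` class and the topic name `flight.AZ123` are hypothetical; subscribers are plain callbacks, and every subscriber of a topic receives every message published to it.

```python
from collections import defaultdict

class Broker:
    """Toy topic-based broker: one-to-many delivery to registered callbacks."""

    def __init__(self):
        self._subscribers = defaultdict(list)   # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        """Deliver the message to all subscribers; return how many got it."""
        for callback in self._subscribers[topic]:
            callback(message)
        return len(self._subscribers[topic])

broker = Broker()
ground_crew, displays = [], []
broker.subscribe("flight.AZ123", ground_crew.append)
broker.subscribe("flight.AZ123", displays.append)
delivered = broker.publish("flight.AZ123", "delayed 20 min")
print(delivered, ground_crew, displays)
```

The publisher does not know who is listening, which is what makes adding a new consumer cheap.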

### Lots of approaches

### Standardization 

- [JMS](https://jcp.org/en/jsr/detail?id=343): the Java Message Service

### Protocol

| Protocol                          | Name                                       | First Released |
| --------                          | ----                                       | ----           |
| [AMQP](https://www.amqp.org/)     | Advanced Message Queuing Protocol          | 2003           |
| [STOMP](https://stomp.github.io/) | Simple Text Oriented Messaging Protocol    | 2005 (tbc)     |
| [MQTT](https://mqtt.org/)         | Message Queuing Telemetry Transport        | 1999           |
| [XMPP](https://xmpp.org/)         | Extensible Messaging and Presence Protocol | 1999           |


## List of message broker software 
<https://en.wikipedia.org/wiki/Message_broker>

## As a software 

### Apache
- [Apache ActiveMQ / Artemis](https://activemq.apache.org/) (includes HornetQ, ex JBoss)
- [Apache Camel](https://camel.apache.org/)
- [Apache Kafka](https://kafka.apache.org/)
- [Apache Qpid](https://qpid.apache.org/)
- [Apache Thrift](https://thrift.apache.org/)
- [Apache Pulsar](https://pulsar.apache.org/)

### Rest of the world  

- [RabbitMQ](https://www.rabbitmq.com/)
- [Eclipse Mosquitto](https://mosquitto.org/)
- [Redis](https://redis.io/)
 

## As a service 

### Cloud
- [Amazon MQ](https://aws.amazon.com/amazon-mq/)
- [Amazon Simple Queue Service](https://aws.amazon.com/it/sqs/)
- [Amazon Simple Notification Service](https://aws.amazon.com/it/sns/)
- [Amazon Managed Streaming for Apache Kafka](https://aws.amazon.com/it/msk/)

- [Cloud Pub/Sub](https://cloud.google.com/pubsub/docs)
- [Azure Service Bus](https://azure.microsoft.com/en-us/products/service-bus)

### Vendors
- [IBM MQ](https://www.ibm.com/products/mq)
- [HiveMQ](https://www.hivemq.com/) 
- [Confluent](https://www.confluent.io/)

# Without Message Broker
- 0MQ 

<http://wiki.zeromq.org/whitepapers:brokerless>

[![](https://img.youtube.com/vi/s5hs9mw-GGg/0.jpg)](https://www.youtube.com/watch?v=s5hs9mw-GGg)

# From Messages to Events
In a data streaming fashion

i.e. the story of how brokers became stream processors

![](https://media.licdn.com/dms/image/D4D22AQG16L4_JNaVHg/feedshare-shrink_2048_1536/0/1712520852515?e=1715212800&v=beta&t=u7X2r6ETsZdGyGB4OIj_6N4tJi5zZ_7PvgxSWW8V9Mc)

# Data Stream
<https://www.researchgate.net/publication/326508370_Definition_of_Data_Streams>

- A data stream is a countably infinite sequence of elements.

- Different models of data streams exist that take different approaches with respect to the mutability of the stream and to the structure of stream elements.

- Stream processing refers to analyzing data streams on-the-fly to produce new results as new input data becomes available.

- Time is a central concept in stream processing: in almost all models of streams, each stream element is associated with one or more timestamps from a given time domain that might indicate, for instance, when the element was generated, the validity of its content, or when it became available for processing.
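
To make the role of timestamps concrete, here is a small sketch in which every element carries two of them and results are grouped by the time of generation. The `Element` fields, the 10-second window, and the sample values are assumptions, and a finite list stands in for an unbounded stream.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class Element:
    event_time: float     # when the element was generated (seconds)
    arrival_time: float   # when it became available for processing
    value: str

def count_per_window(elements, size=10.0):
    """Count elements per event-time window [k*size, (k+1)*size)."""
    counts = Counter()
    for element in elements:
        counts[int(element.event_time // size)] += 1
    return dict(counts)

sample = [
    Element(1.0, 1.2, "a"),
    Element(12.0, 12.1, "b"),
    Element(8.0, 13.0, "c"),   # arrives late, still belongs to window 0
]
windows = count_per_window(sample)
print(windows)
```

Grouping by arrival time instead would put element `c` in a different window, which is why the choice of timestamp matters. A real processor must also decide when a window is complete.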

# Stream Processing
<https://medium.com/stream-processing/what-is-stream-processing-1eadfca11b97>

Stream processing is a big data technology.

It is used to query a continuous data stream and detect conditions quickly, within a small time period from the time the data is received.

The detection time period varies from a few milliseconds to minutes.

For example, with stream processing, you can receive an alert when the temperature has reached the freezing point, querying data streams coming from a temperature sensor.
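
The temperature alert can be sketched as a generator that looks at each reading as it arrives. The `(timestamp, celsius)` tuple layout and the sample readings are assumptions. An alert fires only on the transition into freezing, not on every cold reading.

```python
def freezing_alerts(readings, threshold=0.0):
    """Yield a reading each time the temperature drops to or below threshold."""
    was_freezing = False
    for timestamp, celsius in readings:
        is_freezing = celsius <= threshold
        if is_freezing and not was_freezing:
            yield (timestamp, celsius)
        was_freezing = is_freezing

# Hypothetical sensor feed; in practice this iterator would never end.
sensor = iter([(0, 3.5), (1, 1.0), (2, -0.5), (3, -1.0), (4, 2.0), (5, 0.0)])
alerts = list(freezing_alerts(sensor))
print(alerts)
```

Because the function is a generator, it holds only one boolean of state and never needs the whole stream in memory.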

It is also known by many names: real-time analytics, streaming analytics, Complex Event Processing, real-time streaming analytics, and event processing. Although some of these terms historically had different meanings, the tools (frameworks) have now converged under the term stream processing.

[See this Quora question for a list of frameworks and the last section of this article for the history](https://www.quora.com/How-is-stream-processing-and-complex-event-processing-CEP-different)

If you want to build the App yourself:

- place events in a message broker topic (e.g. ActiveMQ, RabbitMQ, or Kafka), 

- write code to receive events from topics in the broker (they become your stream)

- publish results back to the broker. 
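
The three steps can be sketched end to end with in-memory queues standing in for broker topics. The topic names `readings` and `averages`, the `run_once` function, and the moving-average computation are all assumptions; with a real broker the `get` and `put` calls would go through its client library.

```python
import queue

# Stand-ins for two broker topics (assumption: a real broker would be remote).
topics = {"readings": queue.Queue(), "averages": queue.Queue()}

def run_once(window=3):
    """Consume pending readings and publish a moving average per event."""
    recent = []
    while True:
        try:
            value = topics["readings"].get_nowait()         # 2. receive events
        except queue.Empty:
            break
        recent = (recent + [value])[-window:]
        topics["averages"].put(sum(recent) / len(recent))   # 3. publish results

for reading in (10, 20, 30, 40):
    topics["readings"].put(reading)                         # 1. place events

run_once()
averages = []
while not topics["averages"].empty():
    averages.append(topics["averages"].get_nowait())
print(averages)
```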

# Data Streaming Software
<https://www.gartner.com/reviews/market/event-stream-processing>

# Stream Processing vs MOM


![](images/momvsstream.svg)

# AWS



Amazon SQS offers a reliable, highly-scalable hosted queue for storing messages as they travel between applications or microservices. It moves data between distributed application components and helps you decouple these components. Amazon SQS provides common middleware constructs such as dead-letter queues and poison-pill management. It also provides a generic web services API and can be accessed by any programming language that the AWS SDK supports. Amazon SQS supports both standard and FIFO queues.

Amazon Kinesis Streams allows real-time processing of streaming big data and the ability to read and replay records to multiple Amazon Kinesis Applications. The Amazon Kinesis Client Library (KCL) delivers all records for a given partition key to the same record processor, making it easier to build multiple applications that read from the same Amazon Kinesis stream (for example, to perform counting, aggregation, and filtering).
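
The idea that all records with the same partition key end up in the same place can be illustrated with a hash-based mapping. This is a simplified sketch, not the KCL implementation: it hashes the key with MD5 and splits the 128-bit hash space into equal ranges, and the `shard_for` helper, the shard count, and the sample records are assumptions.

```python
import hashlib

def shard_for(partition_key, shard_count=4):
    """Map a key to a shard index by splitting the 128-bit MD5 space evenly."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")     # 0 .. 2**128 - 1
    return hash_value * shard_count // 2**128

# Hypothetical records: (partition key, payload).
records = [("sensor-1", 20.5), ("sensor-2", 18.0), ("sensor-1", 21.0)]
by_shard = {}
for key, value in records:
    by_shard.setdefault(shard_for(key), []).append((key, value))
print(by_shard)
```

Since the mapping depends only on the key, one processor per shard sees every record for the keys it owns, which is what makes per-key counting and aggregation straightforward.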

![](images/aws-sqs.png)

![](images/aws-kinesis.jfif)

# Kafka 
Kafka can do both: message queuing and stream processing.

<https://itnext.io/is-kafka-a-message-queue-or-a-stream-processing-platform-7decc3cf1cf>

![](https://miro.medium.com/max/786/1*6iWJA-Pk0t20_h7ry_j5IQ.png)

## Research is ongoing

### A Survey on the Evolution of Stream Processing Systems
Marios Fragkoulis, Paris Carbone, Vasiliki Kalavri, Asterios Katsifodimos

Preprint: 3 Aug 2020. Published: 2023.

<https://link.springer.com/article/10.1007/s00778-023-00819-8>

Stream processing has been an active research field for more than 20 years, but it is now witnessing its prime time due to recent successful efforts by the research community and numerous worldwide open-source communities. This survey provides a comprehensive overview of fundamental aspects of stream processing systems and their evolution in the functional areas of out-of-order data management, state management, fault tolerance, high availability, load management, elasticity, and reconfiguration. We review noteworthy past research findings, outline the similarities and differences between early ('00-'10) and modern ('11-'18) streaming systems, and discuss recent trends and open problems.

![](images/overview-evolution-stream-processing-fragkoulis-2020.png)

## Event-Driven Architectures Done Right, Apache Kafka

[▶![](http://img.youtube.com/vi/A_mstzRGfIE/0.jpg)](https://www.youtube.com/watch?v=A_mstzRGfIE)
