Event Streams in Action

> Home

Chapter 2. The unified log

This helps illustrate the fact that Kafka is serving as a kind of event stream database. (link)

look at it another way, our event consists of two pieces of event metadata (namely, the event and the timestamp), and two business entities (the shopper and the product) (link)

It is up to us to define the internal format of our events—a process we call modeling our events. (link)

We need a way of formalizing this structure further, ideally into a data serialization format that is understandable by humans but also can be parsed by computers (link)
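
To make this concrete, here is a minimal sketch of what such an event might look like once serialized as JSON. The field names and values are assumptions for illustration, not the book's exact schema: two pieces of metadata (the event type and the timestamp) plus two business entities (the shopper and the product).

```python
import json
from datetime import datetime, timezone

# Illustrative event only; field names and values are assumed, not taken from the book.
event = {
    "event": "SHOPPER_VIEWED_PRODUCT",                      # event metadata: what happened
    "timestamp": datetime.now(timezone.utc).isoformat(),    # event metadata: when it happened
    "shopper": {"id": "123", "ipAddress": "70.46.123.145"},  # business entity
    "product": {"sku": "aapl-001", "name": "iPad"},          # business entity
}

print(json.dumps(event, indent=2))
```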

Ordered means that the unified log gives each event in a shard a sequential ID number (sometimes called the offset) that uniquely identifies each message within the shard. (link)

To make it easier to work across a cluster of machines, unified logs tend to divide the events in a given event stream into multiple shards (sometimes referred to as partitions); (link)
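
A toy in-memory sketch (not Kafka's actual implementation) of how sharding and per-shard offsets fit together: each event is routed to a shard by hashing a key, and each shard hands out its own sequential offsets.

```python
from collections import defaultdict
from zlib import crc32

NUM_SHARDS = 4
shards = defaultdict(list)                        # shard id -> ordered list of (offset, event)

def append(key: str, event: dict) -> tuple:
    shard_id = crc32(key.encode()) % NUM_SHARDS   # the same key always lands in the same shard
    offset = len(shards[shard_id])                # next sequential ID within this shard
    shards[shard_id].append((offset, event))
    return shard_id, offset

# Prints the shard id and per-shard offset assigned to this event.
print(append("shopper-123", {"event": "SHOPPER_VIEWED_PRODUCT"}))
```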

A unified log will replicate all events within the cluster. (link)

The unified log is distributed because it lives across a cluster of individual machines. (link)

Kafka is designed to allow a single cluster to serve as the central data backbone for a large organization. (link)

A unified log is an append-only, ordered, distributed log that allows a company to centralize its continuous event streams. (link)

Preface

We believe that reframing your business in terms of a continuous stream of events offers huge benefits. (link)

Chapter 3. Event stream processing with Apache Kafka

The important thing to understand is that we are looking up the shopper’s IP address in MaxMind, and if it’s found, we are attaching the shopper’s country and city to the outgoing enriched event. If anything goes wrong on the way, we write that error message out to the “bad” topic. (link)
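
The book's Chapter 3 code is different, but a rough Python sketch of the same flow, using kafka-python and the MaxMind geoip2 reader, might look like this (topic names and event fields are assumptions):

```python
import json

import geoip2.database                     # MaxMind GeoIP2 database reader
from kafka import KafkaConsumer, KafkaProducer

consumer = KafkaConsumer("raw-events", bootstrap_servers="localhost:9092")
producer = KafkaProducer(bootstrap_servers="localhost:9092")
geo = geoip2.database.Reader("GeoLite2-City.mmdb")

for message in consumer:
    try:
        event = json.loads(message.value)
        location = geo.city(event["shopper"]["ipAddress"])    # look up the shopper's IP in MaxMind
        event["shopper"]["country"] = location.country.name   # attach country...
        event["shopper"]["city"] = location.city.name         # ...and city to the enriched event
        producer.send("enriched-events", json.dumps(event).encode("utf-8"))
    except Exception as err:
        # Anything goes wrong (IP not found, malformed JSON, ...): write the error to the "bad" topic.
        producer.send("bad-events", json.dumps({"error": str(err)}).encode("utf-8"))
```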

In multiple-event processing, we have to read multiple events from the event stream in order to generate some kind of output. (link)
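
A toy sketch of multiple-event processing: state is kept across events, and output is produced only once several events have been read (the threshold and event shape are invented for illustration).

```python
from collections import Counter

views_per_shopper = Counter()

def process(event: dict) -> None:
    """Emit an output only after several events have been observed for one shopper."""
    if event.get("event") != "SHOPPER_VIEWED_PRODUCT":
        return
    shopper_id = event["shopper"]["id"]
    views_per_shopper[shopper_id] += 1
    if views_per_shopper[shopper_id] == 3:        # needs state accumulated across events
        print(f"Shopper {shopper_id} has viewed 3 products")
```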

The first case, single-event processing, is straightforward to implement: we read the next event from our continuous event stream and apply some sort of transformation to it. (link)

Chapter 4. Event stream processing with Amazon Kinesis

We can change that by temporarily creating an arbitrarily large file on our hard drive, using the fallocate command. (link)
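
The book uses the fallocate shell command for this; a roughly equivalent sketch in Python (path and size are assumptions, Linux only) would be:

```python
import os

path, size_bytes = "/tmp/bigfile", 1024 ** 3           # assumed path, roughly 1 GB
fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o644)
try:
    os.posix_fallocate(fd, 0, size_bytes)              # reserve blocks without writing data
finally:
    os.close(fd)
# Later, os.remove(path) frees the space again.
```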

The AWS CLI and boto expose the exact same primitives for stream processing, which is unsurprising, given that the AWS CLI is built on boto! (link)

`import base64`, `base64.b64decode` (link)
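
The Data field that `aws kinesis get-records` returns is base64-encoded, so the payload has to be decoded before it can be read. A tiny sketch (the payload here is made up):

```python
import base64

encoded = base64.b64encode(b'{"event": "SERVER_METRICS"}')   # what the CLI would show as Data
print(base64.b64decode(encoded).decode("utf-8"))              # {"event": "SERVER_METRICS"}
```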

We specified that the shard iterator should be of type TRIM_HORIZON. This is AWS jargon for the oldest events in the shard that have not yet been trimmed—expired for being too old. (link)
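
A hedged boto3 sketch of that sequence (the stream name and record limit are assumptions, and this is not the book's exact code): request a TRIM_HORIZON iterator for each shard, then page through records with it.

```python
import boto3

kinesis = boto3.client("kinesis")
description = kinesis.describe_stream(StreamName="events")["StreamDescription"]

for shard in description["Shards"]:
    iterator = kinesis.get_shard_iterator(
        StreamName="events",
        ShardId=shard["ShardId"],
        ShardIteratorType="TRIM_HORIZON",          # start from the oldest events not yet trimmed
    )["ShardIterator"]
    batch = kinesis.get_records(ShardIterator=iterator, Limit=25)
    for record in batch["Records"]:
        print(record["Data"])                      # boto3 hands back Data already base64-decoded
    # batch["NextShardIterator"] would be passed to the next get_records call.
```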

Before we can read events from our stream, we need to retrieve what Amazon calls a shard iterator for each shard in the stream. (link)

Apache Spark Streaming, which is Spark’s microbatch processing framework, (link)

AWS Lambda is a fully managed stream processing platform running on a Node.js cluster. (link)

Part of the reason that this code is so simple is that the AWS CLI tool that you configured earlier uses boto, the AWS SDK for Python, under the hood. Therefore, boto can access the AWS credentials that you set up earlier in the AWS CLI without any trouble. (link)

Figure 4.1. In push-based systems monitoring, an agent pushes metrics at regular intervals into a centralized system. By contrast, in pull-based architectures, the centralized system regularly scrapes metrics from endpoints available on the servers. (link)
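
In the push-based style, and in this chapter's Kinesis setting, an agent might look roughly like the sketch below: sample a metric on a schedule and push it into the stream (the stream name, partition key, interval, and payload shape are all assumptions).

```python
import json
import shutil
import time

import boto3

kinesis = boto3.client("kinesis")

while True:
    disk = shutil.disk_usage("/")                  # sample a local metric
    metric = {"event": "SERVER_METRICS", "diskFreeBytes": disk.free, "timestamp": time.time()}
    kinesis.put_record(
        StreamName="events",
        Data=json.dumps(metric).encode("utf-8"),
        PartitionKey="server-01",                  # keeps one server's metrics in one shard
    )
    time.sleep(60)                                 # push on a fixed interval; a pull system would scrape instead
```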

For example, Zabbix and Prometheus are predominantly pull-based systems with some push support (link)

This chapter introduces Amazon Kinesis (https://aws.amazon.com/kinesis/), a hosted unified log service available as part of Amazon Web Services. Developed internally at Amazon to solve its own challenges around log collection at scale, Kinesis has extremely similar semantics to Kafka (link)

Chapter 1. Introducing event streams

Have different users consume different versions of our applications (link)

If we can have multiple applications reading from the unified log, then it follows that we can also have multiple versions of the same application processing events from the unified log. This is hugely useful, as it allows us to hot swap our data processing applications—to upgrade our applications without taking them offline. (link)

Whenever the customer looks like they are about to abandon their shopping cart, pop up a coupon in their web browser to coax them into checking out. (link)

Together, the unified log plus Hadoop archive represent our single version of the truth. (link)

A unified log is an append-only log to which we write all events generated by our applications. (link)

See figure 1.9 for an example of a hybrid-era architecture. (link)

The nice thing about NSQ for demonstration purposes is that it is super simple to install and set up (link)

Best practice says that you write the log events to disk as log files, and then use a log collection technology, such as Flume, Fluentd, Logstash, or Filebeat, to collect the log files from the individual servers and ingest them into a tool for systems monitoring or log-file analysis. (link)

Hoshi Ryokan is one of the oldest businesses in the world, having been founded in AD 718. (link)

Simply put, a continuous event stream is an unterminated succession of individual events, ordered by the point in time at which each event occurred. (link)

Fortunately, the definition is simple: an event is anything that we can observe occurring at a particular point in time. (link)

a new architectural pattern called the unified log promises to simplify things again. (link)

> Home