This repository contains an end-to-end demo of the integration between Apache Flink and Prometheus, using the Flink Prometheus connector.
The demo simulates a fleet of hybrid vehicles continuously emitting several types of telemetry events.
The goal is to display real-time dashboards and set up real-time alarms, using Prometheus and Grafana.
To make the telemetry events actionable, the raw events must be pre-processed with Apache Flink before being written to Prometheus.
The demo is designed to run on Amazon Managed Service for Apache Flink, Amazon Managed Streaming for Apache Kafka (MSK) and Amazon Managed Service for Prometheus (AMP), but it can be easily adapted to any other Apache Flink, Apache Kafka and Prometheus versions.
The demo includes the following components:
- Vehicle event generator: a Flink application generating simulated connected-vehicle data and writing it to Kafka.
- Pre-processor: a Flink application reading raw vehicle events from Kafka, pre-processing with enrichment, reducing frequency via aggregation, reducing cardinality, and writing to Prometheus.
- Raw event writer: a second Flink application reading raw vehicle events from Kafka, and writing them, without modifications, to Prometheus.
Example of a raw vehicle event generated by Vehicle event generator:

```json
{
  "eventType": "IC_RPM",
  "vehicleId": "V0000012345",
  "region": "UK",
  "timestamp": 1730621749,
  "measurement": 4356
}
```
Dimension cardinality:

- `eventType`:
  - `IC_RPM`: Internal Combustion engine RPM gauge
  - `ELECTRIC_RPM`: Electric motor RPM gauge (the fictional vehicles are hybrid)
  - `WARNINGS`: count of warning lights currently on in the vehicle
- `region`: cardinality = 20
- `vehicleId`: configurable (see Configuring Vehicle event generator)
All events are written to a single Kafka topic, serialized as JSON.
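For illustration, a raw event with this shape can be modeled and serialized to JSON as follows. This is a hedged sketch: the field names match the example above, but the class itself is hypothetical and not the repository's implementation.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class VehicleEvent:
    eventType: str    # "IC_RPM", "ELECTRIC_RPM" or "WARNINGS"
    vehicleId: str
    region: str
    timestamp: int    # epoch seconds
    measurement: int

event = VehicleEvent("IC_RPM", "V0000012345", "UK", 1730621749, 4356)
payload = json.dumps(asdict(event))  # what would land on the Kafka topic
```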
The Pre-processor application simulates the following operations:

- Enrich raw events adding a `vehicleModel`, based on the `vehicleId`. In the real world this would happen by looking up a reference database, possibly pre-loading or caching the results. For simplicity, in this demo the map `vehicleId` -> `vehicleModel` is hardwired.
- Calculate derived metrics: is the vehicle in motion? The vehicle is considered in motion when either `IC_RPM` or `ELECTRIC_RPM` is not zero.
- Reduce frequency and cardinality: aggregate raw events by `vehicleModel` and `region` over 5-second windows (configurable).
- Write the aggregate metrics to Prometheus.
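The steps above can be sketched in plain Python. This is illustrative only: the model map and event layout are assumptions, and the real application implements this with the Flink DataStream API, not with dictionaries.

```python
# Hardwired vehicleId -> vehicleModel map (hypothetical entries; the demo hardwires its own)
VEHICLE_MODELS = {"V0000000001": "Nebulara", "V0000000002": "Quantumis"}

def is_in_motion(vehicle_events):
    # A vehicle is in motion when either IC_RPM or ELECTRIC_RPM is non-zero
    return any(e["measurement"] != 0
               for e in vehicle_events
               if e["eventType"] in ("IC_RPM", "ELECTRIC_RPM"))

def vehicles_in_motion(window_events):
    """Count unique vehicles in motion per (model, region), for one window's events."""
    per_vehicle = {}
    for e in window_events:
        per_vehicle.setdefault(e["vehicleId"], []).append(e)
    counts = {}
    for vehicle_id, events in per_vehicle.items():
        if is_in_motion(events):
            model = VEHICLE_MODELS.get(vehicle_id, "unknown")
            key = (model, events[0]["region"])
            counts[key] = counts.get(key, 0) + 1
    return counts

window = [
    {"eventType": "IC_RPM", "vehicleId": "V0000000001", "region": "US", "measurement": 4356},
    {"eventType": "ELECTRIC_RPM", "vehicleId": "V0000000001", "region": "US", "measurement": 0},
    {"eventType": "IC_RPM", "vehicleId": "V0000000002", "region": "US", "measurement": 0},
]
```

Only the first vehicle counts as in motion here: its combustion engine RPM is non-zero, while the second vehicle has both RPM readings at zero.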
Example of an aggregate metric written by Pre-processor to Prometheus:

```json
{
  "Labels": [
    { "name": "__name__", "value": "vehicles_in_motion" },
    { "name": "model", "value": "Nebulara" },
    { "name": "region", "value": "US" }
  ],
  "Samples": [
    { "timestamp": 1730621749, "value": 42.0 }
  ]
}
```
The two metrics written by Pre-processor to Prometheus are:

- `vehicles_in_motion`: count of unique vehicles in motion in the 5-second time window, per `model`, per `region`
- `warnings`: count of vehicle warning lights on in the 5-second time window, per `model`, per `region`
Other dimensions' cardinality:

- `region`: cardinality = 20 (from the raw events)
- `model`: cardinality = 8 (for simplicity, models are hardwired in the Pre-processor)
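A quick back-of-the-envelope check of the cardinality reduction, using the numbers above. The vehicle count on the raw side is an assumed example, not a value from the demo:

```python
# Aggregated side: 2 metrics x 8 models x 20 regions
aggregated_series = 2 * 8 * 20           # at most 320 time series

# Raw side (what Raw event writer produces): 3 event types per vehicle;
# region is a property of the vehicle, so it does not multiply further.
# Assuming e.g. 10,000 simulated vehicles:
raw_series = 3 * 10_000                  # 30,000 time series
```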
You can run the demo on AWS, using Amazon Managed Service for Apache Flink, Amazon Managed Streaming for Apache Kafka (MSK), Amazon Managed Service for Prometheus (AMP), and Amazon Managed Grafana.
Use the provided CDK stack to set up the VPC and the MSK cluster, and to build and deploy the Managed Service for Apache Flink applications.
Alternatively, you can set up the resources manually, following the step-by-step instructions.
Additionally, you create and run 3 Amazon Managed Service for Apache Flink applications:
- Vehicle data generator
- Pre-processor
- Raw event writer
Regardless of whether you create the stack using CDK or manually, some additional steps are required to set up Grafana and the dashboard, following these instructions.
For testing and development, you can run the demo partly locally:
- Run the Flink applications, Vehicle event generator, Pre-processor, and Raw event writer directly in IntelliJ. You do not need to install Apache Flink locally.
- Run Kafka locally using the provided Docker Compose stack. This simplifies the development setup by not requiring access to MSK from your development machine.
- You can use Amazon Managed Prometheus and Amazon Managed Grafana when running the Flink applications locally, without setting up any special connectivity, as long as your machine uses AWS credentials with Remote-Write permissions to the AMP workspace.
See Setup for running the demo locally for details.
Runtime configuration of the Flink jobs.
Runtime properties for Vehicle event generator.
To configure the application for running locally modify this JSON file.
| Group ID | Key | Default | Description |
|---|---|---|---|
| `KafkaSink` | `bootstrap.servers` | N/A | Kafka cluster bootstrap servers, for unauthenticated plaintext |
| `KafkaSink` | `topic` | `vehicle-events` | Topic name |
| `DataGen` | `vehicles` | `1` | Number of simulated vehicles |
| `DataGen` | `events.per.sec` | N/A | Number of simulated vehicle events generated per second |
| `DataGen` | `prob.motion.state.change` | `0.01` | Probability of each simulated vehicle changing its motion status, every time an event for that vehicle is generated (double, between `0.0` and `1.0`) |
| `DataGen` | `prob.warning.change` | `0.001` | Probability that the number of warning lights changes in each simulated vehicle, every time an event for that vehicle is generated (double, between `0.0` and `1.0`) |
Note: the "probability" parameters are used to make the randomly generated data a bit more "realistic" but they are not really important for the demo.
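One way the probability parameters could drive the simulation, as a hedged sketch (the real generator's logic may differ):

```python
import random

def next_motion_state(in_motion, prob_change, rng):
    # On each generated event, flip the vehicle's motion state with probability prob_change
    if rng.random() < prob_change:
        return not in_motion
    return in_motion

# With prob.motion.state.change = 0.0 the state never changes;
# with 1.0 it flips on every event; the default 0.01 flips rarely.
rng = random.Random(42)
```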
Runtime properties for Pre-processor.
To configure the application for running locally modify this JSON file.
| Group ID | Key | Default | Description |
|---|---|---|---|
| `KafkaSource` | `bootstrap.servers` | N/A | Kafka cluster bootstrap servers, for unauthenticated plaintext |
| `KafkaSource` | `topic` | `vehicle-events` | Topic name |
| `KafkaSource` | `group.id` | `pre-processor` | Consumer Group ID |
| `KafkaSource` | `max.parallelism` | N/A | Max parallelism of the source operator. It can be used to limit the parallelism when the application parallelism is greater than the number of partitions in the source topic. If not specified, the source uses the application parallelism. |
| `Aggregation` | `window.size.sec` | `5` | Aggregation window, in seconds |
| `PrometheusSink` | `endpoint.url` | N/A | Prometheus Remote-Write endpoint URL |
| `PrometheusSink` | `max.request.retry` | `100` | Max number of retries for retryable write errors (e.g. throttling) before the sink discards the write request and continues |
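The `max.request.retry` semantics can be sketched as follows. This is illustrative only: the actual connector's retry and backoff behavior is more elaborate.

```python
def write_with_retry(write_fn, request, max_retries):
    """Retry retryable errors up to max_retries times, then discard and continue."""
    for _ in range(max_retries + 1):
        if write_fn(request):      # True on success, False on a retryable error
            return "written"
    return "discarded"             # give up on this request so the stream keeps flowing

# A writer that is always throttled: the request is eventually dropped
always_throttled = lambda request: False
```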
Runtime properties for Raw event writer.
To configure the application for running locally modify this JSON file.
| Group ID | Key | Default | Description |
|---|---|---|---|
| `KafkaSource` | `bootstrap.servers` | N/A | Kafka cluster bootstrap servers, for unauthenticated plaintext |
| `KafkaSource` | `topic` | `vehicle-events` | Topic name |
| `KafkaSource` | `group.id` | `pre-processor` | Consumer Group ID |
| `KafkaSource` | `max.parallelism` | N/A | Max parallelism of the source operator. It can be used to limit the parallelism when the application parallelism is greater than the number of partitions in the source topic. If not specified, the source uses the application parallelism. |
| `PrometheusSink` | `endpoint.url` | N/A | Prometheus Remote-Write endpoint URL |
| `PrometheusSink` | `max.request.retry` | `100` | Max number of retries for retryable write errors (e.g. throttling) before the sink discards the write request and continues |
Consult the Amazon Managed Service for Prometheus Quotas for the default maximum Ingestion throughput per workspace.
If the ingestion to Prometheus exceeds this limit, the Flink application is throttled. This can easily happen with the Raw event writer. We recommend creating a separate AMP workspace to test writing raw events.
Note that this limit is a soft quota. You can request a quota increase for real environments. Consult the AMP Quota documentation for details.
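A rough calculation of why Raw event writer is the application likely to be throttled. The event rate is an assumed example; check your AMP workspace quota for the actual ingestion limit:

```python
events_per_sec = 50_000                  # example DataGen events.per.sec setting

# Raw event writer: one Prometheus sample per raw event
raw_samples_per_sec = events_per_sec

# Pre-processor: at most one sample per metric/model/region combination per window
window_size_sec = 5
metrics, models, regions = 2, 8, 20
agg_samples_per_sec = metrics * models * regions / window_size_sec
```

The aggregated write rate is bounded by the label cardinality and the window size (64 samples/second here), independent of the event rate, while the raw write rate grows linearly with it.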