[INLONG-2029] add pulsar example document for the InLong (apache#230)

Co-authored-by: dockerzhang <dockerzhang@tencent.com>

1 parent b4c2c6b, commit fe415de

Showing 16 changed files with 188 additions and 10 deletions.

@@ -0,0 +1,90 @@

---
title: Pulsar Example
sidebar_position: 2
---
Apache InLong has added the ability to ingest data through Apache Pulsar. It takes full advantage of the technical strengths that set Pulsar apart from other message queues (MQ) and provides a complete solution for data ingestion scenarios with higher data-quality requirements, such as finance and billing.
In the following sections, we walk through a complete example of ingesting data with Apache InLong through Apache Pulsar.

![Pulsar Example Architecture](img/pulsar-arch.png)

## Install Pulsar
Please refer to the [official installation guide](https://pulsar.apache.org/docs/en/standalone/).

## Install Hive
Hive is a required component. If you don't have Hive on your machine, we recommend installing it with Docker. Details can be found [here](https://github.com/big-data-europe/docker-hive).

> Note that if you use Docker, you need to add the port mapping `8020:8020` to the namenode, because it is the port of the HDFS DefaultFS, which we will use later.

## Install InLong
Before we begin, we need to install InLong. Two installation methods are provided:
1. Install InLong with Docker by following the [instructions here](deployment/docker.md) (recommended).
2. Install InLong from the binary package by following the [instructions here](deployment/bare_metal.md).

Unlike InLong TubeMQ, if you use Apache Pulsar, you need to configure the Pulsar cluster information
when installing the Manager component. The format is as follows:
```properties
# Pulsar admin URL
pulsar.adminUrl=http://127.0.0.1:8080,127.0.0.2:8080,127.0.0.3:8080
# Pulsar broker address
pulsar.serviceUrl=pulsar://127.0.0.1:6650,127.0.0.2:6650,127.0.0.3:6650
# Default tenant of Pulsar
pulsar.defaultTenant=public
```
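Both URLs use the multi-host form that Pulsar accepts: the scheme is written once, followed by a comma-separated list of `host:port` pairs. As a quick sanity check before starting the Manager, the Python sketch below parses simple `key=value` lines and expands each multi-host URL, rejecting malformed or repeated hosts. The helper names are hypothetical (not part of InLong), and the parser assumes the plain format shown above, ignoring Java properties features such as escapes and line continuations.

```python
# Minimal sketch, assuming plain "key=value" lines as in the snippet above.
# Real Java .properties files also allow ":" separators, escapes and
# line continuations, which this helper deliberately ignores.


def parse_properties(text: str) -> dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def split_multi_host_url(url: str) -> list[str]:
    """Expand 'scheme://h1:p1,h2:p2' into one URL per host.

    Raises ValueError on a missing scheme, a malformed host:port pair,
    or a host listed more than once.
    """
    scheme, sep, hosts = url.partition("://")
    if not sep:
        raise ValueError(f"missing scheme in {url!r}")
    urls: list[str] = []
    for host_port in hosts.split(","):
        host, colon, port = host_port.strip().rpartition(":")
        if not colon or not host or not port.isdigit():
            raise ValueError(f"expected host:port, got {host_port!r}")
        urls.append(f"{scheme}://{host}:{port}")
    # The same broker listed twice adds no redundancy, so flag it.
    if len(set(urls)) != len(urls):
        raise ValueError(f"duplicate host in {url!r}")
    return urls


if __name__ == "__main__":
    config = """\
# Pulsar admin URL
pulsar.adminUrl=http://127.0.0.1:8080,127.0.0.2:8080,127.0.0.3:8080
# Pulsar broker address
pulsar.serviceUrl=pulsar://127.0.0.1:6650,127.0.0.2:6650,127.0.0.3:6650
"""
    props = parse_properties(config)
    for key in ("pulsar.adminUrl", "pulsar.serviceUrl"):
        print(key, split_multi_host_url(props[key]))
```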
## Create a data access
### Configure data stream group information
![](img/pulsar-group.png)

When creating the data access, select Pulsar as the message middleware for the data stream group. Other Pulsar-related configuration items include:
- Queue module: Parallel or Serial. When Parallel is selected, you can set the number of topic partitions; Serial uses a single partition.
- Write quorum: number of copies stored for each message
- Ack quorum: number of guaranteed copies (acks to wait for before a write is complete)
- retention time: how long consumed (acknowledged) messages are retained
- ttl: the default time to live for unacknowledged messages
- retention size: maximum size of consumed (acknowledged) messages to retain
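The constraints between these fields can be expressed as a small validation routine. The sketch below is hypothetical: the class, its field names and the `"PARALLEL"`/`"SERIAL"` values are illustrative and do not come from InLong Manager. It encodes two rules: Serial mode uses a single partition, and the ack quorum must be at least 1 and no larger than the write quorum, since a write cannot wait for more acknowledgements than the copies it stores.

```python
from dataclasses import dataclass


@dataclass
class PulsarGroupSettings:
    """Hypothetical mirror of the Pulsar-related form fields above.

    Field names and the "PARALLEL"/"SERIAL" values are assumptions for
    illustration, not InLong Manager's actual API.
    """
    queue_module: str   # "PARALLEL" or "SERIAL"
    partitions: int     # only used when queue_module == "PARALLEL"
    write_quorum: int   # copies stored for each message
    ack_quorum: int     # acks to wait for before a write completes

    def effective_partitions(self) -> int:
        # Serial mode always uses a single partition.
        return 1 if self.queue_module == "SERIAL" else self.partitions

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the settings look OK."""
        errors: list[str] = []
        if self.queue_module not in ("PARALLEL", "SERIAL"):
            errors.append(f"unknown queue module {self.queue_module!r}")
        elif self.queue_module == "PARALLEL" and self.partitions < 1:
            errors.append("parallel mode needs at least one partition")
        if self.ack_quorum < 1:
            errors.append("ack quorum must be at least 1")
        # A write cannot wait for more acks than the copies it writes.
        if self.ack_quorum > self.write_quorum:
            errors.append(
                f"ack quorum ({self.ack_quorum}) cannot exceed "
                f"write quorum ({self.write_quorum})"
            )
        return errors


if __name__ == "__main__":
    print(PulsarGroupSettings("SERIAL", 8, 3, 2).effective_partitions())  # 1
    print(PulsarGroupSettings("PARALLEL", 4, 2, 3).validate())
```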
### Configure data stream
![](img/pulsar-stream.png)

When configuring the message source, refer to [File Agent configuration](https://inlong.apache.org/docs/next/modules/agent/file#file-agent-configuration) for the file path of a file data source.

### Configure data information
![](img/pulsar-data.png)

### Configure Hive cluster
Save the Hive cluster information and click **Ok** to submit.
![](img/pulsar-hive.png)

## Data access approval
Go to the **Approval** page, click **My Approval**, and approve the data access application. After the approval,
the topics and subscriptions required by the data stream are created in the Pulsar cluster synchronously.
We can use the command-line tool in the Pulsar cluster to check whether the topics were created successfully:
![](img/pulsar-topic.png)
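When reading that output, keep in mind that a partitioned Pulsar topic is stored as internal topics named `<topic>-partition-<i>`, so listing the topics of a namespace (for example with `pulsar-admin topics list public/b_test_group`) shows one entry per partition. The Python sketch below builds the expected names and reports any that are missing from the command output. It assumes the naming seen in the Troubleshooting example (tenant `public`, namespace = group ID `b_test_group`, topic = stream ID `test_stream`) and one topic name per output line; the helper names and the partition count are hypothetical.

```python
def full_topic_name(tenant: str, namespace: str, topic: str) -> str:
    """Build a persistent topic name: persistent://tenant/namespace/topic."""
    return f"persistent://{tenant}/{namespace}/{topic}"


def partition_names(topic: str, partitions: int) -> list[str]:
    """Names of the internal topics backing a partitioned topic."""
    return [f"{topic}-partition-{i}" for i in range(partitions)]


def missing_topics(expected: list[str], listed_output: str) -> list[str]:
    """Return expected names that do not appear in the CLI output.

    Assumes one topic per line; surrounding whitespace, quotes and
    trailing commas (as in JSON-style output) are stripped.
    """
    listed = {line.strip().strip('",') for line in listed_output.splitlines()}
    return [name for name in expected if name not in listed]


if __name__ == "__main__":
    # Hypothetical values based on this example; adjust to your group/stream.
    topic = full_topic_name("public", "b_test_group", "test_stream")
    expected = partition_names(topic, partitions=3)
    # Replace with the real output of the topic-listing command.
    sample_output = "persistent://public/b_test_group/test_stream-partition-0\n"
    print(missing_topics(expected, sample_output))
```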
## Configure File Agent
When configuring the File Agent, create the file in the directory specified when creating the data access:
```shell
mkdir -p /data
touch /data/test_file.txt
```
Write data to the file following the data source format specified when creating the data stream:
```shell
printf '1|test\n2|test\n' >> /data/test_file.txt
```
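As a local check that the test file matches the `|`-delimited format configured for the stream, the sketch below appends records the same way the shell's `>>` does and reads them back into fields, skipping blank lines. The function names and the use of a temporary directory instead of `/data` are assumptions for illustration; the real delimiter and field order come from the data format configured for the data stream.

```python
import os
import tempfile


def append_rows(path: str, rows: list[list[str]], delimiter: str = "|") -> None:
    """Append one delimited line per row, like `>>` in the shell."""
    with open(path, "a", encoding="utf-8") as f:
        for row in rows:
            f.write(delimiter.join(row) + "\n")


def read_rows(path: str, delimiter: str = "|") -> list[list[str]]:
    """Read the file back, skipping blank lines such as a stray trailing newline."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n").split(delimiter) for line in f if line.strip()]


if __name__ == "__main__":
    # Use a temporary directory instead of /data so the sketch runs anywhere.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "test_file.txt")
        append_rows(path, [["1", "test"], ["2", "test"]])
        print(read_rows(path))  # [['1', 'test'], ['2', 'test']]
```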
## Data check
Finally, log in to the Hive cluster and use Hive SQL commands to check
whether data has been successfully inserted into the `test_stream` table.

## Troubleshooting
If data is not correctly written to the Hive cluster, check whether the `DataProxy` and `Sort` related information has been synchronized:
- Check whether the topic information for the data stream is correctly written to the `conf/topics.properties` file of `InLong DataProxy`:
```properties
b_test_group/test_stream=persistent://public/b_test_group/test_stream
```
- Check whether the configuration of the data stream has been successfully pushed to the ZooKeeper monitored by `InLong Sort`, replacing `{{sink_id}}` with the sink ID:
```
get /inlong_hive/dataflows/{{sink_id}}
```
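The `topics.properties` check above can also be scripted. This Python sketch (hypothetical helper names) looks up the `group/stream` key in the contents of `topics.properties` and compares the value with the naming visible in the example line, where the namespace is the group ID and the topic is the stream ID under the `public` tenant. That convention is inferred from this single example, so treat a mismatch as a prompt to investigate rather than a definitive error.

```python
from typing import Optional


def find_stream_topic(properties_text: str, group_id: str, stream_id: str) -> Optional[str]:
    """Return the topic mapped to 'group_id/stream_id', or None if absent."""
    wanted = f"{group_id}/{stream_id}"
    for raw in properties_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep and key.strip() == wanted:
            return value.strip()
    return None


def matches_convention(topic: str, tenant: str, group_id: str, stream_id: str) -> bool:
    """Assumed convention from the example: persistent://tenant/group/stream."""
    return topic == f"persistent://{tenant}/{group_id}/{stream_id}"


if __name__ == "__main__":
    # In practice, read this text from DataProxy's conf/topics.properties.
    text = "b_test_group/test_stream=persistent://public/b_test_group/test_stream\n"
    topic = find_stream_topic(text, "b_test_group", "test_stream")
    if topic is None:
        print("no entry for b_test_group/test_stream")
    else:
        print(topic, matches_convention(topic, "public", "b_test_group", "test_stream"))
```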
Binary files added:

- `i18n/zh-CN/docusaurus-plugin-content-docs/current/quick_start/img/pulsar-arch.png` (+8.85 KB)
- `i18n/zh-CN/docusaurus-plugin-content-docs/current/quick_start/img/pulsar-data.png` (+21.7 KB)
- `i18n/zh-CN/docusaurus-plugin-content-docs/current/quick_start/img/pulsar-group.png` (+25.5 KB)
- `i18n/zh-CN/docusaurus-plugin-content-docs/current/quick_start/img/pulsar-hive.png` (+23.5 KB)
- `.../zh-CN/docusaurus-plugin-content-docs/current/quick_start/img/pulsar-stream.png` (+21.6 KB)
- `i18n/zh-CN/docusaurus-plugin-content-docs/current/quick_start/img/pulsar-topic.png` (+32.6 KB)

`i18n/zh-CN/docusaurus-plugin-content-docs/current/quick_start/pulsar_example.md`: 88 changes (88 additions, 0 deletions)

@@ -0,0 +1,88 @@

---
title: Example of Using Pulsar
sidebar_position: 2
---
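Apache InLong has added the ability to ingest data through Apache Pulsar. It makes full use of the technical strengths that distinguish Pulsar from other MQs and provides a complete solution for data ingestion scenarios with higher data-quality requirements, such as finance and billing.
In the following sections, we use a complete example to show how to ingest data with Apache InLong using Apache Pulsar.

![Pulsar Example Architecture](img/pulsar-arch.png)
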
## Install Pulsar
To deploy an Apache Pulsar cluster, refer to the [official installation guide](https://pulsar.apache.org/docs/en/standalone/).

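## Install Hive
Hive is a required component. If Hive is not installed on your machine, we recommend using Docker for a quick installation. For details, see [here](https://github.com/big-data-europe/docker-hive).

> Note: if you use the Docker image above, add a port mapping `8020:8020` to the namenode, because it is the port of the HDFS DefaultFS and is needed later when configuring Hive.
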
## Install InLong
Before we begin, we need to install all InLong components. Two methods are provided:
1. Deploy quickly with Docker by following the [instructions here](deployment/docker.md) (recommended).
2. Install each component from the binary packages by following the [instructions here](deployment/bare_metal.md).

Unlike InLong TubeMQ, if you use Apache Pulsar, you need to configure the Pulsar cluster information when installing the Manager component, in the following format:
```properties
# Pulsar admin URL
pulsar.adminUrl=http://127.0.0.1:8080,127.0.0.2:8080,127.0.0.3:8080
# Pulsar broker address
pulsar.serviceUrl=pulsar://127.0.0.1:6650,127.0.0.2:6650,127.0.0.3:6650
# Default tenant of Pulsar
pulsar.defaultTenant=public
```

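## Create a data access
### Configure data stream group information
![](img/pulsar-group.png)

When creating a data access, select Pulsar as the message middleware for the data stream group. Other Pulsar-related configuration items include:
- Queue module: the queue model, either parallel or serial. In parallel mode you can set the number of topic partitions; serial mode uses a single partition.
- Write quorum: number of replicas written for each message
- Ack quorum: number of bookies that must confirm a write
- retention time: how long messages already acknowledged by consumers are retained
- ttl: expiration time for messages that have not been acknowledged
- retention size: how much data from messages already acknowledged by consumers is retained
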
### Configure data stream
![](img/pulsar-stream.png)

When configuring the message source, for the file path of a file data source, refer to the [File Agent guide](https://inlong.apache.org/docs/next/modules/agent/file#file-agent-configuration) in inlong-agent.

### Configure data format
![](img/pulsar-data.png)

### Configure Hive cluster
Save the Hive cluster information and click **OK**.
![](img/pulsar-hive.png)

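## Data access approval
Go to the **Approval Management** page, click **My Approvals**, and approve the data access application submitted above. After the approval, the topics and subscriptions required by the data stream are created synchronously in the Pulsar cluster.
We can use the command-line tool in the Pulsar cluster to check whether the topics were created successfully:
![](img/pulsar-topic.png)
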
## Configure File Agent
When configuring the File Agent, create the file in the directory specified when the data access was created:
```shell
mkdir -p /data
touch /data/test_file.txt
```

Write data to the file in the data source format specified when creating the data stream (you can write more data in the same format):
```shell
printf '1|test\n2|test\n' >> /data/test_file.txt
```

## Data check

Finally, log in to the Hive cluster and use Hive SQL commands to check whether data has been successfully inserted into the `test_stream` table.

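## Troubleshooting
If data is not correctly written to the Hive cluster, check whether the `DataProxy` and `Sort` related information has been synchronized:
- Check whether the topic information for the data stream has been correctly written to the `conf/topics.properties` file of `InLong DataProxy`:
```properties
b_test_group/test_stream=persistent://public/b_test_group/test_stream
```
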
- Check whether the configuration of the data stream has been successfully pushed to the ZooKeeper that InLong Sort listens on:
```
get /inlong_hive/dataflows/{{sink_id}}
```