[INLONG-1939] add basic concepts for InLong (apache#221)
Co-authored-by: dockerzhang <dockerzhang@tencent.com>
dockerzhang and dockerzhang committed Dec 9, 2021
1 parent 8ec2130 commit 74e006e
Showing 9 changed files with 36 additions and 17 deletions.
13 changes: 11 additions & 2 deletions docs/design_and_concept/basic_concept.md
@@ -3,5 +3,14 @@ title: Basic Concept
sidebar_position: 1
---

Will be added soon

| Name | Description | Other |
| ---- | ---- | ---- |
| Group | Data stream group. It contains multiple data streams, and one Group represents one data access. | A Group has attributes such as ID and Name. |
| Stream | Data stream. Each stream has a specific flow direction. | A Stream has attributes such as ID, Name, and data fields. |
| Agent | Represents the various collection capabilities. | Includes File Agent, SQL Agent, Binlog Agent, etc. |
| DataProxy | Forwards received data to different message queues. | Supports blocking data sends and retransmitting data persisted to disk. |
| Sort | Data stream sorting. | Includes sort-flink, which is based on Flink, and sort-standalone for local sorting. |
| TubeMQ | InLong's self-developed message queue service. | Also called Tube; it is low-cost and high-performance. |
| Pulsar | [Apache Pulsar](https://pulsar.apache.org/), a high-performance, high-consistency message queue service. | |
| Hive | [Apache Hive](https://hive.apache.org/), a data warehouse built on the Hadoop architecture. | |
| ClickHouse | [ClickHouse](https://clickhouse.com/), a high-performance columnar OLAP database. | |
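
As a rough illustration of how the Group and Stream concepts in the table relate, the following Python sketch models one Group holding several Streams. The class and field names (`StreamGroup`, `DataStream`, `stream_id`, `fields`) are assumptions made for this example and are not taken from the InLong codebase.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataStream:
    """Hypothetical model of a Stream: an ID, a name, and its data fields."""
    stream_id: str
    name: str
    fields: List[str] = field(default_factory=list)


@dataclass
class StreamGroup:
    """Hypothetical model of a Group: one data access holding many streams."""
    group_id: str
    name: str
    streams: List[DataStream] = field(default_factory=list)

    def add_stream(self, stream: DataStream) -> None:
        # Reject duplicate stream IDs within the same group.
        if any(s.stream_id == stream.stream_id for s in self.streams):
            raise ValueError(f"duplicate stream id: {stream.stream_id}")
        self.streams.append(stream)


# Example data only; the IDs and field names are made up.
group = StreamGroup(group_id="g1", name="demo_group")
group.add_stream(DataStream("s1", "access_log", ["ip", "ts", "url"]))
group.add_stream(DataStream("s2", "error_log", ["ts", "message"]))
```

The duplicate-ID check is a design choice of this sketch; the table only states that a Stream has an ID, not how uniqueness is enforced.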
@@ -1,9 +1,9 @@
---
title: File Collect
title: File
sidebar_position: 3
---

## File Collect Configuration
## File Agent Configuration
```
/data/inlong-agent/test.log // reads the newly added file test.log in the inlong-agent folder
/data/inlong-agent/test[0-9]{1} // reads newly added files in the inlong-agent folder named test followed by a single digit
2 changes: 1 addition & 1 deletion docs/modules/agent/overview.md
@@ -3,7 +3,7 @@ title: Overview
sidebar_position: 1
---

InLong-Agent is a collection tool that supports multiple types of data sources and aims to provide stable, efficient data collection across heterogeneous data sources including file, sql, Binlog, metrics, etc.
InLong-Agent is a collection tool that supports multiple types of data sources and aims to provide stable, efficient data collection across heterogeneous data sources including File, SQL, Binlog, Metrics, etc.

## Design Concept
To handle the diversity of data sources, InLong-Agent abstracts them into a unified source concept and abstracts sinks for writing data. To connect a new data source, you only need to configure its format and read parameters to achieve efficient reading.
8 changes: 4 additions & 4 deletions docs/quick_start/hive_example.md
@@ -18,21 +18,21 @@ Before we begin, we need to install InLong. Here we provide two ways:
## 3 Create a data access
After deployment, we first enter the "Data Access" interface, click "Create an Access" in the upper right corner to create a new data access, and fill in the data stream group information as shown in the figure below.

<img src="img/create-group.png" align="center" alt="Create Group"/>
![Create Group](img/create-group.png)

Then we click the next button, and fill in the stream information as shown in the figure below.

<img src="img/create-stream.png" align="center" alt="Create Stream"/>
![Create Stream](img/create-stream.png)

Note that the message source is "File", and we don't need to create a message source manually.

Then we fill in the following information in the "data information" column below.

<img src="img/data-information.png" align="center" alt="Data Information"/>
![Data Information](img/data-information.png)

Then we select Hive in the data flow and click "Add" to add the Hive configuration.

<img src="img/hive-config.png" align="center" alt="Hive Config"/>
![Hive Config](img/hive-config.png)

Note that the target table does not need to be created in advance, as InLong Manager will automatically create the table for us after the access is approved. Also, please use the connection test to ensure that InLong Manager can connect to your Hive.

@@ -3,4 +3,14 @@ title: Basic Concept
sidebar_position: 1
---

Will be added soon
| Name | Description | Other |
| ---- | ---- | ---- |
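| Group | Data stream group. It contains multiple data streams, and one Group represents one data access. | A Group has attributes such as ID and Name. |
| Stream | Data stream. Each stream has a specific flow direction. | A Stream has attributes such as ID, Name, and data fields. |
| Agent | Represents the various collection capabilities. | Includes File Agent, SQL Agent, Binlog Agent, etc. |
| DataProxy | Forwards received data to different message queues. | Supports blocking data sends and retransmitting data persisted to disk. |
| Sort | Data stream sorting. | Mainly sort-flink, which is based on Flink, and sort-standalone for local sorting. |
| TubeMQ | InLong's built-in message queue service. | Also called Tube; it is low-cost and high-performance. |
| Pulsar | [Apache Pulsar](https://pulsar.apache.org/), a high-performance, high-consistency message queue service. | |
| Hive | [Apache Hive](https://hive.apache.org/), a data warehouse built on the Hadoop architecture. | |
| ClickHouse | [ClickHouse](https://clickhouse.com/), a high-performance columnar OLAP database. | |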
@@ -1,9 +1,9 @@
---
title: File Collect
title: File
sidebar_position: 3
---

## File Collect Configuration
## File Agent Configuration
```
/data/inlong-agent/test.log // reads the newly added file test.log in the inlong-agent folder
/data/inlong-agent/test[0-9]{1} // reads newly added files in the inlong-agent folder named test followed by a single digit
@@ -3,7 +3,7 @@ title: Overview
sidebar_position: 1
---

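InLong-Agent is a collection tool that supports multiple types of data sources and aims to provide stable, efficient data collection across heterogeneous data sources including file, sql, Binlog, metrics, etc.
InLong-Agent is a collection tool that supports multiple types of data sources and aims to provide stable, efficient data collection across heterogeneous data sources including File, Sql, Binlog, Metrics, etc.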

## 设计理念
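To handle the diversity of data sources, InLong-Agent abstracts them into a unified source concept and abstracts sinks for writing data. To connect a new data source, you only need to configure its format and read parameters to achieve efficient reading.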
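
The source/sink abstraction described in this design concept can be sketched as follows. This is a minimal Python illustration under stated assumptions, not InLong-Agent's actual API: `Source`, `Sink`, `ListSource`, `CollectingSink`, and `run_job` are hypothetical names chosen for the example.

```python
from abc import ABC, abstractmethod
from typing import Iterable, Iterator, List


class Source(ABC):
    """Hypothetical unified read interface for any kind of data source."""

    @abstractmethod
    def read(self) -> Iterator[str]:
        ...


class Sink(ABC):
    """Hypothetical unified write interface for any destination."""

    @abstractmethod
    def write(self, record: str) -> None:
        ...


class ListSource(Source):
    """Stand-in for a real source (file, SQL, Binlog): yields in-memory lines."""

    def __init__(self, lines: Iterable[str]) -> None:
        self._lines = list(lines)

    def read(self) -> Iterator[str]:
        yield from self._lines


class CollectingSink(Sink):
    """Stand-in for a real sink: keeps every record it receives."""

    def __init__(self) -> None:
        self.records: List[str] = []

    def write(self, record: str) -> None:
        self.records.append(record)


def run_job(source: Source, sink: Sink) -> int:
    """Copy every record from source to sink and return the record count."""
    count = 0
    for record in source.read():
        sink.write(record)
        count += 1
    return count


sink = CollectingSink()
copied = run_job(ListSource(["a", "b", "c"]), sink)
```

Because `run_job` depends only on the two interfaces, supporting a new data source in this sketch means adding one `Source` subclass and leaving the rest unchanged.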
@@ -20,21 +20,21 @@ Hive is a required component for running. If your machine does not have Hive, we recommend
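## 3 Create a data access
After deployment, first go to the "Data Access" interface, click "Create an Access" in the upper right corner to create a new access, and fill in the data stream Group information as shown in the figure below.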

<img src="img/create-group.png" align="center" alt="Create Group"/>
![Create Group](img/create-group.png)

Then click Next and fill in the data stream information as shown in the figure below.

<img src="img/create-stream.png" align="center" alt="Create Stream"/>
![Create Stream](img/create-stream.png)

Note that the message source should be set to "File"; there is no need to create a data source for now.

Then fill in the following information in the "Data Information" column below.

<img src="img/data-information.png" align="center" alt="Data Information"/>
![Data Information](img/data-information.png)

Then select Hive in the data flow direction and click "Add" to add the Hive configuration.

<img src="img/hive-config.png" align="center" alt="Hive Config"/>
![Hive Config](img/hive-config.png)

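Note that the target table does not need to be created in advance; InLong Manager will automatically create the table for us after the access is approved. Also, please use "Connection Test" to ensure that InLong Manager can connect to your Hive.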

Binary file removed static/img/inlong_architecture.png
Binary file not shown.
