Commit
[INLONG-1814] Show document file subdirectories and change the document directory level (apache#190)
bluewang committed Nov 20, 2021
1 parent a279898 commit b5e9420
Showing 35 changed files with 260 additions and 260 deletions.
12 changes: 6 additions & 6 deletions docs/modules/agent/architecture.md
@@ -2,27 +2,27 @@
title: Architecture
---

## 1. Overview of InLong-Agent
## 1 Overview of InLong-Agent
InLong-Agent is a data collection tool that supports multiple types of data sources and is committed to stable, efficient data collection across heterogeneous sources such as files, SQL, Binlog, and metrics.

### The brief architecture diagram is as follows:
### 1.1 The brief architecture diagram is as follows:
![](img/architecture.png)

### design concept
### 1.2 design concept
In order to solve the problem of data source diversity, InLong-agent abstracts multiple data sources into a unified source concept, and abstracts sinks to write data. When you need to access a new data source, you only need to configure the format and reading parameters of the data source to achieve efficient reading.

### Current status of use
### 1.3 Current status of use
InLong-Agent is widely used within the Tencent Group, undertaking most of the data collection business, and the amount of online data reaches tens of billions.

## 2. InLong-Agent architecture
## 2 InLong-Agent architecture
The InLong Agent task runs on a data collection framework built with a channel + plug-in architecture. Reading and writing a data source are implemented as reader/writer plug-ins, which are then integrated into the framework (a minimal sketch of these roles follows the list).

+ Reader: the data collection module, responsible for collecting data from the data source and sending it to the channel.
+ Writer: the data writing module, which continuously pulls data from the channel and writes it to the destination.
+ Channel: connects the reader and the writer, serves as the data transmission pipeline between them, and provides monitoring of the data flowing through it.
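
A minimal, illustrative Java sketch of how the three roles fit together is shown below. The interface names, method signatures, and the `Message` carrier type are assumptions made for this sketch; they are not the actual InLong-Agent plug-in interfaces.

```java
import java.util.Map;

// Simplified sketch of the reader/writer/channel roles described above; the real
// InLong-Agent plug-in interfaces carry more lifecycle and configuration methods.
interface Channel {
    void push(Message message);   // reader side: hand a collected message to the channel
    Message pull(long timeoutMs); // writer side: take the next message, waiting up to timeoutMs
}

interface Reader {
    void read(Channel channel);   // collect records from the data source and push them to the channel
}

interface Writer {
    void write(Channel channel);  // continuously pull from the channel and write to the destination
}

// Illustrative message carrier: the payload plus routing attributes such as group/stream ids.
class Message {
    byte[] body;
    Map<String, String> attributes;
}
```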


## 3. Different kinds of agent
## 3 Different kinds of agent
### 3.1 file agent
File collection includes the following functions:

18 changes: 9 additions & 9 deletions docs/modules/agent/quick_start.md
@@ -2,15 +2,15 @@
title: Build && Deployment
---

## 1Configuration
## 1 Configuration
```
cd inlong-agent
```

The agent supports two modes of operation: local operation and online operation.


### Agent configuration
### 1.1 Agent configuration

Online operation needs to pull the configuration from inlong-manager; the configuration in conf/agent.properties is as follows:
```ini
@@ -20,25 +20,25 @@ agent.manager.vip.http.host=manager web host
agent.manager.vip.http.port=manager web port
```

## 2run
## 2 run
After decompression, run the following command

```bash
sh agent.sh start
```


## 3Add job configuration in real time
## 3 Add job configuration in real time

#### 3.1 agent.properties Modify the following two places
### 3.1 agent.properties Modify the following two places
```ini
# whether enable http service
agent.http.enable=true
# http default port
agent.http.port=Available ports
```

#### 3.2 Execute the following command
### 3.2 Execute the following command
```bash
curl --location --request POST 'http://localhost:8008/config/job' \
--header 'Content-Type: application/json' \
@@ -78,7 +78,7 @@ agent.http.port=Available ports
- proxy.streamId: the streamId used when writing to the proxy; streamId is the data stream id shown on the data stream page in inlong-manager


## 4eg for directory config
## 4 eg for directory config

E.g:
/data/inlong-agent/test.log //Represents reading the new file test.log in the inlong-agent folder
@@ -87,7 +87,7 @@ agent.http.port=Available ports
/data/inlong-agent/^\\d+(\\.\\d+)? // Matches file names that start with one or more digits, optionally followed by a dot and one or more digits (the ? makes the fractional part optional); examples: "5", "1.5" and "2.21"


## 5. Support to get data time from file name
## 5 Support to get data time from file name

Agent supports obtaining the time from the file name as the production time of the data. The configuration instructions are as follows:
/data/inlong-agent/***YYYYMMDDHH***
@@ -143,7 +143,7 @@ curl --location --request POST'http://localhost:8008/config/job' \
}'
```

## 6. Support time offset reading
## 6 Support time offset reading

After time-based reading is configured, if you want to read data for a time other than the current time, you can configure a time offset.
Set the job attribute job.timeOffset; its value is a number plus a time unit, where the supported units are d (day) and h (hour). For example, `-1d` reads data from one day before the current time, and `1h` reads data from one hour after the current time.
16 changes: 8 additions & 8 deletions docs/modules/dataproxy-sdk/architecture.md
@@ -1,16 +1,16 @@
---
title: Architecture
---
# 1、intro
## 1 intro
When a business uses the message access method, it generally only needs to format its data into a format the proxy can recognize (such as the six-segment protocol, the digital protocol, etc.) and send it in packed batches to get the data into InLong. However, to guarantee data reliability, load balancing, dynamic updates of the proxy list, and other such safeguards,
the user program would have to handle much more logic, which ultimately makes it cumbersome and bloated.

The original intention of the API design is to simplify user access and take over part of the reliability-related logic. After the user integrates the API into the data-producing program, data can be sent to the proxy without worrying about packing formats, load balancing, and other such logic.

# 2、functions
## 2 functions

## 2.1 overall functions
### 2.1 overall functions

| function | description |
| ---- | ---- |
@@ -22,17 +22,17 @@ The original intention of API design is to simplify user access and assume some
| proxy list persistence (new)| Persist the proxy list by business group id, so that data can still be sent at program start-up even if the configuration center is unreachable |


## 2.2 Data transmission function description
### 2.2 Data transmission function description

### Synchronous batch function
#### Synchronous batch function

public SendResult sendMessage(List<byte[]> bodyList, String groupId, String streamId, long dt, long timeout, TimeUnit timeUnit)

Parameter Description

bodyList is the collection of data records the user needs to send; the recommended total length is less than 512 KB. groupId is the business group id, and streamId is the data stream id. dt is the timestamp of the data, accurate to the millisecond; it can also be set to 0, in which case the API uses the current time as the timestamp. timeout and timeUnit set the timeout for sending data; 20s is generally recommended.
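
For illustration, a hedged usage sketch of this synchronous batch call is shown below. `MessageSender` stands for the SDK sender type that exposes the signature above, and its construction (manager address, authentication, etc.) is omitted because it depends on the SDK version; the group/stream ids are placeholders.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Illustrative only: how an already-initialized sender might be used for a batch send.
class SyncBatchExample {
    static void sendBatch(MessageSender sender) {
        List<byte[]> bodyList = new ArrayList<>();
        bodyList.add("field1|field2|field3".getBytes(StandardCharsets.UTF_8));
        bodyList.add("field4|field5|field6".getBytes(StandardCharsets.UTF_8));

        // "test_group"/"test_stream" are placeholders; dt = 0 lets the API fill in the
        // current time; 20 seconds is the generally recommended timeout.
        SendResult result = sender.sendMessage(bodyList, "test_group", "test_stream", 0, 20, TimeUnit.SECONDS);
        if (result != SendResult.OK) {
            // retry or log according to the business requirement
        }
    }
}
```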

### Synchronize a single function
#### Synchronize a single function

public SendResult sendMessage(byte[] body, String groupId, String streamId, long dt, long timeout, TimeUnit timeUnit)

@@ -41,7 +41,7 @@ The original intention of API design is to simplify user access and assume some
body is the content of a single piece of data that the user wants to send, and the meaning of the remaining parameters is basically the same as the batch sending interface.


### Asynchronous batch function
#### Asynchronous batch function

public void asyncSendMessage(SendMessageCallback callback, List<byte[]> bodyList, String groupId, String streamId, long dt, long timeout,TimeUnit timeUnit)

@@ -50,7 +50,7 @@ The original intention of API design is to simplify user access and assume some
SendMessageCallback is the callback used to process the result of the send. bodyList is the collection of data records the user needs to send; the recommended total length is less than 512 KB. groupId is the business group id, and streamId is the data stream id. dt is the timestamp of the data, accurate to the millisecond; it can also be set to 0, in which case the API uses the current time as the timestamp. timeout and timeUnit set the timeout for sending data, generally recommended to be 20s.
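
A hedged sketch of an asynchronous batch call is shown below. `MessageSender` again stands for the SDK sender type, and the `SendMessageCallback` implementation is assumed to be provided by the application (its concrete methods are defined by the SDK and not shown here).

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Illustrative only: the call returns immediately; success or failure is reported to
// the callback once the proxy acknowledges or the timeout elapses.
class AsyncBatchExample {
    static void sendAsync(MessageSender sender, SendMessageCallback callback) {
        List<byte[]> bodyList =
                Collections.singletonList("field1|field2".getBytes(StandardCharsets.UTF_8));
        sender.asyncSendMessage(callback, bodyList, "test_group", "test_stream", 0, 20, TimeUnit.SECONDS);
    }
}
```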


### Asynchronous single function
#### Asynchronous single function


public void asyncSendMessage(SendMessageCallback callback, byte[] body, String groupId, String streamId, long dt, long timeout, TimeUnit timeUnit)
8 changes: 4 additions & 4 deletions docs/modules/dataproxy/architecture.md
@@ -1,22 +1,22 @@
---
title: Architecture
---
# 1、intro
## 1 intro

InLong-dataProxy belongs to the InLong proxy layer and is used for data collection, reception, and forwarding. Through format conversion, the data is converted into the TDMsg1 format that the cache layer can buffer and process.
InLong-dataProxy acts as a bridge from the InLong collection end to the InLong buffer end. DataProxy pulls the mapping between business group ids and the corresponding topic names from the manager module, and internally manages the producers for multiple topics.
The overall architecture of InLong-dataProxy is based on Apache Flume. On the basis of that project, inlong-bus extends the source and sink layers and optimizes disaster-tolerant forwarding, which improves the stability of the system.


# 2、architecture
## 2 architecture

![](img/architecture.png)

1. The source layer opens port listening, implemented through a Netty server; the decoded data is sent to the channel layer.
2. The channel layer has a selector that chooses which type of channel the data goes through; if memory eventually fills up, the data is processed accordingly.
3. The data in the channel layer is forwarded by the sink layer, whose main purpose is to convert the data into the TDMsg1 format and push it to the cache layer (TubeMQ is the most commonly used here).

# 3、DataProxy support configuration instructions
## 3 DataProxy support configuration instructions

DataProxy supports configurable source-channel-sink, and the configuration method is the same as the configuration file structure of flume:

@@ -158,7 +158,7 @@ agent1.sinks.meta-sink-more1.max-survived-size = 3000000
Maximum number of caches
```
# 4、Monitor metrics configuration instructions
## 4 Monitor metrics configuration instructions
DataProxy provides monitoring metrics based on JMX; users can implement code that reads the metrics and reports them to a user-defined monitoring system.
The source and sink modules can add a monitoring metric class that is a subclass of org.apache.inlong.commons.config.metrics.MetricItemSet and register it to the MBeanServer. A user-defined plugin can then obtain the module metrics via JMX and report the metric data to different monitoring systems; a minimal sketch of this registration pattern follows.
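
The following self-contained sketch illustrates the JMX pattern described above: a simple metric holder is registered with the platform MBeanServer and read back by ObjectName. The `SourceMetrics` class and the ObjectName used here are illustrative stand-ins, not the actual InLong classes or names (a real module would register a MetricItemSet subclass instead).

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Standard MBean pattern: the management interface name is the class name + "MBean".
interface SourceMetricsMBean {
    long getReceivedCount();
}

class SourceMetrics implements SourceMetricsMBean {
    private final AtomicLong receivedCount = new AtomicLong();

    public void incReceived() { receivedCount.incrementAndGet(); }

    @Override
    public long getReceivedCount() { return receivedCount.get(); }
}

class MetricRegistrationExample {
    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("org.apache.inlong.dataproxy:type=SourceMetrics"); // illustrative name
        SourceMetrics metrics = new SourceMetrics();
        server.registerMBean(metrics, name);

        metrics.incReceived();
        // A user-defined reporter would periodically read the attribute via JMX and
        // forward it to the target monitoring system.
        Object received = server.getAttribute(name, "ReceivedCount");
        System.out.println("ReceivedCount = " + received);
    }
}
```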
14 changes: 7 additions & 7 deletions docs/modules/dataproxy/quick_start.md
@@ -1,11 +1,11 @@
---
title: Build && Deployment
---
## Deploy DataProxy
## 1 Deploy DataProxy

All deployment files are located in the `inlong-dataproxy` directory.

### config TubeMQ master
### 1.1 config TubeMQ master

`tubemq_master_list` is the rpc address of TubeMQ Master.
```
@@ -14,33 +14,33 @@ $ sed -i 's/TUBE_LIST/tubemq_master_list/g' conf/flume.conf

Note that in conf/flume.conf, FLUME_HOME is the directory used for the proxy's internal data.

### Environmental preparation
### 1.2 Environmental preparation

```
sh prepare_env.sh
```

### config manager web url
### 1.3 config manager web url

configuration file: `conf/common.properties`:
```
# manager web
manager_hosts=ip:port
```

## run
## 2 run

```
sh bin/start.sh
```


## check
## 3 check
```
telnet 127.0.0.1 46801
```

## Add DataProxy configuration to InLong-Manager
## 4 Add DataProxy configuration to InLong-Manager

After installing DataProxy, you need to insert the IP and port where the DataProxy service is located into the backend database of InLong-Manager.

10 changes: 5 additions & 5 deletions docs/modules/manager/architecture.md
@@ -2,19 +2,19 @@
title: Architecture
---

## Introduction to Apache InLong Manager
## 1 Introduction to Apache InLong Manager

+ Target positioning: Apache InLong is positioned as a one-stop data access solution, providing technical capabilities that cover the complete big data access scenario, from data collection and transmission to sorting.

+ Platform value: Users can complete task configuration, management, and indicator monitoring through the platform's built-in management and configuration platform. At the same time, the platform provides SPI extension points in the main links of the process to implement custom logic as needed. Ensure stable and efficient functions while lowering the threshold for platform use.

+ Apache InLong Manager is the user-oriented, unified UI of the entire data access platform. After a user logs in, it provides different function and data permissions according to the user's role. The page provides maintenance portals for the platform's basic clusters (such as MQ and sorting), so basic maintenance information and capacity planning adjustments can be viewed at any time. Business users can also create, modify, and maintain data access tasks, and use the metric viewing and reconciliation functions. When users create and start tasks, the corresponding backend service interacts with the underlying modules and dispatches the work each module needs to perform, coordinating the execution of the whole backend pipeline.
## Architecture
## 2 Architecture

![](img/inlong-manager.png)


##Module division of labor
## 3 Module division of labor

| Module | Responsibilities |
| :----| :---- |
@@ -24,9 +24,9 @@ title: Architecture
| manager-web | Front-end interactive response interface |
| manager-workflow-engine | Workflow Engine |

## use process
## 4 use process
![](img/interactive.jpg)


## data model
## 5 data model
![](img/datamodel.jpg)
12 changes: 6 additions & 6 deletions docs/modules/manager/quick_start.md
@@ -2,7 +2,7 @@
title: Build && Deployment
---

# 1. Environmental preparation
## 1 Environmental preparation
- Install and start MySQL 5.7+, copy the `doc/sql/apache_inlong_manager.sql` file in the inlong-manager module to the
server where the MySQL database is located (for example, copy to `/data/` directory), load this file through the
following command to complete the initialization of the table structure and basic data:
@@ -25,15 +25,15 @@ title: Build && Deployment
to [Compile and deploy TubeMQ Manager](https://inlong.apache.org/zh-cn/docs/modules/tubemq/tubemq-manager/quick_start.html)
, install and start TubeManager.

# 2. Deploy and start manager-web
## 2 Deploy and start manager-web

**manager-web is a background service that interacts with the front-end page.**

## 2.1 Prepare installation files
### 2.1 Prepare installation files

All installation files at `inlong-manager-web` directory.

## 2.2 Modify configuration
### 2.2 Modify configuration

Go to the decompressed `inlong-manager-web` directory and modify the `conf/application.properties` file:

@@ -74,7 +74,7 @@ The dev configuration is specified above, then modify the `conf/application-dev.
sort.appName=inlong_app
```

## 2.3 Start the service
### 2.3 Start the service

Enter the decompressed directory, execute `sh bin/startup.sh` to start the service, and check the
log `tailf log/manager-web.log`. If a log similar to the following appears, the service has started successfully:
Expand All @@ -83,7 +83,7 @@ log `tailf log/manager-web.log`. If a log similar to the following appears, the
Started InLongWebApplication in 6.795 seconds (JVM running for 7.565)
```

# 3. Service access verification
## 3 Service access verification

Verify the manager-web service:

18 changes: 9 additions & 9 deletions docs/modules/sort/introduction.md
@@ -7,31 +7,31 @@ Inlong-sort is used to extract data from different source systems, then transfor
Inlong-sort is simply a Flink application, and it relies on Inlong-manager to manage metadata (such as source information and storage information).

# features
## multi-tenancy
## 1 multi-tenancy
Inlong-sort is a multi-tenant system, which means you can extract data from different sources (these sources must be of the same source type) and load data into different sinks (these sinks must be of the same storage type).
e.g. you can extract data from different topics of inlong-tubemq and then load them into different hive clusters.

## change meta data without restart
## 2 change meta data without restart
Inlong-sort uses ZooKeeper to manage its metadata; every time you change metadata on ZK, the inlong-sort application is informed immediately.
e.g. if you want to change the schema of your data, just change the metadata on ZK without restarting your inlong-sort application.

# supported sources
## 3 supported sources
- inlong-tubemq
- pulsar

# supported storages
## 4 supported storages
- clickhouse
- hive (Currently we just support parquet file format)

# limitations
## 5 limitations
Currently, we just support extracting specified fields in the stage of **Transform**.

# future plans
## More kinds of source systems
## 6 future plans
### 6.1 More kinds of source systems
Kafka, etc.

## More kinds of storage systems
### 6.2 More kinds of storage systems
HBase, Elasticsearch, etc.

## More kinds of file format in hive sink
### 6.3 More kinds of file format in hive sink
sequence file, orc
4 changes: 2 additions & 2 deletions docs/modules/sort/protocol_introduction.md
@@ -7,7 +7,7 @@ Currently the metadata management of inlong-sort relies on inlong-manager.

Metadata interaction between inlong-sort and inlong-manager is performed via ZK.

# Zookeeper's path structure
## 1 Zookeeper's path structure

![img.png](img.png)

@@ -20,6 +20,6 @@ A path at the top of the figure indicates which dataflow are running in a cluste

The path below is used to store the details of the dataflow.

# Protocol
## 2 Protocol
Please reference
`org.apache.inlong.sort.protocol.DataFlowInfo`