diff --git a/docs/automq-kafka-source.md b/docs/automq-kafka-source.md index 49b9c6bd9..2e8ca10a2 100644 --- a/docs/automq-kafka-source.md +++ b/docs/automq-kafka-source.md @@ -28,5 +28,5 @@ Click **Next**. Timeplus will connect to the server and list all topics. Choose In the next step, confirm the schema of the Timeplus stream and specify a name. At the end of the wizard, an external stream will be created in Timeplus. You can query data or even write data to the AutoMQ topic with SQL. See also: -* [Kafka External Stream](/proton-kafka) +* [Kafka External Stream](/kafka-source) * [Tutorial: Streaming ETL from Kafka to ClickHouse](/tutorial-sql-etl-kafka-to-ch) diff --git a/docs/bigquery-external.md b/docs/bigquery-external.md new file mode 100644 index 000000000..957261113 --- /dev/null +++ b/docs/bigquery-external.md @@ -0,0 +1,34 @@ +# BigQuery + +Leveraging HTTP external stream, you can write / materialize data to BigQuery directly from Timeplus. + +## Write to BigQuery {#example-write-to-bigquery} + +Assume you have created a table in BigQuery with 2 columns: +```sql +create table `PROJECT.DATASET.http_sink_t1`( + num int, + str string); +``` + +Follow [the guide](https://cloud.google.com/bigquery/docs/authentication) to choose the proper authentication method for Google Cloud, such as the gcloud CLI command `gcloud auth application-default print-access-token`. + +Create the HTTP external stream in Timeplus: +```sql +CREATE EXTERNAL STREAM http_bigquery_t1 (num int,str string) +SETTINGS +type = 'http', +http_header_Authorization='Bearer $OAUTH_TOKEN', +url = 'https://bigquery.googleapis.com/bigquery/v2/projects/$PROJECT/datasets/$DATASET/tables/$TABLE/insertAll', +data_format = 'Template', +format_template_resultset_format='{"rows":[${data}]}', +format_template_row_format='{"json":{"num":${num:JSON},"str":${str:JSON}}}', +format_template_rows_between_delimiter=',' +``` + +Replace `OAUTH_TOKEN` with the output of `gcloud auth application-default print-access-token`, or obtain an OAuth token in another secure way. Replace `PROJECT`, `DATASET`, and `TABLE` to match your BigQuery table path. Also change `format_template_row_format` to match the table schema. + +Then you can insert data via a materialized view or just via the `INSERT` command: +```sql +INSERT INTO http_bigquery_t1 VALUES(10,'A'),(11,'B'); +``` diff --git a/docs/changelog-stream.md b/docs/changelog-stream.md index c42416baa..ae4a89963 100644 --- a/docs/changelog-stream.md +++ b/docs/changelog-stream.md @@ -403,7 +403,7 @@ Debezium also read all existing rows and generate messages like this ### Load data to Timeplus -You can follow this [guide](/proton-kafka) to add 2 external streams to load data from Kafka or Redpanda. For example: +You can follow this [guide](/kafka-source) to add 2 external streams to load data from Kafka or Redpanda. For example: * Data source name `s1` to load data from topic `doc.public.dim_products` and put in a new stream `rawcdc_dim_products` * Data source name `s2` to load data from topic `doc.public.orders` and put in a new stream `rawcdc_orders` diff --git a/docs/cli-migrate.md b/docs/cli-migrate.md index 9224d143c..6bcc2edd9 100644 --- a/docs/cli-migrate.md +++ b/docs/cli-migrate.md @@ -8,7 +8,7 @@ This tool is available in Timeplus Enterprise 2.5. It supports [Timeplus Enterpr ## How It Works -The migration is done via capturing the SQL DDL from the source deployment and rerunning those SQL DDL in the target deployment.
Data are read from source Timeplus via [Timeplus External Streams](/timeplus-external-stream) and write to the target Timeplus via `INSERT INTO .. SELECT .. FROM table(tp_ext_stream)`. The data files won't be copied among the source and target Timeplus, but you need to ensure the target Timeplus can access to the source Timeplus, so that it can read data via Timeplus External Streams. +The migration is done via capturing the SQL DDL from the source deployment and rerunning those SQL DDL in the target deployment. Data are read from the source Timeplus via [Timeplus External Streams](/timeplus-source) and written to the target Timeplus via `INSERT INTO .. SELECT .. FROM table(tp_ext_stream)`. The data files won't be copied between the source and target Timeplus, but you need to ensure the target Timeplus can access the source Timeplus, so that it can read data via Timeplus External Streams. ## Supported Resources diff --git a/docs/proton-clickhouse-external-table.md b/docs/clickhouse-external-table.md similarity index 98% rename from docs/proton-clickhouse-external-table.md rename to docs/clickhouse-external-table.md index b5420575c..9632ddca4 100644 --- a/docs/proton-clickhouse-external-table.md +++ b/docs/clickhouse-external-table.md @@ -1,5 +1,7 @@ # ClickHouse External Table +## Overview + Timeplus can read or write ClickHouse tables directly. This unlocks a set of new use cases, such as - Use Timeplus to efficiently process real-time data in Kafka/Redpanda, apply flat transformation or stateful aggregation, then write the data to the local or remote ClickHouse for further analysis or visualization. @@ -41,7 +43,7 @@ The required settings are type and address. For other settings, the default valu The `config_file` setting is available since Timeplus Enterprise 2.7. You can specify the path to a file that contains the configuration settings. The file should be in the format of `key=value` pairs, one pair per line. You can set the ClickHouse user and password in the file. -Please follow the example in [Kafka External Stream](/proton-kafka#config_file). +Please follow the example in [Kafka External Stream](/kafka-source#config_file). You don't need to specify the columns, since the table schema will be fetched from the ClickHouse server. diff --git a/docs/ingestion.md b/docs/connect-data-in.md similarity index 96% rename from docs/ingestion.md rename to docs/connect-data-in.md index c13dafa5d..5ba63fb76 100644 --- a/docs/ingestion.md +++ b/docs/connect-data-in.md @@ -1,9 +1,9 @@ -# Getting Data In +# Connect Data In Timeplus supports multiple ways to load data into the system, or access the external data without copying them in Timeplus: - [External Stream for Apache Kafka](/external-stream), Confluent, Redpanda, and other Kafka API compatible data streaming platform. This feature is also available in Timeplus Proton. -- [External Stream for Apache Pulsar](/pulsar-external-stream) is available in Timeplus Enterprise 2.5 and above. +- [External Stream for Apache Pulsar](/pulsar-source) is available in Timeplus Enterprise 2.5 and above. - Source for extra wide range of data sources. This is only available in Timeplus Enterprise. This integrates with [Redpanda Connect](https://redpanda.com/connect), supporting 200+ connectors. - On Timeplus web console, you can also [upload CSV files](#csv) and import them into streams. - For Timeplus Enterprise, [REST API](/ingest-api) and SDKs are provided to push data to Timeplus programmatically.
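+For a quick illustration of the external stream approach, below is a minimal sketch of a Kafka external stream. The broker address, topic name, and the single `raw` column are placeholder assumptions to adapt to your environment:
+```sql
+CREATE EXTERNAL STREAM demo_kafka_events (raw string)
+SETTINGS
+type = 'kafka',
+brokers = 'localhost:9092', -- placeholder broker address
+topic = 'demo-events',      -- placeholder topic name
+data_format = 'RawBLOB';    -- treat each Kafka message as one raw string
+```
+Once created, `SELECT raw FROM demo_kafka_events` streams messages as they arrive.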
@@ -15,12 +15,12 @@ Timeplus supports multiple ways to load data into the system, or access the exte Choose "Data Collection" from the navigation menu to setup data access to other systems. There are two categories: * Timeplus Connect: directly supported by Timeplus Inc, with easy-to-use setup wizards. * Demo Stream: generate random data for various use cases. [Learn more](#streamgen) - * Timeplus: read data from another Timeplus deployment. [Learn more](/timeplus-external-stream) + * Timeplus: read data from another Timeplus deployment. [Learn more](/timeplus-source) * Apache Kafka: setup external streams to read from Apache Kafka. [Learn more](#kafka) * Confluent Cloud: setup external streams to read from Confluent Cloud * Redpanda: setup external streams to read from Redpanda * Apache Pulsar: setup external streams to read from Apache Pulsar. [Learn more](#pulsar) - * ClickHouse: setup external tables to read from ClickHouse, without duplicating data in Timeplus. [Learn more](/proton-clickhouse-external-table) + * ClickHouse: setup external tables to read from ClickHouse, without duplicating data in Timeplus. [Learn more](/clickhouse-external-table) * NATS: load data from NATS to Timeplus streams * WebSocket: load data from WebSocket to Timeplus streams * HTTP Stream: load data from HTTP stream to Timeplus streams @@ -29,19 +29,17 @@ Choose "Data Collection" from the navigation menu to setup data access to other * Stream Ingestion: a wizard to guide you to push data to Timeplus via Ingest REST API. [Learn more](/ingest-api) * Redpanda Connect: available since Timeplus Enterprise 2.5 or above. Set up data access to other systems by editing a YAML file. Powered by Redpanda Connect, supported by Redpanda Data Inc. or Redpanda Community. - - ### Load streaming data from Apache Kafka {#kafka} As of today, Kafka is the primary data integration for Timeplus. With our strong partnership with Confluent, you can load your real-time data from Confluent Cloud, Confluent Platform, or Apache Kafka into the Timeplus streaming engine. You can also create [external streams](/external-stream) to analyze data in Confluent/Kafka/Redpanda without moving data. -[Learn more.](/proton-kafka) +[Learn more.](/kafka-source) ### Load streaming data from Apache Pulsar {#pulsar} Apache® Pulsar™ is a cloud-native, distributed, open source messaging and streaming platform for real-time workloads. Since Timeplus Enterprise 2.5, Pulsar External Streams can be created to read or write data for Pulsar. -[Learn more.](/pulsar-external-stream) +[Learn more.](/pulsar-source) ### Upload local files {#csv} diff --git a/docs/databricks-external.md b/docs/databricks-external.md new file mode 100644 index 000000000..106792b36 --- /dev/null +++ b/docs/databricks-external.md @@ -0,0 +1,39 @@ +# Databricks + +Leveraging HTTP external stream, you can write / materialize data to Databricks directly from Timeplus. + +## Write to Databricks {#example-write-to-databricks} + +Follow [the guide](https://docs.databricks.com/aws/en/dev-tools/auth/pat) to create an access token for your Databricks workspace. 
+ +Assume you have created a table in a Databricks SQL warehouse with 2 columns: +```sql +CREATE TABLE sales ( + product STRING, + quantity INT +); +``` + +Create the HTTP external stream in Timeplus: +```sql +CREATE EXTERNAL STREAM http_databricks_t1 (product string, quantity int) +SETTINGS +type = 'http', +http_header_Authorization='Bearer $TOKEN', +url = 'https://$HOST.cloud.databricks.com/api/2.0/sql/statements/', +data_format = 'Template', +format_template_resultset_format='{"warehouse_id":"$WAREHOUSE_ID","statement": "INSERT INTO sales (product, quantity) VALUES (:product, :quantity)", "parameters": [${data}]}', +format_template_row_format='{ "name": "product", "value": ${product:JSON}, "type": "STRING" },{ "name": "quantity", "value": ${quantity:JSON}, "type": "INT" }', +format_template_rows_between_delimiter='' +``` + +Replace `TOKEN`, `HOST`, and `WAREHOUSE_ID` to match your Databricks settings. Also change `format_template_resultset_format` and `format_template_row_format` to match the table schema. + +Then you can insert data via a materialized view or just via the `INSERT` command: +```sql +INSERT INTO http_databricks_t1(product, quantity) VALUES('test',95); +``` + +This will insert one row per request. We plan to support batch inserts and a Databricks-specific format for different table schemas in the future. + + diff --git a/docs/datadog-external.md b/docs/datadog-external.md new file mode 100644 index 000000000..fb7dd39fa --- /dev/null +++ b/docs/datadog-external.md @@ -0,0 +1,24 @@ +# Datadog + +Leveraging HTTP external stream, you can write / materialize data to Datadog directly from Timeplus. + +## Write to Datadog {#example-write-to-datadog} + +Create or use an existing API key with the proper permission for sending data. + +Create the HTTP external stream in Timeplus: +```sql +CREATE EXTERNAL STREAM datadog_t1 (message string, hostname string) +SETTINGS +type = 'http', +data_format = 'JSONEachRow', +output_format_json_array_of_rows = 1, +http_header_DD_API_KEY = 'THE_API_KEY', +http_header_Content_Type = 'application/json', +url = 'https://http-intake.logs.us3.datadoghq.com/api/v2/logs' --make sure you set the right region +``` + +Then you can insert data via a materialized view or just via the `INSERT` command: +```sql +INSERT INTO datadog_t1(message, hostname) VALUES('test message','a.test.com'),('test2','a.test.com'); +``` diff --git a/docs/elastic-external.md b/docs/elastic-external.md new file mode 100644 index 000000000..fe5ea7f03 --- /dev/null +++ b/docs/elastic-external.md @@ -0,0 +1,25 @@ +# Elasticsearch + +Leveraging HTTP external stream, you can write data to Elasticsearch or OpenSearch directly from Timeplus. + +## Write to OpenSearch / ElasticSearch {#example-write-to-es} + +Assuming you have created an index `students` in a deployment of OpenSearch or ElasticSearch, you can create the following external stream to write data to the index. + +```sql +CREATE EXTERNAL STREAM opensearch_t1 ( + name string, + gpa float32, + grad_year int16 +) SETTINGS +type = 'http', +data_format = 'OpenSearch', --can also use the alias "ElasticSearch" +url = 'https://opensearch.company.com:9200/students/_bulk', +username='admin', +password='..'
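+-- Note: the url above targets the bulk API endpoint of the `students` index created earlier,
+-- and `username`/`password` are basic-auth placeholders; replace '..' with the real password for your cluster.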
+``` + +Then you can insert data via a materialized view or just +```sql +INSERT INTO opensearch_t1(name,gpa,grad_year) VALUES('Jonathan Powers',3.85,2025); +``` diff --git a/docs/enterprise-v2.4.md b/docs/enterprise-v2.4.md index ed38d5d40..7a677de4c 100644 --- a/docs/enterprise-v2.4.md +++ b/docs/enterprise-v2.4.md @@ -12,10 +12,10 @@ Each component tracks their changes with own version numbers. The version number ## Key Highlights Key highlights of this release: * [Distributed Mutable Streams](/mutable-stream) for high performance query and UPSERT (UPDATE or INSERT), with primary keys, secondary keys, column families, sorting columns, parallel full scan and many more -* [External Streams](/timeplus-external-stream) to query or write to remote Timeplus, designed for data migration or hybrid deployment +* [External Streams](/timeplus-source) to query or write to remote Timeplus, designed for data migration or hybrid deployment * Built-in system observability. Your workspace now comes with a system dashboard to monitor your cluster, including charts for running nodes and failed nodes, read/write throughput and EPS, used disk storage, and more. See additional metrics for resources in the details side panel, accessed via the data lineage or resource list pages, including status and any last errors -* [Kafka schema registry support for Avro output format](/proton-schema-registry#write) -* Read/write Kafka message keys via [_tp_message_key column](/proton-kafka#_tp_message_key) +* [Kafka schema registry support for Avro output format](/kafka-schema-registry#write) +* Read/write Kafka message keys via [_tp_message_key column](/kafka-source) * More performance enhancements, including: * Concurrent and [idempotent data ingestion](/idempotent) * Memory efficiency improvement for window processing @@ -48,7 +48,7 @@ Compared to the [2.4.28](#2_4_28) release: * fix: truncate garbage data at tail for reverse indexes #### Known issues {#known_issue_2_4_29} -1. If you have deployed one of the [2.3.x releases](/enterprise-v2.3), you cannot reuse the data and configuration directly. Please have a clean installation of 2.4.x release, then use tools like [timeplus sync](/cli-sync) CLI or [Timeplus External Stream](/timeplus-external-stream) for migration. +1. If you have deployed one of the [2.3.x releases](/enterprise-v2.3), you cannot reuse the data and configuration directly. Please have a clean installation of 2.4.x release, then use tools like [timeplus sync](/cli-sync) CLI or [Timeplus External Stream](/timeplus-source) for migration. ### 2.4.28 (Stable) {#2_4_28} Built on 08-12-2025. You can install via: @@ -70,7 +70,7 @@ Compared to the [2.4.27](#2_4_27) release: * fix: timestamp sequence deserialization issue #### Known issues {#known_issue_2_4_28} -1. If you have deployed one of the [2.3.x releases](/enterprise-v2.3), you cannot reuse the data and configuration directly. Please have a clean installation of 2.4.x release, then use tools like [timeplus sync](/cli-sync) CLI or [Timeplus External Stream](/timeplus-external-stream) for migration. +1. If you have deployed one of the [2.3.x releases](/enterprise-v2.3), you cannot reuse the data and configuration directly. Please have a clean installation of 2.4.x release, then use tools like [timeplus sync](/cli-sync) CLI or [Timeplus External Stream](/timeplus-source) for migration. ### 2.4.27 (Stable) {#2_4_27} Built on 08-05-2025. 
You can install via: @@ -94,7 +94,7 @@ Compared to the [2.4.26](#2_4_26) release: * fix: log truncation and garbage collection #### Known issues {#known_issue_2_4_27} -1. If you have deployed one of the [2.3.x releases](/enterprise-v2.3), you cannot reuse the data and configuration directly. Please have a clean installation of 2.4.x release, then use tools like [timeplus sync](/cli-sync) CLI or [Timeplus External Stream](/timeplus-external-stream) for migration. +1. If you have deployed one of the [2.3.x releases](/enterprise-v2.3), you cannot reuse the data and configuration directly. Please have a clean installation of 2.4.x release, then use tools like [timeplus sync](/cli-sync) CLI or [Timeplus External Stream](/timeplus-source) for migration. ### 2.4.26 (Stable) {#2_4_26} @@ -119,7 +119,7 @@ Compared to the [2.4.25](#2_4_25) release: * fix a bug during versioned schema fetch for inner storage of materialized views #### Known issues {#known_issue_2_4_26} -1. If you have deployed one of the [2.3.x releases](/enterprise-v2.3), you cannot reuse the data and configuration directly. Please have a clean installation of 2.4.x release, then use tools like [timeplus sync](/cli-sync) CLI or [Timeplus External Stream](/timeplus-external-stream) for migration. +1. If you have deployed one of the [2.3.x releases](/enterprise-v2.3), you cannot reuse the data and configuration directly. Please have a clean installation of 2.4.x release, then use tools like [timeplus sync](/cli-sync) CLI or [Timeplus External Stream](/timeplus-source) for migration. ### 2.4.25 (Stable) {#2_4_25} Built on 01-31-2025. You can install via: @@ -143,7 +143,7 @@ Compared to the [2.4.23](#2_4_23) release: * set mutable streams' default logstore retention policy from keeping forever to automatic #### Known issues {#known_issue_2_4_25} -1. If you have deployed one of the [2.3.x releases](/enterprise-v2.3), you cannot reuse the data and configuration directly. Please have a clean installation of 2.4.x release, then use tools like [timeplus sync](/cli-sync) CLI or [Timeplus External Stream](/timeplus-external-stream) for migration. +1. If you have deployed one of the [2.3.x releases](/enterprise-v2.3), you cannot reuse the data and configuration directly. Please have a clean installation of 2.4.x release, then use tools like [timeplus sync](/cli-sync) CLI or [Timeplus External Stream](/timeplus-source) for migration. ### 2.4.23 (Stable) {#2_4_23} Built on 08-22-2024. You can install via: @@ -168,7 +168,7 @@ Compared to the [2.4.19](#2_4_19) release: * bugfixes and performance enhancements #### Known issues {#known_issue_2_4_23} -1. If you have deployed one of the [2.3.x releases](/enterprise-v2.3), you cannot reuse the data and configuration directly. Please have a clean installation of 2.4.x release, then use tools like [timeplus sync](/cli-sync) CLI or [Timeplus External Stream](/timeplus-external-stream) for migration. +1. If you have deployed one of the [2.3.x releases](/enterprise-v2.3), you cannot reuse the data and configuration directly. Please have a clean installation of 2.4.x release, then use tools like [timeplus sync](/cli-sync) CLI or [Timeplus External Stream](/timeplus-source) for migration. ### 2.4.19 {#2_4_19} @@ -186,7 +186,7 @@ Compared to the [2.4.17](#2_4_17) release: * feat(ingest): use username:password for ingest API wizard #### Known issues {#known_issue_2_4_19} -1. If you have deployed one of the [2.3.x releases](/enterprise-v2.3), you cannot reuse the data and configuration directly. 
Please have a clean installation of 2.4.x release, then use tools like [timeplus sync](/cli-sync) CLI or [Timeplus External Stream](/timeplus-external-stream) for migration. +1. If you have deployed one of the [2.3.x releases](/enterprise-v2.3), you cannot reuse the data and configuration directly. Please have a clean installation of 2.4.x release, then use tools like [timeplus sync](/cli-sync) CLI or [Timeplus External Stream](/timeplus-source) for migration. 2. In Timeplus Console, no result will be shown for SQL [SHOW FORMAT SCHEMAS](/sql-show-format-schemas) or [SHOW FUNCTIONS](/sql-show-functions). This only impacts the web interface. Running such SQL via `timeplusd client` CLI or JDBC/ODBC will get the expected results. ### 2.4.17 {#2_4_17} @@ -201,14 +201,14 @@ Compared to the [2.4.16](#2_4_16) release: Components: * timeplusd - * feat: support running [table function](/functions_for_streaming#table) on [Timeplus External Stream](/timeplus-external-stream) + * feat: support running [table function](/functions_for_streaming#table) on [Timeplus External Stream](/timeplus-source) * improvement: track more stats: external_stream_read_failed, external_stream_written_failed, mv_recover_times, mv_memory_usage. * improvement: better track memory usage in macOS and Docker container. * feat: allow you to [drop streams](/sql-drop-stream#force_drop_big_stream) with `force_drop_big_stream=true` setting. * improvement: default listen for 0.0.0.0 instead 127.1 (localhost) #### Known issues {#known_issue_2_4_17} -1. If you have deployed one of the [2.3.x releases](/enterprise-v2.3), you cannot reuse the data and configuration directly. Please have a clean installation of 2.4.x release, then use tools like [timeplus sync](/cli-sync) CLI or [Timeplus External Stream](/timeplus-external-stream) for migration. +1. If you have deployed one of the [2.3.x releases](/enterprise-v2.3), you cannot reuse the data and configuration directly. Please have a clean installation of 2.4.x release, then use tools like [timeplus sync](/cli-sync) CLI or [Timeplus External Stream](/timeplus-source) for migration. 2. In Timeplus Console, no result will be shown for SQL [SHOW FORMAT SCHEMAS](/sql-show-format-schemas) or [SHOW FUNCTIONS](/sql-show-functions). This only impacts the web interface. Running such SQL via `timeplusd client` CLI or JDBC/ODBC will get the expected results. ### 2.4.16 (Stable) {#2_4_16} @@ -245,7 +245,7 @@ Components: * fix: list users properly #### Known issues {#known_issue_2_4_16} -1. If you have deployed one of the [2.3.x releases](/enterprise-v2.3), you cannot reuse the data and configuration directly. Please have a clean installation of 2.4.x release, then use tools like [timeplus sync](/cli-sync) CLI or [Timeplus External Stream](/timeplus-external-stream) for migration. +1. If you have deployed one of the [2.3.x releases](/enterprise-v2.3), you cannot reuse the data and configuration directly. Please have a clean installation of 2.4.x release, then use tools like [timeplus sync](/cli-sync) CLI or [Timeplus External Stream](/timeplus-source) for migration. 2. In Timeplus Console, no result will be shown for SQL [SHOW FORMAT SCHEMAS](/sql-show-format-schemas) or [SHOW FUNCTIONS](/sql-show-functions). This only impacts the web interface. Running such SQL via `timeplusd client` CLI or JDBC/ODBC will get the expected results. @@ -269,7 +269,7 @@ Components: * timeplusd * feat: [new mutable stream](/mutable-stream) for fast UPSERT and high performance point or range query. 
* perf: better asof join performance - * feat: [external stream to read data from the remote timeplusd](/timeplus-external-stream) + * feat: [external stream to read data from the remote timeplusd](/timeplus-source) * feat: [parallel key space scan](/mutable-stream#key_space_full_scan_threads) * feat: force_full_scan for mutable stream * feat: user management on cluster @@ -277,8 +277,8 @@ Components: * feat: support remote UDF on cluster * feat: primary key columns in secondary key * feat: support [ALTER STREAM .. ADD COLUMN ..](sql-alter-stream#add-column) - * feat: _tp_message_key to [read/write message keys in Kafka](/proton-kafka#_tp_message_key) - * feat: [Kafka schema registry support for Avro output format](/proton-schema-registry#write) + * feat: _tp_message_key to [read/write message keys in Kafka](/kafka-source) + * feat: [Kafka schema registry support for Avro output format](/kafka-schema-registry#write) * feat: support [idempotent keys processing](/idempotent) * feat: collect node free memory usage. You can get it via `select cluster_id, node_id, os_memory_total_mb, os_memory_free_mb, memory_used_mb, disk_total_mb, disk_free_mb, timestamp from system.cluster` * fix: nullptr access in window function @@ -317,6 +317,6 @@ Components: * feat: for stop command, terminate the service if graceful stop times out #### Known issues {#known_issue_2_4_15} -1. If you have deployed one of the [2.3.x releases](/enterprise-v2.3), you cannot reuse the data and configuration directly. Please have a clean installation of 2.4.x release, then use tools like [timeplus sync](/cli-sync) CLI or [Timeplus External Stream](/timeplus-external-stream) for migration. +1. If you have deployed one of the [2.3.x releases](/enterprise-v2.3), you cannot reuse the data and configuration directly. Please have a clean installation of 2.4.x release, then use tools like [timeplus sync](/cli-sync) CLI or [Timeplus External Stream](/timeplus-source) for migration. 2. In Timeplus Console, no result will be shown for SQL [SHOW FORMAT SCHEMAS](/sql-show-format-schemas) or [SHOW FUNCTIONS](/sql-show-functions). This only impacts the web interface. Running such SQL via `timeplusd client` CLI or JDBC/ODBC will get the expected results. 3. For [timeplus user](/cli-user) CLI, you need to add `--verbose` to `timeplus user list` command, in order to list users. diff --git a/docs/enterprise-v2.5.md b/docs/enterprise-v2.5.md index 59bbe3699..331bfa7d7 100644 --- a/docs/enterprise-v2.5.md +++ b/docs/enterprise-v2.5.md @@ -11,13 +11,13 @@ Each component tracks their changes with own version numbers. The version number ## Key Highlights Key highlights of this release: -* Reading or writing data in Apache Pulsar or StreamNative via External Stream. [Learn more](/pulsar-external-stream). +* Reading or writing data in Apache Pulsar or StreamNative via External Stream. [Learn more](/pulsar-source). * Connecting to various input or output systems via Redpanda Connect. [Learn more](/redpanda-connect). * Creating and managing users in the Web Console. You can change the password and assign the user either Administrator or Read-only role. * New [migrate](/cli-migrate) subcommand in [timeplus CLI](/cli-reference) for data migration and backup/restore. -* Materialized views auto-rebalancing in the cluster mode. [Learn more](/view#auto-balancing). +* Materialized views auto-rebalancing in the cluster mode. [Learn more](/materialized-view#auto-balancing). * Approximately 30% faster data ingestion and replication in the cluster mode. 
-* Performance improvement for [ASOF JOIN](/joins) and [EMIT ON UPDATE](/streaming-aggregations#emit_on_update). +* Performance improvement for [ASOF JOIN](/streaming-joins) and [EMIT ON UPDATE](/streaming-aggregations#emit_on_update). ## Supported OS {#os} |Deployment Type| OS | @@ -49,7 +49,7 @@ Compared to the [2.5.13](#2_5_13) release: * Handle log corruption more gracefully and fixes log truncation. #### Known issues {#known_issue_2_5_14} -1. If you have deployed one of the [2.4.x releases](/enterprise-v2.4), you can reuse the data and configuration directly. However, if your current deployment is [2.3](/enterprise-v2.3) or earlier, you cannot upgrade directly. Please have a clean installation of 2.5.x release, then use tools like [timeplus sync](/cli-sync) CLI or [Timeplus External Stream](/timeplus-external-stream) for migration. +1. If you have deployed one of the [2.4.x releases](/enterprise-v2.4), you can reuse the data and configuration directly. However, if your current deployment is [2.3](/enterprise-v2.3) or earlier, you cannot upgrade directly. Please have a clean installation of 2.5.x release, then use tools like [timeplus sync](/cli-sync) CLI or [Timeplus External Stream](/timeplus-source) for migration. 2. Pulsar external streams are only available in Linux bare metal builds and Linux-based Docker images. This type of external stream is not available in macOS bare metal builds. ### 2.5.13 (Public GA) {#2_5_13} @@ -72,7 +72,7 @@ Compared to the [2.5.12](#2_5_12) release: * Bug fixes without new features #### Known issues {#known_issue_2_5_13} -1. If you have deployed one of the [2.4.x releases](/enterprise-v2.4), you can reuse the data and configuration directly. However, if your current deployment is [2.3](/enterprise-v2.3) or earlier, you cannot upgrade directly. Please have a clean installation of 2.5.x release, then use tools like [timeplus sync](/cli-sync) CLI or [Timeplus External Stream](/timeplus-external-stream) for migration. +1. If you have deployed one of the [2.4.x releases](/enterprise-v2.4), you can reuse the data and configuration directly. However, if your current deployment is [2.3](/enterprise-v2.3) or earlier, you cannot upgrade directly. Please have a clean installation of 2.5.x release, then use tools like [timeplus sync](/cli-sync) CLI or [Timeplus External Stream](/timeplus-source) for migration. 2. Pulsar external streams are only available in Linux bare metal builds and Linux-based Docker images. This type of external stream is not available in macOS bare metal builds. ### 2.5.12 (Public GA) {#2_5_12} @@ -95,7 +95,7 @@ Compared to the [2.5.11](#2_5_11) release: * Able to drop malformed UDFs with `DROP FUNCTION udf_name SETTINGS force=true`. #### Known issues {#known_issue_2_5_12} -1. If you have deployed one of the [2.4.x releases](/enterprise-v2.4), you can reuse the data and configuration directly. However, if your current deployment is [2.3](/enterprise-v2.3) or earlier, you cannot upgrade directly. Please have a clean installation of 2.5.x release, then use tools like [timeplus sync](/cli-sync) CLI or [Timeplus External Stream](/timeplus-external-stream) for migration. +1. If you have deployed one of the [2.4.x releases](/enterprise-v2.4), you can reuse the data and configuration directly. However, if your current deployment is [2.3](/enterprise-v2.3) or earlier, you cannot upgrade directly. 
Please have a clean installation of 2.5.x release, then use tools like [timeplus sync](/cli-sync) CLI or [Timeplus External Stream](/timeplus-source) for migration. 2. Pulsar external streams are only available in Linux bare metal builds and Linux-based Docker images. This type of external stream is not available in macOS bare metal builds. ### 2.5.11 (Public GA) {#2_5_11} @@ -121,7 +121,7 @@ Compared to the [2.5.10](#2_5_10) release: You can upgrade a deployment of Timeplus Enterprise 2.4 to Timeplus Enterprise 2.5, by stopping the components and replacing the binary files, or reusing the Docker or Kubernetes volumes and update the image versions. #### Known issues {#known_issue_2_5_11} -1. If you have deployed one of the [2.4.x releases](/enterprise-v2.4), you can reuse the data and configuration directly. However, if your current deployment is [2.3](/enterprise-v2.3) or earlier, you cannot upgrade directly. Please have a clean installation of 2.5.x release, then use tools like [timeplus sync](/cli-sync) CLI or [Timeplus External Stream](/timeplus-external-stream) for migration. +1. If you have deployed one of the [2.4.x releases](/enterprise-v2.4), you can reuse the data and configuration directly. However, if your current deployment is [2.3](/enterprise-v2.3) or earlier, you cannot upgrade directly. Please have a clean installation of 2.5.x release, then use tools like [timeplus sync](/cli-sync) CLI or [Timeplus External Stream](/timeplus-source) for migration. 2. Pulsar external streams are only available in Linux bare metal builds and Linux-based Docker images. This type of external stream is not available in macOS bare metal builds. ### 2.5.10 (Controlled Release) {#2_5_10} @@ -149,7 +149,7 @@ Compared to the [2.5.9](#2_5_9) release: You can upgrade a deployment of Timeplus Enterprise 2.4 to Timeplus Enterprise 2.5, by stopping the components and replacing the binary files, or reusing the Docker or Kubernetes volumes and update the image versions. #### Known issues {#known_issue_2_5_10} -1. If you have deployed one of the [2.4.x releases](/enterprise-v2.4), you can reuse the data and configuration directly. However, if your current deployment is [2.3](/enterprise-v2.3) or earlier, you cannot upgrade directly. Please have a clean installation of 2.5.x release, then use tools like [timeplus sync](/cli-sync) CLI or [Timeplus External Stream](/timeplus-external-stream) for migration. +1. If you have deployed one of the [2.4.x releases](/enterprise-v2.4), you can reuse the data and configuration directly. However, if your current deployment is [2.3](/enterprise-v2.3) or earlier, you cannot upgrade directly. Please have a clean installation of 2.5.x release, then use tools like [timeplus sync](/cli-sync) CLI or [Timeplus External Stream](/timeplus-source) for migration. 2. Pulsar external streams are only available in Linux bare metal builds and Linux-based Docker images. This type of external stream is not available in macOS bare metal builds. ### 2.5.9 (Controlled Release) {#2_5_9} @@ -168,12 +168,12 @@ Component versions: Compared to the [2.4.23](/enterprise-v2.4#2_4_23) release: * timeplusd 2.3.30 -> 2.4.23 - * new type of [External Streams for Apache Pulsar](/pulsar-external-stream). + * new type of [External Streams for Apache Pulsar](/pulsar-source). * for bare metal installation, previously you can login with the username `default` with empty password. To improve the security, this user has been removed. * enhancement for nullable data types in streaming and historical queries. 
- * Materialized views auto-rebalancing in the cluster mode.[Learn more](/view#auto-balancing). + * Materialized views auto-rebalancing in the cluster mode.[Learn more](/materialized-view#auto-balancing). * Approximately 30% faster data ingestion and replication in the cluster mode. - * Performance improvement for [ASOF JOIN](/joins) and [EMIT ON UPDATE](/streaming-aggregations#emit_on_update). + * Performance improvement for [ASOF JOIN](/streaming-joins) and [EMIT ON UPDATE](/streaming-aggregations#emit_on_update). * timeplus_web 1.4.33 -> 2.0.6 * UI to add/remove user or change role and password. This works for both single node and cluster. * UI for inputs/outputs from Redpanda Connect. @@ -194,5 +194,5 @@ Compared to the [2.4.23](/enterprise-v2.4#2_4_23) release: You can upgrade a deployment of Timeplus Enterprise 2.4 to Timeplus Enterprise 2.5, by stopping the components and replacing the binary files, or reusing the Docker or Kubernetes volumes and update the image versions. #### Known issues {#known_issue_2_5_9} -1. If you have deployed one of the [2.4.x releases](/enterprise-v2.4), you can reuse the data and configuration directly. However, if your current deployment is [2.3](/enterprise-v2.3) or earlier, you cannot upgrade directly. Please have a clean installation of 2.5.x release, then use tools like [timeplus sync](/cli-sync) CLI or [Timeplus External Stream](/timeplus-external-stream) for migration. +1. If you have deployed one of the [2.4.x releases](/enterprise-v2.4), you can reuse the data and configuration directly. However, if your current deployment is [2.3](/enterprise-v2.3) or earlier, you cannot upgrade directly. Please have a clean installation of 2.5.x release, then use tools like [timeplus sync](/cli-sync) CLI or [Timeplus External Stream](/timeplus-source) for migration. 2. Pulsar external streams are only available in Linux bare metal builds and Linux-based Docker images. This type of external stream is not available in macOS bare metal builds. diff --git a/docs/enterprise-v2.6.md b/docs/enterprise-v2.6.md index b6993a136..267539d3e 100644 --- a/docs/enterprise-v2.6.md +++ b/docs/enterprise-v2.6.md @@ -13,7 +13,7 @@ Each component maintains its own version numbers. The version number for each Ti Key highlights of this release: * **Revolutionary hybrid hash table technology.** For streaming SQL with JOINs or aggregations, by default a memory based hash table is used. This is helpful for preventing the memory limits from being exceeded for large data streams with hundreds of GB of data. You can adjust the query setting to apply the new hybrid hash table, which uses both the memory and the local disk to store the internal state as a hash table. * **Enhanced operational visibility.** Gain complete transparency into your system's performance through comprehensive monitoring of materialized views and streams. Track state changes, errors, and throughput metrics via [system.stream_state_log](/system-stream-state-log) and [system.stream_metric_log](/system-stream-metric-log). -* **Advanced cross-deployment integration.** Seamlessly write data to remote Timeplus deployments by configuring [Timeplus external stream](/timeplus-external-stream) as targets in materialized views. +* **Advanced cross-deployment integration.** Seamlessly write data to remote Timeplus deployments by configuring [Timeplus external stream](/timeplus-source) as targets in materialized views. * **Improved data management capabilities.** Add new columns to an existing stream. 
Truncate historical data for streams. Create new databases to organize your streams and materialized views. * **Optimized ClickHouse integration.** Significant performance improvements for read/write operations with ClickHouse external tables. * **Enhanced user experience.** New UI wizards for Coinbase data sources and Apache Pulsar external streams, alongside a redesigned SQL Console and SQL Helper interface for improved usability. Quick access to streams, dashboards, and common actions via Command+K (Mac) or Windows+K (PC) keyboard shortcuts. @@ -51,7 +51,7 @@ Upgrade Instructions: Users can upgrade from Timeplus Enterprise 2.5 to 2.6 by stopping components and replacing binary files, or by updating Docker/Kubernetes image versions while maintaining existing volumes. #### Known issues {#known_issue_2_6_8} -1. Direct upgrades from version 2.3 or earlier are not supported. Please perform a clean installation of 2.6.x and utilize [timeplus sync](/cli-sync) CLI or [Timeplus External Stream](/timeplus-external-stream) for data migration. +1. Direct upgrades from version 2.3 or earlier are not supported. Please perform a clean installation of 2.6.x and utilize [timeplus sync](/cli-sync) CLI or [Timeplus External Stream](/timeplus-source) for data migration. 2. Pulsar external stream functionality is limited to Linux bare metal builds and Linux-based Docker images, excluding macOS bare metal builds. 3. The `timeplus_connector` component may experience health issues on Ubuntu Linux with x86_64 chips, affecting Redpanda Connect functionality. This issue is specific to Ubuntu and does not affect other Linux distributions. @@ -80,7 +80,7 @@ Upgrade Instructions: Users can upgrade from Timeplus Enterprise 2.5 to 2.6 by stopping components and replacing binary files, or by updating Docker/Kubernetes image versions while maintaining existing volumes. #### Known issues {#known_issue_2_6_7} -1. Direct upgrades from version 2.3 or earlier are not supported. Please perform a clean installation of 2.6.x and utilize [timeplus sync](/cli-sync) CLI or [Timeplus External Stream](/timeplus-external-stream) for data migration. +1. Direct upgrades from version 2.3 or earlier are not supported. Please perform a clean installation of 2.6.x and utilize [timeplus sync](/cli-sync) CLI or [Timeplus External Stream](/timeplus-source) for data migration. 2. Pulsar external stream functionality is limited to Linux bare metal builds and Linux-based Docker images, excluding macOS bare metal builds. 3. The `timeplus_connector` component may experience health issues on Ubuntu Linux with x86_64 chips, affecting Redpanda Connect functionality. This issue is specific to Ubuntu and does not affect other Linux distributions. @@ -108,7 +108,7 @@ Upgrade Instructions: Users can upgrade from Timeplus Enterprise 2.5 to 2.6 by stopping components and replacing binary files, or by updating Docker/Kubernetes image versions while maintaining existing volumes. #### Known issues {#known_issue_2_6_6} -1. Direct upgrades from version 2.3 or earlier are not supported. Please perform a clean installation of 2.6.x and utilize [timeplus sync](/cli-sync) CLI or [Timeplus External Stream](/timeplus-external-stream) for data migration. +1. Direct upgrades from version 2.3 or earlier are not supported. Please perform a clean installation of 2.6.x and utilize [timeplus sync](/cli-sync) CLI or [Timeplus External Stream](/timeplus-source) for data migration. 2. 
Pulsar external stream functionality is limited to Linux bare metal builds and Linux-based Docker images, excluding macOS bare metal builds. 3. The `timeplus_connector` component may experience health issues on Ubuntu Linux with x86_64 chips, affecting Redpanda Connect functionality. This issue is specific to Ubuntu and does not affect other Linux distributions. @@ -136,7 +136,7 @@ Upgrade Instructions: Users can upgrade from Timeplus Enterprise 2.5 to 2.6 by stopping components and replacing binary files, or by updating Docker/Kubernetes image versions while maintaining existing volumes. #### Known issues {#known_issue_2_6_5} -1. Direct upgrades from version 2.3 or earlier are not supported. Please perform a clean installation of 2.6.x and utilize [timeplus sync](/cli-sync) CLI or [Timeplus External Stream](/timeplus-external-stream) for data migration. +1. Direct upgrades from version 2.3 or earlier are not supported. Please perform a clean installation of 2.6.x and utilize [timeplus sync](/cli-sync) CLI or [Timeplus External Stream](/timeplus-source) for data migration. 2. Pulsar external stream functionality is limited to Linux bare metal builds and Linux-based Docker images, excluding macOS bare metal builds. 3. The `timeplus_connector` component may experience health issues on Ubuntu Linux with x86_64 chips, affecting Redpanda Connect functionality. This issue is specific to Ubuntu and does not affect other Linux distributions. @@ -164,7 +164,7 @@ Upgrade Instructions: Users can upgrade from Timeplus Enterprise 2.5 to 2.6 by stopping components and replacing binary files, or by updating Docker/Kubernetes image versions while maintaining existing volumes. #### Known issues {#known_issue_2_6_4} -1. Direct upgrades from version 2.3 or earlier are not supported. Please perform a clean installation of 2.6.x and utilize [timeplus sync](/cli-sync) CLI or [Timeplus External Stream](/timeplus-external-stream) for data migration. +1. Direct upgrades from version 2.3 or earlier are not supported. Please perform a clean installation of 2.6.x and utilize [timeplus sync](/cli-sync) CLI or [Timeplus External Stream](/timeplus-source) for data migration. 2. Pulsar external stream functionality is limited to Linux bare metal builds and Linux-based Docker images, excluding macOS bare metal builds. 3. The `timeplus_connector` component may experience health issues on Ubuntu Linux with x86_64 chips, affecting Redpanda Connect functionality. This issue is specific to Ubuntu and does not affect other Linux distributions. @@ -192,7 +192,7 @@ Upgrade Instructions: Users can upgrade from Timeplus Enterprise 2.5 to 2.6 by stopping components and replacing binary files, or by updating Docker/Kubernetes image versions while maintaining existing volumes. #### Known issues {#known_issue_2_6_3} -1. Direct upgrades from version 2.3 or earlier are not supported. Please perform a clean installation of 2.6.x and utilize [timeplus sync](/cli-sync) CLI or [Timeplus External Stream](/timeplus-external-stream) for data migration. +1. Direct upgrades from version 2.3 or earlier are not supported. Please perform a clean installation of 2.6.x and utilize [timeplus sync](/cli-sync) CLI or [Timeplus External Stream](/timeplus-source) for data migration. 2. Pulsar external stream functionality is limited to Linux bare metal builds and Linux-based Docker images, excluding macOS bare metal builds. 3. 
The `timeplus_connector` component may experience health issues on Ubuntu Linux with x86_64 chips, affecting Redpanda Connect functionality. This issue is specific to Ubuntu and does not affect other Linux distributions. @@ -223,7 +223,7 @@ Upgrade Instructions: Users can upgrade from Timeplus Enterprise 2.5 to 2.6 by stopping components and replacing binary files, or by updating Docker/Kubernetes image versions while maintaining existing volumes. #### Known issues {#known_issue_2_6_2} -1. Direct upgrades from version 2.3 or earlier are not supported. Please perform a clean installation of 2.6.x and utilize [timeplus sync](/cli-sync) CLI or [Timeplus External Stream](/timeplus-external-stream) for data migration. +1. Direct upgrades from version 2.3 or earlier are not supported. Please perform a clean installation of 2.6.x and utilize [timeplus sync](/cli-sync) CLI or [Timeplus External Stream](/timeplus-source) for data migration. 2. Pulsar external stream functionality is limited to Linux bare metal builds and Linux-based Docker images, excluding macOS bare metal builds. 3. The `timeplus_connector` component may experience health issues on Ubuntu Linux with x86_64 chips, affecting Redpanda Connect functionality. This issue is specific to Ubuntu and does not affect other Linux distributions. @@ -252,7 +252,7 @@ Compared to the [2.5.12](/enterprise-v2.5#2_5_12) release: * Implemented Kafka offset tracking in [system.stream_state_log](/system-stream-state-log), exportable via [timeplus diag](/cli-diag) command. * A `_tp_sn` column is added to each stream (except external streams or random streams), as the sequence number in the unified streaming and historical storage. This column is used for data replication among the cluster. By default, it is hidden in the query results. You can show it by setting `SETTINGS asterisk_include_tp_sn_column=true`. This setting is required when you use `INSERT..SELECT` SQL to copy data between streams: `INSERT INTO stream2 SELECT * FROM stream1 SETTINGS asterisk_include_tp_sn_column=true`. * New Features: - * Support for continuous data writing to remote Timeplus deployments via setting a [Timeplus external stream](/timeplus-external-stream) as the target in a materialized view. + * Support for continuous data writing to remote Timeplus deployments via setting a [Timeplus external stream](/timeplus-source) as the target in a materialized view. * New [EMIT PERIODIC .. REPEAT](/streaming-aggregations#emit_periodic_repeat) syntax for emitting the last aggregation result even when there is no new event. * Able to create or drop databases via SQL in a cluster. The web console will be enhanced to support different databases in the next release. * Historical data of a stream can be removed by `TRUNCATE STREAM stream_name`. @@ -288,6 +288,6 @@ Upgrade Instructions: Users can upgrade from Timeplus Enterprise 2.5 to 2.6 by stopping components and replacing binary files, or by updating Docker/Kubernetes image versions while maintaining existing volumes. #### Known issues {#known_issue_2_6_0} -1. Direct upgrades from version 2.3 or earlier are not supported. Please perform a clean installation of 2.6.x and utilize [timeplus sync](/cli-sync) CLI or [Timeplus External Stream](/timeplus-external-stream) for data migration. +1. Direct upgrades from version 2.3 or earlier are not supported. Please perform a clean installation of 2.6.x and utilize [timeplus sync](/cli-sync) CLI or [Timeplus External Stream](/timeplus-source) for data migration. 2. 
Pulsar external stream functionality is limited to Linux bare metal builds and Linux-based Docker images, excluding macOS bare metal builds. 3. The `timeplus_connector` component may experience health issues on Ubuntu Linux with x86_64 chips, affecting Redpanda Connect functionality. This issue is specific to Ubuntu and does not affect other Linux distributions. diff --git a/docs/enterprise-v2.7.md b/docs/enterprise-v2.7.md index f80825fbb..7d4ccf564 100644 --- a/docs/enterprise-v2.7.md +++ b/docs/enterprise-v2.7.md @@ -11,7 +11,7 @@ Each component maintains its own version numbers. The version number for each Ti ## Key Highlights Key highlights of this release: -* **Stream processing for files in S3 buckets:** With the new [S3 external table](/s3-external), Timeplus Enterprise now supports writing stream processing results to S3 buckets, or reading files in S3. +* **Stream processing for files in S3 buckets:** With the new [S3 external table](/s3-sink), Timeplus Enterprise now supports writing stream processing results to S3 buckets, or reading files in S3. * **Join the latest data from MySQL or ClickHouse via dictionary:** You can now create a [dictionary](/sql-create-dictionary) to store key-value pairs in memory or a mutable stream, with data from various sources, such as files, MySQL/ClickHouse databases, or streams in Timeplus. * **PostgreSQL and MySQL CDC via Redpanda Connect:** Timeplus Enterprise now supports CDC (Change Data Capture) for PostgreSQL and MySQL databases via Redpanda Connect. This feature enables real-time data ingestion from these databases into Timeplus. * **Support IAM authentication for accessing Amazon MSK:** Avoid storing static credentials in Kafka external streams by setting `sasl_mechanism` to `AWS_MSK_IAM`. @@ -29,7 +29,7 @@ Key highlights of this release: |Kubernetes|Kubernetes 1.25+, with Helm 3.12+| ## Upgrade Guide -1. Direct upgrades from version 2.3 or earlier are not supported. Please perform a clean installation of 2.7.x and utilize [timeplus sync](/cli-sync) CLI or [Timeplus External Stream](/timeplus-external-stream) for data migration. +1. Direct upgrades from version 2.3 or earlier are not supported. Please perform a clean installation of 2.7.x and utilize [timeplus sync](/cli-sync) CLI or [Timeplus External Stream](/timeplus-source) for data migration. 2. For bare metal users, you can upgrade from Timeplus Enterprise 2.6 to 2.7 by stopping components and replacing binary files. 3. For Kubernetes users, please follow [the guide](/k8s-helm#v5-to-v6) carefully since a few timeplusd built-in users are removed in the new helm chart, and you can configure ingress for Appserver and Timeplusd independently. @@ -304,13 +304,13 @@ Compared to the [2.6.0](/enterprise-v2.6#2_6_0) release: * To improve performance, we have optimized the schema for [system.stream_metric_log](/system-stream-metric-log) and [system.stream_state_log](/system-stream-state-log). * Security Enhancements: * **Support IAM authentication for accessing Amazon MSK:** Avoid storing static credentials in Kafka external streams by setting `sasl_mechanism` to `AWS_MSK_IAM`. - * **Integration with HashiCorp Vault:** You can now use HashiCorp Vault to store sensitive data, such as password for all types of external streams or external tables, and reference them in [config_file](/proton-kafka#config_file) setting. 
+ * **Integration with HashiCorp Vault:** You can now use HashiCorp Vault to store sensitive data, such as password for all types of external streams or external tables, and reference them in [config_file](/kafka-source#config_file) setting. * Specify the non-root user in the Docker image to improve security. * New Features: - * **Stream processing for files in S3 buckets:** With the new [S3 external table](/s3-external), Timeplus Enterprise now supports writing stream processing results to S3 buckets, or read files in S3. + * **Stream processing for files in S3 buckets:** With the new [S3 external table](/s3-sink), Timeplus Enterprise now supports writing stream processing results to S3 buckets, or read files in S3. * **Join the latest data from MySQL or ClickHouse via dictionary:** You can now create a [dictionary](/sql-create-dictionary) to store key-value pairs in memory or a mutable stream, with data from various sources, such as files, MySQL/ClickHouse databases, or streams in Timeplus. * Replay historical data in local streams or Kafka external streams with the [replay_speed](/query-settings#replay_speed) setting. - * Read the header key-value pairs in the kafka external stream. [Learn more](/proton-kafka#_tp_message_headers) + * Read the header key-value pairs in the kafka external stream. [Learn more](/kafka-source) * [Python UDF](/py-udf): You can now create user-defined functions (UDFs) in Python to extend the functionality of Timeplus with rich ecosystem of Python. It's currently in technical preview for Linux x86_64 only. * timeplus_web 2.1.7 -> 2.2.10 * Significant improvements of materialized view monitoring and troubleshooting UI. diff --git a/docs/enterprise-v2.8.md b/docs/enterprise-v2.8.md index ba6b4b90d..f1c89a0eb 100644 --- a/docs/enterprise-v2.8.md +++ b/docs/enterprise-v2.8.md @@ -11,8 +11,8 @@ Each component maintains its own version numbers. The version number for each Ti ## Key Highlights Key highlights of this release: -* New Compute Node server role to [run materialized views elastically](/view#autoscaling_mv) with checkpoints on S3 storage. -* Timeplus can read or write data in Apache Iceberg tables. [Learn more](/iceberg) +* New Compute Node server role to [run materialized views elastically](/materialized-view#autoscaling_mv) with checkpoints on S3 storage. +* Timeplus can read or write data in Apache Iceberg tables. [Learn more](/iceberg-sink) * Timeplus can read or write PostgreSQL tables directly via [PostgreSQL External Table](/pg-external-table) or look up data via [dictionaries](/sql-create-dictionary#source_pg). * Use S3 as the [tiered storage](/tiered-storage) for streams. * New SQL command to [rename streams](/sql-rename-stream) or [columns](/sql-alter-stream#rename-column). @@ -135,11 +135,11 @@ Compared to the [2.8.1](#2_8_1) release: * Able to add or drop secondary index for mutable streams. * Able to set `version_column` to make sure only rows with higher value of the `version_column` will override the rows with same primary key. This setting can work with or without `coalesced`. * Support the `UUID` data type for primary key columns. - * **[HTTP External Stream](/http-external):** Added a new type of external stream to send streaming data to external HTTP endpoints, such as Splunk, Open Search and Slack. - * **[MongoDB External Table](/mongo-external):** Added a new type of external table to send streaming data to MongoDB. 
+ * **[HTTP External Stream](/http-external-stream):** Added a new type of external stream to send streaming data to external HTTP endpoints, such as Splunk, Open Search and Slack. + * **[MongoDB External Table](/mongo-external-table):** Added a new type of external table to send streaming data to MongoDB. * Enhanced [MySQL External Table](/mysql-external-table) to support `replace_query` and `on_duplicate_clause` settings. - * Enhanced [Kafka External Stream](/proton-kafka) allows to customize the `partitioner` property, e.g. `settings properties='partitioner=murmur2'`. - * Enhanced [Kafka External Stream](/proton-kafka) and [Pulsar External Stream](/pulsar-external-stream) to support write message headers via `_tp_message_headers`. + * Enhanced [Kafka External Stream](/kafka-source) allows to customize the `partitioner` property, e.g. `settings properties='partitioner=murmur2'`. + * Enhanced [Kafka External Stream](/kafka-source) and [Pulsar External Stream](/pulsar-source) to support write message headers via `_tp_message_headers`. * Support [map_from_arrays](/functions_for_comp#map_from_arrays) and [map_cast](/functions_for_comp#map_cast) with 4 or more parameters. * [SHOW CREATE](/sql-show-create#show_multi_versions) command supports `show_multi_versions=true` to get the history of the object. * New query setting [precise_float_parsing](/query-settings#precise_float_parsing) to precisely handle float numbers. @@ -150,7 +150,7 @@ Compared to the [2.8.1](#2_8_1) release: * Improved the support for gRPC protocol. * Support [EMIT TIMEOUT](/streaming-aggregations#emit-timeout) for both global aggregations and window aggregations. * Able to change log level during runtime via [SYSTEM SET LOG LEVEL](/sql-system-set-log-level) or REST API. - * Support new JOIN type [FULL LATEST JOIN](/joins#full-latest-join). + * Support new JOIN type [FULL LATEST JOIN](/streaming-joins#full-latest-join). * timeplus_web 2.8.8 -> 2.8.12 * Some new UI features and enhancements in 2.9 are ported to 2.8.2: * **Materialized Views (MVs):** @@ -205,7 +205,7 @@ Compared to the [2.8.0 (Preview)](#2_8_0) release: * Fix Kafka external stream parsing issue. * Improve mutable stream creation flow when defined via engine. * When using `CREATE OR REPLACE FORMAT SCHEMA` to update an existing schema, and using `DROP FORMAT SCHEMA` to delete a schema, Timeplus will clean up the Protobuf schema cache to avoid misleading errors. - * Support writing Kafka message timestamp via [_tp_time](/proton-kafka) + * Support writing Kafka message timestamp via [_tp_time](/kafka-source) * Enable IPv6 support for KeyValueService * Simplified the [EMIT syntax](/streaming-aggregations#emit) to make it easier to read and use. * Support [EMIT ON UPDATE WITH DELAY](/streaming-aggregations#emit_on_update_with_delay) @@ -281,7 +281,7 @@ If you are still not sure, here are the things that would be broken without migr For Kubernetes users, please follow [the guide](/k8s-helm#v6-to-v7) to do the migration. #### Known issues {#known_issue_2_8_0} -1. Direct upgrades from version 2.3 or earlier are not supported. Please perform a clean installation of 2.7.x and utilize [timeplus sync](/cli-sync) CLI or [Timeplus External Stream](/timeplus-external-stream) for data migration. +1. Direct upgrades from version 2.3 or earlier are not supported. Please perform a clean installation of 2.7.x and utilize [timeplus sync](/cli-sync) CLI or [Timeplus External Stream](/timeplus-source) for data migration. 2. 
Pulsar external stream functionality is limited to Linux bare metal builds and Linux-based Docker images, excluding macOS bare metal builds. 3. The `timeplus_connector` component may experience health issues on Ubuntu Linux with x86_64 chips, affecting Redpanda Connect functionality. This issue is specific to Ubuntu and does not affect other Linux distributions. 4. Python UDF support is limited to Linux x86_64 bare metal and Linux x86_64 Docker image, excluding macOS or ARM builds. diff --git a/docs/enterprise-v2.9.md b/docs/enterprise-v2.9.md index 27e35c53e..5480d6e92 100644 --- a/docs/enterprise-v2.9.md +++ b/docs/enterprise-v2.9.md @@ -15,9 +15,9 @@ Key highlights of the Timeplus 2.9 release include: * **Enhanced Mutable Streams:** Introducing online schema evolution, versioning, coalesced storage, Time-To-Live (TTL), and secondary index management capabilities. * **Native JSON Support:** A new native JSON data type and powerful [json_encode](/functions_for_json#json_encode) / [json_cast](/functions_for_json#json_cast) functions simplify working with JSON. * **Improved Data Integrity:** Dead Letter Queue (DLQ) support for Materialized Views ensures robust data processing. -* **Expanded Connectivity:** Native [HTTP External Stream](/http-external) for seamless integration with systems like Splunk, Elasticsearch, and more. +* **Expanded Connectivity:** Native [HTTP External Stream](/http-external-stream) for seamless integration with systems like Splunk, Elasticsearch, and more. * **Performance Boost:** [JIT (Just-In-Time) compilation](/jit) for streaming queries delivers significant performance and efficiency improvements. Large cardinality sessionization. -* **Parameterized Views:** Create [Parameterized Views](/view#parameterized-views) for more flexible and reusable query patterns. +* **Parameterized Views:** Create [Parameterized Views](/view#parameterized-view) for more flexible and reusable query patterns. * **Scalable Log Processing:** Distributed LogStream enables efficient handling of large volumes of log data. * **Broader UDF Support:** Python UDFs now run on ARM CPUs (Linux/macOS), and JavaScript UDFs benefit from multiple V8 instances. * **Refined Cluster UI:** The web console offers an improved experience for visualizing and managing cluster nodes. @@ -34,7 +34,7 @@ We recommend using stable releases for production deployment. Engineering builds ### 2.9.0 (Preview 3) {#2_9_0-preview_3} Released on 07-31-2025. Installation options: -* For Linux or Mac users: `curl https://install.timeplus.com/2.9 | sh` [Downloads](/release-downloads#2_9_0-preview_3) +* For Linux or Mac users: `curl https://install.timeplus.com/2.9 | sh` [Downloads](/release-downloads) * For Docker users (not recommended for production): `docker run -p 8000:8000 docker.timeplus.com/timeplus/timeplus-enterprise:2.9.0-preview.3` * We will provide new Helm Charts for Kubernetes deployment when v2.9 is GA. @@ -49,15 +49,15 @@ Component versions: Compared to the [2.8.1](/enterprise-v2.8#2_8_1) release: * timeplusd 2.8.26 -> 2.9.9-rc.26 * New Features: - * **Parameterized Views:** You can now create [parameterized views](/view#parameterized-views), allowing for more dynamic and reusable view definitions. + * **Parameterized Views:** You can now create [parameterized views](/view#parameterized-view), allowing for more dynamic and reusable view definitions. 
* **JIT Compilation for Queries:** Introduced [Just-In-Time (JIT) compilation](/jit) for queries, potentially improving execution performance for certain query types. * **New JSON Data Type & SQL Functions:** Added a native JSON data type and SQL functions [json_encode](/functions_for_json#json_encode), [json_cast](/functions_for_json#json_cast), [json_array_length](/functions_for_json#json_array_length), [json_merge_patch](/functions_for_json#json_merge_patch) for powerful JSON manipulation. * **Mutable Stream TTL:** You can now define Time-To-Live (TTL) for data in mutable streams, automatically managing data retention. * **Materialized View DLQ:** Introduced Dead Letter Queue (DLQ) support for materialized views to handle data processing errors more robustly. - * **[HTTP External Stream](/http-external):** Added a new type of external stream to send streaming data to external HTTP endpoints, such as Splunk, Open Search and Slack. - * **[MongoDB External Table](/mongo-external):** Added a new type of external table to send streaming data to MongoDB. + * **[HTTP External Stream](/http-external-stream):** Added a new type of external stream to send streaming data to external HTTP endpoints, such as Splunk, Open Search and Slack. + * **[MongoDB External Table](/mongo-external-table):** Added a new type of external table to send streaming data to MongoDB. * Enhanced [MySQL External Table](/mysql-external-table) to support `replace_query` and `on_duplicate_clause` settings. - * Enhanced [Kafka External Stream](/proton-kafka) and [Pulsar External Stream](/pulsar-external-stream) to support write message headers via `_tp_message_headers` + * Enhanced [Kafka External Stream](/kafka-source) and [Pulsar External Stream](/pulsar-source) to support write message headers via `_tp_message_headers` * Build and manage [Alerts](/alert) with SQL. Monitor your streaming data and automatically trigger actions when specific conditions are met. * **Python UDFs on ARM:** Python User-Defined Functions (UDFs) are now supported on ARM-based architectures (Linux/macOS), expanding platform compatibility. * **Improved JavaScript UDFs:** Enhanced JavaScript UDF execution with support for multiple V8 instances, improving concurrency and isolation (also available in 2.8.1 or above). JavaScript User Defined Aggregation Function supports null value as input. @@ -72,7 +72,7 @@ Compared to the [2.8.1](/enterprise-v2.8#2_8_1) release: * **Modifying Comments:** Added `ALTER COMMENT` support for streams, views, materialized views, KVStreams, and RandomStreams. * **Mutable Stream Schema Evolution:** Support for adding new columns and dropping secondary indexes in mutable streams. * Support writing to nested array of records Avro schemas - * Enhanced [Kafka External Stream](/proton-kafka) allows to customize the `partitioner` property, e.g. `settings properties='partitioner=murmur2'` + * Enhanced [Kafka External Stream](/kafka-source) allows to customize the `partitioner` property, e.g. `settings properties='partitioner=murmur2'` * New query setting [precise_float_parsing](/query-settings#precise_float_parsing) to precisely handle float numbers. * Added emit policy [EMIT TIMEOUT](/streaming-aggregations#emit-timeout) and [EMIT PER EVENT](/streaming-aggregations#emit-per-event). * Added new functions `array_partial_sort`, `array_partial_reverse_sort`, and `ulid_string_to_date_time`. 
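As a quick, hedged illustration of the `partitioner` enhancement listed above: the sketch below shows how such a property could be set on a Kafka external stream. The broker address, topic and columns are placeholder assumptions, not part of these release notes.

```sql
-- Minimal sketch only: broker, topic and column names are assumed placeholders.
-- The partitioner is passed through the `properties` setting, as noted above.
CREATE EXTERNAL STREAM k_orders (
  order_id string,
  amount   float64
)
SETTINGS
  type = 'kafka',
  brokers = 'kafka-1:9092',
  topic = 'orders',
  data_format = 'JSONEachRow',
  properties = 'partitioner=murmur2';
```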
@@ -106,7 +106,7 @@ Compared to the [2.8.1](/enterprise-v2.8#2_8_1) release: * Improved layout for HTTP source creation and other external stream Guided Data Ingestion (GDI) UIs. * **SQL Query:** side panel is simplified by removing the snippets and functions accordion, long SQL statement is wrapped by default, cursor position is kept when you switch pages or tabs. * Resource Management (Streams, MVs, Views, UDFs): - * Replaced the Redpanda-Connect based HTTP sink and Slack sink with the new [HTTP External Stream](/http-external) in the core engine. + * Replaced the Redpanda-Connect based HTTP sink and Slack sink with the new [HTTP External Stream](/http-external-stream) in the core engine. * **Materialized Views (MVs):** * Added UI support for **pausing and resuming** materialized views. * Introduced **Dead Letter Queue (DLQ)** support and UI for MVs. @@ -147,7 +147,7 @@ Upgrade Instructions: If you install Timeplus Enterprise 2.7 or earlier, the metadata for the Redpanda Connect sources and sinks are saved in a special key/value service. v2.8 switches to mutable streams for such metadata by default and provides a migration tool. In 2.9, all metadata are saved in mutable streams and the previous key/value service has been removed. Please upgrade to 2.8 first if you are on 2.7 or earlier. Then upgrade to 2.9. #### Known issues {#known_issue_2_9_0-preview_2} -1. Direct upgrades from version 2.3 or earlier are not supported. Please perform a clean installation of 2.9.x and utilize [timeplus sync](/cli-sync) CLI or [Timeplus External Stream](/timeplus-external-stream) for data migration. +1. Direct upgrades from version 2.3 or earlier are not supported. Please perform a clean installation of 2.9.x and utilize [timeplus sync](/cli-sync) CLI or [Timeplus External Stream](/timeplus-source) for data migration. 2. For existing deployments with any version from 2.3 to 2.7, please upgrade to 2.8 first and migrate the metadata. . 3. Pulsar external stream functionality is limited to Linux bare metal builds and Linux-based Docker images, excluding macOS bare metal builds. 4. The `timeplus_connector` component may experience health issues on Ubuntu Linux with x86_64 chips, affecting Redpanda Connect functionality. This issue is specific to Ubuntu and does not affect other Linux distributions. diff --git a/docs/external-stream.md b/docs/external-stream.md index 4d9a5237b..93ec88c15 100644 --- a/docs/external-stream.md +++ b/docs/external-stream.md @@ -5,9 +5,9 @@ You can create **External Streams** in Timeplus to query data in the external sy You can run streaming analytics with the external streams in the similar way as other streams. Timeplus supports 4 types of external streams: -* [Kafka External Stream](/proton-kafka) -* [Pulsar External Stream](/pulsar-external-stream) -* [Timeplus External Stream](/timeplus-external-stream), only available in Timeplus Enterprise +* [Kafka External Stream](/kafka-source) +* [Pulsar External Stream](/pulsar-source) +* [Timeplus External Stream](/timeplus-source), only available in Timeplus Enterprise * [Log External Stream](/log-stream) (experimental) -Besides external streams, Timeplus also provides external tables to query data in ClickHouse, MySQL, Postgres or S3/Iceberg. The difference of external tables and external streams is that external tables are not real-time, and they are not designed for streaming analytics. You can use external tables to query data in the external systems, but you cannot run streaming SQL on them. 
[Learn more about external tables](/proton-clickhouse-external-table). +Besides external streams, Timeplus also provides external tables to query data in ClickHouse, MySQL, Postgres or S3/Iceberg. The difference of external tables and external streams is that external tables are not real-time, and they are not designed for streaming analytics. You can use external tables to query data in the external systems, but you cannot run streaming SQL on them. [Learn more about external tables](/clickhouse-external-table). diff --git a/docs/functions_for_datetime.md b/docs/functions_for_datetime.md index e3748d7f0..38cf7c839 100644 --- a/docs/functions_for_datetime.md +++ b/docs/functions_for_datetime.md @@ -216,7 +216,7 @@ Supported unit: ### date_diff_within -`date_diff_within(timegap,time1, time2)` returns true or false. This function only works in [stream-to-stream join](/joins). Check whether the gap between `time1` and `time2` are within the specific range. For example `date_diff_within(10s,payment.time,notification.time)` to check whether the payment time and notification time are within 10 seconds or less. +`date_diff_within(timegap,time1, time2)` returns true or false. This function only works in [stream-to-stream join](/streaming-joins). Check whether the gap between `time1` and `time2` are within the specific range. For example `date_diff_within(10s,payment.time,notification.time)` to check whether the payment time and notification time are within 10 seconds or less. ### date_trunc diff --git a/docs/functions_for_streaming.md b/docs/functions_for_streaming.md index 9e722ccb8..242626d71 100644 --- a/docs/functions_for_streaming.md +++ b/docs/functions_for_streaming.md @@ -22,7 +22,7 @@ Please note, the `table` function also works in other types of streams: * Timeplus external stream: read the existing data for a stream in a remote Timeplus. * Random stream: generate a block of random data. The number of rows in the block is predefined and subject to change. The current value is 65409. For testing or demonstration purpose, you can create a random stream with multiple columns and use the table function to generate random data at once. -Learn more about [Non-streaming queries](/history). +Learn more about [Non-streaming queries](/historical-query). ### tumble @@ -114,7 +114,7 @@ Otherwise, if you run queries with `dedup(table(my_stream),id)` the earliest eve ### date_diff_within -`date_diff_within(timegap,time1, time2)` returns true or false. This function only works in [Range Bidirectional Join](/joins#range-join). Check whether the gap between `time1` and `time2` are within the specific range. For example `date_diff_within(10s,payment.time,notification.time)` to check whether the payment time and notification time are within 10 seconds or less. +`date_diff_within(timegap,time1, time2)` returns true or false. This function only works in [Range Bidirectional Join](/streaming-joins#range-join). Check whether the gap between `time1` and `time2` are within the specific range. For example `date_diff_within(10s,payment.time,notification.time)` to check whether the payment time and notification time are within 10 seconds or less. 
✅ streaming query diff --git a/docs/glossary.md b/docs/glossary.md index b02b39b54..d6d4df1fb 100644 --- a/docs/glossary.md +++ b/docs/glossary.md @@ -45,7 +45,7 @@ Event time is used almost everywhere in Timeplus data processing and analysis wo #### Specify during data ingestion -When you [ingest data](/ingestion) into Timeplus, you can specify an attribute in the data which best represents the event time. Even if the attribute is in `String` type, Timeplus will automatically convert it to a timestamp for further processing. +When you [ingest data](/connect-data-in) into Timeplus, you can specify an attribute in the data which best represents the event time. Even if the attribute is in `String` type, Timeplus will automatically convert it to a timestamp for further processing. If you don't choose an attribute in the wizard, then Timeplus will use the ingestion time to present the event time, i.e. when Timeplus receives the data. This may work well for most static or dimensional data, such as city names with zip codes. @@ -98,7 +98,7 @@ Once the materialized view is created, Timeplus will run the query in the backgr Timeplus provides powerful streaming analytics capabilities through the enhanced SQL. By default, queries are unbounded and keep pushing the latest results to the client. The unbounded query can be converted to a bounded query by applying the function [table()](/functions_for_streaming#table), when the user wants to ask the question about what has happened like the traditional SQL. -Learn more: [Streaming Query](/stream-query) and [Non-Streaming Query](/history) +Learn more: [Streaming Query](/streaming-query) and [Non-Streaming Query](/historical-query) ## sink {#sink} @@ -106,13 +106,13 @@ a.k.a. destination. Only available in Timeplus Enterprise, not in Timeplus Proto Timeplus enables you to send real-time insights or transformed data to other systems, either to notify individuals or power up downstream applications. -Learn more: [Destination](/destination). +Learn more: [Destination](/send-data-out). ## source {#source} -A source is a background job in Timeplus Enterprise to load data into a [stream](#stream). For Kafka API compatible streaming data platform, you need to create [Kafka external streams](/proton-kafka). +A source is a background job in Timeplus Enterprise to load data into a [stream](#stream). For Kafka API compatible streaming data platform, you need to create [Kafka external streams](/kafka-source). -Learn more: [Data Collection](/ingestion) +Learn more: [Data Collection](/connect-data-in) ## stream {#stream} @@ -128,4 +128,4 @@ When you create a source and preview the data, you can choose a column as the ti You can define reusable SQL statements as views, so that you can query them as if they are streams `select .. from view1 ..` By default, views don't take any extra computing or storage resources. They are expanded to the SQL definition when they are queried. You can also create materialized views to 'materialize' them (keeping running them in the background and saving the results to the disk). 
-Learn more: [View](/view) and [Materialized View](/view#m_view) +Learn more: [View](/materialized-view) and [Materialized View](/materialized-view) diff --git a/docs/history.md b/docs/historical-query.md similarity index 100% rename from docs/history.md rename to docs/historical-query.md diff --git a/docs/howtos.md b/docs/howtos.md index ee969b021..13d341a07 100644 --- a/docs/howtos.md +++ b/docs/howtos.md @@ -2,7 +2,7 @@ ## How to read/write Kafka or Redpanda {#kafka} -You use [External Stream](/proton-kafka) to read from Kafka topics or write data to the topics. We verified the integration with Apache Kafka, Confluent Cloud, Confluent Platform, Redpanda, WarpStream and many more. +You use [External Stream](/kafka-source) to read from Kafka topics or write data to the topics. We verified the integration with Apache Kafka, Confluent Cloud, Confluent Platform, Redpanda, WarpStream and many more. ```sql CREATE EXTERNAL STREAM [IF NOT EXISTS] stream_name @@ -19,11 +19,11 @@ For PostgreSQL, MySQL or other OLTP databases, you can apply the CDC (Change Dat -If you have data in local ClickHouse or ClickHouse Cloud, you can also use [External Table](/proton-clickhouse-external-table) to read data. +If you have data in local ClickHouse or ClickHouse Cloud, you can also use [External Table](/clickhouse-external-table) to read data. ## How to read/write ClickHouse {#clickhouse} -You use [External Table](/proton-clickhouse-external-table) to read from ClickHouse tables or write data to the ClickHouse tables. We verified the integration with self-hosted ClickHouse, ClickHouse Cloud, Aiven for ClickHouse and many more. +You use [External Table](/clickhouse-external-table) to read from ClickHouse tables or write data to the ClickHouse tables. We verified the integration with self-hosted ClickHouse, ClickHouse Cloud, Aiven for ClickHouse and many more. @@ -37,7 +37,7 @@ You can use tools like Debezium to send CDC messages to Timeplus, or just use `I ## How to work with JSON {#json} -Proton supports powerful, yet easy-to-use JSON processing. You can save the entire JSON document as a `raw` column in `string` type. Then use JSON path as the shortcut to access those values as string. For example `raw:a.b.c`. If your data is in int/float/bool or other type, you can also use `::` to convert them. For example `raw:a.b.c::int`. If you want to read JSON documents in Kafka topics, you can choose to read each JSON as a `raw` string, or read each top level key/value pairs as columns. Please check the [doc](/proton-kafka) for details. +Proton supports powerful, yet easy-to-use JSON processing. You can save the entire JSON document as a `raw` column in `string` type. Then use JSON path as the shortcut to access those values as string. For example `raw:a.b.c`. If your data is in int/float/bool or other type, you can also use `::` to convert them. For example `raw:a.b.c::int`. If you want to read JSON documents in Kafka topics, you can choose to read each JSON as a `raw` string, or read each top level key/value pairs as columns. Please check the [doc](/kafka-source) for details. diff --git a/docs/http-external.md b/docs/http-external-stream.md similarity index 58% rename from docs/http-external.md rename to docs/http-external-stream.md index 2831f7f94..62ba2ceb6 100644 --- a/docs/http-external.md +++ b/docs/http-external-stream.md @@ -4,7 +4,7 @@ You can send data to HTTP endpoints via the HTTP External Stream. 
You can use th Currently, it only supports writing data to HTTP endpoints, but reading data from HTTP endpoints is not supported yet. -## CREATE EXTERNAL STREAM +## Create HTTP External Stream To create an external stream for HTTP endpoints, you can run the following DDL SQL: @@ -40,66 +40,6 @@ For the full list of settings, see the [DDL Settings](#ddl-settings) section. ### Examples -#### Write to OpenSearch/ElasticSearch {#example-write-to-es} -Assuming you have created an index `students` in a deployment of OpenSearch or ElasticSearch, you can create the following external stream to write data to the index. - -```sql -CREATE EXTERNAL STREAM opensearch_t1 ( - name string, - gpa float32, - grad_year int16 -) SETTINGS -type = 'http', -data_format = 'OpenSearch', --can also use the alias "ElasticSearch" -url = 'https://opensearch.company.com:9200/students/_bulk', -username='admin', -password='..' -``` - -Then you can insert data via a materialized view or just -```sql -INSERT INTO opensearch_t1(name,gpa,grad_year) VALUES('Jonathan Powers',3.85,2025); -``` - -#### Write to Splunk {#example-write-to-splunk} -Follow [the guide](https://docs.splunk.com/Documentation/Splunk/9.4.1/Data/UsetheHTTPEventCollector) to set up and use HTTP Event Collector(HEC) in Splunk. Make sure you create a HEC token for the desired index and enable it. - -Create the HTTP external stream in Timeplus: -```sql -CREATE EXTERNAL STREAM http_splunk_t1 (event string) -SETTINGS -type = 'http', -data_format = 'JSONEachRow', -http_header_Authorization='Splunk the-hec-token', -url = 'http://host:8088/services/collector/event' -``` - -Then you can insert data via a materialized view or just -```sql -INSERT INTO http_splunk_t1 VALUES('test1'),('test2'); -``` - -#### Write to Datadog {#example-write-to-datadog} - -Create or use an existing API key with the proper permission for sending data. - -Create the HTTP external stream in Timeplus: -```sql -CREATE EXTERNAL STREAM datadog_t1 (event string) -SETTINGS -type = 'http', -data_format = 'JSONEachRow', -output_format_json_array_of_rows = 1, -http_header_DD_API_KEY = 'THE_API_KEY', -http_header_Content_Type = 'application/json', -url = 'https://http-intake.logs.us3.datadoghq.com/api/v2/logs' --make sure you set the right region -``` - -Then you can insert data via a materialized view or just -```sql -INSERT INTO datadog_t1(message, hostname) VALUES('test message','a.test.com'),('test2','a.test.com'); -``` - #### Write to Algolia {#example-write-to-algolia} The [Algolia Ingestion API](https://www.algolia.com/doc/rest-api/ingestion/) accepts multiple rows in a single request in the following JSON payload: @@ -138,93 +78,6 @@ INSERT INTO http_algolia_t1(firstname,lastname,zip_code) VALUES('firstnameA','lastnameA',123),('firstnameB','lastnameB',987) ``` -#### Write to BigQuery {#example-write-to-bigquery} - -Assume you have created a table in BigQuery with 2 columns: -```sql -create table `PROJECT.DATASET.http_sink_t1`( - num int, - str string); -``` - -Follow [the guide](https://cloud.google.com/bigquery/docs/authentication) to choose the proper authentication to Google Cloud, such as via the gcloud CLI `gcloud auth application-default print-access-token`. 
- -Create the HTTP external stream in Timeplus: -```sql -CREATE EXTERNAL STREAM http_bigquery_t1 (num int,str string) -SETTINGS -type = 'http', -http_header_Authorization='Bearer $OAUTH_TOKEN', -url = 'https://bigquery.googleapis.com/bigquery/v2/projects/$PROJECT/datasets/$DATASET/tables/$TABLE/insertAll', -data_format = 'Template', -format_template_resultset_format='{"rows":[${data}]}', -format_template_row_format='{"json":{"num":${num:JSON},"str":${str:JSON}}}', -format_template_rows_between_delimiter=',' -``` - -Replace the `OAUTH_TOKEN` with the output of `gcloud auth application-default print-access-token` or other secure way to obtain OAuth token. Replace `PROJECT`, `DATASET` and `TABLE` to match your BigQuery table path. Also change `format_template_row_format` to match the table schema. - -Then you can insert data via a materialized view or just via `INSERT` command: -```sql -INSERT INTO http_bigquery_t1 VALUES(10,'A'),(11,'B'); -``` - -#### Write to Databricks {#example-write-to-databricks} - -Follow [the guide](https://docs.databricks.com/aws/en/dev-tools/auth/pat) to create an access token for your Databricks workspace. - -Assume you have created a table in Databricks SQL warehouse with 2 columns: -```sql -CREATE TABLE sales ( - product STRING, - quantity INT -); -``` - -Create the HTTP external stream in Timeplus: -```sql -CREATE EXTERNAL STREAM http_databricks_t1 (product string, quantity int) -SETTINGS -type = 'http', -http_header_Authorization='Bearer $TOKEN', -url = 'https://$HOST.cloud.databricks.com/api/2.0/sql/statements/', -data_format = 'Template', -format_template_resultset_format='{"warehouse_id":"$WAREHOUSE_ID","statement": "INSERT INTO sales (product, quantity) VALUES (:product, :quantity)", "parameters": [${data}]}', -format_template_row_format='{ "name": "product", "value": ${product:JSON}, "type": "STRING" },{ "name": "quantity", "value": ${quantity:JSON}, "type": "INT" }', -format_template_rows_between_delimiter='' -``` - -Replace the `TOKEN`, `HOST`, and `WAREHOUSE_ID` to match your Databricks settings. Also change `format_template_row_format` and `format_template_row_format` to match the table schema. - -Then you can insert data via a materialized view or just via `INSERT` command: -```sql -INSERT INTO http_databricks_t1(product, quantity) VALUES('test',95); -``` - -This will insert one row per request. We plan to support batch insert and Databricks specific format to support different table schemas in the future. - -#### Trigger Slack Notifications {#example-trigger-slack} - -You can follow [the guide](https://api.slack.com/messaging/webhooks) to configure an "incoming webhook" to send notifications to a Slack channel. 
- -```sql -CREATE EXTERNAL STREAM http_slack_t1 (text string) SETTINGS -type = 'http', data_format='Template', -format_template_resultset_format='{"blocks":[{"type":"section","text":{"type":"mrkdwn","text":"${data}"}}]}', -format_template_row_format='${text:Raw}', -url = 'https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX' -``` - -Then you can insert data via a materialized view or just via `INSERT` command: -```sql -INSERT INTO http_slack_t1 VALUES('Hello World!'); -INSERT INTO http_slack_t1 VALUES('line1\nline2'); -INSERT INTO http_slack_t1 VALUES('msg1'),('msg2'); -INSERT INTO http_slack_t1 VALUES('This is unquoted text\n>This is quoted text\n>This is still quoted text\nThis is unquoted text again'); -``` - -Please follow Slack's [text formats](https://api.slack.com/reference/surfaces/formatting) guide to add rich text to your messages. - ### DDL Settings #### type @@ -233,7 +86,7 @@ The type of the external stream. The value must be `http` to send data to HTTP e #### config_file The `config_file` setting is available since Timeplus Enterprise 2.7. You can specify the path to a file that contains the configuration settings. The file should be in the format of `key=value` pairs, one pair per line. You can set the HTTP credentials or Authentication tokens in the file. -Please follow the example in [Kafka External Stream](/proton-kafka#config_file). +Please follow the example in [Kafka External Stream](/kafka-source#config_file). #### url The endpoint of the HTTP service. Different services and different use cases may have different endpoints. For example, to send data to a specified OpenSearch index, you can use `http://host:port/my_index/_bulk`. To send data to multiple indexes (depending on the column in the streaming SQL), you can use `http://host:port/_bulk` and also specify the `output_format_opensearch_index_column`. @@ -309,9 +162,3 @@ username='..', password='..', url = 'https://api.openobserve.ai/api/../default/_json' ``` - -## DROP EXTERNAL STREAM - -```sql -DROP STREAM [IF EXISTS] name -``` diff --git a/docs/iceberg-external-stream-sink.md b/docs/iceberg-external-stream-sink.md new file mode 100644 index 000000000..0bf531735 --- /dev/null +++ b/docs/iceberg-external-stream-sink.md @@ -0,0 +1,8 @@ +--- +id: iceberg-sink +title: Iceberg External Stream +--- + +import ExternalIcebergWrite from './shared/iceberg-external-stream.md'; + + diff --git a/docs/iceberg-external-stream-source.md b/docs/iceberg-external-stream-source.md new file mode 100644 index 000000000..d4b1912f7 --- /dev/null +++ b/docs/iceberg-external-stream-source.md @@ -0,0 +1,8 @@ +--- +id: iceberg-source +title: Iceberg External Stream +--- + +import ExternalIcebergRead from './shared/iceberg-external-stream.md'; + + diff --git a/docs/index.mdx b/docs/index.mdx index 4ebe6797b..7dc4490f2 100644 --- a/docs/index.mdx +++ b/docs/index.mdx @@ -30,8 +30,8 @@ Still curious about [the benefits of using Timeplus](/why-timeplus)? Explore our - -

Ingest data →


Connect Data In →

Connect Timeplus to Apache Kafka, Apache Pulsar, Confluent Cloud, or push data with the REST API, SDKs, and beyond.

@@ -62,7 +62,7 @@ Still curious about [the benefits of using Timeplus](/why-timeplus)? Explore our


Materialized Views →

Data streaming processing pipeline via streaming SQL. The results can be written to native Timeplus stream or external systems. diff --git a/docs/kafka-external-stream-sink.mdx b/docs/kafka-external-stream-sink.mdx new file mode 100644 index 000000000..fc7601549 --- /dev/null +++ b/docs/kafka-external-stream-sink.mdx @@ -0,0 +1,12 @@ +--- +id: kafka-sink +title: Kafka Sink +--- + +import ExternalKafkaBasics from './shared/kafka-external-stream.md'; +import ExternalKafkaWrite from './shared/kafka-external-stream-write.md'; +import ExternalKafkaClientProperties from './shared/kafka-external-stream-client-properties.md'; + + + + diff --git a/docs/kafka-external-stream-source.mdx b/docs/kafka-external-stream-source.mdx new file mode 100644 index 000000000..8d565809e --- /dev/null +++ b/docs/kafka-external-stream-source.mdx @@ -0,0 +1,12 @@ +--- +id: kafka-source +title: Kafka Source +--- + +import ExternalKafkaBasics from './shared/kafka-external-stream.md'; +import ExternalKafkaRead from './shared/kafka-external-stream-read.md'; +import ExternalKafkaClientProperties from './shared/kafka-external-stream-client-properties.md'; + + + + diff --git a/docs/proton-schema-registry.md b/docs/kafka-schema-registry.md similarity index 98% rename from docs/proton-schema-registry.md rename to docs/kafka-schema-registry.md index 751a1a0b8..cb16627ca 100644 --- a/docs/proton-schema-registry.md +++ b/docs/kafka-schema-registry.md @@ -70,4 +70,4 @@ INSERT INTO my_ex_stream SETTINGS force_refresh_schema=true ... ``` ::: -For the data type mappings between Avro and Timeplus data type, please check [this doc](/proton-format-schema#avro_types). +For the data type mappings between Avro and Timeplus data type, please check [this doc](/timeplus-format-schema#avro_types). diff --git a/docs/log-stream.md b/docs/log-stream.md index 44ea74bc4..9a9f21bd2 100644 --- a/docs/log-stream.md +++ b/docs/log-stream.md @@ -1,8 +1,10 @@ -# Log Files +# Log External Stream -You can use Timeplus as a lightweight and high-performance tool for log analysis. Please check [the blog](https://www.timeplus.com/post/log-stream-analysis) for more details. +## Overview -## Syntax +You can use Timeplus as a lightweight and high-performance tool for application log analysis. Please check [the blog](https://www.timeplus.com/post/log-stream-analysis) for more details. + +## Create External Log Stream Create an external stream with the log type to monitor log files, e.g. diff --git a/docs/materialized-view.md b/docs/materialized-view.md new file mode 100644 index 000000000..85658e734 --- /dev/null +++ b/docs/materialized-view.md @@ -0,0 +1,132 @@ +# Materialized View {#m_view} +Real-time data pipelines are built via Materialized Views in Timeplus. + +The difference between a materialized view and a regular view is that the materialized view is running in background after creation and the resulting stream is physically written to internal storage (hence it's called materialized). + +Once the materialized view is created, Timeplus will run the query in the background continuously and incrementally emit the calculated results according to the semantics of its underlying streaming select. + +## Create a Materialized View + +```sql +CREATE MATERIALIZED VIEW [IF NOT EXISTS] +[INTO ] +AS +``` + +## Use Materialized Views + +There are different ways to use the materialized views in Timeplus: + +1. Streaming mode: `SELECT * FROM materialized_view` Get the result for future data. This works in the same way as views. +2. 
Historical mode: `SELECT * FROM table(materialized_view)` Get all past results for the materialized view. +3. Historical + streaming mode: `SELECT * FROM materialized_view WHERE _tp_time>='1970-01-01'` Get all past results and as well as the future data. +4. Pre-aggregation mode: `SELECT * FROM table(materialized_view) where _tp_time in (SELECT max(_tp_time) as m from table(materialized_view))` This immediately returns the most recent query result. If `_tp_time` is not available in the materialized view, or the latest aggregation can produce events with different `_tp_time`, you can add the `emit_version()` to the materialized view to assign a unique ID for each emit and pick up the events with largest `emit_version()`. For example: + +```sql +create materialized view mv as +select emit_version() as version, window_start as time, count() as n, max(speed_kmh) as h from tumble(car_live_data,10s) +group by window_start, window_end; + +select * from table(mv) where version in (select max(version) from table(mv)); +``` + +You build data pipelines in Timeplus using materialized views. + + +## Load Balancing + +It's common to define many materialized views in Timeplus for various computation and analysis. Some materialized views can be memory-consuming or cpu-consuming. + +In Timeplus Enterprise cluster mode, you can schedule the materialized views in a proper way to ensure each node gets similar workload. + +### Manual Load Balancing {#memory_weight} + +Starting from [Timeplus Enterprise v2.3](/enterprise-v2.3), when you create a materialized view with DDL SQL, you can add an optional `memory_weight` setting for those memory-consuming materialized views, e.g. +```sql +CREATE MATERIALIZED VIEW my_mv +SETTINGS memory_weight = 10 +AS SELECT .. +``` + +When `memory_weight` is not set, by default the value is 0. When Timeplus Enterprise server starts, the system will list all materialized views, ordered by the memory weight and view names, and schedule them in the proper node. + +For example, in a 3-node cluster, you define 10 materialized views with names: mv1, mv2, .., mv9, mv10. If you create the first 6 materialized views with `SETTINGS memory_weight = 10`, then node1 will run mv1 and mv4; node2 will run mv2 and mv5; node3 will run mv3 and mv6; Other materialized views(mv7 to mv10) will be randomly scheduled on any nodes. + +It's recommended that each node in the Timeplus Enterprise cluster shares the same hardware specifications. For those resource-consuming materialized views, it's recommended to set the same `memory_weight`, such as 10, to get the expected behaviors to be dispatched to the proper nodes for load-balancing. + +### Auto Load Balancing {#auto-balancing} + +Starting from [Timeplus Enterprise v2.5](/enterprise-v2.5), you can also apply auto-load-balancing for memory-consuming materialized views in Timeplus Enterprise cluster. By default, this feature is enabled and there are 3 settings at the cluster level: + +```yaml +workload_rebalance_check_interval: 30s +workload_rebalance_overloaded_memory_util_threshold: 50% +workload_rebalance_heavy_mv_memory_util_threshold: 10% +``` + +As the administrator, you no longer need to determine which materialized views need to set a `memory_weight` setting. In a cluster, Timeplus will monitor the memory consumption for each materialized view. Every 30 seconds, configurable via `workload_rebalance_check_interval`, the system will check whether there are any node with memory over 50% full. 
If so, check whether there is any materialized view in such node consuming 10% or more memory. When those conditions are all met, rescheduling those materialized views to less busy nodes. During the rescheduling, the materialized view on the previous node will be paused and its checkpoint will be transferred to the target node, then the materialized view on target node will resume the streaming SQL based on the checkpoint. + +## Auto-Scaling Materialized Views {#autoscaling_mv} +Starting from [Timeplus Enterprise v2.8](/enterprise-v2.8), materialized views can be configured to run on elastic compute nodes. This can reduce TCO (Total Cost of Ownership), by enabling high concurrent materialized views scheduling, auto scale-out and scale-in according to workload. + +To enable this feature, you need to +1. create a S3 disk in the `s3_plain` type. +2. create a materialized view by setting the checkpoint storage to `s3` and use the s3 disk. +3. enable compute nodes in the cluster, with optional autoscaling based on your cloud or on-prem infrastructure. + +For example: +```sql +--S3 based checkpoint +CREATE DISK ckpt_s3_disk disk( + type = 's3_plain', + endpoint = 'https://mat-view-ckpt.s3.us-west-2.amazonaws.com/matv_ckpt/', + access_key_id = '...', + secret_access_key = '...'); + +CREATE MATERIALIZED VIEW mat_v_scale INTO clickhouse_table +AS SELECT … +SETTINGS +checkpoint_settings=’storage_type=s3;disk_name=ckpt_s3_disk;async=true;interval=5’; +``` + +## Drop Materialized Views + +Run the following SQL to drop a view or a materialized view. + +```sql +DROP VIEW [IF EXISTS] db.; +``` + +Like [CREATE STREAM](/sql-create-stream), stream deletion is an async process. + +## Best Practices + +* It's recommended to specify [a target stream](#target-stream) when creating a materialized view, no matter a stream in Timeplus, an external stream to Apache Kafka, Apache Pulsar, or external tables to ClickHouse, S3, Iceberg, etc. diff --git a/docs/mongo-external.md b/docs/mongo-external-table.md similarity index 97% rename from docs/mongo-external.md rename to docs/mongo-external-table.md index 2f2988e20..885ba59ac 100644 --- a/docs/mongo-external.md +++ b/docs/mongo-external-table.md @@ -1,8 +1,10 @@ # MongoDB External Table +## Overview + You can send data to and read data from MongoDB collections via the MongoDB External Table. -## CREATE EXTERNAL TABLE +## Create MongoDB External Table To create an external table for MongoDB, you can run the following DDL SQL: @@ -86,9 +88,3 @@ By default this setting is `true`. While querying the MongoDB external table wit ```sql SELECT name, COUNT(*) AS cnt FROM mongodb_ext_table GROUP BY name HAVING cnt >5 SETTINGS mongodb_throw_on_unsupported_query = false; ``` - -## DROP EXTERNAL TABLE - -```sql -DROP STREAM [IF EXISTS] name -``` diff --git a/docs/mutable-stream.md b/docs/mutable-stream.md index b14fff6e7..fbb24baf4 100644 --- a/docs/mutable-stream.md +++ b/docs/mutable-stream.md @@ -4,7 +4,7 @@ This type of stream is only available in Timeplus Enterprise, with high performa As the name implies, the data in the stream is mutable. Value with the same primary key(s) will be overwritten. -The primary use case of mutable streams is serving as the lookup/dimensional data in [Streaming JOIN](/joins), supporting millions or even billions of unique keys. You can also use mutable streams as the "fact table" to efficiently do range queries or filtering for denormalized data model, a.k.a. OBT (One Big Table). 
+The primary use case of mutable streams is serving as the lookup/dimensional data in [Streaming JOIN](/streaming-joins), supporting millions or even billions of unique keys. You can also use mutable streams as the "fact table" to efficiently do range queries or filtering for denormalized data model, a.k.a. OBT (One Big Table). Learn more about why we introduced Mutable Streams by checking [this blog](https://www.timeplus.com/post/introducing-mutable-streams). @@ -81,7 +81,7 @@ SELECT * FROM mutable_stream_name WHERE condition SELECT * FROM table(mutable_stream_name) WHERE condition ``` -Mutable streams can be used in [JOINs](/joins) or as the source or cache for [Dictionaries](/sql-create-dictionary). +Mutable streams can be used in [JOINs](/streaming-joins) or as the source or cache for [Dictionaries](/sql-create-dictionary). ## Example @@ -185,7 +185,7 @@ SELECT .. FROM mutable_stream ``` This will query all existing data and accept new incoming data. -Mutable stream can also be used in [JOINs](/joins). +Mutable stream can also be used in [JOINs](/streaming-joins). ## Advanced Settings diff --git a/docs/mysql-external-table.md b/docs/mysql-external-table.md index a992a443b..9e7fc0ff7 100644 --- a/docs/mysql-external-table.md +++ b/docs/mysql-external-table.md @@ -1,5 +1,7 @@ # MySQL External Table +## Overview + Timeplus can read or write MySQL tables directly. This unlocks a set of new use cases, such as - Use Timeplus to efficiently process real-time data in Kafka/Redpanda, apply flat transformation or stateful aggregation, then write the data to the local or remote MySQL for further analysis or visualization. @@ -8,9 +10,7 @@ Timeplus can read or write MySQL tables directly. This unlocks a set of new use This integration is done by introducing "External Table" in Timeplus. Similar to [External Stream](/external-stream), there is no data persisted in Timeplus. However, since the data in MySQL is in the form of table, not data stream, so we call this as External Table. Currently, we support MySQL and ClickHouse. In the roadmap, we will support more integration by introducing other types of External Table. -## CREATE EXTERNAL TABLE - -### Syntax +## Create MySQL External Table ```sql CREATE EXTERNAL TABLE name @@ -35,7 +35,7 @@ The required settings are type and address. For other settings, the default valu The `config_file` setting is available since Timeplus Enterprise 2.7. You can specify the path to a file that contains the configuration settings. The file should be in the format of `key=value` pairs, one pair per line. You can set the MySQL user and password in the file. -Please follow the example in [Kafka External Stream](/proton-kafka#config_file). +Please follow the example in [Kafka External Stream](/kafka-source#config_file). You don't need to specify the columns, since the table schema will be fetched from the MySQL server. @@ -53,7 +53,7 @@ The data types in the output will be Timeplus data types, such as `uint8`, inste You can define the external table and use it to read data from the MySQL table, or write to it. 
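For illustration, here is a minimal sketch of creating and querying such an external table, assuming a MySQL table `shop.orders` and placeholder host and credentials; adjust the settings to match your own server.

```sql
-- Sketch only: host, database, table and credentials are assumed placeholders.
-- Columns are omitted because the schema is fetched from the MySQL server.
CREATE EXTERNAL TABLE mysql_orders
SETTINGS
  type = 'mysql',
  address = 'mysql-host:3306',
  database = 'shop',
  table = 'orders',
  user = 'reader',
  password = '******';

-- Read it with a regular bounded query:
SELECT count() FROM mysql_orders;
```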
-### Connect to a local MySQL {#local} +## Connect to a local MySQL {#local} Example SQL to connect to a local MySQL server without password: @@ -101,7 +101,9 @@ CREATE MATERIALIZED VIEW mv INTO mysql_table AS ``` ### Batching Settings + In Timeplus Enterprise, additional performance tuning settings are available, such as + ```sql INSERT INTO mysql_table SELECT * FROM some_source_stream diff --git a/docs/native-client.md b/docs/native-client.md new file mode 100644 index 000000000..2c5761cd7 --- /dev/null +++ b/docs/native-client.md @@ -0,0 +1,5 @@ +# Timeplusd Client + +Timeplus provides a native command-line client for executing SQL queries directly against a Timeplus server. It supports both interactive mode (for live query execution) and batch mode (for scripting and automation). Query results can be displayed in the terminal or exported to a file, with support for all Timeplus output formats, such as Pretty, CSV, JSON, and more. + +The client provides real-time feedback on query execution with a progress bar and the number of rows read, bytes processed and query execution time. It supports both command-line options and configuration files. diff --git a/docs/pg-external-table.md b/docs/pg-external-table.md index c9020981f..c69f2e36c 100644 --- a/docs/pg-external-table.md +++ b/docs/pg-external-table.md @@ -1,5 +1,7 @@ # PostgreSQL External Table +## Overview + Timeplus can read or write PostgreSQL tables directly. This unlocks a set of new use cases, such as - Use Timeplus to efficiently process real-time data in Kafka/Redpanda, apply flat transformation or stateful aggregation, then write the data to the local or remote PostgreSQL for further analysis or visualization. @@ -8,9 +10,7 @@ Timeplus can read or write PostgreSQL tables directly. This unlocks a set of new This integration is done by introducing "External Table" in Timeplus. Similar to [External Stream](/external-stream), there is no data persisted in Timeplus. However, since the data in PostgreSQL is in the form of table, not data stream, so we call this as External Table. Currently, we support S3, MySQL, PostgreSQL and ClickHouse. In the roadmap, we will support more integration by introducing other types of External Table. -## CREATE EXTERNAL TABLE - -### Syntax +## Create PostgreSQL External Table ```sql CREATE EXTERNAL TABLE name @@ -35,7 +35,7 @@ The required settings are type and address. For other settings, the default valu The `config_file` setting is available since Timeplus Enterprise 2.7. You can specify the path to a file that contains the configuration settings. The file should be in the format of `key=value` pairs, one pair per line. You can set the PostgreSQL user and password in the file. -Please follow the example in [Kafka External Stream](/proton-kafka#config_file). +Please follow the example in [Kafka External Stream](/kafka-source#config_file). You don't need to specify the columns, since the table schema will be fetched from the PostgreSQL server. @@ -53,7 +53,7 @@ The data types in the output will be Timeplus data types, such as `uint8`, inste You can define the external table and use it to read data from the PostgreSQL table, or write to it. 
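For illustration, a minimal sketch of such a PostgreSQL external table follows, assuming a table `shop.orders` and placeholder host and credentials; adjust the settings to match your own server.

```sql
-- Sketch only: host, database, table and credentials are assumed placeholders.
-- Columns are omitted because the schema is fetched from the PostgreSQL server.
CREATE EXTERNAL TABLE pg_orders
SETTINGS
  type = 'postgresql',
  address = 'pg-host:5432',
  database = 'shop',
  table = 'orders',
  user = 'writer',
  password = '******';

-- Write a row, or read it back with a bounded query:
INSERT INTO pg_orders(order_id, amount) VALUES ('o-1001', 49.99);
SELECT * FROM pg_orders WHERE order_id = 'o-1001';
```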
-### Connect to a local PostgreSQL {#local} +## Connect to a local PostgreSQL {#local} You can use the following command to start a local PostgreSQL via Docker: ```bash @@ -90,7 +90,7 @@ Then query the table: SELECT * FROM pg_local; ``` -### Connect to Aiven for PostgreSQL {#aiven} +## Connect to Aiven for PostgreSQL {#aiven} Example SQL to connect to [Aiven for PostgreSQL](https://aiven.io/docs/products/postgresql/get-started): diff --git a/docs/private-beta-1.md b/docs/private-beta-1.md index f706bd4e9..1a3fe1756 100644 --- a/docs/private-beta-1.md +++ b/docs/private-beta-1.md @@ -38,7 +38,7 @@ Last weekly release in Private Beta 1. Starting from August 8, we are transiting - Streaming engine - - Refined the behavior of [materialized views](/view#m_view), to keep it consistent with the other Timeplus queries. `SELECT * FROM table(a_materialized_view)` will get all past results, instead of the recent one. + - Refined the behavior of [materialized views](/materialized-view), to keep it consistent with the other Timeplus queries. `SELECT * FROM table(a_materialized_view)` will get all past results, instead of the recent one. - Added the count_if function and unique_exact_if function to count the number of rows or unique value matching certain conditions. - Added json_extract_keys function to get the keys for the JSON map object. - Added the to_bool function to convert other types to `bool` @@ -140,7 +140,7 @@ Last weekly release in Private Beta 1. Starting from August 8, we are transiting - Streaming engine - More math functions are exposed. This can help you to run SQL-based simple ML/prediction models. - - (Experimental) [stream-to-stream join](/joins) no longer requires a ` date_diff_within(..)`, although it's still recommended to add timestamp constraints to improve performance. + - (Experimental) [stream-to-stream join](/streaming-joins) no longer requires a ` date_diff_within(..)`, although it's still recommended to add timestamp constraints to improve performance. - (Experimental) able to set a retention policy for each stream, either time-based (say only keep recent 7 days' data), or size based(say only keep recent 1GB data) - Source and sink - (Experimental) support Personal Access Token (PAT) in the REST API, which is long-living (or set an expiration date) and per-user. Tenant-level access token will be deprecated. @@ -163,7 +163,7 @@ Last weekly release in Private Beta 1. Starting from August 8, we are transiting - Streaming engine - (Experimental) new UI and API to create and query [external streams](/external-stream). You can query real-time data in Confluent Cloud, Apache Kafka or Redpanda immediately, without loading the data into Timeplus. - - (Experimental) [stream-to-stream join](/joins) is ready to test for beta customers, e.g. `SELECT .. FROM stream1 INNER JOIN stream2 ON stream1.id=stream2.id AND date_diff_within(10s)` + - (Experimental) [stream-to-stream join](/streaming-joins) is ready to test for beta customers, e.g. `SELECT .. FROM stream1 INNER JOIN stream2 ON stream1.id=stream2.id AND date_diff_within(10s)` - New function `date_diff_within` to determine whether 2 datetime are within the specified range. This is necessary for stream to stream join. You can also use more flexible expressions like `date_diff('second',left.time,right.time) between -3 and 5` - Source and sink - Enhanced the [datapm](https://datapm.io/docs/quick-start/) Timeplus sink to support loading JSON data from PostgreSQL. 
diff --git a/docs/private-beta-2.md b/docs/private-beta-2.md index 84cf30b53..c2554fb80 100644 --- a/docs/private-beta-2.md +++ b/docs/private-beta-2.md @@ -84,7 +84,7 @@ First product update in the Private Beta 2. * Source, sink, API and SDK * Released https://pypi.org/project/timeplus/0.2.0/ with optional tenant ID support. - * Supported Apache Pulsar and StreamNative Cloud as a data source or a data sink. You can load real-time data from Pulsar into Timeplus with REST API (web UI will be ready soon). [Learn more](/ingestion#pulsar) + * Supported Apache Pulsar and StreamNative Cloud as a data source or a data sink. You can load real-time data from Pulsar into Timeplus with REST API (web UI will be ready soon). [Learn more](/pulsar-source) * Added an experimental sink for Snowflake. You can send Timeplus real-time query results to Snowflake. * UI improvements * New login screen. diff --git a/docs/proton-faq.md b/docs/proton-faq.md index b2a4575bb..5ed340d40 100644 --- a/docs/proton-faq.md +++ b/docs/proton-faq.md @@ -36,11 +36,11 @@ Apache License 2.0 also prevents any contributor to Timeplus Proton—a member o ## What features are available with Timeplus Proton versus Timeplus Enterprise? {#compare} -Please refer to our [comparison page](/compare) for a detailed comparison of Timeplus Proton and Timeplus Enterprise. +Please refer to our [comparison page](/proton-oss-vs-enterprise) for a detailed comparison of Timeplus Proton and Timeplus Enterprise. ## My organization already uses ClickHouse—are there plans to integrate Timeplus Proton with the open source ClickHouse project? -You can create an [External Table](/proton-clickhouse-external-table) to read or write ClickHouse tables from Timeplus Proton. Check the tutorials for how to build streaming ETL [from Kafka to ClickHouse](/tutorial-sql-etl-kafka-to-ch), or [from MySQL to ClickHouse](/tutorial-sql-etl-mysql-to-ch), via Timeplus. +You can create an [External Table](/clickhouse-external-table) to read or write ClickHouse tables from Timeplus Proton. Check the tutorials for how to build streaming ETL [from Kafka to ClickHouse](/tutorial-sql-etl-kafka-to-ch), or [from MySQL to ClickHouse](/tutorial-sql-etl-mysql-to-ch), via Timeplus. We are also in conversation with the folks at ClickHouse, Inc., and the ClickHouse open source project at large, to scope the possibility of deep integration between the projects. @@ -52,8 +52,8 @@ Short answer: very easy. We designed Timeplus Proton's usage to be similar to Cl - The SQL keyword `AS` is required to create a temporary name for a table, stream, or a column. - We renamed data types and functions to remove camelcase. For example, ClickHouse's `toInt8()` is renamed `to_int8()` in Timeplus Proton. Our [functions](/functions) docs have additional details. - Not all ClickHouse functions are currently enabled in Timeplus Proton or work in a streaming query. If we should add or enhance the functions available in Timeplus Proton, let us know in the [GitHub issues](https://github.com/timeplus-io/proton/issues). -- Materialized Views in ClickHouse works for one source table, and data is processed at the index time. In Timeplus Proton, you can define a [Materialized View](/view#m_view) with a streaming SQL, for any number of streams, with JOIN, CTE, or subqueries. Timeplus Proton continuously runs the query and sends the results to the internal stream or the target stream. -- In Timeplus Proton, [JOINs](/joins) are a powerful and flexible means of combining data from multiple sources into a single stream. 
+- Materialized Views in ClickHouse works for one source table, and data is processed at the index time. In Timeplus Proton, you can define a [Materialized View](/materialized-view) with a streaming SQL, for any number of streams, with JOIN, CTE, or subqueries. Timeplus Proton continuously runs the query and sends the results to the internal stream or the target stream. +- In Timeplus Proton, [JOINs](/streaming-joins) are a powerful and flexible means of combining data from multiple sources into a single stream. See the documentation for full usage details. @@ -114,6 +114,6 @@ We also discuss our journey to releasing Timeplus Proton in open source in our [ ## How can I get started? -Learn how to pull and run the Timeplus Proton image and query a test stream in our [documentation](/proton#-deployment). To see a more complete use case in action, using Timeplus Proton, Redpanda, and sample live data, check out our [tutorial](/proton-kafka#tutorial) that leverages Docker Compose. +Learn how to pull and run the Timeplus Proton image and query a test stream in our [documentation](/proton#-deployment). To see a more complete use case in action, using Timeplus Proton, Redpanda, and sample live data, check out our tutorial that leverages Docker Compose. If you need advanced deployment strategies or features, with Timeplus Proton running behind the scenes, please download the [Timeplus Enterprise](https://timeplus.com/install) package. diff --git a/docs/compare.md b/docs/proton-oss-vs-enterprise.md similarity index 60% rename from docs/compare.md rename to docs/proton-oss-vs-enterprise.md index df8de5d95..9814d879c 100644 --- a/docs/compare.md +++ b/docs/proton-oss-vs-enterprise.md @@ -5,8 +5,8 @@ Timeplus Proton powers unified streaming and data processing on a single databas | Features | **Timeplus Proton** | **Timeplus Enterprise** | | ----------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | **Deployment** |

  • Single-node Docker image
  • Single binary on Mac/Linux
|
  • Single node
  • Multi-node Cluster for high availability and horizontal scalability, with data replication and distributed query processing
  • [Kubernetes Helm Chart](/k8s-helm)
| -| **Data Processing** |
  • [Streaming SQL](/stream-query)
  • [Historical Data Processing](/history)
  • [Append Stream](/append-stream), Random streams, Mutable Stream v1 ([Versioned Stream](/versioned-stream))
  • Streaming transformation, [join](/joins), [aggregation](/streaming-aggregations), [tumble/hop/session windows](/streaming-windows)
  • User-Defined Function: [JavaScript](/js-udf), [Remote](/remote-udf), [SQL](/sql-udf)
|
  • Everything in Timeplus Proton
  • Auto-scaling Materialized View
  • [Mutable Stream v2](/mutable-stream) with row-based storage for 3x performance and efficiency, also support coalesced and better high cardinality data mutations
  • Support more [EMIT](/streaming-aggregations#emit) strategies and spill-to-disk capabilities when memory is scarce
  • [Python UDF](/py-udf)
  • [Dynamic Dictionary](/sql-create-dictionary) based on MySQL/Postgres or Mutable streams
  • [Tiered Storage](/tiered-storage) using S3 or HDD
  • [Just-In-Time compilation](/jit)
| -| **External Systems** |
  • External streams to [Apache Kafka, Confluent Cloud, Redpanda](/proton-kafka), [Apache Pulsar](/pulsar-external-stream), and [Remote Timeplus](/timeplus-external-stream)
  • [Streaming ingestion via REST API (compact mode only)](/proton-ingest-api)
  • [External table to ClickHouse](/proton-clickhouse-external-table)
|
  • Everything in Timeplus Proton
  • External streams to [HTTP API](/http-external), External tables to [MySQL](/mysql-external-table), [PostgreSQL](/pg-external-table), [MongoDB](/mongo-external), [Amazon S3](/s3-external), [Apache Iceberg](/iceberg)
  • Hundreds of connectors from [Redpanda Connect](/redpanda-connect), e.g. WebSocket, HTTP Stream, NATS
  • CSV upload
  • [Streaming ingestion via REST API (with API key and flexible modes)](/ingest-api)
| +| **Data Processing** |
  • [Streaming SQL](/streaming-query)
  • [Historical Data Processing](/historical-query)
  • [Append Stream](/append-stream), Random streams, Mutable Stream v1 ([Versioned Stream](/versioned-stream))
  • Streaming transformation, [join](/streaming-joins), [aggregation](/streaming-aggregations), [tumble/hop/session windows](/streaming-windows)
  • User-Defined Function: [JavaScript](/js-udf), [Remote](/remote-udf), [SQL](/sql-udf)
|
  • Everything in Timeplus Proton
  • Auto-scaling Materialized View
  • [Mutable Stream v2](/mutable-stream) with row-based storage for 3x performance and efficiency; also supports coalescing and improved handling of high-cardinality data mutations
  • Supports more [EMIT](/streaming-aggregations#emit) strategies and spill-to-disk capabilities when memory is scarce
  • [Python UDF](/py-udf)
  • [Dynamic Dictionary](/sql-create-dictionary) based on MySQL/Postgres or Mutable streams
  • [Tiered Storage](/tiered-storage) using S3 or HDD
  • [Just-In-Time compilation](/jit)
| +| **External Systems** |
  • External streams to [Apache Kafka, Confluent Cloud, Redpanda](/kafka-source), [Apache Pulsar](/pulsar-source), and [Remote Timeplus](/timeplus-source)
  • [Streaming ingestion via REST API (compact mode only)](/proton-ingest-api)
  • [External table to ClickHouse](/clickhouse-external-table)
|
  • Everything in Timeplus Proton
  • External streams to [HTTP API](/http-external-stream); external tables to [MySQL](/mysql-external-table), [PostgreSQL](/pg-external-table), [MongoDB](/mongo-external-table), [Amazon S3](/s3-sink), [Apache Iceberg](/iceberg-sink)
  • Hundreds of connectors from [Redpanda Connect](/redpanda-connect), e.g. WebSocket, HTTP Stream, NATS
  • CSV upload
  • [Streaming ingestion via REST API (with API key and flexible modes)](/ingest-api)
| | **Web Console** | |
  • [RBAC](/rbac), Pipeline Wizard, SQL Exploration, Data Lineage, Cluster Monitoring, Troubleshooting and Manageability
| | **Support** |
  • Community support from GitHub and Slack
|
  • Enterprise support via email, Slack, and Zoom, with an SLA
| diff --git a/docs/proton.md b/docs/proton.md index 789294ec4..d47328beb 100644 --- a/docs/proton.md +++ b/docs/proton.md @@ -8,7 +8,7 @@ Timeplus Proton is a stream processing engine and database. It is fast and light 2. **Fast.** Timeplus Proton is written in C++, with optimized performance through SIMD. [For example](https://www.timeplus.com/post/scary-fast), on an Apple MacBookPro with M2 Max, Timeplus Proton can deliver 90 million EPS, 4 millisecond end-to-end latency, and high cardinality aggregation with 1 million unique keys. 3. **Lightweight.** Timeplus Proton is a single binary (\<500MB). No JVM or any other dependencies. You can also run it with Docker, or on an AWS t2.nano instance (1 vCPU and 0.5 GiB memory). 4. **Powered by the fast, resource efficient and mature [ClickHouse](https://github.com/clickhouse/clickhouse).** Timeplus Proton extends the historical data, storage, and computing functionality of ClickHouse with stream processing. Thousands of SQL functions are available in Timeplus Proton. Billions of rows are queried in milliseconds. -5. **Best streaming SQL engine for [Kafka](https://kafka.apache.org/), [Redpanda](https://redpanda.com/), or [Pulsar](https://pulsar.apache.org/).** Query the live data in Kafka or other compatible streaming data platforms, with [external streams](/proton-kafka). +5. **Best streaming SQL engine for [Kafka](https://kafka.apache.org/), [Redpanda](https://redpanda.com/), or [Pulsar](https://pulsar.apache.org/).** Query the live data in Kafka or other compatible streaming data platforms, with [external streams](/kafka-source). ![Proton Architecture](/img/proton-arch.png) See our [architecture](/architecture) doc for technical details and our [FAQ](/proton-faq) for more information. @@ -65,7 +65,7 @@ SQL is the main interface. You can start a new terminal window with `proton clie You can also integrate Timeplus Proton with Python/Java/Go SDK, REST API, or BI plugins. Please check [Integration](#integration). ::: -In the `proton client`, you can write SQL to create [External Stream for Kafka](/proton-kafka) or [External Table for ClickHouse](/proton-clickhouse-external-table). +In the `proton client`, you can write SQL to create [External Stream for Kafka](/kafka-source) or [External Table for ClickHouse](/clickhouse-external-table). You can also run the following SQL to create a stream of random data: @@ -110,7 +110,7 @@ The following drivers are available: Integration with other systems: -- ClickHouse https://docs.timeplus.com/proton-clickhouse-external-table +- ClickHouse https://docs.timeplus.com/clickhouse-external-table - [Docker and Testcontainers](/tutorial-testcontainers-java) - [Sling](/sling) - Grafana https://github.com/timeplus-io/proton-grafana-source diff --git a/docs/public-beta-1.md b/docs/public-beta-1.md index 8e3e31e4f..a3794715a 100644 --- a/docs/public-beta-1.md +++ b/docs/public-beta-1.md @@ -33,7 +33,7 @@ We will update the beta version from time to time and list key enhancements in t * In the source management page, we added a sparkline to show the throughput for each source. This sparkline auto-refreshes every 15 seconds. * When you create a new source and choose to send data to an existing stream, only the streams with matching schema will be shown. If no existing streams match, you have to create a new stream. * In the preview step, the first 3 rows will be fetched from the source. If Timeplus cannot detect the column data type automatically, the column type will be set as `unknown`. 
This could happen if the value in those 3 events contain `null`. Please check with your data source provider. If you are sure the future events will be in a certain data type, such as `string`, you can change the column type and choose to create a new stream to receive data from the source. -* When you create a new [materialized view](/view#m_view), you can set a retention policy, specifying the max size or max age for the data in the materialized view. +* When you create a new [materialized view](/materialized-view), you can set a retention policy, specifying the max size or max age for the data in the materialized view. * Clicking on a recent query on the homepage will now open the query page, instead of showing the query history. * We removed the purple page description banners formerly at the top of each page. If there is no object defined in a certain page, a customized help message is shown. * You can click-and-drag to resize column width in the streaming table (query page). diff --git a/docs/public-beta-2.md b/docs/public-beta-2.md index 5809164ae..4c3ae6e1c 100644 --- a/docs/public-beta-2.md +++ b/docs/public-beta-2.md @@ -172,7 +172,7 @@ Enhancements: **Query** -- Simplified the `LATEST JOIN` syntax. No need to write `INNER LATEST JOIN`. [Learn more](/joins). +- Simplified the `LATEST JOIN` syntax. No need to write `INNER LATEST JOIN`. [Learn more](/streaming-joins). - For historical queries with tumble window aggregation, if there is no event in a window, such window won't be in the results. To show an empty window with default value(0 for numeric types and empty string for string), you can add order by window_start with fill step \ . - Auto-cleanup recent query logs: if there are more than 500, older queries are removed. @@ -209,7 +209,7 @@ Enhancements: - Introduce a new function `earliest_timestamp()` to return `1970-1-1 00:00:00`(UTC) You can also call this with `earliest_ts()`. A typical usage is `select * from stream where _tp_time>earliest_ts()` to list all data in the past and future. Again, the previous syntax `settings seek_to='earliest'` has been deprecated and will be removed soon. - You can also use `where _tp_time >..` multiple times in a query with JOIN/UNION, to time travel to different starting points for different streams. - To improve readability, you can use numeric literals with underscores, e.g. `select * from iot where age_second > 86_400 `. Underscores `_` inside numeric literals are ignored. - - Added a new [LATEST JOIN](/joins) for streaming SQL. For two append-only streams, if you use `a LEFT INNER LATEST JOIN b on a.key=b.key`, any time when the key changes on either stream, the previous join result will be canceled and a new result will be added. + - Added a new [LATEST JOIN](/streaming-joins) for streaming SQL. For two append-only streams, if you use `a LEFT INNER LATEST JOIN b on a.key=b.key`, any time when the key changes on either stream, the previous join result will be canceled and a new result will be added. ## February 17, 2023 @@ -227,7 +227,7 @@ Enhancements: - Stream list page shows the earliest and latest event time, helping you better understand the freshness for each data stream. - If you start running a streaming SQL then go to another page in Timeplus console, the query will be stopped automatically. This will reduce unnecessary server workload and the number of concurrent queries. - Improved the performance of query results in list mode. 
- - Performance tuning for [external streams](/external-stream) and [materialized views](/view#m_view). + - Performance tuning for [external streams](/external-stream) and [materialized views](/materialized-view). ## February 3, 2023 diff --git a/docs/pulsar-external-stream-sink.md b/docs/pulsar-external-stream-sink.md new file mode 100644 index 000000000..4fa363dd8 --- /dev/null +++ b/docs/pulsar-external-stream-sink.md @@ -0,0 +1,10 @@ +--- +id: pulsar-sink +title: Pulsar Sink +--- + +import ExternalPulsarBasics from './shared/pulsar-external-stream.md'; +import ExternalPulsarWrite from './shared/pulsar-external-stream-write.md'; + + + diff --git a/docs/pulsar-external-stream-source.md b/docs/pulsar-external-stream-source.md new file mode 100644 index 000000000..1cc045626 --- /dev/null +++ b/docs/pulsar-external-stream-source.md @@ -0,0 +1,10 @@ +--- +id: pulsar-source +title: Pulsar Source +--- + +import ExternalPulsarBasics from './shared/pulsar-external-stream.md'; +import ExternalPulsarRead from './shared/pulsar-external-stream-read.md'; + + + diff --git a/docs/pulsar-external-stream.md b/docs/pulsar-external-stream.md deleted file mode 100644 index c3b5dc6a0..000000000 --- a/docs/pulsar-external-stream.md +++ /dev/null @@ -1,464 +0,0 @@ -# Pulsar External Stream - -[Apache® Pulsar™](https://pulsar.apache.org/) is a multi-tenant, high-performance solution for server-to-server messaging. - -In [Timeplus Enterprise v2.5](/enterprise-v2.5), we added the first-class integration for Apache Pulsar, as a new type of [External Stream](/external-stream). You can read or write data in Apache Pulsar or StreamNative Cloud. This is also available in Timeplus Proton, since v1.6.8. - -## CREATE EXTERNAL STREAM - -To create an external stream for Apache Pulsar, you can run the following DDL SQL: - -```sql -CREATE EXTERNAL STREAM [IF NOT EXISTS] stream_name - ( ) -SETTINGS - type='pulsar', -- required - service_url='pulsar://host:port',-- required - topic='..', -- required - jwt='..', - config_file='..', - data_format='..', - format_schema='..', - one_message_per_row=.., - skip_server_cert_check=.., - validate_hostname=.., - ca_cert='..', - client_cert='..', - client_key='..', - connections_per_broker=.., - memory_limit=.., - io_threads=.. -``` -### Connect to a local Apache Pulsar - -If you have a local Apache Pulsar server running, you can run the following SQL DDL to create an external stream to connect to it. - -```sql -CREATE EXTERNAL STREAM local_pulsar (raw string) -SETTINGS type='pulsar', - service_url='pulsar://localhost:6650', - topic='persistent://public/default/my-topic' -``` - -### Connect to StreamNative Cloud -If you have the access to [StreamNative Cloud](https://console.streamnative.cloud), you can run the following SQL DDL to create an external stream to connect to it, with a proper [API Key](https://docs.streamnative.io/docs/api-keys-overview) for a service account. Make sure you choose "Create API Key", instead of the "Get JWT Token (Depreciated)". - -![screenshot](/img/pulsar_api_key.png) - -```sql -CREATE EXTERNAL STREAM ext_stream (raw string) -SETTINGS type='pulsar', - service_url='pulsar+ssl://pc-12345678.gcp-shared-usce1.g.snio.cloud:6651', - topic='persistent://tenant/namespace/topic', - jwt='eyJh..syFQ' -``` - -### DDL Settings - -#### skip_server_cert_check -Default false. If set to true, it will accept untrusted TLS certificates from brokers. - -#### validate_hostname - -Default false. 
Configure whether it allows validating hostname verification when a client connects to a broker over TLS. -#### ca_cert -The CA certificate (PEM format), which will be used to verify the server's certificate. -#### client_cert -The certificate (PEM format) for the client to use mTLS authentication. [Learn more](https://pulsar.apache.org/docs/3.3.x/security-tls-authentication/). -#### client_key -The private key (PEM format) for the client to use mTLS authentication. -#### jwt -The JSON Web Tokens for the client to use JWT authentication. [Learn more](https://docs.streamnative.io/docs/api-keys-overview). -#### config_file -The `config_file` setting is available since Timeplus Enterprise 2.7. You can specify the path to a file that contains the configuration settings. The file should be in the format of `key=value` pairs, one pair per line. You can set the Pulsar credentials in the file. - -Please follow the example in [Kafka External Stream](/proton-kafka#config_file). -#### connections_per_broker -Default 1. Sets the max number of connection that this external stream will open to a single broker. By default, the connection pool will use a single connection for all the producers and consumers. -#### memory_limit -Default 0 (unlimited). Configure a limit on the amount of memory that will be allocated by this external stream. -#### io_threads -Default 1. Set the number of I/O threads to be used by the Pulsar client. - -Like [Kafka External Stream](/proton-kafka), Pulsar External Stream also supports all format related settings: `data_format`, `format_schema`, `one_message_per_row`, etc. - -#### data_format -The supported values for `data_format` are: - -- JSONEachRow: parse each row of the message as a single JSON document. The top level JSON key/value pairs will be parsed as the columns. [Learn More](#jsoneachrow). -- CSV: less commonly used. [Learn More](#csv). -- TSV: similar to CSV but tab as the separator -- ProtobufSingle: for single Protobuf message per message -- Protobuf: there could be multiple Protobuf messages in a single message. -- Avro -- RawBLOB: the default value. Read/write message as plain text. - -For data formats which write multiple rows into one single message (such as `JSONEachRow` or `CSV`), two more advanced settings are available: - -#### max_insert_block_size -`max_insert_block_size` to control the maximum number of rows can be written into one message. - -#### max_insert_block_bytes -`max_insert_block_bytes` to control the maximum size (in bytes) that one message can be. - -## Read Data in Pulsar -### Read messages in a single column {#single_col_read} - -If the message in Pulsar topic is in plain text format or JSON, you can create an external stream with only a `raw` column in `string` type. - -Example: - -```sql -CREATE EXTERNAL STREAM ext_github_events (raw string) -SETTINGS type='pulsar', service_url='pulsar://host:port', topic='..' -``` - -Then use query time [JSON extraction functions](/functions_for_json) or shortcut to access the values, e.g. `raw:id`. - -### Read messages as multiple columns{#multi_col_read} - -If the keys in the JSON message never change, or you don't care about the new columns, you can also create the external stream with multiple columns. - -You can pick up some top level keys in the JSON as columns, or all possible keys as columns. 
- -Example: - -```sql -CREATE EXTERNAL STREAM ext_stream_parsed - (address string, firstName string, middleName string, lastName string, email string, username string, password string,sex string,telephoneNumber string, dateOfBirth int64, age uint8, company string,companyEmail string,nationalIdentityCardNumber string,nationalIdentificationNumber string, - passportNumber string) -SETTINGS type='pulsar', - service_url='pulsar+ssl://pc-12345678.gcp-shared-usce1.g.snio.cloud:6651', - topic='persistent://docs/ns/datagen', - data_format='JSONEachRow', - jwt='eyJhb..syFQ' -``` - -If there are nested complex JSON in the message, you can define the column as a string type. Actually any JSON value can be saved in a string column. - -### Virtual Columns - -Pulsar external stream has the follow virtual columns: -#### _tp_time -the event time of the Pulsar message if it's available, or it's the publish time otherwise. -#### _tp_append_time -the publish time of the pulsar message. -#### _tp_process_time -the timestamp when the message was read by Pulsar External Stream. -#### _tp_shard -the partition ID, starting from 0. -#### _pulsar_message_id -an `array` which contains 4 elements: ledger_id, entry_id, partition, and batch_index. -#### _tp_sn -the sequence number in Timeplus, in int64 type. -#### _tp_message_key -the message key (a.k.a partition key). Can be empty. - -#### _tp_message_headers - -Starting from Timeplus Enterprise 2.8.2, you can read and write custom headers via this column. - -Define the column in the DDL: -```sql -CREATE EXTERNAL STREAM example ( - s string, - i int, - ..., - _tp_message_headers map(string, string) -) settings type='pulsar',...; -``` -Then insert data to the external stream via `INSERT INTO` or materialized views, with a map of string pairs as custom headers for each message. - -### Query Settings -#### shards -You can read in specified Pulsar partitions. By default, all partitions will be read. But you can also read from a single partition via the `shards` setting, e.g. - -```sql -SELECT raw FROM ext_stream SETTINGS shards='0' -``` - -Or you can specify a set of partition ID, separated by comma, e.g. - -```sql -SELECT raw FROM ext_stream SETTINGS shards='0,2' -``` - -#### record_consume_timeout_ms -Use setting `record_consume_timeout_ms` to determine how much time the external can wait for new messages before returning the query result. The smaller the value is, the smaller the latency will be, but also will be less performant. - -### Read existing messages {#rewind} - -When you run `SELECT raw FROM ext_stream `, Timeplus will read the new messages in the topics, not the existing ones. -#### seek_to -If you need to read all existing messages, you can use the following settings: - -```sql -SELECT raw FROM ext_stream SETTINGS seek_to='earliest' -``` - -Or the following SQL: - -```sql -SELECT raw FROM table(ext_stream) WHERE ... -``` -Note: both `earliest` and `latest` are supported. You can also use `seek_to='2024-10-14'` for date or datetime based rewind. But number-based seek_to is not supported. - - -:::warning -Please avoid scanning all existing data via `select * from table(ext_stream)`. -::: - -### Read/Write Pulsar Message Key {#messagekey} - -For each message in the topic, the value is critical for sure. The key is optional but could carry important meta data. - -You can define the `_tp_message_key` column when you create the external stream. 
- -For example: -```sql -CREATE EXTERNAL STREAM test_msg_key ( - id int32, - name string, - _tp_message_key string -) SETTINGS type='pulsar', - service_url='pulsar://host.docker.internal:6650', - topic='persistent://public/default/msg-key' -``` -You can insert any data to the Pulsar topic. - -When insert a row to the stream like: -```sql -INSERT INTO test_msg_key(id,name,_tp_message_key) VALUES (1, 'John', 'some-key'); -``` -`'some-key'` will be used for the message key for the Pulsar message (and it will be excluded from the message body, so the message will be `{"id": 1, "name": "John"}` for the above SQL). - -When doing a SELECT query, the message key will be populated to the `_tp_message_key` column as well. -`SELECT * FROM test_msg_key` will return `'some-key'` for the `_tp_message_key` message. - -`_tp_message_key` support the following types: `uint8`, `uint16`, `uint32`, `uint64`, `int8`, `int16`, `int32`, `int64`, `bool`, `float32`, `float64`, `string`, and `fixed_string`. - -`_tp_message_key` also support `nullable`. Thus we can create an external stream with optional message key. For example: -```sql -CREATE EXTERNAL STREAM foo ( - id int32, - name string, - _tp_message_key nullable(string) default null -) SETTINGS type='pulsar',...; -``` - -## Write Data to Pulsar - -### Write to Pulsar in Plain Text {#single_col_write} - -You can write plain text messages to Pulsar topics with an external stream with a single column. - -```sql -CREATE EXTERNAL STREAM ext_github_events (raw string) -SETTINGS type='pulsar', service_url='pulsar://host:port', topic='..' -``` - -Then use either `INSERT INTO VALUES (v)`, or [Ingest REST API](/proton-ingest-api), or set it as the target stream for a materialized view to write message to the Pulsar topic. The actual `data_format` value is `RawBLOB` but this can be omitted. By default `one_message_per_row` is `true`. - -#### Advanced Settings for writing data -Settings for controlling the producer behavior: - * `output_batch_max_messages` - Set the max number of messages permitted in a batch. If you set this option to a value greater than 1, messages are queued until this threshold is reached or batch interval has elapsed. - * `output_batch_max_size_bytes` - Set the max size of messages permitted in a batch. If you set this option to a value greater than 1, messages are queued until this threshold is reached or batch interval has elapsed. - * `output_batch_max_delay_ms` - Set the max time for message publish delay permitted in a batch. - * `pulsar_max_pending_messages` - Set the max size of the producer's queue holding the messages pending to receive an acknowledgment from the broker. When the queue is full, the producer will be blocked. - -### Multiple columns to write to Pulsar{#multi_col_write} - -To write structured data to Pulsar topics, you can choose different data formats: - -##### RawBLOB -Write the content as pain text. - -#### JSONEachRow - -You can use `data_format='JSONEachRow',one_message_per_row=true` to inform Timeplus to write each event as a JSON document. The columns of the external stream will be converted to keys in the JSON documents. 
For example: - -```sql -CREATE EXTERNAL STREAM target( - _tp_time datetime64(3), - url string, - method string, - ip string) - SETTINGS type='pulsar', - service_url='pulsar://host:port', - topic='..', - data_format='JSONEachRow', - one_message_per_row=true; -``` - -The messages will be generated in the specific topic as - -```json -{ -"_tp_time":"2023-10-29 05:36:21.957" -"url":"https://www.nationalweb-enabled.io/methodologies/killer/web-readiness" -"method":"POST" -"ip":"c4ecf59a9ec27b50af9cc3bb8289e16c" -} -``` - -:::info - -Please note, by default multiple JSON documents will be inserted to the same Pulsar message. One JSON document each row/line. Such default behavior aims to get the maximum writing performance to Pulsar. But you need to make sure the downstream applications are able to properly split the JSON documents per message. - -If you need a valid JSON per each message, instead of a JSONL, please set `one_message_per_row=true` e.g. - -```sql -CREATE EXTERNAL STREAM target(_tp_time datetime64(3), url string, ip string) -SETTINGS type='pulsar', service_url='pulsar://host:port', topic='..', - data_format='JSONEachRow',one_message_per_row=true -``` - -The default value of one_message_per_row, if not specified, is false for `data_format='JSONEachRow'` and true for `data_format='RawBLOB'`. -::: - -#### CSV - -You can use `data_format='CSV'` to inform Timeplus to write each event as a JSON document. The columns of the external stream will be converted to keys in the JSON documents. For example: - -```sql -CREATE EXTERNAL STREAM target( - _tp_time datetime64(3), - url string, - method string, - ip string) - SETTINGS type='pulsar', - service_url='pulsar://host:port', - topic='..', - data_format='CSV'; -``` - -The messages will be generated in the specific topic as - -```csv -"2023-10-29 05:35:54.176","https://www.nationalwhiteboard.info/sticky/recontextualize/robust/incentivize","PUT","3eaf6372e909e033fcfc2d6a3bc04ace" -``` - -##### TSV -Similar to CSV but tab as the separator. - -#### ProtobufSingle - -You can write Protobuf-encoded messages in Pulsar topics. - -First, you need to create a schema with SQL, e.g. -```sql -CREATE OR REPLACE FORMAT SCHEMA schema_name AS ' - syntax = "proto3"; - - message SearchRequest { - string query = 1; - int32 page_number = 2; - int32 results_per_page = 3; - } - ' TYPE Protobuf -``` -Then refer to this schema while creating an external stream for Pulsar: -```sql -CREATE EXTERNAL STREAM stream_name( - query string, - page_number int32, - results_per_page int32) -SETTINGS type='pulsar', - service_url='pulsar://host.docker.internal:6650', - topic='persistent://public/default/protobuf', - data_format='ProtobufSingle', - format_schema='schema_name:SearchRequest' -``` - -Then you can run `INSERT INTO` or use a materialized view to write data to the topic. -```sql -INSERT INTO stream_name(query,page_number,results_per_page) VALUES('test',1,100) -``` - -Please refer to [Protobuf/Avro Schema](/proton-format-schema) for more details. - -#### Avro - -You can write messages in Avro format. - -First, you need to create a schema with SQL, e.g. 
-```sql -CREATE OR REPLACE FORMAT SCHEMA avro_schema AS '{ - "namespace": "example.avro", - "type": "record", - "name": "User", - "fields": [ - {"name": "name", "type": "string"}, - {"name": "favorite_number", "type": ["int", "null"]}, - {"name": "favorite_color", "type": ["string", "null"]} - ] - } - ' TYPE Avro; -``` -Then refer to this schema while creating an external stream for Pulsar: -```sql -CREATE EXTERNAL STREAM stream_avro( - name string, - favorite_number nullable(int32), - favorite_color nullable(string)) -SETTINGS type='pulsar', - service_url='pulsar://host.docker.internal:6650', - topic='persistent://public/default/avro', - data_format='Avro', - format_schema='avro_schema' -``` - -Then you can run `INSERT INTO` or use a materialized view to write data to the topic. -```sql -INSERT INTO stream_avro(name,favorite_number,favorite_color) VALUES('test',1,'red') -``` - -Please refer to [Protobuf/Avro Schema](/proton-format-schema) for more details. - -### Continuously Write to Pulsar via MV - -You can use materialized views to write data to Pulsar as an external stream, e.g. - -```sql --- read the topic via an external stream -CREATE EXTERNAL STREAM frontend_events(raw string) - SETTINGS type='pulsar', - service_url='pulsar://host:port', - topic='owlshop-frontend-events'; - --- create the other external stream to write data to the other topic -CREATE EXTERNAL STREAM target( - _tp_time datetime64(3), - url string, - method string, - ip string) - SETTINGS type='pulsar', - service_url='pulsar://host:port', - topic='..', - data_format='JSONEachRow', - one_message_per_row=true; - --- setup the ETL pipeline via a materialized view -CREATE MATERIALIZED VIEW mv INTO target AS - SELECT now64() AS _tp_time, - raw:requestedUrl AS url, - raw:method AS method, - lower(hex(md5(raw:ipAddress))) AS ip - FROM frontend_events; -``` - -## DROP EXTERNAL STREAM - -```sql -DROP STREAM [IF EXISTS] stream_name -``` - -## Limitations - -There are some limitations for the Pulsar-based external streams, because Timeplus doesn’t control the storage or the data format for the external stream. - -1. Unlike normal streams, there is no historical storage for the external streams. You can run `table(ex_pulsar_stream)` but it will scan all messages in the topic. There is no way to implement an efficient `count`. Thus, `SELECT count() FROM table(ex_pulsar_stream)` will always scan the whole topic. If you need to frequently run query for historical data, you can use a Materialized View to query the Pulsar External Stream and save the data in Timeplus columnar or row storage. This will improve the query performance. -2. You use `seek_to` in the streaming query. `earliest` and `latest` are supported. You can also use `seek_to='2024-10-14'` for date or datetime based rewind. But number-based seek_to is not supported. -3. There is no retention policy for the external streams in Timeplus. You need to configure the retention policy on Pulsar. If the data is no longer available in the external systems, they cannot be searched in Timeplus either. -4. Like Kafka external stream, Pulsar external stream will fetch the partition list after the streaming SQL starts running. Thus, it won't be automatically detect new partitions at runtime. Users must re-execute the query in order to read data from the new partitions. -5. Pulsar external stream functionality is limited to Linux bare metal builds and Linux-based Docker images, excluding macOS bare metal builds. 
diff --git a/docs/query-settings.md b/docs/query-settings.md index e3009d442..74b496b68 100644 --- a/docs/query-settings.md +++ b/docs/query-settings.md @@ -104,7 +104,7 @@ Type: string Default: `default` -Specifies which [JOIN](/joins) algorithm is used. +Specifies which [JOIN](/streaming-joins) algorithm is used. Possible values: * `default`: Same as `direct`,`parallel_hash`,`hash`, i.e. try to use direct join, parallel hash join, and hash join (in this order). diff --git a/docs/query-syntax.md b/docs/query-syntax.md index 9668de6ea..43d3d87c5 100644 --- a/docs/query-syntax.md +++ b/docs/query-syntax.md @@ -55,7 +55,7 @@ This will fetch the 3 rows from the 2nd smallest value of `a`. ## JOINs -Please check [Joins](/joins). +Please check [Joins](/streaming-joins). ## WITH cte diff --git a/docs/quickstart.md b/docs/quickstart.md index c9fd8633c..44e76e908 100644 --- a/docs/quickstart.md +++ b/docs/quickstart.md @@ -24,10 +24,10 @@ After creating the account, login with that username and password. If your streaming data resides in or a publicly accessible Kafka or Pulsar instance, follow one of following docs to setup data access in Timeplus, then return here to complete the quickstart: -- [Apache Kafka](/proton-kafka) +- [Apache Kafka](/kafka-source) - Confluent Cloud -- [Apache Pulsar](/pulsar-external-stream) -- [REST API, SDK, and others](/ingestion) +- [Apache Pulsar](/pulsar-source) +- [REST API, SDK, and others](/connect-data-in) If you don't yet have a streaming data source and would like test out how Timeplus works, Timeplus provides a built-in data source to generate streaming data for some common use cases. @@ -50,10 +50,10 @@ To send data to Kafka, ClickHouse or other systems, you can submit a streaming S ![Send data out](/img/sink.png) Timeplus supports various systems as the downstreams: -* [Send data to Kafka topics](/destination#kafka) -* [Send data to Pulsar topics](/pulsar-external-stream#write-data-to-pulsar) -* [Send data to ClickHouse tables](/proton-clickhouse-external-table#write) -* [Send data to another Timeplus deployment](/timeplus-external-stream) -* [Send data to Webhook endpoints](/destination#webhook) -* [Notify others via Slack](/destination#slack) -* [Send data to other systems via Redpanda Connect](/destination#rpconnect) +* [Send data to Kafka topics](/send-data-out#kafka) +* [Send data to Pulsar topics](/pulsar-sink) +* [Send data to ClickHouse tables](/clickhouse-external-table#write) +* [Send data to another Timeplus deployment](/timeplus-source) +* [Send data to Webhook endpoints](/send-data-out#webhook) +* [Notify others via Slack](/send-data-out#slack) +* [Send data to other systems via Redpanda Connect](/send-data-out#rpconnect) diff --git a/docs/redpanda-connect.md b/docs/redpanda-connect.md index 69e700caa..7a5d3a34f 100644 --- a/docs/redpanda-connect.md +++ b/docs/redpanda-connect.md @@ -62,8 +62,8 @@ Processors can be defined in `inputs` section or at the top level. ## Available Connectors All input/output/processor components from the latest Redpanda Connect are available in Timeplus Enterprise, except the following ones: -* `kafka` and `kafka_franz`: please create [external streams for Kafka](/proton-kafka) to read or write Kafka data. -* `pulsar`: please create [external streams for Pulsar](/pulsar-external-stream) to read or write Pulsar data. +* `kafka` and `kafka_franz`: please create [external streams for Kafka](/kafka-source) to read or write Kafka data. 
+* `pulsar`: please create [external streams for Pulsar](/pulsar-source) to read or write Pulsar data. * `snowflake_put` and `splunk`: those 2 components require [Enterprise Edition license](https://redpanda.com/compare-platform-editions) from Redpanda. Please contact [Redpanda Sales](https://redpanda.com/try-redpanda) to request a trial license, or to purchase an Enterprise Edition license. * `csv`, `file`, `stdin`: those are designed for local development and tests. Please use the "CSV Upload" feature in "Data Collection" page. diff --git a/docs/s3-external-table-sink.mdx b/docs/s3-external-table-sink.mdx new file mode 100644 index 000000000..e8d666b67 --- /dev/null +++ b/docs/s3-external-table-sink.mdx @@ -0,0 +1,8 @@ +--- +id: s3-sink +title: S3 External Table +--- + +import ExternalS3Write from './shared/s3-external-table.md'; + + diff --git a/docs/s3-external-table-source.mdx b/docs/s3-external-table-source.mdx new file mode 100644 index 000000000..8913f103c --- /dev/null +++ b/docs/s3-external-table-source.mdx @@ -0,0 +1,8 @@ +--- +id: s3-source +title: S3 External Table +--- + +import ExternalS3Read from './shared/s3-external-table.md'; + + diff --git a/docs/destination.md b/docs/send-data-out.md similarity index 89% rename from docs/destination.md rename to docs/send-data-out.md index a5b7aa821..3f7c6cc4c 100644 --- a/docs/destination.md +++ b/docs/send-data-out.md @@ -1,13 +1,14 @@ -# Sending Data Out +# Send Data Out + +## Overview With Timeplus Console, you can easily explore and analyze streaming data, with intuitive UI, standard SQL and streaming charts. But you won't stop here. Timeplus enables you to setup real-time data pipelines to send data to other systems, or notify individuals or power up downstream applications. -## Overview Timeplus supports various systems as the downstreams: * [Send data to Kafka topics](#kafka) -* [Send data to Pulsar topics](/pulsar-external-stream#write-data-to-pulsar) -* [Send data to ClickHouse tables](/proton-clickhouse-external-table#write) -* [Send data to another Timeplus deployment](/timeplus-external-stream) +* [Send data to Pulsar topics](/pulsar-sink) +* [Send data to ClickHouse tables](/clickhouse-external-table#write) +* [Send data to another Timeplus deployment](/timeplus-source) * [Send data to Webhook endpoints](#webhook) * [Notify others via Slack](#slack) * [Send data to other systems via Redpanda Connect](#rpconnect) @@ -32,7 +33,7 @@ Then choose "Apache Kafka". The following parameters are required: * Topic name: either an existing topic or specify the new topic name for Timeplus to create. * Authentication -Please refer to the [this page](/ingestion#kafka) for details of the parameters. You can send data to Confluent Cloud, Confluent Platform, or custom managed Apache Kafka. +Please refer to the [this page](/kafka-sink) for details of the parameters. You can send data to Confluent Cloud, Confluent Platform, or custom managed Apache Kafka. ## Trigger actions via Webhook{#webhook} diff --git a/docs/server_config.md b/docs/server_config.md index 66955d76a..74365dccc 100644 --- a/docs/server_config.md +++ b/docs/server_config.md @@ -369,7 +369,7 @@ For multi-node clusters deployed via [Helm Chart](/k8s-helm), please set the sys To edit or add new users, you can use the [timeplus user](/cli-user) CLI or container, which supports bare metal and Kubernetes, both single node and multi-node. ## Secret Management {#secrets} -Starting from Timeplus Enterprise 2.7, you can integrate with HashiCorp Vault for secret management. 
You can store the secrets of Kafka External Stream in Vault and specify them in the [config_file](/proton-kafka#config_file) setting. For bare metal deployments, you can also use this setting when all nodes have access to the same file in the same path. +Starting from Timeplus Enterprise 2.7, you can integrate with HashiCorp Vault for secret management. You can store the secrets of Kafka External Stream in Vault and specify them in the [config_file](/kafka-source#config_file) setting. For bare metal deployments, you can also use this setting when all nodes have access to the same file in the same path. ## License Management{#license} diff --git a/docs/iceberg.md b/docs/shared/iceberg-external-stream.md similarity index 99% rename from docs/iceberg.md rename to docs/shared/iceberg-external-stream.md index 40a5d3560..002e195fc 100644 --- a/docs/iceberg.md +++ b/docs/shared/iceberg-external-stream.md @@ -1,4 +1,4 @@ -# Iceberg Integration +## Overview [Apache Iceberg](https://iceberg.apache.org/) is an open table format for large-scale analytic datasets, designed for high performance and reliability. It provides an open, vendor-neutral solution that supports multiple engines, making it ideal for various analytics workloads. Initially, the Iceberg ecosystem was primarily built around Java, but with the increasing adoption of the REST catalog specification, Timeplus is among the first vendors to integrate with Iceberg purely in C++. This allows Timeplus users to stream data to Iceberg with a high performance, low memory footprint, and easy installation without relying on Java dependencies. @@ -11,7 +11,7 @@ Since Timeplus Proton 1.7(to be released soon) and [Timeplus Enterprise 2.8](/en - Query your Iceberg tables with multiple engines including Timeplus, Apache Spark, Apache Flink, ClickHouse, DuckDB, and AWS Athena - Future-proof your data architecture with broad industry support and an active open-source community -## CREATE DATABASE {#syntax} +## Create Iceberg Database {#syntax} To create an Iceberg database in Timeplus, use the following syntax: @@ -170,7 +170,7 @@ spark-sql -v --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.8.1, --conf spark.sql.catalog.spark_catalog.warehouse=s3://mybucket/demo/gravitino1 ``` -## DROP DATABASE +## Drop Iceberg Database ```sql DROP DATABASE demo CASCADE; diff --git a/docs/proton-kafka.md b/docs/shared/kafka-external-stream-client-properties.md similarity index 66% rename from docs/proton-kafka.md rename to docs/shared/kafka-external-stream-client-properties.md index 758293360..766befb64 100644 --- a/docs/proton-kafka.md +++ b/docs/shared/kafka-external-stream-client-properties.md @@ -1,578 +1,3 @@ -# Kafka External Stream - -Timeplus allows users to **read from** and **write to** Apache Kafka (and compatible platforms like **Confluent Cloud** and **Redpanda**) using **Kafka External Streams**. - -By combining external streams with [Materialized Views](/view#m_view) and [Target Streams](/view#target-stream), users can build robust **real-time streaming pipelines**. 
- -## Tutorial with Docker Compose {#tutorial} - -Explore the following hands-on tutorials: - -- [Query Kafka with SQL](/tutorial-sql-kafka) -- [Streaming JOIN](/tutorial-sql-join) -- [Streaming ETL](/tutorial-sql-etl) - -## CREATE EXTERNAL STREAM - -Use the following SQL command to create a Kafka external stream: - -```sql -CREATE EXTERNAL STREAM [IF NOT EXISTS] - ( ) -SETTINGS - type='kafka', -- required - brokers='ip:9092', -- required - topic='..', -- required - security_protocol='..', - sasl_mechanism='..', - username='..', - password='..', - config_file='..', - data_format='..', - format_schema='..', - one_message_per_row=.., - kafka_schema_registry_url='..', - kafka_schema_registry_credentials='..', - ssl_ca_cert_file='..', - ssl_ca_pem='..', - skip_ssl_cert_check=.., - properties='..' -``` - -### Settings - -#### type - -Must be set to `kafka`. Compatible with: - -* Apache Kafka -* Confluent Platform or Cloud -* Redpanda -* Other Kafka-compatible systems - -#### brokers - -Comma-separated list of broker addresses (host\:port), e.g.: - -``` -kafka1:9092,kafka2:9092,kafka3:9092 -``` - -#### topic - -Kafka topic name to connect to. - -#### security_protocol - -The supported values for `security_protocol` are: - -- PLAINTEXT: when this option is omitted, this is the default value. -- SASL_SSL: when this value is set, username and password should be specified. - - If users need to specify own SSL certification file, add another setting `ssl_ca_cert_file='/ssl/ca.pem'`. Users can also put the full content of the pem file as a string in the `ssl_ca_pem` setting. - - To skip the SSL certification verification: `skip_ssl_cert_check=true`. - -#### sasl_mechanism - -The supported values for `sasl_mechanism` are: - -- PLAIN: when setting security_protocol to SASL_SSL, this is the default value for sasl_mechanism. -- SCRAM-SHA-256 -- SCRAM-SHA-512 -- AWS_MSK_IAM (for AWS MSK IAM role-based access when EC2 or Kubernetes pod is configured with a proper IAM role) - -#### username / password - -Required when `sasl_mechanism` is set to SCRAM-SHA-256 or SCRAM-SHA-512. - -Alternatively, use [`config_file`](#config_file) to securely pass credentials. - -#### config_file - -Use this to point to a file containing key-value config lines for Kafka external stream, e.g.: - -```properties -username=my_username -password=my_password -data_format='Avro' -one_message_per_row=true -``` - -This is especially useful in Kubernetes environments with secrets managed via [HashiCorp Vault](https://learn.hashicorp.com/tutorials/vault/kubernetes-sidecar). - -**HarshiCorp Vault injection example:** - -```yaml -annotations: - vault.hashicorp.com/agent-inject: "true" - vault.hashicorp.com/agent-inject-status: "update" - vault.hashicorp.com/agent-inject-secret-kafka-secret: "secret/kafka-secret" - vault.hashicorp.com/agent-inject-template-kafka-secret: | - {{- with secret "secret/kafka-secret" -}} - username={{ .Data.data.username }} - password={{ .Data.data.password }} - {{- end }} - vault.hashicorp.com/role: "vault-role" -``` - -:::info - -Please note values in settings in the DDL will override those in config_file and it will only merge the settings from the config_file which are not explicitly specified in the DDL. - -::: - - -#### data_format - -Defines how Kafka messages are parsed and written. 
Supported formats are - -| Format | Description | -| ---------------- | ---------------------------------------- | -| `JSONEachRow` | Parses one JSON document per line | -| `CSV` | Parses comma-separated values | -| `TSV` | Like CSV, but tab-delimited | -| `ProtobufSingle` | One Protobuf message per Kafka message | -| `Protobuf` | Multiple Protobuf messages per Kafka msg | -| `Avro` | Avro-encoded messages | -| `RawBLOB` | Raw text, no parsing (default) | - -#### format_schema - -Required for these data formats: - -* `ProtobufSingle` -* `Protobuf` -* `Avro` - -#### one_message_per_row - -Set to `true` to ensure each Kafka message maps to exactly **one JSON document**, especially when writing with `JSONEachRow`. - -#### kafka_schema_registry_url - -URL of the [Kafka Schema Registry](/proton-schema-registry), including the protocol is required (`http://` or `https://`). - -#### kafka_schema_registry_credentials - -Credentials for the registry, in `username:password` format. - -#### ssl_ca_cert_file / ssl_ca_pem - -Use either: - -* `ssl_ca_cert_file='/path/to/cert.pem'` -* `ssl_ca_pem='-----BEGIN CERTIFICATE-----\n...'` - -#### skip_ssl_cert_check - -* Default: `false` -* Set to `true` to **bypass SSL verification**. - -#### properties - -Used for advanced configurations. These settings are passed directly to the Kafka client ([librdkafka config options](https://github.com/confluentinc/librdkafka/blob/master/CONFIGURATION.md)) to fine tune the Kafka producer, consumer or topic behaviors. - -For more, see the [Advanced Settings](#advanced_settings) section. - -## Read Data from Kafka - -Timeplus allows reading Kafka messages in multiple data formats, including: - -* Plain string (raw) -* CSV / TSV -* JSON -* Protobuf -* Avro - -### Read Kafka Messages as Raw String - -Use this mode when: - -* Messages contain **unstructured text or binary data** -* No built-in format is applicable -* You want to **debug raw Kafka messages** - -#### Raw String Example - -```sql -CREATE EXTERNAL STREAM ext_application_logs - (raw string) -SETTINGS type='kafka', - brokers='localhost:9092', - topic='application_logs' -``` - -Users can use functions like regex string processing or JSON extract etc functions to further process the raw string. - -#### Regex Example – Parse Application Logs - -```sql -SELECT - to_time(extract(raw, '^(\\d{4}\\.\\d{2}\\.\\d{2} \\d{2}:\\d{2}:\\d{2}\\.\\d+)')) AS timestamp, - extract(raw, '} <(\\w+)>') AS level, - extract(raw, '} <\\w+> (.*)') AS message -FROM application_logs; -``` - -### Read JSON Kafka Message - -Assuming Kafka message contains JSON text with this schema - -```json -{ - "actor": string, - "created_at": timestamp, - "id": string, - "payload": string, - "repo": string, - "type": string -} -``` - -You can process JSON in two ways: - -#### Option A: Parse with JSON Extract Functions - -1. Create a raw stream: - -```sql -CREATE EXTERNAL STREAM ext_json_raw - (raw string) -SETTINGS type='kafka', - brokers='localhost:9092', - topic='github_events'; -``` - -2. Extract fields using JSON extract shortcut syntax or [JSON extract functions](/functions_for_json): - -```sql -SELECT - raw:actor AS actor, - raw:created_at::datetime64(3, 'UTC') AS created_at, - raw:id AS id, - raw:payload AS payload, - raw:repo AS repo, - raw:type AS type -FROM ext_json_raw; -``` - -This method is most flexible and is best for dynamic JSON text with new fields or missing fields and it can also extract nested JSON fields. 
- -#### Option B: Use JSONEachRow Format - -Define a Kafka external stream with columns which are mapped to the JSON fields and also specify the `data_format` as `JSONEachRow`. - -```sql -CREATE EXTERNAL STREAM ext_json_parsed - ( - actor string, - created_at datetime64(3, 'UTC'), - id string, - payload string, - repo string, - type string - ) -SETTINGS type='kafka', - brokers='localhost:9092', - topic='github_events', - data_format='JSONEachRow' -``` - -When users query the `ext_json_parsed` stream, the JSON fields will be parsed and cast to the target column type automatically. - -This method is most convenient when the JSON text is in stable schema and can be used to extract JSON fields at top level. - -### Read CSV Kafka Messages - -Similar to data format `JSONEachRow`, users can read Kafka message in CSV format. - -``` -CREATE EXTERNAL STREAM ext_json_parsed - ( - actor string, - created_at datetime64(3, 'UTC'), - id string, - payload string, - repo string, - type string - ) -SETTINGS type='kafka', - brokers='localhost:9092', - topic='csv_topic', - data_format='CSV'; -``` - -### Read TSV Kafka Messages - -Identical to CSV, but expects **tab-separated values**: - -```sql -SETTINGS data_format='TSV'; -``` - -### Read Avro or Protobuf Messages - -To read Avro-encoded / Protobuf-encoded Kafka message, please refer to [Schema](/proton-format-schema) and [Schema Registry](/proton-schema-registry) for details. - -### Access Kafka Message Metadata - -Timeplus provides **virtual columns** for Kafka message metadata. - -| Virtual Column | Description | Type | -| --------------------- | ------------------------------ | ---------------------- | -| `_tp_time` | Kafka message timestamp | `datetime64(3, 'UTC')` | -| `_tp_message_key` | Kafka message key | `string` | -| `_tp_message_headers` | Kafka headers as key-value map | `map(string, string)` | -| `_tp_sn` | Kafka message offset | `int64` | -| `_tp_shard` | Kafka partition ID | `int32` | - - -### Kafka Message Metadata Examples - -```sql --- View message time and payload -SELECT _tp_time, raw FROM ext_github_events; - --- View message key -SELECT _tp_message_key, raw FROM ext_github_events; - --- Access headers -SELECT _tp_message_headers['trace_id'], raw FROM ext_github_events; - --- View message offset and partition -SELECT _tp_sn, _tp_shard, raw FROM ext_github_events; -``` - -### Query Settings for Kafka External Streams - -Timeplus supports several query-level settings to control how data is read from Kafka topics. These settings can be especially useful for targeting specific partitions or replaying messages from a defined point in time. - -#### Read from Specific Kafka Partitions - -By default, Timeplus reads from **all partitions** of a Kafka topic. You can override this by using the `shards` setting to specify which partitions to read from. - -##### Read from a Single Partition - -```sql -SELECT raw FROM ext_stream SETTINGS shards='0' -``` - -##### Read from Multiple Partitions - -Separate partition IDs with commas: - -```sql -SELECT raw FROM ext_stream SETTINGS shards='0,2' -``` - -#### Rewind via seek_to - -By default, Timeplus only reads **new messages** published after the query starts. To read historical messages, use the `seek_to` setting. - -#### Rewind to the Earliest Offset (All Partitions) - -```sql -SELECT raw FROM ext_stream SETTINGS seek_to='earliest' -``` - -#### Rewind to Specific Offsets (Per Partition) - -Offsets are specified **in partition order**. 
For example: - -```sql -SELECT raw FROM ext_stream SETTINGS seek_to='5,3,11' -``` - -This seeks to: - -* Offset `5` in partition `0` -* Offset `3` in partition `1` -* Offset `11` in partition `2` - -#### Rewind to a Specific Timestamp (All Partitions) - -You can also rewind based on a timestamp: - -```sql -SELECT raw FROM ext_stream SETTINGS seek_to='2025-01-01T00:00:00.000' -``` - -:::info - -Timeplus will use Kafka API to convert the timestamp to the corresponding offsets for each partition internally. - -::: - -## Write Data to Kafka - -Timeplus supports writing data to Kafka using various encoding formats such as strings, JSON, CSV, TSV, Avro, and Protobuf. You can write to Kafka using SQL `INSERT` statements, the [Ingest REST API](/proton-ingest-api), or as the target of a [Materialized View](/sql-create-materialized-view). - -### Write as Raw String - -You can encode data as a raw string in Kafka messages: - -```sql -CREATE EXTERNAL STREAM ext_github_events (raw string) -SETTINGS type='kafka', - brokers='localhost:9092', - topic='github_events' -``` - -You can then write data via: - -* `INSERT INTO ext_github_events VALUES ('some string')` -* [Ingest REST API](/proton-ingest-api) -* Materialized View - - -:::info - -Internally, the `data_format` is `RawBLOB`, and `one_message_per_row=true` by default. - -Pay attention to setting `kafka_max_message_size`. When multiple rows can be written to the same Kafka message, this setting will control how many data will be put in a Kafka message, ensuring it won't exceed the `kafka_max_message_size` limit. -::: - -### Write as JSONEachRow - -Encode each row as a separate JSON object (aka JSONL or jsonlines): - -```sql -CREATE EXTERNAL STREAM target( - _tp_time datetime64(3), - url string, - method string, - ip string) - SETTINGS type='kafka', - brokers='redpanda:9092', - topic='masked-fe-event', - data_format='JSONEachRow', - one_message_per_row=true; -``` - -The messages will be generated in the specific topic as - -```json -{ - "_tp_time":"2023-10-29 05:36:21.957" - "url":"https://www.nationalweb-enabled.io/methodologies/killer/web-readiness" - "method":"POST" - "ip":"c4ecf59a9ec27b50af9cc3bb8289e16c" -} -``` - -:::info - -Please note, by default multiple JSON documents will be inserted to the same Kafka message. One JSON document each row/line (JSONEachRow, jsonl). Such default behavior aims to get the maximum writing performance to Kafka/Redpanda. But users need to make sure the downstream applications are able to properly process the json lines. - -If users need a valid JSON per each Kafka message, instead of a JSONL, please set `one_message_per_row=true` e.g. - -```sql -CREATE EXTERNAL STREAM target(_tp_time datetime64(3), url string, ip string) -SETTINGS type='kafka', brokers='redpanda:9092', topic='masked-fe-event', - data_format='JSONEachRow',one_message_per_row=true -``` - -The default value of one_message_per_row is false for `data_format='JSONEachRow'` and true for `data_format='RawBLOB'`. 
- -::: - -### Write as CSV - -Each row is encoded as one CSV line: - -```sql -CREATE EXTERNAL STREAM target( - _tp_time datetime64(3), - url string, - method string, - ip string) - SETTINGS type='kafka', - brokers='redpanda:9092', - topic='masked-fe-event', - data_format='CSV'; -``` - -The messages will be generated in the specific topic as - -```csv -"2023-10-29 05:35:54.176","https://www.nationalwhiteboard.info/sticky/recontextualize/robust/incentivize","PUT","3eaf6372e909e033fcfc2d6a3bc04ace" -``` - -### Write as TSV - -Same as CSV, but uses **tab characters** as delimiters instead of commas. - -### Write as ProtobufSingle - -To write Protobuf-encoded messages from Kafka topics, please refer to [Protobuf Schema](/proton-format-schema), and [Kafka Schema Registry](/proton-schema-registry) pages for details. - -### Write as Avro - -To write Avro-encoded messages from Kafka topics, please refer to [Avro Schema](/proton-format-schema), and [Kafka Schema Registry](/proton-schema-registry) pages for details. - -### Write Kafka Message Metadata - -#### _tp_message_key - -If users like to populate Kafka message key when producing data to a Kafka topic, users can define the `_tp_message_key` column when creating the external stream. - -For example: -```sql -CREATE EXTERNAL STREAM foo ( - id int32, - name string, - _tp_message_key string -) SETTINGS type='kafka',...; -``` - -After inserting a row to the stream like this: -```sql -INSERT INTO foo(id,name,_tp_message_key) VALUES (1, 'John', 'some-key'); -``` - -* Kafka key will be `'some-key'` -* Message body: `{"id": 1, "name": "John"}`. Kafka key was excluded from the message body. - -`_tp_message_key` supports these types: - -* Numeric: `uint8/16/32/64`, `int8/16/32/64` -* Others: `string`, `bool`, `float32`, `float64`, `fixed_string` -* Nullable are also supported: - -```sql -CREATE EXTERNAL STREAM foo ( - id int32, - name string, - _tp_message_key nullable(string) default null -) SETTINGS type='kafka',...; -``` - -#### _tp_message_headers - -Add Kafka headers via `_tp_message_headers` (map of key-value pairs): - -```sql -CREATE EXTERNAL STREAM example ( - s string, - i int, - ..., - _tp_message_headers map(string, string) -) settings type='kafka',...; -``` - -Then insert rows to the external stream via `INSERT INTO` or Materialized Views, the `_tp_message_headers` will be set to the headers of the Kafka message. - -#### sharding_expr {#sharding_expr} - -`sharding_expr` is used to control how rows are distributed to Kafka partitions: - -```sql -CREATE EXTERNAL STREAM foo ( - id int32,.. -) SETTINGS type='kafka', sharding_expr='hash(id)'...; -``` - -When inserting rows, the partition ID will be evaluated based on the `sharding_expr` and Timeplus will put the message into the corresponding Kafka partition. - ## Properties for Kafka client {#advanced_settings} In advanced use cases, you may want to fine-tune the behavior of the Kafka consumer, producer, or topic when creating Kafka external streams. For example, fine tune the consumeer, producer's latency, throughput etc. Timeplus allows these fine tuning through the `properties` setting, which passes configuration options directly to the underlying [librdkafka](https://github.com/confluentinc/librdkafka) client. @@ -696,4 +121,3 @@ batch.num.messages | P | 1 .. 1000000 | 10000 batch.size | P | 1 .. 2147483647 | 1000000 | medium | Maximum size (in bytes) of all messages batched in one MessageSet, including protocol framing overhead. 
This limit is applied after the first message has been added to the batch, regardless of the first message's size, this is to ensure that messages that exceed batch.size are produced. The total MessageSet size is also limited by batch.num.messages and message.max.bytes. *Type: integer* delivery.report.only.error | P | true, false | false | low | Only provide delivery reports for failed messages. *Type: boolean* sticky.partitioning.linger.ms | P | 0 .. 900000 | 10 | low | Delay in milliseconds to wait to assign new sticky partitions for each topic. By default, set to double the time of linger.ms. To disable sticky behavior, set to 0. This behavior affects messages with the key NULL in all cases, and messages with key lengths of zero when the consistent_random partitioner is in use. These messages would otherwise be assigned randomly. A higher value allows for more effective batching of these messages. *Type: integer* - diff --git a/docs/shared/kafka-external-stream-read.md b/docs/shared/kafka-external-stream-read.md new file mode 100644 index 000000000..efd0c27a6 --- /dev/null +++ b/docs/shared/kafka-external-stream-read.md @@ -0,0 +1,228 @@ +## Read Data from Kafka + +Timeplus allows reading Kafka messages in multiple data formats, including: + +* Plain string (raw) +* CSV / TSV +* JSON +* Protobuf +* Avro + +### Read Kafka Messages as Raw String + +Use this mode when: + +* Messages contain **unstructured text or binary data** +* No built-in format is applicable +* You want to **debug raw Kafka messages** + +#### Raw String Example + +```sql +CREATE EXTERNAL STREAM ext_application_logs + (raw string) +SETTINGS type='kafka', + brokers='localhost:9092', + topic='application_logs' +``` + +Users can use functions like regex string processing or JSON extract etc functions to further process the raw string. + +#### Regex Example – Parse Application Logs + +```sql +SELECT + to_time(extract(raw, '^(\\d{4}\\.\\d{2}\\.\\d{2} \\d{2}:\\d{2}:\\d{2}\\.\\d+)')) AS timestamp, + extract(raw, '} <(\\w+)>') AS level, + extract(raw, '} <\\w+> (.*)') AS message +FROM application_logs; +``` + +### Read JSON Kafka Message + +Assuming Kafka message contains JSON text with this schema + +```json +{ + "actor": string, + "created_at": timestamp, + "id": string, + "payload": string, + "repo": string, + "type": string +} +``` + +You can process JSON in two ways: + +#### Option A: Parse with JSON Extract Functions + +1. Create a raw stream: + +```sql +CREATE EXTERNAL STREAM ext_json_raw + (raw string) +SETTINGS type='kafka', + brokers='localhost:9092', + topic='github_events'; +``` + +2. Extract fields using JSON extract shortcut syntax or [JSON extract functions](/functions_for_json): + +```sql +SELECT + raw:actor AS actor, + raw:created_at::datetime64(3, 'UTC') AS created_at, + raw:id AS id, + raw:payload AS payload, + raw:repo AS repo, + raw:type AS type +FROM ext_json_raw; +``` + +This method is most flexible and is best for dynamic JSON text with new fields or missing fields and it can also extract nested JSON fields. + +#### Option B: Use JSONEachRow Format + +Define a Kafka external stream with columns which are mapped to the JSON fields and also specify the `data_format` as `JSONEachRow`. 
+ +```sql +CREATE EXTERNAL STREAM ext_json_parsed + ( + actor string, + created_at datetime64(3, 'UTC'), + id string, + payload string, + repo string, + type string + ) +SETTINGS type='kafka', + brokers='localhost:9092', + topic='github_events', + data_format='JSONEachRow' +``` + +When users query the `ext_json_parsed` stream, the JSON fields will be parsed and cast to the target column type automatically. + +This method is most convenient when the JSON text is in stable schema and can be used to extract JSON fields at top level. + +### Read CSV Kafka Messages + +Similar to data format `JSONEachRow`, users can read Kafka message in CSV format. + +``` +CREATE EXTERNAL STREAM ext_json_parsed + ( + actor string, + created_at datetime64(3, 'UTC'), + id string, + payload string, + repo string, + type string + ) +SETTINGS type='kafka', + brokers='localhost:9092', + topic='csv_topic', + data_format='CSV'; +``` + +### Read TSV Kafka Messages + +Identical to CSV, but expects **tab-separated values**: + +```sql +SETTINGS data_format='TSV'; +``` + +### Read Avro or Protobuf Messages + +To read Avro-encoded / Protobuf-encoded Kafka message, please refer to [Schema](/timeplus-format-schema) and [Schema Registry](/kafka-schema-registry) for details. + +### Access Kafka Message Metadata + +Timeplus provides **virtual columns** for Kafka message metadata. + +| Virtual Column | Description | Type | +| --------------------- | ------------------------------ | ---------------------- | +| `_tp_time` | Kafka message timestamp | `datetime64(3, 'UTC')` | +| `_tp_message_key` | Kafka message key | `string` | +| `_tp_message_headers` | Kafka headers as key-value map | `map(string, string)` | +| `_tp_sn` | Kafka message offset | `int64` | +| `_tp_shard` | Kafka partition ID | `int32` | + + +### Kafka Message Metadata Examples + +```sql +-- View message time and payload +SELECT _tp_time, raw FROM ext_github_events; + +-- View message key +SELECT _tp_message_key, raw FROM ext_github_events; + +-- Access headers +SELECT _tp_message_headers['trace_id'], raw FROM ext_github_events; + +-- View message offset and partition +SELECT _tp_sn, _tp_shard, raw FROM ext_github_events; +``` + +### Query Settings for Kafka External Streams + +Timeplus supports several query-level settings to control how data is read from Kafka topics. These settings can be especially useful for targeting specific partitions or replaying messages from a defined point in time. + +#### Read from Specific Kafka Partitions + +By default, Timeplus reads from **all partitions** of a Kafka topic. You can override this by using the `shards` setting to specify which partitions to read from. + +##### Read from a Single Partition + +```sql +SELECT raw FROM ext_stream SETTINGS shards='0' +``` + +##### Read from Multiple Partitions + +Separate partition IDs with commas: + +```sql +SELECT raw FROM ext_stream SETTINGS shards='0,2' +``` + +#### Rewind via seek_to + +By default, Timeplus only reads **new messages** published after the query starts. To read historical messages, use the `seek_to` setting. + +#### Rewind to the Earliest Offset (All Partitions) + +```sql +SELECT raw FROM ext_stream SETTINGS seek_to='earliest' +``` + +#### Rewind to Specific Offsets (Per Partition) + +Offsets are specified **in partition order**. 
For example: + +```sql +SELECT raw FROM ext_stream SETTINGS seek_to='5,3,11' +``` + +This seeks to: + +* Offset `5` in partition `0` +* Offset `3` in partition `1` +* Offset `11` in partition `2` + +#### Rewind to a Specific Timestamp (All Partitions) + +You can also rewind based on a timestamp: + +```sql +SELECT raw FROM ext_stream SETTINGS seek_to='2025-01-01T00:00:00.000' +``` + +:::info + +Timeplus will use Kafka API to convert the timestamp to the corresponding offsets for each partition internally. + +::: diff --git a/docs/shared/kafka-external-stream-write.md b/docs/shared/kafka-external-stream-write.md new file mode 100644 index 000000000..b01206bd6 --- /dev/null +++ b/docs/shared/kafka-external-stream-write.md @@ -0,0 +1,170 @@ +## Write Data to Kafka + +Timeplus supports writing data to Kafka using various encoding formats such as strings, JSON, CSV, TSV, Avro, and Protobuf. You can write to Kafka using SQL `INSERT` statements, the [Ingest REST API](/proton-ingest-api), or as the target of a [Materialized View](/sql-create-materialized-view). + +### Write as Raw String + +You can encode data as a raw string in Kafka messages: + +```sql +CREATE EXTERNAL STREAM ext_github_events (raw string) +SETTINGS type='kafka', + brokers='localhost:9092', + topic='github_events' +``` + +You can then write data via: + +* `INSERT INTO ext_github_events VALUES ('some string')` +* [Ingest REST API](/proton-ingest-api) +* Materialized View + + +:::info + +Internally, the `data_format` is `RawBLOB`, and `one_message_per_row=true` by default. + +Pay attention to setting `kafka_max_message_size`. When multiple rows can be written to the same Kafka message, this setting will control how many data will be put in a Kafka message, ensuring it won't exceed the `kafka_max_message_size` limit. +::: + +### Write as JSONEachRow + +Encode each row as a separate JSON object (aka JSONL or jsonlines): + +```sql +CREATE EXTERNAL STREAM target( + _tp_time datetime64(3), + url string, + method string, + ip string) + SETTINGS type='kafka', + brokers='redpanda:9092', + topic='masked-fe-event', + data_format='JSONEachRow', + one_message_per_row=true; +``` + +The messages will be generated in the specific topic as + +```json +{ + "_tp_time":"2023-10-29 05:36:21.957" + "url":"https://www.nationalweb-enabled.io/methodologies/killer/web-readiness" + "method":"POST" + "ip":"c4ecf59a9ec27b50af9cc3bb8289e16c" +} +``` + +:::info + +Please note, by default multiple JSON documents will be inserted to the same Kafka message. One JSON document each row/line (JSONEachRow, jsonl). Such default behavior aims to get the maximum writing performance to Kafka/Redpanda. But users need to make sure the downstream applications are able to properly process the json lines. + +If users need a valid JSON per each Kafka message, instead of a JSONL, please set `one_message_per_row=true` e.g. + +```sql +CREATE EXTERNAL STREAM target(_tp_time datetime64(3), url string, ip string) +SETTINGS type='kafka', brokers='redpanda:9092', topic='masked-fe-event', + data_format='JSONEachRow',one_message_per_row=true +``` + +The default value of one_message_per_row is false for `data_format='JSONEachRow'` and true for `data_format='RawBLOB'`. 
+
+:::
+
+### Write as CSV
+
+Each row is encoded as one CSV line:
+
+```sql
+CREATE EXTERNAL STREAM target(
+    _tp_time datetime64(3),
+    url string,
+    method string,
+    ip string)
+    SETTINGS type='kafka',
+             brokers='redpanda:9092',
+             topic='masked-fe-event',
+             data_format='CSV';
+```
+
+The messages will be generated in the specified topic as:
+
+```csv
+"2023-10-29 05:35:54.176","https://www.nationalwhiteboard.info/sticky/recontextualize/robust/incentivize","PUT","3eaf6372e909e033fcfc2d6a3bc04ace"
+```
+
+### Write as TSV
+
+Same as CSV, but uses **tab characters** as delimiters instead of commas.
+
+### Write as ProtobufSingle
+
+To write Protobuf-encoded messages to Kafka topics, please refer to the [Protobuf Schema](/timeplus-format-schema) and [Kafka Schema Registry](/kafka-schema-registry) pages for details.
+
+### Write as Avro
+
+To write Avro-encoded messages to Kafka topics, please refer to the [Avro Schema](/timeplus-format-schema) and [Kafka Schema Registry](/kafka-schema-registry) pages for details.
+
+### Write Kafka Message Metadata
+
+#### _tp_message_key
+
+If users would like to populate the Kafka message key when producing data to a Kafka topic, they can define the `_tp_message_key` column when creating the external stream.
+
+For example:
+```sql
+CREATE EXTERNAL STREAM foo (
+    id int32,
+    name string,
+    _tp_message_key string
+) SETTINGS type='kafka',...;
+```
+
+After inserting a row to the stream like this:
+```sql
+INSERT INTO foo(id,name,_tp_message_key) VALUES (1, 'John', 'some-key');
+```
+
+* The Kafka message key will be `'some-key'`
+* The message body will be `{"id": 1, "name": "John"}`. The key is excluded from the message body.
+
+`_tp_message_key` supports these types:
+
+* Numeric: `uint8/16/32/64`, `int8/16/32/64`
+* Others: `string`, `bool`, `float32`, `float64`, `fixed_string`
+* Nullable types are also supported:
+
+```sql
+CREATE EXTERNAL STREAM foo (
+    id int32,
+    name string,
+    _tp_message_key nullable(string) default null
+) SETTINGS type='kafka',...;
+```
+
+#### _tp_message_headers
+
+Add Kafka headers via `_tp_message_headers` (map of key-value pairs):
+
+```sql
+CREATE EXTERNAL STREAM example (
+    s string,
+    i int,
+    ...,
+    _tp_message_headers map(string, string)
+) settings type='kafka',...;
+```
+
+When rows are inserted into the external stream via `INSERT INTO` or Materialized Views, the values in `_tp_message_headers` will be set as the headers of the Kafka messages.
+
+#### sharding_expr {#sharding_expr}
+
+`sharding_expr` is used to control how rows are distributed to Kafka partitions:
+
+```sql
+CREATE EXTERNAL STREAM foo (
+    id int32,..
+) SETTINGS type='kafka', sharding_expr='hash(id)'...;
+```
+
+When inserting rows, the partition ID will be evaluated based on the `sharding_expr` and Timeplus will put the message into the corresponding Kafka partition.
diff --git a/docs/shared/kafka-external-stream.md b/docs/shared/kafka-external-stream.md
new file mode 100644
index 000000000..2ab931901
--- /dev/null
+++ b/docs/shared/kafka-external-stream.md
@@ -0,0 +1,166 @@
+## Overview
+
+Timeplus allows users to **read from** and **write to** Apache Kafka (and compatible platforms like **Confluent Cloud** and **Redpanda**) using **Kafka External Streams**.
+
+By combining external streams with [Materialized Views](/materialized-view) and [Target Streams](/materialized-view#target-stream), users can build robust **real-time streaming pipelines**.
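+
+For example, here is a minimal sketch of such a pipeline. The broker address, topic names (`redpanda:9092`, `frontend_events`, `masked-fe-event`) and JSON field names (`requestedUrl`, `ipAddress`) are placeholders; adjust them to match your environment:
+
+```sql
+-- read raw messages from the source topic
+CREATE EXTERNAL STREAM frontend_events (raw string)
+SETTINGS type='kafka', brokers='redpanda:9092', topic='frontend_events';
+
+-- write masked events as JSON documents to another topic
+CREATE EXTERNAL STREAM masked_events (
+    _tp_time datetime64(3),
+    url string,
+    ip string)
+SETTINGS type='kafka', brokers='redpanda:9092', topic='masked-fe-event',
+         data_format='JSONEachRow', one_message_per_row=true;
+
+-- the materialized view runs as a background job, continuously masking and forwarding events
+CREATE MATERIALIZED VIEW mv_mask_ip INTO masked_events AS
+SELECT now64() AS _tp_time,
+       raw:requestedUrl AS url,
+       lower(hex(md5(raw:ipAddress))) AS ip
+FROM frontend_events;
+```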
+
+## Create Kafka External Stream
+
+Use the following SQL command to create a Kafka external stream:
+
+```sql
+CREATE EXTERNAL STREAM [IF NOT EXISTS] stream_name
+  (column_name1 data_type1 [, ...])
+SETTINGS
+  type='kafka', -- required
+  brokers='ip:9092', -- required
+  topic='..', -- required
+  security_protocol='..',
+  sasl_mechanism='..',
+  username='..',
+  password='..',
+  config_file='..',
+  data_format='..',
+  format_schema='..',
+  one_message_per_row=..,
+  kafka_schema_registry_url='..',
+  kafka_schema_registry_credentials='..',
+  ssl_ca_cert_file='..',
+  ssl_ca_pem='..',
+  skip_ssl_cert_check=..,
+  properties='..'
+```
+
+### Settings
+
+#### type
+
+Must be set to `kafka`. Compatible with:
+
+* Apache Kafka
+* Confluent Platform or Cloud
+* Redpanda
+* Other Kafka-compatible systems
+
+#### brokers
+
+Comma-separated list of broker addresses (host\:port), e.g.:
+
+```
+kafka1:9092,kafka2:9092,kafka3:9092
+```
+
+#### topic
+
+Kafka topic name to connect to.
+
+#### security_protocol
+
+The supported values for `security_protocol` are:
+
+- PLAINTEXT: when this option is omitted, this is the default value.
+- SASL_SSL: when this value is set, username and password should be specified.
+  - If users need to specify their own SSL certificate file, add another setting `ssl_ca_cert_file='/ssl/ca.pem'`. Users can also put the full content of the pem file as a string in the `ssl_ca_pem` setting.
+  - To skip the SSL certificate verification: `skip_ssl_cert_check=true`.
+
+#### sasl_mechanism
+
+The supported values for `sasl_mechanism` are:
+
+- PLAIN: when setting security_protocol to SASL_SSL, this is the default value for sasl_mechanism.
+- SCRAM-SHA-256
+- SCRAM-SHA-512
+- AWS_MSK_IAM (for AWS MSK IAM role-based access when EC2 or Kubernetes pod is configured with a proper IAM role)
+
+#### username / password
+
+Required when `sasl_mechanism` is set to SCRAM-SHA-256 or SCRAM-SHA-512.
+
+Alternatively, use [`config_file`](#config_file) to securely pass credentials.
+
+#### config_file
+
+Use this to point to a file containing key-value config lines for the Kafka external stream, e.g.:
+
+```properties
+username=my_username
+password=my_password
+data_format='Avro'
+one_message_per_row=true
+```
+
+This is especially useful in Kubernetes environments with secrets managed via [HashiCorp Vault](https://learn.hashicorp.com/tutorials/vault/kubernetes-sidecar).
+
+**HashiCorp Vault injection example:**
+
+```yaml
+annotations:
+  vault.hashicorp.com/agent-inject: "true"
+  vault.hashicorp.com/agent-inject-status: "update"
+  vault.hashicorp.com/agent-inject-secret-kafka-secret: "secret/kafka-secret"
+  vault.hashicorp.com/agent-inject-template-kafka-secret: |
+    {{- with secret "secret/kafka-secret" -}}
+    username={{ .Data.data.username }}
+    password={{ .Data.data.password }}
+    {{- end }}
+  vault.hashicorp.com/role: "vault-role"
+```
+
+:::info
+
+Please note that settings specified in the DDL override those in `config_file`; only the settings from the `config_file` that are not explicitly specified in the DDL are merged in.
+
+:::
+
+#### data_format
+
+Defines how Kafka messages are parsed and written.
Supported formats are + +| Format | Description | +| ---------------- | ---------------------------------------- | +| `JSONEachRow` | Parses one JSON document per line | +| `CSV` | Parses comma-separated values | +| `TSV` | Like CSV, but tab-delimited | +| `ProtobufSingle` | One Protobuf message per Kafka message | +| `Protobuf` | Multiple Protobuf messages per Kafka msg | +| `Avro` | Avro-encoded messages | +| `RawBLOB` | Raw text, no parsing (default) | + +#### format_schema + +Required for these data formats: + +* `ProtobufSingle` +* `Protobuf` +* `Avro` + +#### one_message_per_row + +Set to `true` to ensure each Kafka message maps to exactly **one JSON document**, especially when writing with `JSONEachRow`. + +#### kafka_schema_registry_url + +URL of the [Kafka Schema Registry](/kafka-schema-registry), including the protocol is required (`http://` or `https://`). + +#### kafka_schema_registry_credentials + +Credentials for the registry, in `username:password` format. + +#### ssl_ca_cert_file / ssl_ca_pem + +Use either: + +* `ssl_ca_cert_file='/path/to/cert.pem'` +* `ssl_ca_pem='-----BEGIN CERTIFICATE-----\n...'` + +#### skip_ssl_cert_check + +* Default: `false` +* Set to `true` to **bypass SSL verification**. + +#### properties + +Used for advanced configurations. These settings are passed directly to the Kafka client ([librdkafka config options](https://github.com/confluentinc/librdkafka/blob/master/CONFIGURATION.md)) to fine tune the Kafka producer, consumer or topic behaviors. + +For more, see the `Properties for Kafka client` section. diff --git a/docs/shared/pulsar-external-stream-read.md b/docs/shared/pulsar-external-stream-read.md new file mode 100644 index 000000000..04f475c1d --- /dev/null +++ b/docs/shared/pulsar-external-stream-read.md @@ -0,0 +1,154 @@ +## Read Data from Pulsar + +Timeplus allows reading Pulsar messages in multiple data formats, including: + +* Plain string (raw) +* CSV / TSV +* JSON +* Protobuf +* Avro + +### Read messages in a single column {#single_col_read} + +If the message in Pulsar topic is in plain text format or JSON, you can create an external stream with only a `raw` column in `string` type. + +Example: + +```sql +CREATE EXTERNAL STREAM ext_github_events (raw string) +SETTINGS type='pulsar', service_url='pulsar://host:port', topic='..' +``` + +Then use query time [JSON extraction functions](/functions_for_json) or shortcut to access the values, e.g. `raw:id`. + +### Read messages as multiple columns{#multi_col_read} + +If the keys in the JSON message never change, or you don't care about the new columns, you can also create the external stream with multiple columns. + +You can pick up some top level keys in the JSON as columns, or all possible keys as columns. + +Example: + +```sql +CREATE EXTERNAL STREAM ext_stream_parsed + (address string, firstName string, middleName string, lastName string, email string, username string, password string,sex string,telephoneNumber string, dateOfBirth int64, age uint8, company string,companyEmail string,nationalIdentityCardNumber string,nationalIdentificationNumber string, + passportNumber string) +SETTINGS type='pulsar', + service_url='pulsar+ssl://pc-12345678.gcp-shared-usce1.g.snio.cloud:6651', + topic='persistent://docs/ns/datagen', + data_format='JSONEachRow', + jwt='eyJhb..syFQ' +``` + +If there are nested complex JSON in the message, you can define the column as a string type. Actually any JSON value can be saved in a string column. 
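+
+For example, here is a minimal sketch, assuming a hypothetical topic whose JSON messages carry a nested `profile` object. The column names, service URL and topic are placeholders:
+
+```sql
+CREATE EXTERNAL STREAM ext_users_nested
+  (id string,
+   name string,
+   profile string) -- the nested JSON object is kept as a string
+SETTINGS type='pulsar',
+         service_url='pulsar://host:port',
+         topic='persistent://public/default/users',
+         data_format='JSONEachRow';
+
+-- extract nested fields at query time with the JSON shortcut syntax
+SELECT id, name, profile:city AS city, profile:age::uint8 AS age
+FROM ext_users_nested;
+```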
+
+### Virtual Columns
+
+Pulsar external streams have the following virtual columns:
+#### _tp_time
+The event time of the Pulsar message if it's available, or the publish time otherwise.
+#### _tp_append_time
+The publish time of the Pulsar message.
+#### _tp_process_time
+The timestamp when the message was read by the Pulsar External Stream.
+#### _tp_shard
+The partition ID, starting from 0.
+#### _pulsar_message_id
+An `array` which contains 4 elements: ledger_id, entry_id, partition, and batch_index.
+#### _tp_sn
+The sequence number in Timeplus, in int64 type.
+#### _tp_message_key
+The message key (a.k.a. partition key). Can be empty.
+
+#### _tp_message_headers
+
+Starting from Timeplus Enterprise 2.8.2, you can read and write custom headers via this column.
+
+Define the column in the DDL:
+```sql
+CREATE EXTERNAL STREAM example (
+  s string,
+  i int,
+  ...,
+  _tp_message_headers map(string, string)
+) settings type='pulsar',...;
+```
+Then insert data to the external stream via `INSERT INTO` or materialized views, with a map of string pairs as custom headers for each message.
+
+### Query Settings
+
+#### shards
+You can read from specific Pulsar partitions. By default, all partitions will be read, but you can also read from a single partition via the `shards` setting, e.g.
+
+```sql
+SELECT raw FROM ext_stream SETTINGS shards='0'
+```
+
+Or you can specify a set of partition IDs, separated by commas, e.g.
+
+```sql
+SELECT raw FROM ext_stream SETTINGS shards='0,2'
+```
+
+#### record_consume_timeout_ms
+Use the `record_consume_timeout_ms` setting to determine how long the external stream can wait for new messages before returning the query result. The smaller the value, the lower the latency, but also the lower the throughput.
+
+### Read existing messages {#rewind}
+
+When you run `SELECT raw FROM ext_stream`, Timeplus will read the new messages in the topic, not the existing ones.
+#### seek_to
+If you need to read all existing messages, you can use the following settings:
+
+```sql
+SELECT raw FROM ext_stream SETTINGS seek_to='earliest'
+```
+
+Or the following SQL:
+
+```sql
+SELECT raw FROM table(ext_stream) WHERE ...
+```
+Note: both `earliest` and `latest` are supported. You can also use `seek_to='2024-10-14'` for date or datetime based rewind. But number-based seek_to is not supported.
+
+:::warning
+Please avoid scanning all existing data via `select * from table(ext_stream)`.
+:::
+
+### Read / Write Pulsar Message Key {#messagekey}
+
+For each message in the topic, the value is obviously important; the key is optional, but it can carry important metadata.
+
+You can define the `_tp_message_key` column when you create the external stream.
+
+For example:
+```sql
+CREATE EXTERNAL STREAM test_msg_key (
+  id int32,
+  name string,
+  _tp_message_key string
+) SETTINGS type='pulsar',
+           service_url='pulsar://host.docker.internal:6650',
+           topic='persistent://public/default/msg-key'
+```
+You can insert any data to the Pulsar topic.
+
+When inserting a row to the stream like:
+```sql
+INSERT INTO test_msg_key(id,name,_tp_message_key) VALUES (1, 'John', 'some-key');
+```
+`'some-key'` will be used as the message key for the Pulsar message (and it will be excluded from the message body, so the message will be `{"id": 1, "name": "John"}` for the above SQL).
+
+When doing a SELECT query, the message key will be populated to the `_tp_message_key` column as well.
+`SELECT * FROM test_msg_key` will return `'some-key'` for the `_tp_message_key` column.
+
+`_tp_message_key` supports the following types: `uint8`, `uint16`, `uint32`, `uint64`, `int8`, `int16`, `int32`, `int64`, `bool`, `float32`, `float64`, `string`, and `fixed_string`.
+
+`_tp_message_key` also supports `nullable`, so you can create an external stream with an optional message key. For example:
+```sql
+CREATE EXTERNAL STREAM foo (
+  id int32,
+  name string,
+  _tp_message_key nullable(string) default null
+) SETTINGS type='pulsar',...;
+```
diff --git a/docs/shared/pulsar-external-stream-write.md b/docs/shared/pulsar-external-stream-write.md
new file mode 100644
index 000000000..801519e13
--- /dev/null
+++ b/docs/shared/pulsar-external-stream-write.md
@@ -0,0 +1,201 @@
+## Write Data to Pulsar
+
+### Write to Pulsar in Plain Text {#single_col_write}
+
+You can write plain text messages to Pulsar topics with an external stream with a single column.
+
+```sql
+CREATE EXTERNAL STREAM ext_github_events (raw string)
+SETTINGS type='pulsar', service_url='pulsar://host:port', topic='..'
+```
+
+Then use `INSERT INTO .. VALUES (v)`, the [Ingest REST API](/proton-ingest-api), or set it as the target stream for a materialized view to write messages to the Pulsar topic. The actual `data_format` value is `RawBLOB` but this can be omitted. By default `one_message_per_row` is `true`.
+
+#### Advanced Settings for writing data
+
+Settings for controlling the producer behavior:
+ * `output_batch_max_messages` - Set the max number of messages permitted in a batch. If you set this option to a value greater than 1, messages are queued until this threshold is reached or batch interval has elapsed.
+ * `output_batch_max_size_bytes` - Set the max size of messages permitted in a batch. If you set this option to a value greater than 1, messages are queued until this threshold is reached or batch interval has elapsed.
+ * `output_batch_max_delay_ms` - Set the max time for message publish delay permitted in a batch.
+ * `pulsar_max_pending_messages` - Set the max size of the producer's queue holding the messages pending to receive an acknowledgment from the broker. When the queue is full, the producer will be blocked.
+
+### Multiple columns to write to Pulsar {#multi_col_write}
+
+To write structured data to Pulsar topics, you can choose different data formats:
+
+#### RawBLOB
+Write the content as plain text.
+
+#### JSONEachRow
+
+You can use `data_format='JSONEachRow',one_message_per_row=true` to inform Timeplus to write each event as a JSON document. The columns of the external stream will be converted to keys in the JSON documents. For example:
+
+```sql
+CREATE EXTERNAL STREAM target(
+    _tp_time datetime64(3),
+    url string,
+    method string,
+    ip string)
+    SETTINGS type='pulsar',
+             service_url='pulsar://host:port',
+             topic='..',
+             data_format='JSONEachRow',
+             one_message_per_row=true;
+```
+
+The messages will be generated in the specified topic as:
+
+```json
+{
+"_tp_time":"2023-10-29 05:36:21.957",
+"url":"https://www.nationalweb-enabled.io/methodologies/killer/web-readiness",
+"method":"POST",
+"ip":"c4ecf59a9ec27b50af9cc3bb8289e16c"
+}
+```
+
+:::info
+
+Please note, by default multiple JSON documents will be inserted into the same Pulsar message, one JSON document per row/line. This default behavior aims to maximize the writing performance to Pulsar, but you need to make sure the downstream applications are able to properly split the JSON documents in each message.
+
+If you need one valid JSON document per message, instead of JSONL, please set `one_message_per_row=true`, e.g.
+
+```sql
+CREATE EXTERNAL STREAM target(_tp_time datetime64(3), url string, ip string)
+SETTINGS type='pulsar', service_url='pulsar://host:port', topic='..',
+         data_format='JSONEachRow',one_message_per_row=true
+```
+
+The default value of one_message_per_row, if not specified, is false for `data_format='JSONEachRow'` and true for `data_format='RawBLOB'`.
+:::
+
+#### CSV
+
+You can use `data_format='CSV'` to inform Timeplus to write each event as a CSV line. The columns of the external stream will be encoded as comma-separated values. For example:
+
+```sql
+CREATE EXTERNAL STREAM target(
+    _tp_time datetime64(3),
+    url string,
+    method string,
+    ip string)
+    SETTINGS type='pulsar',
+             service_url='pulsar://host:port',
+             topic='..',
+             data_format='CSV';
+```
+
+The messages will be generated in the specified topic as:
+
+```csv
+"2023-10-29 05:35:54.176","https://www.nationalwhiteboard.info/sticky/recontextualize/robust/incentivize","PUT","3eaf6372e909e033fcfc2d6a3bc04ace"
+```
+
+#### TSV
+Similar to CSV, but with tab as the separator.
+
+#### ProtobufSingle
+
+You can write Protobuf-encoded messages to Pulsar topics.
+
+First, you need to create a schema with SQL, e.g.
+```sql
+CREATE OR REPLACE FORMAT SCHEMA schema_name AS '
+              syntax = "proto3";
+
+              message SearchRequest {
+                string query = 1;
+                int32 page_number = 2;
+                int32 results_per_page = 3;
+              }
+              ' TYPE Protobuf
+```
+Then refer to this schema while creating an external stream for Pulsar:
+```sql
+CREATE EXTERNAL STREAM stream_name(
+         query string,
+         page_number int32,
+         results_per_page int32)
+SETTINGS type='pulsar',
+         service_url='pulsar://host.docker.internal:6650',
+         topic='persistent://public/default/protobuf',
+         data_format='ProtobufSingle',
+         format_schema='schema_name:SearchRequest'
+```
+
+Then you can run `INSERT INTO` or use a materialized view to write data to the topic.
+```sql
+INSERT INTO stream_name(query,page_number,results_per_page) VALUES('test',1,100)
+```
+
+Please refer to [Protobuf/Avro Schema](/timeplus-format-schema) for more details.
+
+#### Avro
+
+You can write messages in Avro format.
+
+First, you need to create a schema with SQL, e.g.
+```sql
+CREATE OR REPLACE FORMAT SCHEMA avro_schema AS '{
+                "namespace": "example.avro",
+                "type": "record",
+                "name": "User",
+                "fields": [
+                    {"name": "name", "type": "string"},
+                    {"name": "favorite_number", "type": ["int", "null"]},
+                    {"name": "favorite_color", "type": ["string", "null"]}
+                ]
+            }
+            ' TYPE Avro;
+```
+Then refer to this schema while creating an external stream for Pulsar:
+```sql
+CREATE EXTERNAL STREAM stream_avro(
+         name string,
+         favorite_number nullable(int32),
+         favorite_color nullable(string))
+SETTINGS type='pulsar',
+         service_url='pulsar://host.docker.internal:6650',
+         topic='persistent://public/default/avro',
+         data_format='Avro',
+         format_schema='avro_schema'
+```
+
+Then you can run `INSERT INTO` or use a materialized view to write data to the topic.
+```sql
+INSERT INTO stream_avro(name,favorite_number,favorite_color) VALUES('test',1,'red')
+```
+
+Please refer to [Protobuf/Avro Schema](/timeplus-format-schema) for more details.
+
+### Continuously Write to Pulsar via MV
+
+You can use materialized views to write data to Pulsar via an external stream, e.g.
+ +```sql +-- read the topic via an external stream +CREATE EXTERNAL STREAM frontend_events(raw string) + SETTINGS type='pulsar', + service_url='pulsar://host:port', + topic='owlshop-frontend-events'; + +-- create the other external stream to write data to the other topic +CREATE EXTERNAL STREAM target( + _tp_time datetime64(3), + url string, + method string, + ip string) + SETTINGS type='pulsar', + service_url='pulsar://host:port', + topic='..', + data_format='JSONEachRow', + one_message_per_row=true; + +-- setup the ETL pipeline via a materialized view +CREATE MATERIALIZED VIEW mv INTO target AS + SELECT now64() AS _tp_time, + raw:requestedUrl AS url, + raw:method AS method, + lower(hex(md5(raw:ipAddress))) AS ip + FROM frontend_events; +``` diff --git a/docs/shared/pulsar-external-stream.md b/docs/shared/pulsar-external-stream.md new file mode 100644 index 000000000..a4b82fb7a --- /dev/null +++ b/docs/shared/pulsar-external-stream.md @@ -0,0 +1,102 @@ +## Overview + +[Apache® Pulsar™](https://pulsar.apache.org/) is a multi-tenant, high-performance solution for server-to-server messaging. + +In [Timeplus Enterprise v2.5](/enterprise-v2.5), we added the first-class integration for Apache Pulsar, as a new type of [External Stream](/external-stream). You can read or write data in Apache Pulsar or StreamNative Cloud. This is also available in Timeplus Proton, since v1.6.8. + +## Create Pulsar External Stream + +To create an external stream for Apache Pulsar, you can run the following DDL SQL: + +```sql +CREATE EXTERNAL STREAM [IF NOT EXISTS] stream_name + ( ) +SETTINGS + type='pulsar', -- required + service_url='pulsar://host:port',-- required + topic='..', -- required + jwt='..', + config_file='..', + data_format='..', + format_schema='..', + one_message_per_row=.., + skip_server_cert_check=.., + validate_hostname=.., + ca_cert='..', + client_cert='..', + client_key='..', + connections_per_broker=.., + memory_limit=.., + io_threads=.. +``` +### Connect to a local Apache Pulsar + +If you have a local Apache Pulsar server running, you can run the following SQL DDL to create an external stream to connect to it. + +```sql +CREATE EXTERNAL STREAM local_pulsar (raw string) +SETTINGS type='pulsar', + service_url='pulsar://localhost:6650', + topic='persistent://public/default/my-topic' +``` + +### Connect to StreamNative Cloud +If you have the access to [StreamNative Cloud](https://console.streamnative.cloud), you can run the following SQL DDL to create an external stream to connect to it, with a proper [API Key](https://docs.streamnative.io/docs/api-keys-overview) for a service account. Make sure you choose "Create API Key", instead of the "Get JWT Token (Depreciated)". + +![screenshot](/img/pulsar_api_key.png) + +```sql +CREATE EXTERNAL STREAM ext_stream (raw string) +SETTINGS type='pulsar', + service_url='pulsar+ssl://pc-12345678.gcp-shared-usce1.g.snio.cloud:6651', + topic='persistent://tenant/namespace/topic', + jwt='eyJh..syFQ' +``` + +### DDL Settings + +#### skip_server_cert_check +Default false. If set to true, it will accept untrusted TLS certificates from brokers. + +#### validate_hostname + +Default false. Configure whether it allows validating hostname verification when a client connects to a broker over TLS. +#### ca_cert +The CA certificate (PEM format), which will be used to verify the server's certificate. +#### client_cert +The certificate (PEM format) for the client to use mTLS authentication. [Learn more](https://pulsar.apache.org/docs/3.3.x/security-tls-authentication/). 
+#### client_key +The private key (PEM format) for the client to use mTLS authentication. +#### jwt +The JSON Web Tokens for the client to use JWT authentication. [Learn more](https://docs.streamnative.io/docs/api-keys-overview). +#### config_file +The `config_file` setting is available since Timeplus Enterprise 2.7. You can specify the path to a file that contains the configuration settings. The file should be in the format of `key=value` pairs, one pair per line. You can set the Pulsar credentials in the file. + +Please follow the example in [Kafka External Stream](/kafka-source#config_file). +#### connections_per_broker +Default 1. Sets the max number of connection that this external stream will open to a single broker. By default, the connection pool will use a single connection for all the producers and consumers. +#### memory_limit +Default 0 (unlimited). Configure a limit on the amount of memory that will be allocated by this external stream. +#### io_threads +Default 1. Set the number of I/O threads to be used by the Pulsar client. + +Like [Kafka External Stream](/kafka-source), Pulsar External Stream also supports all format related settings: `data_format`, `format_schema`, `one_message_per_row`, etc. + +#### data_format +The supported values for `data_format` are: + +- JSONEachRow: parse each row of the message as a single JSON document. The top level JSON key/value pairs will be parsed as the columns. +- CSV: less commonly used. +- TSV: similar to CSV but tab as the separator +- ProtobufSingle: for single Protobuf message per message +- Protobuf: there could be multiple Protobuf messages in a single message. +- Avro +- RawBLOB: the default value. Read/write message as plain text. + +For data formats which write multiple rows into one single message (such as `JSONEachRow` or `CSV`), two more advanced settings are available: + +#### max_insert_block_size +`max_insert_block_size` to control the maximum number of rows can be written into one message. + +#### max_insert_block_bytes +`max_insert_block_bytes` to control the maximum size (in bytes) that one message can be. diff --git a/docs/s3-external.md b/docs/shared/s3-external-table.md similarity index 99% rename from docs/s3-external.md rename to docs/shared/s3-external-table.md index 7ef784b39..256104e62 100644 --- a/docs/s3-external.md +++ b/docs/shared/s3-external-table.md @@ -1,10 +1,10 @@ -# S3 External Table +## Overview Amazon S3 is cloud object storage with industry-leading scalability, data availability, security, and performance. In [Timeplus Enterprise v2.7](/enterprise-v2.7), we added the first-class integration for S3-compatible object storage systems, as a new type of External Table. You can read or write data in Amazon S3 or S3-compatible cloud or local storage. -## CREATE EXTERNAL TABLE +## Create S3 External Table To create an external table for S3, you can run the following DDL SQL: @@ -134,7 +134,7 @@ The AWS secret access key. It's optional when `use_environment_credentials` is ` #### config_file The `config_file` setting is available since Timeplus Enterprise 2.7. You can specify the path to a file that contains the configuration settings. The file should be in the format of `key=value` pairs, one pair per line. You can set the AWS access key ID and secret access key in the file. -Please follow the example in [Kafka External Stream](/proton-kafka#config_file). +Please follow the example in [Kafka External Stream](/kafka-source#config_file). 
#### region The region where the S3 bucket is located, such as `us-west-1`. Optional for GCS. @@ -260,12 +260,6 @@ While reading from an S3 external table, you can use the following virtual colum * `_path` — Path to the file. Type: `low_cardinalty(string)`. In case of archive, shows path in a format: `"{path_to_archive}::{path_to_file_inside_archive}"` * `_file` — Name of the file. Type: `low_cardinalty(string)`. In case of archive shows name of the file inside the archive. -## DROP EXTERNAL TABLE - -```sql -DROP STREAM [IF EXISTS] name -``` - ## Limitations 1. The UI wizard to setup S3 External Table is coming soon. Before it's ready, you need the SQL DDL. diff --git a/docs/timeplus-external-stream.md b/docs/shared/timeplus-external-stream.md similarity index 75% rename from docs/timeplus-external-stream.md rename to docs/shared/timeplus-external-stream.md index 0f084f700..8f3559cf2 100644 --- a/docs/timeplus-external-stream.md +++ b/docs/shared/timeplus-external-stream.md @@ -1,6 +1,6 @@ -# Timeplus External Stream +## Overview -In addition to [Kafka External Stream](/proton-kafka) and [Pulsar External Stream](/pulsar-external-stream), Timeplus also supports another type of external stream to read/write data from/to another Timeplus Enterprise or Timeplus Proton deployment. +In addition to [Kafka External Stream](/kafka-source) and [Pulsar External Stream](/pulsar-source), Timeplus also supports another type of external stream to read/write data from/to another Timeplus Enterprise or Timeplus Proton deployment. ## Use Cases @@ -8,7 +8,8 @@ By introducing the external stream for Timeplus, you can implement many new use * **Hybrid Deployment**: you can deploy Timeplus Enterprise in both public cloud and private cloud, even on the edge servers. Using the Timeplus External Stream, you can run federation search from one Timeplus deployment to query data from the other deployment, without replicating the data. Alternatively, you can continuously send data from one deployment to the other deployment; or accumulate data at the edge servers and forward high value data to the cloud deployment when the edge server can connect to the servers in the cloud. * **Data Migration or Upgrade**: when you are ready to go production, you may use the Timeplus External Stream to transfer data from the staging cluster to the production cluster. Or if you need to upgrade Timeplus Enterprise across major releases, this type of external stream can help you to transfer data. -## Syntax +## Create Timeplus External Stream + ```sql CREATE EXTERNAL STREAM SETTINGS @@ -28,15 +29,13 @@ Settings: * **password**: the password for the remote Timeplusd. The default value is an empty string. * **secure**: a bool for whether to use secure connection to the remote Timeplusd. The default value is false. Use port 9440 when `secure` is set to true, otherwise use port 8463. * **stream**: the stream name in the remote Timeplusd. It's required and there is no default value. -* **config_file**: since Timeplus Enterprise 2.7, you can specify a config file to load the settings from. Please follow the example in [Kafka External Stream](/proton-kafka#config_file). +* **config_file**: you can specify a config file to load the settings from. Please follow the example in [Kafka External Stream](/kafka-source#config_file). 
## Examples ### Migrate data from Timeplus Proton to Timeplus Enterprise + If you have deployed [Timeplus Proton](https://github.com/timeplus-io/proton) and want to load those data to a Timeplus Enterprise deployment, you cannot upgrade in place. You can use the Timeplus External Stream to migrate data. -:::info -The Timeplus Proton need to be 1.5.15 or above. -::: For example, there is a stream `streamA` in Timeplus Proton, running on host1. @@ -53,6 +52,7 @@ SELECT * FROM streama_proton WHERE _tp_time>earliest_ts(); ``` ### Upload data from edge server to the cloud + If you deploy Timeplus Proton or Timeplus Enterprise at edge servers, it can collect and process live data with high performance and low footprint. The important data can be uploaded to the other Timeplus Enterprise in the cloud when the internet is available. For example, on the edge server, you collect the real-time web access log and only want to upload error logs to the server. @@ -64,10 +64,3 @@ SETTINGS type='timeplus',hosts='cloud1',stream='..'; INSERT INTO stream_in_cloud SELECT * FROM local_stream WHERE http_code>=400; ``` - -## Limitations -* [window functions](/functions_for_streaming) like tumble/hop are not working yet. -* can't read virtual columns on remote streams. -* [table function](/functions_for_streaming#table) is not supported in timeplusd 2.3.21 or earlier version. This has been enhanced since timeplusd 2.3.22. -* Timeplus Proton earlier than 1.6.9 doesn't support the Timeplus external stream. -* In Timeplus Proton, if your materialized view queries a Timeplus external stream, the checkpoint of the external stream may not be properly persisted. No such issue for Timeplus Enterprise and we are working on the fix. diff --git a/docs/showcases.md b/docs/showcases.md index 6f58585d8..dbd0e484d 100644 --- a/docs/showcases.md +++ b/docs/showcases.md @@ -99,7 +99,7 @@ At Timeplus, we collect various logs, metrics and usage data and send them to ou ### Metering for usage-based pricing -By leveraging streaming SQL, [Versioned Stream](/versioned-stream), [HTTP ingestion](/ingest-api), [WebHook sink](/destination#webhook) and many other features, we collect real-time infrastructure usage per tenants, apply lookup and aggregation, and send data to our usage-based pricing vendor, ([Paigo](https://paigo.tech/)). +By leveraging streaming SQL, [Versioned Stream](/versioned-stream), [HTTP ingestion](/ingest-api), [WebHook sink](/send-data-out#webhook) and many other features, we collect real-time infrastructure usage per tenants, apply lookup and aggregation, and send data to our usage-based pricing vendor, ([Paigo](https://paigo.tech/)). [Read case study](https://www.timeplus.com/post/usage-based-pricing-with-timeplus-and-paigo). diff --git a/docs/slack-external.md b/docs/slack-external.md new file mode 100644 index 000000000..6122c1d51 --- /dev/null +++ b/docs/slack-external.md @@ -0,0 +1,25 @@ +# Slack + +Leveraging HTTP external stream, you can write / materialize data to Slack directly from Timeplus to trigger notifications. + +## Trigger Slack Notifications {#example-trigger-slack} + +You can follow [the guide](https://api.slack.com/messaging/webhooks) to configure an "incoming webhook" to send notifications to a Slack channel. 
+ +```sql +CREATE EXTERNAL STREAM http_slack_t1 (text string) SETTINGS +type = 'http', data_format='Template', +format_template_resultset_format='{"blocks":[{"type":"section","text":{"type":"mrkdwn","text":"${data}"}}]}', +format_template_row_format='${text:Raw}', +url = 'https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX' +``` + +Then you can insert data via a materialized view or just via `INSERT` command: +```sql +INSERT INTO http_slack_t1 VALUES('Hello World!'); +INSERT INTO http_slack_t1 VALUES('line1\nline2'); +INSERT INTO http_slack_t1 VALUES('msg1'),('msg2'); +INSERT INTO http_slack_t1 VALUES('This is unquoted text\n>This is quoted text\n>This is still quoted text\nThis is unquoted text again'); +``` + +Please follow Slack's [text formats](https://api.slack.com/reference/surfaces/formatting) guide to add rich text to your messages. diff --git a/docs/splunk-external.md b/docs/splunk-external.md new file mode 100644 index 000000000..143faff22 --- /dev/null +++ b/docs/splunk-external.md @@ -0,0 +1,22 @@ +# Splunk + +Leveraging HTTP external stream, you can write / materialize data to Splunk directly from Timeplus. + +## Write to Splunk {#example-write-to-splunk} + +Follow [the guide](https://docs.splunk.com/Documentation/Splunk/9.4.1/Data/UsetheHTTPEventCollector) to set up and use HTTP Event Collector(HEC) in Splunk. Make sure you create a HEC token for the desired index and enable it. + +Create the HTTP external stream in Timeplus: +```sql +CREATE EXTERNAL STREAM http_splunk_t1 (event string) +SETTINGS +type = 'http', +data_format = 'JSONEachRow', +http_header_Authorization='Splunk the-hec-token', +url = 'http://host:8088/services/collector/event' +``` + +Then you can insert data via a materialized view or just +```sql +INSERT INTO http_splunk_t1 VALUES('test1'),('test2'); +``` diff --git a/docs/sql-create-database.md b/docs/sql-create-database.md index f84f2a6af..20cac914f 100644 --- a/docs/sql-create-database.md +++ b/docs/sql-create-database.md @@ -75,7 +75,7 @@ SETTINGS storage_credential=''; ``` -[Learn more](/iceberg#syntax) +[Learn more](/iceberg-source) ## See also diff --git a/docs/sql-create-disk.md b/docs/sql-create-disk.md index 6f1f5939f..129201c9a 100644 --- a/docs/sql-create-disk.md +++ b/docs/sql-create-disk.md @@ -1,7 +1,7 @@ # CREATE DISK By default, Timeplus only created a `default` disk for local storage. -Starting from [Timeplus Enterprise 2.8](/enterprise-v2.8), you can create S3 disks for [tiered storage](/tiered-storage) or [autoscaling materialized views](/view#autoscaling_mv). +Starting from [Timeplus Enterprise 2.8](/enterprise-v2.8), you can create S3 disks for [tiered storage](/tiered-storage) or [autoscaling materialized views](/materialized-view#autoscaling_mv). ## Syntax You can create a S3 disk with the following SQL: @@ -15,7 +15,7 @@ CREATE DISK name disk( ) ``` -Please refer to [S3 External Table](/s3-external) for how to connect to the S3 storage. It's not recommended to hardcode the access key and secret access key in the DDL. Instead, users should use environment variables or IAM role to secure these credentials. +Please refer to [S3 External Table](/s3-sink) for how to connect to the S3 storage. It's not recommended to hardcode the access key and secret access key in the DDL. Instead, users should use environment variables or IAM role to secure these credentials. 
You can use the following SQL to list the disks: ```sql diff --git a/docs/sql-create-external-stream.md b/docs/sql-create-external-stream.md index 505685dba..28cdc52f7 100644 --- a/docs/sql-create-external-stream.md +++ b/docs/sql-create-external-stream.md @@ -1,6 +1,6 @@ # CREATE EXTERNAL STREAM -External stream for Kafka is official supported. The external stream for local log files is at technical preview. In Timeplus Enterprise, it also supports [another type of External Stream](/timeplus-external-stream) to read/write data for a remote Timeplus Enterprise. +External stream for Kafka is official supported. The external stream for local log files is at technical preview. In Timeplus Enterprise, it also supports [another type of External Stream](/timeplus-source) to read/write data for a remote Timeplus Enterprise. ## Kafka External Stream ```sql @@ -24,7 +24,7 @@ SETTINGS type='kafka', config_file='..' ``` -Please check the [Kafka External Stream](/proton-kafka) for more details about the settings, and [this doc](/tutorial-sql-connect-kafka) for examples to connect to various Kafka API compatible message platforms. +Please check the [Kafka External Stream](/kafka-source) for more details about the settings, and [this doc](/tutorial-sql-connect-kafka) for examples to connect to various Kafka API compatible message platforms. ## Pulsar External Stream ```sql @@ -49,7 +49,7 @@ SETTINGS config_file='..' ``` -Please check the [Pulsar External Stream](/pulsar-external-stream#ddl-settings) for more details. +Please check the [Pulsar External Stream](/pulsar-source) for more details. ## Timeplus External Stream ```sql diff --git a/docs/sql-create-format-schema.md b/docs/sql-create-format-schema.md index caa88ff71..b29c21e47 100644 --- a/docs/sql-create-format-schema.md +++ b/docs/sql-create-format-schema.md @@ -1,6 +1,6 @@ # CREATE FORMAT -Timeplus supports reading or writing messages in [Protobuf](https://protobuf.dev/) or [Avro](https://avro.apache.org) format. This document covers how to process data without a Schema Registry. Check [this page](/proton-schema-registry) if your Kafka topics are associated with a Schema Registry. +Timeplus supports reading or writing messages in [Protobuf](https://protobuf.dev/) or [Avro](https://avro.apache.org) format. This document covers how to process data without a Schema Registry. Check [this page](/kafka-schema-registry) if your Kafka topics are associated with a Schema Registry. Without a Schema Registry, you need to define the Protobuf or Avro schema using SQL. @@ -39,7 +39,7 @@ Please note: 1. If you want to ensure there is only a single Protobuf message per Kafka message, please set `data_format` to `ProtobufSingle`. If you set it to `Protobuf`, then there could be multiple Protobuf messages in a single Kafka message. 2. The `format_schema` setting contains two parts: the registered schema name (in this example: schema_name), and the message type (in this example: SearchRequest). Combining them together with a semicolon. 3. You can use this external stream to read or write Protobuf messages in the target Kafka/Confluent topics. -4. For more advanced use cases, please check the [examples for complex schema](/proton-format-schema#protobuf_complex). +4. For more advanced use cases, please check the [examples for complex schema](/timeplus-format-schema#protobuf_complex). ## Avro Available since Proton 1.5.10. 
diff --git a/docs/sql-show-disks.md b/docs/sql-show-disks.md index f049c2816..3ed24c31e 100644 --- a/docs/sql-show-disks.md +++ b/docs/sql-show-disks.md @@ -1,5 +1,5 @@ # SHOW DISKS -Starting from [Timeplus Enterprise 2.8](/enterprise-v2.8), you can create S3 disks for [tiered storage](/tiered-storage) or [autoscaling materialized views](/view#autoscaling_mv). +Starting from [Timeplus Enterprise 2.8](/enterprise-v2.8), you can create S3 disks for [tiered storage](/tiered-storage) or [autoscaling materialized views](/materialized-view#autoscaling_mv). ## Syntax diff --git a/docs/streaming-aggregations.md b/docs/streaming-aggregations.md index 5cc39981f..c40b2467a 100644 --- a/docs/streaming-aggregations.md +++ b/docs/streaming-aggregations.md @@ -36,7 +36,7 @@ As an advanced feature, Timeplus supports various policies to emit results durin Please note, we updated the EMIT syntax in [Timeplus Enterprise 2.7.6](/enterprise-v2.7#2_7_6). Please upgrade to the latest version to use those refined emit polices. ::: -For [global aggregations](/stream-query#global-aggregation), the syntax is: +For [global aggregations](/streaming-query#global-aggregation), the syntax is: ```sql EMIT [STREAM|CHANGELOG] @@ -46,9 +46,9 @@ EMIT [STREAM|CHANGELOG] ``` By default `EMIT STREAM` and `PERIODIC 2s` are applied. Advanced settings: -* `EMIT CHANGELOG` works for [global aggregations](/stream-query#global-aggregation) and [non-aggregation tail/filter](/stream-query#non-aggregation). It will output `+1` or `-1` for `_tp_delta` column. +* `EMIT CHANGELOG` works for [global aggregations](/streaming-query#global-aggregation) and [non-aggregation tail/filter](/streaming-query#non-aggregation). It will output `+1` or `-1` for `_tp_delta` column. -For [time-window aggregations](/stream-query#window-aggregation), the syntax is: +For [time-window aggregations](/streaming-query#window-aggregation), the syntax is: ```sql EMIT diff --git a/docs/joins.md b/docs/streaming-joins.md similarity index 100% rename from docs/joins.md rename to docs/streaming-joins.md diff --git a/docs/stream-query.md b/docs/streaming-query.md similarity index 100% rename from docs/stream-query.md rename to docs/streaming-query.md diff --git a/docs/tiered-storage.md b/docs/tiered-storage.md index a06e3674e..faec5cf67 100644 --- a/docs/tiered-storage.md +++ b/docs/tiered-storage.md @@ -29,7 +29,7 @@ CREATE DISK name disk( The `type` needs to be `s3` to create a S3 disk storage. Timeplus also supports disk with `s3_plain` but that is for S3-based materialized view checkpointing. -Please refer to [S3 External Table](/s3-external) for how to connect to the S3 storage. It's not recommended to hardcode the access key and secret access key in the DDL. Instead, users should use [environment variables](/s3-external#use_environment_credentials) or IAM role to secure these credentials. +Please refer to [S3 External Table](/s3-sink) for how to connect to the S3 storage. It's not recommended to hardcode the access key and secret access key in the DDL. Instead, users should use [environment variables](/s3-sink#use_environment_credentials) or IAM role to secure these credentials. 
You can use the following SQL to list the disks: diff --git a/docs/timeplus-enterprise.md b/docs/timeplus-enterprise.md index 123755d2d..c87050389 100644 --- a/docs/timeplus-enterprise.md +++ b/docs/timeplus-enterprise.md @@ -5,7 +5,7 @@ We've recently reorganized our documentation to provide you with better, more fo * [Introduction to Timeplus Enterprise](/) * [Why Timeplus](/why-timeplus) * [Use cases](/showcases) -* [Comparing Timeplus Enterprise and Timeplus Proton](/compare) +* [Comparing Timeplus Enterprise and Timeplus Proton](/proton-oss-vs-enterprise) * [Deploying in Kubernetes or Bare Metal](/k8s-helm) ## Need Help? diff --git a/docs/timeplus-external-stream-sink.mdx b/docs/timeplus-external-stream-sink.mdx new file mode 100644 index 000000000..8164401c1 --- /dev/null +++ b/docs/timeplus-external-stream-sink.mdx @@ -0,0 +1,8 @@ +--- +id: timeplus-sink +title: Remote Timeplus +--- + +import ExternalTimeplusSink from './shared/timeplus-external-stream.md'; + + diff --git a/docs/timeplus-external-stream-source.mdx b/docs/timeplus-external-stream-source.mdx new file mode 100644 index 000000000..81c5eaacc --- /dev/null +++ b/docs/timeplus-external-stream-source.mdx @@ -0,0 +1,8 @@ +--- +id: timeplus-source +title: Remote Timeplus +--- + +import ExternalTimeplusRead from './shared/timeplus-external-stream.md'; + + diff --git a/docs/proton-format-schema.md b/docs/timeplus-format-schema.md similarity index 97% rename from docs/proton-format-schema.md rename to docs/timeplus-format-schema.md index eb702f882..259e33edc 100644 --- a/docs/proton-format-schema.md +++ b/docs/timeplus-format-schema.md @@ -1,7 +1,8 @@ -# Protobuf/Avro Schema -Timeplus supports reading or writing messages in [Protobuf](https://protobuf.dev/) or [Avro](https://avro.apache.org) format for [Kafka External Stream](/proton-kafka) or [Pulsar External Stream](/pulsar-external-stream). This document covers how to process data without a Schema Registry. Check [this page](/proton-schema-registry) if your Kafka topics are associated with a Schema Registry. +# Protobuf / Avro Schema -## Create A Schema {#create} +Timeplus supports reading or writing messages in [Protobuf](https://protobuf.dev/) or [Avro](https://avro.apache.org) format for [Kafka External Stream](/kafka-source) or [Pulsar External Stream](/pulsar-source). This document covers how to process data without a Schema Registry. Check [this page](/kafka-schema-registry) if your Kafka topics are associated with a Schema Registry. + +## Create Schema {#create} Without a Schema Registry, you need to define the Protobuf or Avro schema using SQL. @@ -45,7 +46,7 @@ Please note: 1. If you want to ensure there is only a single Protobuf message per Kafka message, please set `data_format` to `ProtobufSingle`. If you set it to `Protobuf`, then there could be multiple Protobuf messages in a single Kafka message. 2. The `format_schema` setting contains two parts: the registered schema name (in this example: schema_name), and the message type (in this example: SearchRequest). Combining them together with a semicolon. 3. You can use this external stream to read or write Protobuf messages in the target Kafka/Confluent topics. -4. For more advanced use cases, please check the [examples for complex schema](/proton-format-schema#protobuf_complex). +4. For more advanced use cases, please check the [examples for complex schema](/timeplus-format-schema#protobuf_complex). ### Avro Available since Timeplus Proton 1.5.10. 
diff --git a/docs/tutorial-sql-etl-kafka-to-ch.md b/docs/tutorial-sql-etl-kafka-to-ch.md index 0aaaf0935..82ca62f29 100644 --- a/docs/tutorial-sql-etl-kafka-to-ch.md +++ b/docs/tutorial-sql-etl-kafka-to-ch.md @@ -34,7 +34,7 @@ In the demo docker compose stack, a Redpanda container is started, together with The goal of this tutorial is to read these access logs and turn the sensitive IP addresses into md5 and ingest them to ClickHouse for more business analysis. -To read data from Kafka or Redpanda, you just need to create an [Kafka External Stream](/proton-kafka) with the following DDL SQL: +To read data from Kafka or Redpanda, you just need to create an [Kafka External Stream](/kafka-source) with the following DDL SQL: ```sql CREATE EXTERNAL STREAM frontend_events(raw string) @@ -63,7 +63,7 @@ CREATE MATERIALIZED VIEW mv INTO ch_local AS FROM frontend_events; ``` -Once the materialized view is created, it will work as a background ETL job in Proton, to continuously read data from Kafka/Redpanda, apply transformations or aggregations, then send results to ClickHouse. To learn more about Materialized View in Proton, please refer to [this documentation](/view#m_view). +Once the materialized view is created, it will work as a background ETL job in Proton, to continuously read data from Kafka/Redpanda, apply transformations or aggregations, then send results to ClickHouse. To learn more about Materialized View in Proton, please refer to [this documentation](/materialized-view). Now if you go back to ClickHouse and run `select * from events`, you will see new data coming at sub-second latency. diff --git a/docs/tutorial-sql-etl-mysql-to-ch.md b/docs/tutorial-sql-etl-mysql-to-ch.md index 21bb05564..771e873dd 100644 --- a/docs/tutorial-sql-etl-mysql-to-ch.md +++ b/docs/tutorial-sql-etl-mysql-to-ch.md @@ -63,8 +63,8 @@ You can use `docker exec -it proton-client -h 127.0.0.1 -m -n` to run the Copy the content of `mysql-to-clickhouse.sql`and paste in the Proton Client and run them together. What will happen: 1. One [Timeplus External Stream](/external-stream) will be created to read the MySQL CDC data from the Kafka/Redpanda topic. -2. One [External Table](/proton-clickhouse-external-table) will be created to write data from Timeplus to ClickHouse. -3. One [Materialized View](/view#m_view) will be created to continuously read data from Kafka and write to ClickHouse. +2. One [External Table](/clickhouse-external-table) will be created to write data from Timeplus to ClickHouse. +3. One [Materialized View](/materialized-view) will be created to continuously read data from Kafka and write to ClickHouse. 
The content of the `mysql-to-clickhouse.sql` is: diff --git a/docs/tutorial-sql-join.md b/docs/tutorial-sql-join.md index 89537aaca..f06cf99f0 100644 --- a/docs/tutorial-sql-join.md +++ b/docs/tutorial-sql-join.md @@ -46,7 +46,7 @@ Note: * Two CTE are defined to parse the JSON attribute as columns * `SETTINGS seek_to='earliest'` is the special settings to fetch earliest data from the Kafka topic * `USING(id)` is same as `ON left.id=right.id` -* Check [JOIN](/joins) for more options to join dynamic and static data +* Check [JOIN](/streaming-joins) for more options to join dynamic and static data :::info diff --git a/docs/tutorial-testcontainers-java.md b/docs/tutorial-testcontainers-java.md index 951113ee9..8ab541961 100644 --- a/docs/tutorial-testcontainers-java.md +++ b/docs/tutorial-testcontainers-java.md @@ -199,7 +199,7 @@ We will create a `init.sql` file under `src/test/resources`, so that the file is #### Create Kafka external streams -First we will create 4 [Kafka External Streams](/proton-kafka) so that we can read data and write data. +First we will create 4 [Kafka External Streams](/kafka-source) so that we can read data and write data. ```sql CREATE EXTERNAL STREAM input(raw string) @@ -221,7 +221,7 @@ Please note: 2. Kafka External Stream in Timeplus is bi-directional. You can read Kafka data by `SELECT.. FROM` or write data to Kafka via `INSERT INTO` or Materialized View target stream. -3. For `primes` and `composites` external streams, we also defined the [_tp_message_key](/proton-kafka#_tp_message_key) virtual column to write the message key. The default key value is set to the message value. +3. For `primes` and `composites` external streams, we also defined the [_tp_message_key](/kafka-source) virtual column to write the message key. The default key value is set to the message value. #### Create a JavaScript UDF to check prime numbers diff --git a/docs/understanding-watermark.mdx b/docs/understanding-watermark.mdx index 10f5e2f87..b20c1742e 100644 --- a/docs/understanding-watermark.mdx +++ b/docs/understanding-watermark.mdx @@ -2,7 +2,7 @@ import TimeplusWatermarkVisualization from '@site/src/components/TimeplusWaterma # Understanding Watermark -In stream processing, a watermark is a mechanism used to track the progress of data processing in one or more streams. It helps determine when all events up to a specific timestamp have likely arrived, enabling the system to process time-based operations like [window aggregations](/stream-query#window-aggregation). +In stream processing, a watermark is a mechanism used to track the progress of data processing in one or more streams. It helps determine when all events up to a specific timestamp have likely arrived, enabling the system to process time-based operations like [window aggregations](/streaming-query#window-aggregation). Since stream processing deals with unbounded, out-of-order data, watermarks are essential for defining bounded windows and for discarding late or outdated data in a consistent way. Watermarks allow the system to advance event-time, trigger computations (e.g., closing a [tumbling window](/functions_for_streaming#tumble) and [emitting results](/streaming-aggregations#emit)), and maintain accuracy even when data arrives with delays. 
diff --git a/docs/usecases.mdx b/docs/usecases.mdx index 23372c06b..a00af6183 100644 --- a/docs/usecases.mdx +++ b/docs/usecases.mdx @@ -189,7 +189,7 @@ Result: | ----------------------- | ------ | --------------- | --------- | | 2022-01-12 23:01:00.000 | c00001 | 34 | 35 | -More practically, the user can create a [materialized view](/view#m_view) to automatically put downsampled data into a new stream/view. +More practically, the user can create a [materialized view](/materialized-view) to automatically put downsampled data into a new stream/view. ```sql CREATE MATERIALIZED VIEW car_live_data_1min as diff --git a/docs/v1-release-notes.md b/docs/v1-release-notes.md index fb690f959..a92cd0383 100644 --- a/docs/v1-release-notes.md +++ b/docs/v1-release-notes.md @@ -5,7 +5,7 @@ This page summarizes changes for each major update in Proton and Timeplus Cloud, ## Jun 24, 2024 *Timeplus Core Engine (Proton v1.5.10):* - * Avro-Encoded Messages: Previously, Schema Registry must be enabled to read Avro-encoded messages. Now, SQL can be used to define the Avro schema and read these messages. [Learn more](/proton-format-schema) + * Avro-Encoded Messages: Previously, Schema Registry must be enabled to read Avro-encoded messages. Now, SQL can be used to define the Avro schema and read these messages. [Learn more](/timeplus-format-schema) * Improved Proton Client: `-h 127.0.0.1` is no longer used when launching the proton-client. Proton v1.5.10 listens on both IPv4 and IPv6 ports. ## May 28, 2024 @@ -40,7 +40,7 @@ _Timeplus Console:_ _Timeplus Core Engine (Proton v1.5.7):_ -- You can now join multiple [versioned streams](/versioned-stream) using `LEFT JOIN` and by assigning primary key(s). Results will be emitted whenever there are updates to either side of the JOIN. [Learn more](/joins) +- You can now join multiple [versioned streams](/versioned-stream) using `LEFT JOIN` and by assigning primary key(s). Results will be emitted whenever there are updates to either side of the JOIN. [Learn more](/streaming-joins) - New examples in the Timeplus Proton repo /examples folder: - [One Billion Rows Challenge (1BRC)](https://github.com/timeplus-io/proton/tree/develop/examples/onebrc), contributed by Timeplus Community member [Saïd Abiola](https://github.com/ayewo) - [Real-time retrieval-augmented generation (RAG)](https://github.com/timeplus-io/proton/tree/develop/examples/real-time-ai) @@ -79,8 +79,8 @@ _Timeplus Cloud and Timeplus Enterprise:_ _Timeplus Proton:_ - Streaming processing now supports nullable data type. -- [External Table](/proton-clickhouse-external-table#create-external-table): ClickHouse external tables with names containing special characters (such as dashes) are now supported. Simply set `table='test-a-b'` in the `CREATE EXTERNAL TABLE` DDL. -- [External Stream](/proton-kafka#create-external-stream): Error handling and connection pooling/retry for Kafka external streams have been greatly improved. +- [External Table](/clickhouse-external-table): ClickHouse external tables with names containing special characters (such as dashes) are now supported. Simply set `table='test-a-b'` in the `CREATE EXTERNAL TABLE` DDL. +- [External Stream](/kafka-source): Error handling and connection pooling/retry for Kafka external streams have been greatly improved. - Materialized View: Added option to [skip dirty/unexpected data](/query-syntax#settings). If you set `SETTINGS recovery_policy='best_effort'`, Timeplus will try up to 3 times, then skip dirty data and continue processing the rest of the data. 
_Timeplus Cloud and Timeplus Enterprise:_ @@ -109,7 +109,7 @@ _Proton:_ - Proton can now natively integrate with ClickHouse, available for both ClickHouse Cloud or local/self-managed versions of ClickHouse. [Learn more](https://www.timeplus.com/post/proton-clickhouse-integration) - Bulk CSV import is enhanced, in Proton 1.5.2. You can load billions of rows in multiple CSV files via a single SQL. -- Kafka Schema Registry is supported with Protobuf and Avro format (Proton 1.5.2). [Learn more](/proton-schema-registry) +- Kafka Schema Registry is supported with Protobuf and Avro format (Proton 1.5.2). [Learn more](/kafka-schema-registry) - Self-signed HTTPS certification for Schema Registry is supported (Proton 1.5.3). - Proton now can be compiled on SUSE Linux. @@ -138,8 +138,8 @@ _Timeplus Cloud:_ _Proton (Current version: v1.4.2):_ -- Since Proton v1.4.2, we’ve added support to read or write ClickHouse tables. To do this, we’ve introduced a new concept in Proton: "External Table". Similar to [External Stream](/external-stream), no data is persisted in Proton. In the future, we will support more integration by introducing other types of External Table. [See our docs](/proton-clickhouse-external-table) for use cases and more details. -- Based on user feedback, we’ve simplified the process of reading key/value pairs in the JSON document in a Kafka topic. You don’t need to define all keys as columns, and no need to set `input_format_skip_unknown_fields` in DDL or SQL. [Learn more](/proton-kafka) +- Since Proton v1.4.2, we’ve added support to read or write ClickHouse tables. To do this, we’ve introduced a new concept in Proton: "External Table". Similar to [External Stream](/external-stream), no data is persisted in Proton. In the future, we will support more integration by introducing other types of External Table. [See our docs](/clickhouse-external-table) for use cases and more details. +- Based on user feedback, we’ve simplified the process of reading key/value pairs in the JSON document in a Kafka topic. You don’t need to define all keys as columns, and no need to set `input_format_skip_unknown_fields` in DDL or SQL. [Learn more](/kafka-source) - For random streams, you can now define the EPS (event per second) as a number between 0 to 1. For example, eps=0.5 means generating an event every 2 seconds. - A new [extract_key_value_pairs](/functions_for_text#extract_key_value_pairs) function is added to extract key value pairs from a string to a map. - We’ve refined the anonymous telemetry configuration. Regardless if it’s a single binary or Docker deployment, you can set a `TELEMETRY_ENABLED` environment variable. The reporting interval is adjusted from 2 minutes to 5 minutes. @@ -156,7 +156,7 @@ _Timeplus Cloud:_ _Proton:_ - Proton v1.4.1 is now released. Please note: you cannot use an older version of Proton client to connect to the new v1.4 Proton server — be sure to update your Proton client. All existing JDBC, ODBC, Go, and Python drivers will still work as usual. -- (v1.3.31) Write to Kafka in plain text: you can now [produce raw format data](/proton-kafka) to a Kafka external stream with a single column. +- (v1.3.31) Write to Kafka in plain text: you can now [produce raw format data](/kafka-source) to a Kafka external stream with a single column. - (v1.3.31) By default, we disable sort for historical backfill. [Learn more](/query-settings) in our query guide, including how to enable. 
_Timeplus Cloud:_ @@ -173,7 +173,7 @@ _Proton:_ - We've added a new example in the [proton/examples](https://github.com/timeplus-io/proton/tree/develop/examples) folder for [Coinbase](https://github.com/timeplus-io/proton/tree/develop/examples/coinbase). - (v1.3.30) New functions for aggregation: [stochastic_linear_regression_state](/functions_for_agg#stochastic_linear_regression_state) and [stochastic_logistic_regression](/functions_for_agg#stochastic_logistic_regression). - (v1.3.30) New functions for processing text: [base64_encode](/functions_for_text#base64_encode), [base64_decode](/functions_for_text#base64_decode), [base58_encode](/functions_for_text#base58_encode), and [base58_decode](/functions_for_text#base58_decode), -- (v1.3.30) When creating an external stream, you can set sasl_mechanism to SCRAM-SHA-512, SCRAM-SHA-256, or PLAIN (default value). Learn more with [examples](/proton-kafka#create-external-stream) in our docs. +- (v1.3.30) When creating an external stream, you can set sasl_mechanism to SCRAM-SHA-512, SCRAM-SHA-256, or PLAIN (default value). Learn more with [examples](/kafka-source) in our docs. _Timeplus Cloud:_ @@ -185,7 +185,7 @@ _Timeplus Cloud:_ _Proton:_ - Check out new examples in the [proton/examples](https://github.com/timeplus-io/proton/tree/develop/examples) folder: [CDC](https://github.com/timeplus-io/proton/tree/develop/examples/cdc), [awesome-sensor-logger](https://github.com/timeplus-io/proton/tree/develop/examples/awesome-sensor-logger), and [fraud detection](https://github.com/timeplus-io/proton/tree/develop/examples/fraud_detection) -- (v1.3.29) Introduced new SQL commands for [managing format schemas](/proton-format-schema) (for now, only Protobuf schemas are supported). +- (v1.3.29) Introduced new SQL commands for [managing format schemas](/timeplus-format-schema) (for now, only Protobuf schemas are supported). - (v1.3.28) For `create random stream`, the default interval_time is now 5 milliseconds, instead of 100 milliseconds. This new default value will generate random data more evenly. - (v1.3.28) Function names are no longer case sensitive. You can use count(), COUNT(), or Count(). This improves the compatibility for Proton with 3rd party tools if they generate SQL statements in uppercase. - (v1.3.27) Random stream supports ipv4 and ipv6 data type. @@ -226,7 +226,7 @@ _Proton:_ - Proton JDBC driver is now available via [Maven](https://central.sonatype.com/artifact/com.timeplus/proton-jdbc). - You can now connect Proton to [Pulse](https://www.timestored.com/pulse/) for OHLC charts. - New functions added: [untuple](/functions_for_comp#untuple), [tuple_element](/functions_for_comp#tuple_element), [columns](/functions_for_comp#columns), [apply](/functions_for_comp#apply), [any](/functions_for_agg#any), and [last_value](/functions_for_agg#last_value). -- You can now create an external stream with multiple columns while reading Kafka. [Learn more](/proton-kafka#multi_col_read) +- You can now create an external stream with multiple columns while reading Kafka. 
[Learn more](/kafka-source) _Timeplus Cloud:_ diff --git a/docs/v2-release-notes.md b/docs/v2-release-notes.md index 807a77930..f39f3118c 100644 --- a/docs/v2-release-notes.md +++ b/docs/v2-release-notes.md @@ -22,8 +22,8 @@ This page summarizes changes for Timeplus Enterprise and Timeplus Proton, on a b ### Timeplus Enterprise v2.8 GA * 2.8.1 is the first GA version of [Timeplus Enterprise v2.8](/enterprise-v2.8), with the key features: - * New Compute Node server role to [run materialized views elastically](/view#autoscaling_mv) with checkpoints on S3 storage. - * Timeplus can read or write data in Apache Iceberg tables. [Learn more](/iceberg) + * New Compute Node server role to [run materialized views elastically](/materialized-view#autoscaling_mv) with checkpoints on S3 storage. + * Timeplus can read or write data in Apache Iceberg tables. [Learn more](/iceberg-source) * Timeplus can read or write PostgreSQL tables directly via [PostgreSQL External Table](/pg-external-table) or look up data via [dictionaries](/sql-create-dictionary#source_pg). * Use S3 as the [tiered storage](/tiered-storage) for streams. * New SQL command to [rename streams](/sql-rename-stream) or [columns](/sql-alter-stream#rename-column). @@ -88,8 +88,8 @@ You can add connection to Timeplus Proton or Timeplus Enterprise in [marimo](/ma ### Timeplus Enterprise v2.8 (Preview) [Timeplus Enterprise v2.8.0](/enterprise-v2.8) is now available as a technical preview for the 2.8 release. Not ready for production use but feel free to try the new features and provide feedback. -* New Compute Node server role to [run materialized views elastically](/view#autoscaling_mv) with checkpoints on S3 storage. -* Timeplus can read or write data in Apache Iceberg tables. [Learn more](/iceberg) +* New Compute Node server role to [run materialized views elastically](/materialized-view#autoscaling_mv) with checkpoints on S3 storage. +* Timeplus can read or write data in Apache Iceberg tables. [Learn more](/iceberg-source) * Timeplus can read or write PostgreSQL tables directly via [PostgreSQL External Table](/pg-external-table) or look up data via [dictionaries](/sql-create-dictionary#source_pg). * Use S3 as the [tiered storage](/tiered-storage) for streams. * New SQL command to [rename streams](/sql-rename-stream). @@ -187,13 +187,13 @@ The new version of Grafana plugin improved the batching strategy to render resul [Timeplus Enterprise v2.6](/enterprise-v2.6) is now Generally Available! Key breakthroughs: * **Revolutionary hybrid hash table technology.** For streaming SQL with JOINs or aggregations, by default a memory based hash table is used. This is helpful for preventing the memory limits from being exceeded for large data streams with hundreds of GB of data. You can adjust the query setting to apply the new hybrid hash table, which uses both the memory and the local disk to store the internal state as a hash table. * **Enhanced operational visibility.** Gain complete transparency into your system's performance through comprehensive monitoring of materialized views and streams. Track state changes, errors, and throughput metrics via [system.stream_state_log](/system-stream-state-log) and [system.stream_metric_log](/system-stream-metric-log). -* **Advanced cross-deployment integration.** Seamlessly write data to remote Timeplus deployments by configuring [Timeplus external stream](/timeplus-external-stream) as targets in materialized views. 
+* **Advanced cross-deployment integration.** Seamlessly write data to remote Timeplus deployments by configuring [Timeplus external stream](/timeplus-source) as targets in materialized views. * **Improved data management capabilities.** Add new columns to an existing stream. Truncate historical data for streams. Create new databases to organize your streams and materialized views. * **Optimized ClickHouse integration.** Significant performance improvements for read/write operations with ClickHouse external tables. * **Enhanced user experience.** New UI wizards for Coinbase data sources and Apache Pulsar external streams, alongside a redesigned SQL Console and SQL Helper interface for improved usability. Quick access to streams, dashboards, and common actions via Command+K (Mac) or Windows+K (PC) keyboard shortcuts. ### Timeplus Proton v1.6.9 -* Timeplus external stream is now available in Timeplus Proton. You can read or write data across Timeplus deployments. [Learn more](/timeplus-external-stream). +* Timeplus external stream is now available in Timeplus Proton. You can read or write data across Timeplus deployments. [Learn more](/timeplus-source). ### Timeplus Grafana plugin v2.1.0 The new version of Grafana plugin supports query variables and annotations. The SQL editor is enlarged for better readability and editing experience. @@ -202,7 +202,7 @@ The new version of Grafana plugin supports query variables and annotations. The Happy New Year 🎉 ### Timeplus Proton v1.6.8 -* Pulsar external stream is now available in Timeplus Proton. You can use Pulsar external stream to query or process data in Pulsar with SQL. [Learn more](/pulsar-external-stream). +* Pulsar external stream is now available in Timeplus Proton. You can use Pulsar external stream to query or process data in Pulsar with SQL. [Learn more](/pulsar-source). ## Dec 23, 2024 @@ -220,10 +220,10 @@ Merry Christmas 🎄 ### Timeplus Enterprise v2.5 * [Timeplus Enterprise v2.5](/enterprise-v2.5) is now Generally Available! This milestone marks a significant leap forward for our Timeplus Enterprise v2 which was released earlier this year. In this release, we pushed our unparalleled performance to a new level, natively integrated with Redpanda Connect and Apache Pulsar to access a rich ecosystem of enterprise and AI applications. Key breakthroughs: - * [Materialized Views Auto-Rebalancing](/view#auto-balancing) + * [Materialized Views Auto-Rebalancing](/materialized-view#auto-balancing) * Performance Improvements * Enterprise-Grade Real-Time Data Integration with [200+ Connectors from Redpanda Connect](/redpanda-connect) - * [Pulsar External Stream](pulsar-external-stream) to query or process data in Pulsar with SQL + * [Pulsar External Stream](pulsar-source) to query or process data in Pulsar with SQL ### Timeplus Proton v1.6.4 * Support more Linux distributions by lowering the required version for GLIBC. For AMD64 chips,the minimal version is 2.2.5, and for ARM64 chips, the minimal version is 2.17. @@ -255,8 +255,8 @@ Merry Christmas 🎄 ### Documentation Updates * Updated the guide of [Ingest REST API](/ingest-api) to add more instructions for self-hosting Timeplus Enterprise. API keys are only available in Timeplus Cloud. For self-hosting deployment, please encode the username and password with base64, and set it in the HTTP Authorization header. -* Updated the structure of [Kafka external stream](/proton-kafka). Mentioned `RawBLOB` as a supported data format. 
-* Added documentation for materialized view [load balancing](/view#memory_weight).
+* Updated the structure of [Kafka external stream](/kafka-source). Mentioned `RawBLOB` as a supported data format.
+* Added documentation for materialized view [load balancing](/materialized-view#memory_weight).
 
 ## Oct 28, 2024
 
@@ -305,7 +305,7 @@ Compared to v1.4.33 in Timeplus Enterprise self-hosted edition, the key enhancem
 * Refinements to error and warning messages, including avoid showing two messages for the same resource.
 * Moved the 'Send as Sink' button to the 'Save As' dropdown in the SQL Console.
 * Able to render large numbers such as int256 or uint256.
-* Wizard UI to create [Timeplus External Streams](/timeplus-external-stream).
+* Wizard UI to create [Timeplus External Streams](/timeplus-source).
 * Wizard UI to create [Mutable Streams](/mutable-stream).
 * Fix the issue where scrollbar is too thin.
 * In SQL Console, you can write multiple SQL statements and select one to run the statement.
@@ -356,7 +356,7 @@ We are actively working on the refinement to support latest Timeplus core engine
 ### Timeplus Enterprise v2.4.18
 * Timeplus Enterprise [v2.4.17](/enterprise-v2.4#2_4_17) and [v2.4.19](/enterprise-v2.4#2_4_19) are released.
 * The key changes are:
-  * support running [table function](/functions_for_streaming#table) on [Timeplus External Stream](/timeplus-external-stream)
+  * support running [table function](/functions_for_streaming#table) on [Timeplus External Stream](/timeplus-source)
   * better track memory usage in macOS and Docker container.
   * allow you to [drop streams](/sql-drop-stream#force_drop_big_stream) with `force_drop_big_stream=true`
   * use username:password for ingest API wizard
@@ -375,7 +375,7 @@ We are actively working on the refinement to support latest Timeplus core engine
 * Stream “mode” is renamed to stream “type” in the web console UI.
 
 ### Timeplus Proton v1.5.15
- * Timeplus Proton v1.5.15 is released, allowing Timeplus Enterprise v2.4 to read or write via external streams. [Learn more](/timeplus-external-stream)
+ * Timeplus Proton v1.5.15 is released, allowing Timeplus Enterprise v2.4 to read or write via external streams. [Learn more](/timeplus-source)
 
 ### Timeplus Native JDBC v2.0.4
 * Bug fix: For low_cardinality(nullable), nullable(uuid), map(low_cardinality) and tuple(low_cardinality)
diff --git a/docs/view.md b/docs/view.md
index 311f83aa4..bed88b25e 100644
--- a/docs/view.md
+++ b/docs/view.md
@@ -1,13 +1,8 @@
-# View and Materialized View
-Real-time data pipelines are built via [Materialized Views](#m_view) in Timeplus.
+# View
 
-There are two types of views in Timeplus: logical view (or just view) and materialized view.
+Like in a regular database, a view in Timeplus is a logical definition of a virtual table: it doesn't store any data and doesn't run by itself. A view is bound to a SQL statement and serves as a reusable component that other views, queries, or Materialized Views can use.
 
-## View
-
-If you have a common query you'll run multiple times a week or month, you can save it as a "Bookmark" in web console. While you can load the bookmarked query in the query editor without typing it again, you can't refer to bookmarked queries in a SQL query. This is where views come in.
-
-### Create or Drop Views
+## Create or Drop Views
 
 You can create views for all kinds of queries, and refer to the views in other queries. Creating a view won't trigger any query execution. Views are evaluated only when other queries refer to it.
@@ -38,7 +33,8 @@ CREATE VIEW view2 AS SELECT * FROM table(my_stream) ``` Then each time you run `SELECT count(*) FROM view2` will return the current row number of the my_stream immediately without waiting for the future events. -### Parameterized Views +### Parameterized View + Starting from Timeplus Enterprise 2.9, you can create views with parameters. For example: ```sql -- create a parameterized view with one int8 parameter @@ -48,135 +44,3 @@ select * from github_events limit {limit:int8}; -- run a SQL with the view and the parameter value select * from github_param_view(limit=2); ``` - -## Materialized view {#m_view} - -The difference between a materialized view and a regular view is that the materialized view is running in background after creation and the resulting stream is physically written to internal storage (hence it's called materialized). - -Once the materialized view is created, Timeplus will run the query in the background continuously and incrementally emit the calculated results according to the semantics of its underlying streaming select. - -### Create a Materialized View - -```sql -CREATE MATERIALIZED VIEW [IF NOT EXISTS] -[INTO ] -AS -``` - -### Use Materialized Views - -There are different ways to use the materialized views in Timeplus: - -1. Streaming mode: `SELECT * FROM materialized_view` Get the result for future data. This works in the same way as views. -2. Historical mode: `SELECT * FROM table(materialized_view)` Get all past results for the materialized view. -3. Historical + streaming mode: `SELECT * FROM materialized_view WHERE _tp_time>='1970-01-01'` Get all past results and as well as the future data. -4. Pre-aggregation mode: `SELECT * FROM table(materialized_view) where _tp_time in (SELECT max(_tp_time) as m from table(materialized_view))` This immediately returns the most recent query result. If `_tp_time` is not available in the materialized view, or the latest aggregation can produce events with different `_tp_time`, you can add the `emit_version()` to the materialized view to assign a unique ID for each emit and pick up the events with largest `emit_version()`. For example: - -```sql -create materialized view mv as -select emit_version() as version, window_start as time, count() as n, max(speed_kmh) as h from tumble(car_live_data,10s) -group by window_start, window_end; - -select * from table(mv) where version in (select max(version) from table(mv)); -``` - -You build data pipelines in Timeplus using materialized views. - - -### Load Balancing - -It's common to define many materialized views in Timeplus for various computation and analysis. Some materialized views can be memory-consuming or cpu-consuming. - -In Timeplus Enterprise cluster mode, you can schedule the materialized views in a proper way to ensure each node gets similar workload. - -#### Manual Load Balancing {#memory_weight} - -Starting from [Timeplus Enterprise v2.3](/enterprise-v2.3), when you create a materialized view with DDL SQL, you can add an optional `memory_weight` setting for those memory-consuming materialized views, e.g. -```sql -CREATE MATERIALIZED VIEW my_mv -SETTINGS memory_weight = 10 -AS SELECT .. -``` - -When `memory_weight` is not set, by default the value is 0. When Timeplus Enterprise server starts, the system will list all materialized views, ordered by the memory weight and view names, and schedule them in the proper node. - -For example, in a 3-node cluster, you define 10 materialized views with names: mv1, mv2, .., mv9, mv10. 
If you create the first 6 materialized views with `SETTINGS memory_weight = 10`, then node1 will run mv1 and mv4; node2 will run mv2 and mv5; node3 will run mv3 and mv6; Other materialized views(mv7 to mv10) will be randomly scheduled on any nodes. - -It's recommended that each node in the Timeplus Enterprise cluster shares the same hardware specifications. For those resource-consuming materialized views, it's recommended to set the same `memory_weight`, such as 10, to get the expected behaviors to be dispatched to the proper nodes for load-balancing. - -#### Auto Load Balancing {#auto-balancing} - -Starting from [Timeplus Enterprise v2.5](/enterprise-v2.5), you can also apply auto-load-balancing for memory-consuming materialized views in Timeplus Enterprise cluster. By default, this feature is enabled and there are 3 settings at the cluster level: - -```yaml -workload_rebalance_check_interval: 30s -workload_rebalance_overloaded_memory_util_threshold: 50% -workload_rebalance_heavy_mv_memory_util_threshold: 10% -``` - -As the administrator, you no longer need to determine which materialized views need to set a `memory_weight` setting. In a cluster, Timeplus will monitor the memory consumption for each materialized view. Every 30 seconds, configurable via `workload_rebalance_check_interval`, the system will check whether there are any node with memory over 50% full. If so, check whether there is any materialized view in such node consuming 10% or more memory. When those conditions are all met, rescheduling those materialized views to less busy nodes. During the rescheduling, the materialized view on the previous node will be paused and its checkpoint will be transferred to the target node, then the materialized view on target node will resume the streaming SQL based on the checkpoint. - -### Auto-Scaling Materialized Views {#autoscaling_mv} -Starting from [Timeplus Enterprise v2.8](/enterprise-v2.8), materialized views can be configured to run on elastic compute nodes. This can reduce TCO (Total Cost of Ownership), by enabling high concurrent materialized views scheduling, auto scale-out and scale-in according to workload. - -To enable this feature, you need to -1. create a S3 disk in the `s3_plain` type. -2. create a materialized view by setting the checkpoint storage to `s3` and use the s3 disk. -3. enable compute nodes in the cluster, with optional autoscaling based on your cloud or on-prem infrastructure. - -For example: -```sql ---S3 based checkpoint -CREATE DISK ckpt_s3_disk disk( - type = 's3_plain', - endpoint = 'https://mat-view-ckpt.s3.us-west-2.amazonaws.com/matv_ckpt/', - access_key_id = '...', - secret_access_key = '...'); - -CREATE MATERIALIZED VIEW mat_v_scale INTO clickhouse_table -AS SELECT … -SETTINGS -checkpoint_settings=’storage_type=s3;disk_name=ckpt_s3_disk;async=true;interval=5’; -``` - -### Drop Materialized Views - -Run the following SQL to drop a view or a materialized view. - -```sql -DROP VIEW [IF EXISTS] db.; -``` - -Like [CREATE STREAM](/sql-create-stream), stream deletion is an async process. - -### Best Practices - -* It's recommended to specify [a target stream](#target-stream) when creating a materialized view, no matter a stream in Timeplus, an external stream to Apache Kafka, Apache Pulsar, or external tables to ClickHouse, S3, Iceberg, etc. 
diff --git a/docs/why-timeplus.md b/docs/why-timeplus.md index 28deabfdc..e30cf1c40 100644 --- a/docs/why-timeplus.md +++ b/docs/why-timeplus.md @@ -39,9 +39,9 @@ Timeplus scales easily from edge devices to multi-node clusters, and with its Ap ### Multi-JOINs and ASOF JOINs {#join} -Stream processing involves combining multiple data sources, and [MULTI-JOINs](/joins) are essential for enriching and correlating events in streaming queries. Timeplus allows you to run ad-hoc historical queries on the same data, reducing the need for denormalization in downstream data warehouses. +Stream processing involves combining multiple data sources, and [MULTI-JOINs](/streaming-joins) are essential for enriching and correlating events in streaming queries. Timeplus allows you to run ad-hoc historical queries on the same data, reducing the need for denormalization in downstream data warehouses. -In many cases, Business Intelligence and analytical queries can be executed directly in Timeplus, eliminating the need for a separate data warehouse. [ASOF JOINs](/joins) enable approximate time-based lookups for comparing recent versus historical data. +In many cases, Business Intelligence and analytical queries can be executed directly in Timeplus, eliminating the need for a separate data warehouse. [ASOF JOINs](/streaming-joins) enable approximate time-based lookups for comparing recent versus historical data. ### Python and JavaScript UDF {#udf} @@ -51,7 +51,7 @@ With Python UDFs, this opens up the possibility to bring in pre-existing and pop ### External Stream, External Table {#external} -We want to simplify the experience of joining data from Apache Kafka and writing results out to data warehouses such as Clickhouse, or another Timeplus instance. Timeplus implements native integration to these systems in timeplusd via EXTERNAL STREAM (with [Kafka](/proton-kafka) and [Timeplus](/timeplus-external-stream)) and [EXTERNAL TABLE (with ClickHouse)](/proton-clickhouse-external-table). No need for deploying yet another Connector component. +We want to simplify the experience of joining data from Apache Kafka and writing results out to data warehouses such as Clickhouse, or another Timeplus instance. Timeplus implements native integration to these systems in timeplusd via EXTERNAL STREAM (with [Kafka](/kafka-source) and [Timeplus](/timeplus-source)) and [EXTERNAL TABLE (with ClickHouse)](/clickhouse-external-table). No need for deploying yet another Connector component. We understand that we cannot do this for all systems and for that, we have Timeplus Connector which can be configured to integrate with hundreds of other systems if needed. diff --git a/docs/working-with-streams.md b/docs/working-with-streams.md index 30599a6ff..def520f85 100644 --- a/docs/working-with-streams.md +++ b/docs/working-with-streams.md @@ -41,12 +41,12 @@ When users [create a stream](/sql-create-stream) in Timeplus, they can specify t The **streaming store** is essentially the **Write-Ahead Log** (internally called `NativeLog`). It supports: -* High-concurrency [data ingestion](/ingestion) -* [Incremental stream processing](/stream-query) +* High-concurrency [data ingestion](/connect-data-in) +* [Incremental stream processing](/streaming-query) * Real-time data replication For more information, refer to the [high-level architecture](/architecture) page. -The **historical store** asynchronously derives its data from the WAL through a dedicated background thread. 
It performs periodic **compaction**, **merge**, and **compression**, making it highly efficient for [historical analytic queries](/history) and **streaming backfills**. +The **historical store** asynchronously derives its data from the WAL through a dedicated background thread. It performs periodic **compaction**, **merge**, and **compression**, making it highly efficient for [historical analytic queries](/historical-query) and **streaming backfills**. To learn more about stream lifecycle operations (Create, Read, Delete, Update) and advanced configurations like **TTL**, **key versioning**, and other stream settings, refer to the SQL Reference documentation. To learning more about external streams, refer to [external stream](/external-stream) pages for more details. diff --git a/docusaurus.config.js b/docusaurus.config.js index 790bad791..421d0793b 100644 --- a/docusaurus.config.js +++ b/docusaurus.config.js @@ -44,11 +44,74 @@ const config = { from: '/faq', to: '/howtos', }, + { + from: '/proton-kafka', + to: '/kafka-source', + }, + { + from: '/pulsar-external-stream', + to: '/pulsar-source', + }, + { + from: '/proton-schema-registry', + to: '/kafka-schema-registry', + }, + { + from: '/proton-format-schema', + to: '/timeplus-format-schema', + }, + { + from: '/proton-clickhouse-external-table', + to: '/clickhouse-external-table', + }, + { + from: '/mongo-external', + to: '/mongo-external-table', + }, + { + from: '/s3-external', + to: '/s3-source', + }, + { + from: '/iceberg', + to: '/iceberg-source', + }, + { + from: '/http-external', + to: '/http-external-stream', + }, + { + from: '/timeplus-external-stream', + to: '/timeplus-source', + }, + { + from: '/stream-query', + to: '/streaming-query', + }, + { + from: '/history', + to: '/historical-query', + }, + { + from: '/joins', + to: '/streaming-joins', + }, + { + from: '/ingest', + to: '/connect-data-in', + }, + { + from: '/destination', + to: '/send-data-out', + }, + { + from: '/compare', + to: '/proton-oss-vs-enterprise', + }, ], }, ], ], - presets: [ [ "classic", @@ -143,19 +206,19 @@ const config = { { type: "doc", position: "left", - docId: "ingestion", - label: "Get Data In", + docId: "connect-data-in", + label: "Connect Data In", }, { type: "doc", position: "left", docId: "query-syntax", - label: "Query Data", + label: "Process Data", }, { type: "doc", position: "left", - docId: "destination", + docId: "send-data-out", label: "Send Data Out", }, { diff --git a/sidebars.js b/sidebars.js index 80e7ce965..f9d98eb61 100644 --- a/sidebars.js +++ b/sidebars.js @@ -95,193 +95,286 @@ const sidebars = { { type: "doc", id: "howtos", - label: "How To", + label: "How Tos", }, ] }, { type: "category", - label: "Core Features", - // customProps: { tag: "Popular" }, + label: "Connect Data In", items: [ + { + type: "doc", + label: "Overview", + id: "connect-data-in", + }, + // { + // type: "category", + // label: "Native Client", + // link: { + // type: "doc", + // id: "native-client", + // }, + // items: [ + // { + // type: "doc", + // id: "idempotent", + // }, + // ], + // }, { type: "category", - label: "Streams", + label: "Apache Kafka", link: { type: "doc", - id: "working-with-streams", + id: "kafka-source", }, - items: [ - "append-stream", - "versioned-stream", - "changelog-stream", - { - type: "doc", - id: "mutable-stream", - customProps: { tag: "Enterprise" }, - }, - { - label: "Random Stream", - type: "link", - href: "https://docs.timeplus.com/sql-create-random-stream", - }, - ], + items: ["kafka-schema-registry", "timeplus-format-schema"], 
+ }, + { + type: "doc", + label: "Apache Pulsar", + id: "pulsar-source", + }, + { + type: "doc", + label: "MySQL", + id: "mysql-external-table", + }, + { + type: "doc", + label: "Apache Iceberg", + id: "iceberg-source", + }, + { + type: "doc", + label: "MongoDB", + id: "mongo-external-table", + }, + { + type: "doc", + label: "PostgreSQL", + id: "pg-external-table", + }, + { + type: "doc", + label: "S3", + id: "s3-source", + }, + { + type: "doc", + label: "Log Stream", + id: "log-stream", + }, + { + type: "doc", + label: "Remote Timeplus", + id: "timeplus-source", + }, + { + type: "link", + label: "Timeplus Random Stream", + href: "https://docs.timeplus.com/sql-create-random-stream", + }, + ] + }, + { + type: "category", + label: "Process Data", + items: [ + { + type: "doc", + id: "view", + label: "View", }, { type: "category", - label: "Materialized Views", + label: "Materialized View", link: { type: "doc", - id: "view", + id: "materialized-view", }, items: ["checkpoint-settings"], }, + { + type: "doc", + id: "streaming-query", + label: "Streaming Query" + }, + { + type: "doc", + id: "historical-query", + label: "Historical Query" + }, + { + type: "doc", + id: "streaming-joins", + label: "Streaming Joins" + }, + { + type: "doc", + id: "streaming-windows", + label: "Streaming Windows" + }, + { + type: "doc", + id: "streaming-aggregations", + label: "Streaming Aggregations" + }, + { + type: "doc", + id: "jit", + label: "Just-In-Time Compilation" + }, + ] + }, + { + type: "category", + label: "Store & Serve Data", + items: [ { type: "category", - label: "Data Ingestion", + label: "Streams", link: { type: "doc", - id: "ingestion", + id: "working-with-streams", }, items: [ { type: "doc", - id: "idempotent", - customProps: { tag: "Enterprise" }, + id: "append-stream", }, - ], - }, - "destination", - { - type: "category", - label: "External Streams & Tables", - // link: { - // type: "generated-index", - // title: "SQL Commands", - // description: "Overview of the SQL commands supported by Timeplus.", - // slug: "/category/commands", - // keywords: ["guides"], - // }, - items: [ { - type: "category", - label: "External Streams", - link: { - type: "doc", - id: "external-stream", - }, - items: [ - { - type: "category", - label: "Apache Kafka", - link: { - type: "doc", - id: "proton-kafka", - }, - items: ["proton-schema-registry", "proton-format-schema"], - }, - { - type: "doc", - id: "pulsar-external-stream", - label: "Apache Pulsar", - }, - { - type: "doc", - id: "timeplus-external-stream", - label: "Remote Timeplus", - }, - { - type: "doc", - id: "http-external", - label: "HTTP Write", - customProps: { tag: "Enterprise" }, - }, - "log-stream", - ], + type: "doc", + id: "versioned-stream", }, { - type: "category", - label: "External Tables", - items: [ - { - type: "doc", - id: "proton-clickhouse-external-table", - label: "ClickHouse", - }, - { - type: "doc", - id: "mysql-external-table", - label: "MySQL", - customProps: { tag: "Enterprise" }, - }, - { - type: "doc", - id: "pg-external-table", - label: "PostgreSQL", - customProps: { tag: "Enterprise" }, - }, - { - type: "doc", - id: "mongo-external", - label: "MongoDB", - customProps: { tag: "Enterprise" }, - }, - { - type: "doc", - label: "Amazon S3", - id: "s3-external", - customProps: { tag: "Enterprise" }, - }, - { - type: "doc", - id: "iceberg", - label: "Apache Iceberg", - customProps: { tag: "Enterprise" }, - }, - ], + type: "doc", + id: "changelog-stream", + }, + { + type: "doc", + id: "mutable-stream", + customProps: { tag: "Enterprise" }, 
+          },
+          {
+            type: "doc",
+            id: "tiered-storage",
           },
         ],
       },
-      // "stream-query",
-      // "history",
       {
         label: "Dictionary",
         type: "link",
         href: "https://docs.timeplus.com/sql-create-dictionary",
+      },
+      {
+        type: "doc",
+        id: "viz",
         customProps: { tag: "Enterprise" },
       },
+    ]
+  },
+  {
+    type: "category",
+    label: "Send Data Out",
+    items: [
       {
-        type: "category",
-        label: "Stream Processing",
-        items: [
-          "stream-query",
-          "history",
-          "joins",
-          "streaming-windows",
-          "streaming-aggregations",
-          {
-            type: "doc",
-            id: "jit",
-            customProps: { tag: "Enterprise" },
-          },
-        ],
+        type: "doc",
+        label: "Overview",
+        id: "send-data-out",
       },
+      // {
+      //   type: "doc",
+      //   label: "Native Client",
+      //   id: "native-client",
+      // },
       {
         type: "doc",
-        id: "alert",
-        customProps: { tag: "Enterprise" },
+        label: "Apache Kafka",
+        id: "kafka-sink",
       },
       {
         type: "doc",
-        id: "tiered-storage",
-        customProps: { tag: "Enterprise" },
+        label: "Apache Pulsar",
+        id: "pulsar-sink",
       },
       {
         type: "doc",
-        id: "viz",
+        label: "Apache Iceberg",
+        id: "iceberg-sink",
+      },
+      {
+        type: "doc",
+        label: "ClickHouse",
+        id: "clickhouse-external-table",
+      },
+      {
+        type: "doc",
+        label: "S3",
+        id: "s3-sink",
+      },
+      {
+        type: "doc",
+        label: "HTTP",
+        id: "http-external-stream",
+      },
+      {
+        type: "doc",
+        label: "Datadog",
+        id: "datadog-external",
+      },
+      {
+        type: "doc",
+        label: "Splunk",
+        id: "splunk-external",
+      },
+      {
+        type: "doc",
+        label: "Elasticsearch",
+        id: "elastic-external",
+      },
+      {
+        type: "doc",
+        label: "BigQuery",
+        id: "bigquery-external",
+      },
+      {
+        type: "doc",
+        label: "Databricks",
+        id: "databricks-external",
+      },
+      // {
+      //   type: "doc",
+      //   label: "MySQL",
+      //   id: "mysql-external-table",
+      // },
+      // {
+      //   type: "doc",
+      //   label: "PostgreSQL",
+      //   id: "pg-external-table",
+      // },
+      // {
+      //   type: "doc",
+      //   label: "MongoDB",
+      //   id: "mongo-external-table",
+      // },
+      {
+        type: "doc",
+        label: "Remote Timeplus",
+        id: "timeplus-sink",
+      },
+      {
+        type: "doc",
+        label: "Slack",
+        id: "slack-external",
+      },
+      {
+        type: "doc",
+        id: "alert",
         customProps: { tag: "Enterprise" },
       },
-    ],
+    ]
   },
   {
     type: "category",
@@ -422,8 +515,8 @@
       items: [
         {
           type: "doc",
-          label: "vs. Timeplus Enterprise",
-          id: "compare",
+          label: "Proton OSS vs Enterprise",
+          id: "proton-oss-vs-enterprise",
         },
         "proton-faq",
       ],