diff --git a/README.md b/README.md
index a34992f0..833f8883 100644
--- a/README.md
+++ b/README.md
@@ -220,6 +220,14 @@ nebula-storaged-1 1/1 Running 0 15m
 nebula-storaged-2 1/1 Running 0 19s
 ```
 
+## Guidelines
+
+[Custom config](doc/user/custom_config.md)
+
+[Log guide](doc/user/log_guide.md)
+
+[Storage guide](doc/user/storage_guide.md)
+
 ## Compatibility matrix
 
 Nebula Operator <-> NebulaGraph
diff --git a/doc/user/balance.md b/doc/user/balance.md
deleted file mode 100644
index e8d6b287..00000000
--- a/doc/user/balance.md
+++ /dev/null
@@ -1,13 +0,0 @@
-## Scale storage nodes and Balance
-
-Scaling out Storage is divided into two stages.
-
-* In the first stage, you need to wait for the status of all newly created Pods to be Ready.
-
-* In the second stage, the BALANCE DATA and BALANCE LEADER command is executed.
-
-We provide a parameter `enableAutoBalance` in CRD to control whether to automatically balance data and leader.
-
-Through both stages, the scaling process of the controller replicas is decoupled from the balancing data process and user executing it at low traffic.
-
-Such an implementation can effectively reduce the impact of data migration on online services, which is in line with the NebulaGraph principle: Balancing data is not fully automated and when to balance data is decided by users.
\ No newline at end of file
diff --git a/doc/user/custom_config.md b/doc/user/custom_config.md
index a47d121c..00cc2fe4 100644
--- a/doc/user/custom_config.md
+++ b/doc/user/custom_config.md
@@ -1,12 +1,14 @@
-# configure custom parameter
+# Configure custom flags
 
-For each component has a configuration entry, it defines in crd as config which is a map structure, it will be loaded by configmap.
+### Apply custom flags
+
+Each component has a configuration entry, defined in the CRD as `config`, a map structure that is loaded into a ConfigMap.
 
 ```go
 // Config defines a graphd configuration load into ConfigMap
 Config map[string]string `json:"config,omitempty"`
 ```
 
-The following example will show you how to make configuration chagnes in CRD, i.e for any given options `--foo=bar` in conf files, `.config.foo` could be applied like:
+The following example shows how to make configuration changes in the CRD: for any given option `--foo=bar` in the conf files, `.config.foo` can be applied like this:
 
 ```yaml
@@ -30,7 +32,7 @@ spec:
     resources:
       requests:
         storage: 2Gi
-      storageClassName: gp2
+      storageClassName: ebs-sc
   config:
     "enable_authorize": "true"
     "auth_type": "password"
@@ -38,4 +40,41 @@ spec:
     "foo": "bar"
   ...
 ```
-Afterwards, the custom parameters _enable_authorize_, _auth_type_ and _foo_ will be configured and overwritten by configmap.
+Afterwards, the custom flags _enable_authorize_, _auth_type_ and _foo_ will be set and applied through the ConfigMap.
+
+### Dynamic runtime flags
+
+The table below lists the dynamic runtime flags. If all the flags in the `config` field are in this table,
+applying an update will not trigger a rolling update of the Pods, and the new values take effect at runtime.
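+
+For example, the following update (a minimal sketch with arbitrary example values; only the graphd `config` section matters here) touches only
+flags from the table, so applying it changes the values in place without restarting the graphd Pods:
+
+```yaml
+apiVersion: apps.nebula-graph.io/v1alpha1
+kind: NebulaCluster
+metadata:
+  name: nebula
+spec:
+  graphd:
+    config:
+      # Both flags appear in the dynamic runtime flags table, so no rolling update is triggered.
+      "v": "2"
+      "session_reclaim_interval_secs": "30"
+```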
+
+
+| Flag | Description | Default |
+|:---|:---|:---|
+| `minloglevel` | Minimum log level: 0, 1, 2, 3 for INFO, WARNING, ERROR, FATAL respectively | `0` |
+| `v` | Verbose log level: 1, 2, 3, or 4; the higher the level, the more verbose the logging | `0` |
+| `accept_partial_success` | Only used for read-only access; modify access always treats partial success as an error | `false` |
+| `session_reclaim_interval_secs` | Interval, in seconds, at which expired sessions are reclaimed | `60` |
+| `max_allowed_query_size` | Maximum statement length, in bytes | `4194304` |
+| `system_memory_high_watermark_ratio` | System memory high watermark ratio; memory checking is disabled when the ratio is greater than 1.0 | `0.8` |
+| `ng_black_box_file_lifetime_seconds` | Lifetime, in seconds, of black box log files | `1800` |
+| `memory_tracker_limit_ratio` | Trackable memory ratio (trackable_memory / (total_memory - untracked_reserved_memory)) | `0.8` |
+| `memory_tracker_untracked_reserved_memory_mb` | Untracked reserved memory, in MiB | `50` |
+| `memory_tracker_detail_log` | Whether to log memory tracker stats periodically | `false` |
+| `memory_tracker_detail_log_interval_ms` | Interval, in milliseconds, at which memory tracker stats are logged | `60000` |
+| `memory_purge_enabled` | Whether to enable background memory purge (if jemalloc is used) | `true` |
+| `memory_purge_interval_seconds` | Background memory purge interval, in seconds | `10` |
+| `heartbeat_interval_secs` | Heartbeat interval, in seconds | `10` |
+| `raft_heartbeat_interval_secs` | Raft election timeout, in seconds | `30` |
+| `raft_rpc_timeout_ms` | RPC timeout for the Raft client, in milliseconds | `500` |
+| `query_concurrently` | Whether to execute queries in multiple threads | `true` |
+| `wal_ttl` | Lifetime, in seconds, of Raft WAL files before they are recycled | `14400` |
+| `auto_remove_invalid_space` | Whether to remove outdated space data | `true` |
+| `num_io_threads` | Number of network I/O threads | `16` |
+| `num_worker_threads` | Number of worker threads that handle requests | `32` |
+| `max_concurrent_subtasks` | Maximum number of subtasks that admin jobs run concurrently | `10` |
+| `snapshot_part_rate_limit` | Rate limit, in bytes per second, when the leader synchronizes snapshot data | `10485760` |
+| `snapshot_batch_size` | Amount of data, in bytes, sent in each batch when the leader synchronizes snapshot data | `1048576` |
+| `rebuild_index_part_rate_limit` | Rate limit, in bytes per second, when the leader synchronizes index rebuilding | `4194304` |
+| `rocksdb_db_options` | RocksDB DBOptions in JSON; each option name and value is a string, given as "option_name":"option_value" and separated by commas | `{}` |
+| `rocksdb_column_family_options` | RocksDB ColumnFamilyOptions in JSON; each option name and value is a string, given as "option_name":"option_value" and separated by commas | `{"write_buffer_size":"67108864","max_write_buffer_number":"4","max_bytes_for_level_base":"268435456"}` |
+| `rocksdb_block_based_table_options` | RocksDB BlockBasedTableOptions in JSON; each option name and value is a string, given as "option_name":"option_value" and separated by commas | `{"block_size":"8192"}` |
\ No newline at end of file
diff --git a/doc/user/log_guide.md b/doc/user/log_guide.md
new file mode 100644
index 00000000..11ae21e5
--- /dev/null
+++ b/doc/user/log_guide.md
@@ -0,0 +1,104 @@
+### Log rotation
+
+We use a sidecar container to
clean NebulaGraph logs and run log archiving tasks every hour.
+
+```yaml
+apiVersion: apps.nebula-graph.io/v1alpha1
+kind: NebulaCluster
+metadata:
+  name: nebula
+spec:
+  graphd:
+    config:
+      # Whether log file names contain a timestamp.
+      "timestamp_in_logfile_name": "false"
+  metad:
+    config:
+      "timestamp_in_logfile_name": "false"
+  storaged:
+    config:
+      "timestamp_in_logfile_name": "false"
+  logRotate:
+    # Log files are rotated this many times before being removed
+    rotate: 5
+    # Log files are rotated only if they grow bigger than this size in bytes
+    size: "100M"
+```
+
+### Write logs to stdout
+
+If you prefer not to mount additional log disks in order to save costs on the cloud,
+and instead collect logs through services such as fluent-bit and send them to a log center, you can refer to the configuration below.
+
+```yaml
+apiVersion: apps.nebula-graph.io/v1alpha1
+kind: NebulaCluster
+metadata:
+  name: nebula
+spec:
+  graphd:
+    config:
+      # Whether to redirect stdout and stderr to separate output files
+      redirect_stdout: "false"
+      # The numbers of severity levels INFO, WARNING, ERROR, and FATAL are 0, 1, 2, and 3, respectively.
+      stderrthreshold: "0"
+    env:
+    - name: GLOG_logtostderr # Write logs to standard error instead of to files
+      value: "1"
+    image: vesoft/nebula-graphd
+    replicas: 1
+    resources:
+      requests:
+        cpu: 500m
+        memory: 500Mi
+    service:
+      externalTrafficPolicy: Local
+      type: NodePort
+    version: v3.4.0
+  imagePullPolicy: Always
+  metad:
+    config:
+      redirect_stdout: "false"
+      stderrthreshold: "0"
+    dataVolumeClaim:
+      resources:
+        requests:
+          storage: 1Gi
+      storageClassName: ebs-sc
+    env:
+    - name: GLOG_logtostderr
+      value: "1"
+    image: vesoft/nebula-metad
+    replicas: 1
+    resources:
+      requests:
+        cpu: 500m
+        memory: 500Mi
+    version: v3.4.0
+  reference:
+    name: statefulsets.apps
+    version: v1
+  schedulerName: default-scheduler
+  storaged:
+    config:
+      redirect_stdout: "false"
+      stderrthreshold: "0"
+    dataVolumeClaims:
+    - resources:
+        requests:
+          storage: 1Gi
+      storageClassName: ebs-sc
+    enableAutoBalance: true
+    enableForceUpdate: false
+    env:
+    - name: GLOG_logtostderr
+      value: "1"
+    image: vesoft/nebula-storaged
+    replicas: 1
+    resources:
+      requests:
+        cpu: 500m
+        memory: 500Mi
+    version: v3.4.0
+  unsatisfiableAction: ScheduleAnyway
+```
diff --git a/doc/user/storage_guide.md b/doc/user/storage_guide.md
new file mode 100644
index 00000000..db2abaa6
--- /dev/null
+++ b/doc/user/storage_guide.md
@@ -0,0 +1,21 @@
+## Scale storage nodes and balance
+
+Scaling out Storage is divided into two stages.
+
+* In the first stage, you need to wait for the status of all newly created Pods to be Ready.
+
+* In the second stage, the BALANCE DATA and BALANCE LEADER commands are executed.
+
+We provide the parameter `enableAutoBalance` in the CRD to control whether data and leaders are balanced automatically.
+Through these two stages, scaling the replicas in the controller is decoupled from the data balancing process, so users can run the balance during low-traffic periods.
+Such an implementation effectively reduces the impact of data migration on online services, which is in line with the NebulaGraph principle: balancing data is not fully automated, and users decide when to balance data.
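+
+For example, a minimal sketch of turning on automatic balancing for the storage service (only the relevant
+fields are shown; `replicas` is an arbitrary example value):
+
+```yaml
+apiVersion: apps.nebula-graph.io/v1alpha1
+kind: NebulaCluster
+metadata:
+  name: nebula
+spec:
+  storaged:
+    # When true, data and leader balancing run automatically after scaling out.
+    enableAutoBalance: true
+    replicas: 5
+```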
+
+## Storage leader transfer
+
+When NebulaGraph is serving traffic, there are multiple partition leaders on each storage node.
+In a rolling update scenario, in order to minimize the impact on client reads and writes,
+the leaders have to be transferred to other nodes until the number of leaders on the storage node being updated drops to 0, and this process is relatively long.
+
+To make rolling updates more convenient for DBAs, we provide the parameter `enableForceUpdate` for the storage service.
+When it is determined that there is no external access traffic, it rolls the storage Pods directly,
+without waiting for the partition leaders to be transferred first.
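+
+For example, a minimal sketch of forcing a rolling update without waiting for leader transfer; only set
+`enableForceUpdate` when you are sure there is no external access traffic:
+
+```yaml
+apiVersion: apps.nebula-graph.io/v1alpha1
+kind: NebulaCluster
+metadata:
+  name: nebula
+spec:
+  storaged:
+    # Roll the storage Pods directly, without waiting for partition leaders to be transferred.
+    enableForceUpdate: true
+```
\ No newline at end of file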