user guidelines #181

Merged
merged 1 commit on Feb 27, 2023
8 changes: 8 additions & 0 deletions README.md
@@ -220,6 +220,14 @@ nebula-storaged-1 1/1 Running 0 15m
nebula-storaged-2 1/1 Running 0 19s
```

## Guidelines

[Custom config](doc/user/custom_config.md)

[Log guide](doc/user/log_guide.md)

[Storage guide](doc/user/storage_guide.md)

## Compatibility matrix

Nebula Operator <-> NebulaGraph
13 changes: 0 additions & 13 deletions doc/user/balance.md

This file was deleted.

49 changes: 44 additions & 5 deletions doc/user/custom_config.md
@@ -1,12 +1,14 @@
# Configure custom flags

### Apply custom flags

Each component has a configuration entry, defined in the CRD as `config`, a map structure that is loaded through a ConfigMap.
```go
// Config defines the graphd configuration to be loaded into a ConfigMap
Config map[string]string `json:"config,omitempty"`
```

The following example shows how to make configuration changes in the CRD, i.e. for any given option `--foo=bar` in the conf files, `.config.foo` can be applied like:

```yaml
apiVersion: apps.nebula-graph.io/v1alpha1
@@ -30,12 +32,49 @@ spec:
      resources:
        requests:
          storage: 2Gi
      storageClassName: ebs-sc
    config:
      "enable_authorize": "true"
      "auth_type": "password"
      "foo": "bar"
    ...
```

Afterwards, the custom flags `enable_authorize`, `auth_type` and `foo` will be configured and overwritten by the ConfigMap.
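
For reference, the operator renders these flags into the component's configuration file inside a generated ConfigMap. A sketch of the result, where the ConfigMap name and file name are illustrative rather than the operator's exact output:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: nebula-graphd   # illustrative name; the actual ConfigMap is generated by the operator
data:
  nebula-graphd.conf: |
    # Each entry of the config map becomes a --flag=value line in the conf file.
    --enable_authorize=true
    --auth_type=password
    --foo=bar
```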

### Dynamic runtime flags

The table below lists the dynamic runtime flags. If every flag set in the field `config` appears in this table, applying the update will not trigger a pod rolling update, and the new values take effect immediately (see the sketch after the table).


| Flag | Description | Default |
|:-----|:------------|:--------|
| `minloglevel` | Log level: 0, 1, 2, and 3 for INFO, WARNING, ERROR, and FATAL respectively | `0` |
| `v` | Verbose log level, 1-4; the higher the level, the more verbose the logging | `0` |
| `accept_partial_success` | Only used for read-only access; modify access always treats partial success as an error | `false` |
| `session_reclaim_interval_secs` | Period in seconds to reclaim expired sessions | `60` |
| `max_allowed_query_size` | Maximum sentence length in bytes | `4194304` |
| `system_memory_high_watermark_ratio` | System memory high watermark ratio; memory checking is disabled when the ratio is greater than 1.0 | `0.8` |
| `ng_black_box_file_lifetime_seconds` | Black box log file expiration time in seconds | `1800` |
| `memory_tracker_limit_ratio` | Trackable memory ratio (trackable_memory / (total_memory - untracked_reserved_memory)) | `0.8` |
| `memory_tracker_untracked_reserved_memory_mb` | Untracked reserved memory in MiB | `50` |
| `memory_tracker_detail_log` | Whether to periodically log memory tracker stats | `false` |
| `memory_tracker_detail_log_interval_ms` | Memory tracker stats logging interval in milliseconds | `60000` |
| `memory_purge_enabled` | Whether to enable background memory purge (if jemalloc is used) | `true` |
| `memory_purge_interval_seconds` | Background memory purge interval in seconds | `10` |
| `heartbeat_interval_secs` | Heartbeat interval in seconds | `10` |
| `raft_heartbeat_interval_secs` | Raft election timeout in seconds | `30` |
| `raft_rpc_timeout_ms` | RPC timeout for the Raft client in milliseconds | `500` |
| `query_concurrently` | Whether to run queries in multiple threads | `true` |
| `wal_ttl` | Lifetime of the Raft WAL in seconds before it is recycled | `14400` |
| `auto_remove_invalid_space` | Whether to remove data of outdated spaces | `true` |
| `num_io_threads` | Number of network I/O threads | `16` |
| `num_worker_threads` | Number of worker threads to handle requests | `32` |
| `max_concurrent_subtasks` | Maximum number of subtasks to run admin jobs concurrently | `10` |
| `snapshot_part_rate_limit` | Rate limit in bytes when the leader synchronizes snapshot data | `10485760` |
| `snapshot_batch_size` | Amount of data sent in each batch when the leader synchronizes snapshot data | `1048576` |
| `rebuild_index_part_rate_limit` | Rate limit in bytes when the leader synchronizes index rebuilding | `4194304` |
| `rocksdb_db_options` | RocksDB DBOptions in JSON; each option name and value is a string, given as `"option_name":"option_value"`, separated by commas | `{}` |
| `rocksdb_column_family_options` | RocksDB ColumnFamilyOptions in JSON; each option name and value is a string, given as `"option_name":"option_value"`, separated by commas | `{"write_buffer_size":"67108864","max_write_buffer_number":"4","max_bytes_for_level_base":"268435456"}` |
| `rocksdb_block_based_table_options` | RocksDB BlockBasedTableOptions in JSON; each option name and value is a string, given as `"option_name":"option_value"`, separated by commas | `{"block_size":"8192"}` |
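
For example, an update that only touches flags from the table above takes effect without restarting pods. A minimal sketch:

```yaml
apiVersion: apps.nebula-graph.io/v1alpha1
kind: NebulaCluster
metadata:
  name: nebula
spec:
  graphd:
    config:
      # Both flags appear in the dynamic runtime flags table,
      # so applying this change does not trigger a rolling update.
      "v": "2"
      "minloglevel": "0"
```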
104 changes: 104 additions & 0 deletions doc/user/log_guide.md
@@ -0,0 +1,104 @@
### Log rotation

We use a sidecar container to clean NebulaGraph logs and run log archiving tasks every hour.

```yaml
apiVersion: apps.nebula-graph.io/v1alpha1
kind: NebulaCluster
metadata:
  name: nebula
spec:
  graphd:
    config:
      # Whether the log file names contain a timestamp.
      "timestamp_in_logfile_name": "false"
  metad:
    config:
      "timestamp_in_logfile_name": "false"
  storaged:
    config:
      "timestamp_in_logfile_name": "false"
  logRotate:
    # Log files are rotated this many times before being removed
    rotate: 5
    # Log files are rotated only if they grow bigger than this size
    size: "100M"
```

### Write log to stdout

If you prefer not to mount additional log disks (to save costs on the cloud) and instead collect logs through a service such as fluent-bit and send them to a log center, you can refer to the configuration below; a collection sketch follows the example.

```yaml
apiVersion: apps.nebula-graph.io/v1alpha1
kind: NebulaCluster
metadata:
  name: nebula
spec:
  graphd:
    config:
      # Whether to redirect stdout and stderr to separate output files
      redirect_stdout: "false"
      # The numbers of severity levels INFO, WARNING, ERROR, and FATAL are 0, 1, 2, and 3, respectively.
      stderrthreshold: "0"
    env:
    - name: GLOG_logtostderr # Logs are written to standard error instead of to files
      value: "1"
    image: vesoft/nebula-graphd
    replicas: 1
    resources:
      requests:
        cpu: 500m
        memory: 500Mi
    service:
      externalTrafficPolicy: Local
      type: NodePort
    version: v3.4.0
  imagePullPolicy: Always
  metad:
    config:
      redirect_stdout: "false"
      stderrthreshold: "0"
    dataVolumeClaim:
      resources:
        requests:
          storage: 1Gi
      storageClassName: ebs-sc
    env:
    - name: GLOG_logtostderr
      value: "1"
    image: vesoft/nebula-metad
    replicas: 1
    resources:
      requests:
        cpu: 500m
        memory: 500Mi
    version: v3.4.0
  reference:
    name: statefulsets.apps
    version: v1
  schedulerName: default-scheduler
  storaged:
    config:
      redirect_stdout: "false"
      stderrthreshold: "0"
    dataVolumeClaims:
    - resources:
        requests:
          storage: 1Gi
      storageClassName: ebs-sc
    enableAutoBalance: true
    enableForceUpdate: false
    env:
    - name: GLOG_logtostderr
      value: "1"
    image: vesoft/nebula-storaged
    replicas: 1
    resources:
      requests:
        cpu: 500m
        memory: 500Mi
    version: v3.4.0
  unsatisfiableAction: ScheduleAnyway
```
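
As a collection sketch, a fluent-bit configuration tailing the container runtime's log files might look like the following; the parser, namespace, and Elasticsearch endpoint are assumptions for illustration, not part of this change:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config   # illustrative; mount into a fluent-bit DaemonSet
  namespace: logging        # assumed namespace
data:
  fluent-bit.conf: |
    [INPUT]
        Name   tail
        Path   /var/log/containers/nebula-*.log   # stdout/stderr captured by the container runtime
        Parser cri                                # assumes the stock CRI parser is available
        Tag    nebula.*
    [OUTPUT]
        Name   es
        Match  nebula.*
        Host   elasticsearch.logging.svc          # assumed log center endpoint
        Port   9200
```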
21 changes: 21 additions & 0 deletions doc/user/storage_guide.md
@@ -0,0 +1,21 @@
## Scale storage nodes and balance

Scaling out Storage is divided into two stages.

* In the first stage, you need to wait for the status of all newly created Pods to be Ready.

* In the second stage, the BALANCE DATA and BALANCE LEADER commands are executed.

We provide a parameter `enableAutoBalance` in the CRD to control whether data and leaders are balanced automatically.
With these two stages, the controller's replica-scaling process is decoupled from the data-balancing process, so users can run the balancing step during low-traffic periods.
Such an implementation can effectively reduce the impact of data migration on online services, in line with the NebulaGraph principle: balancing data is not fully automated, and when to balance data is decided by users.
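
A scale-out with automatic balancing is then just a replicas change plus the flag, as in this sketch (the replica count is illustrative):

```yaml
apiVersion: apps.nebula-graph.io/v1alpha1
kind: NebulaCluster
metadata:
  name: nebula
spec:
  storaged:
    # Stage one: the operator creates the new pods and waits for them to be Ready.
    replicas: 5
    # Stage two: BALANCE DATA and BALANCE LEADER then run automatically.
    enableAutoBalance: true
```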

## Storage leader transfer

Once NebulaGraph starts serving traffic, each storage node hosts multiple partition leaders.
During a rolling update, to minimize the impact on client reads and writes,
the leaders must be transferred to other nodes until the number of leaders on the node being updated drops to 0; this process takes a relatively long time.

To make rolling updates more convenient for DBAs, we provide the parameter `enableForceUpdate` for the storage service.
When you have determined that there is no external access traffic, it rolls the storage pods directly,
without waiting for the partition leaders to be transferred first (see the sketch below).
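
A sketch of forcing a rolling update once you have verified there is no external traffic (use with care):

```yaml
apiVersion: apps.nebula-graph.io/v1alpha1
kind: NebulaCluster
metadata:
  name: nebula
spec:
  storaged:
    # Roll the storage pods directly, without waiting for leader transfer.
    # Only safe when there is no external access traffic.
    enableForceUpdate: true
```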