FAQ

Vearch-QA

1. Compile, deploy issues

(1) 'cuda_runtime.h' is not found when compiling the GPU version, although the locate command shows that the file exists in /usr/local/cuda-10.0/include. How do I solve this?

A: Add the CUDA include directory to the compiler search path (adjust the path to match your CUDA installation): export CPLUS_INCLUDE_PATH=$CPLUS_INCLUDE_PATH:/usr/local/cuda/include/

(2) An error about faiss occurred while compiling gamma, even though faiss has already been compiled. What could be the reason?

A: If an older version of FAISS is installed, recompile FAISS. Note that the GPU build of FAISS 1.6.3 has a bug.

(3) A compile-time error occurs, as follows:

libfaiss.so: undefined reference to 'sgeqrf_........'

A: Add the path of the liblapack library to LIBRARY_PATH. The FAISS version used is v1.6.3.

(4) How do I configure RocksDB for storage?

A: 1) Download and compile RocksDB v6.2.2 first; 2) export the ROCKSDB_HOME directory before compiling gamma. Gamma will then dynamically link against the RocksDB shared library.

(5) The command sudo docker run -p 8817:8817 -p 9001:9001 -v $PWD/config/config.toml:/vearch/config.toml vearch/vearch:0.5.1 all exits immediately with no logs. Why?

A: There is a problem with the start command.

(6) When starting each node in turn with Docker, the following error is reported:

{"code":550,"msg":"not enough PS , need replica 1 but only has 0"}

A: 1) When deploying in Docker, use the container's IP; 2) the address under [etcd] in the configuration file should be the IP of the external etcd (this setting can be ignored when using the built-in etcd); 3) use the same configuration file for Master, Router and PS.

(7) An error occurs when starting the container: Vearch knows the space exists but cannot find its data, and the displayed data count is zero.

A: Data formats differ between versions, so data written by one version cannot be loaded by another.

(8) What are the steps to start Vearch with Docker?

A: 1) Get the IP of the container; 2) change the IP in the configuration file to the IP of the container; 3) start Vearch in the container. The following command starts the master:

sudo docker run -itd --name=vearch_master -p 8817:8817 --restart=always -v $PWD/vearch/master/config.toml:/vearch/config.toml -v $PWD/vearch/master/logs:/vearch/logs vearch/vearch:3.2.0 master

(9) The master fails to start.

Error log: "find local master failed:master's name:[%s] master's ip:[%s] and local master's name:[%s]"

Problem description: The user starts the Vearch service in a Docker container but configures the physical machine's IP address as the master IP. The master program then fails to start because no matching master configuration can be found.

A: The IP in the master configuration file must be the same as the IP of the container that was started. If this error occurs, check that the master IP is configured correctly.

(10) Using the vearch/vearch:0.5.1 image downloaded from Docker Hub, deployment on Kubernetes fails with the following error:

request cluster ID mismatch......

A: Change the master's address to the real IP.

2. Usability problem

(1) Node restart error:

etcdserver: MaxRequestBytes 33554432 exceeds maximum recommended

A: The maximum length of each string field is currently 256 characters. If you have special requirements, you can modify the gamma code at https://github.com/vearch/gamma/blob/master/table/table_define.h: change line 27 from typedef uint8_t str_len_t to typedef uint16_t str_len_t. After recompiling, longer strings can be stored.

(2) A PS node died and was restarted 30 minutes later, but it could not be restored to normal.

A: The reason could be a missing raft log. Set the raft_truncate_count parameter in the PS configuration to a value larger than the expected total number of documents, so that the PS can be restored normally.

(3) Can master clusters and etcd clusters be adjusted dynamically?

A: Yes. In the future we will separate the master from the etcd cluster, and all masters will share one etcd cluster. etcd itself supports adding members dynamically, so dynamic adjustment can be achieved. In the latest versions, both the etcd cluster and the master cluster can be deployed via domain names, with some changes to the configuration file. The masters can also form a cluster behind a load balancer: provide the domain name to PS and Router, so that the other nodes do not need to change after the master cluster is expanded dynamically. This feature will be supported later.

(4) If a node goes down, one replica of a partition is lost. If a new node is added or the node is restored, will the replica be recreated automatically?

A: 1) If the node returns to normal with its data intact and the service can be restarted, it will automatically synchronize the latest data from the other replicas to recover.

2) If the node cannot be recovered, you can start a new node and manually execute the command to create a new replica.

(5) Is dynamic capacity expansion of PS supported?

A: Currently only replica expansion is supported, for example from 1 partition with 3 replicas to 1 partition with 4 replicas. Execute the following command:

curl -XPOST -H "content-type:application/json" -d '{
"partition_id":1,
"node_id":4,
"method":0
}
' http://$master_server/partition/change_member

"node_id" is the ID of the newly added node; "method" is 0 for expansion and 1 for reduction.

(6) In v3.1.0, a space is configured with 3 replicas, and the whole cluster becomes unavailable after the leader node is killed.

A: The environment configuration and the amount of data in the space need to be analyzed. Raft's high-availability recovery relies on the raft log, so you need to make sure the log is complete. It is usually sufficient to set raft_truncate_count in the configuration file to a value greater than the expected total number of documents. This parameter controls how many raft log entries the PS retains.

(7) After the master node container is restarted, the Router node cannot get master information, and queries return "master Server is not running".

A: When the Master is redeployed from scratch, PS and Router must be restarted so that they re-register with the Master.

(8) If the engine exits abnormally before newly inserted data has been dumped, will the data be lost?

A: The raft log still holds these recent writes, so they can be replayed.

(9) Data persistence when Vearch runs in Docker.

Description: The Docker container is stateless. After a restart, all previous data disappears.

A: Docker can mount a directory of the physical machine into a container. For details, see: http://www.dockerinfo.net/%e6%95%b0%e6%8d%ae%e5%8d%b7volumes

Using this feature, mount a local directory "A" as the "data" directory configured in the Vearch config.toml. All Vearch data is then actually stored in directory A. When the Docker container is restarted, the previous data is read by remounting directory A.

(10) Restarting Router or PS fails with: "err, can not get ps server from master, err: node_id: [partition_not_exist]"

A: Check the start order. Master must start before Router and PS, because Router and PS register with Master.

(11) The PS disk filled up while data was being inserted, and the PS died. Restarting the PS fails with a raft error. How can this be resolved?

A: The incomplete raft log causes the restart to fail. After you delete the raft log file, the PS can restart, but some data will be lost. With multiple replicas, the missing data can be retrieved from the other replicas. With a single replica, if the data was inserted with custom IDs, you can insert it again to overwrite and restore the missing data.

3. Create table problem

(1) Creating a space returns an error:

{"code":564, "msg":"dup_space"}

A: The space already exists. Use a different space name.

(2) How do I select the index model?

A: GPU is suitable for scenarios with demanding QPS and TP99 requirements. Compared with IVFPQ and IVFFLAT, HNSW has higher memory usage and a longer training time, but better recall; it is generally used when the data volume on a single machine does not exceed 10 million. IVFPQ is suitable for scenarios with large data volumes per machine, up to 100 million per machine. IVFFLAT has higher recall but slower searches than IVFPQ.

(3) What are the table creation parameters for HNSW? What does each parameter mean, and how does it affect performance?

A:

{
    "name": "ts_space",
    "partition_num": 1,
    "replica_num": 1,
    "engine": {
        "name": "gamma",
        "index_size": 1,
        "max_size": 20000000,
        "retrieval_type": "HNSW",
        "retrieval_param": {"metric_type": "L2", "nlinks": 32, "efSearch": 64, "efConstruction": 40}
    },
    "properties": {
        "img_url": {"type": "string"},
        "feature": {"type": "vector", "dimension": 128, "store_type": "Mmap"}
    }
}

nlinks: the number of neighbors of each node in the HNSW graph. efConstruction: controls the number of neighbor candidates searched while building the graph. efSearch: controls the number of neighbor candidates traversed during a search.

For nlinks, efSearch and efConstruction alike, a higher value gives better search quality, but the corresponding time also increases.
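A minimal sketch of assembling the body above in Python, so that the three HNSW parameters are easy to vary between experiments. The function name hnsw_space_body and its sanity checks are assumptions made for illustration and are not part of Vearch; the field names are the ones shown in the example above.

```python
import json


def hnsw_space_body(name, dimension, nlinks=32, ef_search=64, ef_construction=40):
    """Return a JSON string shaped like the HNSW space definition above."""
    params = {"nlinks": nlinks, "efSearch": ef_search, "efConstruction": ef_construction}
    for key, value in params.items():
        if not isinstance(value, int) or value <= 0:
            raise ValueError(f"{key} must be a positive integer, got {value!r}")
    body = {
        "name": name,
        "partition_num": 1,
        "replica_num": 1,
        "engine": {
            "name": "gamma",
            "index_size": 1,
            "max_size": 20000000,
            "retrieval_type": "HNSW",
            "retrieval_param": {"metric_type": "L2", **params},
        },
        "properties": {
            "img_url": {"type": "string"},
            "feature": {"type": "vector", "dimension": dimension, "store_type": "Mmap"},
        },
    }
    return json.dumps(body, indent=2)


# A larger efConstruction trades a slower build for a better graph.
print(hnsw_space_body("ts_space", 128, ef_construction=100))
```

The resulting string can be passed as the request body when creating the space.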

(4) Can multiple tables with different parameters be created under one Vearch?

A: Yes.

(5) Error when creating a table:

{"code":550,"msg":"create partition err: create gamma table has err:[4294967294] "}

A: There is too little memory.

(6) What values can nsubvector take for the IVFPQ model?

A: nsubvector must be an integer multiple of 4. The original vector dimension should also be an integer multiple of nsubvector; otherwise the original vector is padded with zeros to reach an integer multiple, which can significantly reduce recall. So set nsubvector to a multiple of 4 that evenly divides the dimension of the original vector.
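The rule can be checked with a few lines of Python. This is only an illustrative helper (the name valid_nsubvector_values is an assumption, not a Vearch API); it lists every value that is a multiple of 4 and divides the dimension exactly.

```python
def valid_nsubvector_values(dimension):
    """List values that are multiples of 4 and divide the dimension exactly."""
    if dimension <= 0:
        raise ValueError("dimension must be positive")
    return [n for n in range(4, dimension + 1, 4) if dimension % n == 0]


print(valid_nsubvector_values(128))  # [4, 8, 16, 32, 64, 128]
print(valid_nsubvector_values(100))  # [4, 20, 100]
```

A dimension that is not itself a multiple of 4 yields an empty list, which means some zero padding is unavoidable for that dimension.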

(7) What does index_size mean for the IVF models? What is the typical range of this value?

A: index_size is the amount of training data. It should be between ncentroids*39 and ncentroids*256.
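As a small worked example of that range, the sketch below computes the bounds for a given ncentroids. The helper name index_size_range and the example value of 256 centroids are assumptions chosen for illustration.

```python
def index_size_range(ncentroids):
    """Return (minimum, maximum) training data size for the given ncentroids."""
    if ncentroids <= 0:
        raise ValueError("ncentroids must be positive")
    return ncentroids * 39, ncentroids * 256


low, high = index_size_range(256)
print(low, high)  # 9984 65536
```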

(8) The cluster failed once while a table was being created, resulting in a 5-minute lock.

A: 1) If there is no data in Vearch, redeploy everything; historical data must be cleared before redeployment. 2) Otherwise, clear the lock with curl $master:$port/clean_lock, replacing $master and $port with your own address and port.

(9) Two PS nodes are deployed in distributed mode. After creating a table, one PS has no log.

A: Set the partition number to 2 when building the table, so that each PS holds a partition.

(10) Vearch has a replica mechanism. If replica_num=1, are 2 copies of the data stored?

A: No. When replica_num = 1, one copy of the data is stored. When replica_num = 2, two copies are stored.

(11) How do I specify PS nodes when creating a space?

A: Specify them when creating the DB:

curl -H "content-type:application/json" -XPUT -d'
{
	"name":"ts_db",
	"ps":["172.17.17.17"]
}
' http://master:7810/db/_create

4. Insert data problem

(1) The bulk insert interface requires a newline at the end of each line of JSON. Is this required by the interface, and how does it work? I think a new content-type should be defined instead of just 'application/json'.

A: The server parses the body line by line, and each line is a JSON string, so the parameters need to be constructed in this format. The current format follows the popular Elasticsearch bulk format: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html. Later versions may improve on this.
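The line-by-line format can be produced as in the sketch below. The action line {"index": {"_id": ...}} is modelled on the Elasticsearch bulk format referenced above and is an assumption here; check the Vearch documentation for the exact fields your version expects. The helper name build_bulk_body and the sample documents are also illustrative.

```python
import json


def build_bulk_body(docs):
    """Build a newline-delimited JSON body from (doc_id, fields) pairs."""
    lines = []
    for doc_id, fields in docs:
        # Action line (assumed, Elasticsearch style), followed by the document.
        lines.append(json.dumps({"index": {"_id": doc_id}}))
        lines.append(json.dumps(fields))
    # Every line, including the last one, ends with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body([
    ("v1", {"img_url": "a.jpg"}),
    ("v2", {"img_url": "b.jpg"}),
])
print(body, end="")
```

Because json.dumps never emits raw newlines, each document is guaranteed to occupy exactly one line.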

(2) Error inserting data: E0526 02:12:02.358861 30117 ps_space_service.go:659] [ERROR] client ps write doc error: process vectory err:[field:[embed] vector_length err ,schema is:[768] but input :[0]] m value:[[]].

A: The feature (vector) field was empty when the document was inserted.

(3) Batch insert fails with the error: client has closed.

A: A single insert request is too large.

(4) What is the maximum number of documents allowed in one batch insert? I wrote a very long curl command, about 20,000 lines, and got this error: insert_0.sh:1:insert_0.sh:curl:Argument list too long.

A: Very long parameters are not supported. It is recommended to insert in batches of 100, 200, 500 or 1000 documents. Multiple clients can be started to speed up insertion.
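Splitting a large data set into batches of the recommended size can be sketched as follows. The helper name batches and the placeholder data are assumptions; each batch would then be sent as one insert request.

```python
def batches(items, batch_size=500):
    """Yield consecutive slices of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


docs = list(range(20000))  # placeholder for 20,000 documents
sizes = [len(batch) for batch in batches(docs, 500)]
print(len(sizes), sizes[0])  # 40 500
```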

(5) Does Vearch support real-time indexing, real-time inserts, and real-time queries?

A: Yes.

(6) How is the index update performance for real-time insertion?

A: Very fast; updates are lock-free. With billions of documents, updates take milliseconds. With millions of documents, update delays are largely negligible.

(7) Error inserting data? The log is as follows:

E0702 14:52:28.842144 37842 handler_document.go:247] [ERROR] type: [REPLACE] doc err :[raft is already shutdown.]

runtime/debug.Stack(0xc0011a4a28, 0x409020, 0xc00097ef58)

...

E0702 14:52:28.849894 37842 raft_state_machine.go:158] [ERROR] partition[2] occur fatal error: raft[2] occur panic error: [[raft->persist][2] storage storeEntries err: [writeZ /export/Data/baud/datas/raft/2/000000000000054d-0000000001f016b2.log: no space left on device].]

A: The disk is full.

(8) Concurrent inserts fail with: "Too many open files".

A: Raise the open-file limit on Linux.
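To see which limit currently applies to a process, the standard-library resource module (Unix only) can be queried as in this sketch. It only reads the limit; raising it is usually done with ulimit -n or in the system's limits configuration before starting the service.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) limit on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


soft, hard = open_file_limits()
print(f"soft={soft} hard={hard}")
```

The soft limit is the one that triggers "Too many open files"; it can be raised up to the hard limit without root privileges.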

(9) After writing more than 200,000 documents to Vearch and restarting the service, only about 19,000 documents are left in the index. What is the reason?

A: For the persistence feature of the latest version, RocksDB should be specified when compiling Vearch, which avoids linking tcmalloc. Persistence has not yet been fully tested with tcmalloc. In your case RocksDB was not specified at compile time, so the build is linked with tcmalloc.

5. Vector search problem

(1) With the HNSW model, querying with a vector that was inserted does not return that vector itself, and the highest score returned is only 0.6. What is the reason?

A: HNSW is a graph model. A search proceeds by repeatedly traversing the neighbors of a point, so if the neighbor relationships are not built well, a vector may fail to find itself. You can increase efConstruction, which improves the quality of the graph but also makes it take longer to build. If the vectors are not normalized, HNSW queries will also not work well.

(2) Where are the Vearch GPU indexes stored?

A: To support in-place updates, the CPU index dump was turned off in version 3.1. GPU indexes are not dumped on their own; they depend on the CPU dump.

6. Numerical search problem

(1) "term": {"field_name": ["100","200","300"], "operator": "or"}. Can the operator parameter be "not"?

A: The operator parameter accepts only "or" and "and".

(2) When a vector field is indexed with IVFPQ and the query contains a range filter on other fields, what algorithm is used? Since IVFPQ searches by cluster, is it a problem if the range filter removes most of the data?

A: No problem. IVFPQ checks whether a doc is filtered out by the range filter before calculating similarity; if so, the similarity of that doc is not calculated. See: https://github.com/vearch/gamma/blob/master/index/gamma_index_ivfpq.h:629

(3) Are indexes and original vectors currently stored in memory?

A: In v3.2.0, the IVFPQ model allows indexes to be stored on disk. Other models do not support this yet.

7. Search related questions

(1) Vearch occasionally crashes when the query volume suddenly increases. The log shows: "libgomp: Thread creation failed: Resource temporarily unavailable, fatal error: unexpected signal during runtime execution." The program then exits abnormally.

A: Checking the OpenMP library showed that the libgomp version on the server was 4.8.5-4, while the version on the machine that compiled gamma was 4.8.5-39. Gamma was therefore compiled against the new library but linked to the old one when deployed, and the two versions differ significantly. This causes the service to crash under high concurrency.

libgomp is backward compatible: programs compiled with 4.8.5-39 may crash when running under 4.8.5-4, but no problem has been found the other way around. The crash can be reproduced by using the ab command with the concurrency set to 20,000.

How to avoid this:

Compile the program on the deployment machine, which gives the lowest probability of problems.
If the compilation server and the deployment server are different machines, package libgomp from the compilation server and install it on the deployment server.

(2) Which models are affected by the parameter "quick"?

A: Models that use the PQ algorithm, such as IVFPQ.

(3) When running in Docker, the error log is as follows:

libgomp:Thread creation failed: Resource temporarily unavailable

A: The maximum thread limit has been reached. Execute 'ulimit -s 99999' in both the container and the host to raise the limit, and set OMP_NUM_THREADS to a larger value. If the error persists, the load may be too high and too many threads are being created; reduce the amount of concurrency.

(4) gamma.log and master.log did not report any error during a stress test, but the shell reported the following error and the server crashed.

*goroutine 391 [IO wait, 5003 minutes]: internal/poll.runtime_pollWait(0x7f4cd9c3f968, 0x72, 0x0) /data/rrjia/project/go/src/runtime/netpoll.go:203 +0x55 internal/poll.(pollDesc).wait(0xc000c1e398, 0x72, 0x0, 0x0, 0x1520d63) /data/rrjia/project/go/src/

The test created 1000 spaces.

A: There is no hard limit on the number of spaces, but 1000 is too many. Fewer than 10 is suggested; otherwise a lot of memory is wasted. Instead, you can add a location field, index it, and filter by location when searching. There is no need to create so many spaces.

(5) A query returns an error similar to the following:

errors":{"internal error":{}

A: The query statement contains an error.

(6) The query returns no results.

A: For a query with field filtering, add the 'index: true' parameter to the filtered field when building the table; otherwise no results may be returned. The query parameters must also include a vector field; otherwise the query returns no results.

(7) How does search with filter conditions work? Does it first find the topK images from the inverted index and then filter them by the conditions?

A: Except for the GPU index, the documents that satisfy the filter are selected first, and then the topK is found from the inverted index.

(8) What is Vearch's query process in a distributed deployment?

A: The Router sends the query to each PS, collects the results returned by each PS, and merges them. The process is similar to Elasticsearch.

(9) How does Vearch support cosine retrieval?

A: Normalize the vectors and then use InnerProduct; the inner product of normalized vectors equals their cosine similarity.
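The equivalence between cosine similarity and the inner product of normalized vectors can be verified with a short pure-Python sketch. The function names and sample vectors are illustrative; in practice the normalization is applied to vectors before they are inserted and to the query vector before searching.

```python
import math


def normalize(vector):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def inner_product(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


a = [3.0, 4.0]
b = [4.0, 3.0]

# Cosine similarity computed directly from its definition.
cosine = inner_product(a, b) / (math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b)))
# Inner product of the normalized vectors.
score = inner_product(normalize(a), normalize(b))

print(round(cosine, 6), round(score, 6))  # 0.96 0.96
```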

(10) Why is the search score greater than 1?

A: The vectors in the library were not normalized.

(11) What do the recall_num_queries and is_brute_search parameters mean?

A: The default for "recall_num_queries" is 1, which means parallelism across searches; 0 represents parallelism across buckets. In the IVFPQ model, is_brute_search indicates whether a brute-force search is performed.

(12) With "metric_type": "L2", the score does not match the actual distance?

A: For L2, Vearch by default does not take the square root. Setting "L2_sqrt": true when creating a table makes it return the square-rooted distance.
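The difference between the two values is easy to see on a small example. This sketch uses plain Python with made-up vectors; it only illustrates squared versus square-rooted L2 distance and does not call Vearch.

```python
import math


def l2_squared(a, b):
    """Squared Euclidean distance (no square root taken)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


a = [0.0, 0.0]
b = [3.0, 4.0]

print(l2_squared(a, b))             # 25.0, the score without the square root
print(math.sqrt(l2_squared(a, b)))  # 5.0, the Euclidean distance
```

Both values rank results in the same order, since the square root is monotonic; only the magnitude of the score changes.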

8. Update and delete data issues

(1) An error is reported when updating data:

"error":"JsonMap GetJsonValBytes key: doc not found.","status":400

A: A field passed in does not exist.

(2) Does deleting a lot of 'keys' cause write blocking?

A: It will not have much impact.

(3) After deleting an ID from Vearch and inserting the same ID again, the insert reports success, but looking up the ID returns 'found: False'. Is this a bug? The GitHub tag v0.3.0 was used.

A: Vearch uses mark deletion; the data is not physically deleted. After the ID is inserted again, the deletion mark is not removed, which causes this problem. Later versions will fix it.

(4) After Vearch deletes data, the data can no longer be searched, but the memory is still occupied.

A: When a small amount of data is deleted, it is only marked as deleted. Once the marked deletions reach a certain threshold, a "compaction" is triggered to remove the data physically.

9. Performance issues

(1) For 300 million 256-dimensional vectors, what is the appropriate number of shards? How does the final system performance relate to the number of partitions?

A: Generally, one compute shard node (PS) is deployed per server. The less data allocated to each machine, the faster the computation. To distribute 300 million vectors, first determine the maximum amount of data each PS node can store (the physical configuration of the PS servers should be identical or as similar as possible). For example, if each machine can store up to 100 million vectors, at least 3 machines and 3 shards are required. If there are more than 3 machines, set the number of shards to the number of PS nodes. The more service nodes, the less data per node and the faster the computation.

To estimate roughly how much data each server can store: 1) If features are stored on disk, the maximum amount of data depends on the size of the disk; this storage mode is relatively slow to query. 2) If features are stored in memory, the memory footprint can be estimated as the size of the features in bytes plus about 20%. Actual testing is required for verification.
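The arithmetic for this example can be sketched as below. Two things are assumptions here rather than statements from the source: each vector component is taken to be a 4-byte float, and the 20% figure is read as overhead added on top of the raw feature size. The function names are illustrative.

```python
def min_partitions(total_docs, max_docs_per_ps):
    """Smallest number of partitions so that no PS exceeds its capacity."""
    if max_docs_per_ps <= 0:
        raise ValueError("max_docs_per_ps must be positive")
    return -(-total_docs // max_docs_per_ps)  # integer ceiling division


def estimated_memory_bytes(num_vectors, dimension, bytes_per_value=4, overhead_percent=20):
    """Raw feature size plus a percentage overhead (assumed interpretation)."""
    raw = num_vectors * dimension * bytes_per_value
    return raw * (100 + overhead_percent) // 100


print(min_partitions(300_000_000, 100_000_000))            # 3
print(estimated_memory_bytes(300_000_000, 256) / 1e9)      # 368.64 (GB, whole cluster)
print(estimated_memory_bytes(100_000_000, 256) / 1e9)      # 122.88 (GB, per PS)
```

These numbers are only a starting point for sizing; as noted above, real memory usage should be confirmed by testing.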

(2) How can QPS be improved?

A: When building a table, partition_num is the number of shards and replica_num is the number of replicas. When the number of replicas is greater than 1, the primary node is searched by default; with the request parameter client_type=random, both primary and secondary nodes can accept query requests. When the data volume is relatively large, increasing the number of partitions (and PS nodes) also increases speed.

(3) Which models does Vearch support, and what are their recall rates?

A: Five models are currently supported: 'IVFPQ', 'GPU', 'BINARYIVF', 'HNSW', 'FLAT'. IVFPQ generally reaches about 95% recall, depending on the data; HNSW reaches 97-99% recall with heavy memory consumption; FLAT is a brute-force search with 100% recall, at a cost in performance.

10. Other issues

(1) How do I get the amount of data in a vearch table?

A: See the documentation at https://vearch.readthedocs.io/zh_CN/latest/use_op/cluster_status.html#id3. The amount of data is available through the following command:

curl -XGET http://master_server/_cluster/health

(2) Error: accept tcp [::]:9001: accept4: too many open files; retrying in 5ms?

A: The system's file handle limit is too small.

(3) When the Vearch service stops, does Vearch automatically call the dump method to save all scalar raw data?

A: Dump is not called when the service stops. The dump interval can be set in the configuration.

(4) Is RPC a new gRPC interface?

A: The client-facing TCP interface is the gRPC interface. RPC refers to the service calls between Router and PS.

(5) Does the Python SDK use HTTP or gRPC? Does gRPC offer only a query interface, with inserts available only via HTTP?

A: The Python SDK only supports local (standalone) use. In the current version, inserts go through HTTP only.

(6) On Linux, pip install vearch fails with "Could not find a version that satisfies the requirement vearch (from versions: ). No matching distribution found for vearch".

A: The pip version is too old; upgrade pip.

(7) How do I reduce the amount of log output?

A: Change the log level in the configuration file, for example debug, info, warn or error.

(8) The raft log is too large. Is there any way to reduce it?

A: The maximum number of entries retained in the logs can be controlled by the parameter raft_truncate_count.

(9) When calling PyTorch with Vearch, only the first GPU is used. How can this be solved?

A: With multithreading, only the first GPU card can be used. When using multiple processes, set the visible GPU card for each process at the beginning of the process; otherwise the first card is used. For details, please refer to:

https://github.com/vearch/python-algorithm-plugin/blob/3511bd9b720bf1172bfe3bf95e768cb54c022c19/src/main.py#L75
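Independently of the linked file, one common way to restrict a process to a single card is the CUDA_VISIBLE_DEVICES environment variable, set before the process initializes CUDA. The sketch below is an assumption about how this could be wired up, not a copy of the plugin's code; the helper names and the round-robin assignment are illustrative.

```python
import os


def gpu_env_for_worker(worker_index, gpu_count):
    """Return the environment override that pins one worker process to one GPU."""
    if gpu_count <= 0:
        raise ValueError("gpu_count must be positive")
    # Round-robin assignment of worker processes to cards.
    return {"CUDA_VISIBLE_DEVICES": str(worker_index % gpu_count)}


def pin_current_process(worker_index, gpu_count):
    """Apply the override; call this before CUDA is initialized (e.g. before importing torch)."""
    os.environ.update(gpu_env_for_worker(worker_index, gpu_count))


# Example: six worker processes sharing four cards.
for worker in range(6):
    print(worker, gpu_env_for_worker(worker, 4))
```

Inside each pinned process the selected card then appears as device 0.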

(10) How do I add an SSH service to the Docker image of v3.2.0?

A: Enter the container with: docker exec -it <container_id> /bin/bash

(11) When is the auto_recover_ps parameter generally used?

A: When a PS node goes down, it is automatically removed. When a new node is added, the cluster automatically switches to it. With multiple replicas, the data of the failed PS is restored to the new PS.

(12) Which parameter controls removal of the raft log in gamma?

A: Modify raft_truncate_count (for example "raft_truncate_count = 500000") under [PS] in the configuration file.

(13) What is the maximum amount of data per PS?

A: The bitmap that marks whether each doc is valid holds at most 1 billion entries. Anything beyond 1 billion will fail.

11. Unsolved problem

(1). Is flat not supported in GPU mode at present?

A: Correct, it is not supported.

(2). Can Vearch return the number of samples and the list of IDs in a space?

A: Not currently supported.

(3). The bulk deletion described in the documentation at https://vearch.readthedocs.io/en/latest/use_op/op_doc.html#delete does not work. What is the correct batch delete JSON?

A: Not currently supported. You can use a single delete or delete based on a query.

(4). Does the API call documentation have a return value description?

A: Not at the moment.

(5). Bulk inserts in which part of the data already exists in the database seem to return the same result as new inserts: the existing documents are overwritten and every item reports success. If an ID already exists in the database, can such an insert be rejected so that only new IDs are created?

A: Flagging existing IDs when they are inserted again is not currently supported.

(6). Once physical deletion is developed, will the mark deletion of the current version be compatible with it?

A: Not currently supported.

(7). The GPU index does not support adding data in real time; new data does not take effect until the index is rebuilt (rebuild index: curl -XPOST http://router_server/$db_name/$space_name/_forcemerge).

A: It is not supported. The GPU index can be searched after the index is created. Subsequent data updates are performed on the CPU and synchronized to the GPU by "forcemerge", during which queries are not affected.

(8). Building a table returns {"code":550,"msg":"revoke lease 6074074183416770188 error :context deadline exceeded"}. What is the reason?

A: The master called the PS, and the PS then timed out calling the gamma engine interface. Table creation will be optimized later.

(9). Are single-field updates supported?

A: Not currently supported.

(10). Using the Python SDK, I wrote data to the engine and ran searches; when the program ended, some data had been generated on the hard disk. How do I recover the index from the hard disk? Calling the "load" method fails with "No such File or directory: 'files/table.pickle'".

A: You need to call dump manually. Some files, such as the features, are written to disk ahead of time because they are large and dumping everything together would be time-consuming. The Python SDK was designed for standalone scenarios, where the data set is generally assumed to be in the tens of millions; hundreds of millions of records have not been tested. In previous experience, the write speed is around 5000 records/s. The RESTful interface is recommended for larger scales.

(11). Does the mmap version not support load and dump?

A: Not currently supported.

(12). Are sparse vectors supported?

A: Not currently supported.

(13). Are callbacks currently supported?

A: Not currently supported.

(14). Can Vearch store only the index, without the original vectors?

A: Not currently supported.

(15). Does Vearch support must_not filtering?

A: Not currently supported.

(16). The score returned with "retrieval_type": "BINARYIVF" is no longer a decimal between 0 and 1. With min_score and max_score set, are out-of-range results still returned? And are the scores not returned in ascending or descending order?

A: For BINARYIVF, an identical vector has a score of 0. In this version min_score and max_score do not yet take effect; the next release will fix this. Scores are ordered from small to large, with the most similar first.

(17). Can partition_num be increased dynamically?

A: Not currently supported.

(18). Fields are fixed when a table is built. Is an error reported if a field is missing when inserting data or searching?

A: Yes.

(19). Does Address support only IPs, and not domain name deployment?

A: Yes, only IPs are supported.

(20). Insertion error:

terminate called after throwing an instance of "libcuckoo_load_factor_too_low"

A: The IDs are too sequential, which triggers a hash table problem. You can insert without specifying an ID, and one will be generated automatically.

(21). Using the IVFPQ model with InnerProduct, multiple queries on the same data return inconsistent results.

A: When vectors contain negative values, InnerProduct can produce a negative score. Such results are filtered out, so the results are inconsistent. We plan to add handling for this case.

(22). Can Vearch run on Huawei's Kunpeng Arm architecture server?

A: It hasn't been tested so far.

(23). Do HNSW and BINARYIVF support index dump?

A: Not supported.

(24). Raft takes up a lot of memory.

A: This will be optimized later.

(25). Have multi-db and multi-table queries been removed?

A: Temporarily, yes. The next iteration will add this feature back.

(26). Can the similarity calculation rule be changed, for example to a custom one?

A: Not at the moment.

(27). Are range searches using only scalar fields supported?

A: Not supported.

(28). The update error is as follows:

gamma_engine.cc:564 update error, key=3323072295084000-12, docid=4
mmap_raw_vector.cc:226 MMap doesn't support update!

A: Mmap does not support updates at this time

(29). Is there a batch update interface?

A: No.
