What processing does the LDBC dataset need? #11

Closed
Sean58238 opened this issue Jun 17, 2021 · 9 comments

Comments

@Sean58238

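The sections on importing and testing with the LDBC dataset are not very detailed. Could you help answer a few questions?
1 LDBC dataset
Does the LDBC dataset currently need any special processing? An earlier reply on the forum said: nebula-bench has not been updated yet, so the simplest approach is to generate the data with ldbc v0.3.3 and then remove the first line of each csv. Do we still need to modify the data ourselves, and was the v2.0.1 test report also based on ldbc v0.3.3?

2
nebula-bench has a merger step for processing the LDBC data. We observed that it only modifies the updateStream.csv file, but the yaml configuration does not seem to use this csv file.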

@HarrisChu
Collaborator

HarrisChu commented Jun 17, 2021

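Q1:
You only need to remove the first line and then set up the importer configuration file. No other changes to the data are needed.

Q2:
The merger merges the csv files that LDBC produces in the 0_0 format.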
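
To illustrate what such a merge step might look like, here is a minimal Python sketch. It is not the nebula-bench merger itself. It assumes part files are named like `person_0_0.csv`, `person_1_0.csv` (a naming pattern assumed here, not confirmed in this thread) and that every part starts with the same header line. The function name `merge_parts` is hypothetical.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed naming: <entity>_<part>_<suffix>.csv, e.g. person_0_0.csv
PART_RE = re.compile(r"^(?P<name>.+)_(?P<part>\d+)_(?P<suffix>\d+)\.csv$")


def merge_parts(src_dir, dst_dir):
    """Concatenate the part files of each entity into <entity>.csv.

    Only the header of the first part is kept. Returns the merged paths.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)

    # Group part files by entity name.
    groups = defaultdict(list)
    for path in src.glob("*.csv"):
        match = PART_RE.match(path.name)
        if match:
            groups[match.group("name")].append((int(match.group("part")), path))

    merged = []
    for name, parts in sorted(groups.items()):
        out_path = dst / f"{name}.csv"
        with out_path.open("w", encoding="utf-8", newline="") as out:
            for index, (_, path) in enumerate(sorted(parts)):
                with path.open("r", encoding="utf-8", newline="") as part:
                    header = part.readline()
                    if index == 0:
                        out.write(header)
                    for line in part:
                        # Guard against a part whose last line has no newline.
                        out.write(line if line.endswith("\n") else line + "\n")
        merged.append(out_path)
    return merged
```

Writing the merged files to a separate directory leaves the generated data untouched, so the step can be rerun safely.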

@HarrisChu
Collaborator

HarrisChu commented Jun 17, 2021

Q1:
Just delete the first line of each csv file and then set up the importer configuration file. There is no need to modify any other data.

Q2:
The merger script is used to merge the csv files.
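
Removing the first line can be done with any tool. As a minimal Python sketch (the helper name `drop_header` is hypothetical, and it assumes the file is UTF-8 text with the header on the first line):

```python
import shutil
import tempfile
from pathlib import Path


def drop_header(csv_path):
    """Rewrite csv_path without its first line and return the removed header."""
    path = Path(csv_path)
    with path.open("r", encoding="utf-8", newline="") as src:
        header = src.readline()
        # Stream the rest into a temp file in the same directory so the
        # final replace stays on one filesystem.
        with tempfile.NamedTemporaryFile(
            "w", encoding="utf-8", newline="", dir=path.parent, delete=False
        ) as tmp:
            shutil.copyfileobj(src, tmp)
    Path(tmp.name).replace(path)
    return header.rstrip("\r\n")
```

Streaming through a temporary file avoids loading a large LDBC csv into memory. Keeping the returned header is useful for checking column order when writing the importer configuration.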

@Sean58238
Author

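If I run performance tests on a single machine, for example to check how different SSD products affect nebula's performance, how many meta, storaged, and graphd instances do you recommend deploying? Is it feasible to start one of each?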

@HarrisChu
Collaborator

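Different SSDs affect nebula mainly through storage, so starting one of each is fine.
If you only have a few SSDs to test, you can also keep the same meta and storaged and put only graphd on the SSD under test.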

@Sean58238
Author

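Thanks. Also, nebula-importer can measure data import performance. LDBC contains a lot of data; for example, the dynamic directory holds many tables. For an import performance test, do you recommend importing all the tables together, or is it enough to pick one or a few of them?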

@HarrisChu
Collaborator

It depends on your scenario.
If you only want to test import performance and simple queries, you can import just one file.
If you want to test more complex queries, import all the files, e.g. person -> KNOWS -> person -> created -> POST.



@Sean58238
Author

I have a question about importing.
1 Create the space:
CREATE SPACE IF NOT EXISTS importer_test2(partition_num=5, replica_factor=1, vid_type=FIXED_STRING(100));

2 Create the schema for Forum:
CREATE TAG IF NOT EXISTS Forum(title string,creationDate string);

3 Import the data
The import fails with two kinds of errors, and both seem to be related to the vertex id:

INSERT VERTEX Forum(title,creationDate) VALUES 0: ("Wall of Mahinda Perera","2010-02-14T15:32:20.447+0000");
ErrMsg: SemanticError: No schema found for `Forum', ErrCode: -12

INSERT VERTEX Forum(title,creationDate) VALUES 2199023255564: ("Album 11 of Mahinda Perera","2012-09-08T16:20:33.879+0000");
ErrMsg: Wrong vertex id type: 2199023255564, ErrCode: -8

@HarrisChu
Collaborator

For question 1, please refer to https://docs.nebula-graph.com.cn/2.0.1/5.configurations-and-logs/1.configurations/3.graph-config/#networking

You need to wait for graphd to sync the schema before inserting data.
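
One way to handle this wait in a script is to retry the statement while the "No schema found" error persists. The sketch below is client-agnostic: `execute` is a hypothetical callable that you would wire to your own client and that returns `(ok, message)`. The default of 5 attempts with a 10 second delay is an assumption, not a value taken from this thread.

```python
import time


def run_when_schema_ready(execute, statement, attempts=5, delay_secs=10.0,
                          sleep=time.sleep):
    """Run statement, retrying only while the schema is not yet visible.

    execute: hypothetical callable, statement -> (ok: bool, message: str).
    Returns the (ok, message) pair of the last attempt.
    """
    message = ""
    for attempt in range(attempts):
        ok, message = execute(statement)
        if ok:
            return True, message
        # Any other error (e.g. a wrong vertex id type) will not fix itself.
        if "No schema found" not in message:
            return False, message
        if attempt < attempts - 1:
            sleep(delay_secs)
    return False, message
```

Retrying only on the schema error keeps real mistakes, such as the vertex id type error in question 2, from being hidden behind a long wait.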

For question 2, your space's VID type is FIXED_STRING, so the vertex ID must be written as a quoted string:
INSERT VERTEX Forum(title,creationDate) VALUES '2199023255564': ("Album 11 of Mahinda Perera","2012-09-08T16:20:33.879+0000");
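
When statements are generated by a script, the quoting can be derived from the space's VID type. The following is a minimal sketch with hypothetical helper names (`format_vid`, `insert_forum`). It uses double quotes for the vertex ID, on the assumption that they are accepted for string literals just like the single quotes above, and it only handles the `FIXED_STRING(N)` and `INT64` VID types.

```python
def quote(value):
    """Render a value as a double-quoted string literal with basic escaping."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def format_vid(raw_id, vid_type):
    """Render a vertex ID so that it matches the space's vid_type."""
    kind = vid_type.strip().upper()
    if kind.startswith("FIXED_STRING"):
        return quote(raw_id)       # string VIDs must be quoted
    if kind in ("INT", "INT64"):
        return str(int(raw_id))    # integer VIDs stay bare
    raise ValueError(f"unsupported vid_type: {vid_type}")


def insert_forum(raw_id, title, creation_date, vid_type):
    """Build the INSERT VERTEX statement for the Forum tag from this thread."""
    vid = format_vid(raw_id, vid_type)
    return (
        "INSERT VERTEX Forum(title,creationDate) "
        f"VALUES {vid}: ({quote(title)},{quote(creation_date)});"
    )
```

For example, `insert_forum(2199023255564, "Album 11 of Mahinda Perera", "2012-09-08T16:20:33.879+0000", "FIXED_STRING(100)")` produces a statement with the ID in quotes, which avoids the "Wrong vertex id type" error for this space.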

@HarrisChu
Collaborator

Closing this issue. If you have other questions, please open a new one.
