clean data for ldbc #56

Merged
merged 7 commits into from
Feb 8, 2022

Conversation

heroicNeZha
Contributor

@heroicNeZha heroicNeZha commented Jan 25, 2022

Some scripts to clean the data generated for LDBC.

clean-data.py generates a .copy file next to each csv file. It can be run repeatedly before copy-data.py is executed.

# this command generates xxx.csv.copy files
$ python3 clean-data.py -j 10 -i ../target/data/test_data/social_network/
handler-Thread-1 handler dynamic/person.csv.
handler-Thread-2 handler dynamic/post.csv.
handler-Thread-3 handler dynamic/person_workAt_organisation.csv.
handler-Thread-4 handler dynamic/person_likes_post.csv.
handler-Thread-5 handler dynamic/person_likes_comment.csv.
handler-Thread-6 handler dynamic/post_hasCreator_person.csv.
handler-Thread-7 handler dynamic/comment.csv.
handler-Thread-8 handler dynamic/forum_hasMember_person.csv.
handler-Thread-9 handler dynamic/person_hasInterest_tag.csv.
handler-Thread-10 handler dynamic/forum_containerOf_post.csv.
handler-Thread-1 handler dynamic/comment_hasTag_tag.csv.
handler-Thread-3 handler dynamic/post_hasTag_tag.csv.
handler-Thread-9 handler dynamic/comment_replyOf_comment.csv.
handler-Thread-3 handler dynamic/comment_isLocatedIn_place.csv.
handler-Thread-10 handler dynamic/person_isLocatedIn_place.csv.
handler-Thread-10 handler dynamic/post_isLocatedIn_place.csv.
handler-Thread-6 handler dynamic/comment_replyOf_post.csv.
handler-Thread-9 handler dynamic/forum.csv.
handler-Thread-4 handler dynamic/comment_hasCreator_person.csv.
handler-Thread-9 handler dynamic/forum_hasModerator_person.csv.
handler-Thread-9 handler dynamic/person_email_emailaddress.csv.
handler-Thread-9 handler dynamic/person_speaks_language.csv.
handler-Thread-9 handler dynamic/person_knows_person.csv.
handler-Thread-9 handler dynamic/person_studyAt_organisation.csv.
handler-Thread-9 handler dynamic/forum_hasTag_tag.csv.
handler-Thread-10 handler static/place.csv.
handler-Thread-10 handler static/tag.csv.
handler-Thread-10 handler static/tag_hasType_tagclass.csv.
handler-Thread-10 handler static/tagclass.csv.
handler-Thread-10 handler static/organisation.csv.
handler-Thread-10 handler static/tagclass_isSubclassOf_tagclass.csv.
handler-Thread-10 handler static/place_isPartOf_place.csv.
handler-Thread-10 handler static/organisation_isLocatedIn_place.csv.
all task done! please run copy-data.py to recover csv file

$ python3 copy-data.py  -i ../target/data/test_data/social_network/
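
The two-step flow above (write a cleaned `.copy` next to each csv, then recover the csv from it) could be sketched roughly as follows. This is not the code from the PR: the function names are hypothetical, the cleaning rule in `clean_line` is a placeholder because the description does not say what is cleaned, and the sketch uses only the standard library where the PR's script imports `threading` and `pandas`.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def clean_line(line: str) -> str:
    # Placeholder rule (assumption): strip trailing whitespace.
    # The real cleaning logic lives in clean-data.py and is not shown here.
    return line.rstrip() + "\n"


def clean_file(csv_path: Path) -> Path:
    # Write the cleaned rows to "<name>.csv.copy" next to the original,
    # leaving the original csv untouched.
    copy_path = csv_path.with_name(csv_path.name + ".copy")
    with csv_path.open(encoding="utf-8") as src, \
            copy_path.open("w", encoding="utf-8") as dst:
        for line in src:
            dst.write(clean_line(line))
    return copy_path


def clean_all(root: Path, jobs: int) -> list:
    # "*.csv" does not match "*.csv.copy", so rerunning this step only
    # overwrites earlier .copy files.
    csv_files = sorted(root.rglob("*.csv"))
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(clean_file, csv_files))


def recover_all(root: Path) -> None:
    # Replace each csv with its cleaned .copy (the copy-data.py step).
    for copy_path in sorted(root.rglob("*.csv.copy")):
        os.replace(copy_path, copy_path.with_suffix(""))
```

Because the originals are only replaced in the second step, the cleaning step can be repeated safely, which matches the behaviour described for clean-data.py.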

GOLANG_VERSION=${GOLANG_VERSION:-1.16.6}

Need to add a new line

import threading
import pandas as pd

_csv_dir = "../target/data/test_data/social_network/"

This should be obtained from an input parameter.
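
One way to act on this comment is to read the directory from the command line instead of hard-coding `_csv_dir`. The sketch below uses `argparse` with the `-i` and `-j` flags from the commands in the description. The long option names, defaults, and help texts are assumptions, not taken from the merged script.

```python
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser(
        description="Clean LDBC csv files (illustrative sketch)."
    )
    # -i and -j mirror the flags used in the PR description;
    # --input and --jobs are hypothetical long names.
    parser.add_argument(
        "-i", "--input", required=True,
        help="directory containing the generated social_network csv files",
    )
    parser.add_argument(
        "-j", "--jobs", type=int, default=1,
        help="number of worker threads",
    )
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(args.input, args.jobs)
```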

@HarrisChu HarrisChu merged commit 9684d1a into nebula-contrib:master Feb 8, 2022

3 participants