Implements varint + delta encoding on CSR to optimize the fragment memory usage #1372

Merged: 43 commits into v6d-io:main on May 26, 2023

Conversation

vegetableysm
Collaborator

@vegetableysm vegetableysm commented May 12, 2023

@github-actions
Contributor

github-actions bot commented May 12, 2023

🎊 PR Preview 1b66da5 has been successfully built and deployed to https://deploy-preview-pr-1372--v6d.netlify.app
🤖 By netlify

}

template <typename T>
size_t varint_decode(const uint8_t* input, T& output) {
Member

inline

Collaborator Author

Done
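
For readers following the thread, here is a minimal sketch of a header-style varint pair marked `inline` as requested. Only the `varint_decode` signature comes from the diff; the byte layout (LEB128-style, 7 payload bits per byte), the `varint_encode` helper, and the restriction to unsigned types are assumptions, not the PR's actual implementation. For a function template the keyword is not required for linkage, so it mainly acts as a hint to the compiler.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical helper: appends `value` to `out` as a LEB128-style varint.
// Every byte carries 7 payload bits; the high bit marks "more bytes follow".
template <typename T>
inline void varint_encode(T value, std::vector<uint8_t>& out) {
  static_assert(std::is_unsigned<T>::value, "sketch handles unsigned types only");
  while (value >= 0x80) {
    out.push_back(static_cast<uint8_t>((value & 0x7F) | 0x80));
    value >>= 7;
  }
  out.push_back(static_cast<uint8_t>(value));
}

// Same signature as the function under review. Reads one varint from `input`,
// stores it in `output`, and returns the number of bytes consumed.
// Assumes `input` points at a complete, well-formed varint (no bounds check).
template <typename T>
inline std::size_t varint_decode(const uint8_t* input, T& output) {
  static_assert(std::is_unsigned<T>::value, "sketch handles unsigned types only");
  output = 0;
  std::size_t index = 0;
  unsigned shift = 0;
  while (true) {
    const uint8_t byte = input[index++];
    output |= static_cast<T>(static_cast<T>(byte & 0x7F) << shift);
    if ((byte & 0x80) == 0) {
      break;
    }
    shift += 7;
  }
  return index;
}
```

With this layout, the value 300 occupies two bytes (0xAC, 0x02) instead of the eight a raw 64-bit ID would take.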

Signed-off-by: vegetableysm <yuanshumin.ysm@alibaba-inc.com>
…mpile time and add field to control the process of load graph.

const uint8_t* begin_ptr_;
const uint8_t* end_ptr_;
size_t size_ = 0;

Member

Don't touch Nbr and NbrList, as you already have EncodedNbr and EncodeNbrList variants.

Unify the terminology: use only Encoded or only Compacted. I personally prefer Compacted.

Collaborator Author

Done.

[&e_list, &e_offsets_lists_, &encoded_id_sub_lists](int64_t k) {
VID_T pre_vid = 0;
encoded_id_sub_lists[k].resize(9 * (e_offsets_lists_[k + 1] - e_offsets_lists_[k]));
encoded_id_sub_lists[k].resize(0);
Member

??? Why resize to 9 * xxxx and then resize to 0?

Member

@sighingnow sighingnow May 23, 2023

Use std::vector::reserve() instead.

Collaborator Author

Done.
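
To make the `reserve()` suggestion concrete, here is a minimal sketch of delta + varint encoding for one neighbor list. `resize(n)` followed by `resize(0)` value-initializes `n` bytes only to discard them, whereas `reserve(n)` allocates capacity and leaves `size()` at 0. The function names, the 64-bit ID type, the ascending sort order of the IDs, and the 10-byte worst case (which follows from the LEB128-style layout assumed here) are all assumptions for illustration and may differ from the PR's encoder, which sizes the buffer at 9 bytes per ID.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Worst-case length of a LEB128-style varint for a 64-bit value:
// ceil(64 / 7) = 10 bytes. (Assumption of this sketch.)
constexpr std::size_t kMaxVarintBytes = 10;

// Hypothetical helper: appends one varint to `out`.
inline void append_varint(uint64_t value, std::vector<uint8_t>& out) {
  while (value >= 0x80) {
    out.push_back(static_cast<uint8_t>((value & 0x7F) | 0x80));
    value >>= 7;
  }
  out.push_back(static_cast<uint8_t>(value));
}

// Encodes an ascending list of IDs as varint-coded gaps from the previous ID.
inline std::vector<uint8_t> encode_sorted_ids(const std::vector<uint64_t>& ids) {
  std::vector<uint8_t> out;
  out.reserve(kMaxVarintBytes * ids.size());  // capacity only; size() stays 0
  uint64_t prev = 0;
  for (uint64_t id : ids) {
    append_varint(id - prev, out);
    prev = id;
  }
  return out;
}

// Inverse of encode_sorted_ids. Assumes `bytes` is well-formed.
inline std::vector<uint64_t> decode_sorted_ids(const std::vector<uint8_t>& bytes) {
  std::vector<uint64_t> ids;
  uint64_t prev = 0;
  std::size_t pos = 0;
  while (pos < bytes.size()) {
    uint64_t delta = 0;
    unsigned shift = 0;
    uint8_t byte = 0;
    do {
      byte = bytes[pos++];
      delta |= static_cast<uint64_t>(byte & 0x7F) << shift;
      shift += 7;
    } while ((byte & 0x80) != 0);
    prev += delta;  // running sum restores the original ID
    ids.push_back(prev);
  }
  return ids;
}
```

For the list {100, 103, 110, 1000}, the gaps are 100, 3, 7, and 890, which encode to 5 bytes in total versus 32 bytes of raw 64-bit IDs.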

[&encoded_id_sub_lists, &encoded_id_list, &encoded_offsets_list](int64_t i) {
memcpy(encoded_id_list.data() + encoded_offsets_list[i], encoded_id_sub_lists[i].data(), encoded_id_sub_lists[i].size());
},
concurrency);
Member

Could you please report the time taken by these two stages (encode, copy) and give a comparison between the parallel and non-parallel versions?
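
One way to collect the requested numbers is sketched below. The helper names `elapsed_us` and `time_stages` and the `StageTimes` struct are hypothetical; the two stages are passed in as callables, so the same harness could wrap either the parallel or the serial variant. Microseconds are used because the logging code later in this thread divides by 1000000.0 to print seconds.

```cpp
#include <chrono>
#include <cstdint>
#include <utility>

// Runs `fn` once and returns the elapsed wall-clock time in microseconds.
template <typename Fn>
inline int64_t elapsed_us(Fn&& fn) {
  const auto start = std::chrono::steady_clock::now();
  std::forward<Fn>(fn)();
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration_cast<std::chrono::microseconds>(stop - start)
      .count();
}

// Hypothetical result type: one timing per stage.
struct StageTimes {
  int64_t encode_us;
  int64_t copy_us;
};

// Times the encode stage and then the copy stage, in that order.
template <typename EncodeFn, typename CopyFn>
inline StageTimes time_stages(EncodeFn&& encode, CopyFn&& copy) {
  StageTimes times;
  times.encode_us = elapsed_us(std::forward<EncodeFn>(encode));
  times.copy_us = elapsed_us(std::forward<CopyFn>(copy));
  return times;
}
```

Calling `time_stages` once with the concurrency set to 1 and once with the intended thread count, on the same input, gives the four figures needed for the comparison.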

traverse_fragment(fragment, options.dump_dry_run_rounds);
} else {
dump_fragment(fragment, target_directory + "/output_graph_f" +
std::to_string(fragment->fid()));
Member

Try adding fragment_t as a template type parameter for traverse_fragment and dump_fragment to avoid the duplication.
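
A minimal sketch of that suggestion follows. `PlainFragment`, `CompactFragment`, their `edge_count()` member, and the `dump_target` helper are hypothetical stand-ins; only `traverse_fragment`, `fid()`, and the `"/output_graph_f"` path suffix appear in the diff. The point is that a single function template serves both fragment types as long as they expose the same member functions.

```cpp
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical stand-ins for the non-compact and compact fragment classes.
struct PlainFragment {
  int fid() const { return 0; }
  std::size_t edge_count() const { return 4; }
};
struct CompactFragment {
  int fid() const { return 1; }
  std::size_t edge_count() const { return 4; }
};

// One definition for every fragment type that provides edge_count().
// Returns the total number of edges visited over `rounds` passes.
template <typename fragment_t>
std::size_t traverse_fragment(const std::shared_ptr<fragment_t>& fragment,
                              int rounds) {
  std::size_t visited = 0;
  for (int r = 0; r < rounds; ++r) {
    visited += fragment->edge_count();
  }
  return visited;
}

// Hypothetical helper that builds the per-fragment output path used by the
// dump step; it only needs fid(), so it is also generic over fragment_t.
template <typename fragment_t>
std::string dump_target(const std::shared_ptr<fragment_t>& fragment,
                        const std::string& target_directory) {
  return target_directory + "/output_graph_f" +
         std::to_string(fragment->fid());
}
```

Because `fragment_t` is deduced from the argument, call sites stay unchanged when the loader switches between the two fragment types.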

});
} else {
return loadfn().value();
}
Member

I cannot see why you need to copy the whole file rather than use some template tricks.
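
One possible shape for such a template trick is sketched below: a boolean template parameter selects the fragment type at compile time, so a single loader source file covers both variants. `FragmentLoader`, its `Load()` member, and both fragment structs are hypothetical names for illustration and are not taken from the PR.

```cpp
#include <string>
#include <type_traits>

// Hypothetical stand-ins for the two fragment representations.
struct PlainFragment {
  static std::string name() { return "plain"; }
};
struct CompactFragment {
  static std::string name() { return "compact"; }
};

// A single loader template; `compact` picks the fragment type at compile time.
template <bool compact>
class FragmentLoader {
 public:
  using fragment_t =
      typename std::conditional<compact, CompactFragment, PlainFragment>::type;

  // Placeholder for the real loading logic, which would be written once
  // against fragment_t instead of being duplicated per variant.
  std::string Load() const { return "loaded " + fragment_t::name(); }
};
```

The caller then chooses `FragmentLoader<true>` or `FragmentLoader<false>`, and any variant-specific step can be isolated in a small specialization instead of a second copy of the file.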

@sighingnow
Member

Please fix the format and lint errors as well.

<< " seconds";
LOG(INFO) << "Generate varint time usage: "
<< (generate_varint_time / 1000000.0) << " seconds";
LOG(INFO) << "Memcpy time usage: " << (memcpy_time / 1000000.0) << " seconds";
Member

Use VLOG(100) as I think those profiling logs shouldn't be so verbose.

Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>
@sighingnow changed the title from "Varint encoding csr" to "Implements varint + delta encoding on CSR to optimize the fragment memory usage" on May 26, 2023
@sighingnow sighingnow merged commit 0c9c16e into v6d-io:main May 26, 2023

Successfully merging this pull request may close these issues.

Improve the memory usage of fragment (CSR) using delta encoding and varint
2 participants