
Apache Ignite File System #22194

Merged: 8 commits merged into tensorflow:master on Nov 11, 2018

Conversation

@dmitrievanthony (Contributor) commented Sep 10, 2018

This is a proposal to support Apache Ignite File System in TensorFlow.

Apache Ignite is a memory-centric distributed database, caching, and processing platform for transactional, analytical, and streaming workloads, delivering in-memory speeds at petabyte scale. This proposal is part of a larger initiative to implement so-called "TensorFlow on Apache Ignite" (IGNITE-8335, Design Document).

The integration is based on a custom filesystem plugin on the TensorFlow side and the IGFS Native API on the Apache Ignite side. More information about this module can be found in README.md.

Tests have also been added. They use Docker to hide configuration complexity, so the implemented functionality can easily be tested with a manual run.

Be aware that this pull request is based on the earlier #22210, which is not merged yet (a significant part of the code you see here will be merged in #22210).
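
For readers unfamiliar with how a filesystem plugin hooks into TensorFlow, here is a toy, self-contained sketch of the general idea: a URI scheme is mapped to a filesystem implementation so that ordinary file-reading code can open paths under that scheme. This is an illustration only; TensorFlow's real mechanism is its FileSystem interface and registration macro, and every class name, scheme, host, and port below is made up rather than taken from this PR.

```cpp
// Toy illustration only: this is NOT TensorFlow's API. TensorFlow routes
// paths to plugins through its FileSystem interface and a registration
// macro; the simplified registry below just shows the scheme-dispatch idea.
#include <iostream>
#include <map>
#include <memory>
#include <string>

class ToyFileSystem {
 public:
  virtual ~ToyFileSystem() = default;
  virtual std::string Describe(const std::string& path) = 0;
};

class ToyIgfsFileSystem : public ToyFileSystem {
 public:
  std::string Describe(const std::string& path) override {
    return path + " would be served by the IGFS plugin";
  }
};

// Scheme -> filesystem registry (made-up stand-in for the real registration).
std::map<std::string, std::unique_ptr<ToyFileSystem>>& Registry() {
  static auto* registry =
      new std::map<std::string, std::unique_ptr<ToyFileSystem>>();
  return *registry;
}

ToyFileSystem* FileSystemForPath(const std::string& uri) {
  const std::string scheme = uri.substr(0, uri.find("://"));
  auto it = Registry().find(scheme);
  return it == Registry().end() ? nullptr : it->second.get();
}

int main() {
  Registry()["igfs"].reset(new ToyIgfsFileSystem());
  ToyFileSystem* fs = FileSystemForPath("igfs://localhost:10500/train/data.csv");
  if (fs != nullptr) {
    std::cout << fs->Describe("igfs://localhost:10500/train/data.csv") << "\n";
  }
  return 0;
}
```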

@googlebot

So there's good news and bad news.

👍 The good news is that everyone that needs to sign a CLA (the pull request submitter and all commit authors) have done so. Everything is all good there.

😕 The bad news is that it appears that one or more commits were authored or co-authored by someone other than the pull request submitter. We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that here in the pull request.

Note to project maintainer: This is a terminal state, meaning the cla/google commit status will not change from this state. It's up to you to confirm consent of all the commit author(s), set the cla label to yes (if enabled on your project), and then merge this pull request when appropriate.

@yifeif (Contributor) commented Sep 10, 2018

Thanks @dmitrievanthony. Could you check the CLA for all contributors who have commits in this PR?

@yifeif self-assigned this Sep 10, 2018
@dmitrievanthony (Contributor, Author) commented Sep 10, 2018

Hi @yifeif, thank you for your attention. Did I understand correctly from the message above that all authors have signed the CLA and the only problem is that the committer and the author of one commit are not the same? If so, I've asked @artemmalykh to confirm here that he's OK with this PR.

@artemmalykh (Contributor)

I'm totally OK with this PR!

@yifeif (Contributor) commented Sep 11, 2018

Thanks @dmitrievanthony and @artemmalykh. Looks good now.

@yifeif added the cla: yes label and removed the cla: no label Sep 11, 2018
@googlebot

A Googler has manually verified that the CLAs look good.

(Googler, please make sure the reason for overriding the CLA status is clearly documented in these comments.)

@yifeif added the awaiting review label Sep 11, 2018
@googlebot

So there's good news and bad news.

👍 The good news is that everyone that needs to sign a CLA (the pull request submitter and all commit authors) have done so. Everything is all good there.

😕 The bad news is that it appears that one or more commits were authored or co-authored by someone other than the pull request submitter. We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that here in the pull request.

Note to project maintainer: This is a terminal state, meaning the cla/google commit status will not change from this state. It's up to you to confirm consent of all the commit author(s), set the cla label to yes (if enabled on your project), and then merge this pull request when appropriate.

@dmitrievanthony (Contributor, Author)

@yifeif, I just rebased this branch, so the commits haven't changed. I think the CLA check should be fine. I'm still waiting for #22210 to be merged, so I'll rebase once more when it is merged and ping you again.

@googlebot

A Googler has manually verified that the CLAs look good.

(Googler, please make sure the reason for overriding the CLA status is clearly documented in these comments.)

@googlebot

So there's good news and bad news.

👍 The good news is that everyone that needs to sign a CLA (the pull request submitter and all commit authors) have done so. Everything is all good there.

😕 The bad news is that it appears that one or more commits were authored or co-authored by someone other than the pull request submitter. We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that here in the pull request.

Note to project maintainer: This is a terminal state, meaning the cla/google commit status will not change from this state. It's up to you to confirm consent of all the commit author(s), set the cla label to yes (if enabled on your project), and then merge this pull request when appropriate.

@googlebot added the cla: no label and removed the cla: yes label Oct 2, 2018
@dmitrievanthony (Contributor, Author)

Hi, @mrry, @martinwicke, @yifeif.

After the merge of #22210 I rebased and updated this pull request, so it's now ready for review. It's quite big, but smaller than the Ignite Dataset one.

Derek, I'd be grateful if you could have a look and start reviewing. Please let me know if any changes are required.

It would be great if someone could start the tests and set cla: yes (the CLA checker is too suspicious when it comes to rebases).

@googlebot

A Googler has manually verified that the CLAs look good.

(Googler, please make sure the reason for overriding the CLA status is clearly documented in these comments.)

@dmitrievanthony (Contributor, Author)

Hi, @mrry. Any updates?

plugged into Hadoop or Spark deployments. This contrib package contains an
integration between IGFS and TensorFlow.

@@IGFS
Contributor:

This module doesn't appear to import a symbol called IGFS. Do you need this line?

Contributor Author:

Removed.


namespace tensorflow {

string GetEnvOrElse(const string &env, string default_value) {
Contributor:

Mark private functions as static or put them in an anonymous namespace.

Contributor Author:

Done, marked as static.
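
For reference, a minimal compilable sketch of the two options discussed above; the helper mirrors GetEnvOrElse, but its body, the alternative name, and the environment variable names are illustrative rather than the PR's exact code.

```cpp
// Compilable sketch of the two ways to give a helper internal linkage.
#include <cstdlib>
#include <iostream>
#include <string>

// Option 1: mark the function static.
static std::string GetEnvOrElse(const std::string& env,
                                std::string default_value) {
  const char* value = std::getenv(env.c_str());
  return value != nullptr ? std::string(value) : default_value;
}

// Option 2: put it in an anonymous namespace (equivalent effect: the symbol
// is not visible to other translation units).
namespace {
std::string GetEnvOrElseAlt(const std::string& env,
                            std::string default_value) {
  const char* value = std::getenv(env.c_str());
  return value != nullptr ? std::string(value) : default_value;
}
}  // namespace

int main() {
  std::cout << GetEnvOrElse("SOME_HOST_VAR", "localhost") << "\n";
  std::cout << GetEnvOrElseAlt("SOME_PORT_VAR", "10500") << "\n";
  return 0;
}
```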

const string &user_name);
~IGFSClient();

inline Status Handshake(CtrlResponse<HandshakeResponse> *res) {
Contributor:

inline is implied for member functions defined in the class definition.

Contributor Author:

Yes, it's exactly a member function defined in the class definition.

Contributor:

Right, my point is that the inline here is redundant, so it doesn't need to be there (even though the compiler accepts and ignores it).

The other headers don't use inline when the function is defined inline in the class definition, so please delete the inline specifiers in this file for consistency.

Contributor Author:

Okay, I got it. I've removed the inline specifiers in the headers here, and in ignite_client.h and ignite_byte_swapper.h as well.
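
For reference, a tiny illustration of why the specifier is redundant; the class and method names are invented, not taken from the PR.

```cpp
// Both member functions below are implicitly inline because they are defined
// inside the class definition, so the explicit `inline` adds nothing.
#include <iostream>

class ClientSketch {
 public:
  int Handshake() { return Call(0); }     // implicitly inline
  inline int Close() { return Call(1); }  // `inline` is redundant here

 private:
  int Call(int code) { return code; }
};

int main() {
  ClientSketch c;
  std::cout << c.Handshake() << " " << c.Close() << "\n";
  return 0;
}
```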

@@ -126,6 +126,7 @@ py_library(
}) + select({
"//tensorflow:with_ignite_support": [
"//tensorflow/contrib/ignite",
"//tensorflow/contrib/igfs",
Contributor:

Can we combine these two modules into one? IGFS seems like it would fit under "//tensorflow/contrib/ignite".

Contributor Author:

I would prefer to keep them in separate modules for several reasons. First of all, they have very different functionality ("//tensorflow/contrib/ignite" provides the ability to read data from the Apache Ignite cache, while "//tensorflow/contrib/igfs" allows using a file system based on Apache Ignite). A user may want to use only one feature and not the other, which seems quite reasonable in some cases. I think this is a case where "write programs that do one thing and do it well" applies, isn't it? And, by the way, after the contrib sunsetting it makes even more sense to have them in separate packages.

Do you agree?

Contributor:

No, I don't agree. The two modules are coupled by dependencies from this new code to the tensorflow/contrib/ignite module. The tf.contrib.igfs module doesn't export any public symbols, so users won't interact with it directly... and (if built) it will be loaded whether they like it or not.

Perhaps most importantly, the shared object rules in the BUILD file mean that the code in "//tensorflow/contrib/ignite:ignite_client" will be present in both tensorflow/contrib/igfs/_igfs_ops.so and tensorflow/contrib/ignite/_dataset_ops.so, which can lead to a One Definition Rule violation and undefined behavior.

So, for all those reasons, life would be much easier if the code were combined in one module, with a single shared object!

Contributor Author:

Sounds reasonable. I merged the two modules together.

return Status::OK();
}

std::shared_ptr<IGFSClient> IGFS::CreateClient() const {
Contributor:

Is the "shared"-ness ever used here? It looks like this could return a unique_ptr.

Contributor Author:

Replaced by unique_ptr, done.
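
A minimal sketch of the resulting pattern, with placeholder names: returning unique_ptr documents single ownership, and a caller can still upgrade to shared ownership later if it ever needs to.

```cpp
// Factory returning std::unique_ptr when shared ownership is never required.
#include <memory>

class ClientSketch {};

std::unique_ptr<ClientSketch> CreateClient() {
  return std::unique_ptr<ClientSketch>(new ClientSketch());
}

int main() {
  std::unique_ptr<ClientSketch> client = CreateClient();
  // If shared ownership ever becomes necessary, the caller can convert:
  std::shared_ptr<ClientSketch> shared = std::move(client);
  return 0;
}
```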

TF_RETURN_IF_ERROR(ReadString(&key));
TF_RETURN_IF_ERROR(ReadString(&val));

res->insert(std::pair<string, string>(key, val));
Contributor:

You could std::move(key), std::move(val) here to avoid copying the strings again.

Contributor Author:

Done.
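
A small self-contained sketch of the suggestion, with the reads replaced by literals: once key and val have been deserialized they are no longer needed, so moving them into the map avoids copying each string a second time.

```cpp
#include <map>
#include <string>
#include <utility>

int main() {
  std::map<std::string, std::string> res;
  std::string key = "some.property.name";  // would come from ReadString(&key)
  std::string val = "some value";          // would come from ReadString(&val)

  // Before: copies both strings into the inserted pair.
  // res.insert(std::pair<std::string, std::string>(key, val));

  // After: moves them instead (emplace would work equally well).
  res.insert(
      std::pair<std::string, std::string>(std::move(key), std::move(val)));
  return 0;
}
```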

Status ExtendedTCPClient::WriteString(string str) {
if (!str.empty()) {
TF_RETURN_IF_ERROR(WriteBool(false));
unsigned short l = str.length();
Contributor:

This is an unsafe cast and should raise a compiler warning: str.length() returns a size_t. You need to check that it fits in the range of an unsigned short before casting.

Contributor Author:

Done.

const std::map<string, string> &properties)
: Request(command_id_),
user_name_(std::move(user_name)),
path_(std::move(path)),
Contributor:

path is a const string&, so moving it has no effect. If you want to avoid a copy, you need to take path as a (by-value) string or a string&&.

Contributor Author:

Okay, just removed copy().
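
A short sketch of the underlying issue and the usual fix, with types and names simplified: std::move on a const reference silently degrades to a copy, so to actually save a copy the constructor should take the string by value (or by rvalue reference) and move it into the member.

```cpp
#include <string>
#include <utility>

class RequestSketch {
 public:
  // By-value parameters: a temporary argument is moved in, while an lvalue
  // argument costs exactly one copy.
  RequestSketch(std::string user_name, std::string path)
      : user_name_(std::move(user_name)), path_(std::move(path)) {}

 private:
  std::string user_name_;
  std::string path_;
};

int main() {
  std::string path = "/some/dir";
  // path is copied once; the string built from the literal is moved.
  RequestSketch req("anonymous", path);
  (void)req;
  return 0;
}
```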

int32_t len;
TF_RETURN_IF_ERROR(client->ReadInt(&len));

entries = std::vector<T>();
Contributor:

Suggestion: entries.clear() is slightly more readable, since the object must already have been constructed.

Contributor Author:

Done.
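
A tiny sketch of the suggestion, with the per-entry deserialization faked: entries is already constructed, so clear() says "reuse this vector" more directly than assigning a fresh temporary.

```cpp
#include <cstdint>
#include <vector>

int32_t ReadNextEntry() { return 42; }  // stand-in for the real read

void ReadEntries(int32_t len, std::vector<int32_t>* entries) {
  entries->clear();       // instead of *entries = std::vector<int32_t>();
  entries->reserve(len);  // optional, since the length is already known
  for (int32_t i = 0; i < len; ++i) entries->push_back(ReadNextEntry());
}

int main() {
  std::vector<int32_t> entries;
  ReadEntries(3, &entries);
  return static_cast<int>(entries.size()) == 3 ? 0 : 1;
}
```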

@tensorflowbutler removed the awaiting review label Oct 13, 2018
@dmitrievanthony (Contributor, Author)

Hi @mrry. Any updates?

@dmitrievanthony (Contributor, Author)

Hi @mrry, @yifeif. I don't clearly understand the current state. Do you have any concerns about this PR, or is it ready to be merged?


if (!str.empty()) {
TF_RETURN_IF_ERROR(WriteBool(false));
size_t l = str.length();
if (l > 0xFFFF) return errors::InvalidArgument("String is too long");
Contributor:

If I remember correctly, WriteShort() takes a (signed) int16_t, so the maximum value is 0x7fff. Maybe use std::numeric_limits<int16_t>::max to avoid errors here?

Contributor Author:

Good point, done.
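
A small sketch of the final bound check (the helper name is invented): if the length is written as a signed 16-bit value, the safe upper bound is std::numeric_limits<int16_t>::max() (0x7FFF), not 0xFFFF.

```cpp
#include <cstdint>
#include <iostream>
#include <limits>
#include <string>

bool LengthFitsInInt16(const std::string& str) {
  return str.length() <=
         static_cast<size_t>(std::numeric_limits<int16_t>::max());
}

int main() {
  std::cout << LengthFitsInInt16(std::string(100, 'x')) << "\n";    // 1
  std::cout << LengthFitsInInt16(std::string(40000, 'x')) << "\n";  // 0
  return 0;
}
```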


@dmitrievanthony (Contributor, Author)

Hi @mrry. I merged the ignite and igfs modules together and fixed the other remarks. Please have a look.

"kernels/igfs/igfs_writable_file.h",
],
deps = [
"//tensorflow/contrib/ignite:ignite_client",
Contributor:

Nit: This should be ":ignite_client" now.

Contributor Author:

Done.

@@ -31,39 +32,88 @@ tf_custom_op_library(
deps = [":dataset_kernels"],
)

tf_custom_op_library(
Contributor:

Since ":igfs_kernels" and ":dataset_kernels" share some symbols (in ":ignite_client"), to avoid the ODR issues, it'll be easiest to merge "_dataset_ops.so" and "_igfs_ops.so" into a single shared object (and hence tf_custom_op_library() rule).

Contributor Author:

Okay, good point. I've merged them into a single shared library _ignite_ops.so.

@dmitrievanthony (Contributor, Author)

Hi @mrry. I've updated the code, please have a look.

@mrry previously approved these changes Nov 6, 2018
@kokoro-team removed the kokoro:force-run label Nov 6, 2018
@dmitrievanthony (Contributor, Author)

It looks like the Sanity check failed because of the //tensorflow/core:lib dependency in igfs_kernels. I've removed it, so please rerun the tests.

@mrry added the kokoro:force-run label Nov 7, 2018
@kokoro-team removed the kokoro:force-run label Nov 7, 2018
@dmitrievanthony (Contributor, Author)

Thanks @mrry, it looks like everything is fixed. Currently only one test fails, and it fails in a part unrelated to my code.

Hope the PR will be merged soon :)

@yifeif added the cla: yes and ready to pull labels and removed the cla: no label Nov 7, 2018
@googlebot

A Googler has manually verified that the CLAs look good.

(Googler, please make sure the reason for overriding the CLA status is clearly documented in these comments.)

@tensorflow-copybara merged commit 24579bc into tensorflow:master Nov 11, 2018
@tensorflow-copybara pushed a commit that referenced this pull request Nov 11, 2018
Labels: cla: yes, ready to pull (PR ready for merge process)
9 participants