Skip to content
This repository was archived by the owner on Jul 1, 2023. It is now read-only.

Conversation

eaplatanios
Copy link
Contributor

This PR simply adds a script that clones the TF bindings in a Bindings subdirectory. This script must currently be invoked independently before calling swift build if we want the bindings to be cloned. It is currently useless, but it is part of the larger move discussed in #109.

@rxwei @pschuh

@rxwei
Copy link
Contributor

rxwei commented May 28, 2019

In order for other libraries to use swift-apis as a dependency, I think we should check the bindings file and the generation script into this repo like you proposed earlier. We can discuss a plan to move the binding generation out once SwiftPM supports custom commands.

@eaplatanios
Copy link
Contributor Author

@rxwei I changed this PR to reflect this and made a corresponding PR in swiftlang/swift#25097.

@rxwei rxwei requested a review from bgogul May 28, 2019 20:18
@rxwei
Copy link
Contributor

rxwei commented May 28, 2019

It appears the build failed.

@eaplatanios
Copy link
Contributor Author

That's because is needs stuff from CompilerRuntime.swift. This may be quite hard to separate though, given that other things in the stdlib also depend on CompilerRuntime.swift. How should we proceed about this? #109 compiles fine for me so we just need to find a way to break it down without resulting in issues like this. Another way would be to make several PRs like this that do not compile/pass tests successfully for now, just for the sake of breaking down the diff.

@eaplatanios
Copy link
Contributor Author

@rxwei For now I added the runtime sources as I have reformatted them in #109. Could you please rerun the CI tests and see if this works? I cannot currently do that on my machine because I need to recompile the toolchain and it'll take a while.

@eaplatanios
Copy link
Contributor Author

eaplatanios commented May 28, 2019

Sorry I had to make one more change to make this compile. It also required a change in the corresponding swiftlang/swift#25097 PR so not sure how we can test this on the CI before merging that other PR.

@rxwei
Copy link
Contributor

rxwei commented May 28, 2019

FYI: For the next few hours I'll focus on helping you get the refactoring all done. Let me know which PR you'd like to trigger CI on or anything you need help with.

@eaplatanios
Copy link
Contributor Author

That's great! Thanks a lot Richard!

So, I believe that currently, swiftlang/swift#25097 and this PR are inter-connected in that none can be tested without the other one being merged. Thus, I'll first make a separate small PR on apple/swift that should let us test this independently.

@rxwei
Copy link
Contributor

rxwei commented May 29, 2019

Building a toolchain now.

@eaplatanios
Copy link
Contributor Author

I just finished compiling one locally and I get the same errors. It seems like something is wrong with tffunc but I'm not sure what or why yet.

@rxwei
Copy link
Contributor

rxwei commented May 29, 2019

It's surprising that the code has different behavior when migrated. Was any logic changed?

@eaplatanios
Copy link
Contributor Author

Nevermind, I think I have pinned it down to the .== operator. I have noticed previously that operators may need to be redefined/redeclared sometimes across modules.

@eaplatanios
Copy link
Contributor Author

Looking into it a bit more I see that the issue is not with .==. It seems like some tensors end up being empty but I'm not sure why. I'll look a bit more into it now and keep updating here.

@eaplatanios
Copy link
Contributor Author

It is indeed only an issue when _tffunc is involved. When was the toolchain that was used to run the tests previously built? Can we find that out?

@eaplatanios
Copy link
Contributor Author

I'm wondering if e.g. this change was introduced or not.

@bgogul
Copy link
Contributor

bgogul commented May 29, 2019

I'm wondering if e.g. this change was introduced or not.

@eaplatanios, can you run the failing test after setting the following environment variables: TF_CPP_MIN_LOG_LEVEL=0 TF_CPP_MIN_VLOG_LEVEL=3 SWIFT_TENSORFLOW_ENABLE_DEBUG_LOGGING="true"

It will dump a bunch of logs that should help debug a bit more. This generates a lot of logs. So, you might want to run the smallest failing test.

@eaplatanios
Copy link
Contributor Author

@bgogul I'll do that, but in the meantime I found out that e.g., a tensor created here has 0 elements when it shouldn't (for the testMapToDifferentType test). I'm not sure if that helps.

@bgogul
Copy link
Contributor

bgogul commented May 29, 2019

at e.g., a tensor created here has 0 elements when it shouldn't (for the testMapToDifferentType test). I'm not sure if that helps.

Hmm.. that is odd. Can you get print the number of elements in In._typeList here ?

@eaplatanios
Copy link
Contributor Author

Hmm.. that is odd. Can you get print the number of elements in In._typeList here ?

There is one element in In._typeList which is consistent with the fact that it builds a single tensor; it's just an empty tensor (i.e., it has shape [0]). This should be reproducible by running swift test -Xlinker -ltensorflow --filter 'MapToDifferentType' for this PR.

@eaplatanios
Copy link
Contributor Author

@rxwei @bgogul Something very weird is happening! I tried building swiftlang/swift#25097 with this PR for swift-apis. When I run the tests in apple/swift using lit, everything works fine and they pass. However, when I run them with swift test in this PR, these weird errors pop up in places where function attributes are used. Why could this be happening?

@eaplatanios
Copy link
Contributor Author

Could function tracing not be working correctly when used within XCTests?

@eaplatanios
Copy link
Contributor Author

I tried adding a separate executable target (thus avoiding XCTest) and the same issue occurs. I wonder if this has to do with any of the flags used when compiling the stdlib in apple/swift.

@bgogul
Copy link
Contributor

bgogul commented May 29, 2019

@rxwei @bgogul Something very weird is happening! I tried building apple/swift#25097 with this PR for swift-apis. When I run the tests in apple/swift using lit, everything works fine and they pass. However, when I run them with swift test in this PR, these weird errors pop up in places where function attributes are used. Why could this be happening?

I am not sure, @eaplatanios. I have cloned your setup on my desktop and I am currently building it. I will keep you posted.

@eaplatanios
Copy link
Contributor Author

Thanks @bgogul!

@bgogul
Copy link
Contributor

bgogul commented May 29, 2019

@eaplatanios I just got around to reproducing the error on my desktop.I will investigate this a bit and let you know.

@eaplatanios
Copy link
Contributor Author

That’s great. Thanks for the update @bgogul!

@eaplatanios
Copy link
Contributor Author

@bgogul I don't know if this helps but I've made the following observations:

  • When CompilerRuntime.swift and the bindings are compiled as part of apple/swift, then the tests here, in swift-apis, pass when using swift test.
  • When I move CompilerRuntime.swift and the bindings to swift-apis, the error occurs.

This makes me think that this has to do with how the tracing functionality is compiled and wonder if any of the compiler flags in the TensorFlow module CMakeLists file affects this.

@eaplatanios
Copy link
Contributor Author

Side note, but it'll be so great once we can test everything independently in swift-apis :) Every single change I make while debugging this requires me to build a new toolchain which takes forever.

@rxwei
Copy link
Contributor

rxwei commented May 30, 2019

All flags we use to build stdlib/TensorFlow should be here. I can't think of any compiler flag that could change the behavior this way. It's strange.

@eaplatanios
Copy link
Contributor Author

All flags we use to build stdlib/TensorFlow should be here. I can't think of any compiler flag that could change the behavior this way. It's strange.

Yeah I saw these before and wasn't sure if they can cause any issues. I really have no idea what's wrong but it does only seem to appear whenever _tffunc is used which is very suspicious.

@bgogul
Copy link
Contributor

bgogul commented May 30, 2019

  • When CompilerRuntime.swift and the bindings are compiled as part of apple/swift, then the tests here, in swift-apis, pass when using swift test.
  • When I move CompilerRuntime.swift and the bindings to swift-apis, the error occurs.

Thanks for this information. This gives me something to go with. I was completely clueless on how to debug this. ;)

I am actually a bit confused now. Aren't we including the CompilerRuntime.swift and the bindings multiple times when we do swift test? I see that stdlib/public/TensorFlow/CMakeLists.txt includes the sources from the bindings and so does swift-apis/Package.swift. Is that the issue here?

@bgogul
Copy link
Contributor

bgogul commented May 30, 2019

I am actually a bit confused now. Aren't we including the CompilerRuntime.swift and the bindings multiple times when we do swift test? I see that stdlib/public/TensorFlow/CMakeLists.txt includes the sources from the bindings and so does swift-apis/Package.swift. Is that the issue here?

I believe this is the problem. I excluded the Runtime and Bindings/* from Package.swift and the tests pass. See https://gist.github.com/bgogul/d4fb0d652064654391796202a8aba2ea

@eaplatanios
Copy link
Contributor Author

That’s interesting. I thought that the fact that they belong to different modules should resolve this (DeepLearning vs TensorFlow). But I guess TensorFlow is indeed imported in DeepLearning. Generally though I don’t understand these kinds of problems in Swift and how namescopes work for packages. I would expect that there should have been a compile-time error at least.

@eaplatanios
Copy link
Contributor Author

@rxwei one way around this is to move ahead directly with #109 which removes the need for importing TensorFlow in this module and avoids this issue. I’ll try to see if I can make all tests pass there directly.

@eaplatanios
Copy link
Contributor Author

All tests pass for me locally for #109 and I just started the CI tests. @bgogul Thanks for helping figure this out. I still don't really understand why there is no compile-time error and what causes this behavior but we should probably just move ahead with the complete move directly.

@bgogul
Copy link
Contributor

bgogul commented May 30, 2019

this out. I still don't really understand why there is no compile-time error and what causes this behavior but we should probably just move ahead with the complete move directly.

We don't see any compiler error because they are in different modules. However, TraceContext and ExecutionContext initializes some global TF library state. I think that the runtime is getting messed up when the initialization happens twice. That is just my hypothesis.

In any case, you are right that we should go ahead with the move.

@rxwei
Copy link
Contributor

rxwei commented May 30, 2019

Going with #109 directly sounds good to me.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants