Add Devicelist method to Tensorflow.java #171

Draft
wants to merge 2 commits into
base: master
from

tomburke-rse

This is a draft PR for the list_devices functionality.
Unfortunately, I could not run the tests at all and would highly appreciate any help in setting up the project.
Do I need to build from source to run the tests?
Any remarks and improvements are welcome.

@google-cla

google-cla bot commented Dec 17, 2020

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with @googlebot I signed it! and we'll verify it.

@tomburke-rse
Author

@googlebot I signed it!

@saudet
Contributor

saudet commented Dec 17, 2020 via email

@@ -103,6 +102,22 @@ private static OpList libraryOpList(TF_Library handle) {
}
}

public static List<DeviceSpec> listDevices(Optional<DeviceSpec.DeviceType> deviceType, TFE_Context ctx) {
Collaborator

You shouldn't expose TFE_Context at the public level. These classes are generated from the C API, and we always wrap them in public endpoints for more flexibility and scalability. In this case, I suggest creating an EagerSession.Context static nested class that encapsulates a TFE_Context.
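
For illustration only, the wrapping pattern suggested here could look roughly like the sketch below. It is not the project's actual API: EagerSessionSketch, FakeNativeContext, deviceNames() and the sample device name are all hypothetical stand-ins, and FakeNativeContext merely plays the role of a generated binding such as TFE_Context so that the example compiles on its own.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class EagerSessionSketch {

  // Hypothetical stand-in for a generated C API binding such as TFE_Context.
  // It is NOT the real class; it only mimics "a handle that can be released".
  static final class FakeNativeContext {
    private boolean released = false;

    List<String> rawDeviceNames() {
      if (released) {
        throw new IllegalStateException("native context already released");
      }
      List<String> names = new ArrayList<>();
      names.add("/device:CPU:0"); // made-up sample value
      return names;
    }

    void release() {
      released = true;
    }
  }

  /** Public wrapper: callers work with this type and never see the generated handle. */
  public static final class Context implements AutoCloseable {
    private final FakeNativeContext handle;

    private Context(FakeNativeContext handle) {
      this.handle = handle;
    }

    public static Context create() {
      return new Context(new FakeNativeContext());
    }

    /** Exposes only plain Java types, so the native binding can change freely. */
    public List<String> deviceNames() {
      return Collections.unmodifiableList(handle.rawDeviceNames());
    }

    @Override
    public void close() {
      handle.release();
    }
  }

  public static void main(String[] args) {
    // try-with-resources releases the underlying handle when the block exits.
    try (Context ctx = Context.create()) {
      System.out.println(ctx.deviceNames());
    }
  }
}
```

With a wrapper of this shape, a public method could accept the wrapper type instead of the generated handle, which keeps the generated classes out of public signatures.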

deviceList.add(devSpec);
}
TF_DeleteDeviceList(devices);
if(deviceType.isPresent()) return deviceList;
Collaborator

Just a general comment: be careful to format your code according to the Google Java Style Guide. I saw a bunch of missing spaces in the code that will make the lint checks fail once they are enabled.

}
TF_DeleteDeviceList(devices);
if(deviceType.isPresent()) return deviceList;
return deviceList.stream().filter(d -> d.deviceType().equals(deviceType.get())).collect(Collectors.toList());
Collaborator

Java streams tend to be slower than simple for loops, and the overhead is even worse when manipulating such a small list of objects. While I don't think this method needs to be time-critical, I would suggest taking the simple/faster route here (just my two cents).
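
As a rough sketch of the plain-loop route, the filter could be written as below. The Device and DeviceType types are hypothetical stand-ins for DeviceSpec and DeviceSpec.DeviceType so that the example is self-contained. Note one assumption: the quoted diff returns the unfiltered list when a device type is present and calls deviceType.get() when it is absent, which looks inverted, so this sketch filters only when a type is given.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

final class DeviceFilterSketch {

  // Hypothetical stand-in for DeviceSpec.DeviceType.
  enum DeviceType { CPU, GPU, TPU }

  // Hypothetical stand-in for DeviceSpec, reduced to what the filter needs.
  static final class Device {
    private final DeviceType type;
    private final int index;

    Device(DeviceType type, int index) {
      this.type = type;
      this.index = index;
    }

    DeviceType type() {
      return type;
    }

    int index() {
      return index;
    }
  }

  /** Returns every device when no type is requested, else only devices of that type. */
  static List<Device> filterByType(List<Device> devices, Optional<DeviceType> wanted) {
    if (!wanted.isPresent()) {
      return new ArrayList<>(devices); // no filter requested: copy the full list
    }
    DeviceType type = wanted.get();
    List<Device> result = new ArrayList<>();
    for (Device d : devices) {
      if (d.type() == type) { // enum constants can be compared by identity
        result.add(d);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<Device> all = Arrays.asList(
        new Device(DeviceType.CPU, 0),
        new Device(DeviceType.GPU, 0),
        new Device(DeviceType.GPU, 1));
    System.out.println("all: " + filterByType(all, Optional.empty()).size());
    System.out.println("gpu: " + filterByType(all, Optional.of(DeviceType.GPU)).size());
  }
}
```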

@karllessard
Collaborator

karllessard commented Dec 18, 2020

Thanks for the contribution, @tomburke-rse.

As @saudet said, a simple mvn install should compile and run all tests, but you can skip the 6-hour-long native build by adding the dev profile to your command, i.e. mvn install -Pdev. This will fetch prebuilt binaries for your platform.

@tomburke-rse
Author

Thanks for all the advice, everyone; I'll incorporate it asap.
Regarding the build: if I skip the tests, I can build with -Pdev. Otherwise, I get the same errors as the GitHub build pipeline. It would be great if someone with a deeper understanding could take a look.

@karllessard
Collaborator

mvn install -Pdev should build and run the tests without any trouble. Can you please share more context about the issues you are facing, e.g. an error message or a stacktrace?

@tomburke-rse
Author

As I mentioned, it is the same error as in the pipeline here, and it only affects the TensorFlowTest class.
Could the problem be that I am using a Windows PC?
Here is my console output, anyway:

Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.22.0:test (default-test) on project tensorflow-core-api: There are test failures.

Please refer to C:\mpicbg\workspace\tensorflow\java\tensorflow-core\tensorflow-core-api\target\surefire-reports for the individual test results.
Please refer to dump files (if any exist) [date]-jvmRun[N].dump, [date].dumpstream and [date]-jvmRun[N].dumpstream.
The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
Command was cmd.exe /X /C "C:\Users\burke\.jdks\adopt-openj9-1.8.0_275\jre\bin\java -jar C:\Users\burke\AppData\Local\Temp\surefire236563113746082396\surefirebooter5751859365434514212.jar C:\Users\burke\AppData\Local\Temp\surefire236563113746082396 2020-12-18T13-57-26_766-jvmRun1 surefire2445852067572510918tmp surefire_05950149004635894208tmp"
Error occurred in starting fork, check output in log
Process Exit Code: -1
Crashed tests:
org.tensorflow.TensorFlowTest
org.apache.maven.surefire.booter.SurefireBooterForkException: The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
Command was cmd.exe /X /C "C:\Users\burke\.jdks\adopt-openj9-1.8.0_275\jre\bin\java -jar C:\Users\burke\AppData\Local\Temp\surefire236563113746082396\surefirebooter5751859365434514212.jar C:\Users\burke\AppData\Local\Temp\surefire236563113746082396 2020-12-18T13-57-26_766-jvmRun1 surefire2445852067572510918tmp surefire_05950149004635894208tmp"
Error occurred in starting fork, check output in log
Process Exit Code: -1
Crashed tests:
org.tensorflow.TensorFlowTest
	at org.apache.maven.plugin.surefire.booterclient.ForkStarter.fork(ForkStarter.java:671)
	at org.apache.maven.plugin.surefire.booterclient.ForkStarter.fork(ForkStarter.java:533)
	at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:278)
	at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:244)
	at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1194)
	at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:1022)
	at org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:868)
	at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:137)
	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:210)
	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:156)
	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:148)
	at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:117)
	at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:81)
	at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:56)
	at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
	at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:305)
	at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:192)
	at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:105)
	at org.apache.maven.cli.MavenCli.execute(MavenCli.java:957)
	at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:289)
	at org.apache.maven.cli.MavenCli.main(MavenCli.java:193)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:282)
	at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:225)
	at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:406)
	at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:347)
	at org.codehaus.classworlds.Launcher.main(Launcher.java:47)

@karllessard
Collaborator

Just throwing out ideas here: I've personally never tried it with OpenJ9. Can you check whether you get the same error with a standard OpenJDK version? Also, can we check whether the test that crashes is yours by commenting out the custom op library test?

import org.tensorflow.internal.c_api.TF_Buffer;
import org.tensorflow.internal.c_api.TF_Library;
import org.tensorflow.internal.c_api.TF_Status;
import org.tensorflow.internal.c_api.*;
Collaborator

No star imports in the main classes.

@tomburke-rse
Author

My own test is indeed the problem. I'm not sure why yet, but I will figure it out in the next few days. Still a weird error.

@rnett
Contributor

rnett commented Dec 18, 2020

I've seen this before: when the native code hits a process-killing error, the process dies and Maven thinks it's a fork error. It should generate a dump file somewhere in the project with more info and the stack trace.
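
To track down those dump files, a small helper along these lines might be handy. It is only a sketch: the class name SurefireDumpFinder is made up, and the .dump and .dumpstream suffixes are taken from the Surefire message quoted earlier in this thread ("[date]-jvmRun[N].dump, [date].dumpstream").

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class SurefireDumpFinder {

  /** Recursively collects files ending in .dump or .dumpstream under the given root. */
  static List<Path> findDumps(Path root) throws IOException {
    if (!Files.isDirectory(root)) {
      return Collections.emptyList();
    }
    // Files.walk keeps directory handles open, so close the stream when done.
    try (Stream<Path> paths = Files.walk(root)) {
      return paths
          .filter(Files::isRegularFile)
          .filter(p -> {
            String name = p.getFileName().toString();
            return name.endsWith(".dump") || name.endsWith(".dumpstream");
          })
          .sorted()
          .collect(Collectors.toList());
    }
  }

  public static void main(String[] args) throws IOException {
    // Pass the project directory as the first argument; defaults to the current directory.
    Path root = Paths.get(args.length > 0 ? args[0] : ".");
    for (Path dump : findDumps(root)) {
      System.out.println(dump);
    }
  }
}
```

Running it from the project root should print any matching files, for example those under a module's target/surefire-reports directory.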

@karllessard
Collaborator

@tomburke-rse, are you unblocked now?

5 participants