TFFailedPreconditionException with BigGAN-deep model from TensorFlow Hub #365

Closed
AzizZayed opened this issue Aug 5, 2021 · 9 comments · Fixed by #376

@AzizZayed

System information

  • OS: macOS Big Sur 11.3.1
  • TensorFlow-java version: 0.3.1
  • Java version: I tried both 1.8 and 11

Issue

I can load this biggan-deep model in Java with SavedModelBundle.Loader, but when I run it I get:

TFFailedPreconditionException: Error while reading resource variable prev_truncation from Container: localhost. This could mean that the variable was uninitialized. Not found: Container localhost does not exist. (Could not find resource: localhost/prev_truncation)

The model works in Python. I initially encountered the issue when I used DJL to load and run the model. I tried it with TF-Java to see if the issue is in DJL or TF-Java.

Describe the expected behavior

The model should run fine, just as it does in Python.

Java code to reproduce the issue

The main method and loadModel(...) are the important parts; the rest are just tensor helpers.

public static void main(String[] args) {
    SavedModelBundle model = loadModel("src/test/resources/biggan-deep-128_1", new String[0]);

    int[] input = {100, 207, 971, 970, 933}; // image classes

    Tensor y = oneHot(input, 1000);
    Tensor z = truncNormal(input.length, 128);

    Tensor result =
            model.session().runner()
                    .feed("y", y)
                    .feed("z", z)
                    .feed("truncation", TFloat32.scalarOf(0.5f))
                    .fetch("G_trunc_output").run().get(0);
}

private static SavedModelBundle loadModel(String dir, String[] tags) {
    SavedModelBundle.Loader loader = SavedModelBundle.loader(dir);

    try {
        Field field = SavedModelBundle.Loader.class.getDeclaredField("tags");
        field.setAccessible(true);
        field.set(loader, tags);
    } catch (ReflectiveOperationException e) {
        throw new AssertionError(e);
    }

    return loader.load();
}

private static Tensor truncNormal(int row, int col) {
    float[][] dist = new float[row][col];
    Random random = new Random();

    for (int i = 0; i < row; i++) {
        for (int j = 0; j < col; j++) {
            double sample = random.nextGaussian();
            while (sample < -2 || sample > 2) {
                sample = random.nextGaussian();
            }
            dist[i][j] = (float) sample;
        }
    }
    return toTensor(Shape.of(row, col), dist);
}

private static Tensor oneHot(int[] input, int numCategories) {
    float[][] dist = new float[input.length][numCategories];
    for (int i = 0; i < input.length; i++) {
        float[] row = new float[numCategories];

        Arrays.fill(row, 0.0f);
        row[input[i]] = 1.0f;

        dist[i] = row;
    }
    return toTensor(Shape.of(input.length, numCategories), dist);
}

private static TFloat32 toTensor(Shape shape, float[][] data) {
    FloatNdArray mat = NdArrays.ofFloats(shape);
    for (int i = 0; i < data.length; i++) {
        // Copy each row as a plain NdArray so no intermediate native tensor is left unclosed.
        mat.set(NdArrays.vectorOf(data[i]), i);
    }
    return TFloat32.tensorOf(mat);
}

build.gradle:

dependencies {
    compile group: 'org.tensorflow', name: 'tensorflow-core-platform', version: '0.3.1'
}

Other info / logs

Console output:

> Task :Main.main()
2021-08-04 22:07:22.105450: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:32] Reading SavedModel from: src/test/resources/biggan-deep-128_1
2021-08-04 22:07:22.295033: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:55] Reading meta graph with tags {  }
2021-08-04 22:07:22.295055: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:93] Reading SavedModel debug info (if present) from: src/test/resources/biggan-deep-128_1
2021-08-04 22:07:22.295136: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:142] 
This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-08-04 22:07:23.514662: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:277] SavedModel load for tags {  }; Status: success: OK. Took 1409219 microseconds.
Exception in thread "main" org.tensorflow.exceptions.TFFailedPreconditionException: Error while reading resource variable prev_truncation from Container: localhost. 
This could mean that the variable was uninitialized. Not found: Container localhost does not exist. (Could not find resource: localhost/prev_truncation)
	 [[{{node Equal/ReadVariableOp}}]]
	at org.tensorflow.internal.c_api.AbstractTF_Status.throwExceptionIfNotOK(AbstractTF_Status.java:95)
	at org.tensorflow.Session.run(Session.java:691)
	at org.tensorflow.Session.access$100(Session.java:72)
	at org.tensorflow.Session$Runner.runHelper(Session.java:381)
	at org.tensorflow.Session$Runner.run(Session.java:329)
	at ai.tf.Main.main(Main.java:32)

> Task :Main.main() FAILED

Execution failed for task ':Main.main()'.
> Process 'command '/Library/Java/JavaVirtualMachines/amazon-corretto-8.jdk/Contents/Home/bin/java'' finished with non-zero exit value 1

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output. Run with --scan to get full insights.
@Craigacp
Collaborator

Craigacp commented Aug 5, 2021

I can replicate this. We know about the empty tags issue that you're working around with reflection, and that will be fixed soon. I'm not sure why the SavedModelBundle has uninitialized variables in it, though. You can work around it by running model.session().run("init"); before using the bundle.

We're currently working on the init ops to see if we can make it more compatible with TF Python's multiple incompatible ways of doing this, so we'll add this to the list of things to fix. FYI @rnett.

For future reference/when I need to Google this later, I found the init op name by dumping all the ops to a file and grepping for "init" using this code in jshell:

var itr = bundle.graph().operations();
try (PrintWriter w = new PrintWriter(new FileWriter("ops.txt"))) {
    while (itr.hasNext()) {
        var op = itr.next();
        w.println(""+op);
    }
}
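
The grep step can also be done in-process. The following is only a minimal sketch, assuming the operation names have already been collected into a list (for example by calling name() on each operation while iterating bundle.graph().operations()). InitOpFinder and findInitCandidates are hypothetical helper names, not part of TF-Java.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical helper, not part of TF-Java.
final class InitOpFinder {

    /** Returns the names containing "init" (case-insensitive), in input order. */
    static List<String> findInitCandidates(List<String> opNames) {
        return opNames.stream()
                .filter(name -> name.toLowerCase(Locale.ROOT).contains("init"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Stand-in names; a real list would come from the loaded graph.
        List<String> names = Arrays.asList("y", "z", "truncation", "init", "G_trunc_output");
        System.out.println(findInitCandidates(names));
    }
}
```

A substring match like this is deliberately broad, so the result is a list of candidates to inspect by hand rather than a list of ops that are safe to run.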

@AzizZayed
Author

Adding model.session().run("init"); worked wonders for this specific model. Thank you for that. However, if I add this line before running a model that already worked, will it break? In other words, does this fix do anything in the background that can break existing and working code?

@Craigacp
Collaborator

I suspect this issue might be specific to GANs because they generate things, but I'm not sure if the naming of init nodes is consistent across different models. There might be something in the protobuf which names the init node that we haven't discovered yet.

If the init op doesn't exist in the model you've loaded, the call will throw an exception. If it does exist, running it might re-initialize the weights and discard the trained values, so doing this blindly is a bad idea.
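
The missing-op case can at least be guarded against by checking for the op before running it. This is a minimal sketch with the TF-Java calls abstracted behind functional interfaces so it stands alone; GuardedInit and runIfPresent are hypothetical names. It does nothing about the re-initialization risk, which still needs a per-model decision.

```java
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical helper, not part of TF-Java.
final class GuardedInit {

    /**
     * Runs the named op only if the graph reports that it exists.
     *
     * @param opName     name of the candidate init op, e.g. "init"
     * @param graphHasOp stand-in for a graph lookup
     * @param runTarget  stand-in for running the op in the session
     * @return true if the op was found and run, false if it was skipped
     */
    static boolean runIfPresent(String opName,
                                Predicate<String> graphHasOp,
                                Consumer<String> runTarget) {
        if (!graphHasOp.test(opName)) {
            return false; // nothing to run, so no exception for a missing op
        }
        runTarget.accept(opName);
        return true;
    }
}
```

With TF-Java the two stand-ins would presumably be something like name -> model.graph().operation(name) != null and name -> model.session().run(name), assuming Graph.operation returns null for an unknown name.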

@rnett
Contributor

rnett commented Aug 11, 2021

That particular model is using TF1 stuff (see the deprecation notice at https://www.tensorflow.org/hub/api_docs/python/hub/Module), so I'm not sure whether that's something we can support automatically. I'll look into it though.

@Craigacp where is the ops.txt file you grepped from? Did you load the model in Java and then dump the names? I ask because I don't see any op with that name in the saved_model.pb file from Hub.

@Craigacp
Collaborator

Yes, I loaded the Saved Model then pulled out the graph and iterated the operations to generate a list of the ops.

@rnett
Contributor

rnett commented Aug 11, 2021

Ok, this will definitely be handled by SavedModelBundle.load for TF2 models, see https://github.com/tensorflow/tensorflow/blob/master/tensorflow/cc/saved_model/loader.cc#L428. It finds and executes the init op itself when the saved model is exported with one. I would like to figure out how to export our init scope as init ops.

TF does variable restoration (from assets) before running init ops, which imo seems backwards, so we'll have to work around that.

@pyrator

pyrator commented Aug 17, 2021

FYI @Craigacp @rnett I've noticed other TF1 models sometimes require multiple named init ops e.g.
https://tfhub.dev/google/faster_rcnn/openimages_v4/inception_resnet_v2/1 requires both
model.session().run("hub_input/index_to_string_1/table_init");
model.session().run("hub_input/index_to_string/table_init");

@Craigacp
Collaborator

Looks like the table initializers are stored in a specific place in TF Hub v1 models - https://github.com/tensorflow/hub/blob/67f62bc46f58ccceabeec6fd6799236675f50f04/tensorflow_hub/native_module.py#L486, referencing https://github.com/tensorflow/tensorflow/blob/5dcfc51118817f27fad5246812d83e5dccdc5f72/tensorflow/python/framework/ops.py#L6174. Then it initializes the rest of the variables from the checkpoint https://github.com/tensorflow/hub/blob/c8403953fdd429ea4c9ad1a96869eef4182a3b6f/tensorflow_hub/native_module.py#L458.

Presumably we could iterate the MetaGraphDef's collections to find the one that holds all the table initializers and execute that on startup if it exists. What do you think @rnett?
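
A rough sketch of that lookup, using a plain Map as a stand-in for the MetaGraphDef's collections so it runs without TF-Java. The key "table_initializer" is an assumption based on TF Python's GraphKeys.TABLE_INITIALIZERS, and TableInitLookup is a hypothetical name. A real implementation would read the collection from the MetaGraphDef proto instead.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of TF-Java.
final class TableInitLookup {

    // Assumed collection key, mirroring TF Python's GraphKeys.TABLE_INITIALIZERS.
    static final String TABLE_INITIALIZERS_KEY = "table_initializer";

    /**
     * Returns the op names stored under the table initializer key,
     * or an empty list if the model has no such collection.
     */
    static List<String> tableInitializers(Map<String, List<String>> collections) {
        return collections.getOrDefault(TABLE_INITIALIZERS_KEY, Collections.emptyList());
    }
}
```

Each returned name could then be run once at load time. An empty result means there is nothing to run, which covers models that have no lookup tables.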

@rnett
Contributor

rnett commented Aug 17, 2021

Should be easy enough to do, but I think it's better done as part of SavedModelBundle. Once init scope is merged I'm planning a PR to do the loading mentioned above, I can do this too. It looks like there's the INIT_OP key for the init op on tf v1 models, too, so I can do that as well.
