TFFailedPreconditionException with BigGAN-deep model from TensorFlow Hub #365

AzizZayed · 2021-08-05T05:33:22Z

System information

OS: macOS Big Sur 11.3.1
TensorFlow-java version: 0.3.1
Java version: I tried both 1.8 and 11

Issue

I can load this BigGAN-deep model into Java with SavedModelBundle.Loader, but when I run it I get:

TFFailedPreconditionException: Error while reading resource variable prev_truncation from Container: localhost. This could mean that the variable was uninitialized. Not found: Container localhost does not exist. (Could not find resource: localhost/prev_truncation)

The model works in Python. I initially encountered the issue when I used DJL to load and run the model. I tried it with TF-Java to see if the issue is in DJL or TF-Java.

Describe the expected behavior

The model should run fine, just like in Python.

Java code to reproduce the issue

The main method and loadModel(...) are the important parts; the rest are just tensor helpers.

public static void main(String[] args) {
    SavedModelBundle model = loadModel("src/test/resources/biggan-deep-128_1", new String[0]);

    int[] input = {100, 207, 971, 970, 933}; // image classes

    Tensor y = oneHot(input, 1000);
    Tensor z = truncNormal(input.length, 128);

    Tensor result =
            model.session().runner()
                    .feed("y", y)
                    .feed("z", z)
                    .feed("truncation", TFloat32.scalarOf(0.5f))
                    .fetch("G_trunc_output").run().get(0);
}

private static SavedModelBundle loadModel(String dir, String[] tags) {
    SavedModelBundle.Loader loader = SavedModelBundle.loader(dir);

    try {
        Field field = SavedModelBundle.Loader.class.getDeclaredField("tags");
        field.setAccessible(true);
        field.set(loader, tags);
    } catch (ReflectiveOperationException e) {
        throw new AssertionError(e);
    }

    return loader.load();
}

private static Tensor truncNormal(int row, int col) {
    float[][] dist = new float[row][col];
    Random random = new Random();

    for (int i = 0; i < row; i++) {
        for (int j = 0; j < col; j++) {
            double sample = random.nextGaussian();
            while (sample < -2 || sample > 2) {
                sample = random.nextGaussian();
            }
            dist[i][j] = (float) sample;
        }
    }
    return toTensor(Shape.of(row, col), dist);
}

private static Tensor oneHot(int[] input, int numCategories) {
    // Java zero-initializes float arrays, so only the "hot" entry needs to be set.
    float[][] dist = new float[input.length][numCategories];
    for (int i = 0; i < input.length; i++) {
        dist[i][input[i]] = 1.0f;
    }
    return toTensor(Shape.of(input.length, numCategories), dist);
}

private static TFloat32 toTensor(Shape shape, float[][] data) {
    FloatNdArray mat = NdArrays.ofFloats(shape);
    for (int i = 0; i < data.length; i++) {
        mat.set(TFloat32.vectorOf(data[i]), i);
    }
    return TFloat32.tensorOf(mat);
}

build.gradle:

dependencies {
    compile group: 'org.tensorflow', name: 'tensorflow-core-platform', version: '0.3.1'
}

Other info / logs

Console output:

> Task :Main.main()
2021-08-04 22:07:22.105450: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:32] Reading SavedModel from: src/test/resources/biggan-deep-128_1
2021-08-04 22:07:22.295033: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:55] Reading meta graph with tags {  }
2021-08-04 22:07:22.295055: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:93] Reading SavedModel debug info (if present) from: src/test/resources/biggan-deep-128_1
2021-08-04 22:07:22.295136: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:142] 
This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-08-04 22:07:23.514662: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:277] SavedModel load for tags {  }; Status: success: OK. Took 1409219 microseconds.
Exception in thread "main" org.tensorflow.exceptions.TFFailedPreconditionException: Error while reading resource variable prev_truncation from Container: localhost. 
This could mean that the variable was uninitialized. Not found: Container localhost does not exist. (Could not find resource: localhost/prev_truncation)
	 [[{{node Equal/ReadVariableOp}}]]
	at org.tensorflow.internal.c_api.AbstractTF_Status.throwExceptionIfNotOK(AbstractTF_Status.java:95)
	at org.tensorflow.Session.run(Session.java:691)
	at org.tensorflow.Session.access$100(Session.java:72)
	at org.tensorflow.Session$Runner.runHelper(Session.java:381)
	at org.tensorflow.Session$Runner.run(Session.java:329)
	at ai.tf.Main.main(Main.java:32)

> Task :Main.main() FAILED

Execution failed for task ':Main.main()'.
> Process 'command '/Library/Java/JavaVirtualMachines/amazon-corretto-8.jdk/Contents/Home/bin/java'' finished with non-zero exit value 1

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output. Run with --scan to get full insights.

Craigacp · 2021-08-05T13:50:43Z

I can replicate this. We know about the empty tags issue that you're working around with reflection, and that will be fixed soon, but I'm not sure why the SavedModelBundle has uninitialized variables in it. You can work around it by running model.session().run("init"); before using the bundle.

We're currently working on the init ops to see if we can make it more compatible with TF Python's multiple incompatible ways of doing this, so we'll add this to the list of things to fix. FYI @rnett.

For future reference/when I need to Google this later, I found the init op name by dumping all the ops to a file and grepping for "init" using this code in jshell:

var itr = bundle.graph().operations();
try (PrintWriter w = new PrintWriter(new FileWriter("ops.txt"))) {
    while (itr.hasNext()) {
        var op = itr.next();
        w.println(op);
    }
}
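The grep step above can also be done in Java. This is a minimal sketch, assuming the op names have already been collected as strings (for example by calling name() on each operation while iterating bundle.graph().operations()). InitOpFinder and findInitCandidates are hypothetical names used only for illustration; they are not part of TF-Java.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of TF-Java: filters op names the way
// `grep -i init ops.txt` would.
final class InitOpFinder {
    private InitOpFinder() {}

    /** Returns, in iteration order, every op name that contains "init" (case-insensitive). */
    static List<String> findInitCandidates(Iterator<String> opNames) {
        List<String> matches = new ArrayList<>();
        while (opNames.hasNext()) {
            String name = opNames.next();
            if (name.toLowerCase(Locale.ROOT).contains("init")) {
                matches.add(name);
            }
        }
        return matches;
    }
}
```

The result is only a list of candidates: a substring match will also pick up ops that merely have "init" somewhere in their name, so the list still needs a manual look.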

AzizZayed · 2021-08-10T00:04:52Z

Adding model.session().run("init"); worked wonders for this specific model. Thank you for that. However, if I add this line before running a model that already worked, will it break? In other words, does this fix do anything in the background that can break existing and working code?

Craigacp · 2021-08-10T00:41:57Z

I suspect this issue might be specific to GANs because they generate things, but I'm not sure if the naming of init nodes is consistent across different models. There might be something in the protobuf which names the init node that we haven't discovered yet.

If the op doesn't exist in the model you've loaded, the call will throw an exception; if it does exist, it might re-initialize the weights and wipe out the trained values, so doing this blindly is a bad idea.
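The "missing op throws" half of that concern can be avoided with an existence check before the run. The sketch below keeps the check and the run as plain functional interfaces so it compiles without TF-Java on the classpath; GuardedInit and runIfPresent are hypothetical names, and how they are wired to a real bundle is an assumption described after the block.

```java
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical helper, not part of TF-Java.
final class GuardedInit {
    private GuardedInit() {}

    /**
     * Runs the named op only if the graph reports that it exists.
     *
     * @param opName     name of the candidate init op, e.g. "init"
     * @param graphHasOp check for whether the loaded graph contains the op (supplied by the caller)
     * @param runOp      action that actually runs the op (supplied by the caller)
     * @return true if the op was found and run, false if it was skipped
     */
    static boolean runIfPresent(String opName, Predicate<String> graphHasOp, Consumer<String> runOp) {
        if (!graphHasOp.test(opName)) {
            return false; // nothing to initialize under this name
        }
        runOp.accept(opName);
        return true;
    }
}
```

With TF-Java the call might look like GuardedInit.runIfPresent("init", n -> bundle.graph().operation(n) != null, n -> bundle.session().run(n)), assuming Graph.operation(name) returns null for an unknown op. The guard does nothing about the second half of the concern: if the op exists and resets variables, it will still reset them, so this is only suitable for models already known to need the init call.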

rnett · 2021-08-11T23:18:28Z

That particular model is using TF1 stuff (see https://www.tensorflow.org/hub/api_docs/python/hub/Module deprecation notice), so I'm not sure whether that's something we can support automatically. I'll look into it though.

@Craigacp where did the ops.txt file you grepped come from? Did you load the model into Java and then dump the names? I ask because I don't see any op with that name in the saved_model.pb file from Hub.

Craigacp · 2021-08-11T23:23:59Z

Yes, I loaded the Saved Model then pulled out the graph and iterated the operations to generate a list of the ops.

rnett · 2021-08-11T23:31:51Z

Ok, this will definitely be handled by SavedModelBundle.load for TF2 models, see https://github.com/tensorflow/tensorflow/blob/master/tensorflow/cc/saved_model/loader.cc#L428. It finds and executes the init op itself when the saved model is exported with one. I would like to figure out how to export our init scope as init ops.

TF does variable restoration (from assets) before running init ops, which imo seems backwards, so we'll have to work around that.

pyrator · 2021-08-17T14:04:57Z

FYI @Craigacp @rnett I've noticed other TF1 models sometimes require multiple named init ops e.g.
https://tfhub.dev/google/faster_rcnn/openimages_v4/inception_resnet_v2/1 requires both
model.session().run("hub_input/index_to_string_1/table_init");
model.session().run("hub_input/index_to_string/table_init");

Craigacp · 2021-08-17T14:39:31Z

Looks like the table initializers are stored in a specific place in TF Hub v1 models - https://github.com/tensorflow/hub/blob/67f62bc46f58ccceabeec6fd6799236675f50f04/tensorflow_hub/native_module.py#L486, referencing https://github.com/tensorflow/tensorflow/blob/5dcfc51118817f27fad5246812d83e5dccdc5f72/tensorflow/python/framework/ops.py#L6174. Then it initializes the rest of the variables from the checkpoint https://github.com/tensorflow/hub/blob/c8403953fdd429ea4c9ad1a96869eef4182a3b6f/tensorflow_hub/native_module.py#L458.

Presumably we could iterate the MetaGraphDef's collections to find the one that holds all the table initializers and execute that on startup if it exists. What do you think @rnett?
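A rough sketch of that lookup, with heavy assumptions: the MetaGraphDef's collections are modeled here as a plain Map from collection key to node names (the real structure is a protobuf, not a Map), and the key "table_initializer" is my recollection of the TF1 graph key for table initializers, so treat it as unverified. InitCollections and tableInitializers are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of TF-Java. A Map stands in for the
// MetaGraphDef's collections (collection key -> node names).
final class InitCollections {
    // Assumption: key under which TF1 graphs record table initializer ops.
    static final String TABLE_INITIALIZERS_KEY = "table_initializer";

    private InitCollections() {}

    /** Returns the table initializer op names, or an empty list if the collection is absent. */
    static List<String> tableInitializers(Map<String, List<String>> collections) {
        List<String> ops = collections.getOrDefault(TABLE_INITIALIZERS_KEY, Collections.emptyList());
        return new ArrayList<>(ops); // defensive copy so callers cannot mutate the source map
    }
}
```

Each returned name could then be run once at startup. A model without the collection yields an empty list, so nothing is run for it.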

rnett · 2021-08-17T18:46:41Z

Should be easy enough to do, but I think it's better done as part of SavedModelBundle. Once init scope is merged I'm planning a PR to do the loading mentioned above, I can do this too. It looks like there's the INIT_OP key for the init op on tf v1 models, too, so I can do that as well.

SennriSyunnga mentioned this issue Aug 25, 2021

【JavaAPI】IllegalStateException happended while running a model loading from SavedModel and the graph instance cant close itself tensorflow/tensorflow#51648

Closed

rnett mentioned this issue Aug 28, 2021

Init exporting and loading #376

Merged

karllessard closed this as completed in #376 Oct 13, 2021
