Bug report: TensorFlow GPU version (CUDA) may cause thread blocking when calling scripts across languages (C#) #46952

Open
RickyYCheng opened this issue Feb 5, 2021 · 0 comments
Assignees: jvishnuvardhan
Labels: comp:gpu (GPU related issues), stat:awaiting tensorflower (awaiting response from tensorflower), TF 2.4 (for issues related to TF 2.4), type:bug (Bug)

RickyYCheng commented Feb 5, 2021

Actually I'm not sure whether this is a bug, because I've already solved my problem, but I'll still file a report.

  1. Basically, I wrote a Python script to generate a TFLite model.
  2. I tried to run the test code below, and it worked well.
import threading
from tensorflow import keras
import tensorflow as tf
import numpy as np
import sys
import time
import asyncio

def load_interpreter(model_dir:str):
    interpreter = tf.lite.Interpreter(model_path=model_dir)
    interpreter.allocate_tensors()
    return interpreter
def classify_single_image(img_dir:str, interpreter:tf.lite.Interpreter):
    start_time = time.time()
    img = keras.preprocessing.image.load_img(img_dir, target_size=(160, 160))
    img_array = keras.preprocessing.image.img_to_array(img)
    img_array = np.expand_dims(img_array, 0)
    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()
    interpreter.set_tensor(input_details[0]['index'], img_array)
    interpreter.invoke()
    output_data = np.squeeze(interpreter.get_tensor(output_details[0]['index']))
    probs = tf.nn.softmax(output_data)
    probs = np.round(probs * 100, 2)
    elapsed_time = time.time()
    result = "{}+{:.2f}".format(probs, (elapsed_time - start_time) * 1000)
    print(result, flush=True)

interpreter = load_interpreter("trash_model_lite.tflite")
threading.Thread(classify_single_image("1.jpg", interpreter)).start()
threading.Thread(classify_single_image("2.jpg", interpreter)).start()
threading.Thread(classify_single_image("3.jpg", interpreter)).start()
threading.Thread(classify_single_image("4.jpg", interpreter)).start()
threading.Thread(classify_single_image("5.jpg", interpreter)).start()
# Obviously, if you keep doing it this way, the interpreter is not thread-safe. A lock is needed.
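A side note on the calls above: threading.Thread(classify_single_image("1.jpg", interpreter)) runs the function before the Thread object is even constructed, so the five classifications actually execute one after another. If they were dispatched as real threads sharing one interpreter, the lock mentioned in the comment would be needed. A minimal sketch of that, reusing the definitions above (same hypothetical model and image files):

classify_lock = threading.Lock()

def classify_locked(img_dir: str, interpreter: tf.lite.Interpreter):
    # tf.lite.Interpreter is not safe for concurrent invoke() calls,
    # so serialize access with a lock.
    with classify_lock:
        classify_single_image(img_dir, interpreter)

threads = [
    threading.Thread(target=classify_locked, args=("{}.jpg".format(i), interpreter))
    for i in range(1, 6)
]
for t in threads:
    t.start()
for t in threads:
    t.join()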
  3. Then I tried to call it from my C# project, simply by launching python.exe and letting it run the script.
  4. The problem is that C# cannot get any result; I'll give the C# code at the end, since this is a TF bug report.
  5. I tried to find where the problem is, just by adding print("n", flush=True) after every slice of code.
  6. The result shows that the tensorflow calls will "block" the program (maybe not truly block, but the output stops there), for example:
    print("0", flush=True) # will see 0 in the python console and C# console
    img_array = tf.expand_dims(img_array, 0)
    print("1", flush=True) # will see 1 printed in the python console but not in C# console
  7. Trying to avoid tensorflow confirms the hypothesis:
    print("0", flush=True) # 0 in python and C#
    img_array = np.expand_dims(img_array, 0)
    print("1", flush=True) # 1 in python and C#
......
    probs = tf.nn.softmax(output_data) # still wrong
  8. I then suspected that the tensorflow GPU module itself might be at fault, so I installed the CPU version and got the correct answer by working on CPU only (an alternative is sketched right after the snippet):
import os
os.environ["CUDA_VISIBLE_DEVICES"]="-1"
import tensorflow as tf
......
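As the alternative mentioned above, here is a small sketch of what I believe is an equivalent setup: tf.config.set_visible_devices can hide the GPUs at runtime instead of the environment variable, and TF_CPP_MIN_LOG_LEVEL reduces the native log lines the GPU build writes to stderr, which may matter when the caller redirects stderr without draining it. Both must happen before TensorFlow touches the GPU:

import os
# Quiet TensorFlow's C++ logging before the import
# (1 filters INFO, 2 also filters WARNING, 3 also filters ERROR).
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"
import tensorflow as tf

# Runtime alternative to CUDA_VISIBLE_DEVICES=-1: hide every GPU from TF.
# Must run before any op initializes the GPU.
tf.config.set_visible_devices([], "GPU")
print(tf.config.get_visible_devices("GPU"), flush=True)  # expected: []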
  9. After deleting the print statements, I got almost the same answer in both C# and Python, which is correct once the empty lines are removed:
[  0.   0.   0. 100.]+960.62
[9.997e+01 0.000e+00 0.000e+00 2.000e-02]+31.92
[  0. 100.   0.   0.]+35.90
[ 0.35  0.2  41.75 57.7 ]+32.91
[  0.   0. 100.   0.]+33.91
[  0.   0.   0. 100.]+382.50
[9.997e+01 0.000e+00 0.000e+00 2.000e-02]+31.99
[  0. 100.   0.   0.]+31.91
[ 0.35  0.2  41.75 57.7 ]+31.96
[  0.   0. 100.   0.]+32.90
  10. If I delete the code that disables CUDA, the C# program still gets nothing. So the workaround is to just work on CPU, and I think this may be a bug (or maybe a configuration problem).

THESE ARE MY SYSTEM CONFIGS:
OS: Windows 10
Python: 3.8, downloaded from python.org
TensorFlow: GPU == 2.4.1, CPU == 2.4.1 (installed after the bug appeared), with tflite, installed via pip
Keras: installed with TF
IDE or text editor: VSCode, Visual Studio
CUDA: 11.0
cuDNN: the version matching CUDA 11.0
GPU: laptop GPU -- MX250
Bazel: no Bazel
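
To confirm which build is actually loaded, a quick check at the top of the script could look like this (a sketch; these calls exist in TF 2.4):

import tensorflow as tf
print("TF version:", tf.__version__, flush=True)
print("Built with CUDA:", tf.test.is_built_with_cuda(), flush=True)
print("Visible GPUs:", tf.config.list_physical_devices("GPU"), flush=True)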

Config with C#:
.NET: .NET 5 or .NET Core 3.1
Project: dotnet new console
NuGet packages: none

C# code:

using System;
using System.Diagnostics;

namespace ConsoleApp1
{
    class Program
    {
        static void Main(string[] args)
        {
            string[] args_array = new string[2];
            string pyDir = "test.py";
            args_array[0] = "";
            args_array[1] = "";
            RunPythonScript(pyDir,"", args_array);
        }
        public static void RunPythonScript(string sArgName, string args = "", params string[] teps)
        {
            Process p = new Process();
            p.StartInfo.FileName = @"C:/Program Files/Python38/python.exe";
            string sArguments = sArgName;
            foreach (string sigstr in teps)
            {
                sArguments += " " + sigstr;
            }
            sArguments += " " + args;
            p.StartInfo.Arguments = sArguments;
            p.StartInfo.UseShellExecute = false;
            p.StartInfo.RedirectStandardOutput = true;
            p.StartInfo.RedirectStandardInput = true;
            p.StartInfo.RedirectStandardError = true;
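            // Note: stderr is redirected here but never read below. The GPU build of
            // TensorFlow writes many log lines to stderr, so the Python child may stall
            // once that pipe's buffer fills; draining it (e.g. an ErrorDataReceived
            // handler plus BeginErrorReadLine) would rule this possibility out.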
            p.StartInfo.CreateNoWindow = true;
            p.OutputDataReceived += new DataReceivedEventHandler(p_OutputDataReceived);
            p.Start();
            p.BeginOutputReadLine();
            p.WaitForExit();
            Console.ReadLine();
        }
        static void p_OutputDataReceived(object sender, DataReceivedEventArgs e)
        {
            if (!string.IsNullOrEmpty(e.Data))
            {
                AppendText(e.Data + Environment.NewLine);
            }
        }
        public delegate void AppendTextCallback(string text);
        public static void AppendText(string text)
        {
            Console.WriteLine(text);
        }
    }
}

I hope this helps you solve your problem in the same way; that's why I made this report. Have a good time! :)

@RickyYCheng RickyYCheng added the type:bug Bug label Feb 5, 2021
@ravikyram ravikyram added comp:gpu GPU related issues TF 2.4 for issues related to TF 2.4 labels Feb 8, 2021
@ravikyram ravikyram assigned jvishnuvardhan and unassigned ravikyram Feb 8, 2021
@jvishnuvardhan jvishnuvardhan added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Feb 9, 2021