Workflows fail to stop while workflow inference profiling is enabled (readonly filesystem) #1110
Comments
Looks like stopping inference causes all future API requests to time out, even with profiling disabled. The /terminate call returns a 200 OK and then the server locks up, with no logs or errors. Might be a different issue; unclear.
hi there - thanks for reporting. Could you provide more context? Are you running video inference or inference against images? What is the sequence of actions that leads to the error?
Sure thing. I set up the template Time in Zone workflow and am running video inference on an RTSP stream with it. I can successfully start inference, and the workflow runs. However, when I try to terminate the pipeline, one of the following cases occurs:
Case 1: If
Case 2: If
The request never reaches the handler function. I added logging before anything is done with /initialise, and it never hits. Very weird.
Replication
Inference code for reference:
from inference_sdk import InferenceHTTPClient
import atexit
import time
client = InferenceHTTPClient(
api_url="http://192.168.1.197:9001",
api_key=""
)
max_fps = 1
result = client.start_inference_pipeline_with_workflow(
video_reference=["rtsp://192.168.0.197:554/cam/realmonitor?channel=1&subtype=1"],
workspace_name="local",
workflow_id="time-in-zone",
max_fps=max_fps,
results_buffer_size=5, # results are consumed from in-memory buffer - optionally you can control its size
)
print(result)
pipeline_id = result["context"]["pipeline_id"]
# Terminate the pipeline when the script exits
atexit.register(lambda: client.terminate_inference_pipeline(pipeline_id))
while True:
    result = client.consume_inference_pipeline_result(pipeline_id=pipeline_id)
    if not result["outputs"] or not result["outputs"][0]:
        # still initializing
        time.sleep(1 / max_fps)
        continue
    output = result["outputs"][0]
    print(output["time_in_zone"])
    time.sleep(1 / max_fps)
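Since the reported symptom is a /terminate call that returns 200 OK and then hangs, the atexit hook above could block the script from ever exiting. A minimal sketch of a defensive workaround (the `terminate_with_timeout` helper is hypothetical, not part of the inference_sdk API) is to run the terminate call on a daemon thread and stop waiting after a deadline:

```python
import threading

def terminate_with_timeout(terminate_fn, timeout_s=5.0):
    """Call terminate_fn on a daemon thread; give up after timeout_s seconds."""
    result = {}

    def worker():
        result["value"] = terminate_fn()

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    t.join(timeout=timeout_s)
    # None means the call did not return in time; the daemon thread
    # will not keep the interpreter alive, so the script can still exit.
    return result.get("value")
```

It could then replace the bare atexit registration, e.g. `atexit.register(lambda: terminate_with_timeout(lambda: client.terminate_inference_pipeline(pipeline_id)))`.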
Looks like Case 2 might be hanging while waiting for a response from the pipeline socket when trying to initialize the pipeline a second time. I am able to get logs to appear if I hit a different endpoint (e.g. /workflows/definition/schema). Successful command
Hanging initialize command
Edit: I have confirmed this is the case. I set
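The hang described above is the classic failure mode of a blocking socket read with no deadline. As an illustration only (this is a generic sketch, not the actual inter-process protocol the server uses), setting a timeout on the socket makes such a request fail fast instead of blocking forever:

```python
import socket

def request_with_timeout(host, port, payload, timeout_s=5.0):
    """Send payload and read one reply, failing fast instead of blocking forever."""
    with socket.create_connection((host, port), timeout=timeout_s) as sock:
        sock.settimeout(timeout_s)  # deadline also applies to recv()
        sock.sendall(payload)
        try:
            return sock.recv(4096)
        except socket.timeout:
            # A dead pipeline process would surface here as an error,
            # rather than hanging the whole API handler.
            raise RuntimeError("pipeline socket did not respond within timeout")
```

With a pattern like this, a second /initialise against an unresponsive pipeline socket would raise quickly rather than stalling every subsequent request.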
ok, cool - thanks for all of the evidence - will take a look when I have a moment, but cannot promise a strict deadline. Is that something you can live with for a while (I mean keeping the workaround set)?
this PR should bring the solution: #1123
hi there - would you be able to verify on your end?
sorry - the fix did not actually land. We did not merge the change with the fix into main before the release, as it was on a branch with other changes I had not finished. Sorry for the confusion.
The fix does seem to resolve the profiling issue. I am still only able to run a single pipeline before it hangs, though - it hangs when trying to start again. I can open a separate issue for that; thanks for the profiling fix.
we had a PR this week that may have addressed this - please open a new issue with more details if the problem is still there after the next release (planned for the end of this week)
Search before asking
Bug
inference/inference/core/interfaces/stream/inference_pipeline.py, line 484 (commit 295ca68)
Because the default profiling directory is "./inference_profiling" and workflow profiling is enabled by default, stopping workflow inference fails: it errors when trying to write to this folder on a read-only filesystem. Disabling workflow profiling via the
ENABLE_WORKFLOWS_PROFILING=False
environment variable is sufficient to resolve the issue. This is a workaround; I assume a proper fix would be to write to /tmp to support profiling? Not sure.
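Following the /tmp suggestion above, a minimal sketch of what such a fix could look like (the `resolve_profiling_dir` helper and its probe-file approach are assumptions, not the library's actual code): try the default directory, and fall back to a writable temp location if the filesystem is read-only.

```python
import os
import tempfile

def resolve_profiling_dir(preferred="./inference_profiling"):
    """Return preferred if it can be created and written to, else a tmp fallback."""
    try:
        os.makedirs(preferred, exist_ok=True)
        # Probe writability explicitly: makedirs can succeed on an existing
        # directory that still rejects writes (e.g. a read-only mount).
        probe = os.path.join(preferred, ".write_probe")
        with open(probe, "w"):
            pass
        os.remove(probe)
        return preferred
    except OSError:
        fallback = os.path.join(tempfile.gettempdir(), "inference_profiling")
        os.makedirs(fallback, exist_ok=True)
        return fallback
```

With this shape, profiling would degrade gracefully on read-only filesystems instead of breaking pipeline termination.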
Environment
Inference: 0.44.0
OS: Ubuntu Server 24.04
Minimal Reproducible Example
Preview any workflow and attempt to stop it.
Additional
No response
Are you willing to submit a PR?