
Ollama timeout still doesn't work as expected #1614

Open
@maxw1489

Description

@maxw1489
  • This is actually a bug report.
  • I am not getting good LLM results.
  • I have tried asking for help in the community on Discord or in Discussions and have not received a response.
  • I have tried searching the documentation and have not found an answer.

What Model are you using?

  • gpt-3.5-turbo
  • gpt-4-turbo
  • gpt-4
  • Other (Ollama mistral-small:24B)

Describe the bug
I tested the bugfix from #1597. Unfortunately, the bug persists: the client still iterates through all retries instead of honoring the global timeout.

To Reproduce
I reproduced it with a slightly modified example from docs/integrations/ollama.md:

import logging
logging.basicConfig(level=logging.DEBUG)
from openai import OpenAI
from pydantic import BaseModel
import instructor


class Character(BaseModel):
    name: str
    age: int


client = instructor.from_openai(
    OpenAI(
        base_url="http://10.10.10.115:11434/v1",
        api_key="ollama",  # required, but unused
    ),
    mode=instructor.Mode.JSON,
)

resp = client.chat.completions.create(
    model="mistral-small:24b-9k",
    messages=[
        {
            "role": "user",
            "content": "Tell me about Harry Potter",
        }
    ],
    response_model=Character,
    max_retries=2,
    timeout=1.0,  # Total timeout across all retry attempts
)
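To make the failure measurable rather than eyeballing the logs, the `create()` call above can be wrapped in a small timing check. This is a hypothetical helper (`elapsed_within` is not part of instructor or openai), shown only to pin down what "accepting the global timeout" should mean in wall-clock terms:

```python
import time


def elapsed_within(fn, budget_s, slack=1.5):
    """Run fn, swallow any exception it raises, and report whether the
    total wall-clock time stayed within budget_s * slack seconds.
    Hypothetical test helper, not part of any library."""
    start = time.monotonic()
    try:
        fn()
    except Exception:
        pass
    return (time.monotonic() - start) <= budget_s * slack
```

With a working global `timeout=1.0`, wrapping the `create()` call should return `True`; in the buggy run it takes several seconds (three attempts plus backoff waits) and returns `False`.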

Here are the logs:
DEBUG:instructor:Patching client.chat.completions.create with mode=<Mode.JSON: 'json_mode'>
DEBUG:instructor:Instructor Request: mode.value='json_mode', response_model=<class '__main__.Character'>, new_kwargs={'messages': [{'role': 'system', 'content': '\n As a genius expert, your task is to understand the content and provide\n the parsed objects in json that match the following json_schema:\n\n\n {\n "properties": {\n "name": {\n "title": "Name",\n "type": "string"\n },\n "age": {\n "title": "Age",\n "type": "integer"\n }\n },\n "required": [\n "name",\n "age"\n ],\n "title": "Character",\n "type": "object"\n}\n\n Make sure to return an instance of the JSON, not the schema itself\n'}, {'role': 'user', 'content': 'Tell me about Harry Potter'}], 'model': 'mistral-small:24b-9k', 'timeout': 1.0, 'response_format': {'type': 'json_object'}}
DEBUG:instructor:max_retries: 2, timeout: 1.0
DEBUG:instructor:Retrying, attempt: 1
DEBUG:openai._base_client:Request options: {'method': 'post', 'url': '/chat/completions', 'timeout': 1.0, 'files': None, 'idempotency_key': 'stainless-python-retry-583e0c32-bf69-4644-8654-9b148cf38734', 'json_data': {'messages': [{'role': 'system', 'content': '\n As a genius expert, your task is to understand the content and provide\n the parsed objects in json that match the following json_schema:\n\n\n {\n "properties": {\n "name": {\n "title": "Name",\n "type": "string"\n },\n "age": {\n "title": "Age",\n "type": "integer"\n }\n },\n "required": [\n "name",\n "age"\n ],\n "title": "Character",\n "type": "object"\n}\n\n Make sure to return an instance of the JSON, not the schema itself\n'}, {'role': 'user', 'content': 'Tell me about Harry Potter'}], 'model': 'mistral-small:24b-9k', 'response_format': {'type': 'json_object'}}}
DEBUG:openai._base_client:Sending HTTP Request: POST http://10.10.10.115:11434/v1/chat/completions
DEBUG:httpcore.connection:connect_tcp.started host='10.10.10.115' port=11434 local_address=None timeout=1.0 socket_options=None
DEBUG:httpcore.connection:connect_tcp.complete return_value=<httpcore._backends.sync.SyncStream object at 0x7be5ae49a000>
DEBUG:httpcore.http11:send_request_headers.started request=<Request [b'POST']>
DEBUG:httpcore.http11:send_request_headers.complete
DEBUG:httpcore.http11:send_request_body.started request=<Request [b'POST']>
DEBUG:httpcore.http11:send_request_body.complete
DEBUG:httpcore.http11:receive_response_headers.started request=<Request [b'POST']>
DEBUG:httpcore.http11:receive_response_headers.failed exception=ReadTimeout(TimeoutError('timed out'))
DEBUG:httpcore.http11:response_closed.started
DEBUG:httpcore.http11:response_closed.complete
DEBUG:openai._base_client:Encountered httpx.TimeoutException
Traceback (most recent call last):
File "/usr/local/lib/python3.12/site-packages/httpx/_transports/default.py", line 101, in map_httpcore_exceptions
yield
File "/usr/local/lib/python3.12/site-packages/httpx/_transports/default.py", line 250, in handle_request
resp = self._pool.handle_request(req)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/httpcore/_sync/connection_pool.py", line 256, in handle_request
raise exc from None
File "/usr/local/lib/python3.12/site-packages/httpcore/_sync/connection_pool.py", line 236, in handle_request
response = connection.handle_request(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/httpcore/_sync/connection.py", line 103, in handle_request
return self._connection.handle_request(request)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/httpcore/_sync/http11.py", line 136, in handle_request
raise exc
File "/usr/local/lib/python3.12/site-packages/httpcore/_sync/http11.py", line 106, in handle_request
) = self._receive_response_headers(**kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/httpcore/_sync/http11.py", line 177, in _receive_response_headers
event = self._receive_event(timeout=timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/httpcore/_sync/http11.py", line 217, in _receive_event
data = self._network_stream.read(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/httpcore/_backends/sync.py", line 126, in read
with map_exceptions(exc_map):
^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/contextlib.py", line 158, in __exit__
self.gen.throw(value)
File "/usr/local/lib/python3.12/site-packages/httpcore/_exceptions.py", line 14, in map_exceptions
raise to_exc(exc) from exc
httpcore.ReadTimeout: timed out

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/usr/local/lib/python3.12/site-packages/openai/_base_client.py", line 969, in request
response = self._client.send(
^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/httpx/_client.py", line 914, in send
response = self._send_handling_auth(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/httpx/_client.py", line 942, in _send_handling_auth
response = self._send_handling_redirects(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/httpx/_client.py", line 979, in _send_handling_redirects
response = self._send_single_request(request)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/httpx/_client.py", line 1014, in _send_single_request
response = transport.handle_request(request)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/httpx/_transports/default.py", line 249, in handle_request
with map_httpcore_exceptions():
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/contextlib.py", line 158, in __exit__
self.gen.throw(value)
File "/usr/local/lib/python3.12/site-packages/httpx/_transports/default.py", line 118, in map_httpcore_exceptions
raise mapped_exc(message) from exc
httpx.ReadTimeout: timed out
DEBUG:openai._base_client:2 retries left
INFO:openai._base_client:Retrying request to /chat/completions in 0.475531 seconds
DEBUG:openai._base_client:Request options: {'method': 'post', 'url': '/chat/completions', 'timeout': 1.0, 'files': None, 'idempotency_key': 'stainless-python-retry-583e0c32-bf69-4644-8654-9b148cf38734', 'json_data': {'messages': [{'role': 'system', 'content': '\n As a genius expert, your task is to understand the content and provide\n the parsed objects in json that match the following json_schema:\n\n\n {\n "properties": {\n "name": {\n "title": "Name",\n "type": "string"\n },\n "age": {\n "title": "Age",\n "type": "integer"\n }\n },\n "required": [\n "name",\n "age"\n ],\n "title": "Character",\n "type": "object"\n}\n\n Make sure to return an instance of the JSON, not the schema itself\n'}, {'role': 'user', 'content': 'Tell me about Harry Potter'}], 'model': 'mistral-small:24b-9k', 'response_format': {'type': 'json_object'}}}
DEBUG:openai._base_client:Sending HTTP Request: POST http://10.10.10.115:11434/v1/chat/completions
DEBUG:httpcore.connection:connect_tcp.started host='10.10.10.115' port=11434 local_address=None timeout=1.0 socket_options=None
DEBUG:httpcore.connection:connect_tcp.complete return_value=<httpcore._backends.sync.SyncStream object at 0x7be5adf56030>
DEBUG:httpcore.http11:send_request_headers.started request=<Request [b'POST']>
DEBUG:httpcore.http11:send_request_headers.complete
DEBUG:httpcore.http11:send_request_body.started request=<Request [b'POST']>
DEBUG:httpcore.http11:send_request_body.complete
DEBUG:httpcore.http11:receive_response_headers.started request=<Request [b'POST']>
DEBUG:httpcore.http11:receive_response_headers.failed exception=ReadTimeout(TimeoutError('timed out'))
DEBUG:httpcore.http11:response_closed.started
DEBUG:httpcore.http11:response_closed.complete
DEBUG:openai._base_client:Encountered httpx.TimeoutException
Traceback (most recent call last):
File "/usr/local/lib/python3.12/site-packages/httpx/_transports/default.py", line 101, in map_httpcore_exceptions
yield
File "/usr/local/lib/python3.12/site-packages/httpx/_transports/default.py", line 250, in handle_request
resp = self._pool.handle_request(req)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/httpcore/_sync/connection_pool.py", line 256, in handle_request
raise exc from None
File "/usr/local/lib/python3.12/site-packages/httpcore/_sync/connection_pool.py", line 236, in handle_request
response = connection.handle_request(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/httpcore/_sync/connection.py", line 103, in handle_request
return self._connection.handle_request(request)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/httpcore/_sync/http11.py", line 136, in handle_request
raise exc
File "/usr/local/lib/python3.12/site-packages/httpcore/_sync/http11.py", line 106, in handle_request
) = self._receive_response_headers(**kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/httpcore/_sync/http11.py", line 177, in _receive_response_headers
event = self._receive_event(timeout=timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/httpcore/_sync/http11.py", line 217, in _receive_event
data = self._network_stream.read(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/httpcore/_backends/sync.py", line 126, in read
with map_exceptions(exc_map):
^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/contextlib.py", line 158, in __exit__
self.gen.throw(value)
File "/usr/local/lib/python3.12/site-packages/httpcore/_exceptions.py", line 14, in map_exceptions
raise to_exc(exc) from exc
httpcore.ReadTimeout: timed out

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/usr/local/lib/python3.12/site-packages/openai/_base_client.py", line 969, in request
response = self._client.send(
^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/httpx/_client.py", line 914, in send
response = self._send_handling_auth(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/httpx/_client.py", line 942, in _send_handling_auth
response = self._send_handling_redirects(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/httpx/_client.py", line 979, in _send_handling_redirects
response = self._send_single_request(request)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/httpx/_client.py", line 1014, in _send_single_request
response = transport.handle_request(request)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/httpx/_transports/default.py", line 249, in handle_request
with map_httpcore_exceptions():
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/contextlib.py", line 158, in __exit__
self.gen.throw(value)
File "/usr/local/lib/python3.12/site-packages/httpx/_transports/default.py", line 118, in map_httpcore_exceptions
raise mapped_exc(message) from exc
httpx.ReadTimeout: timed out
DEBUG:openai._base_client:1 retry left
INFO:openai._base_client:Retrying request to /chat/completions in 0.785212 seconds
DEBUG:openai._base_client:Request options: {'method': 'post', 'url': '/chat/completions', 'timeout': 1.0, 'files': None, 'idempotency_key': 'stainless-python-retry-583e0c32-bf69-4644-8654-9b148cf38734', 'json_data': {'messages': [{'role': 'system', 'content': '\n As a genius expert, your task is to understand the content and provide\n the parsed objects in json that match the following json_schema:\n\n\n {\n "properties": {\n "name": {\n "title": "Name",\n "type": "string"\n },\n "age": {\n "title": "Age",\n "type": "integer"\n }\n },\n "required": [\n "name",\n "age"\n ],\n "title": "Character",\n "type": "object"\n}\n\n Make sure to return an instance of the JSON, not the schema itself\n'}, {'role': 'user', 'content': 'Tell me about Harry Potter'}], 'model': 'mistral-small:24b-9k', 'response_format': {'type': 'json_object'}}}
DEBUG:openai._base_client:Sending HTTP Request: POST http://10.10.10.115:11434/v1/chat/completions
DEBUG:httpcore.connection:connect_tcp.started host='10.10.10.115' port=11434 local_address=None timeout=1.0 socket_options=None
DEBUG:httpcore.connection:connect_tcp.complete return_value=<httpcore._backends.sync.SyncStream object at 0x7be5adf56b10>
DEBUG:httpcore.http11:send_request_headers.started request=<Request [b'POST']>
DEBUG:httpcore.http11:send_request_headers.complete
DEBUG:httpcore.http11:send_request_body.started request=<Request [b'POST']>
DEBUG:httpcore.http11:send_request_body.complete
DEBUG:httpcore.http11:receive_response_headers.started request=<Request [b'POST']>
DEBUG:httpcore.http11:receive_response_headers.complete return_value=(b'HTTP/1.1', 500, b'Internal Server Error', [(b'Content-Type', b'application/json'), (b'Date', b'Wed, 18 Jun 2025 17:32:37 GMT'), (b'Content-Length', b'119')])
INFO:httpx:HTTP Request: POST http://10.10.10.115:11434/v1/chat/completions "HTTP/1.1 500 Internal Server Error"
DEBUG:httpcore.http11:receive_response_body.started request=<Request [b'POST']>
DEBUG:httpcore.http11:receive_response_body.complete
DEBUG:httpcore.http11:response_closed.started
DEBUG:httpcore.http11:response_closed.complete
DEBUG:openai._base_client:HTTP Response: POST http://10.10.10.115:11434/v1/chat/completions "500 Internal Server Error" Headers({'content-type': 'application/json', 'date': 'Wed, 18 Jun 2025 17:32:37 GMT', 'content-length': '119'})
DEBUG:openai._base_client:request_id: None
DEBUG:openai._base_client:Encountered httpx.HTTPStatusError
Traceback (most recent call last):
File "/usr/local/lib/python3.12/site-packages/openai/_base_client.py", line 1014, in request
response.raise_for_status()
File "/usr/local/lib/python3.12/site-packages/httpx/_models.py", line 829, in raise_for_status
raise HTTPStatusError(message, request=request, response=self)
httpx.HTTPStatusError: Server error '500 Internal Server Error' for url 'http://10.10.10.115:11434/v1/chat/completions'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/500
DEBUG:openai._base_client:Re-raising status error
DEBUG:instructor:Retry error: RetryError[<Future at 0x7be5adf57aa0 state=finished raised InternalServerError>]
Traceback (most recent call last):
File "/usr/local/lib/python3.12/site-packages/instructor/retry.py", line 184, in retry_sync
response = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/openai/_utils/_utils.py", line 287, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/openai/resources/chat/completions/completions.py", line 925, in create
return self._post(
^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/openai/_base_client.py", line 1239, in post
return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/openai/_base_client.py", line 1034, in request
raise self._make_status_error_from_response(err.response) from None
openai.InternalServerError: Error code: 500 - {'error': {'message': 'unexpected server status: llm server loading model', 'type': 'api_error', 'param': None, 'code': None}}

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/usr/local/lib/python3.12/site-packages/instructor/retry.py", line 179, in retry_sync
for attempt in max_retries:
^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/tenacity/__init__.py", line 445, in __iter__
do = self.iter(retry_state=retry_state)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/tenacity/__init__.py", line 378, in iter
result = action(retry_state)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/tenacity/__init__.py", line 421, in exc_check
raise retry_exc from fut.exception()
tenacity.RetryError: RetryError[<Future at 0x7be5adf57aa0 state=finished raised InternalServerError>]

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/usr/local/lib/python3.12/runpy.py", line 198, in _run_module_as_main
return _run_code(code, main_globals, None,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/runpy.py", line 88, in _run_code
exec(code, run_globals)
File "/home/ai/.cursor-server/extensions/ms-python.debugpy-2025.8.0-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/__main__.py", line 71, in <module>
cli.main()
File "/home/ai/.cursor-server/extensions/ms-python.debugpy-2025.8.0-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 501, in main
run()
File "/home/ai/.cursor-server/extensions/ms-python.debugpy-2025.8.0-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 351, in run_file
runpy.run_path(target, run_name="__main__")
File "/home/ai/.cursor-server/extensions/ms-python.debugpy-2025.8.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 310, in run_path
return _run_module_code(code, init_globals, run_name, pkg_name=pkg_name, script_name=fname)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ai/.cursor-server/extensions/ms-python.debugpy-2025.8.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 127, in _run_module_code
_run_code(code, mod_globals, init_globals, mod_name, mod_spec, pkg_name, script_name)
File "/home/ai/.cursor-server/extensions/ms-python.debugpy-2025.8.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 118, in _run_code
exec(code, run_globals)
File "/home/ai/dev/dev/model_validator/lithon/production_pipeline/retry.py", line 24, in <module>
resp = client.chat.completions.create(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/instructor/client.py", line 366, in create
return self.create_fn(
^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/instructor/patch.py", line 195, in new_create_sync
response = retry_sync(
^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/instructor/retry.py", line 210, in retry_sync
raise InstructorRetryException(
instructor.exceptions.InstructorRetryException: Error code: 500 - {'error': {'message': 'unexpected server status: llm server loading model', 'type': 'api_error', 'param': None, 'code': None}}

Expected behavior
The request fails once the global timeout elapses, without running any further retries.
