
Ollama timeout still doesn't work as expected #1614

Open
@maxw1489

Description

@maxw1489
  • This is actually a bug report.
  • I am not getting good LLM results.
  • I have tried asking for help in the community on Discord or in Discussions and have not received a response.
  • I have tried searching the documentation and have not found an answer.

What Model are you using?

  • gpt-3.5-turbo
  • gpt-4-turbo
  • gpt-4
  • Other (Ollama mistral-small:24B)

Describe the bug
I tested the bugfix from #1597. Unfortunately, the bug persists: the client still iterates through all retries instead of honoring the global timeout.

To Reproduce
I reproduced it with a slightly modified example from docs/integrations/ollama.md:

import logging
logging.basicConfig(level=logging.DEBUG)
from openai import OpenAI
from pydantic import BaseModel
import instructor


class Character(BaseModel):
    name: str
    age: int


client = instructor.from_openai(
    OpenAI(
        base_url="http://10.10.10.115:11434/v1",
        api_key="ollama",  # required, but unused
    ),
    mode=instructor.Mode.JSON,
)

resp = client.chat.completions.create(
    model="mistral-small:24b-9k",
    messages=[
        {
            "role": "user",
            "content": "Tell me about Harry Potter",
        }
    ],
    response_model=Character,
    max_retries=2,
    timeout=1.0,  # Total timeout across all retry attempts
)
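To make the failure measurable rather than eyeballing the logs, the `create()` call above can be wrapped in a small timing check. This is a hypothetical helper (`elapsed_within` is not part of instructor or openai), shown only to pin down what "accepting the global timeout" should mean in wall-clock terms:

```python
import time


def elapsed_within(fn, budget_s, slack=1.5):
    """Run fn, swallow any exception it raises, and report whether the
    total wall-clock time stayed within budget_s * slack seconds.
    Hypothetical test helper, not part of any library."""
    start = time.monotonic()
    try:
        fn()
    except Exception:
        pass
    return (time.monotonic() - start) <= budget_s * slack
```

With a working global `timeout=1.0`, wrapping the `create()` call should return `True`; in the buggy run it takes several seconds (three attempts plus backoff waits) and returns `False`.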

Here are the logs:
DEBUG:instructor:Patching client.chat.completions.create with mode=<Mode.JSON: 'json_mode'>
DEBUG:instructor:Instructor Request: mode.value='json_mode', response_model=<class '__main__.Character'>, new_kwargs={'messages': [{'role': 'system', 'content': '\n As a genius expert, your task is to understand the content and provide\n the parsed objects in json that match the following json_schema:\n\n\n {\n "properties": {\n "name": {\n "title": "Name",\n "type": "string"\n },\n "age": {\n "title": "Age",\n "type": "integer"\n }\n },\n "required": [\n "name",\n "age"\n ],\n "title": "Character",\n "type": "object"\n}\n\n Make sure to return an instance of the JSON, not the schema itself\n'}, {'role': 'user', 'content': 'Tell me about Harry Potter'}], 'model': 'mistral-small:24b-9k', 'timeout': 1.0, 'response_format': {'type': 'json_object'}}
DEBUG:instructor:max_retries: 2, timeout: 1.0
DEBUG:instructor:Retrying, attempt: 1
DEBUG:openai._base_client:Request options: {'method': 'post', 'url': '/chat/completions', 'timeout': 1.0, 'files': None, 'idempotency_key': 'stainless-python-retry-583e0c32-bf69-4644-8654-9b148cf38734', 'json_data': {'messages': [{'role': 'system', 'content': '\n As a genius expert, your task is to understand the content and provide\n the parsed objects in json that match the following json_schema:\n\n\n {\n "properties": {\n "name": {\n "title": "Name",\n "type": "string"\n },\n "age": {\n "title": "Age",\n "type": "integer"\n }\n },\n "required": [\n "name",\n "age"\n ],\n "title": "Character",\n "type": "object"\n}\n\n Make sure to return an instance of the JSON, not the schema itself\n'}, {'role': 'user', 'content': 'Tell me about Harry Potter'}], 'model': 'mistral-small:24b-9k', 'response_format': {'type': 'json_object'}}}
DEBUG:openai._base_client:Sending HTTP Request: POST http://10.10.10.115:11434/v1/chat/completions
DEBUG:httpcore.connection:connect_tcp.started host='10.10.10.115' port=11434 local_address=None timeout=1.0 socket_options=None
DEBUG:httpcore.connection:connect_tcp.complete return_value=<httpcore._backends.sync.SyncStream object at 0x7be5ae49a000>
DEBUG:httpcore.http11:send_request_headers.started request=<Request [b'POST']>
DEBUG:httpcore.http11:send_request_headers.complete
DEBUG:httpcore.http11:send_request_body.started request=<Request [b'POST']>
DEBUG:httpcore.http11:send_request_body.complete
DEBUG:httpcore.http11:receive_response_headers.started request=<Request [b'POST']>
DEBUG:httpcore.http11:receive_response_headers.failed exception=ReadTimeout(TimeoutError('timed out'))
DEBUG:httpcore.http11:response_closed.started
DEBUG:httpcore.http11:response_closed.complete
DEBUG:openai._base_client:Encountered httpx.TimeoutException
Traceback (most recent call last):
File "/usr/local/lib/python3.12/site-packages/httpx/_transports/default.py", line 101, in map_httpcore_exceptions
yield
File "/usr/local/lib/python3.12/site-packages/httpx/_transports/default.py", line 250, in handle_request
resp = self._pool.handle_request(req)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/httpcore/_sync/connection_pool.py", line 256, in handle_request
raise exc from None
File "/usr/local/lib/python3.12/site-packages/httpcore/_sync/connection_pool.py", line 236, in handle_request
response = connection.handle_request(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/httpcore/_sync/connection.py", line 103, in handle_request
return self._connection.handle_request(request)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/httpcore/_sync/http11.py", line 136, in handle_request
raise exc
File "/usr/local/lib/python3.12/site-packages/httpcore/_sync/http11.py", line 106, in handle_request
) = self._receive_response_headers(**kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/httpcore/_sync/http11.py", line 177, in _receive_response_headers
event = self._receive_event(timeout=timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/httpcore/_sync/http11.py", line 217, in _receive_event
data = self._network_stream.read(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/httpcore/_backends/sync.py", line 126, in read
with map_exceptions(exc_map):
^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/contextlib.py", line 158, in __exit__
self.gen.throw(value)
File "/usr/local/lib/python3.12/site-packages/httpcore/_exceptions.py", line 14, in map_exceptions
raise to_exc(exc) from exc
httpcore.ReadTimeout: timed out

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/usr/local/lib/python3.12/site-packages/openai/_base_client.py", line 969, in request
response = self._client.send(
^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/httpx/_client.py", line 914, in send
response = self._send_handling_auth(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/httpx/_client.py", line 942, in _send_handling_auth
response = self._send_handling_redirects(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/httpx/_client.py", line 979, in _send_handling_redirects
response = self._send_single_request(request)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/httpx/_client.py", line 1014, in _send_single_request
response = transport.handle_request(request)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/httpx/_transports/default.py", line 249, in handle_request
with map_httpcore_exceptions():
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/contextlib.py", line 158, in __exit__
self.gen.throw(value)
File "/usr/local/lib/python3.12/site-packages/httpx/_transports/default.py", line 118, in map_httpcore_exceptions
raise mapped_exc(message) from exc
httpx.ReadTimeout: timed out
DEBUG:openai._base_client:2 retries left
INFO:openai._base_client:Retrying request to /chat/completions in 0.475531 seconds
DEBUG:openai._base_client:Request options: {'method': 'post', 'url': '/chat/completions', 'timeout': 1.0, 'files': None, 'idempotency_key': 'stainless-python-retry-583e0c32-bf69-4644-8654-9b148cf38734', 'json_data': {'messages': [{'role': 'system', 'content': '\n As a genius expert, your task is to understand the content and provide\n the parsed objects in json that match the following json_schema:\n\n\n {\n "properties": {\n "name": {\n "title": "Name",\n "type": "string"\n },\n "age": {\n "title": "Age",\n "type": "integer"\n }\n },\n "required": [\n "name",\n "age"\n ],\n "title": "Character",\n "type": "object"\n}\n\n Make sure to return an instance of the JSON, not the schema itself\n'}, {'role': 'user', 'content': 'Tell me about Harry Potter'}], 'model': 'mistral-small:24b-9k', 'response_format': {'type': 'json_object'}}}
DEBUG:openai._base_client:Sending HTTP Request: POST http://10.10.10.115:11434/v1/chat/completions
DEBUG:httpcore.connection:connect_tcp.started host='10.10.10.115' port=11434 local_address=None timeout=1.0 socket_options=None
DEBUG:httpcore.connection:connect_tcp.complete return_value=<httpcore._backends.sync.SyncStream object at 0x7be5adf56030>
DEBUG:httpcore.http11:send_request_headers.started request=<Request [b'POST']>
DEBUG:httpcore.http11:send_request_headers.complete
DEBUG:httpcore.http11:send_request_body.started request=<Request [b'POST']>
DEBUG:httpcore.http11:send_request_body.complete
DEBUG:httpcore.http11:receive_response_headers.started request=<Request [b'POST']>
DEBUG:httpcore.http11:receive_response_headers.failed exception=ReadTimeout(TimeoutError('timed out'))
DEBUG:httpcore.http11:response_closed.started
DEBUG:httpcore.http11:response_closed.complete
DEBUG:openai._base_client:Encountered httpx.TimeoutException
Traceback (most recent call last):
File "/usr/local/lib/python3.12/site-packages/httpx/_transports/default.py", line 101, in map_httpcore_exceptions
yield
File "/usr/local/lib/python3.12/site-packages/httpx/_transports/default.py", line 250, in handle_request
resp = self._pool.handle_request(req)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/httpcore/_sync/connection_pool.py", line 256, in handle_request
raise exc from None
File "/usr/local/lib/python3.12/site-packages/httpcore/_sync/connection_pool.py", line 236, in handle_request
response = connection.handle_request(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/httpcore/_sync/connection.py", line 103, in handle_request
return self._connection.handle_request(request)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/httpcore/_sync/http11.py", line 136, in handle_request
raise exc
File "/usr/local/lib/python3.12/site-packages/httpcore/_sync/http11.py", line 106, in handle_request
) = self._receive_response_headers(**kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/httpcore/_sync/http11.py", line 177, in _receive_response_headers
event = self._receive_event(timeout=timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/httpcore/_sync/http11.py", line 217, in _receive_event
data = self._network_stream.read(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/httpcore/_backends/sync.py", line 126, in read
with map_exceptions(exc_map):
^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/contextlib.py", line 158, in __exit__
self.gen.throw(value)
File "/usr/local/lib/python3.12/site-packages/httpcore/_exceptions.py", line 14, in map_exceptions
raise to_exc(exc) from exc
httpcore.ReadTimeout: timed out

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/usr/local/lib/python3.12/site-packages/openai/_base_client.py", line 969, in request
response = self._client.send(
^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/httpx/_client.py", line 914, in send
response = self._send_handling_auth(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/httpx/_client.py", line 942, in _send_handling_auth
response = self._send_handling_redirects(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/httpx/_client.py", line 979, in _send_handling_redirects
response = self._send_single_request(request)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/httpx/_client.py", line 1014, in _send_single_request
response = transport.handle_request(request)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/httpx/_transports/default.py", line 249, in handle_request
with map_httpcore_exceptions():
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/contextlib.py", line 158, in __exit__
self.gen.throw(value)
File "/usr/local/lib/python3.12/site-packages/httpx/_transports/default.py", line 118, in map_httpcore_exceptions
raise mapped_exc(message) from exc
httpx.ReadTimeout: timed out
DEBUG:openai._base_client:1 retry left
INFO:openai._base_client:Retrying request to /chat/completions in 0.785212 seconds
DEBUG:openai._base_client:Request options: {'method': 'post', 'url': '/chat/completions', 'timeout': 1.0, 'files': None, 'idempotency_key': 'stainless-python-retry-583e0c32-bf69-4644-8654-9b148cf38734', 'json_data': {'messages': [{'role': 'system', 'content': '\n As a genius expert, your task is to understand the content and provide\n the parsed objects in json that match the following json_schema:\n\n\n {\n "properties": {\n "name": {\n "title": "Name",\n "type": "string"\n },\n "age": {\n "title": "Age",\n "type": "integer"\n }\n },\n "required": [\n "name",\n "age"\n ],\n "title": "Character",\n "type": "object"\n}\n\n Make sure to return an instance of the JSON, not the schema itself\n'}, {'role': 'user', 'content': 'Tell me about Harry Potter'}], 'model': 'mistral-small:24b-9k', 'response_format': {'type': 'json_object'}}}
DEBUG:openai._base_client:Sending HTTP Request: POST http://10.10.10.115:11434/v1/chat/completions
DEBUG:httpcore.connection:connect_tcp.started host='10.10.10.115' port=11434 local_address=None timeout=1.0 socket_options=None
DEBUG:httpcore.connection:connect_tcp.complete return_value=<httpcore._backends.sync.SyncStream object at 0x7be5adf56b10>
DEBUG:httpcore.http11:send_request_headers.started request=<Request [b'POST']>
DEBUG:httpcore.http11:send_request_headers.complete
DEBUG:httpcore.http11:send_request_body.started request=<Request [b'POST']>
DEBUG:httpcore.http11:send_request_body.complete
DEBUG:httpcore.http11:receive_response_headers.started request=<Request [b'POST']>
DEBUG:httpcore.http11:receive_response_headers.complete return_value=(b'HTTP/1.1', 500, b'Internal Server Error', [(b'Content-Type', b'application/json'), (b'Date', b'Wed, 18 Jun 2025 17:32:37 GMT'), (b'Content-Length', b'119')])
INFO:httpx:HTTP Request: POST http://10.10.10.115:11434/v1/chat/completions "HTTP/1.1 500 Internal Server Error"
DEBUG:httpcore.http11:receive_response_body.started request=<Request [b'POST']>
DEBUG:httpcore.http11:receive_response_body.complete
DEBUG:httpcore.http11:response_closed.started
DEBUG:httpcore.http11:response_closed.complete
DEBUG:openai._base_client:HTTP Response: POST http://10.10.10.115:11434/v1/chat/completions "500 Internal Server Error" Headers({'content-type': 'application/json', 'date': 'Wed, 18 Jun 2025 17:32:37 GMT', 'content-length': '119'})
DEBUG:openai._base_client:request_id: None
DEBUG:openai._base_client:Encountered httpx.HTTPStatusError
Traceback (most recent call last):
File "/usr/local/lib/python3.12/site-packages/openai/_base_client.py", line 1014, in request
response.raise_for_status()
File "/usr/local/lib/python3.12/site-packages/httpx/_models.py", line 829, in raise_for_status
raise HTTPStatusError(message, request=request, response=self)
httpx.HTTPStatusError: Server error '500 Internal Server Error' for url 'http://10.10.10.115:11434/v1/chat/completions'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/500
DEBUG:openai._base_client:Re-raising status error
DEBUG:instructor:Retry error: RetryError[<Future at 0x7be5adf57aa0 state=finished raised InternalServerError>]
Traceback (most recent call last):
File "/usr/local/lib/python3.12/site-packages/instructor/retry.py", line 184, in retry_sync
response = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/openai/_utils/_utils.py", line 287, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/openai/resources/chat/completions/completions.py", line 925, in create
return self._post(
^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/openai/_base_client.py", line 1239, in post
return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/openai/_base_client.py", line 1034, in request
raise self._make_status_error_from_response(err.response) from None
openai.InternalServerError: Error code: 500 - {'error': {'message': 'unexpected server status: llm server loading model', 'type': 'api_error', 'param': None, 'code': None}}

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/usr/local/lib/python3.12/site-packages/instructor/retry.py", line 179, in retry_sync
for attempt in max_retries:
^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/tenacity/__init__.py", line 445, in __iter__
do = self.iter(retry_state=retry_state)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/tenacity/__init__.py", line 378, in iter
result = action(retry_state)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/tenacity/__init__.py", line 421, in exc_check
raise retry_exc from fut.exception()
tenacity.RetryError: RetryError[<Future at 0x7be5adf57aa0 state=finished raised InternalServerError>]

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/usr/local/lib/python3.12/runpy.py", line 198, in _run_module_as_main
return _run_code(code, main_globals, None,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/runpy.py", line 88, in _run_code
exec(code, run_globals)
File "/home/ai/.cursor-server/extensions/ms-python.debugpy-2025.8.0-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/__main__.py", line 71, in <module>
cli.main()
File "/home/ai/.cursor-server/extensions/ms-python.debugpy-2025.8.0-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 501, in main
run()
File "/home/ai/.cursor-server/extensions/ms-python.debugpy-2025.8.0-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 351, in run_file
runpy.run_path(target, run_name="__main__")
File "/home/ai/.cursor-server/extensions/ms-python.debugpy-2025.8.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 310, in run_path
return _run_module_code(code, init_globals, run_name, pkg_name=pkg_name, script_name=fname)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ai/.cursor-server/extensions/ms-python.debugpy-2025.8.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 127, in _run_module_code
_run_code(code, mod_globals, init_globals, mod_name, mod_spec, pkg_name, script_name)
File "/home/ai/.cursor-server/extensions/ms-python.debugpy-2025.8.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 118, in _run_code
exec(code, run_globals)
File "/home/ai/dev/dev/model_validator/lithon/production_pipeline/retry.py", line 24, in <module>
resp = client.chat.completions.create(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/instructor/client.py", line 366, in create
return self.create_fn(
^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/instructor/patch.py", line 195, in new_create_sync
response = retry_sync(
^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/instructor/retry.py", line 210, in retry_sync
raise InstructorRetryException(
instructor.exceptions.InstructorRetryException: Error code: 500 - {'error': {'message': 'unexpected server status: llm server loading model', 'type': 'api_error', 'param': None, 'code': None}}

Expected behavior
The request fails once the global timeout elapses, without running any further retries.
