
Weights download speedup #1354


Closed

wants to merge 11 commits into from

Conversation

hansent
Contributor

@hansent hansent commented Jun 11, 2025

Description

This adds a new download method to be used for downloading weights. I was basically unable to download the perception encoder weights (2.5GB) or even the CLIP weights (250MB) because downloads were so slow (with naive chunking I measured ~5Mbps and slower, although that chunking might have been worse than what requests does by default). Either way, with the previous implementation I was still waiting after 30 minutes and the weights weren't done; now they download in seconds.

This implements parallel or serialized chunked downloads into a BytesIO buffer and adds some basic logging of the file size when model weights are fetched (with debug logging enabled it also prints download progress and speed at 10% intervals).
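For context, a minimal sketch of the range-based chunked approach (not the exact code in this PR; fetch_weights, CHUNK_SIZE, and the worker count are illustrative, and a real implementation should also check that the server answers Range requests with 206):

import io
import threading
import requests
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 32 * 1024 * 1024  # split the file into 32 MB ranges (illustrative value)

def fetch_weights(url: str, max_workers: int = 8) -> bytes:
    # Determine the total size up front; Range support on the server is required for parallelism.
    head = requests.head(url, allow_redirects=True, timeout=30)
    head.raise_for_status()
    total_size = int(head.headers["Content-Length"])

    buffer = io.BytesIO(b"\0" * total_size)  # pre-allocated in-memory buffer
    buffer_lock = threading.Lock()  # seek+write pairs on the shared buffer must not interleave

    def download_range(start: int, end: int) -> None:
        # The Range header is inclusive on both ends.
        response = requests.get(url, headers={"Range": f"bytes={start}-{end}"}, timeout=300)
        response.raise_for_status()
        with buffer_lock:
            buffer.seek(start)
            buffer.write(response.content)

    ranges = [
        (start, min(start + CHUNK_SIZE, total_size) - 1)
        for start in range(0, total_size, CHUNK_SIZE)
    ]
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(download_range, s, e) for s, e in ranges]
        for future in futures:
            future.result()  # surface any chunk failure
    return buffer.getvalue()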

Ideally we wouldn't need to buffer the entire weight file in memory at all, but we can't just stream to a file yet: cache implementations differ between environments, so we need to be careful there and that will need some more refactoring.
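For reference, streaming straight to disk could look roughly like the sketch below (download into a temp file, then atomically move it into place); treat this as an assumption-laden outline, since how the destination path is derived in each cache implementation is exactly the part that still needs refactoring:

import os
import tempfile
import requests

def stream_to_file(url: str, destination: str) -> None:
    # Write into a temp file in the same directory, then atomically replace,
    # so a partially downloaded file never shows up under the final cache path.
    directory = os.path.dirname(destination) or "."
    with requests.get(url, stream=True, timeout=300) as response:
        response.raise_for_status()
        with tempfile.NamedTemporaryFile(dir=directory, delete=False) as tmp:
            for chunk in response.iter_content(chunk_size=1 << 20):  # 1 MB chunks
                if chunk:
                    tmp.write(chunk)
            tmp_path = tmp.name
    os.replace(tmp_path, destination)  # atomic when temp and destination share a filesystem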

Type of change

maintenance / speed improvement

How has this change been tested? Please provide a test case or example of how you tested the change.

locally

Any specific deployment considerations

no

Docs

  • Docs updated? What were the changes:

@yeldarby
Contributor

Had Claude give it a once-over. These were the most salient pieces of feedback:

1. Threading Safety Violation - Lines 844-868

progress_lock = threading.Lock()
downloaded_bytes = 0

def download_chunk(start, end):
    nonlocal downloaded_bytes
    # ... downloads chunk ...
    with progress_lock:
        content_buffer.seek(current_pos)
        content_buffer.write(chunk)
        downloaded_bytes += len(chunk)

Problem: current_pos is calculated outside the lock but used inside. This creates a race condition where chunks could overwrite each other.

Fix: Calculate position inside the lock:

def download_chunk(start, end):
    nonlocal downloaded_bytes
    local_pos = start
    try:
        response = requests.get(wrap_url(url), headers=headers, stream=True, timeout=ROBOFLOW_API_REQUEST_TIMEOUT)
        response.raise_for_status()
        for chunk in response.iter_content(chunk_size=8192):
            if chunk:
                with progress_lock:
                    content_buffer.seek(local_pos)
                    content_buffer.write(chunk)
                    downloaded_bytes += len(chunk)
                    local_pos += len(chunk)
    except requests.RequestException as error:
        logger.error(f"Failed to download chunk from {start} to {end}: {error}")
        raise

2. Buffer Pre-allocation Bug - Lines 825-827

content_buffer.seek(total_size - 1)
content_buffer.write(b'\0') # Pre-allocate buffer

Problem: This only allocates the last byte, not the entire buffer. Writing to middle positions on an empty BytesIO will fill gaps with zeros, corrupting binary data.

Fix: Pre-allocate properly:

content_buffer = io.BytesIO(b'\0' * total_size)

3. Missing Error Handling in Parallel Download

The parallel download doesn't handle api_key_safe_raise_for_status or maintain the same error handling patterns as the original get_from_url.

Fix: Add consistent error handling:

def download_chunk(start, end):
    # ... existing code ...
    response = requests.get(wrap_url(url), headers=build_roboflow_api_headers(), stream=True, timeout=ROBOFLOW_API_REQUEST_TIMEOUT)
    api_key_safe_raise_for_status(response=response)

codeflash-ai bot added a commit that referenced this pull request Jun 18, 2025
…download-speedup`)

Here is the optimized version of your code, rewritten for better runtime and memory performance.
- **Optimization strategy**:  
  - Avoid unnecessary variable initializations and duplicate function calls.
  - Minimize expensive string formatting and JSON parsing.
  - Reduce `time.time()` calls inside tight loops.
  - Pre-calculate constants.
  - Reuse byte counters; only log when necessary.
  - In log calls, check whether debug logging is enabled before formatting messages, to avoid unnecessary string work (see the sketch after this commit message).

**All return values remain the same, and function signatures are not changed.**  
**All comments are preserved or updated if code has changed.**


**Summary of changes:**
- Avoid recomputing constants, unnecessary branching, duplicate string formatting when log level is not enabled.
- Only call `time.time()` and perform Mbps calculations when logging.
- Minor clean-ups for clarity.
- Final response construction is unchanged.
- Comments updated as needed, originals preserved.

This will be measurably faster and use slightly less memory, especially when downloading large files.
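The "conditional debug log" point amounts to checking the log level before doing any per-chunk formatting or throughput math; a rough sketch of that pattern (log_progress is a hypothetical helper, not code from this PR):

import logging
import time

logger = logging.getLogger(__name__)

def log_progress(downloaded_bytes: int, total_size: int, started_at: float) -> None:
    # Skip the percentage / Mbps arithmetic entirely unless debug logging is enabled.
    if not logger.isEnabledFor(logging.DEBUG):
        return
    elapsed = time.time() - started_at
    mbps = (downloaded_bytes * 8 / 1_000_000) / elapsed if elapsed > 0 else 0.0
    percent = 100.0 * downloaded_bytes / total_size if total_size else 0.0
    logger.debug("Downloaded %.0f%% (%.1f Mbps)", percent, mbps)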
Contributor

codeflash-ai bot commented Jun 18, 2025

⚡️ Codeflash found optimizations for this PR

📄 24% (0.24x) speedup for _serial_download in inference/core/roboflow_api.py

⏱️ Runtime: 924 microseconds → 747 microseconds (best of 44 runs)

I created a new dependent PR with the suggested changes. Please review:

If you approve, it will be merged into this PR (branch weights-download-speedup).

@grzegorz-roboflow
Contributor

grzegorz-roboflow commented Jun 20, 2025

@PawelPeczek-Roboflow added similar functionality here: https://github.com/roboflow/inference/blob/feature/inference-v1-models/inference_experimental/inference_exp/utils/download.py

I'd vote for porting that change, since it does the same thing and the plan is for it to replace the current solution.

@hansent
Contributor Author

hansent commented Jun 20, 2025

Testing this on staging, @grzegorz-roboflow, I observed the following error:

{"event": "Failed to download chunk from 3115652745 to 3738783293: ('Connection broken: IncompleteRead(0 bytes read, 623130549 more expected)', IncompleteRead(0 bytes read, 623130549 more expected))", "timestamp": "2025-06-20 19:54.57", "filename": "roboflow_api.py", "func_name": "download_chunk", "lineno": 940}
{"positional_args": ["ChunkedEncodingError", "ChunkedEncodingError(ProtocolError('Connection broken: IncompleteRead(0 bytes read, 623130549 more expected)', IncompleteRead(0 bytes read, 623130549 more expected)))"], "event": "%s: %s", "request_id": "7e02da1dd62c4236b0b3abb46cab0d0f", "timestamp": "2025-06-20 19:54.57", "exception": {"type": "ChunkedEncodingError", "message": "('Connection broken: IncompleteRead(0 bytes read, 623130549 more expected)', IncompleteRead(0 bytes read, 623130549 more expected))", "stacktrace": [{"filename": "/app/inference/core/interfaces/http/http_api.py", "lineno": 284, "function": "wrapped_route", "code": "return await route(*args, **kwargs)"}, {"filename": "/app/inference/usage_tracking/collector.py", "lineno": 728, "function": "async_wrapper", "code": "res = await func(*args, **kwargs)"}, {"filename": "/app/inference/core/interfaces/http/http_api.py", "lineno": 1218, "function": "infer_lmm", "code": "return await process_inference_request(inference_request)"}, {"filename": "/app/inference/core/interfaces/http/http_api.py", "lineno": 778, "function": "process_inference_request", "code": "self.model_manager.add_model(de_aliased_model_id, inference_request.api_key)"}, {"filename": "/app/inference/core/managers/decorators/fixed_size_cache.py", "lineno": 104, "function": "add_model", "code": "raise error"}, {"filename": "/app/inference/core/managers/decorators/fixed_size_cache.py", "lineno": 93, "function": "add_model", "code": "return super().add_model("}, {"filename": "/app/inference/core/managers/decorators/base.py", "lineno": 69, "function": "add_model", "code": "self.model_manager.add_model("}, {"filename": "/app/inference/core/managers/base.py", "lineno": 85, "function": "add_model", "code": "model = self.model_registry.get_model(resolved_identifier, api_key)("}, {"filename": "/app/inference/models/transformers/transformers.py", "lineno": 84, "function": "__init__", "code": "self.cache_model_artefacts()"}, {"filename": "/app/inference/core/models/roboflow.py", "lineno": 255, "function": "cache_model_artefacts", "code": "self.download_model_artifacts_from_roboflow_api()"}, {"filename": "/app/inference/models/transformers/transformers.py", "lineno": 235, "function": "download_model_artifacts_from_roboflow_api", "code": "model_weights_response = get_weights_from_url_optimally(weights_url)"}, {"filename": "/app/inference/core/roboflow_api.py", "lineno": 831, "function": "get_weights_from_url_optimally", "code": "return _parallel_download(url, total_size)"}, {"filename": "/app/inference/core/roboflow_api.py", "lineno": 952, "function": "_parallel_download", "code": "f.result()"}

So we will need to add retry and backoff logic before merging this change. I will try to implement that, as well as streaming directly to a file, and test it with the serverless filesystem on staging.
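A retry with exponential backoff around each chunk could look roughly like this (a sketch only, not the final implementation; download_chunk stands in for the per-range download function, and the exception list assumes requests is the HTTP client):

import random
import time
import requests

def download_with_retries(download_chunk, start: int, end: int, max_attempts: int = 4):
    # IncompleteRead surfaces from requests as ChunkedEncodingError / ConnectionError;
    # retry those with exponential backoff plus jitter, re-raising after the last attempt.
    for attempt in range(1, max_attempts + 1):
        try:
            return download_chunk(start, end)
        except (
            requests.exceptions.ChunkedEncodingError,
            requests.exceptions.ConnectionError,
            requests.exceptions.Timeout,
        ):
            if attempt == max_attempts:
                raise
            time.sleep(min(2 ** attempt, 30) + random.random())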

@hansent hansent closed this Jul 9, 2025