
Weights download speedup #1354


Closed

wants to merge 11 commits into from

Conversation

hansent
Contributor

@hansent hansent commented Jun 11, 2025

Description

This adds a new download method to be used for downloading weights. I was basically unable to download the perception encoder weights (2.5GB) or even the CLIP weights (250MB) because downloads were so slow (with naive chunking I measured ~5Mbps and slower, although that chunking might have been worse than what requests does by default). Either way, with the previous implementation I was still waiting after 30 minutes and the weights weren't done; now they download in seconds.

This implements parallel or serialized chunked downloads into a BytesIO buffer and adds some basic logging of the file size when model weights are fetched (with debug logging enabled it also prints download progress and speed at 10% intervals).
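For context, a minimal sketch of the range-based chunked approach (not the exact code in this PR; fetch_weights, CHUNK_SIZE, and the worker count are illustrative, and a real implementation should also check that the server answers Range requests with 206):

import io
import threading
import requests
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 32 * 1024 * 1024  # split the file into 32 MB ranges (illustrative value)

def fetch_weights(url: str, max_workers: int = 8) -> bytes:
    # Determine the total size up front; Range support on the server is required for parallelism.
    head = requests.head(url, allow_redirects=True, timeout=30)
    head.raise_for_status()
    total_size = int(head.headers["Content-Length"])

    buffer = io.BytesIO(b"\0" * total_size)  # pre-allocated in-memory buffer
    buffer_lock = threading.Lock()  # seek+write pairs on the shared buffer must not interleave

    def download_range(start: int, end: int) -> None:
        # The Range header is inclusive on both ends.
        response = requests.get(url, headers={"Range": f"bytes={start}-{end}"}, timeout=300)
        response.raise_for_status()
        with buffer_lock:
            buffer.seek(start)
            buffer.write(response.content)

    ranges = [
        (start, min(start + CHUNK_SIZE, total_size) - 1)
        for start in range(0, total_size, CHUNK_SIZE)
    ]
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(download_range, s, e) for s, e in ranges]
        for future in futures:
            future.result()  # surface any chunk failure
    return buffer.getvalue()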

Ideally we wouldn't need to buffer the entire weight file in memory at all, but we can't just stream to a file yet: cache implementations differ between environments, so we need to be careful there and that will need some more refactoring.
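For reference, streaming straight to disk could look roughly like the sketch below (download into a temp file, then atomically move it into place); treat this as an assumption-laden outline, since how the destination path is derived in each cache implementation is exactly the part that still needs refactoring:

import os
import tempfile
import requests

def stream_to_file(url: str, destination: str) -> None:
    # Write into a temp file in the same directory, then atomically replace,
    # so a partially downloaded file never shows up under the final cache path.
    directory = os.path.dirname(destination) or "."
    with requests.get(url, stream=True, timeout=300) as response:
        response.raise_for_status()
        with tempfile.NamedTemporaryFile(dir=directory, delete=False) as tmp:
            for chunk in response.iter_content(chunk_size=1 << 20):  # 1 MB chunks
                if chunk:
                    tmp.write(chunk)
            tmp_path = tmp.name
    os.replace(tmp_path, destination)  # atomic when temp and destination share a filesystem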

Type of change

maintenance / speed improvement

How has this change been tested? Please provide a test case or example of how you tested the change.

locally

Any specific deployment considerations

no

Docs

  • Docs updated? What were the changes:

@yeldarby
Contributor

Had Claude give it a once-over. These were the most salient pieces of feedback:

1. Threading Safety Violation - Lines 844-868

progress_lock = threading.Lock()
downloaded_bytes = 0

def download_chunk(start, end):
    nonlocal downloaded_bytes
    # ... downloads chunk ...
    with progress_lock:
        content_buffer.seek(current_pos)
        content_buffer.write(chunk)
        downloaded_bytes += len(chunk)

Problem: current_pos is calculated outside the lock but used inside. This creates a race condition where chunks could overwrite each other.

Fix: Calculate position inside the lock:

def download_chunk(start, end):
    nonlocal downloaded_bytes
    local_pos = start
    try:
        response = requests.get(wrap_url(url), headers=headers, stream=True, timeout=ROBOFLOW_API_REQUEST_TIMEOUT)
        response.raise_for_status()
        for chunk in response.iter_content(chunk_size=8192):
            if chunk:
                with progress_lock:
                    content_buffer.seek(local_pos)
                    content_buffer.write(chunk)
                    downloaded_bytes += len(chunk)
                    local_pos += len(chunk)
    except requests.RequestException as error:
        logger.error(f"Failed to download chunk from {start} to {end}: {error}")
        raise

2. Buffer Pre-allocation Bug - Lines 825-827

content_buffer.seek(total_size - 1)
content_buffer.write(b'\0') # Pre-allocate buffer

Problem: This only allocates the last byte, not the entire buffer. Writing to middle positions on an empty BytesIO will fill gaps with zeros, corrupting binary data.

Fix: Pre-allocate properly:

content_buffer = io.BytesIO(b'\0' * total_size)

3. Missing Error Handling in Parallel Download

The parallel download doesn't handle api_key_safe_raise_for_status or maintain the same error handling patterns as the original get_from_url.

Fix: Add consistent error handling:

def download_chunk(start, end):
    # ... existing code ...
    response = requests.get(wrap_url(url), headers=build_roboflow_api_headers(), stream=True, timeout=ROBOFLOW_API_REQUEST_TIMEOUT)
    api_key_safe_raise_for_status(response=response)

codeflash-ai bot added a commit that referenced this pull request Jun 18, 2025
…download-speedup`)

Here is the optimized version of your code, rewritten for better runtime and memory performance.
- **Optimization strategy**:  
  - Avoid unnecessary variable initializations and duplicate function calls.
  - Minimize expensive string formatting and JSON parsing.
  - Reduce `time.time()` calls inside tight loops.
  - Pre-calculate constants.
  - Reuse byte counters; only log when necessary.
  - In log calls, check whether debug logging is enabled before formatting messages, to avoid unnecessary string work (see the sketch after this commit message).

**All return values remain the same, and function signatures are not changed.**  
**All comments are preserved or updated if code has changed.**


**Summary of changes:**
- Avoid recomputing constants, unnecessary branching, duplicate string formatting when log level is not enabled.
- Only call `time.time()` and perform Mbps calculations when logging.
- Minor clean-ups for clarity.
- Final response construction is unchanged.
- Comments updated as needed, originals preserved.

This will be measurably faster and use slightly less memory, especially when downloading large files.
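The "conditional debug log" point amounts to checking the log level before doing any per-chunk formatting or throughput math; a rough sketch of that pattern (log_progress is a hypothetical helper, not code from this PR):

import logging
import time

logger = logging.getLogger(__name__)

def log_progress(downloaded_bytes: int, total_size: int, started_at: float) -> None:
    # Skip the percentage / Mbps arithmetic entirely unless debug logging is enabled.
    if not logger.isEnabledFor(logging.DEBUG):
        return
    elapsed = time.time() - started_at
    mbps = (downloaded_bytes * 8 / 1_000_000) / elapsed if elapsed > 0 else 0.0
    percent = 100.0 * downloaded_bytes / total_size if total_size else 0.0
    logger.debug("Downloaded %.0f%% (%.1f Mbps)", percent, mbps)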
Contributor

codeflash-ai bot commented Jun 18, 2025

⚡️ Codeflash found optimizations for this PR

📄 24% (0.24x) speedup for _serial_download in inference/core/roboflow_api.py

⏱️ Runtime: 924 microseconds → 747 microseconds (best of 44 runs)

I created a new dependent PR with the suggested changes. Please review:

If you approve, it will be merged into this PR (branch weights-download-speedup).

@grzegorz-roboflow
Contributor

grzegorz-roboflow commented Jun 20, 2025

@PawelPeczek-Roboflow added similar functionality here: https://github.com/roboflow/inference/blob/feature/inference-v1-models/inference_experimental/inference_exp/utils/download.py

I'd vote for porting that change, since it does the same thing and the plan is for it to replace the current solution.

@hansent
Contributor Author

hansent commented Jun 20, 2025

Testing this on staging, @grzegorz-roboflow, I observed the following error:

{"event": "Failed to download chunk from 3115652745 to 3738783293: ('Connection broken: IncompleteRead(0 bytes read, 623130549 more expected)', IncompleteRead(0 bytes read, 623130549 more expected))", "timestamp": "2025-06-20 19:54.57", "filename": "roboflow_api.py", "func_name": "download_chunk", "lineno": 940}
{"positional_args": ["ChunkedEncodingError", "ChunkedEncodingError(ProtocolError('Connection broken: IncompleteRead(0 bytes read, 623130549 more expected)', IncompleteRead(0 bytes read, 623130549 more expected)))"], "event": "%s: %s", "request_id": "7e02da1dd62c4236b0b3abb46cab0d0f", "timestamp": "2025-06-20 19:54.57", "exception": {"type": "ChunkedEncodingError", "message": "('Connection broken: IncompleteRead(0 bytes read, 623130549 more expected)', IncompleteRead(0 bytes read, 623130549 more expected))", "stacktrace": [{"filename": "/app/inference/core/interfaces/http/http_api.py", "lineno": 284, "function": "wrapped_route", "code": "return await route(*args, **kwargs)"}, {"filename": "/app/inference/usage_tracking/collector.py", "lineno": 728, "function": "async_wrapper", "code": "res = await func(*args, **kwargs)"}, {"filename": "/app/inference/core/interfaces/http/http_api.py", "lineno": 1218, "function": "infer_lmm", "code": "return await process_inference_request(inference_request)"}, {"filename": "/app/inference/core/interfaces/http/http_api.py", "lineno": 778, "function": "process_inference_request", "code": "self.model_manager.add_model(de_aliased_model_id, inference_request.api_key)"}, {"filename": "/app/inference/core/managers/decorators/fixed_size_cache.py", "lineno": 104, "function": "add_model", "code": "raise error"}, {"filename": "/app/inference/core/managers/decorators/fixed_size_cache.py", "lineno": 93, "function": "add_model", "code": "return super().add_model("}, {"filename": "/app/inference/core/managers/decorators/base.py", "lineno": 69, "function": "add_model", "code": "self.model_manager.add_model("}, {"filename": "/app/inference/core/managers/base.py", "lineno": 85, "function": "add_model", "code": "model = self.model_registry.get_model(resolved_identifier, api_key)("}, {"filename": "/app/inference/models/transformers/transformers.py", "lineno": 84, "function": "__init__", "code": "self.cache_model_artefacts()"}, {"filename": "/app/inference/core/models/roboflow.py", "lineno": 255, "function": "cache_model_artefacts", "code": "self.download_model_artifacts_from_roboflow_api()"}, {"filename": "/app/inference/models/transformers/transformers.py", "lineno": 235, "function": "download_model_artifacts_from_roboflow_api", "code": "model_weights_response = get_weights_from_url_optimally(weights_url)"}, {"filename": "/app/inference/core/roboflow_api.py", "lineno": 831, "function": "get_weights_from_url_optimally", "code": "return _parallel_download(url, total_size)"}, {"filename": "/app/inference/core/roboflow_api.py", "lineno": 952, "function": "_parallel_download", "code": "f.result()"}

So we will need to add retry and backoff logic before merging this change. I will try to implement that, as well as streaming directly to a file, and test it with the serverless filesystem on staging.
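A retry with exponential backoff around each chunk could look roughly like this (a sketch only, not the final implementation; download_chunk stands in for the per-range download function, and the exception list assumes requests is the HTTP client):

import random
import time
import requests

def download_with_retries(download_chunk, start: int, end: int, max_attempts: int = 4):
    # IncompleteRead surfaces from requests as ChunkedEncodingError / ConnectionError;
    # retry those with exponential backoff plus jitter, re-raising after the last attempt.
    for attempt in range(1, max_attempts + 1):
        try:
            return download_chunk(start, end)
        except (
            requests.exceptions.ChunkedEncodingError,
            requests.exceptions.ConnectionError,
            requests.exceptions.Timeout,
        ):
            if attempt == max_attempts:
                raise
            time.sleep(min(2 ** attempt, 30) + random.random())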

@hansent hansent closed this Jul 9, 2025