
Fix is_online() for faster offline imports #9544

Merged
merged 8 commits into ultralytics:main on Apr 6, 2024

Conversation

khoalu
Contributor

@khoalu khoalu commented Apr 4, 2024

If the user knows that this package is used in an offline environment, don't waste time checking online status at import time; instead, set the environment variable YOLO_OFFLINE=true to skip the check.
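A minimal usage sketch (assuming the variable is read when the package is imported, as proposed here):

import os

os.environ["YOLO_OFFLINE"] = "true"  # must be set before the import below
import ultralytics  # the online check is skipped, so no network probe delays the import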

I have read the CLA Document and I sign the CLA

🛠️ PR Summary

Made with ❤️ by Ultralytics Actions

🌟 Summary

Improved Internet connectivity check method with environmental override.

📊 Key Changes

  • Streamlined code for checking Internet connectivity.
  • Added an option to bypass the connectivity check via an environment variable (YOLO_OFFLINE).

🎯 Purpose & Impact

  • Efficiency Boost: Simplifying the connectivity check reduces the chance of errors and speeds up the process. 🚀
  • User Control: Users can now control the connectivity check feature by setting an environment variable, making the tool more flexible, especially for offline use cases. 🔧
  • Broader Compatibility: The update aims to enhance user experience across different environments by ensuring the tool behaves more predictably in both online and offline scenarios. 🌍


github-actions bot commented Apr 4, 2024

CLA Assistant Lite bot: All Contributors have signed the CLA. ✅

@khoalu
Contributor Author

khoalu commented Apr 4, 2024

I have read the CLA Document and I sign the CLA


@github-actions github-actions bot left a comment


👋 Hello @khoalu, thank you for submitting an Ultralytics YOLOv8 🚀 PR! To allow your work to be integrated as seamlessly as possible, we advise you to:

  • ✅ Verify your PR is up-to-date with ultralytics/ultralytics main branch. If your PR is behind you can update your code by clicking the 'Update branch' button or by running git pull and git merge main locally.
  • ✅ Verify all YOLOv8 Continuous Integration (CI) checks are passing.
  • ✅ Update YOLOv8 Docs for any new or updated features.
  • ✅ Reduce changes to the absolute minimum required for your bug fix or feature addition. "It is not daily increase but daily decrease, hack away the unessential. The closer to the source, the less wastage there is." - Bruce Lee

See our Contributing Guide for details and let us know if you have any questions!


codecov bot commented Apr 4, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 76.14%. Comparing base (a262865) to head (df89c00).

❗ Current head df89c00 differs from pull request most recent head bd0bdea. Consider uploading reports for the commit bd0bdea to get more accurate results.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #9544      +/-   ##
==========================================
+ Coverage   76.12%   76.14%   +0.01%     
==========================================
  Files         121      121              
  Lines       15281    15278       -3     
==========================================
  Hits        11633    11633              
+ Misses       3648     3645       -3     
Flag         Coverage Δ
Benchmarks   36.05% <100.00%> (+<0.01%) ⬆️
GPU          37.95% <100.00%> (-0.02%) ⬇️
Tests        71.42% <100.00%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@glenn-jocher
Member

@khoalu thanks for the PR, but I did a serious amount of optimization before introducing this function into the codebase. The execution time in an offline environment (in this case my M2 MacBook Air with Wi-Fi disabled) is 43 microseconds, i.e. about 4% of a millisecond.

[Screenshot: timing of is_online() on the offline M2 MacBook Air, ~43 microseconds per call]

@glenn-jocher
Member

@khoalu hey I had one idea though. You're right that initial import is too slow, but this is mostly because we import so many packages. I've done some work eliminating default package imports, so e.g. pandas and seaborn are now completely eliminated from initial imports. Maybe you could check the rest of the package import speeds and see if there's an opportunity to scope the slowest imports across the repo to improve the ultralytics import speed.

What do you think?

@khoalu
Contributor Author

khoalu commented Apr 4, 2024

I use ultralytics behind a corporate proxy, and I don't really know the details of the proxy being used.
Maybe the proxy has something to do with the import time being slow?

When I set ONLINE = False without running the is_online() function, the import time drops by close to 6 seconds.

Using my PR, I benchmarked like this:

time python -c "import ultralytics"

real    0m8,471s
user    0m2,491s
sys     0m0,372s
time YOLO_OFFLINE=true python -c "import ultralytics"

real    0m2,393s
user    0m2,376s
sys     0m0,369s

Some other packages for reference:

time python -c "import torch"

real    0m1,207s
user    0m1,309s
sys     0m0,251s
time python -c "import cv2"

real    0m0,172s
user    0m0,279s
sys     0m0,219s
time python -c "import numpy"

real    0m0,146s
user    0m0,255s
sys     0m0,176s
time python -c "import pandas"

real    0m0,284s
user    0m0,402s
sys     0m0,167s

If my PR won't be merged, I wonder if there is a way to monkey patch the line ONLINE = is_online() to ONLINE = False before importing ultralytics, without changing the source code? (Yes, it should be about 6 seconds faster :) ) I'd appreciate it if there is a way to do so.
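One hypothetical workaround sketch I'm considering (a sketch only, assuming is_online() ultimately goes through socket.create_connection under the hood):

import socket

_original_create_connection = socket.create_connection

def _always_offline(*args, **kwargs):
    raise OSError("forced offline")  # fail instantly instead of waiting for a proxy timeout

socket.create_connection = _always_offline  # patch before the slow import
import ultralytics  # is_online() should now fall through to False almost immediately

socket.create_connection = _original_create_connection  # restore normal socket behavior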

UPDATE:
OK, so I set up another benchmark like this, which may be fairer:

import time
t0 = time.perf_counter()
import ultralytics
t1 = time.perf_counter()
print(f"{t1-t0:.4f}")

The result:

without YOLO_OFFLINE: 8.0602
with YOLO_OFFLINE = true: 2.0127

Packages for reference:

import torch: 1.0037
import cv2: 0.0795
import numpy: 0.1071
import pandas: 0.2330


@khoalu hey I had one idea though. You're right that initial import is too slow, but this is mostly because we import so many packages. I've done some work eliminating default package imports, so e.g. pandas and seaborn are now completely eliminated from initial imports. Maybe you could check the rest of the package import speeds and see if there's an opportunity to scope the slowest imports across the repo to improve the ultralytics import speed.

What do you think?

That's a great idea. I will check the package, and if possible, open another PR for this.

@glenn-jocher
Member

glenn-jocher commented Apr 4, 2024

@khoalu oh, really strange. OK, let me think about this a bit. I also profiled all the imported packages; torchvision in particular seems very slow, so maybe I can scope it around the repo (see the sketch after the list below):

PIL: 0.0003 seconds
yaml: 0.0082 seconds
tqdm: 0.0093 seconds
psutil: 0.0099 seconds
cpuinfo: 0.0137 seconds
requests: 0.0414 seconds
scipy: 0.0693 seconds
matplotlib: 0.1319 seconds
cv2: 0.1403 seconds
pandas: 0.2956 seconds # already scoped
torch: 0.6030 seconds
thop: 0.6066 seconds # imports torch
seaborn: 0.7776 seconds # already scoped
torchvision: 0.9394 seconds # <-- slowest by far
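
For reference, "scoping" an import just means moving it from module level into the function that actually needs it, so import ultralytics no longer pays the cost up front. A minimal sketch with a hypothetical helper function:

def load_resnet18():
    """Hypothetical example: torchvision is imported only on first use."""
    import torchvision  # scoped import: the ~1 second cost is paid here, not at package import

    return torchvision.models.resnet18(weights=None)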

@khoalu
Contributor Author

khoalu commented Apr 4, 2024

My results to match yours (I'm using Python 3.10.13, torch==2.1.0+cpu, torchvision==0.16.0+cpu):

PIL: 0.0005 seconds
yaml: 0.0124 seconds
tqdm: 0.0121 seconds
psutil: 0.0148 seconds
cpuinfo: 0.0147 seconds
requests: 0.0630 seconds
scipy: 0.0985 seconds
matplotlib: 0.1213 seconds
cv2: 0.0776 seconds
pandas: 0.2319 seconds
torch: 1.0034 seconds
thop: 1.0871 seconds
seaborn: 0.7051 seconds
torchvision: 1.3442 seconds
ultralytics: 8.0400 seconds

script:

import subprocess

packages = [
    "PIL",
    "yaml",
    "tqdm",
    "psutil",
    "cpuinfo",
    "requests",
    "scipy",
    "matplotlib",
    "cv2",
    "pandas",
    "torch",
    "thop",
    "seaborn",
    "torchvision",
    "ultralytics",
]


template = '''
import time;
t0 = time.perf_counter()
import {package}
t1 = time.perf_counter()
print( "{package}", f": {{t1 - t0:.4f}} seconds", sep="")
'''

for package in packages:
    with open("temp.py", "w") as f:
        f.write(template.format(package=package))
    subprocess.call(["python", "temp.py"])

@glenn-jocher
Member

I get this with your script (M2 MacBook Air):

PIL: 0.0015 seconds
yaml: 0.0114 seconds
tqdm: 0.0105 seconds
psutil: 0.0114 seconds
cpuinfo: 0.0173 seconds
requests: 0.0545 seconds
scipy: 0.1394 seconds
matplotlib: 0.1291 seconds
cv2: 0.1783 seconds
pandas: 0.4130 seconds
torch: 0.6815 seconds
thop: 0.5513 seconds
seaborn: 0.8098 seconds
torchvision: 0.9933 seconds
ultralytics: 1.3940 seconds

@glenn-jocher
Member

@khoalu that's really strange. Are there modifications you can make directly to the is_online() function to allow it to run faster in your environment?

@glenn-jocher
Member

I'm working to see if I can scope torchvision imports.

@khoalu
Contributor Author

khoalu commented Apr 4, 2024

@khoalu that's really strange. Are there modifications you can make directly to the is_online() function to allow it to run faster in your environment?

I'll try and reply back later

@ddeevvaa

ddeevvaa commented Apr 4, 2024

good topic

@glenn-jocher
Member

glenn-jocher commented Apr 4, 2024

@khoalu does this function work faster if the port is switched from 53 to 80, which is more standard?

EDIT: I've updated the PR with this change, which should improve performance behind firewalls.

def is_online() -> bool:
    """
    Check internet connectivity by attempting to connect to a known online host.

    Returns:
        (bool): True if connection is successful, False otherwise.
    """
    import socket

    for host in ("1.1.1.1", "8.8.8.8"):  # Cloudflare, Google
        try:
            test_connection = socket.create_connection(address=(host, 80), timeout=1.5)
        except (socket.timeout, socket.gaierror, OSError):
            continue
        else:
            # If the connection was successful, close it to avoid a ResourceWarning
            test_connection.close()
            return True
    return False

%timeit is_online()

glenn-jocher changed the title from "faster import for offline environment" to "Fix is_online() for faster offline imports" on Apr 4, 2024
@khoalu
Contributor Author

khoalu commented Apr 5, 2024

Still doesn't really work in my case :(

Using my script above:

ONLINE = is_online()

ultralytics: 5.0675 seconds

ONLINE = False (bypassing is_online())

ultralytics: 2.0479 seconds

It decreases the time by 3 seconds (= 1.5 s timeout × 2 hosts, I guess?)

@glenn-jocher
Member

@khoalu oh got it. I think an HTTP request may provide a more standardized format better suited for firewall environments. Can you test this function for speed in your offline environment?

import requests

def is_online() -> bool:
    """
    Check internet connectivity by making an HTTP GET request to a known online host.
    
    Returns:
        bool: True if the HTTP request is successful, False otherwise.
    """
    try:
        # Use httpbin.org for a simple status check. HTTPS is used for compatibility with strict firewalls.
        response = requests.get("https://httpbin.org/get", timeout=1.5)
        # Check if the HTTP status code is 200 (OK), indicating successful internet connectivity.
        if response.status_code == 200:
            return True
    except requests.RequestException:
        return False
    return False

@khoalu
Contributor Author

khoalu commented Apr 5, 2024

The above is_online() takes 9 seconds in my environment.
I tried changing the timeout:

  • timeout = 0.5: 3 seconds
  • timeout = 1: 6 seconds
  • timeout = 0.2: 1.22 seconds

The time it takes seems to scale linearly.

I tried changing the URL to https://example.com, keeping timeout=1.5, and it now takes 1.5 seconds.

@glenn-jocher
Member

@khoalu ok got it. I give up then, I'll add the proposed YOLO_OFFLINE ENV variable here. I'll also get rid of the second DNS check so the max timeout is just 1.5s total.

glenn-jocher added the TODO (items that need completing) label on Apr 5, 2024
@glenn-jocher
Member

glenn-jocher commented Apr 5, 2024

@khoalu ok buddy, new function is faster (and shorter), and also incorporates your YOLO_OFFLINE ENV variable. Can you check if this works for you?

import contextlib
import os
import socket

def is_online() -> bool:
    """
    Check internet connectivity by attempting to connect to a known online host.

    Returns:
        (bool): True if connection is successful, False otherwise.
    """
    with contextlib.suppress(Exception):
        assert str(os.getenv("YOLO_OFFLINE", "")).lower() != "true"  # fail fast (return False) if YOLO_OFFLINE="true"
        socket.create_connection(address=("1.1.1.1", 80), timeout=1.0).close()  # check Cloudflare DNS
        return True
    return False
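
A quick way to sanity-check both paths (a rough sketch; timings will vary by machine):

import time

os.environ["YOLO_OFFLINE"] = "true"  # the assertion above now fails fast, so no socket probe runs
t0 = time.perf_counter()
print(is_online(), f"{(time.perf_counter() - t0) * 1e3:.3f} ms")  # False, well under 1 ms

os.environ.pop("YOLO_OFFLINE")  # normal path: a single socket probe capped by the 1.0 s timeout
t0 = time.perf_counter()
print(is_online(), f"{(time.perf_counter() - t0) * 1e3:.3f} ms")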

glenn-jocher removed the TODO (items that need completing) label on Apr 6, 2024
@glenn-jocher glenn-jocher merged commit fcb0641 into ultralytics:main Apr 6, 2024
@glenn-jocher
Member

@khoalu PR merged! Thank you for your contributions. Offline import should now be much faster (no slower than 1s), and you can now use YOLO_OFFLINE="true" to speed this up further.

Let me know if this works for you and if you spot any other areas for improvement!

@khoalu
Contributor Author

khoalu commented Apr 7, 2024

Sorry for the late response.

Thank you very much, it works for me now. Importing ultralytics now takes only 1.5 seconds with YOLO_OFFLINE=true (it was 8+ seconds before).

@khoalu khoalu deleted the faster_offline_import branch April 7, 2024 11:37
@glenn-jocher
Member

@khoalu I'm glad to hear that the update works well for you! 🚀 If you have any more questions or run into any issues, feel free to reach out. Happy coding!

hmurari pushed a commit to hmurari/ultralytics that referenced this pull request Apr 17, 2024
Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
Co-authored-by: UltralyticsAssistant <web@ultralytics.com>