Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Change Request] Remove pull_if_newer mechanism #1188

Closed
lsago opened this issue Apr 7, 2024 · 2 comments
Closed

[Change Request] Remove pull_if_newer mechanism #1188

lsago opened this issue Apr 7, 2024 · 2 comments
Labels
change Internal change New

Comments

@lsago
Copy link
Contributor

lsago commented Apr 7, 2024

Description
At the moment we have a function pull_if_newer that to avoid overwriting (well, technically retagging, I believe) an "older" (upstream) image. See issue where I think this came from.

IMHO, I don't think the added complexity to solely deal with this today, when multiple registries are typically used for algorithms, is worth the continued effort.

These are some of the reasons, as a I see it:

FWIW, to deal with the issue for algorithms I might be developing on locally, on the same machine where my node is running (testing), I think naming the image with a local tag seems very acceptable. Although most people I believe use the MockClient. Perhaps using 'localhost/name-of-image' could be an idea, or local.vantage6.ai .. or anything the user can rest assure will not actually exist out there.

For server/node/support images, this is where my bias shows :). I never have to deal with this, as if I ever make changes to the node/server code, I always just test it locally with --mount-src. And if I push it somewhere, it's not to the official registry.

Perhaps related to this issue, I'd argue that running something like make base-image should limit itself to building the image. Pushing to a production registry seems unexpected from just running that command. Perhaps we can find a series of make incantations that work for everybody – local tags, make push, .. or whatever we might come up with :)

Source code:

  • Improve updating of non-harbor docker images...  #1175
  • def registry_basic_auth_header(
    docker_client: DockerClient, registry: str
    ) -> dict[str, str]:
    """
    Obtain credentials for registry
    This is a wrapper around docker-py to obtain the credentials used
    to access a registry. Normally communication to the registry goes
    through the Docker deamon API (locally), therefore we have to take
    extra steps in order to communicate with the (Harbor) registry
    directly.
    Note that this has only been tested for the harbor registries.
    Parameters
    ----------
    docker_client: DockerClient
    Docker client
    registry : str
    registry name (e.g. harbor2.vantage6.ai)
    Returns
    -------
    dict
    Containing a basic authorization header
    """
    # Obtain the header used to be send to the docker deamon. We
    # communicate directly with the registry therefore we need to
    # change this headers.
    header = docker.auth.get_config_header(docker_client.api, registry)
    if not header:
    log.debug(f"No credentials found for {registry}")
    return
    # decode header
    header_json = json.loads(base64.b64decode(header))
    # Extract username and password. Depending on the docker version, the keys
    # may be capitalized in the JSON header
    username = (
    header_json["username"]
    if "username" in header_json
    else header_json["Username"]
    )
    password = (
    header_json["password"]
    if "password" in header_json
    else header_json["Password"]
    )
    basic_auth = f"{username}:{password}"
    # Encode them back to base64 and as a dict
    bytes_basic_auth = basic_auth.encode("utf-8")
    b64_basic_auth = base64.b64encode(bytes_basic_auth).decode("utf-8")
    return {"authorization": f"Basic {b64_basic_auth}"}
    def inspect_remote_image_timestamp(
    docker_client: DockerClient,
    image: str,
    log: logging.Logger | ClickLogger = ClickLogger,
    ) -> tuple[datetime, str] | None:
    """
    Obtain creation timestamp object from remote image.
    Parameters
    ----------
    docker_client: DockerClient
    Docker client
    image: str
    Image name
    log: logging.Logger | ClickLogger
    Logger
    Returns
    -------
    tuple[datetime, str] | None
    Timestamp containing the creation date of the image and its digest, or
    None if the remote image could not be found.
    """
    # check if a tag has been profided
    image_tag = re.split(":", image)
    img = image_tag[0]
    tag = image_tag[1] if len(image_tag) == 2 else "latest"
    try:
    reg, rep, img_ = re.split("/", img)
    except ValueError:
    log.warn("Could not construct remote URL, " "are you using a local image?")
    log.warn("Or an image from docker hub?")
    log.warn(
    "We'll make a final attempt when running the image to pull"
    " it without any checks..."
    )
    return None, None
    # figure out API of the docker repo
    v1_check = requests.get(f"https://{reg}/api/health")
    v1 = v1_check.status_code == 200
    v2 = False
    if not v1:
    v2_check = requests.get(f"https://{reg}/api/v2.0/health")
    v2 = v2_check.status_code == 200
    if not v1 and not v2:
    log.error(f"Could not determine version of the registry! {reg}")
    log.error("Is this a Harbor registry?")
    log.error("Or is the harbor server offline?")
    return None, None
    if v1:
    image = f"https://{reg}/api/repositories/{rep}/{img_}/tags/{tag}"
    else:
    image = (
    f"https://{reg}/api/v2.0/projects/{rep}/repositories/"
    f"{img_}/artifacts/{tag}"
    )
    # retrieve info from the Harbor server
    result = requests.get(image, headers=registry_basic_auth_header(docker_client, reg))
    # verify that we got an result
    if result.status_code == 404:
    log.warn(f"Remote image not found! {image}")
    return None, None
    if result.status_code != 200:
    log.warn(
    f"Remote info could not be fetched! ({result.status_code}) " f"{image}"
    )
    return None, None
    if v1:
    timestamp = parse(result.json().get("created"))
    digest = None
    else:
    timestamp = parse(result.json().get("push_time"))
    digest = result.json().get("digest")
    return timestamp, digest
    def inspect_local_image_timestamp(
    docker_client: DockerClient,
    image: str,
    log: logging.Logger | ClickLogger = ClickLogger,
    ) -> tuple[datetime, str] | None:
    """
    Obtain creation timestamp object from local image.
    Parameters
    ----------
    docker_client: DockerClient
    Docker client
    image: str
    Image name
    log: logging.Logger | ClickLogger
    Logger
    Returns
    -------
    tuple[datetime, str] | None
    Timestamp containing the creation date and time of the local image and
    the image digest. If the image does not exist, None is returned.
    """
    try:
    img = docker_client.images.get(image)
    except docker.errors.ImageNotFound:
    log.debug(f"Local image does not exist! {image}")
    return None, None
    except docker.errors.APIError:
    log.debug(f"Local info not available! {image}")
    return None, None
    try:
    digest = img.attrs.get("RepoDigests")[0].split("@")[1]
    except Exception:
    digest = None
    timestamp = img.attrs.get("Created")
    timestamp = parse(timestamp)
    return timestamp, digest
    def pull_if_newer(
    docker_client: DockerClient,
    image: str,
    log: logging.Logger | ClickLogger = ClickLogger,
    ) -> None:
    """
    Docker pull only if the remote image is newer.
    Parameters
    ----------
    docker_client: DockerClient
    A Docker client instance
    image: str
    Image to be pulled
    log: logger.Logger or ClickLogger
    Logger class
    Raises
    ------
    docker.errors.APIError
    If the image cannot be pulled
    """
    local_time, local_digest = inspect_local_image_timestamp(
    docker_client, image, log=log
    )
    remote_time, remote_digest = inspect_remote_image_timestamp(
    docker_client, image, log=log
    )
    pull = False
    if local_time and remote_time:
    if remote_digest == local_digest:
    log.debug(f"Local image is up-to-date: {image}")
    elif remote_time > local_time:
    log.debug(f"Remote image is newer: {image}")
    pull = True
    elif remote_time == local_time:
    log.debug(f"Local image is up-to-date: {image}")
    elif remote_time < local_time:
    log.warn(f"Local image is newer! Are you testing? {image}")
    elif local_time:
    log.warn(f"Only a local image has been found! {image}")
    elif remote_time:
    log.debug("No local image found, pulling from remote!")
    pull = True
    elif not local_time and not remote_time:
    log.warn(
    f"Cannot locate image {image} locally or remotely. Will try "
    "to pull it from Docker Hub..."
    )
    # we will try to pull it from the docker hub
    pull = True
    if pull:
    try:
    docker_client.images.pull(image)
    log.debug(f"Succeeded to pull image: {image}")
    except docker.errors.APIError as e:
    log.error(f"Failed to pull image! {image}")
    log.debug(e)
    raise docker.errors.APIError("Failed to pull image") from e
@bartvanb
Copy link
Member

bartvanb commented Apr 8, 2024

I think it's a good idea because the code is now too complex (and not so nice) for working with different registry types to check the versions

Perhaps related to this issue, I'd argue that running something like make base-image should limit itself to building the image. Pushing to a production registry seems unexpected from just running that command. Perhaps we can find a series of make incantations that work for everybody – local tags, make push, .. or whatever we might come up with :)

I'm fine with changing this but to me it's not really high-prio. Maybe easiest is to rename to build-and-push or so - of course splitting is also an option if you want to do it yourself ;-)

@frankcorneliusmartin
Copy link
Contributor

I also agree with this, thanks for your research @lsago.

IMHO, vantage6 shouldn't be tied to a specific registry implementation like harbor – especially to parts of it that are outside of the .

I think indeed it would be good to use this as a reference for future extensions in the docker_addon section.

@mellesies (@harmbuisman) do you still use this feature? If so then keep in mind that you have to adjust your workflow.

lsago added a commit that referenced this issue Apr 8, 2024
@bartvanb bartvanb closed this as completed Apr 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
change Internal change New
Projects
None yet
Development

No branches or pull requests

3 participants