feat(image): Research nydus/stargz #1176

Open
gaocegege opened this issue Nov 10, 2022 · 14 comments

@gaocegege
Member

gaocegege commented Nov 10, 2022

Description

stargz and nydus can accelerate the image loading process on Kubernetes. Let's investigate how to integrate them and what benefits they bring to AI/ML use cases.


Message from the maintainers:

Love this enhancement proposal? Give it a 👍. We prioritise the proposals with the most 👍.

@xieydd
Member

xieydd commented Nov 14, 2022

In production environments, I rarely see images below 20 GB; some users even put their data in the image. I think image acceleration is a highlight feature and a practical solution to user problems.

@gaocegege
Member Author

Yep, I think so.

@gaocegege
Member Author

filename: usr/local/lib/python3.8/dist-packages/wrapt-1.14.1.dist-info/top_level.txt, offset: 1046700032, size: 6
filename: usr/local/lib/python3.8/dist-packages/zipp/, offset: 1046701056, size: 0
filename: usr/local/lib/python3.8/dist-packages/zipp/__init__.py, offset: 1046701568, size: 8659
filename: usr/local/lib/python3.8/dist-packages/zipp/__pycache__/, offset: 1046710784, size: 0
filename: usr/local/lib/python3.8/dist-packages/zipp/__pycache__/__init__.cpython-38.pyc, offset: 1046711296, size: 10762
filename: usr/local/lib/python3.8/dist-packages/zipp/__pycache__/py310compat.cpython-38.pyc, offset: 1046723072, size: 406
filename: usr/local/lib/python3.8/dist-packages/zipp/py310compat.py, offset: 1046724096, size: 309
filename: usr/local/lib/python3.8/dist-packages/zipp-3.10.0.dist-info/, offset: 1046725120, size: 0
filename: usr/local/lib/python3.8/dist-packages/zipp-3.10.0.dist-info/INSTALLER, offset: 1046725632, size: 4
filename: usr/local/lib/python3.8/dist-packages/zipp-3.10.0.dist-info/LICENSE, offset: 1046726656, size: 1050
filename: usr/local/lib/python3.8/dist-packages/zipp-3.10.0.dist-info/METADATA, offset: 1046728704, size: 3672
filename: usr/local/lib/python3.8/dist-packages/zipp-3.10.0.dist-info/RECORD, offset: 1046733312, size: 707
filename: usr/local/lib/python3.8/dist-packages/zipp-3.10.0.dist-info/WHEEL, offset: 1046734848, size: 92
filename: usr/local/lib/python3.8/dist-packages/zipp-3.10.0.dist-info/top_level.txt, offset: 1046735872, size: 5

The index above was generated by https://github.com/awslabs/soci-snapshotter.

@gaocegege
Member Author

SOCI addresses these issues by loading from the original, unmodified OCI image. Instead of converting the image, it builds a separate index artifact (the "SOCI index"), which lives in the remote registry, right next to the image itself. At container launch time, SOCI Snapshotter queries the registry for the presence of the SOCI index using the mechanism developed by the OCI Reference Types working group.
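
For reference, creating such an index looks roughly like this. This is only a sketch based on my reading of the soci-snapshotter README: the image name is a placeholder and the exact subcommands/flags should be verified against upstream.

# pull the image into containerd's content store first
sudo nerdctl image pull localhost:5000/app:latest

# build the SOCI index (per-layer ztocs + an index manifest) alongside the image
sudo soci create localhost:5000/app:latest

# push the index artifact to the same registry, next to the image
sudo soci push localhost:5000/app:latest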

@cutecutecat
Member

cutecutecat commented Nov 24, 2022

Differences between stargz and nydus:

Design report:

Pros & Cons:
From my perspective, nydus might be faster and have a lower CPU load, but it requires introducing a standalone executable, while estargz is more compatible with buildkit.

There also seem to be some differences between their image formats; more research is needed.
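
To make the comparison concrete, converting an existing image to each format would look roughly like this. The nerdctl flags are the documented ones; the nydusify flags are my recollection of the dragonflyoss docs, so treat them (and the image names, which are placeholders) as assumptions:

# eStargz: in-place conversion with nerdctl, works with the stock containerd toolchain
sudo nerdctl image convert --estargz --oci \
    localhost:5000/app:latest localhost:5000/app:esgz

# Nydus: needs the standalone nydusify/nydus-image binaries
nydusify convert \
    --source localhost:5000/app:latest \
    --target localhost:5000/app:nydus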

@gaocegege
Member Author

@cutecutecat Are you interested in this? You can pick it up. And I'd appreciate it.

@cutecutecat
Member

cutecutecat commented Nov 24, 2022

@cutecutecat Are you interested in this? You can pick it up. And I'd appreciate it.

@gaocegege
Yes, I would like to pick it up. Is there anything else that needs to be investigated?

Since we know buildkit can build both of them, I think I could build a large image with buildctl and test the time cost and image size.
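
For the build test, the buildkit image exporter should be able to emit both formats directly. A sketch of the output options I have in mind, based on the buildkit docs; the registry/image names are placeholders and the exact option spelling needs to be double-checked:

# export the build result as an eStargz image
buildctl build --frontend dockerfile.v0 \
    --local context=. --local dockerfile=. \
    --output type=image,name=localhost:5000/envd-test:esgz,push=true,compression=estargz,oci-mediatypes=true,force-compression=true

# export the same build as a Nydus image (nydus support may need to be enabled when building buildkit)
buildctl build --frontend dockerfile.v0 \
    --local context=. --local dockerfile=. \
    --output type=image,name=localhost:5000/envd-test:nydus,push=true,compression=nydus,oci-mediatypes=true,force-compression=true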

@cutecutecat
Member

cutecutecat commented Nov 24, 2022

Restriction

Nydus

Nydus conflicts with --export-cache and --import-cache. Is this acceptable in envd? @gaocegege
I think it might not be.

Since exported Nydus image will always have one more metadata layer than images in other compression types, Nydus image cannot be exported/imported as cache.

ref: https://github.com/moby/buildkit/blob/master/docs/nydus.md and moby/buildkit#2581
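
In other words, a nydus build cannot also use the registry cache exporter. A sketch of the flag combination that the doc says is rejected (the cache ref is a placeholder):

# NOT supported per docs/nydus.md: nydus compression combined with cache export/import
buildctl build --frontend dockerfile.v0 \
    --local context=. --local dockerfile=. \
    --output type=image,name=localhost:5000/envd-test:nydus,push=true,compression=nydus,oci-mediatypes=true,force-compression=true \
    --export-cache type=registry,ref=localhost:5000/envd-test:buildcache \
    --import-cache type=registry,ref=localhost:5000/envd-test:buildcache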

Estargz

Rootless execution is currently unsupported.

It seems more acceptable. The Python, R, and Julia caches don't work with rootless mode now, but we should be careful if we want to support rootless caching in the future.
If we pick estargz, we should keep the image format configurable instead of replacing the original image format.

@cutecutecat
Member

Prefetch

Estargz and Nydus support prefetch. This can be used to mitigate runtime performance drawbacks caused by the on-demand fetching of each file.

Maybe we could use https://github.com/docker-slim/docker-slim to scan some typical ML training cases, in order to identify which files are hotspots and need to be prefetched.
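
eStargz also has a workload-driven way to record which files to prioritize: stargz-snapshotter's ctr-remote can run the workload once and reorder the layer contents so that the accessed files come first. A rough sketch, with image names as placeholders and flags taken from my reading of the stargz-snapshotter docs:

# run the workload once, record accessed files, and emit an eStargz image
# whose accessed files are placed before the prefetch landmark
sudo ctr-remote image optimize --oci \
    --entrypoint '["python3", "/train.py"]' \
    localhost:5000/pycache:latest localhost:5000/pycache-opt:latest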

@gaocegege
Member Author

I think we can verify the benefits of these tools. For example, we can run a shell in a TensorFlow image and see the startup time.
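
Something as simple as timing the first run should be enough. A sketch, assuming the stargz snapshotter is installed and <tensorflow-esgz-image> stands in for an eStargz-converted TensorFlow image:

# cold start with lazy pulling; compare against the same command on the
# default overlayfs snapshotter with a regular image
time sudo nerdctl --snapshotter=stargz run --rm <tensorflow-esgz-image> \
    python3 -c "import tensorflow as tf; print(tf.__version__)"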

@cutecutecat
Member

cutecutecat commented Nov 28, 2022

golang:1.18-alpine is used to build and run a simple hello.go to test the Golang build cost with stargz.
mskwyditd/pytorch-cuda-python3.10 is used to run a simple train.py to test the Python build cost with stargz. As docker.io is too slow for a 16 GB image, I deploy a localhost registry as follows:

# limit registry pull speed to 200mbps

sudo docker run -d \                                                       
    --name docker-tc \
    --network host \
    --cap-add NET_ADMIN \
    --restart always \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v /var/docker-tc:/var/docker-tc \
    lukaszlach/docker-tc

sudo docker network create test-net

sudo docker run --net test-net --label "com.docker-tc.limit=200mbps"  -d -p 5000:5000 --restart=always --name registry registry:2

The hello.go is simple:

package main

import "fmt"

func main() {
	fmt.Println("Hello, world!")
}

The train.py uses a CNN to predict MNIST, sourced from file.

nerdctl is used to pull, convert and build the image.

sudo nerdctl image pull mskwyditd/pytorch-cuda-python3.10:latest

sudo nerdctl image convert --estargz --oci starkind/stargz-examples:pycache  starkind/stargz-examples:pycache-stgz

sudo time -o first.txt buildctl build --frontend dockerfile.v0 \
                 --no-cache \
                 --local context=. \
                 --local dockerfile=. \

Image                             | size     | source    | stargz | File     | Pull / s | Run first time / s
golang:1.18-alpine                | 113.35 M | docker.io | no     | hello.go | 96.4     | 1.37
golang:1.18-alpine                | 117.65 M | docker.io | yes    | hello.go | /        | 37.8
mskwyditd/pytorch-cuda-python3.10 | 16.2G    | localhost | no     | train.py | 91.0     | 406.9
mskwyditd/pytorch-cuda-python3.10 | 16.3G    | localhost | yes    | train.py | /        | 420.1

traditional-pytorch-example

 => [internal] load .dockerignore                                                                             0.0s
 => => transferring context: 2B                                                                               0.0s
 => [internal] load build definition from Dockerfile                                                          0.0s
 => => transferring dockerfile: 376B                                                                          0.0s
 => [internal] load metadata for localhost:5000/pycache:latest                                                0.0s
 => [internal] load build context                                                                             0.0s
 => => transferring context: 5.68kB                                                                           0.0s
 => [1/3] FROM localhost:5000/pycache:latest@sha256:b5d0f6ea5ace68790c08cf17201eaa5998ecf53087b9ac57b392841  91.0s
 => => resolve localhost:5000/pycache:latest@sha256:b5d0f6ea5ace68790c08cf17201eaa5998ecf53087b9ac57b3928415  0.0s
 => => sha256:d031a9181ade169343b9a94cbc6cd4e6647e98f64134d33e94ee8f8f7c85ed5c 86.90kB / 86.90kB              0.0s
 => => sha256:29f6e52f2e6080c637928592798904ecedb31e4079c07748edb7376ebbd2e398 63.10kB / 63.10kB              0.0s
 => => sha256:64129b569154cf5afcea88d65c1657a84d9961b7aaf086bd2fe2f2e3ed2fcad8 6.43kB / 6.43kB                0.0s 
 => => sha256:1362a29ff46515e1f117f2bebd093ce13af97bb1a6f27171abc4990dbee4a435 186B / 186B                    0.0s 
 => => sha256:82bb026e1cd969dcc9dface186bc188a104a5f5a03c6cad8ff422f6f3aa98995 7.26GB / 7.26GB               41.1s 
 => => sha256:813ff0237f8341ab86af37666aa400c9640cb266317881233c7112927b791f8c 1.60GB / 1.60GB               10.3s 
 => => sha256:19e4169ce7d724dbcc1a6f5bf9e5dc21a05a6983173f3522c106bfb4994d07a5 1.18GB / 1.18GB                6.8s 
 => => sha256:ccd8058ddd7517692e482566c35645f0bfdd75354260d9ea207de5c699564bee 56.23MB / 56.23MB              0.3s 
 => => sha256:58710bbb48677cfcf4bed3cdd3cbb56f040f85e1b4fc8df8a2715d7760b45c67 4.60MB / 4.60MB                0.0s 
 => => sha256:cf92e523b49ea3d1fae59f5f082437a5f96c244fda6697995920142ff31d59cf 30.43MB / 30.43MB              0.2s 
 => => extracting sha256:cf92e523b49ea3d1fae59f5f082437a5f96c244fda6697995920142ff31d59cf                     0.6s 
 => => extracting sha256:58710bbb48677cfcf4bed3cdd3cbb56f040f85e1b4fc8df8a2715d7760b45c67                     0.1s 
 => => extracting sha256:ccd8058ddd7517692e482566c35645f0bfdd75354260d9ea207de5c699564bee                     0.8s 
 => => extracting sha256:1362a29ff46515e1f117f2bebd093ce13af97bb1a6f27171abc4990dbee4a435                     0.0s
 => => extracting sha256:64129b569154cf5afcea88d65c1657a84d9961b7aaf086bd2fe2f2e3ed2fcad8                     0.0s
 => => extracting sha256:19e4169ce7d724dbcc1a6f5bf9e5dc21a05a6983173f3522c106bfb4994d07a5                    11.0s
 => => extracting sha256:29f6e52f2e6080c637928592798904ecedb31e4079c07748edb7376ebbd2e398                     0.0s
 => => extracting sha256:813ff0237f8341ab86af37666aa400c9640cb266317881233c7112927b791f8c                    20.0s
 => => extracting sha256:d031a9181ade169343b9a94cbc6cd4e6647e98f64134d33e94ee8f8f7c85ed5c                     0.3s
 => => extracting sha256:82bb026e1cd969dcc9dface186bc188a104a5f5a03c6cad8ff422f6f3aa98995                    49.4s
 => [2/3] COPY ./train.py /train.py                                                                          15.9s
 => [3/3] RUN python3 train.py                                                                              406.9s

stargz-pytorch-example

 => [internal] load .dockerignore                                                                             0.0s
 => => transferring context: 2B                                                                               0.0s
 => [internal] load build definition from Dockerfile                                                          0.0s
 => => transferring dockerfile: 293B                                                                          0.0s
 => [internal] load metadata for localhost:5000/pycache-stgz:latest                                           0.0s
 => [internal] load build context                                                                             0.0s
 => => transferring context: 5.68kB                                                                           0.0s
 => [1/3] FROM localhost:5000/pycache-stgz:latest@sha256:878d36dadf5fe645453793433006827170334aa454470e2efa   0.0s
 => => resolve localhost:5000/pycache-stgz:latest@sha256:878d36dadf5fe645453793433006827170334aa454470e2efa   0.0s
 => [2/3] COPY ./train.py /train.py                                                                           0.1s
 => [3/3] RUN python3 train.py                                                                              420.1s

@gaocegege
Member Author

It's awesome!
