
BUG: Could not start v0.11.0 from Docker Compose #1478

Closed
Minamiyama opened this issue May 12, 2024 · 18 comments

@Minamiyama
Contributor

No description provided.

@XprobeBot XprobeBot added the bug Something isn't working label May 12, 2024
@XprobeBot XprobeBot added this to the v0.11.1 milestone May 12, 2024
@XiaoCC
XiaoCC commented May 12, 2024

Same here, the latest image won't start for me either.

@yanmao2023

Me too, +1.

@ChengjieLi28
Contributor

@Minamiyama @yanmao2023 @XiaoCC Could you please help me test this?
Build a new image based on our official image:

FROM xprobe/xinference:v0.11.0

RUN pip install torchvision==0.17.1

And then test it?

On my own machine with two GPUs, I can use xinference normally with the above method.
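
For reference, a minimal sketch of how that test could be run end to end; the tag xinference-torchvision-test is a placeholder, and the port and flags simply mirror those used elsewhere in this thread:

# Save the two lines above as Dockerfile, then build the patched image
docker build -t xinference-torchvision-test .
# Start xinference from the rebuilt image (tag is an assumed placeholder)
docker run -p 9997:9997 --gpus all xinference-torchvision-test xinference-local -H 0.0.0.0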

@Minamiyama
Contributor Author


It doesn't seem to work for me.

@ChengjieLi28
Contributor


Please paste your error stack and the commands you ran.

@Minamiyama
Contributor Author


(screenshots attached)

@ChengjieLi28
Contributor


What's the actual error here? You can also add --log-level debug to your entrypoint command.
Could you just test the following:

  1. Build the new image.
  2. Run it:
docker run -p 9997:9997 --gpus all <the new image> xinference-local --log-level debug -H 0.0.0.0
  3. And then use it.

@Minamiyama
Contributor Author


(screenshot attached)

Is this message useful?

@ChengjieLi28
Contributor

ChengjieLi28 commented May 13, 2024


What's the message in your screenshot? It seems unrelated to xinference; it may be an issue with your CUDA environment. The Docker image uses the pytorch image as its base image. You can check whether you can use that base image directly:

pytorch/pytorch:2.1.2-cuda12.1-cudnn8-devel
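
As a quick sanity check of the CUDA environment (a sketch, not a step prescribed by the maintainers), one could verify that this base image sees the GPUs at all:

# Run a throwaway container and ask PyTorch whether CUDA is usable
docker run --rm --gpus all pytorch/pytorch:2.1.2-cuda12.1-cudnn8-devel \
    python -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"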

@Minamiyama
Contributor Author


(screenshot attached)

That was caused by adding --log-level; maybe I was using it incorrectly.

@Minamiyama
Contributor Author


(screenshot attached)

Nothing new is shown, and it still shuts down automatically.

@ChengjieLi28
Contributor

Just run

docker run --gpus all pytorch/pytorch:2.1.2-cuda12.1-cudnn8-devel tail -f /dev/null

Does it still shut down automatically?

@ChengjieLi28
Contributor


The host machine is running Windows; it may not be able to use 0.0.0.0. I haven't tried Windows. Remove -H 0.0.0.0 and try again.
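
Concretely, that would mean running something like the following (a sketch based on the earlier docker run command, with <the new image> still standing in for the rebuilt image tag):

docker run -p 9997:9997 --gpus all <the new image> xinference-local --log-level debug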

@Minamiyama
Contributor Author

docker run --gpus all pytorch/pytorch:2.1.2-cuda12.1-cudnn8-devel tail -f /dev/null

(screenshot attached)

This one runs normally.

@ChengjieLi28
Contributor


I cannot reproduce this. Please pull and test this image:

docker pull xprobe/xinference:nightly-bug_torchvision_version

This image is built from #1485, and I can use it normally on my Ubuntu machine.
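
A sketch of how that nightly image could be tested, following the same pattern as the earlier commands in this thread (the port is carried over and -H 0.0.0.0 is omitted per the Windows note above; none of this is prescribed by the maintainer):

docker run -p 9997:9997 --gpus all xprobe/xinference:nightly-bug_torchvision_version xinference-local --log-level debug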

@Minamiyama
Contributor Author


(screenshot attached)

This failed as well.

@ChengjieLi28
Contributor

@Minamiyama Try this image:

docker pull xprobe/xinference:nightly-docker_crash_due_to_llama

@XprobeBot XprobeBot modified the milestones: v0.11.1, v0.11.2 May 17, 2024
@Minamiyama
Contributor Author

v0.11.1 works fine.
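
For readers hitting the same problem from Docker Compose, a minimal compose sketch of the kind of setup the issue title refers to, updated to the fixed release; the service name, volume path, and GPU reservation block are assumptions rather than the reporter's actual configuration:

services:
  xinference:
    image: xprobe/xinference:v0.11.1
    command: xinference-local -H 0.0.0.0 --log-level debug
    ports:
      - "9997:9997"
    volumes:
      - ./xinference-data:/root/.xinference   # assumed host path for the model cache
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]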
