I can't get the Horovod image using the command 'docker pull uber/horovod' provided by the tutorial. How can I get the Docker environment? #25

Closed
MaxwellHan opened this issue Nov 7, 2018 · 8 comments

Comments

@MaxwellHan

This template is for miscellaneous issues not covered by the other issue categories.

For questions on how to work with PocketFlow, or support for problems that are not verified bugs in PocketFlow, please go to StackOverflow.

For high-level discussions about TensorFlow, please post to discuss group.

@jiaxiang-wu
Contributor

Hi, can you post the error message?

@herbiezhao

Try pulling an explicitly tagged image: "docker pull uber/horovod:0.14.1-tf1.10.0-torch0.4.0-py3.5"

@MaxwellHan
Author

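> Hi, can you post the error message?

After reading the tutorial, I did the following:
step 1. Set the path of the CIFAR dataset in path.conf.
step 2. Pulled the image with docker pull uber/horovod:0.15.1-tf1.11.0-torch0.4.1-py3.5.
step 3. Edited scripts/run_docker.sh, changing the image name passed to nvidia-docker from
docker.oa.com/g_tfplus/horovod:python3.5 to uber/horovod:0.15.1-tf1.11.0-torch0.4.1-py3.5.
step 4. Ran ./scripts/run_docker.sh nets/resnet_at_cifar10_run.py, which dropped me into a bash prompt inside the container. However, the GPU was not used and nothing was computed. Is there something wrong with these steps, or what should I do next?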
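
For readers following along, the image swap in step 3 can be sketched as a small shell helper. This is only an illustration: `swap_image` is a hypothetical name, and it assumes the internal image name appears literally in scripts/run_docker.sh, whose contents are not shown in this thread.

```shell
#!/usr/bin/env bash
# Sketch: replace the Tencent-internal image name with a public one.
# Assumption: the image name appears as a literal string in the launcher script.

OLD_IMAGE="docker.oa.com/g_tfplus/horovod:python3.5"
NEW_IMAGE="uber/horovod:0.15.1-tf1.11.0-torch0.4.1-py3.5"

# swap_image FILE
# Rewrites every occurrence of OLD_IMAGE in FILE and keeps a backup as FILE.bak.
swap_image() {
  local file="$1"
  # '|' is used as the sed delimiter because image names contain '/'.
  # The dots in OLD_IMAGE act as regex wildcards, which is harmless here.
  sed -i.bak "s|${OLD_IMAGE}|${NEW_IMAGE}|g" "$file"
}
```

Typical use would be `docker pull "$NEW_IMAGE"` followed by `swap_image scripts/run_docker.sh`; the `.bak` copy makes it easy to revert.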

@jiaxiang-wu
Contributor

After entering the docker environment, use the following command to start the program:

$ bash main.sh

P.S.: Please use English for future discussion, if possible.

@MaxwellHan
Author

> After entering the docker environment, use the following command to start the program:
>
> $ bash main.sh
>
> P.S.: Please use English for future discussion, if possible.

When I get into the Docker environment and run "./main.sh", I get:
"Could not find a version that satisfies the requirement docopt"
"Could not find a version that satisfies the requirement hdfs"
......
"Could not find a version that satisfies the requirement pandas", etc.,
which means Python doesn't have those modules.
Is the Docker image "uber/horovod:0.15.1-tf1.11.0-torch0.4.1-py3.5" the wrong image?

My Docker host is on an internal network, which means I can't reach the internet from inside the Docker container. How do I fix this?
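
When a log contains many such lines, it can help to reduce it to the list of packages pip could not resolve. The helper below is a minimal sketch, not part of PocketFlow; `missing_requirements` is a hypothetical name, and it relies only on the message text quoted above.

```shell
#!/usr/bin/env bash
# Sketch: list the packages pip failed to resolve, based on the error text above.

# missing_requirements
# Reads pip output on stdin and prints each unresolved requirement once, sorted.
missing_requirements() {
  sed -n 's/.*Could not find a version that satisfies the requirement \([^ ]*\).*/\1/p' \
    | sort -u
}
```

For example, `bash main.sh 2>&1 | missing_requirements` would print one package name per line, which makes it easy to see whether every install failed (pointing at the package index) or only a few.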

@xieydd

xieydd commented Nov 9, 2018

I have tested it in k8s with Horovod, and it's OK.

@herbiezhao

Hehe, that's because they use it inside Tencent with an internal package source. You should use the Tsinghua or Aliyun source instead of the Tencent internal one. Please modify main.sh with "index-url = https://pypi.tuna.tsinghua.edu.cn/simple" and "trusted-host=mirrors.aliyun.com"

@jiaxiang-wu
Contributor

@herbiezhao
Thanks for pointing this out. Users outside Tencent need to modify "index-url" and "trusted-host" in main.sh to be able to install the extra dependencies.
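
As a rough illustration of that change, the sketch below writes a pip configuration file with both settings. How main.sh actually sets these values is not shown in this thread, so `write_pip_conf` and the default path are assumptions. Because pip applies `trusted-host` only to the host it names, the sketch derives that host from the index URL instead of taking it separately.

```shell
#!/usr/bin/env bash
# Sketch: point pip at a public mirror via a config file.
# Assumption: writing a pip config file has the same effect as the
# "index-url" / "trusted-host" settings that main.sh is said to contain.

# write_pip_conf INDEX_URL [CONF_PATH]
# Writes a [global] section with index-url and a matching trusted-host.
write_pip_conf() {
  local index_url="$1"
  local conf_path="${2:-$HOME/.pip/pip.conf}"  # legacy per-user location read by pip
  # Derive the host from the URL: drop the scheme, then everything after the first '/'.
  local host="${index_url#*://}"
  host="${host%%/*}"
  mkdir -p "$(dirname "$conf_path")"
  cat > "$conf_path" <<EOF
[global]
index-url = ${index_url}
trusted-host = ${host}
EOF
}
```

Calling `write_pip_conf "https://pypi.tuna.tsinghua.edu.cn/simple"` inside the container before `bash main.sh` would be one way to try this. It only helps if the container can reach that mirror; on a fully isolated network, an index reachable from inside that network is needed instead.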
