nr_tower #1561

Open
bon1996 opened this issue Jan 11, 2024 · 0 comments

bon1996 commented Jan 11, 2024

If you're asking about an unexpected problem whose root cause you do not know,
use this template. PLEASE DO NOT DELETE THIS TEMPLATE, FILL IT IN:

If you already know the root cause to your problem,
feel free to delete everything in this template.

1. What you did:

import argparse
import os

import tensorflow as tf
from tensorflow.contrib.layers import variance_scaling_initializer
from tensorpack import *
from tensorpack.dataflow import dataset
from tensorpack.tfutils.summary import *
from tensorpack.tfutils.symbolic_functions import *
from tensorpack.utils.gpu import get_nr_gpu

max_epoch=400,
nr_tower=max(get_nr_gpu(), 1),
session_init=SaverRestore(args.load) if args.load else None

launch_train_with_config(config, SyncMultiGPUTrainerParameterServer(nr_tower))

(1) If you're using examples, what's the command you run:

python /home/liuxp/LQ-Nets-master/cifar10-vgg-small.py --gpu 0 --qw 1 --qa 2

(2) If you're using examples, have you made any changes to the examples? Paste git status; git diff here:

(3) If not using examples, help us reproduce your issue:

It's always better to copy-paste what you did than to describe it.

Please try to provide enough information to let others reproduce your issues.
Without reproducing the issue, we may not be able to investigate it.

2. What you observed:

(1) Include the ENTIRE logs here:

Traceback (most recent call last):
  File "/home/liuxp/LQ-Nets-master/cifar10-vgg-small.py", line 157, in <module>
    launch_train_with_config(config, SyncMultiGPUTrainerParameterServer(nr_tower))
NameError: name 'nr_tower' is not defined
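
The NameError is consistent with nr_tower appearing only as a keyword argument name in the snippet under "What you did". The trailing commas there suggest those lines sit inside a call such as TrainConfig(...), but that is an assumption, since the surrounding code is not shown. A keyword name does not create a variable in the enclosing scope, so a later bare use of nr_tower fails. Below is a minimal pure-Python sketch of that behaviour; make_config and count_gpus are hypothetical stand-ins for the tensorpack config constructor and get_nr_gpu, not real tensorpack APIs.

```python
def make_config(**kwargs):
    """Hypothetical stand-in for a config constructor; just stores its kwargs."""
    return dict(kwargs)


def count_gpus():
    """Hypothetical stand-in for get_nr_gpu(); pretends no GPU is visible."""
    return 0


def broken():
    # `nr_tower=` here is only a keyword name inside the call;
    # it does not bind a variable called nr_tower in this scope.
    config = make_config(max_epoch=400, nr_tower=max(count_gpus(), 1))
    return config, nr_tower  # NameError: name 'nr_tower' is not defined


def fixed():
    # Bind the value to a real variable first, then reuse it in both places.
    nr_tower = max(count_gpus(), 1)
    config = make_config(max_epoch=400, nr_tower=nr_tower)
    return config, nr_tower
```

Calling broken() raises the same kind of NameError as in the traceback, while fixed() returns the config together with the tower count. In the real script the equivalent change would be to assign nr_tower = max(get_nr_gpu(), 1) before the launch_train_with_config call, assuming the rest of the script matches the fragment shown.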

Tensorpack typically saves stdout to its training log.
If stderr is relevant, you can run a command with my_command 2>&1 | tee logs.txt
to save both stdout and stderr to one file.

(2) Other observations, if any:
For example, CPU/GPU utilization, output images, tensorboard curves, if relevant to your issue.

3. What you expected, if not obvious.

I want to debug this error.

4. Your environment:

Paste the output of this command: python -m tensorpack.tfutils.collect_env
If this command failed, also tell us your version of Python/TF/tensorpack.

python 3.6.13 h12debd9_1
python-dateutil 2.8.2
python-prctl 1.8.1

tensorflow 1.10.0 gpu_py36h8dbd23f_0
tensorflow-base 1.10.0 gpu_py36h3435052_0
tensorflow-gpu 1.10.0
tensorpack 0.9.9

Note that:

  • You can install tensorpack master by pip install -U git+https://github.com/tensorpack/tensorpack.git
    and see if your issue is already solved.
  • If you're not using tensorpack under a normal command line shell (e.g.,
    using an IDE or jupyter notebook), please retry under a normal command line shell.

You may often want to provide extra information related to your issue, but
at the minimum please try to provide the above information accurately to save effort in the investigation.

@bon1996 bon1996 changed the title Please read & provide the following information nr_tower Jan 11, 2024