Using ImageNet data set #176
I am trying to use the ImageNet data. I am using the script to prepare the data for TF. The data is in the directory:
/home/amalik/NEWIMAGENET/tfrecrd/train
Some examples of the output files:
train-00000-of-01024 train-00106-of-01024 train-00212-of-01024 train-00318-of-01024 train-00424-of-01024 train-00530-of-01024 train-00636-of-01024 train-00742-of-01024 train-00848-of-01024 train-00954-of-01024 validation-00036-of-00128
train-00001-of-01024 train-00107-of-01024 train-00213-of-01024 train-00319-of-01024 train-00425-of-01024 train-00531-of-01024 train-00637-of-01024 train-00743-of-01024 train-00849-of-01024 train-00955-of-01024 validation-00037-of-00128
train-00002-of-01024 train-00108-of-01024 train-00214-of-01024 train-00320-of-01024 train-00426-of-01024 train-00532-of-01024 train-00638-of-01024 train-00744-of-01024 train-00850-of-01024 train-00956-of-01024 validation-00038-of-00128
train-00003-of-01024 train-00109-of-01024 train-00215-of-01024 train-00321-of-01024 train-00427-of-01024 train-00533-of-01024 train-00639-of-01024 train-00745-of-01024 train-00851-of-01024 train-00957-of-01024 validation-00039-of-00128
train-00004-of-01024 train-00110-of-01024 train-00216-of-01024 train-00322-of-01024 train-00428-of-01024 train-00534-of-01024 train-00640-of-01024 train-00746-of-01024 train-00852-of-01024 train-00958-of-01024 validation-00040-of-00128
train-00005-of-01024 train-00111-of-01024 train-00217-of-01024 train-00323-of-01024 train-00429-of-01024 train-00535-of-01024 train-00641-of-01024 train-00747-of-01024 train-00853-of-01024 train-00959-of-01024 validation-00041-of-00128
train-00006-of-01024 train-00112-of-01024 train-00218-of-01024 train-00324-of-01024 train-00430-of-01024 train-00536-of-01024 train-00642-of-01024 train-00748-of-01024 train-00854-of-01024 train-00960-of-01024 validation-00042-of-00128
train-00007-of-01024 train-00113-of-01024 train-00219-of-01024 train-00325-of-01024 train-00431-of-01024 train-00537-of-01024 train-00643-of-01024 train-00749-of-01024 train-00855-of-01024 train-00961-of-01024 validation-00043-of-00128
train-00008-of-01024 train-00114-of-01024 train-00220-of-01024 train-00326-of-01024 train-00432-of-01024 train-00538-of-01024 train-00644-of-01024 train-00750-of-01024 train-00856-of-01024 train-00962-of-01024 validation-00044-of-00128
train-00009-of-01024 train-00115-of-01024 train-00221-of-01024 train-00327-of-01024 train-00433-of-01024 train-00539-of-01024 train-00645-of-01024 train-00751-of-01024 train-00857-of-01024 train-00963-of-01024 validation-00045-of-00128
train-00010-of-01024 train-00116-of-01024 train-00222-of-01024 train-00328-of-01024 train-00434-of-01024 train-00540-of-01024 train-00646-of-01024 train-00752-of-01024 train-00858-of-01024 train-00964-of-01024 validation-00046-of-00128
train-00011-of-01024 train-00117-of-01024 train-00223-of-01024 train-00329-of-01024 train-00435-of-01024 train-00541-of-01024 train-00647-of-01024 train-00753-of-01024 train-00859-of-01024 train-00965-of-01024 validation-00047-of-00128
train-00012-of-01024 train-00118-of-01024 train-00224-of-01024 train-00330-of-01024 train-00436-of-01024 train-00542-of-01024 train-00648-of-01024 train-00754-of-01024 train-00860-of-01024 train-00966-of-01024 validation-00048-of-00128
train-00013-of-01024 train-00119-of-01024 train-00225-of-01024 train-00331-of-01024 train-00437-of-01024 train-00543-of-01024 train-00649-of-01024 train-00755-of-01024 train-00861-of-01024 train-00967-of-01024 validation-00049-of-00128
train-00014-of-01024 train-00120-of-01024 train-00226-of-01024 train-00332-of-01024 train-00438-of-01024 train-00544-of-01024 train-00650-of-01024 train-00756-of-01024 train-00862-of-01024 train-00968-of-01024 validation-00050-of-00128
train-00015-of-01024 train-00121-of-01024 train-00227-of-01024 train-00333-of-01024 train-00439-of-01024 train-00545-of-01024 train-00651-of-01024 train-00757-of-01024 train-00863-of-01024 train-00969-of-01024 validation-00051-of-00128
train-0001
===
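For reference, the shard names in the listing above appear to follow a `<subset>-<index>-of-<total>` pattern with five-digit zero padding. A minimal sketch of that naming scheme, inferred only from the file names shown here (the helpers `shard_name` and `expected_shards` are hypothetical, not part of the preparation script):

```python
def shard_name(subset, index, total):
    """Build a shard file name such as 'train-00000-of-01024'.

    The five-digit zero padding is an assumption inferred from the listing.
    """
    return "%s-%05d-of-%05d" % (subset, index, total)


def expected_shards(subset, total):
    """Return every shard name expected for a subset split into `total` files."""
    return [shard_name(subset, i, total) for i in range(total)]


train_shards = expected_shards("train", 1024)
print(train_shards[0], train_shards[-1])
# prints: train-00000-of-01024 train-01023-of-01024
```

Comparing such a list against the actual directory contents would show whether any shard is missing.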
I am using the following to run the benchmarks for alexnet:
mpirun -np 1 -npernode 1 -x NCCL_DEBUG=INFO python tf_cnn_benchmarks.py --model=alexnet --batch_size=64 --data_name=imagenet --data_dir=/home/amalik/NEWIMAGENET/tfrecrd/train --horovod_device=gpu --num_batches=200 --print_training_accuracy --variable_update horovod
However, I am getting the following error:
WARNING: Logging before flag parsing goes to stderr.
W0430 10:17:19.406574 140542494578496 tf_logging.py:126] From /home/amalik/tenENV/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/datasets/base.py:198: retry (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.
Instructions for updating:
Use the retry module or similar alternatives.
Traceback (most recent call last):
File "tf_cnn_benchmarks.py", line 60, in <module>
app.run(main) # Raises error on invalid flags, unlike tf.app.run()
File "/home/amalik/.local/lib/python2.7/site-packages/absl/app.py", line 274, in run
_run_main(main, argv)
File "/home/amalik/.local/lib/python2.7/site-packages/absl/app.py", line 238, in _run_main
sys.exit(main(argv))
File "tf_cnn_benchmarks.py", line 56, in main
bench.run()
File "/home/amalik/horovod/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py", line 1270, in run
return self._benchmark_cnn()
File "/home/amalik/horovod/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py", line 1391, in _benchmark_cnn
(image_producer_ops, enqueue_ops, fetches) = self._build_model()
File "/home/amalik/horovod/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py", line 1702, in _build_model
image_producer_stages) = self._build_image_processing(shift_ratio=0)
File "/home/amalik/horovod/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py", line 1645, in _build_image_processing
shift_ratio=shift_ratio)
File "/home/amalik/horovod/benchmarks/scripts/tf_cnn_benchmarks/preprocessing.py", line 513, in minibatch
self.parse_and_preprocess, dataset, subset, self.train, cache_data)
File "/home/amalik/horovod/benchmarks/scripts/tf_cnn_benchmarks/data_utils.py", line 82, in create_iterator
file_names = gfile.Glob(glob_pattern)
File "/home/amalik/tenENV/lib/python2.7/site-packages/tensorflow/python/lib/io/file_io.py", line 339, in get_matching_files
compat.as_bytes(single_filename), status)
File "/home/amalik/tenENV/lib/python2.7/site-packages/tensorflow/python/framework/errors_impl.py", line 516, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.NotFoundError: /home/amalik/NEWIMAGENET/tfrecrd/train; No such file or directory
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[49226,1],0]
Exit code: 1
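One way to check what the failing `gfile.Glob` call would see is to test the directory and a shard pattern directly on the host where the process runs. The sketch below uses only the Python standard library; `find_shards` is a hypothetical helper, and the `<subset>-*-of-*` pattern is an assumption based on the file names listed above rather than something confirmed from the benchmark code.

```python
import glob
import os


def find_shards(data_dir, subset="train"):
    """Return the sorted shard files for `subset` found in `data_dir`.

    Raises OSError if `data_dir` is not a directory on this host, which
    mirrors the "No such file or directory" condition in the traceback.
    """
    if not os.path.isdir(data_dir):
        raise OSError("data_dir does not exist on this host: %s" % data_dir)
    # Assumed pattern, inferred from names like train-00000-of-01024.
    pattern = os.path.join(data_dir, "%s-*-of-*" % subset)
    return sorted(glob.glob(pattern))
```

Calling `find_shards("/home/amalik/NEWIMAGENET/tfrecrd/train")` from the same environment and host that `mpirun` launches on would show whether the directory is visible there and how many `train-*` files match.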
How should I access the data for training?
Thanks