This repository was archived by the owner on Dec 9, 2024. It is now read-only.

Accuracy doesn't increase #209

@GRGargallo

Hi,
I ran tf_cnn_benchmarks and my accuracy doesn't increase. I don't know whether I have misconfigured Horovod for CPU or am using the vgg16 model incorrectly. Is there something I should change? Is it normal that the accuracy doesn't change after 1 epoch?

Node specs:
CPU: Cavium ThunderX (ARM)
Cores per socket: 48
Threads per socket: 96
Sockets per motherboard: 2
Number of nodes: 2
RAM: 128GB
OS: 14.04.5 LTS

Command, using Slurm as the resource manager:

srun  --wait=999999 -K --cpu_bind=verbose python tf_cnn_benchmarks.py \
--variable_update=horovod --device=cpu --data_dir=/home/gramirez/imagenet --num_inter_threads=1 \
--num_intra_threads=48 --model=vgg16 --data_name=imagenet --nodistortions --allow_growth=True \
--optimizer=momentum --data_format=NHWC  --num_batches=1360 --num_warmup_batches=40 \
--batch_size=252 --print_training_accuracy=true --forward_only=true

Output from first MPI rank:

TensorFlow:  1.9
Model:       vgg16
Dataset:     imagenet
Mode:        forward-only
SingleSess:  False
Batch size:  1008 global
             252.0 per device
Num batches: 1360
Num epochs:  1.07
Devices:     ['horovod/cpu:0', 'horovod/cpu:1', 'horovod/cpu:2', 'horovod/cpu:3']
Data format: NHWC
Layout optimizer: False
Optimizer:   momentum
Variables:   horovod
==========
Generating model
Running warm up
Done warm up
Step    Img/sec total_loss      top_1_accuracy  top_5_accuracy
1       images/sec: 4.3 +/- 0.0 (jitter = 0.0)  0.000   0.000   0.004
10      images/sec: 4.2 +/- 0.0 (jitter = 0.0)  0.000   0.000   0.004
.
.
.
1340    images/sec: 4.3 +/- 0.0 (jitter = 0.0)  0.000   0.004   0.008
1350    images/sec: 4.3 +/- 0.0 (jitter = 0.0)  0.000   0.000   0.000
1360    images/sec: 4.2 +/- 0.0 (jitter = 0.0)  0.000   0.000   0.000
----------------------------------------------------------------
total images/sec: 17.00
----------------------------------------------------------------
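For reference, the "Batch size" and "Num epochs" figures in the header above can be reproduced with a little arithmetic. This is only a sketch: the training-set size of 1,281,167 images is an assumption (the standard ILSVRC-2012 train split) and is not stated in the log, and the variable names are illustrative.

```python
# Values copied from the command line and the log above.
per_device_batch = 252    # --batch_size
num_devices = 4           # length of the "Devices" list
num_batches = 1360        # --num_batches
total_images_per_sec = 17.00  # "total images/sec" line

# Assumption: standard ILSVRC-2012 training split size (not shown in the log).
num_train_images = 1_281_167

global_batch = per_device_batch * num_devices
images_processed = num_batches * global_batch
epochs = images_processed / num_train_images

# Rough wall-clock time implied by the reported throughput.
hours = images_processed / total_images_per_sec / 3600

print(f"global batch: {global_batch}")
print(f"epochs:       {epochs:.2f}")
print(f"run time:     {hours:.1f} h")
```

Under that assumption the result matches the reported 1008 global batch size and 1.07 epochs, so the run did cover roughly one pass over the data.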
