Can't train detector model using Google Cloud #2890

@lucasharada

I'm trying to train my own detector model based on the TensorFlow sample and this post. I succeeded in training it locally on a MacBook Pro. The problem is that I don't have a GPU, and training on the CPU is too slow (about 25-40 s per iteration).

So I'm trying to run the training on Google Cloud ML Engine by following the tutorial, but I can't get it to run properly.

My folder structure is as follows:

+ data
 - train.record
 - test.record
+ models
 + train
 + eval
+ training
 - ssd_mobilenet_v1_coco

My steps to change from local training to Google Cloud training were:

  1. Create a bucket in Google Cloud storage and copy my local folder structure with files;
  2. Edit my pipeline.config file and change all paths from Users/dev/detector/ to gs://bucketname/;
  3. Create a YAML file with the default configuration provided in the tutorial;
  4. Run
gcloud ml-engine jobs submit training object_detection_$(date +%s) \
--job-dir=gs://bucketname/models/train \
--packages dist/object_detection-0.1.tar.gz,slim/dist/slim-0.1.tar.gz \
--module-name object_detection.train \
--region us-east1 \
--config /Users/dev/detector/training/cloud.yml \
-- \
--train_dir=gs://bucketname/models/train \
--pipeline_config_path=gs://bucketname/data/pipeline.config
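
Step 2 (switching the paths in pipeline.config from the local folder to the bucket) can be sketched as a plain string replacement. This is only a minimal illustration: the helper name `to_gcs_paths` and the sample `input_path` line are hypothetical, and it assumes every local path in the config shares the same prefix.

```python
def to_gcs_paths(config_text, local_prefix, bucket_uri):
    """Replace every occurrence of a local path prefix with a gs:// bucket URI."""
    # Normalise both prefixes so exactly one slash joins them to the rest of the path.
    local = local_prefix.rstrip("/") + "/"
    bucket = bucket_uri.rstrip("/") + "/"
    return config_text.replace(local, bucket)


# Hypothetical config line, used only to show the effect of the replacement.
sample = 'input_path: "/Users/dev/detector/data/train.record"'
print(to_gcs_paths(sample, "/Users/dev/detector", "gs://bucketname"))
```

A quick search of the rewritten file for any remaining `/Users/` entries is a cheap way to confirm that no local path was missed before uploading it to the bucket.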

Submitting the job gives me the following error message from the ML units:

The replica ps 0 exited with a non-zero status of 1. Termination reason: Error.

    Traceback (most recent call last):
      File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
        "__main__", fname, loader, pkg_name)
      File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
        exec code in run_globals
      File "/root/.local/lib/python2.7/site-packages/object_detection/train.py", line 49, in <module>
        from object_detection import trainer
      File "/root/.local/lib/python2.7/site-packages/object_detection/trainer.py", line 27, in <module>
        from object_detection.builders import preprocessor_builder
      File "/root/.local/lib/python2.7/site-packages/object_detection/builders/preprocessor_builder.py", line 21, in <module>
        from object_detection.protos import preprocessor_pb2
      File "/root/.local/lib/python2.7/site-packages/object_detection/protos/preprocessor_pb2.py", line 71, in <module>
        options=None, file=DESCRIPTOR),
    TypeError: __new__() got an unexpected keyword argument 'file'
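
The failure happens while importing the generated `preprocessor_pb2.py`, which passes a `file` keyword that the installed protobuf library does not accept. One plausible reading (an assumption, not something confirmed in this report) is that the `*_pb2.py` files were generated locally by a newer `protoc` than the protobuf runtime available on the cloud workers. A minimal sketch for comparing the two versions follows; the helper names and the version strings in the usage note are hypothetical.

```python
def version_tuple(version):
    """Turn a dotted version string such as '3.5.1' into (3, 5, 1)."""
    # Non-numeric parts (for example 'rc1' suffixes) are skipped in this sketch.
    return tuple(int(part) for part in version.split(".") if part.isdigit())


def runtime_is_older(runtime_version, compiler_version):
    """Return True if the protobuf runtime is older than the protoc that generated the code."""
    return version_tuple(runtime_version) < version_tuple(compiler_version)


def installed_protobuf_version():
    """Return the installed protobuf runtime version, or None if it is not installed."""
    try:
        import google.protobuf
    except ImportError:
        return None
    return google.protobuf.__version__


# Compare this value with the output of `protoc --version` on the machine
# that generated the *_pb2.py files.
print("protobuf runtime:", installed_protobuf_version())
```

For example, `runtime_is_older("3.1.0", "3.5.1")` returns `True` for these made-up versions. If the worker's runtime turns out to be older, the generated files and the runtime need to be brought to compatible versions.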

Thanks in advance.
