MemoryData & JAR error #44

mauricio-onoda · 2016-03-29T17:28:41Z

We launched a cluster with your image and ran the "lenet_memory" example with success.

(1) Then we ran the same example with a different data layer type ("Data" instead of "MemoryData"), as shown in Caffe's directory, and an error occurred:

Example:
==> Original data type:

}
data_param {
source: "mnist_train_lmdb/"
batch_size: 64
backend: LMDB
}

==> New data type:

source_class: "com.yahoo.ml.caffe.LMDB"
memory_data_param {
source: "mnist_train_lmdb/"
batch_size: 64
channels: 1
height: 28
width: 28
share_in_parallel: false
}

Execution:

root@ip-172-31-14-118:~/CaffeOnSpark/data# spark-submit --master spark://$(hostname):7077 \
--files lenet_train_test.prototxt,lenet_solver.prototxt \
--conf spark.cores.max=${TOTAL_CORES} \
--conf spark.driver.extraLibraryPath="${LD_LIBRARY_PATH}" \
--conf spark.executorEnv.LD_LIBRARY_PATH="${LD_LIBRARY_PATH}" \
--class com.yahoo.ml.caffe.CaffeOnSpark \
${CAFFE_ON_SPARK}/caffe-grid/target/caffe-grid-0.1-SNAPSHOT-jar-with-dependencies.jar \
-train \
-features accuracy,loss -label label \
-conf lenet_solver.prototxt \
-clusterSize ${SPARK_WORKER_INSTANCES} \
-devices ${DEVICES} \
-connection ethernet \
-model /mnist.model \
-output /mnist_features_result
....
16/03/29 16:36:25 INFO DataSource$: Source data layer:0
16/03/29 16:36:25 ERROR DataSource$: source_class must be defined for input data layer:Data
Exception in thread "main" java.lang.NullPointerException
...

Does CaffeOnSpark support only the MemoryData type?

(2) We also tested another Caffe example, "mnist_autoencoder". After changing the data layer type to MemoryData in the prototxt file, we got an error:

root@ip-172-31-14-118:~/CaffeOnSpark/data# spark-submit --master spark://$(hostname):7077 \
--files mnist_memory_autoencoder.prototxt, mnist_memory_autoencoder_solver.prototxt \
--conf spark.cores.max=${TOTAL_CORES} \
--conf spark.driver.extraLibraryPath="${LD_LIBRARY_PATH}" \
--conf spark.executorEnv.LD_LIBRARY_PATH="${LD_LIBRARY_PATH}" \
--class com.yahoo.ml.caffe.CaffeOnSpark \
${CAFFE_ON_SPARK}/caffe-grid/target/caffe-grid-0.1-SNAPSHOT-jar-with-dependencies.jar \
-train -persistent \
-features accuracy,loss -label label \
-conf mnist_memory_autoencoder_solver.prototxt \
-clusterSize ${SPARK_WORKER_INSTANCES} \
-devices ${DEVICES} \
-connection ethernet \
-model /mnist_memory.model \
-output /mnist_memory_autoencoder
Error: Cannot load main class from JAR file:/root/CaffeOnSpark/data/mnist_memory_autoencoder_solver.prototxt

These files exist in ~/CaffeOnSpark/data:

root@ip-172-31-14-118:~/CaffeOnSpark/data# ls mni* -l
-rwxr-xr-x 1 root root 5102 Mar 29 16:14 mnist_memory_autoencoder.prototxt
-rwxr-xr-x 1 root root 417 Mar 29 16:06 mnist_memory_autoencoder_solver.prototxt

What are we missing?

anfeng · 2016-03-29T18:52:23Z

Yes. We only support MemoryData right now.
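
Since only MemoryData is supported, one quick check is to look for layers still declared with the plain "Data" type before submitting a job. The snippet below is a minimal sketch, not part of CaffeOnSpark: the temporary file and its two layers are made up for illustration, and the grep pattern assumes the usual `type: "..."` prototxt spelling.

```shell
# Hypothetical net definition, written to a temp file for illustration only.
net=$(mktemp)
cat > "$net" <<'EOF'
layer {
  name: "data"
  type: "Data"
}
layer {
  name: "ip1"
  type: "InnerProduct"
}
EOF

# Count layers declared as plain "Data"; "MemoryData" does not match this pattern.
data_layers=$(grep -c 'type: *"Data"' "$net")
echo "plain Data layers: $data_layers"
rm -f "$net"
```

A non-zero count points at layers that would need to be converted to MemoryData with a source_class, as in the lenet_memory example.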

What is the content of your /data/mnist_memory_autoencoder_solver.prototxt? More specifically, I would like to know its value for source_class.

mauricio-onoda · 2016-03-29T19:48:35Z

name: "MNISTAutoencoder"
layer {
name: "data"
type: "MemoryData"
top: "data"
include {
phase: TRAIN
}
transform_param {
scale: 0.0039215684
}
source_class: "com.yahoo.ml.caffe.LMDB"
memory_data_param {
source: "mnist_train_lmdb/"
batch_size: 64
channels: 1
height: 28
width: 28
share_in_parallel: false
}
}

The source_class is the same for the TEST phase layers (stages test-on-train and test-on-test).

anfeng · 2016-03-29T21:10:31Z

I suspect that you missed some spaces in your CLI. Please add a space character before each backslash (\).

mauricio-onoda · 2016-03-30T13:03:30Z

I found the problem! On the second line of my command there was a space between the file names:

--files mnist_memory_autoencoder.prototxt, mnist_memory_autoencoder_solver.prototxt\

With a space after the comma, the error occurs.
After I removed the space after the comma, my command worked.
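
The reason is ordinary shell word splitting: a space after the comma turns the file list into two separate arguments, and spark-submit then presumably takes the second file name as the application JAR, which matches the "Cannot load main class from JAR file" message. A minimal sketch, using a hypothetical helper `count_args` and placeholder file names:

```shell
# Hypothetical helper: prints how many arguments the shell passed to it.
count_args() { echo "$#"; }

# Space after the comma: "--files", "a.prototxt,", "b.prototxt" are 3 arguments.
with_space=$(count_args --files a.prototxt, b.prototxt)

# No space: the whole comma-separated list stays one argument, 2 in total.
without_space=$(count_args --files a.prototxt,b.prototxt)

echo "with space: $with_space, without space: $without_space"
# prints "with space: 3, without space: 2"
```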

However, we got another error:

16/03/30 12:49:12 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, ip-172-31-2-118.eu-west-1.compute.internal, partition 0,PROCESS_LOCAL, 2216 bytes)
16/03/30 12:49:12 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, ip-172-31-2-117.eu-west-1.compute.internal, partition 1,PROCESS_LOCAL, 2216 bytes)
16/03/30 12:49:12 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-172-31-2-118.eu-west-1.compute.internal:39514 (size: 2.0 KB, free: 8.9 GB)
16/03/30 12:49:12 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-172-31-2-117.eu-west-1.compute.internal:58422 (size: 2.0 KB, free: 8.9 GB)
16/03/30 12:49:12 WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1, ip-172-31-2-117.eu-west-1.compute.internal): java.io.FileNotFoundException: /root/CaffeOnSpark/data/mnist_memory_autoencoder.prototxt (No such file or directory)

The question is: must the prototxt files exist on all workers (nodes)? If so, how can I copy these files to the workers?

Thanks again!

mauricio-onoda · 2016-03-30T13:43:41Z

My mistake! I had used a version of mnist_memory_autoencoder_solver.prototxt with a path in the "net" parameter. After I removed the path, it worked.
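
For anyone hitting the same FileNotFoundException, the fix amounts to leaving only the bare file name in the solver's "net" value, so that it refers to the copy shipped with --files. The snippet below is a sketch under assumptions: the solver content is hypothetical (the real file is not shown in this thread), with the path taken from the exception message above.

```shell
# Hypothetical solver file; the path is assumed from the FileNotFoundException above.
solver=$(mktemp)
printf 'net: "/root/CaffeOnSpark/data/mnist_memory_autoencoder.prototxt"\n' > "$solver"

# Keep only the file name in the net: value, dropping any directory part.
fixed=$(sed -E 's#^(net: *")[^"]*/([^"/]+")#\1\2#' "$solver")
echo "$fixed"
rm -f "$solver"
```

Here the sed expression only rewrites a line that starts with `net:`; other solver settings would pass through unchanged.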

githubier · 2017-01-12T06:37:04Z

I hit the same error (NullPointerException) when training another network (CaffeNet); see issue #217 for more detail.
I changed the spark-submit command to reference the files under ${CAFFE_ON_SPARK}/data/, so it is now:
spark-submit --master ${MASTER_URL} \
--files ${CAFFE_ON_SPARK}/data/solver.prototxt,${CAFFE_ON_SPARK}/data/train_val.prototxt \
--conf spark.cores.max=${TOTAL_CORES} \
--conf spark.task.cpus=${CORES_PER_WORKER} \
--conf spark.driver.extraLibraryPath="${LD_LIBRARY_PATH}" \
--conf spark.executorEnv.LD_LIBRARY_PATH="${LD_LIBRARY_PATH}" \
--class com.yahoo.ml.caffe.CaffeOnSpark \
${CAFFE_ON_SPARK}/caffe-grid/target/caffe-grid-0.1-SNAPSHOT-jar-with-dependencies.jar \
-train \
-features accuracy,loss -label label \
-conf solver.prototxt \
-clusterSize ${SPARK_WORKER_INSTANCES} \
-devices 1 \
-connection ethernet \
-model file:${CAFFE_ON_SPARK}/myself_caffenet.model \
-output file:${CAFFE_ON_SPARK}/myself_result

Both solver.prototxt and train_val.prototxt are located in ${CAFFE_ON_SPARK}/data/, and the error is:
17/01/11 20:49:34 ERROR caffe.DataSource$: source_class must be defined for input data layer:Data
Exception in thread "main" java.lang.NullPointerException
at com.yahoo.ml.caffe.CaffeOnSpark.train(CaffeOnSpark.scala:103)
at com.yahoo.ml.caffe.CaffeOnSpark$.main(CaffeOnSpark.scala:40)
at com.yahoo.ml.caffe.CaffeOnSpark.main(CaffeOnSpark.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
17/01/11 20:49:34 INFO spark.SparkContext: Invoking stop() from shutdown hook

Where is my mistake?

mauricio-onoda closed this as completed Mar 30, 2016
