Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/03/27 06:06:12 INFO CoarseGrainedExecutorBackend: Registered signal handlers for [TERM, HUP, INT]
16/03/27 06:06:12 INFO SecurityManager: Changing view acls to: root
16/03/27 06:06:12 INFO SecurityManager: Changing modify acls to: root
16/03/27 06:06:12 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
16/03/27 06:06:13 INFO SecurityManager: Changing view acls to: root
16/03/27 06:06:13 INFO SecurityManager: Changing modify acls to: root
16/03/27 06:06:13 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
16/03/27 06:06:13 INFO Slf4jLogger: Slf4jLogger started
16/03/27 06:06:13 INFO Remoting: Starting remoting
16/03/27 06:06:14 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkExecutorActorSystem@172.31.26.41:60184]
16/03/27 06:06:14 INFO Utils: Successfully started service 'sparkExecutorActorSystem' on port 60184.
16/03/27 06:06:14 INFO DiskBlockManager: Created local directory at /mnt/spark/spark-6c1a158a-29b7-44b6-9ba8-6b797e698dd9/executor-d30ff3f3-8f4a-4616-994d-4ddaf96f3c52/blockmgr-1907fd4d-1aac-4dc9-86ed-57b4f883bab5
16/03/27 06:06:14 INFO DiskBlockManager: Created local directory at /mnt2/spark/spark-9e87ca63-ed07-4268-88c7-7075fe65431d/executor-597385c8-b9a6-4fd5-a019-9a57a0e642ae/blockmgr-c9085e42-6022-4d6b-a65c-75f71ad401c9
16/03/27 06:06:14 INFO MemoryStore: MemoryStore started with capacity 37.9 GB
16/03/27 06:06:14 INFO CoarseGrainedExecutorBackend: Connecting to driver: spark://CoarseGrainedScheduler@172.31.29.4:37094
16/03/27 06:06:14 INFO WorkerWatcher: Connecting to worker spark://Worker@172.31.26.41:35341
16/03/27 06:06:14 INFO CoarseGrainedExecutorBackend: Successfully registered with driver
16/03/27 06:06:14 INFO Executor: Starting executor ID 0 on host ip-172-31-26-41.eu-west-1.compute.internal
16/03/27 06:06:14 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 49778.
16/03/27 06:06:14 INFO NettyBlockTransferService: Server created on 49778
16/03/27 06:06:14 INFO BlockManagerMaster: Trying to register BlockManager
16/03/27 06:06:14 INFO BlockManagerMaster: Registered BlockManager
16/03/27 06:06:16 INFO CoarseGrainedExecutorBackend: Got assigned task 1
16/03/27 06:06:16 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
16/03/27 06:06:16 INFO Executor: Fetching http://172.31.29.4:47335/files/lenet_memory_solver.prototxt with timestamp 1459058771583
16/03/27 06:06:16 INFO Utils: Fetching http://172.31.29.4:47335/files/lenet_memory_solver.prototxt to /mnt/spark/spark-6c1a158a-29b7-44b6-9ba8-6b797e698dd9/executor-d30ff3f3-8f4a-4616-994d-4ddaf96f3c52/spark-752a905d-e7fc-4f5e-ae64-ab2356bb0ad4/fetchFileTemp8606045603229999583.tmp
16/03/27 06:06:16 INFO Utils: Copying /mnt/spark/spark-6c1a158a-29b7-44b6-9ba8-6b797e698dd9/executor-d30ff3f3-8f4a-4616-994d-4ddaf96f3c52/spark-752a905d-e7fc-4f5e-ae64-ab2356bb0ad4/-18436743301459058771583_cache to /root/spark/work/app-20160327060611-0002/0/./lenet_memory_solver.prototxt
16/03/27 06:06:16 INFO Executor: Fetching http://172.31.29.4:47335/files/lenet_memory_train_test.prototxt with timestamp 1459058771575
16/03/27 06:06:16 INFO Utils: Fetching http://172.31.29.4:47335/files/lenet_memory_train_test.prototxt to /mnt/spark/spark-6c1a158a-29b7-44b6-9ba8-6b797e698dd9/executor-d30ff3f3-8f4a-4616-994d-4ddaf96f3c52/spark-752a905d-e7fc-4f5e-ae64-ab2356bb0ad4/fetchFileTemp3022397165667968136.tmp
16/03/27 06:06:16 INFO Utils: Copying /mnt/spark/spark-6c1a158a-29b7-44b6-9ba8-6b797e698dd9/executor-d30ff3f3-8f4a-4616-994d-4ddaf96f3c52/spark-752a905d-e7fc-4f5e-ae64-ab2356bb0ad4/211215521459058771575_cache to /root/spark/work/app-20160327060611-0002/0/./lenet_memory_train_test.prototxt
16/03/27 06:06:16 INFO Executor: Fetching http://172.31.29.4:47335/jars/caffe-grid-0.1-SNAPSHOT-jar-with-dependencies.jar with timestamp 1459058771465
16/03/27 06:06:16 INFO Utils: Fetching http://172.31.29.4:47335/jars/caffe-grid-0.1-SNAPSHOT-jar-with-dependencies.jar to /mnt/spark/spark-6c1a158a-29b7-44b6-9ba8-6b797e698dd9/executor-d30ff3f3-8f4a-4616-994d-4ddaf96f3c52/spark-752a905d-e7fc-4f5e-ae64-ab2356bb0ad4/fetchFileTemp7513563774151534830.tmp
16/03/27 06:06:16 INFO Utils: Copying /mnt/spark/spark-6c1a158a-29b7-44b6-9ba8-6b797e698dd9/executor-d30ff3f3-8f4a-4616-994d-4ddaf96f3c52/spark-752a905d-e7fc-4f5e-ae64-ab2356bb0ad4/-9138904761459058771465_cache to /root/spark/work/app-20160327060611-0002/0/./caffe-grid-0.1-SNAPSHOT-jar-with-dependencies.jar
16/03/27 06:06:16 INFO Executor: Adding file:/root/spark/work/app-20160327060611-0002/0/./caffe-grid-0.1-SNAPSHOT-jar-with-dependencies.jar to class loader
16/03/27 06:06:16 INFO TorrentBroadcast: Started reading broadcast variable 0
16/03/27 06:06:16 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 2.0 KB, free 2.0 KB)
16/03/27 06:06:16 INFO TorrentBroadcast: Reading broadcast variable 0 took 136 ms
16/03/27 06:06:16 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 3.2 KB, free 5.3 KB)
16/03/27 06:06:16 INFO CaffeProcessor: my rank is 1
16/03/27 06:06:16 INFO LMDB: Batch size:64
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0327 06:06:18.130993  4559 CaffeNet.cpp:78] set root solver device id to 0
I0327 06:06:18.237562  4559 solver.cpp:48] Initializing solver from parameters:
test_iter: 1
test_interval: 10001
base_lr: 0.01
display: 100
max_iter: 10000
lr_policy: "inv"
gamma: 0.0001
power: 0.75
momentum: 0.9
weight_decay: 0.0005
snapshot: 10001
snapshot_prefix: "mnist_lenet"
solver_mode: GPU
device_id: 0
net_param {
  name: "LeNet"
  layer { name: "data" type: "MemoryData" top: "data" top: "label" include { phase: TRAIN } memory_data_param { batch_size: 64 channels: 1 height: 28 width: 28 share_in_parallel: false source: "file:///root/CaffeOnSpark/data/mnist_train_lmdb/" } source_class: "com.yahoo.ml.caffe.LMDB" }
  layer { name: "data" type: "MemoryData" top: "data" top: "label" include { phase: TEST } memory_data_param { batch_size: 100 channels: 1 height: 28 width: 28 share_in_parallel: false source: "file:///root/CaffeOnSpark/data/mnist_test_lmdb/" } source_class: "com.yahoo.ml.caffe.LMDB" }
  layer { name: "conv1" type: "Convolution" bottom: "data" top: "conv1" param { lr_mult: 1 } param { lr_mult: 2 } convolution_param { num_output: 20 kernel_size: 5 stride: 1 weight_filler { type: "xavier" } bias_filler { type: "constant" } } }
  layer { name: "pool1" type: "Pooling" bottom: "conv1" top: "pool1" pooling_param { pool: MAX kernel_size: 2 stride: 2 } }
  layer { name: "conv2" type: "Convolution" bottom: "pool1" top: "conv2" param { lr_mult: 1 } param { lr_mult: 2 } convolution_param { num_output: 50 kernel_size: 5 stride: 1 weight_filler { type: "xavier" } bias_filler { type: "constant" } } }
  layer { name: "pool2" type: "Pooling" bottom: "conv2" top: "pool2" pooling_param { pool: MAX kernel_size: 2 stride: 2 } }
  layer { name: "ip1" type: "InnerProduct" bottom: "pool2" top: "ip1" param { lr_mult: 1 } param { lr_mult: 2 } inner_product_param { num_output: 500 weight_filler { type: "xavier" } bias_filler { type: "constant" } } }
  layer { name: "relu1" type: "ReLU" bottom: "ip1" top: "ip1" }
  layer { name: "ip2" type: "InnerProduct" bottom: "ip1" top: "ip2" param { lr_mult: 1 } param { lr_mult: 2 } inner_product_param { num_output: 10 weight_filler { type: "xavier" } bias_filler { type: "constant" } } }
  layer { name: "accuracy" type: "Accuracy" bottom: "ip2" bottom: "label" top: "accuracy" include { phase: TEST } }
  layer { name: "loss" type: "SoftmaxWithLoss" bottom: "ip2" bottom: "label" top: "loss" }
}
test_initialization: false
I0327 06:06:18.237716  4559 solver.cpp:86] Creating training net specified in net_param.
I0327 06:06:18.237805  4559 net.cpp:322] The NetState phase (0) differed from the phase (1) specified by a rule in layer data
I0327 06:06:18.237825  4559 net.cpp:322] The NetState phase (0) differed from the phase (1) specified by a rule in layer accuracy
I0327 06:06:18.237910  4559 net.cpp:49] Initializing net from parameters:
name: "LeNet"
state { phase: TRAIN }
layer { name: "data" type: "MemoryData" top: "data" top: "label" include { phase: TRAIN } memory_data_param { batch_size: 64 channels: 1 height: 28 width: 28 share_in_parallel: false source: "file:///root/CaffeOnSpark/data/mnist_train_lmdb/" } source_class: "com.yahoo.ml.caffe.LMDB" }
layer { name: "conv1" type: "Convolution" bottom: "data" top: "conv1" param { lr_mult: 1 } param { lr_mult: 2 } convolution_param { num_output: 20 kernel_size: 5 stride: 1 weight_filler { type: "xavier" } bias_filler { type: "constant" } } }
layer { name: "pool1" type: "Pooling" bottom: "conv1" top: "pool1" pooling_param { pool: MAX kernel_size: 2 stride: 2 } }
layer { name: "conv2" type: "Convolution" bottom: "pool1" top: "conv2" param { lr_mult: 1 } param { lr_mult: 2 } convolution_param { num_output: 50 kernel_size: 5 stride: 1 weight_filler { type: "xavier" } bias_filler { type: "constant" } } }
layer { name: "pool2" type: "Pooling" bottom: "conv2" top: "pool2" pooling_param { pool: MAX kernel_size: 2 stride: 2 } }
layer { name: "ip1" type: "InnerProduct" bottom: "pool2" top: "ip1" param { lr_mult: 1 } param { lr_mult: 2 } inner_product_param { num_output: 500 weight_filler { type: "xavier" } bias_filler { type: "constant" } } }
layer { name: "relu1" type: "ReLU" bottom: "ip1" top: "ip1" }
layer { name: "ip2" type: "InnerProduct" bottom: "ip1" top: "ip2" param { lr_mult: 1 } param { lr_mult: 2 } inner_product_param { num_output: 10 weight_filler { type: "xavier" } bias_filler { type: "constant" } } }
layer { name: "loss" type: "SoftmaxWithLoss" bottom: "ip2" bottom: "label" top: "loss" }
I0327 06:06:18.238009  4559 layer_factory.hpp:77] Creating layer data
I0327 06:06:18.238044  4559 net.cpp:106] Creating Layer data
I0327 06:06:18.238065  4559 net.cpp:411] data -> data
I0327 06:06:18.238100  4559 net.cpp:411] data -> label
I0327 06:06:18.240193  4559 net.cpp:150] Setting up data
I0327 06:06:18.240218  4559 net.cpp:157] Top shape: 64 1 28 28 (50176)
I0327 06:06:18.240226  4559 net.cpp:157] Top shape: 64 (64)
I0327 06:06:18.240229  4559 net.cpp:165] Memory required for data: 200960
I0327 06:06:18.240239  4559 layer_factory.hpp:77] Creating layer conv1
I0327 06:06:18.240269  4559 net.cpp:106] Creating Layer conv1
I0327 06:06:18.240284  4559 net.cpp:454] conv1 <- data
I0327 06:06:18.240298  4559 net.cpp:411] conv1 -> conv1
I0327 06:06:18.242127  4559 net.cpp:150] Setting up conv1
I0327 06:06:18.242148  4559 net.cpp:157] Top shape: 64 20 24 24 (737280)
I0327 06:06:18.242153  4559 net.cpp:165] Memory required for data: 3150080
I0327 06:06:18.242175  4559 layer_factory.hpp:77] Creating layer pool1
I0327 06:06:18.242194  4559 net.cpp:106] Creating Layer pool1
I0327 06:06:18.242200  4559 net.cpp:454] pool1 <- conv1
I0327 06:06:18.242207  4559 net.cpp:411] pool1 -> pool1
I0327 06:06:18.242310  4559 net.cpp:150] Setting up pool1
I0327 06:06:18.242326  4559 net.cpp:157] Top shape: 64 20 12 12 (184320)
I0327 06:06:18.242331  4559 net.cpp:165] Memory required for data: 3887360
I0327 06:06:18.242336  4559 layer_factory.hpp:77] Creating layer conv2
I0327 06:06:18.242352  4559 net.cpp:106] Creating Layer conv2
I0327 06:06:18.242363  4559 net.cpp:454] conv2 <- pool1
I0327 06:06:18.242372  4559 net.cpp:411] conv2 -> conv2
I0327 06:06:18.243177  4559 net.cpp:150] Setting up conv2
I0327 06:06:18.243196  4559 net.cpp:157] Top shape: 64 50 8 8 (204800)
I0327 06:06:18.243201  4559 net.cpp:165] Memory required for data: 4706560
I0327 06:06:18.243211  4559 layer_factory.hpp:77] Creating layer pool2
I0327 06:06:18.243226  4559 net.cpp:106] Creating Layer pool2
I0327 06:06:18.243230  4559 net.cpp:454] pool2 <- conv2
I0327 06:06:18.243237  4559 net.cpp:411] pool2 -> pool2
I0327 06:06:18.243324  4559 net.cpp:150] Setting up pool2
I0327 06:06:18.243340  4559 net.cpp:157] Top shape: 64 50 4 4 (51200)
I0327 06:06:18.243345  4559 net.cpp:165] Memory required for data: 4911360
I0327 06:06:18.243348  4559 layer_factory.hpp:77] Creating layer ip1
I0327 06:06:18.243360  4559 net.cpp:106] Creating Layer ip1
I0327 06:06:18.243366  4559 net.cpp:454] ip1 <- pool2
I0327 06:06:18.243376  4559 net.cpp:411] ip1 -> ip1
I0327 06:06:18.247448  4559 net.cpp:150] Setting up ip1
I0327 06:06:18.247467  4559 net.cpp:157] Top shape: 64 500 (32000)
I0327 06:06:18.247473  4559 net.cpp:165] Memory required for data: 5039360
I0327 06:06:18.247485  4559 layer_factory.hpp:77] Creating layer relu1
I0327 06:06:18.247500  4559 net.cpp:106] Creating Layer relu1
I0327 06:06:18.247504  4559 net.cpp:454] relu1 <- ip1
I0327 06:06:18.247511  4559 net.cpp:397] relu1 -> ip1 (in-place)
I0327 06:06:18.247524  4559 net.cpp:150] Setting up relu1
I0327 06:06:18.247531  4559 net.cpp:157] Top shape: 64 500 (32000)
I0327 06:06:18.247535  4559 net.cpp:165] Memory required for data: 5167360
I0327 06:06:18.247539  4559 layer_factory.hpp:77] Creating layer ip2
I0327 06:06:18.247546  4559 net.cpp:106] Creating Layer ip2
I0327 06:06:18.247550  4559 net.cpp:454] ip2 <- ip1
I0327 06:06:18.247560  4559 net.cpp:411] ip2 -> ip2
I0327 06:06:18.248672  4559 net.cpp:150] Setting up ip2
I0327 06:06:18.248692  4559 net.cpp:157] Top shape: 64 10 (640)
I0327 06:06:18.248697  4559 net.cpp:165] Memory required for data: 5169920
I0327 06:06:18.248704  4559 layer_factory.hpp:77] Creating layer loss
I0327 06:06:18.248719  4559 net.cpp:106] Creating Layer loss
I0327 06:06:18.248730  4559 net.cpp:454] loss <- ip2
I0327 06:06:18.248736  4559 net.cpp:454] loss <- label
I0327 06:06:18.248747  4559 net.cpp:411] loss -> loss
I0327 06:06:18.248767  4559 layer_factory.hpp:77] Creating layer loss
I0327 06:06:18.249009  4559 net.cpp:150] Setting up loss
I0327 06:06:18.249027  4559 net.cpp:157] Top shape: (1)
I0327 06:06:18.249032  4559 net.cpp:160]     with loss weight 1
I0327 06:06:18.249055  4559 net.cpp:165] Memory required for data: 5169924
I0327 06:06:18.249060  4559 net.cpp:226] loss needs backward computation.
I0327 06:06:18.249065  4559 net.cpp:226] ip2 needs backward computation.
I0327 06:06:18.249069  4559 net.cpp:226] relu1 needs backward computation.
I0327 06:06:18.249073  4559 net.cpp:226] ip1 needs backward computation.
I0327 06:06:18.249078  4559 net.cpp:226] pool2 needs backward computation.
I0327 06:06:18.249080  4559 net.cpp:226] conv2 needs backward computation.
I0327 06:06:18.249086  4559 net.cpp:226] pool1 needs backward computation.
I0327 06:06:18.249092  4559 net.cpp:226] conv1 needs backward computation.
I0327 06:06:18.249097  4559 net.cpp:228] data does not need backward computation.
I0327 06:06:18.249101  4559 net.cpp:270] This network produces output loss
I0327 06:06:18.249111  4559 net.cpp:283] Network initialization done.
I0327 06:06:18.249166  4559 solver.cpp:181] Creating test net (#0) specified by net_param
I0327 06:06:18.249196  4559 net.cpp:322] The NetState phase (1) differed from the phase (0) specified by a rule in layer data
I0327 06:06:18.249289  4559 net.cpp:49] Initializing net from parameters:
name: "LeNet"
state { phase: TEST }
layer { name: "data" type: "MemoryData" top: "data" top: "label" include { phase: TEST } memory_data_param { batch_size: 100 channels: 1 height: 28 width: 28 share_in_parallel: false source: "file:///root/CaffeOnSpark/data/mnist_test_lmdb/" } source_class: "com.yahoo.ml.caffe.LMDB" }
layer { name: "conv1" type: "Convolution" bottom: "data" top: "conv1" param { lr_mult: 1 } param { lr_mult: 2 } convolution_param { num_output: 20 kernel_size: 5 stride: 1 weight_filler { type: "xavier" } bias_filler { type: "constant" } } }
layer { name: "pool1" type: "Pooling" bottom: "conv1" top: "pool1" pooling_param { pool: MAX kernel_size: 2 stride: 2 } }
layer { name: "conv2" type: "Convolution" bottom: "pool1" top: "conv2" param { lr_mult: 1 } param { lr_mult: 2 } convolution_param { num_output: 50 kernel_size: 5 stride: 1 weight_filler { type: "xavier" } bias_filler { type: "constant" } } }
layer { name: "pool2" type: "Pooling" bottom: "conv2" top: "pool2" pooling_param { pool: MAX kernel_size: 2 stride: 2 } }
layer { name: "ip1" type: "InnerProduct" bottom: "pool2" top: "ip1" param { lr_mult: 1 } param { lr_mult: 2 } inner_product_param { num_output: 500 weight_filler { type: "xavier" } bias_filler { type: "constant" } } }
layer { name: "relu1" type: "ReLU" bottom: "ip1" top: "ip1" }
layer { name: "ip2" type: "InnerProduct" bottom: "ip1" top: "ip2" param { lr_mult: 1 } param { lr_mult: 2 } inner_product_param { num_output: 10 weight_filler { type: "xavier" } bias_filler { type: "constant" } } }
layer { name: "accuracy" type: "Accuracy" bottom: "ip2" bottom: "label" top: "accuracy" include { phase: TEST } }
layer { name: "loss" type: "SoftmaxWithLoss" bottom: "ip2" bottom: "label" top: "loss" }
I0327 06:06:18.249374  4559 layer_factory.hpp:77] Creating layer data
I0327 06:06:18.249390  4559 net.cpp:106] Creating Layer data
I0327 06:06:18.249397  4559 net.cpp:411] data -> data
I0327 06:06:18.249407  4559 net.cpp:411] data -> label
I0327 06:06:18.250479  4559 net.cpp:150] Setting up data
I0327 06:06:18.250502  4559 net.cpp:157] Top shape: 100 1 28 28 (78400)
I0327 06:06:18.250509  4559 net.cpp:157] Top shape: 100 (100)
I0327 06:06:18.250512  4559 net.cpp:165] Memory required for data: 314000
I0327 06:06:18.250517  4559 layer_factory.hpp:77] Creating layer label_data_1_split
I0327 06:06:18.250527  4559 net.cpp:106] Creating Layer label_data_1_split
I0327 06:06:18.250538  4559 net.cpp:454] label_data_1_split <- label
I0327 06:06:18.250545  4559 net.cpp:411] label_data_1_split -> label_data_1_split_0
I0327 06:06:18.250552  4559 net.cpp:411] label_data_1_split -> label_data_1_split_1
I0327 06:06:18.250646  4559 net.cpp:150] Setting up label_data_1_split
I0327 06:06:18.250663  4559 net.cpp:157] Top shape: 100 (100)
I0327 06:06:18.250669  4559 net.cpp:157] Top shape: 100 (100)
I0327 06:06:18.250672  4559 net.cpp:165] Memory required for data: 314800
I0327 06:06:18.250677  4559 layer_factory.hpp:77] Creating layer conv1
I0327 06:06:18.250690  4559 net.cpp:106] Creating Layer conv1
I0327 06:06:18.250701  4559 net.cpp:454] conv1 <- data
I0327 06:06:18.250711  4559 net.cpp:411] conv1 -> conv1
I0327 06:06:18.251344  4559 net.cpp:150] Setting up conv1
I0327 06:06:18.251363  4559 net.cpp:157] Top shape: 100 20 24 24 (1152000)
I0327 06:06:18.251368  4559 net.cpp:165] Memory required for data: 4922800
I0327 06:06:18.251379  4559 layer_factory.hpp:77] Creating layer pool1
I0327 06:06:18.251389  4559 net.cpp:106] Creating Layer pool1
I0327 06:06:18.251392  4559 net.cpp:454] pool1 <- conv1
I0327 06:06:18.251402  4559 net.cpp:411] pool1 -> pool1
I0327 06:06:18.251498  4559 net.cpp:150] Setting up pool1
I0327 06:06:18.251514  4559 net.cpp:157] Top shape: 100 20 12 12 (288000)
I0327 06:06:18.251518  4559 net.cpp:165] Memory required for data: 6074800
I0327 06:06:18.251523  4559 layer_factory.hpp:77] Creating layer conv2
I0327 06:06:18.251538  4559 net.cpp:106] Creating Layer conv2
I0327 06:06:18.251545  4559 net.cpp:454] conv2 <- pool1
I0327 06:06:18.251555  4559 net.cpp:411] conv2 -> conv2
I0327 06:06:18.252347  4559 net.cpp:150] Setting up conv2
I0327 06:06:18.252367  4559 net.cpp:157] Top shape: 100 50 8 8 (320000)
I0327 06:06:18.252372  4559 net.cpp:165] Memory required for data: 7354800
I0327 06:06:18.252384  4559 layer_factory.hpp:77] Creating layer pool2
I0327 06:06:18.252398  4559 net.cpp:106] Creating Layer pool2
I0327 06:06:18.252403  4559 net.cpp:454] pool2 <- conv2
I0327 06:06:18.252410  4559 net.cpp:411] pool2 -> pool2
I0327 06:06:18.252493  4559 net.cpp:150] Setting up pool2
I0327 06:06:18.252509  4559 net.cpp:157] Top shape: 100 50 4 4 (80000)
I0327 06:06:18.252514  4559 net.cpp:165] Memory required for data: 7674800
I0327 06:06:18.252518  4559 layer_factory.hpp:77] Creating layer ip1
I0327 06:06:18.252527  4559 net.cpp:106] Creating Layer ip1
I0327 06:06:18.252534  4559 net.cpp:454] ip1 <- pool2
I0327 06:06:18.252542  4559 net.cpp:411] ip1 -> ip1
I0327 06:06:18.256626  4559 net.cpp:150] Setting up ip1
I0327 06:06:18.256646  4559 net.cpp:157] Top shape: 100 500 (50000)
I0327 06:06:18.256651  4559 net.cpp:165] Memory required for data: 7874800
I0327 06:06:18.256662  4559 layer_factory.hpp:77] Creating layer relu1
I0327 06:06:18.256676  4559 net.cpp:106] Creating Layer relu1
I0327 06:06:18.256681  4559 net.cpp:454] relu1 <- ip1
I0327 06:06:18.256691  4559 net.cpp:397] relu1 -> ip1 (in-place)
I0327 06:06:18.256705  4559 net.cpp:150] Setting up relu1
I0327 06:06:18.256711  4559 net.cpp:157] Top shape: 100 500 (50000)
I0327 06:06:18.256714  4559 net.cpp:165] Memory required for data: 8074800
I0327 06:06:18.256718  4559 layer_factory.hpp:77] Creating layer ip2
I0327 06:06:18.256726  4559 net.cpp:106] Creating Layer ip2
I0327 06:06:18.256732  4559 net.cpp:454] ip2 <- ip1
I0327 06:06:18.256742  4559 net.cpp:411] ip2 -> ip2
I0327 06:06:18.257045  4559 net.cpp:150] Setting up ip2
I0327 06:06:18.257062  4559 net.cpp:157] Top shape: 100 10 (1000)
I0327 06:06:18.257067  4559 net.cpp:165] Memory required for data: 8078800
I0327 06:06:18.257074  4559 layer_factory.hpp:77] Creating layer ip2_ip2_0_split
I0327 06:06:18.257081  4559 net.cpp:106] Creating Layer ip2_ip2_0_split
I0327 06:06:18.257087  4559 net.cpp:454] ip2_ip2_0_split <- ip2
I0327 06:06:18.257097  4559 net.cpp:411] ip2_ip2_0_split -> ip2_ip2_0_split_0
I0327 06:06:18.257105  4559 net.cpp:411] ip2_ip2_0_split -> ip2_ip2_0_split_1
I0327 06:06:18.257189  4559 net.cpp:150] Setting up ip2_ip2_0_split
I0327 06:06:18.257205  4559 net.cpp:157] Top shape: 100 10 (1000)
I0327 06:06:18.257211  4559 net.cpp:157] Top shape: 100 10 (1000)
I0327 06:06:18.257215  4559 net.cpp:165] Memory required for data: 8086800
I0327 06:06:18.257220  4559 layer_factory.hpp:77] Creating layer accuracy
I0327 06:06:18.257231  4559 net.cpp:106] Creating Layer accuracy
I0327 06:06:18.257241  4559 net.cpp:454] accuracy <- ip2_ip2_0_split_0
I0327 06:06:18.257247  4559 net.cpp:454] accuracy <- label_data_1_split_0
I0327 06:06:18.257253  4559 net.cpp:411] accuracy -> accuracy
I0327 06:06:18.257267  4559 net.cpp:150] Setting up accuracy
I0327 06:06:18.257278  4559 net.cpp:157] Top shape: (1)
I0327 06:06:18.257282  4559 net.cpp:165] Memory required for data: 8086804
I0327 06:06:18.257287  4559 layer_factory.hpp:77] Creating layer loss
I0327 06:06:18.257292  4559 net.cpp:106] Creating Layer loss
I0327 06:06:18.257297  4559 net.cpp:454] loss <- ip2_ip2_0_split_1
I0327 06:06:18.257302  4559 net.cpp:454] loss <- label_data_1_split_1
I0327 06:06:18.257311  4559 net.cpp:411] loss -> loss
I0327 06:06:18.257325  4559 layer_factory.hpp:77] Creating layer loss
I0327 06:06:18.257549  4559 net.cpp:150] Setting up loss
I0327 06:06:18.257565  4559 net.cpp:157] Top shape: (1)
I0327 06:06:18.257570  4559 net.cpp:160]     with loss weight 1
I0327 06:06:18.257577  4559 net.cpp:165] Memory required for data: 8086808
I0327 06:06:18.257581  4559 net.cpp:226] loss needs backward computation.
I0327 06:06:18.257586  4559 net.cpp:228] accuracy does not need backward computation.
I0327 06:06:18.257591  4559 net.cpp:226] ip2_ip2_0_split needs backward computation.
I0327 06:06:18.257594  4559 net.cpp:226] ip2 needs backward computation.
I0327 06:06:18.257598  4559 net.cpp:226] relu1 needs backward computation.
I0327 06:06:18.257602  4559 net.cpp:226] ip1 needs backward computation.
I0327 06:06:18.257606  4559 net.cpp:226] pool2 needs backward computation.
I0327 06:06:18.257609  4559 net.cpp:226] conv2 needs backward computation.
I0327 06:06:18.257613  4559 net.cpp:226] pool1 needs backward computation.
I0327 06:06:18.257617  4559 net.cpp:226] conv1 needs backward computation.
I0327 06:06:18.257622  4559 net.cpp:228] label_data_1_split does not need backward computation.
I0327 06:06:18.257627  4559 net.cpp:228] data does not need backward computation.
I0327 06:06:18.257629  4559 net.cpp:270] This network produces output accuracy
I0327 06:06:18.257633  4559 net.cpp:270] This network produces output loss
I0327 06:06:18.257648  4559 net.cpp:283] Network initialization done.
I0327 06:06:18.257699  4559 solver.cpp:60] Solver scaffolding done.
I0327 06:06:18.258568  4559 socket.cpp:219] Waiting for valid port [0]
I0327 06:06:18.258647  4576 socket.cpp:158] Assigned socket server port [38201]
I0327 06:06:18.259286  4576 socket.cpp:171] Socket Server ready [0.0.0.0]
I0327 06:06:18.268749  4559 socket.cpp:219] Waiting for valid port [38201]
I0327 06:06:18.268764  4559 socket.cpp:227] Valid port found [38201]
I0327 06:06:18.268779  4559 CaffeNet.cpp:186] Socket adapter: ip-172-31-26-41:38201
I0327 06:06:18.271020  4559 CaffeNet.cpp:325] 0-th Socket addr: ip-172-31-26-41:38201
I0327 06:06:18.271049  4559 CaffeNet.cpp:325] 1-th Socket addr:
I0327 06:06:18.271056  4559 CaffeNet.cpp:325] 2-th Socket addr: ip-172-31-26-41:38201
I0327 06:06:18.271077  4559 JniCaffeNet.cpp:110] 0-th local addr: ip-172-31-26-41:38201
I0327 06:06:18.271085  4559 JniCaffeNet.cpp:110] 1-th local addr:
I0327 06:06:18.271088  4559 JniCaffeNet.cpp:110] 2-th local addr: ip-172-31-26-41:38201
16/03/27 06:06:18 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 994 bytes result sent to driver
16/03/27 06:06:18 INFO CoarseGrainedExecutorBackend: Got assigned task 4
16/03/27 06:06:18 INFO Executor: Running task 1.0 in stage 1.0 (TID 4)
16/03/27 06:06:18 INFO TorrentBroadcast: Started reading broadcast variable 2
16/03/27 06:06:18 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 1580.0 B, free 6.8 KB)
16/03/27 06:06:18 INFO TorrentBroadcast: Reading broadcast variable 2 took 12 ms
16/03/27 06:06:18 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 2.6 KB, free 9.4 KB)
16/03/27 06:06:18 INFO TorrentBroadcast: Started reading broadcast variable 1
16/03/27 06:06:18 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 132.0 B, free 9.5 KB)
16/03/27 06:06:18 INFO TorrentBroadcast: Reading broadcast variable 1 took 9 ms
16/03/27 06:06:18 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 1072.0 B, free 10.6 KB)
I0327 06:06:18.459633  4559 common.cpp:61] 1-th string is NULL
I0327 06:06:18.459702  4559 socket.cpp:250] Trying to connect with ...[ip-172-31-26-42:33433]
I0327 06:06:18.462236  4559 socket.cpp:309] Connected to server [ip-172-31-26-42.eu-west-1.compute.internal:33433] with client_fd [170]
I0327 06:06:18.464623  4576 socket.cpp:184] Accepted the connection from client [ip-172-31-26-42.eu-west-1.compute.internal]
I0327 06:06:28.462440  4559 socket.cpp:250] Trying to connect with ...[ip-172-31-26-40:38965]
I0327 06:06:28.464995  4559 socket.cpp:309] Connected to server [ip-172-31-26-40.eu-west-1.compute.internal:38965] with client_fd [21]
I0327 06:06:28.466243  4576 socket.cpp:184] Accepted the connection from client [ip-172-31-26-40.eu-west-1.compute.internal]
I0327 06:06:38.474051  4559 parallel.cpp:392] GPUs pairs 0:1, 2:3, 0:2
I0327 06:06:38.485669  4559 parallel.cpp:234] GPU 1 does not have p2p access to GPU 0
I0327 06:06:38.496879  4559 parallel.cpp:234] GPU 2 does not have p2p access to GPU 0
I0327 06:06:38.508085  4559 parallel.cpp:234] GPU 3 does not have p2p access to GPU 2
I0327 06:06:38.512322  4580 MemoryInputAdapter.cpp:15] MemoryInputAdapter is used
16/03/27 06:06:38 INFO Executor: Finished task 1.0 in stage 1.0 (TID 4). 935 bytes result sent to driver
I0327 06:06:38.825384  4582 MemoryInputAdapter.cpp:15] MemoryInputAdapter is used
I0327 06:06:38.839550  4584 MemoryInputAdapter.cpp:15] MemoryInputAdapter is used
I0327 06:06:38.847441  4586 MemoryInputAdapter.cpp:15] MemoryInputAdapter is used
16/03/27 06:06:38 INFO CoarseGrainedExecutorBackend: Got assigned task 8
16/03/27 06:06:38 INFO Executor: Running task 2.0 in stage 2.0 (TID 8)
16/03/27 06:06:38 INFO TorrentBroadcast: Started reading broadcast variable 3
16/03/27 06:06:38 INFO MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 2.2 KB, free 12.8 KB)
16/03/27 06:06:38 INFO TorrentBroadcast: Reading broadcast variable 3 took 11 ms
16/03/27 06:06:38 INFO MemoryStore: Block broadcast_3 stored as values in memory (estimated size 3.4 KB, free 16.2 KB)
16/03/27 06:06:38 INFO CacheManager: Partition rdd_6_2 not found, computing it
16/03/27 06:06:38 INFO CacheManager: Partition rdd_1_2 not found, computing it
16/03/27 06:06:38 INFO LmdbRDD: Processing partition 2
16/03/27 06:06:38 INFO LmdbRDD: local LMDB path:/root/CaffeOnSpark/data/mnist_train_lmdb/
16/03/27 06:06:39 INFO LmdbRDD: Completed partition 2
16/03/27 06:06:39 INFO BlockManager: Found block rdd_1_2 locally