
It will stop in training, there is no error in terminal. #11

Closed
szm88 opened this issue Dec 26, 2017 · 12 comments

Comments

@szm88

szm88 commented Dec 26, 2017

Hi @jeasinema, I got your code and did the following steps:
1. Changed the GPU config: '3,1,2,0' -> '0'  # I only have a GTX 1070
2. python3 setup.py build_ext --inplace  # I use Python 3.5.2
3. cd utils
   python3 preprocess.py

   Error: there is no config module there, so I copied tf_voxelnet/config.py to tf_voxelnet/utils/. Is that right?
   The data is from the KITTI object dataset: data_object_velodyne.zip (about 29 GB), image_2 (12 GB), label_2, and voxel (about 25 GB).
4. python3 train.py

   Error: there are no labels for the testing data, so I copied "training" to "testing" in train.py.

Then:
..........
train: 18/60 @ epoch:3/10 loss: 1.9014711380004883 reg_loss: 0.31266674399375916 cls_loss: 1.5888043642044067 default
train ['000004']
--------------------using time: 73.70951771736145s-------------------
train: 19/60 @ epoch:3/10 loss: 1.5401957035064697 reg_loss: 0.23529152572155 cls_loss: 1.3049042224884033 default
train ['000001']
--------------------using time: 77.3743188381195s-------------------
train: 20/60 @ epoch:3/10 loss: 1.8793950080871582 reg_loss: 0.2751219868659973 cls_loss: 1.6042730808258057 default

It stops at this point, and there is no error in the terminal.


When I use 4 Titan X cards to train the model, it uses all 20 CPU threads and 45 GB of RAM, but only 149 MB * 8 of GPU memory. Why does it use so much CPU?
I found the GPU utilization is 0%, 0%, 0%, 50%. The 50% comes from another model I am training, so I think this job isn't using the GPU at all. What's the reason?

@szm88 szm88 closed this as completed Dec 26, 2017
@turboxin

Hi szm88, could you please share how you solved this problem? I seem to have the same problem:

train: 20/18700 @ epoch:0/10 loss: 4.318506240844727 reg_loss: 2.653141498565674 cls_loss: 1.6653645038604736 default

It just stops here with no error in the terminal, and the GPU utilization drops to 0% while GPU memory usage stays high at 8527/11172 MB on 4 1080 Ti cards.

@qianguih

qianguih commented Feb 1, 2018

I ran into the same problem. Any suggestion or comment will be appreciated. : )

@dominikj93

dominikj93 commented Feb 1, 2018

As far as I know, the labels for the testing set are not publicly available, so you cannot use the training/testing split as provided by the KITTI dataset. The solution is to split the training set into smaller training and validation sets. At least that's what worked for me. Hope it helped!
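
For reference, a minimal sketch of such a split, assuming the standard KITTI object folder layout (velodyne/, image_2/, label_2/, calib/) and a plain-text split file with one sample index per line. The paths and file names below are placeholders, not this repository's actual layout:

```python
import os
import shutil

# Source: the full KITTI object training folder; destinations: the smaller
# train/val folders. All paths here are placeholders.
KITTI_TRAINING = 'data/object/training_full'
OUTPUT_ROOT = 'data/object'
SUBDIRS = {'velodyne': '.bin', 'image_2': '.png', 'label_2': '.txt', 'calib': '.txt'}

def read_split(path):
    """Read sample indices such as '000123' from a split file, one per line."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]

def make_split(split_file, dest_name):
    """Copy every listed sample from the full training set into dest_name."""
    for idx in read_split(split_file):
        for sub, ext in SUBDIRS.items():
            dst_dir = os.path.join(OUTPUT_ROOT, dest_name, sub)
            os.makedirs(dst_dir, exist_ok=True)
            shutil.copy(os.path.join(KITTI_TRAINING, sub, idx + ext),
                        os.path.join(dst_dir, idx + ext))

if __name__ == '__main__':
    make_split('train.txt', 'training')      # e.g. the commonly used 3712-sample split
    make_split('val.txt', 'validation')      # e.g. the commonly used 3769-sample split
```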

@qianguih

qianguih commented Feb 5, 2018

@dominikj93 Thanks for your reply. Actually, I have already split the training data. However, it still crashes sometimes during training.

@jeasinema
Contributor

@qianguih please upload the terminal output from when you run the program. It's hard for us to determine what went wrong with such limited information.

BTW, @dominikj93 does provide the correct solution; sorry for not mentioning that I use a split file available here.

@jeasinema jeasinema reopened this Feb 6, 2018
@qianguih

qianguih commented Feb 6, 2018

@jeasinema Thanks for your reply. I did use the same split file in my experiments. It runs smoothly for a couple of epochs and then just stops training without reporting any errors or warnings. It works most of the time on a 1080 Ti GPU but fails frequently on a P100 GPU. I don't have a sample output right now; I am trying to reproduce the problem and will post one here when it is available. Currently, I suspect something is wrong with the multi-threaded processing in the data loader.
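
One way to confirm where the hang happens (not part of this repository, just a standard-library debugging aid) is Python's faulthandler module, which can dump every thread's stack trace once training stalls:

```python
import faulthandler
import signal

# Dump all thread stack traces to stderr when the process receives SIGUSR1,
# e.g. run `kill -USR1 <pid>` from another shell after training stalls.
faulthandler.register(signal.SIGUSR1, all_threads=True)

# Or dump the stacks of all threads every 10 minutes:
# faulthandler.dump_traceback_later(600, repeat=True)
```

If the dump shows the loader threads blocked on a queue operation while the main thread waits for data, that points at the loader rather than the TensorFlow graph.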

@jeasinema
Contributor

@qianguih you did find a potential problem. The loaders may be competing with the model, so you can try to add more workers, like here.
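
As an illustration of that idea only (the identifiers below are placeholders, not this repository's actual loader), several worker processes fill one bounded queue so the training loop rarely waits on preprocessing:

```python
import multiprocessing as mp

def load_sample(idx):
    # Placeholder for the real per-sample work (point cloud reading,
    # voxelization, label parsing); returns the index so the sketch runs.
    return idx

def worker(indices, queue):
    """Each worker preprocesses its share of samples and pushes them to the queue."""
    for idx in indices:
        queue.put(load_sample(idx))

def start_loaders(sample_indices, num_workers=8, queue_size=32):
    """Spawn num_workers loader processes feeding one bounded queue."""
    queue = mp.Queue(maxsize=queue_size)   # bounded: workers block, not the trainer
    chunks = [sample_indices[i::num_workers] for i in range(num_workers)]
    procs = [mp.Process(target=worker, args=(chunk, queue), daemon=True)
             for chunk in chunks]
    for p in procs:
        p.start()
    return queue, procs
```

The training loop would then take samples with queue.get(); raising num_workers and queue_size trades CPU and RAM for keeping the GPU fed.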

@ashishkumar-rambhatla

@jeasinema Can you share with us the code to split the KITTI training data using the split files provided?

@qianguih

qianguih commented Feb 6, 2018

@jeasinema Attached is a sample log. The CPU thread is still running but the GPU thread is dead. There is no error or warning. I have tried 8 workers. However, it didn't help.

log.txt

@jeasinema
Contributor

@qianguih Have you tried pausing the training and then restarting it? Does it still get stuck there?

@qianguih

qianguih commented Feb 7, 2018

@jeasinema No, I didn't try that. Instead, I replaced the multi-threaded data loader with a plain single-threaded loader, and that solved the problem, which confirms the issue does come from the multi-threaded data loader.
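
For anyone looking for the same workaround, here is a minimal sketch of such a plain sequential loader; the identifiers are placeholders, and load_sample stands in for the repository's per-sample preprocessing:

```python
def load_sample(idx):
    # Placeholder for the real per-sample preprocessing (point cloud,
    # voxelization, labels); kept trivial so the sketch runs.
    return idx

def simple_loader(sample_indices, batch_size):
    """Yield batches one by one in the caller's thread: no queues, no workers."""
    for start in range(0, len(sample_indices), batch_size):
        batch_ids = sample_indices[start:start + batch_size]
        yield [load_sample(idx) for idx in batch_ids]

# Illustrative use in a TF1-style training loop:
# for batch in simple_loader(train_ids, batch_size=2):
#     sess.run(train_op, feed_dict=build_feed(batch))
```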

@zhanpx

zhanpx commented Oct 16, 2018

> @jeasinema No, I didn't try that. Instead, I replaced the multi-threaded data loader with a plain single-threaded loader, and that solved the problem, which confirms the issue does come from the multi-threaded data loader.

I ran across the same problem. Could you share your loader? Thanks!
