Test Error #160

Closed
woshigcy opened this issue Nov 16, 2017 · 17 comments

@woshigcy

Hi,
I have built the project, but I hit this error when testing the model.
.............
11600 videos parsed
11800 videos parsed
12000 videos parsed
12200 videos parsed
12400 videos parsed
12600 videos parsed
12800 videos parsed
13000 videos parsed
13200 videos parsed
frame folder analysis done
Setting device 0
Setting device 1
Setting device 2
Setting device 3
WARNING: Logging before InitGoogleLogging() is written to STDERR
F1116 11:08:02.481065 13484 common.cpp:196] Check failed: error == cudaSuccess (10 vs. 0) invalid device ordinal
*** Check failure stack trace: ***
Setting device 4
WARNING: Logging before InitGoogleLogging() is written to STDERR
F1116 11:08:03.024714 13507 common.cpp:196] Check failed: error == cudaSuccess (10 vs. 0) invalid device ordinal
*** Check failure stack trace: ***
Setting device 5
WARNING: Logging before InitGoogleLogging() is written to STDERR
F1116 11:08:03.412492 13516 common.cpp:196] Check failed: error == cudaSuccess (10 vs. 0) invalid device ordinal
*** Check failure stack trace: ***
WARNING: Logging before InitGoogleLogging() is written to STDERR
F1116 11:08:03.449525 13482 common.cpp:201] Check failed: status == CUBLAS_STATUS_SUCCESS (1 vs. 0) CUBLAS_STATUS_NOT_INITIALIZED
*** Check failure stack trace: ***
WARNING: Logging before InitGoogleLogging() is written to STDERR
F1116 11:08:03.471946 13483 common.cpp:201] Check failed: status == CUBLAS_STATUS_SUCCESS (1 vs. 0) CUBLAS_STATUS_NOT_INITIALIZED
*** Check failure stack trace: ***
Setting device 6
WARNING: Logging before InitGoogleLogging() is written to STDERR
F1116 11:08:03.512379 13481 common.cpp:201] Check failed: status == CUBLAS_STATUS_SUCCESS (1 vs. 0) CUBLAS_STATUS_NOT_INITIALIZED
*** Check failure stack trace: ***
Setting device 7
Setting device 8
Setting device 9
WARNING: Logging before InitGoogleLogging() is written to STDERR
F1116 11:08:04.135582 13529 common.cpp:196] Check failed: error == cudaSuccess (10 vs. 0) invalid device ordinal
WARNING: Logging before InitGoogleLogging() is written to STDERR
F1116 11:08:04.135614 13521 common.cpp:196] Check failed: error == cudaSuccess (10 vs. 0) invalid device ordinal
*** Check failure stack trace: ***
*** Check failure stack trace: ***
Setting device 10
Setting device 11
......................

Does anyone have a suggestion?

@yjxiong
Owner

yjxiong commented Nov 28, 2017

This usually means that network initialization has failed. Try running the test with one GPU and check the error log for more details.
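The `invalid device ordinal` messages in the log above are what CUDA reports when a process asks for a GPU index that does not exist on the machine. A minimal sketch of mapping worker processes onto only the GPUs that are present is shown here; `assign_gpus` and its arguments are hypothetical names for illustration and are not taken from `eval_net.py`.

```python
def assign_gpus(num_workers, gpu_ids):
    """Map worker index -> GPU id, cycling over the GPU ids that were given."""
    if not gpu_ids:
        raise ValueError("no GPU ids given")
    # Worker w gets gpu_ids[w mod len(gpu_ids)], so no worker can
    # request an index outside the supplied list.
    return {w: gpu_ids[w % len(gpu_ids)] for w in range(num_workers)}


# Hypothetical example: 12 workers on a machine with 3 visible GPUs (0, 1, 2).
mapping = assign_gpus(12, [0, 1, 2])
print(mapping)
```

With a single GPU id in the list, every worker lands on that device, which matches the suggestion to test with one GPU first.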

@woshigcy
Author

Thanks Xiong, I have solved this problem. It was caused by the version of the NVIDIA driver.

@woshigcy
Author

Hi Xiong, I have a problem when testing the RGB model. I get very low accuracy, and the predicted labels are all the same.

@woshigcy woshigcy reopened this Nov 29, 2017
@TimidLion

TimidLion commented Jan 23, 2018

@woshigcy Hi, I have the same problem as you. Could you share the details of how you fixed it?

This is the error I get when I run eval_net.py with num_worker set to 1.

I0123 06:41:43.439946   464 net.cpp:244] conv1/7x7_s2 does not need backward computation.
I0123 06:41:43.439951   464 net.cpp:285] This network produces output fc-action
I0123 06:41:43.440199   464 net.cpp:551] Collecting Learning Rate and Weight Decay.
I0123 06:41:43.440235   464 net.cpp:300] Network initialization done.
I0123 06:41:43.440241   464 net.cpp:301] Memory required for data: 74005652
F0123 06:41:43.527283   464 pooling_layer.cu:213] Check failed: error == cudaSuccess (8 vs. 0)  invalid device function
*** Check failure stack trace: ***
Aborted (core dumped)

@Nothgard

Nothgard commented Jan 23, 2018

@TimidLion I have the same problem. Is there any solution?

@woshigcy I also have an NVIDIA GPU. How did you fix the driver issue?

@yjxiong
Owner

yjxiong commented Jan 23, 2018

@Nothgard @TimidLion

Check your Caffe build. Make sure your driver is loaded correctly. Make sure you are using the correct version of Caffe (the one provided with this project, not the official version).

@Nothgard

@yjxiong Now I have this problem:

frame folder analysis done
[libprotobuf ERROR google/protobuf/text_format.cc:245] Error parsing text-format caffe.NetParameter: 19:12: Message type "caffe.LayerParameter" has no field named "bn_param".
WARNING: Logging before InitGoogleLogging() is written to STDERR
F0123 15:42:48.810196 28312 upgrade_proto.cpp:79] Check failed: ReadProtoFromTextFile(param_file, param) Failed to parse NetParameter file: models/ucf101/tsn_bn_inception_rgb_deploy.prototxt
*** Check failure stack trace: ***
Aborted (core dumped)

@yjxiong
Owner

yjxiong commented Jan 23, 2018

@Nothgard

This error indicates that you are running a different version of Caffe that is installed on your machine.
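One way to see which Caffe a Python script would pick up is to ask the import system where the `caffe` module resolves to. The following is a minimal sketch using only the standard library; `module_origin` is a hypothetical helper name, and it only reports the path without checking whether that build understands `bn_param`.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be imported from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# If the printed path is outside this project's Caffe build directory,
# a different Caffe installation is shadowing it on the Python path.
print(module_origin("caffe"))
```

If the path points to a system-wide installation, adjusting `PYTHONPATH` so that the project's own Caffe build comes first is the usual remedy.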

@TimidLion

@yjxiong Actually, when building caffe-action, I got an
"unsupported gpu architecture 'compute_70'" error.
I use a Docker container, started with nvidia-docker run, with CUDA 8.0 and cuDNN 5.1, and my graphics card is a Tesla V100. So I changed line 96 of Cuda.cmake (TSN/lib/caffe-action/cmake/Cuda.cmake) from
caffe_detect_installed_gpus(__cuda_arch_bin)
to
set(__cuda_arch_bin "61")
Is this the main problem? If you know how to solve the 'compute_70' error, I will rebuild it that way.

@yjxiong
Owner

yjxiong commented Jan 24, 2018

@TimidLion AFAIK, the V100 is only supported by CUDA 9.
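The two constraints in play here can be sketched as a small lookup: a CUDA toolkit can only compile for compute capabilities it knows about, and GPU machine code built for one capability only runs on devices with the same major version and an equal or higher minor version. The sketch below is an illustration with hypothetical helper names; it lists only a few capabilities and ignores PTX just-in-time compilation, which can relax the second rule when PTX is embedded in the binary.

```python
# First CUDA toolkit release able to compile for each compute capability.
MIN_CUDA = {
    (6, 0): (8, 0),   # Pascal, e.g. Tesla P100
    (6, 1): (8, 0),   # Pascal, e.g. GTX 1080
    (7, 0): (9, 0),   # Volta, e.g. Tesla V100
    (7, 5): (10, 0),  # Turing
}


def toolkit_can_target(cuda_version, capability):
    """True if the given toolkit version can compile for the capability."""
    return cuda_version >= MIN_CUDA[capability]


def binary_runs_on(compiled_for, device):
    """True if machine code built for `compiled_for` runs on `device` (no PTX JIT)."""
    same_major = compiled_for[0] == device[0]
    return same_major and compiled_for[1] <= device[1]


print(toolkit_can_target((8, 0), (7, 0)))  # CUDA 8.0 targeting a V100
print(binary_runs_on((6, 1), (7, 0)))      # arch "61" binary on a V100
```

Under these rules CUDA 8.0 cannot target capability 7.0, and code built only for 6.1 does not run on a 7.0 device, which would be consistent with both the `compute_70` build error and the `invalid device function` failure reported earlier in this thread.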

@TimidLion

@yjxiong Thanks, I will try with CUDA 9.0.

@yjxiong
Owner

yjxiong commented Feb 1, 2018

@TimidLion @woshigcy
Any update?

@TimidLion

@yjxiong Actually, I gave up on this because of some errors and moved on to the anet2016 repository. It built completely on CUDA 9.0. Maybe I'll try this again later. Thanks!

@yjxiong yjxiong closed this as completed Feb 16, 2018
@xiaoyu5301

@Nothgard Hi, I have the same problem as you. Could you share the details of how you fixed it? Which version of the NVIDIA driver are you using?

@Jhajaykant

Jhajaykant commented Oct 1, 2018

I want to use this with its pre-trained model on a CPU-only Caffe. I compiled Caffe for CPU, but when I run eval_net.py it reports that the GPU cannot be used in CPU_ONLY Caffe. I have changed the solver mode to CPU, but the error is the same. Also, the description does not make clear how to feed image data into the program, that is, the directory structure of the frame path. Can you give more specific guidance on this?

Thanks in advance

@yjxiong
Owner

yjxiong commented Oct 1, 2018

@Jhajaykant

For custom datasets, please see the wiki:

https://github.com/yjxiong/temporal-segment-networks/wiki/Working-on-custom-datasets

In CPU_ONLY mode, you need to make sure you are using the Caffe version we provide and set every command and config to use the CPU. Besides, I would not suggest using the CPU for training, as it could be too slow.
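As a rough aid for checking a frame directory before running evaluation, the sketch below counts frame files per video folder. It assumes one sub-folder per video and an `img_` filename prefix for RGB frames; both are assumptions made for illustration, and the wiki page linked above is the authoritative description of the layout. `count_frames` and `demo` are hypothetical names.

```python
import os
import tempfile


def count_frames(frame_root, prefix="img_"):
    """Return {video_folder_name: number of files whose name starts with prefix}."""
    counts = {}
    for name in sorted(os.listdir(frame_root)):
        folder = os.path.join(frame_root, name)
        if os.path.isdir(folder):
            counts[name] = sum(
                1 for f in os.listdir(folder) if f.startswith(prefix)
            )
    return counts


def demo():
    """Build a tiny throwaway layout (assumed shape) and count its frames."""
    with tempfile.TemporaryDirectory() as root:
        for video, n in (("video_a", 3), ("video_b", 2)):
            os.makedirs(os.path.join(root, video))
            for i in range(1, n + 1):
                path = os.path.join(root, video, "img_%05d.jpg" % i)
                open(path, "w").close()  # empty placeholder file
        return count_frames(root)


print(demo())
```

A folder that reports zero frames is a likely sign that the prefix or directory nesting differs from what the evaluation script expects.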

@Jhajaykant

One more question, please. After looking at the code, it seems that prediction is run on every frame in the frame path. Am I correct?
