Is the solver context thread safe? #65
Comments
I'm not sure how to interpret the output above. |
Hi John, thanks for your reply. I just created a simple function named
I gave each context a different header and called the solver this way:
|
The full log context:
|
The earlier runs with edges in the 10s of thousands look fine, but the ones with fewer than 100 edges are wrong and spent 0ms on trimming. Perhaps you can add more diagnostic output to the trimming routine to figure out why they skipped nearly all the work. For instance, SeedA should compute siphashes for 2^29 nodes, which necessarily takes a lot of time. |
Thanks John! I'll continue debugging and will let you know if this problem is solved! |
It seems that the
I'll continue debugging ... |
Dear tianchaijz,
It seems that the edgetrimmer *dt points to an invalid memory area.
651 int solve() {
(gdb) n
653 auto time0 = std::chrono::high_resolution_clock::now();
(gdb)
655 trimmer.abort = false;
(gdb)
656 u32 nedges = trimmer.trim();
(gdb) s
edgetrimmer::trim (this=0x7f94c4000c00) at mean.cu:416
416 u32 trim() {
(gdb) n
417 cudaMemcpy(dt, this, sizeof(edgetrimmer), cudaMemcpyHostToDevice);
(gdb) p sizeof(edgetrimmer)
$4 = 16536
(gdb) p *dt
Cannot access memory at address 0x7f9479800000
I'll continue debugging ...
I noticed my cuckoo/mean.cu had a redundant cudaMemcpy.
I removed that one in my latest commit. Don't see how that could be
responsible for behaviour you saw though...
regards,
-John
|
Hi John, still not solved, really weird. I'll try a newer NVIDIA driver ... |
Hi John, this problem is solved. A goroutine may be scheduled to run on different OS threads, which led to this problem, so I added |
I still don't understand how running on different OS threads explains the faulty behaviour. |
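One possible explanation, offered as an assumption rather than something confirmed in this thread: the CUDA runtime tracks the current device per host thread, so a goroutine that migrates to another OS thread may end up issuing calls against a different device than the one its solver context was created on. Below is a minimal Go sketch of pinning one goroutine per device to its OS thread with `runtime.LockOSThread`. The names `runPinnedWorkers` and `solveFunc` are hypothetical, and the `solve` callback only stands in for the real cgo call into the solver library.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// solveFunc is a hypothetical stand-in for the cgo call into the solver.
type solveFunc func(device int) int

// runPinnedWorkers starts one goroutine per device and locks each goroutine
// to its OS thread for its whole lifetime, so every call made by solve for a
// given device is issued from the same thread.
func runPinnedWorkers(devices int, solve solveFunc) []int {
	results := make([]int, devices)
	var wg sync.WaitGroup
	for d := 0; d < devices; d++ {
		wg.Add(1)
		go func(device int) {
			defer wg.Done()
			// Pin this goroutine to the current OS thread until it exits.
			runtime.LockOSThread()
			defer runtime.UnlockOSThread()
			// Each goroutine writes only to its own slot, so no lock is needed.
			results[device] = solve(device)
		}(d)
	}
	wg.Wait()
	return results
}

func main() {
	// Dummy solver: returns a value derived from the device index.
	out := runPinnedWorkers(2, func(device int) int { return device + 100 })
	fmt.Println(out) // prints [100 101]
}
```

In a real program the device selection, context creation, and solve calls for one card would all happen inside the pinned goroutine, after the call to `runtime.LockOSThread`.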
Thank you again, John! |
Hi John:
I built mean.cu as a shared library and call it through cgo. I create one goroutine for each graphics card and bind one solver context to each goroutine. A single solver context works as expected, but multiple contexts do not work well: after running for a while, the solver contexts stop working, and the log shows:
It seems that the trimming phase is not working. Any ideas? Thanks!