
Is the solver context thread safe? #65

Closed
tianchaijz opened this issue Dec 11, 2018 · 11 comments

@tianchaijz

Hi John:

I built mean.cu as a shared library and use cgo to call it. I create one goroutine for each graphics card and bind one solver context to each goroutine.
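
Roughly, the setup looks like this (a simplified sketch; newSolverCtx and findSolution are illustrative names standing in for my cgo wrappers, not my exact code):

// One goroutine per GPU, each owning its own solver context.
func startWorkers(numGPUs int, work <-chan []byte) {
    for dev := 0; dev < numGPUs; dev++ {
        go func(device int) {
            ctx := newSolverCtx(device) // hypothetical wrapper: creates a context bound to `device`
            for header := range work {
                findSolution(ctx, header) // hypothetical wrapper around the C solver call
            }
        }(dev)
    }
}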

A single solver context works as expected, but multiple contexts do not: after running for a while, the solver contexts stop working, and the log shows:

findcycles edges 46 time 4 ms total 4 ms
findcycles edges 46 time 3 ms total 3 ms
findcycles edges 46 time 3 ms total 3 ms
findcycles edges 46 time 4 ms total 4 ms
findcycles edges 46 time 3 ms total 3 ms
findcycles edges 46 time 3 ms total 3 ms
findcycles edges 46 time 3 ms total 3 ms
findcycles edges 46 time 16 ms total 16 ms
findcycles edges 46 time 4 ms total 4 ms

It seems the trimming phase is not working. Any ideas? Thanks!

@tromp
Owner

tromp commented Dec 11, 2018

I'm not sure how to interpret the output above.
Are the 9 lines from 9 different contexts?
Are you giving each one a different graph to work on (by varying the arguments to run_solver())?
It's also puzzling why the trim took 0 ms in each case, and why there are always 46 edges left.
Where is your code calling run_solver()?

@tianchaijz
Author

Hi John,

Thanks for your reply.

I just created a simple function named find_solution:

 CALL_CONVENTION int find_solution(SolverCtx* ctx,
                                   char* header,
                                   int header_length,
                                   SolverSolutions *solutions)
 {
   if (ctx == NULL || !ctx->trimmer.initsuccess){
     print_log("Error initialising trimmer. Aborting.\n");
     print_log("Reason: %s\n", LAST_ERROR_REASON);
     return -__LINE__;
   }

   ctx->setheadernonce(header, header_length, 0);
   u32 nsols = ctx->solve();

   for (unsigned s = 0; s < nsols; s++) {
     print_log("Solution");
     u32* prf = &ctx->sols[s * PROOFSIZE];
     if (solutions != NULL){
       solutions->edge_bits = EDGEBITS;
       solutions->num_sols++;
       for (u32 i = 0; i < PROOFSIZE; i++)
         solutions->sols[s].proof[i] = (u64) prf[i];
     }

     int pow_rc = verify(prf, &ctx->trimmer.sipkeys);
     if (pow_rc != POW_OK) {
         return -__LINE__;
     }
   }

   return nsols;
 }

I give each context a different header, and call the solver like this:

var solverSols SolverSolutions
n := C.find_solution(ctx, header, headerLen, (*C.SolverSolutions)(unsafe.Pointer(&solverSols)))
if n > 0 {
    sols := make([]uint32, 42)
    for i := 0; i < int(n); i++ {
        for j := 0; j < 42; j++ {
            sols[j] = uint32(solverSols.sols[i].proof[j])
        }
    }
} else if n < 0 {
    return fmt.Errorf("cgo return code: %d", int(n))
}

@tianchaijz
Author

The full log context:

 GPU1 finding solution
 GPU4 finding solution
 GPU2 finding solution
 GPU0 finding solution
 GPU1 finding solution
 GPU4 finding solution
 GPU2 finding solution
 GPU0 finding solution
 eted in 63 + 46 ms
 Seeding completed in 63 + 46 ms
 Seeding completed in 63 + 46 ms
 Seeding completed in 63 + 46 ms
 Seeding completed in 63 + 46 ms
    2-cycle found
    8-cycle found
   54-cycle found
 findcycles edges 63077 time 56 ms total 390 ms
   12-cycle found
    4-cycle found
   20-cycle found
    4-cycle found
 setheader: sKrQQF/FfgmOD8PO0OeVJSNq0ssZM7xJ/ANzAPWt9sQ=RgAAACQPgXA=
    6-cycle found
  100-cycle found
  406-cycle found
  594-cycle found
  340-cycle found
 findcycles edges 71378 time 49 ms total 386 ms
 setheader: sKrQQF/FfgmOD8PO0OeVJSNq0ssZM7xJ/ANzAPWt9sQ=RwAAACQPgXA=
   38-cycle found
   10-cycle found
   66-cycle found
   12-cycle found
 2466-cycle found
   70-cycle found
 findcycles edges 74681 time 58 ms total 383 ms
 setheader: sKrQQF/FfgmOD8PO0OeVJSNq0ssZM7xJ/ANzAPWt9sQ=QwAAACQPgXA=
  232-cycle found
  306-cycle found
 findcycles edges 61895 time 53 ms total 378 ms
 setheader: sKrQQF/FfgmOD8PO0OeVJSNq0ssZM7xJ/ANzAPWt9sQ=RAAAACQPgXA=
   20-cycle found
   74-cycle found
   28-cycle found
  332-cycle found
 findcycles edges 63438 time 51 ms total 379 ms
 setheader: sKrQQF/FfgmOD8PO0OeVJSNq0ssZM7xJ/ANzAPWt9sQ=SAAAACQPgXA=
   38-cycle found
 findcycles edges 64257 time 74 ms total 400 ms
 setheader: sKrQQF/FfgmOD8PO0OeVJSNq0ssZM7xJ/ANzAPWt9sQ=RQAAACQPgXA=
 findcycles edges 77 time 2 ms total 106 ms
 findcycles edges 77 time 2 ms total 137 ms
 setheader: sKrQQF/FfgmOD8PO0OeVJSNq0ssZM7xJ/ANzAPWt9sQ=TAAAACQPgXA=
 setheader: sKrQQF/FfgmOD8PO0OeVJSNq0ssZM7xJ/ANzAPWt9sQ=TgAAACQPgXA=
 findcycles edges 77 time 1 ms total 189 ms
 setheader: sKrQQF/FfgmOD8PO0OeVJSNq0ssZM7xJ/ANzAPWt9sQ=SwAAACQPgXA=
 findcycles edges 77 time 1 ms total 280 ms
 setheader: sKrQQF/FfgmOD8PO0OeVJSNq0ssZM7xJ/ANzAPWt9sQ=SQAAACQPgXA=
 findcycles edges 77 time 0 ms total 365 ms
 setheader: sKrQQF/FfgmOD8PO0OeVJSNq0ssZM7xJ/ANzAPWt9sQ=TQAAACQPgXA=
 findcycles edges 77 time 3 ms total 432 ms
 findcycles edges 77 time 4 ms total 319 ms
 findcycles edges 46 time 4 ms total 234 ms
 setheader: sKrQQF/FfgmOD8PO0OeVJSNq0ssZM7xJ/ANzAPWt9sQ=SgAAACQPgXA=
 findcycles edges 77 time 4 ms total 319 ms
 setheader: sKrQQF/FfgmOD8PO0OeVJSNq0ssZM7xJ/ANzAPWt9sQ=UQAAACQPgXA=
 setheader: sKrQQF/FfgmOD8PO0OeVJSNq0ssZM7xJ/ANzAPWt9sQ=VAAAACQPgXA=
 setheader: sKrQQF/FfgmOD8PO0OeVJSNq0ssZM7xJ/ANzAPWt9sQ=UgAAACQPgXA=
 findcycles edges 46 time 2 ms total 2 ms
 findcycles edges 46 time 2 ms total 2 ms
 setheader: sKrQQF/FfgmOD8PO0OeVJSNq0ssZM7xJ/ANzAPWt9sQ=VwAAACQPgXA=
 setheader: sKrQQF/FfgmOD8PO0OeVJSNq0ssZM7xJ/ANzAPWt9sQ=UAAAACQPgXA=

@tromp
Owner

tromp commented Dec 11, 2018

The earlier runs with edges in the tens of thousands look fine, but the ones with fewer than 100 edges are wrong and spent 0 ms on trimming. Perhaps you can add more diagnostic output to the trimming routine to figure out why it skipped nearly all the work. For instance, SeedA should compute siphashes for 2^29 nodes, which necessarily takes a lot of time.
I think that with all data encapsulated in the SolverCtx, the solver should be thread safe.

@tianchaijz
Author

Thanks John!

I'll continue debugging and will let you know once this problem is solved!

@tianchaijz
Author

tianchaijz commented Dec 12, 2018

It seems that the edgetrimmer pointer *dt points to an invalid memory area.

651       int solve() {
(gdb) n
653         auto time0 = std::chrono::high_resolution_clock::now();
(gdb) n
655         trimmer.abort = false;
(gdb) n
656         u32 nedges = trimmer.trim();
(gdb) s
edgetrimmer::trim (this=0x7f3510000c00) at mean.cu:416
416       u32 trim() {
(gdb) n
417         cudaMemcpy(dt, this, sizeof(edgetrimmer), cudaMemcpyHostToDevice);
(gdb) p *dt
Cannot access memory at address 0x7f34cf400000
(gdb) p *this
$1 = {tp = {expand = 0, ntrims = 176, genA = {blocks = 4096, tpb = 256}, genB = {blocks = 4096,
      tpb = 128}, trim = {blocks = 4096, tpb = 512}, tail = {blocks = 4096, tpb = 1024}, recover = {
      blocks = 1024, tpb = 1024}}, dt = 0x7f34cf400000, sizeA = 4462739456, sizeB = 2852126720,
  indexesSize = 16384, bufferA = 0x7f331a600000, bufferB = 0x7f3424600000,
  bufferAB = 0x7f33c4600000, indexesE = 0x7f34cec00000, indexesE2 = 0x7f34cec04000, hostA = {72644,
    0 <repeats 4095 times>}, uvnodes = 0x7f34cf000000, sipkeys = {k0 = 3288269178733044074,
    k1 = 2872931202106657597, k2 = 11375450941595734051, k3 = 5158531013351406122},
  dipkeys = 0x7f34cf000200, abort = false, initsuccess = true}
(gdb) n
420         checkCudaErrors(cudaEventCreate(&startall)); checkCudaErrors(cudaEventCreate(&stopall));
(gdb) p *dt
Cannot access memory at address 0x7f34cf400000

I'm continuing to debug ...

@tromp
Owner

tromp commented Dec 12, 2018 via email

@tianchaijz
Author

tianchaijz commented Dec 12, 2018

Hi John,

Still not solved, really weird. I'll try a newer nvidia driver ...

@tianchaijz
Author

Hi John,

This problem is solved.

Since a goroutine may be scheduled onto different OS threads, which led to this problem, I added runtime.LockOSThread(). My program has been running for hours now and everything looks fine.
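A minimal sketch of the fix, reusing the illustrative wrapper names from my earlier sketch. I believe this works because the CUDA runtime tracks the current device per OS thread (cudaSetDevice only affects the calling thread), so pinning the goroutine keeps all of a context's CUDA calls on one thread:

import "runtime"

func solveLoop(device int, work <-chan []byte) {
    // Pin this goroutine to its current OS thread for the lifetime
    // of the solver context, so CUDA calls never migrate threads.
    runtime.LockOSThread()
    defer runtime.UnlockOSThread()

    ctx := newSolverCtx(device) // hypothetical cgo wrapper, as above
    for header := range work {
        findSolution(ctx, header)
    }
}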

@tromp
Owner

tromp commented Dec 12, 2018

I still don't understand how running on different OS threads explains the faulty behaviour.
But I'm happy to hear you found a solution!

@tianchaijz
Author

Thank you again, John!
