Do you envision your CUDA kernel performing a significant amount of work within each eval() call? I can imagine implementing GPU-amenable computations in Verilog, but they would involve a clock and state, which would require numerous eval() calls to execute the kernel in the Verilated C++.
If you have a round trip to/from the GPU in each eval(), my guess is that the latency will eat up any performance gains you got. I suspect (please correct me if I'm wrong) that you'd want a way to fork off the GPU portion of your Verilated design, continue eval()ing the rest of it, and pick up the results from the GPU at some later point. Verilator does emit multi-threaded eval() code, but that only divides up the work of what would otherwise be a single-threaded eval(). What I imagine you'd need here is inter-eval() multi-threading, which does not exist today. Nor, to the best of my knowledge, has there been any other work on emitting CUDA code.
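To make the "fork off, keep eval()ing, pick up later" idea concrete, here is a minimal sketch in plain C++. It is not Verilator's API and does not use CUDA: `std::async` on a CPU thread stands in for an asynchronous GPU launch, and `HypotheticalModel`, `offloaded_work`, and `run` are made-up names for illustration only. The point is the shape of the control flow: the offloaded work runs concurrently across many eval() calls instead of forcing a round trip inside each one.

```cpp
#include <cassert>
#include <cstdint>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical stand-in for the rest of the Verilated design (not Verilator's API).
// Each eval() models one evaluation step, e.g. one clock edge.
struct HypotheticalModel {
    uint64_t cycles = 0;
    void eval() { ++cycles; }  // a real model would evaluate its logic here
};

// Hypothetical stand-in for the offloaded (e.g. GPU) portion of the design.
// It does a large chunk of work in one call rather than one step per eval().
uint64_t offloaded_work(std::vector<uint32_t> data) {
    return std::accumulate(data.begin(), data.end(), uint64_t{0});
}

// Fork off the offloaded work, keep calling eval() on the rest of the design,
// then pick up the result later. get() blocks only if the work is not done yet.
uint64_t run(HypotheticalModel& model, int cycles_before_pickup) {
    std::vector<uint32_t> input(1000);
    std::iota(input.begin(), input.end(), 1u);  // fills with 1..1000

    std::future<uint64_t> pending =
        std::async(std::launch::async, offloaded_work, input);

    for (int i = 0; i < cycles_before_pickup; ++i) {
        model.eval();  // the rest of the design advances concurrently
    }
    return pending.get();
}
```

Calling `run(model, 50)` advances the model 50 steps while the offloaded sum of 1..1000 is computed on another thread, then returns that sum (500500). Whether a real GPU offload pays off this way would depend on how many eval() calls can overlap with the transfer and kernel latency. Note that `std::async` may need `-pthread` when compiling on some platforms.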
Perhaps I don't fully understand what you're asking. If you could expand on your thoughts here, that may help the discussion.