Adds optimization from proxy application #19
…ollocation functions
…reduce register pressure
…tegrator O(2-5x) improvement in GFLOP/s for collocation on V100 + PPC
…inor performance degradation, disabled for now
@dmclark17 I've rebased this locally, do you want me to push directly or have you review in a separate branch?

I think pushing directly here would work

@dmclark17 Up-to-date. You can toggle your compact collocation kernel implementation by uncommenting and changing the analogous kernel launch. FWIW, this increases the register usage ( Note, I'd like to either

For the

I'm wondering if this might be due to the fact that we're striding on multiple dimensions, it might be possible that a stride along

@dmclark17 Is there anything else you'd like to include in this PR?

* [CI] Try container based CI
* [CI] Typo
* [CI] Typo
* [CI] Typo
* [CI] Typo
* [CI] Typo
* [CI] Add BLIS linkage
* [CI] Reenable LLVM in tests
* [CI] Reenable LLVM in tests
* [CI] Reenable LLVM in tests
* [CI] Reenable LLVM in tests
* [CI] typo
* [CI] Renable Debug + subproject tests
* [CI] Renable Debug + subproject tests
* [CI] Renable Debug + subproject tests
* [CI] Renable Debug + subproject tests
* [CI] Some cleanup
* [CI] Some cleanup
* [CI] Some cleanup
* [CI] Use installed LibXC in Docker container
* [CMake] bug in discovery export
* [CMake] Add ExchCXX discovery in export config... how did this ever work??
* [CI] Try running on self-hosted
* [CI] Enable CUDA CI
* [CI] Pass CMAKE_CUDA_ARCHITECTURES to GH Actions toolchain
* [CI] Disable MPI for CUDA tests
* Fix CUDA + no MPI Build
* Fix CUDA + no MPI Build
* Actually fix CUDA + no MPI
* Disable pinned vector for CUDA 12
I ported the proxy app changes from my end to GauXC. I am marking this as a draft because I did not include the downstream changes from the matrix transpose edit, so it is not currently correct.