C++ data parallel programming library based on the concept of computation shape.

thorp/pVars

* For standard C++17
nvc++ -std=c++17 -stdpar -Minfo pvars.cpp pvars_test.cpp testable.cpp

* For OpenACC
nvc++ -std=c++17 -acc -gpu=managed -Minfo pvars.cpp pvars_test.cpp testable.cpp

* Older OpenACC
nvc++ -acc -gpu=managed -Minfo pvars.cpp pvars_test.cpp
nvc++ -acc -gpu=managed -Minfo hello_acc.cpp
nvc++ -acc -gpu=managed -Minfo -Mneginfo -ta=tesla_cc75 parlate.cpp parlate_test.cpp

* For additional info
nvaccelinfo
nvidia-smi
nvidia-smi -q



Documentation should include:

    A description of what the program/library is supposed to do and what it is expected to be used for.

    The user API: how one uses the program/library, including tutorials and examples.

    A high-level description of the implementation strategy.

    The design decisions.

    The rationale/motivation for these decisions.

Documentation should not include:

    Implementation details. These belong in the code itself. If the code is not obvious, either change the code or add comments.

    References to any private implementation details.


# Controlling the number of threads
You can control how many threads the program uses to run parallel compute regions with the environment variable ACC_NUM_CORES. By default, the runtime uses all available cores on the system. On Linux targets, it launches as many threads as there are physical cores (not hyper-threaded logical cores).

OpenACC gang-parallel loops run in parallel across these threads. If an OpenACC parallel construct has a num_gangs(200) clause, the runtime takes the minimum of the num_gangs argument and the number of cores on the system and launches that many threads. This avoids launching hundreds or thousands of gangs, which makes sense on a GPU but would overload a multicore CPU.
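
The rule described above can be illustrated with a small sketch. Nothing here is part of pVars: `effective_threads` and `saxpy` are hypothetical names chosen for illustration, and `effective_threads` only mirrors the stated min(num_gangs, cores) rule rather than querying the OpenACC runtime. The loop carries a `num_gangs(200)` clause as in the text. A compiler without OpenACC support ignores the pragma and runs the loop serially.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper (not an OpenACC or pVars API): models the rule that a
// multicore target launches min(num_gangs, cores) threads.
inline int effective_threads(int num_gangs, int cores) {
    return std::min(num_gangs, cores);
}

// Illustrative gang-parallel loop computing y = a*x + y.
// num_gangs(200) is a sensible request on a GPU; on a multicore CPU target
// the thread count is capped at the core count (or ACC_NUM_CORES, if set).
inline void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    const float* xp = x.data();
    float* yp = y.data();
    const int n = static_cast<int>(x.size());

    #pragma acc parallel loop num_gangs(200) copyin(xp[0:n]) copy(yp[0:n])
    for (int i = 0; i < n; ++i) {
        yp[i] = a * xp[i] + yp[i];
    }
}
```

Assuming the file is called saxpy.cpp and has a main that calls `saxpy`, one plausible way to try this is to build for the multicore target with `nvc++ -acc=multicore -Minfo saxpy.cpp` and then run with, for example, `ACC_NUM_CORES=4 ./a.out`. With that setting, the loop above would be expected to use 4 threads rather than 200.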
