GitHub - thorp/pVars: C++ data parallel programming library based on the concept of computation shape.

thorp / pVars Public

Notifications You must be signed in to change notification settings
Fork 0
Star 0

C++ data parallel programming library based on the concept of computation shape.

Notifications

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
.gitignore		.gitignore
LICENSE		LICENSE
README		README
pvars.cpp		pvars.cpp
pvars.h		pvars.h
pvars_test.cpp		pvars_test.cpp
testable.cpp		testable.cpp
testable.h		testable.h

Repository files navigation

* For standard c++17
nvc++ -std=c++17 -stdpar -Minfo pvars.cpp pvars_test.cpp testable.cpp

* For OPENACC
nvc++ -std=c++17 -acc -gpu=managed -Minfo pvars.cpp pvars_test.cpp testable.cpp

* Older OPENACC
nvc++ -acc -gpu=managed -Minfo pvars.cpp pvars_test.cpp
nvc++ -acc -gpu=managed -Minfo hello_acc.cpp
nvc++ -acc -gpu=managed -Minfo -Mneginfo -ta=tesla_cc75 parlate.cpp parlate_test.cpp

* For additional info
nvaccelinfo
nvidia-smi
nvidia-smi -q

Documentation should include:

Description of what the program/library is supposed to do. What is it expected to be used for.

What is the user API, how does one use the the program/library - including tutorials and examples.

High level description of implementation strategy

What are the design decisions

What is the rationale/motivation for these decisions

Documentation should not include:

Implementation details. These should be in code itself. If the code is not obvious, then either the code should be changed or comments added.

Reference to any private implementation details.

# Controlling the number of threads in
You can control how many threads the program will use to run the parallel compute regions with the environment variable ACC_NUM_CORES. The default is to use all available cores on the system. For Linux targets, the runtime will launch as many threads as physical cores (not hyper-threaded logical cores). OpenACC gang-parallel loops run in parallel across the threads. If you have an OpenACC parallel construct with a num_gangs(200) clause, the runtime will take the minimum of the num_gangs argument and the number of cores on the system, and launch that many threads. That avoids the problem of launching hundreds or thousands of gangs, which makes sense on a GPU but which would overload a multicore CPU.