In C++ (and CUDA), we use headers to include predefined functions and classes. The most common header is <iostream>, which provides input/output functionality. For example:
#include <iostream>
- int: Represents whole numbers (e.g., 5, -10, 0).
- short: Short integers with smaller range.
- long: Long integers with larger range.
- float: Single-precision floating-point (e.g., 3.14, -0.001).
- double: Double-precision floating-point (more accurate, but uses more memory).
- char: Represents individual characters (e.g., 'A', 'b', '$').
- bool: Represents true or false values.
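To make the range and precision differences concrete, here is a small sketch. The helper names float_vs_double_gap and integer_ranges_are_ordered are hypothetical, and the precision comparison assumes the usual IEEE 754 representation of float and double.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper: distance between the float closest to 0.1
// and the double closest to 0.1. It is small but not zero, because
// float keeps fewer significant digits than double.
double float_vs_double_gap() {
    float  single_precision = 0.1f;
    double double_precision = 0.1;
    return std::fabs(static_cast<double>(single_precision) - double_precision);
}

// Hypothetical helper: checks that short never holds more than int,
// and int never holds more than long.
bool integer_ranges_are_ordered() {
    return std::numeric_limits<short>::max() <= std::numeric_limits<int>::max()
        && std::numeric_limits<int>::max()   <= std::numeric_limits<long>::max();
}
```

Note that the exact sizes depend on the platform; on some systems long has the same range as int.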
- Use const to define constants (values that don't change during program execution).
- Example:
const float PI = 3.14159;
- A for loop repeats a block of code a specified number of times.
- Example:
for (int i = 0; i < 10; ++i) {
    // Code to execute
}
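The loop body above is left empty, so here is a sketch with a concrete body. The function name sum_below is a hypothetical choice for illustration.

```cpp
// Hypothetical helper: adds the integers 0, 1, ..., n - 1 using a for loop.
int sum_below(int n) {
    int total = 0;
    for (int i = 0; i < n; ++i) {  // initialization; condition; step
        total += i;
    }
    return total;
}
```

With n = 10 the loop runs ten times and returns 0 + 1 + ... + 9 = 45.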
- A while loop repeats a block of code while a condition is true.
- Example:
int count = 0;
while (count < 5) {
    // Code to execute
    ++count;
}
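A while loop fits best when the number of iterations is not known in advance. The sketch below illustrates that case; count_halvings is a hypothetical helper name.

```cpp
// Hypothetical helper: counts how many times n can be halved
// (integer division) before it reaches 1.
int count_halvings(int n) {
    int steps = 0;
    while (n > 1) {  // checked before every iteration
        n /= 2;
        ++steps;
    }
    return steps;
}
```

For n = 8 the value goes 8, 4, 2, 1, so the function returns 3. For n = 1 the condition is false immediately and the body never runs.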
- Every C++ program starts with the main function.
- It's the entry point of your program, where execution begins.
- The int before main indicates that the function returns an integer (usually 0 for successful execution).
- A kernel is a function that runs on the GPU.
- It's the heart of CUDA programming.
- Kernels are defined using the __global__ keyword.
Example of a simple kernel:
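__global__ void vectorAdd(float* A, float* B, float* C, int size) {
    // Global index of this thread across the whole grid
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    // Guard: the grid may contain more threads than elements
    if (tid < size) {
        C[tid] = A[tid] + B[tid];
    }
}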
- In CUDA, we work with threads.
- Each thread has a unique index.
- We calculate the thread index using blockIdx.x, blockDim.x, and threadIdx.x.
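The index arithmetic can be tried out in plain C++ without a GPU. The sketch below is a CPU stand-in: global_thread_index and its parameters are hypothetical names that mirror CUDA's built-in blockIdx.x, blockDim.x, and threadIdx.x.

```cpp
// Plain C++ stand-in for the CUDA expression
//   blockIdx.x * blockDim.x + threadIdx.x
// block_idx:  which block the thread belongs to
// block_dim:  number of threads in each block
// thread_idx: position of the thread inside its block
int global_thread_index(int block_idx, int block_dim, int thread_idx) {
    return block_idx * block_dim + thread_idx;
}
```

With 256 threads per block, thread 5 of block 2 gets index 2 * 256 + 5 = 517, and the first thread of block 0 gets index 0.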
- We allocate memory for data on both the CPU (host) and GPU (device).
- cudaMalloc allocates memory on the device.
- cudaMemcpy transfers data between host and device.
- We launch the kernel using <<<numBlocks, threadsPerBlock>>> syntax.
- numBlocks and threadsPerBlock determine the grid and block dimensions.
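One common way to choose numBlocks is ceiling division, so that the grid has at least one thread per element. The sketch below is a CPU emulation in plain C++, not real CUDA code: blocks_needed and emulate_vector_add are hypothetical names, and the two nested loops stand in for the (block, thread) pairs a real launch would create. It assumes both input vectors have the same length.

```cpp
#include <vector>

// Smallest number of blocks whose threads cover `size` elements.
int blocks_needed(int size, int threads_per_block) {
    return (size + threads_per_block - 1) / threads_per_block;
}

// CPU emulation of vectorAdd<<<numBlocks, threadsPerBlock>>>(...).
// Assumes a and b have the same length.
std::vector<float> emulate_vector_add(const std::vector<float>& a,
                                      const std::vector<float>& b,
                                      int threads_per_block) {
    const int size = static_cast<int>(a.size());
    const int num_blocks = blocks_needed(size, threads_per_block);
    std::vector<float> c(a.size());
    for (int block = 0; block < num_blocks; ++block) {
        for (int thread = 0; thread < threads_per_block; ++thread) {
            int tid = block * threads_per_block + thread;
            if (tid < size) {  // same bounds check as in the kernel
                c[tid] = a[tid] + b[tid];
            }
        }
    }
    return c;
}
```

For 1000 elements and 256 threads per block, this gives 4 blocks (1024 threads), which is why the kernel needs the tid < size check: the last 24 threads have no element to process.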
- After using GPU memory, we free it using cudaFree.
- Also, delete any host memory allocated with new.
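The host half of this cleanup can be sketched in plain C++. The example covers only CPU memory (the device side needs the CUDA toolkit), and sum_of_ones is a hypothetical helper used for illustration.

```cpp
// Hypothetical helper: allocates a host array with new[], uses it,
// and releases it before returning.
float sum_of_ones(int size) {
    float* host_data = new float[size];  // host (CPU) allocation
    for (int i = 0; i < size; ++i) {
        host_data[i] = 1.0f;
    }
    float total = 0.0f;
    for (int i = 0; i < size; ++i) {
        total += host_data[i];
    }
    delete[] host_data;  // arrays created with new[] are freed with delete[]
    return total;
}
```

Each new[] is paired with exactly one delete[]; forgetting the delete[] leaks the memory until the program exits.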