# OpenCL data types

We saw in the survival C++ course that the number of bits used by some C data types depends on the platform and operating system in use. Within OpenCL kernels a particular C data type always uses the same number of bits, but in the main (host) program that same C data type might use a different number of bits. This is a problem for OpenCL applications, which promise a level of portability across implementations. To remedy this, the latest [OpenCL C specification](https://www.khronos.org/registry/OpenCL/specs/3.0-unified/pdf/OpenCL_C.pdf) has a number of standard data types that fix the number of bits used and stay consistent with the number of bits used for the matching types in the kernel. The tables below list commonly used OpenCL data types and how many bits they use. It is **good practice** to use these data types wherever practical in OpenCL applications.

## Scalar types

| kernel C type | OpenCL standard type | Description and bits used |  
| :- | :- | :- |
| bool | NA | number of bits not defined, holds true or false |
| char | cl_char | 8 bits, signed two's complement integer  |
| unsigned char, uchar | cl_uchar | 8 bits, unsigned integer |
| short | cl_short | 16 bits, signed two's complement integer |
| unsigned short, ushort | cl_ushort | 16 bits, unsigned integer |
| int | cl_int | 32 bits, signed two's complement integer |
| unsigned int, uint | cl_uint | 32 bits, unsigned integer |
| long | cl_long | 64 bits, signed two's complement integer |
| unsigned long, ulong | cl_ulong | 64 bits, unsigned integer |
| half | cl_half | 16 bits, floating point number |
| float | cl_float | 32 bits, floating point number |
| double | cl_double | 64 bits, floating point number |
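| size_t | NA | 32 or 64 bits, matching the address width of the device; unsigned integer type of the result of the **sizeof** operator |
| ptrdiff_t | NA | 32 or 64 bits, matching the address width of the device; signed integer type from the subtraction of one pointer from another |
| intptr_t | NA | 32 or 64 bits, matching the address width of the device; pointer storage in a signed integer type |
| uintptr_t | NA | 32 or 64 bits, matching the address width of the device; pointer storage in an unsigned integer type |
| void | void | no bits, incomplete type with no values |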

## Vector types

In addition to the standard types above, the OpenCL standard also defines a number of vector types with **n** = 2, 3, 4, 8, and 16 elements. Vectors can unlock performance within an OpenCL application because memory is loaded into caches using cache lines that are typically around 64-128 bytes (or 16-32 floats) wide. Furthermore, CPUs have SIMD units that can process, in one instruction, vectors of floats up to 64 bytes long. Note that a 3-element vector occupies the same storage as the 4-element vector of the same element type. Here are the vector types as used in both host and kernel code.

| kernel C type | OpenCL standard type | Description and bits used |  
| :- | :- | :- |
| char**n** | cl_char**n** | **n** x 8 bits, signed two's complement integers  |
| uchar**n** | cl_uchar**n** | **n** x 8 bits, unsigned integers |
| short**n** | cl_short**n** | **n** x 16 bits, signed two's complement integers |
| ushort**n** | cl_ushort**n** | **n** x 16 bits, unsigned integers |
| int**n** | cl_int**n** | **n** x 32 bits, signed two's complement integers |
| uint**n** | cl_uint**n** | **n** x 32 bits, unsigned integers |
| long**n** | cl_long**n** | **n** x 64 bits, signed two's complement integers |
| ulong**n** | cl_ulong**n** | **n** x 64 bits, unsigned integers |
| float**n** | cl_float**n** | **n** x 32 bits, floating point numbers |
| double**n** | cl_double**n** | **n** x 64 bits, floating point numbers |

### Complex numbers in OpenCL

Complex numbers are not implemented in OpenCL. However, you can store the real and imaginary components in a vector type such as **float2** or **double2**, and then perform the complex math manually on the individual components.

### Vector access from the host

Within an OpenCL kernel there are sophisticated ways of indexing into a vector type. From the host, however, the portable way to get at individual elements is the **.s[index]** indexing.

```C++
// Code from the host

// Declare an initialised vector
// (the braces initialise the .s array inside the vector type)
cl_float4 f = {{0.0f, 1.0f, 2.0f, 3.0f}};
    
// To set every element to zero instead, one could write
//cl_float4 f = {{0.0f}};

// Print out the last element
std::printf("%f\n", f.s[3]);
    
// Store a value in the last element
f.s[3] = 10.0;
    
// Print out the last element again
std::printf("%f\n", f.s[3]);
```

### Vector access from within a kernel

Allocations of memory that are passed to a kernel in the **\_\_global** or **\_\_local** address spaces can be interpreted as a vector data type. For example in this kernel definition we interpret the global memory allocations **A_star**, **BT_star** and the local memory allocations **shared_A_star** and **shared_BT_star** as vectors of type **float8**. 

```C++
__kernel void mat_mult_local_transp_vec (
                        __global float8* A_star, 
                        __global float8* BT_star, 
                        __global float* C,
                        __local  float8* shared_A_star,
                        __local  float8* shared_BT_star,
                        unsigned int N1_A_v, 
                        unsigned int N0_C,
                        unsigned int N1_C) {
```

One must make sure of two things when using memory in this way:

* The memory allocation is big enough so that the last element in the last vector accessed is backed by memory.
* The memory is aligned so that the starting address of the allocation is a multiple of the size of the vector type in bytes (32 bytes for **float8**).

If an OpenCL function is performing the memory allocation, such as [clCreateBuffer](https://www.khronos.org/registry/OpenCL/sdk/3.0/docs/man/html/clCreateBuffer.html) or [clSVMAlloc](https://www.khronos.org/registry/OpenCL/sdk/3.0/docs/man/html/clSVMAlloc.html), then it will usually align the memory to the size of the largest OpenCL data type (**long16**, which is 128 bytes). Otherwise, use the C11 function **aligned_alloc** to allocate memory whose alignment equals the number of bytes in the vector type. Note that **aligned_alloc** expects the requested size to be a multiple of the alignment.

Access to a vector type from within a kernel is done using dot notation. You can use **.x .y .z** and **.w** for the first four elements, or you can use **.s0, .s1, .s2, .s3, .s4, .s5, .s6, .s7, .s8, .s9, .sa, .sb, .sc, .sd, .se, .sf** to access any element up to the 16th. A neat thing about OpenCL vectors is that you can "swizzle", or permute the indices, to mix up the order of the elements. Within a single swizzle you can use either the .xyzw or the .s* notation, but not both.

```C++
// Code within a kernel

// Explicit declaration, one value per element
float4 f = (float4)(1.0f, 2.0f, 3.0f, 4.0f);

// Explicit declaration, a single value is copied to every element
float4 v = (float4)(1.0f);

// Access to element 0 (both expressions are equivalent)
v.x = 1.0f;
v.s0 = 1.0f;

// Valid examples of swizzling
v.xyzw = f.wzyx;
v.xyzw = f.s3210;
```

You can also load and store vectors from a memory allocation using the **vloadn** and **vstoren** functions. 

```C++
// OpenCL kernel code

// Assuming arr is a pointer to floats in global memory

// Load a float4 vector from the four floats that start
// at element offset*4 of arr
float4 f = vload4(offset, arr);

// Store a float4 vector into the four floats that start
// at element offset*4 of arr
vstore4(f, offset, arr);
```

In order to avoid undefined behaviour, the address that **vloadn** or **vstoren** reads from or writes to (**arr** plus offset*n elements) needs to be aligned to the size of the element type, which is 4 bytes for **float**. So as long as you use the allocated (and aligned) address as the address for **vstoren** and **vloadn** functions you will be fine.

<address>
Written by Dr. Toby Potter of <a href="https://www.pelagos-consulting.com">Pelagos Consulting and Education</a> for the Pawsey Supercomputing Centre
</address>