## Support for null values

Arrow supports missing values, or "nulls", for all data types: any value in an array may be semantically null, whether the array's type is primitive or nested.

In Arrow, a dedicated buffer, known as the validity (or "null") bitmap, is stored alongside the data and indicates whether each value in the array is null or not. You can think of it as a vector of 0 and 1 values, where a 1 means that the value is not null ("valid"), while a 0 means that the value is null.

This validity bitmap is optional: if there are no missing values in the array, the buffer does not need to be allocated (as for column 1 in the diagram below).

![](diagrams/primitive-diagram.svg)

In [2]:
import numpy as np
import pyarrow as pa
import nanoarrow as na

In [3]:
arr = pa.array([1.2, 3.4, 9.0, None, 2.9])
arr

<pyarrow.lib.DoubleArray object at 0x7fe64d8236a0>
[
  1.2,
  3.4,
  9,
  null,
  2.9
]

In [4]:
na.c_array_view(arr)

<nanoarrow.c_lib.CArrayView>
- storage_type: 'double'
- length: 5
- offset: 0
- null_count: 1
- buffers[2]:
  - validity <bool[1 b] 11101000>
  - data <double[40 b] 1.2 3.4 9.0 0.0 2.9>
- dictionary: NULL
- children[0]:

**Attention**: Arrow uses [least-significant bit (LSB) numbering](https://en.wikipedia.org/wiki/Bit_numbering) (a bit-endianness convention). This means that within a group of 8 bits (1 byte), the first array element maps to the rightmost bit when the byte is written in binary, so we read right-to-left. However, the `nanoarrow` repr of the validity buffer in the example above already takes that into account and shows the values in logical order, matching the position in the array.

The diagram above shows the bitmap as it is actually stored in memory. We can inspect the validity bitmap buffer with pyarrow and numpy:

In [5]:
validity_bitmap_buffer = arr.buffers()[0]
validity_bitmap_buffer.to_pybytes()

b'\x17'

For this small array of 5 values, the validity bitmap consists of only a single byte. To view the data as bytes in numpy, we can use the `uint8` data type, which has a width of 1 byte:

In [9]:
np.frombuffer(validity_bitmap_buffer, dtype="uint8")

array([23], dtype=uint8)

Numpy also provides a function to "unpack" the 0/1 bits of those bytes into separate values:

In [10]:
np.unpackbits(np.frombuffer(validity_bitmap_buffer, dtype="uint8"), bitorder="little")

array([1, 1, 1, 0, 1, 0, 0, 0], dtype=uint8)

For this array of 5 elements, only the first 5 bits have a meaning; the additional ("padding") bits carry no information and are set to 0 here.
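The same lookup can be written by hand, which makes the LSB numbering explicit. The following is a minimal pure-Python sketch, not part of any library: `is_valid` is a hypothetical helper name, and the byte `0x17` is the bitmap obtained from `to_pybytes()` above. Element `i` lives in byte `i // 8`, at bit position `i % 8` counted from the least-significant bit.

```python
def is_valid(bitmap: bytes, i: int) -> bool:
    """Return True if element i is marked valid in an LSB-numbered bitmap."""
    byte = bitmap[i // 8]              # byte that holds the bit for element i
    return bool((byte >> (i % 8)) & 1)  # shift bit i % 8 down and mask it


bitmap = b"\x17"  # 0b00010111, the validity byte from the example above
flags = [is_valid(bitmap, i) for i in range(5)]
print(flags)  # [True, True, True, False, True]
```

Only indices below the array length (5 here) should be queried; the bits beyond that are padding.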

### Null vs NaN

In numpy (and numpy-based packages such as pandas), `NaN` is often used as an indicator for "missing" values, mostly for lack of a better alternative (numpy does not have built-in support for missing values in general). `NaN` is a specific floating-point value ("Not a Number") defined by the IEEE floating-point standard, and is thus only available for floating-point data types.
In the Arrow format, since there is a separate concept of nulls, a NaN value is considered just another valid floating-point value in the array:

In [11]:
arr = na.Array([0.5, float("nan"), 1.5, None, 3.5], na.float64())

In [12]:
arr

nanoarrow.Array<double>[5]
0.5
nan
1.5
None
3.5

In [13]:
arr.buffers

(nanoarrow.c_lib.CBufferView(bool[1 b] 11101000),
 nanoarrow.c_lib.CBufferView(double[40 b] 0.5 nan 1.5 0.0 3.5))