# Nested layouts

* List, List View
* Struct
* Map
* Union

In nested types we introduce the concept of **parent** and **child arrays**. They express relationships between physical value arrays in a nested type structure.

Nested types depend on one or more child data types. For instance, List is a nested type (parent) that has one child (the data type of the values in the list).

## List

The list type stores a sequence of values of the same type in each column slot. The layout is similar to that of the binary or string type: an offsets buffer defines where each sequence starts and ends, and all the values of the column are stored consecutively in a values child array.

![image info](./diagrams/var-list-diagram.svg)

In [None]:
import nanoarrow as na
import numpy as np
import pyarrow as pa

In [None]:
column_1 = pa.array([[12, -7, 25], None, [0, -127, 127, 50], []],
                    type=pa.list_(pa.int8()))
column_1

When inspecting a list type column (and nested data in general) using pyarrow, the `buffers()` method returns all buffers: first those of the list array itself (validity bitmap buffer and offsets buffer), then those of its child array (validity bitmap buffer and values buffer):

In [None]:
# Inspecting buffers using PyArrow and buffers() method
column_1.buffers()

In [None]:
# Inspecting buffers using PyArrow and buffers() method and numpy
validity_bitmap_buffer = column_1.buffers()[0]
np.unpackbits(np.frombuffer(validity_bitmap_buffer, dtype="uint8"), bitorder="little")

In [None]:
offsets_buffer = column_1.buffers()[1]
np.frombuffer(offsets_buffer, dtype="int32")

In [None]:
values_validity_bitmap_buffer = column_1.buffers()[2]
values_validity_bitmap_buffer is None

In [None]:
values_buffer = column_1.buffers()[3]
np.frombuffer(values_buffer, dtype="int8")

In [None]:
# Inspecting buffers using nanoarrow
na_column1 = na.array(column_1)
na_column1.inspect()
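
To make the roles of these buffers concrete, here is a minimal pure-Python sketch of how the list slots could be reassembled from a validity bitmap, an offsets buffer and a values child array. The helper `decode_list` and the hard-coded buffer contents are illustrative assumptions (they model what the cells above are expected to show for `column_1`); they are not part of pyarrow or nanoarrow.

```python
def decode_list(validity_byte, offsets, values):
    """Rebuild list slots from a one-byte validity bitmap (hypothetical
    simplification), an offsets sequence and a flat values sequence."""
    slots = []
    for i in range(len(offsets) - 1):
        # Validity bits are read least-significant bit first.
        is_valid = (validity_byte >> i) & 1
        if not is_valid:
            slots.append(None)
            continue
        # Slot i spans values[offsets[i]:offsets[i + 1]].
        slots.append(list(values[offsets[i]:offsets[i + 1]]))
    return slots


# Assumed buffer contents for [[12, -7, 25], None, [0, -127, 127, 50], []]
validity_byte = 0b00001101   # slots 0, 2 and 3 are valid, slot 1 is null
offsets = [0, 3, 3, 7, 7]    # one more entry than there are slots
values = [12, -7, 25, 0, -127, 127, 50]

decoded = decode_list(validity_byte, offsets, values)
print(decoded)
```

In this sketch the null slot and the empty list both span zero values; only the validity bitmap tells them apart.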

### Exercise

Create the following example column, inspect its buffers, and look for the differences between it and the previous list type column:

<details><summary>Hints</summary>

* Do you see any offset buffers?
* What is the length of the child array?

</details>

In [None]:
column_1_example = pa.array([[12, -7, 25], None, [50, -127, 127]],
                            type=pa.list_(pa.int8(), 3))

### Fixed size list

A **fixed size list** is a special case of the variable-size list in which each column slot contains a sequence of the same fixed size. Because all lists have the same size, the offsets buffer is no longer needed.

![image info](./diagrams/fixed-list-diagram.svg)

In [None]:
column_2 = pa.array([[12, -7], None, [0, None]], type=pa.list_(pa.int16(), 2))
na.array(column_2).inspect()
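
Without offsets, the position of each slot in the child array follows from the slot index and the list size alone. The sketch below illustrates that arithmetic in plain Python. The helper `decode_fixed_size_list` is hypothetical, child nulls are modeled as `None`, and the entries stored under the null parent slot are an assumption (their content is not meaningful).

```python
def decode_fixed_size_list(validity, values, list_size):
    """Rebuild fixed size list slots: slot i covers
    values[i * list_size:(i + 1) * list_size]."""
    slots = []
    for i, is_valid in enumerate(validity):
        if not is_valid:
            slots.append(None)
            continue
        start = i * list_size
        slots.append(list(values[start:start + list_size]))
    return slots


validity = [True, False, True]
# The null parent slot still occupies list_size entries in the child array.
child_values = [12, -7, None, None, 0, None]

decoded_fixed = decode_fixed_size_list(validity, child_values, list_size=2)
print(decoded_fixed)
```

The child array length is the number of slots multiplied by the list size, including the entries that belong to null slots.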

### List and large list comparison

In a regular variable-size list the offsets are `int32`, while in the **large** list the offsets are `int64`. (A fixed size list has no offsets buffer at all.)

In [None]:
column_1 = pa.array([[12, -7, 25], None, [0, -127, 127, 50], []],
                    type=pa.list_(pa.int8()))
na.array(column_1).inspect()

In [None]:
column_1_large = pa.array([[12, -7, 25], None, [0, -127, 127, 50], []],
                          type=pa.large_list(pa.int8()))
na.array(column_1_large).inspect()
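
As a rough illustration of what the wider offsets cost and allow, the sketch below packs the same five offsets as 32-bit and as 64-bit little-endian integers using the standard library `struct` module. The offsets are assumed to be those of the column above, and any padding that an Arrow implementation adds to its buffers is ignored here.

```python
import struct

# Assumed offsets for [[12, -7, 25], None, [0, -127, 127, 50], []]
offsets = [0, 3, 3, 7, 7]

# "i" is a 4-byte signed integer, "q" an 8-byte one ("<" = little-endian).
int32_offsets = struct.pack(f"<{len(offsets)}i", *offsets)
int64_offsets = struct.pack(f"<{len(offsets)}q", *offsets)

print(len(int32_offsets), len(int64_offsets))

# The largest position an int32 offset can express in the child array.
max_int32_offset = 2**31 - 1
print(max_int32_offset)
```

The large variant doubles the size of the offsets buffer but lets a single array address a child array with more than `2**31 - 1` values.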

### List and large list view

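The list view type stores a sizes buffer alongside the offsets buffer, so each slot is described by its own offset and size. This allows arrays to specify out-of-order offsets. The large list view variant uses `int64` offsets and sizes instead of `int32`.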

![image info](./diagrams/var-list-view-diagram.svg)

In [None]:
column_3 = pa.ListViewArray.from_arrays(offsets=[4, 7, 0, 0, 3],
                                        sizes=[3, 0, 4, 0, 2],
                                        values=[0, -127, 127, 50, 12, -7, 25],
                                        mask=pa.array([False, True, False, False, False]))

In [None]:
column_3.buffers()

In [None]:
column_3

In [None]:
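# Buffers: list validity, offsets, sizes, child validity, child values.
# The values were passed as Python ints, so pyarrow inferred int64.
values_buffer = column_3.buffers()[4]
np.frombuffer(values_buffer, dtype="int64")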
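
To see how offsets and sizes work together, here is a small pure-Python sketch that reads the slots back using the same offsets, sizes and values as `column_3`. The helper `decode_list_view` is hypothetical, and the validity list is assumed to be the inverse of the `mask` argument above (where `True` marks a null).

```python
def decode_list_view(validity, offsets, sizes, values):
    """Rebuild list view slots: slot i covers
    values[offsets[i]:offsets[i] + sizes[i]]."""
    slots = []
    for is_valid, offset, size in zip(validity, offsets, sizes):
        if not is_valid:
            slots.append(None)
        else:
            slots.append(list(values[offset:offset + size]))
    return slots


validity = [True, False, True, True, True]   # inverse of the mask
offsets = [4, 7, 0, 0, 3]
sizes = [3, 0, 4, 0, 2]
values = [0, -127, 127, 50, 12, -7, 25]

decoded_view = decode_list_view(validity, offsets, sizes, values)
print(decoded_view)
```

The offsets are not sorted, and the last slot reuses values that also belong to two other slots, which the plain list layout cannot express.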

## Struct

A struct is a nested type parameterized by an ordered sequence of types, one per field. Its physical layout has:

* one child array for each field
* child arrays that are independent and need not be adjacent to each other in memory (they only need to have the same length)

One can think of an individual struct field as a key-value pair where the key is the field name and the value is the child array. The field name (key) is stored in the schema, and the values of that field are stored in its child array.

![image info](./diagrams/struct-diagram.svg)

In [None]:
ty = pa.struct([pa.field('x', pa.string()),
                pa.field('y', pa.int8())])
column_4 = pa.array([{"x": "joe", "y": 1},
                     {"x": None, "y": 2}, None,
                     {"x": "mark", "y": 4},
                     {"x": "jane", "y": None}],
                    type=ty)
column_4

In [None]:
column_4.buffers()

In [None]:
na.array(column_4).inspect()
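
The relationship between the struct's own validity and its per-field child arrays can be sketched in plain Python. The helper `decode_struct` is hypothetical, child nulls are modeled as `None`, and the child entries under the null struct slot are an assumption (their content is not meaningful).

```python
def decode_struct(validity, children):
    """Rebuild struct rows from a parent validity list and a mapping of
    field name -> child values. All children must have the parent length."""
    for name, child in children.items():
        if len(child) != len(validity):
            raise ValueError(f"child {name!r} has the wrong length")
    rows = []
    for i, is_valid in enumerate(validity):
        if not is_valid:
            rows.append(None)
        else:
            # Row i takes element i from every child array.
            rows.append({name: child[i] for name, child in children.items()})
    return rows


validity = [True, True, False, True, True]
children = {
    "x": ["joe", None, None, "mark", "jane"],
    "y": [1, 2, None, 4, None],
}

rows = decode_struct(validity, children)
print(rows)
```

A null struct (row 2) is different from a valid struct whose fields are null: the first is recorded in the parent validity bitmap, the second in the validity bitmaps of the children.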

### Exercise

Create the following nested example column. How many buffers does the example have? Try to determine the number first before inspecting the buffers using pyarrow or nanoarrow.

<details><summary>Hints</summary>

* Struct: validity bitmap buffer and one child array per field
* List: validity bitmap buffer, offsets buffer and one child array
* String: validity bitmap buffer, offsets buffer and data buffer
* Fixed size list: validity bitmap buffer and one child array
* Uint: validity bitmap buffer and data buffer

</details>

In [None]:
ty = pa.struct([pa.field('x', pa.list_(pa.string())),
                pa.field('y', pa.list_(pa.list_(pa.uint8(), 2)))])
nested_example = pa.array([{"x": ["joe"], "y": [[1, 2], [2, 1]]},
                           {"x": None, "y": [[2, 3], None]}, None,
                           {"x": ["mark"], "y": [[4, None]]},
                           {"x": ["jane", None], "y": None}],
                           type=ty)
ty

## Map

Map type represents nested data where each value is a variable number of key-value pairs. Its physical representation is the same as a list of `{key, value}` structs.

The difference between a struct and a map type lies in where the keys are kept. A struct holds its keys (the field names) in the schema, so they must be strings, and it stores the values in child arrays, one for each field. There can be multiple keys and therefore multiple child arrays. A map, on the other hand, has one child array holding all the keys (which must all be of the same type, but not necessarily strings) and a second child array holding all the values. The values must also share a single type, which does not have to match the type of the keys.

In addition, the map stores these key-value structs in a list, so it needs an offsets buffer because each slot can hold a different number of pairs.

![image info](./diagrams/map-diagram.svg)

In [None]:
column_6_data = [{'Dark Knight': 10},
                 {'Dark Knight': 8, 'Meet the Parents': 4, 'Superman': 5},
                 None,
                 {'Meet the Parents': 10, 'Superman': None}]
column_6 = pa.array(column_6_data, type=pa.map_(pa.string(), pa.int32()))
column_6

In [None]:
column_6.buffers()

In [None]:
column_6.type

In [None]:
na.array(column_6).inspect()
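
The "list of `{key, value}` structs" view of a map can be sketched in plain Python as follows. The helper `decode_map` is hypothetical, and the offsets, keys and items below are assumed contents modeled on `column_6`, with `None` standing in for nulls.

```python
def decode_map(validity, offsets, keys, items):
    """Rebuild map slots as lists of (key, item) pairs: slot i covers
    positions offsets[i]:offsets[i + 1] of the keys and items children."""
    slots = []
    for i, is_valid in enumerate(validity):
        if not is_valid:
            slots.append(None)
            continue
        start, stop = offsets[i], offsets[i + 1]
        slots.append(list(zip(keys[start:stop], items[start:stop])))
    return slots


validity = [True, True, False, True]
offsets = [0, 1, 4, 4, 6]
keys = ["Dark Knight",
        "Dark Knight", "Meet the Parents", "Superman",
        "Meet the Parents", "Superman"]
items = [10, 8, 4, 5, 10, None]

decoded_map = decode_map(validity, offsets, keys, items)
print(decoded_map)
```

All keys live in one child array and all items in another; the offsets only record how many consecutive pairs belong to each slot.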

### Comparing the conversion of struct and map to Python objects

Struct converts to a list of dictionaries:

In [None]:
column_4.to_numpy(zero_copy_only=False)

In [None]:
column_4.to_pylist()

And map converts to a nested list of tuples by default:

In [None]:
column_6.to_numpy(zero_copy_only=False)

Or to dictionaries with the `maps_as_pydicts` keyword:

In [None]:
# maps_as_pydicts can be "lossy" or "strict".
# Converting to dicts can change the ordering of (key, value) pairs and
# deduplicates repeated keys, which may result in a loss of data.
#
# "lossy": a warning is printed when duplicate keys are detected
# "strict": an exception is raised when duplicate keys are detected
column_6.to_pandas(zero_copy_only=False, maps_as_pydicts="lossy")

In [None]:
column_6.to_pylist()
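
Why converting pairs to a dictionary can lose data is easy to show in plain Python. The sketch below is not pyarrow's implementation: `pairs_to_dict` is a hypothetical helper that only mimics the described "lossy" and "strict" behaviours, and the duplicated key in `pairs` is made up for illustration. Which of the duplicate values pyarrow itself keeps is not claimed here.

```python
import warnings


def pairs_to_dict(pairs, mode="lossy"):
    """Turn (key, value) pairs into a dict, warning ("lossy") or raising
    ("strict") when a key appears more than once."""
    result = {}
    for key, value in pairs:
        if key in result:
            if mode == "strict":
                raise KeyError(f"duplicate key: {key!r}")
            warnings.warn(f"duplicate key {key!r}: an earlier value is dropped")
        # In this sketch a later value overwrites an earlier one.
        result[key] = value
    return result


# A map slot may contain the same key twice; a dict cannot.
pairs = [("Superman", 5), ("Dark Knight", 8), ("Superman", 7)]

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    lossy = pairs_to_dict(pairs, mode="lossy")

print(lossy, len(caught))
```

The list of tuples keeps all three pairs, while the dictionary keeps only two entries, which is why the tuple form is the default.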