## The Arrow C Data Interface

Up to now we have focused on explaining the Arrow columnar memory layout and showing you examples of it using `pyarrow` and `nanoarrow`. But this memory layout is meant to be a universal standard for tabular data, not tied to a specific implementation.

While there are specifications for sharing Arrow data between processes or over the network (e.g. the IPC messages), the **Arrow C Data Interface** is designed for zero-copy sharing of data between different libraries *within the same process* (i.e. the libraries access the very same buffers in memory).

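The Arrow C Data Interface defines a set of small C structures, of which `ArrowSchema` (describing the data type) and `ArrowArray` (describing the data itself) are the two central ones: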

```c
struct ArrowSchema {
  const char* format;
  const char* name;
  const char* metadata;
  int64_t flags;
  int64_t n_children;
  struct ArrowSchema** children;
  struct ArrowSchema* dictionary;

  // Release callback
  void (*release)(struct ArrowSchema*);
  // Opaque producer-specific data
  void* private_data;
};

struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  struct ArrowArray** children;
  struct ArrowArray* dictionary;

  // Release callback
  void (*release)(struct ArrowArray*);
  // Opaque producer-specific data
  void* private_data;
};
```
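To make the role of the struct members and the release callback more concrete, here is a minimal sketch that mirrors `struct ArrowSchema` with Python's standard `ctypes` module. It is for illustration only: real producers implement this in C, C++, Cython or Rust, and the `release_schema` callback below is a hypothetical stand-in that frees nothing. It only shows the convention that a released struct has its `release` member set to `NULL`.

```python
import ctypes


class ArrowSchema(ctypes.Structure):
    """ctypes mirror of `struct ArrowSchema` (illustration only)."""


# C signature: void (*release)(struct ArrowSchema*)
ReleaseSchema = ctypes.CFUNCTYPE(None, ctypes.POINTER(ArrowSchema))

# Field order and types have to match the C definition exactly.
ArrowSchema._fields_ = [
    ("format", ctypes.c_char_p),
    ("name", ctypes.c_char_p),
    ("metadata", ctypes.c_char_p),
    ("flags", ctypes.c_int64),
    ("n_children", ctypes.c_int64),
    ("children", ctypes.POINTER(ctypes.POINTER(ArrowSchema))),
    ("dictionary", ctypes.POINTER(ArrowSchema)),
    ("release", ReleaseSchema),
    ("private_data", ctypes.c_void_p),
]


@ReleaseSchema
def release_schema(schema_ptr):
    # A real producer would free its own resources here. Afterwards the
    # callback marks the struct as released by setting `release` to NULL
    # (calling the prototype without arguments gives a NULL function pointer).
    schema_ptr.contents.release = ReleaseSchema()


# Producer side: describe an int32 field named "x".
schema = ArrowSchema(
    format=b"i",  # "i" is the format string for int32
    name=b"x",
    metadata=None,
    flags=0,  # no flags set
    n_children=0,
    release=release_schema,
)

# Consumer side: read the description, then call the release callback once.
is_live_before = bool(schema.release)
schema.release(ctypes.byref(schema))
is_live_after = bool(schema.release)
print(is_live_before, is_live_after)  # True False
```

The consumer never needs to know how the producer allocated its memory: it only calls `release` when it is done, which is what lets libraries with different allocators share data safely.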


The C Data Interface passes Arrow data buffers through memory pointers. So, by construction, it allows you to share data from one runtime to another without copying it. Since the data is in standard Arrow in-memory format, its layout is well-defined and unambiguous.

In the examples so far, whenever we created a `nanoarrow.Array` from a `pyarrow` array (or vice versa), we were using the Arrow C Data Interface under the hood to share the data zero-copy. You might recognize the structure members from the nanoarrow display we have been using to inspect our data.
Similarly, other libraries such as Polars, DuckDB, DataFusion and reticulate (which connects R and Python) use the Arrow C Data Interface to interchange data zero-copy.

### Arrow PyCapsule Interface

While the Arrow C Data Interface specifies how to share the data at the C (FFI) level, it doesn't specify how Python libraries should expose these structs to other libraries. Enter the [**Arrow PyCapsule Interface**](https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html), which standardizes the use of `PyCapsule` objects to share those structs at the Python level, together with the protocol methods (`__arrow_c_array__` and related methods) that export those capsules.

When we pass the `pyarrow.Array` object to the `nanoarrow.array()` function, schematically the following happens:

```python
def array_from_arrow(obj):
    """
    Function to coerce any Arrow-compatible array object into
    an array of my own library.
    """
    # 1. check if the passed object has the protocol method that signals it can export
    #    itself as Arrow data
    if hasattr(obj, "__arrow_c_array__"):
        # 2. call the protocol method, which returns two PyCapsule objects (one describing
        #    the schema (data type), and one describing the array data)
        schema_capsule, array_capsule = obj.__arrow_c_array__()

        # 3. extract the pointer to the C struct from the PyCapsule, and pass it to
        #    a lower-level function that can read the Arrow data and coerce it into a
        #    data structure of your own library
        # example for nanoarrow
        return na.clib.CArray._import_from_c_capsule(schema_capsule, array_capsule)
    ...
```

This way, we can import the data of any input that supports this protocol, not just objects from the pyarrow library.
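The protocol is not limited to single arrays. As a rough sketch, a consumer could first check which of the protocol methods an object implements before deciding how to import it. The method names below come from the PyCapsule Interface specification; the helper `supported_arrow_exports` and the `FakeArray` class are hypothetical names made up for this illustration.

```python
# Dunder methods defined by the Arrow PyCapsule Interface, mapped to a
# short description of what each one exports.
ARROW_PROTOCOL_METHODS = {
    "__arrow_c_schema__": "schema (data type only)",
    "__arrow_c_array__": "single array or record batch",
    "__arrow_c_stream__": "stream of arrays or record batches",
}


def supported_arrow_exports(obj):
    """Hypothetical helper: list the protocol methods that `obj` implements."""
    return [name for name in ARROW_PROTOCOL_METHODS if hasattr(obj, name)]


class FakeArray:
    """Stand-in producer; a real one would return two PyCapsule objects."""

    def __arrow_c_array__(self, requested_schema=None):
        raise NotImplementedError("illustration only")


print(supported_arrow_exports(FakeArray()))  # ['__arrow_c_array__']
print(supported_arrow_exports([1, 2, 3]))    # []
```

Because the check relies only on duck typing, the consumer does not need to import (or even know about) the library that produced the object.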

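The following example demonstrates that this conversion is zero-copy: we overwrite some bytes in the data buffer of the pyarrow array, and the change is visible through the nanoarrow array as well.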

```python
import numpy as np
import pyarrow as pa
import nanoarrow as na
```

```python
pyarrow_arr = pa.array(["some", "random", None, "strings"])
pyarrow_arr
```

```
<pyarrow.lib.StringArray object at 0x7fa8a39ad240>
[
  "some",
  "random",
  null,
  "strings"
]
```

```python
nanoarrow_arr = na.Array(pyarrow_arr)
nanoarrow_arr
```

```
nanoarrow.Array<string>[4]
'some'
'random'
None
'strings'
```

```python
# buffers() returns [validity, offsets, data]; view the data buffer
# (the string bytes) as a NumPy array without copying it
numpy_array_data = np.asarray(pyarrow_arr.buffers()[2])
```

```python
# overwrite the first four bytes ("some") in place with "!" characters
numpy_array_data[0:4] = ord("!")
```

```python
pyarrow_arr
```

```
<pyarrow.lib.StringArray object at 0x7fa8a39ad240>
[
  "!!!!",
  "random",
  null,
  "strings"
]
```

```python
nanoarrow_arr
```

```
nanoarrow.Array<string>[4]
'!!!!'
'random'
None
'strings'
```