<img src="https://xnd.io/xndlogo%20transparentbg.png" align="center" width="auto">

<h1 align="center">What is xnd?</h1>

xnd is a module that implements a container type that maps all Python values relevant for scientific computing directly to memory.
xnd offers a superset of the typed-memory features found in similar libraries such as numpy and Apache Arrow.

In [1]:
from xnd import xnd
import numpy as np
import sys

In [2]:
print('Python %s' % sys.version)

Python 3.6.5 | packaged by conda-forge | (default, Apr  6 2018, 13:39:56) 
[GCC 4.8.2 20140120 (Red Hat 4.8.2-15)]


# Xnd container creation routines

## Creating an xnd container from Python data types

- xnd supports type inference, which lets the user create an xnd container directly from Python data types.

In [3]:
xnd([1, 2, 3, 4, 5]) # xnd

xnd([1, 2, 3, 4, 5], type='5 * int64')

In [4]:
np.array([1, 2, 3, 4, 5])  # numpy 

array([1, 2, 3, 4, 5])

In [5]:
xnd([[1., 1.5], [-1.5, 1.]]) # xnd

xnd([[1.0, 1.5], [-1.5, 1.0]], type='2 * 2 * float64')

In [6]:
np.array([[1, 1.5], [-1.5, 1]]) # numpy 

array([[ 1. ,  1.5],
       [-1.5,  1. ]])


You can already see some differences from numpy at this level, such as the array dimensions being part of the type (for example 2 * 2 * float64), whereas numpy keeps the shape separate from the dtype.
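As a rough illustration of how a type such as 2 * 2 * float64 can be inferred from nested lists, here is a minimal pure-Python sketch. The function infer_type is hypothetical and is not xnd's inference algorithm: it only handles rectangular data made of int, float and str values, and it assumes (as the outputs above suggest) that mixing ints and floats promotes to float64.

```python
def infer_type(value):
    """Return a datashape-style type string such as '2 * 2 * float64'.

    Hypothetical helper: a simplified illustration of type inference,
    not xnd's actual algorithm. Only rectangular nesting of int, float
    and str values is handled.
    """
    shape = []
    level = [value]
    # Descend one nesting level at a time; every list at a level must
    # have the same length, which becomes a fixed dimension.
    while level and all(isinstance(v, list) for v in level):
        lengths = {len(v) for v in level}
        if len(lengths) != 1:
            raise ValueError("ragged data would need a 'var' dimension")
        shape.append(lengths.pop())
        level = [item for v in level for item in v]

    # Unify the scalar types found at the innermost level.
    kinds = {type(v) for v in level}
    if kinds == {str}:
        dtype = "string"
    elif kinds == {int}:
        dtype = "int64"
    elif kinds and kinds <= {int, float}:
        dtype = "float64"  # mixing ints and floats promotes to float64
    else:
        raise TypeError(f"unsupported element types: {kinds}")
    return " * ".join([str(n) for n in shape] + [dtype])


print(infer_type([1, 2, 3, 4, 5]))        # 5 * int64
print(infer_type([[1, 1.5], [-1.5, 1]]))  # 2 * 2 * float64
```

Every dimension length ends up in the type string, which is why two arrays with the same element type but different shapes have different xnd types.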

- The default string type in xnd is variable-length. In numpy, strings have a fixed maximum size (here inferred as <U8, i.e. at most 8 Unicode characters) unless you use object arrays, which are slower.

In [7]:
xnd(["this", "is", "a", "test", "notebook"]) # xnd

xnd(['this', 'is', 'a', 'test', 'notebook'], type='5 * string')

In [8]:
np.array(["this", "is", "a", "test", "notebook"]) # numpy

array(['this', 'is', 'a', 'test', 'notebook'], dtype='<U8')
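One common way to store variable-length strings compactly is a single UTF-8 buffer plus an offsets list, the layout Apache Arrow uses for its string columns. The sketch below only illustrates that general idea; the helper names are hypothetical, and this is not a claim about how xnd lays out its string type internally. It also computes how much space a fixed-width <U8 array reserves for the same words.

```python
def pack_strings(words):
    """Pack strings into one UTF-8 buffer plus an offsets list.

    Illustrative only (the layout Apache Arrow documents for string
    columns); not a description of xnd's internal storage.
    """
    data = bytearray()
    offsets = [0]
    for word in words:
        data += word.encode("utf-8")
        offsets.append(len(data))  # end of this string = start of the next
    return bytes(data), offsets


def get_string(data, offsets, i):
    """Decode the i-th string from the packed buffer."""
    return data[offsets[i]:offsets[i + 1]].decode("utf-8")


words = ["this", "is", "a", "test", "notebook"]
data, offsets = pack_strings(words)
print(data, offsets)                 # b'thisisatestnotebook' [0, 4, 6, 7, 11, 19]
print(get_string(data, offsets, 4))  # notebook

# A fixed-width '<U8' array reserves 8 characters x 4 bytes (UTF-32)
# for every element, whatever its actual length.
fixed_width_bytes = len(words) * max(len(w) for w in words) * 4
print(fixed_width_bytes)  # 160
```

With offsets, short strings cost only their own bytes, and no maximum length has to be chosen up front.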

- xnd has a variable-length dimension type, which supports ragged arrays. [Ragged arrays](https://en.wikipedia.org/wiki/Jagged_array), also known as jagged arrays, are arrays of arrays whose members can have different sizes, so their rows have jagged edges when printed. In contrast, two-dimensional arrays are always rectangular. If you give this kind of data to numpy, it falls back to object arrays, which are slower and lose the array programming functionality in the ragged dimension.

In [9]:
xnd([[1,5,2], [1], [7,9,10,20,13]]) # xnd

xnd([[1, 5, 2], [1], [7, 9, 10, 20, 13]], type='var * var * int64')

In [10]:
np.array([[1,5,2], [1], [7,9,10,20,13]]) # numpy

array([list([1, 5, 2]), list([1]), list([7, 9, 10, 20, 13])], dtype=object)
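A ragged array can be stored without object arrays by flattening the values into one list and recording where each row starts and ends. The following sketch shows that offsets idea in pure Python; the helper names are hypothetical, and it is an assumption here that xnd's var dimension works along broadly similar lines.

```python
def to_ragged(rows):
    """Flatten a list of lists into (values, offsets)."""
    values, offsets = [], [0]
    for row in rows:
        values.extend(row)
        offsets.append(len(values))  # row i ends where row i + 1 starts
    return values, offsets


def get_row(values, offsets, i):
    """Return row i as a slice of the flat values list."""
    return values[offsets[i]:offsets[i + 1]]


def row_sums(values, offsets):
    # One reduction per row; row i spans values[offsets[i]:offsets[i + 1]].
    return [sum(values[start:stop]) for start, stop in zip(offsets, offsets[1:])]


values, offsets = to_ragged([[1, 5, 2], [1], [7, 9, 10, 20, 13]])
print(values)                       # [1, 5, 2, 1, 7, 9, 10, 20, 13]
print(offsets)                      # [0, 3, 4, 9]
print(get_row(values, offsets, 2))  # [7, 9, 10, 20, 13]
print(row_sums(values, offsets))    # [8, 1, 59]
```

Because every row is a contiguous slice of one flat list, per-row operations stay simple loops instead of dispatching on individual Python list objects.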

- Categorical type: every value must be one of a fixed set of levels, passed with the levels argument.

In [11]:
levels = ['red', 'orange', 'yellow', 'green', 'blue', 'indigo', 'violet']
rainbow = xnd(['red', 'orange', 'yellow', 'green', 'blue', 'indigo', 'violet', 'red', 'green'], levels=levels)
rainbow

xnd(['red', 'orange', 'yellow', 'green', 'blue', 'indigo', 'violet', 'red', 'green'],
    type='9 * categorical('red', 'orange', 'yellow', 'green', 'blue', 'indigo', 'violet')')

In [12]:
rainbow.type

ndt("9 * categorical('red', 'orange', 'yellow', 'green', 'blue', 'indigo', 'violet')")

In [13]:
rainbow.value

['red',
 'orange',
 'yellow',
 'green',
 'blue',
 'indigo',
 'violet',
 'red',
 'green']
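Conceptually, a categorical value can be stored as a small integer code that indexes into the list of levels. The sketch below encodes and decodes the rainbow example that way; the helpers are hypothetical, and whether xnd uses exactly this representation is an assumption.

```python
def encode_categorical(values, levels):
    """Map each value to the integer index of its level (hypothetical helper)."""
    code_of = {level: code for code, level in enumerate(levels)}
    codes = []
    for v in values:
        if v not in code_of:
            raise ValueError(f"{v!r} is not one of the levels")
        codes.append(code_of[v])
    return codes


def decode_categorical(codes, levels):
    """Turn integer codes back into their level labels."""
    return [levels[c] for c in codes]


levels = ['red', 'orange', 'yellow', 'green', 'blue', 'indigo', 'violet']
values = ['red', 'orange', 'yellow', 'green', 'blue', 'indigo', 'violet', 'red', 'green']
codes = encode_categorical(values, levels)
print(codes)  # [0, 1, 2, 3, 4, 5, 6, 0, 3]
assert decode_categorical(codes, levels) == values
```

Repeated labels such as 'red' and 'green' are then stored once in the levels list, and values outside the levels are rejected.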

- Structs: xnd provides a convenient way to create arrays of structs from a list of dictionaries with the same keys.

In [14]:
data = [{'title': 'Introduction to Digital Signal Processing',
         'speaker': 'Allen Downey',
         'room': 10},
        {'title': 'Making Art with Python',
         'speaker': 'Emily Xie',
         'room': 16},
        {'title': 'Foundations of Numerical Computing in Python',
         'speaker': 'Scott Sanderson',
         'room': 20},
        {'title': 'Exploratory Data Visualization with Vega, Vega-Lite, and Altair',
         'speaker': 'Jake VanderPlas',
         'room': 21}]

x = xnd(data)
x

xnd([{'title': 'Introduction to Digital Signal Processing', 'speaker': 'Allen Downey', 'room': 10},
     {'title': 'Making Art with Python', 'speaker': 'Emily Xie', 'room': 16},
     {'title': 'Foundations of Numerical Computing in Python', 'speaker': 'Scott Sanderson', 'room': 20},
     {'title': 'Exploratory Data Visualization with Vega, Vega-Lite, and Altair',
      'speaker': 'Jake VanderPlas',
      'room': 21}],
    type='4 * {title : string, speaker : string, room : int64}')

In [15]:
x[0]

xnd({'title': 'Introduction to Digital Signal Processing', 'speaker': 'Allen Downey', 'room': 10},
    type='{title : string, speaker : string, room : int64}')

In [16]:
x[1, 0]

'Making Art with Python'
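The struct type shown above can be derived from the dictionaries themselves: each key becomes a field name and each value's Python type a field type, while x[1, 0] selects record 1 and then field 0 in declaration order. Here is a minimal pure-Python sketch of both steps. infer_record_type, field_at and SCALAR_TYPES are hypothetical names chosen to mimic the printed output; they are not part of xnd.

```python
# Hypothetical mapping from Python scalar types to xnd-style type names.
SCALAR_TYPES = {int: "int64", float: "float64", str: "string"}


def infer_record_type(records):
    """Build a type like '4 * {title : string, ...}' from a list of dicts."""
    names = list(records[0])
    if any(list(r) != names for r in records):
        raise ValueError("all records must have the same fields in the same order")
    fields = []
    for name in names:
        kinds = {type(r[name]) for r in records}
        if len(kinds) != 1:
            raise TypeError(f"field {name!r} has mixed types: {kinds}")
        (kind,) = kinds
        fields.append(f"{name} : {SCALAR_TYPES[kind]}")
    return f"{len(records)} * {{" + ", ".join(fields) + "}"


def field_at(records, i, j):
    """Mimic x[i, j]: the j-th field (in declaration order) of record i."""
    return list(records[i].values())[j]


talks = [
    {'title': 'Introduction to Digital Signal Processing', 'speaker': 'Allen Downey', 'room': 10},
    {'title': 'Making Art with Python', 'speaker': 'Emily Xie', 'room': 16},
]
print(infer_record_type(talks))  # 2 * {title : string, speaker : string, room : int64}
print(field_at(talks, 1, 0))     # Making Art with Python
```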

## Creating an xnd container from numpy arrays and record arrays

In [17]:
data = np.random.random(size=(3, 4, 5))
xnd.from_buffer(data)

xnd([[[0.6076881899689686, 0.6219109519913005, 0.30344618200092954, 0.08173401021496007, 0.3844624759307528],
      [0.6856190351548628, 0.05350141186353041, 0.25899927575708814, 0.37261337824969687, 0.5977425692393364],
      [0.7945231795210885, 0.8012982119656936, 0.08137444991838749, 0.2439180258815713, 0.9529049126548508],
      [0.5988979975713662, 0.0806151782502349, 0.0008232917027927167, 0.2772265980748734, 0.8506145117732026]],
     [[0.5413552607137084, 0.38746376304706354, 0.4997580259783966, 0.4444103103829682, 0.4235021424530925],
      [0.03952215022135419, 0.5729745985892067, 0.7502721665968254, 0.8265034406454439, 0.46855793908725696],
      [0.2820798588576, 0.045337483685417346, 0.9794748192013842, 0.11539842718451232, 0.845022406930828],
      [0.7769521661141332, 0.36245588386407834, 0.22032637869994698, 0.3893101355617755, 0.8152267191084634]],
     [[0.1102928359169234, 0.022734405536281366, 0.4190526844616239, 0.7684482342007484, 0.48975669701272706],
      [0.5

In [18]:
recordarr = np.rec.array([('Hello', (1, 2)), ('World', (3, 4))],
                         dtype=[('foo', 'S6'), ('bar', [('A', int), ('B', int)])])
xnd.from_buffer(recordarr)

xnd([{'foo': b'Hello\x00', 'bar': {'A': 1, 'B': 2}}, {'foo': b'World\x00', 'bar': {'A': 3, 'B': 4}}],
    type='2 * {foo : fixed_bytes(size=6), bar : {A : int64, B : int64}}')
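xnd.from_buffer appears to be built around Python's buffer protocol (PEP 3118), which describes exported memory with a format code, a shape and strides. The sketch below inspects such an export using only the standard library: a flat array.array of 12 doubles is viewed as a 3 x 4 matrix with memoryview.cast, and a hypothetical buffer_type helper turns the format and shape into a type string in the style xnd prints. The format-code table is a simplified assumption covering only a few native types.

```python
from array import array

# Hypothetical, simplified mapping from struct-module format codes to type names.
FORMAT_TO_TYPE = {"d": "float64", "f": "float32", "q": "int64", "b": "int8", "B": "uint8"}


def buffer_type(buf):
    """Describe a buffer-protocol object as a datashape-style type string."""
    view = memoryview(buf)
    fmt = view.format.lstrip("@=<>!")  # drop any byte-order prefix
    return " * ".join([str(n) for n in view.shape] + [FORMAT_TO_TYPE[fmt]])


values = array("d", [float(i) for i in range(12)])
# Reinterpret the flat 12-element buffer as a C-contiguous 3 x 4 matrix
# without copying (memoryview.cast requires a byte format in between).
matrix = memoryview(values).cast("B").cast("d", shape=[3, 4])
print(buffer_type(matrix))  # 3 * 4 * float64
print(matrix.strides)       # (32, 8)
print(matrix.tolist()[1])   # [4.0, 5.0, 6.0, 7.0]
```

The shape and format travel with the memory itself, so no per-element type inference is needed when a container is built from a buffer.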

## Creating an xnd container with explicit types

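Creating an xnd container with an explicit type has significant performance advantages for large arrays, because xnd does not have to inspect every element to infer the type. In the benchmark below, passing the type for a one-million-element list cuts construction time from about 2.33 s to about 41 ms.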

In [19]:
N = 1000000
alist = [1] * N

In [20]:
%%timeit
xnd(alist)

2.33 s ± 79.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [21]:
%%timeit
types = f"{N} * int64"
xnd(alist, type=types)

41 ms ± 2.01 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
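The speedup comes from skipping the inference pass. The following sketch makes that difference visible in pure Python: to_int64_buffer is a hypothetical helper that either scans every element to infer the type or trusts an explicit N * int64 type string, then copies the data into an int64 buffer with array.array('q'). Its timings will not match xnd's C implementation; it only shows where the extra work goes.

```python
from array import array
import timeit


def to_int64_buffer(values, type_string=None):
    """Copy a flat list of ints into a 64-bit integer buffer (hypothetical).

    With type_string=None the type is inferred by an extra pass over every
    element; with an explicit "N * int64" string that pass is skipped.
    """
    if type_string is None:
        # Inference pass: inspect every element before allocating storage.
        if not all(type(v) is int for v in values):
            raise TypeError("this sketch can only infer int64 data")
        type_string = f"{len(values)} * int64"
    else:
        # Explicit type: only check that the declared shape fits the data.
        length, _, dtype = type_string.partition(" * ")
        if dtype != "int64" or int(length) != len(values):
            raise ValueError(f"{type_string!r} does not match the data")
    # Conversion pass: 'q' is a signed long long (8 bytes on common platforms);
    # array() still rejects non-integer values here.
    return type_string, array("q", values)


data = [1] * 100_000
inferred = timeit.timeit(lambda: to_int64_buffer(data), number=10)
explicit = timeit.timeit(lambda: to_int64_buffer(data, f"{len(data)} * int64"), number=10)
print(f"inferred: {inferred:.3f} s, explicit: {explicit:.3f} s")
```

Both paths still convert every element; the explicit type only removes the separate inference scan, which is the work the benchmark above avoids.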
