# Structured Arrays

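NumPy also provides efficient storage of heterogeneous data through structured arrays. In a structured array every element is a record made of named fields, and each field can have a different type.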

In [1]:
import numpy as np
# Data in 3 different formats (string, integer and float)
name = ['Alice', 'Bob', 'Cathy', 'Doug']
age = [25, 45, 37, 19]
weight = [55.0, 85.5, 68.0, 61.5]

The following cell declares a structured array with a compound data type, given as a dictionary of field names and their formats. The result is a single array (`data`) whose records can each hold values of different types.

In [2]:
# 4 is the number of records (array elements); each record holds one value per field
data = np.zeros(4, dtype={'names':('name', 'age', 'weight'),
                          'formats':('U10', 'i4', 'f8')})
data.dtype

dtype([('name', '<U10'), ('age', '<i4'), ('weight', '<f8')])

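`U10` is a Unicode string of at most 10 characters, `i4` is a 4-byte signed integer and `f8` is an 8-byte float. The `<` prefix in the printed dtype is the byte-order marker (little-endian). Now that the container for the data exists, we can assign values to it field by field: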
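The dictionary form is only one way to describe the record layout. The sketch below compares it with the list-of-tuples form and the comma-separated string form (the latter assigns default field names `f0`, `f1`, `f2`), and checks the per-record size implied by the format codes.

```python
import numpy as np

# Dictionary form, as used in this notebook
dt_dict = np.dtype({'names': ('name', 'age', 'weight'),
                    'formats': ('U10', 'i4', 'f8')})
# List of (field name, format) tuples describing the same layout
dt_list = np.dtype([('name', 'U10'), ('age', 'i4'), ('weight', 'f8')])
# Comma-separated string: same formats, but default field names
dt_str = np.dtype('U10,i4,f8')

print(dt_dict == dt_list)   # True
print(dt_str.names)         # ('f0', 'f1', 'f2')
# U10 stores 10 characters at 4 bytes each: 40 + 4 + 8 bytes per record
print(dt_dict.itemsize)     # 52
```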

In [3]:
data['name'] = name
data['age'] = age
data['weight'] = weight
data

array([('Alice', 25, 55. ), ('Bob', 45, 85.5), ('Cathy', 37, 68. ),
       ('Doug', 19, 61.5)],
      dtype=[('name', '<U10'), ('age', '<i4'), ('weight', '<f8')])
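Filling a zero-initialized array field by field is one option. As an alternative sketch, the same array can be built in one step from a list of per-record tuples; `data2` is just an illustrative name.

```python
import numpy as np

name = ['Alice', 'Bob', 'Cathy', 'Doug']
age = [25, 45, 37, 19]
weight = [55.0, 85.5, 68.0, 61.5]
dt = [('name', 'U10'), ('age', 'i4'), ('weight', 'f8')]

# Each record must be a tuple (not a list) so NumPy maps it onto the fields
records = list(zip(name, age, weight))
data2 = np.array(records, dtype=dt)
print(data2)
```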

Data can be accessed either by field name or by record index:

In [4]:
# All names
data['name']

array(['Alice', 'Bob', 'Cathy', 'Doug'], dtype='<U10')

In [5]:
# First row (data entry)
data[0]

('Alice', 25, 55.)

In [6]:
# Using both also works! This returns the name from the last row
data[-1]['name']

'Doug'
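A single record obtained by integer indexing is an `np.void` scalar that acts as a view into the array, so assigning to one of its fields changes the original data. A minimal check:

```python
import numpy as np

data = np.zeros(4, dtype=[('name', 'U10'), ('age', 'i4'), ('weight', 'f8')])
data['name'] = ['Alice', 'Bob', 'Cathy', 'Doug']
data['age'] = [25, 45, 37, 19]

# Indexing one record returns a mutable np.void scalar viewing the array
last = data[-1]
last['age'] = 20
print(data['age'])  # [25 45 37 20]
```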

In [7]:
# Use boolean masking to filter data by age and return names of the matches
data[data['age'] < 30]['name']

array(['Alice', 'Doug'], dtype='<U10')
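Boolean masking behaves differently when writing: `data[mask]` is a copy, so assigning to a field of it leaves `data` unchanged. Selecting the field first gives a view that can be masked and updated in place. The sketch below also sorts whole records by one field with `np.sort(..., order=...)`; the variable names are illustrative.

```python
import numpy as np

data = np.zeros(4, dtype=[('name', 'U10'), ('age', 'i4'), ('weight', 'f8')])
data['name'] = ['Alice', 'Bob', 'Cathy', 'Doug']
data['age'] = [25, 45, 37, 19]
data['weight'] = [55.0, 85.5, 68.0, 61.5]

young = data['age'] < 30

# Boolean indexing returns a copy, so this does NOT change data
data[young]['weight'] = 0.0

# Select the field first (a view), then mask: this writes into data
data['weight'][young] += 1.0

# Sort whole records by the 'age' field
by_age = np.sort(data, order='age')
print(by_age['name'])   # ['Doug' 'Alice' 'Cathy' 'Bob']
```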

Structured arrays are useful for compact, typed storage of records, but for more involved manipulation of labeled tabular data (grouping, joining, handling missing values) the Pandas package provides much richer functionality.

As an alternative to the structured arrays presented here, NumPy also provides record arrays (`np.recarray`), whose fields can be accessed as attributes of an object (`rec.age`) instead of by key (`data['age']`).
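One way to get a record array is to view an existing structured array as `np.recarray`. The sketch below assumes the same fields as above. The view shares memory with the original, so writes through attributes are visible in `data`. Note that a field whose name clashes with an ndarray attribute (for example `size`) still has to be accessed by key.

```python
import numpy as np

data = np.zeros(4, dtype=[('name', 'U10'), ('age', 'i4'), ('weight', 'f8')])
data['name'] = ['Alice', 'Bob', 'Cathy', 'Doug']
data['age'] = [25, 45, 37, 19]

# View the same memory as a record array: fields become attributes
rec = data.view(np.recarray)
print(rec.age)         # [25 45 37 19]
print(rec[0].name)     # Alice

# The view shares memory, so writes through attributes update data
rec.age += 1
print(data['age'])     # [26 46 38 20]
```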