# 1. Structured arrays
- Sometimes we need to store heterogeneous data, where each record mixes several types (for example a name, an age, and a weight).
- NumPy provides structured arrays and record arrays for this, which store heterogeneous data efficiently.
- For more complex data, we use pandas DataFrames.


## (a) Creating structured arrays: dictionary method
1. Define a structured array of all **zeros** with a **compound datatype** specification.

In [4]:
import numpy as np
data = np.zeros(4, dtype= {'names': ('name', 'age', 'weight'),
                            'formats':('U10', 'i4', 'f8')})

data

array([('', 0, 0.), ('', 0, 0.), ('', 0, 0.), ('', 0, 0.)],
      dtype=[('name', '<U10'), ('age', '<i4'), ('weight', '<f8')])

dtype formats are:
- U10 - Unicode string of maximum length 10
- i4 - 4-byte (32-bit) signed integer
- f8 - 8-byte (64-bit) float
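
To make the format codes more concrete, the following sketch inspects how the three fields are laid out in one record. The variable name person_dtype is only illustrative, and the byte counts assume NumPy's default packed (unaligned) layout.

```python
import numpy as np

# Same compound dtype as above; 'person_dtype' is an illustrative name
person_dtype = np.dtype({'names': ('name', 'age', 'weight'),
                         'formats': ('U10', 'i4', 'f8')})

# dtype.fields maps each field name to (field dtype, byte offset)
for field_name in person_dtype.names:
    field_dtype, offset = person_dtype.fields[field_name][:2]
    print(field_name, field_dtype, offset)

# U10 uses 4 bytes per character (40), i4 uses 4, f8 uses 8
print(person_dtype.itemsize)  # 52 bytes per record in the packed layout
```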

In [8]:
# Fill data
data['name'] = ['Alice', 'Bob', 'Cathy', 'Doug']
data['age'] = [25, 45, 37, 19]
data['weight'] = [55.0, 85.5, 68.0, 61.5]


data   # Data is stored efficiently in a single memory block.  

array([('Alice', 25, 55. ), ('Bob', 45, 85.5), ('Cathy', 37, 68. ),
       ('Doug', 19, 61.5)],
      dtype=[('name', '<U10'), ('age', '<i4'), ('weight', '<f8')])
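
As an alternative to filling a zero array field by field, the same records can be passed directly as a list of tuples. This is a minimal sketch with the same values; the name people is illustrative. Each record has to be a tuple, not a list.

```python
import numpy as np

person_dtype = [('name', 'U10'), ('age', 'i4'), ('weight', 'f8')]

# One tuple per record, in the same order as the fields
people = np.array([('Alice', 25, 55.0),
                   ('Bob', 45, 85.5),
                   ('Cathy', 37, 68.0),
                   ('Doug', 19, 61.5)], dtype=person_dtype)

print(people['name'])
print(people.shape)  # (4,)
```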

## (b) Access data of structured array

In [9]:
# Access data by field name
data['name']

array(['Alice', 'Bob', 'Cathy', 'Doug'], dtype='<U10')

In [12]:
# Access data by index
data[0]  

np.void(('Alice', 25, 55.0), dtype=[('name', '<U10'), ('age', '<i4'), ('weight', '<f8')])

In [15]:
data[-1]['age']

np.int32(19)
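
Beyond single fields and single rows, several fields can be selected at once with a list of names, and values can be assigned through the field. The sketch below rebuilds the same four records under the illustrative name people so that it runs on its own.

```python
import numpy as np

people = np.array([('Alice', 25, 55.0), ('Bob', 45, 85.5),
                   ('Cathy', 37, 68.0), ('Doug', 19, 61.5)],
                  dtype=[('name', 'U10'), ('age', 'i4'), ('weight', 'f8')])

# Select a subset of fields by passing a list of field names
name_and_age = people[['name', 'age']]
print(name_and_age.dtype.names)  # ('name', 'age')

# Update one value: pick the field, then the row
people['age'][0] = 26
print(people[0])
```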

## (c) Advanced operations with boolean masking

In [16]:
data

array([('Alice', 25, 55. ), ('Bob', 45, 85.5), ('Cathy', 37, 68. ),
       ('Doug', 19, 61.5)],
      dtype=[('name', '<U10'), ('age', '<i4'), ('weight', '<f8')])

In [17]:
data['age']

array([25, 45, 37, 19], dtype=int32)

In [20]:
data[data['age']>30]

array([('Bob', 45, 85.5), ('Cathy', 37, 68. )],
      dtype=[('name', '<U10'), ('age', '<i4'), ('weight', '<f8')])

In [21]:
data[data['age']>30]['name']

array(['Bob', 'Cathy'], dtype='<U10')
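
Masks on different fields can be combined, and records can be sorted by a field. The following sketch uses the same four records (rebuilt here as people, an illustrative name) and only standard NumPy calls.

```python
import numpy as np

people = np.array([('Alice', 25, 55.0), ('Bob', 45, 85.5),
                   ('Cathy', 37, 68.0), ('Doug', 19, 61.5)],
                  dtype=[('name', 'U10'), ('age', 'i4'), ('weight', 'f8')])

# Combine conditions on two fields; the parentheses are required around each comparison
mask = (people['age'] > 20) & (people['weight'] < 80)
print(people[mask]['name'])      # ['Alice' 'Cathy']

# Aggregate one field over the rows selected by another
mean_weight_over_30 = people['weight'][people['age'] > 30].mean()
print(mean_weight_over_30)       # 76.75

# Sort whole records by the 'age' field
by_age = np.sort(people, order='age')
print(by_age['name'])            # ['Doug' 'Alice' 'Cathy' 'Bob']
```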

- Structured arrays are a bridge between basic NumPy arrays and more advanced structures like DataFrames.
- They allow efficient memory usage and access, making them useful for specific tasks in data science.
- For more complex scenarios, consider using Pandas DataFrames, which offer greater flexibility and advanced data manipulation tools.

# 2. Creating Structured arrays - other ways

1. Use the dictionary method with names and formats keys.
    - Types can be specified as Python types or NumPy dtypes for clarity.
2. Use a list of tuples for a cleaner approach.
3. For unnamed fields, define the types as a comma-separated string.

In [25]:
# Dict method
data = np.zeros(4, dtype= {'names':('name', 'age', 'weight'), 'formats':('U10', 'i4', 'f8')})
data  # Fill data accordingly

array([('', 0, 0.), ('', 0, 0.), ('', 0, 0.), ('', 0, 0.)],
      dtype=[('name', '<U10'), ('age', '<i4'), ('weight', '<f8')])

In [28]:
# Python types and NumPy dtypes as formats (plain int maps to the platform default integer, '<i8' here)
np.dtype({'names':('name', 'age', 'weight'), 'formats': ((np.str_, 10), int, np.float32)})

dtype([('name', '<U10'), ('age', '<i8'), ('weight', '<f4')])

In [29]:
# names and formats as list of tuples 
np.dtype([('name', 'S10'), ('age', 'i4'), ('weight', 'f8')])

dtype([('name', 'S10'), ('age', '<i4'), ('weight', '<f8')])

In [30]:
# Unnamed fields get the default names f0, f1, f2, ...
np.dtype('S10,i4,f8')

dtype([('f0', 'S10'), ('f1', '<i4'), ('f2', '<f8')])
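
A quick way to see how these specifications relate is to build the same layout in each style and compare the results. This is a small sketch; the variable names are illustrative.

```python
import numpy as np

from_dict = np.dtype({'names': ('name', 'age', 'weight'),
                      'formats': ('U10', 'i4', 'f8')})
from_list = np.dtype([('name', 'U10'), ('age', 'i4'), ('weight', 'f8')])
from_string = np.dtype('U10,i4,f8')

# The dictionary and list-of-tuples forms describe the same dtype
print(from_dict == from_list)   # True

# The string form has the same field types but auto-generated names
print(from_string.names)        # ('f0', 'f1', 'f2')
print(from_string.itemsize == from_list.itemsize)  # True
```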

## (a) Data Types in Structured Arrays

Character Codes for Data Types:
- 'b': Signed byte (np.int8)
- 'i': Signed Integer (e.g., np.int32)
- 'u': Unsigned Integer (e.g., np.uint8)
- 'f': Floating Point (e.g., np.float64)
- 'c': Complex Floating Point (e.g., np.complex128)
- 'S': Byte string (fixed length, e.g., 'S10')
- 'U': Unicode String
- 'V': Raw Data (void)
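
The codes above can be checked interactively. The sketch below builds a dtype from a few example codes (the sizes chosen are arbitrary) and records each one's kind letter and size in bytes.

```python
import numpy as np

# Example codes; the numeric suffix is the size in bytes
# (for 'U' it is the number of characters)
codes = ('b', 'i4', 'u1', 'f8', 'c16', 'S10', 'U10', 'V8')

summary = {}
for code in codes:
    dt = np.dtype(code)
    summary[code] = (dt.kind, dt.itemsize)
    print(code, dt, dt.kind, dt.itemsize)

# 'b' is a 1-byte signed integer, so its kind is 'i'
# 'U10' takes 40 bytes because each character uses 4 bytes
```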

# 3. Advanced Compound Types

- Nested Data Structures: Structured arrays can include nested arrays or matrices, useful for complex data types.
- Useful for interfacing Python with legacy C/Fortran code, where direct mapping to C structures is needed.

In [32]:
tp = np.dtype([('id', 'i8'), ('mat', 'f8', (3, 3))])
X = np.zeros(1, dtype=tp)

X


array([(0, [[0., 0., 0.], [0., 0., 0.], [0., 0., 0.]])],
      dtype=[('id', '<i8'), ('mat', '<f8', (3, 3))])
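
The nested field behaves like an ordinary array with extra trailing dimensions. The following sketch uses the same dtype with two records instead of one, and fills the first matrix with an identity matrix purely for illustration.

```python
import numpy as np

tp = np.dtype([('id', 'i8'), ('mat', 'f8', (3, 3))])
X = np.zeros(2, dtype=tp)

X['id'] = [1, 2]
X['mat'][0] = np.eye(3)   # assign a full 3x3 matrix to the first record

print(X['mat'].shape)     # (2, 3, 3): one 3x3 matrix per record
print(X[0]['mat'])
print(tp.itemsize)        # 8 bytes for id + 9 * 8 bytes for mat = 80
```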

# 4. Record Arrays (np.recarray)
- Similar to structured arrays but **fields can be accessed as attributes**, providing more convenient syntax.

In [33]:
data_rec = data.view(np.recarray)
data_rec.age


array([0, 0, 0, 0], dtype=int32)

- Accessing fields in record arrays involves more overhead compared to structured arrays.
    - Direct access: data['age']
    - Record array access: data_rec['age']
    - Attribute access: data_rec.age (most convenient but slowest)
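
One rough way to observe this overhead is to time the three access styles with timeit. This is only a sketch: the absolute numbers depend on the machine and NumPy version, so no timings are shown here, and the repetition count is an arbitrary choice.

```python
import timeit
import numpy as np

data = np.zeros(4, dtype=[('name', 'U10'), ('age', 'i4'), ('weight', 'f8')])
data_rec = data.view(np.recarray)

n_runs = 20_000  # arbitrary repetition count for the comparison

timings = {
    "data['age']": timeit.timeit(lambda: data['age'], number=n_runs),
    "data_rec['age']": timeit.timeit(lambda: data_rec['age'], number=n_runs),
    "data_rec.age": timeit.timeit(lambda: data_rec.age, number=n_runs),
}

for label, seconds in timings.items():
    print(f"{label:16s} {seconds / n_runs * 1e6:.3f} microseconds per access")
```

All three expressions return the same values; only the lookup path differs.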


# 5. When to Use Structured Arrays vs. Pandas
- Structured Arrays: Best suited for cases where data needs to be directly mapped to C or Fortran binary formats.

- Pandas DataFrames: More powerful and flexible for daily data manipulation and analysis tasks, with extensive functionality beyond structured arrays.
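
To illustrate the binary-format use case for structured arrays, the sketch below writes records to raw bytes and reads them back with the same dtype. The in-memory bytes object stands in for a binary file or a buffer shared with C code, which is an assumption made for the sake of a self-contained example.

```python
import numpy as np

person_dtype = np.dtype([('name', 'U10'), ('age', 'i4'), ('weight', 'f8')])
people = np.array([('Alice', 25, 55.0), ('Bob', 45, 85.5)], dtype=person_dtype)

# Serialize the records to a flat byte string (fixed-size records, back to back)
raw = people.tobytes()
print(len(raw))  # 2 records * 52 bytes = 104

# Reinterpret the bytes with the same dtype to recover the records
restored = np.frombuffer(raw, dtype=person_dtype)
print(restored)
```

Because every record has the same fixed size, the nth record starts at byte n * person_dtype.itemsize, which is what makes this layout easy to share with code that defines a matching struct.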