# Splitting a waveform by channel
The [wave module](https://docs.python.org/3/library/wave.html#module-wave) returns wave data as raw bytes in which the channels are interleaved frame by frame. The number of bytes per sample depends on the sample width. Can we decode it with plain NumPy, or is it too complicated?
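
To see what those interleaved bytes look like, here is a minimal sketch that writes a tiny two-channel, 24-bit file into memory with the standard-library `wave` module and reads it back. The sample values, the frame rate, and all variable names are arbitrary choices made for illustration.

```python
import io
import wave

n_channels = 2
sample_width = 3  # bytes per sample (24-bit)

# One frame holds one sample per channel, so the channels alternate in the byte stream.
left = [1, 2, 3, 4]
right = [101, 102, 103, 104]
frame_bytes = b"".join(
    lv.to_bytes(sample_width, "little") + rv.to_bytes(sample_width, "little")
    for lv, rv in zip(left, right)
)

# Write the frames to an in-memory WAV file.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as writer:
    writer.setnchannels(n_channels)
    writer.setsampwidth(sample_width)
    writer.setframerate(8000)  # arbitrary rate for this example
    writer.writeframes(frame_bytes)

# Read them back: readframes returns plain bytes, not numbers.
buffer.seek(0)
with wave.open(buffer, "rb") as reader:
    read_params = (reader.getnchannels(), reader.getsampwidth(), reader.getnframes())
    raw = reader.readframes(reader.getnframes())

print(read_params)
print(len(raw))
```

The length of `raw` is the number of frames times the number of channels times the sample width, which is why the sample width has to be known before the bytes can be decoded.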

Create some mock data, packed as 24-bit samples. The `>` format prefix packs the values big-endian. Real WAV (RIFF) files store samples little-endian, so the byte order would need to change when reading an actual file.

In [1]:
import numpy as np
import struct

sample_width = 3
n_samples = 5
n_channels = 2

mock_data = list(range(250, 250 + n_samples * n_channels))
packed_data = b''.join([
    struct.pack(">I", sample)[1:]  # pack as 4 bytes, keep the 3 low-order bytes
    for sample in mock_data])

print(mock_data)
print(packed_data)

[250, 251, 252, 253, 254, 255, 256, 257, 258, 259]
b'\x00\x00\xfa\x00\x00\xfb\x00\x00\xfc\x00\x00\xfd\x00\x00\xfe\x00\x00\xff\x00\x01\x00\x00\x01\x01\x00\x01\x02\x00\x01\x03'


Read the data into NumPy. Since NumPy has no built-in 24-bit integer type, read it in as raw bytes for now.

In [2]:
data = np.frombuffer(packed_data, dtype=np.uint8)
data

array([  0,   0, 250,   0,   0, 251,   0,   0, 252,   0,   0, 253,   0,
         0, 254,   0,   0, 255,   0,   1,   0,   0,   1,   1,   0,   1,
         2,   0,   1,   3], dtype=uint8)

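[Reshape](https://numpy.org/doc/stable/reference/generated/numpy.reshape.html#numpy.reshape) it into a 2D array with one row of bytes per sample. Add an extra zero column to pad each sample to 32 bits. Because the data is big-endian, the pad byte goes at the start of each row, where the most significant byte belongs.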

In [3]:
samples = data.reshape((n_samples * n_channels, sample_width))
padded_samples = np.hstack((np.zeros((n_samples * n_channels, 1), dtype=np.uint8), samples))
print(samples)
print(padded_samples)

[[  0   0 250]
 [  0   0 251]
 [  0   0 252]
 [  0   0 253]
 [  0   0 254]
 [  0   0 255]
 [  0   1   0]
 [  0   1   1]
 [  0   1   2]
 [  0   1   3]]
[[  0   0   0 250]
 [  0   0   0 251]
 [  0   0   0 252]
 [  0   0   0 253]
 [  0   0   0 254]
 [  0   0   0 255]
 [  0   0   1   0]
 [  0   0   1   1]
 [  0   0   1   2]
 [  0   0   1   3]]
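
For little-endian data the same idea applies with the padding mirrored: the zero byte goes at the end of each row, and the rows are viewed as `<u4`. The following is a sketch under that assumption, using a few made-up values and hypothetical variable names rather than the mock data above.

```python
import numpy as np

sample_width = 3
values = [250, 251, 256, 259]

# Pack each value as 3 little-endian bytes (least significant byte first).
packed_le = b"".join(v.to_bytes(sample_width, "little") for v in values)

raw = np.frombuffer(packed_le, dtype=np.uint8).reshape(-1, sample_width)

# The most significant byte comes last in little-endian order, so pad on the right.
pad = np.zeros((raw.shape[0], 1), dtype=np.uint8)
padded_le = np.hstack((raw, pad))

decoded_le = padded_le.view("<u4").ravel()
print(decoded_le)
```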


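Reinterpret each row of four bytes as one big-endian unsigned 32-bit integer. Since the channels are interleaved, taking every second element extracts a single channel (here the first one).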

In [4]:
int_samples = padded_samples.view('>u4').flatten()
samples_l = int_samples[::2]
print(samples_l)

[250 252 254 256 258]
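
The mock values are all non-negative, so zero padding is enough. PCM samples wider than 8 bits in WAV files are signed, and a negative 24-bit value needs its pad byte set to `0xFF` rather than `0x00` (sign extension). A possible sketch, assuming big-endian bytes as in the cells above and using made-up values and hypothetical names:

```python
import numpy as np

sample_width = 3
signed_values = [-1, -2, 250, -8388608, 8388607]
packed_signed = b"".join(
    v.to_bytes(sample_width, "big", signed=True) for v in signed_values
)

raw_bytes = np.frombuffer(packed_signed, dtype=np.uint8).reshape(-1, sample_width)

# In big-endian order the first byte of each row is the most significant one.
# If its top bit is set, the sample is negative and the pad byte must be 0xFF.
is_negative = (raw_bytes[:, :1] & 0x80) != 0
sign_pad = np.where(is_negative, 0xFF, 0x00).astype(np.uint8)

# View the padded rows as signed big-endian 32-bit integers.
signed_samples = np.hstack((sign_pad, raw_bytes)).view(">i4").ravel()
print(signed_samples)
```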


In [5]:
assert np.all(samples_l == mock_data[0::2])
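
Slicing with a step extracts one channel at a time. An alternative sketch reshapes the interleaved samples so that each row is a frame and each column is a channel, which yields every channel at once. The names `interleaved`, `frames`, `left`, and `right` are illustrative.

```python
import numpy as np

n_channels = 2
interleaved = np.arange(250, 260)  # same values as the mock data above

# Each row is one frame; each column is one channel.
frames = interleaved.reshape(-1, n_channels)
left, right = frames[:, 0], frames[:, 1]

print(left)   # [250 252 254 256 258]
print(right)  # [251 253 255 257 259]
```

The reshape only works when the number of samples is a multiple of `n_channels`, which holds for complete frames.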

## Conclusion
The data was successfully deserialised, but the process is complicated, and the byte order was hard-coded rather than determined from the file. It is better to use [scipy.io.wavfile.read](https://docs.scipy.org/doc/scipy/reference/generated/scipy.io.wavfile.read.html), which takes care of all that.