# Decoding DCR Data

## Background

## Concepts

### Numpy

Reshaping: https://docs.scipy.org/doc/numpy/reference/generated/numpy.reshape.html

Slicing: https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.indexing.html

## Decoding DCR Data

### Shape of `DCR.DATA.DATA`

We begin our journey in the `DATA` table of the DCR FITS file, specifically in its `DATA` column.

The shape of this data is determined by the physical attributes of the system when the data was taken. So, `DATA` is of a variable shape: `(numIntegrations, numPorts, numPhases)`.

Let's try an example:

In [18]:
import numpy

numIntegrations = 3  # The number of integrations that data was taken for
numPorts = 4  # The total number of ports that took data
numPhases = 4  # The total number of phase states that data was taken during
shape = (numIntegrations, numPorts, numPhases)

# Okay, so we know our shape. Now let's make some data!
# How many entries will we have? Well, that is determined by the product of the shape:
numElements = numpy.prod(shape)
print("An array with shape {} will have {} elements".format(shape, numElements))

data = numpy.arange(numElements)
print("We now use numpy's reshape to transform this:")
print(data)

print("Into this:")
data = data.reshape(shape)
print(data)


An array with shape (3, 4, 4) will have 48 elements
We now use numpy's reshape to transform this:
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47]
Into this:
[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]
  [12 13 14 15]]

 [[16 17 18 19]
  [20 21 22 23]
  [24 25 26 27]
  [28 29 30 31]]

 [[32 33 34 35]
  [36 37 38 39]
  [40 41 42 43]
  [44 45 46 47]]]
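
To see how `reshape` lays out the flat values, it can help to relate a flat position to its `(integration, port, phase)` index. The following is a minimal sketch, assuming NumPy's default row-major (C) order, which `reshape` uses unless told otherwise. The variable names are illustrative only.

```python
import numpy

shape = (3, 4, 4)  # (numIntegrations, numPorts, numPhases), as in the example above
data = numpy.arange(numpy.prod(shape)).reshape(shape)

# In row-major order the last axis (phase) varies fastest, so the flat
# position of (integration, port, phase) is:
#   integration * numPorts * numPhases + port * numPhases + phase
integration, port, phase = 1, 2, 3
flat_index = integration * shape[1] * shape[2] + port * shape[2] + phase

# NumPy provides the same conversion in both directions.
assert flat_index == numpy.ravel_multi_index((integration, port, phase), shape)
assert numpy.unravel_index(flat_index, shape) == (integration, port, phase)

# Because the data came from arange, each value equals its flat position.
print(data[integration, port, phase])  # 27
```

This only holds because the example data is `numpy.arange`; real `DATA` values are measurements, but the layout rule is the same.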


### The nature of "phase"

Great! So we have successfully created an array that is the same shape as the `DATA` column. However, there is a little more to the story.

We said before that the shape of the `DATA` column is `(numIntegrations, numPorts, numPhases)`. But what is the nature of "phase"? Well, phase, as used here, is actually a _state_, and this state is really a representation of two other states: that of the calibration diode and that of the signal/reference beam selection. This is best explained with a table:

| `SIGREF` | `CAL` |      Phase key       | Phase index |
|----------|-------|----------------------|-------------|
|        0 |     0 | `Signal / No Cal`    |           0 |
|        0 |     1 | `Signal / Cal`       |           1 |
|        1 |     0 | `Reference / No Cal` |           2 |
|        1 |     1 | `Reference / Cal`    |           3 |

Note that when `SIGREF` is `0`, the signal beam is selected; otherwise, the reference beam is selected. When `CAL` is `0`, the calibration diode is off; otherwise, it is on. So, how does that apply? Well, since `DATA`'s shape is based on phase, we must _know_ the phase in order to do queries on it. This is best demonstrated via example.
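
The mapping in the table can also be written as simple arithmetic. The sketch below assumes exactly the four phases listed above, in that order; `phase_index` is a hypothetical helper introduced here for illustration, not part of any DCR library.

```python
def phase_index(sigref, cal):
    """Return the phase index for a SIGREF/CAL pair, following the table above.

    Assumes four phases ordered as in the table (hypothetical helper).
    """
    return 2 * sigref + cal


# Reproduce the table rows: SIGREF, CAL, phase index
for sigref in (0, 1):
    for cal in (0, 1):
        print(sigref, cal, phase_index(sigref, cal))
```

If a file has fewer phases (for example, no reference beam switching), this arithmetic would not apply as written.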

### Slicing DATA using NumPy

Let's try a few "queries" on our `DATA` column to demonstrate how slicing works.

What data was taken during the first integration?

In [None]:
print(data[0])

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]

In the first integration, what data went through the first port?

In [1]:
print(data[0][0])

[0 1 2 3]

In the first port of the first integration, what data was taken while the cal diode was **on** and the **signal** beam was selected? Well, we need to figure out the phase based on SIGREF and CAL. Using the table above we can determine that `phase = 1`. 

In [22]:
print(data[0][0][1])

1
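
As a side note, NumPy also accepts all of the indices inside a single pair of brackets. The sketch below rebuilds the same example array and checks that both spellings select the same element.

```python
import numpy

# Same example array as above: (numIntegrations, numPorts, numPhases)
data = numpy.arange(48).reshape((3, 4, 4))

chained = data[0][0][1]  # index one axis at a time
direct = data[0, 0, 1]   # index all three axes at once

assert chained == direct
print(direct)  # 1
```

The single-bracket form is the one that extends naturally to slices such as `data[:, 0]`.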


Cool! We can now select the power value associated with a given integration, port, and phase. But what if we want _all_ data taken over the first port? We could iterate through `data` and select it that way:

In [23]:
first_port_data = []
for integration in data:
    first_port_data.append(integration[0])
print(first_port_data)

[array([0, 1, 2, 3]), array([16, 17, 18, 19]), array([32, 33, 34, 35])]


However, numpy provides a much better solution:

In [27]:
first_port_data = data[:,0]
print(first_port_data)

[[ 0  1  2  3]
 [16 17 18 19]
 [32 33 34 35]]

What's going on here? Well, this is more fully explained here: https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.indexing.html. But it's really not that complicated.

We know that `data[:]` selects the whole array:

In [30]:
assert numpy.all(data == data[:])



So, what does `data[:,0]` do? It says, "Give me the first element of every element in data".

We can model `data` as a list of lists of lists. So, put another way, `data[:,0]` means "For every list of lists within `data`, give me the first list". But honestly, it's best demonstrated by running the code.
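
One way to convince yourself of this is to compare the slice against the explicit loop from earlier. This is a small sketch that rebuilds the example array so it can be run on its own.

```python
import numpy

# Same example array as above: (numIntegrations, numPorts, numPhases)
data = numpy.arange(48).reshape((3, 4, 4))

# Explicit version: take the first port from every integration
looped = numpy.array([integration[0] for integration in data])

# Sliced version: ":" keeps every integration, "0" picks the first port
sliced = data[:, 0]

assert numpy.array_equal(looped, sliced)
print(sliced.shape)  # (3, 4): one row of phases per integration
```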

Let's try a few more.

Give me all of the data that was taken over the first port while the calibration diode was on and the signal beam was selected. Again, this is `phase = 1`. So...

In [32]:
data[:,0,1]

array([ 1, 17, 33])

Alright, but what if we want all of the data taken over the first port while the calibration diode was on, regardless of the signal/reference beam selection? That's actually somewhat harder, because we can no longer query by a single phase value. By referencing the table again we can see that we want phase values `1` **and** `3`. Again, though, numpy makes this easy:

In [34]:
data[:,0,[1,3]]

array([[ 1,  3],
       [17, 19],
       [33, 35]])
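
An alternative way to think about this query is to split the phase axis back into its two underlying states. The sketch below assumes the four phases are ordered exactly as in the table above (`SIGREF` varying slowest, `CAL` fastest); `by_state` and `cal_on` are illustrative names.

```python
import numpy

# Same example array as above: (numIntegrations, numPorts, numPhases)
data = numpy.arange(48).reshape((3, 4, 4))

# Split the phase axis (length 4) into (SIGREF, CAL), each of length 2.
# This relies on phase index == 2 * SIGREF + CAL, as in the table.
by_state = data.reshape((3, 4, 2, 2))

# First port, both beam selections, calibration diode on (CAL == 1)
cal_on = by_state[:, 0, :, 1]

# Same selection as listing the phase indices explicitly
assert numpy.array_equal(cal_on, data[:, 0, [1, 3]])
print(cal_on)
```

With the extra axes, "cal on, either beam" becomes a plain slice instead of a list of phase indices.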