1\. **Pandas DataFrame**

This exercise analyzes a dataset containing timing information from a series of Time-to-Digital Converters (TDCs) implemented in a pair of FPGAs. Each measurement (i.e. each row of the input file) consists of a flag that specifies the type of message ('HEAD', which in this case is always 1), two addresses of the TDC providing the signal ('FPGA' and 'TDC_CHANNEL'), and the timing information ('ORBIT_CNT', 'BX_COUNTER', and 'TDC_MEAS'). Each TDC count corresponds to 25/30 ns, a unit of BX_COUNTER corresponds to 25 ns, and ORBIT_CNT is increased every `x` units of BX_COUNTER. This allows the time to be stored in a way similar to hours, minutes and seconds.
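The clock analogy can be made concrete with a small helper. This is only a sketch: the function name `absolute_time_ns` and the parameter `bx_per_orbit` (standing for the unknown `x`) are illustrative names, and the value 3564 in the example is just the estimate obtained later in this notebook.

```python
TDC_NS = 25 / 30   # duration of one TDC count in ns
BX_NS = 25         # duration of one BX_COUNTER unit in ns


def absolute_time_ns(orbit_cnt, bx_counter, tdc_meas, bx_per_orbit):
    """Combine the three counters into a single time in ns.

    bx_per_orbit is the unknown `x` from the text (hypothetical parameter name).
    """
    return (orbit_cnt * bx_per_orbit + bx_counter) * BX_NS + tdc_meas * TDC_NS


# Illustrative values: 2 orbits, 10 BX units, 6 TDC counts, assuming x = 3564
example_ns = absolute_time_ns(2, 10, 6, bx_per_orbit=3564)
print(example_ns)
```

ORBIT_CNT plays the role of the hours, BX_COUNTER the minutes, and TDC_MEAS the seconds: each counter is scaled to ns before the three are summed.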

In [1]:
# If you haven't downloaded it yet, get the data file with wget
# ! wget https://www.dropbox.com/s/xvjzaxzz3ysphme/data_000637.txt -P ./data/
# ! pip3 install pandas
import pandas as pd

1\. Create a Pandas DataFrame reading N rows of the `data/data_000637.txt` dataset. Choose N to be smaller than or equal to the maximum number of rows and larger than 10k (check the documentation).

In [2]:
# Part 1.
# Count the data rows (the first line of the file is the header)
with open('data/data_000637.txt') as f:
    max_N = sum(1 for _ in f) - 1
N = 12000
if 10000 < N <= max_N:
    data = pd.read_csv('data/data_000637.txt', nrows=N)
    print(data)
else:
    print("Choose a suitable N value")

       HEAD  FPGA  TDC_CHANNEL   ORBIT_CNT  BX_COUNTER  TDC_MEAS
0         1     0          123  3869200167        2374        26
1         1     0          124  3869200167        2374        27
2         1     0           63  3869200167        2553        28
3         1     0           64  3869200167        2558        19
4         1     0           64  3869200167        2760        25
...     ...   ...          ...         ...         ...       ...
11995     1     0          139  3869200286        3251         0
11996     1     0           62  3869200286        3246         4
11997     1     0           58  3869200286        3246        11
11998     1     0           61  3869200286        3251        17
11999     1     0           59  3869200286        3248        16

[12000 rows x 6 columns]
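The row-count check and the `nrows` argument of `pd.read_csv` can be tried out without the real file. The sketch below uses a tiny in-memory CSV with made-up values; `csv_text`, `requested` and `sample` are illustrative names only.

```python
import io

import pandas as pd

# Tiny in-memory stand-in for data/data_000637.txt (values are made up)
csv_text = "HEAD,FPGA,TDC_CHANNEL,ORBIT_CNT,BX_COUNTER,TDC_MEAS\n" + "".join(
    f"1,0,{i},100,{i},0\n" for i in range(50)
)

# Count data rows without building a DataFrame; the header line is excluded
n_rows = sum(1 for _ in io.StringIO(csv_text)) - 1

requested = 20
n = min(requested, n_rows)  # never ask for more rows than exist
sample = pd.read_csv(io.StringIO(csv_text), nrows=n)
print(len(sample))
```

`nrows` counts data rows only, so the header line has to be subtracted when comparing it with the number of lines in the file.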


2\. Estimate the number of BX in an ORBIT (the value `x`).

*Hint*: check when the BX counter reaches the maximum value before being reset to 0.

In [3]:
# Part 2.
# Select the BX_COUNTER column
data_bx = data['BX_COUNTER']

# The counter resets to 0 after its maximum value, so x = max + 1
estimated_bx = data_bx.max() + 1
print("Estimated number of BX in a ORBIT:", estimated_bx)

Estimated number of BX in a ORBIT: 3564


3\. Create a new column with the absolute time in ns (as a combination of the other three columns with timing information) since the beginning of the data acquisition, and convert the new column to a Time Series.

In [4]:
# Part 3.
# Given conversion factors (both in ns)
tdc_count = 25 / 30
unit_of_bx_counter = 25
# Calculating the absolute time in ns
data["ABS_TIME_NS"] = data['TDC_MEAS'] * (tdc_count) + data['BX_COUNTER'] * unit_of_bx_counter + data['ORBIT_CNT'] * estimated_bx * unit_of_bx_counter
print(data)

       HEAD  FPGA  TDC_CHANNEL   ORBIT_CNT  BX_COUNTER  TDC_MEAS   ABS_TIME_NS
0         1     0          123  3869200167        2374        26  3.447457e+14
1         1     0          124  3869200167        2374        27  3.447457e+14
2         1     0           63  3869200167        2553        28  3.447457e+14
3         1     0           64  3869200167        2558        19  3.447457e+14
4         1     0           64  3869200167        2760        25  3.447457e+14
...     ...   ...          ...         ...         ...       ...           ...
11995     1     0          139  3869200286        3251         0  3.447457e+14
11996     1     0           62  3869200286        3246         4  3.447457e+14
11997     1     0           58  3869200286        3246        11  3.447457e+14
11998     1     0           61  3869200286        3251        17  3.447457e+14
11999     1     0           59  3869200286        3248        16  3.447457e+14

[12000 rows x 7 columns]
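The column computed above is measured from the counters' own zero rather than from the first hit, and it is still a plain float column. One possible way to refer it to the start of the acquisition and turn it into a Time Series is sketched below on made-up values; `abs_time_ns` stands in for `data["ABS_TIME_NS"]`, and rounding to whole nanoseconds is an assumption made here because pandas timedeltas have nanosecond resolution.

```python
import pandas as pd

# Made-up absolute times in ns, standing in for data["ABS_TIME_NS"]
abs_time_ns = pd.Series([344745700000000.0, 344745700000025.0, 344745700089100.0])

# Shift so that the earliest hit defines t = 0
elapsed_ns = abs_time_ns - abs_time_ns.min()

# Round to whole ns (assumption: sub-ns precision is not needed) and convert
time_series = pd.to_timedelta(elapsed_ns.round().astype("int64"), unit="ns")
print(time_series)
```

Subtracting the minimum before converting also keeps the numbers small, which avoids losing precision in the float representation.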


4\. Find out the duration of the data taking in hours, minutes and seconds, by using the features of the Time Series. Perform this check reading the whole dataset.

In [5]:
# Part 4.
import datetime as dt

# Some given definitions
tdc_count = 25 / 30
unit_of_bx_counter = 25

# Record the wall-clock time before the computation
begin_time = dt.datetime.now()
print("Begin time:", begin_time)

# Calculating the time according to the explanation at the beginning of the assignment
time =  data['TDC_MEAS'] * (tdc_count) + data['BX_COUNTER'] * unit_of_bx_counter + data['ORBIT_CNT'] * estimated_bx * unit_of_bx_counter

end_time = dt.datetime.now()
print("End time:", end_time)
# Calculating the duration and printing
print("Elapsed time:", (end_time - begin_time))

Begin time: 2022-12-02 17:55:27.909378
End time: 2022-12-02 17:55:27.910316
Elapsed time: 0:00:00.000938
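The cell above times how long the arithmetic takes to run, not how long the detector was recording. A possible way to obtain the duration of the data taking itself is to take the difference between the last and the first absolute time and let pandas split it into hours, minutes and seconds. The sketch below uses made-up values; `time_ns` stands in for the `time` Series computed on the whole dataset.

```python
import pandas as pd

# Made-up absolute times in ns; in the notebook this would be the `time` Series
time_ns = pd.Series([1.0e9, 2.5e9, 3725.0e9])

# Duration of the acquisition = last hit minus first hit
span_ns = time_ns.max() - time_ns.min()
duration = pd.Timedelta(int(round(span_ns)), unit="ns")

# Timedelta.components splits the duration into days, hours, minutes, seconds, ...
parts = duration.components
print(f"{parts.hours} h {parts.minutes} min {parts.seconds} s")
```

For the real data, the file would be read without `nrows` so that the span covers the whole dataset.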


5\. Use the `.groupby()` method to find the noisy channels, i.e. the TDC channels with the most counts (print to screen the top 3 and the corresponding counts).

In [6]:
# Part 5.
# HEAD is always 1, so its per-channel sum equals the number of hits in that channel
noisy_channels = data.groupby('TDC_CHANNEL').sum().sort_values(by='HEAD').iloc[-3:]
print(noisy_channels)

             HEAD  FPGA      ORBIT_CNT  BX_COUNTER  TDC_MEAS   ABS_TIME_NS
TDC_CHANNEL                                                               
63            600     4  2321520136839     1111114      9033  2.068474e+17
64            609    15  2356342939847     1136672      8774  2.099502e+17
139          1025   315  3965930231950     1815251         0  3.533644e+17
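Summing every column works here only because HEAD is always 1; the other sums in the table carry no meaning. An alternative that counts rows directly is sketched below on made-up hits (`hits` and `top3` are illustrative names).

```python
import pandas as pd

# Made-up hits; only the TDC_CHANNEL column matters for counting
hits = pd.DataFrame({"TDC_CHANNEL": [139, 63, 139, 64, 139, 63, 5, 63, 139, 64]})

# size() counts rows per group; nlargest(3) keeps the three busiest channels
top3 = hits.groupby("TDC_CHANNEL").size().nlargest(3)
print(top3)
```

The result is a Series indexed by channel, ordered from the noisiest channel downwards.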


6\. Count the number of non-empty orbits (i.e. the number of orbits with at least one hit).

In [7]:
# Part 6.
non_empty = data['ORBIT_CNT'].nunique()
print('Number of orbits with at least one hit:', non_empty)

Number of orbits with at least one hit: 120


7\. Count the number of unique orbits with at least one measurement from TDC_CHANNEL=139.

In [8]:
# Part 7.
df = data[data['TDC_CHANNEL'] == 139]
num_unique = df['ORBIT_CNT'].nunique()
print('Unique orbits with at least one measurement from TDC_CHANNEL=139', num_unique)

Unique orbits with at least one measurement from TDC_CHANNEL=139 120


8\. Create two Series (one for each FPGA) that have the TDC channel as index, and the number of counts for the corresponding TDC channel as values.

In [9]:
# Part 8.
# value_counts() already returns a Series indexed by TDC channel
fpga_series_1 = data.loc[data['FPGA'] == 0, 'TDC_CHANNEL'].value_counts()
fpga_series_2 = data.loc[data['FPGA'] == 1, 'TDC_CHANNEL'].value_counts()

print('First FPGA Serie:\n', fpga_series_1)
print('Second FPGA Serie:\n', fpga_series_2)

First FPGA Serie:
 139    710
63     596
64     594
61     448
62     423
      ... 
97       4
101      3
106      3
99       2
98       2
Name: TDC_CHANNEL, Length: 117, dtype: int64
Second FPGA Serie:
 139    315
2      296
1      270
4      243
3      229
      ... 
79       1
126      1
9        1
78       1
85       1
Name: TDC_CHANNEL, Length: 118, dtype: int64
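The same two Series can also be obtained from a single grouping over both addresses. The sketch below uses made-up hits; `hits`, `counts`, `fpga0_counts` and `fpga1_counts` are illustrative names.

```python
import pandas as pd

# Made-up hits from two FPGAs
hits = pd.DataFrame({
    "FPGA":        [0, 0, 0, 1, 1, 1, 1],
    "TDC_CHANNEL": [139, 63, 139, 139, 2, 2, 2],
})

# One pass over the data: counts indexed by (FPGA, TDC_CHANNEL)
counts = hits.groupby(["FPGA", "TDC_CHANNEL"]).size()

# Selecting the outer level leaves a Series indexed by TDC channel
fpga0_counts = counts.loc[0]
fpga1_counts = counts.loc[1]
print(fpga1_counts)
```

Unlike `value_counts()`, this keeps the channels sorted by channel number rather than by count.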


9\. **Optional:** Create two histograms (one for each FPGA) that show the number of counts for each TDC channel.
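One way to approach the optional plot is a bar chart of counts per channel, one panel per FPGA. The sketch below assumes matplotlib is installed and uses made-up hits in place of `data`; the figure layout and labels are arbitrary choices.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up hits standing in for the real DataFrame
hits = pd.DataFrame({
    "FPGA":        [0, 0, 0, 0, 1, 1, 1],
    "TDC_CHANNEL": [139, 139, 63, 64, 139, 2, 2],
})

# One panel per FPGA, sharing the y axis so the counts are comparable
fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
for ax, (fpga, group) in zip(axes, hits.groupby("FPGA")):
    counts = group["TDC_CHANNEL"].value_counts().sort_index()
    ax.bar(counts.index, counts.values)
    ax.set_title(f"FPGA {fpga}")
    ax.set_xlabel("TDC_CHANNEL")
axes[0].set_ylabel("counts")
fig.tight_layout()
```

In a notebook the figure is displayed when the cell finishes; in a script, `plt.show()` or `fig.savefig(...)` would be needed.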