1\. **Pandas DataFrame**

This exercise consists of analyzing a dataset containing timing information from a series of Time-to-Digital Converters (TDCs) implemented in a pair of FPGAs. Each measurement (i.e. each row of the input file) consists of a flag that specifies the type of message ('HEAD', which in this case is always 1), two addresses of the TDC providing the signal ('FPGA' and 'TDC_CHANNEL'), and the timing information ('ORBIT_CNT', 'BX_COUNTER', and 'TDC_MEAS'). Each TDC count corresponds to 25/30 ns, whereas a unit of BX_COUNTER corresponds to 25 ns, and the ORBIT_CNT is increased every `x` BX_COUNTER units. This allows the time to be stored in a way similar to hours, minutes and seconds.
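
As a rough illustration of the "hours, minutes and seconds" analogy, the three counters can be combined into a single time in ns. The helper `hit_time_ns` and its `bx_per_orbit` argument are hypothetical names, and the orbit length of 100 used in the call is only a placeholder, since `x` is estimated later in the exercise.

```python
# Conversion factors taken from the exercise text.
BX_NS = 25.0          # one BX_COUNTER unit in ns
TDC_NS = 25.0 / 30.0  # one TDC_MEAS count in ns


def hit_time_ns(orbit_cnt, bx_counter, tdc_meas, bx_per_orbit):
    """Return the time of one hit in ns (hypothetical helper).

    bx_per_orbit is the value `x`: the number of BX units in one orbit.
    """
    return (orbit_cnt * bx_per_orbit + bx_counter) * BX_NS + tdc_meas * TDC_NS


# Placeholder orbit length of 100 BX, chosen only to keep the arithmetic readable.
print(hit_time_ns(orbit_cnt=2, bx_counter=3, tdc_meas=6, bx_per_orbit=100))
```

Here the orbit counter plays the role of hours, the BX counter of minutes and the TDC count of seconds, each scaled by its own unit.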

In [1]:
# If you haven't downloaded it yet, please get the data file with wget
#!wget https://www.dropbox.com/s/xvjzaxzz3ysphme/data_000637.txt -P ./data/

1\. Create a Pandas DataFrame reading N rows of the `data/data_000637.txt` dataset. Choose N to be smaller than or equal to the maximum number of rows and larger than 10k (check the documentation).

In [2]:
import pandas as pd
file_name = "./data/data_000637.txt"
N = 15000
data = pd.read_csv(file_name, nrows=N)
data

Unnamed: 0,HEAD,FPGA,TDC_CHANNEL,ORBIT_CNT,BX_COUNTER,TDC_MEAS
0,1,0,123,3869200167,2374,26
1,1,0,124,3869200167,2374,27
2,1,0,63,3869200167,2553,28
3,1,0,64,3869200167,2558,19
4,1,0,64,3869200167,2760,25
...,...,...,...,...,...,...
14995,1,1,4,3869200316,3399,9
14996,1,1,17,3869200316,3400,15
14997,1,1,10,3869200316,3530,16
14998,1,1,8,3869200316,3533,18


2\. Estimate the number of BX in an ORBIT (the value `x`).

*Hint*: check when the BX counter reaches the maximum value before being reset to 0.

In [3]:
# The counter runs from 0 up to its maximum and is then reset to 0,
# so one orbit spans (maximum + 1) BX units.
bx_max = data['BX_COUNTER'].max()
x = bx_max + 1  # estimated number of BX in an orbit
print(bx_max)

3563
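
The hint can also be followed literally by locating the resets. Below is a minimal sketch on a made-up counter sequence (the values are not from the dataset), assuming the rows are in acquisition order. A negative step between consecutive rows marks a reset, and the value just before it is only a lower bound on the counter maximum, because the last hit of an orbit need not fall on the very last BX.

```python
import pandas as pd

# Made-up counter values: the counter climbs, resets, climbs and resets again.
bx = pd.Series([5, 9, 2, 7, 9, 0, 4])

# A negative difference between consecutive rows marks a reset.
is_reset = bx.diff() < 0

# Values seen immediately before each reset.
before_reset = bx.shift(1)[is_reset]

# Largest value seen before a reset, plus one to account for the zero count.
x_estimate = int(before_reset.max()) + 1
print(x_estimate)
```

With enough orbits, the largest value seen before a reset should converge to the true counter maximum.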


3\. Create a new column with the absolute time in ns (as a combination of the other three columns with timing information) since the beginning of the data acquisition, and convert the new column to a Time Series.

*Hint:* introduce an offset to the absolute time such that the start of the data acquisition (i.e. the first entry) is zero.

In [4]:
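tdc_count_to_ns = 25 / 30   # ns per TDC_MEAS count
bx_counter_to_ns = 25       # ns per BX_COUNTER unit
# NOTE: the counter runs from 0 to max(), so a full orbit spans max() + 1 units;
# using max() here shortens every orbit by one BX (25 ns).
orbit_cnt_factor = data['BX_COUNTER'].max()

data['absolute_time_ns'] = (
    data['ORBIT_CNT'] * orbit_cnt_factor * bx_counter_to_ns
    + data['BX_COUNTER'] * bx_counter_to_ns
    + data['TDC_MEAS'] * tdc_count_to_ns
)

# Offset so that the earliest hit is at t = 0
data['absolute_time_ns'] = data['absolute_time_ns'] - data['absolute_time_ns'].min()
data2 = data['absolute_time_ns']  # keep the numeric column for the next cell
print(data.head())

# NOTE: format="%f" reads the leading digits as a fraction of a second
# (see the output below), not as a number of ns.
data['absolute_time_ns'] = pd.to_datetime(data['absolute_time_ns'], format="%f")

print(data.head())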

   HEAD  FPGA  TDC_CHANNEL   ORBIT_CNT  BX_COUNTER  TDC_MEAS  absolute_time_ns
0     1     0          123  3869200167        2374        26            0.0000
1     1     0          124  3869200167        2374        27            0.8125
2     1     0           63  3869200167        2553        28         4476.6250
3     1     0           64  3869200167        2558        19         4594.1250
4     1     0           64  3869200167        2760        25         9649.1250
   HEAD  FPGA  TDC_CHANNEL   ORBIT_CNT  BX_COUNTER  TDC_MEAS  \
0     1     0          123  3869200167        2374        26   
1     1     0          124  3869200167        2374        27   
2     1     0           63  3869200167        2553        28   
3     1     0           64  3869200167        2558        19   
4     1     0           64  3869200167        2760        25   

            absolute_time_ns  
0 1900-01-01 00:00:00.000000  
1 1900-01-01 00:00:00.000000  
2 1900-01-01 00:00:00.447600  
3 1900-01-01 00:0
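
The first printout shows 0.8125 ns for a difference of one TDC count, whereas 25/30 ≈ 0.833 ns is expected. This is consistent with rounding when a sub-ns term is added to a float of order 10^14. The sketch below may avoid this by taking differences from the first entry before scaling. It rests on two assumptions not confirmed by the source: an orbit spans 3564 BX units (the counter maximum of 3563 found above, plus one), and a timedelta column is an acceptable "Time Series" representation. The column names `abs_time_ns` and `abs_time` are hypothetical.

```python
import pandas as pd

# The first five rows of the dataset, copied from the output above.
df = pd.DataFrame({
    "ORBIT_CNT": [3869200167] * 5,
    "BX_COUNTER": [2374, 2374, 2553, 2558, 2760],
    "TDC_MEAS": [26, 27, 28, 19, 25],
})

BX_PER_ORBIT = 3564  # assumption: counter maximum (3563) plus one
BX_NS = 25           # ns per BX_COUNTER unit
TDC_NS = 25 / 30     # ns per TDC_MEAS count

# Subtract the first entry counter by counter, so the numbers stay small
# and the sub-ns TDC contribution is not lost to float rounding.
first = df.iloc[0]
d_orbit = df["ORBIT_CNT"] - first["ORBIT_CNT"]
d_bx = df["BX_COUNTER"] - first["BX_COUNTER"]
d_tdc = df["TDC_MEAS"] - first["TDC_MEAS"]

df["abs_time_ns"] = (d_orbit * BX_PER_ORBIT + d_bx) * BX_NS + d_tdc * TDC_NS

# Timedelta resolution is 1 ns, so round to whole ns before converting.
df["abs_time"] = pd.to_timedelta(df["abs_time_ns"].round().astype("int64"), unit="ns")
print(df[["abs_time_ns", "abs_time"]])
```

Keeping the float column alongside the timedelta column preserves the sub-ns part for later numerical work.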

4\. Find out the duration of the data taking in hours, minutes and seconds, by using the features of the Time Series. Perform this check reading the whole dataset.

In [5]:
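total_ns = data2.max()
print(total_ns)
# Split the elapsed time into hours, minutes and seconds instead of hard-coding zeros
minutes, seconds = divmod(total_ns / 10**9, 60)
hours, minutes = divmod(int(minutes), 60)
print(f"duration of data is {hours} hours {minutes} minutes and {seconds} seconds")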

13301278.3125
duration of data is 0 hours 0 minutes and 0.0133012783125 seconds
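
The question asks for Time Series features and for the whole dataset. Below is a sketch of the first part, using the elapsed time printed above, truncated to whole ns and valid only for the N = 15000 rows read here. For the full file, the same steps would apply to a DataFrame read without `nrows`.

```python
import pandas as pd

# Elapsed time from the output above, truncated to whole nanoseconds.
elapsed = pd.Timedelta(13301278, unit="ns")

# .components splits a Timedelta into days, hours, minutes, seconds,
# milliseconds, microseconds and nanoseconds.
c = elapsed.components
hours = c.days * 24 + c.hours
seconds = c.seconds + c.milliseconds / 1e3 + c.microseconds / 1e6 + c.nanoseconds / 1e9
print(f"{hours} hours, {c.minutes} minutes, {seconds:.9f} seconds")
```

For a timedelta Series, the elapsed time would be the difference between its `.max()` and `.min()`, which is itself a `Timedelta`.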


5\. Use the `.groupby()` method to find out the noisy channels, i.e. the TDC channels with the most counts (print to screen the top 3 and the corresponding counts).

In [6]:
# Count the hits recorded by each TDC channel, then sort from noisiest to quietest
channel_counts = data.groupby("TDC_CHANNEL")["BX_COUNTER"].count()
channel_counts = channel_counts.sort_values(ascending=False)
print(channel_counts.head(3))

TDC_CHANNEL
139    1268
64      752
63      749
Name: BX_COUNTER, dtype: int64


6\. Count the number of non-empty orbits (i.e. the number of orbits with at least one hit).

In [7]:
# Every row is a hit, so any orbit number present in the data is non-empty
non_empty_orbits_count = data['ORBIT_CNT'].nunique()
print(f"Number of non-empty orbits: {non_empty_orbits_count}")

Number of non-empty orbits: 150


7\. Count the number of unique orbits with at least one measurement from TDC_CHANNEL=139.

In [8]:
unique_orbits_channel139 = data[data['TDC_CHANNEL'] == 139]['ORBIT_CNT'].nunique()
print(f"number of unique orbits with at least one measurement from channel 139: {unique_orbits_channel139}")

number of unique orbits with at least one measurement from channel 139: 150
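
The same count can be obtained for every channel in one pass by grouping on the channel and counting distinct orbits. This is a minimal sketch on toy data; the frame `hits` and its values are made up for illustration.

```python
import pandas as pd

# Toy hits: channel 139 fires in orbits 1 and 3 (twice in orbit 3).
hits = pd.DataFrame({
    "ORBIT_CNT": [1, 1, 2, 3, 3, 3],
    "TDC_CHANNEL": [139, 5, 5, 139, 139, 7],
})

# Number of distinct orbits in which each channel has at least one hit.
orbits_per_channel = hits.groupby("TDC_CHANNEL")["ORBIT_CNT"].nunique()
print(orbits_per_channel.loc[139])
```

Repeated hits from a channel within the same orbit are counted once, which is what "unique orbits" requires.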


8\. Create two Series (one for each FPGA) that have the TDC channel as index, and the number of counts for the corresponding TDC channel as values.

In [9]:
# value_counts() already returns a Series indexed by TDC channel
fpga_0_series = data[data['FPGA'] == 0]['TDC_CHANNEL'].value_counts()
fpga_1_series = data[data['FPGA'] == 1]['TDC_CHANNEL'].value_counts()

print("FPGA 0 Series:")
print(fpga_0_series.head())

print("\nFPGA 1 Series:")
print(fpga_1_series.head())

FPGA 0 Series:
139    879
63     743
64     735
61     555
62     529
Name: TDC_CHANNEL, dtype: int64

FPGA 1 Series:
139    389
2      363
1      338
4      290
3      277
Name: TDC_CHANNEL, dtype: int64
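
An alternative sketch groups on both columns at once and then selects each FPGA from the resulting MultiIndex. The toy frame `hits` is made up for illustration. Unlike `value_counts()`, the result is sorted by channel number rather than by count.

```python
import pandas as pd

# Toy hits from two FPGAs.
hits = pd.DataFrame({
    "FPGA": [0, 0, 0, 1, 1, 1, 1],
    "TDC_CHANNEL": [139, 63, 139, 2, 139, 2, 2],
})

# Counts indexed by (FPGA, TDC_CHANNEL).
counts = hits.groupby(["FPGA", "TDC_CHANNEL"]).size()

# Selecting the first index level leaves a Series indexed by TDC channel.
fpga_0 = counts.loc[0]
fpga_1 = counts.loc[1]
print(fpga_0)
print(fpga_1)
```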


9\. **Optional:** Create two histograms (one for each FPGA) that show the number of counts for each TDC channel.
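
One possible sketch uses matplotlib bar charts, which is an assumption since the exercise does not name a plotting library. To stay self-contained, it uses only the top-five counts per FPGA printed in the previous step. In the notebook, the full `fpga_0_series` and `fpga_1_series` would take their place.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Top-five counts per FPGA, copied from the output of the previous step.
fpga_counts = {
    0: pd.Series({139: 879, 63: 743, 64: 735, 61: 555, 62: 529}),
    1: pd.Series({139: 389, 2: 363, 1: 338, 4: 290, 3: 277}),
}

fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
for ax, (fpga, counts) in zip(axes, fpga_counts.items()):
    # One bar per TDC channel; the bar height is the number of counts.
    ax.bar(counts.index, counts.values)
    ax.set_title(f"FPGA {fpga}")
    ax.set_xlabel("TDC_CHANNEL")
axes[0].set_ylabel("counts")
fig.tight_layout()
```

Because the counts are already aggregated per channel, a bar chart stands in for a histogram of the raw `TDC_CHANNEL` column.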