1\. **Pandas DataFrame**

This exercise analyzes a dataset containing timing information from a series of Time-to-Digital Converters (TDCs) implemented in a pair of FPGAs. Each measurement (i.e. each row of the input file) consists of a flag that specifies the type of message ('HEAD', which in this case is always 1), two addresses identifying the TDC that provided the signal ('FPGA' and 'TDC_CHANNEL'), and the timing information ('ORBIT_CNT', 'BX_COUNTER', and 'TDC_MEAS'). Each TDC count corresponds to 25/30 ns, a unit of BX_COUNTER corresponds to 25 ns, and ORBIT_CNT is incremented every `x` BX_COUNTER units. Time is therefore stored in a way similar to hours, minutes and seconds.

In [25]:
# If you haven't downloaded it yet, please get the data file with wget
#!wget https://www.dropbox.com/s/xvjzaxzz3ysphme/data_000637.txt -P ./data/
import pandas as pd

1\. Create a Pandas DataFrame reading N rows of the `data/data_000637.txt` dataset. Choose N to be smaller than or equal to the maximum number of rows and larger than 10k (check the documentation).

In [26]:
# nrows is not set, so every row is read (N = total number of rows in the file)
data = pd.read_csv("data/data_000637.txt", sep=",")
data.head(100)

,HEAD,FPGA,TDC_CHANNEL,ORBIT_CNT,BX_COUNTER,TDC_MEAS
0,1,0,123,3869200167,2374,26
1,1,0,124,3869200167,2374,27
2,1,0,63,3869200167,2553,28
3,1,0,64,3869200167,2558,19
4,1,0,64,3869200167,2760,25
...,...,...,...,...,...,...
95,1,0,64,3869200168,1506,4
96,1,1,6,3869200168,1503,6
97,1,0,61,3869200168,1609,10
98,1,0,59,3869200168,1614,16
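
The cell above reads the whole file, which corresponds to choosing N equal to the total number of rows. To read only the first N rows, `pd.read_csv` accepts the `nrows` argument. The sketch below illustrates this on a small in-memory stand-in for the data file (the `csv_text` string and the value of `N` are assumptions made for illustration; on the real file N should be larger than 10k).

```python
import io
import pandas as pd

# Tiny in-memory stand-in for data/data_000637.txt (rows copied from the output above)
csv_text = """HEAD,FPGA,TDC_CHANNEL,ORBIT_CNT,BX_COUNTER,TDC_MEAS
1,0,123,3869200167,2374,26
1,0,124,3869200167,2374,27
1,0,63,3869200167,2553,28
1,0,64,3869200167,2558,19
1,0,64,3869200167,2760,25
"""

N = 3  # illustrative only; the exercise asks for N > 10k on the real file
sample = pd.read_csv(io.StringIO(csv_text), nrows=N)  # stop after N data rows
print(len(sample))
```

With the real dataset, the same call would take the file path in place of the `io.StringIO` object.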


2\. Estimate the number of BX in an ORBIT (the value `x`).

*Hint*: check when the BX counter reaches the maximum value before being reset to 0.

In [27]:
data['X_VALUES'] = data['ORBIT_CNT'] / data['BX_COUNTER']
max_x_value = data[data['X_VALUES'] < float('inf') ]['X_VALUES'].max()
print("An estimation of x can be: {}".format(max_x_value))


An estimation of x can be: 3869211135.0
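
Note that the ratio `ORBIT_CNT / BX_COUNTER` is largest for rows where `BX_COUNTER` is 1, so its maximum is essentially an orbit number rather than the number of BX per orbit. Following the hint instead, `x` can be estimated from the largest value that `BX_COUNTER` reaches before it wraps back to 0. The sketch below uses a small synthetic frame (`toy` is an assumption made for illustration); with the real data, `toy` would be replaced by `data`.

```python
import pandas as pd

# Synthetic stand-in: BX_COUNTER counts from 0 and wraps when ORBIT_CNT increases
toy = pd.DataFrame({
    "ORBIT_CNT":  [10, 10, 10, 11, 11],
    "BX_COUNTER": [0, 5, 7, 0, 3],
})

# The counter starts at 0, so the number of BX per orbit is the largest value + 1
x_estimate = int(toy["BX_COUNTER"].max()) + 1
print(x_estimate)
```

This is a lower bound: it equals `x` only if the sample contains at least one hit at the last BX of an orbit, which is likely for a large N but not guaranteed.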


3\. Create a new column with the absolute time in ns (as a combination of the other three columns with timing information) since the beginning of the data acquisition, and convert the new column to a Time Series.

In [28]:
TDC_TIME = 25 / 30
BX_TIME = 25
BX_per_ORBIT = max_x_value

data['TIME_NANOSECONDS'] = data['TDC_MEAS'] * TDC_TIME + (data['BX_COUNTER'] + data['ORBIT_CNT'] * BX_per_ORBIT)*BX_TIME

# set_index returns a new DataFrame; `data` itself keeps its original index
data.set_index('TIME_NANOSECONDS')

TIME_NANOSECONDS,HEAD,FPGA,TDC_CHANNEL,ORBIT_CNT,BX_COUNTER,TDC_MEAS,X_VALUES
3.742688e+20,1,0,123,3869200167,2374,26,1.629823e+06
3.742688e+20,1,0,124,3869200167,2374,27,1.629823e+06
3.742688e+20,1,0,63,3869200167,2553,28,1.515550e+06
3.742688e+20,1,0,64,3869200167,2558,19,1.512588e+06
3.742688e+20,1,0,64,3869200167,2760,25,1.401884e+06
...,...,...,...,...,...,...,...
3.742699e+20,1,0,62,3869211171,762,14,5.077705e+06
3.742699e+20,1,1,4,3869211171,763,11,5.071050e+06
3.742699e+20,1,0,64,3869211171,764,0,5.064413e+06
3.742699e+20,1,0,139,3869211171,769,0,5.031484e+06
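
The values of about 3.7e+20 ns above correspond to roughly twelve thousand years, which suggests that the `BX_per_ORBIT` used is far too large and that the time is not measured from the start of the acquisition. A minimal sketch of an alternative follows, assuming a placeholder value for the number of BX per orbit (`BX_PER_ORBIT = 3564` is an assumption, to be replaced by the estimate of `x` from step 2) and a small synthetic frame `toy` built from rows shown earlier. The time is computed in integer TDC counts (30 per BX), the earliest hit is subtracted, and the result is converted to a `Timedelta` series.

```python
import pandas as pd

TDC_NS = 25 / 30       # ns per TDC count (from the exercise text)
BX_PER_ORBIT = 3564    # placeholder assumption; use the estimate of x from step 2

toy = pd.DataFrame({
    "ORBIT_CNT":  [3869200167, 3869200167, 3869200168],
    "BX_COUNTER": [2374, 2553, 1506],
    "TDC_MEAS":   [26, 28, 4],
})

# Total time in integer TDC counts: 30 counts per BX, BX_PER_ORBIT BX per orbit
ticks = toy["TDC_MEAS"] + 30 * (toy["BX_COUNTER"] + BX_PER_ORBIT * toy["ORBIT_CNT"])

# Time since the first hit, so that the acquisition starts at 0
ticks_since_start = ticks - ticks.min()

toy["TIME_NS"] = ticks_since_start * TDC_NS
# Timedelta has nanosecond resolution, so round to whole ns before converting
toy["TIME"] = pd.to_timedelta(toy["TIME_NS"].round().astype("int64"), unit="ns")
print(toy[["TIME_NS", "TIME"]])
```

Subtracting the minimum before multiplying by 25/30 keeps the numbers small, which avoids losing the sub-nanosecond part to floating-point rounding.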


4\. Find out the duration of the data taking in hours, minutes and seconds, by using the features of the Time Series. Perform this check reading the whole dataset.

In [29]:
data['TIME_TO_REPAIR_D'] = pd.to_timedelta(data['TIME_NANOSECONDS'])
data.head()

,HEAD,FPGA,TDC_CHANNEL,ORBIT_CNT,BX_COUNTER,TDC_MEAS,X_VALUES,TIME_NANOSECONDS,TIME_TO_REPAIR_D
0,1,0,123,3869200167,2374,26,1629823.0,3.742688e+20,-1 days +23:59:59.999999998
1,1,0,124,3869200167,2374,27,1629823.0,3.742688e+20,-1 days +23:59:59.999999998
2,1,0,63,3869200167,2553,28,1515550.0,3.742688e+20,-1 days +23:59:59.999999998
3,1,0,64,3869200167,2558,19,1512588.0,3.742688e+20,-1 days +23:59:59.999999998
4,1,0,64,3869200167,2760,25,1401884.0,3.742688e+20,-1 days +23:59:59.999999998
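
The `-1 days +23:59:59.999999998` entries above are most likely an overflow: a `Timedelta` stores nanoseconds in a 64-bit integer, which tops out near 9.2e+18 ns, well below 3.7e+20. Once the absolute time is expressed relative to the first hit, the duration is the difference between the largest and smallest `Timedelta`, and its `components` attribute gives hours, minutes and seconds. The sketch below uses hypothetical time values (`time_ns` is an assumption made for illustration, not taken from the dataset).

```python
import pandas as pd

# Hypothetical absolute times in ns since the start of data taking
time_ns = pd.Series([0, 1_500_000_000, 3_725_000_000_000], dtype="int64")
elapsed = pd.to_timedelta(time_ns, unit="ns")

# Duration of the data taking: last hit minus first hit
duration = elapsed.max() - elapsed.min()

# components splits a Timedelta into days, hours, minutes, seconds, ...
c = duration.components
print(f"{c.hours} h {c.minutes} min {c.seconds} s")
```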


5\. Use the `.groupby()` method to find the noisy channels, i.e. the TDC channels with the most counts (print the top 3 and their counts).

In [30]:
data.groupby('TDC_CHANNEL').count().sort_values(by='HEAD').iloc[-3:]

TDC_CHANNEL,HEAD,FPGA,ORBIT_CNT,BX_COUNTER,TDC_MEAS,X_VALUES,TIME_NANOSECONDS,TIME_TO_REPAIR_D
63,64642,64642,64642,64642,64642,64642,64642,64642
64,66020,66020,66020,66020,66020,66020,66020,66020
139,108059,108059,108059,108059,108059,108059,108059,108059
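
Since `.count()` repeats the same number in every column, a more compact variant is `.size()`, which returns one count per group, combined with `.nlargest(3)`. A minimal sketch on a synthetic frame (`toy` is an assumption made for illustration; with the real data it would be replaced by `data`):

```python
import pandas as pd

# Synthetic hits: channel 139 appears 4 times, 64 three times, 63 twice, 1 once
toy = pd.DataFrame({"TDC_CHANNEL": [139, 64, 139, 63, 64, 139, 1, 63, 64, 139]})

# size() gives one count per channel; nlargest(3) keeps the three noisiest, sorted
top3 = toy.groupby("TDC_CHANNEL").size().nlargest(3)
print(top3)
```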


6\. Count the number of non-empty orbits (i.e. the number of orbits with at least one hit).

In [31]:
print(data['ORBIT_CNT'].nunique())

11001


7\. Count the number of unique orbits with at least one measurement from TDC_CHANNEL=139.

In [32]:
unique_orbits_of_139 = data[data['TDC_CHANNEL']==139]['ORBIT_CNT'].nunique()
print(unique_orbits_of_139)

10976


8\. Create two Series (one for each FPGA) that have the TDC channel as index, and the number of counts for the corresponding TDC channel as values.

In [33]:
channel = 85

fpga_0 = data[data['FPGA'] == 0]
fpga_0 = fpga_0[fpga_0['TDC_CHANNEL']==channel].groupby('TDC_CHANNEL').count().rename(columns={'HEAD':'Count'})[['Count']]


fpga_1 = data[data['FPGA'] == 1]
fpga_1 = fpga_1[fpga_1['TDC_CHANNEL']==channel].groupby('TDC_CHANNEL').count().rename(columns={'HEAD':'Count'})[['Count']]

print("For FPGA 0, with CHANNEL", str(channel))
print(fpga_0)
print("\nFor FPGA 1, with CHANNEL", str(channel))
print(fpga_1)


For FPGA 0, with CHANNEL 85
             Count
TDC_CHANNEL       
85            2119

For FPGA 1, with CHANNEL 85
             Count
TDC_CHANNEL       
85             204
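
The cell above reports the counts for a single channel (85). To obtain the full Series requested in the exercise, with every TDC channel as index, the channel filter can be dropped and the hits of each FPGA grouped by channel. A minimal sketch on a synthetic frame (`toy` and the names `counts_fpga0` and `counts_fpga1` are assumptions made for illustration):

```python
import pandas as pd

toy = pd.DataFrame({
    "FPGA":        [0, 0, 0, 1, 1, 0],
    "TDC_CHANNEL": [85, 85, 64, 85, 6, 64],
})

# One Series per FPGA: index = TDC_CHANNEL, values = number of hits in that channel
counts_fpga0 = toy[toy["FPGA"] == 0].groupby("TDC_CHANNEL").size()
counts_fpga1 = toy[toy["FPGA"] == 1].groupby("TDC_CHANNEL").size()

print(counts_fpga0)
print(counts_fpga1)
```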


9\. **Optional:** Create two histograms (one for each FPGA) that show the number of counts for each TDC channel.
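
One possible approach, sketched below with matplotlib, is to draw the per-channel counts of each FPGA as a bar chart in side-by-side panels. The frame `toy` is a synthetic stand-in, and the use of matplotlib is an assumption, since the exercise does not prescribe a plotting library.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; this line can be dropped in a notebook
import matplotlib.pyplot as plt
import pandas as pd

toy = pd.DataFrame({
    "FPGA":        [0, 0, 0, 1, 1, 0],
    "TDC_CHANNEL": [85, 85, 64, 85, 6, 64],
})

fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
for ax, fpga in zip(axes, [0, 1]):
    # Number of hits per TDC channel for this FPGA
    counts = toy[toy["FPGA"] == fpga].groupby("TDC_CHANNEL").size()
    ax.bar(counts.index, counts.values)
    ax.set_title(f"FPGA {fpga}")
    ax.set_xlabel("TDC_CHANNEL")
    ax.set_ylabel("Counts")
fig.tight_layout()
```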