1\. **Pandas DataFrame**

This exercise consists of analyzing a dataset containing timing information from a series of Time-to-Digital Converters (TDCs) implemented in a pair of FPGAs. Each measurement (i.e. each row of the input file) consists of a flag that specifies the type of message ('HEAD', which in this case is always 1), two addresses of the TDC providing the signal ('FPGA' and 'TDC_CHANNEL'), and the timing information ('ORBIT_CNT', 'BX_COUNTER', and 'TDC_MEAS'). Each TDC count corresponds to 25/30 ns, whereas a unit of BX_COUNTER corresponds to 25 ns, and the ORBIT_CNT is increased every `x` BX_COUNTER. This allows the time to be stored in a way similar to hours, minutes and seconds.

In [None]:
# If you haven't downloaded it yet, please get the data file with wget
#!wget https://www.dropbox.com/s/xvjzaxzz3ysphme/data_000637.txt -P ./data/

1\. Create a Pandas DataFrame reading N rows of the `data/data_000637.txt` dataset. Choose N to be smaller than or equal to the maximum number of rows and larger than 10k (check the documentation).

In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import datetime as dt
import json
import csv 

data = pd.read_csv('data/data_000637.txt')  # no nrows given, so the whole file is read (N = maximum)
print(data)
r = data.shape[0]  # number of rows actually read
print(r)

print(data.info())

         HEAD  FPGA  TDC_CHANNEL   ORBIT_CNT  BX_COUNTER  TDC_MEAS
0           1     0          123  3869200167        2374        26
1           1     0          124  3869200167        2374        27
2           1     0           63  3869200167        2553        28
3           1     0           64  3869200167        2558        19
4           1     0           64  3869200167        2760        25
...       ...   ...          ...         ...         ...       ...
1310715     1     0           62  3869211171         762        14
1310716     1     1            4  3869211171         763        11
1310717     1     0           64  3869211171         764         0
1310718     1     0          139  3869211171         769         0
1310719     1     0           61  3869211171         762        18

[1310720 rows x 6 columns]
1310720
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1310720 entries, 0 to 1310719
Data columns (total 6 columns):
 #   Column       Non-Null Count    Dtype
---  -
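
To restrict the read to the first N rows, `pandas.read_csv` accepts an `nrows` argument. The following is a minimal sketch on an in-memory stand-in for the data file: the toy rows and `N = 3` are assumptions for illustration only. On the real file one would pass the path `data/data_000637.txt` and an N above 10k.

```python
import io
import pandas as pd

# Toy stand-in for data/data_000637.txt (hypothetical rows, same columns).
csv_text = """HEAD,FPGA,TDC_CHANNEL,ORBIT_CNT,BX_COUNTER,TDC_MEAS
1,0,123,3869200167,2374,26
1,0,124,3869200167,2374,27
1,0,63,3869200167,2553,28
1,0,64,3869200167,2558,19
1,0,64,3869200167,2760,25
"""

N = 3  # illustrative value; the exercise asks for N > 10k on the real file
subset = pd.read_csv(io.StringIO(csv_text), nrows=N)  # stop after N data rows
print(subset.shape)
```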

2\. Estimate the number of BX in an ORBIT (the value `x`).

*Hint*: check when the BX counter reaches the maximum value before being reset to 0.

In [5]:
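bx = data["BX_COUNTER"]
estimated_bx = bx.max()  # largest counter value observed before the reset to 0
# The counter starts at 0, so the number of BX per orbit is presumably estimated_bx + 1.
print("BX value:", estimated_bx)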

BX value: 3563
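
Because the counter is reset to 0, the values in one orbit run from 0 to the maximum inclusive, so the number of BX per orbit is arguably the maximum plus one. A small sketch of that reasoning, using a hypothetical handful of counter values around a reset (the name `bx_per_orbit` is introduced here for illustration, and the estimate assumes the data actually contain the largest possible counter value):

```python
import pandas as pd

# Hypothetical BX_COUNTER values around a reset: ..., 3562, 3563, 0, 1, ...
bx = pd.Series([3561, 3562, 3563, 0, 1, 2], name="BX_COUNTER")

max_bx = bx.max()          # largest value seen before the counter wraps
bx_per_orbit = max_bx + 1  # values 0..max_bx inclusive
print("BX per orbit:", bx_per_orbit)
```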


3\. Create a new column with the absolute time in ns (as a combination of the other three columns with timing information) since the beginning of the data acquisition, and convert the new column to a Time Series.

In [27]:
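tdc = data['TDC_MEAS']
bx_counter = data['BX_COUNTER']
orbit_counter = data['ORBIT_CNT']
# Time in ns: TDC counts (25/30 ns each) + BX counts (25 ns each) + orbits (estimated_bx BX each).
# This is the raw counter time; it is not yet measured from the start of the acquisition.
data["time"] = tdc * (25 / 30) + bx_counter * 25 + orbit_counter * estimated_bx * 25
print(data)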

       HEAD  FPGA  TDC_CHANNEL   ORBIT_CNT  BX_COUNTER  TDC_MEAS          time
0         1     0          123  3869200167        2374        26  3.446490e+14
1         1     0          124  3869200167        2374        27  3.446490e+14
2         1     0           63  3869200167        2553        28  3.446490e+14
3         1     0           64  3869200167        2558        19  3.446490e+14
4         1     0           64  3869200167        2760        25  3.446490e+14
...     ...   ...          ...         ...         ...       ...           ...
14995     1     1            4  3869200316        3399         9  3.446490e+14
14996     1     1           17  3869200316        3400        15  3.446490e+14
14997     1     1           10  3869200316        3530        16  3.446490e+14
14998     1     1            8  3869200316        3533        18  3.446490e+14
14999     1     0          139  3869200316        3539         0  3.446490e+14

[15000 rows x 7 columns]
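
The exercise also asks for the time since the beginning of the acquisition and for a Time Series. One possible way to do both, sketched on a hypothetical three-row frame: subtract the smallest value and convert with `pd.to_timedelta`. The constant `BX_PER_ORBIT = 3564` is an assumption (counter values 0 to 3563), and the column names `time_ns` and `time` are chosen here for illustration.

```python
import pandas as pd

BX_PER_ORBIT = 3564  # assumption: BX_COUNTER takes the values 0..3563

# Hypothetical rows with the three timing columns.
df = pd.DataFrame({
    "ORBIT_CNT": [3869200167, 3869200167, 3869200168],
    "BX_COUNTER": [2374, 2553, 10],
    "TDC_MEAS": [26, 28, 0],
})

# Work with orbits relative to the first one to keep the numbers small.
orbit_rel = df["ORBIT_CNT"] - df["ORBIT_CNT"].min()
abs_ns = (orbit_rel * BX_PER_ORBIT + df["BX_COUNTER"]) * 25 + df["TDC_MEAS"] * 25 / 30

# Time since the first hit, in ns, and the same quantity as a timedelta Series.
df["time_ns"] = abs_ns - abs_ns.min()
df["time"] = pd.to_timedelta(df["time_ns"], unit="ns")
print(df[["time_ns", "time"]])
```

Note that a timedelta has nanosecond resolution, so the fractional part of `time_ns` is not preserved in the `time` column.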


4\. Find out the duration of the data taking in hours, minutes and seconds, by using the features of the Time Series. Perform this check reading the whole dataset.

In [13]:
# NOTE: this cell measures the wall-clock time needed to load the whole file,
# not the duration of the data taking itself.
start_time = dt.datetime.now()

data = pd.read_csv('data/data_000637.txt')  # read the whole dataset
r = data.shape[0]
#tdc = data['TDC_MEAS']
#bx_counter = data['BX_COUNTER']
#orbit_counter = data['ORBIT_CNT']
#estimated_bx = max(bx_counter)
#time =  (tdc *(25/30)) + (bx_counter * 25) + (orbit_counter * estimated_bx*25)

end_time = dt.datetime.now()
print("Elapsed time:", (end_time - start_time))

Elapsed time: 0:00:00.647144
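
The duration of the data taking itself can be read off a timedelta Series as the difference between its latest and earliest entries. A minimal sketch, assuming a hypothetical Series of times in ns since the start (the values below are made up, not taken from the dataset):

```python
import pandas as pd

# Hypothetical times in ns since the start of the acquisition.
time_ns = pd.Series([0.0, 4476.7, 2.5e9, 9.1e11])
time = pd.to_timedelta(time_ns, unit="ns")

duration = time.max() - time.min()  # a pandas Timedelta
c = duration.components             # named fields: days, hours, minutes, seconds, ...
print(f"{c.hours} h {c.minutes} min {c.seconds} s")
```

For runs longer than a day, `c.days` would also have to be taken into account.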


5\. Use the `.groupby()` method to find out the noisy channels, i.e. the TDC channels with most counts (print to screen the top 3 and the corresponding counts)

In [36]:
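# HEAD is always 1, so its per-channel sum equals the number of hits in that channel.
group_data = data.groupby('TDC_CHANNEL').sum().sort_values(by=['HEAD'])
best_three = group_data.iloc[-3:]  # last three rows = three largest counts
print(best_three)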

             HEAD  FPGA      ORBIT_CNT  BX_COUNTER  TDC_MEAS          time
TDC_CHANNEL                                                               
63            749     6  2898030982000     1364359     11085  2.581421e+17
64            752    17  2909638583165     1394717     10889  2.591761e+17
139          1268   389  4906145905369     2247027         0  4.370149e+17
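
Summing `HEAD` works only because the flag is always 1. A more direct alternative is to count rows per group with `.size()`, sketched here on a hypothetical column of channel numbers (the toy values are an assumption for illustration):

```python
import pandas as pd

# Hypothetical hits: channel 63 fires 4 times, 139 three times, 64 twice, 10 once.
df = pd.DataFrame({"TDC_CHANNEL": [139, 139, 139, 64, 64, 63, 63, 63, 63, 10]})

counts = df.groupby("TDC_CHANNEL").size()                # hits per channel
top3 = counts.sort_values(ascending=False).head(3)       # three noisiest channels
print(top3)
```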


6\. Count the number of non-empty orbits (i.e. the number of orbits with at least one hit).

In [29]:
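# Every row is a hit, so every orbit that appears in the data is non-empty;
# the ORBIT_CNT > 0 filter is kept only as a sanity check.
arr = data[data['ORBIT_CNT'] > 0]
result = arr.ORBIT_CNT.nunique()  # number of distinct orbits
print("the number of nonempty orbits: ",result)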

the number of nonempty orbits:  11001


7\. Count the number of unique orbits with at least one measurement from TDC_CHANNEL=139.

In [41]:
result = data[data['TDC_CHANNEL'] == 139]  # keep only the hits with TDC_CHANNEL == 139
result = result.ORBIT_CNT.nunique()  # count the distinct orbits among them
print("the number of unique orbits with tdc = 139: ",result)

the number of unique orbits with tdc = 139:  150


8\. Create two Series (one for each FPGA) that have the TDC channel as index, and the number of counts for the corresponding TDC channel as values.
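
One possible approach, sketched on a hypothetical toy frame: select the rows of each FPGA and apply `value_counts()` to the channel column, which yields a Series indexed by TDC channel with the counts as values. The names `counts_fpga0` and `counts_fpga1` are introduced here for illustration.

```python
import pandas as pd

# Hypothetical hits on two FPGAs.
df = pd.DataFrame({
    "FPGA": [0, 0, 0, 1, 1, 1, 1],
    "TDC_CHANNEL": [63, 63, 64, 4, 4, 4, 17],
})

# One Series per FPGA: index = TDC channel, values = number of hits.
counts_fpga0 = df.loc[df["FPGA"] == 0, "TDC_CHANNEL"].value_counts().sort_index()
counts_fpga1 = df.loc[df["FPGA"] == 1, "TDC_CHANNEL"].value_counts().sort_index()
print(counts_fpga0)
print(counts_fpga1)
```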

9\. **Optional:** Create two histograms (one for each FPGA) that show the number of counts for each TDC channel.
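
A minimal plotting sketch, again on a hypothetical toy frame: the per-channel counts of each FPGA are drawn as a bar chart in its own panel. The figure size and labels are arbitrary choices made for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical hits on two FPGAs.
df = pd.DataFrame({
    "FPGA": [0, 0, 0, 1, 1, 1, 1],
    "TDC_CHANNEL": [63, 63, 64, 4, 4, 4, 17],
})

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, fpga in zip(axes, (0, 1)):
    # Counts per TDC channel for this FPGA, ordered by channel number.
    counts = df.loc[df["FPGA"] == fpga, "TDC_CHANNEL"].value_counts().sort_index()
    ax.bar(counts.index, counts.values)
    ax.set_title(f"FPGA {fpga}")
    ax.set_xlabel("TDC_CHANNEL")
    ax.set_ylabel("counts")
fig.tight_layout()
plt.show()
```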