1\. **Pandas DataFrame**

This exercise consists of analyzing a dataset containing timing information from a series of Time-to-Digital Converters (TDC) implemented in a pair of FPGAs. Each measurement (i.e. each row of the input file) consists of a flag that specifies the type of message ('HEAD', which in this case is always 1), two addresses of the TDC providing the signal ('FPGA' and 'TDC_CHANNEL'), and the timing information ('ORBIT_CNT', 'BX_COUNTER', and 'TDC_MEAS'). Each TDC count corresponds to 25/30 ns, a unit of BX_COUNTER corresponds to 25 ns, and ORBIT_CNT is incremented every `x` BX_COUNTER units. The time is therefore stored in a way similar to hours, minutes and seconds.
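Combining the three counters with the units given above, the time encoded in a single row can be written as follows (a sketch; $x$ is the number of BX per orbit, which is still to be determined from the data):

$$
t\,[\mathrm{ns}] = 25\,\bigl(x \cdot \mathrm{ORBIT\_CNT} + \mathrm{BX\_COUNTER}\bigr) + \frac{25}{30}\,\mathrm{TDC\_MEAS}
$$

The absolute offset of $t$ is arbitrary, so a time "since the beginning of the data acquisition" is obtained by subtracting the smallest $t$ in the dataset.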

In [2]:
# If you haven't downloaded it yet, get the data file with wget
!wget --no-check-certificate https://www.dropbox.com/s/xvjzaxzz3ysphme/data_000637.txt -P ./data/

--2022-12-02 23:49:49--  https://www.dropbox.com/s/xvjzaxzz3ysphme/data_000637.txt
Resolving www.dropbox.com (www.dropbox.com)... 162.125.69.18
Connecting to www.dropbox.com (www.dropbox.com)|162.125.69.18|:443... connected.
  Unable to locally verify the issuer's authority.
HTTP request sent, awaiting response... 302 Found
Location: /s/raw/xvjzaxzz3ysphme/data_000637.txt [following]
--2022-12-02 23:49:49--  https://www.dropbox.com/s/raw/xvjzaxzz3ysphme/data_000637.txt
Reusing existing connection to www.dropbox.com:443.
HTTP request sent, awaiting response... 302 Found
Location: https://ucf03171300089ef05221f95c32c.dl.dropboxusercontent.com/cd/0/inline/Bx64-kSDUt0vbcZ0xdFM1IH8mpRneBXZoBmuulJUQP5eHkHZ3LChpKveelZpKyZkVl7oEcs5gwV_mhiz-lk2JhrhUBPo7x1o9oJ13AKNxmhVQ3pvwjCXX-Y4PsFu_OHXAObmsKbj2iT1VVjEtjFixgRrRlgMfLV27yDFMiry5GKmRw/file# [following]
--2022-12-02 23:49:50--  https://ucf03171300089ef05221f95c32c.dl.dropboxusercontent.com/cd/0/inline/Bx64-kSDUt0vbcZ0xdFM1IH8mpRneBXZoBmuulJUQP5eHk

1\. Create a Pandas DataFrame by reading N rows of the `data/data_000637.txt` dataset. Choose N to be smaller than or equal to the maximum number of rows and larger than 10k (check the documentation).

In [7]:
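import pandas as pd
import numpy as np

file_name = "./data/data_000637.txt"

# Read only the first N = 20000 rows (10k < N <= total number of rows)
data = pd.read_csv(file_name, nrows=20000)
df = data  # read_csv already returns a DataFrame
m, n = df.shape  # rows and columns of the partial DataFrame
print(df)

# The whole dataset is needed by the following exercises
data_complete = pd.read_csv(file_name)
df_complete = data_complete.copy()
df.head()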


       HEAD  FPGA  TDC_CHANNEL   ORBIT_CNT  BX_COUNTER  TDC_MEAS
0         1     0          123  3869200167        2374        26
1         1     0          124  3869200167        2374        27
2         1     0           63  3869200167        2553        28
3         1     0           64  3869200167        2558        19
4         1     0           64  3869200167        2760        25
...     ...   ...          ...         ...         ...       ...
19995     1     0           27  3869200366        2513        29
19996     1     0           63  3869200366        2517         6
19997     1     0           32  3869200366        2519         5
19998     1     0           17  3869200366        2522        21
19999     1     0           64  3869200366        2522         0

[20000 rows x 6 columns]


   HEAD  FPGA  TDC_CHANNEL   ORBIT_CNT  BX_COUNTER  TDC_MEAS
0     1     0          123  3869200167        2374        26
1     1     0          124  3869200167        2374        27
2     1     0           63  3869200167        2553        28
3     1     0           64  3869200167        2558        19
4     1     0           64  3869200167        2760        25
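The `nrows` parameter of `pd.read_csv` is what limits the number of rows read. A minimal sketch of its behaviour, assuming a tiny in-memory stand-in for the data file (the variable names and the last CSV row are hypothetical; the first three rows are copied from the output above):

```python
import io

import pandas as pd

# Hypothetical stand-in for data/data_000637.txt: same header, three rows
# copied from the output above plus one invented row
csv_text = (
    "HEAD,FPGA,TDC_CHANNEL,ORBIT_CNT,BX_COUNTER,TDC_MEAS\n"
    "1,0,123,3869200167,2374,26\n"
    "1,0,124,3869200167,2374,27\n"
    "1,0,63,3869200167,2553,28\n"
    "1,1,64,3869200168,10,19\n"
)

# Total number of data rows (header excluded): the upper bound for N
n_max = len(pd.read_csv(io.StringIO(csv_text)))

# nrows reads only the first N data rows; the header line is not counted
N = 3
df_small = pd.read_csv(io.StringIO(csv_text), nrows=N)
print(n_max, df_small.shape)
```

With the real file the same call is `pd.read_csv(file_name, nrows=N)`, and `n_max` gives the maximum admissible N.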


2\. Estimate the number of BX in an ORBIT (the value `x`).

*Hint*: check when the BX counter reaches the maximum value before being reset to 0.

In [9]:
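# BX_COUNTER runs from 0 up to its maximum, so one orbit contains max + 1 BX
BX_number = df_complete['BX_COUNTER'].max()
print("Number of BX in an Orbit is: ", BX_number + 1)

# Counter value just before the last reset (BX_COUNTER == 0) within the first m rows
bx = data_complete['BX_COUNTER'].iloc[:m].to_numpy()
zero = np.flatnonzero(bx == 0)[-1]
max_Row = zero - 1
max_BX = bx[max_Row]
print("BX before the last reset in the first", m, "rows:", max_BX, "reached at row", max_Row)


Number of BX in an Orbit is:  3564
BX before the last reset in the first 20000 rows: 3551 reached at row 15514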
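Following the hint, a reset can be located where the counter decreases between consecutive hits. The sketch below uses a short, hypothetical BX_COUNTER sequence and assumes the hits are in time order; it shows why the value seen just before a single reset can underestimate the true maximum (hits are sparse, so BX 3563 is not necessarily hit in every orbit), and why the estimate of `x` is the overall maximum plus one:

```python
import pandas as pd

# Hypothetical BX_COUNTER values of consecutive hits; the counter wraps to 0
bx = pd.Series([3550, 3560, 3563, 0, 5, 3400, 3562, 1, 7])

# A reset shows up as a decrease of the counter between consecutive hits
is_reset = bx.diff() < 0
# Last counter value seen before each reset (may be below the true maximum)
before_reset = bx.shift(1)[is_reset].astype(int)
print(before_reset.tolist())

# The counter runs from 0 to its maximum, so an orbit contains max + 1 BX
x_estimate = int(bx.max()) + 1
print(x_estimate)
```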


3\. Create a new column with the absolute time in ns (as a combination of the other three columns with timing information) since the beginning of the data acquisition, and convert the new column to a Time Series.

In [11]:
x = BX_number + 1  # BX per orbit: BX_COUNTER runs from 0 to BX_number
t_ns = df_complete['ORBIT_CNT'] * x * 25 + df_complete['BX_COUNTER'] * 25 + df_complete['TDC_MEAS'] * 25 / 30
df_complete['absolute_time_ns'] = t_ns - t_ns.min()  # time since the first hit of the acquisition
s = pd.to_timedelta(df_complete['absolute_time_ns'], unit='ns')  # Time Series (timedelta64[ns])
df_complete
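A self-contained sketch of the same conversion on three hypothetical hits, assuming `x = 3564` BX per orbit. The TDC values are chosen as multiples of 6 so that the times come out as whole nanoseconds:

```python
import pandas as pd

# Hypothetical hits with the same timing columns as the dataset
hits = pd.DataFrame({
    "ORBIT_CNT": [100, 100, 101],
    "BX_COUNTER": [10, 12, 0],
    "TDC_MEAS": [0, 6, 12],
})
x = 3564  # BX per orbit, assumed here (estimate it from the data as in exercise 2)

# Combine the three counters into one time in ns
t_ns = (hits["ORBIT_CNT"] * x * 25
        + hits["BX_COUNTER"] * 25
        + hits["TDC_MEAS"] * 25 / 30)

# Time since the first hit, converted to a timedelta64[ns] Series
hits["time"] = pd.to_timedelta(t_ns - t_ns.min(), unit="ns")
print(hits["time"])
```

Keeping the column as timedeltas (rather than raw floats) makes the `.dt` accessor and `Timedelta` arithmetic available for the next exercise.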


4\. Find out the duration of the data taking in hours, minutes and seconds, by using the features of the Time Series. Perform this check reading the whole dataset.

In [12]:
t = pd.to_timedelta(df_complete['absolute_time_ns'], unit='ns')  # Time Series of the hit times
duration = t.max() - t.min()  # duration of the data taking: last hit minus first hit
print("in hours is:", duration / pd.Timedelta(hours=1), "\nin minutes is:", duration / pd.Timedelta(minutes=1), "\nin seconds is:", duration.total_seconds())
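The difference between two timedeltas is a `pd.Timedelta`, whose `.components` split it into hours, minutes and seconds. A sketch on three hypothetical hit times (in seconds), chosen so the duration is exactly 1 h 2 min 3.25 s:

```python
import pandas as pd

# Hypothetical hit times since the start of the acquisition, in seconds
t = pd.Series(pd.to_timedelta([0, 1.5, 3723.25], unit="s"))

# Duration of the data taking: last hit minus first hit (a pandas Timedelta)
duration = t.max() - t.min()
c = duration.components  # days, hours, minutes, seconds, milliseconds, ...
print(duration)
# c.hours excludes whole days; add c.days * 24 for longer runs
print(c.hours, "h", c.minutes, "min", c.seconds + c.milliseconds / 1000, "s")

# The same duration expressed in a single unit
print(duration / pd.Timedelta(hours=1), "h")
print(duration / pd.Timedelta(minutes=1), "min")
print(duration.total_seconds(), "s")
```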

5\. Use the `.groupby()` method to find the noisy channels, i.e. the TDC channels with the most counts (print the top 3 and the corresponding counts to screen).

In [13]:
# Number of hits per TDC channel, then the three channels with most counts
print(df_complete.groupby('TDC_CHANNEL').size().nlargest(3))
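Counting rows per group is done with `.size()` (not `.max()`, which returns the largest value of each column). A sketch on a hypothetical list of hits, where channel 139 is made noisy on purpose:

```python
import pandas as pd

# Hypothetical hits: one row per hit
hits = pd.DataFrame({"TDC_CHANNEL": [139, 64, 139, 63, 64, 139, 10, 63, 64, 139]})

# Number of hits per channel, then the three channels with most hits
counts = hits.groupby("TDC_CHANNEL").size()
top3 = counts.nlargest(3)
print(top3)
```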


6\. Count the number of non-empty orbits (i.e. the number of orbits with at least one hit).

In [57]:
# Number of distinct orbits among the hits with TDC_CHANNEL != 0
count_unique = df_complete.loc[df_complete['TDC_CHANNEL'] != 0, 'ORBIT_CNT'].nunique()
count_unique

6959
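Since each row of the dataset is one hit, an orbit is non-empty exactly when its ORBIT_CNT value appears at least once, so counting the distinct values with `.nunique()` is enough. A sketch on hypothetical hits (orbit 7 has no hits and is therefore not counted):

```python
import pandas as pd

# Hypothetical hits in orbits 5, 6 and 8 (orbit 7 is empty)
hits = pd.DataFrame({
    "ORBIT_CNT": [5, 5, 6, 8, 8, 8],
    "TDC_CHANNEL": [1, 2, 3, 1, 139, 2],
})

# Each row is one hit, so the non-empty orbits are the distinct ORBIT_CNT values
n_nonempty = hits["ORBIT_CNT"].nunique()
print(n_nonempty)
```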

7\. Count the number of unique orbits with at least one measurement from TDC_CHANNEL=139.

In [58]:
# Distinct orbits with at least one hit on TDC channel 139
count139 = df_complete.loc[df_complete['TDC_CHANNEL'] == 139, 'ORBIT_CNT'].nunique()
count139

6934
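An equivalent formulation groups a boolean "hit on channel 139" flag by orbit and asks whether any hit in each orbit satisfies it; this per-orbit Series can be handy if the answer is needed orbit by orbit. A sketch on hypothetical hits:

```python
import pandas as pd

# Hypothetical hits: orbits 5 and 8 contain a hit on channel 139, orbit 6 does not
hits = pd.DataFrame({
    "ORBIT_CNT": [5, 5, 6, 8, 8],
    "TDC_CHANNEL": [139, 2, 3, 1, 139],
})

# For each orbit: does any of its hits come from channel 139?
has_139 = hits["TDC_CHANNEL"].eq(139).groupby(hits["ORBIT_CNT"]).any()
n_orbits_139 = int(has_139.sum())
print(has_139)
print(n_orbits_139)
```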

8\. Create two Series (one for each FPGA) that have the TDC channel as index, and the number of counts for the corresponding TDC channel as values.

In [14]:
# Hits of each FPGA, counted per TDC channel (index: TDC_CHANNEL, values: counts)
ser1 = df_complete.loc[df_complete['FPGA'] == 0, 'TDC_CHANNEL'].value_counts().sort_index()
ser2 = df_complete.loc[df_complete['FPGA'] == 1, 'TDC_CHANNEL'].value_counts().sort_index()
print(ser1, ser2, sep='\n\n')
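The same two Series can also be obtained with a single `groupby` on both columns, followed by `.xs()` to select one FPGA at a time. A sketch on hypothetical hits:

```python
import pandas as pd

# Hypothetical hits from both FPGAs
hits = pd.DataFrame({
    "FPGA": [0, 0, 1, 0, 1, 1],
    "TDC_CHANNEL": [64, 63, 139, 64, 139, 2],
})

# Hits per (FPGA, TDC_CHANNEL) pair, as a Series with a two-level index
counts = hits.groupby(["FPGA", "TDC_CHANNEL"]).size()

# One Series per FPGA: index = TDC channel, values = number of hits
ser_fpga0 = counts.xs(0, level="FPGA")
ser_fpga1 = counts.xs(1, level="FPGA")
print(ser_fpga0, ser_fpga1, sep="\n\n")
```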

9\. **Optional:** Create two histograms (one for each FPGA) that show the number of counts for each TDC channel.
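One way to draw the two histograms is a bar chart of the per-channel counts, which is equivalent to a histogram with one bin per TDC channel. A sketch with matplotlib on hypothetical hits; `matplotlib.use("Agg")` only keeps it runnable without a display and should be dropped inside the notebook to see the plots inline:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no window is opened
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical hits from both FPGAs
hits = pd.DataFrame({
    "FPGA": [0, 0, 1, 0, 1, 1],
    "TDC_CHANNEL": [64, 63, 139, 64, 139, 2],
})

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
bars = {}
for fpga, ax in zip((0, 1), axes):
    # Counts per TDC channel for this FPGA, ordered by channel number
    counts = hits.loc[hits["FPGA"] == fpga, "TDC_CHANNEL"].value_counts().sort_index()
    bars[fpga] = ax.bar(counts.index, counts.values)
    ax.set_title(f"FPGA {fpga}")
    ax.set_xlabel("TDC_CHANNEL")
    ax.set_ylabel("counts")
fig.tight_layout()
```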