1\. **Pandas DataFrame**

This exercise consists in analyzing a dataset containg timing information from a series of Time-to-Digital-Converters (TDC) implemented in a pair of FPGAs. Each measurement (i.e. each row of the input file) consists of a flag that specifies the type of message ('HEAD', which in this case is always 1), two addresses of the TDC providing the signal ('FPGA' and 'TDC_CHANNEL'), and the timing information ('ORBIT_CNT', 'BX_COUNTER', and 'TDC_MEAS'). Each TDC count corresponds to 25/30 ns, whereas a unit of BX_COUNTER corresponds to 25 ns, and the ORBIT_CNT is increased every `x` BX_COUNTER. This allows to store the time in a similar way to hours, minutes and seconds.

In [32]:
import pandas as pd
import numpy as np

In [None]:
# If haven't downloaded it yet, please get the data file with wget
#!wget https://www.dropbox.com/s/xvjzaxzz3ysphme/data_000637.txt -P ./data/

1\. Create a Pandas DataFrame reading N rows of the `data/data_000637.txt` dataset. Choose N to be smaller than or equal to the maximum number of rows and larger that 10k (check the documentation).

In [33]:

filename = "data/data_000637.txt"
data = pd.read_csv(filename)
df = pd.DataFrame(data)
df


Unnamed: 0,HEAD,FPGA,TDC_CHANNEL,ORBIT_CNT,BX_COUNTER,TDC_MEAS
0,1,0,123,3869200167,2374,26
1,1,0,124,3869200167,2374,27
2,1,0,63,3869200167,2553,28
3,1,0,64,3869200167,2558,19
4,1,0,64,3869200167,2760,25
...,...,...,...,...,...,...
1310715,1,0,62,3869211171,762,14
1310716,1,1,4,3869211171,763,11
1310717,1,0,64,3869211171,764,0
1310718,1,0,139,3869211171,769,0


2\. Estimate the number of BX in a ORBIT (the value `x`).

*Hint*: check when the BX counter reaches the maximum value before being reset to 0.

In [120]:
x = df["BX_COUNTER"].max() + 1 #taking into account that the count starts from 0, we will have to add 1 tu the number obtained
x

3564

3\. Create a new column with the absolute time in ns (as a combination of the other three columns with timing information) since the beginning of the data acquisition, and convert the new column to a Time Series.

In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1310720 entries, 0 to 1310719
Data columns (total 6 columns):
 #   Column       Non-Null Count    Dtype
---  ------       --------------    -----
 0   HEAD         1310720 non-null  int64
 1   FPGA         1310720 non-null  int64
 2   TDC_CHANNEL  1310720 non-null  int64
 3   ORBIT_CNT    1310720 non-null  int64
 4   BX_COUNTER   1310720 non-null  int64
 5   TDC_MEAS     1310720 non-null  int64
dtypes: int64(6)
memory usage: 60.0 MB


In [119]:
df["ABS_TIME(ns)"] = (df["ORBIT_CNT"]*3564*25) + df["BX_COUNTER"] + (df["TDC_MEAS"] * (25/30)) #using the description given in the instruction to get the absolute values in ns
df

Unnamed: 0,HEAD,FPGA,TDC_CHANNEL,ORBIT_CNT,BX_COUNTER,TDC_MEAS,ABS_TIME(ns)
0,1,0,123,3869200167,2374,26,3.447457e+14
1,1,0,124,3869200167,2374,27,3.447457e+14
2,1,0,63,3869200167,2553,28,3.447457e+14
3,1,0,64,3869200167,2558,19,3.447457e+14
4,1,0,64,3869200167,2760,25,3.447457e+14
...,...,...,...,...,...,...,...
1310715,1,0,62,3869211171,762,14,3.447467e+14
1310716,1,1,4,3869211171,763,11,3.447467e+14
1310717,1,0,64,3869211171,764,0,3.447467e+14
1310718,1,0,139,3869211171,769,0,3.447467e+14


In [25]:
df =  # pd.to_timedelta(df["ABS_TIME(ns)"], unit = "ns")
abs_timeseries

0         3 days 23:45:45.734882095
1         3 days 23:45:45.734882096
2         3 days 23:45:45.734882276
3         3 days 23:45:45.734882273
4         3 days 23:45:45.734882480
                     ...           
1310715   3 days 23:45:46.715336873
1310716   3 days 23:45:46.715336872
1310717   3 days 23:45:46.715336864
1310718   3 days 23:45:46.715336869
1310719   3 days 23:45:46.715336877
Name: ABS_TIME(ns), Length: 1310720, dtype: timedelta64[ns]

4\. Find out the duration of the data taking in hours, minutes and seconds, by using the features of the Time Series. Perform this check reading the whole dataset.

In [118]:
abs_timeseries = pd.to_timedelta(df["ABS_TIME(ns)"], unit = "ns")
abs_timeseries

0         3 days 23:45:45.734882095
1         3 days 23:45:45.734882096
2         3 days 23:45:45.734882276
3         3 days 23:45:45.734882273
4         3 days 23:45:45.734882480
                     ...           
1310715   3 days 23:45:46.715336873
1310716   3 days 23:45:46.715336872
1310717   3 days 23:45:46.715336864
1310718   3 days 23:45:46.715336869
1310719   3 days 23:45:46.715336877
Name: ABS_TIME(ns), Length: 1310720, dtype: timedelta64[ns]

5\. Use the `.groupby()` method to find out the noisy channels, i.e. the TDC channels with most counts (print to screen the top 3 and the corresponding counts)

In [122]:
sect = df.groupby("TDC_CHANNEL").count()
sect = sect.sort_values("BX_COUNTER", ascending = False)["BX_COUNTER"]
print(sect.head(3))

TDC_CHANNEL
139    108059
64      66020
63      64642
Name: BX_COUNTER, dtype: int64


In [70]:
df.groupby(by ="TDC_CHANNEL", sort = True).last(3)

Unnamed: 0_level_0,HEAD,FPGA,ORBIT_CNT,BX_COUNTER,TDC_MEAS
TDC_CHANNEL,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
1,1,1,3869211170,3517,5
2,1,1,3869211170,3527,24
3,1,1,3869211168,3116,16
4,1,1,3869211171,763,11
5,1,1,3869211170,2484,20
...,...,...,...,...,...
129,1,1,3869211154,2405,20
130,1,1,3869211157,358,10
137,1,0,3869211154,2400,11
138,1,0,3869211154,2400,9


6\. Count the number of non-empty orbits (i.e. the number of orbits with at least one hit).

In [44]:
len((df["ORBIT_CNT"] >= 1))

1310720

7\. Count the number of unique orbits with at least one measurement from TDC_CHANNEL=139.

In [65]:
mask = (df["TDC_CHANNEL"] == 139) & (df["ORBIT_CNT"] != 0)
len(df[mask]["ORBIT_CNT"].value_counts().index)

10976

8\. Create two Series (one for each FPGA) that have the TDC channel as index, and the number of counts for the corresponding TDC channel as values.

In [121]:
mask0 = df["FPGA"] == 0                           #creating a filter for the the value "FPGA" == 0 
df[mask0]["TDC_CHANNEL"].value_counts()           #applying the filter and extracting the Series 

123    17994
124    16463
63     63724
64     64581
61     48699
       ...  
137       32
138       34
129        2
30         4
39         1
Name: TDC_CHANNEL, Length: 124, dtype: int64

In [40]:
mask1 = df["FPGA"] == 1                           # creating creating a filter for the the value "FPGA" == 1
df[mask1]["TDC_CHANNEL"].value_counts()           # applying the filter and extracting the Series 

2      32669
139    32442
1      28438
4      26403
3      21970
       ...  
9         80
130       38
138       36
137       36
129       35
Name: TDC_CHANNEL, Length: 132, dtype: int64

9\. **Optional:** Create two histograms (one for each FPGA) that show the number of counts for each TDC channel.