1\. **Pandas DataFrame**

This exercise analyzes a dataset containing timing information from a series of Time-to-Digital Converters (TDCs) implemented in a pair of FPGAs. Each measurement (i.e. each row of the input file) consists of a flag that specifies the type of message ('HEAD', which in this case is always 1), two addresses of the TDC providing the signal ('FPGA' and 'TDC_CHANNEL'), and the timing information ('ORBIT_CNT', 'BX_COUNTER', and 'TDC_MEAS'). Each TDC count corresponds to 25/30 ns, whereas a unit of BX_COUNTER corresponds to 25 ns, and ORBIT_CNT is incremented every `x` units of BX_COUNTER. This makes it possible to store the time in a way similar to hours, minutes and seconds.

In [16]:
# If you haven't downloaded it yet, please get the data file with wget
#!wget https://www.dropbox.com/s/xvjzaxzz3ysphme/data_000637.txt -P ./data/

1\. Create a Pandas DataFrame reading N rows of the `data/data_000637.txt` dataset. Choose N to be smaller than or equal to the maximum number of rows and larger than 10k (check the documentation).

In [12]:

import pandas as pd
import numpy as np

file_name = "./data/data_000637.txt"
# Read the whole file; pass nrows=N (e.g. nrows=1000000) to read only the first N rows
data = pd.read_csv(file_name)
data

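| | HEAD | FPGA | TDC_CHANNEL | ORBIT_CNT | BX_COUNTER | TDC_MEAS |
|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 123 | 3869200167 | 2374 | 26 |
| 1 | 1 | 0 | 124 | 3869200167 | 2374 | 27 |
| 2 | 1 | 0 | 63 | 3869200167 | 2553 | 28 |
| 3 | 1 | 0 | 64 | 3869200167 | 2558 | 19 |
| 4 | 1 | 0 | 64 | 3869200167 | 2760 | 25 |
| ... | ... | ... | ... | ... | ... | ... |
| 1310715 | 1 | 0 | 62 | 3869211171 | 762 | 14 |
| 1310716 | 1 | 1 | 4 | 3869211171 | 763 | 11 |
| 1310717 | 1 | 0 | 64 | 3869211171 | 764 | 0 |
| 1310718 | 1 | 0 | 139 | 3869211171 | 769 | 0 |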
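
Reading only the first N rows can be done with the `nrows` argument of `pd.read_csv`. The sketch below is a minimal illustration: to stay self-contained it reads from an in-memory string (the hypothetical `sample_csv`, built from the first five rows shown above and assuming the file header matches the column names) instead of the real file.

```python
import io

import pandas as pd

# Stand-in for data/data_000637.txt (assumed layout: header line plus one hit per row).
sample_csv = """HEAD,FPGA,TDC_CHANNEL,ORBIT_CNT,BX_COUNTER,TDC_MEAS
1,0,123,3869200167,2374,26
1,0,124,3869200167,2374,27
1,0,63,3869200167,2553,28
1,0,64,3869200167,2558,19
1,0,64,3869200167,2760,25
"""

# For the real file, N would be chosen above 10k and at most the total row count.
N = 3
subset = pd.read_csv(io.StringIO(sample_csv), nrows=N)
print(subset.shape)
```

With the real dataset, the first argument would be the path `./data/data_000637.txt` and `N` any value in the allowed range.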


2\. Estimate the number of BX in an ORBIT (the value `x`).

*Hint*: check when the BX counter reaches the maximum value before being reset to 0.

In [2]:
# Largest BX_COUNTER value seen in the data; the counter restarts from 0 after it
max = np.max(data["BX_COUNTER"])
print(max)

3563
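
The hint can be read as follows: if BX_COUNTER takes every value from 0 up to its largest observed value before wrapping, an orbit spans that maximum plus one bunch crossings, which would be 3564 for the maximum printed above. A minimal sketch on toy values (the Series `bx` and the name `bx_per_orbit` are illustrative, not taken from the dataset):

```python
import pandas as pd

# Toy counter values around a reset; the real values would come from data["BX_COUNTER"].
bx = pd.Series([3561, 3562, 3563, 0, 1, 2], name="BX_COUNTER")

max_bx = bx.max()          # largest value reached before the counter resets
bx_per_orbit = max_bx + 1  # assumption: the counter covers 0..max_bx inclusive
print(max_bx, bx_per_orbit)
```

This estimate is only as good as the assumption that the maximum value actually occurs in the sample that was read.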


3\. Create a new column with the absolute time in ns (as a combination of the other three columns with timing information) since the beginning of the data acquisition, and convert the new column to a Time Series.

In [3]:
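# TDC_MEAS counts 25/30 ns and BX_COUNTER counts 25 ns; one orbit is taken here as `max` bunch crossings
data["TIME"] = data["TDC_MEAS"] * 25 / 30 + 25 * data["BX_COUNTER"] + 25 * max * data["ORBIT_CNT"]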
#data["TIME"]=pd.to_datetime(data["TIME"], unit="ns")
data

| | HEAD | FPGA | TDC_CHANNEL | ORBIT_CNT | BX_COUNTER | TDC_MEAS | TIME |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 123 | 3869200167 | 2374 | 26 | 3.446490e+14 |
| 1 | 1 | 0 | 124 | 3869200167 | 2374 | 27 | 3.446490e+14 |
| 2 | 1 | 0 | 63 | 3869200167 | 2553 | 28 | 3.446490e+14 |
| 3 | 1 | 0 | 64 | 3869200167 | 2558 | 19 | 3.446490e+14 |
| 4 | 1 | 0 | 64 | 3869200167 | 2760 | 25 | 3.446490e+14 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 1310715 | 1 | 0 | 62 | 3869211171 | 762 | 14 | 3.446500e+14 |
| 1310716 | 1 | 1 | 4 | 3869211171 | 763 | 11 | 3.446500e+14 |
| 1310717 | 1 | 0 | 64 | 3869211171 | 764 | 0 | 3.446500e+14 |
| 1310718 | 1 | 0 | 139 | 3869211171 | 769 | 0 | 3.446500e+14 |
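
A possible variant measures time from the first hit and stores it as a timedelta Series, which is one reading of "Time Series" in the exercise. The sketch below makes two assumptions that are not confirmed by the source: an orbit holds 3564 bunch crossings (`BX_PER_ORBIT`), and sub-nanosecond precision can be dropped. The toy rows and the intermediate names are illustrative.

```python
import pandas as pd

BX_PER_ORBIT = 3564  # assumption: BX_COUNTER runs from 0 to 3563

# Toy hits; the real columns would come from the dataset.
df = pd.DataFrame({
    "ORBIT_CNT": [3869200167, 3869200167, 3869200168],
    "BX_COUNTER": [2374, 2553, 10],
    "TDC_MEAS": [26, 28, 0],
})

# Work in integer TDC ticks (1 BX = 30 ticks of 25/30 ns) to avoid float rounding
# on values of order 1e14.
ticks = (df["ORBIT_CNT"] * BX_PER_ORBIT + df["BX_COUNTER"]) * 30 + df["TDC_MEAS"]
ticks_since_start = ticks - ticks.min()

# Convert ticks to ns, then to timedeltas rounded to the nearest ns.
time_ns = ticks_since_start * 25 / 30
df["TIME"] = pd.to_timedelta(time_ns.round().astype("int64"), unit="ns")
print(df["TIME"])
```

Subtracting the first timestamp before converting keeps the numbers small, so the 25/30 ns resolution is not lost to floating-point rounding.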


4\. Find out the duration of the data taking in hours, minutes and seconds, by using the features of the Time Series. Perform this check reading the whole dataset.

In [4]:
# Time between the first and the last row, in ns
elapsed_time = data["TIME"].iloc[-1] - data["TIME"].iloc[0]
print(pd.to_timedelta(elapsed_time, unit="ns"))

#data["OK"]=data["TIME"]-data.iloc[0,6]
#data["OK"]=pd.to_timedelta(data["OK"])
#data

0 days 00:00:00.980140993
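
To report a duration explicitly in hours, minutes and seconds, the `components` attribute of a `pd.Timedelta` can be used. A minimal sketch with made-up timestamps (the values in `time` are hypothetical and chosen so that the split is easy to check by hand):

```python
import pandas as pd

# Toy timestamps in ns since the start of the run (hypothetical values).
time = pd.to_timedelta(pd.Series([0, 250_000_000, 3_725_500_000_000]), unit="ns")

# Duration as the span between the earliest and the latest timestamp.
duration = time.max() - time.min()

# components splits the Timedelta into days, hours, minutes, seconds, ...
c = duration.components
print(f"{c.hours} h {c.minutes} min {c.seconds + c.milliseconds / 1000:.3f} s")
```

Using `max()` and `min()` rather than the last and first rows does not rely on the rows being ordered in time.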


5\. Use the `.groupby()` method to find out the noisy channels, i.e. the TDC channels with the most counts (print to screen the top 3 and the corresponding counts).

In [5]:

# Largest channel number present in the data
max_TDC = np.max(data["TDC_CHANNEL"])
print(max_TDC)
# First three rows recorded for that channel
data.groupby("TDC_CHANNEL").get_group(max_TDC).head(3)

139


| | HEAD | FPGA | TDC_CHANNEL | ORBIT_CNT | BX_COUNTER | TDC_MEAS | TIME |
|---|---|---|---|---|---|---|---|
| 7 | 1 | 0 | 139 | 3869200167 | 2776 | 0 | 344649000000000.0 |
| 15 | 1 | 1 | 139 | 3869200167 | 2797 | 0 | 344649000000000.0 |
| 30 | 1 | 0 | 139 | 3869200167 | 3085 | 0 | 344649000000000.0 |
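
Ranking channels by the number of hits, rather than by channel number, can be sketched with `groupby(...).size()` followed by a sort. The toy frame `hits` below is illustrative only; with the real data the same two lines would be applied to `data`.

```python
import pandas as pd

# Toy hits: channel 139 fires four times, 64 three times, 63 twice, 1 once.
hits = pd.DataFrame({"TDC_CHANNEL": [139, 64, 139, 63, 64, 139, 1, 63, 64, 139]})

# Number of rows per channel, indexed by channel number.
counts = hits.groupby("TDC_CHANNEL").size()

# The three channels with the most counts, largest first.
top3 = counts.sort_values(ascending=False).head(3)
print(top3)
```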


6\. Count the number of non-empty orbits (i.e. the number of orbits with at least one hit).

In [16]:
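# Mask every entry that is not greater than the first orbit number (3869200167)
df = data[data > 3869200167]
dt = data.copy()
dt["ORBIT_CNT"] = df["ORBIT_CNT"]
# count() reports non-null entries per column; for ORBIT_CNT this is the number
# of rows whose orbit number exceeds 3869200167
dt.count()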

HEAD           1310720
FPGA           1310720
TDC_CHANNEL    1310720
ORBIT_CNT      1310677
BX_COUNTER     1310720
TDC_MEAS       1310720
dtype: int64
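
Since every row is a hit, an orbit is non-empty exactly when its number appears in the `ORBIT_CNT` column, so counting distinct values is one way to answer the question. A minimal sketch on toy values (the frame `hits` is illustrative):

```python
import pandas as pd

# Toy hits spread over three distinct orbits (orbit ...169 has no hit and never appears).
hits = pd.DataFrame({"ORBIT_CNT": [3869200167, 3869200167, 3869200168, 3869200170]})

# Distinct orbit numbers that appear at least once.
n_orbits = hits["ORBIT_CNT"].nunique()

# Equivalent count obtained through groupby.
n_groups = hits.groupby("ORBIT_CNT").ngroups
print(n_orbits, n_groups)
```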

7\. Count the number of unique orbits with at least one measurement from TDC_CHANNEL=139.

In [22]:
# Rows recorded by TDC channel 139, then the number of distinct orbits among them
ch139 = data[data["TDC_CHANNEL"] == 139]
ch139["ORBIT_CNT"].nunique()

10976

8\. Create two Series (one for each FPGA) that have the TDC channel as index, and the number of counts for the corresponding TDC channel as values.

11004
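
One way to build the two Series is to select the rows of each FPGA and count the occurrences of each channel. The sketch below uses a toy frame (`hits`) and introduces the names `counts_fpga0` and `counts_fpga1` for illustration; `value_counts()` puts the channel in the index and `sort_index()` orders it by channel number.

```python
import pandas as pd

# Toy hits from two FPGAs.
hits = pd.DataFrame({
    "FPGA":        [0, 0, 0, 1, 1, 0, 1],
    "TDC_CHANNEL": [139, 64, 139, 139, 4, 63, 4],
})

# Index: TDC channel; values: number of hits on that channel for the given FPGA.
counts_fpga0 = hits.loc[hits["FPGA"] == 0, "TDC_CHANNEL"].value_counts().sort_index()
counts_fpga1 = hits.loc[hits["FPGA"] == 1, "TDC_CHANNEL"].value_counts().sort_index()

print(counts_fpga0)
print(counts_fpga1)
```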


9\. **Optional:** Create two histograms (one for each FPGA) that show the number of counts for each TDC channel.
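
A possible sketch of the two histograms, drawn as bar charts of the per-channel counts with matplotlib. The toy frame `hits` is illustrative, and the `Agg` backend is selected only so that the example runs without a display; in a notebook that line can be dropped.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Toy hits from two FPGAs.
hits = pd.DataFrame({
    "FPGA":        [0, 0, 0, 1, 1, 0, 1],
    "TDC_CHANNEL": [139, 64, 139, 139, 4, 63, 4],
})

# One panel per FPGA, sharing the y axis so the counts are directly comparable.
fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
for ax, fpga in zip(axes, (0, 1)):
    counts = hits.loc[hits["FPGA"] == fpga, "TDC_CHANNEL"].value_counts().sort_index()
    ax.bar(counts.index, counts.values)
    ax.set_title(f"FPGA {fpga}")
    ax.set_xlabel("TDC_CHANNEL")
axes[0].set_ylabel("counts")
fig.tight_layout()
```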