1\. **Pandas DataFrame**

This exercise analyzes a dataset containing timing information from a series of Time-to-Digital Converters (TDCs) implemented in a pair of FPGAs. Each measurement (i.e. each row of the input file) consists of a flag that specifies the type of message ('HEAD', which in this case is always 1), two addresses identifying the TDC that provided the signal ('FPGA' and 'TDC_CHANNEL'), and the timing information ('ORBIT_CNT', 'BX_COUNTER', and 'TDC_MEAS'). Each TDC count corresponds to 25/30 ns, a unit of BX_COUNTER corresponds to 25 ns, and ORBIT_CNT is increased every `x` BX_COUNTER units. The time is thus stored in a way similar to hours, minutes, and seconds.

In [3]:
# If you haven't downloaded it yet, please get the data file with wget
#!wget https://www.dropbox.com/s/xvjzaxzz3ysphme/data_000637.txt -P ./data/
FILE = "./data/data_000637.txt"
NROWMIN = 10000

# Time units, by analogy with hours, minutes and seconds:
ORBIT_TIME = 1 # BX per orbit; placeholder, replaced by the estimate from step 2
BX_TIME = 25 # ns per BX_COUNTER unit
TDC_TIME = 25/30 # ns per TDC_MEAS count

import numpy as np
import pandas as pd

1\. Create a Pandas DataFrame reading N rows of the `data/data_000637.txt` dataset. Choose N to be smaller than or equal to the maximum number of rows and larger than 10k (check the documentation).

In [4]:
def fromFileCreateDataFrame(iFile):
    oDataFrame = pd.read_csv(iFile, sep=",")
    nRowMax = max(NROWMIN, len(oDataFrame))
    return oDataFrame[:nRowMax]

dataFrame = fromFileCreateDataFrame(FILE)
print(dataFrame)

         HEAD  FPGA  TDC_CHANNEL   ORBIT_CNT  BX_COUNTER  TDC_MEAS
0           1     0          123  3869200167        2374        26
1           1     0          124  3869200167        2374        27
2           1     0           63  3869200167        2553        28
3           1     0           64  3869200167        2558        19
4           1     0           64  3869200167        2760        25
...       ...   ...          ...         ...         ...       ...
1310715     1     0           62  3869211171         762        14
1310716     1     1            4  3869211171         763        11
1310717     1     0           64  3869211171         764         0
1310718     1     0          139  3869211171         769         0
1310719     1     0           61  3869211171         762        18

[1310720 rows x 6 columns]
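
`pd.read_csv` accepts an `nrows` argument, so the N rows can also be read directly instead of loading the whole file and slicing it afterwards. A minimal sketch, using a small in-memory CSV in place of the data file; the helper name `read_first_rows` and the toy input are illustrative, and the comma-separated layout with a header row is assumed from the cell above:

```python
import io
import pandas as pd

def read_first_rows(source, n_rows):
    """Read at most n_rows data rows from a CSV path or file-like object."""
    return pd.read_csv(source, sep=",", nrows=n_rows)

# Toy stand-in for ./data/data_000637.txt (first rows of the printout above)
toy_csv = io.StringIO(
    "HEAD,FPGA,TDC_CHANNEL,ORBIT_CNT,BX_COUNTER,TDC_MEAS\n"
    "1,0,123,3869200167,2374,26\n"
    "1,0,124,3869200167,2374,27\n"
    "1,0,63,3869200167,2553,28\n"
)
toy_frame = read_first_rows(toy_csv, 2)  # only the first two data rows are parsed
print(toy_frame)
```

With the real file, the call would take `FILE` and any N above `NROWMIN`.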


2\. Estimate the number of BX in an ORBIT (the value `x`).

*Hint*: check when the BX counter reaches the maximum value before being reset to 0.

In [5]:
def estimationBXinOrbit(iDataFrame):
    # Counts the rows in which BX_COUNTER equals 0
    colBX = iDataFrame["BX_COUNTER"]
    oCount = 0
    for value in colBX:
        if value == 0:
            oCount += 1
    return oCount

ORBIT_TIME = estimationBXinOrbit(dataFrame)
print(ORBIT_TIME)


354
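
The cell above counts the rows in which `BX_COUNTER` equals 0, which measures how often a value of 0 is recorded rather than the value at which the counter wraps. Following the hint, a minimal sketch takes the largest observed value and adds one, since the counter starts at 0. It assumes the sample is large enough for the maximum to be reached at least once; `estimate_bx_per_orbit` and the toy values are illustrative.

```python
import pandas as pd

def estimate_bx_per_orbit(frame):
    """Estimate x as (largest BX_COUNTER seen) + 1, since counting starts at 0."""
    return int(frame["BX_COUNTER"].max()) + 1

# Toy counter that wraps after 7, so one orbit holds 8 BX
toy = pd.DataFrame({"BX_COUNTER": [5, 6, 7, 0, 1, 2]})
print(estimate_bx_per_orbit(toy))  # 8
```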


3\. Create a new column with the absolute time in ns (as a combination of the other three columns with timing information) since the beginning of the data acquisition, and convert the new column to a Time Series.

*Hint:* introduce an offset to the absolute time such that the start of the data acquisition (i.e. the first entry) is zero.

In [6]:
def addAbsoluteTime(ioDataFrame):
    OrbitCol = ioDataFrame["ORBIT_CNT"]
    BXCol = ioDataFrame["BX_COUNTER"]
    TDCCol = ioDataFrame["TDC_MEAS"]

    ref = OrbitCol[0]*ORBIT_TIME*360000+BXCol[0]*BX_TIME*6000+TDCCol[0]*TDC_TIME*1000
    TimeInNs = []

    for i in range(len(OrbitCol)):
        TimeInNs.append(abs(OrbitCol[i]*ORBIT_TIME*360000+BXCol[i]*BX_TIME*6000+TDCCol[i]*TDC_TIME*1000-ref))

    ioDataFrame.insert(len(ioDataFrame.columns),"ABS_TIME",TimeInNs)

    return(ioDataFrame)

dataFrame = addAbsoluteTime(dataFrame)
print(dataFrame)

         HEAD  FPGA  TDC_CHANNEL   ORBIT_CNT  BX_COUNTER  TDC_MEAS  \
0           1     0          123  3869200167        2374        26   
1           1     0          124  3869200167        2374        27   
2           1     0           63  3869200167        2553        28   
3           1     0           64  3869200167        2558        19   
4           1     0           64  3869200167        2760        25   
...       ...   ...          ...         ...         ...       ...   
1310715     1     0           62  3869211171         762        14   
1310716     1     1            4  3869211171         763        11   
1310717     1     0           64  3869211171         764         0   
1310718     1     0          139  3869211171         769         0   
1310719     1     0           61  3869211171         762        18   

             ABS_TIME  
0        0.000000e+00  
1        8.320000e+02  
2        2.685171e+07  
3        2.759411e+07  
4        5.789920e+07  
...            
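
With the units given in the introduction (25 ns per `BX_COUNTER` unit, 25/30 ns per `TDC_MEAS` count, `x` BX per orbit), the absolute time can also be computed on whole columns. The sketch below is an alternative to the loop above, not a transcription of it: it applies the nanosecond conversion factors directly (the cell above uses the factors 360000, 6000, and 1000), subtracts the first row from each counter before converting so that the large orbit numbers do not cost floating-point precision, and stores the result as a timedelta Series. The names `absolute_time_ns`, `BX_NS`, and `TDC_NS` are illustrative, and the value 3564 passed as `bx_per_orbit` is an assumption for the example, not a result taken from this notebook.

```python
import pandas as pd

BX_NS = 25.0          # ns per BX_COUNTER unit (from the exercise text)
TDC_NS = 25.0 / 30.0  # ns per TDC_MEAS count (from the exercise text)

def absolute_time_ns(frame, bx_per_orbit):
    """Time in ns since the first row, computed column-wise."""
    # Subtract the first row while the counters are still integers
    d_orbit = frame["ORBIT_CNT"] - frame["ORBIT_CNT"].iloc[0]
    d_bx = frame["BX_COUNTER"] - frame["BX_COUNTER"].iloc[0]
    d_tdc = frame["TDC_MEAS"] - frame["TDC_MEAS"].iloc[0]
    return (d_orbit * bx_per_orbit + d_bx) * BX_NS + d_tdc * TDC_NS

toy = pd.DataFrame({
    "ORBIT_CNT": [3869200167, 3869200167, 3869200168],
    "BX_COUNTER": [2374, 2553, 2374],
    "TDC_MEAS": [26, 26, 26],
})
toy["ABS_TIME_NS"] = absolute_time_ns(toy, bx_per_orbit=3564)  # 3564 is an assumed value
toy["ABS_TIME"] = pd.to_timedelta(toy["ABS_TIME_NS"], unit="ns")
print(toy[["ABS_TIME_NS", "ABS_TIME"]])
```

Because the first row is used as the reference, its absolute time is zero by construction, as the hint asks.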

4\. Find out the duration of the data taking in hours, minutes and seconds, by using the features of the Time Series. Perform this check reading the whole dataset.

In [7]:
def changeFormat(ioDataFrame):
    ioDataFrame["ABS_TIME"] = pd.to_datetime(ioDataFrame["ABS_TIME"],origin=ioDataFrame["ABS_TIME"][0])
    ioDataFrame["ABS_TIME"] = ioDataFrame["ABS_TIME"].dt.strftime("%H:%M:%S:%f")
    return(ioDataFrame)

dataFrame = changeFormat(dataFrame)
print(dataFrame)

         HEAD  FPGA  TDC_CHANNEL   ORBIT_CNT  BX_COUNTER  TDC_MEAS  \
0           1     0          123  3869200167        2374        26   
1           1     0          124  3869200167        2374        27   
2           1     0           63  3869200167        2553        28   
3           1     0           64  3869200167        2558        19   
4           1     0           64  3869200167        2760        25   
...       ...   ...          ...         ...         ...       ...   
1310715     1     0           62  3869211171         762        14   
1310716     1     1            4  3869211171         763        11   
1310717     1     0           64  3869211171         764         0   
1310718     1     0          139  3869211171         769         0   
1310719     1     0           61  3869211171         762        18   

                ABS_TIME  
0        00:00:00:000000  
1        00:00:00:000000  
2        00:00:00:026851  
3        00:00:00:027594  
4        00:00:00:057899
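
If the absolute time is kept as a timedelta Series rather than formatted into strings, the duration of the run is the difference between its largest and smallest entries, and the `components` attribute of the resulting `Timedelta` splits it into hours, minutes, and seconds. A minimal sketch on hypothetical timestamps (the values below are made up for illustration and are not taken from the dataset; `run_duration` is an illustrative name):

```python
import pandas as pd

def run_duration(abs_time):
    """Return the span of a timedelta Series as a pandas Timedelta."""
    return abs_time.max() - abs_time.min()

# Hypothetical absolute times in ns
toy_time = pd.to_timedelta(pd.Series([0, 1_500_000_000, 3_723_000_000_000]), unit="ns")
duration = run_duration(toy_time)
parts = duration.components  # named fields: days, hours, minutes, seconds, ...
print(f"{parts.hours} h {parts.minutes} min {parts.seconds} s")
```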

5\. Use the `.groupby()` method to find the noisy channels, i.e. the TDC channels with the most counts (print the top 3 and the corresponding counts to the screen).

In [9]:
def noisyChannel(iDataFrame,iNumberOfChannels):
    groupedByTDCChannel = iDataFrame.groupby("TDC_CHANNEL").agg(countChannels=("TDC_CHANNEL","count"))
    print(groupedByTDCChannel.sort_values(by="countChannels",ascending=False)[:iNumberOfChannels])

noisyChannel(dataFrame,3)


             countChannels
TDC_CHANNEL               
139                 108059
64                   66020
63                   64642


6\. Count the number of non-empty orbits (i.e. the number of orbits with at least one hit).

In [10]:
def nonEmptyOrbitsCount(iDataFrame):
    orbits = iDataFrame["ORBIT_CNT"].unique()
    print(pd.DataFrame({"ORBIT_CNT":orbits}))
    
nonEmptyOrbitsCount(dataFrame)


        ORBIT_CNT
0      3869200167
1      3869200168
2      3869200169
3      3869200170
4      3869200171
...           ...
10996  3869211167
10997  3869211168
10998  3869211169
10999  3869211170
11000  3869211171

[11001 rows x 1 columns]


7\. Count the number of unique orbits with at least one measurement from TDC_CHANNEL=139.

In [11]:
def uniqueOrbitsFromChannel(iDataFrame,iChannelNumber):
    filteredDataFrame = iDataFrame[iDataFrame.TDC_CHANNEL == iChannelNumber]
    nonEmptyOrbitsCount(filteredDataFrame)

uniqueOrbitsFromChannel(dataFrame,139)


        ORBIT_CNT
0      3869200167
1      3869200168
2      3869200169
3      3869200170
4      3869200171
...           ...
10971  3869211167
10972  3869211168
10973  3869211169
10974  3869211170
10975  3869211171

[10976 rows x 1 columns]
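
The cells for steps 6 and 7 print the unique orbit numbers, so the requested counts appear only as the row totals of the printouts (11001 and 10976). A sketch that returns the counts directly with `Series.nunique()`; the function name `count_orbits` and the toy rows are illustrative:

```python
import pandas as pd

def count_orbits(frame, channel=None):
    """Number of distinct ORBIT_CNT values, optionally for one TDC channel only."""
    if channel is not None:
        frame = frame[frame["TDC_CHANNEL"] == channel]
    return frame["ORBIT_CNT"].nunique()

toy = pd.DataFrame({
    "ORBIT_CNT":   [10, 10, 11, 12, 12],
    "TDC_CHANNEL": [139, 64, 63, 139, 139],
})
print(count_orbits(toy))       # 3 orbits with at least one hit
print(count_orbits(toy, 139))  # 2 orbits with a hit on channel 139
```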


8\. Create two Series (one for each FPGA) that have the TDC channel as index, and the number of counts for the corresponding TDC channel as values.

In [12]:
def serieFPGA(iDataFrame,iValue):
    filteredDataFrame = iDataFrame[iDataFrame.FPGA == iValue]
    values = filteredDataFrame.TDC_CHANNEL.value_counts()
    index =  filteredDataFrame.TDC_CHANNEL.unique()
    newDataFrame = pd.DataFrame({"TDC_CHANNEL":values},index=index).reindex(index.sort())
    return(newDataFrame)

serieZero = serieFPGA(dataFrame,0)
print(serieZero)
serieOne = serieFPGA(dataFrame,1)
print(serieOne)

     TDC_CHANNEL
1          17994
2          16463
3          63724
4          64581
5          48699
..           ...
129           32
130           34
137            2
138            4
139            1

[124 rows x 1 columns]
     TDC_CHANNEL
1          13646
2          18869
3          32442
4          17813
5          15003
..           ...
129          196
130           38
137           36
138           36
139           35

[132 rows x 1 columns]
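
In the cell above, `index.sort()` sorts the NumPy array in place and returns `None`, which appears to reorder the labels without moving the counts. The printout is consistent with that: channel 139 shows 1 and 35 counts for the two FPGAs, while step 5 reports 108059 hits on that channel overall. A sketch that keeps each count attached to its channel, using `value_counts()` followed by `sort_index()` and returning a Series as the task asks; `channel_counts` and the toy rows are illustrative:

```python
import pandas as pd

def channel_counts(frame, fpga):
    """Series indexed by TDC channel with the number of hits recorded on one FPGA."""
    selected = frame[frame["FPGA"] == fpga]
    # value_counts() pairs each channel with its count; sort_index() orders by channel
    counts = selected["TDC_CHANNEL"].value_counts().sort_index()
    return counts.rename("counts")

toy = pd.DataFrame({
    "FPGA":        [0, 0, 0, 1, 1, 0],
    "TDC_CHANNEL": [139, 64, 139, 4, 4, 63],
})
fpga0 = channel_counts(toy, 0)
fpga1 = channel_counts(toy, 1)
print(fpga0)
print(fpga1)
```

As a sanity check on the real data, the two Series added together should reproduce the per-channel totals found in step 5.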


9\. **Optional:** Create two histograms (one for each FPGA) that show the number of counts for each TDC channel.