1\. **Pandas DataFrame**

This exercise consists of analyzing a dataset containing timing information from a series of Time-to-Digital Converters (TDCs) implemented in a pair of FPGAs. Each measurement (i.e. each row of the input file) consists of a flag that specifies the type of message ('HEAD', which in this case is always 1), two addresses of the TDC providing the signal ('FPGA' and 'TDC_CHANNEL'), and the timing information ('ORBIT_CNT', 'BX_COUNTER', and 'TDC_MEAS'). Each TDC count corresponds to 25/30 ns, whereas a unit of BX_COUNTER corresponds to 25 ns, and the ORBIT_CNT is increased every `x` BX_COUNTER. This allows the time to be stored in a way similar to hours, minutes and seconds.
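
Taking the hours/minutes/seconds analogy literally, and assuming the three counters simply add up once each is converted to ns, the absolute time of a hit could be written as:

$$
t\,[\mathrm{ns}] = 25 \left( x \cdot \mathrm{ORBIT\_CNT} + \mathrm{BX\_COUNTER} + \frac{\mathrm{TDC\_MEAS}}{30} \right)
$$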

In [1]:
# If you haven't downloaded it yet, please get the data file with wget
#!wget https://www.dropbox.com/s/xvjzaxzz3ysphme/data_000637.txt -P ./data/

1\. Create a Pandas DataFrame reading N rows of the `data/data_000637.txt` dataset. Choose N to be smaller than or equal to the maximum number of rows and larger than 10k (check the documentation).

In [2]:
import pandas as pd # standard naming convention
import numpy as np

file_name = "./data/data_000637.txt"
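
def read_data(n):
    # count the rows by loading a single column, which is cheaper than parsing the whole file
    row_count = pd.read_csv(file_name, usecols=[0]).shape[0]
    # n must lie between 10k and the number of rows in the file
    if not (10000 <= n <= row_count):
        raise ValueError(f'n is not valid: expected 10000 <= n <= {row_count}, got {n}')
    return pd.read_csv(file_name, nrows=n)

df = read_data(1310720)
df
# we have 1310720 rows × 6 columns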

Unnamed: 0,HEAD,FPGA,TDC_CHANNEL,ORBIT_CNT,BX_COUNTER,TDC_MEAS
0,1,0,123,3869200167,2374,26
1,1,0,124,3869200167,2374,27
2,1,0,63,3869200167,2553,28
3,1,0,64,3869200167,2558,19
4,1,0,64,3869200167,2760,25
...,...,...,...,...,...,...
1310715,1,0,62,3869211171,762,14
1310716,1,1,4,3869211171,763,11
1310717,1,0,64,3869211171,764,0
1310718,1,0,139,3869211171,769,0
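
The `nrows` argument of `pd.read_csv` is what limits how many rows are parsed. A minimal sketch on an in-memory CSV follows; the toy values and the names `csv_text` and `sample` are made up for illustration and are not taken from the dataset:

```python
import io

import pandas as pd

# Toy CSV with the same column names as the dataset (values are illustrative)
csv_text = """HEAD,FPGA,TDC_CHANNEL,ORBIT_CNT,BX_COUNTER,TDC_MEAS
1,0,123,100,2374,26
1,0,124,100,2374,27
1,1,63,100,2553,28
1,0,64,101,12,19
"""

# nrows counts data rows only; the header line is not included
sample = pd.read_csv(io.StringIO(csv_text), nrows=3)
print(sample.shape)
```

Only the first three data rows are parsed here, even though the text contains four.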


2\. Estimate the number of BX in an ORBIT (the value `x`).

*Hint*: check when the BX counter reaches the maximum value before being reset to 0.

In [226]:
# we check the maximum value of 'BX_COUNTER' in every orbit before it is reset to 0
dfmax = df.groupby(['ORBIT_CNT'])['BX_COUNTER'].max()
print(dfmax)

# to estimate the number of BX in an orbit we take the mean of these maxima
print('\nEstimation of x :', dfmax.mean())

ORBIT_CNT
3869200167    3187
3869200168    3538
3869200169    2766
3869200170    3377
3869200171    3542
              ... 
3869211167    3553
3869211168    3556
3869211169    3498
3869211170    3527
3869211171     769
Name: BX_COUNTER, Length: 11001, dtype: int64

Estimation of x : 3280.1814380510864
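
The mean of the per-orbit maxima is pulled down by orbits in which no hit lands near the end, such as the truncated last orbit above. A possible alternative is to take the overall maximum plus one. This assumes the counter runs from 0 up to `x - 1` before wrapping and that at least one hit reaches that last value. The sketch below uses made-up toy numbers, not the dataset:

```python
import pandas as pd

# Toy data: a counter that wraps after 9, so x = 10 in this made-up example
toy = pd.DataFrame({
    "ORBIT_CNT":  [1, 1, 1, 2, 2, 3, 3, 3],
    "BX_COUNTER": [2, 5, 9, 1, 7, 0, 4, 8],
})

per_orbit_max = toy.groupby("ORBIT_CNT")["BX_COUNTER"].max()
mean_estimate = per_orbit_max.mean()        # averages the maxima 9, 7 and 8
max_estimate = toy["BX_COUNTER"].max() + 1  # largest value seen, plus one for the 0 slot

print(mean_estimate, max_estimate)
```

In this toy case the mean gives 8.0, while the maximum-based estimate recovers the 10 slots of the counter.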


3\. Create a new column with the absolute time in ns (as a combination of the other three columns with timing information) since the beginning of the data acquisition, and convert the new column to a Time Series.

In [227]:
import datetime as dt
# we create a new column by combining the three columns with timing information (like hours, minutes and seconds)
# note: this adds the raw counts as they are, without converting each counter to ns first
df['time'] = df['ORBIT_CNT'] + df['BX_COUNTER'] + df['TDC_MEAS']
# we compute the absolute time since the beginning of the data acquisition (the first row)
df['time'] = np.absolute(df['time'] - df['time'].iloc[0])
# we convert the new column to a Time Series, reading the values as ns
df['time'] = pd.to_datetime(df['time'], unit='ns')
df

Unnamed: 0,HEAD,FPGA,TDC_CHANNEL,ORBIT_CNT,BX_COUNTER,TDC_MEAS,time
0,1,0,123,3869200167,2374,26,1970-01-01 00:00:00.000000000
1,1,0,124,3869200167,2374,27,1970-01-01 00:00:00.000000001
2,1,0,63,3869200167,2553,28,1970-01-01 00:00:00.000000181
3,1,0,64,3869200167,2558,19,1970-01-01 00:00:00.000000177
4,1,0,64,3869200167,2760,25,1970-01-01 00:00:00.000000385
...,...,...,...,...,...,...,...
1310715,1,0,62,3869211171,762,14,1970-01-01 00:00:00.000009380
1310716,1,1,4,3869211171,763,11,1970-01-01 00:00:00.000009378
1310717,1,0,64,3869211171,764,0,1970-01-01 00:00:00.000009368
1310718,1,0,139,3869211171,769,0,1970-01-01 00:00:00.000009373
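
The cell above sums the raw counters, so the resulting values are not yet in ns. A unit-aware variant could weight each counter by its duration, as sketched below. `X_ASSUMED` is a placeholder for the estimate of `x` and is an assumption, as are the name `hits` and the fourth toy row. The first three rows reuse the first rows printed above.

```python
import pandas as pd

X_ASSUMED = 3564  # hypothetical number of BX per orbit; replace with your own estimate

hits = pd.DataFrame({
    "ORBIT_CNT":  [3869200167, 3869200167, 3869200167, 3869200168],
    "BX_COUNTER": [2374, 2374, 2553, 10],
    "TDC_MEAS":   [26, 27, 28, 0],
})

# Subtract the first hit counter by counter, so the numbers stay small before scaling
first = hits.iloc[0]
d_orbit = hits["ORBIT_CNT"] - first["ORBIT_CNT"]
d_bx = hits["BX_COUNTER"] - first["BX_COUNTER"]
d_tdc = hits["TDC_MEAS"] - first["TDC_MEAS"]

# One BX lasts 25 ns, one orbit lasts X_ASSUMED BX, one TDC count lasts 25/30 ns
elapsed_ns = 25.0 * (d_orbit * X_ASSUMED + d_bx + d_tdc / 30.0)

# Timedelta has 1 ns resolution, so the sub-ns fraction is rounded away here
hits["elapsed"] = pd.to_timedelta(elapsed_ns.round().astype("int64"), unit="ns")
print(hits)
```

Keeping `elapsed_ns` as a float column next to the rounded `elapsed` column preserves the sub-ns part of the TDC measurement.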


4\. Find out the duration of the data taking in hours, minutes and seconds, by using the features of the Time Series. Perform this check reading the whole dataset.

In [228]:
# the first row sits at the epoch, so the last timestamp is the time elapsed since the start
df['time'].iloc[-1]

Timestamp('1970-01-01 00:00:00.000009384')
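
To report a duration as hours, minutes and seconds, one option is to subtract two time values and read the `components` of the resulting `Timedelta`. This is a sketch on made-up values; the name `times` and the numbers are illustrative only:

```python
import pandas as pd

# Toy elapsed times in ns; the last one is 3725 s after the first
times = pd.Series(pd.to_timedelta([0, 1_500_000_000, 3_725_000_000_000], unit="ns"))

duration = times.max() - times.min()
c = duration.components  # named fields: days, hours, minutes, seconds, ...

print(f"{c.hours} h {c.minutes} min {c.seconds} s")
```

Note that `hours` does not include whole days, which are reported separately in `c.days`.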

5\. Use the `.groupby()` method to find out the noisy channels, i.e. the TDC channels with most counts (print to screen the top 3 and the corresponding counts)

In [229]:
# groupby(['TDC_CHANNEL'])['TDC_CHANNEL'].count() -> counts how many times each TDC channel appears
# we sort the counts in descending order and use .iloc[:3] to print only the top 3
df.groupby(['TDC_CHANNEL'])['TDC_CHANNEL'].count().sort_values(ascending=False).iloc[:3]

TDC_CHANNEL
139    108059
64      66020
63      64642
Name: TDC_CHANNEL, dtype: int64
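
A more compact way to get the same kind of ranking is `groupby(...).size()` followed by `nlargest`. The sketch below runs on a made-up toy column, not on the dataset:

```python
import pandas as pd

toy = pd.DataFrame({"TDC_CHANNEL": [139, 139, 139, 139, 64, 64, 64, 63, 63, 5]})

# size() counts the rows in each group; nlargest(3) keeps the three biggest counts
top3 = toy.groupby("TDC_CHANNEL").size().nlargest(3)
print(top3)
```

`nlargest` returns the result already sorted in descending order, so no separate `sort_values` call is needed.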

6\. Count the number of non-empty orbits (i.e. the number of orbits with at least one hit).

In [230]:
# drop_duplicates(keep='first') : we keep only the first occurrence of every orbit
# every orbit present in the data has at least one hit, so counting them answers the question
df['ORBIT_CNT'].drop_duplicates(keep="first").count()

11001

7\. Count the number of unique orbits with at least one measurement from TDC_CHANNEL=139.

In [231]:
df7 = df.loc[df['TDC_CHANNEL'] == 139] # we select the rows of the DataFrame where TDC_CHANNEL = 139
# same as in question 6 : we drop the duplicates and count the remaining orbits
df7['ORBIT_CNT'].drop_duplicates(keep="first").count()

10976
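
Both counts can also be obtained with `Series.nunique()`, which skips the explicit de-duplication step. This is a sketch on made-up toy rows; the names `toy` and `is_139` are illustrative:

```python
import pandas as pd

toy = pd.DataFrame({
    "ORBIT_CNT":   [10, 10, 11, 12, 12, 12],
    "TDC_CHANNEL": [139, 5, 7, 139, 139, 8],
})

# Orbits with at least one hit
n_orbits = toy["ORBIT_CNT"].nunique()

# Orbits with at least one hit on channel 139
is_139 = toy["TDC_CHANNEL"] == 139
n_orbits_139 = toy.loc[is_139, "ORBIT_CNT"].nunique()

print(n_orbits, n_orbits_139)
```

Orbit 12 has two hits on channel 139 in the toy data but is counted once.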

8\. Create two Series (one for each FPGA) that have the TDC channel as index, and the number of counts for the corresponding TDC channel as values.

In [232]:
fpga0 = df[df['FPGA'] == 0].groupby(['TDC_CHANNEL'])['TDC_CHANNEL'].count()
fpga1 = df[df['FPGA'] == 1].groupby(['TDC_CHANNEL'])['TDC_CHANNEL'].count()
print("\nFPGA1 : \n",fpga1.sort_values(ascending=False))
print("\nFPGA1 type : \n",type(fpga1))
print("\nFPGA1 index : \n",fpga1.index)

print("\nFPGA0 : \n",fpga0.sort_values(ascending=False))
print("\nFPGA0 type : \n",type(fpga0))
print("\nFPGA0 index : \n",fpga0.index)


FPGA1 : 
 TDC_CHANNEL
2      32669
139    32442
1      28438
4      26403
3      21970
       ...  
9         80
130       38
137       36
138       36
129       35
Name: TDC_CHANNEL, Length: 132, dtype: int64

FPGA1 type : 
 <class 'pandas.core.series.Series'>

FPGA1 index : 
 Int64Index([  1,   2,   3,   4,   5,   6,   7,   8,   9,  10,
            ...
            123, 125, 126, 127, 128, 129, 130, 137, 138, 139],
           dtype='int64', name='TDC_CHANNEL', length=132)

FPGA0 : 
 TDC_CHANNEL
139    75617
64     64581
63     63724
61     48699
62     48275
       ...  
130       33
137       32
30         4
129        2
39         1
Name: TDC_CHANNEL, Length: 124, dtype: int64

FPGA0 type : 
 <class 'pandas.core.series.Series'>

FPGA0 index : 
 Int64Index([  1,   2,   3,   4,   5,   6,   7,   8,   9,  10,
            ...
            120, 121, 122, 123, 124, 129, 130, 137, 138, 139],
           dtype='int64', name='TDC_CHANNEL', length=124)
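
The two Series can also be derived from a single grouping on both keys, taking one cross-section per FPGA. This is a sketch on made-up toy rows, not the dataset:

```python
import pandas as pd

toy = pd.DataFrame({
    "FPGA":        [0, 0, 0, 1, 1, 0],
    "TDC_CHANNEL": [1, 1, 2, 1, 3, 2],
})

# One pass over the data: counts indexed by (FPGA, TDC_CHANNEL)
counts = toy.groupby(["FPGA", "TDC_CHANNEL"]).size()

# xs selects one FPGA and drops that level, leaving TDC_CHANNEL as the index
fpga0 = counts.xs(0, level="FPGA")
fpga1 = counts.xs(1, level="FPGA")

print(fpga0)
print(fpga1)
```

This avoids filtering the DataFrame once per FPGA, which may matter if more boards are added.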


9\. **Optional:** Create two histograms (one for each FPGA) that show the number of counts for each TDC channel.
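
One possible approach, sketched here on made-up toy rows and assuming matplotlib is available, is to draw one bar chart per FPGA with the channel on the x axis and the counts on the y axis:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

toy = pd.DataFrame({
    "FPGA":        [0, 0, 0, 1, 1, 0],
    "TDC_CHANNEL": [1, 1, 2, 1, 3, 2],
})

fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
for ax, fpga in zip(axes, [0, 1]):
    # Counts per channel for this FPGA, ordered by channel number
    counts = toy.loc[toy["FPGA"] == fpga, "TDC_CHANNEL"].value_counts().sort_index()
    ax.bar(counts.index, counts.values)
    ax.set_title(f"FPGA {fpga}")
    ax.set_xlabel("TDC_CHANNEL")
axes[0].set_ylabel("counts")
fig.tight_layout()
```

Sharing the y axis makes the two panels directly comparable; with very different occupancies a separate or logarithmic scale may read better.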