1\. **Pandas DataFrame**

This exercise consists of analyzing a dataset containing timing information from a series of Time-to-Digital Converters (TDCs) implemented in a pair of FPGAs. Each measurement (i.e. each row of the input file) consists of a flag that specifies the type of message ('HEAD', which in this case is always 1), two addresses of the TDC providing the signal ('FPGA' and 'TDC_CHANNEL'), and the timing information ('ORBIT_CNT', 'BX_COUNTER', and 'TDC_MEAS'). Each TDC count corresponds to 25/30 ns, whereas a unit of BX_COUNTER corresponds to 25 ns, and ORBIT_CNT is increased every `x` BX_COUNTER units. This allows the time to be stored in a way similar to hours, minutes and seconds.
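
The unit relations above can be sketched as a small conversion function. This is only an illustration under the stated units: the function name `to_ns` and the parameter `bx_per_orbit` (standing for the unknown `x`) are assumptions, and the numbers in the call are made up to keep the arithmetic readable.

```python
def to_ns(orbit_cnt, bx_counter, tdc_meas, bx_per_orbit):
    """Combine the three counters into a single time in ns.

    bx_per_orbit plays the role of the exercise's `x`; it is a parameter
    here because its value has to be estimated from the data.
    """
    # One orbit is bx_per_orbit BX units, and one BX unit is 25 ns
    total_bx = orbit_cnt * bx_per_orbit + bx_counter
    # One TDC count is 25/30 ns
    return total_bx * 25 + tdc_meas * 25 / 30


# Hypothetical values: (1 * 10 + 2) * 25 ns + 3 * 25/30 ns
print(to_ns(orbit_cnt=1, bx_counter=2, tdc_meas=3, bx_per_orbit=10))  # 302.5
```

The analogy with a clock is direct: ORBIT_CNT plays the role of hours, BX_COUNTER of minutes, and TDC_MEAS of seconds, each with its own conversion factor.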

In [1]:
# If you haven't downloaded it yet, please get the data file with wget
#!wget https://www.dropbox.com/s/xvjzaxzz3ysphme/data_000637.txt -P ./data/

1\. Create a Pandas DataFrame reading N rows of the `data/data_000637.txt` dataset. Choose N to be smaller than or equal to the maximum number of rows and larger than 10k (check the documentation).

In [2]:
import pandas as pd
# Choose the number of rows to read with the variable N
N = 69000
df = pd.read_csv('./data/data_000637.txt', nrows=N)
table = df.copy()  # working copy; df keeps the raw counters
print(table)



       HEAD  FPGA  TDC_CHANNEL   ORBIT_CNT  BX_COUNTER  TDC_MEAS
0         1     0          123  3869200167        2374        26
1         1     0          124  3869200167        2374        27
2         1     0           63  3869200167        2553        28
3         1     0           64  3869200167        2558        19
4         1     0           64  3869200167        2760        25
...     ...   ...          ...         ...         ...       ...
68995     1     0           51  3869200853        1391         7
68996     1     0          139  3869200853        1395         0
68997     1     0           62  3869200853        1387        24
68998     1     0           46  3869200853        1391         5
68999     1     0           49  3869200853        1396        27

[69000 rows x 6 columns]


2\. Estimate the number of BX in an ORBIT (the value `x`).

*Hint*: check when the BX counter reaches the maximum value before being reset to 0.

In [3]:
maximum = table['BX_COUNTER'].max()
print(f'the maximum number for "BX_COUNTER" columns is : \n {maximum} ')

the maximum number for "BX_COUNTER" columns is : 
 3563 
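
A counter that runs from 0 up to its maximum and then resets takes maximum + 1 distinct values, so the estimate of `x` is the printed maximum plus one. The sketch below shows this on a toy counter (hypothetical data, not the dataset) and then applies the same rule to the value printed above. It assumes the sample is long enough for the counter to have actually reached its maximum.

```python
import pandas as pd

# Toy stand-in for BX_COUNTER: runs 0..4 and then resets (hypothetical data)
bx = pd.Series([3, 4, 0, 1, 2, 3, 4, 0, 1])

# Values 0..max are max + 1 distinct counts, which is the period of the counter
x_toy = int(bx.max()) + 1
print(x_toy)  # 5

# Same rule applied to the maximum printed above
x = 3563 + 1
print(x)  # 3564
```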


3\. Create a new column with the absolute time in ns (as a combination of the other three columns with timing information) since the beginning of the data acquisition, and convert the new column to a Time Series.

*Hint:* introduce an offset to the absolute time such that the start of the data acquisition (i.e. the first entry) is zero.

In [4]:
x = table['BX_COUNTER'].max() + 1  # BX per orbit: maximum counter value + 1
abs_ns = (table['ORBIT_CNT'] * x + table['BX_COUNTER']) * 25 + table['TDC_MEAS'] * 25 / 30  # all terms in ns
table['ABSOLUTE_TIME'] = pd.to_timedelta(abs_ns - abs_ns.iloc[0], unit='ns')  # first entry at zero
print(table)

4\. Find out the duration of the data taking in hours, minutes and seconds, by using the features of the Time Series. Perform this check reading the whole dataset.

In [5]:
full = pd.read_csv('./data/data_000637.txt')  # whole dataset: no nrows
abs_ns = (full['ORBIT_CNT'] * (full['BX_COUNTER'].max() + 1) + full['BX_COUNTER']) * 25 + full['TDC_MEAS'] * 25 / 30
duration = pd.to_timedelta(abs_ns.max() - abs_ns.min(), unit='ns')
print(f"The duration of the whole data taking : \n {duration}")
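
A `Timedelta` can be split into hours, minutes and seconds through its `components` attribute. The following is a minimal sketch with a made-up duration (not the dataset's), only to show the mechanics.

```python
import pandas as pd

duration = pd.Timedelta(seconds=3725)  # hypothetical duration, for illustration only
parts = duration.components  # named tuple: days, hours, minutes, seconds, ...

# Fold whole days into the hour count so nothing is lost for long runs
hours = parts.days * 24 + parts.hours
print(f"{hours} h {parts.minutes} min {parts.seconds} s")  # 1 h 2 min 5 s
```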


5\. Use the `.groupby()` method to find out the noisy channels, i.e. the TDC channels with the most counts (print to screen the top 3 and the corresponding counts).

In [6]:
# Count the hits per TDC channel on the raw frame, then keep the three largest counts
channel_counts = df.groupby('TDC_CHANNEL')['TDC_CHANNEL'].count()
top_3 = channel_counts.sort_values(ascending=False).head(3)
print(f'the top 3 noisy channels and their counts are : \n {top_3}')


6\. Count the number of non-empty orbits (i.e. the number of orbits with at least one hit).

In [7]:
# Every ORBIT_CNT value present in the table has at least one hit, so count the distinct values
size_of_orbit = table['ORBIT_CNT'].nunique()
print(f"the number of orbits with at least one hit is : \n {size_of_orbit}")

the number of orbits with at least one hit is : 
 687


7\. Count the number of unique orbits with at least one measurement from TDC_CHANNEL=139.

In [8]:
orbits = df[df['TDC_CHANNEL'] == 139]
unique = orbits['ORBIT_CNT'].nunique()
print(f'the no. of unique orbits with at least one measurement from TDC_CHANNEL=139 is : \n {unique}')

the no. of unique orbits with at least one measurement from TDC_CHANNEL=139 is : 
 686


8\. Create two Series (one for each FPGA) that have the TDC channel as index, and the number of counts for the corresponding TDC channel as values.

In [9]:
# Use the raw frame df so that FPGA and TDC_CHANNEL keep their integer values
FPGA_0 = df[df['FPGA'] == 0]['TDC_CHANNEL'].value_counts()
FPGA_1 = df[df['FPGA'] == 1]['TDC_CHANNEL'].value_counts()
print(f'Series 1 is :\n {FPGA_0}')
print(f'Series 2 is : \n {FPGA_1}')


9\. **Optional:** Create two histograms (one for each FPGA) that show the number of counts for each TDC channel.
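
One possible way to draw these histograms is sketched below with matplotlib, which is an assumption since the exercise does not name a plotting library. The helper name `plot_channel_counts` and the toy frame are hypothetical; only the column names 'FPGA' and 'TDC_CHANNEL' come from the dataset.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_channel_counts(data):
    """Draw one bar chart of hits per TDC channel for each FPGA in `data`."""
    fpgas = sorted(data['FPGA'].unique())
    fig, axes = plt.subplots(1, len(fpgas), figsize=(6 * len(fpgas), 4), squeeze=False)
    for ax, fpga in zip(axes[0], fpgas):
        # Number of rows per channel for this FPGA, ordered by channel number
        counts = data.loc[data['FPGA'] == fpga, 'TDC_CHANNEL'].value_counts().sort_index()
        ax.bar(counts.index, counts.values)
        ax.set_title(f'FPGA {fpga}')
        ax.set_xlabel('TDC_CHANNEL')
        ax.set_ylabel('counts')
    fig.tight_layout()
    return fig, axes[0]


# Toy frame (hypothetical data) just to exercise the helper
toy = pd.DataFrame({'FPGA': [0, 0, 0, 1, 1], 'TDC_CHANNEL': [139, 139, 64, 1, 2]})
fig, axes = plot_channel_counts(toy)
```

With the real data, the call would be `plot_channel_counts(df)`, where `df` is the frame read from the data file. A bar chart of `value_counts()` is used instead of `hist()` so that each channel gets exactly one bar regardless of binning.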