1\. **Pandas DataFrame**

This exercise consists of analyzing a dataset containing timing information from a series of Time-to-Digital Converters (TDC) implemented in a pair of FPGAs. Each measurement (i.e. each row of the input file) consists of a flag that specifies the type of message ('HEAD', which in this case is always 1), two addresses of the TDC providing the signal ('FPGA' and 'TDC_CHANNEL'), and the timing information ('ORBIT_CNT', 'BX_COUNTER', and 'TDC_MEAS'). Each TDC count corresponds to 25/30 ns, whereas a unit of BX_COUNTER corresponds to 25 ns, and ORBIT_CNT is increased every `x` BX_COUNTER units. This allows the time to be stored in a way similar to hours, minutes and seconds.
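As a quick check of the unit conversion described above, the sketch below turns one (ORBIT_CNT, BX_COUNTER, TDC_MEAS) triple into nanoseconds. The helper name `to_ns` is hypothetical, and the value `x = 3564` used in the example call is only an assumption for illustration; the actual `x` is what exercise 2 asks you to estimate.

```python
def to_ns(orbit_cnt: int, bx_counter: int, tdc_meas: int, bx_per_orbit: int) -> float:
    """Convert one (ORBIT_CNT, BX_COUNTER, TDC_MEAS) triple to nanoseconds.

    Units as stated in the exercise: 1 BX = 25 ns, 1 TDC count = 25/30 ns,
    and one orbit = bx_per_orbit BX (the unknown `x`).
    """
    bx_ns = (orbit_cnt * bx_per_orbit + bx_counter) * 25.0  # whole bunch crossings
    tdc_ns = tdc_meas * 25.0 / 30.0                         # fine time from the TDC
    return bx_ns + tdc_ns


# hypothetical hit: orbit 2, BX 10, TDC count 15, assuming x = 3564
print(to_ns(2, 10, 15, bx_per_orbit=3564))  # → 178462.5
```

The conversion is a mixed-radix sum, exactly like turning hours, minutes and seconds into a single number of seconds.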

In [None]:

!wget https://www.dropbox.com/s/xvjzaxzz3ysphme/data_000637.txt -P ./data/
import numpy as np
import pandas as pd

1\. Create a Pandas DataFrame reading N rows of the `data/data_000637.txt` dataset. Choose N to be smaller than or equal to the maximum number of rows and larger than 10k (check the documentation).
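The `nrows` argument of `pandas.read_csv` limits how many data rows are parsed. The snippet below demonstrates it on a small in-memory CSV built with `io.StringIO`; the values are made up and only mimic the column layout described above.

```python
import io

import pandas as pd

# made-up rows with the same columns as data_000637.txt
csv_text = (
    "HEAD,FPGA,TDC_CHANNEL,ORBIT_CNT,BX_COUNTER,TDC_MEAS\n"
    "1,0,123,100,10,5\n"
    "1,0,124,100,12,7\n"
    "1,1,61,101,3,20\n"
    "1,1,139,102,0,0\n"
)

# read only the first 2 data rows (the header line is not counted)
head = pd.read_csv(io.StringIO(csv_text), nrows=2)
print(head.shape)  # (2, 6)

# knowing the total number of rows requires parsing the whole file
full = pd.read_csv(io.StringIO(csv_text))
print(len(full))   # 4
```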

In [None]:
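# read the whole file once to find the total number of rows
dt_all = pd.read_csv('data/data_000637.txt')
print('Total rows:', len(dt_all))  # 1310720 rows in the dataset
# now read only the first N = 20000 rows (10k < N <= 1310720)
dt = pd.read_csv('data/data_000637.txt', nrows=20000)
print(dt)
print("Dataframe shape:", dt.shape)  # now we have a DataFrame with 20k rows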

2\. Estimate the number of BX in an ORBIT (the value `x`).

*Hint*: check when the BX counter reaches the maximum value before being reset to 0.
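Following the hint, the sketch below finds where the counter wraps around (a negative step between consecutive hits) and looks at the value reached just before each reset. It assumes hits are stored in time order and that the counter restarts from 0, so `x` is that maximum plus one. The toy counter, which wraps after 7, is invented.

```python
import numpy as np
import pandas as pd

# toy BX counter that wraps back to 0 after 7 (so x would be 8); values are invented
bx = pd.Series([5, 6, 7, 0, 2, 7, 1, 3])

# a reset shows up as a negative step between consecutive hits
resets = np.flatnonzero((bx.diff() < 0).to_numpy())
print("reset positions:", resets)              # [3 6]

# largest value reached just before each reset
before_reset = bx.to_numpy()[resets - 1]
print("values before reset:", before_reset)    # [7 7]

# if the top value occurs somewhere in the sample, max + 1 gives x directly
x = int(bx.max()) + 1
print("estimated BX per orbit:", x)            # 8
```

On real data, hits are sparse, so the value just before a reset is not always the true maximum; taking the maximum over many orbits is the more robust estimate.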

In [None]:
# BX_COUNTER runs from 0 up to its maximum and is then reset to 0,
# so the number of BX in one orbit is the maximum value + 1
BX_amount = dt['BX_COUNTER'].max() + 1
print('amount of BX per orbit:', BX_amount)

3\. Create a new column with the absolute time in ns (as a combination of the other three columns with timing information) since the beginning of the data acquisition, and convert the new column to a Time Series.
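One way to get a Time Series is to store the nanosecond values as timedeltas with `pd.to_timedelta(..., unit='ns')`, counting from the earliest hit. The sketch below does this on three invented hits and assumes `x = 3564` purely for illustration.

```python
import pandas as pd

BX_PER_ORBIT = 3564  # assumed value of x, for illustration only

# three invented hits
df = pd.DataFrame({
    "ORBIT_CNT": [10, 10, 11],
    "BX_COUNTER": [0, 4, 0],
    "TDC_MEAS": [0, 30, 12],
})

# absolute time in ns: orbits, then bunch crossings (25 ns), then TDC counts (25/30 ns)
ns = (df["ORBIT_CNT"] * BX_PER_ORBIT * 25
      + df["BX_COUNTER"] * 25
      + df["TDC_MEAS"] * 25 / 30)

# Time Series of timedeltas measured from the first hit
df["ABSOLUTE_TIME"] = pd.to_timedelta(ns - ns.min(), unit="ns")
print(df)
```

Subtracting the minimum keeps the numbers small and makes "time since the beginning of the acquisition" explicit.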

In [None]:
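# absolute time in ns: 1 orbit = BX_amount*25 ns, 1 BX = 25 ns, 1 TDC count = 25/30 ns
time_ns = dt['ORBIT_CNT']*BX_amount*25 + dt['BX_COUNTER']*25 + dt['TDC_MEAS']*25/30
# Time Series of timedeltas counted from the start of the acquisition
dt['ABSOLUTE_TIME'] = pd.to_timedelta(time_ns - time_ns.min(), unit='ns')
dt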

4\. Find out the duration of the data taking in hours, minutes and seconds, by using the features of the Time Series. Perform this check reading the whole dataset.
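Once the times are timedeltas, the duration is the difference between the latest and earliest timestamps, and `pd.Timedelta.components` splits it into hours, minutes and seconds. The timestamps below are invented; they only show the decomposition.

```python
import pandas as pd

# invented timestamps (Time Series of timedeltas)
times = pd.Series(pd.to_timedelta([0, 1_500_000_000, 3_723_250_000_000], unit="ns"))

# duration of the data taking = last hit minus first hit
duration = times.max() - times.min()

# split the pd.Timedelta into hours, minutes, seconds (and sub-second parts)
c = duration.components
print(f"{c.hours} h {c.minutes} min {c.seconds} s {c.milliseconds} ms")  # 1 h 2 min 3 s 250 ms
print(duration.total_seconds())  # 3723.25
```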

In [None]:
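# read the whole dataset this time
dt = pd.read_csv('data/data_000637.txt')
time_ns = dt['ORBIT_CNT']*BX_amount*25 + dt['BX_COUNTER']*25 + dt['TDC_MEAS']*25/30
dt['ABSOLUTE_TIME'] = pd.to_timedelta(time_ns - time_ns.min(), unit='ns')
# duration of the data taking as a pd.Timedelta, split into its components
duration = dt['ABSOLUTE_TIME'].max() - dt['ABSOLUTE_TIME'].min()
c = duration.components
print(f'duration: {c.hours} h {c.minutes} min {c.seconds} s {c.milliseconds} ms {c.microseconds} us')
dt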

5\. Use the `.groupby()` method to find the noisy channels, i.e. the TDC channels with the most counts (print the top 3 and their corresponding counts to screen).
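`groupby(...).size()` counts the rows in each group, and `Series.nlargest(n)` keeps the `n` largest counts. The sketch below applies both to a handful of invented hits.

```python
import pandas as pd

# invented hits: channel 139 fires most often, then 64, then 63
df = pd.DataFrame({"TDC_CHANNEL": [139, 139, 139, 139, 64, 64, 64, 63, 63, 1]})

# hits per channel, keeping the 3 most active (noisiest) channels
top3 = df.groupby("TDC_CHANNEL").size().nlargest(3)
print(top3)
```

`size()` is used instead of `count()` because it counts rows directly, without picking an arbitrary column.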

In [None]:
noisy_channels = dt.groupby('TDC_CHANNEL').size().nlargest(3)  # hits per channel, top 3
print(noisy_channels)

6\. Count the number of non-empty orbits (i.e. the number of orbits with at least one hit).
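Every row of the file is a hit, so an orbit is non-empty exactly when its ORBIT_CNT appears in at least one row. Counting distinct values with `Series.nunique()` is therefore enough, as this sketch with invented values shows.

```python
import pandas as pd

# invented ORBIT_CNT values of five hits; orbit 101 has no hits, so it never appears
orbits = pd.Series([100, 100, 102, 103, 103])

# each distinct orbit number seen here has at least one hit
n_nonempty = orbits.nunique()
print(n_nonempty)  # 3
```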

In [None]:
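# every row is a hit, so each distinct ORBIT_CNT is an orbit with at least one hit
non_empty_orbits = dt['ORBIT_CNT'].nunique()
print('Number of non-empty orbits:', non_empty_orbits)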

7\. Count the number of unique orbits with at least one measurement from TDC_CHANNEL=139.
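Filtering with a boolean mask in `.loc` and then calling `nunique()` on the remaining ORBIT_CNT values answers this in one line. The hits below are invented.

```python
import pandas as pd

# invented hits
df = pd.DataFrame({
    "TDC_CHANNEL": [139, 12, 139, 139, 12],
    "ORBIT_CNT":   [100, 100, 100, 105, 107],
})

# keep only channel 139, then count the distinct orbits left
n_orbits_139 = df.loc[df["TDC_CHANNEL"] == 139, "ORBIT_CNT"].nunique()
print(n_orbits_139)  # 2
```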

In [None]:
result_df = dt[dt['TDC_CHANNEL'] == 139]
uniqueOrbitsAmount = result_df['ORBIT_CNT'].nunique()
print('Number of unique orbits with hits in TDC_CHANNEL 139:', uniqueOrbitsAmount)

8\. Create two Series (one for each FPGA) that have the TDC channel as index, and the number of counts for the corresponding TDC channel as values.
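Grouping by both FPGA and channel gives a single Series with a two-level index, and selecting one FPGA with `.loc` yields a Series indexed by TDC channel. The hits are invented, and the FPGA labels 0 and 1 are an assumption.

```python
import pandas as pd

# invented hits from two FPGAs (labelled 0 and 1 here)
df = pd.DataFrame({
    "FPGA":        [0, 0, 0, 1, 1],
    "TDC_CHANNEL": [5, 5, 7, 5, 9],
})

# counts per (FPGA, TDC_CHANNEL) pair
counts = df.groupby(["FPGA", "TDC_CHANNEL"]).size()

# one Series per FPGA: index = TDC channel, values = counts
fpga0 = counts.loc[0]
fpga1 = counts.loc[1]
print(fpga0)
print(fpga1)
```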

In [None]:
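fpga0 = dt[dt['FPGA'] == 0]['TDC_CHANNEL'].value_counts().sort_index()  # index: TDC channel, values: counts
fpga1 = dt[dt['FPGA'] == 1]['TDC_CHANNEL'].value_counts().sort_index()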

9\. **Optional:** Create two histograms (one for each FPGA) that show the number of counts for each TDC channel.
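A bar chart of the per-channel counts is one way to draw these histograms. The sketch below uses matplotlib (an assumption, since the notebook does not import it) with invented counts. It also selects the non-interactive `Agg` backend so it runs without a display; in a notebook you would normally drop that line and end with `plt.show()`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# invented per-channel counts, one Series per FPGA (index = TDC channel)
fpga0 = pd.Series({5: 2, 7: 1, 9: 4})
fpga1 = pd.Series({5: 1, 9: 3})

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, counts, name in zip(axes, (fpga0, fpga1), ("FPGA 0", "FPGA 1")):
    # one bar per TDC channel, height = number of counts
    ax.bar(counts.index, counts.to_numpy())
    ax.set_title(name)
    ax.set_xlabel("TDC_CHANNEL")
    ax.set_ylabel("counts")
fig.tight_layout()
```

Because the counts are already aggregated, a bar plot of the Series is used here rather than `plt.hist`, which would expect the raw per-hit channel values.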