1\. **Pandas DataFrame**

This exercise consists of analyzing a dataset containing timing information from a series of Time-to-Digital Converters (TDC) implemented in a pair of FPGAs. Each measurement (i.e. each row of the input file) consists of a flag that specifies the type of message ('HEAD', which in this case is always 1), two addresses of the TDC providing the signal ('FPGA' and 'TDC_CHANNEL'), and the timing information ('ORBIT_CNT', 'BX_COUNTER', and 'TDC_MEAS'). Each TDC count corresponds to 25/30 ns, whereas a unit of BX_COUNTER corresponds to 25 ns, and ORBIT_CNT is increased every `x` BX_COUNTER units. This allows the time to be stored in a way similar to hours, minutes and seconds.
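The three counters combine like a mixed-radix clock. A minimal sketch of the conversion, assuming `x` is supplied by the caller (its value is only estimated in step 2); `to_ns` is a hypothetical helper name, and the `x = 10` used in the example is made up.

```python
def to_ns(orbit_cnt: int, bx_counter: int, tdc_meas: float, x: int) -> float:
    """Hypothetical helper: combine the three timing fields into nanoseconds.

    x is the number of BX per orbit, which the exercise asks to estimate.
    """
    bx_ns = 25.0           # one BX_COUNTER unit lasts 25 ns
    tdc_ns = 25.0 / 30.0   # one TDC count lasts 25/30 ns
    # ORBIT_CNT counts whole orbits of x BX each, like hours made of minutes
    return (orbit_cnt * x + bx_counter) * bx_ns + tdc_meas * tdc_ns


# Made-up example with x = 10: (1*10 + 2)*25 + 3*25/30 = 302.5 ns
print(to_ns(1, 2, 3, x=10))
```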

In [None]:
# If you haven't downloaded it yet, get the data file with wget
#!wget https://www.dropbox.com/s/xvjzaxzz3ysphme/data_000637.txt -P ./data/

1\. Create a Pandas DataFrame reading N rows of the `data/data_000637.txt` dataset. Choose N to be smaller than or equal to the maximum number of rows and larger than 10k (check the documentation).

In [None]:
import pandas as pd
# Choose the number of rows to read with the variable N (10k < N <= total rows)
N = 69000
df = pd.read_csv('data/data_000637.txt', nrows=N)
table = df.copy()  # working copy; df keeps the raw columns
print(table)
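Before fixing N it helps to know how many rows the file actually has. The sketch below uses a few made-up rows in an in-memory buffer (only the column names follow the dataset) so it runs on its own; with the real file you would open `data/data_000637.txt` instead of the `io.StringIO` buffer.

```python
import io
import pandas as pd

# Made-up rows with the dataset's column names, standing in for data/data_000637.txt
csv_text = """HEAD,FPGA,TDC_CHANNEL,ORBIT_CNT,BX_COUNTER,TDC_MEAS
1,0,123,100,10,5
1,0,124,100,20,7
1,1,61,101,3,12
1,1,139,102,40,0
"""

# Count data rows (all lines minus the header) to get the upper bound for N
n_total = sum(1 for _ in io.StringIO(csv_text)) - 1

# nrows caps how many data rows read_csv parses (the header line is not counted)
N = min(3, n_total)
sample = pd.read_csv(io.StringIO(csv_text), nrows=N)
print(n_total, len(sample))
```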



2\. Estimate the number of BX in an ORBIT (the value `x`).

*Hint*: check when the BX counter reaches the maximum value before being reset to 0.

In [None]:
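maximum = table['BX_COUNTER'].max()
# BX_COUNTER runs from 0 up to its maximum before resetting, so x = maximum + 1
x = maximum + 1
print(f'The maximum value of "BX_COUNTER" is {maximum}, so the number of BX per orbit is x = {x}')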
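The hint suggests looking at where the counter wraps around. A sketch on a made-up `BX_COUNTER` sequence, assuming the rows are in acquisition order: a decrease between consecutive hits marks a reset. Because hits are sparse, the value just before a reset is only a lower bound on the counter's top value, which is why the maximum over all hits plus one is taken as the estimate of `x`.

```python
import pandas as pd

# Made-up BX_COUNTER sequence that wraps around twice (illustrative only)
bx = pd.Series([3, 7, 9, 11, 0, 4, 11, 2, 8])

# A reset shows up where the counter decreases with respect to the previous hit
resets = bx.diff() < 0

# Value seen just before each reset: a lower bound on the counter's maximum
before_reset = bx.shift(1)[resets]

# The counter runs 0..max, so the number of BX per orbit is max + 1
x_estimate = int(bx.max()) + 1
print(before_reset.tolist(), x_estimate)
```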

3\. Create a new column with the absolute time in ns (as a combination of the other three columns with timing information) since the beginning of the data acquisition, and convert the new column to a Time Series.

In [None]:
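x_bx = table['BX_COUNTER'].max() + 1  # BX per orbit (step 2)
abs_ns = table['ORBIT_CNT'] * x_bx * 25 + table['BX_COUNTER'] * 25 + table['TDC_MEAS'] * 25 / 30
table['ABSOLUTE_TIME'] = pd.to_timedelta(abs_ns - abs_ns.min(), unit='ns')  # ns since first hit, as Timedelta
print(table)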
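Once the absolute times are in ns, `pd.to_timedelta(..., unit="ns")` turns them into Timedelta values, and using those as an index gives a time-indexed Series. A small sketch with made-up times; the names `abs_ns`, `elapsed` and `hits` are only illustrative.

```python
import pandas as pd

# Made-up absolute times in ns for three hits (illustrative only)
abs_ns = pd.Series([1_000.0, 2_500.0, 4_000_000_000.0])

# Shift so that t = 0 is the first hit, then convert floats (ns) to Timedelta
elapsed = pd.to_timedelta(abs_ns - abs_ns.min(), unit="ns")

# Using the timedeltas as index turns the hits into a time-indexed Series
hits = pd.Series(1, index=pd.TimedeltaIndex(elapsed), name="hits")
print(hits)
print(hits.index.max())  # elapsed time of the last hit
```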

4\. Find out the duration of the data taking in hours, minutes and seconds, by using the features of the Time Series. Perform this check reading the whole dataset.

In [None]:
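full = pd.read_csv('data/data_000637.txt')  # whole dataset: no nrows limit
x_full = full['BX_COUNTER'].max() + 1        # BX per orbit
abs_time = pd.to_timedelta(full['ORBIT_CNT'] * x_full * 25 + full['BX_COUNTER'] * 25 + full['TDC_MEAS'] * 25 / 30, unit='ns')
print(f"The duration of the whole data taking:\n {abs_time.max() - abs_time.min()}")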
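A Timedelta already carries the breakdown into hours, minutes and seconds through its `.components` attribute. Sketch with a made-up duration; note that `.components.hours` excludes whole days, which is not an issue for runs shorter than 24 h.

```python
import pandas as pd

# Made-up duration (illustrative only): 1 h 2 min 3.5 s expressed in ns
duration = pd.to_timedelta(3_723_500_000_000, unit="ns")

# .components splits a Timedelta into days, hours, minutes, seconds, milliseconds, ...
c = duration.components
print(f"{c.hours} h {c.minutes} min {c.seconds}.{c.milliseconds:03d} s")
print(duration.total_seconds())  # the same duration as a float number of seconds
```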


5\. Use the `.groupby()` method to find out the noisy channels, i.e. the TDC channels with the most counts (print to screen the top 3 and the corresponding counts).

In [None]:
# Count hits per TDC channel on the raw data and keep the 3 largest counts
top3 = df.groupby('TDC_CHANNEL').size().sort_values(ascending=False).head(3)
print(f'The top 3 noisy channels and their counts are:\n{top3}')
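An equivalent way to rank channels, shown on made-up hits: `groupby(...).size()` counts rows per channel and `nlargest(3)` keeps the three largest counts. `value_counts()` would return the same counts already sorted; ties at the third place are resolved by `nlargest`'s default `keep="first"`.

```python
import pandas as pd

# Made-up hits (illustrative only); only the column name follows the dataset
hits = pd.DataFrame({"TDC_CHANNEL": [139, 64, 139, 63, 64, 139, 61, 64, 63, 139]})

# groupby + size counts rows per channel; nlargest keeps the 3 highest counts
top3 = hits.groupby("TDC_CHANNEL").size().nlargest(3)
print(top3)
```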

6\. Count the number of non-empty orbits (i.e. the number of orbits with at least one hit).

In [None]:
# Each row is a hit, so every distinct ORBIT_CNT value is an orbit with at least one hit
size_of_orbit = table['ORBIT_CNT'].nunique()
print(f"The number of orbits with at least one hit is:\n {size_of_orbit}")
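Counting distinct orbit numbers is enough here, since every row is a hit. A sketch on made-up orbit numbers that also shows how many orbits inside the observed range received no hit at all; this extra count assumes ORBIT_CNT increases by one per orbit and does not wrap during the run.

```python
import pandas as pd

# Made-up hits (illustrative only): several hits can share the same orbit
hits = pd.DataFrame({"ORBIT_CNT": [10, 10, 11, 13, 13, 13]})

# Every row is a hit, so each distinct ORBIT_CNT is an orbit with >= 1 hit
n_non_empty = hits["ORBIT_CNT"].nunique()

# Orbits spanned from first to last hit (assumes a contiguous, non-wrapping counter)
n_span = hits["ORBIT_CNT"].max() - hits["ORBIT_CNT"].min() + 1
print(n_non_empty, n_span - n_non_empty)
```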

7\. Count the number of unique orbits with at least one measurement from TDC_CHANNEL=139.

In [None]:
# Hits on channel 139, then count each orbit only once
unique = df.loc[df['TDC_CHANNEL'] == 139, 'ORBIT_CNT'].nunique()
print(f'The number of unique orbits with at least one measurement from TDC_CHANNEL=139 is:\n {unique}')
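Filtering first and then calling `nunique()` makes sure an orbit with several hits on channel 139 is counted once. Sketch with made-up hits:

```python
import pandas as pd

# Made-up hits (illustrative only)
hits = pd.DataFrame({
    "ORBIT_CNT":   [10, 10, 11, 12, 12, 13],
    "TDC_CHANNEL": [139, 139, 64, 139, 63, 61],
})

# The boolean mask keeps hits on channel 139; nunique counts each orbit once,
# so the two channel-139 hits in orbit 10 do not inflate the result
n_orbits_139 = hits.loc[hits["TDC_CHANNEL"] == 139, "ORBIT_CNT"].nunique()
print(n_orbits_139)
```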

8\. Create two Series (one for each FPGA) that have the TDC channel as index, and the number of counts for the corresponding TDC channel as values.

In [None]:
# value_counts() already returns a Series indexed by TDC_CHANNEL; use the raw df
FPGA_0 = df[df['FPGA'] == 0]['TDC_CHANNEL'].value_counts()
FPGA_1 = df[df['FPGA'] == 1]['TDC_CHANNEL'].value_counts()
print(f'Series for FPGA 0:\n {FPGA_0}')
print(f'Series for FPGA 1:\n {FPGA_1}')
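Both Series can also come out of a single `groupby` over the two keys; selecting one `FPGA` value with `.loc` drops that level and leaves `TDC_CHANNEL` as the index. Sketch with made-up hits:

```python
import pandas as pd

# Made-up hits (illustrative only)
hits = pd.DataFrame({
    "FPGA":        [0, 0, 0, 1, 1, 1, 1],
    "TDC_CHANNEL": [1, 1, 2, 1, 3, 3, 3],
})

# One groupby over both keys, then split the result into one Series per FPGA
counts = hits.groupby(["FPGA", "TDC_CHANNEL"]).size()
fpga_0 = counts.loc[0]  # index: TDC_CHANNEL, values: counts for FPGA 0
fpga_1 = counts.loc[1]  # index: TDC_CHANNEL, values: counts for FPGA 1
print(fpga_0, fpga_1, sep="\n\n")
```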

9\. **Optional:** Create two histograms (one for each FPGA) that show the number of counts for each TDC channel.
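A possible sketch for the optional histograms with matplotlib, one subplot per FPGA and one integer-wide bin per channel. The hits are made up; with the real data, replace `hits` with the DataFrame read above. The `Agg` backend is only there so the sketch runs without a display; in a notebook, drop that line and call `plt.show()`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Made-up hits (illustrative only)
hits = pd.DataFrame({
    "FPGA":        [0, 0, 0, 1, 1, 1, 1],
    "TDC_CHANNEL": [1, 1, 2, 1, 3, 3, 3],
})

fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
for ax, fpga in zip(axes, (0, 1)):
    channels = hits.loc[hits["FPGA"] == fpga, "TDC_CHANNEL"]
    # One bin per integer channel: edges at k - 0.5 and k + 0.5
    edges = [k - 0.5 for k in range(channels.min(), channels.max() + 2)]
    ax.hist(channels, bins=edges)
    ax.set_title(f"FPGA {fpga}")
    ax.set_xlabel("TDC_CHANNEL")
axes[0].set_ylabel("counts")
fig.tight_layout()
```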