1\. **Pandas DataFrame**

This exercise consists of analyzing a dataset containing timing information from a series of Time-to-Digital Converters (TDCs) implemented in a pair of FPGAs. Each measurement (i.e. each row of the input file) consists of a flag that specifies the type of message ('HEAD', which in this case is always 1), two addresses of the TDC providing the signal ('FPGA' and 'TDC_CHANNEL'), and the timing information ('ORBIT_CNT', 'BX_COUNTER', and 'TDC_MEAS'). Each TDC count corresponds to 25/30 ns, whereas a unit of BX_COUNTER corresponds to 25 ns, and the ORBIT_CNT is increased every `x` BX_COUNTER. This allows the time to be stored in a similar way to hours, minutes and seconds.
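To make the clock analogy concrete, here is a minimal sketch of how the three counters could be combined into a single time in nanoseconds. The function name `to_ns` and the value passed as `bx_per_orbit` are assumptions for illustration only; the actual value of `x` is what the exercise asks you to estimate from the data.

```python
TDC_NS = 25 / 30  # duration of one TDC_MEAS count in ns
BX_NS = 25        # duration of one BX_COUNTER unit in ns


def to_ns(orbit_cnt, bx_counter, tdc_meas, bx_per_orbit):
    """Combine the three counters into one time in ns (hypothetical helper)."""
    # Orbits are first expressed in BX units, then everything is scaled to ns
    return (orbit_cnt * bx_per_orbit + bx_counter) * BX_NS + tdc_meas * TDC_NS


# Assumed placeholder for x; replace it with the value estimated from the data
BX_PER_ORBIT = 3564

print(to_ns(orbit_cnt=1, bx_counter=2, tdc_meas=3, bx_per_orbit=BX_PER_ORBIT))
```

ORBIT_CNT plays the role of hours, BX_COUNTER of minutes and TDC_MEAS of seconds: each coarser counter is worth a fixed number of the next finer one.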

In [None]:
# If you haven't downloaded it yet, please get the data file with wget
#!wget https://www.dropbox.com/s/xvjzaxzz3ysphme/data_000637.txt -P ./data/

1\. Create a Pandas DataFrame reading N rows of the `data/data_000637.txt` dataset. Choose N to be smaller than or equal to the maximum number of rows and larger than 10k (check the documentation).

In [None]:
import pandas as pd

# Specify the number of rows to read
N = 15000

# Create a DataFrame from the dataset, reading N rows
data = pd.read_csv('data/data_000637.txt', sep=',', nrows=N)

2\. Estimate the number of BX in an ORBIT (the value `x`).

*Hint*: check when the BX counter reaches the maximum value before being reset to 0.

In [None]:
# BX_COUNTER runs from 0 up to its maximum value, then is reset to 0 while ORBIT_CNT is increased by one,
# so the number of BX in an orbit is the maximum plus one (assuming the maximum occurs within the N rows read)
x = data['BX_COUNTER'].max() + 1

3\. Create a new column with the absolute time in ns (as a combination of the other three columns with timing information) since the beginning of the data acquisition, and convert the new column to a Time Series.

*Hint:* introduce an offset to the absolute time such that the start of the data acquisition (i.e. the first entry) is zero.

In [None]:
# Calculate the absolute time in nanoseconds
data['ABSOLUTE_TIME'] = (data['ORBIT_CNT'] * x) * 25 + data['BX_COUNTER'] * 25 + data['TDC_MEAS'] * (25/30)

# Calculate the offset to make the start of data acquisition zero
offset = data['ABSOLUTE_TIME'].min()
data['ABSOLUTE_TIME'] -= offset

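# Convert the new column to a Time Series of time offsets (nanosecond resolution),
# keeping it as a timedelta rather than a string so that it can still be compared and subtracted
data['ABSOLUTE_TIME'] = pd.to_timedelta(data['ABSOLUTE_TIME'], unit='ns')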

4\. Find out the duration of the data taking in hours, minutes and seconds, by using the features of the Time Series. Perform this check reading the whole dataset.

In [None]:
# Total duration is the maximum absolute time in the set
total_duration = data['ABSOLUTE_TIME'].max()
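The task asks for hours, minutes and seconds, which a `pandas.Timedelta` exposes through its `components` attribute. Below is a minimal sketch on synthetic values, assuming the absolute time is available as a number of nanoseconds; the sample numbers are made up and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical absolute times in ns (illustrative values, not real data)
abs_time_ns = pd.Series([0, 1_500_000_000, 3_723_000_000_000])

# Convert the numeric column to timedeltas
abs_time = pd.to_timedelta(abs_time_ns, unit='ns')

# Duration between the first and the last measurement
duration = abs_time.max() - abs_time.min()

# Break the duration down into its components
parts = duration.components
print(f"{parts.hours} h, {parts.minutes} min, {parts.seconds} s")
```

For the real check, the same steps would be applied after reading the whole file rather than only the first N rows.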

5\. Use the `.groupby()` method to find the noisy channels, i.e. the TDC channels with the most counts (print the top 3 and the corresponding counts to screen).

In [None]:
# Group the DataFrame by the "TDC_CHANNEL" column and count the occurrences
grouped = data.groupby('TDC_CHANNEL').size()

# Get the top three most frequent TDC channels
top_three_channels = grouped.nlargest(3)

print(top_three_channels)

6\. Count the number of non-empty orbits (i.e. the number of orbits with at least one hit).

In [None]:
# Get the number of unique entries in ORBIT_CNT
non_empty_orbits = data['ORBIT_CNT'].nunique()

7\. Count the number of unique orbits with at least one measurement from TDC_CHANNEL=139.

In [None]:
# Filter the DataFrame based on the condition TDC_CHANNEL == 139
filtered_data = data[data['TDC_CHANNEL'] == 139]

# Count the unique orbits in the "ORBIT_CNT" column of the filtered DataFrame
unique_orbits = filtered_data['ORBIT_CNT'].nunique()

8\. Create two Series (one for each FPGA) that have the TDC channel as index, and the number of counts for the corresponding TDC channel as values.

In [None]:
# Creating the two series
fpga0 = data[data['FPGA'] == 0]['TDC_CHANNEL'].value_counts().sort_index()
fpga1 = data[data['FPGA'] == 1]['TDC_CHANNEL'].value_counts().sort_index()
print("FPGA 0:")
print(fpga0)
print("FPGA 1:")
print(fpga1)

9\. **Optional:** Create two histograms (one for each FPGA) that show the number of counts for each TDC channel.
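One possible approach, sketched here with matplotlib, is to draw the per-channel counts of each FPGA as a bar chart. The small `demo` DataFrame is a synthetic stand-in with made-up values so that the block runs on its own; with the real data, `demo` would be replaced by the DataFrame read from the file.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Synthetic stand-in for the real dataset (values are made up)
demo = pd.DataFrame({
    'FPGA':        [0, 0, 0, 1, 1, 1, 1],
    'TDC_CHANNEL': [1, 1, 2, 1, 3, 3, 3],
})

# One panel per FPGA, sharing the y axis so the counts are comparable
fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
for fpga, ax in zip([0, 1], axes):
    # Number of hits per TDC channel for this FPGA, ordered by channel
    counts = demo.loc[demo['FPGA'] == fpga, 'TDC_CHANNEL'].value_counts().sort_index()
    ax.bar(counts.index, counts.values)
    ax.set_title(f"FPGA {fpga}")
    ax.set_xlabel("TDC_CHANNEL")
axes[0].set_ylabel("Counts")
fig.tight_layout()
# plt.show()  # uncomment when running as a script rather than in a notebook
```

Since the counts per channel are already computed, a bar chart of those counts is used instead of `hist()` on the raw column; both give the same picture when there is one bin per channel.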