1\. **Pandas DataFrame**

This exercise consists of analyzing a dataset containing timing information from a series of Time-to-Digital Converters (TDC) implemented in a pair of FPGAs. Each measurement (i.e. each row of the input file) consists of a flag that specifies the type of message ('HEAD', which in this case is always 1), two addresses of the TDC providing the signal ('FPGA' and 'TDC_CHANNEL'), and the timing information ('ORBIT_CNT', 'BX_COUNTER', and 'TDC_MEAS'). Each TDC count corresponds to 25/30 ns, whereas a unit of BX_COUNTER corresponds to 25 ns, and the ORBIT_CNT is increased every `x` BX_COUNTER. This makes it possible to store the time in a way similar to hours, minutes and seconds.

In [19]:
import pandas as pd
import numpy as np

In [20]:
# If you haven't downloaded it yet, get the data file with wget
#!wget https://www.dropbox.com/s/xvjzaxzz3ysphme/data_000637.txt -P ./data/

1\. Create a Pandas DataFrame reading N rows of the `data/data_000637.txt` dataset. Choose N to be smaller than or equal to the maximum number of rows and larger than 10k (check the documentation).

In [34]:

def load(N):
    """Read the first N rows of the dataset into a DataFrame."""
    dataset = pd.read_csv("C:/Users/User/desktop/data_000637.txt", nrows=N, delimiter=',')
    return dataset


df = load(15000)
print(df)




       HEAD  FPGA  TDC_CHANNEL   ORBIT_CNT  BX_COUNTER  TDC_MEAS
0         1     0          123  3869200167        2374        26
1         1     0          124  3869200167        2374        27
2         1     0           63  3869200167        2553        28
3         1     0           64  3869200167        2558        19
4         1     0           64  3869200167        2760        25
...     ...   ...          ...         ...         ...       ...
14995     1     1            4  3869200316        3399         9
14996     1     1           17  3869200316        3400        15
14997     1     1           10  3869200316        3530        16
14998     1     1            8  3869200316        3533        18
14999     1     0          139  3869200316        3539         0

[15000 rows x 6 columns]
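
The `nrows` argument of `pd.read_csv` is what limits the number of rows read. The following is a minimal sketch of that behaviour on an in-memory CSV; the header line is an assumption based on the column names in the printout above, the five rows are copied from that printout, and `load_rows` is a hypothetical helper name.

```python
import io

import pandas as pd

# Toy CSV: assumed header, rows copied from the first five entries printed above
CSV_TEXT = """HEAD,FPGA,TDC_CHANNEL,ORBIT_CNT,BX_COUNTER,TDC_MEAS
1,0,123,3869200167,2374,26
1,0,124,3869200167,2374,27
1,0,63,3869200167,2553,28
1,0,64,3869200167,2558,19
1,0,64,3869200167,2760,25
"""


def load_rows(source, n_rows):
    """Read at most n_rows data rows from a path or file-like object."""
    return pd.read_csv(source, nrows=n_rows)


sample = load_rows(io.StringIO(CSV_TEXT), 3)
everything = load_rows(io.StringIO(CSV_TEXT), 100)  # more than available

print(sample.shape)      # (3, 6)
print(everything.shape)  # (5, 6)
```

Asking for more rows than the file contains simply returns every row, so an N above the file length does not raise an error.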


2\. Estimate the number of BX in an ORBIT (the value `x`).

*Hint*: check when the BX counter reaches the maximum value before being reset to 0.

In [38]:


# Find the rows where BX_COUNTER takes its largest observed value
max_indices = df[df['BX_COUNTER'] == df['BX_COUNTER'].max()].index

# Row gaps between consecutive occurrences of that maximum
orbit_lengths = [max_indices[i + 1] - max_indices[i] for i in range(len(max_indices) - 1)]

# One more than the number of gaps, i.e. how many rows sit at the maximum value
n_rows_at_max = len(orbit_lengths) + 1

# Print the result
print(f"Number of rows at the maximum BX_COUNTER value: {n_rows_at_max}")


Number of rows at the maximum BX_COUNTER value: 9
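
The hint points at a direct estimate: if BX_COUNTER counts up from 0 to its largest value and is then reset, `x` is that largest value plus one. The sketch below illustrates this on a made-up counter that wraps after 9. The function name `estimate_bx_per_orbit` is hypothetical, and the estimate is only a lower bound if the sample never reaches the true maximum.

```python
import pandas as pd


def estimate_bx_per_orbit(bx_counter):
    """Estimate x as (largest observed counter value + 1).

    Assumes the counter starts at 0 and that the sample
    contains at least one hit at the true maximum.
    """
    return int(bx_counter.max()) + 1


# Made-up counter values that wrap after 9, so x should come out as 10
toy_bx = pd.Series([7, 8, 9, 0, 1, 5, 9, 0, 2])

print(estimate_bx_per_orbit(toy_bx))  # 10
```

On the real data the same call would take `df['BX_COUNTER']` as its argument.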


3\. Create a new column with the absolute time in ns (as a combination of the other three columns with timing information) since the beginning of the data acquisition, and convert the new column to a Time Series.

*Hint:* introduce an offset to the absolute time such that the start of the data acquisition (i.e. the first entry) is zero.

In [39]:

# Combine 'ORBIT_CNT', 'BX_COUNTER', and 'TDC_MEAS' into a single time value.
# As written, one orbit counts as 25 * 3564 units, one BX as 3564 units, and one TDC count as 1 unit
df['absolute_time_ns'] = (df['ORBIT_CNT'] * 25 * 3564) + (df['BX_COUNTER'] * 3564) + df['TDC_MEAS']

# Use the smallest value as the start of the data acquisition
start_time_offset = df['absolute_time_ns'].min()

# Subtract the offset so that the earliest entry is at time zero
df['absolute_time_ns'] -= start_time_offset

# Convert the 'absolute_time_ns' column to a Time Series
df['absolute_time_ns'] = pd.to_datetime(df['absolute_time_ns'], unit='ns')

# Print the DataFrame with the new 'absolute_time_ns' column
print(df[['ORBIT_CNT', 'BX_COUNTER', 'TDC_MEAS', 'absolute_time_ns']])


        ORBIT_CNT  BX_COUNTER  TDC_MEAS              absolute_time_ns
0      3869200167        2374        26 1970-01-01 00:00:00.008346898
1      3869200167        2374        27 1970-01-01 00:00:00.008346899
2      3869200167        2553        28 1970-01-01 00:00:00.008984856
3      3869200167        2558        19 1970-01-01 00:00:00.009002667
4      3869200167        2760        25 1970-01-01 00:00:00.009722601
...           ...         ...       ...                           ...
14995  3869200316        3399         9 1970-01-01 00:00:00.025275881
14996  3869200316        3400        15 1970-01-01 00:00:00.025279451
14997  3869200316        3530        16 1970-01-01 00:00:00.025742772
14998  3869200316        3533        18 1970-01-01 00:00:00.025753466
14999  3869200316        3539         0 1970-01-01 00:00:00.025774832

[15000 rows x 4 columns]
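
The exercise statement gives the conversion factors explicitly: 25 ns per BX_COUNTER unit and 25/30 ns per TDC count, with one orbit lasting `x` BX. The sketch below applies those factors to a handful of rows copied from the printouts above and takes the first entry as time zero, as the hint suggests. It assumes `x = 3564` (the value used in the cell above); `elapsed_ns` is a hypothetical helper, and a timedelta is used here as one possible reading of "Time Series" for elapsed time.

```python
import pandas as pd

BX_PER_ORBIT = 3564    # assumed value of x, taken from the cell above
BX_NS = 25.0           # one BX_COUNTER unit in ns (exercise statement)
TDC_NS = 25.0 / 30.0   # one TDC_MEAS count in ns (exercise statement)

# Rows copied from the printouts above (first five entries and the last one)
toy = pd.DataFrame({
    'ORBIT_CNT':  [3869200167, 3869200167, 3869200167,
                   3869200167, 3869200167, 3869200316],
    'BX_COUNTER': [2374, 2374, 2553, 2558, 2760, 3539],
    'TDC_MEAS':   [26, 27, 28, 19, 25, 0],
})


def elapsed_ns(frame, bx_per_orbit=BX_PER_ORBIT):
    """Time in ns relative to the first row of the frame."""
    # Subtract the first orbit before scaling so the floats stay small
    orbits = frame['ORBIT_CNT'] - frame['ORBIT_CNT'].iloc[0]
    t = (orbits * bx_per_orbit + frame['BX_COUNTER']) * BX_NS \
        + frame['TDC_MEAS'] * TDC_NS
    return t - t.iloc[0]


toy['elapsed_ns'] = elapsed_ns(toy)

# Timedelta has 1 ns resolution, so round to whole ns before converting
toy['elapsed'] = pd.to_timedelta(
    toy['elapsed_ns'].round().astype('int64'), unit='ns'
)

print(toy[['elapsed_ns', 'elapsed']])
```

Subtracting the first orbit before multiplying keeps the intermediate numbers small, which preserves the sub-nanosecond TDC contribution in floating point.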


4\. Find out the duration of the data taking in hours, minutes and seconds, by using the features of the Time Series. Perform this check reading the whole dataset.

In [40]:
# Find the start and end times
start_time = df['absolute_time_ns'].min()
end_time = df['absolute_time_ns'].max()
# Calculate the duration
duration = end_time - start_time

# Express the total duration in hours, in minutes, and in seconds
duration_hours = duration.total_seconds() / 3600
duration_minutes = duration.total_seconds() / 60
duration_seconds = duration.total_seconds()
# Print the results
print(f"Start Time: {start_time}")
print(f"End Time: {end_time}")
print(f"Duration: {duration}")
print(f"Duration in Hours: {duration_hours} hours")
print(f"Duration in Minutes: {duration_minutes} minutes")
print(f"Duration in Seconds: {duration_seconds} seconds")

Start Time: 1970-01-01 00:00:00
End Time: 1970-01-01 00:00:00.025774832
Duration: 0 days 00:00:00.025774832
Duration in Hours: 7.1594444444444436e-06 hours
Duration in Minutes: 0.00042956666666666663 minutes
Duration in Seconds: 0.025774 seconds
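
To report a duration as hours, minutes and seconds together (rather than the total expressed in each unit), the `components` attribute of a `pd.Timedelta` can be used. This is a minimal sketch; `split_hms` is a hypothetical helper, and the second example reuses the duration printed above.

```python
import pandas as pd


def split_hms(duration):
    """Split a Timedelta into (whole hours, whole minutes, fractional seconds)."""
    c = duration.components
    hours = c.days * 24 + c.hours
    seconds = (c.seconds
               + c.milliseconds / 1e3
               + c.microseconds / 1e6
               + c.nanoseconds / 1e9)
    return hours, c.minutes, seconds


# Made-up duration to show the decomposition
example = pd.Timedelta(hours=1, minutes=2, seconds=3.5)
print(split_hms(example))  # (1, 2, 3.5)

# The duration printed above, given in nanoseconds
measured = pd.Timedelta(25774832, unit='ns')
h, m, s = split_hms(measured)
print(f"{h} h {m} min {s:.9f} s")
```

Running the same helper on the whole dataset only requires loading every row before computing `end_time - start_time`.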


5\. Use the `.groupby()` method to find out the noisy channels, i.e. the TDC channels with most counts (print to screen the top 3 and the corresponding counts)

In [43]:
noisy_channel_counts = df.groupby('TDC_CHANNEL').size().sort_values(ascending=False)
top_noisy_channels = noisy_channel_counts.head(3)

print("Top 3 Noisy Channels:")
print(top_noisy_channels)

Top 3 Noisy Channels:
TDC_CHANNEL
139    1268
64      752
63      749
dtype: int64


6\. Count the number of non-empty orbits (i.e. the number of orbits with at least one hit).

In [44]:
# Count the missing (NaN) values in each column
empty_counts = df.isna().sum()

# Print the result
print("Number of Missing Values in Each Column:")
print(empty_counts)

Number of Missing Values in Each Column:
HEAD                0
FPGA                0
TDC_CHANNEL         0
ORBIT_CNT           0
BX_COUNTER          0
TDC_MEAS            0
absolute_time_ns    0
dtype: int64
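
Since every row is a hit, an orbit is non-empty exactly when its ORBIT_CNT value appears in the table, so the number of non-empty orbits is the number of distinct ORBIT_CNT values. The sketch below shows this on made-up data, both with `nunique()` and with a `groupby`.

```python
import pandas as pd

# Made-up hits: orbit 102 never appears, so it is an empty orbit
toy = pd.DataFrame({
    'ORBIT_CNT':   [100, 100, 101, 103, 103, 103],
    'TDC_CHANNEL': [5, 139, 7, 139, 139, 8],
})

# Each distinct ORBIT_CNT value has at least one hit by construction
n_non_empty = toy['ORBIT_CNT'].nunique()

# Equivalent view: one group per orbit, with the number of hits in each
hits_per_orbit = toy.groupby('ORBIT_CNT').size()

print(n_non_empty)          # 3
print(len(hits_per_orbit))  # 3
```

On the real data the same expression would be `df['ORBIT_CNT'].nunique()`.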


7\. Count the number of unique orbits with at least one measurement from TDC_CHANNEL=139.

In [46]:
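# Keep only the rows from TDC_CHANNEL 139 and group them by orbit (one group per orbit)
unique_orbits_with_measurement = df[df['TDC_CHANNEL'] == 139].groupby('ORBIT_CNT')['TDC_CHANNEL'].nunique()
# Count the orbits that contain at least one such measurement
count_unique_orbits = len(unique_orbits_with_measurement)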

# Print the result
print("Number of Unique Orbits with at Least One Measurement from TDC_CHANNEL=139:", count_unique_orbits)

Number of Unique Orbits with at Least One Measurement from TDC_CHANNEL=139: 150


8\. Create two Series (one for each FPGA) that have the TDC channel as index, and the number of counts for the corresponding TDC channel as values.

In [47]:

# Group by 'FPGA' and 'TDC_CHANNEL', count occurrences, and create two Series
channel_counts_by_fpga = df.groupby(['FPGA', 'TDC_CHANNEL']).size()

# Extract counts for each FPGA
fpga_0_counts = channel_counts_by_fpga.loc[0] if 0 in channel_counts_by_fpga.index.levels[0] else pd.Series(dtype='int64')
fpga_1_counts = channel_counts_by_fpga.loc[1] if 1 in channel_counts_by_fpga.index.levels[0] else pd.Series(dtype='int64')

# Print the results
print("FPGA 0 Counts:")
print(fpga_0_counts)

print("\nFPGA 1 Counts:")
print(fpga_1_counts)


FPGA 0 Counts:
TDC_CHANNEL
1        8
2       16
3       16
4       19
5       19
      ... 
121     63
122     71
123    202
124    193
139    879
Length: 117, dtype: int64

FPGA 1 Counts:
TDC_CHANNEL
1      338
2      363
3      277
4      290
5      189
      ... 
125      6
126      7
127     16
128     18
139    389
Length: 124, dtype: int64


9\. **Optional:** Create two histograms (one for each FPGA) that show the number of counts for each TDC channel.
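
One possible way to draw the two histograms is a bar chart of the per-channel counts for each FPGA, side by side. The sketch below uses matplotlib on made-up data; `plot_channel_counts` is a hypothetical helper, and the choice of a bar chart over `hist` is an assumption made because the channels are discrete labels.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering for scripts; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up hits for two FPGAs
toy = pd.DataFrame({
    'FPGA':        [0, 0, 0, 1, 1, 1, 1],
    'TDC_CHANNEL': [1, 1, 139, 2, 2, 2, 139],
})


def plot_channel_counts(frame):
    """Draw one bar chart of counts per TDC channel for each FPGA."""
    fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
    counts_by_fpga = {}
    for ax, fpga in zip(axes, (0, 1)):
        # Counts per channel for this FPGA, ordered by channel number
        counts = (frame.loc[frame['FPGA'] == fpga, 'TDC_CHANNEL']
                  .value_counts()
                  .sort_index())
        ax.bar(counts.index, counts.values)
        ax.set_title(f"FPGA {fpga}")
        ax.set_xlabel("TDC channel")
        counts_by_fpga[fpga] = counts
    axes[0].set_ylabel("Counts")
    fig.tight_layout()
    return fig, counts_by_fpga


fig, counts_by_fpga = plot_channel_counts(toy)
```

On the real data the call would be `plot_channel_counts(df)`, and in a notebook the figure is displayed inline once the backend line is removed.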