1\. **Pandas DataFrame**

This exercise consists in analyzing a dataset containg timing information from a series of Time-to-Digital-Converters (TDC) implemented in a pair of FPGAs. Each measurement (i.e. each row of the input file) consists of a flag that specifies the type of message ('HEAD', which in this case is always 1), two addresses of the TDC providing the signal ('FPGA' and 'TDC_CHANNEL'), and the timing information ('ORBIT_CNT', 'BX_COUNTER', and 'TDC_MEAS'). Each TDC count corresponds to 25/30 ns, whereas a unit of BX_COUNTER corresponds to 25 ns, and the ORBIT_CNT is increased every `x` BX_COUNTER. This allows to store the time in a similar way to hours, minutes and seconds.

In [2]:
# If haven't downloaded it yet, please get the data file with wget
!curl https://www.dropbox.com/s/xvjzaxzz3ysphme/data_000637.txt -P ./data/
    
    
    
#invoc iwr
import pandas as pd
import numpy as np
import random as r

<a href="/s/raw/xvjzaxzz3ysphme/data_000637.txt">Found</a>.



  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100    61    0    61    0     0    130      0 --:--:-- --:--:-- --:--:--   131


1\. Create a Pandas DataFrame reading N rows of the `data/data_000637.txt` dataset. Choose N to be smaller than or equal to the maximum number of rows and larger that 10k (check the documentation).

In [3]:
file_name = "data_000637.txt"
data = pd.read_csv(file_name)

Nmax = len(data)

Nmin = 10000

N  = r.randint(Nmin, Nmax)

data = data[0:N]
data

Unnamed: 0,HEAD,FPGA,TDC_CHANNEL,ORBIT_CNT,BX_COUNTER,TDC_MEAS
0,1,0,123,3869200167,2374,26
1,1,0,124,3869200167,2374,27
2,1,0,63,3869200167,2553,28
3,1,0,64,3869200167,2558,19
4,1,0,64,3869200167,2760,25
...,...,...,...,...,...,...
24655,1,0,64,3869200412,2145,28
24656,1,0,48,3869200412,2144,16
24657,1,0,86,3869200412,2270,1
24658,1,0,62,3869200412,2362,18


2\. Estimate the number of BX in a ORBIT (the value `x`).

*Hint*: check when the BX counter reaches the maximum value before being reset to 0.

In [3]:
# prendo la colonna BX

BX = data.loc[:,['BX_COUNTER']]

x = max(BX.values)

print(x)


[3563]


3\. Create a new column with the absolute time in ns (as a combination of the other three columns with timing information) since the beginning of the data acquisition, and convert the new column to a Time Series.

*Hint:* introduce an offset to the absolute time such that the start of the data acquisition (i.e. the first entry) is zero.

In [4]:
data['TimeSeries'] = (data['ORBIT_CNT'] * x * 25) + (data['BX_COUNTER'] * 25) + data['TDC_MEAS'] * (25/30)

data['TimeSeries'] -= data['TimeSeries'].min()

data

Unnamed: 0,HEAD,FPGA,TDC_CHANNEL,ORBIT_CNT,BX_COUNTER,TDC_MEAS,TimeSeries
0,1,0,123,3869200167,2374,26,0.000000e+00
1,1,0,124,3869200167,2374,27,8.125000e-01
2,1,0,63,3869200167,2553,28,4.476625e+03
3,1,0,64,3869200167,2558,19,4.594125e+03
4,1,0,64,3869200167,2760,25,9.649125e+03
...,...,...,...,...,...,...,...
149404,1,0,58,3869201654,2913,8,1.324680e+08
149405,1,1,20,3869201654,2915,12,1.324680e+08
149406,1,0,64,3869201654,2911,5,1.324679e+08
149407,1,0,53,3869201654,2914,0,1.324680e+08


4\. Find out the duration of the data taking in hours, minutes and seconds, by using the features of the Time Series. Perform this check reading the whole dataset.

In [5]:
durata = data['TimeSeries'].max() - data['TimeSeries'].min() 
durata_ore = int(durata // 3600)
durata_minuti = int((durata % 3600) // 60)
durata_secondi = int(durata % 60)

print(f"Durata della raccolta dati: {durata_ore} ore, {durata_minuti} minuti, {durata_secondi} secondi")


Durata della raccolta dati: 36798 ore, 27 minuti, 51 secondi


5\. Use the `.groupby()` method to find out the noisy channels, i.e. the TDC channels with most counts (print to screen the top 3 and the corresponding counts)

In [6]:
# raggruppo in base al TCD_CHANNEL e conto quanti sono
noisy_channels = (data.groupby('TDC_CHANNEL').size()).sort_values(ascending=False)

top_noisy_channels = noisy_channels.head(3)

print("Top 3 Noisy Channels:")
print(top_noisy_channels)


Top 3 Noisy Channels:
TDC_CHANNEL
139    12429
64      7424
63      7300
dtype: int64


6\. Count the number of non-empty orbits (i.e. the number of orbits with at least one hit).

In [7]:
non_empty_orbits = data.groupby('ORBIT_CNT')['TDC_MEAS'].nunique() 

count_non_empty_orbits = non_empty_orbits[non_empty_orbits>0].count()

print("Number of Non-Empty Orbits:", count_non_empty_orbits)


Number of Non-Empty Orbits: 1486


7\. Count the number of unique orbits with at least one measurement from TDC_CHANNEL=139.

In [8]:
df = data[data['TDC_CHANNEL'] == 139]

unique_orbits_count = df['ORBIT_CNT'].nunique()


print("Number of Unique Orbits with at least one measurement from TDC_CHANNEL=139:", unique_orbits_count)

Number of Unique Orbits with at least one measurement from TDC_CHANNEL=139: 1484


8\. Create two Series (one for each FPGA) that have the TDC channel as index, and the number of counts for the corresponding TDC channel as values.

In [17]:
data0 = data[data['FPGA'] == 0]
data1 = data[data['FPGA'] == 1]


#series_fpga1 = pd.Series(fpga1_data['count'].values, index=fpga1_data['TDC_channel'])

TDC0 = data0.groupby('TDC_CHANNEL').count()
#TDC1 = data0.groupby('TDC_CHANNEL').size()
TDC0
#df = pd.Series(, index = data0['TDC_CHANNEL'])

Unnamed: 0_level_0,HEAD,FPGA,ORBIT_CNT,BX_COUNTER,TDC_MEAS,TimeSeries
TDC_CHANNEL,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
1,139,139,139,139,139,139
2,170,170,170,170,170,170
3,173,173,173,173,173,173
4,240,240,240,240,240,240
5,177,177,177,177,177,177
...,...,...,...,...,...,...
129,1,1,1,1,1,1
130,5,5,5,5,5,5
137,4,4,4,4,4,4
138,4,4,4,4,4,4


9\. **Optional:** Create two histograms (one for each FPGA) that show the number of counts for each TDC channel.

In [20]:


channel_counts = data.groupby(['FPGA', 'TDC_CHANNEL']).size().reset_index(name='count')

# Separate the counts for each FPGA
fpga1_counts = channel_counts[channel_counts['FPGA'] == 0].set_index('TDC_CHANNEL')['count']
fpga2_counts = channel_counts[channel_counts['FPGA'] == 1].set_index('TDC_CHANNEL')['count']

# Print or use the resulting Series as needed
print("FPGA1 Counts:")
print(fpga1_counts)

print("\nFPGA2 Counts:")
print(fpga2_counts)



FPGA1 Counts:
TDC_CHANNEL
1       139
2       170
3       173
4       240
5       177
       ... 
129       1
130       5
137       4
138       4
139    8722
Name: count, Length: 122, dtype: int64

FPGA2 Counts:
TDC_CHANNEL
1      3207
2      3686
3      2495
4      2990
5      1701
       ... 
129       4
130       6
137       4
138       4
139    3707
Name: count, Length: 132, dtype: int64
