In [2]:
import pandas as pd
import numpy as np

1. **Pandas DataFrame**

This exercise consists of analyzing a dataset containing timing information from a series of Time-to-Digital Converters (TDC) implemented in a pair of FPGAs. Each measurement (i.e. each row of the input file) consists of a flag that specifies the type of message ('HEAD', which in this case is always 1), two addresses of the TDC providing the signal ('FPGA' and 'TDC_CHANNEL'), and the timing information ('ORBIT_CNT', 'BX_COUNTER', and 'TDC_MEAS'). Each TDC count corresponds to 25/30 ns, whereas a unit of BX_COUNTER corresponds to 25 ns, and the ORBIT_CNT is increased every `x` BX_COUNTER. This allows the time to be stored in a similar way to hours, minutes and seconds.


In [20]:
# If you haven't downloaded it yet, please get the data file with wget
#!wget https://www.dropbox.com/s/xvjzaxzz3ysphme/data_000637.txt -P ./data/


1. Create a Pandas DataFrame reading N rows of the `data/data_000637.txt` dataset. Choose N to be smaller than or equal to the maximum number of rows and larger than 10k (check the documentation).

In [3]:
filename = "ScientificComputingWithPython2023/data/data_000637.txt"
N = 10000
data = pd.read_csv(filename, nrows = N)
data

Unnamed: 0,HEAD,FPGA,TDC_CHANNEL,ORBIT_CNT,BX_COUNTER,TDC_MEAS
0,1,0,123,3869200167,2374,26
1,1,0,124,3869200167,2374,27
2,1,0,63,3869200167,2553,28
3,1,0,64,3869200167,2558,19
4,1,0,64,3869200167,2760,25
...,...,...,...,...,...,...
9995,1,0,61,3869200267,696,11
9996,1,0,60,3869200267,701,5
9997,1,0,59,3869200267,707,23
9998,1,0,63,3869200267,706,15


2. Estimate the number of BX in an ORBIT (the value `x`).

Hint: check when the BX counter reaches the maximum value before being reset to 0.

In [4]:
df = pd.DataFrame(data)

# eliminate duplicates of values in ORBIT_CNT keeping the last
orbit_no_dup = df.drop_duplicates(subset='ORBIT_CNT', keep='last')
print(orbit_no_dup)

# compute the mean of BX_COUNTER corresponding to orbit without duplicates
df_mean = orbit_no_dup.BX_COUNTER.mean()
print('The mean is:',df_mean)

      HEAD  FPGA  TDC_CHANNEL   ORBIT_CNT  BX_COUNTER  TDC_MEAS
42       1     0           57  3869200167        3187        17
127      1     0           62  3869200168        3538         4
254      1     0           49  3869200169        2766         1
352      1     1            3  3869200170        3377        11
461      1     0           64  3869200171        3538         7
...    ...   ...          ...         ...         ...       ...
9682     1     0           58  3869200263        3422        28
9761     1     1            6  3869200264        3065        20
9851     1     1            2  3869200265        3480        29
9982     1     0           56  3869200266        1851        11
9999     1     0           49  3869200267         777        13

[101 rows x 6 columns]
The mean is: 3172.2475247524753
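The hint for this exercise points at the maximum of the BX counter rather than an average. A minimal sketch of that reading, assuming the counter starts at 0 and wraps right after its largest value: the estimate is then the largest observed `BX_COUNTER` plus one. The `toy` frame and its values are made up for illustration and are not taken from `data_000637.txt`.

```python
import pandas as pd

# Toy data (made-up values): BX_COUNTER counts up within an orbit and
# wraps back to 0 when ORBIT_CNT increments.
toy = pd.DataFrame({
    "ORBIT_CNT":  [10, 10, 10, 11, 11, 12],
    "BX_COUNTER": [5, 42, 99, 0, 17, 3],
})

# Assumption: the counter runs from 0 up to its maximum, so the number of
# BX per orbit is the largest observed value plus one.
x_est = int(toy["BX_COUNTER"].max()) + 1
print("Estimated BX per orbit:", x_est)
```

On the real data the same expression would be applied to `df["BX_COUNTER"]`; the estimate is only as good as the sample, since a short read may never reach the true maximum.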


3. Create a new column with the absolute time in ns (as a combination of the other three columns with timing information) since the beginning of the data acquisition, and convert the new column to a Time Series.

*Hint*: introduce an offset to the absolute time such that the start of the data acquisition (i.e. the first entry) is zero.

In [24]:
# offset to the absolute time in all three columns
orbit_offset = df["ORBIT_CNT"] - df["ORBIT_CNT"].loc[0]
counter_offset = df["BX_COUNTER"] - df["BX_COUNTER"].loc[0]
tdc_offset = df["TDC_MEAS"] - df["TDC_MEAS"].loc[0]

#compute absolute time in ns
df["Time"] = orbit_offset*df_mean*25 + counter_offset*25 + tdc_offset

#Convert to a Time Series
df['Time'] = pd.to_datetime(df['Time'], unit='ns')

print(df['Time'])

0      1970-01-01 00:00:00.000000000
1      1970-01-01 00:00:00.000000001
2      1970-01-01 00:00:00.000004477
3      1970-01-01 00:00:00.000004593
4      1970-01-01 00:00:00.000009649
                    ...             
9995   1970-01-01 00:00:00.007888653
9996   1970-01-01 00:00:00.007888772
9997   1970-01-01 00:00:00.007888940
9998   1970-01-01 00:00:00.007888907
9999   1970-01-01 00:00:00.007890680
Name: Time, Length: 10000, dtype: datetime64[ns]
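According to the introduction, one `TDC_MEAS` count is 25/30 ns and one `BX_COUNTER` unit is 25 ns, so each column needs its own scale factor before the three are summed. The sketch below is one possible way to write that conversion; the helper name `absolute_time_ns`, the `bx_per_orbit` value of 100, and the toy rows are all hypothetical and chosen only to keep the arithmetic easy to check.

```python
import pandas as pd

def absolute_time_ns(frame: pd.DataFrame, bx_per_orbit: int) -> pd.Series:
    """Absolute time in ns, shifted so that the first row is at zero."""
    t = (
        frame["ORBIT_CNT"] * bx_per_orbit * 25   # orbits -> BX -> ns
        + frame["BX_COUNTER"] * 25               # BX -> ns
        + frame["TDC_MEAS"] * 25 / 30            # TDC counts -> ns
    )
    return t - t.iloc[0]

# Toy rows (made-up values, not from the dataset)
toy = pd.DataFrame({
    "ORBIT_CNT":  [10, 10, 11],
    "BX_COUNTER": [5, 5, 0],
    "TDC_MEAS":   [0, 6, 0],
})

time_ns = absolute_time_ns(toy, bx_per_orbit=100)

# Round to whole ns before building the time series, since Timedelta
# values have nanosecond resolution.
time_series = pd.to_timedelta(time_ns.round().astype("int64"), unit="ns")
print(time_series)
```

A `Timedelta` series is used here instead of `to_datetime` because the values are elapsed times rather than calendar dates; either representation supports the `.dt` accessor.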


4. Find out the duration of the data taking in hours, minutes and seconds, by using the features of the Time Series. Perform this check reading the whole dataset.

In [14]:
print(pd.to_datetime(df['Time']).dt.time)

0              00:00:00
1              00:00:00
2       00:00:00.000004
3       00:00:00.000004
4       00:00:00.000009
             ...       
9995    00:00:00.007888
9996    00:00:00.007888
9997    00:00:00.007888
9998    00:00:00.007888
9999    00:00:00.007890
Name: Time, Length: 10000, dtype: object
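The duration of the data taking is the difference between the last and the first timestamp. A minimal sketch, assuming the absolute time is available in ns: convert it to `Timedelta` values, subtract the minimum from the maximum, and read hours, minutes and seconds from the `components` attribute. The values in `time_ns` are made up so that the result is easy to verify by hand.

```python
import pandas as pd

# Hypothetical absolute times in ns since the first hit (made-up values)
time_ns = pd.Series([0, 1_500_000_000, 3_725_000_000_000])

elapsed = pd.to_timedelta(time_ns, unit="ns")

# Total duration: latest timestamp minus earliest timestamp
duration = elapsed.max() - elapsed.min()

# components splits the Timedelta into days, hours, minutes, seconds, ...
c = duration.components
print(f"{c.hours} h, {c.minutes} min, {c.seconds} s")
```

Note that `components.hours` does not include whole days; for runs longer than 24 hours, `c.days` would have to be taken into account as well.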


5. Use the `.groupby()` method to find out the noisy channels, i.e. the TDC channels with most counts (print to screen the top 3 and the corresponding counts)

In [30]:
print(df.groupby('TDC_CHANNEL').size().sort_values(ascending=False).head(3))

TDC_CHANNEL
139    860
63     499
64     491
dtype: int64


6. Count the number of non-empty orbits (i.e. the number of orbits with at least one hit).

In [46]:
# first check for NaN values (NaN never compares equal to anything, so use isna())
count = df["ORBIT_CNT"].isna()
print('ORBIT_CNT nan are:',df[count])

# every distinct ORBIT_CNT value in the data has at least one hit, so count the unique values
n1 = df["ORBIT_CNT"].nunique()
print("Number of non-empty orbits:", n1)


ORBIT_CNT nan are: Empty DataFrame
Columns: [HEAD, FPGA, TDC_CHANNEL, ORBIT_CNT, BX_COUNTER, TDC_MEAS, Time]
Index: []
Number of non-empty orbits: 101


7. Count the number of unique orbits with at least one measurement from TDC_CHANNEL=139.

In [49]:
print("number of different ORBIT_CNT on TDC_CHANNEL=139:")
print(len(df.loc[df["TDC_CHANNEL"]==139].groupby(['ORBIT_CNT'])))


number of different ORBIT_CNT on TDC_CHANNEL=139:
101


8. Create two Series (one for each FPGA) that have the TDC channel as index, and the number of counts for the corresponding TDC channel as values.

In [50]:
series_1 = df.loc[df["FPGA"] == 0].groupby(['TDC_CHANNEL']).size()
series_2 = df.loc[df["FPGA"] == 1].groupby(['TDC_CHANNEL']).size()
print(series_1)
print(series_2)


TDC_CHANNEL
1        4
2       10
3        8
4       11
5       11
      ... 
121     45
122     55
123    138
124    130
139    599
Length: 117, dtype: int64
TDC_CHANNEL
1      237
2      250
3      193
4      200
5      128
      ... 
125      2
126      1
127     10
128     11
139    261
Length: 115, dtype: int64
