# Type A Evaluations of Uncertainty

In previous chapters you have come to understand what a measurement is, and how to use a probability density function to model your information about a measurand. The pdf may be summarised by your best approximation of the measurand and the standard uncertainty, which together form the result of the measurement.

You have already learnt how to complete a Type B evaluation of uncertainty to estimate the standard uncertainty associated with certain sources of uncertainty. In this chapter you will learn how to deal with dispersion (scatter) in a set of repeated readings (observations) of the same measurand.


## 5.1 Dispersion of Data

Consider the following experiment, which you perform in the physics laboratory. A wooden slope is clamped near the edge of a table. A ball is released from a height h above the table, as shown in the diagram. The ball leaves the slope __horizontally__ and lands on the floor a distance d from the edge of the table. Special paper is placed on the floor, on which the ball makes a small mark when it lands, and you use a long ruler to measure d and h. You have been asked to investigate how the distance d on the floor changes when the height h is varied.


<img src="5_1-Fig1.png" style="width: 600px;"/>

You decide to roll the ball once from a height h = 78.0 mm. You then see a single spot on the paper where the ball landed.  The first thing to do is to measure d with a ruler. Since the ruler has markings every 1 mm, we can estimate the reading of d to the nearest 0.1 mm.

<img src="5_1-Fig2.png" style="width: 600px;"/>

What value would you record for the location of this spot, d_1?  Enter your answer in the cell below.  Hit <kbd>Shift</kbd>+<kbd>Enter</kbd> after entering your value.

<kbd>Shift</kbd>+<kbd>Enter</kbd> tells the notebook to evaluate the code in the cell.  In this case it will assign your numerical value to the variable d_1.

In [None]:
d_1= #Enter your number after the equals sign on this line

print("You think that d_1 is equal to", d_1, "mm.")

You decide to roll the ball a second time from the same height h = 78.0 mm and observe that the ball lands at a slightly different position. You now see two spots on the paper.

<img src="5_1-Fig3.png" style="width: 600px;"/>

What value would you record for d_2?  Enter your value in the cell below.  Hit <kbd>Shift</kbd>+<kbd>Enter</kbd> after entering your value.


In [None]:
d_2=  #Enter your number after the equals sign on this line

print("You think that d_2 is equal to", d_2, "mm.")

What should you write down if you were asked to record one value to represent the best value for the distance d?  You can do arithmetic in code cells.  For instance, if you thought you should subtract the two values of d_1 and d_2 and then multiply that result by 113 (clearly you are a little confused), you could type the following into the cell below:

```python

d=113*(d_1-d_2)
```

Enter your value for d below and explain why you chose this value by replacing the text between quotes. Hit <kbd>Shift</kbd>+<kbd>Enter</kbd> after entering your value. 

In [None]:
d=   #Enter your number after the equals sign on this line

print("You think the best value to describe d is",d,"mm.")

print("Enter your answer to the question between these quote signs")

Say that you now decide to roll the ball a third time from the same height h = 78.0 mm, and the ball lands at a slightly different position. You now see three spots on the paper.

<img src="5_1-Fig4.png" style="width: 600px;"/>

Write down your value for d_3.  Also, include what value you would record for the best value of d.

In [None]:
d_3 =  #Enter your number after the equals sign on this line

print("You think that d_3 is equal to", d_3, "mm.")

d =  #Enter your number after the equals sign on this line

print("You think the best value for d is",d,"mm.")

We see that there is clearly a dispersion, or scatter, in the readings for d. Why do the spots not all land exactly on top of each other? It is usually not possible to identify a single cause of the observed scatter in the data. Even if you do the experiment as carefully as possible, there will still be a dispersion in the readings of d. The important question is how to deal with this dispersion (in this case, the dispersion in d).

The best approximation for d after one roll is clearly 650.4 mm. After two or more rolls, the average, or arithmetic mean, of all the readings is usually the best value to use. Why is this the case?

In [None]:
print("Enter your answer here between the quote signs and replace this text.")
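One way to see why the mean is a sensible choice: it is the value that makes the sum of squared differences from the readings as small as possible. The sketch below checks this numerically. Only the 650.4 mm reading comes from the text; the other two readings and the helper `sum_of_squares` are hypothetical and chosen purely for illustration.

```python
# First reading from the text; the other two are hypothetical values in mm.
readings = [650.4, 657.1, 661.8]

def sum_of_squares(candidate, values):
    """Sum of squared differences between each reading and a candidate value."""
    return sum((value - candidate) ** 2 for value in values)

mean = sum(readings) / len(readings)

# Compare the mean with nearby candidate values for the "best value" of d.
for candidate in (mean - 2.0, mean - 1.0, mean, mean + 1.0, mean + 2.0):
    print("candidate", round(candidate, 2), "mm gives sum of squares",
          round(sum_of_squares(candidate, readings), 2))
```

Any candidate other than the mean gives a larger sum of squares, so in this sense the mean sits closest to all of the readings at once.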

You take a fourth measurement and find that d_4 = 666.9 mm.

Put your cursor in the following cell and hit <kbd>Shift</kbd> + <kbd>Enter</kbd>.

In [None]:
d_4=666.9

print("After 2 measurements the average is d=",(d_1+d_2)/2,"mm.")

print("After 3 measurements the average is d=",(d_1+d_2+d_3)/3,"mm.")

print("After 4 measurements the average is d=",(d_1+d_2+d_3+d_4)/4,"mm.")

You can see that the average changes as we take more and more readings.

<img src="5_1-Fig5.png" style="width: 600px;"/>

How many readings should you take? It is not possible to give a firm answer to this question. Let us say that you decide to roll the ball a total of 75 times from the same height, h = 78.0 mm. Then you might see the following pattern of spots on the paper:

<img src="5_1-Fig6.png" style="width: 600px;"/>


Look at the table produced by the code cells below. The data from Table 5.1 are listed together with the “running average”, which is the average calculated up to and including each reading. You can see that the average jumps around quite a bit when there are only a few readings. As the number of readings increases, the average approaches a constant value.

Therefore, when you see a dispersion in your readings in an experiment, you should try to take as many readings as possible. Of course, it is usually not practical to take millions of readings, so you need to consider carefully how many repeated readings are necessary to give a reliable average.
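The settling of the running average can be illustrated with simulated readings. The following is a minimal sketch and does not use the laboratory data: it assumes readings scattered around 655 mm with a spread of 20 mm (both values are hypothetical), and the helper `running_means` is introduced here only for illustration.

```python
import random

random.seed(1)  # fixed seed so the sketch gives the same numbers each time

# Hypothetical readings in mm: simulated scatter, NOT the laboratory data.
readings = [random.gauss(655.0, 20.0) for _ in range(1000)]

def running_means(values):
    """Return the average of the first 1, 2, ..., n values."""
    means = []
    total = 0.0
    for count, value in enumerate(values, start=1):
        total += value
        means.append(total / count)
    return means

means = running_means(readings)

# How far the running average moves during the first 10 readings
# compared with the last 10 readings.
early_range = max(means[:10]) - min(means[:10])
late_range = max(means[-10:]) - min(means[-10:])

print("Movement of the running average over the first 10 readings:", round(early_range, 3), "mm")
print("Movement of the running average over the last 10 readings: ", round(late_range, 3), "mm")
```

With only a few readings, each new value shifts the average noticeably; after many readings a single new value barely changes it.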


In [6]:
%matplotlib inline  
#Used to show plots and graphs in the notebook

import pandas as pd  #Import the pandas library under the short name pd


data=pd.read_csv('data_for_uncertainty_packet_1.csv')   #Read in the data file

In [5]:
data  #Print the data to make sure it looks ok

   measurements
0    599.808197
1    661.222745
2    662.578944
3    691.993904
4    670.920782
5    696.048456
6    671.238760
7    693.148831
8    690.893400
9    679.470112


In [4]:
data['measurements'].plot.hist(bins=10)  #Histogram of the readings using 10 bins (run the cell that reads the data file first)

In [None]:
data.measurements.describe()

In [None]:
data.measurements[1:50].describe()

In [None]:
data.measurements[1:50].mean()

In [None]:
data.index

In [None]:
data.sort_index()

In [None]:
data.sort_values(by='measurements')

In [7]:
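def running_average(dataframe,column):
    #Adds two columns to dataframe: the running average and running standard deviation of column
    new_data1=[]   #running averages
    new_data2=[]   #running standard deviations
    for i in range(0,column.size):
        new_data1.append(column.iloc[0:i+1].mean())   #average of the first i+1 readings
        new_data2.append(column.iloc[0:i+1].std())    #NaN when there is only one reading
    dataframe['running_avg']=pd.Series(new_data1, index=dataframe.index)
    dataframe['std']=pd.Series(new_data2,index=dataframe.index)

running_average(data,data.measurements)
print(data)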

    measurements  running_avg        std
0     599.808197   599.808197        NaN
1     661.222745   630.515471  43.426643
2     662.578944   641.203295  35.855619
3     691.993904   653.900948  38.755711
4     670.920782   657.304914  34.415676
5     696.048456   663.762171  34.608208
6     671.238760   664.830255  31.718958
7     693.148831   668.370077  31.025918
8     690.893400   670.872669  29.977463
9     679.470112   671.732413  28.393487
10    671.372211   671.699667  26.936645
11    682.820053   672.626366  25.882929
12    637.044886   669.889329  26.673708
13    649.236465   668.414125  26.214960
14    646.638097   666.962390  25.879523
15    665.141300   666.848571  25.006140
16    651.050932   665.919299  24.513377
17    638.585089   664.400731  24.638732
18    661.838186   664.265861  23.951759
19    647.561383   663.430637  23.610267
20    642.385359   662.428481  23.466212
21    645.129176   661.642149  23.195777
22    656.223614   661.406560  22.690616
23    655.867593
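The loop in `running_average` can also be expressed with pandas' built-in expanding windows. This is a sketch of an alternative, not a replacement for the function above; the four readings are hypothetical and serve only to keep the example self-contained.

```python
import pandas as pd

# Hypothetical readings in mm, used only to keep the sketch self-contained.
readings = pd.Series([650.4, 657.1, 661.8, 666.9], name='measurements')

# expanding() uses a window that grows from the first reading to the current one.
running_avg = readings.expanding().mean()
running_std = readings.expanding().std()   #NaN in the first row: one reading has no spread

summary = pd.DataFrame({'measurements': readings,
                        'running_avg': running_avg,
                        'std': running_std})
print(summary)
```

The last row of `running_avg` and `std` equals the mean and standard deviation of the whole series, just as in the table printed by `running_average`.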

In [9]:
sorted_data=data.sort_values(by='measurements')
print(sorted_data)

    measurements  running_avg        std
0     599.808197   599.808197        NaN
24    620.000638   659.528765  23.259395
35    623.076431   657.900045  23.017818
48    623.332725   655.356962  21.545377
31    628.838536   659.275289  23.135334
63    629.543362   654.974484  20.486943
37    629.790219   656.666781  23.032166
65    630.312974   654.469381  20.421952
59    630.381697   655.483388  20.727461
71    631.830688   654.717943  20.384814
40    636.291627   656.017464  22.387918
12    637.044886   669.889329  26.673708
43    637.277811   655.985283  22.163289
61    637.688874   655.373816  20.559771
30    637.761680   660.257119  22.830005
17    638.585089   664.400731  24.638732
36    639.145847   657.393175  22.904337
34    640.818916   658.895005  22.554747
20    642.385359   662.428481  23.466212
25    643.164622   658.899374  23.014320
54    643.969198   656.008615  21.127525
21    645.129176   661.642149  23.195777
68    645.697184   654.829240  20.484865
53    646.170201

In [11]:
running_average(sorted_data,sorted_data.measurements)
print(sorted_data)

    measurements  running_avg        std
0     599.808197   599.808197        NaN
24    620.000638   609.904417  14.278212
35    623.076431   614.295089  12.639923
48    623.332725   616.554498  11.266387
31    628.838536   619.011305  11.197238
63    629.543362   620.766648  10.899077
37    629.790219   622.055730  10.517779
65    630.312974   623.087885  10.165779
59    630.381697   623.898309   9.815103
71    631.830688   624.691547   9.587726
40    636.291627   625.746099   9.744995
12    637.044886   626.687665   9.847350
43    637.277811   627.502292   9.875043
61    637.688874   628.229905   9.870516
30    637.761680   628.865356   9.824713
17    638.585089   629.472840   9.797682
36    639.145847   630.041840   9.772352
34    640.818916   630.640567   9.814978
20    642.385359   631.258714   9.911707
25    643.164622   631.854009  10.007938
54    643.969198   632.430923  10.106449
21    645.129176   633.008116  10.227699
68    645.697184   633.559815  10.336903
53    646.170201

In [20]:
op=data.measurements.value_counts(bins=10)

op.sort_index()
#This returns a series, not a dataframe

(599.711, 609.432]     1
(609.432, 619.056]     0
(619.056, 628.68]      3
(628.68, 638.304]     11
(638.304, 647.928]    15
(647.928, 657.552]    17
(657.552, 667.176]     8
(667.176, 676.8]       8
(676.8, 686.424]       5
(686.424, 696.048]     7
Name: measurements, dtype: int64

In [21]:
list(op)

[17, 15, 11, 8, 8, 7, 5, 3, 1, 0]

In [26]:
freq_table=pd.DataFrame({'frequency':op.values, 'range':op.index})

#convert Series into a dataframe and set old index to one of the columns

In [28]:
freq_table.frequency.sum()

75

In [30]:
freq_table['relative_freq']=freq_table.frequency/freq_table.frequency.sum()
freq_table

   frequency               range  relative_freq
0         17  (647.928, 657.552]       0.226667
1         15  (638.304, 647.928]       0.200000
2         11   (628.68, 638.304]       0.146667
3          8    (667.176, 676.8]       0.106667
4          8  (657.552, 667.176]       0.106667
5          7  (686.424, 696.048]       0.093333
6          5    (676.8, 686.424]       0.066667
7          3   (619.056, 628.68]       0.040000
8          1  (599.711, 609.432]       0.013333
9          0  (609.432, 619.056]       0.000000
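The frequency table above is ordered by count, because `value_counts` sorts by frequency by default. If the bins should appear in increasing order of d instead, one option is to pass `sort=False`. The sketch below assumes a small set of hypothetical readings rather than the data file.

```python
import pandas as pd

# Hypothetical readings in mm, used only to keep the sketch self-contained.
readings = pd.Series([650.4, 657.1, 661.8, 666.9, 648.2, 655.0, 659.3, 652.7])

# sort=False keeps the bins in increasing order instead of ordering them by count.
counts = readings.value_counts(bins=4, sort=False)

freq_table = pd.DataFrame({'range': counts.index.astype(str),
                           'frequency': counts.values})
freq_table['relative_freq'] = freq_table.frequency / freq_table.frequency.sum()
print(freq_table)
```

The relative frequencies add up to 1, so each one can be read as the fraction of readings that fall in that range.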
