In [4]:
using ColorBitstring, SetRounding

## Rounding

In [14]:
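# RoundNearest: roughly, in binary, discard the extra bits if the first of them is 0 and round up if it is 1 (an exact tie goes to the even neighbour)
printlnbits(Float16(1/3, RoundNearest)) # the rounding mode is passed inside the Float16(...) call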
printlnbits(Float16(1/3, RoundUp))
printlnbits(Float16(1/3, RoundDown))

[31m0[0m[32m01101[0m[34m0101010101[0m
[31m0[0m[32m01101[0m[34m0101010110[0m
[31m0[0m[32m01101[0m[34m0101010101[0m


## Approximate Arithmetic  `+` `-` `*` `/` `^` `sqrt`

non-associativity of addition

In [27]:
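# 64-bit (Float64) settings: S significand bits, Q exponent bits
S = 52
Q = 11
# 1 + 11 + 52 = 64 bits. y = 2.0^(-53) is itself a normal Float64, but 1.0 + y lies exactly
# halfway between 1.0 and the next Float64 (1.0 + 2^-52), so the sum rounds back to 1.0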

11

In [32]:
x = 1.0
y = 2.0^(-53)
x+y

1.0

In [34]:
printbits(x+y)

[31m0[0m[32m01111111111[0m[34m0000000000000000000000000000000000000000000000000000[0m

In [40]:
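# The default mode is RoundNearest, which here gives the same result as RoundDown: 1.0 + 2^-53 is a tie and goes to the even neighbour, 1.0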
z = setrounding(Float64, RoundUp) do
    x+y
end
printlnbits(z)

[31m0[0m[32m01111111111[0m[34m0000000000000000000000000000000000000000000000000001[0m


In [41]:
(x+y)+y

1.0

In [42]:
x+(y+y)

1.0000000000000002
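The two groupings differ because of where the rounding happens. In `(x+y)+y`, each addition is a tie that rounds back to 1.0. In `x+(y+y)`, the inner sum 2^-52 is exact and 1.0 + 2^-52 is representable. The following sketch states this as checks.

```julia
x = 1.0
y = 2.0^(-53)

left  = (x + y) + y   # each step is a tie that rounds back to 1.0
right = x + (y + y)   # y + y = 2^-52 exactly, and 1.0 + 2^-52 is a Float64

println(left == 1.0)               # true
println(right == nextfloat(1.0))   # true
println(left == right)             # false: floating-point addition is not associative
```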

Computing (1/3)^2 using 16-bit floats (Float16)

In [65]:
S = 10     # Float16 significand bits
Q = 5      # Float16 exponent bits
sigma = 15 # exponent bias σ

15
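The bias σ = 15 follows the IEEE 754 rule σ = 2^(Q-1) - 1. A quick cross-check: 1.0 = 2^0 must be stored with an exponent field equal to σ, so reading that field from `bitstring` recovers the bias. The sketch does this for Float16 and Float64; `bias` is a hypothetical helper.

```julia
# IEEE 754 exponent bias for a format with Q exponent bits
bias(Q) = 2^(Q - 1) - 1

# 1.0 = 2^0 is stored with exponent field q = σ, so the field value is the bias
stored_bias16 = parse(Int, bitstring(Float16(1.0))[2:6]; base = 2)   # 5 exponent bits
stored_bias64 = parse(Int, bitstring(1.0)[2:12]; base = 2)           # 11 exponent bits

println(bias(5), " ", stored_bias16)    # Float16: 15 15
println(bias(11), " ", stored_bias64)   # Float64: 1023 1023
```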

In [66]:
x = Float16(1/3)
printlnbits(x)
printlnbits(x^2)

[31m0[0m[32m01101[0m[34m0101010101[0m
[31m0[0m[32m01011[0m[34m1100011100[0m


In [67]:
q = parse(Int, "01011", base=2)

11

In [68]:
sig = 2.0^(-1.0*S) * parse(Int, "1" * "1100011100"; base=2) # prepend the implicit leading 1, then scale by 2^-S

1.77734375

In [69]:
2.0^(q-sigma) * sig
# close to 1/9 = 0.1111..., as expected, but with less precision (Float16 carries only about 3 significant decimal digits)

0.111083984375
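The hand decoding in the three cells above generalises to any normal Float16: value = (-1)^s · 2^(q - σ) · (1 + m/2^S). The sketch below packages it as a hypothetical helper `decode16`. It deliberately rejects subnormals, zero, Inf and NaN (exponent field 0 or 31), which need different rules.

```julia
# Decode a normal Float16 from its fields:
#   value = (-1)^s * 2^(q - σ) * (1 + m / 2^S),  with S = 10, σ = 15
function decode16(x::Float16)
    S, σ = 10, 15
    bits = bitstring(x)
    s = parse(Int, bits[1:1]; base = 2)   # sign bit
    q = parse(Int, bits[2:6]; base = 2)   # biased exponent
    m = parse(Int, bits[7:16]; base = 2)  # stored significand (without leading 1)
    @assert 0 < q < 31 "only normal numbers are handled in this sketch"
    return (-1)^s * 2.0^(q - σ) * (1 + m / 2^S)
end

x = Float16(1/3)
println(decode16(x^2))                  # 0.111083984375, matching the manual computation
println(decode16(x^2) == Float64(x^2))  # true
```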

In [64]:
x^2

Float16(0.1111)

## Bounding Rounding errors

In [7]:
# absolute error of computing (1.1+1.2)*1.3 in Float64, measured against the Float64 literal 2.99
2.99 - (1.1+1.2)*1.3

4.440892098500626e-16

Machine epsilon: the gap between 1 and the next larger representable number of each floating-point type

In [32]:
eps(Float16), 
eps(Float32), 
eps()

(Float16(0.000977), 1.1920929f-7, 2.220446049250313e-16)
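These three values are 2^-S for S = 10, 23 and 52 significand bits. The short loop below is a sketch that checks both characterisations: the power of two, and the gap between `one(T)` and `nextfloat(one(T))`.

```julia
# Machine epsilon of a type with S significand bits is 2^-S,
# i.e. the gap between one(T) and nextfloat(one(T))
for (T, S) in ((Float16, 10), (Float32, 23), (Float64, 52))
    println(T, ": ", eps(T) == T(2)^(-S), "  ", eps(T) == nextfloat(one(T)) - one(T))
end
```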

In [36]:
eps(1.0),   # eps() with no argument is the same as eps(1.0)
eps(3.0),   # eps(x) is the gap between x and the next representable float (the ulp of x)
eps(1000.) 

(2.220446049250313e-16, 4.440892098500626e-16, 1.1368683772161603e-13)
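For a normal Float64 `x`, the spacing at `x` is 2^(e - 52), where e = `exponent(x)` is the unbiased exponent, i.e. floor(log2|x|). That gives 2^-52, 2^-51 and 2^-43 for 1.0, 3.0 and 1000.0. The sketch below checks this formula against `eps`; `spacing` is a hypothetical helper.

```julia
# Spacing of Float64 values at a normal x: 2^(exponent(x) - 52)
spacing(x::Float64) = 2.0^(exponent(x) - 52)

for v in (1.0, 3.0, 1000.0)
    println(v, "  exponent = ", exponent(v), "  ", spacing(v) == eps(v))
end
```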

The distance between two adjacent representable floating-point numbers is not constant: it is smaller for values of small magnitude and larger for values of large magnitude, doubling at each power of two. In other words, the representable floating-point numbers are densest on the real number line near zero and grow exponentially sparser as one moves farther away from zero.
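One way to see this density pattern is that each binade [2^k, 2^(k+1)) holds the same number of floats (2^10 = 1024 for Float16), while the interval itself doubles in width. Because consecutive positive floats have consecutive bit patterns, counting them reduces to subtracting the reinterpreted integers. The sketch uses Float16 to keep the numbers small; `count_between` is a hypothetical helper.

```julia
# Consecutive positive floats have consecutive bit patterns,
# so the number of Float16 values in [a, b) is a subtraction of the raw bits
count_between(a::Float16, b::Float16) = Int(reinterpret(UInt16, b) - reinterpret(UInt16, a))

println(count_between(Float16(1), Float16(2)))   # floats in [1, 2): 1024
println(count_between(Float16(2), Float16(4)))   # floats in [2, 4): 1024, over twice the width
println(eps(Float16(2)) == 2 * eps(Float16(1)))  # true: the spacing doubles
```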

In [22]:
abs((a+b)*c - (1.1+1.2)*1.3) , 11.47*eps() # a, b, c are the BigFloat values defined in In [19]

(3.907985046680551042012706102204982635293214132069655741830160877725575119256973e-16, 2.5468516184901092e-15)

In [16]:
abs(2.99 - (1.1+1.2)*1.3) <= 11.47*eps()

true

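The left-hand side is the absolute error; the right-hand side is the upper bound on that error derived by hand, 11.47 ε_m.
The bound is confirmed when the left-hand side is less than or equal to the right-hand side.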
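Both checks above compare against a rounded reference: In [16] uses the Float64 literal 2.99, and In [22] uses the exact product of the already-rounded inputs. To measure the error against the true real number 2.99, exact rational arithmetic works, because `Rational{BigInt}` can hold every Float64 exactly. The sketch below does this and checks the same 11.47 ε_m bound. It does not assume any particular value for the error beyond the bound.

```julia
# Error of the Float64 computation measured against the exact real number 2.99
computed = (1.1 + 1.2) * 1.3
exact    = big(299) // 100                    # 2.99 as an exact rational
err      = abs(exact - Rational{BigInt}(computed))

println(Float64(err))
println(Float64(err) <= 11.47 * eps())        # the hand-derived bound still holds
```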

In [19]:
a,b,c = big(1.1), big(1.2), big(1.3)

(1.100000000000000088817841970012523233890533447265625, 1.1999999999999999555910790149937383830547332763671875, 1.3000000000000000444089209850062616169452667236328125)

In [20]:
(a+b)*c

2.990000000000000159872115546022543793155223257607638529321413206965574183016088
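The long expansions of `big(1.1)` and the others show that every Float64 is a dyadic rational m/2^k, so decimal literals are rounded on input, sometimes up and sometimes down. A small sketch using exact rationals makes the direction of each rounding explicit. It matches the digits printed in In [19]: 1.1 and 1.3 are stored slightly above, and 1.2 slightly below.

```julia
# Every Float64 is m / 2^k; Rational{BigInt}(x) recovers the exact stored value
for (lit, dec) in ((1.1, 11//10), (1.2, 12//10), (1.3, 13//10))
    r = Rational{BigInt}(lit)
    println(lit, ": stored ", r > dec ? "above" : "below", " the decimal value; ",
            "denominator is a power of 2: ", ispow2(denominator(r)))
end
```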

#### Numeric literal syntax (notes I looked up myself)

- `0b` prefix: binary (base 2)
- `0o` prefix: octal (base 8)
- `0x` prefix: hexadecimal (base 16), using digits 0-9 and a-f
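One detail worth knowing: in Julia these prefixed integer literals produce *unsigned* integers whose width depends on how many digits are written. Printing them directly therefore shows hex (e.g. `0x05`). The sketch converts with `Int(...)` for display and also shows the string/integer round trip used earlier to decode exponent bits.

```julia
# Prefixed integer literals are unsigned; their width depends on the digit count
println(Int(0b101), "  ", typeof(0b101))   # binary      -> 5  UInt8
println(Int(0o17),  "  ", typeof(0o17))    # octal       -> 15  UInt8
println(Int(0xff),  "  ", typeof(0xff))    # hexadecimal -> 255  UInt8
println(Int(0x123), "  ", typeof(0x123))   # 3 hex digits need 12 bits -> 291  UInt16

# Round trip between a binary string and an integer
println(parse(Int, "01011"; base = 2))     # 11
println(string(11; base = 2))              # 1011
```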

In [29]:
0x1.8p3 # pN means ×2^N: [1*16^0 + 8*16^(-1)] * 2^3 = 1.5 * 8 = 12

12.0

In [31]:
0x.4p-1 # [4*16^(-1)] * 2^(-1) = 0.125

0.125
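Hexadecimal float literals map directly onto the binary layout: each hex digit after the point is 4 significand bits. Thirteen hex digits give exactly the 52 stored bits of a Float64, which is why 1/3 has the exact literal form below. This is a small sketch of the worked arithmetic in the two cells above.

```julia
# Hexadecimal float literals: 0x<hex significand>p<power of 2>
println(0x1.8p3 == (1 + 8/16) * 2^3)       # 1.5 * 8 = 12
println(0x.4p-1 == (4/16) * 2.0^(-1))      # 0.25 * 0.5 = 0.125

# 13 hex digits after the point = 52 bits, exactly a Float64 significand
println(1/3 == 0x1.5555555555555p-2)       # true
println(typeof(0x1.8p3))                    # Float64
```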

In [28]:
1.8e3, 1.8f3 # both e and f mean ×10^n; the f form produces a Float32

(1800.0, 1800.0f0)
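The `f` suffix changes the type and therefore the precision, not just the display. 1800 is exactly representable in both formats, but a literal like 1.1 rounds differently with 24 significand bits (Float32) than with 53 (Float64). The following sketch shows both cases.

```julia
# `e` gives a Float64 literal, `f` gives a Float32 literal
println(typeof(1.8e3), "  ", typeof(1.8f3))   # Float64  Float32
println(1.8f3 == 1800)                        # true: exactly representable
println(1.1f0 == 1.1)                         # false: rounded at 24 vs 53 bits
```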