Skip to content
This repository has been archived by the owner on Dec 3, 2019. It is now read-only.

DomainError in tests #53

Closed
gdkrmr opened this issue Jul 9, 2017 · 5 comments
Closed

DomainError in tests #53

gdkrmr opened this issue Jul 9, 2017 · 5 comments
Assignees

Comments

@gdkrmr
Copy link
Contributor

gdkrmr commented Jul 9, 2017

in v0.2.0

INFO: Testing MLKernels.kernelmatrix!
ERROR: LoadError: LoadError: DomainError:
 in exponentialkernel at /home/me/.julia/v0.5/MLKernels/src/kernel.jl:110 [inlined]

z can be smaller than zero, I think this is due to calculating the squared euclidean distance as x^2 - 2xy + y^2

@trthatcher
Copy link
Owner

Which OS are you using?

Do you have a sample matrix that can replicate this error?

@gdkrmr
Copy link
Contributor Author

gdkrmr commented Jul 11, 2017

I am on ubuntu 16.04, my julia version is self compiled (Version 0.5.3-pre+5 (2017-05-16 09:15 UTC) Commit c3b8ad0* (55 days old release-0.5)).

I get this behaviour all the time, especially on the diagonal:

julia> ϕ = ExponentialKernel(1.0)
ExponentialKernel(1.0)

julia> x
10×10 Array{Float64,2}:
 -1.45593    0.286229    2.34105   -1.26427    -2.28669     0.43442    1.53953    -0.91332    0.386365   -0.964745 
  0.183972  -0.0851034  -0.367304  -0.344591   -1.03497     0.327375   0.47774     0.12549   -1.49272     0.301446 
 -0.569892  -0.220827   -0.833623  -0.775212   -0.813796    0.766721  -0.220733   -0.069287  -0.0381343  -1.07011  
  1.56172   -0.266322   -0.140378  -0.0277015  -0.460872   -0.926211   2.59237    -0.861483  -1.11855    -0.0353702
 -1.2609     0.380017    1.16314    0.119831    0.462557    1.3845    -0.0886211   0.779727  -0.418095    0.20683  
  0.356869  -0.215854   -0.505304  -1.77055    -1.1945      0.116052   1.61496     0.501296   1.18037    -1.11746  
  0.467146   0.271994    2.14437    1.35191     0.998612    1.34705    2.07577    -1.37717    0.392143    1.53357  
  2.00001   -0.289949   -0.278978   0.147727    0.0634631   0.533851  -0.534337    0.427071  -0.551059   -0.901151 
  0.307727   0.420818    0.919902  -0.382258    0.964297   -0.096004  -0.856173   -0.712636  -0.236985    0.0871445
  0.273306   0.49039     0.813574  -0.681431    2.72113    -0.746216   2.15815    -0.878577   0.528021   -0.0458678

julia> pairwisematrix!(ColumnMajor(), zeros(n, n), MLKernels.pairwisefunction(ϕ), x, false)
10×10 Array{Float64,2}:
 -3.55271e-15  14.9337       32.9805   14.9182       22.159     27.1182       18.849    21.9356   17.0884     
  0.0          -2.22045e-16   9.24983   8.69629      14.8006     21.3566        8.92718   5.85424   5.69592    
  0.0           0.0           0.0      20.445        28.0014     20.2947       31.5266   16.583    14.924      
  0.0           0.0           0.0      -1.77636e-15  15.6829     36.4918       14.8083   17.6666    2.78561    
  0.0           0.0           0.0       0.0           0.0        39.9478       28.4705   21.8027   13.4939     
  0.0           0.0           0.0       0.0           0.0       29.7014       10.905    12.1144   11.6524     
  0.0           0.0           0.0       0.0           0.0         7.10543e-15  42.1763   25.1898   27.6584     
  0.0           0.0           0.0       0.0           0.0         0.0           0.0      12.5627   16.236      
  0.0           0.0           0.0       0.0           0.0         0.0           0.0       0.0      14.8132     
  0.0           0.0           0.0       0.0           0.0         0.0           0.0       0.0       1.77636e-15

for a concrete example (which I can also reproduce on julia 0.6 and the official 0.5.2 binaries):

n = 10
srand(12)
x = randn(10, n)
ep = 1e-10
x1 = x[:, 1] + rand(n) * ep - 0.5 * ep
x2 = x[:, 1]
x1' * x1 - 2 * x1' * x2 + x2' * x2
julia> 1-element Array{Float64,1}:
 -3.55271e-15

trthatcher pushed a commit that referenced this issue Jul 12, 2017
@trthatcher
Copy link
Owner

Thanks for the example, that was really helpful. It looks like it's just an artifact of floating point error. I adjusted the ordering of the calculation and it seems to be a bit more accurate, but I'm still able to generate cases where z is less than zero:

n = 3
ep = 1e-10
srand(12)
x = rand(n)
y = x + rand(n)*ep - 0.5 * ep

v1 = x'x - 2x'y + y'y

using MLKernels

M = MLKernels.PairwiseFunctions

X = reshape(x,1,n)
Y = reshape(y,1,n)

xᵀx = M.dotvectors(RowMajor(), X)
yᵀy = M.dotvectors(RowMajor(), Y)
G = M.gramian!(RowMajor(), zeros(1,1), X, Y)

v2 = M.squared_distance!(copy(G), xᵀx, yᵀy)[1]

println("Old calculation: ", v1)
println("New calculation: ", v2)

In scipy, it looks like the recommendation is just to clip the error so values outside the approved range are restricted to the interval (ie -1e-20 goes to 0.0). I also looked over the code of Distances.jl, and they have a fallback. If the value does not exceed some threshold (threshT*selfterms where threshT = 0 in the default case), the Euclidean distance is recalculated for the vector pair.

Clipping is less accurate but faster whereas the fallback would be slower but more accurate. Do you have any strong opinions here?

@gdkrmr
Copy link
Contributor Author

gdkrmr commented Jul 12, 2017

If I understand this right then the inaccuracies are only a problem when x ≈ y. I think there are cases, where this matters and cases where it does not, e.g. if you take the square root, then a small error close to 0 gets very large, if you take the exponential it does not matter too much. IMO the approach taken by Distances.jl is the best. This approach should also guarantee that distances are larger than 0.
How big would the performance hit be?

A side note: this tends to happen more frequently on the diagonal, so I suggest looping only over the upper triangle w/o the diagonal and setting the diagonal to 0 by default.

@trthatcher
Copy link
Owner

FYI - I won't access to my computer for the next month... so I won't be able to look into this issue until mid August.

I'm in agreement with the approach.

@trthatcher trthatcher self-assigned this Oct 2, 2017
@trthatcher trthatcher added this to the 0.3 milestone Oct 2, 2017
trthatcher pushed a commit that referenced this issue Oct 6, 2017
gdkrmr pushed a commit to gdkrmr/MLKernels.jl that referenced this issue Sep 26, 2018
gdkrmr pushed a commit to gdkrmr/MLKernels.jl that referenced this issue Sep 26, 2018
gdkrmr pushed a commit to gdkrmr/MLKernels.jl that referenced this issue Sep 26, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants