ulab std method for matrix #9

huybik · 2019-11-08T08:46:25Z

Symptom:
ulab.std() sometimes produce nan in the result
ulab version 0.25

Reproduce:
import ulab as np
a = np.array([[109.5894, 117.2053, 69.3245],
[109.5894, 117.1788, 69.3245],
[109.596, 117.1854, 69.3245],
[109.596, 117.1854, 69.3245],
[109.5894, 117.1722, 69.31788],
[109.5894, 117.1921, 69.31788],
[109.5894, 117.1854, 69.31788],
[109.596, 117.1656, 69.31788],
[109.596, 117.1589, 69.31126],
[109.596, 117.1788, 69.30463],
[109.596, 117.1656, 69.29801],
[109.596, 117.1391, 69.29801],
[109.5894, 117.1656, 69.29801],
[109.5828, 117.1523, 69.29801],
[109.5762, 117.1391, 69.29139],
[109.5695, 117.1457, 69.29139],
[109.5762, 117.1457, 69.29139],
[109.5762, 117.1258, 69.28477],
[109.5695, 117.1192, 69.28477],
[109.5695, 117.0861, 69.28477],
[109.5695, 117.0728, 69.27814],
[109.5695, 117.0728, 69.27814],
[109.5695, 117.0861, 69.28477],
[109.5629, 117.0861, 69.29139],
[109.5695, 117.1059, 69.29139],
[109.5695, 117.1126, 69.28477],
[109.5762, 117.1059, 69.28477],
[109.5828, 117.1258, 69.29139],
[109.5762, 117.1457, 69.29139],
[109.5762, 117.1324, 69.29139],
[109.5762, 117.1192, 69.28477],
[109.5762, 117.1192, 69.28477],
[109.5695, 117.1391, 69.28477],
[109.5695, 117.1059, 69.28477],
[109.5762, 117.1457, 69.28477],
[109.5828, 117.1258, 69.28477],
[109.5828, 117.0993, 69.28477],
[109.5828, 117.1457, 69.28477],
[109.5828, 117.1523, 69.29139],
[109.5894, 117.0993, 69.28477],
[109.5828, 117.1059, 69.27814],
[109.5762, 117.1457, 69.27814],
[109.5695, 117.1126, 69.27814],
[109.5695, 117.1192, 69.27814],
[109.5695, 117.1457, 69.28477],
[109.5695, 117.1324, 69.29139],
[109.5695, 117.1192, 69.28477],
[109.5762, 117.1523, 69.29139],
[109.5695, 117.1722, 69.29801],
[109.5695, 117.1523, 69.29801],
[109.5629, 117.1391, 69.29801],
[109.5695, 117.1722, 69.30463],
[109.5762, 117.1391, 69.31126],
[109.5762, 117.0927, 69.30463],
[109.5695, 117.1391, 69.30463],
[109.5695, 117.0927, 69.30463],
[109.5629, 117.053, 69.30463],
[109.5629, 117.0795, 69.30463],
[109.5563, 117.0728, 69.30463],
[109.5497, 117.0265, 69.29801],
[109.5497, 117.0066, 69.29139],
[109.5497, 117.0199, 69.29139],
[109.5497, 116.9868, 69.29139],
[109.5497, 116.9603, 69.29139],
[109.5497, 116.9801, 69.29801],
[109.5497, 116.9603, 69.29801],
[109.543, 116.9404, 69.29139],
[109.543, 116.9338, 69.28477],
[109.5364, 116.9536, 69.28477],
[109.543, 116.9536, 69.29139],
[109.543, 116.9603, 69.29139],
[109.5364, 116.947, 69.29139],
[109.5298, 116.9669, 69.29139],
[109.5298, 116.9801, 69.29139],
[109.5232, 116.947, 69.29139],
[109.5298, 116.9735, 69.29139],
[109.5232, 117.0066, 69.28477],
[109.5165, 117.0, 69.27814],
[109.5099, 116.9868, 69.27152],
[109.5099, 117.0132, 69.27152],
[109.5099, 117.0199, 69.27152],
[109.5099, 116.9735, 69.27152],
[109.5033, 117.0132, 69.2649],
[109.4967, 117.0662, 69.25828],
[109.4967, 117.0265, 69.25828],
[109.4967, 117.0397, 69.25828],
[109.5033, 117.0728, 69.25166],
[109.4967, 117.053, 69.25166],
[109.4967, 117.0, 69.24503],
[109.4967, 117.0331, 69.23841],
[109.4901, 117.0662, 69.23841],
[109.4967, 117.0464, 69.23841],
[109.4901, 117.0728, 69.23841],
[109.4901, 117.0993, 69.23841],
[109.4901, 117.0662, 69.23841],
[109.4967, 117.0199, 69.23841],
[109.4901, 117.053, 69.23841],
[109.4901, 117.053, 69.23841],
[109.4901, 117.0199, 69.23841],
[109.4834, 117.0596, 69.23841]])

print(np.std(a,axis=0))
Result:
array([nan, nan, nan], dtype=float)

Expected result:
array([0.03424415, 0.07198399, 0.02212126], dtype=float)

v923z · 2019-11-08T08:59:11Z

Can you provide a minimal example that would produce these results? What is the git signature of the version you are running? And which platform? stm32, unix, or something else?
Looking at the numbers, I suspect that you are running on a platform that has only floats, but no doubles, and we are facing a resolution problem. But this should definitely be fixed.

By the way, I wanted to clean up the code on std and similar functions, so this is certainly a very good time to bring this issue up.

huybik · 2019-11-08T09:23:31Z

Hi , thanks for a great library. I compiled micropython port for ESP32 and ulab followed your instruction. Everything works great and fast except for some hiccups. I'll list all those in a short while.
How do I get git signature of the version? I cloned the code 2 days ago :D

Minimal example:
a = np.array([[109.5894, 117.2053, 69.3245],
[109.5894, 117.1788, 69.3245],
[109.596, 117.1854, 69.3245]])
np.std(a,axis=0)
array([nan, 0.03856275, nan], dtype=float)
Expected result:
array([0.003111004, 0.01126226, 0.0], dtype=float)

v923z · 2019-11-08T09:32:17Z

Hi , thanks for a great library. I compiled micropython port for ESP32 and ulab followed your instruction. Everything works great and fast except for some hiccups. I'll list all those in a short while.

I haven't tried the ESP32, so I can't really comment on that. But I believe, errors are universal, so I would really appreciate, if you could post a summary of what you have found. I have to admit, I don't have too much time for testing, so feedback of this kind is really useful.

How do I get git signature of the version? I cloned the code 2 days ago :D

You can say git log on the command line. That will list all the commits in reverse order, so you can just read off the first couple of lines. That should be the last version of the copy that you have.

Minimal example:
a = np.array([[109.5894, 117.2053, 69.3245],
[109.5894, 117.1788, 69.3245],
[109.596, 117.1854, 69.3245]])
np.std(a,axis=0)
array([nan, 0.03856275, nan], dtype=float)
Expected result:
array([0.003111004, 0.01126226, 0.0], dtype=float)

Thanks! I will look into this.

huybik · 2019-11-08T09:50:42Z

Thanks. Quick summarise of what I found so far:
Setup:
a = np.array([1,2,3])
Issues:

np.mean(a,0) doesnt work, has to specify np.mean(a,axis=0), same for std
(3 * a) doesnt work, have to write (a * 3)
(1 + a) doesnt work, have to write (a+ 1)
b = [False,False,False], a[b]=0 IndexError: empty index range. Expected result: Nothing happen

v923z · 2019-11-08T10:10:54Z

Thanks. Quick summarise of what I found so far:
Setup:
a = np.array([1,2,3])
Issues:
1. np.mean(a,0) doesnt work, has to specify np.mean(a,axis=0), same for std

OK, this should be easy to fix.

2. (3 * a) doesnt work, have to write (a * 3)

3. (1 + a) doesnt work, have to write (a+ 1)

This issue has been mentioned in the manual (https://micropython-ulab.readthedocs.io/en/latest/ulab.html#binary-operators), and has been discussed here: https://forum.micropython.org/viewtopic.php?f=3&t=7005&start=30#p40469 A fix is underway, but till then, just keep ndarrays on the left hand side.

4. b = [False,False,False], a[b]=0 IndexError: empty index range. Expected result: Nothing happen

This will change very soon: I am adding Boolean arrays (this would simplify the code, and would also be numpy-conform). So, I take note of the issue, but I believe, this would have been sorted out very soon, no matter what.

v923z · 2019-11-08T18:57:43Z

I have just pushed a fix to testing https://github.com/v923z/micropython-ulab/tree/testing . That should solve the problem of the mean/std/sum functions (I have tested it on the unix port).

import ulab as np
a = np.array([[109.5894, 117.2053, 69.3245],
[109.5894, 117.1788, 69.3245],
[109.596, 117.1854, 69.3245]])
print(np.std(a,axis=0))

returns

array([0.00311126946350501, 0.01126360895015837, 0.0], dtype=float)

Incidentally,

print(np.std(a, 0))

also works. However, I have also modified the output of relational operators, so, e.g.,

a = np.array([1, 2, 3])
a < 3

which will now return a Boolean ndarray: array([True, True, False], dtype=bool). This breaks indexing and slicing, e.g., a[a<3] won't do too much good. (This is the reason, why I created the testing branch, and why I haven't pushed to master.) However, you can test the numerical functions. Let me know if you run into difficulties with them.

huybik · 2019-11-09T17:40:33Z

Thanks for lighting fast update. I'm on the testing branch
commit 50df005`

Unfortunately, I test the numerical function std and it still produces

array([nan, 0.03856275, nan], dtype=float)

I think it's platform specific bug on the ESP32. I will update you with more debug info for the platform later. Thanks.

v923z · 2019-11-09T17:52:02Z

Thanks for lighting fast update. I'm on the testing branch
commit 50df005`

Unfortunately, I test the numerical function std and it still produces

array([nan, 0.03856275, nan], dtype=float)

I think it's platform specific bug on the ESP32. I will update you with more debug info for the platform later. Thanks.

Can you test it on the unix port?

mdaeron · 2019-11-09T17:55:51Z

FWIW, on my unix (macOS) port, it yields the correct result:

array([0.00311126946350501, 0.01126360895015837, 0.0], dtype=float)

v923z · 2019-11-09T17:59:52Z

FWIW, on my unix (macOS) port, it yields the correct result:
array([0.00311126946350501, 0.01126360895015837, 0.0], dtype=float)

Thanks! Should we push this issue upstream, then? I haven't a clue what the problem might be, so I think, it could be beneficial to bring in some micropython developer. I am just not sure what the proper way of raising this issue is...

mdaeron · 2019-11-09T18:38:48Z

Should we push this issue upstream, then?

I guess so. Probably here?

v923z · 2019-11-10T16:22:42Z

Before doing that, we should probably isolate the problem. E.g., what is the result of

import ulab as np
a = np.array([[109.5894, 117.2053, 69.3245],
[109.5894, 117.1788, 69.3245],
[109.596, 117.1854, 69.3245]])
print(np.mean(a,axis=0))
print(np.sum(a,axis=0))

(this would give us an indication as to whether the error comes from the sqrt function) or

import ulab as np
a = np.array([109.5894, 109.5894, 109.596])
print(np.mean(a))
print(np.sum(a))

(is the issue related to matrices?) or the python implementation of the std function?

from math import sqrt
a = 109.5894
b = 109.5894
c = 109.596
s = a + b + c
ssq = a*a + b*b + c*c
print(s)
s = s/3
print(sqrt(ssq/3-s*s))

If these all yield the correct results, then we will probably have to whip up a minimal implementation, without ulab, but with the relevant functions, that produces the error. If that comes to that, I would suggest that we create a repository, and work there.

By the way, what is the float type on ESP32? You can find that out by checking the output of the .rawsize method https://micropython-ulab.readthedocs.io/en/latest/ulab.html#rawsize .

huybik · 2019-11-11T05:04:23Z

First of the float type

a = np.array([109.5894])
a.rawsize()
(1, 1, 1, 4, 4)

The result of

import ulab as np a = np.array([[109.5894, 117.2053, 69.3245], [109.5894, 117.1788, 69.3245], [109.596, 117.1854, 69.3245]])
print(np.mean(a,axis=0))
array([109.5916, 117.1898, 69.3245], dtype=float)
print(np.sum(a,axis=0))
array([328.7748, 351.5695, 207.9735], dtype=float)

Which is correct, except for some rounding. The result of
import ulab as np
a = np.array([109.5894, 109.5894, 109.596])
print(np.mean(a))
109.5916
print(np.sum(a))
328.7748

Which also correct. Now the last one got problem with trying to get sqrt of a negative number due to rounding. I break down the output as below:
from math import sqrt
a = 109.5894
b = 109.5894
c = 109.596
s = a + b + c
ssq = a*a + b*b + c*c
print(s)

s = s/3
109.5916
ssq
36030.96
ssq/3
12010.32
s*s
12010.32
ssq/3 - s*s
-0.0009765625
sqrt(ssq/3-s*s)
ValueError: math domain error

The error is because the result of
ssq/3-s*s
-0.0009765625
due to rounding of ssq/3 and s*s.

The sqrt function works correctly.
print(sqrt(0.0009765625))
0.03125

I'll try to enable double precission in the build and report back.

v923z · 2019-11-11T07:10:20Z

Thanks for the thorough report!

First of the float type

a = np.array([109.5894])
a.rawsize()
(1, 1, 1, 4, 4)

I thought that it was double, but obviously, I was mistaken.

Which also correct. Now the last one got problem with trying to get sqrt of a negative number due to rounding. I break down the output as below:
from math import sqrt
a = 109.5894
b = 109.5894
c = 109.596
s = a + b + c
ssq = a*a + b*b + c*c
print(s)

s = s/3
109.5916
ssq
36030.96
ssq/3
12010.32
s*s
12010.32
ssq/3 - s*s
-0.0009765625
sqrt(ssq/3-s*s)
ValueError: math domain error

The error is because the result of
ssq/3-s*s
-0.0009765625
due to rounding of ssq/3 and s*s.

The sqrt function works correctly.
print(sqrt(0.0009765625))
0.03125

OK, so this proves that the calculation of the standard deviation can't be done in that particular way. One has to use a different formula then. I can make it safe, at the expense of having to iterate the loop twice.

I'll try to enable double precission in the build and report back.

First, I am not sure you can do that, but more importantly, it wouldn't solve the problem. This whole isssue is the result of the finite resolution of floats and doubles. We would run into the same difficulties, but with different numbers. A proper solution is to use a different algorithm for the standad deviation. I will try to implement that tonight. I will ping you, once the fix is in testing.

I hope this glitch hasn't discouraged you.

mdaeron · 2019-11-11T09:03:58Z

I assume you're thinking of something like the other classical formula, which avoids substracting large similar numbers?

import ulab

def stdev(x, ddof = 0):
	l = len(x)
	m = ulab.mean(x)[0]
	s = (x-m)*(x-m)
	return ulab.sqrt(ulab.sum(s) / (l-ddof))	

x = ulab.array([109.5894, 109.5894, 109.596])

print(stdev(x))

If so, can you please consider adding the ddof argument to mimic numpy.std?

v923z · 2019-11-11T11:06:16Z

I assume you're thinking of something like the other classical formula, which avoids substracting large similar numbers?

Exactly. Actually, it can be done in a single pass, at the expense of an extra multiplication and division. I have to see, whether that is faster than running the loop twice. It might also be a bit platform-dependent.

If so, can you please consider adding the ddof argument to mimic numpy.std?

This is a very good point, thanks! I will try to add this.

v923z · 2019-11-12T19:00:44Z

I have just pushed a fix to testing. Could you, please, check that out, and see if it does what you need. ddof should also work now. Thanks!

huybik · 2019-11-13T07:29:43Z

Hi,
I got correct result now

a = np.array([[109.5894, 117.2053, 69.3245],
[109.5894, 117.1788, 69.3245],
[109.596, 117.1854, 69.3245]])
print(np.std(a,axis=0,ddof=1))
array([0.003810186, 0.0137934, 0.0], dtype=float)

Thanks a lot.

huybik · 2019-11-13T10:56:09Z

Btw I got this problem, which works before, maybe because index has been switched to boleen ndarray? I know you still working on slicing and stuff after the change.

a = np.array([1,2])
b = np.array([1])
print(a-b)
ValueError: operands could not be broadcast together

My temporary fix would be a-b[0]. This happens when I subtract the mean result, e.g. a-mean(a).

v923z · 2019-11-13T18:01:22Z

Btw I got this problem, which works before, maybe because index has been switched to boleen ndarray? I know you still working on slicing and stuff after the change.
a = np.array([1,2])
b = np.array([1])
print(a-b)
ValueError: operands could not be broadcast together

I bet, this never worked, because I never got to working out the binary operator code in its entirety.

My temporary fix would be a-b[0]. This happens when I subtract the mean result, e.g. a-mean(a).

Now, this is a different issue: mean, std, and sum used to return a single number, if the result was of length one. In the code that you tried, mean returned an array of length one. I have modified the code, it returns a float in the case that you mentioned. I tested it on the unix port, it works with your example.

Just as a side note, I want to re-write the ndarray class from scratch, so that it will support arrays with arbitrary rank. That will also change how all other functions are treated.

Merge upstream

v923z added the bug Something isn't working label Nov 8, 2019

v923z self-assigned this Nov 8, 2019

v923z mentioned this issue Nov 8, 2019

Compilation errors for unix port on macOS #8

Closed

v923z added the ESP32 a bug or feature related to ESP32 label Nov 10, 2019

v923z removed the ESP32 a bug or feature related to ESP32 label Nov 11, 2019

v923z mentioned this issue Nov 12, 2019

Fixed several issues preventing compilation on macOS #12

Merged

v923z closed this as completed Dec 2, 2019

jepler added a commit to jepler/circuitpython-ulab that referenced this issue Feb 27, 2020

Merge pull request v923z#9 from jepler/merge-upstream-2

cf0180e

Merge upstream

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ulab std method for matrix #9

ulab std method for matrix #9

huybik commented Nov 8, 2019 •

edited

v923z commented Nov 8, 2019 •

edited

huybik commented Nov 8, 2019 •

edited

v923z commented Nov 8, 2019

huybik commented Nov 8, 2019 •

edited

v923z commented Nov 8, 2019

v923z commented Nov 8, 2019 •

edited

huybik commented Nov 9, 2019 •

edited

v923z commented Nov 9, 2019

mdaeron commented Nov 9, 2019

v923z commented Nov 9, 2019

mdaeron commented Nov 9, 2019

v923z commented Nov 10, 2019

huybik commented Nov 11, 2019 •

edited

v923z commented Nov 11, 2019

mdaeron commented Nov 11, 2019 •

edited

v923z commented Nov 11, 2019

v923z commented Nov 12, 2019 •

edited

huybik commented Nov 13, 2019 •

edited

huybik commented Nov 13, 2019 •

edited

v923z commented Nov 13, 2019 •

edited

ulab std method for matrix #9

ulab std method for matrix #9

Comments

huybik commented Nov 8, 2019 • edited

v923z commented Nov 8, 2019 • edited

huybik commented Nov 8, 2019 • edited

v923z commented Nov 8, 2019

huybik commented Nov 8, 2019 • edited

v923z commented Nov 8, 2019

v923z commented Nov 8, 2019 • edited

huybik commented Nov 9, 2019 • edited

v923z commented Nov 9, 2019

mdaeron commented Nov 9, 2019

v923z commented Nov 9, 2019

mdaeron commented Nov 9, 2019

v923z commented Nov 10, 2019

huybik commented Nov 11, 2019 • edited

v923z commented Nov 11, 2019

mdaeron commented Nov 11, 2019 • edited

v923z commented Nov 11, 2019

v923z commented Nov 12, 2019 • edited

huybik commented Nov 13, 2019 • edited

huybik commented Nov 13, 2019 • edited

v923z commented Nov 13, 2019 • edited

huybik commented Nov 8, 2019 •

edited

v923z commented Nov 8, 2019 •

edited

huybik commented Nov 8, 2019 •

edited

huybik commented Nov 8, 2019 •

edited

v923z commented Nov 8, 2019 •

edited

huybik commented Nov 9, 2019 •

edited

huybik commented Nov 11, 2019 •

edited

mdaeron commented Nov 11, 2019 •

edited

v923z commented Nov 12, 2019 •

edited

huybik commented Nov 13, 2019 •

edited

huybik commented Nov 13, 2019 •

edited

v923z commented Nov 13, 2019 •

edited