Testing pandas-profiling with the df I am currently working with, and getting the following error:
```
/usr/local/lib/python2.7/site-packages/numpy/lib/function_base.pyc in histogram(a, bins, range, normed, weights, density)
    247                 n.imag += np.bincount(indices, weights=tmp_w.imag, minlength=bins)
    248             else:
--> 249                 n += np.bincount(indices, weights=tmp_w, minlength=bins).astype(ntype)
    250 
    251     # We now compute the bin edges since these are returned

ValueError: The first argument of bincount must be non-negative
```
Depending on what you are trying to do, NaN might actually be better than infinities. Are the infinities real, mathematical infinities? If they are right-censored (for example, because your sensor can only read up to a certain upper limit), then replacing them with the right-censored value might be more appropriate.
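Here is a minimal sketch of the two replacement strategies, using the values from the reproduction below. `UPPER_LIMIT` is a hypothetical sensor ceiling chosen for illustration. It does not come from the report.

```python
import numpy as np
import pandas as pd

series = pd.Series([2.307568, 59.642170, np.inf, 1241.660000, np.inf])

# Strategy A: treat infinities as missing, so pandas skips them in most statistics.
as_nan = series.replace([np.inf, -np.inf], np.nan)

# Strategy B: treat +inf as right-censored at a known upper limit.
# UPPER_LIMIT is a hypothetical value standing in for the sensor's maximum reading.
UPPER_LIMIT = 5000.0
censored = series.replace(np.inf, UPPER_LIMIT)

print(as_nan.isna().sum())  # 2
print(censored.max())       # 5000.0
```

Strategy B keeps the censored observations in the count and pulls the mean upward. Strategy A drops them entirely.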
In any case, many of the computed statistics will be invalid. For example, when a column contains a positive infinity, its mean should also be positive infinity. Once that value is replaced with NaN, however, it is simply ignored, which skews the results.
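A short illustration of that skew, using the same five values: pandas' `skipna` only skips NaN, so infinities propagate until they are replaced.

```python
import numpy as np
import pandas as pd

series = pd.Series([2.307568, 59.642170, np.inf, 1241.660000, np.inf])

# skipna only skips NaN, so a single +inf propagates to the mean.
raw_mean = series.mean()

# After replacing infinities with NaN they are silently skipped,
# so the mean is computed over the three finite values only.
clean_mean = series.replace([np.inf, -np.inf], np.nan).mean()

print(raw_mean)    # inf
print(clean_mean)  # roughly 434.54
```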
I will leave this bug open for now to think about how we should deal with infinities:
- Replace them with NaNs by default
- Group them in separate bins and plot them separately
- Reject the variable, since the algorithms won't fit it anyway
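The second option could look roughly like the sketch below. It histograms only the finite values and reports the infinities (and NaNs) as separate counts. The function name `histogram_with_infinities` is hypothetical and is not part of pandas-profiling.

```python
import numpy as np

def histogram_with_infinities(values, bins=10):
    """Hypothetical helper: histogram the finite values, count +inf, -inf and NaN separately."""
    arr = np.asarray(values, dtype=float)
    finite = arr[np.isfinite(arr)]
    # Bin edges are derived from the finite values only, so inf cannot break them.
    counts, edges = np.histogram(finite, bins=bins)
    extra = {
        "+inf": int(np.count_nonzero(np.isposinf(arr))),
        "-inf": int(np.count_nonzero(np.isneginf(arr))),
        "nan": int(np.count_nonzero(np.isnan(arr))),
    }
    return counts, edges, extra

counts, edges, extra = histogram_with_infinities(
    [2.307568, 59.642170, np.inf, 1241.660000, np.inf], bins=10
)
print(counts.sum(), extra)  # 3 {'+inf': 2, '-inf': 0, 'nan': 0}
```

The `extra` counts could then be drawn as separate bars next to the regular histogram.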
Would love input on this as well.
To reproduce:
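```python
import pandas as pd
import numpy as np

# Two of the five values are infinite, which breaks the histogram binning
series = pd.Series([2.307568, 59.642170, np.inf, 1241.660000, np.inf])
plot = series.plot(kind='hist', bins=10)  # requires matplotlib
```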
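The same failure can be reproduced without matplotlib by calling `np.histogram` directly. Filtering out the non-finite values avoids it. The exact error depends on the NumPy version. Recent releases reject a non-finite range up front, while the NumPy on Python 2.7 in the report failed later, inside `bincount`.

```python
import numpy as np
import pandas as pd

series = pd.Series([2.307568, 59.642170, np.inf, 1241.660000, np.inf])

# np.histogram cannot derive bin edges from a range that includes inf.
try:
    np.histogram(series, bins=10)
    failed = False
except ValueError as exc:
    failed = True
    print("ValueError:", exc)

# Dropping the non-finite values first lets the histogram be computed.
finite = series[np.isfinite(series)]
counts, edges = np.histogram(finite, bins=10)
print(counts.sum())  # 3
```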