unorderable types: str() > int() #29

Closed
simonm3 opened this issue Oct 31, 2016 · 13 comments

Comments

@simonm3

simonm3 commented Oct 31, 2016

I get the above error message. Works fine if I exclude the object columns.

@simonm3
Author

simonm3 commented Oct 31, 2016

Some columns contained errors, e.g. a numeric column had NaN values stored as " NA".
It would be useful if such a failure reported the offending columns rather than just failing.
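
One way to report the offending columns before profiling could look like the sketch below. It is not part of pandas_profiling; the helper name `find_mixed_type_columns` and the sample frame are assumptions made for illustration.

```python
import pandas as pd


def find_mixed_type_columns(df):
    """Map each object column holding more than one Python type to the type names found.

    Hypothetical helper, not a pandas_profiling function.
    """
    offending = {}
    for name, column in df.items():
        if column.dtype != object:
            continue  # numeric, datetime, etc. columns hold a single type
        # Ignore missing values so that NaN alone does not count as a second type.
        type_names = sorted({type(value).__name__ for value in column.dropna()})
        if len(type_names) > 1:
            offending[name] = type_names
    return offending


# Made-up sample: a numeric column polluted by the string " NA".
sample = pd.DataFrame({"ok": [1, 2, 3], "bad": [1, " NA", 3]})
report = find_mixed_type_columns(sample)
print(report)
```

Columns listed in the result can then be fixed or dropped before building the report.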

@JosPolfliet
Contributor

JosPolfliet commented Nov 4, 2016

Good point, I never thought about column types that are not numeric, character or dates. They should indeed just be ignored with a warning message that the type is not supported for analysis.

@simonm3
Author

simonm3 commented Nov 4, 2016

As well as reporting them as ignored, it would be useful to show frequency counts if possible. For example, if you have values "P", "1", "1.0" then it is clear what the problem is.
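
A frequency count of this kind can be sketched with plain pandas. The helper name `describe_values` and the sample values are assumptions for illustration; each value is labelled with its Python type so that the string "1" and the integer 1 are counted separately.

```python
import pandas as pd


def describe_values(column):
    """Count distinct values in a column, keeping the Python type in the label.

    Hypothetical helper, not a pandas_profiling function.
    """
    labels = column.dropna().map(lambda value: f"{type(value).__name__}: {value!r}")
    return labels.value_counts()


# Made-up sample mixing strings and a number in one column.
grades = pd.Series(["P", "1", "1.0", 1, "1"])
counts = describe_values(grades)
print(counts)
```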

On 4 November 2016 at 09:24, Jos Polfliet wrote:

Good point, I never thought about column types that are not numeric, character or dates. They should indeed just be ignored with a warning message that the type is not supported for analysis.



@arsenyinfo
Contributor

It would also be useful if one could convert these columns into the same type: e.g. if I have both str and int within one column, it's probably a good idea to treat it as str while making the report.

I'd suggest adding a param to ProfileReport for this case with three options: exclude weird columns, cast to str, or just raise an exception.
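
The three options could be sketched as a preprocessing step applied to the frame before it is profiled. This is an illustration only: the function name `normalize_mixed_columns` and the `policy` values are hypothetical and are not taken from the ProfileReport API.

```python
import pandas as pd


def normalize_mixed_columns(df, policy="exclude"):
    """Handle object columns that mix Python types.

    policy: "exclude" drops them, "cast" converts them to str,
    "raise" raises a TypeError. Names are hypothetical.
    """
    if policy not in {"exclude", "cast", "raise"}:
        raise ValueError(f"unknown policy: {policy!r}")
    mixed = [
        name
        for name, column in df.items()
        if column.dtype == object and len({type(v) for v in column.dropna()}) > 1
    ]
    if not mixed:
        return df
    if policy == "raise":
        raise TypeError(f"columns with mixed types: {mixed}")
    if policy == "exclude":
        return df.drop(columns=mixed)
    # policy == "cast": treat every value in the offending columns as str
    result = df.copy()
    for name in mixed:
        result[name] = result[name].astype(str)
    return result


sample = pd.DataFrame({"a": [1, 2], "b": [1, "x"]})
excluded = normalize_mixed_columns(sample, policy="exclude")
cast = normalize_mixed_columns(sample, policy="cast")
```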

@JosPolfliet would you like to get a PR on this?

@JosPolfliet
Contributor

Yes @arsenyinfo, if you have time feel free to send a PR and I will review! Thanks.

@arsenyinfo
Contributor

@JosPolfliet PTAL at PR above. Thanks!

@JosPolfliet
Contributor

JosPolfliet commented Dec 28, 2016 via email

@simonm3
Author

simonm3 commented Mar 5, 2017

In addition to the mixed-type fields, there is a list type which also currently crashes the whole report. As a short-term fix it would be good to flag it as an unsupported type. In the longer term it would be useful to see the most common values and their distribution.
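
For the longer-term idea, the most common items and the distribution of list lengths can be computed with plain pandas. The sketch below assumes pandas 0.25 or later, where `Series.explode` is available (newer than this thread), and uses a small list column of the kind described.

```python
import pandas as pd

# A column whose cells are Python lists.
lists = pd.Series(
    [["item1", "item2"], ["item3", "item2", "item3"], ["item2", "item2"]],
    name="mylist",
)

# One row per list element, then count how often each item occurs.
item_counts = lists.explode().value_counts()

# Distribution of list lengths across rows.
length_counts = lists.map(len).value_counts()

print(item_counts)
print(length_counts)
```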

@JosPolfliet
Contributor

JosPolfliet commented Mar 6, 2017

I thought this was fixed in the last version. It was related to a Pandas bug. PR was closed but this issue wasn't (mea culpa).

Can you share a working example of when it fails?

@simonm3
Author

simonm3 commented Mar 6, 2017

import pandas as pd
from pandas_profiling import ProfileReport

a = pd.DataFrame(dict(a=[1, 2, 3], b=[4, 5, 6], mylist=[["item1", "item2"], ["item3", "item2", "item3"], ["item2", "item2"]]))
ProfileReport(a)

This fails with: TypeError: unhashable type: 'list'
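
As a short-term workaround, columns holding unhashable values can be detected and set aside before the frame is passed to ProfileReport. The helper names below are assumptions for illustration and are not part of pandas_profiling.

```python
import pandas as pd


def is_hashable(value):
    """Return True if hash(value) succeeds."""
    try:
        hash(value)
    except TypeError:
        return False
    return True


def unhashable_columns(df):
    """List object columns containing at least one unhashable value (list, dict, set, ...)."""
    return [
        name
        for name, column in df.items()
        if column.dtype == object and not all(is_hashable(v) for v in column)
    ]


a = pd.DataFrame(dict(a=[1, 2, 3], b=[4, 5, 6], mylist=[["item1", "item2"], ["item3", "item2", "item3"], ["item2", "item2"]]))
unsupported = unhashable_columns(a)
print("Unsupported columns:", unsupported)

# Only the remaining columns would be profiled.
supported = a.drop(columns=unsupported)
```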

@simonm3
Author

simonm3 commented Mar 6, 2017

BTW this is a really great package but can't you change the name to something without underscores, hyphens and capital letters?....e.g.

from pandaseda import eda
eda()

@conradoqg
Contributor

conradoqg commented Jan 4, 2018

Hey,

With the above PR #82 merged, you can have dicts, lists and other object types in your dataframe. The profiler will mark those fields as unsupported (since there is not much analysis that we can do).

If in the future we find a nice way to report something related to that data type we can open a feature request.

About the package naming, I think it's a good suggestion and we should open a specific issue for this.
@romainx can you do that?

I think we can close this issue after those changes.

Best

@romainx
Contributor

romainx commented Jan 6, 2018

Ok, I've just created a new issue about the name (#87).
