odo support #45

Closed
femtotrader opened this issue Aug 24, 2015 · 6 comments

Comments

@femtotrader
Contributor

Hello,

Maybe you could consider adding support for odo
http://odo.readthedocs.org/
so that it becomes possible to:

  • get a Pandas DataFrame
  • store to a database supported by SQLAlchemy
  • ...

Kind regards

@femtotrader
Contributor Author

I'm not sure this is very efficient, but:

>> from odo import odo
>> import pandas as pd
>> df = odo(list(datapkg.data), pd.DataFrame)
>> df

              CPI Country Code Country Name       Year
0       89.169588          AFG  Afghanistan 2004-01-01
1      100.000000          AFG  Afghanistan 2005-01-01
2      103.489659          AFG  Afghanistan 2006-01-01
3      121.031857          AFG  Afghanistan 2007-01-01
4      148.516209          AFG  Afghanistan 2008-01-01
...           ...          ...          ...        ...
6743     1.223167          ZWE     Zimbabwe 2002-01-01
6744     6.503575          ZWE     Zimbabwe 2003-01-01
6745    24.868384          ZWE     Zimbabwe 2004-01-01
6746   100.000000          ZWE     Zimbabwe 2005-01-01
6747  1196.677633          ZWE     Zimbabwe 2006-01-01

[6748 rows x 4 columns]

Maybe a method to get a DataFrame should exist; see #56.

@femtotrader
Contributor Author

>> odo(datapkg, pd.DataFrame)

raises

KeyError: <class 'datapackage.datapackage.DataPackage'>

see blaze/odo#317

@femtotrader
Contributor Author

I noticed (sorry, I'm new to datapackage) that a DataPackage can have several "resources".

That's the reason why datapkg.data is an itertools.chain and datapkg.get_data(datapkg.resources[0]) is a generator.
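
To illustrate the difference, here is a minimal sketch using only the standard library. The `rows_of` helper and the two resource dicts are hypothetical stand-ins, not the datapackage API; the second resource is made up for the example.

```python
from itertools import chain
import types


def rows_of(resource):
    """Lazily yield the rows of a single resource (a generator)."""
    for row in resource["rows"]:
        yield row


# Hypothetical resources; real ones would come from a datapackage descriptor.
resources = [
    {"name": "cpi", "rows": [{"Country Code": "AFG", "CPI": 89.169588}]},
    {"name": "countries", "rows": [{"Code": "AFG", "Name": "Afghanistan"}]},
]

# One resource on its own is a plain generator.
one = rows_of(resources[0])
is_generator = isinstance(one, types.GeneratorType)

# All resources together are chained end to end, so rows with
# different columns arrive in a single stream.
everything = chain.from_iterable(rows_of(r) for r in resources)
is_chain = isinstance(everything, chain)

all_rows = list(everything)
```

Because the chained stream mixes rows from every resource, converting it directly to one table only makes sense when the package has a single resource.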

I don't know how we could handle this with odo.

My opinion is:

  • if "datapkg" is given as a source to odo, the first resource (datapkg.resources[0]) should be used
  • if a "resource" is given as a source to odo, that resource should be used

It will then be possible to do

odo(datapkg, pd.DataFrame)

which will be the same as

odo(datapkg.resources[0], pd.DataFrame)

but accessing another resource (for example the second one) and converting it to a DataFrame will also be possible, using for example

odo(datapkg.resources[1], pd.DataFrame)

A Resource should have its own get_data() method which returns a generator
(see #21).

It would be nice if you could provide me with a URL for a DataPackage with several resources; it would help for testing purposes.
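
The rule I have in mind could be sketched roughly as follows. This does not use odo; `functools.singledispatch` stands in for type-based dispatch, and the `Resource` and `DataPackage` classes and the `rows` function are hypothetical stand-ins, not the datapackage library's classes.

```python
from functools import singledispatch


class Resource:
    """Hypothetical stand-in for a resource with its own get_data()."""

    def __init__(self, rows):
        self._rows = list(rows)

    def get_data(self):
        # Generator over this resource's rows only.
        yield from self._rows


class DataPackage:
    """Hypothetical stand-in holding an ordered list of resources."""

    def __init__(self, resources):
        self.resources = list(resources)


@singledispatch
def rows(source):
    # Unknown source types are rejected explicitly.
    raise TypeError("no rows() rule for %s" % type(source).__name__)


@rows.register(Resource)
def _(source):
    # A resource given as the source is used directly.
    return source.get_data()


@rows.register(DataPackage)
def _(source):
    # A package given as the source falls back to its first resource.
    return rows(source.resources[0])


pkg = DataPackage([Resource([{"CPI": 100.0}]), Resource([{"Code": "ZWE"}])])
first = list(rows(pkg))
second = list(rows(pkg.resources[1]))
```

With this rule, passing the package and passing `pkg.resources[0]` give the same rows, while any other resource stays reachable by index.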

@femtotrader
Contributor Author

I think we should also work on converting a Resource schema to a DataShape. It will help with this odo support and should open up a lot of possibilities in terms of output.

see http://datashape.pydata.org/ and http://odo.pydata.org/en/latest/datashape.html

so that we will be able to create a dispatcher for discover: http://odo.pydata.org/en/latest/add-new-backend.html
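
As a rough idea of what such a conversion might look like, here is a minimal sketch that builds a datashape-style string from a JSON Table Schema dict. The type mapping and the `schema_to_datashape` name are my assumptions, the result is not validated against the datashape parser, and field names containing spaces are not handled.

```python
# Assumed mapping from JSON Table Schema field types to datashape type names.
JTS_TO_DATASHAPE = {
    "string": "string",
    "number": "float64",
    "integer": "int64",
    "boolean": "bool",
    "date": "date",
    "datetime": "datetime",
}


def schema_to_datashape(schema):
    """Return a record datashape string for a JSON Table Schema dict."""
    fields = []
    for field in schema["fields"]:
        # A field without an explicit type is treated as a string.
        dtype = JTS_TO_DATASHAPE[field.get("type", "string")]
        fields.append("%s: %s" % (field["name"], dtype))
    # "var" means the number of rows is not known in advance.
    return "var * {%s}" % ", ".join(fields)


# Hypothetical schema with identifier-like field names only.
schema = {
    "fields": [
        {"name": "CPI", "type": "number"},
        {"name": "Year", "type": "date"},
        {"name": "Country"},
    ]
}
ds = schema_to_datashape(schema)
```

An unsupported field type raises `KeyError` here, which seems preferable to silently guessing a type.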

@femtotrader
Contributor Author

A library to convert JSON Table Schema <--> DataShape is now available at
https://github.com/okfn/jts-datashape

@pwalsh
Contributor

pwalsh commented Apr 8, 2016

@trickvi can we close this one? If we go forward with it, we can start from the lib @femtotrader linked above and work it into a plugin for jsontableschema-py.
