Integration with csvdedupe #825

Closed
fgregg opened this issue Apr 18, 2017 · 1 comment

Comments

@fgregg
Contributor

fgregg commented Apr 18, 2017

On Twitter, @jpmckinney, @hunterowens, and I discussed integrating csvdedupe and csvlink with csvkit.

I'd really like to see closer connections between these projects, but this could take a number of forms.

  1. Complete Integration (csvdedupe and csvlink would be subsumed into csvkit)
    1. Pros
      1. Seamless experience for users
      2. Pooling of developer time
    2. Cons
      1. Current core devs of csvkit would need to become somewhat familiar with csvdedupe
      2. The complex record-linkage work that csvdedupe does may not fit within the csvkit philosophy
    3. Neutral
      1. A few years ago, it was pretty hard to install dedupe, but Python packaging has gotten a lot better. I don't think this is a serious disadvantage anymore.
  2. Interface compatibility, with the projects staying independent and publicizing each other. csvdedupe and csvlink would need to support csvkit's common arguments.
    1. Pros
      1. Better discoverability for users (more benefit for csvdedupe than csvkit obviously)
      2. No need for csvkit core devs to know anything about csvdedupe
      3. Users need to learn less to use csvdedupe
    2. Cons
      1. Harder for users
  3. Only publicizing each other's projects
    1. Pros
      1. Better discoverability for users (more benefit for csvdedupe than csvkit obviously)
      2. No need for csvkit core devs to know anything about csvdedupe
    2. Cons
      1. Harder for users
  4. Do Nothing (status quo)
    1. Pros
      1. Easiest for core devs
    2. Cons
      1. None of the advantages of options 1, 2, or 3

We, the core devs of csvdedupe, would be interested in options 1, 2, and 3.

Beyond @jpmckinney and @onyxfish, @mbauman and @hunterowens might also be interested in this conversation.

@jpmckinney
Member

I think we can pursue a version of option 2 that doesn't require as many changes to csvdedupe in terms of support for csvkit's common arguments. Users can just pipe their CSV through in2csv or csvformat (using whatever csvkit options they want) and then pipe the output to csvdedupe. We could also add a page to the tutorial specifically about using csvdedupe with csvkit.
