This repository has been archived by the owner on Nov 22, 2022. It is now read-only.

Update distinct() and repartition() definitions #138

Merged
merged 4 commits into zero323:master from update_distinct_repartition on Jun 30, 2019

Conversation

zpencerq
Contributor

@zpencerq zpencerq commented May 23, 2019

Update the repartition functions to allow a Column in the numPartitions parameter.

Reference: https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrame.repartition

numPartitions – can be an int to specify the target number of partitions or a Column.
    If it is a Column, it will be used as the first partitioning column.
    If not specified, the default number of partitions is used.

Also add stub for DataFrame#distinct()

@zero323
Owner

zero323 commented May 30, 2019

Thanks for the PR @zpencerq!

I think that overloaded definitions would be more appropriate than a Union. Logically these are two different signatures, and the meaning of the Column argument is quite different from that of a numeric one.
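For illustration, a runnable sketch of what such overloaded definitions could look like. The Column and DataFrame classes below are bare stand-ins, not the real pyspark classes, and this is a simplification of whatever the actual stubs contain (real signatures may, for instance, also accept column names as str):

```python
from typing import overload


class Column:
    """Stand-in for pyspark.sql.Column."""


class DataFrame:
    """Stand-in for pyspark.sql.DataFrame."""

    # In a .pyi stub only the @overload signatures would appear;
    # the final def is a dummy runtime implementation so this sketch runs.
    @overload
    def repartition(self, numPartitions: int, *cols: Column) -> "DataFrame": ...
    @overload
    def repartition(self, *cols: Column) -> "DataFrame": ...
    def repartition(self, *args):
        return self

    def distinct(self) -> "DataFrame":
        return self


df = DataFrame()
# A type checker sees two distinct signatures: int-first, or columns only.
print(type(df.repartition(4)).__name__)         # prints DataFrame
print(type(df.repartition(Column())).__name__)  # prints DataFrame
```

With a Union-typed first parameter, a checker could not distinguish the two call shapes; with overloads, each shape gets its own documented meaning.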

@zpencerq
Contributor Author

Ah, good point. I updated it in 18bced3 to reflect that. 👍

@zero323
Owner

zero323 commented Jun 1, 2019

I guess repartitionByRange should be overloaded as well, shouldn't it?
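A corresponding hedged sketch for repartitionByRange, again with bare stand-in classes rather than the real pyspark ones. One difference worth noting: Spark's repartitionByRange requires at least one partition-by expression at runtime, even though the stub signatures mirror repartition's:

```python
from typing import overload


class Column:
    """Stand-in for pyspark.sql.Column."""


class DataFrame:
    """Stand-in for pyspark.sql.DataFrame."""

    # Stub form: an int-first signature and a columns-only signature.
    @overload
    def repartitionByRange(self, numPartitions: int, *cols: Column) -> "DataFrame": ...
    @overload
    def repartitionByRange(self, *cols: Column) -> "DataFrame": ...
    def repartitionByRange(self, *args):
        # Dummy runtime body mimicking Spark's "at least one
        # partition-by expression" requirement.
        if not args:
            raise ValueError("At least one partition-by expression must be specified.")
        return self


df = DataFrame()
print(type(df.repartitionByRange(8, Column())).__name__)  # prints DataFrame
print(type(df.repartitionByRange(Column())).__name__)     # prints DataFrame
```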

@zpencerq
Contributor Author

zpencerq commented Jun 2, 2019

Might as well take care of that while we're at it. 5b1b526

@zero323 zero323 merged commit 623b0c0 into zero323:master Jun 30, 2019
@zero323
Owner

zero323 commented Jun 30, 2019

Looks good, merging into master.

Thanks for your work @zpencerq!

@zpencerq zpencerq deleted the update_distinct_repartition branch July 26, 2019 18:08