This repository has been archived by the owner on Nov 22, 2022. It is now read-only.
Update distinct() and repartition() definitions (#138)
* Update the repartition functions to accept a Column in the numPartitions parameter. Reference: https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrame.repartition
  "numPartitions – can be an int to specify the target number of partitions or a Column. If it is a Column, it will be used as the first partitioning column. If not specified, the default number of partitions is used."
* Add a stub for DataFrame#distinct.
* Break apart the Union into overloaded signatures.
* Update repartitionByRange by splitting the Union into overloaded signatures.
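The idea of replacing a single Union-typed first parameter with overloaded signatures can be sketched in plain Python with `typing.overload`. This is not the stub file from this commit: `Column`, `DataFrame` and `ColumnOrName` below are hypothetical stand-ins so the example runs without PySpark, and `numPartitions` is made positional-only (Python 3.8+) purely to keep the runtime dispatch simple.

```python
from __future__ import annotations

from typing import Union, overload


class Column:
    """Hypothetical stand-in for pyspark.sql.Column (illustration only)."""

    def __init__(self, name: str) -> None:
        self.name = name


# Hypothetical alias: a column given either as a Column or by name.
ColumnOrName = Union[Column, str]


class DataFrame:
    """Hypothetical stand-in for pyspark.sql.DataFrame. It only records how
    it was partitioned and never talks to Spark."""

    def __init__(self, num_partitions: int | None = None,
                 cols: tuple[str, ...] = ()) -> None:
        self.num_partitions = num_partitions
        self.cols = cols

    # Overload 1: an explicit int partition count, then optional columns.
    @overload
    def repartition(self, numPartitions: int, /, *cols: ColumnOrName) -> DataFrame: ...

    # Overload 2: columns only; the first one acts as the first
    # partitioning column and the default partition count is used.
    @overload
    def repartition(self, *cols: ColumnOrName) -> DataFrame: ...

    # Single runtime implementation shared by both overloads.
    def repartition(self, *args: int | ColumnOrName) -> DataFrame:
        if args and isinstance(args[0], int):
            num, rest = args[0], args[1:]
        else:
            num, rest = None, args  # None stands for "default count"
        names = tuple(c.name if isinstance(c, Column) else c for c in rest)
        return DataFrame(num, names)

    # Signature sketch for distinct(): no arguments, returns a DataFrame.
    # This stand-in body just returns a copy; a real DataFrame would drop
    # duplicate rows.
    def distinct(self) -> DataFrame:
        return DataFrame(self.num_partitions, self.cols)
```

With the overloads, a type checker can tell `df.repartition(8, "x")` apart from `df.repartition(Column("x"))` and reject something like `df.repartition(8.5)`, which a loose Union on the first parameter makes harder to express. Per the commit message, the same split was applied to `repartitionByRange`.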