Fix misleading add_column() usage example in docstring #7648

Open: 3 commits to be merged into base: main

Conversation

ArjunJagdale
Contributor

@ArjunJagdale ArjunJagdale commented Jun 27, 2025

Fixes #7611

This PR fixes the usage example in the Dataset.add_column() docstring, which previously implied that add_column() modifies the dataset in-place.

Why:
The method returns a new dataset with the additional column, and users must assign the result to a variable to preserve the change.

This should make the behavior clearer for users.
@lhoestq @davanstrien

@lhoestq
Member

lhoestq commented Jul 7, 2025

I believe there are other occurrences of this, such as select_columns, select, filter, shard and flatten. Could you fix the docstrings for those as well before we merge?

Fix misleading docstring examples for select_columns, select, filter, shard, and flatten

- Updated usage examples to show correct behavior (methods return new datasets)
- Added inline comments to clarify that methods do not modify in-place
- Fixes follow-up from issue huggingface#7611 and @lhoestq’s review on PR huggingface#7648
@ArjunJagdale
Contributor Author

Done, @lhoestq! I've updated the docstring examples for the following methods to clarify that they return new datasets instead of modifying in-place:

  • select_columns
  • select
  • filter
  • shard
  • flatten

@ArjunJagdale
Contributor Author

Also, any suggestions on what kind of issues I should work on next? I tried looking on my own, but I’d be happy if you could assign me something — I’ll do my best!

Successfully merging this pull request may close these issues.

Code example for dataset.add_column() does not reflect correct way to use function