Adding Documentation for SparkNLP Readers and Partition class #14571

Conversation

paulamib123
Contributor

Description

This PR adds documentation and examples for the Partition class and various document readers in spark-nlp.

Motivation and Context

Helps users understand how to use Partition and Readers to read different types of Documents.

How Has This Been Tested?

Screenshots (if appropriate):

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • Code improvements with no or little impact
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING page.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@DevinTDHa added the "documentation" and "DON'T MERGE" (Do not merge this PR) labels on May 13, 2025
Member

@DevinTDHa left a comment

Some long lines need truncation and some minor things but otherwise looks good to me. Thanks!

Comment on lines +50 to +51
Example 1 (Reading Text Files)
----------
Member

can you check if these custom headers render correctly?

@DevinTDHa requested a review from danilojsl on May 14, 2025 15:52
@DevinTDHa
Member

@danilojsl Can you also check if these docs are accurate for your new feature?

@danilojsl
Contributor

danilojsl commented May 15, 2025

@paulamib123 Could you please create a new branch based on feature/SPARKNLP-1174-Adding-PartitionTransformer and add your changes there?

I also recommend cloning the original spark-nlp repository directly instead of forking it. This will make it easier for us to review your changes locally and provide more effective feedback.

@DevinTDHa
Member

@danilojsl thanks for the thorough review!

@paulamib123 I also usually work on my own fork, so if you rebase your changes onto the newest branch, it should be fine as well.

@danilojsl
Contributor

@paulamib123 I forgot that the new feature/SPARKNLP-1174-Adding-PartitionTransformer includes the PartitionTransformer annotator, so we'll need to add documentation for it. Please keep that in mind. If you have any questions, don’t hesitate to reach out to me.

@DevinTDHa
Member

Closing, duplicate of #14581

@DevinTDHa closed this on May 20, 2025
Labels
documentation DON'T MERGE Do not merge this PR

3 participants