Update sql-using-python-udf.py #1

Closed
Ritinikhil wants to merge 2 commits into timsaucer:main from Ritinikhil:main

@Ritinikhil

enhance sql-using-python-udf example

  • Add comprehensive comments and documentation
  • Implement multiple data registration methods for API compatibility
  • Add version information printing for debugging
  • Improve error handling with informative messages
  • Add formatted table output for better readability
  • Include input validation through PyArrow schema
  • Add results verification with assertions

Author: Ritinikhil
Date: 2025-03-06

Which issue does this PR close?

Closes #.

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

enhance sql-using-python-udf example

- Add comprehensive comments and documentation
- Implement multiple data registration methods for API compatibility
- Add version information printing for debugging
- Improve error handling with informative messages
- Add formatted table output for better readability
- Include input validation through PyArrow schema
- Add results verification with assertions

Author: Ritinikhil
Date: 2025-03-06

improve path handling in substrait example

- Add cross-platform path handling using os.path
- Add error handling for CSV file registration
- Improve code documentation
- Remove hard-coded Windows paths
- Keep Apache license header intact

Author: Ritinikhil
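
For readers following the path-handling commit above, here is a minimal sketch of what cross-platform path handling with `os.path` plus an informative error might look like. It is not the code from the PR: the helper name `resolve_csv_path`, the `data/example.csv` layout, and the error message are all hypothetical, and the temporary directory exists only to make the sketch self-contained.

```python
import os
import tempfile


def resolve_csv_path(base_dir, *parts):
    """Join path components portably and fail early with a clear message.

    Hypothetical helper for illustration; not taken from the PR.
    """
    # os.path.join picks the right separator for the current platform,
    # so no hard-coded "C:\\..." or "/"-only strings are needed.
    path = os.path.normpath(os.path.join(base_dir, *parts))
    if not os.path.isfile(path):
        raise FileNotFoundError(f"CSV file not found: {path}")
    return path


# Demo against a throwaway directory (assumed layout: <base>/data/example.csv).
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "data"))
    with open(os.path.join(tmp, "data", "example.csv"), "w", newline="") as f:
        f.write("a,b\n1,2\n")

    found = resolve_csv_path(tmp, "data", "example.csv")

    try:
        resolve_csv_path(tmp, "data", "missing.csv")
        missing_error = None
    except FileNotFoundError as exc:
        # The message names the full path that was tried.
        missing_error = str(exc)
```

In a real example script the base directory would more likely be derived from the script's own location, and the resolved path would then be handed to whatever CSV registration call the example uses.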
@timsaucer
Owner

I just noticed this - can you change your upstream to the main repo rather than my fork?

@Ritinikhil Ritinikhil closed this by deleting the head repository Mar 9, 2025
timsaucer added a commit that referenced this pull request Apr 18, 2026
- Wrap CASE/WHEN method-chain examples in parentheses and assign to a
  variable so they are valid Python as shown (Copilot #1, #2).
- Fix INTERSECT/EXCEPT mapping: the default distinct=False corresponds to
  INTERSECT ALL / EXCEPT ALL, not the distinct forms. Updated both the
  Set Operations section and the SQL reference table to show both the
  ALL and distinct variants (Copilot apache#4).
- Change write_parquet / write_csv / write_json examples to file-style
  paths (output.parquet, etc.) to match the convention used in existing
  tests and examples. Note that a directory path is also valid for
  partitioned output (Copilot apache#5).

Verified INTERSECT/EXCEPT semantics with a script:
  df1.intersect(df2)                -> [1, 1, 2]  (= INTERSECT ALL)
  df1.intersect(df2, distinct=True) -> [1, 2]     (= INTERSECT)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
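
The ALL versus distinct semantics described in the commit message can be modeled in plain Python, which may help when reading the verification output. This is only a sketch of bag versus set semantics using `collections.Counter`; it does not call DataFusion. The function names `intersect` and `except_` and the input rows are assumptions, with the rows chosen so that the intersect results match the output quoted above.

```python
from collections import Counter


def intersect(left, right, distinct=False):
    """Model INTERSECT ALL (default) or INTERSECT (distinct=True)."""
    if distinct:
        # Set semantics: each common value appears once.
        return sorted(set(left) & set(right))
    # Bag semantics: each value keeps the minimum of its two counts.
    return sorted((Counter(left) & Counter(right)).elements())


def except_(left, right, distinct=False):
    """Model EXCEPT ALL (default) or EXCEPT (distinct=True)."""
    if distinct:
        # Set semantics: distinct left values that never occur on the right.
        return sorted(set(left) - set(right))
    # Bag semantics: subtract counts, dropping anything that reaches zero.
    return sorted((Counter(left) - Counter(right)).elements())


# Assumed inputs; the commit message does not show the actual rows.
df1_rows = [1, 1, 2, 3]
df2_rows = [1, 1, 2, 4]

intersect_all = intersect(df1_rows, df2_rows)
intersect_distinct = intersect(df1_rows, df2_rows, distinct=True)
except_all = except_([1, 1, 1, 2], [1])
except_distinct = except_([1, 1, 1, 2], [1], distinct=True)
```

With these inputs the duplicate `1` survives only under the ALL forms, which is the distinction the commit corrects in the documentation.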