
feat: Adding image upload handler functionality to PPTX converter #1197


Open. UHB4 wants to merge 1 commit into main.


UHB4 commented Apr 21, 2025

  • Add handler to upload images extracted from PPTX to external storage
  • Generate unique filenames based on UUID to prevent image filename collisions
  • Implement fallback handling logic for upload failures
  • Add unit tests

Key Changes

Image Upload Handler

Introduced support for a user-defined upload_handler function that uploads extracted images to storage.
The handler receives binary image data along with metadata and returns the uploaded image URL.

Unique Filename Generation

Each image is assigned a unique filename using UUID.
The original file extension is preserved.
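
The naming scheme above can be sketched as follows. This is a minimal illustration, not the PR's code: the helper name `make_unique_filename` is hypothetical, and the use of `uuid.uuid4()` is an assumption (the description only says "UUID").

```python
import os
import uuid


def make_unique_filename(original_name: str) -> str:
    """Return a UUID-based filename that keeps the original extension.

    Hypothetical helper; the PR may name or structure this differently.
    """
    # splitext("picture.JPG") -> ("picture", ".JPG"); ext is "" if there is none.
    _, ext = os.path.splitext(original_name)
    # uuid4 is an assumption: a random UUID, rendered as 36 characters.
    return f"{uuid.uuid4()}{ext}"
```

Because the name no longer depends on the image's name inside the PPTX, two slides that both contain `image1.jpg` would be uploaded under different keys.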

Error Handling and Fallback Logic

If the upload handler raises an exception, the converter falls back to its existing behavior (a data URI or just the filename, depending on settings).
If the handler returns an invalid URL, the fallback is applied as well.
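
A rough sketch of how a failing handler and an invalid return value might both be reduced to a single "fall back" signal. The names `is_valid_url` and `try_upload` are hypothetical, and the validity rule used here (an http or https URL with a host) is an assumption; the PR does not spell out its exact check.

```python
from typing import Callable, Optional
from urllib.parse import urlparse


def is_valid_url(value: object) -> bool:
    """Assumed rule: a string that parses as an http(s) URL with a host."""
    if not isinstance(value, str):
        return False
    try:
        parsed = urlparse(value)
    except ValueError:
        # urlparse raises ValueError for some malformed inputs.
        return False
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def try_upload(
    handler: Callable[[bytes, dict], object], image_blob: bytes, meta: dict
) -> Optional[str]:
    """Return the uploaded URL, or None to tell the caller to fall back."""
    try:
        url = handler(image_blob, meta)
    except Exception:
        # Any handler failure leads to the fallback instead of aborting.
        return None
    return url if is_valid_url(url) else None
```

Returning `None` for both failure modes keeps the calling code to a single branch: use the URL if there is one, otherwise do what the converter did before.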

Test Cases

  • Validate URL format returned by the upload handler
  • Ensure metadata contains all required fields
  • Verify image content integrity
  • Handle concurrent document conversions
  • Test error handling and invalid return values from the handler

How It Works

Image Extraction & Preparation

When an image is detected in a PowerPoint slide:

  • Binary data and metadata are extracted
  • A UUID-based filename is generated (e.g., 123e4567-e89b-12d3-a456-426614174000.jpg)

Upload Handler Call

The extracted image data and metadata (filename, content_type) are passed to the user-defined upload_handler.

Markdown Generation

The handler's returned URL is inserted into the Markdown as an image link:
![Image Description](http://example.com/image.jpg)
If the handler fails, the converter falls back to either a data URI or just the filename, depending on settings.
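
The link generation and its two fallbacks could look roughly like this. It is a sketch under assumptions: `render_image_markdown` and the `embed_data_uri` flag are hypothetical names standing in for whatever setting the converter actually uses, and `url` is taken to be `None` when the upload did not produce a usable URL.

```python
import base64
from typing import Optional


def render_image_markdown(
    alt_text: str,
    filename: str,
    image_blob: bytes,
    content_type: str,
    url: Optional[str],
    embed_data_uri: bool = False,  # hypothetical setting name
) -> str:
    """Build the Markdown image link, preferring the uploaded URL."""
    if url:
        target = url
    elif embed_data_uri:
        # Fallback 1: inline the image as a base64 data URI.
        encoded = base64.b64encode(image_blob).decode("ascii")
        target = f"data:{content_type};base64,{encoded}"
    else:
        # Fallback 2: reference the image by filename only.
        target = filename
    return f"![{alt_text}]({target})"
```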

Error Handling

Exceptions raised during upload are caught so that the conversion can continue.
If the URL returned by the handler is invalid, the fallback logic is applied as well.

Example Usage

def my_upload_handler(image_blob, meta):
    """Upload one extracted image and return its public URL."""
    # A real implementation would upload the bytes here, for example:
    #   s3_client.upload_fileobj(io.BytesIO(image_blob), 'my-bucket', meta['filename'])

    # Return the URL of the uploaded image
    return f"https://my-storage.example.com/images/{meta['filename']}"
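
For trying the handler contract without a storage service, an in-memory stand-in may be useful. The class below is purely illustrative: `InMemoryUploadHandler` is a hypothetical name, the base URL is the placeholder from the example above, and the metadata keys (`filename`, `content_type`) are the two named in this description.

```python
import threading


class InMemoryUploadHandler:
    """Callable stand-in for real storage: keeps blobs in a dict by filename."""

    def __init__(self, base_url="https://my-storage.example.com/images"):
        self.base_url = base_url.rstrip("/")
        self.stored = {}
        self._lock = threading.Lock()  # guards the dict during concurrent use

    def __call__(self, image_blob, meta):
        with self._lock:
            self.stored[meta["filename"]] = (meta["content_type"], bytes(image_blob))
        return f"{self.base_url}/{meta['filename']}"


handler = InMemoryUploadHandler()
url = handler(
    b"\xff\xd8\xff",
    {
        "filename": "123e4567-e89b-12d3-a456-426614174000.jpg",
        "content_type": "image/jpeg",
    },
)
```

Any callable with the same two parameters and a URL return value should satisfy the contract described here, so a plain function and a class instance are interchangeable.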

Test Coverage

This PR includes tests to verify that:

  • The URL returned by the handler is correctly inserted into the Markdown
  • Metadata contains all required fields
  • Image binary data is passed properly
  • The handler works under concurrent conversions
  • Fallback behavior works on handler exceptions
  • Invalid return values from the handler are handled properly
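
As an illustration of the concurrency check, a test along these lines could be written. It does not reproduce the PR's tests: `fake_convert` is a hypothetical stand-in for a single document conversion, since the converter's real entry point is not shown in this description.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


def fake_convert(doc_id, upload_handler):
    """Hypothetical stand-in for one conversion: uploads one image."""
    meta = {"filename": f"{uuid.uuid4()}.png", "content_type": "image/png"}
    url = upload_handler(f"image-{doc_id}".encode(), meta)
    return f"![doc {doc_id}]({url})"


def test_concurrent_conversions():
    seen = []  # list.append is atomic in CPython, enough for this sketch

    def handler(image_blob, meta):
        seen.append((meta["filename"], image_blob))
        return f"https://my-storage.example.com/images/{meta['filename']}"

    with ThreadPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(lambda i: fake_convert(i, handler), range(32)))

    assert len(results) == 32
    # Every upload got a distinct filename, so no collisions occurred.
    assert len({name for name, _ in seen}) == 32
    # Every blob arrived intact, regardless of scheduling order.
    assert {blob for _, blob in seen} == {f"image-{i}".encode() for i in range(32)}
```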

This feature enables integration with external storage systems when converting images from PowerPoint presentations to Markdown, improving image management and enhancing the reusability of the resulting Markdown documents.

UHB4 commented Apr 21, 2025

@microsoft-github-policy-service agree company="minisoft"
