Skip to content

[DRAFT] Implement Vision Capabilities for Azure OpenAI Integration #3634

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

DmitryKatson
Copy link
Contributor

@DmitryKatson DmitryKatson commented May 4, 2025

Summary

This PR adds support for vision-enabled AI models to our Azure OpenAI integration, allowing Business Central to send both text and images to AI models and receive detailed responses about image content.

Technical Implementation

We've extended the AOAI Chat Messages codeunit with methods to handle various image sources:

  • Image URLs
  • Image streams (InStream)
  • TenantMedia records
  • Temp Blobs
  • MediaSet fields (supporting multiple images)

All methods convert the images to the format required by the Azure OpenAI API - either direct URLs or base64-encoded data URLs with appropriate MIME types.

Key Features

  • Support for all common image formats (PNG, JPEG, GIF, WEBP)
  • Detail level parameter to control image analysis depth (auto, low, high)
  • Multiple images support in a single request (with 10-image limit enforced)
  • Automatic handling of MediaSet fields with multiple images
  • Proper error handling for unsupported file types and empty images

Usage Examples

Example 1: Using an image URL

var
    AOAIChatMessages: Codeunit "AOAI Chat Messages";
    UserText: Text;
    ImageUrl: Text;
begin
    UserText := 'What can you tell me about this product?';
    ImageUrl := 'https://example.com/product-image.jpg';
    
    // Add a message with both text and image URL
    AOAIChatMessages.AddUserMessage(UserText, ImageUrl, Enum::"AOAI Image Detail Level"::high);
    
    // Send to AI model and process response...
end;

Example 2: Using an image from a Record MediaSet field

var
    AOAIChatMessages: Codeunit "AOAI Chat Messages";
    Item: Record Item;
    MediaSetId: Guid;
    UserText: Text;
begin
    Item.Get('ITEM0001');
    MediaSetId := Item.Picture.MediaId;
    UserText := 'Analyze this product image and suggest improvements:';
    
    // Add a message with text and image(s) from MediaSet
    AOAIChatMessages.AddUserMessage(UserText, MediaSetId, Enum::"AOAI Image Detail Level"::auto);
    
    // Send to AI model and process response...
end;

Example 3: Using an image stream (e.g., from file upload)

var
    AOAIChatMessages: Codeunit "AOAI Chat Messages";
    FileManagement: Codeunit "File Management";
    ImageStream: InStream;
    UserText: Text;
    FileName: Text;
    FileExtension: Text;
begin
    UploadImageFile();
    UserText := 'What does this image show?';
    
    // Add a message with text and image from stream
    AOAIChatMessages.AddUserMessage(UserText, ImageStream, FileExtension, Enum::"AOAI Image Detail Level"::low);
    
    // Send to AI model and process response...
end;

local procedure UploadImageFile()
    var
        ImportImageLbl: Label 'Import Image';
        ImageFileTypeFilterLbl: Label 'Image Files (*.jpg;*.jpeg;*.png)|*.jpg;*.jpeg;*.png';
        FileManagement: Codeunit "File Management";
    begin
        if UploadIntoStream(ImportImageLbl, '', ImageFileTypeFilterLbl, FileName, ImageStream) then
            FileExtension := FileManagement.GetExtension(FileName);
    end;

Example 4: Using multiple images from a MediaSet

var
    AOAIChatMessages: Codeunit "AOAI Chat Messages";
    SalesHeader: Record "Sales Header";
    MediaSetId: Guid;
    UserText: Text;
begin
    SalesHeader.Get(SalesHeader."Document Type"::Order, '1001');
    MediaSetId := SalesHeader.Attachments.MediaId;
    UserText := 'Compare these product images and tell me the differences:';
    
    // Add a message with text and multiple images from MediaSet
    AOAIChatMessages.AddUserMessage(UserText, MediaSetId, Enum::"AOAI Image Detail Level"::high);
    
    // The API will process up to 10 images maximum
    // Send to AI model and process response...
end;

Testing

Comprehensive test coverage has been added with 11 test cases that validate:

  • Proper JSON message structure for all input types
  • URL and data URL encoding
  • MediaSet handling with multiple images
  • 10-image limit enforcement
  • Error handling for unsupported file types
  • End-to-end communication flow
image

Notes

  • This implementation requires access to vision-enabled models like GPT-4o, GPT-4o-mini
  • When using high detail level, token consumption will increase significantly

Work Item(s)

Fixes #3633

Fixes AB#578341

- Introduced new procedures in AOAIChatMessages.Codeunit.al to handle user messages with images from various sources (URL, InStream, MediaSet, Tenant Media, Temp Blob).
- Added AOAIImageDetailLevel.Enum.al to define detail levels for image processing.
- Implemented AOAIImagesImpl.Codeunit.al for image content preparation and encoding.
- Updated app.json to include new modules: BLOB Storage and Base64 Convert.
@github-actions github-actions bot added AL: System Application From Fork Pull request is coming from a fork labels May 4, 2025
@JesperSchulz JesperSchulz added the Integration GitHub request for Integration area label May 14, 2025
@JesperSchulz JesperSchulz changed the title Implement Vision Capabilities for Azure OpenAI Integration [DRAFT] Implement Vision Capabilities for Azure OpenAI Integration Jun 4, 2025
@github-actions github-actions bot added the Linked Issue is linked to a Azure Boards work item label Jun 4, 2025
@github-actions github-actions bot added this to the Version 27.0 milestone Jun 4, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
AL: System Application From Fork Pull request is coming from a fork Integration GitHub request for Integration area Linked Issue is linked to a Azure Boards work item
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BC Idea]: Add Support for Vision-Enabled AI Models in Business Central
2 participants