[DRAFT] Implement Vision Capabilities for Azure OpenAI Integration #3634

DmitryKatson · 2025-05-04T12:44:16Z

Summary

This PR adds support for vision-enabled AI models to our Azure OpenAI integration, allowing Business Central to send both text and images to AI models and receive detailed responses about image content.

Technical Implementation

We've extended the AOAI Chat Messages codeunit with methods to handle various image sources:

Image URLs
Image streams (InStream)
TenantMedia records
Temp Blobs
MediaSet fields (supporting multiple images)

All methods convert the images to the format required by the Azure OpenAI API - either direct URLs or base64-encoded data URLs with appropriate MIME types.

Key Features

Support for all common image formats (PNG, JPEG, GIF, WEBP)
Detail level parameter to control image analysis depth (auto, low, high)
Multiple images support in a single request (with 10-image limit enforced)
Automatic handling of MediaSet fields with multiple images
Proper error handling for unsupported file types and empty images

Usage Examples

Example 1: Using an image URL

var
    AOAIChatMessages: Codeunit "AOAI Chat Messages";
    UserText: Text;
    ImageUrl: Text;
begin
    UserText := 'What can you tell me about this product?';
    ImageUrl := 'https://example.com/product-image.jpg';
    
    // Add a message with both text and image URL
    AOAIChatMessages.AddUserMessage(UserText, ImageUrl, Enum::"AOAI Image Detail Level"::high);
    
    // Send to AI model and process response...
end;

Example 2: Using an image from a Record MediaSet field

var
    AOAIChatMessages: Codeunit "AOAI Chat Messages";
    Item: Record Item;
    MediaSetId: Guid;
    UserText: Text;
begin
    Item.Get('ITEM0001');
    MediaSetId := Item.Picture.MediaId;
    UserText := 'Analyze this product image and suggest improvements:';
    
    // Add a message with text and image(s) from MediaSet
    AOAIChatMessages.AddUserMessage(UserText, MediaSetId, Enum::"AOAI Image Detail Level"::auto);
    
    // Send to AI model and process response...
end;

Example 3: Using an image stream (e.g., from file upload)

var
    AOAIChatMessages: Codeunit "AOAI Chat Messages";
    FileManagement: Codeunit "File Management";
    ImageStream: InStream;
    UserText: Text;
    FileName: Text;
    FileExtension: Text;
begin
    UploadImageFile();
    UserText := 'What does this image show?';
    
    // Add a message with text and image from stream
    AOAIChatMessages.AddUserMessage(UserText, ImageStream, FileExtension, Enum::"AOAI Image Detail Level"::low);
    
    // Send to AI model and process response...
end;

local procedure UploadImageFile()
    var
        ImportImageLbl: Label 'Import Image';
        ImageFileTypeFilterLbl: Label 'Image Files (*.jpg;*.jpeg;*.png)|*.jpg;*.jpeg;*.png';
        FileManagement: Codeunit "File Management";
    begin
        if UploadIntoStream(ImportImageLbl, '', ImageFileTypeFilterLbl, FileName, ImageStream) then
            FileExtension := FileManagement.GetExtension(FileName);
    end;

Example 4: Using multiple images from a MediaSet

var
    AOAIChatMessages: Codeunit "AOAI Chat Messages";
    SalesHeader: Record "Sales Header";
    MediaSetId: Guid;
    UserText: Text;
begin
    SalesHeader.Get(SalesHeader."Document Type"::Order, '1001');
    MediaSetId := SalesHeader.Attachments.MediaId;
    UserText := 'Compare these product images and tell me the differences:';
    
    // Add a message with text and multiple images from MediaSet
    AOAIChatMessages.AddUserMessage(UserText, MediaSetId, Enum::"AOAI Image Detail Level"::high);
    
    // The API will process up to 10 images maximum
    // Send to AI model and process response...
end;

Testing

Comprehensive test coverage has been added with 11 test cases that validate:

Proper JSON message structure for all input types
URL and data URL encoding
MediaSet handling with multiple images
10-image limit enforcement
Error handling for unsupported file types
End-to-end communication flow

Notes

This implementation requires access to vision-enabled models like GPT-4o, GPT-4o-mini
When using high detail level, token consumption will increase significantly

Work Item(s)

Fixes #3633

Fixes AB#578341

- Introduced new procedures in AOAIChatMessages.Codeunit.al to handle user messages with images from various sources (URL, InStream, MediaSet, Tenant Media, Temp Blob). - Added AOAIImageDetailLevel.Enum.al to define detail levels for image processing. - Implemented AOAIImagesImpl.Codeunit.al for image content preparation and encoding. - Updated app.json to include new modules: BLOB Storage and Base64 Convert.

…f image handling features.

DmitryKatson added 2 commits May 4, 2025 15:22

Created AzureOpenAIVisionTest.Codeunit.al for comprehensive testing o…

20ed36c

…f image handling features.

github-actions bot added AL: System Application From Fork Pull request is coming from a fork labels May 4, 2025

JesperSchulz assigned JesperSchulz and DmitryKatson May 14, 2025

JesperSchulz added the Integration GitHub request for Integration area label May 14, 2025

JesperSchulz changed the title ~~Implement Vision Capabilities for Azure OpenAI Integration~~ [DRAFT] Implement Vision Capabilities for Azure OpenAI Integration Jun 4, 2025

github-actions bot added the Linked Issue is linked to a Azure Boards work item label Jun 4, 2025

github-actions bot added this to the Version 27.0 milestone Jun 4, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[DRAFT] Implement Vision Capabilities for Azure OpenAI Integration #3634

[DRAFT] Implement Vision Capabilities for Azure OpenAI Integration #3634

Uh oh!

DmitryKatson commented May 4, 2025 •

edited by github-actions bot

Loading

Uh oh!

Uh oh!

[DRAFT] Implement Vision Capabilities for Azure OpenAI Integration #3634

Are you sure you want to change the base?

[DRAFT] Implement Vision Capabilities for Azure OpenAI Integration #3634

Uh oh!

Conversation

DmitryKatson commented May 4, 2025 • edited by github-actions bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Technical Implementation

Key Features

Usage Examples

Example 1: Using an image URL

Example 2: Using an image from a Record MediaSet field

Example 3: Using an image stream (e.g., from file upload)

Example 4: Using multiple images from a MediaSet

Testing

Notes

Work Item(s)

Uh oh!

Uh oh!

DmitryKatson commented May 4, 2025 •

edited by github-actions bot

Loading