@@ -1,10 +1,11 @@
---
title: CompleteContextQuestionProcessor
description: CompleteContextQuestionProcessor class enables you to ask questions about a PDF document and receive answers based on the entire document content.
page_title: CompleteContextQuestionProcessor
slug: radpdfprocessing-features-gen-ai-powered-document-insights-complete-context-question-processor
tags: ai, document, analysis, question, processor, complete, context
published: True
position: 4
position: 5
---
<style>
table, th, td {
@@ -22,8 +23,6 @@ table th:nth-of-type(2) {

The **CompleteContextQuestionProcessor** class enables you to ask questions about a PDF document and receive answers based on the entire document content. The processor sends the complete document text to the AI model, which makes it suitable for smaller documents or for cases where the model must have access to all the information in the document. The class inherits from the abstract **AIProcessorBase** class, which provides common functionality for all AI processors.

## When to Use CompleteContextQuestionProcessor

The **CompleteContextQuestionProcessor** is ideal for the following scenarios:

1. **Small Documents**: When the document is small enough to fit within the token limit of the AI model.
@@ -36,7 +35,7 @@ However, if you're working with larger documents or want to optimize token usage

|Property|Description|
|---|---|
|**Settings**|Gets the settings for the AI question-answering process. Returns [CompleteContextProcessorSettings]({%slug radpdfprocessing-features-gen-ai-powered-document-insights-complete-context-question-processor%}#completectextprocessorsettings).|
|**Settings**|Gets the settings for the AI question-answering process. Returns [CompleteContextProcessorSettings]({%slug radpdfprocessing-features-gen-ai-powered-document-insights-complete-context-question-processor%}#completecontextprocessorsettings).|

|Method|Description|
|---|---|
@@ -58,114 +57,7 @@ The following example demonstrates how to use the **CompleteContextQuestionProce

#### __[C#] Example 1: Using CompleteContextQuestionProcessor__

```csharp
private async void AskQuestionUsingCompleteContext()
{
    // Load the PDF document
    string filePath = @"path\to\your\document.pdf";
    PdfFormatProvider formatProvider = new PdfFormatProvider();
    RadFixedDocument fixedDocument;

    using (FileStream fs = File.OpenRead(filePath))
    {
        fixedDocument = formatProvider.Import(fs);
    }

    // Set up the AI client (Azure OpenAI in this example)
    string key = Environment.GetEnvironmentVariable("AZUREOPENAI_KEY");
    string endpoint = Environment.GetEnvironmentVariable("AZUREOPENAI_ENDPOINT");
    string model = "gpt-4o-mini";

    AzureOpenAIClient azureClient = new(
        new Uri(endpoint),
        new Azure.AzureKeyCredential(key),
        new AzureOpenAIClientOptions());
    ChatClient chatClient = azureClient.GetChatClient(model);

    IChatClient iChatClient = new OpenAIChatClient(chatClient);
    int maxTokenCount = 128000;

    // Create the processor
    using (CompleteContextQuestionProcessor processor =
        new CompleteContextQuestionProcessor(iChatClient, maxTokenCount))
    {
        try
        {
            // Customize settings if needed
            processor.Settings.TokenizationEncoding = "cl100k_base";
            processor.Settings.ModelId = "gpt-4o-mini";

            // Example 1: Process full document
            // Convert the document to a simple text representation
            ISimpleTextDocument plainDoc = fixedDocument.ToSimpleTextDocument();

            // Ask a question about the full document
            string question = "What is the main subject of this document?";
            string answer = await processor.AnswerQuestion(plainDoc, question);

            Console.WriteLine($"Question: {question}");
            Console.WriteLine($"Answer: {answer}");

            // Ask another question
            string question2 = "What are the key conclusions drawn in this document?";
            string answer2 = await processor.AnswerQuestion(plainDoc, question2);

            Console.WriteLine($"Question: {question2}");
            Console.WriteLine($"Answer: {answer2}");

            // Example 2: Process specific pages
            // Convert only pages 5-10 to a simple text document (0-based index)
            ISimpleTextDocument partialDoc = fixedDocument.ToSimpleTextDocument(4, 9);

            // Ask a question about the specific pages
            string pageQuestion = "Summarize the content of pages 5-10 of the document.";
            string pageAnswer = await processor.AnswerQuestion(partialDoc, pageQuestion);

            Console.WriteLine($"Question: {pageQuestion}");
            Console.WriteLine($"Answer: {pageAnswer}");
        }
        catch (ArgumentException ex) when (ex.Message.Contains("The text is too long"))
        {
            Console.WriteLine("The document is too large to process with CompleteContextQuestionProcessor.");
            Console.WriteLine("Consider using PartialContextQuestionProcessor instead.");
        }
    }
}
```

## Token Limit Considerations

The **CompleteContextQuestionProcessor** sends the entire document to the AI model, which means the document must fit within the model's token limit. If the document exceeds this limit, the **AnswerQuestion** method will throw an **ArgumentException**. This is a key difference from the [SummarizationProcessor]({%slug radpdfprocessing-features-gen-ai-powered-document-insights-summarization-processor%}#handling-large-documents), which can handle documents of any size.

Here's how to check if a document is suitable for processing with **CompleteContextQuestionProcessor**:

#### __[C#] Example 2: Checking Document Size__

```csharp
private bool IsDocumentSuitableForCompleteContext(RadFixedDocument document, int modelMaxInputTokenLimit)
{
    ISimpleTextDocument textDoc = document.ToSimpleTextDocument();

    if (textDoc is ISimpleTextDocumentInternal internalDoc)
    {
        string text = internalDoc.Text;

        // Create an encoding to count tokens
        GptEncoding encoding = GptEncoding.GetEncoding("cl100k_base");

        // Estimate the token count for the document text + prompt + typical question
        const string prompt = "You are a helpful assistant. Use the following context to answer the question.";
        const string typicalQuestion = "What is this document about?";

        int estimatedTokens = encoding.Encode(prompt + text + typicalQuestion).Count;

        // Allow for a safety margin
        return estimatedTokens <= (int)(modelMaxInputTokenLimit * 0.9);
    }

    return false;
}
```
<snippet id='libraries-pdf-features-gen-ai-ask-questions-using-complete-context'/>

## See Also

@@ -0,0 +1,31 @@
---
title: Getting Started
description: Learn how to use the GenAI-powered Document Insights functionality to summarize a PDF document with PdfProcessing.
page_title: Overview
slug: radpdfprocessing-features-gen-ai-powered-document-insights-getting-started
tags: ai, document, analysis, overview, pdf, processing, genai, powered, insights
published: True
position: 2
---

# Getting Started

The following example demonstrates how to use the GenAI-powered Document Insights functionality to summarize a PDF document and ask questions about it:

>note The following code snippet is valid for Azure Open AI 9.3. The **IChatClient** initialization may differ depending on the version you use.

>important For **.NET Framework** and **.NET Standard**, an [IEmbeddingsStorage]({%slug radpdfprocessing-features-gen-ai-powered-document-insights-partial-context-question-processor%}#implementing-custom-iembeddingsstorage) implementation is required for the [PartialContextQuestionProcessor]({%slug radpdfprocessing-features-gen-ai-powered-document-insights-partial-context-question-processor%}).

#### __[C#] Example 1: Using GenAI-powered Document Insights__

<snippet id='libraries-pdf-features-gen-ai-getting-started'/>

When you run this code, the AI will process your document, generate a summary, and answer your questions.

## See Also

* [Prerequisites]({%slug radpdfprocessing-features-gen-ai-powered-document-insights-prerequisites%})
* [SummarizationProcessor]({%slug radpdfprocessing-features-gen-ai-powered-document-insights-summarization-processor%})
* [PartialContextQuestionProcessor]({%slug radpdfprocessing-features-gen-ai-powered-document-insights-partial-context-question-processor%})
* [Custom IEmbeddingsStorage Implementation]({%slug radpdfprocessing-features-gen-ai-powered-document-insights-partial-context-question-processor%}#implementing-custom-iembeddingsstorage)
* [CompleteContextQuestionProcessor]({%slug radpdfprocessing-features-gen-ai-powered-document-insights-complete-context-question-processor%})
@@ -1,5 +1,6 @@
---
title: Overview
description: Learn more about the GenAI-powered Document Insights feature of the PdfProcessing library.
page_title: Overview
slug: radpdfprocessing-features-gen-ai-powered-document-insights-overview
tags: ai, document, analysis, overview, pdf, processing, genai, powered, insights
@@ -11,115 +12,25 @@ position: 0

The GenAI-powered Document Insights feature enables you to easily extract insights from PDF documents using Large Language Models (LLMs). This functionality allows you to summarize document content and ask questions about the document, with the AI providing relevant answers based on the document's content.

The GenAI-powered Document Insights feature includes three main components:

* **[SummarizationProcessor]({%slug radpdfprocessing-features-gen-ai-powered-document-insights-summarization-processor%})**: Generates concise summaries of PDF documents.
* **[CompleteContextQuestionProcessor]({%slug radpdfprocessing-features-gen-ai-powered-document-insights-complete-context-question-processor%})**: Answers questions by providing the entire document content to the AI model.
* **[PartialContextQuestionProcessor]({%slug radpdfprocessing-features-gen-ai-powered-document-insights-partial-context-question-processor%})**: Answers questions by providing only the relevant portions of the document to the AI model.

## Key Features

* **Extract Document Insights**: Quickly understand the key points of lengthy documents.
* **Efficient Information Retrieval**: Ask specific questions about your documents and receive accurate answers.
* **Token Optimization**: Reduce token usage by only sending relevant portions of the document to the AI model as shown in the [PartialContextQuestionProcessor]({%slug radpdfprocessing-features-gen-ai-powered-document-insights-partial-context-question-processor%}#when-to-use-partialcontextquestionprocessor) section.
* **Multiple LLM Support**: Compatible with different AI providers including Azure OpenAI, OpenAI, and Ollama as described in the [Prerequisites]({%slug radpdfprocessing-features-gen-ai-powered-document-insights-prerequisites%}#ai-provider-setup).

## Complete Example

The following example demonstrates how to use the GenAI-powered Document Insights functionality to summarize a PDF document and ask questions about it:

#### __[C#] Example 1: Using GenAI-powered Document Insights__

```csharp
private async void ProcessPdfWithAI()
{
    // Load the PDF document
    string filePath = @"path\to\your\document.pdf";
    PdfFormatProvider formatProvider = new PdfFormatProvider();
    RadFixedDocument fixedDocument;

    using (FileStream fs = File.OpenRead(filePath))
    {
        fixedDocument = formatProvider.Import(fs);
    }

    // Convert the document to a simple text representation
    ISimpleTextDocument plainDoc = fixedDocument.ToSimpleTextDocument();

    // Set up the AI client (Azure OpenAI in this example)
    string key = Environment.GetEnvironmentVariable("AZUREOPENAI_KEY");
    string endpoint = Environment.GetEnvironmentVariable("AZUREOPENAI_ENDPOINT");
    string model = "gpt-4o-mini";

    AzureOpenAIClient azureClient = new(
        new Uri(endpoint),
        new Azure.AzureKeyCredential(key),
        new AzureOpenAIClientOptions());
    ChatClient chatClient = azureClient.GetChatClient(model);

    IChatClient iChatClient = new OpenAIChatClient(chatClient);
    int maxTokenCount = 128000;

    // 1. Summarize the document
    using (SummarizationProcessor summarizationProcessor = new SummarizationProcessor(iChatClient, maxTokenCount))
    {
        // Handle resources calculation event to control token usage
        summarizationProcessor.SummaryResourcesCalculated += (object sender, SummaryResourcesCalculatedEventArgs e) =>
        {
            Console.WriteLine($"Estimated calls required: {e.EstimatedCallsRequired}");
            Console.WriteLine($"Estimated tokens required: {e.EstimatedTokensRequired}");

            // Confirm if the operation should continue
            e.ShouldContinueExecution = true;
        };

        string summary = await summarizationProcessor.Summarize(plainDoc);
        Console.WriteLine("Document Summary:");
        Console.WriteLine(summary);
    }

    // 2. Answer questions using partial context (recommended for efficiency)
#if NET7_0_OR_GREATER
    using (PartialContextQuestionProcessor partialContextQuestionProcessor =
        new PartialContextQuestionProcessor(iChatClient, maxTokenCount, plainDoc))
    {
        string question = "What are the main findings in the document?";
        string answer = await partialContextQuestionProcessor.AnswerQuestion(question);

        Console.WriteLine($"Question: {question}");
        Console.WriteLine($"Answer: {answer}");
    }
#else
    IEmbeddingsStorage embeddingsStorage = new OllamaEmbeddingsStorage();
    using (PartialContextQuestionProcessor partialContextQuestionProcessor =
        new PartialContextQuestionProcessor(iChatClient, embeddingsStorage, maxTokenCount, plainDoc))
    {
        string question = "What are the main findings in the document?";
        string answer = await partialContextQuestionProcessor.AnswerQuestion(question);

        Console.WriteLine($"Question: {question}");
        Console.WriteLine($"Answer: {answer}");
    }
#endif

    // 3. Answer questions using complete context (for smaller documents)
    using (CompleteContextQuestionProcessor completeContextQuestionProcessor =
        new CompleteContextQuestionProcessor(iChatClient, maxTokenCount))
    {
        string question = "What is the conclusion of the document?";
        string answer = await completeContextQuestionProcessor.AnswerQuestion(plainDoc, question);

        Console.WriteLine($"Question: {question}");
        Console.WriteLine($"Answer: {answer}");
    }
}
```
The GenAI-powered Document Insights feature includes three main components:

When you run this code, the AI will process your document, generate a summary, and answer your questions.
|Processor|Description|
|----|----|
|**[SummarizationProcessor]({%slug radpdfprocessing-features-gen-ai-powered-document-insights-summarization-processor%})**|Generates concise summaries of PDF documents.|
|**[CompleteContextQuestionProcessor]({%slug radpdfprocessing-features-gen-ai-powered-document-insights-complete-context-question-processor%})**|Answers questions by providing the entire document content to the AI model.|
|**[PartialContextQuestionProcessor]({%slug radpdfprocessing-features-gen-ai-powered-document-insights-partial-context-question-processor%})**|Answers questions by providing only the relevant portions of the document to the AI model.|
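
The two question processors differ mainly in how much of the document they send to the model, so the choice usually depends on whether the document fits within the model's token limit. The following is a minimal sketch of that decision, not part of the PdfProcessing API: `QuestionStrategy`, `QuestionStrategySelector`, `EstimateTokens`, and `Choose` are hypothetical names, and the estimate of roughly four characters per token is an assumption that only approximates real tokenization. Use an actual tokenizer when an exact count matters.

```csharp
using System;

// Hypothetical helper types for illustration; they are not part of the PdfProcessing API.
public enum QuestionStrategy
{
    CompleteContext,
    PartialContext
}

public static class QuestionStrategySelector
{
    // Assumption: about 4 characters per token for English text.
    private const int ApproxCharsPerToken = 4;

    // Returns a rough token estimate, rounded up.
    public static int EstimateTokens(string text)
    {
        if (text == null)
        {
            throw new ArgumentNullException(nameof(text));
        }

        return (text.Length + ApproxCharsPerToken - 1) / ApproxCharsPerToken;
    }

    // Picks the complete-context approach only when the estimated token count
    // fits within the limit after applying a safety margin (90% by default).
    public static QuestionStrategy Choose(string documentText, int maxTokenCount, double safetyMargin = 0.9)
    {
        if (maxTokenCount <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(maxTokenCount));
        }

        int budget = (int)(maxTokenCount * safetyMargin);

        return EstimateTokens(documentText) <= budget
            ? QuestionStrategy.CompleteContext
            : QuestionStrategy.PartialContext;
    }
}
```

With a result of `QuestionStrategy.CompleteContext` you would create a **CompleteContextQuestionProcessor**; otherwise a **PartialContextQuestionProcessor** is the safer choice. The margin leaves room for the prompt and the question, which also consume tokens.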

## See Also

* [Prerequisites]({%slug radpdfprocessing-features-gen-ai-powered-document-insights-prerequisites%})
* [Getting Started]({%slug radpdfprocessing-features-gen-ai-powered-document-insights-getting-started%})
* [SummarizationProcessor]({%slug radpdfprocessing-features-gen-ai-powered-document-insights-summarization-processor%})
* [PartialContextQuestionProcessor]({%slug radpdfprocessing-features-gen-ai-powered-document-insights-partial-context-question-processor%})
* [CompleteContextQuestionProcessor]({%slug radpdfprocessing-features-gen-ai-powered-document-insights-complete-context-question-processor%})