diff --git a/images/dpl-pdf.png b/images/dpl-pdf.png new file mode 100644 index 00000000..e6b48322 Binary files /dev/null and b/images/dpl-pdf.png differ diff --git a/images/dpl-spread.png b/images/dpl-spread.png new file mode 100644 index 00000000..1cbae9af Binary files /dev/null and b/images/dpl-spread.png differ diff --git a/images/dpl-words.png b/images/dpl-words.png new file mode 100644 index 00000000..29d663da Binary files /dev/null and b/images/dpl-words.png differ diff --git a/images/dpl-zip.png b/images/dpl-zip.png new file mode 100644 index 00000000..60983c68 Binary files /dev/null and b/images/dpl-zip.png differ diff --git a/images/ninja_looking.png b/images/ninja_looking.png new file mode 100644 index 00000000..1b281cea Binary files /dev/null and b/images/ninja_looking.png differ diff --git a/introduction.md b/introduction.md index 260f71a1..8d3b492b 100644 --- a/introduction.md +++ b/introduction.md @@ -12,6 +12,14 @@ position: 0 table th:first-of-type { width: 25%; } + +img[alt$="><"] { + display: block; + max-width: 100%; + height: auto; + margin: auto; + float: none!important; +} # Welcome to Telerik Document Processing Libraries @@ -26,15 +34,15 @@ table th:first-of-type { ## Libraries -Telerik Document Processing features the following components: +Telerik Document Processing features the following libraries: |Library|Description| |----|----| -| [RadPdfProcessing]({%slug radpdfprocessing-overview%})|A processing library that allows you to create, import, and export PDF documents from your code. You can use it in any web or desktop .NET application without relying on third-party software like Adobe Acrobat.| -|[RadSpreadProcessing]({%slug radspreadprocessing-overview%})|A powerful library that enables you to create applications with native support for spreadsheet documents. With RadSpreadProcessing, you can create spreadsheets from scratch, modify existing documents or convert between the most common spreadsheet formats. 
You can save the generated workbook to a local file, stream, or stream it to the client browser.| -|[RadSpreadStreamProcessing]({%slug radspreadstreamprocessing-overview%})|Spread streaming is a document processing paradigm that allows you to create or read big spreadsheet documents with great performance and minimal memory footprint. The key for the memory efficiency is that the spread streaming library writes the spreadsheet content directly to a stream without creating and preserving the spreadsheet document model in memory.| -|[RadWordsProcessing]({%slug radwordsprocessing-overview%})|A processing library that allows you to create, modify and export documents to a variety of formats. Through the API, you can access each element in the document and modify, remove it or add a new one. The generated content you can save as a stream, as a file, or sent it to the client browser.| -|[RadZipLibrary]({%slug radziplibrary-overview%})| It allows you to compress and combine files in ZIP archives, browse and extract files from existing ZIP archives and compress streams for easy file shipping and reduced storage space.| +|![Pdf](images/dpl-pdf.png) [RadPdfProcessing]({%slug radpdfprocessing-overview%})|A processing library that allows you to create, import, and export PDF documents from your code. You can use it in any web or desktop .NET application without relying on third-party software like Adobe Acrobat.| +|![Spread](images/dpl-spread.png) [RadSpreadProcessing]({%slug radspreadprocessing-overview%})|A powerful library that enables you to create applications with native support for spreadsheet documents. With RadSpreadProcessing, you can create spreadsheets from scratch, modify existing documents or convert between the most common spreadsheet formats. 
You can save the generated workbook to a local file, stream, or stream it to the client browser.|
+|![SpreadStream](images/dpl-spread.png) [RadSpreadStreamProcessing]({%slug radspreadstreamprocessing-overview%})|Spread streaming is a document processing paradigm that allows you to create or read big spreadsheet documents with great performance and minimal memory footprint. The key to the memory efficiency is that the spread streaming library writes the spreadsheet content directly to a stream without creating and preserving the spreadsheet document model in memory.|
+|![Words](images/dpl-words.png) [RadWordsProcessing]({%slug radwordsprocessing-overview%})|A processing library that allows you to create, modify and export documents to a variety of formats. Through the API, you can access each element in the document and modify it, remove it, or add a new one. You can save the generated content as a stream, as a file, or send it to the client browser.|
+|![Zip](images/dpl-zip.png) [RadZipLibrary]({%slug radziplibrary-overview%})| It allows you to compress and combine files in ZIP archives, browse and extract files from existing ZIP archives and compress streams for easy file shipping and reduced storage space.|
 ## Key Features
@@ -52,21 +60,26 @@ For more details about the benefits of using Telerik Document Processing, see th
 ## Supported Formats
-
 The Telerik Document Processing libraries support the following file formats:
-* DOCX (Word Document)
-* DOC (Word 97-2003 Document)
-* DOT (Word 97-2003 Template)
-* HTML
-* PDF
-* RTF
-* TXT
-* XLSX (Excel Workbook)
-* XLS (Excel 97-2003 Workbook)
-* XLSM (macro-enabled spreadsheet created by Microsoft Excel) *Macros are only preserved during import and export. They cannot be executed or changed in the code.
-* CSV
-* ZIP
+![Ninja Looking ><](images/ninja_looking.png)
+
+|Format|Library|Provider|
+|----|----|----|
+|**DOCX (Word Document)**|[RadWordsProcessing]({%slug radwordsprocessing-overview%})|[DocxFormatProvider]({%slug radwordsprocessing-formats-and-conversion-docx-docxformatprovider%})|
+|**DOC (Word 97-2003 Document)**|[RadWordsProcessing]({%slug radwordsprocessing-overview%})|[DocFormatProvider]({%slug radwordsprocessing-formats-and-conversion-doc-docformatprovider%}) Import only|
+|**DOT (Word 97-2003 Template)**|[RadWordsProcessing]({%slug radwordsprocessing-overview%})|[DocFormatProvider]({%slug radwordsprocessing-formats-and-conversion-doc-docformatprovider%}) Import only|
+|**HTML**|[RadWordsProcessing]({%slug radwordsprocessing-overview%})|[HtmlFormatProvider]({%slug radwordsprocessing-formats-and-conversion-html-htmlformatprovider%})|
+|**PDF**|[RadWordsProcessing]({%slug radwordsprocessing-overview%})
[RadPdfProcessing]({%slug radpdfprocessing-overview%})
[RadSpreadProcessing]({%slug radspreadprocessing-overview%})|[PdfFormatProvider in RadWordsProcessing]({%slug radwordsprocessing-formats-and-conversion-pdf-pdfformatprovider%}) Export only
[PdfFormatProvider in RadPdfProcessing]({%slug radpdfprocessing-formats-and-conversion-pdf-pdfformatprovider%})
[PdfFormatProvider in RadSpreadProcessing]({%slug radspreadprocessing-formats-and-conversion-pdf-pdfformatprovider%}) Export only| +|**RTF**|[RadWordsProcessing]({%slug radwordsprocessing-overview%})|[RtfFormatProvider]({%slug radwordsprocessing-formats-and-conversion-rtf-rtfformatprovider%})| +|**TXT**|[RadWordsProcessing]({%slug radwordsprocessing-overview%})
[RadPdfProcessing]({%slug radpdfprocessing-overview%})
[RadSpreadProcessing]({%slug radspreadprocessing-overview%})|[TxtFormatProvider in RadWordsProcessing]({%slug radwordsprocessing-formats-and-conversion-txt-txtformatprovider%})
[TextFormatProvider in RadPdfProcessing]({%slug radpdfprocessing-formats-and-conversion-plain-text-textformatprovider%}) Export only
[TxtFormatProvider in RadSpreadProcessing]({%slug radspreadprocessing-formats-and-conversion-txt-txtformatprovider%})| +|**XLSX (Excel Workbook)**|[RadSpreadProcessing]({%slug radspreadprocessing-overview%})
[RadSpreadStreamProcessing]({%slug radspreadstreamprocessing-overview%})|[XlsxFormatProvider]({%slug radspreadprocessing-formats-and-conversion-xlsx-xlsxformatprovider%})| +|**XLS (Excel 97-2003 Workbook)**|[RadSpreadProcessing]({%slug radspreadprocessing-overview%})|[XlsFormatProvider]({%slug radspreadprocessing-formats-and-conversion-xls-xlsformatprovider%})| +|**XLSM (macro-enabled spreadsheet created by Microsoft Excel)** Macros are only preserved during import and export. They cannot be executed or changed in the code.|[RadSpreadProcessing]({%slug radspreadprocessing-overview%})|[XlsmFormatProvider]({%slug radspreadprocessing-formats-and-conversion-xlsm-xlsmformatprovider%})| +|**CSV**|[RadSpreadProcessing]({%slug radspreadprocessing-overview%})
[RadSpreadStreamProcessing]({%slug radspreadstreamprocessing-overview%})|[CsvFormatProvider]({%slug radspreadprocessing-formats-and-conversion-csv-csvformatprovider%})| +|**DataTable**|[RadSpreadProcessing]({%slug radspreadprocessing-overview%})|[DataTableFormatProvider]({%slug radspreadprocessing-formats-and-conversion-using-data-table-format-provider%})| +|**ZIP**|[RadZipLibrary]({%slug radziplibrary-overview%})|[ZipArchive]({%slug radziplibrary-gettingstarted%})| +|**Image**|[RadPdfProcessing]({%slug radpdfprocessing-overview%})|[SkiaImageFormatProvider]({%slug radpdfprocessing-formats-and-conversion-image-using-skiaimageformatprovider%}) Export only
[OcrFormatProvider]({%slug radpdfprocessing-formats-and-conversion-ocr-ocrformatprovider%}) Import only |
 ![DPL Ninja](images/dpl-formats.png)
diff --git a/knowledge-base/extract-text-from-pdf.md b/knowledge-base/extract-text-from-pdf.md
new file mode 100644
index 00000000..dfafba8f
--- /dev/null
+++ b/knowledge-base/extract-text-from-pdf.md
@@ -0,0 +1,51 @@
+---
+title: Extracting Text from PDF Documents
+description: Learn how to extract the text from a PDF document using RadPdfProcessing from the Telerik Document Processing libraries.
+type: how-to
+page_title: How to Extract the Text from PDF documents
+slug: extract-text-from-pdf
+tags: pdf, document, processing, text, extract, content
+res_type: kb
+ticketid: 1657503
+---
+
+## Environment
+
+| Version | Product | Author |
+| ---- | ---- | ---- |
+| 2025.1.128| RadPdfProcessing |[Desislava Yordanova](https://www.telerik.com/blogs/author/desislava-yordanova)|
+
+## Description
+
+Learn how to extract the text content from a PDF document.
+
+## Solution
+
+Follow these steps:
+
+1\. Import the PDF document using the [PdfFormatProvider]({%slug radpdfprocessing-formats-and-conversion-pdf-pdfformatprovider%}).
+
+2\. Export the RadFixedDocument's content to text using the [TextFormatProvider]({%slug radpdfprocessing-formats-and-conversion-plain-text-textformatprovider%}). Any text fragments that the PDF document contains are exported to the plain-text result.
+
+```csharp
+    string filePath = "input.pdf"; // path to the source PDF
+    PdfFormatProvider pdf_provider = new PdfFormatProvider();
+    RadFixedDocument fixed_document;
+    using (Stream stream = File.OpenRead(filePath))
+    {
+        fixed_document = pdf_provider.Import(stream); // step 1: import the PDF
+    }
+    Telerik.Windows.Documents.Fixed.FormatProviders.Text.TextFormatProvider provider = new Telerik.Windows.Documents.Fixed.FormatProviders.Text.TextFormatProvider();
+
+    string documentContent = provider.Export(fixed_document); // step 2: export the content as plain text
+    Debug.WriteLine(documentContent);
+```
+>important Depending on the document's internal content, the **TextFormatProvider** may not cover all cases. A common scenario is a document with scanned images that contain text. In this case, the above approach does not produce plain text, because the visible text is not stored as text fragments but as [Path]({%slug radpdfprocessing-model-path%}) elements. For such documents, use the [OcrFormatProvider]({%slug radpdfprocessing-formats-and-conversion-ocr-ocrformatprovider%}), which converts images of typed, handwritten, or printed text from a scanned document into machine-encoded text.
+
+## See Also
+
+- [RadPdfProcessing]({%slug radpdfprocessing-overview%})
+- [OcrFormatProvider]({%slug radpdfprocessing-formats-and-conversion-ocr-ocrformatprovider%})
+- [TextFormatProvider]({%slug radpdfprocessing-formats-and-conversion-plain-text-textformatprovider%})
+- [Summarizing the Text Content of PDF Documents using Text Analytics with Azure AI services]({%slug summarize-pdf-content%})
+
diff --git a/knowledge-base/images/azure-ai-key.png b/knowledge-base/images/azure-ai-key.png
new file mode 100644
index 00000000..eae7100b
Binary files /dev/null and b/knowledge-base/images/azure-ai-key.png differ
diff --git a/knowledge-base/summarize-pdf-content.md b/knowledge-base/summarize-pdf-content.md
new file mode 100644
index 00000000..e0be8c57
--- /dev/null
+++ b/knowledge-base/summarize-pdf-content.md
@@ -0,0 +1,148 @@
+---
+title: Summarizing the Text Content of PDF Documents using Text Analytics with Azure AI services
+description: Learn how to summarize the text content from a PDF document using RadPdfProcessing and Text Analytics with Azure AI services.
+type: how-to
+page_title: How to Summarize the Text Content of PDF documents using Text Analytics with Azure AI services
+slug: summarize-pdf-content
+tags: pdf, document, processing, text, summarize, summary, content, azure
+res_type: kb
+ticketid: 1657503
+---
+
+## Environment
+
+| Version | Product | Author |
+| ---- | ---- | ---- |
+| 2025.1.128| RadPdfProcessing |[Desislava Yordanova](https://www.telerik.com/blogs/author/desislava-yordanova)|
+
+## Description
+
+Learn how to summarize the text content of a PDF document using [Text Analytics with Azure AI services](https://learn.microsoft.com/en-us/azure/synapse-analytics/machine-learning/tutorial-text-analytics-use-mmlspark).
+
+## Solution
+
+Follow these steps:
+
+1\. Add the following **required** assemblies/NuGet packages to your project:
+
+* [Azure.AI.TextAnalytics](https://www.nuget.org/packages/Azure.AI.TextAnalytics)
+* Telerik.Documents.Fixed
+* Telerik.Documents.Core
+* Telerik.Zip
+
+2\. Generate your Azure AI key and endpoint: [Get your credentials from your Azure AI services resource](https://learn.microsoft.com/en-us/azure/ai-services/use-key-vault?tabs=azure-cli&pivots=programming-language-csharp)
+
+![Azure AI key](images/azure-ai-key.png)
+
+3\. [Extract the text content from a PDF document]({%slug extract-text-from-pdf%}).
+
+4\. Use the following custom implementation to summarize the text content extracted in step 3:
+
+```csharp
+    static void Main(string[] args)
+    {
+        Telerik.Windows.Documents.Fixed.FormatProviders.Pdf.PdfFormatProvider pdf_provider = new PdfFormatProvider();
+        Telerik.Windows.Documents.Fixed.FormatProviders.Text.TextFormatProvider text_provider = new TextFormatProvider();
+        Telerik.Windows.Documents.Fixed.Model.RadFixedDocument document = pdf_provider.Import(File.ReadAllBytes("PdfDocument.pdf"), TimeSpan.FromSeconds(10));
+        string documentTextContent = text_provider.Export(document);
+
+        AzureTextSummarizationProvider summarizationProvider = new AzureTextSummarizationProvider(azure_key, azure_endpoint); // azure_key and azure_endpoint: the credentials generated in step 2
+        string summary = summarizationProvider.SummarizeText(documentTextContent).Result; // blocks until the asynchronous call completes
+
+        Console.WriteLine(summary);
+    }
+
+    public class AzureTextSummarizationProvider
+    {
+        private string languageKey;
+        private string languageEndpoint;
+
+        public AzureTextSummarizationProvider(string azure_key, string azure_endpoint)
+        {
+            this.languageKey = azure_key;
+            this.languageEndpoint = azure_endpoint;
+        }
+
+        public async Task<string> SummarizeText(string text)
+        {
+            Azure.AzureKeyCredential credentials = new Azure.AzureKeyCredential(languageKey);
+            Uri endpoint = new Uri(languageEndpoint);
+
+            Azure.AI.TextAnalytics.TextAnalyticsClient client = new Azure.AI.TextAnalytics.TextAnalyticsClient(endpoint, credentials);
+
+            // Prepare analyze operation input. You can add multiple documents to this list and perform the same
+            // operation to all of them.
+            List<string> batchInput = new List<string>
+            {
+                text
+            };
+
+            Azure.AI.TextAnalytics.TextAnalyticsActions actions = new Azure.AI.TextAnalytics.TextAnalyticsActions()
+            {
+                ExtractiveSummarizeActions = [new Azure.AI.TextAnalytics.ExtractiveSummarizeAction()]
+            };
+
+            // Start analysis process.
+            Azure.AI.TextAnalytics.AnalyzeActionsOperation operation = await client.StartAnalyzeActionsAsync(batchInput, actions);
+            await operation.WaitForCompletionAsync();
+
+            System.Text.StringBuilder stringBuilder = new System.Text.StringBuilder();
+            // View operation status.
+            stringBuilder.AppendLine($"AnalyzeActions operation has completed");
+            stringBuilder.AppendLine();
+
+            stringBuilder.AppendLine($"Created On : {operation.CreatedOn}");
+            stringBuilder.AppendLine($"Expires On : {operation.ExpiresOn}");
+            stringBuilder.AppendLine($"Id : {operation.Id}");
+            stringBuilder.AppendLine($"Status : {operation.Status}");
+
+            stringBuilder.AppendLine();
+            // View operation results.
+            await foreach (Azure.AI.TextAnalytics.AnalyzeActionsResult documentsInPage in operation.Value)
+            {
+                IReadOnlyCollection<Azure.AI.TextAnalytics.ExtractiveSummarizeActionResult> summaryResults = documentsInPage.ExtractiveSummarizeResults;
+
+                foreach (Azure.AI.TextAnalytics.ExtractiveSummarizeActionResult summaryActionResults in summaryResults)
+                {
+                    if (summaryActionResults.HasError)
+                    {
+                        stringBuilder.AppendLine($" Error!");
+                        stringBuilder.AppendLine($" Action error code: {summaryActionResults.Error.ErrorCode}.");
+                        stringBuilder.AppendLine($" Message: {summaryActionResults.Error.Message}");
+                        continue;
+                    }
+
+                    foreach (Azure.AI.TextAnalytics.ExtractiveSummarizeResult documentResults in summaryActionResults.DocumentsResults)
+                    {
+                        if (documentResults.HasError)
+                        {
+                            stringBuilder.AppendLine($" Error!");
+                            stringBuilder.AppendLine($" Document error code: {documentResults.Error.ErrorCode}.");
+                            stringBuilder.AppendLine($" Message: {documentResults.Error.Message}");
+                            continue;
+                        }
+
+                        stringBuilder.AppendLine($" Extracted the following {documentResults.Sentences.Count} sentence(s):");
+                        stringBuilder.AppendLine();
+
+                        foreach (Azure.AI.TextAnalytics.ExtractiveSummarySentence sentence in documentResults.Sentences)
+                        {
+                            stringBuilder.Append($"{sentence.Text} ");
+                        }
+                    }
+                }
+            }
+
+            string result = stringBuilder.ToString();
+
+            return result;
+        }
+    }
+```
+
+## See Also
+
+- [Extracting Text from PDF Documents]({%slug extract-text-from-pdf%})
+- [OcrFormatProvider]({%slug radpdfprocessing-formats-and-conversion-ocr-ocrformatprovider%})
+- [TextFormatProvider]({%slug radpdfprocessing-formats-and-conversion-plain-text-textformatprovider%})
+
diff --git a/libraries/radpdfprocessing/formats-and-conversion/ocr/ocrformatprovider.md b/libraries/radpdfprocessing/formats-and-conversion/ocr/ocrformatprovider.md
index 82274d12..266a83aa 100644
--- a/libraries/radpdfprocessing/formats-and-conversion/ocr/ocrformatprovider.md
+++ b/libraries/radpdfprocessing/formats-and-conversion/ocr/ocrformatprovider.md
@@ -12,7 +12,7 @@ position: 1
 Since _Q1 2025_ the __RadPdfProcessing__ library supports Optical Character Recognition (OCR). OCR is the electronic or mechanical conversion of images of typed, handwritten, or printed text into machine-encoded text from a scanned document. The library uses the **OcrFormatProvider** class that allows you to import an image which is returned as a [RadFixedPage]({%slug radpdfprocessing-model-radfixedpage%}). By default, the **OcrFormatProvider** takes as a parameter a **TesseractOcrProvider** implementation which is achieved by using the third-party library [Tesseract](https://github.com/tesseract-ocr/tesseract), however you can provide any [custom implementation]({%slug radpdfprocessing-formats-and-conversion-ocr-custom-ocrprovider%}) instead.
-You can find all the dependencies and required steps for the implementation in the [Prerequisites]({%slug radpdfprocessing-formats-and-conversion-ocr-prerequisites%}) artilce.
+You can find all the dependencies and required steps for the implementation in the [Prerequisites]({%slug radpdfprocessing-formats-and-conversion-ocr-prerequisites%}) article.
## TesseractOcrProvider Public API @@ -35,3 +35,4 @@ You can find all the dependencies and required steps for the implementation in t * [Prerequisites]({%slug radpdfprocessing-formats-and-conversion-ocr-prerequisites%}) * [Timeout Mechanism]({%slug timeout-mechanism-in-dpl%}) * [Implementing a Custom OCR Provider]({%slug radpdfprocessing-formats-and-conversion-ocr-custom-ocrprovider%}) +* [Extracting Text from PDF Documents]({%slug extract-text-from-pdf%}) diff --git a/libraries/radpdfprocessing/formats-and-conversion/pdf/pdfformatprovider/settings.md b/libraries/radpdfprocessing/formats-and-conversion/pdf/pdfformatprovider/settings.md index efad4733..1c625aa7 100644 --- a/libraries/radpdfprocessing/formats-and-conversion/pdf/pdfformatprovider/settings.md +++ b/libraries/radpdfprocessing/formats-and-conversion/pdf/pdfformatprovider/settings.md @@ -29,7 +29,7 @@ The **PdfFormatProvider** class offers the **ImportSettings** property which all |Property|Description| |----|----| -|**ReadingMode**|Gets or sets the mode for loading the document pages content on import. *Introduced in R2 2020*.
**ReadAllAtOnce**All document pages content will be loaded on import. This is the default behavior.
**OnDemand**The document pages content will be loaded on demand. This mode is made for use with PdfViewers and only the currently visible page will be loaded.
Currently, the **OnDemand** mode should be applied when using with viewers only.| +|**ReadingMode**|Gets or sets the mode for loading the document pages content on import. *Introduced in R2 2020*.
  • **ReadAllAtOnce**: All document pages content will be loaded on import. This is the default behavior.
  • **OnDemand**: The document pages content will be loaded on demand. This mode is made for use with PdfViewers and only the currently visible page will be loaded.
Currently, use the **OnDemand** mode only with viewers.|
 |**CopyStream**|Gets or sets whether to copy the document stream on import. When false and ReadingMode is OnDemand, the original stream must be kept open while the document is in use. When true, the original stream can be disposed after import, regardless of the reading mode.|
 |Event|Description|
@@ -50,16 +50,23 @@ The **PdfFormatProvider** class offers the **ExportSettings** property which all
 |----|----|
 |**StripJavaScriptActions**|Specifies if the PDF document should strip JavaScript actions on export. *Introduced in Q4 2024*.|
 |**ShouldEmbedFonts** (obsolete)|Specifies whether the font files should be embedded in the PDF document. The default value is *true* because the fonts should be embedded in the file by the PDF Standard. This means that by default the fonts are added which allows proper viewing on any device. If the fonts are not embedded and the file is viewed on a device that does not have the used fonts the font might be substituted. If the font is embedded in the PDF file, it ensures the most predictable and dependable results. As of **Q2 2024** the **ShouldEmbedFonts** property is obsolete. Use the **FontEmbeddingType** property instead.|
-|**FontEmbeddingType**|The property controls what part of the fonts will be embedded in the file offering the following options:
**None**Does not embed fonts.
**Full**Fully embeds fonts.
**Subset**Embeds only the used characters subset of the fonts. This is the default approach.
The subset export option is currently implemented **only** for TrueType fonts (.ttf).| +|**FontEmbeddingType**|The property controls what part of the fonts will be embedded in the file offering the following options:
  • **None**: Does not embed fonts.
  • **Full**: Fully embeds fonts.
  • **Subset**: Embeds only the used characters subset of the fonts. This is the default approach.
The subset export option is currently implemented **only** for TrueType fonts (.ttf).| |**IsEncrypted**|This property specifies if the document should be encrypted. The default value is *false*. You can specify the encryption algorithm by setting the **EncryptionType** property. The supported values are **AES256** and **RC4**.
**All passwords for revision 6 (AES-256) shall be based on Unicode**. Preprocessing of a user-provided password consists first of normalizing its representation by applying the "SASLPrep" profile (Internet RFC 4013) of the "stringprep" algorithm (Internet RFC 3454) to the supplied password using the Normalize and BiDi options.
This setting is ignored when __ComplianceLevel__ differs from __None__ as PDF/A compliant documents do not allow encryption.| |**UserPassword**|The password to be used if the document is encrypted. The default password is an empty string.| |**OwnerPassword**|The password that governs permissions for operations such as printing, copying, and modifying the document. The default password is an empty string.| -|**UserAccessPermissions**|Gets or sets the user access permissions. These permissions specify which access permissions should be granted when the document is opened with user access. In order to be applied, the **IsEncrypted** property should be set to *true*. This property specifies three types of user access permissions:
**PrintingPermissionType**Sets the permissions for document printing. Possible values:
**None**Specify no printing is allowed.
**LowResolution**Specify low resolution (150 dpi) printing is allowed.
**HighResolution**Specify printing on the highest resolution is allowed.
**ChangingPermissionType**Sets the permissions for making changes to the document. Possible values:
**None**Specify no document changes are allowed.
**DocumentAssembly**Specify inserting, deleting, and rotating page changes are allowed.
**FormFieldFillingOrSigning**Specify filling in form fields and signing existing signature fields changes are allowed.
**FormFieldFillingOrSigningAndCommenting**Specify commenting, filling in form fields, and signing existing signature fields changes are allowed.
**AnyExceptExtractingPages**Specify any changes except extracting pages are allowed.
**CopyingPermissionType**Sets the permissions for document copying. Possible values:
**None**Specify no copying is allowed.
**Copying**Specify copying is allowed.
**TextAccess**Specify that text access for screen reader devices for copying is allowed.
**NumberingFieldsPrecisionLevel**Represents precision level when updating numbering fields. When the Normal option is selected the fields are updated once, not taking into account if their new values would have shifted the already measured layout. In such cases, the results could be outdated. This is the MS Word-like behavior. If you need more accurate results, use NumberingFieldsPrecisionLevel.High where the fields are updated until their values become more accurate. This precision level is more accurate than NumberingFieldsPrecisionLevel.Normal but requires more resources.
| +|**UserAccessPermissions**|Gets or sets the user access permissions. These permissions specify which access permissions should be granted when the document is opened with user access. In order to be applied, the **IsEncrypted** property should be set to *true*. This property specifies three types of user access permissions: [Available UserAccessPermissions]({%slug radpdfprocessing-formats-and-conversion-pdf-settings%}#available-useraccesspermissions)| |**ImageQuality**|Specifies the quality with which images are exported to PDF. More information about how it works is available in [this article]({%slug radpdfprocessing-concepts-imagequality%}).
**.NET Standard** specification does not define APIs for converting images or scaling their quality. That is why to allow the library to export images different than Jpeg and Jpeg2000 or ImageQuality different than High, you will need to provide an implementation of the **JpegImageConverterBase** abstract class. This implementation should be passed to the **JpegImageConverter** property of the **FixedExtensibilityManager**. For more information check the [Cross-Platform Support]({%slug radpdfprocessing-cross-platform%}) help article.| -|**ImageCompression**|Sets the desired compression for the images when exporting. You can set one of the following values of the **ImageFilterTypes**:
**Default**The image compression will be preserved as it is in the original document.
**None**The images won't be encoded.
**FlateDecode**The images will be encoded with a FlateDecode filter. Compresses data using the zlib/deflate compression method.
**DCTDecode** The images will be encoded with a DCTDecode filter. Compresses data using a DCT (discrete cosine transform) technique based on the JPEG standard.
| -|**StreamCompression**|Gets or sets the content stream compression type. Possible Values are:
**None**The content streams won't be encoded.
**FlateDecode**Compresses data using the zlib/deflate compression method.
| -|**ComplianceLevel**|Specifies the PDF/A compliance level. It can have one of the following values:
**None**Specify no compliance level.
**PdfA1B**Specify PDF/A-1b compliance level.
**PdfA2B**Specify PDF/A-2b compliance level.
**PdfA2U**Specify PDF/A-2u compliance level.
**PdfA3B**Specify PDF/A-3b compliance level.
**PdfA3U**Specify PDF/A-3u compliance level.
The default value is __None__. For more information on PDF/A compliance, check the [PDF/A Compliance article]({%slug radpdfprocessing-howto-comply-with-pdfa-standard%}).| +|**ImageCompression**|Sets the desired compression for the images when exporting. You can set one of the following values of the **ImageFilterTypes**:
  • **Default**: The image compression will be preserved as it is in the original document.
  • **None**: The images won't be encoded.
  • **FlateDecode**: The images will be encoded with a FlateDecode filter. Compresses data using the zlib/deflate compression method.
  • **DCTDecode**: The images will be encoded with a DCTDecode filter. Compresses data using a DCT (discrete cosine transform) technique based on the JPEG standard.
| +|**StreamCompression**|Gets or sets the content stream compression type. Possible Values are:
  • **None**: The content streams won't be encoded.
  • **FlateDecode**: Compresses data using the zlib/deflate compression method.
| +|**ComplianceLevel**|Specifies the PDF/A compliance level. It can have one of the following values:
  • **None**: Specify no compliance level.
  • **PdfA1B**: Specify PDF/A-1b compliance level.
  • **PdfA2B**: Specify PDF/A-2b compliance level.
  • **PdfA2U**: Specify PDF/A-2u compliance level.
  • **PdfA3B**: Specify PDF/A-3b compliance level.
  • **PdfA3U**: Specify PDF/A-3u compliance level.
The default value is __None__. For more information on PDF/A compliance, check the [PDF/A Compliance article]({%slug radpdfprocessing-howto-comply-with-pdfa-standard%}).| |**ShouldExportXfa**|Specifies whether the PDF document should export XFA content (if any). Default value: *false*. Introduced in **Q1 2025**.| + +### Available UserAccessPermissions +|UserAccessPermission Type|Description| +|----|----| +|**PrintingPermissionType**|Sets the permissions for document printing. Possible values:
  • **None**: Specify no printing is allowed.
  • **LowResolution**: Specify low resolution (150 dpi) printing is allowed.
  • **HighResolution**: Specify printing on the highest resolution is allowed.
| +|**ChangingPermissionType**|Sets the permissions for making changes to the document. Possible values:
  • **None**: Specify no document changes are allowed.
  • **DocumentAssembly**: Specify inserting, deleting, and rotating page changes are allowed.
  • **FormFieldFillingOrSigning**: Specify filling in form fields and signing existing signature fields changes are allowed.
  • **FormFieldFillingOrSigningAndCommenting**: Specify commenting, filling in form fields, and signing existing signature fields changes are allowed.
  • **AnyExceptExtractingPages**: Specify any changes except extracting pages are allowed.
| +|**CopyingPermissionType**|Sets the permissions for document copying. Possible values:
  • **None**: Specify no copying is allowed.
  • **Copying**: Specify copying is allowed.
  • **TextAccess**: Specify that text access for screen reader devices for copying is allowed.
  • **NumberingFieldsPrecisionLevel**: Represents precision level when updating numbering fields. When the Normal option is selected the fields are updated once, not taking into account if their new values would have shifted the already measured layout. In such cases, the results could be outdated. This is the MS Word-like behavior. If you need more accurate results, use NumberingFieldsPrecisionLevel.High where the fields are updated until their values become more accurate. This precision level is more accurate than NumberingFieldsPrecisionLevel.Normal but requires more resources.
| >important The receiver of a PDF document must have the same fonts that were originally used to create it. If a different font is substituted, its character set, glyph shapes, and metrics may differ from those in the original font. This substitution can produce unexpected and unwanted results, such as lines of text extending into margins or overlapping with graphics. A PDF file can refer by name to fonts that are not embedded in the PDF file. In this case, a PDF consumer can use those fonts if they are available in its environment. This approach suffers from the uncertainties noted above. diff --git a/libraries/radpdfprocessing/formats-and-conversion/plain-text/textformatprovider.md b/libraries/radpdfprocessing/formats-and-conversion/plain-text/textformatprovider.md index f4f25f1e..c22fd52c 100644 --- a/libraries/radpdfprocessing/formats-and-conversion/plain-text/textformatprovider.md +++ b/libraries/radpdfprocessing/formats-and-conversion/plain-text/textformatprovider.md @@ -41,3 +41,5 @@ __Example 1__ shows how to use __TextFormatProvider__ to export __RadFixedDocume * [Plain text]({%slug radpdfprocessing-formats-and-conversion-plain-text-text%}) * [TextFormatProvider Settings]({%slug radpdfprocessing-formats-and-conversion-plain-text-settings%}) * [Timeout Mechanism]({%slug timeout-mechanism-in-dpl%}) +* [Extracting Text from PDF Documents]({%slug extract-text-from-pdf%}) +* [Summarizing the Text Content of PDF Documents using Text Analytics with Azure AI services]({%slug summarize-pdf-content%})
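The encryption-related export settings documented in the `settings.md` hunk above can be sketched as follows. This is a minimal sketch under stated assumptions, not code from the PR: it assumes that `PdfFormatProvider.ExportSettings` is already initialized by the provider, and that `Export` offers an overload taking a document, an output stream, and a timeout. The file names and password values are hypothetical placeholders. Only the property names (`IsEncrypted`, `UserPassword`, `OwnerPassword`, `StripJavaScriptActions`) and the `Import(byte[], TimeSpan)` call shape are taken from the diff.

```csharp
using System;
using System.IO;
using Telerik.Windows.Documents.Fixed.FormatProviders.Pdf;
using Telerik.Windows.Documents.Fixed.Model;

public static class EncryptedExportSketch
{
    public static void Main()
    {
        PdfFormatProvider provider = new PdfFormatProvider();

        // Same import call shape as in the summarize-pdf-content.md sample.
        // "input.pdf" is a hypothetical file name.
        RadFixedDocument document = provider.Import(File.ReadAllBytes("input.pdf"), TimeSpan.FromSeconds(10));

        // Assumption: ExportSettings is non-null by default, so its properties can be set directly.
        // Per the settings table, IsEncrypted is ignored when ComplianceLevel differs from None,
        // so ComplianceLevel is left at its default here.
        provider.ExportSettings.IsEncrypted = true;
        provider.ExportSettings.UserPassword = "user-placeholder";   // hypothetical value
        provider.ExportSettings.OwnerPassword = "owner-placeholder"; // hypothetical value
        provider.ExportSettings.StripJavaScriptActions = true;

        using (Stream output = File.Create("output.pdf")) // hypothetical file name
        {
            // Assumption: an Export(document, stream, timeout) overload is available.
            provider.Export(document, output, TimeSpan.FromSeconds(10));
        }
    }
}
```

The sketch sticks to boolean and string settings on purpose, so it does not depend on the exact namespaces of the enum types (for example the compliance level or font embedding types) that the table mentions.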