Vision Text Recognize Text

The contents of this page are based on the original Firebase Documentation

You can use ML Kit to recognize text in images. ML Kit has both a general-purpose API suitable for recognizing text in images, such as the text of a street sign, and an API optimized for recognizing the text of documents. The general-purpose API has both on-device and cloud-based models. Document text recognition is available only as a cloud-based model. See the overview for a comparison of the cloud and on-device models.

Before you begin

If you want to use the cloud-based model, and you have not upgraded your project to a Blaze plan, do so in the Firebase console. Only Blaze-level projects can use the Cloud Vision APIs.
In the Google Cloud Console, enable the Cloud Vision API:
- Open the Cloud Vision API in the Cloud Console API library.
- Ensure that your Firebase project is selected in the menu at the top of the page.
- If the API is not already enabled, click Enable.
  If you want to use only the on-device model, you can skip this step.

Now you are ready to start recognizing text in images.

Recognize text in images

To recognize text in an image using either an on-device or cloud-based model, run the text recognizer as described below.

Run the text recognizer

To use the on-device model:

var vision:VisionANE = VisionANE.vision;
var textRecognizer:TextRecognizer = vision.onDeviceTextRecognizer();

To use the cloud model:

Use of ML Kit to access Cloud ML functionality is subject to the Google Cloud Platform License Agreement and Service Specific Terms, and billed accordingly. For billing information, see the Firebase Pricing page.

var vision:VisionANE = VisionANE.vision;
var textRecognizer:CloudDocumentRecognizer = vision.cloudTextRecognizer();

// Or, to provide language hints to assist with language detection:
// See https://cloud.google.com/vision/docs/languages for supported languages
var options:CloudTextRecognizerOptions = new CloudTextRecognizerOptions();
options.languageHints = new <String>["en", "hi"];
var textRecognizer:CloudDocumentRecognizer = vision.cloudTextRecognizer(options);

Create a VisionImage object from a BitmapData.

var visionImage:VisionImage = new VisionImage(bmpTextImage.bitmapData);

Then, pass the image to the process() method:

textRecognizer.process(visionImage, function (text:Text, error:TextError):void {
    if (error) {
        // ...
        return;
    }
    // Recognized text
});

Extract text from blocks of recognized text

If the text recognition operation succeeds, it returns a Text object. A Text object contains the full text recognized in the image and zero or more TextBlock objects. Each TextBlock represents a rectangular block of text, which contains zero or more TextLine objects. Each TextLine object contains zero or more TextElement objects, which represent words and word-like entities (dates, numbers, and so on).

For each TextBlock, TextLine, and TextElement object, you can get the text recognized in the region and the bounding coordinates of the region.

For example:

// `text` is the Text object passed to the process() callback
var resultText:String = text.text;
for each (var block:TextBlock in text.blocks) {
    var blockText:String = block.text;
    var blockConfidence:Number = block.confidence;
    var blockLanguages:Vector.<TextRecognizedLanguage> = block.recognizedLanguages;
    var blockCornerPoints:Vector.<Point> = block.cornerPoints;
    var blockFrame:Rectangle = block.frame;
    for each (var line:TextLine in block.lines) {
        var lineText:String = line.text;
        var lineConfidence:Number = line.confidence;
        var lineLanguages:Vector.<TextRecognizedLanguage> = line.recognizedLanguages;
        var lineCornerPoints:Vector.<Point>  = line.cornerPoints;
        var lineFrame:Rectangle = line.frame;
        for each (var element:TextElement in line.elements) {
            var elementText:String = element.text;
            var elementConfidence:Number = element.confidence;
            var elementLanguages:Vector.<TextRecognizedLanguage> = element.recognizedLanguages;
            var elementCornerPoints:Vector.<Point> = element.cornerPoints;
            var elementFrame:Rectangle = element.frame;
        }
    }
}
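
As a rough illustration of putting the bounding frames to use, the sketch below looks up the recognized element under a given point, for example a tap on the displayed image. The helper name findElementAt is hypothetical and not part of the ANE. It assumes the point is expressed in the same coordinate space as the frame values, and imports for the ANE classes are left out because this page does not show their package paths.

```actionscript
import flash.geom.Point;

// Hypothetical helper, not part of the ANE: returns the first recognized
// element whose frame contains the given point, or null if none does.
// Assumption: `point` uses the same coordinate space as the frames.
function findElementAt(text:Text, point:Point):TextElement {
    for each (var block:TextBlock in text.blocks) {
        for each (var line:TextLine in block.lines) {
            for each (var element:TextElement in line.elements) {
                // Guard against a missing frame before hit testing.
                if (element.frame != null && element.frame.containsPoint(point)) {
                    return element;
                }
            }
        }
    }
    return null;
}
```

Inside the process() callback, a call such as findElementAt(text, new Point(120, 48)) would return the element at that position if one was recognized there. The coordinates here are arbitrary.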

Recognize text in images of documents

To recognize the text of a document, configure and run the cloud-based document text recognizer as described below.

Use of ML Kit to access Cloud ML functionality is subject to the Google Cloud Platform License Agreement and Service Specific Terms, and billed accordingly. For billing information, see the Firebase Pricing page.

The document text recognition API, described below, provides an interface that is intended to be more convenient for working with images of documents. However, if you prefer the interface provided by the sparse text API, you can use it instead to scan documents by configuring the cloud text recognizer to use the dense text model.

To use the document text recognition API:

Run the text recognizer

var vision:VisionANE = VisionANE.vision;
var textRecognizer = vision.cloudDocumentTextRecognizer();

// Or, to provide language hints to assist with language detection:
// See https://cloud.google.com/vision/docs/languages for supported languages
var options:CloudDocumentRecognizerOptions = new CloudDocumentRecognizerOptions();
options.languageHints = new <String>["en"];
var textRecognizer = vision.cloudDocumentTextRecognizer(options);

Create a VisionImage object from a BitmapData.

var visionImage:VisionImage = new VisionImage(bmpCloudDocumentImage.bitmapData);

Then, pass the image to the process() method:

textRecognizer.process(visionImage, function (document:DocumentText, error:TextError):void {
    if (error) {
        // ...
        return;
    }
    // Recognized text
});

Extract text from blocks of recognized text

If the text recognition operation succeeds, it will return a DocumentText object. A DocumentText object contains the full text recognized in the image and a hierarchy of objects that reflect the structure of the recognized document:

DocumentTextBlock
DocumentTextParagraph
DocumentTextWord
DocumentTextSymbol

For each DocumentTextBlock, DocumentTextParagraph, DocumentTextWord, and DocumentTextSymbol object, you can get the text recognized in the region and the bounding coordinates of the region.

For example:

// `document` is the DocumentText object passed to the process() callback
var resultText:String = document.text;
for each (var block:DocumentTextBlock in document.blocks) {
    var blockText:String = block.text;
    var blockConfidence:Number = block.confidence;
    var blockRecognizedLanguages:Vector.<TextRecognizedLanguage> = block.recognizedLanguages;
    var blockBreak:TextRecognizedBreak = block.recognizedBreak;
    var blockCornerPoints:Vector.<Point> = block.cornerPoints;
    var blockFrame:Rectangle = block.frame;
    for each (var paragraph:DocumentTextParagraph in block.paragraphs) {
        var paragraphText:String = paragraph.text;
        var paragraphConfidence:Number = paragraph.confidence;
        var paragraphRecognizedLanguages:Vector.<TextRecognizedLanguage> = paragraph.recognizedLanguages;
        var paragraphBreak:TextRecognizedBreak = paragraph.recognizedBreak;
        var paragraphCornerPoints:Vector.<Point> = paragraph.cornerPoints;
        var paragraphFrame:Rectangle = paragraph.frame;
        for each (var word:DocumentTextWord in paragraph.words) {
            var wordText:String = word.text;
            var wordConfidence:Number = word.confidence;
            var wordRecognizedLanguages:Vector.<TextRecognizedLanguage> = word.recognizedLanguages;
            var wordBreak:TextRecognizedBreak = word.recognizedBreak;
            var wordCornerPoints:Vector.<Point> = word.cornerPoints;
            var wordFrame:Rectangle = word.frame;
            for each (var symbol:DocumentTextSymbol in word.symbols) {
                var symbolText:String = symbol.text;
                var symbolConfidence:Number = symbol.confidence;
                var symbolRecognizedLanguages:Vector.<TextRecognizedLanguage> = symbol.recognizedLanguages;
                var symbolBreak:TextRecognizedBreak = symbol.recognizedBreak;
                var symbolCornerPoints:Vector.<Point> = symbol.cornerPoints;
                var symbolFrame:Rectangle = symbol.frame;
            }
        }
    }
}
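
One possible use of the per-word confidence values is to flag words that may need manual review. The sketch below walks the block, paragraph, and word levels shown above and collects the text of every word whose confidence is below a caller-supplied threshold. The helper name lowConfidenceWords and the threshold are hypothetical, and imports for the ANE classes are left out because this page does not show their package paths.

```actionscript
// Hypothetical helper, not part of the ANE: collects the text of every
// recognized word whose confidence is below `threshold`.
function lowConfidenceWords(document:DocumentText, threshold:Number):Vector.<String> {
    var result:Vector.<String> = new Vector.<String>();
    for each (var block:DocumentTextBlock in document.blocks) {
        for each (var paragraph:DocumentTextParagraph in block.paragraphs) {
            for each (var word:DocumentTextWord in paragraph.words) {
                // A NaN confidence compares as false and is skipped.
                if (word.confidence < threshold) {
                    result.push(word.text);
                }
            }
        }
    }
    return result;
}
```

Inside the process() callback, lowConfidenceWords(document, 0.8) would return the words to double-check. The value 0.8 is an arbitrary example and not a recommended setting.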

Portions of this page are modifications based on work created and shared by Google and used according to terms described in the Creative Commons 3.0 Attribution License.
