NOTE: This guide is intended for engineers extending smart-docs-parser for their custom use case. If you think your extension could benefit the community, please contribute it to the smart-docs-parser source code using the generic guide here.
smart-docs-parser accepts a custom OCR module as an API argument. The module must implement a function named extractDocumentText, which takes a document image URL and an API key as input and returns the extracted text as a list of string lines.
Add your OCR API key (if any) to the configuration:
"smart-docs-parser": {
  "api_keys": {
    "your-library-name": "YourAPIKEY"
  }
}
// ES6 import statement
import SmartDocuments from 'smart-docs-parser';
import CustomOCR from '/path/to/ocr_file';
// Sample Request
const extractedDocumentDetails = await SmartDocuments.extractDocumentDetailsFromImage({
  document_url: 'https://avatars2.githubusercontent.com/u/20634933?s=40&v=4',
  document_type: 'PAN_CARD',
  ocr_library: 'your-library-name',
  custom_ocr: CustomOCR
});
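To try the sample request before wiring up a real OCR service, CustomOCR could be a stub module like the one below. This is a minimal sketch: the type names OcrInput, OcrOutput and CustomOcrModule, the name StubOCR, and the hard-coded lines are all hypothetical. Only the extractDocumentText name and the { document_url, api_key } / { raw_text } shapes come from this guide.

```typescript
// Hypothetical type names; the field names follow this guide's contract.
export interface OcrInput {
  document_url: string;
  api_key: string;
}

export interface OcrOutput {
  raw_text: string[];
}

export interface CustomOcrModule {
  extractDocumentText: (input: OcrInput) => OcrOutput;
}

// Stub: ignores the image URL and API key and returns fixed, made-up lines.
// Useful only for checking that the module is wired in correctly.
const extractDocumentText = (_input: OcrInput): OcrOutput => ({
  raw_text: ['SAMPLE LINE 1', 'SAMPLE LINE 2'],
});

const StubOCR: CustomOcrModule = { extractDocumentText };
export default StubOCR;
```

Once the request and response wiring works, the stub can be replaced by a module that calls your actual OCR service.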
Your OCR module should expose a function called extractDocumentText:
// ******************************************************* //
// Logic for API handlers starts here //
// ******************************************************* //
const extractDocumentText = ({ document_url: documentURL, api_key: apiKey }) => {
  // ....
  // ....
  // rawText must be an Array<string>, one entry per extracted line
  return {
    raw_text: rawText
  };
};
// ******************************************************* //
// Logic for API handlers ends here //
// ******************************************************* //
export default { extractDocumentText };
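Some OCR services return the recognised text as a single string rather than a list of lines. If yours does, a helper along these lines could build raw_text before returning it. The name toRawTextLines is hypothetical, and trimming whitespace and dropping blank lines are assumptions that may not suit every document type.

```typescript
// Hypothetical helper: converts one OCR text blob into the
// Array<string> expected in `raw_text`.
export const toRawTextLines = (ocrText: string): string[] =>
  ocrText
    .split(/\r?\n/) // handle both Unix (\n) and Windows (\r\n) line endings
    .map((line) => line.trim()) // strip leading and trailing whitespace
    .filter((line) => line.length > 0); // drop lines that are now empty

// Possible use at the end of extractDocumentText, with the OCR call elided:
// return { raw_text: toRawTextLines(rawText) };
```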
Your OCR function should accept { document_url: string, api_key: string } as input and return { raw_text: Array<string> } as output, where each string is one line of the extracted text.
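Because the OCR module is passed in at runtime, a type guard can catch a malformed return value early, for example in your own unit tests. This is a minimal sketch; isOcrOutput is a hypothetical name and is not part of smart-docs-parser.

```typescript
// Shape this guide asks extractDocumentText to return.
export interface OcrOutput {
  raw_text: string[];
}

// Hypothetical runtime check that a value matches { raw_text: Array<string> }.
export function isOcrOutput(value: unknown): value is OcrOutput {
  if (typeof value !== 'object' || value === null) {
    return false;
  }
  const rawText = (value as { raw_text?: unknown }).raw_text;
  // raw_text must be an array whose entries are all strings
  return Array.isArray(rawText) && rawText.every((line) => typeof line === 'string');
}
```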
Please refer to sample_ocr.ts for an example.