Document Understanding (DU) includes the following features and enhancements:
In DU, users can choose any one of the available OCR engines based on customer needs. The default OCR engine is Tesseract; the other options are OCRmyPDF and Azure OCR. Additional OCR engines must be configured during deployment.
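The selection rule above can be sketched in code. This is a minimal illustration, not DU's actual configuration API: the `OcrEngine` enum, the `resolve_ocr_engine` function, and the deployment-time set of enabled engines are all hypothetical names introduced for the example. Only the engine names and the Tesseract default come from the text.

```python
from enum import Enum
from typing import Optional, Set


class OcrEngine(Enum):
    """OCR engines named in this document (identifiers are hypothetical)."""
    TESSERACT = "tesseract"
    OCRMYPDF = "ocrmypdf"
    AZURE_OCR = "azure_ocr"


# Tesseract is the documented default engine.
DEFAULT_ENGINE = OcrEngine.TESSERACT


def resolve_ocr_engine(requested: Optional[str], enabled: Set[OcrEngine]) -> OcrEngine:
    """Return the engine to use for a project.

    Falls back to the default when nothing is requested, and rejects
    additional engines that were not configured during deployment
    (assumed here to be passed in as the `enabled` set).
    """
    if requested is None:
        return DEFAULT_ENGINE
    engine = OcrEngine(requested)  # raises ValueError for unknown names
    if engine is not DEFAULT_ENGINE and engine not in enabled:
        raise ValueError(f"{engine.value} was not configured during deployment")
    return engine
```

Under these assumptions, a project that requests no engine gets Tesseract, while a request for Azure OCR succeeds only if that engine was enabled when DU was deployed.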
To create a project with the required OCR engine, follow these steps.
Log in to DU.
To create a new project, click Create New Project. This displays the Create Project screen, as shown in the figure.
Enter the basic details of the project in the Project Details tab:
Enter the name of the project in the Project Name field.
Enter a brief description in the Description field.
Select the Document Type as required.
Click Save to save the Select Type configurations.
Select the required classifier from the Select Classifiers block. Available options are CMS Document Classifier and Language Classifier.
Click Save to save the Select Classifiers configurations.
Select the required option from the OCR Configurations block, as shown in the figure.
Click Save to save the OCR configuration.
The scalable DU architecture broadly comprises four layers: AI Components, DU, PWF, and SmartVision.
AI Components - Train, test, and infer AI components. Includes Classification, Extraction, OCR, and Image/Document Enrichment/Correction.
DU - Reduce pipeline to allow DU to get insights for any specific document type. Seamless integration with AI Components and the global model library.
PWF - Document ingestion, document enrichment, and classification are handled by the workflow. Supports customer-specific configurations, pre-processors, and post-processors. Allows seamless integration with AI Components, DU, and the pre-built insights workflow.
SmartVision - Create a business solution using a combination of workflows, e.g. Claims Settlement, Loan Processing, Exploratory Search, Document Linking, etc.
This layering enables timely customer onboarding and allows other teams to contribute post-processors, classifiers, extractors, etc.
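One way to picture how other teams could contribute pre-processors and post-processors is a small plug-in registry around a core workflow step. The sketch below is purely illustrative and does not reflect the actual PWF interfaces: `register`, `run_workflow`, the `"pre"`/`"post"` stage names, and the toy processors are all hypothetical.

```python
from typing import Callable, Dict, List

# A processor takes a document (here a plain dict) and returns an updated one.
Processor = Callable[[dict], dict]

# Hypothetical registry of contributed processors, keyed by stage.
_REGISTRY: Dict[str, List[Processor]] = {"pre": [], "post": []}


def register(stage: str) -> Callable[[Processor], Processor]:
    """Decorator that adds a function to the 'pre' or 'post' stage."""
    if stage not in _REGISTRY:
        raise ValueError(f"unknown stage: {stage}")

    def decorator(func: Processor) -> Processor:
        _REGISTRY[stage].append(func)
        return func

    return decorator


def run_workflow(document: dict, core_step: Processor) -> dict:
    """Run pre-processors, then the core step, then post-processors."""
    for processor in _REGISTRY["pre"]:
        document = processor(document)
    document = core_step(document)
    for processor in _REGISTRY["post"]:
        document = processor(document)
    return document


@register("pre")
def strip_whitespace(document: dict) -> dict:
    # Example contributed pre-processor: tidy the raw text.
    return {**document, "text": document["text"].strip()}


@register("post")
def uppercase_label(document: dict) -> dict:
    # Example contributed post-processor: normalize the label.
    return {**document, "label": document["label"].upper()}


def toy_classifier(document: dict) -> dict:
    # Stand-in for a real classifier; labels anything mentioning "claim".
    label = "claim" if "claim" in document["text"].lower() else "other"
    return {**document, "label": label}


result = run_workflow({"text": "  Claim form 12  "}, toy_classifier)
```

The design choice illustrated here is that the workflow owns the ordering of the stages, while each contributed processor only needs to agree on the document shape.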
Going forward, DU will display the following detailed messages when a project is not ready:
• Document Type Classifier is not selected
• Field configuration is not complete
• Incomplete Project Configuration
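The readiness check behind these messages could look roughly like the sketch below. The message strings are taken from the list above. Everything else is an assumption made for illustration: the `ProjectConfig` class, its field names, and especially the conditions that trigger each message, which the text does not specify.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ProjectConfig:
    """Hypothetical project settings; field names are illustrative only."""
    name: str
    document_type_classifier: Optional[str] = None
    # Maps a field name to True once that field is fully configured.
    fields_configured: Dict[str, bool] = field(default_factory=dict)
    ocr_engine: Optional[str] = "Tesseract"


def readiness_messages(project: ProjectConfig) -> List[str]:
    """Return the messages to display; an empty list means the project is ready.

    The trigger conditions below are assumptions, not documented DU behavior.
    """
    messages: List[str] = []
    if not project.document_type_classifier:
        messages.append("Document Type Classifier is not selected")
    if not project.fields_configured or not all(project.fields_configured.values()):
        messages.append("Field configuration is not complete")
    if not project.name or not project.ocr_engine:
        messages.append("Incomplete Project Configuration")
    return messages
```

Returning a list of messages, rather than a single ready/not-ready flag, matches the intent of showing the user in detail what is still missing.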