The Document Understanding (DU) platform now lets you choose the required pipeline during project setup. This helps you optimize hardware requirements to match customer needs and thereby offer competitive pricing.
DU includes two active pipelines, VESPA and SmartExtract. Both can extract Insights and Table fields of interest from documents.
However, for best results, the recommended usage is as follows.
Customers dealing with form-type documents that contain similar information, structured or semi-structured data, labeled values, or tables should use the SmartExtract pipeline. (Document type examples: Invoices, Purchase Orders)
Customers requiring long text extraction from documents with paragraphs and sections should use the VESPA pipeline. (Document type examples: Contracts, Legal documents)
Customers dealing with both types of documents should use a combination of the VESPA and SmartExtract pipelines.
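The recommendation above can be read as a simple mapping from document kind to pipeline. The following is a minimal sketch of that mapping only; `DocumentKind` and `recommend_pipelines` are hypothetical names used for illustration and are not part of the DU product or its API.

```python
from enum import Enum


class DocumentKind(Enum):
    """Hypothetical document categories, taken from the guidance above."""
    FORM = "form"            # e.g. invoices, purchase orders
    LONG_TEXT = "long_text"  # e.g. contracts, legal documents


# Recommended pipeline per document kind, as stated in the release notes.
RECOMMENDED = {
    DocumentKind.FORM: "SmartExtract",
    DocumentKind.LONG_TEXT: "VESPA",
}


def recommend_pipelines(kinds):
    """Return the set of pipelines to deploy for the given document kinds."""
    return {RECOMMENDED[kind] for kind in set(kinds)}
```

A customer with only form-type documents gets `{"SmartExtract"}`, while a customer with both kinds gets both pipelines, which is the combination case described above.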
The screenshot highlights the "Extraction Pipeline selection" section in DU.
<To be added>
<To be added>
The VESPA Field Configurator feature, introduced in release 7.1, enables users to add new fields or modify existing fields depending on customer requirements. It has been enhanced with a provision for custom pre-processors, which makes it more flexible.
Pre-processors help fine-tune the extraction process and thereby yield better accuracy. This feature allows you to view the available pre-processors and to modify them or add new ones to suit the specific requirements of the customer. It applies to the VESPA pipeline.
For the feature list as of 7.2, click here.
| SL. No | Area | Product Feature | Feature Description | Release Version |
|---|---|---|---|---|
| 1 | Document Intake | Reads images & PDF documents | Reads TIFF and PNG image formats and PDF documents | 6.4.3 |
| 2 | Document Intake | Support direct upload | Supports direct upload of files, emails, FTP | 6.4.3 |
| 3 | Prepare Documents | Digitize documents | Uses Tesseract for OCR | 6.4.3 |
| 4 | Document Pre-processing | Orientation Correction | Identifies and corrects orientation issues in the document for better accuracy in text extraction. | 6.4.3 |
| 5 | Document Pre-processing | Grayscale | Converts the documents to grayscale. A grayscale image is monochrome, composed exclusively of shades of gray. The contrast ranges from black at the weakest intensity to white at the strongest. | 6.4.3 |
| 6 | Document Pre-processing | Binarization | Converts the document to binary. A binary image consists of pixels that can have exactly one of two colors, black or white. The process is based on the "Threshold" parameter. Valid values are in the range 1 to 255; any pixel above the threshold value is converted to white and all others to black. | 6.4.3 |
| 7 | Document Pre-processing | External REST Call | Invokes a REST API service to process the document. | 6.4.3 |
| 8 | Document Pre-processing | Handle secure PDFs | Ability to handle secure PDF documents | 7.0 |
| 9 | Document Recognition | Splitting of bundled documents | Ability to extract individual invoices from zip files | 6.4.3 |
| 10 | Document Recognition | Document Split - Scan page | Ability to split invoices based on the scan page | 6.4.3 |
| 11 | Document Recognition | Document Split - Page numbers | Ability to split invoices based on page numbers | 6.4.3 |
| 12 | Document Recognition | Language Classifier | AI model to classify English vs non-English documents | 7.1 |
| 13 | Document Recognition | CMS Classifier | AI model to classify MSA, SOW & Addendum documents | 7.0 |
| 14 | Document Extraction | Header fields | Ability to extract data from header fields | 6.4.3 |
| 15 | Document Extraction | Tables - Single & Multi-page | Ability to extract data from single & multi-page tables. | 7.0 |
| 16 | Document Extraction | Long Text & formatted fields | Ability to extract long text & formatted fields | 7.2 |
| 19 | Document Extraction | Document Types | Extraction of Retail Invoices, SOW, MSA, Addendum, Annual Reports, and PAN card is supported | 7.0 |
| 20 | Document Post-processing | Document Linking (available for CMS) | Ability to link MSA, SOW & Addendum documents | 7.0 |
| 21 | Document Post-processing | Exploratory Search (available for CMS) | Ability to search across linked documents | 7.0 |
| 22 | View | View Extracted information | GUI to review the extracted information (inclusive of candidates wherever available) | 6.4.3 |
| 27 | Configure & Train | Pipeline configuration | Flexibility to choose extraction engine - VESPA/SmartExtract | 7.2 |
| 28 | Configure & Train | Update FOI configurations | Provision to modify the config file associated with an FOI | 7.1 |
| 29 | Configure & Train | Configure new FOI | Provision to add new FOIs for data extraction by providing the appropriate configurations. | 7.1 |
| 30 | Configure & Train | Configure new document type | Provision to add a new document type and corresponding FOIs by providing the appropriate configurations. | 7.1 |
| 31 | Configure & Train | Configure new pre-processors | Provision to attach appropriate pre-processors to support data extraction | 7.2 |
| 32 | Configure & Train | Update pre-processors | Provision to update pre-processors to support data extraction | 7.2 |
| 33 | Configure & Train | Feedback based learning | Ability to utilize EITL feedback for training the model and to use the trained model for predictions | 7.2 |
| 34 | Export | Excel download | Excel based download for extracted fields | 7.0 |
| 35 | NFR | Data Archival | Archive data based on the retention period | 7.0 |
|  | NFR | Queue management | Provision to manage document extraction from multiple channels via routing keys | 7.0 |
| 36 | NFR | Optimal hardware configuration | DU hardware requirements can be optimized by deploying only the required pipelines (VESPA or SmartExtract). This helps to offer competitive pricing to customers. | 7.2 |
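The Grayscale and Binarization pre-processing steps in the table above can be illustrated with a small, dependency-free sketch. It assumes an image represented as nested lists of pixel values and uses the common ITU-R BT.601 luma weights for the grayscale step; DU's actual implementation is not described in these notes, so the function names and the weighting are assumptions. Only the threshold rule (valid range 1 to 255, pixels above the threshold become white, all others black) comes from the feature description.

```python
def to_grayscale(rgb_rows):
    """Convert rows of (r, g, b) tuples to rows of gray intensities (0-255).

    Uses BT.601 luma weights, which is an assumption for illustration.
    """
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_rows
    ]


def binarize(gray_rows, threshold):
    """Map each gray pixel to white (255) if above the threshold, else black (0)."""
    if not 1 <= threshold <= 255:
        raise ValueError("threshold must be within 1-255")
    return [
        [255 if pixel > threshold else 0 for pixel in row]
        for row in gray_rows
    ]
```

For example, `binarize([[0, 127, 128, 255]], 127)` returns `[[0, 0, 255, 255]]`: a pixel equal to the threshold is not above it, so it becomes black.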
EITL model trained with Navistar production invoices.
<To be updated>
With improved DU capabilities, we now cater to document types beyond invoices, so there is a compelling need for a generic PWF that can support multiple types of documents.
SmartVision PWF is introduced to offer basic review and approve capability for any document type.
Note: SmartVision PWF is a generalized version of the Invoice Extraction (IE) PWF and includes all the functionality currently available in the IE PWF.
Personas – SVision Supervisor, SVision Reviewer, and Installation Engineer.
Document Types – Any (as set in the attached DU project)
Functionalities:
Ability to set up multiple PWF projects to support different document types
Document listing dashboard that provides the operational summary view, multiple search options & time zone based display
Automated document allocation to specific list of users
Document review & correction
Exception processing workflows – Reject/Escalate/New Invoices/Delete
Document export based on pre-defined schedule
Advanced Reporting & Analytics
Additionally, the non-functional requirements supported today will continue to be supported with SmartVision PWF.
For the feature list as of 7.2, click here.
| SL. No | Area | Product Feature | Feature Description | Release Version |
|---|---|---|---|---|
| 1 | PWF Project Creation | Document Import Configuration | Provision to configure inbound FTP location & execution schedule | 7.2 |
| 2 | PWF Project Creation | Auto-allocation Configuration | Provision to configure | 7.2 |
| 3 | PWF Project Creation | Auto-allocation Configuration | Provision to configure users to participate in auto allocation | 7.2 |
| 4 | PWF Project Creation | Export Configuration | Provision to configure outbound FTP location & execution schedule | 7.2 |
| 5 | PWF Project Creation | FOI configuration | 1. Option to choose from available DU projects in same region | 7.2 |
| 6 | PWF Project Creation | Configure FOIs for manual data entry | Provision to configure additional fields for manual data entry. (Data extraction not supported) | 7.2 |
| 7 | PWF Project Creation | Configure tables for manual data entry | Provision to configure additional tables for manual data entry. (Data extraction not supported) | 7.2 |
| 8 | PWF Project Creation | Field Validations | Provision to configure basic field validations (data type validations) for DU extracted and manual fields | 7.2 |
| 9 | PWF Project Creation | Field Transformations | Provision to subset/transform extracted data | 7.2 |
| 10 | PWF Project Creation | External Service Configuration | Provision to configure an external service to enable business validations from the preview screen (e.g. validations against master data, multi-field validations) | 7.2 |
| 11 | PWF Project Creation | Personas | Provision to configure personas | 7.2 |
| 12 | Document Listing Dashboard | Document Summary View | 1. Cards to display the count of Documents by status | 7.2 |
| 13 | Document Listing Dashboard | Document Listing by batches | Separate tabs available for each Card | 7.2 |
| 14 | Document Listing Dashboard | User Details View (Collapsible) | Quick view of Document distribution at user level, available for AP Supervisor | 7.2 |
| 15 | Document Listing Dashboard | Filter by User | Provision to filter for unassigned Documents/specific user from User view. | 7.2 |
| 16 | Document Listing Dashboard | Filter by Datetime | Provision to filter the Documents by batch run date & time. | 7.2 |
| 17 | Document Listing Dashboard | Search Documents | Provision to search Documents by full or partial file name (Contains) | 7.2 |
| 18 | Document Listing Dashboard | Multi-select | Provision to select multiple Documents for Assign/Delete operations | 7.2 |
| 19 | Document Listing Dashboard | User time zone based display | User time zone based display in Dashboard and other screens | 7.2 |
| 20 | Document Allocation | Auto-allocation | Provision to automatically distribute Documents to AP Clerks. | 7.2 |
| 21 | Document Allocation | Assign Documents | Provision to assign Documents to AP Clerks | 7.2 |
| 22 | Document Review & Correction | Review Documents | Provision to view | 7.2 |
| 23 | Document Review & Correction | Split & Original Document view | Provision to view Original & split Document files | 7.2 |
| 24 | Document Review & Correction | Document navigation | 1. Provision to navigate across Documents within a batch | 7.2 |
| 25 | Document Review & Correction | PDF Highlighting | Highlights the value of the selected field in the previewed document for reference | 7.2 |
| 26 | Document Review & Correction | Auto scroll toggle option | Provision to turn on/off auto scroll | 7.2 |
| 27 | Document Review & Correction | Line item dock options | Ability to dock the line item section at the bottom or to the right | 7.2 |
| 28 | Document Review & Correction | Split correction | Ability to correct the split PDF in case of errors; the user can specify individual page numbers, a page range, or a combination of these. | 7.2 |
| 29 | Document Review & Correction | Candidate display | 1. Provide value suggestions for every extracted field based on model prediction | 7.2 |
| 30 | Document Review & Correction | Edit Document data | 1. Ability to | 7.2 |
| 31 | Document Review & Correction | Identify Duplicate Invoices | Provision to identify duplicate Invoices (Invoice number & Vendor code combination) | 7.2 |
| 32 | Document Review & Correction | Tabular Data Extraction | Improvements to tabular data extraction using SmartExtract engine | 7.2 |
| 33 | Document Review & Correction | Table/Column Rebounding | Provision to extract tabular data based on user inputs. User can | 7.2 |
| 34 | Document Review & Correction | Point & Crop feature | Point & Crop feature to copy data from the PDF in one click and thereby eliminate manual key-in by the user. Applicable for all types of documents under the Finance domain. | 7.2 |
| 35 | Document Review & Correction | Save Document | Provision to save work in progress | 7.2 |
| 36 | Document Review & Correction | Shelve Document data | 1. Provision to save multiple versions for work in progress | 7.2 |
| 37 | Document Review & Correction | Approve Documents | Ability to save & approve reviewed Documents | 7.2 |
| 38 | Exception Processing | Add New Documents | Ability to split a Document record from the original file and enter Document details manually. | 7.2 |
| 39 | Exception Processing | Reject Documents | Ability to reject Documents and attach a reject reason | 7.2 |
| 40 | Exception Processing | Escalate Documents | Ability to escalate Documents and attach an escalate reason | 7.2 |
| 41 | Exception Processing | Delete Documents | 1. Provision to delete incorrect Documents from Waiting for Approval, Approved, Rejected, Escalated tabs. | 7.2 |
| 42 | Export Documents | Send Documents for downstream processing | 1. Ability to send completed batches for downstream processing. A batch is sent when all the Documents within the batch are approved or rejected. | 7.2 |
| 43 | Export Documents | Preview option for Documents ready for export | Ability to preview the Documents ready for export from custom layer | 7.2 |
| 44 | Reporting | Advanced Search for Reporting | Provision to filter batches/Documents based on the status, datetime, user. | 7.2 |
| 45 | Reporting | Excel Download | 1. Provision to download the Documents (header & line item data) from each card | 7.2 |
| 46 | Reporting | APIs for custom reporting | APIs from Advanced Search & Accuracy Analytics sections | 7.2 |
| 47 | Advanced Analytics | Accuracy Analytics | 1. Display accuracy indicator in listing & review screens for approved Documents. | 7.2 |
| 48 | Advanced Analytics | Time Analytics | 1. Displays average review time per Document and average extraction time taken per Document. | 7.2 |
| 49 | Advanced Analytics | Filter by date | Provision to view the analytics data for a given duration | 7.2 |
| 50 | NFR | Audit Trail | Maintain audit trail at Document level for the state transitions. | 7.2 |
| 51 | NFR | Provision to plug in multiple data sources | Provision to override DU extracted data from other data sources. | 7.2 |
| 52 | NFR | Data Archival Framework | 1. Purge & Archival mechanism to systematically remove processed Documents | 7.2 |
| 53 | NFR | E2E Traceability View | Runlist page displays all the processes from FTP file intake to export of Documents. | 7.2 |
| 54 | NFR | EITL Model Training | Ability to leverage user feedback for training the prediction model and to use the trained model for improved predictions. | 7.2 |
| 55 | NFR | Document lock-out | 1. Lock the Document while user is editing in Document preview screen. | 7.2 |
| 56 | NFR | Application time-out | Graceful exit when user is inactive | 7.2 |
| 57 | NFR | UI Mono repo | Single UI repository for Document PWF enabling re-use | 7.2 |
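The Split correction feature listed above accepts individual page numbers, a page range, or a combination of both. As a rough illustration of how such input could be interpreted, the sketch below parses a comma-separated specification such as `1, 3-5, 8`. The input syntax, the function name `parse_pages`, and the validation rules are assumptions made for this example; the notes do not describe the format the PWF actually accepts.

```python
def parse_pages(spec, page_count):
    """Expand a page spec like "1, 3-5, 8" into a sorted list of page numbers.

    Pages are 1-based. Raises ValueError for pages outside 1..page_count
    or for ranges written backwards (e.g. "5-3").
    """
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if start < 1 or end > page_count or start > end:
            raise ValueError(f"invalid page selection: {part!r}")
        pages.update(range(start, end + 1))
    return sorted(pages)
```

For a 10-page file, `parse_pages("1, 3-5, 8", 10)` returns `[1, 3, 4, 5, 8]`. Rejecting out-of-range input early keeps a mistyped range from silently producing a wrong split.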
SmartVision PWF supports automated distribution of invoices. You can configure distribution rules so that invoices are allocated to users automatically and alerts are sent to the Supervisor user. Users can be added to or removed from the auto-allocation process depending on business needs.
SmartVision PWF can persist custom layer data in the PWF and use it in conjunction with system-extracted data.
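As an illustration of automated distribution, the sketch below allocates documents to the participating users in round-robin order. Round-robin is an assumption chosen for simplicity; these notes do not state which distribution rule SmartVision PWF applies, and `allocate` is a hypothetical helper, not a product API.

```python
from itertools import cycle


def allocate(documents, users):
    """Distribute documents across users in round-robin order.

    Returns a dict mapping each user to the list of documents assigned.
    """
    if not users:
        raise ValueError("at least one user must participate in auto allocation")
    assignment = {user: [] for user in users}
    # cycle() repeats the user list so every document gets the next user in turn.
    for document, user in zip(documents, cycle(users)):
        assignment[user].append(document)
    return assignment
```

Adding or removing a user from auto allocation then amounts to changing the `users` list passed in before the next run.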
The SmartVision Review screen includes the following enhancements.
Duplicate invoices are identified systematically and the reviewer is notified during the review process. This helps the reviewer reject such invoices promptly.
The Point & Crop feature is extended to newly added and split invoices.
The display sequence of Insights & Line Items is re-ordered to help the reviewer focus on the most commonly entered fields.
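The feature list identifies duplicate invoices by the combination of invoice number and vendor code. A minimal sketch of such a check follows. The dictionary keys (`id`, `invoice_number`, `vendor_code`) and the whitespace and case normalization are assumptions for illustration; the actual matching logic in SmartVision PWF is not described here.

```python
from collections import defaultdict


def find_duplicates(invoices):
    """Group invoice ids that share the same (invoice number, vendor code) key.

    Returns only the groups that contain more than one invoice.
    """
    groups = defaultdict(list)
    for invoice in invoices:
        # Normalizing case and surrounding whitespace is an assumption.
        key = (
            invoice["invoice_number"].strip().upper(),
            invoice["vendor_code"].strip().upper(),
        )
        groups[key].append(invoice["id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}
```

Two invoices with the same number from different vendors are not flagged, because the key includes the vendor code.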