Document classification is critical to most business processes, and organizations continuously look for ways to handle it systematically.
Document Classifiers enable the system to accurately identify the Document Type of each document. The following Document Classifiers are introduced in Smart Vision 2.4:
Document Type Classifier
Zero shot document classifier: Follows a natural language inference-based approach.
Keyword/Regex based classifier: Classification based on keywords or expressions provided by the user.
CMS classifier: For classifying MSA, SOW, and Addendums. (Enhancements to existing feature)
Hand Written Classifier: Systematically segregates handwritten documents from printed documents.
The Zero shot document classifier follows a natural language inference-based zero-shot text classification approach.
To create a project for the Zero shot document classifier, perform the following:
Create a DU Project.
Navigate to the Preprocessor tab.
Select Zero Shot Classifier as the Document Classifier, as shown in the figure.
Note: Keywords can be defined for the Zero shot document classifier, if required.
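The natural language inference approach can be illustrated with a minimal sketch. It is not the Smart Vision implementation: the function names, the hypothesis template, and the toy scorer are all assumptions made for illustration. Each candidate document type is turned into a hypothesis sentence, an entailment scorer rates how well the document text supports it, and a softmax converts the scores into probabilities. In practice the scorer would be a trained NLI model; here a word-overlap stand-in keeps the example self-contained.

```python
import math
from typing import Callable, Dict, List

# Hypothetical interface: (premise, hypothesis) -> entailment score, higher = more entailed.
EntailmentScorer = Callable[[str, str], float]


def zero_shot_classify(
    text: str,
    labels: List[str],
    score: EntailmentScorer,
    template: str = "This document is of type {}.",  # assumed hypothesis template
) -> Dict[str, float]:
    """Return a probability per candidate label via softmax over entailment scores."""
    if not labels:
        raise ValueError("at least one candidate label is required")
    raw = [score(text, template.format(label)) for label in labels]
    peak = max(raw)  # subtract the maximum for numerical stability
    exps = [math.exp(r - peak) for r in raw]
    total = sum(exps)
    return {label: e / total for label, e in zip(labels, exps)}


def toy_scorer(premise: str, hypothesis: str) -> float:
    """Stand-in for a real NLI model: counts hypothesis words present in the premise."""
    premise_words = set(premise.lower().split())
    return float(sum(w.strip(".") in premise_words for w in hypothesis.lower().split()))


probs = zero_shot_classify(
    "Invoice number 1042, total amount due 300 USD",
    ["invoice", "purchase order"],
    toy_scorer,
)
best = max(probs, key=probs.get)
print(best, round(probs[best], 3))
```

Because the scorer is passed in as a parameter, the toy function can be swapped for a real entailment model without changing the classification logic.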
The Keyword/Regex based classifier classifies documents using the keywords or regular expressions entered by the user.
To create a project for the Keyword/Regex document classifier, perform the following:
Create a DU Project.
Navigate to the Preprocessor tab.
Select Keyword/REGEX Based Classifier as the Document Classifier, as shown in the figure.
Type the keywords/Regex for each of the selected document types. These keywords are used as input for document classification.
Note: You may provide multiple keywords using "|" (pipe separator).
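How pipe-separated keywords can drive classification is sketched below. This is an illustration under stated assumptions, not the product's actual logic: the document types, keywords, and the "most hits wins" rule are hypothetical. It relies on the fact that "|" is the alternation operator in regular expressions, so a pipe-separated keyword list is already a valid pattern.

```python
import re
from typing import Dict, Optional

# Hypothetical configuration: document type -> keywords/regex separated by "|".
KEYWORDS: Dict[str, str] = {
    "Invoice": r"invoice|bill to|amount due",
    "Purchase Order": r"purchase order|PO\s*number",
}


def classify_by_keywords(text: str, keywords: Dict[str, str]) -> Optional[str]:
    """Return the document type with the most keyword hits, or None if nothing matches."""
    best_type, best_hits = None, 0
    for doc_type, pattern in keywords.items():
        # Case-insensitive matching is an assumption of this sketch.
        hits = len(re.findall(pattern, text, flags=re.IGNORECASE))
        if hits > best_hits:
            best_type, best_hits = doc_type, hits
    return best_type


sample = "INVOICE\nBill To: Acme Corp\nAmount due: 300 USD"
print(classify_by_keywords(sample, KEYWORDS))
```

Counting hits rather than stopping at the first match lets a document that mentions several types fall into the one it matches most often; a different tie-breaking rule may suit other document sets.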
The Keyword based classifier also supports Spanish-language documents.
To enable the Keyword Classifier for Spanish invoices, perform the following:
Create a DU Project.
Navigate to the Preprocessor tab.
Enter the new document type for Spanish.
Select Keyword/REGEX Based Classifier as the Document Classifier, as shown in the figure.
Type the English and Spanish keywords/Regex. These keywords are used as input for document classification.
Note: You may provide multiple keywords using "|" (pipe separator).
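A separate Spanish document type with its own keywords might behave as in the following sketch. The document type names and keywords are hypothetical examples, and case-insensitive matching is an assumption rather than documented product behavior.

```python
import re
from typing import Dict, List

# Hypothetical document types and keywords; actual names in a project may differ.
KEYWORDS: Dict[str, str] = {
    "Invoice": r"invoice|amount due",
    "Invoice (Spanish)": r"factura|importe total|fecha de emisión",
}


def matching_types(text: str) -> List[str]:
    """List every document type whose keyword pattern occurs in the text."""
    return [
        doc_type
        for doc_type, pattern in KEYWORDS.items()
        if re.search(pattern, text, flags=re.IGNORECASE)
    ]


spanish_hits = matching_types("FACTURA No. 221 - Importe total: 450 EUR")
english_hits = matching_types("Invoice 1042, amount due 300 USD")
print(spanish_hits, english_hits)
```

Keeping the English and Spanish keywords under distinct document types means each language can later be given its own extraction fields.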
The CMS classifier is upgraded to accept new keywords from the user. It continues to be used in the legal domain to classify MSAs, SOWs, and Addendums.
After classification, the user can configure the fields of interest to extract for each document type.