SmartOps FinOps v 7.2 Performance Test Report 7.2

Performance test was conducted for SmartOps Kubernetes based Infrastructure hosted in Azure Cloud.

Test environment was deployed with Smart Extract pipeline and its related containers only. Vespa pipeline and its related containers are excluded in this test executions.

The below table summarizes the hardware Infrastructure along with Node count used for the Smart Extract infrastructure.

Kubernetes Nodes

Hardware Infrastructure

Node Count

Min

Max

Application Node Pool

Components

DU (Smart-extract, smart-extract-predict, rest, pipeline, scheduler, invoice-split, page-split , invoice-image-ocr), Clones Engine, pwf-invoice-extraction-scheduler, pwf-invoice-extraction-listener, pwf-invoice-extraction-api, ie-ui

Azure D8sv3

CPU - 8 vCPU Core

RAM: 32 GB

Persistent Pool

Components

Mongo, MinIO, RabbitMQ, Elastic Search, Kibana

Azure D4sv3

CPU - 4 vCPU Core

RAM: 16 GB

MySQL

Azure Managed MYSQL

General Purpose, 2 Core(s), 8GB RAM, 100 GB (Auto Grow Enabled)

Performance Testing - Software Used

Tool	Version	Description
JMeter	5.1.1	Implementing Performance testing
Prometheus		Capture resource utilization on server side.
Grafana		Dashboard to view resource utilization.
Microsoft Excel		Analyzing test results &reports.
FinOps	7.2	SmartOps FinOps application.

FinOps Components & Autoscaling configurations

The below table provides details of docker containers of different components associated with FinOps from Document Understanding (DU) , Clones, PWF, Database & Message Broker.

This also provides detail of components identified for Autoscaling and the criteria defined for Autoscaling. As part of ensuring availability of non-scalable components to avoid failures of any components during document processing, 2 instances of each components are by default available in the Kubernetes cluster.

Stack Name	Container Name	Autoscaling Enabled	Scaling Criteria
Document Understanding	du-rest	Y	Based on CPU Usage
	du-scheduler	N
	du-pipeline	Y	Based on CPU Usage
	du-invoice-split	N
	du-tilt-correct	N
	du-smart-extract	N
	du-invoice-image-ocr	N
Clones	clones-sense-queue	N
Clones	clones-engine	Y	Based on CPU Usage
PWF	pwf-Invoice-extraction-listener-du	N
	pwf-invoice-extraction-listener	N
	pwf-invoice-extraction-api	N
	pwf-invoice-extraction-scheduler	N

Following are the details of replicas configured for Database & Message Broker in Kubernetes cluster MySQL is deployed as a Managed service at Azure.

Autoscaling Configuration

Following are the CPU threshold limits, Replicas (Minimum & Maximum) configured for each component identified for Autoscaling in Kubernetes cluster for FinOps.

Container name	CPU Threshold	min replicas	max replicas
Clones-engine	80%	2	4
du-core-nlp	80%	2	4
du-pipeline	80%	2	4
du-rest	80%	2	4

Smart Extract Container Configuration

Based on the optimal Invoice processing observed in previous tests, Smart Extract Engine and Invoice Image OCR POD replicas are set to 3. All the tests are conducted with this configuration.

DU Project Configuration

Document Selected for Processing

Test is executed using Invoice samples of 51 per batch, each Invoice has 2 pages the first page being the Invoice scan page and second page is the invoice page.

Different vendor-based sample invoices are used consisting of Invoices of different complexity categorized as Low, Medium, High.

Parameter	Value
Main File Type	Zip
Zip file contents (Combined, Individual)	1 PDF with combined Invoices
No of Invoices combined in the zip file	51 (102 pages)
Pages per Invoice document.	2 (1 scan and 1 invoice page)
Invoice Batch.	Batch consist of 51 Invoice samples each with 3 different variations of Invoices of Low, Medium, High complexity
Number of fields extracted per invoice	22
Insight Fields	14
Line Items	8
Infrastructure Variation - Document processing Engine	Smart Extract Exclusive

Performance Test -Summary

Comparison report before and after PDF search code changes. An addition of 3 minutes is noticed with PDF search code changes. This is the total time taken in DU to process 51 invoices used in the test.

The following graph represents the time-based execution of the different steps of DU as part of the processing of the 51 Invoice sample for one of the test executions.

The blue bar represents the duration of execution of the different DU steps. This also help us in identifying the parallel execution details of the DU steps.

Detailed Test Results

Clones Workflow Execution Time

It involves the time taken to execute the following workflow by clones engine as part of document processing.

Invoice_PWF_PushToDUSched is a scheduled workflow which polls the FTP folder to fetch the Invoice batch zip file and provides the zip file to the corresponding Document Understanding Project for data extraction.

Invoice_PWF_CheckRoleCondition workflow validates the role-based permission needed as part of the processing of invoice documents.

The below table summarizes the average time taken for the executions of different clones workflow.

Clones Workflow (Seconds)
Invoice Batch	Executions	CheckRoleCondition ‎	Invoice PWF Installation	PushToDUSched
Sample -1	Test1	7.13	2.11	9.59
	Test2	7.26	2.14	12.31
	Test3	7.19	2.23	10.35

Clones Workflow (Seconds)
Invoice Batch	Executions	CheckRoleCondition ‎	Invoice PWF Installation	PushToDUSched
Sample -1	Test1	14.43	1.65	8.55
	Test2	12.37	1.68	40.54
	Test3	10.77	1.57	11.27

Execution time of each steps in DU

Time taken for the execution of different steps in Document extraction in DU as part of Invoice extraction. There are several steps involved in the processing of an Invoice document by Documents Understanding application. PDF search which is introduced in 7.2 recently is one among such step.

DU Steps	Execution Time (Seconds)
DU Steps	AVG	MIN	MAX	Occurrence per batch	Extended Time per batch
INITIALIZE	1.56	0.37	4.23	155	241.99
VALIDATE	0.87	0.16	2.64	155	134.54
ZIP_EXTRACT	1.24	1.24	1.24	1	1.24
INV_DOC_SPLIT	426.20	426.20	426.20	1	426.20
INV_PAGE_SPLIT	10.27	2.62	19.41	51	523.79
INV_IMAGE_OCR	19.91	4.01	59.86	102	2030.92
SMART_EXTRACT_PREPROCESSOR	4.80	1.88	9.68	102	489.76
SMART_EXTRACT	5.43	3.79	8.96	102	554.34
SMART_EXTRACT_POSTPROCESSOR	2.96	0.60	8.38	102	302.16
SMART_EXTRACT_FOI	14.27	1.03	38.61	102	1455.29
TRIGGER_PARALLEL_VESPA_SE	5.86	1.80	18.19	102	597.85
INV_RESULT_AGGREGATE	0.44	0.20	1.78	51	22.57
FINALIZE	1.49	0.33	7.48	359	534.47

DU Steps	Execution Time (Seconds)
DU Steps	AVG	MIN	MAX	Occurrence per batch	Extended Time per batch
INITIALIZE	1.85	0.38	5.51	155	286.64
VALIDATE	1.01	0.20	3.62	155	156.47
ZIP_EXTRACT	1.21	1.21	1.21	1	1.21
INV_DOC_SPLIT	536.35	536.35	536.35	1	536.35
INV_PAGE_SPLIT	3.14	0.88	7.88	51	159.92
INV_IMAGE_OCR	23.63	3.72	97.73	102	2409.75
SMART_EXTRACT_PREPROCESSOR	5.59	2.05	13.32	102	570.07
SMART_EXTRACT	9.43	7.35	14.62	102	962.17
SMART_EXTRACT_POSTPROCESSOR	3.32	0.56	17.62	102	338.45
SMART_EXTRACT_FOI	16.29	1.19	46.76	102	1661.33
TRIGGER_PARALLEL_VESPA_SE	7.14	2.05	21.64	102	728.52
INV_RESULT_AGGREGATE	0.62	0.23	3.12	51	31.42
FINALIZE	2.00	0.38	10.71	359	719.68
PDF_SEARCH	9.45	1.30	44.58	102	964.09

Queue time of each step in DU

In each steps of Document extraction in DU there is a Queue phase associated. The below table summarizes the time spent in the Queue of each of the steps of Document extraction in Document Understanding application.

DU Steps	Queue Time (Seconds)
DU Steps	AVG	MIN	MAX	Occurrence per batch	Extended Time per batch
INITIALIZE	2.80	0.57	8.00	155	434.32
VALIDATE	3.00	0.68	10.13	155	465.61
ZIP_EXTRACT	0.71	0.71	0.71	1	0.71
INV_DOC_SPLIT	0.69	0.69	0.69	1	0.69
INV_PAGE_SPLIT	2.54	0.65	5.91	51	129.36
INV_IMAGE_OCR	204.00	0.76	673.37	102	20808.29
SMART_EXTRACT_PREPROCESSOR	117.42	0.91	761.61	102	11977.33
SMART_EXTRACT	1.74	0.59	5.87	102	177.69
SMART_EXTRACT_POSTPROCESSOR	74.14	0.82	147.11	102	7562.56
SMART_EXTRACT_FOI	140.59	0.85	813.30	102	14340.43
TRIGGER_PARALLEL_VESPA_SE	2.19	0.68	9.13	102	223.05
INV_RESULT_AGGREGATE	1.52	0.61	10.75	51	77.45
FINALIZE	1.57	0.53	9.51	359	563.87

DU Steps	Queue Time (Seconds)
DU Steps	AVG	MIN	MAX	Occurrence per batch	Extended Time per batch
INITIALIZE	3.52	0.65	12.63	155	545.65
VALIDATE	3.71	0.82	14.36	155	574.29
ZIP_EXTRACT	0.71	0.71	0.71	1	0.71
INV_DOC_SPLIT	0.80	0.80	0.80	1	0.80
INV_PAGE_SPLIT	623.66	99.51	1171.29	51	31806.70
INV_IMAGE_OCR	200.58	0.98	725.64	102	20459.26
SMART_EXTRACT_PREPROCESSOR	144.56	0.86	863.67	102	14745.54
SMART_EXTRACT	3.67	0.68	12.24	102	374.48
SMART_EXTRACT_POSTPROCESSOR	83.37	1.08	164.34	102	8503.77
SMART_EXTRACT_FOI	137.72	0.86	872.63	102	14047.80
TRIGGER_PARALLEL_VESPA_SE	2.83	0.82	9.69	102	288.91
INV_RESULT_AGGREGATE	1.96	0.73	8.15	51	99.99
FINALIZE	1.94	0.66	10.22	359	697.04
PDF_SEARCH	6.06	0.68	51.26	102	617.96

No of messages in each Queue of RabbitMQ used by DU

This involves metrics of messages in different Queues used by DU as part of the various steps of Documents processing by DU

It helps in identifying in which Queue the De-Queuing process is slow and helps to identify the component depending on the Queue which has a slow message exchange rate.

The below graph represents the count of messages in different queue used by each DU steps during test execution.

Performance Test – Infrastructure Resource Usage

Below are the metrics captured on maximum CPU cores & Memory used while executing the performance test using the using sample invoices consisting of 51 invoices with 17 each sample of Low, medium, High complexity-based invoices of different vendors.

As part of ensuring high availability 2 POD of each components are deployed in Kubernetes, the CPU core and Memory shown below is the maximum CPU cores & memory usage of both the POD of each core component of Document Understanding, Clones, Database, Message Broker using the Smart Extract documents processing engine.

Document Understanding Stack

Without PDF Search step

Clones Stack

Database & Message Broker

Performance Test Execution results

Raw data of each test executions from Document Understanding are available on the following SharePoint location - Test Run Data

Stack Name	Container Name	No of instances
Database	mongo	3
	minio	2
	elasticsearch	3
	mysql	Azure Managed Service
Message Broker	rabbitmq	3

DU Project setting	Value
Engine Pipeline	Smart Extract
Language Classifier	Disabled
Feedback Learning (HITL)	Disabled
Preprocessors Enabled	Barcode Page Split, Skip Dup Validation

Before PDF search	After PDF search	Impact
19.57	22.53	An addition of 3 minutes (for 51 invoices) after PDF search code changes.

Test Execution	DU Time Taken (Minutes)
Test Execution	First file processed	Last file processed
Test 1	3.48	19.46
Test 2	3.38	20.12
Test 3	3.46	19.13

Test Execution	DU Time Taken (Minutes)
Test Execution	First file processed	Last file processed
Test 1	3.00	22.27
Test 2	3.35	23.23
Test 3	3.08	22.10

Smart Vision v7.2-Performance Test Report

Test Environment – Infrastructure setup