Comprehensive Different UiPath OCR Engines

Contents

Step by Step Guide to Using Different UiPath OCR Engines
Different Types of Engine for UiPath OCR
Problem Statement
Input Data
Test Cases
Output Data
Prepare Environment
Approach to Solve Problem
How to use Different OCR engine in UiPath
#2 Using Google Cloud OCR
What is the difference between Google Cloud OCR and Google OCR in UiPath?
#3 Using Microsoft OCR
#4 Using Microsoft Azure ComputerVision OCR
#5 Using Abbyy OCR
#6 Using Abbyy Cloud OCR
Source Code
Few Hurdles
Key Points you need to remember
Conclusions

This is a descriptive blog post on using OCR engines with UiPath.

In this detailed guide I’ll cover:

Different types of OCR used in UiPath
How to choose the right OCR for your next automation
A working example with a problem statement and different approaches
Pros and cons of the different types of OCR
A working workflow example
Lots more

So if you want to learn how to use different OCR engines with UiPath, you will love this case study and guide.

Let’s get started.

Different Types of Engine for UiPath OCR

Tesseract /Google OCR – This actually uses the open-source Tesseract OCR Engine, so it is free to use. Also, this processing is done on the local machine where UiPath is running.
Google Cloud OCR – This requires a Google Cloud API Key, which has a free trial.
Microsoft OCR – This uses the MODI OCR Engine, which is also free to use, and the processing is done locally like Google OCR.
Microsoft Cloud OCR – This uses the Microsoft Computer Vision API, which is also free to sign up for. Also known as Microsoft Azure ComputerVision OCR.
Abbyy OCR – This requires you to install Abbyy FineReader on your local machine and purchase a license.
Abbyy Cloud OCR – This requires a subscription. We will use Abbyy Cloud OCR for our use case.
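The six engines listed above can be kept in a small lookup table, which makes it easy to filter them by where the processing happens. This is only a sketch in Python: the `OcrEngine` class and the `local_engines` helper are hypothetical names for illustration, and the values are copied from the descriptions in the list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OcrEngine:
    name: str
    runs_locally: bool  # True if processing happens on the machine running UiPath
    cost: str           # wording taken from the list above


ENGINES = [
    OcrEngine("Tesseract / Google OCR", True, "free"),
    OcrEngine("Google Cloud OCR", False, "API key, free trial"),
    OcrEngine("Microsoft OCR", True, "free"),
    OcrEngine("Microsoft Cloud OCR", False, "free to sign up"),
    OcrEngine("Abbyy OCR", True, "paid license"),
    OcrEngine("Abbyy Cloud OCR", False, "subscription"),
]


def local_engines(engines):
    """Return the names of engines that process documents locally."""
    return [engine.name for engine in engines if engine.runs_locally]


print(local_engines(ENGINES))
```

Filtering on `runs_locally` is useful when documents are not allowed to leave the machine, which rules out the cloud options.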

There are a few other options available, but based on the questions asked on the forums, I have selected the top six to experiment with.

All of the OCR engines follow similar steps, so it won’t be a challenge if you would like to experiment with other OCR tools.

If you love working with Python for OCR, you can read our detailed article on how to use OCR with Python in UiPath.

Please feel free to reach out to me if you would like any other OCR engine included in this blog post. I will be happy to include it in the next version.

Problem Statement

With so many OCR engines available on the market, it is natural to ask which one to use and which one will solve your problem.

The problem statement is generic in nature: we want to see what works best with which OCR engine.

We will build a matrix by the end of the exercise to give an idea of when to use which OCR.

At a high level, by the end of the post you should be able to gain insight into –

Which OCR engine works best with UiPath
Differences in terms of processing when using Paid vs Free OCR
Which one is recommended for handwritten materials (like meal receipts, hotel invoices, taxi fares, parking receipts)
Which engines read low-quality scans best
What to consider before starting the OCR Project
Best practices and guidelines

Input Data

As discussed in the problem statement, we need to run the task on different types of PDF so that we can account for multiple factors.

For this example we will work with the following data set –

Sample PDF file (structured PDF, call it 01-Invoice.pdf)
Short story (full-page text, call it 02-Short-Stories.pdf)
Invoice with sample text (tabular invoice, call it 03-Invoice-sample.pdf)
Scanned invoice with handwritten text (call it 04-Scanned-Invoice-Handwriiten.pdf)
Bill with handwritten text (call it 05-good-hand-written-bill.pdf)

Test Cases

Running hundreds of test cases is beyond the scope of this blog post. If you wish to use any OCR engine in your production environment, I suggest running regression tests on a variety of inputs, as none of the engines provides 100% accurate results and confidence scores matter.

For the test cases, we will scan the input PDFs listed above with all six OCR engines.
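Scanning every input file with every engine amounts to a simple cross product. The Python sketch below only enumerates the runs; it does not call UiPath, and `build_test_runs` is a hypothetical helper name. The file names are the ones from the Input Data section.

```python
from itertools import product

ENGINE_NAMES = [
    "Tesseract / Google OCR",
    "Google Cloud OCR",
    "Microsoft OCR",
    "Microsoft Cloud OCR",
    "Abbyy OCR",
    "Abbyy Cloud OCR",
]

INPUT_FILES = [
    "01-Invoice.pdf",
    "02-Short-Stories.pdf",
    "03-Invoice-sample.pdf",
    "04-Scanned-Invoice-Handwriiten.pdf",
    "05-good-hand-written-bill.pdf",
]


def build_test_runs(engines, files):
    """Pair every engine with every input file."""
    return list(product(engines, files))


runs = build_test_runs(ENGINE_NAMES, INPUT_FILES)
print(len(runs))  # 6 engines x 5 files = 30 runs
```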

Output Data

In terms of output we will focus on –

Extracted data as (key, value) pairs or an Excel sheet
Confidence Scores
Accuracy
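This post does not fix a formula for accuracy, so the following is only one possible way to score a run, sketched in Python. It compares the extracted text with a hand-typed reference using a character-level similarity ratio from the standard library and averages per-word confidence scores. The function names and the choice of `difflib` are assumptions for illustration, not something UiPath provides.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Collapse whitespace and ignore case so layout differences are not penalized."""
    return " ".join(text.split()).lower()


def similarity(expected, extracted):
    """Character-level similarity between reference text and OCR output (0.0 to 1.0)."""
    return SequenceMatcher(None, normalize(expected), normalize(extracted)).ratio()


def mean_confidence(word_confidences):
    """Average of per-word confidence scores; 0.0 when the engine returned no words."""
    if not word_confidences:
        return 0.0
    return sum(word_confidences) / len(word_confidences)


print(similarity("Invoice  No 123", "invoice no 123"))
print(mean_confidence([0.5, 1.0]))
```

A similarity ratio is forgiving about small character swaps (for example 1 read as l), which makes it a reasonable first measure for full-page text. For invoices, comparing individual extracted fields may be more meaningful.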

Prepare Environment

First things first: which dependencies need to be added to the project?

As we are working with six OCR engines at the same time, this requires several things, such as installing UiPath packages, installing some external packages, signing up on external websites, and creating keys for integration.

How to use Different OCR engine in UiPath

#1 Using Tesseract / Google OCR –

The Tesseract OCR engine used in UiPath has been updated to version 4.0. It contains an OCR engine (libtesseract) and a command line program (tesseract).
Tesseract 4 adds a new neural net (LSTM) based OCR engine that is focused on line recognition, but it also still supports the legacy Tesseract OCR engine of Tesseract 3, which works by recognizing character patterns.
UiPath.Core.Activities.GoogleOCR is the activity used to read text from UI elements or images with the Tesseract OCR engine. It can also be used with other OCR activities, such as Click OCR Text, Hover OCR Text, Double Click OCR Text, Get OCR Text, and Find OCR Text Position.
For our example, we will use UiPath.PDF.Activities.ReadPDFWithOCR to read the input files, and the result will be stored in KeyValuePair<Rectangle, String> variables.

Our workflow will look like this –

[Image: sample workflow for Tesseract / Google OCR]

We will discuss the workflow in detail at a later stage. For now, focus on the properties of UiPath.Core.Activities.GoogleOCR.

It takes input of type Image only.
You can specify AllowedCharacters and DeniedCharacters.
The Language field needs to contain the language file prefix, such as “ron” for Romanian, “ita” for Italian, and “fra” for French.
Scale is an important factor here: specify higher values when reading from scanned images. The default value is 2.
Profile –
- None – if no preprocessing is required.
- Screen – required for RDP application automation.
- Scan – to be used with scanned images.
- Legacy – default settings for pre-processing images.
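To get a feel for these properties outside UiPath, you can experiment with the tesseract command line program directly. The Python sketch below only builds the command; it does not run it. The `-l` flag and the `tessedit_char_whitelist` / `tessedit_char_blacklist` config variables belong to the Tesseract CLI. Treating them as counterparts of Language, AllowedCharacters and DeniedCharacters is my assumption, not documented UiPath behaviour, and Scale and Profile have no equivalent here. `build_tesseract_command` is a hypothetical helper name.

```python
def build_tesseract_command(image_path, output_base, language="eng",
                            allowed_chars="", denied_chars=""):
    """Build a tesseract CLI invocation as an argument list (nothing is executed)."""
    command = ["tesseract", image_path, output_base, "-l", language]
    if allowed_chars:
        # Restrict recognition to these characters (assumed to mirror AllowedCharacters)
        command += ["-c", f"tessedit_char_whitelist={allowed_chars}"]
    if denied_chars:
        # Exclude these characters from recognition (assumed to mirror DeniedCharacters)
        command += ["-c", f"tessedit_char_blacklist={denied_chars}"]
    return command


print(build_tesseract_command("scan.png", "out", language="ron",
                              allowed_chars="0123456789"))
```

The resulting list can be passed to `subprocess.run` on a machine where Tesseract and the relevant language file are installed.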

In the output you get Text, the extracted string, and Result, which contains the words along with their screen positions.

You can read more details in the official documentation.

#2 Using Google Cloud OCR –

You might be wondering what the difference is between the two Google OCR variants. The details are below.

What is the difference between Google Cloud OCR and Google OCR in UiPath?

The difference is the engine: Google Cloud OCR uses the Google Cloud OCR engine, while Google OCR uses the Tesseract OCR engine.

Google OCR uses the Tesseract engine deployed locally (it comes with UiPath Studio), so the image processing and text extraction are done locally, on your computer.

Google Cloud OCR uploads the image to Google’s servers (cloud) and you get back the resulting text. All the processing is done remotely on Google servers.

Google OCR is free, while you need to pay for Google Cloud OCR (a free trial is available, with usage limits).

Google Cloud Vision OCR is used in the same way; the only difference is in the properties you set when invoking the UiPath.Core.Activities.GoogleCloudOCR activity. Most of the properties are the same, except for ApiKey (obtained from the Google Cloud console) and ResizeToMaxLimitIfNecessary, which attempts to downsize the target image so that it does not exceed the size limit of the Google Cloud Vision engine.
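If you want to check that your API key works before wiring it into the activity, you can call the Cloud Vision REST endpoint yourself. The Python sketch below only prepares the URL and JSON body for an `images:annotate` text detection request; nothing is sent. How the UiPath activity builds its own request internally is not covered here, and `build_vision_request` is a hypothetical helper name.

```python
import base64
import json

VISION_URL = "https://vision.googleapis.com/v1/images:annotate"


def build_vision_request(image_bytes, api_key):
    """Return (url, json_body) for a Cloud Vision text detection call."""
    body = {
        "requests": [
            {
                # The image travels as base64 text inside the JSON body
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": "TEXT_DETECTION"}],
            }
        ]
    }
    return f"{VISION_URL}?key={api_key}", json.dumps(body)


url, payload = build_vision_request(b"abc", "YOUR_API_KEY")
print(url)
print(len(payload), "bytes")
```

Because the image is base64-encoded into the body, the request is noticeably larger than the file on disk, which is worth keeping in mind when the engine reports a size limit.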

#3 Using Microsoft OCR –

Microsoft OCR uses the MODI OCR engine, which is also free to use, and the processing is done locally, like Google OCR.

The MODI (Microsoft Office Document Imaging) engine used by the Microsoft OCR activity relies on Microsoft technical support for Windows 7 and Windows 10.

Most of the input properties are similar to those of the Google OCR engine, and it outputs the extracted words along with their on-screen positions as KeyValuePair.

The sample workflow will look like this –

[Image: sample workflow for Microsoft OCR]

#4 Using Microsoft Azure ComputerVision OCR –

Similar to Google Cloud Vision, you need to perform the following steps on Azure before you can work with the Computer Vision API –

Sign up for a free account on Azure, or log in using your pay-as-you-go account
Sign in to the Azure portal and add Computer Vision
Check how to embed Computer Vision using the quickstarts and documentation.

Using the Computer Vision API to extract printed and handwritten text from images/PDFs into a machine-readable character stream is straightforward. All you need to know is –

Endpoints of the Vision API
Keys to connect to those services

A word of caution here, as Azure provides two variants of the Computer Vision API –

Read API
OCR API

The Azure Computer Vision OCR API recognizes printed text and supports a large variety of languages.

The Azure Computer Vision Read API recognizes handwritten and printed text, but is currently available only in English.

The major difference between the two is that the Read API uses a model that supports only English as of now, while the OCR API supports more than 25 languages, with automatic language detection and detection of the rotation of the recognized text in the image.

Image Requirements –

The image must be presented in JPEG, PNG, GIF, or BMP format
The file size of the image must be less than 4 megabytes (MB)
The dimensions of the image must be greater than 50 x 50 pixels
- For the Read API, the dimensions of the image must be between 50 x 50 and 10000 x 10000 pixels.
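These requirements are easy to check before uploading anything. Here is a small Python sketch that takes the format, file size and pixel dimensions as plain arguments (reading them from a real file would need an imaging library, which is left out). `check_image` is a hypothetical helper, and treating 1 MB as 1024 x 1024 bytes and 50 pixels as an inclusive minimum are my assumptions.

```python
ALLOWED_FORMATS = {"JPEG", "PNG", "GIF", "BMP"}
MAX_FILE_BYTES = 4 * 1024 * 1024  # "less than 4 MB"; 1 MB = 1024 * 1024 bytes is assumed
MIN_SIDE = 50                     # treated as an inclusive minimum (assumption)
READ_API_MAX_SIDE = 10000


def check_image(fmt, size_bytes, width, height, read_api=False):
    """Return a list of requirement violations; an empty list means the image is acceptable."""
    problems = []
    if fmt.upper() not in ALLOWED_FORMATS:
        problems.append(f"unsupported format: {fmt}")
    if size_bytes >= MAX_FILE_BYTES:
        problems.append("file is 4 MB or larger")
    if width < MIN_SIDE or height < MIN_SIDE:
        problems.append("image is smaller than 50 x 50 pixels")
    if read_api and (width > READ_API_MAX_SIDE or height > READ_API_MAX_SIDE):
        problems.append("image is larger than 10000 x 10000 pixels")
    return problems


print(check_image("png", 120_000, 800, 600))
print(check_image("tiff", 5 * 1024 * 1024, 20, 20))
```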

Computer Vision usage limits at the time of writing: 5,000 transactions, at 20 per minute.

You need to note down the endpoint and the keys for further processing.
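Once you have the endpoint and a key, a request to either API variant differs only in the path. The Python sketch below builds the request object with the standard library and does not send it. The `v3.2` paths are an assumption about the API version (use whichever version your Azure resource documents), the endpoint shown is a placeholder, and `build_azure_ocr_request` is a hypothetical helper name.

```python
import urllib.request


def build_azure_ocr_request(endpoint, key, image_bytes, use_read_api=False):
    """Prepare (but do not send) a POST request to the OCR API or the Read API."""
    # Assumed v3.2 paths: /ocr for the OCR API, /read/analyze for the Read API
    path = "/vision/v3.2/read/analyze" if use_read_api else "/vision/v3.2/ocr"
    headers = {
        "Ocp-Apim-Subscription-Key": key,            # the key noted down from the portal
        "Content-Type": "application/octet-stream",  # raw image bytes in the body
    }
    return urllib.request.Request(
        endpoint.rstrip("/") + path, data=image_bytes, headers=headers, method="POST"
    )


request = build_azure_ocr_request(
    "https://example.cognitiveservices.azure.com/", "YOUR_KEY", b"image-bytes"
)
print(request.get_method(), request.full_url)
```

Passing the prepared object to `urllib.request.urlopen` would send it. With the Read API the call is asynchronous, so the result has to be fetched in a second step.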

#5 Using Abbyy OCR
For the Abbyy OCR activity to work, you need to install ABBYY FineReader Engine and purchase a license for it. After installing ABBYY FineReader Engine, you must activate it.
- You will need ABBYY FineReader Engine.
- You will need a runtime key, which is provided by UiPath (you must contact the sales team for this).
- You must follow the installation instructions. Make sure to install the 32-bit (x86) version. If you use the provided installation instructions, the command uses x86 by default.
For more details on the steps needed to install and activate ABBYY FineReader, see the installation instructions.
#6 Using Abbyy Cloud OCR
- Abbyy Cloud OCR SDK supports recognition of printed text in more than 200 languages, including most Asian languages: Chinese, Japanese, Korean, Arabic, Farsi, Vietnamese, Thai and others, using industry-leading FineReader OCR technology.
- Abbyy Cloud OCR SDK recognizes both printed and hand-printed text within specific fields (zonal OCR).
- Its cloud OCR recognition features are used for reading invoices, receipts, bills, business cards and many other document categories. It also supports extraction from handwritten or manually filled forms.
- Convert image/PDF to searchable PDF, PDF/A
- Convert image/PDF to Microsoft Word, Excel, PowerPoint
To start working with ABBYY Cloud OCR, you need to set things up in a similar way to the Google and Microsoft Vision APIs.
1. Check that UiPath.Abbyy.Activities.AbbyyCloudOCR is available. If it is not enabled, you can enable it using Manage Packages in Studio.
2. You need an ApplicationID, Password, and ServerUrl to use with the AbbyyCloudOCR activity, so you need to create a new application after signing up on their cloud platform.
3. Once you create the new application, note down –
  - Display Name: RPABOTSWORLD
  - Application ID: dd3410e2-e883-xxxx-xxxx-b4de7dd3d40f (something like this)
4. Note down your password as well, along with the server URL, for the required properties configuration in the workflow.
5. You will not see the password on the web console; it will be sent to you by email. However, you can reset it from the portal.
Your sample workflow will look like this –
The key properties are given below. The other common properties have the same meaning as in the other OCR engines.
- ApplicationID – The application ID provided when subscribing to the Abbyy Cloud OCR service.
- Password – The password provided when subscribing to the Abbyy Cloud OCR service.
- ServerUrl – The Server URL provided when subscribing to the Abbyy Cloud OCR service.
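These three values are also what you need to call the Cloud OCR service directly, which is a handy way to confirm that the credentials work before debugging the workflow. The Python sketch below assumes the service accepts HTTP Basic authentication built from the application ID and password, and that a `processImage` method with `language` and `exportFormat` parameters is available under the server URL. Please verify both against the ABBYY documentation for your account. The helper names and the server URL shown are placeholders.

```python
import base64


def abbyy_auth_header(application_id, password):
    """Build an HTTP Basic Authorization header from ApplicationID and Password."""
    token = base64.b64encode(f"{application_id}:{password}".encode("utf-8"))
    return {"Authorization": "Basic " + token.decode("ascii")}


def process_image_url(server_url, language="English", export_format="txt"):
    """Build the URL for an (assumed) processImage call under ServerUrl."""
    base = server_url.rstrip("/")
    return f"{base}/processImage?language={language}&exportFormat={export_format}"


print(abbyy_auth_header("my-application-id", "my-password"))
print(process_image_url("https://your-server-url.example/"))
```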
Must Read – Comparison Cloud OCR SDK vs. FineReader Engine SDK https://abbyy.technology/en:features:comparisons:comp_onlinesdk-fre
Source Code
You can visit the code repository on GitHub to download the code if you wish to try it yourself.
The code contains the URLs, API keys, etc. as they were, to avoid confusing learners. However, they will not work, and you need to set your own values for API KEY, URL, PASSWORD, etc. for the code to run.
Download Link here
Source Code + Input PDF
You can check the raw output in Excel in the Output folder of the code.
However, here are the observations of the RPABOTSWORLD team from trying different OCR engines with UiPath.
Few Hurdles
A few issues came up while this UiPath OCR example was being written. You might face similar issues, so we have listed them here.
#1 Microsoft Cloud OCR was not connecting with UiPath and threw the error below –
UiPath.10:43:03.4398 Fatal UiPath.Vision.OCR.OCRException: MicrosoftAzureComputerVisionErrorRunEngine ---> System.Net.Http.HttpRequestException: An error occurred while sending the request. ---> System.Net.WebException: The underlying connection was closed: An unexpected error occurred on a send. ---> System.IO.IOException: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host. ---> System.Net.Sockets.SocketException: An existing connection was forcibly closed by the remote host
This was because one of the required DLL assemblies (Emgu.CV.World in the log below) was missing, and we needed to update the Computer Vision package from the Manage Packages screen.
This can be identified by looking at the Studio logs.
21:20:34.8925 => [ERROR] [UiPath.Studio.exe] [24] $LoadAssembly: UiPath.CV, Version=19.12.0.0, Culture=neutral, PublicKeyToken=null: System.IO.FileNotFoundException: Could not load file or assembly 'Emgu.CV.World, Version=3.4.1.2984, Culture=neutral, PublicKeyToken=null' or one of its dependencies. The system cannot find the file specified.
File name: 'Emgu.CV.World, Version=3.4.1.2984, Culture=neutral, PublicKeyToken=null'
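If the Studio log is long, a short script can pull out the names of assemblies that failed to load. This is only a Python sketch: the regular expression is tailored to the "Could not load file or assembly" message shown above, it accepts both straight and curly quotes, and `find_missing_assemblies` is a hypothetical helper name.

```python
import re

# Capture the assembly name that follows the opening quote, up to the first comma
MISSING_ASSEMBLY = re.compile(r"Could not load file or assembly\s+['‘]([^,'’]+)")


def find_missing_assemblies(log_text):
    """Return the distinct assembly names reported as missing, sorted alphabetically."""
    return sorted(set(MISSING_ASSEMBLY.findall(log_text)))


sample_log = (
    "21:20:34.8925 => [ERROR] [UiPath.Studio.exe] [24] $LoadAssembly: UiPath.CV, "
    "Version=19.12.0.0, Culture=neutral, PublicKeyToken=null: "
    "System.IO.FileNotFoundException: Could not load file or assembly "
    "'Emgu.CV.World, Version=3.4.1.2984, Culture=neutral, PublicKeyToken=null' "
    "or one of its dependencies. The system cannot find the file specified."
)

print(find_missing_assemblies(sample_log))
```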
#2 Issue with the maximum size of the PDF file with the Google Cloud OCR engine
Read PDF With OCR: Error performing OCR: Request payload size exceeds the limit: 10485760 bytes. GoogleCloudErrorInvalidResponse
You need to check the MaxSizeLimit property for GoogleCloudOCR to fix this issue (see also ResizeToMaxLimitIfNecessary in section #2).
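You can estimate in advance whether a file is likely to hit this limit. The 10485760-byte figure comes straight from the error message above. The rest of this Python sketch is an assumption: it supposes the image is sent base64-encoded inside the request (which grows it by roughly a third) and reserves a small allowance for the rest of the request body. The helper names are hypothetical.

```python
GOOGLE_MAX_PAYLOAD_BYTES = 10485760  # limit quoted in the error message


def encoded_size(raw_size):
    """Size in bytes of raw_size bytes after base64 encoding (with padding)."""
    return 4 * ((raw_size + 2) // 3)


def fits_google_limit(raw_size, overhead=1024):
    """True if the encoded image plus an assumed request overhead stays within the limit."""
    return encoded_size(raw_size) + overhead <= GOOGLE_MAX_PAYLOAD_BYTES


print(fits_google_limit(1_000_000))
print(fits_google_limit(10_000_000))
```

Under these assumptions, a file of about 7.8 MB on disk is already at the limit, so a scan can fail even though the file itself is smaller than 10 MB.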
Key Points you need to remember
- In many cases, to get better OCR results, you’ll need to improve the quality of the image you are giving to the OCR engine.
- Unsurprisingly, the paid OCR engines performed the best, especially with scanned documents. None of the engines read low-quality scans perfectly, but the cloud options were closest.
- If OCR is a key part of your project, I recommend trying all of your available options with UiPath OCR for the specific document types you’re working with, to find the best option that works within your project budget.
- OCR is all about experimenting with different settings, so you may need to modify the scale or DPI, or sometimes pre-process the image, for better results.
References –
1. Various OCR Activities – https://docs.uipath.com/studio/docs/ocr-activities
2. https://docs.uipath.com/activities/docs/microsoft-azure-computer-vision-ocr
