Low performance of the OCR_MakeSearchable method

Posted: Sat Nov 23, 2013 10:50 am
by igor_p
Hello,

We are using your OCR component in our ASP.NET application. Everything works correctly, but we are concerned about the low performance of the OCR_MakeSearchable method. We compared your product to the Quick Scan Pro (QSP) solution, and QSP was definitely faster.
OCRing the file below (it is converted to PDF before OCRing) with the OCR_MakeSearchable() method takes about 3 minutes, which is quite long. For comparison, QSP processed the same document in about 20 seconds.


Is there any way to make this method faster? Are you planning to improve performance in the next release?

Our PXO_Options are:

				OCR.PXO_Options options = new OCR.PXO_Options();
				options.blacklist = String.Empty;
				options.whitelist = String.Empty;
				options.DataPath = OcrUtility.GetLanguagesDirectory();
				options.ImageFlags = (uint)OCR.OCR_ImageProcessingFlags.OCR_Image_SuppressOutput;
				options.lang = OCR.PXO_Language.PXO_English;
				options.raster_dpi = 300;
				options.RegionMode = OCR.OCR_RegionMode.OCR_Auto;
				options.reserved = 0;
Our test machine has 2 cores and 4 GB of physical memory. We use version 1.0.14.1 of ocrtools.dll.

PS. When are you going to release a new version of ocrtools? We are looking forward to two new features. The first is full orientation detection while OCRing. The second is the ability to place only the text layer into the original PDF file. For now, we work around this by using the OCR_Image_SuppressOutput setting and the PlaceContents() method from xcpro40.dll. Unfortunately, this prevents us from using the rotation mode.

Thanks in advance and best regards,
Igor

Re: Low performance of the OCR_MakeSearchable method

Posted: Mon Nov 25, 2013 8:59 am
by Tracker Supp-Stefan
Hi Igor,

Thanks for the post. I will pass it to our OCR SDK experts and we will post back here a bit later with further advice!

Regards,
Stefan

Re: Low performance of the OCR_MakeSearchable method

Posted: Mon Nov 25, 2013 7:01 pm
by Walter-Tracker Supp
I've looked at your document, and while I don't see anywhere near the poor performance you describe, I do note that it takes longer than typical files. Pages 8 and 9 are the culprits: the layout of those pages is complex and therefore difficult for our engine to process. This is a bit of an edge case for our engine (other OCR engines likewise have their own edge cases). Our benchmarks have shown performance comparable to other offerings across a broad spectrum of documents; however, this particular case happens to be troublesome.

Your other requests are on our feature wishlist and we will roll them out as soon as reasonably practical.