PDF-XChange - Tracker PDF Viewer - TIFF-XChange - Image-XChange - XMF-XChange - Raster-XChange - Support


 
bulubuluplopplop
User
Topic Author
Posts: 5
Joined: Fri May 22, 2015 12:37 pm

OCR speed and CPU

Fri Oct 09, 2015 4:16 pm

Hello,
I'm using OCR on some long pdf docs.

The OCR is very slow, but it uses only 25% of my CPU capacity.
Is it possible to make it use more CPU and run faster?

thank you
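
A note on the 25% figure: one common explanation, which nothing in this thread confirms, is that a single-threaded task on a four-core machine keeps one core busy and so shows up as about 25% of total CPU. The sketch below only illustrates that arithmetic. The function name is hypothetical and the four-core count is an assumption about the poster's machine.

```python
import os


def single_thread_share(logical_cores: int) -> float:
    """Percent of total CPU that one fully busy thread can occupy."""
    if logical_cores < 1:
        raise ValueError("logical_cores must be at least 1")
    return 100.0 / logical_cores


# Assumption: a 4-core machine, which would match the reported 25%.
print(single_thread_share(4))  # 25.0

# The same calculation for the machine running this script.
print(single_thread_share(os.cpu_count() or 1))
```

If that assumption holds, the limit would be how many threads the OCR engine uses, not the CPU itself.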
 
Will - Tracker Supp
Site Admin
Posts: 5843
Joined: Mon Oct 15, 2012 9:21 pm
Location: Chemainus, BC

Re: OCR speed and CPU

Fri Oct 09, 2015 7:52 pm

Hi bulubuluplopplop,

Thanks for the post - can you please advise on how long OCR takes and supply a sample document that you're working with?

Thanks,
If posting files to this forum, you must archive the files to a ZIP, RAR or 7z file or they will not be uploaded.
Thank you.

Best regards

Will Travaglini
Tracker Support
http://www.tracker-software.com
 
bulubuluplopplop
User
Topic Author
Posts: 5
Joined: Fri May 22, 2015 12:37 pm

Re: OCR speed and CPU

Sun Oct 11, 2015 2:52 pm

Hello,
The OCR process takes about 30 minutes for a 100-page PDF document.
 
Will - Tracker Supp
Site Admin
Posts: 5843
Joined: Mon Oct 15, 2012 9:21 pm
Location: Chemainus, BC

Re: OCR speed and CPU

Mon Oct 12, 2015 5:44 pm

Hi bulubuluplopplop,

I've not experienced any issues like this, nor have I heard reports of others experiencing the problem, so I would need to see a specific sample.

Thanks,

Best regards

Will Travaglini
Tracker Support
http://www.tracker-software.com
 
claude vidal
User
Posts: 74
Joined: Wed Mar 09, 2016 12:47 am

Re: OCR speed and CPU

Tue Mar 22, 2016 12:38 am

I did a speed test on the scan + OCR of a one page document.

The scanner is a Canon LIDE 220. I used PDFXchange Editor 316.1 and the software bundled with the scanner (Canon quick menu). Both scans were at 300 DPI, auto-detect for color, OCR was using the same language (French) and OCR set to auto after scan.

Canon's program completed the entire task in 10 seconds.

PDFXchange took 39 seconds to scan and another 62 seconds to OCR for a total of 101 seconds. That's 10 times slower. The OCR output quality was a bit better with Canon's program.

I understand PDFXchange's main purpose is not scan & OCR, so I expected a bit slower performance. Is 10 times slower to be expected? Anything I could tune while retaining the same quality output?
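
For reference, the arithmetic behind the "10 times slower" figure can be sketched as below. The timings are the ones reported in this post; the variable names are only illustrative.

```python
# Timings reported above, in seconds, for the same one-page document.
canon_total_s = 10   # Canon quick menu: scan + OCR combined
editor_scan_s = 39   # PDF-XChange Editor: scan step
editor_ocr_s = 62    # PDF-XChange Editor: OCR step

editor_total_s = editor_scan_s + editor_ocr_s
slowdown = editor_total_s / canon_total_s
ocr_share = editor_ocr_s / editor_total_s

print(f"Editor total: {editor_total_s} s")
print(f"Slowdown vs Canon: {slowdown:.1f}x")
print(f"OCR share of Editor time: {ocr_share:.0%}")
```

Note that the scan step alone (39 seconds) already takes almost four times as long as Canon's whole task, so OCR is not the only contributor to the gap.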
 
Tracker Supp-Stefan
Site Admin
Posts: 11551
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Re: OCR speed and CPU

Tue Mar 22, 2016 11:30 am

Hello Claude,

A 10 times difference is certainly significant, but please note that the scanner does everything internally. We first need to obtain the image data from the scanner, which for high-DPI scans over a slower USB connection might take longer than the scanner needs to process the page internally. We then OCR the image and optimize it, which is also a slow process. The scanner, by contrast, prepares the PDF internally, and the already generated file is then written to disk.
May I ask you to include in an archive and attach here the following files:
- a scanned image (as 300 dpi png/jpeg)
- the .pdf your device produces (with the OCR layer)
- the .pdf the Editor produces

Please note that the Viewer is a deprecated product now and no longer developed so I would ask you to download and test with the Editor instead:
https://www.tracker-software.com/produc ... nge-editor

Regards,
Stefan
 
claude vidal
User
Posts: 74
Joined: Wed Mar 09, 2016 12:47 am

Re: OCR speed and CPU

Tue Mar 22, 2016 3:50 pm

Hi Stefan,

Thanks for looking into this.

Given that my initial speed test was using a document with sensitive personal information, I ran another test:
- 300 dpi
- OCR language English this time
- Color
- Scanned image contains more images, less text

For this test, the speed ratio was 6.6 instead of 10 for the previous test.

I attached the requested files, including a summary of my system specs. Please note that, although the port is USB 3, the Canon scanner is USB 2. Also, I left the "Image to PDF" options to their PDFXchange default.

P.S. Your post mentions the Viewer as being deprecated: as indicated in my first post, I use PDFXchange Editor 316.1 (Pro)
Attachments
Scan & OCR speed test 160322.zip
(4.97 MiB) Downloaded 74 times
 
Tracker Supp-Stefan
Site Admin
Posts: 11551
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Re: OCR speed and CPU

Tue Mar 22, 2016 5:04 pm

Hi claude vidal,

Apologies for missing the part where you mention you use the Editor. But the topic itself is in the Viewer section, so that's what provoked my comment in the above post.

OCRing the page in the Editor using File -> New Document -> From Images, and leaving all options but the OCR to defaults produced the attached file in 12 seconds.
I notice that the file you've provided that was created by our Editor actually has two images in it. Maybe the processes the scanner uses when doing the PDF internally and when sending image data to external products are different and this causes the significant increase in processing time.

Can you try at your end and compare the speed of the scanner itself with the speed of the Editor generating the PDFs internally via File -> New Document -> From Images and tell us how it fares that way?

Regards,
Stefan
 
claude vidal
User
Posts: 74
Joined: Wed Mar 09, 2016 12:47 am

Re: OCR speed and CPU

Wed Mar 23, 2016 8:00 pm

The object of my previous tests was created like this:
1- Print a page from an existing PDF document
2- Scan & OCR that page image

I did the following test. I loaded the original PDF document into the Editor and asked it to OCR that same page. I thought bypassing the analog part of scanning a printed page would help. Unfortunately, not by much: it took 50 seconds for that single page with well defined characters. But then you did it in 12 seconds, go figure.

I'm still puzzled how a low-cost scanner can scan, OCR and transfer the same page in 10 seconds: it has neither the CPU cycles nor the memory that my PC has to work with.

Bottom line: I'll stick with the scanner for OCR. This does not take away the great features of PDFXchange; as I said initially, I don't see scanning and OCR as the main focus for PDFXchange in handling PDF.

I may try again with 317. I know you guys had issues, so is availability on for tonight?
 
Tracker Supp-Stefan
Site Admin
Posts: 11551
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Re: OCR speed and CPU

Thu May 12, 2016 11:42 am

Hello claude vidal,

It is indeed rather unusual that you get such long OCR times: my CPU is 3-4 years old now, so it is unlikely to be 3-4 times faster than yours. We have released a new build since we last wrote in this forum topic, so please update to build 317.1 and let us know whether it gives you a different result.

Regards,
Stefan
 
DIV²
User
Posts: 33
Joined: Fri Jun 23, 2017 1:47 am

Re: OCR speed and CPU

Sun Aug 27, 2017 12:52 pm

Here I have tested out the OCR capabilities on a colour 300dpi scan of German text that includes both roman fonts and fraktur (blackletter) fonts.

I compared Adobe Acrobat 7.0 OCR performance with three different accuracy settings in PDF-XChange Editor 6.0.

In summary, Acrobat is always much faster, but Editor is more accurate if either "Medium" or "High" accuracy is chosen:
  • Acrobat, GERMAN/EXACT/600DPI: 7 seconds, very poor accuracy (Note: this has no specific fraktur recognition capability.)
  • Editor, LOW ACCURACY: 60 seconds, poor accuracy
  • Editor, MEDIUM ACCURACY: 24 seconds, good accuracy
  • Editor, HIGH ACCURACY: 30 seconds, good accuracy

As shown, actual accuracy of the results is practically equivalent for the first two and the last two.

Please note that times are for just one single page.
I consider 20–30 seconds to be rather slow for just one page. However, the increased accuracy of the results makes it worthwhile.
60 seconds for a single page is completely impractical, especially when the results are poor.
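
To put the single-page timings above side by side, here is a small sketch that expresses each Editor run as a multiple of the Acrobat time. The numbers are the ones listed above; the names in the code are only illustrative.

```python
# Single-page timings from the list above, in seconds.
ACROBAT_S = 7
editor_runs = {"low": 60, "medium": 24, "high": 30}


def slowdown(editor_s: float, baseline_s: float = ACROBAT_S) -> float:
    """How many times longer an Editor run took than the baseline."""
    return editor_s / baseline_s


for setting, seconds in editor_runs.items():
    print(f"{setting:>6}: {seconds} s, {slowdown(seconds):.1f}x Acrobat")
```

On these figures the "Low" setting is the slowest of the three, both in absolute time and relative to Acrobat.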

Due to copyright issues I am not going to post the entire document, but attached hereto are an overview of the page analysed, an enlarged view of the sample text, and various OCR results.

—DIV

N.B. As suggested also elsewhere, the newer versions of Adobe Acrobat can be expected to be much better than the old version (7.0) tested here!
Attachments
Duden_OCR-options.pdf
(5.54 KiB) Downloaded 9 times
Duden_OCR-test-text.png
Duden_OCR-test-page.png
 
Will - Tracker Supp
Site Admin
Posts: 5843
Joined: Mon Oct 15, 2012 9:21 pm
Location: Chemainus, BC

Re: OCR speed and CPU

Mon Aug 28, 2017 7:16 am

Hi DIV,

As per my post in the other topic, this should be addressed in the new OCR.

Thanks,

Best regards

Will Travaglini
Tracker Support
http://www.tracker-software.com
