All,
I own a powerful 8-core (16 threads with Hyper-Threading) 64-bit Windows 10 PC with 32 GB of RAM, and I'd like to put that power to use for OCR.
I've just upgraded my installation to PDF-XChange Editor Plus V8 Build 335.0 with the enhanced OCR plugin.
No matter what settings I choose in the OCR dialog or in Settings/Performance (16 threads), CPU consumption in the Windows 10 Task Manager doesn't rise beyond 35% during OCR.
OCR of larger PDFs should be well suited to parallelization, so I hope there is a way for the OCR plugin to make better use of my compute resources.
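To illustrate why page-by-page OCR lends itself to parallelization, here is a minimal Python sketch. It is not how the OCR plugin works internally; `recognize_page` is a hypothetical stand-in for a real per-page OCR call, and the page count of 64 is an arbitrary assumption. The only point is that pages are independent units of work that can be handed to a pool of workers and reassembled in order.

```python
import os
from concurrent.futures import Executor, ProcessPoolExecutor


def recognize_page(page_number: int) -> str:
    """Hypothetical stand-in for a per-page OCR call (not a real engine API)."""
    # Dummy CPU-bound work in place of rasterizing and recognizing the page.
    checksum = sum(i * i for i in range(20_000)) % 97
    return f"text of page {page_number} (checksum {checksum})"


def ocr_document(page_count: int, executor: Executor) -> list:
    """Recognize every page independently; map() returns results in page order."""
    return list(executor.map(recognize_page, range(1, page_count + 1)))


if __name__ == "__main__":
    workers = os.cpu_count() or 1
    # One worker process per logical processor, so CPU-bound pages run in parallel.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        texts = ocr_document(64, pool)
    print(f"{len(texts)} pages recognized using {workers} worker processes")
```

Because no page depends on another, the work scales with the number of workers until some shared step (loading the file, writing results back) becomes the bottleneck.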
Thanks for your insights
Christoph
How to improve OCR performance
-
- Site Admin
- Posts: 17960
- Joined: Mon Jan 12, 2009 8:07 am
- Location: London
Re: How to improve OCR performance
Hello Christoph,
I am checking with colleagues from the dev team to see if the EOCR engine is affected by these settings, and if not - what can be done.
Season's greetings,
Stefan
-
- Site Admin
- Posts: 2353
- Joined: Thu Jun 30, 2005 4:11 pm
- Location: Canada
Re: How to improve OCR performance
Hi Christoph.
We found an issue that limits the number of threads that can be used for OCR on x64 systems. We will fix it in the upcoming build.
Sorry for the inconvenience and thanks for the report.
Cheers.
Vasyl Yaremyn
Tracker Software Products
Project Developer
Please archive any files posted to a ZIP, 7z or RAR file or they will be removed and not posted.
-
- User
- Posts: 874
- Joined: Tue Jun 26, 2012 1:50 pm
Re: How to improve OCR performance
I only just noticed that I was still using 334, which was limited in its number of OCR threads (3 full load threads maximum). Just tested 336 and happy to say that it makes full use of all my CPU cores now. It creates more threads than CPU cores, which may or may not be intentional? But in the end it speeds up OCR considerably.
-
- User
- Posts: 874
- Joined: Tue Jun 26, 2012 1:50 pm
Re: How to improve OCR performance
Unfortunately with "Fine Page Content" the "Rasterizing" and especially "Applying results of recognition" parts seem to be mostly single-threaded and correspondingly can take a long time to complete.
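The effect of single-threaded stages on total run time can be estimated with Amdahl's law. The sketch below is a generic calculation, not a measurement of the OCR plugin; the parallel fractions are assumed values chosen only for illustration.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Upper bound on speedup when only part of a job can run in parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Assumed shares of the total job that parallelize (illustrative only).
    for fraction in (0.50, 0.90, 0.99):
        speedup = amdahl_speedup(fraction, workers=16)
        print(f"parallel fraction {fraction:.2f}: at most {speedup:.2f}x on 16 workers")
```

For example, if 90% of a job parallelizes perfectly and the remaining 10% (such as a single-threaded "apply results" stage) does not, 16 workers give a speedup of at most 6.4x rather than 16x.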
-
- Site Admin
- Posts: 17960
- Joined: Mon Jan 12, 2009 8:07 am
- Location: London
Re: How to improve OCR performance
Hello Timur,
I will check with Vasyl if there can be any improvements in both of those steps and we will post any further news as soon as we get them!
Cheers,
Stefan
-
- Site Admin
- Posts: 17960
- Joined: Mon Jan 12, 2009 8:07 am
- Location: London
Re: How to improve OCR performance
Hello Timur,
Our devs said that they will investigate what can be done for those two steps of the OCR process, and I've made a ticket for it:
#5101: OCR Performance optimisations for "Fine Page Content" and "Rasterizing" steps of the process
So we will post again here as soon as there is any further news.
Regards,
Stefan
-
- User
- Posts: 1
- Joined: Mon Feb 15, 2021 8:50 am
Re: How to improve OCR performance
Same problem here:
My CPU is a 16-core Ryzen 9 5950X with 32 GB of RAM. I am running PDF-XChange Editor Plus (Version: 9.0 (Build 352.0) (Feb 4 2021; 17:55:44) 64bit) on Windows 10 Home (19041.1.amd64fre.vb_release.191206-1406).
When using OCR, multi-threading is pretty much non-existent. Doing OCR on large files with several hundred pages sometimes takes over half an hour. Overall CPU utilization sits at around 5% the whole time, with only one core (constantly changing) being used at around 30-80%.
My first instinct was that the software is not very good at distributing the pages within a single document over different threads. So I tried OCR on a large number of files simultaneously using batch processing in "PDF-Tools". Same problem: CPU utilization is around 5% and OCR takes forever.
I also tried changing multi-threading in the options from "automatic" to "16 cores" - no effect.
The weird thing is: Every once in a while with some files OCR does suddenly use 16 cores/32 threads at around 95% core-usage and everything works extremely fast and smooth. However, I could not establish any rules behind this behaviour so far (depending on file size or similar). It all seems quite random to me.
For the record: The problem is most annoying when I am using OCR because it does take forever to finish a job. But I have the impression that multi-threading does not work very well in general. For instance, when I am printing a large document to PDF using the "PDF X-Change Standard PDF printer" it also takes a very long time and CPU-utilization is mostly below 5% with only one core doing all the work.
I would be very grateful for a solution to this problem. Looking at my CPU and its extremely low utilization, I assume I could cut the time for many jobs by over 90% if multi-threading worked properly.
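As a rough sanity check of these numbers, the following Python sketch converts an overall utilization figure into an equivalent number of fully busy threads, and estimates how much of a job would have to run in parallel to save 90% of the time. It assumes that the overall figure is an average across all logical processors and that Amdahl's law applies; both helper functions are illustrative, not part of any tool.

```python
def busy_threads(overall_percent: float, logical_processors: int) -> float:
    """Equivalent number of fully loaded threads behind an overall CPU percentage."""
    return overall_percent / 100.0 * logical_processors


def required_parallel_fraction(time_saved: float, workers: int) -> float:
    """Parallel share needed (per Amdahl's law) to save the given fraction of run time.

    Remaining time is (1 - p) + p / workers; solving for p gives the formula below.
    """
    if workers < 2:
        raise ValueError("need at least two workers to save time by parallelizing")
    return time_saved / (1.0 - 1.0 / workers)


if __name__ == "__main__":
    # 5% overall on 32 logical processors (16 cores with SMT).
    print(f"busy threads: {busy_threads(5, 32):.1f}")
    # Share of the job that must parallelize to cut run time by 90% on 32 workers.
    print(f"required parallel fraction: {required_parallel_fraction(0.90, 32):.3f}")
```

Under these assumptions, 5% overall utilization on 32 logical processors corresponds to fewer than two busy threads, and a 90% time saving would require roughly 93% of the job to parallelize perfectly.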
Thanks in advance!
Sincerely,
Chris
-
- Site Admin
- Posts: 17960
- Joined: Mon Jan 12, 2009 8:07 am
- Location: London
Re: How to improve OCR performance
Hello DrStoertebecker,
This subject was discussed at our last meeting with the devs, and they told me that we are currently looking at ways to allow multi-threading to work fully when performing compute-heavy tasks like OCR. Some things still need to be tested to ensure that this will not have a negative impact elsewhere, but we are definitely working on it and will release it as soon as possible (no specific ETA yet)!
Kind regards,
Stefan