Hi,
We use the Tracker OCR as a service on a server to process all incoming PDFs of our document management system.
How can I separate the PDFs with images from the "normal" PDFs to reduce the run time?
And what happens to "normal" PDFs if I put these files through the OCR process?
regards Michael
How to separate pdfs with images from "normal" pdfs
-
- User
- Posts: 41
- Joined: Tue Dec 08, 2009 10:44 pm
-
- Site Admin
- Posts: 8624
- Joined: Wed Jan 03, 2018 6:52 pm
Re: How to separate pdfs with images from "normal" pdfs
Hello Michipapa,
Currently there is no method to separate PDFs based on their content.
There is a checkbox in the OCR function, "Skip pages that already contain text content items", which may help in your situation. Note, however, that this option also skips pages that contain both images and base content text, so it may not be a catch-all solution.
For "normal" PDFs that are processed with OCR: if the aforementioned checkbox is checked, they will not be affected and will add minimal time to the process queue. If the checkbox is not checked, you may find that you have a duplicate layer of invisible text on the document.
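To make the effect of that checkbox concrete, here is a minimal sketch of the page selection it implies, as described above. It is only a model of the described behaviour, not Tracker code: `PageInfo`, `pages_to_ocr` and `mixed_pages_skipped` are hypothetical names, and how you obtain the per-page flags is left open.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PageInfo:
    # Hypothetical per-page summary; not part of any Tracker SDK.
    has_text: bool    # page already contains text content items
    has_images: bool  # page contains at least one image


def pages_to_ocr(pages: List[PageInfo], skip_pages_with_text: bool = True) -> List[int]:
    """Return the indexes of the pages that would be sent to OCR."""
    selected = []
    for index, page in enumerate(pages):
        # With the option on, any page that already has text is skipped,
        # even if it also carries an image.
        if skip_pages_with_text and page.has_text:
            continue
        selected.append(index)
    return selected


def mixed_pages_skipped(pages: List[PageInfo]) -> List[int]:
    """Indexes of pages with both text and images, which the option skips."""
    return [i for i, p in enumerate(pages) if p.has_text and p.has_images]
```

For a document with a pure text page, a scanned page and a mixed page, only the scanned page is selected when the option is on, and the mixed page is the one the option misses.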
I hope this helps!
Edit:
I have just brought this to the Dev team, and we have decided to undertake the challenge. I cannot make any promises about a timeline for the function, but if you are ever looking for updates on the progress, please ask any member of our support staff about the ticket number below, and we will be able to assist.
RT #4474
Dan McIntyre - Support Technician
Tracker Software Products (Canada) LTD
+++++++++++++++++++++++++++++++++++
Our Web site domain and email address has changed as of 26/10/2023.
https://www.pdf-xchange.com
Support@pdf-xchange.com
-
- User
- Posts: 41
- Joined: Tue Dec 08, 2009 10:44 pm
Re: How to separate pdfs with images from "normal" pdfs
Hi Daniel,
If you write
>There is a checkbox in the OCR function to "Skip pages that already contain text content items"
which function or parameter of your OCR SDK do you mean? I see this in the GUI of the PDF Editor but not in the OCR option list.
regards Michael
-
- Site Admin
- Posts: 8624
- Joined: Wed Jan 03, 2018 6:52 pm
Re: How to separate pdfs with images from "normal" pdfs
Hello michipapa,
My sincerest apologies, I jumped on this a bit quickly and did not notice that it was an SDK issue.
While this option is available from the End User GUI, I do not believe that it is available from the OCR SDK. With that being said, I've created another feature request for you, this time to add these functions to the SDK products.
#4475: FR: OCR SDK Add more scan options
Hopefully we can add these soon, but until then, I do not have an interim solution for you. I've asked the dev team for more information on this, so should anything come up, or if they find a workaround to help you implement it, I am sure they will let you know.
Dan McIntyre - Support Technician
Tracker Software Products (Canada) LTD
+++++++++++++++++++++++++++++++++++
Our Web site domain and email address has changed as of 26/10/2023.
https://www.pdf-xchange.com
Support@pdf-xchange.com
-
- User
- Posts: 5522
- Joined: Fri Nov 21, 2014 8:27 am
Re: How to separate pdfs with images from "normal" pdfs
Hello Michael,
If you want fine-grained control over the OCR logic, I recommend using it together with the Core API SDK. From what I see on this page, you should have it in the PRO SDK bundle:
https://www.pdf-xchange.com/produc ... ge-pro-sdk
Though I do not know exactly which license you have.
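Independent of any SDK, a rough pre-filter can be sketched with the Python standard library alone. It rests on one PDF fact: drawing text needs a font resource, so a file with no `/Font` entry anywhere cannot contain text and is a candidate for OCR. The reverse does not hold (a file with fonts may still have scanned pages), and encrypted files will not be inspected correctly. The function name `has_font_reference` is an assumption for illustration and not part of any Tracker product.

```python
import re
import zlib

# Matches the raw bytes between the "stream" and "endstream" keywords.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def has_font_reference(pdf_bytes: bytes) -> bool:
    """Heuristic: True if a /Font entry is found anywhere in the file."""
    if b"/Font" in pdf_bytes:
        return True
    # PDF 1.5+ may keep dictionaries inside Flate-compressed object
    # streams, so inflate every stream that happens to be zlib data.
    for match in STREAM_RE.finditer(pdf_bytes):
        try:
            inflated = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate data (for example a JPEG image)
        if b"/Font" in inflated:
            return True
    return False
```

Files for which this returns `False` can go straight to OCR; files for which it returns `True` still need a per-page check, for example through the Core API, before they can safely be skipped.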
Cheers,
Alex
Subscribe at:
https://www.youtube.com/channel/UC-TwAMNi1haxJ1FX3LvB4CQ