How to determine if a PDF is searchable
-
- User
- Posts: 4
- Joined: Tue Aug 12, 2014 7:37 am
How to determine if a PDF is searchable
I have the PDF X-Change PRO SDK that includes the OCR module. I can OCR documents, but I have a large number of documents: some are image-based and need to be OCR'ed, while others are already searchable and do not. Is there a way with the SDK to determine whether a document is already searchable?
-
- Site Admin
- Posts: 6902
- Joined: Wed Mar 25, 2009 10:37 pm
- Location: Chemainus, Canada
Re: How to determine if a PDF is searchable
Hi Arno,
Thanks for the post.
I moved it from the End User OCR forum to the SDK OCR forum.
I am not personally sure how to do this and will have one of the development team advise when they have a spare moment.
Regards
Best regards
Paul O'Rorke
Tracker Support North America
http://www.tracker-software.com
-
- Site Admin
- Posts: 2353
- Joined: Thu Jun 30, 2005 4:11 pm
- Location: Canada
Re: How to determine if a PDF is searchable
Hi, arno.engelbrecht.
One possible way is to check whether any page contains any text, using the code below. HTH
Code:
PDFDocument hDoc;
// open document...

// get the number of pages in the document
DWORD pagesNum = 0;
PXCp_GetPagesCount(hDoc, &pagesNum);

// check for existing text
PXCp_ET_Prepare(hDoc);
bool isSearchable = false;
for (DWORD i = 0; i < pagesNum; i++)
{
    // analyze page i and count the text elements found on it
    PXCp_ET_AnalyzePageContent(hDoc, i);
    DWORD textElementsNum = 0;
    PXCp_ET_GetElementCount(hDoc, &textElementsNum);
    if (textElementsNum != 0)
    {
        // at least one text element: treat the document as searchable
        isSearchable = true;
        break;
    }
}
PXCp_ET_Finish(hDoc);
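Note that the loop stops at the first page that has text, so a document that mixes scanned pages with text pages is reported as searchable. If that matters for your files, a rough sketch of a per-page variant is below. It assumes you drop the `break` and store each page's count (the value returned through `PXCp_ET_GetElementCount`) in a vector; `pagesNeedingOcr` is a hypothetical helper name, not part of the SDK.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not an SDK function): given the number of text
// elements found on each page, return the zero-based indices of the
// pages that contain no text and are therefore candidates for OCR.
std::vector<std::size_t> pagesNeedingOcr(const std::vector<unsigned long>& elementsPerPage)
{
    std::vector<std::size_t> result;
    for (std::size_t i = 0; i < elementsPerPage.size(); ++i)
    {
        if (elementsPerPage[i] == 0)
            result.push_back(i);  // no text elements on this page
    }
    return result;
}
```

An empty result means every page already has some text; otherwise the returned indices are the pages that may still need OCR.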
Vasyl Yaremyn
Tracker Software Products
Project Developer
Please archive any files posted to a ZIP, 7z or RAR file or they will be removed and not posted.
-
- User
- Posts: 4
- Joined: Tue Aug 12, 2014 7:37 am
Re: How to determine if a PDF is searchable
Hi
Thanks a lot. Can I assume that a document is already searchable if I find any text in it, or should I look for a minimum amount of text? Basically, I just want to make sure that a few random characters in a file that isn't actually searchable don't get it treated as searchable.
-
- Site Admin
- Posts: 5219
- Joined: Tue Jun 29, 2004 10:34 am
- Location: United Kingdom
Re: How to determine if a PDF is searchable
Well, that would be down to you: analyse what's returned and decide whether it's usable or not.
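As a rough illustration of one such decision rule (a sketch only, not an SDK feature): assume you have already extracted each page's text into a string by whatever means you use. The helper names and both thresholds below are hypothetical and would need tuning against your own documents.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Count letters and digits only, so stray punctuation or whitespace does
// not make a page look like it has real text. std::isalnum in the default
// "C" locale only recognises ASCII, so non-Latin text needs another test.
std::size_t countAlnum(const std::string& text)
{
    std::size_t n = 0;
    for (unsigned char c : text)
    {
        if (std::isalnum(c))
            ++n;
    }
    return n;
}

// Hypothetical rule: a document counts as searchable when at least
// minPageFraction of its pages hold minCharsPerPage or more letters/digits.
// Both default values are assumptions, not recommendations from Tracker.
bool looksSearchable(const std::vector<std::string>& pageTexts,
                     std::size_t minCharsPerPage = 50,
                     double minPageFraction = 0.8)
{
    if (pageTexts.empty())
        return false;

    std::size_t pagesWithText = 0;
    for (const std::string& text : pageTexts)
    {
        if (countAlnum(text) >= minCharsPerPage)
            ++pagesWithText;
    }
    return static_cast<double>(pagesWithText)
        >= minPageFraction * static_cast<double>(pageTexts.size());
}
```

With these defaults, a file whose pages yield only a couple of stray characters is still sent to OCR, while a file with real text on most pages is skipped.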
If posting files to this forum - you must archive the files to a ZIP, RAR or 7z file or they will not be uploaded - thank you.
Best regards
Tracker Support
http://www.tracker-software.com