Extract OCR, only numeric text

PDF-X OCR SDK is a new product from us, intended to complement our existing PDF and imaging tools and to provide developers with an expanding set of professional tools for optical character recognition tasks.

dataco
User
Posts: 6
Joined: Mon Nov 26, 2012 3:11 pm

Extract OCR, only numeric text

Post by dataco »

Hi,

I use the Clarion SDK.

Is it possible to extract only numeric values (or force numeric output) when converting?

I've tried to use the whitelist option in OCR_Options.SetOptions(PXO_French, OCR_Auto, '0123456789', '', 'C:\cw60\3rdparty\examples\TrackerSP\PXC_OCR\ocrdats', OCR_Image_Autorotate, 300, 0) but it doesn't work!

Any ideas?

Thanks in advance
Tracker - Clarion Support
Site Admin
Posts: 64
Joined: Wed Jun 30, 2004 4:45 pm
Location: Maryland, USA
Contact:

Re: Extract OCR, only numeric text

Post by Tracker - Clarion Support »

Hi!

I found two problems:

1. The pDataPath should not include "ocrdats" after the last backslash "\":

'C:\cw60\3rdparty\examples\TrackerSP\PXC_OCR\'

2. I found a bug in the SetOptions method. Please download the attached files and unzip them into your 3rdparty or accessory \Libsrc folder or subfolder, overwriting the existing files.
Attachments
OCR_Libsrc.zip
Modified OCR class files
(10.76 KiB) Downloaded 242 times
Craig Ransom
Tracker Software - Clarion Support
http://www.tracker-software.com
dataco
User
Posts: 6
Joined: Mon Nov 26, 2012 3:11 pm

Re: Extract OCR, only numeric text

Post by dataco »

I forgot to say that I use the trial version of the Clarion SDK.

I did what you said, but when I unzip the files into the Libsrc\Win folder of the Clarion 8 directory, I get compilation errors (see attached image).

Thanks for your answer!
Attachments
Errors.zip
Compilation errors
(294.04 KiB) Downloaded 225 times
Tracker - Clarion Support
Site Admin
Posts: 64
Joined: Wed Jun 30, 2004 4:45 pm
Location: Maryland, USA
Contact:

Re: Extract OCR, only numeric text

Post by Tracker - Clarion Support »

Hi!

I'll have to look into why that's happening. I should have an answer later today.

Later:

I think I posted the wrong set of files. Please try the attached instead.
Craig Ransom
Tracker Software - Clarion Support
http://www.tracker-software.com
dataco
User
Posts: 6
Joined: Mon Nov 26, 2012 3:11 pm

Re: Extract OCR, only numeric text

Post by dataco »

Hi,

It compiles now, but the result is the same!

The output PDF file is OK, but when I try to export the PDF file to a text file, the result is an empty file!

If I use the blacklist parameter instead, OCR_Options.SetOptions(PXO_French, OCR_Auto, '', 'ioO', 'C:\cw60\3rdparty\examples\TrackerSP\PXC_OCR\', OCR_Image_Autorotate, 300, 0), everything works!

Do you know why?
Tracker - Clarion Support
Site Admin
Posts: 64
Joined: Wed Jun 30, 2004 4:45 pm
Location: Maryland, USA
Contact:

Re: Extract OCR, only numeric text

Post by Tracker - Clarion Support »

Hi!

Yes, I do, and there will be a patch out later today after I finish testing it. :D

Later: I ran into some problems. I'll have it out tomorrow for certain.
Craig Ransom
Tracker Software - Clarion Support
http://www.tracker-software.com
Tracker - Clarion Support
Site Admin
Posts: 64
Joined: Wed Jun 30, 2004 4:45 pm
Location: Maryland, USA
Contact:

Re: Extract OCR, only numeric text

Post by Tracker - Clarion Support »

Hi!

Not quite yet. :(

I ran into Access Violations while testing and I'm trying to figure out what's causing that.

It shouldn't take too long.
Craig Ransom
Tracker Software - Clarion Support
http://www.tracker-software.com
dataco
User
Posts: 6
Joined: Mon Nov 26, 2012 3:11 pm

Re: Extract OCR, only numeric text

Post by dataco »

Hi,

I'm very interested in this template, so I look forward to the patch!
Tracker - Clarion Support
Site Admin
Posts: 64
Joined: Wed Jun 30, 2004 4:45 pm
Location: Maryland, USA
Contact:

Re: Extract OCR, only numeric text

Post by Tracker - Clarion Support »

Hi Koen!

Be out tomorrow - I don't have the latest OCR Template Editor build yet. It'll be later today.
Craig Ransom
Tracker Software - Clarion Support
http://www.tracker-software.com
Tracker - Clarion Support
Site Admin
Posts: 64
Joined: Wed Jun 30, 2004 4:45 pm
Location: Maryland, USA
Contact:

Re: Extract OCR, only numeric text

Post by Tracker - Clarion Support »

Hi!

Please try this version of the OCR class files. Just unzip into your 3rdparty or accessory \Libsrc folder.

I found that the class CLW file was not matching the INC file, and had to correct the CLW file.

It is working here. I have tested with Clarion 6 and 8.

If you have problems with access violations, I suggest omitting the DataPath variable, in which case the \ocrdats folder in your application folder is used, or double-checking to make SURE you are using the correct parent folder for the \ocrdats folder. And don't forget the trailing backslash "\" on the path name.
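As a rough illustration of those path rules (parent folder of \ocrdats, trailing backslash, or nothing at all), here is a small Python sketch. It is not part of the SDK: the helper name normalize_data_path is hypothetical, and it only manipulates the string that would be passed as the data path.

```python
from pathlib import PureWindowsPath


def normalize_data_path(raw: str) -> str:
    """Return the parent folder of ocrdats with exactly one trailing backslash.

    Hypothetical helper for illustration only; it is not an SDK function.
    An empty input is returned unchanged, which corresponds to omitting
    the data path and letting the default location be used.
    """
    if not raw:
        return ""
    path = PureWindowsPath(raw)
    # Drop a trailing "ocrdats" component: the parent folder is expected.
    if path.name.lower() == "ocrdats":
        path = path.parent
    text = str(path)
    # PureWindowsPath drops trailing separators, so add one back if needed.
    return text if text.endswith("\\") else text + "\\"


if __name__ == "__main__":
    print(normalize_data_path(r"C:\cw60\3rdparty\examples\TrackerSP\PXC_OCR\ocrdats"))
```

The result is the kind of value the data path argument of SetOptions is described as needing above: the folder that contains \ocrdats, ending in a backslash.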
Attachments
OCR_Libsrc.zip
OCR class files
(10.87 KiB) Downloaded 209 times
Craig Ransom
Tracker Software - Clarion Support
http://www.tracker-software.com
dataco
User
Posts: 6
Joined: Mon Nov 26, 2012 3:11 pm

Re: Extract OCR, only numeric text

Post by dataco »

Hi,

Thank you for your library, it works.

But there is a little problem: when I open the output PDF file and save it as a text file, the result is an empty file!

Do you know why?
Tracker - Clarion Support
Site Admin
Posts: 64
Joined: Wed Jun 30, 2004 4:45 pm
Location: Maryland, USA
Contact:

Re: Extract OCR, only numeric text

Post by Tracker - Clarion Support »

Hi!

Not without more information.

Are you using one of our demos or a program you wrote? If one of ours, which one? Have you changed it in any way?

Please supply a sample PDF file (zipped) that displays this behaviour - thanks.
Craig Ransom
Tracker Software - Clarion Support
http://www.tracker-software.com
dataco
User
Posts: 6
Joined: Mon Nov 26, 2012 3:11 pm

Re: Extract OCR, only numeric text

Post by dataco »

Hi,

I use your ocr1demo.app and I have changed the SetOptions line to: OCR_Options.SetOptions(PXO_French, OCR_Auto, '0123456789>+', '', 'C:\Users\Public\Documents\SoftVelocity\Clarion8\accessory\TrackerSP\PXC_OCR\', OCR_Image_Autorotate, 300, 0)

I am attaching the input file 150.pdf and the output file 150_xs.pdf.
Attachments
150_0.pdf
input
(112.42 KiB) Downloaded 227 times
150_xs.pdf
output
(770 Bytes) Downloaded 234 times
Tracker - Clarion Support
Site Admin
Posts: 64
Joined: Wed Jun 30, 2004 4:45 pm
Location: Maryland, USA
Contact:

Re: Extract OCR, only numeric text

Post by Tracker - Clarion Support »

Wait - I thought you were extracting numeric fields from a rasterized PDF page. ocr1demo.app only makes a rasterized PDF page "searchable" by creating an "invisible" text underlay for it. But for that to work, you should omit the whitelist and blacklist parameters.

ocr2demo.app demonstrates field extraction from a rasterized PDF page.
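If the searchable PDF is produced without a whitelist, as described above, and only the numeric parts of the exported text are wanted, the filtering can be done afterwards on the text itself. The following Python sketch shows that idea only. It is a post-processing workaround, not what the SDK's whitelist option does internally, and the function name keep_whitelisted is hypothetical.

```python
def keep_whitelisted(text: str, whitelist: str = "0123456789") -> str:
    """Keep only whitelisted characters from already exported text.

    Runs of other characters become a single space, so separate numbers
    on a line stay separate. Lines with nothing left are dropped.
    """
    allowed = set(whitelist)
    kept_lines = []
    for line in text.splitlines():
        # Replace every non-whitelisted character with a space.
        masked = "".join(ch if ch in allowed else " " for ch in line)
        # Collapse the runs of spaces and trim the ends.
        filtered = " ".join(masked.split())
        if filtered:
            kept_lines.append(filtered)
    return "\n".join(kept_lines)


if __name__ == "__main__":
    sample = "Total: 150 EUR\nno digits here\nRef 42>+7"
    print(keep_whitelisted(sample, "0123456789>+"))
```

Note that a decimal point is not in the default whitelist, so a value such as 3.14 would come out as two separate numbers unless "." is added.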
Craig Ransom
Tracker Software - Clarion Support
http://www.tracker-software.com