Extract OCR, only numeric text

Post by **dataco** » Thu Nov 29, 2012 7:56 am

Hi,

I use the Clarion SDK.

Is it possible to extract (or force) only numeric values when converting?

I've tried using the whitelist option in OCR_Options.SetOptions(PXO_French, OCR_Auto, '0123456789', '', 'C:\cw60\3rdparty\examples\TrackerSP\PXC_OCR\ocrdats', OCR_Image_Autorotate, 300, 0), but it doesn't work!

Any ideas?

Thanks in advance

Thu Nov 29, 2012 6:01 pm

Hi!

I found two problems:

1. The pDataPath parameter should not include ocrdats after the last backslash "\":

'C:\cw60\3rdparty\examples\TrackerSP\PXC_OCR\'

2. I found a bug in the SetOptions method. Please download and unzip the attached files into your 3rdparty or accessory \Libsrc folder or subfolder, overwriting the existing files.

Post by **dataco** » Fri Nov 30, 2012 9:42 am

I forgot to mention that I'm using the Clarion SDK trial.

I did what you said, but when I unzip the files into the Libsrc\Win folder of the Clarion 8 directory, I get compilation errors (see attached image).

Thanks for your answer!

Fri Nov 30, 2012 12:01 pm

Hi!

I'll have to look into why that's happening. I should have an answer later today.

Later:

I think I posted the wrong set of files. Please try the attached instead.

Post by **dataco** » Mon Dec 03, 2012 11:20 am

Hi,

It's compiling now, but the result is the same!

The output PDF file is OK, but when I try to export the PDF file to a text file, the result is an empty file!

If I use the blacklist parameter instead, OCR_Options.SetOptions(PXO_French, OCR_Auto, '', 'ioO', 'C:\cw60\3rdparty\examples\TrackerSP\PXC_OCR\', OCR_Image_Autorotate, 300, 0), everything is OK!

Do you know why?

Mon Dec 03, 2012 11:28 am

Hi!

Yes, I do, and there will be a patch out later today after I finish testing it.

Later: I ran into some problems. I'll have it out tomorrow for certain.

Tue Dec 04, 2012 10:54 pm

Hi!

Not quite yet.

I ran into Access Violations while testing and I'm trying to figure out what's causing that.

It shouldn't take too long.

Post by **dataco** » Wed Dec 05, 2012 12:59 pm

Hi,

I'm very interested in this template, so I look forward to the patch!

Wed Dec 05, 2012 6:42 pm

Hi Koen!

It'll be out tomorrow. I don't have the latest OCR Template Editor build yet; it should arrive later today.

Thu Dec 06, 2012 5:36 pm

Hi!

Please try this version of the OCR class files. Just unzip into your 3rdparty or accessory \Libsrc folder.

I found that the class CLW file did not match the INC file, so I had to correct the CLW file.

It is working here. I have tested with Clarion 6 and 8.

If you run into access violations, I suggest either omitting the DataPath parameter, in which case the \ocrdats folder that should be in your application folder is used, or double-checking that you are passing the correct parent folder of the \ocrdats folder. And don't forget the trailing backslash "\" on the path name.

Post by **dataco** » Mon Dec 10, 2012 3:53 pm

Hi,

Thank you for your library; it works.

But there is a small problem: when I open the output PDF file and save it as a text file, the result is an empty file!

Do you know why?

Mon Dec 10, 2012 5:02 pm

Hi!

Not without more information.

Are you using one of our demos or a program you wrote? If one of ours, which one? Have you changed it in any way?

Please supply a sample PDF file (zipped) that shows this behaviour. Thanks.

Post by **dataco** » Tue Dec 11, 2012 8:57 am

Hi,

I'm using your ocr1demo.app, and I changed the SetOptions line to: OCR_Options.SetOptions(PXO_French, OCR_Auto, '0123456789>+', '', 'C:\Users\Public\Documents\SoftVelocity\Clarion8\accessory\TrackerSP\PXC_OCR\', OCR_Image_Autorotate, 300, 0)

I'm sending you the input file 150.pdf and the output file 150_xs.pdf.

Tue Dec 11, 2012 10:37 am

Wait, I thought you were extracting numeric fields from a rasterized PDF page. ocr1demo.app only makes a rasterized PDF page "searchable" by creating an "invisible" text underlay for it. For that to work, you should omit the whitelist and blacklist parameters.

ocr2demo.app demonstrates field extraction from a rasterized PDF page.
