Extract OCR, only numeric text
Hi,
I use the Clarion SDK.
Is it possible to extract only (or force) numeric values when converting?
I've tried to use the whitelist option in OCR_Options.SetOptions(PXO_French, OCR_Auto, '0123456789', '', 'C:\cw60\3rdparty\examples\TrackerSP\PXC_OCR\ocrdats', OCR_Image_Autorotate, 300, 0) but it doesn't work!
Any ideas?
Thanks in advance
-
- Site Admin
- Posts: 64
- Joined: Wed Jun 30, 2004 4:45 pm
- Location: Maryland, USA
- Contact:
Re: Extract OCR, only numeric text
Hi!
I found two problems:
1. The pDataPath should not include ocrdats after the last backslash "\"; it should end at the parent folder:
'C:\cw60\3rdparty\examples\TrackerSP\PXC_OCR\'
2. I found a bug in the SetOptions method. Please download the attached files and unzip them into your 3rdparty or accessory \Libsrc folder or subfolder, over the existing files.
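For reference, the first fix applied to the call from the original post might look like the sketch below. The argument order and every value other than the data path are copied unchanged from that post. What the remaining arguments mean is not stated in this thread, so treat the comments as assumptions.

```clarion
  ! Sketch only: same call as in the original post, with the data path changed
  ! so that it ends at the parent folder of \ocrdats, trailing backslash included.
  ! Third argument = whitelist, fourth = blacklist (as they are used in this thread).
  OCR_Options.SetOptions(PXO_French, OCR_Auto, '0123456789', '', |
      'C:\cw60\3rdparty\examples\TrackerSP\PXC_OCR\', |
      OCR_Image_Autorotate, 300, 0)
```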
- Attachments
-
- OCR_Libsrc.zip
- Modified OCR class files
- (10.76 KiB) Downloaded 252 times
Re: Extract OCR, only numeric text
I forgot to say that I use the trial version of the Clarion SDK.
I did what you said, but when I unzip the files into the Libsrc\Win folder of the Clarion 8 directory, I get compilation errors (see attached image).
Thanks for your answer!
- Attachments
-
- Errors.zip
- Compilation errors
- (294.04 KiB) Downloaded 229 times
-
- Site Admin
- Posts: 64
- Joined: Wed Jun 30, 2004 4:45 pm
- Location: Maryland, USA
- Contact:
Re: Extract OCR, only numeric text
Hi!
I'll have to look into why that's happening. I should have an answer later today.
Later:
I think I posted the wrong set of files. Please try the attached instead.
Re: Extract OCR, only numeric text
Hi,
It's compiling now, but the result is the same!
The output PDF file is OK, but when I try to export the PDF file to a text file, the result is an empty file!
If I use the blacklist parameter instead, as in OCR_Options.SetOptions(PXO_French, OCR_Auto, '', 'ioO', 'C:\cw60\3rdparty\examples\TrackerSP\PXC_OCR\', OCR_Image_Autorotate, 300, 0), everything works!
Do you know why?
-
- Site Admin
- Posts: 64
- Joined: Wed Jun 30, 2004 4:45 pm
- Location: Maryland, USA
- Contact:
Re: Extract OCR, only numeric text
Hi!
Yes, I do, and there will be a patch out later today after I finish testing it.
Later: I ran into some problems. I'll have it out tomorrow for certain.
-
- Site Admin
- Posts: 64
- Joined: Wed Jun 30, 2004 4:45 pm
- Location: Maryland, USA
- Contact:
Re: Extract OCR, only numeric text
Hi!
Not quite yet.
I ran into Access Violations while testing and I'm trying to figure out what's causing that.
It shouldn't take too long.
Re: Extract OCR, only numeric text
Hi,
I'm very interested in this template, so I look forward to the patch!
-
- Site Admin
- Posts: 64
- Joined: Wed Jun 30, 2004 4:45 pm
- Location: Maryland, USA
- Contact:
Re: Extract OCR, only numeric text
Hi Koen!
Be out tomorrow - I don't have the latest OCR Template Editor build yet. It'll be later today.
-
- Site Admin
- Posts: 64
- Joined: Wed Jun 30, 2004 4:45 pm
- Location: Maryland, USA
- Contact:
Re: Extract OCR, only numeric text
Hi!
Please try this version of the OCR class files. Just unzip into your 3rdparty or accessory \Libsrc folder.
I found that the class CLW file was not matching the INC file, and had to correct the CLW file.
It is working here. I have tested with Clarion 6 and 8.
If you have problems with access violations, I suggest omitting the DataPath variable, in which case the \ocrdats folder in your application folder is used, or double-checking to make SURE you are using the correct parent folder for the \ocrdats folder. And don't forget the trailing backslash "\" on the path name.
- Attachments
-
- OCR_Libsrc.zip
- OCR class files
- (10.87 KiB) Downloaded 215 times
Re: Extract OCR, only numeric text
Hi,
Thank you for your library, it works.
But there is a small problem: when I open the output PDF file and save it as a text file, the result is an empty file!
Do you know why?
-
- Site Admin
- Posts: 64
- Joined: Wed Jun 30, 2004 4:45 pm
- Location: Maryland, USA
- Contact:
Re: Extract OCR, only numeric text
Hi!
Not without more information.
Are you using one of our demos or a program you wrote? If one of ours, which one? Have you changed it in any way?
Please supply a sample PDF file (zipped) that displays this behaviour - thanks.
Re: Extract OCR, only numeric text
Hi,
I use your ocr1demo.app, and I have changed the SetOptions line to: OCR_Options.SetOptions(PXO_French, OCR_Auto, '0123456789>+', '', 'C:\Users\Public\Documents\SoftVelocity\Clarion8\accessory\TrackerSP\PXC_OCR\', OCR_Image_Autorotate, 300, 0)
I am sending you the input file 150.pdf and the output file 150_xs.pdf.
- Attachments
-
- 150_0.pdf
- input
- (112.42 KiB) Downloaded 231 times
-
- 150_xs.pdf
- output
- (770 Bytes) Downloaded 241 times
-
- Site Admin
- Posts: 64
- Joined: Wed Jun 30, 2004 4:45 pm
- Location: Maryland, USA
- Contact:
Re: Extract OCR, only numeric text
Wait - I thought you were extracting numeric fields from a rasterized PDF page. ocr1demo.app only makes a rasterized PDF page "searchable" by creating an "invisible" text underlay for it. But for that to work, you should omit the whitelist and blacklist parameters.
ocr2demo.app demonstrates field extraction from a rasterized PDF page.