PDF-X OCR SDK Module

 

 

Announcing a New FREE OCR Module for PDF-XChange PRO 2012 SDK

We are pleased to advise our developer toolkit clients that we have released the first fruits of our work on providing OCR functionality within our end user and developer toolkit products. The PDF-X OCR SDK Module is designed to function with our existing PDF-XChange PRO 2012 SDK as a free SDK library enhancement. It allows developers to convert image-based PDF files into fully searchable PDF files while retaining the image-based properties of the original file.

 


OCR Module Usage Requirements

Developers interested in using the new, free, live PDF-X OCR Module must own a license for PDF-XChange PRO 2012 SDK, and must either have purchased it within the last 9 months or have a minimum of 3 months left in their PRO 2012 SDK maintenance subscription. Developers can contact us at sales@tracker-software.com with their licensing information to receive the password for the zipped live PDF-X OCR DLL library, which replaces the demo files* in the PRO 2012 SDK. The live DLL file is available for download on our Developers Downloads page in the PDF-XChange PRO 2012 SDK file section.

*A demo version of the PDF-X OCR SDK module is included in PDF-XChange PRO SDK build 4.0.199 (builds dated later than Oct 14, 2011). It allows users to trial the OCR module's functionality and convert the first 2 pages of any image-based text document to a text-searchable format. A license for PDF-XChange PRO 2012 SDK is required for the live version of the module.
 

Features and functionality provided will be:

  • Add a text layer to existing PDF files to allow full text search capabilities
  • Scan and OCR to make the final PDF text searchable
  • DPI downsampling for improved recognition processing speeds
  • Colour to grayscale internal conversion and image processing for improved recognition accuracy
  • Despeckling / denoising
  • Text extraction to files
  • Auto-rotation of pages for correct recognition
  • Support for all currently supported PDF version formats
  • Multi-language recognition: English, French, Spanish and German are included in the core library
  • Language Extensions package, including Chinese (Simplified and Traditional), Korean, Cyrillic (e.g. Russian, Ukrainian), Italian and numerous others, available on the PDF-X OCR SDK Language Extensions Page
  • Multi-zonal / field support for extracting information fields or text areas from pages
  • Character whitelists and blacklists (field types – e.g. Numbers, Symbols, Letters)
  • Dictionary support to check word validity
  • Low-level access to OCR output properties (e.g. character size, position, text baselines, bounding boxes)
  • Pre-trained on a wide variety of font types/styles
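
As a rough illustration of two items in the list above (colour-to-grayscale conversion and character whitelists), the following Python sketch shows the general idea behind each. It does not use the PDF-X OCR SDK API: the function names are hypothetical, and the luminance weights are the common ITU-R BT.601 coefficients, which may differ from what the module applies internally.

```python
import string


def to_grayscale(pixels):
    """Convert rows of (r, g, b) tuples into rows of 0-255 gray levels.

    Uses BT.601 luminance weights (an assumption, not the SDK's
    documented behaviour).
    """
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in pixels
    ]


def apply_whitelist(text, allowed):
    """Keep only the characters that appear in the whitelist."""
    allowed_set = set(allowed)
    return "".join(ch for ch in text if ch in allowed_set)


if __name__ == "__main__":
    # A 1x3 "image": white, black and pure red pixels.
    image = [[(255, 255, 255), (0, 0, 0), (255, 0, 0)]]
    print(to_grayscale(image))

    # Treat a field as numeric by whitelisting digits only.
    print(apply_whitelist("Inv0ice 12-34", string.digits))
```

A blacklist works the same way with the test inverted (`ch not in blocked_set`). In a real OCR engine the restriction is normally applied during recognition rather than as a post-filter, so this only approximates the effect.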
     

Anticipated Future Enhancements*

  • Standalone SDK – not reliant on a license for PDF-XChange PRO 2012 SDK
  • Font/Style recognition
  • Recognition of complex objects such as tables and embedded images
  • Formatted/editable document output – e.g. save to RTF or XPS
  • Spell-checking
  • Customized/Personal user dictionary
  • User training for fonts, styles, etc.
  • Bar-code recognition
  • Scan image/paper forms to fillable PDF forms

*Note that future enhancements are listed E&OE (errors and omissions excepted) and are subject to change or omission without notice.