Get a Scribd book as a searchable PDF with PDF Xchange Editor

Discussion for the End User use of OCR in PDF-XChange Editor and Viewer

Mitch
User
Posts: 4
Joined: Mon May 01, 2017 3:14 am

Get a Scribd book as a searchable PDF with PDF Xchange Editor

Post by Mitch »

Create a searchable PDF from any Scribd book with PDF Xchange Editor
How? Use software to automate flipping the pages and taking screenshots, then use PDF Xchange to create a high-quality searchable PDF.

Windows or Mac

I need my books to be available offline, and Scribd Premium's offline storage often fails. The solution is to capture the pages from the screen and create a searchable PDF. It's for personal use. I tried OSX Automator, Abbyy Finereader, Acrobat DC Professional, and ePub. All of these have issues with readability or OCR accuracy.

Here we go:

Log in to Scribd and open the book you want to read.

Step 1: Screenshot the pages as PNG with Keyboard Maestro (Mac) or AutoHotkey (Windows). AutoHotkey is freeware; Keyboard Maestro is paid software with a free trial. Make sure the screenshots are taken in full screen.

Keyboard Maestro for Mac:
Keyboard Maestro settings
AutoHotkey for Windows
This is a bit more complex and involves a script:

^!R:: ; Ctrl+Alt+R runs the script
Loop, 400 ; repeat 400 times; set this to the number of screenshots you need
{
    Send, +{PrintScreen} ; keystroke Shift+PrintScreen takes the screenshot
    Sleep, 5000 ; wait 5 seconds (Sleep pauses the script; SetKeyDelay only changes the delay between sent keys)
    Send, {Right} ; keystroke Right flips to the next page
    Sleep, 5000 ; wait 5 seconds for the page to load
}
return
For more info check https://autohotkey.com/board/topic/5811 ... re-script/
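To plan a capture run, the loop count and delays from the script translate into a total run time. The following is a rough sketch, assuming each iteration costs only the two 5-second pauses; `capture_minutes` is a hypothetical helper name, not part of AutoHotkey.

```python
def capture_minutes(iterations, delay_after_shot_s=5.0, delay_after_flip_s=5.0):
    """Approximate run time in minutes for the screenshot loop."""
    if iterations < 0:
        raise ValueError("iterations must not be negative")
    # Each iteration waits once after the screenshot and once after the page flip.
    return iterations * (delay_after_shot_s + delay_after_flip_s) / 60


# 400 iterations x 10 seconds = 4000 seconds
print(round(capture_minutes(400), 1))  # 66.7
```

So a 400-iteration run takes a bit over an hour, during which the browser window has to stay in the foreground.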

Step 2: Batch conversion and rename with XnView (Mac) or IrfanView (Windows).
PDF Xchange can also sharpen scanned images, but not in batch. That's why I use XnView or IrfanView.

XnView for Mac
Choose Tools - Batch Convert and set the actions below under the second tab:
XnView settings
IrfanView for Windows
Choose Menu - Batch conversion and rename
a. PNG compression level 6
b. Crop to 1170 x 770 (optional; this removes the grey Scribd borders. The size is based on a MacBook screen resolution of 1440 x 900. For higher-quality screenshots, which require a higher screen resolution, get SwitchResX for Mac).
c. Sharpen 10, Contrast 15
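If your screen resolution differs from 1440 x 900, the crop rectangle has to be recalculated. Here is a minimal sketch of that arithmetic, assuming the page area is centred in the screenshot (the post does not state the crop offsets, so check this against one of your own screenshots); `centered_crop_box` is a hypothetical helper name.

```python
def centered_crop_box(screen_w, screen_h, crop_w, crop_h):
    """Return (left, top, right, bottom) of a crop centred in the screenshot."""
    if crop_w > screen_w or crop_h > screen_h:
        raise ValueError("crop is larger than the screenshot")
    # Assumption: the grey borders are equally wide on opposite sides.
    left = (screen_w - crop_w) // 2
    top = (screen_h - crop_h) // 2
    return left, top, left + crop_w, top + crop_h


print(centered_crop_box(1440, 900, 1170, 770))  # (135, 65, 1305, 835)
```

The left and top values are the X and Y offsets for a crop dialog, with 1170 and 770 as the width and height.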

Step 3: Image to PDF with PDF Xchange Editor (V6 build 321)
a. File - New Document from Image files
Go to Options
b. Select Paper size from Image size (under New Page Options)
c. Fit Image to Cell (centre-middle) under Images Layout Options
d. Flate compression for all image types (True color, Grayscale, etc.) under Image compression
e. Set OCR to Medium under Image Postprocessing. You can also skip this setting at first, process the images to check the quality of the scans, and then make the document searchable with the desired OCR accuracy via Menu - Document - OCR.
Image options
After setting the options above, click OK and then OK again to process the images. PDF Xchange will now OCR (recognize) the images you selected.
Alternatively, skip the OCR step for now and run it afterwards via Menu - Document - OCR.
PDF Xchange OCR process
Step 4: Split Pages
a. Split pages with PDF Xchange Editor
b. Menu - Document - Split Pages
c. Click on the icon to select a vertical split at 50%; a dotted vertical red line then appears in the preview
d. Select Remove Source pages
e. Select Change physical size
Split Pages
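To sanity-check the result of the split, the geometry can be worked out ahead of time. This is only a sketch of the arithmetic, not of what PDF Xchange does internally; it assumes each screenshot shows a two-page spread cropped to 1170 x 770, and the function names are hypothetical.

```python
def split_spread(width, height, split_fraction=0.5):
    """Return the (left, top, right, bottom) boxes of the two halves of a spread."""
    cut = round(width * split_fraction)  # x position of the vertical split line
    return (0, 0, cut, height), (cut, 0, width, height)


def pages_from_spread(spread_number):
    """Map the 1-based spread number to the two page numbers after the split."""
    return 2 * spread_number - 1, 2 * spread_number


left_page, right_page = split_spread(1170, 770)
print(left_page, right_page)  # (0, 0, 585, 770) (585, 0, 1170, 770)
print(pages_from_spread(3))   # (5, 6)
```

With these assumptions, every 1170 x 770 spread becomes two 585 x 770 pages, so the final PDF has twice as many pages as there are screenshots.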
That's it. Good luck.

Mitch
Last edited by Mitch on Sat Jul 22, 2017 11:19 am, edited 4 times in total.
Tracker Supp-Stefan
Site Admin
Posts: 17818
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Re: Get a Scribd book as a searchable PDF with PDF Xchange Editor

Post by Tracker Supp-Stefan »

Hello Mitch,

Many thanks for this tutorial!
Hope other people will find it useful as well!

Cheers,
Stefan