How to remove added text layer

Discussion for the End User use of OCR in PDF-XChange Editor and Viewer

Moderators: TrackerSupp-Daniel, Tracker Support, Paul - Tracker Supp, Vasyl-Tracker Dev Team, Chris - Tracker Supp, Sean - Tracker, Ivan - Tracker Software, Tracker Supp-Stefan

enaef
User
Posts: 46
Joined: Sat Apr 02, 2011 1:53 pm

How to remove added text layer

Post by enaef »

Hi

Somewhere (was it in the newsletter?) I read that the "convert to image only" option makes it impossible to remove the text. This probably means that, with the "preserve the original content ..." option, the text layer can be removed. How is this done?
Until now I have kept the original files in case more advanced OCR functionality becomes available in the future.
If I consistently use the "preserve the original content ..." option and am also able to remove the text layer, I won't need to keep the original (non-OCRed) file anymore ...

Thanks, Ernst
Tracker Supp-Stefan
Site Admin
Posts: 17815
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Re: How to remove added text layer

Post by Tracker Supp-Stefan »

Hello Ernst,

The "convert to image only" option flattens all the contents of a page into a single image: e.g. if the page contained images and a typewriter annotation, they will all become one image, which is then OCRed.

If you decide to preserve the original content, any machine-recognizable text will remain as such.
In both cases the OCRed text is placed "on top" in a new invisible layer.
Currently there is no way to remove this layer in our products, but as you are aware, a more advanced set of OCR features is coming.

For now I would recommend keeping a non-OCRed copy just in case. Once the new functions become available, you can decide for yourself whether the originals are still needed.

Best,
Stefan
Ludwig
User
Posts: 17
Joined: Sun Feb 24, 2013 1:52 pm

Re: How to remove added text layer

Post by Ludwig »

Hi Stefan,

I am desperately waiting for this new OCR feature. I would very much like to have the option to remove existing text layers (sometimes the text layers of given files are wrong and I would like to replace them), but not by turning the file into a mere image file that is much larger than the original. Can you say when such a new (but very essential) feature will be implemented?

Best regards
Ludwig
Tracker Supp-Stefan
Site Admin
Posts: 17815
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Re: How to remove added text layer

Post by Tracker Supp-Stefan »

Hi Ludwig,

Actually, using the new PDF-X Editor:
https://www.pdf-xchange.com/product ... nge-editor
you can modify the base contents of a file and even remove unwanted components, so do give it a try.
The advanced OCR tool, which will let you pre-select some such operations to be performed as part of the OCR process, is coming a bit later.

Regards,
Stefan
Ludwig
User
Posts: 17
Joined: Sun Feb 24, 2013 1:52 pm

Re: How to remove added text layer

Post by Ludwig »

Hi,

I just want to double-check: you recommended the PDF-X Editor and editing the content. Maybe I misunderstood something: does this mean that the PDF-X Editor can delete existing OCR layers too? I don't want to modify any base content, only the text layers on top. If such a feature already exists, I simply cannot find it (I did find the "Edit Content Tool", though).

Thanks a lot, Ludwig
Tracker Supp-Stefan
Site Admin
Posts: 17815
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Re: How to remove added text layer

Post by Tracker Supp-Stefan »

Hi Ludwig,

Once the OCR text layer is added to a document, it is added as a "base" content element and not as, e.g., an annotation. So yes, using the "Edit Content Tool" you should be able to select and remove that invisible text object.
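To illustrate what an "invisible text object" in the base content looks like at the file level: in the PDF specification, text rendering mode 3 (the `3 Tr` operator) paints neither fill nor stroke, and OCR text layers are commonly written this way. The Python sketch below (standard library only) drops every `BT` ... `ET` text object that sets `3 Tr` from a page content stream while leaving image drawing untouched. It is an illustration under stated assumptions, not how PDF-XChange Editor works: it assumes the stream is already decompressed, that each invisible text object sets `3 Tr` inside its own `BT` ... `ET` block, and that no string literal contains a bare `ET` token. The helper name `strip_invisible_text` is hypothetical.

```python
import re

# One text object: a "BT" operator up to the next "ET" operator.
# Assumption: no string literal inside the block contains a bare "ET" token.
TEXT_OBJECT = re.compile(rb"\bBT\b.*?\bET\b", re.DOTALL)

# Text rendering mode 3 ("3 Tr"): neither fill nor stroke, i.e. invisible text.
INVISIBLE_MODE = re.compile(rb"\b3\s+Tr\b")


def strip_invisible_text(content: bytes) -> bytes:
    """Remove every BT...ET text object that sets render mode 3.

    Hypothetical helper for an uncompressed page content stream; visible
    text objects and all non-text operators (e.g. image "Do") are kept.
    """
    def keep_or_drop(match: re.Match) -> bytes:
        block = match.group(0)
        return b"" if INVISIBLE_MODE.search(block) else block

    return TEXT_OBJECT.sub(keep_or_drop, content)


if __name__ == "__main__":
    page = (
        b"q 600 0 0 800 0 0 cm /Im1 Do Q\n"
        b"BT /F1 12 Tf 3 Tr 72 700 Td (hidden OCR text) Tj ET\n"
        b"BT /F1 12 Tf 72 650 Td (visible text) Tj ET\n"
    )
    print(strip_invisible_text(page).decode())
```

Because the render mode is part of the graphics state, a file that sets `3 Tr` once outside a text object would slip past this sketch; handling that, and applying the change to every page of a several-hundred-page document, would need a real content-stream parser and a PDF library to decompress and rewrite each page.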

Regards,
Stefan
Ludwig
User
Posts: 17
Joined: Sun Feb 24, 2013 1:52 pm

Re: How to remove added text layer

Post by Ludwig »

Hi Stefan,

As far as I can tell, only preselected sections of a given page can be edited this way, which does not work for a document of several hundred pages. Furthermore, the file size grows when deleting such information. I also have to admit that it took me a while to figure out how to use the editor for my purpose. Coming from "Menu - Tools - Content Editing Tools - Edit Content Tool", I miss a selection of the different things that can be edited and the ways to edit them. "Editing" can mean a lot of things.

So thank you for suggesting the editor, but I think I will have to wait for the advanced OCR feature.

Best regards
Ludwig
Tracker Supp-Stefan
Site Admin
Posts: 17815
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Re: How to remove added text layer

Post by Tracker Supp-Stefan »

Hi Ludwig,

Thanks for trying and understanding. We are already working on the new advanced OCR tool.

Regards,
Stefan
David.P
User
Posts: 1510
Joined: Thu Feb 28, 2008 8:16 pm

Re: How to remove added text layer

Post by David.P »

Hi @all,

I just found this thread by Ludwig and would like to +1 it.
Ludwig wrote: "I am desperately waiting for this new OCR feature. I would very much like to have the option to remove existing text layers (sometimes the text layers of given files are wrong and I would like to replace them), but not by turning the file into a mere image file that is much larger than the original. Can you say when such a new (but very essential) feature will be implemented?"
I have also "desperately" been looking for ways to remove so-called "renderable" text (layers) from PDF files.

For example, I often have 500-page scanned PDFs that are only around 10 MB INCLUDING an OCR text layer, which I'd like to remove for certain reasons. When re-printing the file to PDF, however, I always seem to end up with something 5 to 10 times bigger (and that is without the text layer).

So has there been any progress on this (i.e. removing text layers from entire documents)?

Regards David.P
David.P
PDF-XChange Pro
Tracker Supp-Stefan
Site Admin
Posts: 17815
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Re: How to remove added text layer

Post by Tracker Supp-Stefan »

Hello David,

When you go to Document -> OCR and select "create new searchable PDF", this effectively rasterizes the original file and then OCRs it. The result is a new file in which each page has only a single raster image as its background and the new OCR text layer on top.

Regards,
Stefan
David.P
User
Posts: 1510
Joined: Thu Feb 28, 2008 8:16 pm

Re: How to remove added text layer

Post by David.P »

Yes, thanks Stefan. However, the "create new searchable PDF" feature again changes the color space and/or resolution of the bitmaps in the original file, which in many cases makes the file much larger and possibly even less sharp, dpi-wise.

The "Add Text Layer" function, on the other hand, does not do that, but it also does not rasterize any vector objects (or text).

So it seems that there is still no way to remove/convert/rasterize all (hidden and/or visible) text in a PDF file while keeping existing bitmap compression untouched.

Best regards
David
Will - Tracker Supp
Site Admin
Posts: 6815
Joined: Mon Oct 15, 2012 9:21 pm
Location: London, UK

Re: How to remove added text layer

Post by Will - Tracker Supp »

Hi David,

I'm afraid you're right: this is not currently possible, but it may be something that we can look into implementing in the future.

Cheers,
If posting files to this forum, you must archive the files to a ZIP, RAR or 7z file or they will not be uploaded.
Thank you.

Best regards

Will Travaglini
Tracker Support (Europe)
Tracker Software Products Ltd.
http://www.tracker-software.com
Ludwig
User
Posts: 17
Joined: Sun Feb 24, 2013 1:52 pm

Re: How to remove added text layer

Post by Ludwig »

Hi there,

I am still hoping that such an essential feature will be implemented one day. I would like to combine this with another request/suggestion: very often a book includes several languages (at least in the humanities and sciences). This becomes a problem especially when an English or German book has long Greek passages: when applying English or German OCR to the whole book, the Greek parts just produce rubbish. At the moment I keep two versions of such a book, for instance one OCRed in English and one in Greek.

I would like to OCR a book in its main language and then "re-OCR" selected parts afterwards in another language. For this I suggest implementing a way to remove/erase the text layer not only for the whole book but also for selected parts of a page. This also means that it should be possible to OCR not only whole pages but also selected parts of a page.

Thanks and best regards
Ludwig
Will - Tracker Supp
Site Admin
Posts: 6815
Joined: Mon Oct 15, 2012 9:21 pm
Location: London, UK

Re: How to remove added text layer

Post by Will - Tracker Supp »

Hi Ludwig,

Thanks for that - I'll make sure the suggestion is passed along for consideration.

Cheers,