Default image recoding settings when splitting a page

Forum for the PDF-XChange Editor - Free and Licensed Versions

Jensen Head
User
Posts: 430
Joined: Mon Sep 13, 2021 8:12 am

Post by Jensen Head »

When I create a new PDF document consisting only of raster images, the resulting document is approximately the same size as the combined size of the original images. For example, from 16 large screenshots (40000+ pixels on the longest side) of messenger conversations with a total size of 108555815 bytes, I get a PDF document of 108565623 bytes. That is an increase of only 9808 bytes (about 9.6 KiB, or 0.01%), which convincingly indicates that the images are not transcoded when embedded in the PDF. A visual comparison of the screenshots and the pages of the PDF document also reveals no differences.
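
The size comparison above can be checked with a few lines of Python. This is only a sanity check of the arithmetic; the two byte counts are the ones quoted in the post, and the variable names are made up for illustration.

```python
# Byte counts quoted in the post above.
images_total = 108_555_815   # combined size of the 16 source screenshots
pdf_size = 108_565_623       # size of the PDF built from them

# Overhead added by the PDF container (page objects, xref table, etc.).
overhead = pdf_size - images_total

print(overhead)                           # 9808
print(f"{overhead / 1024:.2f} KiB")       # 9.58 KiB
print(f"{overhead / images_total:.4%}")   # 0.0090%
```

An overhead this small is consistent with the image streams being embedded as they are, rather than re-encoded.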

However, when I cut the pages in half (to be able to run OCR), I find that the document size is reduced by half and each resulting half-page shows a significant loss in quality. Visual artifacts appear around all high-contrast elements (tables, graphs, text).

I saved the image before cutting and after cutting the page to disk in JPEG format. The IrfanView application displays the following in the properties of these files:

original:
JPEG, progressive, quality: 70, subsampling OFF
Number of unique colors: 187417

after splitting:
JPEG, progressive, quality: 70, subsampling OFF
Number of unique colors: 131079

I can assume that either the image is re-encoded after cutting at the same JPEG quality as the original image, or all (?) images saved from the document are re-encoded at 70% quality.

I did not find a way in the application settings to specify at what quality the split images should be encoded. Can you help me find this setting, or suggest another way to split pages that are too long for recognition without degrading the quality of images (which, among other things, leads to deterioration in the quality of OCR)?
Tracker Supp-Stefan
Site Admin
Posts: 17960
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Re: Default image recoding settings when splitting a page

Post by Tracker Supp-Stefan »

Hello Jensen Head,

When you split a page, we have to split the images on it as well and create new ones. That obviously requires re-encoding the image, and while I do not have the exact settings, I would suspect that the defaults are the same as those set up for "File -> New Document -> From images".
However, that means recompressing an image that was already compressed with a lossy algorithm, so the image quality would indeed drop.
If you expect to need to manipulate images, please do that before you add them to the Editor, as we do not have the same full set of image processing tools that dedicated image software would have.

Kind regards,
Stefan
Jensen Head

Post by Jensen Head »

Tracker Supp-Stefan wrote: Fri Mar 15, 2024 11:11 am
If you expect to need to manipulate images - please do that before you add them to the Editor

Stefan, I didn't understand exactly what you meant by "do that", but if you meant pre-converting the images to a lossless format like PNG or TIFF, then you are absolutely right: even repeated cutting of such images embedded in a PDF document does not degrade their quality. I've tested both dark text on a light background and light text on a dark background, with consistently excellent results. It's a workaround, but it works, and with many automatic graphics format converters available, it's not difficult. Thank you for your help!

(NB: each attachment contains four versions of the same document, from left to right and top to bottom:
1. original - 1 page,
2. split in half - 2 pages,
3. version 2, split in half again - 4 pages,
4. version 3, split in half again - 8 pages)
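
The pre-conversion workaround described above could be scripted along these lines. This is a minimal sketch, assuming the third-party Pillow library is installed; the function names and the file name in the usage note are hypothetical. Converting a JPEG to PNG does not restore detail already lost to JPEG compression; it only avoids a further lossy re-encode later.

```python
from pathlib import Path


def png_path_for(src: Path) -> Path:
    """Return the output path: same folder and name, with a .png extension."""
    return src.with_suffix(".png")


def convert_to_png(src: Path) -> Path:
    """Re-save an image losslessly as PNG and return the new path."""
    # Pillow is imported here so the path helper above works without it.
    from PIL import Image

    dst = png_path_for(src)
    # Note: very large images may trigger Pillow's decompression-bomb check.
    with Image.open(src) as im:
        im.save(dst, format="PNG")
    return dst
```

For example, `convert_to_png(Path("screenshot01.jpg"))` would write `screenshot01.png` next to the source file, which can then be added to the PDF instead of the JPEG.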
Tracker Supp-Stefan

Post by Tracker Supp-Stefan »

Hello Jensen Head,

I had in mind that you would process your images before they reach the Editor, so that they no longer need to be cut there. I was thinking that you would perhaps "cut" them to the desired size in image processing software, and then add them to the Editor.
However, if starting with lossless images allows you to do the rest of the PDF manipulation in the Editor without loss of quality, I am happy if you are happy with that approach as well!

Kind regards,
Stefan
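
The approach Stefan describes, cutting the images to size before adding them to the Editor, could be sketched as follows. This is an illustration only, assuming the third-party Pillow library; `split_boxes`, `split_to_png`, and the `max_height` parameter are hypothetical names, and the pieces are written as PNG so that no lossy re-encode takes place.

```python
from pathlib import Path


def split_boxes(width: int, height: int, max_height: int) -> list[tuple[int, int, int, int]]:
    """Return (left, upper, right, lower) crop boxes covering the image top to bottom."""
    if max_height <= 0:
        raise ValueError("max_height must be positive")
    boxes = []
    top = 0
    while top < height:
        bottom = min(top + max_height, height)  # the last piece may be shorter
        boxes.append((0, top, width, bottom))
        top = bottom
    return boxes


def split_to_png(src: Path, max_height: int) -> list[Path]:
    """Cut a tall image into horizontal strips and save each strip as PNG."""
    # Pillow is imported here so split_boxes can be used without it.
    from PIL import Image

    written = []
    with Image.open(src) as im:
        for i, box in enumerate(split_boxes(im.width, im.height, max_height), start=1):
            dst = src.with_name(f"{src.stem}_part{i:02d}.png")
            im.crop(box).save(dst, format="PNG")
            written.append(dst)
    return written
```

Each resulting strip can then be added as its own page. The cut positions here are purely arithmetic, so a cut may fall through a line of text; choosing them by hand or by detecting blank rows would be a refinement.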