OCR Text rotated? Not compatible with Citavi PDF Component

Discussion for the End User use of OCR in PDF-XChange Editor and Viewer

JanKor
User
Posts: 4
Joined: Wed Nov 30, 2016 11:37 am

OCR Text rotated? Not compatible with Citavi PDF Component

Post by JanKor »

Hey there,

I am new to the forum and am having trouble with the Editor's OCR tool.

I am trying to scan books (black and white, 600 dpi), OCR them with the Editor (free version), and then annotate them in Citavi 5.4. But when I do that, there is no way to mark the recognized text in Citavi. In Adobe Reader, annotating is no problem; everything works fine.

After looking at the file, Citavi Support tells me in this forum thread (it's in German: https://support.citavi.com/forum/viewto ... 63&t=13930) that the problem is that the Editor slightly rotates the text during OCR. It doesn't matter whether I add a text layer to the image or create a new, searchable PDF; the Citavi PDF component cannot deal with that.

Now, is there any way to, I don't know, make it so the text isn't rotated? Maybe save to a different pdf standard? Change settings for the OCR? Does this even make sense to you as a possible problem for other pdf viewers or components?

Thanks, Jan
Will - Tracker Supp
Site Admin
Posts: 6815
Joined: Mon Oct 15, 2012 9:21 pm
Location: London, UK

Re: OCR Text rotated? Not compatible with Citavi PDF Compone

Post by Will - Tracker Supp »

Hi Jan,

Thanks for the post - Can you please send us a sample document? This sounds more like an issue with Citavi to me, as the slight rotation shouldn't stop an application from annotating a document (as demonstrated by Adobe). Please send the original document and the result after OCRing.

Also, please advise on the release of the Editor you're using.

Thanks,
If posting files to this forum, you must archive the files to a ZIP, RAR or 7z file or they will not be uploaded.
Thank you.

Best regards

Will Travaglini
Tracker Support (Europe)
Tracker Software Products Ltd.
http://www.tracker-software.com

Re: OCR Text rotated? Not compatible with Citavi PDF Compone

Post by JanKor »

Thanks for the quick reply. I am using version 6.0 build 318.1.

I attached the two documents to this post.

Basically their support said Citavi couldn't deal with rotated text because of problems when selecting multiple sections of text (holding ctrl+selecting text in their case).

Now, with the documents attached, I realized that it is not ALL parts of the text. On some pages Citavi doesn't "see" any text, but on others, individual lines of text can be marked and annotated. Is the rotation of the text happening in individual words, lines or entire pages? Thanks!
Attachments
Buchan.pdf (4.1 MiB): After OCR, text layer added
SKM_224e16112317400.pdf (3.91 MiB): Straight from scanner

Re: OCR Text rotated? Not compatible with Citavi PDF Compone

Post by Will - Tracker Supp »

Hi JanKor,

Thanks for those - From what I can see, the OCR hasn't rotated the text; the image layer was slightly skewed, so the text was placed to match. Any PDF reader should be able to handle annotating that document without issue, and every reader that I've tried has been able to do so (the Editor, the Viewer, Adobe Reader DC & various others). I would go back to the guys from Citavi and mention that they're the only ones that appear to not handle annotating these documents.
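
For anyone curious what "placed to match" looks like in numbers, here is a minimal sketch. It assumes the text placement is described by a standard PDF transformation matrix [a b c d e f], where a pure rotation by an angle t has a = cos t and b = sin t. The function names and sample values are made up for illustration; this is not how the Editor or Citavi actually inspects a file.

```python
import math

def rotation_degrees(matrix):
    """Return the counter-clockwise rotation, in degrees, encoded in a
    PDF-style matrix [a, b, c, d, e, f] (assumes rotation plus uniform scale)."""
    a, b = matrix[0], matrix[1]
    return math.degrees(math.atan2(b, a))

def rotated_matrix(degrees, x=0.0, y=0.0):
    """Build a hypothetical matrix for text rotated by `degrees` at (x, y)."""
    t = math.radians(degrees)
    return [math.cos(t), math.sin(t), -math.sin(t), math.cos(t), x, y]

# Illustrative values only: an upright line and one skewed by half a degree.
upright = [1, 0, 0, 1, 72, 700]
skewed = rotated_matrix(0.5, 72, 700)

print(rotation_degrees(upright))
print(round(rotation_degrees(skewed), 3))
```

A skew of half a degree is barely visible on the page, yet it is enough to make the matrix differ from the upright case, which is the kind of difference a strict reader might trip over.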

Regarding the skewed image when scanning - you may be able to solve this by enabling the deskew option during scanning:

[Two screenshots of the deskew option]

HTH!

Re: OCR Text rotated? Not compatible with Citavi PDF Compone

Post by JanKor »

Thanks,

so just to be clear: when you said the text was placed to match, that does basically mean the text is a little off axis, just the same as the image is? Does this happen with individual words, lines or pages? I think that is what they meant when saying the text is "rotated." Did older versions of XChange Viewer not do that to match the image?

Regarding the deskewing option, that would only help when scanning with your Editor and not with already scanned PDFs, right? Because at the moment that is not an option for me. I also don't think that it would help, because honestly my scans are pretty straight for the most part. This problem with Citavi probably will not be solved by a better scan image. As you said, all other readers deal with it fine. To my eyes, some of these scans are perfectly straight and Citavi won't recognize the text.

Thanks for your reply!

Re: OCR Text rotated? Not compatible with Citavi PDF Compone

Post by Will - Tracker Supp »

No problem :)

Quote: so just to be clear: when you said the text was placed to match, that does basically mean the text is a little off axis, just the same as the image is?

It's very hard to tell, because it's only a very slight rotation, but it does appear to be that way (unless my eyes are being tricked, which is entirely possible!). I had to remove the underlying image layer and change the text colour to black, then compare the two documents to see it.
(If you want information on how to do that, please see here: https://www.pdf-xchange.com/knowle ... the-Editor ).

Quote: Does this happen with individual words, lines or pages? I think that is what they meant when saying the text is "rotated."

The way that the text is placed will depend entirely on the image being recognized. Sometimes it might be individual words, sometimes lines and sometimes the entire page.

Quote: Regarding the deskewing option, that would only help when scanning with your Editor and not with already scanned pdfs right?

That's correct. I believe that we're looking to implement an 'after-the-fact' deskew feature, but I don't know for sure and wouldn't be able to get a definite implementation date for now.

Quote: This problem with Citavi probably will not be solved by a better scan image. As you said, all other readers deal with it fine. To my eyes, some of these scans are perfectly straight and Citavi won't recognize the text.

A perfectly straight scan might help, but that's not really practical because it's often difficult to get a perfect scan, especially when the paper moves after closing the lid on the scanner.

Quote: To my eyes, some of these scans are perfectly straight and Citavi won't recognize the text.

I agree completely - Citavi should be able to handle highlighting text like this, so it's not something that we can help with and is definitely something that they need to look at. Given that they appear to want to lay the blame on us, I suspect that they're not going to be particularly receptive to my comments, so I'd recommend that you mention to them that all other 'mainstream' readers, including Adobe, work perfectly with this.

Cheers,

Re: OCR Text rotated? Not compatible with Citavi PDF Compone

Post by JanKor »

Hey,

just to follow up: things are still not working out, but as we have all concluded, the problem is rooted in the way Citavi handles rotated text in PDFs. Actually, because of all the examples that I showed them and because I kept pressing the issue, they will try larger tolerances for rotated text in their next beta, which is great.
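
To illustrate what a "larger tolerance" could mean, here is a rough sketch. It assumes a reader decides whether a line of text counts as horizontal by comparing the angle of its baseline with a threshold. How Citavi really does this is not known from this thread, so the function names, coordinates and threshold values are all hypothetical.

```python
import math

def baseline_angle(x0, y0, x1, y1):
    """Angle in degrees of a text baseline running from (x0, y0) to (x1, y1)."""
    return math.degrees(math.atan2(y1 - y0, x1 - x0))

def is_effectively_horizontal(angle_deg, tolerance_deg):
    """True if the baseline lies within tolerance_deg of horizontal."""
    return abs(angle_deg) <= tolerance_deg

# Hypothetical line: 400 units wide, rising 2 units from left to right,
# which works out to a skew of roughly 0.29 degrees.
angle = baseline_angle(100, 500, 500, 502)

strict = is_effectively_horizontal(angle, tolerance_deg=0.1)   # rejected
lenient = is_effectively_horizontal(angle, tolerance_deg=1.0)  # accepted
print(round(angle, 2), strict, lenient)
```

With the strict threshold the line is treated as rotated and skipped; with the lenient one it is selectable, which matches the behaviour of some lines being markable and others not.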

Their support team is really a bunch of nice guys. I also showed them this thread, and they explicitly asked me to clear something up with you guys over here: they never wanted it to seem like they were putting the blame on the Editor or your OCR tool, and they fully acknowledge responsibility for how annotations are handled in Citavi. So there, did it.

Have a great day and thanks again!

Re: OCR Text rotated? Not compatible with Citavi PDF Compone

Post by Will - Tracker Supp »

Hi Jan,

Awesome, glad to hear that! That's great and I hope they didn't feel I was being offensive? It just appeared that initially they were directing it toward us, but I may have misunderstood.

Anyway, have a great weekend :)