PXCp_ET_GetElement returns 0 for position

Posted: Wed Nov 23, 2011 10:47 am
by marcovdlinden
Hi,

We're calling PXCp_ET_GetElement to acquire the text position on the page:

Code:

...
tElement.mask		= PTEM_Text | PTEM_Offsets | PTEM_Matrix | PTEM_TextParams | PTEM_FontInfo;
...
HRESULT h = PXCp_ET_GetElement( pMdData->m_pPdfDoc, dwTemp, &tElement, 0 );
This works well except for some PDF documents, where matrix.e and matrix.f are zero for every element in the document (everything in the matrix is zero except the scaling components .a and .d, which are 1).

Does anyone have an idea why this might be?
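
For anyone trying to spot the affected documents automatically, the symptom described above (a and d equal to 1, everything else 0) can be tested for directly. The following is only a minimal sketch: the `TextMatrix` struct and the helper names are hypothetical stand-ins, and only the field names a to f come from the post. The real SDK matrix type may be laid out differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the SDK's matrix type (assumption);
   only the field names a..f are taken from the post. */
typedef struct {
    double a, b, c, d, e, f;
} TextMatrix;

/* True when value is within a small tolerance of target. */
static bool close_to(double value, double target)
{
    const double eps = 1e-9;
    double diff = value - target;
    return diff < eps && diff > -eps;
}

/* True when the matrix is the pure identity: scaling a and d are 1 and
   every other component, including the translation e/f, is 0. */
static bool is_identity_matrix(const TextMatrix *m)
{
    assert(m != NULL);
    return close_to(m->a, 1.0) && close_to(m->d, 1.0) &&
           close_to(m->b, 0.0) && close_to(m->c, 0.0) &&
           close_to(m->e, 0.0) && close_to(m->f, 0.0);
}
```

A single identity matrix can be legitimate, so this is probably only meaningful when every element of a document reports it, as in the case described here.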

Re: PXCp_ET_GetElement returns 0 for position

Posted: Wed Nov 23, 2011 11:24 am
by Tracker Supp-Stefan
Hello marcovdlinden,

We will need a sample file (and a specific page number if it has more than one page) for testing at our end.

Best,
Stefan

Re: PXCp_ET_GetElement returns 0 for position

Posted: Thu Nov 24, 2011 9:07 am
by marcovdlinden
Checking with the PDF owner; when I get permission I'll add it.

Re: PXCp_ET_GetElement returns 0 for position

Posted: Thu Nov 24, 2011 9:51 am
by Tracker Supp-Stefan
Thanks for the update marcovdlinden,

We will be waiting for your follow-up on this!

Best,
Stefan

Re: PXCp_ET_GetElement returns 0 for position

Posted: Thu Nov 24, 2011 3:22 pm
by marcovdlinden
Got permission, I added the document to my first post.

Re: PXCp_ET_GetElement returns 0 for position

Posted: Thu Nov 24, 2011 3:54 pm
by Tracker Supp-Stefan
Thanks marcovdlinden,

I have passed this to one of my colleagues; we will investigate and advise as soon as we have any news.

Best,
Stefan

Re: PXCp_ET_GetElement returns 0 for position

Posted: Fri Nov 25, 2011 11:10 am
by Tracker Supp-Stefan
Thanks for the sample marcovdlinden,

I just learned that this problem was resolved and that the fix will be in the next build (200).

Best,
Stefan

Re: PXCp_ET_GetElement returns 0 for position

Posted: Thu Dec 01, 2011 10:15 am
by marcovdlinden
Thanks, that's good news.

Re: PXCp_ET_GetElement returns 0 for position

Posted: Thu Dec 01, 2011 12:41 pm
by Tracker Supp-Stefan
:)

Re: PXCp_ET_GetElement returns 0 for position

Posted: Thu Dec 01, 2011 2:37 pm
by marcovdlinden
Quick off-topic question:
as far as I know we are not notified of new builds.
Is there a mailing list or something like that?
We would like to adopt this new build once it's released :)

Re: PXCp_ET_GetElement returns 0 for position

Posted: Thu Dec 01, 2011 3:43 pm
by Tracker Supp-Stefan
Hello marcovdlinden,

Yes, if you have an account on our website you can subscribe to our newsletter and monthly e-mails. The new build (#200) is planned for mid-December as far as I know at this point.

Best,
Stefan
Tracker

Re: PXCp_ET_GetElement returns 0 for position

Posted: Tue Dec 13, 2011 8:33 am
by marcovdlinden
We now have a different PDF document where the Offsets always return 0.
Might this be a related problem, and will this also be fixed in 200?

Or should I create a new topic for this?

Re: PXCp_ET_GetElement returns 0 for position

Posted: Tue Dec 13, 2011 2:01 pm
by Tracker Supp-Stefan
Thanks for the update marcovdlinden,

I am checking with my colleagues whether this is related, and if not, we will see what we can do to fix it.

Best,
Stefan

Re: PXCp_ET_GetElement returns 0 for position

Posted: Tue Dec 13, 2011 2:07 pm
by marcovdlinden
OK, if this is not related I can ask whether I may upload the PDF in question.

Re: PXCp_ET_GetElement returns 0 for position

Posted: Thu Dec 15, 2011 11:22 am
by Tracker Supp-Stefan
Hello marcovdlinden,

We released build 200 last night, so please check whether your second problem is also resolved. If it is not, please consider sending us the sample file and we will try to get this resolved in the next build.

Best,
Stefan

Re: PXCp_ET_GetElement returns 0 for position

Posted: Tue Mar 20, 2012 9:36 am
by marcovdlinden
The second problem of offset zero for all elements still occurs (build 200 and 201).

Attached is an example document, stripped down to the bare basics (the entire original PDF had this problem), to reproduce this problem.

When we process the element acquired with PXCp_ET_GetElement we get offset 0 for all characters.
Mask used: PTEM_Text | PTEM_Offsets | PTEM_Matrix | PTEM_TextParams | PTEM_FontInfo

Try the "healthcare" text: it returns 0 for every character offset.
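
A possible interim guard on the caller's side is to flag elements whose offsets are all zero, so they can be skipped or handled separately. This is a rough sketch only: it assumes the per-character offsets are available as a plain array of doubles with a known count, which may not match how the SDK actually exposes them, and `all_offsets_zero` is a made-up helper name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true when an element with at least two characters reports 0 for
   every character offset. A single character at offset 0 is not treated
   as suspicious, since the first offset may legitimately be 0.
   The array-of-doubles representation is an assumption. */
static bool all_offsets_zero(const double *offsets, size_t count)
{
    assert(offsets != NULL || count == 0);
    if (count < 2) {
        return false;
    }
    for (size_t i = 0; i < count; ++i) {
        if (offsets[i] != 0.0) {
            return false;
        }
    }
    return true;
}
```

For a ten-character element such as "healthcare", ten zero offsets would be flagged by this check.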

Re: PXCp_ET_GetElement returns 0 for position

Posted: Tue Mar 20, 2012 1:52 pm
by Tracker Supp-Stefan
Hello Marco,

We will need to check this a bit further and will post back here as soon as we have any additional info/comments.

Best,
Stefan

Re: PXCp_ET_GetElement returns 0 for position

Posted: Tue Mar 20, 2012 1:54 pm
by marcovdlinden
ok thanks.

Re: PXCp_ET_GetElement returns 0 for position

Posted: Tue Mar 20, 2012 4:46 pm
by Tracker Supp-Stefan
Hi Marco,

We've created a ticket for this case and a developer has been assigned:
#1452: PXCp_ET_GetElement returns 0 for position
We will try to resolve it as soon as possible and will update this topic when there is any further info.

Best,
Stefan

Re: PXCp_ET_GetElement returns 0 for position

Posted: Tue Mar 27, 2012 9:21 am
by marcovdlinden
Do you have any idea what the cause of this problem is?
The reason I'm asking is that, since this happens only with some documents, there might be a temporary workaround until you resolve this.

I'm guessing that the way this PDF was created is causing the problem. Are there specific settings for generating the original PDF that could avoid it?

Note that this particular document was not generated with your tools.

Re: PXCp_ET_GetElement returns 0 for position

Posted: Tue Mar 27, 2012 12:06 pm
by Tracker Supp-Stefan
Hello Marco,

Yes, it's quite possible that there is something specific to the file that is causing this.
Our ticketing system is temporarily offline, so I can't check the latest comments there, but as soon as it's back online I will post an update here.

Best,
Stefan

Re: PXCp_ET_GetElement returns 0 for position

Posted: Mon Apr 09, 2012 12:17 pm
by Lzcat - Tracker Supp
Hello Marco.
There is a problem in your file (not critical for viewing or text extraction) that xcpro was not able to handle correctly. According to the PDF specification, a Font Descriptor dictionary must contain a FontName entry, which is missing in your file. From the next build (202), xcpro will ignore the absence of this entry, so Offsets will be filled correctly.
HTH.
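
For those who want to check their own files for the same defect before upgrading, here is a deliberately naive sketch. It assumes you already have the text of one uncompressed font descriptor dictionary as a C string (how you extract it is not shown), and `has_font_name` is a hypothetical helper, not part of xcpro. It only looks for the `/FontName` token and does not parse PDF syntax properly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true when dict_text contains the name token /FontName.
   The token must end at whitespace, a PDF delimiter, or the end of the
   text, so a longer name such as /FontNameX does not count. */
static bool has_font_name(const char *dict_text)
{
    static const char key[] = "/FontName";
    const size_t key_len = sizeof key - 1;

    assert(dict_text != NULL);
    for (const char *p = strstr(dict_text, key); p != NULL;
         p = strstr(p + 1, key)) {
        char next = p[key_len];
        if (next == '\0' || strchr(" \t\r\n\f/<>[](){}%", next) != NULL) {
            return true;
        }
    }
    return false;
}
```

A dictionary like `<< /Type /FontDescriptor /Flags 32 >>` would be reported as lacking the entry, which matches the situation described for the sample file.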

Re: PXCp_ET_GetElement returns 0 for position

Posted: Tue Apr 10, 2012 7:38 am
by marcovdlinden
Ok, good to know what the cause is.

And even better that xcpro will be able to handle this.

Thanks for the fix.

Re: PXCp_ET_GetElement returns 0 for position

Posted: Tue Apr 10, 2012 8:10 am
by Tracker Supp-Stefan
:)

Re: PXCp_ET_GetElement returns 0 for position

Posted: Wed Jun 20, 2012 9:57 pm
by Paul - Tracker Supp
Hi marcovdlinden,

We have addressed this issue in the last build. Can you update and confirm this works at your end please?

regards

Re: PXCp_ET_GetElement returns 0 for position

Posted: Tue Jun 26, 2012 8:08 am
by marcovdlinden
Hi Paul,

All the test documents that I have available work now. Thanks for the fix.

Re: PXCp_ET_GetElement returns 0 for position

Posted: Tue Jun 26, 2012 9:30 am
by Tracker Supp-Stefan
Great to hear that marcovdlinden!

Best,
Stefan