A forum for questions or concerns related to the PDF-XChange Core API SDK
' Requires at the top of the file: Imports System.Text, Imports System.Text.RegularExpressions, Imports PDFXCoreAPI
Dim myDoc As PDFXCoreAPI.IPXC_Document = g_Inst.OpenDocumentFromFile(Me.TextBox1.Text, Nothing)
Try
    Dim docStringBuilder As New StringBuilder
    ' Guard against an empty document: CUInt(Count - 1) throws when Count is 0.
    If myDoc IsNot Nothing AndAlso myDoc.Pages.Count > 0 Then
        For pageNum As UInteger = 0 To CUInt(myDoc.Pages.Count - 1)
            Dim curPage As IPXC_Page = myDoc.Pages(pageNum)
            Dim MyPageText As IPXC_PageText = curPage.GetText(Nothing, False)
            ' Skip pages without text lines (same problem with LinesCount - 1).
            If MyPageText.LinesCount = 0 Then Continue For
            For i As UInteger = 0 To CUInt(MyPageText.LinesCount - 1)
                Dim FirstChar As UInteger = MyPageText.LineInfo(i).nFirstCharIndex
                Dim CharCount As UInteger = MyPageText.LineInfo(i).nCharsCount
                ' Collapse runs of spaces into a single space.
                Dim pdfWord As String = Regex.Replace(MyPageText.GetChars(FirstChar, CharCount), " {2,}", " ")
                docStringBuilder.AppendLine(pdfWord)
            Next
        Next
    End If
    ' Using closes the file even if the write fails.
    Using file As New System.IO.StreamWriter("C:\temp\PDFExport.txt", False)
        file.WriteLine(docStringBuilder.ToString())
    End Using
Catch ex As Exception
    Console.WriteLine(ex)
End Try
The issue I have is that the text in the exported text file is sometimes out of order. Please see the screenshot below:
TextInWrongOrder.png
Is there a way to resolve this, or a way that I can maybe use the text line Y position to output a correctly ordered text file?
Thanks in advance
Simon
I'm being a bit lazy as I'm away from my computer, but I want to work on this over the weekend. Do you have an example of how to get the Y position of each line of text?
' Requires at the top of the file: Imports System.Text, Imports System.Text.RegularExpressions, Imports PDFXCoreAPI
Dim nowTime As DateTime = DateTime.Now
Console.WriteLine("Start: " & nowTime.ToLongTimeString & ":" & nowTime.Millisecond.ToString)
Dim myDoc As PDFXCoreAPI.IPXC_Document = g_Inst.OpenDocumentFromFile(Me.TextBox1.Text, Nothing)
Try
    Dim docStringBuilder As New StringBuilder
    ' Guard against an empty document: CUInt(Count - 1) throws when Count is 0.
    If myDoc IsNot Nothing AndAlso myDoc.Pages.Count > 0 Then
        For pageNum As UInteger = 0 To CUInt(myDoc.Pages.Count - 1)
            Dim curPage As IPXC_Page = myDoc.Pages(pageNum)
            Dim MyPageText As IPXC_PageText = curPage.GetText(Nothing, False)
            ' Skip pages without text lines (same problem with LinesCount - 1).
            If MyPageText.LinesCount = 0 Then Continue For
            For i As UInteger = 0 To CUInt(MyPageText.LinesCount - 1)
                Dim FirstChar As UInteger = MyPageText.LineInfo(i).nFirstCharIndex
                Dim CharCount As UInteger = MyPageText.LineInfo(i).nCharsCount
                ' Collapse runs of spaces into a single space.
                Dim pdfWord As String = Regex.Replace(MyPageText.GetChars(FirstChar, CharCount), " {2,}", " ")
                ' Append the line's bounding box so the reported positions can be inspected.
                With MyPageText.LineInfo(i).rcBBox
                    docStringBuilder.AppendLine(pdfWord & " Top: " & .top.ToString & " Bottom: " & .bottom.ToString &
                                                " Left: " & .left.ToString & " Right: " & .right.ToString)
                End With
            Next
        Next
    End If
    ' Using closes the file even if the write fails.
    Using file As New System.IO.StreamWriter("C:\temp\PDFExport.txt", False)
        file.WriteLine(docStringBuilder.ToString())
    End Using
Catch ex As Exception
    Console.WriteLine(ex)
End Try
nowTime = DateTime.Now
Console.WriteLine("End: " & nowTime.ToLongTimeString & ":" & nowTime.Millisecond.ToString)
And I have attached the PDF I am using along with the text output file.
Thanks
Simon
Of course the coordinates would look like that in your case: they are the coordinates of the text in the line's own coordinate system. To convert them into the visual coordinate representation, the line and page matrices should be used:
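As a rough, SDK-independent illustration of the idea: a PDF matrix is six numbers (a, b, c, d, e, f) that map a point (x, y) to (a*x + c*y + e, b*x + d*y + f). The sketch below applies such a matrix to each line's bounding box and then sorts the lines top to bottom, left to right. Matrix2D, TextLine, Transform, PagePosition and SortVisually are hypothetical stand-ins invented for this example, not SDK types or methods, and it assumes that a larger Y means higher on the page (the usual PDF convention).

```vbnet
Option Strict On
Option Infer On
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module LineOrderSketch
    ' Hypothetical stand-in for a PDF matrix [a b c d e f]; not an SDK type.
    Structure Matrix2D
        Public a, b, c, d, e, f As Double
    End Structure

    ' Hypothetical stand-in for one extracted line: its text, its bounding box
    ' in the line's own coordinate system, and its line-to-page matrix.
    Class TextLine
        Public Content As String
        Public BoxLeft, BoxBottom, BoxRight, BoxTop As Double
        Public ToPage As Matrix2D
    End Class

    ' Map (x, y) through m: x' = a*x + c*y + e, y' = b*x + d*y + f.
    Function Transform(m As Matrix2D, x As Double, y As Double) As Tuple(Of Double, Double)
        Return Tuple.Create(m.a * x + m.c * y + m.e, m.b * x + m.d * y + m.f)
    End Function

    ' Returns (highest Y, lowest X) of the transformed box. All four corners
    ' are checked because a rotated line can put any corner on top.
    Function PagePosition(line As TextLine) As Tuple(Of Double, Double)
        Dim corners = {
            Transform(line.ToPage, line.BoxLeft, line.BoxBottom),
            Transform(line.ToPage, line.BoxLeft, line.BoxTop),
            Transform(line.ToPage, line.BoxRight, line.BoxBottom),
            Transform(line.ToPage, line.BoxRight, line.BoxTop)
        }
        Return Tuple.Create(corners.Max(Function(p) p.Item2), corners.Min(Function(p) p.Item1))
    End Function

    ' Top of the page first (largest Y), then left to right.
    Function SortVisually(lines As IEnumerable(Of TextLine)) As List(Of TextLine)
        Return lines.
            OrderByDescending(Function(ln) PagePosition(ln).Item1).
            ThenBy(Function(ln) PagePosition(ln).Item2).
            ToList()
    End Function

    Sub Main()
        ' Two lines with identical local boxes; only the matrices place them on the page.
        Dim lines As New List(Of TextLine) From {
            New TextLine With {.Content = "second", .BoxRight = 100, .BoxTop = 10,
                               .ToPage = New Matrix2D With {.a = 1, .d = 1, .e = 50, .f = 650}},
            New TextLine With {.Content = "first", .BoxRight = 100, .BoxTop = 10,
                               .ToPage = New Matrix2D With {.a = 1, .d = 1, .e = 50, .f = 700}}
        }
        For Each ln In SortVisually(lines)
            Console.WriteLine(ln.Content)
        Next
    End Sub
End Module
```

In this toy input the two lines are stored in the wrong order, and sorting on the transformed Y puts "first" ahead of "second". Lines whose Y values differ only slightly (mixed font sizes on one row, for example) would need a tolerance when comparing, which this sketch leaves out.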