How can I export the text in a PDF so that I can save it to a database?
Thanks
Simon
Export PDF text to file or stream...
Moderators: TrackerSupp-Daniel, Tracker Support, Paul - Tracker Supp, Vasyl-Tracker Dev Team, Chris - Tracker Supp, Sean - Tracker, Ivan - Tracker Software, Tracker Supp-Stefan
Forum rules
DO NOT post your license/serial key, or your activation code - these forums, and all posts within, are public and we will be forced to immediately deactivate your license.
When experiencing some errors, use the IAUX_Inst::FormatHRESULT method to see their description and include it in your post along with the error code.
Re: Export PDF text to file or stream...
Alex,
I have managed to extract the text from a PDF using the code below. However, I am now struggling to actually get the text out of the IPXC_PageText element. Could you please help? I'm happy for the text to end up in a StringBuilder or a stream.
Code:
Public Sub exportPDFText()
    ' Document currently shown in the preview control
    Dim doc As PDFXEdit.IPXV_Document = Me.docPreview.Doc
    Dim bHasDoc As Boolean = doc IsNot Nothing
    Dim cp As UInteger = 0
    If bHasDoc Then
        ' Index of the page currently displayed
        Dim pl As PDFXEdit.IPXV_PagesLayoutManager = doc.ActiveView.PagesView.Layout
        cp = pl.CurrentPage
        Dim curPage As PDFXEdit.IPXC_Page = doc.CoreDoc.Pages(cp)
        ' Extract the text of that page
        Dim MyPageText As PDFXEdit.IPXC_PageText
        MyPageText = curPage.GetText(Nothing, False)
    End If
    If bHasDoc Then
        ' Release the COM reference to the document
        System.Runtime.InteropServices.Marshal.ReleaseComObject(doc)
    End If
End Sub
Thanks in advance
Simon
Re: Export PDF text to file or stream...
Hello Simon,
Well, you have the CharsCount property, and here is how you get each individual char: https://sdkhelp.pdf-xchange.com/vie ... eText_Char
I'm afraid PDF text is not as simple as, for example, text in Notepad, so you will have to do it this way. Additionally, you can use LineInfo https://sdkhelp.pdf-xchange.com/vie ... t_LineInfo to determine the position of each line and which characters belong to it. That way you can reproduce the visual layout of the text.
Cheers,
Alex
Re: Export PDF text to file or stream...
Alex,
Thanks for getting back to me so quickly.
I have done the following, which outputs all the text; however, there is a huge number of spaces between each word.
Code:
Console.WriteLine(MyPageText.GetChars(0, MyPageText.CharCount))
Therefore I was thinking about using the GetChars2 method to output the text to a StringBuilder and then work with that. However, when I use the following code, Visual Studio stops working.
Code:
Dim myDocSB As New StringBuilder
MyPageText.GetChars2(0, MyPageText.CharCount, myDocSB)
What I basically want is the same output that can be saved through PDF-XChange Editor, with each word arranged by line.
Thanks in advance
Simon
Re: Export PDF text to file or stream...
Alex,
Don't worry, I have figured it out.
Thanks for your help
Code:
' Requires: Imports System.Text.RegularExpressions
Dim FirstChar As UInteger = 0
Dim CharCount As UInteger = 0
' Guard against an empty page so that LinesCount - 1 cannot underflow
If MyPageText.LinesCount > 0 Then
    For i As UInteger = 0 To CUInt(MyPageText.LinesCount - 1)
        ' Character range covered by line i
        FirstChar = MyPageText.LineInfo(i).nFirstCharIndex
        CharCount = MyPageText.LineInfo(i).nCharsCount
        ' Read the line and collapse runs of two or more spaces into one
        Dim pdfWord As String = Regex.Replace(MyPageText.GetChars(FirstChar, CharCount), " {2,}", " ")
        Console.WriteLine(pdfWord)
    Next
End If
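Since the original goal was to get the text into a StringBuilder for saving to a database, the per-line approach above can be sketched as a small helper that collects the cleaned lines instead of printing them. This is only a sketch under assumptions: BuildPageText and the sample strings are hypothetical names and data, not part of the PDF-XChange SDK, and the input strings stand in for whatever GetChars returns for each line. The Trim call is an extra step not present in the code above.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text
Imports System.Text.RegularExpressions

Module PageTextExport
    ' Hypothetical helper: collapses runs of two or more spaces in each line
    ' and joins the lines into a single string, one line of text per row.
    ' rawLines stands in for the per-line strings read via GetChars.
    Function BuildPageText(rawLines As IEnumerable(Of String)) As String
        Dim sb As New StringBuilder()
        For Each rawLine As String In rawLines
            ' Same regex as in the loop above; Trim also drops leading/trailing spaces
            sb.AppendLine(Regex.Replace(rawLine, " {2,}", " ").Trim())
        Next
        Return sb.ToString()
    End Function

    Sub Main()
        ' Made-up sample lines with the kind of padding described in the thread
        Dim sample As String() = {"Invoice      No.   42", "Total:     10.00"}
        Console.Write(BuildPageText(sample))
    End Sub
End Module
```

In the real loop, each string returned by MyPageText.GetChars(FirstChar, CharCount) could be added to a list and passed to such a helper, and the resulting string stored in the database.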