Export PDF text to file or stream...

PDF-XChange Editor SDK for Developers

lidds
User
Posts: 510
Joined: Sat May 16, 2009 1:55 pm

Post by lidds »

How can I export the text of a PDF so that I can save it to a database?

Thanks

Simon
Sasha - Tracker Dev Team
User
Posts: 5522
Joined: Fri Nov 21, 2014 8:27 am

Re: Export PDF text to file or stream...

Post by Sasha - Tracker Dev Team »

Hello Simon,

Use the IPXC_Page GetText method: https://sdkhelp.pdf-xchange.com/vie ... ge_GetText

Cheers,
Alex

Post by lidds »

Alex,

I have managed to extract the text from a PDF using the code below. However, I am now struggling to get the text out of the IPXC_PageText object. Could you please help? I'm happy for the text to end up in a StringBuilder or a stream.

Code: Select all

    Public Sub exportPDFText()
        ' Document currently shown in the preview control
        Dim doc As PDFXEdit.IPXV_Document = Me.docPreview.Doc
        If doc Is Nothing Then Return

        ' Index of the page currently displayed in the view
        Dim pl As PDFXEdit.IPXV_PagesLayoutManager = doc.ActiveView.PagesView.Layout
        Dim cp As UInteger = pl.CurrentPage

        Dim curPage As PDFXEdit.IPXC_Page = doc.CoreDoc.Pages(cp)

        ' Extract the page text; the characters still have to be read out of MyPageText
        Dim MyPageText As PDFXEdit.IPXC_PageText
        MyPageText = curPage.GetText(Nothing, False)

        System.Runtime.InteropServices.Marshal.ReleaseComObject(doc)
    End Sub
Thanks in advance

Simon

Post by Sasha - Tracker Dev Team »

Hello Simon,

Well, you have the CharsCount property, and here's how you get each individual character: https://sdkhelp.pdf-xchange.com/vie ... eText_Char
I'm afraid PDF text is not as simple as, for example, text in Notepad, so you will have to do it this way. Additionally, you can use LineInfo (https://sdkhelp.pdf-xchange.com/vie ... t_LineInfo) to determine the position of each line and the characters that belong to it; that way you can get the visual text representation.
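
To illustrate the idea of line records pointing into a flat run of characters, here is a minimal Python sketch. It does not call the SDK: `LineRecord` and `split_into_lines` are hypothetical names, and the sample data is made up. It only assumes that each line is described by a first-character index and a character count.

```python
from typing import List, NamedTuple


class LineRecord(NamedTuple):
    """Hypothetical stand-in for one line-info entry (not an SDK type)."""
    first_char_index: int
    chars_count: int


def split_into_lines(chars: str, lines: List[LineRecord]) -> List[str]:
    """Slice a flat page-character buffer into one string per line."""
    result = []
    for rec in lines:
        start = rec.first_char_index
        result.append(chars[start:start + rec.chars_count])
    return result


# Made-up page content: 11 characters laid out as two visual lines.
page_chars = "HelloWorld!"
page_lines = [LineRecord(0, 5), LineRecord(5, 6)]
print(split_into_lines(page_chars, page_lines))  # ['Hello', 'World!']
```

With the real interfaces, the same loop would read the index and count from each line-info entry and fetch that range of characters from the page text.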

Cheers,
Alex

Post by lidds »

Alex,

Thanks for getting back to me so quickly.

I have done the following, which outputs all the text; however, there is a huge number of spaces between the words.

Code: Select all

Console.WriteLine(MyPageText.GetChars(0, MyPageText.CharsCount))
Therefore I was thinking about using the GetChars2 method to output the text to a StringBuilder and then work with that. However, when I use the following code, Visual Studio stops working.

Code: Select all

Dim myDocSB As New StringBuilder
MyPageText.GetChars2(0, MyPageText.CharsCount, myDocSB)
What I basically want is the same output that PDF-XChange Editor can save: the words arranged line by line.

Thanks in advance

Simon

Post by lidds »

Alex,

Don't worry, I have figured it out:

Code: Select all

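' Requires: Imports System.Text.RegularExpressions
Dim FirstChar As UInteger = 0
Dim CharCount As UInteger = 0

' Skip empty pages: with an unsigned LinesCount, LinesCount - 1 would overflow at zero
If MyPageText.LinesCount > 0 Then
    For i As UInteger = 0 To CUInt(MyPageText.LinesCount - 1)
        ' Character range covered by line i
        FirstChar = MyPageText.LineInfo(i).nFirstCharIndex
        CharCount = MyPageText.LineInfo(i).nCharsCount
        ' Collapse runs of two or more spaces into a single space
        Dim pdfWord As String = Regex.Replace(MyPageText.GetChars(FirstChar, CharCount), " {2,}", " ")
        Console.WriteLine(pdfWord)
    Next
End If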
Thanks for your help
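
Since the original goal was to store the text rather than print it, the cleaned lines can also be collected into one string. Below is a small Python sketch of that step only. It uses made-up input instead of SDK calls, `lines_to_text` is a hypothetical helper name, and the extra trimming of leading and trailing spaces is my own assumption, not something the code above does.

```python
import re
from typing import Iterable

# Same pattern as above: a run of two or more spaces
_MULTI_SPACE = re.compile(r" {2,}")


def lines_to_text(raw_lines: Iterable[str]) -> str:
    """Collapse runs of spaces in each line and join the lines with newlines."""
    cleaned = (_MULTI_SPACE.sub(" ", line).strip() for line in raw_lines)
    return "\n".join(cleaned)


# Made-up lines standing in for the per-line strings read from the page text
sample = ["Invoice      No.   42", "Total:        10.00  EUR"]
text = lines_to_text(sample)
print(text)
```

The resulting single string can then be passed to whatever database layer is in use as an ordinary text parameter.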

Post by Sasha - Tracker Dev Team »

:)