Extracting Text

RJT · August 17, 2023, 7:32pm

Is there a function that can return all the text in a document? I would like to get the text so we can add it to our full-text index. Currently we’re using FreeSpire.Doc, FreeSpire.PDF, and FreeSpire.XLS to do this.

It seems like it should be built into your product, since you have to extract the information to rebuild it onto a web page!

carlos.molina · August 17, 2023, 8:47pm

@RJT,

Yes, there is a function called CopyPage(int page).

I suggest looking at the demo code in the samples to see how it works.

I am talking about this:

The function copies text one page at a time, not the whole document at once.

RJT · August 18, 2023, 1:33pm

From what I can see, this is only available on the client-side (JavaScript) object? I’m looking to do this on the server side. The example below shows what I would like to do, but I can’t find anything in the ctldoc object that resembles the pages or the extracted text. Do you have a sample of how it works in code?

    'load the file on the server
    Dim ctldoc As New DocViewer
    Dim config As New DotnetDaddy.DocumentConfig.PdfConfig
    config.AllowCopy = True

    ctldoc.OpenDocument("c:\temp\test.pdf", config)

    'this is fake code showing how I would expect it to work
    Dim sb As New StringBuilder    'string builder
    For Each Page In ctldoc.Pages  'loop through each page
        sb.Append(CopyPage(Page))  'put the text into the string builder
    Next

carlos.molina · August 18, 2023, 1:52pm

@RJT,

Sadly, there is nothing like that in Doconut. We are focused on being the best and most solid document viewer.

For everything related to reading or manipulating documents in the code-behind, please refer to our parent company, Aspose (https://products.aspose.com/pdf/).
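For reference, here is a minimal sketch of server-side text extraction with Aspose.PDF for .NET, not Doconut. It assumes the Aspose.PDF package is referenced and licensed in the project; ExtractAllText and PdfTextExtraction are hypothetical names used only for illustration. It follows the same page loop as the fake code above, using Aspose's TextAbsorber on each page.

```vb
Imports System.Text
Imports Aspose.Pdf
Imports Aspose.Pdf.Text

Module PdfTextExtraction
    ' Hypothetical helper: returns the text of every page of a PDF,
    ' one page per block, ready to hand to a full-text indexer.
    ' Assumes the Aspose.PDF for .NET package is referenced.
    Public Function ExtractAllText(path As String) As String
        Dim sb As New StringBuilder()
        Using pdf As New Document(path)
            ' Aspose.Pdf page collections are 1-based
            For pageNumber As Integer = 1 To pdf.Pages.Count
                Dim absorber As New TextAbsorber()
                pdf.Pages(pageNumber).Accept(absorber)  'collect this page's text
                sb.AppendLine(absorber.Text)
            Next
        End Using
        Return sb.ToString()
    End Function
End Module
```

Calling ExtractAllText("c:\temp\test.pdf") returns a single string for the index. Passing one TextAbsorber to pdf.Pages.Accept would also collect the whole document in one call; the per-page loop is kept here only to mirror the question.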

RJT · August 18, 2023, 2:09pm

Thanks, but that has far more features than we need. We just need to extract the text.