Extract text from PDF using Foxit SDK - C#

I am using Foxit SDK to extract the text from a PDF document.
Everything works, but when I extract text from a PDF in a language other than English, I don't get the correct output.
I have also tried PDFBox in Java, but its output is worse than what Foxit SDK produces.
Are there any other libraries that can solve this issue?
Or is there some other solution?

Personally, I think that if you want it done right you have to pay for it. ComponentOne has a PDFViewer for WPF. I'm not sure what framework you're working with, since your tags don't mention one.
ComponentOne PDF Viewer for WPF

You might want to try the trial version of Quick PDF Library to see how it performs on your documents. http://www.quickpdflibrary.com
QP.GetPageText(7) or GetPageText(8) returns pretty good results for most PDF files.
Andrew.
Disclaimer: I do some consulting work for Quick PDF Library.

If you are on Windows, you can use the IFilter that Adobe provides. I used the IFilter that ships with Adobe Reader 8.
Here is a link to the exact example I used:
http://www.codeproject.com/Articles/13391/Using-IFilter-in-C
The performance was okay, I think, although I haven't used many other methods. It takes about 15 seconds for a 400-page PDF.

Related

Open PDF in C# as view only without adobe

I need to create a form in my C# project that just allows the user to view a PDF.
I have a way to open the PDF and read it, but I need to disable features like printing, saving, highlighting, and copy/pasting while keeping the ability to search in the document.
Users should really just be able to open the document, read it, search for words in it, and close it.
Any help would be great.
Thanks in advance.
You could use Ghostscript to convert PDF to images and then show the images on your form or you could rasterize your PDF directly to the screen.
To use Ghostscript from .NET you can take a look at the Ghostscript.NET library (managed wrapper around the Ghostscript library).
Ghostscript Viewer C# sample that rasterizes PDF directly to the screen can be found here: https://github.com/jhabjan/Ghostscript.NET/tree/master/Ghostscript.NET.Viewer
To search for text inside the PDF, you can use iTextSharp.
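As a rough sketch of the search part, assuming iTextSharp 5.x is referenced in the project: `PdfTextExtractor.GetTextFromPage` returns the text of a single page, which can then be searched with ordinary string methods. The names `PdfSearch` and `FindPagesContaining` are made up for this example and are not part of any library.

```csharp
using System;
using System.Collections.Generic;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

public static class PdfSearch
{
    // Hypothetical helper: returns the 1-based numbers of the pages
    // whose extracted text contains `term` (case-insensitive).
    public static List<int> FindPagesContaining(string pdfPath, string term)
    {
        var hits = new List<int>();
        var reader = new PdfReader(pdfPath);
        try
        {
            for (int page = 1; page <= reader.NumberOfPages; page++)
            {
                // Extract the text of one page using the default strategy.
                string text = PdfTextExtractor.GetTextFromPage(reader, page);
                if (text.IndexOf(term, StringComparison.OrdinalIgnoreCase) >= 0)
                {
                    hits.Add(page);
                }
            }
        }
        finally
        {
            reader.Close();
        }
        return hits;
    }
}
```

The returned page numbers could then be used to jump to the matching rasterized page in the viewer. This only tells you which pages match; highlighting the match on the image would need the text positions as well, which this sketch does not collect.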
(Disclaimer: I worked on this component at Software Siglo XXI.)
If you don't want to deal with the Ghostscript API and need a quick working solution for displaying the documents, you could use ImageZoom Viewer .NET. It's available for both 32-bit and 64-bit, and it is cheap and effective. I'd recommend trying it, since it is polished and fast. You can browse, scroll, and print the pages from the viewer.
You can take a look here: http://softwaresigloxxi.com/ImageZoom.html
This is for quick browsing and reading. When the user needs text operations, you could let them open the PDF in Adobe Reader, launched from your application.

Printing a Word Doc or a PDF

My program takes a Word doc and manipulates it with the open source DocX library from CodePlex. That works great.
Now I need to print it. I've looked for a few hours and I haven't found a good way to print the PDF version of the file. I even tried to use AcroRd32.exe and it's just plain clunky and not really usable for a serious application.
I do have it printing with Word Interop, but that ties me to a specific version of Word: the one installed on my machine. That means the older versions our customers use don't work, and the devs cannot compile unless they are on 2010.
I need a way to print either a PDF or a Word doc (2003 or later) seamlessly, without being prompted for each document the way Acrobat Reader does.
Anyone have any suggestions?
Thanks!
I've used the following library for printing PDFs in past projects:
http://www.debenu.com/products/development/debenu-pdf-library/
They have a free and professional (commercial) version. It's a great library and well worth the minor expense.

Open PDF and print to PDF programmatically C#

I am developing an application that is able to open and display PDFs after I open them and print them to another PDF using CutePDF, but the originals are not viewable.
I am looking for a way to programmatically open a PDF file, and print to another PDF file (not necessarily using CutePDF, just printing to another PDF is the desired functionality).
This will be integrated into a C# .NET project. Are there any suggestions how to go about doing this?
Thanks.
You could use Office Interop to generate the PDF. When you say "print to another PDF", I imagine you mean just generating one? Or do you mean spooling the files to a PDF print driver that will essentially just create a PDF to be saved?
Use iText, which is available in Java and C# versions. I have used the Java version successfully. I recommend the iText in Action book to help you get up to speed with iText faster. The book discusses only the Java API, but I imagine you will be able to learn the principles of iText from the book and then figure out the minor differences for the C# version.
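One way to read "open a PDF and write it to another PDF" is a page-by-page copy. The sketch below assumes the C# port, iTextSharp 5.x, and uses its `PdfCopy` class; `PdfRewriter` and `CopyAllPages` are hypothetical names chosen for this example.

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

public static class PdfRewriter
{
    // Hypothetical helper: copies every page of sourcePath into a new PDF at destPath.
    public static void CopyAllPages(string sourcePath, string destPath)
    {
        var reader = new PdfReader(sourcePath);
        try
        {
            using (var output = new FileStream(destPath, FileMode.Create))
            {
                var document = new Document();
                var copy = new PdfCopy(document, output);
                document.Open();
                for (int page = 1; page <= reader.NumberOfPages; page++)
                {
                    // Import the page from the source and append it to the new file.
                    copy.AddPage(copy.GetImportedPage(reader, page));
                }
                // Closing the document finishes the PDF and flushes it to the stream.
                document.Close();
            }
        }
        finally
        {
            reader.Close();
        }
    }
}
```

This produces a new file without going through a printer driver such as CutePDF. Whether it fixes originals that are "not viewable" depends on what is wrong with them, which the question does not say.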
To implement this, you can use the PDFFlow library for generating PDF files from C#. It has an easy fluent syntax and many features.
Here are many examples of real complex PDF documents: examples
Good luck :)

How to highlight text in Pdf Winforms C#

I have a PDF file that I want to open in a Windows Forms application and perform the following tasks:
View the pdf document
Zoom +/- document
Search Text
Highlight a specific text
Show it in a listbox/dropdown
select those words and highlight in pdf
Remove selection/Highlight.
I have tried several libraries, such as PDFsharp and iTextSharp, and even the Acrobat Reader OCX control.
It's really bugging me. Is there any help?
I'd suggest looking at some means of converting the PDF if you don't have a direct need to edit it. Even then, it may be easier to convert it to a different format, make changes, and then convert back. PDF is derived from PostScript, which makes it powerful but also a mess to deal with, and my personal preference is to skip that headache. That is not always avoidable (I once had a lot of fun creating Thai support in PDF print#home ticket creation without bloating the document until it was unusable), but it is highly recommended where possible.
Anyways, there are a variety of PDF conversion libraries out there, some of which may be available for .NET. Worst case, you may need to create a managed C++ layer to allow your C# code to access them.
Doesn't the Acrobat Reader OCX already have all those features? What exactly doesn't the OCX do that you need to do in your code?
You might try contacting Adobe and getting their full SDK for PDF. It might have controls that you can use to solve your problem.
Come to think of it, is there even an SDK for PDF from Adobe?
You have not mentioned whether you prefer a free or a commercial PDF viewer. If you are open to a commercial PDF viewer, you could evaluate the SyncFusion PDF Viewer control, Telerik PDF Viewer, Dynamic PDF Viewer, or TallComponents. I have checked their feature sets, and all seem to have the features you are looking for. I do not represent or promote any of these SDKs. I have used TallComponents and Dynamic PDF for PDF manipulation, and both have excellent support; I would call them PDF veterans in the .NET space.

HTML Printing

I am too cheap to buy Crystal Reports, so I built the report in ASP.NET. The only problem I'm facing is printing the report and making it look professional. The report looks different on different printers; I want to control the final output so that the report prints the same way across all printers. Do you have any suggestions on how to achieve this properly?
Why not just use Reporting Services? It's free and easy to integrate with both WebForms and WinForms apps. Supports export to PDF, Excel, etc.
Maybe a stylesheet? Google it, good luck.
You could try implementing a print stylesheet (you'll find many examples by Googling the term), but that can be laborious if you're not familiar with CSS.
If you're checking out PDF solutions, I've used iTextSharp to create PDFs. It's relatively easy to use, open source, mature, and used by many corporations.
You could try printing to a PDF. Not sure what your budget is, but ExpertPDF is a good option I'm using now.
You could create the report as a PDF using a C# library such as PDFsharp (Open-Source).
This approach allows you to:
Serve PDF files to your users, giving them the option to print it now or archive it for later use
Automatically email reports to your users using a scheduled task
Store generated PDFs in a database or on the file system
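A minimal sketch of the PDFsharp approach, assuming the PDFsharp package is referenced and the code runs on Windows with the Arial font installed. `ReportPdf` and its `Save` method are hypothetical names, and the layout (one line of text per row, fixed coordinates, a single page) is only for illustration.

```csharp
using PdfSharp.Drawing;
using PdfSharp.Pdf;

public static class ReportPdf
{
    // Hypothetical helper: writes a title and one line of text per report row
    // onto a single page, then saves the document to `path`.
    // It does not paginate, so long reports would run off the page.
    public static void Save(string path, string title, string[] rows)
    {
        var document = new PdfDocument();
        document.Info.Title = title;
        PdfPage page = document.AddPage();

        using (XGraphics gfx = XGraphics.FromPdfPage(page))
        {
            var titleFont = new XFont("Arial", 16);
            var bodyFont = new XFont("Arial", 10);

            double y = 50; // vertical position in points from the top of the page
            gfx.DrawString(title, titleFont, XBrushes.Black, 40, y);

            foreach (string row in rows)
            {
                y += 18; // move down one line
                gfx.DrawString(row, bodyFont, XBrushes.Black, 40, y);
            }
        }

        document.Save(path);
    }
}
```

Because the positions are given in points on a fixed page size, the output should not depend on the printer or browser the way printed HTML does. The saved file can then be served, emailed, or stored as described above.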
CutePDF is a free PDF writer and should work for what you need.
