I need to highlight some arbitrary text in my PDF file. My friend advised me to convert the PDF to a Word document, parse the doc, and do the highlighting before converting back to PDF.
Is there any way to do this highlighting of text?
Is there any 3rd party library that can be used to convert PDF to doc and vice versa? Thanks.
You can use the Aspose DLLs, which have an option to convert a PDF file to Word and vice versa.
For highlighting specific words you can use the Bytescout.PDFExtractor DLL to find the location of the searched word. Once you have found the location of the word, you can easily highlight it.
Conversion of PDF to Word, especially if you want the resulting Word document to be easily editable, in general is not an easy task. I doubt you'll find that as free software.
Maybe you should instead look for a lib which can search PDFs and highlight text in them. It is possible in iTextSharp (free under the AGPL) if you create a custom RenderListener which watches for the word you are searching for. When it finds the word, mark it like this (thanks, pmtamal, for the link).
There of course are numerous other PDF libs which can do that, too, I'm merely predominantly using iText...
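The marking step could look roughly like the sketch below. It assumes iTextSharp 5.x and that the rectangle around the word has already been worked out by the render listener (for example from the text positions it receives); `PdfHighlighter`, `Highlight`, and the parameter names are made up for illustration.

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

public static class PdfHighlighter
{
    // Adds a yellow highlight annotation over `area` on the given 1-based page.
    public static void Highlight(string inputPath, string outputPath, int page, Rectangle area)
    {
        // Corner order as used in common iTextSharp examples:
        // bottom-left, bottom-right, top-left, top-right.
        float[] quad =
        {
            area.Left, area.Bottom, area.Right, area.Bottom,
            area.Left, area.Top, area.Right, area.Top
        };

        PdfReader reader = new PdfReader(inputPath);
        try
        {
            using (FileStream output = new FileStream(outputPath, FileMode.Create, FileAccess.Write))
            {
                PdfStamper stamper = new PdfStamper(reader, output);
                PdfAnnotation highlight = PdfAnnotation.CreateMarkup(
                    stamper.Writer, area, null, PdfAnnotation.MARKUP_HIGHLIGHT, quad);
                highlight.Color = BaseColor.YELLOW;
                stamper.AddAnnotation(highlight, page);
                stamper.Close(); // writes the modified PDF to the output stream
            }
        }
        finally
        {
            reader.Close();
        }
    }
}
```

The highlight is an annotation on top of the page, so the original content stream is left untouched and no round trip through Word is needed.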
Related
I'm currently working with Microsoft.Office.Interop.Word to open a .docx file and convert it to PDF. In order to do that, I'm doing this:
Opening the .docx file with Microsoft.Office.Interop.Word.Documents.Open()
Exporting this document as a PDF with Microsoft.Office.Interop.Word._Document.ExportAsFixedFormat()
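A minimal sketch of these two steps, assuming Word is installed on the machine and the project references the Microsoft.Office.Interop.Word assembly. `WordToPdf`, `Convert`, and the path parameters are placeholder names, not part of the original code.

```csharp
using Word = Microsoft.Office.Interop.Word;

public static class WordToPdf
{
    // Opens a .docx in Word and exports it as PDF. Both paths should be absolute.
    public static void Convert(string docxPath, string pdfPath)
    {
        // The underscore interfaces avoid the Close/Quit method-vs-event ambiguity warning.
        Word._Application app = new Word.Application();
        Word._Document doc = null;
        try
        {
            doc = app.Documents.Open(docxPath, ReadOnly: true);
            doc.ExportAsFixedFormat(pdfPath, Word.WdExportFormat.wdExportFormatPDF);
        }
        finally
        {
            // Close without saving, then shut down the Word process we started.
            if (doc != null) doc.Close(Word.WdSaveOptions.wdDoNotSaveChanges);
            app.Quit();
        }
    }
}
```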
I used this method ("Call the Protect method of the Microsoft.Office.Interop.Word.Document that you want to protect") : https://msdn.microsoft.com/library/ms178793(v=vs.110).aspx
I have my PDF file, but I can open it in Microsoft Word and edit it... I don't want this PDF to be editable and I must use Word automation to make the PDF.
My question is in the title:
How to disable PDF editing in Word 2013?
It's not possible to change the way Word converts to PDF. There are some options that can be set with the method you use, but the basic conversion code can't be changed. If you need special features, it might be worthwhile to invest in Adobe Acrobat and/or its API. Since PDF is Adobe's file format, its product will have all the special features; Adobe doesn't license some of them to third parties (such as Microsoft).
I have a PDF document containing data in a table structure, and I would like to convert that PDF file into a text file that keeps the same structure, including the margins and the spaces between text in the PDF.
You need to write your own PDF tool then. Which is not exactly an easy task. Honestly, 3rd party tools make your job much easier, why don't you want to use one?
If you change your mind, I can suggest iTextSharp. I've used it in the past with great success. Here is an example to get you going:
http://www.codeproject.com/Articles/12445/Converting-PDF-to-Text-in-C
P.S. There are 3 tools used in there.
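Whichever library you pick, keeping the table layout comes down to mapping each text fragment's page coordinates onto a character grid. Here is a library-agnostic sketch of that step. `Fragment`, `LayoutText`, and the default `charWidth` and `lineHeight` values are assumptions for illustration; in practice the fragments would come from the PDF library's text-position callbacks and the two sizes would need tuning per document.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical type: one positioned piece of text, in PDF points,
// with the origin at the bottom-left corner of the page.
public sealed class Fragment
{
    public double X, Y;
    public string Text;
    public Fragment(double x, double y, string text) { X = x; Y = y; Text = text; }
}

public static class LayoutText
{
    // charWidth and lineHeight are assumed averages in points.
    public static string Render(IEnumerable<Fragment> fragments,
                                double charWidth = 6.0, double lineHeight = 12.0)
    {
        var output = new StringBuilder();

        // Group fragments into rows by rounded baseline; top of the page first.
        var rows = fragments
            .GroupBy(f => (int)Math.Round(f.Y / lineHeight))
            .OrderByDescending(g => g.Key);

        foreach (var row in rows)
        {
            var line = new StringBuilder();
            foreach (var f in row.OrderBy(f => f.X))
            {
                // Target column for this fragment on the character grid.
                int column = Math.Max(0, (int)Math.Round(f.X / charWidth));
                // Keep at least one space between neighbouring cells.
                if (line.Length > 0 && column <= line.Length) column = line.Length + 1;
                line.Append(' ', column - line.Length);
                line.Append(f.Text);
            }
            output.AppendLine(line.ToString());
        }
        return output.ToString();
    }
}
```

Empty vertical gaps between rows are not reproduced here; that would take inserting blank lines based on the difference between consecutive row keys.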
Is there a way to use Word interop (for instance) to watermark a doc or docx file?
Any solution would be good, but it is going to have to work from within C#.
I.e. supply the code with a byte[] and it will have to add a watermark and spit out a new byte[]
You can do that using Aspose Words, the C# library for creating and modifying Word documents without actually needing Word.
Have a look at http://www.aspose.com/documentation/.net-components/aspose.words-for-.net/howto-add-a-watermark-to-a-document.html.
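A rough sketch of the byte[]-in, byte[]-out shape the question asks for. It assumes a recent Aspose.Words for .NET release that exposes `Document.Watermark.SetText`; older releases build the watermark as a shape in the header instead, as the linked how-to describes. `WatermarkHelper` and `AddWatermark` are made-up names.

```csharp
using System.IO;
using Aspose.Words;

public static class WatermarkHelper
{
    // Loads a Word document from bytes, adds a text watermark, returns the new bytes.
    public static byte[] AddWatermark(byte[] input, string text)
    {
        using (var inStream = new MemoryStream(input))
        using (var outStream = new MemoryStream())
        {
            var doc = new Document(inStream);
            // Assumption: the Watermark property exists in the installed version.
            doc.Watermark.SetText(text);
            doc.Save(outStream, SaveFormat.Docx);
            return outStream.ToArray();
        }
    }
}
```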
No, I don't own Aspose shares, unfortunately...
I need to convert PDF files into .doc files using C#. The computer has no file system though it doesn't have Office installed. Any good ideas how I can approach this? I did some research and most people use the interop services.
You need to understand that PDF is not really implemented as a single document format.
If your PDF docs are created by rendering text to a PDF file, then direct PDF conversion is not only possible, but can be very good (reliable).
If the source of your PDF is either a scanner or a fax (essentially a scanner...) then what you have is a document with a "picture" of text. This scenario is more difficult to deal with. If you open up the markup for this, there is no 'text' to be converted. In this situation you have to deal with some manner of OCR (optical character recognition), which is less reliable due to a variety of issues.
If you have the option of intercepting the data before it is rendered to PDF (say like in SSRS or Crystal) then it would be better for you to bypass the PDF stage and move your data to a Word document.
If you are constrained to receiving faxes and then needing to interpret their content, prepare for OCR hell. It has been a while since I was there, so I hope that it has gotten better.
Even without Office installed on your machine, you have access (with Visual Studio) to the Office developer toolkit, which will allow you to build documents to be distributed in the Word formats (.doc/.docx).
An option/idea may be to convert the PDF to Html, which can be opened in Word?
Use the Aspose PDF kit to convert the PDF to text, and then turn the text into a doc using a FileStream or Aspose doc.
I want to read tables which are in a PDF document and I want to store these values in a Database.
What I have found so far through searching the web:
Read text from the PDF using ABCpdf .NET, which is available as freeware. But it's not the right solution because I want to read the tables.
Convert the PDF document into Excel/Word. Tables will come through in the target document as they are. Word conversion is possible using EasyPDF Converter, a third-party tool that is much cheaper than the other available tools that convert PDF into Excel.
But I am looking for any other solution/API classes which can convert PDF into Excel.
There are 2 possible solutions:
a) Cometdocs offers a free online conversion from PDF to XLS that is surprisingly good, and it sends the resulting file to your email.
b) Cognview is commercial shareware that converts PDF to XLS. There is an OCR version and a text version. I haven't used it personally, but it has good recommendations.
If you are looking to upload your data into a database, converting your PDFs to CSV is probably the safest option. The PDFTables API will allow you to do this with C#, converting as many PDFs at once as necessary. https://pdftables.com/pdf-to-excel-api#csharp
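Once you have a CSV, loading it into a database is straightforward. The sketch below assumes SQL Server and a hypothetical three-column table `dbo.PdfRows (Col1, Col2, Col3)`; `CsvLoader`, the table, and the column names are placeholders to adapt to your schema.

```csharp
using System.Data.SqlClient;
using Microsoft.VisualBasic.FileIO;

public static class CsvLoader
{
    // Inserts each CSV row into the hypothetical table dbo.PdfRows and
    // returns the number of rows inserted.
    public static int Load(string csvPath, string connectionString)
    {
        int inserted = 0;
        using (var parser = new TextFieldParser(csvPath))
        using (var connection = new SqlConnection(connectionString))
        {
            // TextFieldParser handles quoted fields that contain commas.
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true;
            connection.Open();

            while (!parser.EndOfData)
            {
                string[] fields = parser.ReadFields();
                if (fields == null || fields.Length < 3) continue; // skip short rows

                using (var command = new SqlCommand(
                    "INSERT INTO dbo.PdfRows (Col1, Col2, Col3) VALUES (@c1, @c2, @c3)",
                    connection))
                {
                    command.Parameters.AddWithValue("@c1", fields[0]);
                    command.Parameters.AddWithValue("@c2", fields[1]);
                    command.Parameters.AddWithValue("@c3", fields[2]);
                    inserted += command.ExecuteNonQuery();
                }
            }
        }
        return inserted;
    }
}
```

If the CSV has a header row, it would be inserted as data here, so read and discard the first `ReadFields()` result in that case.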
You can try to use Quablo, a PDF table extractor available at this web page (link updated/corrected).