How to convert PDF to WORD in c# [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 4 years ago.
Does anyone know of a .NET component that converts PDF to Word or RTF programmatically? I don't want to use OCR or Adobe-dependent solutions.

I tried several libraries:
PDF Focus .NET: https://sautinsoft.com/products/pdf-focus/index.php
Aspose.PDF: https://products.aspose.com/pdf/net
GemBox: https://www.gemboxsoftware.com/document
Spire.PDF: https://www.e-iceblue.com/Introduce/pdf-for-net-introduce.html
I also considered using Word via COM automation to open and save to PDF programmatically.
Of all of them I liked PDF Focus .NET best, and here is why:
They try to keep the structure of the document EDITABLE, so that when I continue editing the text, the paragraph simply extends. Other libraries take a "minimalistic" approach and insert absolutely positioned shapes, so that if you continue editing the text, it overlaps the next piece of text.
They do their best to recognize tables, so that tables in the output document are REAL TABLES and not a collection of shapes and text with absolute positioning (as produced by other libraries).
A customer of ours is currently evaluating different libraries, and I will recommend PDF Focus .NET first.
P.S. I AM NOT INVOLVED IN ANY KIND OF RELATIONSHIP WITH THIS SOFTWARE PRODUCER. As a former .NET developer I simply see a high-quality component that works really well.

Use PDF Focus.
Nice and easy.
EDIT: And also
How to convert DOC into other formats using C#
http://dotnetf1.blogspot.com/2008/07/convert-word-doc-into-pdf-using-c-code.html

You need something like GemBox.Document. It's a simple .NET component that enables you to manipulate and convert all kinds of document files.

You should read this: C# and PDF. There are ways to convert, such as the aforementioned PDF Focus, but be warned: the process is buggy and crash-prone. PDF is not intended to be machine-readable.

Related

Fast, multi-threaded and free HTML to PDF converter in C# for A4 documents [closed]

Closed 2 days ago.
I would like to ask for your advice. I need a converter that creates A4 PDF documents from HTML. It must be possible to define the margins of the document, and CSS rendering has to work (the CSS is defined in the HTML string). I don't need to save anything to a file: I send the converter HTML as a string and it returns a PDF as a byte array. It has to work as fast as possible, so that it can convert 5000 HTML strings to PDF documents, each one or two pages long, in a reasonable time. The converter has to work with a C# ASP.NET Core application.
So far I have tried these converters:
TuesPechkin
Dink
Both work well but very slowly. The convert method takes a very long time. Unfortunately it can't be called in parallel either: even if I create multiple threads, the method is always executed serially.
HtmlRenderer.PdfSharp
It works very fast (several times faster than TuesPechkin and Dink) and runs in parallel, but the rendered PDF looks bad: some important CSS styles are ignored, and even if I choose the A4 format, some text runs beyond the edge of the document.
I also read this entire thread but found nothing that solves my problem: Convert HTML to PDF in .NET
Thanks for every reply.

C# - PDF AxAcroPDFLib and AcroPDFLib, missing functions? [closed]

Closed 7 years ago.
I was trying to display a PDF file. After long research on Google, Stack Overflow, and MSDN, I found a few options. The libraries from Adobe were the first hit, but I had no success with them.
Link to album with properties, events and code.
None of the functions and properties mentioned in the answers were available here. I added the COM control to the toolbox, put it on a Form, and added the "usings", as shown in the pictures.
I have a few questions on this topic:
1. What did I do wrong, so that the control doesn't expose those functions?
2. Can I use these libraries in a commercial program?
3. Must the user have Adobe Reader installed to run my program?
Later on I found other, clearly commercial libraries, but I can't afford their licenses.
The last one was "PDFSharp", but I can't understand where to put the code from this sample.
I would appreciate a recommendation for a library or program that:
Is independent (I would rather not force users to install Adobe Reader, for example).
Displays PDFs.
Moves through PDF pages (scroll bars, and changing the index of the current page).
And that is actually all. Even a simple conversion from PDF to an ImageBox would be enough.
Thank you all in advance for your help.

1. (and 2) It may be related to the latest changes in the Adobe Reader COM objects, which broke the Adobe Reader controls on Windows x64. The solution is to use the WebBrowser control instead (and expect that it will call the PDF plugin) OR to compile your application targeting x86 instead of AnyCPU.
If you rely on the Adobe PDF plugin then yes, the user must have Adobe Reader installed and must explicitly permit the PDF plugin in Internet Explorer.
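The WebBrowser route mentioned above can be sketched roughly as follows. This is only a minimal Windows Forms sketch, assuming a PDF plugin (such as Adobe Reader's) is installed and allowed in Internet Explorer; the program name, window title and sizes are illustrative, and without a plugin the control will typically offer to download or open the file externally rather than display it.

```csharp
using System;
using System.IO;
using System.Windows.Forms;

static class PdfPreviewProgram
{
    [STAThread]
    static void Main(string[] args)
    {
        // Path to the PDF is taken from the command line (illustrative choice).
        if (args.Length != 1 || !File.Exists(args[0]))
        {
            MessageBox.Show("Usage: PdfPreview <path-to-pdf>");
            return;
        }

        Application.EnableVisualStyles();

        // The WebBrowser control hosts Internet Explorer's engine, which hands
        // the PDF to whatever plugin is registered for it (an assumption here).
        var browser = new WebBrowser { Dock = DockStyle.Fill };
        var form = new Form { Text = "PDF preview", Width = 900, Height = 700 };
        form.Controls.Add(browser);

        // Navigate needs an absolute path or URI.
        browser.Navigate(new Uri(Path.GetFullPath(args[0])));

        Application.Run(form);
    }
}
```

Paging and scrolling are then handled by the plugin's own UI rather than by your code.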

Content-wise not page-wise pdf comparison library [closed]

Closed 5 years ago.
I'm looking for a library that I can use in a C# Windows application for comparing PDF files. I have seen a lot of tools that do page-wise PDF comparison (e.g., http://www.inetsoftware.de/other-products/pdf-content-comparer). However, I want content-wise comparison. That means that if content is added or removed and everything after the change is shifted as a result, I do not want the shifted content to be considered changed.
One option is to extract the text from the PDF files and then do a text comparison using an algorithm like the one proposed by Eugene W. Myers in his paper "An O(ND) Difference Algorithm and Its Variations". However, I wonder if there is a tool or library that I can use in C# to do this? Ideally, the tool would show the entire original document and highlight the changes. It would also detect other content changes, such as image changes.
Thanks.
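To make the "extract text, then diff" option concrete, here is a minimal sketch. It assumes the text has already been extracted from both PDFs and split into paragraphs (the extraction step is not shown), and it uses a plain longest-common-subsequence table rather than Myers' O(ND) algorithm, which produces the same kind of result with less work on large inputs. The class and method names are made up for this example.

```csharp
using System;
using System.Collections.Generic;

public static class ParagraphDiff
{
    // Returns one entry per paragraph, prefixed with
    // "  " (unchanged), "- " (only in the old text) or "+ " (only in the new text).
    public static List<string> Compare(IReadOnlyList<string> oldParas,
                                       IReadOnlyList<string> newParas)
    {
        int n = oldParas.Count, m = newParas.Count;

        // lcs[i, j] = length of the longest common subsequence of
        // oldParas[i..] and newParas[j..].
        var lcs = new int[n + 1, m + 1];
        for (int i = n - 1; i >= 0; i--)
            for (int j = m - 1; j >= 0; j--)
                lcs[i, j] = oldParas[i] == newParas[j]
                    ? lcs[i + 1, j + 1] + 1
                    : Math.Max(lcs[i + 1, j], lcs[i, j + 1]);

        // Walk the table from the start, emitting matches, deletions and insertions.
        var result = new List<string>();
        int a = 0, b = 0;
        while (a < n && b < m)
        {
            if (oldParas[a] == newParas[b])
            {
                result.Add("  " + oldParas[a]);
                a++; b++;
            }
            else if (lcs[a + 1, b] >= lcs[a, b + 1])
            {
                result.Add("- " + oldParas[a]);
                a++;
            }
            else
            {
                result.Add("+ " + newParas[b]);
                b++;
            }
        }
        while (a < n) result.Add("- " + oldParas[a++]);
        while (b < m) result.Add("+ " + newParas[b++]);
        return result;
    }
}
```

Comparing { "Intro", "Body", "End" } with { "Intro", "New paragraph", "Body", "End" } marks only "New paragraph" as inserted; "Body" and "End" stay unchanged even though they have shifted down. Image changes are outside the scope of this sketch.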

A commercial option is DocsCorp compareDocs SDK (also known as DocuComp) http://www.docscorp.com/public/products/publicProductsDocuCompServer.cfm
It is a content-based comparison solution. For example, shifting of content due to the insertion of a new paragraph will not cause all subsequent text to be considered 'changed'. The inserted paragraph will be marked as 'inserted' while the subsequent text will still be considered 'same'.
It does PDF-to-PDF comparison with the output as a single PDF. Changes are shown as annotations: insertions as underlined text, deletions as PDF comments (yellow sticky notes) anchored at the point where the deletion took place. The output can be based on the modified PDF, OR it can show a side-by-side view of both PDFs in one PDF.
The comparison is text based only. It does not currently attempt to show changes in images or other graphical elements in PDFs.
For full disclosure, I am employed by and part-own this company. My position is R&D VP.
Regards
Shane

Correct way of parsing XMP XML metadata attached to the end of a PDF file? [closed]

Closed 5 years ago.
I have a PDF with some metadata in XMP XML format attached to the end. What is the correct way of parsing and using this metadata?
At the moment I have a working solution in C99 that parses each character in the file, starting at the beginning and looping until I reach the tag I'm after, then recording the contents until I reach the closing tag. I can't see this as the best way of doing things.
I'm now rewriting this program in C# + Mono (not .NET), and I wonder if there is a magic framework class for this task instead of just imitating the C99 version? (Also, I can only rely on third-party libraries if they don't contain any P/Invoke stuff, etc.)
I'm using Mono because I need this app to be cross-platform.

Adobe has published the XMP specification. Give it a try. You need to find out what XMP schema the XML uses and parse it accordingly.

If you can get the complete XML as a string, you can use XmlDocument.LoadXml to get the complete XML in memory for querying (XmlDocument.Load expects a file name, stream, or reader rather than the XML text itself).
You can then use XPath with the XmlDocument.SelectNodes method in order to get to your data.
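A minimal sketch of that approach, using only System.Xml (so no P/Invoke): it assumes the XMP packet is stored uncompressed and UTF-8 encoded, delimited by the usual `<?xpacket begin ... ?>` and `<?xpacket end ... ?>` processing instructions, and that the field of interest is the Dublin Core title. The class and method names are made up for this example; other fields need their own namespace and XPath.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Xml;

public static class XmpReader
{
    // Returns the last XMP packet found in the file as XML text, or null if none.
    public static string ExtractPacket(byte[] pdfBytes)
    {
        // ISO-8859-1 maps every byte to exactly one char, so string indexes
        // equal byte offsets and the binary parts of the PDF do no harm.
        string raw = Encoding.GetEncoding("ISO-8859-1").GetString(pdfBytes);

        int start = raw.LastIndexOf("<?xpacket begin", StringComparison.Ordinal);
        if (start < 0) return null;
        int endPi = raw.IndexOf("<?xpacket end", start, StringComparison.Ordinal);
        if (endPi < 0) return null;
        int close = raw.IndexOf("?>", endPi, StringComparison.Ordinal);
        if (close < 0) return null;

        // Decode just the packet bytes as UTF-8 (assumption: packet is UTF-8).
        return Encoding.UTF8.GetString(pdfBytes, start, close + 2 - start);
    }

    // Returns all dc:title values (one per language alternative).
    public static List<string> ReadTitles(string xmpXml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xmpXml);

        // XPath needs prefixes bound to the namespaces used in the packet.
        var ns = new XmlNamespaceManager(doc.NameTable);
        ns.AddNamespace("rdf", "http://www.w3.org/1999/02/22-rdf-syntax-ns#");
        ns.AddNamespace("dc", "http://purl.org/dc/elements/1.1/");

        var titles = new List<string>();
        foreach (XmlNode node in doc.SelectNodes("//dc:title/rdf:Alt/rdf:li", ns))
            titles.Add(node.InnerText);
        return titles;
    }
}
```

Typical use would be `XmpReader.ReadTitles(XmpReader.ExtractPacket(File.ReadAllBytes(path)))`, after checking that ExtractPacket did not return null.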

Looking for a reporting tool that will allow vector graphics in output file (PDF) [closed]

Closed 8 years ago.
I have been tasked with evaluating the system that we currently use for creating and outputting reports.
We are using Crystal Reports 2008 (I know that this is an old version), together with a custom command-line app that we wrote in C# to execute the report for a given parameter passed on the command line.
We like Crystal because it's easy to set up and design a report. It's also easy to print and to create a PDF file from Crystal using our custom command-line program.
One of the problems/complaints that we have is that Crystal does not appear to have a way to create a PDF file containing vector images, such as our company logo. Crystal Reports always converts an image into a bitmap. When the PDF is printed, the results are less than flattering, and the PDF file size is increased.
Does anyone have any recommendations for a reporting product that we should consider?

iTextSharp supports importing WMF as vector image. Maybe other formats too.
See sample here. N.B.: it seems, it's a bit outdated... you'll need to replace 'getInstance' with 'GetInstance'.

www.hagridsolutions.com/xtraction
Offers easier use than Crystal and a rich export that can cater for exporting data into a MS Word template (that could contain vector images, headers, table of contents) and also export this into PDF or HTML format.
Design is drag-and-drop with no coding or dependence on specialized staff whatsoever.

You can define the reports once and have them scheduled to output to PDF, saved to the system to be viewed online or to a file system.
The dates can be rolling (as in Last Week, Last Month) and so always deliver based on what you need.
The design is drag-and-drop, the dashboards are interactive, the reports are available when you need them, and there is security to lock down access to the dashboards/reports and control of who can design dashboards/reports. The flexibility is surely there for whatever combination is needed.

I think that combit's List & Label will fit this requirement.
www.combit.de
However, the support for EMFs is not perfect; it works well for graphics of small and medium complexity.