As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
We need to convert all MS Office documents to PDF, TIFF, or a similar image format with no loss of formatting (these are official documents that must not be tampered with).
Is there any way to do this without installing Office on the machine that performs the conversion? Ideally, this would run on a server, multi-threaded, without the overhead of Office Automation.
You could use a third-party library such as Aspose.NET for document conversion, but if high-fidelity rendering is critical, I'm afraid there is no way around using the original application.
Microsoft Office provides a converter API that allows conversions without Office being installed. However, not only might you run into licensing issues (IANAL), but this API also supports only conversions between word-processing formats that don't require rendering the document (e.g. RTF -> DOC, DOC -> DOCX), so it is not really an option for you.
Update: Probably the best option would be to look at the SharePoint 2010 conversion engine, which is made for exactly this kind of automated (server-side) document conversion. It's quite heavy, though (in both hardware and pricing), so it may be overkill for your use case.
If this application will be run on a dedicated machine (i.e. the machine's only job is to convert a gigantic collection of Office documents), your safest bet is probably to use Office automation in a single-threaded manner and let the app happily convert one file at a time. A multi-threaded Office Automation app would probably convert documents at a faster overall rate (especially on a multi-core processor), up to the point where the server crashes.
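A minimal sketch of the one-file-at-a-time approach, in Python for illustration only. `convert` is a hypothetical callable standing in for whatever actually drives Office; the sketch just shows a batch runner that defaults to a single worker and records failures instead of aborting the whole batch.

```python
import logging
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Optional, Tuple


def _convert_one(convert: Callable[[str], str], path: str) -> Tuple[str, Optional[str]]:
    """Run one conversion; return (path, output) or (path, None) on failure."""
    try:
        return path, convert(path)
    except Exception:  # keep going: one bad document should not stop the batch
        logging.exception("conversion failed for %s", path)
        return path, None


def convert_all(
    paths: Iterable[str],
    convert: Callable[[str], str],
    max_workers: int = 1,
) -> Dict[str, Optional[str]]:
    """Convert every path and map it to its output (None if it failed).

    max_workers=1 converts strictly one file at a time, the conservative
    choice when the converter (e.g. an automated Office instance) is not
    safe to drive from several threads. Raising it trades stability for
    throughput.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(lambda p: _convert_one(convert, p), paths))
```

With the default `max_workers=1` this behaves like a plain loop; the executor is there only so a higher worker count can be tried and measured against the crash risk described above.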
Office Open XML is a non-Office-Automation alternative, but since I'm currently battling its tendency to produce OutOfMemoryException errors when exporting to relatively small Excel files (~1MB), I can't really recommend it.
Closed 11 years ago.
Is there a .dll I can use that takes a PDF file as input and produces an HTML file as output?
I want to convert from PDF to HTML. My colleague says it's very difficult to go step by step, extracting text, fonts, images, margins, links, etc. from the PDF and then creating a new HTML file with the same content. He says it's nearly impossible. So I was wondering: is there a DLL I can reference to do that?
Writing a program to do it is definitely not trivial. If you don't find a .NET library to do this (I couldn't, at least not a free one), I would just download PDFToHtml and invoke it programmatically to get the HTML.
If you have time to spare and/or PDFToHtml does not produce acceptable output for you, you could use iText to write the program yourself. It's a very mature, free PDF library. I've used it in the past to manipulate PDFs (merge, create, etc.).
UPDATE
As noted in the comment by Quandary, the PDFSharp library offers a more relaxed license (MIT) than the commercial or AGPL licenses offered by iText. Keep this in mind when choosing your library. I have not used PDFSharp myself, and I don't know how the two compare in terms of functionality.
You can download this free tool: PDFToHTML
Then, in your program, just fork a new process and run the executable, passing it the PDF file. I just tested it and it seems to work OK.
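A small sketch of invoking the tool from code, in Python for illustration (the same idea applies to `System.Diagnostics.Process` in .NET). It assumes a `pdftohtml` executable on the PATH with the xpdf/Poppler-style command line (`pdftohtml [options] <pdf> [<html-base>]`); the `-noframes` flag and the output naming can differ between builds, so check `pdftohtml -h` for yours. The function names are hypothetical.

```python
import shutil
import subprocess
from typing import List


def build_pdftohtml_command(pdf_path: str, output_base: str, exe: str = "pdftohtml") -> List[str]:
    """Return the argument list for a pdftohtml run.

    Passing a list (not one shell string) avoids quoting problems with
    spaces in file names. -noframes asks for a single HTML page instead of
    a frameset (an assumption about your pdftohtml build).
    """
    return [exe, "-noframes", pdf_path, output_base]


def pdf_to_html(pdf_path: str, output_base: str, exe: str = "pdftohtml") -> str:
    """Run pdftohtml in a child process and return whatever it printed to stdout."""
    if shutil.which(exe) is None:
        raise FileNotFoundError(f"{exe!r} was not found on PATH")
    # check=True raises CalledProcessError if the tool exits with a non-zero code
    result = subprocess.run(
        build_pdftohtml_command(pdf_path, output_base, exe),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```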
If you don't mind paying, Aspose offers a very good solution, this is what we use at my company.
http://www.aspose.com/categories/.net-components/aspose.pdf-for-.net/key-features.aspx
Closed 11 years ago.
I found out that Windows 8 is going to rely heavily on C++, HTML5 and CSS-based apps (WPF?). At my workplace I spend a lot of time working with tools like MATLAB and SciPy, and I use C# as my programming language. Given this, is there going to be any big change for desktop app developers? Are these apps going to be rewritten, and does C# have any future for desktop apps?
There's a lot of hearsay at the moment; until September, it seems nothing is definite.
There is wide speculation on whether Jupiter will be the unifying user-interface model for Windows, Web and mobile. Burela believes Jupiter may [be] a “next generation” XAML-based framework, perhaps a “mashup between WPF & Silverlight.”
There also appears to be equally strong support for three key programming languages: C# to appease the .NET developers, C++ to appease the Windows core developers, and HTML5/JavaScript to try to lure developers from other platforms.
Of course, the controversy has been Microsoft’s focus on JavaScript while nearly ignoring Silverlight and .NET developers. Articles like this one, though unofficial and speculative, should help calm some nervous developers.
Source: http://www.isdotnetdead.com/windows-8-supports-all-programming-models/#
ZDNet tries asking probing questions.
Here is another link about the future of C#
Okay, based on the articles you cite, your question confuses some terms.
Microsoft are quoted as saying that the application they demonstrated on Windows 8 was written using HTML and JavaScript. The article interprets this as saying that WPF and Silverlight are likely to be binned in favour of HTML and JavaScript.
Let's have a think about this.
As far as I'm aware, Microsoft have been really keen to run applications in the browser for a very long time. They have made ActiveX controls that run in a browser, and they have written Silverlight to run in a browser. They are one of the leaders of the whole "browser-based applications" concept. People may criticise their methods of achieving this in the past, but at least they were trying.
When you consider that Microsoft currently have a desktop package called Office and also a web-based package called Office365, you can understand why they might want to just have one package to maintain that works on the desktop and in a browser.
My final note: the Ars Technica article describes HTML tooling as inferior, but seeing as you can use the same tools to write a WPF application or an HTML application, I don't agree with this point.
Closed 10 years ago.
I need to create a Word file from HTML content (in an ASP.NET server application) but couldn't find a robust way of doing that, so I decided to start a discussion here to see what the possible options are.
Aspose has a .NET component for this, but it is too expensive for our budget, so it is not an option.
We expect this conversion to preserve tables, images, links, etc., and to keep invisible elements hidden.
There is a similar discussion here but solutions provided are all around Office Interop which is not a recommended solution for server application.
Any idea? Basically how do components like Aspose work?
Has the hard work already been done? There seems to be a project on CodePlex.
Blog post describing an HTML-to-docx converter
Project on CodePlex
I would suggest writing code using the Open XML API: you can navigate the HTML DOM and programmatically add the corresponding elements to the Word document. It's no simple task, though, since you are interpreting markup and attempting to convert it.
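A rough sketch of the "walk the DOM and add elements" approach, written in Python rather than against the C# Open XML SDK. It maps block elements (`<p>`, `<div>`, `<li>`, headings) to `<w:p>` paragraphs, `<b>`/`<strong>` to bold runs and `<br>` to `<w:br/>`, and skips `<script>`/`<style>`; the class and function names are made up for illustration. Tables, images, links, lists and CSS are deliberately left out, which is where most of the real work lies.

```python
from html.parser import HTMLParser
from typing import List
from xml.sax.saxutils import escape

BLOCK_TAGS = {"p", "div", "li", "h1", "h2", "h3", "h4", "h5", "h6"}
BOLD_TAGS = {"b", "strong"}
SKIP_TAGS = {"script", "style"}


class HtmlToWordML(HTMLParser):
    """Tiny HTML -> WordprocessingML converter: paragraphs, bold and line breaks only."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True: '&amp;' arrives as '&'
        self.paragraphs: List[str] = []  # finished <w:p> elements
        self.runs: List[str] = []        # <w:r> elements of the open paragraph
        self.bold = 0                    # nesting depth of <b>/<strong>
        self.skip = 0                    # nesting depth of <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip += 1
        elif tag in BOLD_TAGS:
            self.bold += 1
        elif tag in BLOCK_TAGS:
            self._flush()
        elif tag == "br":
            self.runs.append("<w:r><w:br/></w:r>")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip = max(0, self.skip - 1)
        elif tag in BOLD_TAGS:
            self.bold = max(0, self.bold - 1)
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if self.skip or not data.strip():
            return  # drop script/style text and whitespace-only text between tags
        props = "<w:rPr><w:b/></w:rPr>" if self.bold else ""
        self.runs.append(f'<w:r>{props}<w:t xml:space="preserve">{escape(data)}</w:t></w:r>')

    def _flush(self):
        if self.runs:
            self.paragraphs.append("<w:p>" + "".join(self.runs) + "</w:p>")
            self.runs = []


def html_to_wordml_body(html: str) -> str:
    """Return the <w:p> elements for the body of word/document.xml."""
    parser = HtmlToWordML()
    parser.feed(html)
    parser.close()
    parser._flush()  # close a trailing paragraph that had no closing tag
    return "".join(parser.paragraphs)
```

The output is only the body fragment; it still has to be wrapped in a `<w:document>`/`<w:body>` element that declares the `w` namespace and then packaged as a .docx.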
Link for Open XML: http://www.microsoft.com/download/en/details.aspx?displaylang=en&id=5124
It's probably worth checking out Microsoft's own XSLT Inference tool which can generate WordML from XML input.
If you are flexible with the source of the document itself being HTML/XHTML/XML this could easily get the job done.
http://msdn.microsoft.com/en-us/library/aa212886%28v=office.11%29.aspx
http://www.microsoft.com/download/en/details.aspx?displaylang=en&id=3412
I've used it in the past to generate Word documents from within an ASP.NET app, which obtained its source XML data from SQL stored procedures.
The tool can be a bit temperamental, but with a little sanitising of the XSLT it generates, it could just work.
If .docx is acceptable, you can create a Word document, save it as .docx, reverse-engineer the XML and generate your own XML/.docx. I did this with Excel/.xlsx and it worked perfectly. To speed things up, we built the XML as text and concatenated the strings (before our data, our data, after our data).
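A minimal sketch of the "before our data, our data, after our data" approach for .docx, in Python. As far as I know, the three parts below ([Content_Types].xml, _rels/.rels and word/document.xml) are the smallest package Word will open; a file saved from Word contains more (styles, settings, section properties), which you would copy from your reverse-engineered template. `write_docx` and `paragraph` are hypothetical helper names.

```python
import zipfile
from typing import Iterable
from xml.sax.saxutils import escape

# Fixed parts: the text that goes "before our data" and "after our data".
CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" '
    'ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    "</Types>"
)
ROOT_RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" '
    'Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" '
    'Target="word/document.xml"/>'
    "</Relationships>"
)
DOC_BEFORE = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    "<w:body>"
)
DOC_AFTER = "</w:body></w:document>"


def paragraph(text: str) -> str:
    """One plain paragraph; escape() turns &, < and > into XML entities."""
    return f'<w:p><w:r><w:t xml:space="preserve">{escape(text)}</w:t></w:r></w:p>'


def write_docx(path: str, paragraphs: Iterable[str]) -> None:
    """Write a minimal .docx by concatenating fixed XML text around the data."""
    body = "".join(paragraph(p) for p in paragraphs)
    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("[Content_Types].xml", CONTENT_TYPES)
        package.writestr("_rels/.rels", ROOT_RELS)
        package.writestr("word/document.xml", DOC_BEFORE + body + DOC_AFTER)
```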
As far as I know, RTF is not a formal standard, but it is widespread. Create an RTF document and return it as a Word document; Word opens RTF without problems.
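A minimal sketch of generating RTF as a string, in Python. It assumes plain paragraphs in a single font; `rtf_escape` and `make_rtf` are illustrative names. Backslashes and braces must be escaped, and non-ASCII characters are written as `\uN?` (signed 16-bit UTF-16 code units, with `?` as the fallback character that readers ignoring `\u` display instead).

```python
from typing import Iterable


def rtf_escape(text: str) -> str:
    """Escape plain text for use in an RTF body."""
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)      # RTF control characters
        elif ch == "\n":
            out.append("\\line ")      # manual line break inside a paragraph
        elif ch == "\t":
            out.append("\\tab ")
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # \uN takes a signed 16-bit decimal; characters outside the BMP
            # become two UTF-16 surrogate code units. '?' is the fallback char.
            data = ch.encode("utf-16-le")
            for i in range(0, len(data), 2):
                unit = int.from_bytes(data[i:i + 2], "little")
                if unit > 32767:
                    unit -= 65536
                out.append(f"\\u{unit}?")
    return "".join(out)


def make_rtf(paragraphs: Iterable[str]) -> str:
    """Build a complete RTF document with one paragraph per string."""
    # \fs24 is measured in half-points, i.e. 12 pt
    header = "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Times New Roman;}}\\f0\\fs24 "
    body = "".join(rtf_escape(p) + "\\par\n" for p in paragraphs)
    return header + body + "}"
```

Served with a Word content type and a .doc file name, the result should open in Word as described above; saving it with an .rtf extension works as well.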
Alternatively, create an HTML document and return it as a Word document.
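A sketch of "return HTML as a Word document", written as a tiny Python WSGI app for illustration; in ASP.NET the equivalent is setting the same `Content-Type` and `Content-Disposition` response headers. The `application/msword` type and the `.doc` file name are what make the browser hand the file to Word; the "Report" title and body are placeholder content. How faithfully Word renders the HTML (and whether your Word version warns about the extension) is something to test with your own documents.

```python
from html import escape
from typing import Callable, Iterable, List, Tuple


def build_word_html(title: str, body_html: str) -> bytes:
    """Wrap an HTML fragment in a full UTF-8 page for Word to open."""
    page = (
        '<html><head><meta charset="utf-8">'
        f"<title>{escape(title)}</title></head>"
        f"<body>{body_html}</body></html>"
    )
    return page.encode("utf-8")


def word_download_app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """WSGI app returning an HTML page labelled as a Word document (.doc)."""
    body = build_word_html("Report", "<h1>Report</h1><p>Generated on the server.</p>")
    headers: List[Tuple[str, str]] = [
        ("Content-Type", "application/msword"),
        # attachment + .doc file name: the browser offers to open/save it with Word
        ("Content-Disposition", 'attachment; filename="report.doc"'),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

To try it locally: `wsgiref.simple_server.make_server("", 8000, word_download_app).serve_forever()`.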
HTH
Closed 10 years ago.
I am in need of recommendations for a good C# .NET PDF library for a web application that will make heavy use of PDF forms. First, the library needs to have commercial licensing. Also, it needs to have robust features for merging data into PDF forms, extracting data, flattening form fields that have data, etc. If the library includes barcoding, that'd be great.
iTextSharp is fairly popular around here. It is available under AGPL and commercial licenses (with source of course).
It'll do just about anything that can be done with AcroForms-based PDF forms (including flattening), and can get/set values with LiveCycle Designer forms. It also supports the following symbologies:
Codabar
Code 39 (and variants)
Code 128
DataMatrix (2d)
EAN (8 & 13)
Interleaved 2 of 5
PDF417 (2d)
Postnet
UPCA and UPCE
Huh. I thought it supported code 93, but I don't see anything in the code here to back that up.
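Since EAN-8, EAN-13 and UPC-A appear in that list: these symbologies end in a mod-10 check digit, which an encoder either computes or validates. A short Python sketch of that calculation (independent of iTextSharp; `ean_check_digit` is an illustrative name, not part of its API).

```python
def ean_check_digit(digits: str) -> int:
    """Compute the mod-10 check digit for EAN-8, EAN-13 or UPC-A data digits.

    `digits` excludes the check digit. Weights alternate 3, 1, 3, 1, ...
    starting from the rightmost data digit.
    """
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError("digits must be a non-empty string of 0-9")
    total = 0
    for i, ch in enumerate(reversed(digits)):
        weight = 3 if i % 2 == 0 else 1
        total += int(ch) * weight
    return (10 - total % 10) % 10
```

For example, `ean_check_digit("400638133393")` gives 1, so the full EAN-13 is 4006381333931.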
The book iText In Action 2nd Edition is pretty good. Commercial support is available (included with a commercial license), and the help here and on their mailing list is quite good.
Disclaimer: iText Software pays me from time to time for services rendered. Whether or not you decide to use iText will not affect that amount.
Closed 11 years ago.
What's the best .NET PDF editing library available, and why?
It needs to be used on an IIS web-server.
Specifically, I need to edit a PDF which was generated by reporting services.
Factors I'm interested in:
Speed
Memory Consumption
Price
Quality of documentation
Library stability
Size of library
Whatever else you think is important
Have a look at iTextSharp. iTextSharp is a port of iText, a free Java PDF library.
To quote iText:
You can use iText to:
Serve PDF to a browser
Generate dynamic documents from XML files or databases
Use PDF's many interactive features
Add bookmarks, page numbers, watermarks, etc.
Split, concatenate, and manipulate PDF pages
Automate filling out of PDF forms
Add digital signatures to a PDF file
And much more...
Syncfusion Essential PDF is the best. I have been using it for years. Syncfusion also provides better support than other vendors.
I've researched quite a few tools that aren't offered by Adobe, and the two that come to mind right away are Atalasoft's DotImage and LEADTools. They are both rather pricey, but they provide server licensing and use ultra-fast C++ GDI libraries.
There's a freeware .NET library called PDFSharp that uses native .NET GDI+, so it's slower and more memory-intensive. But then again, it's free.
webSupergoo have a super PDF library for .NET
Their ABCpdf product is designed for use with web servers. The documentation is clear and the installation is accompanied by an example website project.
If you visit their website you should see a link to the live demonstration:
http://www.websupergoo.com/abcpdf-1.htm
ABCpdf 7 is the current version. Performance and reliability are excellent. The standard version costs $329 USD, but sometimes an installation can be obtained for free. The download size is about 30 MB. It supports both 32-bit and 64-bit servers.
I've used http://www.tallcomponents.com/ mainly to fill in pdf forms and then flatten the pdf. Seems to work fine. I haven't had any issues.
I don't know if it's the best, but I use PDF-Writer.NET, for which I paid $89. I have used it in several production applications. I like it because it's easy to set up and use, and it doesn't require a lot of coding, which makes it easier for new developers to ramp up on it.
Before that I was hacking together PDFs using an open source library and the Acrobat interop DLLs. That was rough.
I'm not sure about the PDF part, but Aspose has a library for PDF. I've used their Word library for generating Word documents, and their documentation is very decent in my opinion.
http://www.aspose.com/categories/file-format-components/aspose.pdf-for-.net-and-java/default.aspx
Check out Aspose.Pdf for .NET. It has a nice API, is well documented and has a light footprint.