I've taken a look at this thread about converting RTF to HTML but some links are down or cost money.
What is the best way to convert RTF to HTML? It is just text (font size, bold, underline, color, etc.), not images or anything else.
Any help would be greatly appreciated!
Just Google that dead DocFrac link; it gives you the open source location on SourceForge and a download location at SoftPedia.
It seems pretty stable, but I haven't tried it myself.
EDIT: It uses a COM DLL, or unmanaged DLL, so to speak. You can link that with ordinary P/Invoke calls, but if you have trouble setting it up, have a look at this post which shows how to do this for converting RTF to HTML with DoxLib. The DLLs are found in the *.gz file from SourceForge. There's even a VB6 example project, but that can only be run with a non-.NET version of VB.
The CopySourceAsHtml Visual Studio plugin (which is open source) does exactly that. Take a look at the source code.
You can try the C# library from the article Writing Your Own RTF Converter which is free.
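Since the question only needs basic text formatting, here is a rough idea of what such a converter does internally. This is a minimal sketch in Python (not C#, and not how DocFrac or the article's library work). The names rtf_to_html, wrap, TOKEN, TOGGLES and SKIPPED_DESTINATIONS are made up for illustration. It only understands \b, \i, \ul, \ulnone and \par, skips metadata groups such as the font and color tables, and ignores font size, color and \'hh hex escapes entirely.

```python
import html
import re

# One token: a control word (\b, \fs24, \ulnone) with its optional numeric
# parameter and delimiting space, an escaped symbol (\{), a brace, or plain text.
TOKEN = re.compile(r"\\([a-z]+)(-?\d+)? ?|\\([^a-z])|([{}])|([^\\{}]+)")

# Control words this sketch understands, mapped to the HTML tag they toggle.
TOGGLES = {"b": "b", "i": "i", "ul": "u"}
# Groups that start with these words hold metadata, not body text.
SKIPPED_DESTINATIONS = {"fonttbl", "colortbl", "stylesheet", "info"}


def wrap(text, state):
    """Escape one text run and wrap it in the tags that are active for it."""
    opening = "".join(f"<{tag}>" for tag in sorted(state))
    closing = "".join(f"</{tag}>" for tag in sorted(state, reverse=True))
    return opening + html.escape(text) + closing


def rtf_to_html(rtf):
    out = []
    state = set()     # formatting tags active right now
    saved = []        # formatting saved at each "{" so "}" can restore it
    skip_from = None  # group depth at which a skipped destination started
    for match in TOKEN.finditer(rtf):
        word, param, symbol, brace, text = match.groups()
        if brace == "{":
            saved.append(set(state))
        elif brace == "}":
            if saved:
                state = saved.pop()
            if skip_from is not None and len(saved) < skip_from:
                skip_from = None
        elif skip_from is not None:
            continue  # inside a font table or other metadata group
        elif word in TOGGLES:
            # "\b0" switches the attribute off, "\b" or "\b1" switches it on.
            if param == "0":
                state.discard(TOGGLES[word])
            else:
                state.add(TOGGLES[word])
        elif word == "ulnone":
            state.discard("u")
        elif word == "par":
            out.append("<br>\n")
        elif word in SKIPPED_DESTINATIONS or symbol == "*":
            skip_from = len(saved)
        elif symbol is not None and symbol in "\\{}":
            out.append(wrap(symbol, state))
        elif text:
            # Raw line breaks in RTF source are not part of the text.
            text = text.replace("\r", "").replace("\n", "")
            if text:
                out.append(wrap(text, state))
    return "".join(out)


SAMPLE = r"{\rtf1\ansi{\fonttbl{\f0 Arial;}}\f0\fs24 Plain \b bold\b0  and {\i italic} \ul under\ulnone .\par}"
print(rtf_to_html(SAMPLE))
```

Each text run is wrapped on its own, so the output never has badly nested tags, at the cost of repeating tags across adjacent runs. Real documents use many more control words, so for anything beyond a quick experiment one of the libraries above is the safer choice.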
Related
I use the Open XML SDK to parse pptx files. Now I want to develop my own .NET library/tool to generate an image from a PowerPoint slide. The Open XML SDK is not really meant for such tasks, and I do not know where to start my research.
Maybe it would be better to use another programming language, for example C++ (which I also know), with some library?
Or might it be necessary to convert the pptx into some intermediate format first, for example HTML, and only then into an image?
I also tried to investigate the NuGet dependencies of the Aspose.Slides and Spire.Presentation libraries to find out what they use for image generation, but these attempts did not succeed.
This looks like a good starting point: Presentation to image conversion
The cited versions are old, but the syntax is still the same: PPT slides to images
My mistake, I misunderstood the end goal.
Hi, I'm new to programming and I have written a few applications that access PDF content by using some DLL files. Now my question is: how can we write our own DLL to access PDF files? I know it's a big process but I'm very much interested to learn about this. Can anyone please help me?
You can start by reading the PDF specification (warning 32MB behind this link) in order to understand how the PDF file format is implemented. This is necessary if you want to be able to parse it and extract the information you are interested in.
In the meantime (as this reading might keep you occupied for some time), if you have pressing project deadlines, you probably want to use an existing library such as iTextSharp.
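To give a feel for what the specification describes, here is a small sketch of the outermost file structure: the %PDF- header, the startxref pointer at the end of the file, and the classic cross-reference table it points to. It is written in Python rather than C# to keep it short. build_tiny_pdf and read_pdf_skeleton are hypothetical helper names, and the sketch assumes an uncompressed xref table (no cross-reference streams, no incremental updates, no encryption).

```python
import re


def build_tiny_pdf():
    """Assemble a minimal one-page PDF in memory so the sketch needs no input file."""
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 200 200] >>",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset where this object starts
        out += b"%d 0 obj\n%s\nendobj\n" % (number, body)
    xref_offset = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry 0 is always the free-list head
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return bytes(out)


def read_pdf_skeleton(data):
    """Read the header version, the xref offset, and the in-use object offsets."""
    header = re.match(rb"%PDF-(\d\.\d)", data)
    if header is None:
        raise ValueError("not a PDF: missing %PDF- header")
    # A reader starts at the END of the file: startxref gives the xref offset.
    tail = re.search(rb"startxref\s+(\d+)\s+%%EOF\s*$", data)
    if tail is None:
        raise ValueError("missing startxref / %%EOF trailer")
    xref_offset = int(tail.group(1))
    xref = data[xref_offset:]
    if not xref.startswith(b"xref"):
        raise ValueError("cross-reference streams are not handled in this sketch")
    # Each in-use entry is "<10-digit offset> <5-digit generation> n".
    entries = re.findall(rb"(\d{10}) (\d{5}) n", xref)
    return {
        "version": header.group(1).decode("ascii"),
        "xref_offset": xref_offset,
        "object_offsets": [int(offset) for offset, _generation in entries],
    }


pdf = build_tiny_pdf()
print(read_pdf_skeleton(pdf))
```

From the object offsets a parser can jump straight to any object instead of scanning the whole file, which is the main idea behind the format. Everything after that (content streams, fonts, text extraction) is where the bulk of the specification and of a library like iTextSharp lies.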
I know it's a big process but I'm very much interested to learn about this.
That's true. I'd suggest studying some open source APIs (such as iTextSharp) and PDF SDKs.
We are using Report Definition Language (RDL) templates to define various reports in one of our SharePoint applications. These reports are then saved as PDFs into various SharePoint document libraries. One report in particular renders, but is considered to be "failing" because of its styling requirements. It appears that RDL only understands very simple HTML.
For Example:
Trademark characters are not rendering as superscript (they render as normal text instead)
The ability to assign Line Height fails
The ability to assign Word Spacing fails (so the printers' "leading" requirements fail)
All of these point to various noted Microsoft limitations in how RDL interprets HTML, of which we are now aware.
So...
I need a better tool...and we are scratching our heads on this one!
QUESTION:
What tools take in HTML, understand CSS (well!), and can generate PDFs from C# objects?
Please keep in mind that I need the PDF generator tools you recommend (below) to understand CSS and HTML.
NOTE:
I looked at the various other Stack Exchange sites to see if there is a better forum for this particular question, but this one was the only one that seemed to fit the bill. If you are a moderator and feel this question is misplaced, please feel free to move it.
This HTML to PDF converter has the most accurate conversion of a complex HTML/CSS page. There is also a demo to try the conversion with your own HTML.
Maybe you can give Amyuni WebkitPDF a try. It is a free component for converting HTML+CSS into PDF files. From the home page:
Directly convert HTML files into PDF without the use of a web browser or a printer driver
Convert HTML files into XAML/XPS for rendering within Silverlight
Integrate and deploy the HTML conversion feature within your applications
Generate either a single continuous PDF page or split the HTML into multiple PDF pages
Amyuni WebkitPDF is distributed as a library with a sample application, and sample code for C++ and C#.
Disclaimer: I currently work as a software developer at Amyuni Technologies.
I only know a workaround for the "leading space" issue. This example "leads" the value with 10 spaces:
=space(10) & Fields!FieldName.Value
This should work for any renderer. I'll update this if I come across other tricks.
Have a look at Aspose.Pdf for .NET: http://www.aspose.com/categories/.net-components/aspose.pdf-for-.net/default.aspx
I need to create a C# or C++ (MFC) application that converts PDF files to text. I need not only to convert, but also to remove headers, footers, some garbage characters in the left margin, etc. The application should therefore allow the user to set page margins to cut off what is not needed. I have actually already created such an application using xpdf, but it gives me some problems when I try to insert custom tags into the extracted text to preserve italics and bold. Maybe somebody could suggest something useful?
Thanks.
There are shareware and freeware utilities out there. Try fetching their source code, or perhaps use them the way they are.
A public version of the PDF specification can be found here: Adobe PDF Specification
Shareware PDF readers can be found here: PDF Reader source code # SourceForge
Please look at Podofo. It's an LGPL-licensed library that has many powerful editing features. One of its examples, txt2pdf IIRC, is a good start: it shows basic text extraction. From there you can check whether pre-filtering (in the PDF engine) or post-filtering (in the text) suffices for your goals. I didn't get to use Pdf Hummus, but it's supposed to have these capabilities too, although it's less straightforward.
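Whichever library does the extraction, the margin cropping and tagging step from the question can be done as post-filtering once you have text fragments with positions and font flags. The following is only a sketch in Python with made-up types: Fragment, Margins and crop_and_tag are hypothetical and do not correspond to the API of xpdf, Podofo or Pdf Hummus. It assumes PDF-style coordinates (origin at the bottom-left, in points) and that fragments on the same line share exactly the same y value, which real extractors will not guarantee.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """One run of text as a (hypothetical) extractor might report it."""
    x: float
    y: float
    text: str
    bold: bool = False
    italic: bool = False


@dataclass
class Margins:
    """User-chosen margins in points; anything inside them is cut off."""
    left: float
    right: float
    top: float
    bottom: float


def crop_and_tag(fragments, page_width, page_height, margins):
    # Keep only fragments whose origin lies inside the body area.
    kept = [
        f for f in fragments
        if margins.left <= f.x <= page_width - margins.right
        and margins.bottom <= f.y <= page_height - margins.top
    ]
    # Reading order: top of the page first (largest y), then left to right.
    kept.sort(key=lambda f: (-f.y, f.x))
    lines = []
    current_y = None
    for f in kept:
        text = f.text
        if f.italic:
            text = f"<i>{text}</i>"
        if f.bold:
            text = f"<b>{text}</b>"
        if f.y != current_y:  # a new baseline starts a new output line
            lines.append([])
            current_y = f.y
        lines[-1].append(text)
    return "\n".join(" ".join(parts) for parts in lines)


page = [
    Fragment(72, 770, "Report header"),   # falls in the top margin
    Fragment(10, 700, "|"),               # garbage in the left margin
    Fragment(72, 700, "Plain"),
    Fragment(110, 700, "bold", bold=True),
    Fragment(72, 680, "italic", italic=True),
    Fragment(72, 20, "Page 1"),           # falls in the bottom margin
]
print(crop_and_tag(page, 612, 792, Margins(left=50, right=50, top=60, bottom=60)))
```

Tagging per fragment, before joining, avoids the problem of inserting tags into already flattened text, which is where the xpdf-based approach described in the question seems to run into trouble.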
I need to convert a batch of PDF documents into a non-editable (scanned) format. Can someone help me achieve this using C#.NET?
Assuming that Chris's comment is correct and you're trying to convert PDF docs to pictures, I'd suggest taking a look at ImageMagick.NET, a .NET wrapper around ImageMagick, which is an open source library for doing things like that.
Never used it myself, but it looks interesting.