How to convert ppt to HTML in C#?

On my website, an admin can upload a PPT, and on submission I need to convert it to HTML.
I was using the OpenXML library for Word documents and thought the same library could be used for PPT as well, but I can't find a method for it.
namespace OpenXML_Sample
{
class Program
{
static void Main(string[] args)
{
ExportHTML.GenerateHTML(@"D:\test.pptx");
Console.ReadKey();
}
}
public class ExportHTML
{
public static XElement GenerateHTML(string filePath)
{
try
{
byte[] byteArray = File.ReadAllBytes(filePath);
using (MemoryStream memoryStream = new MemoryStream())
{
memoryStream.Write(byteArray, 0, byteArray.Length);
using (PresentationDocument pptDoc=
PresentationDocument.Open(memoryStream, true))
{
HtmlConverterSettings settings = new HtmlConverterSettings()
{
PageTitle = "My Page Title"
};
//not accepting pptDoc as parameter,throws compile time error.
XElement xHtml = HtmlConverter.ConvertToHtml(pptDoc, settings);
var html = xHtml.ToString();
File.WriteAllText(@"D:\sample.html", html, Encoding.UTF8);
return xHtml;
}
}
}
catch (Exception ex)
{
throw new FileLoadException(ex.InnerException.Message.ToString());
}
}
}
}
How do I pass the PPT document to the method to generate the HTML for the uploaded PPT file?
I would welcome any other (free) API as well.

I have used the Aspose library before and I believe it supports what you are wanting to achieve.
A quick search on their forums revealed this post which might suit your needs;

web,
I would like to share that Aspose.Slides for .NET supports exporting a presentation file to HTML, and you don't even need MS Office installed on your machine for this. All you need to do is use the appropriate functionality in the API. Please visit this documentation link for reference. If you still have an issue, please contact us in the Aspose.Slides support forum.
I am working as Support developer/ Evangelist at Aspose.
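A minimal sketch of the export described above, assuming the Aspose.Slides for .NET package is referenced. The class name and file paths are hypothetical placeholders, and this uses only the basic load-and-save call rather than anything taken from the linked documentation, so treat it as a starting point.

```csharp
using Aspose.Slides;
using Aspose.Slides.Export;

public static class PptToHtml
{
    // Loads a presentation from inputPath and saves it as HTML to outputPath.
    // Both paths are illustrative; adjust them to your upload folder.
    public static void Convert(string inputPath, string outputPath)
    {
        using (var presentation = new Presentation(inputPath))
        {
            presentation.Save(outputPath, SaveFormat.Html);
        }
    }
}
```

It could be called as PptToHtml.Convert(@"D:\test.pptx", @"D:\sample.html") from the upload handler.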

There are some examples of converting in C# with iSpring Platform http://www.ispringsolutions.com/ispring-platform. It isn’t tailored for a certain programming language, but it’s easy to use it with C#. First of all, there are some examples, and secondly, there’s a Code Builder app, so you can set the necessary conversion configuration and use the generated C# code in your app.

Related

C# .net converting HTML to RTF [closed]

There's another post at HTML to RTF Converter for .NET, but are there any open source converters or tutorials? I don't want to use Sautinsoft. I think there is a solution at ExpertsExchange, but I have to pay for that. Most of the search results on Google point to an RTF-to-HTML converter, not an HTML-to-RTF converter.

Create a WebBrowser. Load it with the html content. Select all and copy from it. Paste into a richtextbox. Then you have the RTF
string html = "...."; // html content
RichTextBox rtbTemp = new RichTextBox();
WebBrowser wb = new WebBrowser();
wb.Navigate("about:blank");
wb.Document.Write(html);
wb.Document.ExecCommand("SelectAll", false, null);
wb.Document.ExecCommand("Copy", false, null);
rtbTemp.SelectAll();
rtbTemp.Paste();
Now rtbTemp.Rtf holds the RTF converted from the HTML.

TL;DR: I recommend using the OpenXml format and the HtmlToOpenXml nuget package if possible.
Microsoft Word COM
I haven't really looked much into this option, as my use case is to run the functionality on a server, which makes COM components a poor choice.
XHTML2RTF
As @IAmTimCorey mentioned you can use this codeproject library.
Disadvantages are:
Limited supported HTML and CSS
Not really .NET
...
Windows Forms Web Browser
As @Jerry mentioned you can use the Windows Forms WebBrowser control.
Disadvantages are:
Reference to System.Windows.Forms
Uses copy & paste (problematic for multithreading)
Only works in an STA thread
Not supported features include:
Fonts
Colors
Numbered lists
Strikethrough (del element)
...
DevExpress
Code sample of "Paul V" from the devexpress support center. (03.02.2015)
public String ConvertRTFToHTML(String RTF)
{
MemoryStream ms = new MemoryStream();
StreamWriter writer = new StreamWriter(ms);
writer.Write(RTF);
writer.Flush();
ms.Position = 0;
String output = "";
HtmlEditorExtension.Import(HtmlEditorImportFormat.Rtf, ms, (s, enumerable) => output = s);
return output;
}
public String ConvertHTMLToRTF(String html)
{
MemoryStream ms = new MemoryStream();
var editor = new ASPxHtmlEditor { Html = html };
editor.Export(HtmlEditorExportFormat.Rtf, ms);
ms.Position = 0;
StreamReader reader = new StreamReader(ms);
return reader.ReadToEnd();
}
Or you could use the RichEditDocumentServer type as shown in this example.
A license for devexpress can cost from around 1500.- USD to 2200.- USD.
Unknown what actually is supported.
Disadvantages are:
Price
Quite a lot of references for one small thing
More?
Not supported features include:
Strikethrough (del element)
Sautinsoft
public string ConvertHTMLToRTF(string html)
{
SautinSoft.HtmlToRtf h = new SautinSoft.HtmlToRtf();
return h.ConvertString(html);
}
public string ConvertRTFToHTML(string rtf)
{
SautinSoft.RtfToHtml r = new SautinSoft.RtfToHtml();
byte[] bytes = Encoding.ASCII.GetBytes(rtf);
r.OpenDocx(bytes );
return r.ToHtml();
}
More examples and configuration options can be found here and here.
A licence for this component can cost from 400.- USD to 2000.- USD.
Supported is the following:
HTML 3.2
HTML 4.01
HTML 5
CSS
XHTML
Disadvantages are:
I'm not sure how active the development is
Price
Usage knowledgebase:
Converting numbered lists from the Trix Angular editor destroys the indentation
DIY
If you only wanted to support limited functionality you could write your own converter. I would not recommend this if the supported feature set is too large. (Sautinsoft claims to have written over 20'000 lines of code).
I have a small sample project here but is only for educational purposes in its current state.
OpenXml
If the OpenXml format is also OK for your use case, you can use the HtmlToOpenXml NuGet package. It's free and supported all the features I tested the other solutions against.
The project is based on the Open XML SDK by Microsoft and seems active.
public static byte[] ConvertHtmlToOpenXml(string html)
{
using (var generatedDocument = new MemoryStream())
{
using (var package = WordprocessingDocument.Create(generatedDocument, WordprocessingDocumentType.Document))
{
var mainPart = package.MainDocumentPart;
if (mainPart == null)
{
mainPart = package.AddMainDocumentPart();
new Document(new Body()).Save(mainPart);
}
var converter = new HtmlConverter(mainPart);
converter.ParseHtml(html);
mainPart.Document.Save();
}
return generatedDocument.ToArray();
}
}
Link to example gist

The ExpertsExchange article is a poor one at best. Basically the OP gave up because they couldn't give a good answer. They list a link to the CodeProject article ( http://www.codeproject.com/KB/HTML/XHTML2RTF.aspx ) that shows you how to convert HTML to RTF but it isn't really a .NET solution. Instead, it would be something that would need to be highly adapted.
From my experience, there isn't a good open source converter out there. The pieces all seem to be there but it is waiting for someone to do the legwork of putting it all together. However, the immediate answer to your question is that there is not a converter already out there.

There seems to be a new open-source solution based on a WPF RichTextBox. The only caveat is that its core only supports STA-threaded applications, so to use it in, e.g., ASP.NET you need to call it on an STA thread (there is a sample for that in the writeup).
For use in VSTO add-ins this is confirmed to work (e.g. Outlook RTFBody).
Nuget:
https://www.nuget.org/packages/MarkupConverter/
Project:
https://github.com/figuemon/MarkupConverter
Writeup:
https://code.msdn.microsoft.com/Converting-between-RTF-and-aaa02a6e
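The STA requirement mentioned above can be met from a server thread by running the conversion on a dedicated thread. This is a sketch under stated assumptions, not the sample from the writeup: StaRunner is a hypothetical helper name, and SetApartmentState is only supported on Windows.

```csharp
using System;
using System.Threading;

public static class StaRunner
{
    // Runs work on a dedicated STA thread, blocks until it finishes,
    // and returns its result. Exceptions are rethrown on the caller's thread.
    public static T Run<T>(Func<T> work)
    {
        T result = default(T);
        Exception error = null;
        var thread = new Thread(() =>
        {
            try { result = work(); }
            catch (Exception ex) { error = ex; }
        });
        thread.SetApartmentState(ApartmentState.STA); // Windows only
        thread.Start();
        thread.Join();
        if (error != null)
            throw new InvalidOperationException("STA work failed.", error);
        return result;
    }
}
```

A converter call would then be wrapped as StaRunner.Run(() => convert(html)), where convert stands for whatever conversion method the chosen library exposes.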

Extract Data from .PDF files

I need to extract data from .PDF files and load it in to SQL 2008.
Can anyone tell me how to proceed?

Here is an example of how to use iTextSharp to extract text data from a PDF. You'll have to fiddle with it some to make it do exactly what you want, I think it's a good outline. You can see how the StringBuilder is being used to store the text, but you could easily change that to use SQL.
static void Main(string[] args)
{
PdfReader reader = new PdfReader(@"c:\test.pdf");
StringBuilder builder = new StringBuilder();
for (int x = 1; x <= reader.NumberOfPages; x++)
{
PdfDictionary page = reader.GetPageN(x);
IRenderListener listener = new SBTextRenderer(builder);
PdfContentStreamProcessor processor = new PdfContentStreamProcessor(listener);
PdfDictionary pageDic = reader.GetPageN(x);
PdfDictionary resourcesDic = pageDic.GetAsDict(PdfName.RESOURCES);
processor.ProcessContent(ContentByteUtils.GetContentBytesForPage(reader, x), resourcesDic);
}
}
public class SBTextRenderer : IRenderListener
{
private StringBuilder _builder;
public SBTextRenderer(StringBuilder builder)
{
_builder = builder;
}
#region IRenderListener Members
public void BeginTextBlock()
{
}
public void EndTextBlock()
{
}
public void RenderImage(ImageRenderInfo renderInfo)
{
}
public void RenderText(TextRenderInfo renderInfo)
{
_builder.Append(renderInfo.GetText());
}
#endregion
}
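To store the extracted text in SQL Server instead of only keeping it in the StringBuilder, a parameterized insert along these lines might work. This is a sketch only: the table dbo.PdfText, its columns, and the class name PdfTextStore are hypothetical, and it assumes System.Data.SqlClient is available.

```csharp
using System.Data;
using System.Data.SqlClient;

public static class PdfTextStore
{
    // Builds a parameterized INSERT; dbo.PdfText and its columns are hypothetical.
    public static SqlCommand BuildInsert(SqlConnection connection, string fileName, string text)
    {
        var command = new SqlCommand(
            "INSERT INTO dbo.PdfText (FileName, Content) VALUES (@fileName, @content)",
            connection);
        command.Parameters.Add("@fileName", SqlDbType.NVarChar, 260).Value = fileName;
        command.Parameters.Add("@content", SqlDbType.NVarChar, -1).Value = text; // -1 means NVARCHAR(MAX)
        return command;
    }

    // Opens the connection, runs the insert, and disposes both objects.
    public static void Save(string connectionString, string fileName, string text)
    {
        using (var connection = new SqlConnection(connectionString))
        using (var command = BuildInsert(connection, fileName, text))
        {
            connection.Open();
            command.ExecuteNonQuery();
        }
    }
}
```

After the page loop in Main, the call would be something like PdfTextStore.Save(connectionString, @"c:\test.pdf", builder.ToString()).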

Imagine if you asked this question: how can I load data from arbitrary text files into a SQL table? The challenge isn't opening the text file and reading it, it's getting meaningful data out of the files automatically.
So you can use either iText or PDFsharp to read the PDF files, but getting meaningful data out is going to be the challenge.
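As an illustration of that second step, here is a sketch that pulls "Label: value" pairs out of already-extracted text. It assumes the text keeps one field per line, which many PDFs will not give you; the class name and the label format are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class FieldExtractor
{
    // Matches lines shaped like "Label: value". The label format is an assumption.
    private static readonly Regex LabelValue = new Regex(
        @"^\s*(?<label>[A-Za-z ]+?)\s*:\s*(?<value>.+?)\s*$",
        RegexOptions.Multiline);

    // Returns the fields found in text, keyed by label (case-insensitive).
    public static Dictionary<string, string> Extract(string text)
    {
        var fields = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (Match match in LabelValue.Matches(text))
        {
            fields[match.Groups["label"].Value] = match.Groups["value"].Value;
        }
        return fields;
    }
}
```

Real documents usually need layout-aware parsing on top of this, which is exactly the hard part the answer refers to.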

What you need to do is use a tool to extract the text from the PDF first, then read the file with a binary reader and store it in your database. There are several tools for extracting the text. The first to mention are:
iTextSharp, a library that can be downloaded and used for extensive, in-depth editing and building of PDF documents. There are a lot of examples available online, along with a full book that explains the ins and outs of it.
Adobe PDF iFilter, a tool from Adobe for dealing with PDF modification and manipulation.
Foxit iFilter, a similar assembly that can do just what you are asking for.
PDFBox will also serve you.
These are the most well-known and well-documented ones.
Try the following examples on CodeProject:
Parsing PDF files in .NET using PDFBox and IKVM.NET
A simple class to extract plain text from PDF documents with ITextSharp
Using the IFilter interface to extract text from various document types
A parser for PDF Forms written in C#.NET
These do the job and they aren't hard to understand. Hope they help you :-)
A final note: personally, I would use iTextSharp, as it's the most well-documented library with the most available examples.

If you mean metadata, try this question (first answer)
Read/Modify PDF Metadata using iTextSharp
You'll have to do the database stuff yourself though.

HTML to PDF in c#

I'm trying to create an application that converts a file from the HTML format to the PDF format.
The approach I am using is:
HTML to XHTML
XHTML to Formatting Object
Formatting Object to PDF
I'm having a bit of trouble with the XHTML to FO (or XSL) step.
Can you please tell me how to transform the XHTML to FO?
Or maybe suggest a different approach to the whole HTML to PDF conversion?
Thanks, Catalin

I have written up the easiest way to convert HTML to PDF, using the NReco PDF library, which is available for free. Install the NuGet package:
PM > Install-Package NReco.PdfGenerator
public static void CreateHtmlToPdf(string html)
{
if (System.IO.File.Exists("HTMLFile.html"))
{
System.IO.File.Delete("HTMLFile.html");
}
System.IO.File.WriteAllText("HTMLFile.html", html);
var htmlToPdf = new NReco.PdfGenerator.HtmlToPdfConverter();
if (System.IO.File.Exists("export.pdf"))
{
System.IO.File.Delete("export.pdf");
}
htmlToPdf.GeneratePdfFromFile("HTMLFile.html", null, "export.pdf");
}

Well, you could use an HTML to PDF converter via the shell. I'm sorry, I cannot remember the name of the one I have used in the past; if you have a Google around, you should be able to find a good one.

Searched a lot for my personal stack app project SO2PDF and finally settled with wkhtmltopdf which so far is the best free tool to convert HTML to PDF. Yes I used it with c# ;-)

Here is a different approach. We are going to convert HTML/XML to PDF directly with a third-party tool (it has multiple conversion preferences and settings and doesn't require any external libraries).
1) Download the free HTML to PDF SDK from here (it is easyPDF SDK)
2) Use the following code or run Action Center to customize the conversion
using BCL.easyPDF.Printer;
namespace TestPrinter
{
class Program
{
static void Main(string[] args)
{
if(args.Length != 2)
return;
string inputFileName = args[0];
string outputFileName = args[1];
Printer printer = new Printer();
try
{
IEPrintJob printjob = printer.IEPrintJob;
printjob.PrintOut(inputFileName, outputFileName);
}
catch(PrinterException ex)
{
System.Console.WriteLine(ex.Message);
}
finally
{
printer.Dispose();
}
}
}
}

What is the best solution for converting RichTextFormat info to HTML in C#?

What is the best solution for converting RichTextFormat info to HTML in C#?
I know there are libraries out there that do this, and I was curious to see if you guys had any advice as to which ones are the better ones.
Thanks,
Jeff

I recently used an RTF to HTML converter that worked great, called DocFrac.
It can be used with a GUI to convert files, but it also is a DLL.
I converted over 400 RTF files to HTML in a few minutes so performance is good too. I used the GUI so I don't have the details on the DLL. According to the site the DLL works with .NET however.
DocFrac at SourceForge
Update: fixed link, because www.docfrac.net doesn't exist anymore.

Try this library: RTF to HTML .Net. It supports RTF-to-HTML and text-to-HTML conversion. The full version is not free, but there is a free trial.
This code may be useful:
SautinSoft.RtfToHtml r = new SautinSoft.RtfToHtml();
//specify some options
r.OutputFormat = SautinSoft.RtfToHtml.eOutputFormat.XHTML_10;
r.Encoding = SautinSoft.RtfToHtml.eEncoding.UTF_8;
string rtfFile = @"d:\test.rtf";
string htmlFile = @"d:\test.html";
string rtfString = null;
ReadFromFile(rtfFile,ref rtfString);
int i = r.ConvertStringToFile(rtfString,htmlFile);
if (i == 0)
{
System.Console.WriteLine("Converted successfully!");
System.Diagnostics.Process.Start(htmlFile);
}
else
System.Console.WriteLine("Converting Error!");
}
public static int ReadFromFile(string fileName,ref string fileStr)
{
try
{
FileInfo fi = new FileInfo(fileName);
StreamReader strmRead = fi.OpenText();
fileStr = strmRead.ReadToEnd();
strmRead.Close();
return 0;
}
catch
{
//error open file
System.Console.WriteLine("Error in open file");
return 1;
}
}

ScroogeXHTML, a small library for RTF to HTML / XHTML conversion, might be useful. However it only supports a subset of the RTF standard. For reports with tables and other advanced layout, there are other libraries like the Logictran R2Net converter.

Convert HTML to PDF in .NET [closed]

I want to generate a PDF by passing HTML contents to a function. I have made use of iTextSharp for this but it does not perform well when it encounters tables and the layout just gets messy.
Is there a better way?

EDIT: New Suggestion
HTML Renderer for PDF using PdfSharp
(After trying wkhtmltopdf and suggesting to avoid it)
HtmlRenderer.PdfSharp is a 100% fully C# managed code, easy to use, thread safe and most importantly FREE (New BSD License) solution.
Usage
Download HtmlRenderer.PdfSharp nuget package.
Use Example Method.
public static Byte[] PdfSharpConvert(String html)
{
Byte[] res = null;
using (MemoryStream ms = new MemoryStream())
{
var pdf = TheArtOfDev.HtmlRenderer.PdfSharp.PdfGenerator.GeneratePdf(html, PdfSharp.PageSize.A4);
pdf.Save(ms);
res = ms.ToArray();
}
return res;
}
A very Good Alternate Is a Free Version of iTextSharp
Until version 4.1.6, iTextSharp was licensed under the LGPL licence, and versions up to 4.1.6 (there may also be forks) are available as packages and can be freely used. Of course, you can also use the continued, paid 5+ versions.
I tried to integrate wkhtmltopdf solutions on my project and had a bunch of hurdles.
I personally would avoid using wkhtmltopdf - based solutions on Hosted Enterprise applications for the following reasons.
First of all, wkhtmltopdf is implemented in C++, not C#, and you will experience various problems embedding it within your C# code, especially when switching between 32-bit and 64-bit builds of your project. I had to try several workarounds, including conditional project building, just to avoid "invalid format exceptions" on different machines.
If you manage your own virtual machine, it's OK. But if your project runs within a constrained environment such as Azure (where it is actually impossible, as mentioned by the TuesPechkin author) or Elastic Beanstalk, it's a nightmare to configure that environment only for wkhtmltopdf to work.
wkhtmltopdf creates files on your server, so you have to manage user permissions and grant "write" access to the location where wkhtmltopdf runs.
wkhtmltopdf runs as a standalone application, so it's not managed by your IIS application pool. You either have to host it as a service on another machine, or you will experience processing spikes and memory consumption on your production server.
It uses temp files to generate the PDF, and in cases like AWS EC2, which has really slow disk I/O, that is a big performance problem.
The most hated "Unable to load DLL 'wkhtmltox.dll'" error is reported by many users.
--- PRE Edit Section ---
For anyone who want to generate pdf from html in simpler applications / environments I leave my old post as suggestion.
TuesPechkin
https://www.nuget.org/packages/TuesPechkin/
or Especially For MVC Web Applications
(But I think you may use it in any .net application)
Rotativa
https://www.nuget.org/packages/Rotativa/
They both utilize the wkhtmltopdf binary for converting HTML to PDF, which uses the WebKit engine for rendering the pages, so it can also parse CSS style sheets.
They provide easy to use seamless integration with C#.
Rotativa can also generate directly PDFs from any Razor View.
Additionally for real world web applications they also manage thread safety etc...

Last Updated: October 2020
This is the list of options for HTML to PDF conversion in .NET that I have put together (some free some paid)
GemBox.Document
https://www.nuget.org/packages/GemBox.Document/
Free (up to 20 paragraphs)
$680 - https://www.gemboxsoftware.com/document/pricelist
https://www.gemboxsoftware.com/document/examples/c-sharp-convert-html-to-pdf/307
PDF Metamorphosis .Net
https://www.nuget.org/packages/sautinsoft.pdfmetamorphosis/
$539 - $1078 - https://www.sautinsoft.com/products/pdf-metamorphosis/order.php
https://www.sautinsoft.com/products/pdf-metamorphosis/convert-html-to-pdf-dotnet-csharp.php
HtmlRenderer.PdfSharp
https://www.nuget.org/packages/HtmlRenderer.PdfSharp/1.5.1-beta1
BSD-UNSPECIFIED License
PuppeteerSharp
https://www.puppeteersharp.com/examples/index.html
MIT License
https://github.com/kblok/puppeteer-sharp
EO.Pdf
https://www.nuget.org/packages/EO.Pdf/
$799 - https://www.essentialobjects.com/Purchase.aspx?f=3
WnvHtmlToPdf_x64
https://www.nuget.org/packages/WnvHtmlToPdf_x64/
$750 - $1600 - http://www.winnovative-software.com/Buy.aspx
demo - http://www.winnovative-software.com/demo/default.aspx
IronPdf
https://www.nuget.org/packages/IronPdf/
$399 - $1599 - https://ironpdf.com/licensing/
https://ironpdf.com/examples/using-html-to-create-a-pdf/
Spire.PDF
https://www.nuget.org/packages/Spire.PDF/
Free (up to 10 pages)
$599 - $1799 - https://www.e-iceblue.com/Buy/Spire.PDF.html
https://www.e-iceblue.com/Tutorials/Spire.PDF/Spire.PDF-Program-Guide/Convert-HTML-to-PDF-Customize-HTML-to-PDF-Conversion-by-Yourself.html
Aspose.Html
https://www.nuget.org/packages/Aspose.Html/
$599 - $1797 - https://purchase.aspose.com/pricing/html/net
https://docs.aspose.com/html/net/html-to-pdf-conversion/
EvoPDF
https://www.nuget.org/packages/EvoPDF/
$450 - $1200 - http://www.evopdf.com/buy.aspx
ExpertPdfHtmlToPdf
https://www.nuget.org/packages/ExpertPdfHtmlToPdf/
$550 - $1200 - https://www.html-to-pdf.net/Pricing.aspx
Zetpdf
https://zetpdf.com
$299 - $599 - https://zetpdf.com/pricing/
ZetPDF is not a well-known or well-supported library. Does anyone know the background of this product?
PDFtron
https://www.pdftron.com/documentation/samples/cs/HTML2PDFTes
$4000/year - https://www.pdftron.com/licensing/
WkHtmlToXSharp
https://github.com/pruiz/WkHtmlToXSharp
Free
Concurrent conversion is implemented as processing queue.
SelectPDF
https://www.nuget.org/packages/Select.HtmlToPdf/
Free (up to 5 pages)
$499 - $799 - https://selectpdf.com/pricing/
https://selectpdf.com/pdf-library-for-net/
If none of the options above help you you can always search the NuGet packages:
https://www.nuget.org/packages?q=html+pdf

I highly recommend NReco, seriously. It has free and paid versions, and it is really worth it. It uses wkhtmltopdf in the background, but you just need one assembly. Fantastic.
Example of use:
Install via NuGet.
var htmlContent = String.Format("<body>Hello world: {0}</body>", DateTime.Now);
var pdfBytes = (new NReco.PdfGenerator.HtmlToPdfConverter()).GeneratePdf(htmlContent);
Disclaimer: I'm not the developer, just a fan of the project :)

Most HTML to PDF converters rely on IE to do the HTML parsing and rendering. This can break when users update their IE. Here is one that does not rely on IE.
The code is something like this:
EO.Pdf.HtmlToPdf.ConvertHtml(htmlText, pdfFileName);
Like many other converters, you can pass text, file name, or Url. The result can be saved into a file or a stream.

For all those looking for a working solution in .NET 5 and above, here you go.
Here are my working solutions.
Using wkhtmltopdf:
Download and install wkhtmltopdf latest version from here.
Use the below code.
public static string HtmlToPdf(string outputFilenamePrefix, string[] urls,
string[] options = null,
string pdfHtmlToPdfExePath = @"C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe")
{
string urlsSeparatedBySpaces = string.Empty;
try
{
//Determine inputs
if ((urls == null) || (urls.Length == 0))
throw new Exception("No input URLs provided for HtmlToPdf");
else
urlsSeparatedBySpaces = String.Join(" ", urls); //Concatenate URLs
string outputFilename = outputFilenamePrefix + "_" + DateTime.Now.ToString("yyyy-MM-dd-HH-mm-ss-fff") + ".PDF"; // assemble destination PDF file name (24-hour clock avoids AM/PM name collisions)
var p = new System.Diagnostics.Process()
{
StartInfo =
{
FileName = pdfHtmlToPdfExePath,
Arguments = ((options == null) ? "" : string.Join(" ", options)) + " " + urlsSeparatedBySpaces + " " + outputFilename,
UseShellExecute = false, // needs to be false in order to redirect output
RedirectStandardOutput = true,
RedirectStandardError = true,
RedirectStandardInput = true, // redirect all 3, as it should be all 3 or none
WorkingDirectory = Path.Combine(Path.GetDirectoryName(Assembly.GetEntryAssembly().Location))
}
};
p.Start();
// read the output here...
var output = p.StandardOutput.ReadToEnd();
var errorOutput = p.StandardError.ReadToEnd();
// ...then wait n milliseconds for exit (as after exit, it can't read the output)
p.WaitForExit(60000);
// read the exit code, close process
int returnCode = p.ExitCode;
p.Close();
// if 0 or 2, it worked so return path of pdf
if ((returnCode == 0) || (returnCode == 2))
return outputFilename;
else
throw new Exception(errorOutput);
}
catch (Exception exc)
{
throw new Exception("Problem generating PDF from HTML, URLs: " + urlsSeparatedBySpaces + ", outputFilename: " + outputFilenamePrefix, exc);
}
}
And call the above method as HtmlToPdf("test", new string[] { "https://www.google.com" }, new string[] { "-s A5" });
If you need to convert an HTML string to PDF, then tweak the above method and replace the Arguments of the Process StartInfo with $@"/C echo | set /p=""{htmlText}"" | ""{pdfHtmlToPdfExePath}"" {((options == null) ? "" : string.Join(" ", options))} - ""C:\Users\xxxx\Desktop\{outputFilename}""";
Drawbacks of this approach:
The latest build of wkhtmltopdf as of posting this answer does not support the latest HTML5 and CSS3. Hence, if you try to export any HTML that uses CSS Grid, the output will not be as expected.
You need to handle concurrency issues.
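Two details of launching the executable are worth a sketch: paths containing spaces need quoting, and reading one redirected stream to the end before the other can block if the tool fills the second pipe. The helper below is a hypothetical illustration, not part of the original answer; Quote does not escape embedded double quotes.

```csharp
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

public static class WkHtmlToPdfRunner
{
    // Wraps a value in double quotes when it contains whitespace.
    public static string Quote(string value)
    {
        return value.Any(char.IsWhiteSpace) ? "\"" + value + "\"" : value;
    }

    // Options pass through unchanged (e.g. "-s A5"); URLs and the output path are quoted.
    public static string BuildArguments(IEnumerable<string> options, IEnumerable<string> urls, string outputFile)
    {
        var parts = (options ?? Enumerable.Empty<string>())
            .Concat(urls.Select(Quote))
            .Concat(new[] { Quote(outputFile) });
        return string.Join(" ", parts);
    }

    // Starts the executable, drains stdout and stderr concurrently, and returns the exit code.
    public static int Run(string exePath, string arguments, out string errorOutput)
    {
        using (var process = new Process())
        {
            process.StartInfo = new ProcessStartInfo(exePath, arguments)
            {
                UseShellExecute = false,
                RedirectStandardOutput = true,
                RedirectStandardError = true
            };
            process.Start();
            // Read both pipes at once so neither can fill up and stall the child process.
            Task<string> stdout = process.StandardOutput.ReadToEndAsync();
            Task<string> stderr = process.StandardError.ReadToEndAsync();
            process.WaitForExit();
            stdout.Wait();
            errorOutput = stderr.Result;
            return process.ExitCode;
        }
    }
}
```

Writing each conversion to a uniquely named output file, as the timestamped name in the method above already does, is one simple way to keep concurrent requests from overwriting each other.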
Using chrome headless:
Download and install latest chrome browser from here.
Use the below code.
var p = new System.Diagnostics.Process()
{
StartInfo =
{
FileName = "C:/Program Files (x86)/Google/Chrome/Application/chrome.exe",
Arguments = @"/C --headless --disable-gpu --run-all-compositor-stages-before-draw --print-to-pdf-no-header --print-to-pdf=""C:/Users/Abdul Rahman/Desktop/test.pdf"" ""C:/Users/Abdul Rahman/Desktop/grid.html""",
}
};
p.Start();
// ...then wait n milliseconds for exit (as after exit, it can't read the output)
p.WaitForExit(60000);
// read the exit code, close process
int returnCode = p.ExitCode;
p.Close();
This will convert html file to pdf file.
If you need to convert some url to pdf then use the following as Argument to Process StartInfo
@"/C --headless --disable-gpu --run-all-compositor-stages-before-draw --print-to-pdf-no-header --print-to-pdf=""C:/Users/Abdul Rahman/Desktop/test.pdf"" ""https://www.google.com""",
Drawbacks of this approach:
This works as expected with the latest HTML5 and CSS3 features, and the output matches what you see in the browser. However, when running this under IIS, you need to run your application's ApplicationPool under the LocalSystem identity, or grant read/write access to IIS_IUSRS.
Using Selenium WebDriver:
Install Nuget Packages Selenium.WebDriver and Selenium.WebDriver.ChromeDriver.
Use the below code.
public async Task<byte[]> ConvertHtmlToPdf(string html)
{
var directory = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.CommonDocuments), "ApplicationName");
Directory.CreateDirectory(directory);
var filePath = Path.Combine(directory, $"{Guid.NewGuid()}.html");
await File.WriteAllTextAsync(filePath, html);
var driverOptions = new ChromeOptions();
// In headless mode, PDF writing is enabled by default (tested with driver major version 85)
driverOptions.AddArgument("headless");
using var driver = new ChromeDriver(driverOptions);
driver.Navigate().GoToUrl(filePath);
// Output a PDF of the first page in A4 size at 90% scale
var printOptions = new Dictionary<string, object>
{
{ "paperWidth", 210 / 25.4 },
{ "paperHeight", 297 / 25.4 },
{ "scale", 0.9 },
{ "pageRanges", "1" }
};
var printOutput = driver.ExecuteChromeCommandWithResult("Page.printToPDF", printOptions) as Dictionary<string, object>;
var pdf = Convert.FromBase64String(printOutput["data"] as string);
File.Delete(filePath);
return pdf;
}
Advantage of this method:
This just needs an Nuget installation and works as expected with latest HTML5 and CSS3 features. Output will be same as you view in browser.
Drawbacks of this approach:
This approach needs latest chrome browser to be installed in the server where the app runs.
If the chrome browser version in server is updated then Selenium.WebDriver.ChromeDriver Nuget package needs to be updated. Else this will throw run time error due to version mismatch.
With this approach, please make sure to add <PublishChromeDriver>true</PublishChromeDriver> in .csproj file as shown below:
<PropertyGroup>
<TargetFramework>net5.0</TargetFramework>
<LangVersion>latest</LangVersion>
<Nullable>enable</Nullable>
<PublishChromeDriver>true</PublishChromeDriver>
</PropertyGroup>
This will publish the chrome driver when publishing the project.
Here is the link to my working project repo - HtmlToPdf
Using window.print() in JavaScript to generate PDF from browser
If the users are using your app from browser then you can rely on JavaScript and use window.print() and necessary print media css to generate PDF from the browser. For example generating invoice from browser in an inventory app.
Advantage of this method:
No dependency on any tools.
PDF generated directly from HTML, CSS and JS in browser.
Faster
Supports all the latest CSS properties.
Drawbacks of this approach:
In SPA like Blazor, we need to do some workaround with iframe to print sections of page.
I arrived at the above answer after spending almost 2 days with the available options, and finally implemented the Selenium-based solution, which is working. I hope this helps you and saves you time.

You can use Google Chrome print-to-pdf feature from its headless mode. I found this to be the simplest yet the most robust method.
var url = "https://stackoverflow.com/questions/564650/convert-html-to-pdf-in-net";
var chromePath = @"C:\Program Files (x86)\Google\Chrome\Application\chrome.exe";
var output = Path.Combine(Environment.CurrentDirectory, "printout.pdf");
using (var p = new Process())
{
p.StartInfo.FileName = chromePath;
p.StartInfo.Arguments = $"--headless --disable-gpu --print-to-pdf=\"{output}\" \"{url}\""; // quote the path and URL in case they contain spaces
p.Start();
p.WaitForExit();
}
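On .NET Core 2.1 and later, ProcessStartInfo.ArgumentList passes each argument separately, so paths with spaces need no manual quoting. A sketch under that assumption (the property is not available on .NET Framework); the class name ChromePdf is hypothetical, and the flags are the ones used above.

```csharp
using System;
using System.Diagnostics;

public static class ChromePdf
{
    // Builds the start info without launching anything, so it can be inspected.
    public static ProcessStartInfo BuildStartInfo(string chromePath, string url, string outputPdf)
    {
        var startInfo = new ProcessStartInfo { FileName = chromePath, UseShellExecute = false };
        startInfo.ArgumentList.Add("--headless");
        startInfo.ArgumentList.Add("--disable-gpu");
        startInfo.ArgumentList.Add("--print-to-pdf=" + outputPdf); // no quoting needed
        startInfo.ArgumentList.Add(url);
        return startInfo;
    }

    // Launches Chrome, waits for it to finish, and returns its exit code.
    public static int Print(string chromePath, string url, string outputPdf)
    {
        using (Process process = Process.Start(BuildStartInfo(chromePath, url, outputPdf)))
        {
            if (process == null)
                throw new InvalidOperationException("Chrome could not be started.");
            process.WaitForExit();
            return process.ExitCode;
        }
    }
}
```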

2018 update: let's use the standard HTML+CSS=PDF equation!
There is good news for HTML-to-PDF demands. As this answer showed, the W3C standard css-break-3 will solve the problem... It is a Candidate Recommendation with a plan to become a definitive Recommendation in 2017 or 2018, after tests.
As a not-so-standard alternative, there are solutions with plugins for C#, as shown by print-css.rocks.

Quite likely most projects will wrap a C/C++ engine rather than implementing a C# solution from scratch. Try Project Gotenberg.
To test it
docker run --rm -p 3000:3000 thecodingmachine/gotenberg:6
Curl sample
curl --request POST \
--url http://localhost:3000/convert/url \
--header 'Content-Type: multipart/form-data' \
--form remoteURL=https://brave.com \
--form marginTop=0 \
--form marginBottom=0 \
--form marginLeft=0 \
--form marginRight=0 \
-o result.pdf
C# sample.cs
using System;
using System.Net.Http;
using System.Threading.Tasks;
using System.IO;
using static System.Console;
namespace Gotenberg
{
class Program
{
public static async Task Main(string[] args)
{
try
{
var client = new HttpClient();
var formContent = new MultipartFormDataContent
{
{new StringContent("https://brave.com/"), "remoteURL"},
{new StringContent("0"), "marginTop" }
};
var result = await client.PostAsync(new Uri("http://localhost:3000/convert/url"), formContent);
await File.WriteAllBytesAsync("brave.com.pdf", await result.Content.ReadAsByteArrayAsync());
}
catch (Exception ex)
{
WriteLine(ex);
}
}
}
}
To compile
csc sample.cs -langversion:latest -reference:System.Net.Http.dll && mono ./sample.exe

Below is an example of converting html + css to PDF using iTextSharp (iTextSharp + itextsharp.xmlworker)
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.tool.xml;
byte[] pdf; // result will be here
var cssText = File.ReadAllText(MapPath("~/css/test.css"));
var html = File.ReadAllText(MapPath("~/css/test.html"));
using (var memoryStream = new MemoryStream())
{
var document = new Document(PageSize.A4, 50, 50, 60, 60);
var writer = PdfWriter.GetInstance(document, memoryStream);
document.Open();
using (var cssMemoryStream = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(cssText)))
{
using (var htmlMemoryStream = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(html)))
{
XMLWorkerHelper.GetInstance().ParseXHtml(writer, document, htmlMemoryStream, cssMemoryStream);
}
}
document.Close();
pdf = memoryStream.ToArray();
}

It depends on what other requirements you have.
A really simple but not easily deployable solution is to load the HTML into a WebBrowser control and then call its Print method to print to a locally installed PDF printer. There are several free PDF printers available, and the WebBrowser control is part of the .NET Framework.
EDIT:
If your HTML is XHTML, you can use PDFizer to do the job.

This is a free library that is very easy to use: OpenHtmlToPdf
string timeStampForPdfName = DateTime.Now.ToString("yyMMddHHmmssff");
string serverPath = System.Web.Hosting.HostingEnvironment.MapPath("~/FolderName");
string pdfSavePath = Path.Combine(serverPath, "FileName" + timeStampForPdfName + ".FileExtension");
// OpenHtmlToPdf library used for performing the PDF conversion
var pdf = Pdf.From(HTML_String).Content();
// For writing to a file from a byte array
File.WriteAllBytes(pdfSavePath, pdf.ToArray()); // Requires System.Linq

It seems like so far the best free .NET solution is the TuesPechkin library which is a wrapper around the wkhtmltopdf native library.
I've now used the single-threaded version to convert a few thousand HTML strings to PDF files and it seems to work great. It's supposed to also work in multi-threaded environments (IIS, for example) but I haven't tested that.
Also since I wanted to use the latest version of wkhtmltopdf (0.12.5 at the time of writing), I downloaded the DLL from the official website, copied it to my project root, set copy to output to true, and initialized the library like so:
var dllDir = AppDomain.CurrentDomain.BaseDirectory;
Converter = new StandardConverter(new PdfToolset(new StaticDeployment(dllDir)));
The code above looks specifically for a file named "wkhtmltox.dll", so don't rename the file. I used the 64-bit version of the DLL.
Make sure you read the instructions for multi-threaded environments, as you will have to initialize it only once per app lifecycle so you'll need to put it in a singleton or something.
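The "initialize only once" requirement above can be sketched with Lazy<T>. This is only an illustration of the pattern: FakeConverter and ConverterHolder are hypothetical names standing in for whatever converter you construct according to the library's multi-threading instructions; they are not part of TuesPechkin.

```csharp
using System;
using System.Text;
using System.Threading;

// Hypothetical stand-in for an expensive converter that must be created once.
public sealed class FakeConverter
{
    private static int _created;

    // Number of instances constructed so far (for demonstration only).
    public static int CreatedCount => _created;

    public FakeConverter() { Interlocked.Increment(ref _created); }

    // Placeholder "conversion": just returns the UTF-8 bytes of the input.
    public byte[] Convert(string html) => Encoding.UTF8.GetBytes(html);
}

public static class ConverterHolder
{
    // ExecutionAndPublication runs the factory at most once,
    // even if several threads read Instance at the same time.
    private static readonly Lazy<FakeConverter> _instance =
        new Lazy<FakeConverter>(() => new FakeConverter(),
                                LazyThreadSafetyMode.ExecutionAndPublication);

    public static FakeConverter Instance => _instance.Value;
}
```

Every caller then uses ConverterHolder.Instance instead of constructing its own converter, so the native library is set up a single time per process.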

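You can also check out Spire.PDF, which lets you convert HTML to PDF with this simple piece of code:
// Declarations below are assumed from Spire.PDF's documented sample
// (namespaces Spire.Pdf, Spire.Pdf.HtmlConverter, System.Threading)
PdfDocument pdf = new PdfDocument();
PdfPageSettings setting = new PdfPageSettings();
PdfHtmlLayoutFormat htmlLayoutFormat = new PdfHtmlLayoutFormat();
string htmlCode = "<p>This is a p tag</p>";
//use single thread to generate the pdf from above html code
Thread thread = new Thread(() =>
{ pdf.LoadFromHTML(htmlCode, false, setting, htmlLayoutFormat); });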
thread.SetApartmentState(ApartmentState.STA);
thread.Start();
thread.Join();
// Save the file to PDF and preview it.
pdf.SaveToFile("output.pdf");
System.Diagnostics.Process.Start("output.pdf");

I was also looking for this a while back. I ran into HTMLDOC http://www.easysw.com/htmldoc/ which is a free open source command line app that takes an HTML file as an argument and spits out a PDF from it. It's worked for me pretty well for my side project, but it all depends on what you actually need.
The company that makes it supplies the compiled binaries, but you are free to download and compile from source and use it for free. I managed to compile a pretty recent revision (for version 1.9) and I intend on releasing a binary installer for it in a few days, so if you're interested I can provide a link to it as soon as I post it.
HTMLDOC converts HTML and Markdown source files or web pages to EPUB, PostScript, or PDF files with an optional table of contents.
Edit (2/25/2014): Seems like the docs and site moved to https://www.msweet.org/htmldoc/
Edit (2022/3) Binaries are on github GPL2 licensed https://github.com/michaelrsweet/htmldoc
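Since HTMLDOC is a command line app, calling it from C# comes down to starting a process. A minimal sketch, assuming the htmldoc executable is on the PATH and using its --webpage and -f options; HtmlDocRunner and its method names are made up for this example, and the simple quoting does not handle paths that themselves contain quotes.

```csharp
using System;
using System.Diagnostics;

public static class HtmlDocRunner
{
    // Builds the arguments for: htmldoc --webpage -f <output> <input>
    public static string BuildArguments(string inputHtml, string outputPdf)
    {
        return "--webpage -f " + Quote(outputPdf) + " " + Quote(inputHtml);
    }

    // Wraps a path in double quotes so spaces survive argument parsing.
    private static string Quote(string path)
    {
        return "\"" + path + "\"";
    }

    // Runs htmldoc, waits for it to finish, and returns its exit code.
    public static int Convert(string inputHtml, string outputPdf)
    {
        var info = new ProcessStartInfo("htmldoc", BuildArguments(inputHtml, outputPdf))
        {
            UseShellExecute = false,
            CreateNoWindow = true
        };
        using (var process = Process.Start(info))
        {
            process.WaitForExit();
            return process.ExitCode;
        }
    }
}
```

A call such as HtmlDocRunner.Convert("page.html", "page.pdf") should be followed by a check of the returned exit code before the output file is used.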

You need to use a commercial library if you need perfect html rendering in pdf.
ExpertPdf Html To Pdf Converter is very easy to use and it supports the latest html5/css3. You can either convert an entire url to pdf:
using ExpertPdf.HtmlToPdf;
byte[] pdfBytes = new PdfConverter().GetPdfBytesFromUrl(url);
or a html string:
using ExpertPdf.HtmlToPdf;
byte[] pdfBytes = new PdfConverter().GetPdfBytesFromHtmlString(html, baseUrl);
You also have the alternative of saving the generated PDF document directly to a stream or to a file on disk.

If you want the user to download a PDF of the rendered page in the browser, the easiest solution is
window.print();
on the client side. It prompts the user to save a PDF of the current page. You can also customize the appearance of the PDF by linking a print stylesheet
<link rel="stylesheet" type="text/css" href="print.css" media="print">
print.css is applied to the HTML while printing.
Limitations
You can't store the file on the server side.
The user is prompted to print the page and then has to save it manually.
The page must be rendered in a tab.

The best tool I have found and used for generating PDFs of views or HTML pages rendered with JavaScript and styles is PhantomJS.
Download the .exe file together with the rasterize.js script from the examples folder of the download, and put both inside your solution.
It lets you download the file from code without opening it, and it works even when styles and especially jQuery have been applied.
The following code generates the PDF file:
public ActionResult DownloadHighChartHtml()
{
string serverPath = Server.MapPath("~/phantomjs/");
string filename = DateTime.Now.ToString("ddMMyyyy_hhmmss") + ".pdf";
string Url = "http://wwwabc.com";
new Thread(new ParameterizedThreadStart(x =>
{
ExecuteCommand(string.Format("cd {0} & E: & phantomjs rasterize.js {1} {2} \"A4\"", serverPath, Url, filename));
//E: is the drive for server.mappath
})).Start();
var filePath = Path.Combine(Server.MapPath("~/phantomjs/"), filename);
var stream = new MemoryStream();
byte[] bytes = DoWhile(filePath);
Response.ContentType = "application/pdf";
Response.AddHeader("content-disposition", "attachment;filename=Image.pdf");
Response.OutputStream.Write(bytes, 0, bytes.Length);
Response.End();
return RedirectToAction("HighChart");
}
private void ExecuteCommand(string Command)
{
try
{
ProcessStartInfo ProcessInfo;
Process Process;
ProcessInfo = new ProcessStartInfo("cmd.exe", "/C " + Command); // /C exits when the command finishes; /K would leave cmd.exe running
ProcessInfo.CreateNoWindow = true;
ProcessInfo.UseShellExecute = false;
Process = Process.Start(ProcessInfo);
}
catch { }
}
private byte[] DoWhile(string filePath)
{
byte[] bytes = new byte[0];
bool fail = true;
while (fail)
{
try
{
using (FileStream file = new FileStream(filePath, FileMode.Open, FileAccess.Read))
{
bytes = new byte[file.Length];
file.Read(bytes, 0, (int)file.Length);
}
fail = false;
}
catch
{
Thread.Sleep(1000);
}
}
System.IO.File.Delete(filePath);
return bytes;
}
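Note that the DoWhile loop above retries forever if PhantomJS never produces the file. A possible variant with a timeout is sketched below; FileWaiter and ReadWhenReady are hypothetical names, and waiting for the PhantomJS process to exit would be more reliable than polling the file system.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Threading;

public static class FileWaiter
{
    // Polls until the file can be opened exclusively or the timeout elapses.
    // Returns the file contents, or null if the file never became readable.
    public static byte[] ReadWhenReady(string path, TimeSpan timeout, TimeSpan pollInterval)
    {
        var clock = Stopwatch.StartNew();
        while (true)
        {
            try
            {
                // FileShare.None makes the open fail while another process
                // still holds the file open.
                using (var file = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.None))
                using (var buffer = new MemoryStream())
                {
                    file.CopyTo(buffer);
                    return buffer.ToArray();
                }
            }
            catch (IOException)
            {
                // File is missing or still locked; retry below.
            }

            if (clock.Elapsed >= timeout)
            {
                return null;
            }
            Thread.Sleep(pollInterval);
        }
    }
}
```

The caller can then return an error response when null comes back instead of blocking the request thread indefinitely.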

As a representative of HiQPdf Software I believe the best solution is HiQPdf HTML to PDF converter for .NET. It contains the most advanced HTML5, CSS3, SVG and JavaScript rendering engine on market. There is also a free version of the HTML to PDF library which you can use to produce for free up to 3 PDF pages. The minimal C# code to produce a PDF as a byte[] from a HTML page is:
HtmlToPdf htmlToPdfConverter = new HtmlToPdf();
// set PDF page size, orientation and margins
htmlToPdfConverter.Document.PageSize = PdfPageSize.A4;
htmlToPdfConverter.Document.PageOrientation = PdfPageOrientation.Portrait;
htmlToPdfConverter.Document.Margins = new PdfMargins(0);
// convert HTML to PDF
byte[] pdfBuffer = htmlToPdfConverter.ConvertUrlToMemory(url);
You can find more detailed examples both for ASP.NET and MVC in HiQPdf HTML to PDF Converter examples repository.

To convert HTML to PDF in C# use ABCpdf.
ABCpdf can make use of the Gecko or Trident rendering engines, so your HTML table will look the same as it appears in FireFox and Internet Explorer.
There's an on-line demo of ABCpdf at www.abcpdfeditor.com. You could use this to check out how your tables will render first, without needing to download and install software.
For rendering entire web pages you'll need the AddImageUrl or AddImageHtml functions. But if all you want to do is simply add HTML styled text then you could try the AddHtml function, as below:
Doc theDoc = new Doc();
theDoc.FontSize = 72;
theDoc.AddHtml("<b>Some HTML styled text</b>");
theDoc.Save(Server.MapPath("docaddhtml.pdf"));
theDoc.Clear();
ABCpdf is a commercial software title, however the standard edition can often be obtained for free under special offer.

Instead of converting HTML directly to PDF, you can create a Bitmap of your HTML page and then insert the Bitmap into your PDF, using for example iTextSharp.
Here's code that renders an HTML string to a Bitmap. I found it somewhere here on SO; if I find the source I'll link it.
public System.Drawing.Bitmap HTMLToImage(String strHTML)
{
System.Drawing.Bitmap myBitmap = null;
System.Threading.Thread myThread = new System.Threading.Thread(delegate()
{
// create a hidden web browser, which will navigate to the page
System.Windows.Forms.WebBrowser myWebBrowser = new System.Windows.Forms.WebBrowser();
// we don't want scrollbars on our image
myWebBrowser.ScrollBarsEnabled = false;
// don't let any errors shine through
myWebBrowser.ScriptErrorsSuppressed = true;
// let's load up that page!
myWebBrowser.Navigate("about:blank");
// wait until the page is fully loaded
while (myWebBrowser.ReadyState != System.Windows.Forms.WebBrowserReadyState.Complete)
System.Windows.Forms.Application.DoEvents();
myWebBrowser.Document.Body.InnerHtml = strHTML;
// set the size of our web browser to be the same size as the page
int intScrollPadding = 20;
int intDocumentWidth = myWebBrowser.Document.Body.ScrollRectangle.Width + intScrollPadding;
int intDocumentHeight = myWebBrowser.Document.Body.ScrollRectangle.Height + intScrollPadding;
myWebBrowser.Width = intDocumentWidth;
myWebBrowser.Height = intDocumentHeight;
// a bitmap that we will draw to
myBitmap = new System.Drawing.Bitmap(intDocumentWidth - intScrollPadding, intDocumentHeight - intScrollPadding);
// draw the web browser to the bitmap
myWebBrowser.DrawToBitmap(myBitmap, new System.Drawing.Rectangle(0, 0, intDocumentWidth - intScrollPadding, intDocumentHeight - intScrollPadding));
});
myThread.SetApartmentState(System.Threading.ApartmentState.STA);
myThread.Start();
myThread.Join();
return myBitmap;
}

With Winnovative HTML to PDF converter you can convert a HTML string in a single line
byte[] outPdfBuffer = htmlToPdfConverter.ConvertHtml(htmlString, baseUrl);
The base URL is used to resolve the images referenced by relative URLs in HTML string. Alternatively you can use full URLs in HTML or embed images using src="data:image/png" for image tag.
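Embedding an image as a data URI, as mentioned above, means base64-encoding the image bytes into the src attribute in the form data:<mime type>;base64,<data>. A small sketch of that step; DataUri and ImgTag are names invented for this example and are not part of the Winnovative API.

```csharp
using System;

public static class DataUri
{
    // Returns an <img> tag whose src carries the image bytes inline,
    // so no base URL is needed to resolve it.
    public static string ImgTag(byte[] imageBytes, string mimeType)
    {
        string base64 = Convert.ToBase64String(imageBytes);
        return "<img src=\"data:" + mimeType + ";base64," + base64 + "\">";
    }
}
```

For a real image you would pass something like File.ReadAllBytes("logo.png") and "image/png", then place the returned tag in the HTML string handed to the converter.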
In answer to 'fubaar' user comment about Winnovative converter, a correction is necessary. The converter does not use IE as rendering engine. It actually does not depend on any installed software and the rendering is compatible with WebKit engine.

PDFmyURL recently released a .NET component for web page / HTML to PDF conversion as well. This has a very user friendly interface, for example:
PDFmyURL pdf = new PDFmyURL("yourlicensekey");
pdf.ConvertURL("http://www.example.com", Application.StartupPath + @"\example.pdf");
Documentation: PDFmyURL .NET component documentation
Disclaimer: I work for the company that owns PDFmyURL

If you are already using the iTextSharp DLL, there is no need to add third-party DLLs (plugins). I think you are using HTMLWorker; use XMLWorker instead and you can easily convert your HTML to PDF.
Some CSS won't work; see the list of supported CSS.
For a full explanation with an example, see the linked reference.
MemoryStream memStream = new MemoryStream();
TextReader xmlString = new StringReader(outXml);
using (Document document = new Document())
{
PdfWriter writer = PdfWriter.GetInstance(document, memStream);
//document.SetPageSize(iTextSharp.text.PageSize.A4);
document.Open();
byte[] byteArray = System.Text.Encoding.UTF8.GetBytes(outXml);
MemoryStream ms = new MemoryStream(byteArray);
XMLWorkerHelper.GetInstance().ParseXHtml(writer, document, ms, System.Text.Encoding.UTF8);
document.Close();
}
Response.ContentType = "application/pdf";
Response.AddHeader("content-disposition", "attachment;filename=" + filename + ".pdf");
Response.Cache.SetCacheability(HttpCacheability.NoCache);
Response.BinaryWrite(memStream.ToArray());
// Flush before End: End() terminates the request, so a Flush() after it never runs
Response.Flush();
Response.End();

Another suggestion is to try the solution by https://grabz.it.
They provide a nice .NET API to capture screenshots and manipulate them in an easy and flexible way.
To use it in your app you will need to first get key + secret and download the .NET SDK (it's free).
Now a short example of using it.
To use the API you will first need to create an instance of the GrabzItClient class, passing your application key and application secret from your GrabzIt account to the constructor, as shown in the below example:
//Create the GrabzItClient class
//Replace "APPLICATION KEY", "APPLICATION SECRET" with the values from your account!
private GrabzItClient grabzIt = GrabzItClient.Create("Sign in to view your Application Key", "Sign in to view your Application Secret");
Now, to convert the HTML to PDF, all you need to do is:
grabzIt.HTMLToPDF("<html><body><h1>Hello World!</h1></body></html>");
You can convert to image as well:
grabzIt.HTMLToImage("<html><body><h1>Hello World!</h1></body></html>");
Next you need to save the result. You can use one of the two save methods available: Save if you have a publicly accessible callback handler, and SaveTo if not. Check the documentation for details.

Another trick is to use the WebBrowser control; below is my full working code.
In my case, I assign the URL to a text box control:
protected void Page_Load(object sender, EventArgs e)
{
txtweburl.Text = "https://www.google.com/";
}
Below is the code for generating the screenshot on a separate thread:
protected void btnscreenshot_click(object sender, EventArgs e)
{
// btnscreenshot.Visible = false;
allpanels.Visible = true;
Thread thread = new Thread(GenerateThumbnail);
thread.SetApartmentState(ApartmentState.STA);
thread.Start();
thread.Join();
}
private void GenerateThumbnail()
{
// btnscreenshot.Visible = false;
WebBrowser webrowse = new WebBrowser();
webrowse.ScrollBarsEnabled = false;
webrowse.AllowNavigation = true;
string url = txtweburl.Text.Trim();
webrowse.Navigate(url);
webrowse.Width = 1400;
webrowse.Height = 50000;
webrowse.DocumentCompleted += webbrowse_DocumentCompleted;
while (webrowse.ReadyState != WebBrowserReadyState.Complete)
{
System.Windows.Forms.Application.DoEvents();
}
}
In the code below I save the screenshot and then generate the PDF file for download:
private void webbrowse_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
// btnscreenshot.Visible = false;
string folderPath = Server.MapPath("~/ImageFiles/");
WebBrowser webrowse = sender as WebBrowser;
//Bitmap bitmap = new Bitmap(webrowse.Width, webrowse.Height);
Bitmap bitmap = new Bitmap(webrowse.Width, webrowse.Height, PixelFormat.Format16bppRgb565);
webrowse.DrawToBitmap(bitmap, webrowse.Bounds);
string Systemimagedownloadpath = System.Configuration.ConfigurationManager.AppSettings["Systemimagedownloadpath"].ToString();
string fullOutputPath = Systemimagedownloadpath + Request.QueryString["VisitedId"].ToString() + ".png";
MemoryStream stream = new MemoryStream();
bitmap.Save(fullOutputPath, System.Drawing.Imaging.ImageFormat.Png); // match the .png extension used in fullOutputPath
//generating pdf code
Document pdfDoc = new Document(new iTextSharp.text.Rectangle(1100f, 20000.25f));
PdfWriter writer = PdfWriter.GetInstance(pdfDoc, Response.OutputStream);
pdfDoc.Open();
iTextSharp.text.Image img = iTextSharp.text.Image.GetInstance(fullOutputPath);
img.ScaleAbsoluteHeight(20000);
img.ScaleAbsoluteWidth(1024);
pdfDoc.Add(img);
pdfDoc.Close();
//Download the PDF file.
Response.ContentType = "application/pdf";
Response.AddHeader("content-disposition", "attachment;filename=ImageExport.pdf");
Response.Cache.SetCacheability(HttpCacheability.NoCache);
Response.Write(pdfDoc);
Response.End();
}
You can also refer to my older post for more information: Navigation to the webpage was canceled getting message in asp.net web form

Try the PDF Duo .Net converting component for converting HTML to PDF from an ASP.NET application without using additional DLLs.
You can pass an HTML string, file, or stream to generate the PDF.
Use the code below (C# example):
string file_html = @"K:\hdoc.html";
string file_pdf = @"K:\new.pdf";
try
{
DuoDimension.HtmlToPdf conv = new DuoDimension.HtmlToPdf();
conv.OpenHTML(file_html);
conv.SavePDF(file_pdf);
textBox4.Text = "C# Example: Converting succeeded";
}
catch (Exception ex)
{
// A try block needs a catch or finally to compile; report the failure
textBox4.Text = "C# Example: Converting failed: " + ex.Message;
}
Info + C#/VB examples you can find at: http://www.duodimension.com/html_pdf_asp.net/component_html_pdf.aspx
