I'm trying to create a PDF out of an HTML page. The CMS I'm using is EPiServer.
This is my code so far:
protected void Button1_Click(object sender, EventArgs e)
{
naaflib.pdfDocument(CurrentPage);
}
public static void pdfDocument(PageData pd)
{
//Extract data from Page (pd).
string intro = pd["MainIntro"].ToString(); // Attribute
string mainBody = pd["MainBody"].ToString(); // Attribute
// Prepare the HttpContext response
HttpContext.Current.Response.Clear();
HttpContext.Current.Response.ContentType = "application/pdf";
// Create PDF document
Document pdfDocument = new Document(PageSize.A4, 80, 50, 30, 65);
//PdfWriter pw = PdfWriter.GetInstance(pdfDocument, HttpContext.Current.Response.OutputStream);
PdfWriter.GetInstance(pdfDocument, HttpContext.Current.Response.OutputStream);
pdfDocument.Open();
pdfDocument.Add(new Paragraph(pd.PageName));
pdfDocument.Add(new Paragraph(intro));
pdfDocument.Add(new Paragraph(mainBody));
pdfDocument.Close();
HttpContext.Current.Response.End();
}
This outputs the content of the article name, intro-text and main body.
But it does not parse the HTML in the article text, and there is no layout.
I've tried having a look at http://itextsharp.sourceforge.net/tutorial/index.html without becoming any wiser.
Any pointers in the right direction are greatly appreciated :)
For later versions of iTextSharp:
Using iTextSharp you can use the iTextSharp.text.html.simpleparser.HTMLWorker.ParseToList() method to create a PDF from HTML.
ParseToList() takes a TextReader (an abstract class) for its HTML source, which means you can use a StringReader or StreamReader (both of which derive from TextReader). I used a StringReader and was able to generate PDFs from simple markup. I tried to use the HTML returned from a web page and got errors on all but the simplest pages. Even the simplest web page I retrieved (http://black.ea.com/) rendered the content of the page's 'head' tag onto the PDF, so I think the HTMLWorker.ParseToList() method is picky about the formatting of the HTML it parses.
Anyway, if you want to try it, here's the test code I used:
// Download content from a very, very simple "Hello World" web page.
string download = new WebClient().DownloadString("http://black.ea.com/");
Document document = new Document(PageSize.A4, 80, 50, 30, 65);
try {
using (FileStream fs = new FileStream("TestOutput.pdf", FileMode.Create)) {
PdfWriter.GetInstance(document, fs);
using (StringReader stringReader = new StringReader(download)) {
ArrayList parsedList = HTMLWorker.ParseToList(stringReader, null);
document.Open();
foreach (object item in parsedList) {
document.Add((IElement)item);
}
document.Close();
}
}
} catch (Exception exc) {
Console.Error.WriteLine(exc.Message);
}
I couldn't find any documentation on which HTML constructs HTMLWorker.ParseToList() supports; if you do please post it here. I'm sure a lot of people would be interested.
For older versions of iTextSharp:
You can use the iTextSharp.text.html.HtmlParser.Parse method to create a PDF based on html.
Here's a snippet demonstrating this:
Document document = new Document(PageSize.A4, 80, 50, 30, 65);
try {
using (FileStream fs = new FileStream("TestOutput.pdf", FileMode.Create)) {
PdfWriter.GetInstance(document, fs);
HtmlParser.Parse(document, "YourHtmlDocument.html");
}
} catch(Exception exc) {
Console.Error.WriteLine(exc.Message);
}
The one problem (a major one for me) is that the HTML must be strictly XHTML compliant.
Good luck!
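Applied to the pdfDocument method from the question, the ParseToList() approach might look roughly like the sketch below. It assumes iTextSharp 5.x (where HTMLWorker still exists but is marked obsolete), and the names HtmlPdfSketch and WriteHtmlAsPdf and their parameters are made up for illustration; they are not part of iTextSharp or EPiServer.

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.html.simpleparser;
using iTextSharp.text.pdf;

// Hypothetical helper, not part of iTextSharp.
public static class HtmlPdfSketch
{
    // Writes a title followed by the parsed HTML fragment to the given stream.
    public static void WriteHtmlAsPdf(string title, string htmlFragment, Stream output)
    {
        Document document = new Document(PageSize.A4, 80, 50, 30, 65);
        PdfWriter.GetInstance(document, output);
        document.Open();
        document.Add(new Paragraph(title));

        using (StringReader reader = new StringReader(htmlFragment))
        {
            // ParseToList turns the markup into iText elements;
            // passing null means no extra StyleSheet is applied.
            foreach (IElement element in HTMLWorker.ParseToList(reader, null))
            {
                document.Add(element);
            }
        }

        // Closing the document finishes the PDF (and, by default, closes the stream).
        document.Close();
    }
}
```

In the question's code this could be called as HtmlPdfSketch.WriteHtmlAsPdf(pd.PageName, mainBody, HttpContext.Current.Response.OutputStream), keeping in mind the caveat above that HTMLWorker only copes with simple, well-formed markup.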
I'm trying to create a CV builder that saves the CV edited by the user to a folder in my project so that it can later be sent by email. I have got as far as using iText to create a PDF of an HTML div, but the PDF has no CSS and none of the text values I retrieve from my database. From some research I found that my problem could be solved by using iText 7 and the pdfHTML add-on, but I cannot find any proper examples of how to use it with my ASP.NET code. I would really appreciate any help.
Below is the code for the button click event I use to generate the PDF:
protected void ButtonDownload_Click(object sender, EventArgs e)
{
Response.ContentType = "application/pdf";
//Response.AddHeader("content-disposition", "attachment;filename=Panel.pdf");
Response.Cache.SetCacheability(HttpCacheability.NoCache);
StringWriter sw = new StringWriter();
HtmlTextWriter hw = new HtmlTextWriter(sw);
contentdiv.RenderControl(hw); // render the div's HTML into the StringWriter
StringReader sr = new StringReader(sw.ToString());
Document pdfDoc = new Document(PageSize.A4, 10f, 10f, 10f, 0f);
HTMLWorker htmlparser = new HTMLWorker(pdfDoc);
PdfWriter.GetInstance(pdfDoc, Response.OutputStream);
pdfDoc.Open();
htmlparser.Parse(sr);
pdfDoc.Close();
string filename = base.Server.MapPath("~/PDF/" + "UserCV.pdf");
HttpContext.Current.Request.SaveAs(filename, false);
Response.End();
}
This picture shows the PDF result I get when I click the download button
And this is the HTML page it is trying to convert
The text below the headings on the HTML page consists of Labels whose values are set by retrieving values from a database
This is an example of how to use pdfHTML.
This example is quite extensive, as it also sets document properties, and registers a custom Font.
public void createPdf(String src, String dest, String resources) throws IOException {
try {
FileOutputStream outputStream = new FileOutputStream(dest);
WriterProperties writerProperties = new WriterProperties();
//Add metadata
writerProperties.addXmpMetadata();
PdfWriter pdfWriter = new PdfWriter(outputStream, writerProperties);
PdfDocument pdfDoc = new PdfDocument(pdfWriter);
pdfDoc.getCatalog().setLang(new PdfString("en-US"));
//Set the document to be tagged
pdfDoc.setTagged();
pdfDoc.getCatalog().setViewerPreferences(new PdfViewerPreferences().setDisplayDocTitle(true));
//Set meta tags
PdfDocumentInfo pdfMetaData = pdfDoc.getDocumentInfo();
pdfMetaData.setAuthor("Joris Schellekens");
pdfMetaData.addCreationDate();
pdfMetaData.getProducer();
pdfMetaData.setCreator("JS");
pdfMetaData.setKeywords("example, accessibility");
pdfMetaData.setSubject("PDF accessibility");
//Title is derived from html
// pdf conversion
ConverterProperties props = new ConverterProperties();
FontProvider fp = new FontProvider();
fp.addStandardPdfFonts();
fp.addDirectory(resources);//The noto-nashk font file (.ttf extension) is placed in the resources
props.setFontProvider(fp);
props.setBaseUri(resources);
//Setup custom tagworker factory for better tagging of headers
DefaultTagWorkerFactory tagWorkerFactory = new AccessibilityTagWorkerFactory();
props.setTagWorkerFactory(tagWorkerFactory);
HtmlConverter.convertToPdf(new FileInputStream(src), pdfDoc, props);
pdfDoc.close();
} catch (Exception e) {
e.printStackTrace();
}
}
The most relevant line here is
HtmlConverter.convertToPdf(new FileInputStream(src), pdfDoc, props);
This essentially tells pdfHTML to convert the input stream (specified by src), put the content in pdfDoc, and use the given ConverterProperties (specified by props).
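Since the question is about ASP.NET, here is a minimal C# sketch of the same call. It assumes the .NET port of iText 7 with the pdfHTML add-on is referenced (to my knowledge published on NuGet as itext7.pdfhtml); the class CvPdfExporter, the method SaveHtmlAsPdf and its parameters are hypothetical names for illustration only.

```csharp
using System.IO;
using iText.Html2pdf;

// Hypothetical helper, not part of iText.
public static class CvPdfExporter
{
    // html:     the markup to convert, e.g. the rendered output of a control
    // baseUri:  folder or URL against which relative CSS and image paths are resolved
    // destPath: where the PDF file is written
    public static void SaveHtmlAsPdf(string html, string baseUri, string destPath)
    {
        ConverterProperties props = new ConverterProperties();
        props.SetBaseUri(baseUri);

        using (FileStream output = new FileStream(destPath, FileMode.Create))
        {
            // Same conversion as in the Java example, with default document settings.
            HtmlConverter.ConvertToPdf(html, output, props);
        }
    }
}
```

From the button click handler, this might be called with sw.ToString() as the HTML, Server.MapPath("~/") as the base URI, and Server.MapPath("~/PDF/UserCV.pdf") as the destination. Note that pdfHTML only sees the CSS that is in the HTML string or reachable through the base URI.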
I am using iTextSharp to convert HTML to PDF, but I want the generated PDF to be 5 cm wide. I used the following code
var pgSize = new iTextSharp.text.Rectangle(2.05f, 2.05f);
Document doc = new Document(pgSize);
but it just resizes the PDF, and my data disappears from the PDF or is hidden.
How can I center the data in the PDF or resize the PDF? Here is my code
public void ConvertHTMLToPDF(string HTMLCode)
{
try
{
System.IO.StringWriter stringWrite = new StringWriter();
System.Web.UI.HtmlTextWriter htmlWrite = new HtmlTextWriter(stringWrite);
StringReader reader = new StringReader(HTMLCode);
var pgSize = new iTextSharp.text.Rectangle(2.05f, 2.05f);
Document doc = new Document(pgSize);
HTMLWorker parser = new HTMLWorker(doc);
PdfWriter.GetInstance(doc, new FileStream(Server.MapPath("~") + "/App_Data/HTMLToPDF.pdf",
FileMode.Create));
doc.Open();
foreach (IElement element in HTMLWorker.ParseToList(
new StringReader(HTMLCode), null))
{
doc.Add(element);
}
doc.Close();
Response.End();
}
catch (Exception ex)
{
}
}
You are creating a PDF that measures 0.0723 cm by 0.0723 cm. That is much too small to add any content. If you want to create a PDF of 5 cm by 5 cm, you need to create your document like this:
var pgSize = new iTextSharp.text.Rectangle(141.732f, 141.732f);
Document doc = new Document(pgSize);
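The numbers above follow from the fact that iText measures page sizes in points, with 72 points to the inch and 2.54 cm to the inch. A small sketch of the arithmetic, where PageUnits is a made-up helper class and not part of iTextSharp:

```csharp
using System;

// Hypothetical helper for converting between centimetres and PDF points.
public static class PageUnits
{
    public const float PointsPerInch = 72f;
    public const float CmPerInch = 2.54f;

    // 5 cm -> 5 / 2.54 * 72, which is roughly 141.732 points
    public static float CmToPoints(float cm)
    {
        return cm / CmPerInch * PointsPerInch;
    }

    // 2.05 points -> 2.05 / 72 * 2.54, which is roughly 0.0723 cm
    public static float PointsToCm(float points)
    {
        return points / PointsPerInch * CmPerInch;
    }
}
```

With this, the page size could be written as new iTextSharp.text.Rectangle(PageUnits.CmToPoints(5f), PageUnits.CmToPoints(5f)).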
As for the alignment, that should be defined in the HTML, but you are using an old version of iText and you are using the deprecated HTMLWorker.
You should upgrade to iText 7 and pdfHTML as described here: Converting HTML to PDF using iText
Also: the size of the page can be defined in the @page rule of the CSS. See Huge white space after header in PDF using Flying Saucer
Why would you make it difficult for yourself by using an old iText version, when the new version allows you to do this:
@page {
size: 5cm 5cm;
}
I have this code, which I merged and modified for my needs, but I still can't make it work as I need. The first part generates a PDF from the options chosen on the aspx page. Second, I need a background image on the page, so I added the next piece of code, but now only the second piece of code takes effect and the PDF is not generated. I'm not able to merge the two pieces of code together.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;
using System.Web.UI;
using System.Web.UI.WebControls;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;
public partial class CreatePDFFromScratch : System.Web.UI.Page
{
protected void btnCreatePDF_Click(object sender, EventArgs e)
{
// Create a Document object
var document = new Document(iTextSharp.text.PageSize.LETTER.Rotate(), 0f, 0f, 0f, 0f);
// Create a new PdfWriter object, writing the output to a MemoryStream
var output = new MemoryStream();
var writer = PdfWriter.GetInstance(document, output);
// Open the Document for writing
document.Open();
// First, create our fonts..
var titleFont = FontFactory.GetFont("Arial", 18, Font.BOLD);
var subTitleFont = FontFactory.GetFont("Arial", 14, Font.BOLD);
var boldTableFont = FontFactory.GetFont("Arial", 12, Font.BOLD);
var endingMessageFont = FontFactory.GetFont("Arial", 10, Font.ITALIC);
var bodyFont = FontFactory.GetFont("Arial", 12, Font.NORMAL);
// Add the "Northwind Traders Receipt" title
document.Add(new Paragraph("Northwind Traders Receipt", titleFont));
// Now add the "Thank you for shopping at Northwind Traders. Your order details are below." message
document.Add(new Paragraph("Thank you for shopping at Northwind Traders. Your order details are below.", bodyFont));
document.Add(Chunk.NEWLINE);
// Add the "Order Information" subtitle
document.Add(new Paragraph("Order Information", subTitleFont));
// Create the Order Information table
var orderInfoTable = new PdfPTable(2);
orderInfoTable.HorizontalAlignment = 0;
orderInfoTable.SpacingBefore = 10;
orderInfoTable.SpacingAfter = 10;
orderInfoTable.DefaultCell.Border = 0;
orderInfoTable.SetWidths(new int[] { 1, 4 });
orderInfoTable.AddCell(new Phrase("Order:", boldTableFont));
orderInfoTable.AddCell(txtOrderID.Text);
orderInfoTable.AddCell(new Phrase("Price:", boldTableFont));
orderInfoTable.AddCell(Convert.ToDecimal(txtTotalPrice.Text).ToString("c"));
document.Add(orderInfoTable);
// Add the "Items In Your Order" subtitle
document.Add(new Paragraph("Items In Your Order", subTitleFont));
// Create the Order Details table
var orderDetailsTable = new PdfPTable(3);
orderDetailsTable.HorizontalAlignment = 0;
orderDetailsTable.SpacingBefore = 10;
orderDetailsTable.SpacingAfter = 35;
orderDetailsTable.DefaultCell.Border = 0;
orderDetailsTable.AddCell(new Phrase("Item #:", boldTableFont));
orderDetailsTable.AddCell(new Phrase("Item Name:", boldTableFont));
orderDetailsTable.AddCell(new Phrase("Qty:", boldTableFont));
foreach (System.Web.UI.WebControls.ListItem item in cblItemsPurchased.Items)
if (item.Selected)
{
// Each CheckBoxList item has a value of ITEMNAME|ITEM#|QTY, so we split on | and pull these values out...
var pieces = item.Value.Split("|".ToCharArray());
orderDetailsTable.AddCell(pieces[1]);
orderDetailsTable.AddCell(pieces[0]);
orderDetailsTable.AddCell(pieces[2]);
}
document.Add(orderDetailsTable);
// Add ending message
var endingMessage = new Paragraph("Thank you for your business! If you have any questions about your order, please contact us at 800-555-NORTH.", endingMessageFont);
endingMessage.SetAlignment("Center");
document.Add(endingMessage);
document.Close();
Response.ContentType = "application/pdf";
Response.AddHeader("Content-Disposition", string.Format("inline;filename=Receipt-{0}.pdf", txtOrderID.Text));
///create background
Response.BinaryWrite(output.ToArray());
Response.Cache.SetCacheability(HttpCacheability.NoCache);
string imageFilePath = Server.MapPath(".") + "/images/1.jpg";
iTextSharp.text.Image jpg = iTextSharp.text.Image.GetInstance(imageFilePath);
Document pdfDoc = new Document(iTextSharp.text.PageSize.LETTER.Rotate(), 0, 0, 0, 0);
jpg.ScaleToFit(790, 777);
jpg.Alignment = iTextSharp.text.Image.UNDERLYING;
pdfDoc.Open();
pdfDoc.NewPage();
pdfDoc.Add(jpg);
pdfDoc.Close();
Response.Write(pdfDoc);
Response.End();
}
}
Thanks
I almost missed this question because it wasn't tagged as an itext question.
First let me copy/paste/adapt @mkl's comment:
The first part of your code in which you create a document document makes sense.
The second part in which you create a document pdfDoc does not.
First of all, at the end of the first part you write the pdf to
the response. That PDF is complete. It's finished. It's done.
It's ready to send to the browser.
Why do you think anything additional written to the
response thereafter might have a chance of combining with the original
written data to a properly generated PDF?
Also: the second part of your code is
written as if you want to create a new PDF from scratch; but didn't you
want to manipulate the PDF created in the first part?
All of this is true, but it doesn't solve your problem. It only reveals your deep lack of understanding in PDF.
There are different ways to achieve what you want. I see that you want to use an image as a background of all the pages of a newly created PDF. In that case, you should create a page event, and add that image underneath all the existing content in the OnEndPage() method. This is explained in the answer to How can I add an image to all pages of my PDF?
Create a PDF as is done in the first part of your code, but introduce a page event:
// step 1
Document document = new Document();
// step 2
PdfWriter writer = PdfWriter.GetInstance(document, stream);
// "event" is a reserved word in C#, so the variable needs another name
MyEvent pageEvent = new MyEvent();
writer.PageEvent = pageEvent;
// step 3
document.Open();
// step 4
// Add whatever content you want to add
// step 5
document.Close();
What is the MyEvent class, you might ask? Well, that's a class you create yourself like this:
protected class MyEvent : PdfPageEventHelper {
Image image;
public override void OnOpenDocument(PdfWriter writer, Document document) {
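// The page's Server property is not accessible from a nested class, so go through HttpContext.Current
image = Image.GetInstance(HttpContext.Current.Server.MapPath("~/images/background.png"));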
image.SetAbsolutePosition(0, 0);
}
public override void OnEndPage(PdfWriter writer, Document document) {
// DirectContentUnder draws underneath the content already on the page
writer.DirectContentUnder.AddImage(image);
}
}
Suppose that your requirement isn't as easy as adding an image in the background, then you could use the bytes created as output to create a PdfReader instance. You could then use the PdfReader to create a PdfStamper and you can use the PdfStamper to watermark the original document. If the simple solution doesn't meet your needs, create a new question that involves PdfReader/PdfStamper and don't forget to tag that question as an iText question. (And also: please read the documentation. A lot of time was spent on the iText web site. That time was wasted if you don't consult it.)
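For reference, the PdfReader/PdfStamper route mentioned above could be sketched as follows. This assumes iTextSharp 5.x; BackgroundStamper and AddBackground are hypothetical names, and stretching the image to the full page is just one possible choice.

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

// Hypothetical helper, not part of iTextSharp.
public static class BackgroundStamper
{
    // Takes the bytes of a finished PDF and returns a new PDF
    // with the image drawn underneath the content of every page.
    public static byte[] AddBackground(byte[] pdfBytes, string imagePath)
    {
        PdfReader reader = new PdfReader(pdfBytes);
        using (MemoryStream result = new MemoryStream())
        {
            PdfStamper stamper = new PdfStamper(reader, result);
            Image background = Image.GetInstance(imagePath);

            for (int page = 1; page <= reader.NumberOfPages; page++)
            {
                // Stretch the image to the visible page size (this may distort it).
                Rectangle size = reader.GetPageSizeWithRotation(page);
                background.ScaleAbsolute(size.Width, size.Height);
                background.SetAbsolutePosition(0, 0);

                // GetUnderContent draws below the existing page content.
                stamper.GetUnderContent(page).AddImage(background);
            }

            stamper.Close();
            reader.Close();

            // ToArray still works after the stamper has closed the stream.
            return result.ToArray();
        }
    }
}
```

In the question's code, the single Response.BinaryWrite call would then send BackgroundStamper.AddBackground(output.ToArray(), imageFilePath) instead of output.ToArray(), and the whole second pdfDoc part would be dropped.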
I am converting HTML to a PDF file. The file downloads immediately, which I don't want. I want to save the file in my project folder once it has been converted.
My C# code:
string html ="<table><tr><td>some contents</td></tr></table>";
Response.ContentType = "application/pdf";
Response.AddHeader("content-disposition", "attachment;filename=WelcomeLetter.pdf");
Response.Cache.SetCacheability(HttpCacheability.NoCache);
StringWriter sw = new StringWriter();
HtmlTextWriter hw = new HtmlTextWriter(sw);
StringReader sr = new StringReader(html);
Document ResultPDF = new Document(iTextSharp.text.PageSize.A4, 25, 10, 20, 30);
PdfPTable Headtable = new PdfPTable(7);
Headtable.TotalWidth = 525f;
Headtable.LockedWidth = true;
Headtable.HeaderRows = 5;
Headtable.FooterRows = 2;
Headtable.KeepTogether = true;
HTMLWorker htmlparser = new HTMLWorker(ResultPDF);
PdfWriter.GetInstance(ResultPDF, Response.OutputStream);
ResultPDF.Open();
htmlparser.Parse(sr);
ResultPDF.Close();
Response.Write(ResultPDF);
Response.End();
To save the PDF file locally in your project folder, you can use the FileStream class like this.
FileStream stream = new FileStream(filePath, FileMode.Create);//Here filePath is path of your project folder.
Now use this stream instead of Response.OutputStream when you create the PdfWriter instance.
PdfWriter.GetInstance(ResultPDF, stream);
Do not call Response.Write, as you don't want to download the file. And close your stream at the end.
stream.Close();
I'm going to combine everyone's answer into one that you should be able to drop in and use. If this works, I would accept Manish Parakhiya's answer because that had the most important part.
First, I'm going to assume you are using a recent version of iTextSharp. I think 5.5.5 is the most recent version. Second, because of this, I'm going to restructure your code a bit in order to use the using pattern. If you're stuck on an older obsolete unsupported version like 4.1.6 you'll need to re-adjust.
Almost every tutorial out there shows you that you can bind directly to Response.OutputStream. This is 100% valid but I would argue that it is also a really bad idea. Instead, bind to a more generic MemoryStream. This makes debugging much easier and your code will port and adapt that much easier.
The below code includes comments about each of the changes and what things are actually doing. The top section is all about creating a PDF from a string of HTML. The bottom actually does something with it, including writing it to disk and/or streaming it to a browser.
//Will hold our PDF eventually
Byte[] bytes;
//HTML that we want to parse
string html = "<table><tr><td>some contents</td></tr></table>";
//Create a MemoryStream to write our PDF to
using (var ms = new MemoryStream()) {
//Create our document abstraction
using (var ResultPDF = new Document(iTextSharp.text.PageSize.A4, 25, 10, 20, 30)) {
//Bind a writer to our Document abstraction and our stream
using (var writer = PdfWriter.GetInstance(ResultPDF, ms)) {
//Open the PDF for writing
ResultPDF.Open();
//Parse our HTML using the old, obsolete, unsupported parser
using (var sw = new StringWriter()) {
using (var hw = new HtmlTextWriter(sw)) {
using (var sr = new StringReader(html)) {
using (var htmlparser = new HTMLWorker(ResultPDF)) {
htmlparser.Parse(sr);
}
}
}
}
//Close the PDF
ResultPDF.Close();
}
}
//Grab the raw bytes of the PDF
bytes = ms.ToArray();
}
//At this point, the bytes variable holds a valid PDF file.
//You can write it to disk:
System.IO.File.WriteAllBytes("your file path here", bytes);
//You can also send it to a browser:
Response.ContentType = "application/pdf";
Response.AddHeader("content-disposition", "attachment;filename=WelcomeLetter.pdf");
Response.BinaryWrite(bytes);
Response.Cache.SetCacheability(HttpCacheability.NoCache);
//Never do the next line, it doesn't do what you think it does and actually produces corrupt PDFs
//Response.Write(ResultPDF); //BAD!!!!!!
Response.End();
string tempDirectory = Session.SessionID.ToString();
string location = Path.Combine(Server.MapPath(
WebConfigurationManager.AppSettings["PathSet"].ToString()), tempDirectory);
if (!Directory.Exists(location))
{
Directory.CreateDirectory(location);
}
string fileName="abc.pdf";
filePath = Path.Combine(location, fileName);
When converting HTML to PDF using iTextSharp, the styles I apply to the web page with CSS do not show up in the converted PDF.
Here is my CSS code:
<style type="text/css">
.cssformat
{
width:300px;
height:200px;
border:2px solid black;
background-color:white;
border-top-left-radius:60px 90px;
border-bottom-right-radius:60px 90px;
}
</style>
Here is my HTML code:
<div id="divpdf" runat="server">
<table id="tid" runat="server">
<tr>
<td>
<asp:Label ID="Label1" runat="server" Text="this is new way of pdf" CssClass="cssformat"></asp:Label>
</td>
</tr>
</table>
</div>
The following is what I have tried with C#:
Response.ContentType = "application/pdf";
Response.AddHeader("content-disposition", "attachment;filename=TestPage.pdf");
Response.Cache.SetCacheability(HttpCacheability.NoCache);
StringBuilder sb = new StringBuilder();
StringWriter sw = new StringWriter();
HtmlTextWriter hw = new HtmlTextWriter(sw);
Document pdfDoc = new Document(PageSize.A4, 60f, 80f, -2f, 35f);
divpdf.RenderControl(hw);
StringReader sr = new StringReader(sw.ToString());
HTMLWorker htmlparser = new HTMLWorker(pdfDoc);
PdfWriter writer = PdfWriter.GetInstance(pdfDoc, Response.OutputStream);
pdfDoc.Open();
hw1.Parse(new StringReader(sttt));
htmlparser.Parse(sr);
pdfDoc.Close();
Response.Write(pdfDoc);
Response.End();
sw.Close();
sr.Close();
hw.Close();
I struggled quite a bit to convert from HTML to PDF using iTextSharp and eventually gave up because I could not get a converted PDF that looked 100% the same as my HTML5/CSS3 page. So I'm giving you the alternative that eventually worked for me.
There are surprisingly few options available when you are not prepared to pay for a commercial library. One of my clients had the same requirement (converting HTML to PDF) and did not want to pay for any third-party tools, so I had to make a plan. This is what I did. It is not the best solution, but it got the job done:
I downloaded the newest version of wkhtmltopdf. Unfortunately the wkhtmltopdf tool did not display some of the Google graphs embedded in my HTML when converting to PDF. So I used the wkhtmltoimage tool, which is also included, to convert to a PNG, which worked as expected and displayed all the graphs.
I then downloaded the newest version of imagemagick and converted the PNG to PDF.
I automated this process using C#.
Unfortunately this is not the most elegant solution because you have to perform two conversions and do a bit of work to automate everything, but this is the best solution I could come up with that gave me the desired results and quality.
Of course there is plenty of commercial software out there that will do a faster and better job.
Just a side note:
The web page that I had to convert was developed in HTML5 and CSS3 using version 3 of Bootstrap, and it contained some Google graphs and charts. Everything was converted without any problems.
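The automation of the two steps described above might look something like the sketch below. It is only a rough outline under several assumptions: both tools are installed and on the PATH, wkhtmltoimage is called as "wkhtmltoimage <input> <output>", and ImageMagick is called through its classic "convert <input> <output>" command (newer ImageMagick versions use "magick" instead, and on Windows a system tool is also named convert). The class and method names are made up.

```csharp
using System;
using System.Diagnostics;

// Hypothetical wrapper around the two command-line tools.
public static class HtmlToPdfViaImage
{
    // Wraps a path or URL in double quotes so spaces survive the command line.
    public static string Quote(string value)
    {
        return "\"" + value + "\"";
    }

    // Runs an external program and waits for it; throws if it reports failure.
    private static void Run(string executable, string arguments)
    {
        ProcessStartInfo info = new ProcessStartInfo(executable, arguments);
        info.UseShellExecute = false;
        info.CreateNoWindow = true;

        using (Process process = Process.Start(info))
        {
            process.WaitForExit();
            if (process.ExitCode != 0)
            {
                throw new InvalidOperationException(
                    executable + " exited with code " + process.ExitCode);
            }
        }
    }

    // Step 1: render the page to a PNG. Step 2: wrap the PNG in a PDF.
    public static void Convert(string url, string pngPath, string pdfPath)
    {
        Run("wkhtmltoimage", Quote(url) + " " + Quote(pngPath));
        Run("convert", Quote(pngPath) + " " + Quote(pdfPath)); // assumed ImageMagick command
    }
}
```

Keep in mind that a PDF produced this way contains a single image, so its text cannot be selected or searched.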
Below is an example that converts HTML content containing inline CSS.
public static class PdfCreator {
public static string ConvertHtmlToPdf(string htmlContent, string fileNameWithoutExtension, string filePath, string cssContent = "") {
if (!Directory.Exists(filePath)) {
Directory.CreateDirectory(filePath);
}
var fileNameWithPath = Path.Combine(filePath, fileNameWithoutExtension + ".pdf");
using(var stream = new FileStream(fileNameWithPath, FileMode.Create)) {
using(var document = new Document()) {
var writer = PdfWriter.GetInstance(document, stream);
document.Open();
// instantiate custom tag processor and add to `HtmlPipelineContext`.
var tagProcessorFactory = Tags.GetHtmlTagProcessorFactory();
tagProcessorFactory.AddProcessor(new TableData(), new string[] {
HTML.Tag.TD
});
var htmlPipelineContext = new HtmlPipelineContext(null);
htmlPipelineContext.SetTagFactory(tagProcessorFactory);
var pdfWriterPipeline = new PdfWriterPipeline(document, writer);
var htmlPipeline = new HtmlPipeline(htmlPipelineContext, pdfWriterPipeline);
// get an ICssResolver and add the custom CSS
var cssResolver = XMLWorkerHelper.GetInstance().GetDefaultCssResolver(true);
cssResolver.AddCss(cssContent, "utf-8", true);
var cssResolverPipeline = new CssResolverPipeline(
cssResolver, htmlPipeline);
var worker = new XMLWorker(cssResolverPipeline, true);
var parser = new XMLParser(worker);
using(var stringReader = new StringReader(htmlContent)) {
parser.Parse(stringReader);
}
}
}
return fileNameWithPath;
}
}
An <asp:Label> is rendered as a span, which is an inline element. So change its display to block. Enjoy.