I am trying to convert some HTML to PDF using the iTextSharp XMLWorker library. It works fine, but I am unable to render some Unicode characters (Turkish) in my PDF.
I have read several blogs about the problem, and they all propose registering a font that supports Unicode characters. Then, in an external CSS file, I need to specify the font family to use.
html
{
font-family: 'Arial Unicode MS';
}
I also tried plain Arial as the family, and I tried setting the family in the HTML as well.
<body face = 'Arial'>
None of them work. The font is registered without problems, and the external CSS file is being applied too.
This is how I convert HTML to PDF:
string arialuniTff = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Fonts), "ARIALUNI.TTF");
FontFactory.Register(arialuniTff);
// Resolve CSS
var cssResolver = new StyleAttrCSSResolver();
var cssFile = XMLWorkerHelper.GetCSS(new FileStream(Server.MapPath("~/Content/Editor.css"), FileMode.Open));
cssResolver.AddCss(cssFile);
// HTML
CssAppliers ca = new CssAppliersImpl();
HtmlPipelineContext hpc = new HtmlPipelineContext(ca);
hpc.SetTagFactory(Tags.GetHtmlTagProcessorFactory());
// PIPELINES
PdfWriterPipeline pdf = new PdfWriterPipeline(document, writer);
HtmlPipeline htmlPipe = new HtmlPipeline(hpc, pdf);
CssResolverPipeline css = new CssResolverPipeline(cssResolver, htmlPipe);
XMLWorker worker = new XMLWorker(css, true);
XMLParser p = new XMLParser(worker);
StringReader sr = new StringReader("<html><head></head><body>" + topMessage.Replace("<br>", "<br></br>") + "</body></html>");
p.Parse(sr);
I see that you create your CssAppliersImpl instance without passing a parameter. If you want to deal with fonts, you should create a FontProvider implementation and pass an instance of it to the CssAppliersImpl constructor. For instance, create a TestFontProvider class that shows you which font names are requested while your HTML is parsed. That will help you verify that the right fonts are registered. If all the necessary fonts are registered, the problem may be caused by something else. For instance, the HTML may be parsed using the wrong encoding.
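To illustrate the encoding point: the sketch below is not iText code, only a minimal stdlib-only Java illustration of what happens to Turkish letters when UTF-8 bytes are decoded with the wrong charset. The class and method names (EncodingCheck, roundTrip) are hypothetical. The same kind of mismatch can occur in .NET when a file or stream is read with the wrong Encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingCheck {
    // Hypothetical helper: encode text as UTF-8, then decode it with the given charset.
    static String roundTrip(String text, Charset decodeWith) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, decodeWith);
    }

    public static void main(String[] args) {
        // Four Turkish letters, written as escapes so the source file encoding does not matter.
        String turkish = "\u011f\u015f\u0131\u0130";
        // Decoding with the matching charset gives the original text back.
        System.out.println(roundTrip(turkish, StandardCharsets.UTF_8).equals(turkish));      // true
        // Decoding with a mismatched charset yields different (garbled) characters.
        System.out.println(roundTrip(turkish, StandardCharsets.ISO_8859_1).equals(turkish)); // false
    }
}
```

If the text is already garbled before it reaches the parser, no font registration will bring the original characters back, so it may be worth checking this first.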
Here is the working solution after so many attempts:
string fontPath = Path.Combine(@"fonts\Gaegu-Regular.ttf");
var fontProvider = new XMLWorkerFontProvider(XMLWorkerFontProvider.DONTLOOKFORFONTS);
fontProvider.Register(fontPath);
CssAppliers ca = new CssAppliersImpl(fontProvider);
HtmlPipelineContext htmlContext = new HtmlPipelineContext(ca);
var pipeline = new CssResolverPipeline(cssResolver, new HtmlPipeline(htmlContext, new PdfWriterPipeline(document, writer)));
Thanks.
I'm using the iTextSharp library to convert my html to pdf. The issue is I'm trying to add checkbox appearance using the below code:
public static String FONT = "c:/windows/fonts/WINGDING.TTF";
public static String TEXT = "o";
public void HTMLToPdf(string FileName)
{
// Verbatim string so the HTML can span several lines
string HTML = @"<!DOCTYPE html>
<html>
<head><title></title><meta charset='UTF-8'></head>
<body><div class='mystyle'>Here i want to print many checkbox like appearances</div></body>
</html>";
Document pdfDoc = new Document(PageSize.A4, 30f, 30f, 10f, 10f);
pdfDoc.Add(p);
BaseFont bf = BaseFont.CreateFont(FONT, BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
Font f = new Font(bf, 12);
Paragraph p = new Paragraph(TEXT, f);
pdfDoc.Add(p);
}
The problem is that this method adds the checkbox at the beginning of the PDF; please help me to attach the paragraph containing the checkbox value to my html.
Simply put, I'm getting the value at pdfDoc.Add(p), but I want it in a variable so I can print it many times in the HTML.
In fact, it is not very clear what was meant in the question:
please help me to attach the paragraph containing the checkbox value to my html
I assume that you want to convert HTML to PDF and then add paragraphs on the next line.
In general, it's a bad idea to use iTextSharp for this, since this library is outdated and no longer supported. I can suggest my own way of solving your problem with pdfHTML, an iText 7 add-on. My code is in Java, but it's not much different from the C# version. The main idea is not to close the document after the HTML conversion, because if you try to write a paragraph to a closed document, it will end up at the very beginning, as in your example.
String FONT = "c:/windows/fonts/WINGDING.TTF";
String TEXT = "o";
File htmlSource = new File("checkBoxHtml.html");
File pdfDest = new File("output.pdf");
ConverterProperties converterProperties = new ConverterProperties();
Document document = HtmlConverter.convertToDocument(new FileInputStream(htmlSource),
new PdfDocument(new PdfWriter(pdfDest)), converterProperties);
PdfFont font = PdfFontFactory.createFont(FONT);
Text text = new Text(TEXT);
text.setFont(font);
Paragraph paragraph = new Paragraph();
// Adding text to the paragraph
paragraph.add(text);
// Adding paragraph to the document
document.add(paragraph);
document.close();
I am trying to force MigraDoc to render pdf in unicode (currently Chinese/Japanese characters) in c#.
Here is the code I use:
public void Render()
{
var doc = new MigraDoc.DocumentObjectModel.Document();
doc.AddSection();
Style style = doc.Styles["Normal"];
style.Font.Name = "Lucida Sans Unicode";
var paragraph = GetLastSection().AddParagraph();
paragraph.AddText("彤");
var pdfRenderer = new PdfDocumentRenderer(true, PdfFontEmbedding.Always);
pdfRenderer.Document = doc;
pdfRenderer.RenderDocument();
pdfRenderer.PdfDocument.Save(@"c:\temp\test.pdf");
}
The PDF itself gets generated, but unfortunately the only thing I see in it is a square.
Version of MigraDoc is 1.32.4334.0
Thank you for any help.
I am trying to generate a multi-column PDF from HTML using iText for .NET.
I am using CSS3 syntax to generate two columns.
The code below is not working for me.
CSS
column-count:2;
C# Code
StringReader html = new StringReader(@"
<div style='column-count:2;'>Sample Text. Sample Text. Sample Text. Sample Text.
Sample Text. Sample Text. Sample Text. Sample Text. Sample Text. Sample Text.
Sample Text. Sample Text. Sample Text. Sample Text. Sample Text. Sample Text.
Sample Text. Sample Text. </div>
");
Document document = new Document();
PdfWriter writer = PdfWriter.GetInstance(document, new FileStream(@"d:\temp\xyz.pdf", FileMode.Create));
document.Open();
XMLWorkerHelper.GetInstance().ParseXHtml(
writer, document, html
);
document.Close();
Please suggest what the issue is in this code, or whether there is another HTML to PDF library that supports this.
The CSS property column-count is not supported in XML Worker, and it probably never will be.
However, this doesn't mean that you can't display HTML in columns.
If you go to the official XML Worker documentation, you'll find the ParseHtmlObjects example, where we parse a large HTML file and render it to a PDF with two columns: walden5.pdf
This is done by parsing the HTML into an ElementList first:
// CSS
CSSResolver cssResolver =
XMLWorkerHelper.getInstance().getDefaultCssResolver(true);
// HTML
HtmlPipelineContext htmlContext = new HtmlPipelineContext(null);
htmlContext.setTagFactory(Tags.getHtmlTagProcessorFactory());
htmlContext.autoBookmark(false);
// Pipelines
ElementList elements = new ElementList();
ElementHandlerPipeline end = new ElementHandlerPipeline(elements, null);
HtmlPipeline html = new HtmlPipeline(htmlContext, end);
CssResolverPipeline css = new CssResolverPipeline(cssResolver, html);
// XML Worker
XMLWorker worker = new XMLWorker(css, true);
XMLParser p = new XMLParser(worker);
Once we have the list of Element objects, we can add them to a ColumnText object:
// step 1
Document document = new Document(PageSize.LEGAL.rotate());
// step 2
PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(file));
// step 3
document.open();
// step 4
Rectangle left = new Rectangle(36, 36, 486, 586);
Rectangle right = new Rectangle(522, 36, 972, 586);
ColumnText column = new ColumnText(writer.getDirectContent());
column.setSimpleColumn(left);
boolean leftside = true;
int status = ColumnText.START_COLUMN;
for (Element e : elements) {
if (ColumnText.isAllowedElement(e)) {
column.addElement(e);
status = column.go();
while (ColumnText.hasMoreText(status)) {
if (leftside) {
leftside = false;
column.setSimpleColumn(right);
}
else {
document.newPage();
leftside = true;
column.setSimpleColumn(left);
}
status = column.go();
}
}
}
// step 5
document.close();
As you can see, you need to make some decisions here: you need to define the rectangles on the pages. You need to introduce new pages, etc...
Note: there is currently no C# port of this documentation. Please think of the Java code as if it were pseudo code.
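As for the rectangles you have to decide on, one option is to derive them from the page size instead of hard-coding them. The following is a stdlib-only Java sketch, not part of iText; ColumnLayout and columns are hypothetical names, and it assumes equal-width columns with a uniform margin on all sides. Each row is {llx, lly, urx, ury} in points, the order in which the coordinates appear in the Rectangle calls above.

```java
import java.util.Arrays;

public class ColumnLayout {
    // Hypothetical helper: returns {llx, lly, urx, ury} for each of `count`
    // equal-width columns, assuming the same margin on all four sides.
    static float[][] columns(float pageWidth, float pageHeight,
                             float margin, float gutter, int count) {
        float width = (pageWidth - 2 * margin - (count - 1) * gutter) / count;
        float[][] rects = new float[count][];
        for (int i = 0; i < count; i++) {
            float llx = margin + i * (width + gutter);
            rects[i] = new float[] { llx, margin, llx + width, pageHeight - margin };
        }
        return rects;
    }

    public static void main(String[] args) {
        // Landscape LEGAL is 1008 x 612 points; 36 pt margin and 36 pt gutter.
        for (float[] r : columns(1008f, 612f, 36f, 36f, 2)) {
            System.out.println(Arrays.toString(r));
        }
        // prints [36.0, 36.0, 486.0, 576.0]
        // prints [522.0, 36.0, 972.0, 576.0]
    }
}
```

The x-coordinates match the left and right rectangles used above. The top edge comes out as 576 rather than 586 because this sketch assumes a uniform 36 pt margin, whereas the example above uses a slightly smaller top margin.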
I have been trying to get my MVC application to create PDF files based on MVC views. I got this working with plain HTML, but I would also like to include the CSS files that I use for the browser. Some of them work, but with one I get the following error:
An exception of type 'System.FormatException' occurred in mscorlib.dll but was not handled in user code
Additional information: Input string was not in a correct format.
I am using the following code:
var data = GetHtml(new IndexModel(Context), "~\\Views\\Home\\Index.cshtml", "");
using (var document = new iTextSharp.text.Document())
{
//define output control HTML
var memStream = new MemoryStream();
TextReader xmlString = new StringReader(data);
PdfWriter writer = PdfWriter.GetInstance(document, new FileStream("c:\\tmp\\my.pdf", FileMode.OpenOrCreate));
//open doc
document.Open();
// register all fonts in current computer
FontFactory.RegisterDirectories();
// Set factories
var htmlContext = new HtmlPipelineContext(null);
htmlContext.SetTagFactory(Tags.GetHtmlTagProcessorFactory());
// Set css
ICSSResolver cssResolver = XMLWorkerHelper.GetInstance().GetDefaultCssResolver(false);
cssResolver.AddCssFile(HttpContext.Server.MapPath("~/Content/elements.css"), true);
cssResolver.AddCssFile(HttpContext.Server.MapPath("~/Content/style.css"), true);
cssResolver.AddCssFile(HttpContext.Server.MapPath("~/Content/jquery-ui.css"), true);
// Export
IPipeline pipeline = new CssResolverPipeline(cssResolver, new HtmlPipeline(htmlContext, new PdfWriterPipeline(document, writer)));
var worker = new XMLWorker(pipeline, true);
var xmlParse = new XMLParser(true, worker);
xmlParse.Parse(xmlString);
xmlParse.Flush();
document.Close();
}
The string "data" is correct and has no issues; the problem lies with AddCssFile().
If I create the PDF without any CSS files, everything works, but including the CSS files triggers the error.
Help will be very much appreciated.
I don't know the exact answer, but by looking at the error you are getting back, I would try two different approaches.
First approach: change
cssResolver.AddCssFile(HttpContext.Server.MapPath("~/Content/elements.css"), true);
to something like
var cssPath = HttpContext.Server.MapPath("~/Content/elements.css");
cssResolver.AddCssFile(cssPath, true);
Then set a breakpoint and look at the values being returned for cssPath. Make sure they are accurate and do not contain any odd characters.
Second approach... If all else fails, try giving an absolute URL to the CSS resource such as http://yourdomain.com/cssPath instead of a file system path.
If either of those two approaches helps, you can use it to determine the actual problem and then refactor to your heart's content.
UPDATE:
According to the documentation, you need an absolute URL for the file, so Server.MapPath won't work.
addCssFile
void addCssFile(String href,
boolean isPersistent)
throws CssResolverException
Add a
Parameters:
href - the link to the css file ( an absolute uri )
isPersistent - true if the added css should not be deleted on a call to clear
Throws:
CssResolverException - thrown if something goes wrong
In that case, I would try using something like:
public string AbsoluteContent(string contentPath)
{
var path = Url.Content(contentPath);
var url = new Uri(HttpContext.Current.Request.Url, path);
return url.AbsoluteUri;
}
and use it like this:
var cssPath = AbsoluteContent("~/Content/embeddedCss/yourcssfile.css");
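To show what the helper above is doing independently of ASP.NET, here is a stdlib-only Java sketch of the same idea: resolving an app-relative path against the current request URL to get an absolute URI. AbsoluteCssUrl and toAbsolute are hypothetical names, and the sketch assumes the application is hosted at the site root, so "~/" maps to "/"; an application in a virtual directory would need that prefix added.

```java
import java.net.URI;

public class AbsoluteCssUrl {
    // Hypothetical helper: turn an app-relative path such as "~/Content/site.css"
    // into an absolute URL, assuming the application is hosted at the site root.
    static String toAbsolute(String requestUrl, String contentPath) {
        // Drop the leading "~" so the path becomes root-relative ("/Content/...").
        String rooted = contentPath.startsWith("~/") ? contentPath.substring(1) : contentPath;
        // Resolving a root-relative path keeps the scheme and host of the request URL.
        return URI.create(requestUrl).resolve(rooted).toString();
    }

    public static void main(String[] args) {
        System.out.println(toAbsolute("http://yourdomain.com/Home/Index", "~/Content/elements.css"));
        // prints http://yourdomain.com/Content/elements.css
    }
}
```

The result is a URL rather than a file system path, which is the form the documentation excerpt above asks for.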
I've searched for a solution to this problem, but I still cannot find the answer. Any help would be appreciated.
Document document = new Document();
Section section = document.AddSection();
Paragraph paragraph = section.AddParagraph();
paragraph.Format.Font.Color = Color.FromCmyk(100, 30, 20, 50);
paragraph.AddText("ąčęėįųųūū");
paragraph.Format.Font.Size = 9;
paragraph.Format.Alignment = ParagraphAlignment.Center;
<...>
In the example above, the characters "ąčęėįųųūū" are not displayed in the exported PDF.
How can I set the MigraDoc character set?
Just tell the renderer to create a Unicode document:
PdfDocumentRenderer renderer = new PdfDocumentRenderer(true, PdfSharp.Pdf.PdfFontEmbedding.Always);
renderer.Document = document;
The first parameter of PdfDocumentRenderer must be true to get Unicode.
Please note that not all True Type fonts include all Unicode characters (but it should work with Arial, Verdana, etc.).
See here for a complete sample:
http://www.pdfsharp.net/wiki/HelloMigraDoc-sample.ashx
If you are mixing PDFsharp and MigraDoc, as I do (that is, you have a PDFsharp PdfDocument object, document, and a MigraDoc Document object, doc, which is rendered as part of document), things are not that simple. The example given by the PDFsharp team works only when you use MigraDoc on its own.
So you should use it this way:
Make sure you render your MigraDoc doc before you render the MigraDoc object to the PDFsharp XGraphics gfx.
Use the hack to set the encoding for the gfx object.
XGraphics gfx = XGraphics.FromPdfPage(page);
// HACK²
gfx.MUH = PdfFontEncoding.Unicode;
gfx.MFEH = PdfFontEmbedding.Always;
// HACK²
Document doc = new Document();
PdfDocumentRenderer pdfRenderer = new PdfDocumentRenderer(true, PdfFontEmbedding.Always);
pdfRenderer.Document = doc;
pdfRenderer.RenderDocument();
MigraDoc.Rendering.DocumentRenderer docRenderer = new DocumentRenderer(doc);
docRenderer.PrepareDocument();
docRenderer.RenderObject(gfx, XUnit.FromCentimeter(5), XUnit.FromCentimeter(10), "12cm", para);
For 1.5.x-betax
let gfx = XGraphics.FromPdfPage(page)
gfx.MUH <- PdfFontEncoding.Unicode
let doc = new Document()
let pdfRenderer = new PdfDocumentRenderer(true, PdfFontEmbedding.Always)
pdfRenderer.Document <- doc
pdfRenderer.RenderDocument()
let docRenderer = new DocumentRenderer(doc)
docRenderer.PrepareDocument()
docRenderer.RenderObject(gfx, XUnit.FromCentimeter 5, XUnit.FromCentimeter 10, "12cm", para)