MigraDoc and UTF characters

I am trying to get MigraDoc to render Unicode text (currently Chinese/Japanese characters) to a PDF in C#.
Here is the code I use:
public void Render()
{
    var doc = new MigraDoc.DocumentObjectModel.Document();
    doc.AddSection();
    Style style = doc.Styles["Normal"];
    style.Font.Name = "Lucida Sans Unicode";
    var paragraph = doc.LastSection.AddParagraph();
    paragraph.AddText("彤");
    // First argument true = Unicode encoding
    var pdfRenderer = new PdfDocumentRenderer(true, PdfFontEmbedding.Always);
    pdfRenderer.Document = doc;
    pdfRenderer.RenderDocument();
    pdfRenderer.PdfDocument.Save(@"c:\temp\test.pdf");
}
The PDF itself gets generated, but unfortunately the only thing I see in it is a square.
The MigraDoc version is 1.32.4334.0.
Thank you for any help.

Related

Add multiple Checkboxes in ITextsharp html to pdf

I'm using the iTextSharp library to convert my HTML to PDF. The issue is that I'm trying to add a checkbox appearance using the code below:
public static String FONT = "c:/windows/fonts/WINGDING.TTF";
public static String TEXT = "o";
public void HTMLToPdf(string FileName)
{
    string HTML = @"<!DOCTYPE html>
<html>
<head><title></title><meta charset='UTF-8'></head>
<body><div class='mystyle'>Here I want to print many checkbox-like appearances</div></body>
</html>";
    Document pdfDoc = new Document(PageSize.A4, 30f, 30f, 10f, 10f);
    BaseFont bf = BaseFont.CreateFont(FONT, BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
    Font f = new Font(bf, 12);
    Paragraph p = new Paragraph(TEXT, f);
    pdfDoc.Add(p);
}
The problem is that this method adds the checkbox at the beginning of the PDF. Please help me attach the paragraph containing the checkbox value to my HTML.
Simply put, I get the value at pdfDoc.Add(p), but I want it in a variable so that I can print it many times in the HTML.

In fact, it is not very clear what was meant in the question:
please help me to attach the paragraph containing the checkbox value to my html
I assume that you wanted to convert HTML to PDF and then add paragraphs on the next line.
In general, it's a bad idea to use iTextSharp for this, since the library is outdated and no longer supported. I can suggest my own way of solving your problem with pdfHTML, an iText 7 add-on. My code is in Java, but the C# version is not much different. The main idea is not to close the document after the HTML conversion, because if you try to write a paragraph to a closed document, it ends up at the very beginning, as in your example.
String FONT = "c:/windows/fonts/WINGDING.TTF";
String TEXT = "o";
File htmlSource = new File("checkBoxHtml.html");
File pdfDest = new File("output.pdf");
ConverterProperties converterProperties = new ConverterProperties();
Document document = HtmlConverter.convertToDocument(new FileInputStream(htmlSource),
        new PdfDocument(new PdfWriter(pdfDest)), converterProperties);
PdfFont font = PdfFontFactory.createFont(FONT);
Text text = new Text(TEXT);
text.setFont(font);
Paragraph paragraph = new Paragraph();
// Adding text to the paragraph
paragraph.add(text);
// Adding paragraph to the document
document.add(paragraph);
document.close();

ITextSharp pdf resize and data alignment

I am using iTextSharp to convert HTML to PDF, but I want the generated PDF to be 5 cm wide. I used the following code:
var pgSize = new iTextSharp.text.Rectangle(2.05f, 2.05f);
Document doc = new Document(pgSize);
but it only resizes the PDF, and my data disappears from the PDF or gets hidden.
How can I center the data in the PDF or resize the PDF? Here is my code:
public void ConvertHTMLToPDF(string HTMLCode)
{
    try
    {
        System.IO.StringWriter stringWrite = new StringWriter();
        System.Web.UI.HtmlTextWriter htmlWrite = new HtmlTextWriter(stringWrite);
        StringReader reader = new StringReader(HTMLCode);
        var pgSize = new iTextSharp.text.Rectangle(2.05f, 2.05f);
        Document doc = new Document(pgSize);
        HTMLWorker parser = new HTMLWorker(doc);
        PdfWriter.GetInstance(doc, new FileStream(Server.MapPath("~") + "/App_Data/HTMLToPDF.pdf",
            FileMode.Create));
        doc.Open();
        foreach (IElement element in HTMLWorker.ParseToList(
            new StringReader(HTMLCode), null))
        {
            doc.Add(element);
        }
        doc.Close();
        Response.End();
    }
    catch (Exception ex)
    {
    }
}

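iText measures page dimensions in points (72 points per inch, 2.54 cm per inch), so a 2.05 by 2.05 rectangle gives you a PDF that measures 0.0723 cm by 0.0723 cm. That is much too small to hold any content. If you want to create a PDF of 5 cm by 5 cm, you need to create your document like this: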
var pgSize = new iTextSharp.text.Rectangle(141.732f, 141.732f);
Document doc = new Document(pgSize);
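The arithmetic behind these numbers can be sketched as a small helper. This is only an illustration of the unit conversion (72 points per inch, 2.54 cm per inch); the class and method names PageUnits, CentimetersToPoints and PointsToCentimeters are hypothetical and not part of iTextSharp.

```csharp
using System;
using System.Globalization;

public static class PageUnits
{
    // PDF page sizes are expressed in points: 72 points per inch.
    private const double PointsPerInch = 72.0;
    private const double CentimetersPerInch = 2.54;

    // Hypothetical helper: centimeters -> points.
    public static double CentimetersToPoints(double centimeters)
    {
        return centimeters / CentimetersPerInch * PointsPerInch;
    }

    // Hypothetical helper: points -> centimeters.
    public static double PointsToCentimeters(double points)
    {
        return points / PointsPerInch * CentimetersPerInch;
    }

    public static void Main()
    {
        // 5 cm expressed in points, the value used for the page size above.
        Console.WriteLine(CentimetersToPoints(5.0).ToString("F3", CultureInfo.InvariantCulture)); // 141.732
        // The 2.05 from the question, read as points and converted to centimeters.
        Console.WriteLine(PointsToCentimeters(2.05).ToString("F4", CultureInfo.InvariantCulture)); // 0.0723
    }
}
```

The result would be cast to float before being passed to the Rectangle constructor shown above.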
As for the alignment, that should be defined in the HTML, but you are using an old version of iText and the deprecated HTMLWorker.
You should upgrade to iText 7 and pdfHTML as described here: Converting HTML to PDF using iText
Also: the size of the page can be defined in the @page rule of the CSS. See Huge white space after header in PDF using Flying Saucer
Why would you make it difficult for yourself by using an old iText version, when the new version allows you to do this:
@page {
    size: 5cm 5cm;
}

converting HTML to a multi-column PDF

I am trying to generate a multi-column PDF from HTML using iText for .NET.
I am using CSS3 syntax to generate two columns.
The code below is not working for me.
CSS
column-count:2;
C# Code
StringReader html = new StringReader(@"
<div style='column-count:2;'>Sample Text. Sample Text. Sample Text. Sample Text.
Sample Text. Sample Text. Sample Text. Sample Text. Sample Text. Sample Text.
Sample Text. Sample Text. Sample Text. Sample Text. Sample Text. Sample Text.
Sample Text. Sample Text. </div>
");
Document document = new Document();
PdfWriter writer = PdfWriter.GetInstance(document, new FileStream(@"d:\temp\xyz.pdf", FileMode.Create));
document.Open();
XMLWorkerHelper.GetInstance().ParseXHtml(
    writer, document, html
);
document.Close();
Please suggest what the issue is in this code, or whether there is another HTML to PDF library that solves this.

The CSS property column-count is not supported in XML Worker, and it probably never will be.
However, this doesn't mean that you can't display HTML in columns.
If you go to the official XML Worker documentation, you'll find the ParseHtmlObjects example, where we parse a large HTML file and render it to a PDF with two columns: walden5.pdf
This is done by parsing the HTML into an ElementList first:
// CSS
CSSResolver cssResolver =
        XMLWorkerHelper.getInstance().getDefaultCssResolver(true);
// HTML
HtmlPipelineContext htmlContext = new HtmlPipelineContext(null);
htmlContext.setTagFactory(Tags.getHtmlTagProcessorFactory());
htmlContext.autoBookmark(false);
// Pipelines
ElementList elements = new ElementList();
ElementHandlerPipeline end = new ElementHandlerPipeline(elements, null);
HtmlPipeline html = new HtmlPipeline(htmlContext, end);
CssResolverPipeline css = new CssResolverPipeline(cssResolver, html);
// XML Worker
XMLWorker worker = new XMLWorker(css, true);
XMLParser p = new XMLParser(worker);
Once we have the list of Element objects, we can add them to a ColumnText object:
// step 1
Document document = new Document(PageSize.LEGAL.rotate());
// step 2
PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(file));
// step 3
document.open();
// step 4
Rectangle left = new Rectangle(36, 36, 486, 586);
Rectangle right = new Rectangle(522, 36, 972, 586);
ColumnText column = new ColumnText(writer.getDirectContent());
column.setSimpleColumn(left);
boolean leftside = true;
int status = ColumnText.START_COLUMN;
for (Element e : elements) {
    if (ColumnText.isAllowedElement(e)) {
        column.addElement(e);
        status = column.go();
        while (ColumnText.hasMoreText(status)) {
            if (leftside) {
                leftside = false;
                column.setSimpleColumn(right);
            }
            else {
                document.newPage();
                leftside = true;
                column.setSimpleColumn(left);
            }
            status = column.go();
        }
    }
}
// step 5
document.close();
As you can see, you need to make some decisions here: you need to define the rectangles on the pages. You need to introduce new pages, etc...
Note: there is currently no C# port of this documentation. Please think of the Java code as if it were pseudo code.

Unable to Set Font Family in iTextSharp XMLWorker

I am trying to parse some HTML to PDF using itextsharp XMLWorker library. It is working fine but I am unable to render some Unicode characters (Turkish) into my pdf.
I have read several blogs about the problem and they all propose registering a font which supports unicode characters. Then in external css file, I need to specify the font family to use.
html
{
font-family: 'Arial Unicode MS';
}
I also tried all Arial as family too. I tried setting the family in html as well.
<body face = 'Arial'>
None of them are working. Font is registered without problems and external CSS file is working too.
This is how I convert HTML to PDF,
string arialuniTff = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Fonts), "ARIALUNI.TTF");
FontFactory.Register(arialuniTff);
// Resolve CSS
var cssResolver = new StyleAttrCSSResolver();
var cssFile = XMLWorkerHelper.GetCSS(new FileStream(Server.MapPath("~/Content/Editor.css"), FileMode.Open));
cssResolver.AddCss(cssFile);
// HTML
CssAppliers ca = new CssAppliersImpl();
HtmlPipelineContext hpc = new HtmlPipelineContext(ca);
hpc.SetTagFactory(Tags.GetHtmlTagProcessorFactory());
// PIPELINES
PdfWriterPipeline pdf = new PdfWriterPipeline(document, writer);
HtmlPipeline htmlPipe = new HtmlPipeline(hpc, pdf);
CssResolverPipeline css = new CssResolverPipeline(cssResolver, htmlPipe);
XMLWorker worker = new XMLWorker(css, true);
XMLParser p = new XMLParser(worker);
StringReader sr = new StringReader("<html><head></head><body>" + topMessage.Replace("<br>", "<br></br>") + "</body></html>");
p.Parse(sr);

I see that you create your CssAppliersImpl instance without passing a parameter. If you want to deal with fonts, you should create a FontProvider implementation and pass an instance of that implementation to the CssAppliersImpl constructor. For instance: create a TestFontProvider class that shows you which font names are requested while your HTML is parsed. That will help you understand whether the right fonts are registered. If you see that all the necessary fonts are registered, the problem may be caused by something else. For instance: maybe the HTML is parsed using the wrong encoding...

Here is the working solution after so many attempts:
string fontPath = Path.Combine(@"fonts\Gaegu-Regular.ttf");
var fontProvider = new XMLWorkerFontProvider(XMLWorkerFontProvider.DONTLOOKFORFONTS);
fontProvider.Register(fontPath);
CssAppliers ca = new CssAppliersImpl(fontProvider);
HtmlPipelineContext htmlContext = new HtmlPipelineContext(ca);
var pipeline = new CssResolverPipeline(cssResolver, new HtmlPipeline(htmlContext, new PdfWriterPipeline(document, writer)));
Thanks.

.NET C# - MigraDoc - How to change document charset?

I've searched for solution to this problem, but still cannot find the answer. Any help would be appreciated.
Document document = new Document();
Section section = document.AddSection();
Paragraph paragraph = section.AddParagraph();
paragraph.Format.Font.Color = Color.FromCmyk(100, 30, 20, 50);
paragraph.AddText("ąčęėįųųūū");
paragraph.Format.Font.Size = 9;
paragraph.Format.Alignment = ParagraphAlignment.Center;
<...>
In the example above, the characters "ąčęėįųųūū" are not displayed in the exported PDF.
How can I set the MigraDoc character set?

Just tell the renderer to create a Unicode document:
PdfDocumentRenderer renderer = new PdfDocumentRenderer(true, PdfSharp.Pdf.PdfFontEmbedding.Always);
renderer.Document = document;
The first parameter of PdfDocumentRenderer must be true to get Unicode.
Please note that not all TrueType fonts include all Unicode characters (but it should work with Arial, Verdana, etc.).
See here for a complete sample:
http://www.pdfsharp.net/wiki/HelloMigraDoc-sample.ashx
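For reference, the renderer lines can be put together with the code from the question into one minimal sketch. It assumes the 1.32-era MigraDoc/PDFsharp API used in this thread (newer releases may differ), that the Arial font is installed, and an output file name chosen only for illustration.

```csharp
using MigraDoc.DocumentObjectModel;
using MigraDoc.Rendering;
using PdfSharp.Pdf;

public static class UnicodeSample
{
    public static void Main()
    {
        var document = new Document();
        Section section = document.AddSection();
        Paragraph paragraph = section.AddParagraph();

        // Assumption: Arial is installed and contains these glyphs.
        paragraph.Format.Font.Name = "Arial";
        paragraph.AddText("ąčęėįųųūū");

        // First argument true = Unicode; the font is always embedded.
        var renderer = new PdfDocumentRenderer(true, PdfFontEmbedding.Always);
        renderer.Document = document;
        renderer.RenderDocument();

        // Hypothetical output path.
        renderer.PdfDocument.Save("unicode-sample.pdf");
    }
}
```

If a character still shows up as a square, the chosen font most likely has no glyph for it, so a different font would have to be set in place of Arial.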

If you are mixing PDFsharp and MigraDoc, as I do (that is, you have a PDFsharp PdfDocument object, document, and a MigraDoc Document object, doc, which is rendered as a part of document), things are not that simple. The example given by the PDFsharp team works only when you use MigraDoc on its own.
So you should use it this way:
Make sure you render your MigraDoc doc before you render the MigraDoc object to the PDFsharp XGraphics gfx.
Use the hack to set the encoding for the gfx object.
XGraphics gfx = XGraphics.FromPdfPage(page);
// HACK²
gfx.MUH = PdfFontEncoding.Unicode;
gfx.MFEH = PdfFontEmbedding.Always;
// HACK²
Document doc = new Document();
PdfDocumentRenderer pdfRenderer = new PdfDocumentRenderer(true, PdfFontEmbedding.Always);
pdfRenderer.Document = doc;
pdfRenderer.RenderDocument();
MigraDoc.Rendering.DocumentRenderer docRenderer = new DocumentRenderer(doc);
docRenderer.PrepareDocument();
docRenderer.RenderObject(gfx, XUnit.FromCentimeter(5), XUnit.FromCentimeter(10), "12cm", para);
For 1.5.x-betax (F#):
let gfx = XGraphics.FromPdfPage(page)
gfx.MUH <- PdfFontEncoding.Unicode
let doc = new Document()
let pdfRenderer = new PdfDocumentRenderer(true, PdfFontEmbedding.Always)
pdfRenderer.Document <- doc
pdfRenderer.RenderDocument()
let docRenderer = new DocumentRenderer(doc)
docRenderer.PrepareDocument()
docRenderer.RenderObject(gfx, XUnit.FromCentimeter 5, XUnit.FromCentimeter 10, "12cm", para)
