Trouble with iTextSharp - Converting XML to PDF - c#

Okay... I'm trying to use the most recent version of iTextSharp to turn an XML file into a PDF, and it isn't working.
The documentation on SourceForge doesn't seem to have kept up with the actual releases; the code in the provided example won't even compile under the newest version.
Here is my test XML:
<Remittance>
<RemitHeader>
<Payer>BlueCross</Payer>
<Provider>Maricopa</Provider>
<CheckDate>20100329</CheckDate>
<CheckNumber>123456789</CheckNumber>
</RemitHeader>
<RemitDetail>
<NPI>NPI_GOES_HERE</NPI>
<Patient>Patient Name</Patient>
<PCN>0034567</PCN>
<DateOfService>20100315</DateOfService>
<TotalCharge>125.57</TotalCharge>
<TotalPaid>55.75</TotalPaid>
<PatientShare>35</PatientShare>
</RemitDetail>
</Remittance>
And here is the code I'm attempting to use to turn that into a PDF.
Document doc = new Document(PageSize.LETTER, 36, 36, 36, 36);
iTextSharp.text.pdf.PdfWriter.GetInstance(doc,
new StreamWriter(fileOutputPath).BaseStream);
doc.Open();
SimpleXMLParser.Parse((ISimpleXMLDocHandler)doc,
new StreamReader(fileInputPath).BaseStream);
doc.Close();
Now, I was pretty sure the (ISimpleXMLDocHandler)doc cast wasn't going to work, but I can't find anything in the source that both a) implements ISimpleXMLDocHandler and b) will accept a standard XML document and parse it to PDF.
FYI: I did try an older version that would compile the example code from SourceForge, but it didn't work either.

After browsing the 5.0.2 code, it looks like there is no class that will just take arbitrary XML and turn it into a PDF for you. So, unless you can find similar code on the web or in an old release of iTextSharp, you'll need to write it yourself.
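As a rough illustration of what "write it yourself" could look like for the sample XML in the question, here is a minimal sketch, not a definitive implementation. It assumes iTextSharp 5.x and LINQ to XML are available; the class name RemittancePdf and the method name Convert are made up for this example, and the layout (one paragraph per element) is just a placeholder.

```csharp
using System.IO;
using System.Xml.Linq;
using iTextSharp.text;
using iTextSharp.text.pdf;

// Hypothetical helper: walks the two-level Remittance XML and writes
// each section name and each field as a plain paragraph.
public static class RemittancePdf
{
    public static void Convert(string fileInputPath, string fileOutputPath)
    {
        XDocument xml = XDocument.Load(fileInputPath);

        using (FileStream output = new FileStream(fileOutputPath, FileMode.Create))
        {
            Document doc = new Document(PageSize.LETTER, 36, 36, 36, 36);
            PdfWriter.GetInstance(doc, output);
            doc.Open();

            // RemitHeader, RemitDetail, ...
            foreach (XElement section in xml.Root.Elements())
            {
                doc.Add(new Paragraph(section.Name.LocalName));

                // Payer, Provider, CheckDate, ...
                foreach (XElement field in section.Elements())
                {
                    doc.Add(new Paragraph(field.Name.LocalName + ": " + field.Value));
                }
            }

            doc.Close();
        }
    }
}
```

For a real remittance you would probably want a PdfPTable instead of bare paragraphs, but the reading and writing steps stay the same.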

iText XML to PDF requires that the XML be formatted according to itext.dtd (please Google responsibly). You can also use a tagmap.xml to map your XML entities to those appropriate according to the DTD.

Related

MVCRazorToPdf (iTextSharp) using custom font

I am trying to add a custom font to my pdf output using the nuget package MVCRazorToPdf but I am having trouble with how to do this as the documentation for iTextSharp isn't great and all seems to be outdated.
The current code I have for creating the pdf is:
return new PdfActionResult(
"test.cshtml",
new TestModel(),
(writer, document) =>
{
FontFactory.Register(HostingEnvironment.MapPath("~/content/fonts/vegur-regular-webfont.ttf"), "VegurRegular");
});
Here writer is a PdfWriter and document is a Document.
All the examples of using the FontFactory show that you need to use the XMLWorker, but I don't have access to that, so I was wondering whether there is any way to change the document's font using the writer or document.
I've seen that there is a document.HtmlStyleClass property, but I can't find anything about how to use it.
Any help with this would be greatly appreciated.
MVCRazorToPdf is a very, very simple wrapper around iTextSharp's XMLWorker and uses the even simpler XMLWorkerHelper with all defaults to do its work. If you look at the source you'll see this:
document.Open();
using (var reader = new StringReader(RenderRazorView(context, viewName)))
{
XMLWorkerHelper.GetInstance().ParseXHtml(writer, document, reader);
document.Close();
output = workStream.ToArray();
}
If you're dead-set on using the NuGet version then you're stuck with this implementation and you're not going to be able to register a custom font.
However, there's an open issue regarding this that includes a fix so if you're willing to compile from source you can apply that change and you should be all set.
If you want to go one step further, I'd recommend reading this great post that shows how simple parsing HTML with iTextSharp is, as well as Bruno's post here that shows how to register fonts.
EDIT
As per the post behind the "includes a fix" link above (reproduced here in case the link breaks in future), change the above using statement to:
using (var reader = new MemoryStream(Encoding.UTF8.GetBytes(RenderRazorView(context, viewName))))
{
XMLWorkerHelper.GetInstance().ParseXHtml(writer, document, reader, null, FontFactory.FontImp as IFontProvider);
document.Close();
output = workStream.ToArray();
}
The font registered with FontFactory in the question above will then work when using style="font-family:VegurRegular;".

Adding MarkInfo entry element to logical structure of a PDF file using Aspose dll's

I am using the most recent Aspose.PDF DLL in Visual Studio, with the appropriate license applied in code.
To convert PDF files to PDF/A, I use the following code:
Aspose.Pdf.Document pdf = new Aspose.Pdf.Document(pdfPath);
bool converted = pdf.Convert(temptext, PdfFormat.PDF_A_1A, ConvertErrorAction.None);
I receive the following errors, extracted from the temptext log file:
<Problem Severity="Error" Clause="6.8.3.3" Convertable="True">Catalog shall have struct tree root entry</Problem>
<Problem Severity="Error" Clause="6.8.2.2" Convertable="True">Catalog shall have MarkInfo entry</Problem>
To get a MarkInfo entry into the structure of my PDF file, I am supposed to be able to add elements to the catalog or root structure (I am not sure exactly which), which would let me create this entry in the logical structure of the PDF file.
Then these two errors would be avoided and the PDF/A file would be converted correctly.
I noticed that PDFsharp can solve this problem with its DLLs in the following way:
PdfSharp.Pdf.PdfDocument doc = PdfSharp.Pdf.IO.PdfReader.Open(pdfPath);
PdfSharp.Pdf.PdfDictionary structureTreeRoot = new PdfSharp.Pdf.PdfDictionary(doc);
structureTreeRoot.Elements["/StructElem"] = new PdfSharp.Pdf.PdfName("/Entry1");
PdfSharp.Pdf.PdfArray array = new PdfSharp.Pdf.PdfArray(doc);
doc.Internals.AddObject(structureTreeRoot);
doc.Internals.Catalog.Elements["/StructTreeRoot"] = PdfInternals.GetReference(structureTreeRoot);
I want to use only the Aspose DLL. Does anyone know how I can do this with Aspose?
Currently, Aspose.Pdf does not support adding a MarkInfo entry to the logical structure of a PDF. Please check the forum thread for a similar question.
My name is Tilal Ahmad and I am a developer evangelist at Aspose.

System.Exception: Internal error 20, found element in OpenXmlPowerTools.RevisionAccepter.AcceptRevisions

I want to accept all the tracked changes in a Word document. I have written the following code to do so (I am using PowerTools from CodePlex):
using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(filePath, true))
{
OpenXmlPowerTools.RevisionAccepter.AcceptRevisions(wordDoc);
}
But the above code does not work for some documents. It throws "System.Exception: Internal error 20, found element" for them.
So is there an issue with my Word document? If so, what should I look for in the document? In short, I want to know what is wrong with my document so that I can correct it and run the above code.
Note that I am able to accept the tracked changes in Word 2013/2010/2007 itself!
Any help would be highly appreciated.
I asked the same question on https://powertools.codeplex.com/discussions, and they updated the RevisionAccepter.cs file to resolve it.
Thread from Power Tools Discussion

How to convert docx to html file using open xml with formatting

I know there are a lot of questions with the same title, but none of them resolved my issue.
I am using Open XML SDK 2.5 along with PowerTools to convert a .docx file to an .html file, using the HtmlConverter class for the conversion.
I can successfully convert the docx file into an HTML file, but the problem is that the HTML file doesn't retain the original formatting of the document. For example, font size, color, underline, bold, etc. are not reflected in the HTML file.
Here is my existing code:
public void ConvertDocxToHtml(string fileName)
{
byte[] byteArray = File.ReadAllBytes(fileName);
using (MemoryStream memoryStream = new MemoryStream())
{
memoryStream.Write(byteArray, 0, byteArray.Length);
using (WordprocessingDocument doc = WordprocessingDocument.Open(memoryStream, true))
{
HtmlConverterSettings settings = new HtmlConverterSettings()
{
PageTitle = "My Page Title"
};
XElement html = HtmlConverter.ConvertToHtml(doc, settings);
File.WriteAllText(@"E:\Test.html", html.ToStringNewLineOnAttributes());
}
}
}
So I just want to know whether there is any way to retain the formatting in the converted HTML file.
I know about some third-party APIs that do the same thing, but I would prefer a way to do this using Open XML or another open source library.
PowerTools for Open XML just released a new HtmlConverter module. It now contains an open source, free implementation of a conversion from DOCX to HTML formatted with CSS. The module HtmlConverter.cs supports all paragraph, character, and table styles, fonts and text formatting, numbered and bulleted lists, images, and more. See https://openxmldeveloper.org/
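For reference, a conversion with the newer module might look roughly like the sketch below. This is an assumption-laden example rather than the library's documented usage: the settings FabricateCssClasses, CssClassPrefix, RestrictToSupportedLanguages and RestrictToSupportedNumberingFormats are the names I recall from the PowerTools sample code, so check them against the version you install. The class name DocxToHtml is made up for this example.

```csharp
using System.IO;
using System.Xml.Linq;
using DocumentFormat.OpenXml.Packaging;
using OpenXmlPowerTools;

// Hypothetical wrapper around the newer HtmlConverter.
public static class DocxToHtml
{
    public static void Convert(string docxPath, string htmlPath)
    {
        // Work on an in-memory copy so the source file is never modified.
        byte[] bytes = File.ReadAllBytes(docxPath);
        using (MemoryStream stream = new MemoryStream())
        {
            stream.Write(bytes, 0, bytes.Length);
            using (WordprocessingDocument doc = WordprocessingDocument.Open(stream, true))
            {
                // Setting names are assumed from the PowerTools samples.
                HtmlConverterSettings settings = new HtmlConverterSettings
                {
                    PageTitle = "My Page Title",
                    FabricateCssClasses = true,   // emit CSS classes for the formatting
                    CssClassPrefix = "pt-",
                    RestrictToSupportedLanguages = false,
                    RestrictToSupportedNumberingFormats = false
                };

                XElement html = HtmlConverter.ConvertToHtml(doc, settings);

                // DisableFormatting avoids pretty-printing, which would
                // otherwise insert whitespace into the rendered text.
                File.WriteAllText(htmlPath, html.ToString(SaveOptions.DisableFormatting));
            }
        }
    }
}
```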
Your end result will not look exactly the way your Word Document turns out, but this link might help.
You might want to find an external tool to help you do this, like Aspose Words
You can use the OpenXML Viewer extension for Firefox to convert with formatting.
http://openxmlviewer.codeplex.com
This works for me. Hope this helps.

PDF generated with itext becomes 'corrupted' when using SetSimpleColumn()

First I would like to point out that Stack Overflow has helped me with many problems in the past, so thank you all. But now I have come to a problem that I haven't found a solution for yet, and it's driving me crazy. I'm not a native English speaker, so sorry for any language mistakes.
So here it is:
I'm generating a PDF with the iTextSharp library (great library, by the way). I'm starting with a PDF form/template, to which I'm adding 'fill-out' data. I'm using PdfReader to read the template PDF, and by calling the PdfStamper method GetOverContent(pageNum) for individual pages I get a PdfContentByte. With that PdfContentByte I'm adding my text/data (BeginText and EndText are used on every page). Most of the text I add with the ShowTextAligned method. That is all OK; the generated PDF contains my text. The problem begins where I have to add 'columned' text. I do that with the following code:
ColumnText ct = new ColumnText(cb);//cb is PdfContentByte
Phrase p = new Phrase(txt, FontFactory.GetFont(DEFAULT_FONT, BaseFont.CP1250, true, font_size));
ct.SetSimpleColumn(p, x, y, x+width, y+height, 10, alignment);
ct.Go();
setDefaultFont();//sets font to PdfContentByte again with setFontAndSize and SetColorFill
The columned text is added OK with this code, but text added AFTER it with ShowTextAligned (on that same page/same PdfContentByte) is not visible in Acrobat Reader.
Here is the 'fun' part: that text in the same PDF file, opened with Foxit Reader, is fine/visible/OK.
So text added with ShowTextAligned after adding a ColumnText is not visible in Acrobat Reader but is visible in Foxit Reader just fine. This problem exists within one page; a new page resets it (the PdfContentByte for the next page is new).
My workaround was to add all ColumnText AFTER all calls to ShowTextAligned. That worked until today, when a customer printed the generated PDF with Acrobat Reader, which, after printing the document, displayed a message that the PDF contains an error and that the author of the PDF should be contacted. The version of Adobe Reader is 10.1.1. The problem is not in the customer's computer; the same thing happens on my computer.
After researching the web I installed the Adobe Acrobat Pro trial, which contains the Preflight tool, which is meant for analyzing PDFs (as far as I understand). This tool outputs the warning "Invalid content state stream for operator". And here I'm stuck. I believe the problem lies in the added ColumnText, because a document generated without them causes no problem displaying/printing and Preflight states "No problem found".
It is possible that I'm missing some fact and that the problem is in my code...
Please help me, because I'm running out of ideas.
I hope this post will someday help someone else with the same problem.
I cannot attach a sample PDF because it contains sensitive data, but if there is no other way, I'll recreate the scenario/code.
So to answer my question/problem:
When writing to a PDF using PdfContentByte and the ShowTextAligned method, you have to call BeginText before writing and EndText after you are finished. So I did. BUT if you want to add some other element (like ColumnText, Image and probably anything else), you can't do that before you call EndText. If you do, the generated PDF will be 'problematic'/corrupted.
So in pseudocode the following is wrong:
BeginText();
ShowTextAligned();
AddImage();
ShowTextAligned();
EndText();
Correct usage is:
BeginText();
ShowTextAligned();
EndText();
AddImage();
BeginText();
ShowTextAligned();
EndText();
I hope this will help someone someday somewhere.
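For anyone who wants to see the correct ordering with actual iTextSharp calls, here is a minimal sketch, assuming iTextSharp 5.x. The class name StampExample, the method name Fill, the coordinates, the font and the sample strings are all made up for illustration; the point is only that the ColumnText is added between an EndText and the next BeginText.

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

// Hypothetical example: stamps two aligned lines and one column of text
// onto page 1 of a template, never adding the column inside a text block.
public static class StampExample
{
    public static void Fill(string templatePath, string outputPath)
    {
        PdfReader reader = new PdfReader(templatePath);
        using (FileStream output = new FileStream(outputPath, FileMode.Create))
        {
            PdfStamper stamper = new PdfStamper(reader, output);
            PdfContentByte cb = stamper.GetOverContent(1);
            BaseFont bf = BaseFont.CreateFont(
                BaseFont.HELVETICA, BaseFont.CP1252, BaseFont.NOT_EMBEDDED);

            // First text block: opened and closed before anything else is added.
            cb.BeginText();
            cb.SetFontAndSize(bf, 10);
            cb.ShowTextAligned(Element.ALIGN_LEFT, "First line", 72, 700, 0);
            cb.EndText();

            // ColumnText goes OUTSIDE BeginText/EndText.
            ColumnText ct = new ColumnText(cb);
            ct.SetSimpleColumn(new Phrase("Columned text"),
                72, 600, 300, 680, 10, Element.ALIGN_LEFT);
            ct.Go();

            // Second text block: a fresh BeginText, with the font set again.
            cb.BeginText();
            cb.SetFontAndSize(bf, 10);
            cb.ShowTextAligned(Element.ALIGN_LEFT, "Second line", 72, 580, 0);
            cb.EndText();

            stamper.Close();
        }
        reader.Close();
    }
}
```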
