Convert HTML to PDF using iText with special char fail

I'm facing a problem using iText and XMLWorkerHelper in one specific case. I generate multiple multi-page PDF files without problems, but sometimes an error occurs with special characters.
I tested my template and the HTML structure itself is not the problem, even though the exception says:
Exception thrown: 'iTextSharp.tool.xml.exceptions.RuntimeWorkerException' in itextsharp.xmlworker.dll
Additional information: Invalid nested tag tr found, expected closing tag td.
This error is caused by the & character, which appears in my template:
<td>Launch C&O</td>
I don't know exactly how to resolve this error. Is it an encoding error? Should I specify an encoding when I create the PDF?
This is the code that creates the PDF:
public async Task Generate(Stream stream, List<string> contentPages)
{
try
{
int cpt = 1;
Document document = new Document();
PdfWriter writer = PdfWriter.GetInstance(document, stream);
writer.CloseStream = false;
document.Open();
foreach (string pdfContentPage in contentPages)
{
try
{
document.NewPage();
using (StringReader srHtml = new StringReader(pdfContentPage ))
{
XMLWorkerHelper.GetInstance().ParseXHtml(writer, document, srHtml);
}
++cpt;
}
catch (RuntimeWorkerException ex)
{
Console.Write($"An error occurred at PDF generation for cpt = {cpt}");
Console.Write(ex.Message);
}
catch (Exception)
{
Console.WriteLine($"Content Error : {pdfContentPage}");
throw;
}
}
document.Close();
}
catch (Exception)
{
throw;
}
}
If you have any advice, I'd be glad to read it! :)

At the point where the query results are bound and the & symbol can appear, I ran a replace over every description. For example, when binding the grid value s(2) = " TEST & TEST":
' inside the loop that binds each row
Dim desc As String
desc = "TEST & TEST"
desc = desc.Replace("&", " ")
s(2) = desc
' end of loop
That solved the issue in my case.

Try the logic below (Java version of the API):
// srHtml is the HTML content as a String here
InputStream is = new ByteArrayInputStream(srHtml.getBytes(Charset.forName("UTF-8")));
XMLWorkerHelper.getInstance().parseXHtml(writer, document, is, Charset.forName("UTF-8"));
This was with xmlworker 5.5.12 and itextpdf 5.5.12.
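Since XMLWorker parses its input as XML, a bare & is not well-formed and most likely needs to be written as &amp; in the template. The following is a minimal sketch, not part of any answer above, that escapes only the ampersands that do not already start an entity or character reference. The class and method names are hypothetical, and it assumes the HTML is available as a string before it is handed to ParseXHtml.

```csharp
using System;
using System.Text.RegularExpressions;

static class AmpersandEscaper
{
    // Matches '&' that does NOT start a named (&amp;), decimal (&#38;)
    // or hexadecimal (&#x26;) reference.
    private static readonly Regex BareAmpersand =
        new Regex(@"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)");

    // Hypothetical helper: returns the HTML with every bare '&' replaced by "&amp;".
    public static string EscapeBareAmpersands(string html)
    {
        return BareAmpersand.Replace(html, "&amp;");
    }

    static void Main()
    {
        // The failing cell from the question.
        Console.WriteLine(EscapeBareAmpersands("<td>Launch C&O</td>"));
        // prints: <td>Launch C&amp;O</td>
    }
}
```

Unlike replacing & with a space, this keeps the character visible in the rendered PDF, and references that are already escaped are left untouched.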

Related

Create PDF by copying it from template with PdfCopy (loss of data)

I'm trying to create a new pdf file based on another one using PdfCopy.
Everything works fine during generation and the generated file opens without any problem on my desktop, but the file seems to be corrupted and isn't accepted by the service that I must use:
SignService error when calling 'sign', probably caused by a bad file format.
I noticed that the generated PDF is always lighter than the original template, so I compared the template with the generated file. Large parts of the data are missing, especially a whole bunch of XML. I guess PdfCopy does not copy everything from my original PDF, but I cannot figure out what I am missing.
Here is my method:
byte[] completedDocument = null;
string originalUri = Path.Combine(this.PdfPath, pdfName);
string generatedUri = Path.Combine(this.PdfGeneratedPath, generatedPdfName);
using(MemoryStream streamCompleted = new MemoryStream())
{
using(Document doc = new Document())
{
PdfCopy copy = new PdfCopy(doc, streamCompleted);
copy.PdfVersion = PdfWriter.VERSION_1_6;
doc.Open();
copy.Open();
byte[] mergedDocument = null;
PdfReader pdfReader = new PdfReader(originalUri);
int pdfPageNumber = pdfReader.NumberOfPages;
using(MemoryStream streamTemplate = new MemoryStream())
{
using (PdfStamper pdfStamper = new PdfStamper(pdfReader, streamTemplate))
{
AcroFields acrofields = pdfStamper.AcroFields;
foreach (KeyValuePair<string, AcroFields.Item> field in acrofields.Fields)
{
string data;
if (pdfFieldsValues.TryGetValue(field.Key, out data))
{
if (data == null)
{
data = string.Empty;
}
acrofields.SetField(field.Key, data);
}
}
pdfStamper.FormFlattening = true;
pdfStamper.Writer.CloseStream = false;
}
mergedDocument = streamTemplate.ToArray();
}
pdfReader = new PdfReader(mergedDocument);
for (int page = 1; page <= pdfPageNumber; page++)
{
if (!excludedPages.Any(s => s == page))
{
copy.AddPage(copy.GetImportedPage(pdfReader, page));
}
}
doc.Close();
copy.Close();
}
completedDocument = streamCompleted.ToArray();
}
File.WriteAllBytes(generatedUri, completedDocument);
I tried uploading "mergedDocument" rather than "completedDocument" and my service accepted it, so I'm pretty sure the problem has something to do with this part:
for (int page = 1; page <= pdfPageNumber; page++)
{
if (!excludedPages.Any(s => s == page))
{
copy.AddPage(copy.GetImportedPage(pdfReader, page));
}
}
Or with the PdfCopy initialization.

You start with a form. You fill out the form and you flatten it. By flattening it, you deliberately throw away all interactivity. I'm surprised that you're surprised that the file is getting smaller: you're throwing away the form infrastructure!
You then upload the flattened file to some service unknown to us. This service complains:
SignService error when calling 'sign', probably caused by a bad file format.
As we don't know which service you are talking about, we can only guess. An educated guess would be that the original form contains a signature field that needs to be signed by a signing service.
Obviously that field is gone: you flattened the form! I may be wrong, but I assume that the service also tries to read the fields you filled out, but that won't be possible either as you throw away all interactivity. Please remove the following line:
pdfStamper.FormFlattening = true;
Then there's Chris' comment: it seems that you're using PdfCopy. If you're using an old version of iTextSharp (before iText 5.5.1), you shouldn't expect the form to be preserved. If you are using a recent version, you should instruct PdfCopy to preserve the form (but that line is missing). You don't need to ask 'how do I preserve the form?' because you shouldn't be using PdfCopy anyway.
You only need PdfStamper. You already use PdfStamper to fill out the fields; now you can also use the PdfReader's selectPages() method (SelectPages() in iTextSharp) to select the pages you want to keep (or to exclude the ones you want to remove).
Finally, it is unclear what you mean when you write:
There are some big parts of missing data, especially a whole bunch of xml.
Are you saying that the form isn't a pure AcroForm, but that it also contains an XFA stream? If so, then you most definitely can't use PdfCopy.
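As a rough sketch of the page-selection step, the list of pages to keep can be computed from the page count and the excluded pages. The helper name PagesToKeep is hypothetical, and the iTextSharp call mentioned in the comment is an assumption about how the result would be used, not tested code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class PageSelection
{
    // Hypothetical helper: returns the 1-based page numbers to keep, in order.
    public static List<int> PagesToKeep(int pageCount, IEnumerable<int> excludedPages)
    {
        var excluded = new HashSet<int>(excludedPages);
        return Enumerable.Range(1, pageCount)
                         .Where(page => !excluded.Contains(page))
                         .ToList();
    }

    static void Main()
    {
        List<int> keep = PagesToKeep(5, new[] { 2, 4 });
        Console.WriteLine(string.Join(",", keep)); // prints: 1,3,5

        // Assumption: with iTextSharp 5 this list would be passed to
        // pdfReader.SelectPages(keep) before the PdfStamper is created,
        // so that no PdfCopy is needed at all.
    }
}
```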

How to convert Windows Forms data into PDF

I am working on a salary project. When a user views their salary slip and clicks Download, the complete form data should be converted into a PDF file and stored in a predefined location.
Please suggest code that meets these requirements.

I have faced this problem before, and the best solution I found was to use Microsoft Word Interop. You can put whatever you want in a Word document and then save it as PDF, since Microsoft Word allows you to export a document to PDF.
The simplest way to do this is to save your data as plain text (don't forget to format it well) and then run this method to convert the plain text to PDF.
public PDFWriter(String Path, String FileName) {
Microsoft.Office.Interop.Word.Application word = new Microsoft.Office.Interop.Word.Application();
try
{
word.Visible = false;
word.Documents.Open(Path);
word.ActiveDocument.SaveAs2(FileName + ".pdf", Microsoft.Office.Interop.Word.WdSaveFormat.wdFormatPDF);
this.Path = FileName + ".pdf";
}
catch (Exception)
{
// word.Quit() runs in the finally block, so it is not called here as well;
// rethrow as-is to keep the original stack trace
throw;
}
finally
{
word.Quit();
}
}

Download PdfSharp.dll (PDFsharp) and add it as a reference.
Capture your form as an image, then:
private void ImageToPdf()
{
PdfSharp.Pdf.PdfDocument doc = new PdfSharp.Pdf.PdfDocument();
PdfSharp.Pdf.PdfPage oPage = new PdfSharp.Pdf.PdfPage();
String destination = "your destination";
doc.Pages.Add(oPage);
XGraphics xgr;
XImage img;
// formImage is a placeholder name for the System.Drawing.Image holding the captured form
img = XImage.FromGdiPlusImage(formImage);
xgr = PdfSharp.Drawing.XGraphics.FromPdfPage(oPage);
xgr.DrawImage(img, 0, 0);
doc.Save(destination);
doc.Close();
}

xml.LoadData - Data at the root level is invalid. Line 1, position 1

I'm trying to parse some XML inside a WiX installer. The XML would be an object of all my errors returned from a web server. I'm getting the error in the question title with this code:
XmlDocument xml = new XmlDocument();
try
{
xml.LoadXml(myString);
}
catch (Exception ex)
{
System.IO.File.WriteAllText(@"C:\text.txt", myString + "\r\n\r\n" + ex.Message);
throw ex;
}
myString is this (as seen in the output of text.txt)
<?xml version="1.0" encoding="utf-8"?>
<Errors></Errors>
text.txt comes out looking like this:
<?xml version="1.0" encoding="utf-8"?>
<Errors></Errors>
Data at the root level is invalid. Line 1, position 1.
I need this XML to parse so I can see if I had any errors.

The hidden character is probably a BOM (byte order mark).
The explanation of the problem and the solution can be found here, credits to James Schubert, based on an answer by James Brankin found here.
Though the previous answer does remove the hidden character, it also removes the whole first line. A more precise version would be:
string _byteOrderMarkUtf8 = Encoding.UTF8.GetString(Encoding.UTF8.GetPreamble());
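// Ordinal comparison: a culture-sensitive StartsWith ignores the BOM character and can match even when it is absent
if (xml.StartsWith(_byteOrderMarkUtf8, StringComparison.Ordinal))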
{
xml = xml.Remove(0, _byteOrderMarkUtf8.Length);
}
I encountered this problem when fetching an XSLT file from Azure blob and loading it into an XslCompiledTransform object.
On my machine the file looked just fine, but after uploading it as a blob and fetching it back, the BOM character was added.
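To reproduce the effect in isolation, here is a small sketch that prepends a BOM to the XML from the question and tries LoadXml before and after stripping it. The class and method names are hypothetical; only BCL types are used.

```csharp
using System;
using System.Xml;

static class BomDemo
{
    // Removes a single leading U+FEFF, if present, using an ordinal comparison.
    public static string StripBom(string text)
    {
        const string bom = "\uFEFF";
        return text.StartsWith(bom, StringComparison.Ordinal)
            ? text.Substring(bom.Length)
            : text;
    }

    // Returns true when the string parses as XML, false on an XmlException.
    public static bool TryLoad(string text)
    {
        try
        {
            new XmlDocument().LoadXml(text);
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }

    static void Main()
    {
        string clean = "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n<Errors></Errors>";
        string withBom = "\uFEFF" + clean; // looks identical when printed

        Console.WriteLine("With BOM parses:     " + TryLoad(withBom));
        Console.WriteLine("Stripped BOM parses: " + TryLoad(StripBom(withBom)));
    }
}
```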

Use the Load() method instead; it will solve the problem.

The issue here was that myString had that header line. Either there was some hidden character at the beginning of the first line or the line itself was causing the error. I sliced off the first line like so:
xml.LoadXml(myString.Substring(myString.IndexOf(Environment.NewLine)));
This solved my problem.

I think the problem is about encoding. That's why removing the first line (with the encoding bytes) might solve the problem.
My solution for "Data at the root level is invalid. Line 1, position 1."
in XDocument.Parse(xmlString) was to replace it with XDocument.Load(new MemoryStream(xmlContentInBytes));
I noticed that my XML string looked OK:
<?xml version="1.0" encoding="utf-8"?>
but in a text editor using a different encoding it looked like this:
?<?xml version="1.0" encoding="utf-8"?>
In the end I did not need the XML string, only the XML byte[]. If you need to use the string, you should look for "invisible" bytes in it and adjust the encoding so that the XML content can be parsed or loaded.
Hope this helps.

Save your file with different encoding:
File > Save file as... > Save as UTF-8 without signature.
In VS 2017 you will find the encoding option in a dropdown next to the Save button.

The main culprit for this error is the logic that determines the encoding when converting a Stream or byte[] array to a .NET string.
A StreamReader created with the second constructor parameter, detectEncodingFromByteOrderMarks, set to true will determine the proper encoding and create a string that does not break the XmlDocument.LoadXml method.
public string GetXmlString(string url)
{
using var stream = GetResponseStream(url);
using var reader = new StreamReader(stream, true);
return reader.ReadToEnd(); // no exception on `LoadXml`
}
A common mistake would be to blindly use UTF8 encoding on the stream or byte[]. The code below produces a string that looks valid when inspected in the Visual Studio debugger or copy-pasted somewhere, but it will cause the exception when used with Load or LoadXml if the file is encoded as anything other than UTF-8 without BOM.
public string GetXmlString(string url)
{
byte[] bytes = GetResponseByteArray(url);
return System.Text.Encoding.UTF8.GetString(bytes); // potentially exception on `LoadXml`
}
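The difference between the two approaches can be seen on an in-memory payload. This is a sketch under the assumption that the response bytes start with a UTF-8 BOM; the class and method names are hypothetical and a MemoryStream stands in for the real response stream.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

static class DecodeDemo
{
    // Decodes the bytes as UTF-8 without looking for a BOM; a leading BOM
    // survives as the character U+FEFF at index 0.
    public static string DecodeBlindly(byte[] bytes)
    {
        return Encoding.UTF8.GetString(bytes);
    }

    // Lets StreamReader inspect the byte order mark and drop it from the result.
    public static string DecodeWithDetection(byte[] bytes)
    {
        using var stream = new MemoryStream(bytes);
        using var reader = new StreamReader(stream, detectEncodingFromByteOrderMarks: true);
        return reader.ReadToEnd();
    }

    static void Main()
    {
        // UTF-8 BOM (EF BB BF) followed by the XML body.
        byte[] payload = Encoding.UTF8.GetPreamble()
            .Concat(Encoding.UTF8.GetBytes("<Errors></Errors>"))
            .ToArray();

        Console.WriteLine("Blind decode starts with BOM: " + (DecodeBlindly(payload)[0] == '\uFEFF'));
        Console.WriteLine("Detected decode starts with '<': " + (DecodeWithDetection(payload)[0] == '<'));
    }
}
```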

I solved this issue by editing the byte array directly.
Get the UTF-8 preamble and strip it from the start of the data.
Afterwards you can convert the byte[] to a string with the GetString method, as shown below.
I also removed the \r and \t characters, just as a precaution.
// requires: using System.Linq;
XmlDocument configurationXML = new XmlDocument();
List<byte> byteArray = new List<byte>(webRequest.downloadHandler.data);
byte[] preamble = Encoding.UTF8.GetPreamble();
// Remove the preamble only if the data actually starts with it
if (byteArray.Count >= preamble.Length && byteArray.Take(preamble.Length).SequenceEqual(preamble))
{
byteArray.RemoveRange(0, preamble.Length);
}
string xml = System.Text.Encoding.UTF8.GetString(byteArray.ToArray());
xml = xml.Replace("\r", "");
xml = xml.Replace("\t", "");

If your XML is in a string, the following removes the XML declaration (note that the regex matches the declaration itself rather than the byte order mark):
xml = new Regex("\\<\\?xml.*\\?>").Replace(xml, "");

At first I had problems escaping the "&" character, then diacritics and special letters were shown as question marks, and I ended up with the issue the OP mentioned.
I looked at the answers and used @Ringo's suggestion to try the Load() method as an alternative. That made me realize that I can deal with my response in other ways, not just as a string.
Using System.IO.Stream instead of string solved all the issues for me.
var response = await this.httpClient.GetAsync(url);
var responseStream = await response.Content.ReadAsStreamAsync();
var xmlDocument = new XmlDocument();
xmlDocument.Load(responseStream);
The cool thing about Load() is that this method automatically detects the encoding of the input XML (for example, UTF-8, ANSI, and so on).

I have found one possible solution.
For your code it could look as follows:
XmlDocument xml = new XmlDocument();
// assuming the file is located in the current directory
// assuming the file name is loadData.xml
// declared outside the try block so that the catch block can use it
string myString = "./loadData.xml";
try
{
xml.Load(myString);
}
catch (Exception ex)
{
System.IO.File.WriteAllText(@"C:\text.txt", myString + "\r\n\r\n" + ex.Message);
throw;
}

If you are using XDocument.Parse(@""),
use the @ (verbatim string) prefix; it resolves the issue.

Using an XmlDataDocument object is much better than using an XDocument or XmlDocument object. XmlDataDocument works fine with UTF-8 and has no problems with byte order marks. You can get the child nodes of each element using the ChildNodes property.
Use a custom function such as the following one:
static public void ReadXmlDataDocument2(string xmlFilePath)
{
if (xmlFilePath != null)
{
if (File.Exists(xmlFilePath))
{
System.IO.FileStream fs = default(System.IO.FileStream);
try
{
fs = new System.IO.FileStream(xmlFilePath, System.IO.FileMode.Open, System.IO.FileAccess.Read);
System.Xml.XmlDataDocument k_XDoc = new System.Xml.XmlDataDocument();
k_XDoc.Load(fs);
fs.Close();
fs.Dispose();
fs = null;
XmlNodeList ndsRoot = k_XDoc.ChildNodes;
foreach (System.Xml.XmlNode xLog in ndsRoot)
{
foreach (System.Xml.XmlNode xLog2 in xLog.ChildNodes)
{
if (xLog2.Name == "ERRORs")
{
foreach (System.Xml.XmlNode xLog3 in xLog2.ChildNodes)
{
if (xLog3.Name == "ErrorCode")
{
// Do something
}
if (xLog3.Name == "Description")
{
// Do something
}
}
}
}
}
}
catch (Exception ex)
{
MessageBox.Show(ex.Message);
}
}
}
}

PDFSharp filling in form fields

I would like to fill in form fields in a premade PDF document, but I'm getting a null reference error with AcroForm at run time.
string fileN4 = TextBox1.Text + " LOG.pdf";
File.Copy(Path.Combine(textBox4.Text + "\\", fileN4),
Path.Combine(Directory.GetCurrentDirectory(), fileN4), true);
// Open the file
PdfDocument document = PdfReader.Open(fileN4, PdfDocumentOpenMode.Modify);
PdfTextField currentField = (PdfTextField)(document.AcroForm.Fields["<CASENUM>"]);
//const
string caseName = TextBox1.Text;
PdfString caseNamePdfStr = new PdfString(caseName);
//set the value of this field
currentField.Value = caseNamePdfStr;
// Save the document...
document.Save(fileN4);
So PdfTextField currentField = (PdfTextField)(document.AcroForm.Fields["<CASENUM>"]); is where the error happens. It seems that AcroForm is not even recognizing the fields.
Another option would be to find and replace text in the PDF (without using iTextSharp, which I cannot use due to licensing).
Any help would be awesome!

If you are attempting to populate PDF form fields, you also need to set the NeedAppearances element to true. Otherwise the PDF will "hide" the values on the form. Here is the VB code.
If objPdfSharpDocument.AcroForm.Elements.ContainsKey("/NeedAppearances") = False Then
objPdfSharpDocument.AcroForm.Elements.Add("/NeedAppearances", New PdfSharp.Pdf.PdfBoolean(True))
Else
objPdfSharpDocument.AcroForm.Elements("/NeedAppearances") = New PdfSharp.Pdf.PdfBoolean(True)
End If

I've been working on this today and I've managed to create a working solution. I've pasted my working code below. The only real differences I can see between my code and the OP's is the following:
I included Marc Ferree's code to set NeedAppearances (+1 and Many thanks!!)
I set the Text property of the field using a String variable, and not the Value property using a PdfString.
Hopefully this will be of use to somebody trying to do the same.
string templateDocPath = Server.MapPath("~/Documents/MyTemplate.pdf");
PdfDocument myTemplate = PdfReader.Open(templateDocPath, PdfDocumentOpenMode.Modify);
PdfAcroForm form = myTemplate.AcroForm;
if (form.Elements.ContainsKey("/NeedAppearances"))
{
form.Elements["/NeedAppearances"] = new PdfSharp.Pdf.PdfBoolean(true);
}
else
{
form.Elements.Add("/NeedAppearances", new PdfSharp.Pdf.PdfBoolean(true));
}
PdfTextField testField = (PdfTextField)(form.Fields["TestField"]);
testField.Text = "012345";
myTemplate.Save(Server.MapPath("~/Documents/Amended.pdf")); // Save to new file.

I got stuck on this same problem earlier today. However, I think the source code has been updated, so if you try the method above you are going to get a NullReferenceException. Instead, for a text field you need to create a PdfString and use testField.Value instead of .Text. Here's an example.
static PdfAccess()
{
Pdf.PdfDocument doc = Pdf.IO.PdfReader.Open(@"C:\...\ Contract.pdf", Pdf.IO.PdfDocumentOpenMode.Modify);
Pdf.AcroForms.PdfAcroForm form = doc.AcroForm;
if (form.Elements.ContainsKey("/NeedAppearances"))
{
form.Elements["/NeedAppearances"] = new PdfSharp.Pdf.PdfBoolean(true);
}
else
{
form.Elements.Add("/NeedAppearances", new PdfSharp.Pdf.PdfBoolean(true));
}
var name = (Pdf.AcroForms.PdfTextField)(form.Fields["Email"]);
name.Value = new Pdf.PdfString("ramiboy");
doc.Save(@"C:\...\ Contract.pdf");
doc.Close();
}

I have just experienced something similar to this. The first pdf file I opened did not contain acroform data and resulted in a null exception as described above. The issue is not with the opening of the pdf but the reference to the Acroform member variable having a value of null. You can test your pdf using the following code example:
OpenFileDialog ofd = new OpenFileDialog();
if (ofd.ShowDialog() == DialogResult.OK)
{
PdfDocument _document = null;
try
{
_document = PdfReader.Open(ofd.FileName, PdfDocumentOpenMode.Modify);
}
catch(Exception ex)
{
MessageBox.Show(ex.Message,"FATAL");
//do any cleanup and return
return;
}
if (_document != null)
{
if (_document.AcroForm != null)
{
MessageBox.Show("Acroform is object","SUCCEEDED");
//pass acroform to some function for processing
_document.Save(@"C:\temp\newcopy.pdf");
}
else
{
MessageBox.Show("Acroform is null","FAILED");
}
}
else
{
MessageBox.Show("Unknown error opening document","FAILED");
}
}
ADDENDUM
I also noticed the key in this line of code should not have angle brackets
document.AcroForm.Fields["<CASENUM>"]
Change it to
document.AcroForm.Fields["CASENUM"]

The solution to overcome the NullReferenceException is to open your pre-made
PDF with Adobe Acrobat and give your form fields a default value, by changing their property-type to be something other than null.

Have you tried putting the current directory in when you try to open it?
Change
PdfDocument document = PdfReader.Open(fileN4, PdfDocumentOpenMode.Modify);
to
PdfDocument document = PdfReader.Open(Path.Combine(Directory.GetCurrentDirectory(), fileN4), PdfDocumentOpenMode.Modify);
I'm pretty sure that PdfReader will need a full file path, although I only use ASPOSE for pdf creation.

Convert blank .txt file to PDF in C#

I am converting a .txt file to .pdf in C#. This works fine if the .txt file is not blank. If it is blank, it throws the error "The document has no pages".
The PDF still gets generated, but opening it fails with "There was an error opening this document. The file is damaged and could not be repaired".
The code is shown below:
public void converttxttoPDF(string sourcePath, string destPath)
{
try
{
iTextSharp.text.Document document = new iTextSharp.text.Document();
string filename = Path.GetFileNameWithoutExtension(sourcePath);
System.IO.StreamReader myFile = new System.IO.StreamReader(sourcePath);
string myString = myFile.ReadToEnd();
myFile.Close();
if (!Directory.Exists(destPath))
Directory.CreateDirectory(destPath);
iTextSharp.text.pdf.PdfWriter.GetInstance(document, new FileStream(destPath + "\\" + filename + ".pdf", FileMode.CreateNew));
document.Open();
document.Add(new iTextSharp.text.Paragraph(myString));
document.Close();
}
catch (Exception ex)
{
MessageBox.Show(ex.Message);
}
}
Let me know if any more info is needed.
Thanks.

You need to add some content to the pdf. So try this:
myString = string.IsNullOrEmpty(myString) ? " " : myString;
document.Add(new iTextSharp.text.Paragraph(myString));

You need to convince iText that there IS something on that page.
Two Methods:
Be explicit: writer.setPageEmpty(false); (in iTextSharp: writer.PageEmpty = false;)
Trick it (which is what Darin suggests): writer.getDirectContent().setLiteral(" "); (in iTextSharp: writer.DirectContent.SetLiteral(" ");)
