PDF generated with itext becomes 'corrupted' when using SetSimpleColumn()


First, I would like to point out that Stack Overflow has helped me with many problems in the past, so thank you all. But now I have run into a problem that I haven't found a solution for yet, and it's driving me crazy. I'm not a native English speaker, so sorry for any language mistakes.
So here it is:
I'm generating a PDF with the iTextSharp library (a great library, by the way). I start with a PDF form/template, to which I add the 'fill-out' data. I use PdfReader to read the template PDF, and by calling the PdfStamper method GetOverContent(pageNum) for individual pages I get a PdfContentByte. With that PdfContentByte I add my text/data (BeginText and EndText are used on every page). Most of the text I add with the ShowTextAligned method. That all works; the generated PDF contains my text. The problem begins where I have to add 'columned' text. I do that with the following code:
ColumnText ct = new ColumnText(cb);//cb is PdfContentByte
Phrase p = new Phrase(txt, FontFactory.GetFont(DEFAULT_FONT, BaseFont.CP1250, true, font_size));
ct.SetSimpleColumn(p, x, y, x+width, y+height, 10, alignment);
ct.Go();
setDefaultFont();//sets font to PdfContentByte again with setFontAndSize and SetColorFill
The columned text is added correctly by this code, but any text added AFTER it with ShowTextAligned (on the same page/same PdfContentByte) is not visible in Acrobat Reader.
Here is the 'fun' part: that text, in the same PDF file opened with Foxit Reader, is visible and fine.
So text added with ShowTextAligned after adding a ColumnText is not visible in Acrobat Reader but is visible in Foxit Reader. The problem is confined to one page; a new page resets it (the PdfContentByte for the next page is new).
My workaround was to add all ColumnText elements AFTER all calls to ShowTextAligned. That worked until today, when a customer printed a generated PDF with Acrobat Reader, which, after printing the document, displayed a message that the PDF contains an error and that the author of the PDF should be contacted. The version of Adobe Reader is 10.1.1. The problem is not in the customer's computer; the same thing happens on my computer.
After researching the web I installed the Adobe Acrobat Pro trial, which contains the Preflight tool, which is meant for analyzing PDFs (as far as I understand). This tool outputs the warning "Invalid content state stream for operator". And here I'm stuck. I believe the problem lies in the added ColumnText, because a document generated without it causes no problem when displaying/printing, and Preflight states "No problem found".
It is possible that I'm missing something and that the problem is in my code...
Please help me, because I'm running out of ideas.
I hope this post will someday help someone else with the same problem.
I cannot attach a sample PDF because it contains sensitive data, but if there is no other way, I'll recreate the scenario/code.

So to answer my question/problem:
When writing to a PDF using PdfContentByte and the ShowTextAligned method, you have to call BeginText before writing, and call EndText after you are finished. So I did. BUT if you want to add some other element (like a ColumnText, an Image and probably anything else), you can't do that before you call EndText. If you do, the generated PDF will be 'problematic'/corrupted.
So, in pseudocode, the following is wrong:
BeginText();
ShowTextAligned();
AddImage();
ShowTextAligned();
EndText();
Correct usage is:
BeginText();
ShowTextAligned();
EndText();
AddImage();
BeginText();
ShowTextAligned();
EndText();
I hope this will help someone someday somewhere.
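For the ColumnText case from the question, the correct ordering could be sketched roughly like this. It is only a sketch, assuming iTextSharp 5.x; the class name StampSketch, the method Stamp, the coordinates and the Helvetica font are made up for illustration and are not from the original code. The point is that the ColumnText is added between two separate BeginText/EndText pairs, and the font is set again in the second text block.

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

public static class StampSketch
{
    // Hypothetical helper: writes text, then a column, then text again on page 1.
    public static byte[] Stamp(byte[] templatePdf)
    {
        using (var output = new MemoryStream())
        {
            var reader = new PdfReader(templatePdf);
            var stamper = new PdfStamper(reader, output);
            PdfContentByte cb = stamper.GetOverContent(1);
            // Assumption: a built-in font; the question uses its own DEFAULT_FONT.
            BaseFont bf = BaseFont.CreateFont(BaseFont.HELVETICA, BaseFont.CP1252, BaseFont.NOT_EMBEDDED);

            // First text block, closed before anything else is added.
            cb.BeginText();
            cb.SetFontAndSize(bf, 10);
            cb.ShowTextAligned(PdfContentByte.ALIGN_LEFT, "before the column", 50, 700, 0);
            cb.EndText();

            // ColumnText is added outside of any BeginText/EndText pair.
            var ct = new ColumnText(cb);
            ct.SetSimpleColumn(new Phrase("column text", new Font(bf, 10)),
                               50, 600, 250, 680, 12, Element.ALIGN_LEFT);
            ct.Go();

            // Second text block; the font is set again here.
            cb.BeginText();
            cb.SetFontAndSize(bf, 10);
            cb.ShowTextAligned(PdfContentByte.ALIGN_LEFT, "after the column", 50, 580, 0);
            cb.EndText();

            stamper.Close();
            reader.Close();
            return output.ToArray();
        }
    }
}
```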

Related

MVCRazorToPdf (iTextSharp) using custom font

I am trying to add a custom font to my pdf output using the nuget package MVCRazorToPdf but I am having trouble with how to do this as the documentation for iTextSharp isn't great and all seems to be outdated.
The current code I have for creating the pdf is:
return new PdfActionResult(
"test.cshtml",
new TestModel(),
(writer, document) =>
{
FontFactory.Register(HostingEnvironment.MapPath("~/content/fonts/vegur-regular-webfont.ttf"), "VegurRegular");
});
Here writer is a PdfWriter and document is a Document.
All the examples of using the FontFactory show that you need to use the XmlWorker, but I don't have access to that, so I was wondering if there is any way to change the document's font using the writer or document.
I've seen that there is the document.HtmlStyleClass property, but I can't find anything about how to use it anywhere.
Any help with this would be greatly appreciated.

MVCRazorToPdf is a very, very simple wrapper around iTextSharp's XMLWorker and uses the even simpler XMLWorkerHelper with all defaults to do its work. If you look at the source you'll see this:
document.Open();
using (var reader = new StringReader(RenderRazorView(context, viewName)))
{
XMLWorkerHelper.GetInstance().ParseXHtml(writer, document, reader);
document.Close();
output = workStream.ToArray();
}
If you're dead-set on using the NuGet version then you're stuck with this implementation and you're not going to be able to register a custom font.
However, there's an open issue regarding this that includes a fix so if you're willing to compile from source you can apply that change and you should be all set.
If you want to go one step further, I'd recommend reading this great post that shows how simple parsing HTML with iTextSharp is, as well as Bruno's post here that shows how to register fonts.
EDIT
As per the post behind the "includes a fix" link (repeated here in case the link breaks in future), change the above using statement to:
using (var reader = new MemoryStream(Encoding.UTF8.GetBytes(RenderRazorView(context, viewName))))
{
XMLWorkerHelper.GetInstance().ParseXHtml(writer, document, reader, null, FontFactory.FontImp as IFontProvider);
document.Close();
output = workStream.ToArray();
}
And then the font factory as registered in the question above will work when using style="font-family:VegurRegular;"

Append DataVisualization Chart Control to text file

I'm developing a system that takes data input from textboxes, and on a button click, saves these values to the respective listbox ready to be written to a text file once the process is complete.
The next stage has been using this data to create graphs, which has gone successfully but I'm now looking for a way to add these onto the end of my text file so it's all included in one place.
I originally tried it like this (included in the total 'saveToFile' function):
consoleFile.WriteLine(chartBP.Text); //chart title
chartBP.SaveImage((fileName), System.Drawing.Imaging.ImageFormat.Jpeg);
consoleFile.WriteLine("\n\n");
This appeared to work ok but threw a run-time error stating that the file could not be accessed because it was being used by another process.
I don't think I'm far off where I need to be, but I don't have enough experience with charts to know what to try next.
Does anyone have any idea how to make this work, or another method that wouldn't produce the error?
Any help is greatly appreciated!

If your problem is just to get rid of the exception, then you could do this:
consoleFile.WriteLine(chartBP.Text);
consoleFile.Close();
FileStream imgFile = File.Open(fileName, FileMode.Append);
chartBP.SaveImage(imgFile, ChartImageFormat.Jpeg);
imgFile.Close();
consoleFile = new StreamWriter(fileName, true);
consoleFile.WriteLine("\n\n");
consoleFile.Close();
But remember, you're not going to see any images in your text file, just a messy stream of characters corresponding to the binary image, displayed as text.
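The same close-then-reopen idea can be written with using blocks, so that each handle is released before the next one opens the file. This is a minimal sketch that uses only System.IO; the class AppendSketch, the method WriteReport and the imageBytes parameter are hypothetical stand-ins (the byte array takes the place of what chartBP.SaveImage would write to the stream).

```csharp
using System;
using System.IO;

public static class AppendSketch
{
    // Hypothetical helper: text first, then raw bytes, then text again,
    // with only one open handle on the file at any time.
    public static void WriteReport(string path, string title, byte[] imageBytes)
    {
        // 1. Write the text part; the handle is released at the end of the block.
        using (var writer = new StreamWriter(path, false))
        {
            writer.WriteLine(title);
        }

        // 2. Reopen in append mode and add the binary data.
        //    With a chart, chartBP.SaveImage(stream, ChartImageFormat.Jpeg) would go here.
        using (var stream = new FileStream(path, FileMode.Append, FileAccess.Write))
        {
            stream.Write(imageBytes, 0, imageBytes.Length);
        }

        // 3. Reopen once more as text to append the trailing line.
        using (var writer = new StreamWriter(path, true))
        {
            writer.WriteLine();
        }
    }
}
```

The exception in the question comes from two handles being open on the same file at once; here every handle is disposed before the next one is created.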

Set BaseUrl of an existing Pdf Document

We're having trouble setting a BaseUrl using iTextSharp. We used Adobe's implementation for this in the past, but we ran into severe performance issues. So we switched to iTextSharp, which is approximately 10 times faster.
Adobe enabled us to set a base URL for each document. We really need this in order to deploy our documents on different servers. But we can't seem to find the right code to do this.
This code is what we used with Adobe:
public bool SetBaseUrl(object jso, string baseUrl)
{
try
{
object result = jso.GetType().InvokeMember("baseURL", BindingFlags.SetProperty, null, jso, new Object[] {baseUrl });
return result != null;
}
catch
{
return false;
}
}
A lot of solutions describe how you can insert links in new or empty documents. But our documents already exist and contain more than just text. We want to overlay specific words with a link that leads to one or more other documents. Therefore, it's really important to us that we can insert a link without accessing the text itself. Maybe we could lay a box on top of these words and set its position (since we know where the words are located in the document).
We have tried different implementations using the setAction method, but it doesn't seem to work properly. In most cases the result was that we saw our box, but there was no link inside or associated with it (the cursor didn't change and nothing happened when I clicked inside the box).
Any help is appreciated.

I've made you a couple of examples.
First, let's take a look at BaseURL1. In your comment, you referred to JavaScript, so I created a document to which I added a snippet of document-level JavaScript:
writer.addJavaScript("this.baseURL = \"http://itextpdf.com/\";");
This works perfectly in Adobe Acrobat, but when you try this in Adobe Reader, you get the following error:
NotAllowedError: Security settings prevent access to this property or
method. Doc.baseURL:1:Document-Level:0000000000000000
This is consistent with the JavaScript reference for Acrobat where it is clearly indicated that special permissions are needed to change the base URL.
So instead of following your suggested path, I consulted ISO-32000-1 (which was what I asked you to do, but... I've beaten you in speed).
I discovered that you can add a URI dictionary to the catalog with a Base entry. So I wrote a second example, BaseURL2, where I add this dictionary to the root dictionary of the PDF:
PdfDictionary uri = new PdfDictionary(PdfName.URI);
uri.put(new PdfName("Base"), new PdfString("http://itextpdf.com/"));
writer.getExtraCatalog().put(PdfName.URI, uri);
Now the BaseURL works in both Acrobat and Reader.
Assuming that you want to add a BaseURL to existing documents, I wrote BaseURL3. In this example, we add the same dictionary to the root dictionary of an existing PDF:
PdfReader reader = new PdfReader(src);
PdfDictionary uri = new PdfDictionary(PdfName.URI);
uri.put(new PdfName("Base"), new PdfString("http://itextpdf.com/"));
reader.getCatalog().put(PdfName.URI, uri);
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
stamper.close();
Using this code, you can change a link that points to "index.php" (base_url.pdf) into a link that points to "http://itextpdf.com/index.php" (base_url_3.pdf).
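Since the question is about iTextSharp, the BaseURL3 snippet might translate to C# roughly as follows. This is a sketch, assuming iTextSharp 5.x, where the Java getters and put calls become the Catalog property and Put; the class BaseUrlSketch and the method SetBaseUrl are names made up for illustration.

```csharp
using System.IO;
using iTextSharp.text.pdf;

public static class BaseUrlSketch
{
    // Hypothetical helper: adds a URI dictionary with a Base entry
    // to the catalog of an existing PDF and saves the result to dest.
    public static void SetBaseUrl(string src, string dest, string baseUrl)
    {
        var reader = new PdfReader(src);

        // Same dictionary as in the Java example above.
        var uri = new PdfDictionary(PdfName.URI);
        uri.Put(new PdfName("Base"), new PdfString(baseUrl));
        reader.Catalog.Put(PdfName.URI, uri);

        using (var fs = new FileStream(dest, FileMode.Create, FileAccess.Write))
        {
            // The stamper writes the modified document when it is closed.
            var stamper = new PdfStamper(reader, fs);
            stamper.Close();
        }
        reader.Close();
    }
}
```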
Now you can replace your Adobe license with a less expensive iTextSharp license ;-)

iText GetTextFromPage exception with inline image

I have the same problem as was discussed here, which was not solved. My objective is to extract the text from an existing pdf file. I get the error message Could not find image data or EI for a certain pdf, which I cannot share as a sample. It works for other pdfs, with the following code
string fileURI = "C:\\Test\\Sample.pdf";
PdfReader reader = new PdfReader(fileURI);
ITextExtractionStrategy strategy = new LocationTextExtractionStrategy();
string s = PdfTextExtractor.GetTextFromPage(reader, 1, strategy);
Debug.WriteLine(s);
I am using iTextSharp 5.5.0 and tried changing found == 1 to found <= 1 as suggested in other posts. It does not help.
Would it help to remove all images in the pdf? I really just need the text. Which commands from iText could help me with this?

I downloaded the trial version of Acrobat to create a version of the PDF file that I could share. After opening the file and saving it again as "Optimized PDF" in Acrobat, the code worked and I could extract the text.
So the solution to the problem is probably to open each file in Acrobat, save it again with the right settings using the Acrobat reference, and then extract the text.

Can I fill in an encrypted PDF with iTextSharp?

I have a fillable, saveable PDF file that has an owner password (that I don't have access to). I can fill it out in Adobe reader, export the FDF file, modify the FDF file, and then import it.
Then I tried to do it with iText for .NET. I can't create a PdfStamper from my PdfReader because I didn't provide the owner password to the reader. Is there any way to do this programmatically or must I recreate the document?
Even using FdfReader requires a PdfStamper. Am I missing anything? Anything legal that is - I'm pretty sure I could hack the document, but I can't. Ironically, recreating it would probably be ok.

This line will bypass edit password checking in iTextSharp:
PdfReader.unethicalreading = true;

[I found this question several months after it was posted and I'm posting this solution now for anyone who comes across this question in a search.]
I was in the exact same situation: my customer had a PDF with fillable fields that I needed to access programmatically. Unfortunately the PDF was password protected and they didn't have the password, so I found I couldn't work with their file.
What I discovered was that iTextSharp version 4.0.4 (and later) enforces password restrictions; earlier versions did not.
So I downloaded version 4.0.3 and sure enough it worked. In my case I didn't even have to change my code to use this older version.
You can download 4.0.3 (and all other versions) at SourceForge.

Two important things
Set PdfReader.unethicalreading = true to prevent BadPasswordException.
Set append mode in PdfStamper's constructor; otherwise the Adobe Reader Extensions signature is broken and Adobe Reader will display the following message: "This document contained certain rights to enable special features in Adobe Reader. The document has been changed since it was created and these rights are no longer valid. Please contact the author for the original version of this document."
So all you need to do is this:
PdfReader.unethicalreading = true;
using (var pdfReader = new PdfReader("form.pdf"))
{
using (var outputStream = new FileStream("filled.pdf", FileMode.Create, FileAccess.Write))
{
using (var stamper = new iTextSharp.text.pdf.PdfStamper(pdfReader, outputStream, '\0', true))
{
stamper.AcroFields.Xfa.FillXfaForm("data.xml");
}
}
}
See How to fill XFA form using iText?

Unless someone else chimes in, I'll assume the answer is "No".
I wound up regenerating the PDF in an unencrypted form.
