I'm using iTextSharp to merge a number of pdf files together into a single file.
I'm using method described in iTextSharp official tutorials, specifically here, which merges files page by page via PdfWriter and PdfImportedPage.
Turns out some of the files I need to merge are filled out PDF Forms and using this method of merging form data is lost.
I've see several examples of using PdfStamper to fill out forms and flatten them.
What I can't find, is a way to flatten already filled out PDF Form and hopefully merge it with the other files without saving it flattened out version first.
Thanks
Just setting .FormFlattening on PdfStamper wasn't quite enough...I ended up using a PdfReader with byte array of file contents that i used to stamp/flatten the data to get the byte array of that to put in a new PdfReader. Below is how i did it. works great now.
private void AppendPdfFile(FileDTO file, PdfContentByte cb, iTextSharp.text.Document printDocument, PdfWriter iwriter)
{
var reader = new PdfReader(file.FileContents);
if (reader.AcroForm != null)
reader = new PdfReader(FlattenPdfFormToBytes(reader,file.FileID));
AppendFilePages(reader, printDocument, iwriter, cb);
}
private byte[] FlattenPdfFormToBytes(PdfReader reader, Guid fileID)
{
var memStream = new MemoryStream();
var stamper = new PdfStamper(reader, memStream) {FormFlattening = true};
stamper.Close();
return memStream.ToArray();
}
When creating the files to be merged, I changed this setting:
pdfStamper.FormFlattening = true;
Works Great.
I think this problem is same with this one: AcroForm values missing after flattening
Based on the answer, this should do the trick:
pdfStamper.FormFlattening = true;
pdfStamper.AcroFields.GenerateAppearances = true;
This is the same answer as the accepted one but without any unused variables:
private byte[] GetUnEditablePdf(byte[] fileContents)
{
byte[] newFileContents = null;
var reader = new PdfReader(fileContents);
if (reader.AcroForm != null)
newFileContents = FlattenPdfFormToBytes(reader);
else newFileContents = fileContents;
return newFileContents;
}
private byte[] FlattenPdfFormToBytes(PdfReader reader)
{
var memStream = new MemoryStream();
var stamper = new PdfStamper(reader, memStream) { FormFlattening = true };
stamper.Close();
return memStream.ToArray();
}
Related
I am using iTextSharp to fill a few fields on a pdf. I need to be able to combine a series of these pdfs into one single batch pdf file. Below I am looping through a SQL result set, filling the fields of the pdf with values corresponding to the current record, storing that as a byte array, and consolidating all of those into a list of byte arrays. I am then attempting to merge each byte array in that list into a single byte array, and serve that as a pdf to the user.
It seems to work, generating a single file containing presumably as many individual pages as were in my result set, but with all the fields blank on each page. It works as expected when using FillForm() to serve a single pdf. What am I doing wrong?
byte[] pdfByteArray = new byte[0];
List<byte[]> pdfByteArrayList = new List<byte[]>();
byte[] pdfByteArrayItem = new byte[0];
foreach (DataRow row in results.Rows)
{
certNum = row[1].ToString();
certName = row[2].ToString();
certDate = row[3].ToString();
pdfByteArrayItem = FillForm(certType, certName, certNum, certDate);
pdfByteArrayList.Add(pdfByteArrayItem);
}
using (var ms = new MemoryStream()) {
using (var doc = new Document()) {
using (var copy = new PdfSmartCopy(doc, ms)) {
doc.Open();
//Loop through each byte array
foreach (var p in pdfByteArrayList) {
//Create a PdfReader bound to that byte array
using (var reader = new PdfReader(p)) {
//Add the entire document instead of page-by-page
copy.AddDocument(reader);
}
}
doc.Close();
}
}
pdfByteArray = ms.ToArray();
context.Response.ContentType = "application/pdf";
context.Response.BinaryWrite(pdfByteArray);
context.Response.Flush();
context.Response.End();
private byte[] FillForm(string certType, string certName, string certNum, string certDate)
{
string pdfTemplate = string.Format(#"\\filePath\{0}.pdf", certType);
PdfReader pdfReader = new PdfReader(pdfTemplate);
MemoryStream stream = new MemoryStream();
PdfStamper pdfStamper = new PdfStamper(pdfReader, stream);
AcroFields pdfFormFields = pdfStamper.AcroFields;
// set form pdfFormFields
pdfFormFields.SetField("CertName", certName);
pdfFormFields.SetField("CertNum", certNum);
pdfFormFields.SetField("CertDate", certDate);
// flatten the form to remove editting options, set it to false
// to leave the form open to subsequent manual edits
pdfStamper.FormFlattening = false;
// close the pdf
pdfStamper.Close();
stream.Flush();
stream.Close();
byte[] pdfByte = stream.ToArray();
return pdfByte;
}
Adding the line below after setting the value for the fields seems to have fixed it:
pdfFormFields.GenerateAppearances = true;
I working with PDF annotations using ITextSharp. I was able to add annotations pretty smoothly.
But now I'm trying to edit them. It looks like my PdfReader object is actually updated. But for some reason I can't save it. As shown in the snippet below, I try to get the byte array from using a stamper. The byte array is only 1 byte longer than the previous version no matter how long is the annotation. And when I open the PDF saved on the file system, I still have the old annotation...
private void UpdatePDFAnnotation(string title, string body)
{
byte[] newBuffer;
using (PdfReader pdfReader = new PdfReader(dataBuffer))
{
int pageIndex = 1;
int annotIndex = 0;
PdfDictionary pageDict = pdfReader.GetPageN(pageIndex);
var annots = pageDict.GetAsArray(PdfName.ANNOTS);
if (annots != null)
{
PdfDictionary annot = annots.GetAsDict(annotIndex);
annot.Put(PdfName.T, new PdfString(title));
annot.Put(PdfName.CONTENTS, new PdfString(body));
}
// ********************************
// this line shows the new annotation is in here. Just have to save it somehow !!
var updatedBody = pdfReader.GetPageN(pageIndex).GetAsArray(PdfName.ANNOTS).GetAsDict(0).GetAsString(PdfName.CONTENTS);
Debug.Assert(newBody == updatedBody.ToString(), "Annotation body should be equal");
using (MemoryStream outStream = new MemoryStream())
{
using (PdfStamper stamp = new PdfStamper(pdfReader, outStream, '\0', true))
{
newBuffer = outStream.ToArray();
}
}
}
File.WriteAllBytes( #"Assets\Documents\AnnotedPdf.pdf", newBuffer);
}
Any idea what's wrong with my code?
PdfStamper does much of the writing at the time it is being closed. This implicitly happens at the end of its using block. But you retrieve the MemoryStream contents already in that block. Thus, the PDF is not yet written to the retrieved byte[].
Instead either explicitly close the PdfStamper instance before retrieving the byte[]:
using (PdfStamper stamp = new PdfStamper(pdfReader, outStream, '\0', true))
{
stamp.Close();
newBuffer = outStream.ToArray();
}
or retrieve the byte[] after that using block:
using (PdfStamper stamp = new PdfStamper(pdfReader, outStream, '\0', true))
{
}
newBuffer = outStream.ToArray();
Allright, I finally got it to work. The trick was the two last parameter in the PdfStamper instantiation. I tried it before with only 2 parameters and ended up with a corrupted file. Then I tried again and now it works... here's the snippet
private void UpdatePDFAnnotation(string title, string body)
{
using (PdfReader pdfReader = new PdfReader(dataBuffer))
{
PdfDictionary pageDict = pdfReader.GetPageN(pageIndex);
var annots = pageDict.GetAsArray(PdfName.ANNOTS);
PdfDictionary annot = annots.GetAsDict(annotIndex);
annot.Put(PdfName.T, new PdfString(title));
annot.Put(PdfName.CONTENTS, new PdfString(body));
using (MemoryStream ms = new MemoryStream())
{
PdfStamper stamp = new PdfStamper(pdfReader, ms);
stamp.Dispose();
dataBuffer = ms.ToArray();
}
}
}
I have tried this now and its not working. form.GenerateAppearances = true; I merge my 2 documents and then save it. Then I open it again to populate all the fields. It says all the Acrofields keys are gone but when I open it in Nitro pro its there. Why can't I see them in code? Do I have to add something before I save?
private static void CombineAndSavePdf1(string savePath, List<string> lstPdfFiles)
{
using (Stream outputPdfStream = new FileStream(savePath, FileMode.Create, FileAccess.Write, FileShare.None))
{
Document document = new Document();
PdfSmartCopy copy = new PdfSmartCopy(document, outputPdfStream);
document.Open();
PdfReader reader;
int totalPageCnt;
PdfStamper stamper;
string[] fieldNames;
foreach (string file in lstPdfFiles)
{
reader = new PdfReader(file);
totalPageCnt = reader.NumberOfPages;
for (int pageCnt = 0; pageCnt < totalPageCnt; )
{
//have to create new reader for each page or PdfStamper will throw error
reader = new PdfReader(file);
stamper = new PdfStamper(reader, outputPdfStream);
fieldNames = new string[stamper.AcroFields.Fields.Keys.Count];
stamper.AcroFields.Fields.Keys.CopyTo(fieldNames, 0);
foreach (string name in fieldNames)
{
stamper.AcroFields.RenameField(name, name);
}
copy.AddPage(copy.GetImportedPage(reader, ++pageCnt));
}
copy.FreeReader(reader);
}
}
}
You are merging the documents the wrong way. See MergeForms to find out how to do it correctly. The key line that is missing in your code, is:
copy.setMergeFields();
Without it, the fields disappear (as you have noticed).
There's also a MergeForms2 example that explains how to merge two identical forms. In this case, you need to rename the fields, because each field needs to have a unique name. I'm adding a reference to this second example, because I see that you also try renaming the fields. There is, however, a serious flaw in your code: you create a stamper object, but you never do stamper.close(). Your use of the reader object is also problematic. All in all, it would be best to throw away your code, and to start anew using the two examples from the official iText web site.
Update: I've added the tags itext and itextsharp to your question. Only then I noticed that you're using iTextSharp instead of iText. Porting the Java code to C# should be easy for a C# developer, but I've never written a C# program, so please use the JAVA examples as if you were using pseudo-code. The code in C# won't be that different.
public void manipulatePdf(String src, String dest) throws IOException, DocumentException {
Document document = new Document();
PdfCopy copy = new PdfSmartCopy(document, new FileOutputStream(dest));
copy.setMergeFields();
document.open();
List<PdfReader> readers = new ArrayList<PdfReader>();
for (int i = 0; i < 3; ) {
PdfReader reader = new PdfReader(renameFields(src, ++i));
readers.add(reader);
copy.addDocument(reader);
}
document.close();
for (PdfReader reader : readers) {
reader.close();
}
}
public byte[] renameFields(String src, int i) throws IOException, DocumentException {
ByteArrayOutputStream baos = new ByteArrayOutputStream();
PdfReader reader = new PdfReader(src);
PdfStamper stamper = new PdfStamper(reader, baos);
AcroFields form = stamper.getAcroFields();
Set<String> keys = new HashSet<String>(form.getFields().keySet());
for (String key : keys) {
form.renameField(key, String.format("%s_%d", key, i));
}
stamper.close();
reader.close();
return baos.toByteArray();
}
I'm using itext sharp to fill my form fields on my template with values.
I created the template using pdfescape.com
Here is my code that I use to place the values in the pdf template.
private static byte[] GeneratePdf(Dictionary<String, String> formKeys, String pdfPath)
{
var templatePath = System.Web.HttpContext.Current.Server.MapPath(pdfPath);
var reader = new PdfReader(templatePath);
var outStream = new MemoryStream();
var stamper = new PdfStamper(reader, outStream);
var form = stamper.AcroFields;
var fieldKeys = form.Fields.Keys;
// "Flatten" the form so it wont be editable/usable anymore
// stamper.FormFlattening = true;
foreach (KeyValuePair<String, String> pair in formKeys)
{
if (fieldKeys.Any(f => f == pair.Key))
{
form.SetField(pair.Key, pair.Value);
form.SetFieldProperty(pair.Key, "setfflags", PdfFormField.FF_READ_ONLY, null);
}
}
stamper.Close();
reader.Close();
return outStream.ToArray();
}
I first used the stamper.FormFlattening = true, but then the values weren't visible. So Instead of using the form flattening I just set the values as ready only and everything works fine.
Now I want to merge multiple of these pdf files using the the pdf merger by smart-soft
Once the merging is complete the values aren't visible. When I highlight over the form it highlights all the text, but I can't read it.I did research on this and read that the fields need to be flattened.
Here is an image of how it looks on the pdf when I highlight everything:
I don't know why my fields aren't visible when they are flattened, even if I don't use the merger. Is there something wrong with the code or the template? Alternatives will also be appreciated.
Btw my project is an asp-mvc project if that is relevant.
EDIT
I added the following code so that I first read the template, write the values to the form fields, close it, reopen it, flatten and then close it again as suggested by one of the comments. I just pass the result I get from the GeneratePdf function to this function:
private static byte[] flattenPdf(byte[] pdf)
{
var reader = new PdfReader(pdf);
var outStream = new MemoryStream();
var stamper = new PdfStamper(reader, outStream);
stamper.FormFlattening = true;
stamper.Close();
reader.Close();
return outStream.ToArray();
}
I still get the same result
I found a solution to this problem thanks to this answer by rhens
All I had to do is modify my GeneratePdf function by adding one line:
form.GenerateAppearances = true;
Here is the end result:
private static byte[] GeneratePdf(Dictionary<String, String> formKeys, String pdfPath)
{
var templatePath = System.Web.HttpContext.Current.Server.MapPath(pdfPath);
var reader = new PdfReader(templatePath);
var outStream = new MemoryStream();
var stamper = new PdfStamper(reader, outStream);
var form = stamper.AcroFields;
form.GenerateAppearances = true; //Added this line, fixed my problem
var fieldKeys = form.Fields.Keys;
foreach (KeyValuePair<String, String> pair in formKeys)
{
if (fieldKeys.Any(f => f == pair.Key))
{
form.SetField(pair.Key, pair.Value);
}
}
stamper.Close();
reader.Close();
return flattenPdf(outStream.ToArray());
}
and the flattenPdf stays the same as in my question.
I am using itextsharp to populate my PDFs. I have no issues with this. Basically what I am doing is getting the PDF and populating the fields in memory then passing back the MemoryStream to be displayed on a webpage. All this is working with a single document PDF.
What I am trying to figure out now, is merging multiple PDFs into one MemoryStream. The part I cant figure out is, the documents I am populating are identical. So for example, I have a List<Person> that contains 5 persons. I want to fill out a PDF for each person and merge them all into one, in memory. Bare in mind I am going to fill out the same type of document for each person.
The problem I am getting is that when I try to add a second copy of the same PDF to be filled out for the second iteration, it just overwrites the first populated PDF, since it's the same document, therefore not adding a second copy for the second Person at all.
So basically if I had the 5 people, I would end up with a single page with the data of the 5th person, instead of a PDF with 5 like pages that contain the data of each person respectively.
Here's some code...
MemoryStream ms = ms = new MemoryStream();
PdfReader docReader = null;
PdfStamper Stamper = null;
List<Person> persons = new List<Person>() {
new Person("Larry", "David"),
new Person("Dustin", "Byfuglien"),
new Person("Patrick", "Kane"),
new Person("Johnathan", "Toews"),
new Person("Marian", "Hossa")
};
try
{
// Iterate thru all persons and populate a PDF for each
foreach(var person in persons){
PdfCopyFields Copier = new PdfCopyFields(ms);
Copier.AddDocument(GetReader("Person.pdf"));
Copier.Close();
docReader = new PdfReader(ms.ToArray());
Stamper = new PdfStamper(docReader, ms);
AcroFields Fields = Stamper.AcroFields;
Fields.SetField("FirstName", person.FirstName);
}
}catch(Exception e){
// handle error
}finally{
if (Stamper != null)
{
Stamper.Close();
}
if (docReader != null)
{
docReader.Close();
}
}
I have created a working solution, I hope this helps someone along the way.
Create a PopulatePDF() method that takes the Person object and returns a byte[]:
private byte[] PopulatePersonPDF(Person obj)
{
MemoryStream ms = new MemoryStream();
PdfStamper Stamper = null;
try
{
PdfCopyFields Copier = new PdfCopyFields(ms);
Copier.AddDocument(GetReader("Person.pdf"));
Copier.Close();
PdfReader docReader = new PdfReader(ms.ToArray());
ms = new MemoryStream();
Stamper = new PdfStamper(docReader, ms);
AcroFields Fields = Stamper.AcroFields;
Fields.SetField("FirstName", obj.FirstName);
}
finally
{
if (Stamper != null)
{
Stamper.Close();
}
}
return ms.ToArray();
}
Create a MergePDFs() method that returns the MemoryStream:
private MemoryStream MergePDFs(List<byte[]> pdfs)
{
MemoryStream ms = new MemoryStream();
PdfCopyFields Copier = new PdfCopyFields(ms);
foreach (var pdf in pdfs)
Copier.AddDocument(new PdfReader(pdf));
Copier.Close();
return ms;
}
Example Implementation:
List<Person> persons = new List<Person>() {
new Person("Larry", "David"),
new Person("Dustin", "Byfuglien"),
new Person("Patrick", "Kane"),
new Person("Johnathan", "Toews"),
new Person("Marian", "Hossa")
};
List<byte[]> pdfs = new List<byte[]>();
foreach(var person in persons)
pdfs.Add(PopulatePersonPDF(person));
MemoryStream ms = MergePDFs(pdfs);
Check the PdfStamper constructor signature there is an overload that takes a boolean value that tells it to append to the current document.
here might be another answer to your solution: Batch Pdf generation