I'm trying to add a named destination to an existing PDF using iTextSharp, but the example code I've found online has not worked (the original example is in javascript). Am I doing something wrong? Below is the code I'm using, New Text Document.pdf is a single page PDF created from a txt file.
var sourcePdfPath = #"C:\Temp\New Text Document.pdf";
var destPdfPath = #"C:\Temp\New Text Document_bettered.pdf";
using (var reader = new PdfReader(sourcePdfPath))
{
using (var stamper = new PdfStamper(reader, new FileStream(destPdfPath, FileMode.Create)))
{
var destination = new PdfDestination(PdfDestination.FIT);
var writer = stamper.Writer;
writer.AddNamedDestination("Destination.Name", 1, destination);
stamper.Close();
}
reader.Close();
}
Related
I'm saving the contents of my PDF file (pdfAsString) to the database.
File is of type IFormFile (file uploaded by the user).
string pdfAsString;
using (var reader = new StreamReader(indexModel.UploadModel.File.OpenReadStream()))
{
pdfAsString = await reader.ReadToEndAsync();
// pdfAsString = ; // encoding function or lack thereof
}
Later I'm trying to fetch and use these contents to initialize a new instance of MemoryStream, then using that to create a PdfReader and then using that to create a PdfDocument, but at this point I get the 'Trailer not found' exception. I have verified that the Trailer part of the PDF is present inside the contents of the string that I use to create the MemoryStream. I have also made sure the position is set to the beginning of the file.
The issue seems related to the format of the PDF contents fetched from the database. iText7 doesn't seem able to navigate through it other than the beginning of the file.
I'm expecting to be able to create an instance of PdfDocument with the contents of the PDF saved to my database.
Note 1: Using the Stream created from OpenReadStream() works when trying to create a PdfReader and then PdfDocument, but I don't have access to that IFormFile when reading from the DB, so this doesn't help me in my use case.
Note 2: If I use the PDF from my device by giving a path, it works correctly, same for using a FileStream created from a path. However, this doesn't help my use case.
So far, I've tried saving it raw and then using that right out of the gate (1) or encoding special symbols like \n \t to ASCII hexadecimal notation (2). I've also tried HttpUtility.UrlEncode on save and UrlDecode after getting the database record (3), and also tried ToBase64String on save and FromBase64String on get (4).
// var pdfContent = databaseString; // 1
// var pdfContent = databaseString.EncodeSpecialCharacters(); // encode special symbols // 2
// var pdfContent = HttpUtility.UrlDecode(databaseString); // decode urlencoded string // 3
var pdfContent = Convert.FromBase64String(databaseString); // decode base64 // 4
using (var stream = new MemoryStream(pdfContent))
{
PdfReader pdfReader = new PdfReader(stream).SetUnethicalReading(true);
PdfWriter pdfWriter = new PdfWriter("new-file.pdf");
PdfDocument pdf = new PdfDocument(pdfReader, pdfWriter); // exception here :(
// some business logic...
}
Any help would be appreciated.
EDIT: on a separate project, I'm trying to run this code:
using (var stream = File.OpenRead("C:\\<path>\\<filename>.pdf"))
{
var formFile = new FormFile(stream, 0, stream.Length, null, "<filename>.pdf");
var reader = new StreamReader(formFile.OpenReadStream());
var pdfAsString = reader.ReadToEnd();
var pdfAsBytes = Encoding.UTF8.GetBytes(pdfAsString);
using (var newStream = new MemoryStream(pdfAsBytes))
{
newStream.Seek(0, SeekOrigin.Begin);
var pdfReader = new PdfReader(newStream).SetUnethicalReading(true);
var pdfWriter = new PdfWriter("Test-PDF-1.pdf");
PdfDocument pdf = new PdfDocument(pdfReader, pdfWriter);
PdfAcroForm form = PdfAcroForm.GetAcroForm(pdf, true);
IDictionary<string, PdfFormField> fields = form.GetFormFields();
foreach (var field in fields)
{
field.Value.SetValue(field.Key);
}
//form.FlattenFields();
pdf.Close();
}
}
and if I replace "newStream" inside of PdfReader with formFile.OpenReadStream() it works fine, otherwise I get the 'Trailer not found' exception.
Answer: use BinaryReader and ReadBytes instead of StreamReader when initially trying to read the data. Example below:
using (var stream = File.OpenRead("C:\\<filepath>\\<filename>.pdf"))
{
// FormFile - my starting point inside of the web application
var formFile = new FormFile(stream, 0, stream.Length, null, "<filename>.pdf");
var reader = new BinaryReader(formFile.OpenReadStream());
var pdfAsBytes = reader.ReadBytes((int)formFile.Length); // store this in the database
using (var newStream = new MemoryStream(pdfAsBytes))
{
newStream.Seek(0, SeekOrigin.Begin);
var pdfReader = new PdfReader(newStream).SetUnethicalReading(true);
var pdfWriter = new PdfWriter("Test-PDF-1.pdf");
PdfDocument pdf = new PdfDocument(pdfReader, pdfWriter);
PdfAcroForm form = PdfAcroForm.GetAcroForm(pdf, true);
IDictionary<string, PdfFormField> fields = form.GetFormFields();
foreach (var field in fields)
{
field.Value.SetValue(field.Key);
}
//form.FlattenFields();
pdf.Close();
}
}
I have two fillable pdf files and did the code to merge those pdfs into one single pdf. Below is my code for that.
public void PDFSplit()
{
List<string> files=new List<string>();
files.Add(Server.MapPath("~/Template/sample_pdf.pdf"));
files.Add(Server.MapPath("~/Template/temp/sample_pdf.pdf"));
//call method
Merge(files, Server.MapPath("~/Template/sample_pdf_123.pdf"));
}
//Merge pdf
public void Merge(List<String> InFiles, String OutFile)
{
using (FileStream stream = new FileStream(OutFile, FileMode.Create))
using (iTextSharp.text.Document doc = new iTextSharp.text.Document())
using (PdfCopy pdf = new PdfCopy(doc, stream))
{
doc.Open();
PdfReader reader = null;
PdfImportedPage page = null;
InFiles.ForEach(file =>
{
reader = new PdfReader(file);
for (int i = 0; i < reader.NumberOfPages; i++)
{
page = pdf.GetImportedPage(reader, i + 1);
pdf.AddPage(page);
}
pdf.FreeReader(reader);
reader.Close();
});
}
}
The code is working fine, but the problem is when I am trying to read that new generated merged file, it's not showing fields using AcroFields.
//To read pdf data
PdfReader reader = null;
reader = new PdfReader(Server.MapPath("~/Template/sample_pdf_123.pdf"));
AcroFields pdfFormFields = reader.AcroFields;
You are unable to marge fallible PDF files because you are using an old version of iText. Please upgrade to iText 7 for .NET and read the iText 7 jump-start tutorial, more specifically chapter 6 where it says:
Merging forms
This is how it's done:
PdfDocument destPdfDocument = new PdfDocument(new PdfWriter(dest));
PdfDocument[] sources = new PdfDocument[] {
new PdfDocument(new PdfReader(SRC1)),
new PdfDocument(new PdfReader(SRC2)) };
PdfPageFormCopier formCopier = new PdfPageFormCopier();
foreach (PdfDocument sourcePdfDocument in sources) {
sourcePdfDocument.CopyPagesTo(1,
sourcePdfDocument.GetNumberOfPages(), destPdfDocument, formCopier);
sourcePdfDocument.Close();
}
destPdfDocument.Close();
I'm using iTextSharp 5.x. I'm trying to merge two pdfs and preserve the isTagged flag. When I remove copy.SetTagged(); the result pdf contains both pdfs which is great. When adding the copy.SetTagged() is get an exception
Exception -->System.ObjectDisposedException: Cannot access a closed file.
at System.IO.__Error.FileNotOpen()
at System.IO.FileStream.get_Position()
Here is the code
List<string> filesToMerge = new List<string> { "C:/dev/dcs/wp-cla-dcs/Hex/Docs/metadata/coverPage.pdf", "C:/dev/dcs/wp-cla-dcs/Hex/Docs/metadata/49W7a.pdf" };
string outputFileName = "C:/dev/dcs/wp-cla-dcs/Hex/Docs/metadata/results.pdf";
using (FileStream outFS = new FileStream(outputFileName, FileMode.Create))
using (Document document = new Document())
// using (PdfCopy copy = new PdfCopy(document, outFS))
using (PdfCopy copy = new PdfSmartCopy(document, outFS))
{
{
copy.SetTagged();
// Set up the iTextSharp document
document.Open();
foreach (string pdfFile in filesToMerge)
{
using (var reader = new PdfReader(pdfFile))
{
copy.AddDocument(reader);
copy.FreeReader(reader);
}
}
}
}
despite #bruno-lowagie's comment, I have had better results doing this with with iText5.
Uisng iText7, PdfMerger left several contents untagged (all were tagged in the source document). PdfCopy in iText5 however worked just fine, only needed to manually add Xmp metadata, title, lang, etc:
public static void CombineMultiplePDFs(string[] fileNames, string outFile)
{
var lang = "en";
var title = "My new title";
// step 1: creation of a document-object
Document document = new Document();
// step 2: we create a writer that listens to the document
FileStream newFileStream = new FileStream(outFile, FileMode.Create);
PdfCopy writer = new PdfCopy(document, newFileStream);
writer.SetTagged();
writer.PdfVersion = PdfWriter.VERSION_1_7;
writer.AddViewerPreference(PdfName.DISPLAYDOCTITLE, new PdfBoolean(true));
writer.Info.Put(PdfName.TITLE, new PdfString(title));
writer.CreateXmpMetadata();
// step 3: we open the document
document.Open();
// set meta data
document.AddLanguage(lang);
document.AddTitle(title);
// keep an array of all open readers so they can be closed again.
var readers = new PdfReader[fileNames.Length];
for (var fi = 0; fi < fileNames.Length; fi++)
{
// we create a reader for a certain document
var fileName = fileNames[0];
PdfReader reader = new PdfReader(fileName);
readers[fi] = reader;
reader.ConsolidateNamedDestinations();
// step 4: we add content
for (int i = 1; i <= reader.NumberOfPages; i++)
{
// IMPORTANT: the third param is is "KeepTaggedPdfStructure"
PdfImportedPage page = writer.GetImportedPage(reader, i, true);
writer.AddPage(page);
}
}
// step 5: we close the document and writer
writer.Close();
document.Close();
// close readers only after document is lcosed
foreach (var r in readers)
{
r.Close();
}
}
I have a PDF which is generated by iTextSharp in C# - it's a template PDF, which gets some additional lines of text added using stamper, then pushed to S3 and finally returned to the browser as a file stream (using mvc.net).
The newly added lines work fine when the PDF is viewed in the browser (Chrome), but when I download the PDF and open it locally (with Preview or Adobe Acrobat on Mac), only the template is showing, and the newly added lines are gone.
What could cause this?
Here's a code example: (condensed)
using(var receiptTemplateStream = GetType().Assembly.GetManifestResourceStream("XXXXX.DepositReceipts.Receipt.pdf" ))
{
var reader = new PdfReader(receiptTemplateStream);
var outputPdfStream = new MemoryStream();
var stamper = new PdfStamper(reader, outputPdfStream) { FormFlattening = true, FreeTextFlattening = true };
var _pbover = stamper.GetOverContent(1);
using (var latoLightStream = GetType().Assembly.GetManifestResourceStream("XXXXX.DepositReceipts.Fonts.Lato-Light.ttf"))
using (var latoLightMS = new MemoryStream())
{
_pbover.SetFontAndSize(latoLight, 11.0f);
var verticalPosition = 650;
_pbover.ShowTextAligned(0, account.company_name, 45, verticalPosition, 0);
verticalPosition = verticalPosition - 15;
var filename = "Receipt 0001.pdf";
stamper.SetFullCompression();
stamper.Close();
var file = outputPdfStream.ToArray();
using (var output = new MemoryStream())
{
output.Write(file, 0, file.Length);
output.Position = 0;
var response = await _s3Client.PutObjectAsync(new PutObjectRequest()
{
InputStream = output,
BucketName = "XXXX",
CannedACL = S3CannedACL.Private,
Key = filename
});
}
return filename;
}
}
This was a funky one!
I had another method in the same solution that worked without problems. It turns out that the template PDF, that I load and write my content over, was the issue.
The template I used was generated in Adobe Illustrator. I had another one which was generated in Adobe Indesign - which worked.
When I pulled this template pdf into Indesign, and then exported it again (from Indesign) it suddenly worked.
I'm not sure exactly what caused this issue, but it must be some sort of encoding.
I'm using iTextSharp to merge a number of pdf files together into a single file.
I'm using method described in iTextSharp official tutorials, specifically here, which merges files page by page via PdfWriter and PdfImportedPage.
Turns out some of the files I need to merge are filled out PDF Forms and using this method of merging form data is lost.
I've see several examples of using PdfStamper to fill out forms and flatten them.
What I can't find, is a way to flatten already filled out PDF Form and hopefully merge it with the other files without saving it flattened out version first.
Thanks
Just setting .FormFlattening on PdfStamper wasn't quite enough...I ended up using a PdfReader with byte array of file contents that i used to stamp/flatten the data to get the byte array of that to put in a new PdfReader. Below is how i did it. works great now.
private void AppendPdfFile(FileDTO file, PdfContentByte cb, iTextSharp.text.Document printDocument, PdfWriter iwriter)
{
var reader = new PdfReader(file.FileContents);
if (reader.AcroForm != null)
reader = new PdfReader(FlattenPdfFormToBytes(reader,file.FileID));
AppendFilePages(reader, printDocument, iwriter, cb);
}
private byte[] FlattenPdfFormToBytes(PdfReader reader, Guid fileID)
{
var memStream = new MemoryStream();
var stamper = new PdfStamper(reader, memStream) {FormFlattening = true};
stamper.Close();
return memStream.ToArray();
}
When creating the files to be merged, I changed this setting:
pdfStamper.FormFlattening = true;
Works Great.
I think this problem is same with this one: AcroForm values missing after flattening
Based on the answer, this should do the trick:
pdfStamper.FormFlattening = true;
pdfStamper.AcroFields.GenerateAppearances = true;
This is the same answer as the accepted one but without any unused variables:
private byte[] GetUnEditablePdf(byte[] fileContents)
{
byte[] newFileContents = null;
var reader = new PdfReader(fileContents);
if (reader.AcroForm != null)
newFileContents = FlattenPdfFormToBytes(reader);
else newFileContents = fileContents;
return newFileContents;
}
private byte[] FlattenPdfFormToBytes(PdfReader reader)
{
var memStream = new MemoryStream();
var stamper = new PdfStamper(reader, memStream) { FormFlattening = true };
stamper.Close();
return memStream.ToArray();
}