Using itextsharp to merge pdf files within a folder - c#

I'm trying to use the code below to merge the PDF files in a folder and write the result to a new file, but the generated file appears to be corrupted.
public Boolean MergeForm(String destinationFile, String sourceFolder)
{
try
{
using (MemoryStream stream = new MemoryStream())
using (Document doc = new Document())
using (PdfCopy pdf = new PdfCopy(doc, stream))
{
doc.Open();
PdfReader reader = null;
PdfImportedPage page = null;
foreach (var file in Directory.GetFiles(sourceFolder))
{
reader = new PdfReader(file);
for (int i = 0; i < reader.NumberOfPages; i++)
{
page = pdf.GetImportedPage(reader, i + 1);
pdf.AddPage(page);
}
pdf.FreeReader(reader);
reader.Close();
}
using (FileStream streamX = new FileStream(destinationFile, FileMode.Create))
{
stream.WriteTo(streamX);
}
}
return true;
}
catch (Exception)
{
return false;
}
}
Can anyone spot where the problem is? Thank you.

Can anyone spot where the problem is?
Your main problem is that you use the contents of the MemoryStream before the Document and PdfCopy have finished creating the PDF; that only happens during the Dispose at the end of the using block. As a result, you save an incomplete PDF file.
Doing it like this instead should work:
using (MemoryStream stream = new MemoryStream())
{
using (Document doc = new Document())
{
PdfCopy pdf = new PdfCopy(doc, stream);
pdf.CloseStream = false;
doc.Open();
PdfReader reader = null;
PdfImportedPage page = null;
foreach (var file in Directory.GetFiles(sourceFolder))
{
reader = new PdfReader(file);
for (int i = 0; i < reader.NumberOfPages; i++)
{
page = pdf.GetImportedPage(reader, i + 1);
pdf.AddPage(page);
}
pdf.FreeReader(reader);
reader.Close();
}
}
using (FileStream streamX = new FileStream(destinationFile, FileMode.Create))
{
stream.WriteTo(streamX);
}
}
BTW, you can also see here that I did not put PdfCopy into a using block. This is because the Document implicitly closes the PdfCopy when it is disposed. Disposing the PdfCopy first and then the Document (which tries to close the PdfCopy again) is therefore unnecessary, and it can result in exceptions thrown from within the block being hidden by other exceptions occurring in this closing circus.
Furthermore I needed to add the pdf.CloseStream = false, otherwise the memory stream would have been closed when the PdfCopy is closed.
That being said:
Of course you should also use AddDocument instead of iterating over the document pages yourself, as already explained by @Bruno.
Your memory footprint would decrease if you wrote directly to the file stream instead of the memory stream.
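The two suggestions above can be combined. What follows is a minimal sketch, not a definitive implementation: it assumes iTextSharp 5.x (where Document is disposable and PdfCopy offers AddDocument), and the names FolderMerger and MergeFolder are made up for illustration. The file list is collected before the output file is created, so the destination is not picked up as an input when it lives in the source folder.

```csharp
using System;
using System.IO;
using System.Linq;
using iTextSharp.text;
using iTextSharp.text.pdf;

public static class FolderMerger
{
    // Hypothetical helper: merges every *.pdf in sourceFolder into destinationFile.
    public static void MergeFolder(string destinationFile, string sourceFolder)
    {
        // Collect the inputs first and skip the destination itself.
        string destinationFullPath = Path.GetFullPath(destinationFile);
        string[] inputs = Directory.GetFiles(sourceFolder, "*.pdf")
            .Where(f => !string.Equals(Path.GetFullPath(f), destinationFullPath,
                                       StringComparison.OrdinalIgnoreCase))
            .OrderBy(f => f, StringComparer.OrdinalIgnoreCase)
            .ToArray();

        // Write straight to the file; no intermediate MemoryStream is needed.
        using (var output = new FileStream(destinationFile, FileMode.Create))
        using (var doc = new Document())
        {
            // Not wrapped in using: disposing the Document closes the PdfCopy.
            var copy = new PdfCopy(doc, output);
            doc.Open();
            foreach (string file in inputs)
            {
                var reader = new PdfReader(file);
                copy.AddDocument(reader); // copies all pages of this file
                reader.Close();
            }
        }
    }
}
```

Note that closing a Document to which no pages were added throws, so a caller may want to check that the folder actually contains PDF files first.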

Related

The document has no catalog object (meaning: it's an invalid PDF)

I am reading from and writing to the same PDF, and I am getting the error "The document has no catalog object (meaning: it's an invalid PDF)" on the line "PdfReader pdfReader = new PdfReader(inputPdf2);" in the code snippet below.
iTextSharp.text.pdf.PdfCopy pdfCopy = null;
Document finalPDF = new Document();
//pdfReader = null;
FileStream fileStream = null;
int pageCount = 1;
int TotalPages = 20;
try
{
fileStream = new FileStream(finalPDFFile, FileMode.OpenOrCreate, FileAccess.Write);
pdfCopy = new PdfCopy(finalPDF, fileStream);
finalPDF.Open();
foreach (string inputPdf1 in inputPDFFiles)
{
if (File.Exists(inputPdf1))
{
var bytes = File.ReadAllBytes(inputPdf1);
PdfReader pdfReader = new PdfReader(bytes);
fileStream = new FileStream(inputPdf1, FileMode.Open, FileAccess.Write);
var stamper = new PdfStamper(pdfReader, fileStream);
var acroFields = stamper.AcroFields;
stamper.AcroFields.SetField(acrofiled.Key, "Page " + 1+ " of " + 16);
stamper.FormFlattening = true;
stamper.Close();
stamper.Dispose();
fileStream.Close();
fileStream.Dispose();
pdfReader.Close();
pdfReader.Dispose();
}
}
foreach (string inputPdf2 in inputPDFFiles)
{
if (File.Exists(inputPdf2))
{
PdfReader pdfReader = new PdfReader(inputPdf2);
int pageNumbers = pdfReader.NumberOfPages;
for (int pages = 1; pages <= pageNumbers; pages++)
{
PdfImportedPage page = pdfCopy.GetImportedPage(pdfReader, pages);
PdfCopy.PageStamp pageStamp = pdfCopy.CreatePageStamp(page);
pdfCopy.AddPage(page);
}
pdfReader.Close();
pdfReader.Dispose();
}
}
pdfCopy.Close();
pdfCopy.Dispose();
finalPDF.Close();
finalPDF.Dispose();
fileStream.Close();
fileStream.Dispose();
Please help me fix this issue or suggest an alternative approach.
In your first loop you overwrite each of your files with a manipulated version like this:
var bytes = File.ReadAllBytes(inputPdf1);
PdfReader pdfReader = new PdfReader(bytes);
fileStream = new FileStream(inputPdf1, FileMode.Open, FileAccess.Write);
var stamper = new PdfStamper(pdfReader, fileStream);
[...]
Using FileMode.Open here is an error. You want to replace the existing file with a new one, and for such a use case you have to use FileMode.Create or FileMode.Truncate.
Using FileMode.Open results in the original file content remaining there and you writing into it. Thus, if your new file content is shorter than the original one (which can happen when flattening a form), your new file keeps a tail segment of the original file. In PDFs, the relevant lookup information (the cross-reference data and trailer) is at the end of the file, so upon reading this new file the PdfReader finds the lookup information of the old file, which doesn't match the new content anymore at all.
By the way, you create the PdfCopy like this:
fileStream = new FileStream(finalPDFFile, FileMode.OpenOrCreate, FileAccess.Write);
pdfCopy = new PdfCopy(finalPDF, fileStream);
This is wrong for the same reason: if there already is a PDF there, FileMode.OpenOrCreate works just like FileMode.Open, with the unwanted effects described above.
Thus, you should replace the FileMode values for streams you write to with FileMode.Create.
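The leftover-tail effect is easy to reproduce without any PDF library. The following sketch uses only System.IO; the class and method names are made up for illustration. It overwrites a long file with shorter content, once with FileMode.Open and once with FileMode.Create.

```csharp
using System;
using System.IO;

public static class FileModeDemo
{
    // Writes content to an existing file using the given mode and
    // returns what the file contains afterwards.
    public static string Overwrite(string path, string content, FileMode mode)
    {
        using (var fs = new FileStream(path, mode, FileAccess.Write))
        using (var writer = new StreamWriter(fs))
        {
            writer.Write(content);
        }
        return File.ReadAllText(path);
    }

    public static void Main()
    {
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllText(path, "ORIGINAL-LONG-CONTENT");
            // FileMode.Open keeps the old bytes beyond the new content.
            Console.WriteLine(Overwrite(path, "short", FileMode.Open));   // shortNAL-LONG-CONTENT

            File.WriteAllText(path, "ORIGINAL-LONG-CONTENT");
            // FileMode.Create truncates the file before writing.
            Console.WriteLine(Overwrite(path, "short", FileMode.Create)); // short
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

With a PDF, the surviving tail would contain the old trailer, which is what the reader then trips over.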

itextsharp merging pdfs and emailing

I am battling with some MemoryStream logic.
I have a method that receives a list of IDs, uses them to pull PDFs from a web server, and merges them into one PDF.
I then want to email this PDF (working in memory only).
private Stream GetWebReport(string selected_id)
{
var IdLst = selected_id.Split(',').ToList();
MemoryStream stream = new MemoryStream();
Document document = new Document();
PdfCopy pdf = new PdfCopy(document, stream);
PdfReader reader = null;
try
{
document.Open();
foreach (var id in IdLst)
{
int i = Convert.ToInt32(id);
string invoice_url = string.Concat("http://specialurl/", id);
var urlpdf = new System.Net.WebClient().OpenRead(invoice_url);
reader = new PdfReader(urlpdf);
pdf.AddDocument(reader);
reader.Close();
}
}
catch (Exception)
{
throw;
}
finally
{
if (document != null)
{
document.Close();
}
}
return stream;
}
But when I try to use the resulting stream for an email
var mem = GetWebReport(selected_id);
mem.Seek(0, SeekOrigin.Begin);
Attachment att = new Attachment(mem, "Report for you", "application/pdf");
I get told:
System.ObjectDisposedException: 'Cannot access a closed Stream.'
So I am sure that my itextsharp logic is good (When I use a filestream I get the correct results).
I am sure that my logic in passing streams is what is faulty
Use
PdfCopy pdf = new PdfCopy(document, stream);
pdf.CloseStream = false;
This will keep the stream open after closing the pdf to be used elsewhere.
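The exception from the question can be reproduced with a plain MemoryStream, which may help to see why CloseStream matters. This is a small sketch using only the .NET base library (the class and method names are made up for illustration): once something has closed the stream, Seek throws, while ToArray still returns the buffered bytes.

```csharp
using System;
using System.IO;

public static class ClosedStreamDemo
{
    // Returns true if seeking on the stream throws ObjectDisposedException.
    public static bool SeekFails(MemoryStream stream)
    {
        try
        {
            stream.Seek(0, SeekOrigin.Begin);
            return false;
        }
        catch (ObjectDisposedException)
        {
            return true;
        }
    }

    public static void Main()
    {
        var stream = new MemoryStream();
        stream.Write(new byte[] { 1, 2, 3 }, 0, 3);
        stream.Close(); // stands in for the writer closing the stream it was given

        Console.WriteLine(SeekFails(stream));       // True
        Console.WriteLine(stream.ToArray().Length); // 3
    }
}
```

So if CloseStream cannot be changed for some reason, returning stream.ToArray() and wrapping the bytes in a new MemoryStream for the attachment is another option.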

merging pdf and preserve SetTagged

I'm using iTextSharp 5.x. I'm trying to merge two PDFs and preserve the isTagged flag. When I remove copy.SetTagged(); the resulting PDF contains both PDFs, which is great. When I add copy.SetTagged(), I get an exception:
Exception -->System.ObjectDisposedException: Cannot access a closed file.
at System.IO.__Error.FileNotOpen()
at System.IO.FileStream.get_Position()
Here is the code
List<string> filesToMerge = new List<string> { "C:/dev/dcs/wp-cla-dcs/Hex/Docs/metadata/coverPage.pdf", "C:/dev/dcs/wp-cla-dcs/Hex/Docs/metadata/49W7a.pdf" };
string outputFileName = "C:/dev/dcs/wp-cla-dcs/Hex/Docs/metadata/results.pdf";
using (FileStream outFS = new FileStream(outputFileName, FileMode.Create))
using (Document document = new Document())
// using (PdfCopy copy = new PdfCopy(document, outFS))
using (PdfCopy copy = new PdfSmartCopy(document, outFS))
{
{
copy.SetTagged();
// Set up the iTextSharp document
document.Open();
foreach (string pdfFile in filesToMerge)
{
using (var reader = new PdfReader(pdfFile))
{
copy.AddDocument(reader);
copy.FreeReader(reader);
}
}
}
}
Despite @bruno-lowagie's comment, I have had better results doing this with iText5.
Using iText7, PdfMerger left several contents untagged (all were tagged in the source document). PdfCopy in iText5, however, worked just fine; I only needed to manually add XMP metadata, title, lang, etc.:
public static void CombineMultiplePDFs(string[] fileNames, string outFile)
{
var lang = "en";
var title = "My new title";
// step 1: creation of a document-object
Document document = new Document();
// step 2: we create a writer that listens to the document
FileStream newFileStream = new FileStream(outFile, FileMode.Create);
PdfCopy writer = new PdfCopy(document, newFileStream);
writer.SetTagged();
writer.PdfVersion = PdfWriter.VERSION_1_7;
writer.AddViewerPreference(PdfName.DISPLAYDOCTITLE, new PdfBoolean(true));
writer.Info.Put(PdfName.TITLE, new PdfString(title));
writer.CreateXmpMetadata();
// step 3: we open the document
document.Open();
// set meta data
document.AddLanguage(lang);
document.AddTitle(title);
// keep an array of all open readers so they can be closed again.
var readers = new PdfReader[fileNames.Length];
for (var fi = 0; fi < fileNames.Length; fi++)
{
// we create a reader for a certain document
var fileName = fileNames[fi];
PdfReader reader = new PdfReader(fileName);
readers[fi] = reader;
reader.ConsolidateNamedDestinations();
// step 4: we add content
for (int i = 1; i <= reader.NumberOfPages; i++)
{
// IMPORTANT: the third param is "KeepTaggedPdfStructure"
PdfImportedPage page = writer.GetImportedPage(reader, i, true);
writer.AddPage(page);
}
}
// step 5: we close the document and writer
writer.Close();
document.Close();
// close readers only after document is closed
foreach (var r in readers)
{
r.Close();
}
}

iTextSharp problem concatenating PDF documents

I am trying to build up a single PDF from a bunch of other PDFs that I am filling out some form values in. Essentially I am doing a PDF mail merge. My code is below:
byte[] completedDocument = null;
using (MemoryStream streamCompleted = new MemoryStream())
{
using (Document document = new Document())
{
document.Open();
PdfCopy copy = new PdfCopy(document, streamCompleted);
copy.Open();
foreach (var item in eventItems)
{
byte[] mergedDocument = null;
PdfReader reader = new PdfReader(pdfTemplates[item.DataTokens[NotifyTokenType.OrganisationID]]);
using (MemoryStream streamTemplate = new MemoryStream())
{
using (PdfStamper stamper = new PdfStamper(reader, streamTemplate))
{
foreach (var token in item.DataTokens)
{
if (stamper.AcroFields.Fields.Any(fld => fld.Key == token.Key.ToString()))
{
stamper.AcroFields.SetField(token.Key.ToString(), token.Value);
}
}
stamper.FormFlattening = true;
stamper.Writer.CloseStream = false;
}
mergedDocument = new byte[streamTemplate.Length];
streamTemplate.Position = 0;
streamTemplate.Read(mergedDocument, 0, (int)streamTemplate.Length);
}
reader = new PdfReader(mergedDocument);
for (int i = 1; i <= reader.NumberOfPages; i++)
{
document.SetPageSize(PageSize.A4);
copy.AddPage(copy.GetImportedPage(reader, i));
}
}
}
completedDocument = new byte[streamCompleted.Length];
streamCompleted.Position = 0;
streamCompleted.Read(completedDocument, 0, (int)streamCompleted.Length);
}
The problem I am having is that it throws a null reference exception when it exits the using (Document document = new Document()) block.
From debugging the iTextSharp source, the problem is in the method below in PdfAnnotationsImp:
public bool HasUnusedAnnotations() {
return annotations.Count > 0;
}
annotations is null so this throws the null ref exception. Is there something I should be doing to instantiate this?
I changed:
document.Open();
PdfCopy copy = new PdfCopy(document, streamCompleted);
to
PdfCopy copy = new PdfCopy(document, streamCompleted);
document.Open();
And it fixed the problem. This library needs better exception handling. When you do something slightly wrong it falls over horribly and gives you no clue about what you did wrong. I have no idea how I could possibly have worked this out if I didn't have the source code.
What version of iTextSharp are you using? The Document class doesn't implement IDisposable so you can't wrap it in a using block.

Combine two (or more) PDF's

Background: I need to provide a weekly report package for my sales staff. This package contains several (5-10) crystal reports.
Problem:
I would like to allow a user to run all reports and also just run a single report. I was thinking I could do this by creating the reports and then doing:
List<ReportClass> reports = new List<ReportClass>();
reports.Add(new WeeklyReport1());
reports.Add(new WeeklyReport2());
reports.Add(new WeeklyReport3());
<snip>
foreach (ReportClass report in reports)
{
report.ExportToDisk(ExportFormatType.PortableDocFormat, @"c:\reports\" + report.ResourceName + ".pdf");
}
This would provide me a folder full of the reports, but I would like to email everyone a single PDF with all the weekly reports. So I need to combine them.
Is there an easy way to do this without installing any more third-party controls? I already have DevExpress & CrystalReports and I'd prefer not to add too many more.
Would it be best to combine them in the foreach loop or in a separate loop (or some other way)?
I had to solve a similar problem and what I ended up doing was creating a small pdfmerge utility that uses the PDFSharp project which is essentially MIT licensed.
The code is dead simple; I needed a command-line utility, so I have more code dedicated to parsing the arguments than I do for the PDF merging:
using (PdfDocument one = PdfReader.Open("file1.pdf", PdfDocumentOpenMode.Import))
using (PdfDocument two = PdfReader.Open("file2.pdf", PdfDocumentOpenMode.Import))
using (PdfDocument outPdf = new PdfDocument())
{
CopyPages(one, outPdf);
CopyPages(two, outPdf);
outPdf.Save("file1and2.pdf");
}
void CopyPages(PdfDocument from, PdfDocument to)
{
for (int i = 0; i < from.PageCount; i++)
{
to.AddPage(from.Pages[i]);
}
}
Here is a single function that will merge X amount of PDFs using PDFSharp
using PdfSharp;
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;
public static void MergePDFs(string targetPath, params string[] pdfs) {
using(var targetDoc = new PdfDocument()){
foreach (var pdf in pdfs) {
using (var pdfDoc = PdfReader.Open(pdf, PdfDocumentOpenMode.Import)) {
for (var i = 0; i < pdfDoc.PageCount; i++)
targetDoc.AddPage(pdfDoc.Pages[i]);
}
}
targetDoc.Save(targetPath);
}
}
This is something that I figured out, and wanted to share with you, using PdfSharp.
Here you can join multiple Pdfs in one, without the need of an output directory (following the input list order)
public static byte[] MergePdf(List<byte[]> pdfs)
{
List<PdfSharp.Pdf.PdfDocument> lstDocuments = new List<PdfSharp.Pdf.PdfDocument>();
foreach (var pdf in pdfs)
{
lstDocuments.Add(PdfReader.Open(new MemoryStream(pdf), PdfDocumentOpenMode.Import));
}
using (PdfSharp.Pdf.PdfDocument outPdf = new PdfSharp.Pdf.PdfDocument())
{
for(int i = 1; i<= lstDocuments.Count; i++)
{
foreach(PdfSharp.Pdf.PdfPage page in lstDocuments[i-1].Pages)
{
outPdf.AddPage(page);
}
}
MemoryStream stream = new MemoryStream();
outPdf.Save(stream, false);
byte[] bytes = stream.ToArray();
return bytes;
}
}
I used iTextsharp with c# to combine pdf files. This is the code I used.
string[] lstFiles=new string[3];
lstFiles[0]=@"C:/pdf/1.pdf";
lstFiles[1]=@"C:/pdf/2.pdf";
lstFiles[2]=@"C:/pdf/3.pdf";
PdfReader reader = null;
Document sourceDocument = null;
PdfCopy pdfCopyProvider = null;
PdfImportedPage importedPage;
string outputPdfPath=@"C:/pdf/new.pdf";
sourceDocument = new Document();
pdfCopyProvider = new PdfCopy(sourceDocument, new System.IO.FileStream(outputPdfPath, System.IO.FileMode.Create));
//Open the output file
sourceDocument.Open();
try
{
//Loop through the files list
for (int f = 0; f < lstFiles.Length; f++)
{
int pages =get_pageCcount(lstFiles[f]);
reader = new PdfReader(lstFiles[f]);
//Add pages of current file
for (int i = 1; i <= pages; i++)
{
importedPage = pdfCopyProvider.GetImportedPage(reader, i);
pdfCopyProvider.AddPage(importedPage);
}
reader.Close();
}
//At the end save the output file
sourceDocument.Close();
}
catch (Exception ex)
{
throw; // rethrow without resetting the stack trace
}
private int get_pageCcount(string file)
{
using (StreamReader sr = new StreamReader(File.OpenRead(file)))
{
Regex regex = new Regex(@"/Type\s*/Page[^s]");
MatchCollection matches = regex.Matches(sr.ReadToEnd());
return matches.Count;
}
}
Here is an example using iTextSharp:
public static void MergePdf(Stream outputPdfStream, IEnumerable<string> pdfFilePaths)
{
using (var document = new Document())
using (var pdfCopy = new PdfCopy(document, outputPdfStream))
{
pdfCopy.CloseStream = false;
try
{
document.Open();
foreach (var pdfFilePath in pdfFilePaths)
{
using (var pdfReader = new PdfReader(pdfFilePath))
{
pdfCopy.AddDocument(pdfReader);
pdfReader.Close();
}
}
}
finally
{
document?.Close();
}
}
}
The PdfReader constructor has many overloads. It's possible to replace the parameter type IEnumerable<string> with IEnumerable<Stream> and it should work as well. Please notice that the method does not close the OutputStream, it delegates that task to the Stream creator.
PDFsharp seems to allow merging multiple PDF documents into one.
And the same is also possible with iTextSharp.
Combining two byte[] using iTextSharp up to version 5.x:
internal static MemoryStream mergePdfs(byte[] pdf1, byte[] pdf2)
{
MemoryStream outStream = new MemoryStream();
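using (Document document = new Document())
using (PdfCopy copy = new PdfCopy(document, outStream))
{
// Keep outStream open after the copy is closed so the caller can still read it.
copy.CloseStream = false;
document.Open();
copy.AddDocument(new PdfReader(pdf1));
copy.AddDocument(new PdfReader(pdf2));
}
// Rewind so the caller reads from the start of the merged PDF.
outStream.Position = 0;
return outStream;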
}
Instead of byte[] arrays, it is also possible to pass Streams.
There are some good answers here already, but I thought I might mention that pdftk might be useful for this task. Instead of producing one PDF directly, you could produce each PDF you need and then combine them together as a post-process with pdftk. This could even be done from within your program using a system() or ShellExecute() call.
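From C#, the equivalent of a system() call is Process.Start. Here is a rough sketch, assuming pdftk is installed and on the PATH; the class and method names are made up for illustration. It relies on pdftk's cat operation (pdftk in1.pdf in2.pdf cat output out.pdf).

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class PdftkMerge
{
    // Builds the argument string: "in1.pdf" "in2.pdf" ... cat output "out.pdf"
    // Paths are wrapped in double quotes so that spaces survive; paths that
    // themselves contain quotes are not handled in this sketch.
    public static string BuildArguments(IEnumerable<string> inputs, string output)
    {
        var quoted = inputs.Select(p => "\"" + p + "\"");
        return string.Join(" ", quoted) + " cat output \"" + output + "\"";
    }

    // Runs pdftk and returns its exit code (0 normally indicates success).
    public static int Merge(IEnumerable<string> inputs, string output)
    {
        var startInfo = new ProcessStartInfo("pdftk", BuildArguments(inputs, output))
        {
            UseShellExecute = false,
            CreateNoWindow = true
        };
        using (var process = Process.Start(startInfo))
        {
            process.WaitForExit();
            return process.ExitCode;
        }
    }
}
```

In the weekly report scenario, this would run once after the foreach loop that exports the individual reports, with the exported file paths as inputs.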
You could try pdf-shuffler gtk-apps.org
I know a lot of people have recommended PDF Sharp; however, it doesn't look like that project has been updated since June of 2008. Further, source isn't available.
Personally, I've been playing with iTextSharp which has been pretty easy to work with.
I combined the two above, because I needed to merge three PDF byte arrays and return a byte array:
internal static byte[] mergePdfs(byte[] pdf1, byte[] pdf2,byte[] pdf3)
{
MemoryStream outStream = new MemoryStream();
using (Document document = new Document())
using (PdfCopy copy = new PdfCopy(document, outStream))
{
document.Open();
copy.AddDocument(new PdfReader(pdf1));
copy.AddDocument(new PdfReader(pdf2));
copy.AddDocument(new PdfReader(pdf3));
}
return outStream.ToArray();
}
The following method takes a list of byte arrays, each holding a PDF, and returns the merged PDF as a single byte array.
using ...;
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;
public static class PdfHelper
{
public static byte[] PdfConcat(List<byte[]> lstPdfBytes)
{
byte[] res;
using (var outPdf = new PdfDocument())
{
foreach (var pdf in lstPdfBytes)
{
using (var pdfStream = new MemoryStream(pdf))
using (var pdfDoc = PdfReader.Open(pdfStream, PdfDocumentOpenMode.Import))
for (var i = 0; i < pdfDoc.PageCount; i++)
outPdf.AddPage(pdfDoc.Pages[i]);
}
using (var memoryStreamOut = new MemoryStream())
{
outPdf.Save(memoryStreamOut, false);
res = Stream2Bytes(memoryStreamOut);
}
}
return res;
}
public static void DownloadAsPdfFile(string fileName, byte[] content)
{
var ms = new MemoryStream(content);
HttpContext.Current.Response.Clear();
HttpContext.Current.Response.ContentType = "application/pdf";
HttpContext.Current.Response.AddHeader("content-disposition", $"attachment;filename={fileName}.pdf");
HttpContext.Current.Response.Buffer = true;
ms.WriteTo(HttpContext.Current.Response.OutputStream);
HttpContext.Current.Response.End();
}
private static byte[] Stream2Bytes(Stream input)
{
var buffer = new byte[input.Length];
using (var ms = new MemoryStream())
{
int read;
while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
ms.Write(buffer, 0, read);
return ms.ToArray();
}
}
}
So, the result of PdfHelper.PdfConcat method is passed to PdfHelper.DownloadAsPdfFile method.
PS: The NuGet package named PdfSharp needs to be installed. So in the Package Manager Console window, type:
Install-Package PdfSharp
The following method merges two PDFs (f1 and f2) using iTextSharp. The pages of the second PDF are inserted after a specific page index of f1.
string f1 = "D:\\a.pdf";
string f2 = "D:\\Iso.pdf";
string outfile = "D:\\c.pdf";
appendPagesFromPdf(f1, f2, outfile, 3);
public static void appendPagesFromPdf(String f1,string f2, String destinationFile, int startingindex)
{
PdfReader p1 = new PdfReader(f1);
PdfReader p2 = new PdfReader(f2);
int l1 = p1.NumberOfPages, l2 = p2.NumberOfPages;
//Create our destination file
using (FileStream fs = new FileStream(destinationFile, FileMode.Create, FileAccess.Write, FileShare.None))
{
Document doc = new Document();
PdfWriter w = PdfWriter.GetInstance(doc, fs);
doc.Open();
for (int page = 1; page <= startingindex; page++)
{
doc.NewPage();
w.DirectContent.AddTemplate(w.GetImportedPage(p1, page), 0, 0);
//Used to pull individual pages from our source
}// copied pages from first pdf till startingIndex
for (int i = 1; i <= l2;i++)
{
doc.NewPage();
w.DirectContent.AddTemplate(w.GetImportedPage(p2, i), 0, 0);
}// merges second pdf after startingIndex
for (int i = startingindex+1; i <= l1;i++)
{
doc.NewPage();
w.DirectContent.AddTemplate(w.GetImportedPage(p1, i), 0, 0);
}// continuing from where we left in pdf1
doc.Close();
p1.Close();
p2.Close();
}
}
To solve a similar problem I used iTextSharp like this:
//Create the document which will contain the combined PDF's
Document document = new Document();
//Create a writer for the document
PdfCopy writer = new PdfCopy(document, new FileStream(OutPutFilePath, FileMode.Create));
if (writer == null)
{
return;
}
//Open the document
document.Open();
//Get the files you want to combine
string[] filePaths = Directory.GetFiles(DirectoryPathWhereYouHaveYourFiles);
foreach (string filePath in filePaths)
{
//Read the PDF file
using (PdfReader reader = new PdfReader(filePath))
{
//Add the file to the combined one
writer.AddDocument(reader);
}
}
//Finally close the document and writer
writer.Close();
document.Close();
Here is a link to an example using PDFSharp and ConcatenateDocuments
Here is the solution: http://www.wacdesigns.com/2008/10/03/merge-pdf-files-using-c
It uses the free, open-source iTextSharp library: http://sourceforge.net/projects/itextsharp
I've done this with PDFBox. I suppose it works similarly to iTextSharp.
