Merging PDFs and preserving SetTagged - C#

I'm using iTextSharp 5.x. I'm trying to merge two PDFs and preserve the isTagged flag. When I remove copy.SetTagged(), the resulting PDF contains both PDFs, which is great. When I add copy.SetTagged(), I get an exception:
Exception -->System.ObjectDisposedException: Cannot access a closed file.
at System.IO.__Error.FileNotOpen()
at System.IO.FileStream.get_Position()
Here is the code
List<string> filesToMerge = new List<string> { "C:/dev/dcs/wp-cla-dcs/Hex/Docs/metadata/coverPage.pdf", "C:/dev/dcs/wp-cla-dcs/Hex/Docs/metadata/49W7a.pdf" };
string outputFileName = "C:/dev/dcs/wp-cla-dcs/Hex/Docs/metadata/results.pdf";
using (FileStream outFS = new FileStream(outputFileName, FileMode.Create))
using (Document document = new Document())
// using (PdfCopy copy = new PdfCopy(document, outFS))
using (PdfCopy copy = new PdfSmartCopy(document, outFS))
{
{
copy.SetTagged();
// Set up the iTextSharp document
document.Open();
foreach (string pdfFile in filesToMerge)
{
using (var reader = new PdfReader(pdfFile))
{
copy.AddDocument(reader);
copy.FreeReader(reader);
}
}
}
}

Despite @bruno-lowagie's comment, I have had better results doing this with iText 5.
Using iText 7, PdfMerger left several pieces of content untagged (all were tagged in the source document). PdfCopy in iText 5, however, worked just fine; I only needed to manually add XMP metadata, title, language, etc.:
public static void CombineMultiplePDFs(string[] fileNames, string outFile)
{
var lang = "en";
var title = "My new title";
// step 1: creation of a document-object
Document document = new Document();
// step 2: we create a writer that listens to the document
FileStream newFileStream = new FileStream(outFile, FileMode.Create);
PdfCopy writer = new PdfCopy(document, newFileStream);
writer.SetTagged();
writer.PdfVersion = PdfWriter.VERSION_1_7;
writer.AddViewerPreference(PdfName.DISPLAYDOCTITLE, new PdfBoolean(true));
writer.Info.Put(PdfName.TITLE, new PdfString(title));
writer.CreateXmpMetadata();
// step 3: we open the document
document.Open();
// set meta data
document.AddLanguage(lang);
document.AddTitle(title);
// keep an array of all open readers so they can be closed again.
var readers = new PdfReader[fileNames.Length];
for (var fi = 0; fi < fileNames.Length; fi++)
{
// we create a reader for a certain document
var fileName = fileNames[fi];
PdfReader reader = new PdfReader(fileName);
readers[fi] = reader;
reader.ConsolidateNamedDestinations();
// step 4: we add content
for (int i = 1; i <= reader.NumberOfPages; i++)
{
// IMPORTANT: the third param is "KeepTaggedPdfStructure"
PdfImportedPage page = writer.GetImportedPage(reader, i, true);
writer.AddPage(page);
}
}
// step 5: we close the document and writer
writer.Close();
document.Close();
// close readers only after the document is closed
foreach (var r in readers)
{
r.Close();
}
}

Related

How to read merged fillable PDF data using iTextSharp

I have two fillable pdf files and did the code to merge those pdfs into one single pdf. Below is my code for that.
public void PDFSplit()
{
List<string> files=new List<string>();
files.Add(Server.MapPath("~/Template/sample_pdf.pdf"));
files.Add(Server.MapPath("~/Template/temp/sample_pdf.pdf"));
//call method
Merge(files, Server.MapPath("~/Template/sample_pdf_123.pdf"));
}
//Merge pdf
public void Merge(List<String> InFiles, String OutFile)
{
using (FileStream stream = new FileStream(OutFile, FileMode.Create))
using (iTextSharp.text.Document doc = new iTextSharp.text.Document())
using (PdfCopy pdf = new PdfCopy(doc, stream))
{
doc.Open();
PdfReader reader = null;
PdfImportedPage page = null;
InFiles.ForEach(file =>
{
reader = new PdfReader(file);
for (int i = 0; i < reader.NumberOfPages; i++)
{
page = pdf.GetImportedPage(reader, i + 1);
pdf.AddPage(page);
}
pdf.FreeReader(reader);
reader.Close();
});
}
}
The code works fine, but when I try to read the newly generated merged file, AcroFields does not show any fields.
//To read pdf data
PdfReader reader = null;
reader = new PdfReader(Server.MapPath("~/Template/sample_pdf_123.pdf"));
AcroFields pdfFormFields = reader.AcroFields;
You are unable to merge fillable PDF files because you are using an old version of iText. Please upgrade to iText 7 for .NET and read the iText 7 jump-start tutorial, more specifically chapter 6, where it says:
Merging forms
This is how it's done:
PdfDocument destPdfDocument = new PdfDocument(new PdfWriter(dest));
PdfDocument[] sources = new PdfDocument[] {
new PdfDocument(new PdfReader(SRC1)),
new PdfDocument(new PdfReader(SRC2)) };
PdfPageFormCopier formCopier = new PdfPageFormCopier();
foreach (PdfDocument sourcePdfDocument in sources) {
sourcePdfDocument.CopyPagesTo(1,
sourcePdfDocument.GetNumberOfPages(), destPdfDocument, formCopier);
sourcePdfDocument.Close();
}
destPdfDocument.Close();

How to read PDF layers and add them to another page using iTextSharp?

How can I read PDF layers and add them to another page using iTextSharp?
I want to copy layers from one PDF page and move them to another PDF page in C#.
I've tried reading from a PDF layer from one page, but I am unable to copy that layer.
var document = new Document();
FileStream outfile = new FileStream(outPutFilePath, FileMode.Create);
var writer = new PdfCopy(document, outfile);
document.Open();
foreach (var fileName in filesPath)
{
var reader = new PdfReader(fileName);
PdfStamper stamper = new PdfStamper(reader, outfile);
Dictionary<String, PdfLayer> layers = stamper.GetPdfLayers();
//PdfLayer layer = layers.get("Nested layer 1");
//layer.setOn(false);
for (var i = 1; i <= reader.NumberOfPages; i++)
{
var page = writer.GetImportedPage(reader, i);
page.ContentTagged = true;
writer.AddPage(page);
}
stamper.Close();
reader.Close();
}
writer.Close();
document.Close();

Grab all of the pages of a PDF using iTextSharp

I am getting a PDF using the older version of iTextSharp with this code:
string Oldfile = @"C:/test.pdf"; // Gets the Template
(new FileInfo("C:/C:/test.pdf")).Directory.Create(); // Go create this folder if it's not there
string NewFile = "C:/test.pdf";
PdfReader reader = new PdfReader(Oldfile);
iTextSharp.text.Rectangle Size = reader.GetPageSizeWithRotation(1);
Document document = new Document(Size);
// MemoryStream memory_stream = new MemoryStream();
FileStream fs = new FileStream(NewFile, FileMode.Create, FileAccess.Write);
PdfWriter weiter = PdfWriter.GetInstance(document, fs);
document.Open();
PdfContentByte cb = weiter.DirectContent;
PdfImportedPage page = weiter.GetImportedPage(reader, 1);
//PdfImportedPage page2 = weiter.GetImportedPage(reader, 2);
cb.AddTemplate(page, 0, 0);
The problem is that the source PDF has two pages, but this code only gets the first page, adds lines to it, and saves only that page. I want to be able to grab both pages. Alternatively, is there a way to merge them afterwards?
I bet you need to iterate all pages.
using System;
using System.IO;
using System.Collections.Generic;
using iTextSharp.text;
using iTextSharp.text.pdf;
namespace TestAnything
{
class Program
{
static void Main(string[] args)
{
List<string> filesToMerge = new List<string> { @"c:\temp\1.pdf", @"c:\temp\2.pdf" };
FileInfo destinationFile = new FileInfo(@"c:\temp\merge.pdf");
if (File.Exists(destinationFile.FullName))
File.Delete(destinationFile.FullName);
MergeFiles(filesToMerge, destinationFile);
}
public static void MergeFiles(List<string> sourceFiles, FileInfo destinationFile)
{
if (sourceFiles == null || sourceFiles.Count == 0)
throw new ArgumentNullException("blahhh.");
PdfReader reader = new PdfReader(sourceFiles[0]);
Document document = new Document(reader.GetPageSizeWithRotation(1));
PdfCopy writer = new PdfCopy(document, new FileStream(destinationFile.FullName, FileMode.Create));
document.Open();
try
{
foreach (string sourceFile in sourceFiles)
{
reader = new PdfReader(sourceFile);
reader.ConsolidateNamedDestinations();
for (int x = 1; x <= reader.NumberOfPages; x++)
writer.AddPage(writer.GetImportedPage(reader, x));
PRAcroForm form = reader.AcroForm;
if (form != null)
writer.CopyAcroForm(reader);
}
}
finally
{
if (document.IsOpen())
document.Close();
}
}
}
}

Using itextsharp to merge pdf files within a folder

I'm trying to use the code below to merge the PDF files in a folder and write them to a new file, but the generated file appears to be corrupted.
public Boolean MergeForm(String destinationFile, String sourceFolder)
{
try
{
using (MemoryStream stream = new MemoryStream())
using (Document doc = new Document())
using (PdfCopy pdf = new PdfCopy(doc, stream))
{
doc.Open();
PdfReader reader = null;
PdfImportedPage page = null;
foreach (var file in Directory.GetFiles(sourceFolder))
{
reader = new PdfReader(file);
for (int i = 0; i < reader.NumberOfPages; i++)
{
page = pdf.GetImportedPage(reader, i + 1);
pdf.AddPage(page);
}
pdf.FreeReader(reader);
reader.Close();
}
using (FileStream streamX = new FileStream(destinationFile, FileMode.Create))
{
stream.WriteTo(streamX);
}
}
return true;
}
catch (Exception)
{
return false;
}
}
Can anyone spot on where's the problem? Thank you.
Can anyone spot on where's the problem?
Your main problem is that you use the contents of the MemoryStream before the Document and PdfCopy have finished creating the PDF (during the Dispose at the end of the using block). Thus, you save an incomplete PDF file as a result.
Doing it like this instead should work:
using (MemoryStream stream = new MemoryStream())
{
using (Document doc = new Document())
{
PdfCopy pdf = new PdfCopy(doc, stream);
pdf.CloseStream = false;
doc.Open();
PdfReader reader = null;
PdfImportedPage page = null;
foreach (var file in Directory.GetFiles(sourceFolder))
{
reader = new PdfReader(file);
for (int i = 0; i < reader.NumberOfPages; i++)
{
page = pdf.GetImportedPage(reader, i + 1);
pdf.AddPage(page);
}
pdf.FreeReader(reader);
reader.Close();
}
}
using (FileStream streamX = new FileStream(destinationFile, FileMode.Create))
{
stream.WriteTo(streamX);
}
}
BTW, you can also see here that I did not put the PdfCopy into a using block. This is because the Document implicitly closes the PdfCopy when it is disposed. Disposing the PdfCopy first and then the Document (which tries to close the PdfCopy again) is therefore unnecessary, and it can result in exceptions thrown from within the block being hidden by other exceptions occurring in this closing circus.
Furthermore I needed to add the pdf.CloseStream = false, otherwise the memory stream would have been closed when the PdfCopy is closed.
That being said:
Of course, you should also use AddDocument instead of iterating over the document pages yourself, as already explained by @Bruno.
Your memory footprint would decrease if you wrote to the file stream directly instead of going through the memory stream.
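The two suggestions above can be combined into one sketch. It assumes an iTextSharp 5.x version recent enough to have PdfCopy.AddDocument (the call used in the first question on this page) and a Document that implements IDisposable. The names FolderMerger and MergeFolder and the "*.pdf" filter are made up for illustration, not taken from the answer.

```csharp
using System;
using System.IO;
using System.Linq;
using iTextSharp.text;
using iTextSharp.text.pdf;

// Hypothetical helper: merges every PDF in a folder straight into a file.
public static class FolderMerger
{
    public static void MergeFolder(string destinationFile, string sourceFolder)
    {
        // List the inputs before the output file is created, and skip the
        // output itself in case it lives in the same folder.
        string destFull = Path.GetFullPath(destinationFile);
        string[] inputs = Directory.GetFiles(sourceFolder, "*.pdf")
            .Where(f => !string.Equals(Path.GetFullPath(f), destFull,
                                       StringComparison.OrdinalIgnoreCase))
            .ToArray();

        using (var output = new FileStream(destinationFile, FileMode.Create))
        using (var doc = new Document())
        {
            // Create the writer before opening the document. It is not put
            // in a using block: disposing the Document closes it.
            var copy = new PdfCopy(doc, output);
            doc.Open();

            foreach (string file in inputs)
            {
                var reader = new PdfReader(file);
                copy.AddDocument(reader);   // copies all pages of this reader
                copy.FreeReader(reader);
                reader.Close();
            }
        }
    }
}
```

Directory.GetFiles does not promise a particular order, so sort the inputs if page order matters. If the folder contains no PDFs, closing the Document is expected to fail because the document has no pages, so a caller may want to check for that case first.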

iTextSharp problem concatenating PDF documents

I am trying to build up a single PDF from a bunch of other PDFs that I am filling out some form values in. Essentially I am doing a PDF mail merge. My code is below:
byte[] completedDocument = null;
using (MemoryStream streamCompleted = new MemoryStream())
{
using (Document document = new Document())
{
document.Open();
PdfCopy copy = new PdfCopy(document, streamCompleted);
copy.Open();
foreach (var item in eventItems)
{
byte[] mergedDocument = null;
PdfReader reader = new PdfReader(pdfTemplates[item.DataTokens[NotifyTokenType.OrganisationID]]);
using (MemoryStream streamTemplate = new MemoryStream())
{
using (PdfStamper stamper = new PdfStamper(reader, streamTemplate))
{
foreach (var token in item.DataTokens)
{
if (stamper.AcroFields.Fields.Any(fld => fld.Key == token.Key.ToString()))
{
stamper.AcroFields.SetField(token.Key.ToString(), token.Value);
}
}
stamper.FormFlattening = true;
stamper.Writer.CloseStream = false;
}
mergedDocument = new byte[streamTemplate.Length];
streamTemplate.Position = 0;
streamTemplate.Read(mergedDocument, 0, (int)streamTemplate.Length);
}
reader = new PdfReader(mergedDocument);
for (int i = 1; i <= reader.NumberOfPages; i++)
{
document.SetPageSize(PageSize.A4);
copy.AddPage(copy.GetImportedPage(reader, i));
}
}
}
completedDocument = new byte[streamCompleted.Length];
streamCompleted.Position = 0;
streamCompleted.Read(completedDocument, 0, (int)streamCompleted.Length);
}
The problem I am having is that it throws a null reference exception when it exits the using (Document document = new Document()) block.
From debugging the iTextSharp source, the problem is the method below in PdfAnnotationsImp:
public bool HasUnusedAnnotations() {
return annotations.Count > 0;
}
annotations is null so this throws the null ref exception. Is there something I should be doing to instantiate this?
I changed:
document.Open();
PdfCopy copy = new PdfCopy(document, streamCompleted);
to
PdfCopy copy = new PdfCopy(document, streamCompleted);
document.Open();
And it fixed the problem. This library needs better exception handling. When you do something slightly wrong, it falls over horribly and gives you no clue about what you did wrong. I have no idea how I could possibly have worked this out if I didn't have the source code.
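As a minimal sketch of the corrected ordering, the helper below concatenates PDFs that are already in memory (for example the flattened output of each PdfStamper). InMemoryMerger and Concatenate are hypothetical names. The Document is closed explicitly instead of being wrapped in a using block, so the sketch does not depend on whether your iTextSharp version implements IDisposable on Document.

```csharp
using System.Collections.Generic;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

// Hypothetical helper: concatenates in-memory PDFs into one byte array.
public static class InMemoryMerger
{
    public static byte[] Concatenate(IEnumerable<byte[]> pdfs)
    {
        using (var output = new MemoryStream())
        {
            var document = new Document();
            // Order matters: attach the writer first, then open the document.
            var copy = new PdfCopy(document, output);
            document.Open();

            foreach (byte[] pdf in pdfs)
            {
                var reader = new PdfReader(pdf);
                for (int i = 1; i <= reader.NumberOfPages; i++)
                {
                    copy.AddPage(copy.GetImportedPage(reader, i));
                }
                copy.FreeReader(reader);
                reader.Close();
            }

            // Closing the document finishes the PDF and closes the stream.
            document.Close();

            // MemoryStream.ToArray still works after the stream is closed,
            // so no manual Position/Read copy is needed.
            return output.ToArray();
        }
    }
}
```

Reading the bytes only after document.Close() matters for the same reason as the ordering fix: until the document is closed, the stream does not yet hold a complete PDF.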
What version of iTextSharp are you using? The Document class doesn't implement IDisposable so you can't wrap it in a using block.
