Merging PDF with ITextSharp takes time - c#

I am using ITextSharp to merge PDFs.
My problem is that merging huge PDFs takes a very long time (many minutes). Almost all of that time appears to be spent in the doc.Close() call.
Here is my code:
iTextSharp.text.Document doc = new iTextSharp.text.Document();
PdfCopy copy = new PdfCopy(doc, msOutput);
copy.SetMergeFields();
doc.Open();
byte[] byteArray = Convert.FromBase64String("someString");
PdfReader reader = new PdfReader(byteArray);
copy.AddDocument(reader);
doc.Close(); // <== It takes time here !
byte[] form = msOutput.ToArray();
Is there anything I did wrong?
How can I improve the merge time?

You are missing some Close() calls - this may help to bring your time down:
byte[] form;
using (var msOutput = new MemoryStream())
{
iTextSharp.text.Document doc = new iTextSharp.text.Document();
byte[] byteArray = Convert.FromBase64String("someString");
PdfCopy copy = new PdfCopy(doc, msOutput);
copy.SetMergeFields();
doc.Open();
PdfReader reader = new PdfReader(byteArray);
copy.AddDocument(reader);
// The question shows the work happening in Close(), so release the reader only afterwards.
doc.Close();
copy.Close();
reader.Close();
form = msOutput.ToArray();
}
You should also be sure you are properly disposing of your stream after use.

Related

iText7 PdfDocument save to two locations on disk

In the code below I want to read in a PDF file, add some encryption, and then save the file again. What is the best way to do that? I don't see a Save method on PdfDocument; is there another object I should use?
PdfReader pdfReader = null;
byte[] bytesPassword = System.Text.ASCIIEncoding.UTF8.GetBytes("PassWord");
WriterProperties writerProperties = new WriterProperties();
PdfDocument pdfDocument = null;
using (MemoryStream ms = new MemoryStream())
{
pdfReader = new PdfReader(destFile);
writerProperties.SetStandardEncryption(null, bytesPassword, EncryptionConstants.ALLOW_PRINTING, EncryptionConstants.ENCRYPTION_AES_256);
pdfDocument = new PdfDocument(pdfReader, new PdfWriter(ms, writerProperties));
pdfDocument.Close();
}
//pdfDocument.Save(FilePath1)
//pdfDocument.Save(FilePath2)
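One possible approach, sketched here with only base class library calls: since the encrypted output already lands in the MemoryStream, its bytes can be captured with ToArray() and written to both paths. The names SaveTwiceSketch, SaveToBoth and BytesAfterClose are hypothetical, and the iText calls from the question are replaced by a stand-in write of placeholder bytes. The sketch assumes pdfDocument.Close() ends up closing the stream, and relies on ToArray() still working on a closed MemoryStream.

```csharp
using System;
using System.IO;

static class SaveTwiceSketch
{
    // Hypothetical helper: writes one byte array to two locations on disk.
    public static void SaveToBoth(byte[] pdfBytes, string filePath1, string filePath2)
    {
        File.WriteAllBytes(filePath1, pdfBytes);
        File.WriteAllBytes(filePath2, pdfBytes);
    }

    // Stand-in for the iText part: fill a MemoryStream, close it, read the bytes back.
    public static byte[] BytesAfterClose()
    {
        var ms = new MemoryStream();
        ms.Write(new byte[] { 0x25, 0x50, 0x44, 0x46 }, 0, 4); // "%PDF" placeholder, not a real PDF
        ms.Close();          // assumption: pdfDocument.Close() would close the stream like this
        return ms.ToArray(); // ToArray() remains valid on a closed MemoryStream
    }
}
```

In the question's code this would amount to calling SaveTwiceSketch.SaveToBoth(ms.ToArray(), FilePath1, FilePath2) after pdfDocument.Close(), inside the using block where ms is still in scope.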

How to return PDF using iText

I am trying to return a PDF with simple text, but I get the following error when downloading the document: "Failed to load PDF document." Any ideas on how to resolve this are appreciated.
MemoryStream ms = new MemoryStream();
PdfWriter writer = new PdfWriter(ms);
PdfDocument pdfDocument = new PdfDocument(writer);
Document document = new Document(pdfDocument);
document.Add(new Paragraph("Hello World"));
//document.Close();
//writer.Close();
ms.Position = 0;
string pdfName = $"IP-Report-{DateTime.Now.ToString("yyyyMMddHHmmssfff")}.pdf";
return File(ms, "application/pdf", pdfName);
You have to close the writer, which flushes its internal buffer, without closing the underlying stream. As is, the document isn't being written to the memory stream in its entirety. Everything but ms should be in a using, too.
You can verify this is occurring by checking the length of ms in your code vs. the code below.
When the using (PdfWriter writer =...) closes, it will close the writer, which causes it to flush its pending writes to the underlying stream ms.
MemoryStream ms = new MemoryStream();
using (PdfWriter writer = new PdfWriter(ms))
using (PdfDocument pdfDocument = new PdfDocument(writer))
using (Document document = new Document(pdfDocument))
{
/*
* Depending on iTextSharp version, you might instead use:
* writer.SetCloseStream(false);
*/
writer.CloseStream = false;
document.Add(new Paragraph("Hello World"));
}
ms.Position = 0;
string pdfName = $"IP-Report-{DateTime.Now.ToString("yyyyMMddHHmmssfff")}.pdf";
return File(ms, "application/pdf", pdfName);
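The same buffering effect can be reproduced with nothing but the base class library. The sketch below is an analogy, not iText code: StreamWriter stands in for PdfWriter, and its leaveOpen argument plays the role of writer.CloseStream = false. The names FlushOnCloseSketch and Measure are hypothetical.

```csharp
using System;
using System.IO;
using System.Text;

static class FlushOnCloseSketch
{
    // Returns the MemoryStream length before and after the writer is disposed.
    public static (long before, long after) Measure()
    {
        using (var ms = new MemoryStream())
        {
            long before;
            // leaveOpen: true keeps ms usable after the writer is disposed.
            using (var writer = new StreamWriter(ms, new UTF8Encoding(false), 1024, leaveOpen: true))
            {
                writer.Write("Hello World");
                before = ms.Length; // the text is still sitting in the writer's buffer
            }
            long after = ms.Length; // disposing the writer flushed the buffer into ms
            return (before, after);
        }
    }
}
```

Measure() returns (0, 11): nothing reaches the stream until the writer is disposed, and the stream can still be read afterwards because it was left open.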

Convert html to pdf and merge it with existing pdfs

I have a System.Net.Mail.MailMessage whose HTML body and PDF attachments need to be converted into one single PDF.
Converting the html body to pdf works for me with this answer
Converting the pdf attachments into one pdf works for me with this answer
However, after ~10 hours of trying I cannot come up with a combined solution that does both. All I'm getting are NullReferenceExceptions somewhere in the iText source, "the document is not open" errors, etc.
For example, this throws no error, but the resulting PDF contains only the attachments and not the HTML email body:
Document document = new Document();
StringReader sr = new StringReader(mail.Body);
HTMLWorker htmlparser = new HTMLWorker(document);
using (FileStream fs = new FileStream(targetPath, FileMode.Create))
{
PdfCopy writer = new PdfCopy(document, fs);
document.Open();
htmlparser.Parse(sr);
foreach (string fileName in pdfList)
{
PdfReader reader = new PdfReader(fileName);
reader.ConsolidateNamedDestinations();
for (int i = 1; i <= reader.NumberOfPages; i++)
{
PdfImportedPage page = writer.GetImportedPage(reader, i);
writer.AddPage(page);
}
PRAcroForm form = reader.AcroForm;
if (form != null)
{
writer.CopyAcroForm(reader);
}
reader.Close();
}
writer.Close();
document.Close();
}
I'm using the LGPL licensed ITextSharp 4.1.6
From v4.1.6 fanboy to v4.1.6 fanboy :D
It looks like the HTMLWorker closes the document's stream right after parsing. As a workaround, you could create a PDF from your mail body in memory, and then add it together with the attachments to your final PDF.
Here is some code that should do the trick:
StringReader htmlStringReader = new StringReader("<html><body>Hello World!!!!!!</body></html>");
byte[] htmlResult;
using (MemoryStream htmlStream = new MemoryStream())
{
Document htmlDoc = new Document();
PdfWriter htmlWriter = PdfWriter.GetInstance(htmlDoc, htmlStream);
htmlDoc.Open();
HTMLWorker htmlWorker = new HTMLWorker(htmlDoc);
htmlWorker.Parse(htmlStringReader);
htmlDoc.Close();
htmlResult = htmlStream.ToArray();
}
byte[] pdfResult;
using (MemoryStream pdfStream = new MemoryStream())
{
Document doc = new Document();
PdfCopy copyWriter = new PdfCopy(doc, pdfStream);
doc.Open();
PdfReader htmlPdfReader = new PdfReader(htmlResult);
AppendPdf(copyWriter, htmlPdfReader); // your foreach pdf code here
htmlPdfReader.Close();
PdfReader attachmentReader = new PdfReader("C:\\temp\\test.pdf");
AppendPdf(copyWriter, attachmentReader);
attachmentReader.Close();
doc.Close();
pdfResult = pdfStream.ToArray();
}
using (FileStream fs = new FileStream("C:\\temp\\test2.pdf", FileMode.Create, FileAccess.Write))
{
fs.Write(pdfResult, 0, pdfResult.Length);
}
private void AppendPdf(PdfCopy writer, PdfReader reader)
{
for (int i = 1; i <= reader.NumberOfPages; i++)
{
PdfImportedPage page = writer.GetImportedPage(reader, i);
writer.AddPage(page);
}
}
Of course, you could also use a FileStream directly for the final document instead of a MemoryStream.

iTextSharp structureTreeRoot.numTree is null

I'm getting an error while closing my document. It's thrown when calling the function "FixTaggedStructure" from PdfCopy
Dictionary<int, PdfIndirectReference> numTree = structureTreeRoot.NumTree;
My debugger shows that "structureTreeRoot" is null, but I don't know why.
My code is very simple. I am trying to convert a PDF to PDF/A-1, following
Convert PDF to PDF/A3 or PDF/A-1 to PDF/A-3
Document doc = new Document();
FileStream fs = new FileStream(destPdfA, FileMode.Create);
PdfReader reader = new PdfReader(pdfParth);
PdfCopy copy = new PdfCopy(doc, fs);
copy.SetPdfVersion(PdfCopy.PDF_VERSION_1_4);
copy.SetTagged();
copy.CreateXmpMetadata();
doc.Open();
ICC_Profile icc = ICC_Profile.GetInstance(new FileStream(ICM, FileMode.Open));
PdfDictionary outi = new PdfDictionary(PdfName.OUTPUTINTENT);
outi.Put(PdfName.OUTPUTCONDITIONIDENTIFIER, new PdfString("sRGB IEC61966-2.1"));
outi.Put(PdfName.INFO, new PdfString("sRGB IEC61966-2.1"));
outi.Put(PdfName.S, PdfName.GTS_PDFA1);
// get this file here: http://old.nabble.com/attachment/10971467/0/srgb.profile
PdfICCBased ib = new PdfICCBased(icc);
ib.Remove(PdfName.ALTERNATE);
outi.Put(PdfName.DESTOUTPUTPROFILE, copy.AddToBody(ib).IndirectReference);
copy.ExtraCatalog.Put(PdfName.OUTPUTINTENTS, outi);
copy.AddDocument(reader);
doc.Close();

I want to watermark a pdf file without creating another pdf file

I want to convert an image to PDF and add a watermark to it. I used iTextSharp for the conversion. I successfully converted the image file to PDF, but I'm not able to add a watermark to it without creating another PDF file.
The code below creates a PDF file and also adds custom attributes.
The function watermarkpdf adds the watermark and takes pdfname as its argument.
foreach (string filenm in Images)
using (var imageStream = new FileStream(filenm, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
doc.NewPage();
iTextSharp.text.Image jpeg = iTextSharp.text.Image.GetInstance(filenm);
float width = doc.PageSize.Width;
float height = doc.PageSize.Height;
jpeg.ScaleToFit(width,height);
doc.Add(jpeg);
}
doc.AddHeader("name", "vijay");
watermarkpdf(pdfname);
The watermarkpdf function is given below.
PdfReader pdfReader = new PdfReader(txtpath.Text+"\\pdf\\" + pdfname);
FileStream stream = new FileStream(txtpath.Text + pdfname,FileMode.Open);
PdfStamper pdfStamper = new PdfStamper(pdfReader, stream);
for (int pageIndex = 1; pageIndex <= pdfReader.NumberOfPages; pageIndex++)
{
Rectangle pageRectangle = pdfReader.GetPageSizeWithRotation(pageIndex);
PdfContentByte pdfData = pdfStamper.GetUnderContent(pageIndex);
pdfData.SetFontAndSize(BaseFont.CreateFont(BaseFont.HELVETICA_BOLD, BaseFont.CP1252, BaseFont.NOT_EMBEDDED), 40);
PdfGState graphicsState = new PdfGState();
graphicsState.FillOpacity = 0.4F;
pdfData.SetGState(graphicsState);
pdfData.SetColorFill(BaseColor.BLUE);
pdfData.BeginText();
pdfData.ShowTextAligned(Element.ALIGN_CENTER, "SRO-Kottarakkara", pageRectangle.Width / 2, pageRectangle.Height / 2, 45);
pdfData.EndText();
}
pdfStamper.Close();
stream.Close();
iTextSharp doesn't support "in-place editing" of files, only reading existing files and creating new files. The problem is that it would have to write to the same file it is still reading from, which could be very problematic.
However, instead of using a file you can create your image PDF in a MemoryStream, grab the bytes from that, and pass them to the PdfReader, all with minimal changes to your code. All of the PDF writing functions that take files actually work with the abstract Stream class, which MemoryStream inherits from, so the two can be used interchangeably. Below is some basic code that should show you what I'm talking about. I don't have an IDE at hand, so there might be a typo or two, but for the most part it should work.
//Image part
//We will dump the bytes from the memory stream to the variable below later
byte[] bytes;
using (MemoryStream ms = new MemoryStream()){
Document doc = new Document(PageSize.LETTER);
PdfWriter writer = PdfWriter.GetInstance(doc, ms);
doc.Open();
//foreach (string filenm in Images)
//...
doc.Close();
//Dump the bytes, make sure to use ToArray() and not GetBuffer()
bytes = ms.ToArray();
}
//Watermark part
//Read from our bytes
PdfReader pdfReader = new PdfReader(bytes);
FileStream stream = new FileStream(txtpath.Text + pdfname,FileMode.Open);
//...
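The comment about ToArray() versus GetBuffer() can be checked with a small sketch that uses only the base class library. The names ToArrayVsGetBufferSketch and Measure are hypothetical, and the three bytes are placeholder data rather than a PDF.

```csharp
using System;
using System.IO;

static class ToArrayVsGetBufferSketch
{
    // Returns how many bytes were written, and the lengths reported by ToArray() and GetBuffer().
    public static (int written, int toArrayLength, int getBufferLength) Measure()
    {
        using (var ms = new MemoryStream())
        {
            byte[] payload = { 1, 2, 3 }; // placeholder data
            ms.Write(payload, 0, payload.Length);
            // ToArray() copies exactly the bytes written so far.
            // GetBuffer() exposes the internal buffer, whose length is the stream's capacity.
            return (payload.Length, ms.ToArray().Length, ms.GetBuffer().Length);
        }
    }
}
```

ToArray() yields exactly the 3 bytes written, while GetBuffer() returns the whole internal buffer, including unused capacity past the end of the data. Handing that padded buffer to PdfReader would feed it trailing bytes that are not part of the PDF, which is presumably why the comment insists on ToArray().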
