Merging two PDF pages on top of each other - c#

I am looking for a way to merge the content of two pdf pages.
It could be a watermark, an image or whatever.
The scenario is as follows:
I have a Word add-in that allows the user to create different templates for different customers based on several template forms. For each new customer, the user can provide a new letter paper containing a header image or logos and a footer. This has to be applied somehow at runtime. It could be an image that is loaded directly into the header of the template (then I would need to render the PDF to an image, because the letter paper will mostly be provided as a PDF file), or it could be applied when exporting the document (merging the letter paper in as a background).
But the template shall not be accessible by the user, so this must be done programmatically.
So far, I have tried the PDFsharp library, which supports neither the PDF version of the letter papers I was given nor the version of the documents created in Word 2007.
iTextSharp seemed very promising, but I could not manage to merge the contents so far.
I also tried pdftk.exe, but even when I ran it manually from the command line, I got the error: "Done. Input errors, so no output created."
It does not matter how it is handled, but the output matters.
I forgot to mention: there is a white line in the Word template for archiving purposes, so this part must not be added as an image, or it has to be added to the output document afterwards.
Thanks in advance!

StampStationery.cs, a sample from the Webified iTextSharp Examples which essentially are the C#/iTextSharp versions of the Java/iText samples from the book iText in Action — 2nd Edition, does show how to add the contents of a page from one PDF document as stationery behind the content of each page of another PDF.
The central method is this:
public byte[] ManipulatePdf(byte[] src, byte[] stationery)
{
    // Create readers
    PdfReader reader = new PdfReader(src);
    PdfReader s_reader = new PdfReader(stationery);
    using (MemoryStream ms = new MemoryStream())
    {
        // Create the stamper
        using (PdfStamper stamper = new PdfStamper(reader, ms))
        {
            // Add the stationery to each page
            PdfImportedPage page = stamper.GetImportedPage(s_reader, 1);
            int n = reader.NumberOfPages;
            PdfContentByte background;
            for (int i = 1; i <= n; i++)
            {
                background = stamper.GetUnderContent(i);
                background.AddTemplate(page, 0, 0);
            }
        }
        return ms.ToArray();
    }
}
This method returns the manipulated PDF as a byte[].
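If you would rather work with files than byte arrays, the same technique can be sketched as follows. This is only a minimal variation on the sample above, not part of the original examples: the class name LetterPaper, the method name Apply, and the file names in the usage line are made up for illustration, and it assumes an iTextSharp 5.x version in which PdfStamper can be used in a using block, as the sample does.

```csharp
using System.IO;
using iTextSharp.text.pdf;

// Hypothetical helper, named for illustration only.
public static class LetterPaper
{
    // Puts page 1 of the stationery PDF behind every page of the source PDF
    // and writes the result to destPath.
    public static void Apply(string srcPath, string stationeryPath, string destPath)
    {
        PdfReader reader = new PdfReader(srcPath);
        PdfReader stationery = new PdfReader(stationeryPath);
        try
        {
            using (FileStream output = new FileStream(destPath, FileMode.Create))
            using (PdfStamper stamper = new PdfStamper(reader, output))
            {
                PdfImportedPage page = stamper.GetImportedPage(stationery, 1);
                for (int i = 1; i <= reader.NumberOfPages; i++)
                {
                    // GetUnderContent draws beneath the existing page content.
                    stamper.GetUnderContent(i).AddTemplate(page, 0, 0);
                }
            }
        }
        finally
        {
            // Close the readers only after the stamper has finished writing,
            // because the imported page is read from the stationery reader.
            reader.Close();
            stationery.Close();
        }
    }
}
```

A call would then look like LetterPaper.Apply("document.pdf", "letterpaper.pdf", "merged.pdf"). Because the existing page content is left as it is and the stationery is only drawn underneath, text in the Word export should remain text in the output.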

Related

iText7 C# - Merging same pdf document multiple times reduces file size

I have a .NET 6.0 Windows Forms Application with iText7 added from NuGet package manager.
In the UI I have:
An Open File dialog to let the user select a single .pdf file
A text box to let the user enter a number. The content of the selected PDF will be duplicated this many times.
A button to generate a single output .pdf file.
When the user clicks the generate button, a single output.pdf needs to be generated, and it should contain the content of the original .pdf duplicated as many times as the user specified in the text box.
Notes:
In the UI, the user can select any valid .pdf file (the content could be anything: text, images, or a combination). Let's say the size of this .pdf file is 2 MB.
Code I have:
void GenerateContentDuplicatedPDF(string existingFileFullPath, string outputFileFullPath, int iNumberOfMerges)
{
    using (var memoryStream = new MemoryStream())
    {
        using (PdfReader pdfReader = new PdfReader(existingFileFullPath))
        {
            using (PdfDocument SourceDocument1 = new PdfDocument(pdfReader))
            {
                WriterProperties properties = new WriterProperties().SetCompressionLevel(CompressionConstants.NO_COMPRESSION);
                using (PdfWriter pdfWriter = new PdfWriter(memoryStream, properties))
                {
                    using (PdfDocument pdfDocument = new PdfDocument(pdfWriter))
                    {
                        PdfMerger merge = new PdfMerger(pdfDocument);
                        // Append all pages of the source document once per requested copy
                        for (int i = 0; i < iNumberOfMerges; i++)
                        {
                            merge.Merge(SourceDocument1, 1, SourceDocument1.GetNumberOfPages());
                        }
                        merge.Close();
                        SourceDocument1.Close();
                        byte[] result = memoryStream.ToArray();
                        File.WriteAllBytes(outputFileFullPath, result);
                    }
                }
            }
        }
    }
}
It works and the output is as expected. The generated .pdf has content duplicated as expected.
What is the issue:
I was expecting the size of the generated output file to be the original size multiplied by the number of times the user wants the content duplicated.
For example: if the user selects a 2 MB file and enters 10 as the number, I expect the output to be 20 MB. But iText7 generates a file of around 2.7 MB. This happens for all types of content, be it text, images, or a combination.
I have set the compression level to no compression too, but the result is still the same. I want the size of the generated output file to be multiplied as well.
I am not sure what is going wrong here, or whether iText7 is cleverly optimized to minimize the PDF size for duplicate content when the file is generated. Can I override this behavior?

Inserting an image after text

I'm making a simple program with .NET and iText7.
One of the functions I am working on is inserting a signature image into a PDF document.
I have got as far as inserting the image into the PDF and saving the result, but I don't know how to make the image go behind the text.
The Canvas class seems to make this possible, but no matter how many times I look at the examples, I can't see any parameters related to this placement.
A pointer to the keyword or API that implements this would be appreciated.
Sample results are attached to help understanding. In the figure, the left is a capture of the PDF in which I inserted my signature using a word processor, and the right is a capture of the PDF generated through iText.
I am using iText 7.2.1 for .NET.
I attached the code below just in case it was necessary.
Thank you.
public void PDF_SIGN(FileInfo old_fi)
{
    string currentPath = System.IO.Path.GetDirectoryName(Process.GetCurrentProcess().MainModule.FileName);
    String imageFile = currentPath + "\\sign.jpg";
    ImageData data = ImageDataFactory.Create(imageFile);
    string source = old_fi.FullName;
    string sourceFileName = System.IO.Path.GetFileNameWithoutExtension(source);
    string sourceFileExtenstion = System.IO.Path.GetExtension(source);
    string dest = old_fi.DirectoryName + "\\" + sourceFileName + "(signed)" + sourceFileExtenstion;
    PdfDocument pdfDoc = new PdfDocument(new PdfReader(source), new PdfWriter(dest));
    Document document = new Document(pdfDoc);
    iText.Layout.Element.Image image = new iText.Layout.Element.Image(data);
    image.ScaleAbsolute(Xsize, Ysize);
    image.SetFixedPosition(1, Xaxis, Yaxis);
    document.Add(image);
    document.Close();
    pdfDoc.Close();
}
Sample Result (Left: Goal, Right: Current result):
You can organize content you add using iText in the same pass before or behind each other simply by the order of adding, and layout elements may have background images or colors.
The previously existing content of the source document, consequently, usually serves as a mere background for everything new. The exception is when you draw to a page in a content stream that precedes the earlier content.
Unfortunately you cannot use the Document class for this as its renderer automatically works in the foreground. But you can use the Canvas class here; this class only works on a single object (e.g. a single page) but it can be initialized in a far more flexible way.
In your case, therefore, replace
Document document = new Document(pdfDoc);
iText.Layout.Element.Image image = new iText.Layout.Element.Image(data);
image.ScaleAbsolute(Xsize, Ysize);
image.SetFixedPosition(1, Xaxis, Yaxis);
document.Add(image);
document.Close();
with
iText.Layout.Element.Image image = new iText.Layout.Element.Image(data);
image.ScaleAbsolute(Xsize, Ysize);
image.SetFixedPosition(1, Xaxis, Yaxis);
PdfPage pdfPage = pdfDoc.GetFirstPage();
PdfCanvas pdfCanvas = new PdfCanvas(pdfPage.NewContentStreamBefore(), pdfPage.GetResources(), pdfDoc);
using (Canvas canvas = new Canvas(pdfCanvas, pdfPage.GetCropBox()))
{
    canvas.Add(image);
}
and you should get the desired result.
(Actually I tested that using Java and ported it to C# in writing this answer. I hope it's ported all right.)
As an aside, if you only want to put an image on the page, you don't really need the Canvas, you can directly use one of the AddImage* methods of PdfCanvas. For multiple elements to be automatically arranged, though, using the Canvas is a good idea.
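A minimal sketch of that direct route, under a few assumptions: the class and method names below are made up for illustration, the position and size are passed in as parameters instead of the Xaxis, Yaxis, Xsize and Ysize values from the question, and the code relies on PdfCanvas.AddImageFittedIntoRectangle, which should be available in iText 7.2.x. Like the Canvas variant above, it has not been run against the original document.

```csharp
using iText.IO.Image;
using iText.Kernel.Geom;
using iText.Kernel.Pdf;
using iText.Kernel.Pdf.Canvas;

// Hypothetical helper, named for illustration only.
public static class SignatureStamper
{
    // Draws the image on the first page, underneath the existing content.
    public static void AddImageBehindContent(string source, string dest, string imageFile,
        float x, float y, float width, float height)
    {
        using (PdfDocument pdfDoc = new PdfDocument(new PdfReader(source), new PdfWriter(dest)))
        {
            ImageData data = ImageDataFactory.Create(imageFile);
            PdfPage pdfPage = pdfDoc.GetFirstPage();

            // A content stream placed before the existing ones is drawn first,
            // so everything already on the page ends up on top of the image.
            PdfCanvas pdfCanvas = new PdfCanvas(
                pdfPage.NewContentStreamBefore(), pdfPage.GetResources(), pdfDoc);

            // The image is scaled to the given rectangle (lower-left corner x, y).
            pdfCanvas.AddImageFittedIntoRectangle(
                data, new Rectangle(x, y, width, height), false);
        }
    }
}
```

The image only shows through where the existing page content is transparent; if the page paints an opaque background of its own, an image drawn underneath will be hidden by it.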
Also, I said above that you cannot use Document here. Actually you can, if you replace the document renderer that class uses. For the task at hand that would have been overkill, though.

Replacing an image

I am trying to add or replace an image in an existing PDF file using iTextSharp. The file has 3 layers which are required by the printing company. The content on these layers cannot be merged.
Thus far I have tried many of the code examples (most don't seem to be in C#, and I can't find a conversion from Java). The closest example is:
PdfReader reader = new PdfReader(this.FrontPDFFile);
PdfStamper stamper = new PdfStamper(reader, new FileStream(this.OutputDirectory, FileMode.Create));
var pdfContentBuffer = stamper.GetOverContent(1);
// get image from our api
System.Drawing.Image image = GenerateQRCode("GUOIO", 1000, 1000);
// convert to itextsharp image and insert
iTextSharp.text.Image img = iTextSharp.text.Image.GetInstance(image, iTextSharp.text.BaseColor.WHITE);
img.SetAbsolutePosition(100, 100);
img.ScaleToFit(100, 100);
pdfContentBuffer.AddImage(img, true);
stamper.Close();
This generates the PDF with the image on it. However, when the file is opened in Illustrator, the image is not shown. I am told this is likely to have something to do with layers. Has anyone got any ideas?

Strip Adobe Reader and Version requirements from PDF before outputting it to browser

I am planning to use pdf.js to display PDF content in the browser with JavaScript. The problem is that some PDFs, including the ones I am using, require Adobe Reader at a specific version. pdf.js does not yet (ever?) support spoofing these. What I need to know is whether there is a way in C# to open the PDF and remove these Reader and version requirements, and how to do it. I was planning to use iTextSharp for other server-side PDF manipulation, so an example using it would be most helpful. I plan to serve these as an ActionResult from an AJAX request via MVC 4, so a MemoryStream at the end of this manipulation would be most helpful.
So in the end pdf.js was unable to do what I needed it to. What I was able to do, however, was convert the XFA/PDF to a C# object and then send the pages as needed via JSON to my JavaScript for rendering in the HTML5 canvas. The code below takes an XFA-in-a-PDF file and turns it into a C# object with the help of iTextSharp:
PdfReader.unethicalreading = true;
PdfReader reader = new PdfReader(new FileStream(Statics.PdfUploadLocation + PdfFileName, FileMode.Open, FileAccess.Read));
XfaForm xfaForm = new XfaForm(reader);
XDocument xDoc = XDocument.Parse(xfaForm.DomDocument.InnerXml);
// Namespace of the XFA template, in the {namespace}localName form that XName accepts
string xfaNamespace = @"{http://www.xfa.org/schema/xfa-template/2.6/}";
List<XElement> formPages = xDoc.Descendants(xfaNamespace + "subform").Descendants(xfaNamespace + "subform").ToList();
TotalPages = formPages.Count();
var fieldIndex = 0;
RawPdfFields = new List<XfaField>();
for (int page = 0; page < formPages.Count(); page++)
{
    RawPdfFields.AddRange(formPages[page].Descendants(xfaNamespace + "field")
        .Select(x => new XfaField
        {
            Page = page,
            Index = fieldIndex++,
            Name = (string)x.Attribute("name"),
            Height = GetUnitFromPossibleString((string)x.Attribute("h")),
            Width = GetUnitFromPossibleString((string)x.Attribute("w")),
            XPosition = GetUnitFromPossibleString((string)x.Attribute("x")),
            YPosition = GetUnitFromPossibleString((string)x.Attribute("y")),
            Reference = GetReference(x.Descendants(xfaNamespace + "traverse")),
            AssistSpeak = GetAssistSpeak(x.Descendants(xfaNamespace + "speak"))
        }).ToList());
}
Your PDF file n-400.pdf uses the Adobe XML Forms Architecture (XFA). This means you need a viewer that also supports XFA, which pdf.js seemingly does not.
Such a PDF normally contains some standard PDF content indicating that the PDF requires a viewer that supports XFA. In your case that content contains
If this message is not eventually replaced by the proper contents of the document, your PDF viewer may not be able to display this type of document.
This actually indicates what an XFA-enabled viewer does: it renders pages based upon information in the XFA XML data and displays them instead of the PDF-style page descriptions.
While being defined proprietarily by Adobe, the PDF specification ISO 32000-1 describes how XFA data is to be embedded in a PDF document, cf. section 12.7.8 XFA Forms.
If you only need those forms in a flattened state, you might want to have a look at iText Demo: Dynamic XFA forms in PDF.

How to read a PDF Portfolio using iTextSharp

I'm using iTextSharp in a C# app that reads PDF files and breaks out the pages as separate PDF documents. It works well, except in the case of portfolios. Now I'm trying to figure out how to read a PDF portfolio (or Collection, as they seem to be called in iText) that contains two embedded PDF documents. I simply want to open the portfolio, enumerate the embedded files and then save them as separate, simple PDF files.
There's a good example of how to programmatically create a PDF portfolio, here:
Kubrick Collection Example
But I haven't seen any examples that read portfolios. Any help would be much appreciated!
The example you referenced adds the embedded files as document-level attachments. So you can extract the files like this:
PdfReader reader = new PdfReader(readerPath);
PdfDictionary root = reader.Catalog;
PdfDictionary documentnames = root.GetAsDict(PdfName.NAMES);
PdfDictionary embeddedfiles =
    documentnames.GetAsDict(PdfName.EMBEDDEDFILES);
PdfArray filespecs = embeddedfiles.GetAsArray(PdfName.NAMES);
for (int i = 0; i < filespecs.Size; ) {
    filespecs.GetAsString(i++);
    PdfDictionary filespec = filespecs.GetAsDict(i++);
    PdfDictionary refs = filespec.GetAsDict(PdfName.EF);
    foreach (PdfName key in refs.Keys) {
        PRStream stream = (PRStream) PdfReader.GetPdfObject(
            refs.GetAsIndirectObject(key)
        );
        using (FileStream fs = new FileStream(
            filespec.GetAsString(key).ToString(), FileMode.OpenOrCreate
        )) {
            byte[] attachment = PdfReader.GetStreamBytes(stream);
            fs.Write(attachment, 0, attachment.Length);
        }
    }
}
Pass the output file from the Kubrick Collection Example you referenced to the PdfReader constructor (readerPath) if you want to test this.
Java version: part4.chapter16.KubrickDocumentary
C# version.
Hopefully I'll have time to update the C# examples this month from version 5.2.0.0 (the iTextSharp version is about three weeks behind the Java version right now).
