Add an existing PDF from file to an unwritten document using iTextSharp - c#

How do I correctly add the page(s) from an existing on-disk PDF file, to a currently-being-generated in-memory PDF document?
We have a class that produces a PDF document using iTextSharp. This works fine. At the moment we add a Terms & Conditions as an image at the last page:
this.nettPriceListDocument.NewPage();
this.currentPage++;
Image logo = Image.GetInstance("/inetpub/Applications/Trade/Reps/images/TermsAndConditions.gif");
logo.SetAbsolutePosition(0, 0);
logo.ScaleToFit(this.nettPriceListDocument.PageSize.Width, this.nettPriceListDocument.PageSize.Height);
this.nettPriceListDocument.Add(logo);
I have this image as a PDF document and would prefer to append it. If I can figure this out it would be possible to append other PDF documents we might need to what I am generating.
I have tried:
string tcfile = "/inetpub/Applications/Trade/Reps/images/TermsAndConditions.pdf";
PdfReader reader = new PdfReader(tcfile);
PdfWriter writer = PdfWriter.GetInstance(this.nettPriceListDocument, this.nettPriceListMemoryStream);
PdfContentByte content = writer.DirectContentUnder;
for (int pageno = 1; pageno < reader.NumberOfPages + 1; pageno++)
{
this.nettPriceListDocument.NewPage();
this.currentPage++;
PdfImportedPage newpage = writer.GetImportedPage(reader, pageno);
content.AddTemplate(newpage, 1f, 1f);
}
Which results in a "document not open" exception at the writer.DirectContentUnder
I've also tried:
string tcfile = "/inetpub/Applications/Trade/Reps/images/TermsAndConditions.pdf";
PdfReader reader = new PdfReader(tcfile);
PdfConcatenate concat = new PdfConcatenate(this.nettPriceListMemoryStream);
concat.AddPages(reader);
Which results in inserting a blank, oddly sized page over the usual first page of the document.
I've also tried:
string tcfile = "/inetpub/Applications/Trade/Reps/images/TermsAndConditions.pdf";
PdfReader reader = new PdfReader(tcfile);
PdfCopy copier = new PdfCopy(nettPriceListDocument, nettPriceListMemoryStream);
for (int pageno = 1; pageno < reader.NumberOfPages + 1; pageno++)
{
this.currentPage++;
PdfImportedPage newpage = copier.GetImportedPage(reader, pageno);
copier.AddPage(newpage);
}
copier.Close();
reader.Close();
Which results in a NullReferenceException at copier.AddPage(newpage).
I've also tried:
string tcfile = "/inetpub/Applications/Trade/Reps/images/TermsAndConditions.pdf";
PdfReader reader = new PdfReader(tcfile);
PdfCopyFields copier = new PdfCopyFields(nettPriceListMemoryStream);
copier.AddDocument(reader);
This also results in a NullReferenceException at copier.AddDocument(reader).
I've got most of these ideas from various StackOverflow questions and answers. One thing that nobody seemed to deal with, is adding new page(s) from an existing PDF file, to an already-existing in-memory document that is not yet written to a PDF file on disk. This document has already been opened and had pages of data written into it. If I leave this Terms & Conditions procedure out, or just write it is an image (like originally), the resulting PDF comes out just fine.
To finish as I started: How do I correctly add the page(s) from an existing on-disk PDF file, to a currently-being-generated in-memory PDF document?
Thanks and appreciation in advance for your musings on this. Please let me know if I can provide further information.

Here's a simple merge method that copies PDF files into one PDF. I use this method quite often when merging pdfs. I have used this to merge different sized pages as well without issue. Hope it helps.
public MemoryStream MergePdfForms(List<byte[]> files)
{
if (files.Count > 1)
{
PdfReader pdfFile;
Document doc;
PdfWriter pCopy;
MemoryStream msOutput = new MemoryStream();
pdfFile = new PdfReader(files[0]);
doc = new Document();
pCopy = new PdfSmartCopy(doc, msOutput);
doc.Open();
for (int k = 0; k < files.Count; k++)
{
pdfFile = new PdfReader(files[k]);
for (int i = 1; i < pdfFile.NumberOfPages + 1; i++)
{
((PdfSmartCopy)pCopy).AddPage(pCopy.GetImportedPage(pdfFile, i));
}
pCopy.FreeReader(pdfFile);
}
pdfFile.Close();
pCopy.Close();
doc.Close();
return msOutput;
}
else if (files.Count == 1)
{
return new MemoryStream(files[0]);
}
return null;
}

Related

Image' is an ambiguous reference between 'System.Drawing.Image' and 'iText.Layout.Element.Image'

With that code I can split a multi tiff and save the images to files.
public void SplitImage(string file)
{
Bitmap bitmap = (Bitmap)Image.FromFile(file);
int count = bitmap.GetFrameCount(FrameDimension.Page);
var new_files = file.Split("_");
String new_file = new_files[new_files.Length - 1];
for (int idx = 0; idx < count; idx++)
{
bitmap.SelectActiveFrame(FrameDimension.Page, idx);
bitmap.Save($"C:\\temp\\{idx}-{new_file}", ImageFormat.Tiff);
}
}
here the code for the Pdf creation
public void CreatePDFFromImages(string path_multi_tiff)
{
Image img = new Image(ImageDataFactory.Create(path_multi_tiff));
var p = new Paragraph("Image").Add(img);
var writer = new PdfWriter("C:\\temp\\test.pdf");
var pdf = new PdfDocument(writer);
var document = new Document(pdf);
document.Add(new Paragraph("Images"));
document.Add(p);
document.Close();
Console.WriteLine("Done !");
}
now I would like to save the images to pdf pages and tried it with iText7. But this fails as
using System.Drawing.Imaging;
using Image = iText.Layout.Element.Image;
are to close to have them both in the same class. How could I save the splitted images to PDF pages ? I would like to avoid saving first to files and reloading all the images.
The line
using Image = iText.Layout.Element.Image;
is a so-called using alias directive. It creates the alias Image for the namespace or type iText.Layout.Element.Image. If this poses a problem, you can simply create a different alias. For example
using A = iText.Layout.Element.Image;
will create the alias A for the namespace or type iText.Layout.Element.Image.

PDFsharp asking for a PDF to be opened for Import when adding pages

I have 3 PDFs, I want to add the pages from them to an output PDF file. What I’m trying to do is: import first PDF -> create new PDF document -> add pages -> draw in certain page and finally I want to add the pages from that document to the main PDF document that will be exported. Proceed to do the same with the second PDF file if needed.
ERROR: A PDF document must be opened with PdfDocumentOpenMode.Import to import pages from it.
From the main class I call the method that processes the PDF:
Pdftest pdftest = new Pdftest();
PdfDocument pdf = PdfReader.Open(#"C:\Users\pdf_file.pdf", PdfDocumentOpenMode.Import);
pdftest.CreatePages(pdf_file);
Pdftest class:
public class Pdftest
{
PdfDocument PDFNewDoc = new PdfDocument();
XFont fontChico = new XFont("Verdana", 8, XFontStyle.Bold);
XFont fontGrande = new XFont("Verdana", 12, XFontStyle.Bold);
XBrush fontBrush = XBrushes.Black;
public void CreatePages(PdfDocument archivoPdf)
{
PdfDocument NuevoDoc = new PdfDocument();
for (int Pg = 0; Pg < archivoPdf.Pages.Count; Pg++)
{
PdfPage pp = NuevoDoc.AddPage(archivoPdf.Pages[Pg]);
}
XGraphics Graficador = XGraphics.FromPdfPage(NuevoDoc.Pages[0]);
XPoint coordinate = new XPoint();
coordinate.X = XUnit.FromInch(1.4);
coordinate.Y = XUnit.FromInch(1.8);
graficador.DrawString("TEST", fontChico, fontBrush, coordinates);
for (int Pg = 0; Pg < NuevoDoc.Pages.Count; Pg++)
{
PdfPage pp = PDFNewDoc.AddPage(NuevoDoc.Pages[Pg]); //Error mentioned.
}
}
}
The error message relates to NuevoDoc. You have to save NuevoDoc in a MemoryStream and re-open it in Import mode to get your code going.
I do not understand why you try to copy pages from NuevoDoc to PDFNewDoc- so most likely you can avoid the MemoryStream when optimising the code.

Performance issue when using old iText for .NET version when splitting a huge PDF

My goal here is to split a huge PDF (over 1000 pages).
I tried the example below :
public void ExtractPages(string sourcePdfPath, string outputPdfPath,
int startPage, int endPage)
{
PdfReader reader = null;
Document sourceDocument = null;
PdfCopy pdfCopyProvider = null;
PdfImportedPage importedPage = null;
try
{
// Intialize a new PdfReader instance with the contents of the source Pdf file:
reader = new PdfReader(sourcePdfPath);
// For simplicity, I am assuming all the pages share the same size
// and rotation as the first page:
sourceDocument = new Document(reader.GetPageSizeWithRotation(startPage));
// Initialize an instance of the PdfCopyClass with the source
// document and an output file stream:
pdfCopyProvider = new PdfCopy(sourceDocument,
new System.IO.FileStream(outputPdfPath, System.IO.FileMode.Create));
sourceDocument.Open();
// Walk the specified range and add the page copies to the output file:
for (int i = startPage; i <= endPage; i++)
{
importedPage = pdfCopyProvider.GetImportedPage(reader, i);
pdfCopyProvider.AddPage(importedPage);
}
sourceDocument.Close();
reader.Close();
}
catch (Exception ex)
{
throw ex;
}
}
It works but it takes more than 10 min.
My question here is there any way to skip the for loop and getting all the pages quickly ??

iTextSharp - does not contain a definition for 'getInstance'

I'm trying to generate a pdf from image with iTextSharp, but I'm getting the following errors: iTextSharp.Image does not contain a definition for 'getInstance' and 'iTextSharp.text.Document does not contain a definition for 'add' and 'iTextSharp.text.Document does not contain a definition for 'newPage' and iTextSharp.text.Image does not contain a definition for 'scalePercent'**
I have already add the iText Library (itextsharp, itextsharp.pdfa and itextshar.xtra). here is my code:
private void button3_Click_1(object sender, EventArgs e)
{
saveFileDialog1.FileName = "name.pdf";
if (saveFileDialog1.ShowDialog() == DialogResult.OK)
{
using (Bitmap bitmap = new Bitmap(panel1.ClientSize.Width,
panel1.ClientSize.Height))
{
panel1.DrawToBitmap(bitmap, panel1.ClientRectangle);
bitmap.Save("C:\\" + (nPaginasPDF + 1) + ".bmp", ImageFormat.Bmp);
}
Document doc = new Document();
PdfWriter.GetInstance(doc, new FileOutputStream(yourOutFile));
doc.Open();
for (int iCnt = 0; iCnt < nPaginasPDF; iCnt++)
{
iTextSharp.text.Image image1 = iTextSharp.text.Image.GetInstance("C:\\" + (iCnt + 1) + ".bmp");
image1.ScalePercent(23f);
doc.NewPage();
doc.Add(image1);
}
using (var Stream = saveFileDialog1.OpenFile())
{
doc.Save(Stream);
}
doc.Close();
}
Both #Nenad and #MaxStoun are correct, you just need to adapt the Java conventions to .Net. Additionally, you'll also need to swap the Java FileOutputStream for the .Net System.IO.FileStream object.
EDIT
You have some "magic variables" in there that I need to work around. For instance, I'm not 100% sure what you're doing with your for loop so I just removed it for this sample. Also, I don't have write permissions to my c:\ directory to I'm saving to the desktop. Otherwise, this code should hopefully get you on the correct path.
//I don't know what you're doing with this variable so I'm just setting it to something
int nPaginasPDF = 10;
//I can't write to my C: drive so I'm saving to the desktop
string saveFolder = Environment.GetFolderPath(Environment.SpecialFolder.Desktop);
//Set the default file name
saveFileDialog1.FileName = "name.pdf";
//If the user presses "OK"
if (saveFileDialog1.ShowDialog() == DialogResult.OK) {
//Create a bitmap and save it to disk
using (Bitmap bitmap = new Bitmap(panel1.ClientSize.Width, panel1.ClientSize.Height)) {
panel1.DrawToBitmap(bitmap, panel1.ClientRectangle);
//Path.Combine is a safer way to build file pathes
bitmap.Save(System.IO.Path.Combine(saveFolder, nPaginasPDF + ".bmp"), ImageFormat.Bmp);
}
//Create a new file stream instance with some locks for safety
using (var fs = new System.IO.FileStream(saveFileDialog1.FileName, System.IO.FileMode.Create, System.IO.FileAccess.Write, System.IO.FileShare.None)) {
//Create our iTextSharp document
using (var doc = new Document()) {
//Bind a PdfWriter to the Document and FileStream
using (var writer = PdfWriter.GetInstance(doc, fs)) {
//Open the document for writing
doc.Open();
//Get an instance of our image
iTextSharp.text.Image image1 = iTextSharp.text.Image.GetInstance(System.IO.Path.Combine(saveFolder, nPaginasPDF + ".bmp"));
//Sacle it
image1.ScalePercent(23f);
//Add a new page
doc.NewPage();
//Add our image to the document
doc.Add(image1);
//Close our document for writing
doc.Close();
}
}
}
}
If you use iText documentation or books for Java, you need to adapt things a bit for .NET. In your example, since .NET implicit getters and setters for properties, this:
var instance = iTextSharp.Image.getInstance();
becomes this:
var instance = iTextSharp.Image.Instance;
Second issue: method names in Java are camel case, vs .NET pascal case, so this (camelCase):
image1.scalePercent(23f);
doc.newPage();
doc.add(image1);
becomes this (PascalCase):
image1.ScalePercent(23f);
doc.NewPage();
doc.Add(image1);
And so on. Just apply .NET code naming conventions instead of Java's.
You'll upper first letter in method name (I just downloaded it from Nuget)
Image.getInstance(); => Image.GetInstance();
doc.add(image1); => doc.Add(image1);

using ITextSharp to extract and update links in an existing PDF

I need to post several (read: a lot) PDF files to the web but many of them have hard coded file:// links and links to non-public locations. I need to read through these PDFs and update the links to the proper locations. I've started writing an app using itextsharp to read through the directories and files, find the PDFs and iterate through each page. What I need to do next is find the links and then update the incorrect ones.
string path = "c:\\html";
DirectoryInfo rootFolder = new DirectoryInfo(path);
foreach (DirectoryInfo di in rootFolder.GetDirectories())
{
// get pdf
foreach (FileInfo pdf in di.GetFiles("*.pdf"))
{
string contents = string.Empty;
Document doc = new Document();
PdfReader reader = new PdfReader(pdf.FullName);
using (MemoryStream ms = new MemoryStream())
{
PdfWriter writer = PdfWriter.GetInstance(doc, ms);
doc.Open();
for (int p = 1; p <= reader.NumberOfPages; p++)
{
byte[] bt = reader.GetPageContent(p);
}
}
}
}
Quite frankly, once I get the page content I'm rather lost on this when it comes to iTextSharp. I've read through the itextsharp examples on sourceforge, but really didn't find what I was looking for.
Any help would be greatly appreciated.
Thanks.
This one is a little complicated if you don't know the internals of the PDF format and iText/iTextSharp's abstraction/implementation of it. You need to understand how to use PdfDictionary objects and look things up by their PdfName key. Once you get that you can read through the official PDF spec and poke around a document pretty easily. If you do care I've included the relevant parts of the PDF spec in parenthesis where applicable.
Anyways, a link within a PDF is stored as an annotation (PDF Ref 12.5). Annotations are page-based so you need to first get each page's annotation array individually. There's a bunch of different possible types of annotations so you need to check each one's SUBTYPE and see if its set to LINK (12.5.6.5). Every link should have an ACTION dictionary associated with it (12.6.2) and you want to check the action's S key to see what type of action it is. There's a bunch of possible ones for this, link's specifically could be internal links or open file links or play sound links or something else (12.6.4.1). You are looking only for links that are of type URI (note the letter I and not the letter L). URI Actions (12.6.4.7) have a URI key that holds the actual address to navigate to. (There's also an IsMap property for image maps that I can't actually imagine anyone using.)
Whew. Still reading? Below is a full working VS 2010 C# WinForms app based on my post here targeting iTextSharp 5.1.1.0. This code does two main things: 1) Create a sample PDF with a link in it pointing to Google.com and 2) replaces that link with a link to bing.com. The code should be pretty well commented but feel free to ask any questions that you might have.
using System;
using System.Text;
using System.Windows.Forms;
using iTextSharp.text;
using iTextSharp.text.pdf;
using System.IO;
namespace WindowsFormsApplication1
{
public partial class Form1 : Form
{
//Folder that we are working in
private static readonly string WorkingFolder = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "Hyperlinked PDFs");
//Sample PDF
private static readonly string BaseFile = Path.Combine(WorkingFolder, "OldFile.pdf");
//Final file
private static readonly string OutputFile = Path.Combine(WorkingFolder, "NewFile.pdf");
public Form1()
{
InitializeComponent();
}
private void Form1_Load(object sender, EventArgs e)
{
CreateSamplePdf();
UpdatePdfLinks();
this.Close();
}
private static void CreateSamplePdf()
{
//Create our output directory if it does not exist
Directory.CreateDirectory(WorkingFolder);
//Create our sample PDF
using (iTextSharp.text.Document Doc = new iTextSharp.text.Document(PageSize.LETTER))
{
using (FileStream FS = new FileStream(BaseFile, FileMode.Create, FileAccess.Write, FileShare.Read))
{
using (PdfWriter writer = PdfWriter.GetInstance(Doc, FS))
{
Doc.Open();
//Turn our hyperlink blue
iTextSharp.text.Font BlueFont = FontFactory.GetFont("Arial", 12, iTextSharp.text.Font.NORMAL, iTextSharp.text.BaseColor.BLUE);
Doc.Add(new Paragraph(new Chunk("Go to URL", BlueFont).SetAction(new PdfAction("http://www.google.com/", false))));
Doc.Close();
}
}
}
}
private static void UpdatePdfLinks()
{
//Setup some variables to be used later
PdfReader R = default(PdfReader);
int PageCount = 0;
PdfDictionary PageDictionary = default(PdfDictionary);
PdfArray Annots = default(PdfArray);
//Open our reader
R = new PdfReader(BaseFile);
//Get the page cont
PageCount = R.NumberOfPages;
//Loop through each page
for (int i = 1; i <= PageCount; i++)
{
//Get the current page
PageDictionary = R.GetPageN(i);
//Get all of the annotations for the current page
Annots = PageDictionary.GetAsArray(PdfName.ANNOTS);
//Make sure we have something
if ((Annots == null) || (Annots.Length == 0))
continue;
//Loop through each annotation
foreach (PdfObject A in Annots.ArrayList)
{
//Convert the itext-specific object as a generic PDF object
PdfDictionary AnnotationDictionary = (PdfDictionary)PdfReader.GetPdfObject(A);
//Make sure this annotation has a link
if (!AnnotationDictionary.Get(PdfName.SUBTYPE).Equals(PdfName.LINK))
continue;
//Make sure this annotation has an ACTION
if (AnnotationDictionary.Get(PdfName.A) == null)
continue;
//Get the ACTION for the current annotation
PdfDictionary AnnotationAction = (PdfDictionary)AnnotationDictionary.Get(PdfName.A);
//Test if it is a URI action
if (AnnotationAction.Get(PdfName.S).Equals(PdfName.URI))
{
//Change the URI to something else
AnnotationAction.Put(PdfName.URI, new PdfString("http://www.bing.com/"));
}
}
}
//Next we create a new document add import each page from the reader above
using (FileStream FS = new FileStream(OutputFile, FileMode.Create, FileAccess.Write, FileShare.None))
{
using (Document Doc = new Document())
{
using (PdfCopy writer = new PdfCopy(Doc, FS))
{
Doc.Open();
for (int i = 1; i <= R.NumberOfPages; i++)
{
writer.AddPage(writer.GetImportedPage(R, i));
}
Doc.Close();
}
}
}
}
}
}
EDIT
I should note, this only changes the actual link. Any text within the document won't get updated. Annotations are drawn on top of text but aren't really tied to the text underneath in anyway. That's another topic completely.
Noted if the Action is indirect it will not return a dictionary and you will have an error:
PdfDictionary AnnotationAction = (PdfDictionary)AnnotationDictionary.Get(PdfName.A);
In cases of possible indirect dictionaries:
PdfDictionary Action = null;
//Get action directly or by indirect reference
PdfObject obj = Annotation.Get(PdfName.A);
if (obj.IsIndirect) {
Action = PdfReader.GetPdfObject(obj);
} else {
Action = (PdfDictionary)obj;
}
In that case you have to investigate the returned dictionary to figure out where the URI is found. As with an indirect /Launch dictionary the URI is located in the /F item being of type PRIndirectReference with the /Type being a /FileSpec and the URI located in the value of /F
Added code for dealing with indirect and launch actions and null annotation-dictionary:
PdfReader r = new PdfReader(#"d:\kb2\" + f);
for (int i = 1; i <= r.NumberOfPages; i++) {
//Get the current page
var PageDictionary = r.GetPageN(i);
//Get all of the annotations for the current page
var Annots = PageDictionary.GetAsArray(PdfName.ANNOTS);
//Make sure we have something
if ((Annots == null) || (Annots.Length == 0))
continue;
foreach (var A in Annots.ArrayList) {
var AnnotationDictionary = PdfReader.GetPdfObject(A) as PdfDictionary;
if (AnnotationDictionary == null)
continue;
//Make sure this annotation has a link
if (!AnnotationDictionary.Get(PdfName.SUBTYPE).Equals(PdfName.LINK))
continue;
//Make sure this annotation has an ACTION
if (AnnotationDictionary.Get(PdfName.A) == null)
continue;
var annotActionObject = AnnotationDictionary.Get(PdfName.A);
var AnnotationAction = (PdfDictionary)(annotActionObject.IsIndirect() ? PdfReader.GetPdfObject(annotActionObject) : annotActionObject);
var type = AnnotationAction.Get(PdfName.S);
//Test if it is a URI action
if (type.Equals(PdfName.URI)) {
//Change the URI to something else
string relativeRef = AnnotationAction.GetAsString(PdfName.URI).ToString();
AnnotationAction.Put(PdfName.URI, new PdfString(url));
} else if (type.Equals(PdfName.LAUNCH)) {
//Change the URI to something else
var filespec = AnnotationAction.GetAsDict(PdfName.F);
string url = filespec.GetAsString(PdfName.F).ToString();
AnnotationAction.Put(PdfName.F, new PdfString(url));
}
}
}
//Next we create a new document add import each page from the reader above
using (var output = File.OpenWrite(outputFile.FullName)) {
using (Document Doc = new Document()) {
using (PdfCopy writer = new PdfCopy(Doc, output)) {
Doc.Open();
for (int i = 1; i <= r.NumberOfPages; i++) {
writer.AddPage(writer.GetImportedPage(r, i));
}
Doc.Close();
}
}
}
r.Close();

Categories