Copying a PDF without losing form field structure with iTextSharp - C#

I would like to take a PDF, keep some of its pages, then save it to another destination without losing the field structure.
Here is the code, which works perfectly for copying:
string sourceFolder = Environment.GetFolderPath(Environment.SpecialFolder.Desktop);
string sourceFile = Path.Combine(sourceFolder, "POMultiple.pdf");
string fileName = @"C:\Users\MyUser\Desktop\POMultiple.pdf";
byte[] file = System.IO.File.ReadAllBytes(fileName);
public static void removePagesFromPdf(byte[] sourceFile, String destinationFile, params int[] pagesToKeep)
{
//Used to pull individual pages from our source
PdfReader r = new PdfReader(sourceFile);
//Create our destination file
using (FileStream fs = new FileStream(destinationFile, FileMode.Create, FileAccess.Write, FileShare.None))
{
using (Document doc = new Document())
{
PdfWriter writer = PdfWriter.GetInstance(doc, fs);
//Open the destination for writing
doc.Open();
//Loop through each page that we want to keep
foreach (int page in pagesToKeep)
{
//Add a new blank page to destination document
doc.NewPage();
//Extract the given page from our reader and add it directly to the destination PDF
writer.DirectContent.AddTemplate(writer.GetImportedPage(r, page), 0, 0);
}
//Close our document
doc.Close();
}
}
}
But when I open the "TestOutput.pdf" file in Acrobat Reader, all my fields are empty.
Any help?

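The fields are lost because GetImportedPage copies only a page's content, not its annotations, and form fields are annotations. Select the pages on the reader and let PdfStamper write the result instead. You need something like this:
PdfReader reader = new PdfReader(sourceFile);
reader.SelectPages("2-4,8-9");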
PdfStamper stp = new PdfStamper(reader, new FileStream(destinationFile, FileMode.Create));
stp.Close();
reader.Close();

Related

The document has no catalog object (meaning: it's an invalid PDF)

I am reading and writing to the same PDF at the same time, and I am getting the error "The document has no catalog object (meaning: it's an invalid PDF)" on the line "PdfReader pdfReader = new PdfReader(inputPdf2);" in the code snippet below.
iTextSharp.text.pdf.PdfCopy pdfCopy = null;
Document finalPDF = new Document();
//pdfReader = null;
FileStream fileStream = null;
int pageCount = 1;
int TotalPages = 20;
try
{
fileStream = new FileStream(finalPDFFile, FileMode.OpenOrCreate, FileAccess.Write);
pdfCopy = new PdfCopy(finalPDF, fileStream);
finalPDF.Open();
foreach (string inputPdf1 in inputPDFFiles)
{
if (File.Exists(inputPdf1))
{
var bytes = File.ReadAllBytes(inputPdf1);
PdfReader pdfReader = new PdfReader(bytes);
fileStream = new FileStream(inputPdf1, FileMode.Open, FileAccess.Write);
var stamper = new PdfStamper(pdfReader, fileStream);
var acroFields = stamper.AcroFields;
stamper.AcroFields.SetField(acrofiled.Key, "Page " + 1+ " of " + 16);
stamper.FormFlattening = true;
stamper.Close();
stamper.Dispose();
fileStream.Close();
fileStream.Dispose();
pdfReader.Close();
pdfReader.Dispose();
}
}
foreach (string inputPdf2 in inputPDFFiles)
{
if (File.Exists(inputPdf2))
{
PdfReader pdfReader = new PdfReader(inputPdf2);
int pageNumbers = pdfReader.NumberOfPages;
for (int pages = 1; pages <= pageNumbers; pages++)
{
PdfImportedPage page = pdfCopy.GetImportedPage(pdfReader, pages);
PdfCopy.PageStamp pageStamp = pdfCopy.CreatePageStamp(page);
pdfCopy.AddPage(page);
}
pdfReader.Close();
pdfReader.Dispose();
}
}
pdfCopy.Close();
pdfCopy.Dispose();
finalPDF.Close();
finalPDF.Dispose();
fileStream.Close();
fileStream.Dispose();
Please help me fix this issue, or suggest an alternative approach.
In your first loop you overwrite each of your files with a manipulated version like this:
var bytes = File.ReadAllBytes(inputPdf1);
PdfReader pdfReader = new PdfReader(bytes);
fileStream = new FileStream(inputPdf1, FileMode.Open, FileAccess.Write);
var stamper = new PdfStamper(pdfReader, fileStream);
[...]
Using FileMode.Open here is an error. You want to replace the existing file with a new one, and for such a use case you have to use FileMode.Create or FileMode.Truncate.
Using FileMode.Open leaves the original file content in place while you write over it. Thus, if your new file content is shorter than the original (which can happen when flattening a form), your new file keeps a tail segment of the original file. In a PDF, the relevant lookup information is at the end, so upon reading this new file the PdfReader finds the lookup information of the old file, which no longer matches the new content at all.
By the way, you create the PdfCopy like this:
fileStream = new FileStream(finalPDFFile, FileMode.OpenOrCreate, FileAccess.Write);
pdfCopy = new PdfCopy(finalPDF, fileStream);
This is wrong for the same reason: if a PDF already exists there, FileMode.OpenOrCreate works just like FileMode.Open, with the unwanted effects described above.
Thus, you should replace the FileMode values for streams you write to with FileMode.Create.
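The effect is easy to reproduce without iTextSharp. The following is a minimal sketch using only System.IO; the class name, method name, temp file name, and sample strings are made up for illustration and are not part of the original code. It writes a short payload over a longer existing file with each mode and reads back what ends up on disk.

```csharp
using System;
using System.IO;
using System.Text;

public static class FileModeTailDemo
{
    // Hypothetical helper: writes "NEW" over a longer existing file using
    // the given FileMode and returns the resulting file content.
    public static string Overwrite(FileMode mode)
    {
        string path = Path.Combine(Path.GetTempPath(), "filemode-tail-demo.txt");
        byte[] original = Encoding.ASCII.GetBytes("ORIGINAL-CONTENT-WITH-TRAILER");
        byte[] replacement = Encoding.ASCII.GetBytes("NEW");

        // Start from a file that is longer than the replacement.
        File.WriteAllBytes(path, original);

        using (var fs = new FileStream(path, mode, FileAccess.Write))
        {
            fs.Write(replacement, 0, replacement.Length);
        }

        string result = File.ReadAllText(path);
        File.Delete(path);
        return result;
    }

    public static void Main()
    {
        // FileMode.Open keeps the old tail after the new bytes.
        Console.WriteLine(Overwrite(FileMode.Open));   // NEWGINAL-CONTENT-WITH-TRAILER
        // FileMode.Create truncates first, so only the new bytes remain.
        Console.WriteLine(Overwrite(FileMode.Create)); // NEW
    }
}
```

With a PDF, the leftover tail is the old file's trailing lookup data, which is what PdfReader then trips over.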

How to read PDF layers and add them to another page using iTextSharp?

How can I read PDF layers and add them to another page using iTextSharp?
I want to copy layers from one PDF page and move them to another PDF page in C#.
I've tried reading a PDF layer from one page, but I am unable to copy that layer.
var document = new Document();
FileStream outfile = new FileStream(outPutFilePath, FileMode.Create);
var writer = new PdfCopy(document, outfile);
document.Open();
foreach (var fileName in filesPath)
{
var reader = new PdfReader(fileName);
PdfStamper stamper = new PdfStamper(reader, outfile);
Dictionary<String, PdfLayer> layers = stamper.GetPdfLayers();
//PdfLayer layer = layers.get("Nested layer 1");
//layer.setOn(false);
for (var i = 1; i <= reader.NumberOfPages; i++)
{
var page = writer.GetImportedPage(reader, i);
page.ContentTagged = true;
writer.AddPage(page);
}
stamper.Close();
reader.Close();
}
writer.Close();
document.Close();

Saving PDF to Local Disk C#

I am trying to save a PDF to the Documents folder. I have googled and come across a lot of resources, but none of them worked for me. I have tried using showfiledialog, which did not work. What I want is to save my PDF file to the Documents folder. I need this done for a school project, and this is the only part that has stumped me. So far this is my code:
private void savePDF_Click(object sender, EventArgs e)
{
FileStream fileStream = new FileStream(nameTxtB.Text + "Repair.pdf", FileMode.Create, FileAccess.Write, FileShare.None);
Document document = new Document();
PdfWriter pdfWriter = PdfWriter.GetInstance(document, fileStream);
pdfWriter.Open();
PdfContentByte cb = pdfWriter.DirectContent;
ColumnText ct = new ColumnText(cb);
document.Open();
...
You should add your content (nameTxtB.Text) to a Paragraph, not to the FileStream:
using System.IO;
using iTextSharp.text.pdf;
using iTextSharp.text;
static void Main(string[] args) {
// open the writer
string fileName = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.MyDocuments), "Repair.pdf");
FileStream fs = new FileStream(fileName, FileMode.Create, FileAccess.Write);
Document doc = new Document();
//Create a New instance of PDFWriter Class for Output File
PdfWriter.GetInstance(doc, fs);
//Open the Document
doc.Open();
//Add the content of Text File to PDF File
doc.Add(new Paragraph("Document Content"));
//Close the Document
doc.Close();
System.Diagnostics.Process.Start(fileName);
}
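If the name from the text box should still be part of the file name, as in the question's code, the two ideas can be combined. This is only a sketch: the class name, the Build method, and the customerName parameter are hypothetical, with customerName standing in for nameTxtB.Text.

```csharp
using System;
using System.IO;

public static class RepairPdfPath
{
    // customerName is a hypothetical stand-in for nameTxtB.Text.
    public static string Build(string customerName)
    {
        // Resolve the current user's Documents folder instead of relying
        // on the process's working directory.
        string documents = Environment.GetFolderPath(Environment.SpecialFolder.MyDocuments);
        return Path.Combine(documents, customerName + "Repair.pdf");
    }

    public static void Main()
    {
        // Prints the full target path for a sample name.
        Console.WriteLine(Build("Alice"));
    }
}
```

The returned path can then be passed to the FileStream constructor in place of the relative file name.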

Convert html to pdf and merge it with existing pdfs

I have a System.Net.Mail.MailMessage whose HTML body and PDF attachments need to be converted into one single PDF.
Converting the html body to pdf works for me with this answer
Converting the pdf attachments into one pdf works for me with this answer
However after ~10 hours of trying I can not come up with a combined solution which does both. All I'm getting are NullReferenceExceptions somewhere in IText source, "the document is not open", etc...
For example, this will throw no error but the resulting pdf will only contain the attachments but not the html email body:
Document document = new Document();
StringReader sr = new StringReader(mail.Body);
HTMLWorker htmlparser = new HTMLWorker(document);
using (FileStream fs = new FileStream(targetPath, FileMode.Create))
{
PdfCopy writer = new PdfCopy(document, fs);
document.Open();
htmlparser.Parse(sr);
foreach (string fileName in pdfList)
{
PdfReader reader = new PdfReader(fileName);
reader.ConsolidateNamedDestinations();
for (int i = 1; i <= reader.NumberOfPages; i++)
{
PdfImportedPage page = writer.GetImportedPage(reader, i);
writer.AddPage(page);
}
PRAcroForm form = reader.AcroForm;
if (form != null)
{
writer.CopyAcroForm(reader);
}
reader.Close();
}
writer.Close();
document.Close();
}
I'm using the LGPL licensed ITextSharp 4.1.6
From v4.1.6 fanboy to v4.1.6 fanboy :D
It looks like the HTMLWorker closes the document's stream right after parsing. As a workaround, you could create a PDF from your mail body in memory, and then add it together with the attachments to your final PDF.
Here is some code, that should do the trick:
StringReader htmlStringReader = new StringReader("<html><body>Hello World!!!!!!</body></html>");
byte[] htmlResult;
using (MemoryStream htmlStream = new MemoryStream())
{
Document htmlDoc = new Document();
PdfWriter htmlWriter = PdfWriter.GetInstance(htmlDoc, htmlStream);
htmlDoc.Open();
HTMLWorker htmlWorker = new HTMLWorker(htmlDoc);
htmlWorker.Parse(htmlStringReader);
htmlDoc.Close();
htmlResult = htmlStream.ToArray();
}
byte[] pdfResult;
using (MemoryStream pdfStream = new MemoryStream())
{
Document doc = new Document();
PdfCopy copyWriter = new PdfCopy(doc, pdfStream);
doc.Open();
PdfReader htmlPdfReader = new PdfReader(htmlResult);
AppendPdf(copyWriter, htmlPdfReader); // your foreach pdf code here
htmlPdfReader.Close();
PdfReader attachmentReader = new PdfReader("C:\\temp\\test.pdf");
AppendPdf(copyWriter, attachmentReader);
attachmentReader.Close();
doc.Close();
pdfResult = pdfStream.ToArray();
}
using (FileStream fs = new FileStream("C:\\temp\\test2.pdf", FileMode.Create, FileAccess.Write))
{
fs.Write(pdfResult, 0, pdfResult.Length);
}
private void AppendPdf(PdfCopy writer, PdfReader reader)
{
for (int i = 1; i <= reader.NumberOfPages; i++)
{
PdfImportedPage page = writer.GetImportedPage(reader, i);
writer.AddPage(page);
}
}
Of course, you could also use a FileStream directly for the final document instead of a MemoryStream.

Insert HTML directly in PDF using itextsharp

I have a WPF application in which the user enters some text in a rich text box (rtb). I convert that rtb string to HTML, then convert that HTML to an image, and then insert it into the PDF document:
using (Stream inputPdfStream = new FileStream("sample.pdf", FileMode.Open, FileAccess.Read, FileShare.Read))
using (Stream outputPdfStream = new FileStream("result2.pdf", FileMode.Create, FileAccess.Write, FileShare.None))
{
var reader = new PdfReader(inputPdfStream);
var stamper = new PdfStamper(reader, outputPdfStream);
PdfContentByte pdfContentByte = null;
int c = reader.NumberOfPages;
iTextSharp.text.Image image = iTextSharp.text.Image.GetInstance(ConvertXamltohtmltoImage(xamlstring));
foreach (var item in lst)
{
image.ScaleToFit(item._Size.Width, item._Size.Height);
image.SetAbsolutePosition(item.Location.X, item.Location.Y);
pdfContentByte = stamper.GetOverContent(item.pageNo);
pdfContentByte.AddImage(image);
}
stamper.Close();
}
My question is: can I insert HTML directly into a PDF?
You need an extra DLL to do that: http://sourceforge.net/projects/itextsharp/files/xmlworker/
See the demo: http://demo.itextsupport.com/xmlworker/
Unfortunately, the documentation hasn't been updated recently. We're working on it.
