How to convert PdfContentByte to an array of bytes - C#

I'm using the iTextSharp DLL in ASP.NET.
PdfReader reader = new PdfReader(path);
// create footer
MemoryStream outStream = new MemoryStream();
PdfStamper textStamp = new PdfStamper(reader, outStream);
// BaseFont expects an iText encoding name such as BaseFont.CP1252, not a .NET Encoding display name
BaseFont baseFont = BaseFont.CreateFont(BaseFont.HELVETICA_BOLD, BaseFont.CP1252, BaseFont.NOT_EMBEDDED);
for (int i = 1; i <= reader.NumberOfPages; i++)
{
    PdfContentByte pdfPageContents = textStamp.GetOverContent(i);
    // How to convert the PdfContentByte to an array of bytes here?
}
I want to convert each page of the PDF to JPEG. How do I convert the PdfContentByte to an array of bytes here?

I don't think your plan is going to work. Not everything that looks like it lives on a "page" actually lives on that page; some things (fonts and images, for example) live in a shared location and are only referenced from the page. So extracting a page's bytes would give you a corrupt document. You could extract each page of a PDF into a separate file, which would bring these shared resources along, but the result would still be in PDF format. If you have already written a PDF-to-JPEG routine, then maybe you're OK. If you haven't, iTextSharp won't be able to help you.
iTextSharp doesn't (currently) "know" what a PDF "looks" like, it only knows the contents of the PDF. It "knows" that a run of text exists but it doesn't "know" how that should be rendered visually. It "knows" that a PDF might have two images but doesn't "know" or even care if they overlap, once again that's the renderer's problem.
Once again, if you've written a PDF-to-JPEG routine, then disregard all that I'm saying. But the bytes of a PDF have nothing in common with the bytes of a JPEG. Although a PDF may contain a JPEG, it can also contain many other types of binary data, and that data is probably compressed inside a stream, too.
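Here is a concrete illustration of that last point. A PDF file normally starts with the ASCII header `%PDF-`, while a JPEG file starts with the bytes `FF D8 FF`. The sketch below checks those leading bytes. `FileSignature` is a hypothetical helper and is not part of iTextSharp or .NET. It only tells the two formats apart and does not convert anything.

```csharp
using System;
using System.Text;

// Hypothetical helper: guess a file's format from its leading "magic" bytes.
public static class FileSignature
{
    // A PDF file normally begins with the ASCII header "%PDF-" (e.g. "%PDF-1.4").
    private static readonly byte[] PdfHeader = Encoding.ASCII.GetBytes("%PDF-");

    // A JPEG file begins with the Start Of Image marker FF D8, followed by another marker byte FF.
    private static readonly byte[] JpegHeader = { 0xFF, 0xD8, 0xFF };

    // True if 'data' begins with every byte of 'prefix'.
    public static bool StartsWith(byte[] data, byte[] prefix)
    {
        if (data == null || data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    // Returns "PDF", "JPEG" or "unknown" based on the header bytes only.
    public static string Describe(byte[] data)
    {
        if (StartsWith(data, PdfHeader)) return "PDF";
        if (StartsWith(data, JpegHeader)) return "JPEG";
        return "unknown";
    }
}
```

For example, `FileSignature.Describe(File.ReadAllBytes(path))` on the input file would report "PDF". No amount of byte copying turns that into something a JPEG decoder accepts; the page has to be rendered first.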
Now, if you're looking to just extract images from a PDF, that is something that iTextSharp can help you with.

Try this:
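PdfReader reader = new PdfReader(path);
MemoryStream outStream = new MemoryStream();
PdfStamper textStamp = new PdfStamper(reader, outStream);
// PdfStamper writes the complete document when it is closed, so close it first;
// ToArray() still works on a closed MemoryStream.
textStamp.Close();
// content now holds the whole PDF file, not a JPEG rendering of it
byte[] content = outStream.ToArray();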

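You can get a byte[] of a PdfContentByte as follows:
byte[] pageContent = pdfPageContents.InternalBuffer.ToByteArray();
This holds only the PDF content-stream operators written through that PdfContentByte (for GetOverContent(), only what you add yourself), not a rendered image of the page.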

Related

How to redact a large rectangle of a PDF by iTextSharp?

I tried to use iTextSharp 5.5.9 to redact PDF files. The problem is that when I redact a large rectangle on a PDF, it cannot save the file. This is the code:
PdfReader reader1 = new PdfReader(new FileStream(DesFile, FileMode.Open));
Stream fs = new FileStream(DesFile, FileMode.Open);
PdfStamper stamper = new PdfStamper(reader1, fs);
List<PdfCleanUpLocation> cleanUpLocations = new List<PdfCleanUpLocation>();
cleanUpLocations.Add(new PdfCleanUpLocation(1, new Rectangle(77f,77f,600f,600f), BaseColor.GRAY));
PdfCleanUpProcessor cleaner = new PdfCleanUpProcessor(cleanUpLocations, stamper);
cleaner.CleanUp();
stamper.Close();
reader1.Close();
I use http://sox.sourceforge.net/sox.pdf to test. If I change the Rectangle to
new Rectangle(77f,77f,200f,200f)
it works well. But when I change it back to the larger Rectangle:
new Rectangle(77f,77f,600f,600f)
It stops working. Please help!
The iText developers usually warn against stamping to the same file that the underlying PdfReader reads from. If this is done as in the OP's code, the reading and writing operations can get in each other's way, with unpredictable results.
After using different files to read from and write to, the OP's solution started working.
If one first reads the source file into memory as a byte[] and then constructs the PdfReader from that array, it is possible to use the same file as output of a PdfStamper operating on that reader. But this pattern is not recommended either: If some problem occurs during stamping, the original file contents may already have been removed, so you have neither the unstamped original PDF nor a stamped result PDF.
It might be embarrassing to have to explain to the client that his documents are completely gone for good...
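A safer variant writes the result to a separate temporary file. The original is replaced only after stamping has finished successfully. Below is a minimal sketch that uses only .NET file APIs. `SafeFileUpdate`, `UpdateInPlace` and the `transform` callback are hypothetical names. The callback stands in for your PdfReader/PdfStamper work, for example constructing the reader from the byte[] and stamping into the given output stream.

```csharp
using System;
using System.IO;

// Hypothetical helper: update a file without truncating the original before the new content exists.
public static class SafeFileUpdate
{
    // 'transform' is a stand-in for the stamping step: it receives the original bytes
    // and must write the complete result to the output stream.
    public static void UpdateInPlace(string path, Action<byte[], Stream> transform)
    {
        // Read the source fully, so reading never competes with writing.
        byte[] original = File.ReadAllBytes(path);

        // Temporary file in the same directory, so File.Replace can swap it in.
        string tempPath = path + ".tmp";
        try
        {
            using (var output = new FileStream(tempPath, FileMode.Create, FileAccess.Write))
            {
                transform(original, output);
            }

            // Only reached if 'transform' succeeded: replace the original with the result.
            File.Replace(tempPath, path, null);
        }
        finally
        {
            // On failure the temporary file is discarded and the original stays untouched.
            if (File.Exists(tempPath)) File.Delete(tempPath);
        }
    }
}
```

If the callback throws, the exception propagates, the temporary file is deleted and the original PDF is still there. That avoids the "documents are completely gone" scenario.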

Replacing an image

I am trying to add or replace an image in an existing PDF file using iTextSharp. The file has 3 layers which are required by the printing company. The content on these layers cannot be merged.
Thus far I have tried many code examples (most of them seem to be in Java rather than C#, and I can't find how to convert them). The closest example is:
PdfReader reader = new PdfReader(this.FrontPDFFile);
PdfStamper stamper = new PdfStamper(reader, new FileStream(this.OutputDirectory, FileMode.Create));
var pdfContentBuffer = stamper.GetOverContent(1);
// get image from our api
System.Drawing.Image image = GenerateQRCode("GUOIO", 1000, 1000);
// convert to itextsharp image and insert
iTextSharp.text.Image img = iTextSharp.text.Image.GetInstance(image, iTextSharp.text.BaseColor.WHITE);
img.SetAbsolutePosition(100, 100);
img.ScaleToFit(100, 100);
pdfContentBuffer.AddImage(img, true);
stamper.Close();
This generates the PDF with the image on it; however, when I open it in Illustrator, the image is not shown. I am told this is likely to have something to do with layers. Anyone got any ideas?

How can I make a WebBrowser.Navigate analog for a byte array instead of a URL?

I'm planning to load PDF files into it, but I can't save them to disk. The PDFs exist only as byte arrays in my program.
For text data I can use something like this:
webBrowser1.DocumentText = "<html>page content</html>";
But a PDF is not text, so I need some other way, and I can't find one.
I tried this:
byte[] file_content = File.ReadAllBytes("C:\\Users\\Metafalica\\Documents\\DatabaseSQLLanguageRzheutskaya.pdf");
MemoryStream ms = new MemoryStream(file_content);
ms.Flush();
ms.Position = 0;
webBrowser1.DocumentStream = ms;
But I get this:
It's not possible to load and render a PDF via webBrowser.DocumentStream. Behind the scenes, an instance of the MSHTML Document Object is created and initialized with the supplied stream. You could possibly load an image (whose MIME type is recognized by MSHTML), but not a PDF. On the other hand, when webBrowser.Navigate is used, an instance of the Adobe Acrobat Reader PDF Document is created rather than MSHTML.

Merging two PDF pages on top of each other

I am looking for a way to merge the content of two pdf pages.
It could be a watermark, an image or whatever.
The scenario is as follows:
I have a Word-addin that allows the user to create different templates for different customers based on several template forms. For each new customer, the user can provide a new letter paper containing header image / logos and footer. This shall be applied anyhow at runtime. Could be an image that is loaded directly into the header of the template (then I would need to render pdf to image, for the letter paper will mostly be provided as pdf-file) or when exporting the document (merging letter paper as background).
But the template shall not be accessible by the user, so this must be done programmatically.
So far, I tried the PdfSharp library, which supports neither the PDF version of my provided letter papers nor the version of my documents created in Word 2007.
iTextSharp seemed very promising, but I could not manage to merge the contents so far.
I also tried pdftk.exe, but even when I ran it manually from the command line, I got the error: "Done. Input errors, so no output created."
It does not matter how it is handled; what matters is the output.
I forgot to mention: there is a whiteline created in the Word template for archiving purposes, so this part may not be added as an image, or it has to be added afterwards into the output document.
Thanks in advance!
StampStationery.cs, a sample from the Webified iTextSharp Examples which essentially are the C#/iTextSharp versions of the Java/iText samples from the book iText in Action — 2nd Edition, does show how to add the contents of a page from one PDF document as stationery behind the content of each page of another PDF.
The central method is this:
public byte[] ManipulatePdf(byte[] src, byte[] stationery)
{
    // Create readers
    PdfReader reader = new PdfReader(src);
    PdfReader s_reader = new PdfReader(stationery);
    using (MemoryStream ms = new MemoryStream())
    {
        // Create the stamper
        using (PdfStamper stamper = new PdfStamper(reader, ms))
        {
            // Add the stationery to each page
            PdfImportedPage page = stamper.GetImportedPage(s_reader, 1);
            int n = reader.NumberOfPages;
            PdfContentByte background;
            for (int i = 1; i <= n; i++)
            {
                background = stamper.GetUnderContent(i);
                background.AddTemplate(page, 0, 0);
            }
        }
        return ms.ToArray();
    }
}
This method returns the manipulated PDF as a byte[].

Using iTextSharp to write data to PDF works great, but Acrobat Reader asks 'Do you want to save changes' when closing file

I'm using iTextSharp 5.3.2.0 to add information to an existing PDF file that contains a W-2 form. Everything is working perfectly and the PDF file looks great when written into the browser's response stream; however, when the user is done looking at the PDF, he is asked "Do you want to save changes to 'W2.pdf' before closing?" every time he views the document from the web page.
In trying to narrow the problem down, I've actually stripped out all of my modifications but the problem continues. Here's the simple version of my code, with my data-writing call commented out:
PdfReader pdfReader = new PdfReader(dataSource.ReportTemplate);
using(MemoryStream outputStream = new MemoryStream())
using (PdfStamper pdfStamper = new PdfStamper(pdfReader, outputStream))
{
//dataSource.DrawDataFields(pdfStamper);
pdfStamper.FormFlattening = true;
return outputStream;
}
In this case, the "empty" PDF is written to the browser and looks good, but I still get asked, "Do you want to save" when I close the Acrobat window.
At this point I was thinking that there was something wrong with the source PDF file. However, when I send back the PDF file's raw bytes to the browser, I am NOT asked the "Do you want to save" question when using the code below.
byte[] bytes = File.ReadAllBytes(dataSource.ReportTemplate);
using (MemoryStream outputStream = new MemoryStream())
{
outputStream.Write(bytes, 0, bytes.Length);
return outputStream;
}
My conclusion is that iTextSharp is doing something "bad" to the PDF in the process of opening it and writing the bytes to the stream, but I'm new to iTextSharp and could easily be missing something.
FWIW, this is Acrobat Reader 10.1.4 that we're talking about.
EDIT: The original PDF used as a template is approximately 80K in size. If I look at the temporary file that's been streamed down through my browser, the PDF file written by iTextSharp is approximately 150K. However, when I answer "Yes" to the "Save Changes" question asked by Acrobat Reader, the resulting file is approximately 80K again. iTextSharp is definitely doing something unexpected to this file.
Non-working:
public byte[] MergeDataByDrawing(int copies)
{
PdfReader pdfReader = new PdfReader(reportTemplate);
using (MemoryStream outputStream = new MemoryStream())
using (PdfStamper pdfStamper = new PdfStamper(pdfReader, outputStream))
{
pdfStamper.FormFlattening = true;
return outputStream.GetBuffer();
}
}
Working:
public byte[] MergeDataByDrawing(int copies)
{
PdfReader pdfReader = new PdfReader(reportTemplate);
using (MemoryStream outputStream = new MemoryStream())
using (PdfStamper pdfStamper = new PdfStamper(pdfReader, outputStream))
{
pdfStamper.FormFlattening = true;
return outputStream.ToArray();
}
}
It seems the GetBuffer method is the problem. I don't understand why, but I'll take the result!
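A likely explanation is the difference between the two MemoryStream methods. `MemoryStream.GetBuffer()` returns the stream's entire internal array, including unused capacity after the last written byte. `ToArray()` copies exactly `Length` bytes. Sending the whole buffer appends a run of zero bytes after the end of the PDF, which could also account for the larger file size mentioned in the edit. Acrobat Reader may then treat the file as damaged and offer to save a repaired copy. The BCL-only sketch below shows the difference; `BufferDemo` and `Snapshot` are hypothetical names.

```csharp
using System;
using System.IO;

public static class BufferDemo
{
    // Writes 'payload' to a fresh MemoryStream and returns both views of its contents.
    public static (byte[] buffer, byte[] array) Snapshot(byte[] payload)
    {
        using (var ms = new MemoryStream())
        {
            ms.Write(payload, 0, payload.Length);

            // The whole internal array: its length is the stream's capacity,
            // and everything past ms.Length is unused (zero-filled) space.
            byte[] buffer = ms.GetBuffer();

            // A copy of exactly ms.Length bytes: the data that was actually written.
            byte[] array = ms.ToArray();

            return (buffer, array);
        }
    }
}
```

For a 5-byte payload, `array.Length` is 5, while `buffer` is larger because the stream allocates spare capacity. If you do want to use `GetBuffer()` to avoid a copy, send only the first `ms.Length` bytes of it.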
Props to MKL for giving me an idea and Fredrik for the right example at the right time.
See http://itextpdf.com/history/?branch=52&node=521
Bugfix AcroForms: In some cases, Adobe Reader X asks if you want to
"save changes" after closing a flattened PDF form. This was due to the
presence of some unnecessary entries in the /AcroForm dictionary (for
instance added when the form was created with OOo).
I'm the Bruno who fixed this bug. I remember that it occurred in Adobe Reader 10, but not in Adobe Reader 9. I was able to fix the bug because the person reporting it was a customer who sent me a PDF that showed this behavior.
If you would share your PDF, we could take a look and see what other entries should be removed from the /AcroForm dictionary. I only removed those that were added when the form is created using Open Office. If you don't want to share the PDF, the cause will always remain a mystery.
