How to redact a large rectangle of a PDF with iTextSharp? - c#

I tried to use iTextSharp 5.5.9 to redact PDF files. The problem is that when I redact a large rectangular area of a PDF, the file cannot be saved. This is the code:
PdfReader reader1 = new PdfReader(new FileStream(DesFile, FileMode.Open));
Stream fs = new FileStream(DesFile, FileMode.Open);
PdfStamper stamper = new PdfStamper(reader1, fs);
List<PdfCleanUpLocation> cleanUpLocations = new List<PdfCleanUpLocation>();
cleanUpLocations.Add(new PdfCleanUpLocation(1, new Rectangle(77f,77f,600f,600f), BaseColor.GRAY));
PdfCleanUpProcessor cleaner = new PdfCleanUpProcessor(cleanUpLocations, stamper);
cleaner.CleanUp();
stamper.Close();
reader1.Close();
I use http://sox.sourceforge.net/sox.pdf to test. If I change the Rectangle to
new Rectangle(77f,77f,200f,200f)
it works well. But when I change back to the larger Rectangle:
new Rectangle(77f,77f,600f,600f)
it stops working. Please help!

The iText developers usually warn against stamping to the same file that the underlying PdfReader reads from. If this is done as in the OP's code, read and write operations can get in each other's way, with unpredictable results.
After using different files to read from and write to, the OP's solution started working.
If one first reads the source file into memory as a byte[] and then constructs the PdfReader from that array, it is possible to use the same file as output of a PdfStamper operating on that reader. But this pattern is not recommended either: If some problem occurs during stamping, the original file contents may already have been removed, so you have neither the unstamped original PDF nor a stamped result PDF.
It might be embarrassing to have to explain to the client that his documents are completely gone for good...
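The advice above can be sketched roughly as follows, using only plain .NET file APIs. This is a sketch under assumptions, not iTextSharp's own mechanism: `SafeStamping`, `SafeRewrite`, the `stamp` callback, and the `.tmp`/`.bak` suffixes are hypothetical names chosen for illustration. The result is written to a temporary file and swapped in only after stamping has succeeded, and the previous contents are kept as a backup.

```csharp
using System;
using System.IO;

public static class SafeStamping
{
    // Hypothetical helper: 'stamp' receives the original bytes and an output
    // stream; in the OP's case it would wrap the PdfReader/PdfStamper calls.
    public static void SafeRewrite(string path, Action<byte[], Stream> stamp)
    {
        string tempPath = path + ".tmp";   // assumed naming convention
        string backupPath = path + ".bak"; // assumed naming convention

        // Read the source completely, so no handle on 'path' stays open.
        byte[] original = File.ReadAllBytes(path);

        try
        {
            using (var output = new FileStream(tempPath, FileMode.Create, FileAccess.Write))
            {
                stamp(original, output);
            }
        }
        catch
        {
            // Stamping failed: discard the partial result; the original is untouched.
            File.Delete(tempPath);
            throw;
        }

        // Only now replace the original; the old contents are kept as a backup.
        File.Replace(tempPath, path, backupPath);
    }
}
```

With iTextSharp, the callback might construct a PdfReader from the byte array and a PdfStamper on the output stream, run the clean-up, and close both before returning.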

Related

Replacing an image

I am trying to add or replace an image in an existing PDF file using iTextSharp. The file has 3 layers which are required by the printing company. The content on these layers cannot be merged.
Thus far I have tried many of the code examples (most don't seem to be in C#, and I can't find a conversion from Java). The closest example is:
PdfReader reader = new PdfReader(this.FrontPDFFile);
PdfStamper stamper = new PdfStamper(reader, new FileStream(this.OutputDirectory, FileMode.Create));
var pdfContentBuffer = stamper.GetOverContent(1);
// get image from our api
System.Drawing.Image image = GenerateQRCode("GUOIO", 1000, 1000);
// convert to itextsharp image and insert
iTextSharp.text.Image img = iTextSharp.text.Image.GetInstance(image, iTextSharp.text.BaseColor.WHITE);
img.SetAbsolutePosition(100, 100);
img.ScaleToFit(100, 100);
pdfContentBuffer.AddImage(img, true);
stamper.Close();
This generates the PDF with the image on it; however, when the file is opened in Illustrator the image is not shown. This is likely to have something to do with layers (I am told). Does anyone have any ideas?

How to convert PdfContentBytes to Array of Bytes

I'm using the iTextSharp DLL in ASP.NET.
PdfReader reader = new PdfReader(path);
//create footer
MemoryStream outStream = new MemoryStream();
PdfStamper textStamp = new PdfStamper(reader, outStream);
BaseFont baseFont = BaseFont.CreateFont(BaseFont.HELVETICA_BOLD, Encoding.ASCII.EncodingName, false);
for (int i = 1; i <= reader.NumberOfPages; i++)
{
PdfContentByte pdfPageContents = textStamp.GetOverContent(i);
//How to convert the PdfContentByte to array of bytes here?
}
I want to convert each page of the PDF to JPEG. How can I convert the PdfContentByte to an array of bytes here?
I don't think your plan is going to work. Not everything that looks like it lives on a "page" actually lives on a page; some things live in a global, shared location. So extracting a page's bytes would give you a corrupt document. You could extract every page of a PDF into a separate file, which would bring over these shared resources, but the result is still in PDF format. If you have already written a PDF-to-JPEG routine then maybe you're OK. If you haven't, then iTextSharp won't be able to help you.
iTextSharp doesn't (currently) "know" what a PDF "looks" like, it only knows the contents of the PDF. It "knows" that a run of text exists but it doesn't "know" how that should be rendered visually. It "knows" that a PDF might have two images but doesn't "know" or even care if they overlap, once again that's the renderer's problem.
Once again, if you've written a PDF-to-JPEG routine then disregard all that I'm saying. But the bytes of a PDF have nothing in common with the bytes of JPEG. Although a PDF may contain a JPEG it can also contain many other types of binary data. And that data is probably compressed inside of a stream, too.
Now, if you're looking to just extract images from a PDF, that is something that iTextSharp can help you with.
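To illustrate why the raw bytes cannot simply be reinterpreted as an image, here is a small sketch in plain .NET that checks the leading bytes of a buffer. It relies on two file-format facts: a PDF file starts with the ASCII characters "%PDF-", and a JPEG file starts with the bytes FF D8 FF. The names `FormatSniffer` and `Sniff` are hypothetical and not part of iTextSharp.

```csharp
using System;

public static class FormatSniffer
{
    // Hypothetical helper: guesses the file format from the leading bytes.
    public static string Sniff(byte[] data)
    {
        // PDF files begin with the ASCII characters "%PDF-".
        if (data.Length >= 5 && data[0] == 0x25 && data[1] == 0x50 &&
            data[2] == 0x44 && data[3] == 0x46 && data[4] == 0x2D)
        {
            return "pdf";
        }

        // JPEG files begin with the bytes FF D8 FF.
        if (data.Length >= 3 && data[0] == 0xFF && data[1] == 0xD8 && data[2] == 0xFF)
        {
            return "jpeg";
        }

        return "unknown";
    }
}
```

A page's content stream is neither of these: it is a sequence of drawing operators that a renderer still has to interpret.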
Try this:
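PdfReader reader = new PdfReader(path);
MemoryStream outStream = new MemoryStream();
PdfStamper textStamp = new PdfStamper(reader, outStream);
// Close the stamper first; the PDF is only fully written on Close()
textStamp.Close();
reader.Close();
byte[] content = outStream.ToArray();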
You can get a byte[] of a PdfContentByte as follows (in iText for Java the equivalent is getInternalBuffer().toByteArray()):
pdfPageContents.InternalBuffer.ToByteArray();

Creating report with Aspose.Word without losing formatting

I am using Aspose.Words to create reports from a template file (.docx filetype).
After using Aspose.Words to modify the template file and saving it into a new file, the formatting of the template file was lost (such as bold text, comments, etc.).
I have tried:
Aspose.Words.Document doc = new Document(inputStream);
var outputStream = new MemoryStream();
doc.Save(outputStream, SaveFormat.docx);
What I did not expect is that outputStream contains far fewer bytes than inputStream, although I have not yet made any modification to doc. This may be the reason why the report files lose their formatting.
What should I try now?
OK, the problem is that the version of Aspose.Words I'm currently using does not support the docx file type. It can still read the text of a .docx file, but only the text (without any associated formatting).

Using iTextSharp to write data to PDF works great, but Acrobat Reader asks 'Do you want to save changes' when closing file

I'm using iTextSharp 5.3.2.0 to add information to an existing PDF file that contains a W-2 form. Everything is working perfectly and the PDF file looks great when written into the browser's response stream; however, when the user is done looking at the PDF, he is asked "Do you want to save changes to 'W2.pdf' before closing?" every time he views the document from the web page.
In trying to narrow the problem down, I've actually stripped out all of my modifications but the problem continues. Here's the simple version of my code, with my data-writing call commented out:
PdfReader pdfReader = new PdfReader(dataSource.ReportTemplate);
using(MemoryStream outputStream = new MemoryStream())
using (PdfStamper pdfStamper = new PdfStamper(pdfReader, outputStream))
{
//dataSource.DrawDataFields(pdfStamper);
pdfStamper.FormFlattening = true;
return outputStream;
}
In this case, the "empty" PDF is written to the browser and looks good, but I still get asked, "Do you want to save" when I close the Acrobat window.
At this point I was thinking that there was something wrong with the source PDF file. However, when I send back the PDF file's raw bytes to the browser, I am NOT asked the "Do you want to save" question when using the code below.
byte[] bytes = File.ReadAllBytes(dataSource.ReportTemplate);
using (MemoryStream outputStream = new MemoryStream())
{
outputStream.Write(bytes, 0, bytes.Length);
return outputStream;
}
My conclusion is that iTextSharp is doing something "bad" to the PDF in the process of opening it and writing the bytes to the stream, but I'm new to iTextSharp and could easily be missing something.
FWIW, this is Acrobat Reader 10.1.4 that we're talking about.
EDIT: The original PDF used as a template is approximately 80K in size. If I look at the temporary file that's been streamed down through my browser, the PDF file written by iTextSharp is approximately 150K. However, when I answer "Yes" to the "Save Changes" question asked by Acrobat Reader, the resulting file is approximately 80K again. iTextSharp is definitely doing something unexpected to this file.
Non-working:
public byte[] MergeDataByDrawing(int copies)
{
PdfReader pdfReader = new PdfReader(reportTemplate);
using (MemoryStream outputStream = new MemoryStream())
using (PdfStamper pdfStamper = new PdfStamper(pdfReader, outputStream))
{
pdfStamper.FormFlattening = true;
return outputStream.GetBuffer();
}
}
Working:
public byte[] MergeDataByDrawing(int copies)
{
PdfReader pdfReader = new PdfReader(reportTemplate);
using (MemoryStream outputStream = new MemoryStream())
using (PdfStamper pdfStamper = new PdfStamper(pdfReader, outputStream))
{
pdfStamper.FormFlattening = true;
return outputStream.ToArray();
}
}
Seems the GetBuffer method is a problem. I don't understand why, but I'll take the result!
Props to MKL for giving me an idea and Fredrik for the right example at the right time.
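A likely explanation for the GetBuffer observation above, sketched with plain .NET only (no iTextSharp involved): `MemoryStream.GetBuffer()` returns the stream's whole internal buffer, including any unused capacity beyond `Length`, whereas `ToArray()` copies exactly the bytes that were written. Unused bytes after the end of a PDF could plausibly make a viewer treat the file as needing repair, although the source does not confirm that this is what happened here. The names `BufferDemo` and `Lengths` are made up for illustration.

```csharp
using System;
using System.IO;

public static class BufferDemo
{
    // Writes 'payload' into a MemoryStream and reports both array lengths:
    // index 0 is GetBuffer().Length, index 1 is ToArray().Length.
    public static int[] Lengths(byte[] payload)
    {
        using (var ms = new MemoryStream())
        {
            ms.Write(payload, 0, payload.Length);
            byte[] raw = ms.GetBuffer();   // internal buffer, may include unused capacity
            byte[] exact = ms.ToArray();   // copy of exactly ms.Length bytes
            return new[] { raw.Length, exact.Length };
        }
    }

    public static void Main()
    {
        int[] lengths = Lengths(new byte[] { 1, 2, 3, 4, 5 });
        Console.WriteLine("GetBuffer: " + lengths[0] + " bytes");
        Console.WriteLine("ToArray:   " + lengths[1] + " bytes");
    }
}
```

If GetBuffer() has to be used, only the first Length bytes of the returned array are valid. With a PdfStamper, the bytes are also only complete once the stamper has been closed.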
See http://itextpdf.com/history/?branch=52&node=521
Bugfix AcroForms: In some cases, Adobe Reader X asks if you want to
"save changes" after closing a flattened PDF form. This was due to the
presence of some unnecessary entries in the /AcroForm dictionary (for
instance added when the form was created with OOo).
I'm the Bruno who fixed this bug. I remember that it occurred in Adobe Reader 10, but not in Adobe Reader 9. I was able to fix the bug because the person reporting it was a customer who sent me a PDF that showed this behavior.
If you would share your PDF, we could take a look and see what other entries should be removed from the /AcroForm dictionary. I only removed those that were added when the form is created using Open Office. If you don't want to share the PDF, the cause will always remain a mystery.

iTextSharp for PDF - how add file attachments?

I am using iTextSharp to create a PDF document in C#. I would like to attach another file to the PDF. I'm having just loads of trouble trying to do so. The examples here show some annotations, which apparently attachments are.
This is what I've tried:
writer.AddAnnotation(its.pdf.PdfAnnotation.CreateFileAttachment(writer, new iTextSharp.text.Rectangle(100,100,100,100), "File Attachment", its.pdf.PdfFileSpecification.FileExtern(writer, "C:\\test.xml")));
Well, what happens is that it does add an annotation to the PDF (it appears as a little comment voice balloon), which I don't want. test.xml is shown in the attachments pane in Adobe Reader, but it can't be read or saved, and its file size is unknown, so it's likely that it's never being properly attached.
Any suggestions?
Well, I got some code working to attach it:
its.Document PDFD = new its.Document(its.PageSize.LETTER);
its.pdf.PdfWriter writer;
writer = its.pdf.PdfWriter.GetInstance(PDFD, new FileStream(targetpath, FileMode.Create));
its.pdf.PdfFileSpecification pfs = its.pdf.PdfFileSpecification.FileEmbedded(writer, "C:\\test.xml", "New.xml", null);
writer.AddFileAttachment(pfs);
where "its" is a namespace alias for iTextSharp.text (using its = iTextSharp.text;)
Now to read the attachment!
