I have a stream (a PDF file with annotations) and another stream (the same PDF file without annotations). I use streams because I need to perform these operations in memory.
I need to copy the annotations from the first document to the other. The annotations can be of different kinds: comments, highlighting, and others. So it would be better to copy the annotations without parsing them.
Can you recommend a helpful PDF library for .NET, and a sample for this problem?
You can use this iTextSharp example as a starting point for your problem (it copies a list of PDF files with annotations into a new PDF file):
var output = new MemoryStream();
using (var document = new Document(PageSize.A4, 70f, 70f, 20f, 20f))
{
var readers = new List<PdfReader>();
var writer = PdfWriter.GetInstance(document, output);
writer.CloseStream = false;
document.Open();
const Int32 requiredWidth = 500;
const Int32 zeroBottom = 647;
const Int32 left = 50;
Action<String, Action> inlcudePdfInDocument = (filename, e) =>
{
var reader = new PdfReader(filename);
readers.Add(reader);
var pageCount = reader.NumberOfPages;
for (var i = 0; i < pageCount; i++)
{
e?.Invoke();
var imp = writer.GetImportedPage(reader, (i + 1));
var scale = requiredWidth / imp.Width;
var height = imp.Height * scale;
writer.DirectContent.AddTemplate(imp, scale, 0, 0, scale, left, zeroBottom - height);
var annots = reader.GetPageN(i + 1).GetAsArray(PdfName.ANNOTS);
if (annots != null && annots.Size != 0)
{
foreach (var a in annots)
{
var newannot = new PdfAnnotation(writer, new Rectangle(0, 0));
var annotObj = (PdfDictionary) PdfReader.GetPdfObject(a);
newannot.PutAll(annotObj);
var rect = newannot.GetAsArray(PdfName.RECT);
// Rect is [llx, lly, urx, ury]; apply the same scale and offsets used for the page template above
rect[0] = new PdfNumber(((PdfNumber)rect[0]).DoubleValue * scale + left); // lower-left x
rect[1] = new PdfNumber(((PdfNumber)rect[1]).DoubleValue * scale + (zeroBottom - height)); // lower-left y
rect[2] = new PdfNumber(((PdfNumber)rect[2]).DoubleValue * scale + left); // upper-right x
rect[3] = new PdfNumber(((PdfNumber)rect[3]).DoubleValue * scale + (zeroBottom - height)); // upper-right y
writer.AddAnnotation(newannot);
}
}
document.NewPage();
}
}
foreach (var apprPdf in pdfs)
{
document.NewPage();
inlcudePdfInDocument(apprPdf.Pdf, null);
}
document.Close();
readers.ForEach(x => x.Close());
}
output.Position = 0;
return output;
PdfReader has a constructor that takes an array of bytes so you can adapt it for MemoryStream.
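A minimal sketch of that adaptation, assuming the annotated PDF is already held in a MemoryStream. The class and method names (ReaderHelper, OpenReader) are hypothetical; the only iTextSharp call used is the PdfReader(byte[]) constructor mentioned above.

```csharp
using System.IO;
using iTextSharp.text.pdf;

public static class ReaderHelper
{
    // Hypothetical helper: builds a PdfReader from a PDF held entirely in memory.
    public static PdfReader OpenReader(MemoryStream pdfStream)
    {
        // ToArray copies the whole buffer regardless of the current Position,
        // so there is no need to rewind the stream first.
        return new PdfReader(pdfStream.ToArray());
    }
}
```

In the lambda above, this would replace new PdfReader(filename) if the list held streams instead of file names.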
I'm using iTextSharp, which is forked from iText (a Java implementation for PDF editing).
http://sourceforge.net/projects/itextsharp/
http://itextpdf.com/
Edit - this is what you need to do (untested, but should be close):
using System;
using System.Collections.Generic;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;
// return processed stream (a new MemoryStream)
public Stream copyAnnotations(Stream sourcePdfStream, Stream destinationPdfStream)
{
// Create new document (iText)
Document outdoc = new Document(PageSize.A4);
// Seek to stream start and create reader for the annotated input PDF
sourcePdfStream.Seek(0, SeekOrigin.Begin);
PdfReader inputPdfReader = new PdfReader(sourcePdfStream);
// Seek to stream start and create reader for the destination PDF
destinationPdfStream.Seek(0, SeekOrigin.Begin);
PdfReader destinationPdfReader = new PdfReader(destinationPdfStream);
// Create a PdfWriter for a new PDF destination stream
// You should write into a new stream here!
Stream processedPdf = new MemoryStream();
PdfWriter pdfw = PdfWriter.GetInstance(outdoc, processedPdf);
// do not close the output stream when the document is closed
pdfw.CloseStream = false;
// Open document
outdoc.Open();
// get number of pages
int numPagesIn = inputPdfReader.NumberOfPages;
int numPagesOut = destinationPdfReader.NumberOfPages;
// Both documents must have the same number of pages
if (numPagesIn != numPagesOut)
{
throw new Exception("Impossible - different number of pages");
}
// Process PDF pages (page numbers are 1-based)
for (int i = 1; i <= numPagesIn; i++)
{
// Import pages from the corresponding reader
PdfImportedPage pageIn = pdfw.GetImportedPage(inputPdfReader, i);
PdfImportedPage pageOut = pdfw.GetImportedPage(destinationPdfReader, i);
// Start a new output page and draw the destination page onto it
outdoc.NewPage();
pdfw.DirectContent.AddTemplate(pageOut, 0, 0);
// Get the annotations to copy (placeholder, you have to implement this)
List<PdfAnnotation> toBeAdded = ParseInAndOutAndGetAnnotations(pageIn, pageOut);
// add your annotations to the current output page
foreach (PdfAnnotation anno in toBeAdded) pdfw.AddAnnotation(anno);
}
// PDF creation finished
outdoc.Close();
// your new destination stream is processedPdf; rewind it for the caller
processedPdf.Seek(0, SeekOrigin.Begin);
return processedPdf;
}
The implementation of ParseInAndOutAndGetAnnotations(pageIn, pageOut) needs to reflect your annotations.
Here is a good example with annotations: http://www.java2s.com/Open-Source/Java-Document/PDF/pdf-itext/com/lowagie/text/pdf/internal/PdfAnnotationsImp.java.htm
Related
I am trying to crop a PDF by 5 mm from every edge, i.e. top, bottom, right and left. I tried the code below:
public void TrimPdf(string sourceFilePath, string outputFilePath)
{
PdfReader pdfReader = new PdfReader(sourceFilePath);
float widthTo_Trim = iTextSharp.text.Utilities.MillimetersToPoints(5);
using (FileStream output = new FileStream(outputFilePath, FileMode.Create, FileAccess.Write))
using (PdfStamper pdfStamper = new PdfStamper(pdfReader, output))
{
for (int page = 1; page <= pdfReader.NumberOfPages; page++)
{
Rectangle cropBox = pdfReader.GetCropBox(page);
cropBox.Left += widthTo_Trim;
cropBox.Right += widthTo_Trim;
cropBox.Top += widthTo_Trim;
cropBox.Bottom += widthTo_Trim;
pdfReader.GetPageN(page).Put(PdfName.CROPBOX, new PdfRectangle(cropBox));
}
}
}
With this code I am able to crop only the left and bottom parts; I am unable to crop the top and right sides.
How can I get the desired result?
I solved my problem using the code below:
public void TrimLeftandRightFoall(string sourceFilePath, string outputFilePath, float cropwidth)
{
PdfReader pdfReader = new PdfReader(sourceFilePath);
float width = (float)GetPDFwidth(sourceFilePath);
float height = (float)GetPDFHeight(sourceFilePath);
float widthTo_Trim = iTextSharp.text.Utilities.MillimetersToPoints(cropwidth);
PdfRectangle rectLeftside = new PdfRectangle(widthTo_Trim, widthTo_Trim, width-widthTo_Trim , height-widthTo_Trim);
using (var output = new FileStream(outputFilePath, FileMode.CreateNew, FileAccess.Write))
{
// Create a new document
Document doc = new Document();
// Make a copy of the document
PdfSmartCopy smartCopy = new PdfSmartCopy(doc, output);
// Open the newly created document
doc.Open();
// Loop through all pages of the source document
for (int i = 1; i <= pdfReader.NumberOfPages; i++)
{
// Get a page
var page = pdfReader.GetPageN(i);
page.Put(PdfName.MEDIABOX, rectLeftside);
var copiedPage = smartCopy.GetImportedPage(pdfReader, i);
smartCopy.AddPage(copiedPage);
}
doc.Close();
}
}
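For comparison, the first attempt probably cropped only the left and bottom edges because PDF coordinates start at the lower-left corner: adding the trim width to Right and Top pushes those edges outward instead of inward. Below is a sketch of the same PdfStamper approach with the signs corrected. The class name CropHelper and the method name TrimmedCropBox are made up for illustration; the iTextSharp calls are the ones already used in the question.

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

public static class CropHelper
{
    // Hypothetical helper: shrinks a crop box by the same amount on all four edges.
    public static PdfRectangle TrimmedCropBox(Rectangle cropBox, float trim)
    {
        return new PdfRectangle(
            cropBox.Left + trim,    // move the left edge to the right
            cropBox.Bottom + trim,  // move the bottom edge up
            cropBox.Right - trim,   // move the right edge to the left
            cropBox.Top - trim);    // move the top edge down
    }

    public static void TrimPdf(string sourceFilePath, string outputFilePath, float millimeters)
    {
        float trim = Utilities.MillimetersToPoints(millimeters);
        PdfReader reader = new PdfReader(sourceFilePath);
        using (var output = new FileStream(outputFilePath, FileMode.Create, FileAccess.Write))
        using (var stamper = new PdfStamper(reader, output))
        {
            for (int page = 1; page <= reader.NumberOfPages; page++)
            {
                Rectangle cropBox = reader.GetCropBox(page);
                reader.GetPageN(page).Put(PdfName.CROPBOX, TrimmedCropBox(cropBox, trim));
            }
        }
        reader.Close();
    }
}
```

Unlike the MEDIABOX version above, this keeps each page's own size instead of applying one rectangle to every page.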
I used the following code to create a watermark and then remove it, and it worked perfectly:
private void Form1_Load(object sender, EventArgs e) {
string workingFolder = Environment.GetFolderPath(Environment.SpecialFolder.Desktop);
string startFile = Path.Combine(workingFolder, "StartFile.pdf");
string watermarkedFile = Path.Combine(workingFolder, "Watermarked.pdf");
string unwatermarkedFile = Path.Combine(workingFolder, "Un-watermarked.pdf");
string watermarkText = "This is a test";
//SECTION 1
//Create a 5 page PDF, nothing special here
using (FileStream fs = new FileStream(startFile, FileMode.Create, FileAccess.Write, FileShare.None)) {
using (Document doc = new Document(PageSize.LETTER)) {
using (PdfWriter writer = PdfWriter.GetInstance(doc, fs)) {
doc.Open();
for (int i = 1; i <= 5; i++) {
doc.NewPage();
doc.Add(new Paragraph(String.Format("This is page {0}", i)));
}
doc.Close();
}
}
}
//SECTION 2
//Create our watermark on a separate layer. The only difference here is that we are adding the watermark to a PdfLayer, which is an OCG or Optional Content Group
PdfReader reader1 = new PdfReader(startFile);
using (FileStream fs = new FileStream(watermarkedFile, FileMode.Create, FileAccess.Write, FileShare.None)) {
using (PdfStamper stamper = new PdfStamper(reader1, fs)) {
int pageCount1 = reader1.NumberOfPages;
//Create a new layer
PdfLayer layer = new PdfLayer("WatermarkLayer", stamper.Writer);
for (int i = 1; i <= pageCount1; i++) {
iTextSharp.text.Rectangle rect = reader1.GetPageSize(i);
//Get the ContentByte object
PdfContentByte cb = stamper.GetUnderContent(i);
//Tell the CB that the next commands should be "bound" to this new layer
cb.BeginLayer(layer);
cb.SetFontAndSize(BaseFont.CreateFont(BaseFont.HELVETICA, BaseFont.CP1252, BaseFont.NOT_EMBEDDED), 50);
PdfGState gState = new PdfGState();
gState.FillOpacity = 0.25f;
cb.SetGState(gState);
cb.SetColorFill(BaseColor.BLACK);
cb.BeginText();
cb.ShowTextAligned(PdfContentByte.ALIGN_CENTER, watermarkText, rect.Width / 2, rect.Height / 2, 45f);
cb.EndText();
//"Close" the layer
cb.EndLayer();
}
}
}
//SECTION 3
//Remove the layer created above
//First we bind a reader to the watermarked file, then strip out a bunch of things, and finally use a simple stamper to write out the edited reader
PdfReader reader2 = new PdfReader(watermarkedFile);
//NOTE, This will destroy all layers in the document, only use if you don't have additional layers
//Remove the OCG group completely from the document.
//reader2.Catalog.Remove(PdfName.OCPROPERTIES);
//Clean up the reader, optional
reader2.RemoveUnusedObjects();
//Placeholder variables
PRStream stream;
String content;
PdfDictionary page;
PdfArray contentarray;
//Get the page count
int pageCount2 = reader2.NumberOfPages;
//Loop through each page
for (int i = 1; i <= pageCount2; i++) {
//Get the page
page = reader2.GetPageN(i);
//Get the raw content
contentarray = page.GetAsArray(PdfName.CONTENTS);
if (contentarray != null) {
//Loop through content
for (int j = 0; j < contentarray.Size; j++) {
//Get the raw byte stream
stream = (PRStream)contentarray.GetAsStream(j);
//Convert to a string. NOTE, you might need a different encoding here
content = System.Text.Encoding.ASCII.GetString(PdfReader.GetStreamBytes(stream));
//Look for the OCG token in the stream as well as our watermarked text
if (content.IndexOf("/OC") >= 0 && content.IndexOf(watermarkText) >= 0) {
//Remove it by giving it zero length and zero data
stream.Put(PdfName.LENGTH, new PdfNumber(0));
stream.SetData(new byte[0]);
}
}
}
}
//Write the content out
using (FileStream fs = new FileStream(unwatermarkedFile, FileMode.Create, FileAccess.Write, FileShare.None)) {
using (PdfStamper stamper = new PdfStamper(reader2, fs)) {
}
}
this.Close();
}
}
}
But when I use a digital signature, sometimes some pages lose information when I try to remove the watermark, so I would like to use PdfLayerRemover instead. I converted it to C#, but when I run this:
PdfDictionary ocProps = _reader.Catalog.GetAsDict(PdfName.OCPROPERTIES);
I always get a null value, even though the PDF has the watermark. Can anyone please help? The PDF does have OCProperties, because the code below works on it, but it removes additional information. By the way, I am using iTextSharp version 5.5.2.0.
if (content.IndexOf("/OC") >= 0 && content.IndexOf(watermarkText) >= 0) {
//Remove it by giving it zero length and zero data
stream.Put(PdfName.LENGTH, new PdfNumber(0));
stream.SetData(new byte[0]);
}
I am trying to add text to an existing PDF file using iTextSharp. I have been reading many posts, including the popular thread here.
I have some differences:
My PDFs are X pages long
I want to keep everything in memory, and never have a file stored on my filesystem
So I tried to modify the code, so it takes in a byte array and returns a byte array. I have come this far:
The code compiles and runs
My out byte array has a different length than my in byte array
My problem:
I cannot see my added text when I later store the modified byte array and open it in my PDF reader
I don't get why. From every StackOverflow post I have seen, I am doing the same thing: using the DirectContent, I call BeginText and write some text. However, I cannot see it, no matter how I move the position around.
Any idea what is missing from my code?
public static byte[] WriteIdOnPdf(byte[] inPDF, string str)
{
byte[] finalBytes;
// open the reader
using (PdfReader reader = new PdfReader(inPDF))
{
Rectangle size = reader.GetPageSizeWithRotation(1);
using (Document document = new Document(size))
{
// open the writer
using (MemoryStream ms = new MemoryStream())
{
using (PdfWriter writer = PdfWriter.GetInstance(document, ms))
{
document.Open();
for (var i = 1; i <= reader.NumberOfPages; i++)
{
document.NewPage();
var baseFont = BaseFont.CreateFont(BaseFont.HELVETICA_BOLD, BaseFont.CP1252, BaseFont.NOT_EMBEDDED);
var importedPage = writer.GetImportedPage(reader, i);
var contentByte = writer.DirectContent;
contentByte.BeginText();
contentByte.SetFontAndSize(baseFont, 18);
var multiLineString = "Hello,\r\nWorld!";
contentByte.ShowTextAligned(PdfContentByte.ALIGN_LEFT, multiLineString,100, 200, 0);
contentByte.EndText();
contentByte.AddTemplate(importedPage, 0, 0);
}
document.Close();
ms.Close();
writer.Close();
reader.Close();
}
finalBytes = ms.ToArray();
}
}
}
return finalBytes;
}
The code below shows a fully working example of creating a PDF in memory and then performing a second pass, also in memory. It does what @mkl says and closes all iText parts before trying to grab the raw bytes from the stream. It also uses GetOverContent() to draw "on top" of the previous PDF. See the code comments for more details.
//Bytes will hold our final PDFs
byte[] bytes;
//Create an in-memory PDF
using (var ms = new MemoryStream()) {
using (var doc = new Document()) {
using (var writer = PdfWriter.GetInstance(doc, ms)) {
doc.Open();
//Create a bunch of pages and add text, nothing special here
for (var i = 1; i <= 10; i++) {
doc.NewPage();
doc.Add(new Paragraph(String.Format("First Pass - Page {0}", i)));
}
doc.Close();
}
}
//Right before disposing of the MemoryStream grab all of the bytes
bytes = ms.ToArray();
}
//Another in-memory PDF
using (var ms = new MemoryStream()) {
//Bind a reader to the bytes that we created above
using (var reader = new PdfReader(bytes)) {
//Store our page count
var pageCount = reader.NumberOfPages;
//Bind a stamper to our reader
using (var stamper = new PdfStamper(reader, ms)) {
//Setup a font to use
var baseFont = BaseFont.CreateFont(BaseFont.HELVETICA_BOLD, BaseFont.CP1252, BaseFont.NOT_EMBEDDED);
//Loop through each page
for (var i = 1; i <= pageCount; i++) {
//Get the raw PDF stream "on top" of the existing content
var cb = stamper.GetOverContent(i);
//Draw some text
cb.BeginText();
cb.SetFontAndSize(baseFont, 18);
cb.ShowText(String.Format("Second Pass - Page {0}", i));
cb.EndText();
}
}
}
//Once again, grab the bytes before closing things out
bytes = ms.ToArray();
}
//Just to see the final results I'm writing these bytes to disk but you could do whatever
var testFile = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "test.pdf");
System.IO.File.WriteAllBytes(testFile, bytes);
I have a method, which takes in the following:
Byte array, which is a PDF file
A "from" size
A "to" size
The idea is it transforms a PDF file with a specific size, to another size. I want to return a byte array, and want to keep the whole thing in memory.
I create the PdfWriter using a MemoryStream (outPDF) in the constructor, and then do my conversion. Afterwards, I want to call outBytes = outPDF.ToArray();.
I tried putting this code in three places; see place A, B and C in the code. In place A, the length of the MemoryStream is only 255, which doesn't work. My guess is that doc.Close() has to run first. In place B and C, the stream is closed and cannot be accessed.
My question is therefore:
How do I get a byte array from a PdfWriter that writes to a MemoryStream in iTextSharp?
My code:
public static byte[] ConvertPdfSize(byte[] inPDF, LetterSize fromSize, LetterSize toSize)
{
if (fromSize != LetterSize.A4 || toSize != LetterSize.Letter)
{
throw new ArgumentException("Function only supports from size A4 to size letter");
}
MemoryStream outPDF = new MemoryStream();
byte[] outBytes;
using (PdfReader pdfr = new PdfReader(inPDF))
{
using (Document doc = new Document(PageSize.LETTER))
{
Document.Compress = true;
PdfWriter writer = PdfWriter.GetInstance(doc, outPDF);
doc.Open();
PdfContentByte cb = writer.DirectContent;
PdfImportedPage page;
for (int i = 1; i < pdfr.NumberOfPages + 1; i++)
{
page = writer.GetImportedPage(pdfr, i);
cb.AddTemplate(page, PageSize.LETTER.Width / pdfr.GetPageSize(i).Width, 0, 0, PageSize.LETTER.Height / pdfr.GetPageSize(i).Height, 0, 0);
doc.NewPage();
}
// place A
doc.Close();
// place B
}
pdfr.Close();
// place C
}
return new byte[0];
}
Just return your bytes after all of the iTextSharp work is done, but before discarding the MemoryStream:
using(MemoryStream outPDF = new MemoryStream())
{
using (PdfReader pdfr = new PdfReader(inPDF))
{
using (Document doc = new Document(PageSize.LETTER))
{
//...
}
}
return outPDF.ToArray();
}
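One detail that may explain the confusion at places B and C: MemoryStream.ToArray() is documented to keep working after the stream has been closed, whereas members such as Length and Position throw on a closed stream. The following is a small sketch in plain .NET with no iTextSharp involved; the class and method names are made up for illustration, and a StreamWriter stands in for the PdfWriter.

```csharp
using System;
using System.IO;

public static class ClosedStreamDemo
{
    // Writes three ASCII characters, lets the writer close the stream,
    // and only then reads the bytes back.
    public static byte[] WriteThenRead()
    {
        var ms = new MemoryStream();
        using (var writer = new StreamWriter(ms))
        {
            writer.Write("abc");
        } // disposing the writer flushes it and closes ms as well

        return ms.ToArray();
    }

    // Shows that Length, unlike ToArray, is not usable on a closed stream.
    public static bool LengthThrowsWhenClosed()
    {
        var ms = new MemoryStream();
        ms.Close();
        try
        {
            long unused = ms.Length;
            return false;
        }
        catch (ObjectDisposedException)
        {
            return true;
        }
    }
}
```

So calling outPDF.ToArray() at place B or C should also work, as long as nothing else (for example Length or Position) is read from the closed stream.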
My requirement is to create an XPS document that has, say, 10 pages. I am using the following code to create an XPS document. Please take a look.
// Create the new document
XpsDocument xd = new XpsDocument("D:\\9780545325653.xps", FileAccess.ReadWrite);
IXpsFixedDocumentSequenceWriter xdSW = xd.AddFixedDocumentSequence();
IXpsFixedDocumentWriter xdW = xdSW.AddFixedDocument();
IXpsFixedPageWriter xpW = xdW.AddFixedPage();
fontURI = AddFontResourceToFixedPage(xpW, @"D:\arial.ttf");
image = AddJpegImageResourceToFixedPage(xpW, @"D:\Single content\20_1.jpg");
StringBuilder pageContents = new StringBuilder();
pageContents.Append(ReadFile(@"D:\Single content\20.fpage\20.fpage", i));
xmlWriter = xpW.XmlWriter;
xmlWriter.WriteRaw(pageContents.ToString());
}
xmlWriter.Close();
xpW.Commit();
// Commit the fixed document
xdW.Commit();
// Commit the fixed document sequence writer
xdSW.Commit();
// Commit the XPS document itself
xd.Close();
}
private static string AddFontResourceToFixedPage(IXpsFixedPageWriter pageWriter, String fontFileName)
{
string fontUri = "";
using (XpsFont font = pageWriter.AddFont(false))
{
using (Stream dstFontStream = font.GetStream())
using (Stream srcFontStream = File.OpenRead(fontFileName))
{
CopyStream(srcFontStream, dstFontStream);
// commit font resource to the package file
font.Commit();
}
fontUri = font.Uri.ToString();
}
return fontUri;
}
private static Int32 CopyStream(Stream srcStream, Stream dstStream)
{
const int size = 64 * 1024; // copy using 64K buffers
byte[] localBuffer = new byte[size];
int bytesRead;
Int32 bytesMoved = 0;
// reset stream pointers
srcStream.Seek(0, SeekOrigin.Begin);
dstStream.Seek(0, SeekOrigin.Begin);
// stream position is advanced automatically by stream object
while ((bytesRead = srcStream.Read(localBuffer, 0, size)) > 0)
{
dstStream.Write(localBuffer, 0, bytesRead);
bytesMoved += bytesRead;
}
return bytesMoved;
}
private static string ReadFile(string filePath,int i)
{
FileStream fs = new FileStream(filePath, FileMode.Open, FileAccess.ReadWrite);
StringBuilder sb = new StringBuilder();
using (StreamReader sr = new StreamReader(fs))
{
String line;
// Read and display lines from the file until the end of
// the file is reached.
while ((line = sr.ReadLine()) != null)
{
sb.AppendLine(line);
}
}
string allines = sb.ToString();
//allines = allines.Replace("FontUri=\"/Resources/f7728e4c-2606-4fcb-b963-d2d3f52b013b.odttf\"", "FontUri=\"" + fontURI + "\" ");
//XmlReader xmlReader = XmlReader.Create(fs, new XmlReaderSettings() { IgnoreComments = true });
XMLSerializer serializer = new XMLSerializer();
FixedPage fp = (FixedPage)serializer.DeSerialize(allines, typeof(FixedPage));
foreach (Glyphs glyph in fp.lstGlyphs)
{
glyph.FontUri = fontURI;
}
fp.Path.PathFill.ImageBrush.ImageSource = image;
fs.Close();
string fpageString = serializer.Serialize(fp);
return fpageString;
}
private static string AddJpegImageResourceToFixedPage(IXpsFixedPageWriter pageWriter, String imgFileName)
{
XpsImage image = pageWriter.AddImage("image/jpeg");
using (Stream dstImageStream = image.GetStream())
using (Stream srcImageStream = File.OpenRead(imgFileName))
{
CopyStream(srcImageStream, dstImageStream); // commit image resource to the package file
//image.Commit();
}
return image.Uri.ToString();
}
As you can see, I pass a single image and a single fpage to create the XPS document. How can I pass a list of fpages and a list of images to create an XPS document that has multiple pages?
You are doing this in the most excruciatingly difficult manner possible. I'd suggest taking the lazy man's route.
Realize that an XpsDocument is just a wrapper on a FixedDocumentSequence, which contains zero or more FixedDocuments, which contains zero or more FixedPages. All these types can be created, manipulated and combined without writing XML.
All you really need to do is create a FixedPage with whatever content on it you need. Here's an example:
static FixedPage CreateFixedPage(Uri imageSource)
{
FixedPage fp = new FixedPage();
fp.Width = 320;
fp.Height = 240;
Grid g = new Grid();
g.HorizontalAlignment = System.Windows.HorizontalAlignment.Center;
g.VerticalAlignment = System.Windows.VerticalAlignment.Center;
fp.Children.Add(g);
Image img = new Image
{
Source = new System.Windows.Media.Imaging.BitmapImage(imageSource),
};
g.Children.Add(img);
return fp;
}
This is all WPF. I'm creating a FixedPage that has as its root a Grid, which contains an Image that is loaded from the given Uri. The image will be stretched to fill the available space of the Grid. Or, you could do whatever you want. Create a template as a UserControl, send it text to place within itself, whatever.
Next, you just need to add a bunch of fixed pages to an XpsDocument. It's incredibly hard, so read carefully:
public void WriteAllPages(XpsDocument document, IEnumerable<FixedPage> pages)
{
var writer = XpsDocument.CreateXpsDocumentWriter(document);
foreach(var page in pages)
writer.Write(page);
}
And that's all you need to do. Create your pages, add them to your document. Done.
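An alternative sketch, assuming you would rather collect the pages into a single FixedDocument and hand it to the writer in one Write call. It follows the containment described earlier (a FixedDocument holds pages, each wrapped in a PageContent). The class name XpsPages and the method name BuildDocument are made up for illustration; like any WPF code, it is expected to run on an STA thread.

```csharp
using System.Collections.Generic;
using System.Windows.Documents;
using System.Windows.Markup;
using System.Windows.Xps;
using System.Windows.Xps.Packaging;

public static class XpsPages
{
    // Wraps each FixedPage in a PageContent and collects them in one FixedDocument.
    public static FixedDocument BuildDocument(IEnumerable<FixedPage> pages)
    {
        var document = new FixedDocument();
        foreach (var page in pages)
        {
            var content = new PageContent();
            ((IAddChild)content).AddChild(page);
            document.Pages.Add(content);
        }
        return document;
    }

    // Writes the whole FixedDocument with a single Write call.
    public static void WriteAllPages(XpsDocument target, IEnumerable<FixedPage> pages)
    {
        XpsDocumentWriter writer = XpsDocument.CreateXpsDocumentWriter(target);
        writer.Write(BuildDocument(pages));
    }
}
```

Each FixedPage can belong to only one document, so create fresh pages (for example with CreateFixedPage above) for every document you build.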