itext7 update annotation text - c#

I would like to update text content within a FreeText annotation when I copy the annotation from one PDF document to another, but for some reason the text does not update in the final PDF using the approach shown below. The annotation object updates, but the final result within the PDF does not reflect the updated content for the FreeText annotation type. Strangely, Ink type annotations do get updated with the revised content, as it shows up in the form of a sticky note looking comment overlaid on top of the Ink annotation itself.
Here's a quick snippet of the code I've used (if needed I can add more):
foreach (var anno in annots)
{
var a = anno.GetPdfObject().CopyTo(masterPdfDoc);
PdfAnnotation ano = PdfAnnotation.MakeAnnotation(a);
var contents = ano.GetContents().ToString();
ano.SetContents(new PdfString("COMMENT: " + contents));
//ano.Put(PdfName.Contents, new PdfString("COMMENT: " + contents));
masterDocPage.AddAnnotation(ano);
}
Would greatly appreciate any advice provided. Thanks

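A viewer renders a FreeText annotation from its appearance stream (the /AP entry), not from /Contents, so changing only /Contents leaves the displayed text unchanged. The following code snippet copies the FreeText annotations from one PDF (i.e. annots), modifies the text inside their appearance streams, and saves the modified annotations into a new PDF. A good chunk of the code is similar to the answer of this post but was updated for iText7.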
foreach (var anno in annots)
{
var a = anno.GetPdfObject().CopyTo(masterPdfDoc);
PdfAnnotation ano = PdfAnnotation.MakeAnnotation(a);
var apDict = ano.GetAppearanceDictionary();
if (apDict == null)
{
Console.WriteLine("No appearances.");
continue;
}
foreach (PdfName key in apDict.KeySet())
{
Console.WriteLine("Appearance: {0}", key);
PdfStream value = apDict.GetAsStream(key);
if (value != null)
{
var text = ExtractAnnotationText(value);
Console.WriteLine("Extracted Text: {0}", text);
if (text != "")
{
var valueString = Encoding.ASCII.GetString(value.GetBytes());
value.SetData(Encoding.ASCII.GetBytes(valueString.Replace(text, "COMMENT: " + text)));
}
}
}
masterDocPage.AddAnnotation(ano);
}
public static String ExtractAnnotationText(PdfStream xObject)
{
PdfResources resources = new PdfResources(xObject.GetAsDictionary(PdfName.Resources));
ITextExtractionStrategy strategy = new LocationTextExtractionStrategy();
PdfCanvasProcessor processor = new PdfCanvasProcessor(strategy);
processor.ProcessContent(xObject.GetBytes(), resources);
var text = strategy.GetResultantText();
return text;
}
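One caveat with the Replace call above: inside a content stream the text sits in a PDF literal string delimited by ( and ), so text containing parentheses or backslashes is stored in escaped form. The following is a minimal sketch in plain C# (no iText calls), assuming the text is stored as a single literal string shown by one Tj operator. The names ContentStreamPatch, EscapePdfLiteral and PrefixText, as well as the sample stream, are hypothetical and only serve the illustration.

```csharp
using System;
using System.Text;

public static class ContentStreamPatch
{
    // Escapes the characters that are special inside a PDF literal string: \ ( )
    public static string EscapePdfLiteral(string text)
    {
        var sb = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            if (c == '\\' || c == '(' || c == ')')
                sb.Append('\\');
            sb.Append(c);
        }
        return sb.ToString();
    }

    // Replaces every "(text)" literal in the stream with "(prefix + text)".
    // Assumption: the text is stored as one escaped literal, not split over several operators.
    public static string PrefixText(string stream, string text, string prefix)
    {
        string oldLiteral = "(" + EscapePdfLiteral(text) + ")";
        string newLiteral = "(" + EscapePdfLiteral(prefix + text) + ")";
        return stream.Replace(oldLiteral, newLiteral);
    }

    public static void Main()
    {
        // Hypothetical appearance stream content; the extracted text would be "Review (draft)"
        string stream = "BT /F1 12 Tf (Review \\(draft\\)) Tj ET";
        Console.WriteLine(PrefixText(stream, "Review (draft)", "COMMENT: "));
        // prints: BT /F1 12 Tf (COMMENT: Review \(draft\)) Tj ET
    }
}
```

Real appearance streams may split the text across several Tj or TJ operators, or use hex strings and font-specific encodings. In those cases a plain string replacement finds nothing and the stream is left unchanged.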

Related

How to copy content of Rich Text Content Control from Word document and remove the control itself using Open XML SDK

I'm trying to copy the content of Rich Text Content Control(s) from one to another Word document. Each of Rich Text Content Control(s) contains a text block and a few of Plain Text Content Controls. The code below seems to work...
using (WordprocessingDocument doc = WordprocessingDocument.Open(destinationFile, true))
{
MainDocumentPart mainPart = doc.MainDocumentPart;
Dictionary<string, SdtBlock> sdtBlocks = getContentControlsFromDocument(sourceFile);
foreach (KeyValuePair<string, SdtBlock> sdtBlock in sdtBlocks)
{
SdtElement control = mainPart.Document.Body.Descendants<SdtElement>().Where(r =>
{
var tag = r.SdtProperties.GetFirstChild<Tag>();
return tag != null && tag.Val == sdtBlock.Key.ToLower();
}).FirstOrDefault();
SdtContentBlock cloneSdtContentBlock = (SdtContentBlock)sdtBlock.Value.Descendants<SdtContentBlock>().FirstOrDefault().Clone();
control.Parent.InsertAfter(cloneSdtContentBlock, control);
control.Remove();
}
mainPart.Document.Save();
}
but when I try to find all the Content Controls within the destinationFile using the code below
string key = "tag_name";
List<SdtElement> controls = mainPart.Document.Body.Descendants<SdtElement>().Where(r =>
{
var tag = r.SdtProperties.GetFirstChild<Tag>();
return tag != null && tag.Val == key.ToLower();
}).ToList();
I can't find the ones that were/are within the Rich Text Content Control copied from the sourceFile. In other words, I'd like to copy only the content of the Rich Text Content Control without the control itself.
Update:
To simplify the question: I have a Rich Text Content Control which may contain plain text and a couple of Plain Text Content Controls. All I need is to copy only the content within this Rich Text Content Control, which wraps the whole thing, to another Word document.
In the meantime I managed to solve the problem myself. Here is the solution so it might be useful to someone else.
using (WordprocessingDocument doc = WordprocessingDocument.Open(@"C:\tmp\test-1.docx", true))
{
MainDocumentPart mainPart = doc.MainDocumentPart;
foreach (var conditialTemplate in conditionalTemplates)
{
List<SdtElement> controls = mainPart.Document.Body.Descendants<SdtElement>().Where(r =>
{
var tag = r.SdtProperties.GetFirstChild<Tag>();
return tag != null && tag.Val == conditialTemplate.ToLower();
}).ToList();
foreach (var control in controls)
{
if (control != null)
{
SdtProperties props = control.Elements<SdtProperties>().FirstOrDefault();
Tag tag = props.Elements<Tag>().FirstOrDefault();
Console.WriteLine("Tag: " + tag.Val);
string theRightBlock = "A";
SdtBlock theRightSdtBlock = GetTheRightConditionalSdtBlock(theRightBlock, tag.Val);
if (theRightSdtBlock != null)
{
OpenXmlElement parent = control.Parent;
SdtBlock clone = new SdtBlock();
clone = (SdtBlock)theRightSdtBlock.Clone();
var elements = clone.GetFirstChild<SdtContentBlock>().ChildElements.ToList();
elements.Reverse();
elements.ForEach(child =>
{
parent.InsertAfter(child.Clone() as OpenXmlElement, control);
});
control.Remove();
}
}
}
}
mainPart.Document.Save();
}

C# OpenXML How to Replace \r\n with Break()?

I have a text field in my database and it has a text with many lines.
When generating an MS Word document using OpenXML and bookmarks, the text becomes one single line.
I've noticed that at each new line the bookmark value shows the characters "\r\n".
Looking for a solution, I've found some answers which helped me, but I'm still having a problem.
I've used the run.Append(new Break()); solution, but the text replaced is showing the name of the bookmark as well.
For example:
bookmark test = "Big text here in first paragraph\r\nSecond paragraph".
It is shown in MS Word document like:
testBig text here in first paragraph
Second paragraph
Can anyone, please, help me to eliminate the bookmark name?
Here is my code:
public void UpdateBookmarksVistoria(string originalPath, string copyPath, string fileType)
{
string wordmlNamespace = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
// Make a copy of the template file.
File.Copy(originalPath, copyPath, true);
//Open the document as an Open XML package and extract the main document part.
using (WordprocessingDocument wordPackage = WordprocessingDocument.Open(copyPath, true))
{
MainDocumentPart part = wordPackage.MainDocumentPart;
//Setup the namespace manager so you can perform XPath queries
//to search for bookmarks in the part.
NameTable nt = new NameTable();
XmlNamespaceManager nsManager = new XmlNamespaceManager(nt);
nsManager.AddNamespace("w", wordmlNamespace);
//Load the part's XML into an XmlDocument instance.
XmlDocument xmlDoc = new XmlDocument(nt);
xmlDoc.Load(part.GetStream());
//Get the URL used to display the photos
string url = HttpContext.Current.Request.Url.ToString();
string enderecoURL;
if (url.Contains("localhost"))
enderecoURL = url.Substring(0, 26);
else if (url.Contains("www."))
enderecoURL = url.Substring(0, 24);
else
enderecoURL = url.Substring(0, 20);
//Iterate through the bookmarks.
int cont = 56;
foreach (KeyValuePair<string, string> bookmark in bookmarks)
{
var res = from bm in part.Document.Body.Descendants<BookmarkStart>()
where bm.Name == bookmark.Key
select bm;
var bk = res.SingleOrDefault();
if (bk != null)
{
Run bookmarkText = bk.NextSibling<Run>();
if (bookmarkText != null) // if the bookmark has text replace it
{
var texts = bookmark.Value.Split(new[] { Environment.NewLine }, StringSplitOptions.None);
for (int i = 0; i < texts.Length; i++)
{
if (i > 0)
bookmarkText.Append(new Break());
Text text = new Text();
text.Text = texts[i];
bookmarkText.Append(text); //HERE IS MY PROBLEM
}
}
else // otherwise append new text immediately after it
{
var parent = bk.Parent; // bookmark's parent element
Text text = new Text(bookmark.Value);
Run run = new Run(new RunProperties());
run.Append(text);
// insert after bookmark parent
parent.Append(run);
}
bk.Remove(); // we don't want the bookmark anymore
}
}
//Write the changes back to the document part.
xmlDoc.Save(wordPackage.MainDocumentPart.GetStream(FileMode.Create));
wordPackage.Close();
}}
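One possible reading of the output above is that the run following the bookmark already holds a Text element with the placeholder ("test"), and the loop appends the new text after it instead of replacing it. The following is a sketch under that assumption, not a confirmed fix. BookmarkRunHelper and ReplaceRunText are hypothetical names, while the Open XML SDK types (Run, Text, Break) are the ones used in the question.

```csharp
using System;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

public static class BookmarkRunHelper
{
    // Replaces the text of a run with the given value, turning line breaks into <w:br/>.
    public static void ReplaceRunText(Run run, string value)
    {
        // Assumption: the run still carries placeholder text, so drop it first
        run.RemoveAllChildren<Text>();
        run.RemoveAllChildren<Break>();

        // Split on Windows and Unix line endings alike
        string[] lines = value.Split(new[] { "\r\n", "\n" }, StringSplitOptions.None);
        for (int i = 0; i < lines.Length; i++)
        {
            if (i > 0)
                run.Append(new Break());
            // Preserve leading/trailing spaces of each line
            run.Append(new Text(lines[i]) { Space = SpaceProcessingModeValues.Preserve });
        }
    }
}
```

In the question's loop this would be called as ReplaceRunText(bookmarkText, bookmark.Value) in place of the for loop. RunProperties are left untouched, so the run keeps its formatting.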

Get Layer2 Text (Signature Description) from signature image using itextsharp

I need to retrieve the layer2 text from a signature. How can I get the description (under the signature image) using itextsharp? Below is the code I'm using to get the sign date and username:
PdfReader reader = new PdfReader(pdfPath, System.Text.Encoding.UTF8.GetBytes(MASTER_PDF_PASSWORD));
using (MemoryStream memoryStream = new MemoryStream())
{
PdfStamper stamper = new PdfStamper(reader, memoryStream);
AcroFields acroFields = stamper.AcroFields;
List<String> names = acroFields.GetSignatureNames();
foreach (String name in names)
{
PdfPKCS7 pk = acroFields.VerifySignature(name);
String userName = PdfPKCS7.GetSubjectFields(pk.SigningCertificate).GetField("CN");
Console.WriteLine("Sign Date: " + pk.SignDate.ToString() + " Name: " + userName);
// Here i need to retrieve the description underneath the signature image
}
reader.RemoveUnusedObjects();
reader.Close();
stamper.Writer.CloseStream = false;
if (stamper != null)
{
stamper.Close();
}
}
and below is the code I used to set the description
PdfStamper st = PdfStamper.CreateSignature(reader, memoryStream, '\0', null, true);
PdfSignatureAppearance sap = st.SignatureAppearance;
sap.Render = PdfSignatureAppearance.SignatureRender.GraphicAndDescription;
sap.Layer2Font = font;
sap.Layer2Text = "Some text that i want to retrieve";
Thank you.
While Bruno addressed the issue starting with a PDF containing a "layer 2", allow me to first state that using these "signature layers" in PDF signature appearances is not required by the PDF specification; the specification actually does not even know these layers at all! Thus, if you try to parse a specific layer, you may not find such a "layer" or, even worse, find something that looks like that layer (an XObject named n2) but contains the wrong data.
That being said, whether you look for text from a layer 2 or from the signature appearance as a whole, you can use iTextSharp's text extraction capabilities. I used Bruno's code as a base for retrieving the n2 layer.
public static void ExtractSignatureTextFromFile(FileInfo file)
{
try
{
Console.Out.Write("File: {0}\n", file);
using (var pdfReader = new PdfReader(file.FullName))
{
AcroFields fields = pdfReader.AcroFields;
foreach (string name in fields.GetSignatureNames())
{
Console.Out.Write(" Signature: {0}\n", name);
iTextSharp.text.pdf.AcroFields.Item item = fields.GetFieldItem(name);
PdfDictionary widget = item.GetWidget(0);
PdfDictionary ap = widget.GetAsDict(PdfName.AP);
if (ap == null)
continue;
PdfStream normal = ap.GetAsStream(PdfName.N);
if (normal == null)
continue;
Console.Out.Write(" Content of normal appearance: {0}\n", extractText(normal));
PdfDictionary resources = normal.GetAsDict(PdfName.RESOURCES);
if (resources == null)
continue;
PdfDictionary xobject = resources.GetAsDict(PdfName.XOBJECT);
if (xobject == null)
continue;
PdfStream frm = xobject.GetAsStream(PdfName.FRM);
if (frm == null)
continue;
PdfDictionary res = frm.GetAsDict(PdfName.RESOURCES);
if (res == null)
continue;
PdfDictionary xobj = res.GetAsDict(PdfName.XOBJECT);
if (xobj == null)
continue;
PRStream n2 = (PRStream) xobj.GetAsStream(PdfName.N2);
if (n2 == null)
continue;
Console.Out.Write(" Content of normal appearance, layer 2: {0}\n", extractText(n2));
}
}
}
catch (Exception ex)
{
Console.Error.Write("Error... " + ex.StackTrace);
}
}
public static String extractText(PdfStream xObject)
{
PdfDictionary resources = xObject.GetAsDict(PdfName.RESOURCES);
ITextExtractionStrategy strategy = new LocationTextExtractionStrategy();
PdfContentStreamProcessor processor = new PdfContentStreamProcessor(strategy);
processor.ProcessContent(ContentByteUtils.GetContentBytesFromContentObject(xObject), resources);
return strategy.GetResultantText();
}
For the sample file signature_n2.pdf Bruno used you get this:
File: ...\signature_n2.pdf
Signature: Signature1
Content of normal appearance: This document was signed by Bruno
Specimen.
Content of normal appearance, layer 2: This document was signed by Bruno
Specimen.
As this sample uses the layer 2 as the OP expects, it already contains the text in question.
Please take a look at the following PDF: signature_n2.pdf. It contains a signature with the following text in the n2 layer:
This document was signed by Bruno
Specimen.
Before we can write code to extract this text, we should use iText RUPS to look at the internal structure of the PDF, so that we can find out where this /n2 layer is stored:
Based on this information, we can start writing our code. See the GetN2fromSig example:
public static void main(String[] args) throws IOException {
PdfReader reader = new PdfReader(SRC);
AcroFields fields = reader.getAcroFields();
Item item = fields.getFieldItem("Signature1");
PdfDictionary widget = item.getWidget(0);
PdfDictionary ap = widget.getAsDict(PdfName.AP);
PdfStream normal = ap.getAsStream(PdfName.N);
PdfDictionary resources = normal.getAsDict(PdfName.RESOURCES);
PdfDictionary xobject = resources.getAsDict(PdfName.XOBJECT);
PdfStream frm = xobject.getAsStream(PdfName.FRM);
PdfDictionary res = frm.getAsDict(PdfName.RESOURCES);
PdfDictionary xobj = res.getAsDict(PdfName.XOBJECT);
PRStream n2 = (PRStream) xobj.getAsStream(PdfName.N2);
byte[] stream = PdfReader.getStreamBytes(n2);
System.out.println(new String(stream));
}
We get the widget annotation for the signature field with name "Signature1". Based on the info from RUPS, we know that we have to get the resources (/Resources) of the normal (/N) appearance (/AP). In the /XObject dictionary, we'll find a form XObject named /FRM. This XObject in turn has some /Resources of its own, more specifically two XObjects, one named /n0, the other one named /n2.
We get the stream of the /n2 object and we convert it to an uncompressed byte[]. When we print this array as a String, we get the following result:
BT
1 0 0 1 0 49.55 Tm
/F1 12 Tf
(This document was signed by Bruno)Tj
1 0 0 1 0 31.55 Tm
(Specimen.)Tj
ET
This is PDF syntax. BT and ET stand for "Begin Text" and "End Text". The Tm operator sets the text matrix. The Tf operator sets the font. Tj shows the strings that are delimited by ( and ). If you want the plain text, it's sufficient to extract only the text that is between parentheses.
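Pulling out the text between parentheses can be sketched with a regular expression. This is a minimal illustration in plain C#, assuming the stream only uses literal strings without nested unescaped parentheses, hex strings or font-specific encodings. LiteralStringExtractor and Extract are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LiteralStringExtractor
{
    // Matches "( ... )" where the body consists of escaped characters
    // or any character other than backslash and parentheses.
    private static readonly Regex Literal =
        new Regex(@"\(((?:\\.|[^\\()])*)\)", RegexOptions.Singleline);

    public static List<string> Extract(string content)
    {
        var result = new List<string>();
        foreach (Match m in Literal.Matches(content))
        {
            // Undo the escapes \( \) and \\ ; other escapes (\n, octal codes) are left as-is
            string body = Regex.Replace(m.Groups[1].Value, @"\\([\\()])", "$1");
            result.Add(body);
        }
        return result;
    }

    public static void Main()
    {
        string content =
            "BT\n1 0 0 1 0 49.55 Tm\n/F1 12 Tf\n(This document was signed by Bruno)Tj\n" +
            "1 0 0 1 0 31.55 Tm\n(Specimen.)Tj\nET";
        foreach (string s in Extract(content))
            Console.WriteLine(s);
        // prints:
        // This document was signed by Bruno
        // Specimen.
    }
}
```

For anything beyond such simple streams, the text extraction strategy shown in the other answer is the more robust route, because it takes the font encoding into account.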

New page in ITextSharp does not create new page

In the following code I create a pdf dynamically using ITextSharp.
I want the 2nd table to be split when there is not enough room on the page.
How can this be accomplished? I tried it with the newPage method on the pdf stamper, but no new page has been created...
(not all codepaths included for readability)
private byte[] DoGenerateStatisticsPerOrganisationalUnitPdf(
string emptyPdfFile,
DateTime currentDateTime,
string organisationalUnit,
int? roleId,
DateTime? fromDate,
DateTime? toDate)
{
var pdfReader = new ITextSharp.pdf.PdfReader(emptyPdfFile); // note that PdfReader is not IDisposable
using (MemoryStream memoryStream = new MemoryStream())
using (ITextSharp.pdf.PdfStamper pdfStamper = new ITextSharp.pdf.PdfStamper(pdfReader, memoryStream))
{
// Get content bytes of first page
var pdfContentByte = pdfStamper.GetOverContent(1);
// Make a page width/height large rectangle column for write actions
var ct = new ITextSharp.pdf.ColumnText(pdfContentByte);
ct.SetSimpleColumn(
PageStartX,
PageStartY,
PageEndX,
PageEndY);
var paragraph = new iTextSharp.text.Paragraph(new ITextSharp.Chunk("Statistieken Profchecks", titleFont));
ct.AddElement(paragraph);
// Add printed date time
var dateTimeText = string.Format(
CultureInfo.CurrentCulture,
"Afdrukdatum: {0}",
currentDateTime.ToString(DateFormat, CultureInfo.CurrentCulture));
paragraph = new iTextSharp.text.Paragraph(new ITextSharp.Chunk(dateTimeText, rowFont));
ct.AddElement(paragraph);
// Add selected filter
var filterItems = string.Empty;
if (!string.IsNullOrEmpty(organisationalUnit))
{
filterItems += "\n" + string.Format(CultureInfo.CurrentCulture, " ° Organisatie: {0}", organisationalUnit);
}
if (roleId.HasValue)
{
filterItems += "\n" + string.Format(CultureInfo.CurrentCulture, " ° Rol: {0}", roleService.GetById(roleId.Value).Name);
}
if (fromDate.HasValue)
{
filterItems += "\n" + string.Format(CultureInfo.CurrentCulture, " ° Datum van: {0}", fromDate.Value.ToString(DateFormat, CultureInfo.CurrentCulture));
}
if (toDate.HasValue)
{
filterItems += "\n" + string.Format(CultureInfo.CurrentCulture, " ° Datum t/m: {0}", toDate.Value.ToString(DateFormat, CultureInfo.CurrentCulture));
}
var filterText = string.Format(
CultureInfo.CurrentCulture,
"Geselecteerde filter: {0}",
filterItems.Length > 0 ? filterItems : "(geen filter)");
paragraph = new iTextSharp.text.Paragraph(new ITextSharp.Chunk(filterText, rowFont));
ct.AddElement(paragraph);
paragraph = new iTextSharp.text.Paragraph(new ITextSharp.Chunk("\nResultaten per game", titleFont));
ct.AddElement(paragraph);
// Table: Results per game
var table = CreateTable(new string[] { "Game", "Unieke spelers", "Resultaat" });
var gameResultList = statisticsService.GetOrganisationalUnitStatistics(1, 20, organisationalUnit, roleId, fromDate, toDate);
foreach (var gameResultItem in gameResultList)
{
table.AddCell(new iTextSharp.text.Phrase(gameResultItem.Game, rowFont));
table.AddCell(new iTextSharp.text.Phrase(gameResultItem.NumberOfUsers.ToString(CultureInfo.CurrentCulture), rowFont));
var percentage = gameResultItem.AveragePercentage.HasValue ? string.Format(CultureInfo.CurrentCulture, "{0}%", gameResultItem.AveragePercentage) : "?";
table.AddCell(new iTextSharp.text.Phrase(percentage, rowFont));
}
table.CompleteRow();
ct.AddElement(table);
paragraph = new iTextSharp.text.Paragraph(new ITextSharp.Chunk("\nResultaten per kenniscategorie", titleFont));
ct.AddElement(paragraph);
// Table: Results per knowledgecategory
table = CreateTable(new string[] { "Kenniscategorie", "Gemiddeld", "Laagste", "Hoogste", "Standaard deviatie" });
var knowledgeCategoryResultList = statisticsService.GetGlobalKnowledgeCategoryResultStatistics(
organisationalUnit,
roleId,
fromDate,
toDate);
foreach (var knowledgeCategoryResultItem in knowledgeCategoryResultList)
{
table.AddCell(new iTextSharp.text.Phrase(knowledgeCategoryResultItem.KnowledgeCategory.Name, rowFont));
table.AddCell(new iTextSharp.text.Phrase(
knowledgeCategoryResultItem.Statistics.Average.ToString(CultureInfo.CurrentCulture),
rowFont));
table.AddCell(new iTextSharp.text.Phrase(
knowledgeCategoryResultItem.Statistics.Minimum.ToString(CultureInfo.CurrentCulture),
rowFont));
table.AddCell(new iTextSharp.text.Phrase(
knowledgeCategoryResultItem.Statistics.Maximum.ToString(CultureInfo.CurrentCulture),
rowFont));
table.AddCell(new iTextSharp.text.Phrase(
knowledgeCategoryResultItem.Statistics.StDev.HasValue ? knowledgeCategoryResultItem.Statistics.StDev.Value.ToString(
CultureInfo.CurrentCulture) : "?",
rowFont));
}
table.CompleteRow();
ct.AddElement(table);
// Parse
ct.Go();
pdfStamper.FormFlattening = true;
pdfStamper.FreeTextFlattening = true;
// Close stamper explicitly, otherwise the pdf gets corrupted (don't wait until the Dispose is called in the using-clause)
pdfStamper.Close();
// Always call ToArray, to get all the bytes returned.
return memoryStream.ToArray();
}
}
I see you take an existing PDF file (referred to as "emptyPdfFile"), add content to that PDF (2 tables) and want to add pages as necessary. So I assume you actually want to create a PDF from scratch.
In that case it's most likely easier to use PdfWriter and add your tables using Document.Add(). Tables will be split and pages will be added automatically when the end of the current page is reached.
A simple example of adding a table with Document.Add() can be found here in the MyFirstTable example (that's iText code in Java, check the C# port for iTextSharp code).
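As a rough C# counterpart, here is a minimal sketch of the from-scratch approach with iTextSharp 5. The class name, the column headers and the 200 dummy rows are assumptions made for illustration; only the general pattern (PdfWriter plus Document.Add) comes from the text above.

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

public static class TableFromScratch
{
    public static byte[] Build()
    {
        using (var ms = new MemoryStream())
        {
            var doc = new Document(PageSize.A4);
            PdfWriter.GetInstance(doc, ms);
            doc.Open();

            var table = new PdfPTable(3);
            table.HeaderRows = 1; // repeat the first row on every page the table spans
            table.AddCell("Game");
            table.AddCell("Unieke spelers");
            table.AddCell("Resultaat");

            // Enough dummy rows to force the table over several pages
            for (int i = 1; i <= 200; i++)
            {
                table.AddCell("Game " + i);
                table.AddCell(i.ToString());
                table.AddCell("?");
            }

            // Document.Add splits the table and starts new pages as needed
            doc.Add(table);
            doc.Close();
            return ms.ToArray();
        }
    }
}
```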
If you do want to follow the approach of your example code, using PdfReader, PdfStamper and ColumnText:
ColumnText.Go() adds content to the defined area until that area is full. Any remaining content stays in the ColumnText object. So if you want to split the content over multiple areas, you have to loop and call ColumnText.Go() until all content is consumed.
Here's an example of the ColumnText.Go() looping: ColumnTable (Again, you may want to check the C# port).
In that example the tables are laid out in 2 columns on each page, but the approach stays the same for 1 table per page.
Note that Document.NewPage() is used in the example to add an extra page. You'll have to replace this call with PdfStamper.InsertPage() in your case.
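The looping described above could look roughly like this with PdfStamper. It is a sketch, assuming iTextSharp 5, content that starts on page 1, and new pages that reuse the size of page 1. ColumnOverflow, FlushAcrossPages and the column parameter are hypothetical names.

```csharp
using iTextSharp.text;
using iTextSharp.text.pdf;

public static class ColumnOverflow
{
    // Renders the content held by ct, inserting pages until nothing is left over.
    public static void FlushAcrossPages(PdfReader reader, PdfStamper stamper,
                                        ColumnText ct, Rectangle column)
    {
        int pageNumber = 1;
        ct.SetSimpleColumn(column.Left, column.Bottom, column.Right, column.Top);
        int status = ct.Go();

        // Go() stops when the area is full; the remaining content stays in ct
        while (ColumnText.HasMoreText(status))
        {
            pageNumber++;
            stamper.InsertPage(pageNumber, reader.GetPageSize(1));

            // Point the column at the new page and redefine the area before the next pass
            ct.Canvas = stamper.GetOverContent(pageNumber);
            ct.SetSimpleColumn(column.Left, column.Bottom, column.Right, column.Top);
            status = ct.Go();
        }
    }
}
```

In the question's code this would replace the single ct.Go() call, with column built from PageStartX, PageStartY, PageEndX and PageEndY.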

using ITextSharp to extract and update links in an existing PDF

I need to post several (read: a lot) PDF files to the web but many of them have hard coded file:// links and links to non-public locations. I need to read through these PDFs and update the links to the proper locations. I've started writing an app using itextsharp to read through the directories and files, find the PDFs and iterate through each page. What I need to do next is find the links and then update the incorrect ones.
string path = "c:\\html";
DirectoryInfo rootFolder = new DirectoryInfo(path);
foreach (DirectoryInfo di in rootFolder.GetDirectories())
{
// get pdf
foreach (FileInfo pdf in di.GetFiles("*.pdf"))
{
string contents = string.Empty;
Document doc = new Document();
PdfReader reader = new PdfReader(pdf.FullName);
using (MemoryStream ms = new MemoryStream())
{
PdfWriter writer = PdfWriter.GetInstance(doc, ms);
doc.Open();
for (int p = 1; p <= reader.NumberOfPages; p++)
{
byte[] bt = reader.GetPageContent(p);
}
}
}
}
Quite frankly, once I get the page content I'm rather lost on this when it comes to iTextSharp. I've read through the itextsharp examples on sourceforge, but really didn't find what I was looking for.
Any help would be greatly appreciated.
Thanks.
This one is a little complicated if you don't know the internals of the PDF format and iText/iTextSharp's abstraction/implementation of it. You need to understand how to use PdfDictionary objects and look things up by their PdfName key. Once you get that you can read through the official PDF spec and poke around a document pretty easily. If you do care, I've included the relevant parts of the PDF spec in parentheses where applicable.
Anyways, a link within a PDF is stored as an annotation (PDF Ref 12.5). Annotations are page-based so you need to first get each page's annotation array individually. There's a bunch of different possible types of annotations so you need to check each one's SUBTYPE and see if it's set to LINK (12.5.6.5). Every link should have an ACTION dictionary associated with it (12.6.2) and you want to check the action's S key to see what type of action it is. There's a bunch of possible ones for this; links specifically could be internal links or open file links or play sound links or something else (12.6.4.1). You are looking only for links that are of type URI (note the letter I and not the letter L). URI Actions (12.6.4.7) have a URI key that holds the actual address to navigate to. (There's also an IsMap property for image maps that I can't actually imagine anyone using.)
Whew. Still reading? Below is a full working VS 2010 C# WinForms app based on my post here targeting iTextSharp 5.1.1.0. This code does two main things: 1) Create a sample PDF with a link in it pointing to Google.com and 2) replaces that link with a link to bing.com. The code should be pretty well commented but feel free to ask any questions that you might have.
using System;
using System.Text;
using System.Windows.Forms;
using iTextSharp.text;
using iTextSharp.text.pdf;
using System.IO;
namespace WindowsFormsApplication1
{
public partial class Form1 : Form
{
//Folder that we are working in
private static readonly string WorkingFolder = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "Hyperlinked PDFs");
//Sample PDF
private static readonly string BaseFile = Path.Combine(WorkingFolder, "OldFile.pdf");
//Final file
private static readonly string OutputFile = Path.Combine(WorkingFolder, "NewFile.pdf");
public Form1()
{
InitializeComponent();
}
private void Form1_Load(object sender, EventArgs e)
{
CreateSamplePdf();
UpdatePdfLinks();
this.Close();
}
private static void CreateSamplePdf()
{
//Create our output directory if it does not exist
Directory.CreateDirectory(WorkingFolder);
//Create our sample PDF
using (iTextSharp.text.Document Doc = new iTextSharp.text.Document(PageSize.LETTER))
{
using (FileStream FS = new FileStream(BaseFile, FileMode.Create, FileAccess.Write, FileShare.Read))
{
using (PdfWriter writer = PdfWriter.GetInstance(Doc, FS))
{
Doc.Open();
//Turn our hyperlink blue
iTextSharp.text.Font BlueFont = FontFactory.GetFont("Arial", 12, iTextSharp.text.Font.NORMAL, iTextSharp.text.BaseColor.BLUE);
Doc.Add(new Paragraph(new Chunk("Go to URL", BlueFont).SetAction(new PdfAction("http://www.google.com/", false))));
Doc.Close();
}
}
}
}
private static void UpdatePdfLinks()
{
//Setup some variables to be used later
PdfReader R = default(PdfReader);
int PageCount = 0;
PdfDictionary PageDictionary = default(PdfDictionary);
PdfArray Annots = default(PdfArray);
//Open our reader
R = new PdfReader(BaseFile);
//Get the page count
PageCount = R.NumberOfPages;
//Loop through each page
for (int i = 1; i <= PageCount; i++)
{
//Get the current page
PageDictionary = R.GetPageN(i);
//Get all of the annotations for the current page
Annots = PageDictionary.GetAsArray(PdfName.ANNOTS);
//Make sure we have something
if ((Annots == null) || (Annots.Length == 0))
continue;
//Loop through each annotation
foreach (PdfObject A in Annots.ArrayList)
{
//Convert the itext-specific object as a generic PDF object
PdfDictionary AnnotationDictionary = (PdfDictionary)PdfReader.GetPdfObject(A);
//Make sure this annotation has a link
if (!AnnotationDictionary.Get(PdfName.SUBTYPE).Equals(PdfName.LINK))
continue;
//Make sure this annotation has an ACTION
if (AnnotationDictionary.Get(PdfName.A) == null)
continue;
//Get the ACTION for the current annotation
PdfDictionary AnnotationAction = (PdfDictionary)AnnotationDictionary.Get(PdfName.A);
//Test if it is a URI action
if (AnnotationAction.Get(PdfName.S).Equals(PdfName.URI))
{
//Change the URI to something else
AnnotationAction.Put(PdfName.URI, new PdfString("http://www.bing.com/"));
}
}
}
//Next we create a new document and import each page from the reader above
using (FileStream FS = new FileStream(OutputFile, FileMode.Create, FileAccess.Write, FileShare.None))
{
using (Document Doc = new Document())
{
using (PdfCopy writer = new PdfCopy(Doc, FS))
{
Doc.Open();
for (int i = 1; i <= R.NumberOfPages; i++)
{
writer.AddPage(writer.GetImportedPage(R, i));
}
Doc.Close();
}
}
}
}
}
}
EDIT
I should note, this only changes the actual link. Any text within the document won't get updated. Annotations are drawn on top of text but aren't really tied to the text underneath in any way. That's another topic completely.
Note that if the action is stored as an indirect reference, Get() returns the reference rather than a dictionary, so the following cast fails:
PdfDictionary AnnotationAction = (PdfDictionary)AnnotationDictionary.Get(PdfName.A);
In cases of possible indirect dictionaries:
PdfDictionary Action = null;
//Get action directly or by indirect reference
PdfObject obj = AnnotationDictionary.Get(PdfName.A);
if (obj.IsIndirect()) {
//Resolve the indirect reference to the dictionary it points to
Action = (PdfDictionary)PdfReader.GetPdfObject(obj);
} else {
Action = (PdfDictionary)obj;
}
In that case you have to inspect the returned dictionary to figure out where the URI is stored. With an indirect /Launch action, for example, the /F entry is a PRIndirectReference to a file specification dictionary (/Type /Filespec), and the target is the value of that dictionary's own /F entry.
Added code for dealing with indirect and launch actions and null annotation-dictionary:
PdfReader r = new PdfReader(@"d:\kb2\" + f);
for (int i = 1; i <= r.NumberOfPages; i++) {
//Get the current page
var PageDictionary = r.GetPageN(i);
//Get all of the annotations for the current page
var Annots = PageDictionary.GetAsArray(PdfName.ANNOTS);
//Make sure we have something
if ((Annots == null) || (Annots.Length == 0))
continue;
foreach (var A in Annots.ArrayList) {
var AnnotationDictionary = PdfReader.GetPdfObject(A) as PdfDictionary;
if (AnnotationDictionary == null)
continue;
//Make sure this annotation has a link
if (!AnnotationDictionary.Get(PdfName.SUBTYPE).Equals(PdfName.LINK))
continue;
//Make sure this annotation has an ACTION
if (AnnotationDictionary.Get(PdfName.A) == null)
continue;
var annotActionObject = AnnotationDictionary.Get(PdfName.A);
var AnnotationAction = (PdfDictionary)(annotActionObject.IsIndirect() ? PdfReader.GetPdfObject(annotActionObject) : annotActionObject);
var type = AnnotationAction.Get(PdfName.S);
//Test if it is a URI action
if (type.Equals(PdfName.URI)) {
//Change the URI to something else
//Read the current URI; rewrite it here as needed before putting it back
string url = AnnotationAction.GetAsString(PdfName.URI).ToString();
AnnotationAction.Put(PdfName.URI, new PdfString(url));
} else if (type.Equals(PdfName.LAUNCH)) {
//Change the URI to something else
var filespec = AnnotationAction.GetAsDict(PdfName.F);
string url = filespec.GetAsString(PdfName.F).ToString();
AnnotationAction.Put(PdfName.F, new PdfString(url));
}
}
}
//Next we create a new document and import each page from the reader above
using (var output = File.OpenWrite(outputFile.FullName)) {
using (Document Doc = new Document()) {
using (PdfCopy writer = new PdfCopy(Doc, output)) {
Doc.Open();
for (int i = 1; i <= r.NumberOfPages; i++) {
writer.AddPage(writer.GetImportedPage(r, i));
}
Doc.Close();
}
}
}
r.Close();
