How to embed all fonts from other PDF iText 7

How to embed all fonts from other PDF iText 7 - c#

I am trying to overlay two PDF files using iText7/C#.
The first one is kind of background and the second one is containing form fields.
Everything works fine and only problem is that I lose fonts from the second file.
I try as follows:
static public bool Overlay(string back_path, string front_path, string merge_path)
{
PdfReader reader;
PdfDocument pdf = null, front;
try
{
reader = new PdfReader(back_path);
pdf = new PdfDocument(reader, new PdfWriter(merge_path));
front = new PdfDocument(new PdfReader(front_path));
var form = PdfAcroForm.GetAcroForm(front, false);
PdfAcroForm dform = PdfAcroForm.GetAcroForm(pdf, true);
IDictionary<String, PdfFormField> fields = form.GetFormFields();
// copy styles
dform.SetDefaultResources(form.GetDefaultResources());
dform.SetDefaultAppearance(form.GetDefaultAppearance().GetValue());
// do overlay
foreach (KeyValuePair<string, PdfFormField> pair in fields)
{
try
{
var field = pair.Value;
PdfPage page = field.GetWidgets().First().GetPage();
int pg_no = front.GetPageNumber(page);
if (pg_no < front_start_page || pg_no > front_end_page)
continue;
PdfObject copied = field.GetPdfObject().CopyTo(pdf, true);
PdfFormField copiedField = PdfFormField.MakeFormField(copied, pdf);
// The following returns null. If it returns something, I think I could use copiedField.setFont(font).
// var font = field.GetFont();
dform.AddField(copiedField, pdf.GetPage(pg_no));
}
catch (Exception ex)
{
System.Diagnostics.Debug.WriteLine($"Overlaying field {pair.Key} failed. ({ex.Message})");
}
}
pdf.Close();
return true;
}
catch (Exception ex)
{
throw new OverlayException(ex.Message);
}
}
public static PdfDictionary get_font_dict(PdfDocument pdfDoc)
{
PdfDictionary acroForm = pdfDoc.GetCatalog().GetPdfObject().GetAsDictionary(PdfName.AcroForm);
if (acroForm == null)
{
return null;
}
PdfDictionary dr = acroForm.GetAsDictionary(PdfName.DR);
if (dr == null)
{
return null;
}
PdfDictionary font = dr.GetAsDictionary(PdfName.Font);
return font;
}
So basically I get all fonts from the second PDF and copy them to the final PDF.
But it does not work.
Logically, I think setting font of the original field to the copied one is the right way.
I mean PdfFormField.GetFont() and SetFont().
But it always returns null.

In a comment you clarified:
the background PDF can be assumed not to have form fields or annotations. I mean we can assume background PDF only contains static content (scanned form) and the front PDF only contains formfields.
In that case the easiest way to implement your method is to add the background as xobject to the form PDF instead of adding the form to the background PDF.
You can simply do that like this:
PdfReader formReader = new PdfReader(front_path);
PdfReader backReader = new PdfReader(back_path);
PdfWriter writer = new PdfWriter(merge_path);
using (PdfDocument source = new PdfDocument(backReader))
using (PdfDocument target = new PdfDocument(formReader, writer))
{
PdfFormXObject xobject = source.GetPage(1).CopyAsFormXObject(target);
PdfPage targetFirstPage = target.GetFirstPage();
PdfStream stream = targetFirstPage.NewContentStreamBefore();
PdfCanvas pdfCanvas = new PdfCanvas(stream, targetFirstPage.GetResources(), target);
Rectangle cropBox = targetFirstPage.GetCropBox();
pdfCanvas.AddXObject(xobject, cropBox.GetX(), cropBox.GetY());
}
Depending on the exact static contents of the background and the form PDF, you might want to use NewContentStreamAfter instead of NewContentStreamBefore or even to use some nifty blend mode to get the exact static content look you want.

Related

How to Page Break HTML Content in HTML Renderer

I have a project where HTML code is converted to a PDF using HTML Renderer. The HTML code contains a single table. The PDF is displayed but the issue is that the contents of the table are cut off at the end. So is there any solution to the problem?
PdfDocument pdf=new PdfDocument();
var config = new PdfGenerateConfig()
{
MarginBottom = 20,
MarginLeft = 20,
MarginRight = 20,
MarginTop = 20,
};
//config.PageOrientation = PageOrientation.Landscape;
config.ManualPageSize = new PdfSharp.Drawing.XSize(1080, 828);
pdf = PdfGenerator.GeneratePdf(html, config);
byte[] fileContents = null;
using (MemoryStream stream = new MemoryStream())
{
pdf.Save(stream, true);
fileContents = stream.ToArray();
return new FileStreamResult(new MemoryStream(fileContents.ToArray()), "application/pdf");
}

HTMLRenderer should be able break the table to the next page.
See also:
https://github.com/ArthurHub/HTML-Renderer/pull/41
Make sure you are using the latest version. You may have to add those CSS properties.
Also see this answer:
https://stackoverflow.com/a/37833107/162529

As far as I know page breaks are not supported, but I've done a bit of a work-around (which may not work for all cases) by splitting the HTML into separate pages using a page break class, then adding each page to the pdf.
See example code below:
//This will only work on page break elements that are direct children of the body element.
//Each page's content must be inside the pagebreak element
private static PdfDocument SplitHtmlIntoPagedPdf(string html, string pageBreakBeforeClass, PdfGenerateConfig config, PdfDocument pdf)
{
var htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(html);
var htmlBodyNode = htmlDoc.DocumentNode.SelectSingleNode("//body");
var tempHtml = string.Empty;
foreach (var bodyNode in htmlBodyNode.ChildNodes)
{
if (bodyNode.Attributes["class"]?.Value == pageBreakBeforeClass)
{
if (!string.IsNullOrWhiteSpace(tempHtml))
{
//add any content found before the page break
AddPageToPdf(htmlDoc,tempHtml,config,ref pdf);
tempHtml = string.Empty;
}
AddPageToPdf(htmlDoc,bodyNode.OuterHtml,config,ref pdf);
}
else
{
tempHtml += bodyNode.OuterHtml;
}
}
if (!string.IsNullOrWhiteSpace(tempHtml))
{
//add any content found after the last page break
AddPageToPdf(htmlDoc, tempHtml, config, ref pdf);
}
return pdf;
}
private static void AddPageToPdf(HtmlDocument htmlDoc, string html, PdfGenerateConfig config, ref PdfDocument pdf)
{
var tempDoc = new HtmlDocument();
tempDoc.LoadHtml(htmlDoc.DocumentNode.OuterHtml);
var docNode = tempDoc.DocumentNode;
docNode.SelectSingleNode("//body").InnerHtml = html;
var nodeDoc = PdfGenerator.GeneratePdf(docNode.OuterHtml, config);
using (var tempMemoryStream = new MemoryStream())
{
nodeDoc.Save(tempMemoryStream, false);
var openedDoc = PdfReader.Open(tempMemoryStream, PdfDocumentOpenMode.Import);
foreach (PdfPage page in openedDoc.Pages)
{
pdf.AddPage(page);
}
}
}
Then call the code as follows:
var pdf = new PdfDocument();
var config = new PdfGenerateConfig()
{
MarginLeft = 5,
MarginRight = 5,
PageOrientation = PageOrientation.Portrait,
PageSize = PageSize.A4
};
if (!string.IsNullOrWhiteSpace(pageBreakBeforeClass))
{
pdf = SplitHtmlIntoPagedPdf(html, pageBreakBeforeClass, config, pdf);
}
else
{
pdf = PdfGenerator.GeneratePdf(html, config);
}
For any html that you want to have in its own page, just put the html inside a div with a class of "pagebreak" (or whatever you want to call it). If you want to, you could add that class to your css and give it "page-break-before: always;", so that the html will be print-friendly.

I've just figured out how to make it work, rather than page-break-inside on a TD, do that on the TABLE. Here's the code:
table { page-break-inside: avoid; }
I'm currently on the following versions (not working on stable versions at the moment):
HtmlRenderer on v1.5.1-beta1
PDFsharp on v1.51.5185-beta

Convert image to pdf using itextsharp [duplicate]

I have the following code but this code add only the last image into pdf.
try {
filePath = (filePath != null && filePath.endsWith(".pdf")) ? filePath
: filePath + ".pdf";
Document document = new Document();
PdfWriter writer = PdfWriter.getInstance(document,
new FileOutputStream(filePath));
document.open();
// document.add(new Paragraph("Image Example"));
for (String imageIpath : imagePathsList) {
// Add Image
Image image1 = Image.getInstance(imageIpath);
// Fixed Positioning
image1.setAbsolutePosition(10f, 10f);
// Scale to new height and new width of image
image1.scaleAbsolute(600, 800);
// image1.scalePercent(0.5f);
// Add to document
document.add(image1);
//document.bottom();
}
writer.close();
} catch (Exception e) {
LOGGER.error(e.getMessage());
}
Would you give me a hint about how to update the code in order to add all the images into the exported pdf? imagePathsList contains all the paths of images that that I want to add into a single pdf.
Best Regards,
Aurelian

Take a look at the MultipleImages example and you'll discover that there are two errors in your code:
You create a page with size 595 x 842 user units, and you add every image to that page regardless of the dimensions of the image.
You claim that only one image is added, but that's not true. You are adding all the images on top of each other on the same page. The last image covers all the preceding images.
Take a look at my code:
public void createPdf(String dest) throws IOException, DocumentException {
Image img = Image.getInstance(IMAGES[0]);
Document document = new Document(img);
PdfWriter.getInstance(document, new FileOutputStream(dest));
document.open();
for (String image : IMAGES) {
img = Image.getInstance(image);
document.setPageSize(img);
document.newPage();
img.setAbsolutePosition(0, 0);
document.add(img);
}
document.close();
}
I create a Document instance using the size of the first image. I then loop over an array of images, setting the page size of the next page to the size of each image before I trigger a newPage() [*]. Then I add the image at coordinate 0, 0 because now the size of the image will match the size of each page.
[*] The newPage() method only has effect if something was added to the current page. The first time you go through the loop, nothing has been added yet, so nothing happens. This is why you need set the page size to the size of the first image when you create the Document instance.

Android has the feature "PdfDocument" to achieve this,
class Main2Activity : AppCompatActivity() {
private var imgFiles: Array<File?>? = null
override fun onCreate(savedInstanceState: Bundle?) {
super.onCreate(savedInstanceState)
setContentView(R.layout.activity_main2)
imgFiles= arrayOfNulls(2)
imgFiles!![0] = File(Environment.getExternalStoragePublicDirectory(Environment.DIRECTORY_PICTURES).toString() + "/doc1.png")
imgFiles!![1] = File(Environment.getExternalStoragePublicDirectory(Environment.DIRECTORY_PICTURES).toString() + "/doc3.png")
val file = getOutputFile(File(Environment.getExternalStorageDirectory().absolutePath)
, "/output.pdf")
val fOut = FileOutputStream(file)
val document = PdfDocument()
var i = 0
imgFiles?.forEach {
i++
val bitmap = BitmapFactory.decodeFile(it?.path)
val pageInfo = PdfDocument.PageInfo.Builder(bitmap.width, bitmap.height, i).create()
val page = document.startPage(pageInfo)
val canvas = page?.canvas
val paint = Paint()
canvas?.drawPaint(paint)
paint.color = Color.BLUE;
canvas?.drawBitmap(bitmap, 0f, 0f, null)
document.finishPage(page)
bitmap.recycle()
}
document.writeTo(fOut)
document.close()
}
private fun getOutputFile(path: File, fileName: String): File? {
if (!path.exists()) {
path.mkdirs()
}
val file = File(path, fileName)
try {
if (file.exists()) {
file.delete()
}
file.createNewFile()
} catch (e: Exception) {
e.printStackTrace()
}
return file
}
}
finally enable the storage permission in manifest, this should works

PDFsharp does not see pages in documents

I made some file merging using PDFsharp before, and now I'm trying to change several files (insert or remove some pages) and I faced with problem, that the library does not see pages. It says that PageCount == 0 and I cannot find pages in object (while debugging). And sure, I cannot do my current work. I use this very simple code:
var destinationPdf = new PdfDocument(destinationFilePath);
Int32 count = destinationPdf.PageCount;
And also, here is the code, that I used to merge files to one PDF before:
public class PdfCreator
{
private PdfDocument document;
public PdfCreator()
{
this.document = new PdfDocument();
}
public void AddImage(String imageFilePath)
{
PdfPage newPage = this.document.AddPage();
XGraphics xGraphics = XGraphics.FromPdfPage(newPage);
XImage image = XImage.FromFile(imageFilePath);
xGraphics.DrawImage(image, 0, 0);
}
public void AddPdfFile(String pdfFilePath)
{
PdfDocument inputDocument = PdfReader.Open(pdfFilePath, PdfDocumentOpenMode.Import);
Int32 count = inputDocument.PageCount;
for (Int32 currentPage = 0; currentPage < count; currentPage++)
{
PdfPage page = inputDocument.Pages[currentPage];
this.document.AddPage(page);
}
}
public void AddTextFile(String txtFilePath)
{
PdfPage newPage = this.document.AddPage();
XGraphics xGraphics = XGraphics.FromPdfPage(newPage);
var xFont = new XFont("Times New Roman", 12, XFontStyle.Bold);
var xTextFormatter = new XTextFormatter(xGraphics);
var rect = new XRect(30, 30, 540, 740);
xGraphics.DrawRectangle(XBrushes.Transparent, rect);
xTextFormatter.Alignment = XParagraphAlignment.Left;
xTextFormatter.DrawString(File.ReadAllText(txtFilePath), xFont, XBrushes.Black, rect, XStringFormats.TopLeft);
}
public void Save(String destinationFilePath)
{
if (this.document.Pages.Count > 0)
{
this.document.Save(destinationFilePath);
this.document.Close();
}
}
}

Your code
var destinationPdf = new PdfDocument(destinationFilePath);
Int32 count = destinationPdf.PageCount;
creates a new document in memory - and surely this document is empty.
Use PdfReader.Open to create a document in memory from an existing file.
When I place the mouse cursor over PdfDocument in your code I get this tooltip:
Creates a new PDF document with the specified file name. The file is
immediately created and keeps locked until the document is closed, at
that time the document is saved automatically. Do not call Save() for
documents created with this constructor, just call Close(). To open an
existing PDF file and import it, use the PdfReader class.

Get Layer2 Text (Signature Description) from signature image using itextsharp

I need to retrieve the layer2 text from a signature. How can I get the description (under the signature image) using itextsharp? below is the code I'm using to get the sign date and username:
PdfReader reader = new PdfReader(pdfPath, System.Text.Encoding.UTF8.GetBytes(MASTER_PDF_PASSWORD));
using (MemoryStream memoryStream = new MemoryStream())
{
PdfStamper stamper = new PdfStamper(reader, memoryStream);
AcroFields acroFields = stamper.AcroFields;
List<String> names = acroFields.GetSignatureNames();
foreach (String name in names)
{
PdfPKCS7 pk = acroFields.VerifySignature(name);
String userName = PdfPKCS7.GetSubjectFields(pk.SigningCertificate).GetField("CN");
Console.WriteLine("Sign Date: " + pk.SignDate.ToString() + " Name: " + userName);
// Here i need to retrieve the description underneath the signature image
}
reader.RemoveUnusedObjects();
reader.Close();
stamper.Writer.CloseStream = false;
if (stamper != null)
{
stamper.Close();
}
}
and below is the code I used to set the description
PdfStamper st = PdfStamper.CreateSignature(reader, memoryStream, '\0', null, true);
PdfSignatureAppearance sap = st.SignatureAppearance;
sap.Render = PdfSignatureAppearance.SignatureRender.GraphicAndDescription;
sap.Layer2Font = font;
sap.Layer2Text = "Some text that i want to retrieve";
Thank you.

While Bruno addressed the issue starting with a PDF containing a "layer 2", allow me to first state that using these "signature layers" in PDF signature appearances is not required by the PDF specification, the specification actually does not even know these layers at all! Thus, if you try to parse a specific layer, you may not find such a "layer" or, even worse, find something that looks like that layer (a XObject named n2) which contains the wrong data.
That been said, though, Whether you look for text from a layer 2 or from the signature appearance as a whole, you can use iTextSharp text extraction capabilities. I used Bruno's code as base for retrieving the n2 layer.
public static void ExtractSignatureTextFromFile(FileInfo file)
{
try
{
Console.Out.Write("File: {0}\n", file);
using (var pdfReader = new PdfReader(file.FullName))
{
AcroFields fields = pdfReader.AcroFields;
foreach (string name in fields.GetSignatureNames())
{
Console.Out.Write(" Signature: {0}\n", name);
iTextSharp.text.pdf.AcroFields.Item item = fields.GetFieldItem(name);
PdfDictionary widget = item.GetWidget(0);
PdfDictionary ap = widget.GetAsDict(PdfName.AP);
if (ap == null)
continue;
PdfStream normal = ap.GetAsStream(PdfName.N);
if (normal == null)
continue;
Console.Out.Write(" Content of normal appearance: {0}\n", extractText(normal));
PdfDictionary resources = normal.GetAsDict(PdfName.RESOURCES);
if (resources == null)
continue;
PdfDictionary xobject = resources.GetAsDict(PdfName.XOBJECT);
if (xobject == null)
continue;
PdfStream frm = xobject.GetAsStream(PdfName.FRM);
if (frm == null)
continue;
PdfDictionary res = frm.GetAsDict(PdfName.RESOURCES);
if (res == null)
continue;
PdfDictionary xobj = res.GetAsDict(PdfName.XOBJECT);
if (xobj == null)
continue;
PRStream n2 = (PRStream) xobj.GetAsStream(PdfName.N2);
if (n2 == null)
continue;
Console.Out.Write(" Content of normal appearance, layer 2: {0}\n", extractText(n2));
}
}
}
catch (Exception ex)
{
Console.Error.Write("Error... " + ex.StackTrace);
}
}
public static String extractText(PdfStream xObject)
{
PdfDictionary resources = xObject.GetAsDict(PdfName.RESOURCES);
ITextExtractionStrategy strategy = new LocationTextExtractionStrategy();
PdfContentStreamProcessor processor = new PdfContentStreamProcessor(strategy);
processor.ProcessContent(ContentByteUtils.GetContentBytesFromContentObject(xObject), resources);
return strategy.GetResultantText();
}
For the sample file signature_n2.pdf Bruno used you get this:
File: ...\signature_n2.pdf
Signature: Signature1
Content of normal appearance: This document was signed by Bruno
Specimen.
Content of normal appearance, layer 2: This document was signed by Bruno
Specimen.
As this sample uses the layer 2 as the OP expects, it already contains the text in question.

Please take a look at the following PDF: signature_n2.pdf. It contains a signature with the following text in the n2 layer:
This document was signed by Bruno
Specimen.
Before we can write code to extract this text, we should use iText RUPS to look at the internal structure of the PDF, so that we can find out where this /n2 layer is stored:
Based on this information, we can start writing our code. See the GetN2fromSig example:
public static void main(String[] args) throws IOException {
PdfReader reader = new PdfReader(SRC);
AcroFields fields = reader.getAcroFields();
Item item = fields.getFieldItem("Signature1");
PdfDictionary widget = item.getWidget(0);
PdfDictionary ap = widget.getAsDict(PdfName.AP);
PdfStream normal = ap.getAsStream(PdfName.N);
PdfDictionary resources = normal.getAsDict(PdfName.RESOURCES);
PdfDictionary xobject = resources.getAsDict(PdfName.XOBJECT);
PdfStream frm = xobject.getAsStream(PdfName.FRM);
PdfDictionary res = frm.getAsDict(PdfName.RESOURCES);
PdfDictionary xobj = res.getAsDict(PdfName.XOBJECT);
PRStream n2 = (PRStream) xobj.getAsStream(PdfName.N2);
byte[] stream = PdfReader.getStreamBytes(n2);
System.out.println(new String(stream));
}
We get the widget annotation for the signature field with name "signature1". Based on the info from RUPS, we know that we have to get the resources (/Resources) of the normal (/N) appearance (/AP). In the /XObjects dictionary, we'll find a form XObject named /FRM. This XObject has in turn also some /Resources, more specifically two /XObjects, one named /n0, the other one named /n2.
We get the stream of the /n2 object and we convert it to an uncompressed byte[]. When we print this array as a String, we get the following result:
BT
1 0 0 1 0 49.55 Tm
/F1 12 Tf
(This document was signed by Bruno)Tj
1 0 0 1 0 31.55 Tm
(Specimen.)Tj
ET
This is PDF syntax. BT and ET stand for "Begin Text" and "End Text". The Tm operator set the text matrix. The Tf operator set the font. Tj shows the strings that are delimited by ( and ). If you want the plain text, it's sufficient to extract only the text that is between parentheses.

using ITextSharp to extract and update links in an existing PDF

I need to post several (read: a lot) PDF files to the web but many of them have hard coded file:// links and links to non-public locations. I need to read through these PDFs and update the links to the proper locations. I've started writing an app using itextsharp to read through the directories and files, find the PDFs and iterate through each page. What I need to do next is find the links and then update the incorrect ones.
string path = "c:\\html";
DirectoryInfo rootFolder = new DirectoryInfo(path);
foreach (DirectoryInfo di in rootFolder.GetDirectories())
{
// get pdf
foreach (FileInfo pdf in di.GetFiles("*.pdf"))
{
string contents = string.Empty;
Document doc = new Document();
PdfReader reader = new PdfReader(pdf.FullName);
using (MemoryStream ms = new MemoryStream())
{
PdfWriter writer = PdfWriter.GetInstance(doc, ms);
doc.Open();
for (int p = 1; p <= reader.NumberOfPages; p++)
{
byte[] bt = reader.GetPageContent(p);
}
}
}
}
Quite frankly, once I get the page content I'm rather lost on this when it comes to iTextSharp. I've read through the itextsharp examples on sourceforge, but really didn't find what I was looking for.
Any help would be greatly appreciated.
Thanks.

This one is a little complicated if you don't know the internals of the PDF format and iText/iTextSharp's abstraction/implementation of it. You need to understand how to use PdfDictionary objects and look things up by their PdfName key. Once you get that you can read through the official PDF spec and poke around a document pretty easily. If you do care I've included the relevant parts of the PDF spec in parenthesis where applicable.
Anyways, a link within a PDF is stored as an annotation (PDF Ref 12.5). Annotations are page-based so you need to first get each page's annotation array individually. There's a bunch of different possible types of annotations so you need to check each one's SUBTYPE and see if its set to LINK (12.5.6.5). Every link should have an ACTION dictionary associated with it (12.6.2) and you want to check the action's S key to see what type of action it is. There's a bunch of possible ones for this, link's specifically could be internal links or open file links or play sound links or something else (12.6.4.1). You are looking only for links that are of type URI (note the letter I and not the letter L). URI Actions (12.6.4.7) have a URI key that holds the actual address to navigate to. (There's also an IsMap property for image maps that I can't actually imagine anyone using.)
Whew. Still reading? Below is a full working VS 2010 C# WinForms app based on my post here targeting iTextSharp 5.1.1.0. This code does two main things: 1) Create a sample PDF with a link in it pointing to Google.com and 2) replaces that link with a link to bing.com. The code should be pretty well commented but feel free to ask any questions that you might have.
using System;
using System.Text;
using System.Windows.Forms;
using iTextSharp.text;
using iTextSharp.text.pdf;
using System.IO;
namespace WindowsFormsApplication1
{
public partial class Form1 : Form
{
//Folder that we are working in
private static readonly string WorkingFolder = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "Hyperlinked PDFs");
//Sample PDF
private static readonly string BaseFile = Path.Combine(WorkingFolder, "OldFile.pdf");
//Final file
private static readonly string OutputFile = Path.Combine(WorkingFolder, "NewFile.pdf");
public Form1()
{
InitializeComponent();
}
private void Form1_Load(object sender, EventArgs e)
{
CreateSamplePdf();
UpdatePdfLinks();
this.Close();
}
private static void CreateSamplePdf()
{
//Create our output directory if it does not exist
Directory.CreateDirectory(WorkingFolder);
//Create our sample PDF
using (iTextSharp.text.Document Doc = new iTextSharp.text.Document(PageSize.LETTER))
{
using (FileStream FS = new FileStream(BaseFile, FileMode.Create, FileAccess.Write, FileShare.Read))
{
using (PdfWriter writer = PdfWriter.GetInstance(Doc, FS))
{
Doc.Open();
//Turn our hyperlink blue
iTextSharp.text.Font BlueFont = FontFactory.GetFont("Arial", 12, iTextSharp.text.Font.NORMAL, iTextSharp.text.BaseColor.BLUE);
Doc.Add(new Paragraph(new Chunk("Go to URL", BlueFont).SetAction(new PdfAction("http://www.google.com/", false))));
Doc.Close();
}
}
}
}
private static void UpdatePdfLinks()
{
//Setup some variables to be used later
PdfReader R = default(PdfReader);
int PageCount = 0;
PdfDictionary PageDictionary = default(PdfDictionary);
PdfArray Annots = default(PdfArray);
//Open our reader
R = new PdfReader(BaseFile);
//Get the page cont
PageCount = R.NumberOfPages;
//Loop through each page
for (int i = 1; i <= PageCount; i++)
{
//Get the current page
PageDictionary = R.GetPageN(i);
//Get all of the annotations for the current page
Annots = PageDictionary.GetAsArray(PdfName.ANNOTS);
//Make sure we have something
if ((Annots == null) || (Annots.Length == 0))
continue;
//Loop through each annotation
foreach (PdfObject A in Annots.ArrayList)
{
//Convert the itext-specific object as a generic PDF object
PdfDictionary AnnotationDictionary = (PdfDictionary)PdfReader.GetPdfObject(A);
//Make sure this annotation has a link
if (!AnnotationDictionary.Get(PdfName.SUBTYPE).Equals(PdfName.LINK))
continue;
//Make sure this annotation has an ACTION
if (AnnotationDictionary.Get(PdfName.A) == null)
continue;
//Get the ACTION for the current annotation
PdfDictionary AnnotationAction = (PdfDictionary)AnnotationDictionary.Get(PdfName.A);
//Test if it is a URI action
if (AnnotationAction.Get(PdfName.S).Equals(PdfName.URI))
{
//Change the URI to something else
AnnotationAction.Put(PdfName.URI, new PdfString("http://www.bing.com/"));
}
}
}
//Next we create a new document add import each page from the reader above
using (FileStream FS = new FileStream(OutputFile, FileMode.Create, FileAccess.Write, FileShare.None))
{
using (Document Doc = new Document())
{
using (PdfCopy writer = new PdfCopy(Doc, FS))
{
Doc.Open();
for (int i = 1; i <= R.NumberOfPages; i++)
{
writer.AddPage(writer.GetImportedPage(R, i));
}
Doc.Close();
}
}
}
}
}
}
EDIT
I should note, this only changes the actual link. Any text within the document won't get updated. Annotations are drawn on top of text but aren't really tied to the text underneath in anyway. That's another topic completely.

Noted if the Action is indirect it will not return a dictionary and you will have an error:
PdfDictionary AnnotationAction = (PdfDictionary)AnnotationDictionary.Get(PdfName.A);
In cases of possible indirect dictionaries:
PdfDictionary Action = null;
//Get action directly or by indirect reference
PdfObject obj = Annotation.Get(PdfName.A);
if (obj.IsIndirect) {
Action = PdfReader.GetPdfObject(obj);
} else {
Action = (PdfDictionary)obj;
}
In that case you have to investigate the returned dictionary to figure out where the URI is found. As with an indirect /Launch dictionary the URI is located in the /F item being of type PRIndirectReference with the /Type being a /FileSpec and the URI located in the value of /F

Added code for dealing with indirect and launch actions and null annotation-dictionary:
PdfReader r = new PdfReader(#"d:\kb2\" + f);
for (int i = 1; i <= r.NumberOfPages; i++) {
//Get the current page
var PageDictionary = r.GetPageN(i);
//Get all of the annotations for the current page
var Annots = PageDictionary.GetAsArray(PdfName.ANNOTS);
//Make sure we have something
if ((Annots == null) || (Annots.Length == 0))
continue;
foreach (var A in Annots.ArrayList) {
var AnnotationDictionary = PdfReader.GetPdfObject(A) as PdfDictionary;
if (AnnotationDictionary == null)
continue;
//Make sure this annotation has a link
if (!AnnotationDictionary.Get(PdfName.SUBTYPE).Equals(PdfName.LINK))
continue;
//Make sure this annotation has an ACTION
if (AnnotationDictionary.Get(PdfName.A) == null)
continue;
var annotActionObject = AnnotationDictionary.Get(PdfName.A);
var AnnotationAction = (PdfDictionary)(annotActionObject.IsIndirect() ? PdfReader.GetPdfObject(annotActionObject) : annotActionObject);
var type = AnnotationAction.Get(PdfName.S);
//Test if it is a URI action
if (type.Equals(PdfName.URI)) {
//Change the URI to something else
string relativeRef = AnnotationAction.GetAsString(PdfName.URI).ToString();
AnnotationAction.Put(PdfName.URI, new PdfString(url));
} else if (type.Equals(PdfName.LAUNCH)) {
//Change the URI to something else
var filespec = AnnotationAction.GetAsDict(PdfName.F);
string url = filespec.GetAsString(PdfName.F).ToString();
AnnotationAction.Put(PdfName.F, new PdfString(url));
}
}
}
//Next we create a new document add import each page from the reader above
using (var output = File.OpenWrite(outputFile.FullName)) {
using (Document Doc = new Document()) {
using (PdfCopy writer = new PdfCopy(Doc, output)) {
Doc.Open();
for (int i = 1; i <= r.NumberOfPages; i++) {
writer.AddPage(writer.GetImportedPage(r, i));
}
Doc.Close();
}
}
}
r.Close();

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

How to embed all fonts from other PDF iText 7 - c#

Related

How to Page Break HTML Content in HTML Renderer

Convert image to pdf using itextsharp [duplicate]

PDFsharp does not see pages in documents

Get Layer2 Text (Signature Description) from signature image using itextsharp

using ITextSharp to extract and update links in an existing PDF

Categories

Resources