Set BaseUrl of an existing Pdf Document

We're having trouble setting a BaseUrl using iTextSharp. We used Adobe's implementation for this in the past, but we ran into severe performance issues, so we switched to iTextSharp, which is approximately 10 times faster.
Adobe enabled us to set a base URL for each document. We really need this in order to deploy our documents on different servers, but we can't seem to find the right code to do this.
This code is what we used with Adobe:
public bool SetBaseUrl(object jso, string baseUrl)
{
    try
    {
        object result = jso.GetType().InvokeMember("baseURL", BindingFlags.SetProperty, null, jso, new Object[] { baseUrl });
        return result != null;
    }
    catch
    {
        return false;
    }
}
A lot of solutions describe how you can insert links in new or empty documents. But our documents already exist and contain more than just text. We want to overlay specific words with a link that leads to one or more other documents. Therefore, it's really important to us that we can insert a link without accessing the text itself, for example by laying a box on top of these words and setting its position (since we know where the words are located in the document).
We have tried different implementations using the setAction method, but it doesn't seem to work properly. In most cases we saw our box, but there was no link inside or associated with it (the cursor didn't change and nothing happened when I clicked inside the box).
Any help is appreciated.

I've made you a couple of examples.
First, let's take a look at BaseURL1. In your comment, you referred to JavaScript, so I created a document to which I added a snippet of document-level JavaScript:
writer.addJavaScript("this.baseURL = \"http://itextpdf.com/\";");
This works perfectly in Adobe Acrobat, but when you try this in Adobe Reader, you get the following error:
NotAllowedError: Security settings prevent access to this property or
method. Doc.baseURL:1:Document-Level:0000000000000000
This is consistent with the JavaScript reference for Acrobat where it is clearly indicated that special permissions are needed to change the base URL.
So instead of following your suggested path, I consulted ISO-32000-1 (which was what I asked you to do, but... I've beaten you in speed).
I discovered that you can add a URI dictionary to the catalog with a Base entry. So I wrote a second example, BaseURL2, where I add this dictionary to the root dictionary of the PDF:
PdfDictionary uri = new PdfDictionary(PdfName.URI);
uri.put(new PdfName("Base"), new PdfString("http://itextpdf.com/"));
writer.getExtraCatalog().put(PdfName.URI, uri);
Now the BaseURL works in both Acrobat and Reader.
Assuming that you want to add a BaseURL to existing documents, I wrote BaseURL3. In this example, we add the same dictionary to the root dictionary of an existing PDF:
PdfReader reader = new PdfReader(src);
PdfDictionary uri = new PdfDictionary(PdfName.URI);
uri.put(new PdfName("Base"), new PdfString("http://itextpdf.com/"));
reader.getCatalog().put(PdfName.URI, uri);
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
stamper.close();
Using this code, you can change a link that points to "index.php" (base_url.pdf) into a link that points to "http://itextpdf.com/index.php" (base_url_3.pdf).
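The way a relative link such as "index.php" combines with the Base entry can be illustrated outside of iText. The following is only a sketch using the standard java.net.URI class, on the assumption that a viewer resolves relative URIs against the base in the usual way; the class and method names are made up for this illustration and are not part of iText.

```java
import java.net.URI;

final class BaseUrlResolution {
    // Resolves a link target against a base URL using standard URI resolution.
    // An absolute target is returned unchanged; a relative one is combined with the base.
    static String resolve(String baseUrl, String linkTarget) {
        return URI.create(baseUrl).resolve(linkTarget).toString();
    }

    public static void main(String[] args) {
        // Relative link from the example above, resolved against the base URL.
        System.out.println(resolve("http://itextpdf.com/", "index.php"));
        // An absolute link is not affected by the base URL.
        System.out.println(resolve("http://itextpdf.com/", "http://example.org/a.pdf"));
    }
}
```

Note that the trailing slash of the base URL matters: with standard resolution, the last path segment of a base that does not end in "/" is replaced by the relative target.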
Now you can replace your Adobe license with a less expensive iTextSharp license ;-)

Related

PdfReader.unethicalreading = true; not working

We have an interface that allows users to send copies of receipts back to our customers. The user feeds in up to five or so PDFs and they all get merged using iTextSharp's (v5.5.13.1) PdfReader.
Unfortunately, some users somehow password-protect a file or two here and there. I want to avoid having to trust the user to supply us with unprotected files.
Relevant code
PdfReader.unethicalreading = true;
PdfReader reader = null;
foreach (string pdf in pdfs)
{
    reader = new PdfReader(pdf); // iTextSharp.text.exceptions.BadPasswordException: 'Bad user password'
}
If there are no password-protected files, the merge goes off without a hitch. Otherwise, when it reaches the offending file, the code above throws at the line marked with the exception comment.
From what I understood, unethicalreading was all that was needed. I even tried assigning it immediately before the line that throws, but it gives me the same result.
Does this flag work on C# since most of the help I see online for this library is written for Java (normal, I know, but I wonder if this part of the library has been properly ported)?

Visible signature created using iText 7 not shown in chrome

I am using iText 7 to sign PDF documents.
This works without problems, and the signature is shown as valid.
In addition to the digital signature, I want to show a visual representation on the PDF. This is described in the digital signature book, chapter 2.4, Creating different signature appearances.
The produced PDF shows this appearance if I open it using Adobe Reader.
The first image is a PDF created using Word and the save-as-PDF functionality.
The second image is a random demo PDF I just downloaded.
If I open the second PDF in Chrome, the signature appearance text is shown, but if I open the PDF that was initially created using Word, the signature appearance is missing.
Any ideas on what's wrong with the PDF that doesn't show the signature appearance in Chrome?
edit: Links to the documents
Pdf which shows signature in chrome
https://1drv.ms/b/s!AkROTDoCWFJnkd5VOFjUHZfpQXzJWQ?e=MeyZje
Pdf which doesn't show signature in chrome
https://1drv.ms/b/s!AkROTDoCWFJnkd5W5P3MCbb8fwLASA?e=zsmks0
edit 2: Code sample
The following code sample will sign a pdf document using a local certificate and place some text into the SignatureAppearance which is not shown in chrome.
using iText.Kernel.Geom;
using iText.Kernel.Pdf;
using iText.Signatures;
using System.IO;
using System.Security.Cryptography.X509Certificates;

namespace PdfSigning.Lib.Helpers
{
    public class SignPdfTest
    {
        public static byte[] SingPdfUsingCertificate(X509Certificate2 cert2, byte[] pdfToSign)
        {
            var apk = Org.BouncyCastle.Security.DotNetUtilities.GetKeyPair(cert2.PrivateKey).Private;
            IExternalSignature pks = new PrivateKeySignature(apk, DigestAlgorithms.SHA512);
            var cp = new Org.BouncyCastle.X509.X509CertificateParser();
            var chain = new[] { cp.ReadCertificate(cert2.RawData) };
            using (PdfReader reader = new PdfReader(new MemoryStream(pdfToSign)))
            {
                using (MemoryStream fout = new MemoryStream())
                {
                    StampingProperties sp = new StampingProperties();
                    sp.UseAppendMode();
                    PdfSigner signer = new PdfSigner(reader, fout, sp);
                    PdfSignatureAppearance appearance = signer.GetSignatureAppearance();
                    appearance.SetPageNumber(1);
                    appearance.SetLayer2Text("Hello world");
                    appearance.SetLayer2FontSize(8);
                    Rectangle pr = new Rectangle(10, 10, 200, 100);
                    appearance.SetPageRect(pr);
                    appearance.SetRenderingMode(PdfSignatureAppearance.RenderingMode.DESCRIPTION);
                    signer.SignDetached(pks, chain, null, null, null, 0, PdfSigner.CryptoStandard.CMS);
                    return fout.ToArray();
                }
            }
        }
    }
}
private static void SignDocumentUsingCertificateConfiguration()
{
    try
    {
        var certificateSignatureConfiguration = new CertificateSignatureConfiguration();
        var cert2 = new X509Certificate2(@"C:\temp\MyCertificate.pfx", "mypassword", X509KeyStorageFlags.Exportable);
        CertificatePdfSigner certPdfSigner = new CertificatePdfSigner(certificateSignatureConfiguration);
        byte[] signedPdf = PdfSigning.Lib.Helpers.SignPdfTest.SingPdfUsingCertificate(cert2, File.ReadAllBytes(@"C:\temp\WordSaveAsPdf.pdf"));
        File.WriteAllBytes(@"C:\temp\WordSaveAsPdf_Signed.pdf", signedPdf);
        Console.WriteLine("Done");
    }
    catch (Exception ex)
    {
        Console.WriteLine(ex.Message);
    }
}

In short
Chrome appears to not read object streams of hybrid reference PDFs, in particular not in the incremental update added during signature creation.
iText, on the other hand, puts nearly all its changes during signing into an object stream.
Thus, Chrome is not aware of the added signature and its appearance.
One can resolve the situation by forcing iText not to create an object stream here.
What is special about the Word generated source PDF?
PDF files contain object cross reference information which maps object numbers to the offsets at which the respective objects start in the file. This information can be stored in two ways: as a cross reference table and (since PDF 1.5) also as a cross reference stream. Also since PDF 1.5, the format allows putting non-stream objects into so-called object streams, which allows superior compression, as only stream contents can be compressed.
As most PDF viewers at the time PDF 1.5 was introduced did not support cross reference and object streams, a mixed, hybrid reference style was also introduced then. In this style the basic objects in a PDF which are strictly necessary to display it are added normally (not in object streams) and are referenced from cross reference tables. Extra information which is not strictly necessary is then added in object streams and referenced from cross reference streams.
MS Word creates PDFs in this hybrid style and is virtually the only software to do so.
What is special about the iText signed result PDF?
iText puts nearly all the changes into an object stream in a new incremental update.
Apparently, though, Chrome does not fully support object and cross reference streams, in particular not if combined with further incremental updates.
Thus, Chrome is not aware of the added signature and its visualization.
How to resolve the problem?
What we need to do, therefore, is convince iText that it shall not add important data in an object stream during signing. Due to member variable visibilities this is not as easy as one would like; I used reflection here for that.
In your code simply use the following PdfSignerNoObjectStream instead of PdfSigner:
using System.IO;
using System.Reflection;
using iText.Kernel.Pdf;
using iText.Signatures;

public class PdfSignerNoObjectStream : PdfSigner
{
    public PdfSignerNoObjectStream(PdfReader reader, Stream outputStream, StampingProperties properties)
        : base(reader, outputStream, properties)
    {
    }

    protected override PdfDocument InitDocument(PdfReader reader, PdfWriter writer, StampingProperties properties)
    {
        try
        {
            return base.InitDocument(reader, writer, properties);
        }
        finally
        {
            // Reach into the writer's non-public "properties" field via reflection
            // and switch off full compression so that no object stream is written.
            FieldInfo propertiesField = typeof(PdfWriter).GetField("properties", BindingFlags.Instance | BindingFlags.NonPublic);
            WriterProperties writerProperties = (WriterProperties) propertiesField.GetValue(writer);
            writerProperties.SetFullCompressionMode(false);
        }
    }
}
Beware, though, tweaking iText functionality like this is not guaranteed to work across versions. I tested it for a recent iText-7.1.7-SNAPSHOT development state; I expect it to also work for the previous 7.1.x versions.
Is this a Chrome bug? Or an iText bug? Or what?
Most likely it's kind of both.
On one hand the Chrome PDF viewer appears to have issues with hybrid reference PDFs. Considering how long they have been part of the PDF format, that is somewhat disappointing.
And on the other hand the PDF specification requires in the context of hybrid reference documents:
In general, the objects that may be hidden are optional objects specified by indirect references. [...]
Items that shall be visible include the entire page tree, fonts, font descriptors, and width tables. Objects that may be hidden in a hybrid-reference file include the structure tree, the outline tree, article threads, annotations, destinations, Web Capture information, and page labels.
(ISO 32000-1, section 7.5.8.4 Compatibility with Applications That Do Not Support Compressed Reference Streams)
In the case at hand an (updated) page object is in the object stream, i.e. hidden from viewers not supporting cross reference and object streams.
Currently iText 7 PdfDocument attempts to enforce FullCompression on PdfWriters if the underlying PdfReader has any cross reference stream (HasXrefStm):
writer.properties.isFullCompression = reader.HasXrefStm();
(PdfDocument method Open)
Probably it shouldn't enforce that if the PdfReader also is identified as hybrid reference stream (HasHybridXref).
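The suggested change boils down to a small decision rule. The following is only a sketch of that rule in isolation, not iText code: the class and method names are hypothetical, and the two boolean parameters stand in for what the reader reports as HasXrefStm and HasHybridXref.

```java
final class CompressionChoice {
    // Hypothetical helper: full compression (object streams) is only chosen when the
    // source file uses cross reference streams AND is not a hybrid reference file.
    static boolean useFullCompression(boolean hasXrefStm, boolean hasHybridXref) {
        return hasXrefStm && !hasHybridXref;
    }

    public static void main(String[] args) {
        // A Word-style hybrid reference file: cross reference stream present, but hybrid.
        System.out.println(useFullCompression(true, true));
        // A file that uses cross reference streams only.
        System.out.println(useFullCompression(true, false));
    }
}
```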

This might simply be caused by Chrome's built-in PDF reader. As far as I understood the case, the person who requested help from the Chrome devs in this question received some answers and was redirected to another part of the forum where he could get help. I can try to recreate the problem with iTextSharp 5 (I used that in a previous project) and see whether that signature is also not shown in Chrome, but the odds aren't good.

This sounds an awful lot like a case of the "Needs Appearances" flag not being set. Back in my day (wheeze) iText form fields were generated with as little graphical data as possible, and iText would set the /NeedAppearances flag to true, letting the PDF viewer in question (Acrobat Reader was about it back then) know that it needed to generate the form fields' appearances before trying to draw them to screen.
And visible PDF signatures are held in form fields.
So it's at least theoretically possible that you can fix this programmatically by telling iText to (re?)generate the form field appearances.

How to Check PDF is Reader enabled or not using C#?

My only requirement is to find out whether a selected PDF in a folder is Reader enabled or not, more specifically whether usage rights are defined in a way that allows people to add annotations (e.g. comments).
I am doing this in a Windows application. If I click a button, an event is triggered that searches a folder for PDF files. This event needs to check whether or not the PDFs in the folder are Reader enabled for comments. If they are, I need to remove the comment usage rights or revert the PDF back to its original version.
My code can only find PDF files in the folder. I don't know how to check whether the selected PDF is comment enabled or not. Please be gentle and suggest a solution.
Here's my code:
private void button1_Click(object sender, EventArgs e)
{
    string[] filePaths = Directory.GetFiles("D:\\myfolder\\pdffolder");
    List<ListViewItem> files = new List<ListViewItem>();
    foreach (string filePath in filePaths)
    {
        // TODO: need to check whether commenting is enabled for this PDF
    }
}

You want to know if a PDF is Reader enabled or not. Reader enabling is established by adding a digital signature known as a Usage Rights (UR) signature. If you have an instance of PdfReader, you can check whether or not a PDF is Reader enabled by using the hasUsageRights() method:
PdfReader reader = new PdfReader(path_to_file);
boolean isReaderEnabled = reader.hasUsageRights();
Usage rights can encompass many different things, such as allowing people to comment, allowing people to save a filled out form, allowing people to sign a document,...
To find out which rights are enabled, you have to inspect either the UR or the UR3 dictionary (note that UR is deprecated, but there may still be PDFs out there that have a UR dictionary):
PdfDictionary catalog = reader.getCatalog();
PdfDictionary perms = catalog.getAsDict(PdfName.PERMS);
PdfDictionary ur = null;
if (perms != null) {
    // Prefer the deprecated UR entry if present, otherwise fall back to UR3.
    ur = perms.getAsDict(PdfName.UR);
    if (ur == null)
        ur = perms.getAsDict(PdfName.UR3);
}
If ur remains null, there are no usage rights. If you only want to check if commenting is enabled, you'll have to inspect the entries of the ur dictionary. There will be an /Annots entry whose value is an array with values such as Create, Delete, Modify, Copy, Import, Export, Online and SummaryView. For the full overview of possible entries, see Table 255 "Entries in the UR transform parameters dictionary" of ISO-32000-1.
You can remove all usage rights like this:
PdfReader reader = new PdfReader(path_to_file);
if (reader.hasUsageRights()) {
reader.removeUsageRights();
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(path_to_new_file));
stamper.close();
}
It is impossible to remove only the usage rights for commenting while preserving other usage rights (if present). Just removing the /Annots entry from the /UR or /UR3 dictionary will break the digital signature that enables usage rights. This digital signature is created with a private key owned by Adobe and no third party tool (other than an Adobe product) is allowed to use that key.
Final note:
All code snippets were written in Java, but iTextSharp has corresponding methods or properties in C#. It shouldn't be a problem to port the snippets to C#.
In many cases, it's sufficient to change a lower case into an upper case:
Java: object.add(something);
C#: object.Add(something);
Or you have to remove the set/get:
Java: object.setSomething(something);
C#: object.Something = something;

Thanks to everyone who took the effort to answer my question. I finally found the answer in a similar way, by reading the PDF and checking for a particular string (a string that is present if commenting is enabled on the PDF).
The particular string starts with /Annot ..... First I read the PDF through System.IO and store it in a string, then I look for the particular string. If the search string is found, the PDF is comment enabled; otherwise it is not.
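A raw search of this kind could look roughly like the sketch below. It is written in Java rather than C#, the class and method names are invented for the illustration, and the marker is left as a parameter because the exact string is not spelled out above. Treat it as a heuristic only: it cannot see anything stored inside compressed streams, and a marker such as "/Annots" can also occur on pages that merely contain ordinary annotations, so inspecting the usage rights dictionary as described in the other answer is the more reliable check.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class RawPdfMarkerSearch {
    // Decodes the raw bytes as ISO-8859-1 (one char per byte, so nothing is lost)
    // and looks for the marker text anywhere in the file.
    static boolean containsMarker(byte[] pdfBytes, String marker) {
        String raw = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        return raw.contains(marker);
    }

    // Convenience wrapper that reads the whole file into memory first.
    static boolean fileContainsMarker(Path pdf, String marker) throws IOException {
        return containsMarker(Files.readAllBytes(pdf), marker);
    }

    public static void main(String[] args) throws IOException {
        // Usage: java RawPdfMarkerSearch <file.pdf> <marker>
        if (args.length == 2) {
            System.out.println(fileContainsMarker(Path.of(args[0]), args[1]));
        }
    }
}
```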

YASR - Yet another search and replace question

Environment: asp.net c# openxml
Ok, so I've been reading a ton of snippets and trying to recreate the wheel, but I'm hoping that someone can help me get to my destination faster. I have multiple documents that I need to merge together... check... I'm able to do that with the Open XML SDK. Birds are singing, sun is shining so far. Now that I have the document the way I want it, I need to search and replace text and/or content controls.
I've tried using my own text - {replace this} - but when I look at the XML (rename docx to zip and view the file), the { is nowhere near the text. So I either need to know how to protect that within the document so they don't diverge, or I need to find another way to search and replace.
I'm able to search/replace if it is an XML file, but then I'm back to not being able to combine the documents easily.
Code below... and as I mentioned... document merge works fine... just need to replace stuff.
* Update * changed my replace call to go after the tag instead of regex. I have the right info now, but the .Replace call doesn't seem to want to work. Last four lines are for validation that I was seeing the right tag contents. I simply want to replace those contents now.
protected void exeProcessTheDoc(object sender, EventArgs e)
{
    string doc1 = Server.MapPath("~/Templates/doc1.docx");
    string doc2 = Server.MapPath("~/Templates/doc2.docx");
    string final_doc = Server.MapPath("~/Templates/extFinal.docx");
    File.Delete(final_doc);
    File.Copy(doc1, final_doc);
    using (WordprocessingDocument myDoc = WordprocessingDocument.Open(final_doc, true))
    {
        string altChunkId = "AltChunkId2";
        MainDocumentPart mainPart = myDoc.MainDocumentPart;
        AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(
            AlternativeFormatImportPartType.WordprocessingML, altChunkId);
        using (FileStream fileStream = File.Open(doc2, FileMode.Open))
            chunk.FeedData(fileStream);
        AltChunk altChunk = new AltChunk();
        altChunk.Id = altChunkId;
        mainPart.Document.Body.InsertAfter(altChunk, mainPart.Document.Body.Elements<Paragraph>().Last());
        mainPart.Document.Save();
    }
    exeSearchReplace(final_doc);
}
public static void GetPropertyFromDocument(string document, string outdoc)
{
    XmlDocument xmlProperties = new XmlDocument();
    using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(document, false))
    {
        ExtendedFilePropertiesPart appPart = wordDoc.ExtendedFilePropertiesPart;
        xmlProperties.Load(appPart.GetStream());
    }
    XmlNodeList chars = xmlProperties.GetElementsByTagName("Company");
    chars.Item(0).InnerText.Replace("{ClientName}", "Penn Inc.");
    StreamWriter sw;
    sw = File.CreateText(outdoc);
    sw.WriteLine(chars.Item(0).InnerText);
    sw.Close();
}
}
}

If I'm reading this right, you have something like "{replace me}" in a .docx and then when you loop through the XML, you're finding things like <t>{replace</t><t> me</t><t>}</t> or some such havoc. Now, with XML like that, it's impossible to create a simple routine that will replace "{replace me}".
If that's the case, then it's very, very likely related to the fact that it's considered a proofing error. i.e. it's misspelled as far as Word is concerned. The cause of it is that you've opened the document in Word and have proofing turned on. As such, the text is marked as "isDirty" and split up into different runs.
There are two ways of fixing this:
Client-side. In Word, just make sure all proofing errors are either corrected or ignored.
Format-side. Use the MarkupSimplifier tool that is part of Open XML Package Editor Power Tool for Visual Studio 2010 to fix this outside of the client. Eric White has a great (and timely for you - just a few days old) write up here on it: Getting Started with Open XML PowerTools Markup Simplifier

If you want to search and replace text in a WordprocessingML document, there is a fairly easy algorithm that you can use:
Break all runs into runs of a single character. This includes runs that have special characters such as a line break, carriage return, or hard tab.
It is then pretty easy to find a set of runs that match the characters in your search string.
Once you have identified a set of runs that match, then you can replace that set of runs with a newly created run (which has the run properties of the run containing the first character that matched the search string).
After replacing the single-character runs with a newly created run, you can then consolidate adjacent runs with identical formatting.
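The four steps above can be sketched without any Open XML dependency. The following Java sketch models a run as plain text plus a formatting key; the Run class, the string-valued format and the method names are assumptions made for this illustration only, and real WordprocessingML runs carry much richer properties and special elements that this ignores.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RunSearchReplace {
    // Hypothetical stand-in for a WordprocessingML run: its text plus a formatting key.
    static final class Run {
        final String text;
        final String format;
        Run(String text, String format) { this.text = text; this.format = format; }
    }

    static List<Run> replace(List<Run> runs, String search, String replacement) {
        if (search.isEmpty()) return new ArrayList<>(runs);
        // Step 1: break all runs into runs of a single character.
        List<Run> chars = new ArrayList<>();
        for (Run r : runs)
            for (int i = 0; i < r.text.length(); i++)
                chars.add(new Run(String.valueOf(r.text.charAt(i)), r.format));
        // Steps 2 and 3: find matching sequences and replace each with one new run
        // that takes the formatting of the first matched character.
        List<Run> out = new ArrayList<>();
        int i = 0;
        while (i < chars.size()) {
            if (matchesAt(chars, i, search)) {
                out.add(new Run(replacement, chars.get(i).format));
                i += search.length();
            } else {
                out.add(chars.get(i));
                i++;
            }
        }
        // Step 4: consolidate adjacent runs with identical formatting.
        List<Run> merged = new ArrayList<>();
        for (Run r : out) {
            int last = merged.size() - 1;
            if (last >= 0 && merged.get(last).format.equals(r.format))
                merged.set(last, new Run(merged.get(last).text + r.text, r.format));
            else
                merged.add(r);
        }
        return merged;
    }

    private static boolean matchesAt(List<Run> chars, int start, String search) {
        if (start + search.length() > chars.size()) return false;
        for (int k = 0; k < search.length(); k++)
            if (chars.get(start + k).text.charAt(0) != search.charAt(k)) return false;
        return true;
    }

    public static void main(String[] args) {
        // The placeholder is split across three runs, as Word often does.
        List<Run> runs = Arrays.asList(
            new Run("Dear {Client", "plain"), new Run("Name", "bold"), new Run("}, hello", "plain"));
        for (Run r : replace(runs, "{ClientName}", "Penn Inc."))
            System.out.println(r.format + ": " + r.text);
    }
}
```

Working on single-character runs keeps the matching logic trivial, at the cost of temporarily creating many small objects; the final consolidation step restores a compact run list.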
I've written a blog post and recorded a screen-cast that walks through this algorithm.
Blog post: http://openxmldeveloper.org/archive/2011/05/12/148357.aspx
Screen cast: http://www.youtube.com/watch?v=w128hJUu3GM
-Eric

Can I fill in an encrypted PDF with iTextSharp?

I have a fillable, saveable PDF file that has an owner password (that I don't have access to). I can fill it out in Adobe reader, export the FDF file, modify the FDF file, and then import it.
Then I tried to do it with iText for .NET. I can't create a PdfStamper from my PdfReader because I didn't provide the owner password to the reader. Is there any way to do this programmatically or must I recreate the document?
Even using FdfReader requires a PdfStamper. Am I missing anything? Anything legal that is - I'm pretty sure I could hack the document, but I can't. Ironically, recreating it would probably be ok.

This line will bypass edit password checking in iTextSharp:
PdfReader.unethicalreading = true;

[I found this question several months after it was posted and I'm posting this solution now for anyone who comes across this question in a search.]
I was in the exact same situation: my customer had a PDF with fillable fields that I needed to programmatically access. Unfortunately the PDF was password protected and they didn't have the password, so I found I couldn't work with their file.
What I discovered was that iTextSharp version 4.0.4 (and later) enforces password restrictions; earlier versions did not.
So I downloaded version 4.0.3 and sure enough it worked. In my case I didn't even have to change my code to use this older version.
You can download 4.0.3 (and all other versions) at SourceForge.

Two important things
Set PdfReader.unethicalreading = true to prevent BadPasswordException.
Set append mode in PdfStamper's constructor, otherwise the Adobe Reader Extensions signature becomes broken and Adobe Reader will display the following message: "This document contained certain rights to enable special features in Adobe Reader. The document has been changed since it was created and these rights are no longer valid. Please contact the author for the original version of this document."
So all you need to do is this:
PdfReader.unethicalreading = true;
using (var pdfReader = new PdfReader("form.pdf"))
{
    using (var outputStream = new FileStream("filled.pdf", FileMode.Create, FileAccess.Write))
    {
        // '\0' keeps the original PDF version; true enables append mode.
        using (var stamper = new iTextSharp.text.pdf.PdfStamper(pdfReader, outputStream, '\0', true))
        {
            stamper.AcroFields.Xfa.FillXfaForm("data.xml");
        }
    }
}
See How to fill XFA form using iText?

Unless someone else chimes in, I'll assume the answer is "No"
I wound up regenerating the PDF in an unencrypted form.
