iText7 for .NET SetAuthor() is Doubling the Author Value - c#

When I use the iText7 library to set a PDF document's properties, the value of the built-in Author property gets doubled, like '"Lastname, Firstname"; Lastname; Firstname'. It should be 'Lastname, Firstname'. Double quotes are added, the name appears twice, and the comma is changed to a semicolon. This has happened in two versions, 7.1.17 and 7.2.1.
The steps in creating the PDF are:
1. Use Microsoft.Interop.Word Document.ExportAsFixedFormat() to create the first PDF, which is later read through readerPDF. This export does not populate the custom document properties, and I need to set four custom properties used by a later step in this process.
2. Use iText7 to read the above PDF, add the custom document properties, reset the built-in properties, and write the result to a second file through writerPDF. iText7 only modifies a PDF by reading from one file and writing to a second.
In step 2 the code calls the command to set the author property, PdfDocument.PdfDocumentInfo.SetAuthor(authorvalue);
The problem seems to happen only when the author value contains a comma, and I need the comma to write Lastname, Firstname. That is a requirement. If I do not reset the Author property, it has double quotes around it, which is not useful for our project. All other properties, built-in and custom, work as expected.
The code looks like this:
iText.Kernel.Pdf.PdfReader readerPDF;
iText.Kernel.Pdf.PdfWriter writerPDF;
string authorValue = "Lastname, Firstname";
readerPDF = new PdfReader(saveAsPathAndNameTemp);
writerPDF = new PdfWriter(pSavedPathAndPDFName);
PdfDocument pdfdocument = new PdfDocument(readerPDF, writerPDF);
PdfDocumentInfo info = pdfdocument.GetDocumentInfo();
info.SetAuthor(string.Empty);
info.SetAuthor(authorValue);
pdfdocument.Close();
readerPDF.Close();
writerPDF.Close();

This is an issue in Adobe Acrobat Reader. I used the following code to reproduce your issue in Java:
String filename = DESTINATION_FOLDER + "openSimpleDoc.pdf";
String author = "Test, Author";
String title = "Test, Title";
String subject = "Test, Subject";
PdfDocument pdfDoc = new PdfDocument(new PdfWriter(filename));
pdfDoc.getDocumentInfo().setAuthor(author).setTitle(title).setSubject(subject);
pdfDoc.addNewPage();
pdfDoc.close();
PdfReader reader = new PdfReader(filename);
pdfDoc = new PdfDocument(reader);
Assert.assertEquals(author, pdfDoc.getDocumentInfo().getAuthor());
Assert.assertEquals(title, pdfDoc.getDocumentInfo().getTitle());
Assert.assertEquals(subject, pdfDoc.getDocumentInfo().getSubject());
pdfDoc.close();
As you can see, I didn't set the author twice, and when I open the resulting PDF in Adobe Acrobat, the author's name is displayed enclosed in double quotes.
But the quotes are not actually in the file. You can confirm this by opening the same PDF in PDF Studio, RUPS, or Notepad++ (screenshots omitted).

I resolved the main issue, the duplication of the Author value when it contains a comma, by updating the BouncyCastle.Crypto reference to 1.9.0.0. The secondary issue, the double quotes shown in the properties dialog box, is addressed in the answer by Nikita Kovaliov. I thank this poster for their input.

Related

How to read manually added text within a pdf file with c#?

I'm using iTextSharp with this C# code:
string parsedText = string.Empty;
PdfReader reader = new PdfReader(pdfPath);
ITextExtractionStrategy its = new LocationTextExtractionStrategy();
parsedText = PdfTextExtractor.GetTextFromPage(reader, 1, its);
parsedText = Encoding.UTF8.GetString(ASCIIEncoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(parsedText)));
It parses the PDF as expected, but it does not pick up text that was added manually with tools like Foxit Reader or Nuance PDF.
Our accounting department manually adds an internal invoice number to each PDF, and I need to parse that number. For some reason I can't find it.
It looks like it is on another "layer" or something that is not parsed.
Any ideas how to read those numbers?
Thanks
It is possible that the internal invoice number is being added as an annotation, rather than as actual text on the page.
Have you tried iText's facilities for extracting annotations to see if there are any on the page?

Set BaseUrl of an existing Pdf Document

We're having trouble setting a BaseUrl using iTextSharp. We have used Adobe's implementation for this in the past, but we ran into severe performance issues. So we switched to iTextSharp, which is approximately 10 times faster.
Adobe enabled us to set a base URL for each document. We really need this in order to deploy our documents on different servers, but we can't seem to find the right code to do this.
This code is what we used with Adobe:
public bool SetBaseUrl(object jso, string baseUrl)
{
    try
    {
        object result = jso.GetType().InvokeMember("baseURL", BindingFlags.SetProperty, null, jso, new Object[] { baseUrl });
        return result != null;
    }
    catch
    {
        return false;
    }
}
A lot of solutions describe how to insert links in new or empty documents. But our documents already exist and contain more than just text. We want to overlay specific words with a link that leads to one or more other documents. Therefore, it's really important to us that we can insert a link without accessing the text itself, for example by laying a box on top of these words and setting its position (since we know where the words are located in the document).
We have tried different implementations using the setAction method, but it doesn't seem to work properly. In most cases the result was that we saw our box, but there was no link inside or associated with it (the cursor didn't change and nothing happened when I clicked inside the box).
Any help is appreciated.
I've made you a couple of examples.
First, let's take a look at BaseURL1. In your comment, you referred to JavaScript, so I created a document to which I added a snippet of document-level JavaScript:
writer.addJavaScript("this.baseURL = \"http://itextpdf.com/\";");
This works perfectly in Adobe Acrobat, but when you try this in Adobe Reader, you get the following error:
NotAllowedError: Security settings prevent access to this property or
method. Doc.baseURL:1:Document-Level:0000000000000000
This is consistent with the JavaScript reference for Acrobat where it is clearly indicated that special permissions are needed to change the base URL.
So instead of following your suggested path, I consulted ISO-32000-1 (which was what I asked you to do, but... I've beaten you in speed).
I discovered that you can add a URI dictionary to the catalog with a Base entry. So I wrote a second example, BaseURL2, where I add this dictionary to the root dictionary of the PDF:
PdfDictionary uri = new PdfDictionary(PdfName.URI);
uri.put(new PdfName("Base"), new PdfString("http://itextpdf.com/"));
writer.getExtraCatalog().put(PdfName.URI, uri);
Now the BaseURL works in both Acrobat and Reader.
Assuming that you want to add a BaseURL to existing documents, I wrote BaseURL3. In this example, we add the same dictionary to the root dictionary of an existing PDF:
PdfReader reader = new PdfReader(src);
PdfDictionary uri = new PdfDictionary(PdfName.URI);
uri.put(new PdfName("Base"), new PdfString("http://itextpdf.com/"));
reader.getCatalog().put(PdfName.URI, uri);
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
stamper.close();
Using this code, you can change a link that points to "index.php" (base_url.pdf) into a link that points to "http://itextpdf.com/index.php" (base_url_3.pdf).
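To make the effect of the Base entry concrete, here is a minimal sketch in plain Java (no iText involved) of how a relative link target combines with a base URI under the generic RFC 3986 rules. The class and method names are made up for illustration, and it is an assumption that a given PDF viewer resolves links exactly like java.net.URI does.

```java
import java.net.URI;

public class BaseUrlDemo {
    // Hypothetical helper: resolves a link target against a base URI
    // using the standard library's RFC 3986 style resolution.
    static String resolve(String base, String link) {
        return URI.create(base).resolve(link).toString();
    }

    public static void main(String[] args) {
        // A relative target is combined with the base.
        System.out.println(resolve("http://itextpdf.com/", "index.php"));
        // An absolute target is returned unchanged, so existing full links are unaffected.
        System.out.println(resolve("http://itextpdf.com/", "http://example.org/doc.pdf"));
    }
}
```

This is why only the partial links in the document change behavior when you deploy it with a different base.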
Now you can replace your Adobe license with a less expensive iTextSharp license ;-)

Assigning an Existing Custom Layout/Navigator to a Portfolio Generated By iTextSharp

I am generating collections (Acrobat Portfolios) via iTextSharp. I would like to assign an existing custom navigator (custom layout) to the generated collection. I believe iTextSharp allows for the CUSTOM parameter to define a custom navigator, as in the last code line of this block:
Document document = new Document();
FileStream stream = new FileStream(portfolioPath, FileMode.Create);
PdfWriter writer = PdfWriter.GetInstance(document, stream);
document.Open();
document.Add(new Paragraph(" "));
PdfCollection collection = new PdfCollection(PdfCollection.CUSTOM);
//The integer 3 can also substitute for PdfCollection.CUSTOM
However, when the collection/portfolio is generated the CUSTOM parameter inserts the TILE layout within the generated collection. I want to have the CUSTOM parameter use a custom .nav navigator I developed to insert a custom layout.
I located this post on SO:
How to embed a .nav file into pdf portfolio?
which led to:
Adobe® Supplement to the ISO 32000, BaseVersion: 1.7, ExtensionLevel: 3e
Pages 34 - 37 of this document say it is possible to have the collection access the custom navigator by adjusting the Navigator entry in the collection dictionary and the navigator dictionary itself. Additionally, page 541 of the Second Edition of iText in Action implies this is possible (and it is my hope that what is possible in iText is also possible in iTextSharp).
So is it possible -- using iTextSharp -- to have a generated collection/portfolio access and implement a custom layout/navigator? If so, how? Or is there another way to do this via C# and/or through some workaround? All help is greatly appreciated.
I found a different way to dictate the custom layout/navigator using iTextSharp. Instead of defining the custom layout during or after
PdfCollection collection = new PdfCollection(PdfCollection.CUSTOM);
I threw out the code listed in my question and used the iTextSharp stamper.
First, I created an empty portfolio file. Within this file I assigned the custom layout that I wanted to use. When opened, the file displays the layout but contains no attached files. This file serves as a template and passes its navigator to each newly created iTextSharp PDF using this code:
const string templatePath = @"C:\PortfolioTemplate\PortfolioTemplate.pdf"; //this file will contain the custom navigator/layout for the new pdf
const string portfolioPath = @"C:\OutputFile\NewPortfolio.pdf";
string[] packageitems = { file-to-add-to-collection };
PdfReader reader = new PdfReader(templatePath);
FileStream outputstream = new FileStream(portfolioPath, FileMode.Create);
PdfStamper stamp = new PdfStamper(reader, outputstream);
PdfFileSpecification fs = PdfFileSpecification.FileEmbedded(stamp.Writer, packageitems[0], packageitems[0], null);
stamp.AddFileAttachment(packageitems[0], fs);
stamp.Close();
I used the above proof of concept to loop through all the files in a directory folder and I have created large portfolios without issue that are styled using the custom navigator/layout I wanted.
After coming up with the idea of using a template to pass the navigator into a newly created portfolio, I used the code in the below link to guide me to the above conclusion:
http://itextsharp.10939.n7.nabble.com/Attach-file-td3812.html

Trying to insert an image into a pdf in c#

I need to insert an image based on a generated barcode file.
The problem I'm having is when using the iTextSharp library I can normally fill in text such as
PdfReader pdfReader = new PdfReader(oldFile);
PdfStamper pdfStamper = new PdfStamper(pdfReader, outFile);
AcroFields fields = pdfStamper.AcroFields;
fields.SetField("topmostSubform[0].Page1[0].BARCODE[0]", "X974005-1");
There is one field, though, that prompts me for an image when I click on it in the PDF, and I can't seem to fill it programmatically. Based on some Google searches and a Stack Overflow page, I added the following code, expecting it to work as desired:
string fieldName = "topmostSubform[0].Page1[0].BARCODE[0]";
string imageFile = "test-barcode.jpg";
AcroFields.FieldPosition fieldPosition = pdfStamper.AcroFields.GetFieldPositions(fieldName)[0];
PushbuttonField imageField = new PushbuttonField(pdfStamper.Writer, fieldPosition.position, fieldName);
imageField.Layout = PushbuttonField.LAYOUT_ICON_ONLY;
imageField.Image = iTextSharp.text.Image.GetInstance(imageFile);
imageField.ScaleIcon = PushbuttonField.SCALE_ICON_ALWAYS;
imageField.ProportionalIcon = false;
imageField.Options = BaseField.READ_ONLY;
pdfStamper.AcroFields.RemoveField(fieldName);
pdfStamper.AddAnnotation(imageField.Field, fieldPosition.page);
The problem is that while this removes the existing field as intended, when I open the newly created PDF I see a blank area instead of the new pushbutton field with the intended image. In the debugger I can see that it is at least picking up the correct dimensions of the image file, so I don't know what I'm doing wrong here.
Please advise, thanks.
If you read the official documentation (that is: my book), you'll find this example: ReplaceIcon.cs
You're removing the field using pdfStamper.AcroFields.RemoveField(fieldName); and subsequently you try adding the new field using pdfStamper.AddAnnotation(imageField.Field, fieldPosition.page);
That's wrong. You should replace the field using pdfStamper.AcroFields.ReplacePushbuttonField(fieldName, imageField.Field);
The ReplacePushbuttonField() method copies plenty of settings behind the scenes.

YASR - Yet another search and replace question

Environment: asp.net c# openxml
Ok, so I've been reading a ton of snippets and trying to reinvent the wheel, but I'm hoping that someone can help me get to my destination faster. I have multiple documents that I need to merge together... check... I'm able to do that with the Open XML SDK. Birds are singing, sun is shining so far. Now that I have the document the way I want it, I need to search and replace text and/or content controls.
I've tried using my own text - {replace this} but when I look at the XML (rename docx to zip and view the file), the { is nowhere near the text. So I either need to know how to protect that within the document so they don't diverge or I need to find another way to search and replace.
I'm able to search/replace if it is an XML file, but then I'm back to not being able to combine the documents easily.
Code below... and as I mentioned... document merge works fine... just need to replace stuff.
* Update * changed my replace call to go after the tag instead of regex. I have the right info now, but the .Replace call doesn't seem to want to work. Last four lines are for validation that I was seeing the right tag contents. I simply want to replace those contents now.
protected void exeProcessTheDoc(object sender, EventArgs e)
{
string doc1 = Server.MapPath("~/Templates/doc1.docx");
string doc2 = Server.MapPath("~/Templates/doc2.docx");
string final_doc = Server.MapPath("~/Templates/extFinal.docx");
File.Delete(final_doc);
File.Copy(doc1, final_doc);
using (WordprocessingDocument myDoc = WordprocessingDocument.Open(final_doc, true))
{
string altChunkId = "AltChunkId2";
MainDocumentPart mainPart = myDoc.MainDocumentPart;
AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(
AlternativeFormatImportPartType.WordprocessingML, altChunkId);
using (FileStream fileStream = File.Open(doc2, FileMode.Open))
chunk.FeedData(fileStream);
AltChunk altChunk = new AltChunk();
altChunk.Id = altChunkId;
mainPart.Document.Body.InsertAfter(altChunk, mainPart.Document.Body.Elements<Paragraph>().Last());
mainPart.Document.Save();
}
exeSearchReplace(final_doc);
}
public static void GetPropertyFromDocument(string document, string outdoc)
{
XmlDocument xmlProperties = new XmlDocument();
using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(document, false))
{
ExtendedFilePropertiesPart appPart = wordDoc.ExtendedFilePropertiesPart;
xmlProperties.Load(appPart.GetStream());
}
XmlNodeList chars = xmlProperties.GetElementsByTagName("Company");
chars.Item(0).InnerText.Replace("{ClientName}", "Penn Inc.");
StreamWriter sw;
sw = File.CreateText(outdoc);
sw.WriteLine(chars.Item(0).InnerText);
sw.Close();
}
}
}
If I'm reading this right, you have something like "{replace me}" in a .docx and then when you loop through the XML, you're finding things like <t>{replace</t><t> me</t><t>}</t> or some such havoc. Now, with XML like that, a plain string search will never find "{replace me}", because the text is spread over several elements.
If that's the case, then it's very, very likely related to the fact that it's considered a proofing error. i.e. it's misspelled as far as Word is concerned. The cause of it is that you've opened the document in Word and have proofing turned on. As such, the text is marked as "isDirty" and split up into different runs.
The two ways about fixing this are:
Client-side. In Word, just make sure all proofing errors are either corrected or ignored.
Format-side. Use the MarkupSimplifier tool that is part of Open XML Package Editor Power Tool for Visual Studio 2010 to fix this outside of the client. Eric White has a great (and timely for you - just a few days old) write up here on it: Getting Started with Open XML PowerTools Markup Simplifier
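On the update in the question, where the .Replace call "doesn't seem to want to work": in C#, as in Java, strings are immutable, so Replace returns a new string and leaves the original untouched. The call in the question discards that return value. A minimal Java sketch of the same pitfall (class and method names are made up for illustration):

```java
public class ReplaceDemo {
    // Hypothetical helper: returns the filled-in text; the input is never modified.
    static String fill(String template, String clientName) {
        return template.replace("{ClientName}", clientName);
    }

    public static void main(String[] args) {
        String company = "{ClientName}";

        // Result is thrown away, so company still holds the placeholder.
        company.replace("{ClientName}", "Penn Inc.");
        System.out.println(company);

        // Assigning the result back is what actually changes the value.
        company = fill(company, "Penn Inc.");
        System.out.println(company);
    }
}
```

In the C# code from the question that would mean assigning the result back, along the lines of node.InnerText = node.InnerText.Replace("{ClientName}", "Penn Inc."). Note that this only changes the in-memory XmlDocument, not the .docx it was loaded from.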
If you want to search and replace text in a WordprocessingML document, there is a fairly easy algorithm that you can use:
Break all runs into runs of a single character. This includes runs that have special characters such as a line break, carriage return, or hard tab.
It is then pretty easy to find a set of runs that match the characters in your search string.
Once you have identified a set of runs that match, then you can replace that set of runs with a newly created run (which has the run properties of the run containing the first character that matched the search string).
After replacing the single-character runs with a newly created run, you can then consolidate adjacent runs with identical formatting.
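The four steps above can be sketched roughly as follows. This is not the code from the blog post: it is a simplified Java illustration in which a run is modeled by a made-up Run class holding its text and a string that stands in for the run properties. A real implementation would work on the w:r and w:rPr elements instead.

```java
import java.util.ArrayList;
import java.util.List;

public class RunReplaceSketch {
    // Hypothetical stand-in for a WordprocessingML run: text plus a formatting key.
    static final class Run {
        final String text;
        final String format;
        Run(String text, String format) { this.text = text; this.format = format; }
    }

    static List<Run> replace(List<Run> runs, String search, String replacement) {
        // Step 1: break every run into single-character runs, keeping its formatting.
        List<Run> chars = new ArrayList<>();
        for (Run r : runs) {
            for (int i = 0; i < r.text.length(); i++) {
                chars.add(new Run(String.valueOf(r.text.charAt(i)), r.format));
            }
        }

        // Steps 2 and 3: find runs matching the search string and replace each match
        // with one new run that takes the formatting of the first matched character.
        List<Run> replaced = new ArrayList<>();
        int i = 0;
        while (i < chars.size()) {
            if (!search.isEmpty() && matchesAt(chars, i, search)) {
                replaced.add(new Run(replacement, chars.get(i).format));
                i += search.length();
            } else {
                replaced.add(chars.get(i));
                i++;
            }
        }

        // Step 4: consolidate adjacent runs with identical formatting.
        List<Run> merged = new ArrayList<>();
        for (Run r : replaced) {
            int last = merged.size() - 1;
            if (last >= 0 && merged.get(last).format.equals(r.format)) {
                merged.set(last, new Run(merged.get(last).text + r.text, r.format));
            } else {
                merged.add(r);
            }
        }
        return merged;
    }

    // True if the single-character runs starting at 'start' spell out 'search'.
    static boolean matchesAt(List<Run> chars, int start, String search) {
        if (start + search.length() > chars.size()) return false;
        for (int k = 0; k < search.length(); k++) {
            if (chars.get(start + k).text.charAt(0) != search.charAt(k)) return false;
        }
        return true;
    }

    // Concatenates the text of all runs, for inspection.
    static String textOf(List<Run> runs) {
        StringBuilder sb = new StringBuilder();
        for (Run r : runs) sb.append(r.text);
        return sb.toString();
    }
}
```

Because the match is done on single characters, it does not matter how Word happened to split "{replace me}" across runs before the search.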
I've written a blog post and recorded a screen-cast that walks through this algorithm.
Blog post: http://openxmldeveloper.org/archive/2011/05/12/148357.aspx
Screen cast: http://www.youtube.com/watch?v=w128hJUu3GM
-Eric
