iTextSharp - PDF Bookmark not pointing to a page - c#

I have built a tree view to show the bookmarks of a given PDF document.
Using iTextSharp I get the bookmarks in a List object and use the Title value to show on the tree view, no problem.
The problem comes when I want the tree view node to reference a page number in the PDF document.
Some PDF documents have a value for Title, Page and Action for example:
Title: "Title Page",
Page: "1 XYZ -3 845 1.0",
Action: "GoTo"
However, others are in this format:
Title: "Title Page",
Named: "G1.1009819",
Action: "GoTo"
I have no idea what to do with the "Named" value. I have tried going through all the links in the document and comparing the value to the destination value of each link, but with no luck.
Does anyone know what this "Named" property represents?

It's a named destination, see the keyword list for some examples. It is a very common way to mark destinations in a document.
What do you want to do with the named destinations?
Do you want to call consolidateNamedDestinations() so that they are no longer named destinations, but links to a specific place in the document?
Or do you want to create a link to a named destination? (That's probably more work. I don't think there are examples at hand.)
If you browse the examples, you'll discover the LinkActions where we use the SimpleNamedDestination object to retrieve the named destinations almost the same way you retrieve bookmarks using the SimpleBookmark class.
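This code snippet gives us the named destinations in the form of an XML file (imports use the iText 5 package names):
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.HashMap;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.SimpleNamedDestination;

public void createXml(String src, String dest) throws IOException {
    PdfReader reader = new PdfReader(src);
    // Each map entry pairs a destination name with its destination string
    HashMap<String,String> map = SimpleNamedDestination.getNamedDestination(reader, false);
    SimpleNamedDestination.exportToXML(map, new FileOutputStream(dest), "ISO8859-1", true);
    reader.close();
}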
See destinations.xml for the result.
The code is much easier because the structure isn't nested: each name corresponds with a single destination.
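To tie this back to the tree view in the question: once the named destinations are in a map, a bookmark's "Named" value can be looked up there and turned into a page number. Below is a minimal sketch in plain Java. It assumes the map values have the same shape as the "Page" value shown in the question ("1 XYZ -3 845 1.0", page number first). The class name BookmarkPages, both method names, and the sample destination string are hypothetical, and no iText types are used, so only the lookup logic is shown.

```java
import java.util.HashMap;
import java.util.Map;

public class BookmarkPages {

    // Extracts the leading page number from a destination string such as
    // "1 XYZ -3 845 1.0". Returns -1 when the string is null, blank, or
    // does not start with an integer.
    static int pageOf(String destination) {
        if (destination == null) {
            return -1;
        }
        String trimmed = destination.trim();
        if (trimmed.isEmpty()) {
            return -1;
        }
        String first = trimmed.split("\\s+")[0];
        try {
            return Integer.parseInt(first);
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    // Resolves one bookmark entry (keys such as "Title", "Page", "Named")
    // to a page number. A direct "Page" value wins; otherwise the "Named"
    // value is looked up in the named-destination map.
    static int resolvePage(Map<String, Object> bookmark,
                           Map<String, String> namedDestinations) {
        Object page = bookmark.get("Page");
        if (page instanceof String) {
            return pageOf((String) page);
        }
        Object name = bookmark.get("Named");
        if (name instanceof String) {
            return pageOf(namedDestinations.get((String) name));
        }
        return -1;
    }

    public static void main(String[] args) {
        // Made-up sample data; in practice this map would come from the PDF.
        Map<String, String> named = new HashMap<>();
        named.put("G1.1009819", "3 XYZ 72 720 0");

        Map<String, Object> direct = new HashMap<>();
        direct.put("Title", "Title Page");
        direct.put("Page", "1 XYZ -3 845 1.0");

        Map<String, Object> byName = new HashMap<>();
        byName.put("Title", "Title Page");
        byName.put("Named", "G1.1009819");

        System.out.println(resolvePage(direct, named)); // prints 1
        System.out.println(resolvePage(byName, named)); // prints 3
    }
}
```

In a real application the map would presumably be the one returned by SimpleNamedDestination.getNamedDestination(reader, false), and each bookmark would be one entry of the list returned by SimpleBookmark. Nested bookmarks (stored under the "Kids" key) would need the same treatment recursively.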

Related

How to get the name and the path of the picture I've just inserted to document (MS Word) by C#?

I have a document and I insert a picture (koala.jpg from C:\Data) into it. I want to verify that I inserted exactly that picture (koala.jpg from C:\Data, not another picture), so I would like to get the name and the path of the picture that was inserted. I searched a lot on the Internet but have not found a good way to solve this. I found LinkFormat.SourceFullName in Microsoft.Office.Interop.Word, but I don't know how to use it. I think LinkFormat.SourceFullName applies to pictures inserted with "Link to File", but my picture was inserted with the default "Insert" (not "Link to File").
I hope you can give me an idea for solving my problem.
Microsoft.Office.Interop.Word.Application oWord = (Microsoft.Office.Interop.Word.Application)w;
_activeDoc = oWord.ActiveDocument;
MessageBox.Show(_activeDoc.InlineShapes[1].LinkFormat.SourceFullName.ToString()); // this line throws an error
Word does not store this information in the document. It does store the name of the original file in the Word Open XML, but not the original file path. The file name is not exposed to the object model ("interop"); it can only be extracted from the Word Open XML (which can be read from the "interop" via the Range.WordOpenXML property).
If you need to trace the entire file path of an image then your application needs to manage the interface used to insert it. Either "roll your own" dialog box or leverage Word's built-in dialog box.
Both are possible, but the latter requires PInvoke when using C# as the dialog box arguments are "late-bound" into the Word object model and thus not accessible via the object model for C#.
For an example on how to use Word's built-in dialog boxes in C#, see this Stack Overflow answer. A list of built-in dialog boxes with their arguments can be found at https://learn.microsoft.com/en-us/office/vba/word/concepts/customizing-word/built-in-dialog-box-argument-lists-word. The built-in dialog box argument to return the file path for the picture the user selected is Name.

How can I export a piece of a DOCX file and keep the same paragraph numbering?

TL;DR:
How can I capture the paragraph numbering as a 'part' of the text and export it to a DOCX?
Problem
I have a document that's split into sections and sub-sections that reads similarly to a set of state statutes (Statute 208, with subsections Statute 208.1, Statute 208.2, etc.). We created this by modifying the numbering.xml file within the .docx zip.
I want to export a 'sub-section' (208.5) and its text to a separate .docx file. My VSTO add-in exports the text well enough, but the numbering resets to 208.1. This does make some sense as it's now the first paragraph with that <ilvl> in the document.
PDF works okay
Funnily enough, I'm able to call Word.Range's ExportAsFixedFormat function and export this selection to PDF just fine - even retaining the numbering. This led me down a path of trying to 'render' the selection, possibly as it would be printed, in order to throw it into a new .docx file, but I haven't figured that out, either.
What I've tried:
Range.ExportFragment() using both wdFormatStrictOpenXMLDocument and wdFormatDocumentDefault as the WdSaveFormat values.
These export but also reset the numbering.
Document.PrintOut() using PrintToFile = true and a valid filename. I realize now that this, quite literally, generates 'printout instructions' and won't inject a new file at path filename with any valid file structure.
Plainly doesn't work. :)
Application.Selection.XML to a variable content and calling Document.Content.InsertXML(content) on a newly added Document object.
Still resets the numbering.
Code Section for Context
using Word = Microsoft.Office.Interop.Word;
Word.Range range = Application.ActiveDocument.Range(startPosition, endPosition);
range.Select();
//export to DOCX?
Application.Selection.Range.ExportFragment(
filename, Word.WdSaveFormat.wdFormatDocumentDefault);
You could use ConvertNumbersToText(wdNumberAllNumbers) before exporting, then _Document.Undo() or close without saving after the export.
There is some good information at this (dated) link that still should work with current Word APIs:
https://forums.windowssecrets.com/showthread.php/27711-Determining-which-ListTemplates-item-is-in-use-(VBA-Word-2000)
Information at that link suggests that you can create a name/handle for your ListTemplate so that you can reference it in code--as long as your statute-style bullets are associated with a named style for the document. The idea is to first name the ListTemplate that's associated with the statute bullet style for the active document and then reference that name when accessing the ListLevels collection.
For instance, you could have code that looks something like this (VBA syntax, as in the linked thread):
ActiveDocument.Styles("StatutesBulletStyle").ListTemplate.Name = "StatuteBulletListTemplate"
After the above assignment, you can refer to the template by name:
ActiveDocument.ListTemplates("StatuteBulletListTemplate").ListLevels(1).StartAt = 5
Using the above technique no longer requires that you try to figure out what the active template is...
Does that help?

Adding text after a particular text in word

I am trying to automate the Word application from C#.
I want to write a paragraph as soon as I find a particular text in a Word file.
For example, if I find the text/header "Address" in a Word document, I would write the complete address below it as content.
I am trying to do this by taking control of the cursor and placing it after the "Address" text once I find it, but I have been unable to. Can anyone please suggest an approach?
It sounds like you want to insert text into a document at a predetermined location. If that's true then you should consider using Word's Bookmarks feature instead of searching for arbitrary text like "Address". You can define bookmark names within a Word document (using the Insert > Bookmark command). Bookmarks are easy to access from C#, allowing you to insert or replace text at that location.
For example, create a new Word document and enter some arbitrary text. Select the text you want to be replaced, then click Insert > Bookmark and name the bookmark "BOOKMARK1". Save and close the document. You can now use code like the following to replace the text:
var app = new Microsoft.Office.Interop.Word.Application();
var document = app.Documents.Open("c:\\temp\\interoptest.docx");
document.Bookmarks["BOOKMARK1"].Range.Text = "This text has been replaced.";
document.Save();
app.Quit(SaveChanges: false);
Note that you'll need to add a reference to the Microsoft Word Object Library for the above code to compile. This library is found under the COM section when adding references in the most recent version of Visual Studio.

iTextSharp produce PDF from existing PDF template

I am looking at the feasibility of creating something using C# and iTextSharp that can take a PDF template and replace various placeholder values with actual values retrieved from a database. Essentially a PDF mail merge. I have the iText in Action book, but it covers rather a lot of material I don't need, and I am struggling to find anything related to what I want to do. I am happy to use PDF fields as the placeholders so long as the merged/flattened form does not look like it has fields in it; the output document should look like a mail-merged letter and not a form that has been filled in. In an ideal world I just want to search the text content of the PDF and then replace text placeholders with their correct field values, à la Word mail merge.
Can anyone advise me of the best approach to this and point me in the direction of the most helpful iTextSharp classes to use? Or, if you know the iText in Action book, a pointer to the most helpful section for me to read?
Build your template sans fields in your page-layout/text-editor of choice.
Save to PDF.
Open that PDF and add fields to it. This is easy to do in Acrobat Pro (you could download a trial if need be). It's also possible in iText, just much harder.
In either case, you want to set your form fields to have no border, and no background... that way only their contents will be visible, no boxes to make your fields look like fields.
Merging field data into a form is Quite Trivial with iText (forgive my Java, I don't know much about C#):
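import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Map;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.pdf.AcroFields;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfStamper;

// outPath is an illustrative parameter standing in for the output stream the original left undefined
void fillPDF(String filePath, String outPath, Map<String, String> fieldVals)
        throws IOException, DocumentException {
    PdfReader reader = new PdfReader(filePath);
    PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(outPath));
    stamper.setFormFlattening(true); // flatten so the result no longer contains form fields
    AcroFields fields = stamper.getAcroFields();
    for (String fldName : fieldVals.keySet()) {
        fields.setField(fldName, fieldVals.get(fldName));
    }
    stamper.close();
}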
This ignores list boxes with multiple selections (and exceptions), but other than that should be ready to go. Given that you're doing a mail-merge type thing, I don't think multiple selections will be much of an issue.

Interop Word - Delete Page from Document

What is the easiest and most efficient way to delete a specific page from a Document object using the Word Interop Libraries?
I have noticed there is a Pages property that extends/implements IEnumerable. Can one simply remove the elements in the array and the pages will be removed from the Document?
I have also seen the Ranges and Section examples, but they don't look very elegant to use.
Thanks.
The short answer to your question is that there is no elegant way to do what you are trying to achieve.
Word heavily separates the content of a document from its layout. As far as Word is concerned, a document doesn't have pages; rather, pages are something derived from a document by viewing it in a certain way (e.g. print view). The Pages collection belongs to the Pane interface (accessed, for example, by Application.ActiveWindow.ActivePane), which controls layout. Consequently, there are no methods on Page that allow you to change (or delete) the content that leads to the existence of the page.
If you have control over the document(s) that you are processing in your code, I suggest that you define sections within the document that represent the parts you want to programmatically delete. Sections are a better construct because they represent content, not layout (a section may, in turn, contain page breaks). If you were to do this, you could use the following code to remove a specific section:
object missing = Type.Missing;
foreach (Microsoft.Office.Interop.Word.Section section in doc.Sections) {
    if (/* some criteria */) {
        section.Range.Delete(ref missing, ref missing);
        break;
    }
}
One possible option is to bookmark the whole pages (Select the whole page, go to Tools | Insert Bookmark then type in a name). You can then use the Bookmarks collection of the Document object to refer to the text and delete it.
Alternatively, try the C# equivalent of this code:
Doc.ActiveWindow.Selection.GoTo wdPage, PageNumber
Doc.Bookmarks("\Page").Range.Text = ""
The first line moves the cursor to page "PageNumber". The second one uses a predefined bookmark which always refers to the page the cursor is currently on, including the page break at the end of the page if it exists.
