Microsoft Office Word add-in paste functionality - C#

Microsoft Office Word add-in paste functionality - C# - c#

I have a simple word add-on which I use to paste a sequence of strings into different places of a Microsoft Word document.
Currently I am using these lines of code, to get the information about the position:
int PageNumber = range.get_Information(WdInformation.wdActiveEndPageNumber);
int ColumnNumber = range.get_Information(WdInformation.wdFirstCharacterColumnNumber);
int LineNumber = range.get_Information(WdInformation.wdFirstCharacterLineNumber);
I need a way to track their place dynamically. Let’s say if user paste a name somewhere, and then our user decides to change the contents of document before this pasted name.
Do I need to parse the entire document to find my pasted string?
What if it’s a common string value like "Hello"? Can I hide or attach something to my pasted string dynamically? Like a pointer to specific string in document?
I appreciate any help or idea, thanks.

As per the comments above it sounds like bookmarks provide what you need. Bookmarks can be hidden - to do that (IIRC) you need to give them a name that begins with an underscore. If you want to iterate over your hidden bookmarks later on then you need to set
Bookmarks.ShowHidden = true
beforehand.

Related

Word merge field in header loses value in print preview

I have an ASP WebForms app where I use a word template that contains merge fields, to replace them with data extracted from the database. The app works great, the word document is exported, but when trying to print the document, one of the merge fields, which exists in the header, loses it's value and restores to the initial merge field name. Is this something that has to be fixed from the application's code or is this a word settings issue.
Any help is greatly appreciated.
Thank you!

I have managed to solve this problem using OpenXML Productivity Tool. It turns out that you can't add a merge field in the header, so what I did was to put it inside of a textbox. I forgot to mention that part in the initial description. Thus, the text element was buried deep in open xml. When I managed to find it and log what was inside, I found out that I inserted the MERGEFIELD <> MERGEFORMAT. Every time I tried to insert the value that I wanted in this <>, it got reset when I hit print preview. So what I did, based on a suggestion from someone who had a similar problem, was to delete this textbox and create a clean one where I only entered "Test". It needs to have a string inside so that open xml created the element Text (instrTxt).
In C# I did this:
foreach (var hPart in firstDoc.MainDocumentPart.HeaderParts)
{
foreach (var txt in hPart.Header.Descendants<Text>())
{
if(txt.Text == "Test")
{
txt.Text = "My custom text";
}
}
}
So for each header part (because I can't tell for sure in which one it is..it could even be in multiple ones), get all descendants of type text.
I got a few more than I wanted. I also got a few that contained the page number (since I have it in the header as well). So I added an if to check if it's the text element that I wanted. Once I found it, I added the text I wanted.
So long story short, instead of using a merge field in the header, I just used a text. Perhaps it's not the most efficient way of doing this. Maybe the question still remains, (if I could have inserted a merge field in the header and actually made it work without having word reset the value upon print preview? idk), but this worked for me.

Is there any way to assign Id's to paragraphs in Open XML SDK 2.5?

I'm working on an application which has to create word documents with the use of Office Open XML SDK 2.5. The idea that I'm having now is that I will start from a template with an empty body (so I have all the namespaces etc. defined already), and add Paragraphsto it. If I need images I will add the ImageParts and try to give the ImagePart the Id present in the predefined paragraphpart which will contain the image. I will store the paragraphs as xml in a database, fetch the ones I need, fill in/modify some values if needed and insert them into my word document. But this is the tricky part, how can I easily insert them in a way so I don't have to query on their content to later on find one of the paragraphs? In other words, I need Id's. I have some options in mind:
For each possible paragraph I have, manually create a SdtBlock. This SdtBlock will have an Id which matches the Id of each paragraph in the database. This seems like a lot of manual work though, and I'd rather be able to create future word documents easier...
I chose this approach but I insert Building Blocks which can be stored in templates with a specific tagname.
Create the paragraphs, copy the xml from the developer tool, and manually add a ParagraphId. This seems even more of a nightmare though, because for every future new paragraphs I will have to create new Id's etc. Also it would be impossible to insert tables as there is no way (afaik) to give those an Id.
Work with bookmarks to know where to insert the data. I don't really like this either as bookmarks are visible for everyone. I know I can replace them, but then I don't have any way to identify individual paragraphs later on.
**** my database and just add everything in the template :D Remove the paragraphs I don't need by deleting the bookmarks with their content. This idea seems the worst of all though as I don't want to depend on having a templatefile with all possible content per word-file I need to generate.
Anyone with experience in OpenXml who knows which approach would be the best? Maybe there is another approach which is better and I have completely overlooked? The ideal solution would be that I can add Ids in Office Word but that's a no-go as I haven't found anything to do that yet.
Thanks in advance!

Content Controls (std) were designed for this, although I'm not sure the designers ever contemplated "targeting" each and every paragraph in the document...
Back in the 2003/2007 days it was possible to add custom XML mark-up to a Word document, which would have been exactly what you're looking for. But Microsoft lost a patent court case around 2009 and had to pull the functionality. So content controls are really your only good choice.
Your approach could, possibly, be combined with the BuildingBlocks concept. BuildingBlocks are Word content stored in a Word template file as valid Word Open XML. They can be assigned to "galleries" and categorized. There is a Content Control of type BuildingBlock that can be associated with a specific Gallery and Category which might help you to a certain extent and would be an alternative to storing content in a database.

Ok, I did a small research, you can do it in strict OpenXML, but only before you open your file in Word. Word will remove everything it cannot read.
using (WordprocessingDocument document = WordprocessingDocument.Open(path, true)) {
document.MainDocumentPart.Document.Body.Ancestors().First()
.SetAttribute(new OpenXmlAttribute() {
LocalName = "someIdName",
Value = "111" });
}
Here, for example, I set attribute "someIdName", which doesn't exits in OpenXML, to some random element. You can set it anywhere and use it as id

Creating Word file from ObservableCollection with C#

I have an observable collection with a class that has 2 string properties: Word and Translation. I want to create a word file in format:
word = translation word = translation
word = translation word = translation...
The word document needs to be in 2 Columns (PageLayout) and the Word should be in bold.
I have first tried Microsoft.Office.Interop.Word.
PageSetup.TextColumns.SetCount(2) sets the PageLayout. As for the text itself I used a foreach loop and in each iteration I did this:
paragraph.Range.Text = Word + " = " + Translation;
object boldStart = paragraph.Range.Start;
object boldEnd = paragraph.Range.Start + Word.Length;
Word.Range boldPart = document.Range(boldStart, boldEnd);
boldPart.Bold = 1;
paragraph.Range.InsertParagraphAfter();
This does exactly what I want, but if there are 1000 items in the collection it takes about 10sec, much much more if the number is 10k+. I then used a StringBuilder and just set document.Content.Text = sb.ToString(); and that takes less than a sec, but I can't set the word to be bold that way.
Then I switched to using Open XML SDK 2.5, but even after reading the msdn documentation I still have no idea how to make just a part of the text bold, and I don't know if it's even possible to set PageLayout Columns count. The only thing I could do was to make it look the same as with Interop.Word, but with just 1 column and <1sec creation time.
Should I be using Interop.Word or Open XML (or maybe combined) for this? And can someone pls show me how to write this properly, so it doesn't take forever if the collection is relatively large? Any help is appreciated. :)

OOXML can be intimidating at first. http://officeopenxml.com/anatomyofOOXML.php has some good examples. Whenever you get confused unzip the docx and browse the contents to see how it's done.
The basic idea is you'd open Word, create a template with the styling you want and a code word to find the paragraph, then multiply the paragraph, replacing the text in that template with each word.
Your Word template would look like this:
Here's some pseudo code to get you started, assuming you have the SDK installed
var templateRegex = new Regex("\\[templateForWords\\]");
var wordPlacementRegex = new Regex("\\[word\\]");
var translationPlacementRegex = new Regex("\\[translation]\\]");
using (var document = WordprocessingDocument.Open(stream, true))
{
MainDocumentPart mainPart = document.MainDocumentPart;
// do your work here...
var paragraphTemplate = mainPart.Document.Body
.Descendants<Paragraph>()
.Where(p=>templateRegex.IsMatch(p.InnerText)); //pseudo
//... or whatever gives you the text of the Para, I don't have the SDK right now
foreach (string word in YourDictionary){
var paraClone = paragraphTemplate.Clone(); // pseudo
// you may need to do something like
// paraClone.Descendents<Text>().Where(t=>regex.IsMatch(t.Value))
// to find the exact element containing template text
paraClone.Text = templateRegex.Replace(paraClone.Text,"");// pseudo
paraClone.Text = wordPlacementRegex.Replace(paraClone.Text,word);
paraClone.Text = translationPlacementRegex.Replace(paraClone.Text,YourDictionary[word]);
paragraphTemplate.Parent.InsertAfter(paraClone,ParagraphTemplate); // pseudo
}
paragraphTemplate.Remove();
// document should auto-save
document.Package.Flush();
}

OpenXML is absolutely better, because it is faster, has less bugs, more reliable and flexible in runtime (especially in server environment). And it's not really difficult to find out how to make one or another element using OpenXML. As docx file is just a zip file with xml files inside, I open it and read the xml to get the idea, how word itself makes it. First of all, I create a document, then format it (in your case, you can create some file with two columns and bold words inside), save it, rename it to .zip file. Then open it, open "word" directory inside and the file "document.xml" inside the directory. This document contains essential part of xml, looking at this it's not difficult to figure out how to recreate it in OpenXML

Open XML is a much better option than Office COM. But the problem is that it is a low-level file format library that unlike Office COM doesn’t work on a high abstraction level. You might want to go that route but I recommend you to first consider looking into a commercial library that will give you the benefits of a high-level DOM without the need to have MS Word installed on the production machine. Our company recently purchased this toolkit which allows you to use template based approach and also DOM/programmatic approach to generate/modify/create documents.

Use OpenXML to replace text in DOCX file - strange content

I'm trying to use the OpenXML SDK and the samples on Microsoft's pages to replace placeholders with real content in Word documents.
It used to work as described here, but after editing the template file in Word adding headers and footers it stopped working. I wondered why and some debugging showed me this:
Which is the content of texts in this piece of code:
using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(DocumentFile, true))
{
var texts = wordDoc.MainDocumentPart.Document.Body.Descendants<Text>().ToList();
}
So what I see here is that the body of the document is "fragmented", even though in Word the content looks like this:
Can somebody tell me how I can get around this?
I have been asked what I'm trying to achieve. Basically I want to replace user defined "placeholders" with real content. I want to treat the Word document like a template. The placeholders can be anything. In my above example they look like {var:Template1}, but that's just something I'm playing with. It could basically be any word.
So for example if the document contains the following paragraph:
Do not use the name USER_NAME
The user should be able to replace the USER_NAME placeholder with the word admin for example, keeping the formatting intact. The result should be
Do not use the name admin
The problem I see with working on paragraph level, concatenating the content and then replacing the content of the paragraph, I fear I'm losing the formatting that should be kept as in
Do not use the name admin

Various things can fragment text runs. Most frequently proofing markup (as apparently is the case here, where there are "squigglies") or rsid (used to compare documents and track who edited what, when), as well as the "Go back" bookmark Word sets in the background. These become readily apparent if you view the underlying WordOpenXML (using the Open XML SDK Productivity Tool, for example) in the document.xml "part".
It usually helps to go an element level "higher". In this case, get the list of Paragraph descendants and from there get all the Text descendants and concatenate their InnerText.

OpenXML is indeed fragmenting your text:
I created a library that does exactly this : render a word template with the values from a JSON.
From the documenation of docxtemplater :
Why you should use a library for this
Docx is a zipped format that contains some xml. If you want to build a simple replace {tag} by value system, it can already become complicated, because the {tag} is internally separated into <w:t>{</w:t><w:t>tag</w:t><w:t>}</w:t>. If you want to embed loops to iterate over an array, it becomes a real hassle.
The library basically will do the following to keep formatting :
If the text is :
<w:t>Hello</w:t>
<w:t>{name</w:t>
<w:t>} !</w:t>
<w:t>How are you ?</w:t>
The result would be :
<w:t>Hello</w:t>
<w:t>John !</w:t>
<w:t>How are you ?</w:t>
You also have to replace the tag by <w:t xml:space=\"preserve\"> to ensure that the space is not stripped out if they is any in your variables.

Adding text after a particular text in word

I am trying to access word application from C#.
I want to write a paragraph as soon as I find a particular text in a word file.
For example if I find a text/header "Address" in word doc, below this I would write complete address as contents.
I am trying to approach this by getting control of the cursor and placing it after I find address, but am unable to do. Can anyone please sugest an approach for the same.

It sounds like you want to insert text into a document at a predetermined location. If that's true then you should consider using Word's Bookmarks feature instead of searching for arbitrary text like "Address". You can define bookmark names within a Word document (using the Insert > Bookmark command). Bookmarks are easy to access from C#, allowing you to insert or replace text at that location.
For example, create a new Word document and enter some arbitrary text. Select the text you want to be replaced, then click Insert > Bookmark and name the bookmark "BOOKMARK1". Save and close the document. You can now use code like the following to replace the text:
var app = new Microsoft.Office.Interop.Word.Application();
var document = app.Documents.Open("c:\\temp\\interoptest.docx");
document.Bookmarks["BOOKMARK1"].Range.Text = "This text has been replaced.";
document.Save();
app.Quit(SaveChanges: false);
Note that you'll need to add a reference to the Microsoft Word Object Library for the above code to compile. This library is found under the COM section when adding references in the most recent version of Visual Studio.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.