Selecting text by font in Word - c#

Is there a way of extracting all lines that are using a particular font (size, is it bolded, font name, etc) in word via C#?
In addition, is there a way to find out what is the font for some text that is in the document?
My hunch is that there are functions in the Microsoft.Office.Interop.Word namespace that can do this, but I cannot seem to find them.
Edit: I am using word 2010.

You can loop through the document using the Find object from Word Interop. You can set the Find.Font.Name property for a Selection or Range from your document. Note that the Font interface has several Name* properties for various encodings.
EDIT
Here's the equivalent VBA code:
Dim selectionRange As Range
Set selectionRange = Application.ActiveDocument.Range
With selectionRange.Find
.ClearFormatting
.Format = True
.Font.NameBi = "Narkisim" //for doc without bidirectional script, use Name
Do While .Execute
MsgBox selectionRange.Text
Loop
End With
The object model from Word Interop is the same, see the link above.
Don't go asking me for C# code now... this is SO, we don't do silver platters. And if you're ever going to do serious work with the Office Interop API, you will need to be able to read VBA code.

Related

How can I export a piece of a DOCX file and keep the same paragraph numbering?

TL;DR:
How can I capture the paragraph numbering as a 'part' of the text and export it to a DOCX?
Problem
I have a document that's split into sections and sub-sections that reads similarly to a set of state statutes (Statute 208, with subsections Statute 208.1, Statute 208.2, etc.). We created this by modifying the numbering.xml file within the .docx zip.
I want to export a 'sub-section' (208.5) and its text to a separate .docx file. My VSTO add-in exports the text well enough, but the numbering resets to 208.1. This does make some sense as it's now the first paragraph with that <ilvl> in the document.
PDF works okay
Funnily enough, I'm able to call Word.Range's ExportAsFixedFormat function and export this selection to PDF just fine - even retaining the numbering. This led me down a path of trying to 'render' the selection, possibly as it would be printed, in order to throw it into a new .docx file, but I haven't figured that out, either.
What I've tried:
Range.ExportFragment() using both wdFormatStrictOpenXMLDocument and wdFormatDocumentDefaultas the wdSaveType values.
These export but also reset the numbering.
Document.PrintOut() using PrintToFile = true and a valid filename. I realize now that this, quite literally, generates 'printout instructions' and won't inject a new file at path filename with any valid file structure.
Plainly doesn't work. :)
Application.Selection.XML to a variable content and calling Document.Content.InsertXML(content) on a newly added Document object.
Still resets the numbering.
Code Section for Context
using Word = Microsoft.Office.Interop.Word;
Word.Range range = Application.ActiveDocument.Range(startPosition, endPosition);
range.Select();
//export to DOCX?
Application.Selection.Range.ExportFragment(
filename, Word.WdSaveFormat.wdFormatDocumentDefault);
You could use ConvertNumbersToText(wdNumberAllNumbers) before exporting, then _Document.Undo() or close without saving after the export.
There is some good information at this (dated) link that still should work with current Word APIs:
https://forums.windowssecrets.com/showthread.php/27711-Determining-which-ListTemplates-item-is-in-use-(VBA-Word-2000)
Information at that link suggests that you can create a name/handle for your ListTemplate so that you can reference it in code--as long as your statute-style bullets are associated with a named style for the document. The idea is to first name the ListTemplate that's associated with the statute bullet style for the active document and then reference that name when accessing the ListLevels collection.
For instance, you could have code that looks something like this:
ActiveDocument.Styles("StatutesBulletStyle").ListTemplate.Name = "StatuteBulletListTemplate";
After the above assignment, you can refer to the template by name:
ActiveDocument.ListTemplates("StatuteBulletListTemplate").ListLevels(1).StartAt = 5;
Using the above technique no longer requires that you try to figure out what the active template is...
Does that help?

.net Interop.Word insert from docx without screwing up formatting

I have an issue I've stuck with for over a year now. I made a Forms application in VB.net which allows the user to type in some information and select items which represent docx-files with tables with special formatting, pictures and other formatting quirks in them.
At the end the software creates a Word document via Office.Interop, using the information the user provided in text fields in the Forms and the items they selected (e.g. it creates a table in Word, listing the user's selections with some extra info) and then appends the content from multiple docx-files depending on the user's selection to the document created via Interop.
The problem is: To achieve this I had to use a pretty dirty method:
I open the respective docx-files, select all content (Range.Wholestory()) and copy it (Range.Copy()). Then I insert this content from the clipboard into my newly created document with the following option:
Selection.PasteAndFormat (wdFormatOriginalFormatting)
This produces a satisfactory result but it feels super dirty since it uses the user's clipboard (which I save at the beginning of the runtime and restore at the end).
I originally tried to use the Selection.InsertFile-Method and tried this again today but it completely screws the formatting.
When the content of the docx is inserted this way it neither has the formatting of the original docx nor the one of the file I created with the program. E.g. the SpaceBefore and SpaceAfter values are wrong, even if I explicitly define them in my created file. Changing the formatting afterwards is no option since the source files contain a lot of special formatting and can change all the time.
Another factor which makes it hard: I cannot save the file before it is presented to the user, using temp folder is not an option in the environment this application is deployed into, so basically everything happens in RAM.
Summary:
Basically what I want is to create the same outcome as with my "Copy and Paste" method utilizing the OriginalFormatting WITHOUT using the clipboard. The problem is, the InsertFile-Method doesn't provide an option for the formatting.
Any idea or help would be greatly appreciated.
Edit:
The FormattedText option as suggested by Rich Michaels produces the same result as the InsertFile-Method. Here is the relevant part of what I did (word is the Microsoft.Office.Interop.Word.Application):
#Opening the source file
Dim doctemp As Microsoft.Office.Interop.Word.Document
doctemp = word.Documents.Open(doctempfilepath)
#Selecting whole document; this is what I did for the "Copy/Paste"-Method, too
doctemp.Range.WholeStory()
Dim insert_range As wordoptions.Range
doc_destination.Activate()
#Jumping to the end and selecting the range
word.Selection.EndKey(Unit:=Microsoft.Office.Interop.Word.WdUnits.wdStory)
insert_range = word.Selection.Range
#Inserting the text
insert_range.FormattedText = doctemp.Range.FormattedText
doctemp.Close(False)
This is the problem:
Use the Range.FormattedText property. It doesn't touch the clipboard and it maintains the source formatting. The process is ...
Set the range in the Source document you want "copied" and set the insertion point in the Destination document and then,
DestinationRange.FormattedText = SourceRange.FormattedText

Use OpenXML to replace text in DOCX file - strange content

I'm trying to use the OpenXML SDK and the samples on Microsoft's pages to replace placeholders with real content in Word documents.
It used to work as described here, but after editing the template file in Word adding headers and footers it stopped working. I wondered why and some debugging showed me this:
Which is the content of texts in this piece of code:
using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(DocumentFile, true))
{
var texts = wordDoc.MainDocumentPart.Document.Body.Descendants<Text>().ToList();
}
So what I see here is that the body of the document is "fragmented", even though in Word the content looks like this:
Can somebody tell me how I can get around this?
I have been asked what I'm trying to achieve. Basically I want to replace user defined "placeholders" with real content. I want to treat the Word document like a template. The placeholders can be anything. In my above example they look like {var:Template1}, but that's just something I'm playing with. It could basically be any word.
So for example if the document contains the following paragraph:
Do not use the name USER_NAME
The user should be able to replace the USER_NAME placeholder with the word admin for example, keeping the formatting intact. The result should be
Do not use the name admin
The problem I see with working on paragraph level, concatenating the content and then replacing the content of the paragraph, I fear I'm losing the formatting that should be kept as in
Do not use the name admin
Various things can fragment text runs. Most frequently proofing markup (as apparently is the case here, where there are "squigglies") or rsid (used to compare documents and track who edited what, when), as well as the "Go back" bookmark Word sets in the background. These become readily apparent if you view the underlying WordOpenXML (using the Open XML SDK Productivity Tool, for example) in the document.xml "part".
It usually helps to go an element level "higher". In this case, get the list of Paragraph descendants and from there get all the Text descendants and concatenate their InnerText.
OpenXML is indeed fragmenting your text:
I created a library that does exactly this : render a word template with the values from a JSON.
From the documenation of docxtemplater :
Why you should use a library for this
Docx is a zipped format that contains some xml. If you want to build a simple replace {tag} by value system, it can already become complicated, because the {tag} is internally separated into <w:t>{</w:t><w:t>tag</w:t><w:t>}</w:t>. If you want to embed loops to iterate over an array, it becomes a real hassle.
The library basically will do the following to keep formatting :
If the text is :
<w:t>Hello</w:t>
<w:t>{name</w:t>
<w:t>} !</w:t>
<w:t>How are you ?</w:t>
The result would be :
<w:t>Hello</w:t>
<w:t>John !</w:t>
<w:t>How are you ?</w:t>
You also have to replace the tag by <w:t xml:space=\"preserve\"> to ensure that the space is not stripped out if they is any in your variables.

Adding text after a particular text in word

I am trying to access word application from C#.
I want to write a paragraph as soon as I find a particular text in a word file.
For example if I find a text/header "Address" in word doc, below this I would write complete address as contents.
I am trying to approach this by getting control of the cursor and placing it after I find address, but am unable to do. Can anyone please sugest an approach for the same.
It sounds like you want to insert text into a document at a predetermined location. If that's true then you should consider using Word's Bookmarks feature instead of searching for arbitrary text like "Address". You can define bookmark names within a Word document (using the Insert > Bookmark command). Bookmarks are easy to access from C#, allowing you to insert or replace text at that location.
For example, create a new Word document and enter some arbitrary text. Select the text you want to be replaced, then click Insert > Bookmark and name the bookmark "BOOKMARK1". Save and close the document. You can now use code like the following to replace the text:
var app = new Microsoft.Office.Interop.Word.Application();
var document = app.Documents.Open("c:\\temp\\interoptest.docx");
document.Bookmarks["BOOKMARK1"].Range.Text = "This text has been replaced.";
document.Save();
app.Quit(SaveChanges: false);
Note that you'll need to add a reference to the Microsoft Word Object Library for the above code to compile. This library is found under the COM section when adding references in the most recent version of Visual Studio.

Fillable doc files

I have a samples of some documents in .doc format. So I need to create some "fillable# areas instead of certain values in samples. Then I need to automatically fill this documents using C#. So what do you think about it? Is that possible? Thanks in advance, guys! P.S.: if you need some information from me please feel free to ask me about additions to my question.
Besides simply injecting/replacing text into the document itself you could also utilize docvariables. You can define/create them in your document and then you can codewise set the values.
Using docvariables you seperate the design of the worddoc (where is the text shown) from setting the values which might be usefull for your case.
You can certainly manipulate them using C# but a bit more info using a vba sample can found at What is a DOCVARIABLE in word
One little warning when using c# to edit them. If you set the value of a docvariable to "" (empty string) it results in the docvariable being deleted from the document. If you want to keep the docvariable around set it's value to a " " (space)
Yes this is possible, you can create in your Document a placeholder areas which you search and change when you access the file. Check these results on how to modify the word document using C#

Categories