HitHighlight only for current Range - c#

I am using the following code to find some strings in my document:
Application application = Addin.Application;
Document document = application.ActiveDocument;
Range rng = document.Content;
rng.Find.ClearFormatting();
rng.Find.Forward = true;
rng.Find.Text = findText;
while (rng.Find.Execute() && rng.Find.Found)
{
//here: rng.Text == findText
//however rng.Find.HitHighlight(findText, WdColor.wdColorAqua);
//highlights all occurrences in the document, not in the current range
}
As the comments in the code state, I'd expect rng.Find.HitHighlight(findText, WdColor.wdColorAqua); to only work on the current range, but instead it executes on the whole document.
Interestingly, if I start from a different range this works as I would expect, i.e.
Range rng = document.Content.Paragraphs.First.Range;
rng.Find.HitHighlight("video", WdColor.wdColorAqua);
will only HitHighlight the findText in the first paragraph.
This is inconsistent... Any ideas on how to perform HitHighlight only on the range selected using Find?
NOTE: I tried this in a NetOffice addin and I get the same behavior.

It seems like the Find.HitHighlight method is not intended to be used at the same time as Find.Execute. It seems to use whatever range was present when you called Execute. If you don't call Execute, it uses the current range. The documentation hints at this:
Remarks
The HitHighlight method applies primarily to search highlighting in Office Outlook when Word is specified as the email editor. However, you can use this method on documents inside Word if you want to highlight found text. Otherwise, use the Execute method.
I suspect there is no way to do this except by iterating through each paragraph, which you already know about.
I realize this is not a complete answer but VSTO is not a popular tag and this may be the best you are going to get. Most questions go completely unanswered, and often without comments as well.
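A minimal sketch of the paragraph-by-paragraph workaround mentioned above. It relies only on the behavior reported in the question (HitHighlight stays inside a range on which Find.Execute was never called). The class name HitHighlightSketch, the method HighlightIn and the include filter are hypothetical names introduced for illustration, and the code assumes a reference to Microsoft.Office.Interop.Word and C# 4 or later, so that the optional ref parameters can be omitted.

```csharp
using System;
using Word = Microsoft.Office.Interop.Word;

static class HitHighlightSketch
{
    // Applies HitHighlight one paragraph at a time, skipping every
    // paragraph that the caller-supplied filter rejects.
    public static void HighlightIn(Word.Document document, string findText,
                                   Func<Word.Paragraph, bool> include)
    {
        foreach (Word.Paragraph paragraph in document.Paragraphs)
        {
            if (!include(paragraph)) continue;

            // A fresh Range per paragraph; Find.Execute is never called on it,
            // which is the case the question reports as working.
            Word.Range paragraphRange = paragraph.Range;
            paragraphRange.Find.HitHighlight(findText, Word.WdColor.wdColorAqua);
        }
    }
}
```

The filter decides which paragraphs are searched, for example only those inside a given section. This narrows the highlight to whole paragraphs, not to a single found occurrence.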


c# Interop: How to restart Selection.Find to beginning of doc?

Simple question that I'm not finding an answer to. My code below is in a loop and finds the first text matching "{{foo}}" in a Word doc. I then want to reset the Find so that it begins its next search at the beginning of the doc again. Currently, it picks up just after the "foo".
Selection sel = application.Selection;
sel.Find.ClearFormatting();
sel.Find.MatchWildcards = true;
sel.Find.Text = @"\{\{?@\}\}";
sel.Find.Forward = true;
sel.Find.Execute();
How do I reset the starting location of Find?
It's always "better" to use Range rather than Selection in Word, whenever possible. You can have only one selection, but code can work with multiple ranges. In addition, the screen is quieter and execution tends to be faster. There are situations where Selection is necessary, but this is not one of them.
To get the Range of the entire document
Word.Range rngDoc = document.Content;
To "find" using the range:
rngDoc.Find.ClearFormatting();
rngDoc.Find.MatchWildcards = true;
rngDoc.Find.Text = @"\{\{?@\}\}";
rngDoc.Find.Forward = true;
rngDoc.Find.Wrap = Word.WdFindWrap.wdFindStop; //ensure Word won't enter an infinite loop
rngDoc.Find.Execute();
When "find" is successful, the Range (or Selection) contains only what was found. To "reset" to start again from the beginning of the document (including the whole document):
rngDoc = document.Content;
And (what people ask more frequently) to continue searching from just beyond the "found" term to the end of the document:
object oCollapseEnd = Word.WdCollapseDirection.wdCollapseEnd;
rngDoc.Collapse(ref oCollapseEnd); //go just beyond what was found
rngDoc.End = document.Content.End;
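Putting the pieces above together, a rough sketch of a loop that visits every match by collapsing past each hit and extending the range back to the end of the document. The class name FindAllSketch and the method CountMatches are hypothetical, and the code assumes a reference to Microsoft.Office.Interop.Word and C# 4 or later (optional and ref parameters omitted).

```csharp
using Word = Microsoft.Office.Interop.Word;

static class FindAllSketch
{
    // Counts every wildcard match in the document body.
    public static int CountMatches(Word.Document document, string pattern)
    {
        Word.Range rng = document.Content;
        rng.Find.ClearFormatting();
        rng.Find.MatchWildcards = true;
        rng.Find.Text = pattern;
        rng.Find.Forward = true;
        rng.Find.Wrap = Word.WdFindWrap.wdFindStop; // stop at the end, never wrap around

        int count = 0;
        while (rng.Find.Execute())
        {
            count++;
            // rng now covers only the hit: go just beyond it,
            // then extend the search range to the end of the document.
            rng.Collapse(Word.WdCollapseDirection.wdCollapseEnd);
            rng.End = document.Content.End;
        }
        return count;
    }
}
```

To restart from the top instead, assign document.Content to the range again before the next Execute, as described above.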
In VBA we'd use:
Selection.HomeKey Word.WdUnits.wdStory
So in your C# code that would convert to:
sel.HomeKey(Word.WdUnits.wdStory);
.Find.Execute always resets the range to the found range. Accordingly, you need to either re-set it to the entire document (as per Olivier's comment) or use the C# equivalent of .Wrap = wdFindContinue (VBA) to instruct Word to continue searching from the top after getting to the end of the document.

Duplicate ranges in C# Word Interop

I have a Word template where some paragraphs need to be duplicated programmatically. I tried to use range.Duplicate, but it wouldn't do the job.
Now I have this code:
document.Bookmarks["Experience"].Select();
Word.Range range = application.Selection.Range;
range.Copy();
range.Paste();
But it doesn't insert anything into the document. Can you please help me?
The problem is that the two ranges are identical. It's like when you're working in the document using the mouse or keyboard: If you have a selection and paste, what you paste will replace what was selected. In order to have the one follow the other you first need to press the right-arrow key or click somewhere.
So you need to specify a second Range (such as the end of the document or another bookmark) or, as suggested in @HansPassant's comment, "collapse" the Range (like pressing an arrow key).
Another thing to keep in mind is that you shouldn't use the Clipboard, if at all possible. The alternative in Word is to use FormattedText to transfer formatted content. The sample code below shows both variations.
//Possibility 1: transfer the formatted content without the Clipboard
Word.Range rangeSource = document.Bookmarks["Experience"].Range;
Word.Range rangeTarget = rangeSource.Duplicate; //Duplicate is a property, not a method
rangeTarget.Collapse(Word.WdCollapseDirection.wdCollapseEnd);
rangeTarget.FormattedText = rangeSource.FormattedText;
//Possibility 2: copy and paste via the Clipboard (an alternative to Possibility 1, not an addition)
Word.Range rangeSource = document.Bookmarks["Experience"].Range;
rangeSource.Copy();
rangeSource.Collapse(Word.WdCollapseDirection.wdCollapseEnd);
rangeSource.Paste();

Creating Word file from ObservableCollection with C#

I have an observable collection with a class that has 2 string properties: Word and Translation. I want to create a word file in format:
word = translation word = translation
word = translation word = translation...
The word document needs to be in 2 Columns (PageLayout) and the Word should be in bold.
I have first tried Microsoft.Office.Interop.Word.
PageSetup.TextColumns.SetCount(2) sets the PageLayout. As for the text itself I used a foreach loop and in each iteration I did this:
paragraph.Range.Text = Word + " = " + Translation;
object boldStart = paragraph.Range.Start;
object boldEnd = paragraph.Range.Start + Word.Length;
Word.Range boldPart = document.Range(boldStart, boldEnd);
boldPart.Bold = 1;
paragraph.Range.InsertParagraphAfter();
This does exactly what I want, but if there are 1000 items in the collection it takes about 10sec, much much more if the number is 10k+. I then used a StringBuilder and just set document.Content.Text = sb.ToString(); and that takes less than a sec, but I can't set the word to be bold that way.
Then I switched to using Open XML SDK 2.5, but even after reading the msdn documentation I still have no idea how to make just a part of the text bold, and I don't know if it's even possible to set PageLayout Columns count. The only thing I could do was to make it look the same as with Interop.Word, but with just 1 column and <1sec creation time.
Should I be using Interop.Word or Open XML (or maybe combined) for this? And can someone pls show me how to write this properly, so it doesn't take forever if the collection is relatively large? Any help is appreciated. :)
OOXML can be intimidating at first. http://officeopenxml.com/anatomyofOOXML.php has some good examples. Whenever you get confused unzip the docx and browse the contents to see how it's done.
The basic idea is you'd open Word, create a template with the styling you want and a code word to find the paragraph, then multiply the paragraph, replacing the text in that template with each word.
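Your Word template would contain one paragraph, styled the way you want, holding the placeholders used in the code below: [templateForWords], [word] and [translation].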
Here's some rough code to get you started, assuming you have the SDK installed (I don't have it at hand right now, so treat it as a sketch):
// needs: System.Linq, System.Text.RegularExpressions,
// DocumentFormat.OpenXml.Packaging, DocumentFormat.OpenXml.Wordprocessing
var templateRegex = new Regex("\\[templateForWords\\]");
var wordPlacementRegex = new Regex("\\[word\\]");
var translationPlacementRegex = new Regex("\\[translation\\]");
using (var document = WordprocessingDocument.Open(stream, true))
{
    MainDocumentPart mainPart = document.MainDocumentPart;
    // find the template paragraph by its marker text
    Paragraph paragraphTemplate = mainPart.Document.Body
        .Descendants<Paragraph>()
        .First(p => templateRegex.IsMatch(p.InnerText));
    foreach (string word in YourDictionary.Keys)
    {
        var paraClone = (Paragraph)paragraphTemplate.CloneNode(true);
        // replace the placeholders in each Text element so the run formatting survives;
        // this assumes Word did not split a placeholder across several runs
        foreach (Text t in paraClone.Descendants<Text>())
        {
            t.Text = templateRegex.Replace(t.Text, "");
            t.Text = wordPlacementRegex.Replace(t.Text, word);
            t.Text = translationPlacementRegex.Replace(t.Text, YourDictionary[word]);
        }
        // insert before the template so the entries keep their order
        paragraphTemplate.InsertBeforeSelf(paraClone);
    }
    paragraphTemplate.Remove();
    // the document should auto-save on dispose; saving explicitly does no harm
    mainPart.Document.Save();
}
OpenXML is absolutely better, because it is faster, has fewer bugs, and is more reliable and flexible at runtime (especially in a server environment). It's also not really difficult to find out how to create a given element using OpenXML. As a docx file is just a zip file with XML files inside, I open it and read the XML to get an idea of how Word itself does it. First I create a document, format it (in your case, you can create a file with two columns and bold words inside), save it, and rename it to a .zip file. Then I open it, go to the "word" directory inside, and open the file "document.xml". This file contains the essential part of the XML; looking at it, it's not difficult to figure out how to recreate it in OpenXML.
Open XML is a much better option than Office COM. But the problem is that it is a low-level file format library that unlike Office COM doesn’t work on a high abstraction level. You might want to go that route but I recommend you to first consider looking into a commercial library that will give you the benefits of a high-level DOM without the need to have MS Word installed on the production machine. Our company recently purchased this toolkit which allows you to use template based approach and also DOM/programmatic approach to generate/modify/create documents.

MS Word Interop to C# - Inserting multiple files at a bookmark

I have one master document into which I want to insert a number of files. These should be inserted into the file one after another at a certain point in the middle of the document.
So I have created a bookmark at this point called "TESTS", since this seems to be the easiest way of programmatically finding the point.
I am able to insert a single file using this code:
Microsoft.Office.Interop.Word.Application oWord = new Microsoft.Office.Interop.Word.Application();
Microsoft.Office.Interop.Word.Document oWordDoc = oWord.Documents.Open(@"C:\master.doc");
oWordDoc.Bookmarks.Cast<Bookmark>().First(b => b.Name == "TESTS").Range.InsertFile(@"C:\test1.doc");
But this removes the bookmark, making it impossible to insert a second file at the same point. I don't mind losing the bookmark, but only once I have inserted all files.
Can this be done? I am guessing that the above code replaces the range with the bookmark so finding the location just before or after and then deleting the bookmark range would be best - but I just can't find the code for it. Everything I have tried seems to replace the whole document.
Alternatively, is there any way to do this without the Interop (i.e. by parsing the file - no touching MS Word at all)?
There must be something particular about the way your document is set up and the exact range of the bookmark, because I am able to get this to work without losing the bookmark. According to the MVP article "Inserting text at a bookmark without deleting the bookmark", assigning Text to a bookmarked range deletes the bookmark; maybe you are running into a similar issue with InsertFile.
Try their suggestion of storing the bookmark's range into a variable ie MyRange and then calling Bookmarks.Add "mybookmark", MyRange
Dim BMRange As Range
Set BMRange = ActiveDocument.Bookmarks("MyBookmark").Range
BMRange.Text = "Hello world"
ActiveDocument.Bookmarks.Add "MyBookmark", BMRange
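Since the question uses C#, here is a rough translation of the VBA snippet above. It only mirrors the text-insertion case from the article; whether the same trick preserves the bookmark after InsertFile is not confirmed here. The class name BookmarkSketch and the method SetTextKeepingBookmark are hypothetical, and the code assumes a reference to Microsoft.Office.Interop.Word and C# 4 or later.

```csharp
using Word = Microsoft.Office.Interop.Word;

static class BookmarkSketch
{
    // Replaces the bookmarked text and then re-creates the bookmark
    // over the new text, because assigning Text removes the bookmark.
    public static void SetTextKeepingBookmark(Word.Document document,
                                              string bookmarkName, string text)
    {
        Word.Range bookmarkRange = document.Bookmarks[bookmarkName].Range;
        bookmarkRange.Text = text;                            // the bookmark is deleted here
        document.Bookmarks.Add(bookmarkName, bookmarkRange);  // add it back over the new text
    }
}
```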

How to Change value From Persian Culture to English [duplicate]

This question already has answers here:
How to Convert Persian Digits in variable to English Digits Using Culture?
(18 answers)
Closed 9 years ago.
I have a variable containing Persian-culture digits, like this:
string Value="۱۰۳۶۷۵۱";
I want to convert these digits to their English equivalents and save the result back into my string, like this:
Value="1036751";
Please help me. How can I do this?
Ideally I would like an easy way, such as CultureInfo, instead of a switch/case.
You can use the Windows.Globalization.NumberFormatting.DecimalFormatter class to parse the string. This will parse strings in any of the supported numeral systems (as long as it is internally coherent).
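If the Windows Runtime formatter is not available, a plain .NET alternative (a different technique from the DecimalFormatter suggestion above) is to map each Unicode decimal digit to its ASCII counterpart with char.IsDigit and char.GetNumericValue. The class name PersianDigits and the method ToAsciiDigits are hypothetical names for this sketch.

```csharp
using System;
using System.Text;

public static class PersianDigits
{
    // Replaces every Unicode decimal digit (Persian, Arabic-Indic, ...)
    // with the matching ASCII digit and leaves all other characters alone.
    public static string ToAsciiDigits(string input)
    {
        var result = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            if (char.IsDigit(c))
            {
                // GetNumericValue returns 0..9 for any decimal digit character
                result.Append((char)('0' + (int)char.GetNumericValue(c)));
            }
            else
            {
                result.Append(c);
            }
        }
        return result.ToString();
    }
}
```

Usage: Value = PersianDigits.ToAsciiDigits(Value); leaves the digits in place but stored as ASCII characters, so the string can then be passed to int.Parse if a number is needed.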
You can do that with a number of tools. iTextPdfSharp will likely be able to do it. It will amount to opening the document and walking the tree in the catalog that has the bookmarks in it. Their code works fine, but be sure to download the spec so you can understand the structure of the tree. I worked on the original version of Acrobat, and many of my fellow engineers felt that the bookmark tree was a little over-complicated.
BitMiracle offers similar code. They routinely patrol Stack Overflow, so you might see an answer from them too (HI!) - you can see a sample of their work here for authoring bookmarks.
If you're willing to pay money, this is easy using Atalasoft's DotPdf (disclaimer: I work for Atalasoft and wrote nearly all of DotPdf). In our API, we try to hide the complexity of the structure where possible (for example, if you want to iterate over the chain of chains of actions taken when a bookmark is clicked, it's a foreach instead of a tree walk) and we've wrapped the bookmark tree into standard List<T> collections.
public void WalkBookmarks(Stream pdf)
{
    // open the doc
    PdfDocument doc = new PdfDocument(pdf);
    if (doc.BookmarkTree != null)
    {
        // walk the list of top level bookmarks
        WalkBookmarks(doc.BookmarkTree.Bookmarks, 0);
    }
}
public void WalkBookmarks(PdfBookmarkList list, int depth)
{
    if (list == null) return;
    foreach (PdfBookmark bookmark in list)
    {
        // indent to the depth of the list and write the Text
        // you can also get the color, basic font styling and
        // the action associated with the bookmark
        for (int i = 0; i < depth; i++) Console.Write(" ");
        Console.WriteLine(bookmark.Text);
        // recurse on any children
        WalkBookmarks(bookmark.Children, depth + 1);
    }
}
PDFs can contain at least three different things which may be called "table of contents":
Document outline (bookmarks), a set of specific PDF structures
List of hyperlinks in the beginning of a document. Each hyperlink leads to a place within the document
List of text strings where each string names a part of the document and, optionally, specifies on which page this part starts.
I do not know about any out-of-the box or easy to implement solutions for the third case. Other cases are simpler.
For the first case, almost any PDF library will do. @plinth (Hi!) gave at least two solutions for such a case.
For the second case a solution could be implemented using Docotic.Pdf library. Basically, you might try to:
enumerate all links in a document
find all links that are close to each other (you'll need to build up some heuristics for what to treat as "close")
retrieve text from found links
If your case is "list of hyperlinks" then the Extract text from link target sample might give you some clues for a start.
Disclaimer: I work for Bit Miracle, vendor of Docotic.Pdf library.
You'll need to use a pdf-library like pdflib in order to read pdf-files (http://www.pdflib.com/) . That should do the trick, good luck!
