Find highlighted text - C#

The following code finds instances of the word "Family" in a Word document. It selects and deletes the instances. The code works fine, but I want to find all instances of only highlighted words.
public void FindHighlightedText()
{
    const string filePath = "D:\\COM16_Duke Energy.doc";
    var word = new Microsoft.Office.Interop.Word.Application {Visible = true};
    var doc = word.Documents.Open(filePath);
    var range = doc.Range();
    range.Find.ClearFormatting();
    range.Find.Text = "Family";
    while (range.Find.Execute())
    {
        range.Select();
        range.Delete();
    }
    doc.Close();
    word.Quit(true, Type.Missing, Type.Missing);
}

Set the Find.Highlight property to true.
Interop uses the same objects and methods that are available to VBA macros. You can find the actions and properties you need for a task by recording a macro of those steps and inspecting it.
Often, but not always, the properties match the UI. If something is an option in the general Find dialog, it's probably a property on the Find interface as well.
For example, searching only for highlighted text produced this macro:
Selection.Find.ClearFormatting
Selection.Find.Highlight = True
With Selection.Find
    .Text = ""
    .Replacement.Text = ""
    .Forward = True
    .Wrap = wdFindContinue
    .Format = True
    .MatchCase = False
    .MatchWholeWord = False
    .MatchWildcards = False
    .MatchSoundsLike = False
    .MatchAllWordForms = False
End With
Which can be translated to:
range.Find.ClearFormatting();
range.Find.Highlight = 1;
...
while (range.Find.Execute())
{
    ...
}

Related

Remove text between two occurrences in Word with Interop

I have a piece of text tagged with ##ABC, so it looks like this:
Some text ##ABCtext to be found##ABC some text
I need to find and remove the ##ABCtext to be found##ABC with Interop. So far, I've come up with the following code, which, however, seems to do nothing:
Microsoft.Office.Interop.Word.Range rng = document.Range();
rng.Find.ClearFormatting();
rng.Find.Replacement.ClearFormatting();
rng.Find.MatchWildcards = true;
rng.Find.Text = "##ABC(.*?)##ABC";
rng.Find.Replacement.Text = "";
rng.Find.Forward = true;
rng.Find.Wrap = Microsoft.Office.Interop.Word.WdFindWrap.wdFindStop;
rng.Find.Execute(Replace: Microsoft.Office.Interop.Word.WdReplace.wdReplaceAll);
What am I missing?

Add If condition in footer using Word Interop in C#

I am using Visual Studio Express 2015 and Word Interop 14.0.
I need to add an IF condition to the footer of the last page of a Word document using Word Interop in C#. I searched for code here and in other forums, but couldn't get it to work in C#. Please help.
My question is how to add an IF condition in the footer section so that the text only displays on the last page.
The condition is:
if page = numpages then "Last Page Footer Text" else "Other page footer text"
I used the code below, but the text displays on every page and the IF condition itself also appears in the footer.
object fieldPages = WdFieldType.wdFieldPage;
object fieldNumPages = WdFieldType.wdFieldNumPages;
object fieldMerge = WdFieldType.wdFieldMergeField;
object fieldAuthor = WdFieldType.wdFieldAuthor;
object fieldIF = WdFieldType.wdFieldIf;
object collapseDirection = WdCollapseDirection.wdCollapseStart;
object txt = string.Empty;
var field = Rng.Fields.Add(Rng, ref fieldAuthor, ref txt, true);
Rng.InsertAfter("\"");
Rng.InsertBefore("\"");
Rng.Collapse(ref collapseDirection);
oDoc.Fields.Add(Rng, ref fieldNumPages, ref txt, true);
Rng.InsertBefore(" = ");
Rng.Collapse(ref collapseDirection);
oDoc.Fields.Add(Rng, ref fieldPages, ref txt, true);
Rng.InsertBefore(" IF ");
Rng.Collapse(ref collapseDirection);
oWord.ActiveWindow.ActivePane.View.ShowFieldCodes = true;
field.Update();
Creating nested fields is complicated using the object model - there's nothing in it to facilitate the process. Trying to mimic the UI by creating the innermost field(s), selecting them, then inserting field brackets is a bit tricky and code for every combination must be written.
Using the object model, it makes more sense to create the outermost field writing placeholders for the fields to nest within it. Then Word's Range.Find functionality can pick up the placeholders and insert field codes in their place.
Here's some sample code to create the conditional footer text you describe:
//Returns the changed field code
private string GenerateNestedField(Word.Field fldOuter,
                                   string sPlaceholder)
{
    Word.Range rngFld = fldOuter.Code;
    Word.Document doc = (Word.Document) fldOuter.Parent;
    bool bFound;
    string sFieldCode;
    //Get the field code from the placeholder by removing the { }
    sFieldCode = sPlaceholder.Substring(1, sPlaceholder.Length - 2); //Mid(sPlaceholder, 2, Len(sPlaceholder) - 2)
    rngFld.TextRetrievalMode.IncludeFieldCodes = true;
    bFound = rngFld.Find.Execute(sPlaceholder);
    if (bFound) doc.Fields.Add(rngFld, Word.WdFieldType.wdFieldEmpty, sFieldCode, false);
    //Code is a Range, so read its Text; ToString() would only return the COM type name
    return fldOuter.Code.Text;
}
private void button2_Click(object sender, EventArgs e)
{
    getWordInstance(); //Object defined as a class member for Word.Application
    Word.Document doc = wdApp.ActiveDocument;
    Word.View vw = doc.ActiveWindow.View;
    Word.Range rngTarget = null;
    Word.Field fldIf = null;
    string sIfField, sFieldCode;
    string sQ = '"'.ToString();
    bool bViewFldCodes = false;
    sIfField = "IF {Page} = {NumPages} " + sQ + "Last" + sQ + " " + sQ + "Other" + sQ;
    rngTarget = doc.Sections[1].Footers[Word.WdHeaderFooterIndex.wdHeaderFooterPrimary].Range;
    bViewFldCodes = vw.ShowFieldCodes;
    //Finding text in field codes requires field codes to be shown
    if (!bViewFldCodes) vw.ShowFieldCodes = true;
    //Create the nested field: { IF {Page} = {NumPages} "Last" "Other" }
    fldIf = doc.Fields.Add(rngTarget, Word.WdFieldType.wdFieldEmpty, sIfField, false);
    sFieldCode = GenerateNestedField(fldIf, "{Page}");
    sFieldCode = GenerateNestedField(fldIf, "{NumPages}");
    rngTarget.Fields.Update();
    //Restore the user's original field code view setting
    vw.ShowFieldCodes = bViewFldCodes;
}
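The placeholder convention used in the sample above can be tried without Word at all. The following is a minimal sketch of just the string handling, assuming placeholders always have the form {Name}; the class FieldPlaceholders and its method names are hypothetical and not part of the Word object model.

```csharp
using System;

// Hypothetical helper: only the string handling, no Word Interop involved.
public static class FieldPlaceholders
{
    // Strips the surrounding braces from a placeholder such as "{Page}".
    public static string ToFieldCode(string placeholder)
    {
        if (placeholder == null || placeholder.Length < 3 ||
            placeholder[0] != '{' || placeholder[placeholder.Length - 1] != '}')
        {
            throw new ArgumentException("Expected a placeholder of the form {Name}.", nameof(placeholder));
        }
        return placeholder.Substring(1, placeholder.Length - 2);
    }

    // Builds the text of the outer IF field, leaving the nested fields as placeholders.
    public static string BuildIfField(string left, string right, string whenTrue, string whenFalse)
    {
        const string quote = "\"";
        return "IF " + left + " = " + right + " "
            + quote + whenTrue + quote + " "
            + quote + whenFalse + quote;
    }
}
```

BuildIfField("{Page}", "{NumPages}", "Last", "Other") yields the same text as sIfField in the sample, and ToFieldCode("{Page}") yields the field code that GenerateNestedField inserts in place of the placeholder.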

When using MergeField FieldCodes in OpenXml SDK in C# why do field codes disappear or fragment?

I have been working successfully with the C# OpenXml SDK (Unofficial Microsoft Package 2.5 from NuGet) for some time now, but have recently noticed that the following line of code returns different results depending on what mood Microsoft Word appears to be in when the file gets saved:
var fields = document.Descendants<FieldCode>();
From what I can tell, when creating the document in the first place (using Word 2013 on Windows 8.1) if you use the Insert->QuickParts->Field and choose MergeField from the Field names left hand pane, and then provide a Field name in the field properties and click OK then the field code is correctly saved in the document as I would expect.
Then when using the aforementioned line of code I will receive a field code count of 1 field. If I subsequently edit this document (and even leave this field well alone) the subsequent saving could mean that this field code no longer is returned in my query.
Another case of the same curiousness is when I see the FieldCode nodes split across multiple items. So rather than seeing say:
" MERGEFIELD Author \\* MERGEFORMAT "
As the node name, I will see:
" MERGEFIELD Aut"
"hor \\* MERGEFORMAT"
Split as two FieldCode node values. I have no idea why this would be the case, but it certainly makes my ability to match nodes that much more exciting. Is this expected behaviour? A known bug? I don't really want to have to crack open the raw xml and edit this document to work until I understand what is going on. Many thanks all.
I came across this very problem myself, and found a solution that exists within OpenXML: a utility class called MarkupSimplifier which is part of the PowerTools for Open XML project. Using this class solved all the problems I was having that you describe.
The full article is located here.
Here are some pertinent excerpts:
Perhaps the most useful simplification that this performs is to merge adjacent runs with identical formatting.
It goes on to say:
Open XML applications, including Word, can arbitrarily split runs as necessary. If you, for instance, add a comment to a document, runs will be split at the location of the start and end of the comment. After MarkupSimplifier removes comments, it can merge runs, resulting in simpler markup.
An example of the utility class in use is:
SimplifyMarkupSettings settings = new SimplifyMarkupSettings
{
    RemoveComments = true,
    RemoveContentControls = true,
    RemoveEndAndFootNotes = true,
    RemoveFieldCodes = false,
    RemoveLastRenderedPageBreak = true,
    RemovePermissions = true,
    RemoveProof = true,
    RemoveRsidInfo = true,
    RemoveSmartTags = true,
    RemoveSoftHyphens = true,
    ReplaceTabsWithSpaces = true,
};
MarkupSimplifier.SimplifyMarkup(wordDoc, settings);
I have used this many times with Word 2010 documents using VS2015 .Net Framework 4.5.2 and it has made my life much, much easier.
Update:
I have revisited this code and have found that it cleans up runs in MERGEFIELDs but not in IF fields that reference merge fields, e.g.
{if {MERGEFIELD When39} = "Y???" "Y" "N" }
I have no idea why this might be so, and examination of the underlying XML offers no hints.
Word will often split a text run into multiple text runs for no reason I've ever understood. Before searching, comparing, tidying, etc., we preprocess the body with a method that combines adjacent runs with identical formatting into a single text run.
/// <summary>
/// Combines the identical runs.
/// </summary>
/// <param name="body">The body.</param>
public static void CombineIdenticalRuns(W.Body body)
{
    List<W.Run> runsToRemove = new List<W.Run>();
    foreach (W.Paragraph para in body.Descendants<W.Paragraph>())
    {
        List<W.Run> runs = para.Elements<W.Run>().ToList();
        // Walk backwards so text accumulates into the earliest run of each group
        for (int i = runs.Count - 2; i >= 0; i--)
        {
            W.Text text1 = runs[i].GetFirstChild<W.Text>();
            W.Text text2 = runs[i + 1].GetFirstChild<W.Text>();
            if (text1 != null && text2 != null)
            {
                string rPr1 = "";
                string rPr2 = "";
                if (runs[i].RunProperties != null) rPr1 = runs[i].RunProperties.OuterXml;
                if (runs[i + 1].RunProperties != null) rPr2 = runs[i + 1].RunProperties.OuterXml;
                if (rPr1 == rPr2)
                {
                    text1.Text += text2.Text;
                    runsToRemove.Add(runs[i + 1]);
                }
            }
        }
    }
    foreach (W.Run run in runsToRemove)
    {
        run.Remove();
    }
}
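The idea behind the method above, merging neighbouring runs whose formatting is identical, can be illustrated without the Open XML SDK. This is only a sketch on a simplified model: SimpleRun and RunMerging are hypothetical types, and the Format string stands in for the serialized run properties (rPr) that the real code compares.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model: each run is reduced to a formatting key and its text.
public sealed class SimpleRun
{
    public string Format { get; }
    public string Text { get; }

    public SimpleRun(string format, string text)
    {
        Format = format;
        Text = text;
    }
}

public static class RunMerging
{
    // Merges each run into its predecessor when both share the same formatting key.
    public static List<SimpleRun> MergeAdjacent(IEnumerable<SimpleRun> runs)
    {
        var merged = new List<SimpleRun>();
        foreach (SimpleRun run in runs)
        {
            int last = merged.Count - 1;
            if (last >= 0 && merged[last].Format == run.Format)
            {
                // Same formatting as the previous run: append the text to it.
                merged[last] = new SimpleRun(run.Format, merged[last].Text + run.Text);
            }
            else
            {
                merged.Add(run);
            }
        }
        return merged;
    }
}
```

Runs with different formatting keys are never merged, so text that was split only because Word inserted a run boundary is rejoined while genuinely different formatting is preserved.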
I tried to simplify the document with PowerTools, but the result was a corrupted Word file. I wrote this routine to simplify only the field codes with specific names; it works on all parts of the document (main document part, headers and footers):
internal static void SimplifyFieldCodes(WordprocessingDocument document)
{
    var masks = new string[] { Constants.VAR_MASK, Constants.INP_MASK, Constants.TBL_MASK, Constants.IMG_MASK, Constants.GRF_MASK };
    SimplifyFieldCodesInElement(document.MainDocumentPart.RootElement, masks);
    foreach (var headerPart in document.MainDocumentPart.HeaderParts)
    {
        SimplifyFieldCodesInElement(headerPart.Header, masks);
    }
    foreach (var footerPart in document.MainDocumentPart.FooterParts)
    {
        SimplifyFieldCodesInElement(footerPart.Footer, masks);
    }
}
internal static void SimplifyFieldCodesInElement(OpenXmlElement element, string[] regexpMasks)
{
    foreach (var run in element.Descendants<Run>()
        .Select(item => (Run)item)
        .ToList())
    {
        var fieldChar = run.Descendants<FieldChar>().FirstOrDefault();
        if (fieldChar != null && fieldChar.FieldCharType == FieldCharValues.Begin)
        {
            string fieldContent = "";
            List<Run> runsInFieldCode = new List<Run>();
            // Collect the consecutive runs that carry fragments of this field code
            var currentRun = run.NextSibling();
            while ((currentRun is Run) && currentRun.Descendants<FieldCode>().FirstOrDefault() != null)
            {
                var currentRunFieldCode = currentRun.Descendants<FieldCode>().FirstOrDefault();
                fieldContent += currentRunFieldCode.InnerText;
                runsInFieldCode.Add((Run)currentRun);
                currentRun = currentRun.NextSibling();
            }
            // If the field code spans more than one Run and is one we must change, put the complete text in the first Run and remove the rest
            if (runsInFieldCode.Count > 1)
            {
                // Check the field code to make sure it is one we must simplify (so that TOC, PAGEREF, etc. are left unchanged)
                bool applyTransform = false;
                foreach (string regexpMask in regexpMasks)
                {
                    Regex regex = new Regex(regexpMask);
                    Match match = regex.Match(fieldContent);
                    if (match.Success)
                    {
                        applyTransform = true;
                        break;
                    }
                }
                if (applyTransform)
                {
                    var currentRunFieldCode = runsInFieldCode[0].Descendants<FieldCode>().FirstOrDefault();
                    currentRunFieldCode.Text = fieldContent;
                    runsInFieldCode.RemoveAt(0);
                    foreach (Run runToRemove in runsInFieldCode)
                    {
                        runToRemove.Remove();
                    }
                }
            }
        }
    }
}
Hope this helps!!!
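The core step in the routine above, concatenating the fragments of a split field code before trying to match it, can be shown on plain strings. This is a minimal sketch, not part of the Open XML SDK: FieldCodeFragments and its methods are hypothetical, and the regular expression assumes a MERGEFIELD name that is either quoted or contains no whitespace.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper working on the InnerText values of FieldCode fragments.
public static class FieldCodeFragments
{
    // Rebuilds the complete field code from fragments, in document order.
    public static string Join(IEnumerable<string> fragments)
    {
        return string.Concat(fragments);
    }

    // Returns the merge field name, or an empty string if this is not a MERGEFIELD.
    public static string GetMergeFieldName(string fieldCode)
    {
        Match match = Regex.Match(
            fieldCode,
            @"^\s*MERGEFIELD\s+(""[^""]+""|\S+)",
            RegexOptions.IgnoreCase);
        return match.Success ? match.Groups[1].Value.Trim('"') : string.Empty;
    }
}
```

With the fragments from the question, " MERGEFIELD Aut" and "hor \* MERGEFORMAT", matching the first fragment alone gives the truncated name Aut, while matching the joined text gives Author. That is why the fragments have to be combined before any lookup by field name.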

How to use C# export QTP result to PDF automatically

I'm writing a C# program to run QTP.
My program can already trigger QTP automatically and send the result to my mailbox. But that result is HTML, and I found that QTP can also export a PDF result.
So, here is my code.
qtpAutoReport = qtpApp.Options.Run.AutoExportReportConfig;
qtpAutoReport.AutoExportResults = true;
qtpAutoReport.StepDetailsReport = true;
qtpAutoReport.DataTableReport = false;
qtpAutoReport.LogTrackingReport = false;
qtpAutoReport.ScreenRecorderReport = false;
qtpAutoReport.SystemMonitorReport = false;
qtpAutoReport.StepDetailsReportFormat = "Short";
qtpAutoReport.ExportLocation = AutoExportPath;
qtpAutoReport.ExportForFailedRunsOnly = false;
qtpAutoReport.StepDetailsReportType = "PDF";
When I use qtpAutoReport.StepDetailsReportType = "HTML";
my program runs successfully, and I can find the HTML file on my disk.
But when I use qtpAutoReport.StepDetailsReportType = "PDF";
I can't find any file on my disk after the QTP test is over.
So my question is: why can't QTP export the result when I set StepDetailsReportType to "PDF"?
There does seem to be an issue with UFT. I found a method that works for GUI tests (VBScript); give it a try with Service Test (C#).
All options are the same as your example, with one addition:
uftObject.Options.Run.ViewResults = True
This tells UFT that you want to view the results after completion. Without this flag I get no PDF result; with it, the file is waiting at the export path.
Option Explicit
Dim uftObject, qtResultsOpt
Set uftObject=CreateObject("Quicktest.application")
uftObject.Launch
uftObject.Visible = True
Set qtResultsOpt = uftObject.Options.Run.AutoExportReportConfig
Dim AutoExportPath
AutoExportPath = "C:\Users\paxic\Desktop\stackoverflow\results"
qtResultsOpt.AutoExportResults = true
qtResultsOpt.StepDetailsReport = true
qtResultsOpt.DataTableReport = false
qtResultsOpt.LogTrackingReport = false
qtResultsOpt.ScreenRecorderReport = false
qtResultsOpt.SystemMonitorReport = false
qtResultsOpt.StepDetailsReportFormat = "Short"
qtResultsOpt.ExportLocation = AutoExportPath
qtResultsOpt.ExportForFailedRunsOnly = false
qtResultsOpt.StepDetailsReportType = "PDF"
uftObject.Open "C:\Users\JMorley\Desktop\stackoverflow\ExampleOne"
qtResultsOpt.AutoExportResults = True
uftObject.Options.Run.ViewResults = True
uftObject.Test.Run

Excel QueryTables.Add from URL Comma Delimited

I have a server that returns large amounts of comma-separated data in an HTTP response. I need to import this data into Excel.
I have this working by writing the contents to a temp file and then reading the temp file as a CSV, but this process seems inefficient. A query table can read directly from the HTTP response, but it puts each line of data into a single cell rather than splitting it into one cell per comma-separated value.
Is it possible to read comma-separated data from an HTTP response directly into Excel from a C# Excel add-in?
Thanks!
public static void URLtoCSV(string URL, Excel.Worksheet destinationSheet, Excel.Range destinationRange, int[] columnDataTypes, bool autoFitColumns)
{
    destinationSheet.QueryTables.Add(
        "URL;" + URL,
        destinationRange, Type.Missing);
    destinationSheet.QueryTables[1].Name = URL;
    destinationSheet.QueryTables[1].FieldNames = true;
    destinationSheet.QueryTables[1].RowNumbers = false;
    destinationSheet.QueryTables[1].FillAdjacentFormulas = false;
    destinationSheet.QueryTables[1].PreserveFormatting = true;
    destinationSheet.QueryTables[1].RefreshOnFileOpen = false;
    destinationSheet.QueryTables[1].RefreshStyle = XlCellInsertionMode.xlInsertDeleteCells;
    destinationSheet.QueryTables[1].SavePassword = false;
    destinationSheet.QueryTables[1].SaveData = true;
    destinationSheet.QueryTables[1].AdjustColumnWidth = true;
    destinationSheet.QueryTables[1].RefreshPeriod = 0;
    destinationSheet.QueryTables[1].Refresh(false);
    if (autoFitColumns == true)
        destinationSheet.QueryTables[1].Destination.EntireColumn.AutoFit();
}
An easier solution than the one you reference is to use the connection type "TEXT" instead of "URL". TEXT supports CSV imports, including from HTTP sources. URL appears to be designed more for scraping web pages than anything else.
For example, in your case:
destinationSheet.QueryTables.Add("URL;" + URL,
becomes
destinationSheet.QueryTables.Add("TEXT;" + URL,
And for those stumbling upon this post with the same question but using VBA in Excel, the complete solution would look like:
' Load new data from web
With ActiveSheet.QueryTables.Add(Connection:="TEXT;http://yourdomain.com/csv.php", Destination:=Range("$A$1"))
    .TextFileCommaDelimiter = True
    .FieldNames = True
    .RowNumbers = False
    .FillAdjacentFormulas = True
    .RefreshOnFileOpen = False
    .BackgroundQuery = True
    .RefreshStyle = xlOverwriteCells
    .SavePassword = False
    .SaveData = False
    .AdjustColumnWidth = True
    .Refresh BackgroundQuery:=False
End With
