find and replace text in the entire document including headers

Document example (Opens correctly in MS Office)
I have a Word document (the example above) in which I need to replace all <> tags in the text with my own values. Using Interop, I can read the main text and the header text and search them for matches with the Regex class:
static string GetContent(Word.Document document)
{
return document.Content.Text;
}
static string GetHeaderFooterText(Word.Document document)
{
StringBuilder sb = new StringBuilder();
foreach (Word.Section section in document.Sections)
{
foreach (Word.HeaderFooter hf in section.Headers)
{
if (!hf.Exists)
continue;
Word.Range range = hf.Range;
sb.AppendLine(range.Text);
foreach (Word.Shape shape in hf.Shapes)
{
sb.AppendLine(shape.TextFrame.TextRange.Text);
}
}
}
return sb.ToString();
}
public string[] GetMatches(string pattern, string text)
{
WordReader reader = new WordReader(Word);
Regex regex = new Regex(pattern, RegexOptions.Compiled | RegexOptions.IgnoreCase);
HashSet<string> matches = new HashSet<string>();
foreach(Match match in regex.Matches(text))
matches.Add(match.Value);
return matches.ToArray();
}
However, I can't get the text from the tables located in the headers. The table count exposed through the HeaderFooter class is 0.
Replacing text with the Find class does not replace the tags in these tables either.
private void ReplaceWords(Word.Application app, Dictionary<string, string> keyValuePairs)
{
foreach (var pair in keyValuePairs)
{
Word.Find find = app.Selection.Find;
find.ClearFormatting();
find.Replacement.ClearFormatting();
find.Text = pair.Key;
find.Replacement.Text = pair.Value;
find.MatchAllWordForms = false;
find.Forward = true;
find.Wrap = Word.WdFindWrap.wdFindContinue;
find.Forward = false;
find.MatchCase = false;
find.MatchWholeWord = false;
find.MatchWildcards = false;
find.MatchSoundsLike = false;
find.Execute(Replace: Word.WdReplace.wdReplaceAll);
}
}
Is there a way to access these tables, and how can I replace text globally, at every level of the document?

I found a working way to access all document content:
foreach (Word.Range range in wordApp.ActiveDocument.StoryRanges)
{
//Get string range.Text or use range.Find to find and replace text in document
}
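The loop above can be fleshed out roughly as follows. This is a minimal sketch, not a tested implementation: the class and method names (StoryReplacer, ReplaceEverywhere) are hypothetical, it assumes the Word interop assembly is referenced with embedded interop types (so optional arguments of Find.Execute can be omitted, as in the question's own code), and it assumes that stories of the same type in later sections are reached through NextStoryRange. Text inside shapes anchored in headers may still need separate handling.

```csharp
using System.Collections.Generic;
using Word = Microsoft.Office.Interop.Word;

static class StoryReplacer
{
    // Hypothetical helper: runs every replacement over every story range
    // (main text, headers, footers, footnotes, text frames, ...).
    public static void ReplaceEverywhere(Word.Document document,
                                         IDictionary<string, string> replacements)
    {
        foreach (Word.Range story in document.StoryRanges)
        {
            // Assumption: linked stories of the same type are chained
            // through NextStoryRange, which is null at the end of the chain.
            for (Word.Range range = story; range != null; range = range.NextStoryRange)
            {
                foreach (KeyValuePair<string, string> pair in replacements)
                {
                    Word.Find find = range.Find;
                    find.ClearFormatting();
                    find.Replacement.ClearFormatting();
                    find.Execute(
                        FindText: pair.Key,
                        ReplaceWith: pair.Value,
                        Replace: Word.WdReplace.wdReplaceAll);
                }
            }
        }
    }
}
```

Because Find works on the range itself rather than on app.Selection, the replacement also reaches text inside tables that live in those stories.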

Related

Search multiple words in a text file

I wrote code to search for several words in a text file, but only the last word is actually searched for. How can I fix it?
Code:
string txt_text;
string[] words = {
"var",
"bob",
"for",
"example"
};
StreamReader file = new StreamReader("test.txt");
foreach(string _words in words) {
while ((txt_text = file.ReadToEnd()) != null) {
if (txt_text.Contains(_words)) {
textBox1.Text = "founded";
break;
} else {
textBox1.Text = "nothing founded";
break;
}
}
}

First of all, you can get rid of the StreamReader and the loop and query the file with the help of LINQ:
using System.Linq;
using System.IO;
...
textBox1.Text = File
.ReadLines("test.txt")
.Any(line => words.Any(word => line.Contains(word)))
? "found"
: "nothing found";
If you insist on a loop, you should drop the else branch:
// using - do not forget to Dispose IDisposable
using StreamReader file = new StreamReader("test.txt");
// shorter version is
// string txt_text = File.ReadAllText("test.txt");
string txt_text = file.ReadToEnd();
bool found = false;
foreach (string word in words)
if (txt_text.Contains(word)) {
// If any word has been found, stop further searching
found = true;
break;
} // no else here: keep on looping for other words
textBox1.Text = found
? "found"
: "nothing found";

I'd save the text in a variable and then loop over your words to check if it exists in the file. Something like this:
string[] words = { "var", "bob", "for", "example"};
var text = file.ReadToEnd();
List<string> foundWords = new List<string>();
foreach (var word in words)
{
if (text.Contains(word))
foundWords.Add(word);
}
Then, the list foundWords contains all matching words.
(PS: Don't forget to put your StreamReader in a using statement so it gets disposed correctly)

C# OpenXML How to Replace \r\n with Break()?

I have a text field in my database that holds text with many lines.
When I generate an MS Word document using OpenXML and bookmarks, the text becomes one single line.
I've noticed that at each new line the bookmark value shows the characters "\r\n".
Looking for a solution, I found some answers that helped, but I still have a problem.
I used the run.Append(new Break()); solution, but the replaced text shows the name of the bookmark as well.
For example:
bookmark test = "Big text here in first paragraph\r\nSecond paragraph".
It is shown in MS Word document like:
testBig text here in first paragraph
Second paragraph
Can anyone please help me get rid of the bookmark name?
Here is my code:
public void UpdateBookmarksVistoria(string originalPath, string copyPath, string fileType)
{
string wordmlNamespace = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
// Make a copy of the template file.
File.Copy(originalPath, copyPath, true);
//Open the document as an Open XML package and extract the main document part.
using (WordprocessingDocument wordPackage = WordprocessingDocument.Open(copyPath, true))
{
MainDocumentPart part = wordPackage.MainDocumentPart;
//Setup the namespace manager so you can perform XPath queries
//to search for bookmarks in the part.
NameTable nt = new NameTable();
XmlNamespaceManager nsManager = new XmlNamespaceManager(nt);
nsManager.AddNamespace("w", wordmlNamespace);
//Load the part's XML into an XmlDocument instance.
XmlDocument xmlDoc = new XmlDocument(nt);
xmlDoc.Load(part.GetStream());
//Get the URL used to display the photos
string url = HttpContext.Current.Request.Url.ToString();
string enderecoURL;
if (url.Contains("localhost"))
enderecoURL = url.Substring(0, 26);
else if (url.Contains("www."))
enderecoURL = url.Substring(0, 24);
else
enderecoURL = url.Substring(0, 20);
//Iterate through the bookmarks.
int cont = 56;
foreach (KeyValuePair<string, string> bookmark in bookmarks)
{
var res = from bm in part.Document.Body.Descendants<BookmarkStart>()
where bm.Name == bookmark.Key
select bm;
var bk = res.SingleOrDefault();
if (bk != null)
{
Run bookmarkText = bk.NextSibling<Run>();
if (bookmarkText != null) // if the bookmark has text replace it
{
var texts = bookmark.Value.Split(new[] { Environment.NewLine }, StringSplitOptions.None);
for (int i = 0; i < texts.Length; i++)
{
if (i > 0)
bookmarkText.Append(new Break());
Text text = new Text();
text.Text = texts[i];
bookmarkText.Append(text); //HERE IS MY PROBLEM
}
}
else // otherwise append new text immediately after it
{
var parent = bk.Parent; // bookmark's parent element
Text text = new Text(bookmark.Value);
Run run = new Run(new RunProperties());
run.Append(text);
// insert after bookmark parent
parent.Append(run);
}
bk.Remove(); // we don't want the bookmark anymore
}
}
//Write the changes back to the document part.
xmlDoc.Save(wordPackage.MainDocumentPart.GetStream(FileMode.Create));
wordPackage.Close();
}}
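One likely explanation for the output shown above is that the run following the bookmark already contains a Text element holding the placeholder ("test"), and the loop only appends to it. A minimal sketch of a fix, assuming that is the cause: clear the run's existing text before appending the new lines. The class and method names (BookmarkRunHelper, SetRunText) are hypothetical; RemoveAllChildren<T>, AppendChild, Text and Break come from the Open XML SDK.

```csharp
using System;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

static class BookmarkRunHelper
{
    // Hypothetical helper: replaces whatever text the run currently holds
    // with the given value, emitting a Break between the lines.
    public static void SetRunText(Run run, string value)
    {
        // Drop the placeholder text (and any earlier breaks) first;
        // run properties such as formatting are left untouched.
        run.RemoveAllChildren<Text>();
        run.RemoveAllChildren<Break>();

        // Accept both Windows and Unix line endings.
        string[] lines = value.Split(new[] { "\r\n", "\n" }, StringSplitOptions.None);
        for (int i = 0; i < lines.Length; i++)
        {
            if (i > 0)
                run.AppendChild(new Break());
            run.AppendChild(new Text(lines[i]) { Space = SpaceProcessingModeValues.Preserve });
        }
    }
}
```

In the code above, the for loop inside the if (bookmarkText != null) branch would then become a single call: BookmarkRunHelper.SetRunText(bookmarkText, bookmark.Value);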

Deleting between the tags it is not deleting

I asked how to delete the values between tags and got a solution, which I had to modify to meet my needs. The problem now is that it doesn't delete the values between the tags. I debugged it looking for an error, but none was found. When I check the file that gets created, I can still see the values inside the tags. Please help.
Code under the New button, which copies the file:
XmlReadMode omode = oDataSet.ReadXml(PathSelection);
for (int i = 0; i < oDataSet.Tables[2].Rows.Count; i++)
{
string comment = oDataSet.Tables["data"].Rows[i][2].ToString();
string font = DeleteBetween(comment, "[Font]", "[/Font]");
string datestamp = DeleteBetween(comment, "[DateStamp]", "[/DateStamp]");
string commentVal = DeleteBetween(comment, "[Comment]", "[/Comment]");
string[] row = new string[]
{
oDataSet.Tables["data"].Rows[i][0].ToString(),
oDataSet.Tables["data"].Rows[i][1].ToString(),
font,
datestamp,
commentVal
};
File.Copy(txtInputfile.Text, txtInputfile.Text.Replace("string-en.resx", "string-lan.resx"));
Gridview_Output.Rows.Add(row);
}
Function:
public string DeleteBetween(string STR, string FirstString, string LastString)
{
string regularExpressionPattern1 = @"(?:\" + FirstString + @")([^[]+)\[\/" + LastString;
Regex regex = new Regex(regularExpressionPattern1, RegexOptions.Singleline);
MatchCollection collection = regex.Matches(STR.ToString());
var val = string.Empty;
foreach (Match m in collection)
{
val = m.Groups[1].Value;
}
return val;
}
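Note that the function above returns the text captured between the tags rather than removing it, and it builds the pattern from unescaped tag strings. A minimal sketch of a version that actually deletes the content, assuming the tags themselves should stay in place: the class name TagCleaner and the parameter names are hypothetical; Regex.Escape and Regex.Replace are standard .NET.

```csharp
using System.Text.RegularExpressions;

static class TagCleaner
{
    // Returns the input with everything between openTag and closeTag removed,
    // keeping the tags themselves.
    public static string DeleteBetween(string input, string openTag, string closeTag)
    {
        // Escape the tags so characters such as '[' are matched literally.
        string pattern = Regex.Escape(openTag) + ".*?" + Regex.Escape(closeTag);

        // A MatchEvaluator avoids any special handling of '$' in the replacement.
        return Regex.Replace(input, pattern, m => openTag + closeTag, RegexOptions.Singleline);
    }
}
```

For example, TagCleaner.DeleteBetween("x[Font]Arial[/Font]y", "[Font]", "[/Font]") returns "x[Font][/Font]y". To clear several tags, feed the result of one call into the next; the cleaned string still has to be written back to the file for the change to show up there.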

Remove the Text in C#.net

I have a string like,
string str;
str = "This is my new string. "<script>" Hi this is XYZ "</script>"";
Now I want to remove the text, from "<script>" to "</script>" including the tags by using C#.net code.
Thanks,

Take a look at Regex.
With it you can locate the block and then delete it.
This pattern should match everything between the script tags, as long as the content contains no '<' character: "<script>[^<]+</script>"

This is what I use to remove HTML tags from a string:
public static string ClearHtmlTags(string html)
{
if (string.IsNullOrWhiteSpace(html))
return html;
html = html.Trim();
string[] hs = html.Split("<>".ToArray());
bool skip = false;
StringBuilder sb = new StringBuilder();
foreach (string s in hs)
{
if (!skip)
sb.Append(s);
skip = !skip;
}
return sb.ToString();
}
With a small modification you get the method you need:
public static string ClearHtmlTags(string html)
{
if (string.IsNullOrWhiteSpace(html))
return html;
html = html.Trim();
string[] hs = html.Split("<>".ToArray());
bool skip = false;
bool skipTag = false;
StringBuilder sb = new StringBuilder();
foreach (string s in hs)
{
if (!skip)
{
if (!skipTag)
sb.Append(s);
}
else
{
skipTag = s == "script";
}
skip = !skip;
}
return sb.ToString();
}

You can use something like:
Regex.Replace(inputString, "<script>([a-z]|[A-Z])*</script>", "");
Note that this only matches when the text inside the script tags consists of letters; digits, spaces, or punctuation between the tags prevent a match.
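A more permissive variant, as a minimal sketch: a lazy .*? with RegexOptions.Singleline matches any content between the tags, including spaces and line breaks. The names ScriptStripper and RemoveScriptBlocks are hypothetical, and a regex is only a rough tool here; for arbitrary HTML a real parser is safer.

```csharp
using System.Text.RegularExpressions;

static class ScriptStripper
{
    // Removes every <script ...>...</script> block, tags included.
    public static string RemoveScriptBlocks(string input)
    {
        return Regex.Replace(
            input,
            @"<script\b[^>]*>.*?</script\s*>",
            string.Empty,
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
    }
}
```

Calling ScriptStripper.RemoveScriptBlocks("This is my new string. <script> Hi this is XYZ </script>") returns "This is my new string. " (the trailing space before the tag is kept).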

If you want to remove text at a specific position, for example to remove "is" from
"my name is testing"
use IndexOf to locate it and then Substring to cut it out or to replace it with something else.
In C# you can filter your string like this, or use a regex before the data is entered.

String not getting decoded

I have a DevExpress report, and the data source has a field whose data comes through looking something like this:
PRODUCT - APPLE<BR/>ITEM NUMBER - 23454</BR>LOT NUMBER 3343 <BR/>
That is exactly how it shows up in a cell, so I decided to decode it, but nothing works. I tried HttpUtility.HtmlDecode, and here I am trying WebUtility.HtmlDecode:
private void xrTableCell9_BeforePrint(object sender, System.Drawing.Printing.PrintEventArgs e)
{
XRTableCell cell = sender as XRTableCell;
string _description = WebUtility.HtmlDecode(Convert.ToString(GetCurrentColumnValue("Description")));
cell.Text = _description;
}
How can I decode the value of this column in the data source?
Thank you

If you need to show the description with the < /> markup as well, you need to use HtmlEncode.
If you need to extract the text from that HTML:
public static string ExtractTextFromHtml(this string text)
{
if (String.IsNullOrEmpty(text))
return text;
var sb = new StringBuilder();
var doc = new HtmlDocument();
doc.LoadHtml(text);
foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//text()"))
{
if (!String.IsNullOrWhiteSpace(node.InnerText))
sb.Append(HtmlEntity.DeEntitize(node.InnerText.Trim()) + " ");
}
return sb.ToString();
}
This requires HtmlAgilityPack.
To replace the br tags with line breaks:
var str = Convert.ToString(GetCurrentColumnValue("Description"));
// Regex.Replace returns a new string, so the result has to be assigned
str = Regex.Replace(str, @"</?\s?br\s?/?>", System.Environment.NewLine, RegexOptions.IgnoreCase);
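It may help to see why HtmlDecode appeared to do nothing: it converts entities such as &lt; into characters, but it leaves literal tags like <BR/> exactly as they are. The sketch below combines both steps, decoding first and then turning any br tag into a line break. The names DescriptionFormatter and BreaksToNewLines are hypothetical; whether the report cell then renders the line breaks depends on the control's settings, which this sketch does not cover.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

static class DescriptionFormatter
{
    // Decodes HTML entities, then replaces <br>, <br/>, </br> (any casing)
    // with the platform's line break.
    public static string BreaksToNewLines(string value)
    {
        // Turns "&lt;br&gt;" into "<br>"; literal tags pass through unchanged.
        string decoded = WebUtility.HtmlDecode(value);

        return Regex.Replace(decoded, @"</?\s?br\s?/?>", Environment.NewLine,
            RegexOptions.IgnoreCase);
    }
}
```

In the BeforePrint handler from the question, the cell text could then be set with cell.Text = DescriptionFormatter.BreaksToNewLines(Convert.ToString(GetCurrentColumnValue("Description")));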
