How to highlight text using string indexes in WPF RichTextBox?

I'm working on a custom RichTextBox that highlights certain words as they are typed.
(More precisely, it highlights certain strings, because I intend to highlight strings that are not separated by spaces.)
I search for the strings by loading the text into memory, looking for each string in a list one by one, and then applying formatting to each match.
The issue is that the index I get from the plain-text representation doesn't necessarily point to the same position in the RichTextBox's content once formatting has been applied.
(The first formatting is perfect. Every subsequent formatting starts to slip to the left. I assume this is because formatting adds elements to the document, which makes my indexes incorrect.)
Sample pseudocode for this is as follows.
// get the current text
var text = new TextRange(Document.ContentStart, Document.ContentEnd).Text;
// loop through and highlight
foreach (string suggestion in WhatToHighlightCollection)
{
    var currentText = text;
    var nextOccurrence = currentText.IndexOf(suggestion); // This index is unreliable!
    while (nextOccurrence != -1)
    {
        // Get the offset from the start. (There appear to be 2 characters at the
        // beginning. I assume these are the document and paragraph start tags,
        // so add 2 to it.)
        int offsetFromStart = (text.Length) - (currentText.Length) + 2;
        var startPointer = Document.ContentStart
            .GetPositionAtOffset(offsetFromStart + nextOccurrence, LogicalDirection.Forward);
        var endPointer = startPointer.GetPositionAtOffset(suggestion.Length, LogicalDirection.Forward);
        var textRange = new TextRange(startPointer, endPointer);
        textRange.ApplyPropertyValue(TextElement.BackgroundProperty, new SolidColorBrush(Colors.Yellow));
        textRange.ApplyPropertyValue(TextElement.FontWeightProperty, FontWeights.Bold);
        textRange.ApplyPropertyValue(TextElement.FontFamilyProperty, new FontFamily("Segoe UI"));
        // Go to the next occurrence.
        currentText = currentText.Substring(nextOccurrence + suggestion.Length);
        nextOccurrence = currentText.IndexOf(suggestion);
    }
}
How do I map string indexes to RichTextBox content?
NOTE: I'm not worried about the performance of this at the moment, although any suggestions are welcome. Currently I run this on every TextChanged event to highlight as the user types, and it's getting a bit sluggish.
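A likely explanation for the drift, sketched under stated assumptions: WPF's `TextPointer.GetPositionAtOffset` counts symbols, not characters. The opening and closing tag of each text element counts as one symbol, and so does each character in a Run. In a document with one Paragraph and one Run, the first character is therefore 2 symbols after `Document.ContentStart` (the Paragraph and Run start tags), which is where the "+2" comes from. Applying formatting to part of a Run splits it into several Runs, and every extra Run adds 2 more symbols that the plain-text index does not account for. The model below is plain C# (no WPF): it assumes a single Paragraph containing only Runs, treats each Run as a string, and `SymbolOffsetModel`/`ToSymbolOffset` are hypothetical names. In real code it is usually more robust to walk the document with `TextPointer.GetNextContextPosition` and read text only where `GetPointerContext(LogicalDirection.Forward)` returns `TextPointerContext.Text`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model, not WPF itself: one Paragraph whose children are plain Runs.
// Each element start tag and end tag is one symbol; each character is one symbol.
public static class SymbolOffsetModel
{
    // Returns the symbol offset from Document.ContentStart that lands just before
    // the character at plainTextIndex (plain text = all runs concatenated).
    public static int ToSymbolOffset(IReadOnlyList<string> runs, int plainTextIndex)
    {
        int offset = 1;                 // skip the Paragraph start tag
        int remaining = plainTextIndex; // characters still to skip
        foreach (string run in runs)
        {
            offset += 1;                // Run start tag
            if (remaining < run.Length)
                return offset + remaining; // the character is inside this run
            offset += run.Length + 1;   // all characters of the run plus its end tag
            remaining -= run.Length;
        }
        throw new ArgumentOutOfRangeException(nameof(plainTextIndex));
    }
}
```

With a single run `"hello world"`, index 6 maps to offset 8, i.e. `6 + 2`. Once `"hello"` has been highlighted and the content becomes the two runs `"hello"` and `" world"`, the same index maps to offset 10, so the "+2" shortcut lands two symbols too early, which matches the slip to the left described above.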

Related

Finding list of objects that contain full or just part of searched string

I have a list of paragraphs. Each paragraph can contain text. I'm trying to search for a string that may be entirely within a single paragraph, or spread across multiple paragraphs; in the worst case, each letter is in a different paragraph.
public List<WordParagraph> FindText(string text) {
    List<WordParagraph> list = new List<WordParagraph>();
    var found = false;
    Paragraph currentParagraph = null;
    foreach (var paragraph in this.Paragraphs) {
        //if (currentParagraph == null) {
        //    currentParagraph = paragraph._paragraph;
        //} else {
        //    if (currentParagraph != paragraph._paragraph) {
        //        found = false;
        //    }
        //}
        // paragraph.Text
        // logic missing to find text that can start within some paragraph.Text, but
        // can span across multiple paragraphs
        // for example searching for text "This Is MyTest" within 4 paragraphs that
        // may be written like
        // paragraph.Text = "Thi"
        // paragraph.Text = "s Is"
        // paragraph.Text = " MyTes"
        // paragraph.Text = "t"
    }
    return list;
}
I've tried some logic around foreach char in text, with a nested loop over the text from paragraph.Text, but the logic was failing me.
To give you a bit of background: consider a Word document that has a single sentence, one long sentence where each word, or even each letter, is formatted differently (different font size, bold, underline or whatever). It looks like this: [screenshot not available]
What Word actually saves in the file is a single paragraph, but that paragraph has multiple "runs". Each run contains a Text element. Together the Text elements hold the text that you see in Word, but because each word (or even letter) may be formatted differently, the text can be split into many small Text elements.
In my example I've simplified the logic: for me, each "run" is a paragraph with a text, so the list of WordParagraphs is the list of runs in the sentence from the screenshot.
Now I need to find the string "I have that" in the whole sentence you see in Word. That means I need to go through all paragraphs, find the first letter that matches, then check whether the next letter matches as well, and if not, start again.
My brain is having a hard time grasping this logic in code.
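One common way to handle a match that spans several runs, sketched here rather than taken from the thread: concatenate the run texts once, remember where each run starts, search the concatenation with `IndexOf`, and then map the match back to the runs it overlaps. This avoids the character-by-character restart logic entirely. Runs are modeled as plain strings standing in for `WordParagraph.Text`, `RunSearch`/`FindSpanningRuns` are hypothetical names, and only the first match is reported.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class RunSearch
{
    // Returns the indexes of every run that contains part of the first match,
    // or an empty list when the text is not found.
    public static List<int> FindSpanningRuns(IReadOnlyList<string> runs, string text)
    {
        var result = new List<int>();
        if (string.IsNullOrEmpty(text)) return result;

        // Concatenate all runs and remember where each one starts.
        var all = new StringBuilder();
        var starts = new int[runs.Count];
        for (int i = 0; i < runs.Count; i++)
        {
            starts[i] = all.Length;
            all.Append(runs[i]);
        }

        int matchStart = all.ToString().IndexOf(text, StringComparison.Ordinal);
        if (matchStart < 0) return result;
        int matchEnd = matchStart + text.Length; // exclusive

        // A run overlaps the match if [runStart, runEnd) intersects [matchStart, matchEnd).
        for (int i = 0; i < runs.Count; i++)
        {
            int runStart = starts[i];
            int runEnd = runStart + runs[i].Length;
            if (runStart < matchEnd && runEnd > matchStart)
                result.Add(i);
        }
        return result;
    }
}
```

For the four runs in the question (`"Thi"`, `"s Is"`, `" MyTes"`, `"t"`), searching for `"This Is MyTest"` returns all four indexes, and searching for `"Is My"` returns runs 1 and 2; those indexes can then be used to pick the matching `WordParagraph` objects.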

How many spaces does \t use in c#

I am building a report using StringBuilder, and its details need to be properly indented and aligned,
for which I will be using
private static int paperWidth = 55; // width of the paper in characters
private static readonly string singleLine = string.Empty.PadLeft(paperWidth, '-');
StringBuilder reportLayout = new StringBuilder();
reportLayout.AppendLine("\t" + "Store Name");
I want Store Name in the center, and many more such fields, by using \t.
Thanks in advance.
EDIT
I want to print "Store Name" centered on the line.

If you're simulating what tabs look like at a terminal, you should stick with 8 spaces per tab. A tab character shifts output over to the next tab stop. By default there is one every 8 spaces, but in most shells you can easily change that to whatever number of spaces you want.
You can see this with the following code:
using System.Text;

string tab = "\t";
string space = new string(' ', 8);
StringBuilder str = new StringBuilder();
str.AppendLine(tab + "A");
str.AppendLine(space + "B");
string outPut = str.ToString(); // both lines render at the same width with 8-column tab stops
int lengthOfOP = outPut.Length; // 15 on Windows, where Environment.NewLine is "\r\n" (4 + 11 characters)
From the above example we can see that in .NET the length of \t is 1: it is a single character, however wide it is displayed.
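If the plain-text output itself must contain the spacing (for example, because whatever prints the report does not expand tabs), you can expand tabs yourself. A minimal sketch, assuming a single line of text and a fixed tab width; `Tabs.Expand` is a hypothetical helper, not a .NET API:

```csharp
using System.Text;

public static class Tabs
{
    // Replaces each '\t' with enough spaces to reach the next tab stop.
    // The default tabWidth of 8 mimics the common terminal default.
    // Assumes the input is a single line (no '\n').
    public static string Expand(string line, int tabWidth = 8)
    {
        var sb = new StringBuilder();
        foreach (char c in line)
        {
            if (c == '\t')
            {
                // The current column is sb.Length; pad up to the next multiple of tabWidth.
                int spaces = tabWidth - (sb.Length % tabWidth);
                sb.Append(' ', spaces);
            }
            else
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```

`Tabs.Expand("ab\tc")` yields `"ab"`, six spaces and `"c"`, because the next tab stop after column 2 is column 8; this is why a tab does not have a fixed width in spaces.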

A tab is a tab, and its meaning is determined by the application that renders it.
Think of a word processor, where a tab means:
Go to the next tab stop.
And you can define the tab stops!
To center output, do not use tabs; use the correct StringFormat:
StringFormat fmt = new StringFormat()
{ Alignment = StringAlignment.Center, LineAlignment = StringAlignment.Center };
This centers the text inside a rectangle in both directions:
e.Graphics.DrawString(someText, someFont, someBrush, layoutRectangle, fmt);
or something like that.
But it looks as if you want to embed the centering in the text itself.
That will only work if you really know everything about the output process, i.e. the device, the font and size, the margins, etc.
So it will probably not be reliable at all, no matter what you do.
The best alternative may be either to give up on plain text, or to use a fixed number of spaces to mean "centered" and then watch for that number when you render.
If you don't have control over the rendering, it will not work.
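In the special case where you do control the output, e.g. a receipt-style printer with a monospaced font and a known number of characters per line (the question's `paperWidth = 55` suggests this, but that is an assumption), centering reduces to left-padding with spaces. A minimal sketch with a hypothetical `Report.Center` helper:

```csharp
public static class Report
{
    // Assumption: the device prints a monospaced font with exactly
    // paperWidth characters per line, so one character = one column.
    public static string Center(string text, int paperWidth)
    {
        if (text.Length >= paperWidth) return text; // nothing to pad
        int left = (paperWidth - text.Length) / 2;  // extra column (if odd) goes to the right
        return new string(' ', left) + text;
    }
}
```

`reportLayout.AppendLine(Report.Center("Store Name", paperWidth));` would put 22 spaces before the 10-character title on a 55-character line. With a proportional font or an unknown device this breaks down, as described above.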

RichTextBox SelectionStart offset with linebreaks

I'm using a RichTextBox for coloured text. Let's assume I want to use different colours for different portions of the text. This is working fine so far.
I'm currently having a problem with the SelectionStart property of the RichTextBox. I've assigned some text to the Text property of the RichTextBox. If the text contains \r\n\r\n, the SelectionStart position no longer matches the character positions in the assigned string.
Small example (WinForms application: a Form with a RichTextBox):
public Form1()
{
    InitializeComponent();
    String sentence1 = "This is the first sentence.";
    String sentence2 = "This is the second sentence";
    String text = sentence1 + "\r\n\r\n" + sentence2;
    int start1 = text.IndexOf(sentence1);
    int start2 = text.IndexOf(sentence2);
    this.richTextBox1.Text = text;
    String subString1 = text.Substring(start1, sentence1.Length);
    String subString2 = text.Substring(start2, sentence2.Length);
    bool match1 = (sentence1 == subString1); // true
    bool match2 = (sentence2 == subString2); // true
    this.richTextBox1.SelectionStart = start1;
    this.richTextBox1.SelectionLength = sentence1.Length;
    this.richTextBox1.SelectionColor = Color.Red;
    this.richTextBox1.SelectionStart = start2;
    this.richTextBox1.SelectionLength = sentence2.Length;
    this.richTextBox1.SelectionColor = Color.Blue;
}
In the resulting RichTextBox (screenshot not available), the first two characters of the second sentence are not coloured. This is the result of an offset produced by \r\n\r\n.
What is the reason for this? Should I use another control for colouring text?
How do I fix the problem in a reliable way? I've tried replacing the "\r\n\r\n" with String.Empty, but that produces other offset problems.
Related question:
Inconsistent behaviour between in RichTextBox.Select with SubString method

It seems that the sequence \r\n counts as only one character when making selections. You can do the measurements in a copy of the string where every \r\n is replaced by \n.
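Applied to the example in the question, that suggestion amounts to the sketch below. `RtbIndex.ToControlIndex` is a hypothetical helper; it assumes, as the answer states, that the control counts each \r\n as one character, and it does not handle an index that falls between a \r and its \n.

```csharp
public static class RtbIndex
{
    // Converts an index into a string that may contain "\r\n" into the index
    // the RichTextBox uses, assuming the control stores each "\r\n" as one character.
    public static int ToControlIndex(string text, int stringIndex)
    {
        // Only the line breaks before stringIndex shift the position.
        return text.Substring(0, stringIndex).Replace("\r\n", "\n").Length;
    }
}
```

With the question's text, `start2` is 31 in the string but `ToControlIndex(text, start2)` is 29, which is exactly the 2-character shift described in the question. Setting `SelectionStart` to the converted value (the length needs no conversion unless the sentence itself contains line breaks) should colour the whole second sentence.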

Just for completeness (I'll stick with linepogl's answer for now):
I've found another way to get indices for the SelectionStart property. The RichTextBox offers a Find method that can be used to retrieve index positions for a specified string.
Be aware that the text you want to highlight might not be unique and may occur multiple times. You can use an overload that specifies a start position for the search.
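The same "search again from just after the previous match" pattern works on plain strings too. A minimal sketch with a hypothetical `Occurrences.AllIndexesOf` helper built on `string.IndexOf(string, int, StringComparison)`; indexes found in a string that contains \r\n would still need adjusting for line breaks, as described in the first answer, before being used as SelectionStart.

```csharp
using System;
using System.Collections.Generic;

public static class Occurrences
{
    // Returns the start index of every non-overlapping occurrence of value in text.
    public static List<int> AllIndexesOf(string text, string value,
        StringComparison comparison = StringComparison.Ordinal)
    {
        if (string.IsNullOrEmpty(value))
            throw new ArgumentException("value must be non-empty", nameof(value));
        var result = new List<int>();
        int index = text.IndexOf(value, 0, comparison);
        while (index >= 0)
        {
            result.Add(index);
            // Continue searching just after the current match.
            index = text.IndexOf(value, index + value.Length, comparison);
        }
        return result;
    }
}
```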

C# - Implementing Markdown to Word (OpenXML)

I'm trying to implement my own version of Markdown for creating Word documents in a C# application. For bold/italic/underline I am going to use ** / ` / _ respectively. I have created something that parses pairs of ** to output bold text by extracting a match and using something like this:
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

RunProperties rPr2 = new RunProperties();
rPr2.Append(new Bold() { Val = new OnOffValue(true) });
Run run2 = new Run();
run2.Append(rPr2);
run2.Append(new Text(extractedString));
p.Append(run2);
My issue is combining the three different formats: I think I would have to weigh up all the different formatting combinations and split them into separate runs (bold runs, bold italic runs, underline runs, bold underline runs, etc.). I want my program to be able to handle something like this:
**_Lorem ipsum_** (creates bold & underlined run)
`Lorem ipsum` dolor sit amet, **consectetur _adipiscing_ elit**.
_Praesent `feugiat` velit_ sed tellus convallis, **non `rhoncus** tortor` auctor.
Basically, I want it to handle any mix of styles you could throw at it. However, if I generate these runs programmatically, I need to work everything out before putting the text into runs. Should I handle this with an array of character indexes for each style and merge them into one big list of styles (I'm not sure exactly how I would do this)?
The final question: does something like this already exist? If it does, I have been unable to find it (Markdown to Word).

I think you'll have to split your text into parts according to their formatting and add each part, with the correct formatting, to the document, like here: http://msdn.microsoft.com/en-us/library/office/gg278312.aspx.
So
**non `rhoncus** tortor` will become "non "{bold}, "rhoncus"{bold, italic}, " tortor"{italic}
I think this will be easier than performing several passes. You don't even have to parse the entire document first: just parse as you go, and after each change in the formatting write to the .docx.
Another thought: if all you're creating is simple text, it might be even simpler to generate the Open XML yourself. Your data is very structured, so it should be easy enough to create XML from it.
Here's a simple algorithm to do what I propose...
using System;
using System.Collections.Generic;
using System.Text;

// These are the different formattings you have
public enum Formatings
{
    Bold, Italic, Underline, Undefined
}

public class FormattedTextParser
{
    // This will store the current format
    private Dictionary<Formatings, bool> m_CurrentFormat;
    // This will store which string translates into which format
    private Dictionary<string, Formatings> m_FormatingEncoding;

    public void Init()
    {
        m_CurrentFormat = new Dictionary<Formatings, bool>();
        foreach (Formatings format in Enum.GetValues(typeof(Formatings)))
        {
            m_CurrentFormat.Add(format, false);
        }
        // Delimiters from the question: ** bold, ` italic, _ underline
        m_FormatingEncoding = new Dictionary<string, Formatings>
            {{"**", Formatings.Bold}, {"`", Formatings.Italic}, {"_", Formatings.Underline}};
    }

    public void ParseFormattedText(string p_text)
    {
        StringBuilder currentWordBuilder = new StringBuilder();
        int currentIndex = 0;
        while (currentIndex < p_text.Length)
        {
            Formatings currentFormatSymbol;
            int shift;
            if (IsFormatSymbol(p_text, currentIndex, out currentFormatSymbol, out shift))
            {
                // This is the current word you need to insert
                string currentWord = currentWordBuilder.ToString();
                // This is the current formatting status --> m_CurrentFormat
                // This is where you can insert your code and add the word you want to the .docx
                currentWordBuilder.Clear();
                currentIndex += shift;
                m_CurrentFormat[currentFormatSymbol] = !m_CurrentFormat[currentFormatSymbol];
                // Re-check the new position: it may be another delimiter (as in "**_")
                // or the end of the text, so don't append a character here
                continue;
            }
            currentWordBuilder.Append(p_text[currentIndex]);
            currentIndex++;
        }
        // The text after the last delimiter still has to be inserted
        string lastWord = currentWordBuilder.ToString();
        // Insert lastWord with the formatting in m_CurrentFormat here
    }

    // Checks if the current position is the beginning of a format symbol.
    // If true, p_currentFormatSymbol will be the discovered format delimiter
    // and p_shift will denote its length.
    private bool IsFormatSymbol(string p_text, int p_currentIndex, out Formatings p_currentFormatSymbol, out int p_shift)
    {
        foreach (var formatString in m_FormatingEncoding.Keys)
        {
            // Ordinal comparison in place; unlike Substring(p_currentIndex, 2),
            // this does not throw when fewer than 2 characters remain
            if (string.CompareOrdinal(p_text, p_currentIndex, formatString, 0, formatString.Length) == 0)
            {
                p_shift = formatString.Length;
                p_currentFormatSymbol = m_FormatingEncoding[formatString];
                return true;
            }
        }
        p_shift = -1;
        p_currentFormatSymbol = Formatings.Undefined;
        return false;
    }
}
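A more compact variant of the same toggle idea, sketched as an alternative rather than taken from the answer: keep the active styles in a `[Flags]` enum, flip a bit on each delimiter, and emit a segment whenever a delimiter is hit. Because the style is a bit mask, combinations such as bold + underline need no special cases, which addresses the concern raised in the question. `TextStyle`, `Segment` and `MiniMarkdown` are hypothetical names, and escaping (a literal `_` in a word such as `snake_case`) is not handled. Each returned segment would then become one Open XML `Run` whose `RunProperties` contain `Bold`, `Italic` and/or `Underline` elements according to its flags.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

[Flags]
public enum TextStyle { None = 0, Bold = 1, Italic = 2, Underline = 4 }

public sealed class Segment
{
    public Segment(string text, TextStyle style) { Text = text; Style = style; }
    public string Text { get; }
    public TextStyle Style { get; }
}

public static class MiniMarkdown
{
    // Delimiters from the question: ** bold, ` italic, _ underline.
    private static readonly (string Token, TextStyle Style)[] Delimiters =
    {
        ("**", TextStyle.Bold), ("`", TextStyle.Italic), ("_", TextStyle.Underline)
    };

    public static List<Segment> Parse(string input)
    {
        var segments = new List<Segment>();
        var current = new StringBuilder();
        var style = TextStyle.None;
        int i = 0;
        while (i < input.Length)
        {
            int matched = -1;
            for (int d = 0; d < Delimiters.Length; d++)
            {
                string token = Delimiters[d].Token;
                if (string.CompareOrdinal(input, i, token, 0, token.Length) == 0) { matched = d; break; }
            }
            if (matched >= 0)
            {
                Flush(segments, current, style);     // close the text seen so far
                style ^= Delimiters[matched].Style;   // toggle this style on or off
                i += Delimiters[matched].Token.Length;
            }
            else
            {
                current.Append(input[i]);
                i++;
            }
        }
        Flush(segments, current, style);
        return segments;
    }

    // Adds the buffered text as one segment with the given style, if any text is buffered.
    private static void Flush(List<Segment> segments, StringBuilder current, TextStyle style)
    {
        if (current.Length == 0) return;
        segments.Add(new Segment(current.ToString(), style));
        current.Clear();
    }
}
```

Parsing the answer's example string yields three segments: "non " (Bold), "rhoncus" (Bold | Italic) and " tortor" (Italic); `**_Lorem ipsum_**` yields a single Bold | Underline segment.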

different format into one single line Interop.word

I've been trying to figure out how to insert two different formats into the same paragraph using Interop.Word in C#, like this:
hello planet earth here's what I want to do

Assuming you have your document defined as oDoc, the following code should get you the desired result:
Word.Paragraph oPara = oDoc.Content.Paragraphs.Add(ref oMissing);
oPara.Range.Text = "hello planet earth here's what I want to do";
object oStart = oPara.Range.Start + 13;
object oEnd = oPara.Range.Start + 18;
Word.Range rBold = oDoc.Range(ref oStart, ref oEnd);
rBold.Bold = 1;

I had to modify Dennis' answer a little to get it to work for me.
What I'm doing is totally automated, so I have to work only with variables.
private void InsertMultiFormatParagraph(string text, int size, int spaceAfter = 10) {
    var para = docWord.Content.Paragraphs.Add(ref objMissing);
    para.Range.Text = text;
    // Explicitly set this to "not bold"
    para.Range.Font.Bold = 0;
    para.Range.Font.Size = size;
    para.Format.SpaceAfter = spaceAfter;
    // Bold everything before the first ":" (nothing is bolded if there is no colon)
    int colon = text.IndexOf(":");
    if (colon > 0) {
        object objStart = para.Range.Start;
        object objEnd = para.Range.Start + colon;
        var rngBold = docWord.Range(ref objStart, ref objEnd);
        rngBold.Bold = 1;
    }
    para.Range.InsertParagraphAfter();
}
The main difference that made me want to post this is that the new paragraph should be inserted AFTER the font is changed. My initial thought was to insert it right after setting the SpaceAfter property, but then the objStart and objEnd values threw "OutOfRange" exceptions. It was a little counter-intuitive, so I wanted to make sure everyone knew.

The following code seemed to work best for me when formatting a particular selection within a paragraph. It uses Word's built-in Find function to make a selection and then formats only the selected text. This approach only works well if the text to select is a unique string in the document, but for most situations I have run across, it seems to work.
oWord.Selection.Find.Text = Variable_Containing_Text_to_Select; // sets the variable for find and select
oWord.Selection.Find.Execute(); // Executes find and select
oWord.Selection.Font.Bold = 1; // Modifies selection
oWord.Selection.Collapse(); // Clears selection
Hope this helps someone!

I know this post is old, but it came up in almost all my searches. The answer below is for anyone who, like me, wants to do this for more than one word in a sentence. In this case, I loop through a string array of variables that contain strings and change that text to bold (modifying @joshman1019's answer).
string[] makeBold = new string[4] {a, b, c, d};
foreach (string s in makeBold)
{
    wApp.Selection.Find.Text = s; //changes with each iteration
    wApp.Selection.Find.Execute();
    wApp.Selection.Font.Bold = 1;
    wApp.Selection.Collapse(); //used to 'clear' the selection
    wApp.Selection.Find.ClearFormatting();
}
So each string represented by a variable will be bold. For example, if a = "hello world", then "hello world" is made bold in the Word document. Hope it saves someone some time.

I know this is an old thread, but I thought I'd post here anyway for those that come across it via Google (like I did). I got most of the way to a solution with krillgar's approach, but I had trouble because some of my text contains newlines. Accordingly, this modification worked best for me:
private void WriteText(string text)
{
    var para = doc.Content.Paragraphs.Add();
    var start = para.Range.Start;
    var end = para.Range.Start + text.IndexOf(":");
    para.Range.Text = text;
    para.Range.Font.Bold = 0;
    para.Range.InsertParagraphAfter();
    if(text.Contains(":")){
        var rngBold = doc.Range(start, end);
        rngBold.Bold = 1;
    }
}
The key difference is that I calculate start and end earlier in the function. I can't quite put my finger on it, but I think that if the new text contains newlines, calculating start/end later messes something up.
And obviously my solution is intended for text with the format:
Label: Data
where Label is to be bolded.

Consider using Range.Collapse, possibly with Microsoft.Office.Interop.Word.WdCollapseDirection.wdCollapseEnd as the parameter.
That allows the next text to have formatting different from the previous text (and the next text's formatting will not affect the formatting of the previous one).
