I'm working on my own syntax highlighter using a RichTextBox. It already works, but I've noticed that typing slows down a lot when there are too many lines of code. This is because my syntax highlight function recolors every word in the entire RichTextBox on every change made to it. Here's a minimal example of the function to show how it works:
private void colorCode()
{
// getting keywords/functions
string keywords = @"\b(class|function)\b";
MatchCollection keywordMatches = Regex.Matches(codeBox.Text, keywords);
// saving the original caret position + forecolor
int originalIndex = codeBox.SelectionStart;
int originalLength = codeBox.SelectionLength;
Color originalColor = Color.Black;
// focuses a label before highlighting (avoids blinking)
titleLabel.Focus();
// removes any previous highlighting (so modified words won't remain highlighted)
codeBox.SelectionStart = 0;
codeBox.SelectionLength = codeBox.Text.Length;
codeBox.SelectionColor = originalColor;
foreach (Match m in keywordMatches)
{
codeBox.SelectionStart = m.Index;
codeBox.SelectionLength = m.Length;
codeBox.SelectionColor = Color.Blue;
}
// restoring the original colors, for further writing
codeBox.SelectionStart = originalIndex;
codeBox.SelectionLength = originalLength;
codeBox.SelectionColor = originalColor;
// giving back the focus
codeBox.Focus();
}
To solve the problem, I want to write a function that doesn't recolor the entire RichTextBox, but only the line at the cursor position. I realise this will still cause the same issue on minified code, but that's not a problem for me. The problem is, I can't seem to get it working. This is what I've got so far:
void changeLine(RichTextBox RTB, int line, Color clr, int curPos){
string testWords = @"\b(test1|test2)\b";
MatchCollection testwordMatches = Regex.Matches(RTB.Lines[line], testWords);
foreach (Match m in testwordMatches)
{
//RTB.SelectionStart = m.Index;
//RTB.SelectionLength = m.Length;
RTB.SelectionColor = Color.Blue;
}
RTB.SelectionStart = curPos;
RTB.SelectionColor = Color.Black;
}
The problem is that it does the coloring when a word in testWords is found, but it colors the entire line instead of just the word. This is because I can't figure out a way of doing the selections right. So I'm hoping you guys can help me out with this.
Edit:
I'd like to add that I did think about other solutions, like putting the lines in a List or using a StringBuilder. But those turn the lines into plain strings and don't allow me to do color formatting the way the RichTextBox does.
Well, you need a language lexer and parser. This task is not solvable with Regex alone. Regular expressions are simply not capable of it because of some fundamental grammar rules (the "power levels" of grammars; read about the Chomsky hierarchy of grammars).
What you need is a grammar toolkit. For example, ANTLR4 provides a lexer/parser generator and a set of predefined grammars.
For example, you can find a lot of user-written grammars here (including the latest C# syntax): https://github.com/antlr/grammars-v4
Then generate a parser/lexer from the grammar and feed it your string. It will output the full hierarchy with the index and length of each token, so you can colorize the tokens without jumping across the entire rich text box.
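Whichever tokenizer produces the indexes and lengths, the selection problem in changeLine comes from the match indexes being relative to the line rather than to the whole text. The following is a minimal sketch of one way to translate them, not a definitive implementation. It assumes WordWrap is off, so that the Lines array and GetFirstCharIndexFromLine agree on what a line is. LineHighlighter, FindSpans and ColorLine are hypothetical names, and the keyword pattern is the one from the question.

```csharp
using System.Collections.Generic;
using System.Drawing;
using System.Text.RegularExpressions;
using System.Windows.Forms;

static class LineHighlighter
{
    // Pattern taken from the question; swap in your own keywords.
    static readonly Regex Keywords = new Regex(@"\b(test1|test2)\b", RegexOptions.Compiled);

    // Pure helper: turns matches inside one line into absolute (start, length) spans.
    public static List<(int Start, int Length)> FindSpans(string lineText, int lineStart)
    {
        var spans = new List<(int Start, int Length)>();
        foreach (Match m in Keywords.Matches(lineText))
            spans.Add((lineStart + m.Index, m.Length));
        return spans;
    }

    // Recolors a single line and then restores the caret.
    // Assumption: rtb.WordWrap is false.
    public static void ColorLine(RichTextBox rtb, int line)
    {
        string[] lines = rtb.Lines;            // the getter splits the text, so read it once
        if (line < 0 || line >= lines.Length) return;

        int caret = rtb.SelectionStart;
        int caretLength = rtb.SelectionLength;
        int lineStart = rtb.GetFirstCharIndexFromLine(line);

        // Reset the line so edited words lose their old color.
        rtb.Select(lineStart, lines[line].Length);
        rtb.SelectionColor = Color.Black;

        foreach (var (start, length) in FindSpans(lines[line], lineStart))
        {
            rtb.Select(start, length);
            rtb.SelectionColor = Color.Blue;
        }

        // Put the caret back and keep typing in the default color.
        rtb.Select(caret, caretLength);
        rtb.SelectionColor = Color.Black;
    }
}
```

A caller would typically pass rtb.GetLineFromCharIndex(rtb.SelectionStart) as the line argument.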
Also, consider using a timeout between user inputs so you don't recolor on every typed symbol (keep the color of the previous token for a short time, then recolor and refresh). This way it will feel about as smooth as it does in Visual Studio.
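The timeout idea could be sketched with a WinForms timer that is restarted on every edit, so the highlighter only runs once typing pauses. This is only an illustration: DebouncedHighlighter, NotifyEdit and the 300 ms default are assumptions, not part of any library.

```csharp
using System;
using System.Windows.Forms;

// Hypothetical helper: runs `highlight` once no edit has arrived for `delayMs`.
sealed class DebouncedHighlighter : IDisposable
{
    private readonly System.Windows.Forms.Timer timer = new System.Windows.Forms.Timer();
    private readonly Action highlight;
    private bool running;   // true while the highlight callback is executing

    // True while an edit is waiting to be highlighted.
    public bool Pending { get; private set; }

    public DebouncedHighlighter(Action highlight, int delayMs = 300)
    {
        this.highlight = highlight ?? throw new ArgumentNullException(nameof(highlight));
        timer.Interval = delayMs;
        timer.Tick += OnTick;
    }

    // Call this from the RichTextBox TextChanged handler.
    public void NotifyEdit()
    {
        if (running) return;    // ignore change events caused by the recoloring itself
        Pending = true;
        timer.Stop();           // restart the countdown on every edit
        timer.Start();
    }

    private void OnTick(object sender, EventArgs e)
    {
        timer.Stop();
        Pending = false;
        running = true;
        try { highlight(); }
        finally { running = false; }
    }

    public void Dispose() => timer.Dispose();
}
```

The timer ticks on the UI thread, so the callback can touch the control directly. The delay is a trade-off: shorter values feel more responsive, longer values do less work while typing.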
I have the following code to create syntax highlighting for a text editor that I am working on. It uses the FastColoredTextBox component. I can't quite get the regex pattern for highlighting batch file variables correct.
private void batchSyntaxHighlight(FastColoredTextBox fctb)
{
fctb.LeftBracket = '(';
fctb.RightBracket = ')';
fctb.LeftBracket2 = '\x0';
fctb.RightBracket2 = '\x0';
Range e = fctb.Range;
e.ClearStyle(StyleIndex.All);
//clear style of changed range
e.ClearStyle(BlueStyle, BoldStyle, GrayStyle, MagentaStyle, GreenStyleItalic, BrownStyleItalic, YellowStyle);
//variable highlighting
e.SetStyle(YellowStyle, "(\".+?\"|\'.+?\')", RegexOptions.Singleline);
//comment highlighting
e.SetStyle(GreenStyleItalic, @"(REM.*)");
//attribute highlighting
e.SetStyle(GrayStyle, @"^\s*(?<range>\[.+?\])\s*$", RegexOptions.Multiline);
//class name highlighting
e.SetStyle(BoldStyle, @"(:.*)");
//symbol highlighting
e.SetStyle(MagentaStyle, @"(#|%)", RegexOptions.Singleline);
e.SetStyle(RedStyle, @"(\*)", RegexOptions.Singleline);
//keyword highlighting
e.SetStyle(BlueStyle, @"\b(set|SET|echo|Echo|ECHO|FOR|for|PUSHD|pushd|POPD|popd|pause|PAUSE|exit|Exit|EXIT|cd|CD|If|IF|if|ELSE|Else|else|GOTO|goto|DEL|del)");
//clear folding markers
e.ClearFoldingMarkers();
BATCH_HIGHLIGHTING = true;
}
Using this code I can't seem to highlight strings between two '%' symbols without highlighting almost the entire file because many lines will only contain one '%' symbol or two right next to each other.
I am also having trouble with '::' comments. In order to highlight the labels I have created a regex pattern that matches any line containing a ':' together with all the characters that follow it.
I want to get the highlighting correct so that labels are highlighted with BoldStyle and '::' comments are highlighted with GreenStyleItalic without any conflicts. I would also like to be able to highlight strings that lie between two '%' symbols without conflicts (such as a line that contains only one '%').
All this should only be highlighted if not in a comment.
EDIT: Currently the code only highlights '%' symbols by themselves as I was unable to get the code to work for highlighting between them without causing major syntax issues.
Big thanks to @DougF for helping me find this solution. The answer is:
@"^:[a-zA-Z]+"
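For reference, here is a sketch of how labels, comments and %variables% could be kept apart with plain .NET regexes. The pattern choices are assumptions rather than the accepted fix: labels and comments are only recognised at the start of a line, variables are limited to the simple %NAME% form, and BatchPatterns and VariablesOutsideComments are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class BatchPatterns
{
    // A label is a single ':' followed by a name. "::" fails because the second ':' is not a letter.
    public static readonly Regex Label =
        new Regex(@"^[ \t]*:[A-Za-z_]\w*", RegexOptions.Multiline);

    // A comment line starts with "::" or the REM keyword.
    public static readonly Regex Comment =
        new Regex(@"^[ \t]*(::|REM\b)[^\r\n]*", RegexOptions.Multiline | RegexOptions.IgnoreCase);

    // %NAME% only. A lone '%' or an empty "%%" does not match.
    public static readonly Regex Variable = new Regex(@"%[A-Za-z_]\w*%");

    // Returns the variable matches that do not start inside a comment.
    public static List<Match> VariablesOutsideComments(string text)
    {
        MatchCollection comments = Comment.Matches(text);
        var result = new List<Match>();
        foreach (Match v in Variable.Matches(text))
        {
            bool inside = false;
            foreach (Match c in comments)
            {
                if (v.Index >= c.Index && v.Index < c.Index + c.Length) { inside = true; break; }
            }
            if (!inside) result.Add(v);
        }
        return result;
    }
}
```

RegexOptions.Multiline is what lets ^ match at the start of every line instead of only at the start of the text. Forms such as %~dp0 or %%i are deliberately left out of this sketch.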
For Windows Forms.
I am trying to insert text into the .rtf field of a RichTextBox.
I have tried two methods. When I use .Rtf.Insert, nothing happens at all.
When I edit the .rtf string based on the selected text positions myself, I either end up adding gibberish to the thing or getting an error that says that the file format is invalid. My best guess is that this is because the .rtf string is in .rtf format and the selection index that I am using is based on the plain text string and so I am inserting the text in the wrong location in the .rtf string and messing up the RTF code.
But knowing what the problem is (if I am correct) hasn't helped me solve it.
Is there a way to get .Rtf.Insert to work correctly, or is there a way to translate the selected text indexes to the actual .rtf text positions so that something like the code below would work? I am assuming that the RichTextBox itself must know how to translate one index into the other, because it can insert characters just fine when the user types.
Here is my code snippet. The point of the code is to insert a marker into the text that will later be parsed and replaced with a student's first name. There will be other such codes. "codeLeader" and "codeEnder" are just the strings I use to surround the codes with. In this case I am using "[*" and "*]" to indicate that there is a code I will need to parse, but I put them into separate strings so that I can easily change them if I wish. I have actually already written the parsing code, which works just fine on rich text. It is just inserting the text into the RichTextBox itself that is the problem. In other words, if I were to type the codes by hand it would work just fine. But this would be troublesome for the user because some of the codes will use index numbers.
private void studentFirstNameCode_Click(object sender, EventArgs e)
{
string ins = f1ref.codeLeader;
ins += "SNFirst" + f1ref.codeEnder;
int start = editorField.richTextBox1.SelectionStart;
if (start == -1) { start = 0; }
int end = start + editorField.richTextBox1.SelectionLength;
if (end == -1) { end = 0; }
string pre = editorField.richTextBox1.Rtf.Substring(0, start);
string post = editorField.richTextBox1.Rtf.Substring(end);
string newstring = pre + ins + post;
editorField.richTextBox1.Rtf = newstring;
// this also doesn't work. gives no result at all.
// editorField.richTextBox1.Rtf.Insert(start, newstring);
}
I don't think that you need to use the Rtf property to simply insert text into the RichTextBox, particularly because you don't seem to be adding RTF-formatted text.
If you don't want to use RTF, then the simplest way to accomplish your goal is just one line of code:
editorField.richTextBox1.SelectedText = yourParameterText;
This works as if you had pasted the text from the clipboard at the selected position (replacing any text that is currently selected), and the underlying work of correctly formatting your text inside the RTF is done by the control itself.
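A short sketch of that approach follows, with one note on why the commented-out Rtf.Insert call has no visible effect: .NET strings are immutable, so Insert returns a new string and the result was discarded. MarkerInserter, BuildMarker and InsertAtCaret are hypothetical names used only for illustration.

```csharp
using System;
using System.Windows.Forms;

static class MarkerInserter
{
    // Builds a marker such as "[*SNFirst*]" from its parts.
    public static string BuildMarker(string leader, string code, string ender)
    {
        return leader + code + ender;
    }

    // Replaces the current selection (or inserts at the caret when nothing
    // is selected) and lets the control update its own RTF.
    public static void InsertAtCaret(RichTextBox box, string marker)
    {
        if (box == null) throw new ArgumentNullException(nameof(box));
        box.SelectedText = marker;
        box.Focus();
    }

    // Why `box.Rtf.Insert(start, text);` changes nothing:
    //   string s = "abc";
    //   s.Insert(1, "X");       // returns "aXbc", but s is still "abc"
    //   s = s.Insert(1, "X");   // only an assignment keeps the result
}
```

In the click handler from the question this might be called as MarkerInserter.InsertAtCaret(editorField.richTextBox1, MarkerInserter.BuildMarker(f1ref.codeLeader, "SNFirst", f1ref.codeEnder)).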
I have found a work-around by using .SendKeys. This makes the text appear a bit slowly (as if typed very quickly) so isn't optimal, but it does work.
It is enough for a workable solution, but I am still troubled by the problem. It seems like this issue should have a more elegant solution than this.
I have a problem getting the caret index of a TextBox in a Windows Store app (WP 8.1).
I need to insert specific symbols into the text when a button is pressed.
I tried this:
text.Text = text.Text.Insert(text.SelectionStart, "~");
But this code inserts the symbol at the beginning of the text, not at the caret position.
UPDATE
I updated my code thanks to Ladi. But now I have another problem: I'm building an HTML editor app, so my default TextBlock.Text is: <!DOCTYPE html>\r\n<html>\r\n<head>\r\n</head>\r\n<body>\r\n</body>\r\n</html>
So, for example, when the user inserts a symbol on line 3, the symbol is inserted 2 characters before the caret; on line 4 it is inserted 3 characters before the caret, and so on. Inserting works properly when the symbol is inserted on the first line.
Here's my inserting code:
Index = HTMLBox.SelectionStart;
HTMLBox.Text = HTMLBox.Text.Insert(Index, (sender as AppBarButton).Label);
HTMLBox.Focus(Windows.UI.Xaml.FocusState.Keyboard);
HTMLBox.Select(Index+1,0);
So how can I solve this? I guess the newline characters are causing the trouble.
For your first issue I assume you changed the TextBox.Text before accessing SelectionStart. When you set the text.Text, text.SelectionStart is reset to 0.
Regarding your second issue, which is related to new lines: you could say that what you observe is by design. SelectionStart counts one "\r\n" as one character for reasons explained here (see Remarks section). On the other hand, the string.Insert method does not care about that aspect and counts "\r\n" as two characters.
You need to change your code slightly. You cannot use the value of SelectionStart directly as the insert position. You need to calculate the insert position while accounting for this behavior of SelectionStart.
Here is a verbose code sample with a potential solution.
// Requires: using System.Linq; (for Count)
// normalizedText lets you isolate the text before the caret
// without knowing in advance how many newline sequences it contains.
string normalizedText = text.Text.Replace("\r\n", "\n");
string textBeforeCaret = normalizedText.Substring(0, text.SelectionStart);
// Now that you have the text before the caret, you can count the newlines
// that need to be accounted for.
int newLineCount = textBeforeCaret.Count(c => c == '\n');
// Knowing the new lines you can calculate the insert position.
int insertPosition = text.SelectionStart + newLineCount;
text.Text = text.Text.Insert(insertPosition, "~");
Also, you should make sure that SelectionStart does not exhibit similar behavior with combinations other than "\r\n". If it does, you will need to update the code above.
Say I have a WPF RichTextBox with the following content:
Hello Hello // <== here is a line break \r\n
Turn Your Radio On!
I then read the text from the box with the following code:
public static string GetText(this RichTextBox box)
{
var range = new TextRange(box.Document.ContentStart,
box.Document.ContentEnd);
return range.Text;
}
After that I retrieve var index = text.IndexOf("Hello\r\nTurn") and var length = "Hello\r\nTurn".Length.
Based on index and length:
How can I select that text in the RichTextBox? The index/length in the plain string does not match up with what the RichTextBox expects.
I tried the approach from the answer here, but this does not seem to work if the text contains a line wrap / paragraph.
Note: My string manipulation (finding index / length) is considerably more complex than the example, but the example given here describes my problem well
The RichTextBox has a Selection property whose Select method you can call.
It accepts 2 TextPointer objects, one for the selection start and the other for its end.
http://msdn.microsoft.com/en-us/library/system.windows.documents.textrange.select.aspx
I don't think the index and length values alone will be enough for you to select the text. You will have to get the real TextPointer objects.
Try using a method for finding the TextPointers of specific words, such as the one described here:
http://blogs.microsoft.co.il/blogs/tamir/archive/2006/12/14/RichTextBox-syntax-highlighting.aspx
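One possible way to obtain those TextPointers from a plain-text index is sketched below. It walks the insertion positions and measures each one with the same TextRange.Text conversion that produced the string, so paragraph breaks are counted the same way on both sides. This is an assumption-laden illustration, not the method from the linked post: it re-reads the range at every step, which is slow on large documents, and PointerAtTextOffset and SelectText are hypothetical names.

```csharp
using System.Windows.Controls;
using System.Windows.Documents;

static class RichTextBoxSelection
{
    // Returns the first insertion position whose text distance from `start`
    // (measured the way TextRange.Text measures it) reaches `offset`,
    // or null if the document is shorter than that.
    public static TextPointer PointerAtTextOffset(TextPointer start, int offset)
    {
        for (TextPointer p = start; p != null;
             p = p.GetNextInsertionPosition(LogicalDirection.Forward))
        {
            if (new TextRange(start, p).Text.Length >= offset)
                return p;
        }
        return null;
    }

    // Selects `length` characters starting at `index`, where both values
    // refer to the string returned by the GetText extension in the question.
    public static bool SelectText(this RichTextBox box, int index, int length)
    {
        TextPointer docStart = box.Document.ContentStart;
        TextPointer from = PointerAtTextOffset(docStart, index);
        TextPointer to = PointerAtTextOffset(docStart, index + length);
        if (from == null || to == null) return false;

        box.Selection.Select(from, to);
        return true;
    }
}
```

With the values from the question, the call would look like box.SelectText(index, length).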
I have made an HTML syntax highlighter in C# and it works great, but there's one problem. It normally runs pretty fast because it highlights line by line, but when I paste more than one line of code or open a file I have to highlight the whole file, which can take up to a minute for a file with only 150 lines of code. I tried highlighting only the visible lines in the RichTextBox, but then when I scroll I can't get it to highlight the newly visible text. Here is my code (note: I need to use regex so I can get the stuff in between the < and > characters):
Highlight Whole File:
public void AllMarkup()
{
int selectionstart = richTextBox1.SelectionStart;
Regex rex = new Regex("<html>|</html>|<head.*?>|</head>|<body.*?>|</body>|<div.*?>|</div>|<span.*?>|</span>|<title.*?>|</title>|<style.*?>|</style>|<script.*?>|</script>|<link.*?/>|<meta.*?/>|<base.*?/>|<center.*?>|</center>|<a.*?>|</a>");
foreach (Match m in rex.Matches(richTextBox1.Text))
{
richTextBox1.Select(m.Index, m.Value.Length);
richTextBox1.SelectionColor = Color.Blue;
richTextBox1.Select(selectionstart, -1);
richTextBox1.SelectionColor = Color.Black;
}
richTextBox1.SelectionStart = selectionstart;
}
private void pasteToolStripMenuItem_Click(object sender, EventArgs e)
{
try
{
LockWindowUpdate(richTextBox1.Handle); // stops the text from flashing
richTextBox1.Paste();
AllMarkup();
}finally { LockWindowUpdate(IntPtr.Zero); }
}
I want to know if there's a better way to highlight this and make it faster or if someone can help me make it highlight only the visible text.
Please help. :)
Thanks, Tanner.
I agree with RCIX - you'll have a hard time overall with combining Regex and HTML parsing :)
If you're going for a high-quality solution that always highlights syntax properly, you're going to need a full-blown parser. You can either use one that's already created, or you can create your own using a tool like ANTLR.
The creators of ANTLR have already created an HTML parser grammar. You can find it here.
If you're looking for a pre-built one, here's a few I've found:
HTML Agility Pack
Majestic 12 HTML Parser
SGML Reader
I'm sure there are others -- this is a pretty common requirement.
Long story short, if this is anything but a simple, disposable project, I'd get a full-blown parser. Otherwise, you can continue to try and hack it with Regex.
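If you do stay with Regex for now, one way to limit the work is to highlight only the characters currently on screen, for example from a scroll handler. The sketch below is only an illustration under stated assumptions: it uses a generic tag pattern instead of the tag list from the question, it misses tags that are cut off at the edge of the visible range, and VisibleRangeHighlighter, FindTags and HighlightVisible are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Drawing;
using System.Text.RegularExpressions;
using System.Windows.Forms;

static class VisibleRangeHighlighter
{
    // Generic "anything that looks like a tag" pattern (an assumption).
    static readonly Regex Tag = new Regex(@"</?[A-Za-z][^<>]*>", RegexOptions.Compiled);

    // Pure helper: finds tags inside text[first, last) and returns absolute spans.
    public static List<(int Start, int Length)> FindTags(string text, int first, int last)
    {
        first = Math.Max(0, Math.Min(first, text.Length));
        last = Math.Max(first, Math.Min(last, text.Length));

        var spans = new List<(int Start, int Length)>();
        foreach (Match m in Tag.Matches(text.Substring(first, last - first)))
            spans.Add((first + m.Index, m.Length));
        return spans;
    }

    // Colors only the tags that are currently visible in the control.
    public static void HighlightVisible(RichTextBox box)
    {
        int caret = box.SelectionStart;
        int caretLength = box.SelectionLength;

        // Character indexes at the top-left and bottom-right corners of the client area.
        int first = box.GetCharIndexFromPosition(new Point(0, 0));
        int last = box.GetCharIndexFromPosition(
            new Point(box.ClientSize.Width - 1, box.ClientSize.Height - 1)) + 1;

        foreach (var (start, length) in FindTags(box.Text, first, last))
        {
            box.Select(start, length);
            box.SelectionColor = Color.Blue;
        }

        box.Select(caret, caretLength);   // restore the caret once, after the loop
    }
}
```

Restoring the selection once after the loop, rather than inside it as AllMarkup does, also halves the number of selection changes per match. Restoring the caret may scroll the view if the caret is off screen, so a scroll handler might need to save and restore the scroll position as well.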