Example using DiffMatchPatch - c#

I hope I am not breaking any rules here. I have a question about another post, but I am not a frequent Stack Overflow user, so my reputation is too low to comment on questions or answers other than my own.
On this question: How to compare two rich text box contents and highlight the characters that are changed?
TaW provided some sample C# code, and we have used it in a Visual Studio project. However, we discovered a problem and don't know how to fix it.
If RTB1 contains the text "My name is David" and RTB2 contains the text "My name is", then after the comparison runs there are two diffs in the diffs collection. Somehow, when the rich text boxes are rewritten to show the differences, RTB1 ends up as an exact match of RTB2 and nothing is highlighted. Maybe this is the expected behavior and we just don't realize it, but we were hoping that the text " David" in RTB1 would be highlighted.
If the text in RTB2 is "My name is " (two added spaces at the end of the line), then we get the expected behavior.
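For reference, here is one way to derive the ranges to highlight from the diff list alone, without comparing the controls. It assumes that for this pair diff_main returns an EQUAL diff for "My name is" followed by a DELETE diff for " David" (presumably the two diffs mentioned above). Op and DiffPart are hypothetical stand-ins for the library's Diff and Operation types. This is a sketch, not TaW's code.

```csharp
using System.Collections.Generic;

// Hypothetical stand-ins that mirror the shape of DiffMatchPatch's diff output
// (an operation plus the text it applies to); not the library's actual types.
public enum Op { Equal, Delete, Insert }

public record DiffPart(Op Operation, string Text);

public static class DiffHighlighter
{
    // Walks the diff list once and returns (start, length) ranges to highlight:
    // deletions are positioned in text1, insertions in text2.
    public static (List<(int Start, int Length)> InText1, List<(int Start, int Length)> InText2)
        Ranges(IEnumerable<DiffPart> diffs)
    {
        var inText1 = new List<(int Start, int Length)>();
        var inText2 = new List<(int Start, int Length)>();
        int pos1 = 0, pos2 = 0; // running offsets into text1 and text2

        foreach (var d in diffs)
        {
            switch (d.Operation)
            {
                case Op.Equal:   // present in both texts: advance both offsets
                    pos1 += d.Text.Length;
                    pos2 += d.Text.Length;
                    break;
                case Op.Delete:  // only in text1, so highlight it there
                    inText1.Add((pos1, d.Text.Length));
                    pos1 += d.Text.Length;
                    break;
                case Op.Insert:  // only in text2, so highlight it there
                    inText2.Add((pos2, d.Text.Length));
                    pos2 += d.Text.Length;
                    break;
            }
        }
        return (inText1, inText2);
    }
}
```

With the assumed diffs, the result is a single range (10, 6) in text1, which covers " David". Applying each range with RichTextBox.Select(start, length) and setting SelectionBackColor would highlight it even though the deletion is the last diff in the list.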
I should have mentioned that we wrote a VB.NET equivalent of TaW's C# code and just noticed a difference. I have noted the difference in the comments.
If I had 50 reputation, I would also have said in my comment that we are very thankful to TaW for sharing his example, and to the creator of DiffMatchPatch.

I think we figured out the problem. In our project we are using VB.NET, and we are fairly certain we translated correctly from C# to VB. However, in the collectChunks function in C#, you are comparing RTB and RTB2 as objects, not the Text property of those objects. For a class such as RichTextBox that does not overload ==, the operator compares references, i.e. whether both variables point to the same control. So even when the text in the two text boxes being compared is equal, your code is not comparing that text. Therefore, the first == is always false.
In VB, we are not allowed to compare objects with =, i.e. we cannot write RTB = RTB2; we must use RTB.Text = RTB2.Text in the If statement. (VB can compare the RTB objects themselves with the Is operator, but I am guessing that what really needs to be compared is the Text property of RTB and RTB2.) If this is the case, is it possible that the results you got were based on the assumption that the text in the text boxes was being compared? And perhaps that assumption shaped how you decided to stay in or jump out of the for loop?
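A minimal illustration of the difference between comparing objects and comparing their text. TextHolder is a hypothetical stand-in for RichTextBox, or for any class that does not overload ==:

```csharp
// Hypothetical stand-in for a RichTextBox: a reference type with a Text property
// and no overloaded == operator.
public class TextHolder
{
    public string Text { get; set; } = "";
}

public static class EqualityDemo
{
    // == on a class without an operator overload is reference equality:
    // true only if both variables point at the same object.
    public static bool OperatorEquals(TextHolder a, TextHolder b) => a == b;

    // Compares the contents, which is what RTB.Text = RTB2.Text does in VB.
    public static bool SameText(TextHolder a, TextHolder b) => a.Text == b.Text;
}
```

Two separate holders with the same text are not ==, but SameText returns true for them. In VB the two checks are written RTB Is RTB2 and RTB.Text = RTB2.Text respectively.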

Related

Character sequence messes up the order of characters?

I had to handle an exception by catching it and checking whether its message contains a certain error code; if it does, the code does something (not relevant here).
The exception's message is this (in English, but the code and the Arabic text after it are the same in any language; الرسالة is Arabic for "message"):
$-5002 - $make sure that the consumed quantity of the component item would not cause the item's stock to fall below zero [ige1.whscode][line: 1] , 'production order no: 20580033 line: 1' [الرسالة 3559-7]
I had to work with the code 7-3559 (as displayed). In my code I simply did e.Message.Contains("7-3559"), and it failed to catch the exception. Wondering what went wrong, I copy-pasted the error message into regex101.com, and after a bit of trial and error I realized that "3559-7" is the real code: e.Message.Contains("3559-7") works. I just don't know why. What causes the string to be displayed so that "7-" is actually "-7" and also comes after 3559?
I should also mention that I am working with Visual Studio 2019 and C#.
Check out the regex here.
[Screenshot from the HxD hex editor, not preserved]
This is a common issue with bidirectional text, that is, text that mixes both directionalities: Right-to-Left (RTL), such as Arabic, and Left-to-Right (LTR).
Here the Arabic text is mixed with English text, so the rules of the Unicode Bidirectional Algorithm are applied to determine how each run is displayed. You may find details about this here.
In short, the text you see in the debugger is how the text will appear when you print it but not how it is represented in memory.
Here I use LINQPad to paste the text, and the editor immediately shows it in its in-memory order. Once printed, the text is shown with a different directionality.
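To see what is actually stored, as opposed to how a bidi-aware editor renders it, you can dump the UTF-16 code units in memory order. In this sketch the message tail is rebuilt with \u escapes so the source order is guaranteed to be the logical order. It assumes "3559-7" is the stored order, which is what the working e.Message.Contains("3559-7") implies. The class and member names are hypothetical.

```csharp
using System.Linq;

public static class BidiDemo
{
    // '[' + Arabic "الرسالة" ("message") + ' ' + "3559-7" + ']', written with escapes
    // so that no editor can reorder it visually.
    public static readonly string MessageTail =
        "[\u0627\u0644\u0631\u0633\u0627\u0644\u0629 3559-7]";

    // Lists every UTF-16 code unit in memory order, e.g. "U+0033 U+002D U+0037".
    public static string DumpCodeUnits(string s) =>
        string.Join(" ", s.Select(c => $"U+{(int)c:X4}"));
}
```

string.Contains(string) performs an ordinal search over this in-memory order, which is why "3559-7" matches and the displayed "7-3559" does not.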

simple spell checking tool in C#

What I'm trying to achieve is an input field where you type how you think a word is spelled; the program then searches my text file named words.txt, finds words with similar spelling, and puts the results into a new window.
Thanks in advance.
This is the one I have used, and it sounds like exactly what you want:
Make similar suggestions for input text by remembering old inputs
You can see it in action in the screen capture video here
PS: In one instance I pre-populated a dictionary.dic file to suit, and in the example above I added some other rules around Log Parser's SQL-like syntax to provide IntelliSense. HTH
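For the question as asked, one self-contained alternative (not the approach in the linked answer) is to rank dictionary words by Levenshtein edit distance from the input. This is a sketch that assumes the word list is already loaded, for example with File.ReadLines("words.txt"). The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SpellSuggest
{
    // Classic Levenshtein distance: insertions, deletions and substitutions each cost 1.
    // Uses two rolling rows of the dynamic-programming table instead of a full matrix.
    public static int Levenshtein(string a, string b)
    {
        var prev = new int[b.Length + 1];
        var curr = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) prev[j] = j; // distance from "" to b[..j]

        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i; // distance from a[..i] to ""
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                curr[j] = Math.Min(Math.Min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            (prev, curr) = (curr, prev); // the finished row becomes "prev"
        }
        return prev[b.Length];
    }

    // Returns words within maxDistance edits of the input (case-insensitive),
    // closest first, ties broken alphabetically.
    public static List<string> Suggest(string input, IEnumerable<string> words, int maxDistance = 2) =>
        words.Select(w => (Word: w, Distance: Levenshtein(input.ToLowerInvariant(), w.ToLowerInvariant())))
             .Where(x => x.Distance <= maxDistance)
             .OrderBy(x => x.Distance)
             .ThenBy(x => x.Word, StringComparer.Ordinal)
             .Select(x => x.Word)
             .ToList();
}
```

For example, Suggest("helo", new[] { "hello", "help", "world" }, 1) returns "hello" and "help". Each lookup computes the distance to every word in the list, which is simple but may get slow for very large dictionaries. Showing the results in a new window is left out.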

C# Unknown Text Found

I'm creating a program to transfer text from a Word document to a database. During testing I came across some unexpected text inside a text box after setting its text to a table cell range as follows:
textBox1.Text = oDoc.Tables[1].Cell(1, 3).Range.Text;
What appeared in the form was the cell text with an extra dot at the end (the screenshot is not preserved).
I have no idea what the dot is supposed to represent. It can be highlighted, but if you try to copy and paste it, nothing appears. You can delete it manually. Can anyone help me identify what this is?
The identification bit shouldn't be too hard:
string text = oDoc.Tables[1].Cell(1, 3).Range.Text;
textBox1.Text = ((int) text[4]).ToString("x4");
That will give you the UTF-16 code unit for that character as a hex number. You can then find out what it is on the Unicode website. (I usually look at the Charts page or the directory of PDFs and guess which chart it will be in based on the numbering. It's not ideal, and there are probably better ways, but it has always worked well enough for me.)
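Applying the same ToString("x4") idea to the whole string lets you spot every invisible character at once. For what it's worth, the text of a Word table cell range commonly ends with Word's end-of-cell marker, "\r\a" (characters 13 and 7), so that is one likely candidate. This is a sketch with hypothetical helper names. The trimming helper assumes you want to drop all trailing control characters.

```csharp
using System.Linq;

public static class InvisibleChars
{
    // Hex code of every UTF-16 code unit, in the same "x4" format as above.
    public static string Describe(string s) =>
        string.Join(" ", s.Select(c => ((int)c).ToString("x4")));

    // Removes trailing control characters (such as '\r' and '\a') before the text
    // is shown in a text box or written to the database.
    public static string TrimTrailingControls(string s)
    {
        int end = s.Length;
        while (end > 0 && char.IsControl(s[end - 1])) end--;
        return s.Substring(0, end);
    }
}
```

For example, Describe("HOLD\r\a") gives "0048 004f 004c 0044 000d 0007".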
Of course when you've identified it you'll still need to work out what the heck it's doing there... does the original Word document just have "HOLD"?

index in a string versus in a richtextbox

Is there any way to reconcile the two? I.e., when I set the text of a RichTextBox from a string, a given character's index in the string does not match its position in the text box.
Make sure the WordWrap property is False.
On extremely long lines you're going to run into RightMargin. It is not infinite; the maximum right margin depends on the font size.
It seems fine with this sample text of mine:
"Provide details and share your research. Avoid statements based solely on opinion; only make statements you can back up with an appropriate reference, or personal experiences"
Using the code:
richTextBox1.Text.IndexOf("back up");
textBox1.Text.IndexOf("back up");
Both return 112.
Perhaps you are using the Rtf property of the RichTextBox, which contains extra tags for its formatting?
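One other common cause of mismatched indices, not mentioned above, is that RichTextBox stores each "\r\n" line break as a single "\n". As a result, every line break before a character shifts its index by one. The sketch below maps a string index to the control's index and assumes CRLF line breaks are the only difference. The helper name is hypothetical.

```csharp
public static class RtbIndex
{
    // Maps an index in the original string to the index RichTextBox.Text would report,
    // assuming the control's only change is collapsing each "\r\n" into "\n".
    public static int ToRichTextBoxIndex(string original, int index)
    {
        int removed = 0;
        for (int p = 0; p < index && p + 1 < original.Length; p++)
        {
            // Each '\r' that starts a "\r\n" pair before the index disappears in the control.
            if (original[p] == '\r' && original[p + 1] == '\n') removed++;
        }
        return index - removed;
    }
}
```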

Capturing Keyboard strokes in C#

Hi,
I have the following problem: the following text is in a rich text box.
The world is [[wonderful]] today .
If the user types two brackets before and after a word, as with "wonderful" above, the word in brackets should change to a link (shown in green).
I am having problems getting the sequence of keystrokes, i.e. how do I know that the user has entered [[, so that I can start parsing the text that follows it?
I can get it by handling the KeyDown event and keeping a list, but that does not look elegant at all.
Please let me know what the proper way would be.
Thanks,
Sujay
You have two approaches that I can think of off-hand.
One is, as you suggest, maintain the current state with a list—was this key a bracket? was the last key a bracket?—and update on the fly.
The other approach would be to simply handle the TextChanged event and re-scan the text for the [[text-here]] pattern and update as appropriate.
The first requires more bookkeeping but will be much faster for longer text. The second approach is easier and can probably be done with a decent regex, but it will get slower as your text gets longer. If you know you have some upper limit, like 256 characters, then you're probably fine. But if you're expecting novels, probably not such a great idea.
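Here is a sketch of the second approach (rescanning on TextChanged). It assumes a [[word]] is two opening brackets, one or more non-bracket characters, and two closing brackets. The WinForms part is omitted so the scan can run on its own, and the names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LinkScanner
{
    // Two opening brackets, one or more non-bracket characters (captured), two closing brackets.
    private static readonly Regex LinkPattern = new Regex(@"\[\[([^\[\]]+)\]\]");

    // Rescans the whole text; returns the start and length of each [[...]] block and its word.
    public static List<(int Start, int Length, string Word)> FindLinks(string text)
    {
        var result = new List<(int Start, int Length, string Word)>();
        foreach (Match m in LinkPattern.Matches(text))
            result.Add((m.Index, m.Length, m.Groups[1].Value));
        return result;
    }
}
```

In a TextChanged handler you would call FindLinks(richTextBox1.Text). For each result, call richTextBox1.Select(start, length) and set SelectionColor, saving and restoring SelectionStart so the caret does not jump.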
I would recommend googling "richtextbox syntax highlighter". Many people have done this, and there is a lot going on behind the scenes to make it work.
I dare say that EVERY SINGLE simple solution has major drawbacks. The proper way would be to use a control that already does this "syntax highlighting" and extend it to your syntax. It is also most likely the easiest way.
You can search for free .NET controls on CodePlex.
I would try handling KeyDown and checking for the closing bracket "]" instead. Once you receive one, you could check whether the last character in your text box is a second ], and if it is, just replace the last few characters.
This eliminates the need for maintaining state (ie: the list). As soon as the second ] was typed, the block would change to a link instantly.
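Here is a sketch of the closing-bracket check described above. It is written as a pure function of the text and the caret position (SelectionStart in a RichTextBox), so it runs without a form. KeyDown fires before the character is inserted, so this version expects to be called once the "]" is already in the text, for example from KeyUp or TextChanged. The names are hypothetical, and a word that itself contains brackets is rejected.

```csharp
using System;

public static class ClosingBracketCheck
{
    // If the text just before the caret ends in "]]" and an earlier "[[" opens the block,
    // returns true with the index of "[[" and the word between the brackets.
    public static bool TryGetLinkEndingAt(string text, int caret, out int start, out string word)
    {
        start = -1;
        word = "";
        if (caret < 4 || caret > text.Length || text[caret - 1] != ']' || text[caret - 2] != ']')
            return false;

        // Nearest "[[" that appears before the closing brackets.
        int open = text.Substring(0, caret - 2).LastIndexOf("[[", StringComparison.Ordinal);
        if (open < 0) return false;

        string inner = text.Substring(open + 2, caret - 2 - (open + 2));
        // Reject empty words and words that contain brackets themselves.
        if (inner.Length == 0 || inner.IndexOfAny(new[] { '[', ']' }) >= 0) return false;

        start = open;
        word = inner;
        return true;
    }
}
```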
Keeping a list will be rather complex, I think. What if the user types a '[' character, clicks somewhere else in the text, and then types a '[' character again? The user has then typed two consecutive '[' characters, but in completely different parts of the text. You may also want to handle text pasted from the clipboard.
I think the safest way is to analyze the full text and do what should be done from that context, using RegEx or some other technique.
(Sorry, I don't have enough reputation to add comments yet, so I have to add a new answer.) As suggested by jeffamaphone, I'd handle the TextChanged event and rescan the text each time, but to keep the cost constant, scan only a few characters ahead of the current cursor position instead of reading the entire text.
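Here is a sketch of the windowed rescan. It scans a fixed number of characters on both sides of the caret. The answer says ahead of the cursor, but also looking behind catches a link whose "]]" was just typed. The radius and the names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NearbyLinkScanner
{
    private static readonly Regex LinkPattern = new Regex(@"\[\[([^\[\]]+)\]\]");

    // Scans only `radius` characters on each side of the caret, so the work per
    // TextChanged stays roughly constant however long the text gets.
    // Returned start indices are relative to the whole text.
    public static List<(int Start, string Word)> FindLinksNear(string text, int caret, int radius)
    {
        caret = Math.Max(0, Math.Min(caret, text.Length));
        int from = Math.Max(0, caret - radius);
        int to = Math.Min(text.Length, caret + radius);

        var result = new List<(int Start, string Word)>();
        foreach (Match m in LinkPattern.Matches(text.Substring(from, to - from)))
            result.Add((from + m.Index, m.Groups[1].Value));
        return result;
    }
}
```

The trade-off is that a [[...]] block partly outside the window is missed, so the radius should exceed the longest link you expect.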
Trying to intercept the keystrokes and maintain internal state is a bad approach: it is very easy for your idea of what has happened to get out of sync with the control you are monitoring and cause weird problems. (And how do you handle clicks? Alt-Tab? Pastes? Arrow keys? Other applications grabbing focus? Too many special cases to worry about...)
