How to verify a big text in RichTextBox without freezing the screen?

I'm currently implementing a custom spell check in WPF using NHunspell, because the native solution in the .NET Framework doesn't fit my needs. But I'm having trouble checking the words in a big text, such as a Lorem Ipsum with 10 paragraphs, because I need to check each word, see whether it is in the dictionary that Hunspell uses, and if not, underline that word.
I currently have this method, which checks the whole text every time the KeyUp key is Backspace or Space.
var textRange = new TextRange(SpellCheckRichTextBox.Document.ContentStart,
SpellCheckRichTextBox.Document.ContentEnd);
textRange.ApplyPropertyValue(Inline.TextDecorationsProperty, null);
_viewModel.Text = textRange.Text;
var zzz = _viewModel.Text.Split(' ');
var kfeofe = zzz.Where(x => _viewModel.MisspelledWords.Contains(x));
foreach (var item in kfeofe)
{
TextPointer current = textRange.Start.GetInsertionPosition(LogicalDirection.Forward);
while (current != null)
{
string textInRun = current.GetTextInRun(LogicalDirection.Forward);
if (!string.IsNullOrWhiteSpace(textInRun))
{
int index = textInRun.IndexOf(item.ToString());
if (index != -1)
{
TextPointer selectionStart = current.GetPositionAtOffset(index, LogicalDirection.Forward);
TextPointer selectionEnd = selectionStart.GetPositionAtOffset(item.ToString().Length, LogicalDirection.Forward);
TextRange selection = new TextRange(selectionStart, selectionEnd);
selection.ApplyPropertyValue(Inline.TextDecorationsProperty, TextDecorations.Underline);
}
}
current = current.GetNextContextPosition(LogicalDirection.Forward);
}
}
But I think I need an async solution, so that it doesn't block the main thread and the user's typing.
In theory, I was thinking about running a background thread once the user has gone more than 2 seconds without typing, and then returning the checked TextRange to my RichTextBox (SpellCheckRichTextBox).
Can somebody suggest a solution that makes the verification faster when working with big texts? I'm really stuck on this; any help would be appreciated.
Thanks in advance!

The first improvement would be
zzz.AsParallel().Where(x => _viewModel.MisspelledWords.Contains(x)).ToList();
That obviously assumes that your .MisspelledWords.Contains(x) can safely be called in parallel. It might be a ConcurrentDictionary already.
The fact that you have a collection of misspelled words makes me believe you have already parsed the text once. So why parse it twice? Why can't you combine those two passes? That would be another possible optimization.
And yes, doing all of this in another thread when the user stops typing would be preferable.
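
To illustrate combining the two passes, here is a minimal sketch that walks the text once and collects the offset and length of every rejected word, and that can run off the UI thread. The names SpellScan, FindMisspelled and the isCorrect callback are made up for this example (the callback stands in for whatever NHunspell check you use). It treats any run of letters as a word, and the offsets are into the plain string, so mapping them to TextPointer positions and applying the decorations would still have to happen on the UI thread.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class SpellScan
{
    // Walks the text once and returns (offset, length) for every word
    // that the isCorrect callback rejects. A "word" here is simply a
    // run of letters, which is a simplification (e.g. "don't" is split).
    public static List<(int Offset, int Length)> FindMisspelled(
        string text, Func<string, bool> isCorrect)
    {
        var ranges = new List<(int Offset, int Length)>();
        int i = 0;
        while (i < text.Length)
        {
            // Skip everything that is not part of a word.
            while (i < text.Length && !char.IsLetter(text[i])) i++;
            int start = i;
            // Consume the word itself.
            while (i < text.Length && char.IsLetter(text[i])) i++;
            if (i > start && !isCorrect(text.Substring(start, i - start)))
                ranges.Add((start, i - start));
        }
        return ranges;
    }

    // Runs the scan on a thread-pool thread so the caller can await it
    // from the UI thread without blocking typing.
    public static Task<List<(int Offset, int Length)>> FindMisspelledAsync(
        string text, Func<string, bool> isCorrect)
    {
        return Task.Run(() => FindMisspelled(text, isCorrect));
    }
}
```

From a KeyUp handler you would take a snapshot of the text, await FindMisspelledAsync, and only then touch the RichTextBox.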

Related

Coded UI & Selenium - different execution speed depending on where mouse pointer is

So, working in C#. I have a CSS-based drop-down menu. I open the menu, then grab a reference to it using FindElement by CSSSelector. I then grab the contents of the list using FindElements, again by CSSSelector.
Now here's where it gets interesting. I iterate the list, based on a file I have open in a StreamReader.
Looks something like:
list = driver.FindElement(By.CssSelector("dropdown-menu"));
list_items = list.FindElements(By.CssSelector("LI > A"));
int row = 0;
// one list entry is checked per line of the data file
while (data_file.ReadLine() != null)
{
    IWebElement item = list_items[row];
    string label = item.Text;
    string url = item.GetAttribute("href");
    Assert.AreEqual("something", label);
    Assert.AreEqual("something else", url);
    row++;
}
Now here's the thing: if the mouse pointer is placed over the drop-down while this is executing, item.Text returns a value and the test succeeds. If the pointer is anywhere else, item.Text is blank and the test fails. Trying to understand what's going on, and taking a clue from the fact that the test fails when running normally but succeeds when stepping through it, I modified the code with a loop:
while (data_file.ReadLine() != null)
{
    IWebElement item = list_items[row];
    string label = item.Text;
    // keep polling until the element reports some text
    while (label == "")
    {
        label = item.Text;
    }
    string url = item.GetAttribute("href");
    Assert.AreEqual("something", label);
    Assert.AreEqual("something else", url);
    row++;
}
Now the test will always succeed, but if the pointer is not on the control it is SIGNIFICANTLY slower (we're talking a factor of 4 or 5) than when the pointer IS on the control. By wrapping a timer around this, I find that it typically takes between 2 and 4 seconds before .Text returns anything but an empty string, sometimes longer.
Again, this delay only seems to apply when the mouse pointer is not over the drop-down. Otherwise, the value appears to be there instantaneously.
Can anyone suggest a possible explanation for why it's behaving this way, and a possible approach to solving it?
BTW, I'm not finding any difference between:
item = list_items[row];
label = item.Text;
and
label = list_items[row].Text;
Nor does .GetAttribute("value") produce any faster results than .Text.

As for why the menus are acting this way, it's hard to tell with the code you've provided. If you could provide the code that displays your menu, that might help. As for solutions, there are a couple.
The label could be taking longer to display because of the while loop itself - it's just going crazy grabbing the text over and over very quickly. A better solution would be to wait for the element to be present. This may make your code run faster. See the Selenium Website for information on using WebDriverWait.
Alternatively, there is an "ugly" solution. You can just move the mouse to the menu with Selenium, to make sure the menu is always displayed when you need it. I've adapted some code from here as an example:
// driver is the IWebDriver instance from the question
var builder = new OpenQA.Selenium.Interactions.Actions(driver);
builder.MoveToElement(list).Build().Perform();
Hope this helps!
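
To show the idea behind waiting instead of spinning, here is a rough, library-free sketch of a polling wait with a timeout. Poll and UntilNonEmpty are names invented for this example; this is not Selenium's WebDriverWait, only the same pattern reduced to plain C# so it can be read in isolation.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class Poll
{
    // Calls probe() every `interval` until it returns a non-empty string
    // or `timeout` has elapsed. Returns null when the timeout is hit.
    public static string UntilNonEmpty(
        Func<string> probe, TimeSpan timeout, TimeSpan interval)
    {
        var clock = Stopwatch.StartNew();
        while (true)
        {
            string value = probe();
            if (!string.IsNullOrEmpty(value))
                return value;
            if (clock.Elapsed >= timeout)
                return null;
            // Sleep between attempts instead of hammering the browser.
            Thread.Sleep(interval);
        }
    }
}
```

In the test loop this would be used as Poll.UntilNonEmpty(() => item.Text, TimeSpan.FromSeconds(5), TimeSpan.FromMilliseconds(100)), followed by an assertion that the result is not null.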

C# Windows Forms - Using StringBuilder to generate a large string then using SendKeys to type it out. Often happening twice in one call?

I'm almost finished writing a small add-on for League of Legends that allows players to click a checkbox in my form, and the checkbox's function clicks in the search bar (which automatically clears it. Already a feature in the LoL Client) and types out the names of all the champions the player has assigned to that group via a 2nd form. However, if the text is (example) 12345, I've been getting a lot of results like this: 1231234545.
What I've tried:
I've tried adding in ctrl-a + backspace with sendkeys because maybe when it clicks in the search bar it isn't always deleting the stuff there! This however did not change anything
I then commented out the code that types out the actual string (12345) and made the checkbox ONLY type ctrl-a. I then wrote out a string longer than the search bar that I'm typing into and pasted it, clicked a checkbox, and tried to see if it would highlight the text (meaning the text was not always being cleared when clicked). I did this about 50 times and every single time the text was cleared
This leads me to believe that the problem is in the code itself, not the actual integration with the LoL Client. So maybe it's a problem with StringBuilder? I haven't used it very often so I'm not too familiar with it. Commented out all the stringbuilder stuff and changed to String concatenation. Same problem
I tried adding a boolean runningCheckboxFunction to make sure the function would only run once but still have the issue as well.
So this is where I'm at right now. I'm pretty sure it's a problem with my code looping for some reason or the checkbox function as a whole being called twice instead of once when I click the checkbox.
My code:
private void checkboxFunction()
{
if (runningCheckboxFunction == true) return;
runningCheckboxFunction = true;
//CLICK IN THE SEARCH BAR (Automatically clears the text that WAS there)
Process[] LolClient = Process.GetProcessesByName("LolClient"); //should only return one client (i.e. get [0])
if (LolClient.Length < 1) return;
ClickOnPoint(LolClient[0].MainWindowHandle, searchBarCheckPoint);
//MAKE THE STRING TO TYPE IN THE SEARCH BAR
StringBuilder whatToTypeInSearchBar = new StringBuilder();
foreach (CheckBox cb in checkboxes)
{
if (cb.Visible == true && cb.Checked == true)
{
List<String> thisCheckboxsChamps = settingsForm.getChampionListForGroup(checkboxes.IndexOf(cb));
if (thisCheckboxsChamps == null) continue;
foreach (String champ in thisCheckboxsChamps)
{
whatToTypeInSearchBar.Append("|");
whatToTypeInSearchBar.Append(champ);
whatToTypeInSearchBar.Append("$");
}
}
}
//had an issue that sometimes clicking in search bar did not automatically clear it, so select all text and backspace just in case
//^ = ctrl -- ^A = ctrl-A (select all)
SendKeys.Send("^A");
//backspace
SendKeys.Send("{BACKSPACE}");
//TYPE IN THE SEARCH BAR
if (whatToTypeInSearchBar.Length > 0)
{
//remove the first "|"
whatToTypeInSearchBar.Remove(0, 1);
SendKeys.Send(whatToTypeInSearchBar.ToString());
}
runningCheckboxFunction = false;
}

Which event are you using for the check box control (CheckedChanged or CheckStateChanged)? Also, what is the value for CheckBox.ThreeState property? It is possible that the event you use fires more than once. If that is the case, you could check for a specific state before calling the function you have listed above.
Also, when you use SendKeys, try sending the keys to your own application and monitor the ActiveControl.KeyUp/ActiveControl.KeyDown/ActiveControl.KeyPress events to ensure that the string you generate is the right sequence.

Fastest way to search in a string collection

Problem:
I have a text file of around 120,000 users (strings) which I would like to store in a collection and later to perform a search on that collection.
The search will run every time the user changes the text of a TextBox, and the result should be the strings that contain the text in the TextBox.
I don't have to change the list, just pull the results and put them in a ListBox.
What I've tried so far:
I tried with two different collections/containers, which I'm dumping the string entries from an external text file (once, of course):
List<string> allUsers;
HashSet<string> allUsers;
With the following LINQ query:
allUsers.Where(item => item.Contains(textBox_search.Text)).ToList();
My search event (fires when the user changes the search text):
private void textBox_search_TextChanged(object sender, EventArgs e)
{
if (textBox_search.Text.Length > 2)
{
listBox_choices.DataSource = allUsers.Where(item => item.Contains(textBox_search.Text)).ToList();
}
else
{
listBox_choices.DataSource = null;
}
}
Results:
Both gave me a poor response time (around 1-3 seconds between each key press).
Question:
Where do you think my bottleneck is? The collection I've used? The search method? Both?
How can I get better performance and more fluent functionality?

You could consider doing the filtering task on a background thread which would invoke a callback method when it's done, or simply restart filtering if input is changed.
The general idea is to be able to use it like this:
public partial class YourForm : Form
{
private readonly BackgroundWordFilter _filter;
public YourForm()
{
InitializeComponent();
// setup the background worker to return no more than 10 items,
// and to set ListBox.DataSource when results are ready
_filter = new BackgroundWordFilter
(
items: GetDictionaryItems(),
maxItemsToMatch: 10,
callback: results =>
this.Invoke(new Action(() => listBox_choices.DataSource = results))
);
}
private void textBox_search_TextChanged(object sender, EventArgs e)
{
// this will update the background worker's "current entry"
_filter.SetCurrentEntry(textBox_search.Text);
}
}
A rough sketch would be something like:
public class BackgroundWordFilter : IDisposable
{
private readonly List<string> _items;
private readonly AutoResetEvent _signal = new AutoResetEvent(false);
private readonly Thread _workerThread;
private readonly int _maxItemsToMatch;
private readonly Action<List<string>> _callback;
private volatile bool _shouldRun = true;
private volatile string _currentEntry = null;
public BackgroundWordFilter(
List<string> items,
int maxItemsToMatch,
Action<List<string>> callback)
{
_items = items;
_callback = callback;
_maxItemsToMatch = maxItemsToMatch;
// start the long-lived background thread
_workerThread = new Thread(WorkerLoop)
{
IsBackground = true,
Priority = ThreadPriority.BelowNormal
};
_workerThread.Start();
}
public void SetCurrentEntry(string currentEntry)
{
// set the current entry and signal the worker thread
_currentEntry = currentEntry;
_signal.Set();
}
void WorkerLoop()
{
while (_shouldRun)
{
// wait here until there is a new entry
_signal.WaitOne();
if (!_shouldRun)
return;
var entry = _currentEntry;
var results = new List<string>();
// if there is nothing to process,
// return an empty list
if (string.IsNullOrEmpty(entry))
{
_callback(results);
continue;
}
// do the search in a for-loop to
// allow early termination when current entry
// is changed on a different thread
foreach (var i in _items)
{
// if matched, add to the list of results
if (i.Contains(entry))
results.Add(i);
// check if the current entry was updated in the meantime,
// or we found enough items
if (entry != _currentEntry || results.Count >= _maxItemsToMatch)
break;
}
if (entry == _currentEntry)
_callback(results);
}
}
public void Dispose()
{
// we are using AutoResetEvent and a background thread
// and therefore must dispose it explicitly
Dispose(true);
}
private void Dispose(bool disposing)
{
if (!disposing)
return;
// shutdown the thread
if (_workerThread.IsAlive)
{
_shouldRun = false;
_currentEntry = null;
_signal.Set();
_workerThread.Join();
}
// if targeting .NET 3.5 or older, we have to
// use the explicit IDisposable implementation
(_signal as IDisposable).Dispose();
}
}
Also, you should actually dispose the _filter instance when the parent Form is disposed. This means you should open and edit your Form's Dispose method (inside the YourForm.Designer.cs file) to look something like:
// inside "xxxxxx.Designer.cs"
protected override void Dispose(bool disposing)
{
if (disposing)
{
if (_filter != null)
_filter.Dispose();
// this part is added by Visual Studio designer
if (components != null)
components.Dispose();
}
base.Dispose(disposing);
}
On my machine, it works pretty quickly, so you should test and profile this before going for a more complex solution.
That being said, a "more complex solution" would possibly be to store the last couple of results in a dictionary, and then only filter them if it turns out that the new entry differs by only the first or last character.

I've done some testing, and searching a list of 120,000 items and populating a new list with the entries takes a negligible amount of time (about a 1/50th of a second even if all strings are matched).
The problem you're seeing must therefore be coming from the populating of the data source, here:
listBox_choices.DataSource = ...
I suspect you are simply putting too many items into the listbox.
Perhaps you should try limiting it to the first 20 entries, like so:
listBox_choices.DataSource = allUsers.Where(item => item.Contains(textBox_search.Text))
.Take(20).ToList();
Also note (as others have pointed out) that you are accessing the TextBox.Text property for each item in allUsers. This can easily be fixed as follows:
string target = textBox_search.Text;
listBox_choices.DataSource = allUsers.Where(item => item.Contains(target))
.Take(20).ToList();
However, I timed how long it takes to access TextBox.Text 500,000 times and it only took 0.7 seconds, far less than the 1 - 3 seconds mentioned in the OP. Still, this is a worthwhile optimisation.

Use Suffix tree as index. Or rather just build a sorted dictionary that associates every suffix of every name with the list of corresponding names.
For input:
Abraham
Barbara
Abram
The structure would look like:
a -> Barbara
abraham -> Abraham
abram -> Abram
aham -> Abraham
am -> Abraham, Abram
ara -> Barbara
arbara -> Barbara
bara -> Barbara
barbara -> Barbara
braham -> Abraham
bram -> Abram
ham -> Abraham
m -> Abraham, Abram
ra -> Barbara
raham -> Abraham
ram -> Abram
rbara -> Barbara
Search algorithm
Assume the user input is "bra".
Bisect the dictionary on the user input to find either the input itself or the position where it would go. This way we find "barbara", the last key lower than "bra". It is called the lower bound for "bra". The search takes logarithmic time.
Iterate from the found key onwards until the key no longer starts with the user input. This gives "braham" -> Abraham and "bram" -> Abram.
Concatenate the iteration results (Abraham, Abram) and output them.
Such trees are designed for quick substring search. Their performance is close to O(log n). I believe this approach will work fast enough to be used directly on the GUI thread. Moreover, it will work faster than the threaded solution due to the absence of synchronization overhead.
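
The sorted-suffix variant described above could be sketched roughly like this. SuffixIndex and Search are names made up for the example, the comparison is ordinal on lower-cased text, and the index stores every suffix of every name, so it trades memory for lookup speed.

```csharp
using System;
using System.Collections.Generic;

public sealed class SuffixIndex
{
    // Sorted (suffix, original name) pairs; suffixes are lower-cased.
    private readonly List<KeyValuePair<string, string>> _entries =
        new List<KeyValuePair<string, string>>();

    public SuffixIndex(IEnumerable<string> names)
    {
        foreach (string name in names)
        {
            string lower = name.ToLowerInvariant();
            for (int i = 0; i < lower.Length; i++)
                _entries.Add(new KeyValuePair<string, string>(lower.Substring(i), name));
        }
        _entries.Sort((a, b) => string.CompareOrdinal(a.Key, b.Key));
    }

    // Returns every name that contains `query` as a substring.
    public List<string> Search(string query)
    {
        string q = query.ToLowerInvariant();

        // Bisect for the first suffix that is >= the query.
        int lo = 0, hi = _entries.Count;
        while (lo < hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (string.CompareOrdinal(_entries[mid].Key, q) < 0) lo = mid + 1;
            else hi = mid;
        }

        // Walk forward while the suffix still starts with the query,
        // skipping names that were already reported.
        var seen = new HashSet<string>();
        var result = new List<string>();
        for (int i = lo;
             i < _entries.Count && _entries[i].Key.StartsWith(q, StringComparison.Ordinal);
             i++)
        {
            if (seen.Add(_entries[i].Value))
                result.Add(_entries[i].Value);
        }
        return result;
    }
}
```

With the three names from the example, searching for "bra" walks the keys "braham" and "bram" and yields Abraham and Abram.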

You need either a text search engine (like Lucene.Net) or a database (you may consider an embedded one like SQL CE, SQLite, etc.). In other words, you need an indexed search. Hash-based search isn't applicable here, because you are searching for a substring, while hash-based search works well only for exact values.
Otherwise it will be an iterative search with looping through the collection.

It might also be useful to have a "debounce" type of event. This differs from throttling in that it waits a period of time (for example, 200 ms) for changes to finish before firing the event.
See Debounce and Throttle: a visual explanation for more information about debouncing. I appreciate that this article is JavaScript focused, instead of C#, but the principle applies.
The advantage of this is that it doesn't search when you're still entering your query. It should then stop trying to perform two searches at once.
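
A minimal sketch of the debounce idea in C#, assuming a hypothetical Debouncer class (not a framework type): every call to Trigger cancels the previous pending call, so only the last one within the quiet period actually runs. The action fires on a thread-pool thread, so in WinForms you would still marshal back to the UI thread (for example with Invoke) before touching controls.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public sealed class Debouncer
{
    private readonly TimeSpan _delay;
    private readonly object _gate = new object();
    private CancellationTokenSource _pending;

    public Debouncer(TimeSpan delay)
    {
        _delay = delay;
    }

    // Schedules `action` to run after the delay. If Trigger is called
    // again before the delay has elapsed, the earlier call is cancelled.
    public void Trigger(Action action)
    {
        var cts = new CancellationTokenSource();
        lock (_gate)
        {
            _pending?.Cancel();
            _pending = cts;
        }

        // The continuation only runs when the delay completed normally,
        // i.e. when it was not cancelled by a later Trigger call.
        Task.Delay(_delay, cts.Token).ContinueWith(
            t => action(),
            CancellationToken.None,
            TaskContinuationOptions.OnlyOnRanToCompletion,
            TaskScheduler.Default);
    }
}
```

In the TextChanged handler you would capture the current text and call something like _debouncer.Trigger(() => RunSearch(text)), where RunSearch is whatever filtering method you already have.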

Run the search on another thread, and show some loading animation or a progress bar while that thread is running.
You may also try to parallelize the LINQ query.
var queryResults = strings.AsParallel().Where(item => item.Contains("1")).ToList();
Here is a benchmark that demonstrates the performance advantages of AsParallel():
{
IEnumerable<string> queryResults;
bool useParallel = true;
var strings = new List<string>();
for (int i = 0; i < 2500000; i++)
strings.Add(i.ToString());
var stp = new Stopwatch();
stp.Start();
if (useParallel)
queryResults = strings.AsParallel().Where(item => item.Contains("1")).ToList();
else
queryResults = strings.Where(item => item.Contains("1")).ToList();
stp.Stop();
Console.WriteLine("useParallel: {0}\r\nTime Elapsed: {1}", useParallel, stp.ElapsedMilliseconds);
}

Update:
I did some profiling.
(Update 3)
List content: Numbers generated from 0 to 2,499,999
Filter text: 123 (20,477 results)
Core i5-2500, Win7 64bit, 8GB RAM
VS2012 + JetBrains dotTrace
The initial test run for 2,500,000 records took me 20,000 ms.
Number one culprit is the call to textBox_search.Text inside Contains. This makes a call for each element to the expensive get_WindowText method of the textbox. Simply changing the code to:
var text = textBox_search.Text;
listBox_choices.DataSource = allUsers.Where(item => item.Contains(text)).ToList();
reduced the execution time to 1,858 ms.
Update 2 :
The other two significant bottle-necks are now the call to string.Contains (about 45% of the execution time) and the update of the listbox elements in set_Datasource (30%).
We could make a trade-off between speed and memory usage by creating a Suffix tree as Basilevs has suggested to reduce the number of necessary compares and push some processing time from the search after a key-press to the loading of the names from file which might be preferable for the user.
To increase the performance of loading the elements into the listbox I would suggest to load only the first few elements and indicate to the user that there are further elements available. This way you give a feedback to the user that there are results available so they can refine their search by entering more letters or load the complete list with a press of a button.
Using BeginUpdate and EndUpdate made no change in the execution time of set_Datasource.

As others have noted here, the LINQ query itself runs quite fast. I believe your bottleneck is the updating of the listbox itself. You could try something like:
if (textBox_search.Text.Length > 2)
{
listBox_choices.BeginUpdate();
listBox_choices.DataSource = allUsers.Where(item => item.Contains(textBox_search.Text)).ToList();
listBox_choices.EndUpdate();
}
I hope this helps.

Assuming you are only matching by prefixes, the data structure you are looking for is called a trie, also known as a "prefix tree". The Enumerable.Where method that you're using now has to iterate through all items in your list on each access.
This thread shows how to create a trie in C#.
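
For readers who just want the shape of it, here is a small trie sketch. The class and method names (Trie, Add, StartingWith) are chosen for this example only; it is case-sensitive and returns at most `limit` words for a given prefix, in no particular order.

```csharp
using System.Collections.Generic;
using System.Text;

public sealed class Trie
{
    private sealed class Node
    {
        public readonly Dictionary<char, Node> Children = new Dictionary<char, Node>();
        public bool IsWord;
    }

    private readonly Node _root = new Node();

    // Inserts a word, creating one node per character as needed.
    public void Add(string word)
    {
        Node node = _root;
        foreach (char c in word)
        {
            if (!node.Children.TryGetValue(c, out Node next))
            {
                next = new Node();
                node.Children.Add(c, next);
            }
            node = next;
        }
        node.IsWord = true;
    }

    // Returns up to `limit` stored words that start with `prefix`.
    public List<string> StartingWith(string prefix, int limit)
    {
        var results = new List<string>();
        Node node = _root;
        foreach (char c in prefix)
        {
            // No stored word has this prefix.
            if (!node.Children.TryGetValue(c, out node))
                return results;
        }
        Collect(node, new StringBuilder(prefix), results, limit);
        return results;
    }

    // Depth-first walk below `node`, rebuilding each word in `current`.
    private static void Collect(Node node, StringBuilder current, List<string> results, int limit)
    {
        if (results.Count >= limit) return;
        if (node.IsWord) results.Add(current.ToString());
        foreach (KeyValuePair<char, Node> pair in node.Children)
        {
            current.Append(pair.Key);
            Collect(pair.Value, current, results, limit);
            current.Length--;
        }
    }
}
```

Lookup cost depends on the length of the prefix and the number of results returned, not on the 120,000 entries, but it does not help with matches in the middle of a string.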

The WinForms ListBox control really is your enemy here. It will be slow to load the records and the ScrollBar will fight you to show all 120,000 records.
Try using an old-fashioned DataGridView data-sourced to a DataTable with a single column [UserName] to hold your data:
private DataTable dt;
public Form1() {
InitializeComponent();
dt = new DataTable();
dt.Columns.Add("UserName");
for (int i = 0; i < 120000; ++i){
DataRow dr = dt.NewRow();
dr[0] = "user" + i.ToString();
dt.Rows.Add(dr);
}
dgv.AutoSizeColumnsMode = DataGridViewAutoSizeColumnsMode.Fill;
dgv.AllowUserToAddRows = false;
dgv.AllowUserToDeleteRows = false;
dgv.RowHeadersVisible = false;
dgv.DataSource = dt;
}
Then use a DataView in the TextChanged event of your TextBox to filter the data:
private void textBox1_TextChanged(object sender, EventArgs e) {
    DataView dv = new DataView(dt);
    // double any apostrophes so the filter expression stays valid
    dv.RowFilter = string.Format("[UserName] LIKE '%{0}%'", textBox1.Text.Replace("'", "''"));
    dgv.DataSource = dv;
}

First, I would change how the ListControl sees your data source: you're converting the resulting IEnumerable<string> to a List<string>. Especially when you have typed just a few characters, this may be inefficient (and unneeded). Do not make expensive copies of your data.
I would wrap the .Where() result in a collection that implements only what is required from IList (search). This saves you from creating a new big list for each character typed.
As an alternative, I would avoid LINQ and write something more specific (and optimized). Keep your list in memory and build an array of matched indices; reuse the array so you do not have to reallocate it for each search.
The second step is not to search the big list when a small one is enough. When the user has typed "ab" and then adds "c", you do not need to search the big list again; searching the filtered list is enough (and faster). Refine the search whenever possible; do not perform a full search each time.
The third step may be harder: keep the data organized so that it can be searched quickly. Now you have to change the structure you use to store your data. Imagine a tree like this:
A      B       C
Add    Better  Ceil
Above  Bone    Contour
This may simply be implemented with an array (if you're working with ANSI names otherwise a dictionary would be better). Build the list like this (illustration purposes, it matches beginning of string):
var dictionary = new Dictionary<char, List<string>>();
foreach (var user in users)
{
    char letter = user[0];
    // Dictionary has ContainsKey, not Contains
    if (dictionary.ContainsKey(letter))
        dictionary[letter].Add(user);
    else
    {
        var newList = new List<string>();
        newList.Add(user);
        dictionary.Add(letter, newList);
    }
}
Search will be then done using first character:
// read the text once instead of once per item
string text = textBox_search.Text;
char letter = text[0];
if (dictionary.ContainsKey(letter))
{
    listBox_choices.DataSource =
        new MyListWrapper(dictionary[letter].Where(x => x.Contains(text)));
}
Please note that I used MyListWrapper() as suggested in the first step (but I omitted my second suggestion for brevity; if you choose the right size for the dictionary key, you may keep each list short and fast enough to, maybe, avoid anything else). Moreover, note that you may try using the first two characters as the dictionary key (more lists, each shorter). If you extend this you'll have a tree (but I don't think you have such a big number of items).
There are many different algorithms for string searching (with related data structures), just to mention few:
Finite state automaton based search: in this approach, we avoid backtracking by constructing a deterministic finite automaton (DFA) that recognizes stored search string. These are expensive to construct—they are usually created using the powerset construction—but are very quick to use.
Stubs: Knuth–Morris–Pratt computes a DFA that recognizes inputs with the string to search for as a suffix, Boyer–Moore starts searching from the end of the needle, so it can usually jump ahead a whole needle-length at each step. Baeza–Yates keeps track of whether the previous j characters were a prefix of the search string, and is therefore adaptable to fuzzy string searching. The bitap algorithm is an application of Baeza–Yates' approach.
Index methods: faster search algorithms are based on preprocessing of the text. After building a substring index, for example a suffix tree or suffix array, the occurrences of a pattern can be found quickly.
Other variants: some search methods, for instance trigram search, are intended to find a "closeness" score between the search string and the text rather than a "match/non-match". These are sometimes called "fuzzy" searches.
A few words about parallel search. It's possible, but it's seldom trivial, because the overhead of making it parallel can easily be much higher than the search itself. I wouldn't perform the search itself in parallel (partitioning and synchronization soon become too expensive and maybe complex), but I would move the search to a separate thread. If the main thread isn't busy, your users won't feel any delay while they're typing (they won't notice if the list appears after 200 ms, but they'll feel uncomfortable if they have to wait 50 ms after each key they type). Of course the search itself must be fast enough; in this case you don't use threads to speed up the search but to keep your UI responsive. Please note that a separate thread will not make your query faster: it won't hang the UI, but if your query was slow it will still be slow in a separate thread (moreover, you have to handle multiple sequential requests too).
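
The second step above (refine the previous result instead of searching the full list again) can be sketched like this. IncrementalFilter is a name invented for the example, it is meant to be used from a single thread, and it only works if the previous result was complete (no Take(n) cut-off).

```csharp
using System;
using System.Collections.Generic;

public sealed class IncrementalFilter
{
    private readonly List<string> _all;
    private string _lastQuery = "";
    private List<string> _lastMatches;

    public IncrementalFilter(List<string> all)
    {
        _all = all;
        _lastMatches = all; // every item matches the empty query
    }

    public List<string> Filter(string query)
    {
        // If the new query contains the old one, every item that matches
        // the new query also matched the old one, so the previous result
        // is a sufficient (and smaller) source. Otherwise start over.
        List<string> source = query.Contains(_lastQuery) ? _lastMatches : _all;

        var matches = new List<string>();
        foreach (string item in source)
        {
            if (item.Contains(query))
                matches.Add(item);
        }

        _lastQuery = query;
        _lastMatches = matches;
        return matches;
    }
}
```

Typing "ab" and then "abc" searches only the "ab" matches the second time; deleting a character falls back to the full list.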

You could try using PLINQ (Parallel LINQ).
This does not guarantee a speed boost, though; you need to find that out by trial and error.

I doubt you'll be able to make the search itself faster, but you should certainly:
a) Use the AsParallel LINQ extension method
b) Use some kind of timer to delay filtering
c) Put the filtering method on another thread
Keep some kind of string previousTextBoxValue somewhere. Make a timer with a delay of 1000 ms that fires the search on tick if previousTextBoxValue is the same as your textbox.Text value. If not, reassign previousTextBoxValue to the current value and reset the timer. Start the timer in the textbox changed event, and it'll make your application smoother. Filtering 120,000 records in 1-3 seconds is OK, but your UI must remain responsive.

You can also try using the BindingSource.Filter property. I have used it and it works like a charm for filtering a bunch of records; update this property with the search text every time it changes. Another option would be to use AutoCompleteSource for the TextBox control.
Hope it helps!

I would try sorting the collection, matching only the start of each string, and limiting the search to some number of results.
So on initialization:
allUsers.Sort();
and for the search:
allUsers.Where(item => item.StartsWith(textBox_search.Text))
Maybe you can add some cache.
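
Once the list is sorted, the prefix match does not have to scan every item. A rough sketch, assuming the list was sorted with StringComparer.Ordinal (PrefixSearch and Find are made-up names, and the match is case-sensitive):

```csharp
using System;
using System.Collections.Generic;

public static class PrefixSearch
{
    // `sorted` must already be sorted with StringComparer.Ordinal.
    // Returns up to `limit` items that start with `prefix`.
    public static List<string> Find(List<string> sorted, string prefix, int limit)
    {
        // BinarySearch returns the index of a match, or the bitwise
        // complement of the index of the next larger element.
        int index = sorted.BinarySearch(prefix, StringComparer.Ordinal);
        if (index < 0)
        {
            index = ~index;
        }
        else
        {
            // Step back over duplicates of an exact match.
            while (index > 0 && sorted[index - 1] == prefix) index--;
        }

        var results = new List<string>();
        while (index < sorted.Count
               && results.Count < limit
               && sorted[index].StartsWith(prefix, StringComparison.Ordinal))
        {
            results.Add(sorted[index]);
            index++;
        }
        return results;
    }
}
```

This only finds names that begin with the typed text; it is not a replacement for the Contains search in the question.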

Use Parallel LINQ. PLINQ is a parallel implementation of LINQ to Objects. PLINQ implements the full set of LINQ standard query operators as extension methods for the System.Linq namespace and has additional operators for parallel operations. PLINQ combines the simplicity and readability of LINQ syntax with the power of parallel programming. Just like code that targets the Task Parallel Library, PLINQ queries scale in the degree of concurrency based on the capabilities of the host computer.
Introduction to PLINQ
Understanding Speedup in PLINQ
Also you can use Lucene.Net
Lucene.Net is a port of the Lucene search engine library, written in
C# and targeted at .NET runtime users. The Lucene search library is
based on an inverted index. Lucene.Net has three primary goals:

From what I have seen, I agree that the list should be sorted.
However, sorting the list after it has been constructed will be very slow; keep it sorted while building it and you will get a better execution time.
Otherwise, if you don't need to display the list or keep its order, use a hash map.
The hash map will hash your string and look it up directly. It should be faster, I think.

Try using the BinarySearch method on a sorted list; it should work faster than the Contains method.
Contains is O(n)
BinarySearch is O(lg(n))
I think a sorted collection should be faster to search and slower to add new elements to, but as I understand it you only have a search performance problem.

Quickest way to Update Multiline Textbox with Large Amount of Text

I have a .NET 4.5 WinForm program that queries a text-based database using ODBC. I then want to display every result in a multiline textbox and I want to do it in the quickest way possible.
The GUI does not have to be usable during the time the textbox is being updated/populated. However, it'd be nice if I could update a progress bar to let the user know that something is happening - I believe a background worker or new thread/task is necessary for this but I've never implemented one.
I initially went with this code and it was slow, as it drew out the result every line before continuing to the next one.
OdbcDataReader dbReader = com.ExecuteReader();
while (dbReader.Read())
{
txtDatabaseResults.AppendText(dbReader[0].ToString());
}
This was significantly faster.
string resultString = "";
while (dbReader.Read())
{
resultString += dbReader[0].ToString();
}
txtDatabaseResults.Text = resultString;
But there is a generous wait time before the textbox comes to life so I want to know if the operation can be even faster. Right now I'm fetching about 7,000 lines from the file and I don't think it's necessary to switch to AvalonEdit (correct me if my way of thinking is wrong, but I would like to keep it simple and use the built-in textbox).

You can make this far faster by using a StringBuilder instead of using string concatenation.
var results = new StringBuilder();
while (dbReader.Read())
{
results.Append(dbReader[0].ToString());
}
txtDatabaseResults.Text = results.ToString();
Using string and concatenation creates a lot of pressure on the GC, especially if you're appending 7000 lines of text. Each time you use string +=, the CLR creates a new string instance, which means the older one (which is progressively larger and larger) needs to be garbage collected. StringBuilder avoids that issue.
Note that there will still be a delay when you assign the text to the TextBox, as it needs to refresh and display that text. The TextBox control isn't optimized for that amount of text, so that may be a bottleneck.
As for pushing this into a background thread - since you're using .NET 4.5, you could use the new async support to handle this. This would work via marking the method containing this code as async, and using code such as:
string resultString = await Task.Run(()=>
{
var results = new StringBuilder();
while (dbReader.Read())
{
results.Append(dbReader[0].ToString());
}
return results.ToString();
});
txtDatabaseResults.Text = resultString;

Use a StringBuilder:
StringBuilder e = new StringBuilder();
while (dbReader.Read())
{
e.Append(dbReader[0].ToString());
}
txtDatabaseResults.Text = e.ToString();

Although a parallel thread is recommended, the way you extract the lines from the file is somewhat flawed. Because string is immutable, every time you concatenate to resultString you actually create another (bigger) string. Here, StringBuilder comes in very useful:
StringBuilder resultString = new StringBuilder();
while (dbReader.Read())
{
    resultString.Append(dbReader[0].ToString());
}
txtDatabaseResults.Text = resultString.ToString();

I am filling a regular TextBox (multiline=true) in a single call with a very long string (more than 200 kB, loaded from a file; I just assign my string to the Text property of the TextBox).
It's very slow (> 1 second).
The TextBox does nothing other than display the huge string.
I used a very simple trick to improve performance: I replaced the multiline TextBox with a RichTextBox (native control).
Now the same loads are instantaneous, and the RichTextBox has exactly the same appearance and behavior as a TextBox with raw text (as long as you didn't tweak it). The most obvious difference is that the RTB does not have a context menu by default.
Of course, it's not a solution in every case, and it's not aimed at the OP's question, but for me it works perfectly, so I hope it can help other people facing the same performance problems with TextBox and big strings.

Highlighting in a RichTextBox is taking too long

I have a large list of offsets which I need to highlight in my RichTextBox. However this process is taking too long. I am using the following code:
foreach (int offset in offsets)
{
richTextBox.Select(offset, searchString.Length);
richTextBox.SelectionBackColor = Color.Yellow;
}
Is there a more efficient way to do so?
UPDATE:
Tried using this method but it doesn't highlight anything:
richTextBox.SelectionBackColor = Color.Yellow;
foreach (int offset in offsets)
{
richTextBox.Select(offset, searchString.Length);
}

I've googled your issue and found that RichTextBox gets very slow when it has many lines. In my opinion, you either have to buy a third-party control whose performance satisfies you, or you may need threads to divide up the whole selection task. I think they can speed things up.
Hope it helps !

I've had this same problem before. I ended up disregarding all of the methods they give you and manipulated the underlying RTF data. Also, the reason that your second block of code doesn't work is that RTF applies formatting as it goes, so if you call a function (or property, in this case) to change the selection color, it only applies to the currently selected block. Any changes made to the selection after that call become irrelevant.
You can play around with the RGB values, or here is a great source on how to do different things within the RTF control. Pop this function in your code and see how well it works. I use it to provide realtime syntax highlighting for SQL code.
public void HighlightText(int offset, int length)
{
String sText = richTextBox.Text.Trim();
sText = sText.Insert(offset + length - 1, @" \highlight0");
sText = sText.Insert(offset, @" \highlight1");
String s = @"{\rtf1\ansi\deff0{\fonttbl{\f0\fnil\fcharset0 Courier New;}}
{\colortbl ;\red255\green255\blue0;}\viewkind4\uc1\pard";
s += sText;
s += @"\par}";
richTextBox.Rtf = s;
}

Does it make any difference if you set the SelectionBackColor outside of the loop?
Looking into the RichTextBox with Reflector shows, that a WindowMessage is sent to the control every time when the color is set. In the case of large number of offsets this might lead to highlighting the already highlighted words again and again, leading to O(n^2) behavior.
