Best way to program a progress bar for many methods - C#

Say I have a button: btnExecute
This button executes all these methods:
doAction1();
doAction2();
doAction3();
doAction4();
Each of these actions takes a couple of seconds.
How would I go about updating a progress bar as each of these methods is run?
I know I could go into each of them and just add pbExecuteProgress.Value = 10, for example, but I feel like there must be a better way to do this.
I also do not want to do something like this:
doAction1();
pbExecuteProgress.Value = 30;
doAction2();
pbExecuteProgress.Value = 60;
doAction3();
pbExecuteProgress.Value = 80;
doAction4();
pbExecuteProgress.Value = 100;
Is there ANY way to create a progress bar that goes up to 100 in value and is equal to the progress of the methods?

var actions = new List<Action> { doAction1, doAction2, doAction3, doAction4 };
foreach (var action in actions)
{
    action();
    progressBar.Value += (progressBar.Maximum - progressBar.Minimum) / actions.Count;
}
if you wanted to customize the progress values:
var actions = new Dictionary<Action, int>
{
    { doAction1, 30 },
    { doAction2, 30 },
    { doAction3, 20 },
    { doAction4, 20 },
};

progressBar.Minimum = 0;
progressBar.Maximum = actions.Select(kvp => kvp.Value).Sum();
progressBar.Value = 0;

foreach (var action in actions)
{
    action.Key();
    progressBar.Value += action.Value;
}
Declaration and creation of the collection of actions does not have to happen just before you loop through it. In fact, a strength of this pattern (a form of the Strategy pattern) is that new Actions can come from anywhere, both inside the current class and out.
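One caveat with the loops above: if they run in the button's click handler, the UI thread is busy executing the actions, so the ProgressBar may not repaint until everything has finished. A minimal sketch of moving the work to a background task and marshalling progress back with IProgress<int> (my own addition, not from the original answer; it assumes the actions are safe to run off the UI thread and that the bar's Maximum is 100):
// Inside the Form class; assumes using System, System.Collections.Generic,
// System.Threading.Tasks and System.Windows.Forms.
private async void btnExecute_Click(object sender, EventArgs e)
{
    var actions = new List<Action> { doAction1, doAction2, doAction3, doAction4 };

    // Progress<T> captures the UI SynchronizationContext, so the lambda
    // below runs on the UI thread even though Report is called elsewhere.
    IProgress<int> progress = new Progress<int>(v => pbExecuteProgress.Value = v);

    await Task.Run(() =>
    {
        int done = 0;
        foreach (var action in actions)
        {
            action();
            progress.Report(++done * 100 / actions.Count);
        }
    });
}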

When you "nest" progress, the outer method has to estimate how long each subroutine will take to execute, and divide up its 100% among the subroutines.
You can either do this arbitrarily, i.e. method one is 40%, method two is 25%, etc., or you can do a "dry run" to decide how much work each method will do. For example, if each method processes a number of files, the dry run would count the files without actually processing them, and you could then use the file counts for each method to allocate the range of the progress bar that applies to each method.
The dry-run approach gives a more accurate result, but the pre-processing required can take quite a long time, and thus make the overall task slower.
Once you have the range for each subroutine you can pass in their starting and ending percentages, and they can advance the progress appropriately.
A neat way to handle this is to create a progress manager class that deals with all the subtask scaling, which makes it easy to nest a whole subtree of calls and pass only one object around to handle the progress display.
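A minimal sketch of such a manager (the class and method names are illustrative, not from the answer; each subtask reports a local 0..1 fraction, and the manager scales it into the slice of the parent range assigned to that subtask):
public class ProgressManager
{
    private readonly Action<double> _report; // receives overall progress, 0..100
    private readonly double _start, _span;

    public ProgressManager(Action<double> report, double start = 0, double span = 100)
    {
        _report = report;
        _start = start;
        _span = span;
    }

    // Report this subtask's local progress (0..1), scaled into the parent range.
    public void Report(double fraction) => _report(_start + fraction * _span);

    // Carve out a child manager covering [fromPercent, toPercent] of THIS range,
    // so nesting a subtree of calls only requires passing one object down.
    public ProgressManager Slice(double fromPercent, double toPercent) =>
        new ProgressManager(_report,
                            _start + _span * fromPercent / 100,
                            _span * (toPercent - fromPercent) / 100);
}
Usage would look like var pm = new ProgressManager(v => pbExecuteProgress.Value = (int)v); followed by doAction1(pm.Slice(0, 40)); doAction2(pm.Slice(40, 65)); and so on, with each action calling Report on its own slice.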

If you really want to do it yourself, I'd suggest timing each one of the statements inside each method as well as the entire method, throwing all the times into an array, converting them to percentages, and writing them to a text file or something. Rinse and repeat, average them up, and then you'll have to add a bunch of pbExecuteProgress.Value = X statements. I know you said you didn't want to do that, but at least this way it'd be a really accurate representation of the progress.
Just a thought.
Also, I'm fairly sure there are third party tools that'll do this kind of thing for you. However, it will be inherently impossible for them to be as accurate as the method above.

Related

How can I minimize the performance hit in Coded UI when using nested if statements in a test method

Is there a way for me to minimize the performance hit when I'm either running or debugging my Coded UI test? Currently it's taking me a long time to run my Coded UI test because it takes too long to execute. I've timed it, and "too long" means that checking whether a screen exists and performing an action takes over a minute, so it's taking me too long to debug and finish it out.
To give some more background: these if statements are all inside one test method, where I'm checking for different screens. It's very dynamic but takes too long to run. I've read that I can do ordered tests, but I didn't think I could create ordered tests with these dynamic screens (I don't think ordered tests can act as if statements to account for dynamic dialogs and screens), and I think it's too late in the process to move to that architecture.
I've tried the following playback settings with little or no improvement.
Here are my current playback settings
Playback.PlaybackSettings.WaitForReadyLevel = WaitForReadyLevel.Disabled;
//Playback.PlaybackSettings.SmartMatchOptions = SmartMatchOptions.None;
Playback.PlaybackSettings.MaximumRetryCount = 10;
Playback.PlaybackSettings.ShouldSearchFailFast = false;
Playback.PlaybackSettings.DelayBetweenActions = 1000;
Playback.PlaybackSettings.SearchTimeout = 2000;
None of these settings has helped, nor has turning off the smart match options.
I could have sworn I've read somewhere that replacing my if statements with try/catch would help, but I may be totally wrong, since I'm just grabbing at straws to try to increase performance by at least 40% or so.
Would anyone have any tips or tricks for dealing with if statements in Coded UI code?
I'm guessing your if statements are of this kind:
if (uiTestControl.Exists)
{
    // do something
}
If that's the case, your delays are a result of Coded UI searching for the control, a time-costly operation, especially when searching for a control that doesn't exist.
There are a number of ways to handle this. If my guess is in the ballpark, please confirm and I'll detail the options.
Update:
The main reason for the delay is MaximumRetryCount = 10. In addition, try the following settings:
Playback.PlaybackSettings.MaximumRetryCount = 3;
Playback.PlaybackSettings.DelayBetweenActions = 100;
Playback.PlaybackSettings.SearchTimeout = 15000;
When waiting for a control to exist, use:
uiTestControl.WaitForControlExist(5000);
This tells the playback to search for the control for a maximum of 5 seconds.
In addition, you should reduce Playback.PlaybackSettings.SearchTimeout before searching for a control that you know might not exist:
var defaultTimeout = Playback.PlaybackSettings.SearchTimeout;
Playback.PlaybackSettings.SearchTimeout = 5000;
and after you finish searching, restore it to the default value:
Playback.PlaybackSettings.SearchTimeout = defaultTimeout;
This should do the trick.
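To make that reduce-search-restore pattern reusable and exception-safe, you could wrap it in a small helper; this is a sketch of my own (ExistsWithin is not part of the Coded UI API):
public static bool ExistsWithin(UITestControl control, int timeoutMs)
{
    var defaultTimeout = Playback.PlaybackSettings.SearchTimeout;
    Playback.PlaybackSettings.SearchTimeout = timeoutMs;
    try
    {
        // WaitForControlExist returns false instead of throwing
        // when the control is not found within the timeout.
        return control.WaitForControlExist(timeoutMs);
    }
    finally
    {
        // Restore the default even if the search throws.
        Playback.PlaybackSettings.SearchTimeout = defaultTimeout;
    }
}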

I can't figure out what is slowing my program down

I have created a Windows Forms application that reads in a text file, rearranges the data, and writes to a new text file. I have noticed that it slows down exponentially as it runs. I have been using tracepoints, stopwatches, and DateTime to figure out why each iteration is taking longer than the previous one, but I can't figure it out. My best guess would be that it might have something to do with the way I'm initializing variables.
I'm not sure how helpful this snippet of code will be but maybe it will give some insight into my problem:
while (cuttedWords.Any())
{
    var variable = cuttedWords.TakeWhile(x => x != separator).ToArray();
    cuttedWords = cuttedWords.Skip(variable.Length + 1);
    sortDataObject.SortDataMethod(variable, b);
    if (sortDataObject.virtualPara)
    {
        if (!virtualParaUsed)
        {
            listOfNames = sortDataObject.findListOfNames(backgroundWords, ref IDforCounting, countParametersTable);
        }
        virtualParaUsed = true;
        printDataObject.WriteFileVirtual(fileName, ID, sortDataObject.listNames[0], sortDataObject.listNames[1],
            sortDataObject.unit, listOfNames, sortDataObject.virtualNames);
        sortDataObject.virtualNames.Clear();
    }
    else
    {
        int[] indexes = checkedListBox1.CheckedIndices.Cast<int>().ToArray();
        printDataObject.WriteFile(fileName, ID, sortDataObject.listNames[0], sortDataObject.listNames[1],
            sortDataObject.unit, sortDataObject.hexValue[0], sortDataObject.stringShift, sortDataObject.sign,
            sortDataObject.SFBinary[0], sortDataObject.wordValue, sortDataObject.conversions, sortDataObject.stringData, indexes, sortDataObject.conType);
    }
    decimal sum = ((decimal)IDforCounting) / countParametersTable * 100;
    int sum2 = (int)sum;
    backgroundWorker1.ReportProgress(sum2);
    ID++;
    IDforCounting++;
    b++;
}
What is strange to me is that I know that each loop runs in a matter of milliseconds, but from the start of one loop to the start of the next, the time keeps increasing.
I apologize if this is not enough information to analyze my issue, but I'm not sure what else I can provide without showing my entire solution.
Thank you.
EDIT: A better question might be: what is a good way to analyze performance if stopwatches aren't doing the trick? I'd rather not have to download a profiler.
If it's taking longer and longer on each iteration, it's probably related to the initial cuttedWords.Any().
What type is cuttedWords? If it's a database-backed enumerable, it will re-issue the SQL statement on every iteration, which may or may not be what you want.
On the other hand, if this is a producer-consumer scenario, it may be that cuttedWords is locked by the producer, causing the consumer to be thread-locked while waiting for the producer to complete its action.
Also, ReportProgress will cause the BackgroundWorker to raise an event on the thread that created it, potentially causing UI updates, so maybe try removing that line and see if it helps any. Then replace it with some code that only calls ReportProgress if the progress has actually changed.
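If the enumerable is being re-evaluated, materializing the words once up front avoids that; note, too, that reassigning cuttedWords = cuttedWords.Skip(...) stacks a new deferred iterator on the old one each pass, so even an in-memory sequence walks further on every iteration. A sketch of the same slicing over a Queue (my own rewrite of the loop's start; the rest of the body stays as it was):
var remaining = new Queue<string>(cuttedWords); // enumerate the source exactly once
while (remaining.Count > 0)
{
    // take words up to the separator, consuming the separator itself
    var variable = new List<string>();
    while (remaining.Count > 0)
    {
        var word = remaining.Dequeue();
        if (word == separator)
            break;
        variable.Add(word);
    }

    sortDataObject.SortDataMethod(variable.ToArray(), b);
    // ... rest of the loop body unchanged ...
}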

Strange Behavior with Threading and Timer

Let me explain my situation.
I have a 1-producer-to-N-consumers pattern. I'm using blocking collections and everything is working well. Doing some tests I noticed this strange behavior:
I was testing how long my manipulation of data took in my consumers.
I noticed this strange thing; below you'll find the code, cleaned of my manipulation, which produces the strange behavior.
I have 4 consumers for 1 producer.
For most of the data, the Console doesn't print anything, because ts = 0 (it's under a tick), but randomly (every 1 to 5 seconds) it prints something like this (not in this very specific order, but of the same kind):
10000
20001
10000
30002
10000
40003
10000
10000
It is of the order of 10,000 ticks, so around 1 ms. Always a number in the format (N)000(N-1).
Note that the BlockingCollection I consume is filled depending on some network events which occur at completely random times. Nothing regular here.
The timing is almost perfect, always a multiple of 10,000 ticks.
What could be behind this? Thanks!
while (IsAlive)
{
    DataToFieldMapping item;
    try
    {
        _CollectionToConsume.TryTake(out item, -1);
    }
    catch
    {
        item = null;
    }

    if (item != null)
    {
        long ts = (DateTime.Now.Ticks - item.TimeStamp.Ticks);
        if (ts > 10)
            Console.WriteLine(ts);
    }
}
What's going on here is that DateTime.Now has a fairly limited precision. It's not giving you the time to the nearest tick. It is only updated every 10,000 ticks or so, which is why you generally see multiples of 10k ticks in your prints.
If you really want to get a better feel for the duration of those events, use the Stopwatch class, which has a much higher precision. That said, Stopwatch is simply a diagnostic tool (hence why it's in the Diagnostics namespace). You should only be using it to help you diagnose what's going on, and probably shouldn't be using it in production code.
On a side note, there really isn't any need to use a timer here at all. It appears that you're creating several consumers that are polling the BlockingCollection for new content. There is no reason to do this. They can simply block until the collection has items. (Hence the name, BlockingCollection.)
The easiest way is for the consumers to simply do this:
foreach (var item in _CollectionToConsume.GetConsumingEnumerable())
    ProcessItem(item);
Then just run that code in a background thread.
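For example (a sketch; ProcessItem stands in for your data manipulation):
// one long-lived consumer; start four of these for your 1-to-4 setup
var consumer = Task.Factory.StartNew(() =>
{
    // blocks efficiently until items arrive; the loop ends when the
    // producer calls _CollectionToConsume.CompleteAdding()
    foreach (var item in _CollectionToConsume.GetConsumingEnumerable())
        ProcessItem(item);
}, TaskCreationOptions.LongRunning);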
If you write the following and run it, you'll see that the tick count does not advance one tick at a time, but rather in relatively large chunks, because the effective resolution of DateTime.Now is much coarser than a single tick.
for (int i = 0; i < 100; i++)
{
    Console.WriteLine(DateTime.Now.Ticks);
}
Use the Stopwatch class to measure performance, as it uses a high-resolution timer which is much better suited to the purpose.
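For instance, one shared Stopwatch can stamp items on the producer side and be read on the consumer side (a sketch; the StopwatchTicks field is illustrative, not from the question's code):
static readonly Stopwatch Clock = Stopwatch.StartNew();

// producer side, instead of item.TimeStamp = DateTime.Now:
item.StopwatchTicks = Clock.ElapsedTicks;

// consumer side:
double elapsedMs = (Clock.ElapsedTicks - item.StopwatchTicks) * 1000.0 / Stopwatch.Frequency;
Console.WriteLine(elapsedMs);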

Fastest way to search in a string collection

Problem:
I have a text file of around 120,000 users (strings) which I would like to store in a collection and later to perform a search on that collection.
The search method will run every time the user changes the text of a TextBox, and the result should be the strings that contain the text in the TextBox.
I don't have to change the list, just pull the results and put them in a ListBox.
What I've tried so far:
I tried two different collections/containers, into which I dump the string entries from an external text file (once, of course):
List<string> allUsers;
HashSet<string> allUsers;
With the following LINQ query:
allUsers.Where(item => item.Contains(textBox_search.Text)).ToList();
My search event (fires when the user changes the search text):
private void textBox_search_TextChanged(object sender, EventArgs e)
{
    if (textBox_search.Text.Length > 2)
    {
        listBox_choices.DataSource = allUsers.Where(item => item.Contains(textBox_search.Text)).ToList();
    }
    else
    {
        listBox_choices.DataSource = null;
    }
}
Results:
Both gave me a poor response time (around 1-3 seconds between each key press).
Question:
Where do you think my bottleneck is? The collection I've used? The search method? Both?
How can I get better performance and more fluent functionality?
You could consider doing the filtering task on a background thread which invokes a callback method when it's done, or simply restarts filtering if the input has changed.
The general idea is to be able to use it like this:
public partial class YourForm : Form
{
    private readonly BackgroundWordFilter _filter;

    public YourForm()
    {
        InitializeComponent();

        // setup the background worker to return no more than 10 items,
        // and to set ListBox.DataSource when results are ready
        _filter = new BackgroundWordFilter
        (
            items: GetDictionaryItems(),
            maxItemsToMatch: 10,
            callback: results =>
                this.Invoke(new Action(() => listBox_choices.DataSource = results))
        );
    }

    private void textBox_search_TextChanged(object sender, EventArgs e)
    {
        // this will update the background worker's "current entry"
        _filter.SetCurrentEntry(textBox_search.Text);
    }
}
A rough sketch would be something like:
public class BackgroundWordFilter : IDisposable
{
    private readonly List<string> _items;
    private readonly AutoResetEvent _signal = new AutoResetEvent(false);
    private readonly Thread _workerThread;
    private readonly int _maxItemsToMatch;
    private readonly Action<List<string>> _callback;

    private volatile bool _shouldRun = true;
    private volatile string _currentEntry = null;

    public BackgroundWordFilter(
        List<string> items,
        int maxItemsToMatch,
        Action<List<string>> callback)
    {
        _items = items;
        _callback = callback;
        _maxItemsToMatch = maxItemsToMatch;

        // start the long-lived background thread
        _workerThread = new Thread(WorkerLoop)
        {
            IsBackground = true,
            Priority = ThreadPriority.BelowNormal
        };
        _workerThread.Start();
    }

    public void SetCurrentEntry(string currentEntry)
    {
        // set the current entry and signal the worker thread
        _currentEntry = currentEntry;
        _signal.Set();
    }

    void WorkerLoop()
    {
        while (_shouldRun)
        {
            // wait here until there is a new entry
            _signal.WaitOne();
            if (!_shouldRun)
                return;

            var entry = _currentEntry;
            var results = new List<string>();

            // if there is nothing to process,
            // return an empty list
            if (string.IsNullOrEmpty(entry))
            {
                _callback(results);
                continue;
            }

            // do the search in a loop to
            // allow early termination when the current entry
            // is changed on a different thread
            foreach (var i in _items)
            {
                // if matched, add to the list of results
                if (i.Contains(entry))
                    results.Add(i);

                // check if the current entry was updated in the meantime,
                // or we found enough items
                if (entry != _currentEntry || results.Count >= _maxItemsToMatch)
                    break;
            }

            if (entry == _currentEntry)
                _callback(results);
        }
    }

    public void Dispose()
    {
        // we are using AutoResetEvent and a background thread
        // and therefore must dispose of them explicitly
        Dispose(true);
    }

    private void Dispose(bool disposing)
    {
        if (!disposing)
            return;

        // shut down the thread
        if (_workerThread.IsAlive)
        {
            _shouldRun = false;
            _currentEntry = null;
            _signal.Set();
            _workerThread.Join();
        }

        // if targeting .NET 3.5 or older, we have to
        // use the explicit IDisposable implementation
        (_signal as IDisposable).Dispose();
    }
}
Also, you should actually dispose the _filter instance when the parent Form is disposed. This means you should open and edit your Form's Dispose method (inside the YourForm.Designer.cs file) to look something like:
// inside "xxxxxx.Designer.cs"
protected override void Dispose(bool disposing)
{
    if (disposing)
    {
        if (_filter != null)
            _filter.Dispose();

        // this part is added by Visual Studio designer
        if (components != null)
            components.Dispose();
    }

    base.Dispose(disposing);
}
On my machine, it works pretty quickly, so you should test and profile this before going for a more complex solution.
That being said, a "more complex solution" would possibly be to store the last couple of results in a dictionary, and then only filter those if it turns out that the new entry differs by only the first or last character.
I've done some testing, and searching a list of 120,000 items and populating a new list with the entries takes a negligible amount of time (about 1/50th of a second, even if all strings are matched).
The problem you're seeing must therefore be coming from the populating of the data source, here:
listBox_choices.DataSource = ...
I suspect you are simply putting too many items into the listbox.
Perhaps you should try limiting it to the first 20 entries, like so:
listBox_choices.DataSource = allUsers.Where(item => item.Contains(textBox_search.Text))
.Take(20).ToList();
Also note (as others have pointed out) that you are accessing the TextBox.Text property for each item in allUsers. This can easily be fixed as follows:
string target = textBox_search.Text;
listBox_choices.DataSource = allUsers.Where(item => item.Contains(target))
.Take(20).ToList();
However, I timed how long it takes to access TextBox.Text 500,000 times and it only took 0.7 seconds, far less than the 1 - 3 seconds mentioned in the OP. Still, this is a worthwhile optimisation.
Use a suffix tree as an index. Or rather, just build a sorted dictionary that associates every suffix of every name with the list of corresponding names.
For input:
Abraham
Barbara
Abram
The structure would look like:
a -> Barbara
ab -> Abram
abraham -> Abraham
abram -> Abram
am -> Abraham, Abram
aham -> Abraham
ara -> Barbara
arbara -> Barbara
bara -> Barbara
barbara -> Barbara
bram -> Abram
braham -> Abraham
ham -> Abraham
m -> Abraham, Abram
raham -> Abraham
ram -> Abram
rbara -> Barbara
Search algorithm
Assume user input "bra".
Bisect the dictionary on the user input to find the input or the position where it could go. This way we find "barbara", the last key lower than "bra"; this is called the lower bound for "bra". The search takes logarithmic time.
Iterate from the found key onwards until the user input no longer matches. This would give "bram" -> Abram and "braham" -> Abraham.
Concatenate the iteration results (Abram, Abraham) and output them.
Such trees are designed for quick search of substrings. Their performance is close to O(log n). I believe this approach will work fast enough to be used by the GUI thread directly. Moreover, it will work faster than a threaded solution due to the absence of synchronization overhead.
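A sketch of that lower-bound search over a sorted suffix dictionary (the Build/Find names are my own; SortedList is used because the bisection needs indexed access to the sorted keys):
static class SuffixIndex
{
    // build: map every suffix of every name to the names containing it
    public static SortedList<string, List<string>> Build(IEnumerable<string> names)
    {
        var index = new SortedList<string, List<string>>(StringComparer.Ordinal);
        foreach (var name in names)
        {
            var lower = name.ToLowerInvariant();
            for (int i = 0; i < lower.Length; i++)
            {
                var suffix = lower.Substring(i);
                List<string> list;
                if (!index.TryGetValue(suffix, out list))
                    index.Add(suffix, list = new List<string>());
                list.Add(name);
            }
        }
        return index;
    }

    public static IEnumerable<string> Find(SortedList<string, List<string>> index, string input)
    {
        input = input.ToLowerInvariant();

        // binary search the sorted keys for the lower bound of the input
        var keys = index.Keys;
        int lo = 0, hi = keys.Count;
        while (lo < hi)
        {
            int mid = (lo + hi) / 2;
            if (string.CompareOrdinal(keys[mid], input) < 0) lo = mid + 1;
            else hi = mid;
        }

        // collect names while the keys still start with the input
        var results = new SortedSet<string>();
        for (int i = lo; i < keys.Count && keys[i].StartsWith(input, StringComparison.Ordinal); i++)
            foreach (var name in index[keys[i]])
                results.Add(name);
        return results;
    }
}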
You need either a text search engine (like Lucene.Net) or a database (you may consider an embedded one like SQL CE, SQLite, etc.). In other words, you need an indexed search. Hash-based search isn't applicable here, because you're searching for a substring, while hash-based search works well for exact values.
Otherwise it will be an iterative search, looping through the collection.
It might also be useful to have a "debounce" type of event. This differs from throttling in that it waits a period of time (for example, 200 ms) for changes to finish before firing the event.
See Debounce and Throttle: a visual explanation for more information about debouncing. I appreciate that this article is JavaScript focused, instead of C#, but the principle applies.
The advantage of this is that it doesn't search when you're still entering your query. It should then stop trying to perform two searches at once.
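A sketch of debouncing with a plain System.Windows.Forms.Timer (the 200 ms interval and the RunSearch method are my own placeholders):
private readonly Timer _debounce = new Timer { Interval = 200 };

public YourForm()
{
    InitializeComponent();
    _debounce.Tick += (s, e) =>
    {
        _debounce.Stop();
        RunSearch(textBox_search.Text); // the actual filtering goes here
    };
}

private void textBox_search_TextChanged(object sender, EventArgs e)
{
    // restart the timer on every keystroke; the search only fires
    // once the user has paused typing for the full interval
    _debounce.Stop();
    _debounce.Start();
}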
Run the search on another thread, and show some loading animation or a progress bar while that thread is running.
You may also try to parallelize the LINQ query.
var queryResults = strings.AsParallel().Where(item => item.Contains("1")).ToList();
Here is a benchmark that demonstrates the performance advantages of AsParallel():
{
    IEnumerable<string> queryResults;
    bool useParallel = true;

    var strings = new List<string>();
    for (int i = 0; i < 2500000; i++)
        strings.Add(i.ToString());

    var stp = new Stopwatch();
    stp.Start();

    if (useParallel)
        queryResults = strings.AsParallel().Where(item => item.Contains("1")).ToList();
    else
        queryResults = strings.Where(item => item.Contains("1")).ToList();

    stp.Stop();
    Console.WriteLine("useParallel: {0}\r\nTime Elapsed: {1}", useParallel, stp.ElapsedMilliseconds);
}
Update:
I did some profiling.
(Update 3)
List content: numbers generated from 0 to 2,499,999
Filter text: 123 (20,477 results)
Core i5-2500, Win7 64-bit, 8 GB RAM
VS2012 + JetBrains dotTrace
The initial test run for 2,500,000 records took 20,000 ms.
Number one culprit is the call to textBox_search.Text inside Contains. This makes a call for each element to the expensive get_WindowText method of the textbox. Simply changing the code to:
var text = textBox_search.Text;
listBox_choices.DataSource = allUsers.Where(item => item.Contains(text)).ToList();
reduced the execution time to 1,858 ms.
Update 2:
The other two significant bottlenecks are now the call to string.Contains (about 45% of the execution time) and the update of the listbox elements in set_DataSource (30%).
We could trade speed for memory usage by creating a suffix tree, as Basilevs has suggested, to reduce the number of necessary comparisons and push some processing time from the search after a key press to the loading of the names from the file, which might be preferable for the user.
To increase the performance of loading the elements into the listbox, I would suggest loading only the first few elements and indicating to the user that there are further elements available. This way you give feedback to the user that there are results available, so they can refine their search by entering more letters, or load the complete list with the press of a button.
Using BeginUpdate and EndUpdate made no change in the execution time of set_DataSource.
As others have noted here, the LINQ query itself runs quite fast. I believe your bottleneck is the updating of the listbox itself. You could try something like:
if (textBox_search.Text.Length > 2)
{
    listBox_choices.BeginUpdate();
    listBox_choices.DataSource = allUsers.Where(item => item.Contains(textBox_search.Text)).ToList();
    listBox_choices.EndUpdate();
}
I hope this helps.
Assuming you are only matching by prefixes, the data structure you are looking for is called a trie, also known as "prefix tree". The IEnumerable.Where method that you're using now will have to iterate through all items in your dictionary on each access.
This thread shows how to create a trie in C#.
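For a feel of the structure, a bare-bones sketch (illustrative, not the code from the linked thread; note it trades memory for speed by storing each word at every node along its path):
class TrieNode
{
    public Dictionary<char, TrieNode> Children = new Dictionary<char, TrieNode>();
    public List<string> Words = new List<string>(); // words passing through this node
}

class Trie
{
    private readonly TrieNode _root = new TrieNode();

    public void Add(string word)
    {
        var node = _root;
        foreach (var c in word.ToLowerInvariant())
        {
            TrieNode next;
            if (!node.Children.TryGetValue(c, out next))
                node.Children[c] = next = new TrieNode();
            node = next;
            node.Words.Add(word);
        }
    }

    // all words starting with the given prefix, found in O(prefix length)
    public List<string> StartingWith(string prefix)
    {
        var node = _root;
        foreach (var c in prefix.ToLowerInvariant())
            if (!node.Children.TryGetValue(c, out node))
                return new List<string>();
        return node.Words;
    }
}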
The WinForms ListBox control really is your enemy here. It will be slow to load the records and the ScrollBar will fight you to show all 120,000 records.
Try using an old-fashioned DataGridView data-sourced to a DataTable with a single column [UserName] to hold your data:
private DataTable dt;

public Form1()
{
    InitializeComponent();

    dt = new DataTable();
    dt.Columns.Add("UserName");
    for (int i = 0; i < 120000; ++i)
    {
        DataRow dr = dt.NewRow();
        dr[0] = "user" + i.ToString();
        dt.Rows.Add(dr);
    }

    dgv.AutoSizeColumnsMode = DataGridViewAutoSizeColumnsMode.Fill;
    dgv.AllowUserToAddRows = false;
    dgv.AllowUserToDeleteRows = false;
    dgv.RowHeadersVisible = false;
    dgv.DataSource = dt;
}
Then use a DataView in the TextChanged event of your TextBox to filter the data:
private void textBox1_TextChanged(object sender, EventArgs e)
{
    DataView dv = new DataView(dt);
    dv.RowFilter = string.Format("[UserName] LIKE '%{0}%'", textBox1.Text);
    dgv.DataSource = dv;
}
First I would change how ListControl sees your data source: you're converting the resulting IEnumerable<string> to a List<string>. Especially when the user has typed only a few characters, this may be inefficient (and unneeded). Do not make expensive copies of your data.
I would wrap the .Where() result in a collection that implements only what is required of IList (search). This saves you from creating a new big list for each character typed.
As an alternative I would avoid LINQ and write something more specific (and optimized). Keep your list in memory, build an array of matched indices, and reuse the array so you do not have to reallocate it for each search.
The second step is to not search the big list when a small one is enough. When the user has typed "ab" and then adds "c", you do not need to re-search the big list; searching the previously filtered list is enough (and faster). Refine the search whenever possible; do not perform a full search each time (a sketch follows).
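A sketch of that refinement (the _lastQuery and _lastResults fields are my own names for the remembered state):
string query = textBox_search.Text;

// if the new query contains the old one, every match for the new query
// is already in the old result set, so we only need to narrow it
IEnumerable<string> source =
    _lastQuery != null && query.Contains(_lastQuery)
        ? _lastResults
        : allUsers;

_lastResults = source.Where(item => item.Contains(query)).ToList();
_lastQuery = query;
listBox_choices.DataSource = _lastResults;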
The third step may be harder: keep the data organized so it can be searched quickly. Now you have to change the structure you use to store your data. Imagine a tree like this:

A       B       C
Add     Better  Ceil
Above   Bone    Contour

This may simply be implemented with an array (if you're working with ANSI names; otherwise a dictionary would be better). Build the list like this (for illustration purposes; it matches the beginning of the string):
var dictionary = new Dictionary<char, List<string>>();
foreach (var user in users)
{
    char letter = user[0];
    if (dictionary.ContainsKey(letter))
        dictionary[letter].Add(user);
    else
    {
        var newList = new List<string>();
        newList.Add(user);
        dictionary.Add(letter, newList);
    }
}
Search is then done using the first character:
char letter = textBox_search.Text[0];
if (dictionary.ContainsKey(letter))
{
    listBox_choices.DataSource =
        new MyListWrapper(dictionary[letter].Where(x => x.Contains(textBox_search.Text)));
}
Please note I used MyListWrapper() as suggested in the first step (but I omitted my 2nd suggestion for brevity; if you choose the right size for the dictionary key you may keep each list short and fast enough to, maybe, avoid anything else). Moreover, note that you may try to use the first two characters for your dictionary (more, shorter lists). If you extend this you'll have a tree (but I don't think you have such a big number of items).
There are many different algorithms for string searching (with related data structures), just to mention few:
Finite state automaton based search: in this approach, we avoid backtracking by constructing a deterministic finite automaton (DFA) that recognizes stored search string. These are expensive to construct—they are usually created using the powerset construction—but are very quick to use.
Stubs: Knuth–Morris–Pratt computes a DFA that recognizes inputs with the string to search for as a suffix, Boyer–Moore starts searching from the end of the needle, so it can usually jump ahead a whole needle-length at each step. Baeza–Yates keeps track of whether the previous j characters were a prefix of the search string, and is therefore adaptable to fuzzy string searching. The bitap algorithm is an application of Baeza–Yates' approach.
Index methods: faster search algorithms are based on preprocessing of the text. After building a substring index, for example a suffix tree or suffix array, the occurrences of a pattern can be found quickly.
Other variants: some search methods, for instance trigram search, are intended to find a "closeness" score between the search string and the text rather than a "match/non-match". These are sometimes called "fuzzy" searches.
A few words about parallel search. It's possible, but it's seldom trivial, because the overhead of making it parallel can easily be much higher than the search itself. I wouldn't perform the search itself in parallel (partitioning and synchronization quickly become too expensive and maybe complex), but I would move the search to a separate thread. If the main thread isn't busy, your users won't feel any delay while they're typing (they won't notice if the list appears after 200 ms, but they'll feel it if the UI freezes for 50 ms after they type). Of course the search itself must be fast enough; in this case you don't use threads to speed up the search but to keep your UI responsive. Please note that a separate thread will not make your query faster; it won't hang the UI, but if your query was slow it'll still be slow in a separate thread (moreover, you have to handle multiple sequential requests too).
You could try using PLINQ (Parallel LINQ).
Although this does not guarantee a speed boost, you need to find that out by trial and error.
I doubt you'll be able to make it faster, but for sure you should:
a) Use the AsParallel LINQ extension method
b) Use some kind of timer to delay filtering
c) Put the filtering method on another thread
Keep some kind of string previousTextBoxValue somewhere. Make a timer with a delay of 1000 ms that fires the search on tick if previousTextBoxValue is the same as your textBox.Text value. If not, reassign previousTextBoxValue to the current value and reset the timer. Start the timer in the textbox changed event, and it'll make your application smoother. Filtering 120,000 records in 1-3 seconds is OK, but your UI must remain responsive.
You can also try using the BindingSource.Filter property. I have used it and it works like a charm to filter from a bunch of records; just update this property with the text being searched each time. Another option would be to use AutoCompleteSource for the TextBox control.
Hope it helps!
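A sketch of the BindingSource approach, assuming the users have been loaded into a DataTable with a UserName column and that table is the BindingSource's DataSource:
// escape single quotes so user input can't break the filter expression
var text = textBox_search.Text.Replace("'", "''");
bindingSource1.Filter = string.Format("UserName LIKE '%{0}%'", text);
listBox_choices.DataSource = bindingSource1;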
I would try to sort the collection, search to match only the starting part, and limit the search to some number of results.
So on initialization:
allUsers.Sort();
and search:
allUsers.Where(item => item.StartsWith(textBox_search.Text))
Maybe you can add some caching.
Use Parallel LINQ. PLINQ is a parallel implementation of LINQ to Objects. PLINQ implements the full set of LINQ standard query operators as extension methods for the System.Linq namespace and has additional operators for parallel operations. PLINQ combines the simplicity and readability of LINQ syntax with the power of parallel programming. Just like code that targets the Task Parallel Library, PLINQ queries scale in the degree of concurrency based on the capabilities of the host computer.
Introduction to PLINQ
Understanding Speedup in PLINQ
Also you can use Lucene.Net
Lucene.Net is a port of the Lucene search engine library, written in C# and targeted at .NET runtime users. The Lucene search library is based on an inverted index. Lucene.Net has three primary goals:
From what I have seen, I agree with sorting the list.
However, sorting after the list is constructed will be very slow; sort while building and you will get a better execution time.
Otherwise, if you don't need to display the list or keep the order, use a hashmap.
The hashmap will hash your string and search at the exact offset. It should be faster, I think.
Try the BinarySearch method; it should work faster than the Contains method.
Contains is O(n).
BinarySearch is O(log n).
I think a sorted collection should work faster on search and slower on adding new elements, but as I understood it you only have a search performance problem.

ThreadPool with speed execution control

I need to process several lines from a database (could be millions) in parallel in C#. The processing is quite quick (50 or 150 ms/line), but I cannot know this speed before runtime as it depends on hardware/network.
The ThreadPool, or the newer Task Parallel Library, seems to be what fits my needs, as I am new to threading and want to get the most efficient way to process the data.
However, these methods do not provide a way to control the execution speed of my tasks (lines/minute): I want to be able to set a maximum speed limit for the processing or run it at full speed.
Please note that setting the number of threads of the ThreadPool/TaskFactory does not provide sufficient accuracy for my needs, as I would like to be able to set a speed limit below the 'one thread speed'.
Using a custom scheduler for the TPL seems to be a way to do that, but I did not find a way to implement it.
Furthermore, I'm worried about the efficiency cost of such a setup.
Could you suggest a way or give advice on how to achieve this?
Thanks in advance for your answers.
The TPL provides a convenient programming abstraction on top of the Thread Pool. I would always select TPL when that is an option.
If you wish to throttle the total processing speed, there's nothing built-in that would support that.
You can measure the total processing speed as you proceed through the file and regulate speed by introducing (non-spinning) delays in each thread. The size of the delay can be dynamically adjusted in your code based on observed processing speed.
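A minimal sketch of that idea (ProcessLine, lines and maxLinesPerMinute are illustrative placeholders; the sleep grows or shrinks with the observed rate):
var clock = Stopwatch.StartNew();
long processed = 0;
double maxLinesPerMinute = 5000; // zero or negative could mean "full speed"

Parallel.ForEach(lines, line =>
{
    ProcessLine(line); // the actual per-line work

    long done = Interlocked.Increment(ref processed);
    if (maxLinesPerMinute > 0)
    {
        // if we're ahead of the target rate, sleep until we're back on schedule
        double targetElapsedMs = done / maxLinesPerMinute * 60000;
        double aheadMs = targetElapsedMs - clock.ElapsedMilliseconds;
        if (aheadMs > 0)
            Thread.Sleep((int)aheadMs);
    }
});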
I am not seeing the advantage of limiting the speed, but I suggest you look into limiting the max degree of parallelism of the operation. That can be done via MaxDegreeOfParallelism in the ParallelOptions passed to Parallel.ForEach as the code works over the disparate lines of data. That way you can control the slots, for lack of a better term, which can be expanded or subtracted depending on the criteria you are working under.
Here is an example using a ConcurrentBag to process lines of disparate data with 2 parallel tasks.
var myLines = new List<string> { "Alpha", "Beta", "Gamma", "Omega" };
var stringResult = new ConcurrentBag<string>();

ParallelOptions parallelOptions = new ParallelOptions();
parallelOptions.MaxDegreeOfParallelism = 2;

Parallel.ForEach(myLines, parallelOptions, line =>
{
    if (line.Contains("e"))
        stringResult.Add(line);
});

Console.WriteLine(string.Join(" | ", stringResult));
// Outputs Beta | Omega
Note that ParallelOptions also has a TaskScheduler property with which you can refine the processing further. Finally, for more control, maybe you want to cancel the processing when a specific threshold is reached? If so, look into the CancellationToken property to exit the process early.
