Efficiency between searching and foreach loops - C#

I am working with WPF in C#. I am using the GetNextControl method to store all the child controls in a Control.ControlCollection. I want to loop through the results and fill in only the text boxes. I have thought of two ways to do this, but which would be more efficient:
Search once and store the results in a Control.ControlCollection.
Use a foreach loop to go through the collection and use multiple if/else statements to find the TextBox I am looking for, then fill in the box with some text.
Or,
Search and store all the controls in a Control.ControlCollection.
Use the Find method of the collection to find a TextBox with a certain name and fill in some text in that TextBox.
I think the first way would be slower because there are more comparisons to make, while the second method relies on searching alone.

Implement the easiest one. Do not worry about optimization until you have metrics to support the need.
If it is not fast enough or efficient enough, then get some good time measurements; at that point it is time to consider alternate implementations.
Implement and time each of the alternatives, and pick the fastest/most efficient one.
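For reference, here is a minimal sketch of both approaches. Note that GetNextControl and Control.ControlCollection are Windows Forms APIs rather than WPF ones; the parent control and the name "txtTarget" are assumptions for illustration:
using System.Linq;
using System.Windows.Forms;

// Approach 1: iterate the collection and type-check each control.
foreach (Control c in parent.Controls)
{
    if (c is TextBox tb && tb.Name == "txtTarget")
        tb.Text = "some text";
}

// Approach 2: let the collection search by name, then cast the result.
var found = parent.Controls.Find("txtTarget", searchAllChildren: true)
                  .OfType<TextBox>()
                  .FirstOrDefault();
if (found != null)
    found.Text = "some text";
For a handful of controls the difference will be negligible either way; the measurements matter more than the mechanism.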

Related

Huge Dictionary and substring lookup

I have a dictionary with 500,000 keys and I have to compare them using Key.Contains("Description"). This is making my performance really slow. Is there an alternative way to perform a faster search?
I had a List before, but that performed even worse. I tried using an index on the List, but it did not improve performance much.
Other than storing all possible substrings of all possible keys as the keys in the dictionary (which you almost certainly wouldn't have enough memory to do) there really isn't much to be done besides iterating through the entire collection and doing the check on each item. Given that you're iterating the entire collection, there's not really much benefit to using a Dictionary over a List, at least for this specific operation (perhaps other operations you perform on this data benefit from it being in a Dictionary). They're both going to be quite slow. You simply have an inherently expensive operation that you're trying to perform.
If you can alter your requirements somehow to search for a string exactly equal to your search string then you can use the dictionary's hash based lookup, which is super fast, and if you could use a StartsWith or EndsWith operation instead of a full Contains then you could sort the data and use a binary search, but with a Contains operation none of those optimizations can be made.
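To make the contrast concrete, here is a minimal sketch, assuming the keys are also kept in a separate, ordinally sorted List<string> (the collection names and search strings are placeholders):
using System;
using System.Collections.Generic;

// Exact match: O(1) on average via the hash table.
if (dict.TryGetValue("Description", out var value))
{
    // use value
}

// Prefix match on a sorted copy of the keys: O(log n) to find the first
// candidate, then scan forward while the prefix still matches.
int i = sortedKeys.BinarySearch("Desc", StringComparer.Ordinal);
if (i < 0) i = ~i; // BinarySearch returns the complement of the insertion point
while (i < sortedKeys.Count && sortedKeys[i].StartsWith("Desc", StringComparison.Ordinal))
{
    // sortedKeys[i] starts with the prefix
    i++;
}
Neither trick helps a true Contains, which has no choice but to touch every key.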
If the search is performed multiple times, you may want to consider using extra collections holding just the items that match a predefined condition.
These collections would be populated at the same time your original dictionary is populated.
This could be a viable solution if you have a limited number of fixed searches.
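A minimal sketch of that idea; the Record type, the member names, and the fixed term "Description" are all hypothetical:
using System.Collections.Generic;

class Record { /* payload */ }

class Catalog
{
    private readonly Dictionary<string, Record> all = new Dictionary<string, Record>();
    // Pre-filtered side collection, populated alongside the dictionary.
    public List<Record> MatchesDescription { get; } = new List<Record>();

    public void Add(string key, Record record)
    {
        all[key] = record;
        // Pay the Contains cost once, at insert time, not on every search.
        if (key.Contains("Description"))
            MatchesDescription.Add(record);
    }
}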
I've read that Regex adds extra overhead, but why don't you benchmark it yourself?
Something like this:
// Requires: using System.Text.RegularExpressions;
var test = "Telle Carraige Sawmill Rh-ccxxH440xxx38.5Hyv-Op-rL-2008";
var matchCollection = Regex.Matches(test, "(Carraige|Sawmill)", RegexOptions.IgnoreCase);
// matchCollection.Count should be == 2

Fastest way to check if a string is a substring C#?

I need to check if a list of items contains a string, so that the list gets filtered as the user types in a search box. On the TextChanged event, I check whether the entered text is contained in one of the listbox items and filter accordingly, so
something like:
value.Contains(enteredText)
I was wondering if this is the fastest and most efficient way to filter out listbox items?
Is Contains() method the best way to search for substrings in C#?
I'd say that in all but very exceptional circumstances it's fast and efficient enough, and even in those exceptional circumstances it's likely to be a purely academic problem. I'd be surprised if you came across any bottlenecks in your logic related to this; only then would it be worth looking at, and even then chances are you'll be looking elsewhere.
Contains is one of the cheapest methods in my code completion filtering algorithm (Part 6 #6, where #7 and the fuzzy logic matching described in the footnote are vastly more expensive), which has no problem keeping up with even a fast-typing user and thousands of items in the dropdown.
I highly doubt it will cause you problems.
Although this is not the fastest option globally, it is the fastest one for which you do not need to code anything. It should be sufficient for filtering drop-down items.
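For example, a minimal filtering sketch (the items collection and enteredText are placeholders); using IndexOf with a StringComparison gives a case-insensitive Contains:
using System;
using System.Linq;

var filtered = items
    .Where(s => s.IndexOf(enteredText, StringComparison.OrdinalIgnoreCase) >= 0)
    .ToList();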
For longer texts, you may want to go with the KMP algorithm, which has linear time complexity. Note, however, that it will not make any difference for very short search strings.
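For reference, a self-contained KMP sketch; it runs in O(n + m) instead of the naive worst case of O(n * m):
// Knuth-Morris-Pratt: returns the first index of 'pattern' in 'text', or -1.
static int KmpIndexOf(string text, string pattern)
{
    if (pattern.Length == 0) return 0;
    // Failure table: length of the longest proper prefix of pattern
    // that is also a suffix of pattern[0..i].
    var fail = new int[pattern.Length];
    for (int i = 1, k = 0; i < pattern.Length; i++)
    {
        while (k > 0 && pattern[i] != pattern[k]) k = fail[k - 1];
        if (pattern[i] == pattern[k]) k++;
        fail[i] = k;
    }
    for (int i = 0, k = 0; i < text.Length; i++)
    {
        while (k > 0 && text[i] != pattern[k]) k = fail[k - 1];
        if (text[i] == pattern[k]) k++;
        if (k == pattern.Length) return i - k + 1;
    }
    return -1;
}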
For searches that have lots of matches (e.g. ones that you get for the first one to two characters) you may want to precompute a table that maps single letters and letter pairs to the rows in your drop-down list for a much faster look-up at the expense of using more memory (a pretty standard tradeoff in programming in general).
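And a sketch of that precomputed table, mapping letter pairs to the rows containing them (the items list is a placeholder); a two-character query then becomes a dictionary lookup instead of a full scan:
using System.Collections.Generic;

var bigramIndex = new Dictionary<string, List<int>>();
for (int row = 0; row < items.Count; row++)
{
    string s = items[row].ToLowerInvariant();
    for (int i = 0; i + 1 < s.Length; i++)
    {
        string bigram = s.Substring(i, 2);
        if (!bigramIndex.TryGetValue(bigram, out var rows))
            bigramIndex[bigram] = rows = new List<int>();
        if (rows.Count == 0 || rows[rows.Count - 1] != row)
            rows.Add(row); // skip duplicate entries for the same row
    }
}
// Longer queries: look up their first two letters, then verify the
// candidate rows with Contains.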

Slow iteration through elements in WatiN

I'm writing an application with WatiN. It's great, but running a performance analysis on my program shows that over 50% of execution time is spent looping through lists of elements.
For example:
foreach (TextField bT in browser.TextFields)
{
    // ...
}
is very slow.
I seem to remember seeing somewhere there is a faster way of doing this in WatiN, but unfortunately I can't find the page again.
Accessing the number of elements also seems to be slow, e.g.:
browser.CheckBoxes.Count
Thanks for any tips,
Chris
I think I could answer you better if I had a better idea of what you were trying to do, but I can share some observations on what I've learned with WatiN so far.
The more specific your selectors are, the faster things will go. Avoid using "browser.Elements" as that is really generic. I'm not sure that it saves much, but doing something like browser.Body.Elements throws the header elements out of the scope of things to check and may save a few calculations.
When I say "scope", consider that WatiN always starts with the entire DOM. Can you think of ways to limit the scope of elements perhaps to the text fields within the main div on your page? WatiN returns Elements and ElementCollections, each of which may have its own ElementCollection. That div probably has a specific ID, so you can do something like
var textFields = ie.Div("divId").TextFields;
Look for opportunities to be more specific, and you can use LINQ to describe what you want more clearly. For example, can you write something like:
ie.Body.TextFields
    .Where(tf => !string.IsNullOrWhiteSpace(tf.ClassName) && tf.ClassName.Contains("classname"))
    .ToList()
    .ForEach(tf => tf.Value = "Your Text");
I would refine that further by reducing the number of times I scan the collection by doing something like:
ie.Body.TextFields.ToList()
    .ForEach(tf =>
    {
        if (!string.IsNullOrWhiteSpace(tf.ClassName) && tf.ClassName.Contains("classname"))
        {
            tf.Value = "Your Text";
        }
    });
The "Find.By*" specifiers also help WatiN operate on the collections you want faster and are a more elegant short-hand for what I wrote above:
ie.Body.TextFields.Filter(Find.ByClass("class")).ToList().ForEach(tf => tf.Value = "Your Text");
And as a last piece of advice, this project lets you find elements using jQuery/CSS style selectors.
So, tl;dr: Narrow down the scope of what you're looking for, and be specific.
Hope that helps. I'm looking for ways to speed up my own tests.
If you really need to iterate through all the text fields, there is no other way. As #Xaqron pointed out, it depends on IE. But maybe you just need to iterate through the text fields of, e.g., one specific <div/>? Finding it first, and then iterating through its text fields, would be faster.
Thanks Dahv for a really detailed answer. In my case I've sped up my tests by about 10x using a number of tricks, some similar to yours:
Refining scope as you and prostynick suggested (in my case using Form1.TextField etc.)
First checking if browser.Html matches my regex before seeing if individual fields do
Using the Gehtsoft.PCRE regex wrapper - its native-code regex matching is far faster than .NET's for small haystacks. So to find a TextField I'd do:
Gehtsoft.PCRE.Regex regexString = new Gehtsoft.PCRE.Regex("[Nn]ame");
foreach (TextField bT in browser.TextFields)
{
    // Skip if no match
    if (!regexString.Execute(bT.Name).Success) continue;
    // ...
}
Before, I was looping over a list of regexes and, inside that loop, over the TextFields. Making the TextFields loop the outer loop improved speed about 3x.

Filtering a subset of (potentially) 1,000,000+ items

I have a large dataset with possibly over a million entries. All items have an assigned time stamp and items are added to the set at runtime (usually, but not always, with a newer time stamp).
I need to show a subset of this data given a certain time range. This time range is usually quite small compared to the total data set, i.e. of the 1,000,000+ items no more than about 1,000 are in the given time range. This time range moves at a constant pace, e.g. every second the time range is moved by one second.
Additionally, the user may adjust the time range at any time ("move" through the data set) or set additional filters (e.g. filter by some text).
So far I wasn't worried about performance, trying to get the other things right first, and I only worked with smaller test sets. I am not quite sure how to tackle this problem efficiently and would be glad for any input. Thanks.
Edit: Used language is C# 4.
Update: I am now using an interval tree; the implementation can be found here:
https://github.com/mbuchetics/RangeTree
It also comes with an asynchronous version which rebuilds the tree using the Task Parallel Library (TPL).
We had a similar problem in our development - we had to collect several million items sorted by some key and then export one page from it on demand. I see that your problem is somewhat similar.
For that purpose, we adapted the red-black tree structure, in the following ways:
we added an iterator to it, so we could get the 'next' item in O(1)
we added finding the iterator from the 'index', and managed to do that in O(log n)
RB Tree has O(log n) insertion complexity, so I guess that your insertions will fit in there nicely.
next() on the iterator was implemented by adding and maintaining a linked list of all leaf nodes - the RB tree implementation we adopted didn't originally include this.
An RB tree is also cool because it allows you to fine-tune the node size according to your needs. By experimenting you'll be able to figure out the right numbers for your problem.
Use a SortedList sorted by timestamp.
All you have to do is implement a binary search on the sorted keys inside the list to find the boundaries of your selection, which is pretty easy.
Insert new items into the sorted list. This lets you select a range pretty easily, as sketched below. You could potentially use LINQ as well if you're familiar with it.
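A minimal sketch of that approach, assuming the items are kept in a List<Item> sorted by timestamp (SortedList<TKey, TValue> requires unique keys, which timestamps may not be; the Item type is a placeholder):
using System;
using System.Collections.Generic;

class Item { public DateTime Timestamp; /* payload */ }

// First index whose timestamp is >= t (lower bound), O(log n).
static int LowerBound(List<Item> items, DateTime t)
{
    int lo = 0, hi = items.Count;
    while (lo < hi)
    {
        int mid = (lo + hi) / 2;
        if (items[mid].Timestamp < t) lo = mid + 1; else hi = mid;
    }
    return lo;
}

// The visible window [from, to) is a contiguous slice of the sorted list.
int start = LowerBound(items, from);
int end = LowerBound(items, to);
List<Item> visible = items.GetRange(start, end - start);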

Fast autocomplete from in-memory collection (.NET)

I have this text input field on a web page. User types in item names for purchase. I'd like to provide a dropdown with possible names, based on letters typed so far.
The question is how to implement the search on the server (ASP.NET MVC). I'll probably load the whole collection of item names (there are over 100 000) into a static variable on app start. How should I implement an efficient search for names starting with one or more given characters?
TIA
You can sort the collection by name, then write a modified binary search that returns a range of items.
However, I would recommend first trying a simple sequential search and seeing how it behaves under load.
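A minimal sketch of that modified binary search, assuming the names sit in an ordinally sorted string[] (the array name, prefix, and the limit of 10 suggestions are placeholders):
using System;
using System.Collections.Generic;

// First index at which 'key' could be inserted while keeping 'a' sorted.
static int LowerBound(string[] a, string key)
{
    int lo = 0, hi = a.Length;
    while (lo < hi)
    {
        int mid = (lo + hi) / 2;
        if (string.CompareOrdinal(a[mid], key) < 0) lo = mid + 1; else hi = mid;
    }
    return lo;
}

// All names starting with 'prefix' are contiguous in the sorted array:
// locate the first one in O(log n), then take as many as the dropdown shows.
int i = LowerBound(names, prefix);
var suggestions = new List<string>();
while (i < names.Length && suggestions.Count < 10 &&
       names[i].StartsWith(prefix, StringComparison.Ordinal))
{
    suggestions.Add(names[i++]);
}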
"I'll probably load the whole collection of item names (there are over 100 000) in a static variable on app start. How should I implement efficient search for names starting with given one or more characters?"
By NOT (!) loading them into a static variable. Hit the db server on every request with a "top 101" clause. Finished.
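If you go that route, a minimal sketch; the Items table, the Name column, and the TOP 101 limit are assumptions, and an index on Name lets the prefix LIKE use an index seek:
using System.Collections.Generic;
using System.Data.SqlClient;

static List<string> Suggest(string prefix, string connectionString)
{
    var results = new List<string>();
    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(
        "SELECT TOP 101 Name FROM Items WHERE Name LIKE @p + '%' ORDER BY Name", conn))
    {
        // Escape LIKE wildcards if user input may contain % or _.
        cmd.Parameters.AddWithValue("@p", prefix);
        conn.Open();
        using (var reader = cmd.ExecuteReader())
            while (reader.Read())
                results.Add(reader.GetString(0));
    }
    return results;
}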
