Slow iteration through elements in WatiN

I'm writing an application with WatiN. It's great, but when I ran a performance analysis on my program, over 50% of the execution time was spent looping through lists of elements.
For example:
foreach (TextField bT in browser.TextFields)
{
    // ...
}
is very slow.
I seem to remember seeing somewhere there is a faster way of doing this in WatiN, but unfortunately I can't find the page again.
Accessing the number of elements also seems to be slow, e.g.:
browser.CheckBoxes.Count
Thanks for any tips,
Chris

I think I could answer you better if I had a better idea of what you were trying to do, but I can share some observations on what I've learned with WatiN so far.
The more specific your selectors are, the faster things will go. Avoid using "browser.Elements" as that is really generic. I'm not sure that it saves much, but doing something like browser.Body.Elements throws the header elements out of the scope of things to check and may save a few calculations.
When I say "scope", consider that WatiN always starts with the entire DOM. Can you think of ways to limit the scope of elements perhaps to the text fields within the main div on your page? WatiN returns Elements and ElementCollections, each of which may have its own ElementCollection. That div probably has a specific ID, so you can do something like
var textFields = ie.Div("divId").TextFields;
Look for opportunities to be more specific, and you can use LINQ to describe what you want more clearly. For example, can you write something like:
ie.Body.TextFields.
    Where(tf => !string.IsNullOrWhiteSpace(tf.ClassName) && tf.ClassName.Contains("classname")).ToList().
    ForEach(tf => tf.Value = "Your Text");
I would refine that further by reducing the number of times I scan the collection by doing something like:
ie.Body.TextFields.ToList().
    ForEach(tf => {
        // Filter and assign in the same pass over the collection.
        if (!string.IsNullOrWhiteSpace(tf.ClassName) && tf.ClassName.Contains("classname")) {
            tf.Value = "Your Text";
        }
    });
The "Find.By*" specifiers also help WatiN operate on the collections you want faster and are a more elegant short-hand for what I wrote above:
ie.Body.TextFields.Filter(Find.ByClass("class")).ToList().ForEach(tf => tf.Value = "Your Text");
And as a last piece of advice, this project lets you find elements using jQuery/CSS style selectors.
So, tl;dr: Narrow down the scope of what you're looking for, and be specific.
Hope that helps. I'm looking for ways to speed up my own tests.

If you really need to iterate through all text fields, there is no other way. As @Xaqron pointed out, it depends on IE. But maybe you just need to iterate through the text fields of, e.g., a specific <div/>? Finding it first and then iterating through its text fields would be faster.

Thanks Dahv for a really detailed answer. In my case I've sped up my tests by about 10x using a number of tricks, some similar to yours:
Refining scope as you and prostynick suggested (in my case using Form1.TextField etc.)
First checking whether browser.html matches my regex before checking whether the fields do
Using the Gehtsoft.PCRE regex wrapper; its native-code regex matching is far faster than .NET's for small haystacks. So to find a TextField I'd do:
Gehtsoft.PCRE.Regex regexString = new Gehtsoft.PCRE.Regex("[Nn]ame");
foreach (TextField bT in browser.TextFields)
{
    // Skip if no match
    if (!regexString.Execute(bT.Name).Success) continue;
    // ... work with the matching field here
}
Before, I was looping over a list of regexes and, inside that, looping over the TextFields. Making the TextFields loop the outer loop improved speed about 3x.

Related

Efficiency between searching and foreach loops

I am working with WPF in C#. I am using the GetNextControl method to store all the child controls in a Control.ControlCollection. I want to loop through the results and fill in only the text boxes. I have thought of two ways to do this, but which would be more efficient?
Search once and store the results in a Control.ControlCollection.
Use a foreach loop to go through the collection and use multiple if/else statements to find the TextBox I am looking for and fill in the box with some text.
Or,
Search and store all the controls in a Control.ControlCollection.
Use the find method of the collection to find a TextBox with a certain name and fill in some text in the TextBox.
I think that the first way would be slower because there are more comparisons to make, while the second method uses searching only.

Implement the easiest one. Do not worry about optimization until you have metrics to support the need.
If it is not fast enough/efficient enough, then get some good time measurements. Now it is time to consider alternate implementations.
Implement and time each of the alternates, picking the fastest/most efficient one.
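To make the "get some good time measurements" step concrete, here is a minimal sketch using System.Diagnostics.Stopwatch. The Timing class and its Measure method are hypothetical names made up for this illustration, and a proper benchmark would need more care (several runs, release build, no debugger attached).

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper for illustration only; not part of any library.
public static class Timing
{
    // Runs the action `iterations` times and returns the total elapsed time.
    public static TimeSpan Measure(Action action, int iterations)
    {
        if (action == null) throw new ArgumentNullException(nameof(action));

        // One warm-up call so JIT compilation is not counted in the measurement.
        action();

        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

You would then call something like Timing.Measure(() => FillWithForeach(), 1000) and Timing.Measure(() => FillWithFind(), 1000), where both methods stand in for your two candidate implementations, and compare the results.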

Data structure for searching strings

I am looking for the best data structure for the following case:
In my case I will have thousands of strings; however, for this example I am going to use two. So let's say I have the strings "Water" and "Walter". What I need is for both strings to be found when the letter "W" is entered, and for "Water" to be the only result when "Wat" is entered. I did some research, but I am still not quite sure which is the correct data structure for this case, and I don't want to implement it if I am not sure, as that would waste time. So basically what I am thinking right now is either a "Trie" or a "Suffix Tree". It seems that the "Trie" will do the trick, but as I said I need to be sure. The implementation should not be a problem, so I just need to know the correct structure. Also feel free to let me know if there is a better choice. As you can guess, normal structures such as Dictionary/MultiDictionary would not work, as that would be a memory killer. I am also planning to implement a cache to limit the memory consumption. I am sorry there is no code, but I hope I will get an answer. Thank you in advance.

You should use a trie. Tries are the foundation for one of the fastest known string sorting algorithms (burstsort); they are also used for spell checking and in applications that provide text completion. You can see details here.
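For reference, a minimal trie sketch for the "Water"/"Walter" case from the question could look like the following. The class and member names (Trie, Add, FindByPrefix) are made up for this illustration, matching is case-sensitive, and the per-node Dictionary is a simple choice rather than a memory-optimal one.

```csharp
using System;
using System.Collections.Generic;

// Illustrative trie; names and layout are assumptions, not an existing library type.
public sealed class Trie
{
    private sealed class Node
    {
        public readonly Dictionary<char, Node> Children = new Dictionary<char, Node>();
        public bool IsWord; // true if a stored string ends at this node
    }

    private readonly Node root = new Node();

    // Inserts a word one character at a time, creating nodes as needed.
    public void Add(string word)
    {
        if (word == null) throw new ArgumentNullException(nameof(word));
        Node node = root;
        foreach (char c in word)
        {
            Node next;
            if (!node.Children.TryGetValue(c, out next))
            {
                next = new Node();
                node.Children.Add(c, next);
            }
            node = next;
        }
        node.IsWord = true;
    }

    // Returns every stored word that starts with the given prefix.
    public List<string> FindByPrefix(string prefix)
    {
        if (prefix == null) throw new ArgumentNullException(nameof(prefix));
        var results = new List<string>();
        Node node = root;
        foreach (char c in prefix)
        {
            // No branch for this character means no stored word has the prefix.
            if (!node.Children.TryGetValue(c, out node)) return results;
        }
        Collect(node, prefix, results);
        return results;
    }

    // Depth-first walk that gathers all words below the given node.
    private static void Collect(Node node, string current, List<string> results)
    {
        if (node.IsWord) results.Add(current);
        foreach (KeyValuePair<char, Node> child in node.Children)
        {
            Collect(child.Value, current + child.Key, results);
        }
    }
}
```

After adding "Water" and "Walter", FindByPrefix("W") returns both words and FindByPrefix("Wat") returns only "Water". The order of the results is not guaranteed, because Dictionary enumeration order is unspecified.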

Practically, if you want to do auto-suggest, then storing up to 3-4 chars should suffice.
I mean suggesting as and when the user types "a", "ab" or "abc"; the moment they type "abcd" or more characters, you can filter map.keys for those starting with "abcd" using C# lambda expressions.
Hence, I suggest creating a map like:
Map<char, Map<char, Map<char, Set<string>>>> map;
So, if the user enters "a", you look up map[a] and find all its children.

Using Trim() in LINQ makes it run slower

I'm preparing a LINQ example for a class I need to give to some army guys studying C#. They gave me a database and asked me to make some queries, for example
ArmedVehicles.Where(x => x.vCommandingUnit.Equals("North"))
.Select(x => new {
vCommander = x.vCommander,
vLocation = x.vLocBase,
vType = x.vType});
The problem is that the fields vCommander and vLocBase are padded with blanks, and when I use .Trim() on them the query takes significantly more time (about 5-8 seconds more), and I can't show them an example that slow.
Of course, when I talk to them I'll tell them to fix the database, but for now I need a faster LINQ query so my example won't make me look bad.

If your text is space-padded only on the right, you could use TrimEnd() instead of Trim().
Please remember that loading 14k records into the DataContext is nearly always a bad idea. Normally you can disable object tracking if you don't need to modify them (see the ObjectTrackingEnabled property of the DataContext object).

Store the vCommander and vLocBase fields in the database in the format you need to retrieve them in (without padding).

Equivalent to HashSet.Contains that returns HashSet or index?

I have a large list of emails that I need to check to see if they contain a string. I only need to do this once. Originally I only needed to check whether the email matched any of the emails from a list of emails.
I was using if(ListOfEmailsToRemoveHashSet.Contains(email)) { Discard(email); }
This worked great, but now I need to check for partial matches, so I am trying to invert it, but if I used the same method, I would be testing it like...
if (ListOfEmailsHashSet.Contains(badstring)). Obviously that tells me which string is being found, but not which item in the HashSet contains the bad string.
I can't see any way of making this work while still being fast.
Does anyone know of a function I can use that will return the HashSet of matches, the index of a matched item, or any way around this?

I only need to do this once.
If this is the case, performance shouldn't really be a consideration. Something like this should work:
if(StringsToDisallow.Any(be => email.Contains(be))) {...}
On a side note, you may want to consider using Regular Expressions rather than a straight black-list of contained strings. They'll give you a much more powerful, flexible way to find matches.
If performance does turn out to be an issue after all, you'll have to find a data structure that works better for full-text searching. It might be best to leverage an existing tool like Lucene.NET.
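As a rough sketch of how the Any/Contains approach could return the set of matching emails (which is what the question asks for), assuming a case-insensitive substring match is acceptable. EmailFilter and FindMatches are hypothetical names chosen for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only.
public static class EmailFilter
{
    // Returns every email that contains at least one of the disallowed substrings.
    // Note: an empty string in stringsToDisallow would match every email.
    public static HashSet<string> FindMatches(
        IEnumerable<string> emails,
        IEnumerable<string> stringsToDisallow)
    {
        if (emails == null) throw new ArgumentNullException(nameof(emails));
        if (stringsToDisallow == null) throw new ArgumentNullException(nameof(stringsToDisallow));

        // Materialize once so the fragment list is not re-enumerated per email.
        List<string> fragments = stringsToDisallow.ToList();

        return new HashSet<string>(
            emails.Where(email => fragments.Any(
                fragment => email.IndexOf(fragment, StringComparison.OrdinalIgnoreCase) >= 0)));
    }
}
```

This is still a linear scan of every email against every fragment, so it only suits the "I only need to do this once" case; a HashSet gives no help with substring lookups.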

Just a note here: we had a program that was tasked with uploading in excess of 100,000 pdf/excel/doc etc. files. Every time a file was uploaded, an entry was made in a text file. Every night when the program ran, it would read this file, load the records and add them to the static HashSet<string> FilesVisited = new HashSet<string>(); FilesVisited.Add(reader.ReadLine());.
When the program attempted to upload a file, we first had to scan through the HashSet to see if we had already worked on the file. What we found was that
if (!FilesVisited.Contains(newFilePath))... would take a lot of time and would not give us the correct results (even if the file path was in there). Alternatively, FilesVisited.Any(m => m.Contains(newFilePath)) was also a slow operation.
The best way we found to be fast was the traditional way of
foreach (var item in FilesVisited)
{
if (item.Contains(fileName)) {
alreadyUploded = true;
break;
}
}
Just thought I would share this....

C# better way to do this?

Hi I have this code below and am looking for a prettier/faster way to do this.
Thanks!
string value = "HelloGoodByeSeeYouLater";
string[] y = new string[]{"Hello", "You"};
foreach(string x in y)
{
value = value.Replace(x, "");
}

You could do:
y.ToList().ForEach(x => value = value.Replace(x, ""));
Although I think your variant is more readable.

Forgive me, but someone's gotta say it,
value = Regex.Replace( value, string.Join("|", y.Select(Regex.Escape)), "" );
Possibly faster, since it creates fewer strings.
EDIT: Credit to Gabe and lasseespeholt for Escape and Select.

While not any prettier, there are other ways to express the same thing.
In LINQ:
value = y.Aggregate(value, (acc, x) => acc.Replace(x, ""));
With String methods:
value = String.Join("", value.Split(y, StringSplitOptions.None));
I don't think anything is going to be faster in managed code than a simple Replace in a foreach though.

It depends on the size of the string you are searching. The foreach example is perfectly fine for small operations but creates a new instance of the string each time it operates because the string is immutable. It also requires searching the whole string over and over again in a linear fashion.
The basic solutions have all been proposed. The Linq examples provided are good if you are comfortable with that syntax; I also liked the suggestion of an extension method, although that is probably the slowest of the proposed solutions. I would avoid a Regex unless you have an extremely specific need.
So let's explore more elaborate solutions and assume you needed to handle a string that was thousands of characters in length and had many possible words to be replaced. If this doesn't apply to the OP's need, maybe it will help someone else.
Method #1 is geared towards large strings with few possible matches.
Method #2 is geared towards short strings with numerous matches.
Method #1
I have handled large-scale parsing in C# using char arrays and pointer math with intelligent seek operations that are optimized for the length and potential frequency of the term being searched for. It follows the methodology of:
Extremely cheap peeks, one character at a time
Only investigate potential matches
Modify output when match is found
For example, you might read through the whole source array and only add words to the output when they are NOT found. This would remove the need to keep redimensioning strings.
A simple example of this technique is looking for a closing HTML tag in a DOM parser. For example, I may read an opening STYLE tag and want to skip through (or buffer) thousands of characters until I find a closing STYLE tag.
This approach provides incredibly high performance, but it's also incredibly complicated if you don't need it (plus you need to be well-versed in memory manipulation/management or you will create all sorts of bugs and instability).
I should note that the .NET string libraries are already incredibly efficient, but you can optimize this approach for your own specific needs and achieve better performance (and I have validated this firsthand).
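A much simplified, safe-code sketch of the idea behind Method #1 (one pass over the source, appending to the output only what is not a match) might look like this. It uses a StringBuilder and string.CompareOrdinal rather than the char arrays and pointer math described above, and SinglePassRemover.RemoveAll is a name invented for this example, so treat it as an illustration of the shape of the technique, not the tuned implementation the answer refers to.

```csharp
using System;
using System.Text;

// Illustrative only; not the unsafe/pointer-based code described in the answer.
public static class SinglePassRemover
{
    // Scans `source` once and copies every character that is not part of a term.
    // When several terms match at the same position, the longest one is skipped.
    public static string RemoveAll(string source, params string[] terms)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (terms == null) throw new ArgumentNullException(nameof(terms));

        var output = new StringBuilder(source.Length);
        int i = 0;
        while (i < source.Length)
        {
            int matchedLength = 0;
            foreach (string term in terms)
            {
                if (string.IsNullOrEmpty(term)) continue;
                if (term.Length > matchedLength
                    && i + term.Length <= source.Length
                    && string.CompareOrdinal(source, i, term, 0, term.Length) == 0)
                {
                    matchedLength = term.Length;
                }
            }

            if (matchedLength > 0)
            {
                i += matchedLength;       // skip the matched term
            }
            else
            {
                output.Append(source[i]); // keep the character
                i++;
            }
        }
        return output.ToString();
    }
}
```

Unlike repeated calls to Replace, this never re-scans text it has already emitted, so a term that only appears after another term has been removed is left in place. Whether that difference matters depends on your data.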
Method #2
Another alternative involves storing search terms in a Dictionary containing Lists of strings. Basically, you decide how long your search prefix needs to be, and read characters from the source string into a buffer until you meet that length. Then, you search your dictionary for all terms that match that string. If a match is found, you explore further by iterating through that List, if not, you know that you can discard the buffer and continue.
Because the Dictionary matches strings based on hash, the search is non-linear and ideal for handling a large number of possible matches.
I'm using this methodology to allow instantaneous (<1ms) searching of every airfield in the US by name, state, city, FAA code, etc. There are 13K airfields in the US, and I've created a map of about 300K permutations (again, a Dictionary with prefixes of varying lengths, each corresponding to a list of matches).
For example, Phoenix, Arizona's main airfield is called Sky Harbor with the short ID of KPHX. I store:
KP
KPH
KPHX
Ph
Pho
Phoe
Ar
Ari
Ariz
Sk
Sky
Ha
Har
Harb
There is a cost in terms of memory usage, but string interning probably reduces this somewhat and the resulting speed justifies the memory usage on data sets of this size. Searching happens as the user types and is so fast that I have actually introduced an artificial delay to smooth out the experience.
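To illustrate the prefix map from Method #2, here is a small sketch, assuming prefixes of 2 to 4 characters as in the Sky Harbor list above. PrefixIndex, Add and Lookup are hypothetical names, and the case-insensitive comparer is my own choice rather than something the answer states.

```csharp
using System;
using System.Collections.Generic;

// Illustrative prefix index; an assumption-laden sketch, not the author's code.
public sealed class PrefixIndex
{
    private readonly Dictionary<string, List<string>> map =
        new Dictionary<string, List<string>>(StringComparer.OrdinalIgnoreCase);
    private readonly int minLength;
    private readonly int maxLength;

    public PrefixIndex(int minLength, int maxLength)
    {
        if (minLength < 1 || maxLength < minLength)
            throw new ArgumentException("Require 1 <= minLength <= maxLength.");
        this.minLength = minLength;
        this.maxLength = maxLength;
    }

    // Registers every prefix of `term` (minLength..maxLength chars) as a key for `item`.
    public void Add(string term, string item)
    {
        if (term == null) throw new ArgumentNullException(nameof(term));
        int longest = Math.Min(maxLength, term.Length);
        for (int length = minLength; length <= longest; length++)
        {
            string prefix = term.Substring(0, length);
            List<string> items;
            if (!map.TryGetValue(prefix, out items))
            {
                items = new List<string>();
                map.Add(prefix, items);
            }
            if (!items.Contains(item)) items.Add(item);
        }
    }

    // Returns the candidates for what the user has typed so far. Input longer than
    // maxLength is truncated, so the caller still has to filter those candidates.
    public IReadOnlyList<string> Lookup(string typed)
    {
        if (typed == null || typed.Length < minLength) return new string[0];
        string key = typed.Length > maxLength ? typed.Substring(0, maxLength) : typed;
        List<string> items;
        return map.TryGetValue(key, out items) ? (IReadOnlyList<string>)items : new string[0];
    }
}
```

For the Phoenix example you would call Add once per searchable term ("KPHX", "Phoenix", "Arizona", "Sky", "Harbor"), each pointing at the same airfield, and then call Lookup with whatever the user has typed.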
Send me a message if you have the need to dig into these methodologies.

Extension method for elegance
(arguably "prettier" at the call level)
I'll implement an extension method that allows you to call your implementation directly on the original string as seen here.
value = value.Remove(y);
// or
value = value.Remove("Hello", "You");
// effectively
string value = "HelloGoodByeSeeYouLater".Remove("Hello", "You");
The extension method is callable on any string value in fact, and therefore easily reusable.
Implementation of Extension method:
I'm going to wrap your own implementation (shown in your question) in an extension method for pretty or elegant points, and also employ the params keyword to provide some flexibility in passing the arguments. You can substitute somebody else's faster implementation body into this method.
static class EXTENSIONS {
static public string Remove(this string thisString, params string[] arrItems) {
// Whatever implementation you like:
if (thisString == null)
return null;
var temp = thisString;
foreach(string x in arrItems)
temp = temp.Replace(x, "");
return temp;
}
}
That's the brightest idea I can come up with right now that nobody else has touched on.
