Background.
My script encounters a StackOverflowException while recursively searching for specific text in a large string. The loop is not infinite; the problem occurs (for a specific search) after somewhere between 9,000 and 10,000 legitimate searches -- I need it to keep going. I'm using tail recursion (I think), and that may be part of my problem, since I gather that C# does not handle it well. However, I'm not sure how to avoid tail recursion in my case.
Question(s). Why is the StackOverflowException occurring? Does my overall approach make sense? If the design sucks, I'd rather start there, rather than just avoiding an exception. But if the design is acceptable, what can I do about the StackOverflowException?
Code.
The class I've written searches for contacts (about 500+ from a specified list) in a large amount of text (about 6MB). The strategy I'm using is to search for the last name, then look for the first name somewhere shortly before or after the last name. I need to find each instance of each contact within the given text. The StringSearcher class has a recursive method that continues to search for contacts, returning the result whenever one is found, but keeping track of where it left off with the search.
I use this class in the following manner:
StringSearcher searcher = new StringSearcher(
File.ReadAllText(FilePath),
"lastname",
"firstname",
30
);
string searchResult = null;
while ((searchResult = searcher.NextInstance()) != null)
{
// do something with each searchResult
}
On the whole, the script seems to work. Most contacts return the results I expect. However, the problem seems to occur when the primary search string is extremely common (thousands of hits) and the secondary search string never or rarely occurs. I know it's not getting stuck, because CurrentIndex is advancing normally.
Here's the recursive method I'm talking about.
public string NextInstance()
{
// Advance this.CurrentIndex to the next location of the primary search string
this.SearchForNext();
// Look a little before and after the primary search string
this.CurrentContext = this.GetContextAtCurrentIndex();
// Primary search string found?
if (this.AnotherInstanceFound)
{
// If there is a valid secondary search string, is that found near the
// primary search string? If not, look for the next instance of the primary
// search string
if (!string.IsNullOrEmpty(this.SecondarySearchString) &&
!this.IsSecondaryFoundInContext())
{
return this.NextInstance();
}
//
else
{
return this.CurrentContext;
}
}
// No more instances of the primary search string
else
{
return null;
}
}
The StackOverflowException occurs on this.CurrentIndex = ... in the following method:
private void SearchForNext()
{
// If we've already searched once,
// increment the current index before searching further.
if (0 != this.CurrentIndex)
{
this.CurrentIndex++;
this.NumberOfSearches++;
}
this.CurrentIndex = this.Source.IndexOf(
this.PrimarySearchString,
ValidIndex(this.CurrentIndex),
StringComparison.OrdinalIgnoreCase
);
this.AnotherInstanceFound = this.CurrentIndex >= 0;
}
I can include more code if needed. Let me know if any of those methods or variables look questionable.
*Performance is not really a concern because this will likely run at night as a scheduled task.
You have a 1MB stack. When that stack space runs out and you still need more, a StackOverflowException is thrown. This may or may not be the result of infinite recursion; the runtime has no idea. Infinite recursion is simply one effective way of using more stack space than is available (by using an infinite amount). You can be using a finite amount that just happens to be more than is available, and you'll get the same exception.
While there are other ways to use up lots of stack space, recursion is one of the most effective. Each call adds more space based on the signature and locals of that method. Deep recursion can use a lot of stack space, so if you expect a depth of more than a few hundred levels (and even that is a lot), you should probably not use recursion. Note that any code using recursion can be rewritten iteratively, or to use an explicit Stack.
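To illustrate that last point, here is a minimal sketch (using a hypothetical Node tree type, not taken from the question's code) of the same traversal written both recursively and with an explicit Stack<T>:

```csharp
using System;
using System.Collections.Generic;

class Node
{
    public int Value;
    public List<Node> Children = new List<Node>();
}

static class TreeSum
{
    // Recursive version: each call consumes a stack frame,
    // so a very deep structure can overflow the 1MB thread stack.
    public static int SumRecursive(Node root)
    {
        int total = root.Value;
        foreach (Node child in root.Children)
            total += SumRecursive(child);
        return total;
    }

    // Iterative version: the explicit Stack<T> lives on the heap,
    // so depth is limited only by available memory.
    public static int SumIterative(Node root)
    {
        int total = 0;
        var stack = new Stack<Node>();
        stack.Push(root);
        while (stack.Count > 0)
        {
            Node current = stack.Pop();
            total += current.Value;
            foreach (Node child in current.Children)
                stack.Push(child);
        }
        return total;
    }
}
```

Both produce the same result on small inputs, but only the iterative version survives a chain 100,000 nodes deep.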
It's hard to say, as a complete implementation isn't shown, but based on what I can see you are more or less writing an iterator, but you're not using the C# constructs for one (namely IEnumerable).
My guess is "iterator blocks" will make this algorithm easier to write, easier to write non-recursively, and more effective from the caller's side.
Here is a high level look at how you might structure this method as an iterator block:
public static IEnumerable<string> SearchString(string text
, string firstString, string secondString, int unknown)
{
    int lastIndexFound = text.IndexOf(firstString);
    while (lastIndexFound >= 0)
    {
        if (secondStringNearFirst(text, firstString, secondString, lastIndexFound))
        {
            yield return lastIndexFound.ToString();
        }
        // Advance past this hit so the loop can terminate.
        lastIndexFound = text.IndexOf(firstString, lastIndexFound + 1);
    }
}
private static bool secondStringNearFirst(string text
, string firstString, string secondString, int lastIndexFound)
{
throw new NotImplementedException();
}
It doesn't seem like recursion is the right solution here. Normally with recursive problems you have some state you pass to the recursive step. In this case, you really have a plain while loop. Below I put your method body in a loop and changed the recursive step to continue. See if that works...
public string NextInstance()
{
while (true)
{
// Advance this.CurrentIndex to the next location of the primary search string
this.SearchForNext();
// Look a little before and after the primary search string
this.CurrentContext = this.GetContextAtCurrentIndex();
// Primary search string found?
if (this.AnotherInstanceFound)
{
// If there is a valid secondary search string, is that found near the
// primary search string? If not, look for the next instance of the primary
// search string
if (!string.IsNullOrEmpty(this.SecondarySearchString) &&
!this.IsSecondaryFoundInContext())
{
continue; // Start searching again...
}
//
else
{
return this.CurrentContext;
}
}
// No more instances of the primary search string
else
{
return null;
}
}
}
Related
I am aware this question has been asked. And I am not really looking for a function to do it for me; I was hoping to get some tips on making a little method I made better. Basically: take a long string and search for a smaller string inside of it. I am aware that there are literally always a million ways to do things better, and that is what brought me here.
Please take a look at the code snippet and let me know what you think. No, it's not very complex; yes, it does work for my needs. But I am more interested in learning where the pain points would be using this for something I would assume it would work for, but would not for such and such reason. I hope that makes sense. But to give this question a way to be answered for SO: is this a strong way to perform this task? (I somewhat know the answer.)
Super interested in constructive criticism, not just in "that's bad". I implore you to elaborate on such a thought so I can get the most out of the responses.
public static Boolean FindTextInString(string strTextToSearch, string strTextToLookFor)
{
//put the string to search into lower case
string strTextToSearchLower = strTextToSearch.ToLower();
//put the text to look for to lower case
string strTextToLookForLower = strTextToLookFor.ToLower();
//get the length of both of the strings
int intTextToLookForLength = strTextToLookForLower.Length;
int intTextToSearch = strTextToSearchLower.Length;
//loop through each starting position where the text to look for could still fit
//(stopping early avoids an ArgumentOutOfRangeException from Substring near the end)
for(int i = 0; i <= intTextToSearch - intTextToLookForLength; i++)
{
//substring at multiple positions and see if it can be found
if (strTextToSearchLower.Substring(i,intTextToLookForLength) == strTextToLookForLower)
{
//return true if we found a matching string within the search in text
return true;
}
}
//otherwise we will return false
return false;
}
If you only care about finding a substring inside a string, just use String.Contains()
Example:
string string_to_search = "the cat jumped onto the table";
string string_to_find = "jumped onto";
return string_to_search.ToLower().Contains(string_to_find.ToLower());
You can reuse VB's Like operator this way:
1) Make a reference to Microsoft.VisualBasic.dll library.
2) Use the following code.
using Microsoft.VisualBasic;
using Microsoft.VisualBasic.CompilerServices;
if (LikeOperator.LikeString(Source: "11", Pattern: "11*", CompareOption: CompareMethod.Text))
{
// Your code here...
}
To implement your function in a case-insensitive way, it may be more appropriate to use IndexOf instead of the combination of two ToLower() calls with Contains. This is both because ToLower() will generate a new string, and because of the Turkish İ Problem.
Something like the following should do the trick, where it returns False if either term is null, otherwise uses a case-insensitive IndexOf call to determine if the search term exists in the source string:
public static bool SourceContainsSearch(string source, string search)
{
return search != null &&
source?.IndexOf(search, StringComparison.OrdinalIgnoreCase) > -1;
}
I have a program that needs to compare any given string with a predefined string and determine if an insertion error, deletion error, transposition or substitution error was made.
For example, if the word dog was presented to the user and the user submits dogs or doge, it should notify the user that an insertion error has been made.
How do I go about this?
You probably need to write a method for each of the individual error types to see if it's an error, like:
bool IsInsertionError(string expected, string actual) {
// Maybe look at all of the chars in expected, to see if all of them
// are there in actual, in the correct order, but there's an additional one
}
bool IsDeletionError(string expected, string actual) {
// Do the reverse of IsInsertionError - see if all the letters
// of actual are in expected, in the correct order,
// but there's an additional one
}
bool IsTransposition(string expected, string actual) {
// This one might be a little trickier - maybe loop through all the chars,
// and if expected[i] != actual[i], check to see if
// expected[i+1] == actual[i] and expected[i] == actual[i+1]
// or something like that
}
Once you build out all the individual rules, and you first check for regular equality, fire each of them off one at a time.
You've got to just break problems like this down into small components, then once you have a bunch of easy problems, solve them one at a time.
Off the top of my head but I think this should get you started:
Insertion and Deletion should be pretty simple; just check the lengths of the strings.
if(originalString.Length > newString.Length)
{
//deletion
}
else if(originalString.Length < newString.Length)
{
//insertion
}
To detect transposition, check if the lengths match and if so, you could create two List<char> from the two strings. Then check if they match using the expression below
bool isTransposed = originalList.OrderBy(x => x).SequenceEqual(newList.OrderBy(y => y));
To detect substitution, you could use the Hamming Distance and check if it's greater than 0.
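A minimal sketch of that Hamming-distance check (the helper name is mine; the equal-length guard assumes the insertion/deletion cases above have already been ruled out):

```csharp
using System;

static class ErrorChecks
{
    // Hamming distance: the number of positions at which two
    // equal-length strings differ. With matching lengths, a
    // distance greater than 0 suggests a substitution error.
    public static int HammingDistance(string expected, string actual)
    {
        if (expected.Length != actual.Length)
            throw new ArgumentException("Strings must be the same length.");

        int distance = 0;
        for (int i = 0; i < expected.Length; i++)
        {
            if (expected[i] != actual[i])
                distance++;
        }
        return distance;
    }
}
```

For example, comparing "dog" with "dot" yields a distance of 1, indicating a single substitution.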
I would suggest creating a function that takes the input string as a parameter. The function would look more or less like this; use it wherever you want.
private void CheckString(string userString)
{
string predefinedString = "dog";
if (userString == predefinedString)
{
// write your logic here
}
else
{
MessageBox.Show("Incorrect word"); // notify the users about incorrect word over here
}
}
I have multiple .txt files of 150MB size each. Using C# I need to retrieve all the lines containing the string pattern from each file and then write those lines to a newly created file.
I already looked into similar questions but none of their suggested answers could give me the fastest way of fetching results. I tried regular expressions, linq query, contains method, searching with byte arrays but all of them are taking more than 30 minutes to read and compare the file content.
My test files don't have any specific format; it's raw data that we can't split on a delimiter and filter with DataViews. Below is the sample format of each line in the files.
Sample.txt
LTYY;;0,0,;123456789;;;;;;;20121002 02:00;;
ptgh;;0,0,;123456789;;;;;;;20121002 02:00;;
HYTF;;0,0,;846234863;;;;;;;20121002 02:00;;
Multiple records......
My Code
using (StreamWriter SW = new StreamWriter(newFile))
{
using(StreamReader sr = new StreamReader(sourceFilePath))
{
while (sr.Peek() >= 0)
{
if (sr.ReadLine().Contains(stringToSearch))
SW.WriteLine(sr.ReadLine().ToString());
}
}
}
I want a sample code which would take less than a minute to search for 123456789 from the Sample.txt. Let me know if my requirement is not clear. Thanks in advance!
Edit
I found the root cause: the files reside on a remote server, and that is what consumes most of the time when reading them. When I copied the files to my local machine, all comparison methods completed very quickly, so this isn't an issue with the way we read or compare content; they all took more or less the same time.
But now how do I address this issue? I can't copy all those files to my machine for comparison, and I get OutOfMemory exceptions.
The fastest method to search is the Boyer–Moore string search algorithm, as it does not require reading every byte of the file, but it does require random access to the bytes. You could also try the Rabin–Karp algorithm.
or you can try doing something like the following code, from this answer:
public static int FindInFile(string fileName, string value)
{ // returns complement of number of characters in file if not found
// else returns index where value found
int index = 0;
using (System.IO.StreamReader reader = new System.IO.StreamReader(fileName))
{
if (String.IsNullOrEmpty(value))
return 0;
StringSearch valueSearch = new StringSearch(value);
int readChar;
while ((readChar = reader.Read()) >= 0)
{
++index;
if (valueSearch.Found(readChar))
return index - value.Length;
}
}
return ~index;
}
public class StringSearch
{ // Call Found one character at a time until string found
private readonly string value;
private readonly List<int> indexList = new List<int>();
public StringSearch(string value)
{
this.value = value;
}
public bool Found(int nextChar)
{
for (int index = 0; index < indexList.Count; )
{
int valueIndex = indexList[index];
if (value[valueIndex] == nextChar)
{
++valueIndex;
if (valueIndex == value.Length)
{
indexList[index] = indexList[indexList.Count - 1];
indexList.RemoveAt(indexList.Count - 1);
return true;
}
else
{
indexList[index] = valueIndex;
++index;
}
}
else
{ // next char does not match
indexList[index] = indexList[indexList.Count - 1];
indexList.RemoveAt(indexList.Count - 1);
}
}
if (value[0] == nextChar)
{
if (value.Length == 1)
return true;
indexList.Add(1);
}
return false;
}
public void Reset()
{
indexList.Clear();
}
}
I don't know how long this will take to run, but here are some improvements:
using (StreamWriter SW = new StreamWriter(newFile))
{
using (StreamReader sr = new StreamReader(sourceFilePath))
{
while (!sr.EndOfStream)
{
var line = sr.ReadLine();
if (line.Contains(stringToSearch))
SW.WriteLine(line);
}
}
}
Note that you don't need Peek, EndOfStream will give you what you want. You were calling ReadLine twice (probably not what you had intended). And there's no need to call ToString() on a string.
As I said already, you should have a database, but whatever.
The fastest, shortest and nicest way to do it (even one-lined) is this:
File.AppendAllLines("b.txt", File.ReadLines("a.txt")
.Where(x => x.Contains("123456789")));
But fast? 150MB is 150MB. It's gonna take a while.
You can replace the Contains method with your own, for faster comparison, but that's a whole different question.
Other possible solution...
var sb = new StringBuilder();
foreach (var x in File.ReadLines("a.txt").Where(x => x.Contains("123456789")))
{
sb.AppendLine(x);
}
File.WriteAllText("b.txt", sb.ToString()); // That is one heavy operation there...
Testing it with a file size 150MB, and it found all results within 3 seconds. The thing that takes time is writing the results into the 2nd file (in case there are many results).
150MB is 150MB. If you have one thread going through the entire 150MB, line by line (a "line" being terminated by a newline character/group or by an EOF), your process must read in and spin through all 150MB of the data (not all at once, and it doesn't have to hold all of it at the same time). A linear search through 157,286,400 characters is, very simply, going to take time, and you say you have many such files.
First thing; you're reading the line out of the stream twice. This will, in most cases, actually cause you to read two lines whenever there's a match; what's written to the new file will be the line AFTER the one containing the search string. This is probably not what you want (then again, it may be). If you want to write the line actually containing the search string, read it into a variable before performing the Contains check.
Second, String.Contains() will, by necessity, perform a linear search. In your case the behavior will actually approach N^2, because when searching for a string within a string, the first character must be found first; from there, each character is matched one by one against subsequent characters until either all characters in the search string have matched or a non-matching character is found. On a non-match, the algorithm must go back to the character after the initial match to avoid skipping a possible match, which means it can test the same character many times when checking a long search string against a longer string with many partial matches. This strategy is technically a "brute force" solution. Unfortunately, when you don't know where to look (such as in unsorted data files), there is no more efficient option.
The only possible speedup I could suggest, other than being able to sort the files' data and then perform an indexed search, is to multithread the solution; if you're only running this method on one thread that looks through every file, not only is only one thread doing the job, but that thread is constantly waiting for the hard drive to serve up the data it needs. Having 5 or 10 threads each working through one file at a time will not only leverage the true power of modern multi-core CPUs more efficiently, but while one thread is waiting on the hard drive, another thread whose data has been loaded can execute, further increasing the efficiency of this approach. Remember, the further away the data is from the CPU, the longer it takes for the CPU to get it, and when your CPU can do between 2 and 4 billion things per second, having to wait even a few milliseconds for the hard drive means you're losing out on millions of potential instructions per second.
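As a rough sketch of that multithreading idea (file paths and the search string are placeholders, and Parallel.ForEach is just one of several ways to spread the files across worker threads):

```csharp
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

static class ParallelSearch
{
    // Scans each file on its own worker; while one worker waits on
    // disk I/O, others can run. Matching lines are collected in a
    // thread-safe bag and written out once at the end.
    public static void SearchFiles(string[] files, string needle, string outputFile)
    {
        var matches = new ConcurrentBag<string>();

        Parallel.ForEach(files, file =>
        {
            foreach (string line in File.ReadLines(file))
            {
                if (line.Contains(needle))
                    matches.Add(line);
            }
        });

        File.WriteAllLines(outputFile, matches);
    }
}
```

Note that the order of lines in the output file is not deterministic; if ordering matters, collect per-file results and merge them afterwards.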
I'm not giving you sample code, but have you tried sorting the content of your files?
Trying to search 150MB worth of files for a string is going to take some time any way you slice it, and if regex takes too long for you, then I'd suggest sorting the content of your files so that you know roughly where "123456789" will occur before you actually search; that way you won't have to search the unimportant parts.
Do not read and write at the same time. Search first, save the list of matching lines, and write it to the file at the end.
using System;
using System.Collections.Generic;
using System.IO;
...
List<string> list = new List<string>();
using (StreamReader reader = new StreamReader("input.txt")) {
string line;
while ((line = reader.ReadLine()) != null) {
if (line.Contains(stringToSearch)) {
list.Add(line); // Add to list.
}
}
}
using (StreamWriter writer = new StreamWriter("output.txt")) {
foreach (string line in list) {
writer.WriteLine(line);
}
}
You're going to experience performance problems with any approach that blocks on input from these files while doing string comparisons.
But Windows has a pretty high performance GREP-like tool for doing string searches of text files called FINDSTR that might be fast enough. You could simply call it as a shell command or redirect the results of the command to your output file.
Either preprocessing (sort) or loading your large files into a database will be faster, but I'm assuming that you already have existing files you need to search.
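For instance, something along these lines could shell out to FINDSTR (Windows-only; the helper names are mine, and the /C: switch makes FINDSTR treat the pattern as a literal string rather than a set of words):

```csharp
using System.Diagnostics;
using System.IO;

static class FindstrRunner
{
    // Builds the argument string for: findstr /C:"pattern" "file"
    public static string BuildArguments(string pattern, string inputFile)
    {
        return "/C:\"" + pattern + "\" \"" + inputFile + "\"";
    }

    // Shells out to FINDSTR and writes its matching lines to outputFile.
    // Windows-only, since FINDSTR ships with Windows.
    public static void Run(string pattern, string inputFile, string outputFile)
    {
        var psi = new ProcessStartInfo
        {
            FileName = "findstr",
            Arguments = BuildArguments(pattern, inputFile),
            RedirectStandardOutput = true,
            UseShellExecute = false
        };
        using (var process = Process.Start(psi))
        {
            File.WriteAllText(outputFile, process.StandardOutput.ReadToEnd());
            process.WaitForExit();
        }
    }
}
```

Alternatively, you could skip the redirection entirely and run `findstr /C:"123456789" input.txt > output.txt` directly from a batch script scheduled at night.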
I want to see your ideas on an efficient way to check the values of a newly deserialized object.
For example, I have an XML document that I have deserialized into an object, and now I want to do value checks. The first and most basic idea I can think of is to use nested if statements and check each property: one value might need to be in the correct URL format, another property's value might be a date that must be in the correct range, etc.
So my question is: how would people check all the values in an object? Type checks are not important, as those are already taken care of; it is more to do with the value itself. It needs to work for quite large objects, which is why I did not really want to use nested if statements.
Edit:
I want to achieve complete value validation on all properties in a given object.
I want to check the value itself, not just that it is non-null. For example, I have an object with many properties, one of which is a string named homepage.
I want to be able to check that the string is in the correct URL format and fail if not. This is just one example; in the same object I could also check that a date is in a given range. If any check fails, I will return false or some form of failure.
I am using c# .net 4.
Try fluent validation; it separates concerns and moves the validation configuration out of your object:
public class Validator<T>
{
List<Func<T,bool>> _verifiers = new List<Func<T, bool>>();
public void AddPropertyValidator(Func<T, bool> propValidator)
{
_verifiers.Add(propValidator);
}
public bool IsValid(T objectToValidate)
{
try {
return _verifiers.All(pv => pv(objectToValidate));
} catch(Exception) {
return false;
}
}
}
class ExampleObject {
public string Name {get; set;}
public int BirthYear { get;set;}
}
public static void Main(string[] args)
{
var validator = new Validator<ExampleObject>();
validator.AddPropertyValidator(o => !string.IsNullOrEmpty(o.Name));
validator.AddPropertyValidator(o => o.BirthYear > 1900 && o.BirthYear < DateTime.Now.Year );
validator.AddPropertyValidator(o => o.Name.Length > 3);
validator.IsValid(new ExampleObject());
}
I suggest using AutoMapper with a ValueResolver. You can deserialize the XML into an object in a very elegant way using AutoMapper, and check whether the values you get are valid with a ValueResolver.
You can use a base ValueResolver that check for Nulls or invalid casts, and some CustomResolver's that check if the Values you get are correct.
It might not be exactly what you are looking for, but I think it's an elegant way to do it.
Check this out here: http://dannydouglass.com/2010/11/06/simplify-using-xml-data-with-automapper-and-linqtoxml
In functional languages, such as Haskell, your problem could be solved with the Maybe-monad:
The Maybe monad embodies the strategy of combining a chain of
computations that may each return Nothing by ending the chain early if
any step produces Nothing as output. It is useful when a computation
entails a sequence of steps that depend on one another, and in which
some steps may fail to return a value.
Replace Nothing with null, and the same thing applies for C#.
There are several ways to try and solve the problem, none of them are particularly pretty. If you want a runtime-validation that something is not null, you could use an AOP framework to inject null-checking code into your type. Otherwise you would really have to end up doing nested if checks for null, which is not only ugly, it will probably violate the Law of Demeter.
As a compromise, you could use a Maybe-monad like set of extension methods, which would allow you to query the object, and choose what to do in case one of the properties is null.
Have a look at this article by Dmitri Nesteruk: http://www.codeproject.com/Articles/109026/Chained-null-checks-and-the-Maybe-monad
Hope that helps.
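A minimal sketch of that Maybe-style extension method (the With name follows the linked article; Person and Address are hypothetical types for illustration):

```csharp
using System;

static class MaybeExtensions
{
    // Applies the projection only when the input is non-null,
    // short-circuiting the whole chain to null otherwise.
    public static TResult With<TInput, TResult>(
        this TInput input, Func<TInput, TResult> evaluator)
        where TInput : class
        where TResult : class
    {
        return input == null ? null : evaluator(input);
    }
}

class Address { public string City; }
class Person { public Address Address; }

static class Demo
{
    public static string CityOf(Person p)
    {
        // Instead of nested if (p != null && p.Address != null) checks:
        return p.With(x => x.Address).With(a => a.City);
    }
}
```

If any link in the chain is null, the result is simply null, mirroring how Nothing propagates through the Maybe monad.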
I assume your question is: How do I efficiently check whether my object is valid?
If so, it does not matter that your object was just deserialized from some text source. If your question regards checking the object while deserializing to quickly stop deserializing if an error is found, that is another issue and you should update your question.
Validating an object efficiently is not often discussed when it comes to C# and administrative tools. The reason is that it is very quick no matter how you do it. It is more common to discuss how to do the checks in a manner that is easy to read and easily maintained.
Since your question is about efficiency, here are some ideas:
If you have a huge number of objects to be checked and performance is of key importance, you might want to change your objects into arrays of data so that they can be checked in a consistent manner. Example:
Instead of having MyObject[] MyObjects where MyObject has a lot of properties, break out each property and put them into an array like this:
int[] MyFirstProperties
float[] MySecondProperties
This way, the loop that traverses the list and checks the values, can be as quick as possible and you will not have many cache misses in the CPU cache, since you loop forward in the memory. Just be sure to use regular arrays or lists that are not implemented as linked lists, since that is likely to generate a lot of cache misses.
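A hedged sketch of that layout, validating the parallel arrays in one tight forward pass (the property arrays and the two validation rules are purely illustrative):

```csharp
static class ArrayValidation
{
    // With each property in its own array, validation becomes a
    // tight forward loop over contiguous memory, which is friendly
    // to the CPU cache. The specific rules here are placeholders.
    public static bool AllValid(int[] firstProps, float[] secondProps)
    {
        for (int i = 0; i < firstProps.Length; i++)
        {
            if (firstProps[i] < 0) return false;        // example rule
            if (secondProps[i] > 100.0f) return false;  // example rule
        }
        return true;
    }
}
```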
If you do not want to break your objects up into arrays of properties, then top speed is apparently not the goal, only near-top speed. In that case, your best bet is to keep your objects in a contiguous array and do:
bool wasOk = true;
foreach (MyObject obj in MyObjects)
{
if (obj.MyFirstProperty == someBadValue)
{
wasOk = false;
break;
}
if (obj.MySecondProperty == someOtherBadValue)
{
wasOk = false;
break;
}
}
This checks whether all your objects' properties are ok. I am not sure what your case really is but I think you get the point. Speed is already great when it comes to just checking properties of an object.
If you do string compares, make sure that you use x == y where possible instead of more sophisticated string compares, since == has a few quick opt-outs: if either is null it returns immediately, if the memory addresses are the same the strings are equal, and a few more clever things if I remember correctly. For any Java folks reading this: do not do this in Java! It will work sometimes but not always.
If I did not answer your question, you need to improve your question.
I'm not certain I understand the depth of your question but, wouldn't you just do something like this,
public class SomeClass
{
private const string UrlValidatorRegex = "http://...
private static readonly DateTime MinValidSomeDate = ...
private static readonly DateTime MaxValidSomeDate = ...
public string SomeUrl { get; set; }
public DateTime SomeDate { get; set; }
...
private ValidationResult ValidateProperties()
{
var urlValidator = new Regex(UrlValidatorRegex);
if (!urlValidator.IsMatch(this.SomeUrl))
{
return new ValidationResult
{
IsValid = false,
Message = "SomeUrl format invalid."
};
}
if (this.SomeDate < MinValidSomeDate
    || this.SomeDate > MaxValidSomeDate)
{
return new ValidationResult
{
IsValid = false,
Message = "SomeDate outside permitted bounds."
};
}
...
// Check other fields and properties here, return false on failure.
...
return new ValidationResult
{
IsValid = true,
};
}
...
private struct ValidationResult
{
public bool IsValid;
public string Message;
}
}
The exact validation code would vary depending on how you would like your class to work, no? Consider a property of a familiar type,
public string SomeString { get; set; }
What are the valid values for this property? Both null and string.Empty may or may not be valid, depending on the class adorned with the property. There may be a maximal length that should be allowed, but these details would vary by implementation.
If any suggested answer is more complicated than code above without offering an increase in performance or functionality, can it be more efficient?
Is your question actually, how can I check the values on an object without having to write much code?
I have a very simple function which takes in a matching bitfield, a grid, and a square. It used to use a delegate but I did a lot of recoding and ended up with a bitfield & operation to avoid the delegate while still being able to perform matching within reason. Basically, the challenge is to find all contiguous elements within a grid which match the match bitfield, starting from a specific "leader" square.
Square is a somewhat small (but not tiny) class. Any tips on how to push this to be even faster? Note that the grid itself is pretty small (500 elements in this test).
Edit: It's worth noting that this function is called over 200,000 times per second. In truth, in the long run my goal will be to call it less often, but that's really tough, considering that my end goal is to make the grouping system be handled with scripts rather than being hardcoded. That said, this function is always going to be called more than any other function.
Edit: To clarify, the function does not check if leader matches the bitfield, by design. The intention is that the leader is not required to match the bitfield (though in some cases it will).
Things tried unsuccessfully:
Initializing the dictionary and stack with a capacity.
Casting the int to an enum to avoid a cast.
Moving the dictionary and stack outside the function and clearing them each time they are needed. This makes things slower!
Things tried successfully:
Writing a hashcode function instead of using the default: Hashcodes are precomputed and are equal to x + y * parent.Width. Thanks for the reminder, Jim Mischel.
mquander's Technique: See GetGroupMquander below.
Further optimization: Once I switched to HashSets, I got rid of the Contains test and replaced it with an Add test. Both Contains and Add have to seek the key, so just checking whether the Add succeeds is more efficient than calling Contains first and then Add. That is: if (RetVal.Add(s)) curStack.Push(s);
public static List<Square> GetGroup(int match, Model grid, Square leader)
{
Stack<Square> curStack = new Stack<Square>();
Dictionary<Square, bool> Retval = new Dictionary<Square, bool>();
curStack.Push(leader);
while (curStack.Count != 0)
{
Square curItem = curStack.Pop();
if (Retval.ContainsKey(curItem)) continue;
Retval.Add(curItem, true);
foreach (Square s in curItem.Neighbors)
{
if (0 != ((int)(s.RoomType) & match))
{
curStack.Push(s);
}
}
}
return new List<Square>(Retval.Keys);
}
=====
public static List<Square> GetGroupMquander(int match, Model grid, Square leader)
{
Stack<Square> curStack = new Stack<Square>();
Dictionary<Square, bool> Retval = new Dictionary<Square, bool>();
Retval.Add(leader, true);
curStack.Push(leader);
while (curStack.Count != 0)
{
Square curItem = curStack.Pop();
foreach (Square s in curItem.Neighbors)
{
if (0 != ((int)(s.RoomType) & match))
{
if (!Retval.ContainsKey(s))
{
curStack.Push(s);
Retval.Add(s, true);
}
}
}
}
return new List<Square>(Retval.Keys);
}
The code you posted assumes that the leader square matches the bitfield. Is that by design?
I assume your Square class has implemented a GetHashCode method that's quick and provides a good distribution.
You did say micro-optimization . . .
If you have a good idea how many items you're expecting, you'll save a little bit of time by pre-allocating the dictionary. That is, if you know you won't have more than 100 items that match, you can write:
Dictionary<Square, bool> Retval = new Dictionary<Square, bool>(100);
That will avoid having to grow the dictionary and re-hash everything. You can also do the same thing with your stack: pre-allocate it to some reasonable maximum size to avoid resizing later.
Since you say that the grid is pretty small it seems reasonable to just allocate the stack and the dictionary to the grid size, if that's easy to determine. You're only talking grid_size references each, so memory isn't a concern unless your grid becomes very large.
Adding a check to see if an item is in the dictionary before you do the push might speed it up a little. It depends on the relative speed of a dictionary lookup as opposed to the overhead of having a duplicate item in the stack. Might be worth it to give this a try, although I'd be surprised if it made a big difference.
if (0 != ((int)(s.RoomType) & match))
{
if (!Retval.ContainsKey(s))
curStack.Push(s);
}
I'm really stretching on this last one. You have that cast in your inner loop. I know that the C# compiler sometimes generates a surprising amount of code for a seemingly simple cast, and I don't know if that gets optimized away by the JIT compiler. You could remove that cast from your inner loop by creating a local variable of the enum type and assigning it the value of match:
RoomEnumType matchType = (RoomEnumType)match;
Then your inner loop comparison becomes:
if (0 != (s.RoomType & matchType))
No cast, which might shave some cycles.
Edit: Micro-optimization aside, you'll probably get better performance by modifying your algorithm slightly to avoid processing any item more than once. As it stands, items that do match can end up in the stack multiple times, and items that don't match can be processed multiple times. Since you're already using a dictionary to keep track of items that do match, you can keep track of the non-matching items by giving them a value of false. Then at the end you simply create a List of those items that have a true value.
public static List<Square> GetGroup(int match, Model grid, Square leader)
{
Stack<Square> curStack = new Stack<Square>();
Dictionary<Square, bool> Retval = new Dictionary<Square, bool>();
curStack.Push(leader);
Retval.Add(leader, true);
int numMatch = 1;
while (curStack.Count != 0)
{
Square curItem = curStack.Pop();
foreach (Square s in curItem.Neighbors)
{
if (Retval.ContainsKey(s))
    continue;
if (0 != ((int)(s.RoomType) & match))
{
curStack.Push(s);
Retval.Add(s, true);
++numMatch;
}
else
{
Retval.Add(s, false);
}
}
}
// LINQ makes this easier, but since you're using .NET 2.0...
List<Square> matches = new List<Square>(numMatch);
foreach (KeyValuePair<Square, bool> kvp in Retval)
{
if (kvp.Value == true)
{
matches.Add(kvp.Key);
}
}
return matches;
}
Here are a couple of suggestions -
If you're using .NET 3.5, you could change RetVal to a HashSet<Square> instead of a Dictionary<Square,bool>, since you're never using the values (only the keys) in the Dictionary. This would be a small improvement.
Also, if you changed the return to IEnumerable, you could just return the HashSet's enumerator directly. Depending on the usage of the results, it could potentially be faster in certain areas (and you can always use ToList() on the results if you really need a list).
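Putting those two suggestions together, here is a sketch of the method using HashSet<Square> and an iterator block (the Square class below is a minimal stand-in for the question's type, just to keep the sketch self-contained; the real class would keep its RoomType enum and cast):

```csharp
using System.Collections.Generic;

// Minimal stand-in for the question's Square, so the sketch compiles.
class Square
{
    public int RoomType;
    public List<Square> Neighbors = new List<Square>();
}

static class Grouping
{
    // HashSet replaces Dictionary<Square, bool>: Add returns false
    // for duplicates, so the Contains check folds into the Add call.
    // The leader is included without a match check, per the question's design.
    public static IEnumerable<Square> GetGroup(int match, Square leader)
    {
        var seen = new HashSet<Square>();
        var stack = new Stack<Square>();
        seen.Add(leader);
        stack.Push(leader);
        while (stack.Count != 0)
        {
            Square current = stack.Pop();
            yield return current;
            foreach (Square s in current.Neighbors)
            {
                if (0 != (s.RoomType & match) && seen.Add(s))
                    stack.Push(s);
            }
        }
    }
}
```

Because it is an iterator block, the caller can stop enumerating early without paying for the rest of the flood fill, or call ToList() when a materialized list is needed.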
However, there is a BIG optimization that could be added here -
Right now, you're always adding in every neighbor, even if that neighbor has already been processed. For example, when leader is processed, it adds in leader+1y, then when leader+1y is processed, it puts BACK in leader (even though you've already handled that Square), and next time leader is popped off the stack, you continue. This is a lot of extra processing.
Try adding:
foreach (Square s in curItem.Neighbors)
{
if ((0 != ((int)(s.RoomType) & match)) && (!Retval.ContainsKey(s)))
{
curStack.Push(s);
}
}
This way, if you've already processed the square of your neighbor, it doesn't get re-added to the stack, just to be skipped when it's popped later.