Switch inside loops impacting performance? - c#

I'm in a scenario where I'm looping through data and formatting it in specific ways based on a setting, and I'm concerned that what I feel is best stylistically might impede performance.
The basic pattern of the code is as follows:
enum setting { single, multiple, foo, bar };

Data data = getData(conn, id);
setting blah = data.getSetting();
foreach (Item item in data)
{
    switch (blah)
    {
        case setting.single:
            processDataSingle(item, blah);
            break;
        ...
    }
}
My concern is that there might be thousands, or even tens of thousands, of items in data. I was wondering whether having the switch inside the loop, where it may be evaluated repeatedly, might cause serious performance issues. I know I could put the switch before the loop, but then each case would contain the loop, which seems much less readable, since it's less apparent that the basic operation remains the same.

You could set up a delegate/action once, then call it every time in the loop:
Data data = getData(conn, id);
setting blah = data.getSetting();
Action<Item> doThis;
switch (blah)
{
    case setting.single:
        doThis = i => processDataSingle(i, blah);
        break;
    ...
}
foreach (Item item in data)
{
    doThis(item);
}
Basically, put the body of each "case" in an Action, select that Action in your switch outside the loop, and call the Action in the loop.

You could create a method to keep readability, then pass the data to the method:
void processAllData(IEnumerable<Item> data, setting blah)
{
    switch (blah)
    {
        case setting.single:
            foreach (Item item in data)
            {
                // process item for the single case
            }
            break;
        // next case, next loop ...
    }
}
Then it's just a one-liner:
processAllData(data, blah);
This approach is readable since it encapsulates complexity, concise since you only see what you have to see, and efficient since you can optimize each case.

By using an Action delegate this way, you can factor your code considerably:
enum setting { single, multiple, foo, bar };

Data data = getData(conn, id);
var processAll = new Action<Action<Item>>(action =>
{
    foreach (var item in data)
        action(item);
});
setting blah = data.getSetting();
switch (blah)
{
    case setting.single:
        processAll(item => processDataSingle(item, blah));
        break;
    ...
}

It certainly does have the potential to affect performance if you're talking about possibly running the comparison tens of thousands of times or more. The other problem that could potentially arise in the code that you've written here is what happens if you need to add to your enum. Then you'd need to open up this code and adjust it to take care of that circumstance, which violates the Open/Closed Principle.
The best way, IMO, to solve both problems at once would be to use a Factory pattern to take care of this (see posts here and here for some advice on starting that). All you'd need to do is have an interface whose implementations would call the method that you'd want to call in your switch code above. Create a factory and have it pick which implementation to return back to your code (before the loop) based on the enum passed in. At that point all your loop needs to do is to call that interface method which will do exactly what you wanted.
Afterwards, any future feature additions will only require you to create another implementation of that interface, and adjust the enum accordingly. No muss, no fuss.
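A minimal sketch of that factory idea. The interface, class, and factory names here are invented for illustration, and stub Item and setting types are included so the sketch compiles on its own:

```csharp
using System;

// Stub types for illustration; in the real code these already exist.
public class Item { }
public enum setting { single, multiple, foo, bar }

public interface IItemProcessor
{
    void Process(Item item);
}

public class SingleProcessor : IItemProcessor
{
    public void Process(Item item) { /* single-specific formatting */ }
}

public class MultipleProcessor : IItemProcessor
{
    public void Process(Item item) { /* multiple-specific formatting */ }
}

public static class ProcessorFactory
{
    // The switch runs exactly once, before the loop.
    public static IItemProcessor Create(setting s)
    {
        switch (s)
        {
            case setting.single:   return new SingleProcessor();
            case setting.multiple: return new MultipleProcessor();
            default: throw new ArgumentOutOfRangeException(nameof(s));
        }
    }
}
```

The loop then only ever sees the interface: pick the processor once with ProcessorFactory.Create(data.getSetting()) and call processor.Process(item) inside the foreach.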

It's almost certainly slower to put the switch in the loop like that. Whether it's significant or not is impossible to say - use a Stopwatch to see.
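For instance, a rough micro-benchmark sketch (the data and the switch body are made up; real numbers depend entirely on your machine and workload):

```csharp
using System;
using System.Diagnostics;

public static class SwitchBenchmark
{
    public static long SumWithSwitch(int[] data, int mode)
    {
        long sum = 0;
        foreach (int x in data)
        {
            switch (mode)   // re-evaluated on every iteration
            {
                case 1: sum += x; break;
                default: sum -= x; break;
            }
        }
        return sum;
    }

    public static void Main()
    {
        var data = new int[100000];
        for (int i = 0; i < data.Length; i++) data[i] = 1;

        var sw = Stopwatch.StartNew();
        long total = SumWithSwitch(data, 1);
        sw.Stop();
        Console.WriteLine($"sum={total}, elapsed={sw.Elapsed.TotalMilliseconds} ms");
    }
}
```

Run the same measurement with the switch hoisted out of the loop and compare; if the two times are indistinguishable for your data sizes, prefer whichever version reads better.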

If the values in the switch statement are close to one another, the compiler will produce a lookup table instead of N if statements. That increases performance, but it's hard to say when the compiler will decide to do this.
Instead, you can create a Dictionary<switchType, Delegate>, populate it with value/action pairs, and then selecting the appropriate action takes about O(1), as a dictionary is a hash table:
dictionary[value].Invoke();
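A sketch of that dictionary dispatch. The Setting enum and the handlers are invented for illustration; Func is used instead of Action so the result is observable:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical enum standing in for the switch values.
public enum Setting { Single, Multiple }

public static class Dispatch
{
    static readonly Dictionary<Setting, Func<string, string>> handlers =
        new Dictionary<Setting, Func<string, string>>
        {
            { Setting.Single,   item => "single: " + item },
            { Setting.Multiple, item => "multiple: " + item },
        };

    public static string Process(Setting s, string item)
    {
        // One O(1) hash lookup, then a delegate invocation.
        return handlers[s](item);
    }
}
```

Note that the delegate invocation itself has some overhead, so for a handful of dense integer cases a plain switch (compiled to a jump table) can still be faster; the dictionary wins on flexibility.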

Related

Difference between using where/lambda and for each

I'm new to C# and came across this statement performed on a dictionary.
_objDictionary.Keys.Where(a => (a is fooObject)).ToList().ForEach(a => ((fooObject)a).LaunchMissles());
My understanding is that this essentially puts every key that is a fooObject into a list, then calls the LaunchMissles method of each. How is that different from using a foreach loop like this?
foreach (var entry in _objDictionary.Keys)
{
    if (entry is fooObject)
    {
        entry.LaunchMissles();
    }
}
EDIT: The resounding opinion appears to be that there is no functional difference.
This is a good example of abusing LINQ: the statement did not become more readable or better in any other way, but some people just like to put LINQ everywhere. In this case, though, you might take the best of both worlds by doing:
foreach (var entry in _objDictionary.Keys.OfType<fooObject>())
{
    entry.LaunchMissles();
}
Note that in your foreach example you are missing a cast to FooObject to invoke LaunchMissles.
In general, LINQ is no voodoo magic; under the hood it does the same work you would otherwise write yourself. LINQ just makes things easier to write, but it won't beat regular code performance-wise (when the two really are equivalent).
In your case, your "oldschool" approach is perfectly fine and, in my opinion, the preferable one:
foreach (var entry in _objDictionary.Keys)
{
    fooObject foo = entry as fooObject;
    if (foo != null)
    {
        foo.LaunchMissles();
    }
}
Regarding the LINQ approach: materializing the sequence to a List just to call a method on it that does the same as the code above is simply wasting resources and making it less readable.
In your example it doesn't make a difference, but if the source weren't a collection (like Dictionary.Keys is) but an IEnumerable that really works lazily, there could be a huge impact.
Lazy evaluation is designed to yield items as they are needed; calling ToList in between first gathers all items before actually executing the ForEach, while the plain foreach approach gets one item, processes it, then gets the next, and so on.
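A small sketch of that difference, using an invented iterator that logs when each item is produced and when it is processed:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyDemo
{
    public static List<string> Log = new List<string>();

    // Lazy sequence: logs each time an item is yielded.
    static IEnumerable<int> Numbers()
    {
        for (int i = 1; i <= 3; i++)
        {
            Log.Add($"yield {i}");
            yield return i;
        }
    }

    public static void Main()
    {
        // ToList() drains the whole sequence before any processing happens:
        Numbers().ToList().ForEach(n => Log.Add($"process {n}"));
        // Log: yield 1, yield 2, yield 3, process 1, process 2, process 3

        Log.Clear();

        // A plain foreach interleaves production and processing:
        foreach (int n in Numbers())
            Log.Add($"process {n}");
        // Log: yield 1, process 1, yield 2, process 2, yield 3, process 3
    }
}
```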
If you really want a "LINQ foreach", then don't use the List implementation but roll your own extension method (as mentioned in the comments below your question):
public static class EnumerableExtensionMethods
{
    public static void ForEach<T>(this IEnumerable<T> sequence, Action<T> action)
    {
        foreach (T item in sequence)
            action(item);
    }
}
Even then, a regular foreach should be preferred, unless you put the foreach body into a separate method:
sequence.ForEach(_methodThatDoesThejob);
That is the only way of using this that I find acceptable.

Using ConcurrentBag correctly

Edit: Thank you, you made me realise that the code below is not working as I assumed; somehow I thought that a ConcurrentBag works like a HashSet. Sorry about that, you saved me some headache :)
The following function is the only one that can change _currentSetOfStepsProcessing, and it can be called from different threads. I am not sure I have understood the use of a ConcurrentBag correctly, so please let me know whether, in your opinion, this can work. The _stepsToDo data structure is never modified once the process starts.
void OnStepDone(InitialiseNewUserBase obj)
{
    var stepToDo = _stepsToDo[_currentSetOfStepsProcessing];
    stepToDo.TryTake(out obj);
    if (stepToDo.Count == 0) // can I assume it will enter here once per ConcurrentBag?
    {
        if (_currentSetOfStepsProcessing < _stepsToDo.Count - 1)
        {
            _currentSetOfStepsProcessing++;
        }
    }
}

List<ConcurrentBag<InitialiseNewUserBase>> _stepsToDo = new List<ConcurrentBag<InitialiseNewUserBase>>();
Action _onFinish;
int _currentSetOfStepsProcessing;
stepToDo.TryTake(out obj); might fail, and you don't handle that.
Also, why are you out-referencing the method argument? That simply overwrites the argument; why take an argument only to throw it away? More likely, this is a misunderstanding of some kind.
As for "can I assume it will enter here once per ConcurrentBag?": since access to the bag is concurrent, multiple threads might see a count of 0, so no, you cannot, and you need to handle that case better.
Probably, you should not make things so difficult and should instead use lock in combination with non-concurrent data structures. Concurrent collections would only be a good idea if there were a high frequency of bag operations, which seems unlikely here.
What about this:
foreach (/*processing step*/) {
Parallel.ForEach(/*item in the step*/, x => { ... });
}
Much simpler.
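Fleshed out, that suggestion might look like this sketch (the generic step/item shapes are invented; the point is that steps run sequentially while items within a step run in parallel):

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class StepRunner
{
    // Steps run one after another; items inside a step run in parallel.
    // No shared counter like _currentSetOfStepsProcessing is needed:
    // Parallel.ForEach blocks until every item in the step is done.
    public static void RunAll<T>(List<List<T>> steps, Action<T> processItem)
    {
        foreach (List<T> step in steps)
        {
            Parallel.ForEach(step, processItem);
        }
    }
}
```

Note that processItem may still touch shared state, in which case it must synchronize (for example with Interlocked or a lock), but the step-sequencing logic itself becomes trivial.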

which is more efficient in conditional looping?

Suppose I have the following collection:
IEnumerable<car> cars = new List<car>();
Now I need to loop over this collection and do some work depending on the car type, so I can do it in one of the following ways:
Method A:
foreach (var item in cars)
{
    if (item.color == white)
    {
        doSomething();
    }
    else
    {
        doSomeOtherThing();
    }
}
or the other way:
Method B:
foreach (var item in cars.Where(c => c.color == white))
{
    doSomething();
}
foreach (var item in cars.Where(c => c.color != white))
{
    doSomeOtherThing();
}
To me, method A seems better because I loop over the collection only once, while method B is enticing because the framework does the looping and filtering for you.
So which method is better and faster?
Well, it depends on how complicated the filtering process is. It may be so insanely efficient that it's irrelevant, especially in light of the fact that you're no longer having to do your own filtering with the if statement.
I'll say one thing: unless your collections are massive, it probably won't make enough of a difference to care. And, sometimes, it's better to optimise for readabilty rather than speed :-)
But, if you really want to know, you measure! Time the operations in your environment with suitable production-like test data. That's the only way to be certain.
Method A is more readable than method B. Just one question: is it car.color or item.color?

Should I create a class or create ifs?

I have a situation: I need to do something with a class.
Which would be more efficient: modifying the method this way with ifs, or creating a method for each action?
public value Value(int command)
{
    if (command == 1)
    {
        return DoSomething1();
    }
    else if (command == 2)
    {
        return DoSomething2();
    }
    else
    {
        return empty();
    }
}
There are going to be 50 or more of these commands.
What is better in terms of performance of execution and size of the executable?
At a high-level, it looks like you're trying to implement some kind of dynamic-dispatch system? Or are you just wanting to perform a specified operation without any polymorphism? It's hard to tell.
Anyway, based on the example you've given, a switch block would be the most performant, as the compiler converts it into an efficient jump-table or hashtable lookup instead of a series of comparisons, so just do this:
enum Command { // observe how I use an enum instead of "magic" integers
    DoSomethingX = 1,
    DoSomethingY = 2
}

public Value GetValue(Command command) {
    switch (command) {
        case Command.DoSomethingX: return DoSomethingX();
        case Command.DoSomethingY: return DoSomethingY();
        default: return GetEmpty();
    }
}
Note also that the switch block gives you more compact code.
This isn't a performance problem as much as it is a paradigm problem.
In C# a method should be an encapsulation of a task. What you have here is a metric boatload of tasks, each unrelated. That should not be in a single method. Imagine trying to maintain this method in the future. Imagine trying to debug this, wondering where you are in the method as each bit is called.
Your life will be much easier if you split this out, though the performance will probably make no difference.
Although separate methods will almost certainly be better in terms of performance, it is highly unlikely that you would notice the difference. However, separate methods should definitely improve readability a lot, which is far more important.

Would you use regions within long switch/enum declarations?

I've recently found myself needing (yes, needing) to define absurdly long switch statements and enum declarations in C# code, but I'm wondering what people feel is the best way to split them into logical subsections. In my situation, both the enum values and the cases (which are based on the enum values) have fairly clear groupings, yet I am slightly unsure how to reflect this in code.
Note that in my code, I have roughly 5 groups of between 10 and 30 enum values/cases each.
The three vaguely sensible options I can envisage are:
Define #region blocks around all logical groups of cases/enum values within the declaration (optionally separated by blank lines).
Comment each group with its name, with a blank line before each group-name comment.
Do nothing whatsoever - simply leave the switch/enum as a huge list of cases/values.
Which do you prefer? Would you treat enums and switches separately? (That would seem slightly odd to me.) I wouldn't say there is any right or wrong answer to this question, but I would nonetheless be quite interested in hearing the general consensus of views.
Note 1: This situation where I might potentially have an extremely long enum declaration of 50/100+ values is unfortunately unavoidable (and similarly with the switch), since I am attempting to write a lexer (tokeniser), and this would thus seem the most reasonable approach for several reasons.
Note 2: I am fully aware that several duplicate questions already exist on the question of whether to use regions in general code (for structuring classes, mainly), but I feel my question here is much more specific and hasn't yet been addressed.
Sure, region those things up. They probably don't change much, and when they do, you can expand the region, make your changes, collapse it, and move on to the rest of the file.
They are there for a reason, use them to your advantage.
You could also have a Dictionary<your_enum_type, Action> (or Func instead of Action), or something like that, considering your functions have similar signatures. Then, instead of using a switch like:
switch (item)
{
    case Enum1:
        func1(par1, par2);
        break;
    case Enum2:
        func2(par1, par2);
        break;
}
you could have something like:
public class MyClass
{
    Dictionary<int, Action<int, int>> myDictionary = new Dictionary<int, Action<int, int>>();
    // These could hold only static methods as well
    Group1Object myObject1;
    Group2Object myObject2;

    public MyClass()
    {
        // Again, you wouldn't have to initialize these if their methods were static
        myObject1 = new Group1Object();
        myObject2 = new Group2Object();
        BuildMyDictionary();
    }

    private void BuildMyDictionary()
    {
        InsertGroup1Functions();
        InsertGroup2Functions();
        //...
    }

    private void InsertGroup2Functions()
    {
        myDictionary.Add(1, myObject2.AnAction2);
        myDictionary.Add(2, myObject2.AnotherAction2);
    }

    private void InsertGroup1Functions()
    {
        myDictionary.Add(3, myObject1.AnAction1);
        myDictionary.Add(4, myObject1.AnotherAction1);
    }

    public void DoStuff()
    {
        int t = 3; // Get it from wherever
        // instead of a switch
        myDictionary[t](arg1, arg2);
    }
}
I would leave it as a huge list of cases/ values.
If some cases share the same code block, the Strategy design pattern could remove the switch block entirely. This can create a lot of classes for you, but it will show how complex the code really is and split the logic into smaller classes.
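A minimal sketch of that Strategy idea (the TokenKind enum and strategy names are invented, loosely themed on the lexer mentioned in the question; each case body becomes its own class):

```csharp
using System;
using System.Collections.Generic;

public enum TokenKind { Keyword, Operator }

public interface ITokenStrategy
{
    string Handle(string token);
}

public class KeywordStrategy : ITokenStrategy
{
    public string Handle(string token) => "keyword:" + token;
}

public class OperatorStrategy : ITokenStrategy
{
    public string Handle(string token) => "operator:" + token;
}

public static class TokenDispatcher
{
    // The former switch becomes a lookup table of strategies.
    static readonly Dictionary<TokenKind, ITokenStrategy> strategies =
        new Dictionary<TokenKind, ITokenStrategy>
        {
            { TokenKind.Keyword,  new KeywordStrategy() },
            { TokenKind.Operator, new OperatorStrategy() },
        };

    public static string Dispatch(TokenKind kind, string token)
        => strategies[kind].Handle(token);
}
```

Adding a new group then means adding one enum value, one strategy class, and one dictionary entry, rather than growing a giant switch.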
Get rid of the enums and make them into objects. You could then call methods on your objects and keep the code separated, maintainable, and not a nightmare.
There are very few cases when you would actually need an enum instead of an object, and nobody likes long switch statements.
Here's a good shortcut for people who use regions.
I was switching between Eclipse and Visual Studio and tried to go full screen in VS by pressing
Ctrl+M, M
and lo and behold, the region collapsed and expanded!
