I am trying to find the fastest way to set a specific property of every item in a generic list.
Basically, the requirement is to iterate over a list of items, resetting the IsHit property to FALSE. Only the items referenced in a second "hit" list should then be set to TRUE.
My first attempt looked like this:
listItems.ForEach(delegate(Item i) { i.IsHit = false; });
foreach (int hitIndex in hits)
{
listItems[hitIndex - 1].IsHit = true;
}
Note: hits is 1-based, the items list is 0-based.
Then I tried to improve the speed and came up with this:
for (int i = 0; i < listItems.Count; i++)
{
bool hit = false;
for (int j = 0; j < hits.Count; j++)
{
if (i == hits[j] - 1)
{
hit = true;
hits.RemoveAt(j);
break;
}
}
if (hit)
{
this.listItems[i].IsHit = true;
}
else
{
this.listItems[i].IsHit = false;
}
}
I know this is a micro-optimization, but it is really time-sensitive code, so it makes sense to improve this code even at the expense of readability... and just for fun of course ;-)
Unfortunately I don't really see any way to improve the code further. But I probably missed something.
Thanks
PS: Code in C# / .NET 2.0 would be preferred.
I ended up switching to Eamon Nerbonne's solution. But then I noticed something weird in my benchmarks.
The delegate:
listItems.ForEach(delegate(Item i) { i.IsHit = false; });
is faster than:
foreach (Item i in listItems)
{
i.IsHit = false;
}
How is that possible?
I tried to look at the IL, but that's just way over my head... I only see that the delegate version results in fewer lines, whatever that means.
Can you put the items of your second list in a dictionary?
If so, you can do this:
for (int i = 0; i < firstList.Count; i++)
{
    firstList[i].IsHit = false;
    if (secondList.ContainsKey(firstList[i].Id))
    {
        secondList.Remove(firstList[i].Id);
        firstList[i].IsHit = true;
    }
}
Where secondList is a Dictionary, of course.
By putting the items of your hit list in a Dictionary, you can check with an O(1) operation whether an item is contained in that list.
In the code above, I use some kind of unique identifier of the Item as the key in the dictionary.
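For example, a minimal sketch of building that lookup from the hit list (the int Id property on Item and the hitIds list are assumptions for illustration):

// Sketch only: assumes each Item has an int Id and that hitIds holds the ids of the "hit" items.
Dictionary<int, bool> hitLookup = new Dictionary<int, bool>();
foreach (int id in hitIds)
{
    hitLookup[id] = true;   // the value is irrelevant; only the key matters
}

for (int i = 0; i < firstList.Count; i++)
{
    // ContainsKey is an O(1) hash lookup instead of an O(n) scan of the hit list.
    firstList[i].IsHit = hitLookup.ContainsKey(firstList[i].Id);
}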
A nested for-loop is overkill, and in particular, the "RemoveAt" call itself is effectively yet another loop. All in all, your second optimized version has a worse time complexity than the first solution, in particular when there are many hits.
The fastest solution is likely to be the following:
foreach(var item in listItems)
item.IsHit = false;
foreach (int hitIndex in hits)
listItems[hitIndex - 1].IsHit = true;
This avoids the inefficient nested for-loops, and it avoids the overhead of the delegate-based .ForEach method (which is a fine method, but not in performance-critical code). It involves setting IsHit slightly more often, but most property setters are trivial, so this is probably not a bottleneck. A quick micro-benchmark serves as a fine sanity check in any case.
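For instance, a rough sanity check might look like this (sketch only; it assumes System.Diagnostics.Stopwatch and simply times many repetitions of the reset-then-mark pass):

// Rough micro-benchmark sketch: run in Release mode, outside the debugger.
Stopwatch sw = Stopwatch.StartNew();
for (int run = 0; run < 1000; run++)
{
    foreach (Item item in listItems)
        item.IsHit = false;
    foreach (int hitIndex in hits)
        listItems[hitIndex - 1].IsHit = true;
}
sw.Stop();
Console.WriteLine("1000 runs took {0} ms", sw.ElapsedMilliseconds);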
Only if setting IsHit is truly slow will the following be quicker:
bool[] isHit = new bool[listItems.Count]; //default:false.
//BitArray isHit = new BitArray(listItems.Count);
//BitArray is potentially faster for very large lists.
foreach (int hitIndex in hits)
isHit [hitIndex - 1] = true;
for(int i=0; i < listItems.Count; i++)
listItems[i].IsHit = isHit[i];
Finally, consider using an array rather than a List<>. Arrays are generally faster if you can avoid needing the List<> type's insertion/removal methods.
The var keyword is C# 3.0 but can be used when targeting .NET 2.0 (new language features don't require newer library versions, in general - it's just that they're most useful with those newer libs). Of course, you know the type with which List<> is specialized, and can explicitly specify it.
You could maybe sort the hits collection and perform a binary search; then you would be O(n log n) instead of O(n²).
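A sketch of that idea (it assumes hits is a List<int> and that re-ordering it is acceptable):

hits.Sort();   // O(n log n)
for (int i = 0; i < listItems.Count; i++)
{
    // hits is 1-based, so look for i + 1; BinarySearch returns a non-negative index on a hit.
    listItems[i].IsHit = hits.BinarySearch(i + 1) >= 0;
}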
Related
My question is: when I loop through a list with a for loop and add elements to it while looping, does the loop count the elements added during the iteration?
Simple code example:
for (int i = 0; i < listOfIds.Count(); i++) // Does loop counts the items added below?
{
foreach (var post in this.Collection)
{
if (post.ResponsePostID == listOfIds.ElementAt(i))
{
listOfIds.Add(post.PostId); // I add new item to list in here
}
}
}
I hope my explanation is good enough for you to understand what my question is.
Yes, it usually does. But changing a collection at the same time you're iterating over it can lead to weird behavior and hard-to-find bugs. It isn't recommended at all.
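To illustrate the difference (a minimal sketch, not your exact code): a for loop happily keeps reading the growing Count, while a foreach over the same list throws as soon as the list is modified:

List<int> ids = new List<int>(new int[] { 1, 2, 3 });

// A for loop re-reads ids.Count on every iteration, so it also visits the
// newly added items; without an extra stop condition it would never end.

// A foreach detects the modification and throws on the next MoveNext():
try
{
    foreach (int id in ids)
    {
        ids.Add(id + 10);   // modifying the list being enumerated
    }
}
catch (InvalidOperationException)
{
    Console.WriteLine("Collection was modified during enumeration.");
}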
If you want the loop to run only for the pre-existing item count, snapshot the count first:
int nLstCount = listOfIds.Count();
for (int i = 0; i < nLstCount ; i++)
{
foreach (var post in this.Collection)
{
if (post.ResponsePostID == listOfIds.ElementAt(i))
{
listOfIds.Add(post.PostId);
}
}
}
Yes, it surely will. The inner foreach loop will execute and add elements to the outer collection, and thus will increment the element count.
listOfIds.Count == 2 // iteration 1
listOfIds.Add(/* element */)
// when control returns to the for loop's condition:
listOfIds.Count == 3 // iteration 2
As a slightly abridged explanation of the for loop: you're essentially defining the following:
for (initializer; condition; iterator)
body
Your initializer will establish your initial conditions, and will only happen once (effectively outside the loop).
Your condition will be evaluated every time to determine whether your loop should run again, or simply exit.
Your iterator defines an action that will occur after each iteration in your loop.
So in your case, your loop will re-evaluate listOfIds.Count() each time to decide whether it should run again; that may or may not be your desired behaviour.
As Dennis points out, you can get yourself into a bit of a mess (your loop may run infinitely) if you aren't careful.
A much more detailed/better written explanation can be found on msdn: http://msdn.microsoft.com/en-us/library/ch45axte.aspx
I have code that needs to know that a collection is not empty and does not contain only one item.
In general, I want an extension of the form:
bool collectionHasAtLeast2Items = collection.AtLeast(2);
I can write an extension easily, enumerating over the collection and incrementing a counter until I hit the requested size or run out of elements, but is there something already in the LINQ framework that would do this? My thoughts (in order of when they came to me) are:
bool collectionHasAtLeast2Items = collection.Take(2).Count() == 2; or
bool collectionHasAtLeast2Items = collection.Take(2).ToList().Count == 2;
Either of which seems to work, though the behaviour of taking more elements than the collection contains is not defined in the documentation (Enumerable.Take Method); however, it seems to do what one would expect.
It's not the most efficient solution: either I enumerate once to take the elements and then again to count them, which is unnecessary, or I enumerate once to take the elements and then construct a list just to get its Count property, which isn't enumerator-y, as I don't actually want the list.
It's not pretty, as I always have to make two assertions, first taking 'x', then checking that I actually received 'x', and it depends upon undocumented behaviour.
Or perhaps I could use:
bool collectionHasAtLeast2Items = collection.ElementAtOrDefault(2) != null;
However, that's not semantically clear. Maybe the best option is to wrap it in a method whose name says what I mean. I'm assuming this will be efficient; I haven't reflected on the code.
Some other thoughts are using Last(), but I explicitly don't want to enumerate through the whole collection.
Or maybe Skip(2).Any(), again not semantically completely obvious, but better than ElementAtOrDefault(2) != null, though I would think they produce the same result?
Any thoughts?
public static bool AtLeast<T>(this IEnumerable<T> source, int count)
{
// Optimization for ICollection<T>
var genericCollection = source as ICollection<T>;
if (genericCollection != null)
return genericCollection.Count >= count;
// Optimization for ICollection
var collection = source as ICollection;
if (collection != null)
return collection.Count >= count;
// General case
using (var en = source.GetEnumerator())
{
int n = 0;
while (n < count && en.MoveNext()) n++;
return n == count;
}
}
You can use Count() >= 2 if your sequence implements ICollection.
Behind the scenes, the Enumerable.Count() extension method checks whether the sequence being enumerated implements ICollection. If it does, the Count property is returned, so the expected performance is O(1).
Thus ((IEnumerable<T>)((ICollection)sequence)).Count() >= x should also be O(1).
You could use Count, but if performance is an issue, you will be better off with Take.
bool atLeastX = collection.Take(x).Count() == x;
Since Take (I believe) uses deferred execution, it will only go through the collection once.
abatishchev mentioned that Count is O(1) with ICollection, so you could do something like this and get the best of both worlds.
IEnumerable<int> col;
// set col
int x;
// set x
bool atLeastX;
if (col is ICollection<int>)
{
atLeastX = col.Count() >= x;
}
else
{
atLeastX = col.Take(x).Count() == x;
}
You could also use Skip/Any; in fact, I bet it would be even faster than Take/Count.
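A sketch of that variant (note the index adjustment: Skip(x - 1).Any() is true exactly when the sequence has at least x items, for x >= 1):

// At least x items, without counting the whole sequence:
bool atLeastX = collection.Skip(x - 1).Any();

// e.g. at least 2 items:
bool collectionHasAtLeast2Items = collection.Skip(1).Any();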
In .NET, does using "foreach" to iterate an instance of IEnumerable create a copy? Should I prefer to use "for" instead of "foreach"?
I wrote some code to test this:
struct ValueTypeWithOneField
{
private Int64 field1;
}
struct ValueTypeWithFiveField
{
private Int64 field1;
private Int64 field2;
private Int64 field3;
private Int64 field4;
private Int64 field5;
}
public class Program
{
static void Main(string[] args)
{
Console.WriteLine("one field");
Test<ValueTypeWithOneField>();
Console.WriteLine("-----------");
Console.WriteLine("Five field");
Test<ValueTypeWithFiveField>();
Console.ReadLine();
}
static void Test<T>()
{
var test = new List<T>();
for (int i = 0; i < 5000000; i++)
{
test.Add(default(T));
}
Stopwatch sw = new Stopwatch();
for (int i = 0; i < 5; i++)
{
sw.Start();
foreach (var item in test)
{
}
sw.Stop();
Console.WriteLine("foreach " + sw.ElapsedMilliseconds);
sw.Restart();
for (int j = 0; j < test.Count; j++)
{
T temp = test[j];
}
sw.Stop();
Console.WriteLine("for " + sw.ElapsedMilliseconds);
sw.Reset();
}
}
}
And this is the result that I got after I ran the code:
one field
foreach 68
for 72
foreach 68
for 72
foreach 67
for 72
foreach 64
for 73
foreach 68
for 72
-----------
Five field
foreach 272
for 193
foreach 273
for 191
foreach 272
for 190
foreach 271
for 190
foreach 275
for 188
As we can see in the result, "foreach" always takes more time than "for".
So should I prefer to use "for" instead of "foreach" when iterating through a generic collection of value types?
Note: thanks for the reminder, I edited the code and results. But still, foreach is running slower than for.
Your question is way, way too complex. Break it down.
Does using “foreach” to iterate a sequence of value types create a copy of the sequence?
No.
Does using "foreach" to iterate a sequence of value types create a copy of each value?
Yes.
Does using "for" to do an equivalent iteration of an indexed sequence of value types create a copy of each value?
Usually, yes. There are things you can do to avoid the copying if you know special things about the collection, like for instance that it is an array. But in the general case of indexed collections, indexing the sequence returns a copy of the value in the sequence, not a reference to a storage location containing the value.
Does doing anything to a value type make a copy of the value?
Just about. Value types are copied by value. That's why they're called value types. The only things that you do to value types that do not make a copy are calls to methods on the value type, and passing a value type variable using "out" or "ref". Value types are copied constantly; that's why value types are often slower than reference types.
Does using "foreach" or "for" to iterate a sequence of reference type copy the reference?
Yes. The value of an expression of reference type is a reference. That reference is copied whenever it is used.
So what's the difference between value types and reference types as far as their copying behaviour is concerned?
Value types are copied by value. Reference types copy the reference but not the thing being referred to. A 16-byte value type copies 16 bytes every time you use it. A 16-byte reference type copies the 4 (or 8) byte reference every time you use it.
Is the foreach loop slower than the for loop?
Often it is. The foreach loop is often doing more work, in that it is creating an enumerator and calling methods on the enumerator, instead of just incrementing an integer. Integer increments are extremely fast. Also don't forget that the enumerator in a foreach loop has to be disposed, and that can take time as well.
Should I use the for loop instead of the foreach loop because the for loop is sometimes a few microseconds faster?
No. That's dumb. You should make smart engineering decisions based on customer-focussed empirical data. The extra burden of a foreach loop is tiny. The customer will probably never notice. What you should do is:
Set performance goals based on customer input
Measure to see if you've met your goals
If you have not, find the slowest thing using a profiler
Fix it
Repeat until you've met your goals
Odds are extremely good that if you have a performance problem, changing a foreach loop to a for loop will make no difference whatsoever to your problem. Write the code the way it looks clear and understandable first.
Your test is not accurate; in the foreach version, you're actually spinning up the enumerator and retrieving each value from the list (even though you aren't using it). In the for version, you aren't doing anything with the list at all, other than looking at its Count property. You're essentially testing the performance of an enumerator traversing a collection compared to incrementing an integer variable an equivalent number of times.
To create parity, you'd need to declare a temporary variable and assign it in each iteration of the for loop.
That being said, the answer to your question is yes. A copy of the value will be created with every assignment or return statement.
Performance
This pseudocode breakdown should explain why foreach is somewhat slower than using for in this particular instance:
foreach:
try
{
var en = test.GetEnumerator(); //creates a ListEnumerator
T item;
while(en.MoveNext()) // MoveNext increments the current index and returns
// true if the new index is valid, or false if it's
// beyond the end of the list. If it returns true,
// it retrieves the value at that index and holds it
// in an instance variable
{
item = en.Current; // Current retrieves the value of the current instance
// variable
}
}
finally { }
for:
int index = -1;
T item;
while(++index < test.Count)
{
item = test[index];
}
As you can see, there's simply less code in the for implementation, and foreach has a layer of abstraction (the enumerator) on top of the for. I wrote the for using a while loop to show the two versions in a similar representation.
With all that said...
You're talking about a trivial difference in execution time. Use the loop that makes the code clearer and smaller, and in this circumstance that looks like foreach.
You're not resetting the stopwatch after the "for" test, so the time taken in the "for" test is being added to the subsequent "foreach" test. Also, as others have correctly pointed out, you should do an assignment inside the "for" loop to mimic the exact behaviour of the foreach.
sw.Start();
foreach (var item in test)
{
}
sw.Stop();
Console.WriteLine("foreach " + sw.ElapsedMilliseconds);
sw.Restart();
for (int j = 0; j < test.Count; j++)
{
T temp = test[j];
}
sw.Stop();
Console.WriteLine("for " + sw.ElapsedMilliseconds);
sw.Reset(); // -- This bit is missing!
In your for loop, I don't see you actually accessing items from test. If you add var x = test[i]; inside the for loop, you'll see that the performance is (virtually) the same.
Every access to a value-type element creates a copy, whether via foreach or via the list's indexer in a for loop.
Here's a discussion on the topic: Why should I use foreach instead of for (int i=0; i<length; i++) in loops?
I think that foreach provides an abstract way of looping through, but it is technically slower than the for loop; a good article on the differences between the for loop and foreach can be found here.
Your test is not fair. Consider how the foreach loop operates. You have the following code:
foreach (var item in test)
{
}
This creates a variable item, and on each iteration fetches the next object from the collection, and assigns it to item. This fetch and assign shouldn't create a copy, but it does take time to access the underlying collection and assign the correct value to the variable.
Then you have this code:
for (int j = 0; j < test.Count; j++)
{
}
This does not access the underlying collection at all. It does not read and assign a variable on each iteration. It simply increments an integer test.Count times, so of course it is faster. And if the compiler is smart, it will see that no operation happens in the loop and just optimize the whole thing away.
A fair comparison would replace that second bit of code with something like:
T item;
for (int j = 0; j < test.Count; j++)
{
    item = test[j];
}
That is more comparable to what your foreach loop is doing.
As for which to use, it's really a matter of personal preference and coding style. I generally feel that foreach is more clear than for(...) from a readability standpoint.
I found only one case where it matters: developing for Windows Phone 7. There are two reasons why one should change
foreach(var item in colletion)
{
}
To
int length = array.Length;
for(int i = 0; i < length; ++i)
{
}
in an XNA game, if collections are big or the code is called often (e.g. in the Update method):
It is a bit faster
It produces less garbage
Garbage is critical, since the Compact Framework GC fires after every 1 MB of allocations; as a result, it may cause annoying freezes.
Possible Duplicate:
For vs Foreach loop in C#
Let's say I have a collection
List<Foo> list = new List<Foo>();
Now which of the following loops would run faster, and why:
for(int i=0; i< list.Count; i++)
or
foreach(Foo foo in list)
It depends:
For the for loop, it depends on how much time it takes to evaluate list.Count (or whatever value is provided in the condition) and how much time it takes to reference an item at a specific index.
For the foreach loop, it depends on how much time it takes for the iterator to return a value.
For your example above, there should not be any difference, because you are using the standard List class.
Who cares? Do you have a performance problem? If so, have you measured and determined that this is the slowest part of your app?
foreach is faster to type for me :) and easier to read.
Well... you can find that out using System.Diagnostics.Stopwatch.
However, the point is: why do you need to think about it? You should first consider which one is more readable and use that one, instead of worrying about performance in this case.
The golden rule is always write readable code and optimize if you find a performance problem.
Try this: for and foreach take almost the same time, but the .ForEach() method is faster.
class Program
{
static void Main(string[] args)
{
//Add values
List<objClass> lst1 = new List<objClass>();
for (int i = 0; i < 9000000; i++)
{
lst1.Add(new objClass("1", ""));
}
//For loop
DateTime startTime = DateTime.Now;
for (int i = 0; i < 9000000; i++)
{
lst1[i]._s1 = lst1[i]._s2;
}
Console.WriteLine((DateTime.Now - startTime).ToString());
//ForEach Action
startTime = DateTime.Now;
lst1.ForEach(s => { s._s1 = s._s2; });
Console.WriteLine((DateTime.Now - startTime).ToString());
//foreach normal loop
startTime = DateTime.Now;
foreach (objClass s in lst1)
{
s._s1 = s._s2;
}
Console.WriteLine((DateTime.Now - startTime).ToString());
}
public class objClass
{
public string _s1 { get; set; }
public string _s2 { get; set; }
public objClass(string _s1, string _s2)
{
this._s1 = _s1;
this._s2 = _s2;
}
}
}
If you need to use the index of the current item, use the for loop. There is no faster solution; there is only the proper solution.
I don't have a source to back this up, but I believe they will be almost if not exactly identical due to the way the compiler does optimizations such as loop unrolling. If there is a difference, it's likely on the order of single or tens of CPU cycles, which is as good as nothing for 99.9999% of applications.
In general, foreach tends to be considered 'syntactic sugar', that is, it's nice to have, but doesn't actually do much besides change the way you word a particular piece of code.
Can somebody provide a real-life example of using iterators? I tried searching Google but was not satisfied with the answers.
You've probably heard of arrays and containers - objects that store a list of other objects.
But in order for an object to represent a list, it doesn't actually have to "store" the list. All it has to do is provide you with methods or properties that allow you to obtain the items of the list.
In the .NET framework, the interface IEnumerable is all an object has to support to be considered a "list" in that sense.
To simplify it a little (leaving out some historical baggage):
public interface IEnumerable<T>
{
IEnumerator<T> GetEnumerator();
}
So you can get an enumerator from it. That interface (again, simplifying slightly to remove distracting noise):
public interface IEnumerator<T>
{
bool MoveNext();
T Current { get; }
}
So to loop through a list, you'd do this:
var e = list.GetEnumerator();
while (e.MoveNext())
{
var item = e.Current;
// blah
}
This pattern is captured neatly by the foreach keyword:
foreach (var item in list)
// blah
But what about creating a new kind of list? Yes, we can just use List<T> and fill it up with items. But what if we want to discover the items "on the fly" as they are requested? There is an advantage to this, which is that the client can abandon the iteration after the first three items, and they don't have to "pay the cost" of generating the whole list.
To implement this kind of lazy list by hand would be troublesome. We would have to write two classes, one to represent the list by implementing IEnumerable<T>, and the other to represent an active enumeration operation by implementing IEnumerator<T>.
Iterator methods do all the hard work for us. We just write:
IEnumerable<int> GetNumbers(int stop)
{
for (int n = 0; n < stop; n++)
yield return n;
}
And the compiler converts this into two classes for us. Calling the method is equivalent to constructing an object of the class that represents the list.
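For example, consuming such a lazy list only partially (a sketch; the break is what lets the client abandon the iteration early):

int taken = 0;
foreach (int n in GetNumbers(1000000))
{
    Console.WriteLine(n);   // 0, 1, 2
    if (++taken == 3)
        break;              // the remaining 999,997 values are never generated
}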
Iterators are an abstraction that decouples the concept of position in a collection from the collection itself. The iterator is a separate object storing the necessary state to locate an item in the collection and move to the next item in the collection. I have seen collections that kept that state inside the collection (i.e. a current position), but it is often better to move that state to an external object. Among other things it enables you to have multiple iterators iterating the same collection.
Simple example: a function that generates a sequence of integers:
static IEnumerable<int> GetSequence(int fromValue, int toValue)
{
if (toValue >= fromValue)
{
for (int i = fromValue; i <= toValue; i++)
{
yield return i;
}
}
else
{
for (int i = fromValue; i >= toValue; i--)
{
yield return i;
}
}
}
To do it without an iterator, you would need to create an array then enumerate it...
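Usage is then just a foreach over the returned sequence (a small sketch):

foreach (int i in GetSequence(1, 5))
    Console.Write(i + " ");   // 1 2 3 4 5

foreach (int i in GetSequence(5, 1))
    Console.Write(i + " ");   // 5 4 3 2 1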
Iterate through the students in a class
The Iterator design pattern provides us with a common method of enumerating a list of items or an array, while hiding the details of the list's implementation. This provides a cleaner use of the array object and hides unnecessary information from the client, ultimately leading to better code reuse, enhanced maintainability, and fewer bugs. The iterator pattern can enumerate the list of items regardless of their actual storage type.
Iterate through a set of homework questions.
But seriously, Iterators can provide a unified way to traverse the items in a collection regardless of the underlying data structure.
Read the first two paragraphs here for a little more info.
A couple of things they're great for:
a) For 'perceived performance' while maintaining code tidiness - the iteration of something separated from other processing logic.
b) When the number of items you're going to iterate through is not known.
Although both can be done through other means, with iterators the code can be made nicer and tidier, as someone calling the iterator doesn't need to worry about how it finds the stuff to iterate through...
Real-life example: enumerating directories and files, and finding the first [n] that fulfil some criteria, e.g. a file containing a certain string or sequence, etc...
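A sketch of that idea (the directory path, the *.txt filter and the search text are all made up for illustration):

// Lazily yields the paths of text files that contain the given text;
// a caller can stop after the first match and the remaining files are never read.
static IEnumerable<string> FilesContaining(string directory, string text)
{
    foreach (string path in Directory.GetFiles(directory, "*.txt"))
    {
        if (File.ReadAllText(path).Contains(text))
            yield return path;
    }
}

// Take only the first match; enumeration stops as soon as one file qualifies:
// foreach (string match in FilesContaining(@"C:\logs", "ERROR")) { Console.WriteLine(match); break; }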
Besides everything else, they let you iterate through lazy sequences (IEnumerators). Each element of such a sequence may be evaluated/initialized only at its iteration step, which makes it possible to iterate through infinite sequences using a finite amount of resources...
The canonical and simplest example is that it makes infinite sequences possible without the complexity of having to write the class to do that yourself:
// generate every prime number
public IEnumerator<int> GetPrimeEnumerator()
{
yield return 2;
var primes = new List<int>();
primes.Add(2);
Func<int, bool> IsPrime = n => primes.TakeWhile(
p => p <= (int)Math.Sqrt(n)).FirstOrDefault(p => n % p == 0) == 0;
for (int i = 3; true; i += 2)
{
if (IsPrime(i))
{
yield return i;
primes.Add(i);
}
}
}
Obviously this would not be truly infinite unless you used a big-integer type instead of int, but it gives you the idea.
Writing this code (or something similar) by hand for each generated sequence would be tedious and error-prone; the iterators do that for you. If the above example seems too complex, consider:
// generate every power of a number from start^0 to start^n
public IEnumerator<int> GetPowersEnumerator(int start)
{
yield return 1; // anything ^0 is 1
var x = start;
while(true)
{
yield return x;
x *= start;
}
}
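Pulling values out of it is then a matter of calling MoveNext/Current by hand, since the method returns an IEnumerator<int> rather than an IEnumerable<int> (a small sketch):

IEnumerator<int> powers = GetPowersEnumerator(2);
for (int i = 0; i < 5 && powers.MoveNext(); i++)
{
    Console.WriteLine(powers.Current);   // 1, 2, 4, 8, 16
}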
They come at a cost, though. Their lazy behaviour means you cannot spot common errors (null parameters and the like) when the generator is created, only when it is first consumed, unless you write wrapping functions that check first. The current implementation is also incredibly bad(1) if used recursively.
Writing enumerations over complex structures like trees and object graphs is much easier, as the state maintenance is largely done for you; you simply write code to visit each item and don't have to worry about getting back to it.
(1) I don't use this word lightly: an O(n) iteration can become O(n^2).
An iterator is an easy way of implementing the IEnumerator interface. Instead of making a class that has the methods and properties required for the interface, you just make a method that returns the values one by one and the compiler creates a class with the methods and properties needed to implement the interface.
If, for example, you have a large list of numbers and you want to return a collection where each number is multiplied by two, you can make an iterator that returns the numbers instead of creating a copy of the list in memory:
public IEnumerable<int> GetDouble() {
foreach (int n in originalList) yield return n * 2;
}
In C# 3 you can do something quite similar using extension methods and lambda expressions:
originalList.Select(n => n * 2)
Or using LINQ:
from n in originalList select n * 2
IEnumerator<Question> myIterator = listOfStackOverFlowQuestions.GetEnumerator();
while (myIterator.MoveNext())
{
Question q;
q = myIterator.Current;
if (q.Pertinent == true)
PublishQuestion(q);
else
SendMessage(q.Author.EmailAddress, "Your question has been rejected");
}
foreach (Question q in listOfStackOverFlowQuestions)
{
if (q.Pertinent == true)
PublishQuestion(q);
else
SendMessage(q.Author.EmailAddress, "Your question has been rejected");
}