Is IEnumerable.Any faster than a for loop with a break? - c#

We experienced some slowness in our code opening a form and it was possibly due to a for loop with a break that was taking a long time to execute. I switched this to an IEnumerable.Any() and saw the form open very quickly. I am now trying to figure out if making this change alone increased performance or if it was accessing the ProductIDs property more efficiently. Should this implementation be faster, and if so, why?
Original Implementation:
public bool ContainsProduct(int productID) {
bool containsProduct = false;
for (int i = 0; i < this.ProductIDs.Length; i++) {
if (productID == this.ProductIDs[i]) {
containsProduct = true;
break;
}
}
return containsProduct;
}
New Implementation:
public bool ContainsProduct(int productID) {
return this.ProductIDs.Any(t => productID == t);
}

Call this an educated guess:
this.ProductIDs.Length
This probably is where the slowness lies. If the list of ProductIDs gets retrieved from database (for example) on every iteration in order to get the Length it would indeed be very slow. You can confirm this by profiling your application.
If this is not the case (say ProductIDs is in memory and Length is cached), then both should have an almost identical running time.

First implementation is slightly faster (enumeration is slightly slower than for loop). Second one is a lot more readable.
UPDATE
Oded's answer is possibly correct and well done for spotting it. The first one is slower here since it involves database roundtrip. Otherwise, it is slightly faster as I said.
UPDATE 2 - Proof
Here is a simple code showing why first one is faster:
public static void Main()
{
int[] values = Enumerable.Range(0, 1000000).ToArray();
int dummy = 0;
Stopwatch stopwatch = new Stopwatch();
stopwatch.Start();
for (int i = 0; i < values.Length; i++)
{
dummy *= i;
}
stopwatch.Stop();
Console.WriteLine("Loop took {0}", stopwatch.ElapsedTicks);
dummy = 0;
stopwatch.Reset();
stopwatch.Start();
foreach (var value in values)
{
dummy *= value;
}
stopwatch.Stop();
Console.WriteLine("Iteration took {0}", stopwatch.ElapsedTicks);
Console.Read();
}
Here is output:
Loop took 12198
Iteration took 20922
So loop is twice is fast as iteration/enumeration.

I think they would be more or less identical. I usually refer to Jon Skeet's Reimplementing LINQ to Objects blog series to get an idea of how the extension methods work. Here's the post for Any() and All()
Here's the core part of Any() implementation from that post
public static bool Any<TSource>(
this IEnumerable<TSource> source,
Func<TSource, bool> predicate)
{
...
foreach (TSource item in source)
{
if (predicate(item))
{
return true;
}
}
return false;
}

This post assumes that ProductIDs is a List<T> or an array. So I'm talking about Linq-to-objects.
Linq is usually slower but shorter/more readable than conventional loop based code. A factor of 2-3 depending on what you're doing is typical.
Can you refactor your code to make this.ProductIDs a HashSet<T>? Or at least sort the array so you can use a binary search. Your problem is that you're performing a linear search, which is slow if there are many products.

I think the below implementation would be a little faster than the corresponding linq implementation, but very minor though
public bool ContainsProduct(int productID) {
var length = this.ProductIDs.Length;
for (int i = 0; i < length; i++) {
if (productID == this.ProductIDs[i]) {
return true;
}
}
return false;
}

The difference will be generally in memory usage then speed.
But generally you should use for loop when you know that you will be using all elements of array in other cases you should try to use while or do while.
I think that this solution use minimum resources
int i = this.ProductIDs.Length - 1;
while(i >= 0) {
if(this.ProductIDs[i--] == productId) {
return true;
}
}
return false;

Related

Why is in C# a nested foreach loop more performant than a SelectMany combined with a single foreach loop or a GroupBy? [duplicate]

I am writing a Mesh Rendering manager and thought it would be a good idea to group all of the meshes which use the same shader and then render these while I'm in that shader pass.
I am currently using a foreach loop, but wondered if utilising LINQ might give me a performance increase?
Why should LINQ be faster? It also uses loops internally.
Most of the times, LINQ will be a bit slower because it introduces overhead. Do not use LINQ if you care much about performance. Use LINQ because you want shorter better readable and maintainable code.
LINQ-to-Objects generally is going to add some marginal overheads (multiple iterators, etc). It still has to do the loops, and has delegate invokes, and will generally have to do some extra dereferencing to get at captured variables etc. In most code this will be virtually undetectable, and more than afforded by the simpler to understand code.
With other LINQ providers like LINQ-to-SQL, then since the query can filter at the server it should be much better than a flat foreach, but most likely you wouldn't have done a blanket "select * from foo" anyway, so that isn't necessarily a fair comparison.
Re PLINQ; parallelism may reduce the elapsed time, but the total CPU time will usually increase a little due to the overheads of thread management etc.
LINQ is slower now, but it might get faster at some point. The good thing about LINQ is that you don't have to care about how it works. If a new method is thought up that's incredibly fast, the people at Microsoft can implement it without even telling you and your code would be a lot faster.
More importantly though, LINQ is just much easier to read. That should be enough reason.
It should probably be noted that the for loop is faster than the foreach. So for the original post, if you are worried about performance on a critical component like a renderer, use a for loop.
Reference:
In .NET, which loop runs faster, 'for' or 'foreach'?
You might get a performance boost if you use parallel LINQ for multi cores. See Parallel LINQ (PLINQ) (MSDN).
I was interested in this question, so I did a test just now. Using .NET Framework 4.5.2 on an Intel(R) Core(TM) i3-2328M CPU # 2.20GHz, 2200 Mhz, 2 Core(s) with 8GB ram running Microsoft Windows 7 Ultimate.
It looks like LINQ might be faster than for each loop. Here are the results I got:
Exists = True
Time = 174
Exists = True
Time = 149
It would be interesting if some of you could copy & paste this code in a console app and test as well.
Before testing with an object (Employee) I tried the same test with integers. LINQ was faster there as well.
public class Program
{
public class Employee
{
public int id;
public string name;
public string lastname;
public DateTime dateOfBirth;
public Employee(int id,string name,string lastname,DateTime dateOfBirth)
{
this.id = id;
this.name = name;
this.lastname = lastname;
this.dateOfBirth = dateOfBirth;
}
}
public static void Main() => StartObjTest();
#region object test
public static void StartObjTest()
{
List<Employee> items = new List<Employee>();
for (int i = 0; i < 10000000; i++)
{
items.Add(new Employee(i,"name" + i,"lastname" + i,DateTime.Today));
}
Test3(items, items.Count-100);
Test4(items, items.Count - 100);
Console.Read();
}
public static void Test3(List<Employee> items, int idToCheck)
{
Stopwatch s = new Stopwatch();
s.Start();
bool exists = false;
foreach (var item in items)
{
if (item.id == idToCheck)
{
exists = true;
break;
}
}
Console.WriteLine("Exists=" + exists);
Console.WriteLine("Time=" + s.ElapsedMilliseconds);
}
public static void Test4(List<Employee> items, int idToCheck)
{
Stopwatch s = new Stopwatch();
s.Start();
bool exists = items.Exists(e => e.id == idToCheck);
Console.WriteLine("Exists=" + exists);
Console.WriteLine("Time=" + s.ElapsedMilliseconds);
}
#endregion
#region int test
public static void StartIntTest()
{
List<int> items = new List<int>();
for (int i = 0; i < 10000000; i++)
{
items.Add(i);
}
Test1(items, -100);
Test2(items, -100);
Console.Read();
}
public static void Test1(List<int> items,int itemToCheck)
{
Stopwatch s = new Stopwatch();
s.Start();
bool exists = false;
foreach (var item in items)
{
if (item == itemToCheck)
{
exists = true;
break;
}
}
Console.WriteLine("Exists=" + exists);
Console.WriteLine("Time=" + s.ElapsedMilliseconds);
}
public static void Test2(List<int> items, int itemToCheck)
{
Stopwatch s = new Stopwatch();
s.Start();
bool exists = items.Contains(itemToCheck);
Console.WriteLine("Exists=" + exists);
Console.WriteLine("Time=" + s.ElapsedMilliseconds);
}
#endregion
}
This is actually quite a complex question. Linq makes certain things very easy to do, that if you implement them yourself, you might stumble over (e.g. linq .Except()). This particularly applies to PLinq, and especially to parallel aggregation as implemented by PLinq.
In general, for identical code, linq will be slower, because of the overhead of delegate invocation.
If, however, you are processing a large array of data, and applying relatively simple calculations to the elements, you will get a huge performance increase if:
You use an array to store the data.
You use a for loop to access each element (as opposed to foreach or linq).
Note: When benchmarking, please everyone remember - if you use the same array/list for two consecutive tests, the CPU cache will make the second one faster. *
Coming in .NET core 7 are some significant updates to LINQ performance of .Min .Max, .Average and .Sum
Reference: https://devblogs.microsoft.com/dotnet/performance_improvements_in_net_7/#linq
Here is a benchmark from the post.
If you compare to a ForEach loop, than it becomes apparent that in .NET 6 the ForEach loop was faster and in .NET 7 the LINQ methods:
this was the code of the benchmark using BenchmarkDotNet
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Running;
public class Program
{
public static void Main()
{
BenchmarkRunner.Run<ForEachVsLinq>();
}
}
[SimpleJob(RuntimeMoniker.Net60)]
[SimpleJob(RuntimeMoniker.Net70)]
[MemoryDiagnoser(false)]
public class ForEachVsLinq
{
private int[] _intArray;
[GlobalSetup]
public void Setup()
{
var random = new Random();
var randomItems = Enumerable.Range(0, 500).Select(_ => random.Next(999));
this._intArray = randomItems.ToArray();
}
[Benchmark]
public void ForEachMin()
{
var min = int.MaxValue;
foreach (var i in this._intArray)
{
if ( i < min)
min = i;
}
Console.WriteLine(min);
}
[Benchmark]
public void Min()
{
var min = this._intArray.Min();
Console.WriteLine(min);
}
[Benchmark]
public void ForEachMax()
{
var max = 0;
foreach (var i in this._intArray)
{
if (i > max)
max = i;
}
Console.WriteLine(max);
}
[Benchmark]
public void Max()
{
var max = this._intArray.Max();
Console.WriteLine(max);
}
[Benchmark]
public void ForEachSum()
{
var sum = 0;
foreach (var i in this._intArray)
{
sum += i;
}
Console.WriteLine(sum);
}
[Benchmark]
public void Sum()
{
var sum = this._intArray.Sum();
Console.WriteLine(sum);
}
}
In .NET Core 6 and earlier versions the mentioned methods are slower than doing your own foreach loop and finding the min, max value, average or summarizing the objects in the array.
But in .NET Core 7, the performance increase makes these buildin LINQ methods actually a lot faster.
Nick Chapsas shows this in a benchmark video on YouTupe
So if you want to calculate the sum, min, max or average value, you should use the LINQ methods instead of a foreach loop from .NET Core 7 onwards (at least, from a performance point of view)

C# LINQ performance when extension method called inside where clause

I have a LINQ query like this
public static bool CheckIdExists(int searchId)
{
return itemCollection.Any(item => item.Id.Equals(searchId.ConvertToString()));
}
item.Id is a string while searchId is an int. .ConvertToString() is an extension which which converts int to string
Code for ConvertToString:
public static string ConvertToString(this object input)
{
return Convert.ToString(input, CultureInfo.InvariantCulture);
}
Now my query is, does searchId.ConvertToString() gets executed for each item in itemCollection?
Is computing searchId.ConvertToString() beforehand and calling the method like below improves performance?
public static bool CheckIdExists(int searchId)
{
string sId=searchId.ConvertToString();
return itemCollection.Any(item => item.Id.Equals(sId));
}
How to debug these two scenarios and observe their performances?
I re-generated the scenarios you talked about in your question. I tried following code and got this output.
But this is how you can debug this.
static List<string> itemCollection = new List<string>();
static void Main(string[] args)
{
for (int i = 0; i < 10000000; i++)
{
itemCollection.Add(i.ToString());
}
var watch = new Stopwatch();
watch.Start();
Console.WriteLine(CheckIdExists(580748));
watch.Stop();
Console.WriteLine($"Took {watch.ElapsedMilliseconds}");
var watch1 = new Stopwatch();
watch1.Start();
Console.WriteLine(CheckIdExists1(580748));
watch1.Stop();
Console.WriteLine($"Took {watch1.ElapsedMilliseconds}");
Console.ReadLine();
}
public static bool CheckIdExists(int searchId)
{
return itemCollection.Any(item => item.Equals(ConvertToString(searchId)));
}
public static bool CheckIdExists1(int searchId)
{
string sId =ConvertToString(searchId);
return itemCollection.Any(item => item.Equals(sId));
}
public static string ConvertToString(int input)
{
return Convert.ToString(input, CultureInfo.InvariantCulture);
}
OUTPUT:
True
Took 170
True
Took 11
How long it takes is the ultimate guide. You can create a stopwatch to log the performance of any code. Just use the ElapsedMilliseconds to see how long has been taken. For very short operations I suggest using very long loops to get a more accurate length of time.
var watch = new Stopwatch();
watch.Start();
/// CODE HERE (IDEALLY IN A LONG LOOP)
Debub.WriteLine($"Took {watch.ElapsedMilliseconds}");
Yes, it should be faster to get the string once. But I guess that compiler does optimize that thing for you (I just suspect this, don't ave anything to back it up. I just remeber that compilers are very good at detecting things that are not changing).
And no, it's not computed for every item, since LINQ method Any does not necessarily check all items. It return true for the first matching item. The only scenario when it checks all items, is where for none the lambda returns true.
If you want to test the speed difference,make sure to have more data - otherwise the difference may be too small.
Just do:
itemCollection = Enumerable.Range(0, 1000).SelectMany(x => itemCollection).ToList() // or array or whatever the type of collection you have
Than measure the times with StopWatch, just like #RobSedgwick said
I think you have two solution:
1- make log and meke inside this log datetime.now
2- you can use the diagnostic Tools tab
hopefully this help you

Buffering a LINQ query

FINAL EDIT:
I've chosen Timothy's answer but if you want a cuter implementation that leverages the C# yield statement check Eamon's answer: https://stackoverflow.com/a/19825659/145757
By default LINQ queries are lazily streamed.
ToArray/ToList give full buffering but first they're eager and secondly it may take quite some time to complete with an infinite sequence.
Is there any way to have a combination of both behaviors : streaming and buffering values on the fly as they are generated, so that the next querying won't trigger the generation of the elements that have already been queried.
Here is a basic use-case:
static IEnumerable<int> Numbers
{
get
{
int i = -1;
while (true)
{
Console.WriteLine("Generating {0}.", i + 1);
yield return ++i;
}
}
}
static void Main(string[] args)
{
IEnumerable<int> evenNumbers = Numbers.Where(i => i % 2 == 0);
foreach (int n in evenNumbers)
{
Console.WriteLine("Reading {0}.", n);
if (n == 10) break;
}
Console.WriteLine("==========");
foreach (int n in evenNumbers)
{
Console.WriteLine("Reading {0}.", n);
if (n == 10) break;
}
}
Here is the output:
Generating 0.
Reading 0.
Generating 1.
Generating 2.
Reading 2.
Generating 3.
Generating 4.
Reading 4.
Generating 5.
Generating 6.
Reading 6.
Generating 7.
Generating 8.
Reading 8.
Generating 9.
Generating 10.
Reading 10.
==========
Generating 0.
Reading 0.
Generating 1.
Generating 2.
Reading 2.
Generating 3.
Generating 4.
Reading 4.
Generating 5.
Generating 6.
Reading 6.
Generating 7.
Generating 8.
Reading 8.
Generating 9.
Generating 10.
Reading 10.
The generation code is triggered 22 times.
I'd like it to be triggered 11 times, the first time the enumerable is iterated.
Then the second iteration would benefit from the already generated values.
It would be something like:
IEnumerable<int> evenNumbers = Numbers.Where(i => i % 2 == 0).Buffer();
For those familiar with Rx it's a behavior similar to a ReplaySubject.
IEnumerable<T>.Buffer() extension method
public static EnumerableExtensions
{
public static BufferEnumerable<T> Buffer(this IEnumerable<T> source)
{
return new BufferEnumerable<T>(source);
}
}
public class BufferEnumerable<T> : IEnumerable<T>, IDisposable
{
IEnumerator<T> source;
List<T> buffer;
public BufferEnumerable(IEnumerable<T> source)
{
this.source = source.GetEnumerator();
this.buffer = new List<T>();
}
public IEnumerator<T> GetEnumerator()
{
return new BufferEnumerator<T>(source, buffer);
}
public void Dispose()
{
source.Dispose()
}
}
public class BufferEnumerator<T> : IEnumerator<T>
{
IEnumerator<T> source;
List<T> buffer;
int i = -1;
public BufferEnumerator(IEnumerator<T> source, List<T> buffer)
{
this.source = source;
this.buffer = buffer;
}
public T Current
{
get { return buffer[i]; }
}
public bool MoveNext()
{
i++;
if (i < buffer.Count)
return true;
if (!source.MoveNext())
return false;
buffer.Add(source.Current);
return true;
}
public void Reset()
{
i = -1;
}
public void Dispose()
{
}
}
Usage
using (var evenNumbers = Numbers.Where(i => i % 2 == 0).Buffer())
{
...
}
Comments
The key point here is that the IEnumerable<T> source given as input to the Buffer method only has GetEnumerator called once, regardless of how many times the result of Buffer is enumerated. All enumerators for the result of Buffer share the same source enumerator and internal list.
You can use the Microsoft.FSharp.Collections.LazyList<> type from the F# power pack (yep, from C# without F# installed - no problem!) for this. It's in Nuget package FSPowerPack.Core.Community.
In particular, you want to call LazyListModule.ofSeq(...) which returns a LazyList<T> that implements IEnumerable<T> and is lazy and cached.
In your case, usage is just a matter of...
var evenNumbers = LazyListModule.ofSeq(Numbers.Where(i => i % 2 == 0));
var cachedEvenNumbers = LazyListModule.ofSeq(evenNumbers);
Though I personally prefer var in all such cases, note that this does mean the compile-time type will be more specific than just IEnumerable<> - not that this is likely to ever be a downside. Another advantage of the F# non-interface types is that they expose some efficient operations you can't do efficienly with plain IEnumerables, such as LazyListModule.skip.
I'm not sure whether LazyList is thread-safe, but I suspect it is.
Another alternative pointed out in the comments below (if you have F# installed) is SeqModule.Cache (namespace Microsoft.FSharp.Collections, it'll be in GACed assembly FSharp.Core.dll) which has the same effective behavior. Like other .NET enumerables, Seq.cache doesn't have a tail (or skip) operator you can efficiently chain.
Thread-safe: unlike other solutions to this question Seq.cache is thread-safe in the sense that you can have multiple enumerators running in parallel (each enumerator is not thread safe).
Performance I did a quick benchmark, and the LazyList enumerable has at least 4 times more overhead than the SeqModule.Cache variant, which has at least three times more overhead than the custom implementation answers. So, while the F# variants work, they're not quite as fast. Note that 3-12 times slower still isn't very slow compared to an enumerable that does (say) I/O or any non-trivial computation, so this probably won't matter most of the time, but it's good to keep in mind.
TL;DR If you need an efficient, thread-safe cached enumerable, just use SeqModule.Cache.
Building upon Eamon's answer above, here's another functional solution (no new types) that works also with simultaneous evaluation. This demonstrates that a general pattern (iteration with shared state) underlies this problem.
First we define a very general helper method, meant to allow us to simulate the missing feature of anonymous iterators in C#:
public static IEnumerable<T> Generate<T>(Func<Func<Tuple<T>>> generator)
{
var tryGetNext = generator();
while (true)
{
var result = tryGetNext();
if (null == result)
{
yield break;
}
yield return result.Item1;
}
}
Generate is like an aggregator with state. It accepts a function that returns initial state, and a generator function that would have been an anonymous with yield return in it, if it were allowed in C#. The state returned by initialize is meant to be per-enumeration, while a more global state (shared between all enumerations) can be maintained by the caller to Generate e.g. in closure variables as we'll show below.
Now we can use this for the "buffered Enumerable" problem:
public static IEnumerable<T> Cached<T>(IEnumerable<T> enumerable)
{
var cache = new List<T>();
var enumerator = enumerable.GetEnumerator();
return Generate<T>(() =>
{
int pos = -1;
return () => {
pos += 1;
if (pos < cache.Count())
{
return new Tuple<T>(cache[pos]);
}
if (enumerator.MoveNext())
{
cache.Add(enumerator.Current);
return new Tuple<T>(enumerator.Current);
}
return null;
};
});
}
I hope this answer combines the brevity and clarity of sinelaw's answer and the support for multiple enumerations of Timothy's answer:
public static IEnumerable<T> Cached<T>(this IEnumerable<T> enumerable) {
return CachedImpl(enumerable.GetEnumerator(), new List<T>());
}
static IEnumerable<T> CachedImpl<T>(IEnumerator<T> source, List<T> buffer) {
int pos=0;
while(true) {
if(pos == buffer.Count)
if (source.MoveNext())
buffer.Add(source.Current);
else
yield break;
yield return buffer[pos++];
}
}
Key ideas are to use the yield return syntax to make for a short enumerable implementation, but you still need a state-machine to decide whether you can get the next element from the buffer, or whether you need to check the underlying enumerator.
Limitations: This makes no attempt to be thread-safe, nor does it dispose the underlying enumerator (which, in general, is quite tricky to do as the underlying uncached enumerator must remain undisposed as long as any cached enumerabl might still be used).
As far as I know there is no built-in way to do this, which - now that you mention it - is slightly surprising (my guess is, given the frequency with which one would want to use this option, it was probably not worth the effort needed to analyse the code to make sure that the generator gives the exact same sequence every time).
You can however implement it yourself. The easy way would be on the call-site, as
var evenNumbers = Numbers.Where(i => i % 2 == 0).
var startOfList = evenNumbers.Take(10).ToList();
// use startOfList instead of evenNumbers in the loop.
More generally and accurately, you could do it in the generator: create a List<int> cache and every time you generate a new number add it to the cache before you yield return it. Then when you loop through again, first serve up all the cached numbers. E.g.
List<int> cachedEvenNumbers = new List<int>();
IEnumerable<int> EvenNumbers
{
get
{
int i = -1;
foreach(int cached in cachedEvenNumbers)
{
i = cached;
yield return cached;
}
// Note: this while loop now starts from the last cached value
while (true)
{
Console.WriteLine("Generating {0}.", i + 1);
yield return ++i;
}
}
}
I guess if you think about this long enough you could come up with a general implementation of a IEnumerable<T>.Buffered() extension method - again, the requirement is that the enumeration doesn't change between calls and the question is if it is worth it.
Here's an incomplete yet compact 'functional' implementation (no new types defined).
The bug is that it does not allow simultaneous enumeration.
Original description:
The first function should have been an anonymous lambda inside the second, but C# does not allow yield in anonymous lambdas:
// put these in some extensions class
private static IEnumerable<T> EnumerateAndCache<T>(IEnumerator<T> enumerator, List<T> cache)
{
while (enumerator.MoveNext())
{
var current = enumerator.Current;
cache.Add(current);
yield return current;
}
}
public static IEnumerable<T> ToCachedEnumerable<T>(this IEnumerable<T> enumerable)
{
var enumerator = enumerable.GetEnumerator();
var cache = new List<T>();
return cache.Concat(EnumerateAndCache(enumerator, cache));
}
Usage:
var enumerable = Numbers.ToCachedEnumerable();
Full credit to Eamon Nerbonne and sinelaw for their answers, just a couple of tweaks! First, to release the enumerator when it is completed. Secondly to protect the underlying enumerator with a lock so the enumerable can be safely used on multiple threads.
// This is just the same as #sinelaw's Generator but I didn't like the name
public static IEnumerable<T> AnonymousIterator<T>(Func<Func<Tuple<T>>> generator)
{
var tryGetNext = generator();
while (true)
{
var result = tryGetNext();
if (null == result)
{
yield break;
}
yield return result.Item1;
}
}
// Cached/Buffered/Replay behaviour
public static IEnumerable<T> Buffer<T>(this IEnumerable<T> self)
{
// Rows are stored here when they've been fetched once
var cache = new List<T>();
// This counter is thread-safe in that it is incremented after the item has been added to the list,
// hence it will never give a false positive. It may give a false negative, but that falls through
// to the code which takes the lock so it's ok.
var count = 0;
// The enumerator is retained until it completes, then it is discarded.
var enumerator = self.GetEnumerator();
// This lock protects the enumerator only. The enumerable could be used on multiple threads
// and the enumerator would then be shared among them, but enumerators are inherently not
// thread-safe so a) we must protect that with a lock and b) we don't need to try and be
// thread-safe in our own enumerator
var lockObject = new object();
return AnonymousIterator<T>(() =>
{
int pos = -1;
return () =>
{
pos += 1;
if (pos < count)
{
return new Tuple<T>(cache[pos]);
}
// Only take the lock when we need to
lock (lockObject)
{
// The counter could have been updated between the check above and this one,
// so now we have the lock we must check again
if (pos < count)
{
return new Tuple<T>(cache[pos]);
}
// Enumerator is set to null when it has completed
if (enumerator != null)
{
if (enumerator.MoveNext())
{
cache.Add(enumerator.Current);
count += 1;
return new Tuple<T>(enumerator.Current);
}
else
{
enumerator = null;
}
}
}
}
return null;
};
});
}
I use the following extension method.
This way, the input is read at maximum speed, and the consumer processes at maximum speed.
public static IEnumerable<T> Buffer<T>(this IEnumerable<T> input)
{
var blockingCollection = new BlockingCollection<T>();
//read from the input
Task.Factory.StartNew(() =>
{
foreach (var item in input)
{
blockingCollection.Add(item);
}
blockingCollection.CompleteAdding();
});
foreach (var item in blockingCollection.GetConsumingEnumerable())
{
yield return item;
}
}
Example Usage
This example has a fast producer (find files), and a slow consumer (upload files).
long uploaded = 0;
long total = 0;
Directory
.EnumerateFiles(inputFolder, "*.jpg", SearchOption.AllDirectories)
.Select(filename =>
{
total++;
return filename;
})
.Buffer()
.ForEach(filename =>
{
//pretend to do something slow, like upload the file.
Thread.Sleep(1000);
uploaded++;
Console.WriteLine($"Uploaded {uploaded:N0}/{total:N0}");
});

Is a LINQ statement faster than a 'foreach' loop?

I am writing a Mesh Rendering manager and thought it would be a good idea to group all of the meshes which use the same shader and then render these while I'm in that shader pass.
I am currently using a foreach loop, but wondered if utilising LINQ might give me a performance increase?
Why should LINQ be faster? It also uses loops internally.
Most of the times, LINQ will be a bit slower because it introduces overhead. Do not use LINQ if you care much about performance. Use LINQ because you want shorter better readable and maintainable code.
LINQ-to-Objects generally is going to add some marginal overheads (multiple iterators, etc). It still has to do the loops, and has delegate invokes, and will generally have to do some extra dereferencing to get at captured variables etc. In most code this will be virtually undetectable, and more than afforded by the simpler to understand code.
With other LINQ providers like LINQ-to-SQL, then since the query can filter at the server it should be much better than a flat foreach, but most likely you wouldn't have done a blanket "select * from foo" anyway, so that isn't necessarily a fair comparison.
Re PLINQ; parallelism may reduce the elapsed time, but the total CPU time will usually increase a little due to the overheads of thread management etc.
LINQ is slower now, but it might get faster at some point. The good thing about LINQ is that you don't have to care about how it works. If a new method is thought up that's incredibly fast, the people at Microsoft can implement it without even telling you and your code would be a lot faster.
More importantly though, LINQ is just much easier to read. That should be enough reason.
It should probably be noted that the for loop is faster than the foreach. So for the original post, if you are worried about performance on a critical component like a renderer, use a for loop.
Reference:
In .NET, which loop runs faster, 'for' or 'foreach'?
You might get a performance boost if you use parallel LINQ for multi cores. See Parallel LINQ (PLINQ) (MSDN).
I was interested in this question, so I did a test just now. Using .NET Framework 4.5.2 on an Intel(R) Core(TM) i3-2328M CPU # 2.20GHz, 2200 Mhz, 2 Core(s) with 8GB ram running Microsoft Windows 7 Ultimate.
It looks like LINQ might be faster than for each loop. Here are the results I got:
Exists = True
Time = 174
Exists = True
Time = 149
It would be interesting if some of you could copy & paste this code in a console app and test as well.
Before testing with an object (Employee) I tried the same test with integers. LINQ was faster there as well.
public class Program
{
public class Employee
{
public int id;
public string name;
public string lastname;
public DateTime dateOfBirth;
public Employee(int id,string name,string lastname,DateTime dateOfBirth)
{
this.id = id;
this.name = name;
this.lastname = lastname;
this.dateOfBirth = dateOfBirth;
}
}
public static void Main() => StartObjTest();
#region object test
public static void StartObjTest()
{
List<Employee> items = new List<Employee>();
for (int i = 0; i < 10000000; i++)
{
items.Add(new Employee(i,"name" + i,"lastname" + i,DateTime.Today));
}
Test3(items, items.Count-100);
Test4(items, items.Count - 100);
Console.Read();
}
public static void Test3(List<Employee> items, int idToCheck)
{
Stopwatch s = new Stopwatch();
s.Start();
bool exists = false;
foreach (var item in items)
{
if (item.id == idToCheck)
{
exists = true;
break;
}
}
Console.WriteLine("Exists=" + exists);
Console.WriteLine("Time=" + s.ElapsedMilliseconds);
}
public static void Test4(List<Employee> items, int idToCheck)
{
Stopwatch s = new Stopwatch();
s.Start();
bool exists = items.Exists(e => e.id == idToCheck);
Console.WriteLine("Exists=" + exists);
Console.WriteLine("Time=" + s.ElapsedMilliseconds);
}
#endregion
#region int test
public static void StartIntTest()
{
List<int> items = new List<int>();
for (int i = 0; i < 10000000; i++)
{
items.Add(i);
}
Test1(items, -100);
Test2(items, -100);
Console.Read();
}
public static void Test1(List<int> items,int itemToCheck)
{
Stopwatch s = new Stopwatch();
s.Start();
bool exists = false;
foreach (var item in items)
{
if (item == itemToCheck)
{
exists = true;
break;
}
}
Console.WriteLine("Exists=" + exists);
Console.WriteLine("Time=" + s.ElapsedMilliseconds);
}
public static void Test2(List<int> items, int itemToCheck)
{
Stopwatch s = new Stopwatch();
s.Start();
bool exists = items.Contains(itemToCheck);
Console.WriteLine("Exists=" + exists);
Console.WriteLine("Time=" + s.ElapsedMilliseconds);
}
#endregion
}
This is actually quite a complex question. Linq makes certain things very easy to do, that if you implement them yourself, you might stumble over (e.g. linq .Except()). This particularly applies to PLinq, and especially to parallel aggregation as implemented by PLinq.
In general, for identical code, linq will be slower, because of the overhead of delegate invocation.
If, however, you are processing a large array of data, and applying relatively simple calculations to the elements, you will get a huge performance increase if:
You use an array to store the data.
You use a for loop to access each element (as opposed to foreach or linq).
Note: When benchmarking, please everyone remember - if you use the same array/list for two consecutive tests, the CPU cache will make the second one faster. *
Coming in .NET core 7 are some significant updates to LINQ performance of .Min .Max, .Average and .Sum
Reference: https://devblogs.microsoft.com/dotnet/performance_improvements_in_net_7/#linq
Here is a benchmark from the post.
If you compare to a ForEach loop, than it becomes apparent that in .NET 6 the ForEach loop was faster and in .NET 7 the LINQ methods:
this was the code of the benchmark using BenchmarkDotNet
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Running;
public class Program
{
public static void Main()
{
BenchmarkRunner.Run<ForEachVsLinq>();
}
}
[SimpleJob(RuntimeMoniker.Net60)]
[SimpleJob(RuntimeMoniker.Net70)]
[MemoryDiagnoser(false)]
public class ForEachVsLinq
{
private int[] _intArray;
[GlobalSetup]
public void Setup()
{
var random = new Random();
var randomItems = Enumerable.Range(0, 500).Select(_ => random.Next(999));
this._intArray = randomItems.ToArray();
}
[Benchmark]
public void ForEachMin()
{
var min = int.MaxValue;
foreach (var i in this._intArray)
{
if ( i < min)
min = i;
}
Console.WriteLine(min);
}
[Benchmark]
public void Min()
{
var min = this._intArray.Min();
Console.WriteLine(min);
}
[Benchmark]
public void ForEachMax()
{
var max = 0;
foreach (var i in this._intArray)
{
if (i > max)
max = i;
}
Console.WriteLine(max);
}
[Benchmark]
public void Max()
{
var max = this._intArray.Max();
Console.WriteLine(max);
}
[Benchmark]
public void ForEachSum()
{
var sum = 0;
foreach (var i in this._intArray)
{
sum += i;
}
Console.WriteLine(sum);
}
[Benchmark]
public void Sum()
{
var sum = this._intArray.Sum();
Console.WriteLine(sum);
}
}
In .NET Core 6 and earlier versions the mentioned methods are slower than doing your own foreach loop and finding the min, max value, average or summarizing the objects in the array.
But in .NET Core 7, the performance increase makes these buildin LINQ methods actually a lot faster.
Nick Chapsas shows this in a benchmark video on YouTupe
So if you want to calculate the sum, min, max or average value, you should use the LINQ methods instead of a foreach loop from .NET Core 7 onwards (at least, from a performance point of view)

Some help understanding "yield"

In my everlasting quest to suck less I'm trying to understand the "yield" statement, but I keep encountering the same error.
The body of [someMethod] cannot be an iterator block because
'System.Collections.Generic.List< AClass>' is not an iterator interface type.
This is the code where I got stuck:
foreach (XElement header in headersXml.Root.Elements()){
yield return (ParseHeader(header));
}
What am I doing wrong? Can't I use yield in an iterator? Then what's the point?
In this example it said that List<ProductMixHeader> is not an iterator interface type.
ProductMixHeader is a custom class, but I imagine List is an iterator interface type, no?
--Edit--
Thanks for all the quick answers.
I know this question isn't all that new and the same resources keep popping up.
It turned out I was thinking I could return List<AClass> as a return type, but since List<T> isn't lazy, it cannot. Changing my return type to IEnumerable<T> solved the problem :D
A somewhat related question (not worth opening a new thread): is it worth giving IEnumerable<T> as a return type if I'm sure that 99% of the cases I'm going to go .ToList() anyway? What will the performance implications be?
A method using yield return must be declared as returning one of the following two interfaces:
IEnumerable<SomethingAppropriate>
IEnumerator<SomethingApropriate>
(thanks Jon and Marc for pointing out IEnumerator)
Example:
public IEnumerable<AClass> YourMethod()
{
foreach (XElement header in headersXml.Root.Elements())
{
yield return (ParseHeader(header));
}
}
yield is a lazy producer of data, only producing another item after the first has been retrieved, whereas returning a list will return everything in one go.
So there is a difference, and you need to declare the method correctly.
For more information, read Jon's answer here, which contains some very useful links.
It's a tricky topic. In a nutshell, it's an easy way of implementing IEnumerable and its friends. The compiler builds you a state machine, transforming parameters and local variables into instance variables in a new class. Complicated stuff.
I have a few resources on this:
Chapter 6 of C# in Depth (free download from that page)
Iterators, iterator blocks and data pipelines (article)
Iterator block implementation details (article)
"yield" creates an iterator block - a compiler generated class that can implement either IEnumerable[<T>] or IEnumerator[<T>]. Jon Skeet has a very good (and free) discussion of this in chapter 6 of C# in Depth.
But basically - to use "yield" your method must return an IEnumerable[<T>] or IEnumerator[<T>]. In this case:
public IEnumerable<AClass> SomeMethod() {
// ...
foreach (XElement header in headersXml.Root.Elements()){
yield return (ParseHeader(header));
}
}
List implements Ienumerable.
Here's an example that might shed some light on what you are trying to learn. I wrote this about 6 months
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace YieldReturnTest
{
public class PrimeFinder
{
private Boolean isPrime(int integer)
{
if (0 == integer)
return false;
if (3 > integer)
return true;
for (int i = 2; i < integer; i++)
{
if (0 == integer % i)
return false;
}
return true;
}
public IEnumerable<int> FindPrimes()
{
int i;
for (i = 1; i < 2147483647; i++)
{
if (isPrime(i))
{
yield return i;
}
}
}
}
class Program
{
static void Main(string[] args)
{
PrimeFinder primes = new PrimeFinder();
foreach (int i in primes.FindPrimes())
{
Console.WriteLine(i);
Console.ReadLine();
}
Console.ReadLine();
Console.ReadLine();
}
}
}
I highly recommend using Reflector to have a look at what yield actually does for you. You'll be able to see the full code of the class that the compiler generates for you when using yield, and I've found that people understand the concept much more quickly when they can see the low-level result (well, mid-level I guess).
To understand yield, you need to understand when to use IEnumerator and IEnumerable (because you have to use either of them). The following examples help you to understand the difference.
First, take a look at the following class, it implements two methods - one returning IEnumerator<int>, one returning IEnumerable<int>. I'll show you that there is a big difference in usage, although the code of the 2 methods is looking similar:
// 2 iterators, one as IEnumerator, one as IEnumerable
public class Iterator
{
public static IEnumerator<int> IterateOne(Func<int, bool> condition)
{
for(var i=1; condition(i); i++) { yield return i; }
}
public static IEnumerable<int> IterateAll(Func<int, bool> condition)
{
for(var i=1; condition(i); i++) { yield return i; }
}
}
Now, if you're using IterateOne you can do the following:
// 1. Using IEnumerator allows to get item by item
var i=Iterator.IterateOne(x => true); // iterate endless
// 1.a) get item by item
i.MoveNext(); Console.WriteLine(i.Current);
i.MoveNext(); Console.WriteLine(i.Current);
// 1.b) loop until 100
int j; while (i.MoveNext() && (j=i.Current)<=100) { Console.WriteLine(j); }
1.a) prints:
1
2
1.b) prints:
3
4
...
100
because it continues counting right after the 1.a) statements have been executed.
You can see that you can advance item by item using MoveNext().
In contrast, IterateAll allows you to use foreach and also LINQ statements for bigger comfort:
// 2. Using IEnumerable makes looping and LINQ easier
var k=Iterator.IterateAll(x => x<100); // limit iterator to 100
// 2.a) Use a foreach loop
foreach(var x in k){ Console.WriteLine(x); } // loop
// 2.b) LINQ: take 101..200 of endless iteration
var lst=Iterator.IterateAll(x=>true).Skip(100).Take(100).ToList(); // LINQ: take items
foreach(var x in lst){ Console.WriteLine(x); } // output list
2.a) prints:
1
2
...
99
2.b) prints:
101
102
...
200
Note: Since IEnumerator<T> and IEnumerable<T> are Generics, they can be used with any type. However, for simplicity I have used int in my examples for type T.
This means, you can use one of the return types IEnumerator<ProductMixHeader> or IEnumerable<ProductMixHeader> (the custom class you have mentioned in your question).
The type List<ProductMixHeader> does not implement any of these interfaces, which is the reason why you can't use it that way. But Example 2.b) is showing how you can create a list from it.
If you're creating a list by appending .ToList() then the implication is, that it will create a list of all elements in memory, while an IEnumerable allows lazy creation of its elements - in terms of performance, it means that elements are enumerated just in time - as late as possible, but as soon as you're using .ToList(), then all elements are created in memory. LINQ tries to optimize performance this way behind the scenes.
DotNetFiddle of all examples
#Ian P´s answer helped me a lot to understand yield and why it is used. One (major) use case for yield is in "foreach" loops after the "in" keyword not to return a fully completed list. Instead of returning a complete list at once, in each "foreach" loop only one item (the next item) is returned. So you will gain performance with yield in such cases.
I have rewritten #Ian P´s code for my better understanding to the following:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace YieldReturnTest
{
public class PrimeFinder
{
private Boolean isPrime(int integer)
{
if (0 == integer)
return false;
if (3 > integer)
return true;
for (int i = 2; i < integer; i++)
{
if (0 == integer % i)
return false;
}
return true;
}
public IEnumerable<int> FindPrimesWithYield()
{
int i;
for (i = 1; i < 2147483647; i++)
{
if (isPrime(i))
{
yield return i;
}
}
}
public IEnumerable<int> FindPrimesWithoutYield()
{
var primes = new List<int>();
int i;
for (i = 1; i < 2147483647; i++)
{
if (isPrime(i))
{
primes.Add(i);
}
}
return primes;
}
}
class Program
{
static void Main(string[] args)
{
PrimeFinder primes = new PrimeFinder();
Console.WriteLine("Finding primes until 7 with yield...very fast...");
foreach (int i in primes.FindPrimesWithYield()) // FindPrimesWithYield DOES NOT iterate over all integers at once, it returns item by item
{
if (i > 7)
{
break;
}
Console.WriteLine(i);
//Console.ReadLine();
}
Console.WriteLine("Finding primes until 7 without yield...be patient it will take lonkg time...");
foreach (int i in primes.FindPrimesWithoutYield()) // FindPrimesWithoutYield DOES iterate over all integers at once, it returns the complete list of primes at once
{
if (i > 7)
{
break;
}
Console.WriteLine(i);
//Console.ReadLine();
}
Console.ReadLine();
Console.ReadLine();
}
}
}
What does the method you're using this in look like? I don't think this can be used in just a loop by itself.
For example...
public IEnumerable<string> GetValues() {
foreach(string value in someArray) {
if (value.StartsWith("A")) { yield return value; }
}
}

Categories