Is C# LINQ OrderBy threadsafe when used with ConcurrentDictionary<Tkey, TValue>? - c#

My working assumption is that LINQ is thread-safe when used with the System.Collections.Concurrent collections (including ConcurrentDictionary).
(Other Overflow posts seem to agree: link)
However, an inspection of the implementation of the LINQ OrderBy extension method shows that it appears not to be threadsafe with the subset of concurrent collections which implement ICollection (e.g. ConcurrentDictionary).
The OrderedEnumerable GetEnumerator (source here) constructs an instance of a Buffer struct (source here) which tries to cast the collection to an ICollection (which ConcurrentDictionary implements) and then performs a collection.CopyTo with an array initialised to the size of the collection.
Therefore, if the ConcurrentDictionary (as the concrete ICollection in this case) grows in size during the OrderBy operation, between initialising the array and copying into it, this operation will throw.
The following test code shows this exception:
(Note: I appreciate that performing an OrderBy on a thread-safe collection which is changing underneath you is not that meaningful, but I do not believe it should throw)
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
namespace Program
{
class Program
{
static void Main(string[] args)
{
try
{
int loop = 0;
while (true) //Run many loops until exception thrown
{
Console.WriteLine($"Loop: {++loop}");
_DoConcurrentDictionaryWork().Wait();
}
}
catch (Exception ex)
{
Console.WriteLine(ex);
}
}
private static async Task _DoConcurrentDictionaryWork()
{
var concurrentDictionary = new ConcurrentDictionary<int, object>();
var keyGenerator = new Random();
var tokenSource = new CancellationTokenSource();
var orderByTaskLoop = Task.Run(() =>
{
var token = tokenSource.Token;
while (token.IsCancellationRequested == false)
{
//Keep ordering concurrent dictionary on a loop
var orderedPairs = concurrentDictionary.OrderBy(x => x.Key).ToArray(); //THROWS EXCEPTION HERE
//...do some more work with ordered snapshot...
}
});
var updateDictTaskLoop = Task.Run(() =>
{
var token = tokenSource.Token;
while (token.IsCancellationRequested == false)
{
//keep mutating dictionary on a loop
var key = keyGenerator.Next(0, 1000);
concurrentDictionary[key] = new object();
}
});
//Wait for 1 second
await Task.Delay(TimeSpan.FromSeconds(1));
//Cancel and dispose token
tokenSource.Cancel();
tokenSource.Dispose();
//Wait for orderBy and update loops to finish (now token cancelled)
await Task.WhenAll(orderByTaskLoop, updateDictTaskLoop);
}
}
}
That the OrderBy throws an exception leads to one of a few possible conclusions:
1) My assumption about LINQ being threadsafe with concurrent collections is incorrect, and it is only safe to perform LINQ on collections (be they concurrent or not) which are not mutating during the LINQ query
2) There is a bug with the implementation of LINQ OrderBy and it is incorrect for the implementation to try and cast the source collection to an ICollection and try and perform the collection copy (and It should just drop through to its default behaviour iterating the IEnumerable).
3) I have misunderstood what is going on here...
Thoughts much appreciated!

It's not stated anywhere that OrderBy (or other LINQ methods) should always use GetEnumerator of source IEnumerable or that it should be thread safe on concurrent collections. All that is promised is this method
Sorts the elements of a sequence in ascending order according to a
key.
ConcurrentDictionary is not thread-safe in some global sense either. It's thread-safe with respect to other operations performed on it. Even more, documentation says that
All public and protected members of ConcurrentDictionary
are thread-safe and may be used concurrently from multiple threads.
However, members accessed through one of the interfaces the
ConcurrentDictionary implements, including extension
methods, are not guaranteed to be thread safe and may need to be
synchronized by the caller.
So, your understanding is correct (OrderBy will see IEnumerable you pass to it is really ICollection, will then get length of that collection, allocate buffer of that size, then will call ICollection.CopyTo, and this is of course not thread safe on any type of collection), but it's not a bug in OrderBy because neither OrderBy nor ConcurrentDictionary ever promised what you assume.
If you want to do OrderBy in a thread safe way on ConcurrentDictionary, you need to rely on methods that are promised to be thread safe. For example:
// note: this is NOT IEnumerable.ToArray()
// but public ToArray() method of ConcurrentDictionary itself
// it is guaranteed to be thread safe with respect to other operations
// on this dictionary
var snapshot = concurrentDictionary.ToArray();
// we are working on snapshot so no one other thread can modify it
// of course at this point real contents of dictionary might not be
// the same as our snapshot
var sorted = snapshot.OrderBy(c => c.Key);
If you don't want to allocate additional array (with ToArray), you can use Select(c => c) and it will work in this case, but then we are again in moot territory and relying on something to be safe to use in situation it was not promised to (Select will also not always enumerate your collection. If collection is array or list - it will shortcut and use indexers instead). So you can create extension method like this:
public static class Extensions {
public static IEnumerable<T> ForceEnumerate<T>(this ICollection<T> collection) {
foreach (var item in collection)
yield return item;
}
}
And use it like this if you want to be safe and don't want to allocate array:
concurrentDictionary.ForceEnumerate().OrderBy(c => c.Key).ToArray();
In this case we are forcing enumeration of ConcurrentDictionary (which we know is safe from documentation) and then pass that to OrderBy knowing that it cannot do any harm with that pure IEnumerable. Note that as correctly pointed out in comments by mjwills, this is not exactly the same as ToArray, because ToArray produces snapshot (locks collection preventing modifications while building array) and Select \ yield does not acquire any locks (so items might be added\removed right when enumeration is in progress). Though I doubt it matters when doing things like described in question - in both cases after OrderBy is completed - you have no idea whether your ordered results reflect current state of collection or not.

Related

Updating List in ConcurrentDictionary

So I have a IList as the value in my ConcurrentDictionary.
ConcurrentDictionary<int, IList<string>> list1 = new ConcurrentDictionary<int, IList<string>>;
In order to update a value in a list I do this:
if (list1.ContainsKey[key])
{
IList<string> templist;
list1.TryGetValue(key, out templist);
templist.Add("helloworld");
}
However, does adding a string to templist update the ConcurrentDictionary? If so, is the update thread-safe so that no data corruption would occur?
Or is there a better way to update or create a list inside the ConcurrentDictionary
EDIT
If I were to use a ConcurrentBag instead of a List, how would I implement this? More specifically, how could I update it? ConcurrentDictionary's TryUpdate method feels a bit excessive.
Does ConcurrentBag.Add update the ConcurrentDictionary in a thread-safe mannar?
ConcurrentDictionary<int, ConcurrentBag<string>> list1 = new ConcurrentDictionary<int, ConcurrentBag<string>>
Firstly, there's no need to do ContainsKey() and TryGetValue().
You should just do this:
IList<string> templist;
if (list1.TryGetValue(key, out templist))
templist.Add("helloworld");
In fact your code as written has a race condition.
Inbetween one thread calling ContainsKey() and TryGetValue() a different thread may have removed the item with that key. Then TryGetValue() will return tempList as null, and then you'll get a null reference exception when you call tempList.Add().
Secondly, yes: There's another possible threading issue here. You don't know that the IList<string> stored inside the dictionary is threadsafe.
Therefore calling tempList.Add() is not guaranteed to be safe.
You could use ConcurrentQueue<string> instead of IList<string>. This is probably going to be the most robust solution.
Note that simply locking access to the IList<string> wouldn't be sufficient.
This is no good:
if (list1.TryGetValue(key, out templist))
{
lock (locker)
{
templist.Add("helloworld");
}
}
unless you also use the same lock everywhere else that the IList may be accessed. This is not easy to achieve, hence it's better to either use a ConcurrentQueue<> or add locking to this class and change the architecture so that no other threads have access to the underlying IList.
Operations on a thread-safe dictionary are thread-safe by key, so to say. So as long as you access your values (in this case an IList<T>) only from one thread, you're good to go.
The ConcurrentDictionary does not prevent two threads at the same time to access the value beloning to one key.
You can use ConcurrentDictionary.AddOrUpdate method to add item to list in thread-safe way. Its simpler and should work fine.
var list1 = new ConcurrentDictionary<int, IList<string>>();
list1.AddOrUpdate(key,
new List<string>() { "test" }, (k, l) => { l.Add("test"); return l;});
UPD
According to docs and sources, factories, which was passed to AddOrUpdate method will be run out of lock scope, so calling List methods inside factory delegate is NOT thread safe.
See comments under this answer.
The ConcurrentDictionary has no effect on whether you can apply changes to value objects in a thread-safe manner or not. That is the reponsiblity of the value object (the IList-implementation in your case).
Looking at the answers of No ConcurrentList<T> in .Net 4.0? there are some good reasons why there is no ConcurrentList implementation in .net.
Basically you have to take care of thread-safe changes yourself. The most simple way is to use the lock operator. E.g.
lock (templist)
{
templist.Add("hello world");
}
Another way is to use the ConcurrentBag in the .net Framework. But this way is only useful for you, if you do not rely on the IList interface and the ordering of items.
it has been already mentioned about what would be the best solution ConcurrentDictionary with ConcurrentBag. Just going to add how to do that
ConcurrentBag<string> bag= new ConcurrentBag<string>();
bag.Add("inputstring");
list1.AddOrUpdate(key,bag,(k,v)=>{
v.Add("inputString");
return v;
});
does adding a string to templist update the ConcurrentDictionary?
It does not.
Your thread safe collection (Dictionary) holds references to non-thread-safe collections (IList). So changing those is not thread safe.
I suppose you should consider using mutexes.
If you use ConcurrentBag<T>:
var dic = new ConcurrentDictionary<int, ConcurrentBag<string>>();
Something like this could work OK:
public static class ConcurentDictionaryExt
{
public static ConcurrentBag<V> AddToInternal<K, V>(this ConcurrentDictionary<K, ConcurrentBag<V>> dic, K key, V value)
=> dic.AddOrUpdate(key,
k => new ConcurrentBag<V>() { value },
(k, existingBag) =>
{
existingBag.Add(value);
return existingBag;
}
);
public static ConcurrentBag<V> AddRangeToInternal<K, V>(this ConcurrentDictionary<K, ConcurrentBag<V>> dic, K key, IEnumerable<V> values)
=> dic.AddOrUpdate(key,
k => new ConcurrentBag<V>(values),
(k, existingBag) =>
{
foreach (var v in values)
existingBag.Add(v);
return existingBag;
}
);
}
I didn't test it yet :)

How to write copy-on-write list in .NET

How to write a thread-safe list using copy-on-write model in .NET?
Below is my current implementation, but after lots of reading about threading, memory barriers, etc, I know that I need to be cautious when multi-threading without locks is involved. Could someone comment if this is the correct implementation?
class CopyOnWriteList
{
private List<string> list = new List<string>();
private object listLock = new object();
public void Add(string item)
{
lock (listLock)
{
list = new List<string>(list) { item };
}
}
public void Remove(string item)
{
lock (listLock)
{
var tmpList = new List<string>(list);
tmpList.Remove(item);
list = tmpList;
}
}
public bool Contains(string item)
{
return list.Contains(item);
}
public string Get(int index)
{
return list[index];
}
}
EDIT
To be more specific: is above code thread safe, or should I add something more? Also, will all thread eventually see change in list reference? Or maybe I should add volatile keyword on list field or Thread.MemoryBarrier in Contains method between accessing reference and calling method on it?
Here is for example Java implementation, looks like my above code, but is such approach also thread-safe in .NET?
And here is the same question, but also in Java.
Here is another question related to this one.
Implementation is correct because reference assignment is atomic in accordance to Atomicity of variable references. I would add volatile to list.
Your approach looks correct, but I'd recommend using a string[] rather than a List<string> to hold your data. When you're adding an item, you know exactly how many items are going to be in the resulting collection, so you can create a new array of exactly the size required. When removing an item, you can grab a copy of the list reference and search it for your item before making a copy; if it turns out that the item doesn't exist, there's no need to remove it. If it does exist, you can create a new array of the exact required size, and copy to the new array all the items preceding or following the item to be removed.
Another thing you might want to consider would be to use a int[1] as your lock flag, and use a pattern something like:
static string[] withAddedItem(string[] oldList, string dat)
{
string[] result = new string[oldList.Length+1];
Array.Copy(oldList, result, oldList.Length);
return result;
}
int Add(string dat) // Returns index of newly-added item
{
string[] oldList, newList;
if (listLock[0] == 0)
{
oldList = list;
newList = withAddedItem(oldList, dat);
if (System.Threading.Interlocked.CompareExchange(list, newList, oldList) == oldList)
return newList.Length;
}
System.Threading.Interlocked.Increment(listLock[0]);
lock (listLock)
{
do
{
oldList = list;
newList = withAddedItem(oldList, dat);
} while (System.Threading.Interlocked.CompareExchange(list, newList, oldList) != oldList);
}
System.Threading.Interlocked.Decrement(listLock[0]);
return newList.Length;
}
If there is no write contention, the CompareExchange will succeed without having to acquire a lock. If there is write contention, writes will be serialized by the lock. Note that the lock here is neither necessary nor sufficient to ensure correctness. Its purpose is to avoid thrashing in the event of write contention. It is possible that thread #1 might get past its first "if" test, and get task task-switched out while many other threads simultaneously try to write the list and start using the lock. If that occurs, thread #1 might then "surprise" the thread in the lock by performing its own CompareExchange. Such an action would result in the lock-holding thread having to waste time making a new array, but that situation should arise rarely enough that the occasional cost of an extra array copy shouldn't matter.
Yes, it is thread-safe:
Collection modifications in Add and Remove are done on separate collections, so it avoids concurrent access to the same collection from Add and Remove or from Add/Remove and Contains/Get.
Assignment of the new collection is done inside lock, which is just pair of Monitor.Enter and Monitor.Exit, which both do a full memory barrier as noted here, which means that after the lock all threads should observe the new value of list field.

ToList()-- does it create a new list?

Let's say I have a class
public class MyObject
{
public int SimpleInt{get;set;}
}
And I have a List<MyObject>, and I ToList() it and then change one of the SimpleInt, will my change be propagated back to the original list. In other words, what would be the output of the following method?
public void RunChangeList()
{
var objs = new List<MyObject>(){new MyObject(){SimpleInt=0}};
var whatInt = ChangeToList(objs );
}
public int ChangeToList(List<MyObject> objects)
{
var objectList = objects.ToList();
objectList[0].SimpleInt=5;
return objects[0].SimpleInt;
}
Why?
P/S: I'm sorry if it seems obvious to find out. But I don't have compiler with me now...
Yes, ToList will create a new list, but because in this case MyObject is a reference type then the new list will contain references to the same objects as the original list.
Updating the SimpleInt property of an object referenced in the new list will also affect the equivalent object in the original list.
(If MyObject was declared as a struct rather than a class then the new list would contain copies of the elements in the original list, and updating a property of an element in the new list would not affect the equivalent element in the original list.)
From the Reflector'd source:
public static List<TSource> ToList<TSource>(this IEnumerable<TSource> source)
{
if (source == null)
{
throw Error.ArgumentNull("source");
}
return new List<TSource>(source);
}
So yes, your original list won't be updated (i.e. additions or removals) however the referenced objects will.
ToList will always create a new list, which will not reflect any subsequent changes to the collection.
However, it will reflect changes to the objects themselves (Unless they're mutable structs).
In other words, if you replace an object in the original list with a different object, the ToList will still contain the first object.
However, if you modify one of the objects in the original list, the ToList will still contain the same (modified) object.
Yes, it creates a new list. This is by design.
The list will contain the same results as the original enumerable sequence, but materialized into a persistent (in-memory) collection. This allows you to consume the results multiple times without incurring the cost of recomputing the sequence.
The beauty of LINQ sequences is that they are composable. Often, the IEnumerable<T> you get is the result of combining multiple filtering, ordering, and/or projection operations. Extension methods like ToList() and ToArray() allow you to convert the computed sequence into a standard collection.
The accepted answer correctly addresses the OP's question based on his example. However, it only applies when ToList is applied to a concrete collection; it does not hold when the elements of the source sequence have yet to be instantiated (due to deferred execution). In case of the latter, you might get a new set of items each time you call ToList (or enumerate the sequence).
Here is an adaptation of the OP's code to demonstrate this behaviour:
public static void RunChangeList()
{
var objs = Enumerable.Range(0, 10).Select(_ => new MyObject() { SimpleInt = 0 });
var whatInt = ChangeToList(objs); // whatInt gets 0
}
public static int ChangeToList(IEnumerable<MyObject> objects)
{
var objectList = objects.ToList();
objectList.First().SimpleInt = 5;
return objects.First().SimpleInt;
}
Whilst the above code may appear contrived, this behaviour can appear as a subtle bug in other scenarios. See my other example for a situation where it causes tasks to get spawned repeatedly.
A new list is created but the items in it are references to the orginal items (just like in the original list). Changes to the list itself are independent, but to the items will find the change in both lists.
Just stumble upon this old post and thought of adding my two cents. Generally, if I am in doubt, I quickly use the GetHashCode() method on any object to check the identities. So for above -
public class MyObject
{
public int SimpleInt { get; set; }
}
class Program
{
public static void RunChangeList()
{
var objs = new List<MyObject>() { new MyObject() { SimpleInt = 0 } };
Console.WriteLine("objs: {0}", objs.GetHashCode());
Console.WriteLine("objs[0]: {0}", objs[0].GetHashCode());
var whatInt = ChangeToList(objs);
Console.WriteLine("whatInt: {0}", whatInt.GetHashCode());
}
public static int ChangeToList(List<MyObject> objects)
{
Console.WriteLine("objects: {0}", objects.GetHashCode());
Console.WriteLine("objects[0]: {0}", objects[0].GetHashCode());
var objectList = objects.ToList();
Console.WriteLine("objectList: {0}", objectList.GetHashCode());
Console.WriteLine("objectList[0]: {0}", objectList[0].GetHashCode());
objectList[0].SimpleInt = 5;
return objects[0].SimpleInt;
}
private static void Main(string[] args)
{
RunChangeList();
Console.ReadLine();
}
And answer on my machine -
objs: 45653674
objs[0]: 41149443
objects: 45653674
objects[0]: 41149443
objectList: 39785641
objectList[0]: 41149443
whatInt: 5
So essentially the object that list carries remain the same in above code. Hope the approach helps.
I think that this is equivalent to asking if ToList does a deep or shallow copy. As ToList has no way to clone MyObject, it must do a shallow copy, so the created list contains the same references as the original one, so the code returns 5.
ToList will create a brand new list.
If the items in the list are value types, they will be directly updated, if they are reference types, any changes will be reflected back in the referenced objects.
In the case where the source object is a true IEnumerable (i.e. not just a collection packaged an as enumerable), ToList() may NOT return the same object references as in the original IEnumerable. It will return a new List of objects, but those objects may not be the same or even Equal to the objects yielded by the IEnumerable when it is enumerated again
var objectList = objects.ToList();
objectList[0].SimpleInt=5;
This will update the original object as well. The new list will contain references to the objects contained within it, just like the original list. You can change the elements either and the update will be reflected in the other.
Now if you update a list (adding or deleting an item) that will not be reflected in the other list.
I don't see anywhere in the documentation that ToList() is always guaranteed to return a new list. If an IEnumerable is a List, it may be more efficient to check for this and simply return the same List.
The worry is that sometimes you may want to be absolutely sure that the returned List is != to the original List. Because Microsoft doesn't document that ToList will return a new List, we can't be sure (unless someone found that documentation). It could also change in the future, even if it works now.
new List(IEnumerable enumerablestuff) is guaranteed to return a new List. I would use this instead.

Is this LINQ dynamic orderby method thread safe?

After searching online and going through past stackoverflow posts for a suitable implementation for dynamic ordering with linq, i came up with my own implementation that borrows a few things from other solutions i have previously seen.
What i need to know is if this implementation is threadsafe? I don't believe it is as i am passing an enumerable generic type object (reference type) as a parameter into the static method but i would like to know if i am missing anything else and hopefully how to make it completely threadsafe.
public class OrderByHelper
{
public static IEnumerable<T> OrderBy<T>(IQueryable<T> items, string sortColumn, string sortDirection, int pageNumber, int pageSize)
{
Type t = typeof(T).GetProperty(sortColumn).PropertyType;
return (IEnumerable<T>)(typeof(OrderByHelper)
.GetMethod("OrderByKnownType")
.MakeGenericMethod(new[] { typeof(T), t })
.Invoke(null, new object[] { items, sortColumn, sortDirection, pageNumber, pageSize }));
}
public static IEnumerable<K> OrderByKnownType<K, T>(IQueryable<K> items, string sortColumn, string sortDirection, int pageNumber, int pageSize)
{
var param = Expression.Parameter(typeof(K), "i");
var mySortExpression = Expression.Lambda<Func<K, T>>(Expression.Property(param, sortColumn), param);
if (!string.IsNullOrEmpty(sortDirection))
{
if (sortDirection == "ASC")
return items.OrderBy(mySortExpression).Skip((pageNumber - 1) * pageSize).Take(pageSize);
else
return items.OrderByDescending(mySortExpression).Skip((pageNumber - 1) * pageSize).Take(pageSize);
}
else
throw new InvalidOperationException("No sorting direction specified.");
}
}
The answer is simple:
It is thread safe if your original collection is thread safe.
All extension methods in LINQ merely call .GetEnumerator() of the original collection. Sorting and Ordering doesn't manipulate the original collection, but rather lets you enumerate it in a sorted order. Thus, you only do read operations on the data. As a general rule of thumb, if you only do read data, you do not need to implement any thread safety.
I'm tempted to say that in 99% of cases you don't need any thread safety, because you collect data only once and then expose the LINQ functionality. You might only need a thread safe collection if you want to create a framework that does not instantiate new collections when refreshing data, but rather keeps re-using the same (observable) collection instance that synchronizes itself with the database data in a fancy way.
But I don't know about your exact scenario. If you really do need thread safety, then it depends if you have control over the code where the original collection is instantiated and the data is added.
If you are using merely LINQ-to-objects, then it's entirely in your control to create an instance of a thread safe collection class at that spot.
If you're using LINQ-to-SQL or anything, then it might be difficult because the original collection that collects the data from the database is probably instantiated deep within the provider and hidden from you. I haven't looked at it though if there are extension points where you can override stuff to use a thread safe collection instead.

What is the yield keyword used for in C#?

In the How Can I Expose Only a Fragment of IList<> question one of the answers had the following code snippet:
IEnumerable<object> FilteredList()
{
foreach(object item in FullList)
{
if(IsItemInPartialList(item))
yield return item;
}
}
What does the yield keyword do there? I've seen it referenced in a couple places, and one other question, but I haven't quite figured out what it actually does. I'm used to thinking of yield in the sense of one thread yielding to another, but that doesn't seem relevant here.
The yield contextual keyword actually does quite a lot here.
The function returns an object that implements the IEnumerable<object> interface. If a calling function starts foreaching over this object, the function is called again until it "yields". This is syntactic sugar introduced in C# 2.0. In earlier versions you had to create your own IEnumerable and IEnumerator objects to do stuff like this.
The easiest way understand code like this is to type-in an example, set some breakpoints and see what happens. Try stepping through this example:
public void Consumer()
{
foreach(int i in Integers())
{
Console.WriteLine(i.ToString());
}
}
public IEnumerable<int> Integers()
{
yield return 1;
yield return 2;
yield return 4;
yield return 8;
yield return 16;
yield return 16777216;
}
When you step through the example, you'll find the first call to Integers() returns 1. The second call returns 2 and the line yield return 1 is not executed again.
Here is a real-life example:
public IEnumerable<T> Read<T>(string sql, Func<IDataReader, T> make, params object[] parms)
{
using (var connection = CreateConnection())
{
using (var command = CreateCommand(CommandType.Text, sql, connection, parms))
{
command.CommandTimeout = dataBaseSettings.ReadCommandTimeout;
using (var reader = command.ExecuteReader())
{
while (reader.Read())
{
yield return make(reader);
}
}
}
}
}
Iteration. It creates a state machine "under the covers" that remembers where you were on each additional cycle of the function and picks up from there.
Yield has two great uses,
It helps to provide custom iteration without creating temp collections.
It helps to do stateful iteration.
In order to explain above two points more demonstratively, I have created a simple video you can watch it here
Recently Raymond Chen also ran an interesting series of articles on the yield keyword.
The implementation of iterators in C# and its consequences (part 1)
The implementation of iterators in C# and its consequences (part 2)
The implementation of iterators in C# and its consequences (part 3)
The implementation of iterators in C# and its consequences (part 4)
While it's nominally used for easily implementing an iterator pattern, but can be generalized into a state machine. No point in quoting Raymond, the last part also links to other uses (but the example in Entin's blog is esp good, showing how to write async safe code).
At first sight, yield return is a .NET sugar to return an IEnumerable.
Without yield, all the items of the collection are created at once:
class SomeData
{
public SomeData() { }
static public IEnumerable<SomeData> CreateSomeDatas()
{
return new List<SomeData> {
new SomeData(),
new SomeData(),
new SomeData()
};
}
}
Same code using yield, it returns item by item:
class SomeData
{
public SomeData() { }
static public IEnumerable<SomeData> CreateSomeDatas()
{
yield return new SomeData();
yield return new SomeData();
yield return new SomeData();
}
}
The advantage of using yield is that if the function consuming your data simply needs the first item of the collection, the rest of the items won't be created.
The yield operator allows the creation of items as it is demanded. That's a good reason to use it.
A list or array implementation loads all of the items immediately whereas the yield implementation provides a deferred execution solution.
In practice, it is often desirable to perform the minimum amount of work as needed in order to reduce the resource consumption of an application.
For example, we may have an application that process millions of records from a database. The following benefits can be achieved when we use IEnumerable in a deferred execution pull-based model:
Scalability, reliability and predictability are likely to improve since the number of records does not significantly affect the application’s resource requirements.
Performance and responsiveness are likely to improve since processing can start immediately instead of waiting for the entire collection to be loaded first.
Recoverability and utilisation are likely to improve since the application can be stopped, started, interrupted or fail. Only the items in progress will be lost compared to pre-fetching all of the data where only using a portion of the results was actually used.
Continuous processing is possible in environments where constant workload streams are added.
Here is a comparison between build a collection first such as a list compared to using yield.
List Example
public class ContactListStore : IStore<ContactModel>
{
public IEnumerable<ContactModel> GetEnumerator()
{
var contacts = new List<ContactModel>();
Console.WriteLine("ContactListStore: Creating contact 1");
contacts.Add(new ContactModel() { FirstName = "Bob", LastName = "Blue" });
Console.WriteLine("ContactListStore: Creating contact 2");
contacts.Add(new ContactModel() { FirstName = "Jim", LastName = "Green" });
Console.WriteLine("ContactListStore: Creating contact 3");
contacts.Add(new ContactModel() { FirstName = "Susan", LastName = "Orange" });
return contacts;
}
}
static void Main(string[] args)
{
var store = new ContactListStore();
var contacts = store.GetEnumerator();
Console.WriteLine("Ready to iterate through the collection.");
Console.ReadLine();
}
Console Output
ContactListStore: Creating contact 1
ContactListStore: Creating contact 2
ContactListStore: Creating contact 3
Ready to iterate through the collection.
Note: The entire collection was loaded into memory without even asking for a single item in the list
Yield Example
public class ContactYieldStore : IStore<ContactModel>
{
public IEnumerable<ContactModel> GetEnumerator()
{
Console.WriteLine("ContactYieldStore: Creating contact 1");
yield return new ContactModel() { FirstName = "Bob", LastName = "Blue" };
Console.WriteLine("ContactYieldStore: Creating contact 2");
yield return new ContactModel() { FirstName = "Jim", LastName = "Green" };
Console.WriteLine("ContactYieldStore: Creating contact 3");
yield return new ContactModel() { FirstName = "Susan", LastName = "Orange" };
}
}
static void Main(string[] args)
{
var store = new ContactYieldStore();
var contacts = store.GetEnumerator();
Console.WriteLine("Ready to iterate through the collection.");
Console.ReadLine();
}
Console Output
Ready to iterate through the collection.
Note: The collection wasn't executed at all. This is due to the "deferred execution" nature of IEnumerable. Constructing an item will only occur when it is really required.
Let's call the collection again and obverse the behaviour when we fetch the first contact in the collection.
static void Main(string[] args)
{
var store = new ContactYieldStore();
var contacts = store.GetEnumerator();
Console.WriteLine("Ready to iterate through the collection");
Console.WriteLine("Hello {0}", contacts.First().FirstName);
Console.ReadLine();
}
Console Output
Ready to iterate through the collection
ContactYieldStore: Creating contact 1
Hello Bob
Nice! Only the first contact was constructed when the client "pulled" the item out of the collection.
yield return is used with enumerators. On each call of yield statement, control is returned to the caller but it ensures that the callee's state is maintained. Due to this, when the caller enumerates the next element, it continues execution in the callee method from statement immediately after the yield statement.
Let us try to understand this with an example. In this example, corresponding to each line I have mentioned the order in which execution flows.
static void Main(string[] args)
{
foreach (int fib in Fibs(6))//1, 5
{
Console.WriteLine(fib + " ");//4, 10
}
}
static IEnumerable<int> Fibs(int fibCount)
{
for (int i = 0, prevFib = 0, currFib = 1; i < fibCount; i++)//2
{
yield return prevFib;//3, 9
int newFib = prevFib + currFib;//6
prevFib = currFib;//7
currFib = newFib;//8
}
}
Also, the state is maintained for each enumeration. Suppose, I have another call to Fibs() method then the state will be reset for it.
Intuitively, the keyword returns a value from the function without leaving it, i.e. in your code example it returns the current item value and then resumes the loop. More formally, it is used by the compiler to generate code for an iterator. Iterators are functions that return IEnumerable objects. The MSDN has several articles about them.
If I understand this correctly, here's how I would phrase this from the perspective of the function implementing IEnumerable with yield.
Here's one.
Call again if you need another.
I'll remember what I already gave you.
I'll only know if I can give you another when you call again.
Here is a simple way to understand the concept:
The basic idea is, if you want a collection that you can use "foreach" on, but gathering the items into the collection is expensive for some reason (like querying them out of a database), AND you will often not need the entire collection, then you create a function that builds the collection one item at a time and yields it back to the consumer (who can then terminate the collection effort early).
Think of it this way: You go to the meat counter and want to buy a pound of sliced ham. The butcher takes a 10-pound ham to the back, puts it on the slicer machine, slices the whole thing, then brings the pile of slices back to you and measures out a pound of it. (OLD way).
With yield, the butcher brings the slicer machine to the counter, and starts slicing and "yielding" each slice onto the scale until it measures 1-pound, then wraps it for you and you're done. The Old Way may be better for the butcher (lets him organize his machinery the way he likes), but the New Way is clearly more efficient in most cases for the consumer.
The yield keyword allows you to create an IEnumerable<T> in the form on an iterator block. This iterator block supports deferred executing and if you are not familiar with the concept it may appear almost magical. However, at the end of the day it is just code that executes without any weird tricks.
An iterator block can be described as syntactic sugar where the compiler generates a state machine that keeps track of how far the enumeration of the enumerable has progressed. To enumerate an enumerable, you often use a foreach loop. However, a foreach loop is also syntactic sugar. So you are two abstractions removed from the real code which is why it initially might be hard to understand how it all works together.
Assume that you have a very simple iterator block:
IEnumerable<int> IteratorBlock()
{
Console.WriteLine("Begin");
yield return 1;
Console.WriteLine("After 1");
yield return 2;
Console.WriteLine("After 2");
yield return 42;
Console.WriteLine("End");
}
Real iterator blocks often have conditions and loops but when you check the conditions and unroll the loops they still end up as yield statements interleaved with other code.
To enumerate the iterator block a foreach loop is used:
foreach (var i in IteratorBlock())
Console.WriteLine(i);
Here is the output (no surprises here):
Begin
1
After 1
2
After 2
42
End
As stated above foreach is syntactic sugar:
IEnumerator<int> enumerator = null;
try
{
enumerator = IteratorBlock().GetEnumerator();
while (enumerator.MoveNext())
{
var i = enumerator.Current;
Console.WriteLine(i);
}
}
finally
{
enumerator?.Dispose();
}
In an attempt to untangle this I have crated a sequence diagram with the abstractions removed:
The state machine generated by the compiler also implements the enumerator but to make the diagram more clear I have shown them as separate instances. (When the state machine is enumerated from another thread you do actually get separate instances but that detail is not important here.)
Every time you call your iterator block a new instance of the state machine is created. However, none of your code in the iterator block is executed until enumerator.MoveNext() executes for the first time. This is how deferred executing works. Here is a (rather silly) example:
var evenNumbers = IteratorBlock().Where(i => i%2 == 0);
At this point the iterator has not executed. The Where clause creates a new IEnumerable<T> that wraps the IEnumerable<T> returned by IteratorBlock but this enumerable has yet to be enumerated. This happens when you execute a foreach loop:
foreach (var evenNumber in evenNumbers)
Console.WriteLine(eventNumber);
If you enumerate the enumerable twice then a new instance of the state machine is created each time and your iterator block will execute the same code twice.
Notice that LINQ methods like ToList(), ToArray(), First(), Count() etc. will use a foreach loop to enumerate the enumerable. For instance ToList() will enumerate all elements of the enumerable and store them in a list. You can now access the list to get all elements of the enumerable without the iterator block executing again. There is a trade-off between using CPU to produce the elements of the enumerable multiple times and memory to store the elements of the enumeration to access them multiple times when using methods like ToList().
One major point about Yield keyword is Lazy Execution. Now what I mean by Lazy Execution is to execute when needed. A better way to put it is by giving an example
Example: Not using Yield i.e. No Lazy Execution.
public static IEnumerable<int> CreateCollectionWithList()
{
var list = new List<int>();
list.Add(10);
list.Add(0);
list.Add(1);
list.Add(2);
list.Add(20);
return list;
}
Example: using Yield i.e. Lazy Execution.
public static IEnumerable<int> CreateCollectionWithYield()
{
yield return 10;
for (int i = 0; i < 3; i++)
{
yield return i;
}
yield return 20;
}
Now when I call both methods.
var listItems = CreateCollectionWithList();
var yieldedItems = CreateCollectionWithYield();
you will notice listItems will have a 5 items inside it (hover your mouse on listItems while debugging).
Whereas yieldItems will just have a reference to the method and not the items.
That means it has not executed the process of getting items inside the method. A very efficient way of getting data only when needed.
Actual implementation of yield can be seen in ORM like Entity Framework and NHibernate etc.
The C# yield keyword, to put it simply, allows many calls to a body of code, referred to as an iterator, that knows how to return before it's done and, when called again, continues where it left off - i.e. it helps an iterator become transparently stateful per each item in a sequence that the iterator returns in successive calls.
In JavaScript, the same concept is called Generators.
It is a very simple and easy way to create an enumerable for your object. The compiler creates a class that wraps your method and that implements, in this case, IEnumerable<object>. Without the yield keyword, you'd have to create an object that implements IEnumerable<object>.
It's producing enumerable sequence. What it does is actually creating local IEnumerable sequence and returning it as a method result
This link has a simple example
Even simpler examples are here
public static IEnumerable<int> testYieldb()
{
for(int i=0;i<3;i++) yield return 4;
}
Notice that yield return won't return from the method. You can even put a WriteLine after the yield return
The above produces an IEnumerable of 4 ints 4,4,4,4
Here with a WriteLine. Will add 4 to the list, print abc, then add 4 to the list, then complete the method and so really return from the method(once the method has completed, as would happen with a procedure without a return). But this would have a value, an IEnumerable list of ints, that it returns on completion.
public static IEnumerable<int> testYieldb()
{
yield return 4;
console.WriteLine("abc");
yield return 4;
}
Notice also that when you use yield, what you are returning is not of the same type as the function. It's of the type of an element within the IEnumerable list.
You use yield with the method's return type as IEnumerable. If the method's return type is int or List<int> and you use yield, then it won't compile. You can use IEnumerable method return type without yield but it seems maybe you can't use yield without IEnumerable method return type.
And to get it to execute you have to call it in a special way.
static void Main(string[] args)
{
testA();
Console.Write("try again. the above won't execute any of the function!\n");
foreach (var x in testA()) { }
Console.ReadLine();
}
// static List<int> testA()
static IEnumerable<int> testA()
{
Console.WriteLine("asdfa");
yield return 1;
Console.WriteLine("asdf");
}
Nowadays you can use the yield keyword for async streams.
C# 8.0 introduces async streams, which model a streaming source of data. Data streams often retrieve or generate elements asynchronously. Async streams rely on new interfaces introduced in .NET Standard 2.1. These interfaces are supported in .NET Core 3.0 and later. They provide a natural programming model for asynchronous streaming data sources.
Source: Microsoft docs
Example below
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
public class Program
{
public static async Task Main()
{
List<int> numbers = new List<int>() { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
await foreach(int number in YieldReturnNumbers(numbers))
{
Console.WriteLine(number);
}
}
public static async IAsyncEnumerable<int> YieldReturnNumbers(List<int> numbers)
{
foreach (int number in numbers)
{
await Task.Delay(1000);
yield return number;
}
}
}
Simple demo to understand yield
using System;
using System.Collections.Generic;
using System.Linq;
namespace ConsoleApp_demo_yield {
class Program
{
static void Main(string[] args)
{
var letters = new List<string>() { "a1", "b1", "c2", "d2" };
// Not yield
var test1 = GetNotYield(letters);
foreach (var t in test1)
{
Console.WriteLine(t);
}
// yield
var test2 = GetWithYield(letters).ToList();
foreach (var t in test2)
{
Console.WriteLine(t);
}
Console.ReadKey();
}
private static IList<string> GetNotYield(IList<string> list)
{
var temp = new List<string>();
foreach(var x in list)
{
if (x.Contains("2")) {
temp.Add(x);
}
}
return temp;
}
private static IEnumerable<string> GetWithYield(IList<string> list)
{
foreach (var x in list)
{
if (x.Contains("2"))
{
yield return x;
}
}
}
}
}
It's trying to bring in some Ruby Goodness :)
Concept: This is some sample Ruby Code that prints out each element of the array
rubyArray = [1,2,3,4,5,6,7,8,9,10]
rubyArray.each{|x|
puts x # do whatever with x
}
The Array's each method implementation yields control over to the caller (the 'puts x') with each element of the array neatly presented as x. The caller can then do whatever it needs to do with x.
However .Net doesn't go all the way here.. C# seems to have coupled yield with IEnumerable, in a way forcing you to write a foreach loop in the caller as seen in Mendelt's response. Little less elegant.
//calling code
foreach(int i in obCustomClass.Each())
{
Console.WriteLine(i.ToString());
}
// CustomClass implementation
private int[] data = {1,2,3,4,5,6,7,8,9,10};
public IEnumerable<int> Each()
{
for(int iLooper=0; iLooper<data.Length; ++iLooper)
yield return data[iLooper];
}

Categories