Yield item in a list - c#

In my application there is a List<MyItem> with a getter only:
public List<MyItem> myList
{
get
{
return new List<MyItem>
{
MyHost.GetItemFromID(_i1), //this may be a long operation
MyHost.GetItemFromID(_i2),
MyHost.GetItemFromID(_i3),
MyHost.GetItemFromID(_i4),
MyHost.GetItemFromID(_i5)
};
}
}
This list sometimes needs to be retrieved as a whole, and other times only a certain item has to be accessed, e.g. myList[3]. Is there a way to avoid building the entire list when I only need the fourth item?

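You could wrap the list in a class, say 'myListContainer', and give it an indexer (C# has no overloadable '[]' operator; indexers play that role) so that you could do something like this:
myListContainer[3]
which will invoke the call
MyHost.GetItemFromID(_i4);
and return the desired list item (the fourth one, since the index is zero-based).
I'll add a full example if needed
EDIT
public class myListContainer
{
private readonly int[] _ids; // holds _i1.._i5; assumes the IDs are ints
public myListContainer(params int[] ids)
{
_ids = ids;
}
public MyItem this[int i]
{
get
{
// Look up the ID at position i, then fetch only that one item
return MyHost.GetItemFromID(_ids[i]);
}
}
}
Then add a method to get the entire list.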
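A self-contained sketch of such a container, including a method that returns the entire list. MyItem, MyHost and its Calls counter are hypothetical stand-ins for the question's types, and the IDs are assumed to be ints:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's item type.
public class MyItem
{
    public int Id { get; }
    public MyItem(int id) { Id = id; }
}

// Hypothetical stand-in for MyHost; counts fetches so the laziness is visible.
public static class MyHost
{
    public static int Calls;

    public static MyItem GetItemFromID(int id)
    {
        Calls++;               // pretend this is the slow lookup
        return new MyItem(id);
    }
}

public class MyListContainer
{
    private readonly int[] _ids; // the IDs (_i1.._i5 in the question)

    public MyListContainer(params int[] ids)
    {
        _ids = ids ?? throw new ArgumentNullException(nameof(ids));
    }

    // Zero-based, like List<T>: fetches only the requested item.
    public MyItem this[int index] => MyHost.GetItemFromID(_ids[index]);

    // Fetches every item, for the cases where the whole list is needed.
    public List<MyItem> GetAll() => _ids.Select(id => MyHost.GetItemFromID(id)).ToList();
}
```

With this, `new MyListContainer(_i1, _i2, _i3, _i4, _i5)[3]` performs a single lookup, while GetAll() performs all five.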

You could return IEnumerable<MyItem> and use yield return:
public IEnumerable<MyItem> MyItems
{
get
{
yield return MyHost.GetItemFromID(_i1); //this may be a long operation
yield return MyHost.GetItemFromID(_i2);
yield return MyHost.GetItemFromID(_i3);
yield return MyHost.GetItemFromID(_i4);
yield return MyHost.GetItemFromID(_i5);
}
}
This uses deferred execution, so you can write:
var fourItems = MyItems.Take(4).ToList();
Note that it might be a good idea to change the order of execution if the order doesn't matter and only the first call of GetItemFromID takes more time than the others:
yield return MyHost.GetItemFromID(_i2);
yield return MyHost.GetItemFromID(_i3);
yield return MyHost.GetItemFromID(_i4);
yield return MyHost.GetItemFromID(_i5);
yield return MyHost.GetItemFromID(_i1); //this may be a long operation
I think I misunderstood the requirement. I read "sometimes I only need the fourth item" as "only need four items".
So you could use my approach, but with ElementAt:
MyItem fourthItem = MyItems.ElementAt(3);
If you don't know whether there is a fourth item, use ElementAtOrDefault.
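To see that ElementAt only runs the calls it needs, here is a sketch with a hypothetical FakeHost that records which IDs were fetched; string stands in for MyItem and the IDs 1 to 5 stand in for _i1.._i5. Note that reaching the fourth item still pays for the first three:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for MyHost that records every ID it is asked for.
public static class FakeHost
{
    public static readonly List<int> Fetched = new List<int>();

    public static string GetItemFromID(int id)
    {
        Fetched.Add(id);
        return "item" + id;
    }
}

public static class Items
{
    // Same shape as the MyItems iterator property above.
    public static IEnumerable<string> MyItems
    {
        get
        {
            yield return FakeHost.GetItemFromID(1);
            yield return FakeHost.GetItemFromID(2);
            yield return FakeHost.GetItemFromID(3);
            yield return FakeHost.GetItemFromID(4);
            yield return FakeHost.GetItemFromID(5);
        }
    }
}
```

Calling `Items.MyItems.ElementAt(3)` stops after the fourth call, so ID 5 is never fetched.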

I don't know what type your keys (_i1 etc.) are, but let's assume they are ints. Then you could do this:
public IEnumerable<MyItem> GetItems(params int[] keys)
{
return keys.Select(key => MyHost.GetItemFromID(key));
}
Which you could call like this:
var myItems = GetItems(_i2, _i3, _i5).ToList();
or
var myItems = GetItems(_i3).ToArray();
and so on.
Note: If you only ever want a list returned, you can do the conversion inside GetItems() itself:
public List<MyItem> GetItems(params int[] keys)
{
return keys.Select(key => MyHost.GetItemFromID(key)).ToList();
}
This approach requires a method rather than a property, though.

Note:
This assumes that by '4th item' you mean an ordinal position, not the actual ID itself.
Build a map in advance and return only the items you want.
List<int> map = new List<int>
{
_i1,
_i2,
_i3,
_i4,
_i5,
};
public IEnumerable<MyItem> myList(params int[] indices)
{
if (!indices.Any())
return map.Select(MyHost.GetItemFromID);
return indices.Select(i => MyHost.GetItemFromID(map[i]));
}
// so you call
myList(); // all items; you might prefer to split this API into two separate methods
myList(0); // or
myList(4); // or
myList(1, 3); // all loaded only on demand
Use a dictionary if you want more control over indexing. Also, if these IDs come from a database, you should ideally pass all of them in one go and let SQL handle the lookup directly.
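One possible reading of the dictionary suggestion, as a sketch: the string keys, the IDs 101/102/104 and the local GetItemFromID stand-in are all hypothetical. Items are fetched only when the result is enumerated:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ItemCatalog
{
    // Hypothetical map from any key you like (here, names) to item IDs.
    private static readonly Dictionary<string, int> Map = new Dictionary<string, int>
    {
        ["first"] = 101,
        ["second"] = 102,
        ["fourth"] = 104,
    };

    // Counts calls so the laziness is visible.
    public static int Fetches;

    // Hypothetical stand-in for MyHost.GetItemFromID.
    private static string GetItemFromID(int id)
    {
        Fetches++;
        return "item" + id;
    }

    // Select is deferred: nothing is fetched until the result is enumerated.
    public static IEnumerable<string> Items(params string[] keys) =>
        keys.Select(key => GetItemFromID(Map[key]));
}
```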

Related

C# yield return performance

How much space is reserved for the underlying collection behind a method using yield return syntax WHEN I PERFORM a ToList() on it? Is there a chance it will reallocate, and thus decrease performance compared to the standard approach where I create a list with a predefined capacity?
The two scenarios:
public IEnumerable<T> GetList1()
{
foreach( var item in collection )
yield return item.Property;
}
public IEnumerable<T> GetList2()
{
List<T> outputList = new List<T>( collection.Count() );
foreach( var item in collection )
outputList.Add( item.Property );
return outputList;
}
yield return does not create an array that has to be resized, as List does; instead, it creates an IEnumerable backed by a state machine.
For instance, let's take this method:
public static IEnumerable<int> Foo()
{
Console.WriteLine("Returning 1");
yield return 1;
Console.WriteLine("Returning 2");
yield return 2;
Console.WriteLine("Returning 3");
yield return 3;
}
Now let's call it and assign that enumerable to a variable:
var elems = Foo();
None of the code in Foo has executed yet. Nothing will be printed on the console. But if we iterate over it, like this:
foreach(var elem in elems)
{
Console.WriteLine( "Got " + elem );
}
On the first iteration of the foreach loop, the Foo method will be executed until the first yield return. Then, on the second iteration, the method will "resume" from where it left off (right after the yield return 1), and execute until the next yield return. Same for all subsequent elements.
At the end of the loop, the console will look like this:
Returning 1
Got 1
Returning 2
Got 2
Returning 3
Got 3
This means you can write methods like this:
public static IEnumerable<int> GetAnswers()
{
while( true )
{
yield return 42;
}
}
You can call the GetAnswers method, and every time you request an element, it'll give you 42; the sequence never ends. You couldn't do this with a List, because lists have to have a finite size.
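Because the sequence is infinite, it has to be bounded before it is materialized; calling ToList() directly on GetAnswers() would never return. A minimal sketch (the Answers class and FirstThree helper are hypothetical names):

```csharp
using System.Collections.Generic;
using System.Linq;

public static class Answers
{
    // Infinite iterator, as in GetAnswers above.
    public static IEnumerable<int> GetAnswers()
    {
        while (true)
        {
            yield return 42;
        }
    }

    // Take bounds the sequence, so only three values are ever produced.
    public static List<int> FirstThree() => GetAnswers().Take(3).ToList();
}
```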
How much space is reserved to the underlying collection behind a method using yield return syntax?
There's no underlying collection.
There's an object, but it isn't a collection. Just how much space it will take up depends on what it needs to keep track of.
There's a chance it will reallocate
No.
And thus decrease performance if compared to the standard approach where i create a list with predefined capacity?
It will almost certainly take up less memory than creating a list with a predefined capacity.
Let's try a manual example. Say we had the following code:
public static IEnumerable<int> CountToTen()
{
for(var i = 1; i != 11; ++i)
yield return i;
}
Iterating through this with foreach gives the numbers 1 to 10 inclusive.
Now let's do this the way we would have to if yield did not exist. We'd do something like:
using System;
using System.Collections;
using System.Collections.Generic;

private class CountToTenEnumerator : IEnumerator<int>
{
private int _current;
public int Current
{
get
{
if(_current == 0)
throw new InvalidOperationException();
return _current;
}
}
object IEnumerator.Current
{
get { return Current; }
}
public bool MoveNext()
{
if(_current == 10)
return false;
_current++;
return true;
}
public void Reset()
{
throw new NotSupportedException();
// We *could* just set _current back, but the object produced by
// yield won't do that, so we'll match that.
}
public void Dispose()
{
}
}
private class CountToTenEnumerable : IEnumerable<int>
{
public IEnumerator<int> GetEnumerator()
{
return new CountToTenEnumerator();
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
}
public static IEnumerable<int> CountToTen()
{
return new CountToTenEnumerable();
}
Now, for a variety of reasons this is quite different from the code you're likely to get from the version using yield, but the basic principle is the same. As you can see, two object allocations are involved (the same number as if we had a collection and then did a foreach on it), plus the storage of a single int. In practice we can expect yield to store a few more bytes than that, but not a lot.
Edit: yield actually does a trick where the first GetEnumerator() call made on the same thread that obtained the object returns that same object, so one object does double service as both the enumerable and the enumerator. Since this covers over 99% of use cases, yield actually does one allocation rather than two.
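The reuse can be observed with a sketch like the following (the IteratorReuse class is a hypothetical name). This is a detail of how the C# compiler generates iterators, not a language guarantee:

```csharp
using System.Collections.Generic;

public static class IteratorReuse
{
    public static IEnumerable<int> CountToTen()
    {
        for (var i = 1; i != 11; ++i)
            yield return i;
    }

    // Reports whether the first and second GetEnumerator() calls
    // hand back the enumerable object itself.
    public static (bool First, bool Second) Check()
    {
        IEnumerable<int> seq = CountToTen();
        IEnumerator<int> e1 = seq.GetEnumerator(); // first call, same thread
        IEnumerator<int> e2 = seq.GetEnumerator(); // second call
        return (object.ReferenceEquals(seq, e1), object.ReferenceEquals(seq, e2));
    }
}
```

With the current compiler, the first call returns the same object and the second call allocates a fresh one.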
Now let's look at:
public IEnumerable<T> GetList1()
{
foreach( var item in collection )
yield return item.Property;
}
While this would use more memory than just returning collection, it won't use a lot more; the only thing the generated enumerator really needs to keep track of is the enumerator obtained by calling GetEnumerator() on collection, which it wraps.
This is going to use massively less memory than the wasteful second approach you mention, and be much faster to get going.
Edit:
You've changed your question to include "syntax WHEN I PERFORM a ToList() on it", which is worth considering.
Now we need to add a third consideration: knowledge of the collection's size.
Here, there is the possibility that using new List<T>(capacity) will prevent reallocations while the list is being built. That can indeed be a considerable saving.
If the object that has ToList called on it implements ICollection<T> then ToList will end up first doing a single allocation of an internal array of T and then calling ICollection<T>.CopyTo().
This would mean that your GetList2 would result in a faster ToList() than your GetList1.
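A small sketch to observe the difference ToList() sees between a sized collection and an iterator. The helper names are hypothetical, and the exact spare capacity in the iterator case is a runtime implementation detail, so only the collection case is expected to match Count exactly:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ToListCapacity
{
    // Wrapping in an iterator hides the source's Count from ToList().
    static IEnumerable<int> Iterate(IEnumerable<int> source)
    {
        foreach (var x in source)
            yield return x;
    }

    public static (int Count, int Capacity) FromCollection(int n)
    {
        // int[] implements ICollection<int>, so the size is known up front.
        List<int> list = Enumerable.Range(0, n).ToArray().ToList();
        return (list.Count, list.Capacity);
    }

    public static (int Count, int Capacity) FromIterator(int n)
    {
        // Size unknown: the list grows as items arrive.
        List<int> list = Iterate(Enumerable.Range(0, n)).ToList();
        return (list.Count, list.Capacity);
    }
}
```

On the implementations I'm aware of, FromCollection(5) reports a Capacity of exactly 5, while FromIterator(5) may report more, left over from the list's growth.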
However, your GetList2 has already wasted time and memory doing what ToList() will do with the results of GetList1 anyway!
What it should have done here (if no per-item conversion were needed) was just return new List<T>(collection); and be done with it.
If, though, we need to actually do something inside GetList1 or GetList2 (e.g. convert elements, filter elements, track averages, and so on), then GetList1 is going to be faster and lighter on memory: much lighter if we never call ToList() on it, and slightly lighter if we do, because the faster and lighter ToList() on GetList2's result is offset by GetList2 being slower and heavier in the first place by exactly the same amount.

Yield return results of another Enumerable of the same datatype

I am writing validation logic, and I want the caller to get only the number of validation messages they really need (in some cases just the first validation message is necessary; other times we want to know all of the problems with the given data).
Given this, I thought "Brilliant! I'll return an IEnumerable and yield return each of the results." If FirstOrDefault() is used on the enumeration, execution stops at the first failed validation and the remaining validations are skipped, unless we call ToList() on the validation result enumerable.
The issue I am seeing is if I want to break my validation logic into multiple methods, each returning an Enumerable, I have to enumerate over THAT set with another yield return there as well. (see simplified example below)
public IEnumerable<string> Validate(ClassToValidate obj)
{
if(string.IsNullOrEmpty(obj.Name))
{
yield return "empty name";
}
foreach(var message in ValidateSubObject(obj.OtherObjectToValidate))
{
yield return message;
}
}
private IEnumerable<string> ValidateSubObject(OtherClass objToValidate)
{
yield return ...
}
Is there some other keyword I am missing, where I could "yield return set" from the other method that returns another IEnumerable of the same datatype? i.e. is there a simpler syntax than:
foreach(var message in ValidateSubObject(obj.OtherObjectToValidate))
{
yield return message;
}
You cannot yield return multiple items. If you want to use iterator methods to concatenate sequences, you'll have to loop through them.
Of course, you could always drop the yield return completely and construct your IEnumerable<T> to be returned using other means (LINQ's Concat method immediately comes to mind).
public IEnumerable<string> Validate(ClassToValidate obj)
{
var subObjectMessages = ValidateSubObject(obj.OtherObjectToValidate);
if (string.IsNullOrEmpty(obj.Name))
{
return new[] { "empty name" }.Concat(subObjectMessages);
}
return subObjectMessages;
}
Once you've introduced yield in a method, you have to stay with it: an iterator method cannot also return a value with a plain return statement. A common approach these days is to use LINQ, which is often more flexible.
public IEnumerable<string> Validate(ClassToValidate obj)
{
return (String.IsNullOrEmpty(obj.Name) ? new [] { "empty name" } : Enumerable.Empty<string>())
.Concat(ValidateSubObject(obj.OtherObjectToValidate));
}
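The Concat version keeps the laziness the question is after. In this sketch, ClassToValidate, OtherClass, the sub-validation rules and the SubValidationStarted flag are hypothetical stand-ins (the question elides the real checks); the flag is set only when the sub-object iterator actually runs:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the question's types.
public class OtherClass
{
    public int Value { get; set; }
}

public class ClassToValidate
{
    public string Name { get; set; }
    public OtherClass OtherObjectToValidate { get; set; }
}

public static class Validator
{
    // Becomes true only when the sub-object iterator body starts executing.
    public static bool SubValidationStarted;

    public static IEnumerable<string> Validate(ClassToValidate obj) =>
        (string.IsNullOrEmpty(obj.Name) ? new[] { "empty name" } : Enumerable.Empty<string>())
            .Concat(ValidateSubObject(obj.OtherObjectToValidate));

    // Hypothetical sub-validation rules.
    private static IEnumerable<string> ValidateSubObject(OtherClass objToValidate)
    {
        SubValidationStarted = true;
        if (objToValidate == null)
            yield return "missing sub-object";
        else if (objToValidate.Value < 0)
            yield return "negative value";
    }
}
```

With an empty name, FirstOrDefault() returns "empty name" without ever running ValidateSubObject's body; ToList() runs both checks.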

How to properly check IEnumerable for existing results

What's the best practice to check if a collection has items?
Here's an example of what I have:
var terminalsToSync = TerminalAction.GetAllTerminals();
if(terminalsToSync.Any())
SyncTerminals(terminalsToSync);
else
GatewayLogAction.WriteLogInfo(Messages.NoTerminalsForSync);
The GetAllTerminals() method will execute a stored procedure and, if it returns any results (Any() is true), SyncTerminals() will loop through the elements, enumerating it again and executing the stored procedure a second time.
What's the best way to avoid this?
I'd like a good solution that can be used in other cases too; possibly without converting it to List.
Thanks in advance.
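A minimal reproduction of the double execution, assuming GetAllTerminals() returns a deferred query. The TerminalDemo class, its ProcedureCalls counter and the string terminals are hypothetical stand-ins for the real data access:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class TerminalDemo
{
    public static int ProcedureCalls; // how many times the "stored procedure" ran

    // Hypothetical deferred query: the body runs when enumeration starts.
    public static IEnumerable<string> GetAllTerminals()
    {
        ProcedureCalls++;
        yield return "T1";
        yield return "T2";
    }

    public static int SyncWithAnyThenForeach()
    {
        var terminals = GetAllTerminals();
        int synced = 0;
        if (terminals.Any())               // enumeration #1
            foreach (var t in terminals)   // enumeration #2: the query runs again
                synced++;
        return synced;
    }

    public static int SyncBuffered()
    {
        var terminals = GetAllTerminals().ToList(); // the only enumeration
        int synced = 0;
        if (terminals.Count > 0)
            foreach (var t in terminals)
                synced++;
        return synced;
    }
}
```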
I would probably use a ToArray call, and then check Length; you're going to enumerate all the results anyway so why not do it early? However, since you've said you want to avoid early realisation of the enumerable...
I'm guessing that SyncTerminals has a foreach, in which case you can write it something like this:
bool any = false;
foreach(var terminal in terminalsToSync)
{
if(!any)any = true;
//....
}
if(!any)
GatewayLogAction.WriteLogInfo(Messages.NoTerminalsForSync);
Okay, that leaves a redundant if inside the loop on every iteration after the first, but I'm guessing the cost of a few extra CPU cycles isn't going to matter much.
Equally, you could do the iteration the old way, using GetEnumerator and a do...while loop and taking the first check out of the loop; that way there are literally no wasted operations:
using (var enumerator = terminalsToSync.GetEnumerator())
{
if(enumerator.MoveNext())
{
do
{
//sync enumerator.Current
} while(enumerator.MoveNext());
}
else
GatewayLogAction.WriteLogInfo(Messages.NoTerminalsForSync);
}
How about this, which still defers execution, but buffers it once executed:
var terminalsToSync = TerminalAction.GetAllTerminals().Lazily();
with:
public static class LazyEnumerable {
public static IEnumerable<T> Lazily<T>(this IEnumerable<T> source) {
if (source is LazyWrapper<T>) return source;
return new LazyWrapper<T>(source);
}
class LazyWrapper<T> : IEnumerable<T> {
private IEnumerable<T> source;
private bool executed;
public LazyWrapper(IEnumerable<T> source) {
if (source == null) throw new ArgumentNullException("source");
this.source = source;
}
IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
public IEnumerator<T> GetEnumerator() {
if (!executed) {
executed = true;
source = source.ToList();
}
return source.GetEnumerator();
}
}
}
Personally I wouldn't use Any() here; foreach simply won't loop if the collection is empty, so I would just rely on that. However, I would recommend that you check for null.
If you do want to pre-enumerate the set, use .ToArray(); it will only enumerate once:
var terminalsToSync = TerminalAction.GetAllTerminals().ToArray();
if(terminalsToSync.Any())
SyncTerminals(terminalsToSync);
var terminalsToSync = TerminalAction.GetAllTerminals().ToList();
if(terminalsToSync.Any())
SyncTerminals(terminalsToSync);
else
GatewayLogAction.WriteLogInfo(Messages.NoTerminalsForSync);
Checking .Count (or .Length on an array) is faster than Any() because it avoids the GetEnumerator()/MoveNext()/Dispose() calls that Any() can require.
Here's another way of approaching this problem:
int count = SyncTerminals(terminalsToSync);
if(count == 0) GatewayLogAction.WriteLogInfo(Messages.NoTerminalsForSync);
where you change SyncTerminals to do:
int count = 0;
foreach(var obj in terminalsToSync) {
count++;
// some code
}
return count;
Nice and simple.
All the caching solutions here cache all items as soon as the first item is retrieved. It is only really lazy if you cache each item as the list is iterated.
The difference can be seen in this example:
public class LazyListTest
{
private int _count = 0;
public void Test()
{
var numbers = Enumerable.Range(1, 40);
var numbersQuery = numbers.Select(GetElement).ToLazyList(); // Cache lazy
var total = numbersQuery.Take(3)
.Concat(numbersQuery.Take(10))
.Concat(numbersQuery.Take(3))
.Sum();
Console.WriteLine(_count);
}
private int GetElement(int value)
{
_count++;
// Some slow stuff here...
return value * 100;
}
}
If you run the Test() method, the _count is only 10. Without caching it would be 16 and with .ToList() it would be 40!
An example of the implementation of LazyList can be found here.
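A minimal sketch of a per-item caching ToLazyList, assuming single-threaded use. ToLazyList is a hypothetical extension (not part of System.Linq) and this is not necessarily how the linked implementation works. Each position is pulled from the source at most once, and only when some enumerator first reaches it:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

public static class LazyListExtensions
{
    // Hypothetical helper; not thread-safe.
    public static IEnumerable<T> ToLazyList<T>(this IEnumerable<T> source) =>
        new LazyList<T>(source ?? throw new ArgumentNullException(nameof(source)));

    private sealed class LazyList<T> : IEnumerable<T>
    {
        private readonly IEnumerable<T> _source;
        private readonly List<T> _cache = new List<T>(); // items pulled so far
        private IEnumerator<T> _enumerator;               // shared, created on first use
        private bool _finished;

        public LazyList(IEnumerable<T> source) { _source = source; }

        public IEnumerator<T> GetEnumerator()
        {
            for (int i = 0; ; i++)
            {
                // Pull one more source item only if position i is not cached yet.
                if (i >= _cache.Count && !TryFetchNext())
                    yield break;
                yield return _cache[i];
            }
        }

        IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

        private bool TryFetchNext()
        {
            if (_finished) return false;
            _enumerator ??= _source.GetEnumerator();
            if (_enumerator.MoveNext())
            {
                _cache.Add(_enumerator.Current);
                return true;
            }
            _finished = true;
            _enumerator.Dispose();
            return false;
        }
    }
}
```

One trade-off of this design: if the source is never enumerated to the end, its enumerator is never disposed.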
If you're seeing two procedure calls for the evaluation of whatever GetAllTerminals() returns, this means that the procedure's result isn't being cached. Without knowing what data-access strategy you're using, this is quite hard to fix in a general way.
The simplest solution, as you've alluded to, is to copy the result of the call before you perform any other operations. If you wanted to, you could neatly wrap this behaviour up in an IEnumerable<T> which executes the inner enumerable call just once:
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

public class CachedEnumerable<T> : IEnumerable<T>
{
// Lazy<T> defers the ToList() call until the first enumeration, then reuses the list
private readonly Lazy<List<T>> result;
public CachedEnumerable(IEnumerable<T> enumerable)
{
result = new Lazy<List<T>>(() => enumerable.ToList());
}
public IEnumerator<T> GetEnumerator()
{
return this.result.Value.GetEnumerator();
}
IEnumerator IEnumerable.GetEnumerator()
{
return this.GetEnumerator();
}
}
Wrap the result in an instance of this type and it will not evaluate the inner enumerable multiple times.

Is this achievable with a single LINQ query?

Suppose I have a given object of type IEnumerable<string> which is the return value of method SomeMethod(), and which contains no repeated elements. I would like to be able to "zip" the following lines in a single LINQ query:
IEnumerable<string> someList = SomeMethod();
if (someList.Contains(givenString))
{
return someList.Where(s => s == givenString);
}
else
{
return (someList);
}
Edit: I mistakenly used Single instead of First. Corrected now.
I know I can "zip" this by using the ternary operator, but that's not the point. I would just like to be able to achieve this with a single line. Is that possible?
This will return items with given string or all items if given is not present in the list:
someList.Where(i => i == givenString || !someList.Contains(givenString))
The nature of your desired output requires that you either make two requests for the data, as you are now, or buffer the non-matches to return if no matches are found. The latter would be especially useful in cases where actually getting the data is a relatively expensive call (e.g. a database query or WCF service). The buffering method would look like this:
static IEnumerable<T> AllIfNone<T>(this IEnumerable<T> source,
Func<T, bool> predicate)
{
//argument checking ignored for sample purposes
var buffer = new List<T>();
bool foundFirst = false;
foreach (var item in source)
{
if (predicate(item))
{
foundFirst = true;
yield return item;
}
else if (!foundFirst)
{
buffer.Add(item);
}
}
if (!foundFirst)
{
foreach (var item in buffer)
{
yield return item;
}
}
}
The laziness of this method is either that of Where or ToList depending on if the collection contains a match or not. If it does, you should get execution similar to Where. If not, you will get roughly the execution of calling ToList (with the overhead of all the failed filter checks) and iterating the result.
What is wrong with the ternary operator?
someList.Any(s => s == givenString) ? someList.Where(s => s == givenString) : someList;
It would be better to do the Where followed by the Any but I can't think of how to one-line that.
var reducedEnumerable = someList.Where(s => s == givenString);
return reducedEnumerable.Any() ? reducedEnumerable : someList;
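The Where-then-Any pattern wrapped in a helper (FilterOrAll and MatchesOrAll are hypothetical names). If someList is an expensive deferred query, Any() and the caller's enumeration each run it, which is the cost the buffering answer above avoids; for an in-memory list it is fine:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class FilterOrAll
{
    // Returns the matches if there are any, otherwise the whole sequence.
    public static IEnumerable<string> MatchesOrAll(IEnumerable<string> someList, string givenString)
    {
        var reduced = someList.Where(s => s == givenString);
        return reduced.Any() ? reduced : someList;
    }
}
```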
It is not possible to change the return type on the method, which is what you're asking. The first condition returns a string and the second condition returns a collection of strings.
Just return the IEnumerable<string> collection, and call Single on the return value like this:
string test = ReturnCollectionOfStrings().Single(x => x == "test");

When to use Yield?

When should I use yield return, and when should I use just return?
Use yield when you are returning an enumerable, and you don't have all the results at that point.
Practically, I've used yield when I want to iterate through a large block of information (database, flat file, etc.), and I don't want to load everything in memory first. Yield is a nice way to iterate through the block without loading everything at once.
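For the flat-file case, a sketch of that idea over any TextReader (LineStreaming and ReadLinesLazily are hypothetical names; for files on disk, File.ReadLines already streams lines this way). Only the lines the caller actually consumes are read:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class LineStreaming
{
    // Yields one line at a time instead of reading the whole input first.
    public static IEnumerable<string> ReadLinesLazily(TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
            yield return line;
    }
}
```

For example, `LineStreaming.ReadLinesLazily(reader).First()` reads a single line and leaves the rest of the input untouched.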
The yield keyword is incredibly powerful. It basically allows you to quickly return IEnumerable and IEnumerator objects without explicitly coding them.
Consider a scenario where you want to return the intersection of two IEnumerable objects. Here is how you would do it using the yield keyword.
public static class Program
{
public static void Main()
{
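// List<int> would not convert to IEnumerable<object>: variance does not apply to value types
IEnumerable<object> lhs = new List<object> { 1, 2, 3, 4, 5 };
IEnumerable<object> rhs = new List<object> { 3, 4, 5, 6, 7 };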
foreach (object item in IntersectExample.Intersect(lhs, rhs))
{
Console.WriteLine(item);
break;
}
}
}
public static class IntersectExample
{
public static IEnumerable<object> Intersect(IEnumerable<object> lhs, IEnumerable<object> rhs)
{
var hashset = new HashSet<object>();
foreach (object item in lhs)
{
if (!hashset.Contains(item))
{
hashset.Add(item);
}
}
foreach (object item in rhs)
{
if (hashset.Contains(item))
{
yield return item;
}
}
}
}
It is hard to appreciate this until you fully realize what is going on. Normally when you intersect two sets, you complete the entire operation before returning the result to the caller. This means the runtime complexity of the operation is O(m + n), where m and n are the sizes of the collections being intersected, regardless of what you do with the result afterwards. But in my example I just wanted to pick off the first item from the result. Using an IEnumerable created by the yield keyword makes it super easy to delay part of the processing until it is actually required: my example does the O(m) work of building the set from lhs, then scans rhs only until it finds the first match. The alternative is to code the IEnumerable yourself and maintain its state manually. The power of the yield keyword is that it creates that state machine for you.
Yield is for iterators.
It lets you process a list in small swallows, which is nice for big lists.
The magical thing about Yield is that it remembers where you're up to between invocations.
If you're not iterating you don't need Yield.
The yield construct is used to create an iterator that can produce multiple values in succession:
IEnumerable<int> three_numbers() {
yield return 1;
yield return 2;
yield return 3;
}
...
foreach (var i in three_numbers()) {
// i becomes 1, 2 and 3, in turn.
}
yield return will continue the method from that point the next time a value is requested. For example, if you want to loop over an array or list and hand each element to the caller for processing one at a time, you use yield return. If you want to return everything at once and be done, you don't need it.
It is explained here:
C# Language Reference
yield (C# Reference)
The method called will return every single value so that they can be enumerated by the caller.
This means that you will need to use yield when you want every possible result returned by an iteration.
