As I do this often enough, I was wondering if there is a neat way to skip using a variable. I have a function that returns a List of unknown length. I want to add the result to another list and also know whether I got back an empty list. I could save the function result in a variable, check whether it's empty, and otherwise add it to the list. I was just wondering if this could be done more elegantly.
I also could do:
myList.AddRange(getFiles(path));
if(getFiles(path).Count == 0)
{
doSomething();
}
getFiles being an example function that returns a list of files at a path.
That way, however, I have to call the function twice.
This is more of a programming style question, as I am quite inexperienced. Should one make a "tmp" variable every time this happens?
Make this extension method:
public static int AddRangeEx<T>(this List<T> target, IEnumerable<T> items)
{
int result = items.Count();
if( result > 0 ) target.AddRange(items);
return result;
}
And then you'll be able to write code like this:
if(myList.AddRangeEx(getFiles(path)) > 0)
{
doSomething();
}
Alternatively, if you're concerned about side effects of the extra Count() call (even just performance side effects, but there are lots of ways to get an IEnumerable that can only run one time), you could build the extension method this way:
public static int AddRangeEx<T>(this List<T> target, IEnumerable<T> items)
{
int result = 0;
foreach (T item in items)
{
target.Add(item);
result++;
}
return result;
}
Which isn't really any different than what AddRange() and Count() were already doing.
As a bonus, you can overload the extension method for improved performance on types that already know the count:
public static int AddRangeEx<T>(this List<T> target, T[] items)
{
if (items.Length > 0)
target.AddRange(items);
return items.Length;
}
It kind of makes me sad the built-in AddRange() method doesn't already do this for us.
The first choice is to just store a local variable:
var files = getFiles(path);
myList.AddRange(files);
if(files.Count == 0)
{
doSomething();
}
The second choice would be this:
var count = myList.Count;
myList.AddRange(getFiles(path));
if(myList.Count > count)
{
doSomething();
}
How much space is reserved to the underlying collection behind a method using yield return syntax WHEN I PERFORM a ToList() on it? There's a chance it will reallocate and thus decrease performance if compared to the standard approach where I create a list with predefined capacity?
The two scenarios:
public IEnumerable<T> GetList1()
{
foreach( var item in collection )
yield return item.Property;
}
public IEnumerable<T> GetList2()
{
List<T> outputList = new List<T>( collection.Count() );
foreach( var item in collection )
outputList.Add( item.Property );
return outputList;
}
yield return does not create an array that has to be resized, like what List does; instead, it creates an IEnumerable with a state machine.
For instance, let's take this method:
public static IEnumerable<int> Foo()
{
Console.WriteLine("Returning 1");
yield return 1;
Console.WriteLine("Returning 2");
yield return 2;
Console.WriteLine("Returning 3");
yield return 3;
}
Now let's call it and assign that enumerable to a variable:
var elems = Foo();
None of the code in Foo has executed yet. Nothing will be printed on the console. But if we iterate over it, like this:
foreach(var elem in elems)
{
Console.WriteLine( "Got " + elem );
}
On the first iteration of the foreach loop, the Foo method will be executed until the first yield return. Then, on the second iteration, the method will "resume" from where it left off (right after the yield return 1), and execute until the next yield return. Same for all subsequent elements.
At the end of the loop, the console will look like this:
Returning 1
Got 1
Returning 2
Got 2
Returning 3
Got 3
This means you can write methods like this:
public static IEnumerable<int> GetAnswers()
{
while( true )
{
yield return 42;
}
}
You can call the GetAnswers method, and every time you request an element, it'll give you 42; the sequence never ends. You couldn't do this with a List, because lists have to have a finite size.
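As a small sketch of how a caller might consume such an endless sequence safely, the snippet below bounds it with LINQ's Take before materializing it. The variable name firstThree is only illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same iterator as above, written as a local function so the snippet
// runs as a top-level program.
static IEnumerable<int> GetAnswers()
{
    while (true)
    {
        yield return 42;
    }
}

// Take(3) stops pulling after three elements, so the infinite loop
// inside GetAnswers never runs away.
List<int> firstThree = GetAnswers().Take(3).ToList();

Console.WriteLine(string.Join(", ", firstThree)); // prints "42, 42, 42"
```

Calling ToList() directly on GetAnswers(), without Take, would never finish and would eventually run out of memory.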
How much space is reserved to the underlying collection behind a method using yield return syntax?
There's no underlying collection.
There's an object, but it isn't a collection. Just how much space it will take up depends on what it needs to keep track of.
There's a chance it will reallocate
No.
And thus decrease performance if compared to the standard approach where I create a list with predefined capacity?
It will almost certainly take up less memory than creating a list with a predefined capacity.
Let's try a manual example. Say we had the following code:
public static IEnumerable<int> CountToTen()
{
for(var i = 1; i != 11; ++i)
yield return i;
}
Using foreach on this will iterate through the numbers 1 to 10 inclusive.
Now let's do this the way we would have to if yield did not exist. We'd do something like:
private class CountToTenEnumerator : IEnumerator<int>
{
private int _current;
public int Current
{
get
{
if(_current == 0)
throw new InvalidOperationException();
return _current;
}
}
object IEnumerator.Current
{
get { return Current; }
}
public bool MoveNext()
{
if(_current == 10)
return false;
_current++;
return true;
}
public void Reset()
{
throw new NotSupportedException();
// We *could* just set _current back, but the object produced by
// yield won't do that, so we'll match that.
}
public void Dispose()
{
}
}
private class CountToTenEnumerable : IEnumerable<int>
{
public IEnumerator<int> GetEnumerator()
{
return new CountToTenEnumerator();
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
}
public static IEnumerable<int> CountToTen()
{
return new CountToTenEnumerable();
}
Now, for a variety of reasons this is quite different to the code you're likely to get from the version using yield, but the basic principle is the same. As you can see there are two allocations involved of objects (same number as if we had a collection and then did a foreach on that) and the storage of a single int. In practice we can expect yield to store a few more bytes than that, but not a lot.
Edit: yield actually does a trick where the first GetEnumerator() call on the same thread that obtained the object returns that same object, doing double service for both cases. Since this covers over 99% of use cases yield actually does one allocation rather than two.
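A quick way to observe this, as a sketch only: the reuse is an implementation detail of the Microsoft C# compiler's generated iterator class rather than something the language guarantees, so treat the result as an observation, not a contract. The variable names are illustrative.

```csharp
using System;
using System.Collections.Generic;

static IEnumerable<int> CountToTen()
{
    for (var i = 1; i != 11; ++i)
        yield return i;
}

IEnumerable<int> sequence = CountToTen();

// First GetEnumerator() call on the thread that created the object:
// the generated object is expected to hand back itself.
IEnumerator<int> first = sequence.GetEnumerator();
bool firstIsSameObject = ReferenceEquals(sequence, first);

// A second enumeration must not share iteration state with the first,
// so a fresh object is expected here.
IEnumerator<int> second = sequence.GetEnumerator();
bool secondIsSameObject = ReferenceEquals(sequence, second);

Console.WriteLine($"first reused: {firstIsSameObject}, second reused: {secondIsSameObject}");

first.Dispose();
second.Dispose();
```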
Now let's look at:
public IEnumerable<T> GetList1()
{
foreach( var item in collection )
yield return item.Property;
}
While this would result in more memory used than just return collection, it won't result in a lot more; the only thing the enumerator produced really needs to keep track of is the enumerator produced by calling GetEnumerator() on collection and then wrapping that.
This is going to be massively less memory than that of the wasteful second approach you mention, and much faster to get going.
Edit:
You've changed your question to include "syntax WHEN I PERFORM a ToList() on it", which is worth considering.
Now, here we need to add a third possibility: Knowledge of the collection's size.
Here, there is the possibility that using new List<T>(capacity) will prevent reallocations of the list being built. That can indeed be a considerable saving.
If the object that has ToList called on it implements ICollection<T> then ToList will end up first doing a single allocation of an internal array of T and then calling ICollection<T>.CopyTo().
This would mean that your GetList2 would result in a faster ToList() than your GetList1.
However, your GetList2 has already wasted time and memory doing what ToList() will do with the results of GetList1 anyway!
What it should have done here was just return new List<T>(collection); and be done with it.
If, though, we need to actually do something inside GetList1 or GetList2 (e.g. convert elements, filter elements, track averages, and so on) then GetList1 is going to be faster and lighter on memory. Much lighter if we never call ToList() on it, and slightly lighter if we do call ToList() because again, the faster and lighter ToList() is offset by GetList2 being slower and heavier in the first place by exactly the same amount.
My situation is this. I need to run some validation and massage type code on multiple different types of objects, but for cleanliness (and code reuse), I'd like to make all the calls to this validation look basically the same regardless of object. I am attempting to solve this through overloading, which works fine until I get to Generic Collection objects.
The following example should clarify what I'm talking about here:
private string DoStuff(string tmp) { ... }
private ObjectA DoStuff(ObjectA tmp) { ... }
private ObjectB DoStuff(ObjectB tmp) { ... }
...
private Collection<ObjectA> DoStuff(Collection<ObjectA> tmp) {
foreach (ObjectA obj in tmp) if (DoStuff(obj) == null) tmp.Remove(obj);
if (tmp.Count == 0) return null;
return tmp;
}
private Collection<ObjectB> DoStuff(Collection<ObjectB> tmp) {
foreach (ObjectB obj in tmp) if (DoStuff(obj) == null) tmp.Remove(obj);
if (tmp.Count == 0) return null;
return tmp;
}
...
This seems like a real waste, as I have to duplicate the exact same code for every different Collection<T> type. I would like to make a single instance of DoStuff that handles any Collection<T>, rather than make a separate one for each.
I have tried using ICollection, but this has two problems: first, ICollection does not expose the .Remove method, and I can't write the foreach loop because I don't know the type of the objects in the list. Using something more generic, like object, does not work because I don't have a method DoStuff that accepts an object - I need it to call the appropriate one for the actual object. Writing a DoStuff method which takes an object and does some kind of huge list of if statements to pick the right method and cast appropriately kind of defeats the whole idea of getting rid of redundant code - I might as well just copy and paste all those Collection<T> methods.
I have tried using a generic DoStuff<T> method, but this has the same problem in the foreach loop. Because I don't know the object type at design time, the compiler won't let me call DoStuff(obj).
Technically, the compiler should be able to tell which call needs to be made at compile time, since these are all private methods, and the specific types of the objects being passed in the calls are all known at the point the method is being called. That knowledge just doesn't seem to bubble up to the later methods being called by this method.
I really don't want to use reflection here, as that makes the code even more complicated than just copying and pasting all the Collection<T> methods, and it creates a performance slowdown. Any ideas?
---EDIT 1---
I realized that my generic method references were not displaying correctly, because I had not used the html codes for the angle brackets. This should be fixed now.
---EDIT 2---
Based on a response below, I have altered my Collection<T> method to look like this:
private Collection<T> DoStuff<T>(Collection<T> tmp) {
for (int i = tmp.Count - 1; i >= 0; i--) if (DoStuff(tmp[i]) == null) tmp.RemoveAt(i);
if (tmp.Count == 0) return null;
return tmp;
}
This still does not work, however, as the compiler cannot figure out which overloaded method to call when I call DoStuff(tmp[i]).
You need to pass the method you want to call into the generic method as a parameter. That way the overload resolution happens at a point where the compiler knows what types to expect.
Alternatively, you need to make the per-item DoStuff method generic (or object) to support any possible item in the collection.
(I also separated the Remove call from the first loop, so that it isn't trying to remove an item from the same list being iterated.)
private Collection<T> DoStuff<T>(Collection<T> tmp, Func<T, T> stuffDoer)
{
// Collect the original items whose processed result is null.
var removeList = tmp
.Where(v => stuffDoer(v) == null)
.ToList();
foreach (var removeItem in removeList) tmp.Remove(removeItem);
if (tmp.Count == 0) return null;
return tmp;
}
private class ObjectA { }
private class ObjectB { }
private string DoStuff(string tmp) { return tmp; }
private ObjectA DoStuff(ObjectA tmp) { return tmp; }
private ObjectB DoStuff(ObjectB tmp) { return tmp; }
Call using this code:
var x = new Collection<ObjectA>
{
new ObjectA(),
new ObjectA(),
null
};
var result = DoStuff(x, DoStuff);
Something like this?:
private Collection DoStuff<T>(Collection tmp)
{
// This will probably throw, as you are modifying a collection while looping over it.
foreach (T obj in tmp) if (DoStuff(obj) == null) tmp.Remove(obj);
if (tmp.Count == 0) return null;
return tmp;
}
Where T is the type of the object in the collection.
Please note that you have a line that will most likely throw, because it modifies the collection while iterating over it. So:
private Collection DoStuff<T>(Collection tmp)
{
// foreach doesn't work if you are modifying the collection.
// Looping backward with an index, so we never encounter an invalid index.
for (int i = tmp.Count - 1; i >= 0; i--) if (DoStuff(tmp[i]) == null) tmp.Remove(tmp[i]);
if (tmp.Count == 0) return null;
return tmp;
}
But at this point... Why make it generic, since you are not using T anymore?
private Collection DoStuff(Collection tmp)
{
// DoStuff can be generic, but you shouldn't need to explicitly pass it a type...
for (int i = tmp.Count - 1; i >= 0; i--) if (DoStuff(tmp[i]) == null) tmp.Remove(tmp[i]);
if (tmp.Count == 0) return null;
return tmp;
}
What's the best practice to check if a collection has items?
Here's an example of what I have:
var terminalsToSync = TerminalAction.GetAllTerminals();
if(terminalsToSync.Any())
SyncTerminals(terminalsToSync);
else
GatewayLogAction.WriteLogInfo(Messages.NoTerminalsForSync);
The GetAllTerminals() method will execute a stored procedure and, if we return a result, (Any() is true), SyncTerminals() will loop through the elements; thus enumerating it again and executing the stored procedure for the second time.
What's the best way to avoid this?
I'd like a good solution that can be used in other cases too; possibly without converting it to List.
Thanks in advance.
I would probably use a ToArray call, and then check Length; you're going to enumerate all the results anyway so why not do it early? However, since you've said you want to avoid early realisation of the enumerable...
I'm guessing that SyncTerminals has a foreach, in which case you can write it something like this:
bool any = false;
foreach(var terminal in terminalsToSync)
{
if(!any)any = true;
//....
}
if(!any)
GatewayLogAction.WriteLogInfo(Messages.NoTerminalsForSync);
Okay, the if inside the loop is redundant after the first iteration, but I'm guessing the cost of an extra few CPU cycles isn't going to matter much.
Equally, you could do the iteration the old way and use a do...while loop and GetEnumerator; taking the first iteration out of the loop; that way there are literally no wasted operations:
// Dispose the enumerator when done, as foreach would.
using(var enumerator = terminalsToSync.GetEnumerator())
{
if(enumerator.MoveNext())
{
do
{
//sync enumerator.Current
} while(enumerator.MoveNext());
}
else
GatewayLogAction.WriteLogInfo(Messages.NoTerminalsForSync);
}
How about this, which still defers execution, but buffers it once executed:
var terminalsToSync = TerminalAction.GetAllTerminals().Lazily();
with:
public static class LazyEnumerable {
public static IEnumerable<T> Lazily<T>(this IEnumerable<T> source) {
if (source is LazyWrapper<T>) return source;
return new LazyWrapper<T>(source);
}
class LazyWrapper<T> : IEnumerable<T> {
private IEnumerable<T> source;
private bool executed;
public LazyWrapper(IEnumerable<T> source) {
if (source == null) throw new ArgumentNullException("source");
this.source = source;
}
IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
public IEnumerator<T> GetEnumerator() {
if (!executed) {
executed = true;
source = source.ToList();
}
return source.GetEnumerator();
}
}
}
Personally, I wouldn't use Any() here; foreach will simply not loop through any items if the collection is empty, so I would just do it like that. However, I would recommend that you check for null.
If you do want to pre-enumerate the set, use .ToArray(), which will enumerate only once, e.g.:
var terminalsToSync = TerminalAction.GetAllTerminals().ToArray();
if(terminalsToSync.Any())
SyncTerminals(terminalsToSync);
var terminalsToSync = TerminalAction.GetAllTerminals().ToList();
if(terminalsToSync.Any())
SyncTerminals(terminalsToSync);
else
GatewayLogAction.WriteLogInfo(Messages.NoTerminalsForSync);
.Length or .Count is faster since it doesn't need to go through the GetEnumerator()/MoveNext()/Dispose() required by Any().
Here's another way of approaching this problem:
int count = SyncTerminals(terminalsToSync);
if(count == 0) GatewayLogAction.WriteLogInfo(Messages.NoTerminalsForSync);
where you change SyncTerminals to do:
int count = 0;
foreach(var obj in terminalsToSync) {
count++;
// some code
}
return count;
Nice and simple.
All the caching solutions here cache all items when the first item is retrieved. It is only really lazy if you cache each single item as the items of the list are iterated.
The difference can be seen in this example:
public class LazyListTest
{
private int _count = 0;
public void Test()
{
var numbers = Enumerable.Range(1, 40);
var numbersQuery = numbers.Select(GetElement).ToLazyList(); // Cache lazy
var total = numbersQuery.Take(3)
.Concat(numbersQuery.Take(10))
.Concat(numbersQuery.Take(3))
.Sum();
Console.WriteLine(_count);
}
private int GetElement(int value)
{
_count++;
// Some slow stuff here...
return value * 100;
}
}
If you run the Test() method, the _count is only 10. Without caching it would be 16 and with .ToList() it would be 40!
An example of the implementation of LazyList can be found here.
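For readers without access to that implementation, here is a minimal sketch of the idea. It is not the LazyList referred to above: the helper name CacheLazily and everything inside it are assumptions made for illustration, and it is not thread-safe. Each item is pulled from the source at most once, the first time some consumer asks for it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: caches each item the first time it is pulled
// from the source. All enumerations share one cache and one source enumerator.
static IEnumerable<T> CacheLazily<T>(IEnumerable<T> source)
{
    var cache = new List<T>();
    var enumerator = new Lazy<IEnumerator<T>>(() => source.GetEnumerator());
    bool exhausted = false;

    IEnumerable<T> Iterate()
    {
        int index = 0; // position of this particular enumeration
        while (true)
        {
            if (index < cache.Count)
            {
                yield return cache[index++];          // serve from the cache
            }
            else if (exhausted)
            {
                yield break;                          // source fully consumed
            }
            else if (enumerator.Value.MoveNext())
            {
                cache.Add(enumerator.Value.Current);  // pull exactly one more item
            }
            else
            {
                exhausted = true;
                enumerator.Value.Dispose();
            }
        }
    }

    return Iterate();
}

// Same scenario as the test above, with a counter in place of GetElement.
int calls = 0;
IEnumerable<int> numbersQuery = CacheLazily(
    Enumerable.Range(1, 40).Select(value => { calls++; return value * 100; }));

int total = numbersQuery.Take(3)
    .Concat(numbersQuery.Take(10))
    .Concat(numbersQuery.Take(3))
    .Sum();

Console.WriteLine($"{calls} source calls, total {total}"); // prints "10 source calls, total 6700"
```

The trade-off is that the source enumerator stays open until some enumeration reaches the end, so this sketch suits sources that are cheap to keep open.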
If you're seeing two procedure calls for the evaluation of whatever GetAllTerminals() returns, this means that the procedure's result isn't being cached. Without knowing what data-access strategy you're using, this is quite hard to fix in a general way.
The simplest solution, as you've alluded, is to copy the result of the call before you perform any other operations. If you wanted to, you could neatly wrap this behaviour up in an IEnumerable<T> which executes the inner enumerable call just once:
public class CachedEnumerable<T> : IEnumerable<T>
{
public CachedEnumerable(IEnumerable<T> enumerable)
{
result = new Lazy<List<T>>(() => enumerable.ToList());
}
private Lazy<List<T>> result;
public IEnumerator<T> GetEnumerator()
{
return this.result.Value.GetEnumerator();
}
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
return this.GetEnumerator();
}
}
Wrap the result in an instance of this type and it will not evaluate the inner enumerable multiple times.
By fastest I mean: what is the most performant way of converting each item in a List<string> to type int using C#, assuming int.Parse will work for every item?
You won't get around iterating over all elements. Using LINQ:
var ints = strings.Select(s => int.Parse(s));
This has the added bonus that it will only convert at the time you iterate over it, and only as many elements as you request.
If you really need a list, use the ToList method. However, you have to be aware that the performance bonus mentioned above won't be available then.
If you're really trying to eke out the last bit of performance you could try doing something with pointers like this, but personally I'd go with the simple LINQ implementation that others have mentioned.
unsafe static int ParseUnsafe(string value)
{
int result = 0;
fixed (char* v = value)
{
char* str = v;
while (*str != '\0')
{
result = 10 * result + (*str - 48);
str++;
}
}
return result;
}
var parsed = input.Select(i=>ParseUnsafe(i));//optionally .ToList() if you really need list
There is likely to be very little difference between any of the obvious ways to do this: therefore go for readability (one of the LINQ-style methods posted in other answers).
You may gain some performance for very large lists by initializing the output list to its required capacity, but it's unlikely you'd notice the difference, and readability will suffer:
List<string> input = ..
List<int> output = new List<int>(input.Count);
... Parse in a loop ...
The slight performance gain will come from the fact that the output list won't need to be repeatedly reallocated as it grows.
I don't know what the performance implications are, but there is a List<T>.ConvertAll<TOutput> method for converting the elements in the current List to another type, returning a list containing the converted elements.
List.ConvertAll Method
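A minimal sketch of what that call looks like; the sample values are made up, and the variable names follow the other answers here.

```csharp
using System;
using System.Collections.Generic;

var myListString = new List<string> { "1", "20", "300" };

// ConvertAll takes a Converter<string, int> and returns a new List<int>;
// it is a method on List<T> itself, so no LINQ import is needed.
List<int> myListOfInts = myListString.ConvertAll(s => int.Parse(s));

Console.WriteLine(string.Join(", ", myListOfInts)); // prints "1, 20, 300"
```

Unlike Select, this converts every element immediately rather than on demand.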
var myListOfInts = myListString.Select(x => int.Parse(x)).ToList();
Side note: If you call ToList() on an ICollection<T>, the .NET Framework automatically preallocates a List of the needed size, so it doesn't have to repeatedly reallocate as items are added to the list.
Unfortunately LINQ Select doesn't return an ICollection (as Joe pointed out in comments).
From ILSpy:
// System.Linq.Enumerable
public static List<TSource> ToList<TSource>(this IEnumerable<TSource> source)
{
if (source == null)
{
throw Error.ArgumentNull("source");
}
return new List<TSource>(source);
}
// System.Collections.Generic.List<T>
public List(IEnumerable<T> collection)
{
if (collection == null)
{
ThrowHelper.ThrowArgumentNullException(ExceptionArgument.collection);
}
ICollection<T> collection2 = collection as ICollection<T>;
if (collection2 != null)
{
int count = collection2.Count;
this._items = new T[count];
collection2.CopyTo(this._items, 0);
this._size = count;
return;
}
this._size = 0;
this._items = new T[4];
using (IEnumerator<T> enumerator = collection.GetEnumerator())
{
while (enumerator.MoveNext())
{
this.Add(enumerator.Current);
}
}
}
So, ToList() just calls the List constructor and passes in the IEnumerable.
The List constructor is smart enough that, if the source is an ICollection<T>, it uses the most efficient way of filling the new List instance.
I have this function from a plugin (from a previous post)
// This method implements the test condition for
// finding the ResolutionInfo.
private static bool IsResolutionInfo(ImageResource res)
{
return res.ID == (int)ResourceIDs.ResolutionInfo;
}
And the line that's calling this function:
get
{
return (ResolutionInfo)m_imageResources.Find(IsResolutionInfo);
}
So basically I'd like to get rid of the called function. It's only called twice (once in the get and once in the set). And it could possibly help me to understand inline functions in C#.
get
{
return (ResolutionInfo)m_imageResources.Find(res => res.ID == (int)ResourceIDs.ResolutionInfo);
}
Does that clear it up at all?
Just to further clear things up, looking at reflector, this is what the Find method looks like:
public T Find(Predicate<T> match)
{
if (match == null)
{
ThrowHelper.ThrowArgumentNullException(ExceptionArgument.match);
}
for (int i = 0; i < this._size; i++)
{
if (match(this._items[i]))
{
return this._items[i];
}
}
return default(T);
}
So as you can see, it loops through the collection, and for every item in the collection, it passes the item at that index to the Predicate that you passed in (through your lambda). Thus, since we're dealing with generics, it automatically knows the type you're dealing with. It'll be Type T which is whatever type that is in your collection. Makes sense?
Just to add, does the "Find" function on a list (which is what m_imageResources is) automatically pass the parameter to the IsResolutionInfo function?
Also, what happens first the cast or the function call?
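On the ordering question, a method call binds more tightly than a cast in C#, so the Find call runs first and the cast is applied to whatever it returns. A small sketch with made-up data (resources and firstNumber are illustrative names, not part of the plugin):

```csharp
using System;
using System.Collections.Generic;

var resources = new List<object> { "header", 42, "footer" };

// Parsed as (int)(resources.Find(...)): Find runs first and returns an object,
// and only then is that result cast (here, unboxed) to int.
int firstNumber = (int)resources.Find(item => item is int);

Console.WriteLine(firstNumber); // prints 42
```

If Find matched nothing here it would return null, and the cast to int would then throw, which is another sign that the cast happens last.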