What's the fastest way to convert List<string> to List<int> in C# assuming int.Parse will work for every item?

By fastest I mean: what is the most performant way to convert each item in the List<string> to int in C#, assuming int.Parse will work for every item?

You won't get around iterating over all elements. Using LINQ:
var ints = strings.Select(s => int.Parse(s));
This has the added bonus that it only converts when you iterate over it, and only as many elements as you request.
If you really need a list, call ToList on the result. Be aware, however, that the lazy-evaluation benefit mentioned above is then lost, because ToList converts every element immediately.
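The deferred behaviour described above can be seen in a small sketch. The class name, the ParseCalls counter and the CountingParse helper are hypothetical and exist only to make the number of conversions visible:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DeferredParseDemo
{
    // Counts how many strings were actually parsed (illustration only).
    public static int ParseCalls;

    public static int CountingParse(string s)
    {
        ParseCalls++;
        return int.Parse(s);
    }

    public static List<int> FirstTwo(List<string> strings)
    {
        // Select is lazy: nothing is parsed until the sequence is enumerated.
        IEnumerable<int> ints = strings.Select(s => CountingParse(s));

        // Take(2) stops after two elements, so only two strings are parsed.
        return ints.Take(2).ToList();
    }
}
```

Calling FirstTwo on a list of four strings should leave ParseCalls at 2, whereas calling ToList() directly on the Select result would parse all four.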

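If you're really trying to eke out the last bit of performance you could try doing something with pointers like this, but personally I'd go with the simple LINQ implementation that others have mentioned. Note that this sketch assumes every string consists only of the ASCII digits 0-9: it does not handle signs, whitespace or overflow.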
unsafe static int ParseUnsafe(string value)
{
int result = 0;
fixed (char* v = value)
{
char* str = v;
while (*str != '\0')
{
result = 10 * result + (*str - 48);
str++;
}
}
return result;
}
var parsed = input.Select(i=>ParseUnsafe(i));//optionally .ToList() if you really need list

There is likely to be very little difference between any of the obvious ways to do this: therefore go for readability (one of the LINQ-style methods posted in other answers).
You may gain some performance for very large lists by initializing the output list to its required capacity, but it's unlikely you'd notice the difference, and readability will suffer:
List<string> input = ..
List<int> output = new List<int>(input.Count);
... Parse in a loop ...
The slight performance gain will come from the fact that the output list won't need to be repeatedly reallocated as it grows.
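Spelled out, the preallocated loop might look like the sketch below. The class and method names are made up for illustration; the only technique shown is passing input.Count to the List<int> constructor:

```csharp
using System.Collections.Generic;

public static class PreallocatedParse
{
    public static List<int> ParseAll(List<string> input)
    {
        // Reserve the final capacity up front so the backing array never has to grow.
        var output = new List<int>(input.Count);
        foreach (string s in input)
        {
            output.Add(int.Parse(s));
        }
        return output;
    }
}
```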

I don't know what the performance implications are, but there is a List<T>.ConvertAll<TOutput> method for converting the elements in the current List to another type, returning a list containing the converted elements.
List.ConvertAll Method
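A minimal sketch of the ConvertAll approach, assuming the input is a List<string> (the wrapper class and method names are hypothetical):

```csharp
using System.Collections.Generic;

public static class ConvertAllExample
{
    public static List<int> ParseAll(List<string> strings)
    {
        // ConvertAll applies the converter to each element and returns a new List<int>.
        return strings.ConvertAll(s => int.Parse(s));
    }
}
```

Because ConvertAll is a method on List<T> itself rather than a LINQ extension, it is also available when targeting .NET 2.0.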

var myListOfInts = myListString.Select(x => int.Parse(x)).ToList();
Side note: If you call ToList() on an ICollection<T>, the .NET Framework automatically preallocates a List of the needed size, so it doesn't have to reallocate as items are added to the list.
Unfortunately LINQ Select doesn't return an ICollection<T> (as Joe pointed out in comments).
From ILSpy:
// System.Linq.Enumerable
public static List<TSource> ToList<TSource>(this IEnumerable<TSource> source)
{
if (source == null)
{
throw Error.ArgumentNull("source");
}
return new List<TSource>(source);
}
// System.Collections.Generic.List<T>
public List(IEnumerable<T> collection)
{
if (collection == null)
{
ThrowHelper.ThrowArgumentNullException(ExceptionArgument.collection);
}
ICollection<T> collection2 = collection as ICollection<T>;
if (collection2 != null)
{
int count = collection2.Count;
this._items = new T[count];
collection2.CopyTo(this._items, 0);
this._size = count;
return;
}
this._size = 0;
this._items = new T[4];
using (IEnumerator<T> enumerator = collection.GetEnumerator())
{
while (enumerator.MoveNext())
{
this.Add(enumerator.Current);
}
}
}
So, ToList() just calls the List<T> constructor and passes in the IEnumerable<T>.
The List<T> constructor is smart enough that, if the argument is an ICollection<T>, it uses the most efficient way of filling the new List<T> instance.

Related

Linq count vs IList count

If I have the following IEnumerable list which comes from some repository.
IEnumerable<SomeObject> items = _someRepo.GetAll();
What is faster:
items.Count(); // Using Linq on the IEnumerable interface.
or
List<SomeObject> temp = items.ToList<SomeObject>(); // Cast as a List
temp.Count(); // Do a count on a list
Is the Linq Count() faster or slower than casting the IEnumerable to a List and then performing a Count()?
Update: Improved the question a little bit to a bit more realistic scenario.
Calling Count directly is a better choice.
Enumerable.Count has some performance improvements built in that will let it return without enumerating the entire collection:
public static int Count<TSource>(this IEnumerable<TSource> source) {
if (source == null) throw Error.ArgumentNull("source");
ICollection<TSource> collectionoft = source as ICollection<TSource>;
if (collectionoft != null) return collectionoft.Count;
ICollection collection = source as ICollection;
if (collection != null) return collection.Count;
int count = 0;
using (IEnumerator<TSource> e = source.GetEnumerator()) {
checked {
while (e.MoveNext()) count++;
}
}
return count;
}
ToList() uses similar optimizations, baked into List<T>(IEnumerable<T> source) constructor:
public List(IEnumerable<T> collection) {
if (collection==null)
ThrowHelper.ThrowArgumentNullException(ExceptionArgument.collection);
Contract.EndContractBlock();
ICollection<T> c = collection as ICollection<T>;
if( c != null) {
int count = c.Count;
if (count == 0)
{
_items = _emptyArray;
}
else {
_items = new T[count];
c.CopyTo(_items, 0);
_size = count;
}
}
else {
_size = 0;
_items = _emptyArray;
// This enumerable could be empty. Let Add allocate a new array, if needed.
// Note it will also go to _defaultCapacity first, not 1, then 2, etc.
using(IEnumerator<T> en = collection.GetEnumerator()) {
while(en.MoveNext()) {
Add(en.Current);
}
}
}
}
But as you can see, it only checks for the generic ICollection<T>, so if your collection implements ICollection but not its generic version, calling Count() directly will be much faster.
Not calling ToList first also saves you an allocation of new List<T> instance - not something overly expensive, but it's always better to avoid unnecessary allocation when possible.
A very rudimentary LinqPad test indicates that calling IEnumerable<string>.Count() is faster than creating a list and reading its Count, not to mention more memory efficient (as mentioned in other answers), and faster still when called a second time on the same collection.
I had an average of ~4 ticks for calling Count() off of the IEnumerable vs ~10k for creating a new list to obtain Count.
void Main()
{
IEnumerable<string> ienumerable = GetStrings();
var test1 = new Stopwatch();
test1.Start();
var count1 = ienumerable.Count();
test1.Stop();
test1.ElapsedTicks.Dump();
var test2 = new Stopwatch();
test2.Start();
var count2 = ienumerable.ToList().Count;
test2.Stop();
test2.ElapsedTicks.Dump();
var test3 = new Stopwatch();
test3.Start();
var count3 = ienumerable.Count();
test3.Stop();
test3.ElapsedTicks.Dump();
}
public IEnumerable<string> GetStrings()
{
var testString = "test";
var strings = new List<string>();
for (int i = 0; i < 500000; i++)
{
strings.Add(testString);
}
return strings;
}
In the latter case, you're incurring the cycles required to create a new collection from an existing collection (which under the hood has to iterate the collection), then pull the Count property off of the collection. As a result, the Enumerable optimizations win and return the count value faster.
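In the third test run, average ticks dropped to ~2. Count() does not cache a result between calls; the run takes the same ICollection<TSource> fast path (highlighted below), so the drop is most likely just the code path already being warmed up.
ICollection<TSource> collectionoft = source as ICollection<TSource>;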
if (collectionoft != null) return collectionoft.Count;
ICollection collection = source as ICollection;
if (collection != null) return collection.Count;
However, the real cost here is not CPU cycles, but rather memory consumption. That's what you should be more concerned about.
Finally, as a warning, be careful not to call Count() while you are in the middle of enumerating the collection. If the source is not an ICollection, doing so re-enumerates it from the start, which repeats any work the source does (such as running a query) and can give inconsistent results. If you need the count while iterating, the proper approach is to create a new list with .ToList() and iterate that list, referencing its Count property.
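The re-enumeration cost can be made visible with a small sketch. Everything here is hypothetical: GetItems stands in for an expensive lazy source, and SourceRuns counts how often its body starts executing:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ReEnumerationDemo
{
    // Incremented each time the iterator body starts running (illustration only).
    public static int SourceRuns;

    // Stand-in for an expensive lazy source such as a database query.
    public static IEnumerable<int> GetItems()
    {
        SourceRuns++;
        yield return 1;
        yield return 2;
        yield return 3;
    }

    public static List<string> LabelLazily()
    {
        IEnumerable<int> items = GetItems();
        var labels = new List<string>();
        foreach (int item in items)
        {
            // Count() walks the lazy source again from the start on every call.
            labels.Add(item + " of " + items.Count());
        }
        return labels;
    }

    public static List<string> LabelMaterialized()
    {
        // ToList() runs the source exactly once; Count is then a stored value.
        List<int> items = GetItems().ToList();
        var labels = new List<string>();
        foreach (int item in items)
        {
            labels.Add(item + " of " + items.Count);
        }
        return labels;
    }
}
```

With three elements, LabelLazily starts the source four times (once for the foreach and once per Count() call), while LabelMaterialized starts it once.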
Either version requires (in the general case) that you completely iterate your IEnumerable<string>.
In some cases, the backing type provides a mechanism to directly determine the count which can be used for O(1) performance. See @Marcin's answer for details.
The version where you call ToList() will have an additional CPU overhead, though very small and possibly hard to measure. It will also allocate memory that would not otherwise be allocated. If your count is high, that would be the greater concern.

C# yield return performance

How much space is reserved to the underlying collection behind a method using yield return syntax WHEN I PERFORM a ToList() on it? There's a chance it will reallocate and thus decrease performance if compared to the standard approach where i create a list with predefined capacity?
The two scenarios:
public IEnumerable<T> GetList1()
{
foreach( var item in collection )
yield return item.Property;
}
public IEnumerable<T> GetList2()
{
List<T> outputList = new List<T>( collection.Count() );
foreach( var item in collection )
outputList.Add( item.Property );
return outputList;
}
yield return does not create an array that has to be resized, like what List does; instead, it creates an IEnumerable with a state machine.
For instance, let's take this method:
public static IEnumerable<int> Foo()
{
Console.WriteLine("Returning 1");
yield return 1;
Console.WriteLine("Returning 2");
yield return 2;
Console.WriteLine("Returning 3");
yield return 3;
}
Now let's call it and assign that enumerable to a variable:
var elems = Foo();
None of the code in Foo has executed yet. Nothing will be printed on the console. But if we iterate over it, like this:
foreach(var elem in elems)
{
Console.WriteLine( "Got " + elem );
}
On the first iteration of the foreach loop, the Foo method will be executed until the first yield return. Then, on the second iteration, the method will "resume" from where it left off (right after the yield return 1), and execute until the next yield return. Same for all subsequent elements.
At the end of the loop, the console will look like this:
Returning 1
Got 1
Returning 2
Got 2
Returning 3
Got 3
This means you can write methods like this:
public static IEnumerable<int> GetAnswers()
{
while( true )
{
yield return 42;
}
}
You can call the GetAnswers method, and every time you request an element, it'll give you 42; the sequence never ends. You couldn't do this with a List, because lists have to have a finite size.
How much space is reserved to the underlying collection behind a method using yield return syntax?
There's no underlying collection.
There's an object, but it isn't a collection. Just how much space it will take up depends on what it needs to keep track of.
There's a chance it will reallocate
No.
And thus decrease performance if compared to the standard approach where i create a list with predefined capacity?
It will almost certainly take up less memory than creating a list with a predefined capacity.
Let's try a manual example. Say we had the following code:
public static IEnumerable<int> CountToTen()
{
for(var i = 1; i != 11; ++i)
yield return i;
}
To foreach through this will iterate through the numbers 1 to 10 inclusive.
Now let's do this the way we would have to if yield did not exist. We'd do something like:
private class CountToTenEnumerator : IEnumerator<int>
{
private int _current;
public int Current
{
get
{
if(_current == 0)
throw new InvalidOperationException();
return _current;
}
}
object IEnumerator.Current
{
get { return Current; }
}
public bool MoveNext()
{
if(_current == 10)
return false;
_current++;
return true;
}
public void Reset()
{
throw new NotSupportedException();
// We *could* just set _current back, but the object produced by
// yield won't do that, so we'll match that.
}
public void Dispose()
{
}
}
private class CountToTenEnumerable : IEnumerable<int>
{
public IEnumerator<int> GetEnumerator()
{
return new CountToTenEnumerator();
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
}
public static IEnumerable<int> CountToTen()
{
return new CountToTenEnumerable();
}
Now, for a variety of reasons this is quite different to the code you're likely to get from the version using yield, but the basic principle is the same. As you can see there are two allocations involved of objects (same number as if we had a collection and then did a foreach on that) and the storage of a single int. In practice we can expect yield to store a few more bytes than that, but not a lot.
Edit: yield actually does a trick where the first GetEnumerator() call on the same thread that obtained the object returns that same object, doing double service for both cases. Since this covers over 99% of use cases yield actually does one allocation rather than two.
Now let's look at:
public IEnumerable<T> GetList1()
{
foreach( var item in collection )
yield return item.Property;
}
While this would result in more memory used than just return collection, it won't result in a lot more; the only thing the enumerator produced really needs to keep track of is the enumerator produced by calling GetEnumerator() on collection and then wrapping that.
This is going to be massively less memory than that of the wasteful second approach you mention, and much faster to get going.
Edit:
You've changed your question to include "syntax WHEN I PERFORM a ToList() on it", which is worth considering.
Now, here we need to add a third possibility: Knowledge of the collection's size.
Here, there is the possibility that using new List<T>(capacity) will prevent reallocations of the list being built. That can indeed be a considerable saving.
If the object that has ToList called on it implements ICollection<T> then ToList will end up first doing a single allocation of an internal array of T and then calling ICollection<T>.CopyTo().
This would mean that your GetList2 would result in a faster ToList() than your GetList1.
However, your GetList2 has already wasted time and memory doing what ToList() will do with the results of GetList1 anyway!
What it should have done here was just return new List<T>(collection); and be done with it.
If, though, we need to actually do something inside GetList1 or GetList2 (e.g. convert elements, filter elements, track averages, and so on) then GetList1 is going to be faster and lighter on memory. Much lighter if we never call ToList() on it, and slightly lighter if we do call ToList() because again, the faster and lighter ToList() is offset by GetList2 being slower and heavier in the first place by exactly the same amount.

How to properly check IEnumerable for existing results

What's the best practice to check if a collection has items?
Here's an example of what I have:
var terminalsToSync = TerminalAction.GetAllTerminals();
if(terminalsToSync.Any())
SyncTerminals(terminalsToSync);
else
GatewayLogAction.WriteLogInfo(Messages.NoTerminalsForSync);
The GetAllTerminals() method will execute a stored procedure and, if we return a result, (Any() is true), SyncTerminals() will loop through the elements; thus enumerating it again and executing the stored procedure for the second time.
What's the best way to avoid this?
I'd like a good solution that can be used in other cases too; possibly without converting it to List.
Thanks in advance.
I would probably use a ToArray call, and then check Length; you're going to enumerate all the results anyway so why not do it early? However, since you've said you want to avoid early realisation of the enumerable...
I'm guessing that SyncTerminals has a foreach, in which case you can write it something like this:
bool any = false;
foreach(var terminal in terminalsToSync)
{
if(!any)any = true;
//....
}
if(!any)
GatewayLogAction.WriteLogInfo(Messages.NoTerminalsForSync);
Okay, there's a redundant if after the first loop, but I'm guessing the cost of an extra few CPU cycles isn't going to matter much.
Equally, you could do the iteration the old way and use a do...while loop and GetEnumerator; taking the first iteration out of the loop; that way there are literally no wasted operations:
// Dispose the enumerator when done, as foreach would.
using(var enumerator = terminalsToSync.GetEnumerator())
{
if(enumerator.MoveNext())
{
do
{
//sync enumerator.Current
} while(enumerator.MoveNext());
}
else
GatewayLogAction.WriteLogInfo(Messages.NoTerminalsForSync);
}
How about this, which still defers execution, but buffers it once executed:
var terminalsToSync = TerminalAction.GetAllTerminals().Lazily();
with:
public static class LazyEnumerable {
public static IEnumerable<T> Lazily<T>(this IEnumerable<T> source) {
if (source is LazyWrapper<T>) return source;
return new LazyWrapper<T>(source);
}
class LazyWrapper<T> : IEnumerable<T> {
private IEnumerable<T> source;
private bool executed;
public LazyWrapper(IEnumerable<T> source) {
if (source == null) throw new ArgumentNullException("source");
this.source = source;
}
IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
public IEnumerator<T> GetEnumerator() {
if (!executed) {
executed = true;
source = source.ToList();
}
return source.GetEnumerator();
}
}
}
Personally I wouldn't use Any() here; foreach will simply not loop through any items if the collection is empty, so I would just do it like that. However, I would recommend that you check for null.
If you do want to pre-enumerate the set, use .ToArray(), which will only enumerate once:
var terminalsToSync = TerminalAction.GetAllTerminals().ToArray();
if(terminalsToSync.Any())
SyncTerminals(terminalsToSync);
var terminalsToSync = TerminalAction.GetAllTerminals().ToList();
if(terminalsToSync.Any())
SyncTerminals(terminalsToSync);
else
GatewayLogAction.WriteLogInfo(Messages.NoTerminalsForSync);
.Length or .Count is faster since it doesn't need to go through the GetEnumerator()/MoveNext()/Dispose() required by Any()
Here's another way of approaching this problem:
int count = SyncTerminals(terminalsToSync);
if(count == 0) GatewayLogAction.WriteLogInfo(Messages.NoTerminalsForSync);
where you change SyncTerminals to do:
int count = 0;
foreach(var obj in terminalsToSync) {
count++;
// some code
}
return count;
Nice and simple.
All the caching solutions here cache all items as soon as the first item is retrieved. It is only really lazy if you cache each item individually as the list is iterated.
The difference can be seen in this example:
public class LazyListTest
{
private int _count = 0;
public void Test()
{
var numbers = Enumerable.Range(1, 40);
var numbersQuery = numbers.Select(GetElement).ToLazyList(); // Cache lazy
var total = numbersQuery.Take(3)
.Concat(numbersQuery.Take(10))
.Concat(numbersQuery.Take(3))
.Sum();
Console.WriteLine(_count);
}
private int GetElement(int value)
{
_count++;
// Some slow stuff here...
return value * 100;
}
}
If you run the Test() method, the _count is only 10. Without caching it would be 16 and with .ToList() it would be 40!
An example of the implementation of LazyList can be found here.
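Since the linked implementation is not reproduced here, the following is only a rough sketch of how an element-by-element cache could work; it is not the linked code. The ToLazyList name matches the call used above, while LazyList<T>, LazyListDemo and their members are assumptions. The sketch is not thread-safe.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

public static class LazyListExtensions
{
    public static IEnumerable<T> ToLazyList<T>(this IEnumerable<T> source)
    {
        if (source == null) throw new ArgumentNullException("source");
        return new LazyList<T>(source);
    }

    private sealed class LazyList<T> : IEnumerable<T>
    {
        private readonly IEnumerable<T> _source;
        private readonly List<T> _cache = new List<T>();
        private IEnumerator<T> _enumerator; // shared cursor into the source
        private bool _completed;

        public LazyList(IEnumerable<T> source) { _source = source; }

        public IEnumerator<T> GetEnumerator()
        {
            int index = 0;
            while (true)
            {
                if (index < _cache.Count)
                {
                    // Already computed by this or an earlier enumeration.
                    yield return _cache[index++];
                }
                else if (_completed)
                {
                    yield break;
                }
                else
                {
                    // Pull exactly one more element from the source and cache it.
                    if (_enumerator == null) _enumerator = _source.GetEnumerator();
                    if (_enumerator.MoveNext())
                    {
                        _cache.Add(_enumerator.Current);
                    }
                    else
                    {
                        _completed = true;
                        _enumerator.Dispose();
                        _enumerator = null;
                    }
                }
            }
        }

        IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
    }
}

public static class LazyListDemo
{
    // Mirrors the Test() method above; returns how many source elements were computed.
    public static int CountSourceCalls()
    {
        int calls = 0;
        IEnumerable<int> query = Enumerable.Range(1, 40)
            .Select(v => { calls++; return v * 100; })
            .ToLazyList();
        query.Take(3).Concat(query.Take(10)).Concat(query.Take(3)).Sum(); // forces enumeration
        return calls;
    }
}
```

The design choice is that every enumeration reads from the shared cache first and only advances the single source enumerator when it runs past the cached elements, so no source element is computed twice and none is computed before it is requested.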
If you're seeing two procedure calls for the evaluation of whatever GetAllTerminals() returns, this means that the procedure's result isn't being cached. Without knowing what data-access strategy you're using, this is quite hard to fix in a general way.
The simplest solution, as you've alluded, is to copy the result of the call before you perform any other operations. If you wanted to, you could neatly wrap this behaviour up in an IEnumerable<T> which executes the inner enumerable call just once:
public class CachedEnumerable<T> : IEnumerable<T>
{
public CachedEnumerable(IEnumerable<T> enumerable)
{
// Lazy<T> defers the ToList() call until the first enumeration.
result = new Lazy<List<T>>(() => enumerable.ToList());
}
private Lazy<List<T>> result;
public IEnumerator<T> GetEnumerator()
{
return this.result.Value.GetEnumerator();
}
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
return this.GetEnumerator();
}
}
Wrap the result in an instance of this type and it will not evaluate the inner enumerable multiple times.

How to change a LINQ list into an arraylist

I have to write a query in a web application using LINQ but I need to change that query into an array list. How can I change the query below to do this?
var resultsQuery =
from result in o["SearchResponse"]["Web"]["Results"].Children()
select new
{
Url = result.Value<string>("Url").ToString(),
Title = result.Value<string>("Title").ToString(),
Content = result.Value<string>("Description").ToString()
};
If you really need to create an ArrayList, you can write new ArrayList(resultsQuery.ToArray()).
However, you should use a List<T> instead, by writing resultsQuery.ToList().
Note that, in both cases, the list will contain objects of anonymous type.
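As a rough illustration of the first suggestion, the sketch below projects into an anonymous type and wraps the result in an ArrayList. The sample data and the names ArrayListExample and FromQuery are made up; only ToArray() and the ArrayList(ICollection) constructor come from the answer above:

```csharp
using System.Collections;
using System.Linq;

public static class ArrayListExample
{
    public static ArrayList FromQuery()
    {
        // Stand-in for the real query; each element is an anonymous-type object.
        var query = from n in new[] { 1, 2, 3 }
                    select new { Value = n, Square = n * n };

        // ToArray() materializes the query; an array implements ICollection,
        // which is what the ArrayList constructor accepts.
        return new ArrayList(query.ToArray());
    }
}
```

Because ArrayList stores its elements as object, the anonymous-type properties are no longer accessible without reflection or dynamic once the list leaves the method, which is one reason to prefer List<T> with a named type.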
There is a .ToArray() method that'll convert IEnumerable to an Array.
ArrayList doesn't have a constructor or Add(Range) method that takes an IEnumerable. So that leaves two choices:
Use an intermediate collection that does implement ICollection: both Array and List<T> implement ICollection, and either can be obtained via the ToArray() or ToList() extension methods from LINQ.
Create an instance of ArrayList and then add each element of the result:
var query = /* LINQ Expression */
var res = new ArrayList();
foreach (var item in query) {
res.Add(item);
}
The former method is simple to do but does mean creating the intermediate data structure (which of the two options has a higher overhead is an interesting question and partly depends on the query so there is no general answer). The latter is more code and does involve growing the ArrayList incrementally (so more memory for the GC, as would be the case for an intermediate Array or List<T>).
If you just need this in one place you can just do the code inline, if you need to do it in multiple places create your own extension method over IEnumerable<T>:
public static class MyExtensions {
public static ArrayList ToArrayList<T>(this IEnumerable<T> input) {
var col = input as ICollection;
if (col != null) {
return new ArrayList(col);
}
var res = new ArrayList();
foreach (var item in input) {
res.Add(item);
}
return res;
}
}

C# List - Group By - Without Linq

I have an object:
IObject
{
string Account,
decimal Amount
}
How do I group by Account and Sum the Amount, returning a List without Linq.
2.0 Framework ... that is why no Linq.
Here is what I have:
ListofObjects = List<IObject>;
foreach (var object in objects)
{
var objectToAdd = new Object(object);
var oa = ListofObjects.Find(x => x.Account == objectToAdd.Account);
if (oa == null)
{
ListofObjects.Add(objectToAdd);
}
else
{
ListofObjects.Remove(oa);
oa.Amount = objectToAdd.Amount;
ListofObjects.Add(oa);
}
}
Easiest answer: use LINQBridge and get all your LINQ to Objects goodness against .NET 2.0... works best if you can use C# 3 (i.e. VS2008 but targeting .NET 2.0).
If you really can't do that, you'll basically need to keep a dictionary from a key to a list of values. Iterate through the sequence, and check whether it already contains a list - if not, add one. Then add to whatever list you've found (whether new or old).
If you need to return the groups in key order, you'll need to also keep a list of keys in the order in which you found them. Frankly it's a pain... just get LINQBridge instead :)
(Seriously, each individual bit of LINQ is actually fairly easy to write - but it's also quite easy to make off-by-one errors, or end up forgetting to optimize something like Count() in the case where it's actually an ICollection<T>... There's no need to reinvent the wheel here.)
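The dictionary-of-lists approach described above could be sketched roughly as follows. To stay self-contained it groups plain (account, amount) pairs rather than the question's IObject type; the Grouping class and GroupByAccount method are hypothetical names:

```csharp
using System.Collections.Generic;

public static class Grouping
{
    // Groups amounts by account without LINQ, keeping accounts in first-seen order.
    public static List<KeyValuePair<string, List<decimal>>> GroupByAccount(
        IEnumerable<KeyValuePair<string, decimal>> entries)
    {
        Dictionary<string, List<decimal>> groups = new Dictionary<string, List<decimal>>();
        List<string> keyOrder = new List<string>();

        foreach (KeyValuePair<string, decimal> entry in entries)
        {
            List<decimal> bucket;
            if (!groups.TryGetValue(entry.Key, out bucket))
            {
                // First time this account is seen: create its list and remember the order.
                bucket = new List<decimal>();
                groups.Add(entry.Key, bucket);
                keyOrder.Add(entry.Key);
            }
            bucket.Add(entry.Value);
        }

        List<KeyValuePair<string, List<decimal>>> result =
            new List<KeyValuePair<string, List<decimal>>>(keyOrder.Count);
        foreach (string key in keyOrder)
        {
            result.Add(new KeyValuePair<string, List<decimal>>(key, groups[key]));
        }
        return result;
    }
}
```

The separate keyOrder list is there because Dictionary<TKey, TValue> does not guarantee any enumeration order.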
EDIT: I was about to write some code, but then I noticed that you want a list returned... a list of what? A List<IList<IObject>>? Or are you actually trying to group and sum in one go? If so, don't you want a list of pairs of key and amount? Or are you going to reuse the same class that you've already got for a single account, but as the aggregate? If it's the latter, here's some sample code:
public static IList<IObject> SumAccounts(IEnumerable<IObject> data)
{
List<IObject> ret = new List<IObject>();
Dictionary<string, IObject> map = new Dictionary<string, IObject>();
foreach (var item in data)
{
IObject existing;
if (!map.TryGetValue(item.Account, out existing))
{
existing = new IObject(item.Account, 0m);
map[item.Account] = existing;
ret.Add(existing);
}
existing.Amount += item.Amount;
}
return ret;
}
Admittedly the extra efficiency here due to using a Dictionary for lookups will be pointless unless you've got really quite a lot of accounts...
EDIT: If you've got a small number of accounts as per your comment, you could use:
public static IList<IObject> SumAccounts(IEnumerable<IObject> data)
{
List<IObject> ret = new List<IObject>();
foreach (var item in data)
{
IObject existing = ret.Find(x => x.Account == item.Account);
if (existing == null)
{
existing = new IObject(item.Account, 0m);
ret.Add(existing);
}
existing.Amount += item.Amount;
}
return ret;
}
Use a dictionary to hold the results. Locating an item in a dictionary is close to an O(1) operation, so it's a lot faster than searching for items in a list.
Dictionary<string, decimal> sum = new Dictionary<string, decimal>();
foreach (IObject obj in objects) {
if (sum.ContainsKey(obj.Account)) {
sum[obj.Account] += obj.Amount;
} else {
sum.Add(obj.Account, obj.Amount);
}
}
