After searching online and going through past Stack Overflow posts for a suitable implementation of dynamic ordering with LINQ, I came up with my own implementation that borrows a few things from other solutions I have seen.
What I need to know is whether this implementation is thread-safe. I don't believe it is, as I am passing an enumerable generic object (a reference type) as a parameter into the static method, but I would like to know if I am missing anything else and, hopefully, how to make it completely thread-safe.
public class OrderByHelper
{
    public static IEnumerable<T> OrderBy<T>(IQueryable<T> items, string sortColumn, string sortDirection, int pageNumber, int pageSize)
    {
        Type t = typeof(T).GetProperty(sortColumn).PropertyType;
        return (IEnumerable<T>)(typeof(OrderByHelper)
            .GetMethod("OrderByKnownType")
            .MakeGenericMethod(new[] { typeof(T), t })
            .Invoke(null, new object[] { items, sortColumn, sortDirection, pageNumber, pageSize }));
    }
    public static IEnumerable<K> OrderByKnownType<K, T>(IQueryable<K> items, string sortColumn, string sortDirection, int pageNumber, int pageSize)
    {
        var param = Expression.Parameter(typeof(K), "i");
        var mySortExpression = Expression.Lambda<Func<K, T>>(Expression.Property(param, sortColumn), param);
        if (!string.IsNullOrEmpty(sortDirection))
        {
            if (sortDirection == "ASC")
                return items.OrderBy(mySortExpression).Skip((pageNumber - 1) * pageSize).Take(pageSize);
            else
                return items.OrderByDescending(mySortExpression).Skip((pageNumber - 1) * pageSize).Take(pageSize);
        }
        else
            throw new InvalidOperationException("No sorting direction specified.");
    }
}
The answer is simple:
It is thread safe if your original collection is thread safe.
LINQ extension methods only read from the original collection, typically by calling .GetEnumerator() on it. Sorting and ordering don't manipulate the original collection; they let you enumerate it in sorted order. Thus, you only perform read operations on the data. As a general rule of thumb, if you only read data that nothing else is modifying, you do not need to implement any thread safety.
I'm tempted to say that in 99% of cases you don't need any thread safety, because you collect data only once and then expose the LINQ functionality. You might only need a thread safe collection if you want to create a framework that does not instantiate new collections when refreshing data, but rather keeps re-using the same (observable) collection instance that synchronizes itself with the database data in a fancy way.
But I don't know about your exact scenario. If you really do need thread safety, then it depends if you have control over the code where the original collection is instantiated and the data is added.
If you are using merely LINQ-to-objects, then it's entirely in your control to create an instance of a thread safe collection class at that spot.
If you're using LINQ-to-SQL or something similar, it might be difficult, because the original collection that gathers the data from the database is probably instantiated deep within the provider and hidden from you. I haven't checked whether there are extension points where you can override this to use a thread-safe collection instead.
My working assumption is that LINQ is thread-safe when used with the System.Collections.Concurrent collections (including ConcurrentDictionary).
(Other Overflow posts seem to agree: link)
However, an inspection of the implementation of the LINQ OrderBy extension method shows that it appears not to be threadsafe with the subset of concurrent collections which implement ICollection (e.g. ConcurrentDictionary).
The OrderedEnumerable GetEnumerator (source here) constructs an instance of a Buffer struct (source here) which tries to cast the collection to an ICollection (which ConcurrentDictionary implements) and then performs a collection.CopyTo with an array initialised to the size of the collection.
Therefore, if the ConcurrentDictionary (as the concrete ICollection in this case) grows in size during the OrderBy operation, between initialising the array and copying into it, this operation will throw.
The following test code shows this exception:
(Note: I appreciate that performing an OrderBy on a thread-safe collection which is changing underneath you is not that meaningful, but I do not believe it should throw)
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
namespace Program
{
    class Program
    {
        static void Main(string[] args)
        {
            try
            {
                int loop = 0;
                while (true) //Run many loops until exception thrown
                {
                    Console.WriteLine($"Loop: {++loop}");
                    _DoConcurrentDictionaryWork().Wait();
                }
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex);
            }
        }
        private static async Task _DoConcurrentDictionaryWork()
        {
            var concurrentDictionary = new ConcurrentDictionary<int, object>();
            var keyGenerator = new Random();
            var tokenSource = new CancellationTokenSource();
            var orderByTaskLoop = Task.Run(() =>
            {
                var token = tokenSource.Token;
                while (token.IsCancellationRequested == false)
                {
                    //Keep ordering concurrent dictionary on a loop
                    var orderedPairs = concurrentDictionary.OrderBy(x => x.Key).ToArray(); //THROWS EXCEPTION HERE
                    //...do some more work with ordered snapshot...
                }
            });
            var updateDictTaskLoop = Task.Run(() =>
            {
                var token = tokenSource.Token;
                while (token.IsCancellationRequested == false)
                {
                    //keep mutating dictionary on a loop
                    var key = keyGenerator.Next(0, 1000);
                    concurrentDictionary[key] = new object();
                }
            });
            //Wait for 1 second
            await Task.Delay(TimeSpan.FromSeconds(1));
            //Cancel and dispose token
            tokenSource.Cancel();
            tokenSource.Dispose();
            //Wait for orderBy and update loops to finish (now token cancelled)
            await Task.WhenAll(orderByTaskLoop, updateDictTaskLoop);
        }
    }
}
That the OrderBy throws an exception leads to one of a few possible conclusions:
1) My assumption about LINQ being threadsafe with concurrent collections is incorrect, and it is only safe to perform LINQ on collections (be they concurrent or not) which are not mutating during the LINQ query
2) There is a bug in the implementation of LINQ OrderBy: it is incorrect for the implementation to cast the source collection to an ICollection and perform the collection copy (it should just drop through to its default behaviour of iterating the IEnumerable).
3) I have misunderstood what is going on here...
Thoughts much appreciated!
It's not stated anywhere that OrderBy (or other LINQ methods) should always use GetEnumerator of the source IEnumerable, or that it should be thread-safe on concurrent collections. All that is promised is that this method:
Sorts the elements of a sequence in ascending order according to a key.
ConcurrentDictionary is not thread-safe in some global sense either. It's thread-safe with respect to other operations performed on it. Even more, documentation says that
All public and protected members of ConcurrentDictionary<TKey,TValue> are thread-safe and may be used concurrently from multiple threads. However, members accessed through one of the interfaces the ConcurrentDictionary<TKey,TValue> implements, including extension methods, are not guaranteed to be thread safe and may need to be synchronized by the caller.
So, your understanding is correct (OrderBy will see IEnumerable you pass to it is really ICollection, will then get length of that collection, allocate buffer of that size, then will call ICollection.CopyTo, and this is of course not thread safe on any type of collection), but it's not a bug in OrderBy because neither OrderBy nor ConcurrentDictionary ever promised what you assume.
If you want to do OrderBy in a thread safe way on ConcurrentDictionary, you need to rely on methods that are promised to be thread safe. For example:
// note: this is NOT IEnumerable.ToArray()
// but the public ToArray() method of ConcurrentDictionary itself;
// it is guaranteed to be thread safe with respect to other operations
// on this dictionary
var snapshot = concurrentDictionary.ToArray();
// we are working on a snapshot, so no other thread can modify it;
// of course, at this point the real contents of the dictionary might not be
// the same as our snapshot
var sorted = snapshot.OrderBy(c => c.Key);
If you don't want to allocate an additional array (with ToArray), you can use Select(c => c) and it will work in this case, but then we are again in murky territory, relying on something to be safe in a situation where it was never promised to be (Select will also not always enumerate your collection: if the collection is an array or a list, it will take a shortcut and use indexers instead). So you can create an extension method like this:
public static class Extensions {
    public static IEnumerable<T> ForceEnumerate<T>(this ICollection<T> collection) {
        foreach (var item in collection)
            yield return item;
    }
}
And use it like this if you want to be safe and don't want to allocate array:
concurrentDictionary.ForceEnumerate().OrderBy(c => c.Key).ToArray();
In this case we are forcing enumeration of the ConcurrentDictionary (which we know is safe from the documentation) and then passing that to OrderBy, knowing that it cannot do any harm with a pure IEnumerable. Note that, as correctly pointed out in the comments by mjwills, this is not exactly the same as ToArray, because ToArray produces a snapshot (it locks the collection, preventing modifications while the array is built), whereas Select/yield does not acquire any locks (so items might be added or removed while enumeration is in progress). I doubt it matters for the kind of thing described in the question, though: in both cases, once OrderBy has completed, you have no idea whether your ordered results reflect the current state of the collection or not.
I have a List<Person> and instead want to convert them for simple processing to a List<string>, doing the following:
List<Person> persons = GetPersonsBySeatOrder();
List<string> seatNames = persons.Select(x => x.Name).ToList();
Console.WriteLine("First in line: {0}", seatNames[0]);
Is the .Select() statement on a LINQ to Objects object guaranteed to not change the order of the list members? Assuming no explicit distinct/grouping/ordering is added
Also, if an arbitrary .Where() clause is used first, is it still guaranteed to keep the relative order, or does it sometimes use non-iterative filtering?
As Fermin commented above, this is essentially a duplicate question. I failed to pick the right keywords when searching Stack Overflow:
Preserving order with LINQ
It depends on the underlying collection type more than anything. You could get inconsistent ordering from a HashSet, but a List is safe. Even if the ordering you want is provided implicitly, it's better to define an explicit ordering if you need it though. It looks like you're doing that judging by the method names.
The current .NET implementation uses the code below, but there is no guarantee that this implementation will stay the same in the future.
private static IEnumerable<TResult> SelectIterator<TSource, TResult>(IEnumerable<TSource> source, Func<TSource, int, TResult> selector)
{
    int index = -1;
    foreach (TSource source1 in source)
    {
        checked { ++index; }
        yield return selector(source1, index);
    }
}
Yes, Linq Select is guaranteed to return all its results in the order of the enumeration it is passed. Like most Linq functions, it is fully specified what it does. Barring handling of errors, this might as well be the code for Select:
IEnumerable<Y> Select<X, Y>(this IEnumerable<X> input, Func<X, Y> transform)
{
    foreach (var x in input)
        yield return transform(x);
}
But as Samantha Branham pointed out, the underlying collection might not have an intrinsic order. I've seen hashtables that rearrange themselves on read.
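To make the ordering point concrete, here is a minimal sketch, assuming LINQ to Objects over a List<string>. The names OrderDemo and LongNamesUpper are hypothetical and exist only for this illustration; the sketch relies on Where and Select streaming elements in the order the source enumerates them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OrderDemo
{
    // Hypothetical helper: keeps names longer than three characters,
    // upper-cases them, and materializes the result.
    // Neither Where nor Select reorders anything; both yield items
    // in the order the source list enumerates them.
    public static List<string> LongNamesUpper(List<string> names)
    {
        return names
            .Where(n => n.Length > 3)
            .Select(n => n.ToUpperInvariant())
            .ToList();
    }
}
```

For the input Carol, Al, Bob, Dave, Alice this yields CAROL, DAVE, ALICE: the surviving items keep their relative order. With a source such as a HashSet, the result would follow whatever order the set happens to enumerate in.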
What is the easiest way to convert an IList<T1> to an IList<BaseT1>?
IList<T1>.Count is a very large number!
class BaseT1 { };
class T1 : BaseT1
{
    static public IList<BaseT1> convert(IList<T1> p)
    {
        IList<BaseT1> result = new List<BaseT1>();
        foreach (BaseT1 baseT1 in p)
            result.Add(baseT1);
        return result;
    }
}
You'll get much better performance in your implementation if you specify the size of the result list when it is initialized, and call the Add method on List<T> directly:
List<BaseT1> result = new List<BaseT1>(p.Count);
That way, it isn't repeatedly resizing its internal array as new items get added, which should give a noticeable speedup for a very large list.
Alternatively, you could code a wrapper class that implements IList<BaseT1> and takes an IList<T1> in the constructor.
linq?
var baseList = derivedList.Cast<TBase>();
Edit:
Cast returns an IEnumerable, do you need it in a List? List can be an expensive class to deal with
IList<T1>.Count is a very large number!
Yes, which means that no matter what syntax sugar you use, the conversion is going to require O(n) time and O(n) storage. You cannot cast the list to avoid re-creating it. If that was possible, client code could add an element of BaseT1 to the list, violating the promise that list only contains objects that are compatible with T1.
The only way to get ahead is to return an interface type that cannot change the list. Which would be IEnumerable<BaseT1> in this case. Allowing you to iterate the list, nothing else. That conversion is automatic in .NET 4.0 thanks to its support for covariance. You'll have to write a little glue code in earlier versions:
public static IEnumerable<BaseT1> enumerate(IList<T1> p) {
    foreach (BaseT1 item in p) yield return item;
}
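On .NET 4.0 or later, the covariant conversion can be written as a plain assignment. This is a minimal sketch, assuming C# 4 or newer; BaseT1 and T1 here are bare stand-ins for the classes in the question, and CovarianceDemo.AsBase is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

public class BaseT1 { }
public class T1 : BaseT1 { }

public static class CovarianceDemo
{
    // IEnumerable<out T> is covariant, so an IList<T1> can be handed out
    // as IEnumerable<BaseT1> without copying a single element.
    public static IEnumerable<BaseT1> AsBase(IList<T1> items)
    {
        return items; // implicit reference conversion, O(1)
    }
}
```

The returned sequence is the very same object as the list that was passed in, so the cost does not depend on Count. Callers can iterate it but cannot add a BaseT1 through it, which is what keeps the conversion type-safe.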
Let's say I have a class
public class MyObject
{
    public int SimpleInt { get; set; }
}
And I have a List<MyObject>, and I call ToList() on it and then change one of the SimpleInt values: will my change be propagated back to the original list? In other words, what would be the output of the following method?
public void RunChangeList()
{
    var objs = new List<MyObject>() { new MyObject() { SimpleInt = 0 } };
    var whatInt = ChangeToList(objs);
}
public int ChangeToList(List<MyObject> objects)
{
    var objectList = objects.ToList();
    objectList[0].SimpleInt = 5;
    return objects[0].SimpleInt;
}
Why?
P/S: I'm sorry if it seems obvious to find out. But I don't have compiler with me now...
Yes, ToList will create a new list, but because in this case MyObject is a reference type then the new list will contain references to the same objects as the original list.
Updating the SimpleInt property of an object referenced in the new list will also affect the equivalent object in the original list.
(If MyObject was declared as a struct rather than a class then the new list would contain copies of the elements in the original list, and updating a property of an element in the new list would not affect the equivalent element in the original list.)
From the Reflector'd source:
public static List<TSource> ToList<TSource>(this IEnumerable<TSource> source)
{
    if (source == null)
    {
        throw Error.ArgumentNull("source");
    }
    return new List<TSource>(source);
}
So yes, changes to the new list itself (i.e. additions or removals) won't affect your original list; however, changes to the referenced objects will be visible through both.
ToList will always create a new list, which will not reflect any subsequent changes to the collection.
However, it will reflect changes to the objects themselves (Unless they're mutable structs).
In other words, if you replace an object in the original list with a different object, the ToList will still contain the first object.
However, if you modify one of the objects in the original list, the ToList will still contain the same (modified) object.
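The two cases can be sketched as follows. This is a minimal illustration, assuming a reference-type element; Item, ToListDemo, and both method names are hypothetical and not taken from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Item
{
    public int Value { get; set; }
}

public static class ToListDemo
{
    // Modifying an object through the copy is visible through the original,
    // because both lists hold references to the same Item.
    public static int ModifyThroughCopy()
    {
        var original = new List<Item> { new Item { Value = 0 } };
        var copy = original.ToList();
        copy[0].Value = 5;
        return original[0].Value;
    }

    // Replacing a slot in the original does not touch the copy,
    // because the two lists are separate containers.
    public static int ReplaceInOriginal()
    {
        var original = new List<Item> { new Item { Value = 0 } };
        var copy = original.ToList();
        original[0] = new Item { Value = 9 };
        return copy[0].Value;
    }
}
```

ModifyThroughCopy returns 5 and ReplaceInOriginal returns 0, which matches the distinction drawn above between changing an object and replacing it.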
Yes, it creates a new list. This is by design.
The list will contain the same results as the original enumerable sequence, but materialized into a persistent (in-memory) collection. This allows you to consume the results multiple times without incurring the cost of recomputing the sequence.
The beauty of LINQ sequences is that they are composable. Often, the IEnumerable<T> you get is the result of combining multiple filtering, ordering, and/or projection operations. Extension methods like ToList() and ToArray() allow you to convert the computed sequence into a standard collection.
The accepted answer correctly addresses the OP's question based on his example. However, it only applies when ToList is applied to a concrete collection; it does not hold when the elements of the source sequence have yet to be instantiated (due to deferred execution). In case of the latter, you might get a new set of items each time you call ToList (or enumerate the sequence).
Here is an adaptation of the OP's code to demonstrate this behaviour:
public static void RunChangeList()
{
    var objs = Enumerable.Range(0, 10).Select(_ => new MyObject() { SimpleInt = 0 });
    var whatInt = ChangeToList(objs); // whatInt gets 0
}
public static int ChangeToList(IEnumerable<MyObject> objects)
{
    var objectList = objects.ToList();
    objectList.First().SimpleInt = 5;
    return objects.First().SimpleInt;
}
Whilst the above code may appear contrived, this behaviour can appear as a subtle bug in other scenarios. See my other example for a situation where it causes tasks to get spawned repeatedly.
A new list is created, but the items in it are references to the original items (just like in the original list). Changes to the list itself are independent, but changes to the items will show up in both lists.
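Just stumbled upon this old post and thought of adding my two cents. Generally, if I am in doubt, I quickly use the GetHashCode() method on the objects to check their identities (for a class that doesn't override GetHashCode(), matching hash codes are a good hint that two references point to the same object, although object.ReferenceEquals is the definitive check). So for the above: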
using System;
using System.Collections.Generic;
using System.Linq;
public class MyObject
{
    public int SimpleInt { get; set; }
}
class Program
{
    public static void RunChangeList()
    {
        var objs = new List<MyObject>() { new MyObject() { SimpleInt = 0 } };
        Console.WriteLine("objs: {0}", objs.GetHashCode());
        Console.WriteLine("objs[0]: {0}", objs[0].GetHashCode());
        var whatInt = ChangeToList(objs);
        // For an int, GetHashCode() simply returns the value itself
        Console.WriteLine("whatInt: {0}", whatInt.GetHashCode());
    }
    public static int ChangeToList(List<MyObject> objects)
    {
        Console.WriteLine("objects: {0}", objects.GetHashCode());
        Console.WriteLine("objects[0]: {0}", objects[0].GetHashCode());
        var objectList = objects.ToList();
        Console.WriteLine("objectList: {0}", objectList.GetHashCode());
        Console.WriteLine("objectList[0]: {0}", objectList[0].GetHashCode());
        objectList[0].SimpleInt = 5;
        return objects[0].SimpleInt;
    }
    private static void Main(string[] args)
    {
        RunChangeList();
        Console.ReadLine();
    }
}
And answer on my machine -
objs: 45653674
objs[0]: 41149443
objects: 45653674
objects[0]: 41149443
objectList: 39785641
objectList[0]: 41149443
whatInt: 5
So essentially, the object that the list carries remains the same in the above code. Hope the approach helps.
I think that this is equivalent to asking if ToList does a deep or shallow copy. As ToList has no way to clone MyObject, it must do a shallow copy, so the created list contains the same references as the original one, so the code returns 5.
ToList will create a brand new list.
If the items in the list are value types, the new list holds copies of them, so changes made through the new list will not affect the originals; if they are reference types, any changes will be reflected back in the referenced objects.
In the case where the source object is a true IEnumerable (i.e. not just a collection packaged as an enumerable), ToList() may NOT return the same object references as the original IEnumerable. It will return a new List of objects, but those objects may not be the same as, or even Equal to, the objects yielded by the IEnumerable when it is enumerated again.
var objectList = objects.ToList();
objectList[0].SimpleInt=5;
This will update the original object as well. The new list will contain references to the same objects, just like the original list. You can change the elements through either list and the update will be reflected in the other.
Now if you update a list (adding or deleting an item) that will not be reflected in the other list.
I don't see anywhere in the documentation that ToList() is always guaranteed to return a new list. If an IEnumerable is a List, it may be more efficient to check for this and simply return the same List.
The worry is that sometimes you may want to be absolutely sure that the returned List is != to the original List. Because Microsoft doesn't document that ToList will return a new List, we can't be sure (unless someone found that documentation). It could also change in the future, even if it works now.
new List<T>(IEnumerable<T> source) is guaranteed to return a new List. I would use this instead.
My question as title above. For example
IEnumerable<T> items = new T[]{new T("msg")};
items.ToList().Add(new T("msg2"));
but after all it only has 1 item inside. Can we have a method like items.Add(item) like the List<T>?
You cannot, because IEnumerable<T> does not necessarily represent a collection to which items can be added. In fact, it does not necessarily represent a collection at all! For example:
IEnumerable<string> ReadLines()
{
    string s;
    do
    {
        s = Console.ReadLine();
        yield return s;
    } while (!string.IsNullOrEmpty(s));
}
IEnumerable<string> lines = ReadLines();
lines.Add("foo") // so what is this supposed to do??
What you can do, however, is create a new IEnumerable object (of unspecified type), which, when enumerated, will provide all items of the old one, plus some of your own. You use Enumerable.Concat for that:
items = items.Concat(new[] { "foo" });
This will not change the array object (you cannot insert items into arrays, anyway). But it will create a new object that will list all items in the array, and then "foo". Furthermore, that new object will keep track of changes in the array (i.e. whenever you enumerate it, you'll see the current values of the array's elements).
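The deferred behaviour of Concat can be sketched like this. It is a minimal illustration, assuming LINQ to Objects over a string array; ConcatDemo and Run are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ConcatDemo
{
    public static List<string> Run()
    {
        string[] items = { "a" };

        // Nothing is enumerated yet; 'extended' only remembers its two sources.
        IEnumerable<string> extended = items.Concat(new[] { "foo" });

        // Change the array after building the query.
        items[0] = "b";

        // Enumeration happens here, so it sees the current array contents.
        return extended.ToList();
    }
}
```

Run returns a list containing "b" followed by "foo", while the array itself still has exactly one element.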
The type IEnumerable<T> does not support such operations. The purpose of the IEnumerable<T> interface is to allow a consumer to view the contents of a collection. Not to modify the values.
When you do operations like .ToList().Add() you are creating a new List<T> and adding a value to that list. It has no connection to the original list.
What you can do is use the Add extension method to create a new IEnumerable<T> with the added value.
items = items.Add("msg2");
Even in this case it won't modify the original IEnumerable<T> object. This can be verified by holding a reference to it. For example
var items = new string[]{"foo"};
var temp = items;
items = items.Add("bar");
After this set of operations the variable temp will still only reference an enumerable with a single element "foo" in the set of values while items will reference a different enumerable with values "foo" and "bar".
EDIT
I constantly forget that Add is not a standard extension method on IEnumerable<T>, because it's one of the first ones that I end up defining. Here it is:
public static IEnumerable<T> Add<T>(this IEnumerable<T> e, T value) {
    foreach ( var cur in e) {
        yield return cur;
    }
    yield return value;
}
Have you considered using the ICollection<T> or IList<T> interfaces instead? They exist for exactly the case where you want an Add method on top of an IEnumerable<T>.
IEnumerable<T> is used to 'mark' a type as being, well, enumerable: just a sequence of items, without any guarantee that the real underlying object supports adding or removing items. Also remember that these interfaces extend IEnumerable<T>, so you get all the extension methods that you get with IEnumerable<T> as well.
In .NET Core, there is a method Enumerable.Append that does exactly that.
The source code of the method is available on GitHub. The implementation (more sophisticated than the suggestions in other answers) is worth a look :).
A couple short, sweet extension methods on IEnumerable and IEnumerable<T> do it for me:
public static IEnumerable Append(this IEnumerable first, params object[] second)
{
    return first.OfType<object>().Concat(second);
}
public static IEnumerable<T> Append<T>(this IEnumerable<T> first, params T[] second)
{
    return first.Concat(second);
}
public static IEnumerable Prepend(this IEnumerable first, params object[] second)
{
    return second.Concat(first.OfType<object>());
}
public static IEnumerable<T> Prepend<T>(this IEnumerable<T> first, params T[] second)
{
    return second.Concat(first);
}
Elegant (well, except for the non-generic versions). Too bad these methods are not in the BCL.
No, the IEnumerable doesn't support adding items to it. The alternative solution is
var myList = new List<T>(items);
myList.Add(otherItem);
To add second message you need to -
IEnumerable<T> items = new T[]{new T("msg")};
items = items.Concat(new[] {new T("msg2")});
I just come here to say that, aside from Enumerable.Concat extension method, there seems to be another method named Enumerable.Append in .NET Core 1.1.1. The latter allows you to concatenate a single item to an existing sequence. So Aamol's answer can also be written as
IEnumerable<T> items = new T[]{new T("msg")};
items = items.Append(new T("msg2"));
Still, please note that this function will not change the input sequence; it just returns a wrapper that puts the given sequence and the appended item together.
Not only can you not add items like you state, but if you add an item to a List<T> (or pretty much any other non-read only collection) that you have an existing enumerator for, the enumerator is invalidated (throws InvalidOperationException from then on).
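The invalidation can be observed with a short sketch, assuming a List<int>; EnumeratorDemo and ThrowsAfterAdd are hypothetical names. The enumerator is held through the IEnumerator<int> interface so that the same instance is advanced before and after the modification.

```csharp
using System;
using System.Collections.Generic;

public static class EnumeratorDemo
{
    // Returns true if advancing an existing enumerator after an Add throws.
    public static bool ThrowsAfterAdd()
    {
        var list = new List<int> { 1, 2, 3 };
        IEnumerator<int> enumerator = list.GetEnumerator();
        try
        {
            enumerator.MoveNext(); // fine: the list is unchanged so far
            list.Add(4);           // modifies the list behind the enumerator

            try
            {
                enumerator.MoveNext();
                return false;
            }
            catch (InvalidOperationException)
            {
                return true;
            }
        }
        finally
        {
            enumerator.Dispose();
        }
    }
}
```

This is the same failure a foreach loop hits when the loop body adds to or removes from the list it is iterating.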
If you are aggregating results from some type of data query, you can use the Concat extension method:
Edit: I originally used the Union extension in the example, which is not really correct. My application uses it extensively to make sure overlapping queries don't duplicate results.
IEnumerable<T> itemsA = ...;
IEnumerable<T> itemsB = ...;
IEnumerable<T> itemsC = ...;
return itemsA.Concat(itemsB).Concat(itemsC);
Others have already given great explanations of why you cannot (and should not!) be able to add items to an IEnumerable. I will only add that if you want to keep coding to an interface that represents a collection and you want an Add method, you should code to ICollection or IList. As an added bonus, these interfaces extend IEnumerable.
You can do this:
//Create IEnumerable
IEnumerable<T> items = new T[]{new T("msg")};
//Convert to list.
List<T> list = items.ToList();
//Add new item to list.
list.Add(new T("msg2"));
//Assign the list back; List<T> implements IEnumerable<T>, so no cast is needed
items = list;
The easiest way to do that is simply:
IEnumerable<T> items = new T[]{new T("msg")};
List<string> itemsList = new List<string>();
itemsList.AddRange(items.Select(y => y.ToString()));
itemsList.Add("msg2");
Then you can return list as IEnumerable also because it implements IEnumerable interface
Instances implementing IEnumerable and IEnumerator (returned from IEnumerable) don't have any APIs that allow altering the collection; the interfaces only give read-only APIs.
There are two ways to actually alter the collection:
If the instance happens to be some collection with a write API (e.g. List), you can try casting to this type:
IList<string> list = enumerableInstance as IList<string>;
Create a list from the IEnumerable (e.g. via the LINQ extension method ToList()):
var list = enumerableInstance.ToList();
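IEnumerable<T> items = Enumerable.Empty<T>();
List<T> someValues = new List<T>();
// Keep a reference to the new list; items.ToList() on its own creates a temporary that is discarded
List<T> list = items.ToList();
list.AddRange(someValues);
items = list;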
Sorry for reviving really old question but as it is listed among first google search results I assume that some people keep landing here.
Among the many answers, some of them really valuable and well explained, I would like to add a different point of view because, to me, the problem has not been well identified.
You are declaring a variable which stores data, and you need to be able to change it by adding items to it? Then you shouldn't declare it as IEnumerable.
As proposed by #NightOwl888
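For this example, just declare IList<T> instead of IEnumerable<T>, backed by a List<T> (an array is fixed-size, so calling Add on it throws NotSupportedException): IList<T> items = new List<T>{new T("msg")}; items.Add(new T("msg2"));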
Trying to bypass the declared interface limitations only shows that you made the wrong choice.
Beyond this, all the proposed methods that reimplement things which already exist elsewhere should be disregarded.
Classes and interfaces that let you add items already exists. Why always recreate things that are already done elsewhere ?
This kind of consideration is a goal of abstracting variables capabilities within interfaces.
TL;DR : IMO these are cleanest ways to do what you need :
// 1st choice : Changing declaration (backed by a List<T>, because an array is fixed-size and its Add throws NotSupportedException)
IList<T> variable = new List<T>();
variable.Add(new T());
// 2nd choice : Changing instantiation, letting the framework taking care of declaration
var variable = new List<T> { };
variable.Add(new T());
When you need to use the variable as an IEnumerable, you'll be able to. When you need to use it as an array, you'll be able to call 'ToArray()'. It really should always be that simple: no extension method needed, casts only when really needed, the ability to use LINQ on your variable, etc.
Stop doing weird and/or complex things because you only made a mistake when declaring/instantiating.
Maybe I'm too late but I hope it helps anyone in the future.
You can use the Insert method of List<T> to add an item at a specific index.
list.Insert(0, item);
Sure, you can (I am leaving your T-business aside):
public IEnumerable<string> tryAdd(IEnumerable<string> items)
{
    List<string> list = items.ToList();
    string obj = "";
    list.Add(obj);
    return list.Select(i => i);
}