Does sorting a ConcurrentDictionary make any sense?

At first my thought was "this is a hash-based data type, so it is unsorted".
Then, since I was about to use it, I looked into the matter in more depth and found that this class implements IEnumerable, and this post also confirmed that it is possible to iterate over this kind of data.
So, my question is: if I use foreach over a ConcurrentDictionary, in what order do I read the elements?
Then, as a second question, I'd like to know whether the sorting methods inherited from its interfaces are of any use. If I call a sorting method on a ConcurrentDictionary, will the new order persist (for example, for a subsequent foreach)?
Hope I've made myself clear.

The current implementation makes no promises whatsoever regarding the order of the elements.
A future implementation can easily change the order by which the elements are enumerated.
As such, your code should not depend on that order.
From the Dictionary<TKey, TValue> msdn docs:
The order in which the items are returned is undefined.
(I couldn't find any reference regarding the ConcurrentDictionary, but the same principle applies.)
When you refer to "the sorting methods inherited by its interfaces", do you mean LINQ extensions? Like OrderBy? If so, these extensions are purely functional and always return a new collection. So, to answer your question "the new order will persist?": no, it won't. You can however use it like this:
foreach(KeyValuePair<T1, T2> kv in dictionary.OrderBy(...))
{
}
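To make the "returns a new collection" point concrete, here is a minimal sketch. OrderByDemo and SortedKeys are made-up names for illustration; only OrderBy, Select and ToList are real LINQ calls.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;

public static class OrderByDemo
{
    // Returns the keys in sorted order without touching the dictionary itself.
    public static List<string> SortedKeys(ConcurrentDictionary<string, int> scores)
    {
        // OrderBy produces a new, lazily evaluated sequence;
        // 'scores' keeps whatever internal order it had before.
        return scores.OrderBy(kv => kv.Key)
                     .Select(kv => kv.Key)
                     .ToList();
    }
}
```

A later plain foreach over the same dictionary still enumerates in an unspecified order, so the OrderBy call has to be repeated wherever a sorted view is needed.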

if I use foreach over a ConcurrentDictionary which is the order I read the elements in?
You get them in the order of buckets they belong to, and if a bucket contains multiple items, the items are in the order in which they've been added.
But as others have said, this is an implementation detail you shouldn't rely on.
I'd like to know if the sorting methods inherited by its interfaces
are of any kind of use. If I call a sorting method over a
ConcurrentDictionary the new order will persist (for example for an
incoming foreach)?.
I assume you're referring to the OrderBy() extension method on the IEnumerable<KeyValuePair<TKey, TValue>> interface. No, nothing will persist. This method returns another IEnumerable<KeyValuePair<TKey, TValue>>. The dictionary remains as it is.

Sounds like you might be asking for trouble if you aren't particularly careful. As dcastro mentioned, the order of elements is not guaranteed. A more troublesome issue is that a ConcurrentDictionary can be changed at any time by other threads. This means that even if the order were guaranteed, items added while you iterate could still be missed. Unless you know you can prevent other threads from changing the dictionary, it's probably not a good idea to iterate over it.
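If you cannot stop other threads from writing, one way to get a stable, ordered view is to take a snapshot first and sort the copy. The following is only a sketch under that assumption; SnapshotDemo and SortedSnapshot are hypothetical names, while ConcurrentDictionary.ToArray() and Array.Sort are the real framework calls.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

public static class SnapshotDemo
{
    // Copies the current contents, then sorts the copy by key.
    public static KeyValuePair<int, string>[] SortedSnapshot(
        ConcurrentDictionary<int, string> source)
    {
        // ToArray() copies the contents as of one moment in time;
        // later writes by other threads do not affect the copy.
        KeyValuePair<int, string>[] snapshot = source.ToArray();
        Array.Sort(snapshot, (a, b) => a.Key.CompareTo(b.Key));
        return snapshot;
    }
}
```

The snapshot can of course be stale by the time you read it, so this only helps when a consistent view matters more than an up-to-date one.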

Related

Forcing/Requiring Sorted List Argument for Method

Suppose a method has been written that expects a sorted list as one of its input.
Of course this will be commented and documented in the code, param will be named "sortedList" but if someone forgets, then there will be a bug.
Is there a way to FORCE the input must be sorted?
I was thinking about creating a new object class with a list and a boolean "sorted", and the passed object has to be that object, and then the method checks immediately if the "sorted" boolean is true. But I feel like there must be a better/standard way.
*This method is called in a loop, so don't want to sort inside the method.

Assuming that you only need to iterate this collection, and not perform any other operations, you can accept an IOrderedEnumerable<T>, which requires that the sequence has been ordered by something. (Keep in mind that it may have been sorted by some other criteria than what you expected, so it's still possible that, by the criteria you're using internally, the data is not sorted.)
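A rough sketch of what that signature looks like in practice. OrderedInput and Smallest are names invented for this example; IOrderedEnumerable<T> is the real LINQ interface returned by OrderBy and ThenBy.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class OrderedInput
{
    // Only compiles for callers that pass the result of OrderBy/ThenBy
    // (or their own IOrderedEnumerable<int>). The compiler cannot check
    // WHICH ordering was used, so "ascending" is still just a convention.
    public static int Smallest(IOrderedEnumerable<int> ascendingValues)
    {
        return ascendingValues.First();
    }
}
```

Calling OrderedInput.Smallest(list.OrderBy(x => x)) compiles, and OrderedInput.Smallest(list) does not. But OrderedInput.Smallest(list.OrderByDescending(x => x)) compiles as well and quietly returns the largest value, which is exactly the caveat in the parentheses above.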
The other option is to simply sort the data after you receive it, instead of requiring the caller to sort it. Note that for adaptive sorting algorithms such as insertion sort, an already sorted data set is the best case (O(n) instead of O(n*log(n))), so even if the data set is sometimes already sorted and sometimes not, it's not necessarily terrible, so long as you don't have a huge data set. This does not hold for every algorithm, so check the one you actually use.

First, let's answer the question asked here.
Is there a way to FORCE the input must be sorted?
Well, yes and no. You can specify that you need one of the data structures in .NET that has a sort order. On the other hand, no, you can't specify that it uses a sort order you care about. As such, it could be sorted by a random number, which would be the same as "unsorted" (probably) in your context.
Let me expand on that. There is no way for you to declare a type or method with a requirement that the compiler can verify that the data passed to the method is sorted according to some rules you decide upon. There simply isn't a syntax that will allow you to declare such a requirement. You either have to trust the calling code to have sorted the data correctly, or not.
So what have you got left?
My advice would be to create a method where the calling code would tell you that the data has been sorted according to some predefined requirement for calling that method. If the caller said "no, I haven't or cannot guarantee that the data is in that sort order", then you will have to sort it yourself.
Other than that you could create your own data structure that would imply the correct type of sorting.
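As one possible reading of "your own data structure", here is a sketch of a wrapper that can only be created by sorting. SortedSequence<T> is a hypothetical type, not something from the framework, and it assumes the elements themselves are not mutated after construction.

```csharp
using System.Collections;
using System.Collections.Generic;

// Hypothetical wrapper: an instance can only come into existence by sorting,
// so a method that takes SortedSequence<T> knows its input is in comparer order.
public sealed class SortedSequence<T> : IEnumerable<T>
{
    private readonly List<T> _items;

    public SortedSequence(IEnumerable<T> source, IComparer<T> comparer = null)
    {
        _items = new List<T>(source);                 // private copy, callers cannot reorder it
        _items.Sort(comparer ?? Comparer<T>.Default); // sorted once, at construction
    }

    public int Count { get { return _items.Count; } }

    public T this[int index] { get { return _items[index]; } }

    public IEnumerator<T> GetEnumerator() { return _items.GetEnumerator(); }

    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}
```

Since the asker's method is called in a loop, the caller would build the SortedSequence<T> once outside the loop and pass the same instance on every call, so the sort cost is paid a single time.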

It is possible to express and enforce such constraints in more powerful type systems but not in the type system of C# or .NET. You could flag the collection in some way, as you suggested, but this will not really make sure that the collection is actually sorted. You could use a boolean flag as you suggested or a special class or interface.
Personally I would not try to enforce it this way but would instead check at runtime that the collection is sorted, which costs O(n) time. If you are iterating over the collection anyway, it is easy to check in every iteration that the current value is larger than the last one and throw an exception if this condition is violated.
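A minimal sketch of checking while iterating. SortedCheck and SumSorted are made-up names, and summing is just a stand-in for whatever the real method does. Note one deliberate difference from the wording above: this version accepts equal neighbours (non-decreasing order) so that duplicates do not throw.

```csharp
using System;
using System.Collections.Generic;

public static class SortedCheck
{
    // Sums the values while verifying, in the same pass, that they are in
    // non-decreasing order. Throws ArgumentException at the first violation.
    public static long SumSorted(IEnumerable<int> values)
    {
        long sum = 0;
        bool first = true;
        int previous = 0;

        foreach (int value in values)
        {
            if (!first && value < previous)
            {
                throw new ArgumentException(
                    "Input must be sorted in ascending order.", "values");
            }

            sum += value;
            previous = value;
            first = false;
        }

        return sum;
    }
}
```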
Another option would be to use a sorting algorithm that runs in O(n) on a sorted list and just sort the collection every time. This will not add too much overhead when the list really is already sorted, but it will still work if it is not. Insertion sort has the required property to run in O(n) on a sorted list. Bubble sort has the property, too, but is really slow in other cases.
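For reference, a plain insertion sort looks roughly like this (InsertionSorter is just an illustrative name). On already sorted input the inner while loop never executes, which is where the O(n) behaviour comes from; on unsorted input it degrades to O(n^2).

```csharp
using System.Collections.Generic;

public static class InsertionSorter
{
    // In-place insertion sort, ascending.
    public static void Sort(IList<int> items)
    {
        for (int i = 1; i < items.Count; i++)
        {
            int current = items[i];
            int j = i - 1;

            // Shift larger elements one slot to the right.
            // For sorted input this condition is false immediately.
            while (j >= 0 && items[j] > current)
            {
                items[j + 1] = items[j];
                j--;
            }

            items[j + 1] = current;
        }
    }
}
```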

'Don't expose generic list', why to use collection<T> instead of list<T> in method parameter

I am using FxCop and it shows a warning, "Don't expose generic list", which suggests using Collection<T> instead of List<T>. I know the reasons why that is preferred, as mentioned in this SO post, on MSDN, and in many more articles I have been through.
But my question is this: I have a few methods that do a lot of heavy calculation, and they accept parameters of type List<T>, which is supposed to be faster and better in terms of performance. But FxCop issues a warning for these as well. So one option is to declare the parameter as Collection<T>, then call ToList() inside the method and work with the result.
So which one is optimized?
"Suppress the warning for this case" OR "use Collection<T> in parameter and then use ToList() inside the method itself".

The code analysis/FxCop rules have been written to support framework creators (Microsoft creates a lot of frameworks). A framework is consumed by external parties and you should be careful when you design the public interface. Provided that you are not writing a framework to be consumed by external parties, you can simply ignore rules that don't provide value to you.
However, one of the reasons that this rule exists is that exposing collections on a class is somewhat difficult. Often the elements in the collection are owned by the containing class and in that case you violate encapsulation if you allow clients to modify the collection used to store the aggregated items. By returning List<T> you allow the clients to modify the collection in many different ways. But often you want to keep track of the items in the collection. E.g. adding a new element might require some additional bookkeeping in the containing class etc. You lose this kind of control when you return a List<T> unless of course you make a copy when you return it (but then the client should understand that they only get a copy of collection and modifications will be ignored).
All in all you can probably improve your class design by avoiding exposing classes like List<T> and being more explicit about how aggregated elements can be added, modified and removed. But if you are in a hurry and just want to crank out some code then using List<T> may be exactly what you need to get the job done.
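To illustrate what "being more explicit" could look like, here is a small sketch. Invoice, AddAmount and Total are hypothetical names; List<T>.AsReadOnly() and ReadOnlyCollection<T> are the real framework pieces.

```csharp
using System.Collections.Generic;
using System.Collections.ObjectModel;

// Hypothetical class that owns its items and keeps a running total.
public class Invoice
{
    private readonly List<decimal> _amounts = new List<decimal>();

    // Bookkeeping that would silently break if clients could call
    // Add/Remove on the underlying List<decimal> directly.
    public decimal Total { get; private set; }

    // Read-only view over the private list; clients can enumerate and
    // index it but cannot add, remove or replace items.
    public ReadOnlyCollection<decimal> Amounts
    {
        get { return _amounts.AsReadOnly(); }
    }

    // The only way to add an item, so the total always stays in sync.
    public void AddAmount(decimal amount)
    {
        _amounts.Add(amount);
        Total += amount;
    }
}
```

The read-only wrapper is a live view, not a copy, so clients always see the current items without being able to bypass AddAmount.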

Don't worry about using generic lists in public properties as long as you are not coding a framework somebody else wants to extend in the near future.
I suggest suppressing the warning. You can refactor your classes later if requirements change.

IMHO your interpretation that "'Don't expose generic list' suggests using a collection instead of a list" is invalid.
The critical difference between a collection and a list is that the elements in a list are ordered. Some methods may require that the passed elements have an order. Then we must use a list as the parameter.
The key to understanding the warning is that you should use the interface IList<T> instead of the concrete class List<T>.
As the method operates on the list, it is not so important what kind of list it is. The key factor is that it is a list.
In conclusion, method parameters should be as abstract as possible.
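A short sketch of a parameter typed as IList<T>. ListParameterDemo and Middle are invented names for illustration; the point is only that the method needs positional access and nothing more.

```csharp
using System;
using System.Collections.Generic;

public static class ListParameterDemo
{
    // Needs positional access (order matters), so it asks for IList<T>
    // rather than the concrete List<T>.
    public static T Middle<T>(IList<T> items)
    {
        if (items.Count == 0)
        {
            throw new ArgumentException("items must not be empty", "items");
        }

        return items[items.Count / 2];
    }
}
```

Both a List<int> and a plain int[] can be passed here, because one-dimensional arrays implement IList<T> as well.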

You should use the type that is most appropriate for your purposes (and suppress the warning if appropriate). If you're passing a bunch of items, and order and uniqueness don't matter, use a collection. If you're passing an ordered collection of items, use a list. If you're passing data such that every item is unique but order doesn't matter, use a set. Use the type that has the semantic meaning appropriate for the exchange. In a few cases where the semantics and the methods that you need don't necessarily align (suppose you need AddRange), make an exception, or use the conversion methods.

Why refactor argument of List<Term> to IEnumerable<Term>?

I have a method that looks like this:
public void UpdateTermInfo(List<Term> termInfoList)
{
    foreach (Term termInfo in termInfoList)
    {
        UpdateTermInfo(termInfo);
    }
    m_xdoc.Save(FileName.FullName);
}
Resharper advises me to change the method signature to IEnumerable<Term> instead of List<Term>. What is the benefit of doing this?

The other answers point out that by choosing a "larger" type you permit a broader set of callers to call you. Which is a good enough reason in itself to make this change. However, there are other reasons. I would recommend that you make this change because when I see a method that takes a list or an array, the first thing I think is "what if that method tries to change an item in my list/array?"
You want the contents of a bucket, but you are requiring not just the bucket but also the ability to change its contents. Why would you require that if you're not going to use that ability? When you say "this method cannot take any old sequence; it has to take a mutable list that is indexed by integers" I think that you're making that requirement on the caller because you're going to take advantage of that power.
If "I'm planning on messing up your data structure" is not what you intend to communicate to the caller of the method then don't communicate that. A method that takes a sequence communicates "The most I'm going to do is read from this sequence in order".

Simply put, accepting an enumerable allows your function to be compatible with a broader scope of input arguments, such as arrays and LINQ queries.
To expound on accepting LINQ queries, one could do:
UpdateTermInfo(myTermList.Where(x => somefilter));
Additionally, specifying an interface rather than a concrete class allows others to provide their own implementation of that interface. In this way, you are being "subscriptive" rather than "proscriptive." (Yes, I did just make up a word.)
In general (with many exceptions relating to what sort of abilities you want to reserve for potential later modifications), it is a best-practice to implement functions using arguments that are the most general that they can be. This gives maximum flexibility to the consumer of your function.
As a result, if you are dead-set on using a list for this function (perhaps because at some later date you expect you might want to use properties such as Count or the index operator), I would strongly urge you to consider using IList<Term> instead of List<Term> for the reasons mentioned above.
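Here is a sketch of the refactored signature and the kinds of callers it admits. The Term class below is a hypothetical stand-in for the asker's type, and the method body is a placeholder that only counts items instead of updating an XML document.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the asker's Term type.
public class Term
{
    public string Name { get; set; }
}

public static class TermUpdater
{
    // Accepts any sequence of Term: List<Term>, Term[], a LINQ query, ...
    // Returns the number of terms processed (placeholder for the real update).
    public static int UpdateTermInfo(IEnumerable<Term> terms)
    {
        int processed = 0;
        foreach (Term term in terms)
        {
            processed++; // the real per-term update would go here
        }
        return processed;
    }

    // Example callers, all valid with the IEnumerable<Term> signature.
    public static int Demo()
    {
        var list = new List<Term> { new Term { Name = "a" }, new Term { Name = "bb" } };
        Term[] array = list.ToArray();

        return UpdateTermInfo(list)
             + UpdateTermInfo(array)
             + UpdateTermInfo(list.Where(t => t.Name.Length > 1));
    }
}
```

With the original List<Term> parameter, the array call and the Where query would each need an extra ToList() at the call site.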

List implements IEnumerable, so using it would make things more flexible. If a case came along where you didn't want to use a List and wanted a different collection type, you could pass it as an IEnumerable with ease.
For instance, IEnumerable allows you to use arrays and many other types, as opposed to always using a List.
IEnumerable is simply a sequence of items you can iterate over with For Each, unlike a List, where you can also add, remove, sort, use Count, etc.

The main idea behind that refactor is that you make the method more general. You don't say what data structure you want, only what you need from it: that you can iterate through its elements.
So later, when you decide that O(n) search is not good enough for you, you only have to change one line and move along.

If you use List then you are confining yourself to one concrete implementation, whereas with IEnumerable you can pass in arrays, Lists, and Collections, as they all implement that interface.

IList<KeyValuePair> vs IDictionary to serve as [DataMember] in WCF

I have a dictionary data structure that must be passed around using WCF. To do that I created a member property with get and set methods. I can basically achieve the same functionality with this property being either a:
IDictionary<keyType, valueType>
or a
IList<KeyValuePair<keyType, valueType>>
I can see no strong reason for choosing one over the other. One mild reason I could think of is:
IDictionary - People reading the code will think that IDictionary makes more sense, since the data structure is a dictionary, but in terms of what is passed through WCF they are all the same.
Can anyone think of a reason to choose IList? If there is none I'll just go with IDictionary.

Design your interfaces based on use, not on implementation.
If the consumer of a class needs to iterate through the entire set, use IEnumerable. If they should be able to modify the result, and need index-based access, return IList. If they want specific items, and there is a single useful key value, return IDictionary.
Write your internal code this way, too :)

It depends on your consumers. I would cater for the most likely use case and make their API as simple as possible. Edge cases can always iterate the dictionary via the Values collection.
Don't make them think about it. If "dictionary" is the term they'd use for the result of the operation, then the type with that name is a very useful thing to use.

If the collection of KeyValuePairs expects unique keys, you can use a dictionary.
If the same key can appear in more than one KeyValuePair, use IList/IEnumerable.
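If you are unsure which case applies to your data, a quick check like the following sketch can tell you. KeyUniquenessDemo and HasDuplicateKeys are hypothetical names; HashSet<T>.Add returning false for an existing element is real framework behaviour.

```csharp
using System.Collections.Generic;

public static class KeyUniquenessDemo
{
    // Returns true if any key occurs more than once in the sequence.
    public static bool HasDuplicateKeys<TKey, TValue>(
        IEnumerable<KeyValuePair<TKey, TValue>> pairs)
    {
        var seen = new HashSet<TKey>();
        foreach (KeyValuePair<TKey, TValue> pair in pairs)
        {
            // Add returns false when the key is already in the set.
            if (!seen.Add(pair.Key))
            {
                return true;
            }
        }
        return false;
    }
}
```

When this returns true, the data cannot be loaded into a Dictionary<TKey, TValue> without losing entries or throwing on Add, so a list of pairs is the only faithful representation.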

ArrayList versus an array of objects versus Collection of T

I have a class Customer (with typical customer properties) and I need to pass around, and databind, a "chunk" of Customer instances. Currently I'm using an array of Customer, but I've also used Collection of T (and List of T before I knew about Collection of T). I'd like the thinnest way to pass this chunk around using C# and .NET 3.5.
Currently, the array of Customer is working just fine for me. It data binds well and seems to be as lightweight as it gets. I don't need the stuff List of T offers and Collection of T still seems like overkill. The array does require that I know ahead of time how many Customers I'm adding to the chunk, but I always know that in advance (given rows in a page, for example).
Am I missing something fundamental or is the array of Customer OK? Is there a tradeoff I'm missing?
Also, I'm assuming that Collection of T makes the old loosely-typed ArrayList obsolete. Am I right there?

Yes, Collection<T> (or List<T> more commonly) makes ArrayList pretty much obsolete. In particular, I believe ArrayList isn't even supported in Silverlight 2.
Arrays are okay in some cases, but should be considered somewhat harmful - they have various disadvantages. (They're at the heart of the implementation of most collections, of course...) I'd go into more details, but Eric Lippert does it so much better than I ever could in the article referenced by the link. I would summarise it here, but that's quite hard to do. It really is worth just reading the whole post.

No one has mentioned the Framework Guidelines advice: Don't use List<T> in public API's:
We don’t recommend using List<T> in public APIs for two reasons.
List<T> is not designed to be extended. i.e. you cannot override any members. This for example means that an object returning List<T> from a property won’t be able to get notified when the collection is modified. Collection<T> lets you overrides SetItem protected member to get “notified” when a new items is added or an existing item is changed.
List<T> has lots of members that are not relevant in many scenarios. We say that List<T> is too “busy” for public object models. Imagine ListView.Items property returning List<T> with all its richness. Now, look at the actual ListView.Items return type; it’s way simpler and similar to Collection<T> or ReadOnlyCollection<T>
Also, if your goal is two-way Databinding, have a look at BindingList<T> (with the caveat that it is not sortable 'out of the box'!)
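To show what the guideline means by getting "notified", here is a minimal sketch of a Collection<T> subclass. TrackedCollection<T> and ChangeCount are made-up names; InsertItem and SetItem are the real protected virtual members of Collection<T>.

```csharp
using System.Collections.ObjectModel;

// Hypothetical subclass that counts additions and replacements.
public class TrackedCollection<T> : Collection<T>
{
    public int ChangeCount { get; private set; }

    // Called by Add and Insert.
    protected override void InsertItem(int index, T item)
    {
        base.InsertItem(index, item);
        ChangeCount++;
    }

    // Called by the indexer setter, e.g. collection[0] = value.
    protected override void SetItem(int index, T item)
    {
        base.SetItem(index, item);
        ChangeCount++;
    }
}
```

RemoveItem and ClearItems can be overridden the same way; they are left out here to keep the sketch short. Nothing comparable is possible with List<T>, whose members are not virtual.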

Generally, you should 'pass around' IEnumerable<T> or ICollection<T> (depending on whether it makes sense for your consumer to add items).

If you have an immutable list of customers, that is... your list of customers will not change, it's relatively small, and you will always iterate over it first to last and you don't need to add to the list or remove from it, then an array is probably just fine.
If you're unsure, however, then your best bet is a collection of some type. What collection you choose depends on the operations you wish to perform on it. Collections are all about inserts, manipulations, lookups, and deletes. If you do frequent searches for a given element, then a dictionary may be best. If you need to sort the data, then perhaps a SortedList will work better.
I wouldn't worry about "lightweight", unless you're talking a massive number of elements, and even then the advantages of O(1) lookups outweigh the costs of resources.
When you "pass around" a collection, you're only passing a reference, which is basically a pointer. So there is no performance difference between passing a collection and an array.

I'm going to put in a dissenting argument to both Jon and Eric Lippert (which means that you should be very wary of my answer, indeed!).
The heart of Eric Lippert's arguments against arrays is that the contents are mutable, while the data structure itself is not. With regards to returning them from methods, the contents of a List are just as mutable. In fact, because you can add or subtract elements from a List, I would argue that this makes the return value more mutable than an array.
The other reason I'm fond of Arrays is because sometime back I had a small section of performance critical code, so I benchmarked the performance characteristics of the two, and arrays blew Lists out of the water. Now, let me caveat this by saying it was a narrow test for how I was going to use them in a specific situation, and it goes against what I understand of both, but the numbers were wildly different.
Anyway, listen to Jon and Eric =), and I agree that List almost always makes more sense.

I agree with Alun, with one addition. If you may want to address the return value by subscript myArray[n], then use an IList.
An Array inherently supports IList (as well as IEnumerable and ICollection, for that matter). So if you pass by interface, you can still use the array as your underlying data structure. In this way, the methods that you are passing the array into don't have to "know" that the underlying datastructure is an array:
public void Test()
{
    IList<Item> test = MyMethod();
}
public IList<Item> MyMethod()
{
    Item[] items = new Item[] { new Item() };
    return items;
}
