I have been asked to revise code written some time ago for a Windows Forms application. The programmer used ArrayList heavily. I think generic lists are far more efficient than ArrayList, and I plan to rewrite the code using List<T>. I wanted to know if there are any other alternatives that might also be worth considering. I work on .NET 2.0.
If you're working in .NET 2, then you won't have any of the concurrent collections in .NET 4 available to you, which pretty much just leaves List<T> in terms of "collections which are a bit like ArrayList". (Even within the concurrent collections, there isn't an immediate equivalent - and you should only use the concurrent collections when you actually anticipate concurrent access anyway.)
There are Stack<T> and Queue<T>, as well as LinkedList<T> - but all of those are somewhat different to ArrayList in terms of what you can do with them. They're worth considering if you don't need random access, of course.
I wouldn't expect too much more in terms of efficiency unless you're currently boxing a lot of large value types in your ArrayList. What you can expect is far clearer code. Fewer casts, less uncertainty about the contents of the collection, etc.
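To make the "fewer casts" point concrete, here is a minimal sketch that sums the same three numbers both ways. The class and method names are made up for illustration; only ArrayList and List<T> come from the framework.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class ArrayListVersusList
{
    // ArrayList stores object references, so each int is boxed on Add
    // and has to be cast (unboxed) when it is read back.
    public static int SumWithArrayList()
    {
        ArrayList numbers = new ArrayList();
        numbers.Add(1);
        numbers.Add(2);
        numbers.Add(3);

        int total = 0;
        foreach (object item in numbers)
        {
            total += (int)item; // explicit cast required
        }
        return total;
    }

    // List<int> stores the ints directly: no boxing and no casts.
    public static int SumWithGenericList()
    {
        List<int> numbers = new List<int>();
        numbers.Add(1);
        numbers.Add(2);
        numbers.Add(3);

        int total = 0;
        foreach (int item in numbers)
        {
            total += item;
        }
        return total;
    }

    public static void Main()
    {
        Console.WriteLine(SumWithArrayList());   // 6
        Console.WriteLine(SumWithGenericList()); // 6
    }
}
```

Both versions produce the same result; the difference is in what the compiler can check and in the boxing that happens behind the scenes.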
If you have the option of upgrading to .NET 3.5 at any point in the near future, that would then give you access to LINQ, which is fabulously useful when dealing with collections. Relatively few new collection types, but much simpler ways of expressing operations on them.
Update:
For adding to or removing from the head or tail, LinkedList<T> is the better choice. However, if you can determine the exact maximum capacity of the collection, and its size will stay close to that capacity, then Queue<T> may be better (internally it is an array that is reallocated when the size reaches capacity). With Queue<T> you avoid the memory overhead that comes with LinkedList<T> nodes.
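A rough sketch of the two options, in case it helps. The capacity of 4 and the class and method names are assumptions for the example; everything else is standard Queue<T> and LinkedList<T> API available in .NET 2.0.

```csharp
using System;
using System.Collections.Generic;

public static class HeadTailDemo
{
    // Queue<T> is array-backed: add at the tail, remove from the head.
    // Passing the expected maximum size up front avoids reallocation.
    public static int[] DrainQueue()
    {
        Queue<int> queue = new Queue<int>(4); // capacity 4 is an assumed maximum
        queue.Enqueue(1);
        queue.Enqueue(2);
        queue.Enqueue(3);

        List<int> drained = new List<int>();
        while (queue.Count > 0)
        {
            drained.Add(queue.Dequeue());
        }
        return drained.ToArray();
    }

    // LinkedList<T> allows adds and removes at both ends,
    // at the cost of one node object per element.
    public static int[] BothEnds()
    {
        LinkedList<int> list = new LinkedList<int>();
        list.AddLast(2);
        list.AddLast(3);
        list.AddFirst(1);   // 1, 2, 3
        list.RemoveLast();  // 1, 2
        list.AddLast(4);    // 1, 2, 4
        list.RemoveFirst(); // 2, 4

        int[] result = new int[list.Count];
        list.CopyTo(result, 0);
        return result;
    }

    public static void Main()
    {
        foreach (int value in DrainQueue()) Console.WriteLine(value); // 1, 2, 3
        foreach (int value in BothEnds()) Console.WriteLine(value);   // 2, 4
    }
}
```

Note that Queue<T> only covers the "add at tail, remove from head" pattern; if you need both ends, LinkedList<T> is the one that supports it directly.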
Original:
From MSDN: The List<T> class is the generic equivalent of the ArrayList class.
Please, read carefully List<T> Performance Considerations section.
Which one you should use depends on how the ArrayList is used. Is it random access, or adding to/removing from the head/tail?
Try SortedList and Collection.
Both are supported by .NET Framework 2.0.
ConcurrentDictionary offers constant read time. I do not need key-value pairs, only keys. I searched for read times on ConcurrentBag but haven't found how it is implemented.
Is there a constant read time ConcurrentCollection, besides ConcurrentDictionary?
ConcurrentBag is probably not what you are looking for:
Represents a thread-safe, unordered collection of objects.
Which means that it allows duplicates (whereas the dictionary doesn't)
Bags are useful for storing objects when ordering doesn't matter, and unlike sets, bags support duplicates.
As for performance, it certainly isn't as performant as a list (so lookups are at least O(n)); see "C# - Performance comparison of ConcurrentBag vs List".
For a ConcurrentSet check your luck with the custom implementation here: How to implement ConcurrentHashSet in .Net
You can also check the list of Concurrent collections to see if something else suits your needs.
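If you only need keys, one common workaround is to wrap a ConcurrentDictionary and ignore the values. This is only a sketch, not the implementation from the linked question: ConcurrentKeySet<T> is a made-up name and the byte value is a throwaway placeholder.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical keys-only wrapper around ConcurrentDictionary.
public sealed class ConcurrentKeySet<T>
{
    // The byte value is never read; only the keys matter.
    private readonly ConcurrentDictionary<T, byte> _items =
        new ConcurrentDictionary<T, byte>();

    public int Count
    {
        get { return _items.Count; }
    }

    // Returns false if the item was already present.
    public bool Add(T item)
    {
        return _items.TryAdd(item, 0);
    }

    // Hash-based lookup, the same read path as the dictionary itself.
    public bool Contains(T item)
    {
        return _items.ContainsKey(item);
    }

    public bool Remove(T item)
    {
        byte ignored;
        return _items.TryRemove(item, out ignored);
    }
}

public static class ConcurrentKeySetDemo
{
    public static void Main()
    {
        ConcurrentKeySet<string> set = new ConcurrentKeySet<string>();
        Console.WriteLine(set.Add("a"));      // True
        Console.WriteLine(set.Add("a"));      // False, duplicates are rejected
        Console.WriteLine(set.Contains("a")); // True
        Console.WriteLine(set.Count);         // 1
    }
}
```

Unlike ConcurrentBag, this rejects duplicates, which matches the "only keys" requirement.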
When using List.AddRange(), is there any difference in performance between adding a List or an Array?
MyList.AddRange(MyArrayof1000ComplexElements);
VS
MyList.AddRange(MyListof1000ComplexElements);
or is there no difference?
Since an array and a list both implement ICollection<T>, it uses the same code.
It resolves to a call to Array.Copy(...)
http://referencesource.microsoft.com/#mscorlib/system/collections/generic/list.cs#e569d850a66a1771#references
There is no difference between List<T> and T[] - AddRange uses the same handling for anything implementing ICollection<T>, which both of those do.
Both Array and List implement the ICollection<T> interface. Therefore, the implementation of List.AddRange that is used will be identical and will offer the same performance.
In the future, you can either test something like this yourself with a simple program using the Stopwatch class for timing, or download a tool like JetBrains' dotPeek to inspect the framework code yourself.
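A simple Stopwatch program for this could look roughly like the sketch below. The repeat count and the helper name TimeAddRange are arbitrary choices for the example, and the timings it prints will vary from machine to machine, so treat it as a starting point rather than a proper benchmark.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class AddRangeTiming
{
    // Appends 'source' to a fresh list 'repeats' times and returns the
    // elapsed milliseconds. 'finalCount' reports the size of the last list.
    public static long TimeAddRange(IEnumerable<int> source, int repeats, out int finalCount)
    {
        finalCount = 0;
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < repeats; i++)
        {
            List<int> target = new List<int>();
            target.AddRange(source);
            finalCount = target.Count;
        }
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }

    public static void Main()
    {
        int[] array = new int[1000];
        List<int> list = new List<int>(array);
        int count;

        Console.WriteLine("array: {0} ms", TimeAddRange(array, 100000, out count));
        Console.WriteLine("list:  {0} ms", TimeAddRange(list, 100000, out count));
    }
}
```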
This is a more interesting question than some of the comments might suggest.
As it happens, for this specific list/array implementation the answer is: no difference. Both rely on the same collection interface.
But it doesn't have to be that way. If a list is implemented as a doubly-linked list (which it is in many other cases) then appending one list to another is O(1) while appending an array to a list is O(n).
And I would not start by benchmarking to resolve this question. Benchmarking is time-consuming to do well and can easily produce results susceptible to misinterpretation. In this case a careful study of the implementation and the underlying source code (easily available through a .NET disassembler) will answer the question faster. Then benchmark to confirm, if it matters enough.
Please note that the specific O(1) optimisation that applies here is only available if MyListof1000ComplexElements is a List too. If it is some kind of enumerator or linked list then the performance will be O(n).
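To see which sources can take the shared ICollection<T> path mentioned in the other answers, you can check the interface directly. This is just an illustrative sketch with made-up names; it shows which inputs expose ICollection<T>, not what AddRange does internally.

```csharp
using System;
using System.Collections.Generic;

public static class AddRangeSources
{
    // A lazy sequence: it implements IEnumerable<int> but not ICollection<int>,
    // so its length is not known until it has been enumerated.
    public static IEnumerable<int> Lazy(int count)
    {
        for (int i = 0; i < count; i++)
        {
            yield return i;
        }
    }

    // True when the source exposes Count and CopyTo via ICollection<int>.
    public static bool HasKnownCount(IEnumerable<int> source)
    {
        return source is ICollection<int>;
    }

    public static void Main()
    {
        Console.WriteLine(HasKnownCount(new int[3]));      // True
        Console.WriteLine(HasKnownCount(new List<int>())); // True
        Console.WriteLine(HasKnownCount(Lazy(3)));         // False

        // AddRange still works with the lazy source; it just has to enumerate it.
        List<int> target = new List<int>();
        target.AddRange(Lazy(3));
        Console.WriteLine(target.Count);                   // 3
    }
}
```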
In response to those who have criticised this answer, please note that it has been written with the intention of highlighting that the other answers given are based on a specific interpretation of the question. They fail to point out how narrowly they have interpreted the question and how narrowly their answers apply. Another reader might easily miss the fact that this answer only applies to this specific circumstance if they don't say so. My aim is simply to point out that in many other closely related situations, this is an O(n) operation rather than O(1).
What is the difference between ImmutableArray<T> and ImmutableList<T>, and where would it be best to use each?
Here is some reading that might help explain: Please welcome ImmutableArray
Here's an excerpt:
Reasons to use immutable array:
Updating the data is rare or the number of elements is quite small (<16)
you need to be able to iterate over the data in performance critical sections
you have many instances of immutable collections and you can’t afford keeping the data in trees
Reasons to stick with immutable list:
Updating the data is common or the number of elements isn’t expected to be small
Updating the collection is more performance critical than iterating the contents
I think you are asking where to use each of them. The blog post "Please welcome ImmutableArray" will help. To summarize, use immutable array when:
Updating the data is rare or the number of elements is quite small (<16)
You need to be able to iterate over the data in performance critical sections
You have many instances of immutable collections and you can’t afford keeping the data in trees
Use immutable list when:
Updating the data is common or the number of elements isn't expected to be small
Updating the collection is more performance critical than iterating the contents
The main difference is that an ImmutableArray is just a wrapper around a normal array, which means that retrieving elements within the array is extremely quick.
An ImmutableArray is also a struct, meaning that it's a value type, so it takes up less space than an ImmutableList, which is a reference type.
For these reasons, an ImmutableArray is suitable to use if you rarely need to update its data, or the number of elements is small (less than 16), or if the performance of iterating over the collection is important.
If your needs do not match the reasons above, you should use an ImmutableList.
Look here for the documentation.
Going by the blog post "Please welcome ImmutableArray<T>" (the same blog post everyone else is referencing), the differences are...
ImmutableArray:
Simple implementation (just an array).
Not optimized for quickly updating large collections (the entire array needs to be copied).
More efficient for most use cases (anything that does not involve updating large collections).
ImmutableList:
More complicated implementation (something involving trees).
Is optimized for quickly updating large collections (it can be done in logarithmic time).
Less efficient for most use cases.
So if you're not updating much, or your collections are small, use ImmutableArray. If you're going to be frequently updating large collections, you'll need to use ImmutableList.
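For what it's worth, both types are used in much the same way; the difference is in what an "update" costs. A small sketch, assuming the System.Collections.Immutable package is referenced (the class and method names are made up):

```csharp
using System;
using System.Collections.Immutable;

public static class ImmutableDemo
{
    // Returns the sizes of the original and "updated" collections.
    public static int[] Counts()
    {
        // Array-backed: Add produces a new array containing a full copy.
        ImmutableArray<int> array = ImmutableArray.Create(1, 2, 3);
        ImmutableArray<int> biggerArray = array.Add(4);

        // Tree-backed: Add produces a new list without copying every element.
        ImmutableList<int> list = ImmutableList.Create(1, 2, 3);
        ImmutableList<int> biggerList = list.Add(4);

        // The originals are never modified.
        return new int[] { array.Length, biggerArray.Length, list.Count, biggerList.Count };
    }

    public static void Main()
    {
        foreach (int count in Counts())
        {
            Console.WriteLine(count); // 3, 4, 3, 4
        }
    }
}
```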
For every situation that warrants the use of an array ... there is an awesome collection with benefits. Is there any specific use case for Arrays any more in .NET?
Sending/receiving data with a specific length comes to mind, e.g. Serial Port, Web Request, FTP Request. Basically stuff that works on a lower level in the system. Also, most collections use an array for storage (notable exception: LinkedList<T>). Collections are just another abstraction layer.
Arrays are useful because they are always linear in memory and are fast to work with. For example I can take a byte[] and marshal directly into a structure without any problems but a List<T> would have to be converted to an array first as far as I know.
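As an illustration of the byte[] case, something along these lines works. PacketHeader and its fields are invented for the example, and the layout (packed, native byte order) is an assumption; a real protocol would dictate its own.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical 6-byte header; Pack = 1 removes padding between fields.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct PacketHeader
{
    public ushort Id;
    public uint Length;
}

public static class ByteArrayMarshalling
{
    public static PacketHeader ReadHeader(byte[] buffer)
    {
        if (buffer.Length < Marshal.SizeOf(typeof(PacketHeader)))
        {
            throw new ArgumentException("Buffer is too small for a header.", "buffer");
        }

        // Pin the array so the GC cannot move it while we read from its address.
        GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            return (PacketHeader)Marshal.PtrToStructure(
                handle.AddrOfPinnedObject(), typeof(PacketHeader));
        }
        finally
        {
            handle.Free();
        }
    }

    public static void Main()
    {
        // Build the bytes with BitConverter so they match the machine's byte order.
        byte[] buffer = new byte[6];
        BitConverter.GetBytes((ushort)7).CopyTo(buffer, 0);
        BitConverter.GetBytes((uint)1024).CopyTo(buffer, 2);

        PacketHeader header = ReadHeader(buffer);
        Console.WriteLine("{0} {1}", header.Id, header.Length); // 7 1024
    }
}
```

With a List<byte> you would first need ToArray() to get something that can be pinned like this.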
No, they still have their uses and should always be considered.
Remember arrays are very basic representations of a fixed length so they are very fast and most languages understand them depending on the type used in the array.
You need to define an array size at the time that it is created and cannot change its size later. Lists and other things can grow as needed which adds overhead with respect to memory allocation.
Lists and other types are useful because they can do a lot but sometimes you don't need all that extra overhead so an array is all you need.
It's like driving a 4x4 because you think one day you might need to go off-roading, even though there is a 99.9% chance you will be on normal roads. An array would be the basic car and a List, for example, would be a 4x4: it does everything a car can do (and under the hood might use most of the same parts) but at the expense of gas, cost, not fitting in certain parking stalls, etc.
Arrays = performance and compatibility
Lists (or other representations) = ease of use at a cost of performance and compatibility
Yes, there certainly still is a use for arrays. Some methods still need arrays.
For example:
string[] items = "a,b;c:d".Split(new char[]{',',';',':'});
It's still the simplest way to keep a bunch of items, and the number one choice until you need some specific feature, like for example dynamic growth.
Have arrays lost (some) significance?
Yes. For many tasks requiring a 'table' of items there now are more flexible and useful solutions like List<> and IEnumerable<>.
Have arrays lost their importance?
No. They are the fastest form of storage and they are used 'under the hood' in most of the collection classes, System.String etc.
So, arrays have become more low-level, and an application programmer will be using them directly less often.
Quite apart from everything else, many of those great collection classes you refer to are implemented using arrays. You might not be using them explicitly, but you're using loads of them and your program is better for it. That means that arrays must be in the language (or that the collections are implemented directly using lots of native code, which would be suckier).
Yes? Anytime I have a type which internally maintains a fixed-size collection of items, I use an array as it's the fastest to iterate and requires the least memory. No sense using a List<T>, Queue<T>, etc. if you don't need those features.
Two cases where I work daily with arrays:
Image analysis. Images are almost always byte[] or int[] arrays.
External hardware communication. Most devices require perfectly structured arrays to send/receive messages.
It's a fair question, but the answer is definitely that they're still useful. Speed is one reason, simplicity for fixed sizes is another. But I think the most important one is flexibility. It gives you a nice base to design your own collection, backed by a simple array, if you ever needed it.
No, arrays will not lose their importance.
1- When you know the number of items in advance, you can use an array, which gives you very fast access.
2- In graph theory, when you store information about the links between vertices, you can implement it using arrays, which are faster than a linked-list implementation.
3- Some methods, like string.Split, return an array.
You can use this wonderful static placeholder of items in a variety of computing problems.
I use arrays to manipulate images, as in the WriteableBitmap class.
One thing I must tell you: arrays are the building blocks of any programming language. If you want to declare storage holding more than one element, arrays are the basic option for you.
Say for instance a List.
If you see the definition of List, it actually holds
T[] items
Just use Reflector and look at the definition of List; you will be surprised to find that List is actually an array. In .NET, most of the collections other than LinkedList are basically array implementations. They use arrays because of their fast storage and retrieval.
I agree that an array has limitations when it comes to update or remove; if your main emphasis is storage rather than speed, you might go for a linked list.
What do you think the backing field behind many of those fancy collections is?
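To show what "backed by an array" means in practice, here is a stripped-down sketch of an array-backed list. It is not the framework's List<T> code: SimpleList<T>, the initial capacity of 4 and the doubling growth are assumptions made for the example.

```csharp
using System;

// Hypothetical minimal list that stores its elements in a plain T[].
public sealed class SimpleList<T>
{
    private T[] _items = new T[4]; // assumed initial capacity
    private int _count;

    public int Count
    {
        get { return _count; }
    }

    public int Capacity
    {
        get { return _items.Length; }
    }

    public T this[int index]
    {
        get
        {
            if (index < 0 || index >= _count)
            {
                throw new ArgumentOutOfRangeException("index");
            }
            return _items[index];
        }
    }

    public void Add(T item)
    {
        if (_count == _items.Length)
        {
            // The array is full: allocate a larger one and copy the elements over.
            T[] larger = new T[_items.Length * 2];
            Array.Copy(_items, larger, _count);
            _items = larger;
        }
        _items[_count++] = item;
    }
}

public static class SimpleListDemo
{
    public static void Main()
    {
        SimpleList<int> list = new SimpleList<int>();
        for (int i = 0; i < 5; i++)
        {
            list.Add(i * 10);
        }
        Console.WriteLine(list.Count);    // 5
        Console.WriteLine(list.Capacity); // 8
        Console.WriteLine(list[4]);       // 40
    }
}
```

The growth step is the overhead the other answers mention: a fixed-size array never pays for it.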
I have a number of data classes representing various entities.
Which is better: writing a generic class (say, to print or output XML) using generics and interfaces, or writing a separate class to deal with each data class?
Is there a performance benefit or any other benefit (other than it saving me the time of writing separate classes)?
There's a significant performance benefit to using generics -- you do away with boxing and unboxing. Compared with developing your own classes, it's a coin toss (with one side of the coin weighted more than the other). Roll your own only if you think you can out-perform the authors of the framework.
Not only yes, but HECK YES. I didn't believe how big of a difference they could make. We did testing in VistaDB after rewriting a small percentage of core code that used ArrayLists and Hashtables to use generics instead. The speed improvement was 250% or more.
Read my blog about the testing we did on generics vs weak type collections. The results blew our mind.
I have started rewriting lots of old code that used the weakly typed collections into strongly typed ones. One of my biggest gripes with the ADO.NET interface is that they don't expose more strongly typed ways of getting data in and out. The casting time from an object and back is an absolute killer in high volume applications.
Another side effect of strongly typing is that you often will find weakly typed reference problems in your code. We found that through implementing structs in some cases to avoid putting pressure on the GC we could further speed up our code. Combine this with strongly typing for your best speed increase.
Sometimes you have to use weakly typed interfaces within the dot net runtime. Whenever possible though look for ways to stay strongly typed. It really does make a huge difference in performance for non trivial applications.
Generics in C# are truly generic types from the CLR perspective. There should not be any fundamental difference between the performance of a generic class and a specific class that does the exact same thing. This is different from Java Generics, which are more of an automated type cast where needed or C++ templates that expand at compile time.
Here's a good paper, somewhat old, that explains the basic design:
"Design and Implementation of Generics for the .NET Common Language Runtime".
If you hand-write classes for specific tasks chances are you can optimize some aspects where you would need additional detours through an interface of a generic type.
In summary, there may be a performance benefit but I would recommend the generic solution first, then optimize if needed. This is especially true if you expect to instantiate the generic with many different types.
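As a sketch of what "the generic solution first" might look like for the XML output case in the question: IXmlExportable, Customer and XmlExporter<T> are all invented names, and the XML shape is just an assumption for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Security;
using System.Text;

// Hypothetical contract that each data class implements.
public interface IXmlExportable
{
    string ElementName { get; }
    string Value { get; }
}

// Hypothetical data class.
public sealed class Customer : IXmlExportable
{
    private readonly string _name;

    public Customer(string name)
    {
        _name = name;
    }

    public string ElementName
    {
        get { return "customer"; }
    }

    public string Value
    {
        get { return _name; }
    }
}

// One exporter serves every type that implements the interface.
public static class XmlExporter<T> where T : IXmlExportable
{
    public static string Export(IEnumerable<T> items)
    {
        StringBuilder xml = new StringBuilder("<items>");
        foreach (T item in items)
        {
            xml.Append('<').Append(item.ElementName).Append('>');
            xml.Append(SecurityElement.Escape(item.Value)); // escape &, <, > and quotes
            xml.Append("</").Append(item.ElementName).Append('>');
        }
        return xml.Append("</items>").ToString();
    }
}

public static class XmlExporterDemo
{
    public static void Main()
    {
        Customer[] customers = { new Customer("Ann"), new Customer("B&B") };
        Console.WriteLine(XmlExporter<Customer>.Export(customers));
        // <items><customer>Ann</customer><customer>B&amp;B</customer></items>
    }
}
```

If one data class later turns out to need special handling, you can still write a dedicated exporter for just that type.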
I did some simple benchmarking on ArrayLists vs. generic Lists for a different question: Generics vs. Array Lists. Your mileage will vary, but the generic List was 4.7 times faster than the ArrayList.
So yes, boxing / unboxing are critical if you are doing a lot of operations. If you are doing simple CRUD stuff, I wouldn't worry about it.
Generics are one of the ways to parameterize code and avoid repetition. Looking at your program description and your thought of writing a separate class to deal with each and every data object, I would lean to generics. Having a single class taking care of many data objects, instead of many classes that do the same thing, increases your performance. And of course your performance, measured in the ability to change your code, is usually more important than the computer performance. :-)
According to Microsoft, Generics are faster than casting (boxing/unboxing primitives) which is true.
They also claim generics provide better performance than casting between reference types, which seems to be untrue (no one can quite prove it).
Tony Northrup - co-author of MCTS 70-536: Application Development Foundation - states in the same book the following:
I haven’t been able to reproduce the performance benefits of generics; however, according to Microsoft, generics are faster than using casting. In practice, casting proved to be several times faster than using a generic. However, you probably won’t notice performance differences in your applications. (My tests over 100,000 iterations took only a few seconds.) So you should still use generics because they are type-safe.
I haven't been able to reproduce such performance benefits with generics compared to casting between reference types - so I'd say the performance gain is "supposed" more than "significant".
If you compare a generic list (for example) to a specific list for exactly the type you use, then the difference is minimal; the results from the JIT compiler are almost the same.
If you compare a generic list to a list of objects, then there are significant benefits to the generic list: no boxing/unboxing for value types and no type checks for reference types.
Also, the generic collection classes in the .NET library were heavily optimized and you are unlikely to do better yourself.
In the case of generic collections vs. boxing et al, with older collections like ArrayList, generics are a performance win. But in the vast majority of cases this is not the most important benefit of generics. I think there are two things that are of much greater benefit:
Type safety.
Self documenting aka more readable.
Generics promote type safety, forcing a more homogeneous collection. Imagine stumbling across a string when you expect an int. Ouch.
Generic collections are also more self documenting. Consider the two collections below:
ArrayList listOfNames = new ArrayList();
List<NameType> listOfNames = new List<NameType>();
Reading the first line you might think listOfNames is a list of strings. Wrong! It is actually storing objects of type NameType. The second example not only enforces that the type must be NameType (or a descendant), but the code is more readable. I know right away that I need to go find NameType and learn how to use it just by looking at the code.
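The "string when you expect an int" case is easy to reproduce. In this sketch (the class and method names are made up) the ArrayList version compiles fine and only fails when it runs, while the equivalent mistake with List<int> would be rejected by the compiler.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class TypeSafetyDemo
{
    // Returns true if reading the ArrayList blew up at runtime.
    public static bool ArrayListFailsAtRuntime()
    {
        ArrayList ages = new ArrayList();
        ages.Add(42);
        ages.Add("forty-two"); // compiles: ArrayList accepts any object

        try
        {
            foreach (object item in ages)
            {
                int age = (int)item; // throws when it reaches the string
                Console.WriteLine(age);
            }
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }

    public static int GenericListCount()
    {
        List<int> ages = new List<int>();
        ages.Add(42);
        // ages.Add("forty-two"); // would not compile: a string is not an int
        return ages.Count;
    }

    public static void Main()
    {
        Console.WriteLine(ArrayListFailsAtRuntime()); // prints 42, then True
        Console.WriteLine(GenericListCount());        // 1
    }
}
```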
I have seen a lot of these "does x perform better than y" questions on StackOverflow. The question here was very fair, and as it turns out generics are a win any way you skin the cat. But at the end of the day the point is to provide the user with something useful. Sure your application needs to be able to perform, but it also needs to not crash, and you need to be able to quickly respond to bugs and feature requests. I think you can see how these last two points tie in with the type safety and code readability of generic collections. If it were the opposite case, if ArrayList outperformed List<>, I would probably still take the List<> implementation unless the performance difference was significant.
As far as performance goes (in general), I would be willing to bet that you will find the majority of your performance bottlenecks in these areas over the course of your career:
Poor design of database or database queries (including indexing, etc),
Poor memory management (forgetting to call dispose, deep stacks, holding onto objects too long, etc),
Improper thread management (too many threads, not calling IO on a background thread in desktop apps, etc),
Poor IO design.
None of these are fixed with single-line solutions. We as programmers, engineers and geeks want to know all the cool little performance tricks. But it is important that we keep our eyes on the ball. I believe focusing on good design and programming practices in the four areas I mentioned above will further that cause far more than worrying about small performance gains.
Generics are faster!
I also discovered that Tony Northrup made incorrect claims in his book about the performance of generics and non-generics.
I wrote about this on my blog:
http://andriybuday.blogspot.com/2010/01/generics-performance-vs-non-generics.html
Here is a great article where the author compares the performance of generics and non-generics:
nayyeri.net/use-generics-to-improve-performance
If you're thinking of a generic class that calls methods on some interface to do its work, that will be slower than specific classes using known types, because calling an interface method is slower than a (non-virtual) function call.
Of course, unless the code is the slow part of a performance-critical process, you should focus on clarity.
See Rico Mariani's Blog at MSDN too:
http://blogs.msdn.com/ricom/archive/2005/08/26/456879.aspx
Q1: Which is faster?
The Generics version is considerably faster, see below.
The article is a little old, but gives the details.
Not only can you do away with boxing but the generic implementations are somewhat faster than the non generic counterparts with reference types due to a change in the underlying implementation.
The originals were designed with a particular extension model in mind. This model was never really used (and would have been a bad idea anyway) but the design decision forced a couple of methods to be virtual and thus uninlineable (based on the current and past JIT optimisations in this regard).
This decision was rectified in the newer classes but cannot be altered in the older ones without it being a potential binary breaking change.
In addition, iteration via foreach on a List<> (rather than IList<>) is faster due to the ArrayList's Enumerator requiring a heap allocation. Admittedly this did lead to an obscure bug
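You can see the enumerator difference for yourself with a couple of type checks. A small sketch (the names are invented): List<T>.GetEnumerator() returns a struct, whereas going through the interface hands back an IEnumerator<T> reference.

```csharp
using System;
using System.Collections.Generic;

public static class EnumeratorDemo
{
    // List<T>.Enumerator is a value type, so foreach over a variable
    // typed as List<T> can iterate without allocating an enumerator object.
    public static bool ListEnumeratorIsStruct()
    {
        return typeof(List<int>.Enumerator).IsValueType;
    }

    // The interface type is a reference type, so iterating through
    // IList<T> or IEnumerable<T> works with a heap reference instead.
    public static bool InterfaceEnumeratorIsStruct()
    {
        return typeof(IEnumerator<int>).IsValueType;
    }

    // Both ways of iterating visit the same elements.
    public static int SumBothWays()
    {
        List<int> list = new List<int>(new int[] { 1, 2, 3 });
        IList<int> viaInterface = list;

        int total = 0;
        foreach (int value in list) total += value;         // struct enumerator
        foreach (int value in viaInterface) total += value; // interface enumerator
        return total;
    }

    public static void Main()
    {
        Console.WriteLine(ListEnumeratorIsStruct());      // True
        Console.WriteLine(InterfaceEnumeratorIsStruct()); // False
        Console.WriteLine(SumBothWays());                 // 12
    }
}
```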