Why is array co-variance considered so horrible? - c#

In .NET, reference type arrays are co-variant. This is considered a mistake. However, I don't see why it's so bad; consider the following code:
string[] strings = new []{"Hey there"};
object[] objects = strings;
objects[0] = new object();
Oh ho, this compiles and will fail at runtime, as we tried to stick an object into a string[]. Okay, I agree that stinks, but a T[] extends Array and also implements IList (and IList<T>; I wonder if it implements IList<BaseType>...). Both Array and IList allow us to make the same horrid mistake.
string[] strings = new []{"Hey there"};
Array objects = strings;
objects.SetValue(new object(), new[] { 0 });
The IList version:
string[] strings = new []{"Hey there"};
IList objects = strings;
objects[0] = new object();
The T[] classes are generated by the CLR, and have to include a type check on the equivalent of the set_Item method (arrays don't actually have one).
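For concreteness, here is that check firing; it surfaces as an ArrayTypeMismatchException (a minimal sketch):
string[] strings = { "Hey there" };
object[] objects = strings;    // legal: array covariance
try
{
    objects[0] = new object(); // compiles, but the CLR checks the element type on the store
}
catch (ArrayTypeMismatchException)
{
    // the write is rejected at runtime; strings[0] is untouched
}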
Is the concern that setting to a T[] has to do the type check at runtime (which violates the type-safety you expect at compile time)? Why is it considered harmful for arrays to exhibit this property when there are equivalent means to shoot yourself in the foot through the provided means above?

In .NET, reference type arrays are co-variant. This is considered a mistake.
Type-safety breaking array covariance is considered by some people to be a mistake in the design of .NET. It is not so considered by all people. I do not consider it to be a mistake; I consider it to be an unfortunate choice. All design processes involve choices between undesirable alternatives. In this case, the choice was between adding an unsafe implicit conversion that imposes a run-time cost on all array writes, or building a type system that could not easily implement the Java type system. That's a tough choice and the designers of the type system made the best choice they could with the information they had.
That explanation of course is mere question begging; isn't it then simply the case that the designers of Java made a mistake? Possibly yes, possibly no; likely the designers of Java also faced tradeoffs in the design of their type system. Any experts on the history of the development of the Java type system who would like to chime in here on what those tradeoffs were, I'd be interested to know.
I, with the benefit of ten years of hindsight, personally would have preferred it if the designers of the .NET type system had chosen to eschew safety-breaking array covariance. But that doesn't make that choice a "mistake", it just makes it somewhat unfortunate.
Is the concern that setting to a T[] has to do the type check at runtime (which violates the type-safety you expect at compile time)?
Yes. It means that code that looks like it ought to always run successfully can fail at runtime. And it means that correct code has a performance penalty imposed upon it.
Why is it considered harmful for arrays to exhibit this property when there are equivalent means to shoot yourself in the foot through the provided means above?
This is a strange question. The question is essentially "I have two guns already with which I can shoot myself in the foot, so why is it considered harmful for me to shoot myself in the foot with a third?"
The existence of two dangerous patterns that violate type safety does not make a third such pattern any less dangerous.
Language and runtime features that violate type safety are there for those times when you absolutely positively know that what you are doing is safe, even if the compiler doesn't know it. If you don't understand those features well enough to use them safely then do not use them.

Yes, IList and Array allow you to make the same mistake - because they're weakly-typed APIs to start with.
Arrays look like they're strongly typed (at compile time) but in reality they're not. They could so easily have been safe (and faster) but they're not. It's just a wasted opportunity for both performance and compile-time safety :(

I think your note about IList points to something worth considering here.
It is pretty useful that IList is implemented by arrays. It's also useful that there are other collections that implement it.
Now, these days, indeed for the last 5 years, we have often found it more useful (or equally useful and safer) to deal with IList<T>.
Prior to .NET 2.0, though, we didn't have IList<T>; we only had IList. Quite a few cases where one might move between arrays and other collections were fiddlier (at best) prior to generics, which in many cases now let us move between typed collections and typed arrays with greater confidence.
As such, the arguments in favour of covariant arrays were greater when the relevant decisions were made than they are now. And the fact that they built on similar decisions in Java, made when Java didn't have generics either, only adds to this.

One cost of array variance is that assignments to an array of a non-sealed reference type are a bit more expensive. But considering that assignments to reference types already do quite a bit of work related to the GC, I guess that cost isn't significant.
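As a hedged aside on that cost: the commonly cited behavior is that the JIT can skip the per-write covariance check when the element type is sealed (nothing derives from string, so a string[] variable must really refer to a string[]), while writes through a non-sealed element type still pay for it. A sketch, with the caveat that whether the check is actually elided is a JIT implementation detail:
string[] strings = new string[1000];
object[] objects = strings;

// string is sealed, so the array behind 'strings' can only be a string[];
// the covariance check on each store can be elided.
for (int i = 0; i < strings.Length; i++)
    strings[i] = "x";

// 'objects' could really be a StringBuilder[] (or any other reference
// type's array), so each store must be type-checked at runtime.
for (int i = 0; i < objects.Length; i++)
    objects[i] = "x";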

Related

Performant and type-safe collections of value types in early versions of .NET

I recently got access to a very old C# code base (very old meaning .NET 1.1 and C# 1.2) in order to find places that could be improved. It is like a window to the past, when a bow was the most powerful weapon. I have never had the opportunity to willingly code in a C# version lower than C# 2 before.
Basically, what I have found is that there is a whole bunch of large ArrayLists for storing numbers and structs. By large I mean tens or hundreds of thousands of values. Something like
ArrayList numbers = new ArrayList();
for(int i = 0; i < BIG_CONSTANT; i++)
numbers.Add(someNumberReturnedFromMethod);
These lists are sorted in one place, then sorted again by a different criterion in another place, or searched.
Knowing that ArrayList works with objects, I came up with two problems:
no type-safety
boxing and unboxing everywhere
So the question is, how to solve this without the aid of generics? I definitely will not turn these structs into classes in order to remove un/boxing, because with so much data this really makes a difference (tested).
I thought about creating custom array lists that would work with my structs instead. Do I have to create a custom array list implementation for every struct there is? Thinking about it, that would mean defining my own IEnumerable, ICollection and IList for each of these structs, right?
I think that could be a fair cost to pay in order to get a more performant and type-safe system, but before trying it out I would like to know your opinion.
Without generics you cannot create a single collection that avoids boxing for all value types. But you can create specialized collections for concrete value types.
Here's an idea: Create your own version of ArrayList but use T4 templates to generate specialized versions for every concrete type you care about. That gives you, without hand-written duplication, an ArrayListInt32, an ArrayListSingle and so on.
This removes any performance penalty from the boxing and restores type safety.
Actually, Java has a similar problem: due to type erasure, its generics cannot be used to avoid boxing. Newer versions do have some specialized array list classes. Thank God C# implements true generics.
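For illustration, a minimal sketch of what one generated specialization might look like (Int32ArrayList is a placeholder name for whatever the T4 template would emit; C# 1.x compatible, no generics):
using System;

// A template-generated list of int: no boxing, no casts.
public class Int32ArrayList
{
    private int[] _items = new int[16];
    private int _count;

    public int Count { get { return _count; } }

    public int this[int index]
    {
        get { return _items[index]; }
        set { _items[index] = value; }
    }

    public void Add(int value)
    {
        if (_count == _items.Length)
        {
            // grow by doubling, like ArrayList does
            int[] bigger = new int[_items.Length * 2];
            Array.Copy(_items, bigger, _count);
            _items = bigger;
        }
        _items[_count++] = value;
    }

    public void Sort()
    {
        Array.Sort(_items, 0, _count);
    }
}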

Immutable Data Structures in C#

I was reading some entries in Eric Lippert's blog about immutable data structures and I got to thinking, why doesn't C# have this built into the standard libraries? It seems strange for something with obvious reuse to not be already implemented out of the box.
EDIT: I feel I might be misunderstood on my question. I'm not asking how to implement one in C#, I'm asking why some of the basic data structures (Stack, Queue, etc.) aren't already available as immutable variants.
It does now.
.NET just shipped their first immutable collections, which I suggest you try out.
Any framework, language, or combination thereof that is not a purely experimental exercise has a market. Some purely experimental ones go on to develop a market.
In this sense, "market" does not necessarily refer to market economics; it's as true whether the producers of the framework/language/both are commercially or non-commercially oriented and distributing the framework/language/both (I'm just going to say "framework" from now on) at a cost or for free. Indeed, free-as-in-both-beer-and-speech projects can be even more heavily dependent on their markets than commercial projects in this way, because their producers are a subset of their market. The market is anyone and everyone who uses it.
The nature of this market will affect the framework in several ways both by organic processes (some parts prove more popular than others and get a bigger share of the mindspace within the community that educates its own members about them) and by fiat (someone decides a feature will serve the market and therefore adds it).
When .NET was developed, it was developed to serve its future market. Ideas about what would serve them therefore influenced decisions as to what should and should not be included in the FCL, the runtime, and the languages that work with it.
Someone decided that we'd quite likely need System.Collections.ArrayList. Someone decided we'd quite likely need System.IO.DirectoryInfo to have a Delete() method. Nobody decided we'd be likely to need a System.Collections.ImmutableStack.
Maybe nobody thought of it at all. Maybe someone did and even implemented it and then it was decided not to be of enough general use. Maybe it was debated at length within Microsoft. I've never worked for MS, so I don't have a clue.
What I can consider, though, is the question of what the people who were using the .NET framework in 2002 had been using in 2001.
Well, COM, ActiveX, ("Classic") ASP, and VB6 & VBScript are now much less used than they were, so they can be said to have been replaced by .NET. Indeed, that could be said to have been an intention.
As well as VB6 & VBScript, a considerable number who were writing in C++ and Java with Windows as a sole or primary target platform are now at least partly using .NET instead. Again, I think that could be said to be an intention, or at the very least I don't think MS were surprised that it went that way.
In COM we had an enumerator-object based foreach approach to iteration that had direct support in some languages (the VB family*), and in .NET we have an enumerator-object based foreach approach to iteration that has direct support in some languages (C#, VB.NET, and others)†.
In C++ we had a rich set of collection types from the STL, and in .NET we have a rich set of collection types from the FCL (and type-safe generic types from .NET 2.0 on).
In Java we had a strong everything-is-an-object style of OOP with a small set of methods provided by a common base-type and a boxing mechanism to allow for simple efficient primitives while keeping to this style. In .NET we have a strong everything-is-an-object style of OOP with a small set of methods provided by a common base-type and a (different) boxing mechanism to allow for simple efficient primitives while keeping to this style.
These cases show choices that are unsurprising considering who was likely to end up being the market for .NET (though such broad statements above shouldn't be read in a way that underestimates the amount of work and subtlety of issues within each of them). Another thing that relates to this is when .NET differs from COM or classic VB or C++ or Java, there may well be a bit of an explanation given in the documentation. When .NET differs from Haskell or Lisp, nobody feels the need to point it out!
Now, of course there are things done differently in .NET than in any of the above (or there'd have been no point and we could have stayed with COM etc.).
However, my point is that out of the near-infinite range of possible things that could end up in a framework like .NET, there are some complete no-brainers ("they might need some sort of string type..."), some close-to-obvious ("this is really easy to do in COM, so it must be easy to do in .NET"), some harder calls ("this will be more complicated than in VB6, but the benefits are worth it"), some improvements ("locale support could really be made a lot easier for developers, so let's build a new approach to the old issue") and some that were less related to the above.
At the other extreme, we can probably all imagine something that would be so out there as to be bizarre ("hey, all coders like Conway's Life - let's put a Conway's Life right into the framework") and hence there's no surprise at not finding it supported.
So far I've quickly brushed over a lot of hard work and difficult design balances in a way that makes the design they came up with seem simpler than it no doubt was. Most likely, the more "obvious" it seems to an outsider after the fact, the more difficult it was for the designers.
Immutable collection types fall into the large range of possible components of the FCL that, while not as bizarre as the built-in-Conway-support idea, were not as strongly called for by examining the market as a mutable list or a way to encapsulate locale information nicely. They would have been novel to much of the initial market, and therefore at risk of ending up not being used. In an alternate universe there's a .NET 1.0 with immutable collections, but it's not very surprising that there wasn't one here.
*At least for consuming. Producing IEnumVARIANT implementations in VB6 wasn't simple, and could involve writing pointer values straight into v-tables in a rather nasty way that it suddenly occurs to me, is possibly not even allowed by today's DEP.
†With a sometimes impossible to implement .Reset() method. Is there any reason for this other than it was in IEnumVARIANT? Was it even ever much used in IEnumVARIANT?
I'll quote from that Eric Lippert blog that you've been reading:
because no one ever designed, specified, implemented, tested, documented and shipped that feature.
In other words, there is no reason other than it hasn't been high enough value or priority to get done ahead of all the other things they're working on.
Why can't you make an immutable struct? Consider the following:
public struct Coordinate
{
    private int _x;
    private int _y;

    public Coordinate(int x, int y)
    {
        _x = x;
        _y = y;
    }

    public int X
    {
        get { return _x; }
    }

    public int Y
    {
        get { return _y; }
    }
}
It's an immutable value type.
It's hard to work with immutable data structures unless you have some functional programming constructs. Suppose you wanted to create an immutable vector containing every other capital letter. How would you do it unless you
A) had functions that did things like range(65, 91), filter(only even) and map(int -> char) to create the sequence in one shot and then turn it into an array
B) created the vector as mutable, added the characters in a loop, and then "froze" it, making it immutable?
By the way, C# does have the B option to some extent -- ReadOnlyCollection can wrap a mutable collection and prevent people from mutating it. However, it's a pain in the ass to do that all the time (and obviously it's hard to support sharing structure between copies when you don't know if something is going to become immutable or not.) A is a better option.
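For what it's worth, C# 3.0 and LINQ eventually delivered option A; assuming "every other capital letter" means the even character codes, a sketch of both options:
using System.Collections.Generic;
using System.Linq;

// Option A: range -> filter -> map -> materialize, in one shot.
char[] everyOther = Enumerable.Range(65, 26)          // 'A'..'Z' (Range takes start and count)
                              .Where(i => i % 2 == 0) // keep the even codes
                              .Select(i => (char)i)   // int -> char
                              .ToArray();

// Option B: build mutable, then "freeze" behind a read-only wrapper.
var builder = new List<char>(everyOther);
var frozen = builder.AsReadOnly(); // ReadOnlyCollection<char>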
Remember, when C# 1.0 existed, it didn't have anonymous functions, it didn't have language support for generators or other laziness, it didn't have any functional APIs like LINQ -- not even map or filter -- it didn't have concise array initialization syntax (you couldn't write new[] { 1, 2, 5 }) and it didn't have generic types; just putting stuff into and getting stuff out of collections normally was a pain. So I don't think it would have been a great choice to spend time on making robust immutable collections with such poor language support for using them.
It would be nice if .net had some really solid support for immutable data holders (classes and structures). One difficulty with adding really good support for such things, though, is that taking maximum advantage of mutable and immutable data structures would require some fundamental changes to the way inheritance works. While I would like to see such support in the next major object-oriented framework, I don't know that it can be efficiently worked into existing frameworks like .net or Java.
To see the problem, imagine that there are two basic data types: basicItem and deluxeItem (which is a basicItem with a few extra fields added). Each can exist in two concrete forms: mutable and immutable. Each can also be described in an abstract form: readable. Thus, there should be six data types; all but ReadableBasicItem should be substitutable for at least one other:
ReadableBasicItem: Not substitutable for anything
MutableBasicItem: ReadableBasicItem
ImmutableBasicItem: ReadableBasicItem
ReadableDeluxeItem: ReadableBasicItem
MutableDeluxeItem: ReadableDeluxeItem, MutableBasicItem (also ReadableBasicItem)
ImmutableDeluxeItem: ReadableDeluxeItem, ImmutableBasicItem (also ReadableBasicItem)
Even though the underlying data type has just one base and one derived type, the inheritance graph has two "diamonds", since MutableDeluxeItem and ImmutableDeluxeItem each have two parents (MutableBasicItem and ReadableDeluxeItem for the former, ImmutableBasicItem and ReadableDeluxeItem for the latter), all of which inherit from ReadableBasicItem. Existing class architectures cannot effectively deal with that. Note that it wouldn't be necessary to support generalized multiple inheritance; merely to allow some specific variants such as those above (which, despite having "diamonds" in the inheritance graph, have an internal structure such that both ReadableDeluxeItem and MutableBasicItem would inherit from "the same" ReadableBasicItem).
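Interfaces can at least express the shape of that graph, since interface inheritance permits diamonds; what they cannot do is share the implementation, which is the author's point about class architectures. A sketch, with hypothetical member names:
interface IReadableBasicItem { int BasicValue { get; } }
interface IMutableBasicItem : IReadableBasicItem { new int BasicValue { get; set; } }
interface IImmutableBasicItem : IReadableBasicItem { }

interface IReadableDeluxeItem : IReadableBasicItem { int ExtraValue { get; } }

// The two diamonds: each deluxe variant reaches IReadableBasicItem by two paths.
interface IMutableDeluxeItem : IReadableDeluxeItem, IMutableBasicItem { new int ExtraValue { get; set; } }
interface IImmutableDeluxeItem : IReadableDeluxeItem, IImmutableBasicItem { }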
Also, while support for that style of inheritance of mutable and immutable types might be nice, the biggest payoff wouldn't happen unless the system had a means of distinguishing heap-stored objects that should expose value semantics from those which should expose reference semantics, could distinguish mutable objects from immutable ones, and could allow objects to start out in an "uncommitted" state (neither mutable nor guaranteed immutable). Copying a reference to a heap object with mutable value semantics should perform a memberwise clone on that object and any nested objects with mutable value semantics, except in cases where the original reference would be guaranteed destroyed; the clones should start life as uncommitted, but be CompareExchange'd to mutable or immutable as needed.
Adding framework support for such features would allow copy-on-write value semantics to be implemented much more efficiently than would be possible without framework support, but such support would really have to be built into the framework from the ground up. I don't think it could very well be overlaid onto an existing framework.

C# classes - Why so many static methods?

I'm pretty new to C# so bear with me.
One of the first things I noticed about C# is that many of the classes are static method heavy. For example...
Why is it:
Array.ForEach(arr, proc)
instead of:
arr.ForEach(proc)
And why is it:
Array.Sort(arr)
instead of:
arr.Sort()
Feel free to point me to some FAQ on the net. If a detailed answer is in some book somewhere, I'd welcome a pointer to that as well. I'm looking for the definitive answer on this, but your speculation is welcome.
Because those are utility classes. The class construction is just a way to group them together, considering there are no free functions in C#.
Assuming this answer is correct, instance methods require additional space in a "method table." Making array methods static may have been an early space-saving decision.
This, along with avoiding the this pointer check that Amitd references, could provide significant performance gains for something as ubiquitous as arrays.
Also see this rule from FxCop:
CA1822: Mark members as static
Rule Description
Members that do not access instance data or call instance methods can be marked as static (Shared in Visual Basic). After you mark the methods as static, the compiler will emit nonvirtual call sites to these members. Emitting nonvirtual call sites will prevent a check at runtime for each call that makes sure that the current object pointer is non-null. This can achieve a measurable performance gain for performance-sensitive code. In some cases, the failure to access the current object instance represents a correctness issue.
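A small illustration of what the rule describes (a sketch; Formatter is a made-up class): the C# compiler emits callvirt even for non-virtual instance methods, which null-checks 'this', while a static method is reached with a plain call:
public class Formatter
{
    // Touches no instance state, so CA1822 would flag it: callers go
    // through callvirt, paying a null check on 'this' for nothing.
    public string Indent(string s) { return "    " + s; }

    // The static equivalent is reached via a nonvirtual call site.
    public static string IndentStatic(string s) { return "    " + s; }
}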
Perceived functionality.
"Utility" functions are unlike much of the functionality OO is meant to target.
Think about the case with collections, I/O, math and just about all utility.
With OO you generally model your domain. None of those things really fit in your domain--it's not like you are coding and go "Oh, we need to order a new hashtable, ours is getting full". Utility stuff often just doesn't fit.
We get pretty close, but it's still not very OO to pass around collections (where is your business logic? where do you put the methods that manipulate your collection and that other little piece or two of data you are always passing around with it?)
Same with numbers and math. It's kind of tough to have Integer.sqrt() and Long.sqrt() and Float.sqrt()--it just doesn't make sense, nor does "new Math().sqrt()". There are a lot of areas it just doesn't model well. If you are looking for mathematical modeling then OO may not be your best bet. (I made a pretty complete "Complex" and "Matrix" class in Java and made them fairly OO, but making them really taught me some of the limits of OO and Java--I ended up "Using" the classes from Groovy mostly)
I've never seen anything anywhere NEAR as good as OO for modeling your business logic, being able to demonstrate the connections between code and managing your relationship between data and code though.
So we fall back on a different model when it makes more sense to do so.
The classic motivations against static:
Hard to test
Not thread-safe
Increases code size in memory
1) C# has several tools available that make testing static methods relatively easy. A comparison of C# mocking tools, some of which support static mocking: https://stackoverflow.com/questions/64242/rhino-mocks-typemock-moq-or-nmock-which-one-do-you-use-and-why
2) There are well-known, performant ways to do static object creation/logic without losing thread safety in C#. For example implementing the Singleton pattern with a static class in C# (you can jump to the fifth method if the inadequate options bore you): http://www.yoda.arachsys.com/csharp/singleton.html
3) As @K-ballo mentions, every method contributes to code size in memory in C#, rather than instance methods getting special treatment.
That said, the two specific examples you pointed out are just a matter of legacy support: the static Array class dates from the C# 1.0 days, before generics and some other syntactic sugar were introduced, as @Inerdia said. I tried to answer assuming you had more code in mind, possibly including outside libraries.
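To make point 2 above concrete, a minimal sketch of one of the variants that page describes (the static-initializer approach), since it shows how "static" and thread safety can coexist:
public sealed class Singleton
{
    // The CLR runs the static initializer exactly once, before first use,
    // so this is thread-safe without any explicit locking.
    private static readonly Singleton instance = new Singleton();

    private Singleton() { }

    public static Singleton Instance { get { return instance; } }
}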
The Array class isn't generic and can't be made fully generic because this would break backwards compatibility. There's some sort of magic going on where arrays implement IList<T>, but that's only for single-dimension arrays with a lower bound of 0 – "list-ish" arrays.
I'm guessing the static methods are the only way to add generic methods that work over any shape of array regardless of whether it qualifies for the above-mentioned compiler magic.
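For illustration, this is roughly the shape of Array.ForEach<T> and Array.ConvertAll<TInput, TOutput>: a generic static method bolted onto a non-generic type. A sketch with a hypothetical helper class (the real BCL versions use a Converter delegate rather than Func):
using System;

public static class ArrayUtil
{
    // A generic method can live on a static class even though
    // System.Array itself is not (and cannot become) Array<T>.
    public static U[] ConvertAll<T, U>(T[] source, Func<T, U> projection)
    {
        U[] result = new U[source.Length];
        for (int i = 0; i < source.Length; i++)
            result[i] = projection(source[i]);
        return result;
    }
}

// Usage: string[] labels = ArrayUtil.ConvertAll(new[] { 1, 2 }, i => i.ToString());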

"Overhead" of Class vs Structure in C#?

I'm doing course 3354 (Implementing System Types and Interfaces in the .NET Framework 2.0) and it is said that for simple classes, with member variables and functions, it is better to use a struct than a class because of overhead.
I have never heard of such a thing, what is the validity of this claim?
I recommend to never use a struct unless you have a very specific use-case in mind and know exactly how the struct will benefit the system.
While C# structs do allow members, they work a good bit differently than classes (they can't be subtyped, there's no virtual dispatch, and they may live entirely on the stack), and the behavior changes depending upon lifting, etc. (Lifting, better known as boxing, is the process of promoting a value type to the heap -- surprise!)
So, to answer the question: I think one of the biggest misconceptions in C# is using structs "for performance". The reason is that 'overhead' can't be truly measured without seeing how the type interacts with the rest of the system and what role, if any of note, it plays. That requires profiling and can't be summed up with such a trivial statement as "less overhead".
There are some good cases for struct value types -- one example is a composite RGB value stored in an array for an image. This is because the RGB type is small, there can be very many in an image, value types can be packed well in arrays, and may help to keep better memory locality, etc.
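A sketch of that case (field names are mine): a small immutable struct packed densely into an array, so a whole image is one contiguous allocation with good locality, rather than one heap object per pixel:
public struct Rgb
{
    public readonly byte R, G, B;

    public Rgb(byte r, byte g, byte b)
    {
        R = r;
        G = g;
        B = b;
    }
}

// One contiguous block of 1024*768 packed values; as a class this would be
// 786,432 separate heap objects plus an array of references to them.
Rgb[] pixels = new Rgb[1024 * 768];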
You should in general not change a type that should be a class into a struct for performance reasons. Structs and classes have different semantics and are not exactly interchangeable. It is true that structs can sometimes be faster, but they can also sometimes be slower. They behave differently and you should recognize when to use one or the other.
MSDN has a page on choosing between classes and structures.
The summary is:
Do not define a structure unless the
type has all of the following
characteristics:
It logically represents a single value, similar to primitive types (integer, double, and so on).
It has an instance size smaller than 16 bytes.
It is immutable.
It will not have to be boxed frequently.
There may occasionally be a case where a type really should be a class but you need it to be a struct for performance reasons. This is rare and you should make sure you know what you are doing before taking this route. It can be especially dangerous if you break the immutability guideline for structs. Mutable structs can behave in ways that most programmers don't expect.
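The classic illustration of that surprise (MutablePoint is a made-up type): copies of a struct mutate independently, so changes quietly fail to reach the original:
using System;
using System.Collections.Generic;

struct MutablePoint { public int X; }

class Demo
{
    static void Main()
    {
        var list = new List<MutablePoint> { new MutablePoint() };

        MutablePoint p = list[0];     // copies the struct out of the list
        p.X = 42;                     // mutates the copy, not the element
        Console.WriteLine(list[0].X); // prints 0, not 42

        // list[0].X = 42; // would not even compile (CS1612):
        //                 // the indexer returns a copy, not a reference
    }
}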
When C# was in beta, I was treated to a demonstration of the difference between class and struct. The developers had built a Mandelbrot set viewer, which grinds a lot of complex numbers to accomplish its result. In the first instance, they ran the code with complex numbers represented as a class with a Real and Imaginary field. Then, they changed the word class to struct and recompiled. The difference in performance was huge.
Not having to allocate heap objects nor having to garbage collect them, in this scenario, was what made the difference. Depending on how many objects you have, the difference may be more or less spectacular. Before you start tweaking, measure your performance and only then make a decision as to go with the value type.
You're going to use a class about 95% of the time. Value type semantics end up biting you in the ass more often than not down the road.
If however, you're going with a struct, make sure that it's immutable.
In C#, classes and structs are completely different beasts. The syntax is nearly identical for the two but the behavior is completely different (reference vs. value). An individual instance of a struct will use less memory than the equivalent class but once you start building collections or passing them around, the value type semantics will kill you in terms of performance (and possibly memory consumption depending on what you're doing) unless the data in the struct is very small. I don't know what the hard number is but we're talking about on the order of 8 bytes or less.
It can also hurt in terms of maintainability and readability because the two look so similar but behave so differently.
One rule of thumb you can follow is to ask yourself whether or not what you are trying to do is like System.DateTime. If it's not, then you should really, really question your decision to use structs.
A struct is a value type and a class is a reference type; this could be one of the reasons.
For simple types where you will have many of them, don't require inheritance, and are a good candidate for being immutable, structs are a good choice. Examples of this are Point, Rectangle, Decimal, and DateTime.
If you will not have many of them, the overhead is irrelevant and it shouldn't factor into your decision. If you will ever need to derive from it (or make it a derived type), it has to be a class. If the item cannot be immutable, struct is not a good candidate because mutating one instance won't change any copies of it.
I believe the most commonly quoted break-even point is around 12 bytes of data in your struct. So for example, 3 integers. The difference between value type and reference type is much more fundamental though, and you should make your choice based on that for types with few members.
If something is big, then class :)
Structs are value types and classes are reference types. So, if you're going to pass around references, classes can be better performance-wise, since you are copying an address rather than the whole structure. However, if you are going to instantiate and reference these objects a large number of times, structs will perform much better because they are allocated inline (on the stack or inside their container) rather than on the heap. So, structs can be better for numerous objects where performance is important, or for small objects where value-type semantics are desirable. Structs also cannot define their own default constructor. They do inherit from Object, but otherwise they are inheritance-free.
There's no need to allocate/deallocate structs.

Do C# Generics Have a Performance Benefit?

I have a number of data classes representing various entities.
Which is better: writing a generic class (say, to print or output XML) using generics and interfaces, or writing a separate class to deal with each data class?
Is there a performance benefit or any other benefit (other than it saving me the time of writing separate classes)?
There's a significant performance benefit to using generics -- you do away with boxing and unboxing. Compared with developing your own classes, it's a coin toss (with one side of the coin weighted more than the other). Roll your own only if you think you can out-perform the authors of the framework.
Not only yes, but HECK YES. I didn't believe how big a difference they could make. We did testing in VistaDB after rewriting a small percentage of core code that used ArrayLists and Hashtables over to generics. The speed improvement was 250% or more.
Read my blog about the testing we did on generics vs weak type collections. The results blew our mind.
I have started rewriting lots of old code that used the weakly typed collections into strongly typed ones. One of my biggest gripes with the ADO.NET interface is that it doesn't expose more strongly typed ways of getting data in and out. The casting time from an object and back is an absolute killer in high volume applications.
Another side effect of strongly typing is that you often will find weakly typed reference problems in your code. We found that through implementing structs in some cases to avoid putting pressure on the GC we could further speed up our code. Combine this with strongly typing for your best speed increase.
Sometimes you have to use weakly typed interfaces within the dot net runtime. Whenever possible though look for ways to stay strongly typed. It really does make a huge difference in performance for non trivial applications.
Generics in C# are truly generic types from the CLR's perspective. There should not be any fundamental difference between the performance of a generic class and a specific class that does the exact same thing. This is different from Java generics, which are more of an automated type cast where needed, and from C++ templates, which expand at compile time.
Here's a good paper, somewhat old, that explains the basic design: "Design and Implementation of Generics for the .NET Common Language Runtime".
If you hand-write classes for specific tasks chances are you can optimize some aspects where you would need additional detours through an interface of a generic type.
In summary, there may be a performance benefit but I would recommend the generic solution first, then optimize if needed. This is especially true if you expect to instantiate the generic with many different types.
I did some simple benchmarking of ArrayLists vs generic Lists for a different question: Generics vs. Array Lists. Your mileage will vary, but the generic List was 4.7 times faster than the ArrayList.
So yes, boxing / unboxing are critical if you are doing a lot of operations. If you are doing simple CRUD stuff, I wouldn't worry about it.
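A minimal sketch of that kind of measurement (Stopwatch timing, not a rigorous benchmark; the numbers will vary with machine and runtime):
using System;
using System.Collections;
using System.Collections.Generic;
using System.Diagnostics;

const int N = 10_000_000;

var sw = Stopwatch.StartNew();
ArrayList weak = new ArrayList();
for (int i = 0; i < N; i++) weak.Add(i);   // boxes every int
long sum1 = 0;
foreach (object o in weak) sum1 += (int)o; // unboxes every int
Console.WriteLine($"ArrayList: {sw.ElapsedMilliseconds} ms (sum {sum1})");

sw.Restart();
List<int> strong = new List<int>();
for (int i = 0; i < N; i++) strong.Add(i); // no boxing
long sum2 = 0;
foreach (int n in strong) sum2 += n;
Console.WriteLine($"List<int>: {sw.ElapsedMilliseconds} ms (sum {sum2})");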
Generics are one of the ways to parameterize code and avoid repetition. Looking at your program description and your thought of writing a separate class to deal with each and every data object, I would lean towards generics. Having a single class take care of many data objects, instead of many classes that do the same thing, improves your performance. And of course your performance, measured in the ability to change your code, is usually more important than the computer's performance. :-)
According to Microsoft, Generics are faster than casting (boxing/unboxing primitives) which is true.
They also claim generics provide better performance than casting between reference types, which seems to be untrue (no one can quite prove it).
Tony Northrup - co-author of MCTS 70-536: Application Development Foundation - states in the same book the following:
I haven't been able to reproduce the performance benefits of generics; however, according to Microsoft, generics are faster than using casting. In practice, casting proved to be several times faster than using a generic. However, you probably won't notice performance differences in your applications. (My tests over 100,000 iterations took only a few seconds.) So you should still use generics because they are type-safe.
I haven't been able to reproduce such performance benefits with generics compared to casting between reference types - so I'd say the performance gain is "supposed" more than "significant".
If you compare a generic list (for example) to a specific list for exactly the type you use, the difference is minimal; the results from the JIT compiler are almost the same.
If you compare a generic list to a list of objects, there are significant benefits to the generic list -- no boxing/unboxing for value types and no type checks for reference types.
Also, the generic collection classes in the .NET library were heavily optimized, and you are unlikely to do better yourself.
In the case of generic collections vs. boxing et al, with older collections like ArrayList, generics are a performance win. But in the vast majority of cases this is not the most important benefit of generics. I think there are two things that are of much greater benefit:
Type safety.
Self documenting aka more readable.
Generics promote type safety, forcing a more homogeneous collection. Imagine stumbling across a string when you expect an int. Ouch.
Generic collections are also more self documenting. Consider the two collections below:
ArrayList listOfNames = new ArrayList();
List<NameType> listOfNames = new List<NameType>();
Reading the first line you might think listOfNames is a list of strings. Wrong! It is actually storing objects of type NameType. The second example not only enforces that the type must be NameType (or a descendant), but the code is more readable. I know right away that I need to go find NameType and learn how to use it just by looking at the code.
I have seen a lot of these "does x perform better than y" questions on StackOverflow. The question here was very fair, and as it turns out generics are a win any way you skin the cat. But at the end of the day the point is to provide the user with something useful. Sure your application needs to be able to perform, but it also needs to not crash, and you need to be able to quickly respond to bugs and feature requests. I think you can see how these last two points tie in with the type safety and code readability of generic collections. If it were the opposite case, if ArrayList outperformed List<>, I would probably still take the List<> implementation unless the performance difference was significant.
As far as performance goes (in general), I would be willing to bet that you will find the majority of your performance bottlenecks in these areas over the course of your career:
Poor design of database or database queries (including indexing, etc),
Poor memory management (forgetting to call dispose, deep stacks, holding onto objects too long, etc),
Improper thread management (too many threads, not calling IO on a background thread in desktop apps, etc),
Poor IO design.
None of these are fixed with single-line solutions. We as programmers, engineers and geeks want to know all the cool little performance tricks. But it is important that we keep our eyes on the ball. I believe focusing on good design and programming practices in the four areas I mentioned above will further that cause far more than worrying about small performance gains.
Generics are faster!
I also discovered that what Tony Northrup wrote about the performance of generics versus non-generics in his book is wrong.
I wrote about this on my blog:
http://andriybuday.blogspot.com/2010/01/generics-performance-vs-non-generics.html
Here is a great article where the author compares the performance of generics and non-generics:
nayyeri.net/use-generics-to-improve-performance
If you're thinking of a generic class that calls methods on some interface to do its work, that will be slower than specific classes using known types, because calling an interface method is slower than a (non-virtual) function call.
Of course, unless the code is the slow part of a performance-critical process, you should focus on clarity.
See Rico Mariani's Blog at MSDN too:
http://blogs.msdn.com/ricom/archive/2005/08/26/456879.aspx
Q1: Which is faster?
The Generics version is considerably faster, see below.
The article is a little old, but gives the details.
Not only can you do away with boxing, but the generic implementations are somewhat faster than the non-generic counterparts with reference types, due to a change in the underlying implementation.
The originals were designed with a particular extension model in mind. This model was never really used (and would have been a bad idea anyway) but the design decision forced a couple of methods to be virtual and thus uninlineable (based on the current and past JIT optimisations in this regard).
This decision was rectified in the newer classes but cannot be altered in the older ones without it being a potential binary breaking change.
In addition, iteration via foreach on a List<> (rather than an IList<>) is faster, because List<>'s struct enumerator avoids the heap allocation that enumerating via the interface requires. Admittedly this did lead to an obscure bug.
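To make that last point concrete (a sketch of the commonly cited mechanics): List<T>.GetEnumerator() returns a struct enumerator, so foreach over a List<T>-typed variable allocates nothing, while foreach over an interface-typed variable boxes the enumerator onto the heap:
using System.Collections.Generic;

var list = new List<int> { 1, 2, 3 };

// Binds to List<int>.Enumerator, a struct: no heap allocation.
foreach (int x in list) { }

IEnumerable<int> asInterface = list;

// Binds to IEnumerator<int>: the struct enumerator is boxed.
foreach (int y in asInterface) { }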
