I'm writing classes to be a "higher-level" representation of some binary structures such as binary files, TCP packets, etc.
For this, and for the sake of readability, it would be very nice if I could define some custom attribute to specify information about each field of a class (e.g. the offset of that field in the binary buffer, the size of the field, etc).
I could accomplish this by declaring constant integers, but IMHO the code would be very ugly and mucky. So I thought about using attributes, which are a very elegant way to accomplish what I want. Features like InteropServices.Marshal actually use attributes (such as StructLayout, MarshalAs and FieldOffset) to accomplish something very similar to what I want, so I can only assume the performance cost is acceptable compared to the gain in readability (please correct me if I'm wrong).
So, how are the aforementioned InteropServices attributes handled by the compiler/CLR?
Do you think the aforementioned tradeoff is worth it? If so, is the default approach of using Reflection the best way to deal with the attributes? I'm assuming there could be other ways to access attributes than Reflection, because I know it is a bit expensive and Marshal uses it in almost every method.
Any helpful thoughts would be much appreciated, thanks.
What you are proposing sounds reasonable assuming the parallels to the Interop are as clear as you are describing. To avoid the performance issues of using reflection for every property access you can use reflection once, maybe via a static constructor, and build compiled Expressions for each property instead. The performance should be the equivalent of calling a virtual method I would think.
Here is a link to a blog post describing the performance differences between different dynamic invocation types. Compiled expressions are ~10x faster than cached reflection and "only" 2x slower than compiled property access.
http://www.palmmedia.de/Blog/2012/2/4/reflection-vs-compiled-expressions-vs-delegates-performance-comparision
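The "reflect once, then use compiled expressions" idea above could look roughly like the following. This is a minimal sketch, not the Marshal implementation: BinaryFieldAttribute, PacketHeader and BinaryLayout<T> are hypothetical names invented for illustration, and the getters box their results into object for simplicity.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical attribute: records where a field lives in the binary buffer.
[AttributeUsage(AttributeTargets.Property)]
public sealed class BinaryFieldAttribute : Attribute
{
    public int Offset { get; }
    public int Size { get; }
    public BinaryFieldAttribute(int offset, int size) { Offset = offset; Size = size; }
}

// Hypothetical packet header, used only for illustration.
public sealed class PacketHeader
{
    [BinaryField(0, 2)] public ushort SourcePort { get; set; }
    [BinaryField(2, 2)] public ushort DestinationPort { get; set; }
}

// Reflection runs once per type T, in the static constructor. After that,
// each property read goes through a compiled delegate instead of reflection.
public static class BinaryLayout<T>
{
    public sealed class Field
    {
        public string Name;
        public int Offset;
        public int Size;
        public Func<T, object> Getter;
    }

    public static readonly IReadOnlyList<Field> Fields;

    static BinaryLayout()
    {
        var fields = new List<Field>();
        foreach (PropertyInfo p in typeof(T).GetProperties())
        {
            var attr = p.GetCustomAttribute<BinaryFieldAttribute>();
            if (attr == null) continue;

            // Build and compile the lambda: (T obj) => (object)obj.Property
            ParameterExpression obj = Expression.Parameter(typeof(T), "obj");
            Expression body = Expression.Convert(Expression.Property(obj, p), typeof(object));
            Func<T, object> getter = Expression.Lambda<Func<T, object>>(body, obj).Compile();

            fields.Add(new Field { Name = p.Name, Offset = attr.Offset, Size = attr.Size, Getter = getter });
        }
        // GetProperties does not guarantee an order, so sort by declared offset.
        fields.Sort((a, b) => a.Offset.CompareTo(b.Offset));
        Fields = fields;
    }
}
```

A serializer would then loop over BinaryLayout<PacketHeader>.Fields and call each Getter, paying the reflection cost only the first time the type is used.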
If I have a class with some value member that I want to store regardless of type, I would think that an object type would be the best. Let's say that the object can realistically be one of three types: string, int, customeClass. Would it be better to keep an extra enum member of the class with what type is stored in the value? Or is the execution of
if (value is string) {...}
else if (value is int) {...}
else if (value is customeClass) {...}
fast enough that it's not worth storing the extra information?
Don't do premature optimization before measuring and proving that the is operator is a local bottleneck on a hot code path. Maintaining the enum for the alternative approach will tax development over a long time.
With all respect...I think this is called premature optimization. I wouldn't worry about the speed of an If statement even if it is a loop of 10k+ iterations.
The performance of the IS test in .NET is rarely a performance problem - worst case is when the result is False and the object has a deep inheritance hierarchy. IS will have to perform multiple lookups going up the object's inheritance chain.
Unless your objects all have deep (100+) inheritance, the performance difference will be negligible.
The biggest difference between testing inheritance and testing an enum is that you can test an enum against multiple values in a switch statement. IS tests always require an if statement chain. The larger (> 10?) the number of elements to test, the greater the performance advantage of switch statements over if statements.
Creating a design where testing the type of the object is a key element of the design seems highly questionable. You should be trying to use polymorphism to allow you to manipulate various types without needing to know the type of each object. A virtual method call will be faster than testing inheritance or testing enums when the number of types involved is greater than 1.
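To make the polymorphism suggestion concrete, here is a small sketch. StoredValue, TextValue and NumberValue are hypothetical wrapper types made up for this example; the point is only that the caller never tests the type.

```csharp
using System;

// Hypothetical base type: each subclass knows how to describe its own value.
public abstract class StoredValue
{
    public abstract string Describe();
}

public sealed class TextValue : StoredValue
{
    private readonly string _text;
    public TextValue(string text) { _text = text; }
    public override string Describe() => "text: " + _text;
}

public sealed class NumberValue : StoredValue
{
    private readonly int _number;
    public NumberValue(int number) { _number = number; }
    public override string Describe() => "number: " + _number;
}

public static class StoredValueDemo
{
    // One virtual call per item; no "is" chain and no enum tag to maintain.
    public static string DescribeAll(StoredValue[] values)
    {
        var parts = new string[values.Length];
        for (int i = 0; i < values.Length; i++)
        {
            parts[i] = values[i].Describe();
        }
        return string.Join("; ", parts);
    }
}
```

Adding a fourth kind of value then means adding one subclass rather than touching every if/else chain.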
This code will execute as fast as a cast (in other words it will be very fast as it is a single IL instruction). Retrieving an enum value from a property involves a lot more work (items must be pushed and popped from the stack and methods must be called) which would make it much slower than using is.
All that being said, however, you shouldn't be concerning yourself with performance at this level until you have identified that this code is introducing performance issues. For most purposes this kind of optimization produces almost no measurable performance difference.
That will be fast enough. Storing any extra data would be unnecessary, as you'd have to look up a property of the class instance to get the type you stored anyway.
It is fast enough, it seems that you are doing a premature optimization which you shouldn't unless you've identified this as a bottleneck for your application performance.
As a follow up to this question What are the advantages of built-in immutability of F# over C#?--am I correct in assuming that the F# compiler can make certain optimizations knowing that it's dealing with largely immutable code? I mean even if a developer writes "Functional C#" the compiler wouldn't know all of the immutability that the developer had tried to code in so that it couldn't make the same optimizations, right?
In general would the compiler of a functional language be able to make optimizations that would not be possible with an imperative language--even one written with as much immutability as possible?
Am I correct in assuming that the F# compiler can make certain optimizations knowing that it's dealing with largely immutable code?
Unfortunately not. To a compiler writer, there's a huge difference between "largely immutable" and "immutable". Even guaranteed immutability is not that important to the optimizer; the main thing that it buys you is you can write a very aggressive inliner.
In general would the compiler of a functional language be able to make optimizations that would not be possible with an imperative language--even one written with as much immutability as possible?
Yes, but it's mostly a question of being able to apply the classic optimizations more easily, in more places. For example, immutability makes it much easier to apply common-subexpression elimination because immutability can guarantee you that contents of certain memory cells are not changed.
On the other hand, if your functional language is not just immutable but pure (no side effects like I/O), then you enable a new class of optimizations that involve rewriting source-level expressions to more efficient expressions. One of the most important and more interesting to read about is short-cut deforestation, which is a way to avoid allocating memory space for intermediate results. A good example to read about is stream fusion.
If you are compiling a statically typed, functional language for high performance, here are some of the main points of emphasis:
Use memory effectively. When you can, work with "unboxed" values, avoiding allocation and an extra level of indirection to the heap. Stream fusion in particular and other deforestation techniques are all very effective because they eliminate allocations.
Have a super-fast allocator, and amortize heap-exhaustion checks over multiple allocations.
Inline functions effectively. Especially, inline small functions across module boundaries.
Represent first-class functions efficiently, usually through closure conversion. Handle partially applied functions efficiently.
Don't overlook the classic scalar and loop optimizations. They made a huge difference to compilers like TIL and Objective Caml.
If you have a lazy functional language like Haskell or Clean, there are also a lot of specialized things to do with thunks.
Footnotes:
One interesting option you get with total immutability is more ability to execute very fine-grained parallelism. The end of this story has yet to be told.
Writing a good compiler for F# is harder than writing a typical compiler (if there is such a thing) because F# is so heavily constrained: it must do the functional things well, but it must also work effectively within the .NET framework, which was not designed with functional languages in mind. We owe a tip of the hat to Don Syme and his team for doing such a great job on a heavily constrained problem.
No.
The F# compiler makes no attempt to analyze the referential transparency of a method or lambda. The .NET BCL is simply not designed for this.
The F# language specification does reserve the keyword 'pure', so manually marking a method as pure may be possible in vNext, allowing more aggressive graph reduction of lambda-expressions.
However, if you use either record or algebraic types, F# will create default comparison and equality operators, and provide copy semantics. Amongst many other benefits (pattern-matching, closed-world assumption) this removes a significant burden!
Yes, if you don't consider F#, but consider Haskell for instance. The fact that there are no side effects really opens up a lot of possibilities for optimization.
For instance consider in a C like language:
int factorial(int n) {
if (n <= 0) return 1;
return n* factorial(n-1);
}
int factorialuser(int m) {
return factorial(m) * factorial(m);
}
If a corresponding method was written in Haskell, there would be no second call to factorial when you call factorialuser. It might be possible to do this in C#, but I doubt the current compilers do it, even for a simple example as this. As things get more complicated, it would be hard for C# compilers to optimize to the level Haskell can do.
Note, F# is not really a "pure" functional language, currently. So, I brought in Haskell (which is great!).
Unfortunately, because F# is only mostly pure there aren't really that many opportunities for aggressive optimization. In fact, there are some places where F# "pessimizes" code compared to C# (e.g. making defensive copies of structs to prevent observable mutation). On the bright side, the compiler does a good job overall despite this, providing comparable performance to C# in most places while simultaneously making programs easier to reason about.
I would say largely 'no'.
The main 'optimization' advantages you get from immutability or referential transparency are things like the ability to do 'common subexpression elimination' when you see code like ...f(x)...f(x).... But such analysis is hard to do without very precise information, and since F# runs on the .Net runtime and .Net has no way to mark methods as pure (effect-free), it requires a ton of built-in information and analysis to even try to do any of this.
On the other hand, in a language like Haskell (which mostly means 'Haskell', as there are few languages 'like Haskell' that anyone has heard of or uses :)) that is lazy and pure, the analysis is simpler (everything is pure, go nuts).
That said, such 'optimizations' can often interact badly with other useful aspects of the system (performance predictability, debugging, ...).
There are often stories of "a sufficiently smart compiler could do X", but my opinion is that the "sufficiently smart compiler" is, and always will be, a myth. If you want fast code, then write fast code; the compiler is not going to save you. If you want common subexpression elimination, then create a local variable (do it yourself).
This is mostly my opinion, and you're welcome to downvote or disagree (indeed I've heard 'multicore' suggested as a rising reason that potentially 'optimization may get sexy again', which sounds plausible on the face of it). But if you're ever hopeful about any compiler doing any non-trivial optimization (that is not supported by annotations in the source code), then be prepared to wait a long, long time for your hopes to be fulfilled.
Don't get me wrong - immutability is good, and is likely to help you write 'fast' code in many situations. But not because the compiler optimizes it - rather, because the code is easy to write, debug, get correct, parallelize, profile, and decide which are the most important bottlenecks to spend time on (possibly rewriting them mutably). If you want efficient code, use a development process that lets you develop, test, and profile quickly.
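The "create a local variable (do it yourself)" advice can be illustrated with a C# version of the factorialuser example from the earlier answer. This is only a sketch: the Calls counter is an addition made here to make the duplicated work visible, and it is exactly the kind of side effect that stops a compiler from merging the two calls on its own.

```csharp
using System;

public static class FactorialDemo
{
    // Counts invocations, purely for illustration.
    public static int Calls;

    public static int Factorial(int n)
    {
        Calls++;
        return n <= 0 ? 1 : n * Factorial(n - 1);
    }

    // Evaluates the same subexpression twice.
    public static int FactorialUserNaive(int m)
    {
        return Factorial(m) * Factorial(m);
    }

    // Manual common subexpression elimination: compute once, reuse the local.
    public static int FactorialUser(int m)
    {
        int f = Factorial(m);
        return f * f;
    }
}
```

Both methods return the same value; the second simply does half the recursive work.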
Additional optimizations for functional languages are sometimes possible, but not necessarily because of immutability. Internally, many compilers will convert code into SSA (static single assignment) form, where each local variable inside a function can only be assigned once. This can be done for both imperative and functional languages. For instance:
x := x + 1
y := x + 4
can become
x_1 := x_0 + 1
y := x_1 + 4
where x_0 and x_1 are different variable names. This vastly simplifies many transformations, since you can move bits of code around without worrying about what value they have at specific points in the program. This doesn't work for values stored in memory though (i.e., globals, heap values, arrays, etc). Again, this is done for both functional and imperative languages.
One benefit most functional languages provide is a strong type system. This allows the compiler to make assumptions that it wouldn't be able to otherwise. For instance, if you have two references of different types, the compiler knows that they cannot alias (point to the same thing). A C compiler can make this assumption only to a limited extent (through its strict-aliasing rules), because pointer casts can subvert the type system.
I have a number of data classes representing various entities.
Which is better: writing a generic class (say, to print or output XML) using generics and interfaces, or writing a separate class to deal with each data class?
Is there a performance benefit or any other benefit (other than it saving me the time of writing separate classes)?
There's a significant performance benefit to using generics -- you do away with boxing and unboxing. Compared with developing your own classes, it's a coin toss (with one side of the coin weighted more than the other). Roll your own only if you think you can out-perform the authors of the framework.
Not only yes, but HECK YES. I didn't believe how big of a difference they could make. We did testing in VistaDB after a rewrite of a small percentage of core code that used ArrayLists and HashTables over to generics. 250% or more was the speed improvement.
Read my blog about the testing we did on generics vs weak type collections. The results blew our mind.
I have started rewriting lots of old code that used the weakly typed collections into strongly typed ones. One of my biggest gripes with the ADO.NET interface is that they don't expose more strongly typed ways of getting data in and out. The casting time from an object and back is an absolute killer in high volume applications.
Another side effect of strongly typing is that you often will find weakly typed reference problems in your code. We found that through implementing structs in some cases to avoid putting pressure on the GC we could further speed up our code. Combine this with strongly typing for your best speed increase.
Sometimes you have to use weakly typed interfaces within the dot net runtime. Whenever possible though look for ways to stay strongly typed. It really does make a huge difference in performance for non trivial applications.
Generics in C# are truly generic types from the CLR perspective. There should not be any fundamental difference between the performance of a generic class and a specific class that does the exact same thing. This is different from Java Generics, which are more of an automated type cast where needed or C++ templates that expand at compile time.
Here's a good paper, somewhat old, that explains the basic design:
"Design and Implementation of Generics for the .NET Common Language Runtime".
If you hand-write classes for specific tasks chances are you can optimize some aspects where you would need additional detours through an interface of a generic type.
In summary, there may be a performance benefit but I would recommend the generic solution first, then optimize if needed. This is especially true if you expect to instantiate the generic with many different types.
I did some simple benchmarking on ArrayLists vs Generic Lists for a different question: Generics vs. Array Lists. Your mileage will vary, but the Generic List was 4.7 times faster than the ArrayList.
So yes, boxing / unboxing are critical if you are doing a lot of operations. If you are doing simple CRUD stuff, I wouldn't worry about it.
Generics are one of the ways to parameterize code and avoid repetition. Looking at your program description and your thought of writing a separate class to deal with each and every data object, I would lean towards generics. Having a single class taking care of many data objects, instead of many classes that do the same thing, increases your performance. And of course your performance, measured in the ability to change your code, is usually more important than the computer's performance. :-)
According to Microsoft, Generics are faster than casting (boxing/unboxing primitives) which is true.
They also claim generics provide better performance than casting between reference types, which seems to be untrue (no one can quite prove it).
Tony Northrup - co-author of MCTS 70-536: Application Development Foundation - states in the same book the following:
I haven’t been able to reproduce the performance benefits of generics; however, according to Microsoft, generics are faster than using casting. In practice, casting proved to be several times faster than using a generic. However, you probably won’t notice performance differences in your applications. (My tests over 100,000 iterations took only a few seconds.) So you should still use generics because they are type-safe.
I haven't been able to reproduce such performance benefits with generics compared to casting between reference types - so I'd say the performance gain is "supposed" more than "significant".
If you compare a generic list (for example) to a specific list for exactly the type you use, then the difference is minimal; the results from the JIT compiler are almost the same.
If you compare a generic list to a list of objects, then there are significant benefits to the generic list: no boxing/unboxing for value types and no type checks for reference types.
Also, the generic collection classes in the .NET library were heavily optimized and you are unlikely to do better yourself.
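The boxing difference described above can be seen in a few lines. This is a sketch for illustration only; BoxingDemo and its method names are made up here.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class BoxingDemo
{
    // ArrayList stores object, so every int is boxed on Add
    // and must be unboxed with a cast when read back.
    public static int SumArrayList(ArrayList items)
    {
        int sum = 0;
        foreach (object o in items)
        {
            sum += (int)o; // unboxing cast, checked at run time
        }
        return sum;
    }

    // List<int> stores the ints directly: no boxing, no cast.
    public static int SumList(List<int> items)
    {
        int sum = 0;
        foreach (int value in items)
        {
            sum += value;
        }
        return sum;
    }

    // Adding the same int twice to an ArrayList allocates two separate boxes.
    public static bool EachAddAllocatesABox()
    {
        var list = new ArrayList();
        int v = 42;
        list.Add(v);
        list.Add(v);
        return !ReferenceEquals(list[0], list[1]);
    }
}
```

The cast in SumArrayList is also where a stray string in the list would fail at run time, whereas List<int> rejects it at compile time.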
In the case of generic collections vs. boxing et al, with older collections like ArrayList, generics are a performance win. But in the vast majority of cases this is not the most important benefit of generics. I think there are two things that are of much greater benefit:
Type safety.
Self documenting aka more readable.
Generics promote type safety, forcing a more homogeneous collection. Imagine stumbling across a string when you expect an int. Ouch.
Generic collections are also more self documenting. Consider the two collections below:
ArrayList listOfNames = new ArrayList();
List<NameType> listOfNames = new List<NameType>();
Reading the first line you might think listOfNames is a list of strings. Wrong! It is actually storing objects of type NameType. The second example not only enforces that the type must be NameType (or a descendant), but the code is more readable. I know right away that I need to go find NameType and learn how to use it just by looking at the code.
I have seen a lot of these "does x perform better than y" questions on StackOverflow. The question here was very fair, and as it turns out generics are a win any way you skin the cat. But at the end of the day the point is to provide the user with something useful. Sure your application needs to be able to perform, but it also needs to not crash, and you need to be able to quickly respond to bugs and feature requests. I think you can see how these last two points tie in with the type safety and code readability of generic collections. If it were the opposite case, if ArrayList outperformed List<>, I would probably still take the List<> implementation unless the performance difference was significant.
As far as performance goes (in general), I would be willing to bet that you will find the majority of your performance bottlenecks in these areas over the course of your career:
Poor design of database or database queries (including indexing, etc),
Poor memory management (forgetting to call dispose, deep stacks, holding onto objects too long, etc),
Improper thread management (too many threads, not calling IO on a background thread in desktop apps, etc),
Poor IO design.
None of these are fixed with single-line solutions. We as programmers, engineers and geeks want to know all the cool little performance tricks. But it is important that we keep our eyes on the ball. I believe focusing on good design and programming practices in the four areas I mentioned above will further that cause far more than worrying about small performance gains.
Generics are faster!
I also discovered that Tony Northrup wrote wrong things about performance of generics and non-generics in his book.
I wrote about this on my blog:
http://andriybuday.blogspot.com/2010/01/generics-performance-vs-non-generics.html
Here is a great article where the author compares the performance of generics and non-generics:
nayyeri.net/use-generics-to-improve-performance
If you're thinking of a generic class that calls methods on some interface to do its work, that will be slower than specific classes using known types, because calling an interface method is slower than a (non-virtual) function call.
Of course, unless the code is the slow part of a performance-critical process, you should focus on clarity.
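For the original question (one generic printer versus one printer per data class), the generic-plus-interface design discussed above might look like this. It is a sketch under assumptions: IXmlWritable, Customer and XmlPrinter are hypothetical names, and the content is assumed to be already XML-escaped.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical interface that each data class implements.
public interface IXmlWritable
{
    string ElementName { get; }
    string Content { get; }
}

// Hypothetical data class.
public sealed class Customer : IXmlWritable
{
    private readonly string _name;
    public Customer(string name) { _name = name; }
    public string ElementName => "customer";
    public string Content => _name; // assumed to need no escaping
}

public static class XmlPrinter
{
    // One generic method serves every data class that implements the interface.
    // The calls to ElementName and Content are interface calls, which is the
    // small per-call cost the answer above refers to.
    public static string Print<T>(IEnumerable<T> items) where T : IXmlWritable
    {
        var sb = new StringBuilder();
        foreach (T item in items)
        {
            sb.Append('<').Append(item.ElementName).Append('>');
            sb.Append(item.Content);
            sb.Append("</").Append(item.ElementName).Append('>');
        }
        return sb.ToString();
    }
}
```

If profiling ever showed the interface calls to matter, a hand-written printer for the one hot data class could be added without changing the rest.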
See Rico Mariani's Blog at MSDN too:
http://blogs.msdn.com/ricom/archive/2005/08/26/456879.aspx
Q1: Which is faster?
The Generics version is considerably faster, see below.
The article is a little old, but gives the details.
Not only can you do away with boxing but the generic implementations are somewhat faster than the non generic counterparts with reference types due to a change in the underlying implementation.
The originals were designed with a particular extension model in mind. This model was never really used (and would have been a bad idea anyway) but the design decision forced a couple of methods to be virtual and thus uninlineable (based on the current and past JIT optimisations in this regard).
This decision was rectified in the newer classes but cannot be altered in the older ones without it being a potential binary breaking change.
In addition, iteration via foreach on a List<> (rather than an IList<>) is faster because List<>'s enumerator is a struct, whereas ArrayList's enumerator requires a heap allocation. Admittedly this did lead to an obscure bug.
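The List<> versus IList<> distinction comes down to which GetEnumerator the compiler binds to. A small sketch (EnumeratorDemo is a made-up name):

```csharp
using System;
using System.Collections.Generic;

public static class EnumeratorDemo
{
    // List<T>.Enumerator is a struct, so foreach over a List<T> variable
    // can keep the enumerator on the stack.
    public static bool ListEnumeratorIsStruct()
    {
        return typeof(List<int>.Enumerator).IsValueType;
    }

    // foreach binds to List<int>.GetEnumerator(), the struct enumerator.
    public static int SumDirect(List<int> list)
    {
        int sum = 0;
        foreach (int x in list) sum += x;
        return sum;
    }

    // foreach binds to IEnumerable<int>.GetEnumerator(), which returns the
    // enumerator as an interface reference, so the struct gets boxed.
    public static int SumViaInterface(IList<int> list)
    {
        int sum = 0;
        foreach (int x in list) sum += x;
        return sum;
    }
}
```

Both methods compute the same result; only the enumerator's allocation behaviour differs.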
The system I work on here was written before .NET 2.0 and didn't have the benefit of generics. It was eventually updated to 2.0, but none of the code was refactored due to time constraints. There are a number of places where the code uses ArrayLists etc. that store things as objects.
From a performance perspective, how important is it to change the code to use generics? I know that boxing, unboxing, etc. are inefficient, but how much of a performance gain will there really be from changing it? Are generics something to use on a go-forward basis, or is there enough of a performance change that a conscious effort should be made to update old code?
Technically the performance of generics is, as you say, better. However, unless performance is hugely important AND you've already optimised in other areas you're likely to get MUCH better improvements by spending your time elsewhere.
I would suggest:
use generics going forward.
if you have solid unit tests then refactor to generics as you touch code
spend other time doing refactorings/measurement that will significantly improve performance (database calls, changing data structures, etc) rather than a few milliseconds here and there.
Of course there are reasons other than performance to change to generics:
less error prone, since you have compile-time checking of types
more readable, you don't need to cast all over the place and it's obvious what type is stored in a collection
if you're using generics going forward, then it's cleaner to use them everywhere
Here's the results I got from a simple parsing of a string from a 100KB file 100,000 times. The Generic List(Of char) took 612.293 seconds to go 100,000 times through the file.
The ArrayList took 2,880.415 seconds to go 100,000 times through the file. This means in this scenario (as your mileage will vary) the Generic List(Of char) is 4.7 times faster.
Here is the code I ran through 100,000 times:
' ArrayList version: every Char is boxed when it is added
Public Sub Run(ByVal strToProcess As String) Implements IPerfStub.Run
    Dim genList As New ArrayList
    For Each ch As Char In strToProcess.ToCharArray
        genList.Add(ch)
    Next
    Dim dummy As New System.Text.StringBuilder()
    For i As Integer = 0 To genList.Count - 1
        dummy.Append(genList(i))
    Next
End Sub

' Generic List(Of Char) version: the Chars are stored without boxing
Public Sub Run(ByVal strToProcess As String) Implements IPerfStub.Run
    Dim genList As New List(Of Char)
    For Each ch As Char In strToProcess.ToCharArray
        genList.Add(ch)
    Next
    Dim dummy As New System.Text.StringBuilder()
    For i As Integer = 0 To genList.Count - 1
        dummy.Append(genList(i))
    Next
End Sub
The only way to know for sure is to profile your code using a tool like dotTrace.
http://www.jetbrains.com/profiler/
It's possible that the boxing/unboxing is trivial in your particular application and wouldn't be worth refactoring. Going forward, you should still consider using generics due to the compile-time type safety.
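Short of a full profiler, a rough first measurement can be taken with System.Diagnostics.Stopwatch. The harness below is only a sketch (ListTiming and its method names are invented here), and a micro-benchmark like this is no substitute for profiling the real application.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Diagnostics;

public static class ListTiming
{
    // Runs the action once as a warm-up (so JIT compilation is not measured),
    // then times the requested number of iterations.
    public static TimeSpan Time(Action action, int iterations)
    {
        action();
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        sw.Stop();
        return sw.Elapsed;
    }

    // Boxes every char into the ArrayList.
    public static int FillArrayList(string text)
    {
        var list = new ArrayList();
        foreach (char ch in text) list.Add(ch);
        return list.Count;
    }

    // Stores the chars directly, without boxing.
    public static int FillGenericList(string text)
    {
        var list = new List<char>();
        foreach (char ch in text) list.Add(ch);
        return list.Count;
    }
}
```

Calling ListTiming.Time(() => ListTiming.FillArrayList(text), 1000) and the same for FillGenericList, with text taken from your own data, gives a first indication of whether the difference matters in your case.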
Generics, whether Java or .NET, should be used for design and type safety, not for performance. Autoboxing is different from generics (essentially implicit object to primitive conversions), and as you mentioned, you should NOT use them in place of a primitive if there is to be a lot of arithmetic or other operations which will cause a performance hit from the repeated implicit object creation/destruction.
Overall I would suggest using generics going forward, and only updating existing code if it needs to be cleaned up for type safety / design purposes, not performance.
It depends, the best answer is to profile your code and see. I like AQTime but a number of packages exist for this.
In general, if an ArrayList is being used a LOT it may be worth switching it to a generic version. Really though, it's most likely that you wouldn't even be able to measure the performance difference. Boxing and unboxing are extra steps but modern computers are so fast that it makes almost no difference. As an ArrayList is really just a normal array with a nice wrapper, you would probably see much more performance gained from better data structure selection (ArrayList.Remove is O(n)!) than from the conversion to generics.
Edit: Outlaw Programmer has a good point, you will still be boxing and unboxing with generics, it just happens implicitly. All the code around checking for exceptions and nulls from casting and "is/as" keywords would help a bit though.
The biggest gains, you will find in Maintenance phases. Generics are much easier to deal with and update, without having to deal with conversion and casting issues. If this is code that you continually visit, then by all means take the effort. If this is code that hasn't been touched in years, I wouldn't really bother.
What does autoboxing/unboxing have to do with generics? This is just a type-safety issue. With a non-generic collection, you are required to explicitly cast back to an object's actual type. With generics, you can skip this step. I don't think there is a performance difference one way or the other.
My old company actually considered this problem. The approach we took was: if it's easy to refactor, do it; if not (i.e. it will touch too many classes), leave it for a later time. It really depends on whether or not you have the time to do it, or whether there are more important items to be coding (i.e. features you should be implementing for clients).
Then again, if you're not working on something for a client, go ahead and spend time refactoring. It'll improve readability of the code for yourself.
Depends on how much is out there in your code. If you are binding or displaying large lists in the UI, you would probably see a great gain in performance.
If your ArrayList are just sprinkled about here and there, then it probably wouldn't be a big deal to just get it cleaned up, but also wouldn't impact overall performance very much.
If you are using a lot of ArrayLists throughout your code and it would be a big undertaking to replace them (something that may impact your schedules), then you could adopt an if-you-touch-it-change-it approach.
Main thing is, though, that Generics are a lot easier to read, and are more stable across the app due to the strong typing you get from them. You'll see gains not just from performance, but from code maintainability and stability. If you can do it quickly, I'd say do it.
If you can get buy-in from the Product Owner, I'd recommend getting it cleaned up. You'll love your code more afterward.
If the entities in the ArrayLists are Object types, you'll gain a little from not casting them to the correct type. If they're Value types (structs or primitives like Int32), then the boxing/unboxing process adds a lot of overhead, and Generic collections should be much faster.
Here's an MSDN article on the subject
Generics have much better performance, especially if you'll be using value types (int, bool, struct etc.), where you'll see a noticeable performance gain.
Using ArrayList with value types causes boxing/unboxing, which if done several hundred times is substantially slower than using a generic List.
When storing value types as objects you'll use up to four times the memory per item. While this amount won't drain your RAM, the cache, which is smaller, can hold fewer items. This means that while iterating over a long collection there will be many copies from main memory to the cache, which will slow your application.
I wrote about it here.
Using generics should also mean that your code will be simpler and easier to use if you want to leverage things like LINQ in later C# versions.