If reflection is inefficient, when is it most appropriate? - c#

I find a lot of cases where I think to myself that I could use relfection to solve a problem, but I usually don't because I hear a lot along the lines of "don't use reflection, it's too inefficient".
Now I'm in a position where I have a problem where I can't find any other solution than to use reflection with new T(), as outlined in this question & answer.
So I'm wondering if somebody can tell me reflection's specific intended usage, and if there's a set of guidelines to indicate when it's appropriate and when it isn't?

It is often "fast enough", and if you need faster (for tight loops etc) you can do meta-programming with Expression or ILGenerator (perhaps via DynamicMethod), to make extremely fast code (including some tricks you can't do in C#).
Reflection is more commonly used for framework/library scenarios, where the library by definition knows nothing about the caller, and must work based on configuration, attributes or patterns.

If there's one thing that I hate hearing it's "don't use reflection, it's too inefficient".
Too inefficient for what? If you're writing a console application that's run once a month and isn't time critical, does it really matter if it takes 30 seconds instead of 28, because of you using reflection?
Guidelines for when it's inappropriate to use are ones that only you can really put together as they're heavily dependent on what you're doing and how efficient/performant alternatives are.

A useful abstraction for code efficiency is to partition it in three categories of time, each about 3 orders of magnitude apart.
First is human-time. There's a lot you can do when you only need to keep a person happy with the performance of your code. Humans cannot perceive the difference between code that needs 10 milliseconds or 20 milliseconds, both look instant. And a human is forgiving when a program needs 6 seconds instead of 5, roughly 3 billion machine instructions more. Common examples of programs that run at human-time are compilers and point-and-click designers. Using reflection is never a problem.
Then there is I/O-time. When your program needs to hit the disk or the network. I/O is slow, restricted by mechanical motion in the case of the disk, bandwidth and latency in the case of a network. You can always tell when I/O is the bottleneck, your program is running but it isn't driving up the CPU load much. The operating system is constantly blocking the thread, making it wait until the I/O request is complete.
Reflection operates at I/O-time. To retrieve type data, the CLR must read the assembly metadata. And when that wasn't done before, your program will cause a page-fault, requiring the operating system to read the data from disk. What follows is that, roughly, reflection can make I/O bound code only twice as slow. Usually better because after the first perf hit, the metadata is cached and can be retrieved a lot quicker. Reflection is thus often an acceptable trade-off. The canonical examples are serialization and dbase ORMs.
Then there's machine-time. The raw performance of a CPU core is stupendous. A property getter can execute in somewhere between 0 and 1/2 a nanosecond. This does not compare favorably with, say, PropertyInfo.GetValue(). Both will keep the CPU busy, you'll see the CPU load for the core at 100%. But GetValue() costs hundreds if not thousands of machine code instructions. Not counting the time needed to page in the metadata. While not much an incremental time, it builds up fast when you loop.
If you cannot classify your reflection code in the human-time or I/O-time categories then reflection is unlikely to be an appropriate substitute for regular code.

The key to keeping reflection from slowing down your program is to not use it inside a loop. If you want to read a property from an object during startup (happens once), use reflection. You want to read a property from a list of 10,000 objects of unknown type, use reflection to get the property getter delegate once (search term: PropertyInfo.GetGetMethod), then call the delegate 10,000 types. There are plenty of examples of this on StackOverflow.

Reflection is not inefficient. It is less efficient than direct calls. So personnaly I use reflection when there's no equivalent compile time safe method. IMHO the problem with reflection is not so much the efficiency but the fragility of the code as it uses magic strings which are very refactor unfriendly.

I use it for plugin architecture - looking through assemblies in the plugin folder for methods marked with a custom attribute indicating info about the plugin - and in a logging framework. The framework detects a custom attribute on the assembly itself which holds information about the author of the assembly, the project, version information, and other tags that are logged along with everything in the stack trace.
Going to give away a 'trade secret', but it's a good one. The framework allows you to tag each method or class with a 'Story ref', e.g.
[StoryRef(Ref="ImportCSV1")]
...and the idea is it would integrate into our agile project management framework: if there were any exceptions thrown within that class/method, the logging method would use reflection to check for a StoryRef attribute in the stack trace, and if so that would be logged as an exception against that story. In the PM software you could see exceptions by Story (a story is like an extreme/agile use case).
I think that's a valid use, at least! Basically, when it just seems the most neat, and appropriate way to do it, I use reflection. Nothing else really comes into it - I can't think of an occasion you'd be using reflection to make that many calls that efficiency would come into it.

So I'm wondering if somebody can tell
me reflection's specific intended
usage, and if there's a set of
guidelines to indicate when it's
appropriate and when it isn't?
A bad example of reflection is this one from Wikipedia:
//Without reflection
Foo foo = new Foo();
foo.Hello();
//With reflection
Type t = Type.GetType("FooNamespace.Foo");
object foo = Activator.CreateInstance(t);
t.InvokeMember("Hello", BindingFlags.InvokeMethod, null, foo, null);
Here, there is no advantage to using reflection: The non-reflection-using code is not only more efficient, but easier to understand.
Good uses of reflection are things like serialization and object-relational mapping, which are easy to implement if you have a list of a class's properties, but otherwise require a custom-written function for each class.

Related

Repeated access to properties and speed in C# [duplicate]

Please ignore code readability in this question.
In terms of performance, should the following code be written like this:
int maxResults = criteria.MaxResults;
if (maxResults > 0)
{
while (accounts.Count > maxResults)
accounts.RemoveAt(maxResults);
}
or like this:
if (criteria.MaxResults > 0)
{
while (accounts.Count > criteria.MaxResults)
accounts.RemoveAt(criteria.MaxResults);
}
?
Edit: criteria is a class, and MaxResults is a simple integer property (i.e., public int MaxResults { get { return _maxResults; } }.
Does the C# compiler treat MaxResults as a black box and evaluate it every time? Or is it smart enough to figure out that I've got 3 calls to the same property with no modification of that property between the calls? What if MaxResults was a field?
One of the laws of optimization is precalculation, so I instinctively wrote this code like the first listing, but I'm curious if this kind of thing is being done for me automatically (again, ignore code readability).
(Note: I'm not interested in hearing the 'micro-optimization' argument, which may be valid in the specific case I've posted. I'd just like some theory behind what's going on or not going on.)
First off, the only way to actually answer performance questions is to actually try it both ways and test the results in realistic conditions.
That said, the other answers which say that "the compiler" does not do this optimization because the property might have side effects are both right and wrong. The problem with the question (aside from the fundamental problem that it simply cannot be answered without actually trying it and measuring the result) is that "the compiler" is actually two compilers: the C# compiler, which compiles to MSIL, and the JIT compiler, which compiles IL to machine code.
The C# compiler never ever does this sort of optimization; as noted, doing so would require that the compiler peer into the code being called and verify that the result it computes does not change over the lifetime of the callee's code. The C# compiler does not do so.
The JIT compiler might. No reason why it couldn't. It has all the code sitting right there. It is completely free to inline the property getter, and if the jitter determines that the inlined property getter returns a value that can be cached in a register and re-used, then it is free to do so. (If you don't want it to do so because the value could be modified on another thread then you already have a race condition bug; fix the bug before you worry about performance.)
Whether the jitter actually does inline the property fetch and then enregister the value, I have no idea. I know practically nothing about the jitter. But it is allowed to do so if it sees fit. If you are curious about whether it does so or not, you can either (1) ask someone who is on the team that wrote the jitter, or (2) examine the jitted code in the debugger.
And finally, let me take this opportunity to note that computing results once, storing the result and re-using it is not always an optimization. This is a surprisingly complicated question. There are all kinds of things to optimize for:
execution time
executable code size -- this has a major effect on executable time because big code takes longer to load, increases the working set size, puts pressure on processor caches, RAM and the page file. Small slow code is often in the long run faster than big fast code in important metrics like startup time and cache locality.
register allocation -- this also has a major effect on execution time, particularly in architectures like x86 which have a small number of available registers. Enregistering a value for fast re-use can mean that there are fewer registers available for other operations that need optimization; perhaps optimizing those operations instead would be a net win.
and so on. It get real complicated real fast.
In short, you cannot possibly know whether writing the code to cache the result rather than recomputing it is actually (1) faster, or (2) better performing. Better performance does not always mean making execution of a particular routine faster. Better performance is about figuring out what resources are important to the user -- execution time, memory, working set, startup time, and so on -- and optimizing for those things. You cannot do that without (1) talking to your customers to find out what they care about, and (2) actually measuring to see if your changes are having a measurable effect in the desired direction.
If MaxResults is a property then no, it will not optimize it, because the getter may have complex logic, say:
private int _maxResults;
public int MaxReuslts {
get { return _maxResults++; }
set { _maxResults = value; }
}
See how the behavior would change if it in-lines your code?
If there's no logic...either method you wrote is fine, it's a very minute difference and all about how readable it is TO YOU (or your team)...you're the one looking at it.
Your two code samples are only guaranteed to have the same result in single-threaded environments, which .Net isn't, and if MaxResults is a field (not a property). The compiler can't assume, unless you use the synchronization features, that criteria.MaxResults won't change during the course of your loop. If it's a property, it can't assume that using the property doesn't have side effects.
Eric Lippert points out quite correctly that it depends a lot on what you mean by "the compiler". The C# -> IL compiler? Or the IL -> machine code (JIT) compiler? And he's right to point out that the JIT may well be able to optimize the property getter, since it has all of the information (whereas the C# -> IL compiler doesn't, necessarily). It won't change the situation with multiple threads, but it's a good point nonetheless.
It will be called and evaluated every time. The compiler has no way of determining if a method (or getter) is deterministic and pure (no side effects).
Note that actual evaluation of the property may be inlined by the JIT compiler, making it effectively as fast as a simple field.
It's good practise to make property evaluation an inexpensive operation. If you do some heavy calculation in the getter, consider caching the result manually, or changing it to a method.
why not test it?
just set up 2 console apps make it look 10 million times and compare the results ... remember to run them as properly released apps that have been installed properly or else you cannot gurantee that you are not just running the msil.
Really you are probably going to get about 5 answers saying 'you shouldn't worry about optimisation'. they clearly do not write routines that need to be as fast as possible before being readable (eg games).
If this piece of code is part of a loop that is executed billions of times then this optimisation could be worthwhile. For instance max results could be an overridden method and so you may need to discuss virtual method calls.
Really the ONLY way to answer any of these questions is to figure out is this is a piece of code that will benefit from optimisation. Then you need to know the kinds of things that are increasing the time to execute. Really us mere mortals cannot do this a priori and so have to simply try 2-3 different versions of the code and then test it.
If criteria is a class type, I doubt it would be optimized, because another thread could always change that value in the meantime. For structs I'm not sure, but my gut feeling is that it won't be optimized, but I think it wouldn't make much difference in performance in that case anyhow.

Reference a big DLL for a single method

I want to use a single method from a big Class library-dll in C#.
Are there any drawbacks of performance or anything else?
Should I "read" the method code with reflection tool and copy-paste it to my project?
Update: The hard disk space isn't an issue. My application is web app.
Are there any drawbacks of performance or anything else?
The only one that is actually important is the size of your distributable, if it matters to you. (Users downloading a 30 MB file instead of a 2 MB one). Performance differences will be negligible. Assembly binding and verifying the Strong Name (if it's signed) hash may take longer, but unlikely to be noticeable to a user.
Should I "read" the method code with reflection tool and copy-paste it to my project?
Probably not; most licensing terms prohibit reverse engineering and/or only partial distribution. Check the license, if any, to see if you can even do it first.
No, leave that up to the JIT compiler. It is already selective about what IL actually gets turned into machine code, it only compiles what actually executes. You'll lose a bit of virtual memory address space but that doesn't cost anything, it's virtual. You don't pay for what you don't use.
The costs are disk space, load time and memory footprint. The JIT compiler will only compile what you call (there may by caveats to this, but it certainly will not compile the entire assembly). It's your call as to if it is worth your while to 'rip' out the method you need. Remember of course this could be a rabbit hole, by that I mean this method is likely to use other classes in its assembly so it may not be as simple as you think to extract the code you need.
Bah, extreme hacking needs in extreme cases. Copy methods code (if this is possible) is extreme hacking, imo.
If just frustration to use unnecessary memory, but basically pretty affordable solution, do it like you continue doing. Simple and easy.
If there are memory issues, and the method is not called too frequently (what is too frequently depends on your project) you can try to load it in external AppDomain, and unload that domain one time you finished with it. Sure you need take care of IPC management, in this case. Nothing comes free in this world.
You can try do what you wrote. User Reflector (or similar software), get the methods C# code and create its clone. All this supposing that method doesn't use DLL's internal's structures, states or whatever, cause in that case the story becomes fairly complicated.
Good luck.

Cost of using params in C#

Does anyone have advice for using the params in C# for method argument passing. I'm contemplating making overloads for the first 6 arguments and then a 7th using the params feature. My reasoning is to avoid the extra array allocation the params feature require. This is for some high performant utility methods. Any advice? Is it a waste of code to create all the overloads?
Honestly, I'm a little bothered by everyone shouting "premature optimization!" Here's why.
What you say makes perfect sense, particularly as you have already indicated you are working on a high-performance library.
Even BCL classes follow this pattern. Consider all the overloads of string.Format or Console.WriteLine.
This is very easy to get right. The whole premise behind the movement against premature optimization is that when you do something tricky for the purposes of optimizing performance, you're liable to break something by accident and make your code less maintainable. I don't see how that's a danger here; it should be very straightforward what you're doing, to yourself as well as any future developer who may deal with your code.
Also, even if you profiled the results of both approaches and saw only a very small difference in speed, there's still the issue of memory allocation. Creating a new array for every method call entails allocating more memory that will need to be garbage collected later. And in some scenarios where "nearly" real-time behavior is desired (such as algorithmic trading, the field I'm in), minimizing garbage collections is just as important as maximizing execution speed.
So, even if it earns me some downvotes: I say go for it.
(And to those who claim "the compiler surely already does something like this"--I wouldn't be so sure. Firstly, if that were the case, I fail to see why BCL classes would follow this pattern, as I've already mentioned. But more importantly, there is a very big semantic difference between a method that accepts multiple arguments and one that accepts an array. Just because one can be used as a substitute for the other doesn't mean the compiler would, or should, attempt such a substitution).
Yes, that's the strategy that the .NET framework uses. String.Concat() would be a good example. It has overloads for up to 4 strings, plus a fallback one that takes a params string[]. Pretty important here, Concat needs to be fast and is there to help the user fall in the pit of success when he uses the + operator instead of a StringBuilder.
The code duplication you'll get is the price. You'd profile them to see if the speedup is worth the maintenance headache.
Fwiw: there are plenty of micro-optimizations like this in the .NET framework. Somewhat necessary because the designers could not really predict how their classes were going to be used. String.Concat() is just as likely to be used in a tight inner loop that is critical to program perf as, say, a config reader that only runs once at startup. As the end-user of your own code, you typically have the luxury of not having to worry about that. The reverse is also true, the .NET framework code is remarkably free of micro-optimizations when it is unlikely that their benefit would be measurable. Like providing overloads when the core code is slow anyway.
You can always pass Tuple as a parameter, or if the types of the parameters are always the same, an IList<T>.
As other answers and comments have said, you should only optimize after:
Ensuring correct behavior.
Determining the need to optimize.
My point is, if your method is capable of getting unlimited number of parameters, then the logic inside it works in an array-style. So, having overloads for limited number of parameters wouldn't be helping. Unless, you can implement limited number of parameters in a whole different way that is much faster.
For example, if you're handing the parameters to a Console.WriteLine, there's a hidden array creation in there too, so either way you end up having an array.
And, sorry for bothering Dan Tao, I also feel like it is premature optimization. Because you need to know what difference would it make to have overloads with limited number of parameters. If your application is that much performance-critical, you'd need to implement both ways and try to run a test and compare execution times.
Don't even think about performance at this stage. Create whatever overloads will make your code easier to write and easier to understand at 4am two years from now. Sometimes that means params, sometimes that means avoiding it.
After you've got something that works, figure out if these are a performance problem. It's not hard to make the parameters more complicated, but if you add unnecessary complexity now, you'll never make them less so later.
You can try something like this to benchmark the performance so you have some concrete numbers to make decisions with.
In general, object allocation is slightly faster than in C/C++ and deletion is much, much faster for small objects -- until you have tens of thousands of them being made per second. Here's an old article regarding memory allocation performance.

Concurrency issues while accessing data via reflection in C#

I'm currently writing a library that can be used to show the internal state of some running code (mainly fields and properties both public and private). Objects are accessed in a different thread to put their info into a window for the user to see. The problem is, there are times while I'm walking a long IList in which its structure may change. Some piece of code in the program being 'watched' may add a new item, or even worse, remove some. This of course causes the whole thing to crash.
I've come up with some ideas but I'm afraid they're not quite correct:
Locking the list being accessed while I'm walking it. I'm not sure if this would work since the IList being used may have not been locked for writing at the other side.
Let the code being watched to be aware of my existence and provide some interfaces to allow for synchronization. (I'd really like it to be totally transparent though).
As a last resort, put every read access into a try/catch block and pretend as if nothing happened when it throws. (Can't think of an uglier solution that actually works).
Thanks in advance.
The only way you're going to keep things "transparent" to the code being monitored is to make the monitoring code robust in the face of state changes.
Some suggestions
Don't walk a shared list - make a copy of the list into a local List instance as soon (and as fast) as you can. Once you have a local (non-shared) list of instances, noone can monkey with the list.
Make things as robust as you can - putting every read into a try/catch might feel nasty, but you'll probably need to do it.
Option number 3 may feel ugly, but this looks to be similar to the approach the Visual Studio watch windows use, and I would choose that approach.
In Visual Studio, you can often set a watch on some list or collection and at a later point notice the watch simply displays an exception when it can't evaluate a certain value due to user or code state changes.
This is the most robust approach when dealing with such an open ended range of possibilities. The fact is, if your watching code is designed to support as many scenarios as possible you will not be able to think of all situations in advance. Handling and presenting exceptions nicely is the next best approach.
By the way, someone else mentioned that locking your data structures will work. This is not true if the "other code" is not also using locks for synchronization. In fact both pieces of code must lock the same synchronization object, very unlikely if you don't control the other code. (I think you mention this in your question, so I agree.)
While I like Bevan's idea of copying the list for local read access, if the list is particularly large, that may not be a truly viable option.
If you really need seamless, transparent, concurrent access to these lists, you should look into the Parallel Extensions for .NET library. It is currently available for .NET 2.0 through 3.5 as a CTP. The extensions will be officially included in .NET 4.0 along with some additional collections. I think you would be interested in the BlockingCollection from the CTP, which would give you that transparent concurrent access you need. There is obviously a performance hit as with any threaded stuff that involves synchronization, however these collections are fairly well optimized.
As I understand, you don't want to have ANY dependency/requirement on the code being watched or enforce any constrains on how the code is written.
Although this is my favourite approach to code a "watcher", this causes you application to face a very broad range of code and behaviours, which can cause it to crash.
So, as said before me, my advice is to make the watcher "robust" in the first step. You should be prepared for anything going wrong anywhere in your code, because considering the "transparency", many things can potentially go wrong! (Be careful where to put your try/catch, entering and leaving the try block many times can have a visible performance impact)
When you're done making your code robust, next steps would be making it more usable and dodging the situations that can cause exceptions, like the "list" thing you mentioned. For example, you can check the watched object and see if it's a list, and it's not too long, first make a quick copy of it and then do the rest. This way you eliminate a large amount of the probability that can make your code throw.
Locking the list will work, because it is being modified, as you've observed via the crashing :)
Seems to me, though, that I'd avoid locking (because it seems that your thread is only the 'watcher' and shouldn't really interrupt).
On this basis, I would just try and handle the cases where you determine missing things. Is this not possible?

Why doesn't .NET have a SoftReference as well as a WeakReference, like Java?

I really love WeakReference's. But I wish there was a way to tell the CLR how much (say, on a scale of 1 to 5) how weak you consider the reference to be. That would be brilliant.
Java has SoftReference, WeakReference and I believe also a third type called a "phantom reference". That's 3 levels right there which the GC has a different behaviour algorithm for when deciding if that object gets the chop.
I am thinking of subclassing .NET's WeakReference (luckily and slightly bizzarely it isn't sealed) to make a pseudo-SoftReference that is based on a expiration timer or something.
I believe the fundamental reason that NET does not have soft references is because it can rely on an operating system with virtual memory. A Java process must specify its maximum OS memory (e.g. with -Xmx128M), and it never takes more OS memory than that. Whereas a NET process keeps taking OS memory that it needs, which the OS supplies with disk-backed virtual memory when RAM runs out. If NET allowed soft references, then the NET runtime would not know when to release them unless it either peeked deep into the OS to see if its memory is actually paged on disk (a nasty OS/CLR dependency), or it requested the runtime to specify a maximum process memory footprint (e.g. an equivalent of -Xmx). I guess that Microsoft does not want to add -Xmx to NET because they think the OS should decide how much RAM each process gets (by choosing which virtual memory pages to hold in RAM or on disk), and not the process itself.
Java SoftReferences are used in the creation of memory sensitive caches (they serve no other purpose).
As of .NET 4, .NET has a class System.Runtime.Caching.MemoryCache which will probably meet any such needs.
Having a WeakReference with varying levels of weakness (priority) sounds nice, but also might make the GC's job harder, not easier. (I've no idea on the GC internals, but) I would assume there some sort of additional access statistics that are kept for WeakReference objects so that the GC can clean them up efficiently (e.g. it might get rid of the least-used items first).
More than likely the added complexity wouldn't make anything any more efficient because the most efficient way is to get rid of infrequently used WeakReferences first. If you could assign a priority, how would you do it? This smells like a premature optimization: the programmer doesn't really know most of the time and is guessing; the result is a slower GC collection cycle that is probably reclaiming the wrong objects.
It begs the question though, that if you care about the WeakReference.Target object being reclaimed, is it really a good use of WeakReference?
It's like a cache. You shove stuff into the cache and ask the cache to make it stale after x minutes, but most caches never guarantee to keep it around at all. It just guarantees that if it does, it will expire it according to the policy requested.
My guess as to why this isn't there already would be simplicity. Most people, I think, would call it a virtue that there is only one type of reference, not four.
Maybe the ASP.NET Cache class (System.Web.Caching.Cache) might help achieve what you want? It automatically remove objects if memory gets low:
ASP.NET Caching Overview
Here's an article that shows how to use the Cache class in a windows forms application.
quoted from: Equivalent to SoftReference in .net?
Don't forget that you also have your standard references (the ones that you use on a daily basis). This gives you one more level.
WeakReferences should be used when you don't really care if the object goes away, while SoftReferences really only should be used when you would use a normal reference, but you would rather your object be cleared then for you to run out of memory. I'm not sure on the specifics, but I suspect that the GC normally traces through SoftReferences but not WeakReferences when determining which objects are live, but when running low on memory will also skip the SoftReferences.
My guess is that the .Net designers felt that the difference was confusing to most people and or that SoftReferences add more complexity than they really wanted and so decided to leave them out.
As a side note, AFAIK PhantomReferences are mostly designed for internal use by the virtual machine and are not intended for actual client use.
Maybe there should be an property where you can specify which Generation that the object >= before it is collected. So if you specify 1 then it is the weakest possible reference. But if you specify 3 then it would need to survive at least 3 prior collections before it can be considered for collection itself.
I thought the track ressurection flag was no good for this because by that time the object has already been finalized? May be wrong though...
(PS: I am the OP, just signed up. PITA that it doesn't inherit your history from "unregistered" accounts.)
Looking for the 'trackResurrection' option passed to the constructor perhaps?
The GC class also offers some assistance.
Don't know why .NET does not have Softreferences.
BUT in Java Softreferences are IMHO overused. The reason is tha at least in an application server you would want to be able to influence per application how long your Softreferenzen live. That's currently not possible in Java.

Categories