Reference a big DLL for a single method - c#

I want to use a single method from a big Class library-dll in C#.
Are there any drawbacks of performance or anything else?
Should I "read" the method code with reflection tool and copy-paste it to my project?
Update: The hard disk space isn't an issue. My application is web app.

Are there any drawbacks of performance or anything else?
The only one that actually matters is the size of your distributable, if that matters to you (for example, users downloading a 30 MB file instead of a 2 MB one). Performance differences will be negligible. Assembly binding and verifying the strong-name hash (if the assembly is signed) may take longer, but this is unlikely to be noticeable to a user.
Should I "read" the method code with reflection tool and copy-paste it to my project?
Probably not; most licensing terms prohibit reverse engineering and/or only partial distribution. Check the license, if any, to see if you can even do it first.

No, leave that up to the JIT compiler. It is already selective about what IL actually gets turned into machine code, it only compiles what actually executes. You'll lose a bit of virtual memory address space but that doesn't cost anything, it's virtual. You don't pay for what you don't use.

The costs are disk space, load time and memory footprint. The JIT compiler will only compile what you call (there may be caveats to this, but it certainly will not compile the entire assembly). It's your call whether it is worth your while to 'rip' out the method you need. Remember, of course, that this could be a rabbit hole: the method is likely to use other classes in its assembly, so it may not be as simple as you think to extract the code you need.

Bah, extreme hacks are for extreme cases. Copying the method's code (if that is even possible) is extreme hacking, IMO.
If you are merely annoyed at using some unnecessary memory but the cost is affordable, keep referencing the DLL as you do now. Simple and easy.
If there are memory issues, and the method is not called too frequently (what counts as too frequently depends on your project), you can try loading the DLL in a separate AppDomain and unloading that domain once you have finished with it. Of course, you then need to take care of the cross-domain communication. Nothing comes free in this world.
You can also try what you suggested: use Reflector (or similar software), get the method's C# code and create a clone of it. All this assumes that the method doesn't use the DLL's internal structures, state or the like, because in that case the story becomes fairly complicated.
Good luck.

Related

Will function calls to small methods consume more memory or not? (C#)

I have one question. Instead of writing one big method (including big business logic), I prefer to divide it into small methods and call them from one method, because to me that looks neater and is easier to maintain. But my team lead said: "Don't write small methods and call them in one, because it consumes more memory when you call small methods." Is that correct?
Please suggest what I should do in this case. And once again, thank you for your valuable time.
There are many factors that come into play here. More context of your project would be required to give any strict conclusions.
Generally speaking though, C#, VB and managed languages in general were devised to prioritize developer productivity over performance. In that light, worrying about method call memory consumption seems questionable.
Additionally, IL-based languages (C#, VB, ...) use a JIT that compiles the intermediate code to CPU-specific machine code at runtime. The JIT's unit of work is a method. The bigger the method, the fewer optimizations the JIT can do. Therefore a big method may yield worse performance than many small methods doing the same work. In addition, the JIT can also do an optimization called inlining, where the code of a small method is generated inside its caller, eliding the function call altogether.
A function call takes very little memory by C#/VB standards. Unless you're working in a very constrained environment (e.g. embedded), such an optimization doesn't really make sense, especially when it is not backed by any reasonable arguments.
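To make the inlining point concrete, here is a minimal sketch. The class and method names (OrderMath, ApplyDiscount, AddTax, FinalPrice) are made up for illustration and are not from the question. MethodImplOptions.AggressiveInlining is only a hint; whether the JIT actually inlines a given call is its own decision.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical example: one "big" calculation split into small helpers.
public static class OrderMath
{
    // Small helper: a typical candidate for JIT inlining.
    // The attribute is a hint only; the JIT may still decline.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static decimal ApplyDiscount(decimal price, decimal percent)
    {
        return price - price * percent / 100m;
    }

    // No attribute here: small methods are often inlined without any hint.
    public static decimal AddTax(decimal price, decimal taxPercent)
    {
        return price + price * taxPercent / 100m;
    }

    // The caller stays readable by delegating to the helpers.
    public static decimal FinalPrice(decimal price, decimal discountPercent, decimal taxPercent)
    {
        return AddTax(ApplyDiscount(price, discountPercent), taxPercent);
    }

    public static void Main()
    {
        Console.WriteLine(FinalPrice(200m, 10m, 20m));
    }
}
```

Each call uses a small stack frame that is released when the call returns, so splitting the logic this way does not accumulate memory.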
You are both mistaken.
OOP is built on the concept of divide and conquer, so you should divide your method into small methods for the sake of reusability and maintainability.
As for the memory consumed, I don't think it will consume more memory, although this may happen if you create a method for every small task.
So yes, divide them into small methods, but only where you need to, with respect to resources and shared variables.

How do the .NET JITs optimise generated code layout?

Back in 2009 I posted this answer to a question about optimisations for nested try/catch/finally blocks.
Thinking about this again some years later, it seems the question could be extended to other control flow: not only try/catch/finally, but also if/else.
At each of these junctions, execution will follow one path. Code must be generated for both, obviously, but the order in which they're placed in memory, and the number of jumps required to navigate through them will differ.
The order generated code is laid out in memory has implications for the miss rate on the CPU's instruction cache. Having the instruction pipeline stalled, waiting for memory reads, can really kill loop performance.
I don't think loops (for/foreach/while) are such a good fit unless you expect the loop to have zero iterations more often than it has some, as the natural generation order seems pretty optimal.
Some questions:
In what ways do the available .NET JITs optimise for generated instruction order?
How much difference can this make in practice to common code? What about perfectly suited cases?
Is there anything the developer can do to influence this layout? What about mangling with the forbidden goto?
Does the specific JIT being used make much difference to layout?
Does the method inlining heuristic come into play here too?
Basically anything interesting related to this aspect of the JIT!
Some initial thoughts:
Moving catch blocks out of line is an easy job, as they're supposed to be the exceptional case by definition. Not sure this happens.
For some loops I suspect you can increase performance non-trivially. However in general I don't think it'll make that much difference.
I don't know how the JIT decides the order of generated code. In C on Linux you have likely(cond) and unlikely(cond) which you can use to tell to the compiler which branch is the common path to optimise for. I'm not sure that all compilers respect these macros.
Instruction ordering is distinct from the problem of branch prediction, in which the CPU guesses (on its own, afaik) which branch will be taken in order to start the pipeline (oversimplified steps: decode, fetch operands, execute, write back) on instructions, before the execute step has determined the value of the condition variable.
I can't think of any way to influence this order in the C# language. Perhaps you can manipulate it a bit by gotoing to labels explicitly, but is this portable, and are there any other problems with it?
Perhaps this is what profile guided optimisation is for. Do we have that in the .NET ecosystem, now or in plan? Maybe I'll go and have a read about LLILC.
The optimization you are referring to is called the code layout optimization which is defined as follows:
Those pieces of code that are executed close in time in the same thread should be close in the virtual address space so that they fit in a single or few consecutive cache lines. This reduces cache misses.
Those pieces of code that are executed close in time in different threads should be close in the virtual address space so that they fit in a single or few consecutive cache lines as long as there is no self-modifying code. This gets lower priority than the previous one. This reduces cache misses.
Those pieces of code that are executed frequently (hot code) should be close in the virtual address space so that they fit in as few virtual pages as possible. This reduces page faults and working set size.
Those pieces of code that are rarely executed (cold code) should be close in the virtual address space so that they fit in as few virtual pages as possible. This reduces page faults and working set size.
Now to your questions.
In what ways do the available .NET JITs optimise for generated instruction order?
"Instruction order" is really a very general term. Many optimizations affect instruction order. I'll assume that you're referring to code layout.
JITters by design should take the minimum amount of time to compile code while at the same time produce high-quality code. To achieve this, they only perform the most important optimizations so that it's really worth spending time doing them. Code layout optimization is not one of them because without profiling, it may not be beneficial. While a JITter can certainly perform profiling and dynamic optimization, there is a generally preferred way.
How much difference can this make in practice to common code? What about perfectly suited cases?
Code layout optimization by itself can improve overall performance typically by -1% (negative one) to 4%, which is enough to make compiler writers happy. I would like to add that it reduces energy consumption indirectly by reducing cache misses. The reduction in miss ratio of the instruction cache can be typically up to 35%.
Is there anything the developer can do to influence this layout? What about mangling with the forbidden goto?
Yes, there are numerous ways. I would like to mention the generally recommended one which is mpgo.exe. Please do not use goto for this purpose. It's forbidden.
Does the specific JIT being used make much difference to layout?
No.
Does the method inlining heuristic come into play here too?
Inlining can indeed improve code layout with respect to function calls. It's one of the most important optimizations and all .NET JITs perform it.
Moving catch blocks out of line is an easy job, as they're supposed to be the exceptional case by definition. Not sure this happens.
Yes it might be "easy", but what is the potential gained benefit? catch blocks are typically small in size (containing a call to a function that handles the exception). Handling this particular case of code layout does not seem promising. If you really care, use mpgo.exe.
I don't know how the JIT decides the order of generated code. In C on Linux you have likely(cond) and unlikely(cond) which you can use to tell to the compiler which branch is the common path to optimise for.
Using PGO is much more preferable over using likely(cond) and unlikely(cond) for two reasons:
The programmer might inadvertently make mistakes while placing likely(cond) and unlikely(cond) in the code. It actually happens a lot. Making big mistakes while trying to manually optimize the code is very typical.
Adding likely(cond) and unlikely(cond) all over the code makes it less maintainable in the future. You'll have to make sure that these hints hold every time you change the source code. In large code bases, this could be (or rather is) a nightmare.
Instruction ordering is distinct from the problem of branch prediction...
Assuming you are talking about code layout, yes they are distinct. But code layout optimization is usually guided by a profile which really includes branch statistics. Hardware branch prediction is of course totally different.
Maybe I'll go and have a read about LLILC.
While using mpgo.exe is the mainstream way of performing this optimization, you can also use LLILC, since LLVM supports profile-guided optimization as well. But I don't think you need to go this far.

Does passing a control as a parameter cause a significant performance hit?

What is better/preferred - Creating a method of 2 lines which accepts a web control as a parameter, operates on it and is called from 3-4 places within the same code file or writing those 2 lines at the 3-4 places and not creating the method?
P.S. The control I am referring here is a textbox.
All it is passing is a reference. There will be no significant cost to this whatsoever. If the method is small and linear, the JIT may even choose to inline it - but ultimately, this is not going to make any difference.
Stick with the method approach - then you only have one place to maintain.
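A minimal sketch of the method approach follows. FakeTextBox is a hypothetical stand-in for the web TextBox so that the example runs without ASP.NET, and ResetAndDisable is an invented two-line helper. The point is only that the call passes a reference to the same control object, not a copy of it.

```csharp
using System;

// Hypothetical stand-in for a web TextBox (assumption, not the real control).
public sealed class FakeTextBox
{
    public string Text { get; set; } = "";
    public bool Enabled { get; set; } = true;
}

public static class FormHelpers
{
    // The two-line helper: 'box' is a reference to the caller's object,
    // so the changes are visible to the caller and nothing is copied.
    public static void ResetAndDisable(FakeTextBox box)
    {
        box.Text = string.Empty;
        box.Enabled = false;
    }

    public static void Main()
    {
        var nameBox = new FakeTextBox { Text = "abc" };
        var emailBox = new FakeTextBox { Text = "a@b.c" };

        // Called from several places; the logic lives in one place only.
        ResetAndDisable(nameBox);
        ResetAndDisable(emailBox);

        Console.WriteLine(nameBox.Text.Length + " " + nameBox.Enabled);
    }
}
```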
For maintainability it is better to break out the lines to a method.
Performance wise you will not notice any difference at all.
Unless you're working on code that needs to run in a nuclear power plant or a NASA rover on Mars, you're always better off writing code that is easier for YOU to maintain! And that means refactoring your code so you never repeat yourself.
Theoretically it is of course faster to have the instructions inline and not call a method, but in practice the cost of maintaining the duplicated code far outweighs that gain.
The DRY (Don't Repeat Yourself) principle says that you create the method and call it the 3-4 times. The performance hit is so minimal that it really isn't worth thinking about unless the consequence of those billionths of a second outweighs the additional overhead of maintaining the code 3-4 times over.
In short, unless you can really, really, really justify the additional maintenance as the price of grabbing every last processor cycle*, create the method.
(*) Given this is a textbox and therefore most likely a business application your biggest performance worries more likely are databases and webservices.
Careful where the method that modifies the control resides. If the method is part of another class, then passing controls to it will break the encapsulation of the class that owns the control.

If reflection is inefficient, when is it most appropriate?

I find a lot of cases where I think to myself that I could use reflection to solve a problem, but I usually don't because I hear a lot along the lines of "don't use reflection, it's too inefficient".
Now I'm in a position where I have a problem where I can't find any other solution than to use reflection with new T(), as outlined in this question & answer.
So I'm wondering if somebody can tell me reflection's specific intended usage, and if there's a set of guidelines to indicate when it's appropriate and when it isn't?
It is often "fast enough", and if you need faster (for tight loops etc) you can do meta-programming with Expression or ILGenerator (perhaps via DynamicMethod), to make extremely fast code (including some tricks you can't do in C#).
Reflection is more commonly used for framework/library scenarios, where the library by definition knows nothing about the caller, and must work based on configuration, attributes or patterns.
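As a rough illustration of the Expression route mentioned above, the sketch below builds and compiles a property getter once and then reuses it as an ordinary delegate. Person and GetterFactory are hypothetical names introduced for this example only.

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical type used only for the demonstration.
public sealed class Person
{
    public string Name { get; set; }
}

public static class GetterFactory
{
    // Builds (object target) => (object)((T)target).Property and compiles it.
    // Compiling is expensive, so do it once and cache the resulting delegate.
    public static Func<object, object> Compile(Type type, string propertyName)
    {
        PropertyInfo property = type.GetProperty(propertyName);
        if (property == null)
        {
            throw new ArgumentException("No such public property: " + propertyName);
        }

        ParameterExpression target = Expression.Parameter(typeof(object), "target");
        Expression body = Expression.Convert(
            Expression.Property(Expression.Convert(target, type), property),
            typeof(object));

        return Expression.Lambda<Func<object, object>>(body, target).Compile();
    }

    public static void Main()
    {
        Func<object, object> getName = Compile(typeof(Person), "Name");
        Console.WriteLine(getName(new Person { Name = "Ada" }));
    }
}
```

Once compiled, calling getName costs about as much as any other delegate call, which is why this pattern suits tight loops over objects whose type is only known at runtime.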
If there's one thing that I hate hearing it's "don't use reflection, it's too inefficient".
Too inefficient for what? If you're writing a console application that's run once a month and isn't time critical, does it really matter if it takes 30 seconds instead of 28, because of you using reflection?
Guidelines for when it's inappropriate to use are ones that only you can really put together as they're heavily dependent on what you're doing and how efficient/performant alternatives are.
A useful abstraction for code efficiency is to partition it in three categories of time, each about 3 orders of magnitude apart.
First is human-time. There's a lot you can do when you only need to keep a person happy with the performance of your code. Humans cannot perceive the difference between code that needs 10 milliseconds or 20 milliseconds, both look instant. And a human is forgiving when a program needs 6 seconds instead of 5, roughly 3 billion machine instructions more. Common examples of programs that run at human-time are compilers and point-and-click designers. Using reflection is never a problem.
Then there is I/O-time. When your program needs to hit the disk or the network. I/O is slow, restricted by mechanical motion in the case of the disk, bandwidth and latency in the case of a network. You can always tell when I/O is the bottleneck, your program is running but it isn't driving up the CPU load much. The operating system is constantly blocking the thread, making it wait until the I/O request is complete.
Reflection operates at I/O-time. To retrieve type data, the CLR must read the assembly metadata. And when that wasn't done before, your program will cause a page-fault, requiring the operating system to read the data from disk. What follows is that, roughly, reflection can make I/O bound code only twice as slow. Usually better because after the first perf hit, the metadata is cached and can be retrieved a lot quicker. Reflection is thus often an acceptable trade-off. The canonical examples are serialization and database ORMs.
Then there's machine-time. The raw performance of a CPU core is stupendous. A property getter can execute in somewhere between 0 and 1/2 a nanosecond. This does not compare favorably with, say, PropertyInfo.GetValue(). Both will keep the CPU busy, you'll see the CPU load for the core at 100%. But GetValue() costs hundreds if not thousands of machine code instructions. Not counting the time needed to page in the metadata. While that is not much time per call, it builds up fast when you loop.
If you cannot classify your reflection code in the human-time or I/O-time categories then reflection is unlikely to be an appropriate substitute for regular code.
The key to keeping reflection from slowing down your program is to not use it inside a loop. If you want to read a property from an object during startup (happens once), use reflection. If you want to read a property from a list of 10,000 objects of unknown type, use reflection to get the property getter delegate once (search term: PropertyInfo.GetGetMethod), then call the delegate 10,000 times. There are plenty of examples of this on StackOverflow.
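Here is a minimal sketch of that "reflect once, call many times" pattern using PropertyInfo.GetGetMethod and Delegate.CreateDelegate. The Reading class and SumValues method are hypothetical, and the element type is known at compile time here purely to keep the example short.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical element type for the demonstration.
public sealed class Reading
{
    public int Value { get; set; }
}

public static class CachedGetterDemo
{
    public static int SumValues(IEnumerable<Reading> items)
    {
        // Reflection happens once, outside the loop.
        MethodInfo getter = typeof(Reading).GetProperty("Value").GetGetMethod();

        // Open instance delegate: the first argument is the object to read from.
        var getValue = (Func<Reading, int>)Delegate.CreateDelegate(
            typeof(Func<Reading, int>), getter);

        int sum = 0;
        foreach (Reading item in items)
        {
            sum += getValue(item); // plain delegate call, no reflection here
        }
        return sum;
    }

    public static void Main()
    {
        var readings = new List<Reading>();
        for (int i = 0; i < 10000; i++)
        {
            readings.Add(new Reading { Value = 1 });
        }
        Console.WriteLine(SumValues(readings));
    }
}
```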
Reflection is not inefficient. It is less efficient than direct calls. So personally I use reflection when there's no equivalent compile-time-safe method. IMHO the problem with reflection is not so much the efficiency but the fragility of the code, as it uses magic strings which are very refactor-unfriendly.
I use it for plugin architecture - looking through assemblies in the plugin folder for methods marked with a custom attribute indicating info about the plugin - and in a logging framework. The framework detects a custom attribute on the assembly itself which holds information about the author of the assembly, the project, version information, and other tags that are logged along with everything in the stack trace.
Going to give away a 'trade secret', but it's a good one. The framework allows you to tag each method or class with a 'Story ref', e.g.
[StoryRef(Ref="ImportCSV1")]
...and the idea is it would integrate into our agile project management framework: if there were any exceptions thrown within that class/method, the logging method would use reflection to check for a StoryRef attribute in the stack trace, and if so that would be logged as an exception against that story. In the PM software you could see exceptions by Story (a story is like an extreme/agile use case).
I think that's a valid use, at least! Basically, when it just seems the most neat, and appropriate way to do it, I use reflection. Nothing else really comes into it - I can't think of an occasion you'd be using reflection to make that many calls that efficiency would come into it.
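The answer does not show the framework itself, so the following is only a guess at how such a lookup could work. StoryRefAttribute mirrors the usage shown above; StoryLogger, FindStoryRef and Importer are hypothetical names. The helper walks the stack frames of an exception and returns the first story reference it finds.

```csharp
using System;
using System.Diagnostics;
using System.Reflection;
using System.Runtime.CompilerServices;

[AttributeUsage(AttributeTargets.Class | AttributeTargets.Method)]
public sealed class StoryRefAttribute : Attribute
{
    public string Ref { get; set; }
}

public static class StoryLogger
{
    // Returns the Ref of the first StoryRef found on a method (or its
    // declaring class) in the exception's stack trace, or null if none.
    public static string FindStoryRef(Exception ex)
    {
        StackFrame[] frames = new StackTrace(ex).GetFrames() ?? new StackFrame[0];
        foreach (StackFrame frame in frames)
        {
            MethodBase method = frame.GetMethod();
            if (method == null) continue;

            StoryRefAttribute attr = method.GetCustomAttribute<StoryRefAttribute>();
            if (attr == null && method.DeclaringType != null)
            {
                attr = method.DeclaringType.GetCustomAttribute<StoryRefAttribute>();
            }
            if (attr != null) return attr.Ref;
        }
        return null;
    }
}

public static class Importer
{
    // NoInlining keeps this method visible as its own stack frame.
    [StoryRef(Ref = "ImportCSV1")]
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static void ImportCsv()
    {
        throw new InvalidOperationException("bad row");
    }

    public static void Main()
    {
        try
        {
            ImportCsv();
        }
        catch (Exception ex)
        {
            Console.WriteLine(StoryLogger.FindStoryRef(ex));
        }
    }
}
```

This runs only on the exception path, so the cost of reflection here is unlikely to matter.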
So I'm wondering if somebody can tell me reflection's specific intended usage, and if there's a set of guidelines to indicate when it's appropriate and when it isn't?
A bad example of reflection is this one from Wikipedia:
//Without reflection
Foo foo = new Foo();
foo.Hello();
//With reflection
Type t = Type.GetType("FooNamespace.Foo");
object foo = Activator.CreateInstance(t);
t.InvokeMember("Hello", BindingFlags.InvokeMethod, null, foo, null);
Here, there is no advantage to using reflection: The non-reflection-using code is not only more efficient, but easier to understand.
Good uses of reflection are things like serialization and object-relational mapping, which are easy to implement if you have a list of a class's properties, but otherwise require a custom-written function for each class.
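A minimal sketch of the serialization case: one generic routine walks the public properties of any object, instead of a hand-written function per class. Customer and SimpleSerializer are hypothetical names, and the key=value format is invented for the example.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical class to serialize.
public sealed class Customer
{
    public string Name { get; set; }
    public int Age { get; set; }
}

public static class SimpleSerializer
{
    // Writes every public instance property as name=value, joined by ';'.
    // Properties are sorted by name because GetProperties guarantees no order.
    public static string ToKeyValueString(object obj)
    {
        var parts = obj.GetType()
            .GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .OrderBy(p => p.Name, StringComparer.Ordinal)
            .Select(p => p.Name + "=" + p.GetValue(obj));
        return string.Join(";", parts);
    }

    public static void Main()
    {
        var customer = new Customer { Name = "Ada", Age = 42 };
        Console.WriteLine(ToKeyValueString(customer)); // prints "Age=42;Name=Ada"
    }
}
```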

Why doesn't .NET have a SoftReference as well as a WeakReference, like Java?

I really love WeakReferences. But I wish there was a way to tell the CLR (say, on a scale of 1 to 5) how weak you consider the reference to be. That would be brilliant.
Java has SoftReference, WeakReference and I believe also a third type called a "phantom reference". That's 3 levels right there which the GC has a different behaviour algorithm for when deciding if that object gets the chop.
I am thinking of subclassing .NET's WeakReference (luckily and slightly bizarrely it isn't sealed) to make a pseudo-SoftReference that is based on an expiration timer or something.
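One way such a subclass might look is sketched below. This is not an established pattern, just an assumption of what "based on an expiration timer" could mean: ExpiringReference and ReleaseIfExpired are hypothetical names, and the caller (or a timer) is expected to invoke ReleaseIfExpired from time to time.

```csharp
using System;

// Hypothetical pseudo-SoftReference: holds the target strongly until a
// deadline passes, after which only the inherited weak reference remains.
public class ExpiringReference : WeakReference
{
    private object _strong;
    private readonly DateTime _expiresUtc;

    public ExpiringReference(object target, TimeSpan lifetime) : base(target)
    {
        _strong = target;
        _expiresUtc = DateTime.UtcNow + lifetime;
    }

    // Drops the strong reference once the lifetime has elapsed.
    // Returns true when the target is held only weakly.
    public bool ReleaseIfExpired()
    {
        if (_strong != null && DateTime.UtcNow >= _expiresUtc)
        {
            _strong = null;
        }
        return _strong == null;
    }

    public static void Main()
    {
        var cacheEntry = new ExpiringReference(new byte[1024], TimeSpan.FromMinutes(5));
        Console.WriteLine(cacheEntry.ReleaseIfExpired()); // still strongly held
        Console.WriteLine(cacheEntry.IsAlive);
    }
}
```

Unlike a Java SoftReference, this reacts to time, not to memory pressure, so it is a cache-expiry policy and not a true equivalent.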
I believe the fundamental reason that .NET does not have soft references is that it can rely on an operating system with virtual memory. A Java process must specify its maximum OS memory (e.g. with -Xmx128M), and it never takes more OS memory than that. A .NET process, in contrast, keeps taking the OS memory that it needs, which the OS supplies with disk-backed virtual memory when RAM runs out. If .NET allowed soft references, then the .NET runtime would not know when to release them unless it either peeked deep into the OS to see if its memory is actually paged to disk (a nasty OS/CLR dependency), or it required the user to specify a maximum process memory footprint (e.g. an equivalent of -Xmx). I guess that Microsoft does not want to add -Xmx to .NET because they think the OS should decide how much RAM each process gets (by choosing which virtual memory pages to hold in RAM or on disk), and not the process itself.
Java SoftReferences are used in the creation of memory sensitive caches (they serve no other purpose).
As of .NET 4, .NET has a class System.Runtime.Caching.MemoryCache which will probably meet any such needs.
Having a WeakReference with varying levels of weakness (priority) sounds nice, but also might make the GC's job harder, not easier. (I've no idea about the GC internals, but) I would assume there are some sort of additional access statistics that are kept for WeakReference objects so that the GC can clean them up efficiently (e.g. it might get rid of the least-used items first).
More than likely the added complexity wouldn't make anything any more efficient because the most efficient way is to get rid of infrequently used WeakReferences first. If you could assign a priority, how would you do it? This smells like a premature optimization: the programmer doesn't really know most of the time and is guessing; the result is a slower GC collection cycle that is probably reclaiming the wrong objects.
It begs the question though, that if you care about the WeakReference.Target object being reclaimed, is it really a good use of WeakReference?
It's like a cache. You shove stuff into the cache and ask the cache to make it stale after x minutes, but most caches never guarantee to keep it around at all. It just guarantees that if it does, it will expire it according to the policy requested.
My guess as to why this isn't there already would be simplicity. Most people, I think, would call it a virtue that there is only one type of reference, not four.
Maybe the ASP.NET Cache class (System.Web.Caching.Cache) might help achieve what you want? It automatically removes objects if memory gets low:
ASP.NET Caching Overview
Here's an article that shows how to use the Cache class in a windows forms application.
quoted from: Equivalent to SoftReference in .net?
Don't forget that you also have your standard references (the ones that you use on a daily basis). This gives you one more level.
WeakReferences should be used when you don't really care if the object goes away, while SoftReferences really only should be used when you would use a normal reference, but you would rather your object be cleared than for you to run out of memory. I'm not sure on the specifics, but I suspect that the GC normally traces through SoftReferences but not WeakReferences when determining which objects are live, but when running low on memory will also skip the SoftReferences.
My guess is that the .NET designers felt that the difference was confusing to most people and/or that SoftReferences add more complexity than they really wanted, and so decided to leave them out.
As a side note, AFAIK PhantomReferences are mostly designed for internal use by the virtual machine and are not intended for actual client use.
Maybe there should be a property where you can specify which generation the object must reach (>=) before it is collected. So if you specify 1, it is the weakest possible reference. But if you specify 3, it would need to survive at least 3 prior collections before it can be considered for collection itself.
I thought the track resurrection flag was no good for this because by that time the object has already been finalized? I may be wrong though...
(PS: I am the OP, just signed up. PITA that it doesn't inherit your history from "unregistered" accounts.)
Looking for the 'trackResurrection' option passed to the constructor perhaps?
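For reference, this is how the two kinds of weak reference are created with that constructor argument. WeakReferenceKinds and Describe are hypothetical helpers. Neither kind keeps the target alive: a long weak reference merely keeps tracking the object after its finalizer has run.

```csharp
using System;

public static class WeakReferenceKinds
{
    // Reports which kind of weak reference was created.
    public static string Describe(WeakReference reference)
    {
        return reference.TrackResurrection ? "long" : "short";
    }

    public static void Main()
    {
        var payload = new object();

        // Short weak reference: Target becomes null once the object is reclaimed.
        var shortRef = new WeakReference(payload, trackResurrection: false);

        // Long weak reference: still tracked after the finalizer has been called.
        var longRef = new WeakReference(payload, trackResurrection: true);

        Console.WriteLine(Describe(shortRef) + ", " + Describe(longRef));

        // Keep the payload reachable until this point so both references are alive.
        GC.KeepAlive(payload);
    }
}
```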
The GC class also offers some assistance.
I don't know why .NET does not have SoftReferences.
But in Java, SoftReferences are IMHO overused. The reason is that, at least in an application server, you would want to be able to influence per application how long your SoftReferences live. That's currently not possible in Java.
