I came across the ref and this parameter keywords lately, and I can't find any good comparison performance-wise.
void main()
{
int original = 1;
/* Which one is best? */
original = DoMaths(original, 2);
DoMaths(ref original, 2);
original = original.DoMaths(2);
}
int DoMaths(int original, int arg)
{
return original * arg;
}
void DoMaths(ref int original, int arg)
{
original *= arg;
}
int DoMaths(this int original, int arg)
{
return original * arg;
}
I came across the ref and this parameter keywords lately, and I can't find any good comparison performance-wise
You found no good comparison because no meaningful comparison is possible.
The this keyword, as used in your code example, has literally no effect on runtime performance. It is strictly a marker signaling to the compiler that the method being declared is an extension method. The code you posted won't even compile (extension methods are required to be static, and they are legal only in a class that is also static), but when they are declared and used correctly, they affect only the compile-time semantics of the code, allowing you to write the call to the method as if it were an instance method called on the first parameter (the one marked with this).
Otherwise, it's exactly like calling any other static method, and has the exact same performance characteristics. And comparing performance between a static method and one using a by-reference parameter (whether via ref or out) is also meaningless. The two are not even mutually exclusive.
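For reference, here is a version of the question's DoMaths extension that does compile (the class name is illustrative):

    // Extension methods must be static and declared in a static,
    // non-nested class.
    public static class IntExtensions
    {
        public static int DoMaths(this int original, int arg)
        {
            return original * arg;
        }
    }

    // Both call forms below compile to the very same static method call:
    // int a = 3.DoMaths(2);                  // extension syntax
    // int b = IntExtensions.DoMaths(3, 2);   // ordinary static call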
Even if you were to conceive of a question comparing the use of ref to some other otherwise-comparable scenario, it is generally impossible for anyone else to answer the question in a useful way, especially without a good, minimal, complete code example that clearly illustrates the question. Even with such an example, a useful answer is unlikely because such a code example is necessarily divorced from the real-world context in which the actual performance is presumably a concern. I.e. any answer would be primarily of academic value.
In a forum like this, the best and only reasonable answer to a question that takes the form of "which of these performs better?" is "which performed better when you tested it in your real-world scenario?"
Finally, note that passing by-reference significantly changes the semantics of the method. As such, even where performance is a concern and even if it actually happens that passing by-reference improves performance, one should be extremely wary of using ref solely as a performance-enhancement measure. It is extremely unusual for a single method call to be a significant performance bottleneck, and even less likely that passing by-reference could produce a significant enough difference in performance as compared to a semantically-correct implementation to justify harming the expressiveness of the code in that way.
Related
Perhaps this is a bit naive of me, but I can't really seem to find/think of a decent use-case for "pass by reference". Changing an immutable string (as some other Q/As have mentioned) is generally avoidable and returning multiple variables is generally better handled by returning a Tuple, List, array, etc.
The example on MSDN is terrible in my opinion; I would simply return a value from the Square method instead of declaring it void.
It seems to me like it's a bit of a legacy part of C#, rather than an integral part of it. Can someone smarter than me try to explain why it's still around and/or give some real-world use cases that are actually practical? (E.g., changing an immutable string is avoidable in almost every case.)
P.S.: I followed up on some of the comments by @KallDrexx and @newacct. I see now that they were right and I was wrong: my answer was somewhat misleading. The excellent article "Java is pass-by-value, dammit!" by Scott Stanchfield (Java-specific, but still mostly relevant to C#) finally convinced me.
I'll leave the misleading bits of my answer struck through for now, but might remove them later.
Pass by reference is not just used with ref or out parameters. ~~More importantly, all reference types are passed by reference (thus their name), although this happens transparently.~~
Here are three frequent use cases for pass-by-reference:
Prevent copying of large structs when passing them around. Imagine you have a ~~byte[] array representing a binary large object (BLOB), possibly a few megabytes in size~~ value of some struct type that contains lots of fields. A value of that type might potentially occupy quite a lot of memory. Now you want to pass this value to some method. Do you really want to pass it by value, i.e. create a temporary copy?
You can avoid unnecessary copying of large structs by passing them by reference (see the sketch after this list).
~~(Luckily for us, arrays such as byte[] are reference types, so the array's contents are already passed by reference.)~~
It is often suggested (e.g. in Microsoft's Framework Design Guidelines) that types with value-type semantics should be implemented as reference types if they exceed a certain size (16 bytes), so this use case should not be very frequent.
Mutability. If you want a method to be able to mutate a struct value that is passed to it, and you want the caller to observe the mutation of their version of that object, then you need pass-by-reference (ref). If the value is passed to the method by value, the method receives a copy; mutating the copy leaves the original object unmodified.
This point is also mentioned in the Framework Design Guidelines article linked to above.
Note the widespread recommendation against mutable value types (See e.g. "Why are mutable structs evil?"). You should rarely have to use ref or out parameters together with value types.
COM interop, as mentioned in this answer, often requires you to declare ref and out parameters.
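To make the first two use cases concrete, here is the sketch referred to above; BigStruct and all member names are illustrative, not from any real API:

    using System;

    // Hypothetical large struct; per the guidelines above, such a type
    // should usually be a class instead.
    struct BigStruct
    {
        public long A, B, C, D, E, F, G, H;   // 64 bytes of fields
    }

    static class RefDemo
    {
        // Use case 1: pass by reference to avoid copying 64 bytes per call.
        // (The method only reads the value.)
        static long Sum(ref BigStruct value)
        {
            return value.A + value.B + value.C + value.D
                 + value.E + value.F + value.G + value.H;
        }

        // Use case 2: pass by reference so the caller observes the mutation.
        static void Reset(ref BigStruct value)
        {
            value = new BigStruct();   // the caller's variable is overwritten
        }

        static void Main()
        {
            var big = new BigStruct { A = 1, B = 2 };
            Console.WriteLine(Sum(ref big));   // no copy of 'big' is made
            Reset(ref big);
            Console.WriteLine(big.A);          // prints 0: the caller saw the mutation
        }
    }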
Suppose you want to have a function that mutates a value (notice: a value, not an object), and you also want that function to return some success indicator. A good practice would be to return a boolean indicating success or failure; but what about the value? So you use ref:
bool Mutate(ref int val)
{
if(val > 0)
{
val = val * 2;
return true;
}
return false;
}
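Usage might look like this; the caller's variable is updated in place only when the method reports success:

    int val = 21;
    bool ok = Mutate(ref val);    // ok == true, val == 42

    int neg = -3;
    bool ok2 = Mutate(ref neg);   // ok2 == false, neg unchanged (-3)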
It's true that in C# there are usually alternatives to ref and out - for example, if you want to return more than one value to the caller, you could return a tuple, or a custom type, or receive a reference type as a parameter and change multiple values inside it.
However, these keywords can still be a convenient solution in situations like interop:
// C
int DoSomething(int input, int *output);
// C#
[DllImport(...)]
static extern int DoSomething(int input, ref int output);
There are indeed only a few cases where explicit ref parameters are useful from a functional point of view.
One example I have seen:
public static void Swap<T>(ref T a, ref T b)
{
var temp = a;
a = b;
b = temp;
}
Such a method is useful in some sorting algorithms, for example.
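For illustration, usage might look like this:

    int a = 1, b = 2;
    Swap(ref a, ref b);                  // a == 2, b == 1

    var items = new[] { "x", "y" };
    Swap(ref items[0], ref items[1]);    // array elements can be passed by ref, too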
One reason why ref parameters are sometimes used is as an optimization technique. A large struct (value type) is sometimes passed by ref, even if the called method has no intent to modify the struct's value, to avoid copying the struct's contents. You could argue that a large struct would better be a class (i.e. a reference type), but that has disadvantages when you keep large arrays of such objects around (for example in graphics processing). Being an optimization, such a technique does not improve code readability, but it improves performance in some situations.
A similar question could be asked about pointers. C# has support for pointers, through the unsafe and fixed keywords. They are hardly ever needed from a functional point of view: I can't really think of a function that could not be coded without the use of pointers. But pointers are not used to make the language more expressive or the code more readable, they are used as a low level optimization technique.
In fact, passing a struct by reference really is a safe way to pass a pointer to the struct's data, a typical low level optimization technique for large structs.
Is a feature that enables low level optimizations a legacy feature? That may depend on your point of view, but you aren't the only C# user. Many of them will tell you that the availability of pointers and other low level constructs is one of the big advantages C# has over some other languages. I for one don't think that ref parameters are a legacy feature, they are an integral part of the language, useful in some scenarios.
That does not mean that you have to use ref parameters, just like you don't have to use LINQ, async/await or any other language feature. Just be happy it's there for when you do need it.
I have a static C# class that exposes some pre-known Guids via properties.
I used the old-fashioned property syntax: a private field that holds the Guid and a public property that exposes it, e.g. something like
private static Guid aGuid = new Guid("11....");
public static Guid AGuid { get { return aGuid; } }
But it would be less code just to construct the value inside the getter, e.g. something like
public static Guid AGuid { get { return new Guid("11..."); } }
The question is: is the former, more verbose method more efficient, or is the C# compiler clever enough not to create a new Guid every time if I use the latter approach?
Caching (your 1st sample) would be a little more efficient here.
The compiler won't do this as an optimization.
I just did a quick check with Ildasm.
It's not an unreasonable assumption; we all know the Guid constructor should yield the same value every time. But the compiler cannot assume this: there would have to be a list of special types, and it's probably not worth the trouble.
Necessary preamble:
Like most micro optimizations, the answer is probably, "who cares?" It's unlikely to be the thing that's actually slowing down your program.
This is an easy thing to do a quick empirical test. However, like all empirical tests, you should be skeptical of your results. The answer may be dependent upon a lot of unobserved or unobservable confounds.
Henk Holterman is correct that the IL will appear to be more efficient in the first form. The constructor that takes a string has to do some parsing. Parsing isn't slow, but it's not free either. It might be possible to improve performance easily by using a Guid constructor that takes a series of integer arguments instead of a string.
Also, Guid is a value type. This means that no matter what, you'll have to pay the cost of copying the memory when you return the Guid (yeah, I know, it's a really steep cost to copy 16 bytes).
Finally, and I'm getting a bit speculative here, it's entirely possible that the JITer can recognize side-effect free constructors and unroll them at JIT time into "plain old data." If this is the case then the total cost may be just a 16 byte memcpy, which is unavoidable anyway.
So... write it both ways, test it both ways in a variety of conditions. You'll likely find it doesn't matter at all, but you also might find a small performance boost.
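If you do want to run that quick empirical test, a minimal sketch might look like the following; the class name and the Guid string are placeholders, not values from the question:

    using System;
    using System.Diagnostics;

    static class GuidBenchmark
    {
        // Cached form: the string is parsed exactly once, at type initialization.
        private static readonly Guid cached = new Guid("8c7a1f7e-1111-2222-3333-444455556666");
        public static Guid Cached { get { return cached; } }

        // Uncached form: the string is parsed on every property access.
        public static Guid Constructed
        {
            get { return new Guid("8c7a1f7e-1111-2222-3333-444455556666"); }
        }

        static void Main()
        {
            const int iterations = 10000000;
            int acc = 0;

            var sw = Stopwatch.StartNew();
            for (int i = 0; i < iterations; i++) acc ^= Cached.GetHashCode();
            Console.WriteLine("Cached:      " + sw.ElapsedMilliseconds + " ms");

            sw.Restart();
            for (int i = 0; i < iterations; i++) acc ^= Constructed.GetHashCode();
            Console.WriteLine("Constructed: " + sw.ElapsedMilliseconds + " ms");

            Console.WriteLine(acc); // keep 'acc' live so the loops aren't optimized away
        }
    }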
Eric Lippert told me I should "try to always make value types immutable", so I figured I should try to always make value types immutable.
But, I just found this internal mutable struct, System.Web.Util.SimpleBitVector32, in the System.Web assembly, which makes me think that there must be a good reason for having a mutable struct. I'm guessing they did it this way because it performed better under testing, and they kept it internal to discourage its misuse. However, that's speculation.
I've C&P'd the source of this struct. What is it that justifies the design decision to use a mutable struct? In general, what sort of benefits can be gained by the approach and when are these benefits significant enough to justify the potential detriments?
[Serializable, StructLayout(LayoutKind.Sequential)]
internal struct SimpleBitVector32
{
private int data;
internal SimpleBitVector32(int data)
{
this.data = data;
}
internal int IntegerValue
{
get { return this.data; }
set { this.data = value; }
}
internal bool this[int bit]
{
get {
return ((this.data & bit) == bit);
}
set {
int data = this.data;
if (value) this.data = data | bit;
else this.data = data & ~bit;
}
}
internal int this[int mask, int offset]
{
get { return ((this.data & mask) >> offset); }
set { this.data = (this.data & ~mask) | (value << offset); }
}
internal void Set(int bit)
{
this.data |= bit;
}
internal void Clear(int bit)
{
this.data &= ~bit;
}
}
Given that the payload is a 32-bit integer, I'd say this could easily have been written as an immutable struct, probably with no impact on performance. Whether you're calling a mutator method that changes the value of a 32-bit field, or replacing a 32-bit struct with a new 32-bit struct, you're still doing the exact same memory operations.
Probably somebody wanted something that acted kind of like an array (while really just being bits in a 32-bit integer), so they decided they wanted to use indexer syntax with it, instead of a less-obvious .WithTheseBitsChanged() method that returns a new struct. Since it wasn't going to be used directly by anyone outside MS's web team, and probably not by very many people even within the web team, I imagine they had quite a bit more leeway in design decisions than the people building the public APIs.
So, no, probably not that way for performance -- it was probably just some programmer's personal preference in coding style, and there was never any compelling reason to change it.
If you're looking for design guidelines, I wouldn't spend too much time looking at code that hasn't been polished for public consumption.
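For illustration, the immutable alternative alluded to above might look like the following sketch; it is not the actual framework code:

    // Every "mutation" returns a new 32-bit struct instead of updating
    // the field in place; the memory traffic is the same 32 bits.
    internal struct ImmutableBitVector32
    {
        private readonly int data;

        internal ImmutableBitVector32(int data)
        {
            this.data = data;
        }

        internal bool this[int bit]
        {
            get { return (this.data & bit) == bit; }
        }

        internal ImmutableBitVector32 Set(int bit)
        {
            return new ImmutableBitVector32(this.data | bit);
        }

        internal ImmutableBitVector32 Clear(int bit)
        {
            return new ImmutableBitVector32(this.data & ~bit);
        }
    }

    // Usage: bits = bits.Set(0x4);  -- still just a 32-bit copy.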
Actually, if you search for all classes containing BitVector in the .NET framework, you'll find a bunch of these beasts :-)
System.Collections.Specialized.BitVector32 (the sole public one...)
System.Web.Util.SafeBitVector32 (thread safe)
System.Web.Util.SimpleBitVector32
System.Runtime.Caching.SafeBitVector32 (thread safe)
System.Configuration.SafeBitVector32 (thread safe)
System.Configuration.SimpleBitVector32
And if you look here, where the SSCLI (Microsoft Shared Source CLI, aka ROTOR) source of System.Configuration.SimpleBitVector32 resides, you'll find this comment:
//
// This is a cut down copy of System.Collections.Specialized.BitVector32. The
// reason this is here is because it is used rather intensively by Control and
// WebControl. As a result, being able to inline this operations results in a
// measurable performance gain, at the expense of some maintainability.
//
[Serializable()]
internal struct SimpleBitVector32
I believe this says it all. I think the System.Web.Util one is more elaborate but built on the same grounds.
SimpleBitVector32 is mutable, I suspect, for the same reasons that BitVector32 is mutable. In my opinion, the immutability guideline is just that, a guideline; however, one should have a really good reason for breaking it.
Consider, also, the Dictionary<TKey, TValue> - I go into some extended details here. The dictionary's Entry struct is mutable - you can change TValue at any time. But, Entry logically represents a value.
Mutability must make sense. I agree with @JoeWhite: somebody wanted "something that acted kind of like an array (while really just being bits in a 32-bit integer)"; also that both BitVector structs "could easily have been ... immutable".
But, as a blanket statement, I disagree with "it was probably just some programmer's personal preference in coding style" and lean more toward "there was never [nor is there] any compelling reason to change it". Simply know and understand the responsibility of using a mutable struct.
Edit
For the record, I do heartily agree that you should always try to make a struct immutable. If you find that requirements dictate member mutability, revisit the design decision and get peers involved.
Update
I was not initially confident in my assessment of performance when comparing a mutable value type to an immutable one. However, as @David points out, Eric Lippert writes this:
There are times when you need to wring every last bit of performance out of a system. And in those scenarios, you sometimes have to make a tradeoff between code that is clean, pure, robust, understandable, predictable, modifiable and code that is none of the above but blazingly fast.
I bolded pure because a mutable struct does not fit the pure ideal that a struct should be immutable. There are side effects to writing a mutable struct: understandability and predictability are compromised, as Eric goes on to explain:
Mutable value types ... behave in a manner that many people find deeply counterintuitive, and thereby make it easy to write buggy code (or correct code that is easily turned into buggy code by accident.) But yes, they are real fast.
The point Eric is making is that you, as the designer and/or developer need to make a conscious and informed decision. How do you become informed? Eric explains that also:
I would consider coding up two benchmark solutions -- one using mutable structs, one using immutable structs -- and run some realistic user-scenario-focused benchmarks. But here's the thing: do not pick the faster one. Instead, decide BEFORE you run the benchmark how slow is unacceptably slow.
We know that altering a value type is faster than creating a new value type; but considering correctness:
If both solutions are acceptable, choose the one that is clean, correct and fast enough.
The key is being fast enough to offset the side effects of choosing mutable over immutable. Only you can determine that.
Using a struct for a 32- or 64-bit vector as shown here is reasonable, with a few caveats:
I would recommend using an Interlocked.CompareExchange loop when performing any updates to the structure, rather than just using the ordinary Boolean operators directly. If one thread tries to write bit 3 while another tries to write bit 8, neither operation should interfere with the other beyond delaying it a little bit. Use of an Interlocked.CompareExchange loop will avoid the possibility of errant behavior (thread 1 reads value, thread 2 reads old value, thread 1 writes new value, thread 2 writes value computed based on old value and undoes thread 1's change) without needing any other type of locking (see the sketch after these caveats).
Structure members, other than property setters, which modify "this" should be avoided. It's better to use a static method which accepts the structure as a reference parameter. Invoking a structure member which modifies "this" is generally identical to calling a static method which accepts the member as a reference parameter, both from a semantic and performance standpoint, but there's one key difference: If one tries to pass a read-only structure by reference to a static method, one will get a compiler error. By contrast, if one invokes on a read-only structure a method which modifies "this", there won't be any compiler error but the intended modification won't happen. Since even mutable structures can get treated as read-only in certain contexts, it's far better to get a compiler error when this happens than to have code which will compile but won't work.
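Here is the sketch referenced in the first caveat: a lock-free Interlocked.CompareExchange loop that sets a bit; the class, method, and parameter names are illustrative:

    using System.Threading;

    internal static class BitVectorAtomics
    {
        // Sets 'bit' in 'data' even if other threads are updating other bits
        // concurrently. On a conflict, the loop re-reads and retries.
        internal static void SetBit(ref int data, int bit)
        {
            int oldValue, newValue;
            do
            {
                oldValue = data;            // snapshot the current value
                newValue = oldValue | bit;  // compute the desired result
            }
            while (Interlocked.CompareExchange(ref data, newValue, oldValue) != oldValue);
        }
    }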
Eric Lippert likes to rail on about how mutable structures are evil, but it's important to recognize his relation to them: he's one of the people on the C# team who is charged with making the language support features like closures and iterators. Because of some design decisions early in the creation of .net, properly supporting value-type semantics is difficult in some contexts; his job would be a lot easier if there weren't any mutable value types. I don't begrudge Eric for his point of view, but it's important to note that some principles which may be important in framework and language design are not so applicable to application design.
If I understand correctly, you cannot make a serializable immutable struct simply by using SerializableAttribute. This is because during deserialization, the serializer instantiates a default instance of the struct, then sets all the fields following instantiation. If they are readonly, deserialization will fail.
Thus, the struct had to be mutable, else a complex serialization system would have been necessary.
I have a set of extension methods that I regularly use for various UI tasks. I typically define them to run off of type object, even though inside of them I'm typically converting them to string types.
public static string FormatSomething(this object o)
{
    if (o != null)
    {
        string s = o.ToString();
        // do the work and return something based on s.
        return s;
    }
    // return something else or an empty string.
    return string.Empty;
}
The main reason I use type object and not string is to save myself in the UI from having to do <%#Eval("Phone").ToString().FormatSomething()%> when I can do <%#Eval("Phone").FormatSomething()%> instead.
So, is it fine from performance standpoint to create all the extension methods on object, or should I convert them to be string (or relevant) types based on what the extension method is doing?
Is there a performance hit for creating extension methods that operate off the object type?
Yes. If you pass a value type in then the value type will be boxed. That creates a performance penalty of allocating the box and doing the copy, plus of course later having to garbage collect the box.
Instead of
public static string FormatSomething(this object o)
{
return (o != null) ? o.ToString() : "";
}
I would write
public static string FormatSomething<T>(this T o)
{
return (o != null) ? o.ToString() : "";
}
That has the same effect, but avoids the boxing penalty. Or rather, it trades a per-call boxing penalty for a first-call jitting penalty (the generic method is jitted separately for each value type T).
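For illustration, call sites for the generic version would look like this; the compiler infers T from the receiver, so no box is created for value types:

    string a = 42.FormatSomething();        // resolves to FormatSomething<int>, no boxing
    string b = "hello".FormatSomething();   // resolves to FormatSomething<string>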
is it fine from performance standpoint to create all the extension methods on object?
We cannot answer the question. Try it! Measure the performance, compare that against the desired performance, and see if you met your goal. If you did, great. If not, use a profiler, find the slowest thing, and fix it.
But neither question is the question you should be asking. The question you should have asked is:
Is it a good programming practice to create an extension method that extends everything?
No. It is almost never a good idea. In most cases where people want to do that, they are abusing the extension method mechanism. Typically there is some more specific type that could be extended. If you do this a lot then you end up with lots of extension methods on every type, and coding becomes confusing and error-prone.
For example, suppose you want to have an extension method that answers the question "does this sequence contain this value?" You could write:
public static bool IsContainedIn<T>(this T item, IEnumerable<T> sequence)
and then say
if (myInt.IsContainedIn(myIntSequence))
But it is much better to say:
public static bool Contains<T>(this IEnumerable<T> sequence, T item)
and then say
if (myIntSequence.Contains(myInt))
If you do it the first way then you're typing along in the IDE and every single time you type ".", you get prompted with IsContainedIn as an option because maybe you're about to write code that determines if this object is in a collection. But 99% of the time, you're not going to do that. Doing this adds noise to the tooling and makes it harder to find what you really want.
I seriously doubt there would be any performance implications outside of perhaps some IDE impact. Once compiled I wouldn't expect it would make any difference.
Compared to calling ToString before the call to FormatSomething? Not really (your null check might take a few more ms, but it also makes the code more robust).
Even if the compile-time type of the object you're calling the method on were string, it still wouldn't make a visible difference.
Don't worry about performance until you have a performance issue. Until then worry about maintainability including readability. If you have a performance problem then use a profiler to find where it is.
What overhead is associated with an extension method at runtime? (.NET) answers your question I think. Extension methods are just static methods, so they do not belong on the Object type. Intellisense only makes it seem so.
ReSharper sometimes hints that I can make some of my random utility methods in my WebForms static. Why would I do this? As far as I can tell, there's no benefit in doing so... or is there? Am I missing something as far as static members in WebForms go?
The real reason is not the performance reason -- that will be measured in billionths of a second, if it has any effect at all.
The real reason is that an instance method which makes no use of its instance is logically a design flaw. Suppose I wrote you a method:
class C
{
public int DoubleIt(int x, string y, Type z)
{
return x * 2;
}
}
Is this a well-designed method? No. It takes in all kinds of information which it then ignores and does not use to compute the result or execute a side effect. Why force the caller to pass in an unnecessary string and type?
Now, notice that this method also takes in a C, in the form of the "this" passed into the call. That is also ignored. This method should be static, and take one parameter.
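Applied to the example, the repaired method might look like this; the class name is illustrative:

    // Static, and taking exactly the information it uses: no more, no less.
    static class MathHelpers
    {
        public static int DoubleIt(int x)
        {
            return x * 2;
        }
    }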
A well-designed method takes in exactly the information it needs to compute its results and execute its side effects, no more, no less. Resharper is telling you that you have a design flaw in your code: you have a method that is taking in information that it is ignoring. Either fix the method so that it starts using that information, or stop passing in useless data by making the method static.
Again, the performance concern is a total red herring; you'll never notice a difference that small unless what you're doing takes on the order of a handful of processor cycles. The reason for the warning is to call your attention to a logical design flaw. Getting the program logic right is far more important than shaving off a nanosecond here and there.
I wouldn't count on any performance improvement, but what you might like is that static methods have no side effects on the instance. So unless you have a lot of static state (do you?), this gives away your intention that the method is similar to a function: only looking at the parameters and (optionally) returning a result.
For me this is a nice hint when I read someone else's code. I don't worry too much about shared state and can see the flow of information more easily. It's much more constrained in what it can do by declaring it static, which is less to worry about for me, the reader.
You will get a performance improvement; FxCop rule CA1822 says the same.
From MSDN:
Methods that do not access instance data or call instance methods can be marked as static (Shared in Visual Basic). After you mark the methods as static, the compiler will emit non-virtual call sites to these members. Emitting non-virtual call sites will prevent a check at runtime for each call that ensures that the current object pointer is non-null. This can result in a measurable performance gain for performance-sensitive code. In some cases, the failure to access the current object instance represents a correctness issue.
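To illustrate what the rule describes, consider this sketch: for the instance method the C# compiler emits a callvirt instruction (which null-checks the receiver on every call), while the static method is invoked with a plain call:

    class Helper
    {
        public int TripleInstance(int x) { return x * 3; }      // invoked via callvirt
        public static int TripleStatic(int x) { return x * 3; } // invoked via call
    }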
ReSharper suggests converting methods to static if they don't use any non-static variables or methods of the class.
The benefit could be a minor performance increase, and there will be one less ReSharper warning ;)