Why use "ref" when passing objects (not structs)? - C#

Which one is better in terms of memory? I have always used Snippet 2. Is Snippet 1 better in any way than Snippet 2 (performance, memory)?
Snippet 1
public void GetListOfString(ref List<string> x)
{
    x = new List<string>() { "Dave", "John" };
}
Snippet 2
public List<string> GetListOfString()
{
    return new List<string>() { "Dave", "John" };
}

First of all, your first example should be using out, not ref:
public void GetListOfString(out List<string> x)
The method doesn't care what the incoming value is; it just overwrites whatever was there. Using out ensures that a) the caller is not required to initialize the variable before passing it, and b) the method itself is required to assign the variable before returning (which helps guard against bugs).
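For illustration, here is a minimal sketch of the out version together with a call site (the variable name names is ours; the inline out declaration assumes C# 7 or later):

public void GetListOfString(out List<string> x)
{
    // out parameters must be assigned before the method returns
    x = new List<string>() { "Dave", "John" };
}

// Caller: no initialization is required before the call
GetListOfString(out List<string> names);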
If there is any performance difference at all (and I doubt you could measure one), I would expect the first example to be slower, because it has to pass a reference to a variable. Passing by-reference means there has to be a memory location involved where the method can modify the variable's value. Returning a value is a highly optimized scenario, with the value often even stored in a register. And if the variable isn't passed by-reference, then the compiler may be able to enregister the caller's variable too, for an additional performance gain.
And of course, if data is kept in registers rather than being stored on the stack, that represents a (marginal, inconsequential, completely unimportant) reduction in memory footprint too.
But performance and memory footprint should not be your first concern anyway. The primary concern, and in 99.94% of all code the only concern, is what makes sense semantically and operationally. If the method has a need to modify a caller's variable, then pass by-reference, ref or out as appropriate to the scenario. If not, then pass by-value. Period.
Note that if just one variable of the caller needs to be modified, and the method does not otherwise have to return anything (i.e. would be void), then it is considered a much better practice to let the caller handle modifying the variable, and just return the new value for the variable (i.e. as in your second example).
If and when you come to a point in your code where for some reason, you just cannot achieve some specific and measurable performance or memory footprint goal, and you can prove that using passing by-reference will ensure that you will achieve that goal, then you can use performance as a motivation for passing by-reference. Otherwise, don't give it a second thought.

Snippet 2 is much better in terms of readability and usability.
It is probably also slightly better in terms of performance and memory.
But this is just because the caller is forced to initialize the variable (by creating a list, or at least assigning null) just to be able to call Snippet 1. You could argue that this overhead will be optimized away by the compiler, but don't rely on it.
If you had used out instead of ref for snippet 1, then I would say they are the same in terms of performance and memory.
I can sympathize with someone coming from a different programming language background thinking that snippet 1 would be better, but in C# reference types are returned by reference, not copied like they could be in some other languages.
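To make the caller-side difference concrete, a sketch of what each snippet demands of the caller (variable names are ours):

// Snippet 1 (ref): the variable must be definitely assigned before the call
List<string> names = null;   // or new List<string>()
GetListOfString(ref names);

// Snippet 2 (return): one line, nothing to pre-initialize
List<string> names2 = GetListOfString();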

Related

C# Efficiency for method parameters

Am I correct in saying that this:
public static void MethodName(bool first, bool second, bool third)
{
    // Do something
}
Is more efficient than this:
public static void MethodName(bool[] boolArray)
{
    bool first = boolArray[0];
    bool second = boolArray[1];
    bool third = boolArray[2];
    // Do something
}
My thoughts are that for both they would have to declare first, second and third, just in different places. But for the second one, the values have to be packed into an array and then unpacked again.
Unless you declared the array like this:
MethodName(new[] { true, true, true });
In which case I am not sure which would be faster.
I ask because I am thinking of using the second one but wanted to know if/what the implications are on performance.
In this case performance is not particularly important, but it would be helpful for me to clarify this point.
Also, the second one has the advantage that you can pass as many values as you like, and I think it is also easier to read.
The reason I am thinking of using this is because there are already about 30 parameters being passed into the method and I feel it is becoming confusing to keep adding more. All these bools are closely related so I thought it may make the code more manageable to package them up.
I am working on existing code and it is not in my project scope to spend time reworking the method to decrease the number of parameters that are passed into the method, but I thought it would be good practice to understand the implications of this change.
In terms of performance, there is a classic answer to your question:
"Programmers waste enormous amounts of time thinking about, or
worrying about, the speed of noncritical parts of their programs, and
these attempts at efficiency actually have a strong negative impact
when debugging and maintenance are considered. We should forget about
small efficiencies, say about 97% of the time: premature optimization
is the root of all evil. Yet we should not pass up our opportunities
in that critical 3%."
In terms of productivity, parameters > arrays.
Side note
Everyone should know that this was said by Donald Knuth in 1974. More than 40 years after that statement, we still fall into premature optimization (or even pointless optimization) very often!
Further reading
I would take a look at this other Q&A on Software Engineering
Am I correct in saying that this:
Is more efficient than this:
In isolation, yes. Unless the caller already has that array, in which case the second is the same or even (for larger argument types or more arguments) minutely faster.
I ask because I am thinking of using the second one but wanted to know if/what the implications are on performance.
Why are you considering the second one? If it is more natural at the point of the call, then whatever makes it more natural is likely to have a performance impact of its own, one that makes the second the better choice in the wider context and outweighs this difference.
If you're starting off with three separate bools and you're wrapping them just to unwrap them again then I don't see what this offers in practice except for more typing.
So your reason for considering this at all is the more important thing here.
In this case performance is not particularly important
Then really don't worry about it. It's certainly known for hot-path code that hits params to offer overloads taking set numbers of individual parameters, but it really does only make a difference in hot paths. If you aren't in a hot path, the lifetime saving in computing time from picking whichever of the two is more efficient is unlikely to add up to the amount of time it took you to write your post here.
If you are in a hot path and really need to shave off every nanosecond you can, because you're looping so much that it will add up to something real, then you have to measure. Isolated changes have non-isolated effects when it comes to performance, so it doesn't matter whether the people on the Internet tell you A is faster than B if the wider context means the code calling A is slower than B. Measure. Measurement number one is "can I even notice?"; if the answer is "no", then leave it alone and find somewhere where the performance impact is noticeable to optimise instead.
Write "natural" code to start with, before seeing if little tweaks can have a performance impact in the bits that are actually hurting you. This isn't just because of the importance of readability and so on, but also because:
The more "natural" code in a given language very often is the more efficient. Even if you think it can't be, it's more likely to benefit from some compiler optimisation behind the scenes.
The more "natural" code is a lot easier to tweak for performance when it is necessary than code doing a bunch of strange things.
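If you do get to the point of measuring, a rough Stopwatch harness along these lines (the method under test is a placeholder) is the bare minimum; a proper profiler or benchmarking library is better:

using System;
using System.Diagnostics;

class Benchmark
{
    static void MethodName(bool first, bool second, bool third) { /* do something */ }

    static void Main()
    {
        const int iterations = 10_000_000;
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            MethodName(true, false, true);  // the variant under test
        }
        sw.Stop();
        Console.WriteLine($"{sw.ElapsedMilliseconds} ms for {iterations:N0} calls");
    }
}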
I don't think this would affect the performance of your app at all.
Personally
I'd go with the first option for two reasons:
Naming each parameter: useful if the project is large-scale with a lot of coding, and for possible future edits and enhancements.
Usability: if you are sending a list of similar parameters then you must use an array or a list; if it is just a couple of parameters that happen to be of the same type, then you should send them separately.
A third way would be to use params (Params - MSDN).
In the end I don't think it will change much in performance.
Note, though, that array[] inherits from the abstract Array class, which implements IEnumerable and IEnumerable<T> (plus ICloneable, IList, ICollection, IStructuralComparable, IStructuralEquatable); this means the objects are more blown up than three value-type parameters, which will obviously make them slower.
Array - MSDN
You could test performance differences on both, but I doubt there would be much difference.
You have to consider maintainability: is another programmer, or even you yourself, going to understand why you did it that way in a few weeks, or a few months, when it's time for review? Is it easily extended? Can you pass different object types through to your method?
If you're passing a collection of items, then certainly packing them into an array would be quicker than specifying a new parameter for each additional item.
If you have to, you can do it that way, but have you considered a params array?
Why use the params keyword?
public static void MethodName(params bool[] boolArray)
{
    // extract data here
}
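The advantage, sketched below, is that the caller can then pass the values bare or as an existing array:

MethodName(true, true, true);            // the compiler builds the array for you
MethodName(new[] { true, true, true });  // or pass an array explicitly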
Agreed with Matias' answer.
I also want to add that you need to add error checking: you are being passed an array, and nothing states how many elements you will receive, so you must first check that you have three elements in it. This will cancel out the small perf gain that you may have earned.
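A minimal sketch of that check (the choice of exception is ours):

public static void MethodName(bool[] boolArray)
{
    // Guard against callers passing too few (or no) elements
    if (boolArray == null || boolArray.Length < 3)
        throw new ArgumentException("Expected at least three values.", nameof(boolArray));
    bool first = boolArray[0];
    bool second = boolArray[1];
    bool third = boolArray[2];
    // Do something
}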
Also, if you ever want to make this method available to other developers (as part of an API, public or private), IntelliSense will not help them at all with which parameters they're supposed to set...
While using three parameters, you can do this:
/// <summary>
/// This method does something
/// </summary>
/// <param name="first">The first parameter</param>
/// <param name="second">The second parameter</param>
/// <param name="third">The third parameter</param>
public static void MethodName(bool first, bool second, bool third)
{
    // Do something
}
And it will be displayed nicely and helpfully to others...
I would take a different approach and use flags:
// FIRST is assumed to be a bit-flag constant defined elsewhere, e.g.:
const int FIRST = 1;

public static void MethodName(int flag)
{
    if ((flag & FIRST) != 0) { }
}
Chances are the compiler will do its own optimizations.
Check http://rextester.com/QRFL3116 (method added from Jamiec's comment):
M1 took 5ms
M2 took 23ms
M3 took 4ms
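For what it's worth, the more idiomatic C# spelling of this approach is a [Flags] enum rather than raw int constants; a sketch (all names are ours):

[Flags]
public enum Options
{
    None   = 0,
    First  = 1,
    Second = 2,
    Third  = 4
}

public static void MethodName(Options options)
{
    if ((options & Options.First) != 0) { /* ... */ }
}

// The caller combines flags with bitwise OR:
MethodName(Options.First | Options.Third);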

C# - how does variable scope and disposal impact processing efficiency?

I was having a discussion with a colleague the other day about this hypothetical situation. Consider this pseudocode:
public void Main()
{
    MyDto dto = Repository.GetDto();
    foreach (var row in dto.Rows)
    {
        ProcessStrings(row);
    }
}
public void ProcessStrings(DataRow row)
{
    string string1 = GetStringFromDataRow(row, 1);
    string string2 = GetStringFromDataRow(row, 2);
    // do something with the strings
}
Then this functionally identical alternative:
public void Main()
{
    string string1 = null;
    string string2 = null;
    MyDto dto = Repository.GetDto();
    foreach (var row in dto.Rows)
    {
        ProcessStrings(row, string1, string2);
    }
}
public void ProcessStrings(DataRow row, string string1, string string2)
{
    string1 = GetStringFromDataRow(row, 1);
    string2 = GetStringFromDataRow(row, 2);
    // do something with the strings
}
How will these differ in processing when running the compiled code? Are we right in thinking the second version is marginally more efficient because the string variables will take up less memory and only be disposed once, whereas in the first version, they're disposed of on each pass of the loop?
Would it make any difference if the strings in the second version were passed by ref or as out parameters?
When you're dealing with the "marginally more efficient" level of optimizations, you risk not seeing the whole picture and ending up "marginally less efficient".
This answer here risks the same thing, but with that caveat, let's look at the hypothesis:
Storing a string into a variable creates a new instance of the string
No, not at all. A string is an object; what you're storing in the variable is a reference to that object. On 32-bit systems this reference is 4 bytes in size, on 64-bit it is 8. Nothing more, nothing less. Moving 4/8 bytes around is not overhead you're going to notice.
So with the very little information we have about the makings of the methods being called, neither of the two examples creates more or fewer strings than the other, so on this count they're equivalent.
So what is different?
Well, in one example you're storing the two string references in local variables. These will most likely end up in CPU registers, though they could be memory on the stack; it's hard to say, as it depends on the rest of the code. Does it matter? Highly unlikely.
In the other example you're passing in two parameters as null and then reusing those parameters locally. These parameters can likewise be passed in CPU registers or stack memory, same as the other. Does it matter? Not at all.
So most likely there is going to be absolutely no difference at all.
Note one thing: you mention "disposal". That term is reserved for objects implementing IDisposable and the act of disposing of them by calling IDisposable.Dispose. Strings are not such objects, so this is not relevant to this question.
If, instead, by disposal you mean "garbage collection", then since I already established that neither of the two examples creates more or fewer objects than the other due to the differences you asked about, this is also irrelevant.
This is not important, however. It isn't important what you or I or your colleague thinks is going to have an effect. Knowing is quite different, which leads me to...
The real tip I can give about optimization:
1. Measure
2. Measure
3. Measure
4. Understand
5. Verify that you understand it correctly
6. Change, if possible
You measure, use a profiler to find the real bottlenecks and real time spenders in your code, then understand why those are bottlenecks, then ensure your understanding is correct, then you can see if you can change it.
In your code I will venture a guess that if you were to profile your program you would find that those two examples will have absolutely no effect whatsoever on the running time. If they do have effect it is going to be on order of nanoseconds. Most likely, the very act of looking at the profiler results will give you one or more "huh, that's odd" realizations about your program, and you'll find bottlenecks that are far bigger fish than the variables in play here.
In both of your alternatives, GetStringFromDataRow creates a new string every time. Whether you store a reference to this string in a local variable or in a parameter variable (which in your case is essentially no different from a local variable) does not matter. Imagine you did not even assign the result of GetStringFromDataRow to any variable: an instance of the string is still created and stored somewhere in memory until garbage collected. If you passed your strings by reference, it wouldn't make much difference either. You would be able to reuse the memory location that stores the reference to the created string (think of it as the memory address of the string instance), but not the memory location for the string contents.
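A two-line sketch of that point (GetStringFromDataRow and row are from the question; the comments are ours):

GetStringFromDataRow(row, 1);             // a string instance is allocated either way
string s = GetStringFromDataRow(row, 1);  // same allocation, plus a 4/8-byte reference copy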

C# huge performance drop assigning float value

I am trying to optimize my code and was running VS performance monitor on it.
It shows that a simple float assignment takes up a major chunk of computing power. I don't understand how that is possible.
Here is the code for TagData:
public class TagData
{
    public int tf;
    public float tf_idf;
}
So all I am really doing is:
float tag_tfidf = td.tf_idf;
I am confused.
I'll post another theory: it might be the cache miss of the first access to members of td. A memory load takes 100-200 cycles which in this case seems to amount to about 1/3 of the total duration of the method.
Points to test this theory:
Is your data set big? I bet it is.
Are you accessing the TagData objects in random memory order? I bet they are not sequential in memory. This makes the CPU's memory prefetcher ineffective.
Add a new line int dummy = td.tf; before the expensive line. This new line will now be the most expensive line because it will trigger the cache miss. Find some way to do a dummy load operation that the JIT does not optimize out. Maybe add all td.tf values to a local and pass that value to GC.KeepAlive at the end of the method. That should keep the memory load in the JIT-emitted x86.
I might be wrong, but contrary to the other theories so far, mine is testable.
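A sketch of that test, assuming the td objects come from iterating some collection (called tags here purely for illustration):

int sum = 0;
foreach (TagData td in tags)
{
    sum += td.tf;                  // dummy load: should now absorb the cache miss
    float tag_tfidf = td.tf_idf;   // the line that previously looked expensive
    // ... rest of the loop body
}
GC.KeepAlive(sum);                 // keeps the dummy loads in the JIT-emitted code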
Try making TagData a struct. That will make all items of term.tags sequential in memory and give you a nice performance boost.
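That change is a one-word edit to the type from the question:

public struct TagData   // value type: stored inline when held in an array or List<TagData>
{
    public int tf;
    public float tf_idf;
}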
Are you using LINQ? If so, LINQ uses lazy enumeration so the first time you access the value you pulled out, it's going to be painful.
If you are using LINQ, call ToList() after your query to only pay the price once.
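For example (the query shape here is hypothetical, since we don't have the real source):

// ToList() enumerates the query once, up front
var tags = source.Where(t => t.tf > 0).ToList();
foreach (var td in tags) { /* iteration is now over a materialized list */ }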
It also looks like your data structure is sub optimal but since I don't have access to your source (and probably couldn't help even if I did :) ), I can't tell you what would be better.
EDIT: As commenters have pointed out, LINQ may not be to blame; however, my answer is based on the fact that both foreach statements are using IEnumerable. The TagData variable is a reference to the item in the IEnumerable's collection (which may or may not have been enumerated yet). The first access of legitimate data is the line that pulls the property from the object. The first time this happens, it may be executing the entire LINQ statement, and since profiling uses the average, it may be off. The same can be said for tagScores (which I'm guessing is database-backed), whose first access is really slow and then speeds up. I wasn't pointing out the solution, just a possible problem given my understanding of IEnumerable.
See http://odetocode.com/blogs/scott/archive/2008/10/01/lazy-linq-and-enumerable-objects.aspx
As we can see, the line right after the suspicious one takes only 0.6, i.e.:
float tag_tfidf = td.tf_idf;//29.6
string tagName =...;//0.6
I suspect this is caused by the excessive number of calls; also note float is a value type, meaning floats are copied by value. So every time you assign it, the runtime creates a new float (Single) struct and initializes it by copying the value from td.tf_idf, which takes huge time.
You can see string tagName = ...; doesn't take much, because strings are copied by reference.
Edit: As the comments pointed out, I may be wrong in that respect; this might also be a bug in the profiler. Try re-profiling and see if that makes any difference.

Using inline object method call vs. declaring a new variable

I have been working with Java and C# for a while now and I've been asking myself this many times but haven't ever found the answer I was looking for.
When I have to call an object method (meaning it's not static), I have to call it through an instance of the class, for example:
MyClass myInstance = new MyClass();
myInstance.nonStaticMethod();
I see this kind of code everywhere, so I was wondering whether the one-line call (example below) behaves differently performance-wise, or whether it's done purely for the sake of standards.
This is what I meant with one-line call:
new MyClass().nonStaticMethod();
The performance would probably be the same.
However, having calls such as new MyClass().nonStaticMethod(); usually reeks of a code smell: what state was encapsulated by the object if you only needed to invoke one method on it? (i.e. why was that not a static method?)
EDIT: I do not intend to say it is always bad - in some cases, such idioms are encouraged (such as in the case of fluent builder objects) - but you will notice that in these cases, the resulting object is still significant in some way.
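StringBuilder is a familiar example of such a fluent chain, where the intermediate object genuinely carries state:

string s = new StringBuilder()
    .Append("Hello, ")
    .Append("world")
    .ToString();   // the builder accumulated state across the chained calls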
I would go with the first way i.e.
MyClass myInstance = new MyClass();
myInstance.nonStaticMethod();
The problem with the second one, i.e. new MyClass().nonStaticMethod();, is that if you want to call another method on the same object you don't have any choice: the only thing you can do is new MyClass().nonStaticMethod1();, which actually creates a new object every time. IMHO I don't think either of them would perform better than the other. In the absence of a performance gain I would definitely choose the one which is clearer and more understandable, hence my choice.
If you look at the generated byte code you can prove there is absolutely no difference in performance or anything else, except that you are using a local variable which must later be discarded.
Unless you have measured that you have a performance problem, you should assume you don't, and guessing where a performance problem might be is just that, a guess, no matter how much experience you have performance-optimizing Java applications.
When faced with a question like this, you should first consider what is the simplest and clearest, because the optimizer looks for standard patterns and if you write confusing code, you will not only confuse yourself and others but the optimizer as well and it is more likely to be slower as a result.
Assuming you never need to access the object's instance again, there is no difference. Do whichever you prefer.
Of course if you want to do anything else with that object later on you'll need it in a variable.

When instantiating an object, does it all get stored in memory?

Just a short question: if you have a class with just one property and lots of (non-static) methods, does an entirely new object get stored every time you say 'new object()', or just the property, with the methods in some 'common' memory space that every instance of the Type can reference?
Thus, is having a large class always performing worse than a small class in terms of instantiation time?
Memory allocation may prove to be time consuming indeed.
Still, I believe a cleaner, more obvious measurement of resource consumption would be occupied space not instantiation time.
As you have stated yourself already, methods, static or not, occupy memory space just once. The this reference is just a hidden parameter, which gets sent from caller to called code just like any other parameter; in the end, all methods are just plain ol' functions (or routines).
In a simplistic way of putting it, so do all static fields.
Don't think about properties. They are just high level wrappers for methods which in the end access fields.
Instance fields are what occupies space, per instance.
But there are other things, like runtime type information which get allocated also.
In short, your assumption is correct.
EDIT
Just as a recap:
If by "large class" you mean a class which defines a lot of methods, then no, instantiation time will not be affected.
On the other hand, if by that term, you mean a class which defines a lot of instance fields, then yeah, instantiation time will be affected.
Although this is not my happy place (I know almost nothing of how good ol' malloc actually manages memory), thinking that allocating a lot of memory would take longer feels, in a strange way I can't put my finger on, like saying that "adding the numbers 1024 and 2048 takes a bit longer than adding the numbers 3 and 4" (given all four numbers are stored in variables of the same numerical type).
So I would worry more about memory consumption. I'm sure time is somehow affected too, but maybe logarithmically.
Methods are shared. All other things being equal, instantiating a class with many methods has pretty much the same cost as instantiating one with few. It's their non-static fields and the amount of work performed by the constructor (and some other minor factors) that determine creation cost.
Instance fields are the only thing stored in the object itself. Methods are stored in the type, which means that they only exist in one place.
In fact, instance methods are just syntactic sugar (at the IL level) for static methods that accept an instance as a parameter.
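Conceptually (this is an illustration, not the literal emitted code; the names are ours):

myInstance.nonStaticMethod();
// behaves, at the IL level, much like a hypothetical static form:
MyClass.NonStaticMethod(myInstance);   // 'this' passed as the first argument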
I think you will find the information you need (and probably more) here. The code of the instance methods will be shared.
