Using inline object method call vs. declaring a new variable - c#

I have been working with Java and C# for a while now and I've been asking myself this many times but haven't ever found the answer I was looking for.
When I have to call an object method (this means it's not static), I have to use the call through instance of class, for example:
MyClass myInstance = new MyClass();
myInstance.nonStaticMethod();
I see this kind of code everywhere, so I was wondering whether the one-line call (example below) behaves differently performance-wise, or whether the two-line form is used just for the sake of convention.
This is what I meant with one-line call:
new MyClass().nonStaticMethod();

The performance would probably be the same.
However, having calls such as new MyClass().nonStaticMethod(); usually reeks of a code smell - what state was encapsulated by the object that you only needed to invoke a method on it? (i.e. Why was that not a static method?)
EDIT: I do not intend to say it is always bad - in some cases, such idioms are encouraged (such as in the case of fluent builder objects) - but you will notice that in these cases, the resulting object is still significant in some way.
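As a minimal sketch of the fluent case mentioned above, here is a chain on StringBuilder, where the object created inline stays significant until ToString() is called. The FluentDemo class and Build method are names made up for illustration.

```csharp
using System.Text;

public static class FluentDemo
{
    // The builder is never stored in a variable, but every call in the
    // chain operates on the same instance and contributes to the result.
    public static string Build()
    {
        return new StringBuilder()
            .Append("Hello")
            .Append(", ")
            .Append("world")
            .ToString();
    }
}
```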

I would go with the first way i.e.
MyClass myInstance = new MyClass();
myInstance.nonStaticMethod();
The problem with the second one, i.e. new MyClass().nonStaticMethod();,
is that if you want to call another method on the same object, you have no way to do so. The only thing you can do is new MyClass().nonStaticMethod1();, which creates a new object every time. I don't think either form would perform better than the other. In the absence of a performance gain I would definitely choose the one that is clearer and easier to understand, hence my choice.
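To make the "new object every time" point concrete, here is a small sketch. Counter, its InstancesCreated field and the InlineCallDemo helper are all hypothetical names invented for this illustration.

```csharp
public class Counter
{
    // Hypothetical bookkeeping, only here to make object creation visible.
    public static int InstancesCreated;
    private int _value;

    public Counter() { InstancesCreated++; }

    public int Increment() => ++_value;
}

public static class InlineCallDemo
{
    // One object, two calls: state carries over between the calls.
    public static int[] WithVariable()
    {
        Counter.InstancesCreated = 0;
        var counter = new Counter();
        counter.Increment();
        int last = counter.Increment();
        return new[] { last, Counter.InstancesCreated };
    }

    // Two inline calls: each one runs against a brand-new object.
    public static int[] Inline()
    {
        Counter.InstancesCreated = 0;
        new Counter().Increment();
        int last = new Counter().Increment();
        return new[] { last, Counter.InstancesCreated };
    }
}
```

WithVariable() returns the value 2 with one instance created; Inline() returns the value 1 with two instances created, because the second call never sees the state left behind by the first.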

If you look at the byte code generated you can see that there is no difference in performance or anything else, except that the first form uses a local variable, which should be optimized away.
Unless you have measured that you have a performance problem, you should assume you don't. Guessing where a performance problem might be is just that, a guess, no matter how much experience you have optimizing the performance of Java applications.
When faced with a question like this, you should first consider what is the simplest and clearest, because the optimizer looks for standard patterns and if you write confusing code, you will not only confuse yourself and others but the optimizer as well and it is more likely to be slower as a result.

Assuming you never need to access the object's instance again, there is no difference. Do whichever you prefer.
Of course if you want to do anything else with that object later on you'll need it in a variable.

Related

Should I declare variables as close as possible to the scope where they will be used?

ReSharper usually suggests that to me, and I'm still looking for a good reason to do it.
The only thing that came to my mind is that declaring it closer to the scope it will be used, can avoid initializing it in some cases where it isn't necessary (because a condition, etc.)
Something related with that is the following:
int temp;
foreach (var x in collection) {
temp = x.GetValue();
//Do something with temp
}
Is that really different than
foreach (var x in collection) {
int temp = x.GetValue();
//...
}
I mean, isn't the second code more expensive because it is allocating memory every time? Or are both the same? Of course, after the loop has finished, in the second code the garbage collector will take care of the temp variable, but not in the first one...
Declaring as close as possible to use is a readability decision. Your example doesn't display it, but in longer methods it's hard to sift through the code to find the temp variable.
It's also a refactoring advantage. Declaring closer to source leads to easier refactoring later.
The cost of the second example is negligible. The only difference is that in the first example, temp will be available outside the scope of the for loop, and thus it will exist longer than if you declared it inside the for loop.
If you don't need temp outside the for loop, it shouldn't be declared outside that loop. Like others have said, readability and style are more at play here than performance and memory.
I agree that if you init a variable inside the scope that it's being used then you're helping the gc out, but I think the real reason is more to do with code maintenance best practices. It's sort of a way of reducing cognitive load on you or another developer coming back to the code after months (or years) of not looking at a particular block. Sure, IDEs help you discover things, but you still have to do the "go to definition" dance.
There are no performance benefits, I believe; it is more a matter of coding style. It's more of a C programming style to declare everything at the beginning of the scope. There are more details here: Scope of variables in C#
It's a matter of style and personal preference, to do with readability.
There are very few languages/systems where this will have any noticeable effect on performance.
I try to follow these two rules.
All the core attributes of a class should be defined together in one place. e.g. If you are handling an order then orderno, customerno, amount, sales tax etc. should be defined close together.
All the technical attributes which form part of the internal mechanics of the class, such as iterators, flags and state variables, should be defined close to their usage.
Or to put it another way: business/external type data is all defined in one place, technical/internal data is defined close to usage.
The difference is a matter of coding style and one of such dispute that different coding standards have completely opposite rules. The conflict is still strongest in the C++ world where the C language forced variables to be declared at the beginning of a scope and so old-timers (like myself) were well accustomed to "looking at the beginning of the function" to find variables.
The C# style that you most often see is that variables come into existence right at the point where they are needed. This style limits the existence of the variable and minimizes the chance that you could mean some other variable accidentally. I find it very easy to read.
In the modern C# era, putting the declaration of variables at their first point of use is most clearly beneficial when combined with the both loved and hated var feature. Using var just isn't that useful unless you use it with an assignment that allows the compiler and readers to infer the type of the variable. The var feature encourages declaration with first use.
Me, I love var, and so you can guess which coding style I prefer!
I was always taught to declare your variables at the top of a function, class, etc. This makes it easier to read.

Declaring and creating an object then adding to collection VS Adding object to collection using new keyword to create object

OK, so the title may have been confusing, so I have posted two code snippets to illustrate what I mean.
NOTE: allUsers is just a collection.
RegularUser regUser = new RegularUser(userName, password, name, emailAddress);
allUsers.Add(regUser);
VS
allUsers.Add(new RegularUser(userName, password, name, emailAddress));
Which snippet, A or B, is better and why?
What are the advantages or disadvantages?
The example I wrote was C#, but does the language (C#, Java etc.) make a difference?
As far as C# is concerned, both of your code examples are practically identical at the IL level. The second example still creates a reference to the created object and pushes it onto the stack; you just don't have a local variable hooked up to it. This will not create any performance problems at all.
1) Which snippet A or B is better and why?
They're really identical. The compiled code will be nearly identical, since a temporary object is pushed onto the stack, then used in the method call.
2) What are the advantages or disadvantages?
The main advantages and disadvantages to the approach are really just readability.
Your first example has the advantage of keeping a single "operation" per line of code, which, in many ways, is more maintainable.
The second example removes the unnecessary variable declaration, which may be more maintainable.
Personally, I feel that the number of parameters in your RegularUser constructor would probably push me, in this instance, towards your first option. I typically find that, when a line of code gets to be more than about half a screen width on an average monitor, it's easier to read and understand if it's split up. Splitting this up by introducing the temporary and calling Add separately makes this more clear.
However, if you're just adding an integer or a class that's very small, I'd probably vote to skip the unnecessary variable. This is completely a personal preference, however - your mileage may (and probably will) vary.
3) The example i wrote was C# but does the language (C#, Java etc) make a difference?
No, for the most part. This is really language/implementation dependent, but most languages will have the same basic behavior and performance in both cases. It is possible (and highly likely) that some languages may treat this differently, but most mainstream languages will not.
I really like to create them the first way unless I really really know what is going on. It is much harder to do debugging if you don't create the object first...
The compiler will just turn the 2nd version into the 1st for you, anyway, so there isn't a net negative effect.
Pros of #1:
easier to debug (!)
theoretically easier to read, clearer
can use the object later
Cons:
more verbose
can be unnecessary, especially for trivial objects
Result:
#1 for anything complex to create, or that may need to be inspected easily at debug time
#2 for lots of annoying little stuff, like the following.
var list = new List<NameValuePair>(3);
list.Add(new NameValuePair("name", "valuable"));
list.Add(new NameValuePair("age", "valuable"));
list.Add(new NameValuePair("height", "not valuable"));
var dates = new List<DateTime>();
dates.Add(DateTime.Now);
dates.Add(DateTime.Now.Date.AddYears(-2));
As far as I know there isn't a real difference between languages when it comes to this. Some may not allow it, though.
Both are equal in terms of performance.
In terms of maintainability the second case is a nightmare: it is (nearly) impossible to trace in a debugger. So I tend to prefer the first one. In my early OOP days I always wrote the second, because "I knew that they were objects and I was sooo good at grasping objects that I ... blah blah blah", but that wore off with time, and especially with maintenance time.
Also, suppose that someone wants you to
FilterClass.FilterUser(regUser)
or
Database.AddToDatabase(regUser)
because it is the right place to do so, the first scenario is better.
Finally, when do you stop?
allUsers.Add(new RegularUser(new ReadFromInput(new EscapedName(new Name(new String(userName)))), password, name, emailAddress));

Is there any advantage in writing a set of operations in a single line if all those operations have to occur anyway?

From a post-compilation perspective (rather than a coding syntax perspective), in C#, is there any actual difference in the compiled code between a set of operations that have occurred on one line to a set of operations that occur across multiple lines?
This
object anObject = new object();
anObject = this.FindName("rec"+keyPlayed.ToString());
Rectangle aRectangle = new Rectangle();
aRectangle = (Rectangle)anObject;
vs this.
Rectangle aRectangle = (Rectangle)this.FindName("rec"+keyPlayed.ToString());
I wonder because there seems to be a view that the least amount of lines used is better however I would like to understand if this is because there is a tangible technical benefit or if there was at some point a tangible benefit or if it is indeed for a reason that is quantifiable?
The number of lines doesn't matter; the IL will be identical if the code is equivalent (yours isn't).
And actually, unless we know what FindName returns, we can't answer properly - since by casting to object you might be introducing a "box" operation, and you might be changing a conversion operation (or perhaps a passive no-op cast) into an active double-cast (cast to object, cast to Rectangle). For now, I'll assume that FindName returns object, for simplicity. If you'd used var, we'd know at a glance that your code wasn't changing the type (box / cast / etc):
var anObject = this.FindName("rec"+keyPlayed.ToString());
In release mode (with optimize enabled) the compiler will remove most variables that are set and then used immediately. The biggest difference between your two versions is that the second doesn't create and discard new object() and new Rectangle(). But if you hadn't done that, the code would have been equivalent (again, assuming that FindName returns object):
object anObject;
anObject = this.FindName("rec"+keyPlayed.ToString());
Rectangle aRectangle;
aRectangle = (Rectangle)anObject;
Some subtleties exist if you re-use the variable (in which case it can't necessarily be removed by the compiler), and if that variable is "captured" by a lambda/anon-method, or used in a ref/out. And some more subtleties for some math scenarios if the compiler/JIT chooses to do an operation purely in the registers without copying it back down to a variable (the registers have different (greater) width, even for "fixed-size" math like float).
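The lambda-capture subtlety can be sketched as follows. This is only an illustration under assumed names (CaptureDemo, DeclaredOutside, DeclaredInside are not from the question): where the variable is declared decides whether the lambdas share one variable or each get their own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CaptureDemo
{
    public static int[] DeclaredOutside(int[] items)
    {
        var readers = new List<Func<int>>();
        int temp;                        // one variable for the whole loop
        foreach (var x in items)
        {
            temp = x * 10;
            readers.Add(() => temp);     // every lambda captures the same 'temp'
        }
        // All lambdas run after the loop, so they all read the last value.
        return readers.Select(r => r()).ToArray();
    }

    public static int[] DeclaredInside(int[] items)
    {
        var readers = new List<Func<int>>();
        foreach (var x in items)
        {
            int temp = x * 10;           // a fresh variable per iteration
            readers.Add(() => temp);     // each lambda captures its own 'temp'
        }
        return readers.Select(r => r()).ToArray();
    }
}
```

For the input { 1, 2, 3 }, DeclaredOutside yields 30, 30, 30 while DeclaredInside yields 10, 20, 30, so here the declaration site changes behaviour and not just style.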
I think that you should generally aim to make your code as readable as possible; sometimes that means separating out your code and sometimes it means having it on one line. Aim for readability and, if performance becomes a problem, use profiling tools to analyse the code and refactor it if necessary.
The compiled code may not have any difference (with optimization enabled perhaps), but think about readability too :)
In your example, everything on one line is actually more readable than separate lines. What you were trying to do was immediately obvious there. But others can quickly point out counter-examples. So use your good judgment to decide which way to go.
There's a refactoring pattern to prefer a call to a temporary variable. Following this pattern reduces the number of lines of code but makes interactive debugging harder.
One of the main practical differences between the two is that, when debugging, it can be useful to have the individual steps on different lines, with results being passed to local variables.
This means that you can cleanly step through the different bits of code which give the final result and see the intervening values.
When you build optimized the compiler will remove the steps and make the code efficient.
Tony
With your example there is an actual difference, as you in the first piece of code are creating objects and values that you don't use.
The proper way to write that code is like this:
object anObject;
anObject = this.FindName("rec" + keyPlayed.ToString());
Rectangle aRectangle;
aRectangle = (Rectangle)anObject;
Now, the difference between that and the single line version is that you are declaring one more local variable. In most cases the compiler can optimize that so that the generated code is identical anyway, and even if it actually uses one more local variable in the generated code, that is still negligible compared to anything else you do in that code.
For this example I think that the single line version is clearer, but with more complicated code it can of course be clearer to split it into several stages. Local variables are very cheap, so you should not hesitate to use some if the code gets clearer.

Can I detect whether I've been given a new object as a parameter?

Short Version
For those who don't have the time to read my reasoning for this question below:
Is there any way to enforce a policy of "new objects only" or "existing objects only" for a method's parameters?
Long Version
There are plenty of methods which take objects as parameters, and it doesn't matter whether the method has the object "all to itself" or not. For instance:
var people = new List<Person>();
Person bob = new Person("Bob");
people.Add(bob);
people.Add(new Person("Larry"));
Here the List<Person>.Add method has taken an "existing" Person (Bob) as well as a "new" Person (Larry), and the list contains both items. Bob can be accessed as either bob or people[0]. Larry can be accessed as people[1] and, if desired, cached and accessed as larry (or whatever) thereafter.
OK, fine. But sometimes a method really shouldn't be passed a new object. Take, for example, Array.Sort<T>. The following doesn't make a whole lot of sense:
Array.Sort<int>(new int[] {5, 6, 3, 7, 2, 1});
All the above code does is take a new array, sort it, and then forget it (as its reference count reaches zero after Array.Sort<int> exits and the sorted array will therefore be garbage collected, if I'm not mistaken). So Array.Sort<T> expects an "existing" array as its argument.
There are conceivably other methods which may expect "new" objects (though I would generally think that to have such an expectation would be a design mistake). An imperfect example would be this:
DataTable firstTable = myDataSet.Tables["FirstTable"];
DataTable secondTable = myDataSet.Tables["SecondTable"];
firstTable.Rows.Add(secondTable.Rows[0]);
As I said, this isn't a great example, since DataRowCollection.Add doesn't actually expect a new DataRow, exactly; but it does expect a DataRow that doesn't already belong to a DataTable. So the last line in the code above won't work; it needs to be:
firstTable.ImportRow(secondTable.Rows[0]);
Anyway, this is a lot of setup for my question, which is: is there any way to enforce a policy of "new objects only" or "existing objects only" for a method's parameters, either in its definition (perhaps by some custom attributes I'm not aware of) or within the method itself (perhaps by reflection, though I'd probably shy away from this even if it were available)?
If not, any interesting ideas as to how to possibly accomplish this would be more than welcome. For instance I suppose if there were some way to get the GC's reference count for a given object, you could tell right away at the start of a method whether you've received a new object or not (assuming you're dealing with reference types, of course--which is the only scenario to which this question is relevant anyway).
EDIT:
The longer version gets longer.
All right, suppose I have some method that I want to optionally accept a TextWriter to output its progress or what-have-you:
static void TryDoSomething(TextWriter output) {
    // do something...
    if (output != null)
        output.WriteLine("Did something...");
    // do something else...
    if (output != null)
        output.WriteLine("Did something else...");
    // etc. etc.
    if (output != null) {
        // do I call output.Close() or not?
    }
}
static void TryDoSomething() {
    TryDoSomething(null);
}
Now, let's consider two different ways I could call this method:
string path = GetFilePath();
using (StreamWriter writer = new StreamWriter(path)) {
TryDoSomething(writer);
// do more things with writer
}
OR:
TryDoSomething(new StreamWriter(path));
Hmm... it would seem that this poses a problem, doesn't it? I've constructed a StreamWriter, which implements IDisposable, but TryDoSomething isn't going to presume to know whether it has exclusive access to its output argument or not. So the object either gets disposed prematurely (in the first case), or doesn't get disposed at all (in the second case).
I'm not saying this would be a great design, necessarily. Perhaps Josh Stodola is right and this is just a bad idea from the start. Anyway, I asked the question mainly because I was just curious if such a thing were possible. Looks like the answer is: not really.
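One common way around the dilemma is a convention instead of detection: the caller owns the writer and the method never closes it. The sketch below assumes that convention. OwnershipDemo and Run are hypothetical names, and a StringWriter stands in for the StreamWriter so the example needs no file.

```csharp
using System.IO;

public static class OwnershipDemo
{
    // Assumed convention: the caller owns 'output'; this method never closes it.
    public static void TryDoSomething(TextWriter output)
    {
        output?.WriteLine("Did something...");
        output?.WriteLine("Did something else...");
    }

    public static string Run()
    {
        // The caller creates the writer, keeps using it afterwards,
        // and disposes it when its own using block ends.
        using (var writer = new StringWriter())
        {
            writer.NewLine = "\n";
            TryDoSomething(writer);
            writer.WriteLine("caller keeps writing");
            return writer.ToString();
        }
    }
}
```

Under this convention, TryDoSomething(new StreamWriter(path)) is simply a caller bug (the writer is never disposed), which the method does not try to detect.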
No, basically.
There's really no difference between:
var x = new ...;
Foo(x);
and
Foo(new ...);
and indeed sometimes you might convert between the two for debugging purposes.
Note that in the DataRow/DataTable example, there's an alternative approach though - that DataRow can know its parent as part of its state. That's not the same thing as being "new" or not - you could have a "detach" operation for example. Defining conditions in terms of the genuine hard-and-fast state of the object makes a lot more sense than woolly terms such as "new".
Yes, there is a way to do this.
Sort of.
If you make your parameter a ref parameter, you'll have to have an existing variable as your argument. You can't do something like this:
DoSomething(ref new Customer());
If you do, you'll get the error "A ref or out argument must be an assignable variable."
Of course, using ref has other implications. However, if you're the one writing the method, you don't need to worry about them. As long as you don't reassign the ref parameter inside the method, it won't make any difference whether you use ref or not.
I'm not saying it's good style, necessarily. You shouldn't use ref or out unless you really, really need to and have no other way to do what you're doing. But using ref will make what you want to do work.
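A minimal sketch of that restriction follows; RefDemo, Rename and the nested Customer class are made-up names for illustration. The rejected call is left as a comment because it does not compile.

```csharp
public static class RefDemo
{
    public class Customer { public string Name = "Alice"; }

    public static void Rename(ref Customer customer, string name)
    {
        customer.Name = name;   // the parameter itself is never reassigned
    }

    public static string Run()
    {
        var existing = new Customer();
        Rename(ref existing, "Bob");            // compiles: 'existing' is an assignable variable
        // Rename(ref new Customer(), "Bob");   // rejected by the compiler: not an assignable variable
        return existing.Name;
    }
}
```

Note that this only forces the argument to be a variable. A caller can still declare a variable, assign a freshly created object to it and pass that, so it does not truly distinguish "new" from "existing" objects.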
No. And if there is some reason that you need to do this, your code has improper architecture.
Short answer - no there isn't
In the vast majority of cases I usually find that the issues that you've listed above don't really matter all that much. When they do you could overload a method so that you can accept something else as a parameter instead of the object you are worried about sharing.
// For example create a method that allows you to do this:
people.Add("Larry");
// Instead of this:
people.Add(new Person("Larry"));
// The new method might look a little like this:
public void Add(string name)
{
    Person person = new Person(name);
    this.add(person); // This method could be private if necessary
}
I can think of a way to do this, but I would definitely not recommend this. Just for argument's sake.
What does it mean for an object to be a "new" object? It means there is only one reference keeping it alive. An "existing" object would have more than one reference to it.
With this in mind, look at the following code:
class Program
{
    static void Main(string[] args)
    {
        object o = new object();
        Console.WriteLine(IsExistingObject(o));
        Console.WriteLine(IsExistingObject(new object()));
        o.ToString(); // Just something to simulate further usage of o. If we didn't do this, in a release build, o would be collected by the GC.Collect call in IsExistingObject. (not in a Debug build)
    }
    public static bool IsExistingObject(object o)
    {
        var oRef = new WeakReference(o);
#if DEBUG
        o = null; // In Debug, we need to set o to null. This is not necessary in a release build.
#endif
        GC.Collect();
        GC.WaitForPendingFinalizers();
        return oRef.IsAlive;
    }
}
This prints True on the first line, False on the second.
But again, please do not use this in your code.
Let me rewrite your question to something shorter.
Is there any way, in my method, which takes an object as an argument, to know if this object will ever be used outside of my method?
And the short answer to that is: No.
Let me venture an opinion at this point: There should not be any such mechanism either.
This would complicate method calls all over the place.
If there was a method where I could, in a method call, tell if the object I'm given would really be used or not, then it's a signal to me, as a developer of that method, to take that into account.
Basically, you'd see this type of code all over the place (hypothetical, since it isn't available or supported):
if (ReferenceCount(obj) == 1) return; // only reference is the one we have
My opinion is this: If the code that calls your method isn't going to use the object for anything, and there are no side-effects outside of modifying the object, then that code should not exist to begin with.
It's like code that looks like this:
1 + 2;
What does this code do? Well, depending on the C/C++ compiler, it might compile into something that evaluates 1+2. But then what, where is the result stored? Do you use it for anything? No? Then why is that code part of your source code to begin with?
Of course, you could argue that the code is actually a+b;, and the purpose is to ensure that the evaluation of a+b isn't going to throw an exception denoting overflow, but such a case is so diminishingly rare that a special case for it would just mask real problems, and it would be really simple to fix by just assigning it to a temporary variable.
In any case, for any feature in any programming language and/or runtime and/or environment, where a feature isn't available, the reasons for why it isn't available are:
It wasn't designed properly
It wasn't specified properly
It wasn't implemented properly
It wasn't tested properly
It wasn't documented properly
It wasn't prioritized above competing features
All of these are required to get a feature to appear in version X of application Y, be it C# 4.0 or MS Works 7.0.
Nope, there's no way of knowing.
All that gets passed in is the object reference. Whether it is 'newed' in-situ, or is sourced from an array, the method in question has no way of knowing how the parameters being passed in have been instantiated and/or where.
One way to know whether an object passed to a function (or a method) was created right before the call is to give the object a property that is initialized, at construction, with a timestamp obtained from a system function; by looking at that property, it would be possible to tell.
Frankly, I would not use such a method, because:
I don't see any reason why the code should know whether the passed parameter is a freshly created object or one created at some earlier moment.
The method I suggest depends on a system function that might not be present on some systems, or that might be less reliable.
With modern CPUs, which are far faster than the CPUs used 10 years ago, it could be hard to pick the right threshold value for deciding whether an object has been freshly created or not.
The other solution would be to use an object property that is set to one value by the object's constructor and to a different value by every method of the object.
In this case the problem would be forgetting to add the code that changes that property in each method.
Once again I would ask myself "Is there really a need to do this?".
As a possible partial solution, if you only wanted a single instance of an object to be consumed by a method, maybe you could look at a Singleton. That way the method in question could not create another instance if one already existed.

Ref Abuse: Worth Cleaning Up?

I have inherited some code that uses the ref keyword extensively and unnecessarily. The original developer apparently feared objects would be cloned like primitive types if ref was not used, and did not bother to research the issue before writing 50k+ lines of code.
This, combined with other bad coding practices, has created some situations that are absurdly dangerous on the surface. For example:
Customer person = NextInLine();
//person is Alice
person.DataBackend.ChangeAddress(ref person, newAddress);
//person could now be Bob, Eve, or null
Could you imagine walking into a store to change your address, and walking out as an entirely different person?
Scary, but in practice the use of ref in this application seems harmlessly superfluous. I am having trouble justifying the extensive amount of time it would take to clean it up. To help sell the idea, I pose the following question:
How else can unnecessary use of ref be destructive?
I am especially concerned with maintenance. Plausible answers with examples are preferred.
You are also welcome to argue clean-up is not necessary.
I would say the biggest danger is if the parameter were set to null inside the function for some reason:
public void MakeNull(ref Customer person)
{
// random code
person = null;
return;
}
Now, you're not just a different person, you've been erased from existence altogether!
As long as whoever is developing this application understands that:
By default, object references are passed by value.
and:
With the ref keyword, object references are passed by reference.
If the code works as expected now and your developers understand the difference, it's probably not worth the effort it's going to take to remove them all.
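The two statements above can be checked with a short sketch. RefSemantics and its helper methods are hypothetical names used only for this illustration.

```csharp
public static class RefSemantics
{
    public class Customer { public string Name = "Alice"; }

    // The reference is copied: changes to the object are visible to the caller...
    static void MutateByValue(Customer c) { c.Name = "Bob"; }

    // ...but reassigning the parameter only changes the local copy.
    static void ReassignByValue(Customer c) { c = null; }

    // With ref, the parameter is an alias for the caller's variable.
    static void ReassignByRef(ref Customer c) { c = null; }

    public static string Run()
    {
        var person = new Customer();

        MutateByValue(person);
        ReassignByValue(person);
        bool survivedByValue = person != null;
        string name = person.Name;

        ReassignByRef(ref person);
        bool survivedByRef = person != null;

        return $"{name},{survivedByValue},{survivedByRef}";
    }
}
```

Run() returns "Bob,True,False": the mutation is visible without ref, while only the ref version can replace (or null out) the caller's variable.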
I'll just add the worst use of the ref keyword I've ever seen, the method looked something like this:
public bool DoAction(ref Exception exception) {...}
Yup, you had to declare and pass an Exception reference in order to call the method, then check the method's return value to see if an exception had been caught and returned via the ref exception.
Can you work out why the originator of the code thought that they needed to have the parameter as a ref? Was it because they did update it and then removed the functionality, or was it simply because they didn't understand C# at the time?
If you think the clean-up is worth it, then go ahead with it - particularly if you have the time now. You might not be in a position to fix it properly when a real issue does arise, as it will most likely be an urgent bug fix and you won't have the time to do a proper job.
It is quite common in C# to modify the values of arguments inside methods, since they are usually passed by value and not by ref, so the change stays local to the method. With ref that no longer holds, for both reference and value types; setting a ref parameter to null, for instance, changes the caller's original reference. This can lead to very strange and painful bugs when other developers work "as usual". Creating recursive methods with ref arguments is a no-go.
Apart from this, you have restrictions on what you can pass by ref. For instance, you cannot pass constant values, readonly fields, properties etc., so that a lot of helper variables are required when calling methods with ref arguments.
Last but not least, the performance is likely not nearly as good, since it requires more indirection (a ref is just a reference which needs to be resolved on every access) and may also keep objects alive longer, since the references do not go out of scope as quickly.
To me, smells like a C++ developer making unwarranted assumptions.
I'd be wary of making wholesale changes to something that works. (I'm assuming it works because you don't comment about it being broken, just about it being dangerous).
The last thing you want to do is to break something subtle and have to spend a week tracking down the problem.
I suggest you clean up as you go - one method at a time.
Find a method that uses ref where you're sure it isn't required.
Change the method signature and fix up the calls.
Test.
Repeat.
While the specific problems you have may be more severe than most cases, your situation is pretty common - having a large code base that doesn't comply with our current understanding of the "right way" to do things.
Wholesale "upgrades" often run into difficulties. Refactoring as you go - bringing things up to spec as you work on them - is much safer.
There's precedent here in the building industry. For example, electrical wiring in older (say, 19th century) buildings doesn't need to be touched unless there's a problem. When there is a problem, though, the new work has to be completed to modern standards.
I would try to fix it. Just perform a solution-wide string replacement with a regex and check the unit tests after that. I am aware that this might break the code. But how often do you use ref? Almost never, right? And given that the developer did not know how it works, I consider the chance that it is used somewhere the way it should be even smaller. And if the code breaks - well, roll back ...
How else can unnecessary use of ref be destructive?
The other answers deal with semantic issues which is definitely the most important thing. Good code is self-documenting, and when I give a ref parameter, I assume it will change. If it doesn't, the API failed to be self-documenting.
But for fun, how about we look at another aspect -- performance?
void ChangeAddress(ref Customer person, Address address)
{
person.Address = address;
}
Here person is a reference to a reference, so there will be some indirection introduced whenever you access it. Let's look at some assembly this might translate into:
mov eax, [person] ; load the reference to person.
mov [eax+Address], address ; assign address to person.Address.
Now the non-ref version:
void ChangeAddress(Customer person, Address address)
{
person.Address = address;
}
There's no indirection here, so we can get rid of one read:
mov [person+Address], address ; assign address to person.Address.
In practice, one hopes that .NET caches [person] in the ref version, amortizing the indirection cost over multiple accesses. It probably won't actually be a 50% drop in instruction count outside of trivial methods like the ones here.
