I'm new to C# and any form of programming, and I have a question that seems to divide those in the know in my university faculty. That question is simply: do I always have to declare a variable? As a basic example of what I'm talking about: If I have int pounds and int pence do I need to declare int money into which to put the answer or is it ok to just have:
textbox1.Text = (pounds + pence).ToString();
I know both work, but I'm thinking in terms of best practice.
Thanks in advance.
In my opinion the answer is "no". You should, however, use variables in some cases:
Whenever a value is used multiple times
When a call to an expensive function is done, or one that has side-effects
When the expression needs to be made more self-explanatory; variables (with meaningful names) do help
Basically, follow your common sense. The code should be self-explaining and clear, if introducing variables helps with that then use them.
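A minimal sketch of the first two bullets, assuming a hypothetical GetExchangeRate method that stands in for an expensive call (it is not part of the question). The single-use sum stays inline, while the value that is both expensive and used twice gets a named variable:

```csharp
using System;

public static class MoneyDemo
{
    // Counts how often the "expensive" method runs.
    public static int RateCalls;

    // Hypothetical stand-in for an expensive call (an assumption for illustration).
    public static decimal GetExchangeRate()
    {
        RateCalls++;
        return 1.25m;
    }

    // Used once and obvious at a glance: no variable needed.
    public static string FormatTotal(int pounds, int pence)
    {
        return (pounds + pence).ToString();
    }

    // Used twice and expensive: compute it once and give it a meaningful name.
    public static decimal ConvertBoth(decimal first, decimal second)
    {
        decimal rate = GetExchangeRate();
        return first * rate + second * rate;
    }

    public static void Main()
    {
        Console.WriteLine(FormatTotal(3, 50));
        Console.WriteLine(ConvertBoth(10m, 20m));
        Console.WriteLine(RateCalls);
    }
}
```

Without the rate variable, ConvertBoth would call GetExchangeRate twice, which matters if the call is slow or has side effects.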
Maintenance is king. It's the most expensive part of software development and anything you can do to make it easier is good.
Variables are good because when debugging, you can inspect the results of functions and calculations.
Absolutely not. The only time I would create a variable for a single use like this is if it significantly increases the readability of my code.
In my opinion if you do something like
int a = SomeFunc();
int b = SomeFunc2();
int c = a + b;
SomeFunc3(c);
it is better to just do
int a = SomeFunc();
int b = SomeFunc2();
SomeFunc3(a + b);
or even just
SomeFunc3(SomeFunc() + SomeFunc2());
If I am not going to manipulate the variable after it's calculated, then I think it's better not to declare it, because you just get more lines of code and more room to make a mistake later on when your code gets bigger.
Variables come to serve the following two purposes above all else:
Place holder for data (and information)
Readability enhancer
Of course more can be said about the job of a variable, but those other tasks are far less relevant here.
The above two points have the same importance as far as I'm concerned.
If you think that declaring a variable will enhance readability, or if you think that the data stored in that variable will be needed many times (in which case storing it in a well-named variable will again increase readability), then by all means create a new variable.
The only time I strictly advise against creating more variables is when the clutter of too many variables hurts readability more than it helps, and this cannot be undone by method extraction.
I would suggest that logging frequently makes variable declaration worthwhile, when you need to know what something specific is and you need to track that specific value. And you are logging, aren't you? Logging is good. Logging is right. Logging is freedom and unicorns and happy things.
I don't always use a variable. As an example, if I have a method evaluating something and returning true/false, I typically return the expression directly. The results are logged elsewhere, and I have the inputs logged, so I always know what happened.
Localisation and scope
For a programmer, knowledge of local variables - their content and scopes - is an essential part of the mental effort in comprehending and evolving the code. When you reduce the number of concurrent variables, you "free up" the programmer to consider other factors. Minimising scope is a part of this. Every little decision implies something about your program.
void f()
{
    int x = ...; // "we need x (or side effect) in next scope AND
                 // thereafter..."
    {
        int n = ...; // "n isn't needed at function scope..."
        ...
    } // can "pop" that n mentally...
    ...
}
The smallest scope is a literal or temporary result. If a value is only used once I prefer to use comments rather than a variable (they're not restricted to A-Za-z0-9_ either :-)):
x = employees.find("*",                        // name
                   retirement.qualify_at(),    // age
                   num_wives() + num_kids());  // # dependents
Concision
Keeping focused on what your program is achieving is important. If you have a lot of screen real-estate (i.e. lines of code) getting fields into variables, you've less code on screen that's actually responsible for getting algorithmic level stuff done, and hence it's less tangible to the programmer. That's another reason to keep the code concise, so:
keep useful documentation targeted and concise
It depends on the situation. There is no one practice that would be best for all. For something this simple, you can skip creating a new variable but the best thing to do is step back and see how readable your expression is and see if introducing an intermediate variable will help the situation.
There are two objectives in making this decision:
Readability - the code is readable and self-explanatory
Optimization - the code doesn't have any unnecessary calculation
If you look at this as an optimization problem it might seem less subjective
Most readable on a scale from 1 to 10 with 10 being the easiest. Using sensible variable names may give you a 2, showing the calculation in line may give you a 3 (since the user doesn't have to look up what "money" is, it's just there in that line of code). etc etc. This piece is subjective, you and the companies you work for define what is readable and you can build this cost model from that experience.
Most optimal execution is not subjective. If you write "pounds + pence" everywhere you want the money calculation to go, you are wasting processor time. Yes I know addition is a bad example, but it still holds true. Say minimum execution of a process is simplified to memory allocation of variables, assignments, and calculations. Maybe one or two of these additions in the code will be ok for readability, but at some point it becomes a complete waste on the processor. This is why variables exist, allocate some space to store the value, name it money so the user knows what it is, and reference that variable "money" everywhere it's needed.
This makes more sense when you look at loops. Let's say you want to sum 1000 values in the following loop.
money = factor[1] + factor[2] + ... + factor[n]
You could repeat this everywhere you want to use the value of money, so that anyone who reads your code knows what money consists of. Instead, just do it once, and add a comment where you first calculate money so other programmers can come back and reference it.
Long story short, if you only use money once and it's clear what the inline calculation means, then of course, don't make a variable. If you plan on using it throughout your code and its meaning becomes remotely confusing, then declare a variable, save the processor and be done with it!
Note: partially kidding about this approach, just thought it was funny to answer something like this in a cost model format :) still useful, I'd say
I don't recall ever seeing a rule like this, and I think it's more tied to different "styles" of programming. Some styles, such as Spartan programming, actually attempt to declare as few variables as possible. If you aren't trying to follow a particular style, then it's best to go by readability. In your example I wouldn't declare a special variable to hold it. If you were calculating taxes based on some percentage of the total of those, then I might, or at the very least I would comment what I was calculating.
Am I correct in saying that this:
public static void MethodName(bool first, bool second, bool third)
{
//Do something
}
Is more efficient than this:
public static void MethodName(bool[] boolArray)
{
bool first = boolArray[0];
bool second = boolArray[1];
bool third = boolArray[2];
//Do something
}
My thoughts are that for both they would have to declare first, second and third - just in different places. But for the second one it has to add it into an array and then unpack it again.
Unless you declared the array like this:
MethodName(new[] { true, true, true });
In which case I am not sure which is faster?
I ask because I am thinking of using the second one but wanted to know if/what the implications are on performance.
In this case performance is not particularly important, but it would be helpful for me to clarify this point.
Also, the second one has the advantage that you can pass as many values as you like to it, and it is also easier to read I think?
The reason I am thinking of using this is because there are already about 30 parameters being passed into the method and I feel it is becoming confusing to keep adding more. All these bools are closely related so I thought it may make the code more manageable to package them up.
I am working on existing code and it is not in my project scope to spend time reworking the method to decrease the number of parameters that are passed into the method, but I thought it would be good practice to understand the implications of this change.
In terms of performance, there's just an answer for your question:
"Programmers waste enormous amounts of time thinking about, or
worrying about, the speed of noncritical parts of their programs, and
these attempts at efficiency actually have a strong negative impact
when debugging and maintenance are considered. We should forget about
small efficiencies, say about 97% of the time: premature optimization
is the root of all evil. Yet we should not pass up our opportunities
in that critical 3%."
In terms of productivity, parameters > arrays.
Side note
Everyone should know that this was said by Donald Knuth in 1974. More than 40 years after this statement, we still fall into premature optimization (or even pointless optimization) very often!
Further reading
I would take a look at this other Q&A on Software Engineering
Am I correct in saying that this:
Is more efficient than this:
In isolation, yes. Unless the caller already has that array, in which case the second is the same or even (for larger argument types or more arguments) minutely faster.
I ask because I am thinking of using the second one but wanted to know if/what the implications are on performance.
Why are you thinking about the second one? If it is more natural at the point of the call then the reasons making it more natural are likely going to also have a performance impact that makes the second the better one in the wider context that outweighs this.
If you're starting off with three separate bools and you're wrapping them just to unwrap them again then I don't see what this offers in practice except for more typing.
So your reason for considering this at all is the more important thing here.
In this case performance is not particularly important
Then really don't worry about it. It's certainly known for hot-path code that hits params to offer overloads that take set numbers of individual parameters, but it really does only make a difference in hot paths. If you aren't in a hot path the lifetime saving of computing time of picking whichever of the two is indeed more efficient is unlikely to add up to the amount of time it took you to write your post here.
If you are in a hot path and really need to shave off every nanosecond you can because you're looping so much that it will add up to something real, then you have to measure. Isolated changes have non-isolated effects when it comes to performance, so it doesn't matter whether the people on the Internet tell you A is faster than B if the wider context means the code calling A is slower than B. Measure. Measurement number one is "can I even notice?", if the answer to that measurement is "no" then leave it alone and find somewhere where the performance impact is noticeable to optimise instead.
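As a rough sketch of what "measure" can look like, here is a crude timing harness built on System.Diagnostics.Stopwatch. Count3, CountArray and TimeMs are hypothetical names invented for this illustration, and the delegate call inside the loop adds its own overhead, so treat the numbers as a first "can I even notice?" check rather than a precise benchmark:

```csharp
using System;
using System.Diagnostics;

public static class MeasureDemo
{
    // Variant A: three separate parameters.
    public static int Count3(bool first, bool second, bool third)
    {
        return (first ? 1 : 0) + (second ? 1 : 0) + (third ? 1 : 0);
    }

    // Variant B: the same values packed into an array.
    public static int CountArray(bool[] flags)
    {
        return (flags[0] ? 1 : 0) + (flags[1] ? 1 : 0) + (flags[2] ? 1 : 0);
    }

    // Runs the body many times and returns the elapsed milliseconds.
    public static long TimeMs(Func<int> body, int iterations)
    {
        long sink = 0;
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            sink += body();
        }
        sw.Stop();
        GC.KeepAlive(sink); // use the result so the loop is not trivially dead code
        return sw.ElapsedMilliseconds;
    }

    public static void Main()
    {
        bool[] flags = new[] { true, false, true };
        const int iterations = 10000000;
        Console.WriteLine("Parameters: " + TimeMs(() => Count3(true, false, true), iterations) + " ms");
        Console.WriteLine("Array:      " + TimeMs(() => CountArray(flags), iterations) + " ms");
    }
}
```

Run it in a release build and several times over; if the two numbers are indistinguishable, that answers the question for this call site.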
Write "natural" code to start with, before seeing if little tweaks can have a performance impact in the bits that are actually hurting you. This isn't just because of the importance of readability and so on, but also because:
The more "natural" code in a given language very often is the more efficient. Even if you think it can't be, it's more likely to benefit from some compiler optimisation behind the scenes.
The more "natural" code is a lot easier to tweak for performance when it is necessary than code doing a bunch of strange things.
I don't think this would affect the performance of your app at all.
Personally
I'd go with the first option for two reasons:
Naming each parameter: this helps if the project is large and there is a lot of code, or for possible future edits and enhancements.
Usability: if you are sending a list of similar parameters then you should use an array or a list; if it is just a couple of parameters that happen to be of the same type then you should send them separately.
A third way would be to use params (see params on MSDN).
In the end I don't think it will change much in performance.
That said, an array type inherits from the abstract Array class, which implements IEnumerable (along with ICloneable, IList, ICollection, IStructuralComparable and IStructuralEquatable), and single-dimensional arrays also expose IEnumerable<T>. This means an array is a heavier object than three value-type parameters, which will make it somewhat slower.
See Array on MSDN.
You could test performance differences on both, but I doubt there would be much difference.
You have to consider maintainability, is another programmer, or even yourself going to understand why you did it that way in a few weeks, or a few months time when it's time for review? Is it easily extended, can you pass different object types through to your method?
If you're passing a collection of items, then certainly packing them into an array would be quicker than specifying a new parameter for each additional item?
If you have to, you can do it that way, but have you considered a params array?
See: Why use the params keyword?
public static void MethodName(params bool[] boolArray)
{
//extract data here
}
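A small sketch of how a params method can be called, using a hypothetical CountTrue method as the example body. The caller can pass separate arguments, an existing array, or nothing at all:

```csharp
using System;

public static class ParamsDemo
{
    // Hypothetical example method: counts how many of the supplied flags are true.
    public static int CountTrue(params bool[] flags)
    {
        int count = 0;
        foreach (bool flag in flags)
        {
            if (flag) count++;
        }
        return count;
    }

    public static void Main()
    {
        // Separate arguments: the compiler builds the array for you.
        Console.WriteLine(CountTrue(true, false, true));

        // An existing array is passed through as-is.
        Console.WriteLine(CountTrue(new[] { true, true }));

        // No arguments: the method receives an empty array.
        Console.WriteLine(CountTrue());
    }
}
```

Note that the separate-argument form still allocates an array on each call, so params is a convenience for the caller, not a performance optimisation.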
Agreed with Matias' answer.
I also want to add that you need to add error checking, as you are passed an array, and nowhere is stated how many elements in your array you will receive. So you must first check that you have three elements in your array. This will balance the small perf gain that you may have earned.
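The check described here might look like the following sketch. CountSet is a hypothetical method body chosen for illustration, and the exact exception types are a design choice, not something the question prescribes:

```csharp
using System;

public static class ArrayArgDemo
{
    // Validates the array before unpacking it, since the signature
    // alone does not promise exactly three elements.
    public static int CountSet(bool[] boolArray)
    {
        if (boolArray == null)
        {
            throw new ArgumentNullException(nameof(boolArray));
        }
        if (boolArray.Length != 3)
        {
            throw new ArgumentException("Expected exactly three values.", nameof(boolArray));
        }

        bool first = boolArray[0];
        bool second = boolArray[1];
        bool third = boolArray[2];
        return (first ? 1 : 0) + (second ? 1 : 0) + (third ? 1 : 0);
    }

    public static void Main()
    {
        Console.WriteLine(CountSet(new[] { true, false, true }));
        try
        {
            CountSet(new[] { true });
        }
        catch (ArgumentException ex)
        {
            Console.WriteLine("Rejected: " + ex.Message);
        }
    }
}
```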
Also, if you ever want to make this method available to other developers (as part of an API, public or private), IntelliSense will not help them at all with which parameters they're supposed to set...
While using three parameters, you can do this :
///<summary>
///This method does something
///</summary>
///<param name="first">The first parameter</param>
///<param name="second">The second parameter</param>
///<param name="third">The third parameter</param>
public static void MethodName(bool first, bool second, bool third)
{
//Do something
}
And it will be displayed nicely and helpfully to others...
I would take a different approach and use Flags;
const int FIRST = 1; // bit 0; the value is an assumption for illustration
public static void MethodName(int Flag)
{
    if ((Flag & FIRST) != 0) { }
}
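In C# the same idea is usually written with a [Flags] enum rather than a raw int. This is a sketch only; the Options enum and its member names are invented for the example:

```csharp
using System;

// Hypothetical option set; each member occupies its own bit.
[Flags]
public enum Options
{
    None = 0,
    First = 1,
    Second = 2,
    Third = 4
}

public static class FlagsDemo
{
    // True when every bit of 'flag' is set in 'value'.
    public static bool IsSet(Options value, Options flag)
    {
        return (value & flag) == flag;
    }

    public static void Main()
    {
        Options chosen = Options.First | Options.Third;
        Console.WriteLine(IsSet(chosen, Options.First));
        Console.WriteLine(IsSet(chosen, Options.Second));
        Console.WriteLine(chosen); // [Flags] makes ToString list the set members
    }
}
```

This keeps the call site readable (named options instead of positional bools) while still passing a single argument.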
Chances are the compiler will do its own optimizations;
Check http://rextester.com/QRFL3116 Added method from Jamiec comment
M1 took 5ms
M2 took 23ms
M3 took 4ms
I am trying to optimize my code and was running VS performance monitor on it.
It shows that simple assignment of float takes up a major chunk of computing power?? I don't understand how is that possible.
Here is the code for TagData:
public class TagData
{
public int tf;
public float tf_idf;
}
So all I am really doing is:
float tag_tfidf = td.tf_idf;
I am confused.
I'll post another theory: it might be the cache miss of the first access to members of td. A memory load takes 100-200 cycles which in this case seems to amount to about 1/3 of the total duration of the method.
Points to test this theory:
Is your data set big? I bet it is.
Are you accessing the TagData's in random memory order? I bet they are not sequential in memory. This causes the memory prefetcher of the CPU to be dysfunctional.
Add a new line int dummy = td.tf; before the expensive line. This new line will now be the most expensive line because it will trigger the cache miss. Find some way to do a dummy load operation that the JIT does not optimize out. Maybe add all td.tf values to a local and pass that value to GC.KeepAlive at the end of the method. That should keep the memory load in the JIT-emitted x86.
I might be wrong but contrary to the other theories so far mine is testable.
Try making TagData a struct. That will make all items of term.tags sequential in memory and give you a nice performance boost.
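A sketch of what the struct version could look like, assuming the tags can be held in an array. TagDataStruct and SumTfIdf are hypothetical names; whether this actually helps depends on how the data is accessed, so it should be profiled rather than assumed:

```csharp
using System;

// Hypothetical struct version of TagData. As a value type, each element
// is stored inline in the array instead of behind a separate reference.
public struct TagDataStruct
{
    public int tf;
    public float tf_idf;
}

public static class StructLayoutDemo
{
    // Walks the array in order, so memory is read sequentially.
    public static float SumTfIdf(TagDataStruct[] tags)
    {
        float sum = 0f;
        for (int i = 0; i < tags.Length; i++)
        {
            sum += tags[i].tf_idf;
        }
        return sum;
    }

    public static void Main()
    {
        TagDataStruct[] tags = new TagDataStruct[2];
        tags[0].tf_idf = 0.5f;
        tags[1].tf_idf = 1.5f;
        Console.WriteLine(SumTfIdf(tags));
    }
}
```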
Are you using LINQ? If so, LINQ uses lazy enumeration so the first time you access the value you pulled out, it's going to be painful.
If you are using LINQ, call ToList() after your query to only pay the price once.
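To illustrate the deferred execution being described, here is a small sketch that counts how often the projection runs. Square and CountEvaluations are hypothetical helpers made up for this example:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyLinqDemo
{
    // Counts how many times the projection has been executed.
    public static int Evaluations;

    public static int Square(int x)
    {
        Evaluations++;
        return x * x;
    }

    // Enumerates the same query twice and reports how often Square ran.
    public static int CountEvaluations(bool materialize)
    {
        Evaluations = 0;
        IEnumerable<int> query = new[] { 1, 2, 3 }.Select(x => Square(x));
        if (materialize)
        {
            query = query.ToList(); // runs the query once and stores the results
        }

        int total = 0;
        foreach (int value in query) total += value;
        foreach (int value in query) total += value;
        return Evaluations;
    }

    public static void Main()
    {
        Console.WriteLine("Deferred:     " + CountEvaluations(false));
        Console.WriteLine("Materialized: " + CountEvaluations(true));
    }
}
```

Without ToList the projection runs again on every enumeration; with it, the work is paid for once up front.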
It also looks like your data structure is sub optimal but since I don't have access to your source (and probably couldn't help even if I did :) ), I can't tell you what would be better.
EDIT: As commenters have pointed out, LINQ may not be to blame; however my question is based on the fact that both foreach statements are using IEnumerable. The TagData assignment is a pointer to the item in the collection of the IEnumerable (which may or may not have been enumerated yet). The first access of legitimate data is the line that pulls the property from the object. The first time this happens, it may be executing the entire LINQ statement and since profiling uses the average, it may be off. The same can be said for tagScores (which I'm guessing is database backed) whose first access is really slow and then speeds up. I wasn't pointing out the solution just a possible problem given my understanding of IEnumerable.
See http://odetocode.com/blogs/scott/archive/2008/10/01/lazy-linq-and-enumerable-objects.aspx
As we can see that next line to the suspicious one takes only 0.6 i.e
float tag_tfidf = td.tf_idf;//29.6
string tagName =...;//0.6
I suspect this is caused by the excessive number of calls. Also note that float is a value type, meaning it is copied by value. So every time you assign it, the runtime creates a new float (Single) struct and initializes it by copying the value from td.tf_idf, which takes a huge amount of time.
You can see that string tagName =...; doesn't take much, because it is copied by reference.
Edit: As comments pointed out, I may be wrong in that respect. This might also be a bug in the profiler; try re-profiling and see if that makes any difference.
ReSharper usually suggests that I declare variables closer to where they are used, and I'm still looking for a good reason to do that.
The only thing that came to my mind is that declaring it closer to the scope it will be used, can avoid initializing it in some cases where it isn't necessary (because a condition, etc.)
Something related with that is the following:
int temp;
foreach (var x in collection) {
    temp = x.GetValue();
    //Do something with temp
}
Is that really different than
foreach (var x in collection) {
    int temp = x.GetValue();
    //...
}
I mean, isn't the second version more expensive because it allocates memory every time? Or are both the same? Of course, after the loop finishes, in the second version the garbage collector will take care of the temp variable, but not in the first one...
Declaring as close as possible to use is a readability decision. Your example doesn't display it, but in longer methods it's hard to sift through the code to find the temp variable.
It's also a refactoring advantage. Declaring closer to source leads to easier refactoring later.
The cost of the second example is negligible. The only difference is that in the first example, temp will be available outside the scope of the for loop, and thus it will exist longer than if you declared it inside the for loop.
If you don't need temp outside the for loop, it shouldn't be declared outside that loop. Like others have said, readability and style are more at play here than performance and memory.
I agree that if you init a variable inside the scope where it's being used then you're helping the GC out, but I think the real reason is more to do with code maintenance best practices. It's sort of a way of reducing cognitive load on you or another developer coming back to the code after months (or years) of not looking at a particular block. Sure, IDEs help you discover things, but you still have to do the "go to definition" dance.
There are no performance benefits, I believe; it's more a matter of coding style. It's more of a C programming style to declare everything at the beginning of the scope. There are more details here: Scope of variables in C#
It's a personal style preference to do with readability.
There are very few languages/systems where this will have any noticeable effect on performance.
I try to follow these two rules.
All the core attributes of a class should be defined together in one place. e.g. If you are handling an order then orderno, customerno, amount, sales tax etc. should be defined close together.
All the technical attributes which form part of the internal mechanics of the class, such as iterators, flags and state variables, should be defined close to their usage.
Or to put it another way: business/external data is all defined in one place, technical/internal data is defined close to usage.
The difference is a matter of coding style and one of such dispute that different coding standards have completely opposite rules. The conflict is still strongest in the C++ world where the C language forced variables to be declared at the beginning of a scope and so old-timers (like myself) were well accustomed to "looking at the beginning of the function" to find variables.
The C# style that you most often see is that variables come into existence right at the point where they are needed. This style limits the existence of the variable and minimizes the chance that you could mean some other variable accidentally. I find it very easy to read.
In the modern C# era, putting the declaration of variables at their first point of use is most clearly beneficial when combined with the both loved and hated var feature. Using var just isn't that useful unless you use it with an assignment that allows the compiler and readers to infer the type of the variable. The var feature encourages declaration with first use.
Me, I love var, and so you can guess which coding style I prefer!
I was always taught to declare your variables at the top of a function, class, etc. This makes it easier to read.
OK, so the title may have been confusing, so I have posted 2 code snippets to illustrate what I mean.
NOTE: allUsers is just a collection.
RegularUser regUser = new RegularUser(userName, password, name, emailAddress);
allUsers.Add(regUser);
VS
allUsers.Add(new RegularUser(userName, password, name, emailAddress));
Which snippet A or B is better and why?
What are the advantages or disadvantages?
The example I wrote was C# but does the language (C#, Java etc) make a difference?
As far as C# is concerned, both of your code examples are practically identical at the IL level. The second example still creates a reference to the created object and pushes it onto the stack; you just don't have a local variable hooked up to it. This will not create any performance problems at all.
1) Which snippet A or B is better and why?
They're really identical. The compiled code will be nearly identical, since a temporary object is pushed onto the stack, then used in the method call.
2) What are the advantages or disadvantages?
The main advantages and disadvantages to the approach are really just readability.
Your first example has the advantage of keeping a single "operation" per line of code, which, in many ways, is more maintainable.
The second example removes the unnecessary variable declaration, which may be more maintainable.
Personally, I feel that the number of parameters in your RegularUser constructor would probably push me, in this instance, towards your first option. I typically find that, when a line of code gets to be more than about half a screen width on an average monitor, it's easier to read and understand if it's split up. Splitting this up by introducing the temporary and calling Add separately makes this more clear.
However, if you're just adding an integer or a class that's very small, I'd probably vote to skip the unnecessary variable. This is completely a personal preference, however - your mileage may (and probably will) vary.
3) The example I wrote was C# but does the language (C#, Java etc) make a difference?
No, for the most part. This is really language/implementation dependent, but most languages will have the same basic behavior and performance in both cases. It is possible (and highly likely) that some languages may treat this differently, but most mainstream languages will not.
I really like to create them the first way unless I really really know what is going on. It is much harder to do debugging if you don't create the object first...
The compiler will just turn the 2nd version into the 1st for you, anyway, so there isn't a net negative effect.
Pros of #1:
easier to debug (!)
theoretically easier to read, clearer
can use the object later
Cons:
more verbose
can be unnecessary, especially for trivial objects
Result:
1 for anything complex to create, or that may need to be inspected easily at debug time
2 for lots of annoying little stuff, like the following.
var list = new List<NameValuePair>(3);
list.Add(new NameValuePair("name", "valuable"));
list.Add(new NameValuePair("age", "valuable"));
list.Add(new NameValuePair("height", "not valuable"));
var dates = new List<DateTime>();
dates.Add(DateTime.Now);
dates.Add(DateTime.Now.Date.AddYears(-2));
As far as I know there isn't a real difference between languages when it comes to this. Some may not allow it, though.
Both are equal in terms of performance.
In terms of maintainability the second case is a nightmare; it is (nearly) impossible to trace in a debugger. So I tend to prefer the first one. In my early OOP days I was always writing the second, because "I knew that they were objects and I was sooo good at grasping objects that I ... blah blah blah", but that wore off with time, and especially with maintenance time.
Also, suppose that someone wants you to
FilterClass.FilterUser(regUser)
or
Database.AddToDatabase(regUser)
because it is the right place to do so, the first scenario is better.
Finally, when do you stop?
allUsers.Add(new RegularUser(new ReadFromInput(new EscapedName(new Name(new String(userName)))), password, name, emailAddress));
From a post-compilation perspective (rather than a coding syntax perspective), in C#, is there any actual difference in the compiled code between a set of operations that have occurred on one line to a set of operations that occur across multiple lines?
This
object anObject = new object();
anObject = this.FindName("rec"+keyPlayed.ToString());
Rectangle aRectangle = new Rectangle();
aRectangle = (Rectangle)anObject;
vs this.
Rectangle aRectangle = (Rectangle)this.FindName("rec"+keyPlayed.ToString());
I wonder because there seems to be a view that the least amount of lines used is better however I would like to understand if this is because there is a tangible technical benefit or if there was at some point a tangible benefit or if it is indeed for a reason that is quantifiable?
The number of lines doesn't matter; the IL will be identical if the code is equivalent (yours isn't).
And actually, unless we know what FindName returns, we can't answer properly - since by casting to object you might be introducing a "box" operation, and you might be changing a conversion operation (or perhaps a passive no-op cast) into an active double-cast (cast to object, cast to Rectangle). For now, I'll assume that FindName returns object, for simplicity. If you'd used var, we'd know at a glance that your code wasn't changing the type (box / cast / etc):
var anObject = this.FindName("rec"+keyPlayed.ToString());
In release mode (with optimize enabled) the compiler will remove most variables that are set and then used immediately. The biggest difference between your two versions is that the second one doesn't create and discard new object() and new Rectangle(). But if you hadn't done that, the code would have been equivalent (again, assuming that FindName returns object):
object anObject;
anObject = this.FindName("rec"+keyPlayed.ToString());
Rectangle aRectangle;
aRectangle = (Rectangle)anObject;
Some subtleties exist if you re-use the variable (in which case it can't necessarily be removed by the compiler), and if that variable is "captured" by a lambda/anon-method, or used in a ref/out. And some more subtleties for some math scenarios if the compiler/JIT chooses to do an operation purely in the registers without copying it back down to a variable (the registers have different (greater) width, even for "fixed-size" math like float).
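The lambda capture subtlety can be shown with a short sketch. CaptureDemo and its two methods are invented for this illustration; the only difference between them is where temp is declared:

```csharp
using System;
using System.Collections.Generic;

public static class CaptureDemo
{
    // temp is declared once, outside the loop: every lambda captures
    // the same variable and sees its final value.
    public static List<int> SharedVariable()
    {
        List<Func<int>> actions = new List<Func<int>>();
        int temp;
        foreach (int x in new[] { 1, 2, 3 })
        {
            temp = x * 10;
            actions.Add(() => temp);
        }
        return Invoke(actions);
    }

    // temp is declared inside the loop: each iteration gets a fresh
    // variable, so each lambda keeps its own value.
    public static List<int> PerIterationVariable()
    {
        List<Func<int>> actions = new List<Func<int>>();
        foreach (int x in new[] { 1, 2, 3 })
        {
            int temp = x * 10;
            actions.Add(() => temp);
        }
        return Invoke(actions);
    }

    private static List<int> Invoke(List<Func<int>> actions)
    {
        List<int> results = new List<int>();
        foreach (Func<int> action in actions) results.Add(action());
        return results;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", SharedVariable()));       // 30, 30, 30
        Console.WriteLine(string.Join(", ", PerIterationVariable())); // 10, 20, 30
    }
}
```

So once a variable is captured, where it is declared is no longer just a matter of style: it changes what the program does.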
I think that you should generally aim to make your code as readable as possible, and sometimes that means separating out your code and sometimes it means having it on one line. Aim for readability and if performance becomes a problem, use profiling tools to analyse the code and refactor it if necessary.
The compiled code may not have any difference (with optimization enabled perhaps), but think about readability too :)
In your example, everything on one line is actually more readable than separate lines. What you were trying to do was immediately obvious there. But others can quickly point out counter-examples. So use your good judgment to decide which way to go.
There's a refactoring pattern to prefer a call to a temporary variable. Following this pattern reduces the number of lines of code but makes interactive debugging harder.
One of the main practical differences between the two is that, when debugging, it can be useful to have the individual steps on different lines with results being passed to local variables.
This means that you can cleanly step through the different bits of code which give the final result and see the intervening values.
When you build optimized the compiler will remove the steps and make the code efficient.
With your example there is an actual difference, as you in the first piece of code are creating objects and values that you don't use.
The proper way to write that code is like this:
object anObject;
anObject = this.FindName("rec" + keyPlayed.ToString());
Rectangle aRectangle;
aRectangle = (Rectangle)anObject;
Now, the difference between that and the single line version is that you are declaring one more local variable. In most cases the compiler can optimize that so that the generated code is identical anyway, and even if it actually uses one more local variable in the generated code, that is still negligible compared to anything else you do in that code.
For this example I think that the single line version is clearer, but with more complicated code it can of course be clearer to split it into several stages. Local variables are very cheap, so you should not hesitate to use some if the code gets clearer.