Boolean vs memory - C#

We had a discussion at work about code design, and one of the issues was how to handle the response from a call to a boolean method like this:
bool ok = IsEverythingOK();
if(ok)
{
//do something
}
One of my colleagues insists that we skip the extra variable ok and write
if(IsEverythingOK())
{
//do something
}
He says that introducing the extra "bool ok" variable is bad in terms of memory.
Which one should we use?

Paraphrasing your question:
Is there a cost to using a local variable?
Because C# and .NET are well engineered, my expectation is that using a local variable as you describe has no cost, or a negligible one, but let me try to back this expectation with some facts.
The following C# code
if (IsEverythingOk()) {
...
}
will compile into this (simplified) IL (with optimizations turned on)
call IsEverythingOk
brfalse.s AfterIfBody
... if body
Using a local variable
var ok = IsEverythingOk();
if (ok) {
...
}
you get this optimized (and simplified) IL:
call IsEverythingOk
stloc.0
ldloc.0
brfalse.s AfterIfBody
... if body
On the surface this seems slightly less efficient, because the return value is stored on the stack and then retrieved, but the JIT compiler will also perform some optimizations.
You can see the actual machine code generated by debugging your application with native code debugging enabled. You have to do this using the release build, and you also have to turn off the debugger option that suppresses JIT optimizations on module load. Now you can put breakpoints in the code you want to inspect and then view the disassembly. Note that the JIT is like a black box, and the behavior I see on my computer may differ from what other people see on theirs. With that disclaimer in mind, the assembly code I get for both versions of the code is (with a slight difference in how the call is performed):
call IsEverythingOk
test eax,eax
je AfterIfBody
So the JIT will optimize the extra unnecessary IL away. Actually, in my initial experiments the method IsEverythingOk returned true and the JIT was able to completely optimize the branch away. When I then switched to returning a field in the method the JIT would inline the call and access the field directly.
The bottom line: You should expect the JIT to optimize at least simple things like transient local variables even if the code generates some extra IL that seems unnecessary.
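If you want to convince yourself empirically rather than by reading disassembly, a quick micro-benchmark sketch like the one below (the IsEverythingOk stand-in and iteration count are my own assumptions, not from the question) should show the two variants timing within noise of each other:

```csharp
using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;

static class LocalVsDirect
{
    static bool _flag = true;

    // NoInlining keeps the call opaque, so both loops measure a real call
    // instead of the JIT folding everything into a constant.
    [MethodImpl(MethodImplOptions.NoInlining)]
    static bool IsEverythingOk() => _flag;

    public static long RunWithLocal(int iterations)
    {
        long count = 0;
        for (int i = 0; i < iterations; i++)
        {
            bool ok = IsEverythingOk(); // variant with a local
            if (ok) count++;
        }
        return count;
    }

    public static long RunDirect(int iterations)
    {
        long count = 0;
        for (int i = 0; i < iterations; i++)
        {
            if (IsEverythingOk()) count++; // variant without a local
        }
        return count;
    }

    public static void Main()
    {
        const int n = 100_000_000;
        var sw = Stopwatch.StartNew();
        RunWithLocal(n);
        Console.WriteLine($"local:  {sw.ElapsedMilliseconds} ms");
        sw.Restart();
        RunDirect(n);
        Console.WriteLine($"direct: {sw.ElapsedMilliseconds} ms");
    }
}
```

Remember to run this in a release build; in a debug build the local really is kept, and the comparison tells you nothing about optimized code.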

It all depends on whether you do anything else with ok inside the if body, i.e.:
bool ok = IsEverythingOK();
if(ok)
{
//do something
ok = IsEverythingOK();
}
Assuming you don't do anything else with ok, however, you will probably find that the JIT compiler will turn:
bool ok = IsEverythingOK();
if(ok)
{
//do something
}
...into what is essentially:
if(IsEverythingOK())
{
//do something
}
...anyway.
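For contrast, here is a minimal sketch (with a hypothetical stand-in for IsEverythingOK) of the case where the local genuinely earns its keep, because the value is consumed more than once:

```csharp
using System;

static class HealthCheckDemo
{
    // Hypothetical stand-in for the question's IsEverythingOK().
    public static bool IsEverythingOK() => true;

    public static void Main()
    {
        bool ok = IsEverythingOK();        // the local is justified here:
        Console.WriteLine($"check: {ok}"); // ...the value is used twice,
        if (ok)                            // once for logging, once for the branch
        {
            Console.WriteLine("proceeding");
        }
    }
}
```

In that situation the variable is not "extra" at all; without it you would call the method twice, which may be more expensive and may even return different answers.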

The compiler of course generates a few additional IL instructions if you use the first solution, since it needs at least an extra stloc and ldloc, but if performance is your only concern, forget those microseconds (or rather nanoseconds).
If there is no other use for the ok variable, I would nevertheless prefer the second solution, as it is more readable.

I'd say it's a matter of preference, unless you have a unified coding standard. Neither one nor the other provides a real benefit.
This is great if you're expecting or assuming modification beyond the if clause. Though it does create a stack entry for the variable, it will probably be discarded once it goes out of scope at the end of the method.
bool ok = IsEverythingOK();
if(ok)
{
//do something
}
This is great if you only want to use the result for validation. It's only good if your method name is short, though. Let's say you access a member before using it, like _myLongNameInstance.IsEverythingOK(); that reduces readability, and I'd go with the first form. With other, shorter conditions, I'd choose the direct if.
if(IsEverythingOK())
{
//do something
}

Related

Entity Framework 6 - Enforce asynchronous queries, compile time prevent synchronous calls

With the move to EF6.1, our goal is to use exclusively the async/await options when talking to our data sets. While porting from our previous Linq2Sql, there are many .ToList(), .FirstOrDefault(), and .Count() calls. I know we can search for and fix those all, but it would be nice if we could prevent those functions from even being permitted into a build at compile time. Does anyone have a creative idea on how to accomplish this? Even compiler warnings would do (such as those produced by the Obsolete attribute).
You can use the .NET Compiler Platform to write a Diagnostic and Code Fix that will look for these patterns and provide warnings/errors.
You could even implement a Syntax Transformation to automatically change these constructs - though the effort might be more expensive than just doing it by hand.
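As a rough sketch of what the Roslyn-based approach looks like, the snippet below (my own illustration, assuming the Microsoft.CodeAnalysis.CSharp NuGet package; the method names in the Banned list are taken from the question) walks a syntax tree and flags the suspect calls. A production analyzer would instead derive from DiagnosticAnalyzer and use the semantic model to confirm the receiver is an IQueryable, since this purely syntactic check also flags unrelated methods with the same names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

static class SyncCallFinder
{
    // The synchronous operators the question wants to ban.
    static readonly string[] Banned = { "ToList", "FirstOrDefault", "Count" };

    public static List<string> FindOffenders(string source)
    {
        var tree = CSharpSyntaxTree.ParseText(source);
        return tree.GetRoot()
            .DescendantNodes()
            .OfType<InvocationExpressionSyntax>()
            .Where(inv => inv.Expression is MemberAccessExpressionSyntax m
                          && Banned.Contains(m.Name.Identifier.Text))
            .Select(inv => inv.ToString())
            .ToList();
    }
}
```

Wired into a DiagnosticAnalyzer, each hit would become a compiler warning or error right in the IDE and build, which is exactly the compile-time gate the question asks for.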
Following up on this... I never found a solution that can detect this at compile time, but I was able to do it in code in the DataContext:
public EfMyCustomContext(string connectionString)
: base(string.Format(CONNECTION_STRING, connectionString))
{
#if DEBUG
this.Database.Log = LogDataBaseCall;
#endif
}
#if DEBUG
private void LogDataBaseCall(string s)
{
if (s.Contains("Executing "))
{
if (!s.Contains("asynchronously"))
{
// This code was not executed asynchronously
// Please look at the stack trace, and identify what needs
// to be loaded. Note, an entity.SomeOtherEntityOrCollection can be loaded
// with the eConnect API call entity.SomeOtherEntityOrCollectionLoadAsync() before using the
// object that is going to hit the sub object. This is the most common mistake
// and this breakpoint will help you identify all synchronous code.
// eConnect does not want any synchronous code in the code base.
System.Diagnostics.Debugger.Break();
}
}
}
#endif
Hope this helps someone else; I would still love it if there were some option at compile time.

Is removing unreachable code always safe?

Assume I have a tool that automatically removes C# code that is detected by the compiler as unreachable. Is there a situation in which such operation can get me into trouble? Please share interesting cases.
Here's the interesting example. Consider a function like this:
public static IEnumerable<int> Fun()
{
if (false)
{
yield return 0;
}
}
The line with yield is detected as unreachable. However, removing it makes the program uncompilable. The yield contained in the function tells the compiler to reorganize it into an iterator, so that doing nothing simply returns an empty collection. With the yield line removed, it looks like an ordinary function with no return statement, while one is required.
As noted in the comments, the example is contrived; instead of false, however, we could have a constant value coming from another, generated project, etc. (i.e. such a piece of code wouldn't look as obvious as it does here).
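This also shows what a safe rewrite would have to look like: rather than plain deletion, the tool would need to replace the unreachable yield return with a yield break, which keeps the method an iterator block and preserves the empty-sequence behavior. A minimal sketch:

```csharp
using System.Collections.Generic;

public static class Example
{
    // Equivalent to the original Fun() after the if (false) block is
    // removed: still an iterator block, still yields an empty sequence.
    public static IEnumerable<int> Fun()
    {
        yield break;
    }
}
```

Enumerating Example.Fun() produces zero elements, exactly as the original unreachable version did.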
Edit:
Note that the yield construct is actually very similar to async/await. Yet with the latter, the creators of the language took a different (IMO better) approach that prevents such scenarios. Iterator blocks could have been defined the same way (using a keyword in the function signature instead of detecting it from the function body).
I wouldn't do this automatically, for reasons mentioned in other answers, but I will say here that I'd have a strong bias towards removing unused code over keeping it. After all, tracking obsolete code is what source control is for.
This example is contrived, but this will get flagged for removal (in default DEBUG settings) and produce different behaviors when removed.
using System;
using System.Linq;

public class Baz { }
public class Foo
{
public void Bar()
{
if (false)
{
// Will be flagged as unreachable code
var baz = new Baz();
}
var true_in_debug_false_in_release =
GetType()
.GetMethod("Bar")
.GetMethodBody()
.LocalVariables
.Any(x => x.LocalType == typeof(Baz));
Console.WriteLine(true_in_debug_false_in_release);
}
}
In Release mode (with default settings), the "unreachable code" will be optimized away, producing the same result as if you had deleted the if block in DEBUG mode.
Unlike the example using yield, this code compiles regardless of whether or not the unreachable code is removed.
Further to Dan Bryant's answer, here's an example of a program whose behaviour will be altered by a tool that's smarter than the C# compiler in finding and removing unreachable code:
using System;
class Program
{
static bool tru = true;
static void Main(string[] args)
{
var x = new Destructive();
while (tru)
{
GC.Collect(2);
}
GC.KeepAlive(x); // unreachable yet has an effect on program output
}
}
class Destructive
{
~Destructive()
{
Console.WriteLine("Blah");
}
}
The C# compiler does not try very hard to prove that GC.KeepAlive is unreachable, so it doesn't eliminate it in this case. As a result, the program loops forever without printing anything.
If a tool proves that it's actually unreachable (fairly easy in this example) and removes it, the program behaviour is changed. It will print "Blah" straight away and then loop forever. So it has become a different program. Try it if you have doubts; just comment out that unreachable line and see the behaviour change.
If GC.KeepAlive was there for a reason, then this change is, in fact, unsafe and will make the program misbehave at some point (probably just crash).
One rare boundary case is if the unreachable code contains a GC.KeepAlive call. This very rarely comes up (as it is related to particular hacky use cases of unmanaged/managed interop), but if you are interoperating with unmanaged code that requires this, removing it could cause intermittent failures if you're unlucky enough to have GC trigger at just the wrong moment.
UPDATE:
I've tested this and I can confirm that Servy is correct; the GC.KeepAlive call does not take effect, due to the fact that the compiler proves it can never actually use the reference. This is not because the method is never executed (the method doesn't have to execute it for it to impact the GC behavior), but because the compiler ignores the GC.KeepAlive when it is provably unreachable.
I'm leaving this answer up because it's still interesting for the counter case. This is a mechanism for breaking a program if you modify your code to make it unreachable, but don't move the GC.KeepAlive to make sure it still keeps the reference alive.
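To make the interop use case concrete, here is a small sketch of the pattern that unreachable-code removal must not break (the NativeBuffer wrapper is a hypothetical stand-in for a real unmanaged-handle type):

```csharp
using System;
using System.Runtime.InteropServices;

class NativeBuffer
{
    // Hypothetical wrapper around an unmanaged allocation.
    public IntPtr Handle { get; } = Marshal.AllocHGlobal(16);

    ~NativeBuffer() => Marshal.FreeHGlobal(Handle);
}

static class KeepAliveDemo
{
    public static void Main()
    {
        var buffer = new NativeBuffer();
        IntPtr raw = buffer.Handle;

        // Imagine 'raw' is handed to unmanaged code here. After this point
        // 'buffer' is never read again, so the GC is free to finalize it
        // even while the raw handle is still in use...
        Marshal.WriteInt64(raw, 0);

        // ...unless we pin its lifetime down to this exact point:
        GC.KeepAlive(buffer);
    }
}
```

Without the GC.KeepAlive, an aggressive collector could run the finalizer (freeing the memory) before the last use of raw, which is exactly the intermittent failure mode described above.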

Reflection.Emit.ILGenerator Exception Handling "Leave" instruction

First, some background info:
I am making a compiler for a school project. It is already working, and I'm expending a lot of effort to bug-fix and/or optimize it. I've recently run into a problem: I discovered that the ILGenerator object generates an extra leave instruction when you call any of the following member methods:
BeginCatchBlock()
BeginExceptFilterBlock()
BeginFaultBlock()
BeginFinallyBlock()
EndExceptionBlock()
So, you start a try statement with a call to BeginExceptionBlock(), add a couple of catch clauses with BeginCatchBlock(), possibly add a finally clause with BeginFinallyBlock(), and then end the protected code region with EndExceptionBlock().
The methods I listed automatically generate a leave instruction branching to the first instruction after the try statement. I don't want these, for two reasons. One, because it always generates an unoptimized leave instruction, rather than a leave.s instruction, even when it's branching just two bytes away. And two, because you can't control where the leave instruction goes.
So, if you want to branch to some other location in your code, you have to add a compiler-generated local variable, set it depending on where you want to go inside the try statement, let EndExceptionBlock() auto-generate the leave instruction, and then generate a switch statement below the try block. Or you could just emit a leave or leave.s instruction yourself before calling one of the previous methods, resulting in an ugly and unreachable extra 5 bytes, like so:
L_00ca: leave.s L_00e5
L_00cc: leave L_00d1
Both of these options are unacceptable to me. Is there any way to either prevent the automatic generation of leave instructions, or else any other way to specify protected regions rather than using these methods (which are extremely annoying and practically undocumented)?
EDIT
Note: the C# compiler itself does this, so it's not as if there is a good reason to force it on us. For example, if you have .NET 4.5 beta, disassemble the following code and check their implementation: (exception block added internally)
public static async Task<bool> TestAsync(int ms)
{
var local = ms / 1000;
Console.WriteLine("In async call, before await " + local.ToString() + "-second delay.");
await System.Threading.Tasks.Task.Delay(ms);
Console.WriteLine("In async call, after await " + local.ToString() + "-second delay.");
Console.WriteLine();
Console.WriteLine("Press any key to continue.");
Console.ReadKey(false);
return true;
}
As far as I can tell, you cannot do this in .NET 4.0. The only way to create a method body without using ILGenerator is by using MethodBuilder.CreateMethodBody, but that does not allow you to set exception handling info. And ILGenerator forces the leave instruction you're asking about.
However, if .NET 4.5 is an option for you (it seems to be), take a look at MethodBuilder.SetMethodBody. This allows you to create the IL yourself, but still pass through exception handling information. You can wrap this in a custom ILGenerator-like class of your own, with Emit methods taking an OpCode argument, and reading OpCode.Size and OpCode.Value to get the corresponding bytes.
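A minimal sketch of that approach, assuming .NET 4.5 or later (where MethodBuilder.SetMethodBody exists): you hand the runtime raw IL bytes, so nothing is inserted behind your back, and exception-handling regions would go through the exceptionHandlers parameter rather than the Begin/End methods:

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

static class RawIlDemo
{
    public static int EmitFortyTwo()
    {
        var asm = AssemblyBuilder.DefineDynamicAssembly(
            new AssemblyName("RawIl"), AssemblyBuilderAccess.Run);
        var mod = asm.DefineDynamicModule("RawIl");
        var type = mod.DefineType("Demo", TypeAttributes.Public);
        var method = type.DefineMethod(
            "FortyTwo",
            MethodAttributes.Public | MethodAttributes.Static,
            typeof(int), Type.EmptyTypes);

        // Raw IL: ldc.i4.s 42; ret. No ILGenerator, so no leave
        // instruction is ever emitted behind our back.
        byte[] il = { 0x1F, 0x2A, 0x2A };

        // (il, maxStack, localSignature, exceptionHandlers, tokenFixups)
        method.SetMethodBody(il, 1, null, null, null);

        var baked = type.CreateType();
        return (int)baked.GetMethod("FortyTwo").Invoke(null, null);
    }
}
```

The trade-off is that you now compute branch offsets and metadata tokens yourself, which is precisely the bookkeeping ILGenerator normally hides.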
And of course there's always Mono.Cecil, but that probably requires more extensive changes to code you've already written.
Edit: you appear to have already figured this out yourself, but you left this question open. You can post answers to your own questions and accept them if you've figured something out on your own. That would have let me know I shouldn't have wasted time searching, and would have let other people with the same question know what to do.

Does adding return statements for C# methods improve performance?

This blog says
12) Include Return Statements with in the Function/Method.
How it improves performance
Explicitly using return allows the JIT to perform slightly more optimizations. Without a return statement, each function/method is given several local variables on stack to transparently support returning values without the keyword. Keeping these around makes it harder for the JIT to optimize, and can impact the performance of your code. Look through your functions/methods and insert return as needed. It doesn't change the semantics of the code at all, and it can help you get more speed from your application.
I'm fairly sure that this is a false statement, but I wanted to get the opinion of experts out there. What do you guys think?
This statement does not apply to C#.
With C# you must have an explicit "return" to have a valid function; without a return, you get a compile error to the effect of "not all code paths return a value".
With VB.NET this would apply, as VB.NET does NOT require an explicit return: it allows you to have functions that never explicitly return a value, and also lets you set the return value using the name of the function.
To provide an example
In VB.NET you can do this
Function myFunction() As String
myFunction = "MyValue"
End Function
Function myFunction2() As String
'Your code here
End Function
Both of the above compile, neither with an explicit Return, and there is more overhead involved in this.
If you try to do this with C#
string myFunction()
{
//Error due to "Cannot assign to 'myFunction' because it is a 'Method Group'
myFunction = "test";
}
string myFunction2()
{
//Error due to "not all code paths return a value
}
My comments note the errors that you get.
The post is kind of vague. Being a C# developer, my first thought was "as opposed to what?". However, he could be referring to something like:
public bool MyFunction()
{
bool result = false;
if (someCondition == true)
{
// Do some processing
result = true;
}
else if (someOtherCondition == true)
{
// Do some processing
result = true;
}
// ... keep going
return result;
}
He may be suggesting that replacing the result = true; statements with return true; may perform better. I'm not sure about that personally... that's pretty deep into JIT theory at that point, and I think any gains that are made would be very minor compared to other performance improvements that you could make.
I'd disagree - I think a single entry and a single exit for every method makes code much easier to read and debug. Multiple return statements in a function can make navigating the code more complex. In fact (where refactoring is possible) it's better to have more, smaller functions than larger functions with multiple exits.
It is somewhat true, both for VB.NET and C#. In C# the programmer has to declare the variable that holds the return value explicitly, it is automatic in VB.NET. Most return values are returned in the EAX or RAX register, the JIT compiler has to generate code to load the register from the return value variable before the function exits. When you use the return statement, the JIT compiler might have the opportunity to load the EAX register directly, or already have the register containing the correct value, and jump to the function exit code, bypassing the load-from-variable instruction.
That's a pretty big "might" btw, real code invariably tests some expression with the if() statement. Evaluating that expression almost always involves using the EAX register, it still has to be reloaded with the return value. The x64 JIT compiler does a completely different job doing that compared to the x86 compiler, the latter always seems to use the variable in a few spot checks I did. So you're not likely to be ahead unless you run on a 64-bit version of Windows.
Of all the evil in premature optimization, this one is arguably the worst. The potential time savings are minuscule, write your code for clarity first. Profile later.
My only guess here is that he's talking about VB.NET, not C#. VB.NET allows you to do something like this to return values:
Public Function GetSomething() As Integer
GetSomething = 4
End Function
My VB is incredibly out of date, though. This may be slower than using an explicit return statement.
Generally there are 2 spots in which I exit a function.
At the very beginning of my methods to validate incoming data:
if (myParameter == null)
throw new ArgumentNullException("myParameter");
And/or at the very end of the method.
private bool GetSomeValue()
{
bool returnValue = false;
// some code here
if (some condition)
{
returnValue = some expression
}
else
{
returnValue = some other expression;
}
return returnValue;
}
The reason I don't return inside the conditional is so that there is one exit point for the function; it helps with debugging. No one wants to maintain a method with 12 return statements in it. But that is just a personal opinion of mine. I would err on the side of readability and not worry about optimization unless you're dealing with a real-time, must-go-faster situation.

How do I write (test) code that will not be optimized by the compiler/JIT?

I don't really know much about the internals of compiler and JIT optimizations, but I usually try to use "common sense" to guess what could be optimized and what couldn't. So there I was writing a simple unit test method today:
@Test // [Test] in C#
public void testDefaultConstructor() {
new MyObject();
}
This method is actually all I need. It checks that the default constructor exists and runs without exceptions.
But then I started to think about the effect of compiler/JIT optimizations. Could the compiler/JIT optimize this method by eliminating the new MyObject(); statement completely? Of course, it would need to determine that the call graph does not have side effects to other objects, which is the typical case for a normal constructor that simply initializes the internal state of the object.
I presume that only the JIT would be allowed to perform such an optimization. This probably means that it's not something I should worry about, because the test method is being performed only once. Are my assumptions correct?
Nevertheless, I'm trying to think about the general subject. When I thought about how to prevent this method from being optimized, I thought I may assertTrue(new MyObject().toString() != null), but this is very dependent on the actual implementation of the toString() method, and even then, the JIT can determine that toString() method always returns a non-null string (e.g. if actually Object.toString() is being called), and thus optimize the whole branch. So this way wouldn't work.
I know that in C# I can use [MethodImpl(MethodImplOptions.NoOptimization)], but this is not what I'm actually looking for. I'm hoping to find a (language-independent) way of making sure that some specific part(s) of my code will actually run as I expect, without the JIT interfering in this process.
Additionally, are there any typical optimization cases I should be aware of when creating my unit tests?
Thanks a lot!
Don't worry about it. It's not allowed to ever optimize anything that can make a difference to your system (except for speed). If you new an object, code gets called, memory gets allocated, it HAS to work.
If you had it protected by an if(false), where false is a final constant, it could be optimized out of the system completely; then the JIT could detect that the method doesn't do anything and optimize IT out (in theory).
Edit: by the way, it can also be smart enough to determine that this method:
void newIfTrue(boolean b) {
if(b)
new ThisClass();
}
will always do nothing if b is false, eventually figure out that at some point in your code b is always false, and compile this routine out of that code completely.
This is where the JIT can do stuff that's virtually impossible in any non-managed language.
I think if you are worried about it getting optimized away, you may be doing a bit of testing overkill.
In a static language, I tend to think of the compiler as a test. If it passes compilation, that means that certain things are there (like methods). If you don't have another test that exercises your default constructor (which will prove it wont throw exceptions), you may want to think about why you are writing that default constructor in the first place (YAGNI and all that).
I know there are people that don't agree with me, but I feel like this sort of thing is just something that will bloat out your number of tests for no useful reason, even looking at it through TDD goggles.
Think about it this way:
Let's assume that the compiler can determine that the call graph doesn't have any side effects (I don't think this is possible in general; I vaguely remember something about the halting problem from my CS courses). It would then optimize away any method that doesn't have side effects. Since most tests don't have, and shouldn't have, any side effects, the compiler could optimize them all away.
The JIT is only allowed to perform operations that do not affect the guaranteed semantics of the language. Theoretically, it could remove the allocation and call to the MyObject constructor if it can guarantee that the call has no side effects and can never throw an exception (not counting OutOfMemoryError).
In other words, if the JIT optimizes the call out of your test, then your test would have passed anyway.
PS: Note that this applies because you are doing functionality testing as opposed to performance testing. In performance testing, it's important to make sure the JIT does not optimize away the operation you are measuring, else your results become useless.
It seems that in C# I could do this:
[Test]
public void testDefaultConstructor() {
GC.KeepAlive(new MyObject());
}
AFAIU, the GC.KeepAlive call will not be eliminated by the JIT, so the code will be guaranteed to work as expected. However, I don't know of a similar construct in Java.
Every I/O is a side effect, so you can just put
Object obj = new MyObject();
System.out.println(obj.toString());
and you're fine.
Why should it matter? If the compiler/JIT can statically determine no asserts are going to be hit (which could cause side effects), then you're fine.
