Assume I have a tool that automatically removes C# code that is detected by the compiler as unreachable. Is there a situation in which such operation can get me into trouble? Please share interesting cases.
Here's an interesting example. Consider a function like this:
public static IEnumerable<int> Fun()
{
    if (false)
    {
        yield return 0;
    }
}
The line with yield is detected as unreachable. However, removing it makes the program uncompilable: the yield in the body is what tells the compiler to rewrite the function as an iterator, so that a body which does nothing simply returns an empty sequence. With the yield line removed, it looks like an ordinary function that is missing a required return statement.
As noted in the comments, the example is contrived; however, instead of false we could have a constant value coming from another, possibly generated, project (i.e. the code wouldn't look nearly as obvious as it does here).
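For illustration: a removal tool would have to leave the method an iterator block (or rewrite it entirely) for the code to keep compiling. A minimal sketch of a safe result:
public static IEnumerable<int> Fun()
{
    // Still an iterator block thanks to yield break,
    // so this compiles and returns an empty sequence.
    yield break;
}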
Edit:
Note that the yield construct is actually very similar to async/await. Yet with the latter, the language designers took a different (IMO better) approach that prevents such scenarios: an async method is marked by a keyword in its signature. Iterator blocks could have been defined the same way, instead of being inferred from the function body.
I wouldn't do this automatically, for reasons mentioned in other answers, but I will say here that I'd have a strong bias towards removing unused code over keeping it. After all, tracking obsolete code is what source control is for.
This example is contrived, but the following will get flagged for removal (under default Debug settings) and produce different behavior when removed.
using System;
using System.Linq;

public class Baz { }

public class Foo
{
    public void Bar()
    {
        if (false)
        {
            // Will be flagged as unreachable code
            var baz = new Baz();
        }

        var true_in_debug_false_in_release =
            GetType()
                .GetMethod("Bar")
                .GetMethodBody()
                .LocalVariables
                .Any(x => x.LocalType == typeof(Baz));

        Console.WriteLine(true_in_debug_false_in_release);
    }
}
In Release mode (with default settings), the "unreachable code" will be optimized away, producing the same result as deleting the if block in Debug mode.
Unlike the example using yield, this code compiles regardless of whether or not the unreachable code is removed.
Further to Dan Bryant's answer, here's an example of a program whose behaviour will be altered by a tool that's smarter than the C# compiler in finding and removing unreachable code:
using System;

class Program
{
    static bool tru = true;

    static void Main(string[] args)
    {
        var x = new Destructive();
        while (tru)
        {
            GC.Collect(2);
        }
        GC.KeepAlive(x); // unreachable, yet has an effect on program output
    }
}

class Destructive
{
    ~Destructive()
    {
        Console.WriteLine("Blah");
    }
}
The C# compiler does not try very hard to prove that GC.KeepAlive is unreachable, so it doesn't eliminate it in this case. As a result, the program loops forever without printing anything.
If a tool proves that it's actually unreachable (fairly easy in this example) and removes it, the program behaviour is changed. It will print "Blah" straight away and then loop forever. So it has become a different program. Try it if you have doubts; just comment out that unreachable line and see the behaviour change.
If GC.KeepAlive was there for a reason, then this change would in fact be unsafe, and could make the program misbehave at some point (probably just crash).
One rare boundary case is if the unreachable code contains a GC.KeepAlive call. This very rarely comes up (as it is related to particular hacky use cases of unmanaged/managed interop), but if you are interoperating with unmanaged code that requires this, removing it could cause intermittent failures if you're unlucky enough to have GC trigger at just the wrong moment.
UPDATE:
I've tested this and I can confirm that Servy is correct; the GC.KeepAlive call does not take effect, because the compiler proves the reference can never actually be used. This is not a matter of the line never executing (it doesn't have to execute to affect GC behavior); rather, the compiler ignores GC.KeepAlive when it is provably unreachable.
I'm leaving this answer up because it's still interesting for the counter case. This is a mechanism for breaking a program if you modify your code to make it unreachable, but don't move the GC.KeepAlive to make sure it still keeps the reference alive.
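For context, the typical interop pattern that GC.KeepAlive protects looks something like this sketch (the native library and its entry point here are hypothetical):
using System;
using System.Runtime.InteropServices;

class InteropExample
{
    // Hypothetical native entry point that stores the callback pointer.
    [DllImport("native.dll")]
    static extern void StartWatching(IntPtr callback);

    delegate void TickCallback();

    static void Watch()
    {
        TickCallback callback = () => Console.WriteLine("tick");
        StartWatching(Marshal.GetFunctionPointerForDelegate(callback));

        DoWorkForAWhile();

        // Without this line, the GC may collect 'callback' while native code
        // still holds the function pointer, causing intermittent crashes.
        GC.KeepAlive(callback);
    }

    static void DoWorkForAWhile() { /* ... */ }
}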
A legacy app is in an endless loop at startup; I don't know why/how yet (code obfuscation contest candidate), but regarding the method that's being called over and over (which is called from several other methods), I thought, "I wonder if one of the methods that calls this is also calling another method that also calls it?"
I thought: "Nah, the compiler would be able to figure that out, and not allow it, or at least emit a warning!"
So I created a simple app to prove that would be the case:
public partial class Form1 : Form
{
    public Form1()
    {
        InitializeComponent();
    }

    private void button1_Click(object sender, EventArgs e)
    {
        method1();
    }

    private void button2_Click(object sender, EventArgs e)
    {
        method2();
    }

    private void method1()
    {
        MessageBox.Show("method1 called, which will now call method2");
        method2();
    }

    private void method2()
    {
        MessageBox.Show("method2 called, which will now call method1");
        // Note to self: Write an article entitled, "Copy-and-Paste Considered Harmful"
        method1();
    }
}
...but no! It compiles just fine. Why wouldn't the compiler flag this code as questionable at best? If either button is mashed, you are in never-never land!
Okay, sometimes you may want an endless loop (pacemaker code, etc.), but still I think a warning should be emitted.
As you said, sometimes people want infinite loops. And the .NET JIT compiler supports tail-call optimization, so you might not even get a stack overflow from endless recursion like this.
For the general case, predicting in finite time whether a program will eventually terminate or get stuck in an infinite loop is impossible; this is the halting problem. All a compiler can possibly find are some special cases where the answer is easy to decide.
That's not an endless loop, but an endless recursion. And this is much worse, since they can lead to a stack overflow. Endless recursions are not desired in most languages, unless you are programming malware. Endless loops, however, are often intentional. Services typically run in endless loops.
In order to detect this kind of situation, the compiler would have to analyze the code by following the method calls; however the C# compiler limits this process to the immediate code within the current method. Here, uninitialized or unused variables can be tracked and unreachable code can be detected, for instance. There is a tradeoff to make between the compiling speed and the depth of static analysis and optimizations.
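For instance, within a single method the compiler readily flags unreachable code:
static int Example()
{
    return 1;
    // warning CS0162: Unreachable code detected
    Console.WriteLine("never runs");
}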
Also it is hardly possible to know the real intention of the programmer.
Imagine that you wrote a method that is perfectly legal. Suddenly, because you are calling this method from another place, your compiler complains and tells you that your method is no longer legal. I can already see the flood of posts on SO: "My method compiled yesterday. Today it doesn't compile any more. But I didn't change it."
To put it very simply: it's not the compiler's job to question your coding patterns.
You could very well write a Main method that does nothing but throw an Exception. It's a far easier pattern to detect and a much more stupid thing to do; yet the compiler will happily allow your program to compile, run, crash and burn.
With that being said, since technically an endless loop / recursion is perfectly legal as far as the compiler is concerned, there's no reason why it should complain about it.
Actually, it would be very hard to figure out at compile time that the loop can't ever be broken at runtime. An exception could be thrown, user interaction could happen, state might change on a specific thread or on a port you are monitoring, etc. There are way too many possibilities for any code analysis tool to establish, beyond doubt, that a specific recursive code segment will inevitably cause an overflow at runtime.
I think the right way to prevent these situations is through unit testing organization. The more code paths you are covering in your tests, the less likely you are to ever face such a scenario.
Because it's nearly impossible to detect!
In the example you gave, it is obvious (to us) that the code will loop forever. But the compiler just sees a function call; it doesn't necessarily know at the time what calls that function, what conditional logic could change the looping behavior, etc.
For example, with this slight change you aren't in an infinite loop anymore:
private bool method1called = false;

private void method1()
{
    MessageBox.Show("method1 called, which will now call method2");
    if (!method1called)
    {
        method1called = true; // set the flag before recursing, or the cycle never breaks
        method2();
    }
}

private void method2()
{
    MessageBox.Show("method2 called, which will now call method1");
    method1();
}
Without actually running the program, how would you know that it isn't looping? I could potentially see a warning for while (true), but that has enough valid use cases that it also makes sense to not put a warning in for it.
A compiler is just parsing the code and translating to IL (for .NET anyways). You can get limited information like variables not being assigned while doing that (especially since it has to generate the symbol table anyways) but advanced detection like this is generally left to code analysis tools.
I found this in the Wikipedia article on infinite loops: http://en.wikipedia.org/wiki/Infinite_loop#Intentional_looping
There are a few situations when this is desired behavior. For example, the games on cartridge-based game consoles typically have no exit condition in their main loop, as there is no operating system for the program to exit to; the loop runs until the console is powered off.
Antique punchcard-reading unit record equipment would literally halt once a card processing task was completed, since there was no need for the hardware to continue operating, until a new stack of program cards were loaded.
By contrast, modern interactive computers require that the computer constantly be monitoring for user input or device activity, so at some fundamental level there is an infinite processing idle loop that must continue until the device is turned off or reset. In the Apollo Guidance Computer, for example, this outer loop was contained in the Exec program, and if the computer had absolutely no other work to do it would loop running a dummy job that would simply turn off the "computer activity" indicator light.
Modern computers also typically do not halt the processor or motherboard circuit-driving clocks when they crash. Instead they fall back to an error condition displaying messages to the operator, and enter an infinite loop waiting for the user to either respond to a prompt to continue, or to reset the device.
Hope this helps.
I'm starting to use Code Contracts, and whilst Contract.Requires is pretty straightforward, I'm having trouble seeing what Ensures actually does.
I've tried creating a simple method like this:
static void Main()
{
    DoSomething();
}

private static void DoSomething()
{
    Contract.Ensures(false, "wrong");
    Console.WriteLine("Something");
}
I never see the message "wrong" though, nor does it throw exceptions or anything else.
So what does it actually do?
It's odd for it to not throw anything - if you're running the rewriter tool with the appropriate settings. My guess is that you're running in a mode which doesn't check postconditions.
The confusing thing about Contract.Ensures is that you write it at the start of the method, but it executes at the end of the method. The rewriter does all the magic to make sure it executes appropriately, and is given the return value if necessary.
Like many things about Code Contracts, I think it's best to run Reflector on the results of the rewriter tool. Make sure you've got the settings right, then work out what the rewriter has done.
EDIT: I realise I haven't expressed the point of Contract.Ensures yet. Simply put, it's there to ensure that your method has done something by the end: for example, that it has added something to a list, or (more likely) that the return value is non-null, or positive, or whatever. For example, you might have:
public int IncrementByRandomAmount(int input)
{
    // We can't do anything if we're given int.MaxValue
    Contract.Requires(input < int.MaxValue);
    Contract.Ensures(Contract.Result<int>() > input);

    // Do stuff here to compute output
    return output;
}
In the rewritten code, there will be a check at the point of return to ensure that the returned value really is greater than the input.
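To make that concrete, here is a rough hand-written approximation of what the rewritten method behaves like (a sketch only; the real rewriter emits calls into the contracts runtime):
public int IncrementByRandomAmount(int input)
{
    // Contract.Requires becomes a check at the start of the method:
    if (!(input < int.MaxValue))
        throw new ArgumentException("Precondition failed: input < int.MaxValue");

    int output = input + 1; // stand-in for the real computation

    // Contract.Ensures becomes a check at the point of return:
    if (!(output > input))
        throw new InvalidOperationException("Postcondition failed: Contract.Result<int>() > input");
    return output;
}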
I have a piece of software written with fluent syntax. The method chain has a definitive "ending", before which nothing useful is actually done in the code (think NBuilder, or Linq-to-SQL's query generation not actually hitting the database until we iterate over our objects with, say, ToList()).
The problem I am having is there is confusion among other developers about proper usage of the code. They are neglecting to call the "ending" method (thus never actually "doing anything")!
I am interested in enforcing the usage of the return value of some of my methods so that we can never "end the chain" without calling that "Finalize()" or "Save()" method that actually does the work.
Consider the following code:
//The "factory" class the user will be dealing with
public class FluentClass
{
//The entry point for this software
public IntermediateClass<T> Init<T>()
{
return new IntermediateClass<T>();
}
}
//The class that actually does the work
public class IntermediateClass<T>
{
private List<T> _values;
//The user cannot call this constructor
internal IntermediateClass<T>()
{
_values = new List<T>();
}
//Once generated, they can call "setup" methods such as this
public IntermediateClass<T> With(T value)
{
var instance = new IntermediateClass<T>() { _values = _values };
instance._values.Add(value);
return instance;
}
//Picture "lazy loading" - you have to call this method to
//actually do anything worthwhile
public void Save()
{
var itemCount = _values.Count();
. . . //save to database, write a log, do some real work
}
}
As you can see, proper usage of this code would be something like:
new FluentClass().Init<int>().With(-1).With(300).With(42).Save();
The problem is that people are using it this way (thinking it achieves the same as the above):
new FluentClass().Init<int>().With(-1).With(300).With(42);
So pervasive is this problem that, with entirely good intentions, another developer once actually changed the name of the "Init" method to indicate that THAT method was doing the "real work" of the software.
Logic errors like these are very difficult to spot, and, of course, it compiles, because it is perfectly acceptable to call a method with a return value and just "pretend" it returns void. Visual Studio doesn't care if you do this; your software will still compile and run (although in some cases I believe it throws a warning). This is a great feature to have, of course. Imagine a simple "InsertToDatabase" method that returns the ID of the new row as an integer - it is easy to see that there are some cases where we need that ID, and some cases where we could do without it.
In the case of this piece of software, there is definitively never any reason to eschew that "Save" function at the end of the method chain. It is a very specialized utility, and the only gain comes from the final step.
I want somebody's software to fail at the compiler level if they call "With()" and not "Save()".
It seems like an impossible task by traditional means - but that's why I come to you guys. Is there an Attribute I can use to prevent a method from being "cast to void" or some such?
Note: The alternate way of achieving this goal that has already been suggested to me is writing a suite of unit tests to enforce this rule, and using something like http://www.testdriven.net to bind them to the compiler. This is an acceptable solution, but I am hoping for something more elegant.
I don't know of a way to enforce this at a compiler level. It's often requested for objects which implement IDisposable as well, but isn't really enforceable.
One potential option which can help, however, is to set up your class, in DEBUG only, to have a finalizer that logs/throws/etc. if Save() was never called. This can help you discover these runtime problems while debugging instead of relying on searching the code, etc.
However, make sure that, in release mode, this is not used, as it will incur a performance overhead since the addition of an unnecessary finalizer is very bad on GC performance.
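A minimal sketch of that idea, assuming Save() sets a _saved flag (logging rather than throwing, since an exception escaping a finalizer tears down the process):
using System;

public class IntermediateClass<T>
{
    private bool _saved;

    public void Save()
    {
        _saved = true;
        // ... do the real work ...
    }

#if DEBUG
    // Debug-only safety net: if the object dies without Save() ever having
    // been called, report the misuse. This finalizer must not exist in
    // Release builds, where it would only hurt GC performance.
    ~IntermediateClass()
    {
        if (!_saved)
            Console.Error.WriteLine("IntermediateClass was discarded without calling Save()");
    }
#endif
}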
You could require specific methods to use a callback like so:
new FluentClass().Init<int>(x =>
    x.Save(y =>
    {
        y.With(-1);
        y.With(300);
    }));
The With method returns some specific object, and the only way to get that object is by calling x.Save(), which itself takes a callback that lets you set up an indeterminate number of With statements. So Init takes something like this:
public T Init<T>(Func<MyInitInputType, MySaveResultType> initSetup)
I can think of a few solutions, none of them ideal.
AIUI what you want is a function which is called when the temporary variable goes out of scope (as in, when it becomes available for garbage collection, though it will probably not be collected for some time yet). (See: The difference between a destructor and a finalizer?) This hypothetical function would say "if you've constructed a query in this object but not called Save, produce an error". C++/CLI supports this style (RAII): it has a "destructor", run as soon as the object is no longer used, and a "finalizer", run when it is finally garbage collected. Very confusingly, C# has only a so-called destructor, and it is only called by the garbage collector (it would be valid for the framework to call it earlier, as if it were partially cleaning up the object immediately, but AFAIK it doesn't do anything like that). So what you would like is a C++/CLI destructor. Unfortunately, AIUI this maps onto the concept of IDisposable, which exposes a Dispose() method that is called where a C++/CLI destructor would be, or when the C# destructor runs; but AIUI you still have to call Dispose manually, which defeats the point.
Refactor the interface slightly to convey the concept more accurately. Call the init function something like "prepareQuery" or "AAA" or "initRememberToCallSaveOrThisWontDoAnything". (The last is an exaggeration, but it might be necessary to make the point).
This is more of a social problem than a technical problem. The interface should make it easy to do the right thing, but programmers do have to know how to use code! Get all the programmers together. Explain simply, once and for all, this simple fact. If necessary, have them all sign a piece of paper saying they understand, and that if they wilfully continue to write code which doesn't do anything, they're worse than useless to the company and will be fired.
Fiddle with the way the operators are chained. For example, have each of the IntermediateClass functions assemble an aggregate IntermediateClass object containing all of the parameters (you mostly do this already(?)), but require an Init-like function on the original class to take that aggregate as an argument rather than having the calls chained after it. Then Save and the other functions can return two different class types (with essentially the same contents), and Init can accept only the correct one.
The fact that it's still a problem suggests that either your coworkers need a helpful reminder, or they're rather sub-par, or the interface wasn't very clear (perhaps it's perfectly good, but the author didn't realise it wouldn't be clear to someone who only used it in passing rather than getting to know it), or you yourself have misunderstood the situation. A technical solution would be good, but you should probably think about why the problem occurred and how to communicate more clearly, probably asking someone senior's input.
After great deliberation and trial and error, it turns out that throwing an exception from the Finalize() method was not going to work for me. Apparently, you simply can't do that; the exception gets eaten up, because garbage collection operates non-deterministically. I was unable to get the software to call Dispose() automatically from the destructor either. Jack V.'s comment explains this well; here was the link he posted, for redundancy/emphasis:
The difference between a destructor and a finalizer?
Changing the syntax to use a callback was a clever way to make the behavior foolproof, but the agreed-upon syntax was fixed, and I had to work with it. Our company is all about fluent method chains. I was also a fan of the "out parameter" solution to be honest, but again, the bottom line is the method signatures simply could not change.
Helpful information about my particular problem includes the fact that my software is only ever to be run as part of a suite of unit tests - so efficiency is not a problem.
What I ended up doing was use Mono.Cecil to Reflect upon the Calling Assembly (the code calling into my software). Note that System.Reflection was insufficient for my purposes, because it cannot pinpoint method references, but I still needed(?) to use it to get the "calling assembly" itself (Mono.Cecil remains underdocumented, so it's possible I just need to get more familiar with it in order to do away with System.Reflection altogether; that remains to be seen....)
I placed the Mono.Cecil code in the Init() method, and the structure now looks something like:
public IntermediateClass<T> Init<T>()
{
    ValidateUsage(Assembly.GetCallingAssembly());
    return new IntermediateClass<T>();
}

void ValidateUsage(Assembly assembly)
{
    // 1) Use Mono.Cecil to inspect the codebase inside the assembly
    var assemblyLocation = assembly.CodeBase.Replace("file:///", "");
    var monoCecilAssembly = AssemblyFactory.GetAssembly(assemblyLocation);

    // 2) Retrieve the list of Instructions in the calling method
    var methods = monoCecilAssembly.Modules...Types...Methods...Instructions
    // (It's a little more complicated than that...
    // if anybody would like more specific information on how I got this,
    // let me know... I just didn't want to clutter up this post)

    // 3) Those instructions refer to OpCodes and Operands....
    // Defining "invalid method" as a method that calls "Init" but not "Save"
    var methodCallingInit = method.Body.Instructions.Any(
        instruction => instruction.OpCode.Name.Equals("callvirt")
            && instruction.Operand is IMethodReference
            && instruction.Operand.ToString().Equals(INITMETHODSIGNATURE));

    var methodNotCallingSave = !method.Body.Instructions.Any(
        instruction => instruction.OpCode.Name.Equals("callvirt")
            && instruction.Operand is IMethodReference
            && instruction.Operand.ToString().Equals(SAVEMETHODSIGNATURE));

    var methodInvalid = methodCallingInit && methodNotCallingSave;

    // Note: this is partially pseudocode;
    // It doesn't 100% faithfully represent either Mono.Cecil's syntax or my own
    // There are actually a lot of annoying casts involved, omitted for sanity

    // 4) Obviously, if the method is invalid, throw
    if (methodInvalid)
    {
        throw new Exception(String.Format("Bad developer! BAD! {0}", method.Name));
    }
}
Trust me, the actual code is even uglier looking than my pseudocode.... :-)
But Mono.Cecil just might be my new favorite toy.
I now have a method that refuses to run its main body unless the calling code "promises" to also call a second method afterwards. It's like a strange kind of code contract. I'm actually thinking about making this generic and reusable. Would any of you have a use for such a thing? Say, if it were an attribute?
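If I do make it generic, the marker might look something like this (purely hypothetical; the Mono.Cecil scan above would still do the actual enforcement):
using System;

[AttributeUsage(AttributeTargets.Method)]
public sealed class MustBeFollowedByAttribute : Attribute
{
    public string MethodName { get; private set; }

    public MustBeFollowedByAttribute(string methodName)
    {
        MethodName = methodName;
    }
}

// Usage: decorate Init with [MustBeFollowedBy("Save")] and have the
// validator read the attribute instead of hard-coding the method names.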
What if you made it so Init and With don't return objects of type FluentClass? Have them return, e.g., an UninitializedFluentClass which wraps a FluentClass object. Then calling .Save() on the UninitializedFluentClass calls it on the wrapped FluentClass object and returns it. If they don't call Save, they don't get a FluentClass object.
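A sketch of that wrapper idea (all names illustrative):
using System.Collections.Generic;

public class UninitializedFluentClass<T>
{
    private readonly List<T> _values = new List<T>();

    public UninitializedFluentClass<T> With(T value)
    {
        _values.Add(value);
        return this; // still not the real object
    }

    // The only way to obtain a usable result is to call Save().
    public FluentResult<T> Save()
    {
        // ... save to database, write a log, do the real work ...
        return new FluentResult<T>(_values);
    }
}

public class FluentResult<T>
{
    public IList<T> Values { get; private set; }

    internal FluentResult(IList<T> values)
    {
        Values = values;
    }
}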
In Debug mode, besides implementing IDisposable, you can set up a timer that throws an exception after 1 second if the result method has not been called.
Use an out parameter! All the outs must be used.
Edit: I am not sure if it will help, though...
It would break the fluent syntax.
Does C# have a way to temporarily change the value of a variable in a specific scope and revert it back automatically at the end of the scope/block?
For instance (not real code):
bool UpdateOnInput = true;

using (UpdateOnInput = false)
{
    //Doing my changes without notifying anyone
    Console.WriteLine(UpdateOnInput); // prints false
}
//UpdateOnInput is true again.
EDIT:
The reason I want the above is because I don't want to do this:
UpdateOnInput = false;
//Doing my changes without notifying anyone
Console.WriteLine(UpdateOnInput); // prints false
UpdateOnInput = true;
No, there's no way to do this directly. There are a few different schools of thought on how to do this sort of thing. Compare and contrast these two:
originalState = GetState();
SetState(newState);
DoSomething();
SetState(originalState);
vs
originalState = GetState();
SetState(newState);
try
{
    DoSomething();
}
finally
{
    SetState(originalState);
}
Many people will tell you that the latter is "safer".
It ain't necessarily so.
The difference between the two is of course that the latter restores the state even if DoSomething() throws an exception. Is that better than keeping the state mutated in an exception scenario? What makes it better? You have an unexpected, unhandled exception reporting that something awful and unexpected has happened. Your internal state could be completely inconsistent and arbitrarily messed up; no one knows what might have been happening at the point of the exception. All we know is that DoSomething was probably trying to do something to the mutated state.
Is it really the right thing to do in the scenario where something terrible and unknown has happened to keep on stirring that particular pot and trying to mutate the state that just caused an exception again?
Sometimes that is going to be the right thing to do, and sometimes it's going to make matters worse. Which scenario you're actually in depends on what exactly the code is doing, so think carefully about what the right thing to do is before blindly choosing one or the other.
Frankly, I would rather solve the problem by not getting into the situation in the first place. Our existing compiler design uses this design pattern, and frankly, it is freakin' irritating. In the existing C# compiler the error reporting mechanism is "side effecting". That is, when part of the compiler gets an error, it calls the error reporting mechanism which then displays the error to the user.
This is a major problem for lambda binding. If you have:
void M(Func<int, int> f) {}
void M(Func<string, int> f) {}
...
M(x=>x.Length);
then the way this works is we try to bind
M((int x)=>{return x.Length;});
and
M((string x)=>{return x.Length;});
and we see which one, if any, gives us an error. In this case, the former gives an error, the latter compiles without error, so this is a legal lambda conversion and overload resolution succeeds. What do we do with the error? We cannot report it to the user because this program is error free!
Therefore what we do when we bind the body of a lambda is exactly what you say: we tell the error reporter "don't report your errors to the user; save them in this buffer over here instead". Then we bind the lambda, restore the error reporter to its earlier state, and look at the contents of the error buffer.
We could avoid this problem entirely by changing the expression analyzer so that it returned the errors along with the result, rather than making errors a state-related side effect. Then the need for mutation of the error reporting state goes away entirely and we don't even have to worry about it.
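A sketch of that alternative, with hypothetical types standing in for the real compiler internals:
using System.Collections.Generic;

// The analyzer hands back its diagnostics alongside the result instead of
// side-effecting a shared error reporter.
public sealed class BindingResult
{
    public object BoundExpression { get; private set; } // placeholder for the bound tree
    public IList<string> Errors { get; private set; }   // empty means "bound cleanly"

    public BindingResult(object boundExpression, IList<string> errors)
    {
        BoundExpression = boundExpression;
        Errors = errors;
    }

    public bool Succeeded { get { return Errors.Count == 0; } }
}
Overload resolution can then simply ask each candidate binding whether it succeeded, and only surface the errors of the candidate that is actually chosen.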
So I would encourage you to revisit your design. Is there a way you can make the operation you are performing not dependent upon the state you are mutating? If so, then do that, and then you don't need to worry about how to restore the mutated state.
(And of course in our case we do want to restore the state upon an exception. If something inside the compiler throws during lambda binding, we want to be able to report that to the user! We don't want the error reporter to stay in the "suppress reporting errors" state.)
No, but it is pretty simple to just do this:
bool UpdateOnTrue = true;
// ....
bool temp = UpdateOnTrue;
try
{
    UpdateOnTrue = false;
    // do stuff
}
finally
{
    UpdateOnTrue = temp;
}
Try:
public void WithAssignment<T>(ref T variable, T val, Action action)
{
    T original = variable;
    variable = val;
    try
    {
        action();
    }
    finally
    {
        variable = original;
    }
}
Now you can say:
bool flag = false;
WithAssignment(ref flag, true, () =>
{
    // flag is true during this block
});
// flag is false again
No, you have to do it manually with a try/finally block. I dare say you could write an IDisposable implementation which would do something hacky in conjunction with lambda expressions, but I suspect a try/finally block is simpler (and doesn't abuse the using statement).
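For what it's worth, here's a sketch of that hacky IDisposable idea; I still think try/finally is simpler:
using System;

public sealed class TemporaryValue<T> : IDisposable
{
    private readonly Action<T> _setter;
    private readonly T _original;

    public TemporaryValue(Func<T> getter, Action<T> setter, T temporaryValue)
    {
        _setter = setter;
        _original = getter();
        setter(temporaryValue);
    }

    public void Dispose()
    {
        // Restore on scope exit, even if an exception was thrown.
        _setter(_original);
    }
}

// Usage:
// bool updateOnInput = true;
// using (new TemporaryValue<bool>(() => updateOnInput, v => updateOnInput = v, false))
// {
//     Console.WriteLine(updateOnInput); // false
// }
// // updateOnInput is true again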
Sounds like you really want
Stack<bool>
No, there is no standard way; you have to implement it manually. A generic implementation of IEditableObject via TypeDescriptor and reflection can be helpful.
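For example, a hand-rolled sketch of the IEditableObject idea (without the TypeDescriptor/reflection generalization):
using System.ComponentModel;

public class Settings : IEditableObject
{
    public bool UpdateOnInput { get; set; }

    private bool _backup;
    private bool _editing;

    public void BeginEdit()
    {
        if (_editing) return;
        _backup = UpdateOnInput;
        _editing = true;
    }

    public void CancelEdit()
    {
        if (!_editing) return;
        UpdateOnInput = _backup; // revert to the saved value
        _editing = false;
    }

    public void EndEdit()
    {
        _editing = false; // keep the new value
    }
}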
Canned... I doubt it. Given your example is a simple use of a temporary bool value, I'm assuming you've got something wacky in mind :-) You can implement some kind of Stack structure:
1) Push old value onto stack
2) Load new value
3) Do Stuff
4) Pop from stack and replace used value.
Rough (AKA Untested) example (Can't look up stack syntax right now)
bool CurrentValue = true;
Stack<bool> Storage = new Stack<bool>();

Storage.Push(CurrentValue);
CurrentValue = false;
DoStuff();
CurrentValue = Storage.Pop();
//Continue
You should refactor your code to use a separate function, like so:
bool b = GetSomeValue();
DoSomething(ModifyValue(b));
//b still has the original value.
For this to work for a reference type, you need to copy it before messing with it:
ICloneable obj = GetSomeValue();
DoSomething(ModifyValue(obj.Clone()));
//obj still has the original value.
It's hard to write correct code when the values of your variables change around a lot. Strive to have as few reassignments in your code as possible.
I don't really know much about the internals of compiler and JIT optimizations, but I usually try to use "common sense" to guess what could be optimized and what couldn't. So there I was writing a simple unit test method today:
@Test // [Test] in C#
public void testDefaultConstructor() {
    new MyObject();
}
This method is actually all I need. It checks that the default constructor exists and runs without exceptions.
But then I started to think about the effect of compiler/JIT optimizations. Could the compiler/JIT optimize this method by eliminating the new MyObject(); statement completely? Of course, it would need to determine that the call graph does not have side effects to other objects, which is the typical case for a normal constructor that simply initializes the internal state of the object.
I presume that only the JIT would be allowed to perform such an optimization. This probably means that it's not something I should worry about, because the test method is being performed only once. Are my assumptions correct?
Nevertheless, I'm trying to think about the general subject. When I thought about how to prevent this method from being optimized, I considered assertTrue(new MyObject().toString() != null), but this is very dependent on the actual implementation of the toString() method, and even then the JIT can determine that toString() always returns a non-null string (e.g. if Object.toString() is actually the one being called), and thus optimize the whole branch away. So this way wouldn't work.
I know that in C# I can use [MethodImpl(MethodImplOptions.NoOptimization)], but this is not what I'm actually looking for. I'm hoping to find a (language-independent) way of making sure that some specific part(s) of my code will actually run as I expect, without the JIT interfering in this process.
Additionally, are there any typical optimization cases I should be aware of when creating my unit tests?
Thanks a lot!
Don't worry about it. It's not allowed to ever optimize anything that can make a difference to your system (except for speed). If you new an object, code gets called, memory gets allocated, it HAS to work.
If you had it protected by an if(false), where false is a final, it could be optimized out of the system completely, then it could detect that the method doesn't do anything and optimize IT out (in theory).
Edit: by the way, it can also be smart enough to determine that this method:
void newIfTrue(boolean b) {
    if (b)
        new ThisClass();
}
will always do nothing if b is false, and eventually figure out that at one point in your code b is always false, and compile this routine out of that code completely.
This is where the JIT can do stuff that's virtually impossible in any non-managed language.
I think if you are worried about it getting optimized away, you may be doing a bit of testing overkill.
In a static language, I tend to think of the compiler as a test. If it passes compilation, that means that certain things are there (like methods). If you don't have another test that exercises your default constructor (which will prove it wont throw exceptions), you may want to think about why you are writing that default constructor in the first place (YAGNI and all that).
I know there are people that don't agree with me, but I feel like this sort of thing is just something that will bloat out your number of tests for no useful reason, even looking at it through TDD goggles.
Think about it this way:
Let's assume the compiler could determine that the call graph doesn't have any side effects (I don't think that is actually possible; I vaguely remember from my CS courses that this kind of question is undecidable). It would then optimize away any method without side effects. Since most tests don't have, and shouldn't have, side effects, the compiler could optimize them all away.
The JIT is only allowed to perform operations that do not affect the guaranteed semantics of the language. Theoretically, it could remove the allocation and call to the MyObject constructor if it can guarantee that the call has no side effects and can never throw an exception (not counting OutOfMemoryError).
In other words, if the JIT optimizes the call out of your test, then your test would have passed anyway.
PS: Note that this applies because you are doing functionality testing as opposed to performance testing. In performance testing, it's important to make sure the JIT does not optimize away the operation you are measuring, else your results become useless.
It seems that in C# I could do this:
[Test]
public void testDefaultConstructor() {
    GC.KeepAlive(new MyObject());
}
AFAIU, the GC.KeepAlive method will not be inlined by the JIT, so the code is guaranteed to work as expected. However, I don't know of a similar construct in Java.
Every I/O is a side effect, so you can just put
Object obj = new MyObject();
System.out.println(obj.toString());
and you're fine.
Why should it matter? If the compiler/JIT can statically determine no asserts are going to be hit (which could cause side effects), then you're fine.