Why does the C# compiler not even warn about endless recursion?

A legacy app is in an endless loop at startup; I don't know why/how yet (code obfuscation contest candidate), but regarding the method that's being called over and over (which is called from several other methods), I thought, "I wonder if one of the methods that calls this is also calling another method that also calls it?"
I thought: "Nah, the compiler would be able to figure that out, and not allow it, or at least emit a warning!"
So I created a simple app to prove that would be the case:
public partial class Form1 : Form
{
    public Form1()
    {
        InitializeComponent();
    }

    private void button1_Click(object sender, EventArgs e)
    {
        method1();
    }

    private void button2_Click(object sender, EventArgs e)
    {
        method2();
    }

    private void method1()
    {
        MessageBox.Show("method1 called, which will now call method2");
        method2();
    }

    private void method2()
    {
        MessageBox.Show("method2 called, which will now call method1");
        // Note to self: Write an article entitled, "Copy-and-Paste Considered Harmful"
        method1();
    }
}
...but no! It compiles just fine. Why wouldn't the compiler flag this code as questionable at best? If either button is mashed, you are in never-never land!
Okay, sometimes you may want an endless loop (pacemaker code, etc.), but still I think a warning should be emitted.

As you said, sometimes people want infinite loops. Also, the .NET JIT compiler supports tail-call optimization, so an endless recursion like this might not even produce a stack overflow.
For the general case, predicting in finite time whether a program will eventually terminate or get stuck in an infinite loop is impossible; this is the halting problem. All a compiler can possibly find are some special cases where the decision is easy.
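To get a feel for why this is undecidable in general, here is a sketch of a loop whose termination hinges on an open mathematical question:
static void Collatz(long n)
{
    // Whether this loop terminates for every positive n is exactly the
    // Collatz conjecture, an open problem. No compiler can warn
    // "possible infinite loop" here without settling it.
    while (n != 1)
    {
        n = (n % 2 == 0) ? n / 2 : 3 * n + 1;
    }
}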

That's not an endless loop, but an endless recursion. And that is much worse, since it can lead to a stack overflow. Endless recursion is not desired in most languages (unless you are writing malware). Endless loops, however, are often intentional: services typically run in endless loops.
In order to detect this kind of situation, the compiler would have to analyze the code by following the method calls; however, the C# compiler limits this process to the immediate code within the current method. Here, uninitialized or unused variables can be tracked and unreachable code can be detected, for instance. There is a tradeoff to make between compilation speed and the depth of static analysis and optimization.
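For example, the kind of diagnostics this local analysis does produce looks like this (a sketch; CS0219 and CS0162 are the actual C# warning numbers):
private int method3()
{
    int unused = 42;         // warning CS0219: variable is assigned but its value is never used
    return 0;
    MessageBox.Show("hi");   // warning CS0162: unreachable code detected
}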
Also, it is hardly possible to know the real intention of the programmer.
Imagine that you wrote a method that is perfectly legal. Suddenly, because you are calling this method from another place, your compiler complains and tells you that your method is no longer legal. I can already see the flood of posts on SO: "My method compiled yesterday. Today it does not compile anymore. But I didn't change it!"

To put it very simply: it's not the compiler's job to question your coding patterns.
You could very well write a Main method that does nothing but throw an Exception. It's a far easier pattern to detect and a much more stupid thing to do; yet the compiler will happily allow your program to compile, run, crash and burn.
With that being said, since technically an endless loop / recursion is perfectly legal as far as the compiler is concerned, there's no reason why it should complain about it.
Actually, it would be very hard to figure out at compile time that a loop can never be broken at runtime. An exception could be thrown, user interaction could happen, state might change on another thread or on a port you are monitoring, etc. There are way too many possibilities for any code analysis tool to establish, beyond doubt, that a specific recursive code segment will inevitably cause an overflow at runtime.
I think the right way to prevent these situations is through unit testing organization. The more code paths you are covering in your tests, the less likely you are to ever face such a scenario.
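If you want an extra safety net while hunting such bugs, a depth guard is cheap; this is only a sketch, with a hypothetical counter field and an arbitrary limit:
// Hypothetical guard: fail fast under test instead of overflowing the stack.
private int depth;

private void method1()
{
    if (++depth > 1000)
        throw new InvalidOperationException("Runaway recursion detected");
    try
    {
        MessageBox.Show("method1 called, which will now call method2");
        method2();
    }
    finally
    {
        depth--;
    }
}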

Because it's nearly impossible to detect!
In the example you gave, it is obvious (to us) that the code will loop forever. But the compiler just sees a function call; it doesn't necessarily know at that point what calls the function, what conditional logic could change the looping behavior, etc.
For example, with this slight change you aren't in an infinite loop anymore:
private bool method1called = false;

private void method1()
{
    MessageBox.Show("method1 called, which will now call method2");
    if (!method1called)
    {
        method1called = true; // set the flag before recursing, or it never takes effect
        method2();
    }
}

private void method2()
{
    MessageBox.Show("method2 called, which will now call method1");
    method1();
}
Without actually running the program, how would you know that it isn't looping? I could potentially see a warning for while (true), but that has enough valid use cases that it also makes sense to not put a warning in for it.
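For example, a deliberate infinite loop like this worker is perfectly reasonable (a sketch; the BlockingCollection<WorkItem> named queue and the Process method are assumed):
// A deliberate endless loop: drain a work queue until a sentinel arrives.
while (true)
{
    WorkItem item = queue.Take();  // blocks until an item is available
    if (item == null)              // null sentinel signals shutdown
        break;
    Process(item);
}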
A compiler just parses the code and translates it to IL (for .NET, anyway). You can get limited information while doing that, like variables not being assigned (especially since it has to generate the symbol table anyway), but advanced detection like this is generally left to code analysis tools.

I found this in the Wikipedia article on infinite loops: http://en.wikipedia.org/wiki/Infinite_loop#Intentional_looping
There are a few situations when this is desired behavior. For example, the games on cartridge-based game consoles typically have no exit condition in their main loop, as there is no operating system for the program to exit to; the loop runs until the console is powered off.
Antique punchcard-reading unit record equipment would literally halt once a card processing task was completed, since there was no need for the hardware to continue operating, until a new stack of program cards were loaded.
By contrast, modern interactive computers require that the computer constantly be monitoring for user input or device activity, so at some fundamental level there is an infinite processing idle loop that must continue until the device is turned off or reset. In the Apollo Guidance Computer, for example, this outer loop was contained in the Exec program, and if the computer had absolutely no other work to do it would loop running a dummy job that would simply turn off the "computer activity" indicator light.
Modern computers also typically do not halt the processor or motherboard circuit-driving clocks when they crash. Instead they fall back to an error condition displaying messages to the operator, and enter an infinite loop waiting for the user to either respond to a prompt to continue, or to reset the device.
Hope this helps.

Related

Is removing unreachable code always safe?

Assume I have a tool that automatically removes C# code that is detected by the compiler as unreachable. Is there a situation in which such operation can get me into trouble? Please share interesting cases.
Here's the interesting example. Consider a function like this:
public static IEnumerable<int> Fun()
{
    if (false)
    {
        yield return 0;
    }
}
The line with yield is detected as unreachable. However, removing it makes the program uncompilable: the yield in the body is what tells the compiler to rewrite the method as an iterator, so calling it simply returns an empty sequence. With the yield line removed, it looks like an ordinary function that is missing a required return statement.
As noted in the comments, the example is contrived; however, instead of a literal false we could have a constant value coming from another, generated project, etc. (i.e., the code wouldn't look as obviously dead as it does here).
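If a tool really wanted to strip that dead body, the safe rewrite would have to keep the method an iterator, for example with yield break:
public static IEnumerable<int> Fun()
{
    yield break; // still an iterator block; callers get an empty sequence
}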
Edit:
Note that the yield construct is actually very similar to async/await. Yet with the latter, the creators of the language took a different (IMO better) approach that prevents such scenarios: iterator blocks could have been defined the same way, using a keyword in the function signature instead of being detected from the function body.
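A sketch of the contrast (method fragments; the usual usings for Task and IEnumerable<T> are assumed):
// 'async' is declared in the signature; stripping every 'await' from the
// body only produces a compiler warning, the method's kind is unchanged.
public async Task<int> GetValueAsync()
{
    return await Task.FromResult(42);
}

// An iterator, by contrast, is recognized purely from the 'yield' in the
// body; removing the last 'yield' changes what kind of method this is.
public IEnumerable<int> GetValues()
{
    yield return 42;
}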
I wouldn't do this automatically, for reasons mentioned in other answers, but I will say here that I'd have a strong bias towards removing unused code over keeping it. After all, tracking obsolete code is what source control is for.
This example is contrived, but this will get flagged for removal (in default DEBUG settings) and produce different behaviors when removed.
using System;
using System.Linq;

public class Baz { }

public class Foo
{
    public void Bar()
    {
        if (false)
        {
            // Will be flagged as unreachable code
            var baz = new Baz();
        }

        var true_in_debug_false_in_release =
            GetType()
                .GetMethod("Bar")
                .GetMethodBody()
                .LocalVariables
                .Any(x => x.LocalType == typeof(Baz));

        Console.WriteLine(true_in_debug_false_in_release);
    }
}
In Release mode (with default settings), the "unreachable code" will be optimized away and produce the same result as if you had deleted the if block in DEBUG mode.
Unlike the example using yield, this code compiles regardless of whether or not the unreachable code is removed.
Further to Dan Bryant's answer, here's an example of a program whose behaviour will be altered by a tool that's smarter than the C# compiler in finding and removing unreachable code:
using System;

class Program
{
    static bool tru = true;

    static void Main(string[] args)
    {
        var x = new Destructive();
        while (tru)
        {
            GC.Collect(2);
        }
        GC.KeepAlive(x); // unreachable, yet has an effect on program output
    }
}

class Destructive
{
    ~Destructive()
    {
        Console.WriteLine("Blah");
    }
}
The C# compiler does not try very hard to prove that GC.KeepAlive is unreachable, so it doesn't eliminate it in this case. As a result, the program loops forever without printing anything.
If a tool proves that it's actually unreachable (fairly easy in this example) and removes it, the program behaviour is changed. It will print "Blah" straight away and then loop forever. So it has become a different program. Try it if you have doubts; just comment out that unreachable line and see the behaviour change.
If GC.KeepAlive was there for a reason, then this change would, in fact, be unsafe and would make the program misbehave at some point (probably just crash).
One rare boundary case is if the unreachable code contains a GC.KeepAlive call. This very rarely comes up (as it is related to particular hacky use cases of unmanaged/managed interop), but if you are interoperating with unmanaged code that requires this, removing it could cause intermittent failures if you're unlucky enough to have GC trigger at just the wrong moment.
UPDATE:
I've tested this and I can confirm that Servy is correct; the GC.KeepAlive call does not take effect because the compiler proves the reference can never actually be used. This is not because the method is never executed (the method doesn't have to execute it for it to impact GC behavior), but because the compiler ignores the GC.KeepAlive when it is provably unreachable.
I'm leaving this answer up because it's still interesting for the counter case. This is a mechanism for breaking a program if you modify your code to make it unreachable, but don't move the GC.KeepAlive to make sure it still keeps the reference alive.
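For context, the pattern where GC.KeepAlive genuinely matters looks something like this (a sketch with hypothetical names; SomeHandleOwner and NativeMethods.UseHandle are stand-ins):
// Without the KeepAlive, 'owner' could be collected, and its finalizer could
// release the OS handle, while the native call is still using the raw value.
var owner = new SomeHandleOwner();
IntPtr raw = owner.Handle;
NativeMethods.UseHandle(raw);  // unmanaged call that uses the raw handle
GC.KeepAlive(owner);           // keep 'owner' alive until the call returns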

Threading in C# with XNA KeyboardInput

I am a bit new to threading (not new to C#, just haven't done much threading). Can someone explain to me why this does not work?
I have a thread which calls a method I will call "Loop". Loop contains a while loop which will continuously run, and on every loop of the while I want it to check if the A Key is down (using Microsoft's Keyboard class within the XNA Framework). But for some reason it never registers that anything is being pressed.
static Thread thread = new Thread(Loop);
static bool abort = false;

public static void Begin()
{
    thread.Start();
}

private static void Loop()
{
    while (!abort)
    {
        if (Keyboard.GetState().IsKeyDown(Keys.A))
            Console.WriteLine("A pressed.");
    }
}
Might anyone know why the Console.WriteLine() is never being called?
EDIT:
I guess I should explain a little bit. What I am actually trying to do is create something similar to ActionScript's events in C#. I want to pass a "condition" and an "action" to call if that condition is met to a separate class which contains this thread. This would allow me to just add "event listeners" to objects, and it would automatically and constantly check whether one of the events gets triggered, rather than leaving it to me to write if statements in code to check for the events.
Upon trying to do so, the first thing I tested was this XNA keyboard handling, because it was one of the reasons I originally wanted to build this system, but it didn't work. So I created the standalone code which I posted above, to see if I had made an error in my previous code, and it still didn't work.
I never use XNA, so I didn't really "know", but I've run into similar situations where you can't get keyboard (and other) input from a worker thread. I googled and found that in XNA this does seem to be the case. See this for example.
So you need to (and probably want to) process your game input on the GUI thread. Just checking for input on each update tick should be fine. I doubt you would gain any performance even if it did work, and you might introduce some interesting synchronization bugs ;-)
It does look like you're creating your worker thread properly; this just isn't an application for it.
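Something like this in your Game subclass should do it (a sketch; Game1 and the usual Microsoft.Xna.Framework and Microsoft.Xna.Framework.Input usings are assumed):
public class Game1 : Microsoft.Xna.Framework.Game
{
    protected override void Update(GameTime gameTime)
    {
        // Poll input once per tick, on the game's own thread.
        if (Keyboard.GetState().IsKeyDown(Keys.A))
            Console.WriteLine("A pressed.");

        base.Update(gameTime);
    }
}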

When does a param that is passed by reference get updated?

Suppose I have a method like this:
public void MyCoolMethod(ref bool scannerEnabled)
{
    try
    {
        CallDangerousMethod();
    }
    catch (FormatException exp)
    {
        try
        {
            // Disable scanner before validation.
            scannerEnabled = false;
            if (exp.Message == "FormatException")
            {
                MessageBox.Show(exp.Message);
            }
        }
        finally
        {
            // Enable scanner after validation.
            scannerEnabled = true;
        }
    }
}
And it is used like this:
MyCoolMethod(ref MyScannerEnabledVar);
The scanner can fire at any time on a separate thread. The idea is to not let it if we are handling an exception.
The question I have is, does the call to MyCoolMethod update MyScannerEnabledVar when scannerEnabled is set or does it update it when the method exits?
Note: I did not write this code, I am just trying to refactor it safely.
You can think of a ref as making an alias to a variable. It's not that the variable you pass is "passed by reference", it's that the parameter and the argument are the same variable, just with two different names. So updating one immediately updates the other, because there aren't actually two things here in the first place.
As SLaks notes, there are situations in VB that use copy-in-copy-out semantics. There are also, if I recall correctly, rare and obscure situations in which expression trees may be compiled into code that does copy-in-copy-out, but I do not recall the details.
If this code is intended to update the variable for reading on another thread, the fact that the variable is "immediately" updated is misleading. Remember, on multiple threads, reads and writes can be observed to move forwards and backwards in time with respect to each other if they are not volatile. If the intention is to use the variable as a cross-thread communication mechanism, then use an object actually designed for, and safe for, that purpose: some sort of wait handle or mutex or whatever.
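A minimal sketch that makes the aliasing visible:
static bool flag = false;

static void Main()
{
    Toggle(ref flag);
}

static void Toggle(ref bool scannerEnabled)
{
    scannerEnabled = true;
    Console.WriteLine(flag); // prints True: 'scannerEnabled' aliases 'flag',
                             // so the write is visible before the method returns
}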
It gets updated live, as it is assigned inside the method.
When you pass a parameter by reference, the runtime passes (an equivalent to) a pointer to the field or variable that you referenced. When the method assigns to the parameter, it assigns directly to whatever the reference is pointing to.
Note, by the way, that this is not always true in VB.
Yes, it will be set when the variable is set within the method. Perhaps it would be better to return true or false to indicate whether the scanner is enabled, rather than passing it in as a ref argument.
The situation calls for more than a simple refactor. The code you posted is subject to race conditions. The easy solution is to lock the unsafe method, thereby forcing threads to wait in line. As it is, there are bound to be bugs in the application due to this code, but it's impossible to say what exactly they are without knowing a lot more about your requirements and implementation. I recommend you proceed with caution: a mutex/lock is an easy fix, but it may have a great impact on performance. If that is a concern for you, you should look into a more fine-grained thread-safe solution.
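A minimal sketch of that locking (the gate object and the scanner thread's cooperation are assumptions about your surrounding code):
private static readonly object scannerGate = new object();

public void MyCoolMethod(ref bool scannerEnabled)
{
    lock (scannerGate) // the scanner thread must take the same lock
    {
        // ...existing try/catch/finally body goes here...
    }
}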

How do you validate an object's internal state?

I'm interested in hearing what technique(s) you're using to validate the internal state of an object during an operation that, from its own point of view, can only fail because of bad internal state or an invariant breach.
My primary focus is on C++, since in C# the official and prevalent way is to throw an exception, and in C++ there's not just one single way to do this (ok, not really in C# either, I know that).
Note that I'm not talking about function parameter validation, but more like class invariant integrity checks.
For instance, let's say we want a Printer object to Queue a print job asynchronously. To the user of Printer, that operation can only succeed, because the asynchronous queue result will arrive at some later time. So there's no relevant error code to convey to the caller.
But to the Printer object, this operation can fail if the internal state is bad, i.e., the class invariant is broken, which basically means: a bug. This condition is not necessarily of any interest to the user of the Printer object.
Personally, I tend to mix three styles of internal state validation and I can't really decide which one's the best, if any, only which one is absolutely the worst. I'd like to hear your views on these and also that you share any of your own experiences and thoughts on this matter.
The first style I use - better fail in a controllable way than corrupt data:
void Printer::Queue(const PrintJob& job)
{
    // Validate the state in both release and debug builds.
    // Never proceed with the queuing in a bad state.
    if (!IsValidState())
    {
        throw InvalidOperationException();
    }

    // Continue with queuing, parameter checking, etc.
    // Internal state is guaranteed to be good.
}
The second style I use - better crash uncontrollable than corrupt data:
void Printer::Queue(const PrintJob& job)
{
    // Validate the state in debug builds only.
    // Break into the debugger in debug builds.
    // Always proceed with the queuing, also in a bad state.
    DebugAssert(IsValidState());

    // Continue with queuing, parameter checking, etc.
    // Generally, behavior is now undefined because of the bad internal state.
    // But, specifically, this often means an access violation when
    // a NULL pointer is dereferenced, or something similar, and that crash will
    // generate a dump file that can be used to find the error cause during
    // testing before shipping the product.
}
The third style I use - better silently and defensively bail out than corrupt data:
void Printer::Queue(const PrintJob& job)
{
    // Validate the state in both release and debug builds.
    // Break into the debugger in debug builds.
    // Never proceed with the queuing in a bad state.
    // This object will likely never again succeed in queuing anything.
    if (!IsValidState())
    {
        DebugBreak();
        return;
    }

    // Continue with defenestration.
    // Internal state is guaranteed to be good.
}
My comments to the styles:
I think I prefer the second style, where the failure isn't hidden, provided that an access violation actually causes a crash.
If it's not a NULL pointer involved in the invariant, then I tend to lean towards the first style.
I really dislike the third style, since it will hide lots of bugs, but I know people who prefer it in production code, because it creates the illusion of robust software that doesn't crash (features will just silently stop functioning, as with the queuing on the broken Printer object).
Do you prefer any of these or do you have other ways of achieving this?
You can use a technique called NVI (Non-Virtual Interface) together with the template method pattern. This is probably how I would do it (of course, it's only my personal opinion, which is indeed debatable):
#include <stdexcept> // for std::logic_error

class Printer {
public:
    // Checks the invariant, then performs the actual queuing.
    void Queue(const PrintJob&);

private:
    virtual void DoQueue(const PrintJob&);
};

void Printer::Queue(const PrintJob& job) // not virtual
{
    // Validate the state in both release and debug builds.
    // Never proceed with the queuing in a bad state.
    if (!IsValidState()) {
        throw std::logic_error("Printer not ready");
    }

    // Call the virtual method DoQueue, which does the job.
    DoQueue(job);
}

void Printer::DoQueue(const PrintJob& job) // virtual
{
    // Do the actual queuing. State is guaranteed to be valid.
}
Because Queue is non-virtual, the invariant is still checked if a derived class overrides DoQueue for special handling.
To your options: I think it depends on the condition you want to check.
If it is an internal invariant: it should not be possible for a user of your class to violate it; the class should take care of its invariant itself. Therefore, I would assert(CheckInvariant()); in such a case.
If it's merely a pre-condition of a method that the user of the class has to guarantee (say, only printing after the printer is ready), I would throw std::logic_error as shown above.
I would really discourage checking a condition and then doing nothing.
The user of the class can itself assert, before calling a method, that the method's pre-conditions are satisfied. So generally: if a class is responsible for some state and finds that state to be invalid, it should assert. If it finds a condition violated that doesn't fall within its responsibility, it should throw.
The question is best considered in combination with how you test your software.
It's important that hitting a broken invariant during testing is filed as a high severity bug, just as a crash would be. Builds for testing during development can be made to stop dead and output diagnostics.
It can be appropriate to add defensive code, rather like your style 3: your DebugBreak would dump diagnostics in test builds, but just be a breakpoint for developers. This makes it less likely that a developer is blocked from working by a bug in unrelated code.
Sadly, I've often seen it done the other way round, where developers get all the inconvenience, but test builds sail through broken invariants. Lots of strange behaviour bugs get filed, where in fact a single bug is the cause.
It's a fine and very relevant question. IMHO, any application architecture should provide a strategy to report broken invariants. One can decide to use exceptions, an 'error registry' object, or explicit checks of the result of every action. Maybe there are even other strategies; that's not the point.
Depending on a possibly loud crash is a bad idea: you cannot guarantee the application is going to crash if you don't know the cause of the invariant breach. In case it doesn't, you still have corrupt data.
The Non-Virtual Interface solution from litb is a neat way to check invariants.
Tough question this one :)
Personally, I tend to just throw an exception, since I'm usually too absorbed in what I'm implementing to take care of what should be handled by the design. Usually this comes back to bite me later on...
My personal experience with the "do some logging and then don't do anything more" strategy is that it, too, comes back to bite you, especially if it's implemented like in your case (no global strategy; every class could potentially do it a different way).
What I would do, as soon as I discovered a problem like this, would be to speak to the rest of my team and tell them that we need some kind of global error handling. What the handling does depends on your product (you don't want to just log something in a subtle developer-minded file and carry on in an air traffic control system, but that would work fine if you were writing a driver for, say, a printer :) ).
I guess what I'm saying is that, IMHO, this question is something you should resolve at the design level of your application rather than at the implementation level. And sadly there's no magic solution :(

How do I write (test) code that will not be optimized by the compiler/JIT?

I don't really know much about the internals of compiler and JIT optimizations, but I usually try to use "common sense" to guess what could be optimized and what couldn't. So there I was writing a simple unit test method today:
@Test // [Test] in C#
public void testDefaultConstructor() {
    new MyObject();
}
This method is actually all I need. It checks that the default constructor exists and runs without exceptions.
But then I started to think about the effect of compiler/JIT optimizations. Could the compiler/JIT optimize this method by eliminating the new MyObject(); statement completely? Of course, it would need to determine that the call graph does not have side effects to other objects, which is the typical case for a normal constructor that simply initializes the internal state of the object.
I presume that only the JIT would be allowed to perform such an optimization. This probably means that it's not something I should worry about, because the test method is being performed only once. Are my assumptions correct?
Nevertheless, I'm trying to think about the general subject. When I thought about how to prevent this method from being optimized, I considered assertTrue(new MyObject().toString() != null), but this is very dependent on the actual implementation of the toString() method, and even then the JIT could determine that toString() always returns a non-null string (e.g. if Object.toString() is actually being called) and optimize the whole branch away. So this way wouldn't work.
I know that in C# I can use [MethodImpl(MethodImplOptions.NoOptimization)], but this is not what I'm actually looking for. I'm hoping to find a (language-independent) way of making sure that some specific part(s) of my code will actually run as I expect, without the JIT interfering in this process.
Additionally, are there any typical optimization cases I should be aware of when creating my unit tests?
Thanks a lot!
Don't worry about it. The JIT is not allowed to optimize anything in a way that makes an observable difference to your system (other than speed). If you new an object, code gets called, memory gets allocated; it HAS to work.
If you had it protected by an if (false), where false is a final constant, it could be optimized out of the system completely; then the JIT could detect that the method doesn't do anything and optimize IT out (in theory).
Edit: by the way, it can also be smart enough to determine that this method:
void newIfTrue(boolean b) {
    if (b)
        new ThisClass();
}
will always do nothing if b is false, and eventually figure out that at some point in your code b is always false, and compile this routine out of that code completely.
This is where the JIT can do stuff that's virtually impossible in any non-managed language.
I think if you are worried about it getting optimized away, you may be doing a bit of testing overkill.
In a static language, I tend to think of the compiler as a test. If it passes compilation, that means that certain things are there (like methods). If you don't have another test that exercises your default constructor (which will prove it wont throw exceptions), you may want to think about why you are writing that default constructor in the first place (YAGNI and all that).
I know there are people that don't agree with me, but I feel like this sort of thing is just something that will bloat out your number of tests for no useful reason, even looking at it through TDD goggles.
Think about it this way:
Let's assume that the compiler could determine that the call graph doesn't have any side effects (I don't think that's possible in general; I vaguely remember something about the halting problem from my CS courses). It would then optimize away any method that doesn't have side effects. Since most tests don't have, and shouldn't have, any side effects, the compiler could optimize them all away.
The JIT is only allowed to perform operations that do not affect the guaranteed semantics of the language. Theoretically, it could remove the allocation and call to the MyObject constructor if it can guarantee that the call has no side effects and can never throw an exception (not counting OutOfMemoryError).
In other words, if the JIT optimizes the call out of your test, then your test would have passed anyway.
PS: Note that this applies because you are doing functionality testing as opposed to performance testing. In performance testing, it's important to make sure the JIT does not optimize away the operation you are measuring, else your results become useless.
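For that performance-testing case, a common trick is to funnel results into a sink the JIT cannot prove is unused; a sketch, where Compute stands in for the code being measured:
static long sink; // consumed at the end so the JIT can't prove the work is dead

static void Benchmark()
{
    var sw = System.Diagnostics.Stopwatch.StartNew();
    long acc = 0;
    for (int i = 0; i < 1000000; i++)
    {
        acc += Compute(i); // Compute is a stand-in for the code under test
    }
    sink = acc; // publish the result; eliding the loop would now be observable
    sw.Stop();
    Console.WriteLine(sw.ElapsedMilliseconds + " ms (sink=" + sink + ")");
}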
It seems that in C# I could do this:
[Test]
public void testDefaultConstructor() {
    GC.KeepAlive(new MyObject());
}
AFAIU, the GC.KeepAlive method will not be inlined by the JIT, so the code will be guaranteed to work as expected. However, I don't know a similar construct in Java.
Every I/O is a side effect, so you can just put
Object obj = new MyObject();
System.out.println(obj.toString());
and you're fine.
Why should it matter? If the compiler/JIT can statically determine that no asserts are going to be hit (which could cause side effects), then you're fine.
