Creating an error number from stack trace


I'm coding in Java, on the Android platform, but this isn't really a question for a specific programming language.
If an error occurs in my program, I catch it in a try-catch statement, and I would like to create an error number to display to the user, giving them the opportunity to send me that number.
The catch is that I would like to encode the error in such a way that I get a small number (let's say a maximum of 5 digits) which I can later decode to find out exactly in which class, in which method and at which line number the error occurred.
I'm guessing this is more of a cryptography issue, so has anyone got any ideas on how I should go about doing this?
EDIT
I was thinking of giving a number to each file and each method and somehow using these values to create the error number, but I'm not sure how to calculate the actual error number so that it also works the other way around (decodes correctly).
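The arithmetic behind that idea can be sketched with mixed-radix packing: each component gets a fixed range, and the number is built by multiplying and adding, then taken apart again with division and remainder. This is only a minimal sketch; the class name `ErrorCodec` and the limits `MAX_METHODS` and `MAX_LINES` are assumptions made up for illustration, not anything from the question.

```java
/** Packs (fileId, methodId, line) into one number using mixed-radix arithmetic. */
final class ErrorCodec {
    // Hypothetical limits, chosen only for illustration.
    static final int MAX_METHODS = 50;   // method ids 0..49 within a file
    static final int MAX_LINES = 2000;   // line numbers 0..1999

    static long encode(int fileId, int methodId, int line) {
        if (fileId < 0 || methodId < 0 || methodId >= MAX_METHODS
                || line < 0 || line >= MAX_LINES) {
            throw new IllegalArgumentException("component out of range");
        }
        // The file id is the most significant "digit", the line the least.
        return ((long) fileId * MAX_METHODS + methodId) * MAX_LINES + line;
    }

    /** Returns {fileId, methodId, line}. */
    static int[] decode(long code) {
        int line = (int) (code % MAX_LINES);
        long rest = code / MAX_LINES;
        int methodId = (int) (rest % MAX_METHODS);
        int fileId = (int) (rest / MAX_METHODS);
        return new int[] {fileId, methodId, line};
    }

    public static void main(String[] args) {
        long code = encode(3, 7, 120);
        int[] parts = decode(code);
        System.out.println(code + " -> file " + parts[0]
                + ", method " + parts[1] + ", line " + parts[2]);
    }
}
```

Note that with these limits a single file already uses up 50 × 2000 = 100,000 values, which is the entire five-digit budget. Under these assumptions a five-digit code cannot hold file, method and line for a codebase of any size, so either the number gets longer or the ranges get much tighter.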

I'll say it: this is a dreadful approach to debugging issues. What you want to do is set up an enumeration that defines the error codes and their descriptions, similar to how Microsoft does it:
    ERROR_SUCCESS
    0 (0x0)
    The operation completed successfully
So on and so forth. That way you can publish these things to your users, so you reduce the amount of emails / complaints that you get (to some degree). Obfuscating the stack trace is going to be a nightmare for you, because it would almost seem that you are locking yourself into an unmanageable reporting system. As your code base grows and/or you add more custom exceptions you will quickly break your design. Also, this is a strong case of security through obscurity, wherein someone will potentially reverse engineer your process and start writing malicious code against you.
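As a rough illustration of such an enumeration in Java (the asker's language), something along these lines would do. Apart from the success entry, the constant names, codes and descriptions are invented for the example, and `fromCode` is a hypothetical helper.

```java
import java.util.HashMap;
import java.util.Map;

/** Published error codes with user-facing descriptions. */
enum AppError {
    SUCCESS(0, "The operation completed successfully"),
    // The two entries below are made-up examples.
    CONFIG_MISSING(1001, "A required configuration file could not be found"),
    NETWORK_TIMEOUT(2001, "The server did not respond in time");

    private static final Map<Integer, AppError> BY_CODE = new HashMap<>();

    static {
        // Build the reverse lookup once, after all constants exist.
        for (AppError e : values()) {
            BY_CODE.put(e.code, e);
        }
    }

    final int code;
    final String description;

    AppError(int code, String description) {
        this.code = code;
        this.description = description;
    }

    /** Returns the matching entry, or null if the code is unknown. */
    static AppError fromCode(int code) {
        return BY_CODE.get(code);
    }
}
```

The user reports the number, and `AppError.fromCode(2001)` takes the developer back to the entry; the description can be shown next to the code so the user is not left with a bare number.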
Better approach:
Get a JIRA account where the more technically savvy users can post the error plus description. I believe FogBugz also has this functionality where you can upload these types of things and it allows the users to crop the images to focus on what you are looking for specifically.

Related

Perform cleanup actions after Exception

A process runs that parses many xml files, and within each file there are many nodes that represent work to be done.
Currently, errors that occur are logged to log4net one at a time, logging to a file once per error, and also generating an email once per error. This is not ideal, so I'm working to roll up all the errors that occur during one parsing session so they can be sent as a single email. For example, let's say the creator of the files accidentally left a crucial field out of each work node. Instead of 4,000 emails, I'd like to send a single email with 4,000 results, plus a summary at the top saying "4,000 field missing errors" and so on.
The trick comes when dealing with exceptions, because I don't know in advance what exception will occur. So, the previous state of the system worked well for tolerating exceptions because it flushed its knowledge as soon as it acquired it (sent an email, for example). But now I'm seeing that waiting until the end to send the email risks not sending an email at all. Let's say the nature of a repeating error is such that eventually it causes a system-level exception. By losing any error reporting at all, I never get the information on the repeating error that (almost certainly) led up to the big crash.
How can I get the best of both worlds, where logging is saved until the end, but whatever work is done and whatever information is collected has a decent chance of being logged even if an error occurs?
One thought I had was sending log information to a custom log4net appender that does caching or batching, and thus allows modification of the log message over time until it is finally triggered to be sent/logged (perhaps accepts individual items and a lambda that it executes at the end to assemble those items into a loggable result). Then, in my application:
    var loggingContext = CreateNewLoggingContext();
    try {
        // Note: all reasonable and known or handleable exceptions will be
        // caught from within this method. We're talking unforeseen or
        // difficult to handle errors, here.
        ParseAndProcessABunchOfFiles(loggingContext);
        // within this method repeatedly use loggingContext.Push("error id", "error message");
    }
    catch (Exception e) {
        loggingContext.Push("System-level exception", e);
        loggingContext.Flush();
        throw;
    }
    loggingContext.Flush();
But I am not particularly excited about catching every exception. An OutOfMemoryException is going to make it pretty hard to do an operation such as logging that might require even more memory. Or a disk full exception is going to make it hard to add to the log file on disk (though it would be grand if it still tried to run its email result and perhaps also log to the database).
How can I achieve these goals in a reasonable way?
UPDATE
Part of what's troubling me is distinguishing at each level in the call stack what is a recoverable error and what's one that should terminate the whole process. I guess what I have to do is simply catch the ones I can think of and reasonably expect, then as other exceptions show up, add specific handling for them.
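One possible middle ground, sketched here in Java purely to show the shape (the idea is not tied to a language or to log4net): batch the errors, but flush a checkpoint every N entries so that a hard crash loses at most the last partial batch. The class name `BatchingLogContext`, the `checkpointEvery` parameter, and the `Consumer<String>` sink standing in for the email or appender are all assumptions made for this sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Collects errors by id and emits them as one report per flush. */
final class BatchingLogContext {
    private final Map<String, List<String>> pending = new LinkedHashMap<>();
    private final Consumer<String> sink;   // hypothetical: email sender, appender, ...
    private final int checkpointEvery;
    private int pendingCount = 0;

    BatchingLogContext(Consumer<String> sink, int checkpointEvery) {
        this.sink = sink;
        this.checkpointEvery = checkpointEvery;
    }

    void push(String errorId, String message) {
        pending.computeIfAbsent(errorId, k -> new ArrayList<>()).add(message);
        pendingCount++;
        if (pendingCount >= checkpointEvery) {
            flush();   // checkpoint: bounds how much a crash can lose
        }
    }

    void flush() {
        if (pendingCount == 0) {
            return;
        }
        StringBuilder report = new StringBuilder();
        // Summary first: one line per error id with its count.
        for (Map.Entry<String, List<String>> e : pending.entrySet()) {
            report.append(e.getValue().size()).append(" x ")
                  .append(e.getKey()).append('\n');
        }
        // Then the individual messages.
        for (List<String> messages : pending.values()) {
            for (String m : messages) {
                report.append("  ").append(m).append('\n');
            }
        }
        pending.clear();
        pendingCount = 0;
        sink.accept(report.toString());
    }
}
```

Calling `flush()` from a `finally` block around the parsing run covers the ordinary exception paths without catching every exception type. The checkpoint size is the trade-off: with 4,000 errors and a checkpoint of 500, that is eight emails rather than one, in exchange for losing at most 499 entries if the process dies outright.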

ZeroMQ subscriber fails to initialize using 1000+ publishers

I am trying to evaluate ZeroMQ for a larger monitoring and data gathering system. On a smaller scale everything works nice but stepping up the load and scale a bit seems tricky.
Right now I am using a C# wrapper (clrzmq, 3.0.0-rc1) to create both a publisher and a subscriber application. I am binding the Publisher socket (1 socket, 1 context) to 1000 endpoints (localhost + a range of ports) and letting the Subscriber application's socket (again 1 socket, 1 context) connect to the publisher endpoints.
This sometimes works, and sometimes not (I guess it relates to the max number of sockets handled by the process somehow). It seems to depend on in which order I start the applications but I cannot tell for sure. The only thing I see is nasty SEHExceptions, containing no details at all. If I create simple console applications I sometimes see low level C++ Asserts like:
Assertion failed: fds.size () <= FD_SETSIZE (......\src\select.cpp:70)
Assertion failed: Permission denied (......\src\signaler.cpp:281)
Assertion failed: Connection reset by peer (......\src\signaler.cpp:124)
Not very helpful to me. In the C# wrapper, the Context creation fails. It does not even get a chance to begin connecting to or even creating sockets. I would expect low level ZeroMQ errors to be handled by throwing exceptions, maybe I just have not understood how to deal with errors yet.
The questions I have right now are:
How do I create a (somewhat) realistic test setup to simulate 1000 separate publishers on a single machine (in the real world, 1 publisher = 1 machine) and a couple of Subscribers on another machine, all using C#? Is that even possible?
More importantly, how do I trap ZeroMQ errors in C# code to be able to understand what goes wrong?
Since ZeroMQ seems pretty stable and mature I have a hard time believing 1000 publishers should be a problem to handle. However, I need better error support than currently available (unless I completely missed something here) in order to use ZeroMQ over C#.
Update:
After digging into the source, I ended up at a zmq_assert(...) that leads to RaiseException (0x40000015, EXCEPTION_NONCONTINUABLE, 1, extra_info);. This abruptly terminates the application after dumping the original assert statement to the console. This seems a bit harsh, but may well be the best option given that the condition is really unrecoverable. However, a somewhat better error message would not hurt. Not everyone knows what fds.size () <= FD_SETSIZE means. The comment in the source gives some clues; it would be nice to have that comment in the error message. Anyway, given that my application is not a console app, this just leaves me with an unhandled SEHException, which does not seem to contain even the assert statement or line/file info. I wonder how many other bugs I will create that will result in similarly cryptic errors.

After looking into this a bit more, it seems the default maximum number of sockets is set to 1024. The C# wrapper has a property on the Context object that should be able to change this setting, but it is not working, at least not as expected. Also, the native zmqlib does not have this setting on the context object.
Running a setup like in the description does not seem possible, at least not using the clrzmq C# ZeroMQ wrapper. I solved it by running 500 publishers on a separate machine and another 500 plus 1000 subscribers on another machine. This worked nice without any errors.
The other topic is also a bit disappointing. When the maximum number of sockets is reached, ZeroMQ simply throws an uncatchable exception, causing the application to crash abruptly. This is a fail-fast approach that avoids any further data/state corruption, but unfortunately it also leaves very few clues as to what happened to make the application die. Judging from other posts, it seems very hard to gather data for a post-mortem when this happens. Catching the exception in the C# code seems impossible or very hard, and hooking into stdout to capture the printed assert also seems very hard to achieve (unless we are running from a command prompt, in which case the assert message is printed just before the application dies).
All in all, this makes low-level troubleshooting and post-mortem analysis in a non-console C# setting very hard when ZeroMQ terminates via the zmq_assert(...) call. Hopefully this was an extreme case. Not all failure modes seem to cause termination in this abrupt way.

The default FD_SETSIZE is 1024 (defined in the MSVC libzmq project), so you will hit this about half-way through your test case. The other asserts tumble on from that.
Increase this in your libzmq project, to 4K or 8K, and things should work better.
As for the assert() call, it's too brutal on Windows, for sure. On Linux this gives a decent stack dump and enough information to trace the problem. Feel free to improve the assert macro so that it does something smarter, e.g. launch the debugger. In any case if you hit an assert you can't reasonably continue.
Asserting when the FD set is full, well, that could be handled better. If you know anything about C/C++, feel free to take a look at the code. We do depend on peoples' patches.
Also, if you feel 1024 is too small, feel free to raise this in the project and send us the patch.

A quick and dirty look into this problem suggests that you're creating too many socket connections for your computer. Check out this link on the max number of sockets from MSDN. The errors you are getting look relevant enough for this to be a possible source of your problem.
To be honest, having 1000 separate publishers suggests you are tackling the problem a little incorrectly for zmq. Why not have 1 publisher, use 'namespaces', and have the subscribers SUBSCRIBE to what they need, so that each subscriber gets only the messages meant for it?

Is it necessary to wrap every exception at top level?

Today, someone told me that we should always wrap every exception at the top level of the framework.
The reason is that the original exception may contain sensitive information in the stack trace or message, especially the stack trace.
I don't know. Are there any rules/principles/patterns?

Today, someone told me that we should always wrap every exception at the top level of the framework.
"Always" seems a bit much.
Like any other design decision, you should consider the costs and the benefits.
Because the original exception may contain sensitive information in the stack trace or message, especially the stack trace.
Indeed; the exception could contain sensitive information, and the stack trace can be used by attackers.
are there any rules/principles/patterns?
Yes. Before you do anything else, particularly before you make a design or code change, make a threat model. You are asking a security question and therefore you absolutely positively have to understand the threats before you can devise a good strategy to mitigate the vulnerabilities.
The central questions answered by your threat model should be "what are the trust boundaries of my application? When and how does data cross a boundary? What vulnerabilities does this expose? What threats can an attacker make good on as a result?"
If you don't understand precisely what trust boundaries, vulnerabilities, threats and attackers are, then learn what those words mean before you try to design a security system to mitigate vulnerabilities to threats. "Writing Secure Code 2" is a good place to start. (Chapter 5 of my book on code security has some good advice on eliminating exception vulnerabilities, but it is long out of print. Maybe I'll put it up on the blog one of these days.)
Data can cross a trust boundary in either direction; an untrusted client might be sending malformed data to your server, and your server might be sending sensitive private data to the untrustworthy client.
The particular aspect of the threat model that your question specifically addresses is data in the form of exceptions. I kid you not, before we shipped .NET 1.0 you could actually get the framework to give you an exception whose text was something like "You do not have permission to determine the name of directory C:\foo". (Great. Thanks for letting me know. I'll be sure to not use that information to attack the user now.)
Obviously that one got fixed long before we shipped, but people do the moral equivalent of that every day. If you have data crossing a trust boundary, you should assume that the hostile user on the untrusted side is going to try to cause exceptions on the trusted side, and is going to try to learn as much about the system as possible from those exceptions. Do not make the attacker's job easier.
You asked whether all exceptions should be wrapped. Maybe. If in fact you have a problem -- if an exception containing sensitive data can cross a trust boundary, then maybe wrapping the exception is the right thing to do. But maybe it is not enough. Maybe you need to not throw an exception across the boundary at all, even if the exception can be sanitized. Maybe the right thing to do is to go on full red alert and say "hey, we got an unexpected exception caused by bad data from a potentially hostile third party, so let's (1) ban their IP, or (2) redirect them to the honeypot server, or (3) alert the security department, or (4) something else." What the right solution is depends on the threat, which you have not yet stated.
Like I said the first thing you need to do is model the threats. Don't go making security decisions without understanding the threats thoroughly.
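If the threat model does conclude that sanitizing at the boundary is the right mitigation, one possible shape is sketched below in Java. It is an illustration under assumptions, not a recommendation for every case: `BoundaryException`, `TrustBoundary.sanitize` and the `internalLog` sink are hypothetical names. The full detail stays on the trusted side, and only an opaque incident id crosses the boundary. Unlike conventional wrapping, the original is deliberately not attached as the cause.

```java
import java.util.UUID;
import java.util.function.Consumer;

/** Carries nothing but an opaque incident id across the trust boundary. */
final class BoundaryException extends RuntimeException {
    final String incidentId;

    BoundaryException(String incidentId) {
        // No cause, no suppressed exceptions, no stack trace: nothing
        // about the trusted side travels with this exception.
        super("An internal error occurred. Reference: " + incidentId,
              null, false, false);
        this.incidentId = incidentId;
    }
}

final class TrustBoundary {
    /**
     * Records the original exception on the trusted side and returns a
     * sanitized replacement that is safe to hand to the untrusted side.
     */
    static BoundaryException sanitize(Throwable original,
                                      Consumer<String> internalLog) {
        String incidentId = UUID.randomUUID().toString();
        // A real system would also record the stack trace here.
        internalLog.accept(incidentId + " " + original);
        return new BoundaryException(incidentId);
    }
}
```

A boundary handler would then do `catch (Exception e) { throw TrustBoundary.sanitize(e, log); }`, and support staff can match the reference the user reports against the internal log.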

Yes, there is a rule: it is stupid.
We need to know more details, but at the end:
You can and should have a top level handler for exceptions. This is not in your framework (i.e. wrapping top level) but attached to the appdomain (unhandled exception). This allows you to show an error message and write logs etc.
In the end you should propagate as much information as possible. The rule is "catch what you can handle, propagate what you cannot".
Now it gets funny. "Sensitive information" is "information useful and needed for debugging". On top of that, standards say "wrap an exception = the original goes into the inner exception property".
The framework should not expose sensitive information to the user, but this is application specific (IIS: custom error page; Windows etc.: last resort handler, not showing the user too many details because the user does not care anyway).

Displaying a stack trace to the user under normal conditions is not the best idea.
I use a DEBUG constant set to "true" for debugging and development, and set to "false" under normal conditions. Look at this PHP example:
    define("DEBUG", true);

    function soap_error($soapFault)
    {
        if (DEBUG)
        {
            // full message with stack including sensitive info
            echo '<p class="error">SOAP error:</p><br />';
            echo '<p>'.$soapFault.'</p><br />';
            echo '<hr />';
        }
        else
        {
            // only error message
            echo '<p class="error">SOAP error:</p><br />';
            echo '<p>'.$soapFault->getMessage().'</p><br />';
            echo '<hr />';
        }
    }
You can also use log files to save full error stacks.
In C#/.NET this is easier, because you have build configurations in Visual Studio and you can switch between them and use something like that:
    public static void myerrorhandler(Exception e)
    {
    #if (DEBUG) // you have DEBUG and RELEASE configurations by default
        yourErrorMessageFunction("This is error message and stack " + e.ToString());
    #else
        yourErrorMessageFunction("This is only message " + e.Message);
    #endif
    }
Visual Studio adds and controls the DEBUG constant (it's set in "Build configurations", and you can customize it there).

What should an Application Log ideally contain?

What kind of information should an Application Log ideally contain? How is it different from Error Log?

You are going to get a lot of different opinions for this question.....
Ultimately it should contain any information that you think is going to be relevant to your application. It should also contain information that will help you determine what is happening with the application. That is not to say it should contain errors, but could if you wanted to use it that way.
At a minimum I would suggest that you include:
application start/stop time
application name
pass/fail information (if applicable)
Optional items would be:
call processing (if not too intensive)
errors if you decide to combine application and error logs
messaging (if not too intensive)
One thing you want to keep in mind is that you do not want to write so much information to your logs that you impact your application's performance. Also, make sure you don't let your log files grow so large that you run out of disk space.

A true error log should really contain:
The stack trace of where the error took place
The local variables present at the point of error.
A timestamp of when the error took place.
Detail of the exception thrown (if it is an exception).
A general application log file, for tracking events, etc, should contain less internal information, and perhaps be more user friendly.
To be honest, the answer really depends on what software the log is for.

Ideally, it should contain exactly the information you need to diagnose an application problem, or analyze a particular aspect of its past behavior. The only thing that makes this hard to do is that you do not know in advance exactly what problems will occur or which aspects of the application behavior will interest you in the future. You can't log every single change in application state, but you have to log enough. How much is enough? That's hard to say and very application-dependent. I doubt a desktop calculator logs anything.
An error log would just log any errors that occur. Unexpected exceptions and other unexpected conditions.

An application log usually contains errors, warnings, events and non-critical information, in contrast to an error log, which usually contains only errors and critical warnings.

The application log should contain all the information necessary for audit. This may include such things as successful/unsuccessful log on and any specific actions. The error log can be a subset of the application log or a separate log containing only information related to errors in the application.

What is Environment.FailFast?

What is Environment.FailFast?
How is it useful?

It is used to kill an application. It's a static method that will instantly kill an application without being caught by any exception blocks.
Environment.FailFast(String) can actually be a great debugging tool. For example, say you have an application that is just downright giving you some weird output. You have no idea why. You know it's wrong, but there are just no exceptions bubbling to the surface to help you out. Well, if you have access to Visual Studio 2005's Debug->Exceptions... menu item, you can actually tell Visual Studio to allow you to see those first chance exceptions. If you don't have that, however, you can put Environment.FailFast(String) in an exception, and use deductive reasoning and process of elimination to find out where your problem is.
Reference

It also creates a dump and event viewer entry, which might be useful.

It's a way to immediately exit your application without throwing an exception.
Documentation is here.
Might be useful in some security or data-critical contexts.

Failfast can be used in situations where you might be endangering the user's data. Say in a database engine, when you detect a corruption of your internal data structures, the only sane course of action is to halt the process as quickly as possible, to avoid writing garbage to the database and risk corrupting it and lose the user's data. This is one possible scenario where failfast is useful.
Another use is to catch programmer errors. Say you are writing a library and some function accepts a pointer that cannot be null in any circumstance, that is, if it's null, you are clearly in presence of a programmer error. You can return an error like E_POINTER or throw some InvalidArgument exception and hope someone notices, but you'll get their attention better by failing fast :-)
Note that I'm not restricting the example to pointers, you can generalize to any parameter or condition that should never happen. Failing fast ultimately results in better quality apps, as many bugs no longer go unnoticed.
Finally, failing fast helps with capturing the state of the process as faithfully as possible (as a memory dump gets created), in particular when failing fast immediately upon detecting an unrecoverable error or a really unexpected condition.
If the process were allowed to continue, say the 'finally' clauses ran or the stack was unwound, and things got destroyed or disposed of before a memory dump was taken, then the state of the process might be altered in such a way that it becomes much more difficult to diagnose the root cause of the problem.

It kills the application and even skips try/finally blocks.

From .NET Framework Design Guidelines on Exception Throwing:
✓ CONSIDER terminating the process by calling System.Environment.FailFast (.NET Framework 2.0 feature) instead of throwing an exception if your code encounters a situation where it is unsafe for further execution.

Joe Duffy discusses failing fast and the discipline to make it useful, here.
http://joeduffyblog.com/2014/10/13/if-youre-going-to-fail-do-it-fast/
Essentially, he's saying that for programming bugs - i.e. unexpected errors that are the fault of the programmer and not of the programme user or of other inputs or situations that can reasonably be expected to be bad - deciding to always fail fast has been seen to improve code quality.
I think that since it's an optional team decision and discipline, use of this API in C# is rare, since in reality we're all mostly writing LoB apps for 12 people in HR or an online shop at best.
So for us, we'd maybe use this when we want to deny the consumer of our API the opportunity of making any further moves.

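An unhandled exception that is thrown (or rethrown) within a Task won't take effect until the Task is garbage-collected, at some perhaps-random time later. (That was the .NET 4.0 behavior; since .NET 4.5, such unobserved Task exceptions no longer terminate the process by default.)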
This method lets you crash the process now -- see this answer.
