Should I null-protect my F# code from C# calls - c#

I am writing a library in F#, with some interfaces and base classes that are publicly visible. Generally, I avoid specifying [<AllowNullLiteral>] on my custom types, as it complicates the validation logic in my F# code (see this nice post on the pros and cons of null handling in F# to get a picture), and also because F# does not allow null for F# types by default. So, I validate for nulls only on types that accept null as a valid value.
However, an issue arises when my library is used from another .NET language, such as C#. More particularly, I worry about how I should implement methods that accept F#-declared interfaces when they are called from C# code. The interface types are nullable in C#, and I suspect nothing will stop the C# code from passing a null to my F# method.
I fear that the caller would crash and burn with a NullReferenceException, and the problem is that I am not even allowed to handle that properly in the F# code -- say, throw an ArgumentNullException -- because the respective interface lacks the AllowNullLiteral attribute. I fear that I will have to use the attribute and add the related null-checking logic to my F# code to prevent such a disaster.
Are my fears reasonable? I am a little confused, because I initially tried to stick to good F# practice and avoid null as much as possible. How does this change if one of my goals is to allow C# code to subclass and implement the interfaces I created in F#? Do I have to allow nulls for all non-value types coming from my F# code if they are public and can be accessed from any CLR language? Is there a best practice or good advice to follow?
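To make the worry concrete, here is a minimal sketch of the scenario from the C# side (IMyHandler, Library, and Process are hypothetical stand-ins for my actual types):

    // Hypothetical F#-declared interface, as it appears to C#:
    public interface IMyHandler { void Handle(string input); }

    // Stand-in for the F# library method that accepts the interface:
    public static class Library
    {
        // The F# implementation never expects null here.
        public static void Process(IMyHandler handler) => handler.Handle("data");
    }

    public static class Demo
    {
        public static void Main()
        {
            // Compiles without complaint; blows up inside the library
            // with a NullReferenceException.
            Library.Process(null);
        }
    }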

There are two basic approaches you can take:
Document in your API design that passing null to your library is not allowed, and that the calling code is responsible for ensuring that your library never receives a null. Then ignore the problem, and when your code throws NullReferenceExceptions and users complain about it, point them to the documentation.
Assume that the input your library receives from "outside" cannot be trusted, and put a validation layer around the "outside-facing" edge of your library. That validation layer would be responsible for checking for null and throwing ArgumentNullExceptions. (And pointing to the documentation that says "No nulls allowed" in the exception message).
As you can probably guess, I favor approach #2, even though it takes more time. But you can usually make a single function, used everywhere, to do that for you:
let nullArg name message =
    raise (System.ArgumentNullException(name, message))

let guardAgainstNull value name =
    // Note: isNull requires the type to satisfy the null constraint; for F# types
    // without [<AllowNullLiteral>], use obj.ReferenceEquals(value, null) instead.
    if isNull value then nullArg name "Nulls not allowed in Foo library functions"

let libraryFunc a b c =
    guardAgainstNull a (nameof a)
    guardAgainstNull b (nameof b)
    guardAgainstNull c (nameof c)
    // Do your function's work here
Or, if you have a more complicated data structure that you have to inspect for internal nulls, then treat it like a validation problem in HTML forms. Your validation functions will either throw an exception, or else they will return valid data structures. So the rest of your library can ignore nulls completely, and be written in a nice, simple, idiomatic-F# way. And your validation functions can handle the interface between your domain functions and the untrusted "outside world", just as you would with user input in an HTML form.
Update: See also the advice given near the bottom of https://fsharpforfunandprofit.com/posts/the-option-type/ (in the "F# and null" section), where Scott Wlaschin writes, "As a general rule, nulls are never created in "pure" F#, but only by interacting with the .NET libraries or other external systems. [...] In these cases, it is good practice to immediately check for nulls and convert them into an option type!" Your library code, which expects to get data from other .NET libraries, would be in a similar situation. If you want to allow nulls, you'd convert them to the None value of an Option type. If you want to disallow them and throw ArgumentNullExceptions when you get passed a null, you'd also do that at the boundaries of your library.

Based on @rmunn's advice I ended up creating a simple null2option function:
let null2option arg = if obj.ReferenceEquals(arg, null) then None else Some arg
It solved most of my cases on its own. If I expect a null argument to come from the calling code, I simply use this idiom:
match null2option arg with | None -> nullArg "arg" "Message" | _ -> ()

Related

Reuse NUnit's IConstraint in validation

NUnit has an IConstraint interface (documentation here and code here). It seems to me that reusing this type in my core project for validation purposes makes sense.
Are there unforeseen side effects I have not yet recognized? Would you reuse the IConstraint type in your core project? Why/why not?
This is more of an opinion-based question. That aside, there are two issues that come to my mind.
Firstly, you can write something like Assert.That(foo, Is.EqualTo(bar)), which internally invokes an EqualConstraint. To make your custom constraint usable like this, you have to "overload" Is so you can write Assert.That(foo, Is.AsGoodAs(bar)) (where AsGoodAs is your custom constraint invocation). See NUnit's Custom Constraints documentation for details. With this you will have two classes named Is (yours and NUnit's), and you will also call the default static methods like EqualTo via a derived type; ReSharper will warn you about this.
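A minimal sketch of that pattern, following the documented recipe (AsGoodAsConstraint and its comparison logic are made up for illustration):

    using NUnit.Framework.Constraints;

    // A made-up custom constraint built on NUnit 3's Constraint base class.
    public class AsGoodAsConstraint : Constraint
    {
        private readonly object expected;

        public AsGoodAsConstraint(object expected)
        {
            this.expected = expected;
            Description = $"as good as {expected}";
        }

        public override ConstraintResult ApplyTo<TActual>(TActual actual)
        {
            bool success = Equals(actual, expected); // your "as good as" logic goes here
            return new ConstraintResult(this, actual, success);
        }
    }

    // The "overload" of Is: a second class named Is deriving from NUnit's.
    public class Is : NUnit.Framework.Is
    {
        public static AsGoodAsConstraint AsGoodAs(object expected)
            => new AsGoodAsConstraint(expected);
    }

    // Usage: Assert.That(foo, Is.AsGoodAs(bar));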
Secondly, writing intelligent assertion failure texts (like expected "this", but was "that") can be a bit tricky to figure out. You will certainly spend some time on this until you get what you want. Of course this depends on your personal feelings about nice texts.

Possible to force a developer to handle specific exceptions?

Essentially, I'd like a special form of an Interface for Exceptions that requires anyone who uses my object to wrap it with specific catch implementations.
Example
I have an object that sends data to another host. I expect that the realistic implementations will require a way to handle the following exceptions:
HostNotFoundException
InvalidUsernameException
AccountExpiredException
DataAlreadyExistsException
Similar to how an Interface or an Abstract class is used to force the creation of methods and properties in derived classes, is there any way I can force a consumer to implement exception handling the way I expect?
On a similar note, I'd also like to force methods (created via Interface or Abstract) to be able to generate certain exceptions. Sure they may be NotImplemented, but I want to tell that developer (who doesn't read documentation) that they should be considered.
Goal
The benefit of this exception checking is to enable more robust error handling, accomplished by both the consumer using the object and the object's creator.
Solution?
The only approach I can think of is T4 templates, but that isn't as complete of a solution as I would like. I'd love to see this implemented in the language itself.
You can't force a programmer to do anything except jump through hoops. For example, let's say you have some method called Frob that does something, and can throw FrobinatorException. You expect programmers to write:
try
{
    var result = Frob(foo);
}
catch (FrobinatorException)
{
    // handle exception here
}
But you find that they don't. So force them to by defining Frob like this:
public FrobResult Frob(FrobyThing foo, Action FrobinatorExceptionHandler);
And then programmers have to write something like:
var result = Frob(
    foo,
    () => { /* handle FrobinatorException here */ });
Programmers will grumble about having to do that and they'll end up writing this:
var StupidExceptionHandler = new Action(() => {});
var result = Frob(foo, StupidExceptionHandler);
And now you're worse off than you were because the exceptions are being swallowed, which hides bugs. It's better if the programmer just ignores the exception handling altogether. At least that way you know when an error occurs.
There's simply no way to force good exception handling. At least, not in C# as it currently exists. You can make it more convenient to handle exceptions, but doing so often makes it easier to hide exceptions by swallowing them.
If I'm reading your question correctly, it sounds like you're kind of looking for checked exceptions. There's an interesting article from much earlier in the development of C# that discusses this, actually.
From a design perspective, I don't really see how you could "force" the consumer of an interface to handle your exceptions. After all, how would you know it's being handled? Does the method which calls your interface need to wrap that call in a try/catch directly? Or would it be sufficient for the method which calls that method to do so? Or for a global exception handler for the application to do so? It should really be up to the consumer of the interface to determine how/when/where to handle exceptions.
The best approach you can take is to document the potential exceptions in the intellisense comments on the interface. But this brings up an interesting problem which you also mention (if I'm reading you correctly). The problem here is that the documentation is on the interface, not on the implementation. What if one or more implementations throw different exceptions than those which are expected by the interface?
In general, I think the balance to be reached here is still to document potential exceptions on the interface. The four examples you give sound like safe assumptions for an interface to make about its implementations. But that depends on what the interface method is accepting as arguments.
For example, if the whole concept of a "host" or a "username" is entirely encapsulated within the implementation (such as hitting a web service from within some kind of service interface, which could just as easily hit a database or some other source of record in other/later implementations), then exceptions about those pieces of data wouldn't make sense at the interface level. It would be better in that case to create an exception type like "DomainException" or "ImplementationException" or "DataRetrievalException" and just put the internal implementation details inside the exception.
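For instance, a sketch of such a wrapper type (reusing the DataRetrievalException name from above; the catch site is only indicative):

    using System;

    // Interface-level exception that hides implementation-specific failure types.
    public class DataRetrievalException : Exception
    {
        public DataRetrievalException(string message, Exception inner)
            : base(message, inner) { }
    }

    // Inside a web-service-backed implementation, something like:
    //
    //     try { return client.FetchRecord(id); }
    //     catch (HostNotFoundException ex)
    //     {
    //         throw new DataRetrievalException("The record source is unreachable.", ex);
    //     }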
To get back to the main point, however... From the perspective of your interfaces and your objects, you shouldn't be concerned with how exceptions are handled by consumers. All you should do is internally handle any exceptions that make sense to internally handle and throw exceptions that make sense to throw. Anything beyond that creates coupling between the consumer and the component.
Although I partly sympathize with your goal of better error handling, I feel that if you forced consumers of your code to handle exceptions correctly, your colleagues would murder you within 20 minutes of your checking it in.
Due to C#'s lack of checked exceptions, you're reduced to documenting your code so consumers know which exceptions to expect and under what conditions to expect them.
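For example, reusing the Frob names from the answer above, the documentation lives in <exception> tags (a sketch, not a complete class):

    /// <summary>Frobs the given input and returns the result.</summary>
    /// <param name="foo">The thing to frob. Must not be null.</param>
    /// <exception cref="FrobinatorException">
    /// Thrown when the frobinator rejects <paramref name="foo"/>.
    /// </exception>
    /// <exception cref="ArgumentNullException">
    /// Thrown when <paramref name="foo"/> is null.
    /// </exception>
    public FrobResult Frob(FrobyThing foo)
    {
        if (foo == null) throw new ArgumentNullException(nameof(foo));
        // ... actual work that may throw FrobinatorException ...
        return default;
    }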
On a side note there is a great plugin for ReSharper called Exceptional that will identify places in your code where you have either not handled a possible exception, or not documented it so callers may do so instead.

How is validating a parameter "high up" on the callstack done?

I've been reading the Framework Design Guidelines book, a book on designing frameworks in .NET, with excerpts from the framework designers on the decisions they made regarding each section (e.g. parameter design, exception handling, etc.).
One of the tips, under parameter design, is to validate parameters as "high up on the callstack" as possible. The reasoning is that a check high up on the callstack runs once per operation, whereas low-level code may execute many times, so the performance penalty of validating is smaller when it is done high up.
Does this mean that when I pass parameters into a method or constructor, I should validate them before doing anything else, or should I do so just before using the parameters (so there could be 100 lines of code between the parameter in the definition and the usage of the parameter)?
Thanks
Prefer to validate in the public API of an assembly. That means the public methods of the public classes.
Prefer to validate in the public methods of your classes. So if your class requires a non-null reference to another object to work correctly, you could enforce this by requiring it as a constructor parameter and throwing an exception when a null is supplied. From that point forward, none of the member methods need to test whether the reference is null.
The idea is that no user can break your class (or assembly) by feeding it invalid data. Of course the code won't work either way, but if you fail in a controlled way, it's clearer to the calling code what is wrong, and you won't have unpleasant side effects like resource leaks (or worse).
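A small sketch of that idea (ReportGenerator and IDataSource are hypothetical):

    using System;

    public interface IDataSource { string Read(); }

    public class ReportGenerator
    {
        private readonly IDataSource source; // guaranteed non-null after construction

        public ReportGenerator(IDataSource source)
        {
            // Validate once, at the public boundary.
            this.source = source ?? throw new ArgumentNullException(nameof(source));
        }

        public string Generate()
        {
            // No null check needed: the constructor already enforced the invariant.
            return source.Read();
        }
    }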
Failing fast is generally a good practice. All arguments passed to a method should be validated as soon as possible, without any unnecessary calculations being performed before, because that eases debugging and allows for easier recovery from the faulty situation.
In respect to input validation I consider performance a minor concern.
I haven't read the specific guidelines you mention, but I expect they're talking about the case where method A calls method B, which calls method C and a parameter value gets passed through all three calls. It's better to validate that parameter at the start of method A than somewhere in the middle of method C because if it's invalid, then you get to skip all of the stuff that happens in A and B and the start of C. This is especially true if B or C are called inside loops because then the low-level validation would occur many times instead of just once at the start of A.
Of course you have to balance that with how complicated the validation of the parameter is. It may just be way easier to understand if you validate it in the same place you use it.
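In code, the A/B/C scenario looks roughly like this (hypothetical methods):

    using System;

    public static class Pipeline
    {
        public static void A(string value)
        {
            // Validate once at the top...
            if (value == null) throw new ArgumentNullException(nameof(value));
            for (int i = 0; i < 1000; i++)
                B(value); // ...so B and C can trust it, even when called in a loop.
        }

        private static void B(string value) => C(value);

        private static void C(string value)
        {
            // No check needed here; A has already guaranteed non-null.
            Console.WriteLine(value.Length);
        }
    }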
Validate them as early as you can in your method!
What I believe this to mean is that you should validate data that could be invalid as soon as you receive it. Once it has been validated then no more checks are needed. If you wait until the bottom of the call stack then you may have to validate many times because your call tree may have many branches.
I would whole-heartedly agree with this advice, but not on the grounds of performance. By validating at the point of entry you are in a much better position to give a meaningful error message to the client who supplied the data. And by reducing the amount of validation that you do, you will end up with much clearer code.

Pros/cons of different methods for testing preconditions?

Off the top of my head, I can think of 4 ways to check for null arguments:
1. Debug.Assert(context != null);
2. Contract.Assert(context != null);
3. Contract.Requires(context != null);
4. if (context == null) throw new ArgumentNullException("context");
I've always used the last method, but I just saw a code snippet that used Contract.Requires, which I'm unfamiliar with. What are the advantages/disadvantages of each method? Are there other ways?
In VS2010 w/ Resharper,
Contract.Assert warns me that the expression is always true (how it knows, I'm not quite sure... can't HttpContext be null?),
Contract.Requires gets faded out and it tells me the compiler won't invoke the method (I assume because of the former reason, it will never be null), and
if I change the condition in the last method to context != null, all the code following gets faded out and it tells me the code is heuristically unreachable.
So, it seems the last 3 methods have some kind of intelligence built into the VS static checker, and Debug.Assert is just dumb.
My guess is that there is a contract applied to the interface IHttpHandler.ProcessRequest which requires that context != null. Interface contracts are inherited by their implementers, so you don't need to repeat the Requires. In fact, you are not allowed to add additional Requires statements, as you are limited to the requirements associated with the interface contract.
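For reference, such an interface contract is declared with a contract class pair; a sketch with made-up names:

    using System.Diagnostics.Contracts;

    [ContractClass(typeof(HandlerContract))]
    public interface IHandler
    {
        void ProcessRequest(object context);
    }

    [ContractClassFor(typeof(IHandler))]
    internal abstract class HandlerContract : IHandler
    {
        public void ProcessRequest(object context)
        {
            // Inherited by every implementer of IHandler.
            Contract.Requires(context != null);
        }
    }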
I think it's important to make a distinction between specifying a contractual obligation vs. simply performing a null check. You can implement a null check and throw an exception at runtime as a way to inform the developer that they are using your API incorrectly. A Contract expression, on the other hand, is really a form of metadata, which can be interpreted by the contract rewriter (to introduce the runtime exceptions that were previously implemented manually), but also by the static analyzer, which can use it to reason about the static correctness of your application.
That said, if you're working in an environment where you're actively using Code Contracts and static analysis, then it's definitely preferable to put the assertions in Contract form, to take advantage of the static analysis. Even if you're not using the static analysis, you can still leave the door open for later benefits by using contracts. The main thing to watch out for is whether you've configured your projects to perform the rewriting, as otherwise the contracts will not result in runtime exceptions as you might expect.
To elaborate on what the commenters have said, the difference between Assert, Assume and Requires is:
A Contract.Assert expression is transformed into an assertion by the contract rewriter and the static analyzer attempts to prove the expression based on its existing evidence. If it can't be proven, you'll get a static analysis warning.
A Contract.Assume expression is ignored by the contract rewriter (as far as I know), but is interpreted by the static analyzer as a new piece of evidence it can take into account in its static analysis. Contract.Assume is used to 'fill the gaps' in the static analysis, either where it lacks the sophistication to make the necessary inferences or when inter-operating with code that has not been decorated with Contracts, so that you can Assume, for instance, that a particular function call returns a non-null result.
Contract.Requires are conditions that must always be true when your method is called. They can be constraints on parameters to the method (which are the most typical) and they may also be constraints on publicly visible states of the object (For instance, you might only allow the method to be called if Initialized is True.) These kinds of constraints push the users of your class to either check Initialized when using the object (and presumably handle the error appropriately if it's not) or create their own constraints and/or class invariants to clarify that Initialization has, indeed, happened.
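A compact sketch showing all three side by side (Widget and SomeLegacyCall are made up):

    using System.Diagnostics.Contracts;

    public class Widget
    {
        public bool Initialized { get; private set; }

        public int Measure(string input)
        {
            Contract.Requires(input != null); // caller's obligation
            Contract.Requires(Initialized);   // constraint on publicly visible state

            object external = SomeLegacyCall();
            Contract.Assume(external != null); // evidence the analyzer can't infer itself

            int length = input.Length;
            Contract.Assert(length >= 0);      // the analyzer attempts to prove this
            return length;
        }

        private object SomeLegacyCall() => new object(); // hypothetical un-contracted call
    }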
The first method is appropriate for testing for a null condition that should never exist. That is, use it during development to ensure it doesn't unexpectedly get set to null. Since it doesn't do any error handling, this is not appropriate for handling null conditions in your released product.
I would say the 2nd and 3rd versions are similar in that they don't handle the issue in any way.
In general, if there's a possibility that the variable could actually be null in the final product, the last version is the one to use. You could do special handling there, or just raise an exception as you've done.

C# / Object oriented design - maintaining valid object state

When designing a class, should the logic to maintain valid state be incorporated in the class or outside of it? That is, should properties throw exceptions on invalid states (e.g. a value out of range), or should this validation be performed when the instance of the class is being constructed/modified?
It belongs in the class. Nothing but the class itself (and any helpers it delegates to) should know, or be concerned with, the rules that determine valid or invalid state.
Yes, properties should check for valid/invalid values when being set. That's what they're for.
It should be impossible to put a class into an invalid state, regardless of the code outside it. That should make it clear.
On the other hand, the code outside it is still responsible for using the class correctly, so frequently it will make sense to check twice. The class's methods may throw an ArgumentException if passed something they don't like, and the calling code should ensure that this doesn't happen by having the right logic in place to validate input, etc.
There are also more complex cases where there are different "levels" of client involved in a system. An example is an OS - an application runs in "User mode" and ought to be incapable of putting the OS into an invalid state. But a driver runs in "Kernel mode" and is perfectly capable of corrupting the OS state, because it is part of a team that is responsible for implementing the services used by the applications.
This kind of dual-level arrangement can occur in object models; there can be "exterior" clients of the model that only see valid states, and "interior" clients (plug-ins, extensions, add-ons) which have to be able to see what would otherwise be regarded as "invalid" states, because they have a role to play in implementing state transitions. The definition of invalid/valid is different depending on the role being played by the client.
Generally this belongs in the class itself, but to some extent it also has to depend on your definition of 'valid'. For example, consider the System.IO.FileInfo class. Is it valid if it refers to a file that no longer exists? How would it know?
I would agree with @Joel. Typically this would be found in the class. However, I would not have the property accessors implement the validation logic. Rather, I'd recommend a validation method for the persistence layer to call when the object is being persisted. This allows you to localize the validation logic in a single place and make different choices for valid/invalid based on the persistence operation being performed. If, for example, you are planning to delete an object from the database, do you care that some of its properties are invalid? Probably not -- as long as the ID and row versions are the same as those in the database, you just go ahead and delete it. Likewise, you may have different rules for inserts and updates, e.g., some fields may be null on insert but required on update.
It depends.
If the validation is simple, and can be checked using only information contained in the class, then most of the time it's worth while to add the state checks to the class.
There are times, however, when it's not really possible or desirable to do so.
A great example is a compiler. Checking the state of abstract syntax trees (ASTs) to make sure a program is valid is usually not done by either property setters or constructors. Instead, the validation is usually done by a tree visitor, or a series of mutually recursive methods in some sort of "semantic analysis class". In either case, however, properties are validated long after their values are set.
Also, with objects used to hold UI state, it's usually a bad idea (from a usability perspective) to throw exceptions when invalid values are set. This is particularly true for apps that use WPF data binding. In that case you want to display some sort of modeless feedback to the customer rather than throwing an exception.
The class really should maintain valid values. It shouldn't matter if these are entered through the constructor or through properties. Both should reject invalid values. If both a constructor parameter and a property require the same validation, you can either use a common private method to validate the value for both the property and the constructor or you can do the validation in the property and use the property inside your constructor when setting the local variables. I would recommend using a common validation method, personally.
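A sketch of the shared-validation-method approach (Temperature is a made-up example):

    using System;

    public class Temperature
    {
        private double celsius;

        public Temperature(double celsius)
        {
            ValidateCelsius(celsius);
            this.celsius = celsius;
        }

        public double Celsius
        {
            get => celsius;
            set { ValidateCelsius(value); celsius = value; }
        }

        // Shared by the constructor and the property setter.
        private static void ValidateCelsius(double value)
        {
            if (value < -273.15)
                throw new ArgumentOutOfRangeException(nameof(value), "Below absolute zero.");
        }
    }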
Your class should throw an exception if it receives invalid values. All in all, good design can help reduce the chances of this happening.
The valid state of a class is best expressed with the concept of a class invariant: a boolean expression which must hold true for objects of that class to be valid.
The Design by Contract approach suggests that you, as a developer of class C, should guarantee that the class invariant holds:
After construction
After a call to a public method
This implies that, since the object is encapsulated (no one can modify it except via calls to public methods), the invariant will also be satisfied on entering any public method, or on entering the destructor (in languages with destructors), if any.
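In C#, Code Contracts can express this directly; a sketch with a made-up class, including the kind of preconditions and postconditions discussed next:

    using System.Diagnostics.Contracts;

    public class BankAccount
    {
        private decimal balance;

        [ContractInvariantMethod]
        private void ObjectInvariant()
        {
            // Must hold after construction and after every public method.
            Contract.Invariant(balance >= 0);
        }

        public void Deposit(decimal amount)
        {
            Contract.Requires(amount > 0); // precondition: the caller's obligation
            Contract.Ensures(balance == Contract.OldValue(balance) + amount); // postcondition
            balance += amount;
        }
    }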
Each public method states preconditions that the caller must satisfy, and postconditions that will be satisfied by the class at the end of every public method. Violating a precondition effectively violates the contract of the class, so that it can still be correct but it doesn't have to behave in any particular way, nor maintain the invariant, if it is called with a precondition violation. A class that fulfills its contract in the absence of caller violations can be said to be correct.
A concept different from correct but complementary to it (and certainly belonging to the multiple factors of software quality) is that of robust. In our context, a robust class will detect when one of its methods is called without fulfilling the method preconditions. In such cases, an assertion violation exception will typically be thrown, so that the caller knows that he blew it.
So, answering your question, both the class and its caller have obligations as part of the class contract. A robust class will detect contract violations and spit. A correct caller will not violate the contract.
Classes belonging to the public interface of a code library should be compiled as robust, while inner classes could be tested as robust but then run in the released product as just correct, without the precondition checks on. This depends on a number of things and was discussed elsewhere.
