How can I have unit tests for undefined behaviour? - c#

I have a set of related classes which take a variety of inputs and produce expected outputs. These are ideal low-level candidates for unit testing, and all works well for valid inputs.
The difficulty comes with invalid inputs, particularly when attempting to remove items from collections which were not added, for which we currently have undefined behaviour: some of the classes will simply produce a rubbish result (GIGO1 wins) but some will throw an exception (perhaps a KeyNotFoundException).
Given that there is no valid, consistent behaviour for these invalid inputs (it means that something has been mis-configured elsewhere and no sensible results can be produced) and that our API explicitly states that the caller must only remove something which they added, how can this be reflected in our unit tests?
It clearly cannot be a "test", as there is no defined behaviour (simply recording our current behaviour will be fragile if our implementation of any of them should change in the future), but I want to have a way to preclude the possibility of some zealous team member adding one in the future without being aware of the potential problems.
The unit test method for one of them currently looks something like this:
[TestCase("1", "2", "1", ExpectedResult = "|2|")]
[TestCase("1", "2", "2", ExpectedResult = "|1|")]
public object InsertTwoDeleteOne(string insertedValue1,
                                 string insertedValue2,
                                 string deletedValue1)
{
    // Apply tests here
}
The two ways I can see to deal with this are either to add explicit code in the test method along the lines of:
if (deletedValue1 != insertedValue1 &&
    deletedValue1 != insertedValue2)
{
    Assert.Fail("Invalid inputs");
}
but that is "out of line" and less easy to see among the other test cases or else by adding a TestCase which is purely for documentation saying "don't run this", like this:
[TestCase("1", "2", "3", Ignore = true, Reason = "Invalid inputs")]
but that yields a "Skipped test" result which is untidy.
Is there anything better?
[Edit] The API in question is a public interface, and we have a number of implementations of it in our product: it is these implementations which I am in the process of updating the tests for. However, installations are free to write their own implementations as plugins (by creating their own assembly, implementing their own objects, and instantiating them through configuration), so our framework will ensure that the data is valid before calling them.
In our current model, it is unlikely that installations would re-use the objects and call them from their own code.
The reason why we have chosen not to concern ourselves with validating the data in each object is twofold:
It will, in our default product configuration, always receive data which has already been validated by the caller.
Performance: we are storing a lot of data here - currently limited to 100,000 rows of data (one of our objects per field in the row, so perhaps between 20 and 50 objects in total), but our customers are already asking about increasing that limit to 1,000,000. So where we already store a dictionary of the data in our calling code so that we can validate it there, we would have to store a duplicate of it within these objects. That's between 20MB and 50MB if they are simply doubles on current limits, or 200MB - 500MB on the projected future needs.
That's a massive overhead for something we don't currently need to do!
1 Warning: Some people might prefer not to google for that in an office!

It might depend on the project's criticality and quality standards, but my gut feeling is that you should normally not let "undefined behaviour" creep into your system, especially if "rubbish results" are produced.
You say you fear that a zealous team member might add an inconsistent test to the suite. You might be assuming that team members will always add tests before writing production code and thus come across your "parapet" test, but what if they don't? Wouldn't the primary safety measure be to prevent them from using the API the wrong way in the first place (i.e. handling edge cases properly)?
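For example, one cheap way to handle the edge case without paying a validation cost in production might be a debug-only guard along these lines (a rough sketch - the collection field and method names are assumptions about your implementation, not your actual API):

public void Remove(string value)
{
    // Fails loudly in debug/test builds, compiles away in release builds.
    System.Diagnostics.Debug.Assert(
        items.Contains(value),
        "Remove() called with a value that was never added - the caller is mis-configured.");

    items.Remove(value);
}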

Related

Applying business rules to an object in a SOLID way

I've been driving myself crazy for hours trying to figure this one out, and I'm not moving anywhere with it.
I'm creating a checkout till for a cashier, specifically I need to sum the items, then apply the promotional discounts. I'm trying to do it without violating any design principles (impossible, I know, I can let things slide when it makes sense).
Promotional discounts could be anything, from a black friday deal flat discount, to 'Orders over £100 save 10%' to '3 for 2 on these items', or 'Buy at least two cans of coke, and the price for them all drops to £0.50!'
I cannot see how to fit the promotional deals in. Each may require a different set of data from different locations. For instance one of the big problems being the '3 for 2' deal. Getting access to the items in the Checkout has been a plague on my mind.
So far, my best approach has been to use the Decorator pattern: wrap the checkout up in a bunch of promotional deals when the price is calculated. As each decorator holds an instance of the checkout, we'll have access to the original checkout with the list of items.
In the future, the only thing I'd need to do is write the new rule, add it to the factory, and update any DB data, which is perfect - the minimal change.
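A simplified sketch of that idea (the Item type, promotion name and wiring are placeholders for illustration, not my actual code):

using System.Collections.Generic;
using System.Linq;

public class Item
{
    public string Sku { get; set; }
    public decimal Price { get; set; }
}

public interface ICheckout
{
    IReadOnlyList<Item> Items { get; }
    decimal CalculateTotal();
}

public class ThreeForTwoPromotion : ICheckout
{
    private readonly ICheckout inner;
    private readonly string sku;

    public ThreeForTwoPromotion(ICheckout inner, string sku)
    {
        this.inner = inner;
        this.sku = sku;
    }

    // The decorator exposes the wrapped checkout's items unchanged...
    public IReadOnlyList<Item> Items
    {
        get { return inner.Items; }
    }

    // ...and adjusts the total on the way out.
    public decimal CalculateTotal()
    {
        var matching = inner.Items.Where(i => i.Sku == sku).OrderBy(i => i.Price).ToList();
        var freeItems = matching.Count / 3;                         // every third matching item is free
        var discount = matching.Take(freeItems).Sum(i => i.Price);  // the cheapest ones go free
        return inner.CalculateTotal() - discount;
    }
}

// Wrapping happens at calculation time, e.g. from the factory mentioned above:
// ICheckout checkout = new ThreeForTwoPromotion(new FlatBlackFridayDiscount(basicCheckout), "COKE");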
This kind of works. I can justify it in my head that it's still a checkout, and therefore all the rules being able to access the checkout makes sense, and it gives me a nice way to chain the discounts. But there is a problem, in that I'm sure I shouldn't be using it the way I'm suggesting. For instance, if one of the promotions wears off, you shouldn't really 'unwrap' it, and realistically, while it's nice to be able to add promotions dynamically to extend an instance, it's not necessary.
I've read through more design patterns but can't seem to find anything that applies. I saw the following article:
https://levelup.gitconnected.com/rules-design-pattern-in-c-6c62f0e20ee0
This is basically what I want to do, but the implementation feels clumsy to me.
public bool IsValid(FileInfo fileInfo)
{
    var rules = new List<IFileValidationRule> { new FileExtensionRule(new string[] { "txt", "html" }) };

    if (AdminConfig.CheckFileSize)
    {
        rules.Add(new FileSizeRule("txt", 5 * 1024 * 1024));
        rules.Add(new FileSizeRule("html", 10 * 1024 * 1024));
    }

    if (User.Status != UserStatus.Premium)
    {
        rules.Add(new MaxFileLengthRule(50));
    }

    bool isValid = rules.All(rule => rule.IsValid(fileInfo));
    return isValid;
}
Specifically this part seems to violate a few key principles: the Open-Closed Principle, Dependency Inversion, etc.
The other big problem I can't wrap my head around is as below:
Imagine, for the above example, that a new rule needs to be added that reads the file data and checks whether there are any bad characters in there - it doesn't matter what exactly.
Implementing this is easy: you inject the file or the file data into the 'FileValidator' class, you instantiate your rule and pass the file data into it, then you run the rule and return the success. Great! But is this OK?
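For concreteness, the shape I'm describing is roughly this (the rule name and its check are made up purely for illustration):

public class BadCharacterRule : IFileValidationRule
{
    private readonly string fileData;

    public BadCharacterRule(string fileData)
    {
        this.fileData = fileData;   // handed over by the FileValidator, which read the file
    }

    public bool IsValid(FileInfo fileInfo)
    {
        // The decision is made here, outside the file/data object, based on its exposed state.
        return !fileData.Contains("\0");
    }
}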
Reading this says no: http://wiki.c2.com/?TellDontAsk
"Tell, don't ask" - It is okay to use accessors to get the state of an object, as long as you don't use the result to make decisions outside the object.
That would be exactly what the code is doing! I guess the alternative to this is to update the 'FileData' object to essentially take a list of bad characters, check the file data, return false to the rule, which then fails the whole process, but this would start to throw a bunch more rules out the door. You're now breaking the Open-Closed principle, Single Responsibility principle, and it feels like you're building a rod for your own back, adding these custom methods for singular rules, bloating your object. (The link does discuss how you can pass a function into the method, which is pretty nice, but still not perfect, at the end of the day, aren't you just indirectly handing control to the caller?)
The above alone wouldn't be enough to stop me, but I'm struggling to justify making a private set of items public so one rule out of the bunch can make use of that data.
I'm in OOP recursion, tumbling towards a stack overflow. Can anyone pull me out and help me consolidate my thoughts? None of the design patterns seem to work, but I'm sure this is a basic problem solved many times in the past. What am I missing?

StoryQ BDD, Given or When without a body

I would like to do a very simple test for the Constructor of my class,
[Test]
public void InitLensShadingPluginTest()
{
    _lensShadingStory.WithScenario("Init Lens Shading plug-in")
        .Given(InitLensShadingPlugin)
        .When(Nothing)
        .Then(PluginIsCreated)
        .Execute();
}
This can be in either Given() or When()... I think it should be in When(), but it doesn't really matter.
private void InitLensShadingPlugin()
{
    _plugin = new LSCPlugin(_imagesDatabaseProvider, n_iExternalToolImageViewerControl);
}
Since the constructor is the one being tested, I do not have anything to do inside the When() statement, and in Then() I assert on the plugin's creation.
private void PluginIsCreated()
{
    Assert.NotNull(_plugin);
}
My question is about StoryQ: since I do not want to do anything inside When(), I tried to use When(() => {}), but this is not supported by StoryQ. This means I need to implement something like
private void Nothing()
{
}
and call When(Nothing).
Is there a better practice?
It's strange that StoryQ doesn't support missing steps; your scenario is actually pretty typical of other examples I've used for starting up applications, games, etc.:
Given the chess program is running
Then the pieces should be in the starting positions
for instance. So your desire to use a condition followed by an outcome is perfectly valid.
Looking at StoryQ's API, it doesn't look as if it supports these empty steps. You could always make your own method and call both the Given and When steps inside it, returning the operation from the When:
.GivenIStartedWith(InitLensShadingPlugin)
.Then(PluginIsCreated)
If that seems too clunky, I'd do as you suggested and move the Given to a When, initializing the Given with an empty method with a more meaningful name instead:
Given(NothingIsInitializedYet)
.When(InitLensShadingPlugin)
.Then(PluginIsCreated)
Either of these will solve your problem.
However, if all you're testing is a class, rather than an entire application, using StoryQ is probably overkill. The natural-language BDD frameworks like StoryQ, Cucumber, JBehave etc. are intended to help business and development teams collaborate in their exploration of requirements. They incur significant setup and maintenance overhead, so if the audience of your class-level scenarios / examples is technical, there may be an easier way.
For class-level examples of behaviour I would just go with a plain unit testing tool like NUnit or MSpec. I like using NUnit and putting my "Given / When / Then" in comments:
// Given I initialized the lens shading plugin on startup
_plugin = new LSCPlugin(_imagesDatabaseProvider, n_iExternalToolImageViewerControl);
// Then the plugin should have been created
Assert.NotNull(_plugin);
Steps at a class level aren't reused in the same way they are in full-system scenarios, because classes have much smaller, more encapsulated responsibilities; and developers benefit from reading the code rather than having it hidden away in the step definitions.
Your Given/When/Then comments here might still echo scenarios at a higher level, if the class is directly driving the functionality that the user sees.
Normally for full-system scenarios we would derive the steps from conversations with the "3 amigos":
a business representative (PO, SME, someone who has a problem to be solved)
a tester (who spots scenarios we might otherwise miss)
the dev (who's going to solve the problem).
There might be a pair of devs. UI designers can get involved if they want to. Matt Wynne says it's "3 amigos, where 3 is any number between 3 and 7". The best time to have the conversations is right before the devs pick up the work to begin coding it.
However, if you're working on your own, whether it's a toy or a real application, you might benefit just from having imaginary conversations. I use a pixie called Thistle for mine.

Unit Testing - Algorithm or Sample based?

Say I'm trying to test a simple Set class
public class IntSet : IEnumerable<int>
{
    public void Add(int i) { ... }
    // IEnumerable implementation...
}
And suppose I'm trying to test that no duplicate values can exist in the set. My first option is to insert some sample data into the set, and test for duplicates using my knowledge of the data I used, for example:
//OPTION 1
void InsertDuplicateValues_OnlyOneInstancePerValueShouldBeInTheSet()
{
    var set = new IntSet();

    // 3 will be added 3 times
    var values = new List<int> { 1, 2, 3, 3, 3, 4, 5 };
    foreach (int i in values)
        set.Add(i);

    // I know 3 is the only candidate to appear multiple times
    int counter = 0;
    foreach (int i in set)
        if (i == 3) counter++;

    Assert.AreEqual(1, counter);
}
My second option is to test for my condition generically:
//OPTION 2
void InsertDuplicateValues_OnlyOneInstancePerValueShouldBeInTheSet()
{
    var set = new IntSet();

    // The following could even be a list of random numbers with a duplicate
    var values = new List<int> { 1, 2, 3, 3, 3, 4, 5 };
    foreach (int i in values)
        set.Add(i);

    // I am not using my prior knowledge of the sample data;
    // the following line would work for any data
    CollectionAssert.AreEquivalent(new HashSet<int>(values), set);
}
Of course, in this example I conveniently have a set implementation to check against, as well as code to compare collections (CollectionAssert). But what if I didn't have either? This code would definitely be more complicated than that of the previous option! And this is the situation when you are testing your real-life custom business logic.
Granted, testing for expected conditions generically covers more cases - but it becomes very similar to implementing the logic again (which is both tedious and useless - you can't use the same code to check itself!). Basically I'm asking whether my tests should look like "insert 1, 2, 3 then check something about 3" or "insert 1, 2, 3 and check for something in general"
EDIT - To help me understand, please state in your answer if you prefer OPTION 1 or OPTION 2 (or neither, or that it depends on the case, etc). Just to clarify, it's pretty clear that in this case (IntSet), option 2 is better in all aspects. However, my question pertains to the cases where you don't have an alternative implementation to check against, so the code in option 2 would be definitely more complicated than option 1.
I usually prefer to test use cases one by one - this works nicely in the TDD manner: "code a little, test a little". Of course, after a while my test cases start to contain duplicated code, so I refactor. The actual method of verifying the results does not matter to me as long as it is working for sure and doesn't get in the way of the testing itself. So if there is a "reference implementation" to test against, all the better.
An important thing, however, is that the tests should be reproducible, and it should be clear what each test method is actually testing. To me, inserting random values into a collection is neither - of course, if there is a huge amount of data or use cases involved, every tool or approach is welcome that helps to handle the situation better without lulling me into a false sense of security.
If you have an alternative implementation, then definitely use it.
In some situations, you can avoid writing an alternative implementation but still test the functionality in general. For instance, in your example, you could first generate a collection of unique values and then randomly duplicate elements before passing it to your implementation. You can test that the output is equivalent to the unique starting collection, without having to reimplement the set logic.
I try to take this approach whenever it's feasible.
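A rough sketch of what that could look like here, assuming NUnit, LINQ and the IntSet from the question (the seed and sizes are arbitrary):

[Test]
public void InsertDuplicateValues_SetIsEquivalentToUniqueInput()
{
    var random = new Random(42);                    // fixed seed keeps the test reproducible
    var unique = Enumerable.Range(1, 100).ToList();

    // Duplicate a random subset of the values before feeding them to the set.
    var input = unique.Concat(unique.Where(i => random.Next(2) == 0)).ToList();

    var set = new IntSet();
    foreach (int i in input)
        set.Add(i);

    // Exactly one output is acceptable: the set of unique input values.
    CollectionAssert.AreEquivalent(unique, set.ToList());
}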
Update: Essentially, I'm advocating the OP's "Option #2". With this approach, there's precisely one output vector that will allow the test to pass. With "Option #1", there's an infinite number of acceptable output vectors (it's testing an invariant, but it's not testing for any relationship to the input data).
Basically I'm asking whether my tests should look like "insert 1, 2, 3 then check something about 3" or "insert 1, 2, 3 and check for something in general"
I am not a TDD purist, but it seems people are saying that the test should break only if the condition you are trying to test is broken - i.e. if you implement a test which checks a general condition, then your test will break in more than a few cases, so it is not optimal.
If I am testing for not being able to add duplicates, then I would test only that. So in this case, I would go with the first.
(Update)
OK, now you have updated the code and I need to update my answer.
Which one would I choose? It depends on the implementation of CollectionAssert.AreEquivalent(new HashSet<int>(values), set); - for example, IEnumerable<T> preserves order while HashSet<T> does not, so even this could break the test when it should not. For me, the first is still superior.
According to xUnit Test Patterns, it's usually more favorable to test the state of the system under test. If you want to test its behavior and the way in which the algorithm operates, you can use Mock Object Testing.
That being said, both of your tests are known as Data Driven Tests. What is usually acceptable is to use as much knowledge as the API provides. Remember, those tests also serve as documentation for your software. Therefore it's critical to keep them as simple as possible - whatever that means for your specific case.
The first step should be demonstrating the correctness of the Add method using an activity diagram or flowchart. The next step would be to formally prove the correctness of the Add method, if you have the time. Then test with specific sets of data where you expect duplications and non-duplications (i.e. some sets of data have duplicates and some don't, and you check whether the data structure performs correctly in both cases); it is important to have cases that should succeed (no duplicates) and to check that the values were added correctly to the set, rather than only testing failure cases (cases in which duplicates should be found). Finally, check generically. Even though it is now somewhat deprecated, I would suggest constructing data to fully exercise every execution path in the method being tested. Any time you make a code change, begin all over again, applying regression testing.
I would opt for the algorithmic approach, but preferably without relying on an alternate implementation such as HashSet. You're actually testing for more than just "no duplicates" with the HashSet match. For example, the test will fail if any items didn't make it into the resulting set, and you presumably have other tests that check for this.
A cleaner verification of the "no duplicates" expectation might be something like the following:
Assert.AreEqual(values.Distinct().Count(), set.Count());

Is it a good idea to create a custom type for the primary key of each data table?

We have a lot of code that passes about “Ids” of data rows; these are mostly ints or guids. I could make this code safer by creating a different struct for the id of each database table. Then the type checker will help to find cases when the wrong ID is passed.
E.g. the Person table has a column called PersonId and we have code like:
DeletePerson(int personId)
DeleteCar(int carId)
Would it be better to have:
struct PersonId
{
    private int id;
    // GetHashCode etc....
}
DeletePerson(PersonId personId)
DeleteCar(CarId carId)
Has anyone got real-life experience of doing this?
Is it worth the overhead?
Or more pain than it is worth?
(It would also make it easier to change the data type of the primary key in the database; that is why I thought of this idea in the first place.)
Please don't say "use an ORM" or some other big change to the system design, as I know an ORM would be a better option, but that is not within my power at present. However, I can make minor changes like the above to the module I am working on at present.
Update:
Note this is not a web application and the Ids are kept in memory and passed about with WCF, so there is no conversion to/from strings at the edge. There is no reason why the WCF interface can't use the PersonId type etc. The PersonId type etc. could even be used in the WPF/WinForms UI code.
The only inherently "untyped" bit of the system is the database.
This seems to be down to the cost/benefit of spending time writing code that the compiler can check better, or spending the time writing more unit tests. I am coming down more on the side of spending the time on testing, as I would like to see at least some unit tests in the code base.
It's hard to see how it could be worth it: I recommend doing it only as a last resort and only if people are actually mixing identifiers during development or reporting difficulty keeping them straight.
In web applications in particular it won't even offer the safety you're hoping for: typically you'll be converting strings into integers anyway. There are just too many cases where you'll find yourself writing silly code like this:
int personId;
if (Int32.TryParse(Request["personId"], out personId)) {
    this.person = this.PersonRepository.Get(new PersonId(personId));
}
Dealing with complex state in memory certainly improves the case for strongly-typed IDs, but I think Arthur's idea is even better: to avoid confusion, demand an entity instance instead of an identifier. In some situations, performance and memory considerations could make that impractical, but even those should be rare enough that code review would be just as effective without the negative side-effects (quite the reverse!).
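For illustration, the difference is simply this (hypothetical signatures, not code from the question):

// Raw ids are interchangeable, so nothing stops a car id being passed here:
void DeletePerson(int personId) { /* ... */ }

// Demanding the entity itself removes the ambiguity at the call site:
void DeletePerson(Person person) { /* ... */ }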
I've worked on a system that did this, and it didn't really provide any value. We didn't have ambiguities like the ones you're describing, and in terms of future-proofing, it made it slightly harder to implement new features without any payoff. (No ID's data type changed in two years, at any rate - it could certainly happen at some point, but as far as I know, the return on investment for that is currently negative.)
I wouldn't make a special id for this. This is mostly a testing issue. You can test the code and make sure it does what it is supposed to.
You can create a standard way of doing things in your system that helps future maintenance (similar to what you mention) by passing in the whole object to be manipulated. Of course, if you named your parameter (int personID) and had documentation, then any non-malicious programmer should be able to use the code effectively when calling that method. Passing a whole object will do the type matching that you are looking for, and that should be enough of a standardized way.
I just see having a special structure made to guard against this as adding more work for little benefit. Even if you did this, someone could come along and find a convenient way to make a 'helper' method and bypass whatever structure you put in place anyway so it really isn't a guarantee.
You can just opt for GUIDs, like you suggested yourself. Then, you won't have to worry about passing a person ID of "42" to DeleteCar() and accidentally delete the car with ID of 42. GUIDs are unique; if you pass a person GUID to DeleteCar in your code because of a programming typo, that GUID will not be a PK of any car in the database.
You could create a simple Id class which can help differentiate in code between the two:
public class Id<T>
{
    private int RawValue { get; set; }

    public Id(int value)
    {
        this.RawValue = value;
    }

    public static explicit operator int(Id<T> id) { return id.RawValue; }

    // this cast is optional and can be excluded for further strictness
    public static implicit operator Id<T>(int value) { return new Id<T>(value); }
}
Used like so:
class SomeClass
{
    public Id<Person> PersonId { get; set; }
    public Id<Car> CarId { get; set; }
}
Assuming your values would only be retrieved from the database, unless you explicitly cast the value to an integer, it is not possible to use the two in each other's place.
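As a quick illustration of that point (DeleteCar here is a hypothetical method, not part of the code above):

void DeleteCar(Id<Car> carId) { /* ... */ }

Id<Person> personId = 42;        // allowed: the implicit conversion from int
// DeleteCar(personId);          // does not compile: an Id<Person> is not an Id<Car>
DeleteCar((int)personId);        // only possible by explicitly casting back through int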
I don't see much value in custom checking in this case. You might want to beef up your testing suite to check that two things are happening:
Your data access code always works as you expect (i.e., you aren't loading inconsistent Key information into your classes and getting misuse because of that).
That your "round trip" code is working as expected (i.e., that loading a record, making a change and saving it back isn't somehow corrupting your business logic objects).
Having a data access (and business logic) layer you can trust is crucial to being able to address the bigger-picture problems you will encounter when attempting to implement the actual business requirements. If your data layer is unreliable, you will spend a lot of effort tracking (or worse, working around) problems at that level that surface when you put load on the subsystem.
If instead your data access code is robust in the face of incorrect usage (what your test suite should be proving to you) then you can relax a bit on the higher levels and trust they will throw exceptions (or however you are dealing with it) when abused.
The reason you hear people suggesting an ORM is that many of these issues are dealt with in a reliable way by such tools. If your implementation is far enough along that such a switch would be painful, just keep in mind that your low-level data access layer needs to be as robust as a good ORM if you really want to be able to trust (and thus, to a certain extent, forget about) your data access.
Instead of custom validation, your testing suite could inject code (via dependency injection) that does robust tests of your Keys (hitting the database to verify each change) as the tests run and that injects production code that omits or restricts such tests for performance reasons. Your data layer will throw errors on failed keys (if you have your foreign keys set up correctly there) so you should also be able to handle those exceptions.
My gut says this just isn't worth the hassle. My first question to you would be whether you actually have found bugs where the wrong int was being passed (a Car ID instead of a Person ID in your example). If so, it is probably more of a case of worse overall architecture in that your Domain objects have too much coupling, and are passing too many arguments around in method parameters rather than acting on internal variables.

Is it true I should not do "long running" things in a property accessor?

And if so, why?
And what constitutes "long running"?
Doing magic in a property accessor seems like my prerogative as a class designer. I always thought that is why the designers of C# put those things in there - so I could do what I want.
Of course it's good practice to minimize surprises for users of a class, and so embedding truly long-running things - e.g., a 10-minute Monte Carlo analysis - in a method makes sense.
But suppose a prop accessor requires a db read. I already have the db connection open. Would db access code be "acceptable", within the normal expectations, in a property accessor?
Like you mentioned, it's a surprise for the user of the class. People are used to being able to do things like this with properties (contrived example follows:)
foreach (var item in bunchOfItems)
    foreach (var slot in someCollection)
        slot.Value = item.Value;
This looks very natural, but if item.Value actually is hitting the database every time you access it, it would be a minor disaster, and should be written in a fashion equivalent to this:
foreach (var item in bunchOfItems)
{
    var temp = item.Value;
    foreach (var slot in someCollection)
        slot.Value = temp;
}
Please help steer people using your code away from hidden dangers like this, and put slow things in methods so people know that they're slow.
There are some exceptions, of course. Lazy-loading is fine as long as the lazy load isn't going to take some insanely long amount of time, and sometimes making things properties is really useful for reflection- and data-binding-related reasons, so maybe you'll want to bend this rule. But there's not much sense in violating the convention and violating people's expectations without some specific reason for doing so.
In addition to the good answers already posted, I'll add that the debugger automatically displays the values of properties when you inspect an instance of a class. Do you really want to be debugging your code and have database fetches happening in the debugger every time you inspect your class? Be nice to the future maintainers of your code and don't do that.
Also, this question is extensively discussed in the Framework Design Guidelines; consider picking up a copy.
A db read in a property accessor would be fine - that's actually the whole point of lazy loading. I think the most important thing would be to document it well, so that users of the class understand that there might be a performance hit when accessing that property.
You can do whatever you want, but you should keep the consumers of your API in mind. Accessors and mutators (getters and setters) are expected to be very lightweight. With that expectation, developers consuming your API might make frequent and chatty calls to these properties. If you are consuming external resources in your implementation, there might be an unexpected bottleneck.
For consistency's sake, it's good to stick with convention for public APIs. If your implementations will be exclusively private, then there's probably no harm (other than an inconsistent approach to solving problems privately versus publicly).
It is just a "good practice" not to make property accessors taking long time to execute.
That's because properties looks like fields for the caller and hence caller (a user of your API that is) usually assumes there is nothing more than just a "return smth;"
If you really need some "action" behind the scenes, consider creating a method for that...
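For example (the names below are made up), the difference is only in what the member's shape signals to the caller:

// Callers expect this to be trivially cheap:
public Customer Customer
{
    get { return customer; }
}

// A method makes the cost explicit at the call site:
public Customer LoadCustomerFromDatabase()
{
    return customerRepository.GetById(customerId);   // hypothetical repository call
}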
I don't see what the problem is with that, as long as you provide XML documentation so that the Intellisense notifies the object's consumer of what they're getting themselves into.
I think this is one of those situations where there is no one right answer. My motto is "Saying always is almost always wrong." You should do what makes the most sense in any given situation without regard to broad generalizations.
A database access in a property getter is fine, but try to limit the amount of times the database is hit through caching the value.
There are many times that people use properties in loops without thinking about the performance, so you have to anticipate this use. Programmers don't always store the value of a property when they are going to use it many times.
Cache the value returned from the database in a private variable, if it is feasible for this piece of data. This way the accesses are usually very quick.
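A minimal sketch of that caching idea (LoadUserNameFromDb() is a stand-in for whatever data access the class already has):

private string userName;

public string UserName
{
    get
    {
        if (userName == null)
            userName = LoadUserNameFromDb();   // hits the database at most once
        return userName;
    }
}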
This isn't directly related to your question, but have you considered going with a load once approach in combination with a refresh parameter?
class Example
{
    private bool userNameLoaded = false;
    private string userName = "";

    public string UserName(bool refresh)
    {
        userNameLoaded = !refresh;
        return UserName();
    }

    public string UserName()
    {
        if (!userNameLoaded)
        {
            /*
            userName = SomeDBMethod();
            */
            userNameLoaded = true;
        }
        return userName;
    }
}
