How to generate good code coverage of floating-point logic? - c#

I am hand-crafting new code. I'd like to make sure I leave no stone unturned.
Is there anything specific I can do beyond specifying Code Contracts to guide Pex so it produces good coverage in numerically-intensive code?
Try searching http://research.microsoft.com/en-us/projects/pex/pexconcepts.pdf for keyword 'float' for some background information.
Arithmetic constraints over floating point numbers are approximated by a translation to rational numbers, and heuristic search techniques are used outside of Z3 to find approximate solutions for floating point constraints.
...and also...
Symbolic Reasoning. Pex uses an automatic constraint solver to determine which values are relevant for the test and the code-under-test. However, the abilities of the constraint solver are, and always will be, limited. In particular, Z3 cannot reason precisely about floating point arithmetic.
Alternatively, do you know a tool under .NET that is better suited for the task of finding numerical anomalies under .NET? I am aware of http://fscheck.codeplex.com/ but it does not perform symbolic reasoning.

Is coverage really what you want? A test suite that merely executes every branch in a piece of code doesn't show that the code is correct - correctness is usually about corner cases, and you as the developer are best placed to know what those corner cases are. It also sounds like the tool works by saying "here's an interesting input combination", whereas what you most likely want is to specify the behaviour you expect from the system - if you have written the code wrong in the first place, the inputs the tool finds interesting may be completely irrelevant to the correct code.
Maybe this isn't the answer you're looking for, but I'd say the best way to do this is by hand! Write down a spec before you start coding and turn it into a load of test cases as you work out the API for your class/subsystem.
As you begin filling out the API and writing the code you're likely to pick up extra bits and pieces that you need to do, and find out what the difficult bits are - if you have conditionals etc. that you feel someone refactoring your code might get wrong, then write a test case that covers them. I sometimes intentionally write the code wrong at these points, get a failing test in, and then correct it, just to make sure that the test is checking the correct path through the code.
Then try and think of any odd values you may not have covered - negative inputs, nulls etc. Often these will be cases that are invalid and that you don't want to cater for or have to think about - in these cases I will generally write some tests saying they should throw exceptions - that basically stops people misusing the code in cases you haven't thought about properly/with invalid data.
You mentioned that you are working with numerically intensive code - it may be worth testing a level above the number crunching, so you can test the behaviours you want from the system rather than just the arithmetic. Assuming the code isn't purely numerical, this helps you establish some real conditions of execution and also ensures that whatever the number-crunching part does interacts with the rest of the program in the way you need it to. If it's something algorithmic, you'd probably be better off writing an acceptance-test language to help characterise the desired outputs in different situations - this gives a clear picture of what you are trying to achieve, and it lets you push large amounts of (real) data through the system, which is probably better than computer-generated input.
The other benefit is that if you realise the algorithm needs a drastic rewrite to meet some new requirement, all you have to do is add the new test case and then rewrite/refactor. If your tests only looked at the details of the algorithm and assumed its effects on the outside world, you would have a substantial headache trying to figure out how the algorithm currently influences behaviour, which parts were correct and which were not, and then migrating a load of unit tests onto a new API/algorithm.
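Picking up the point about odd values: for floating-point code the odd values are usually NaN, the infinities, signed zero and values near machine epsilon rather than nulls. A minimal sketch of what hand-picked corner-case tests might look like, assuming NUnit; MathUtils.Normalize is a hypothetical method standing in for your numeric code under test, and whether non-finite input should throw is a decision you make in your spec:
using System;
using NUnit.Framework;

[TestFixture]
public class FloatingPointCornerCaseTests
{
    // Non-finite inputs: decide up front whether these are rejected or handled,
    // then pin that decision down in a test.
    [TestCase(double.NaN)]
    [TestCase(double.PositiveInfinity)]
    [TestCase(double.NegativeInfinity)]
    public void Normalize_RejectsNonFiniteInput(double input)
    {
        Assert.Throws<ArgumentOutOfRangeException>(() => MathUtils.Normalize(input));
    }

    [Test]
    public void Normalize_SurvivesValuesNearMachineEpsilon()
    {
        double tiny = double.Epsilon;   // smallest positive subnormal double
        double result = MathUtils.Normalize(tiny);
        Assert.IsFalse(double.IsNaN(result));
    }
}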


C# how to find the state someone was born in with their social security number

I am currently working on an assignment in which I am to validate various formats using regular expressions (phone numbers, birth date, email address, Social Security). One of the features our teacher has suggested would be to have a method that returns the state an individual was born using their Social Security Number.
xxx-xx-xxxx
The first 3 digits correspond to a state/area as outlined here:
http://socialsecuritynumerology.com/prefixes.php
If I've isolated the first 3 digits as an integer already, is there any way I could quickly match the number with its corresponding state/area?
Currently I'm only using if-else statements, but it's getting pretty tedious.
Example:
if (x > 0 && x < 3)
    return "New Hampshire";
else if (x <= 7)
    return "Maine";
...
You have a few options here:
50 if statements, one for each state, as you are doing.
A switch with 999 conditions, matching each option with a state. It probably looks cleaner and you can generate it with a script and interject the return statements wherever necessary. Maybe worse than option 1 in terms of tediousness.
Import the file as text, parse it into a Dictionary and do a simple lookup (see the sketch below). The mapping is most likely not going to change in the near future, so the robustness argument is rather moot, but it is probably "simpler" in terms of amount of effort*. And it's another chance to practice using regex to parse the lines in the file you linked.
*Where "effort" is measured purely in the amount of tedious gruntwork prone to annoying human error and hand fatigue. Energy consumed within the brain due to engineering and implementing a solution where the computer does the ugly stuff for you is not included. :)
It's hard to tell your level of skill and what your course has taught you so far, which is why these kinds of questions are difficult to answer - and for the most part it's also why you will get a negative response from some people: they assume you already have the answer in your course materials and are just being lazy.
I'm going to assume that you are at a basic level and that you already know how to solve the problem the brute force way (your if/else construct) and that you are genuinely interested in how to make your code better and not simply asking for a solution you can copy/paste.
Now, while your if/else idea will work, that is a procedural way of thinking. You are working with an object oriented language, so I would suggest to you to think about how you could use the principles of OO to make this work better. A good starting point would be to make a collection of state objects that contain all the parameters you need. You could then loop through your state collection and use their properties to find the matching one. You could create the state collection by reading from a file or database or even just hard coding it for the purposes of your assignment.
Your intuition that a long chain of if and else if statements might be unwieldy is sound. Not only is it tedious, but the search to find the correct interval is inefficient. However, something needs to be in charge of this tedium. A simple solution would be to use a Dictionary to store key/value pairs that you build up once and reuse throughout the application. This has the downside of requiring more space than necessary, as every individual mapping becomes an element of the data structure. You could instead follow the advice from this question and use a data structure better suited to ranged lookups.
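For instance, a minimal sketch of that ranged-lookup idea: store only the upper bound of each prefix range in sorted order and binary-search it. The two ranges shown are just the ones from your example, not the full table:
using System;

static class SsnRanges
{
    // Upper bound of each prefix range, sorted ascending, with the matching state
    // at the same index. Fill in the rest of the table from the linked page.
    static readonly int[] UpperBounds = { 3, 7 /* ... */ };
    static readonly string[] States = { "New Hampshire", "Maine" /* ... */ };

    public static string StateFor(int prefix)
    {
        int i = Array.BinarySearch(UpperBounds, prefix);
        if (i < 0)
            i = ~i;                       // index of the first upper bound >= prefix
        return i < States.Length ? States[i] : "Unknown";
    }
}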
It's difficult to tell from your question as it's written what your level of expertise is, and going into any real detail here would essentially mean completing your assignment for you. It should be noted though, that there's nothing inherently wrong with your solution, and depending on where you are in your education it may be what's expected.

How does TDD work when there can be millions of test cases for a piece of production functionality?

In TDD, you pick a test case and implement it, then you write just enough production code so that the test passes, refactor the code, pick a new test case, and the cycle continues.
The problem I have with this process is that TDD says you should only write enough code to pass the test you just wrote. What I mean is: if a method can have, say, a million possible test cases, what do you do?! Obviously you're not going to write a million test cases!
Let me explain what I mean more clearly by the below example:
internal static List<ulong> GetPrimeFactors(ulong number)
{
    var result = new List<ulong>();

    // 0 and 1 have no prime factors (the guard also avoids an endless loop for 0).
    if (number < 2)
        return result;

    // Pull out all factors of 2 first so the divisor can step by 2 afterwards.
    while (number % 2 == 0)
    {
        result.Add(2);
        number = number / 2;
    }

    // Trial division by odd divisors.
    ulong divisor = 3;
    while (divisor <= number)
    {
        if (number % divisor == 0)
        {
            result.Add(divisor);
            number = number / divisor;
        }
        else
        {
            divisor += 2;
        }
    }
    return result;
}
The above code returns all the prime factors of a given number. ulong is 64 bits wide, which means it can accept values between 0 and 18,446,744,073,709,551,615!
So, how does TDD work when there can be millions of test cases for a piece of production functionality?!
I mean, how many test cases are enough for me to say I used TDD to arrive at this production code?
The concept in TDD that says you should only write enough code to pass your test seems wrong to me, as the example above shows.
When is enough enough?
My own thought is to pick only some test cases, e.g. the upper bound, the lower bound and a few more - say 5 test cases - but that's not TDD, is it?
Many thanks for your thoughts on TDD for this example.
It's an interesting question, related to the idea of falsifiability in epistemology. With unit tests, you are not really trying to prove that the system works; you are constructing experiments which, if they fail, will prove that the system doesn't work in a way consistent with your expectations/beliefs. If your tests pass, you do not know that your system works, because you may have forgotten some edge case which is untested; what you know is that as of now, you have no reason to believe that your system is faulty.
The classical example in history of sciences is the question "are all swans white?". No matter how many different white swans you find, you can't say that the hypothesis "all swans are white" is correct. On the other hand, bring me one black swan, and I know the hypothesis is not correct.
A good TDD unit test is along these lines; if it passes, it won't tell you that everything is right, but if it fails, it tells you where your hypothesis is incorrect. In that frame, testing for every number isn't that valuable: one case should be sufficient, because if it doesn't work for that case, you know something is wrong.
Where the question is interesting though is that unlike for swans, where you can't really enumerate over every swan in the world, and all their future children and their parents, you could enumerate every single integer, which is a finite set, and verify every possible situation. Also, a program is in lots of ways closer to mathematics than to physics, and in some cases you can also truly verify whether a statement is true - but that type of verification is, in my opinion, not what TDD is going after. TDD is going after good experiments which aim at capturing possible failure cases, not at proving that something is true.
You're forgetting step three:
Red
Green
Refactor
Writing your test cases gets you to red.
Writing enough code to make those test cases pass gets you to green.
Generalizing your code to work for more than just the test cases you wrote, while still not breaking any of them, is the refactoring.
You appear to be treating TDD as if it is black-box testing. It's not. If it were black-box testing, only a complete (millions of test cases) set of tests would satisfy you, because any given case might be untested, and therefore the demons in the black box would be able to get away with a cheat.
But it isn't demons in the black box in your code. It's you, in a white box. You know whether you're cheating or not. The practice of Fake It Til You Make It is closely associated with TDD, and sometimes confused with it. Yes, you write fake implementations to satisfy early test cases - but you know you're faking it. And you also know when you have stopped faking it. You know when you have a real implementation, and you've gotten there by progressive iteration and test-driving.
So your question is really misplaced. For TDD, you need to write enough test cases to drive your solution to completion and correctness; you don't need test cases for every conceivable set of inputs.
From my POV the refactoring step doesn't seem to have taken place on this piece of code...
In my book TDD does NOT mean to write testcases for every possible permutation of every possible input/output parameter...
BUT to write all testcases needed to ensure that it does what it is specified to do, i.e. for such a method all boundary cases plus a test which randomly picks a number from a list of numbers with known correct results. If need be you can always extend this list to make the test more thorough...
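For the factoring method in the question, such a suite might look like the following minimal sketch - assuming NUnit and a hypothetical PrimeMath class holding GetPrimeFactors - with the known-results cases easy to extend later:
using NUnit.Framework;

[TestFixture]
public class GetPrimeFactorsTests
{
    [Test]
    public void One_HasNoPrimeFactors()
    {
        Assert.IsEmpty(PrimeMath.GetPrimeFactors(1));
    }

    [TestCase(2ul, new[] { 2ul })]                              // smallest prime
    [TestCase(360ul, new[] { 2ul, 2ul, 2ul, 3ul, 3ul, 5ul })]   // repeated factors
    [TestCase(104729ul, new[] { 104729ul })]                    // a largish prime
    public void KnownValues_FactorCorrectly(ulong number, ulong[] expected)
    {
        CollectionAssert.AreEqual(expected, PrimeMath.GetPrimeFactors(number));
    }
}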
TDD only works in the real world if you don't throw common sense out the window...
As to
Only write enough code to pass your test
in TDD this refers to "non-cheating programmers"... IF you have one or more "cheating programmers" who, for example, just hardcode the "correct result" of the testcases into the method, I suspect you have a much bigger problem on your hands than TDD...
BTW "Testcase construction" is something you get better at the more you practice it - there is no book/guide that can tell you which testcases are best for any given situation upfront... experience pays off big when it comes to constructing testcases...
TDD does permit you to use common sense if you want to. There's no point defining your version of TDD to be stupid, just so that you can say "we're not doing TDD, we're doing something less stupid".
You can write a single test case that calls the function under test more than once, passing in different arguments. This prevents "write code to factorize 1", "write code to factorize 2", "write code to factorize 3" being separate development tasks.
How many distinct values to test really depends on how much time you have to run the tests. You want to test anything that might be a corner case (so in the case of factorization at least 0, 1, 2, 3, LONG_MAX+1 since it has the most factors, whichever value has the most distinct factors, a Carmichael number, and a few perfect squares with various numbers of prime factors), plus as big a range of values as you can in the hope of covering something that you didn't realise was a corner case, but is. This may well mean writing the test, then writing the function, then adjusting the size of the range based on its observed performance.
You're also allowed to read the function specification, and implement the function as if more values are tested than actually will be. This doesn't really contradict "only implement what's tested", it just acknowledges that there isn't enough time before ship date to run all 2^64 possible inputs, and so the actual test is a representative sample of the "logical" test that you'd run if you had time. You can still code to what you want to test, rather than what you actually have time to test.
You could even test randomly-selected inputs (common as part of "fuzzing" by security analysts), if you find that your programmers (i.e. yourself) are determined to be perverse, and keep writing code that only solves the inputs tested, and no others. Obviously there are issues around the repeatability of random tests, so use a PRNG and log the seed. You see a similar thing with competition programming, online judge programs, and the like, to prevent cheating. The programmer doesn't know exactly which inputs will be tested, so must attempt to write code that solves all possible inputs. Since you can't keep secrets from yourself, random input does the same job. In real life programmers using TDD don't cheat on purpose, but might cheat accidentally because the same person writes the test and the code. Funnily enough, the tests then miss the same difficult corner cases that the code does.
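A minimal sketch of that seeded-random idea, again assuming NUnit and the hypothetical PrimeMath.GetPrimeFactors from above; checking that the factors multiply back to the input is only a partial oracle (it doesn't verify that each factor is prime), but it is cheap and catches a lot:
using System;
using NUnit.Framework;

[TestFixture]
public class GetPrimeFactorsRandomizedTests
{
    [Test]
    public void Factors_MultiplyBackToTheInput()
    {
        const int seed = 12345;                       // fixed/logged so failures are reproducible
        var rng = new Random(seed);

        for (int i = 0; i < 1000; i++)
        {
            ulong n = (ulong)rng.Next(2, 1000000);    // modest range; widen it if the implementation is fast enough

            ulong product = 1;
            foreach (ulong factor in PrimeMath.GetPrimeFactors(n))
                product *= factor;

            Assert.AreEqual(n, product, "Factors of {0} do not multiply back to it (seed {1})", n, seed);
        }
    }
}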
The problem is even more obvious with a function that takes a string input, there are far more than 2^64 possible test values. Choosing the best ones, that is to say ones the programmer is most likely to get wrong, is at best an inexact science.
You can also let the tester cheat, moving beyond TDD. First write the test, then write the code to pass the test, then go back and write more white-box tests that (a) include values that look like they might be edge cases in the implementation actually written; and (b) include enough values to get 100% code coverage, for whatever code coverage metric you have the time and willpower to work to. The TDD part of the process is still useful, it helps write the code, but then you iterate. If any of these new tests fail you could call it "adding new requirements", in which case I suppose what you're doing is still pure TDD. But it's solely a question of what you call it; really you aren't adding new requirements, you're testing the original requirements more thoroughly than was possible before the code was written.
When you write a test you should take meaningful cases, not every case. Meaningful cases include general cases, corner cases...
You just CAN'T write a test for every single case (otherwise you could just put all the values in a table and look the answers up, so you'd be 100% sure your program works :P).
Hope that helps.
That's sort of the first question you've got for any testing. TDD is of no importance here.
Yes, there are lots and lots of cases; moreover, there are combinations and combinations of cases if you start building the system. It will indeed lead you to a combinatorial explosion.
What to do about that is a good question. Usually, you choose equivalence classes for which your algorithm will probably work the same—and test one value for each class.
The next step would be to test boundary conditions (remember, the two most frequent errors in CS are off-by-one errors).
Next... Well, for all practical purposes, it's OK to stop here. Still, take a look at these lecture notes: http://www.scs.stanford.edu/11au-cs240h/notes/testing.html
PS. By the way, using TDD "by the book" for math problems is not a very good idea. Kent Beck in his TDD book demonstrates this by arriving at just about the worst possible implementation of a function calculating Fibonacci numbers. If you know a closed form, or have an article describing a proven algorithm, just do sanity checks as described above, and do not do TDD with the whole refactoring cycle - it'll save you time.
PPS. Actually, there's a nice article which (surprise!) mentions both the Fibonacci problem and the problem you have with TDD.
There aren't millions of test cases. Only a few. You might like to try PEX, which will let you find out the different real test cases in your algorithm. Of course, you need only test those.
I've never done any TDD, but what you're asking about isn't about TDD: It is about how to write a good test suite.
I like to design models (on paper or in my head) of all the states each piece of code can be in. I consider each line as if it were a part of a state machine. For each of those lines, I determine all the transitions that can be made (execute the next line, branch or not branch, throw an exception, overflow any of the sub calculations in the expression, etc).
From there I've got a basic matrix for my test cases. Then I determine each boundary condition for each of those state transitions, and any interesting mid-points between each of those boundaries. Then I've got the variations for my test cases.
From here I try to come up with interesting and different combinations of flow or logic - "This if statement, plus that one - with multiple items in the list", etc.
Since code is a flow, you often can't interrupt it in the middle unless it makes sense to insert a mock for an unrelated class. In those cases I've often reduced my matrix quite a bit, because there are conditions you just can't hit, or because the variation becomes less interesting by being masked out by another piece of logic.
After that, I'm about tired for the day, and go home :) And I probably have about 10-20 test cases per well-factored and reasonably short method, or 50-100 per algorithm/class. Not 10,000,000.
I probably come up with too many uninteresting test cases, but at least I usually overtest rather than undertest. I mitigate this by trying to factor my test cases well to avoid code duplication.
Key pieces here:
Model your algorithms/objects/code, at least in your head. Your code is more of a tree than a script
Exhaustively determine all the state transitions within that model (each operation that can be executed independently, and each part of each expression that gets evaluated)
Utilize boundary testing so you don't have to come up with infinite variations
Mock when you can
And no, you don't have to write up FSM drawings, unless you have fun doing that sort of thing. I don't :)
What you usually do is test against boundary conditions, and a few random conditions.
For example: ulong.MinValue, ulong.MaxValue, and some values in between. Why are you even making a GetPrimeFactors? Do you want to calculate prime factors in general, or are you making it to do something specific? Test for the reason you're making it.
What you could also do is assert on result.Count instead of on all the individual items. If you know how many items you're supposed to get, plus some specific cases, you can still refactor your code, and if those cases and the total count stay the same, assume the function still works.
If you really want to test that much, you could also look into white-box testing. For example, Pex and Moles are pretty good.
TDD is not a way to check that a function/program works correctly on every permutation of inputs possible. My take on it is that the probability that I write a particular test-case is proportional to how uncertain I am that my code is correct in that case.
This basically means I write tests in two scenarios: 1) some code I've written is complicated or complex and/or has too many assumptions and 2) a bug happens in production.
Once you understand what causes a bug it is generally very easy to codify in a test case. In the long term, doing this produces a robust test suite.

How do I show that I'm getting the smallest number of conflicts with Git by using K&R instead of B&D as the coding standard?

Git is smart and will "follow" changes in history to make merging easier and auto merge more for me. Conflicts are delineated by the lines that have not changed around the change. With K&R you get no ambiguous lines that have only "{" in them like you would in B&D. How would I test the limit of the context sensitivity that Git has in terms of the lines that surround a change?
I want to avoid resolving conflicts that I may not need to. But I need some way to test how many potential conflicts I will save by switching to K&R and the additional context it brings.
So far, Git is too smart to get fooled by trivial examples.
Any help is appreciated. Thanks in advance.
So this is the closest thing I have found so far as evidence for this.
http://git.661346.n2.nabble.com/Bram-Cohen-speaks-up-about-patience-diff-td2277041.html
The e-mail discusses the purpose of the "patience" algorithm in Git. In it, Bram explains how superfluous matching lines can create nasty conflicts that can be tough to resolve, but only in the case of fairly complex merges involving large patches. In other words, simple contrived examples will fail to show this behavior.
While he also mentions other things affecting the results, it makes some sense to infer that placing an opening brace on its own line increases the number of superfluous matching lines, possibly resulting in a greater probability of these conflicts.
I'd say this isn't iron-clad, but it does lend some credence to this theory.
So 'unique lines' is a simple cross-language proxy for 'important lines'.
That quote stands out to me in this discussion, since we are basically matching on what we feel are important lines that are supposed to give us context, however a brace by itself is not important and provides no context.
Pick the standard that is easier for you/your team to read, over anything else. No matter how your source control system behaves today, it will behave better tomorrow, but your code will stay the way you've checked it in for a long time.
We have several large projects (200+ code files) which contain a massive mixture of both K&R AND B&D. I can safely say that after almost 2 years on Git, with a development staff that is refactoring-crazy and a rebasing habit of "several times a day", I have never had a conflict due to coding-standard issues. So pick whichever, or both, or neither. It won't matter.

Does Unit Testing make Debug.Assert() unnecessary?

It's been a while since I read McConnell's "Code Complete". Now I have read it again in Hunt & Thomas' "The Pragmatic Programmer": use assertions! Note: not unit-testing assertions, I mean Debug.Assert().
According to the SO questions "When should I use Debug.Assert()?" and "When to use assertion over exceptions in domain classes", assertions are useful during development because "impossible" situations can be found quite fast, and they seem to be commonly used. As far as I understand assertions, in C# they are often used for checking input variables for "impossible" values.
To keep unit tests as concise and isolated as possible, I feed classes and methods nulls and "impossible" dummy input (like an empty string).
Such tests explicitly document that they don't rely on some specific input. Note: I am practicing what Meszaros' "xUnit Test Patterns" describes as a Minimal Fixture.
And that's the point: if I had assertions guarding these inputs, they would blow up my unit tests.
I like the idea of assertive programming, but on the other hand I don't need to force it. Currently I can't think of any use for Debug.Assert(). Maybe there is something I'm missing? Do you have any suggestions where it could be really useful? Maybe I just overestimate the usefulness of assertions? Or maybe my way of testing needs to be revisited?
Edit: "Best practice for debug Asserts during Unit testing" is very similar, but it does not answer the question that bothers me: should I care about Debug.Assert() in C# if I test as described above? If yes, in which situations is it really useful? From my current point of view, such unit tests would make Debug.Assert() unnecessary.
Another point: if you really think this is a duplicate question, just post a comment.
In theory, you're right - exhaustive testing makes asserts redundant. In theory. In practice, they're still useful for debugging your tests, and for catching future developers who try to use interfaces in ways that don't match their intended semantics.
In short, they just serve a different purpose from unit tests. They're there to catch mistakes that by their very nature aren't going to be made when writing unit tests.
I would recommend keeping them, since they offer another level of protection from programmer mistakes.
They're also a local error protection mechanism, whereas unit tests are external to the code being tested. It's far easier to "inadvertently" disable unit tests when under pressure than it is to disable all the assertions and runtime checks in a piece of code.
I generally see asserts being used for sanity checks on internal state rather than things like argument checking.
IMO the inputs to a solid API should be guarded by checks that remain in place regardless of the build type. For example, if a public method expects an argument that is a number between 5 and 500, it should be guarded with an ArgumentOutOfRangeException. Fail fast and fail often using exceptions as far as I'm concerned, particularly when an argument is pushed somewhere and used much later.
However, in places where internal, transient state is being sanity-checked (e.g. checking that some intermediate state is within reasonable bounds during a loop), it seems Debug.Assert is more at home. What else are you meant to do when your algorithm has gone wrong despite having valid arguments passed to it? Throw an EpicFailException? :) I think this is where Debug.Assert is still useful.
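To make the split concrete, a minimal sketch; the class, the method and the 5-500 bounds are made up for illustration:
using System;
using System.Diagnostics;

public class Throttle
{
    TimeSpan _interval;

    // Public surface: the argument check stays in place in release builds.
    public void SetRate(int requestsPerSecond)
    {
        if (requestsPerSecond < 5 || requestsPerSecond > 500)
            throw new ArgumentOutOfRangeException("requestsPerSecond");

        _interval = TimeSpan.FromSeconds(1.0 / requestsPerSecond);

        // Internal sanity check: can only fail through a programming error above,
        // so a debug-only assert is enough and it disappears from release builds.
        Debug.Assert(_interval > TimeSpan.Zero, "interval must be positive");
    }
}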
I'm still undecided on the best balance between the two. I've stopped using Debug.Assert so much in C# since I started unit testing, but there's still a place for it IMO. I certainly wouldn't use it to check correctness of API use, but sanity checking in hard-to-get-to places? Sure.
The only downside is that they can pop up and halt NUnit, but you can write an NUnit plugin to detect them and fail any test that triggers an assert.
I'm using both unit testing and assertions, for different purposes.
Unit testing is automated experimentation showing that your program (and its parts) function as specified. Just like in mathematics, experimentation is not a proof as long as you cannot try every possible combination of input. Nothing demonstrates that better than the fact that even with unit testing, your code will have bugs. Not many, but it will have them.
Assertions are for catching dodgy situations at runtime that normally shouldn't happen. Maybe you've heard about preconditions, postconditions, loop invariants and things like that. In the real world, we don't often go through the formal process of actually proving (by formal logic) that a piece of code yields the specified postconditions when the preconditions are satisfied. That would be a real mathematical proof, but we often don't have time to do it for each method. However, by checking whether the preconditions and postconditions are satisfied, we can spot problems at a much earlier stage.
If you're doing exhaustive unit testing which covers all the odd edge cases you might encounter, then I don't think you are going to find assertions very useful. Most people who don't unit test place assertions to establish similar constraints as you catch with your tests.
I think the idea behind unit testing in this case is to move those assertions over to the test cases: instead of having Debug.Assert(...) in the code, you ensure that your code under test handles the input without throwing up (or that it throws up correctly).

Programmatically checking code complexity, possibly via c#?

I'm interested in data mining projects, and have always wanted to create a classification algorithm that would determine which specific check-ins need code-reviews, and which may not.
I've developed many heuristics for my algorithm, although I've yet to figure out the killer...
How can I programmatically check the computational complexity of a chunk of code?
Furthermore, and even more interesting: how could I use not just the code but also the diff that the source control repository provides to obtain better data?
I.e.: if I add complexity to the code I'm checking in, but it reduces complexity in the code that is left, shouldn't that be considered "good" code?
Interested in your thoughts on this.
UPDATE
Apparently I wasn't clear. I want this
double codeValue = CodeChecker.CheckCode(someCodeFile);
I want a number to come out based on how good the code is. I'll start with numbers like the ones VS2008 gives when you calculate complexity, but I would like to move on to further heuristics.
Anyone have any ideas? It would be much appreciated!
Have you taken a look at NDepend? This tool can be used to calculate code complexity and supports a query language by which you can get an incredible amount of data on your application.
The NDepend web site contains a list of definitions of various metrics. Deciding which are most important in your environment is largely up to you.
NDepend also has a command line version that can be integrated into your build process.
Also, Microsoft's Code Analysis (ships with VS Team Suite) includes metrics which check the cyclomatic complexity of code, and raises a build error (or warning) if this number is over a certain threshold.
I don't know offhand, but it may be worth checking whether this number is configurable to your requirements. You could then modify your build process to run code analysis any time something is checked in.
See the Semantic Designs C# Metrics Tool for a tool that computes a variety of standard metrics values, both over complete files and over all reasonable subdivisions (methods, classes, ...).
The output is an XML document, but extracting the value(s) you want from that should be trivial with an XML reader.
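If you would rather prototype the CheckCode shape from your update than drive an external tool, a rough decision-point counter is not much code with the Roslyn APIs (a minimal sketch assuming the Microsoft.CodeAnalysis.CSharp package, which postdates the VS2008-era metrics mentioned above; it only approximates cyclomatic complexity):
using System.IO;
using System.Linq;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

static class CodeChecker
{
    // Roughly: cyclomatic complexity = 1 + number of decision points in the file.
    public static double CheckCode(string someCodeFile)
    {
        var root = CSharpSyntaxTree.ParseText(File.ReadAllText(someCodeFile)).GetRoot();

        int decisionPoints = root.DescendantNodes().Count(node =>
            node is IfStatementSyntax ||
            node is WhileStatementSyntax ||
            node is ForStatementSyntax ||
            node is ForEachStatementSyntax ||
            node is CaseSwitchLabelSyntax ||
            node is ConditionalExpressionSyntax ||
            node is CatchClauseSyntax);

        return 1 + decisionPoints;
    }
}
From there you can add weights for your other heuristics, and run it over only the files touched by a diff to score a check-in.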
