Should you Unit Test simple properties of a class, asserting that a value is set and retrieved? Or is that really just unit testing the language?
Example
public string ConnectionString { get; set; }
Test
public void TestConnectionString()
{
var c = new MyClass();
c.ConnectionString = "value";
Assert.Equal(c.ConnectionString, "value");
}
I guess I don't see the value in that.
I would suggest that you absolutely should.
What is an auto-property today may end up having a backing field put against it tomorrow, and not by you...
The argument that "you're just testing the compiler or the framework" is a bit of a strawman imho; what you're doing when you test an auto-property is, from the perspective of the caller, testing the public "interface" of your class. The caller has no idea if this is an auto property with a framework-generated backing store, or if there is a million lines of complex code in the getter/setter. Therefore the caller is testing the contract implied by the property - that if you put X into the box, you can get X back later on.
Therefore it behooves us to include a test since we are testing the behaviour of our own code and not the behaviour of the compiler.
A test like this takes maybe a minute to write, so it's not exactly burdensome; and you can easily enough create a T4 template that will auto-generate these tests for you with a bit of reflection. I'm actually working on such a tool at the moment to save our team some drudgery
If you're doing pure TDD then it forces you to stop for a moment and consider if having an auto public property is even the best thing to do (hint: it's often not!)
Wouldn't you rather have an up-front regression test so that when the FNG does something like this:
//24-SEP-2013::FNG - put backing field for ConnectionString as we're now doing constructor injection of it
public string ConnectionString
{
{get { return _connectionString; } }
{set {_connectionString="foo"; } }//FNG: I'll change this later on, I'm in a hurry
}
///snip
public MyDBClass(string connectionString)
{
ConnectionString=connectionString;
}
You instantly know that they broke something?
If the above seems contrived for a simple string property I have personally seen a situation where an auto-property was refactored by someone who thought they were being oh so clever and wanted to change it from an instance member to a wrapper around a static class member (representing a database connection as it happens, the resons for the change are not important).
Of course that same very clever person completely forgot to tell anyone else that they needed to call a magic function to initialise this static member.
This caused the application to compile and ship to a customer whereupon it promptly failed. Not a huge deal, but it cost several hours of support's time==money....
That muppet was me, by the way!
EDIT: as per various conversations on this thread, I wanted to point out that a test for a read-write property is ridiculously simple:
[TestMethod]
public void PropertyFoo_StoresCorrectly()
{
var sut = new MyClass();
sut.Foo = "hello";
Assert.AreEqual("hello", sut.Foo, "Oops...");
}
edit: And you can even do it in one line as per Mark Seeman's Autofixture
I would submit that if you find you have such a large number of public properties as to make writing 3 lines like the above a chore for each one, then you should be questioning your design; If you rely on another test to indicate a problem with this property then either
The test is actually testing this property, or
You will spend more time verifying that this other test is failing because the property is incorrect (via debugger, etc) than you would have spent typing in the above code
If some other test allows you to instantly tell that the property is at fault, it's not a unit test!
edit (again!): As pointed out in the comments, and rightly so, things like generated DTO models and the like are probably exceptions to the above because they are just dumb old buckets for shifting data somewhere else, plus since a tool created them, it's generally pointless to test them.
/EDIT
Ultimately "It depends" is probably the real answer, with the caveat that the best "default" disposition to be the "always do it" approach, with exceptions to that taken on an informed, case by case basis.
Generally, no. A unit test should be used to test for the functionality of a unit. You should unit test methods on a class, not individual, automatic properties (unless you are overriding the getter or setter with custom behaviour).
You know that assigning a string value to an automatic string property will work if you get the syntax and setter value correct as that is a part of the language specification. If you do not do this then you will get a runtime error to point out your flaw.
Unit tests should be designed to test for logical errors in code rather than something the compiler would catch anyway.
EDIT: As per my conversation with the author of the accepted answer for this question I would like to add the following.
I can appreciate that TDD purists would say you need to test automatic properties. But, as a business applications developer I need to weigh up, reasonably the amount of time I could spend writing and performing tests for 'trivial' code such as automatic properties compared to how long it would reasonably take to fix an issue that could arise from not testing. In personal experience most bugs that arise from changing trivial code are trivial to fix 99% of the time. For that reason I would say the positives of only unit testing non-language specification functionality outweigh the negatives.
If you work in a fast paced, business environment which uses a TDD approach then part of the workflow for that team should be to only test code that needs testing, basically any custom code. Should someone go into your class and change the behavior of an automatic property, it is their responsibility to set up a unit test for it at that point.
I would have to say no. If that doesn't work, you have bigger problems. I know I don't. Now some will argue that having the code alone would make sure the test failed if the property was removed, for example. But I'd put money on the fact that if the property were removed, the unit test code would get removed in the refactor, so it wouldn't matter.
Are you adhering to strict TDD practices or not?
If yes then you absolutely should write tests on public getters and setters, otherwise how will you know if you've implemented them correctly?
If no, you still probably should write the tests. Though the implementation is trivial today, it is not guaranteed to remain so, and without a test covering the functionality of a simple get/set operation, when a future change to implementation breaks an invariant of "setting property Foo with a value Bar results in the getter for property Foo returning value Bar" the unit tests will continue to pass. The test itself is also trivially implemented, yet guards against future change.
The way I see is that how much unit testing (or testing in general) is down to how confident are you that the code works as designed and what are the chances of it breaking in the future.
If you have a lower confidence of the code breaking (maybe due to the code being out sourced and the cost of checking line by line is high) then perhaps unit testing properties is appropriate.
Once thing you can do is write a helper class that can go over all get/set properties of a class to test that they still behave as designed.
Unless the properties perform any other sort of logic, then no.
Yes, it is like unit testing the language. It would be completely pointless to test simple auto-implemented properties otherwise.
According to the book The Art of Unit Testing With Examples in .NET, a unit test covers not any type of code, it focuses on logical code. So, what is logical code?
Logical code is any piece of code that has some sort of logic in it, small as it may be. It’s logical code if it has one or more of the
following: an IF statement, a loop, switch or case statements,
calculations, or any other type of decision-making code.
Does a simple getter/setter wrap an any logic? The answer is:
Properties (getters/setters in Java) are good examples of code that
usually doesn’t contain any logic, and so doesn’t require testing. But
watch out: once you add any check inside the property, you’ll want to
make sure that logic is being tested.
My answer which is from former test manager viewpoint and currently a development manager ( responsible for software delivery in time and quality) viewpoint. I see people are mentioning pragmatism. Pragmatism is not a good adviser because it may pair up with laziness and/or time pressure. It may led you on the wrong way. If you mention pragmatism you have to be careful to keep your intentions on the track of professionalism and common sense. It requires humility to accept the answers because they might not that you want to hear.
From my viewpoint what is important are the next:
you should find the defect as early as possible. Doing so you have to apply proper testing strategy. If it is testing properties then you have to test properties. If not, then don't do it. Both comes with a price.
your testing should be easy and fast. The bigger part (unit, integration, etc.) of the code tested in build time is the better.
you should do root cause analysis to answer the questions below and make your organization protected from the current type of error. Don't worry, another type of defect will come up and there will be always lessons to be learned.
what the root cause is?
how to avoid it next time?
another aspect is the cost of creating/maintaining tests. Not testing properties because they are boring to maintain and/or you have hundreds of properties is ridiculous. You should create/apply tools which makes the woodcutting job instead of human. In general, you always have to enhance your environment in order to be more efficient.
what other says are not good adviser - doesn't matter whether it was said by Martin Fowler or Seeman - the environment they are I'm pretty sure not the same as you are in. You have to use your knowledge and experience to setup what is good for your project and how to make it better. If you apply things because they were said by people you are respect without even thinking it through you will find yourself in deep trouble. I do not say that you don't need advises and/or other people help or opinion, you have to apply common sense to apply them.
TDD does not answer two important question, however, BDD does give you answers for the questions below. But, if you follow only one, you won't have delivery in time and quality. So doesn't matter whether you are purist TDD guy or not.
what must be tested? ( it says everything must be tested - wrong answer in my opinion )
when testing must be finished?
All in all, there is no good answer. Just other questions you have to answer to get that temporary point where you are able to decide whether it is needed or not.
Related
I'm writing unit tests for the implementation of an API I wrote myself in my company's application. Still new to this whole thing. When looking for answeres on how to unit test certain things I come across a certain pattern. It goes something like this:
Question:
I have this private method I need to unit test.
Top voted answer:
Don't.
I also came across this article arguing against unit testing private methods as well.
Basically how I'm implementing an API I'm given is I write the code first, then I write unit tests to "break it the worst way possible" (as my superior puts it). Once I notice something broke I fix it in the code. To me this seems like a mash-up of OOD and TDD. Is that a legit approach?
The reason I got so many private methods in the first place is that I'm required to break up larger chunks of code into methods. Since these methods are only supposed to be used within the scope of this API implementation I set them to private. Since the file structure established by my team requires me to write all the code into a single file corresponding to an API I can't separate these private methods into a new class and set them to public.
My superior expects me to test these private methods as well. But I'm beginning to doubt if this is even really necessary if the Asserts on the public methods all run successfully?
From my point of view, if my tests on the public methods return the values I expected, I infer that my private methods also work like I intended.
Or am I missing something?
The core point is: unit tests exist to guarantee that your class under tests behaves as expected.
The behavior of your classes manifests itself via those methods that can be called from "outside" of your classes.
Therefore there is neither need nor sense in trying to directly test private methods.
Of course, it is fair to measure coverage while running unit tests; in order to understand which paths in your code are taken. This information can be used to either enhance test cases (to gain more coverage); or to delete production code (which isn't required).
And to align with your question: you do not use TDD to implement private methods.
You use TDD to create a special form of your "contract" that can be executed automatically. You verify what needs to be done; not how it is actually done in detail. That is especially true since the TDD methodology includes continuous refactoring. You write your tests, you turn them green (by writing production code); and then, at some point, you look into improving the quality of your code. Meaning: you start reworking internal aspects of your class under test. Like: creating more private methods, moving content around; maybe even creating internal-only helper classes and so on. But you keep running your existing tests ... which should still all work; because as said: you write them to check the externally observable behavior (as far as possible).
And beyond that: you should rather looking into "fuzzying" the test data that your unit tests drive into your code instead of worrying about private methods.
What I mean: instead of trying to manually find that test data that makes your production code break, look into concepts like QuickCheck that try to do exactly that automatically.
Final words: and if your management keeps hammering on "test private methods"; then it is your responsibility as engineer to convince them that they are wrong about this. And there is plenty of material out there to back that up.
The way you are splitting your code at the moment is out of necessity. You are delegating some work in a private method, because, well, other public methods need to re-use this, and you don't want to copy-paste that code. Of course, since these methods don't make sense being used as standalone methods, you keep them private.
Good, at least you're true to the DRY (Don't Repeat Yourself) principle.
Now, another way to look it is that you want to separate your private methods from the rest of the code, because you want to have a Separation of Concerns. If you do this, you will see that these private methods, although they can't be used on their own, don't really belong to the class containing your public methods, because they don't solve the same concern : This is the Single Responsibility principle: the S in SOLID.
Instead of having your private method within your class, what you can do is move it to another class (a service as I call them), inject it in the class in which they were before, and call these methods instead of the call to the private ones.
Why should you do this ?
Because it will be so much easier to test: you delegate a big part of the code, that you will not have to test under a big combination of scenarios.
Because you can then inject an alternative implementation (think maintainability: it's easier to replace a brick, than a part of a brick)
Because you can delegate the implementation (and the testing) of this service to someone else (you can have 2 developers in parallel working on a very small area of the code)
Sometimes, it makes even more sense, because these service classes will then be re-used by other completely different classes that will have the same needs, if they really take care of one single concern.
This last point doesn't always happen, but quite often, it does. I found it is easier to re-use existing data services when they are self-documented: properly-named services and properly-named methods. (your co-workers will discover them more easily)
Now, you don't need to test a private method... because it's public.
You may think it's cheating, because you just made it public, but this comes from a very legitimate approach: Separation of Concerns.
Final notes:
I am convinced your superior is right about asking you to test this code. One thing he could have added was to do that separation into different classes. Also, make sure that you inject these classes using Dependency Injection and Inversion of Control containers. don't instantiate them using the new statement, otherwise, you will not be able to assert that the right method was called with the right arguments !
I recently joined a company where they use TDD, but it's still not clear for me, for example, we have a test:
[TestMethod]
public void ShouldReturnGameIsRunning()
{
var game = new Mock<IGame>();
game.Setup(g => g.IsRunning).Returns(true);
Assert.IsTrue(game.Object.IsRunning);
}
What's the purpose of it? As far as I understand, this is testing nothing! I say that because it's mocking an interface, and saying, return true for IsRunning, it will never return a different value...
I'm probably thinking wrong, as everybody says it's a good practice etc etc... or is this test wrong?
This test is excerising an interface, and verifying that the interface exposes a Boolean getter named IsRunning.
This test was probably written before the interface existed, and probably well before any concrete classes of IGame existed either. I would imagine, if the project is of any maturity, there are other tests that exercise the actual implementions and verify behavior for that level, and probably use mocking in the more traditional isolationist context rather than stubbing theoretical shapes.
The value of these kinds of tests is that it forces one to think about how a shape or object is going to be used. Something along the lines of "I don't know exactly what a game is going to do yet, but I do know that I'm going to want to check if it's running...". So this style of testing is more to drive design decisions, as well as validate behavior.
Not the most valuable test on the planet, but serves its purpose in my opinion.
Edit: I was on the train last night when I was typing this, but I wanted to address Andrew Whitaker comment directly in the answer.
One could argue that the signature in the interface itself is enough of an enforcement of the desired contract. However, this really depends on how you treat your interfaces and ultimately how you validate the system that you are trying to test.
There may be tangible value in explicitly stating this functionality as a test. You are explicitly verifying that this is a desired functionality for an IGame.
Lets take a use case, lets say that as a designer you know that you want to expose a public getter for IsRunning, so you add it to the interface. But lets say that another developer gets this and sees a property, but doesn't see any usages of it anywhere else in the code, one might assume that it is eligible for deletion (or code-rot removal, which is normally a good thing) without harm. The existence of this test, however, explicitly states this properity should exist.
Without this test, a developer could modify the interface, drop the implementation and the project would still compile and run seemingly correctly. And it would be until much later that someone might notice that interface has been changed.
With this test, even if it appears as if no one is using it, your test suite will fail. The self documenting nature of the test tells you that ShouldReturnGameIsRunning(), which at the very least should spawn a conversation if IGame should really expose an IsRunning getter.
The test is invalid. Moq is a framework used to "mock" objects used by the method under test. If you were testing IsRunning, you might use Mock to provide a certain implementation for those methods that IsRunning depends on, for example to cause certain execution paths to execute.
People can tell you all day that testing first is better; you won't realize that it actually is until you start doing it yourself and realize that you have fewer bugs with code you wrote tests for first, which will save you and your co-workers time and probably make the quality of your code higher.
It could be that the test has been written (before) the code which is good practice.
It might equally be that this is a pointless test created by a tool.
Another possibility is that the code was written by somebody who, at the time, didn't understand how to use that particular mocking framework - and wrote one or more simple tests to learn. Kent Beck talked about that sort of "learning test" in his original TDD book.
This test could have had a point in the past. It could have been testing a concrete class. It could have been testing an abstract one -- as it sometimes makes sense to mock an abstract class that you want to test. I can see how a (less informed(like myself)) follower of the TDD way of doing things could have left a small mess like this. Cleanup of dead code and dead test code, does not really fit in the natural workflow of red, green, refactor (since it doesn't break any tests!).
However, history is not justification for a test that does nothing but confuse the reader. So, be a good boyscout and remove it.
In my honest oppinion this is just rubbish, it only proves that moq works as expected
There are some other variations of this question here at SO, but please read the entire question.
By just using fakes, we look at the constructor to see what kind of dependencies that a class have and then create fakes for them accordingly.
Then we write a test for a method by just looking at it's contract (method signature). If we can't figure out how to test the method by doing so, shouldn't we rather try to refactor the method (most likely break it up in smaller pieces) than to look inside it to figure our how we should test it? In other words, it also gives us a quality control by doing so.
Isn't mocks a bad thing since they require us to look inside the method that we are going to test? And therefore skip the whole "look at the signature as a critic".
Update to answer the comment
Say a stub then (just a dummy class providing the requested objects).
A framework like Moq makes sure that Method A gets called with the arguments X and Y. And to be able to setup those checks, one needs to look inside the tested method.
Isn't the important thing (the method contract) forgotten when setting up all those checks, as the focus is shifted from the method signature/contract to look inside the method and create the checks.
Isn't it better to try to test the method by just looking at the contract? After all, when we use the method we'll just look at the contract when using it. So it's quite important the it's contract is easy to follow and understand.
This is a bit of a grey area and I think that there is some overlap. On the whole I would say using mock objects is preferred by me.
I guess some of it depends on how you go about testing code - test or code first?
If you follow a test driven design plan with objects implementing interfaces then you effectively produce a mock object as you go.
Each test treats the tested object / method as a black box.
It focuses you onto writing simpler method code in that you know what answer you want.
But above all else it allows you to have runtime code that uses mock objects for unwritten areas of the code.
On the macro level it also allows for major areas of the code to be switched at runtime to use mock objects e.g. a mock data access layer rather than one with actual database access.
Fakes are just stupid dummy objects. Mocks enable you to verify that the controlflow of the unit is correct (e.g. that it calls the correct functions with the expected arguments). Doing so is very often a good way to test things. An example is that a saveProject()-function probably want's to call something like saveToProject() on the objects to be saved. I consider doing this a lot better than saving the project to a temporary buffer, then loading it to verify that everything was fine (this tests more than it should - it also verifies that the saveToProject() implementation(s) are correct).
As of mocks vs stubs, I usually (not always) find that mocks provide clearer tests and (optionally) more fine-grained control over the expectations. Mocks can be too powerful though, allowing you to test an implementation to the level that changing implementation under test leaving the result unchanged, but the test failing.
By just looking on method/function signature you can test only the output, providing some input (stubs that are only able to feed you with needed data). While this is ok in some cases, sometimes you do need to test what's happening inside that method, you need to test whether it behaves correctly.
string readDoc(name, fileManager) { return fileManager.Read(name).ToString() }
You can directly test returned value here, so stub works just fine.
void saveDoc(doc, fileManager) { fileManager.Save(doc) }
here you would much rather like to test, whether method Save got called with proper arguments (doc). The doc content is not changing, the fileManager is not outputting anything. This is because the method that is tested depends on some other functionality provided by the interface. And, the interface is the contract, so you not only want to test whether your method gives correct results. You also test whether it uses provided contract in correct way.
I see it a little different. Let me explain my view:
I use a mocking framework. When I try to test a class, to ensure it will work as intended, I have to test all the situations may happening. When my class under test uses other classes, I have to ensure in certain test situation that a special exceptions is raised by a used class or a certain return value, and so on... This is hardly to simulate with the real implementations of those classes, so I have to write fakes of them. But I think that in the case I use fakes, tests are not so easy to understand. In my tests I use MOQ-Framework and have the setup for the mocks in my test method. In case I have to analyse my testmethod, I can easy see how the mocks are configured and have not to switch to the coding of the fakes to understand the test.
Hope that helps you finding your answer ...
Many times I find myself torn between making a method private to prevent someone from calling it in a context that doesn't make sense (or would screw up the internal state of the object involved), or making the method public (or typically internal) in order to expose it to the unit test assembly. I was just wondering what the Stack Overflow community thought of this dilemma?
So I guess the question truly is, is it better to focus on testability or on maintaining proper encapsulation?
Lately I've been leaning towards testability, as most of the code is only going to be leveraged by a small group of developers, but I thought I would see what everyone else thought?
Its NOT ok to change method visibility on methods that the customers or users can see. Doing this is ugly, a hack, exposes methods that any dumb user could try to use and explode your app... its a liability you do not need.
You are using C# yes? Check out the internals visible to attribute class.
You can declare your testable methods as internal, and allow your unit testing assembly access to your internals.
It depends on whether the method is part of a public API or not. If a method does not belong to part of a public API, but is called publicly from other types within the same assembly, use internal, friend your unit test assembly, and unit test it.
However, if the method is not/should not be part of a public API, and it is not called by other types internal to the assembly, DO NOT test it directly. It should be protected or private, and it should only be tested indirectly by unit testing your public API. If you write unit tests for non-public (or what should be non-public) members of your types, you are binding test code to internal implementation details.
Thats a bad kind of coupling, increases the amount of unit tests you need, increases workload both in the short term (more unit tests) as well as in the long term (more test maintenance and modification in response to refactoring internal implementation details). Another problem with testing non-public members is that you test code that may not actually be needed or used. A GREAT way to find dead code is when it is not covered by any of your unit tests when your public API is covered 100%. Removing dead code is a great way to keep your code base lean and mean, and is impossible if you are not careful about what you put into your public API, and what parts of your code you unit test.
EDIT:
As a quick additional note...with a properly designed public API, you can very effectively use a tool like Microsoft PEX to automatically generate full-coverage unit tests that test every execution path of your code. Combined with a few manually written tests that cover critical behavior, anything not covered can be considered dead code and removed, and you can greatly shortcut your unit testing process.
This is a common thought.
It's generally best to test the private methods by testing the public methods that call them (so you don't explicitly test the private methods). However, I understand that there are times when you really do want to test those private methods.
The answers to this question (Java) and this question (.NET) should be helpful.
To answer the question: no, you shouldn't change method visibility for the sake of testing. You generally shouldn't be testing private methods, and when you do, there are better ways to do it.
In general I agree with #jrista. But, as usual, it depends.
When trying to work with legacy code, the key is to get it under test. After that, you can add tests for new features and existing bugs, refactor to improve design, etc. This is risky without tests. Legacy code tends to be rife with dependencies, and is often extremely difficult to get under test.
In Working Effectively with Legacy Code, Michael Feathers suggests multiple techniques for getting code under test. Many of these techniques involve breaking encapsulation or complicating the design, and the author is up front about this. Once tests are in place, the code can be improved safely.
So for legacy code, do what you have to do.
In .NET you should use Accessors for unit testing, even rather than the InternalsVisibleTo attribute. Accessors allow you to get access to any method in the class even if it is private. They even let you test abstract classes using an empty mock derived object (see the "PrivateObject" class).
Basically in your test project you use the accessor class rather than the actual class with the methods you want to test. The accessor class is the same as the "real" class, except everything is public to your test project. Visual studio can generate accessors for you.
NEVER make a type more visible to facilitate unit testing.
IMO is it WRONG to say that you should not unit test private methods. Unit tests are of exceptional value for regression testing and there is no reason why private methods should not be regression tested with granular unit tests.
We unit test most of our business logic, but are stuck on how best to test some of our large service tasks and import/export routines. For example, consider the export of payroll data from one system to a 3rd party system. To export the data in the format the company needs, we need to hit ~40 tables, which creates a nightmare situation for creating test data and mocking out dependencies.
For example, consider the following (a subset of ~3500 lines of export code):
public void ExportPaychecks()
{
var pays = _pays.GetPaysForCurrentDate();
foreach (PayObject pay in pays)
{
WriteHeaderRow(pay);
if (pay.IsFirstCheck)
{
WriteDetailRowType1(pay);
}
}
}
private void WriteHeaderRow(PayObject pay)
{
//do lots more stuff
}
private void WriteDetailRowType1(PayObject pay)
{
//do lots more stuff
}
We only have the one public method in this particular export class - ExportPaychecks(). That's really the only action that makes any sense to someone calling this class ... everything else is private (~80 private functions). We could make them public for testing, but then we'd need to mock them to test each one separately (i.e. you can't test ExportPaychecks in a vacuum without mocking the WriteHeaderRow function. This is a huge pain too.
Since this is a single export, for a single vendor, moving logic into the Domain doesn't make sense. The logic has no domain significance outside of this particular class. As a test, we built out unit tests which had close to 100% code coverage ... but this required an insane amount of test data typed into stub/mock objects, plus over 7000 lines of code due to stubbing/mocking our many dependencies.
As a maker of HRIS software, we have hundreds of exports and imports. Do other companies REALLY unit test this type of thing? If so, are there any shortcuts to make it less painful? I'm half tempted to say "no unit testing the import/export routines" and just implement integration testing later.
Update - thanks for the answers all. One thing I'd love to see is an example, as I'm still not seeing how someone can turn something like a large file export into an easily testable block of code without turning the code into a mess.
This style of (attempted) unit testing where you try to cover an entire huge code base through a single public method always reminds me of surgeons, dentists or gynaecologists whe have perform complex operations through small openings. Possible, but not easy.
Encapsulation is an old concept in object-oriented design, but some people take it to such extremes that testability suffers. There's another OO principle called the Open/Closed Principle that fits much better with testability. Encapsulation is still valuable, but not at the expense of extensibility - in fact, testability is really just another word for the Open/Closed Principle.
I'm not saying that you should make your private methods public, but what I am saying is that you should consider refactoring your application into composable parts - many small classes that collaborate instead of one big Transaction Script. You may think it doesn't make much sense to do this for a solution to a single vendor, but right now you are suffering, and this is one way out.
What will often happen when you split up a single method in a complex API is that you also gain a lot of extra flexibility. What started out as a one-off project may turn into a reusable library.
Here are some thoughts on how to perform a refactoring for the problem at hand: Every ETL application must perform at least these three steps:
Extract data from the source
Transform the data
Load the data into the destination
(hence, the name ETL). As a start for refactoring, this give us at least three classes with distinct responsibilities: Extractor, Transformer and Loader. Now, instead of one big class, you have three with more targeted responsibilities. Nothing messy about that, and already a bit more testable.
Now zoom in on each of these three areas and see where you can split up responsibilities even more.
At the very least, you will need a good in-memory representation of each 'row' of source data. If the source is a relational database, you may want to use an ORM, but if not, such classes need to be modeled so that they correctly protect the invariants of each row (e.g. if a field is non-nullable, the class should guarantee this by throwing an exception if a null value is attempted). Such classes have a well-defined purpose and can be tested in isolation.
The same holds true for the destination: You need a good object model for that.
If there's advanced application-side filtering going on at the source, you could consider implementing these using the Specification design pattern. Those tend to be very testable as well.
The Transform step is where a lot of the action happens, but now that you have good object models of both source and destination, transformation can be performed by Mappers - again testable classes.
If you have many 'rows' of source and destination data, you can further split this up in Mappers for each logical 'row', etc.
It never needs to become messy, and the added benefit (besides automated testing) is that the object model is now way more flexible. If you ever need to write another ETL application involving one of the two sides, you alread have at least one third of the code written.
Something general that came to my mind about refactoring:
Refactoring does not mean you take your 3.5k LOC and divide it into n parts. I would not recommend to make some of your 80 methods public or stuff like this. It's more like vertically slicing your code:
Try to factor out self-standing algorithms and data structures like parsers, renderers, search operations, converters, special-purpose data structures ...
Try to figure out if your data is processed in several steps and can be build in a kind of pipe and filter mechanism, or tiered architecture. Try to find as many layers as possible.
Separate technical (files, database) parts from logical parts.
If you have many of these import/export monsters see what they have in common and factor that parts out and reuse them.
Expect in general that your code is too dense, i.e. it contains too many different functionalities next to each in too few LOC. Visit the different "inventions" in your code and think about if they are in fact tricky facilities that are worth having their own class(es).
Both LOC and number of classes are likely to increase when you refactor.
Try to make your code real simple ('baby code') inside classes and complex in the relations between the classes.
As a result, you won't have to write unit tests that cover the whole 3.5k LOC at all. Only small fractions of it are covered in a single test, and you'll have many small tests that are independent from each other.
EDIT
Here's a nice list of refactoring patterns. Among those, one shows quite nicely my intention: Decompose Conditional.
In the example, certain expressions are factored out to methods. Not only becomes the code easier to read but you also achieve the opportunity to unit test those methods.
Even better, you can lift this pattern to a higher level and factor out those expressions, algorithms, values etc. not only to methods but also to their own classes.
What you should have initially are integration tests. These will test that the functions perform as expected and you could hit the actual database for this.
Once you have that savety net you could start refactoring the code to be more maintainable and introducing unit tests.
As mentioned by serbrech Workign Effectively with Legacy code will help you to no end, I would strongly advise reading it even for greenfield projects.
http://www.amazon.com/Working-Effectively-Legacy-Michael-Feathers/dp/0131177052
The main question I would ask is how often does the code change? If it is infrequent is it really worth the effort trying to introduce unit tests, if it is changed frequently then I would definatly consider cleaning it up a bit.
It sounds like integration tests may be sufficient. Especially if these export routines that don't change once their done or are only used for a limited time. Just get some sample input data with a variations, and have a test that verifies the final result is as expected.
A concern with your tests was the amount of fake data you had to create. You may be able to reduce this by creating a shared fixture (http://xunitpatterns.com/Shared%20Fixture.html). For unit tests the fixture which may be an in-memory representation of business objects to export, or for the case on integration tests it may be the actual databases initialized with known data. The point is that however you generate the shared fixture is the same in each test, so creating new tests is just a matter of doing minor tweaks to the existing fixture to trigger the code you want to test.
So should you use integration tests? One barrier is how to set up the shared fixture. If you can duplicate the databases somewhere, you could use something like DbUnit to prepare the shared fixture. It might be easier to break the code into pieces (import, transform, export). Then use the DbUnit based tests to test import and export, and use regular unit tests to verify the transform step. If you do that you don't need DbUnit to set up a shared fixture for the transform step. If you can break the code into 3 steps (extract, transform, export) at least you can focus your testing efforts on the part thats likely to have bugs or change later.
I have nothing to do with C#, but I have some idea you could try here. If you split your code a bit, then you'll notice that what you have is basically chain of operations performed on sequences.
First one gets pays for current date:
var pays = _pays.GetPaysForCurrentDate();
Second one unconditionally processes the result
foreach (PayObject pay in pays)
{
WriteHeaderRow(pay);
}
Third one performs conditional processing:
foreach (PayObject pay in pays)
{
if (pay.IsFirstCheck)
{
WriteDetailRowType1(pay);
}
}
Now, you could make those stages more generic (sorry for pseudocode, I don't know C#):
var all_pays = _pays.GetAll();
var pwcdate = filter_pays(all_pays, current_date()) // filter_pays could also be made more generic, able to filter any sequence
var pwcdate_ann = annotate_with_header_row(pwcdate);
var pwcdate_ann_fc = filter_first_check_only(pwcdate_annotated);
var pwcdate_ann_fc_ann = annotate_with_detail_row(pwcdate_ann_fc); // this could be made more generic, able to annotate with arbitrary row passed as parameter
(Etc.)
As you can see, now you have set of unconnected stages that could be separately tested and then connected together in arbitrary order. Such connection, or composition, could also be tested separately. And so on (i.e. - you can choose what to test)
This is one of those areas where the concept of mocking everything falls over. Certainly testing each method in isolation would be a "better" way of doing things, but compare the effort of making test versions of all your methods to that of pointing the code at a test database (reset at the start of each test run if necessary).
That is the approach I'm using with code that has a lot of complex interactions between components, and it works well enough. As each test will run more code, you are more likely to need to step through with the debugger to find exactly where something went wrong, but you get the primary benefit of unit tests (knowing that something went wrong) without putting in significant additional effort.
I think Tomasz Zielinski has a piece of the answer. But if you say you have 3500 lines of procedural codes, then the the problem is bigger than that.
Cutting it into more functions will not help you test it. However, it' a first step to identify responsibilities that could be extracted further to another class (if you have good names for the methods, that can be obvious in some cases).
I guess with such a class you have an incredible list of dependencies to tackle just to be able to instanciate this class into a test. It becomes then really hard to create an instance of that class in a test...
The book from Michael Feathers "Working With Legacy Code" answer very well such questions.
The first goal to be able to test well that code into should be to identify the roles of the class and to break it into smaller classes. Of course that's easy to say and the irony is that it's risky to do without tests to secure your modifications...
You say you have only 1 public method in that class. That should ease the refactoring as you don't need to worry about the users fro, all the private methods. Encapsulation is nice, but if you have so much stuff private in that class, that probably means it doesn't belong here and you should extract different classes from that monster, that you will eventually be able to test. Pieces by pieces, the design should look cleaner, and you will be able to test more of that big piece of code.
You best friend if you start this will be a refactoring tool, then it should help you not to break logic while extracting classes and methods.
Again the book from Michael Feathers seems to be a must read for you :)
http://www.amazon.com/Working-Effectively-Legacy-Michael-Feathers/dp/0131177052
ADDED EXAMPLE :
This example come from the book from Michael Feathers and illustrate well your problem I think :
RuleParser
public evaluate(string)
private brachingExpression
private causalExpression
private variableExpression
private valueExpression
private nextTerm()
private hasMoreTerms()
public addVariables()
obvioulsy here, it doesn't make sense to make the methods nextTerm and hasMoreTerms public. Nobody should see these methods, the way we are moving to the next item is definitely internal to the class. so how to test this logic??
Well if you see that this is a separate responsibility and extract a class, Tokenizer for example. this method will suddenly be public within this new class! because that's its purpose. It becomes then easy to test that behaviour...
So if you would apply that to your huge piece of code, and extract pieces of it to other classes with less responsibilities, and where it would feel more natural to make these methods public, you also will be able to test them easily.
You said you are accessing about 40 different tables to map them. Why not breaking that into classes for each part of the mapping?
It's a bit hard to reason about a code I can't read. You maybe have other issues that prevent you to do this, but that's my best try on it.
Hope this helps
Good luck :)
I really find it hard to accept that you've got multiple, ~3.5 Klines data-export functions with no common functionality at all between them. If that's in fact the case, then maybe Unit Testing is not what you need to be looking at here. If there really is only one thing that each export module does, and it's essentially indivisible, then maybe a snapshot-comparison, data driven integration test suite is what's called for.
If there are common bits of functionality, then extract each of them out (as separate classes) and test them individually. Those little helper classes will naturally have different public interfaces, which should reduce the problem of private APIs that can't be tested.
You don't give any details about what the actual output formats look like, but if they're generally tabular, fixed-width or delimited text, then you ought at least to be able to split the exporters up into structural and formatting code. By which I mean, instead of your example code up above, you'd have something like:
public void ExportPaychecks(HeaderFormatter h, CheckRowFormatter f)
{
var pays = _pays.GetPaysForCurrentDate();
foreach (PayObject pay in pays)
{
h.formatHeader(pay);
f.WriteDetailRow(pay);
}
}
The HeaderFormatter and CheckRowFormatter abstract classes would define a common interface for those types of report elements, and the individual concrete subclasses (for the various reports) would contain logic for removing duplicate rows, for example (or whatever a particular vendor requires).
Another way to slice this is to separate data extraction and formatting from each other. Write code that extracts all the records from the various databases into an intermediate representation that's a super-set of the needed representations, then write relatively simple-minded filter routines that convert from the uber-format down to the required format for each vendor.
After thinking about this a little more, I realize you've identified this as an ETL application, but your example seems to combine all three steps together. That suggests that a first step would be to split things up such that all the data is extracted first, then translated, then stored. You can certainly test at least those steps separately.
I maintain some reports similar to what you describe, but not as many of them and with fewer database tables. I use a 3-fold strategy that might scale well enough to be useful to you:
At the method level, I unit test anything I subjectively deem to be 'complicated'. This includes 100% of bug fixes, plus anything that just makes me feel nervous.
At the module level, I unit test the main use cases. As you have encountered, this is fairly painful since it does require somehow mocking the data. I have accomplished this by abstracting the database interfaces (i.e. no direct SQL connections within my reporting module). For some simple tests I have typed the test data by hand, for others I have written a database interface that records and/or plays back queries, so that I can bootstrap my tests with real data. In other words, I run once in record mode and it not only fetches real data but it also saves a snapshot for me in a file; when I run in playback mode, it consults this file instead of the real database tables. (I'm sure there are mocking frameworks that can do this, but since every SQL interaction in my world has the signature Stored Procedure Call -> Recordset it was quite simple just to write it myself.)
I'm fortunate to have access to a staging environment with a full copy of production data, so I can perform integration tests with full regression against previous software versions.
Have you looked into Moq?
Quote from the site:
Moq (pronounced "Mock-you" or just
"Mock") is the only mocking library
for .NET developed from scratch to
take full advantage of .NET 3.5 (i.e.
Linq expression trees) and C# 3.0
features (i.e. lambda expressions)
that make it the most productive,
type-safe and refactoring-friendly
mocking library available.