How can I automate testing for my database-driven function? - C#

I'm trying to improve the automated testing in my application, but am unsure of the best way to proceed.
My app gathers data from multiple forms, recodes it and stores it in a database. I have created a pretty complex SQL view, which flattens the structure out, so it can be imported into a stats package (SPSS).
My concern is that the view is complex, and I want to automate some tests around it.
Currently I have some functional tests, which create a complete form object model and send it into the application. I then retrieve the view from the database and use reflection to test that the retrieved view fields match the original data.
The problem is that this is very manual and heavy: my fixtures are lengthy, and it is cumbersome to add new scenarios (e.g. various parts of the model being incomplete).
Does anyone have any advice on how I could improve my test strategy? Tips and tricks all welcome!
Thanks!

DbFit is perfect for this. DbFit is an extension of FitNesse, which you may already be using since you spoke of "fixtures". In any case, DbFit makes it really easy to set up a test where you can seed some data, run the view, and compare against expected results; it will then automatically roll back the data that you seeded for the test. It is very easy to update as you add more fields to the view, and it requires no additional objects in your DB, unlike some other SQL "unit" testing suites.
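For illustration, a minimal DbFit test page might look something like this (the table, columns and connection details are invented for the example; see the links below for exact syntax):
!|dbfit.SqlServerTest|

!|Connect|localhost|testuser|testpassword|FormsDb|

!|Insert|FormResponses|
|FormId|FieldName|FieldValue|
|1|Age|42|

!|Query|SELECT FormId, Age FROM FlattenedFormView WHERE FormId = 1|
|FormId|Age|
|1|42|

!|Rollback|
The Rollback at the end undoes the seeded rows, so each test leaves the database untouched.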
You can find more info on using DbFit at:
http://benilovj.github.com/dbfit
http://groups.google.com/group/dbfit
And here is a tutorial that I wrote for it that explains the basic options:
http://www.sqlservercentral.com/articles/Testing/64636/

This is a very difficult question to answer. It almost sounds to me like you want to make a single test that tests everything in one go.
First, your app should be constructed so that each piece of functionality is isolated in its own class, making it easy to test AND easy to replace with stubs when testing other things. Dependencies on other functionality should be injected (Dependency Injection).
Second, you should use the same technique for external systems like database connections and SPSS file writers. This involves wrapping such functionality so that these dependencies can also be injected, and thus replaced by stubs when testing other aspects of your app.
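As a minimal sketch (the type names are invented for illustration; FormModel and FlattenedFormView stand in for your own types), wrapping the database behind an injectable interface could look like this:
public interface IFormRepository
{
    void Save(FormModel form);
    FlattenedFormView GetFlattenedView(int formId);
}

public class FormProcessor
{
    private readonly IFormRepository _repository;

    // The dependency is injected, so a test can pass in a stub or mock
    // instead of a class that talks to the real database.
    public FormProcessor(IFormRepository repository)
    {
        _repository = repository;
    }

    public void Process(FormModel form)
    {
        // ...recode the form data...
        _repository.Save(form);
    }
}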
Third, be aware that if tests are hard to write, 99.99% of the time this indicates that your design is not as strong as it could be.
Regards,
Morten

Related

Using Specflow to test object copying

We've just started using SpecFlow to do testing on a large codebase.
Since we're always adding features, we said we'd write tests for the new code and add tests for the old code when it's time to refactor.
One of our new features involves copying cars from one user's database to another's. There's not much UI for this, so we don't have a way to check everything went OK other than hitting the database.
Is there an alternative to hitting the database when writing tests like this?
If the database access is isolated from the business logic well enough, i.e. behind some sort of repository interface, then you can mock the real repository in your tests and validate that the mocked repository is called as would be appropriate for such a copy operation.
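A rough sketch of what that could look like with a mocking framework such as Moq (the repository interface, Car entity and CarCopyService are invented for illustration; an NUnit-style test is shown, but the same works from a SpecFlow step):
public interface ICarRepository
{
    IList<Car> GetCars(int userId);
    void AddCar(int userId, Car car);
}

[Test]
public void CopyCars_copies_each_car_to_the_target_user()
{
    var repository = new Mock<ICarRepository>();
    repository.Setup(r => r.GetCars(1)).Returns(new List<Car> { new Car { Id = 42 } });

    var service = new CarCopyService(repository.Object);
    service.CopyCars(1, 2); // copy from user 1 to user 2

    // Verify the copy happened without ever hitting the database.
    repository.Verify(r => r.AddCar(2, It.Is<Car>(c => c.Id == 42)), Times.Once());
}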

Best Practice - What to do when creating unit tests with a huge number of entities

I am creating a unit test, but there are many entities. Do I have to insert all the entities into the database manually, or is there a better solution?
Are you looking for something like Moq? You can use it to create mock objects and queryable lists of objects, so that you don't need to put fake data into your database to test.
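For example (the entity and repository names are made up), a sketch like this avoids seeding the database entirely:
var customers = new List<Customer>
{
    new Customer { Id = 1, Name = "Alice" },
    new Customer { Id = 2, Name = "Bob" }
}.AsQueryable();

var repository = new Mock<ICustomerRepository>();
repository.Setup(r => r.GetCustomers()).Returns(customers);

// The code under test receives repository.Object and never touches the database.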
Have a look at this link on how to get going with writing unit tests. The one thing I think may help you in regard to your question:
Mock out all external services and state
Otherwise, behaviour in those external services overlaps multiple tests, and state data means that different unit tests can influence each other’s outcome.
You’ve definitely taken a wrong turn if you have to run your tests in a specific order, or if they only work when your database or network connection is active.
(By the way, sometimes your architecture might mean your code touches static variables during unit tests. Avoid this if you can, but if you can’t, at least make sure each test resets the relevant statics to a known state before it runs.)

Unit testing database code [duplicate]

Possible Duplicate:
Unit testing on code that uses the Database
I am just starting with unit testing and wondering how to unit test methods that make actual changes to my database. Would the best way be to wrap them in transactions and then roll back, or are there better approaches?
If you want proper test coverage, you need two types of tests:
Unit tests which mock all your actual data access. These tests will not actually write to the database, but test the behaviour of the class that does (which methods it calls on other dependencies, etc.).
System tests (or integration tests) which check that your database can be accessed and modified. I would consider two types of tests here: simple plain CRUD tests (create / read / update / delete) for each of your model objects, and more complex system tests for your actual methods and everything you deem interesting or valuable to test. Good practice here is to have each test start from an empty (or "ready for the test") database, do its stuff, then check the state of the database. Transactions / rollbacks are one good way to achieve this.
For unit testing you need to mock or stub the data access code. Typically you have a repository interface, and you can stub it by creating a concrete repository that stores data in memory, or you can mock it using a dynamic mocking framework.
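For example, a hand-rolled in-memory stub might look like this (the interface and entity are hypothetical):
public class InMemoryCustomerRepository : ICustomerRepository
{
    private readonly Dictionary<int, Customer> _store = new Dictionary<int, Customer>();

    public void Save(Customer customer)
    {
        _store[customer.Id] = customer;
    }

    public Customer GetById(int id)
    {
        return _store[id];
    }
}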
For system or integration testing, you need to re-create the entire database before each test method in order to maintain a stable state before each test.
As per some of the previous answers, if you want to test your data access code then you might want to think about mocks and a system/integration test strategy.
But if you want to unit test your SQL objects (e.g. sprocs, views, table constraints), then there are a number of database unit testing frameworks out there that might be of interest (including one that I have written).
Some implement tests within SQL, others within your code using MbUnit/NUnit etc.
I have written a number of articles with examples on how I approach this - see http://dbtestunit.wordpress.com/
Other resources that might be of use:
http://www.simple-talk.com/sql/t-sql-programming/close-those-loopholes---testing-stored-procedures--/
http://tsqlt.org/articles/
The general approach is to have a way to mock your database actions, so that your unit tests are not reliant on the database being available or in a certain state. That said, this also implies a design that facilitates the isolation required to mock away your data layer. Unit testing, and how to do it well, is a huge topic. Take a look at mocking frameworks and dependency injection for a start.
If you are not developing an O/R mapper, there's no need to test database code. You don't want to test ADO.NET methods, right? Instead you want to verify that the ADO.NET methods are called with the right values.
Search Google for the repository pattern. You will create an implementation of an IRepository interface with CRUD methods, and test/mock that.
If you want to test against a real database, this would be more of an integration test than a unit test. Wrapping your tests in transactions could be one way to keep your database in a consistent state.
We've done this in a base class and used the TestInitialize and TestCleanup functions to make sure this always happens.
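A minimal sketch of that base-class approach with MSTest and System.Transactions (assuming your data access enlists in the ambient transaction):
using System.Transactions;
using Microsoft.VisualStudio.TestTools.UnitTesting;

public abstract class TransactionalTestBase
{
    private TransactionScope _scope;

    // MSTest runs base-class [TestInitialize]/[TestCleanup] methods
    // for every test in a derived [TestClass].
    [TestInitialize]
    public void OpenTransaction()
    {
        _scope = new TransactionScope();
    }

    [TestCleanup]
    public void RollbackTransaction()
    {
        // Dispose() without Complete() rolls everything back.
        _scope.Dispose();
    }
}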
But testing against a real database will certainly run you into performance problems. So make sure from the beginning that you can swap your database access code for something that runs in memory. I don't know which database access technology you're targeting, but design patterns like Unit of Work and Repository can help you isolate your database code and replace it with an in-memory solution.

Using MSTest to test multiple database types

I am currently developing an application that connects to multiple database engines (2 right now, but this will grow in the future) but does similar things on each database. I would like to develop a set of unit tests that I only have to write once, but can be run on a different database engine. This application will be extremely complex, and I predict that I will be writing hundreds if not thousands of tests for it.
For example, I have a method that retrieves all the databases available in a database server, and I have two classes which have the same interface that defines the GetDatabases() method. I would like to develop one method that creates an instance of a class that implements IDatabaseEngine, and call the GetDatabases() method on it.
I then want to call this method once with my MySQLDatabaseEngine class, and once again with my SqlServerDatabaseEngine class and test the output.
I am currently using MSTest, because this is what I am most familiar with, but I am not against switching my test engine if it proves to be unsuitable for this task. As I only started this morning, I have only written three tests for this so far, so switching would not be a problem at all.
It may not even be necessary to do something different with the configuration of MSTest; a test harness inside MSTest that runs a method twice with different parameters might be enough. However, I would like to avoid any sort of situation where I have to
I have considered code generation, but I would really like to avoid this.
You can also have private test methods which accept the interface implementation as a parameter, and call them twice, once with each class (MySQL or SQL Server); this way you have one or two callers (better two), but the test itself is written only once.
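A sketch of that pattern, using the class names from the question (the constructor arguments and assertion are placeholders):
[TestClass]
public class GetDatabasesTests
{
    [TestMethod]
    public void GetDatabases_MySql()
    {
        AssertGetDatabasesWorks(new MySQLDatabaseEngine("connection string here"));
    }

    [TestMethod]
    public void GetDatabases_SqlServer()
    {
        AssertGetDatabasesWorks(new SqlServerDatabaseEngine("connection string here"));
    }

    // The actual test logic is written once, against the interface.
    private static void AssertGetDatabasesWorks(IDatabaseEngine engine)
    {
        var databases = engine.GetDatabases();
        Assert.IsTrue(databases.Count > 0); // replace with real expectations
    }
}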
Or, you could use dependency injection to get what you need. I don't have much experience with DI, but from what I have heard it is great for mocking and simplifies these usage patterns, among tons of other advantages.
Look into Ninject or Unity.

Unit testing large blocks of code (mappings, translation, etc)

We unit test most of our business logic, but are stuck on how best to test some of our large service tasks and import/export routines. For example, consider the export of payroll data from one system to a 3rd party system. To export the data in the format the company needs, we need to hit ~40 tables, which creates a nightmare situation for creating test data and mocking out dependencies.
For example, consider the following (a subset of ~3500 lines of export code):
public void ExportPaychecks()
{
    var pays = _pays.GetPaysForCurrentDate();
    foreach (PayObject pay in pays)
    {
        WriteHeaderRow(pay);
        if (pay.IsFirstCheck)
        {
            WriteDetailRowType1(pay);
        }
    }
}

private void WriteHeaderRow(PayObject pay)
{
    //do lots more stuff
}

private void WriteDetailRowType1(PayObject pay)
{
    //do lots more stuff
}
We only have the one public method in this particular export class - ExportPaychecks(). That's really the only action that makes any sense to someone calling this class ... everything else is private (~80 private functions). We could make them public for testing, but then we'd need to mock them to test each one separately (i.e. you can't test ExportPaychecks in a vacuum without mocking the WriteHeaderRow function). This is a huge pain too.
Since this is a single export, for a single vendor, moving logic into the Domain doesn't make sense. The logic has no domain significance outside of this particular class. As a test, we built out unit tests which had close to 100% code coverage ... but this required an insane amount of test data typed into stub/mock objects, plus over 7000 lines of code due to stubbing/mocking our many dependencies.
As a maker of HRIS software, we have hundreds of exports and imports. Do other companies REALLY unit test this type of thing? If so, are there any shortcuts to make it less painful? I'm half tempted to say "no unit testing the import/export routines" and just implement integration testing later.
Update - thanks for the answers all. One thing I'd love to see is an example, as I'm still not seeing how someone can turn something like a large file export into an easily testable block of code without turning the code into a mess.
This style of (attempted) unit testing, where you try to cover an entire huge code base through a single public method, always reminds me of surgeons, dentists or gynaecologists who have to perform complex operations through small openings. Possible, but not easy.
Encapsulation is an old concept in object-oriented design, but some people take it to such extremes that testability suffers. There's another OO principle called the Open/Closed Principle that fits much better with testability. Encapsulation is still valuable, but not at the expense of extensibility - in fact, testability is really just another word for the Open/Closed Principle.
I'm not saying that you should make your private methods public, but what I am saying is that you should consider refactoring your application into composable parts - many small classes that collaborate instead of one big Transaction Script. You may think it doesn't make much sense to do this for a solution to a single vendor, but right now you are suffering, and this is one way out.
What will often happen when you split up a single method in a complex API is that you also gain a lot of extra flexibility. What started out as a one-off project may turn into a reusable library.
Here are some thoughts on how to perform a refactoring for the problem at hand: Every ETL application must perform at least these three steps:
Extract data from the source
Transform the data
Load the data into the destination
(hence the name ETL). As a start for refactoring, this gives us at least three classes with distinct responsibilities: Extractor, Transformer and Loader. Now, instead of one big class, you have three with more targeted responsibilities. Nothing messy about that, and already a bit more testable.
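A sketch of those three seams (the row types and names are invented for illustration):
using System.Collections.Generic;
using System.Linq;

public interface IExtractor
{
    IEnumerable<SourceRow> Extract();
}

public interface ITransformer
{
    DestinationRow Transform(SourceRow row);
}

public interface ILoader
{
    void Load(IEnumerable<DestinationRow> rows);
}

// The pipeline just composes the three; each part is testable on its own.
public class EtlPipeline
{
    private readonly IExtractor _extractor;
    private readonly ITransformer _transformer;
    private readonly ILoader _loader;

    public EtlPipeline(IExtractor extractor, ITransformer transformer, ILoader loader)
    {
        _extractor = extractor;
        _transformer = transformer;
        _loader = loader;
    }

    public void Run()
    {
        _loader.Load(_extractor.Extract().Select(row => _transformer.Transform(row)));
    }
}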
Now zoom in on each of these three areas and see where you can split up responsibilities even more.
At the very least, you will need a good in-memory representation of each 'row' of source data. If the source is a relational database, you may want to use an ORM, but if not, such classes need to be modeled so that they correctly protect the invariants of each row (e.g. if a field is non-nullable, the class should guarantee this by throwing an exception if a null value is attempted). Such classes have a well-defined purpose and can be tested in isolation.
The same holds true for the destination: You need a good object model for that.
If there's advanced application-side filtering going on at the source, you could consider implementing these using the Specification design pattern. Those tend to be very testable as well.
The Transform step is where a lot of the action happens, but now that you have good object models of both source and destination, transformation can be performed by Mappers - again testable classes.
If you have many 'rows' of source and destination data, you can further split this up in Mappers for each logical 'row', etc.
It never needs to become messy, and the added benefit (besides automated testing) is that the object model is now far more flexible. If you ever need to write another ETL application involving one of the two sides, you already have at least one third of the code written.
Something general that came to my mind about refactoring:
Refactoring does not mean you take your 3.5k LOC and divide it into n parts. I would not recommend making some of your 80 methods public or anything like that. It's more like vertically slicing your code:
Try to factor out self-standing algorithms and data structures like parsers, renderers, search operations, converters, special-purpose data structures ...
Try to figure out whether your data is processed in several steps and can be built into a kind of pipes-and-filters mechanism, or a tiered architecture. Try to find as many layers as possible.
Separate technical (files, database) parts from logical parts.
If you have many of these import/export monsters, see what they have in common, factor those parts out, and reuse them.
Expect in general that your code is too dense, i.e. it contains too many different functionalities next to each other in too few LOC. Visit the different "inventions" in your code and think about whether they are in fact tricky facilities that are worth having their own class(es).
Both LOC and number of classes are likely to increase when you refactor.
Try to make your code real simple ('baby code') inside classes and complex in the relations between the classes.
As a result, you won't have to write unit tests that cover the whole 3.5k LOC at all. Only small fractions of it are covered in a single test, and you'll have many small tests that are independent from each other.
EDIT
Here's a nice list of refactoring patterns. Among those, one shows my intention quite nicely: Decompose Conditional.
In the example, certain expressions are factored out into methods. Not only does the code become easier to read, but you also gain the opportunity to unit test those methods.
Even better, you can lift this pattern to a higher level and factor out those expressions, algorithms, values etc. not only to methods but also to their own classes.
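A sketch of Decompose Conditional, loosely following the classic charge-calculation example from the refactoring literature (all names, fields and rates here are illustrative):
public class ChargeCalculator
{
    private readonly DateTime _summerStart = new DateTime(2000, 6, 1);
    private readonly DateTime _summerEnd = new DateTime(2000, 8, 31);
    private readonly decimal _summerRate = 1.0m;
    private readonly decimal _winterRate = 1.5m;
    private readonly decimal _winterServiceCharge = 10m;

    // Before: the condition and both branches are buried in one method.
    public decimal Charge(DateTime date, int quantity)
    {
        if (date < _summerStart || date > _summerEnd)
            return quantity * _winterRate + _winterServiceCharge;
        return quantity * _summerRate;
    }

    // After: each piece has a name, and each can be unit tested on its own.
    public decimal ChargeRefactored(DateTime date, int quantity)
    {
        return IsSummer(date) ? SummerCharge(quantity) : WinterCharge(quantity);
    }

    private bool IsSummer(DateTime date)
    {
        return date >= _summerStart && date <= _summerEnd;
    }

    private decimal SummerCharge(int quantity)
    {
        return quantity * _summerRate;
    }

    private decimal WinterCharge(int quantity)
    {
        return quantity * _winterRate + _winterServiceCharge;
    }
}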
What you should have initially are integration tests. These will test that the functions perform as expected, and you can hit the actual database for this.
Once you have that safety net, you can start refactoring the code to be more maintainable and introducing unit tests.
As mentioned by serbrech, Working Effectively with Legacy Code will help you no end; I would strongly advise reading it even for greenfield projects.
http://www.amazon.com/Working-Effectively-Legacy-Michael-Feathers/dp/0131177052
The main question I would ask is: how often does the code change? If it changes infrequently, is it really worth the effort of trying to introduce unit tests? If it changes frequently, then I would definitely consider cleaning it up a bit.
It sounds like integration tests may be sufficient, especially if these export routines don't change once they're done, or are only used for a limited time. Just get some sample input data with variations, and have a test that verifies the final result is as expected.
A concern with your tests was the amount of fake data you had to create. You may be able to reduce this by creating a shared fixture (http://xunitpatterns.com/Shared%20Fixture.html). For unit tests the fixture may be an in-memory representation of the business objects to export; for integration tests it may be the actual databases initialized with known data. The point is that however you generate the shared fixture, it is the same in each test, so creating new tests is just a matter of making minor tweaks to the existing fixture to trigger the code you want to test.
So should you use integration tests? One barrier is how to set up the shared fixture. If you can duplicate the databases somewhere, you could use something like DbUnit to prepare the shared fixture. Or it might be easier to break the code into pieces (import, transform, export), then use the DbUnit-based tests to test import and export, and use regular unit tests to verify the transform step. If you do that, you don't need DbUnit to set up a shared fixture for the transform step. And if you can break the code into three steps (extract, transform, export), you can at least focus your testing efforts on the part that's likely to have bugs or change later.
I have nothing to do with C#, but I have some idea you could try here. If you split your code a bit, you'll notice that what you have is basically a chain of operations performed on sequences.
First one gets pays for the current date:
var pays = _pays.GetPaysForCurrentDate();
Second one unconditionally processes the result:
foreach (PayObject pay in pays)
{
    WriteHeaderRow(pay);
}
Third one performs conditional processing:
foreach (PayObject pay in pays)
{
    if (pay.IsFirstCheck)
    {
        WriteDetailRowType1(pay);
    }
}
Now, you could make those stages more generic (sorry for the pseudocode, I don't know C#):
var all_pays = _pays.GetAll();
var pwcdate = filter_pays(all_pays, current_date()); // filter_pays could also be made more generic, able to filter any sequence
var pwcdate_ann = annotate_with_header_row(pwcdate);
var pwcdate_ann_fc = filter_first_check_only(pwcdate_ann);
var pwcdate_ann_fc_ann = annotate_with_detail_row(pwcdate_ann_fc); // this could be made more generic, able to annotate with an arbitrary row passed as a parameter
(Etc.)
As you can see, you now have a set of unconnected stages that can be tested separately and then connected together in an arbitrary order. Such a connection, or composition, can also be tested separately. And so on (i.e. you can choose what to test).
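In C#, a rough translation of those stages as independently testable functions might look like this (the Date property and the method names are assumptions; PayObject and the Write methods come from the question):
// Each stage stands alone and can be unit tested without the others.
public static IEnumerable<PayObject> ForDate(IEnumerable<PayObject> pays, DateTime date)
{
    return pays.Where(p => p.Date == date);
}

public static IEnumerable<PayObject> FirstChecksOnly(IEnumerable<PayObject> pays)
{
    return pays.Where(p => p.IsFirstCheck);
}

// The composition is then trivial, and testable on its own:
public void ExportPaychecks()
{
    var pays = ForDate(_pays.GetAll(), DateTime.Today).ToList();
    foreach (var pay in pays)
        WriteHeaderRow(pay);
    foreach (var pay in FirstChecksOnly(pays))
        WriteDetailRowType1(pay);
}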
This is one of those areas where the concept of mocking everything falls over. Certainly testing each method in isolation would be a "better" way of doing things, but compare the effort of making test versions of all your methods to that of pointing the code at a test database (reset at the start of each test run if necessary).
That is the approach I'm using with code that has a lot of complex interactions between components, and it works well enough. As each test will run more code, you are more likely to need to step through with the debugger to find exactly where something went wrong, but you get the primary benefit of unit tests (knowing that something went wrong) without putting in significant additional effort.
I think Tomasz Zielinski has a piece of the answer. But if you have 3500 lines of procedural code, then the problem is bigger than that.
Cutting it into more functions will not help you test it. However, it's a first step towards identifying responsibilities that could be extracted into other classes (if you have good names for the methods, that can be obvious in some cases).
I guess with such a class you have an incredible list of dependencies to tackle just to be able to instantiate it, which makes it really hard to create an instance of that class in a test...
The book "Working Effectively with Legacy Code" by Michael Feathers answers such questions very well.
The first goal, so that you can test this code well, should be to identify the roles of the class and break it into smaller classes. Of course that's easy to say, and the irony is that it's risky to do without tests to secure your modifications...
You say you have only one public method in that class. That should ease the refactoring, as you don't need to worry about callers of all the private methods. Encapsulation is nice, but if you have so much private stuff in that class, it probably means it doesn't belong there, and you should extract different classes from that monster, which you will eventually be able to test. Piece by piece, the design should look cleaner, and you will be able to test more of that big chunk of code.
Your best friend when you start this will be a refactoring tool; it will help you avoid breaking logic while extracting classes and methods.
Again the book from Michael Feathers seems to be a must read for you :)
http://www.amazon.com/Working-Effectively-Legacy-Michael-Feathers/dp/0131177052
ADDED EXAMPLE:
This example comes from Michael Feathers' book and illustrates your problem well, I think:
RuleParser
    public evaluate(string)
    private branchingExpression
    private causalExpression
    private variableExpression
    private valueExpression
    private nextTerm()
    private hasMoreTerms()
    public addVariables()
Obviously, here it doesn't make sense to make the methods nextTerm and hasMoreTerms public. Nobody should see these methods; the way we move to the next item is definitely internal to the class. So how do you test this logic?
Well, if you see that this is a separate responsibility and extract a class (a Tokenizer, for example), these methods will suddenly be public within this new class! Because that's its purpose. It then becomes easy to test that behaviour...
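A toy sketch of such an extraction (the split-on-spaces tokenizing is invented; the point is that the iteration logic is now public and testable):
public class Tokenizer
{
    private readonly string[] _terms;
    private int _position;

    public Tokenizer(string expression)
    {
        _terms = expression.Split(' ');
    }

    public bool HasMoreTerms()
    {
        return _position < _terms.Length;
    }

    public string NextTerm()
    {
        return _terms[_position++];
    }
}

// RuleParser now delegates to a Tokenizer, and the iteration
// behaviour can be tested directly:
var tokenizer = new Tokenizer("a AND b");
while (tokenizer.HasMoreTerms())
{
    Console.WriteLine(tokenizer.NextTerm());
}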
So if you apply that to your huge piece of code, and extract pieces of it into other classes with fewer responsibilities, where it would feel more natural to make these methods public, you will be able to test them easily too.
You said you are accessing about 40 different tables to map them. Why not break that into classes for each part of the mapping?
It's a bit hard to reason about code I can't read. You may have other issues that prevent you from doing this, but that's my best try at it.
Hope this helps
Good luck :)
I really find it hard to accept that you've got multiple ~3.5 kline data-export functions with no common functionality at all between them. If that's in fact the case, then maybe unit testing is not what you need to be looking at here. If there really is only one thing that each export module does, and it's essentially indivisible, then maybe a snapshot-comparison, data-driven integration test suite is what's called for.
If there are common bits of functionality, then extract each of them out (as separate classes) and test them individually. Those little helper classes will naturally have different public interfaces, which should reduce the problem of private APIs that can't be tested.
You don't give any details about what the actual output formats look like, but if they're generally tabular, fixed-width or delimited text, then you ought at least to be able to split the exporters up into structural and formatting code. By which I mean, instead of your example code up above, you'd have something like:
public void ExportPaychecks(HeaderFormatter h, CheckRowFormatter f)
{
    var pays = _pays.GetPaysForCurrentDate();
    foreach (PayObject pay in pays)
    {
        h.formatHeader(pay);
        f.WriteDetailRow(pay);
    }
}
The HeaderFormatter and CheckRowFormatter abstract classes would define a common interface for those types of report elements, and the individual concrete subclasses (for the various reports) would contain logic for removing duplicate rows, for example (or whatever a particular vendor requires).
Another way to slice this is to separate data extraction and formatting from each other. Write code that extracts all the records from the various databases into an intermediate representation that's a super-set of the needed representations, then write relatively simple-minded filter routines that convert from the uber-format down to the required format for each vendor.
After thinking about this a little more, I realize you've identified this as an ETL application, but your example seems to combine all three steps. That suggests a first step would be to split things up so that all the data is extracted first, then translated, then stored. You can certainly test at least those steps separately.
I maintain some reports similar to what you describe, but not as many of them and with fewer database tables. I use a 3-fold strategy that might scale well enough to be useful to you:
At the method level, I unit test anything I subjectively deem to be 'complicated'. This includes 100% of bug fixes, plus anything that just makes me feel nervous.
At the module level, I unit test the main use cases. As you have encountered, this is fairly painful since it does require somehow mocking the data. I have accomplished this by abstracting the database interfaces (i.e. no direct SQL connections within my reporting module). For some simple tests I have typed the test data by hand, for others I have written a database interface that records and/or plays back queries, so that I can bootstrap my tests with real data. In other words, I run once in record mode and it not only fetches real data but also saves a snapshot for me in a file; when I run in playback mode, it consults this file instead of the real database tables. (I'm sure there are mocking frameworks that can do this, but since every SQL interaction in my world has the signature Stored Procedure Call -> Recordset it was quite simple just to write it myself.) See the sketch after this list.
I'm fortunate to have access to a staging environment with a full copy of production data, so I can perform integration tests with full regression against previous software versions.
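Here is a rough sketch of the record/playback idea from the second point (the interface and key scheme are invented; real recordings would be serialized to and from a file):
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public interface IDataAccess
{
    DataTable ExecuteProcedure(string procedureName, params object[] args);
}

// Playback implementation: answers queries from a pre-recorded dictionary
// instead of the real database.
public class PlaybackDataAccess : IDataAccess
{
    private readonly IDictionary<string, DataTable> _recorded;

    public PlaybackDataAccess(IDictionary<string, DataTable> recorded)
    {
        _recorded = recorded;
    }

    public DataTable ExecuteProcedure(string procedureName, params object[] args)
    {
        // The key combines the procedure name and its arguments, so each
        // distinct call maps to the snapshot captured in record mode.
        var key = procedureName + "(" + string.Join(",", args.Select(a => Convert.ToString(a)).ToArray()) + ")";
        return _recorded[key];
    }
}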
Have you looked into Moq?
Quote from the site:
Moq (pronounced "Mock-you" or just "Mock") is the only mocking library for .NET developed from scratch to take full advantage of .NET 3.5 (i.e. Linq expression trees) and C# 3.0 features (i.e. lambda expressions) that make it the most productive, type-safe and refactoring-friendly mocking library available.
