Enhancing testability by decomposing batch tasks - C#

I can't seem to find much information on this so I thought I'd bring it up here. One of the issues I often find myself running into is unit testing the creation of a single object while processing a list. For example, I'd have a method signature such as IEnumerable<Output> Process(IEnumerable<Input> inputs). When unit testing a single input I would create a list of one input and simply call First() on the results and ensure it is what I expect it to be. This would lead to something such as:
public class BatchCreator
{
    public IEnumerable<Output> Create(IEnumerable<Input> inputs)
    {
        foreach (var input in inputs)
        {
            Console.WriteLine("Creating Output...");
            yield return new Output();
        }
    }
}
My current thinking is that maybe one class should be responsible for the object's creation while another class is responsible for orchestrating my list of inputs. See the example below.
public interface ICreator<in TInput, out TReturn>
{
    TReturn Create(TInput input);
}
public class SingleCreator : ICreator<Input, Output>
{
    public Output Create(Input input)
    {
        Console.WriteLine("Creating Output...");
        return new Output();
    }
}
public class CompositeCreator : ICreator<IEnumerable<Input>, IEnumerable<Output>>
{
    private readonly ICreator<Input, Output> _singleCreator;

    public CompositeCreator(ICreator<Input, Output> singleCreator)
    {
        _singleCreator = singleCreator;
    }

    public IEnumerable<Output> Create(IEnumerable<Input> inputs)
    {
        return inputs.Select(input => _singleCreator.Create(input));
    }
}
With what's been posted above, I can easily test that I'm able to create one single instance of Output given an Input. Note that I do not need to call SingleCreator anywhere else in the code base other than from CompositeCreator. Creating ICreator would also give me the benefit of reusing it for other times I need to do similar tasks, which currently comes up 2-3 other times in my project.
Anyone have any experience with this that could shed some light? Am I simply overthinking this? Suggestions are greatly appreciated.

Generally speaking, there's nothing inherently wrong with your reasoning. More or less that's how the issue can be solved.
However, your CompositeCreator isn't actually composite, since it uses precisely one "creation method".
It's difficult to say anything more, because we don't know your project internals, but if it integrates well into your use cases, then it's fine. What I'd try is to stay with ICreator<Tin, Tout> only and make an extension method IEnumerable<Tout> CreateMany(this IEnumerable<Tin> c) to deal with collections. You can test both easily and independently (fake ICreator and check whether the collection of inputs is processed). This way you get rid of ICreator<IEnumerable, ...>, which is usually good, because operating on a collection as a whole and operating on individual items often don't go well together.
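The extension-method approach this answer describes can be sketched roughly as follows; the interface matches the one from the question, while CreatorExtensions and FakeCreator are hypothetical names invented here only to show how each piece becomes testable in isolation:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Interface as in the question.
public interface ICreator<in TInput, out TReturn>
{
    TReturn Create(TInput input);
}

public static class CreatorExtensions
{
    // Lifts a single-item creator over any collection, lazily via LINQ.
    public static IEnumerable<TOut> CreateMany<TIn, TOut>(
        this IEnumerable<TIn> inputs, ICreator<TIn, TOut> creator)
    {
        return inputs.Select(creator.Create);
    }
}

// A fake creator (hypothetical, for testing only): the collection logic can
// now be verified without touching the real creation code.
public class FakeCreator : ICreator<int, string>
{
    public string Create(int input) { return "out:" + input; }
}
```

With this in place, `new[] { 1, 2 }.CreateMany(new FakeCreator())` yields `"out:1", "out:2"`, and the single-item creator can be tested entirely on its own.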

I'm not entirely sure why you need the IEnumerable input/output version (the composite creator), unless it is more than just a collection, as that's a problem already solved by LINQ, which would look something like:
var singleCreator = new SingleCreator();
var outputs = InputEnumerable.Select(singleCreator.Create);
I think this is subjective, and depends on the complexity of the classes you are passing around - if it's not just an IEnumerable then it's worthwhile having some sort of multiple creator, which may or may not need to be a class.

Related

C# refactoring considerations

I have the following doubt.
For refactoring, I have read that it is good to create methods that have a very specific responsibility, so if possible, it is a good idea to split a complex method into other, smaller methods.
But imagine that I have this case:
I have to create a list of objects, and inside these objects, I have to create another object. Something like this:
public void myComplexMethod(List<MyTypeA> paramObjectsA)
{
    foreach (MyTypeA iteratorA in paramObjectsA)
    {
        //Create myObjectB of type B
        //Create myObjectC of type C
        myObjectB.MyPropertyTypeC = myObjectC;
    }
}
I can split this method in two methods.
public void myMethodCreateB(List<MyTypeA> paramObjectsA)
{
    foreach (MyTypeA iteratorA in paramObjectsA)
    {
        //Create myObjectB of type B
    }
}
public void myMethodCreateC(List<MyTypeB> paramObjectsB)
{
    foreach (MyTypeB iteratorB in paramObjectsB)
    {
        //Create myObjectC of type C
        iteratorB.PropertyC = myObjectC;
    }
}
In the second option, when I use two methods instead of one, the unit tests are less complex, but the problem is that I use two foreach loops, so it is less efficient than using only one loop as in the first option.
So, what is the best practice, at least in general: to use a slightly more complex method to be more efficient, or to use more methods?
Thanks so much.
I generally put readability at a higher priority than performance, until proven otherwise. I'm generalizing now a bit but in my experience, when people focus on performance too much at the code level, the result is less maintainable code, it distracts them from creating functionally correct code, it takes longer (=more money), and possibly results in even less performant code.
So don't worry about it and use the more readable approach. If your app is really too slow in the end, run it through a profiler and pinpoint (and prove) the one or two places where it requires optimization. I can guarantee you it won't be this code.
Making the correct choices at the architectural level early on is much more critical because you won't be able to easily make changes at that level once your app is built.
Usually I would keep using one for-loop in this case.
It seems you are just creating and decorating objects of MyTypeB.
I would prefer to create a factory method in class MyTypeB:
static MyTypeB Create(MyTypeA a) // if the creation of MyTypeB depends on A
{
    //Create myObjectB of type B
    //Create myObjectC of type C
    myObjectB.MyPropertyTypeC = myObjectC;
    return myObjectB;
}
then your complex method will become:
public void myComplexMethod(List<MyTypeA> paramObjectsA)
{
    foreach (MyTypeA iteratorA in paramObjectsA)
    {
        MyTypeB myObjectB = MyTypeB.Create(iteratorA);
    }
}

Should I define custom enumerator or use built-in one?

I've been given some code from a customer that looks like this:
public class Thing
{
    // custom functionality for Thing...
}
public class Things : IEnumerable
{
    Thing[] things;
    internal int Count { get { return things.Length; } }
    public Thing this[int i] { get { return this.things[i]; } }
    public IEnumerator GetEnumerator() { return new ThingEnumerator(this); }
    // custom functionality for Things...
}
public class ThingEnumerator : IEnumerator
{
    int i;
    readonly int count;
    Things container;

    public ThingEnumerator(Things container)
    {
        i = -1;
        count = container.Count;
        this.container = container;
    }

    public object Current { get { return this.container[i]; } }
    public bool MoveNext() { return ++i < count; }
    public void Reset() { i = -1; }
}
What I'm wondering is whether it would have been better to have gotten rid of the ThingEnumerator class and replaced the Things.GetEnumerator call with an implementation that simply delegated to the array's GetEnumerator? Like so:
public IEnumerator GetEnumerator() { return things.GetEnumerator(); }
Are there any advantages to keeping the code as is? (Another thing I've noticed is that the existing code could be improved by replacing IEnumerator with IEnumerator<Thing>.)
With generics, there is really little value in implementing IEnumerable and IEnumerator yourself.
Removing these and replacing the class with a generic collection means you have far less code to maintain, with the added advantage of using code that is known to work.
In the general case, there can sometimes be a reason to implement your own enumerator. You might want some functionality that the built-in one doesn't offer - some validation, logging, raising OnAccess-type events somewhere, perhaps some logic to lock items and release them afterwards for concurrent access (I've seen code that does that last one; it's odd and I wouldn't recommend it).
Having said that, I can't see anything like that in the example you've posted, so it doesn't seem to be adding any value beyond what IEnumerable provides. As a rule, if there's built-in code that does what you want, use it. All you'll achieve by rolling your own is to create more code to maintain.
The code you have looks like code that was written for .NET 1.0/1.1, before .NET generics were available - at that time, there was value in implementing your own collection class (generally derived from System.Collections.CollectionBase) so that the indexer property could be typed to the runtime type of the collection.
However, unless you were using value types and boxing/unboxing was the performance bottleneck, I would have inherited from CollectionBase, and there would be no need to redefine GetEnumerator() or Count.
However, now, I would recommend one of these two approaches:
If you need the custom collection to have some custom functionality, then derive the collection from System.Collections.ObjectModel.Collection<Thing> - it provides all the necessary hooks for you to control insertion, replacement and deletion of items in the collection.
If you actually only need something that needs to be enumerated, I would return a standard IList<Thing> backed by a List<Thing>.
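The first approach might be sketched roughly like this, assuming Collection<Thing> from System.Collections.ObjectModel; the null check in InsertItem is just an illustrative use of the hooks, not something the original code required:

```csharp
using System;
using System.Collections.ObjectModel;

public class Thing
{
    // custom functionality for Thing...
}

// Collection<T> supplies the typed indexer, Count and a generic enumerator;
// its virtual methods let you intercept insertion, replacement and deletion.
public class Things : Collection<Thing>
{
    protected override void InsertItem(int index, Thing item)
    {
        // Illustrative validation hook: reject null items.
        if (item == null) throw new ArgumentNullException("item");
        base.InsertItem(index, item);
    }
}
```

Callers get `things[0]`, `things.Count` and `foreach` support for free, while the override runs on both `Add` and `Insert`.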
Unless you are doing something truly custom (such as some sort of validation) in the custom enumerator, there really isn't any reason to do this, no.
Generally, go with what is available in the standard libraries unless there is a definite reason not to. They are likely better tested and have had more time spent on them, as individual units of code, than you can afford to spend; why reinvent the wheel?
In cases like this, the code already exists but it may still be better to replace the code if you have time to test very well. (It's a no-brainer if there is decent unit test coverage.)
You'll be reducing your maintenance overhead, removing a potential source of obscure bugs and leaving the code cleaner than you found it. Uncle Bob would be proud.
An array enumerator does pretty much the same as your custom enumerator, so yes, you can just as well return the array's enumerator directly.
In this case, I would recommend you do it, because array enumerators also perform more error checking and, as you stated, it's just simpler.
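Putting the advice from these answers together, a delegating, generic version of the container might look something like this (a sketch, not the customer's actual code; the constructor is added here so the example is self-contained). Since Thing[] already implements IEnumerable<Thing>, a cast is all that's needed to reach the array's generic enumerator:

```csharp
using System.Collections;
using System.Collections.Generic;

public class Thing
{
    // custom functionality for Thing...
}

public class Things : IEnumerable<Thing>
{
    private readonly Thing[] things;

    public Things(params Thing[] items) { things = items; }

    public Thing this[int i] { get { return things[i]; } }
    public int Count { get { return things.Length; } }

    // Delegate to the array's built-in generic enumerator rather than
    // maintaining a hand-written ThingEnumerator.
    public IEnumerator<Thing> GetEnumerator()
    {
        return ((IEnumerable<Thing>)things).GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}
```

The non-generic GetEnumerator is still provided (explicitly) because IEnumerable<T> requires it, but all the iteration logic now lives in code that is already tested.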

Need a C# example of unintended consequences

I am putting together a presentation on the benefits of Unit Testing and I would like a simple example of unintended consequences: Changing code in one class that breaks functionality in another class.
Can someone suggest a simple, easy to explain an example of this?
My plan is to write unit tests around this functionality to demonstrate that we know we broke something by immediately running the test.
A slightly simpler, and thus perhaps clearer, example is:
public string GetServerAddress()
{
    return "127.0.0.1";
}
public void DoSomethingWithServer()
{
    Console.WriteLine("Server address is: " + GetServerAddress());
}
If GetServerAddress is changed to return an array:
public string[] GetServerAddress()
{
    return new string[] { "127.0.0.1", "localhost" };
}
The output from DoSomethingWithServer will be somewhat different, but it will all still compile, making for an even subtler bug.
The first (non-array) version will print Server address is: 127.0.0.1 and the second will print Server address is: System.String[], this is something I've also seen in production code. Needless to say it's no longer there!
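The silent behavior change can be reproduced side by side; this is a sketch with invented names (ServerDemo, the V1/V2 suffixes) showing why both versions compile while only one prints the intended message:

```csharp
using System;

public static class ServerDemo
{
    // The two versions from the example above, side by side.
    public static string GetServerAddressV1() { return "127.0.0.1"; }
    public static string[] GetServerAddressV2() { return new[] { "127.0.0.1", "localhost" }; }

    public static void Main()
    {
        // Both lines compile: string + object is legal, and concatenation
        // silently calls ToString() on the array.
        Console.WriteLine("Server address is: " + GetServerAddressV1()); // Server address is: 127.0.0.1
        Console.WriteLine("Server address is: " + GetServerAddressV2()); // Server address is: System.String[]
    }
}
```

A unit test asserting on the exact message would fail immediately after the signature change, which is exactly the point of the presentation.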
Here's an example:
class DataProvider
{
    public static IEnumerable<Something> GetData()
    {
        return new Something[] { ... };
    }
}
class Consumer
{
    void DoSomething()
    {
        Something[] data = (Something[])DataProvider.GetData();
    }
}
Change GetData() to return a List<Something>, and Consumer will break.
This might seem somewhat contrived, but I've seen similar problems in real code.
Say you have a method that does:
abstract class ProviderBase<T>
{
    public IEnumerable<T> Results
    {
        get
        {
            List<T> list = new List<T>();
            using (IDataReader rdr = GetReader())
                while (rdr.Read())
                    list.Add(Build(rdr));
            return list;
        }
    }

    protected abstract IDataReader GetReader();
    protected abstract T Build(IDataReader rdr);
}
With various implementations being used. One of them is used in:
public bool CheckNames(NameProvider source)
{
    IEnumerable<string> names = source.Results;
    switch (names.Count())
    {
        case 0:
            return true; //obviously none invalid.
        case 1:
            //having one name to check is a common case and for some reason
            //allows us some optimal approach compared to checking many.
            return FastCheck(names.Single());
        default:
            return NormalCheck(names);
    }
}
Now, none of this is particularly weird. We aren't assuming a particular implementation of IEnumerable. Indeed, this will work for arrays and very many commonly used collections (I can't think of one in System.Collections.Generic that doesn't match, off the top of my head). We've only used the normal methods and the normal extension methods. It's not even unusual to have an optimised case for single-item collections. We could, for instance, change the list to an array, or maybe a HashSet (to automatically remove duplicates), or a LinkedList, or a few other things, and it'll keep working.
Still, while we aren't depending on a particular implementation, we are depending on a particular feature, specifically that of being rewindable (Count() will either call ICollection.Count or else enumerate through the enumerable, after which the name-checking will take place).
Someone, though, sees the Results property and thinks "hmm, that's a bit wasteful". They replace it with:
public IEnumerable<T> Results
{
    get
    {
        using (IDataReader rdr = GetReader())
            while (rdr.Read())
                yield return Build(rdr);
    }
}
This again is perfectly reasonable, and will indeed lead to a considerable performance boost in many cases. But if CheckNames isn't hit in the immediate tests done by the coder in question (maybe it isn't hit in a lot of code paths), the breakage goes unnoticed: CheckNames will now error (and possibly return a false result in the case of more than one name, which may be even worse if it opens a security risk).
Any unit test that hits on CheckNames with the more than zero results is going to catch it though.
Incidentally, a comparable (if more complicated) change was the reason for a backwards-compatibility feature in Npgsql. It wasn't quite as simple as replacing a List.Add() with a yield return, but a change to the way ExecuteReader worked gave a comparable improvement, from O(n) to O(1), in obtaining the first result. However, before the change NpgsqlConnection allowed users to obtain another reader from a connection while the first was still open, and afterwards it didn't. The docs for IDbConnection say you shouldn't do this, but that didn't mean there was no running code that did. Luckily, one such piece of running code was an NUnit test, and a backwards-compatibility feature was added to allow such code to continue to function with just a change to configuration.
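The double-enumeration hazard described above can be simulated without a database; in this sketch the "reader open" is replaced by a counter, and all names are invented for the demonstration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyDemo
{
    public static int ReaderOpens;

    // Simulates the rewritten yield-based Results property: each enumeration
    // re-runs the body, i.e. re-opens the "reader" (here just counted).
    public static IEnumerable<string> LazyResults()
    {
        ReaderOpens++;
        yield return "alice";
    }

    public static void Main()
    {
        IEnumerable<string> names = LazyResults();
        int count = names.Count();      // first enumeration
        string only = names.Single();   // second enumeration
        Console.WriteLine(ReaderOpens); // 2 -- the query effectively ran twice
    }
}
```

With the original List<T>-buffered version, ReaderOpens would stay at 1 no matter how many times the result was enumerated; that is precisely the feature CheckNames was silently depending on.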

Static Methods vs Class Instances and return values in C#

I have various classes for handling form data and querying a database. I need some advice on reducing the amount of code I write from site to site.
The following code is for handling a form posted via ajax to the server. It simply instantiates a Form class, validates the data and processes any errors:
public static string submit(Dictionary<string, string> d)
{
    Form f = new Form("myform");
    if (!f.validate(d))
    {
        return f.errors.toJSON();
    }
    //process form...
}
Is there a way to reduce this down to 1 line as follows:
if (!Form.validate("myform", d)){ return Form.errors.toJSON(); }
Let's break that down into two questions.
1) Can I write the existing logic all in one statement?
The local variable has to be declared in its own statement, but the initializer doesn't have to be there. It's perfectly legal to say:
Form f;
if (!(f = new Form("myform")).validate(d)) return f.errors.toJSON();
Why you would want to is beyond me; doing so is ugly, hard to debug, hard to understand, and hard to maintain. But it's perfectly legal.
2) Can I make this instance method into a static method?
Probably not directly. Suppose you had two callers validating stuff on two different threads, both calling the static Form.Validate method, and both producing errors. Now you have a race. One of them is going to win and fill in Form.Errors. And now you have two threads reporting the same set of errors, but the errors are wrong for one of them.
The better way to make this into a static method is to make the whole thing into a static method that has the desired semantics, as in plinth's answer.
Errors errors = Validator.Validate(d);
if (errors != null) return errors.toJSON();
Now the code is very clear, and the implementation of Validate is straightforward. Create a form, call the validator, either return null or the errors.
I would suggest that you don't need advice on reducing the amount of code you write. Rather, get advice on how to make the code read more like the meaning it intends to represent. Sometimes that means writing slightly more code, but that code is clear and easy to understand.
I would move all common validation logic to a superclass.
I think the main problem with your code is not that it is long, but that you're repeating it in many places; even if you manage to make it a one-liner, it would not be DRY.
Take a look at the Template Method pattern, it might help here (The abstract class with the validation would be the Template and your specific 'actions' would be the subclasses).
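A minimal Template Method sketch along these lines (all class and member names here are invented for illustration; the base class fixes the validate-then-process flow once, and each concrete form fills in only the varying steps):

```csharp
using System.Collections.Generic;

public abstract class FormHandler
{
    // The template method: the invariant algorithm lives here, once.
    public string Submit(Dictionary<string, string> data)
    {
        string errors = Validate(data);
        if (errors != null)
            return errors;      // e.g. the errors already serialized to JSON
        return Process(data);
    }

    // The "specific actions" each form supplies.
    protected abstract string Validate(Dictionary<string, string> data);
    protected abstract string Process(Dictionary<string, string> data);
}

// One concrete subclass, purely as an example.
public class MyFormHandler : FormHandler
{
    protected override string Validate(Dictionary<string, string> data)
    {
        return data.ContainsKey("name") ? null : "{\"error\":\"name missing\"}";
    }

    protected override string Process(Dictionary<string, string> data)
    {
        return "ok";
    }
}
```

Each new form then costs only the two overrides, and the submit/validate/error-reporting plumbing is written exactly once.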
Of course you could write this:
public static string FormValidate(Dictionary<string, string> d)
{
    Form f = new Form("myform");
    if (!f.validate(d))
        return f.errors.ToJSON();
    return null;
}
then your submit can be:
public static string submit(Dictionary<string, string> d)
{
    string errs = FormValidate(d);
    if (errs != null) { return errs; }
    // process form
}
That cuts down your code and doesn't hurt readability much at all.
If you really, really wanted to, you could store the error text in a thread-local property.
Does C# have a "ThreadLocal" analog (for data members) to the "ThreadStatic" attribute?
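A small sketch of that thread-local idea using ThreadLocal<T> (available since .NET 4; the class and field names here are invented for the demo):

```csharp
using System;
using System.Threading;

public static class ThreadLocalDemo
{
    // ThreadLocal<T> gives each thread its own value, avoiding the race
    // that shared static error state would have.
    public static readonly ThreadLocal<string> LastErrors =
        new ThreadLocal<string>(() => null);

    public static void Main()
    {
        LastErrors.Value = "errors from main thread";

        string seenOnOtherThread = "unset";
        var t = new Thread(() => { seenOnOtherThread = LastErrors.Value; });
        t.Start();
        t.Join();

        Console.WriteLine(LastErrors.Value);          // errors from main thread
        Console.WriteLine(seenOnOtherThread == null); // True: each thread has its own slot
    }
}
```

Note that the earlier answer's caution still applies: returning the errors directly is clearer than stashing them in any kind of shared state, thread-local or not.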

C#: Why can't we have inner methods / local functions?

Very often it happens that I have private methods which become very big and contain repeated tasks, but these tasks are so specific that it doesn't make sense to make them available to any other part of the code.
So it would be really great to be able to create 'inner methods' in this case.
Is there any technical (or even philosophical?) limitation that prevents C# from giving us this? Or did I miss something?
Update from 2016: This is coming and it's called a 'local function'. See marked answer.
Well, we can have "anonymous methods" defined inside a function (I don't suggest using them to organize a large method):
void test()
{
    Action t = () => Console.WriteLine("hello world"); // C# 3.0+
    // Action t = delegate { Console.WriteLine("hello world"); }; // C# 2.0+
    t();
}
If something is long and complicated then usually it's good practice to refactor it into a separate class (either normal or static, depending on context); there you can have private methods which will be specific to this functionality only.
I know a lot of people don't like regions, but this is a case where they could prove useful, by grouping your specific methods into a region.
Could you give a more concrete example? After reading your post I have the following impression, which is of course only a guess, due to limited information:
Private methods are not available outside your class, so they are hidden from any other code anyway.
If you want to hide private methods from other code in the same class, your class might be to big and might violate the single responsibility rule.
Have a look at anonymous delegates and lambda expressions. It's not exactly what you asked for, but they might solve most of your problems.
Achim
If your method becomes too big, consider putting it in a separate class, or to create private helper methods. Generally I create a new method whenever I would normally have written a comment.
The better solution is to refactor this method into a separate class. Create an instance of this class as a private field in your initial class. Make the big method public and refactor it into several private methods, so it will be much clearer what it does.
Seems like we're going to get exactly what I wanted with Local Functions in C# 7 / Visual Studio 15:
https://github.com/dotnet/roslyn/issues/2930
private int SomeMethodExposedToObjectMembers(int input)
{
    int InnerMethod(bool b)
    {
        // TODO: Change return based on parameter b
        return 0;
    }

    var calculation = 0;
    // TODO: Some calculations based on input, store result in calculation

    if (calculation > 0) return InnerMethod(true);
    return InnerMethod(false);
}
Too bad I had to wait more than 7 years for this :-)
See also other answers for earlier versions of C#.
