Need assistance in structuring my code with design patterns.
So I have a single method in service class that looks like this. The constructor loads a json file into a List<Option> of Option class instances. Finally when the service method is called it does some logic to find the Option match based on the parameter values and returns a new instance of 'Tool' class with the configured "options".
public Tool BestOptionFinderService.FindBestMatch(string value 1, int value2, int value3, .. bool value20, etc...) {..}
I'm not sure if I a "service" class is correct for this versus a "factory" or something else. I would appreciate your thoughts and suggestions on how you would design your code for this problem or similar.
I would say, having single method with ~20 params is awful by itself, without even considering OOP patterns. Let's try to do better; more than that, let's try not to reinvent the wheel.
Important yet obvious observation: whatever matching logic one has, any object either matches or nor, and never both. Thus it makes sense to stick with Boolean algebra and signatures like predicate: t -> bool or bool Predicate<T>(T obj). The other thing we know about predicates, is that one can easily reduce arbitrary two into a single one: there are different ways to do it, however you're obviously interested in the and or && operator.
Thus, instead of having 20 parameters, you could have had twenty different simple, clear, self-describing predicates, later on "reduced" into a single instance. Linq.Aggregate() could help you out to not only beautify your code, but make it parallel (if necessary) as well. Then you could map your input to an sufficient (it's up to you to decide whether you do need more, or don't) number of flags, representing a "fitness" of a particular object you're examining.
Such an approach would indeed be better because:
1) You stick to well-known Boolean algebra,
1.A) which, actually, forms a monoid under various operation, including and, which, in turn, makes it
1.B) easily distributable over any computation cluster
1.C) compact, expressive, self-explanatory.
2) Your code tells much more story: simple predicates are easy to read, maintain, unit-test and refactor, which is never the case for 20-params methods.
3) You clearly separate your primitives from a compositions - more complex structures, built up from somewhat known primitives. That's the way math goes and thus the only known reliable, proven by thousands of years way to tackle down the complexity and not get enslaved by it at some point.
4) There are more advantages, but I tired to type ;)
Related
Consider the following class:
public class Person
{
public String FirstName;
public String LastName;
public DateTime DateOfBirth;
public StaffType PersonType;
}
And the following 2 methods and method calls:
public void DoSomethingWithPersonType(Person person)
{
TestMethod(person.PersonType);
}
(called by DoSomethingWithPersonType(person);)
public void DoSomethingWithPersonType(StaffType personType)
{
TestMethod(personType)
}
(called by DoSomethingPersonType(person.PersonType);).
Which method is more efficient? Are they both as 'efficient' as each other? Does it make no difference because it's only one reference passed in anyway? Or does the reference passed differ in some way?
The reason I ask is because I've opted for the first method signature in a current project - and I've done it because at a later date we may need to use other fields of our 'person' - so a more easily scalable/adaptable method is what I've got - but is it a more expensive one?
EDIT: Note - it goes without saying that these differences will be marginal, otherwise I would've already encountered scenarios were the performance implications are evident enough for me to act on. It's simply a matter of curiosity.
Performance wise they will be identical. In both cases you are passing a single reference (which has the same performance cost) and in both cases you're getting the PersonType object out of one of the fields.
The only difference here is in terms of code readability/maintainability, etc. In general, it's best for methods to only accept what information they need. If you only need a PersonType then only accept a PersonType. It allows that method to be used more flexibly. (What if something other than a Person has a PersonType?) If you think you'll actually need more than just the person type than accepting a Person may be appropriate.
Also note that even if there is a difference here, it would most certainly be tiny. You shouldn't think about every little think in terms of how expensive it is to run. Computers are fast these days. Even poor algorithms tend to take little time to run in practice (from the point of view of a person). If you get to the point where your program is taking longer to run than it needs to, then it's time to start looking for places to improve. You should focus on those areas of the code (identified with a profiler) as being a significant percentage of the processor time. Network communication (including database calls), and other IO tend to be good targets for optimizations.
Also note that it's generally considered bad practice to have public fields. At the very least, you should probably have public properties instead.
Whatever you decide to pass to the method performance implications will depend on context. If you pass parameters between methods in the same class performance implication will be immeasurably small, if you call a method over RMI it is a different matter
That depends somewhat on the type of StaffType, but normally the performance difference is negligible.
If StaffType if a class, then there isn't any difference, as both are references. If it's an enum then it would be marginally faster to pass a StaffType value than a reference.
(If StaffType is a large struct, then it would be slower to pass, but then the actual problem is a badly designed struct, not how you use it.)
So, you should use the one that makes most sense. Usually you should send as little information as possible, just to make it easier to see what the method actually uses. Whether you should send the StaffType value or the entire Person object for future possible expansion is hard to tell, but consider the YAGNI principle - You Ain't Gonna Need It.
Presumably, StaffType is an enum. In this case, it should be marginally more efficient than passing an object reference, the enum will create less work for garbage collection.
But any difference will be negligible, I would rate extensibility as far more important in this case.
Speaking of efficiency, they are equal becuase in both cases you'll pass a reference (speaking of two classes, else if StaffType is a struct the second one will be slower). But in terms of Best Coding Practices, it depends on what do you want to do. If you don't need any other property from the Person class you should use the second one, because you should pass less information as possible, but if you have to work more on the person object, you should use the first solution.
But you have to think to the re-usability of the method and your framework, let's say you need to use this method not only for the Person type but you also need it for another "type" you'll have to rewrite another method for it, and the aim of object oriented programming, which is code less, will disappear.
I have a method that takes 30 parameters. I took the parameters and put them into one class, so that I could just pass one parameter (the class) into the method. Is it perfectly fine in the case of refactoring to pass in an object that encapsulates all the parameters even if that is all it contains.
That is a great idea. It is typically how data contracts are done in WCF for example.
One advantage of this model is that if you add a new parameter, the consumer of the class doesn't need to change just to add the parameter.
As David Heffernan mentions, it can help self document the code:
FrobRequest frobRequest = new FrobRequest
{
FrobTarget = "Joe",
Url = new Uri("http://example.com"),
Count = 42,
};
FrobResult frobResult = Frob(frobRequest);
While other answers here are correctly point out that passing an instance of a class is better than passing 30 parameters, be aware that a large number of parameters may be a symptom of an underlying issue.
E.g., many times static methods grow in their number of parameters, because they should have been instance methods all along, and you are passing a lot of info that could more easily be maintained in an instance of that class.
Alternatively, look for ways to group the parameters into objects of a higher abstraction level. Dumping a bunch of unrelated parameters into a single class is a last resort IMO.
See How many parameters are too many? for some more ideas on this.
It's a good start. But now that you've got that new class, consider turning your code inside-out. Move the method which takes that class as a parameter into your new class (of course, passing an instance of the original class as the parameter). Now you've got a big method, alone in a class, and it will be easier to tease it apart into smaller, more manageable, testable methods. Some of those methods might move back to the original class, but a fair chunk will probably stay in your new class. You've moved beyond Introduce Parameter Object on to Replace Method with Method Object.
Having a method with thirty parameters is a pretty strong sign that the method is too long and too complicated. Too hard to debug, too hard to test. So you should do something about it, and Introduce Parameter Object is a fine place to start.
Whilst refactoring to a Parameter Object isn't in itself a bad idea it shouldn't be used to hide the problem that a class that needs 30 pieces of data provided from elsewhere could still be something of a code smell. The Introduce Parameter Object refactoring should probably be regarded as a step along the way in a broader refactoring process rather than the end of that procedure.
One of the concerns that it doesn't really address is that of Feature Envy. Does the fact that the class being passed the Parameter Object is so interested in the data of another class not indicate that maybe the methods that operate on that data should be moved to where the data resides? It's really better to identify clusters of methods and data that belong together and group them into classes, thereby increasing encapsulation and making your code more flexible.
After several iterations of splitting off behaviour and the data it operates on into separate units you should find that you no longer have any classes with enormous numbers of dependencies which is always a better end result because it'll make your code more supple.
That is an excellent idea and a very common solution to the problem. Methods with more than 2 or 3 parameters get exponentially harder and harder to understand.
Encapsulating all this in a single class makes for much clearer code. Because your properties have names you can write self-documenting code like this:
params.height = 42;
params.width = 666;
obj.doSomething(params);
Naturally when you have a lot of parameters the alternative based on positional identication is simply horrid.
Yet another benefit is that adding extra parameters to the interface contract can be done without forcing changes at all call sites. However, this is not always as trivial as it seems. If different call sites require different values for the new parameter, then it is harder to hunt them down than with the parameter based approach. In the parameter based approach, adding a new parameter forces a change at each call site to supply the new parameter and you can let the compiler do the work of finding them all.
Martin Fowler calls this Introduce Parameter Object in his book Refactoring. With that citation, few would call it a bad idea.
30 parameters is a mess. I think it's way prettier to have a class with the properties. You could even create multiple "parameter classes" for groups of parameters that fit in the same category.
You could also consider using a structure instead of a class.
But what you're trying to do is very common and a great idea!
It can be reasonable to use a Plain Old Data class whether you're refactoring or not. I'm curious as to why you thought it might not be.
Maybe C# 4.0's optional and named parameters be a good alternative to this?
Anyway, the method you are describing can also be good for abstracting the programs behavior. For example you could have one standard SaveImage(ImageSaveParameters saveParams)-function in an Interface where ImageSaveParameters also is an interface and can have additional parameters depending on the image-format. For example JpegSaveParameters has a Quality-property while PngSaveParameters has a BitDepth-property.
This is how the save save-dialog in Paint.NET does it so it is a very real life example.
As stated before: it is the right step to do, but consider the followings too:
your method might be too complex (you should consider dividing it into more methods, or even turn it into a separate class)
if you create the class for the parameters, make it immutable
if many of the parameters could be null or could have some default value, you might want to use the builder pattern for your class.
So many great answers here. I would like to add my two cents.
Parameter object is a good start. But there is more that could be done. Consider the following (ruby examples):
/1/ Instead of simply grouping all the parameters, see if there can be meaningful grouping of parameters. You might need more than one parameter object.
def display_line(startPoint, endPoint, option1, option2)
might become
def display_line(line, display_options)
/2/ Parameter object may have lesser number of properties than the original number of parameters.
def double_click?(cursor_location1, control1, cursor_location2, control2)
might become
def double_click?(first_click_info, second_click_info)
# MouseClickInfo being the parameter object type
# having cursor_location and control_at_click as properties
Such uses will help you discover possibilities of adding meaningful behavior to these parameter objects. You will find that they shake off their initial Data Class smell sooner to your comfort. :--)
In the comments of this answer it is stated that "checking whether the object has implemented the interface , rampant as it may be, is a bad thing"
Below is what I believe is an example of this practice:
public interface IFoo
{
void Bar();
}
public void DoSomething(IEnumerable<object> things)
{
foreach(var o in things)
{
if(o is IFoo)
((IFoo)o).Bar();
}
}
With my curiosity piqued as someone who has used variations of this pattern before, I searched for a good example or explanation of why it is a bad thing and was unable to find one.
While it is very possible that I misunderstood the comment, can someone provide me with an example or link to better explain the comment?
It depends on what you're trying to do. Sometimes it can be appropriate - examples could include:
LINQ to Objects, where it's used to optimize operations like Count which can be performed more efficiently on an IList<T> via the specialized members.
LINQ to XML, where it's used to provide a really friendly API which accepts a wide range of types, iterating over values where appropriate
If you wanted to find all the controls of a certain type under a particular control in Windows Forms, you would want to check whether each control was a container to determine whether or not to recurse.
In other cases it's less appropriate and you should consider whether you can change the parameter type instead. It's definitely a "smell" - normally you shouldn't concern yourself with the implementation details of whatever has been handed to you; you should just use the API provided by the declared parameter type. This is also known as a violation of the Liskov Substitution Principle.
Whatever the dogmatic developers around may say, there are times when you simply do want to check an object's execution time type. It's hard to override object.Equals(object) correctly without using is/as/GetType, for example :) It's not always a bad thing, but it should always make you consider whether there's a better approach. Use sparingly, only where it's genuinely the most appropriate design.
I would personally rather write the code you've shown like this, mind you:
public void DoSomething(IEnumerable<object> things)
{
foreach(var foo in things.OfType<IFoo>())
{
foo.Bar();
}
}
It accomplishes the same thing, but in a neater way :)
I would expect the method to look like this, it seems much safer:
public void DoSomething(IEnumerable<IFoo> things)
{
foreach(var o in things)
{
o.Bar();
}
}
To read about the referred violation of the Liskov Principle: What is the Liskov Substitution Principle?
If you want to know why the commenter made that comment, probably best to ask them to explain.
I would not consider the code you posted to be "bad". A more "genuinely" bad practice is to use interfaces as markers. That is, you're not planning on actually using a method of the interface; rather, you have declared the interface on a class as a way of describing it in some way. Use attributes, not interfaces, as markers on classes.
Marker interfaces are hazardous in a number of ways. A real-world situation I once ran into where an important product made a bad decision on the basis of a marker interface is here: http://blogs.msdn.com/b/ericlippert/archive/2004/04/05/108086.aspx
That said, the C# compiler itself uses a "marker interface" in one situation. Mads tells the story here: http://blogs.msdn.com/b/madst/archive/2006/10/10/what-is-a-collection_3f00_.aspx
A reason is that there will be a dependency on that interface that is not immediately visible without digging in the code.
The statement
checking whether the object has implemented the interface , rampant
as it may be, is a bad thing
Is overly dogmatic in my opinion. As other people have answered, you may well be able to pass a collection of IFoo to your method and achieve the same result.
However, interfaces can be useful to add optional features to classes. For example the .net framework provides the IDataErrorInfo interface*. When this is implemented it indicates to a consumer that in addition to the class' standard functionality, it can also provide error information.
In this case, the error information is optional. A WPF view model may or may not provide error information. Without querying for interfaces, this optional functionality would not be possible without base classes with huge surface area.
*We'll ignore for the moment the terrible design of the IDataErrorInfo interface.
If your method requires that you inject an instance of an interface, you should treat it the same regardless of the implementation.
In your example you generally wouldn't have a generic list of object, but a list of ISomething's and calling an ISomething.Bar() would be implemented by the concrete type, therefore calling it's implementaiton. If that implementation is to do nothing, then you don't have to do a check.
I dislike this whole "switch on type" style of coding for a couple of reasons. (Examples drawn in relation to my industry, game development. Apologies in advance. :) )
First and foremost, I think it's sloppy to have a heterogeneous collection of items. E.g. I could have a collection of "everything everywhere," but then when iterating the collection to apply bullet effects or fire damage or enemy AI, I have to walk this list which is mostly stuff I don't care about. It's much "cleaner" IMHO to have separate collections of bullets, raging fires, and enemies. Note that there's no reason why I can't have a single item in multiple collections; a single burning robotic missile could be referenced in all three of those lists to do parts of its "update" as appropriate for the three types of logic it needs to run. Outside of having "one single collection that references everything," I think a collection containing everything everywhere is not terribly useful; you can't do anything with anything in the list unless you query it for what it can do.
I hate doing unnecessary work. This really ties into the above, but when you create a given thing you know what its capabilities are (or can query them at that point), so you might as well take the opportunity at that time to put them in the right more specific collections. You have 16ms to process everything in the world, do you want to waste your time dealing with, querying, and selecting from generic things, or do you want to get down to business and operate only on the specific things you care about?
In my experience, transforming a codebase from generic operation on heterogeneous datasets to one that has homogeneous datasets has resulted in not only performance increases but also comprehension increases that come from simpler code doing more obvious work and in general a reduction in the amount of code required to do any given task.
So yeah, it's dogmatic to say that querying interfaces is bad, but it does seem to make things simpler if you can figure out how to avoid needing to query anything. As for my "performance" statements and the counter that "if you don't measure it, you can't say anything about it," it should be obvious that not doing something is faster than doing it. Whether or not this is important to an individual project, programmer, or function is up to the person with the editor, but if I can simplify code and while doing so make it do less work for the same results, I'm going to do it without bothering to measure.
I don’t see this as a “bad thing” at all, at least not in itself. The code is merely a literal transcription of “x all of the y in z”, and in a situation where you need to do that, it’s perfectly acceptable. You can of course use things.OfType<Foo>() for the sake of concision.
The main reason to recommend against it is that, according to OOP theology, interfaces are intended to model the different kinds of “black box” for which an object may substituted. Predicating an algorithm on fulfillment of an interface constitutes moving behaviour to the algorithm that should be in that interface.
Essentially, an interface is a behavioural role. If you think OOP is a good idea, then you should use interfaces only to model behaviours, so that algorithms don’t have to. I don’t think what passes for OOP these days is in fact a good idea, so this is as far as my answer can be useful.
My question is about naming, design, and implementation choices. I can see myself going in two different directions with how to solve an issue and I'm interested to see where others who may have come across similar concerns would handle the issue. It's part aesthetics, part function.
A little background on the code... I created a type called ISlice<T> that provides a reference into a section of a source of items which can be a collection (e.g. array, list) or a string. The core support comes from a few implementation classes that support fast indexing using the Begin and End markers for the slice to get the item from the original source. The purpose is to provide slicing capabilities similar to what the Go language provides while using Python style indexing (i.e. both positive and negative indexes are supported).
To make creating slices (instances of ISlice<T>) easier and more "fluent", I created a set of extension methods. For example:
static public ISlice<T> Slice<T>(this IList<T> source, int begin, int end)
{
return new ListSlice<T>(source, begin, end);
}
static public ISlice<char> Slice(this string source, int begin, int end)
{
return new StringSlice(source, begin, end);
}
There are others, such as providing optional begin/end parameters, but the above will suffice for where I'm going with this.
These routines work well and make it easy to slice up a collection or a string. What I also need is way to take a slice and create a copy of it as an array, a list, or a string. That's where things get "interesting". Originally, I thought I'd need to create ToArray, ToList extension methods, but then remembered that the LINQ variants perform optimizations if your collection implements ICollection<T>. In my case, ISlice<T>, does inherits from it, though much to my chagrin as I dislike throwing NotSupportedExceptions from methods like Add. Regardless, I get those for free. Great.
What about converting back into a string as there's no built-in support for converting an IEnumerable<char> easily back into a string? Closest thing I found is one of the string.Concat overloads, but it would not handle chars as efficiently as it could. Just as important from a design stand point is that it doesn't jump out as a "conversion" routine.
The first thought was to create a ToString extension method, but that doesn't work as ToString is an instance method which means it trumps extension methods and would never be called. I could override ToString, but the behavior would be inconsistent as ListSlice<T> would need to special case its ToString for times where T is a char. I don't like that as the ToString will give something useful when the type parameter is a char, but the class name in other cases. Also, if there are other slice types created in the future I'd have to create a common base class to ensure the same behavior or each class would have to implement this same check. An extension method on the interface would handle that much more elegantly.
The extension method leads me to a naming convention issue. The obvious is to use ToString, but as stated earlier it's not allowed. I could name it something different, but what? ToNewString? NewString? CreateString? Something in the To-family of methods would let it fall in with the ToArray/ToList routines, but ToNewString sticks out as being 'odd' when seen in the intellisense and code editor. NewString/CreateString are not as discoverable as you'd have to know to look for them. It doesn't fit the "conversion method" pattern that the To-family methods provide.
Go with overriding ToString and accept the inconsistent behavior hardcoded into the ListSlice<T> implementation and other implementations? Go with the more flexible, but potentially more poorly named extension method route? Is there a third option I haven't considered?
My gut tells me to go with the ToString despite my reservations, though, it also occurred to me... Would you even consider ToString giving you a useful output on a collection/enumerable type? Would that violate the principle of least surprise?
Update
Most implementations of slicing operations provide a copy, albeit a subset, of the data from whatever source was used for the slice. This is perfectly acceptable in most use cases and leaves for a clean API as you can simply return the same data type back. If you slice a list, you return a list containing only the items in the range specified in the slice. If you slice a string, you return a string. And so on.
The slicing operations I'm describing above are solving an issue when working with constraints which make this behavior undesirable. For example, if you work with large data sets, the slice operations would lead to unnecessary additional memory allocations not to mention the performance impact of copying the data. This is especially true if the slices will have further processing done on them before getting to your final results. So, the goal of the slice implementation is to have references into larger data sets to avoid making unnecessary copies of the information until it becomes beneficial to do so.
The catch is that at the end of the processing the desire to turn the slice-based processed data back into a more API and .NET friendly type like lists, arrays, and strings. It makes the data easier to pass into other APIs. It also allows you to discard the slices, thus, also the large data set the slices referenced.
Would you even consider ToString giving you a useful output on a collection/enumerable type? Would that violate the principle of least surprise?
No, and yes. That would be completely unexpected behavior, since it would behave differently than every other collection type.
As for this:
What about converting back into a string as there's no built-in support for converting an IEnumerable>char< easily back into a string?
Personally, I would just use the string constructor taking an array:
string result = new string(mySlice.ToArray());
This is explicit, understood, and expected - I expect to create a new string by passing an object to a constructor.
Perhaps the reason for your conundrum is the fact that you are treating string as a ICollection<char>. You haven't provide details about the problem that you are trying to solve but maybe that's a wrong assumption.
It's true that a string is an IEnumerable<char>. But as you've noticed assuming a direct mapping to a collection of chars creates problems. Strings are just too "special" in the framework.
Looking at it from the other end, would it be obvious that the difference between an ISlice<char> and ISlice<byte> is that you can concatenate the former into a string? Would there be a concatenate operation on the latter that makes sense? What about ISlice<string>? Shouldn't I be able to concatenate those as well?
Sorry I'm not providing specific answers but maybe these questions will point you at the right solution for your problem.
This question already has answers here:
Closed 13 years ago.
Possible Duplicate:
What Advantages of Extension Methods have you found?
All right, first of all, I realize this sounds controversial, but I don't mean to be confrontational. I am asking a serious question out of genuine curiosity (or maybe puzzlement is a better word).
Why were extension methods ever introduced to .NET? What benefit do they provide, aside from making things look nice (and by "nice" I mean "deceptively like instance methods")?
To me, any code that uses an extension method like this:
Thing initial = GetThing();
Thing manipulated = initial.SomeExtensionMethod();
is misleading, because it implies that SomeExtensionMethod is an instance member of Thing, which misleads developers into believing (at least as a gut feeling... you may deny it but I've definitely observed this) that (1) SomeExtensionMethod is probably implemented efficiently, and (2) since SomeExtensionMethod actually looks like it's part of the Thing class, surely it will remain valid if Thing is revised at some point in the future (as long as the author of Thing knows what he/she's doing).
But the fact is that extension methods don't have access to protected members or any of the internal workings of the class they're extending, so they're just as prone to breakage as any other static methods.
We all know that the above could easily be:
Thing initial = GetThing();
Thing manipulated = SomeNonExtensionMethod(initial);
To me, this seems a lot more, for lack of a better word, honest.
What am I missing? Why do extension methods exist?
Extension methods were needed to make Linq work in the clean way that it does, with method chaining. If you have to use the "long" form, it causes the function calls and the parameters to become separated from each other, making the code very hard to read. Compare:
IEnumerable<int> r = list.Where(x => x > 10).Take(5)
versus
// What does the 5 do here?
IEnumerable<int> r = Enumerable.Take(Enumerable.Where(list, x => x > 10), 5);
Like anything, they can be abused, but extension methods are really useful when used properly.
I think that the main upside is discoverability. Type initial and a dot, and there you have all the stuff that you can do with it. It's a lot harder to find static methods tucked away in some class somewhere else.
First of all, in the Thing manipulated = SomeNonExtensionMethod(initial); case, SomeNonExtensionMethod is based on exactly the same assumptions like in the Thing manipulated = initial.SomeExtensionMethod(); case. Thing can change, SomeExtensionMethod can break. That's life for us programmers.
Second, when I see Thing manipulated = initial.SomeExtensionMethod();, it doesn't tell me exactly where SomeExtensionMethod() is implemented. Thing could inherit it from TheThing, which inherits it from TheOriginalThing. So the "misleading" argument leads to nowhere. I bet the IDE takes care of leading you to the right source, doesn't it?
What's so great? It makes code more consistent. If it works on a string, it looks like if it was a member of string. It's ugly to have several MyThing.doThis() methods and several static ThingUtil.doSomethingElse(Mything thing) methods in another class.
SO you can extend someone else's class. not yours... that's the advantage.
(and you can say.. oh I wish they implement this / that.... you just do it yourself..)
they are great for automatically mixing in functionality based on Interfaces that a class inherits without that class having to explicitly re implement it.
Linq makes use of this a lot.
Great way to decorate classes with extra functionality. Most effective when applied to an Interface rather than a specific class. Still a good way to extend Framework classes though.
It's just convenient syntactic sugar so that you can call a method with the same syntax regardless of whether it's actually part of the class. If party A releases a lib, and party B releases stuff that uses that lib, it's easier to just call everything with class.method(args) than to have to remember what gets called with method(class, args) vs. class.method(args).