To use or not to use Repositories with an ORM - c#

There are many voices saying that in this age of ORMs, there is no longer need for repositories. Recently I was thinking about it myself and their arguments made usually sense to me.
On my new project, using EF Core, I thought I try the no-repository approach to make things more straightforward and simple. Simplicity is always good, after all.
Things went pretty well, until I realized I often write things like this (this is not a real aggregate, just an artificial example):
ctx.Invoices.Include(x => x.User).ThenInclude(x => x.Address).Include(x => x.Items).FirstOrDefault(x => x.Id == id);
Alright, so every time I need to get an aggregate, I need to write this statement and there are chances, sometimes/someone will forget to include an important reference.
What about to put this code to a helper class?
class InvoiceHelper
{
public Invoice FindById(int id) {...}
}
Isn't this just a start of a repository in disguise? So even with an ORM, I still find repositories kinda useful.
How do others do these things and still appear to be happy without repositories?
Some articles discussing no-repository approach:
https://www.thereformedprogrammer.net/is-the-repository-pattern-useful-with-entity-framework-core/
https://rob.conery.io/2014/03/03/repositories-and-unitofwork-are-not-a-good-idea/
https://cockneycoder.wordpress.com/2013/04/07/why-entity-framework-renders-the-repository-pattern-obsolete/

If you believe in the rule of thumb "don't mock what you don't own", I fail to see how you could substitute your ORM functionality for a mock or stub in your tests without a Repository-like abstraction.
Repository is a very important seam - adapter in Hexagonal parlance - between your application and storage.
And yes, the fact that you needed to create a helper to factor in some data fetching behavior is a perfect illustration of that importance.
[Edit] Thoughts on the "no repos" articles
As often, most of the "repository backlash" is overreaction to misuse of Repository and UoW rooted in collective amnesia of the original reasons behind the patterns.
"Why hide a perfectly good framework ?" - EF is not!
For instance, IDbSet mentioned by one of the articles is a walking leaky abstraction, from its very name to Attach() to all its extension methods. Also see below misuses and misunderstandings of methods like Update() provided by the "perfectly good framework".
In contrast, the point of the original Repository pattern was to be a minimalist 3-method collection-like abstraction you can't go wrong with. Add(), Remove() and a few Get methods are all we need, not the bells and whistles of a whole framework.
"Repository doesn’t make testing any easier" : yes it does! The smaller the contract of a dependency, the easier it is to reason about, test, debug. A.k.a. KISS.
"1. Performance – sort/filter: Microsoft’s old (2013) implementation of the Rep/UoW’s has a method called GetStudents that returns IEnumerable. This means any filtering or sorting is going to be done in software, which is inefficient."
Seriously? The lousiness of Microsoft tutorial sample code written years ago is the #1 drawback about a whole pattern? When you just have to create a method inside the repo implementation and it can be as optimized as you need!
A Rep/UoW would update an entity using the EF Core’ Update method - Says who? A straw man again. I've been using these patterns for 10+ years and never needed that method. SaveChanges() has always been the way to go.
The allure of the Rep/UoW comes from the view that you can write one, generic repository then you use that to build all your sub-repositories
Absolutely not. This was not part of the original pattern and even largely identified as an anti-pattern in the community since the early days of DDD adoption.
"Option 2 - DDD-styled entity classes"
aka domain classes that do data access... which destroys persistence ignorance. Not really a good idea IMO.
I could go on and on about almost every argument there, but the gist of it is :
If you're going to blame a pattern, do it in full understanding of its fundamentals, not because you don't like some broken implementation that's been copy pasted and cargo culted along generations of developers to the point of uselessness.

Related

Is it correct to create UofW in order to share the DbContext among the Repositories?

Is it correct to create Unit of Work in order to share the DbContext among the Repositories?
If isn't what is the recommendation? I really think it is needed to share the DbContext sometimes.
I'm asking this because of the answer for this question: In-memory database doesn't save data
Is it correct to create Unit of Work in order to share the DbContext among the Repositories?
It is design decision, but yes. There is no problem in doing that. It is absolutely valid that code from multiple repositories is executed under one single connection.
I really think it is needed to share the DbContext sometimes.
Absolutely; there are many times when you need to share DbContext.
Your linked answer is really good. I specially like the three points it mention. OP on that question is doing some unnecessary complicated things like Singleton, Service Locator and Async calls without understanding how they work. All these are good things but only if they are used at right time at right place.
Following is from your linked answer:
The best thing is that all of these could be avoided if people stopped attempting to create a Unit of Work + Repository pattern over yet another Unit of Work and Repository. Entity Framework Core already implements these:
Yes; this is true. But even so, repository and UoW may be helpful in some cases. This is design decision based on business needs. I have explained this in my answers below:
https://stackoverflow.com/a/49850950/5779732
https://stackoverflow.com/a/50877329/5779732
Using ORM directly in calling code has following issues:
It makes code little more complicated.
Database code is merged in business logic.
As many ORM objects are used in-line in calling code, it is very hard to unit test the code.
All those issues could be overcome by creating Concrete Repositories in Data Access Layer. DAL should expose concrete repositories to calling code (BLL or Services or Controller whatever) through interfaces. This way, your database and ORM code is fully consumed in DAL and you can easily unit-test calling code by mocking repositories. Refer this article explaining benefit of repository even with ORMs.
Apart from all above, one other issue generally discussed is "What if we decide to change ORM in future". This is entirely false in my personal understanding. It happens very rarely and in most cases, should not be considered while design.
I recommend avoid overthinking and over-design. Focus on your business objectives.
Refer this example code to understand how to inject UoW in repositories. The code sample is with Dapper. But overall design may still useful to you.
What you need is a class that contains multiple repositories and creates a UoW. Then, when you have a use case in which you need to use multiple repositories with shared UoW, this class creates it and pass it to repositories.
I typically call this class Service, but I think there is not some standardized naming.

Should I avoid using Dependency Injection and IoC?

In my mid-size project I used static classes for repositories, services etc. and it actually worked very well, even if the most of programmers will expect the opposite. My codebase was very compact, clean and easy to understand. Now I tried to rewrite everything and use IoC (Invertion of Control) and I was absolutely disappointed. I have to manually initialize dozen of dependencies in every class, controller etc., add more projects for interfaces and so on. I really don't see any benefits in my project and it seems that it causes more problems than solves. I found the following drawbacks in IoC/DI:
much bigger codesize
ravioli-code instead of spaghetti-code
slower performance, need to initialize all dependencies in constructor even if the method I want to call has only one dependency
harder to understand when no IDE is used
some errors are pushed to run-time
adding additional dependency (DI framework itself)
new staff have to learn DI first in order to work with it
a lot of boilerplate code, which is bad for creative people (for example copy instances from constructor to properties...)
We do not test the entire codebase, but only certain methods and use real database. So, should Dependency Injection be avoided when no mocking is required for testing?
The majority of your concerns seem to boil down to either misuse or misunderstanding.
much bigger codesize
This is usually a result of properly respecting both the Single Responsibility Principle and the Interface Segregation Principle. Is it drastically bigger? I suspect not as large as you claim. However, what it is doing is most likely boiling down classes to specific functionality, rather than having "catch-all" classes that do anything and everything. In most cases this is a sign of healthy separation of concerns, not an issue.
ravioli-code instead of spaghetti-code
Once again, this is most likely causing you to think in stacks instead of hard-to-see dependencies. I think this is a great benefit since it leads to proper abstraction and encapsulation.
slower performance Just use a fast container. My favorites are SimpleInjector and LightInject.
need to initialize all dependencies in constructor even
if the method I want to call has only one dependency
Once again, this is a sign that you are violating the Single Responsibility Principle. This is a good thing because it is forcing you to logically think through your architecture rather than adding willy-nilly.
harder to understand when no IDE is used some errors are pushed to run-time
If you are STILL not using an IDE, shame on you. There's no good argument for it with modern machines. In addition, some containers (SimpleInjector) will validate on first run if you so choose. You can easily detect this with a simple unit test.
adding additional dependency (DI framework itself)
You have to pick and choose your battles. If the cost of learning a new framework is less than the cost of maintaining spaghetti code (and I suspect it will be), then the cost is justified.
new staff have to learn DI first in order to work with it
If we shy away from new patterns, we never grow. I think of this as an opportunity to enrich and grow your team, not a way to hurt them. In addition, the tradeoff is learning the spaghetti code which might be far more difficult than picking up an industry-wide pattern.
a lot of boilerplate code which is bad for creative people (for example copy instances from constructor to properties...)
This is plain wrong. Mandatory dependencies should always be passed in via the constructor. Only optional dependencies should be set via properties, and that should only be done in very specific circumstances since oftentimes it is violating the Single Responsibility Principle.
We do not test the entire codebase, but only certain methods and use real database. So, should Dependency Injection be avoided when no mocking is required for testing?
I think this might be the biggest misconception of all. Dependency Injection isn't JUST for making testing easier. It is so you can glance at the signature of a class constructor and IMMEDIATELY know what is required to make that class tick. This is impossible with static classes since classes can call both up and down the stack whenever they like without rhyme or reason. Your goal should be to add consistency, clarity, and distinction to your code. This is the single biggest reason to use DI and it is why I highly recommend you revisit it.
Although IoC/DI is not some silver bullet that works in all cases, it is possible that you didn't apply it correctly. The set of principles behind Dependency Injection take time to master, or at least, it sure did for me. When applied right, it can bring (among others) the following benefits:
Improved testability
Improved flexibility
Improved maintainability
Improved parallel development
From your question, I can already extract some things that might have gone wrong in your case:
I have to manually initialize dozen of dependencies in every class
This implies that each class you create is responsible of creating the dependencies it requires. This is an anti-pattern known as Control Freak. A class should not new up its dependencies itself. You might even have applied the Service Locator anti-pattern where your class requests its dependencies by calling the container (or an abstraction that represents the container) to get a particular dependency. A class should just define the dependencies it requires as constructor arguments.
dozen of dependencies
This statement implies that you are violating the Single Responsibly Principle. This is actually not coupled to IoC/DI, your old code probably already violated the Single Responsibility Principle causing it to become hard to understand and maintain for other developers. It's often hard for the original author to understand why others have a hard time maintaining code, since the thing you wrote often fits nicely in your head. Often the violation of the SRP will cause others to have trouble understanding and maintaining code. And testing classes that violate SRP is often even harder. A class should have half a dozen dependencies at most.
add more projects for interfaces and so on
This implies that you are violating the Reused Abstraction Principle. In general, the majority of components/classes in your application should be covered by a dozen of abstractions. For instance, all classes that implement some use case probably deserve one single (generic) abstraction. Classes that implement queries also deserve one abstraction. For the systems that I write, 80% to 95% of my components (classes that contain the application's behavior) are covered by 5 to 12 (mostly generic) abstractions. Most of the time you don't need to create a new project solely for the interfaces.
Most of the time I place those interfaces in the root of the same project.
much bigger codesize
The amount of code you write will initially not be very different. The practice of Dependency Injection however, only works great when applying SOLID as well, and SOLID promotes small focussed classes. Classes with one single responsibility. This means that you will have many small classes that are easy to understand and easy to compose into flexible systems. And don't forget: we shouldn't strive to write less code, but rather more maintainable code.
However, with a good SOLID design and the right abstractions in place, I experienced actually having to write much less code than I had to before. For instance, applying certain cross-cutting concerns (like logging, audit trailing, authorization, etc) can be applied by just writing a few lines of code in the infrastructure layer of the application, instead of having it to be spread out throughout the complete application. It even lead me to be able to do things that werent feasible before, because they forced me to make sweeping changes throughout the entire code base, which was so time consuming that management didn't allow me to do so.
ravioli-code instead of spaghetti-code
harder to understand when no IDE is used
This is kind of true. Dependency Injection promotes classes to become decoupled from one another. This can sometimes make it harder to browse to a code base, since a class usually depends on an abstraction instead of a concrete classes. In the past I found the flexibily that DI gives me outweigh the cost of finding the implementation by far. With Visual Studio 2015 I can simply do CTRL + F12 to find the implementations of an interface. If there is just one implementation, Visual Studio will jump right to that implementation.
slower performance
This is not true. The performance doesn't have to be any different than working with a code base of only static method calls. You however chose to have your classes with a Transient lifestyle which means it you new up instances all over the place. In my last applications I created all my classes just once per application, which gives roughly the same performance as only having static method calls, but with the benefit of the application being very flexible and maintainable. But note that even if you decide to new complete graphs of objects for each (web) request, the performance cost will most likely be orders of magnitude lower than any I/O (database, file system and web services calls) that you perform during that request, even with the slowest DI containers.
some errors are pushed to run-time
adding additional dependency (DI framework itself)
These issues both imply the usage of a DI library. DI libraries do object composition at runtime. A DI library however is not a required tool when practicing Dependency Injection. Small applications can benefit from using Dependency Injection without a tool; a practice called Pure DI. Your application might not benefit from using a DI container, but most applications actually benefit from using Dependency Injection (when used correctly) as a practice. Againt: tools are optional, writing maintainable code isn't.
But even if you use a DI library, there are libraries that have tools built-in that allow you to verify and diagnose your configuration. They won't give you compile-time support, but they allow you to run this analysis either when the application starts up or using a unit test. This prevents you from doing a regression on the complete application just to verify whether your container is wired correctly. My advise is to pick a DI container that helps you in detecting these configuration errors.
new staff have to learn DI first in order to work with it
This is kind of true, but Dependency Injection itself isn't actually hard to learn. What is actually hard to learn is to apply the SOLID principles correctly, and you need to learn this anyway when you want to write applications that need to be maintained by more than one developer for a considerate period of time. I rather invest into teaching the developers on my team to write SOLID code instead of just letting them crank out code; that will surely cause a maintenance hell later on.
a lot of boilerplate code
There is some boilerplate code when we look at code written in C# 6, but this isn't actually that bad, especially when you consider the advantages it gives. And future versions of C# will remove the boilerplate that is mainly caused by having to define constructors that take in arguments that are null-checked and assigned to private variables. C# 7 or 8 will surely fix this when record types and non-nullable reference types are introduced.
which is bad for creative people
I'm sorry, but this argument is plain bullshit. I've seen this argument used over and over again as an excuse to write bad code by developers who didn't want to learn about design patterns and software principles and practices. Being creative is no excuse for writing code that no one else can understand or code that is impossible to test. We need to apply accepted patterns and practices and within that boundary there is enough room to be creative, while writing good code. Writing code is not an art; it’s a craft.
Like I said, DI is not appropriate in all cases, and the practices around it take time to master. I can advise you to read the book Dependency Injection in .NET by Mark Seemann; it will give many answers and will give you a good sense how and when to apply it, and when not.
Be warned: I hate IoC.
There are many great answers here which are comforting. The main benefits according to Steven (very strong answer) are:
Improved testability
Improved flexibility
Improved maintainability
Improved scalability
My experiences are very different through, here they are for some balance:
(Bonus) Stupid Repository Pattern
Too often, this is included along with IoC. The repository pattern should only be used to access external data, and where interchangeability is a core expectation.
When you use this, with Entity Framework, you disable all the power of Entity Framework, this also happens with Service Layers.
Eg. Calling:
var employees = peopleService.GetPeople(false, false, true, true); //Terrible
It should be:
var employees = db.People.ActiveOnly().ToViewModel();
In this case using extension methods.
Who needs flexibility?
If you have no plans to change service implementations, you don't need it. If you think you'll have more than one implementation in the future, perhaps add IoC then, and only for that part.
But "Testability"!
Entity Framework (and probably other ORMs too), allow you to change the connection-string to point to an in-memory database. Granted, that's only available starting EF7. However, it can simply be a new (proper) test database in a staging environment.
Do you have other special test resources and service points? In this day and age, they're probably different WebService URI endpoints, which can also be configured in App.Config / Web.Config.
Automated Tests make your code maintainable
TDD - If it's a Web Application, use Jasmine or Selenium and have automated behaviour tests. This tests everything all the way to the user. It's an investment over time, starting by covering critical features and functions.
DevOps/SysOps - Maintain scripts for provisioning your whole environment (this is also best practice), spin up a staging environment and run all the tests. You can also clone your production environment and run your tests there. Don't make "maintainable" and "testable" your excuse for choosing IoC. Start with those requirements and find the best ways to meet those requirements.
Scalability - in what way?
(I probably need to read the book)
For coder scalability, Distributed Code Version Control, is the norm (although I hate merging).
For human resource scalability, you shouldn't be wasting days designing extra abstract layers for your project.
For production concurrent user scalability, you should be building, testing, then improving.
For server throughput scalability, you need to think a lot higher-level than IoC. Are you going to run a server on the customer LAN? Can you replicate your data? Are you replicating at the database level or application level? Is offline access important while mobile? These are substantial architecture questions, where IoC is rarely the answer.
Try F12
If you're using an IDE (which you should be doing), such as Visual Studio Community Edition, then you'll know how handy F12 can be, to navigate around code.
With IoC you'll be taken to the Interface, and then you'll need to find all references using a particular interface. Only one extra step, but for a tool that's used so much, it frustrates me.
Steven is on the ball
With Visual Studio 2015 I can simply do CTRL + F12 to find the
implementations of an interface.
Yes, but you have to then trawl through a list of both usages as well as the declaration. (Actually I think in the latest VS, the declaration lists separately, but it's still an extra mouse click, taking your hands away from the keyboard. And I should say this is a limitation of Visual Studio, not able to take you to an only interface implementation directly.
There are many 'textbook' arguments in favor of using IoC, but in my personal experience, the gains are/were:
Possibility to test only parts of the project, and mock some other parts. For example, if you have a component returning configuration from DB, it's easy to mock it so that your test can work without a real DB. With static classes this is not possible.
Better visibility and control of dependencies. With the static classes it's very easy to add some dependecies without even noticing, that can create problems later. With IoC this is more explicit and visible.
More explicit initialization order. With static classes this can be often a black box, and there can be latent problems due to circular usage.
The only inconvenience for me was that by placing everything before interfaces it's not possible to navigate directly to the implementation from the usage (F12).
However, it is the developers of a project who can judge best the pros and cons in the particular case.
Was there a reason why you didn't choose to use an IOC Library (StructureMap, Ninject, Autofac, etc)?
Using any of these would have made your life much easier.
Although David L has already made an excellent set of commentaries on your points, I'll add my own as well.
Much bigger codesize
I am not sure how you ended up with a larger codebase; the typical setup for an IOC library is pretty small, and since you are defining your invariants (dependencies) in the class constructors, you are also removing some code (i.e. the "new xyz()" stuff) that you don't need any more.
Ravioli-code instead of spaghetti-code
I happen to quite like ravioli :)
Slower performance, need to initialize all dependencies in constructor even if the method I want to call has only one dependency
If you are doing this then you are not really using Dependency Injection at all. You should be receiving ready-made, fully loaded object graphs via the dependency arguments declared in the constructor parameters of the class itself - not creating them in the constructor!
Most modern IOC libraries are ridiculously fast, and will never, ever be a performance problem.
Here's a good video that proves the point.
Harder to understand when no IDE is used
That's true, but it also means you can take the opportunity to think in terms of abstractions. So for example, you can look at a piece of code
public class Something
{
readonly IFrobber _frobber;
public Something(IFrobber frobber)
{
_frobber=frobber;
}
public void LetsFrobSomething(Thing theThing)
{
_frobber.Frob(theThing)
}
}
When you are looking at this code and trying to figure out if it works, or if it is the root cause of a problem, you can ignore the actual IFrobber implementation; it just represents the abstract capability to Frob something, and you don't need to mentally carry along how any particular Frobber might do its work. you can focus on making sure that this class does what it's supposed to - namely, delegating some work to a Frobber of some kind.
Note also that you don't even need to use interfaces here; you can go ahead and inject concrete implementations as well. However that tends to violate the Dependency Inversion principle (which is only tangenitally related to the DI we are talking about here) because it forces the class to depend on a concretion as opposed to an abstraction.
Some errors are pushed to run-time
No more or less than they would be with manually constructing graphs in the constructor;
Adding additional dependency (DI framework itself)
That is also true, but most IOC libraries are pretty small and unobtrusive, and at some point you have to decide if the tradeoff of having a slightly larger production artifact is worth it (it really is)
New staff have to learn DI first in order to work with it
That isn't really any different than would be the case with any new technology :) Learning to use an IOC library tends to open the mind to other possibilities like TDD, the SOLID principles and so forth, which is never a bad thing!
A lot of boilerplate code, which is bad for creative people (for example copy instances from constructor to properties...)
I don't understand this one, how you might end up with much boilerplate code; I wouldn't count storing the given dependencies in private readonly members as boilerplate worth talking about - bearing in mind that if you have more than 3 or 4 dependencies per class you are likely to be in violation of the SRP and should rethink your design.
Finally if you are not convinced by any of the arguments put forth here, I would still recommend you read Mark Seeman's "Dependency Injection in .Net". (or indeed anything else he has to say on DI which you can find on his blog).
I promise you will learn some useful things and I can tell you, it changed the way I write software for the better.
if you have to initialise dependencies manually in the code, you're doing something wrong. General patter for IoC is constructor injection or, probably, property injection. Class or controller shouldn't know about DI container at all.
Generally, all you have to do is:
configure container, like Interface = Class in Singleton scope
Use it, like Controller(Interface interface) {}
Benefit from controlling all dependencies in one place
I dont see any boilerplate code or slower performance or anything else you described. I can't really imaging how to write more or less complex app without it.
But generally, you need to decide what is more important. To please "creative people" or build maintainable and robust app.
Btw, to create property or filed from constructor you can use Alt+Enter in R# and it do all the job for you.

Best way to deal with conflated business and presentation code?

Considering a hypothetical situation where an old, legacy presentation library has been maintained over the years, and has gradually had more and more business logic coded into it through a process of hasty corrections and lack of proper architectural oversight. Alternatively, consider a business class or namespace that is not separated from presentation by assembly boundaries, and was thus capable of referencing something like System.Windows.Forms without being forced to add a reference (a much more sobering action than a simple using clause).
In situations like this, it's not unimaginable that the business code used by this UI code will eventually be called upon for reuse. What is a good way to refactor the two layers apart to allow for this?
I'm loosely familiar with design patterns--at least in principle anyway. However, I don't have a whole ton of practical experience so I'm somewhat unsure of my intuitions. I've started along the path of using the Strategy pattern for this. The idea is to identify the places where the business logic calls up to UI components to ask the user a question and gather data, and then to encapsulate those into a set of interfaces. Each method on that interface will contain the UI-oriented code from the original workflow, and the UI class will then implement that interface.
The new code that wants to reuse the business logic in question will also implement this interface, but substitute either new windows or possibly pre-fab or parameterized answers to the questions originally answered by the UI components. This way, the biz logic can be treated as a real library, albeit with a somewhat awkward interface parameter passed to some of its methods.
Is this a decent approach? How better should I go about this? I will defer to your collective internet wisdom.
Thanks!
I humbly suggest Model–View–Controller - MVC has a high probability as a successful solution to your problem. It separates various logic, much as you describe.
HTH
You seem to be taking a good approach, in which you break dependencies between concrete elements in your design to instead depend on abstractions (interfaces). When you break dependencies like this, you should immediately start using unit tests to cover your legacy code base and to evolve the design with improved assurance.
I've found the book Working Effectively with Legacy Code to be invaluable in these situations. Also, don't jump right into the patterns without first looking at the principles of object oriented design, like the SOLID principles. They often guide your choice of patterns and decisions about the evolution of the system.
I would approach it by clearly identifying the entities and the actions they can do or can be done to them. Then one by one try to start creating independent business logic objects for those refactoring the logic out of the UI, making the UI call to the BL objects.
At that point if I understand your scenario correctly you would have a hand full of BL objects, some portion of which made win forms calls, the win forms calls would need to be promoted out into the UI layer.
Then as JustBoo says, I think you'll have a distinct enough situation to start abstracting out controllers from your BL and UI and make it all function in an MVC design.
Okay, given your various comments, I would take Mr. Hoffa's advice and extend it. I'm sure you've heard hard problems should be broken down into ever smaller units of work until they can be "conquered."
Using that technique, coupled with the methodologies of Refactoring could solve your problems. There is a book and lots of information on the web about it. You now have a link. That page has a ton of links to information.
One more link from the author of the book.
So, you refactor, slowly but surely to the creamy goodness of MVC, step-by-step.
HTH

How to implement SOLID principles into an existing project

I apologize for the subjectiveness of this question, but I am a little stuck and I would appreciate some guidance and advice from anyone who's had to deal with this issue before:
I have (what's become) a very large RESTful API project written in C# 2.0 and some of my classes have become monstrous. My main API class is an example of this -- with several dozen members and methods (probably approaching hundreds). As you can imagine, it's becoming a small nightmare, not only to maintain this code but even just navigating the code has become a chore.
I am reasonably new to the SOLID principles, and I am massive fan of design patterns (but I am still at that stage where I can implement them, but not quite enough to know when to use them - in situations where its not so obvious).
I need to break my classes down in size, but I am at a loss of how best to go about doing it. Can my fellow StackOverflow'ers please suggest ways that they have taken existing code monoliths and cut them down to size?
Single Responsibility Principle - A class should have only one reason to change. If you have a monolithic class, then it probably has more than one reason to change. Simply define your one reason to change, and be as granular as reasonable. I would suggest to start "large". Refactor one third of the code out into another class. Once you have that, then start over with your new class. Going straight from one class to 20 is too daunting.
Open/Closed Principle - A class should be open for extension, but closed for change. Where reasonable, mark your members and methods as virtual or abstract. Each item should be relatively small in nature, and give you some base functionality or definition of behavior. However, if you need to change the functionality later, you will be able to add code, rather than change code to introduce new/different functionality.
Liskov Substitution Principle - A class should be substitutable for its base class. The key here, in my opinion, is do to inheritance correctly. If you have a huge case statement, or two pages of if statements that check the derived type of the object, then your violating this principle and need to rethink your approach.
Interface Segregation Principle - In my mind, this principle closely resembles the Single Responsibility principle. It just applies specifically to a high level (or mature) class/interface. One way to use this principle in a large class is to make your class implement an empty interface. Next, change all of the types that use your class to be the type of the interface. This will break your code. However, it will point out exactly how you are consuming your class. If you have three instances that each use their own subset of methods and properties, then you now know that you need three different interfaces. Each interface represents a collective set of functionality, and one reason to change.
Dependency Inversion Principle - The parent / child allegory made me understand this. Think of a parent class. It defines behavior, but isn't concerned with the dirty details. It's dependable. A child class, however, is all about the details, and can't be depended upon because it changes often. You always want to depend upon the parent, responsible classes, and never the other way around. If you have a parent class depending upon a child class, you'll get unexpected behavior when you change something. In my mind, this is the same mindset of SOA. A service contract defines inputs, outputs, and behavior, with no details.
Of course, my opinions and understandings may be incomplete or wrong. I would suggest learning from people who have mastered these principles, like Uncle Bob. A good starting point for me was his book, Agile Principles, Patterns, and Practices in C#. Another good resource was Uncle Bob on Hanselminutes.
Of course, as Joel and Jeff pointed out, these are principles, not rules. They are to be tools to help guide you, not the law of the land.
EDIT:
I just found these SOLID screencasts which look really interesting. Each one is approximately 10-15 minutes long.
There's a classic book by Martin Fowler - Refactoring: Improving the Design of Existing Code.
There he provides a set of design techniques and example of decisions to make your existing codebase more manageable and maintainable (and that what SOLID principals are all about). Even though there are some standard routines in refactoring it is a very custom process and one solution couldn't be applied to all project.
Unit testing is one of the corner pillars for this process to succeed. You do need to cover your existing codebase with enough code coverage so that you'd be sure you don't break stuff while changing it. Actually using modern unit testing framework with mocking support will lead encourage you to better design.
There are tools like ReSharper (my favorite) and CodeRush to assist with tedious code changes. But those are usually trivial mechanical stuff, making design decisions is much more complex process and there's no so much tool support. Using class diagrams and UML helps. That what I would start from, actually. Try to make sense of what is already there and bring some structure to it. Then from there you can make decisions about decomposition and relations between different components and change your code accordingly.
Hope this helps and happy refactoring!
It will be a time consuming process. You need to read the code and identify parts that do not meet the SOLID principles and refactor into new classes. Using a VS add-in like Resharper (http://www.jetbrains.com) will assist with the refactoring process.
Ideally you will have good coverage of automated unit tests so that you can ensure your changes do not introduce problems with the code.
More Information
In the main API class, you need to identify methods that relate to each other and create a class that more specifically represents what actions the method performs.
e.g.
Let's say I had an Address class with separate variables containing street number, name, etc. This class is responsible for inserting, updating, deleting, etc. If I also needed to format an address a specific way for a postal address, I could have a method called GetFormattedPostalAddress() that returned the formatted address.
Alternatively, I could refactor this method into a class called AddressFormatter that takes an Address in it constructor and has a Get property called PostalAddress that returns the formatted address.
The idea is to separate different responsibilities into separate classes.
What I've done when presented with this type of thing (and I'll readily admit that I haven't used SOLID principles before, but from what little I know of them, they sound good) is to look at the existing codebase from a connectivity point of view. Essentially, by looking at the system, you should be able to find some subset of functionality that is internally highly coupled (many frequent interactions) but externally loosely coupled (few infrequent interactions). Usually, there are a few of these pieces in any large codebase; they are candidates for excision. Essentially, once you've identified your candidates, you have to enumerate the points at which they are externally coupled to the system as a whole. This should give you a good idea of the level of interdependency involved. There usually is a fair bit of interdependency involved. Evaluate the subsets and their connection points for refactoring; frequently (but not always) there ends up being a couple of clear structural refactorings that can increase the decoupling. With an eye on those refactorings, use the existing couplings to define the minimal interface required to allow the subsystem to work with the rest of the system. Look for commonalities in those interfaces (frequently, you find more than you'd expect!). And finally, implement these changes that you've identified.
The process sounds terrible, but in practice, it's actually pretty straightforward. Mind you, this is not a roadmap towards getting to a completely perfectly designed system (for that, you'd need to start from scratch), but it very certainly will decrease the complexity of the system as a whole and increase the code comprehensibility.
OOD - Object Oriented Design
SOLID - class design
Single Responsibility Principle - SRP - introduced by Uncle Bob. Method, class, module are responsible only for doing single thing(one single task)
Open/Closed Principle - OCP - introduced by Bertrand Meyer. Method, class, module are open for extension and closed for modification. Use a power of inheritance, abstraction, polymorphism, extension, wrapper. [Java example], [Swift example]
[Liskov Substitution Principle] - LSP - introduced by Barbara Liskov and Jeannette Wing. A subtype can replace supertype without side effects
Interface Segregation Principle - ISP - introduced by Uncle Bob. Your interface should be as small as possible
[Dependency Inversion Principle(DIP)] - DIP - introduced by Uncle Bob. Internal class, layer should not be depended on external class, layer. For example when you have aggregation[About] dependency you should rather use some abstraction/interfaces. [DIP vs DI vs IoC]
6 principles about packages/modules(.jar, .aar, .framework):
what to put inside a package
The Release Reuse Equivalency
The Common Closure
The Common Reuse
couplings between packages
The Acyclic Dependencies
The Stable Dependencies
The Stable Abstractions
[Protocol Oriented Programming(POP)]

How to make an existing public API testable for external programmers using it?

I have a C# public API that is used by many third-party developers that have written custom applications on top of it. In addition, the API is used widely by internal developers.
This API wasn't written with testability in mind: most class methods aren't virtual and things weren't factored out into interfaces. In addition, there are some helper static methods.
For many reasons I can't change the design significantly without causing breaking changes for applications developed by programmers using my API. However, I'd still like to give internal and external developers using this API the chance to write unit tests and be able to mock the objects in the API.
There are several approaches that come to mind, but none of them seem great:
The traditional approach would be to force developers to create a proxy class that they controlled that would talk to my API. This won't work in practice because there are now hundreds of classes, many of which are effectively strongly typed data transfer objects that would be a pain to reproduce and maintain.
Force all developers using the API that want to unit test it to buy TypeMock. This seems harsh to force people to pay $300+ per developer and potentially require them to learn a different mock object tool than what their used to.
Go through the entire project and make all the methods virtual. This would allow mock-ing of objects using free tools like Moq or Rhino Mocks, but it could potentially open up security risks for classes that were never meant to be derived from. Additionally this could cause breaking changes.
I could create a tool that given an input assembly would output an assembly with the same namespaces, classes, and members, but would make all of the methods virtual and it would make the method body just return the default value for the return type. Then, I could ship this dummy test assembly each time I released an update to the API. Developers could then write tests for the API against the dummy assembly since it had virtual members that are very mock-able. This might work, but it seems a bit tedious to write a custom tool for this and I can't seem to find an existing one that does it well (especially that works well with generics). Furthermore, it has the complication that it requires developers to use two different assemblies that could possibly go out of date.
Similar to #4, I could go through every file and add something like "#ifdef UNITTEST" to every method and body to do the equivalent of what a tool would do. This doesn't require an external tool, but it would pollute the codebase with a lot of ugly "#ifdef"'s.
Is there something else that I haven't considered that would be a good fit? Does a tool like what I mentioned in #4 already exist?
Again, the complicating factor is that this is a rather large API (hundreds of classes and ~10 files) and has existing applications using it which makes it hard to do drastic design changes.
There have been several questions on Stack Overflow that were generic about retrofitting an existing application to make it testable, but none seem to address the concerns I have (specifically in the context of a widely used API with many third-party developers). I'm also aware of "Working Effectively With Legacy Code" and think it has good advice, but I am looking for a specific .net approach given the constraints mentioned above.
UPDATE: I appreciate the answers so far. One that Patrik Hägne brought up is "why not extract interfaces?" This indeed works to a point, but has some problems such as the existing design has many cases where we take expose a concrete class. For example:
public class UserRepository
{
public UserData GetData(string userName)
{
...
}
}
Existing customers that are expecting the concrete class (e.g. "UserData") would break if they were given an "IUserData."
Additionally, as mentioned in the comments there are cases where we take in a class and then expose it for convenience. This could cause problems if we took in an interface and then had to expose it as a concrete class.
The biggest challenge to a significant rewrite or redesign is that there is a huge investment in the current API (thousands of hours of development and probably just as much third party training). So, while I agree that a better SOLID design rewrite or abstraction layer (that eventually could become the new API) that focused on items like the Interface Separation Principle would be a plus from a testability perspective, it'd be a large undertaking that probably can't be cost justified at the present time.
We do have testing for the current API, but it is more complicated integration testing rather than unit-testing.
Additionally, as mentioned by Chad Myers, this is question addresses a similar problem that the .NET framework itself faces in some areas.
I realize that I'm probably looking for a "silver bullet" here that doesn't exist, but all help is appreciated. The important part is protecting the huge time investments by many third party developers as well as the huge existing development to create the current API.
All answers, especially those that consider the business side of the problem, will be carefully reviewed. Thanks!
What you're really asking is, "How do I design my API with SOLID and similar principles in mind so my API plays well with others?" It's not just about testability. If your customers are having problems testing their code with yours, then they're also having problems WRITING/USING their code with yours, so this is a bigger problem than just testability.
Simply extracting interfaces will not solve the problem because it's likely your existing class interfaces (what the concrete classes expose as their methods/properties) aren't design with Interface Segregation Principle in mind, so the extracted interface would have all sorts of problems (some of which you mentioned in comment to a previous answer).
I like to call this the IHttpContext problem. ASP.NET, as you know, is very difficult to test around or with due to the "Magic Singleton Dependency" problem of HttpContext.Current. HttpContext is not mockable without fancy tricks like what TypeMock uses. Simply extracting an interface of HttpContext is not going to help that much because it's SO huge. Eventually, even IHttpContext would become a burden to test with so much so that it's almost not worth doing any more than trying to mock HttpContext itself.
Identifying object responsibilities, slicing up interfaces and interactions appropriately, and designing with Open/Closed Principle in mind is not something you and try to force/cram into an existing API designed without these principles in mind.
I hate to leave you with such a grim answer, so I'll give you one positive suggest: How's about YOU take all the grief on behalf of your customers and make some sort of service/facade layer over top of your old API. This service layer will have to deal with the minutiae and pain of your API, but will present a nice, clean, SOLID-friendly public API that your customers can use with much less friction.
This also has the added benefit of allowing you to slowly replace parts of your API and eventually make it so your new API isn't just a facade, it IS the API (and the old API is phased out).
Another approach would be to create a seperate branch of the API and do option 3 there. Then you just maintain these two versions and deprecate the former. Merging changes from one branch into the other should work automatically most of the time.
As a reply to your edit, interface extraction does indeed work very well here:
public interface IUserRepository
{
IUserData GetData(string userName);
}
public class UserRepository
: IUserRepository
{
// The old method is not touched.
public UserData GetData(string userName)
{
...
}
// Explicitly implement the interface method.
IUserData IUserRepository.GetData(string userName)
{
return this.GetData(userName);
}
}
As I also said in a comment this may not be the way to go in every place. I think you should identify some main points in your API where it's extra important for your customers to be able to fake the interaction and start there. You don't have to make a complete rewrite of the whole API but it can transform gradually.
One approach you don't mention (and the one I'd prefer in most cases) is to extract interfaces for the classes you want the user of the API to be able to fake. Not knowing your API not every single class in it has to have it's interface extracted.
Third party users should not be testing your API. They would want to test their code against your API and so they need to create Mocks of the API etc. but they would be relying on your testing of the API to ensure it works. Or is that what you meant? Do you want to make your API easy to test against?
Start again in that case, and this time think about the testers :)
I agree with Kim. Why not re-write your core API using the best practices you explained, and supply a set of proxy/adapter classes that expose the old interface but talk to your new API?
Old developers will be naturally encouraged to migrate to the new API, but not be forced to immediately do so. New developers will simply use your new API. Announce an EOL for your old API interface if you are concerned about developers staying on the old API.

Categories