I've inherited a project that is massive in scale and a bit of a labyrinth. The traffic is substantial enough to warrant optimizing the data access, so I've begun converting some WCF services that are invoked by JavaScript to Web API.
Unfortunately, the database's primary keys (not auto-incrementing) are also managed by a custom ORM, which queries a MySQL function that returns the next set of IDs to be used. The ORM then caches them and serves them up to the application. The database itself is an ever-growing 2 TB of data, which would make downtime significant.
I had planned on using Dapper, as I've enjoyed its ease of use and performance in the past, but weaning this database off of this custom ORM seems daunting and error-prone.
Questions:
Does anyone have any advice for tackling a large scale project?
Should I focus more on migrating the database into an entirely new data structure? (it needs significant normalization, too!)
My humble opinion:
A rule of thumb when dealing with legacy code is: if something works, leave it alone; make a change or an improvement only when necessary. The main reasons are:
A redesign adds almost no business value to the system by itself.
The system, good or bad, works. No matter how careful you are, you can always break something with a structural change.
Besides, the reasons to change (adding a feature, fixing a bug, improving the design, or optimizing resource usage) depend a lot on the company's plans. My humble experience tells me that time and budget matter a great deal, and although we always want to redesign (or, in some cases, to rewrite from scratch), what matters most is the business objectives and the value added.
In your case, maybe it's not necessary to replace the whole ORM. Since the IDs are cached, a better approach might be to modify the PKs, making them identity (auto-increment) columns with the proper starting value on each table. After that you can delete the particular part of the code that fetches the next IDs.
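For illustration only, here is a minimal sketch of what inserts could then look like with Dapper (which you mentioned) against MySQL, once the key column is AUTO_INCREMENT. The orders table, OrderId column, and OrderRepository class are invented for the example, and the connection string must allow batched statements:

```csharp
// Hypothetical sketch: once the PK column is AUTO_INCREMENT, let MySQL assign it
// and read it back, instead of asking the custom ORM's ID-dispensing function.
// Table, column, and class names are made up for illustration.
using Dapper;
using MySql.Data.MySqlClient;

class OrderRepository
{
    private readonly string _connectionString;

    public OrderRepository(string connectionString)
    {
        _connectionString = connectionString;
    }

    public long InsertOrder(string customerName)
    {
        const string sql =
            @"INSERT INTO orders (CustomerName) VALUES (@CustomerName);
              SELECT LAST_INSERT_ID();"; // the database hands out the key now

        using (var connection = new MySqlConnection(_connectionString))
        {
            // ExecuteScalar returns the value produced by the trailing SELECT.
            return connection.ExecuteScalar<long>(sql, new { CustomerName = customerName });
        }
    }
}
```

With that in place, the ORM's "fetch the next block of IDs" code path simply has nothing left to do.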
In some cases, an unnormalized database has its reasons. I've seen cases in which data is copied into tables to avoid joins that hurt performance. I'm talking about millions of records...
Reasons to change the ORM: maybe it's inefficient, or it doesn't release unmanaged resources (in which case a better approach is to implement the IDisposable interface). If it works, a better approach may be to adopt a new ORM only as you build new functionality. If the project needs refactoring for optimization purposes, apply the change at the bottlenecks, not across the entire codebase.
There is a lot of discussion about the topic. Good resources are "Working Effectively with Legacy Code" by Michael Feathers and "Getting Started with DDD When Surrounded by Legacy Systems" by Eric Evans.
Greetings!
Related
Short description of our application: we analyze .NET assemblies and detect dependencies between them (e.g. method calls). We save those dependencies in an MS SQL Server database. From a class/method in the code we can then find all direct and indirect dependencies and work out which code may break if we change an interface or implementation.
Although we make good use of indexes (which hurt our import performance, but imports run overnight anyway), we still have performance issues. Because we import many versions of the same assembly, we have quite a lot of data, and queries take a few seconds, which is just not fast enough (< 1.5 s is the target).
As dependencies are a graph-like structure we're wondering if switching from MSSQL to a NoSQL graph database may help. This would take some time so we're hoping for some external input first.
If yes, you can of course also post a recommended .NET graph database :-)
Call me an old fogey, but I would be quite careful making such a technology switch - as this SO question shows, the technology choice is fairly limited, and I think you run the risk of turning your project into a "Neo4j" project rather than a "dependency management" project. If you've really hit the buffers, that's worth considering, but it doesn't sound like you should be there with the data volumes you're discussing.
The first thing I'd consider is looking at the "nested set" model - this specifically solves the performance problem when retrieving all children for a given node.
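For example, assuming the hierarchy can be stored as a tree where each node carries Lft/Rgt bounds (the DependencyNodes table and its columns below are invented for illustration), fetching every direct and indirect dependency of a node becomes a single range query instead of a recursive walk. A rough sketch:

```csharp
// Nested set sketch: all descendants of a node are the rows whose Lft falls
// between the parent's Lft and Rgt. Table and column names are hypothetical.
using System.Collections.Generic;
using System.Data.SqlClient;

class DependencyReader
{
    public IList<string> GetAllDescendants(SqlConnection connection, int nodeId)
    {
        const string sql = @"
            SELECT child.Name
            FROM   DependencyNodes AS parent
            JOIN   DependencyNodes AS child
                   ON child.Lft BETWEEN parent.Lft AND parent.Rgt
            WHERE  parent.NodeId = @NodeId
              AND  child.NodeId <> @NodeId;";

        var names = new List<string>();
        using (var command = new SqlCommand(sql, connection))
        {
            command.Parameters.AddWithValue("@NodeId", nodeId);
            using (var reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    names.Add(reader.GetString(0));
                }
            }
        }
        return names;
    }
}
```

The trade-off is that inserts have to maintain the Lft/Rgt numbering, so it suits data that is read far more often than it is written, which sounds like your overnight-import scenario.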
In my last question I posted some sample code showing how I was trying to achieve separation of concerns. I received some OK advice, but I still just don't "get it" and can't figure out how to design my app to properly separate concerns without designing the next space shuttle.
The site I am working on (slowly converting from old ASP section by section) is moderately sized, with several different sections including a store (with ~100 orders per day), and gets a decent amount of traffic (~300k uniques/month). I am the primary developer and there might be at most 2-3 devs who will also work on the system.
With this in mind, I am not sure I need full enterprise-level architecture (correct me if I am wrong), but since I will be working on this code for the next few years, I want it to perform well and also be easy to extend as needed. I am learning C# and trying to incorporate best practices from the beginning. The old ASP site was a spaghetti mess and I want to avoid that this time around.
My current stab at doing this ended up being a bunch of DTOs with services that validate and make calls to a DAL layer to persist. It was not intentional, but I think the way it is set up now is a perfect anemic domain model. I have been trying to combat this by turning my BLL into domain objects and only using the DTOs to transfer data between the DAL and BOs, but it is just not working. I also had all my DTOs/BLLs split up according to the database tables/functionality (e.g., for a YouTube-style app I have separate DTO/BLL/DAL classes for segments, videos, files, comments, etc.).
From what I have been reading, I need to be at least using repositories and probably interfaces as well. This is great, but I am unsure how to move forward. Please help!
From what I can see you have four points that need addressing:
(1) "With this in mind, I am not sure I need full enterprise level architecture"
Let's deal with the high-level fluff first. It depends on what you mean by "full enterprise level architecture", but the short answer is "yes": you need to address many aspects of the system (and the context of the system will determine which are the main ones). If nothing else, the key ones are change and supportability. You need to structure the application in a way that supports change in the future: logical and physical separation of concerns (Dependency Injection is great for the latter), modular design, and so on.
(2) "How to properly separate concerns in my architecture without designing a spacecraft?"
I like this approach (it's an article I wrote that distilled everything I had learnt up to that point) - but here's the gist:
Looking at this you'll have a minimum of six assemblies - and that's not huge. If you can break your system down (separate concerns) into these large buckets it should go a long way to giving what you need.
(3) Detail
Separating concerns into different layers and classes is great, but you need to go further than that if you want to deal effectively with change. Dependency Injection (DI) is a key tool here. When I learnt DI it was a hand-rolled affair (as shown in the previous link), but there are lots of frameworks for it now. If you're new to DI (and you work in .NET), the article will step you through the basics.
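As a minimal, hand-rolled sketch (all type names here are hypothetical), constructor injection looks like this:

```csharp
// The service depends only on an abstraction; the concrete data access class
// is supplied from the outside, by hand or by a DI container.
public interface IVideoRepository
{
    Video GetById(int id);
}

public class SqlVideoRepository : IVideoRepository
{
    public Video GetById(int id)
    {
        // ... real data access code lives here, behind the interface ...
        return new Video { Id = id };
    }
}

public class VideoService
{
    private readonly IVideoRepository _repository;

    // The dependency is injected; VideoService never news up a concrete DAL class.
    public VideoService(IVideoRepository repository)
    {
        _repository = repository;
    }

    public Video GetVideo(int id)
    {
        return _repository.GetById(id);
    }
}

public class Video
{
    public int Id { get; set; }
    public string Title { get; set; }
}

// Composition root ("poor man's DI"):
// var service = new VideoService(new SqlVideoRepository());
```

A container (StructureMap, Unity, and the like) simply automates the composition root shown in that last comment; the design benefit comes from depending on the interface.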
(4) How to move forward
Get a simple vertical slice (UI all the way to the DB) working using DI, etc. As you do this you'll also be building the bones of the framework (sub-systems and major plumbing) that your system will use.
Having got that working, start on a second slice; it's at this point that you should uncover any places where you're inadvertently not reusing things you should be. This is the time to fix those, before you build slices 3, 4, and 5 and there's too much rework.
Updates for Comments:
Do you think I should completely drop Web Forms and take up MVC from scratch, or just go with what I know for now?
I have no idea, but for the answer to be 'yes' you'd need to be able to answer the following questions with 'yes':
We have the required skills and experience to use and support MVC.
We have time to make the change (there is clear benefit in making this change).
We know MVC is better suited for our needs.
Making this change does not put successful delivery at risk.
...do I need to move to projects and set up each of these layers as a separate project?
Yes. Projects map 1-to-1 with assemblies, so to get the benefits of loose coupling you'll definitely want to separate things that way, and be careful how you set references.
when you refer to POCOs, are you meaning just DTOs or rich domain objects?
DTO, not Rich Domain Object. BUT people seem to use the terms POCO and DTO interchangeably when, strictly speaking, they aren't the same thing, at least if you're from the Martin Fowler school of thought. In his view a DTO is a bunch of POCOs (or other objects) parcelled together for sending "across the wire", so that you make only one call to some external system rather than lots of calls.
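A tiny illustration of that distinction (the classes are hypothetical):

```csharp
using System.Collections.Generic;

public class Customer           // POCO: plain object, no framework baggage
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class Order              // another POCO
{
    public int Id { get; set; }
    public decimal Total { get; set; }
}

// Fowler-style DTO: parcels related POCOs together so a remote caller
// needs only one round trip instead of several.
public class CustomerSummaryDto
{
    public Customer Customer { get; set; }
    public List<Order> RecentOrders { get; set; }
}
```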
Everyone says I should not expose my data structures to my UI, but I say why not?
Managing dependencies. What you don't want is for your UI to reference the physical data structure, because as soon as that changes (and it will) you'll be (to use the technical term) screwed. This is the whole point of layering. What you want is for the UI to depend on abstractions, not implementations. In the 5-Layer Architecture the POCOs are safe to use for that because they are an abstract/logical definition of 'some thing' (a business concept), so they should only change if there is a business reason; in that sense they are fairly stable and safer to depend on.
If you are in the process of rewriting your eCommerce site, you should at least consider replacing it with a standard package.
There are many more such packages available today. So although the decision to build the original site may have been correct, it is possible that building a custom app is no longer the correct decision.
There are several eCommerce platforms listed here: Good e-commerce platform for Java or .NET
It should cost much less than the wages of 2-3 developers.
I am working on a .NET web application that uses a SQL Server database with approximately 20 to 30 tables.
Most tables will be included in the .NET solution as a class.
I have written my own data access layer to read the objects from, and write them to, the database.
The whole thing consists of just a few classes and very few lines of code, and uses generics and reflection to figure out which SQL and parameters to use.
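To give an idea, the approach is roughly like the following sketch (simplified, with invented names; the real code handles more cases):

```csharp
// Rough sketch of the generics + reflection idea: build an INSERT from the
// public properties of any entity type. Not production code.
using System;
using System.Data.SqlClient;
using System.Linq;

static class TinyMapper
{
    public static void Insert<T>(SqlConnection connection, T entity)
    {
        var properties = typeof(T).GetProperties()
                                  .Where(p => p.CanRead)
                                  .ToArray();

        string columns = string.Join(", ", properties.Select(p => p.Name));
        string values  = string.Join(", ", properties.Select(p => "@" + p.Name));
        string sql     = $"INSERT INTO {typeof(T).Name} ({columns}) VALUES ({values})";

        using (var command = new SqlCommand(sql, connection))
        {
            foreach (var property in properties)
            {
                command.Parameters.AddWithValue(
                    "@" + property.Name,
                    property.GetValue(entity, null) ?? DBNull.Value);
            }
            command.ExecuteNonQuery();
        }
    }
}
```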
Now, such a thing could be done with NHibernate (or a similar framework), and some co-workers claim it is foolish of me not to use it.
My main argument for not using it is that I want maximum control over my application and to know exactly what everything does and how everything works, even if that costs me more development time.
I also don't like the fact that I would have to map my database in XML files (my own solution lets me map it in the entity class files).
So, what I would like to hear from you is: is it really stupid not to use NHibernate in this situation?
Am I really being ignorant, or is it not such a strange idea to use my own solution?
I think these days there really isn't any reason to roll your own persistence framework, since there are so many good choices out there. You don't have to use NHibernate (though it is a good choice), but I would seriously consider using something that is well tested and established in the industry, as it will tend to perform better and have fewer bugs than something you write yourself.
It probably is foolish to write your own classes instead of using NHibernate, but it's less foolish to continue using your own classes, given that you've already written them. Maybe.
I won't call you foolish because I've done exactly the same thing in the past. Then I started using NHibernate and wondered why the hell I rolled my own. It's good, give it a go.
You have several possibilities that are probably better than reinventing the wheel. Let me name the two most likely choices:
Use Entity Framework for your DAL+DAO. This will make the classes you've already written obsolete, since EF will generate its own, and you'll be up to date with the latest language capabilities and technologies.
Use Fluent NHibernate so you don't have to work with XML mappings. This way you keep the business layer object classes you've written and avoid tedious NHibernate XML files. It's all C#.
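To give a flavour of option #2, a Fluent NHibernate mapping is plain C# (the Product entity here is hypothetical):

```csharp
// A small Fluent NHibernate mapping, just to show the "no XML" point.
using FluentNHibernate.Mapping;

public class Product
{
    public virtual int Id { get; set; }       // virtual so NHibernate can proxy
    public virtual string Name { get; set; }
    public virtual decimal Price { get; set; }
}

public class ProductMap : ClassMap<Product>
{
    public ProductMap()
    {
        Table("Products");
        Id(x => x.Id);        // maps the primary key
        Map(x => x.Name);     // column names default to the property names
        Map(x => x.Price);
    }
}
```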
Your way of thinking is good. You want control. That's fine. But writing your own DAL is a bit foolish these days, because you are basically reinventing the wheel, plus you'll have untested, buggy code that will take considerable time to develop, test, and debug.
If I were you, I'd go with the #2 option, since I've done option #1 and I know I had to customize lots of things to make EF work as it should. EF will be ready with V2.
People tend to use frameworks that are already written because, well, they're already written (and tested).
But there IS merit to rolling your own. Only you and your colleagues can make assumptions about your domain. A generic framework like NHibernate cannot make many assumptions, because that wouldn't make it very universal.
When you roll your own, you can bake these assumptions into your framework, to make a more streamlined, natural API. That said, if you were starting over I would have suggested taking an existing framework and wrapping it to better suit your needs. But since you already have something and it works for you, I'm not sure that I would suggest swapping it out for something else.
It depends on what they mean by "foolish."
If by "foolish" they mean you shouldn't have written your persistence layer in the first place, they're probably right, but that's crying over spilled milk.
If by "foolish" they mean you should rewrite all your existing code to use another framework (like NHibernate) when it's already working with yours, they're probably wrong (although there's something to be said for # of bugs in NHibernate vs likely # of bugs in yours).
If by "foolish" they mean the entire team knows NHibernate cold, and it's already used in the rest of your code, so by using your framework you're making it harder on the team, they're absolutely right, and you should probably refactor the code in NHibernate as soon as possible, before any more code gets locked in to your framework.
If by "foolish" they mean no one there really knows NHibernate, they just like it, then... nobody wins. They're being fussy, you implemented a framework you didn't have to... let's call it a tie.
All of that said, everyone should write a persistence framework or three. Those probably shouldn't end up in anything that ships, but it's a good exercise. The only mistake you made was tying code the team had to maintain into your good exercise.
There are many good persistence tools out there that are well tested and have proven performance (NHibernate, Linq to entities, LLBL Gen Pro). If your needs are very different from the normal persistence frameworks that exist then I would roll my own. I would want to take advantage of the testing and optimizations of an existing tool if at all possible, however.
That being said, I might also roll my own if I wanted to have the experience of building my own ORM tool and was willing to live with the downsides (not as well tested or optimized as tools that have been around for years, speed to market).
Making your own solution, especially when it seems to work fine and be as simple as you say, is neither ignorant nor strange. There are lots of situations where it's better to do that than to add a dependency on a separate project like NHibernate.
That said, there are of course also a lot of situations where the complete opposite is true. :)
It really depends on your project and team. If you are developing an enterprise application that will eventually be supported by someone else, sticking to industry standards might be a good idea even if it means a bit more work up front.
All of the answers here are great, but I am really surprised that nobody has mentioned Castle ActiveRecord; it sounds very similar to what your framework does and really simplifies the interface to NHibernate. It's one of the patterns that made Ruby on Rails so popular, after all!
Ayende Rahien (one of the principal NH developers) gave a GREAT presentation on ActiveRecord at Oredev a few years ago which I highly recommend: http://www.viddler.com/explore/oredev/videos/89
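To give a rough idea of the style (the Post entity is hypothetical, and this assumes ActiveRecord has been configured at startup): mapping lives in attributes on the class, and NHibernate does the work underneath.

```csharp
using Castle.ActiveRecord;

[ActiveRecord("Posts")]
public class Post : ActiveRecordBase<Post>
{
    [PrimaryKey]
    public int Id { get; set; }

    [Property]
    public string Title { get; set; }

    [Property]
    public string Body { get; set; }
}

// Usage (assuming ActiveRecordStarter.Initialize has been called elsewhere):
// var post = new Post { Title = "Hello", Body = "..." };
// post.Save();
// Post[] all = Post.FindAll();
```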
I think it is a matter of balancing control. You say that you want control and you don't want mappings. If that control comes at the cost of increased development and maintenance effort, and it takes longer to produce working code, then it is a problem.
I personally don't see a problem in rolling a framework as long as it simplifies a repetitive task and makes development more productive and code more stable, with less room for interpretation. We have rolled our own framework that includes a persistence/data access implementation. Our reasons for doing it, though, were specific: to work within a DDD environment that was much closer to what Evans describes than what most off-the-shelf products were providing.
I think the difference is, though, that we understood there was an upfront cost and that it would eventually balance itself out through savings in development time later. Of course, if you are writing code where you have to manually manage connections, map data, etc., you are probably going down the wrong path. At the very least, you could be using something like Enterprise Library to help you manage the tedium of connectivity and command construction. But I also think that if you have no reuse, nothing "framework"-like that you can abstract and apply to other projects, then you are creating a maintenance nightmare and time sink that you will be the sole owner of.
We were also using our own data access layer and entity classes, and we had a code generator that generated all these classes for us. But now we are using Entity Framework and we are more than happy.
Simple advice: start learning NHibernate (or whatever you prefer) and start using it in your next project.
Entity Spaces (http://www.entityspaces.net/Portal/Default.aspx) is also a good tool.
I ended up using Fluent NHibernate for the job.
All my entity classes were generated with ActiveRecordGenerator (http://code.google.com/p/active-record-gen/)
I wondered whether UpdateModel is considered an "expensive" operation (due to Reflection lookup of the model properties), especially when seen in the context of a larger web application (think StackOverflow)?
I don't want to engage in premature optimization but I consider it a design choice to use UpdateModel which is why I'd like to know early whether it is advisable or not to go with it. The other (tedious) choice is writing my own UpdateModel method for various domain objects with fixed properties.
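For context, the usage I have in mind is roughly this (controller, repository, and model are made up for the example), using the overload that whitelists the bound properties:

```csharp
using System.Web.Mvc;

public class ProductsController : Controller
{
    private readonly IProductRepository _repository; // hypothetical repository, wired up via DI

    public ProductsController(IProductRepository repository)
    {
        _repository = repository;
    }

    [AcceptVerbs(HttpVerbs.Post)]
    public ActionResult Edit(int id)
    {
        Product product = _repository.GetById(id);

        // Only the listed properties are bound from the request; reflection still
        // happens, but the surface area (and mass-assignment risk) is explicit.
        UpdateModel(product, new[] { "Name", "Price", "Description" });

        _repository.Save(product);
        return RedirectToAction("Details", new { id });
    }
}

public interface IProductRepository
{
    Product GetById(int id);
    void Save(Product product);
}

public class Product
{
    public string Name { get; set; }
    public decimal Price { get; set; }
    public string Description { get; set; }
}
```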
Thank you!
You are smart to want to not engage in premature optimization. Especially since this "optimization" would favor the processor's time over yours, which is far more expensive.
The primary rule of optimization is to optimize the slow stuff first. So consider how often you actually update a model versus selecting from your database backend. I'm guessing it's 1/10 as often or less. Now consider the cost of selecting from the database backend versus the cost of reflection. The cost of reflection is measured in milliseconds. The cost of selecting from the database backend can be measured in seconds at worst. My experience is that POSTs are rarely very slow, and when they are it's usually the database at fault rather than the reflection. I think you're likely to spend most of your optimization time on GETs.
Compared to network latency, database calls and general IO, the UpdateModel() call is trivial and I wouldn't bother with it.
I think UpdateModel is a bit of a shortcut that causes a huge amount of coupling between the view and the model.
I choose not to use "built-in" models (like being able to pass LINQ-created objects to the view directly from the database) because I want the option to replace my model with something more sophisticated, or even just another database provider. It is very tempting to use LINQ to SQL (or ADO.NET Entities) for fast prototyping, though.
What I tend to do is create my MVC application, then expose a 'service' layer which is then connected to a 'model' (which is an OO view of my domain). That way I can easily create a web service layer, swap databases, write new workflows etc without concern.
(and make sure you write your tests and use DI - it saves a lot of hassle!)
Rob
I believe that the best way to save your application state is in a traditional relational database whose table structure, most of the time, pretty much represents the data model of our system plus metadata.
However, other people on my team think that today it's best to simply serialize the entire object graph to a binary or XML file.
Needless to say (but I'll still say it), World War 3 is raging between us, and I would like to hear your opinion about this issue.
Personally I hate serialization because:
The saved data is tied to your development platform (C# in my case). Other platforms like Java or C++ can't use it.
The entire object graph (including the whole inheritance chain) is saved, not only the data we need.
Changing the data model might cause severe backward compatibility issues when trying to load old states.
Sharing parts of the data between applications is problematic.
I would like to hear your opinion about that.
You didn't say what kind of data it is -- much depends on your performance, simultaneity, installation, security, and availability/centralization requirements.
If this data is very large (e.g. many instances of the objects in question), a database can help performance via its indexing capabilities. Otherwise it probably hurts performance, or is indistinguishable.
If your app is being run by multiple users simultaneously, and they may want to write this data, a database helps because you can rely on transactions to ensure data integrity. With file-based persistence you have to handle that yourself. If the data is single-user or single-instance, a database is very likely overkill.
If your app has its own soup-to-nuts installation, using a database places an additional burden on the user, who must set up and maintain (apply patches etc.) the database server. If the database can be guaranteed to be available and is handled by someone else, this is less of an issue.
What are the security requirements for the data? If the data is centralized, with multiple users (either simultaneous or sequential), you may need to manage security and permissions on the data. Without seeing the data it's hard to say whether it would be easier to manage with file-based persistence or a database.
If the data is local-only, many of the above questions about the data have answers pointing toward file-based persistence. If you need centralized access, the answers generally point toward a database.
My guess is that you probably don't need a database, based solely on the fact that you're asking about it mainly from a programming-convenience perspective and not a data-requirements perspective. Serialization, especially in .NET, is highly customizable and can be easily tailored to persist only the essential pieces you need. There are well-known best practices for versioning this data as well, so I'm not sure there's an advantage on the database side from that perspective.
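As a sketch of what I mean (types and members invented for the example), DataContractSerializer lets you opt members in, skip transient state, and tolerate fields added in later versions:

```csharp
// Persist only the essential pieces, and stay version tolerant.
using System.IO;
using System.Runtime.Serialization;

[DataContract]
public class AppState
{
    [DataMember]
    public string CurrentProject { get; set; }

    [DataMember]
    public int ZoomLevel { get; set; }

    // Added in a later version; old files simply leave it null on load.
    [DataMember(IsRequired = false)]
    public string Theme { get; set; }

    // No [DataMember]: this never goes into the persisted state.
    public object CachedRenderer { get; set; }
}

static class StateStore
{
    public static void Save(AppState state, string path)
    {
        var serializer = new DataContractSerializer(typeof(AppState));
        using (var stream = File.Create(path))
        {
            serializer.WriteObject(stream, state);
        }
    }

    public static AppState Load(string path)
    {
        var serializer = new DataContractSerializer(typeof(AppState));
        using (var stream = File.OpenRead(path))
        {
            return (AppState)serializer.ReadObject(stream);
        }
    }
}
```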
About cross-platform concerns: If you do not know for certain that cross-platform functionality will be required in the future, do not build for it now. It's almost certainly easier overall to solve that problem when the time comes (migration etc.) than to constrain your development now. More often than not, YAGNI.
About sharing data between parts of the application: That should be architected into the application itself, e.g. into the classes that access the data. Don't overload the persistence mechanism to also be a data conduit between parts of the application; if you overload it that way, you're turning the persisted state into a cross-object contract instead of properly treating it as an extension of the private state of the object.
It depends on what you want to serialize, of course. In some cases serialization is ridiculously easy.
(I once wrote a kind of timeline program in Java, where you could draw, drag around, and resize objects. When you were done you could save your work to a file (like myTimeline.til). At that moment hundreds of objects were saved: their positions on the canvas, their sizes, their colors, their inner texts, their special effects, ...
You could then, of course, open myTimeline.til and keep working.
All this took only a few lines of code (I just made all the classes and their dependencies serializable), and my coding time was less than 5 minutes; I was astonished myself! (It was the first time I had ever used serialization.)
Working on a timeline you could also 'save as' for different versions, and the .til files were very easy to back up and mail.
I think in my particular case it would have been a bit silly to use a database. But that's of course for document-like structures only, like Word, to name one.)
My point, then, is that there are certainly scenarios in which a database wouldn't be the best solution. Serialization was not invented by developers just because they were bored.
As for your four points:
1. Not true if you use XML serialization or SOAP.
2. Not quite relevant anymore.
3. Only if you are not careful; there are plenty of 'best practices' for that.
4. Only if you want it to be problematic; see point 1.
Of course, besides speed of implementation, serialization has other important advantages, like not needing a database at all in some cases!
See this Stackoverflow posting for a commentary on the applicability of XML vs. the applicability of a database management system. It discusses an issue that's quite similar to the subject of the debate in your team.
You have some good points. I pretty much agree with you, but I'll play the devil's advocate.
Taking your four points in turn:
1. Well, you could always write a converter in C# to extract the data later if needed.
2. That's a weak point, because disk space is cheap, and the extra bytes we'd use cost far less than the time we'd waste trying to get this all to work your way.
3. That's the way of the world. Burn the bridges and require upgrades. Convert the data, or make a tool to do that, and then no longer support the old version's way of doing it.
4. Not if the C# program hands off the data to the other applications. Other applications shouldn't be accessing data that belongs to this application directly, should they?
For transfer and offline storage, serialization is fine; but for active use, some kind of database is far preferable.
Typically (as you say), without a database, you need to deserialize the entire stream to perform any query, which makes it hard to scale. Add the inherent issues with threading etc, and you're asking for pain.
Some of your other pain points about serialization aren't all true - as long as you pick wisely. Obviously, BinaryFormatter is a bad choice for portability and versioning, but "protocol buffers" (Google's serialization format) has versions for Java, C++, C#, and a lot of others, and is designed to be version tolerant.
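For instance, with protobuf-net (a .NET implementation of protocol buffers) a contract might look like the sketch below. The type and members are invented, but the point is that fields are identified by stable numbers rather than .NET type metadata, so other platforms and newer versions of the class can read the same bytes:

```csharp
using System.IO;
using ProtoBuf;

[ProtoContract]
public class PersistedOrder
{
    [ProtoMember(1)]
    public int Id { get; set; }

    [ProtoMember(2)]
    public string Customer { get; set; }

    // Added later: readers that don't know field 3 ignore it,
    // and old data simply leaves it null.
    [ProtoMember(3)]
    public string Notes { get; set; }
}

class Demo
{
    static void Main()
    {
        var order = new PersistedOrder { Id = 1, Customer = "ACME" };

        using (var file = File.Create("order.bin"))
        {
            Serializer.Serialize(file, order);
        }

        using (var file = File.OpenRead("order.bin"))
        {
            var roundTripped = Serializer.Deserialize<PersistedOrder>(file);
        }
    }
}
```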
Just make sure you have a component that handles saving/loading state with a clean interface to the rest of your application. Then whatever choice you make for persistence can easily be revisited later.
Serializing the object graph to a file might be a good quick-and-dirty initial solution that is fast to implement.
But if you start to run into issues that make a database a better choice you can plug in a new version with little or no impact on the rest of the application.
Yes, probably true. The downside is that you must retrieve the whole object graph, which is like retrieving all the rows from a table. If it's big, that's a drawback. But if it isn't so big (and my hobby projects aren't), maybe it's a perfect match?