Background: we've got a number of server processes and client apps that are used entirely internally, in a fairly controlled environment. We capture a significant amount of data every day that goes into a couple of database machines. Almost everything is C#, with a few C++ apps.
Just about every app has some basic (if not extensive) dependence on database data, whether it's for historical data, daily-calculated values, or assorted parameters. As the whole environment has gotten a bit more sprawling, I've been wondering about the sense in sticking an intermediary between all the client and server apps and the database, a sort of "database data broker". Any app that needs values from the DB makes a request to the data broker, instead of going through a DLL wrapper function that calls a stored proc.
One immediate downside is that the data would make two trips across the network: from the DB to the broker, and from the broker to the calling app. It seems like poor form, but the amount of data in each request would be small enough that I'm OK with it performance-wise.
One (seeming) upside is that it would be trivial to set up a test environment, since it would entail just setting up a test data broker, with no DB connection strings to maintain locally anywhere else. I've also been pondering creating a mini request language so you wouldn't have to enumerate functions for each dataset you might request (instead of GetX() and GetY(), there would be Get("name = X")).
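To make the idea concrete, here's a minimal sketch of what such a broker contract might look like (IDataBroker, ParameterLoader, and the query-string format are all hypothetical, just to illustrate the generic Get idea):

```csharp
using System.Data;

// Hypothetical broker contract: one generic entry point instead of GetX()/GetY().
public interface IDataBroker
{
    // "query" would be parsed by the broker's mini request language,
    // e.g. Get("name = DailyCalculatedValues")
    DataTable Get(string query);
}

// Client-side usage: no connection strings, just a call to the broker.
public class ParameterLoader
{
    private readonly IDataBroker _broker;

    public ParameterLoader(IDataBroker broker)
    {
        _broker = broker;
    }

    public DataTable LoadParameters(string datasetName)
    {
        return _broker.Get("name = " + datasetName);
    }
}
```

A test environment would then just swap in a different IDataBroker implementation behind the same interface.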
Am I over-engineering this, or is it possibly a worthy architecture?
Edit: thanks for all the great comments so far; great food for thought.
It depends on what you're trying to accomplish with it. According to Rocky Lhotka, you should only add a tier if you are forced to, kicking and screaming all the way.
I agree with him: don't tier unless you need to. I think there are valid reasons to add additional tiers, usually for purposes of security, scalability and maintainability. The question becomes: is yours a valid reason?
It looks like the major reason is maintainability. Does it outweigh the benefits you get by not having the tier?
only you can answer these:
what are the benefits of doing this?
what are the problems/risks of doing this?
do you need this to make testing easier or even possible?
if you make this change and it crashes when it goes live, will you be fired?
if you make the change and it goes live, will you get a promotion?
etc...
As the former architect of a system that also used a database heavily as a "hub," I can say that there are several drawbacks that you should be aware of. Our system used databases:
As a transaction store (typical OLTP stuff)
As a staging queue (submitted but unprocessed transactions)
As a historical data store (results of processed transactions)
As an interoperation layer (untranslated commands or transactions issued from other systems)
One of the major drawbacks is ownership cost. When your databases become the single point of failure for so many types of operations, it becomes necessary to ensure that they are all hosted in high-availability environments. This is not only expensive from a hardware perspective, but it is also expensive to support deployments to HA environments, since developers typically have very limited visibility into the internals.
A second drawback is that you have to seriously design integrity into all of your tables. In a typical SOA environment, you have complete control over how data is modified. When you expose it through database tables, you must assume that any application with the right credentials has the ability to modify data. Because of this, you must carefully consider how much constraint enforcement to build into the tables themselves. If you had a single service managing persistence, you could be much looser with constraints on the database and enforce them in code.
Third, if you ever want to expose any functionality that the database tables currently allow you to provide to outside parties, you must write service code anyway, so you might be better served doing it strategically as opposed to reacting to requests.
Fourth, UI interaction directly with the data layer creates security risks, especially if the client is a thick client.
Finally, writing code that responds to events (service calls) is much easier than polling code. Typically, organizations that rely heavily on database polling end up reinventing the wheel every time a new project requires a new "monitoring service." It can be avoided by creating a "framework," but those have their own pitfalls (primarily around prescription versus adoption).
This is just a laundry list of problems I have encountered. It's not necessarily meant to dissuade you from using databases for these functions, but it helps to know the dangers ahead of time so you can at least plan for them if they ever do become issues.
EDIT
Just thought of another scenario that caused us pains. Versioning your changes can be difficult. For example, if you need to change the shape of a table (normalize/denormalize), it has a cascading effect if multiple applications rely on it. In a SOA scenario, it is much easier, because you can keep your old API, change the internal interaction so that it works with the changed tables, and allow consumers to migrate to the new version on their own schedule.
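To illustrate that versioning point, a hedged sketch (all type names made up): the v1 contract keeps the old flattened shape while its implementation is rewritten against the changed tables, and v2 exposes the new shape for consumers who are ready for it.

```csharp
// Hypothetical example: the Customer table was split into Customer + Address.
// The old contract stays alive; only its implementation changes.
public class CustomerDtoV1
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Address { get; set; }   // still flattened, as v1 callers expect
}

public class AddressDto
{
    public string Street { get; set; }
    public string City { get; set; }
}

public class CustomerDtoV2
{
    public int Id { get; set; }
    public string Name { get; set; }
    public AddressDto Address { get; set; }   // new normalized shape
}

public interface ICustomerServiceV1 { CustomerDtoV1 GetCustomer(int id); }
public interface ICustomerServiceV2 { CustomerDtoV2 GetCustomer(int id); }
```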
A data broker sounds like a really good way to abstract out the multiple data sources for your apps. It would be easy to consolidate, change repositories, or otherwise move data around if needed in the future.
I may be misunderstanding something, but it seems to me that you should consider some kind of entity framework. That is a framework you can use to "map" your interaction with the DB onto domain objects. That way you work locally on domain objects that get filled from your DB, and when it is time to persist the state of your objects to the database, the framework handles all the connections back and forth. This way you can also easily mock up these domain objects for unit testing without needing a DB connection.
Check out NHibernate for a good entity framework alternative.
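As a rough sketch of that idea (hypothetical names, not tied to any particular ORM's API): the app codes against plain domain objects and a repository interface, the ORM provides the real implementation, and unit tests substitute an in-memory fake with no DB connection.

```csharp
using System.Collections.Generic;
using System.Linq;

// Plain domain object, persisted by the ORM.
public class Measurement
{
    public virtual int Id { get; set; }        // virtual so NHibernate can proxy it
    public virtual string Name { get; set; }
    public virtual double Value { get; set; }
}

// The app codes against this; the ORM sits behind the real implementation.
public interface IMeasurementRepository
{
    Measurement GetById(int id);
    IList<Measurement> GetByName(string name);
}

// In-memory fake for unit tests: no database connection required.
public class FakeMeasurementRepository : IMeasurementRepository
{
    private readonly List<Measurement> _items = new List<Measurement>();
    public void Add(Measurement m) { _items.Add(m); }
    public Measurement GetById(int id) { return _items.FirstOrDefault(m => m.Id == id); }
    public IList<Measurement> GetByName(string name)
    {
        return _items.Where(m => m.Name == name).ToList();
    }
}
```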
If you already have the database-related know-how, I think it's not a bad decision.
Good things that I can think of:
if the data model is consistent, you can plug in new tools easily without making any changes to the other apps.
you can probably keep the database running more reliably than your apps, so if one of them fails, the others can keep working.
you can make backups and rollbacks using the database tools.
you can do emergency fixes by manipulating the data directly with SQL or some visual tool.
But if you have to learn new frameworks along the way, maybe the benefits are not worth the extra initial effort.
"any app that needs values from the db makes a request to the data broker"
When database technology was being invented over 40 years ago, the people doing that inventing had ideas along the lines of "any app that needs values from the db makes a request to the dbms".
Have you ever pondered the possibility that YOU ALREADY HAVE a "data broker", and that there might be very little added value in creating a second one of your own?
Related
I need to decide on the optimal way to write a C# client application to view the dataset in a number of different views. One, some or all views may be visible at once and must be coherent.
A simplified illustration of the dataset would be something like this, assume around 10000 items.
Based on this dataset a number of aggregates must be calculated, such as the sum of values for each ItemId and for each ClientId. The actual calculations are a bit more complicated, but assume that around 30 different aggregates must be calculated.
There will be around 10 clients that will view the data at any one time. Each user will decide if the data is continuously updated or refreshed automatically.
The data is stored in SQL Server 2008 R2 and all clients have access to this directly and are on the same LAN.
The UI needs to be non-blocking, so that new data can be read in the background and the active views refreshed when all aggregates have been calculated.
What architecture/technology/pattern is best suited to this sort of scenario?
Should I use WPF, Windows Forms or Silverlight?
Should the views be pre calculated on the server or should the client do this processing?
Should the client connect directly to the database or via a WCF service?
Unfortunately I think the answer for most of your questions is "It Depends".
1 - MVVM makes sense given your requirements of many views of the same sets of data.
2 - Which technology are you most familiar with now, and what is the timeframe of your application? If you're very familiar with WinForms and have a tight schedule, that makes sense. If you're not and you have time to learn, then Silverlight may make more sense. I'm kind of torn on WPF, since nowadays it seems more like Silverlight++ instead of the other way around. In other words, if you need to do a line-of-business app, pick Silverlight UNLESS there's a requirement that can only be fulfilled with WPF.
3 - The answer to this question depends on two things: how often the data is being updated, and how complex (and how well suited to SQL) the calculations are. I would generally prefer to handle aggregation on the server side, but depending on the exact calculations you're performing, that may or may not be feasible.
4 - This will really be made for you depending on your choice of technologies. Silverlight can't connect to the database directly, so you have to use a service. WinForms and WPF can directly connect to the database. Even so, going with a data access service can help if you find that having every client directly call the database could be a performance issue.
A lot of these decisions are trade offs, and you might not even know that you've traded something until it becomes an issue.
TLDR: It depends.
I mostly agree with Barry: MVVM for the design pattern, Silverlight makes sense, and going through a web service can also help with authorization / providing different features to different clients (even though this was not a specific requirement).
The only place I disagree is where the aggregations are done: I personally think you should send the minimum amount of data necessary to the client, and let the clients do as many of the calculations as possible. For example, in your table, the 'Value' field looks like it's a simple product of 'Quantity' and 'Price'. I don't really see the reason for calculating this in the database, then loading it into the web service, then passing it to the client, such that the client only displays the data. That way your database is doing 95% of the work; you could instead make the client do some of the work (10-15%) and reduce the workload on your database.
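As a minimal sketch of that client-side calculation (the Row class is hypothetical; Quantity, Price, and ItemId are taken from the question's example table):

```csharp
using System.Collections.Generic;
using System.Linq;

public class Row
{
    public int ItemId { get; set; }
    public int ClientId { get; set; }
    public int Quantity { get; set; }
    public decimal Price { get; set; }
    // Derived on the client instead of being shipped over the wire.
    public decimal Value { get { return Quantity * Price; } }
}

public static class Aggregates
{
    // Sum of Value per ItemId, computed locally from the raw rows.
    public static Dictionary<int, decimal> SumValueByItem(IEnumerable<Row> rows)
    {
        return rows.GroupBy(r => r.ItemId)
                   .ToDictionary(g => g.Key, g => g.Sum(r => r.Value));
    }
}
```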
What architecture/technology/pattern is best suited to this sort of scenario?
As this sounds like a mostly read-only application and not something that will be doing data entry and validation, I'd probably opt for WPF (10 clients isn't much). Silverlight would be my second choice. Either tech will give you async callbacks. I've done both; WPF is going to be more responsive.
Should I use WPF, Windows Forms or Silverlight?
Avoid WinForms like the plague.
Should the views be pre calculated on the server or should the client do this processing?
It depends on how much work you want the database to do. Personally, I hate putting business logic in the database... I'd probably write a lightweight Service Facade over the DB, put my business logic in there, and then expose the results as a WCF Service to a WPF client (or as a RIA Service if I opted for Silverlight).
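For illustration, a hedged sketch of such a facade contract in WCF (the service and DTO names are invented, not anything from the question):

```csharp
using System.Collections.Generic;
using System.Runtime.Serialization;
using System.ServiceModel;

[ServiceContract]
public interface IAggregateService
{
    // Returns the pre-calculated aggregates; the business logic lives
    // behind this facade, not in the database and not in the client.
    [OperationContract]
    IList<ItemAggregate> GetItemAggregates();
}

[DataContract]
public class ItemAggregate
{
    [DataMember] public int ItemId { get; set; }
    [DataMember] public decimal TotalValue { get; set; }
}
```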
Should the client connect directly to the database or via a WCF service?
Connecting directly to the DB means you're going to HAVE to put all the business logic either in the DB (views/sprocs) or on the client (bad, IMO), or spread it out over the two (worse).
I, personally, would want all of the business logic in one place, on the server, so that clients who are fed the data are guaranteed to get the SAME results. I would not let the clients do calculations of the data.
EDIT: Also, having the clients do ANY sort of business calculations makes deployment and versioning more difficult. If you KNOW all of the business logic sits on a server, then all you have to do to update things is release a new version of the service. Then you don't care how many clients you have and who is on what version of the client app - you can still guarantee that the clients are getting the newest data via the one service.
I'd probably either construct views on the DB for the result sets (if the data calculations were simple and lent themselves to views easily) or I'd have a service on the server-side that did the calculations every X minutes and then cached those results (cache to other tables, or a different DB, or no-sql, whatever). Clients would then be free to pull data as often as they want, but the calculation of the data would actually be controlled by the service, not the clients. Letting the clients drive the updating of the data could be a mistake... The only way I'd find that acceptable really is if you put some kind of throttling mechanism on your messages so you could tell clients, "Don't ping me so fast for results, the server is lagging". Because the server needs to have some control over the load.
There's a lot to think about in this scenario, and these are just my initial thoughts.
I am designing a program that will build and maintain a database, and act as a central server. This is the 'first stage' of a grander plan. Coming later will be 3-5 remote programs built around the information put into this database.
The requirements are:
The remote programs must be able to access the information in the database.
The remote programs must be able to set alerts when information in the database changes.
The remote programs must be able to request the central server to go out and fetch new / different data.
So, the question is this: how do I expose this data and events to the outside world? My two choices are:
Have them communicate directly with my 'server' application. This seems easier to:
do event notifications (although I suppose I'm probably missing something in SQL).
It also seems like this is more 'upgradeable': that is, I don't need to worry about a database change crashing all my remote programs, because I can account for the change and transform the data into a version the child program will understand.
Just go ahead and let them connect directly to the database.
The nice thing about this is that it's a solved problem. I can use LINQ to SQL. The only thing the main server application needs to do is let the remote programs know where the database is.
I'm unsure how to trigger / relay 'events' for field changes in a database across different programs that may or may not be on the same computer.
Forgive my ignorance on this question. I feel woefully unprepared to ask it, but I'm having a hard time figuring out where to get started with this. It is my first real DB project :-/
Thanks!
If the other programs are going to need to know about updates to the database, then the best solution is to manage all db updates through your server application so it can alert clients of the changes. Otherwise it will be tough for the clients to be aware of changes to the db. This also has the advantage of hiding the implementation details of your storage solution from the clients, so you are free to change databases, etc...
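A rough sketch of that idea (all names hypothetical): because every write funnels through the server application, it is the one place that can raise a change notification, which could then be relayed to remote clients over WCF callbacks, a message queue, or similar transport.

```csharp
using System;

public class RecordChangedEventArgs : EventArgs
{
    public string TableName { get; set; }
    public int RecordId { get; set; }
}

// All database writes go through this class, so it can tell everyone who cares.
public class DataServer
{
    public event EventHandler<RecordChangedEventArgs> RecordChanged;

    public void UpdateRecord(string tableName, int recordId /*, new values... */)
    {
        // 1. Perform the actual database update here (ADO.NET, LINQ to SQL, etc.).
        // 2. Then notify subscribers; remote clients would be reached via WCF
        //    callbacks, a message queue, or a similar mechanism.
        var handler = RecordChanged;
        if (handler != null)
            handler(this, new RecordChangedEventArgs { TableName = tableName, RecordId = recordId });
    }
}
```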
My suggestion would be to go with option 1. Build out a web service that can provide the information they all need. This will be the most flexible and will reduce the duplicate backend code you'd end up with if every program communicated with the database directly.
I would recommend looking at some data source design patterns first. These types of patterns will help you come up with solutions for managing the state of your data. Beyond that, I think I would need some more information about your requirements for the clients to make any further useful suggestions.
I recommend you learn about SQL Server and/or databases first. You don't appear to realize that most of what you want from your "central server" can all be done by SQL Server itself.
A central database is the simplest option and the cheapest to both build and maintain.
There are however a few scenarios where a central database could cause problems:
High load on one of the systems: A high load on one of the systems could reduce performance on the other systems. For example someone running an internal report stops you being able to take orders on your eCommerce site.
With several systems writing to the same database there is a greater chance of locking.
With several systems dependent on the same database schema, how do you upgrade? All systems at the same time?
If you need to take down the database, all systems stop.
I am making a member-based web app in ASP.NET MVC 3 and I am trying to plan ahead. At first our user base will not be huge, but as with any software the potential for a sudden spike in volume is always a possibility.
Thinking ahead to this scenario, I know that the database is the bottleneck on most web apps. We are using MSSQL 2008 R2, and we will have dedicated servers with several client databases; each client has their own database, so if one server begins to bottleneck we can scale vertically, or move some of the databases to a new server and begin filling it up.
To access the databases we primarily use LINQ to SQL, and we are currently refactoring some of our code to make use of the IQueryable mechanisms to lazy-load content, but each page contains quite a bit of content from various parts of the database.
We also have a few large databases that are used for widgets in the program that rarely change but have millions of rows. The goal with those is to somehow sync them to the primary source and distribute them across several machines and then load balance those servers.
With this layout should I even worry about caching, or will the built-in caching mechanisms in MSSQL be sufficient?
If so, where should I begin? I have looked briefly at AppFabric, but it looks as though it is for Azure only?
Resources:
How to cache data in a MVC application
http://stephenwalther.com/blog/archive/2008/08/28/asp-net-mvc-tip-39-use-the-velocity-distributed-cache.aspx
http://stephenwalther.com/blog/archive/2008/08/29/asp-net-mvc-tip-40-don-t-cache-pages-that-require-authentication.aspx
Lazy loading is a performance killer. It's better to load the entire object graph with one join than to lazy-load the other properties. This is especially the case with a list of objects: if you iterate, you'll end up doing a lazy load for each item in the list. Furthermore, every call to the DB has overhead. Fewer calls = better performance.
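For LINQ to SQL specifically, eager loading can be switched on with DataLoadOptions. A minimal sketch, assuming a hypothetical generated MyDataContext with Order/OrderDetail entities:

```csharp
using System.Data.Linq;
using System.Linq;

public static class OrderQueries
{
    public static void LoadOrdersEagerly(MyDataContext db)
    {
        var options = new DataLoadOptions();
        // Pull each order's details in the same query, instead of paying one
        // extra round trip per order when the list is iterated.
        options.LoadWith<Order>(o => o.OrderDetails);
        db.LoadOptions = options;   // must be set before any query runs on this context

        foreach (var order in db.Orders.Where(o => o.Total > 100))
        {
            // order.OrderDetails is already populated; no lazy-load trip here.
        }
    }
}
```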
SO was a top 1000 website before it needed two database servers. I think you'll be ok.
If your revenue model says "each client will have its own database", then your scaling issues should be really easy to solve. It sounds like you already have a plan to scale up with more servers as your client base increases. What's the problem?
Caching on the web tier is usually the first scaling fix you'll have to worry about. You probably don't need to do a fresh db call with each page request.
Overall this sounds like a lot of premature optimization. Your traffic hasn't reached a point where you need to be worried about scaling. Make these kinds of decisions at the last possible moment.
The database cache is different from most caches: it can of course load frequently used data into memory and re-use query plans, but that isn't really a cache as such.
AppFabric is definitely not just Azure; after all, if it was, you wouldn't be able to install it (and use it) locally :) but in truth there is little to choose between AppFabric, Redis and memcached (the latter lacks persistence, of course).
But I think you should initially look at using the built-in ASP.NET caching: both data caching via HttpContext.Cache, and caching of entire responses (or, in MVC 3, partials). Obviously you should have a broad idea of what data is used heavily by lots of requests and is safe to re-use: cache that!
Just make sure you treat all cached data as immutable (if you need to update the cache, re-add a modified value; don't modify the existing objects). The reason: it won't work the same if you start needing to use distributed caching, as that uses serialization, and any changes you make won't be seen by the next request.
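A small sketch of the built-in data cache in use (the key, the WidgetList type, and the loader delegate are made up):

```csharp
using System;
using System.Web;
using System.Web.Caching;

public class WidgetList { /* hypothetical, treated as immutable once cached */ }

public static class WidgetCache
{
    // Cache heavily-read, rarely-changing data for a few minutes and treat
    // the cached object as immutable once it has been added.
    public static WidgetList GetWidgets(Func<WidgetList> loadFromDb)
    {
        var cache = HttpContext.Current.Cache;
        var widgets = cache["widgets"] as WidgetList;
        if (widgets == null)
        {
            widgets = loadFromDb();
            cache.Insert("widgets", widgets, null,
                         DateTime.UtcNow.AddMinutes(5), Cache.NoSlidingExpiration);
        }
        return widgets;   // never modify this instance; re-add a new one instead
    }
}
```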
I am working on a sometimes-connected CRUD application that will be used primarily by teams (2-4) of social workers and nurses to track patient information in the form of a plan. The application is a reworking of an ASP.NET app that was created before my time. There are approx. 200 tables across 4 databases. The web app version relied heavily on SPs, but since this version is a WinForms app that will be pointing to a local DB, I see no reason to continue with SPs. Also of note, I had planned to use merge replication to handle the syncing portion, and there seem to be some issues with those two together.
I am trying to understand what approach to use for the DAL. I originally had planned to use LINQ to SQL, but I have read tidbits that state it doesn't work well in a sometimes-connected setting. I have therefore been trying to read up on and experiment with numerous solutions: SubSonic, NHibernate, Entity Framework. This is a relatively simple application, and due to a "looming" version 3 redesign this effort can be borderline "throwaway." The emphasis here is on getting a desktop version up and running ASAP.
What I am asking here is for anyone with experience using any of these technologies (or one I didn't list) to lend me your hard-earned wisdom. What, in your opinion, is the best approach for me to pursue? Any other insights on creating this kind of app? I am really struggling with the DAL portion of this program.
Thank you!
If the stored procedures do what you want them to, I would have to say I'm dubious that you will get benefits by throwing them away and reimplementing them. Moreover, it shouldn't matter if you use stored procedures or LINQ to SQL style data access when it comes time to replicate your data back to the master database, so worrying about which DAL you use seems to be a red herring.
The tricky part about sometimes connected applications is coming up with a good conflict resolution system. My suggestions:
Always use RowGuids as your primary keys to tables. Merge replication works best if you always have new records uniquely keyed.
Realize that merge replication can only do so much: it is great for bringing new data in disparate systems together. It can even figure out one sided updates. It can't magically determine that your new record and my new record are actually the same nor can it really deal with changes on both sides without human intervention or priority rules.
Because of this, you will need "matching" rules to resolve records that are claiming to be new, but actually aren't. Note that this is a fuzzy step: rarely can you rely on a unique key to actually be entered exactly the same on both sides and without error. This means giving weighted matches where many of your indicators are the same or similar.
The user interface for resolving conflicts and matching up "new" records with the original needs to be easy to operate. I use something that looks similar to the classic three way merge that many source control systems use: Record A, Record B, Merged Record. They can default the Merged Record to A or B by clicking a header button, and can select each field by clicking against them as well. Finally, Merged Records fields are open for edit, because sometimes you need to take parts of the address (say) from A and B.
None of this should affect your data access layer in the slightest: this is all either lower level (merge replication, provided by the database itself) or higher level (conflict resolution, provided by your business rules for resolution) than your DAL.
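Purely as an illustration of the "weighted match" idea above (the fields and weights are invented, not a recommendation): something along these lines can score a pair of records and flag likely duplicates for the merge UI, rather than silently treating them as new.

```csharp
using System;

public class PatientRecord
{
    public Guid RowGuid { get; set; }
    public string LastName { get; set; }
    public string FirstName { get; set; }
    public DateTime? DateOfBirth { get; set; }
    public string Phone { get; set; }
}

public static class RecordMatcher
{
    // Returns a score in [0,1]; above some threshold, show the pair in the
    // three-way merge UI instead of inserting a "new" record.
    public static double MatchScore(PatientRecord a, PatientRecord b)
    {
        double score = 0;
        if (string.Equals(a.LastName, b.LastName, StringComparison.OrdinalIgnoreCase)) score += 0.4;
        if (string.Equals(a.FirstName, b.FirstName, StringComparison.OrdinalIgnoreCase)) score += 0.2;
        if (a.DateOfBirth.HasValue && a.DateOfBirth == b.DateOfBirth) score += 0.3;
        if (!string.IsNullOrEmpty(a.Phone) && a.Phone == b.Phone) score += 0.1;
        return score;
    }
}
```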
If you can install a DB system locally, go for something you feel familiar with. The greatest problem, I think, will be the syncing and merging part. You must think through several possibilities: what if you changed something that someone else deleted on the server? Who decides?
I've never used the Sync Framework myself, just read an article, but it may give you a solid foundation to build on. Whichever way you go with data access, though, the solution for the business logic will probably have a much wider impact...
There is a sample app called IssueVision that Microsoft put out back in 2004.
http://windowsclient.net/downloads/folders/starterkits/entry1268.aspx
Found the link in an old thread on joelonsoftware.com. http://discuss.joelonsoftware.com/default.asp?joel.3.25830.10
Other ideas...
What about mobile broadband? A couple of 3G cellular cards would work tomorrow, and your app would need no changes apart from large pages/graphics.
Or an Excel spreadsheet used in the field, with DTS or SSIS to import the data into the application, while a "better" solution is created.
Good luck!
If by SPs you mean stored procedures... I'm not sure I understand your reasoning for trying to move away from them, considering that they're fast, proven, and already written for you (i.e. tested).
Surely, if you're making an app that will mimic the original, there are definite merits to keeping as much of the original (working) codebase as possible - the least of which is speed.
I'd try installing a local copy of the db, and then pushing all affected records since the last connected period to the master db when it does get connected.
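A hedged sketch of that push step, assuming each table carries a LastModified column and using LINQ to SQL-style names (the two contexts and the Patient entity are hypothetical):

```csharp
using System;
using System.Linq;

public class SyncService
{
    // Push everything touched locally since the last successful sync.
    public void PushChanges(LocalDataContext local, MasterDataContext master, DateTime lastSyncUtc)
    {
        var changed = local.Patients.Where(p => p.LastModifiedUtc > lastSyncUtc).ToList();

        foreach (var patient in changed)
        {
            var existing = master.Patients.SingleOrDefault(p => p.RowGuid == patient.RowGuid);
            if (existing == null)
            {
                existing = new Patient { RowGuid = patient.RowGuid };
                master.Patients.InsertOnSubmit(existing);   // new record on the master
            }
            CopyFields(patient, existing);   // field-by-field copy, subject to conflict rules
        }
        master.SubmitChanges();
    }

    private void CopyFields(Patient from, Patient to) { /* field-by-field copy */ }
}
```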
I've been taking a look at some different products for .NET which propose to speed up development time by providing a way for business objects to map seamlessly to an automatically generated database. I've never had a problem writing a data access layer, but I'm wondering if this type of product will really save the time it claims. I also worry that I will be giving up too much control over the database and make it harder to track down any data level problems. Do these type of products make it better or worse in the already tough case that the database and business object structure must change?
For example:
Object Relation Mapping from Dev Express
In essence, is it worth it? Will I save "THAT" much time, effort, and future bugs?
I have used SubSonic and EntitySpaces. Once you get the hang of them, I believe they can save you time, but as the complexity of your app and the volume of data grow, you may outgrow these tools. You start to lose time trying to figure out whether something like a performance issue is related to the ORM or to your code. So, to answer your question, I think it depends. I tend to agree with Eric on this: high-volume enterprise apps are not a good place for general-purpose ORMs, but in standard-fare smaller CRUD-type apps, you might see some time saved.
I've found iBatis from the Apache group to be an excellent solution to this problem. My team is currently using iBatis to map all of our calls from Java to our MySQL backend. It's been a huge benefit as it's easy to manage all of our SQL queries and procedures because they're all located in XML files, not in our code. Separating SQL from your code, no matter what the language, is a great help.
Additionally, iBatis allows you to write your own data mappers to map data to and from your objects to the DB. We wanted this flexibility, as opposed to a Hibernate type solution that does everything for you, but also (IMO) limits your ability to perform complex queries.
There is a .NET version of iBatis as well.
I've recently set up ActiveRecord from the Castle Project for an app. It was pretty easy to get going. After creating a new app with it, I even used MyGeneration to script out class files for a legacy app that ActiveRecord could use, all in a pretty short time. It uses NHibernate to interact with the database, but takes away all the XML mapping that comes with NHibernate. The nice thing, though, is that since NHibernate is already in your project, you can use its full power if you have some special cases. I'd suggest taking a look at it.
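A tiny sketch of what the attribute-based mapping looks like (the Patient class and its columns are made up):

```csharp
using Castle.ActiveRecord;

// No XML mapping file: the attributes describe the table and columns,
// and ActiveRecordBase supplies Find/Save via NHibernate underneath.
[ActiveRecord("Patients")]
public class Patient : ActiveRecordBase<Patient>
{
    [PrimaryKey]
    public int Id { get; set; }

    [Property]
    public string Name { get; set; }
}

// Hypothetical usage after the framework has been initialized:
//   var p = Patient.Find(42);
//   p.Name = "New name";
//   p.Save();
```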
There are lots of ORMs to choose from: LINQ to SQL, NHibernate. For pure object databases there is db4o.
It depends on the application, but for a high volume enterprise application, I would not go this route. You need more control of your data.
I was discussing this with a friend over the weekend, and it seems like the gains you make in ease of storage are lost if you need to be able to query the database outside of the application. My understanding is that these databases work by storing your object data in a denormalized fashion. This makes it fast to retrieve entire sets of objects, but if you need to select data from a perspective that doesn't match your object model, the ODBMS might have a hard time getting at the particular data you want.