The project I am working on is facing a design dilemma over how to get objects and collections of objects from a database. Sometimes it is useful to buffer *all* objects from the database, with their properties, into memory; sometimes it is useful to just set an object id and query its properties on demand (one db call per object to get all its properties). And in many cases, collections need to support both buffering objects into memory and being initialized with minimum information for on-demand access. After all, not everything can be buffered into memory and not everything can be read on demand. It is the ubiquitous memory vs. IO trade-off.
Did anyone have to face the same problem? How did it affect your design? What were the tough lessons learned? Any other thoughts and recommendations?
EDIT: my project is a classic example of a business layer DLL, consumed by a web application, web services and a desktop application. When a list of products is requested for the desktop application and displayed only by product name, it is fine to have this sequence of steps to display all products (let's say there are a million products in the database):
1. One db call to get all product names
2. One db call to get all product information if the user clicks on the product to see details (on-demand access)
However, if this same API is going to be consumed by a web service to display all products with details, the network traffic will become chatty. The better sequence in this case would be:
1. What the heck, buffer all products and product fields from just one db call (in this case buffering 1 million products also looks scary)
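To make the trade-off concrete, here is a minimal sketch of the two access modes being weighed, using hypothetical Product and IProductRepository types (none of these names come from an existing API):

```csharp
using System.Collections.Generic;

// Illustrative only: a product that can be fully buffered up front,
// or created with just an id and filled in lazily on first access.
public interface IProductRepository
{
    string LoadName(int id);        // one db call for one product
    IList<Product> LoadAll();       // one db call buffering everything
}

public class Product
{
    private readonly IProductRepository _repo;
    private string _name;
    private bool _loaded;

    public int Id { get; private set; }

    // Buffered: every field is already known.
    public Product(int id, string name)
    {
        Id = id;
        _name = name;
        _loaded = true;
    }

    // On-demand: only the id is known; details are fetched when first needed.
    public Product(int id, IProductRepository repo)
    {
        Id = id;
        _repo = repo;
    }

    public string Name
    {
        get
        {
            if (!_loaded)
            {
                _name = _repo.LoadName(Id);   // the per-object db call
                _loaded = true;
            }
            return _name;
        }
    }
}
```

The web-service consumer would lean on LoadAll() once; the desktop details view would use the lazy constructor.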
It depends how often the data changes. It is common to cache static and near static data (usually with a cache expiry window).
Databases are already designed to cache data, so provided network I/O is not a bottleneck, let the database do what it is good at.
Have you looked at some of the caching technologies available?
.NET Framework 4 ObjectCache Class
Cache Class: Using the ASP.NET Cache outside of ASP.NET
Velocity: Build Better Data-Driven Apps With Distributed Caching
Object cache for C#
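As a rough, hedged illustration of the first option above (the .NET 4 ObjectCache), a small get-or-load helper might look like this; the cache key and the load delegate stand in for whatever your data layer provides:

```csharp
using System;
using System.Runtime.Caching;

public static class SimpleCache
{
    private static readonly ObjectCache Cache = MemoryCache.Default;

    // Returns the cached value if present; otherwise loads it (e.g. one db call)
    // and caches it with an absolute expiry window.
    public static T GetOrLoad<T>(string key, Func<T> load, TimeSpan expiry) where T : class
    {
        var cached = Cache.Get(key) as T;
        if (cached != null)
            return cached;

        var value = load();
        Cache.Set(key, value, new CacheItemPolicy
        {
            AbsoluteExpiration = DateTimeOffset.UtcNow.Add(expiry)
        });
        return value;
    }
}
```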
This is not a popular position, but avoid caching unless absolutely necessary, or unless you know for sure from the start that you're going to need to operate at "Internet scale." Have you tried to scale out a layered cache atop the database? Are you going to write through the cache and only read from it, or wait for an LRU object to write changes? What happens when another app or web services tier sits atop the DB and gets inconsistent reads?
Most modern databases already have caches and can likely implement them better than you; just decide whether you want to hit the DB wire every time you need something. In the large majority of cases, the DB will perform just fine and you'll keep your consistency. BASE and CAP theory are nice and fun to talk about and imagine, but sometimes you just can't beat the cost-to-market of hitting the good old database. Stress test, find your hotspots, and implement your cache conservatively if needed.
Related
I am new to this concept, so I need guidance on what will be best to use in the following scenario.
I have to make a desktop application that contains many features like parts stock, employee data, company car data, etc.
Now the problem is that many users will be using the application, and the offices where it is installed are situated in different cities.
I want a scheme where, if one user uploads any data to the database, the others instantly see it reflected. For example, if more cars are added, everyone using the application gets their car list updated.
My idea was to use web services, with the data stored in a database on a website, so that everyone's application refreshes its lists every 20 seconds or so.
Any help is appreciated
You wouldn't reload all your data constantly; there are a couple of common approaches here:
keep a list of changes; if you add new data you add the primary data record and you also write the fact that the change happened (essentially an "events" list). Then you can query the change log periodically to get any additions/updates/deletes, simply by asking for all events after (x) - see the sketch after this list
if the infrastructure allows, some kind of pub/sub framework - same approach really but typically using middleware for the changes, rather than the main DB
re how you get the data; polling is simple and effective; active pushing is harder to set up but may reduce latency - not sure it is worth it here
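A hedged sketch of the change-log polling idea (the first bullet): it assumes a ChangeLog table with an ever-increasing Id that every write also appends to; the table and column names are made up for illustration.

```csharp
using System.Data.SqlClient;

public static class ChangePoller
{
    // Assumed table: ChangeLog(Id bigint identity, EntityType, EntityId, Action, OccurredAt).
    // Returns the highest change id seen, to be passed back in on the next poll.
    public static long Poll(string connectionString, long lastSeenId)
    {
        const string sql =
            @"SELECT Id, EntityType, EntityId, Action
              FROM ChangeLog
              WHERE Id > @lastSeenId
              ORDER BY Id";

        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(sql, conn))
        {
            cmd.Parameters.AddWithValue("@lastSeenId", lastSeenId);
            conn.Open();
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    lastSeenId = reader.GetInt64(0);
                    // apply the addition/update/delete to the local lists here
                }
            }
        }
        return lastSeenId;   // e.g. call this on a 20-second timer
    }
}
```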
Another approach, though, is to design it as a web app - then all your data lives at the server-farm and is trivial to update immediately. Your "desktop" app could be a web page using ajax
Try cloud computing and store your data in the cloud.
OK trying to recover my points here after the downvote.
The cloud (Windows Azure especially) is a great fit for this project. Web services would help too, as they can be easily scaled out to a number of web servers (instances, in Azure speak). Having many desktop clients talking directly to a database is not a good idea and often results in scalability issues.
Output caching could help a great deal here if you are refreshing your client-side data frequently, and it can be implemented with almost no code. This makes it much easier than managing lists of changes.
I am making a member-based web app in ASP.NET MVC 3 and I am trying to plan ahead. At first our user base will not be huge, but as with any software the potential for a sudden volume spike is always a possibility.
Thinking ahead to this scenario, I know that the database is the bottleneck area on most web apps. We are using MSSQL 2008 R2. We will have dedicated servers with several client databases; each client has their own database, so if one server begins to bottleneck we can scale vertically, or move some of the databases to a new server and begin filling it up.
To access the databases we primarily use LINQ to SQL, and we are currently refactoring some of our code to make use of the IQueryable mechanisms to lazy load content. But each page contains quite a bit of content from various parts of the database.
We also have a few large databases that are used for widgets in the program that rarely change but have millions of rows. The goal with those is to somehow sync them to the primary source and distribute them across several machines and then load balance those servers.
With this layout should I even worry about caching, or will the built-in caching mechanisms in MSSQL be sufficient?
If so, where should I begin? I have looked briefly at AppFabric but it looks as though it is for Azure only?
Resources:
How to cache data in a MVC application
http://stephenwalther.com/blog/archive/2008/08/28/asp-net-mvc-tip-39-use-the-velocity-distributed-cache.aspx
http://stephenwalther.com/blog/archive/2008/08/29/asp-net-mvc-tip-40-don-t-cache-pages-that-require-authentication.aspx
Lazy loading is a performance killer. It's better to load the entire object graph with one join than to lazy load other properties. This is especially the case with a list of objects: if you iterate, you'll end up issuing a lazy-load query for each item in the list. Furthermore, every call to the db has overhead. Fewer calls = better performance.
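For example, with LINQ to SQL (which the question mentions), a DataLoadOptions sketch along these lines pulls the children back in the same round trip; MyDataContext, Order and OrderLines are placeholder names for your own generated model:

```csharp
using System.Data.Linq;

public class OrderReader
{
    public void ReadAll(MyDataContext db)
    {
        var options = new DataLoadOptions();
        options.LoadWith<Order>(o => o.OrderLines);   // children join-loaded up front
        db.LoadOptions = options;                     // must be set before querying

        // One round trip brings back orders and their lines, instead of
        // one extra query per order while iterating.
        foreach (var order in db.Orders)
            foreach (var line in order.OrderLines)
            {
                // use order/line here
            }
    }
}
```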
SO was a top 1000 website before it needed two database servers. I think you'll be ok.
If your revenue model says "each client will have its own database", then your scaling issues should be really easy to solve. Sounds like you already have a plan to scale up with more servers as your client base increases. What's the problem?
Caching on the web tier is usually the first scaling fix you'll have to worry about. You probably don't need to do a fresh db call with each page request.
Overall this sounds like a lot of premature optimization. Your traffic hasn't reached a point where you need to be worried about scaling. Make these kinds of decisions at the last possible moment.
The database cache is different to most caches - it can of course load used data into memory and re-use query plans, but that isn't really a cache as such.
AppFabric is definitely not just Azure; after all, if it was, you wouldn't be able to install it (and use it) locally :) But in truth there is little between AppFabric, Redis and memcached (the latter lacks persistence, of course).
But I think you should initially look at using the inbuilt ASP.NET caching: both data caching via HttpContext.Cache, and caching of entire responses (or, in MVC 3, partials). Obviously you should have a broad idea of what data is used heavily by lots of requests, and is safe to re-use: cache that!
Just make sure you treat all cached data as immutable (if you need to update the cache, re-add a modified value; don't modify the existing objects). The reason: it won't work the same if you start needing to use distributed caching, as that uses serialization, and any changes you make won't be seen by the next request.
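For the HttpContext.Cache part, a minimal get-or-add helper in that spirit could look like the sketch below (the key and load delegate are yours to supply); note that the returned object is never modified in place, a fresh value is inserted instead. For whole responses or partials in MVC 3, the [OutputCache] attribute covers the other case.

```csharp
using System;
using System.Web;
using System.Web.Caching;

public static class DataCache
{
    // Returns the cached value for key, or loads it (e.g. one db call) and caches it.
    public static T GetOrAdd<T>(string key, Func<T> load, TimeSpan expiry) where T : class
    {
        var cached = HttpRuntime.Cache[key] as T;
        if (cached != null)
            return cached;                       // treat as read-only!

        var value = load();
        HttpRuntime.Cache.Insert(key, value, null,
                                 DateTime.UtcNow.Add(expiry),
                                 Cache.NoSlidingExpiration);
        return value;
    }

    // To "update", build a new object and re-insert it under the same key;
    // never mutate an instance other requests may already be reading.
    public static void Replace<T>(string key, T freshValue, TimeSpan expiry) where T : class
    {
        HttpRuntime.Cache.Insert(key, freshValue, null,
                                 DateTime.UtcNow.Add(expiry),
                                 Cache.NoSlidingExpiration);
    }
}
```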
We have an application (rules engine) that has a lot of tables in memory to perform certain business rules. This engine is also used for writing back to the database when needed.
The DB structure is denormalized, and we have 5 transactional tables, that also sometimes need to be queried for reporting.
The issue here is, we want to cache the data inside the app, so it loads on App startup, and then only changes if the DB changed.
Any recommendations?
We are leaning towards creating a DB service that will handle all inserts, updates and deletes and queue them to decrease the load on the DB server (the transactional tables also have loads of indexes). We are also thinking of having the DB service sit on top and serve all reports / other apps that need direct DB access.
The aim here, of course, is to decrease DB hits for select queries per request, and to prioritize transactions. Also to ensure that people accessing the apps don't bring the DB server down.
Rules Engine is a C# desktop app, reporting and other apps are web based.
What would be the best way to go about this? I also thought of removing all indexes from my transactional table and having a trigger insert into a new table, which would be a copy but indexed for report retrieval.
You should perhaps look at distributed caching solutions (from both a performance and a scalability point of view). In short, I am talking about scalable DB services backed by a distributed cache (so that multiple DB services get served by the same cache).
Here's the article that discusses distributed caching, including various approaches to database synchronization. And here is the blog post that lists a few options in .NET for distributed caching.
I've done something similar with an obscenely complex rules engine. Ultimately, I set it up so that the data was serialized centrally (with a process to release new changes, causing a new copy to be serialized and the blob stored somewhere accessible). During load, each app-server would check whether it had the up-to-date version of the blob, and if not fetch it (and store it locally).
Then all it has to do is deserialize the data into memory. No db hit, except for occasionally grabbing the new blob. It also means the app-server can work while the db server is offline (as long as it has a cached copy of the blob). It also polled periodically for new updates while running, of course - but only ran the "is there a new blob" check (it still didn't need to hit the main tables).
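A hedged sketch of that flow, with made-up names for the blob location and version marker (real code might keep these in a small table, a file share or blob storage):

```csharp
using System;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

public static class RuleDataLoader
{
    // Returns the deserialized rule data, fetching the centrally published blob
    // only when its version differs from the locally cached one.
    public static T LoadCurrent<T>(string publishedBlobPath, string publishedVersion,
                                   string localBlobPath, string localVersionPath)
    {
        var localVersion = File.Exists(localVersionPath)
            ? File.ReadAllText(localVersionPath)
            : null;

        if (localVersion != publishedVersion || !File.Exists(localBlobPath))
        {
            File.Copy(publishedBlobPath, localBlobPath, true);      // grab the new blob
            File.WriteAllText(localVersionPath, publishedVersion);
        }

        using (var stream = File.OpenRead(localBlobPath))
            return (T)new BinaryFormatter().Deserialize(stream);    // no db hit
    }
}
```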
You may be interested in this article. It uses XML to store a read-only copy of the database (in memory), and XPath to query it. Nowadays you'd prefer to query with LINQ, of course.
I am currently developing a system that holds all relevant master data like for example a customer, or information about an operational system that exists within our system landscape.
The assigned IDs to these entities are unique within the enterprise. When some system stores e.g. customer related data, it has to hold the master data ID of the customer as well.
The master data system is based on .Net and a MSSQL 2005.
Now my question is: when developing another system, with its own assemblies, database, etc., that uses data from the MDM system, would you store that data redundantly in the other system's database, create its own business entities (like customer), and copy the required master data from the MDM into the other database (hard-coded or by ETL)? That way the other system is detached from the MDM and only stores the global master data IDs.
Or would you integrate the assemblies of the MDM into other systems (if .Net of course) and use the data layer of the MDM to load global entities (like a customer)?
Or would you let the other system create its own entities, but use a SOAP interface provided by the MDM for retrieving master data?
I tend towards approach no. 1 because I think it's better to detach other systems from the MDM solution (separation of concerns), since the MDM solution can hold much more data about a customer entity than I would need in some other system where just the customer's name is required. Option 3 would be possible, but web services might slow down an operational system a lot. What do you think?
Option 1: This carries a high risk of data getting out of sync, followed by a major headache trying to unravel it.
Option 2: This is a viable option, but you'd have to maintain a decent set of assemblies for people to use. That is a non-trivial task, as they need to be robust and well documented, have a usable API, and come with some sensible release management - much like you'd expect from any 3rd party framework. This is usually a level of rigour above normal LOB development practices, in my experience.
Option 3: Got to be the way to go, I'd say. A service-oriented architecture lets other people access the data, but gives them flexibility in how they access/consume it.
As you say, performance may be the deciding factor - in which case option 1 combined with option 3 could be best, i.e. the local copies are viewed and treated as cached data only rather than as reliable, up-to-date copies. The application could do a quick check with the master DB to see if the local cache is still valid (much like a HEAD request in HTTP land) and then either use the local data or refresh it from the master database.
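A hedged sketch of that check, assuming the master keeps a last-modified column (table and column names are illustrative):

```csharp
using System;
using System.Data.SqlClient;

public static class MasterDataCacheCheck
{
    // Asks the master DB only for a last-modified marker, like a HEAD request:
    // refresh the local copy only when the marker has moved on.
    public static bool LocalCopyIsStale(string masterConnectionString,
                                        DateTime localLastRefreshedUtc)
    {
        const string sql = "SELECT MAX(LastModifiedUtc) FROM CustomerMaster";

        using (var conn = new SqlConnection(masterConnectionString))
        using (var cmd = new SqlCommand(sql, conn))
        {
            conn.Open();
            var result = cmd.ExecuteScalar();
            var masterLastModified = result == DBNull.Value
                ? DateTime.MinValue
                : (DateTime)result;
            return masterLastModified > localLastRefreshedUtc;
        }
    }
}
```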
The pros and cons of solution 1 are:
Pros:
- Faster response time (vs. having to consult the Master at each operation, you could cache something to alleviate this)
- Your satellite system can work even if the Master is temporarily unavailable.
Cons:
- You risk working with obsolete data (if your ETL refresh runs at midnight, you won't get any new or updated records until the next midnight cycle)
- Of course you can't allow the satellite to ever modify its local copy of the MDM data (two-way alignment, especially with multiple different satellites, becomes a nightmare).
So depending on the specifics, solution 1 may be OK. I'd prefer to query the master every time (again, possibly caching answers for a short time), but that is more of a personal preference.
Background: we've got a number of server processes and client apps that are used entirely internally, in a fairly controlled environment. We capture a significant amount of data every day that goes into a couple of database machines. Almost everything is C#, with a few C++ apps.
Just about every app has some basic (if not extensive) dependence on database data, whether it's for historical data, daily-calculated values, or assorted parameters. As the whole environment has gotten a bit more sprawling, I've been wondering about the sense of sticking an intermediary between all client and server apps and the database - a sort of "database data broker". Any app that needs values from the db would make a request to the data broker, instead of calling a dll wrapper function that calls a stored proc.
One immediate downside is that the data would make two trips across the network: from db to broker, and from broker to calling app. That seems like poor form, but the amount of data in each request would be small enough that I'm OK with it as far as performance goes.
One (seeming) upside is that it would be trivial to set up a test environment, as it would entail just setting up a test data broker, with no db connection strings to maintain locally anywhere else. Also, I've been pondering creating a mini request language so you wouldn't have to enumerate functions for each dataset you might request (instead of GetX() and GetY(), there would be Get("name = X")).
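As a very rough sketch of what that generic entry point might look like (the interface and the criteria string format are purely illustrative, not an existing API):

```csharp
using System.Data;

// Illustrative broker contract: one generic request call instead of a
// GetX()/GetY() wrapper per dataset.
public interface IDataBroker
{
    // e.g. broker.Get("name = X") rather than a dedicated GetX() method.
    DataTable Get(string criteria);
}

public class BrokerClient
{
    private readonly IDataBroker _broker;

    public BrokerClient(IDataBroker broker) { _broker = broker; }

    public DataTable GetByName(string name)
    {
        // The broker owns the connection string and translates the criteria
        // into the appropriate stored proc call; the client never touches the db.
        return _broker.Get("name = " + name);
    }
}
```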
Am I over-engineering this, or is it possibly a worthy architecture?
edit: thanks for all the great comments so far, great food for thought.
It depends on what you're trying to accomplish with it. According to Rocky Lhotka, you should only add a tier if you are forced to, kicking and screaming all the way.
I agree with him: don't tier unless you need to. I think there are valid reasons to add additional tiers, usually for purposes of security, scalability and maintainability. The question becomes: is yours a valid reason?
It looks like the major reason is maintainability. Does it outweigh the benefits you get by not having the tier?
only you can answer these:
what are the benefits of doing this?
what are the problems/risks of doing this?
do you need this to make testing easier or even possible?
if you make this change and when it goes live and crashes will you be fired?
if you make the changes and it goes live will you get a promotion?
etc...
As the former architect of a system that also used a database heavily as a "hub," I can say that there are several drawbacks that you should be aware of. Our system used databases:
As a transaction store (typical OLTP stuff)
As a staging queue (submitted but unprocessed transactions)
As a historical data store (results of processed transactions)
As an interoperation layer (untranslated commands or transactions issued from other systems)
One of the major drawbacks is ownership cost. When your databases become the single point of failure for so many types of operations, it becomes necessary to ensure that they are all hosted in high-availability environments. This is not only expensive from a hardware perspective, but it is also expensive to support deployments to HA environments, since developers typically have very limited visibility into the internals.
A second drawback is that you have to seriously design integrity into all of your tables. In a typical SOA environment, you have complete control over how data is modified. When you expose it through database tables, you must accept that any application with the right credentials will have the ability to modify data. Because of this, you must carefully consider utilitarian implementations of constraints. If you had a single service managing persistence, you could be much looser with constraints on the database and enforce them in code.
Third, if you ever want to expose any functionality that the database tables currently allow you to provide to outside parties, you must write service code anyway, so you might be better served doing it strategically as opposed to reacting to requests.
Fourth, UI interaction directly with the data layer creates security risks, especially if the client is a thick client.
Finally, writing code that responds to events (service calls) is much easier than polling code. Typically, organizations that rely heavily on database polling end up reinventing the wheel every time a new project requires a new "monitoring service." It can be avoided by creating a "framework," but those have their own pitfalls (primarily around prescription versus adoption).
This is just a laundry list of problems I have encountered. It's not necessarily meant to dissuade you from using databases for these functions, but it helps to know the dangers ahead of time so you can at least plan for them if they ever do become issues.
EDIT
Just thought of another scenario that caused us pains. Versioning your changes can be difficult. For example, if you need to change the shape of a table (normalize/denormalize), it has a cascading effect if multiple applications rely on it. In a SOA scenario, it is much easier, because you can keep your old API, change the internal interaction so that it works with the changed tables, and allow consumers to migrate to the new version on their own schedule.
A data broker sounds like a really good way to abstract out the multiple data sources for your apps. It would be easy to consolidate, change repositories, or otherwise move data around if needed in the future.
I may be misunderstanding something, but it seems to me that you should consider an object-relational mapping (ORM) framework: a framework you can use to "map" your interactions with the db onto domain objects. That way you work locally on domain objects that get filled from your db, and when it is time to persist the state of your objects back to the database, the framework handles all the connections back and forth. This way you can also easily mock these domain objects for unit testing without needing a db connection.
Check out NHibernate for a good framework of this kind.
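By way of a rough sketch, basic NHibernate usage looks roughly like this, assuming mappings for an illustrative Customer class have already been configured (hibernate.cfg.xml or Fluent NHibernate):

```csharp
using NHibernate;
using NHibernate.Cfg;

public class Customer
{
    public virtual int Id { get; set; }
    public virtual string Name { get; set; }
}

public class CustomerStore
{
    private readonly ISessionFactory _factory =
        new Configuration().Configure().BuildSessionFactory();

    public Customer Load(int id)
    {
        using (ISession session = _factory.OpenSession())
            return session.Get<Customer>(id);       // maps a row to a domain object
    }

    public void Save(Customer customer)
    {
        using (ISession session = _factory.OpenSession())
        using (ITransaction tx = session.BeginTransaction())
        {
            session.SaveOrUpdate(customer);
            tx.Commit();                            // the framework generates the SQL
        }
    }
}
```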
If you already have the database-related know-how, I think it's not a bad decision.
Good things that I can think of:
if the data model is consistent you can plug in new tools easily without making any changes in the other apps.
maybe you can keep the database running more reliably than your apps, so if one of them fails, the others can still keep working.
you can make backups and rollbacks using the database tools.
you can do emergency fixes manipulating the data directly with sql or some visual tool.
But if you have to learn new frameworks along the way, maybe the benefits are not worth the extra initial effort.
"any app that needs values from the db makes a request to the data broker"
When database technology was being invented over 40 years ago, the people doing that inventing had ideas along the lines of "any app that needs values from the db makes a request to the dbms".
Have you ever pondered the possibility that YOU ALREADY HAVE a "data broker", and that there might be very little added value in creating a second one of your own ?