We have an application (rules engine) that has a lot of tables in memory to perform certain business rules. This engine is also used for writing back to the database when needed.
The DB structure is denormalized, and we have five transactional tables that also sometimes need to be queried for reporting.
The issue here is that we want to cache the data inside the app, so that it loads on app startup and then only changes when the DB changes.
Any recommendations?
We are leaning towards creating a DB service that will handle all inserts, updates and deletes and queue them to decrease load on the DB server (the transactional tables also have loads of indexes). We are also thinking of having the DB service sit on top of the database and serve all reports / other apps that currently need direct DB access.
The aim, of course, is to decrease DB hits for SELECT queries per request and to prioritize transactions, and also to ensure that people accessing the apps don't bring the DB server down.
Rules Engine is a C# desktop app, reporting and other apps are web based.
What would be the best way to go about this? I also thought of removing all indexes from my transactional tables and having a trigger insert into a new table that would be a copy, but indexed for report retrieval.
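For what it's worth, here is roughly the write-queue shape we have in mind (only a sketch; the DbWriteService and WriteCommand names are made up for illustration, and the actual execution is left as a comment):

using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical command describing one pending insert/update/delete.
public class WriteCommand
{
    public string Sql { get; set; }
    public object[] Parameters { get; set; }
}

public class DbWriteService
{
    // Writes are queued and drained by a single background worker, so the
    // transactional tables see a steady, serialized stream of writes.
    private readonly BlockingCollection<WriteCommand> _queue =
        new BlockingCollection<WriteCommand>();

    public void Enqueue(WriteCommand command)
    {
        _queue.Add(command);
    }

    public Task Start()
    {
        return Task.Run(() =>
        {
            foreach (var command in _queue.GetConsumingEnumerable())
            {
                // Execute against the database here (ADO.NET, Dapper, ...).
                // Batching several commands per transaction would reduce
                // index-maintenance overhead further.
            }
        });
    }
}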
You should perhaps look at distributed caching solutions (from both a performance and a scalability point of view). In short, I am talking about scalable DB services backed by a distributed cache (so that multiple DB services get served by the same cache).
Here's an article that discusses distributed caching, including various approaches to database synchronization, and here is a blog post that lists a few options for distributed caching in .NET.
I've done something similar with an obscenely complex rules engine. Ultimately, I set it up so that the data was serialized centrally (with a process to release new changes, causing a new copy to be serialized and the blob stored somewhere accessible). During load, each app-server would check whether it had the up-to-date version of the blob, and if not fetch it (and store it locally).
Then all it has to do is deserialize the data into memory. No DB hit, except for occasionally grabbing the new blob. It also means the app-server can work while the DB server is offline (as long as it has a cached copy of the blob). It also polled periodically for new updates while running, of course - but only against the "is there a new blob" check (it still didn't need to hit the main tables).
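A minimal sketch of that startup check, assuming the blob and a small version file sit on a shared location (the paths and the RuleBlobCache name are invented for illustration):

using System.IO;

static class RuleBlobCache
{
    // Hypothetical locations; in practice the blob lived on a file share.
    const string CentralVersionFile = @"\\central\rules\current-version.txt";
    const string CentralBlobFile    = @"\\central\rules\rules-data.bin";
    const string LocalVersionFile   = @"C:\AppCache\rules-version.txt";
    const string LocalBlobFile      = @"C:\AppCache\rules-data.bin";

    public static string EnsureLatestBlob()
    {
        string centralVersion = File.ReadAllText(CentralVersionFile);
        string localVersion = File.Exists(LocalVersionFile)
            ? File.ReadAllText(LocalVersionFile)
            : null;

        if (centralVersion != localVersion)
        {
            // A new release was published: pull the freshly serialized blob down once.
            File.Copy(CentralBlobFile, LocalBlobFile, true);
            File.WriteAllText(LocalVersionFile, centralVersion);
        }

        // The caller deserializes LocalBlobFile with whatever serializer produced it
        // (BinaryFormatter, protobuf-net, JSON, ...).
        return LocalBlobFile;
    }
}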
You may be interested in this article. It uses XML to store a read-only copy of the database (in memory) and XPath to query it. Nowadays you'd prefer to query with LINQ, of course.
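For example, a read-only snapshot loaded once at startup and queried in memory with LINQ to XML might look like this (the file, element and field names are made up):

using System.Linq;
using System.Xml.Linq;

// Load the read-only snapshot once at startup.
XDocument snapshot = XDocument.Load("customers-snapshot.xml");

// Query it in memory instead of hitting the database.
var londonCustomers =
    from c in snapshot.Descendants("Customer")
    where (string)c.Element("City") == "London"
    orderby (string)c.Element("Name")
    select (string)c.Element("Name");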
Related
When working with ASP.NET MVC and SQL Server, we are wondering whether caching to XML is still something to think about, or are there other possibilities for this?
For instance, we have a table called Customers. If you hit this DB table every time you click on Customers or do sorting or filtering in the app, why not store this info in an XML file?
Then you work only with the XML file and not the DB, and you update the XML after applying changes to the Customers table.
It is an absolutely brilliant idea.
If:
You only have 1 client
Or you have multiple clients but they don't mind seeing stale data
You have a database system that doesn't provide caching possibilities
You do not use database access frameworks that can handle caching for you
In short, no, it actually is almost never a good idea.
Databases are made to be used. Most of them can handle a much higher load than programmers think they can, as long as you treat them well. A lot of them also provide perfectly fine caching possibilities to improve performance if needed.
Any useful type of caching in your application should involve refreshing that cache when anything changes, and implementing that yourself is usually not a good idea. If you do want a very simple cache of data that was just on the screen before the user clicked away, memory would be the place for it, not the file system. Unless you need a centralised session cache, but that goes way beyond "let's write some xml".
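If you do go the in-memory route, a minimal sketch with System.Runtime.Caching is all it takes (the cache key, the Customer type and the LoadCustomersFromDatabase call are hypothetical):

using System;
using System.Collections.Generic;
using System.Runtime.Caching;

// Inside whatever code needs the data (e.g. a controller action):
ObjectCache cache = MemoryCache.Default;

// Cache the customer list for a short window; anything that changes
// customers should evict or overwrite this entry.
var customers = cache.Get("customers-list") as List<Customer>;
if (customers == null)
{
    customers = LoadCustomersFromDatabase();   // hypothetical data-access call
    cache.Set("customers-list", customers, new CacheItemPolicy
    {
        AbsoluteExpiration = DateTimeOffset.UtcNow.AddMinutes(5)
    });
}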
Caching to an XML file is a bad choice. A database can handle a load of 100 users in 5 seconds even if you have 50,000 records in your table. If you want more speed than that, try an in-memory SQL engine, which stores data in RAM for fast access, but for that you need a lot of RAM on the server.
We have a relatively large-scale application that uses a relational DB (MSSQL).
After a lot of reading I've decided that I want to examine using MongoDB instead of MSSQL, mainly because of performance and scale issues.
I have read and studied Mongo but couldn't figure out the answers to the following questions:
Should we do it? Bear in mind we have the time to invest; the only question is "is it good for us?"
How to model our data?
My problem with Mongo is that we have a lot of one-to-many relations in our DB.
After reading this great post (and the second part as well), I've realized a good practice will be to divide the decision into 3 scenarios:
1 to few
1 to many
1 to squillions.
In our DB, most of the time we use one-to-many, but the problem is that most of the time it's the same "one".
For example, we have users and transactions tables.
Each user can perform a transaction, so basically what I should do is model the user as follows:
{
"name": "John",
...,
"Transactions" : [ObjectId("..."), ObjectId("..."),...]
}
So far it's fine; the problem is that we have a lot more than just transactions, for example we could have posts, requests and many more features like transactions, and then my users collection becomes huge (more than 25 "columns"). Also, when I want to retrieve a data set I have to do several queries, unlike MSSQL where I just use a JOIN statement.
Another issue is that I'll have to save a lot of extra data. For example, for each transaction I have to save the terminal ID, and in the report I'll have to show the terminal name. In that case (as I understand it) I have two choices: do two queries, or save the terminal name as well. In a relational DB this is a simple join.
So maybe for schemas like ours, Mongo (or any other document-based DB) is not the best choice?
I know these are newbie questions :)
We use C# for our server side (ASP.NET Web API).
Thanks in advance!
You can face some serious issues when modeling your data with approaches 2 and 3:
With one-to-many you may face data inconsistency and/or eventual consistency. Here you store inside the document an index (an array of references) to external documents. So, for your example, to add a new transaction you need two requests: create the transaction and add its reference to the user (an update of the user document). MongoDB has ACID transactions only at the document level, so your application can create a transaction but, for some reason, fail to add its reference to the user; the reason can be an app failure, a network problem, a bug and so on. Of course, you can simulate a DB transaction in the app with a try/catch block that cleans up data when an error occurs. That helps, but not fully, because the app can go down between the two requests.
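A sketch of those two requests with the C# driver makes the window obvious (the collection, field and database names here are only examples):

using MongoDB.Bson;
using MongoDB.Driver;

var db = new MongoClient("mongodb://localhost").GetDatabase("shop");
var transactions = db.GetCollection<BsonDocument>("transactions");
var users = db.GetCollection<BsonDocument>("users");

// Request 1: create the transaction document.
var tx = new BsonDocument { { "amount", 100 }, { "terminalId", 7 } };
transactions.InsertOne(tx);

try
{
    // Request 2: link it to the user. If the app dies between the two
    // requests, the transaction is left dangling (unreferenced).
    users.UpdateOne(
        Builders<BsonDocument>.Filter.Eq("name", "John"),
        Builders<BsonDocument>.Update.Push("Transactions", tx["_id"]));
}
catch
{
    // Best-effort cleanup; still not bullet-proof if the process crashes here.
    transactions.DeleteOne(Builders<BsonDocument>.Filter.Eq("_id", tx["_id"]));
    throw;
}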
So, if your app is under high load, after some time you can end up with a number of "dead" transactions that are not linked to any user. That may not be a big problem if your app doesn't query transactions directly, only via users; then you just have useless data in the DB. Otherwise you will have data inconsistency.
To fix that you need to create a background job that does the proper cleanup. So for some period of time your data can be inconsistent – eventual consistency. For some applications that is OK, for others it is not.
You can face the same problem when deleting transactions.
I agree that a document with 25 arrays of references ("columns") does not look very good. Working with such objects manually will be harder (testing, manual data fixes and so on).
One-to-squillions doesn't have this effect, but you need indexes to query efficiently. For a large and shared DB you can get bad performance.
In general, I'd say document DBs are pretty good if your app works mostly with one document (aggregate), doesn't have a lot of references to other docs and doesn't need transactions between docs. Denormalization can also be a source of inconsistency.
Key-value data is very easy to scale. Document DBs are one step closer to a key-value data store. Column-oriented DBs are even closer to key-value, so they can be scaled even better.
Also, I recommend considering the following measures to improve your SQL Server DB performance:
Caching – perhaps you can cache some of your app aggregates instead of assembling them (with joins) in the SQL DB all the time. For instance, Stack Overflow uses a SQL Server DB and Redis for caching aggregates (questions with answers, comments and so on); see the sketch after this list.
Tune query performance with indexes, DB structure, denormalization and so on.
If your DB is hosted on an on-premises SQL Server, then additional memory, SSD disks, table partitioning, data compression and replication can help. As a rule, SQL Server gives good performance with these approaches for DBs up to 1 TB.
CQRS approach.
Consider storing your app data in different databases. Every type of DB has its own strengths and weaknesses: a document DB is good for storing aggregates, a SQL DB for relational data, and so on. Complex apps as a rule use a few DB types.
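To illustrate the caching point above, fetching an aggregate from Redis with StackExchange.Redis can be as simple as this (the key layout, the QuestionAggregate type and the LoadAggregateWithJoins call are invented for the example):

using System;
using Newtonsoft.Json;
using StackExchange.Redis;

var redis = ConnectionMultiplexer.Connect("localhost");
IDatabase cache = redis.GetDatabase();

string key = "question:42:aggregate";            // hypothetical key layout
string cached = cache.StringGet(key);

QuestionAggregate aggregate;
if (cached != null)
{
    aggregate = JsonConvert.DeserializeObject<QuestionAggregate>(cached);
}
else
{
    aggregate = LoadAggregateWithJoins(42);      // the expensive SQL joins
    cache.StringSet(key, JsonConvert.SerializeObject(aggregate),
                    TimeSpan.FromMinutes(10));   // expire so updates show up
}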
I have been working for a couple of years on an application that I update using a back-end database. The whole key is that everything is cached on the client, so that it never requires a network connection to operate, but when it does have a connection it will always pick up the latest updates. Every application update is shipped with the latest version of the database, and I want it to download only the minimum amount of data when the database has been updated.
I currently use a table with a timestamp to check for updates. It looks something like this:
ID - Name - Description- Severity - LastUpdated
0 - test.exe - KnownVirus - Critical - 2009-09-11 13:38
1 - test2.exe - Firewall - None - 2009-09-12 14:38
This approach was fine for what I previously needed, but I am looking to expand more functions of the application to use this type of dynamic approach. All the data is currently stored as XML, but I do not want to store complete XML files in the database, and I only want to transmit changed data.
So how would you go about a fairly simple approach to storing dynamic content (text/XML/JSON/XAML) in a database and having the client download only new updates? I was thinking of having logic that can handle XML inserted directly:
ID - Data - Revision
15 - XXX - 15
XXX would be something like <Content><File>Test.dll</File><Description>New DLL to load.</Description></Content> and would be inserted into the cache, but this would obviously be complicated, as I would need to load the entries in sequence.
Another approach that has been mentioned was to base it on something similar to source control: storing the version in the root of the file and calculating the delta to figure out the minimal amount of data that needs to be sent to the client.
Has anyone got any suggestions on how to approach this with no risk of data corruption? I would also like to expand it with features that allow me to revert possibly bad revisions and replace them with new working ones.
It really depends on the tools you are using and the architecture you already have. Is there already a server with some logic and a data access layer?
Dynamic approaches might get complicated, slow and limit the number of solutions. Why do you need a dynamic structure? Would it be feasible to just add data by using a name-value pair approach in a relational database? Static and uniform data structures are much easier to handle.
Before going into detail, you should consider the different scenarios.
Items can be added
Items can be changed
Items can be removed (I assume)
Adding is not a big problem. The client needs to remember the last revision number it got from the server, and you write a query which gets everything since then.
Changing is basically the same. You need to take care of identifying the items: you need an unchangeable surrogate key, which seems to be the ID you already have. (GUIDs may be useful here.)
Removing is tricky. You need to either flag items as deleted instead of actually removing them, or keep a list of removed IDs together with the revision number at which they were removed.
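Putting those three cases together, the server-side query can be as simple as this (table and column names are illustrative only; deletions are kept as tombstone rows with an IsDeleted flag):

using System.Data.SqlClient;

// Returns everything that changed since the client's last known revision.
// Deleted items are kept as rows with IsDeleted = 1 (tombstones) so the
// client knows to remove them locally.
public static SqlDataReader GetChangesSince(SqlConnection connection, long clientRevision)
{
    var command = new SqlCommand(
        @"SELECT Id, Name, Description, Severity, IsDeleted, Revision
          FROM   Items
          WHERE  Revision > @clientRevision
          ORDER  BY Revision", connection);
    command.Parameters.AddWithValue("@clientRevision", clientRevision);
    return command.ExecuteReader();
}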
Storing the data in the client: consider using a relational database like SQLite in the client. (It doesn't need installation; it just stores everything in a file. Firefox, for instance, stores quite a lot in SQLite databases.) When using the same database in the server, you can probably reuse some code. It is also transaction based, which helps keep things consistent (rollback in case of an error during synchronization).
XML - if you really need it - can be stored just as a string in the database.
When using an abstraction layer or ORM that supports SQLite (e.g. NHibernate), you may also be able to reuse some code even when the server uses a different database. Note that the learning curve for such an ORM might be rather steep. If you don't know anything like this, it could be too much.
You don't need to force reuse of code in the client and server.
Synchronization itself shouldn't be very complicated. You have a revision number in the client and a last revision in the server. In the client, you get all new, changed and deleted items since then and apply them to the local store. Update the local revision number. Commit. Done.
I would never update only a part of a revision, because then you can't really know what changed since the last synchronization. Because you do differential updates, it is essential to have a well-defined state on the client.
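On the client, the whole batch can then be applied inside one SQLite transaction, so a failed sync leaves the local store untouched. A sketch with Microsoft.Data.Sqlite (the ItemChange DTO and the table layout are assumptions for the example):

using System.Collections.Generic;
using Microsoft.Data.Sqlite;

// Hypothetical change record returned by the server.
public class ItemChange
{
    public long Id { get; set; }
    public string Data { get; set; }
    public long Revision { get; set; }
    public bool IsDeleted { get; set; }
}

public static class LocalStore
{
    public static void ApplyChanges(IEnumerable<ItemChange> changes, long newRevision)
    {
        using (var connection = new SqliteConnection("Data Source=local-cache.db"))
        {
            connection.Open();
            using (var tx = connection.BeginTransaction())
            {
                foreach (var change in changes)
                {
                    // Upsert or delete depending on the tombstone flag.
                    var cmd = connection.CreateCommand();
                    cmd.Transaction = tx;
                    if (change.IsDeleted)
                    {
                        cmd.CommandText = "DELETE FROM Items WHERE Id = $id";
                    }
                    else
                    {
                        cmd.CommandText =
                            "INSERT OR REPLACE INTO Items (Id, Data, Revision) VALUES ($id, $data, $rev)";
                        cmd.Parameters.AddWithValue("$data", change.Data);
                        cmd.Parameters.AddWithValue("$rev", change.Revision);
                    }
                    cmd.Parameters.AddWithValue("$id", change.Id);
                    cmd.ExecuteNonQuery();
                }

                // Remember where we are; committing makes the whole sync atomic.
                var rev = connection.CreateCommand();
                rev.Transaction = tx;
                rev.CommandText = "UPDATE SyncState SET LastRevision = $rev";
                rev.Parameters.AddWithValue("$rev", newRevision);
                rev.ExecuteNonQuery();

                tx.Commit();
            }
        }
    }
}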
I would go with a solution using Sync Framework.
Quote from Microsoft:
Microsoft Sync Framework is a comprehensive synchronization platform enabling collaboration and offline access for applications, services and devices. Developers can build synchronization ecosystems that integrate any application, any data from any store using any protocol over any network. Sync Framework features technologies and tools that enable roaming, sharing, and taking data offline.
A key aspect of Sync Framework is the ability to create custom providers. Providers enable any data sources to participate in the Sync Framework synchronization process, allowing peer-to-peer synchronization to occur.
I have just built an application pretty much exactly as you described. I built it on top of the Microsoft Sync Framework that DjSol mentioned.
I use a C# front-end application with a SQL CE database, and a SQL Server 2005 database at the other end.
The following articles were extremely useful for me:
Tutorial: Synchronizing SQL Server and SQL Server Compact
Walkthrough: Creating a Sync service
Step by step N-tier configuration of Sync services for ADO.NET 2.0
How to Sync schema changed database using sync framework?
You don't say what your back-end database is, but if it's SQL Server you can use SQL CE (SQL Server Compact Edition) as the client DB and then use RDA or merge replication to update the client DB as desired. This will handle all your requirements for sure; there is no need to reinvent the wheel for such a common requirement.
I am making a member-based web app in ASP.NET MVC 3 and I am trying to plan ahead. At first our user base will not be huge, but as with any software, a sudden volume spike is always a possibility.
Thinking ahead to this scenario, I know that the database is the bottleneck on most web apps. We are using MSSQL 2008 R2. We will have dedicated servers with several client databases; each client has their own database, so if one server begins to bottleneck we can scale vertically, or move some of the databases to a new server and begin filling it up.
To access the databases we primarily use LINQ to SQL and are currently refactoring some of our code to make use of the IQueryable mechanisms to lazy load content, but each page contains quite a bit of content from various parts of the database.
We also have a few large databases that are used for widgets in the program that rarely change but have millions of rows. The goal with those is to somehow sync them to the primary source and distribute them across several machines and then load balance those servers.
With this layout should I even worry about caching, or will the built-in caching mechanisms in MSSQL be sufficient?
If so, where should I begin? I have looked briefly at AppFabric, but it looks as though it is for Azure only?
Resources:
How to cache data in a MVC application
http://stephenwalther.com/blog/archive/2008/08/28/asp-net-mvc-tip-39-use-the-velocity-distributed-cache.aspx
http://stephenwalther.com/blog/archive/2008/08/29/asp-net-mvc-tip-40-don-t-cache-pages-that-require-authentication.aspx
Lazy loading is a performance killer. It's better to load the entire object graph with one join than to lazy load the other properties. This is especially the case with a list of objects: if you iterate, you'll end up lazy loading for each item in the list. Furthermore, every call to the DB has overhead. Fewer calls = better performance.
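With LINQ to SQL, for instance, eager loading is a couple of lines of DataLoadOptions (the ShopDataContext, Customer and Orders names are hypothetical):

using System.Data.Linq;
using System.Linq;

using (var db = new ShopDataContext())           // hypothetical generated DataContext
{
    var options = new DataLoadOptions();
    options.LoadWith<Customer>(c => c.Orders);   // fetch Orders with the customers
    db.LoadOptions = options;

    // One round trip loads customers and their orders; iterating Orders
    // afterwards no longer fires a query per customer.
    var customers = db.Customers.ToList();
}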
SO was a top 1000 website before it needed two database servers. I think you'll be ok.
If your revenue model says "each client will have its own database", then your scaling issues should be really easy to solve. It sounds like you already have a plan to scale out with more servers as your client base increases. What's the problem?
Caching on the web tier is usually the first scaling fix you'll have to worry about. You probably don't need to do a fresh db call with each page request.
Overall this sounds like a lot of premature optimization. Your traffic hasn't reached a point where you need to be worried about scaling. Make these kinds of decisions at the last possible moment.
The database cache is different from most caches - it can of course load used data into memory and re-use query plans, but that isn't really a cache as such.
AppFabric is definitely not just Azure; after all, if it was, you wouldn't be able to install it (and use it) locally :) but in truth there is little between AppFabric, Redis and memcached (the latter lacks persistence, of course).
But I think you should initially look at using the inbuilt ASP.NET caching: both data caching via HttpContext.Cache, and caching of entire responses (or, in MVC 3, partials). Obviously you should have a broad idea of what data is used heavily by lots of requests and is safe to re-use: cache that!
Just make sure you treat all cached data as immutable (if you need to update the cache, re-add a modified value; don't modify the existing objects). Reason: it won't work the same way if you start needing to use distributed caching, as that uses serialization, and any changes you make won't be seen by the next request.
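A minimal sketch of that pattern with the inbuilt cache (the cache key, the MenuItem type and the LoadMenuFromDatabase call are made up):

using System;
using System.Collections.Generic;
using System.Web;
using System.Web.Caching;

// Inside a controller action or helper:
var cache = HttpContext.Current.Cache;

var menu = cache["site-menu"] as List<MenuItem>;
if (menu == null)
{
    menu = LoadMenuFromDatabase();               // hypothetical data-access call
    cache.Insert("site-menu", menu, null,
                 DateTime.UtcNow.AddMinutes(10), Cache.NoSlidingExpiration);
}

// To "update" the cached value later, build a new list and Insert it again
// under the same key; never mutate the list already sitting in the cache.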
The project I am working on is facing a design dilemma on how to get objects and collections of objects from a database. Sometimes it is useful to buffer *all* objects from the database with their properties into memory; sometimes it is useful to just set an object ID and query its properties on demand (one DB call per object to get all properties). And in many cases, collections need to support both buffering objects into memory and being initialized with minimum information for on-demand access. After all, not everything can be buffered into memory and not everything can be read on demand. It is the ubiquitous memory vs I/O problem.
Has anyone had to face the same problem? How did it affect your design? What were the tough lessons learned? Any other thoughts and recommendations?
EDIT: my project is a classic example of a business-layer DLL, consumed by a web application, web services and a desktop application. When a list of products is requested for the desktop application and displayed only by product name, it is OK to have this sequence of steps to display all products (let's say there are a million products in the database):
1. One db call to get all product names
2. One db call to get all product information if the user clicks on the product to see details (on-demand access)
However, if this same API is going to be consumed by a web service to display all products with details, the network traffic will become chatty. The better sequence in this case would be:
1. What the heck, buffer all products and product fields from just one db call (in this case buffering 1 million products also looks scary)
It depends on how often the data changes. It is common to cache static and near-static data (usually with a cache expiry window).
Databases are already designed to cache data, so provided network I/O is not a bottleneck, let the database do what it is good at.
Have you looked at some of the caching technologies available?
.NET Framework 4 ObjectCache Class
Cache Class: Using the ASP.NET Cache outside of ASP.NET
Velocity: Build Better Data-Driven Apps With Distributed Caching
Object cache for C#
This is not a popular position, but avoid all caching unless absolutely necessary, or unless you know for sure up front that you're going to need "Internet scale". Have you tried to scale out a layered cache on top of the database? Are you going to write through the cache and only read from it, or wait for an LRU object to write changes? What happens when another app or web-services tier sits on top of the DB and gets inconsistent reads?
Most modern databases already have caches and can likely implement them better than you; just determine whether you want to hit the DB wire every time you need something. In the large majority of cases the DB will perform just fine and you'll keep your consistency. BASE and CAP theory are nice and fun to talk about and imagine, but you sometimes just can't beat the cost-to-market of just hitting the good old database. Stress test and determine your hotspots, and implement your cache conservatively if needed.