I have an application that does heavy processing based on some files.
During this process I have to query some tables in SQL Server, and this is killing both the DB and the application performance (other applications use the same tables).
After optimizing the queries and the code I'm getting better results, but not enough. After some research I settled on a solution: caching some query results. My idea is to cache the rows from one specific table (identified as the main overhead) that the file being processed needs.
I was thinking of using AppFabric Caching (I'm on the MS stack). I made some tests and it has a large memory footprint for small objects (the AppFabric cache service uses ~350 MB of RAM without any objects in it). I also need to run some queries against these cached rows (search by last name, SSN, birth date, etc.).
My second option is MongoDB as a cache store. I've researched this, and most people recommend memcached or Redis, but I'm on Windows servers and those are not officially supported there.
Is using Mongo as a cache store a good approach in this case? Or is AppFabric Caching with tag-based search better?
It is hard to tell what is better because we don't know enough about your bottlenecks. A lot depends on the nature of the data you're discussing. If the data is fairly static, isn't hit constantly, but is time-consuming to compile into a result set, a good solution might be a materialized view. If the data is hit frequently, then you are better off caching it on some server (e.g. AppFabric).
There are many techniques and possibilities, but you really need to think about network traffic, demand, data size, and so on. It is hard to answer this without knowing all the details.
It looks like you are on the right track, but maybe all you need is a parameterized query; it's hard to tell. I would, however, add a materialized view to the list of options you just posted. Maybe all you need is to build that view from the data you require and simply read its contents.
My question to you would be: what are your long-term goals or estimates for your application? If this is the highest load you are ever going to experience, then tuning the DB or using a materialized view would be the answer. But the long-term solution is distributed caching, and you are already thinking along those lines. Your data is what we'd call "reference data" or "lookup data", and once you are executing multiple lookups against limited DB resources there will be performance issues and your DB will become the bottleneck.
So the solution, which you are already thinking of, is to cache this reference data so you don't have to go to the database for it, while at the same time keeping the cache synchronized with the database.
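For illustration, the usual shape of this is a cache-aside pattern: check the cache first, fall back to SQL Server on a miss, and remove or re-add entries when the source row changes. A rough sketch follows, where the ICache interface, the Person class and the loader delegate are all placeholders rather than any specific provider's API:

```csharp
using System;

// Placeholder abstraction over whatever cache provider you end up choosing;
// the method names here are illustrative, not a real API.
public interface ICache
{
    object Get(string key);
    void Put(string key, object value, TimeSpan timeToLive);
    void Remove(string key);
}

public class Person
{
    public string Ssn { get; set; }
    public string LastName { get; set; }
    public DateTime BirthDate { get; set; }
}

public class ReferenceDataRepository
{
    private readonly ICache _cache;
    private readonly Func<string, Person> _loadFromDatabase; // your existing SQL lookup

    public ReferenceDataRepository(ICache cache, Func<string, Person> loadFromDatabase)
    {
        _cache = cache;
        _loadFromDatabase = loadFromDatabase;
    }

    // Cache-aside: only go to SQL Server when the entry is not cached yet.
    public Person GetBySsn(string ssn)
    {
        var cached = _cache.Get("person:" + ssn) as Person;
        if (cached != null)
            return cached;

        var person = _loadFromDatabase(ssn);
        if (person != null)
            _cache.Put("person:" + ssn, person, TimeSpan.FromMinutes(30));
        return person;
    }

    // Call this whenever the row changes in the database, so the cache stays in sync.
    public void InvalidateBySsn(string ssn)
    {
        _cache.Remove("person:" + ssn);
    }
}
```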
I wouldn't be too sure about AppFabric, as it will have the same support issues that you mention. What is your budget like? Could you consider spending on a caching solution like NCache?
When working with ASP.NET MVC and SQL Server, we are wondering whether caching to XML is still something worth considering, or are there other possibilities?
For instance, say we have a table called Customers. If you hit this DB table every time someone clicks on Customers, or sorts or filters in the app, why not store this info in an XML file?
Then you work only with the XML file instead of the DB, and you update the XML after writing changes to the Customers table.
It is an absolutely brilliant idea.
If:
You only have 1 client
Or you have multiple clients but they don't mind seeing old data
You have a database system that doesn't provide caching possibilities
You do not use database access frameworks that can handle caching for you
In short, no, it actually is almost never a good idea.
Databases are made to be used. Most of them can handle a much higher load than programmers think they can, as long as you treat them well. Many of them also provide perfectly fine caching facilities to improve performance if needed.
Any useful type of caching in your application should involve refreshing that cache when anything changes, and implementing that yourself is usually not a good idea. If you do want a very simple cache of the data that was on the screen just before the user clicked away, memory is the place for it, not the file system. Unless you need a centralised session cache, but that goes way beyond "let's write some XML".
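If you do want that kind of simple in-memory cache, the built-in MemoryCache is usually enough. A rough sketch, assuming a Customer class and a five-minute expiry purely for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.Caching;

public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class CustomerCache
{
    private static readonly MemoryCache Cache = MemoryCache.Default;
    private const string Key = "customers";

    // Return the cached list, loading it from the database on a miss.
    public static List<Customer> GetCustomers(Func<List<Customer>> loadFromDb)
    {
        var cached = Cache.Get(Key) as List<Customer>;
        if (cached != null)
            return cached;

        var customers = loadFromDb();
        Cache.Set(Key, customers, new CacheItemPolicy
        {
            // A short absolute expiry keeps stale data bounded even if an
            // update path forgets to invalidate.
            AbsoluteExpiration = DateTimeOffset.Now.AddMinutes(5)
        });
        return customers;
    }

    // Call this right after writing to the Customers table.
    public static void Invalidate()
    {
        Cache.Remove(Key);
    }
}
```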
Caching to an XML file is a bad choice. A database system can handle a load of 100 users in 5 seconds with 50,000 records in your table. If you want more speed than that, try an in-memory SQL engine that keeps the data in RAM for fast access, but for that you need a high RAM capacity on the server.
In a current project of mine I need to manage and store a moderate number (from 10-100 to 5000+) of users (ID, username, and some other data).
This means I have to be able to find users quickly at runtime, and I have to be able to save and restore the database to continue statistics after a restart of the program. I will also need to register every connect/disconnect/login/logout of a user for the statistics. (And some other data as well, but you get the idea).
In the past, I saved settings and other data in encoded text files, or serialized the needed objects and wrote them out to disk. But these methods require me to rewrite the whole database on each change, which increasingly slows things down (especially with a growing number of users/entries), doesn't it?
Now the question is: What is the best way to do this kind of thing in C#?
Unfortunately, I don't have any experience in SQL or other query languages (except for a bit of LINQ), but that's not posing any problem for me, as I have the time and motivation to learn one (or more if required) for this task.
"Most effective" is highly subjective, even after narrowing this question down to specific needs; it depends on who you ask. If you are storing non-relational data, Mongo or some other NoSQL database such as RavenDB would be effective. If your data has a relational shape, then an RDBMS such as MySQL, SQL Server, or Oracle would be effective. Relational databases are also ideal if you are going to have heavy reporting requirements, as they let non-developer team members write simple SQL queries against the data.

Also keep in mind the disk-cache persistence that databases provide: commonly accessed data is kept in memory to save round trips to disk (with hybrid drives, accessing some files directly accomplishes something similar, although SSDs are still not as fast as RAM).

So you really need to ask yourself a few questions to identify the best solution for you: what is the shape of your data (flat, relational, etc.)? Do you have reporting requirements where less technical team members need to query the data repository? And what are your performance targets?
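Since you mentioned you haven't used SQL yet: the relational route is less work than it may sound. A minimal sketch of a lookup against a SQL Server table with plain ADO.NET (the Users table, column names and connection string are made up for illustration):

```csharp
using System;
using System.Data.SqlClient;

public class UserStore
{
    private readonly string _connectionString;

    public UserStore(string connectionString)
    {
        _connectionString = connectionString;
    }

    // Look up a single user by name with a parameterized query,
    // so only the matching row travels over the wire.
    public int? FindUserId(string userName)
    {
        using (var conn = new SqlConnection(_connectionString))
        using (var cmd = new SqlCommand(
            "SELECT Id FROM Users WHERE UserName = @name", conn))
        {
            cmd.Parameters.AddWithValue("@name", userName);
            conn.Open();
            var result = cmd.ExecuteScalar();
            return result == null ? (int?)null : (int)result;
        }
    }
}
```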
I am working on the re-engineering/upgrade of a tool. The database communication is in C++ (unmanaged ADO) and connects to SQL Server 2005.
I have a few questions regarding archiving and backup/restore techniques.
Generally, archiving is different from backup/restore; can someone provide a link that explains the difference? Presently the solution uses the bcp tool for archival, and I see a lot of dependency on table names in the code. What are the things I have to consider in choosing the design (considering the backup/archival has to run on a button click, and the database size is 100 MB at most)?
Will moving the entire database communication to .NET be of any help, considering the many ORM tools available? All the business logic and UI are already in C#.
What is the best method to verify the archived data?
PS: the question might be too high level, but I did not find any proper link to understand this. It would be really helpful if someone could answer; I can provide more details!
Thanks in advance!
At 100 MB, I would say you should probably not spend too much time on archiving, and just use traditional backup strategies. The size of your database is so small that archiving would be quite an elaborate operation with very little gain, as the archiving process would typically only be relevant in the case of huge databases.
Generally speaking, a backup in database terms is a way to provide recoverability in case of a disaster (accidental data deletion, server crash, etc). Archiving mostly means you partition your data.
A possible goal with archiving is to keep specific data available for querying, but without the ability to alter it. When dealing with high-volume databases, this is an excellent way to increase performance, as read-only data can be indexed much more densely than "hot" data. It also allows you to move the read-only data to an isolated RAID partition optimized for READ operations, which will not have to bother with the typical RDBMS IO. Also, removing the non-active data from the regular database means the amount of data in your tables decreases, which should boost the performance of the overall system.
Archiving is typically done for legal reasons. The data in question might not be important for the business anymore, but the IRS or banking rules require it to be available for a certain amount of time.
Using SQL Server, you can archive your data using partitioning strategies. This normally involves figuring out the criteria on which you will split the data. An example could be a date (e.g. data older than 3 years is moved to the archive part of the database). In huge systems, it might also make sense to split data on geographical criteria (e.g. the Americas on one server, Europe on another).
To answer your questions:
1) See the explanation written above
2) It really depends on what the goal of upgrading is. Moving it to .NET will get the code to be managed, but how important is that for the business?
3) If you do decide to partition, verifying it works could include issuing a query on the original database for data that contains both values before and after the threshold you will be using for partitioning, then splitting the data, and re-issuing the query afterwards to verify it still returns the same record-set. If you configure the system to use an automatic sliding window, you could also keep an eye on the system to ensure that data will automatically be moved to the archive partition.
Again, if the 100MB is not a typo, I would think your database is too small to really benefit from archiving. If your goal is to speed things up, put the system on a server that is able to load the whole database into RAM, or use SSD drives.
If you need to establish a data archive for legal or administrative reasons, give horizontal table partitioning a look. It's a pretty straight-forward process that is mostly handled by SQL Server automatically.
Hope this helps you out!
I have been working with ASP.NET Web API over recent weeks with great success. It has really helped me produce an interface for mobile clients to program against over HTTP.
I reached a point where I need some assistance.
I have a new endpoint that will query a database and could return 100K results. I am using OData to filter the data and return a paginated set of it.
As this could happen for multiple requests, I am concerned about performance. Returning 100K records from the database every time is not ideal. So I have some ideas.
The first is to cache the 100K results and let OData do its magic on them every time. I am working with the AppFabric distributed cache, as it's a load-balanced environment. However, caching that amount of data in AppFabric could cause memory complications, so I think I'm best avoiding this.
The next option is to forget about the magic of OData, pass the filters I use down to the database, and return only the required data each time; in other words, hit the DB every time.
I could also look at using a caching handler like the one outlined in this article to cache in the HTTP cache: http://byterot.blogspot.ie/2012/06/aspnet-web-api-caching-handler.html The drawback of this is that if the data gets updated via another system, which it may, the cached data is not expired.
Any other tips on how to handle this scenario: a large amount of data, filtered with OData, in conjunction with Web API?
This is a question that's likely to result in a wide variety of answers. That said, let me put on my pre-MSFT hat and give you my two cents.
A lot of architecture questions are best answered with the consultant's answer: "It depends." In your case the answer depends on a few specific things. Some developers have a problem with caching layers because they introduce additional things to think about, whereas an ACID-compliant database buys you a lot of insurance that any eventual consistency is kept within very tight bounds.
If it were me making this decision, I would be considering a few things:
How many rows am I returning on a regular basis?
Are they the same rows over and over?
How big is that in memory? (100k is really not that many rows; you're right about not wanting those 100k rows to hit the disk every time, but it's probably not a problem to keep them all in memory; SQL Server would probably do this for you anyway.)
What am I willing to deal with re: eventual consistency? Do I want some other software to deal with it? (What frequently scares people about caches are things like ensuring that invalidation and insertion get done properly and consistently from different applications/different places in the application.)
Given the information you've already provided (tiered architecture, willingness to try a distributed cache) I think you should pursue a caching layer. There are lots of good caches out there. AppFabric worked fine for us before I worked at Microsoft, but I've also dealt with a variety of other caching layers as well.
Assuming you use Entity Framework, the best option is to return the EF IQueryable directly. This way the magic of OData works directly against your database: $filter, $top and $skip are mapped straight into your SQL query.
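A rough sketch of what that looks like; the controller, context and entity names are placeholders, and depending on your Web API/OData version the attribute is [Queryable] or [EnableQuery]:

```csharp
using System.Linq;
using System.Web.Http;

public class OrdersController : ApiController
{
    private readonly MyDbContext _db = new MyDbContext(); // your EF context (placeholder)

    // Returning IQueryable (not a materialized list) lets $filter, $orderby,
    // $skip and $top be translated into the SQL query, instead of pulling
    // 100K rows back and filtering them in memory.
    [Queryable]
    public IQueryable<Order> Get()
    {
        return _db.Orders;
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing) _db.Dispose();
        base.Dispose(disposing);
    }
}
```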
The best way is to use a distributed cache, which you are already doing. But the cache provider you are using, i.e. AppFabric, has some limitations; by limitations I mean feature limitations. Check out NCache, which is a mature and feature-rich third-party distributed cache provider.
If you want to understand the differences between NCache and AppFabric, check the YouTube link below, FYI:
http://www.youtube.com/watch?v=3CPi1QlskrU
The caching that I pointed out in the blog http://byterot.blogspot.ie/2012/06/aspnet-web-api-caching-handler.html applies to HTTP caching, also known as output caching. The data itself is not actually cached on the server but on the client or mid-stream cache servers, so it is not suitable for what you have in mind.
I am making a member-based web app in ASP.NET MVC 3 and am trying to plan ahead. At first our user base will not be huge, but as with any software, a sudden spike in volume is always a possibility.
Thinking ahead to this scenario, I know that the database is the bottleneck in most web apps. We are using MSSQL 2008 R2. We will have dedicated servers with several client databases; each client has their own database, so if one server begins to bottleneck we can scale vertically, or move some of the databases to a new server and begin filling that one up.
To access the databases we primarily use LINQ to SQL, and we are currently refactoring some of our code to make use of the IQueryable mechanisms to lazy load content. But each page contains quite a bit of content from various parts of the database.
We also have a few large databases that are used for widgets in the program that rarely change but have millions of rows. The goal with those is to somehow sync them to the primary source and distribute them across several machines and then load balance those servers.
With this layout should I even worry about caching, or will the built-in caching mechanisms in MSSQL be sufficient?
If so, where should I begin? I have looked briefly at AppFabric, but it looks as though it is for Azure only?
Resources:
How to cache data in a MVC application
http://stephenwalther.com/blog/archive/2008/08/28/asp-net-mvc-tip-39-use-the-velocity-distributed-cache.aspx
http://stephenwalther.com/blog/archive/2008/08/29/asp-net-mvc-tip-40-don-t-cache-pages-that-require-authentication.aspx
Lazy loading is a performance killer. It's better to load the entire object graph with one join than to lazy load the other properties. This is especially the case with a list of objects: if you iterate over it, you'll end up lazy loading for each item in the list. Furthermore, every call to the DB has overhead. Fewer calls = better performance.
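For example, with LINQ to SQL you can ask for the related rows up front via DataLoadOptions instead of letting each item trigger its own query; the context and entity names below are placeholders for your own generated classes:

```csharp
using System;
using System.Data.Linq;

public static class CustomerQueries
{
    // MyDataContext, Customer and Orders are placeholders for your generated
    // LINQ to SQL classes.
    public static void PrintOrderCounts(MyDataContext db)
    {
        // Ask LINQ to SQL to fetch Orders together with each Customer,
        // so the whole graph comes back in one round trip.
        var options = new DataLoadOptions();
        options.LoadWith<Customer>(c => c.Orders);
        db.LoadOptions = options;   // must be set before the first query runs

        // Without LoadWith, touching c.Orders inside the loop would issue one
        // extra query per customer (the classic N+1 problem).
        foreach (var c in db.Customers)
        {
            Console.WriteLine("{0}: {1} orders", c.Name, c.Orders.Count);
        }
    }
}
```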
SO was a top 1000 website before it needed two database servers. I think you'll be ok.
If your revenue model says "each client will have its own database", then your scaling issues should be really easy to solve. It sounds like you already have a plan to scale up with more servers as your client base increases. What's the problem?
Caching on the web tier is usually the first scaling fix you'll have to worry about. You probably don't need to do a fresh db call with each page request.
Overall this sounds like a lot of premature optimization. Your traffic hasn't reached a point where you need to be worried about scaling. Make these kinds of decisions at the last possible moment.
The database cache is different to most caches - it can of course load used data into memory and re-use query plans, but that isn't really a cache as such.
AppFabric is definitely not just Azure; after all, if it was, you wouldn't be able to install it (and use it) locally :) but in truth there is little between AppFabric, Redis and memcached (the latter lacks persistence, of course).
But I think you should initially look at using the inbuilt ASP.NET caching: both data caching via HttpContext.Cache, and caching of entire responses (or, in MVC 3, partials). Obviously you should have a broad idea of what data is used heavily by lots of requests and is safe to re-use: cache that!
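To make that concrete, a rough sketch of both flavours (the Widget names, cache keys and durations are purely illustrative):

```csharp
using System;
using System.Collections.Generic;
using System.Web.Mvc;

public class Widget { public string Name { get; set; } }

public class WidgetController : Controller
{
    // 1) Data caching via HttpContext.Cache: heavy, rarely-changing widget
    //    data is loaded once and shared by all requests for 10 minutes.
    private List<Widget> GetWidgets()
    {
        var widgets = HttpContext.Cache["widgets"] as List<Widget>;
        if (widgets == null)
        {
            widgets = LoadWidgetsFromDatabase();
            HttpContext.Cache.Insert("widgets", widgets, null,
                DateTime.UtcNow.AddMinutes(10),
                System.Web.Caching.Cache.NoSlidingExpiration);
        }
        return widgets;
    }

    // 2) Output caching of an entire partial response in MVC 3.
    [OutputCache(Duration = 300, VaryByParam = "none")]
    public ActionResult WidgetList()
    {
        return PartialView(GetWidgets());
    }

    private List<Widget> LoadWidgetsFromDatabase()
    {
        // Placeholder for the real LINQ to SQL query.
        return new List<Widget>();
    }
}
```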
Just make sure you treat all cached data as immutable (if you need to update the cache, re-add a modified value; don't modify the existing objects). The reason: it won't work the same if you start needing distributed caching, as that uses serialization, and any changes you make to the local object won't be seen by the next request.