I am trying to improve the performance of a Windows Service, developed in C# and .NET 2.0, that processes a large number of files. I want to process more files per second.
In its process, for each file, the service does a database query to retrieve some parameters of the system.
Those parameters change annually, and I am thinking that I would gain some performance if I loaded those parameters into a singleton and refreshed that singleton periodically. Instead of making a database query for each file being processed, I would get the parameters from memory.
To complete the scenario: I am using Windows Server 2008 R2 64-bit, SQL Server 2008 as the database, and C# with .NET 2.0 as already mentioned.
Am I right in my approach? What would you do?
Thanks!
Those parameters change annually
Yes, do cache them in memory. Especially if they are large or complex.
You should take care to invalidate them at the right time once a year, depending on how accurate that has to be.
Simply caching them for an hour or even for a few minutes might be a good compromise.
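As a rough illustration (the class and method names and the 30-minute refresh window are mine, not from your code), such a cache could look like this:

```csharp
using System;
using System.Collections.Generic;

// Minimal sketch of a singleton-style parameter cache that refreshes itself
// when the cached copy gets too old. LoadFromDatabase() stands in for your
// existing per-file query; the dictionary is a placeholder for your parameter object.
public static class ParameterCache
{
    private static readonly object Sync = new object();
    private static Dictionary<string, string> _parameters;
    private static DateTime _loadedAtUtc = DateTime.MinValue;
    private static readonly TimeSpan MaxAge = TimeSpan.FromMinutes(30);

    public static Dictionary<string, string> Current
    {
        get
        {
            lock (Sync)
            {
                // Refresh when the cached copy is missing or older than the allowed age.
                if (_parameters == null || DateTime.UtcNow - _loadedAtUtc > MaxAge)
                {
                    _parameters = LoadFromDatabase();   // your existing query, run once
                    _loadedAtUtc = DateTime.UtcNow;
                }
                return _parameters;
            }
        }
    }

    private static Dictionary<string, string> LoadFromDatabase()
    {
        // ... query SQL Server here and map the rows to the dictionary
        return new Dictionary<string, string>();
    }
}
```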
RAM data access is definitely faster than any other data access, except for CPU-level storage like registers and CPU caches.
Caching would pay off even if the data changed every minute, so yes, caching that query will be much faster.
Crossing a network or going to disk is always orders of magnitude slower than in memory access.
Databases can cache data in memory, so if you can achieve that and you're not crossing a network, the database might be faster, since its data access patterns/indexes etc. may be better than your code's. But that's the best case - if you need it faster, in-memory caches help.
But be aware that in-memory caches can add complexity and bugs. You have to determine the lifetime of the cached data and how to refresh it, and the more complex that is, the more weird edge-case state bugs you will have. Even though the parameters change annually, you still have to handle that cusp.
I need to load multiple SQL statements from SQL Server into DataTables. Most of the statements return some 10,000 to 100,000 records, and each takes up to a few seconds to load.
My guess is that this is simply due to the amount of data that needs to be shoved around. The statements themselves don't take much time to process.
So I tried to use Parallel.For() to load the data in parallel, hoping that the overall processing time would decrease. I do get a 10% performance increase, but that is not enough. A reason might be that my machine is only a dual core, thus limiting the benefit here. The server on which the program will be deployed has 16 cores though.
My question is: how could I improve the performance further? Would the use of Asynchronous Data Service Queries (BeginExecute, etc.) be a better solution than PLINQ? Or maybe some other approach?
The SQL Server is running on the same machine. This is also the case on the deployment server.
EDIT:
I've run some tests using a DataReader instead of a DataTable. This already decreased the load times by about 50%. Great! Still, I am wondering whether parallel processing with BeginExecute would improve the overall load time on a multiprocessor machine. Does anybody have experience with this? Thanks for any help!
UPDATE:
I found that about half of the loading time was consumed by processing the SQL statement. In SQL Server Management Studio the statements took only a fraction of that time, but somehow they take much longer through ADO.NET. So by using DataReaders instead of loading DataTables, and by adapting the SQL statements, I've come down to about 25% of the initial loading time. Loading the DataReaders in parallel threads with Parallel.For() does not make an improvement here. So for now I am happy with the result and will leave it at that. Maybe when we update to .NET 4.5 I'll give asynchronous DataReader loading a try.
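For reference, the BeginExecute pattern I am asking about would look something like this, as far as I understand it (connection string, query and column types are just placeholders, and pre-4.5 ADO.NET needs Asynchronous Processing=true in the connection string):

```csharp
using System;
using System.Data.SqlClient;

// Sketch of the asynchronous BeginExecuteReader/EndExecuteReader pattern.
class AsyncLoadSketch
{
    static void Main()
    {
        string connStr = "Data Source=.;Initial Catalog=MyDb;Integrated Security=true;Asynchronous Processing=true";
        using (SqlConnection conn = new SqlConnection(connStr))
        using (SqlCommand cmd = new SqlCommand("SELECT Id, Name FROM dbo.SomeTable", conn))
        {
            conn.Open();
            IAsyncResult ar = cmd.BeginExecuteReader();   // the query runs while we could start other work

            // ... kick off the next query on another connection here ...

            using (SqlDataReader reader = cmd.EndExecuteReader(ar))   // blocks until this query is done
            {
                while (reader.Read())
                {
                    int id = reader.GetInt32(0);
                    string name = reader.GetString(1);
                    // process the row directly instead of buffering it in a DataTable
                }
            }
        }
    }
}
```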
My guess is that this is simply due to the amount of data that needs to be shoved around.
No, it is due to using a SLOW framework. I am pulling nearly a million rows into a dictionary in less than 5 seconds in one of my apps. DataTables are SLOW.
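For example (just a sketch, with made-up table and column names), reading straight into a dictionary with a DataReader:

```csharp
using System.Collections.Generic;
using System.Data.SqlClient;

// Illustrative only: read rows straight into a dictionary with a DataReader
// instead of filling a DataTable.
public static class LookupLoader
{
    public static Dictionary<int, string> Load(string connectionString)
    {
        Dictionary<int, string> result = new Dictionary<int, string>();
        using (SqlConnection conn = new SqlConnection(connectionString))
        using (SqlCommand cmd = new SqlCommand("SELECT Id, Value FROM dbo.Lookup", conn))
        {
            conn.Open();
            using (SqlDataReader reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    // no DataTable/DataRow overhead, just the typed values
                    result[reader.GetInt32(0)] = reader.GetString(1);
                }
            }
        }
        return result;
    }
}
```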
You have to change the nature of the problem. Let's be honest, who needs to view 10,000 to 100,000 records per request? I think no one.
You need to consider handling paging, and in your case paging should be done on the SQL Server side. To make this clear, let's say you have a stored procedure named "GetRecords". Modify this stored procedure to accept a page parameter and return only the data relevant for that specific page (say, 100 records) plus the total page count. Inside the app, just show these 100 records (they will fly) and handle the selected page index.
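The client side could look roughly like this; the "GetRecords" procedure and its @PageIndex/@PageSize/@TotalPages parameters are hypothetical names, not an existing API:

```csharp
using System.Data;
using System.Data.SqlClient;

// Sketch of calling a paged stored procedure and reading back the total page count.
public static class PagedLoader
{
    public static DataTable LoadPage(string connectionString, int pageIndex, int pageSize, out int totalPages)
    {
        using (SqlConnection conn = new SqlConnection(connectionString))
        using (SqlCommand cmd = new SqlCommand("dbo.GetRecords", conn))
        {
            cmd.CommandType = CommandType.StoredProcedure;
            cmd.Parameters.AddWithValue("@PageIndex", pageIndex);
            cmd.Parameters.AddWithValue("@PageSize", pageSize);

            SqlParameter total = cmd.Parameters.Add("@TotalPages", SqlDbType.Int);
            total.Direction = ParameterDirection.Output;

            DataTable page = new DataTable();
            conn.Open();
            using (SqlDataReader reader = cmd.ExecuteReader())
            {
                page.Load(reader);                 // only ~100 rows, so loading is cheap
            }
            totalPages = (int)total.Value;         // output parameters are populated once the reader is closed
            return page;
        }
    }
}
```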
Hope this helps, best regards!
Do you often have to load these requests? If so, why not use a distributed cache?
We have an ASP.NET 4.0 application that pulls from a database a complex data structure that takes over 12 hours to push into an in-memory structure (which is later stored in HttpRuntime.Cache). The size of the data structure is growing quickly, and we can't keep waiting 12+ hours to get it into memory whenever the application restarts. This is a major issue if you want to change the web.config or any code in the web application that causes a restart: it means a long wait before the application can be used, and it hinders development and deployment updates.
The data structure MUST be in memory to work at a speed that makes the website usable. In-memory databases such as memcached or Redis are slow in comparison to HttpRuntime.Cache and would not work in our situation (in-memory DBs have to serialize on put/get, objects can't reference each other since they are looked up by key - which degrades performance - and with a large number of keys the performance drops off quickly). Performance is a must here.
What we would like to do is quickly dump the HttpRuntime.Cache to disk before the application ends (on a restart), and be able to load it back immediately when the application starts again (hopefully within minutes instead of 12+ hours or days).
The in-memory structure is around 50GB.
Is there a solution to this?
In memory databases such as memcache or Redis are slow in comparison to HttpRuntime.Cache
Yes, but they are very fast compared to a 12+ hour spin-up. Personally, I think you're taking the wrong approach here in forcing the load of a 50 GB structure. Just a suggestion, but we use HttpRuntime.Cache as part of a multi-tier caching strategy:
- the local cache is checked first
- otherwise redis is used as the next tier of cache (which is faster than the underlying data, persistent, and supports a number of app servers), and the local cache is then updated
- otherwise, the underlying database is hit (and then both redis and the local cache are updated)
The point being, at load we don't require anything in memory - it is filled as it is needed, and from then on it is fast. We also use pub/sub (again courtesy of redis) to ensure cache invalidation is prompt. The net result: it is fast enough when cold, and very fast when warm.
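As a rough sketch of that tiered lookup (assuming StackExchange.Redis as the client; the key names, expirations and LoadFromDatabase helper are placeholders, not anything from our actual code):

```csharp
using System;
using System.Web;
using System.Web.Caching;
using StackExchange.Redis;

// Illustrative multi-tier lookup: local cache, then redis, then the database.
public static class TieredCache
{
    private static readonly ConnectionMultiplexer Redis = ConnectionMultiplexer.Connect("localhost");

    public static string Get(string key)
    {
        // 1) local in-process cache
        string local = (string)HttpRuntime.Cache.Get(key);
        if (local != null) return local;

        // 2) redis, shared by all app servers and persistent across restarts
        IDatabase db = Redis.GetDatabase();
        string fromRedis = db.StringGet(key);
        if (fromRedis != null)
        {
            HttpRuntime.Cache.Insert(key, fromRedis, null,
                DateTime.UtcNow.AddMinutes(5), Cache.NoSlidingExpiration);
            return fromRedis;
        }

        // 3) the underlying database, then populate both tiers
        string fromDb = LoadFromDatabase(key);
        db.StringSet(key, fromDb, TimeSpan.FromHours(1));
        HttpRuntime.Cache.Insert(key, fromDb, null,
            DateTime.UtcNow.AddMinutes(5), Cache.NoSlidingExpiration);
        return fromDb;
    }

    private static string LoadFromDatabase(string key)
    {
        // ... hit the real data source here
        return "value-for-" + key;
    }
}
```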
Basically, I would look at anything that avoids needing the 50 GB of data before you can do anything.
If this data isn't really cache, but is your data, I would look at serialization over a proper object model. I would suggest protobuf-net (I'm biased, as the author) as a strong candidate here - very fast and very small output.
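A minimal protobuf-net round trip looks roughly like this (the type and file path are invented for the example):

```csharp
using System.Collections.Generic;
using System.IO;
using ProtoBuf;

// Decorate the object model, then Serialize/Deserialize it to a stream.
[ProtoContract]
public class CatalogEntry
{
    [ProtoMember(1)] public int Id { get; set; }
    [ProtoMember(2)] public string Name { get; set; }
}

public static class CatalogStore
{
    public static void Save(List<CatalogEntry> entries, string path)
    {
        using (FileStream fs = File.Create(path))
        {
            Serializer.Serialize(fs, entries);      // compact binary output
        }
    }

    public static List<CatalogEntry> Load(string path)
    {
        using (FileStream fs = File.OpenRead(path))
        {
            return Serializer.Deserialize<List<CatalogEntry>>(fs);   // fast to read back at startup
        }
    }
}
```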
I am trying to use SQLite in my application as a sort of cache. I say "sort of" because items never expire from my cache and I am not storing any actual data. I simply need the cache to record all the IDs I have processed before; I don't want to process anything twice.
I am entering items into the cache at 10,000 messages/sec for a total of 150 million messages. My table is pretty simple: it only has one text column, which stores the IDs. I was doing this all in memory using a dictionary; however, I am processing millions of messages and, although it is fast that way, I ran out of memory after some time.
I have researched SQLite and performance and I understand that configuration is key; however, I am still getting horrible performance on inserts (I haven't tried selects yet). I am not able to keep up with even 5,000 inserts/sec. Maybe this is as good as it gets.
My connection string is as below:
Data Source=filename;Version=3;Count Changes=off;Journal Mode=off;
Pooling=true;Cache Size=10000;Page Size=4096;Synchronous=off
Thanks for any help you can provide!
If you are doing lots of inserts or updates at once, put them in a transaction.
Also, if you are executing essentially the same SQL each time, use a parameterized statement.
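A minimal sketch of both suggestions together, assuming the System.Data.SQLite provider and an illustrative processed_ids table:

```csharp
using System.Data.SQLite;

// Batch the inserts into one transaction and reuse a single parameterized command.
public static class IdStore
{
    public static void InsertBatch(SQLiteConnection conn, string[] ids)
    {
        using (SQLiteTransaction tx = conn.BeginTransaction())
        using (SQLiteCommand cmd = new SQLiteCommand(
            "INSERT INTO processed_ids (id) VALUES (@id)", conn, tx))
        {
            SQLiteParameter p = cmd.Parameters.Add("@id", System.Data.DbType.String);
            foreach (string id in ids)
            {
                p.Value = id;
                cmd.ExecuteNonQuery();     // the statement is prepared once, executed many times
            }
            tx.Commit();                   // one commit for the whole batch, not one per row
        }
    }
}
```

The batch size is up to you; committing every few thousand rows rather than per row is usually where most of the insert throughput comes from.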
Have you looked at the SQLite Optimization FAQ (a bit old)?
SQLite performance tuning and optimization on embedded systems
If you have many threads writing to the same database, then you're going to run into concurrency problems with that many transactions per second. SQLite always locks the whole database for writes so only one write transaction can be processed at a time.
An alternative is Oracle Berkeley DB with SQLite. The latest version of Berkeley DB includes a SQLite front end that uses page-level locking instead of database-level locking. This provides a much higher number of transactions per second when there is a high concurrency requirement.
http://www.oracle.com/technetwork/database/berkeleydb/overview/index.html
It includes the same SQLite.NET provider and is supposed to be a drop-in replacement.
Since your requirements are so specific, you may be better off with something more dedicated, like memcached. This will give you a very high-throughput caching implementation that is a lot more memory-efficient than a simple hashtable.
Is there a port of memcached to .NET?
I have a design issue.
Let's say you have an application that stores data in its cache during runtime. What do you think is the maximum amount of data (in MB) an application should cache before it must use the DB?
Thanks.
How much memory do you have? How much can you afford to lose when the app or system crashes? How long can you afford the startup time to be when you restart and have to reload those caches? Typically, even with caches, you need to write through to the DB anyway (or to something) to persist the data.
And if you eventually have "too much data" to fit into memory, then you're paging working sets between the DB and memory anyway. You also have cache synchronization issues if someone changes the DB behind your back.
All sorts of fun issues.
But, if you have the memory, go ahead and use it.
There's a reasonable chance that SQL Server is better at this than you are, and that you should cache approximately nothing.
Like Will says, though, if you're willing to pay in terms of lost data on a crash, you might increase responsiveness by not persisting until later. Anything you can do "for free", in the sense of giving up the guarantees you'd get by writing it to the DB immediately, the DB can in principle do for itself. SQL Server is quite aggressive about using almost all available free memory (and quite polite in reducing its memory usage when other apps need RAM). So any memory you use for caching, you're taking away from SQL Server's caches.
I am working on an application with a potentially large memory load (>5 GB), but it is required to run on 32-bit, .NET 2.0 desktops because of the customer deployment environment. My solution so far has been to use an app-wide data store for these large-volume objects: when an object is assigned to the store, the store checks the total memory usage of the app, and if it is getting close to the limit it starts serialising some of the older objects in the store to the user's temp folder, retrieving them back into memory as and when they are needed. This is proving decidedly unreliable: if other objects within the app start using memory, the store gets no prompt to clear up and make space. I did look at using weak references to hold the in-memory data objects, serialising them to disk when they were released, but the objects seemed to be released almost immediately, especially in debug builds, causing a massive performance hit as the app serialised everything.
Are there any useful patterns/paradigms I should be using to handle this? I have googled extensively but as yet haven't found anything useful.
I thought virtual memory was supposed to have you covered in this situation?
Anyway, it seems suspect that you really need all 5 GB of data in memory at any given moment - you can't possibly be processing all of that data at once, at least not on what sounds like a consumer PC. You didn't go into detail about your data, but to me it smells like the objects are designed in a way that requires the entire set to be in memory to work with it. Have you thought about splitting your data into more sensible units, and then pre-emptively loading each unit from disk just before it needs to be processed? You'd essentially be paying a more constant performance cost this way, but you'd reduce your current thrashing issue.
Maybe you should go with memory-mapped files (see the "Managing Memory-Mapped Files" article). In .NET 2.0 you have to use P/Invoke to call those functions; since .NET 4.0 there is efficient built-in functionality in MemoryMappedFile.
Also take a look at:
http://msdn.microsoft.com/en-us/library/dd997372.aspx
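For reference, a minimal sketch of the .NET 4.0 API (file name, map name and sizes are arbitrary); on .NET 2.0 you would P/Invoke CreateFileMapping/MapViewOfFile from kernel32 instead:

```csharp
using System.IO;
using System.IO.MemoryMappedFiles;

// Map a large backing file and work through a small view, so only the touched
// pages are resident in RAM - useful when the data exceeds the 32-bit address space.
class MmfSketch
{
    static void Main()
    {
        long capacity = 1L * 1024 * 1024 * 1024;   // 1 GB backing file, paged in on demand
        using (MemoryMappedFile mmf = MemoryMappedFile.CreateFromFile(
            "data.bin", FileMode.OpenOrCreate, "BigDataStore", capacity))
        using (MemoryMappedViewAccessor view = mmf.CreateViewAccessor(0, 64 * 1024))
        {
            view.Write(0, 12345);                  // write an int at offset 0 of the view
            int value = view.ReadInt32(0);         // read it back
        }
    }
}
```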
You can't store 5 GB of data in memory efficiently. You have a 2 GB limit per process on a 32-bit OS, and a 4 GB limit per 32-bit process under 64-bit Windows (WOW64).
So you have a choice:
Go the Google Chrome way (and Firefox 4) and partition the data between multiple processes. This may be applicable if your application is started under a 64-bit OS and you have some reason to keep your app 32-bit, but it is not an easy route. If you don't have a 64-bit OS, I wonder where you get more than 5 GB of RAM from.
If you have a 32-bit OS, then any solution will be file-based. When you try to keep the data in memory (though I wonder how you even address it under the 32-bit, 2 GB per-process limit), the OS just continuously swaps portions of the data (memory pages) to disk and restores them again and again as you access them. You incur a great performance penalty, and you have already noticed it (I guessed that from the description of your problem). The main problem is that the OS can't predict when you will need one piece of data and when you will want another, so it just does its best by reading and writing memory pages to and from disk.
So you are already using disk storage indirectly, in an inefficient way; MMFs give you the same thing in an efficient and controlled manner.
You can rearchitect your application to use MMFs, and the OS will help you with efficient caching. Do a quick test yourself; MMFs may well be good enough for your needs.
Anyway, I don't see any other way to work with a dataset larger than the available RAM other than a file-based one. And it is usually better to have direct control over the data manipulation, especially when such an amount of data comes in and needs to be processed.
When you have to store huge amounts of data and maintain accessibility, sometimes the most useful solution is a data storage and management system, i.e. a database. A database (MySQL, for example) can store lots of typical data types and of course binary data too. Maybe you can store your objects in the database (directly, or through a business object model) and fetch them when you need them. This solution can sometimes solve many problems with managing data (moving, backup, searching, updating...) and with storage (the data layer), and it is location-independent - maybe this point of view can help you.