In my C# 3.5 application, the code performs the following steps:
1. Loop through a collection (of length 10).
2. For each item in step 1, fetch records from an Oracle database by executing a stored proc (the record count is typically 100).
3. Process the items fetched in step 2.
4. Go to the next item in step 1.
My question, with regard to performance: is it a good idea to fetch all the items from step 2 (i.e. 10 * 100 = 1,000 records) in one shot, rather than connecting to the database on each iteration and retrieving 100 records at a time?
Thanks.
Yes, it's slightly better because you will lose the overhead of connecting to the DB, but you will still have the overhead of 10 stored procedure calls. If you could find a way to pass all 10 items as a parameter to the stored proc and execute just one stored procedure call, I think you would get better performance.
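For illustration, a rough ODP.NET sketch of that idea. GetRecordsForIds, its parameters, and the comma-separated ID list are all assumptions; the proc would have to split the list server-side and return one ref cursor with all matching rows.

```
using System.Collections.Generic;
using System.Data;
using System.Linq;
using Oracle.DataAccess.Client; // ODP.NET

static DataTable FetchAllInOneCall(string connectionString, IEnumerable<int> ids)
{
    using (var conn = new OracleConnection(connectionString))
    using (var cmd = new OracleCommand("GetRecordsForIds", conn)) // hypothetical proc
    {
        cmd.CommandType = CommandType.StoredProcedure;

        // Pass the whole key set as one delimited string ("1,2,3,...").
        cmd.Parameters.Add("p_id_list", OracleDbType.Varchar2).Value =
            string.Join(",", ids.Select(i => i.ToString()).ToArray());

        var cursor = cmd.Parameters.Add("p_cursor", OracleDbType.RefCursor);
        cursor.Direction = ParameterDirection.Output;

        var table = new DataTable();
        conn.Open();
        using (var adapter = new OracleDataAdapter(cmd))
        {
            adapter.Fill(table); // one round trip instead of ten
        }
        return table;
    }
}
```

ODP.NET also supports PL/SQL associative-array parameters, which would avoid the string splitting if you'd rather not do it in the proc.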
Depending on how intense the connection steps are, it might be better to fetch all the records at once. However, keep in mind that premature optimization is the root of all evil. :-)
Generally it is better to pull all the records from the database in one stored procedure call.
This is countered when the stored procedure call is long-running or otherwise heavy enough to cause contention on the table. In your case, however, with only 1,000 records, I doubt that will be an issue.
Yes, it is an incredibly good idea. The key to database performance is to run as many operations in bulk as possible.
For example, consider just the interaction between PL/SQL and SQL. These two languages run on the same server and are very thoroughly integrated. Yet I routinely see an order of magnitude performance increase when I reduce or eliminate any interaction between the two. I'm sure the same thing applies to interaction between the application and the database.
Even though the number of records may be small, bulking your operations is an excellent habit to get into. It's not premature optimization, it's a best practice that will save you a lot of time and effort later.
I need to load multiple SQL statements from SQL Server into DataTables. Most of the statements return some 10,000 to 100,000 records, and each takes up to a few seconds to load.
My guess is that this is simply due to the amount of data that needs to be shoved around. The statements themselves don't take much time to process.
So I tried to use Parallel.For() to load the data in parallel, hoping that the overall processing time would decrease. I do get a 10% performance increase, but that is not enough. A reason might be that my machine is only a dual core, thus limiting the benefit here. The server on which the program will be deployed has 16 cores though.
My question is: how could I improve performance further? Would asynchronous data service queries (BeginExecute, etc.) be a better solution than PLINQ? Or maybe some other approach?
The SQL Server is running on the same machine. This is also the case on the deployment server.
EDIT:
I've run some tests using a DataReader instead of a DataTable. This already decreased the load times by about 50%. Great! Still, I am wondering whether parallel processing with BeginExecute would improve the overall load time on a multiprocessor machine. Does anybody have experience with this? Thanks for any help!
UPDATE:
I found that about half of the loading time was consumed by processing the SQL statement. In SQL Server Management Studio the statements took only a fraction of the time, but somehow they take much longer through ADO.NET. So by using DataReaders instead of loading DataTables, and by adapting the SQL statements, I've come down to about 25% of the initial loading time. Loading the DataReaders in parallel threads with Parallel.For() does not make an improvement here. So for now I am happy with the result and will leave it at that. Maybe when we update to .NET 4.5 I'll give the asynchronous DataReader loading a try.
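For reference, a minimal sketch of the DataReader approach described in the update above; the query text, column ordinals/types, and ProcessRow are placeholders.

```
using System.Data.SqlClient;

static void LoadWithReader(string connectionString, string sql)
{
    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(sql, conn))
    {
        conn.Open();
        using (SqlDataReader reader = cmd.ExecuteReader())
        {
            while (reader.Read())
            {
                // Read only the columns you need, by ordinal; this avoids
                // the per-row overhead of building DataTable/DataRow objects.
                int id = reader.GetInt32(0);
                string name = reader.GetString(1);
                ProcessRow(id, name); // placeholder for the actual processing
            }
        }
    }
}
```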
My guess is that this is simply due to the amount of data that needs to be shoved around.
No, it is due to using a SLOW framework. I am pulling nearly a million rows into a dictionary in less than 5 seconds in one of my apps. DataTables are SLOW.
You have to change the nature of the problem. Let's be honest, who needs to view 10,000 to 100,000 records per request? I think no one.
You need to consider handling paging, and in your case paging should be done on the SQL Server side. To make this clear, let's say you have a stored procedure named "GetRecords". Modify it to accept a page parameter and return only the data relevant for that specific page (say, 100 records) along with the total page count. Inside the app, just show those 100 records (they will fly) and handle the selected page index.
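A sketch of what the client side could look like, assuming GetRecords is modified to take @PageIndex/@PageSize parameters and return the page count through an output parameter (those parameter names are assumptions):

```
using System.Data;
using System.Data.SqlClient;

static DataTable GetPage(string connectionString, int pageIndex, int pageSize,
                         out int totalPages)
{
    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand("GetRecords", conn))
    {
        cmd.CommandType = CommandType.StoredProcedure;
        cmd.Parameters.AddWithValue("@PageIndex", pageIndex);
        cmd.Parameters.AddWithValue("@PageSize", pageSize);

        var totalPagesParam = cmd.Parameters.Add("@TotalPages", SqlDbType.Int);
        totalPagesParam.Direction = ParameterDirection.Output;

        var page = new DataTable();
        conn.Open();
        new SqlDataAdapter(cmd).Fill(page); // only ~100 rows come back

        totalPages = (int)totalPagesParam.Value;
        return page;
    }
}
```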
Hope this helps, best regards!
Do you often have to load these requests? If so, why not use a distributed cache?
What is the most common and easiest-to-implement solution to improve speed for a SQL Server 2008 R2 database and .NET 3.5 application?
We have an application with the following attributes:
- Small number of simultaneous clients (~200 at most).
- Complex math operations on the SQL Server side.
- We are imitating something like Oracle's row-level security (thus using TVFs and stored procs instead of querying tables directly).
- The main problem is that users perform a high volume of updates/inserts/deletes/calculations, and they freak out because they need to wait for pages to reload while those actions are done.
The questions I need clarification on are as follows:
1. What is faster: returning the whole dataset from SQL Server and performing the math functions on the C# side, or performing the calculation functions on the SQL side (and thus not returning extra columns)? Or is it only hardware dependent?
2. Will caching improve performance (for example, if we add a Redis cache)? Or are caching solutions only feasible for a large number of clients?
3. Is it a bad practice to pre-calculate some of the data and store it somewhere in the database (so that when a user requests it, it is already calculated)? Or is this what caching is supposed to do? If this is not a bad practice, how do you configure SQL Server to do the calculations when resources are available?
4. How can caching improve performance if it still needs to go to the database to see whether any records were updated?
General suggestions and comments are also welcome.
Let's separate the answer into two parts: the performance of your query execution, and caching to improve that performance.
I believe you should start by addressing the load on your SQL Server and try to optimize the processes running on it as much as possible; this should remove most of the need to implement any caching.
From your question it appears you have a system that is used both for transactional processing and for aggregations/calculations. This often results in conflicts when the two workloads lock each other's resources: a long query performing math operations may lock or hold an object required by the UI.
Optimizing these systems to work side-by-side and improving the query efficiency is the key for having increased performance.
To start, I'll use your questions. What is faster? It depends on the actual aggregation you are performing: if you're dealing with set operations, i.e. SUM/AVG of a column, keep it in SQL; on the other hand, if you find yourself using a cursor in the procedure, move that logic to C#. Cursors will kill your performance!
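As a small illustration of the set-based point (table and column names are invented): let the server compute the aggregate and return a single scalar instead of streaming every row to the client and summing in a loop.

```
using System;
using System.Data.SqlClient;

static decimal TotalAmountSince(string connectionString, DateTime from)
{
    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(
        "SELECT SUM(Amount) FROM dbo.Orders WHERE OrderDate >= @from", conn))
    {
        cmd.Parameters.AddWithValue("@from", from);
        conn.Open();

        // Only one value crosses the wire.
        object result = cmd.ExecuteScalar();
        return result == DBNull.Value ? 0m : (decimal)result;
    }
}
```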
You asked whether it's bad practice to aggregate data off to the side and later query that repository; this is actually a best practice :). You'll end up with one database catering to the transactional, high-paced clients and another storing the aggregated info, quickly and easily available for your other needs. Taking it a step further leads to a data warehouse, so this is definitely where you want to be heading when you have a lot of information and calculations.
Lastly, caching: this is tricky and really depends on the specific nature of your needs. I'd say take the above approach, spend the time improving the processes, and I expect the end result will make caching redundant.
One of your best friends for this task is SQL Profiler: run a trace on stmt:completed to see which statements have the highest duration/IO/CPU, and tackle those first.
Good luck!
We're working on an online system right now, and I'm confused about when to use in-memory search and when to use database search. Can someone please help me figure out the factors to be considered when it comes to searching records?
One factor is that if you need to go through the same results over and over, be sure to cache them in memory. This becomes an issue when you're using LINQ to SQL or Entity Framework, i.e. ORMs that support deferred execution.
So if you have an IQueryable<SomeType> that you need to go through multiple times, make sure you materialize it with a ToList() before firing up multiple foreach loops.
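A minimal sketch of that, with a placeholder entity and an IQueryable standing in for the LINQ to SQL / Entity Framework query:

```
using System.Collections.Generic;
using System.Linq;

// Placeholder entity; in a real app this comes from your ORM model.
class User { public int Id { get; set; } public bool IsActive { get; set; } }

static void ProcessActiveUsers(IQueryable<User> users)
{
    // Without ToList(), each foreach below would re-execute the query
    // against the database (deferred execution).
    List<User> activeUsers = users.Where(u => u.IsActive).ToList();

    foreach (var u in activeUsers) { /* first pass over the cached results */ }
    foreach (var u in activeUsers) { /* second pass, no extra round trip   */ }
}
```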
It depends on the situation, though I generally prefer in-memory search when possible.
However, it depends on the context: for example, if records can be updated between one search and another, and you need the most up-to-date record at the time of the search, then obviously you need a database search.
If the recordset (data table) you would need to store in memory is huge, it may be better to search directly on the database instead.
However, keep in mind that if you can, and if performance is important, loading the data into a DataTable and then searching/filtering it with LINQ, for example, can increase the performance of the search itself.
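For example, a sketch of filtering an already-loaded DataTable with LINQ (this needs a reference to System.Data.DataSetExtensions; the column names are invented):

```
using System;
using System.Data;
using System.Linq;

static void SearchInMemory(DataTable customers)
{
    // Filter the in-memory rows; no additional round trip to the database.
    var matches = customers.AsEnumerable()
        .Where(row => row.Field<string>("Country") == "IT"
                   && row.Field<decimal>("Balance") > 1000m);

    foreach (DataRow row in matches)
        Console.WriteLine(row.Field<string>("Name"));
}
```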
Another thing to keep in mind is the performance of the database server versus the application server: if the database server is fast enough on the search query, maybe you don't need to cache in memory on the application side, and you can avoid one step. Keep in mind that caching for in-memory search moves computational load from the database to the application server...
An absolute answer is not possible for your question; it depends on your context...
It depends on the number of records. If the number of records is small, then it's better to keep them in memory, i.e. cache the records. Also, if the records get queried frequently, then go for the in-memory option.
But if the record count or record size is too large, then it's better to go for the database search option.
Basically it depends on how much memory you have on your server...
I am trying to use SQLite in my application as a sort of cache. I say "sort of" because items never expire from my cache and I am not storing the actual data. I simply need the cache to store the IDs of everything I have processed before; I don't want to process anything twice.
I am entering items into the cache at 10,000 messages/sec for a total of 150 million messages. My table is pretty simple: it only has one text column, which stores the IDs. I was doing this all in memory using a dictionary; however, I am processing millions of messages and, although it is fast that way, I ran out of memory after some time.
I have researched SQLite performance and I understand that configuration is key; however, I am still getting horrible performance on inserts (I haven't tried selects yet). I am not able to keep up with even 5,000 inserts/sec. Maybe this is as good as it gets.
My connection string is as below:
Data Source=filename;Version=3;Count Changes=off;Journal Mode=off;Pooling=true;Cache Size=10000;Page Size=4096;Synchronous=off
Thanks for any help you can provide!
If you are doing lots of inserts or updates at once, put them in a transaction.
Also, if you are executing essentially the same SQL each time, use a parameterized statement.
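A minimal System.Data.SQLite sketch combining both suggestions: one transaction per batch and a single parameterized command reused for every row. The table name processed_ids is an assumption, since the question doesn't give one.

```
using System.Collections.Generic;
using System.Data;
using System.Data.SQLite;

static void InsertIds(SQLiteConnection conn, IEnumerable<string> ids)
{
    using (var tx = conn.BeginTransaction())
    using (var cmd = new SQLiteCommand(
        "INSERT INTO processed_ids (id) VALUES (@id)", conn, tx))
    {
        var p = cmd.Parameters.Add("@id", DbType.String);
        foreach (var id in ids)
        {
            p.Value = id;
            cmd.ExecuteNonQuery(); // statement is prepared once and reused
        }
        tx.Commit(); // commit the whole batch in one go
    }
}
```

Batching a few thousand rows per transaction is usually where the big win is; per-row commits are what make SQLite inserts look slow.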
Have you looked at the SQLite Optimization FAQ? (It's a bit old.)
SQLite performance tuning and optimization on embedded systems
If you have many threads writing to the same database, then you're going to run into concurrency problems with that many transactions per second. SQLite always locks the whole database for writes so only one write transaction can be processed at a time.
An alternative is Oracle Berkeley DB with SQLite. The latest version of Berkeley DB includes a SQLite front end that uses page-level locking instead of database-level locking. This provides much higher numbers of transactions per second when there is a high concurrency requirement.
http://www.oracle.com/technetwork/database/berkeleydb/overview/index.html
It includes the same SQLite.NET provider and is supposed to be a drop-in replacement.
Since your requirements are so specific, you may be better off with something more dedicated, like memcached. This will provide a very high-throughput caching implementation that is a lot more memory efficient than a simple hashtable.
Is there a port of memcache to .Net?
I am designing a database and I would like to normalize it. In one query I will be joining about 30-40 tables. Will this hurt the website's performance if it ever becomes extremely popular? This will be the main query, and it will be getting called 50% of the time. In the other queries I will be joining about two tables.
I have a choice right now to normalize or not to normalize, but if normalization becomes a problem in the future I may have to rewrite 40% of the software, and that could take me a long time. Does normalization really hurt in this case? Should I denormalize now while I have the time?
I quote: "normalize for correctness, denormalize for speed - and only when necessary"
I refer you to: In terms of databases, is "Normalize for correctness, denormalize for performance" a right mantra?
HTH.
When performance is a concern, there are usually better alternatives than denormalization:
Creating appropriate indexes and statistics on the involved tables
Caching
Materialized views (Indexed views in MS SQL Server)
Having a denormalized copy of your tables (used exclusively for the queries that need them), in addition to the normalized tables that are used in most cases (requires writing synchronization code, that could run either as a trigger or a scheduled job depending on the data accuracy you need)
Normalization can hurt performance. However this is no reason to denormalize prematurely.
Start with full normalization and then you'll see if you have any performance problems. At the rate you are describing (1000 updates/inserts per day) I don't think you'll run into problems unless the tables are huge.
And even if you do, there are tons of database optimization options (indexes, prepared stored procedures, materialized views, ...) that you can use.
Maybe I'm missing something here, but if your architecture requires you to join 30 to 40 tables in a single query, and that query is the main use of your site, then you have larger problems.
I agree with the others: don't prematurely optimize your site. However, you should optimize your architecture to account for your main use case. A 40-table join for a query run over 50% of the time is not optimized, IMO.
Don't make early optimizations. Denormalization isn't the only way to speed up a website. Your caching strategy is also quite important and if that query of 30-40 tables is of fairly static data, caching the results may prove to be a better optimization.
Also, take into account the ratio of reads to writes. If you are doing approximately 10 reads for every insert or update, you could say the data is fairly static, and hence you should cache it for some period of time.
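As a sketch of that, assuming the 30-40 table query result can tolerate being a few minutes stale; the cache key, the 5-minute window, and LoadBigQueryResult are placeholders.

```
using System;
using System.Data;
using System.Web;
using System.Web.Caching;

static DataTable GetDashboardData()
{
    const string cacheKey = "dashboard-data";

    var cached = (DataTable)HttpRuntime.Cache[cacheKey];
    if (cached != null)
        return cached; // served from memory, the big join is skipped

    DataTable fresh = LoadBigQueryResult(); // placeholder: runs the expensive query
    HttpRuntime.Cache.Insert(
        cacheKey, fresh,
        null,                          // no cache dependency
        DateTime.Now.AddMinutes(5),    // absolute expiration
        Cache.NoSlidingExpiration);
    return fresh;
}
```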
If you end up denormalizing your schema, your writes will also become more expensive and potentially slow things down as well.
Really analyze your problem before making too many optimizations, and wait to see where the bottlenecks in the system really are; you might end up being surprised by what you actually need to optimize in the first place.