LINQ to SQL - How to make this work with the database faster - C#

I have a problem. My LINQ to SQL code is pushing data to the database at ~1000 rows per second, which is much too slow for me. The objects are not complicated. CPU usage is below 10% and bandwidth is not the bottleneck either.
The 10% is on the client; the server sits at 0%, or at most 1%, and is generally doing almost nothing, not even traversing indexes.
Why is 1000 rows/s slow? I need somewhere around 20,000/s to 200,000/s, otherwise data will arrive faster than I can process it.
I don't open a transaction myself, but LINQ does: when I add, for example, a million new objects to the DataContext and call SubmitChanges(), the inserts run inside LINQ's own internal transaction.
I don't use Parallel LINQ and I don't have many selects; in this scenario I'm mostly inserting objects, and I want to use all the resources I have, not just 5% of the CPU and 10 KB/s of network!

when I add, for example, a million new objects
Forget it. LINQ to SQL is not intended for such large batch updates/inserts.
The problem is that LINQ to SQL executes a separate INSERT (or UPDATE) statement for every row. That kind of behaviour does not scale to numbers like these.
For inserts you should look into SqlBulkCopy, because it is a lot faster (orders of magnitude faster, really).
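For comparison, here is a minimal sketch of the SqlBulkCopy approach (the connection string, the dbo.MyTable destination and its columns are placeholders, not details from the question):

using System.Data;
using System.Data.SqlClient;

class BulkLoader
{
    static void Load(string connectionString)
    {
        // In-memory DataTable whose columns match the destination table.
        var table = new DataTable();
        table.Columns.Add("Id", typeof(int));
        table.Columns.Add("Name", typeof(string));

        for (int i = 0; i < 1000000; i++)
            table.Rows.Add(i, "item " + i);

        using (var connection = new SqlConnection(connectionString))
        {
            connection.Open();
            using (var bulk = new SqlBulkCopy(connection))
            {
                bulk.DestinationTableName = "dbo.MyTable";
                bulk.BatchSize = 10000;       // rows sent to the server per batch
                bulk.BulkCopyTimeout = 0;     // no timeout for very large loads
                bulk.WriteToServer(table);    // one bulk operation instead of a million INSERTs
            }
        }
    }
}

SqlBulkCopy streams the rows through the bulk-load interface instead of issuing one INSERT statement per row, which is where the large speed-up comes from.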

Some performance optimization can be achieved with LINQ to SQL, first of all by using precompiled queries. A large part of the cost is compiling the actual query.
http://www.albahari.com/nutshell/speedinguplinqtosql.aspx
http://msdn.microsoft.com/en-us/library/bb399335.aspx
You can also disable object tracking, which may give you a few more milliseconds of improvement. This is done on the DataContext right after you instantiate it.
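For example, a minimal sketch (MyDataContext and the Orders table are placeholder names):

using (var db = new MyDataContext(connectionString))
{
    // Read-only work: turn off change tracking before running any queries.
    db.ObjectTrackingEnabled = false;

    var rows = db.Orders.Where(o => o.CustomerId == 42).ToList();
}

Note that ObjectTrackingEnabled must be set before the first query executes, and SubmitChanges() is not available once tracking is off.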

I also ran into this problem before. The solution I used was Entity Framework (there is a tutorial here). One traditional way is to use LINQ to Entities, which has similar syntax and integrates seamlessly with C# objects; in my experience this gave me roughly a 10x speed-up. But a way that is more efficient by an order of magnitude is to write the SQL statement yourself and then use the ExecuteStoreQuery function to fetch the results. It requires you to write SQL rather than LINQ statements, but the returned results can still be read from C# easily.
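Roughly like this with ExecuteStoreQuery (a sketch; MyEntities and the Person mapping are placeholder names, and Entity Framework maps the returned columns to properties by name):

using (var context = new MyEntities())
{
    // Hand-written SQL; Entity Framework materializes the rows into Person objects.
    var people = context.ExecuteStoreQuery<Person>(
        "SELECT Id, FirstName, LastName FROM People WHERE LastName = {0}",
        "Smith").ToList();
}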

Related

Using LINQ vs SQL for Filtering Collection

I have a very general question regarding the use of LINQ vs. SQL to filter a collection. Let's say you are running a fairly complex filter on a database table. It's running, say, 10,000 times, and the filters could be different every time. Performance-wise, are you better off loading the entire database table into memory and executing the filters with LINQ, or should you let the database handle the filtering with SQL (since that's what it was built to do)? Any thoughts?
EDIT: I should have been more clear. Let's assume we're talking about a table with 1000 records and 20 columns (containing int/string/date data). Currently in my app I am running one query every half hour to pull all of the data into a collection (saving that collection in the application cache) and filtering that cached collection throughout my app. I'm wondering if that is worse than doing tons of round trips to the database server (it's Oracle, fwiw).
After the update:
It's running, say, 10,000 times, and
I'm going to assume a table with 1000 records
It seems reasonable to assume the 1k records will fit easily in memory.
Running the 10k filters against that in-memory collection (LINQ) will then be much cheaper.
Using SQL for every filter would mean the database reads and returns on the order of 10M records in total, which is a lot of I/O.
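A sketch of that approach (Record and LoadAllRecordsFromDatabase are hypothetical names):

using System;
using System.Collections.Generic;
using System.Linq;

public class RecordCache
{
    // Loaded once (e.g. each time the cache is refreshed), not once per filter.
    private readonly List<Record> _records;

    public RecordCache(IEnumerable<Record> records)
    {
        _records = records.ToList();
    }

    // Each of the 10,000 filters is just a scan over ~1,000 in-memory objects.
    public List<Record> Filter(Func<Record, bool> predicate)
    {
        return _records.Where(predicate).ToList();
    }
}

// var cache = new RecordCache(LoadAllRecordsFromDatabase());  // one round trip
// var smiths = cache.Filter(r => r.LastName == "Smith");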
EDIT
It always depends on the amount of data you have. If you have a large amount of data, go for SQL; if it is small, go for LINQ. It also depends on how frequently the data is fetched from SQL Server: if it is fetched very frequently, it is better to load it into memory once and then apply LINQ, but if not, SQL is better.
First Answer
It is better to filter on the SQL side rather than loading everything into memory and then applying a LINQ filter.
The reason SQL is better than LINQ here:
If you go for LINQ:
all 10,000 records are loaded into memory, which also increases network traffic.
If you go for SQL:
the number of records returned decreases, so less memory is used and network traffic is reduced as well.
Depends on how big your table is and what type of data it stores.
Personally, I'd go with returning all the data if you plan to use all your filters during the same request.
If it's an on-demand filter using Ajax, you could reload the data from the database every time (which also ensures your data is up to date).
This will probably cause some debate on the role of a database! I had this exact problem a little while back: some relatively complex filtering (things like "is in country X, where the price is Y and it has the keyword Z") and it was horrifically slow. On top of this, I was not allowed to change the database structure because it was a third-party database.
I swapped out all of the logic so that the database just returned the results (which I cached every hour) and did the filtering in memory. When I did this, I saw massive performance increases.
I will say that it is far better to let SQL do the complex filtering and the rest of the processing. But why, you may ask?
The main reason is that SQL Server has the index information you have defined and uses those indexes to access the data very quickly. If you load the data and filter it with LINQ, you do not have that index information for fast access, so you lose time reaching the data. You also lose time compiling the LINQ query every time.
You can run a simple test to see this difference for yourself. What test? Create a simple table with a hundred random strings and index the string field. Then search on that string field, once using LINQ and once by querying SQL directly.
Update
My first thought was that SQL keeps the index and gives very quick access to the data you search for, based on your SQL.
Then again, LINQ can also translate this filter to SQL, fetch the data, and then you perform your actions, etc.
So now I think that the actual answer depends on what you do. Running the SQL directly is faster, but how much faster depends on how you actually set up your LINQ.
If you load everything into memory and then use LINQ, you lose the speed of the SQL indexes, you lose memory, and you spend a lot of work moving your data from SQL into memory.
If you fetch the data with LINQ and no further search needs to be made, you still lose on moving all that data into memory, and you lose memory.
It depends on the amount of data you are filtering.
You say the filter runs 10K times and can be different every time. In that case, if you don't have much data in the database, you can load it into a server-side variable.
If you have hundreds of thousands of records in the database, you should not do this; instead, you can create indexes on the database and pre-compiled procedures to fetch the data faster.
You can implement a cache facade in between, which stores the data on the server side on the first request and updates it as required (you can have the cache fill the variable only if the data stays within a limited number of records).
You can measure the time it takes to get data from the database by running some test queries and observing the results. At the same time, measure the response time when the data is stored in memory, calculate the difference, and decide based on that.
There can be many other tricks, but the bottom line is:
You have to measure and decide.

How to maximize performance?

I have a problem which I cannot seem to get around no matter how hard I try.
This company works in market analysis and has pretty large tables (300K - 1M rows) with MANY columns (think 250-300) on which we do some calculations.
I'll try to get straight to the problem:
The problem is the filtering of the data. All databases I've tried so far are way too slow to select the data and return it.
At the moment I am storing the entire table in memory and filtering using dynamic LINQ.
However, while this is quite fast (about 100 ms to filter 250,000 rows), I need better results than this...
Is there any way I can change something in my code (not the data model) which could speed the filtering up?
I have tried using:
DataTable.Select, which is slow.
Dynamic LINQ, which is better, but still too slow.
Normal LINQ (just for testing purposes), which is almost good enough.
Fetching from MySQL and doing the processing later on, which is badass slow.
At the beginning of this project we thought that some high-performance database would be able to handle this, but I tried:
H2 (IKVM)
HSQLDB (compiled ODBC-driver)
CubeSQL
MySQL
SQL
SQLite
...
And they are all very slow to interface with from .NET and slow to get results from.
I have also tried splitting the data into chunks and combining them again at runtime, to make the total amount of data that needs filtering smaller.
Is there any way in this universe I can make this faster?
Thanks in advance!
UPDATE
I just want to add that I did not create the database in question.
To add some figures: if I do a simple select of 2 fields in the database query window (SQLyog) like this (visit_munic_name is indexed):
SELECT key1, key2 FROM table1 WHERE filter1 = filterValue1
It takes 125 milliseconds on 225639 rows.
Why is it so slow? I have tested 2 different boxes.
Obviously something has to change here?
You do not explain what exactly you want to do, or why filtering a lot of rows is important. Why should it matter how fast you can filter 1M rows to get an aggregate if your database can precalculate that aggregate for you? In any case it seems you are using the wrong tools for the job.
On one hand, 1M rows is a small number of rows for most databases. As long as you have the proper indexes, querying shouldn't be a big problem. I suspect that either you do not have indexes on your query columns or you want to perform ad-hoc queries on non-indexed columns.
Furthermore, it doesn't matter which database you use if your data schema is wrong for the job. Analytical applications typically use star schemas to allow much faster queries for a lot more data than you describe.
All databases used for analysis purposes use special data structures which require that you transform your data to a form they like.
For typical relational databases you have to create star schemas that are combined with cubes to precalculate aggregates.
Column databases store data in a columnar format usually combined with compression to achieve fast analytical queries, but they require that you learn to query them in their own language, which may be very different than the SQL language most people are accustomed to.
On the other hand, the way you query (LINQ or DataTable.Select or whatever) has minimal effect on performance. Picking the proper data structure is much more important.
For instance, using a Dictionary<> is much faster than using any of the techniques you mentioned. A dictionary lookup essentially checks for a single value in memory. Executing DataTable.Select without indexes, or using LINQ to DataSets or LINQ to Objects, is essentially the same as scanning all entries of an array or a List<> for a specific value, because that is what all these methods do - scan an entire list sequentially.
The various LINQ providers do not do the job of a database. They do not optimize your queries. They just execute what you tell them to execute. Even doing a binary search on a sorted list is faster than using the generic LINQ providers.
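A small illustration of the difference, assuming an existing List<Person> called people (the type and property names are made up):

using System.Collections.Generic;
using System.Linq;

// LINQ to Objects: scans the whole list on every lookup, O(n) per query.
Person slow = people.FirstOrDefault(p => p.LastName == "Smith");

// Dictionary: build the "index" once, then each lookup is a hash probe, O(1) on average.
Dictionary<string, List<Person>> byLastName =
    people.GroupBy(p => p.LastName)
          .ToDictionary(g => g.Key, g => g.ToList());

List<Person> fast;
byLastName.TryGetValue("Smith", out fast);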
There are various things you can try, depending on what you need to do:
If you are looking for a quick way to slice and dice your data, use an existing product like PowerPivot functionality of Excel 2010. PowerPivot loads and compresses MANY millions of rows in an in-memory columnar format and allows you to query your data just as you would with a Pivot table, and even define joins with other in memory sources.
If you want a more repeatable process you can either create the appropriate star schemas in a relational database or use a columnar database. In either case you will have to write the scripts to load your data in the proper structures.
If you are creating your own application, you really need to investigate the various algorithms and data structures used by other, similar tools for in-memory analysis.

Is there anything faster than SqlDataReader in .NET?

I need to load one column of strings from a table on SQL Server into an array in memory, using C#.
Is there a faster way than opening a SqlDataReader and looping through it?
The table is large and time is critical.
EDIT
I am trying to build a .dll and use it on the server for some operations on the database. But it is too slow for now. If this is the fastest way, then I have to redesign the database. I thought there might be some way to speed things up.
Data Reader
About the fastest access you will get to SQL is with the SqlDataReader.
Profile it
It's worth actually profiling to find where your performance issue is. Usually, where you think the performance issue is turns out to be totally wrong once you've profiled it.
For example it could be:
The time... the query takes to run
The time... the data takes to copy across the network/process boundary
The time... .NET takes to load the data into memory
The time... your code takes to do something with it
Profiling each of these in isolation will give you a better idea of where your bottleneck is. For profiling your code, there is a great article from Microsoft
Cache it
The thing to look at to improve performance is to work out if you need to load all that data every time. Can the list (or part of it) be cached? Take a look at the new System.Runtime.Caching namespace.
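For instance, a minimal sketch using MemoryCache from that namespace (LoadNamesFromDatabase stands in for your expensive SqlDataReader loop):

using System;
using System.Runtime.Caching;

ObjectCache cache = MemoryCache.Default;

string[] names = cache["names"] as string[];
if (names == null)
{
    names = LoadNamesFromDatabase();   // the expensive part now runs only on a cache miss

    cache.Set("names", names, new CacheItemPolicy
    {
        AbsoluteExpiration = DateTimeOffset.Now.AddMinutes(30)   // refresh every half hour
    });
}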
Rewrite as T-SQL
If you're doing purely data operations (as your question suggests), you could rewrite your code which is using the data to be T-SQL and run natively on SQL. This has the potential to be much faster, as you will be working with the data directly and not shifting it about.
If your code has a lot of necessary procedural logic, you can try mixing T-SQL with CLR Integration giving you the benefits of both worlds.
This very much comes down to the complexity (or more procedural nature) of your logic.
If all else fails
If all areas are optimal (or as near as makes no difference) and your design is without fault, I wouldn't even get into micro-optimisation; I'd just throw hardware at it.
What hardware? Try the Reliability and Performance Monitor to find out where the bottleneck is. The most likely culprits for the problem you describe are the HDD or RAM.
If SqlDataReader isn't fast enough, perhaps you should store your stuff somewhere else, such as an (in-memory) cache.
No. It is actually not only the fastest way - it is the ONLY (!) way. All other mechanisms INTERNALLY use a DataReader anyway.
I suspect that SqlDataReader is about as good as you're going to get.
SqlDataReader is the fastest way. Make sure you use the get-by-ordinal methods rather than get-by-column-name, e.g. GetString(1).
Also worthwhile is experimenting with MinPoolSize in the connection string so that there are always some connections in the pool.
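Putting those points together, a minimal reading loop might look like this (connection string, table and column names are placeholders):

using System.Collections.Generic;
using System.Data.SqlClient;

var values = new List<string>();

using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand("SELECT Name FROM dbo.MyTable", connection))
{
    connection.Open();
    using (SqlDataReader reader = command.ExecuteReader())
    {
        while (reader.Read())
        {
            // Ordinal access (column 0) skips the name-to-ordinal lookup
            // that reader["Name"] would perform on every row.
            values.Add(reader.GetString(0));
        }
    }
}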
The SqlDataReader will be the fastest way.
Optimize its use by calling the appropriate GetXxx method, which takes an ordinal as a parameter.
If it is not fast enough, see if you can tweak your query. Put a covering index on the column(s) that you want to retrieve. That way SQL Server only has to read the index and does not have to go to the table itself to retrieve all the required information.
What about transforming one column of rows to one row of columns, and having only one row to read? SqlDataReader has an optimization for reading a single row (System.Data.CommandBehavior.SingleRow argument of ExecuteReader), so maybe it can improve the speed a bit.
I see several advantages:
Single row improvement,
No need to access an array on each iteration (reader[0]),
Cloning an array (reader) to another one may be faster than looping through elements and adding each one to a new array.
On the other hand, it has a disadvantage to force SQL database to do more work.
"Provides a way of reading a forward-only stream of rows from a SQL Server database" This is the use of SqlDataReader from MSDN . The Data structure behind SqlDataReder only allow read forward, it's optimized for reading data in one direction. In my opinion, I want to use SqlDataReader than DataSet for simple data reading.
You have 4 sets of overheads:
- Disk access
- .NET code (CPU)
- SQL Server code (CPU)
- Time to switch between managed and unmanaged code (CPU)
First of all, is
select * from yourtable where column = 'junk'
fast enough for you? If not, the only solution is to make the disk faster (you can pull data out of SQL Server faster than SQL Server can read it off disk).
You may be able to define a SQL Server function in C# (a CLR function) and then run that function over the column; sorry, I don't know the details of how to do it. This may be faster than a data reader.
If you have more than one CPU, and you know a value in the middle of the table, you could try using more than one thread, each reading a different range of rows.
You may be able to write some T-SQL that combines all the strings into a single string using a separator you know is safe, then split the string up again in C# (see the sketch below). This will reduce the number of transitions between managed and unmanaged code.
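A rough sketch of that last idea (it assumes STRING_AGG, which needs SQL Server 2017 or later; older versions would use a FOR XML PATH concatenation instead, and the '|' separator must not occur in the data):

using System.Data.SqlClient;

string[] values;

using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand(
    "SELECT STRING_AGG(CAST(Name AS NVARCHAR(MAX)), '|') FROM dbo.MyTable",
    connection))
{
    connection.Open();

    // One scalar value comes back instead of one row per string,
    // so the managed/unmanaged transition happens only once.
    string combined = (string)command.ExecuteScalar();
    values = combined.Split('|');
}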
Some surface-level things to consider that may affect speed (besides a data-reader):
Database Query Optimization
OrderBy is expensive
Distinct is expensive
RowCount is expensive
GroupBy is expensive
etc. Sometimes you can't live without these things, but if you can handle some of these things in your C# code instead, it may be faster.
Database Table indexing (for starters, are the fields in your WHERE clause indexed?)
Database Table DataTypes (are you using the smallest possible, given the data?)
Why are you converting the datareader to an array?
e.g., would it serve just as well to create an adapter/datatable that you then would not need to convert to an array?
Have you looked into Entity Framework? (might be slower...but if you're out of options, might be worthwhile to look into just to make sure)
Just random thoughts. Not sure what might help in your situation.
If responsiveness is an issue when loading a great deal of data, look at using the asynchronous methods, e.g. SqlCommand.BeginExecuteReader.
I use this all the time for populating large GUI elements in the background while the app continues to be responsive.
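Something along these lines (a sketch; on .NET versions before 4.5 the connection string also needs Asynchronous Processing=true, and the table/column names are placeholders):

using System.Data.SqlClient;

var connection = new SqlConnection(connectionString);
var command = new SqlCommand("SELECT Name FROM dbo.MyTable", connection);
connection.Open();

// The query runs without blocking the UI thread; the callback fires when rows are ready.
command.BeginExecuteReader(asyncResult =>
{
    using (SqlDataReader reader = command.EndExecuteReader(asyncResult))
    {
        while (reader.Read())
        {
            string name = reader.GetString(0);
            // Marshal back to the UI thread (e.g. Dispatcher.Invoke) before touching any controls.
        }
    }
    connection.Close();
}, null);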
You haven't said exactly how large this data is, or why you are loading it all into an array.
Often times, for large amounts of data, you may want to leave it in the database or let the database do the heavy lifting. But we'd need to know what kind of processing you are doing that needs it all in an array at one time.

Improving performance Linq to Sql Compact Edition

I'm writing a WPF client app, using LINQ to SQL with SQL Server Compact Edition.
The DB is relatively small (3 MB) and read-only.
The bottom line is that performance is not as good as I hoped it would be, and I'm looking for tips and practical ways to improve it.
More facts:
The schema contains around a dozen entities with extensive relations between them.
Profiling the app showed that the queries run quite fast, but building the C# entities is the process that takes the most time (up to 8 seconds).
Mostly, I believe, because we have used LoadWith, so the DataContext has no choice but to build the object graph in memory.
I can provide additional information, if needed.
EDIT:
As I mentioned, the DB is read-only, so the DataContext is not tracking changes.
We are making use of statically compiled queries for recurring queries. The problem is when the application is initializing and we prefetch many objects into memory to serve as a cache.
Thanks for your help.
Ariel
Well, you might find that making use of lazy loading (rather than eager loading) helps performance, i.e. avoid using LoadWith, since the entities won't need memory allocated for the relationship chains (or deep loading of the object graph); instead, they will be populated on demand.
However, you'll need to be focused in your design to support this (otherwise you will simply move the performance bottleneck and become overly "chatty" with regard to the SQL statements executed against the SQL CE database).
The DataContext can also start to bloat in memory as it tracks changes. You might need to reconsider how you use data contexts (for instance, you can attach entities to a new context provided the original context has been disposed).
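A sketch of the deferred-loading approach versus LoadWith (MyDataContext, Order and OrderLines are placeholder names):

using System.Data.Linq;
using System.Linq;

using (var db = new MyDataContext(connectionString))
{
    // Eager loading: the whole object graph is materialized up front.
    // var options = new DataLoadOptions();
    // options.LoadWith<Order>(o => o.OrderLines);
    // db.LoadOptions = options;

    // Deferred (lazy) loading is the default: related entities are only
    // fetched when the navigation property is first touched.
    var order = db.Orders.First(o => o.Id == 1);
    var lines = order.OrderLines.ToList();   // a separate query runs here, on demand
}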
A very simple solution is to use statically declared, compiled LINQ queries. This is of course not always practical, but it improves performance because the query only needs to be translated once, the first time it is used, instead of the expression tree being rebuilt and re-translated every time the query is executed.
This might help:
http://msmvps.com/blogs/omar/archive/2008/10/27/solving-common-problems-with-compiled-queries-in-linq-to-sql-for-high-demand-asp-net-websites.aspx
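A sketch of such a statically declared compiled query (MyDataContext and Customer are placeholder names):

using System;
using System.Data.Linq;
using System.Linq;

static class Queries
{
    // Translated to SQL once, on first use, instead of on every execution.
    public static readonly Func<MyDataContext, string, IQueryable<Customer>> CustomersByCity =
        CompiledQuery.Compile((MyDataContext db, string city) =>
            db.Customers.Where(c => c.City == city));
}

// var customers = Queries.CustomersByCity(db, "London").ToList();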

Comparing i4o vs. PLINQ for larger collections

I have a question for anyone who has experience with i4o or PLINQ. I have a big object collection (about 400K items) that I need to query. The logic is very simple and straightforward. For example, given a collection of Person objects, I need to find the persons that match on firstName, lastName, date of birth, or the first initial of firstName/lastName, etc. It is just a time-consuming process using LINQ to Objects.
I am wondering if i4o (http://www.codeplex.com/i4o)
or PLINQ can help improve the query performance. Which one is better? And is there any other approach out there?
Thanks!
With 400k objects, I wonder whether a database (either in-process or out-of-process) wouldn't be a more appropriate answer. This then abstracts the index creation process. In particular, any database will support multiple different indexes over different column(s), making the queries cited all very supportable without having to code specifically for each (just let the query optimizer worry about it).
Working with it in-memory may be valid, but you might (with vanilla .NET) have to do a lot more manual index management. By the sounds of it, i4o would certainly be worth investigating, but I don't have any existing comparison data.
i4o: is meant to speed up querying with LINQ by using indexes, as in the old relational-database days.
PLINQ: is meant to use extra CPU cores to process the query in parallel.
If performance is your target, then depending on your hardware, I say go with i4o; it will make a hell of an improvement.
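For reference, the PLINQ version of such a query is just an AsParallel() call away (people and the property names are placeholders):

using System.Linq;

// Partition the ~400K objects across the available cores and filter in parallel.
var matches = people
    .AsParallel()
    .Where(p => p.LastName == lastName
             && p.FirstName == firstName
             && p.DateOfBirth == dateOfBirth)
    .ToList();

i4o, by contrast, builds an index over the collection so that equality lookups don't have to scan all 400K objects at all.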
I haven't used i4o but I have used PLINQ.
Without knowing the specifics of the query you're trying to improve, it's hard to say which (if any) will help.
PLINQ allows queries to be processed in parallel, where applicable. There are times, however, when parallel processing won't help.
i4o looks like it helps with indexing, which will speed up some calls, but not others.
Bottom line is, it depends on the query being run.
