I have a method that can pass a WHERE clause into a query. This method then returns a DataSet.
What is faster and more efficient? To pass the where clause through so the SQL Server sends back less data (but has to do more work) or let the web server handle it through LINQ?
Edit:
This is assuming that the SQL Server is more powerful than the web server (as probably should be the case).
Are you using straight-up ADO.NET to perform your data access? If so, then yes - use a WHERE clause in your SQL and limit the amount of data sent back to your application.
SQL Server is efficient at this: you can design indexes to help it access the data, and you transmit less data back to your client application.
Imagine you have 20,000 rows in a table, but you are only interested in 100 of them. It is of course much more efficient to grab only those 100 rows at the source and send them back, rather than the whole lot, which you then have to filter in your web application.
You have tagged linq-to-sql; if that's what you're using, then a Where clause in your LINQ statement will generate the corresponding WHERE clause on the SQL Server side.
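As a minimal sketch (the data context, table and column names here are made up for illustration, not taken from the question):

// Hypothetical LINQ to SQL context; 'Orders', 'CustomerId' and 'Total' are invented names.
using (var db = new ShopDataContext())
{
    // db.Orders is IQueryable<Order>, so this Where clause is translated into a
    // SQL WHERE clause and only the matching rows travel over the wire.
    var matching = db.Orders.Where(o => o.CustomerId == 42 && o.Total > 100m);

    foreach (var order in matching)
        Console.WriteLine(order.OrderId);
}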
But as an overall rule of thumb, just get the data you are interested in. It's less data over the wire, the query will generally run faster (as long as it's optimised via indexes etc.), and your client application has less work to do because it already has only the data it's interested in.
SQL Server is good at filtering data; in fact, that's what it's built for, so always make use of that. If you filter in C#, you won't be able to make use of any indexes you have on your tables, and it's going to be far less efficient.
Selecting all rows only to discard many/most of them is wasteful and it will definitely show in the performance.
If you are not using any sort of ORM, then use the WHERE condition at the database level, as I believe filtering should happen at the database level.
But if you're using an ORM like Entity Framework or LINQ to SQL, then from a performance point of view it is the same, as your LINQ Where clause will ultimately be translated into a SQL WHERE clause, as long as you apply the Where clause to an IQueryable.
From the point of view of efficiency, it is SQL Server that should do the job. If it does not require multiple database calls, it is always a better solution to use SQL Server. But if you already have a DataSet from the database, you can filter it with LINQ.
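For that second case, a rough sketch of filtering a DataSet you already hold in memory using LINQ to DataSet (the table and column names are invented; this needs a reference to System.Data.DataSetExtensions):

DataTable customers = ds.Tables["Customers"];   // 'ds' is the DataSet you already have

// AsEnumerable and Field<T> come from LINQ to DataSet
var active = customers.AsEnumerable()
                      .Where(r => r.Field<bool>("IsActive"))
                      .OrderBy(r => r.Field<string>("Name"));

DataTable filtered = active.CopyToDataTable();   // back to a DataTable if needed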
Related
Looking for a piece of advice...
I am analyzing an old C# web method in order to rewrite it as REST. In the old method I observed that one private method is getting called multiple (nearly 25+) times. This private method connects to the database and just runs a simple SELECT statement with a simple WHERE condition, returning a single row.
In my new method, I am planning to select the entire table content (approximately 400 rows), store it on the .NET side, and whenever any data is needed from this result set, query it with LINQ or a lambda. Is this the correct way of storing this dataset for the duration of the main method's execution? Is this the correct approach? How can I optimize it? Can I store this data in a static DataTable?
Well it depends.
Calling a SQL Server SELECT with a WHERE clause 25 times in one method is not a good thing.
Each out-of-process call consumes much more time compared with a local one (ADO.NET connection pooling notwithstanding).
Caching 400 rows is actually not a big deal if it is the same set for each of the 25 calls.
It is small enough to be processed effectively by a C# LINQ query.
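As a rough sketch of that caching approach (the row type and member names below are made up):

// Made-up row type for the small lookup table
class RateRow { public string Code { get; set; } public decimal Rate { get; set; } }

class RateCache
{
    private readonly List<RateRow> _rows;

    // One database round trip loads all ~400 rows up front
    public RateCache(IEnumerable<RateRow> rows) { _rows = rows.ToList(); }

    // The 25+ lookups are then answered in memory with LINQ to Objects
    public decimal GetRate(string code) { return _rows.First(r => r.Code == code).Rate; }
}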
However, what about scalability? SQL Server can handle hundreds of thousands of rows effectively using indexes, specialised caching, etc.
Let's imagine that the database grows and you have 4M rows to be checked in order to get the one row.
I do not think it is fine to have 4M records read by C# and filtered locally in memory.
So you have to check the data and the code that has been written.
The best way is to mix the two approaches into one.
For example, you can check why there are so many DB calls in one method.
Is it possible to reduce the number of DB calls?
Is there any way to do a join or additional calculations on the SQL Server side?
Actually, SQL is much more than a flat table storage tool.
You can create a stored procedure (or two, or three) and implement some logic in it, reducing the number of DB calls while keeping the data on the SQL side at the same time.
For your scenario, the approach that I would recommend would be:
Construct a stored procedure
Now, you may ask, why another stored procedure when you're already issuing a SELECT statement to the database?
Separation of database logic from business logic. Ideally, your .NET code should only consist of business logic.
Unless you are already using a parameterized query in your SELECT statement, a stored procedure offers:
Better protection against SQL Injection Attacks
Lower network traffic (sending Stored Procedure Name vs Dynamic Query)
SQL Server is able to optimize a stored procedure via its cached query plan better than a dynamic query. See: Optimizing Query Plan
Retrieval of records via a stored procedure will perform far better than using foreach/LINQ/lambda in C#. The database is optimized for searching data, and it has more memory and better CPU threading to handle these retrieval operations. This is especially so if you have a proper DB index on your search criteria. We will cover this in the next section.
Reference: Dynamic SQL vs Stored Procedure
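A bare-bones sketch of calling such a stored procedure from .NET with ADO.NET (the procedure, parameter and column names below are invented for illustration):

string connectionString = "...";   // your connection string

using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand("dbo.GetUserById", conn))
{
    cmd.CommandType = CommandType.StoredProcedure;
    cmd.Parameters.Add("@UserId", SqlDbType.Int).Value = 1234;

    conn.Open();
    using (var reader = cmd.ExecuteReader())
    {
        if (reader.Read())
        {
            string name = reader.GetString(reader.GetOrdinal("UserName"));
        }
    }
}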
Create a non-clustered index for your search criteria
If the column you search on is not currently indexed, you may consider creating a non-clustered index on it. This will further improve your retrieval speed.
With these in place, you can support thousands of retrievals from this table without any significant impact on your application's performance.
I'm building a small part for my ASP.NET MVC website that will require the following steps:
Query my SQL Server DB for data from a particular table.
For each data row returned, take that data and run another query (a stored procedure) that returns some values.
Take those values and run a calculation.
Now that I have that calculation, I will need to store it in memory (or not, if you think otherwise) along with some other data items from the first query, then filter and sort. After filtering and sorting, display the results to the user.
What do you recommend for a scenario like this, where you need an in-memory data representation that will have to be manipulated? Should I just stick with DataTable? I found a component called QueryADataSet which allows running queries against .NET DataSets and DataTables - does anyone know it? How about using that kind of solution? Is it recommended?
Would love to hear your thoughts...
Thanks
Change the website to behave as follows:
Send a single query to SQL that uses set operations to apply the calculations to the relevant data and return the result.
This is not a joke, nor irony. The app server is not the proper place for doing 'sort' and 'filter'. Aside from the lack of an adequate toolset, there are issues around consistency/caching behavior. Data manipulation belongs to the back end; this is why you use SQL and not a key-value store. And that's not even mentioning the anti-pattern of 'retrieve a set and then call the DB for each row'.
If the processing cannot be performed on the database server, then LINQ is a good toolset to filter and sort data in the application.
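In that case, a brief sketch of the in-application step might look like this ('results', 'threshold' and the property names are placeholders for whatever your first query and calculation produce):

var display = results
    .Where(r => r.CalculatedValue > threshold)    // filter
    .OrderByDescending(r => r.CalculatedValue)    // sort
    .ThenBy(r => r.Name)
    .ToList();                                    // materialise for display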
I just did the following:
var items =
    from c in Items
    where
        c.Pid == Campaigns.Where(d => d.Campaign_name == "US - Autos.com").First().Pid
        && c.Affid == Affiliates.Where(e => e.Add_code == "CD4729").First().Affid
    select c;
Then I want to update a field for all the results:
items.ToList().ForEach(c=>c.Cost_per_unit=8);
SubmitChanges();
When querying, I know I can use:
GetCommand(items);
To see the SQL that will be executed.
But on submitting changes, I don't know how to do that.
I looked at:
GetChangeSet()
And I see that there are about 18 updates in this case.
QUESTION 1: are there efficiency issues using L2S to update this way?
QUESTION 2 (maybe this should be a separate question but I'll try it here): is there a general way to just monitor the SQL statements that go to SQL Server 2008 R2? I guess I could disable all but TCP for the instance and WireShark the port (if the stuff is even readable), but I'm hoping there's an easier way.
The DataContext has a Log property that you can hook into to dump the executed SQL. There is also Linq To Sql Profiler which is awesome.
When querying, I know I can use GetCommand(items) to see the SQL that will be executed. But on submitting changes, I don't know how to do that.
You may be able to use this:
yourContext.Log = Console.Out;
But I'm not certain if this logs all SQL or just selects.
Your SQL is different for each affected object. L2S will use dependencies to determine the order in which objects must be saved (if the order is important), then will construct SQL insert, update, and delete statements to persist the changes. The generated statements (especially for update) are dependent upon which properties of the object have changed. There is no way in particular to view the entire batch that will be executed.
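That said, you can watch each statement as it executes by hooking the Log property mentioned above, and you can inspect the pending work via GetChangeSet(). A rough sketch ('db' standing in for your DataContext, with System.Data.Linq in scope):

db.Log = Console.Out;   // every SQL statement L2S executes is written here

ChangeSet pending = db.GetChangeSet();
Console.WriteLine("{0} updates, {1} inserts, {2} deletes",
    pending.Updates.Count, pending.Inserts.Count, pending.Deletes.Count);

db.SubmitChanges();     // the individual UPDATE statements now appear in the log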
QUESTION 1: are there efficiency issues using L2S to update this way?
No, this is how any other automated data access layer would perform updates.
QUESTION 2 (maybe this should be a separate question but I'll try it here): is there a general way to just monitor the SQL statements that go to SQL Server 2008 R2? I guess I could disable all but TCP for the instance and WireShark the port, but I'm hoping there's an easier way.
This should be another question, but the answer is to use a trace. While you can trace with any version of SQL Server (including Express), the SQL Server Profiler tool that comes with all versions other than Express makes this very easy to do. If you want more information on this, feel free to ask another question with your specific issues.
Regarding efficiency - of course there are more efficient ways of performing the update than as it exists in your question. For example, a SQL query like this would be much more efficient (there would be no SELECT query executed, no code to pull the data from SQL into a set of objects, no code to perform the update on the objects, no code to determine which objects changed, no code to generate the appropriate SQL statements, and no per-row UPDATE statements run on the SQL Server):
UPDATE Items SET Cost_per_unit = @CostPerUnit
FROM Items
JOIN Campaigns ON ...
JOIN Affiliates ON ...
WHERE ...
But Linq to SQL doesn't provide any way of building such a query. If you are going to be updating thousands of rows in a very simple way similar to your question, you may be better off running a SQL statement like this instead. If there aren't going to be that many rows updated or if the logic is more complicated, then keep it in Linq to SQL.
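If you do want to stay inside the DataContext for a one-off set-based statement, DataContext.ExecuteCommand can run it directly; a sketch mirroring the (still elided) SQL above, with 'db' standing in for your DataContext:

// Joins and WHERE are elided here exactly as in the SQL above
int affected = db.ExecuteCommand(
    @"UPDATE Items SET Cost_per_unit = {0}
      FROM Items
      JOIN Campaigns ON ...
      JOIN Affiliates ON ...
      WHERE ...",
    8);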
What is the benefit of writing a custom LINQ provider over writing a simple class which implements IEnumerable?
For example this question shows Linq2Excel:
var book = new ExcelQueryFactory(@"C:\Users.xls");
var administrators = from x in book.Worksheet<User>()
where x.Role == "Administrator"
select x;
But what is the benefit over the "naive" implementation as IEnumerable?
A Linq provider's purpose is to basically "translate" Linq expression trees (which are built behind the scenes of a query) into the native query language of the data source. In cases where the data is already in memory, you don't need a Linq provider; Linq 2 Objects is fine. However, if you're using Linq to talk to an external data store like a DBMS or a cloud, it's absolutely essential.
The basic premise of any querying structure is that the data source's engine should do as much of the work as possible, and return only the data that is needed by the client. This is because the data source is assumed to know best how to manage the data it stores, and because network transport of data is relatively expensive time-wise, and so should be minimized. Now, in reality, that second part is "return only the data asked for by the client"; the server can't read your program's mind and know what it really needs; it can only give what it's asked for. Here's where an intelligent Linq provider absolutely blows away a "naive" implementation. Using the IQueryable side of Linq, which generates expression trees, a Linq provider can translate the expression tree into, say, a SQL statement that the DBMS will use to return the records the client is asking for in the Linq statement. A naive implementation would require retrieving ALL the records using some broad SQL statement, in order to provide a list of in-memory objects to the client, and then all the work of filtering, grouping, sorting, etc is done by the client.
For example, let's say you were using Linq to get a record from a table in the DB by its primary key. A Linq provider could translate dataSource.Query<MyObject>().Where(x=>x.Id == 1234).FirstOrDefault() into "SELECT TOP 1 * from MyObjectTable WHERE Id = 1234". That returns zero or one records. A "naive" implementation would probably send the server the query "SELECT * FROM MyObjectTable", then use the IEnumerable side of Linq (which works on in-memory classes) to do the filtering. In a statement you expect to produce 0-1 results out of a table with 10 million records, which of these do you think would do the job faster (or even work at all, without running out of memory)?
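The difference is visible right in the method signatures the compiler binds to. A sketch (dataSource and MyObject are the hypothetical names from the example above):

IQueryable<MyObject> remote = dataSource.Query<MyObject>();

// Queryable.Where takes an Expression<Func<MyObject, bool>> - a data structure the
// provider can inspect and translate into "WHERE Id = 1234" on the server.
var fromServer = remote.Where(x => x.Id == 1234).FirstOrDefault();

// Enumerable.Where takes a compiled Func<MyObject, bool> - an opaque delegate, so
// every row has to be pulled into memory before it can be filtered.
var fromMemory = remote.AsEnumerable().Where(x => x.Id == 1234).FirstOrDefault();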
You don't need to write a LINQ provider if you only want to use the LINQ-to-Objects (i.e. foreach-like) functionality for your purpose, which mostly works against in-memory lists.
You do need to write a LINQ provider if you want to analyse the expression tree of a query in order to translate it to something else, like SQL. The ExcelQueryFactory you mentioned seems to work with an OLEDB-Connection for example. This possibly means that it doesn't need to load the whole excel file into memory when querying its data.
In general: performance. If you have some kind of index you can do a query much faster than what is possible on a simple IEnumerable<T>.
Linq-To-Sql is a good example of that. Here you transform the LINQ statement into another form understood by the SQL server. So the server does the filtering, ordering, etc. using its indexes, and doesn't need to send the whole table to the client, which would otherwise have to do that work with LINQ-to-Objects.
But there are simpler cases where it can be useful too:
If you have a tree index over the property Time, then a range query like .Where(x => (x.Time >= now) && (x.Time <= tomorrow)) can be optimized a lot and doesn't need to iterate over every item in the enumerable.
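To make the idea concrete, here is a toy sketch of such a range lookup over a list kept sorted by Time - a binary search finds the lower bound, so only the matching items are touched instead of the whole sequence (the helper and its names are illustrative, not part of any library):

static IEnumerable<T> TimeRange<T>(List<T> sortedByTime, Func<T, DateTime> time,
                                   DateTime from, DateTime to)
{
    // Binary search for the first element with time(item) >= from
    int lo = 0, hi = sortedByTime.Count;
    while (lo < hi)
    {
        int mid = lo + (hi - lo) / 2;
        if (time(sortedByTime[mid]) < from) lo = mid + 1; else hi = mid;
    }

    // Yield items until we pass the upper bound
    for (int i = lo; i < sortedByTime.Count && time(sortedByTime[i]) <= to; i++)
        yield return sortedByTime[i];
}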
LINQ provides deferred execution as much as possible to improve performance.
IEnumerable<> and IQueryable<> lead to quite different implementations: IQueryable produces a native query by building an expression tree dynamically, which gives better performance than IEnumerable.
http://msdn.microsoft.com/en-us/vcsharp/ff963710.aspx
If we are not sure of the type, we can use the var keyword and the compiler will infer the most suitable type.
I have an application that uses DataTables to perform grouping, filtering and aggregation of data. I want to replace the DataTables with my own data structures so we don't have the unnecessary overhead that comes from using DataTables. So my question is: can LINQ be used to perform the grouping, filtering and aggregation of my data, and if it can, is the performance comparable to DataTables, or should I just hunker down and write my own algorithms to do it?
Thanks
Dan R.
Unless you go for simple classes (POCO etc), your own implementation is likely to have nearly as much overhead as DataTable. Personally, I'd look more at using tools like LINQ-to-SQL, Entity Framework, etc. Then you can use either LINQ-to-Objects against local data, or the provider-specific implementation for complex database queries without pulling all the data to the client.
LINQ-to-Objects can do all the things you mention, but it involves having all the data in memory. If you have non-trivial data, a database is recommended. SQL Server Express Edition would be a good starting point if you look at LINQ-to-SQL or Entity Framework.
Edited re comment:
Regular TSQL commands are fine and dandy, but you ask about the difference... the biggest being that LINQ-to-SQL will provide the entire DAL for you, which is a huge time saver, as well as making it possible to get a lot more compile-time safety. But it also allows you to use the same approach to look at your local objects and your database - for example, the following is valid C# 3.0 (except for [someDataSource], see below):
var qry = from row in [someDataSource]
          group row by row.Category into grp
          select new { Category = grp.Key, Count = grp.Count(),
                       TotalValue = grp.Sum(x => x.Value) };

foreach (var x in qry) {
    Console.WriteLine("{0}, {1}, {2}", x.Category, x.Count, x.TotalValue);
}
If [someDataSource] is local data, such as a List<T>, this will execute locally; but if this is from your LINQ-to-SQL data-context, it can build the appropriate TSQL at the database server. This makes it possible to use a single query mechanism in your code (within the bounds of LOLA, of course).
You'd be better off letting your database handle grouping, filtering and aggregation. DataTables are actually relatively good at this sort of thing (their bad reputation seems to come primarily from inappropriate usage), but not as good as an actual database. Moreover, without a lot of work on your part, I would put my money on the DataTable's having better performance than your homegrown data structure.
Why not use a local database like SQL Server CE or embedded Firebird (or even MS Access! :))? Store the data in the local database, do the processing using simple SQL queries and pull the data back. It's much simpler and likely has less overhead, plus you don't have to write all the logic for grouping/aggregates etc., as the database systems already have that logic built in, debugged and working.
Yes, you can use LINQ to do all those things using your custom objects.
And I've noticed a lot of people suggest that you do this type of stuff in the database... but you never indicated where the data was coming from.
If the data is coming from the database then at the very least the filtering should probably happen there, unless you are doing something specialized (e.g. working from a cached set of data). And even then, if you are working with a significant amount of cached data, you might do well to put that data into an embedded database like SQLite, as someone else has already mentioned.