LINQ to SQL - calling built-in aggregate functions (i.e. STDEV) - c#

Is there any way to have LINQ translate queries directly to functions like SQL's STDEV? For example, the LINQ
from t in table
group t by t.something into g
select new {
avg = g.Average(o => o.number)
stdev = g.?????
}
gets turned into a SQL AVG. However, there is no support for standard deviation in LINQ.
One approach was suggested here:
Standard Deviation in LINQ
However, this is clunky and it also requires a bit of additional work to handle possible null values, which the sql STDEV function automatically ignores for you. Additionally, it results in less data being sent over the wire and it's faster to compute.
Any way to do this?

No, there's no way to do it directly through LINQ.
You could create a CLR aggregate which really just wraps the STDEV aggregate in SQL Server, but that would be a ton of excessive work (although it is possible).
However, the option of least resistance is going to be for each query that you want the standard deviation on, you would create a stored procedure which performs the operations and then use LINQ-to-SQL to call that.
Or, if you want to use composition in LINQ-to-SQL, you can create a user-defined table function in SQL server and then access that in LINQ-to-SQL and compose your query using that.

Related

More complex queries in LINQ than SQL

I am new to LINQ queries. I have read/researched about all advantages of LINQ queries over SQL but i have one basic question why do we need to use these queries as i feel their syntax is more complicated than traditional sql queries?
For example look at below example for simple Left Outer Join
var q=(from pd in dataContext.tblProducts
join od in dataContext.tblOrders on pd.ProductID equals od.ProductID into t
from rt in t.DefaultIfEmpty()
orderby pd.ProductID
select new
{
//To handle null values do type casting as int?(NULL int)
//since OrderID is defined NOT NULL in tblOrders
OrderID=(int?)rt.OrderID,
pd.ProductID,
pd.Name,
pd.UnitPrice,
//no need to check for null since it is defined NULL in database
rt.Quantity,
rt.Price,
})
.ToList();
So, the point of LINQ (Language Integrated Queries) is to provide easy ways of working with enumerable collections in executing memory. Contrast to SQL, which is a language for determining what the user gets from a set of data in a database.
Because of the SQL-like syntax, it's easy to confuse LINQ code with SQL, and think that they're 'alike' - they're really not. SQL gets a subset of data from a superset; LINQ is 'syntactic sugar' that hides common operations involving foreach loops.
For instance, this is a common programming pattern:
foreach(Thing thing in things)
{
if(thing.SomeProperty() == "Some Value")
return true;
}
...this is done rather easily in LINQ:
return things.Any(t => t.SomeProperty() == "Some Value");
The two code are functionally the same, and I'm pretty sure even compile to roughly the same IL code. The difference is how it looks to you.
You don't have to use LINQ; you can choose to use a standard foreach, and there are times, such as complex loops, where it is useful to do so. Ultimately it is a question of readability - my counter-question to you is, is the LINQ version of your foreach loop more, or less, readable than the original foreach loop?
If the answer is 'less', then I suggest converting it back to a foreach.
I'm by no means an sql or a linq expert, I use them both.
There is a trend to either make linq into something bad or a silver bullet depending on what side are you.
You need to seriously consider your project requirements in order to choose. The choice is not mutually exclusive. Take what is good from them both .
Advantages
Quick turn around for development
Queries can be dynamically
Tables are automatically created into class
Columns are automatically created into properties
Relationship are automatically appeaded to classes
Lambda expressions are awesome
Data is easy to setup and use
Disadvantages
No clear outline for Tiers
No good way of view permissions
Small data sets will take longer to build the query than execute
There is an overhead for creating queries
When queries are moved from sql to application side, joins are very slow
DBML concurrency issues
Hard to understand advance queries using Expressions
I found that programmers used to Sql will have a hard time figuring out the tricks with LINQ. But programmers with Sql knowledge, but haven't done a ton of work with it, will pick up linq quicker.
The main issue when people start using LINQ is that they keep thinking in the SQL way, they design the SQL query first and then translate it to LINQ. You need to learn how to think in the LINQ way and your LINQ query will become neater and simpler. For instance, in your LINQ you don't need joins. You should use Associations/Navigation Properties instead. Check this post for more details.

Querying with many (~100) search terms with Entity Framework

I need to do a query on my database that might be something like this where there could realistically be 100 or more search terms.
public IQueryable<Address> GetAddressesWithTown(string[] towns)
{
IQueryable<Address> addressQuery = DbContext.Addresses;
addressQuery.Where( x => towns.Any( y=> x.Town == y ) );
return addressQuery;
}
However when it contains more than about 15 terms it throws and exception on execution because the SQL generated is too long.
Can this kind of query be done through Entity Framework?
What other options are there available to complete a query like this?
Sorry, are we talking about THIS EXACT SQL?
In that case it is a very simple "open your eyes thing".
There is a way (contains) to map that string into an IN Clause, that results in ONE sql condition (town in ('','',''))
Let me see whether I get this right:
addressQuery.Where( x => towns.Any( y=> x.Town == y ) );
should be
addressQuery.Where ( x => towns.Contains (x.Town)
The resulting SQL will be a LOT smaller. 100 items is still taxing it - I would dare saying you may have a db or app design issue here and that requires a business side analysis, I have not me this requirement in 20 years I work with databases.
This looks like a scenario where you'd want to use the PredicateBuilder as this will help you create an Or based predicate and construct your dynamic lambda expression.
This is part of a library called LinqKit by Joseph Albahari who created LinqPad.
public IQueryable<Address> GetAddressesWithTown(string[] towns)
{
var predicate = PredicateBuilder.False<Address>();
foreach (string town in towns)
{
string temp = town;
predicate = predicate.Or (p => p.Town.Equals(temp));
}
return DbContext.Addresses.Where (predicate);
}
You've broadly got two options:
You can replace .Any with a .Contains alternative.
You can use plain SQL with table-valued-parameters.
Using .Contains is easier to implement and will help performance because it translated to an inline sql IN clause; so 100 towns shouldn't be a problem. However, it also means that the exact sql depends on the exact number of towns: you're forcing sql-server to recompile the query for each number of towns. These recompilations can be expensive when the query is complex; and they can evict other query plans from the cache as well.
Using table-valued-parameters is the more general solution, but it's more work to implement, particularly because it means you'll need to write the SQL query yourself and cannot rely on the entity framework. (Using ObjectContext.Translate you can still unpack the query results into strongly-typed objects, despite writing sql). Unfortunately, you cannot use the entity framework yet to pass a lot of data to sql server efficiently. The entity framework doesn't support table-valued-parameters, nor temporary tables (it's a commonly requested feature, however).
A bit of TVP sql would look like this select ... from ... join #townTableArg townArg on townArg.town = address.town or select ... from ... where address.town in (select town from #townTableArg).
You probably can work around the EF restriction, but it's not going to be fast and will probably be tricky. A workaround would be to insert your values into some intermediate table, then join with that - that's still 100 inserts, but those are separate statements. If a future version of EF supports batch CUD statements, this might actually work reasonably.
Almost equivalent to table-valued paramters would be to bulk-insert into a temporary table and join with that in your query. Mostly that just means you're table name will start with '#' rather than '#' :-). The temp table has a little more overhead, but you can put indexes on it and in some cases that means the subsequent query will be much faster (for really huge data-quantities).
Unfortunately, using either temporary tables or bulk insert from C# is a hassle. The simplest solution here is to make a DataTable; this can be passed to either. However, datatables are relatively slow; the over might be relevant once you start adding millions of rows. The fastest (general) solution is to implement a custom IDataReader, almost as fast is an IEnumerable<SqlDataRecord>.
By the way, to use a table-valued-parameter, the shape ("type") of the table parameter needs to be declared on the server; if you use a temporary table you'll need to create it too.
Some pointers to get you started:
http://lennilobel.wordpress.com/2009/07/29/sql-server-2008-table-valued-parameters-and-c-custom-iterators-a-match-made-in-heaven/
SqlBulkCopy from a List<>

Passing query data from LINQ to method in same query

I was able to create a LINQ statement that I thought was strange and wanted to see if anyone else had experience with it.
I've simplified it to this:
var x = db.Test
.Where(a => a.Field1 == Utils.CreateHash(Preferences.getValue(a.Field2)))
.FirstOrDefault();
Now how does this translate to database code? Wouldn't LINQ need to do a double query for every single row, i.e. for row a:
1) Query a.Field2
2) Return value to run Utils.CreateHash(Preferences.getValue(a.Field2))
3) Take that value from step 2 and compare it against a.Field1
4) Repeat 1-3 until I've gone through all the rows or returned a matching row
Wouldn't this be extremely inefficient? Or is LINQ smart enough to run this in a better way? Note, I haven't actually run this code so another possibility is a runtime error. Why wouldn't LINQ be smart enough to detect a conflict then and not let me compile it?
The query as is will not work since have a call to Utils.CreateHash in your lambda that you are trying to execute on the DB - in that context you cannot execute that method since there simply is no equivalent on the DB side hence the query will fail.
In general the ability of 3rd party Linq IQuerable providers (e.g. Linq to SQL, Linq to Entities) to access in memory constructs such as methods or classes is very limited, as a rule of thumb at most accessing primitive values or collections of primitives will work.
Just to add fast...
A good example to know how this works would be to write (extreme case I agree, but best :) or go through the source code for a custom (open source) LINQ provider (e.g. http://relinq.codeplex.com/ has one etc.).
Basically (I'm simplifying things here a bit), a LINQ provider can only 'map' to Db (supported SQL, functions) what he 'knows' about.
i.e. it has a standard set it can work with, other than that, and with your custom methods (that do not translate to constants etc.) in the frame, there is no way to resolve that on the 'Db/SQL side'.
E.g. with your 'custom' linq provider (not the case here) you could add a specific extension call e.g. .MyCalc() - which would be properly resolved and translated into SQL equivalent - and then you'd be able to use it.
Other than that, I think if I recall correct, provider will leave that as an expression, to resolve when it returns from the Db 'fetch', query operation. Or complain about it in certain cases.
Linq is based on IQueryable - and you can take a look at extension methods provided there for SQL equivalents supported.
hope this helps
EDIT: whether things 'work' or not doesn't matter - it still doesn't mean it'd execute on the Db context - i.e. it'd be unacceptable performance wise in most cases. IQueryable works with expressions (and if you look at the interface) - and linq is executed when you invoke or enumerate usually. At that point some of the expressions may evaluate to a const value that can be worked into a SQL, but not in your case.
Best way to test is to test back the SQL generated by query (possibly this one I think Translate LINQ to sql statement).
No.
The LINQ provider will run a single SELECT query that selects both fields, then execute your lambda expression with the two values for each returned row.

Linq efficiency - How do I best query a database for a list of values?

Let us say I have a database of Terms and a list of strings, is this a good (efficient) idea? It works smoothly, but I'm not sure it is scalable or the most efficient.
var results =
from t in Terms
join x in Targets on t.Term equals x
select t;
Here Terms is a database table with index table Term. Targets is an IEnumerable of strings. Terms might hold millions, Targets between 10-20 strings. Any thoughts?
Ultimately what matters, as far as efficiency is concerned, is if the query that is executed against the database is efficient. To see this, you can either use SQL Profiler or find an application that will show you SQL generated by linq-to-sql.
If you use SQL Profiler, be sure to have it look for stored procedures, as Linq-to-sql uses the exec_sql procedure to execute queries.
If you need to join two tables on one key, as in your example, there's no other way to express it than an actual join. What you have is as efficient as it CAN get.
However, change the select to return only the fields you're interested in, and make sure you trim them, because sql databases like to return char fields with trailing spaces, and they take time to process and transfer across the network.
Hmm, I didn't know you could join a local collection in like that. Perhaps that's a .Net 4.0 feature?
I have frequently issued queries like this:
IQueryable<Term> query =
from t in Terms
where Targets.Contains(t.Term)
select t;
There's a few caveats.
The variable x must be a List<string> reference. The variable x may not be an IList<string> reference.
Each string in the list is translated into a sql parameter. While linq to sql will happily translate many thousands of strings into parameters (I've seen 50k parameters), Sql Server will only accept ~2100. If you exceed this limit, you'll get a sql exception.
nvarchar vs varchar indexes.

Linq2Sql - Storing Complex Linq Queries for future dynamic execuction - raw text - possible?

I am having a lot of fun with Linq2Sql. Expression Trees have been great, and just the standard Linq2Sql syntax has been a lot of fun.
I am now down to part of my application where I have to somehow store queries in a database, that are custom for different customers that use the same database and same tables (well, view, but you know what I mean). Basically, I cant hard-code anything, and I have to leave the query language clear text so someone can write a new where-clause type query.
So, if that description was harsh, let me clarify:
In a previous version of our application, we used to do direct SQL calls to the db using raw SQL. Yea. it was fun, dirty, and it worked. We would have a database table fulled of different criteria like
(EventType = 6 and Total > 0)
or a subquery style
(EventType = 7
AND Exists (
select *
from events as e1
where events.EventType = e1.EventType
and e1.objectNumber = 89)
)
(sql injection anyone?)
In Linq2Sql, this is a little more challenging. I can make all these queries no problem in the CLR, but being able to pass dynamic where criterias to Linq is a little more challenging, especially if I want to perform a sub query (like the above example).
Some ideas I had:
Get the raw expression, and store it --- but I have no idea how to take the raw text expression and reverse it back to executable to object expression.
Write a SQl like language, and have it parse the code and generate Linq Expression -- wow, that could be a lot of fun
I am quite sure there is no SomeIqueryable.Where("EventType = 6 and Total > 54"). I was reading that it was available in beta1, but I don't see how you can do that now.
var exp2 = context.POSDataEventView.Where("EmployeeNumber == #0", 8310);
This would be the easiest way for me to deploy.. I think.
Store serialized Expressions -- wow.. that would be confusing to a user trying to write a query --- hell, I'm not sure I could even type it all out.
So, I am looking for some ideas on how I can store a query in some kind of clear text, and then execute it against my Linq2Sql objects in some fashion without calling the ExecuteSQL. I want to use the LinqObjects.
P.S. I am using pLinqo for this application if that helps. Its still linq2sql though.
Thanks in advance!
Perhaps the Dynamic LINQ Library (in the MSDN samples) would help?
In particular, usage like:
This should work with any IQueryable<T> source - including LINQ-to-Objects simply by calling .AsQueryable() on the sequence (typically IEnumerable<T>).

Categories