I have been using EF6 for a while and I'm at the point where I need to optimize every single query I run against the DB as much as possible.
There is one spot where I simply need to get a string based on a Guid. It is not exactly a complex query, but I wanted to know what the best practice would be, and why:
a) Find/FindAsync
string senderName = Context.Senders.Find(senderId).Name;
b) Where, Select and FirstOrDefault/FirstOrDefaultAsync
string senderName = Context.Senders.Where(x => x.Id == senderId)
.Select(x => x.Name)
.FirstOrDefault();
I can't profile the SQL being generated right now, but while query a) seems "simpler", query b) appears to use deferred execution (IQueryable), which could be more interesting, especially combined with async execution.
Am I right? What would be the best choice and why?
Using Find is much faster: the query is very simple (select * from t where id=[ID]), it is much cleaner, and there won't be any of the extra checks that happen in EF6; on top of that, EF won't need to parse the LINQ Where and Select statements.
Also, for those people who dislike EF: I have built an ORM library that works like both EF and the old ADO.NET, with migrations, code-to-DB and so on (all of these are of course optional). It is 100% faster in my tests; please check it out: EntityWorker.Core
As pointed out in the comments: a) loads the whole entity into memory while b) loads only the name. If you need the name only, then b is the much better choice.
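For completeness, here is a minimal async sketch of option b). The SQL in the comment is approximate (the exact text depends on the EF version), but it shows why only the Name column travels over the wire:

// Roughly translates to:
//   SELECT TOP (1) [x].[Name] FROM [Senders] AS [x] WHERE [x].[Id] = @p0
string senderName = await Context.Senders
    .Where(x => x.Id == senderId)
    .Select(x => x.Name)
    .FirstOrDefaultAsync();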
I am extremely stuck on getting the right information from the DB. Basically, the problem is that I need to add a Where clause to my statement to make sure it only retrieves the information that is actually needed.
public async Task<IEnumerable<Post>> GetAllPosts(int userId, int pageNumber)
{
    var followersIds = _dataContext.Followees.Where(f => f.CaUserId == userId).AsQueryable();
    pageNumber *= 15;
    var posts = await _dataContext.Posts
        .Include(p => p.CaUser)
        .Include(p => p.CaUser.Photos)
        .Include(c => c.Comments)
        .Where(u => u.CaUserId == followersIds.Id) <=== ERROR
        .Include(l => l.LikeDet).ToListAsync();
    return posts.OrderByDescending(p => p.Created).Take(pageNumber);
}
As you can see, followersIds contains all the required IDs, which I need to match against in the posts query. I have tried with a foreach, but nothing seems to work here. Can somebody help me with this issue?
The short version is that you can change the error line you have marked above to something like .Where(u => followersIds.Contains(u.CaUserId)), which will return all entities with a CaUserId that is contained in the followersIds variable. Note that, as written, followersIds holds Followee entities rather than IDs, so you would first project the IDs, e.g. with .Select(f => f.Id). (You also might need to check the syntax just a bit; I'm shooting from memory without an IDE open.) However, this still has the potential to return a much larger dataset than you will actually need and to be quite a large query. You are including a lot of linked entities in that query above, so maybe you'd be better off using a Select projection instead of a plain Where, which would load only the properties that you need from each entity. A rough sketch follows below.
Take a look at this article from Jon Smith, who wrote the book "Entity Framework Core in Action", where he talks about using Select queries and DTOs to get out only what you need. Chances are, you don't need every property of every entity you are asking for in the query above. (Maybe you do, what do I know :p) Using this might help you get something much more efficient for just the dataset you need: more lines of code in the query, but potentially better performance on the back end and a lighter memory footprint.
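Putting those suggestions together, a rough sketch of the corrected method might look like the following. It is only a sketch: the key property on Followee (f.Id here) is an assumption based on the question's code, and the ordering and paging are pushed into SQL instead of being done in memory.

public async Task<IEnumerable<Post>> GetAllPosts(int userId, int pageNumber)
{
    // Keep this as an IQueryable of IDs so Contains translates to an IN (...) subquery.
    var followersIds = _dataContext.Followees
        .Where(f => f.CaUserId == userId)
        .Select(f => f.Id); // assumption: Id identifies the followed user

    return await _dataContext.Posts
        .Include(p => p.CaUser)
        .Include(p => p.CaUser.Photos)
        .Include(p => p.Comments)
        .Include(p => p.LikeDet)
        .Where(p => followersIds.Contains(p.CaUserId))
        .OrderByDescending(p => p.Created)
        .Take(pageNumber * 15)
        .ToListAsync();
}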
SQL Server is able to translate EF's .First() using its TOP(1) function. But when using Entity Framework's .Last() method, it throws an exception: SQL Server does not recognize such a function, for obvious reasons.
I used to work around it by sorting in descending order and taking the first matching row:
var v = db.Table.OrderByDescending(t => t.ID).FirstOrDefault(t => t.ClientNumber == ClientNumberDetected);
This does it with a single query, but it sorts the whole table (millions of rows) before filtering...
Do I have good reason to think there will be speed issues if I abuse this technique?
I thought of something similar... but it requires two queries:
int maxID_of_Client = db.Table.Where(t => t.ClientNumber == ClientNumberDetected).Max(t => t.ID);
var v = db.Table.First(t => t.ID == maxID_of_Client);
It consists of retrieving the client's max ID, then using that ID to retrieve the client's last row.
It doesn't seem faster to query twice, though...
There must be a way to optimize this and use a single query without sorting millions of rows.
Unless there is something I don't understand, I'm probably not the first to think about this problem, and I want to solve it for good!
Thanks in advance.
The assumption driving this question is that result sets with no ordering clause come back from your DB in any predictable order at all.
In reality, result sets that come back from SQL have no implicit ordering and none should be assumed.
Therefore, the result of
db.Table.FirstOrDefault(t => t.ClientNumber == ClientNumberDetected)
is actually indeterminate.
Whether you're taking first or last, without ordering it's all meaningless anyway.
Now, what goes to SQL when you add an ordering clause to your LINQ? It will be something similar to...
SELECT TOP(1) something FROM somewhere WHERE foo=bar ORDER BY somevalue
or, in the descending/last case
SELECT TOP(1) something FROM somewhere WHERE foo=bar ORDER BY somevalue DESC
From SQL's POV, there's no significant difference here and your DB will be optimized for this sort of query. The index can be scanned in either direction, and the cost of each query above is the same.
TL;DR :
db.Table.OrderByDescending(t => t.ID)
.FirstOrDefault(t => t.ClientNumber == ClientNumberDetected)
is just fine.
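As a side note, FirstOrDefault(predicate) is shorthand for Where(predicate) followed by FirstOrDefault(), so the filter can equally be written ahead of the ordering; EF produces the same TOP(1) ... ORDER BY ... DESC query either way. And with a suitable index (for example on (ClientNumber, ID)), the server seeks directly to the row rather than sorting millions of rows, which addresses the original concern:

var v = db.Table
          .Where(t => t.ClientNumber == ClientNumberDetected)
          .OrderByDescending(t => t.ID)
          .FirstOrDefault();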
I had the following:
List<Message> unreadMessages = this.context.Messages
    .Where(x =>
        x.AncestorMessage.MessageID == ancestorMessageID &&
        x.Read == false &&
        x.SentTo.Id == userID)
    .ToList();

foreach (var unreadMessage in unreadMessages)
{
    unreadMessage.Read = true;
}

this.context.SaveChanges();
But there must be a way of doing this without having to do two SQL queries: one for selecting the items, and one for updating the list.
How do I do this?
Current idiomatic support in EF
As far as I know, there is no direct support for "bulk updates" yet in Entity Framework (there has been an ongoing discussion for bulk operation support for a while though, and it is likely it will be included at some point).
(Why) Do you want to do this?
It is clear that this is an operation that, in native SQL, can be achieved in a single statement, and provides some significant advantages over the approach followed in your question. Using the single SQL statement, only a very small amount of I/O is required between client and DB server, and the statement itself can be completely executed and optimized by the DB server. No need to transfer to and iterate through a potentially large result set client side, just to update one or two fields and send this back the other way.
How
So although not directly supported by EF, it is still possible to do this, using one of two approaches.
Option A. Handcode your SQL update statement
This is a very simple approach that does not require any other tools/packages and can be performed asynchronously as well:
var sql = "UPDATE TABLE x SET FIELDA = #fieldA WHERE FIELDB = #fieldb";
var parameters = new SqlParameter[] { ..., ... };
int result = db.Database.ExecuteSqlCommand(sql, parameters);
or
int result = await db.Database.ExecuteSqlCommandAsync(sql, parameters);
The obvious downside is breaking the nice LINQ-y paradigm and having to handcode your SQL (possibly for more than one target SQL dialect).
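Applied to the question's model, a minimal sketch might look like this (the table and column names (Messages, [Read], SentToId) are assumptions based on the entities in the question; SqlParameter comes from System.Data.SqlClient):

int result = await db.Database.ExecuteSqlCommandAsync(
    "UPDATE Messages SET [Read] = 1 WHERE [Read] = 0 AND SentToId = @userId",
    new SqlParameter("@userId", userID)); // result = number of rows marked as read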
Option B. Use one of the EF extension/utility packages
For a while now, a number of open-source NuGet packages have been available that offer specific extensions to EF. Several of them provide a nice "linqy" way to issue a single UPDATE SQL statement to the server. Two examples are:
Entity Framework Extended Library that allows performing a bulk update using a statement like:
context.Messages.Update(
    x => x.Read == false && x.SentTo.Id == userID,
    x => new Message { Read = true });
It is also available on github
EntityFramework.Utilities that allows performing a bulk update using a statement like:
EFBatchOperation
    .For(context, context.Messages)
    .Where(x => x.Read == false && x.SentTo.Id == userID)
    .Update(x => x.Read, x => x.Read = true);
It is also available on github
And there are definitely other packages and libraries out there that provide similar support.
Even SQL has to do this in two steps in a sense, in that an UPDATE query with a WHERE clause first runs the equivalent of a SELECT behind the scenes, filtering via the WHERE clause, and then applies the update. So really, I don't think you need to be worried about improving this.
Further, the reason it's broken into two steps like this in LINQ is precisely performance: you want that "select" to be as minimal as possible, i.e. you don't want to load any more objects from the database into memory than you have to. Only then do you alter the objects (in the foreach).
If you really want to run a native UPDATE on the SQL side, you could use a System.Data.SqlClient.SqlCommand to issue the update, instead of having LINQ give you back objects that you then update. That will be faster, but then you conceptually move some of your logic out of your C# object-model space and into the database-model space (you are doing things in the database, not in your object space), even if the SqlCommand is issued from your code.
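For reference, a hedged sketch of that SqlCommand route (the connection string and the table/column names are assumptions based on the question's model):

using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand(
    "UPDATE Messages SET [Read] = 1 WHERE [Read] = 0 AND SentToId = @userId", conn))
{
    cmd.Parameters.AddWithValue("@userId", userID);
    conn.Open();
    int rowsAffected = cmd.ExecuteNonQuery(); // one round trip, no entities loaded
}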
I have the following:
var objectives = _objectivesRepository
    .GetAll()
    .Where(o => o.ExamId == examId || examId == 0)
    .Include(o => o.ObjectiveDetails)
    .ToList();
In a previous post, one of the users said that it was important to put the Where before the Include in a LINQ query.
Can someone tell me whether this is correct? Does order matter? What about queries with many Wheres and Includes?
In Entity Framework, yes, it does matter, but only in certain scenarios: when using groupings or projections, it will fail to include the requested data.
See this blog post on the subject.
The actual answer is that usually the order does not matter significantly. Following your example statement, these are the logical steps of the translation into a relational query:
Get all objects, with all their properties (in relational algebra they are considered attributes)
Restrict the retrieved rows based on your condition (the relational algebra selection operation)
Restrict the attributes of the retrieved rows to those that are eagerly loaded (the relational algebra projection operation)
In your specific query, steps 2 and 3 are interchangeable without altering the final outcome. As stated here, this is the default case. Nevertheless, even if the final outcome would not change, the performance could be significantly affected. This is the reason modern databases have query optimizers which create an execution plan to optimize the specific query.
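As a quick illustration using the query from the question: for a plain filter plus eager load, these two orderings return the same rows with the same eager-loaded details, and EF generates essentially the same SQL for both:

// Filter first, then include...
var a = _objectivesRepository.GetAll()
    .Where(o => o.ExamId == examId || examId == 0)
    .Include(o => o.ObjectiveDetails)
    .ToList();

// ...or include first, then filter.
var b = _objectivesRepository.GetAll()
    .Include(o => o.ObjectiveDetails)
    .Where(o => o.ExamId == examId || examId == 0)
    .ToList();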
Nevertheless, this is not always the case, so I suppose you could always find a case where the above does not apply. Regarding performance, no assumptions are safe: you should always measure. You can use SQL Server Profiler to see the translation of your LINQ to Entities query into the final SQL query, and then use the SQL Server tools (like the query analyzer) to inspect the execution plan of that query.
Hope I helped!
I need to do a query on my database that might be something like this, where there could realistically be 100 or more search terms.
public IQueryable<Address> GetAddressesWithTown(string[] towns)
{
    IQueryable<Address> addressQuery = DbContext.Addresses;
    addressQuery = addressQuery.Where(x => towns.Any(y => x.Town == y));
    return addressQuery;
}
However, when it contains more than about 15 terms, it throws an exception on execution because the generated SQL is too long.
Can this kind of query be done through Entity Framework?
What other options are available to complete a query like this?
Sorry, are we talking about THIS EXACT SQL?
In that case it is a very simple "open your eyes" thing.
There is a way (Contains) to map that array into an IN clause, which results in ONE SQL condition (Town IN ('...', '...', '...')).
Let me see whether I get this right:
addressQuery = addressQuery.Where(x => towns.Any(y => x.Town == y));
should be
addressQuery = addressQuery.Where(x => towns.Contains(x.Town));
The resulting SQL will be a LOT smaller. 100 items is still taxing it, though; I would dare say you may have a DB or app design issue here that requires business-side analysis. I have not met this requirement in the 20 years I have worked with databases.
This looks like a scenario where you'd want to use PredicateBuilder, as it will help you create an Or-based predicate and construct your dynamic lambda expression.
It is part of a library called LinqKit by Joseph Albahari, who created LINQPad.
public IQueryable<Address> GetAddressesWithTown(string[] towns)
{
    var predicate = PredicateBuilder.False<Address>();

    foreach (string town in towns)
    {
        string temp = town; // capture the loop variable for the closure
        predicate = predicate.Or(p => p.Town.Equals(temp));
    }

    return DbContext.Addresses.Where(predicate);
}
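One caveat from the LINQKit documentation: because PredicateBuilder combines predicates with expression invocations, Entity Framework may need AsExpandable() on the set so those invocations are inlined before translation:

return DbContext.Addresses.AsExpandable().Where(predicate);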
You've broadly got two options:
You can replace .Any with a .Contains alternative.
You can use plain SQL with table-valued-parameters.
Using .Contains is easier to implement and will help performance because it translates to an inline SQL IN clause, so 100 towns shouldn't be a problem. However, it also means that the exact SQL depends on the exact number of towns: you're forcing SQL Server to recompile the query for each distinct number of towns. These recompilations can be expensive when the query is complex, and they can evict other query plans from the cache as well.
Using table-valued parameters is the more general solution, but it's more work to implement, particularly because it means you'll need to write the SQL query yourself and cannot rely on Entity Framework. (Using ObjectContext.Translate you can still unpack the query results into strongly-typed objects, despite writing SQL.) Unfortunately, you cannot yet use Entity Framework to pass a lot of data to SQL Server efficiently: it supports neither table-valued parameters nor temporary tables (both are commonly requested features, however).
A bit of TVP SQL would look like this: select ... from ... join @townTableArg townArg on townArg.town = address.town, or select ... from ... where address.town in (select town from @townTableArg).
You probably can work around the EF restriction, but it's not going to be fast and will probably be tricky. A workaround would be to insert your values into some intermediate table, then join with that; that's still 100 inserts, but as separate statements. If a future version of EF supports batch CUD statements, this might actually work reasonably.
Almost equivalent to table-valued parameters would be to bulk-insert into a temporary table and join with that in your query. Mostly that just means your table name will start with '#' rather than '@' :-). The temp table has a little more overhead, but you can put indexes on it, and in some cases that means the subsequent query will be much faster (for really huge data quantities).
Unfortunately, using either temporary tables or bulk insert from C# is a hassle. The simplest solution here is to make a DataTable; this can be passed to either. However, DataTables are relatively slow; the overhead might become relevant once you start adding millions of rows. The fastest (general) solution is to implement a custom IDataReader; almost as fast is an IEnumerable<SqlDataRecord>.
By the way, to use a table-valued-parameter, the shape ("type") of the table parameter needs to be declared on the server; if you use a temporary table you'll need to create it too.
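To make the TVP route concrete, here is a minimal sketch for EF6 (the type name dbo.TownTableType, the column size, and the query text are all assumptions for illustration; the table type must be created on the server beforehand):

// Server-side, created once:
//   CREATE TYPE dbo.TownTableType AS TABLE (Town nvarchar(100));
public List<Address> GetAddressesWithTown(string[] towns)
{
    // Build an in-memory table matching the server-side type (System.Data).
    var townTable = new DataTable();
    townTable.Columns.Add("Town", typeof(string));
    foreach (var town in towns)
        townTable.Rows.Add(town);

    // A structured parameter carries all towns in a single round trip (System.Data.SqlClient).
    var townParam = new SqlParameter("@towns", SqlDbType.Structured)
    {
        TypeName = "dbo.TownTableType",
        Value = townTable
    };

    // Raw SQL join against the TVP; EF only materializes the Address results.
    return DbContext.Database
        .SqlQuery<Address>(
            "SELECT a.* FROM Addresses a JOIN @towns t ON t.Town = a.Town",
            townParam)
        .ToList();
}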
Some pointers to get you started:
http://lennilobel.wordpress.com/2009/07/29/sql-server-2008-table-valued-parameters-and-c-custom-iterators-a-match-made-in-heaven/
SqlBulkCopy from a List<>