Proper Implementation of IQueryable as a Member Variable on a Model? - c#

I'm new to the world of .Net, ASP, Entity Framework, and Linq, so bear with me...
I originally had a model set up like the following;
public class Pad
{
[Key]
[DatabaseGenerated(DatabaseGeneratedOption.Identity)]
public Guid PadId { get; set; }
public string StreetAddress { get; set; }
public int ZipCode { get; set; }
public virtual ICollection<Mate> Mates { get; set; }
public virtual ICollection<Message> Messages { get; set; }
}
(A pad is a chat room - it contains many Mates and many thousands of Messages)
In a Web API controller, I have a function designed to get the 25 most recent messages from the specified Pad.
public IHttpActionResult GetMessages(string id)
{
var padGuid = new Guid(id);
// Try to find the pad referenced by the passed ID
var pads = (from p in db.Pads
where p.PadId == padGuid
select p);
if (pads.Count() <= 0)
{
return NotFound();
}
var pad = pads.First();
// Grab the last 25 messages in this pad.
// PERFORMANCE PROBLEM
var messages = pad.Messages.OrderBy(c => c.SendTime).Skip(Math.Max(0, pad.Messages.Count() - 25));
var messagesmodel = from m in messages
select m.toViewModel();
return Ok(messagesmodel);
}
The problem with this implementation is that it seems as though EF is loading the entire set of messages (multiple thousands) into memory before getting the count, ordering, etc. Resulting in a massive performance penalty in Pads with a ton of messages.
My first thought was to convert the Pad.Messages member type to an IQueryable instead of an ICollection - this should defer the Linq queries to SQL, or so I thought. Upon doing this, however, functions like pad.Messages.Count() above complain - turns out that pad.Messages is a null value! And it breaks in other places, such as adding new Messages to the Pad.Messages value.
What is the proper implementation of something like this? In other places, I've seen the recommended solution is constructing a second query against the context such as select Messages where PadId = n, but this hardly seems intuitive when I can have a Messages member value to work with.
Thank you!

var messages = db.Pads.Where(p => p.PadId == padGuid)
.SelectMany(pad => p.Messages.OrderBy(c => c.SendTime)
.Skip(Math.Max(0, pad.Messages.Count() - 25)));

How do you plan to count the number of results in a DB query without actually executing the DB query?
How do you plan to get the first item in the query without actually executing the query?
You cannot do either; both must execute the query.

Related

Entity Framework core select causes too many queries

I have the following method which is meant to build me up a single object instance, where its properties are built via recursively calling the same method:
public ChannelObjectModel GetChannelObject(Guid id, Guid crmId)
{
var result = (from channelObject in _channelObjectRepository.Get(x => x.Id == id)
select new ChannelObjectModel
{
Id = channelObject.Id,
Name = channelObject.Name,
ChannelId = channelObject.ChannelId,
ParentObjectId = channelObject.ParentObjectId,
TypeId = channelObject.TypeId,
ChannelObjectType = channelObject.ChannelObjectTypeId.HasValue ? GetChannelObject(channelObject.ChannelObjectTypeId.Value, crmId) : null,
ChannelObjectSearchType = channelObject.ChannelObjectSearchTypeId.HasValue ? GetChannelObject(channelObject.ChannelObjectSearchTypeId.Value, crmId) : null,
ChannelObjectSupportingObject = channelObject.ChannelObjectSupportingObjectId.HasValue ? GetChannelObject(channelObject.ChannelObjectSupportingObjectId.Value, crmId) : null,
Mapping = _channelObjectMappingRepository.Get().Where(mapping => mapping.ChannelObjectId == channelObject.Id && mapping.CrmId == crmId).Select(mapping => new ChannelObjectMappingModel
{
CrmObjectId = mapping.CrmObjectId
}).ToList(),
Fields = _channelObjectRepository.Get().Where(x => x.ParentObjectId == id).Select(field => GetChannelObject(field.Id, crmId)).ToList()
}
);
return result.First();
}
public class ChannelObjectModel
{
public ChannelObjectModel()
{
Mapping = new List<ChannelObjectMappingModel>();
Fields = new List<ChannelObjectModel>();
}
public Guid Id { get; set; }
public Guid ChannelId { get; set; }
public string Name { get; set; }
public List<ChannelObjectMappingModel> Mapping { get; set; }
public int TypeId { get; set; }
public Guid? ParentObjectId { get; set; }
public ChannelObjectModel ParentObject { get; set; }
public List<ChannelObjectModel> Fields { get; set; }
public Guid? ChannelObjectTypeId { get; set; }
public ChannelObjectModel ChannelObjectType { get; set; }
public Guid? ChannelObjectSearchTypeId { get; set; }
public ChannelObjectModel ChannelObjectSearchType { get; set; }
public Guid? ChannelObjectSupportingObjectId { get; set; }
public ChannelObjectModel ChannelObjectSupportingObject { get; set; }
}
this is connecting to a SQL database using Entity Framework Core 2.1.1
Whilst it technically works, it causes loads of database queries to be made - I realise its because of the ToList() and First() etc. calls.
However because of the nature of the object, I can make one huge IQueryable<anonymous> object with a from.... select new {...} and call First on it, but the code was over 300 lines long going just 5 tiers deep in the hierarchy, so I am trying to replace it with something like the code above, which is much cleaner, albeit much slower..
ChannelObjectType, ChannelObjectSearchType, ChannelObjectSupportingObject
Are all ChannelObjectModel instances and Fields is a list of ChannelObjectModel instances.
The query takes about 30 seconds to execute currently, which is far too slow and it is on a small localhost database too, so it will only get worse with a larger number of db records, and generates a lot of database calls when I run it.
The 300+ lines code generates a lot less queries and is reasonably quick, but is obviously horrible, horrible code (which I didn't write!)
Can anyone suggest a way I can recursively build up an object in a similar way to the above method, but drastically cut the number of database calls so it's quicker?
I work with EF6, not Core, but as far as I know, same things apply here.
First of all, move this function to your repository, so that all calls share the DbContext instance.
Secondly, use Include on your DbSet on properties to eager load them:
ctx.DbSet<ChannelObjectModel>()
.Include(x => x.Fields)
.Include(x => x.Mapping)
.Include(x => x.ParentObject)
...
Good practice is to make this a function of context (or extension method) called for example BuildChannelObject() and it should return the IQueryable - just the includes.
Then you can start the recursive part:
public ChannelObjectModel GetChannelObjectModel(Guid id)
{
var set = ctx.BuildChannelObject(); // ctx is this
var channelModel = set.FirstOrDefault(x => x.Id == id); // this loads the first level
LoadRecursive(channelModel, set);
return channelModel;
}
private void LoadRecursive(ChannelObjectModel c, IQueryable<ChannelObjectModel> set)
{
if(c == null)
return; // recursion end condition
c.ParentObject = set.FirstOrDefault(x => x.Id == c?.ParentObject.Id);
// all other properties
LoadRecursive(c.ParentObject, set);
// all other properties
}
If all this code uses the same instance of DbContext, it should be quite fast. If not, you can use another trick:
ctx.DbSet<ChannelObjectModel>().BuildChannelObjectModel().Load();
This loads all objects to memory cache of your DbContext. Unfortunately, it dies with context instance, but it makes those recursive calls much faster, since no database trip is made.
If this is still to slow, you can add AsNoTracking() as last instruction of BuildChannelObjectModel().
If this is still to slow, just implement application wide memory cache of those objects and use that instead of querying database everytime - this works great if your app is a service that can have long startup, but then work fast.
Whole another approach is to enable lazy loading by marking navigation properties as virtual - but remember that returned type will be derived type anonymous proxy, not your original ChannelObjectModel! Also, properties will load only as long you don't dispose the context - after that you get an exception. To load all properties with the context and then return complete object is also a little bit tricky - easiest (but not the best!) way to do it to serialize the object to JSON (remember about circural references) before returning it.
If that does not satisfy you, switch to nHibernate which I hear has application wide cache by default.

How do you make EntityFramework generate efficient SQL queries for related objects?

I am trying to work out how to use the .NET EntityFramework to generate both readable and natural code and efficient SQL query statements when fetching related entities. For example, given the following code-first definition
public class WidgetContext : DbContext
{
public DbSet<Widget> Widgets { get; set; }
public DbSet<Gizmo> Gizmos { get; set; }
}
public class Widget
{
public virtual int Id { get; set; }
[Index]
[MaxLength(512)]
public virtual string Name { get; set; }
public virtual ICollection<Gizmo> Gizmos { get; set; }
}
public class Gizmo
{
public virtual long Id { get; set; }
[Index]
[MaxLength(512)]
public virtual string Name { get; set; }
public virtual Widget Widget { get; set; }
public virtual int WidgetId { get; set; }
}
I want to be able to write code like
using (var wc = new WidgetContext())
{
var widget = wc.Widgets.First(x => x.Id == 123);
var gizmo = widget.Gizmos.First(x => x.Name == "gizmo 99");
}
and see a SQL query created along the lines of
SELECT TOP (1) * from Gizmos WHERE WidgetId = 123 AND Name = 'gizmo 99'
So that the work of picking the right Gizmo is performed by the database. This is important because in my use case each Widget could have thousands of related Gizmos and in a particular request I only need to retrieve one at a time. Unfortunately the code above causes the EntityFramework to create SQL like this instead
SELECT * from Gizmos WHERE WidgetId = 123
The match on Gizmo.Name is then being performed in memory by scanning the complete set of related Gizmo entities.
After a good deal of experimentation, I have found ways of creating the efficient SQL use I am looking for in the entity framework, but only by using ugly code which is much less natural to write. The example below illustrates this.
using System.Data.Entity;
using System.Data.Entity.Core.Objects.DataClasses;
using System.Linq;
static void Main(string[] args)
{
Database.SetInitializer(new DropCreateDatabaseAlways<WidgetContext>());
using (var wc = new WidgetContext())
{
var widget = new Widget() { Name = "my widget"};
wc.Widgets.Add(widget);
wc.SaveChanges();
}
using (var wc = new WidgetContext())
{
var widget = wc.Widgets.First();
for (int i = 0; i < 1000; i++)
widget.Gizmos.Add(new Gizmo() { Name = string.Format("gizmo {0}", i) });
wc.SaveChanges();
}
using (var wc = new WidgetContext())
{
wc.Database.Log = Console.WriteLine;
var widget = wc.Widgets.First();
Console.WriteLine("=====> Query 1");
// queries all gizmos associated with the widget and then runs the 'First' query in memory. Nice code, ugly database usage
var g1 = widget.Gizmos.First(x => x.Name == "gizmo 99");
Console.WriteLine("=====> Query 2");
// queries on the DB with two terms in the WHERE clause - only pulls one record, good SQL, ugly code
var g2 = ((EntityCollection<Gizmo>) widget.Gizmos).CreateSourceQuery().First(x => x.Name == "gizmo 99");
Console.WriteLine("=====> Query 3");
// queries on the DB with two terms in the WHERE clause - only pulls one record, good SQL, ugly code
var g3 = wc.Gizmos.First(x => x.Name == "gizmo 99" && x.WidgetId == widget.Id);
Console.WriteLine("=====> Query 4");
// queries on the DB with two terms in the WHERE clause - only pulls one record, also good SQL, ugly code
var g4 = wc.Entry(widget).Collection(x => x.Gizmos).Query().First(x => x.Name == "gizmo 99");
}
Console.ReadLine();
}
Query 1 demonstrates the 'fetch everything and filter' approach that is generated by the natural usage of the entity objects.
Queries 2,3 and 4 above all generate what I would consider to be an efficient SQL query - one that returns a single row and has two terms in the WHERE clause, but they all involve very stilted C# code.
Does anyone have a solution that will allow natural C# code to be written and generate efficient SQL utilization in this case?
I should note that I have tried replacing ICollection with EntityCollection in my Widget object to allow the cast to be removed from the Query 2 code above. Unfortunately this leads to an EntityException telling me that
The object could not be added to the EntityCollection or
EntityReference. An object that is attached to an ObjectContext cannot
be added to an EntityCollection or EntityReference that is not
associated with a source object.
when I try to retrieve any related objects.
Any suggestions appreciated.
Ok, further digging has let me get as close as I think is possible to where I want to be (which, to reiterate, is code that looks OO but generates efficient DB usage patterns).
It turns out that Query2 above (casting the related collection to an EntityCollection) actually isn't a good solution, since although it generates the desired query type against the database, the mere act of fetching the Gizmos collection from the widget is enough to make the entity framework go off to the database and fetch all related Gizmos - i.e. performing the query that I am trying to avoid.
However, it's possible to get the EntityCollection for a relationship without calling the getter of the collection property, as described here http://blogs.msdn.com/b/alexj/archive/2009/06/08/tip-24-how-to-get-the-objectcontext-from-an-entity.aspx. This approach sidesteps the entity framework fetching related entities when you access the Gizmos collection property.
So, an additional read-only property on the Widget can be added like this
public IQueryable<Gizmo> GizmosQuery
{
get
{
var relationshipManager = ((IEntityWithRelationships)this).RelationshipManager;
return (IQueryable<Gizmo>) relationshipManager.GetAllRelatedEnds().First( x => x is EntityCollection<Gizmo>).CreateSourceQuery();
}
}
and then the calling code can look like this
var g1 = widget.GizmosQuery.First(x => x.Name == "gizmo 99");
This approach generates SQL that efficiently fetches only a single row from the database, but depends on the following conditions holding true
Only one relationship from the source to the target type. Having multiple relationships linking a Widget to Gizmos would mean a more complicated predicate would be needed in the .First() call in GizmosQuery.
Proxy creation is enabled for the DbContext and the Widget class is eligible for proxy generation (https://msdn.microsoft.com/en-us/library/vstudio/dd468057%28v=vs.100%29.aspx)
The GizmosQuery property must not be called on objects that are newly created using new Widget() since these will not be proxies and will not implement IEntityWithRelationships. New objects that are valid proxies can be created using wc.Widgets.Create() instead if necessary.

Class with multiple List Properties of same type but with different restrictions

Here's my problem: I have a class that have 2 list properties of the same class type (but with some different restriction as on how to be filled), let's say:
public class Team
{
[Key]
public int IDTeam { get; set; }
public string TeamName { get; set; }
public List<Programmer> Members { get; set; }
public List<Programmer> Leaders { get; set; }
public LoadLists(MyProjectDBContext db)
{
this.Members = db.Programmers.Where(p => p.IDTeam = this.IDTeam
&& (p.Experience == "" || p.Experience == null)).ToList();
this.Leaders = db.Programmers.Where(p => p.IDTeam = this.IDTeam
&& (p.Experience != null && p.Experience != "")).ToList();
}
}
public class Programmer
{
[Key]
public int IDProgrammer { get; set; }
[ForeignKey("Team")]
public int IDTeam { get; set; }
public virtual Team Team { get; set; }
public string Name { get; set; }
public string Experience { get; set; }
}
At some point, I need to take a list of Teams, with it's members and leaders, and for this I would assume something like:
return db.Teams
.Include(m => m.Members.Where(p => p.Experience == "" || p.Experience == null)
.Include(l => l.Leaders.Where(p => p.Experience != null && p.Experience != "")
.OrderBy(t => t.TeamName)
.ToList();
And, of course, in this case I would be assuming it wrong (cause it's not working at all).
Any ideas on how to achieve that?
EDIT: To clarify a bit more, the 2 list properties of the team class should be filled according to:
1 - Members attribute - Should include all related proggramers with no experience (proggramer.Experience == null or "");
2 - Leaders attribute - Should include all related proggramers with any experience (programmer.Experiente != null nor "");
EDIT 2: Here's the MyProjectDbContext declaration:
public class MyProjectDBContext : DbContext
{
public DbSet<Team> Teams { get; set; }
public DbSet<Programmer> Programmers { get; set; }
}
You are talking about EntityFramework (Linq to entities) right? If so, Include() is a Method of Linq To Entities to include a sub-relation in the result set. I think you should place the Where() outside of the Inlcude().
On this topic you'll find some examples on how to use the Include() method.
So I suggest to add the Include()'s first to include the relations "Members" and "Leaders" and then apply your Where-Statement (can be done with one Where()).
return db.Teams
.Include("Team.Members")
.Include("Team.Leaders")
.Where(t => string.IsNullOrWhitespace(t.Members.Experience) ... )
What is unclear to me is your where criteria and your use-case at all as you are talking of getting a list of Teams with Leaders and Members. May above example will return a list of Teams that match the Where() statement. You can look though it and within that loop you can list its members and leaders - if that is the use-case.
An alternative is something like this:
return db.Members
.Where(m => string.IsNullOrWhitespace(m.Experience))
.GroupBy(m => m.Team)
This get you a list of members with no experience grouped by Team. You can loop the groups (Teams) and within on its members. If you like to get each team only once you can add a Distinct(m => m.Team) at the end.
Hope this helps. If you need some more detailed code samples it would help to understand your requirements better. So maybe you can say a few more words on what you expect from the query.
Update:
Just read our edits which sound interesting. I don't think you can do this all in one Linq-To-Entities statement. Personally I would do that on the getters of the properties Members and Leaders which do their own query (as a read-only property). To get performance for huge data amount I would even do it with SQL-views on the DB itself. But this depends a little on the context the "Members" and "Leaders" are used (high frequent etc).
Update 2:
Using a single query to get a table of teams with sublists for members and leaders I would do a query on "Programmers" and group them nested by Team and Experience. The result is then a list of groups (=Teams) with Groups (Experienced/Non-experience) with Programmers in it. The final table then can be build with three nested foreach-Statements. See here for some grouping examples (see the example "GroupBy - Nested").
Whenever you fetch entities, they will be stored in the context -- regardless of the form they are "selected" in. That means you can fetch the teams along with all the necessary related entities into an anonymous type, like this:
var teams =
(from team in db.Teams
select new {
team,
relatedProgrammers = team.Programmers.Where(
[query that gets all leaders OR members])
}).ToList().Select(x => x.team);
It looks like we're throwing away the relatedProgrammers field here, but those Programmer entities are still in memory. So, when you execute this:
foreach (var team in teams) team.LoadLists(db);
...it will populate the lists from the programmers that were already fetched, without querying the database again (assuming db is the same context instance as above).
Note: I haven't tested this myself. It's based on a similar technique shown in this answer.
EDIT - Actually, it looks like your "leaders" and "members" cover all programmers associated with a team, so you should be able to just do Teams.Include(t => t.Programmers) and then LoadLists.

Best way to query using EF

Using LINQ, I am having trouble querying my DbContext in an efficient way.
The database contains 700,000 over entities which have a date and a name and other information.
In my code, I have a new list of objects (which can potentially have 100,000 elements) coming in and I would like to query my database and deduct which information are new entity or which information are existing entities that needs to be updated.
I would like to do it in a very efficient way (with a single query if possible).
This is my code :
public class MyDbContext : DbContext
{
public DbSet<MyEntity> MyEntities { get; set; }
}
public class MyEntity
{
[Key]
public Guid Id { get; set; }
public DateTime Date { get; set; }
public string Name { get; set; }
public double Amount { get; set; }
public string Description { get; set; }
}
public class IncomingInfo
{
public DateTime Date { get; set; }
public string Name { get; set; }
public double Amount { get; set; }
}
public class Modifier
{
public void AddOrUpdate(IList<IncomingInfo> info)
{
using (var context = new MyDbContext())
{
//Find the new information
//to add as new entities
IEnumerable<MyEntity> EntitiesToAdd = ??
//Find the information
//to update in existing entities
IEnumerable<MyEntity> EntitiesToUpdate = ??
}
}
}
Can someone help me constructing my query?
Thank you very much.
Edit :
Sorry I forgot to explain how do I consider two entities equal.
There are equal if the Date and the Name property are identical.
I first tried to build a predicate using LinqKit PredicateBuilder without much success (encountered the error of parameter too large, had to make multiple queries which took time).
So far the most successful way I found was to implement a LEFT OUTER join and join the incoming list to the DbSet
Which I implemented this way :
var values = info.GroupJoin(context.MyEntities,
inf => inf.Name + inf.Date.ToString(),
ent => ent.Name + ent.Date.ToString(),
(inf, ents) => new { Info = inf, Entities = ents })
.SelectMany(i => i.Entities.DefaultIfEmpty(),
(i, ent) => new { i.Info.Name, i.Info.Amount, i.Info.Date, ToBeAdded = ent == null ? true : false });
IEnumerable<MyEntity> EntitiesToAdd = values.Where(i => i.ToBeAdded)
.Select(i => new MyEntity
{
Id = Guid.NewGuid(),
Amount = i.Amount,
Date = i.Date,
Name = i.Name,
Description = null
}).ToList();
My test contains 700,000 entities in database. The incoming info list contains 70,000 items; where 50,000 are existing entities and 20,000 are new entities.
This query takes around 15 seconds to execute which does not seem right to me.
Hopefully this is enough to ask for help. Can someone help me one this ?
Thank you very much.
I read the pastebin response from #Leniency and it covers some of the same stuff I was going to say, like querying a date range and comparing on there. The problem with that method though is that (depending on how those dates are set) it might return all 700K+ records in the database, which would give you the absolute worst performance.
My suggestion is that you analyze your network topology to see how expensive your calls to the database really are. I'm assuming this is running on a (web) server which is receiving these IncomingInfo objects from clients. If this server is closely connected to your database server (or on the same machine) then you might be better off not optimizing your calls to the database.
Also, if you have control over the behavior of the clients, you might want to force them to send only like 25 to 100 records with each request. This would make it so that you could deal with them in much more manageable chunks. The client might have to send 100 or more requests to the server (which you could do async so that they get sent ~5 at a time, depending on expected load profiles), but at least it wouldn't be sitting there for 5+ minutes waiting to get a response back from the server for a single request.
BTW, the GroupJoin call that you said took 15 seconds probably is having to download all 700K records before doing the join. You see, joins can't be done on objects that don't exist on the same machine, it either has to send all the IncomingInfo objects (or at least the Name+Date.ToString() concatenations) to the database, or it has to request all the records from the database before any joins can be done. You would probably have to look at the SQL that is being sent to the database in order to tell which method is being used. But you would probably find that querying the database for matches one at a time would probably be faster than the join in this case.
Hope that helps! ;)

Fluent NHibernate - 'failed to lazily initialize a collection' - Querying Over a Collection

I have a self-referential model class:
public class Word
{
public virtual int Id { get; set; }
public virtual string Text { get; set; }
public virtual IList<Word> Synonyms { get; set; }
public virtual int Extra { get; set; }
}
I am trying to query for all synonyms of a word where Extra is 1 and returning the list of words in JSON format in my MVC 3 app:
[HttpPost]
public JsonResult Synonyms(string wordText)
{
using (var session = ...)
{
using (var tx = session.BeginTransaction())
{
var word = session.QueryOver<Word>()
.Where(w => w.Text == wordText)
.SingleOrDefault();
var results = new SynonymsResults()
{
Words = word.Synonyms
.Where(x => x.Extra == 1)
.Select(x => x.Text)
};
return Json(results);
}
}
}
I'm getting an error that it fails to lazily initialize the collection. I'm not sure why though, since I am still in the same session here and even using a transaction.
The result executes much later, after the action has finished running and outside of the session. The fact that you have returned Json(results) doesn't mean that these results will be immediately serialized into JSON. The action will first finish executing, then the ASP.NET MVC pipeline will handle the execution to the result (OnResultExecuting) and it is at this point that the JavaScriptSerializer will touch the collection. At that point of time sessions and transactions are long gone.
So either instruct your ORM to eagerly fetch the dependent collections or even better take a look at the following series of blog posts and use view models.
To get rid of the error, install Nuget Pacakage Manager Newton.JSON and map to the appropriate project and decorate the property with [JsonIgnore], this will skip the serialization issues and you won't get the error.

Categories