Performance issue loading data from CRM


Currently our website is facing a problem with slow response times (more than 1 min) when we query CRM from our website. We are using CRM 2011 through a web service. When we investigated, we found that the time was spent at the point of querying CRM.
We used CrmSvcUtil.exe to generate proxy classes that map to CRM entities. We then create an instance of the context and query CRM using LINQ in C#.
When we query, we load the parent object with LINQ to CRM and then use LoadProperty to load the related children.
I would like to know whether anyone out there is using a different method of querying CRM, and whether you have come across issues like this in your implementation.
I’ve included a simplified sample query below.
public void SelectEventById(Guid id)
{
    var crmEventDelivery = this.ServiceContext.EventDeliverySet.FirstOrDefault(eventDelivery => eventDelivery.Id == id);
    if (crmEventDelivery != null)
    {
        this.SelectCrmEventDeliveryWithRelationships(crmEventDelivery);
    }
}

private void SelectCrmEventDeliveryWithRelationships(EventDelivery crmEventDelivery)
{
    // Load the list of Venue Deliveries on the parent crmEventDelivery that was passed in
    this.ServiceContext.LoadProperty(crmEventDelivery, Attributes.EventDelivery.eventdelivery_venuedelivery);
    foreach (var venueDelivery in crmEventDelivery.eventdelivery_venuedelivery)
    {
        // Load the Venue on each Venue Delivery
        ServiceContext.LoadProperty(venueDelivery, Attributes.VenueDelivery.venue_venuedelivery);
    }
    // Load the list of Session Deliveries on the parent crmEventDelivery that was passed in
    this.ServiceContext.LoadProperty(crmEventDelivery, Attributes.EventDelivery.eventdelivery_sessiondelivery);
    foreach (var sessionDelivery in crmEventDelivery.eventdelivery_sessiondelivery)
    {
        // Load the Presenters on each Session Delivery
        ServiceContext.LoadProperty(sessionDelivery, Attributes.SessionDelivery.sessiondelivery_presenterbooking);
    }
}

As mentioned in the other answers, your main problem is the number of web service calls. What no one mentioned is that you can retrieve many objects with a single call using query joins. So you could try something like:
var query_join = (from e in ServiceContext.EventDeliverySet
                  join v in ServiceContext.VenueDeliverySet on e.EventDeliveryId equals v.EventDeliveryId.Id
                  join vn in ServiceContext.VenueSet on v.VenueDeliveryId equals vn.VenueDeliveryId.Id
                  join s in ServiceContext.SessionDeliverySet on e.EventDeliveryId equals s.EventDeliveryId.Id
                  where e.EventDeliveryId == id // *important (see below)
                  select new { EventDelivery = e, VenueDelivery = v, Venue = vn, SessionDelivery = s }).ToList();
Then you can run a foreach over query_join and assemble the object graph from the rows.
*Important: do not use the base Id property (e.Id); stick with e.EntityNameId.Value. I don't know why, but it took me a while to figure out that Id returns the default Guid value "00000...".
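The "put it together" step can be sketched as follows. This is a minimal in-memory illustration, not CRM code: `JoinRow`, `EventGraph` and `JoinRegrouper` are hypothetical names, and plain strings stand in for the venue and session entities. Because each extra join multiplies the rows, the same child shows up several times and has to be de-duplicated.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical flat row, standing in for the anonymous type the join query returns.
public record JoinRow(Guid EventId, string Venue, string SessionDelivery);

// Hypothetical result shape: one parent with its distinct children.
public record EventGraph(Guid EventId, List<string> Venues, List<string> SessionDeliveries);

public static class JoinRegrouper
{
    // Groups the flat rows by parent id and collapses the repeated children
    // that the joins produce.
    public static List<EventGraph> Regroup(IEnumerable<JoinRow> rows) =>
        rows.GroupBy(r => r.EventId)
            .Select(g => new EventGraph(
                g.Key,
                g.Select(r => r.Venue).Distinct().ToList(),
                g.Select(r => r.SessionDelivery).Distinct().ToList()))
            .ToList();
}
```

With real entities you would de-duplicate on the child's id rather than on the whole value.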

Based on what you've provided this looks like a standard lazy-load issue, except my guess is that each lazy load is resulting in a web service call. This would be called a "chatty" service architecture. Your goal should be to make as few service calls as possible to retrieve data for a single request.
Calling to fill in details can seem like a good idea because you can re-use the individual service methods for cases where you only want data 1 or 2 levels deep, or all the way down, but you pay a steep performance penalty.
You would be better off defining a web service call that returns a complete object graph in scenarios like this. I don't know if/what you're using for an ORM layer within the CRM but if you make a specific call to fetch a complete graph of Deliveries then the ORM can eager-fetch the data into fewer SQL statements. Fewer calls to the web service (and subsequently fewer calls into the CRM's data store) should noticeably improve your performance.
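To make the cost of a chatty design concrete, here is a minimal sketch. `FakeDeliveryService` and its methods are hypothetical in-memory stand-ins, not CRM or ORM APIs; each method call is counted as one network round trip.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a remote service: every public method call
// counts as one round trip.
public class FakeDeliveryService
{
    private readonly Dictionary<int, List<int>> childrenByParent;

    public int RoundTrips { get; private set; }

    public FakeDeliveryService(Dictionary<int, List<int>> childrenByParent)
    {
        this.childrenByParent = childrenByParent;
    }

    // Chatty style: one call per parent.
    public List<int> GetChildren(int parentId)
    {
        RoundTrips++;
        return childrenByParent[parentId];
    }

    // Coarse-grained style: one call returns the children of every requested parent.
    public Dictionary<int, List<int>> GetChildrenForAll(IEnumerable<int> parentIds)
    {
        RoundTrips++;
        return parentIds.ToDictionary(id => id, id => childrenByParent[id]);
    }
}
```

Looping over N parents with `GetChildren` costs N round trips, while `GetChildrenForAll` costs one regardless of N.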

So I can see why this might take a while. As everyone else has commented, you are making quite a few web service calls. If you get a moment, it would be interesting to know whether the individual calls are slow or whether it's just because you are making so many; I would suggest profiling this.
In any case, I suspect you would get better performance by not using the strongly typed entities.
I would suggest using a FetchXml query, which lets you build a SQL-style query in XML. Basically, you should be able to replace your many web service calls with a single call. MSDN has an example; also check out the Stunnware FetchXml designer (Products > Stunnware Tools > Download and Evaluation). It was built for CRM 4 but supports virtually all the features you will need.
If you don't fancy that, you could also try a QueryExpression or OData, both of which should allow you to get your data in one hit.
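As a rough idea of what such a FetchXml query could look like, the sketch below builds one with `System.Xml.Linq`. The `fetch`, `entity`, `attribute`, `filter`, `condition` and `link-entity` elements are standard FetchXml, but every entity and attribute name (`new_eventdelivery`, `new_venuedelivery`, `new_sessiondelivery` and so on) is a hypothetical placeholder, as is the `EventDeliveryFetch` helper itself.

```csharp
using System;
using System.Xml.Linq;

public static class EventDeliveryFetch
{
    // All schema names below are hypothetical; substitute the logical names
    // from your own CRM organization.
    public static string Build(Guid eventDeliveryId) =>
        new XElement("fetch",
            new XAttribute("mapping", "logical"),
            new XElement("entity",
                new XAttribute("name", "new_eventdelivery"),
                new XElement("attribute", new XAttribute("name", "new_name")),
                new XElement("filter",
                    new XElement("condition",
                        new XAttribute("attribute", "new_eventdeliveryid"),
                        new XAttribute("operator", "eq"),
                        new XAttribute("value", eventDeliveryId.ToString()))),
                // One link-entity per child relationship, fetched in the same call.
                Link("new_venuedelivery", "vd"),
                Link("new_sessiondelivery", "sd")))
        .ToString();

    // Outer join from the child entity back to the parent's id attribute.
    private static XElement Link(string childEntity, string alias) =>
        new XElement("link-entity",
            new XAttribute("name", childEntity),
            new XAttribute("from", "new_eventdeliveryid"),
            new XAttribute("to", "new_eventdeliveryid"),
            new XAttribute("alias", alias),
            new XAttribute("link-type", "outer"),
            new XElement("attribute", new XAttribute("name", "new_name")));
}
```

In the CRM 2011 SDK the resulting string would typically be wrapped in a `FetchExpression` and passed to `RetrieveMultiple`, so parent and children come back in a single request.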

After trying all the suggested tips in the other answers and doing further profiling, we decided, given our particular scenario and how our CRM was set up, to simply bypass it.
We ended up using some of the built-in views. This is not a recommended approach in the CRM documentation, but we really needed higher performance, and the CRM approach in this instance was just in our way.
To anyone else reading this, see the other answers too.

Because the query does not know what fields will be needed later, all columns are returned from the entity when only the entity is specified in the select clause. In order to specify only the fields you will use, you must return a new object in the select clause, specifying the fields you want to use.
So instead of this:
var accounts = from acct in xrm.AccountSet
               where acct.Name.StartsWith("Test")
               select acct;
Use this:
var accounts = from acct in xrm.AccountSet
               where acct.Name.StartsWith("Test")
               select new Account()
               {
                   AccountId = acct.AccountId,
                   Name = acct.Name
               };
Check out this post for more details: To Linq or not to Linq

Related

Does EF automatically load many to many references collections

Imagine we have the following db structure
Organization
{
Guid OrganizationId
//....
}
User
{
Guid UserId
}
OrganizationUsers
{
Guid OrganizationId
Guid UserId
}
When the EDMX is generated, it abstracts the OrganizationUsers table away into a many-to-many reference, so no POCO class is generated for it.
Say I'm loading data from my context but, to avoid a Cartesian product, I don't use an Include; I make two separate queries instead.
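using (var context = new EntitiesContext())
{
    // Assumes the generated set is named Organizations
    var organizationsQuery = context.Organizations.Where(FilterByParent);
    var organizations = organizationsQuery.ToList();
    // Load() returns void, so there is no result to assign
    organizationsQuery.SelectMany(x => x.Users).Load();
}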
Is it safe to assume that the connected entities are loaded?
Would this make any difference if I loaded the users directly from the DbSet?

From a database point of view:
Is it safe to assume that the connected entities are loaded?
Yes, it's safe: the organizations are first tracked by the EF change tracker, and when Load is called in the next statement, EF knows the results should be attached to the tracked entities.
Would this make any difference if I loaded the users directly from the DbSet?
In fact, using Load this way is no better than Include!
If you use Include, EF translates it to a LEFT JOIN; if you use Load, it is translated to an INNER JOIN; and if you fetch the Users directly by their ids using the Contains method, it is translated to IN on the SQL side.
In the Load and Contains cases you execute two queries (two passes) against SQL, but in the Include case it is done in one pass, so overall it outperforms your approach.
You can compare these approaches yourself using the SQL Profiler tool.
Update:
Based on the conversation, I realized that Johnny's main issue is simply the existence of the OrganizationUsers object. So I suggest changing your approach from Database First to Code First, so that this object can exist explicitly. See this for help along the way.
Another approach that I guess might work is customizing the T4 template, which seems harder but not impossible!

Eager-loading using LINQ to SQL with Include()

I have spent 2 days bashing my head against this problem, and I can't seem to crack it (the problem that is). The same code was working fine until I added database relationships, and I have since read a lot about lazy-loading.
I have two database tables with a 1:1 relationship between them. PromoCode table tracks codes, and has a PK column named id. CustomerPromo table has a column PromoId which is linked to the PromoCode table id. These two tables have no other relationships. I generated all this in SQL Server Management Studio, then generated the model from the database.
To make matters slightly more complicated, I'm doing this inside a WCF data service, but I don't believe that should make a difference (it worked before database relationships were added). After enabling logging, I always get an Exception in the log file with text:
DataContext accessed after Dispose.
My function currently returns all entries from the table:
using (MsSqlDataContext db = new MsSqlDataContext())
{
    // This causes issues with lazy-loading
    return db.PromoCodes.ToArray();
}
I have read numerous articles/pages/answers and they all say to use the .Include() method. But this doesn't work for me:
return db.PromoCodes.Include(x => x.CustomerPromos).ToArray();
I've tried the "magic string" version as well:
return db.PromoCodes.Include("CustomerPromos").ToArray();
The only code I've managed to get to work is this:
PromoCode[] toReturn = db.PromoCodes.ToArray();
foreach (var p in toReturn)
    p.CustomerPromos.Load();
return toReturn;
I've tried adding a .Where() criterion to the query, I've tried .Select(), and I've tried moving the .Include() after the .Where() (this answer says to do it last, but I think that's only due to nested queries). I've read about scenarios where .Include() will silently fail, and after all this I'm no closer.
What am I missing? Syntax problem? Logic problem? Once I get this "simple" case working, I also need to have nested Includes (i.e. if CustomerPromo table had a relationship to Customer).
Edit
Including all relevant code. The rest is either LINQ to SQL, or WCF Data Services configuration. This is all there is:
[WebGet]
[OperationContract]
public PromoCode[] Test()
{
    using (MsSqlDataContext db = new MsSqlDataContext())
    {
        return db.PromoCodes.Include(x => x.CustomerPromos).ToArray();
    }
}
If I call that through a browser directly (e.g. http://<address>:<port>/DataService.svc/Test) I get a reset connection message and have to look up the WCF logs to find out "DataContext accessed after Dispose.". If I make the same query through an AJAX call in a webpage I get an AJAX error with status error (that's all!).

I prematurely posted the previous answer when I didn't actually have any child data to fetch. At the time I was only interested in fetching parent data, and that answer worked.
Now when I actually need child data as well I find it didn't work completely. I found this article which indicates that .Include() (he says Including() but I'm not sure if that's a typo) has been removed, and the correct solution is to use DataLoadOptions. In addition, I also needed to enable Unidirectional Serialisation.
And to top it off, I no longer need DeferredLoadingEnabled. So now the final code looks like this:
using (MsSqlDataContext db = new MsSqlDataContext())
{
    DataLoadOptions options = new DataLoadOptions();
    options.LoadWith<PromoCode>(p => p.CustomerPromos);
    db.LoadOptions = options;
    return db.PromoCodes.ToArray();
}
After setting Unidirectional Serialisation it will happily return a parent object without having to load the child, or explicitly set DeferredLoadingEnabled = false;.
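One plausible reading of the "DataContext accessed after Dispose" error is that something (such as the serializer) touches a lazily loaded child collection after the using block has ended. The sketch below imitates that situation with hypothetical stand-in types (`FakeContext`, `Parent`, `LoadingDemo`); it does not use LINQ to SQL and only illustrates why loading children before the context is disposed avoids the exception.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a data context; not the LINQ to SQL DataContext.
public class FakeContext : IDisposable
{
    private bool disposed;

    public void Dispose() => disposed = true;

    public List<string> LoadChildren(int parentId)
    {
        if (disposed) throw new ObjectDisposedException(nameof(FakeContext));
        return new List<string> { $"child-of-{parentId}" };
    }
}

public class Parent
{
    private readonly Lazy<List<string>> children;

    public Parent(int id, FakeContext context)
    {
        // The children are fetched only when the property is first read.
        children = new Lazy<List<string>>(() => context.LoadChildren(id));
    }

    public List<string> Children => children.Value;
}

public static class LoadingDemo
{
    // Returns a parent whose children have not been loaded yet; reading
    // Children afterwards hits the disposed context.
    public static Parent LoadLazily()
    {
        using (var context = new FakeContext())
        {
            return new Parent(1, context);
        }
    }

    // Forces the load while the context is still alive.
    public static Parent LoadEagerly()
    {
        using (var context = new FakeContext())
        {
            var parent = new Parent(1, context);
            _ = parent.Children;
            return parent;
        }
    }
}
```

Reading `Children` on the result of `LoadLazily()` throws `ObjectDisposedException`, whereas the result of `LoadEagerly()` can be read safely after the context is gone.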

Edit: This did not solve the problem entirely. At the time of testing there wasn't any child data, and I wasn't trying to use it. This only allowed me to return the parent object, it doesn't return child objects. For the full solution see this answer.
Contrary to everything I've read, the answer is not to use .Include() but rather to change the context options.
using (MsSqlDataContext db = new MsSqlDataContext())
{
    db.DeferredLoadingEnabled = false; // THIS makes all the difference
    return db.PromoCodes.ToArray();
}
This link posted in the question comments (thanks @Virgil) hints at the answer. However, I couldn't find a way to access LazyLoadingEnabled for LINQ to SQL (I suspect it's for Entity Framework instead). This page indicated that the solution for LINQ to SQL was DeferredLoadingEnabled.
Here is a link to the MSDN documentation on DeferredLoadingEnabled.

How to get updated entities with Linq after ExecuteMultipleRequest

So, I'm using CRM 2011. In order to improve performance I have started using the ExecuteMultipleRequest. It works fine when creating many records at once. Great! The issue I have is that right after I have done a
context.Execute(myMultipleRequest);
and gotten a valid response with id's back, if I then do a
context.myEntitiesSet.Where(x => x.Name == "foo")
(basically querying the objects just created), I don't get valid objects back: their ids are empty (Guid.Empty).
So it seems I have to choose between:
context.Create(), context.Update(), context.Where(...), etc., or
context.Execute(multiple) and context.RetrieveMultiple()
There doesn't seem to be a middle ground, as the context doesn't seem to update which entities it is tracking when I use the ExecuteMultipleRequest. That is my basic problem. I can create objects just fine, but if I want to query them I can't use a LINQ query on the context; I must use RetrieveMultiple instead.
Have I gotten this backwards, or is this well known when using CRM? I am an experienced developer, but relatively new to CRM.
Should I have to call context.AttachObject() myself for all newly created entities when using ExecuteMultipleRequest?
Any help would be appreciated. Oh, and I'm using early bound objects.

I don't believe the CrmLinqProvider has been extended to handle your instance. The ExecuteMultipleRequest returns an ExecuteMultipleResponse object that contains the results of each request. You'll need to loop through this to determine the ids, and update them yourself.
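The loop described above might look roughly like this. The types here (`FakeEntity`, `FakeResponseItem`, `BatchIdMapper`) are stand-ins invented for illustration, not the SDK classes; they only mirror the idea that each response item carries a `RequestIndex` pointing back at the request it answers.

```csharp
using System;
using System.Collections.Generic;

// Stand-in types for illustration only; these are not the CRM SDK classes.
public class FakeEntity
{
    public Guid Id { get; set; }
    public string Name { get; set; } = "";
}

// RequestIndex is the position of the originating request in the batch.
public record FakeResponseItem(int RequestIndex, Guid CreatedId, bool Faulted);

public static class BatchIdMapper
{
    // Copies each created id onto the entity at the matching request index
    // and returns the indexes of the requests that faulted.
    public static List<int> ApplyIds(IList<FakeEntity> requested, IEnumerable<FakeResponseItem> items)
    {
        var failed = new List<int>();
        foreach (var item in items)
        {
            if (item.Faulted)
            {
                failed.Add(item.RequestIndex);
                continue;
            }
            requested[item.RequestIndex].Id = item.CreatedId;
        }
        return failed;
    }
}
```

With the real SDK, responses are only returned when the request's settings ask for them (`ReturnResponses`), and each item exposes either a response or a fault, so the same index-based mapping applies.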

CQRS + ES - Where to query Data needed for business logic?

I'm using CQRS + ES and I have a modeling problem that I can't find a solution for.
You can skip the below and answer the generic question in the title: Where would you query data needed for business logic?
Sorry if it turned out to be a complex question; my mind is twisted at the moment!
Here's the problem:
I have users that are members of teams. It's a many to many relationship. Each user has an availability status per team.
Teams receive tickets, each with a certain load factor, that should be assigned to one of the team's members depending on their availability and total load.
First issue: I need to query the list of users that are available in a team and select the one with the least load, since he is the one eligible for assignment. (Note that this is only one of the cases; a different query might need to run.)
Second issue: the load factor of a ticket might change, so I have to take that into consideration when calculating the total load per user. Note that although a ticket can belong to only one team, the assignment should be based on the user's total load and not his load for that team.
Currently a TicketReceivedEvent is received by this bounded context, and I should trigger a workflow to assign that ticket to a user.
Possible Solutions:
The easiest way would be to queue events and sequentially send an AssignTicketToUser command, and have a service query the read model for the user id, get the user, and call user.assignTicket(Ticket). Once the TicketAssignedEvent is received, send the next assignment command. But it seems to be a red flag to query the read model from within the command handler, and a hassle to queue all these tickets!
Have a process manager per user holding his availability per team and the tickets assigned to that user. In that case we replace the query to the read side with a "process manager lookup" query, and the command handler would call Ticket.AssignTo(User). The con is that I think too much business logic leaks outside the domain model, specifically because we're pulling all the info/model out of the User aggregate to make it available for querying.
I'm inclined to go with the first solution; it seems easier to maintain, modify/extend, and locate in code, but maybe there's something I'm missing.

Always (well, in 99.99% of cases) in the business/domain layer, i.e. in the "Command" part of CQRS. This means that your repositories should have methods for the specific queries and your persistence model should be 'queryable' enough for this purpose. This means you have to know more about the use cases of your Domain before deciding how to implement persistence.
Using a document db (mongodb, raven db or postgres) might make the work easier. If you're stuck with an RDBMS or a key-value store, create querying tables, i.e. a read model for the write model, acting as an index :) (this assumes you're serializing objects). If you're storing things relationally with a specific table schema for each entity type (huge overhead, you're complicating your life), then the information is easily queryable automatically.

Why can't you query the aggregates involved?
I took the liberty to rewrite the objective:
Assign team-ticket to user with the lowest total load.
Here we have a Ticket which should be able to calculate a standard load factor, a Team which knows its users, and a User which knows its total load and can accept new tickets:
Update: If it doesn't feel right to pass a repository to an aggregate, it can be wrapped in a service, in this case a locator. Doing it this way makes it easier to enforce that only one aggregate is updated at a time.
public void AssignTicketToUser(int teamId, int ticketId)
{
    var ticket = repository.Get<Ticket>(ticketId);
    var team = repository.Get<Team>(teamId);
    var users = new UserLocator(repository);
    var tickets = new TicketLocator(repository);
    var user = team.GetUserWithLowestLoad(users, tickets);
    user.AssignTicket(ticket);
    repository.Save(user);
}
The idea is that the User is the only aggregate we update.
The Team will know its users:
public User GetUserWithLowestLoad(ILocateUsers users, ILocateTickets tickets)
{
    User lowest = null;
    foreach (var id in userIds)
    {
        var user = users.GetById(id);
        // The first user becomes the baseline; comparing against null would throw
        if (lowest == null || user.IsLoadedLowerThan(lowest, tickets))
        {
            lowest = user;
        }
    }
    return lowest;
}
Update: As a ticket may change load over time, the User needs to calculate its current load.
public bool IsLoadedLowerThan(User other, ILocateTickets tickets)
{
    var load = CalculateLoad(tickets);
    var otherLoad = other.CalculateLoad(tickets);
    return load < otherLoad;
}
public int CalculateLoad(ILocateTickets tickets)
{
    // Recalculate from the tickets themselves, since their load may have changed
    return ticketIds
        .Select(id => tickets.GetById(id))
        .Sum(ticket => ticket.CalculateLoad());
}
The User then accepts the Ticket:
public void AssignTicket(Ticket ticket)
{
    if (ticketIds.Contains(ticket.Id)) return;
    Publish(new TicketAssignedToUser
    {
        UserId = id,
        Ticket = new TicketLoad
        {
            Id = ticket.Id,
            Load = ticket.CalculateLoad()
        }
    });
}
public void When(TicketAssignedToUser e)
{
    ticketIds.Add(e.Ticket.Id);
    totalLoad += e.Ticket.Load;
}
I would use a process manager / saga to update any other aggregate.

You can query the data you need in your application service. This seems to be similar to your first solution.
Usually, you keep your aggregates cross-referenced, so I am not quite sure where the first issue comes from. Each user should have a list of teams it belongs to and each group has the list of users. You can complement this data with any attributes you want, including, for example, availability. So, when you read your aggregate, you have the data directly available. Surely, you will have lots of data duplication, but this is very common.
In an event-sourced model, domain repositories never provide any querying ability. AggregateSource by Yves Reynhout is a good reference; here is the IRepository interface there. You can easily see there is no "Query" method in this interface whatsoever.
There is also a similar question: Domain queries in CQRS

Does querying, from the Controller, a List<T> obtained from the repository increase coupling?

I have an ASP.NET MVC application coded with C#. The application is structured this way:
Controller
Repository
LINQ to Entities (Entity Framework)
View
I use the Repository (_ProductRep) to query LINQ to Entities and give the Controller actual entities or a List<T>, not an IQueryable<T>.
I would like some help with a situation where I have more than one doubt. I have the following code:
List<Monthly_Report> lproduct_monthlyReport = _ProductRep.GetArchiveReport(product.Prod_ID, lmonth, lyear);
After I get this lproduct_monthlyReport I need to query it inside a foreach and get a specific record. Currently I implemented the solution like this:
foreach (var item in litemList)
{
    var lproductItem_monthlyReport = lproduct_monthlyReport.Single(m => m.Item_ID == item.Item_ID);
    // Other code
}
Where litemList is the list of all the possible items a product can have.
I wanted to know whether this solution significantly increases coupling (and violates the Law of Demeter) or whether it is acceptable because I am actually querying a List<T> and not an IQueryable<T>. Correct me if I am wrong, but I guess that since the List does not need to access the EF DataContext, there is no coupling between the Controller and EF.
In case I am wrong, the only solution I can think of is to replace the query with a Repository method (which I still have to implement):
var lproductItem_monthlyReport = _ProductRep.GetArchiveReport(product.Prod_ID, lmonth, lyear, item.Item_ID);
With this solution, however, the Repository makes one query with 4 conditions on every loop cycle, whilst in the previous solution the query inside the loop had just one condition.
Could you please enlighten me on this issue? Thanks.
PS: I need both variables lproduct_monthlyReport and lproductItem_monthlyReport inside the loop, I cannot just use one of them
PPS: I know that I should have a Business Service Layer between Controller and Repository, it is my next step.

Returning Lists from your repository will give you awful performance, because you lose the deferred execution behaviour. Basically your repository will retrieve every single record, and not related entities, into memory, and turn them into a List, which then gets processed in memory. If you want to access a related entity, it'll need another database hit. If you stick with IEnumerable (or IQueryable), then you are hiding the nuances of the entity framework behaviour from the client, but still getting the advantages like lazy loading and deferred execution.
Ignoring the specifics of your Repository for now, if you do this:
List<Product> products = MyEntities.Products.ToList();
Product product1 = products.Single(p => p.Id == 1);
it will perform much worse than this:
IQueryable<Product> products = MyEntities.Products;
Product product1 = products.Single(p => p.Id == 1);
The first one will perform a SELECT in the database with no WHERE clause, then instantiate .NET objects for every result, then query that in-memory list. The second will do nothing until Single is called, and will at that point issue a database command that retrieves just the 1 product and instantiates only that 1 product. Note that this relies on the variable being typed as IQueryable: typed as IEnumerable, Single would run in memory over the whole table.
The difference between the 2 may not be noticeable with small data sets, but as the data set gets larger this will get worse and worse. Throw in a connected entity (or worse still entity collection), and you'll get potentially thousands of database hits, where if you stuck with IEnumerable you'd get 1.
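The effect of materializing too early can be seen even without a database. The sketch below uses plain LINQ to Objects, with a hypothetical `ReadRows` iterator standing in for the table and a counter recording how many rows are actually produced. It uses `First` rather than `Single`, because `Single` has to keep scanning to verify uniqueness.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    // Number of rows produced by the most recent call.
    public static int RowsRead { get; private set; }

    // Hypothetical data source: a lazy sequence of 1000 "rows".
    private static IEnumerable<int> ReadRows()
    {
        for (var i = 1; i <= 1000; i++)
        {
            RowsRead++;
            yield return i;
        }
    }

    // Materializes every row into a list before filtering.
    public static int FindViaList(int id)
    {
        RowsRead = 0;
        List<int> all = ReadRows().ToList();
        return all.First(x => x == id);
    }

    // Keeps the sequence lazy, so enumeration stops at the first match.
    public static int FindLazily(int id)
    {
        RowsRead = 0;
        return ReadRows().First(x => x == id);
    }
}
```

After `FindViaList(3)` the counter reads 1000; after `FindLazily(3)` it reads 3. Against a real database the gain comes from the provider translating the filter into SQL, which requires the query to stay an `IQueryable<T>`.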

I would probably have a function like GetArchiveReport(int prodID, int lmonth, int lyear, IEnumerable<int> itemIDs) that does an itemIDs.Contains(tbl.ID) inside your query:
var SelectedReports = _ProductRep.GetArchiveReport(product.Prod_ID, lmonth, lyear, litemList.Select(item => item.Item_ID));
foreach (var prodItem in SelectedReports)
{
    // Do code
}
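A minimal in-memory sketch of such a repository method is shown below. `MonthlyReport` and `InMemoryProductRepository` are hypothetical stand-ins (a list replaces the EF table), and returning a dictionary keyed by Item_ID is my own choice so that the per-item lookup in the loop stays cheap.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical report row; property names follow the question's code.
public record MonthlyReport(int Prod_ID, int Item_ID, int Month, int Year, decimal Total);

public class InMemoryProductRepository
{
    private readonly List<MonthlyReport> table;

    public InMemoryProductRepository(List<MonthlyReport> table) => this.table = table;

    // One query for the whole set of item ids instead of one query per loop
    // iteration; the result is keyed by Item_ID for direct lookup.
    public Dictionary<int, MonthlyReport> GetArchiveReport(
        int prodId, int month, int year, IEnumerable<int> itemIds)
    {
        var wanted = new HashSet<int>(itemIds);
        return table
            .Where(r => r.Prod_ID == prodId
                        && r.Month == month
                        && r.Year == year
                        && wanted.Contains(r.Item_ID))
            .ToDictionary(r => r.Item_ID);
    }
}
```

Inside the loop, `reports[item.Item_ID]` then replaces the repeated `Single(...)` scan over the list.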
