Not sure if it's the best title for the question... maybe someone could rename it for me?
My question is about the performance of reading and combining data with the C# ServiceStack client for Redis, and how the calls work internally.
I will describe two scenarios that hopefully lead to the same final result. One scenario attaches a list of category ids to the Transaction so that each Category can be stored independently.
Question: My end goal is to retrieve all transactions that have category 'food'.
I have numbered the points in the code below where extra clarity would help my understanding. Assume there are 10,000 transactions and each transaction has, on average, 3 categories.
Note: there is a related question at ServiceStack.Net Redis: Storing Related Objects vs. Related Object Ids, however it doesn't address the efficiency side.
Example A
public class Transaction
{
public List<string> CategoryIds;
}
Example B
public class Transaction
{
public List<string> CategoryNames;
}
Code
var transactionClient = redisClient.GetTypedClient<Transaction>();
//1. is this inefficient returning all transactions?
// is there any filtering available at this part?
var allTransactions = transactionClient.GetAll();
//2. In the case of Example A where the categories are stored as id's
// how would I map the categories to a transaction?
// maybe I have a List that has a container with the Transaction associated with a
// list of Categories, however this seems inefficient as I would have to loop
// through all transactions make a call to get their Categories and then
// populate the container datatype.
//3. If we are taking Example B how can I efficiently just retrieve the transactions
// where they have a category of food.
The trade-off is fewer network calls vs. more data transferred. Data in Redis just gets blobbed; most of the time a single client API call maps 1:1 to a Redis server operation. That means you can think about the performance implications as simply downloading a JSON dataset blob from a remote server's memory and deserializing it on the client, which is effectively all that happens.
Some APIs, such as GetAll(), require 2 calls: one to fetch all the ids in the entity set, and another to fetch all the records with those ids. The source code of the Redis client is quite approachable, so I recommend having a look to see exactly what's happening.
Since each transaction only has around 3 categories, you're not saving that much extra data by trying to filter on the server.
So your options are basically:
Download the entire entity dataset and filter on the client
Maintain a custom index mapping from Category > Ids
More advanced: use a server-side Lua script to do the filtering on the server (requires Redis 2.6)
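For illustration, option 2 (a custom index) might look roughly like the sketch below using the ServiceStack Redis client. The set-key naming, the Id property on Transaction, and the use of CategoryNames from Example B are assumptions made for this example, not part of the original code.

using System.Collections.Generic;
using ServiceStack.Redis;

public class TransactionRepository
{
    private readonly IRedisClient _redis;

    public TransactionRepository(IRedisClient redis)
    {
        _redis = redis;
    }

    // One Redis SET per category, holding the ids of transactions in that category (key naming is illustrative).
    private static string CategoryIndexKey(string categoryName)
    {
        return "urn:categoryindex:" + categoryName;
    }

    public void StoreTransaction(Transaction transaction)
    {
        // Assumes Transaction also has a public string Id property (Example B plus an Id).
        _redis.GetTypedClient<Transaction>().Store(transaction);

        // Maintain the secondary index alongside the entity.
        foreach (var categoryName in transaction.CategoryNames)
            _redis.AddItemToSet(CategoryIndexKey(categoryName), transaction.Id);
    }

    public IList<Transaction> GetTransactionsByCategory(string categoryName)
    {
        // 1 call to read the id set + 1 call to fetch those transactions by id.
        var ids = _redis.GetAllItemsFromSet(CategoryIndexKey(categoryName));
        return _redis.GetTypedClient<Transaction>().GetByIds(ids);
    }
}

With this in place, "all transactions with category 'food'" becomes two Redis operations instead of downloading the whole entity set.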
Related
I define the index API of a controller as the following:
[HttpGet]
public IEnumerable<Blog> GetDatas()
{
return _context.Blogs;
}
This always returns empty even though the database contains many blogs. However, when I do the following, for test reasons only, Entity Framework manages to see the data and returns all the blogs in the database:
[HttpGet]
public IEnumerable<Blog> GetDatas()
{
var blogs = _context.Blogs.ToList();
return _context.Blogs;
}
Any thoughts?
(maybe related to my other unanswered question).
Update 1
To avoid confusion around deferred execution in LINQ: I've tried the following two methods, and with neither of them does the returned JSON contain the data that is already in the database. In other words, the serialized objects do not reflect the entities persisted in the database. I think these methods should trigger execution of the LINQ query, correct?
// Method 1:
[HttpGet]
public async Task<ActionResult<IEnumerable<Blog>>> GetDatas()
{
return await _context.Blogs.ToListAsync().ConfigureAwait(false);
}
// Method 2:
[HttpGet]
public IEnumerable<Blog> GetDatas()
{
return _context.Blogs.ToList();
}
As Daniel said, this is by design. See What are the benefits of a Deferred Execution in LINQ? for an extended discussion, but essentially the data is loaded when it is used, not when it is requested. The only place you can see the sequence as "empty" is in the debugger; your runtime code never sees it that way, because as soon as it tries to find out whether the sequence is empty, the query runs and fills it with data. At that point (the point of use) it doesn't matter that it was "empty" before, because nothing had asked for its contents until then.
Think of it a bit like Schrödinger's cat.
It's quite helpful actually:
var w = worldPopulation.Where(e => e.Gender == Gender.Male);
if (name != null)
    w = w.Where(e => e.Name == name);
If the first query ran immediately, you could see 3.5 billion rows being downloaded from your database to your client (a low-spec machine compared with the server), and only then would the name filter reduce it to a few million. Better to download only those few million to your slower, low-spec machine over a comparatively slow network in the first place, right?
One benefit of only running the query when you actually ask for the data is that, at that point, you finally know you really want it. Up to then you might never have needed it, so downloading it would have been a waste of resources.
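To make the execution point explicit, here is the same sketch with the moment of execution called out (worldPopulation and name are the placeholders from the example above):

// Nothing has been sent to the database yet; w is only a query definition.
var w = worldPopulation.Where(e => e.Gender == Gender.Male);

if (name != null)
    w = w.Where(e => e.Name == name); // still just composing the query

var results = w.ToList(); // the SQL is generated and executed here, with both filters applied
var count = results.Count; // works against the in-memory list; no further database calls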
Use a scoped lifetime or a context pool for the DbContext instead of a singleton or transient registration; i.e., use:
services.AddDbContextPool<BlogsContext>(options => {options.UseSqlServer();});
and avoid registrations such as (note ServiceLifetime.Singleton):
services.AddDbContext<BlogsContext>(options => {options.UseSqlServer();}, ServiceLifetime.Singleton);
I have a somewhat complex permission system that uses six database tables in total and in order to speed it up, I would like to cache these tables in memory instead of having to hit the database every page load.
However, I'll need to update this cache when a new user is added or a permission is changed. I'm not sure how to go about building this in-memory cache, or how to update it safely without causing problems if it's read at the same time as it's being updated.
Does anyone have an example of how to do something like this or can point me in the right direction for research?
Without knowing more about the structure of the application, there are lots of possible options. One such option might be to abstract the data access behind a repository interface and handle in-memory caching within that repository. Something as simple as a private IEnumerable<T> on the repository object.
So, for example, say you have a User object which contains information about the user (name, permissions, etc.). You'd have a UserRepository with some basic fetch/save methods on it. Inside that repository, you could maintain a private static HashSet<User> which holds User objects which have already been retrieved from the database.
When you fetch a User from the repository, it first checks the HashSet for an object to return; if it doesn't find one, it gets it from the database, adds it to the HashSet, then returns it. When you save a User it updates both the HashSet and the database.
Again, without knowing the specifics of the codebase and overall design, it's hard to give a more specific answer. This should be a generic enough solution to work in any application, though.
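As a rough sketch of that idea (the User shape and the LoadFromDatabase/SaveToDatabase calls are placeholders, and a ConcurrentDictionary is used rather than a bare HashSet so concurrent readers and writers are safe):

using System.Collections.Concurrent;

public class User
{
    public int Id { get; set; }
    public string Name { get; set; }
    // permissions etc.
}

public class UserRepository
{
    // Shared across requests; ConcurrentDictionary gives thread-safe reads and writes.
    private static readonly ConcurrentDictionary<int, User> Cache = new ConcurrentDictionary<int, User>();

    public User GetById(int id)
    {
        // Return the cached copy if we have one, otherwise load from the database and cache it.
        return Cache.GetOrAdd(id, key => LoadFromDatabase(key));
    }

    public void Save(User user)
    {
        SaveToDatabase(user);   // persist first
        Cache[user.Id] = user;  // then refresh the cache so later reads see the new data
    }

    private User LoadFromDatabase(int id)
    {
        // Placeholder for the real data access.
        throw new System.NotImplementedException();
    }

    private void SaveToDatabase(User user)
    {
        // Placeholder for the real data access.
        throw new System.NotImplementedException();
    }
}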
I would cache items as you use them: in your data layer, when fetching data, first check whether it is available in the cache; otherwise go to the database and cache the result afterwards.
public AccessModel GetAccess(string accessCode)
{
    // 'cache' is whatever cache abstraction you use; Get/Set are assumed to read and write it.
    var cached = cache.Get<AccessModel>(accessCode);
    if (cached != null)
        return cached;

    var access = GetFromDatabase(accessCode);
    cache.Set(accessCode, access); // cache the result so the next call skips the database
    return access;
}
Then I would think about the cache invalidation strategy. You can go two ways:
One is to give cache entries an expiry of, say, one hour, so you only hit the database once an hour per key.
The other is to invalidate the cache whenever you update the data. That is certainly the better option, but it is a bit more complex.
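If you go the expiry route, a minimal sketch with System.Runtime.Caching.MemoryCache might look like this (AccessModel and GetFromDatabase follow the snippet above; the choice of MemoryCache is just one option):

using System;
using System.Runtime.Caching;

public class AccessModel { /* whatever your access data looks like */ }

public class AccessCache
{
    private static readonly MemoryCache Cache = MemoryCache.Default;

    public AccessModel GetAccess(string accessCode)
    {
        var cached = Cache.Get(accessCode) as AccessModel;
        if (cached != null)
            return cached;

        var access = GetFromDatabase(accessCode);

        // Absolute expiry of one hour: the database is hit at most once per hour per key.
        Cache.Set(accessCode, access, DateTimeOffset.UtcNow.AddHours(1));
        return access;
    }

    private AccessModel GetFromDatabase(string accessCode)
    {
        // Placeholder for the real data access.
        return new AccessModel();
    }
}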
Hope it helps.
Note: you can either use ASP.NET Cache or another solution like memcached depending on your infrastructure
Is it hitting the database every page load that's the problem or is it joining six tables that's the problem?
If it's just that the join is slow, why not create a database table that summarizes the data in a way that is much easier and faster to query?
This way, you just have to update your summary table each time you add a user or update a permission. If you group all of this into a single transaction, you shouldn't have issues with out-of-sync data.
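As a sketch of keeping the summary table in step, using plain ADO.NET and a single transaction (all table and column names here are invented for illustration):

using System.Data.SqlClient;

public void AddUser(string connectionString, string userName, int roleId)
{
    using (var connection = new SqlConnection(connectionString))
    {
        connection.Open();
        using (var transaction = connection.BeginTransaction())
        {
            // 1. Write to the normal, normalized tables.
            using (var insertUser = new SqlCommand(
                "INSERT INTO Users (Name, RoleId) VALUES (@name, @roleId)", connection, transaction))
            {
                insertUser.Parameters.AddWithValue("@name", userName);
                insertUser.Parameters.AddWithValue("@roleId", roleId);
                insertUser.ExecuteNonQuery();
            }

            // 2. Refresh the denormalized summary rows for this user in the same transaction,
            //    so readers never see the two sets of tables out of sync.
            using (var refreshSummary = new SqlCommand(
                "INSERT INTO UserPermissionSummary (UserName, Permission) " +
                "SELECT u.Name, p.Name FROM Users u " +
                "JOIN RolePermissions rp ON rp.RoleId = u.RoleId " +
                "JOIN Permissions p ON p.Id = rp.PermissionId " +
                "WHERE u.Name = @name", connection, transaction))
            {
                refreshSummary.Parameters.AddWithValue("@name", userName);
                refreshSummary.ExecuteNonQuery();
            }

            transaction.Commit(); // both writes succeed together or not at all
        }
    }
}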
You can take advantage of ASP.NET caching and the SqlCacheDependency class. There is an article on MSDN.
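Roughly, that looks like the following, assuming the polling-based setup from the MSDN article is already in place (the database registered with aspnet_regsql, a sqlCacheDependency entry named "PermissionsDb" in web.config, and a Permissions table enabled for notifications; all of those names are placeholders here):

using System.Data;
using System.Web;
using System.Web.Caching;

public static class PermissionCache
{
    public static DataTable GetPermissions()
    {
        var cache = HttpRuntime.Cache;
        var cached = cache["Permissions"] as DataTable;
        if (cached != null)
            return cached;

        DataTable permissions = LoadPermissionsFromDatabase(); // your existing query over the six tables

        // "PermissionsDb" must match a sqlCacheDependency database entry in web.config,
        // and the Permissions table must be enabled for change notifications (aspnet_regsql -et).
        var dependency = new SqlCacheDependency("PermissionsDb", "Permissions");

        // The entry is evicted automatically when the underlying table changes.
        cache.Insert("Permissions", permissions, dependency);
        return permissions;
    }

    private static DataTable LoadPermissionsFromDatabase()
    {
        // Placeholder for the real data access.
        return new DataTable();
    }
}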
You can use the Cache object built into ASP.NET. Here is an article that explains how.
I can suggest caching such data in the Application state object. For thread-safe access, take the lock on a shared static object (locking HttpContext.Current would not synchronize across requests). Your code would look something like this:
private static readonly object _syncRoot = new object();

public void ClearTableCache(string tableName)
{
    lock (_syncRoot)
    {
        System.Web.HttpContext.Current.Application[tableName] = null;
    }
}

public SomeDataType GetTableData(string tableName)
{
    lock (_syncRoot)
    {
        if (System.Web.HttpContext.Current.Application[tableName] == null)
        {
            // Get the data from the database, then put it into application state.
            SomeDataType dataFromDb = LoadTableFromDatabase(tableName); // placeholder for your data-access call
            System.Web.HttpContext.Current.Application[tableName] = dataFromDb;
            return dataFromDb;
        }
        return (SomeDataType)System.Web.HttpContext.Current.Application[tableName];
    }
}
I have a question about Saving a list of object in ASP.NET MVC.
First, I'm not using Entity Framework or NHibernate or any other ORM tool, just ADO.NET.
Suppose I have a Product object, and I want to collect all the product data via JavaScript and batch-update the product list in one call.
My question is: when should I work out which items are to be inserted, updated, or deleted?
Option 1: put an enum status property on the DTO and on the JavaScript view model. When I add an item to the view model I mark it as added, and when I change an item I mark it as updated, so when the request reaches the action I already know which items are to be inserted or updated.
Pros: it's easy on the server side; no need to work out the object status there.
Cons: if I want to expose this action as a Web API to be called by third parties, it forces those callers to track the state of each object themselves.
Option 2: work out the differences on the server side. The client just sends the list of objects; the server first retrieves the current data from the database, compares it, and decides which records to insert or update.
Pros: all the comparison is done on the server side.
Cons: performance.
Option 3: whatever data is passed from the client, simply delete the current data and insert the new data.
I hope someone can give me some advice on the best practice for handling this situation. I think it's quite common, but I can't find a definitive answer.
I've seen option 1 used, where added/deleted/modified items are maintained in JavaScript arrays and posted back to the server. For some reason I didn't like it, maybe because of having to write client-side code to maintain state.
So I used the second option, and LINQ makes the task easier. Assuming each item has a unique id, the pseudo code is below. Note: newly added items must get unique random ids, otherwise they might be mistaken for already-existing items. In my case the id is a GUID, so there was no chance of a collision.
var submittedIds = vmList.Select(a=>a.Id).ToList();
var dbIds = dbList.Select(d=>d.Id).ToList();
//Added items
var newIds = submittedIds.Except(dbIds).ToList();
//loop over newIds and construct list object to contain newly added items
//Deleted items
var deletedIds = dbIds.Except(submittedIds).ToList();
//Modified items
var modifiedIds = dbIds.Intersect(submittedIds).ToList();//if the values don't change, update statement won't do any harm here
This approach gives reasonable performance unless you are dealing with huge lists.
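To complete the picture, the id sets can then be turned into the three batches (continuing the same pseudo code; vmList and dbList are the submitted and database lists as above):

// Submitted but not in the database -> insert.
var itemsToInsert = vmList.Where(v => newIds.Contains(v.Id)).ToList();

// In the database but not submitted -> delete.
var itemsToDelete = dbList.Where(d => deletedIds.Contains(d.Id)).ToList();

// Present in both -> update with the submitted values.
var itemsToUpdate = vmList.Where(v => modifiedIds.Contains(v.Id)).ToList();

// Finally run the three ADO.NET batches (INSERT/UPDATE/DELETE), ideally inside one transaction.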
I don't think the third option is good. For example, if you plan to implement auditing on your tables, it will record the wrong thing: inserting one new record would produce audit entries saying every record was deleted and re-inserted, when in fact only one record was added.
The 3rd strategy is suitable for simple situations, e.g. updating the items of a purchase order, where an order will not have too many OrderLineItems. However, you have to take care of concurrency issues.
I think your first strategy is the most suitable in the general case. It's also easy to implement. When you publish a service to a 3rd party, it's normal for the client to have to follow the service definition and its requirements.
Update
For the 1st strategy: if you don't want your clients to have to specify a status for their data, do it for them. You can split the SaveOrder service into smaller services: CreateOrder, UpdateOrder and DeleteOrder.
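Alternatively, if you keep a single SaveOrder/SaveProducts call, option 1 usually boils down to a state flag on the DTO, something like this (the ObjectState and ProductDto names are illustrative):

public enum ObjectState
{
    Unchanged = 0,
    Added = 1,
    Modified = 2,
    Deleted = 3
}

public class ProductDto
{
    public System.Guid Id { get; set; }
    public string Name { get; set; }
    public decimal Price { get; set; }

    // Set by the client-side view model; the server switches on this value
    // to decide whether to INSERT, UPDATE or DELETE the row.
    public ObjectState State { get; set; }
}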
Currently our website is facing a problem with slow response times (more than 1 min) when we query CRM from our website. We are using CRM 2011 though a web service. When we investigated we found that the time was spent at the point of querying CRM.
We have used the CrmSvcUtil.exe to generate our proxy classes that map to CRM entities. Then we create an instance of context and query CRM using LINQ with C#.
When we query, we load our parent object with LINQ to CRM and then use LoadProperty to load the related children.
I would like to know if anyone out there is using a different method of querying CRM, and whether you have come across issues like this in your implementation.
I’ve included a simplified sample query below.
public void SelectEventById(Guid id)
{
var crmEventDelivery = this.ServiceContext.EventDeliverySet.FirstOrDefault(eventDelivery => eventDelivery.Id == id);
if (crmEventDelivery != null)
{
this.SelectCrmEventDeliveryWithRelationships(crmEventDelivery);
}
}
private void SelectCrmEventDeliveryWithRelationships(EventDelivery crmEventDelivery)
{
// Loading List of Venue Delivery on parent crmEventDelivery thats been passed
this.ServiceContext.LoadProperty(crmEventDelivery, Attributes.EventDelivery.eventdelivery_venuedelivery);
foreach (var venueDelivery in crmEventDelivery.eventdelivery_venuedelivery)
{
// Loading Venue on each Venue Delivery
ServiceContext.LoadProperty(venueDelivery, Attributes.VenueDelivery.venue_venuedelivery);
}
// Loading List of Session Delivery on parent crmEventDelivery thats been passed
this.ServiceContext.LoadProperty(crmEventDelivery, Attributes.EventDelivery.eventdelivery_sessiondelivery);
foreach (var sessionDelivery in crmEventDelivery.eventdelivery_sessiondelivery)
{
// Loading Presenters on each Session Delivery
ServiceContext.LoadProperty(sessionDelivery, Attributes.SessionDelivery.sessiondelivery_presenterbooking);
}
}
As mentioned in the other answers, your main problem is the number of web service calls. What no one has mentioned is that you can retrieve many objects with a single call by using query joins. So you could try something like:
var query_join = (from e in ServiceContext.EventDeliverySet
                  join v in ServiceContext.VenueDeliverySet on e.EventDeliveryId equals v.EventDeliveryId.Id
                  join vn in ServiceContext.VenueSet on v.VenueDeliveryId equals vn.VenueDeliveryId.Id
                  join s in ServiceContext.SessionDeliverySet on e.EventDeliveryId equals s.EventDeliveryId.Id
                  where e.EventDeliveryId == id // *important (see below)
                  select new { EventDelivery = e, VenueDelivery = v, Venue = vn, SessionDelivery = s }).ToList();
Then you can run a foreach on query_join and put it together.
*important: do not use the base Id property (e.Id); stick with e.EntityNameId.Value. I don't know why, but it took me a while to figure out that Id just returns the default Guid value ("00000..").
Based on what you've provided this looks like a standard lazy-load issue, except my guess is that each lazy load is resulting in a web service call. This would be called a "chatty" service architecture. Your goal should be to make as few service calls as possible to retrieve data for a single request.
Calling to fill in details can seem like a good idea because you can re-use the individual service methods for cases where you only want data 1 or 2 levels deep, or all the way down, but you pay a steep performance penalty.
You would be better off defining a web service call that returns a complete object graph in scenarios like this. I don't know if/what you're using for an ORM layer within the CRM but if you make a specific call to fetch a complete graph of Deliveries then the ORM can eager-fetch the data into fewer SQL statements. Fewer calls to the web service (and subsequently fewer calls into the CRM's data store) should noticeably improve your performance.
So I can see why this might take a while. As everyone else has commented, you are making quite a few web service calls. If you get a moment it would be interesting to know whether the individual calls are slow or it's just that you are making so many; I would suggest profiling this.
In any case I suspect you would get better performance by not using the strongly type entities.
I would suggest using a FetchXml query, which lets you build a SQL-like query in XML. Basically you should be able to replace your many web service calls with a single call. MSDN has an example; also check out the Stunnware FetchXml designer, Products > Stunnware Tools > Download and Evaluation. It was built for CRM 4 but supports virtually all the features you will need.
If you don't fancy that, you could also try a QueryExpression or OData, both of which should allow you to get your data in one hit.
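For example, a FetchXml query with link-entities can bring back an event delivery together with its venue deliveries and venues in one round trip. The entity and attribute logical names below are guesses based on the relationship names in the question, so they will need to be replaced with the real schema names from your organization:

using System;
using Microsoft.Xrm.Sdk;
using Microsoft.Xrm.Sdk.Query;

public EntityCollection GetEventDeliveryGraph(IOrganizationService service, Guid eventDeliveryId)
{
    // Logical names here are illustrative; take the real ones from your customizations.
    string fetchXml = string.Format(@"
        <fetch>
          <entity name='new_eventdelivery'>
            <all-attributes />
            <filter>
              <condition attribute='new_eventdeliveryid' operator='eq' value='{0}' />
            </filter>
            <link-entity name='new_venuedelivery' from='new_eventdeliveryid' to='new_eventdeliveryid' link-type='outer'>
              <all-attributes />
              <link-entity name='new_venue' from='new_venueid' to='new_venueid' link-type='outer'>
                <all-attributes />
              </link-entity>
            </link-entity>
          </entity>
        </fetch>", eventDeliveryId);

    // One web service call instead of one LoadProperty call per child record.
    return service.RetrieveMultiple(new FetchExpression(fetchXml));
}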
After trying all the tips suggested in the other answers and doing further profiling, in our particular scenario, with our use of CRM and how it was set up, we decided to simply bypass it.
We ended up using some of the built-in views directly. This is not an approach recommended in the CRM documentation, but we really needed the higher performance, and in this instance the CRM approach was simply in our way.
To anyone else reading this, see the other answers too.
Because the query does not know what fields will be needed later, all columns are returned from the entity when only the entity is specified in the select clause. In order to specify only the fields you will use, you must return a new object in the select clause, specifying the fields you want to use.
So instead of this:
var accounts = from acct in xrm.AccountSet
where acct.Name.StartsWith("Test")
select acct;
Use this:
var accounts = from acct in xrm.AccountSet
where acct.Name.StartsWith("Test")
select new Account()
{
AccountId = acct.AccountId,
Name = acct.Name
};
Check out this post for more details.
To Linq or not to Linq
We are using Linq to SQL to read and write our domain objects to a SQL Server database.
We expose a number of services (via WCF) to perform various operations. Conceptually, the implementation of these operations consists of three steps: reconstitute the necessary domain objects from the database; execute the operation on the domain objects; persist the (now changed) domain objects back to the database.
The problem is that sometimes there are two or more instances of the same entity object in play, which can lead to inconsistencies when saving the objects back to the database. A little made-up example:
public void Move(string sourceLocationId, string destinationLocationId, string itemId);
which is supposed to move the item with the given id from the source to the destination location (actual services are more complicated, often involving many locations, items etc). Now, it could be that both source and destination location id are the same - a naive implementation would just reconstitute two instances of the entity object, which would lead to problems.
This issue is currently "solved" by checking for it manually, i.e. we reconstitute the first location, check whether the id of the second is different from it, and only if so reconstitute the second, and so on. This is obviously tedious and error-prone.
Anyway, I was actually surprised that there does not seem to be a "standard" solution for this in domain driven design. In particular, repositories or factories do not seem to solve this problem (unless they maintain their own cache, which then needs to be updated etc).
My idea would be to make a DomainContext object per operation, which tracks and caches the domain objects used in that particular method. Instead of reconstituting and saving individual domain objects, such an object would be reconstituted and saved as a whole (possibly using repositories), and it could act as a cache for the domain objects used in that particular operation.
Anyway, it seems that this is a common problem, so how is this usually dealt with? What do you think of the idea above?
The DataContext in Linq-To-Sql supports the Identity Map concept out of the box and should be caching the objects you retrieve. The objects will only be different if you are not using the same DataContext for each GetById() operation.
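A quick way to see this, using the same made-up schema as the Move example further down:

public void IdentityMapCheck(string locationId)
{
    using (DataContext ctx = new DataContext())
    {
        var first = ctx.Locations.First(o => o.LocationID == locationId);
        var second = ctx.Locations.First(o => o.LocationID == locationId);

        // Same DataContext and same row: LINQ to SQL's identity map returns the same instance.
        System.Console.WriteLine(object.ReferenceEquals(first, second)); // True
    }
}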
Linq to Sql objects aren't really valid outside of the lifetime of the DataContext. You may find Rick Strahl's Linq to SQL DataContext Lifetime Management a good background read.
Also, the ORM is not responsible for logic in the domain. It is not going to disallow your example Move operation; it's up to the domain to decide what that operation means. Does it ignore it, or is it an error? That's your domain logic, and it needs to be implemented at the service boundary you are creating.
However, LINQ to SQL does know when an object changes and, from what I've seen, it won't record a change if you re-assign the same value: e.g. if Item.LocationID is already 12, setting it to 12 again won't trigger an update when SubmitChanges() is called.
Based on the example given, I'd be tempted to return early without ever loading an object if the source and destination are the same.
public void Move(string sourceLocationId, string destinationLocationId, string itemId)
{
if( sourceLocationId == destinationLocationId )
return;
using( DataContext ctx = new DataContext() )
{
Item item = ctx.Items.First( o => o.ItemID == itemId );
Location destination =
ctx.Locations.First( o => o.LocationID == destinationLocationId );
item.Location = destination;
ctx.SubmitChanges();
}
}
Another small point, which may or may not be applicable, is that you should make your interfaces as chunky as possible. E.g. if you're typically going to perform 10 move operations at once, it's better to call 1 service method that performs all 10 operations than to call 1 operation at a time. ref: chunky vs chatty
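A chunkier version of the earlier Move example might look something like this (MoveRequest is an invented DTO for illustration):

using System.Collections.Generic;
using System.Linq;

// One small DTO per move, so a single service call can carry a whole batch.
public class MoveRequest
{
    public string SourceLocationId { get; set; }
    public string DestinationLocationId { get; set; }
    public string ItemId { get; set; }
}

public void Move(IList<MoveRequest> moves)
{
    using (DataContext ctx = new DataContext())
    {
        // Skip no-op moves, then apply the rest in one unit of work.
        foreach (var move in moves.Where(m => m.SourceLocationId != m.DestinationLocationId))
        {
            var item = ctx.Items.First(o => o.ItemID == move.ItemId);
            item.Location = ctx.Locations.First(o => o.LocationID == move.DestinationLocationId);
        }

        ctx.SubmitChanges(); // one call, all moves committed together
    }
}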
Many ORMs use two concepts that, if I understand you correctly, address your issue. The first and most relevant is the Context: it is responsible for ensuring that only one object represents an entity (a database table row, in the simple case) no matter how many times or in how many ways it is requested from the database. The second is the Unit of Work; it ensures that updates to the database for a group of entities either all succeed or all fail.
Both of these are implemented by the ORM I'm most familiar with (LLBLGen Pro), however I believe NHibernate and others also implement these concepts.