Update only the changes that are different - C#

I have an Entity-Set employee_table. I'm getting the data through an excel sheet which I have loaded into memory, and the user will click Save to save the changes in the db. It's all good the first time, when the records are inserted; no issue with that.
But how can I update only the changes that were made? Meaning: say I have 10 rows and 5 columns, and out of 10 rows the 7th row was modified, and out of 5 columns the 3rd column was modified; I need to update only those changes and keep the existing values of the other columns.
I can do it by checking if (myexistingItem.Name != dbItem.Name) { //update } but that is very tedious and not efficient, and I'm sure there is a better way to handle it.
Here is what I have so far:
var excelData = SessionWrapper.GetSession_Model().DataModel.OrderBy(x => x.LocalName).ToList();
var dbData = context.employee_master.OrderBy(x => x.localname).ToList();
employee_master dbEntity = new employee_master();
if (dbData.Count > 0)
{
    //update
    foreach (var dbItem in dbData)
    {
        foreach (var xlData in excelData)
        {
            if (dbItem.customer == xlData.Customer)
            {
                dbEntity.customer = xlData.Customer;
            }
            //...do check rest of the props....
            db.Entry(dbEntity).State = EntityState.Modified;
            db.employee_master.Add(dbEntity);
        }
    }
    //save
    db.SaveChanges();
}
else
{
    //insert
}

You can make this check more generic with reflection.
Using this answer to get the value by property name.
public static object GetPropValue(object src, string propName)
{
    return src.GetType().GetProperty(propName).GetValue(src, null);
}
Using this answer to set the value by property name.
public static void SetPropertyValue(object obj, string propName, object value)
{
    obj.GetType().GetProperty(propName).SetValue(obj, value, null);
}
And this answer to list all properties
public static void CopyIfDifferent(object target, object source)
{
    foreach (var prop in target.GetType().GetProperties())
    {
        var targetValue = GetPropValue(target, prop.Name);
        var sourceValue = GetPropValue(source, prop.Name);
        // static Equals avoids a NullReferenceException when targetValue is null
        if (!Equals(targetValue, sourceValue))
        {
            SetPropertyValue(target, prop.Name, sourceValue);
        }
    }
}
Note: If you need to exclude some properties, you can do that very easily by passing a list of property names to the method and checking in the if whether each one is to be excluded or not.
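For illustration, a sketch of that exclusion variant (the excluded parameter is an assumption; System.Linq is needed for Contains):
public static void CopyIfDifferent(object target, object source, params string[] excluded)
{
    foreach (var prop in target.GetType().GetProperties())
    {
        if (excluded.Contains(prop.Name))
            continue; // skip properties the caller asked to leave untouched

        var targetValue = GetPropValue(target, prop.Name);
        var sourceValue = GetPropValue(source, prop.Name);
        if (!Equals(targetValue, sourceValue))
            SetPropertyValue(target, prop.Name, sourceValue);
    }
}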

Update:
I am updating this answer to provide a little more context as to why I suggested not going with a hand-rolled reflection-based solution for now; I also want to clarify that there is nothing wrong with such a solution per se, once you have identified that it fits the bill.
First of all, I assume from the code that this is a work in progress and therefore not complete. In that case, I feel that the code doesn't need more complexity before it's done, and a hand-rolled reflection-based approach is more code for you to write, test, debug and maintain.
For example, right now you seem to have a situation where there is a simple 1:1 copy from the data in the excel to the data in the employee_master object. So in that case reflection seems like a no-brainer, because it saves you loads of boring manual property assignments.
But what happens when HR (or whoever uses this app) comes back to you with the requirement: if Field X is blank on the Excel sheet, then copy the value "Blank" to the target field, unless it's Friday, in which case copy the value "N.A"?
Now a generalised solution has to accommodate custom business logic and could start to get burdensome. I have been in this situation, and unless you are very careful it tends to turn into a mess in the long run.
I just wanted to point this out - and recommend at least looking at AutoMapper, because it already provides one very proven way to solve your issue.
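For illustration, a minimal AutoMapper sketch; ExcelRow is an assumed name for the type of the rows loaded from the spreadsheet:
// configure once: map the excel row type onto the entity type
var config = new MapperConfiguration(cfg =>
    cfg.CreateMap<ExcelRow, employee_master>());
var mapper = config.CreateMapper();

// copies matching properties from the excel row onto the tracked entity
mapper.Map(xlData, dbEntity);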
In terms of efficiency concerns, they are only mentioned because the question mentioned them; I wanted to point out that there are greater inefficiencies at play in the code as posted than the inefficiency of manually typing 40+ property assignments, or indeed than the concern of only updating changed fields.
Why not rewrite the loop:
foreach (var xlData in excelData)
{
    //find existing record in database data:
    var existing = dbData.FirstOrDefault(d => d.customer == xlData.Customer);
    if (existing != null)
    {
        //it already exists in database, update it
        //see below for notes on this.
    }
    else
    {
        //it doesn't exist, create employee_master and insert it to context
        //or perform validation to see if the insert can be done, etc.
    }
}
//and commit once, after the loop:
context.SaveChanges();
This lets you avoid the initial if(dbData.Count>0) because you will always insert any row from the excel sheet that doesn't have a matching entry in dbData, so you don't need a separate block of code for first-time insertion.
It's also more efficient than the current loop because right now you are iterating every object in dbData for every object in xlData; that means if you have 1,000 items in each you have a million iterations...
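If that nested scan ever becomes the bottleneck, here is a sketch of one way to avoid it (assuming Customer values uniquely identify a row):
// build an O(1) lookup once, instead of scanning dbData for every excel row
var dbByCustomer = dbData.ToDictionary(d => d.customer);

foreach (var xlData in excelData)
{
    employee_master existing;
    if (dbByCustomer.TryGetValue(xlData.Customer, out existing))
    {
        // update 'existing' from xlData
    }
    else
    {
        // create a new employee_master and add it to the context
    }
}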
Notes on the update process and efficiency in general
(Note: I know the question wasn't directly about efficiency, but since you mentioned it in the context of copying properties, I just wanted to give some food for thought)
Unless you are building a system that has to do this operation for multiple entities, I'd caution against adding more complexity to your code by building a reflection-based property copier.
Consider the amount of properties you have to copy (i.e. the number of foo.x = bar.x type statements), and then consider the code required for a robust, fully tested and provably efficient reflection-based property copier (i.e. with a built-in cache so you don't have to constantly re-reflect type properties, a mechanism for specifying exceptions, handling for edge cases where for whatever unknown reason you discover that for random column X the value "null" is to be treated a little differently in some cases, etc). You may well find that the former is actually significantly less work :)
Bear in mind that even the fastest reflection-based solution will always be slower than a good old-fashioned foo.x = bar.x assignment.
By all means, if you have to do this operation for 10 or 20 separate entities, consider the general case; otherwise my advice would be: manually write the property copy assignments, get it right, and then think about generalising - or look at AutoMapper, for example.
In terms of only updating fields that have changed - I am not sure you even need to. If the record exists in the database, and the user has just presented a copy of that record which they claim to be the "correct" version of that object, then just copy all the values they gave and save them to the database.
The reason I say this is because in all likelihood, the efficiency of only sending for example 4 modified fields versus 25 or whatever, pales into insignificance next to the overhead of the actual round-trip to the database itself; I'd be surprised if you were able to observe a meaningful performance increase in these kinds of operations by not sending all columns - unless of course all columns are NVARCHAR(MAX) or something :)
If concurrency is an issue (i.e. other users might be modifying the same data), then include a ROWVERSION type column in the database table, map it in Entity Framework, and handle the concurrency issues if and when they arise.
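As a sketch of what that mapping can look like with data annotations (property names here are illustrative):
// a ROWVERSION column mapped via the [Timestamp] attribute; EF then adds the
// original value to the WHERE clause of UPDATEs and throws
// DbUpdateConcurrencyException if another user changed the row in the meantime
public class employee_master
{
    public int id { get; set; }
    public string customer { get; set; }

    [Timestamp]
    public byte[] RowVersion { get; set; }
}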

Related

Appropriate update of entity while checking if it exists in EF Core

I have the following method updating an entity. The only biff I had was that when a non-existing ID was provided, I got a harsh exception.
public bool Update(Thing thing)
{
    Context.Things.Update(thing);
    int result = Context.SaveChanges();
    return result == 1;
}
So I added a check to control the exception thrown (plus some nice logging and other facilitation). Eventually, I plan to skip throwing entirely.
public bool UpdateWithCheck(Thing thing)
{
    Thing target = Context.Things.SingleOrDefault(a => a.Id == thing.Id);
    if (target == null)
        throw new CustomException($"No thing with ID {thing.Id}.");

    Context.Things.Update(thing);
    int result = Context.SaveChanges();
    return result == 1;
}
No, this doesn't work, because the entity is already being tracked. I have several options to handle that:
Change to Context.Things.Where(...).AsNoTracking().
Explicitly set the updated fields in target and save it.
Horse around with entity states and tamper with the change tracker.
Remove the present entity and add the new one.
I can't decide which is the best practice. Googling gave me the default examples, which do not contain the check for pre-existing status in the same operation.
The reason for the exception is that by loading the entity from the Context to check whether it exists, you now have a tracked reference. When you go to update the detached reference, EF will complain that an instance is already tracked.
The simplest work-around would be:
public bool UpdateWithCheck(Thing thing)
{
    bool doesExist = Context.Things.Any(a => a.Id == thing.Id);
    if (!doesExist)
        throw new CustomException($"No thing with ID {thing.Id}.");

    Context.Things.Update(thing);
    int result = Context.SaveChanges();
    return result == 1;
}
However, there are two problems with this approach. Firstly, because we don't know the scope of the DbContext instance, and cannot guarantee the order in which methods are called, it is possible that at some point this DbContext instance has already loaded and tracked that instance of the thing. This can manifest as seemingly intermittent errors. The proper way to guard against that would be something like:
public bool UpdateWithCheck(Thing thing)
{
    bool doesExist = Context.Things.Any(a => a.Id == thing.Id);
    if (!doesExist)
        throw new CustomException($"No thing with ID {thing.Id}.");

    Thing existing = Context.Things.Local.SingleOrDefault(a => a.Id == thing.Id);
    if (existing != null)
        Context.Entry(existing).State = EntityState.Detached;

    Context.Things.Update(thing);
    int result = Context.SaveChanges();
    return result == 1;
}
This checks the local tracking cache for any loaded instances and, if found, detaches them. The risk here is that any modifications that haven't been persisted in those tracked references will be discarded, and any references floating around that you would have assumed were attached will now be detached.
The second significant issue is with using Update(). When you have detached entities being passed around, there is a risk that data you don't intend to be updated could be updated. Update will replace all columns, whereas typically a client might only be expected to update a subset of them. EF can be configured to check row versions or timestamps on entities against the database before updating, when your database is set up to support them (such as snapshot isolation), which can help guard against stale overwrites but still allows unexpected tampering.
As you've already figured out, the better approach is to avoid passing detached entities around and instead use dedicated DTOs. This avoids potential confusion about which objects represent view/consumer state vs. data state. By explicitly copying the values across from the DTO to the entity, or configuring a mapper to copy supported values, you also protect your system from unexpected tampering and potential stale overwrites. One consideration with this approach is that you should guard against unconditionally overwriting data with potentially stale data by ensuring your entity and DTO have a RowVersion/Timestamp to compare. Before copying from the DTO to the freshly loaded entity, compare the versions: if they match, nothing has changed in the data row since you fetched and composed your DTO. If they differ, someone else has updated the underlying data row since the DTO was read, so your modifications are against stale data. From there, take an appropriate action such as discarding the changes, overwriting them, merging them, logging the fact, etc.
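A sketch of that flow (ThingDto, RowVersion and the copied fields are illustrative; System.Linq provides SequenceEqual):
public bool Update(ThingDto dto)
{
    Thing entity = Context.Things.SingleOrDefault(t => t.Id == dto.Id);
    if (entity == null)
        throw new CustomException($"No thing with ID {dto.Id}.");

    // stale check: has the row changed since the DTO was composed?
    if (!entity.RowVersion.SequenceEqual(dto.RowVersion))
        throw new CustomException($"Thing {dto.Id} was modified by someone else.");

    // copy only the fields the client is allowed to change
    entity.Name = dto.Name;

    return Context.SaveChanges() == 1; // entity is tracked, so no Update() call
}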
Just alter the properties of target and call SaveChanges() - remove the Update call. I'd say the typical use case these days is for the input thing to not actually be a Thing but a ThingViewModel, ThingDto or some other variation on the theme of "an object that carries enough data to identify and update a Thing but isn't actually a DB entity". To that extent, if the notion of updating the properties of Thing from ThingViewModel by hand bores you, you can look at a mapper (AutoMapper is probably the best known, but there are many others) to do the copying for you, or even to set you up with a new Thing if you decide to turn this method into an Upsert, as sketched below.
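For illustration, a sketch of that upsert variant (ThingViewModel and the copied field are assumptions):
public bool Upsert(ThingViewModel model)
{
    Thing target = Context.Things.SingleOrDefault(a => a.Id == model.Id);
    if (target == null)
    {
        // no such row yet: create and insert it
        target = new Thing { Id = model.Id };
        Context.Things.Add(target);
    }

    // copy the fields the view model carries; target is tracked, so no Update() call
    target.Name = model.Name;

    return Context.SaveChanges() == 1;
}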

Mass Update a property on multiple records inside a dictionary (VB.NET / C#)

I have a Dictionary (of Long, Class), where Class has multiple properties (assume we have a property called Updated as Boolean).
I want to update this (Updated) property to (True) at once for, let's say, all odd-keyed records (or based on any specific rule). What is the best way to do so?
My thought is to use LINQ to fetch those records and then (for each) them, but is there a better way, like doing a mass update where a condition holds (like what we do in a database)?
An example of my approach is below. I'd appreciate it if there is a better way to do such an update...
Thanks
Dim ReturnedObjs = From Obj In Dictionary Where Obj.Key Mod 2 = 1
For Each item As KeyValuePair(Of Long, Class) In ReturnedObjs
    item.Value.Updated = True
Next
First, this sounds like an obvious case for the speed rant:
https://ericlippert.com/2012/12/17/performance-rant/
Second:
The best way is to keep this in the database. You are not going to beat the speed of a DB query with indexes designed for quick matching by transferring the data over the network twice (once to get it, once to return it) and doubling the search load (once to get all odd ones, once to update all the ones you just changed). My standing advice is to keep as much work as possible on the DB side. Your client code will never be able to beat it.
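For illustration, a sketch of the kind of set-based statement the server can run in one pass (table and column names, and the connection string, are assumptions):
// one round trip; the server applies the rule to every matching row
using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand(
    "UPDATE Items SET Updated = 1 WHERE Id % 2 = 1", conn))
{
    conn.Open();
    int rows = cmd.ExecuteNonQuery(); // number of rows the rule matched
}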
Third:
If you do need to use client side processing:
Now, a lot of my answer depends on the details of the implementation, how the JiT and general compiler optimisations work, etc.
foreach works on enumerators, not collections. But if you feed a collection to foreach, an enumerator is implicitly created. Now, enumerators have two relevant properties:
If the collection changes, the enumerator becomes invalid. Most people learn about enumerators because they ran into this issue.
It is an extra function call and set of checks for every access to the collection, so it will be a slowdown. How much is hard to say, as the optimisations and the JiT are pretty good.
So you probably want to use a for loop instead.
If you could turn the Dictionary into a collection where the primary key is used as the index, it might be a bit faster. But that has the danger of running into a lot of "dry spells" regarding data, so it depends a lot on your source data.
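For illustration, a C# sketch of the list-plus-for-loop idea (dict, Item and GetDictionary are assumed names; System.Linq provides ToList):
// snapshot the pairs once; the plain for loop then avoids per-step enumerator calls
Dictionary<long, Item> dict = GetDictionary(); // however the data is obtained
List<KeyValuePair<long, Item>> pairs = dict.ToList();
for (int i = 0; i < pairs.Count; i++)
{
    if (pairs[i].Key % 2 == 1) // the "odd keys" rule from the question
        pairs[i].Value.Updated = true;
}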

Restful API thread clashing

I currently have a shopping cart API that adds an item to the table when the item doesn't exist, and then increases the qty column each time the item is added again:
var exist = _context.Carts.Any(a => a.CartID == dto.CartSesID && a.SweetID == dto.SweetID);
if (!exist)
{
    // Create a new cart item if no cart item exists
    var cartItem = new Cart
    {
        SweetID = dto.SweetID,
        CartID = dto.CartSesID,
        Qty = qty,
        DateCreated = DateTime.Now
    };
    _context.Carts.Add(cartItem);
}
else
{
    var cartItem = _context.Carts.FirstOrDefault(a => a.CartID == dto.CartSesID && a.SweetID == dto.SweetID);
    // If the item does exist in the cart,
    // adjust the quantity accordingly
    if (type == "plus")
        cartItem.Qty = cartItem.Qty + qty;
    if (type == "minus")
        cartItem.Qty = cartItem.Qty - qty;
    if (cartItem.Qty == 0)
        _context.Carts.Remove(cartItem);
}
// Save changes
_context.SaveChanges();
The problem is that the if (!exist) check seems to think that the item doesn't exist when the button is clicked multiple times too fast (perhaps one request has not finished when another starts?), resulting in the same item being added as several rows instead of a single row with an increased quantity.
Does anyone know an ideal fix?
You have a race condition here. When two requests follow one another rapidly, the first request may not have committed its changes to the DB before the second one queries the existence of the item in question.
You need to apply some concurrency control to resolve this. Basically there are two ways to go:
Serializing the changes made to the DB. Again, there are two main options:
most RDBMSs support the serializable isolation level for transactions, or
you can use some locking mechanism. This can take place at the DB level (e.g. table locking) or the application level (.NET locking constructs), depending on your application architecture.
Applying a trial-and-error (or trial-and-retry-on-error, to be more precise) approach, aka optimistic concurrency control.
Obviously, serializing has a negative impact on performance (especially option 1.1), so usually optimistic concurrency control is preferred and the others are reserved for special cases.
Luckily, EF has built-in support for optimistic concurrency handling. All the details are discussed in this MSDN article.
In this particular case you have an even simpler option: define a compound unique constraint on the (CartID, SweetID) fields. Doing this guarantees that no duplicates can be inserted into the table. You get an exception when such an attempt is detected, and by catching it you can handle the situation according to your requirements. E.g. you can initiate an update instead (but keep in mind that even in this case you need optimistic concurrency checking to make the process absolutely fail-safe!)
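A sketch of that setup, assuming EF Core's fluent API (EF6 would use an index attribute or a raw migration instead):
// compound unique index so (CartID, SweetID) can never be duplicated
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    modelBuilder.Entity<Cart>()
        .HasIndex(c => new { c.CartID, c.SweetID })
        .IsUnique();
}

// at the call site, a duplicate insert surfaces as a DbUpdateException:
try
{
    _context.SaveChanges();
}
catch (DbUpdateException)
{
    // another request won the race; re-read the row and apply the
    // quantity change instead (still with optimistic concurrency checking)
}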
Footnote
Disabling the button is indeed just masking the problem. On the server side you cannot trust what JS does on the client side, because it is completely out of your control: JS can easily be disabled or modified by the user.
Update
Reading through my answer again, I feel there's a conclusion I should add:
In this particular case I think the best you can do is to
set the unique constraint as I suggested, but don't bother handling the exception thrown when a duplicate insert is detected, AND
disable the submit button on the client side for the duration of the submit.
This way you ensure that no invalid data can be stored in your DB even if someone manipulates your JS code. At the same time, you don't need to overcomplicate your data persisting logic.

Partially thread-safe dictionary

I have a class that maintains a private Dictionary instance that caches some data.
The class writes to the dictionary from multiple threads using a ReaderWriterLockSlim.
I want to expose the dictionary's values outside the class.
What is a thread-safe way of doing that?
Right now, I have the following:
public ReadOnlyCollection<MyClass> Values() {
    using (sync.ReadLock())
        return new ReadOnlyCollection<MyClass>(cache.Values.ToArray());
}
Is there a way to do this without copying the collection many times?
I'm using .Net 3.5 (not 4.0)
I want to expose the dictionary's values outside the class.
What is a thread-safe way of doing that?
You have three choices.
1) Make a copy of the data, hand out the copy. Pros: no worries about thread safe access to the data. Cons: Client gets a copy of out-of-date data, not fresh up-to-date data. Also, copying is expensive.
2) Hand out an object that locks the underlying collection when it is read from. You'll have to write your own read-only collection that has a reference to the lock of the "parent" collection. Design both objects carefully so that deadlocks are impossible. Pros: "just works" from the client's perspective; they get up-to-date data without having to worry about locking. Cons: More work for you.
3) Punt the problem to the client. Expose the lock, and make it a requirement that clients lock all views on the data themselves before using it. Pros: no work for you. Cons: way more work for the client, work they might not be willing or able to do. The risk of deadlocks, etc., now becomes the client's problem, not yours.
If you want a snapshot of the current state of the dictionary, there's really nothing else you can do with this collection type. This is the same technique used by the ConcurrentDictionary<TKey, TValue>.Values property.
If you don't mind an InvalidOperationException being thrown when the collection is modified while you are enumerating it, you could just return cache.Values, since it's read-only (and thus can't corrupt the dictionary data).
EDIT: I personally believe the below code is technically answering your question correctly (as in, it provides a way to enumerate over the values in a collection without creating a copy). Some developers far more reputable than I strongly advise against this approach, for reasons they have explained in their edits/comments. In short: This is apparently a bad idea. Therefore I'm leaving the answer but suggesting you not use it.
Unless I'm missing something, I believe you could expose your values as an IEnumerable<MyClass> without needing to copy values by using the yield keyword:
public IEnumerable<MyClass> Values {
    get {
        using (sync.ReadLock()) {
            foreach (MyClass value in cache.Values)
                yield return value;
        }
    }
}
Be aware, however (and I'm guessing you already knew this), that this approach provides lazy evaluation, which means that the Values property as implemented above can not be treated as providing a snapshot.
In other words... well, take a look at this code (I am of course guessing as to some of the details of this class of yours):
var d = new ThreadSafeDictionary<string, string>();
// d is empty right now
IEnumerable<string> values = d.Values;
d.Add("someKey", "someValue");
// if values were a snapshot, this would output nothing...
// but in FACT, since it is lazily evaluated, it will now have
// what is CURRENTLY in d.Values ("someValue")
foreach (string s in values) {
    Console.WriteLine(s);
}
So if it's a requirement that this Values property be equivalent to a snapshot of what is in cache at the time the property is accessed, then you're going to have to make a copy.
(begin 280Z28): The following is an example of how someone unfamiliar with the "C# way of doing things" could lock the code:
IEnumerator enumerator = obj.Values.GetEnumerator();
MyClass first = null;
if (enumerator.MoveNext())
    first = enumerator.Current;
// the enumerator is abandoned without being disposed,
// so the read lock taken inside the iterator is never released
(end 280Z28)
Consider one more possibility: expose just the ICollection interface, so that in Values() you can return your own implementation. That implementation would hold only a reference to Dictionary.Values and always take the ReadLock when accessing items.

C# Dictionary Loop Enhancment

I have a dictionary with around 1 million items. I am constantly looping through the dictionary:
public void DoAllJobs()
{
    foreach (KeyValuePair<uint, BusinessObject> p in _dictionnary)
    {
        if (p.Value.MustDoJob)
            p.Value.DoJob();
    }
}
The execution is a bit long, around 600 ms, and I would like to decrease it. Here are the constraints:
1. MustDoJob values mostly stay the same between two calls to DoAllJobs().
2. 60-70% of the MustDoJob values == false.
3. From time to time, MustDoJob changes for 200,000 pairs.
4. Some p.Value.DoJob() calls cannot be computed at the same time (COM object call).
5. Here, I do not need the key part of the _dictionnary object, but I really do need it somewhere else.
I wanted to do the following:
Parallelize, but I am not sure it is going to be effective due to 4.
Sort the dictionary, given 1. and 2. (and stop when I find the first MustDoJob == false), but I am wondering what 3. would result in.
I did not implement any of the previous ideas since it could be a lot of work, and I would like to investigate other options first. So... any ideas?
What I would suggest is that your business object could raise an event to indicate that it needs to do a job when MustDoJob becomes true. You can subscribe to that event, store references to those objects in a simple list, and then process the contents of that list when the DoAllJobs() method is called.
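A sketch of that idea (member and handler names are illustrative):
// BusinessObject announces when it needs work; the owner keeps a pending list
public class BusinessObject
{
    public event Action<BusinessObject> JobRequired;
    private bool mustDoJob;

    public bool MustDoJob
    {
        get { return mustDoJob; }
        set
        {
            mustDoJob = value;
            var handler = JobRequired;
            if (value && handler != null)
                handler(this); // notify subscribers that this object needs work
        }
    }

    public void DoJob() { /* ... */ }
}

// owner side: subscribe once per object, e.g. obj.JobRequired += pending.Add;
// DoAllJobs() then drains 'pending' instead of scanning the whole dictionary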
My first suggestion would be to use just the values from the dictionary:
foreach (BusinessObject value in _dictionnary.Values)
{
    if (value.MustDoJob)
    {
        value.DoJob();
    }
}
With LINQ this could be even easier:
foreach (BusinessObject value in _dictionnary.Values.Where(v => v.MustDoJob))
{
    value.DoJob();
}
That makes it clearer. However, it's not clear what else is actually causing you a problem. How quickly do you need to be able to iterate over the dictionary? I expect it's already pretty nippy... is anything actually wrong with this brute force approach? What's the impact of it taking 600ms to iterate over the collection? Is that 600ms when nothing needs to do any work?
One thing to note: you can't change the contents of the dictionary while you're iterating over it - whether in this thread or another. That means not adding, removing or replacing key/value pairs. It's okay for the contents of a BusinessObject to change, but the dictionary relationship between the key and the object can't change. If you want to minimise the time during which you can't modify the dictionary, you can take a copy of the list of references to objects which need work doing, and then iterate over that:
foreach (BusinessObject value in _dictionnary.Values
                                             .Where(v => v.MustDoJob)
                                             .ToList())
{
    value.DoJob();
}
Try using a profiler first. Constraint 4 makes me curious - 600 ms may not be that much if the COM object uses most of the time, and then it is either parallelize or live with it.
I would make sure first - with a profiler run - that you don't target the totally wrong issue here.
Having established that the loop really is the problem (see TomTom's answer), I would maintain a list of the items on which MustDoJob is true -- e.g., when MustDoJob is set, add it to the list, and when you process and clear the flag, remove it from the list. (This might be done directly by the code manipulating the flag, or by raising an event when the flag changes; depends on what you need.) Then you loop through the list (which is only going to be 60-70% of the length), not the dictionary. The list might contain the object itself or just its key in the dictionary, although it will be more efficient if it holds the object itself as you avoid the dictionary lookup. It does depend on how frequently you're queuing 200k of them, and how time-critical the queuing vs. the execution is.
But again: Step 1 is make sure you're solving the right problem.
The use of a dictionary to me implies that the intention is to find items by key rather than to visit every item. On the other hand, 600 ms for looping through a million items is respectable.
Perhaps alter your logic so that you can simply pick the relevant items satisfying the condition directly out of the dictionary.
Use a List of KeyValuePairs instead. This means you can iterate over it super-quickly by doing:
List<KeyValuePair<string, object>> list = ...;
int totalItems = list.Count;
for (int x = 0; x < totalItems; x++)
{
    // whatever you plan to do with them, you have access to both KEY and VALUE.
}
I know this post is old, but I was looking for a way to iterate over a dictionary without the overhead of an Enumerator being created (GC and all), or generally a faster way to iterate over it.
