Updating coordinated dictionaries in C# - c#

I have two dictionaries that are acting as a cache, let's call them d1 and d2, with d2 being a subset a of d1.
When I refresh the dictionaries with new entries of d1, I would like the entries in d2 to be refreshed as well. It can be done the obvious way:
foreach(var update in updates) {
d1[update.id] = update;
if(d2.Contains(update.id))
d2[update.id] = update;
}
But since I'd like to cache other subsets (d3,d4,d5, etc), this may get unwieldy. So I added a "CopyFrom" method to my object, which allows me to maintain the reference by simply just copying properties to the object being updated.
foreach(var update in updates) {
d1[update.id].CopyFrom(update)
}
In this way, any other dictionaries that have a reference to the entry won't lose it when d1 gets updated.
I'd just like to know if I'm missing anything here? I'm just getting back into C# after a break, and my grasp on the obvious may be shaky :).

Why not have a CacheItem class that contains the payload (the current values in your dictionaries). Then for each key in your dictionaries, store a CacheItem containing what you're currently storing. If you store the same CacheItem object in multiple dictionaries, you only need to modify the payload of a CacheItem, and all the dictionaries containing it are updated.
foreach(var update in updates) {
if (d1.ContainsKey(update.id)) {
var cacheItem = d1[update.id];
cacheItem.Payload = update;
} else {
d1[update.id] = new CacheItem(update);
}
}
My answer assumes that your design of having multiple dictionaries, some being subsets of the main one is based on your requirements, and is a sound way to address them. It seems a little unusual to me.

Well if the value of the Dictionary is a reference type, and you're only modifying it, you wouldn't need to do anything. If you are creating a new reference or it is a value type, I'd change the subsets to be arrays of the type of id which is the subset you'd like to have. And whenever you need the value, you'd still go to d1 for accessing it.

As for all types of cache keep in mind that a cache with a bad policy is another name for a memory leak.
Further: Caching Implies Policy.

Related

Update only the changes that are different

I have a Entity-Set employee_table, I'm getting the data through excel sheet which I have loaded in the memory and the user will click Save to save the changes in the db and its all good for the first time inserting records and no issue with that.
but how can I update only the changes that are made? meaning that, let say I have 10 rows and 5 columns and out of 10 rows say row 7th was modified and out of 5 column let say 3rd column was modified and I just need to update only those changes and keep the existing value of the other columns.
I can do with checking if (myexistingItem.Name != dbItem.Name) { //update } but its very tedious and not efficient and I'm sure there is a better way to handle.
here is what I got so far.
var excelData = SessionWrapper.GetSession_Model().DataModel.OrderBy(x => x.LocalName).ToList();;
var dbData = context.employee_master.OrderBy(x => x.localname).ToList();
employee_master = dbEntity = employee_master();
if (dbData.Count > 0)
{
//update
foreach (var dbItem in dbData)
{
foreach(var xlData in excelData)
{
if(dbItem.customer == xlData.Customer)
{
dbEntity.customer = xlData.Customer;
}
//...do check rest of the props....
db.Entry(dbEntity).State = EntityState.Modified;
db.employee_master.Add(dbEntity);
}
}
//save
db.SaveChanges();
}
else
{
//insert
}
You can make this checking more generic with reflection.
Using this answer to get the value by property name.
public static object GetPropValue(object src, string propName)
{
return src.GetType().GetProperty(propName).GetValue(src, null);
}
Using this answer to set the value by property name.
public static void SetPropertyValue(object obj, string propName, object value)
{
obj.GetType().GetProperty(propName).SetValue(obj, value, null);
}
And this answer to list all properties
public static void CopyIfDifferent(Object target, Object source)
{
foreach (var prop in target.GetType().GetProperties())
{
var targetValue = GetPropValue(target, prop.Name);
var sourceValue = GetPropValue(source, prop.Name);
if (!targetValue.Equals(sourceValue))
{
SetPropertyValue(target, prop.Name, sourceValue);
}
}
}
Note: If you need to exclude some properties, you can implement very easy by passsing the list of properties to the method and you can check in if to be excluded or not.
Update:
I am updating this answer to provide a little more context as to why I suggested not going with a hand-rolled reflection-based solution for now; I also want to clarify that there is nothing wrong with a such a solution per se, once you have identified that it fits the bill.
First of all, I assume from the code that this is a work in progress and therefore not complete. In that case, I feel that the code doesn't need more complexity before it's done and a hand-rolled reflection-based approach is more code for you to write, test, debug and maintain.
For example, right now you seem to have a situation where there is a simple 1:1 simple copy from the data in the excel to the data in the employee_master object. So in that case reflection seems like a no-brainer, because it saves you loads of boring manual property assignments.
But what happens when HR (or whoever uses this app) come back to you with the requirement: If Field X is blank on the Excel sheet, then copy the value "Blank" to the target field, unless it's Friday, in which case copy the value "N.A".
Now a generalised solution has to accomodate custom business logic and could start to get burdensome. I have been in this situation and unless you are very careful, it tends to end up with turning a mess in the long run.
I just wanted to point this out - and recommend at least looking at Automapper, because this already provides one very proven way to solve your issue.
In terms of efficiency concerns, they are only mentioned because the question mentioned them, and I wanted to point out that there are greater inefficiencies at play in the code as posted as opposed to the inefficiency of manually typing 40+ property assignments, or indeed the concern of only updating changed fields.
Why not rewrite the loop:
foreach (var xlData in excelData)
{
//find existing record in database data:
var existing = dbData.FirstOrDefault(d=>d.customer==xlData.Customer);
if(existing!=null)
{
//it already exists in database, update it
//see below for notes on this.
}
else
{
//it doesn't exist, create employee_master and insert it to context
//or perform validation to see if the insert can be done, etc.
}
//and commit:
context.SaveChanges();
}
This lets you avoid the initial if(dbData.Count>0) because you will always insert any row from the excel sheet that doesn't have a matching entry in dbData, so you don't need a separate block of code for first-time insertion.
It's also more efficient than the current loop because right now you are iterating every object in dbData for every object in xlData; that means if you have 1,000 items in each you have a million iterations...
Notes on the update process and efficiency in general
(Note: I know the question wasn't directly about efficiency, but since you mentioned it in the context of copying properties, I just wanted to give some food for thought)
Unless you are building a system that has to do this operation for multiple entities, I'd caution against adding more complexity to your code by building a reflection-based property copier.
If you consider the amount of properties you have to copy (i.e the number of foo.x = bar.x type statements) , and then consider the code required to have a robust, fully tested and provably efficient reflection-based property copier (i.e with built-in cache so you don't have to constantly re-reflect type properties, a mechanism to allow you to specify exceptions, handling for edge cases where for whatever unknown reason you discover that for random column X the value "null" is to be treated a little differently in some cases, etc), you may well find that the former is actually significantly less work :)
Bear in mind that even the fastest reflection-based solution will always still be slower than a good old fashioned foo.x = bar.x assignment.
By all means if you have to do this operation for 10 or 20 separate entities, consider the general case, otherwise my advice would be, manually write the property copy assignments, get it right and then think about generalising - or look at Automapper, for example.
In terms of only updating field that have changed - I am not sure you even need to. If the record exists in the database, and the user has just presented a copy of that record which they claim to be the "correct" version of that object, then just copy all the values they gave and save them to the database.
The reason I say this is because in all likelihood, the efficiency of only sending for example 4 modified fields versus 25 or whatever, pales into insignificance next to the overhead of the actual round-trip to the database itself; I'd be surprised if you were able to observe a meaningful performance increase in these kinds of operations by not sending all columns - unless of course all columns are NVARCHAR(MAX) or something :)
If concurrency is an issue (i.e. other uses might be modifying the same data) then include a ROWVERSION type column in the database table, map it in Entity Framework, and handle the concurrency issues if and when they arise.

Is there a LinkedList collection that supports dictionary type operations

I was recently profiling an application trying to work out why certain operations were extremely slow. One of the classes in my application is a collection based on LinkedList. Here's a basic outline, showing just a couple of methods and some fluff removed:
public class LinkInfoCollection : PropertyNotificationObject, IEnumerable<LinkInfo>
{
private LinkedList<LinkInfo> _items;
public LinkInfoCollection()
{
_items = new LinkedList<LinkInfo>();
}
public void Add(LinkInfo item)
{
_items.AddLast(item);
}
public LinkInfo this[Guid id]
{ get { return _items.SingleOrDefault(i => i.Id == id); } }
}
The collection is used to store hyperlinks (represented by the LinkInfo class) in a single list. However, each hyperlink also has a list of hyperlinks which point to it, and a list of hyperlinks which it points to. Basically, it's a navigation map of a website. As this means you can having infinite recursion when links go back to each other, I implemented this as a linked list - as I understand it, it means for every hyperlink, no matter how many times it is referenced by another hyperlink, there is only ever one copy of the object.
The ID property in the above example is a GUID.
With that long winded description out the way, my problem is simple - according to the profiler, when constructing this map for a fairly small website, the indexer referred to above is called no less than 27906 times. Which is an extraordinary amount. I still need to work out if it's really necessary to be called that many times, but at the same time, I would like to know if there's a more efficient way of doing the indexer as this is the primary bottleneck identified by the profiler (also assuming it isn't lying!). I still needed the linked list behaviour as I certainly don't want more than one copy of these hyperlinks floating around killing my memory, but I also do need to be able to access them by a unique key.
Does anyone have any advice to offer on improving the performance of this indexer. I also have another indexer which uses a URI rather than a GUID, but this is less problematic as the building incoming/outgoing links is done by GUID.
Thanks;
Richard Moss
You should use a Dictionary<Guid, LinkInfo>.
You don't need to use LinkedList in order to have only one copy of each LinkInfo in memory. Remember that LinkInfo is a managed reference type, and so you can place it in any collection, and it'll just be a reference to the object that gets placed in the list, not a copy of the object itself.
That said, I'd implement the LinkInfo class as containing two lists of Guids: one for the things this links to, one for the things linking to this. I'd have just one Dictionary<Guid, LinkInfo> to store all the links. Dictionary is a very fast lookup, I think that'll help with your performance.
The fact that this[] is getting called 27,000 times doesn't seem like a big deal to me, but what's making it show up in your profiler is probably the SingleOrDefault call on the LinkedList. Linked lists are best for situations where you need fast insertions & removals, particularly in the middle of the list. For quick lookups, which is probably more important here, let the Dictionary do its work with hash tables.

Partially thread-safe dictionary

I have a class that maintains a private Dictionary instance that caches some data.
The class writes to the dictionary from multiple threads using a ReaderWriterLockSlim.
I want to expose the dictionary's values outside the class.
What is a thread-safe way of doing that?
Right now, I have the following:
public ReadOnlyCollection<MyClass> Values() {
using (sync.ReadLock())
return new ReadOnlyCollection<MyClass>(cache.Values.ToArray());
}
Is there a way to do this without copying the collection many times?
I'm using .Net 3.5 (not 4.0)
I want to expose the dictionary's values outside the class.
What is a thread-safe way of doing that?
You have three choices.
1) Make a copy of the data, hand out the copy. Pros: no worries about thread safe access to the data. Cons: Client gets a copy of out-of-date data, not fresh up-to-date data. Also, copying is expensive.
2) Hand out an object that locks the underlying collection when it is read from. You'll have to write your own read-only collection that has a reference to the lock of the "parent" collection. Design both objects carefully so that deadlocks are impossible. Pros: "just works" from the client's perspective; they get up-to-date data without having to worry about locking. Cons: More work for you.
3) Punt the problem to the client. Expose the lock, and make it a requirement that clients lock all views on the data themselves before using it. Pros: No work for you. Cons: Way more work for the client, work they might not be willing or able to do. Risk of deadlocks, etc, now become the client's problem, not your problem.
If you want a snapshot of the current state of the dictionary, there's really nothing else you can do with this collection type. This is the same technique used by the ConcurrentDictionary<TKey, TValue>.Values property.
If you don't mind throwing an InvalidOperationException if the collection is modified while you are enumerating it, you could just return cache.Values since it's readonly (and thus can't corrupt the dictionary data).
EDIT: I personally believe the below code is technically answering your question correctly (as in, it provides a way to enumerate over the values in a collection without creating a copy). Some developers far more reputable than I strongly advise against this approach, for reasons they have explained in their edits/comments. In short: This is apparently a bad idea. Therefore I'm leaving the answer but suggesting you not use it.
Unless I'm missing something, I believe you could expose your values as an IEnumerable<MyClass> without needing to copy values by using the yield keyword:
public IEnumerable<MyClass> Values {
get {
using (sync.ReadLock()) {
foreach (MyClass value in cache.Values)
yield return value;
}
}
}
Be aware, however (and I'm guessing you already knew this), that this approach provides lazy evaluation, which means that the Values property as implemented above can not be treated as providing a snapshot.
In other words... well, take a look at this code (I am of course guessing as to some of the details of this class of yours):
var d = new ThreadSafeDictionary<string, string>();
// d is empty right now
IEnumerable<string> values = d.Values;
d.Add("someKey", "someValue");
// if values were a snapshot, this would output nothing...
// but in FACT, since it is lazily evaluated, it will now have
// what is CURRENTLY in d.Values ("someValue")
foreach (string s in values) {
Console.WriteLine(s);
}
So if it's a requirement that this Values property be equivalent to a snapshot of what is in cache at the time the property is accessed, then you're going to have to make a copy.
(begin 280Z28): The following is an example of how someone unfamiliar with the "C# way of doing things" could lock the code:
IEnumerator enumerator = obj.Values.GetEnumerator();
MyClass first = null;
if (enumerator.MoveNext())
first = enumerator.Current;
(end 280Z28)
Review next possibility, just exposes ICollection interface, so in Values() you can return your own implementation. This implementation will use only reference on Dictioanry.Values and always use ReadLock for access items.

C# Dictionary Loop Enhancment

I have a dictionary with around 1 milions items. I am constantly looping throw the dictionnary :
public void DoAllJobs()
{
foreach (KeyValuePair<uint, BusinessObject> p in _dictionnary)
{
if(p.Value.MustDoJob)
p.Value.DoJob();
}
}
The execution is a bit long, around 600 ms, I would like to deacrese it. Here is the contraints :
MustDoJob values mostly stay the same beetween two calls to DoAllJobs()
60-70% of the MustDoJob values == false
From time to times MustDoJob change for 200 000 pairs.
Some p.Value.DoJob() can not be computed at the same time (COM object call)
Here, I do not need the key part of the _dictionnary objet but I really do need it somewhere else
I wanted to do the following :
Parallelizes but I am not sure is going to be effective due to 4.
Sorts the dictionnary since 1. and 2. (and stop want I find the first MustDoJob == false) but I am wondering what 3. would result in
I did not implement any of the previous ideas since it could be a lot of job and I would like to investigate others options before. So...any ideas ?
What I would suggest is that your business object could raise an event to indicate that it needs to do a job when MustDoJob becomes true and you can subscribe to that event and store references to those objects in a simple list and then process the contents of that list when the DoAllJobs() method is called
My first suggestion would be to use just the values from the dictionary:
foreach (BusinessObject> value in _dictionnary.Values)
{
if(value.MustDoJob)
{
value.DoJob();
}
}
With LINQ this could be even easier:
foreach (BusinessObject value in _dictionnary.Values.Where(v => v.MustDoJob))
{
value.DoJob();
}
That makes it clearer. However, it's not clear what else is actually causing you a problem. How quickly do you need to be able to iterate over the dictionary? I expect it's already pretty nippy... is anything actually wrong with this brute force approach? What's the impact of it taking 600ms to iterate over the collection? Is that 600ms when nothing needs to do any work?
One thing to note: you can't change the contents of the dictionary while you're iterating over it - whether in this thread or another. That means not adding, removing or replacing key/value pairs. It's okay for the contents of a BusinessObject to change, but the dictionary relationship between the key and the object can't change. If you want to minimise the time during which you can't modify the dictionary, you can take a copy of the list of references to objects which need work doing, and then iterate over that:
foreach (BusinessObject value in _dictionnary.Values
.Where(v => v.MustDoJob)
.ToList())
{
value.DoJob();
}
Try using a profiler first. 4 makes me curious - 600ms may not be that much if the COM object uses most of the time, and then it is either paralellize or live with it.
I would get sure first - with a profiler run - that you dont target the totally wrong issue here.
Having established that the loop really is the problem (see TomTom's answer), I would maintain a list of the items on which MustDoJob is true -- e.g., when MustDoJob is set, add it to the list, and when you process and clear the flag, remove it from the list. (This might be done directly by the code manipulating the flag, or by raising an event when the flag changes; depends on what you need.) Then you loop through the list (which is only going to be 60-70% of the length), not the dictionary. The list might contain the object itself or just its key in the dictionary, although it will be more efficient if it holds the object itself as you avoid the dictionary lookup. It does depend on how frequently you're queuing 200k of them, and how time-critical the queuing vs. the execution is.
But again: Step 1 is make sure you're solving the right problem.
The use of a dictionary to me implies that the intention is to find items by a key, rather than visit every item. On the other hand, 600ms for looping through a million items is respectable.
Perhaps alter your logic so that you can simply pick the relevant items satisfying the condition directly out of the dictionary.
Use a List of KeyValuePairs instead. This means you can iterate over it super-quickly by doing
List<KeyValuePair<string,object>> list = ...;
int totalItems = list.Count;
for (int x = 0; x < totalItems; x++)
{
// whatever you plan to do with them, you have access to both KEY and VALUE.
}
I know this post is old, but I was looking for a way to iterate over a dictionary without the increased overhead of the Enumerator being created (GC and all), or generally a faster way to iterate over it.

How would you obtain the first and last items in a Queue?

Say I have a rolling collection of values where I specify the size of the collection and any time a new value is added, any old values beyond this specified size are dropped off. Obviously (and I've tested this) the best type of collection to use for this behavior is a Queue:
myQueue.Enqueue(newValue)
If myQueue.Count > specifiedSize Then myQueue.Dequeue()
However, what if I want to calculate the difference between the first and last items in the Queue? Obviously I can't access the items by index. But to switch from a Queue to something implementing IList seems like overkill, as does writing a new Queue-like class. Right now I've got:
Dim firstValue As Integer = myQueue.Peek()
Dim lastValue As Integer = myQueue.ToArray()(myQueue.Count - 1)
Dim diff As Integer = lastValue - firstValue
That call to ToArray() bothers me, but a superior alternative isn't coming to me. Any suggestions?
One thing you could do is have a temporary variable that stores the value that was just enqueued because that will be the last value and so the variable can be accessed to get that value.
Seems to me if you need quick access to the first item in the list, then you're using the wrong data structure. Switch a LinkedList instead, which conveniently has First and Last properties.
Be sure you only add and remove items to the linked list using AddLast and RemoveFirst to maintain the Queue property. To prevent yourself from inadvertantly violating the Queue property, consider creating a wrapper class around the linked list and exposing only the properties you need from your queue.
public class LastQ<T> : Queue<T>
{
public T Last { get; private set; }
public new void Enqueue(T item)
{
Last = item;
base.Enqueue(item);
}
}
Edit:
Obviously this basic class should be more robust to do things like protect the Last property on an empty queue. But this should be enough for the basic idea.
You could use a deque (double-ended queue).
I don't think there is one built into System.Collections(.Generic) but here's some info on the data structure. If you implemented something like this you could just use PeekLeft() and PeekRight() to get the first and last values.
Of course, it will be up to you whether or not implementing your own deque is preferable to dealing with the unsexiness of ToArray(). :)
http://www.codeproject.com/KB/recipes/deque.aspx
Your best bet would be to keep track of the last value added to the Queue, then use the myQueue.Peek() function to see the "first" (meaning next) item in the list without removing it.

Categories