Converting Object.GetHashCode() to Guid

I need to assign a Guid to objects to manage state at app startup and shutdown.
It looks like I can store the lookup values in a dictionary (a Dictionary<int, Guid> named dict) using
dict.Add(instance.GetHashCode(), myGUID());
Are there any potential issues to be aware of here?
NOTE
The hash code does NOT need to persist between execution runs; only the Guid does. The flow is:
create the object
call GetHashCode() and associate the result with a new or old Guid
before the app terminates, call GetHashCode() again and look up the Guid to update() or insert() into the persistence engine USING THE GUID
The only assumption is that GetHashCode() remains consistent while the process is running.
Also, GetHashCode() is always called on the same object type (derived from Window).
Update 2 - here is the bigger picture
create a state machine to store info about WPF user controls (later ref as UC) between runs
the types of user controls can change over time (added / removed)
in the very first run there is no prior state; the user interacts with a subset of UCs and modifies their state, which needs to be recreated when the app restarts
this state snapshot is taken when the app has a normal shutdown
also there can be multiple instances of a UC type
at shutdown, each instance is assigned a guid and saved along with the type info and the state info
all these guids are also stored in a collection
at restart, for each guid, create object, store ref/guid, restore state per instance so the app looks exactly as before
the user may add or remove UC instances/types and otherwise interact with the system
at shutdown, the state is saved again
the choice at this time is to delete all prior state and insert the new state info into the persistence layer (SQL db)
observation and analysis over time show that many instances remain static and do not change, so their state need not be deleted and inserted again; the state info is now quite large and is stored in a non-local db
so only the change delta is persisted
to compute the delta, need to track reference lifetimes
currently stored as List<WeakReference> at startup
on shutdown, iterate through this list and actual UC present on screen, add / update / delete keys accordingly
send delta over to persistence
Hope the above makes it clear.
So now the question is: why not just store the hash code (of the user control only) instead of a WeakReference, and eliminate the test for a null reference while iterating through the list?
Update 3 - thanks all, going to use WeakReference in the end

Use GetHashCode to balance a hash table. That's what it's for. Do not use it for some other purpose that it was not designed for; that's very dangerous.

You appear to be assuming that a hash code will be unique. Hash codes don't work like that. See Eric Lippert's blog post on Guidelines and rules for GetHashCode for more details, but basically you should only ever make the assumptions which are guaranteed for well-behaved types, namely that if two objects have different hash codes, they're definitely unequal. If they have the same hash code, they may be equal, but may not be.
EDIT: As noted, you also shouldn't persist hash codes between execution runs. There's no guarantee they'll be stable in the face of restarts. It's not really clear exactly what you're doing, but it doesn't sound like a good idea.
EDIT: Okay, you've now noted that it won't be persistent, so that's a good start - but you still haven't dealt with the possibility of hash code collisions. Why do you want to call GetHashCode() at all? Why not just add the reference to the dictionary?
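To make the collision risk concrete, here is a minimal sketch. Widget and CollisionDemo are hypothetical names invented for this illustration, and the constant hash code is deliberately extreme: it is legal, and it forces the collision that real types only hit occasionally.
```csharp
using System;
using System.Collections.Generic;

// Hypothetical type whose hash codes collide on purpose, to make the problem visible.
public sealed class Widget
{
    public string Name { get; }
    public Widget(string name) { Name = name; }

    // Legal but deliberately poor: every instance reports the same hash code.
    public override int GetHashCode() => 42;
}

public static class CollisionDemo
{
    // Returns true when keying by hash code rejects the second, distinct object.
    public static bool HashKeyedAddFails()
    {
        var byHash = new Dictionary<int, Guid>();
        byHash.Add(new Widget("a").GetHashCode(), Guid.NewGuid());
        try
        {
            byHash.Add(new Widget("b").GetHashCode(), Guid.NewGuid());
            return false;
        }
        catch (ArgumentException)
        {
            return true; // duplicate key: the second object cannot be registered
        }
    }

    // Keying by the object itself keeps both entries, because Equals
    // (reference equality by default) resolves the shared hash bucket.
    public static int InstanceKeyedCount()
    {
        var byInstance = new Dictionary<Widget, Guid>();
        byInstance.Add(new Widget("a"), Guid.NewGuid());
        byInstance.Add(new Widget("b"), Guid.NewGuid());
        return byInstance.Count;
    }
}
```
With the int key the second Add throws; with the instance as the key both objects are stored, which is the point of adding the reference itself to the dictionary.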

The quick and easy fix seems to be
var dict = new Dictionary<InstanceType, Guid>();
dict.Add(instance, myGUID());
Of course you need to implement InstanceType.Equals correctly if it isn't yet (or implement IEquatable<InstanceType>).
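If identity rather than value equality is what matters (each control instance gets its own Guid even when two instances compare equal), one option is a comparer based on reference equality. This is only a sketch: ReferenceComparer and GuidRegistry are hypothetical names, not part of the question's code, and the registry holds strong references, so it keeps its keys alive.
```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Compares keys by object identity, ignoring any Equals/GetHashCode overrides.
public sealed class ReferenceComparer<T> : IEqualityComparer<T> where T : class
{
    public bool Equals(T x, T y) => ReferenceEquals(x, y);
    public int GetHashCode(T obj) => RuntimeHelpers.GetHashCode(obj);
}

// Hypothetical helper: hands out one Guid per distinct instance.
public sealed class GuidRegistry<T> where T : class
{
    private readonly Dictionary<T, Guid> _ids =
        new Dictionary<T, Guid>(new ReferenceComparer<T>());

    public Guid GetOrAssign(T instance)
    {
        if (!_ids.TryGetValue(instance, out var id))
        {
            id = Guid.NewGuid();
            _ids.Add(instance, id);
        }
        return id;
    }
}
```
Two strings with the same characters but different identities, for example, get different Guids from this registry, while asking twice for the same instance returns the same Guid.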

Possible issues I can think of:
Hash code collisions could give you duplicate dictionary keys
Different objects' hash algorithms could give you the same hash code for two functionally different objects; you wouldn't know which object you're working with
This implementation is prone to ambiguity (as described above); you may need to store more information about your objects than just their hash codes.
Note - Jon said this more elegantly (see above)

Since this is for WPF controls, why not just add the Guid as a dependency property? You seem to already be iterating through the user controls in order to get their hash codes, so this would probably be a simpler method.
If you want to capture that a control was removed and which Guid it had, a manager object that subscribes to closing/removed events and stores the Guid and a few other details would be a good idea. You would then also find it easier to capture more details for analysis if you need them.
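A rough sketch of such a manager, kept free of WPF types so it stands alone. ITrackedControl, DemoControl, ControlTracker and the Closed event are all hypothetical stand-ins; a real user control would expose its Guid through a property (possibly a dependency property) and raise whichever closing or unloaded event fits.
```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a user control that carries its own Guid.
public interface ITrackedControl
{
    Guid StateId { get; }
    event EventHandler Closed;
}

// Minimal implementation used only for illustration.
public sealed class DemoControl : ITrackedControl
{
    public Guid StateId { get; } = Guid.NewGuid();
    public event EventHandler Closed;
    public void Close() => Closed?.Invoke(this, EventArgs.Empty);
}

// Records which Guids are still live and which were removed during this run.
public sealed class ControlTracker
{
    private readonly HashSet<Guid> _live = new HashSet<Guid>();
    private readonly List<Guid> _removed = new List<Guid>();

    public IReadOnlyList<Guid> Removed => _removed;
    public bool IsLive(Guid id) => _live.Contains(id);

    public void Track(ITrackedControl control)
    {
        _live.Add(control.StateId);
        control.Closed += (sender, e) =>
        {
            // Move the Guid from the live set to the removed list exactly once.
            if (_live.Remove(control.StateId))
                _removed.Add(control.StateId);
        };
    }
}
```
At shutdown, Removed gives the Guids to delete from persistence, and the live set gives the ones to insert or update.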

Related

Keeping class state valid vs performance

If I have a public method that returns a reference type value which is a private field in the current class, do I need to return a copy of it? In my case I need to return a List, but this method is called very often and my list holds ~100 items. The point is that if I return the same variable, everybody can modify it, but if I return a copy, the performance will degrade. In my case I'm trying to generate a sudoku table, which is not a fast procedure.
The internal class SudokuTable holds the values with their possible values. The public class SudokuGame handles UI requests and generates/solves a SudokuTable. Is it good practice to choose performance over OOP principles? If someone wants to make another library using my SudokuTable class, he won't be aware that he can break its state by modifying the List that it returns.

Performance and object-oriented programming are not mutually exclusive - your code can be object-oriented and perform badly, etc.
In the case you state here I don't think it would be wise to allow external parts to edit the internal state of a thing, so I would return an array or ReadOnlyCollection of the entries. (Another possibility would be to use an ObservableCollection, monitor it for out-of-bounds tampering, and handle that accordingly, say with an exception; I'm unsure how desirable this would be.)
From there, you might consider how you expose access to these entries, trying to minimise the need for callers to get the full collection when all they need is to look up and return a specific one.
It's worth noting that an uneditable collection doesn't necessarily mean the state cannot be altered, either; if the entries are represented by a reference type rather than a value type then returning an entry leaves that open to tampering (potentially, depending on the class definition), so you might be better off with structs for the entry types.
Ultimately, without a concrete example of where you're having problems, this is a bit subjective and theoretical at the moment. Have you tried restricting the collection? And if so, how was the performance? Where were the issues? And so on.
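As a sketch of the read-only route: List<T>.AsReadOnly() returns a wrapper over the same list rather than a copy, so handing it out on every call costs nothing extra. The cell layout below (81 ints) and the SetCell method are assumptions made for the example, not the asker's actual SudokuTable.
```csharp
using System.Collections.Generic;
using System.Collections.ObjectModel;

public sealed class SudokuTable
{
    // Assumed layout for illustration: 81 cells, 0 meaning "empty".
    private readonly List<int> _cells = new List<int>(new int[81]);
    private readonly ReadOnlyCollection<int> _view;

    public SudokuTable()
    {
        // Created once; the wrapper always reflects the current list contents.
        _view = _cells.AsReadOnly();
    }

    // Callers can read and enumerate, but cannot add, remove or assign.
    public IReadOnlyList<int> Cells => _view;

    public void SetCell(int index, int value)
    {
        _cells[index] = value;
    }
}
```
Because the elements here are ints (a value type), callers cannot tamper with individual entries either; with reference-type entries the caveat about mutable elements above still applies.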

Call methods from set accessor or force the user to do manually

When a property is updated is it good practice to change other properties based on this or should you force the user to call a method directly? For example:
someObject.TodaysTotalSales = 1234.56;
Would it be OK to have the set accessor update another value say ThisYearsTotalSales or should you force the end user to do it manually.
someObject.TodaysTotalSales = 1234.56;
someObject.UpdateThisYearsTotal();

I think the best practice is to recalculate the yearly total only when it is accessed. Otherwise, if you update the TodaysTotalSales property very often, you will compute the yearly total for nothing.
More generally, when you call a property setter, you don't expect a complex operation. By convention, getters and setters are expected to return almost immediately.
If your algorithm is too complex, you can use a cached value to avoid a recalculation at each call; you invalidate the cached value when one of its prerequisites has changed.
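A minimal sketch of that cache-and-invalidate idea. SalesLedger, the prior-days total and the Recalculations counter are made up for the example; the real yearly calculation would replace the simple addition.
```csharp
public sealed class SalesLedger
{
    private readonly decimal _priorDaysTotal;   // assumed input: the year's sales before today
    private decimal _todaysTotalSales;
    private decimal? _thisYearsTotalCache;      // null means "stale, recompute on next read"

    public SalesLedger(decimal priorDaysTotal)
    {
        _priorDaysTotal = priorDaysTotal;
    }

    // Counts how often the (potentially expensive) calculation actually ran.
    public int Recalculations { get; private set; }

    public decimal TodaysTotalSales
    {
        get { return _todaysTotalSales; }
        set
        {
            _todaysTotalSales = value;
            _thisYearsTotalCache = null;        // cheap: just invalidate
        }
    }

    public decimal ThisYearsTotalSales
    {
        get
        {
            if (_thisYearsTotalCache == null)
            {
                _thisYearsTotalCache = _priorDaysTotal + _todaysTotalSales;
                Recalculations++;
            }
            return _thisYearsTotalCache.Value;
        }
    }
}
```
The setter stays trivial, the caller never needs an UpdateThisYearsTotal() call, and repeated writes without a read in between cost no recalculation at all.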

It depends.
Does the user need to know the old ThisYearsTotalSales even after updating TodaysTotalSales?
Yes -> Provide an additional method, someObject.UpdateThisYearsTotal(), and at the same time flag that the yearly total has not been updated although TodaysTotalSales has, so you can throw an error at the end of the process if needed
No -> Auto-update the other properties whose previous values are not needed once TodaysTotalSales has been updated

TL;DR: it depends
I assume you have public interface of a class in mind.
If you follow OOP Encapsulation principle to the limit, then someObject's externally visible state should be consistent with every public access, i.e. you shouldn't need any public UpdateState methods. So in this case someObject.UpdateThisYearsTotal() is a no-no. What happens internally: be it lazy recalculation, caching, private UpdateAllInternal - would not matter.
But OOP is not an icon/idol - so for performance reasons you may design program flow as you see fit. For example: deferred bulk data processing, game loop, Entity Component System design, ORMs - those systems clearly state in their docs (rarely in code contracts) the way they are supposed to be used.

Is it possible to create a truly weak-keyed dictionary in C#?

I'm trying to nut out the details for a true WeakKeyedDictionary<,> for C#... but I'm running into difficulties.
I realise this is a non-trivial task, but the seeming inability to declare a WeakKeyedKeyValuePair<,> (where the GC only follows the value reference if the key is reachable) makes it seemingly impossible.
There are two main problems I see:
1. Every implementation I've seen so far does not trim values after keys have been collected. Think about that - one of the main reasons for using such a Dictionary is to prevent those values being kept around (not just the keys!) as they're unreachable, but here they are left pointed to by strong references.
Yes, add/remove from the Dictionary enough and they'll eventually be replaced, but what if you don't?
2. Without a hypothetical WeakKeyedKeyValuePair<,> (or another means of telling the GC to only mark the value if the key is reachable) any value that refers to its key would never be collected. This is a problem when storing arbitrary values.
Problem 1 could be tackled in a fairly non-ideal/hackish way : use GC Notifications to wait for a full GC to complete, and then go along and prune the dictionary in another thread. This one I'm semi-ok with.
But problem 2 has me stumped. I realise this is easily countered by a "so don't do that", but it has me wondering - is this problem even possible to solve?

Have a look at the ConditionalWeakTable<TKey, TValue> Class.
Enables compilers to dynamically attach object fields to managed objects.
It's essentially a dictionary in which the keys are held weakly and each value is kept alive only as long as its key is alive, even if the value refers back to the key.
Note! This class does not use GetHashCode and Equals to do equality comparisons, it uses ReferenceEquals.
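A short sketch of attaching a Guid to arbitrary objects this way. ConditionalWeakTable requires a reference type as the value, so the Guid is wrapped in a small holder class; InstanceIds and IdHolder are hypothetical names for this example.
```csharp
using System;
using System.Runtime.CompilerServices;

public static class InstanceIds
{
    // The table's values must be reference types, so the Guid lives in a holder.
    private sealed class IdHolder
    {
        public readonly Guid Value = Guid.NewGuid();
    }

    private static readonly ConditionalWeakTable<object, IdHolder> Table =
        new ConditionalWeakTable<object, IdHolder>();

    // Returns the Guid attached to this exact instance, creating it on first use.
    // The entry does not keep the instance alive and goes away with it.
    public static Guid GetId(object instance)
    {
        return Table.GetValue(instance, key => new IdHolder()).Value;
    }
}
```
Lookups go by reference identity, so two distinct objects get different Guids even if they are Equals to each other.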

Hashing the state of a complex object in .NET

Some background information:
I am working on a C#/WPF application, which basically is about creating, editing, saving and loading some data model.
The data model consists of a hierarchy of various objects. There is a "root" object of class A, which has a list of objects of class B, which each have a list of objects of class C, etc. Around 30 classes are involved in total.
Now my problem is that I want to prompt the user with the usual "you have unsaved changes, save?" dialog if he tries to exit the program. But how do I know if the data in the currently loaded model has actually changed?
There are of course ways to solve this, e.g. reloading the model from file and comparing it against the one in memory value by value, or making every UI control set a flag indicating the model has been changed. Instead, I want to create a hash value based on the model state on load, generate a new value when the user tries to exit, and compare those two.
Now the question:
So, inspired by that, I was wondering if there exists some way to generate a hash value from the (value) state of some arbitrary complex object? Preferably in a generic way, e.g. with no need to apply attributes to each involved class/field.
One idea could be to use some of .NET's serialization functionality (assuming it will work out-of-the-box in this case) and apply a hash function to the content of the resulting file. However, I guess there exist some more suitable approach.
Thanks in advance.
Edit:
Point taken about the hashing and possible collisions. Instead, I am going for deep comparing value by value. I am already using the XML serializer for persistence, so I am just going to serialize and compare chars. Not pretty, but it does the trick in this case.
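For reference, the serialize-and-compare approach from the edit could look roughly like this. ModelSnapshot and the Document class are hypothetical; Document merely stands in for the real root model, which XmlSerializer needs to be a public type with a parameterless constructor.
```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Hypothetical stand-in for the root of the data model.
public class Document
{
    public string Title { get; set; }
    public List<string> Lines { get; set; } = new List<string>();
}

public static class ModelSnapshot
{
    // Serializes the model to an XML string that can be kept and compared later.
    public static string Take<T>(T model)
    {
        var serializer = new XmlSerializer(typeof(T));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, model);
            return writer.ToString();
        }
    }

    // True when the model no longer serializes to the snapshot taken at load time.
    public static bool HasChanged<T>(T model, string snapshotAtLoad)
    {
        return !string.Equals(Take(model), snapshotAtLoad, System.StringComparison.Ordinal);
    }
}
```
Take a snapshot right after loading and call HasChanged when the user tries to exit. Unlike comparing hashes, comparing the full strings cannot report "unchanged" for two different states.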

OK, you can use reflection and some sort of recursive function, of course.
But keep in mind that every object is a model of a particular thing; there may be a lot of "unimportant" fields and properties.
And, thanks to @compie!
You can create a hash function just for your domain, but this requires strong mathematical skills.
Or you can try to use classic hash functions like SHA. Just treat your object as a string or byte array.

Because this is a WPF app, it may be easier than you think to be notified of changes as they happen. The event architecture of WPF allows you to create event handlers at a level somewhere above where the event actually originates. So, you could create event handlers for the various "change" events of your UI elements in the root window of your interface and set the "changed" flag at that scope.
WPF Routed Events Overview

I would advise against this. Different objects can have the same hash. It's not safe to rely on this for checking if changes have to be saved.

C#: How to implement a smart cache

I have some places where implementing some sort of cache might be useful. For example in cases of doing resource lookups based on custom strings, finding names of properties using reflection, or to have only one PropertyChangedEventArgs per property name.
A simple example of the last one:
public static class Cache
{
    private static Dictionary<string, PropertyChangedEventArgs> cache;
    static Cache()
    {
        cache = new Dictionary<string, PropertyChangedEventArgs>();
    }
    public static PropertyChangedEventArgs GetPropertyChangedEventArgs(
        string propertyName)
    {
        if (cache.ContainsKey(propertyName))
            return cache[propertyName];
        return cache[propertyName] = new PropertyChangedEventArgs(propertyName);
    }
}
But, will this work well? For example if we had a whole load of different propertyNames, that would mean we would end up with a huge cache sitting there never being garbage collected or anything. I'm imagining if what is cached are larger values and if the application is a long-running one, this might end up as kind of a problem... or what do you think? How should a good cache be implemented? Is this one good enough for most purposes? Any examples of some nice cache implementations that are not too hard to understand or way too complex to implement?

This is a large problem, you need to determine the domain of the problem and apply the correct techniques. For instance, how would you describe the expiration of the objects? Do they become stale over a fixed interval of time? Do they become stale from an external event? How frequently does this happen? Additionally, how many objects do you have? Finally, how much does it cost to generate the object?
The simplest strategy would be to do straight memoization, as you have above. This assumes that objects never expire, and that there are not so many as to run your memory dry and that you think the cost to create these objects warrants the use of a cache to begin with.
The next layer might be to limit the number of objects and use an implicit expiration policy, such as LRU (least recently used). To do this you'd typically use a doubly linked list in addition to your dictionary, and every time an object is accessed it is moved to the front of the list. Then, if you need to add a new object but the cache is at its limit, you'd remove the object at the back of the list.
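A compact sketch of that dictionary-plus-linked-list arrangement. LruCache and its member names are my own for the example; it is not thread-safe and leaves out any time-based expiration.
```csharp
using System;
using System.Collections.Generic;

public sealed class LruCache<TKey, TValue>
{
    private readonly int _capacity;
    private readonly Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>> _map =
        new Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>>();
    // Front of the list = most recently used, back = least recently used.
    private readonly LinkedList<KeyValuePair<TKey, TValue>> _order =
        new LinkedList<KeyValuePair<TKey, TValue>>();

    public LruCache(int capacity)
    {
        if (capacity <= 0) throw new ArgumentOutOfRangeException(nameof(capacity));
        _capacity = capacity;
    }

    public int Count => _map.Count;

    public bool TryGet(TKey key, out TValue value)
    {
        if (_map.TryGetValue(key, out var node))
        {
            // Accessing an entry moves it to the front of the list.
            _order.Remove(node);
            _order.AddFirst(node);
            value = node.Value.Value;
            return true;
        }
        value = default(TValue);
        return false;
    }

    public void Set(TKey key, TValue value)
    {
        if (_map.TryGetValue(key, out var existing))
        {
            _order.Remove(existing);          // replacing: drop the old node
        }
        else if (_map.Count >= _capacity)
        {
            var oldest = _order.Last;         // evict the least recently used entry
            _order.RemoveLast();
            _map.Remove(oldest.Value.Key);
        }
        var node = new LinkedListNode<KeyValuePair<TKey, TValue>>(
            new KeyValuePair<TKey, TValue>(key, value));
        _order.AddFirst(node);
        _map[key] = node;
    }
}
```
Storing the list node in the dictionary is what keeps both the lookup and the move-to-front step O(1).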
Next, you might need to enforce explicit expiration, either based on time, or some external stimulus. This would require you to have some sort of expiration event that could be called.
As you can see, there is a lot of design in caching, so you need to understand your domain and engineer appropriately. I felt you did not provide enough detail for me to discuss specifics.
P.S. Please consider using Generics when defining your class so that many types of objects can be stored, thus allowing your caching code to be reused.

You could wrap each of your cached items in a WeakReference. This would allow the GC to reclaim items if-and-when required, however it doesn't give you any granular control of when items will disappear from the cache, or allow you to implement explicit expiration policies etc.
(Ha! I just noticed that the example given on the MSDN page is a simple caching class.)
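A sketch of what that wrapping might look like with the generic WeakReference<T>. WeakCache and GetOrAdd are hypothetical names; note that the dictionary still accumulates one small WeakReference per key, so a long-running application would also want to prune dead entries occasionally.
```csharp
using System;
using System.Collections.Generic;

public sealed class WeakCache<TKey, TValue> where TValue : class
{
    private readonly Dictionary<TKey, WeakReference<TValue>> _entries =
        new Dictionary<TKey, WeakReference<TValue>>();

    // Returns the cached value if it is still alive; otherwise builds and caches a new one.
    public TValue GetOrAdd(TKey key, Func<TKey, TValue> factory)
    {
        WeakReference<TValue> weak;
        TValue cached;
        if (_entries.TryGetValue(key, out weak) && weak.TryGetTarget(out cached))
            return cached;

        var created = factory(key);
        _entries[key] = new WeakReference<TValue>(created);
        return created;
    }
}
```
As long as some caller holds the value, later lookups return the same instance; once nobody does, the GC is free to reclaim it and the next lookup calls the factory again.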

Looks like .NET 4.0 now supports System.Runtime.Caching for caching many types of things. You should look into that first, instead of re-inventing the wheel. More details:
http://msdn.microsoft.com/en-us/library/system.runtime.caching%28VS.100%29.aspx

This is a nice debate to have, but depending on your application, here are some tips:
You should define the max size of the cache, decide what to do with old items when the cache is full, have a scavenging strategy, determine a time to live for objects in the cache, decide whether the cache can or must be persisted somewhere other than memory in case of abnormal application termination, ...

This is a common problem that has many solutions depending on your application need.
It is so common that Microsoft released a whole library to address it.
You should check out Microsoft Velocity before rolling your own cache.
http://msdn.microsoft.com/en-us/data/cc655792.aspx
Hope this helps.

You could use a WeakReference, but if your object is not that large then don't, because the WeakReference would take more memory than the object itself, which is not a good technique. Also, if the object is short-lived and will never make it from generation 0 to generation 1 in the GC, there is not much need for the WeakReference; implementing the IDisposable interface on the object, with GC.SuppressFinalize called on release, would serve better.
If you want to control the lifetime, you need a timer that checks the DateTime/TimeSpan against the desiredExpirationTime of each object in your cache.
The important thing is: if the object is large then opt for the WeakReference, else use a strong reference. Also, you can set the capacity on the Dictionary and queue additional objects in a temp bin, serializing each object and loading it when there is room in the Dictionary, then clearing it from the temp directory.
