Is it possible to create a truly weak-keyed dictionary in C#? - c#

I'm trying to nut out the details for a true WeakKeyedDictionary<,> for C#... but I'm running into difficulties.
I realise this is a non-trivial task, but the seeming inability to declare a WeakKeyedKeyValuePair<,> (where the GC only follows the value reference if the key is reachable) makes it seemingly impossible.
There are two main problems I see:
Every implementation I've so far seen does not trim values after keys have been collected. Think about that - one of the main reasons for using such a Dictionary is to prevent those values being kept around (not just the keys!) as they're unreachable, but here they are left pointed to by strong references.
Yes, add/remove from the Dictionary enough and they'll eventually be replaced, but what if you don't?
Without a hypothetical WeakKeyedKeyValuePair<,> (or another means of telling the GC to only mark the value if the key is reachable), any value that refers to its key would never be collected. This is a problem when storing arbitrary values.
Problem 1 could be tackled in a fairly non-ideal/hackish way: use GC Notifications to wait for a full GC to complete, and then prune the dictionary in another thread. This one I'm semi-ok with.
But problem 2 has me stumped. I realise this is easily countered by a "so don't do that", but it has me wondering - is this problem even possible to solve?

Have a look at the ConditionalWeakTable<TKey, TValue> Class.
Enables compilers to dynamically attach object fields to managed objects.
It's essentially a dictionary that holds its keys weakly (the table does not keep them alive) and keeps each value alive only as long as its key is alive, even if the value refers back to the key.
Note! This class does not use GetHashCode and Equals to do equality comparisons; it uses ReferenceEquals.
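A minimal sketch of how this might look in practice. The Metadata and WeakTableDemo names are hypothetical and only there for illustration; the calls on ConditionalWeakTable (GetValue, TryGetValue) are the real API. Note that the value deliberately points back at its key, which is the case described as problem 2 in the question.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical value type; Owner points back at the key on purpose.
public sealed class Metadata
{
    public string Label { get; set; }
    public object Owner { get; set; }
}

public static class WeakTableDemo
{
    // Keys are compared by reference and are not kept alive by the table.
    private static readonly ConditionalWeakTable<object, Metadata> Table =
        new ConditionalWeakTable<object, Metadata>();

    // Returns the metadata already attached to key, or attaches a new one.
    public static Metadata Attach(object key, string label)
    {
        return Table.GetValue(key, k => new Metadata { Label = label, Owner = k });
    }

    // Looks up the metadata without creating it.
    public static bool TryFind(object key, out Metadata metadata)
    {
        return Table.TryGetValue(key, out metadata);
    }
}
```

Once nothing outside the table references the key, both the key and its Metadata become eligible for collection, so no manual pruning pass should be needed.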

Related

Maintaining data locality in a Dictionary<TKey,TValue>

I'm making a game and I decided that for reasons, I'd give each game object an int entity ID that I could easily search them by instead of having to linearly search a list or worse, many lists. The idea was inspired by the ECS pattern and I figured if I made sure to re-use ints when they were destroyed, it would help keep all the data close together in memory and reduce cache misses by a bit. (I know that depends more on access order, just thinking in the abstract here). The problem is I'm now doubting myself and I've read so much that I can't keep the ideas straight in my head.
The question is essentially if I keep endlessly adding higher numbered keys to a Dictionary<int, SomeClass>, will the speed/memory usage be worse than if I try to re-use lower numbers?
Note: I feel like the answer is going to be "write your own class" but I was trying to avoid that and I don't think I'd do a good job if I don't understand this concept.
No, it makes no difference at all. From MSDN:
The Dictionary generic class provides a mapping from a set of keys to a set of values. Each addition to the dictionary consists of a value and its associated key. Retrieving a value by using its key is very fast, close to O(1), because the Dictionary class is implemented as a hash table.
So, lookups stay close to O(1) because it internally uses a hash table; the numeric value of the key doesn't affect it at all.
The only problem you could face is reaching int.MaxValue, and whether that matters depends on your scenario.
Okay here's my best effort at answering this myself, apologies if I get anything wrong.
Short answer: No. If you add higher numbers they just get stuck somewhere into the array until it's full. The solution to the example problem is to just replace the dictionary with a GameObject array and use the int as an index, and if necessary write a class to handle expanding it.
Longer answer: I think my confusion came from reading somewhere that a dictionary was just a pair of parallel arrays or something like that. I guess that's true but since it's indexed by hash codes, it's not intended for contiguous index values. So it's doing a bunch of redundant work to handle cases that I'm never going to use it for.
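The array-indexed approach from the short answer could be sketched roughly as follows. EntityTable and its members are hypothetical names, and the initial capacity of 16 is an arbitrary assumption. IDs of removed entities are reused before new ones are handed out.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical replacement for Dictionary<int, SomeClass>: the ID is the array index.
public sealed class EntityTable<T> where T : class
{
    private T[] _slots = new T[16];                 // assumed starting capacity
    private readonly Stack<int> _freeIds = new Stack<int>();
    private int _nextId;                            // first ID never handed out so far

    public int Add(T entity)
    {
        // Prefer a recycled ID so the used range stays compact.
        int id = _freeIds.Count > 0 ? _freeIds.Pop() : _nextId++;
        if (id >= _slots.Length)
        {
            Array.Resize(ref _slots, _slots.Length * 2);
        }
        _slots[id] = entity;
        return id;
    }

    // Returns null for IDs that are out of range or currently free.
    public T Get(int id)
    {
        return id >= 0 && id < _nextId ? _slots[id] : null;
    }

    public void Remove(int id)
    {
        if (id < 0 || id >= _nextId || _slots[id] == null) return;
        _slots[id] = null;
        _freeIds.Push(id);
    }
}
```

Keep in mind that with a class type the array only stores references, so this keeps the references close together, not necessarily the objects themselves.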

Keeping class state valid vs performance

If I have a public method that returns a reference-type value which is a private field in the current class, do I need to return a copy of it? In my case I need to return a List, but this method is called very often and my list holds ~100 items. The point is that if I return the same variable, everybody can modify it, but if I return a copy, the performance will degrade. In my case I'm trying to generate a sudoku table, which is not a fast procedure.
The internal class SudokuTable holds the values with their possible values. The public class SudokuGame handles UI requests and generates/solves the SudokuTable. Is it good practice to choose performance over OOP principles? If someone wants to make another library using my SudokuTable class, he won't be aware that he can break its state by modifying the List that it returns.
Performance and object-oriented programming are not mutually exclusive - your code can be object-oriented and perform badly, etc.
In the case you state here I don't think it would be wise to allow external code to edit the internal state of a thing, so I would return an array or ReadOnlyCollection of the entries. (Another possibility would be to use an ObservableCollection, monitor for out-of-bounds tampering and handle it accordingly, say with an exception, though I'm unsure how desirable this would be.)
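A rough sketch of the ReadOnlyCollection option, assuming a hypothetical SudokuCellSet class (the real SudokuTable from the question is not shown, so the names and the initial candidate values here are made up). List<T>.AsReadOnly() wraps the list instead of copying it, so handing out the view on every call should cost very little.

```csharp
using System.Collections.Generic;
using System.Collections.ObjectModel;

// Hypothetical stand-in for part of the asker's SudokuTable.
public sealed class SudokuCellSet
{
    private readonly List<int> _candidates = new List<int> { 1, 2, 3 };
    private readonly ReadOnlyCollection<int> _view;

    public SudokuCellSet()
    {
        // The wrapper is created once; it reflects later changes to the list.
        _view = _candidates.AsReadOnly();
    }

    // Callers can read and enumerate, but not add or remove.
    public IReadOnlyList<int> Candidates
    {
        get { return _view; }
    }

    // Only the owning class mutates the underlying list.
    public void Eliminate(int value)
    {
        _candidates.Remove(value);
    }
}
```

A caller who casts the result to IList<int> and calls Add gets a NotSupportedException rather than silently corrupting the state.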
From there, you might consider how you expose access to these entries, trying to minimise the need for callers to get the full collection when all they need is to look up and return a specific one.
It's worth noting that an uneditable collection doesn't necessarily mean the state cannot be altered, either; if the entries are represented by a reference type rather than a value type then returning an entry leaves that open to tampering (potentially, depending on the class definition), so you might be better off with structs for the entry types.
At length, this, without a concrete example of where you're having problems, is a bit subjective and theoretical at the moment. Have you tried restricting the collection? And if so, how was the performance? Where were the issues? And so on.

Converting Object.GetHashCode() to Guid

I need to assign a guid to objects for managing state at app startup & shutdown
It looks like i can store the lookup values in a dictionary using
dictionary<int,Guid>.Add(instance.GetHashCode(), myGUID());
are there any potential issues to be aware of here ?
NOTE
This does NOT need to persist between execution runs, only the guid like so
create the object
gethashcode(), associate with new or old guid
before app terminate, gethashcode() and lookup guid to update() or insert() into persistence engine USING GUID
only assumption is that the gethashcode() remains consistent while the process is running
also gethashcode() is called on the same object type (derived from window)
Update 2 - here is the bigger picture
create a state machine to store info about WPF user controls (later ref as UC) between runs
the types of user controls can change over time (added / removed)
in the very 1st run, there is no prior state, the user interacts with a subset of UC and modifies their state, which needs to be recreated when the app restarts
this state snapshot is taken when the app has a normal shutdown
also there can be multiple instances of a UC type
at shutdown, each instance is assigned a guid and saved along with the type info and the state info
all these guids are also stored in a collection
at restart, for each guid, create object, store ref/guid, restore state per instance so the app looks exactly as before
the user may add or remove UC instances/types and otherwise interact with the system
at shutdown, the state is saved again
choices at this time are to remove / delete all prior state and insert new state info to the persistence layer (sql db)
with observation/analysis over time, it turns out that a lot of instances remain consistent/static and do not change - so their state need not be deleted/inserted again as the state info is now quite large and stored over a non local db
so only the change delta is persisted
to compute the delta, need to track reference lifetimes
currently stored as List<WeakReference> at startup
on shutdown, iterate through this list and actual UC present on screen, add / update / delete keys accordingly
send delta over to persistence
Hope the above makes it clear.
So now the question is - why not just store the HashCode (of usercontrol only)
instead of WeakReference and eliminate the test for null reference while
iterating thru the list
update 3 - thanks all, going to use weakreference finally
Use GetHashCode to balance a hash table. That's what it's for. Do not use it for some other purpose that it was not designed for; that's very dangerous.
You appear to be assuming that a hash code will be unique. Hash codes don't work like that. See Eric Lippert's blog post on Guidelines and rules for GetHashCode for more details, but basically you should only ever make the assumptions which are guaranteed for well-behaved types, namely that if two objects have different hash codes, they're definitely unequal. If they have the same hash code, they may be equal, but may not be.
EDIT: As noted, you also shouldn't persist hash codes between execution runs. There's no guarantee they'll be stable in the face of restarts. It's not really clear exactly what you're doing, but it doesn't sound like a good idea.
EDIT: Okay, you've now noted that it won't be persistent, so that's a good start - but you still haven't dealt with the possibility of hash code collisions. Why do you want to call GetHashCode() at all? Why not just add the reference to the dictionary?
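To illustrate the suggestion of keying on the reference itself, here is a small sketch. Panel, ByReference and GuidRegistry are hypothetical names; Panel stands in for a user control and returns a constant hash code purely to force a collision, which is legal behaviour for GetHashCode.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Compares keys by object identity, ignoring any Equals/GetHashCode overrides.
public sealed class ByReference : IEqualityComparer<object>
{
    bool IEqualityComparer<object>.Equals(object x, object y)
    {
        return ReferenceEquals(x, y);
    }

    int IEqualityComparer<object>.GetHashCode(object obj)
    {
        return RuntimeHelpers.GetHashCode(obj);
    }
}

// Hypothetical control type whose hash codes always collide.
public sealed class Panel
{
    public override int GetHashCode() { return 42; }
}

public static class GuidRegistry
{
    private static readonly Dictionary<object, Guid> Ids =
        new Dictionary<object, Guid>(new ByReference());

    // Returns the Guid already assigned to this instance, or assigns a new one.
    public static Guid GetOrAssign(object instance)
    {
        Guid id;
        if (!Ids.TryGetValue(instance, out id))
        {
            id = Guid.NewGuid();
            Ids.Add(instance, id);
        }
        return id;
    }
}
```

With a Dictionary<int, Guid> keyed on GetHashCode(), the two Panel instances below would share one entry; keyed by reference they get separate Guids. This dictionary holds strong references to its keys, so it does not replace the WeakReference list if controls must remain collectable.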
The quick and easy fix seems to be
var dict = new Dictionary<InstanceType, Guid>();
dict.Add(instance, myGUID());
Of course you need to implement InstanceType.Equals correctly if it isn't yet. (Or implement IEquatable<InstanceType>.)
Possible issues I can think of:
Hash code collisions could give you duplicate dictionary keys
Different objects' hash algorithms could give you the same hash code for two functionally different objects; you wouldn't know which object you're working with
This implementation is prone to ambiguity (as described above); you may need to store more information about your objects than just their hash codes.
Note - Jon said this more elegantly (see above)
Since this is for WPF controls, why not just add the Guid as a dependency property? You already seem to be iterating through the user controls in order to get their hash codes, so this would probably be a simpler method.
If you want to capture that a control was removed and which Guid it had, a manager object that subscribes to the closing/removed events and just stores the Guid and a few other details would be a good idea. That would also make it easier to capture more details for analysis if you need them.

Performance recommendations of millions objects creation

My code has to generate millions of objects to run an algorithm (millions of objects will be created and, at the same time, about 2/3 of them will be destroyed).
I know that object creation causes performance problems.
Could someone recommend how to manage such a huge number of objects, garbage collection and so on?
Thank you.
Elaborating a bit on my "make them a value type" comment above.
If you have a struct Foo, then preparing for the algorithm with e.g. var storage = new Foo[1000000] will only allocate one big block of memory (I'm assuming the required amount of contiguous memory will be available).
You can then manually manage the memory inside that block to avoid performing more memory allocations:
Keep a count of how many slots in the array are actually used
To "create" a new Foo, put it at the first unused slot and increment the counter
To "delete" a Foo, swap it with the one in last used slot and decrement the counter
Of course, making an algorithm work with value types instead of reference types is not as simple as changing class to struct. But if it is workable, it will let you side-step all of this overhead for a one-time startup cost.
If it is possible in your algorithm, then try to reuse objects: if 2/3 are destroyed immediately, you can try to use them again.
You can implement the IDisposable interface on the type whose objects are being created. Then you can use the using keyword and write whatever logic involves the object within the using scope. The following links will give you a fair idea of what I am trying to say. Hope they are of some help.
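The three steps above could be sketched like this. Particle and ParticleStore are hypothetical names standing in for Foo and its storage, and the sketch overwrites the deleted slot with the last used one instead of doing a full swap, since the deleted value is no longer needed.

```csharp
using System;

// Hypothetical value type standing in for Foo.
public struct Particle
{
    public float X;
    public float Y;
}

public sealed class ParticleStore
{
    private readonly Particle[] _items;   // one allocation up front

    public int Count { get; private set; } // number of slots in use

    public ParticleStore(int capacity)
    {
        _items = new Particle[capacity];
    }

    // "Create": write into the first unused slot and bump the counter.
    public int Create(float x, float y)
    {
        if (Count == _items.Length) throw new InvalidOperationException("Store is full.");
        _items[Count] = new Particle { X = x, Y = y };
        return Count++;
    }

    // "Delete": move the last used item into the hole and shrink the counter.
    public void Delete(int index)
    {
        if (index < 0 || index >= Count) throw new ArgumentOutOfRangeException("index");
        _items[index] = _items[Count - 1];
        Count--;
    }

    public Particle Get(int index)
    {
        if (index < 0 || index >= Count) throw new ArgumentOutOfRangeException("index");
        return _items[index];
    }
}
```

One consequence of this design is that indices are not stable: deleting an item changes the index of whichever item was last, so callers should not hold on to indices across deletions.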
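One way to reuse objects is a small pool, sketched below under the assumption that instances can be reset to a clean state. Node and NodePool are hypothetical names, and this version is not thread-safe.

```csharp
using System.Collections.Generic;

// Hypothetical object type that is created and discarded in large numbers.
public sealed class Node
{
    public int Value;

    public void Reset()
    {
        Value = 0;
    }
}

public sealed class NodePool
{
    private readonly Stack<Node> _free = new Stack<Node>();

    // Hands out a recycled instance when one is available, otherwise allocates.
    public Node Rent()
    {
        return _free.Count > 0 ? _free.Pop() : new Node();
    }

    // Clears the instance and keeps it for the next Rent call.
    public void Return(Node node)
    {
        node.Reset();
        _free.Push(node);
    }
}
```

Instead of dropping an object for the GC to collect, the algorithm would call Return and later get the same instance back from Rent.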
http://www.codeguru.com/csharp/csharp/cs_syntax/interfaces/article.php/c8679
Am I implementing IDisposable correctly?
Regards,
Samar

Should I check whether particular key is present in Dictionary before accessing it?

Should I check whether particular key is present in Dictionary if I am sure it will be added in dictionary by the time I reach the code to access it?
There are two ways I can access the value in dictionary
checking ContainsKey method. If it returns true then I access using indexer [key] of dictionary object.
or
TryGetValue which will return true or false as well as return value through out parameter.
(2nd will perform better than 1st if I want to get value. Benchmark.)
However if I am sure that the function which is accessing global dictionary will surely have the key then should I still check using TryGetValue or without checking I should use indexer[].
Or I should never assume that and always check?
Use the indexer if the key is meant to be present - if it's not present, it will throw an appropriate exception, which is the right behaviour if the absence of the key indicates a bug.
If it's valid for the key not to be present, use TryGetValue instead and react accordingly.
(Also apply Marc's advice about accessing a shared dictionary safely.)
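A short sketch of the two cases, using a hypothetical Lookup helper and a string-to-string dictionary chosen only for illustration:

```csharp
using System;
using System.Collections.Generic;

public static class Lookup
{
    // The key is meant to be present: a missing key is a bug, so let the
    // indexer throw KeyNotFoundException.
    public static string Required(Dictionary<string, string> settings, string key)
    {
        return settings[key];
    }

    // The key may legitimately be absent: check and fetch in one call.
    public static string Optional(Dictionary<string, string> settings, string key, string fallback)
    {
        string value;
        return settings.TryGetValue(key, out value) ? value : fallback;
    }
}
```

Neither method takes a lock; for a shared dictionary the calls would still need to be wrapped in whatever synchronization the rest of the code uses.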
If the dictionary is global (static/shared), you should be synchronizing access to it (this is important; otherwise you can corrupt it).
Even if your thread is only reading data, it needs to respect the locks of other threads that might be editing it.
However; if you are sure that the item is there, the indexer should be fine:
Foo foo;
lock(syncLock) {
    foo = data[key];
}
// use foo...
Otherwise, a useful pattern is to check and add in the same lock:
Foo foo;
lock(syncLock) {
    if(!data.TryGetValue(key, out foo)) {
        foo = new Foo(key);
        data.Add(key, foo);
    }
}
// use foo...
Here we only add the item if it wasn't there... but inside the same lock.
Always check. Never say never. I assume your application is not that performance critical that you will have to save the checking time.
TIP: If you decide not to check, at least use Debug.Assert( dict.ContainsKey( key ) ); This will only be compiled when in Debug mode, your release build will not contain it. That way you could at least have the check when debugging.
Still: if possible, just check it :-)
EDIT: There have been some misconceptions here. By "always check" I did not only mean using an if somewhere. Handling an exception properly was also included in this. So, to be more precise: never take anything for granted, expect the unexpected. Check by ContainsKey or handle the potential exception, but do SOMETHING in case the element is not contained.
Personally I'd check the key is there, regardless of whether or not you are SURE it is. Some may say this check is superfluous and that the dictionary will throw an exception which you can catch, but imho you should not rely on that exception. You should check yourself and then either throw your own exception which means something, or return a result object with a success flag and reason inside... the failure mechanism is really implementation dependent.
Surely the answer is "it all depends on the situation". You need to balance the risk that the key will be missing from the dictionary (low for small systems where there is limited access to the data, where you can rely on the order things are done, larger for larger systems, multiple programmers accessing the same data, especially with read/write/delete access, where threads are involved and order cannot be guaranteed or where data originates externally and reading can fail) with the impact of the risk (safety-critical systems, commercial releases or systems that a business will rely on compared with something made for fun, for a one-off job and/or for your use only) and with any requirements for speed, size and laziness.
If I were making a system to control railway signalling I would want to be safe against all possible and impossible errors, and safe from errors in the error-handling and so on (Murphy's 2nd law: "what can't go wrong will go wrong".) If I'm chucking stuff together for fun, even if size and speed are not an issue I will be MUCH more relaxed about stuff like this - I will want to get to the fun stuff.
Of course, sometimes this is the fun stuff in itself.
TryGetValue is the same code as indexing it by key, except the former returns a default value (for the out parameter) where the latter throws an exception. Use TryGetValue and you'll get consistent checks with absolutely no performance loss.
Edit: As Jon said, if you know it will always have the key, then you can index it and let it throw the appropriate exception. However, if you can provide better context information by throwing it yourself with a detailed message, that would be preferable.
There are two schools of thought on this from a performance point of view.
1) Avoid exceptions where possible, as exceptions are expensive - i.e. check whether a specific key exists before you try to retrieve it from the dictionary. This is the better approach in my opinion if there's a fair chance it may not exist, as it prevents fairly common exceptions.
2) If you're confident the item will exist in there 99% of the time, then don't check for its existence before accessing it. The 1% of times when it doesn't exist, an exception will be thrown but you've saved time for the other 99% of the time by not checking.
What I'm saying is, optimise for the majority if there is a clear one. If there is any real degree of uncertainty about an item existing, then check before retrieving.
If you know that the dictionary normally contains the key, you don't have to check for it before accessing it.
If something would be wrong and the dictionary doesn't contain the items that it should, you can let the dictionary throw the exception. The only reason for checking for the key first would be if you want to take care of this problem situation yourself without getting the exception. Letting the dictionary throw the exception and catch that is however a perfectly valid way of handling the situation.
I think Marc and Jon have it (as usual) pretty well sewn up. Since you also mention performance in your question, it might be worth considering how you lock the dictionary.
The straightforward lock serialises all read access which may not be desirable if read is massively frequent and writes are relatively few. In that case using a ReaderWriterLockSlim might be better. The downside is the code is a little more complex and writes are slightly slower.
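A minimal sketch of what that might look like, assuming a hypothetical SharedCache wrapper around the dictionary. Readers take the shared read lock and can run concurrently; GetOrAdd takes the exclusive write lock and does the check and the add inside it, as in Marc's pattern.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical wrapper; only ReaderWriterLockSlim and Dictionary are real APIs here.
public sealed class SharedCache<TKey, TValue>
{
    private readonly Dictionary<TKey, TValue> _data = new Dictionary<TKey, TValue>();
    private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim();

    // Many threads may hold the read lock at the same time.
    public bool TryGet(TKey key, out TValue value)
    {
        _lock.EnterReadLock();
        try
        {
            return _data.TryGetValue(key, out value);
        }
        finally
        {
            _lock.ExitReadLock();
        }
    }

    // The write lock is exclusive, so check-then-add cannot race.
    public TValue GetOrAdd(TKey key, Func<TKey, TValue> factory)
    {
        _lock.EnterWriteLock();
        try
        {
            TValue value;
            if (!_data.TryGetValue(key, out value))
            {
                value = factory(key);
                _data.Add(key, value);
            }
            return value;
        }
        finally
        {
            _lock.ExitWriteLock();
        }
    }
}
```

Whether this actually beats a plain lock depends on the read/write ratio, so it is worth measuring before committing to the extra complexity.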
