Object to GUID/UUID - C#

I want to take any object and get a GUID that represents that object.
I know that entails a lot of things. I am looking for a good-enough solution for common applications.
My specific use case is caching: I want to know whether the object used to create the thing I am caching has already produced one in the past. There would be two different types of objects. Each type contains only public properties and may contain a list/IEnumerable.
Assuming the object is serializable, my first idea was to serialize it to JSON (via the native JsonSerializer or Newtonsoft) and then convert that JSON string to a version 5 UUID, as detailed in a gist linked from How can I generate a GUID for a string?
My second approach, if it's not serializable (for example, if it contains a dictionary), would be to use reflection on the public properties to generate a unique string of some sort and then convert that string to a version 5 UUID.
Both approaches use UUID version 5 to turn a string into a GUID. Is there a proven C# class that makes valid version 5 UUIDs? The gist looks good, but I want to be sure.
I was thinking of making the C# namespace and type name the namespace for the UUID v5. Is that a valid use of the namespace?
My first approach is good enough for my simple use case but I wanted to explore the second approach as it's more flexible.
If creating the GUID couldn't guarantee reasonable uniqueness, it should throw an error. Surely super-complicated objects would fail. How might I know that is the case when using reflection?
I am looking for new approaches to the second idea, or concerns about and implementations of it.
Edit: The reason I bountied/reopened this almost 3 years later is that I need this again (and for caching again), but also because of the introduction of the generic unmanaged constraint in C# 7.3. The blog post at http://devblogs.microsoft.com/premier-developer/dissecting-new-generics-constraints-in-c-7-3/ seems to suggest that if the object can obey the unmanaged constraint you can find a suitable key for a key-value store. Am I misunderstanding something?
This is still limited because the object (generic) must obey the unmanaged type constraint, which is very limiting (no strings, no arrays, etc.), but it's one step closer. I don't completely understand why the method of getting the memory stream and taking a SHA-1 hash can't be done on non-unmanaged types.
I understand that reference types point to places in memory, and it's not as easy to get the memory that represents the whole object, but it feels doable. After all, objects are eventually made up of implementations of unmanaged types (a string is an array of chars, etc.).
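For concreteness, here is roughly what I picture (a sketch only; MemoryMarshal assumes .NET Core 2.1+ or the System.Memory package, and FixedSizeKey is a name I made up):

using System;
using System.Runtime.InteropServices;
using System.Security.Cryptography;

static class FixedSizeKey
{
    // Any unmanaged struct has a fixed byte layout, so its raw bytes
    // can be hashed directly into a fixed-size key.
    public static Guid Create<T>(T value) where T : unmanaged
    {
        byte[] bytes = MemoryMarshal.AsBytes(MemoryMarshal.CreateSpan(ref value, 1)).ToArray();
        using (var sha1 = SHA1.Create())
        {
            byte[] hash = sha1.ComputeHash(bytes);
            byte[] guidBytes = new byte[16];
            Array.Copy(hash, guidBytes, 16);  // SHA-1 yields 20 bytes; a GUID needs 16
            return new Guid(guidBytes);
        }
    }
}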
PS: The requirement of a GUID is loose; any integer/string at or under 512 bits would suffice.
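For reference, the name-based version 5 (SHA-1) scheme from RFC 4122 that such a gist implements looks roughly like the sketch below; this is illustrative, not a vetted implementation:

using System;
using System.Security.Cryptography;
using System.Text;

static class GuidUtility
{
    // Sketch of an RFC 4122 name-based UUID (version 5, SHA-1).
    public static Guid CreateV5(Guid namespaceId, string name)
    {
        // RFC 4122 hashes the namespace in network byte order;
        // Guid.ToByteArray() is little-endian for the first three fields.
        byte[] namespaceBytes = namespaceId.ToByteArray();
        SwapByteOrder(namespaceBytes);
        byte[] nameBytes = Encoding.UTF8.GetBytes(name);

        byte[] hash;
        using (var sha1 = SHA1.Create())
        {
            sha1.TransformBlock(namespaceBytes, 0, namespaceBytes.Length, null, 0);
            sha1.TransformFinalBlock(nameBytes, 0, nameBytes.Length);
            hash = sha1.Hash;
        }

        byte[] result = new byte[16];
        Array.Copy(hash, result, 16);
        result[6] = (byte)((result[6] & 0x0F) | 0x50);  // set version to 5
        result[8] = (byte)((result[8] & 0x3F) | 0x80);  // set RFC 4122 variant

        SwapByteOrder(result);  // back to the order Guid(byte[]) expects
        return new Guid(result);
    }

    private static void SwapByteOrder(byte[] g)
    {
        void Swap(int a, int b) { byte t = g[a]; g[a] = g[b]; g[b] = t; }
        Swap(0, 3); Swap(1, 2); Swap(4, 5); Swap(6, 7);
    }
}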

The problem of equality is a difficult one.
Here are some thoughts on how you could solve your problem.
Hashing a serialized object
One method would be to serialize an object and then hash the result as proposed by Georg.
Using an MD5 checksum gives you a strong checksum, given the right input.
But getting it right is the problem.
You might have trouble using a common serialization framework, because:
They don't care whether a float is 1.0 or 1.000000000000001.
They might have a different understanding about what is equal than you / your employer.
They bloat the serialized text with unneeded symbols. (performance)
Just a little deviation in the serialized text causes a large deviation in the hashed GUID/UUID.
That's why you should carefully test any serialization you do.
Otherwise you might get false positives/negatives for objects (mostly false negatives).
Some points to think about:
Floats & Doubles:
Always write them the same way, preferably with the same number of digits to prevent something like 1.000000000000001 vs 1.0 from interfering.
DateTime, TimeStamp, etc.:
Apply a fixed format that won't change and is unambiguous.
Unordered collections:
Sort the data before serializing it. The order must be unambiguous.
Strings:
Is the equality case-sensitive? If not, make all the strings lower or upper case.
If necessary, make them culture invariant.
More:
For every type, think carefully what is equal and what is not. Think especially about edge cases. (float.NaN, -0 vs 0, null, etc.)
It's up to you whether you use an existing serializer or do it yourself.
Doing it yourself is more work and error-prone, but you have full control over all aspects of equality and serialization.
Using an existing serializer is also error-prone, because you need to test or prove that the results are always as you want them.
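As an illustration, a minimal hand-rolled canonicalization might look like the following sketch; the Order type, its properties, and the chosen formats are assumptions for the example, not requirements:

using System;
using System.Globalization;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Hypothetical type used only to illustrate canonicalization.
class Order
{
    public double Price;
    public DateTime Created;
    public string[] Tags;      // order is irrelevant for equality
    public string Customer;    // equality is case-insensitive
}

static class Canonical
{
    // Build one unambiguous string, then hash it.
    public static Guid ToGuid(Order o)
    {
        var sb = new StringBuilder();
        sb.Append(o.Price.ToString("R", CultureInfo.InvariantCulture)).Append('|');  // fixed float format
        sb.Append(o.Created.ToUniversalTime().ToString("O")).Append('|');            // unambiguous timestamp
        sb.Append(string.Join(",", o.Tags.OrderBy(t => t, StringComparer.Ordinal))).Append('|');  // sorted collection
        sb.Append(o.Customer.ToUpperInvariant());                                    // case-insensitive string

        using (var md5 = MD5.Create())
            return new Guid(md5.ComputeHash(Encoding.UTF8.GetBytes(sb.ToString())));
    }
}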
Introducing an unambiguous order and using a tree
If you have control over the source code, you can introduce a custom order function.
The order must take all properties, sub objects, lists, etc. into account.
Then you can create a binary tree, and use the order to insert and lookup objects.
The same problems mentioned for the first approach still apply: you need to make sure that equal values are detected as such.
The big-O performance is also worse than with hashing. But in most real-life examples, the actual performance should be comparable, or at least fast enough.
The good thing is, you can stop comparing two objects as soon as you find a property or value that is not equal. Thus there is no need to always look at the whole object.
A binary tree needs O(log2(n)) comparisons for a lookup, so that would be quite fast.
The bad thing is, you need access to all the actual objects, so you must keep them in memory.
A hashtable needs only O(1) comparisons for a lookup, so it would be even faster (theoretically, at least).
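A sketch of what such an order function could look like, reusing the hypothetical Order type from the sketch above (Tags omitted for brevity):

using System;
using System.Collections.Generic;

// A total order over all relevant properties lets a sorted container do the lookup.
sealed class OrderComparer : IComparer<Order>
{
    public int Compare(Order x, Order y)
    {
        int c = x.Price.CompareTo(y.Price);   // stop at the first difference
        if (c != 0) return c;
        c = x.Created.CompareTo(y.Created);
        if (c != 0) return c;
        return string.CompareOrdinal(x.Customer, y.Customer);
    }
}

// A SortedDictionary then gives O(log n) lookups:
// var cache = new SortedDictionary<Order, CachedItem>(new OrderComparer());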
Put them in a database
If you store all your objects in a database, then the database can do the lookup for you.
Databases are quite good at comparing objects, and they have built-in mechanisms to handle the equality/near-equality problem.
I'm not a database expert, so for this option, someone else might have more insight on how good this solution is.

As others have said in comments, it sounds like GetHashCode might do the trick for you if you're willing to settle for an int as your key. If not, there is a Guid constructor that takes a byte[] of length 16. You could try something like the following:
using System;
using System.Linq;
using System.Text;

class Foo
{
    public int A { get; set; }
    public char B { get; set; }
    public string C { get; set; }

    public Guid GetGuid()
    {
        byte[] aBytes = BitConverter.GetBytes(A);
        byte[] bBytes = BitConverter.GetBytes(B);
        // BitConverter has no overload for string; encode it instead.
        byte[] cBytes = Encoding.UTF8.GetBytes(C ?? string.Empty);
        byte[] padding = new byte[16];
        byte[] allBytes =
            aBytes
            .Concat(bBytes)
            .Concat(cBytes)
            .Concat(padding)
            .Take(16)
            .ToArray();
        return new Guid(allBytes);
    }
}

As said in the comments, there is no bullet entirely out of silver here, but a few that come quite close. Which of them to use depends on the types you want to use your class with and your context, e.g. when do you consider two objects to be equal. However, be aware that you will always face possible conflicts, a single GUID will not be sufficient to guarantee collision avoidance. All you can do is to decrease the probability of a collision.
In your case,
already made one in the past
sounds like you don't want reference equality but rather a notion of value equality. The simplest way is to trust that the classes implement value equality, because in that case you would already be done using GetHashCode; but that has a higher probability of collisions because it is only 32 bits. Further, you would be assuming that whoever wrote the class did a good job, which is not always a good assumption to make, particularly since people tend to blame you rather than themselves.
Otherwise, your best chance is serialization combined with a hashing algorithm of your choice. I would recommend MD5 because it is the fastest and produces the 128 bits you need for a GUID. Since you say your types consist of public properties only, I would suggest using an XmlSerializer, like so:
// Requires System.Security.Cryptography, System.Collections.Generic,
// System.IO and System.Xml.Serialization.
private MD5 _md5 = new MD5CryptoServiceProvider();
private Dictionary<Type, XmlSerializer> _serializers = new Dictionary<Type, XmlSerializer>();

public Guid CreateID(object obj)
{
    if (obj == null) return Guid.Empty;
    var type = obj.GetType();
    // Cache one serializer per type; constructing them is expensive.
    if (!_serializers.TryGetValue(type, out var serializer))
    {
        serializer = new XmlSerializer(type);
        _serializers.Add(type, serializer);
    }
    using (var stream = new MemoryStream())
    {
        serializer.Serialize(stream, obj);
        stream.Position = 0;
        return new Guid(_md5.ComputeHash(stream));
    }
}
Just about all serializers have their drawbacks. XmlSerializer is not capable of serializing cyclic object graphs, DataContractSerializer requires your types to have dedicated attributes and also the old serializers based on the SerializableAttribute require that attribute to be set. You somehow have to make assumptions.

Related

What to return when overriding Object.GetHashCode() in classes with no immutable fields?

Ok, before you get all mad because there are hundreds of similar sounding questions posted on the internet, I can assure you that I have just spent the last few hours reading all of them and have not found the answer to my question.
Background:
Basically, one of my large scale applications had been suffering from a situation where some Bindings on the ListBox.SelectedItem property would stop working or the program would crash after an edit had been made to the currently selected item. I initially asked the 'An item with the same key has already been added' Exception on selecting a ListBoxItem from code question here, but got no answers.
I hadn't had time to address that problem until this week, when I was given a number of days to sort it out. Now to cut a long story short, I found out the reason for the problem. It was because my data type classes had overridden the Equals method and therefore the GetHashCode method as well.
Now for those of you that are unaware of this issue, I discovered that you can only implement the GetHashCode method using immutable fields/properties. Using an excerpt from Harvey Kwok's answer to the Overriding GetHashCode() post to explain this:
The problem is that GetHashCode is being used by Dictionary and HashSet collections to place each item in a bucket. If hashcode is calculated based on some mutable fields and the fields are really changed after the object is placed into the HashSet or Dictionary, the object can no longer be found from the HashSet or Dictionary.
So the actual problem was caused because I had used mutable properties in the GetHashCode methods. When users changed these property values in the UI, the associated hash code values of the objects changed and then items could no longer be found in their collections.
Question:
So, my question is what is the best way of handling the situation where I need to implement the GetHashCode method in classes with no immutable fields? Sorry, let me be more specific, as that question has been asked before.
The answers in the Overriding GetHashCode() post suggest that in these situations, it is better to simply return a constant value... some suggest to return the value 1, while other suggest returning a prime number. Personally, I can't see any difference between these suggestions because I would have thought that there would only be one bucket used for either of them.
Furthermore, the Guidelines and rules for GetHashCode article in Eric Lippert's Blog has a section titled Guideline: the distribution of hash codes must be "random" which highlights the pitfalls of using an algorithm that results in not enough buckets being used. He warns of algorithms that decrease the number of buckets used and cause a performance problem when the bucket gets really big. Surely, returning a constant falls into this category.
I had an idea of adding an extra Guid field to all of my data type classes (just in C#, not the database) specifically to be used in and only in the GetHashCode method. So I suppose at the end of this long intro, my actual question is which implementation is better? To summarise:
Summary:
When overriding Object.GetHashCode() in classes with no immutable fields, is it better to return a constant from the GetHashCode method, or to create an additional readonly field for each class, solely to be used in the GetHashCode method? If I should add a new field, what type should it be and shouldn't I then include it in the Equals method?
While I am happy to receive answers from anyone, I am really hoping to receive answers from advanced developers with a sound knowledge on this subject.
Go back to basics. You read my article; read it again. The two ironclad rules that are relevant to your situation are:
if x equals y then the hash code of x must equal the hash code of y. Equivalently: if the hash code of x does not equal the hash code of y then x and y must be unequal.
the hash code of x must remain stable while x is in a hash table.
Those are requirements for correctness. If you can't guarantee those two simple things then your program will not be correct.
You propose two solutions.
Your first solution is that you always return a constant. That meets the requirement of both rules, but you are then reduced to linear searches in your hash table. You might as well use a list.
The other solution you propose is to somehow produce a hash code for each object and store it in the object. That is perfectly legal provided that equal items have equal hash codes. If you do that then you are restricted such that x equals y must be false if the hash codes differ. This seems to make value equality basically impossible. Since you wouldn't be overriding Equals in the first place if you wanted reference equality, this seems like a really bad idea, but it is legal provided that equals is consistent.
I propose a third solution, which is: never put your object in a hash table, because a hash table is the wrong data structure in the first place. The point of a hash table is to quickly answer the question "is this given value in this set of immutable values?" and you don't have a set of immutable values, so don't use a hash table. Use the right tool for the job. Use a list, and live with the pain of doing linear searches.
A fourth solution is: hash on the mutable fields used for equality, remove the object from all hash tables it is in just before every time you mutate it, and put it back in afterwards. This meets both requirements: the hash code agrees with equality, and hashes of objects in hash tables are stable, and you still get fast lookups.
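A sketch of that fourth solution; Person here is a hypothetical mutable type whose equality is based on Name:

using System.Collections.Generic;

class Person
{
    public string Name { get; set; }
    public override bool Equals(object o) => o is Person p && p.Name == Name;
    public override int GetHashCode() => Name?.GetHashCode() ?? 0;
}

var people = new HashSet<Person>();
var p = new Person { Name = "Alice" };
people.Add(p);

people.Remove(p);     // remove while the stored hash code still matches
p.Name = "Alicia";    // mutate: the hash code changes
people.Add(p);        // reinsert under the new hash code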
I would either create an additional readonly field or else throw NotSupportedException. In my view the other option is meaningless. Let's see why.
Distinct (fixed) hash codes
Providing distinct hash codes is easy, e.g.:
class Sample
{
    private static int counter;
    private readonly int hashCode;

    public Sample() { this.hashCode = counter++; }

    public override int GetHashCode()
    {
        return this.hashCode;
    }

    public override bool Equals(object other)
    {
        return object.ReferenceEquals(this, other);
    }
}
Technically you have to look out for creating too many objects and overflowing the counter here, but in practice I think that's not going to be an issue for anyone.
The problem with this approach is that instances will never compare equal. However, that's perfectly fine if you only want to use instances of Sample as indexes into a collection of some other type.
Constant hash codes
If there is any scenario in which distinct instances should compare equal then at first glance you have no other choice than returning a constant. But where does that leave you?
Locating an instance inside a container will always degenerate to the equivalent of a linear search. So in effect by returning a constant you allow the user to make a keyed container for your class, but that container will exhibit the performance characteristics of a LinkedList<T>. This might be obvious to someone familiar with your class, but personally I see it as letting people shoot themselves in the foot. If you know from beforehand that a Dictionary won't behave as one might expect, then why let the user create one? In my view, better to throw NotSupportedException.
But throwing is what you must not do!
Some people will disagree with the above, and when those people are smarter than oneself then one should pay attention. First of all, this code analysis warning states that GetHashCode should not throw. That's something to think about, but let's not be dogmatic. Sometimes you have to break the rules for a reason.
However, that is not all. In his blog post on the subject, Eric Lippert says that if you throw from inside GetHashCode then
your object cannot be a result in many LINQ-to-objects queries that use hash tables
internally for performance reasons.
Losing LINQ is certainly a bummer, but fortunately the road does not end here. Many (all?) LINQ methods that use hash tables have overloads that accept an IEqualityComparer<T> to be used when hashing. So you can in fact use LINQ, but it's going to be less convenient.
In the end you will have to weigh the options yourself. My opinion is that it's better to operate with a whitelist strategy (provide an IEqualityComparer<T> whenever needed) as long as it is technically feasible because that makes the code explicit: if someone tries to use the class naively they get an exception that helpfully tells them what's going on and the equality comparer is visible in the code wherever it is used, making the extraordinary behavior of the class immediately clear.
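For example, a sketch of that whitelist approach with a hypothetical Widget type:

using System.Collections.Generic;
using System.Linq;

class Widget { public int Id { get; set; } }   // hypothetical type

sealed class WidgetComparer : IEqualityComparer<Widget>
{
    public bool Equals(Widget x, Widget y) => x.Id == y.Id;
    public int GetHashCode(Widget w) => w.Id;
}

// The comparer is explicit at every use site, so the hashing behavior is visible:
// var unique = widgets.Distinct(new WidgetComparer()).ToList();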
Where I want to override Equals, but there is no sensible immutable "key" for an object (and for whatever reason it doesn't make sense to make the whole object immutable), in my opinion there is only one "correct" choice:
Implement GetHashCode to hash the same fields as Equals uses. (This might be all the fields.)
Document that these fields must not be altered while in a dictionary.
Trust that users either don't put these objects in dictionaries, or obey the second rule.
(Returning a constant value compromises dictionary performance. Throwing an exception disallows too many useful cases where objects are cached but not modified. Any other implementation for GetHashCode would be wrong.)
Where this runs the user into trouble anyway, it's probably their fault. (Specifically: using a dictionary where they shouldn't, or using a model type in a context where they should be using a view-model type that uses reference equality instead.)
Or perhaps I shouldn't be overriding Equals in the first place.
If the classes truly contain nothing constant on which a hash value can be calculated then I would use something simpler than a GUID. Just use a random number persisted in the class (or in a wrapper class).
A simple approach is to store the hashCode in a private member and generate it on the first use. If your entity doesn't change often, and you're not going to be using two different objects that are Equal (where your Equals method returns true) as keys in your dictionary, then this should be fine:
private int? _hashCode;

public override int GetHashCode()
{
    if (!_hashCode.HasValue)
        // Combine whatever fields your Equals method uses.
        _hashCode = Property1.GetHashCode() ^ Property2.GetHashCode();
    return _hashCode.Value;
}
However, suppose you have objects a and b, where a.Equals(b) == true, and you store an entry in your dictionary using a as the key (dictionary[a] = value).
If a does not change, then dictionary[b] will return value; however, if you change a after storing the entry in the dictionary, then dictionary[b] will most likely fail.
The only workaround to this is to rehash the dictionary when any of the keys change.
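To make the failure mode concrete (Entity is a hypothetical class using the cached-hash GetHashCode above, with Equals comparing Property1):

var a = new Entity { Property1 = "x" };
var b = new Entity { Property1 = "x" };     // a.Equals(b) == true

var dict = new Dictionary<Entity, string> { [a] = "value" };
var ok = dict[b];       // works: same hash code, and Equals(a, b) is true

a.Property1 = "y";      // a's cached hash is unchanged, but a.Equals(b) is now false
// dict[b] now throws KeyNotFoundException: the right bucket is found via the
// hash, but the equality check against the stored key a fails.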

Serialization of primitives

I've come across this problem often, but I haven't found a satisfying solution yet.
I am implementing a reader for savegames (but it could also be applied to other types of files). Depending on the version, there are some added entries, but the order always remains the same. Therefore I created a class:
public class Entry<T>
{
    public T Value;
    public readonly FileVersion MinVersion;

    // Note: the parameter type must be FileVersion to match the field and the default value.
    public Entry(T v = default(T), FileVersion m = FileVersion.V115)
    {
        Value = v;
        MinVersion = m;
    }
}
Now, as you can guess, I want to write those entries with as little code as possible. I want to write the line if (version >= MinVersion) { /* write data */ } only once. The entries can be primitive types or objects, which is the problem...
Should I define an interface and implement it for every needed primitive type as a wrapper? Or is there a more elegant solution?
(See the comments for the specific questions.)
Some values are only written if a certain condition is met.
Are these conditions known at the time the file is read/written or, when read, are they based on other data in the file? If the former (already known), pass in a Func<bool> that must evaluate to true for the read or write operation to occur. The caller can supply an appropriate delegate or lambda method that makes the decision. You mention a minimum version in the question. I assume it is an example of this.
If the latter (values are read/written based on other data in the file), this is a wider question. If the decision can be made on data earlier in the file or in known places, load it and pass the appropriate arguments into the Func. Otherwise, you may need to look at more complex parsing mechanisms but I think this not what you are asking.
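As a sketch of the "already known" case (the names here are illustrative, not from the question):

using System;
using System.IO;

static class EntryWriter
{
    // The single condition check lives here; callers supply the predicate.
    public static void WriteIf(BinaryWriter writer, Func<bool> condition, Action<BinaryWriter> write)
    {
        if (condition())
            write(writer);
    }
}

// Usage for a version-gated entry:
// EntryWriter.WriteIf(writer, () => version >= entry.MinVersion, w => w.Write(entry.Value));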
It is not a static structure and contains some things like struct { int len; char[len]; }.
.Net offers multiple ways to serialize objects but I suspect you want to read/write in a defined format, such as one that stores a string as a length followed by 8-bit characters. If the .Net mechanisms do not do what you want, you may have to write your own. See Byte for byte serialization of a struct in C# for more information on this, including the use of Marshal to get the underlying bytes of a primitive.
Also, more for reference, if you want to avoid writing primitive types out, you could use public class Entry<T> where T: class.

List of const int instead of enum

I started working on a large C# code base and found the use of a static class with several const int fields. This class is acting exactly like an enum would.
I would like to convert the class to an actual enum, but the powers that be said no. The main reason I would like to convert it is so that I could have the enum as the data type instead of int. This would help a lot with readability.
Is there any reason to not use enums and to use const ints instead?
This is currently how the code is:
public int FieldA { get; set; }
public int FieldB { get; set; }
public static class Ids
{
    public const int ItemA = 1;
    public const int ItemB = 2;
    public const int ItemC = 3;
    public const int ItemD = 4;
    public const int ItemE = 5;
    public const int ItemF = 6;
}
However, I think it should be the following instead:
public Ids FieldA { get; set; }
public Ids FieldB { get; set; }
I think many of the answers here ignore the implications of the semantics of enums.
You should consider using an enum when the entire set of all valid values (Ids) is known in advance, and is small enough to be declared in program code.
You should consider using an int when the set of known values is a subset of all the possible values - and the code only needs to be aware of this subset.
With regards to refactoring: when time and business constraints allow, it's a good idea to clean code up when the new design/implementation has a clear benefit over the previous implementation and where the risk is well understood. In situations where the benefit is low or the risk is high (or both), it may be better to take the position of "do no harm" rather than "continuously improve". Only you are in a position to judge which case applies to your situation.
By the way, a case where neither enums nor constant ints are necessarily a good idea is when the IDs represent the identifiers of records in an external store (like a database). It's often risky to hardcode such IDs in the program logic, as these values may actually be different in different environments (e.g. Test, Dev, Production). In such cases, loading the values at runtime may be a more appropriate solution.
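For instance, a sketch of loading such IDs at startup; the table and column names here are assumptions:

using System.Collections.Generic;
using System.Data.SqlClient;

var ids = new Dictionary<string, int>();
using (var conn = new SqlConnection(connectionString))   // connectionString is environment-specific
using (var cmd = new SqlCommand("SELECT Name, Id FROM ItemTypes", conn))
{
    conn.Open();
    using (var reader = cmd.ExecuteReader())
        while (reader.Read())
            ids[reader.GetString(0)] = reader.GetInt32(1);
}
// ids["ItemA"] now replaces the hardcoded constant.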
Your suggested solution looks elegant, but won't work as it stands, as you can't use instances of a static type. It's a bit trickier than that to emulate an enum.
There are a few possible reasons for choosing enum or const-int for the implementation, though I can't think of many strong ones for the actual example you've posted - on the face of it, it seems an ideal candidate for an enum.
A few ideas that spring to mind are:
Enums
They provide type-safety. You can't pass any old number where an enum value is required.
Values can be autogenerated
You can use reflection to easily convert between the 'values' and 'names'
You can easily enumerate the values in an enum in a loop, and then if you add new enum members the loop will automatically take them into account.
You can insert new enum values without worrying about clashes occurring if you accidentally repeat a value.
const-ints
If you don't understand how to use enums (e.g. not knowing how to change the underlying data type of an enum, how to set explicit values for enum members, or how to assign the same value to multiple constants) you might mistakenly believe you're achieving something you can't use an enum for, by using a const.
If you're used to other languages you may just naturally approach the problem with consts, not realising that a better solution exists.
You can derive from classes to extend them, but annoyingly you can't derive a new enum from an existing one (which would be a really useful feature). Potentially you could therefore use a class (but not the one in your example!) to achieve an "extendable enum".
You can pass ints around easily. Using an enum may require you to be constantly casting (e.g.) data you receive from a database to and from the enumerated type. What you lose in type-safety you gain in convenience. At least until you pass the wrong number somewhere... :-)
If you use readonly rather than const, the values are stored in actual memory locations that are read when needed. This allows you to publish constants to another assembly that are read and used at runtime, rather than built into the other assembly, which means that you don't have to recompile the dependent assembly when you change any of the constants in your own assembly. This is an important consideration if you want to be able to patch a large application by just releasing updates for one or two assemblies.
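A small sketch of that difference (the names are illustrative):

public static class SharedValues
{
    public const int MaxRetries = 3;                 // baked into every consuming assembly at compile time
    public static readonly int TimeoutSeconds = 30;  // read from this assembly at runtime
}

// If TimeoutSeconds changes, redeploying this one assembly is enough;
// if MaxRetries changes, every consumer must be recompiled to pick it up.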
I guess it is a way of making it clearer that the enum values must stay unchanged. With an enum another programmer will just drop in a new value without thinking, but a list of consts makes you stop and think "why is it like this? How do I add a new value safely?". But I'd achieve this by putting explicit values on the enums and adding a clear comment, rather than resorting to consts.
Why should you leave the implementation alone?
The code may well have been written by an idiot who has no good reason for what he did. But changing his code and showing him he's an idiot isn't a smart or helpful move.
There may be a good reason it's like that, and you will break something if you change it (e.g. it may need to be a class due to being accessed through reflection, being exposed through external interfaces, or to stop people easily serializing the values because they'll be broken by the obfuscation system you're using). No end of unnecessary bugs are introduced into systems by people who don't fully understand how something works, especially if they don't know how to test their changes to ensure they haven't broken anything.
The class may be autogenerated by an external tool, so it is the tool you need to fix, not the source code.
There may be a plan to do something more with that class in future (?!)
Even if it's safe to change, you will have to re-test everything that is affected by the change. If the code works as it stands, is the gain worth the pain? When working on legacy systems we will often see existing code of poor quality or just done a way we don't personally like, and we have to accept that it is not cost effective to "fix" it, no matter how much it niggles. Of course, you may also find yourself biting back an "I told you so!" when the const-based implementation fails due to lacking type-safety. But aside from type-safety, the implementation is ultimately no less efficient or effective than an enum.
If it ain't broke, don't fix it.
I don't know the design of the system you're working on, but I suspect that the fields are integers that just happen to have a number of predefined values. That's to say they could, in some future state, contain more than those predefined values. While an enum allows for that scenario (via casting), it implies that only the values the enumeration contains are valid.
Overall, the change is a semantic one but it is unnecessary. Unnecessary changes like this are often a source of bugs, additional test overhead and other headaches with only mild benefits. I say add a comment expressing that this could be an enum and leave it as it is.
Yes, it does help with readability, and no I cannot think of any reason against it.
Using const int is a very common old-school programming practice from C++.
The reason I see for it is interoperability: if you want to stay loosely coupled with another system that uses the same constants, const ints let you agree on values without having to share the same enum type.
Like in RPC calls or something...

(Deep) comparison of an object to a reference in unit tests (C#)

In a Unit Test (in Visual Studio 2008) I want to compare the content of a large object (a list of custom types, to be precise) with a stored reference of this object. The goal is to make sure, that any later refactorings of the code produces the same object content.
Discarded Idea:
A first thought was to serialize to XML, and then compare the hardcoded strings or a file content. This would allow for easy finding of any difference. However since my types are not XML serializable without a hack, I must find another solution. I could use binary serialization but this will not be readable anymore.
Is there a simple and elegant solution to this?
EDIT: According to Marc Gravell's proposal I do now like this:
// Requires System.Runtime.Serialization.Formatters.Binary for BinaryFormatter.
using (MemoryStream stream = new MemoryStream())
{
    // Create the actual graph using only the comparable properties.
    List<NavigationResult> comparableActual = (from item in sparsed
                                               select new NavigationResult
                                               {
                                                   Direction = item.Direction,
                                                   /*...*/
                                                   VersionIndication = item.VersionIndication
                                               }).ToList();

    (new BinaryFormatter()).Serialize(stream, comparableActual);

    // Base64-encoded binary representation of the graph.
    string base64encodedActual = System.Convert.ToBase64String(stream.GetBuffer(), 0, (int)stream.Length);
    string base64encodedReference = @"AAEAAAD....";  // this reference is the expected value

    Assert.AreEqual(base64encodedReference, base64encodedActual, "The comparable part of the sparsed set is not equal to the reference.");
}
In essence, I select the comparable properties first, then encode the graph, then compare it to a similarly encoded reference.
Encoding enables deep comparison in a simple way. The reason I use base64 encoding is that I can easily store the reference in a string variable.
I would still be inclined to use serialization. But rather than having to know the binary, just create an expected graph and serialize that. Now serialize the actual graph and compare the bytes. This is only useful to tell you that there is a difference; you'd need inspection to find what it is, which is a pain.
I would use the hack to do the XML comparison. Or you could use reflection to automatically traverse object properties (but this will traverse ALL of them, including some you might not want to).
I would make each custom type implement IComparable and provide equality methods that compare the custom types, as well as making the main class IComparable. You can then simply compare the two objects (if you have them in memory when running the unit tests). If not, then I would suggest either serializing, or defining constants which you expect the refactored object to have.

Which is faster/more efficient: Dictionary<string,object> or Dictionary<enum,object>?

Are enum types faster/more efficient than string types when used as dictionary keys?
IDictionary<string,object> or IDictionary<enum,object>
As a matter of fact, which data type is most suitable as a dictionary key and why?
Consider the following (note: only 5 properties, for simplicity):
struct MyKeys
{
    // const so this compiles: instance field initializers are not allowed in structs.
    public const string Incomplete = "IN";
    public const string Submitted = "SU";
    public const string Processing = "PR";
    public const string Completed = "CO";
    public const string Closed = "CL";
}
and
enum MyKeys
{
    Incomplete,
    Submitted,
    Processing,
    Completed,
    Closed
}
Which of the above will be better if used as keys in a dictionary?
Certainly the enum version is better (when both are applicable and make sense, of course). Not just for performance (which can be better or worse; see Rashack's very good comment): it's checked at compile time and results in cleaner code.
You can circumvent the comparer issue by using Dictionary<int, object> and casting enum keys to ints or specifying a custom comparer.
I think you should start by focusing on correctness. This is far more important than the minor performance differences that may occur within your program. In this case I would focus on the proper representation of your types (enum appears to be best). Then later on, profile your application, and if there is an issue, then and only then should you fix it.
Making code faster later in the process is typically a straightforward process. Take the link that skolima provided. If you had chosen enum, it would have been a roughly 10-minute fix to remove a potential performance problem in your application. I want to stress the word potential here. This was definitely a problem for NHibernate, but whether it would be a problem for your program is determined solely by how it is used.
On the other hand, making code more correct later in the process tends to be more difficult. In a large enough problem you'll find that people start taking dependencies on the side effects of the previous bad behavior. This can make correcting code without breaking other components challenging.
Use enum to get cleaner and nicer code, but remember to provide a custom comparer if you are concerned with performance: http://ayende.com/Blog/archive/2009/02/21/dictionaryltenumtgt-puzzler.aspx .
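A sketch of such a comparer for the MyKeys enum above; it sidesteps the boxing the default comparer can incur for enum keys:

using System.Collections.Generic;

sealed class MyKeysComparer : IEqualityComparer<MyKeys>
{
    public bool Equals(MyKeys x, MyKeys y) => x == y;
    public int GetHashCode(MyKeys key) => (int)key;
}

var dict = new Dictionary<MyKeys, object>(new MyKeysComparer());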
I would guess that the enum version is faster. Under the hood, the dictionary references everything by hash code. My guess is that it is slower to generate the hash code for a string. However, this is probably negligibly slower, and it is most certainly faster than anything like a string compare. I agree with the other posters who said that an enum is cleaner.
