Unique ID for each class - c#

I want a unique ID (preferably static, without computation) for each class implementation, not for each instance. The most obvious way to do this is to hardcode a value in the class, but keeping the values unique becomes a task for a human, which isn't ideal.
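abstract class Base
{
    public abstract int GetID();
}
class Foo : Base
{
    public override int GetID() => 10;
}
class Bar : Base
{
    public override int GetID() => 20;
}
Foo foo1 = new Foo();
Foo foo2 = new Foo();
Bar bar = new Bar();
foo1.GetID() == foo2.GetID(); // true: same class, same ID
foo1.GetID() != bar.GetID(); // true: different classes, different IDs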
The class name would be an obvious unique identifier, but I need an int (or fixed length bytes). I pack the entire object into bytes, and use the id to know what class it is when I unpack it at the other end.
Hashing the class name every time I call GetID() seems needlessly process heavy just to get an ID number.
I could also make an enum as a lookup, but again I need to populate the enum manually.
EDIT: People have been asking important questions, so I'll put the info here.
Needs to be unique per class, not per instance (this is why the identified duplicate question doesn't answer this one).
ID value needs to be persistent between runs.
Value needs to be fixed length bytes or int. Variable length strings such as class name are not acceptable.
Needs to reduce CPU load wherever possible (caching results or using assembly based metadata instead of doing a hash each time).
Ideally, the ID can be retrieved from a static function. This means I can make a static lookup function that matches ID to class.
The number of different classes that need an ID isn't that big (<100), so collisions aren't a major concern.
EDIT2:
Some more colour since people are skeptical that this is really needed. I'm open to a different approach.
I'm writing some networking code for a game, and it's broken down into message objects. Each different message type is a class that inherits from MessageBase and adds its own fields, which will be sent.
The MessageBase class has a method for packing itself into bytes, and it sticks a message identifier (the class ID) on the front. When it comes to unpacking it at the other end, I use the identifier to know how to unpack the bytes. This results in some easy to pack/unpack messages and very little overhead (few bytes for ID, then just class property values).
Currently I hard code an ID number in the classes, but it doesn't seem like the best way of doing things.
EDIT3: Here is my code after implementing the accepted answer.
public class MessageBase
{
public MessageID id { get { return GetID(); } }
private MessageID cacheId;
private MessageID GetID()
{
// Check if cacheId hasn't been initialised
if (cacheId == null)
{
// Hash the class name
MD5 md5 = MD5.Create();
byte[] md5Bytes = md5.ComputeHash(Encoding.UTF8.GetBytes(GetType().AssemblyQualifiedName));
// Convert the first few bytes into a uint32, and create the messageID from it and store in cache
cacheId = new MessageID(BitConverter.ToUInt32(md5Bytes, 0));
}
// Return the cacheId
return cacheId;
}
}
public class Protocol
{
private Dictionary<Type, MessageID> messageTypeToId = new Dictionary<Type, MessageID>();
private Dictionary<MessageID, Type> idToMessageType = new Dictionary<MessageID, Type>();
private Dictionary<MessageID, Action<MessageBase>> handlers = new Dictionary<MessageID, Action<MessageBase>>();
public Protocol()
{
// Create a list of all classes in this namespace that are a subclass of MessageBase
IEnumerable<Type> messageClasses = from t in Assembly.GetExecutingAssembly().GetTypes()
where t.Namespace == GetType().Namespace && t.IsSubclassOf(typeof(MessageBase))
select t;
// Iterate through the list of message classes, and store their type and id in the dicts
foreach(Type messageClass in messageClasses)
{
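// GetField returns null for a property, so this line assumes each message class declares a public static field named id
MessageID id = (MessageID)messageClass.GetField("id").GetValue(null);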
messageTypeToId[messageClass] = id;
idToMessageType[id] = messageClass;
}
}
}

Given that you can get a Type by calling GetType on the instance, you can easily cache the results. That reduces the problem to working out how to generate an ID for each type. You'd then call something like:
int id = typeIdentifierCache.GetIdentifier(foo1.GetType());
... or make GetIdentifier accept object and it can call GetType(), leaving you with
int id = typeIdentifierCache.GetIdentifier(foo1);
At that point, the detail is all in the type identifier cache.
A simple option would be to take a hash (e.g. SHA-256, for stability and making it very unlikely that you'll encounter collisions) of the fully-qualified type name. To prove that you have no collisions, you could easily write a unit test that runs over all the type names in the assembly and hashes them, then checks there are no duplicates. (Even that might be overkill, given the nature of SHA-256.)
This is all assuming that the types are in a single assembly. If you need to cope with multiple assemblies, you may want to hash the assembly-qualified name instead.
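As a rough sketch of what such a type identifier cache might look like (the class name TypeIdentifierCache and its members are hypothetical, and BinaryPrimitives assumes .NET Core 2.1 or later, or the System.Memory package): it hashes the full type name once per type with SHA-256 and keeps the first four bytes as the int. Truncating to 32 bits makes collisions more likely than with the full hash, so the duplicate check mentioned above is still worth running.

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Concurrent;
using System.Security.Cryptography;
using System.Text;

// Hypothetical cache: maps each Type to an int derived from a SHA-256 hash of its name.
public sealed class TypeIdentifierCache
{
    private readonly ConcurrentDictionary<Type, int> _ids = new ConcurrentDictionary<Type, int>();

    // The hash is computed only the first time a given type is seen.
    public int GetIdentifier(Type type) => _ids.GetOrAdd(type, ComputeId);

    // Convenience overload that accepts an instance and uses its runtime type.
    public int GetIdentifier(object instance) => GetIdentifier(instance.GetType());

    private static int ComputeId(Type type)
    {
        using (SHA256 sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(type.FullName));
            // Read the first four bytes with a fixed byte order so the value
            // does not depend on the machine the code runs on.
            return BinaryPrimitives.ReadInt32LittleEndian(hash);
        }
    }
}
```

Because the value depends only on the type name, it stays the same between runs as long as the type is not renamed or moved to another namespace.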

Here is one suggestion. I have used a SHA-256 byte array, which is guaranteed to be a fixed size and astronomically unlikely to have a collision. That may well be overkill; you can easily substitute something smaller. You could also use the AssemblyQualifiedName rather than FullName if you need to worry about version differences or the same class name in multiple assemblies.
Firstly, here are all my usings
using System;
using System.Collections.Concurrent;
using System.Text;
using System.Security.Cryptography;
Next, a static cached type hasher object to remember the mapping between your types and the resulting byte arrays. You don't need the Console.WriteLines below, they are just there to demonstrate that you are not computing it over and over again.
public static class TypeHasher
{
private static ConcurrentDictionary<Type, byte[]> cache = new ConcurrentDictionary<Type, byte[]>();
public static byte[] GetHash(Type type)
{
byte[] result;
if (!cache.TryGetValue(type, out result))
{
Console.WriteLine("Computing Hash for {0}", type.FullName);
SHA256Managed sha = new SHA256Managed();
result = sha.ComputeHash(Encoding.UTF8.GetBytes(type.FullName));
cache.TryAdd(type, result);
}
else
{
// Not actually required, but shows that hashing only done once per type
Console.WriteLine("Using cached Hash for {0}", type.FullName);
}
return result;
}
}
Next, an extension method on object so that you can ask for anything's id. Of course if you have a more suitable base class, it doesn't need to go on object per se.
public static class IdExtension
{
public static byte[] GetId(this object obj)
{
return TypeHasher.GetHash(obj.GetType());
}
}
Next, here are some random classes
public class A
{
}
public class ChildOfA : A
{
}
public class B
{
}
And finally, here is everything put together.
public class Program
{
public static void Main()
{
A a1 = new A();
A a2 = new A();
B b1 = new B();
ChildOfA coa = new ChildOfA();
Console.WriteLine("a1 hash={0}", Convert.ToBase64String(a1.GetId()));
Console.WriteLine("b1 hash={0}", Convert.ToBase64String(b1.GetId()));
Console.WriteLine("a2 hash={0}", Convert.ToBase64String(a2.GetId()));
Console.WriteLine("coa hash={0}", Convert.ToBase64String(coa.GetId()));
}
}
Here is the console output
Computing Hash for A
a1 hash=VZrq0IJk1XldOQlxjN0Fq9SVcuhP5VWQ7vMaiKCP3/0=
Computing Hash for B
b1 hash=335w5QIVRPSDS77mSp43if68S+gUcN9inK1t2wMyClw=
Using cached Hash for A
a2 hash=VZrq0IJk1XldOQlxjN0Fq9SVcuhP5VWQ7vMaiKCP3/0=
Computing Hash for ChildOfA
coa hash=wSEbCG22Dyp/o/j1/9mIbUZTbZ82dcRkav4olILyZs4=
On the other side, you would use reflection to iterate all of the types in your library and store a reverse dictionary of hash to type.
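A minimal sketch of that reverse dictionary, assuming the same SHA-256 over the UTF-8 bytes of FullName as in TypeHasher. The names TypeLookup and Build are made up for illustration. The key is the Base64 text of the hash, because a byte[] used directly as a dictionary key is compared by reference rather than by content.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;
using System.Security.Cryptography;
using System.Text;

public static class TypeLookup
{
    // Builds a hash -> Type map for every type in the assembly that derives
    // from (or implements) baseType. Hypothetical helper, not part of any library.
    public static Dictionary<string, Type> Build(Assembly assembly, Type baseType)
    {
        var map = new Dictionary<string, Type>();
        using (SHA256 sha = SHA256.Create())
        {
            foreach (Type t in assembly.GetTypes().Where(t => baseType.IsAssignableFrom(t)))
            {
                byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(t.FullName));
                // Add throws ArgumentException on a duplicate key, so a hash
                // collision is reported at startup instead of going unnoticed.
                map.Add(Convert.ToBase64String(hash), t);
            }
        }
        return map;
    }
}
```

On the receiving side you would convert the incoming hash bytes with Convert.ToBase64String and look the result up in the returned dictionary.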

I have not seen you answer whether the same value needs to persist between different runs, but if all you need is a unique ID for a class, then use the simple built-in GetHashCode method:
class BaseClass
{
public int ClassId() => GetType().GetHashCode();
}
If you are worried about performance of multiple calls to GetHashCode(), then first, don't, that is ridiculous micro-optimization, but if you insist, then store it.
GetHashCode() is fast, that is its entire purpose, as a fast way to compare values in a hash.
EDIT:
After doing some tests, the same hash code is returned between different runs using this method. I did not test after altering the classes, though, and I am not aware of exactly how a Type is hashed.

Related

why don't List<T>.GetHashCode and ObservableCollection<T>.GetHashCode evaluate their items?

I think it is strange that the GetHashCode function of these collections doesn't base its hash code on the items in the list.
I need this to work in order to provide dirty checking (you have unsaved data).
I've written a wrapping class that overrides the GetHashCode method but I find it weird that this is not the default implementation.
I guess this is a performance optimization?
class Program
{
static void Main(string[] args)
{
var x = new ObservableCollection<test>();
int hash = x.GetHashCode();
x.Add(new test("name"));
int hash2 = x.GetHashCode();
var z = new List<test>();
int hash3 = z.GetHashCode();
z.Add(new test("tets"));
int hash4 = z.GetHashCode();
var my = new CustomObservableCollection<test>();
int hash5 = my.GetHashCode();
var test = new test("name");
my.Add(test);
int hash6 = my.GetHashCode();
test.Name = "name2";
int hash7 = my.GetHashCode();
}
}
public class test
{
public test(string name)
{
Name = name;
}
public string Name { get; set; }
public override bool Equals(object obj)
{
if (obj is test)
{
var o = (test) obj;
return o.Name == this.Name;
}
return base.Equals(obj);
}
public override int GetHashCode()
{
return Name.GetHashCode();
}
}
public class CustomObservableCollection<T> : ObservableCollection<T>
{
public override int GetHashCode()
{
int collectionHash = base.GetHashCode();
foreach (var item in Items)
{
var itemHash = item.GetHashCode();
if (int.MaxValue - itemHash > collectionHash)
{
collectionHash = collectionHash * -1;
}
collectionHash += itemHash;
}
return collectionHash;
}
}
If it did, it would break a few of the guidelines for implementing GetHashCode. Namely:
the integer returned by GetHashCode should never change
Since the content of a list can change, then so would its hash code.
the implementation of GetHashCode must be extremely fast
Depending on the size of the list, you could risk slowing down the calculation of its hash code.
Also, I do not believe you should be using an object's hashcode to check if data is dirty. The probability of collision is higher than you think.
The Equals/GetHashCode of lists checks for reference equality, not content equality. The reason behind this is that lists are both mutable and reference types (not structs). If the hash code were based on the contents, it would change every time you changed the contents.
The common use case of hash codes is hash tables (for example Dictionary<K,V> or HashSet<T>), which place their items into buckets based on the hash when they are first inserted into the table. If the hash of an object which is already in the table changes, it may no longer be found, which leads to erratic behavior.
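To make the "may no longer be found" point concrete, here is a small sketch. MutableKey and MutableKeyDemo are invented names for illustration: the key's hash code depends on mutable state, and the set loses track of it once that state changes.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key type whose hash code depends on mutable state.
public sealed class MutableKey
{
    public int Value { get; set; }

    public override bool Equals(object obj) => obj is MutableKey other && other.Value == Value;

    public override int GetHashCode() => Value;
}

public static class MutableKeyDemo
{
    // Returns whether the set can still find the key after the key was mutated.
    public static bool FoundAfterMutation()
    {
        var key = new MutableKey { Value = 1 };
        var set = new HashSet<MutableKey> { key };

        key.Value = 2; // the hash code changes while the key is stored in the set

        // The set searches using the new hash code, but the entry was stored
        // under the old one, so the very same object is not found.
        return set.Contains(key);
    }
}
```

A content-based hash on List<T> would expose every dictionary or set that holds a list to this kind of failure whenever the list is modified.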
The key of GetHashCode is to reflect the Equals() logic, in a light weight way.
And List<T>.Equals() inherits Object.Equals(), which compares by reference, so a list's equality and hash code are based not on its items but on the list object itself.
It would be helpful to have a couple types which behaved like List<T> and could generally be used interchangeably with it, but with GetHashCode and Equals methods which would define equivalence either in terms of the sequence of identities, or the Equals and GetHashCode behaviors of the items encapsulated therein. Making such methods to behave efficiently, however, would require that the class include code to cache its hash value but invalidate or update the cached hash value whenever the collection was modified (it would not be legitimate to modify a list while it was stored as a dictionary key, but it should be legitimate to remove a list, modify it, and re-add it, and it would be very desirable to avoid having such modification necessitate re-hashing the entire contents of the list). It was not considered worthwhile to have ordinary lists go through the effort of supporting such behavior at the cost of slowing down operations on lists that never get hashed; nor was it considered worthwhile to define multiple types of list, multiple types of dictionary, etc. based upon the kind of equivalence they should look for in their members or should expose to the outside world.
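For what it is worth, a simplified version of such a type can be sketched on top of Collection<T>. The name SequenceList is hypothetical, and this version only invalidates the cached hash when the collection itself is modified. It does not notice items that are mutated in place, and it recomputes the whole hash rather than updating it incrementally.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Linq;

// Hypothetical list type with content-based equality and a cached hash code.
public sealed class SequenceList<T> : Collection<T>
{
    private int? _cachedHash; // null means the hash must be recomputed

    // Every structural change goes through one of these four hooks.
    protected override void InsertItem(int index, T item) { base.InsertItem(index, item); _cachedHash = null; }
    protected override void SetItem(int index, T item) { base.SetItem(index, item); _cachedHash = null; }
    protected override void RemoveItem(int index) { base.RemoveItem(index); _cachedHash = null; }
    protected override void ClearItems() { base.ClearItems(); _cachedHash = null; }

    public override bool Equals(object obj) =>
        obj is SequenceList<T> other && this.SequenceEqual(other);

    public override int GetHashCode()
    {
        if (_cachedHash == null)
        {
            unchecked
            {
                int hash = 17;
                foreach (T item in this)
                {
                    hash = hash * 31 + EqualityComparer<T>.Default.GetHashCode(item);
                }
                _cachedHash = hash;
            }
        }
        return _cachedHash.Value;
    }
}
```

As the answer says, such a list must still be removed from any dictionary or set before it is modified and re-added afterwards.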

Datastructure to represent extensible and unique values

I have a small problem. I have an application-monitoring component in a framework which is used by multiple applications.
Right now I have functionality like this:
public enum Vars
{
CPU,
RAM
}
public void Add(Vars variable, object value)
{
[...]
}
The variable passed as a parameter to the Add method is used as the name of the entry in the database.
Now I have a requirement that applications can specify their own variables outside the framework. Because you can't inherit from an enum, this causes some trouble.
I see basically two possibilities to solve this (neither of which is very satisfying in my opinion).
Possibility 1:
public void Add(Enum variable, object value)
This method would accept all sorts of enums, so users could use the Vars enum as well as enums they have defined themselves. The problem with this solution: users could use the same names in both the application and the framework. I would not be able to distinguish between two enums with the value "CPU" (the framework may store percent values as "CPU", while the application may store process CPU usage as "CPU").
Possibility 2:
The second option would be a class instead of an enum, something like:
public class Vars
{
public const string CPU = "CPU";
public const string RAM = "RAM";
}
The drawbacks here:
1. More to write.
2. I would have to define parameters as strings:
public void Add(string variable, object value);
This could lead to misuse as well (applications that pass strings directly instead of defining a class that inherits from Vars).
Any thoughts on how to define a model which:
Can be inherited (to extend the values with application-specific values)
Can be used as a parameter
Ensures that there are no duplicate (same-value) entries
?
The context is not completely clear, but what about creating a class
public class Vars
{
public static Vars CPU = Vars.Get("CPU", 1);
public static Vars RAM = Vars.Get("RAM", 2);
//You can keep one of the params, name or id
private Vars(string name, int id)
{
...
}
public static Vars Get(string name, int id)
{
//check if id or name exists in static dictionary, and return that instance or create new one
}
}
public void Add(Vars variable, object value);
Now the user can create any kind of parameter and pass it to the method:
Vars newVar = Vars.Get("MyNewParam", 10);
You can easily check whether the passed parameter is one you already know about.
The Get method returns the same instance if the parameters are the same.
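One way the elided parts could be filled in, as a sketch only: this version keeps just the name (the answer notes that either the name or the id would do) and uses a ConcurrentDictionary as the static registry, which is an assumption on my part rather than something the answer specifies.

```csharp
using System;
using System.Collections.Concurrent;

public sealed class Vars
{
    // Declared first so it is initialised before the static instances below.
    private static readonly ConcurrentDictionary<string, Vars> Registry =
        new ConcurrentDictionary<string, Vars>(StringComparer.Ordinal);

    public static readonly Vars CPU = Get("CPU");
    public static readonly Vars RAM = Get("RAM");

    public string Name { get; }

    // Private constructor: instances can only be obtained through Get.
    private Vars(string name)
    {
        Name = name;
    }

    // Returns the existing instance for a name, or registers a new one.
    public static Vars Get(string name) => Registry.GetOrAdd(name, n => new Vars(n));
}
```

Since there is exactly one instance per name, two variables can be compared by reference. An application that asks for "CPU" gets the framework's instance back, so if application and framework values must stay distinct, the application would need to choose distinct names, for example by adding its own prefix.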

Generating an identifier for objects so that they can be added to a hashtable I have created

I have a hashtable base class and I am creating different types of hashtable by deriving from it. I only allow it to accept objects that implement my IHashable interface. For example:
class LinearProbingHashTable<T> : HashTableBase<T> where T: IHashable
{
...
...
...
}
interface IHashable
{
/**
* Every IHashable implementation should provide an identifying value for use in generating a hash key.
*/
int getIdentifier();
}
class Car : IHashable
{
public String Make { get; set; }
public String Model { get; set; }
public String Color { get; set; }
public int Year { get; set; }
public int getIdentifier()
{
/// ???
}
}
Can anyone suggest a good method for generating an identifier for the car that can be used by the hash function to place it in the hash table?
I am actually really looking for a general purpose solution to generating an id for any given class. I would like to have a base class for all classes, HashableObject, that implements IHashable and its getIdentifier method. So then I could just derive from HashableObject which would automatically provide an identifier for any instances. Which means I wouldn't have to write a different getIdentifier method for every object I add to the hashtable.
public class HashableObject : IHashable
{
public int getIdentifier()
{
// Looking for code here that would generate an id for any object...
}
}
public class Dog : HashableObject
{
// Don't need to implement getIdentifier because the parent class does it for me
}
I would split the problem in two:
1. How to generate hash codes of primitive types: strings, integers, etc.
2. How to combine multiple hash codes into one hash code
Using (1) and then (2), you can generate the hash code of any class or structure.
The naive way to do (1) for strings is to add the code of all characters in the string:
public static int getStringIdentifier(string str)
{
int result = 0;
foreach (char c in str) {
result += (int)c;
}
return result;
}
Similar naive algorithms can be used for other basic data types (that are all array of bytes in the end..).
The naive way to do (2) is to simply combine the various hash codes with XOR:
public int getIdentifier()
{
return getStringIdentifier(Make) ^ getStringIdentifier(Model) ^ getStringIdentifier(Color);
}
These algorithms will work, but won't generate good distributions of the hash code values - i.e. there will be collisions.
If you want better algorithms you can have a look at how the .NET framework does it - here is the source code of the class used internally to combine multiple hash codes, and here is the source code of the String class - including String.GetHashCode().
As you can see they are variants of the naive one above, with different starting values and more complex combinations.
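To illustrate what "different starting values and more complex combinations" can mean, here is a sketch that is not the .NET implementation: the string hash is an FNV-1a-style loop over the UTF-16 code units, and the combiner multiplies the running value by a prime before adding the next hash. The class name HashSketch and the constants 17 and 31 are choices made for this example.

```csharp
using System;

public static class HashSketch
{
    // FNV-1a-style hash over the UTF-16 code units of the string (32-bit constants).
    public static int GetStringIdentifier(string str)
    {
        unchecked
        {
            uint hash = 2166136261; // non-zero starting value
            foreach (char c in str)
            {
                hash ^= c;
                hash *= 16777619;
            }
            return (int)hash;
        }
    }

    // Order-sensitive combination: multiply the running value by a prime,
    // then add the next hash. Unlike XOR, swapping two inputs changes the result.
    public static int Combine(params int[] hashes)
    {
        unchecked
        {
            int result = 17;
            foreach (int h in hashes)
            {
                result = result * 31 + h;
            }
            return result;
        }
    }
}
```

With the naive versions, "ab" and "ba" produce the same sum, and a car whose Make and Model are swapped produces the same XOR. With this sketch both cases give different values, for example Combine(1, 2) is 16370 while Combine(2, 1) is 16400.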
If you want a single method that works on different classes the way to do it is to use reflection to detect all the primitive fields contained in the class, compute their hash code using the primitive functions and then combine them.
It is tricky and extremely .NET-specific though - my preference would be to create methods handling the primitive types and then just re-define getIdentifier() for each class.
You should use the default GetHashCode method. It does everything you need. Documentation. It exists for all objects and is virtual so you can choose to override it if you wish.
I assume you know how to generate hashes for the primitive data types (ints, floats, strings, non-extended object, and a few others) and combine multiple hashes, so I won't bore you with the details.
If you absolutely must write your own generic hash function you could use reflection. You would recursively hash each data member until you got to a primitive type, where you'd have to handle those cases manually. There will likely be problems with certain data types that have unmanaged data. In particular, one example would be a .NET class that has a pointer to a class with an unspecified data structure. Reflection clearly can't handle this case and would not be able to hash the unmanaged portion of the class.

Monitoring the state of a queue

Given a Queue<MyMessage>, where MyMessage is the base class for some types of messages: all message types have different fields, so they will use a different amount of bytes. Therefore it would make sense to measure the fill level of this queue in terms of bytes rather than of elements present in the queue.
In fact, since this queue is associated with a connection, I could better control the message flow, reducing the traffic if the queue is nearly full.
To achieve this, I thought I would wrap a simple Queue in a custom class, MyQueue.
public class MyQueue
{
private Queue<MyMessage> _outputQueue;
private Int32 _byteCapacity;
private Int32 _currentSize; // number of used bytes
public MyQueue(int byteCapacity)
{
this._outputQueue = new Queue<MyMessage>();
this._byteCapacity = byteCapacity;
this._currentSize = 0;
}
public void Enqueue(MyMessage msg)
{
this._outputQueue.Enqueue(msg);
this._currentSize += Marshal.SizeOf(msg.GetType());
}
public MyMessage Dequeue()
{
MyMessage result = this._outputQueue.Dequeue();
this._currentSize -= Marshal.SizeOf(result.GetType());
return result;
}
}
The problem is that this does not work for classes, because Marshal.SizeOf throws an ArgumentException.
Is it possible to calculate in some way the size of an object (instance of a class)?
Are there some alternatives to monitor the fill level of a queue in terms of bytes?
Are there any queues that can be managed in this way?
UPDATE: As an alternative solution I could add a method int SizeBytes() to each message type. This seems a little ugly, although it would perhaps be the most efficient, since you cannot easily measure a reference type.
public interface MyMessage
{
Guid Identifier
{
get;
set;
}
int SizeBytes();
}
The classes that implement this interface must, in addition to implementing the SizeBytes() method, also implement an Identifier property.
public class ExampleMessage : MyMessage
{
public Guid Identifier { get; set; } // so I have a field and its Identifier property
public String Request { get; set; }
public int SizeBytes()
{
return (Marshal.SizeOf(Identifier)); // return 16
}
}
The sizeof operator cannot be used with Guid outside an unsafe context, because Guid is not one of the built-in types with a predefined size, so I use Marshal.SizeOf(). But at this point perhaps I should use the experimentally determined values: for example, since Marshal.SizeOf() returns 16 for a Guid and since a string consists of N chars, the SizeBytes() method could be as follows:
public int SizeBytes()
{
return (16 + Request.Length * sizeof(char));
}
If you could edit the MyMessage base class to add a virtual method SizeOf(), then you could have the message classes use the C# sizeof operator on their primitive types. If you can do that, the rest of your code is gold.
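Put together, the idea might look like the sketch below. It assumes MyMessage can be an abstract class rather than an interface and reuses the SizeBytes() name from the question. The FillLevel property and the choice to store each message's size next to it in the queue are additions for this example.

```csharp
using System;
using System.Collections.Generic;

public abstract class MyMessage
{
    public Guid Identifier { get; set; }

    // 16 bytes for the Guid; derived classes add the size of their own fields.
    public virtual int SizeBytes() => 16;
}

public sealed class ExampleMessage : MyMessage
{
    public string Request { get; set; } = "";

    public override int SizeBytes() => base.SizeBytes() + Request.Length * sizeof(char);
}

public sealed class MyQueue
{
    // Each entry remembers the size it was counted with, so the running total
    // stays consistent even if a message is modified while it is queued.
    private readonly Queue<KeyValuePair<MyMessage, int>> _outputQueue =
        new Queue<KeyValuePair<MyMessage, int>>();
    private readonly int _byteCapacity;

    public MyQueue(int byteCapacity)
    {
        _byteCapacity = byteCapacity;
    }

    // Number of bytes currently accounted for in the queue.
    public int CurrentSize { get; private set; }

    // Fraction of the byte capacity in use (can exceed 1.0; nothing is rejected here).
    public double FillLevel => (double)CurrentSize / _byteCapacity;

    public void Enqueue(MyMessage msg)
    {
        int size = msg.SizeBytes();
        _outputQueue.Enqueue(new KeyValuePair<MyMessage, int>(msg, size));
        CurrentSize += size;
    }

    public MyMessage Dequeue()
    {
        KeyValuePair<MyMessage, int> entry = _outputQueue.Dequeue();
        CurrentSize -= entry.Value;
        return entry.Key;
    }
}
```

The sizes are estimates of the payload, not of the memory the CLR actually uses for each object, which is probably what matters for throttling traffic on a connection anyway.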
You can get an indication of the size of your objects by measuring the length of their binary serialization. Note that this figure will typically be higher than you expect, since .NET may also include metadata in the serialized representation. This approach would also require all your classes to be marked with the [Serializable] attribute. Also note that BinaryFormatter is marked obsolete from .NET 5 onward and Microsoft advises against using it, so this mainly applies to older code.
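using System.IO;
using System.Runtime.Serialization.Formatters.Binary;
public static long GetSerializedSize(object root)
{
    using (var memoryStream = new MemoryStream())
    {
        // Serialize into memory only to measure the resulting length
        var binaryFormatter = new BinaryFormatter();
        binaryFormatter.Serialize(memoryStream, root);
        return memoryStream.Length;
    }
}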

Get type from GUID

For various reasons, I need to implement a type caching mechanism in C#. Fortunately, the CLR provides Type.GUID to uniquely identify a type. Unfortunately, I can't find any way to look up a type based on this GUID. There's Type.GetTypeFromCLSID() but based on my understanding of the documentation (and experiments) that does something very, very different.
Is there any way to get a type based on its GUID short of looping through all the loaded types and comparing to their GUIDs?
EDIT: I forgot to mention that I would really like a "type fingerprint" of fixed width, that's why the GUID is so appealing to me. In a general case, of course, the fully qualified name of the type would work.
Why not use the designated property for that, i.e. AssemblyQualifiedName? This property is documented as "can be persisted and later used to load the Type".
The GUID is for COM interop.
This may just be a summary of answers already posted, but I don't think there is a way to do this without first building a map of Guid->Type.
We do this in our framework on initialization:
static TypeManager()
{
AppDomain.CurrentDomain.AssemblyLoad += (s, e) =>
{
_ScanAssembly(e.LoadedAssembly);
};
foreach (Assembly a in AppDomain.CurrentDomain.GetAssemblies())
{
_ScanAssembly(a);
}
}
private static void _ScanAssembly(Assembly a)
{
foreach (Type t in a.GetTypes())
{
//optional check to filter types (by interface or attribute, etc.)
//Add type to type map
}
}
Handling the AssemblyLoad event takes care of dynamically loaded assemblies.
From what I understand, Type.GUID uses the assembly version of the type as part of the Guid generation algorithm. This may lead to trouble if you increment your assembly version numbers. Using the GetDeterministicGuid method described in another answer would probably be advisable, depending on your application.
Don't loop to compare. Populate a HashSet<Type> and use the Contains method.
HashSet<Type> types = new HashSet<Type>();
... //populate
if (types.Contains(someObject.GetType()))
//do something
This will certainly give you a fixed size entry, since all of them will be object references (instances of Type essentially being factory objects).
What about (from Generating Deterministic GUIDs):
private Guid GetDeterministicGuid(string input)
{
// use MD5 hash to get a 16-byte hash of the string:
MD5CryptoServiceProvider provider = new MD5CryptoServiceProvider();
byte[] inputBytes = Encoding.Default.GetBytes(input);
byte[] hashBytes = provider.ComputeHash(inputBytes);
// generate a guid from the hash:
Guid hashGuid = new Guid(hashBytes);
return hashGuid;
}
Then pass in typeof(...).AssemblyQualifiedName. You could store this data in a Dictionary<string, Guid> collection (or a Dictionary<Guid, string>).
This way you'll always have the same GUID for a given type (warning: collisions are possible).
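Combined with a Guid to Type map, this might be sketched as follows. The class DeterministicTypeMap and its members are hypothetical names. The sketch uses Encoding.UTF8 instead of Encoding.Default so that the bytes, and therefore the GUID, do not depend on the machine's code page. That is a deliberate change from the snippet above.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

public static class DeterministicTypeMap
{
    private static readonly Dictionary<Guid, Type> TypesByGuid = new Dictionary<Guid, Type>();

    // Derives a 16-byte fingerprint from the assembly-qualified type name.
    public static Guid GetGuid(Type type)
    {
        using (MD5 md5 = MD5.Create())
        {
            byte[] hash = md5.ComputeHash(Encoding.UTF8.GetBytes(type.AssemblyQualifiedName));
            return new Guid(hash); // an MD5 hash is exactly the 16 bytes a Guid needs
        }
    }

    // Adds (or replaces) the map entry for a type.
    public static void Register(Type type) => TypesByGuid[GetGuid(type)] = type;

    // Looks a type up by its fingerprint without scanning all loaded types.
    public static bool TryGetType(Guid guid, out Type type) => TypesByGuid.TryGetValue(guid, out type);
}
```

Because AssemblyQualifiedName includes the assembly version, the fingerprint changes when the version number is incremented. Hashing FullName instead would avoid that, at the cost of not distinguishing same-named types in different assemblies.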
If you are in control of these classes I would recommend:
public interface ICachable
{
Guid ClassId { get; }
}
public class Person : ICachable
{
public Guid ClassId
{
get { return new Guid("DF9DD4A9-1396-4ddb-98D4-F8F143692C45"); }
}
}
You can generate your GUIDs using Visual Studio, Tools->Create Guid.
The Mono documentation reports that a module has a Metadata heap of guids.
Perhaps Cecil might help you lookup a type based on its guid? Not sure though, there is a GuidHeap class, it seems to be generating the guids though, but perhaps this is enough for your cache to work?
I would use typeof(T).GUID as the key into the cache dictionary:
private Dictionary<Guid, object> cacheDictionary = new Dictionary<Guid, object>();
and I would have a generic method that looks up the GUID of the requested type in the dictionary, creating and caching an instance when none is found.
public T Getclass<T>() where T : new()
{
    var key = typeof(T).GUID;
    object item;
    // TryGetValue is a direct key lookup, so there is no need to scan the entries
    if (!cacheDictionary.TryGetValue(key, out item))
    {
        item = new T();
        cacheDictionary.Add(key, item);
    }
    return (T)item;
}
and I would use a singleton pattern for the cache,
and the call would be something like the code below (MyClass is a placeholder for your own type):
var cachedObject = Cache.Instance.Getclass<MyClass>();
