Generating an identifier for objects so that they can be added to a hashtable I have created - c#

I have a hashtable base class and I am creating different types of hashtable by deriving from it. I only allow it to accept objects that implement my IHashable interface. For example:
class LinearProbingHashTable<T> : HashTableBase<T> where T: IHashable
{
...
...
...
}
interface IHashable
{
/**
* Every IHashable implementation should provide an identifying value for use in generating a hash key.
*/
int getIdentifier();
}
class Car : IHashable
{
public String Make { get; set; }
public String Model { get; set; }
public String Color { get; set; }
public int Year { get; set; }
public int getIdentifier()
{
/// ???
}
}
Can anyone suggest a good method for generating an identifier for the car that can be used by the hash function to place it in the hash table?
What I am really looking for is a general-purpose solution for generating an id for any given class. I would like to have a base class for all classes, HashableObject, that implements IHashable and its getIdentifier method. Then I could just derive from HashableObject, which would automatically provide an identifier for any instance, and I wouldn't have to write a different getIdentifier method for every object I add to the hashtable.
public class HashableObject : IHashable
{
public int getIdentifier()
{
// Looking for code here that would generate an id for any object...
}
}
public class Dog : HashableObject
{
// Don't need to implement getIdentifier because the parent class does it for me
}

I would split the problem in two:
1. How to generate hash codes of primitive types: strings, integers etc.
2. How to combine multiple hash codes into one hash code.
Using (1) and then (2), you can generate the hash code of any class or structure.
The naive way to do (1) for strings is to add the code of all characters in the string:
public static int getStringIdentifier(string str)
{
int result = 0;
foreach (char c in str) {
result += (int)c;
}
return result;
}
Similar naive algorithms can be used for other basic data types (which are all arrays of bytes in the end).
The naive way to do (2) is to simply combine the various hash codes with XOR:
public int getIdentifier()
{
return getStringIdentifier(Make) ^ getStringIdentifier(Model) ^ getStringIdentifier(Color);
}
These algorithms will work, but they won't distribute the hash code values well, i.e. there will be many collisions. For example, summing characters gives every anagram the same code, and XOR gives the same result when two fields swap values.
If you want better algorithms you can have a look at how the .NET framework does it - here is the source code of the class used internally to combine multiple hash codes, and here is the source code of the String class - including String.GetHashCode().
As you can see they are variants of the naive one above, with different starting values and more complex combinations.
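As a rough illustration of "different starting values and more complex combinations", here is a minimal sketch. It is not the framework's code: the class name HashSketch, the FNV-1a constants for strings and the 17/31 multiply-and-add combiner are choices made for this example only.

```csharp
using System;

public static class HashSketch
{
    // FNV-1a over the UTF-16 code units of the string; null hashes like "".
    // Unlike a plain sum, the result depends on character order.
    public static int StringHash(string s)
    {
        unchecked
        {
            uint hash = 2166136261;
            if (s != null)
            {
                foreach (char c in s)
                {
                    hash ^= c;
                    hash *= 16777619;
                }
            }
            return (int)hash;
        }
    }

    // Order-sensitive combination: multiply the running value by a prime,
    // then add the next hash. Overflow simply wraps around.
    public static int Combine(params int[] hashes)
    {
        unchecked
        {
            int result = 17;
            foreach (int h in hashes)
            {
                result = result * 31 + h;
            }
            return result;
        }
    }
}
```

With these helpers, Car.getIdentifier() could return HashSketch.Combine(HashSketch.StringHash(Make), HashSketch.StringHash(Model), HashSketch.StringHash(Color), Year). Swapping two field values now changes the result, which plain XOR does not guarantee.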
If you want a single method that works on different classes the way to do it is to use reflection to detect all the primitive fields contained in the class, compute their hash code using the primitive functions and then combine them.
It is tricky and extremely .NET-specific, though; my preference would be to create methods handling the primitive types and then just re-define getIdentifier() for each class.
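For completeness, the reflection route could be sketched roughly as below. This is only an illustration under several assumptions: it looks at public instance properties rather than fields, it does not recurse into nested objects, it relies on each value's own GetHashCode(), and the Dog properties are invented for the example. Because String.GetHashCode() may differ between processes on newer runtimes, the resulting identifier should only be used inside a single run, which is enough for an in-memory hashtable.

```csharp
using System;
using System.Reflection;

public interface IHashable
{
    int getIdentifier();
}

public abstract class HashableObject : IHashable
{
    public int getIdentifier()
    {
        PropertyInfo[] properties =
            GetType().GetProperties(BindingFlags.Public | BindingFlags.Instance);
        // Reflection does not promise an order, so sort by name to be deterministic.
        Array.Sort(properties, (a, b) => string.CompareOrdinal(a.Name, b.Name));

        unchecked
        {
            int result = 17;
            foreach (PropertyInfo property in properties)
            {
                // Skip indexers, which need arguments to be read.
                if (property.GetIndexParameters().Length > 0) continue;
                object value = property.GetValue(this, null);
                result = result * 31 + (value == null ? 0 : value.GetHashCode());
            }
            return result;
        }
    }
}

// Hypothetical derived class: it inherits getIdentifier() unchanged.
public class Dog : HashableObject
{
    public string Name { get; set; }
    public int Age { get; set; }
}
```

Two Dog instances with the same Name and Age get the same identifier, and reflection runs on every call, so caching the PropertyInfo array per type would be a natural next step if this turns out to be slow.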

You should use the default GetHashCode method. It does everything you need (see the documentation). It exists for all objects and is virtual, so you can choose to override it if you wish.
I assume you know how to generate hashes for the primitive data types (ints, floats, strings, non-extended object, and a few others) and combine multiple hashes, so I won't bore you with the details.
If you absolutely must write your own generic hash function you could use Reflection. You would recursively hash each data member until you got to a primitive type where you'd have to manually handle those cases. There will likely be problems with certain data-types that have unmanaged data. In particular, one example would be a .net class that has a pointer to a class with an unspecified data-structure. Reflection clearly can't handle this case and would not be able to hash the unmanaged portion of the class.
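The override suggested at the start of this answer might look like the sketch below for the Car class from the question. It assumes a runtime that provides System.HashCode (.NET Core 2.1 or later); on older frameworks the fields would have to be combined by hand. Equals is overridden as well so that equal cars always share a hash code.

```csharp
using System;

public interface IHashable
{
    int getIdentifier();
}

public class Car : IHashable
{
    public string Make { get; set; }
    public string Model { get; set; }
    public string Color { get; set; }
    public int Year { get; set; }

    // Two cars are equal when all four properties match.
    public override bool Equals(object obj)
    {
        return obj is Car other
            && Make == other.Make && Model == other.Model
            && Color == other.Color && Year == other.Year;
    }

    // Hash exactly the properties that Equals compares.
    public override int GetHashCode()
    {
        return HashCode.Combine(Make, Model, Color, Year);
    }

    // The hashtable's identifier is simply the object's hash code.
    public int getIdentifier()
    {
        return GetHashCode();
    }
}
```

Since the properties are mutable, changing one after the car has been added to a hashtable would change its identifier, so in practice the key fields should be treated as read-only once stored.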

Related

Unique ID for each class

I want a unique ID (preferably static, without computation) for each class implementation, but not for each instance. The most obvious way to do this is to hardcode a value in the class, but keeping the values unique becomes a task for a human and isn't ideal.
abstract class Base
{
    public abstract int GetID();
}
class Foo : Base
{
    public override int GetID() => 10;
}
class Bar : Base
{
    public override int GetID() => 20;
}
Foo foo1 = new Foo();
Foo foo2 = new Foo();
Bar bar = new Bar();
// Desired behaviour:
// foo1.GetID() == foo2.GetID()
// foo1.GetID() != bar.GetID()
The class name would be an obvious unique identifier, but I need an int (or fixed length bytes). I pack the entire object into bytes, and use the id to know what class it is when I unpack it at the other end.
Hashing the class name every time I call GetID() seems needlessly process heavy just to get an ID number.
I could also make an enum as a lookup, but again I need to populate the enum manually.
EDIT: People have been asking important questions, so I'll put the info here.
Needs to be unique per class, not per instance (this is why the identified duplicate question doesn't answer this one).
ID value needs to be persistent between runs.
Value needs to be fixed length bytes or int. Variable length strings such as class name are not acceptable.
Needs to reduce CPU load wherever possible (caching results or using assembly based metadata instead of doing a hash each time).
Ideally, the ID can be retrieved from a static function. This means I can make a static lookup function that matches ID to class.
The number of different classes that need an ID isn't that big (<100), so collisions aren't a major concern.
EDIT2:
Some more colour since people are skeptical that this is really needed. I'm open to a different approach.
I'm writing some networking code for a game, and it's broken down into message objects. Each different message type is a class that inherits from MessageBase and adds its own fields, which will be sent.
The MessageBase class has a method for packing itself into bytes, and it sticks a message identifier (the class ID) on the front. When it comes to unpacking it at the other end, I use the identifier to know how to unpack the bytes. This results in some easy to pack/unpack messages and very little overhead (few bytes for ID, then just class property values).
Currently I hard code an ID number in the classes, but it doesn't seem like the best way of doing things.
EDIT3: Here is my code after implementing the accepted answer.
public class MessageBase
{
public MessageID id { get { return GetID(); } }
private MessageID cacheId;
private MessageID GetID()
{
// Check if cacheId hasn't been initialised
if (cacheId == null)
{
// Hash the class name
MD5 md5 = MD5.Create();
byte[] md5Bytes = md5.ComputeHash(Encoding.UTF8.GetBytes(GetType().AssemblyQualifiedName));
// Convert the first few bytes into a uint32, and create the messageID from it and store in cache
cacheId = new MessageID(BitConverter.ToUInt32(md5Bytes, 0));
}
// Return the cacheId
return cacheId;
}
}
public class Protocol
{
private Dictionary<Type, MessageID> messageTypeToId = new Dictionary<Type, MessageID>();
private Dictionary<MessageID, Type> idToMessageType = new Dictionary<MessageID, Type>();
private Dictionary<MessageID, Action<MessageBase>> handlers = new Dictionary<MessageID, Action<MessageBase>>();
public Protocol()
{
// Create a list of all classes in this namespace that are a subclass of MessageBase
IEnumerable<Type> messageClasses = from t in Assembly.GetExecutingAssembly().GetTypes()
where t.Namespace == GetType().Namespace && t.IsSubclassOf(typeof(MessageBase))
select t;
// Iterate through the list of message classes, and store their type and id in the dicts
foreach(Type messageClass in messageClasses)
{
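// id is an instance property, so create an instance to read it (assumes a public parameterless constructor)
MessageID id = ((MessageBase)Activator.CreateInstance(messageClass)).id;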
messageTypeToId[messageClass] = id;
idToMessageType[id] = messageClass;
}
}
}
Given that you can get a Type by calling GetType on the instance, you can easily cache the results. That reduces the problem to working out how to generate an ID for each type. You'd then call something like:
int id = typeIdentifierCache.GetIdentifier(foo1.GetType());
... or make GetIdentifier accept object and it can call GetType(), leaving you with
int id = typeIdentifierCache.GetIdentifier(foo1);
At that point, the detail is all in the type identifier cache.
A simple option would be to take a hash (e.g. SHA-256, for stability and making it very unlikely that you'll encounter collisions) of the fully-qualified type name. To prove that you have no collisions, you could easily write a unit test that runs over all the type names in the assembly and hashes them, then checks there are no duplicates. (Even that might be overkill, given the nature of SHA-256.)
This is all assuming that the types are in a single assembly. If you need to cope with multiple assemblies, you may want to hash the assembly-qualified name instead.
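A minimal sketch of such a type identifier cache is shown below. The class name TypeIdentifierCache and the GetIdentifier calls come from the snippets above; everything else is an assumption for illustration: the ID is the first four bytes of the SHA-256 hash of Type.FullName, read in a fixed byte order so the value is the same on every machine, and HasCollisions is a hypothetical helper for the unit test mentioned above.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

public static class TypeIdentifierCache
{
    private static readonly ConcurrentDictionary<Type, int> cache =
        new ConcurrentDictionary<Type, int>();

    // Hashes the type name once, then serves the cached value.
    public static int GetIdentifier(Type type)
    {
        return cache.GetOrAdd(type, t => Compute(t));
    }

    // Convenience overload for instances.
    public static int GetIdentifier(object obj)
    {
        return GetIdentifier(obj.GetType());
    }

    // True if two distinct types in the sequence share an identifier.
    public static bool HasCollisions(IEnumerable<Type> types)
    {
        return types.Distinct()
                    .GroupBy(t => GetIdentifier(t))
                    .Any(g => g.Count() > 1);
    }

    private static int Compute(Type type)
    {
        using (SHA256 sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(type.FullName));
            // First four bytes, little-endian, independent of the host CPU.
            return hash[0] | (hash[1] << 8) | (hash[2] << 16) | (hash[3] << 24);
        }
    }
}
```

A unit test could then assert that TypeIdentifierCache.HasCollisions(assembly.GetTypes()) is false for the assembly containing the message classes. Truncating to 32 bits makes collisions more likely than with the full hash, which is why the check is worth keeping.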
Here is one suggestion. I have used a SHA-256 byte array, which is guaranteed to be a fixed size and astronomically unlikely to have a collision. That may well be overkill; you can easily substitute something smaller. You could also use the AssemblyQualifiedName rather than FullName if you need to worry about version differences or the same class name in multiple assemblies.
Firstly, here are all my usings
using System;
using System.Collections.Concurrent;
using System.Text;
using System.Security.Cryptography;
Next, a static cached type hasher object to remember the mapping between your types and the resulting byte arrays. You don't need the Console.WriteLines below, they are just there to demonstrate that you are not computing it over and over again.
public static class TypeHasher
{
private static ConcurrentDictionary<Type, byte[]> cache = new ConcurrentDictionary<Type, byte[]>();
public static byte[] GetHash(Type type)
{
byte[] result;
if (!cache.TryGetValue(type, out result))
{
Console.WriteLine("Computing Hash for {0}", type.FullName);
// SHA256.Create() replaces SHA256Managed, which is marked obsolete on newer runtimes
using (SHA256 sha = SHA256.Create())
{
result = sha.ComputeHash(Encoding.UTF8.GetBytes(type.FullName));
}
cache.TryAdd(type, result);
}
else
{
// Not actually required, but shows that hashing only done once per type
Console.WriteLine("Using cached Hash for {0}", type.FullName);
}
return result;
}
}
Next, an extension method on object so that you can ask for anything's id. Of course if you have a more suitable base class, it doesn't need to go on object per se.
public static class IdExtension
{
public static byte[] GetId(this object obj)
{
return TypeHasher.GetHash(obj.GetType());
}
}
Next, here are some random classes
public class A
{
}
public class ChildOfA : A
{
}
public class B
{
}
And finally, here is everything put together.
public class Program
{
public static void Main()
{
A a1 = new A();
A a2 = new A();
B b1 = new B();
ChildOfA coa = new ChildOfA();
Console.WriteLine("a1 hash={0}", Convert.ToBase64String(a1.GetId()));
Console.WriteLine("b1 hash={0}", Convert.ToBase64String(b1.GetId()));
Console.WriteLine("a2 hash={0}", Convert.ToBase64String(a2.GetId()));
Console.WriteLine("coa hash={0}", Convert.ToBase64String(coa.GetId()));
}
}
Here is the console output
Computing Hash for A
a1 hash=VZrq0IJk1XldOQlxjN0Fq9SVcuhP5VWQ7vMaiKCP3/0=
Computing Hash for B
b1 hash=335w5QIVRPSDS77mSp43if68S+gUcN9inK1t2wMyClw=
Using cached Hash for A
a2 hash=VZrq0IJk1XldOQlxjN0Fq9SVcuhP5VWQ7vMaiKCP3/0=
Computing Hash for ChildOfA
coa hash=wSEbCG22Dyp/o/j1/9mIbUZTbZ82dcRkav4olILyZs4=
On the other side, you would use reflection to iterate all of the types in your library and store a reverse dictionary of hash to type.
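The reverse dictionary could be built along these lines. This is a sketch with assumed names (TypeLookup, Key, Build); it keys the dictionary by the Base64 text of the hash because a byte[] key would be compared by reference, not by content.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;
using System.Security.Cryptography;
using System.Text;

public static class TypeLookup
{
    // Base64 text of the SHA-256 hash of the type's full name.
    public static string Key(Type type)
    {
        using (SHA256 sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(type.FullName));
            return Convert.ToBase64String(hash);
        }
    }

    // Maps hash -> type for every concrete type in the assembly that is
    // assignable to baseType. ToDictionary throws on a duplicate key,
    // so a collision would surface immediately.
    public static Dictionary<string, Type> Build(Assembly assembly, Type baseType)
    {
        return assembly.GetTypes()
            .Where(t => !t.IsAbstract && baseType.IsAssignableFrom(t))
            .ToDictionary(t => Key(t), t => t);
    }
}
```

On the receiving side, something like TypeLookup.Build(typeof(A).Assembly, typeof(A)) would be called once at startup, and the Base64 text of the received hash bytes is then used to look up the type to instantiate.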
I have not seen you answer whether the same value needs to persist between different runs, but if all you need is a unique ID for a class, then use the simple built-in GetHashCode method:
class BaseClass
{
public int ClassId() => GetType().GetHashCode();
}
If you are worried about performance of multiple calls to GetHashCode(), then first, don't, that is ridiculous micro-optimization, but if you insist, then store it.
GetHashCode() is fast, that is its entire purpose, as a fast way to compare values in a hash.
EDIT:
After doing some tests, I found that this method returned the same hash code across different runs. I did not test after altering the classes, though, and I am not aware of exactly how a Type is hashed.

More informative comparison of objects in C#

In my C# testing, I often want to compare two objects of the same type (typically an expected object against the actual object), but I want to allow for some flexibility. For example, there may be timestamp fields that I know can't be equal or some fields that I just want to ignore when comparing the objects.
Most importantly, I want to provide an informative message that describes where the two object properties' values differ in order that I can quickly identify what the problem is. For example, a message that says "Source property Name value Fred does not match target property Name value Freda".
The standard Equals and Comparer methods just seem to return ints or Booleans which don't provide enough information for me. At the moment, my object comparison methods return a custom type that has two fields (a boolean and a message), but my thinking is that there must be a more standard way to do this. These days, perhaps a Tuple might be the way to go, but I would welcome suggestions.
"Comparison" might not be the word for what you're trying to do. That word already has a common meaning in this context. We compare objects for equality, which returns a boolean - they are equal or they are not. Or we compare them to see which is greater. That returns an int which can indicate that one or the other is greater, or that they are equal. This is helpful when sorting objects.
What you're trying to do is determine specific differences between objects. I wouldn't try to write something generic that handles different types of objects unless you intend for them to be extremely simple. That gets really complicated as you get into properties that return additional complex objects or collections or collections of complex objects. It's not impossible, just rarely worth the effort compared to just writing a method that compares the particular type you want to compare.
Here are a few interfaces and classes that could make the task a little easier and more consistent. But to be honest, it's hard to tell what to do with this. And again, it gets complicated if you're dealing with nested complex properties. What happens if two properties both contain lists of some other object, and all the items in those lists are the same except one on each side that has a differing property? Or what if they're all different? In that case, how would you describe the "inequality" of the parent objects? It might be useful to know that they are or are not equal, but less so to somehow describe the difference.
public interface IInstanceComparer<T>
{
IEnumerable<PropertyDifference> GetDifferences(T left, T right);
}
public abstract class InstanceComparer<T> : IInstanceComparer<T>
{
public IEnumerable<PropertyDifference> GetDifferences(T left, T right)
{
var result = new List<PropertyDifference>();
PopulateDifferences(left, right, result);
return result;
}
public abstract void PopulateDifferences(T left, T right,
List<PropertyDifference> differences);
}
public class PropertyDifference
{
public PropertyDifference(string propertyName, string leftValue,
string rightValue)
{
PropertyName = propertyName;
LeftValue = leftValue;
RightValue = rightValue;
}
public string PropertyName { get; }
public string LeftValue { get; }
public string RightValue { get; }
}
public class Animal
{
public string Name { get; }
public int NumberOfLimbs { get; }
public DateTime Created { get; }
}
public class AnimalDifferenceComparer : InstanceComparer<Animal>
{
public override void PopulateDifferences(Animal left, Animal right,
List<PropertyDifference> differences)
{
if(left.Name != right.Name)
differences.Add(new PropertyDifference("Name", left.Name, right.Name));
if(left.NumberOfLimbs!=right.NumberOfLimbs)
differences.Add(new PropertyDifference("NumberOfLimbs",
left.NumberOfLimbs.ToString(),
right.NumberOfLimbs.ToString()));
}
}
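For the "extremely simple" case mentioned above (flat objects with only primitive-like properties), a generic version could be sketched with reflection. The names FlatDiff, Describe and Person are invented for this example, the message format is copied from the question, and nested objects or collections are deliberately not handled.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

public static class FlatDiff
{
    // Compares public instance properties by value and returns one message
    // per difference. Properties named in 'ignored' are skipped.
    public static List<string> Describe<T>(T source, T target, params string[] ignored)
    {
        var messages = new List<string>();
        foreach (PropertyInfo property in
                 typeof(T).GetProperties(BindingFlags.Public | BindingFlags.Instance))
        {
            if (ignored.Contains(property.Name)) continue;
            if (property.GetIndexParameters().Length > 0) continue; // skip indexers

            object left = property.GetValue(source, null);
            object right = property.GetValue(target, null);
            if (!object.Equals(left, right))
            {
                messages.Add("Source property " + property.Name + " value " + left +
                             " does not match target property " + property.Name +
                             " value " + right);
            }
        }
        return messages;
    }
}

// Hypothetical type used only to demonstrate the helper.
public class Person
{
    public string Name { get; set; }
    public DateTime Created { get; set; }
}
```

A test could call FlatDiff.Describe(expected, actual, "Created") and assert that the returned list is empty, printing the messages when it is not.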
You could use extension methods to do this. For example:
public static class Extensions
{
    // <type> is a placeholder for the type under test
    public static void CompareWithExpected(this <type> value, <type> expected)
    {
        Assert.AreEqual(expected.Property1, value.Property1, "Property1 did not match expected");
        Assert.AreEqual(expected.Property2, value.Property2, "Property2 did not match expected");
    }
}
Then this can be used as follows:
public void TestMethod()
{
// Arrange
...
// Act
...
// Assert
value.CompareWithExpected(expected);
}
You could have any number of these extension methods allowing you the flexibility to check only certain values etc.
This also means you do not need to pollute your types with what is essentially test code.

How to sort a List<> that contains derived class objects, by derived class method

I have a double problem here. I need to sort a List<> that I know contains objects of a class derived from the class the list was originally declared to contain. AND, I need to sort by the return value of a method in that derived class, which takes a parameter. Keep in mind that I already know that all the objects in the List are of the derived class type.
I've created some sample code here to demonstrate the question since the real code cannot be shared publicly. Note, I have no control over the base conditions here (i.e. the fact that the List<> collection's declared contents are the parent class and that it contains objects of the derived class, which contains a method that takes an argument and returns the values that I need to sort the collection by). So, I doubt I'd be able to use any suggestion that requires changes there. What I think I need is a way to specify (cast?) what is really in the List so I can access the method defined there. But I'm open to other thoughts for sure. Otherwise I'm left with a traditional bubble sort. Thanks.
public class Component
{
public int X;
public int Y;
}
public class ComponentList : List<Component>
{
// Other members that deal with Components, generically
}
public class Fence : Component
{
public int Distance(int FromX, int FromY)
{
int returnValue = 0;
// Calculate distance...
return returnValue;
}
}
public class Yard : Component
{
// Yada yada yada
}
public class MyCode
{
public List<Component> MyFences;
public MyCode(List<Component> Fences, int FromX, int FromY)
{
// Sort the fences by their distance from specified X,Y
Fences.Sort((A as Fence, B as Fence) => A.Distance(FromX, FromY).CompareTo(B.Distance(FromX, FromY)));
// Or
List<Fence> sortedFences = MyFences.OrderBy(A => A.Distance(FromX, FromY)).ToList();
// Or ???
}
}
Use the Enumerable.Cast<Fence> extension method to transform your IEnumerable<Component> to IEnumerable<Fence>. Then I'd use your second approach (the OrderBy approach) to sort it, but that's my preference.
List<Fence> sortedFences = MyFences.Cast<Fence>().OrderBy(A => A.Distance(FromX, FromY)).ToList();
This approach will throw if there is an object in MyFences that can't be cast to Fence. If you expect that the code should only be passed Fences, this might be what you want. If, instead, you want to skip over non-Fence members, you can use:
List<Fence> sortedFences = MyFences.OfType<Fence>().OrderBy(A => A.Distance(FromX, FromY)).ToList();
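If the original list has to be sorted in place, as in the first attempt in the question, the cast can go inside the comparison instead. The sketch below repeats minimal versions of the question's classes so that it stands alone; the body of Distance is an assumption (Manhattan distance) because the question leaves the calculation out, and FenceSorting is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;

public class Component
{
    public int X;
    public int Y;
}

public class Fence : Component
{
    // Assumed implementation for the example: Manhattan distance.
    public int Distance(int fromX, int fromY)
    {
        return Math.Abs(X - fromX) + Math.Abs(Y - fromY);
    }
}

public static class FenceSorting
{
    // Sorts the list in place; throws InvalidCastException if any
    // element is not a Fence.
    public static void SortByDistance(List<Component> fences, int fromX, int fromY)
    {
        fences.Sort((a, b) =>
            ((Fence)a).Distance(fromX, fromY)
                .CompareTo(((Fence)b).Distance(fromX, fromY)));
    }
}
```

Unlike the OrderBy versions, this keeps the List<Component> type and reorders the existing list rather than producing a new List<Fence>. Note that List<T>.Sort is not a stable sort, so fences at equal distances may change their relative order.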

Creating a Many to One Class Relationship

I have a set of class objects that I can not touch. All of them have an ID property that I would like to access in other functions in a generic way.
For simplicity's sake, here is an example of my problem.
class Example1 {
public int ID { get; set; }
}
class Example2 {
public int ID { get; set; }
}
I am not able to edit either of these two classes or the library they are in.
I also have a function that expects an ID that can come from either Example1 or Example2. In order to handle this I have come up with a number of solutions but am curious what the proper way to solve this would be.
I could:
1. Use dynamic to access the various classes' IDs.
2. Use reflection to pull out an ID property from any given type.
3. Use an odd inheritance by creating a new class so that Example1ViewModel : Example1, IIdentifiableObject, then expect IIdentifiableObject in my function and implement a copy constructor in Example1ViewModel to handle collecting the data.
4. Write a separate filter function that can extract the relevant parts from either class and provide the results.
None of these solutions seem particularly good to me. How should I be handling a many to one relationship like this in code and are there tools that C# provides to handle this?
A possible solution is to use extension methods for the classes:
public static class MyExtensions
{
public static int GetId(this Example1 ex)
{
return ex.ID;
}
public static int GetId(this Example2 ex)
{
return ex.ID;
}
}
You can add a static method using reflection:
public static int GetId(object obj)
{
Type type = obj.GetType();
return Convert.ToInt32(type.GetProperty("ID").GetValue(obj, null));
}
Then you can invoke it with any object to get the id property value.
Here is the solution that we ended up using and why.
We are using an inheritance structure that takes the following two base classes:
FooExample
BarExample
and wraps them in the following
IExample
FooExampleModel : IExample
BarExampleModel : IExample
Both FooExampleModel and BarExampleModel have constructors which accept the class they are wrapping.
The importance of this is that it allows us to create methods accepting IExample instances without having to manipulate data beforehand. Additionally, unlike using dynamic types or reflection this solution provides us with compile time error checking.
Unfortunately, using extension methods does not work. While it allows us to call the same method on two different object types, as we wanted, it does not allow those objects to be passed as generic types to a separate function.
The result of all of this is that this is now possible:
var foos = new List<FooExample>(); //Pretend there is data here
var bars = new List<BarExample>();
var examples = foos.Select((foo) => (IExample)new FooExampleModel(foo))
.Concat(bars.Select((bar) => (IExample)new BarExampleModel(bar)))
.ToList(); // Force evaluation before function call
DoSomethingOnIExamples(examples);
Besides that slightly gross LINQ query this appears to be the best way to accomplish this (DoSomethingOnIExamples(...) is a function accepting an IEnumerable<IExample> argument). Obviously this solution gets less nice as more types are added to this mix.
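The wrapper types themselves are not shown above, so here is a minimal sketch of what they might look like. FooExample and BarExample stand in for the untouchable library classes, the single ID member on IExample is an assumption based on the question, and SumIds is a hypothetical consumer in the role of DoSomethingOnIExamples.

```csharp
using System.Collections.Generic;
using System.Linq;

// Stand-ins for the library classes that cannot be edited.
public class FooExample { public int ID { get; set; } }
public class BarExample { public int ID { get; set; } }

// The common view that the rest of the code depends on.
public interface IExample
{
    int ID { get; }
}

// Each wrapper takes the wrapped object in its constructor and forwards to it.
public class FooExampleModel : IExample
{
    private readonly FooExample inner;
    public FooExampleModel(FooExample inner) { this.inner = inner; }
    public int ID { get { return inner.ID; } }
}

public class BarExampleModel : IExample
{
    private readonly BarExample inner;
    public BarExampleModel(BarExample inner) { this.inner = inner; }
    public int ID { get { return inner.ID; } }
}

public static class ExampleFunctions
{
    // Hypothetical consumer that only knows about IExample.
    public static int SumIds(IEnumerable<IExample> examples)
    {
        return examples.Sum(e => e.ID);
    }
}
```

Because the wrappers forward to the wrapped object instead of copying its data, later changes to the underlying FooExample or BarExample remain visible through the IExample view.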

How to name a class that wraps several primitive types?

I have a naming problem for some of my classes. I need to wrap some primitive .net types into a class like the following. There will be about 20 of such classes.
(The naming is crap, of course. Just for a demonstrative purpose)
public class Int32Single
{
public int Value { get; set; }
}
public class Int32Double
{
public int Value1 { get; set; }
public int Value2 { get; set; }
}
public class DoubleSingle
{
public double Value { get; set; }
}
I can't use a generic approach for this.
How should I name such wrapper classes, where each class name should convey which primitive types are wrapped and in what quantity?
It is also possible that I will have a class that contains mixed primitive types.
This doesn't seem like a very good idea at all. You have both the Tuple class and a standard array available, that both make more sense in any conceivable use case. However, that doesn't answer your question, so...
The most intuitive name for a wrapper class would follow the convention of {type-name}Wrapper, for example Int32Wrapper. In your case, the wrapped object is a primitive type, so it makes sense to call the class a "Tuple". Since you want to specify the size of the Tuple in your class name, {primitive-type-name}{size}Tuple seems like the most intuitive naming convention, but this causes several problems.
The natural language used to describe Tuples creates ambiguity (such as Single and Double, because they conflict with the type names). (e.g. DoubleDouble is bad)
Digits are used in the names of some primitive types, so this could cause ambiguity. (e.g. Int322Tuple is bad).
We can't move the size to the beginning, as in 2Int32Tuple, because a class name cannot begin with a digit. So, there are two approaches that I think could work.
I think your best bet to get around these constraints, is to use a {primitive-type-name}{text-represented-size}Tuple convention. (e.g. Int32TwoTuple or DoubleTwoTuple). This convention expresses the contents of the wrapper class without ambiguity, so it seems like a good approach. In addition the name begins with the primitive type name, so, if you have a lot of these classes, it will be easier for your IntelliSense to fill in the correct class name, and it will alphabetically be listed next to the primitive type that is being wrapped.
Generics can help you out here:
public class WrapTwo<T>
{
public T Value1 { get; set; }
public T Value2 { get; set; }
}
public class WrapOne<T>
{
public T Value1 { get; set; }
}
And have you considered the Tuple class?
OneInt32, TwoInt32s, TwoDoubles? Doesn't sound great.
Tuples? http://www.dotnetperls.com/tuple
I'm not very fond of Tuples or arrays, because neither tells much about its purpose. Well, I do use them, but mostly as internal members of classes, local variables, or with 3rd party/legacy code.
Back to naming. Compare:
Tuple<int,int> a = Tuple.Create(10,10);
Int32Double b = new Int32Double(15, 15);
WrapTwo<int> c = new WrapTwo<int>(20, 20);
With
Point a = new Point(10, 10);
Vertex b = new Vertex(15, 15);
One can argue that 'a' is not a good name for a variable (and suggest using 'pointA' instead). But I think it's pretty good in the context of a geometry application.
Not only do the type name and creation code look obscure, but consider the type's field names:
a.X = 20;
b.Value1 = 20;
So, I think you need some self-descriptive type in context of your application domain.
