In a unit test (in Visual Studio 2008) I want to compare the content of a large object (a list of custom types, to be precise) with a stored reference of this object. The goal is to make sure that any later refactoring of the code produces the same object content.
Discarded Idea:
A first thought was to serialize to XML and then compare the hardcoded strings or a file's content. This would make it easy to find any difference. However, since my types are not XML-serializable without a hack, I must find another solution. I could use binary serialization, but the result would no longer be readable.
Is there a simple and elegant solution to this?
EDIT: Following Marc Gravell's proposal, I now do it like this:
using (MemoryStream stream = new MemoryStream())
{
    // project the actual graph onto only the comparable properties
    List<NavigationResult> comparableActual = (from item in sparsed
                                               select new NavigationResult
                                               {
                                                   Direction = item.Direction,
                                                   /*...*/
                                                   VersionIndication = item.VersionIndication
                                               }).ToList();
    new BinaryFormatter().Serialize(stream, comparableActual);
    // base64-encoded binary representation of the comparable graph
    string base64encodedActual = System.Convert.ToBase64String(stream.GetBuffer(), 0, (int)stream.Length);
    string base64encodedReference = @"AAEAAAD...."; // the expected value, captured from a known-good run
    Assert.AreEqual(base64encodedReference, base64encodedActual,
        "The comparable part of the sparsed set is not equal to the reference.");
}
In essence, I first select the comparable properties, then encode the graph, then compare it to a similarly encoded reference. Encoding enables deep comparison in a simple way. The reason I use base64 encoding is that I can easily store the reference in a string variable.
I would still be inclined to use serialization. Rather than having to know the binary, just create an expected graph and serialize that; then serialize the actual graph and compare bytes. This only tells you that there is a difference; you'd need inspection to find out what it is, which is a pain.
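A minimal sketch of that approach (assuming the graph types are marked [Serializable]; expectedGraph and actualGraph are placeholders for graphs built in the test):

using System.IO;
using System.Linq;
using System.Runtime.Serialization.Formatters.Binary;

static class GraphBytes
{
    // serialize a graph to raw bytes so two graphs can be compared byte-for-byte
    public static byte[] Serialize(object graph)
    {
        using (var stream = new MemoryStream())
        {
            new BinaryFormatter().Serialize(stream, graph);
            return stream.ToArray();
        }
    }
}

// in the test:
// byte[] expected = GraphBytes.Serialize(expectedGraph); // graph built in code with the expected values
// byte[] actual = GraphBytes.Serialize(actualGraph);     // graph produced by the refactored code
// Assert.IsTrue(expected.SequenceEqual(actual), "The graphs differ.");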
I would use the hack to do XML comparison. Or you could use reflection to automatically traverse object properties (but this will traverse ALL of them, including some you may not want to).
I would make each custom type implement IComparable and provide equality methods that compare the custom types, as well as making the main class IComparable. You can then simply compare the two objects (if you have them in memory when running the unit tests). If not, I would suggest either serializing, or defining constants which you expect the refactored object to have.
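A minimal sketch of that idea, using IEquatable<T> for the element-wise comparison (the NavigationResult property names and types shown here are assumptions based on the question):

using System;

public class NavigationResult : IEquatable<NavigationResult>
{
    public string Direction { get; set; }
    public int VersionIndication { get; set; }

    public bool Equals(NavigationResult other)
    {
        if (other == null) return false;
        return Direction == other.Direction
            && VersionIndication == other.VersionIndication;
    }

    public override bool Equals(object obj) { return Equals(obj as NavigationResult); }

    public override int GetHashCode()
    {
        return (Direction ?? "").GetHashCode() ^ VersionIndication;
    }
}

Two lists of such items can then be compared element by element with Enumerable.SequenceEqual.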
Related
I want to take any object and get a guid that represents that object.
I know that entails a lot of things. I am looking for a good-enough solution for common applications.
My specific use case is caching: I want to know whether the object used to create the thing I am caching has already produced one in the past. There would be 2 different types of objects. Each type contains only public properties and may contain a list/IEnumerable.
Assuming the object is serializable, my first idea was to serialize it to JSON (via the native JSON serializer or Newtonsoft) and then take the JSON string and convert it to a UUID version 5 as detailed in the gist here: How can I generate a GUID for a string?
My second approach, if it's not serializable (for example, if it contains a dictionary), would be to use reflection on the public properties to generate a unique string of some sort and then convert that to a UUID version 5.
Both approaches use UUID version 5 to turn a string into a GUID. Is there a proven C# class that makes valid UUID v5 GUIDs? The gist looks good, but I want to be sure.
I was thinking of making the C# namespace and type name the namespace for the UUID v5. Is that a valid use of a namespace?
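For reference, the version-5 (name-based, SHA-1) construction from RFC 4122 is small enough to sketch directly; this is the kind of thing such gists implement (an untested sketch, not a vetted library):

using System;
using System.Security.Cryptography;
using System.Text;

static class GuidUtility
{
    // RFC 4122 name-based UUID, version 5 (SHA-1); namespaceId is any GUID
    // you pick to identify your naming scheme (e.g. one per type)
    public static Guid CreateV5(Guid namespaceId, string name)
    {
        byte[] nsBytes = namespaceId.ToByteArray();
        SwapByteOrder(nsBytes); // Guid stores its first three fields little-endian; the RFC hashes network order

        byte[] nameBytes = Encoding.UTF8.GetBytes(name);
        byte[] hash;
        using (var sha1 = SHA1.Create())
        {
            sha1.TransformBlock(nsBytes, 0, nsBytes.Length, null, 0);
            sha1.TransformFinalBlock(nameBytes, 0, nameBytes.Length);
            hash = sha1.Hash;
        }

        byte[] result = new byte[16];
        Array.Copy(hash, result, 16);                  // keep the first 128 bits of the 160-bit hash
        result[6] = (byte)((result[6] & 0x0F) | 0x50); // set version to 5
        result[8] = (byte)((result[8] & 0x3F) | 0x80); // set the RFC 4122 variant

        SwapByteOrder(result); // back to the Guid byte layout
        return new Guid(result);
    }

    private static void SwapByteOrder(byte[] g)
    {
        Swap(g, 0, 3); Swap(g, 1, 2); Swap(g, 4, 5); Swap(g, 6, 7);
    }

    private static void Swap(byte[] g, int a, int b)
    {
        byte t = g[a]; g[a] = g[b]; g[b] = t;
    }
}

Using the type's full name as the name under a namespace GUID of your choosing is consistent with how the RFC intends namespaces to be used.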
My first approach is good enough for my simple use case but I wanted to explore the second approach as it's more flexible.
If creating the GUID couldn't guarantee reasonable uniqueness, it should throw an error. Surely very complicated objects would fail. How might I know that is the case when using reflection?
I am looking for new approaches, or for concerns about and implementations of the second approach.
Edit: The reason I bountied/reopened this almost 3 years later is that I need this again (and for caching again), but also because of the introduction of the generic unmanaged constraint in C# 7.3. The blog post at http://devblogs.microsoft.com/premier-developer/dissecting-new-generics-constraints-in-c-7-3/ seems to suggest that if the object obeys the unmanaged spec, you can find a suitable key for a key-value store. Am I misunderstanding something?
This is still limited because the object (generic) must obey the unmanaged type constraint, which is very limiting (no strings, no arrays, etc.), but it's one step closer. I don't completely understand why the method of getting the memory and taking a SHA-1 hash can't be done on types that aren't unmanaged.
I understand that reference types point to places in memory and it's not as easy to get the memory that represents the whole object, but it feels doable. After all, objects are ultimately made up of unmanaged types (a string is an array of chars, etc.).
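As a sketch of what the unmanaged constraint buys you (this needs MemoryMarshal, i.e. .NET Core 2.1+ or the System.Memory package; the names here are illustrative):

using System;
using System.Runtime.InteropServices;
using System.Security.Cryptography;

static class UnmanagedKey
{
    // view the raw bytes of any unmanaged value and hash them into a key.
    // caveat: padding bytes inside structs are not guaranteed to be zeroed,
    // so two logically equal values could, in principle, hash differently.
    public static byte[] Sha1Of<T>(T value) where T : unmanaged
    {
        ReadOnlySpan<byte> bytes =
            MemoryMarshal.AsBytes(MemoryMarshal.CreateReadOnlySpan(ref value, 1));
        using (var sha1 = SHA1.Create())
            return sha1.ComputeHash(bytes.ToArray());
    }
}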
PS: The GUID requirement is loose; any integer/string at or under 512 bits would suffice.
The problem of equality is a difficult one.
Here are some thoughts on how you could solve your problem.
Hashing a serialized object
One method would be to serialize the object and then hash the result, as proposed by Georg.
Using an MD5 checksum gives you a strong checksum with the right input.
But getting it right is the problem.
You might have trouble using a common serialization framework, because:
They don't care whether a float is 1.0 or 1.000000000000001.
They might have a different understanding of what is equal than you / your employer.
They bloat the serialized text with unneeded symbols (performance).
Just a little deviation in the serialized text causes a large deviation in the hashed GUID/UUID.
That's why you should carefully test any serialization you do.
Otherwise you might get false positives/negatives for objects (mostly false negatives).
Some points to think about (illustrated in the sketch further below):
Floats & doubles:
Always write them the same way, preferably with the same number of digits, to prevent something like 1.000000000000001 vs 1.0 from interfering.
DateTime, TimeStamp, etc.:
Apply a fixed format that won't change and is unambiguous.
Unordered collections:
Sort the data before serializing it. The order must be unambiguous.
Strings:
Is the equality case-sensitive? If not, make all the strings lower or upper case.
If necessary, make them culture-invariant.
More:
For every type, think carefully about what is equal and what is not. Think especially about edge cases (float.NaN, -0 vs 0, null, etc.).
It's up to you whether you use an existing serializer or do it yourself.
Doing it yourself is more work and error-prone, but you have full control over all aspects of equality and serialization.
Using an existing serializer is also error-prone, because you need to test or prove that the results are always what you want.
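To make the points above concrete, here is a hand-rolled canonical serialization for a hypothetical Person type (fixed number and date formats, invariant culture, case normalization, a sorted collection):

using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text;

class Person
{
    public string Name;
    public double Height;
    public DateTime BirthDate;
    public HashSet<string> Tags;
}

static class Canonical
{
    public static string Serialize(Person p)
    {
        var sb = new StringBuilder();
        sb.Append(p.Name.ToUpperInvariant()).Append('|');                               // equality is case-insensitive
        sb.Append(p.Height.ToString("R", CultureInfo.InvariantCulture)).Append('|');    // round-trip float format
        sb.Append(p.BirthDate.ToString("O", CultureInfo.InvariantCulture)).Append('|'); // unambiguous ISO 8601
        foreach (string tag in p.Tags.OrderBy(t => t, StringComparer.Ordinal))          // unordered set -> fixed order
            sb.Append(tag).Append(';');
        return sb.ToString();
    }
}

The resulting string can then be hashed (e.g. with MD5) to obtain the GUID.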
Introducing an unambiguous order and using a tree
If you have control over the source code, you can introduce a custom order function.
The order must take all properties, sub-objects, lists, etc. into account.
Then you can create a binary tree and use the order to insert and look up objects.
The same problems as in the first approach still apply: you need to make sure that equal values are detected as such.
The big-O performance is also worse than with hashing, but in most real-life cases the actual performance should be comparable, or at least fast enough.
The good thing is, you can stop comparing two objects as soon as you find a property or value that is not equal, so there is no need to always look at the whole object.
A binary tree needs O(log2(n)) comparisons for a lookup, thus that would be quite fast.
The bad thing is, you need access to all actual objects, thus keep them in memory.
A hashtable needs only O(1) comparisons for a lookup, thus would even be faster (theoretically at least).
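A minimal sketch of the tree approach, using SortedSet<T> (which is backed by a balanced binary tree) and a custom comparer over a stand-in Person type:

using System;
using System.Collections.Generic;

class Person
{
    public string Name;
    public double Height;
}

// a total order over all relevant properties; the comparison stops at the
// first property that differs, so unequal objects are detected early
class PersonComparer : IComparer<Person>
{
    public int Compare(Person x, Person y)
    {
        int c = string.CompareOrdinal(x.Name, y.Name);
        if (c != 0) return c;
        return x.Height.CompareTo(y.Height);
    }
}

class Demo
{
    static void Main()
    {
        var seen = new SortedSet<Person>(new PersonComparer()); // O(log n) lookups
        bool added = seen.Add(new Person { Name = "Ann", Height = 1.7 });
        bool duplicate = !seen.Add(new Person { Name = "Ann", Height = 1.7 });
        Console.WriteLine("added={0}, duplicate={1}", added, duplicate); // added=True, duplicate=True
    }
}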
Put them in a database
If you store all your objects in a database, then the database can do the lookup for you.
Databases are quite good at comparing objects, and they have built-in mechanisms to handle the equality/near-equality problem.
I'm not a database expert, so for this option, someone else might have more insight on how good this solution is.
As others have said in comments, it sounds like GetHashCode might do the trick for you if you're willing to settle for int as your key. If not, there is a Guid constructor that takes a byte[] of length 16. You could try something like the following:
using System;
using System.Linq;
using System.Text;

class Foo
{
    public int A { get; set; }
    public char B { get; set; }
    public string C { get; set; }

    public Guid GetGuid()
    {
        byte[] aBytes = BitConverter.GetBytes(A);
        byte[] bBytes = BitConverter.GetBytes(B);
        // BitConverter has no overload for string, so encode it explicitly
        byte[] cBytes = Encoding.UTF8.GetBytes(C ?? string.Empty);
        byte[] padding = new byte[16]; // guarantees at least 16 bytes overall

        byte[] allBytes =
            aBytes
            .Concat(bBytes)
            .Concat(cBytes)
            .Concat(padding)
            .Take(16) // Guid(byte[]) requires exactly 16 bytes
            .ToArray();

        return new Guid(allBytes);
    }
}
As said in the comments, there is no bullet entirely made of silver here, but a few come quite close. Which of them to use depends on the types you want to use your class with and on your context, e.g. when you consider two objects to be equal. However, be aware that you will always face possible conflicts; a single GUID will not be sufficient to guarantee collision avoidance. All you can do is decrease the probability of a collision.
In your case,
already made one in the past
sounds like you don't want reference equality but rather a notion of value equality. The simplest way is to trust that the classes implement value equality, because in that case you would already be done using GetHashCode; but that has a higher probability of collisions because it is only 32 bits. Furthermore, you would be assuming that whoever wrote the class did a good job, which is not always a safe assumption, particularly since people tend to blame you rather than themselves.
Otherwise, your best chance is serialization combined with a hashing algorithm of your choice. I would recommend MD5 because it is fast and produces the 128 bits you need for a GUID. If your types consist of public properties only, I would suggest using an XmlSerializer, like so:
private MD5 _md5 = new MD5CryptoServiceProvider();
private Dictionary<Type, XmlSerializer> _serializers = new Dictionary<Type, XmlSerializer>();

public Guid CreateID(object obj)
{
    if (obj == null) return Guid.Empty;

    // cache the serializers; constructing an XmlSerializer is expensive
    var type = obj.GetType();
    if (!_serializers.TryGetValue(type, out var serializer))
    {
        serializer = new XmlSerializer(type);
        _serializers.Add(type, serializer);
    }

    using (var stream = new MemoryStream())
    {
        serializer.Serialize(stream, obj);
        stream.Position = 0;
        // MD5 yields exactly the 16 bytes a Guid needs
        return new Guid(_md5.ComputeHash(stream));
    }
}
Just about all serializers have their drawbacks. XmlSerializer is not capable of serializing cyclic object graphs, DataContractSerializer requires your types to have dedicated attributes, and the old serializers based on SerializableAttribute require that attribute to be set. You somehow have to make assumptions.
I have a class which has a bool array member. If I modify an element of this array, a new modified copy of the instance should be created. This sounds like a perfect opportunity for using an immutable type. Googling around showed that Microsoft provides the new Immutable Collections library, which works quite well for another use case, but not for the aforementioned bool array member.
The seemingly fitting type ImmutableArray has been removed for the time being, and the documentation didn't seem to contain an indexer either. The potential replacement, ImmutableList, doesn't work with structs. I'm loath to introduce another third-party library, so I'm wondering what options I have and which I should choose.
I could create a class Bool to satisfy the reference-type requirement. Or I could use BitArray, but trying to use it like this fails with a compile error:
IReadOnlyList<BitArray> test = new IReadOnlyList<BitArray>(new BitArray());
So any ideas what I should do?
Note that this is perfectly valid:
var ba = new BitArray(10);
ba.SetAll(true);
// needs using System.Linq and System.Collections.Immutable
IImmutableList<bool> test = ba.Cast<bool>().ToImmutableList();
Your problem is that the immutable item type is bool, not BitArray! Also, BitArray is from the pre-generics era, so it doesn't implement IEnumerable<bool>, ICollection<bool>, or IList<bool>, and you can't use it directly (hence the .Cast<bool>() to solve this).
An Office Add-In I'm developing is going to contain a few dozen 120-element double arrays (statically). I could just make a class with a bunch of static member arrays and use array initializers, but this seems a little ugly. It makes sense to me that I'd be able to store these in a resource file, but it doesn't really offer any options that fit. The closest option is "text file", but then I'd either have to parse each array every time I wanted to use it or build a lazy loader (which seems just as inelegant). Is there a better option?
(For the curious, the arrays are mortality tables.)
I would personally make them static members of a static class and put the data in an INI file. Add a bool to the class to indicate whether the arrays have been initialized (or do this on an array-by-array basis); everywhere you access the class, check that it's initialized and, if not, call the method which reads the file and loads the data into the arrays. The method to load the data really shouldn't be that messy; it's a fairly trivial operation.
This method also gives you namespace access to the data (global if you have the proper using statements/build dependencies).
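A sketch of that idea, using Lazy<T> in place of a hand-written initialized flag (the file name "mortality.ini", its line format, and the table name are placeholders):

using System;
using System.Globalization;
using System.IO;
using System.Linq;

static class MortalityTables
{
    // parsed once, on first access, in a thread-safe way
    private static readonly Lazy<double[]> _male =
        new Lazy<double[]>(() => Load("MaleTable"));

    public static double[] Male
    {
        get { return _male.Value; }
    }

    private static double[] Load(string tableName)
    {
        // assumes one table per line: "MaleTable=0.00123,0.00131,..."
        string line = File.ReadLines("mortality.ini")
                          .First(l => l.StartsWith(tableName + "="));
        return line.Substring(tableName.Length + 1)
                   .Split(',')
                   .Select(s => double.Parse(s, CultureInfo.InvariantCulture))
                   .ToArray();
    }
}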
Why not use a string resource? Read the string, split it on the commas, and parse each number. Once you have read the string, all the splitting and parsing can be done in one line of LINQ.
Example:
string myResource = "1, 2, 3.4, 5.6";
double[] values = myResource
    .Split(',')
    .Select(s => Convert.ToDouble(s, CultureInfo.InvariantCulture)) // invariant culture so "3.4" parses under any locale
    .ToArray();
Otherwise you could serialize your array of doubles once to a binary file and deserialize it in your code.
If parsing is your main concern (because of speed), store the result locally in a private cache. Each time the array is requested, just make a clone of it.
I've come across this problem often, but I haven't found a satisfying solution yet.
I am implementing a reader for savegames (but it could also be applied to other types of files). Depending on the version, there are some added entries, but the order always remains the same. Therefore I created a class:
public class Entry<T>
{
    public T Value;
    public readonly FileVersion MinVersion;

    public Entry(T v = default(T), FileVersion m = FileVersion.V115)
    {
        Value = v;
        MinVersion = m;
    }
}
Now, as you can guess, I want to write those entries with as little code as possible. I want to write the line if (version >= MinVersion) { /* write data */ } only once. The entries can be primitive types or objects, which is the problem...
Should I define an interface and implement it for every needed primitive type as a wrapper? Or is there a more elegant solution?
(See the comments for the specific questions.)
Some values are only written if a certain condition is met.
Are these conditions known at the time the file is read/written or, when read, are they based on other data in the file? If the former (already known), pass in a Func<bool> that must evaluate to true for the read or write operation to occur. The caller can supply an appropriate delegate or lambda method that makes the decision. You mention a minimum version in the question. I assume it is an example of this.
If the latter (values are read/written based on other data in the file), this is a wider question. If the decision can be made on data earlier in the file or in known places, load it and pass the appropriate arguments into the Func. Otherwise, you may need to look at more complex parsing mechanisms but I think this not what you are asking.
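A sketch of the former case, reusing the Entry<T> and FileVersion types from the question (the WriteEntry shape and the delegates are illustrative, not a fixed design):

using System;
using System.IO;

static class SaveGameWriter
{
    // writes an entry only if the file version is high enough and the
    // caller-supplied condition (if any) evaluates to true
    public static void WriteEntry<T>(
        BinaryWriter writer,
        Entry<T> entry,
        FileVersion version,
        Action<BinaryWriter, T> writeValue,
        Func<bool> condition = null)
    {
        if (version < entry.MinVersion) return;
        if (condition != null && !condition()) return;
        writeValue(writer, entry.Value);
    }
}

// usage, e.g.:
// SaveGameWriter.WriteEntry(writer, healthEntry, currentVersion, (w, v) => w.Write(v), () => isCampaign);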
It is not a static structure and contains some things like struct { int len; char[len]; }.
.NET offers multiple ways to serialize objects, but I suspect you want to read/write in a defined format, such as one that stores a string as a length followed by 8-bit characters. If the .NET mechanisms do not do what you want, you may have to write your own. See Byte for byte serialization of a struct in C# for more information on this, including the use of Marshal to get the underlying bytes of a primitive.
Also, more for reference, if you want to avoid writing primitive types out, you could use public class Entry<T> where T: class.
I want to have a method that could traverse an object by property names and get me the value of the property.
More specifically, as input I have a string like "Model.Child.Name", and I want this method to take an object and get me the value that could be found programmatically via object.Model.Child.Name.
I understand that the only way to do this is to use reflection, but I don't want to write this code on my own, because I believe there are pitfalls. Moreover, I think it is a more or less common task.
Is there any well-known implementation of such an algorithm in C#?
Reflection is the way to go.
Reflection to access properties at runtime
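For illustration, the core of such a traversal is small; a minimal sketch (no caching, no indexer or field support):

using System;
using System.Reflection;

static class PropertyPath
{
    // walks a dotted path like "Model.Child.Name" via reflection
    public static object GetValue(object obj, string path)
    {
        foreach (string name in path.Split('.'))
        {
            if (obj == null) return null; // a null along the path: nothing left to read
            PropertyInfo prop = obj.GetType().GetProperty(name);
            if (prop == null)
                throw new ArgumentException(
                    string.Format("Property '{0}' not found on {1}.", name, obj.GetType()));
            obj = prop.GetValue(obj, null);
        }
        return obj;
    }
}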
You can take a look at ObjectDumper and modify the source code to fit your requirements.
ObjectDumper takes a .NET object and dumps it to a string, file, TextWriter, etc.
This is not that difficult to write. Yes, there are some pitfalls, but it's good to know the pitfalls.
The algorithm is straightforward: it's traversing a tree structure. At each node you inspect it for a primitive value (int, string, char, etc.); if it's not one of these types, then it's a structure that has one or more primitives and needs to be traversed down to its primitives.
The pitfalls are dealing with nulls, nullable types, value versus reference types, etc. Straightforward stuff that every developer should know about.
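A minimal sketch of that traversal, including the null handling mentioned above (cycles in the object graph would still need guarding, e.g. with a visited set):

using System;
using System.Reflection;
using System.Text;

static class ObjectWalker
{
    public static void Dump(object obj, StringBuilder sb, string indent = "")
    {
        if (obj == null)
        {
            sb.AppendLine(indent + "null");
            return;
        }

        Type type = obj.GetType();
        // leaf values: primitives plus the common "primitive-like" types
        if (type.IsPrimitive || obj is string || obj is decimal)
        {
            sb.AppendLine(indent + obj);
            return;
        }

        // otherwise recurse into the public instance properties
        foreach (PropertyInfo prop in type.GetProperties(BindingFlags.Public | BindingFlags.Instance))
        {
            if (prop.GetIndexParameters().Length > 0) continue; // skip indexers
            sb.AppendLine(indent + prop.Name + ":");
            Dump(prop.GetValue(obj, null), sb, indent + "  ");
        }
    }
}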