Serializing a dictionary in C# - c#

I have a class named serializableVector2:
[Serializable]
class serializableVector2
{
public float x, y;
public serializableVector2(int x, int y)
{
this.x = x;
this.y = y;
}
}
and I have a struct named savedMapTile:
[Serializable]
struct savedMapTile
{
public oreInstance ore;
public int backgroundTileId;
public int playerId;
public tree tree;
}
and I have a dictionary using these two classes:
[SerializeField]
Dictionary<serializableVector2, savedMapTile> savedTiles;
I am trying to load this dictionary, modify it, and then save it again, all using serialization.
I am deserializing the dictionary like so:
FileStream f = File.Open(saveFileName, FileMode.Open);
BinaryFormatter b = new BinaryFormatter();
savedTiles = (Dictionary<serializableVector2, savedMapTile>)b.Deserialize(f);
f.Close();
and I am serializing it like so:
FileStream f = File.Open(saveFileName, FileMode.Create);
BinaryFormatter b = new BinaryFormatter();
b.Serialize(f, savedTiles);
f.Close();
However, when I try to access an element in the dictionary that I know should exist I get the following error:
System.Collections.Generic.KeyNotFoundException: The given key was not
present in the dictionary.
I get this error from running this code:
id = (savedTiles[new serializableVector2(-19,13)].backgroundTileId);
What I find really strange is that I am able to print out all of the dictionary's keys and values. This is where I am getting the values -19 and 13 for the Vector2. I print the keys and values like so:
for (int i = 0; i < 100; i++ )
{
UnityEngine.Debug.Log(vv[i].x +" "+vv[i].y);
UnityEngine.Debug.Log(x[i].backgroundTileId);
}
At this point I'm really stumped; I have no clue what is going on. I can see the file being saved in Windows Explorer, and I can access keys and values in the dictionary, but I can't seem to use it properly. It is also important to note that when I use the .Contains() method on the dictionary in a similar way to how I am trying to access a value, it always returns false.
This is for a Unity 5 project, using C# in visual studio running on windows 8.1.

Change your serializableVector2 from a class to a struct and you should be able to find things in your dictionary. Someone may correct me if I have this wrong, but to the best of my knowledge the Dictionary is going to call GetHashCode on the key and use that code to store the item in the dictionary. If you create two instances of your class with the same x and y coordinates and call GetHashCode, you will see that the two instances yield different hash codes. If you change it to a struct then they will produce the same hash code. I believe this is what is causing you to get the "Key not found" issues. On a somewhat related note, it does seem strange that the constructor takes int for x and y and then stores them as floats. You may want to consider changing the constructor to take float.
[Serializable]
struct serializableVector2
{
public float x, y;
public serializableVector2(float x, float y)
{
this.x = x;
this.y = y;
}
}
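To make the difference concrete, here is a minimal sketch comparing a class key and a struct key that carry the same data. The names ClassKey, StructKey and KeyLookupDemo are hypothetical and only serve the illustration; neither type overrides Equals or GetHashCode.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical names for illustration: ClassKey and StructKey are not from the question.
class ClassKey
{
    public float x, y;
    public ClassKey(float x, float y) { this.x = x; this.y = y; }
}

struct StructKey
{
    public float x, y;
    public StructKey(float x, float y) { this.x = x; this.y = y; }
}

static class KeyLookupDemo
{
    // A class key without Equals/GetHashCode overrides is compared by reference,
    // so a freshly constructed key never matches the stored one.
    public static bool ClassKeyFound()
    {
        var map = new Dictionary<ClassKey, int>();
        map[new ClassKey(-19, 13)] = 7;
        return map.ContainsKey(new ClassKey(-19, 13));
    }

    // A struct key is compared by value through ValueType.Equals,
    // so a new key with the same fields matches.
    public static bool StructKeyFound()
    {
        var map = new Dictionary<StructKey, int>();
        map[new StructKey(-19, 13)] = 7;
        return map.ContainsKey(new StructKey(-19, 13));
    }
}
```

With these definitions the class lookup should fail and the struct lookup should succeed, which matches the KeyNotFoundException described in the question.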

You have two issues:
Your dictionary key serializableVector2 is a class relying on the default equality and hashing methods. The defaults use reference equality such that only variables pointing to the same object will be equal and return the same hash.
If that were not the case, you would still be relying on floating point equality. Unless your serializer can guarantee precise storage and retrieval of floating point values, the deserialized serializableVector2 may NOT be equal to the original.
Suggested solution:
Override GetHashCode and Equals for your serializableVector2 class. When performing comparisons and hashing, round your floats to within the 32-bit floating point precision of your expected range of values. You can rely on 6+ significant digits of precision (within the same range), so if your world is ±1000 units I believe you can safely round to 3 decimal places.
Example for GetHashCode (without testing):
public override int GetHashCode() {
return Math.Round(x,3).GetHashCode() ^ Math.Round(y,3).GetHashCode();
}
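A matching Equals is needed as well, because the dictionary compares keys with Equals after the hash codes agree. The following is an untested sketch that keeps the class and the rounding idea from above; the helper name Key, the float constructor and the choice of three decimal places are assumptions, not part of the original code.

```csharp
using System;

[Serializable]
class serializableVector2 : IEquatable<serializableVector2>
{
    public float x, y;

    public serializableVector2(float x, float y)
    {
        this.x = x;
        this.y = y;
    }

    // Assumption: three decimal places is enough precision for the tile grid.
    private static double Key(float v) { return Math.Round(v, 3); }

    public bool Equals(serializableVector2 other)
    {
        if (ReferenceEquals(other, null)) return false;
        return Key(x) == Key(other.x) && Key(y) == Key(other.y);
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as serializableVector2);
    }

    public override int GetHashCode()
    {
        // Hash the same rounded values that Equals compares.
        unchecked { return Key(x).GetHashCode() * 397 ^ Key(y).GetHashCode(); }
    }
}
```

Because x and y are public mutable fields, a key should not be modified while it is stored in the dictionary, or it can no longer be found.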

Related

Implementing IEnumerator<T> for Fixed Arrays

I need to implement a mutable polygon that behaves like a struct, that it is copied by value and changes to the copy have no side effects on the original.
Consider my attempt at writing a struct for such a type:
public unsafe struct Polygon : IEnumerable<System.Drawing.PointF>
{
private int points;
private fixed float xPoints[64];
private fixed float yPoints[64];
public PointF this[int i]
{
get => new PointF(xPoints[i], yPoints[i]);
set
{
xPoints[i] = value.X;
yPoints[i] = value.Y;
}
}
public IEnumerator<PointF> GetEnumerator()
{
return new PolygonEnumerator(ref this);
}
}
I have a requirement that a Polygon must be copied by value so it is a struct.
(Rationale: Modifying a copy shouldn't have side effects on the original.)
I would also like it to implement IEnumerable<PointF>.
(Rationale: Being able to write foreach (PointF p in poly))
As far as I am aware, C# does not allow you to override the copy/assignment behaviour for value types. If that is possible then there's the "low hanging fruit" that would answer my question.
My approach to implementing the copy-by-value behaviour of Polygon is to use unsafe and fixed arrays to allow a polygon to store up to 64 points in the struct itself, which prevents the polygon from being indirectly modified through its copies.
I am running into a problem when I go to implement PolygonEnumerator : IEnumerator<PointF> though.
Another requirement (wishful thinking) is that the enumerator will return PointF values that match the Polygon's fixed arrays, even if those points are modified during iteration.
(Rationale: Iterating over arrays works like this, so this polygon should behave in line with the user's expectations.)
public class PolygonEnumerator : IEnumerator<PointF>
{
private int position = -1;
private ??? poly;
public PolygonEnumerator(ref Polygon p)
{
// I know I need the ref keyword to ensure that the Polygon
// passed into the constructor is not a copy
// However, the class can't have a struct reference as a field
poly = ???;
}
public PointF Current => poly[position];
// the rest of the IEnumerator implementation seems straightforward to me
}
What can I do to implement the PolygonEnumerator class according to my requirements?
It seems to me that I can't store a reference to the original polygon, so I have to make a copy of its points into the enumerator itself; but that means changes to the original polygon can't be visited by the enumerator!
I am completely OK with an answer that says "It's impossible".
Maybe I've dug a hole for myself here while missing a useful language feature or conventional solution to the original problem.
Your Polygon type should not be a struct because ( 64 + 64 ) * sizeof(float) == 512 bytes. That means every value-copy operation will require a copy of 512 bytes, which is very inefficient (not least because of locality-of-reference, which strongly favours the use of objects that exist in a single location in memory).
I have a requirement that a Polygon must be copied by value so it is a struct.
(Rationale: Modifying a copy shouldn't have side effects on the original.)
Your "requirement" is wrong. Instead define an immutable class with an explicit copy operation - and/or use a mutable "builder" object for efficient construction of large objects.
I would also like it to implement IEnumerable<PointF>.
(Rationale: Being able to write foreach (PointF p in poly))
That's fine - but you hardly ever need to implement IEnumerator<T> directly yourself because C# can do it for you when using yield return (and the generated CIL is very optimized!).
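As a rough sketch of the yield return approach, the iterator below lets the compiler generate the IEnumerator<T> implementation. Vertex and VertexList are hypothetical names used only for this illustration, and plain managed arrays stand in for the fixed buffers from the question.

```csharp
using System.Collections;
using System.Collections.Generic;

// Hypothetical point type so the sketch does not depend on System.Drawing.
public struct Vertex
{
    public readonly float X, Y;
    public Vertex(float x, float y) { X = x; Y = y; }
}

// Hypothetical container that stores coordinates in two parallel arrays.
public class VertexList : IEnumerable<Vertex>
{
    private readonly float[] xs;
    private readonly float[] ys;

    public VertexList(float[] xs, float[] ys) { this.xs = xs; this.ys = ys; }

    public IEnumerator<Vertex> GetEnumerator()
    {
        // The compiler turns this method into an IEnumerator<Vertex> state machine.
        // Each step reads the arrays at that moment, so edits made during
        // iteration are visible to later steps.
        for (int i = 0; i < xs.Length; i++)
            yield return new Vertex(xs[i], ys[i]);
    }

    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}
```

A caller can then write foreach (Vertex v in list) without any hand-written enumerator class.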
My approach to implementing the copy-by-value behaviour of Polygon is to use unsafe and fixed arrays to allow a polygon to store up to 64 points in the struct itself, which prevents the polygon from being indirectly modified through its copies.
This is not how C# should be written. unsafe should be avoided wherever possible (because it breaks the CLR's built-in guarantees and safeguards).
Another requirement (wishful thinking) is that the enumerator will return PointF values that match the Polygon's fixed arrays, even if those points are modified during iteration.
(Rationale: Iterating over arrays works like this, so this polygon should behave in line with the user's expectations.)
Who are your users/consumers in this case? If you're so concerned about not breaking user's expectations then you shouldn't use unsafe!
Consider this approach instead:
(Update: I just realised that the class Polygon I defined below is essentially just a trivial wrapper around ImmutableList<T> - so you don't even need class Polygon, so just use ImmutableList<Point> instead)
using System;
using System.Collections;
using System.Collections.Generic;
using System.Collections.Immutable;

public struct Point
{
    public Point( Single x, Single y )
    {
        this.X = x;
        this.Y = y;
    }
    public Single X { get; }
    public Single Y { get; }
    // TODO: Implement IEquatable<Point>
}
public class Polygon : IEnumerable<Point>
{
    private readonly ImmutableList<Point> points;
    public Point this[int i] => this.points[i];
    public Int32 Count => this.points.Count;
    public Polygon()
    {
        // ImmutableList<T> has no public constructor; start from the shared empty list.
        this.points = ImmutableList<Point>.Empty;
    }
    private Polygon( ImmutableList<Point> points )
    {
        this.points = points;
    }
    public IEnumerator<Point> GetEnumerator()
    {
        //return Enumerable.Range( 0, this.Count ).Select( i => this[i] ).GetEnumerator();
        return this.points.GetEnumerator();
    }
    // IEnumerable<T> also requires the non-generic GetEnumerator.
    IEnumerator IEnumerable.GetEnumerator() => this.GetEnumerator();
    public Polygon AddPoint( Single x, Single y ) => this.AddPoint( new Point( x, y ) );
    public Polygon AddPoint( Point p )
    {
        // Add returns a new list; the current Polygon is left unchanged.
        ImmutableList<Point> nextList = this.points.Add( p );
        return new Polygon( points: nextList );
    }
}

Should I use Int32[,] or System.Drawing.Point when all I want is the x,y coordinates?

I am building an app that lets me control my Android devices from my PC. It's running great so now I want to start cleaning up my code for release. I'm trying to clean up solution references that I don't need so I took a look at the using System.Drawing; that I have for implementing the Point class. The thing is, I don't really need it if I switch to using a two-dimensional Int32 array.
So I could have: new Int32[,] {{200, 300}}; instead of new Point(200, 300); and get rid of the System.Drawing namespace altogether. The question is: does it really matter? Am I realistically introducing bloat in my app by keeping the System.Drawing namespace? Is Int32[,] meaningfully more lightweight?
Or, should I not use either and just keep track of the x,y coordinates in individual Int32 variables?
EDIT: I got rid of the original idea I wrote: Int32[200, 300] and replaced it with new Int32[,] {{200, 300}}; because as @Martin Mulder pointed out Int32[200, 300] "creates a two-dimensional array with 60000 integers, all of them are 0."
EDIT2: So I'm dumb. First of all I was trying to fancify too much by using the multi-dimensional array. Utter, overboard silliness. Secondly, I took the advice to use a struct and it all worked flawlessly, so thank you to the first four answers; every one of them was correct. But, after all that, I couldn't end up removing the System.Drawing reference because I was working on a WinForms app and the System.Drawing is being used all over in the designer of the app! I suppose I could further refactor it but I got the size down to 13KB so it's good enough. Thank you all!
Just create your own:
public struct Point : IEquatable<Point>
{
private int _x;
private int _y;
public int X
{
get { return _x; }
set { _x = value; }
}
public int Y
{
get { return _y; }
set { _y = value; }
}
public Point(int x, int y)
{
_x = x;
_y = y;
}
public bool Equals(Point other)
{
return X == other.X && Y == other.Y;
}
public override bool Equals(object other)
{
return other is Point && Equals((Point)other);
}
public override int GetHashCode()
{
return unchecked(X * 1021 + Y);
}
}
Better yet, make it immutable (make the fields readonly and remove the setters), though if you'd depended on the mutability of the two options you consider in your question then that'll require more of a change to how you do things. But really, immutability is the way to go here.
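A minimal sketch of the immutable variant might look like this. The name ImmutablePoint and the WithX/WithY helpers are assumptions added for illustration; they are not part of the struct shown above.

```csharp
using System;

// Hypothetical immutable counterpart of the Point struct: readonly fields, no setters.
public struct ImmutablePoint : IEquatable<ImmutablePoint>
{
    private readonly int _x;
    private readonly int _y;

    public ImmutablePoint(int x, int y) { _x = x; _y = y; }

    public int X { get { return _x; } }
    public int Y { get { return _y; } }

    // "Changing" a coordinate returns a new value instead of mutating this one.
    public ImmutablePoint WithX(int x) { return new ImmutablePoint(x, _y); }
    public ImmutablePoint WithY(int y) { return new ImmutablePoint(_x, y); }

    public bool Equals(ImmutablePoint other) { return _x == other._x && _y == other._y; }

    public override bool Equals(object obj)
    {
        return obj is ImmutablePoint && Equals((ImmutablePoint)obj);
    }

    public override int GetHashCode() { return unchecked(_x * 1021 + _y); }
}
```

Code that previously assigned to X or Y would instead write something like p = p.WithX(5), which is the kind of change in usage mentioned above.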
What you are suggesting is very ill-advised:
new Point(200, 300) creates a new point with two integers: The X and Y property with values 200 and 300.
new Int32[200,300] creates a two-dimensional array with 60000 integers, all of them are 0.
(After your edit) new Int32[,] {{200, 300}} also creates a two-dimensional array, this time with 2 integers. To retrieve the first value (200), you can access it like this: array[0,0] and the second value (300) like array[0,1]. The second dimension is not required or needed or desired.
If you want to get rid of the reference to the library there are a few other suggestions:
new Int32[] {200, 300} creates a one-dimensional array of two integers with values 200 and 300. You can access them with array[0] and array[1].
As Ron Beyer suggested, you could use Tuple<int, int>.
Create your own Point struct (pointed out by Jon Hanna). It makes your application a bit larger, but you avoid the reference and you prevent the System.Drawing library from being loaded into memory.
If I wanted to remove that reference, I would go for the last option since it makes clearer what I am doing (a Point is more readable than an Int32 array or a Tuple). Solutions 2 and 3 are slightly faster than solution 1.
Nothing gets "embedded" in your application by just referencing a library. However, if the Point class really is all you need, you could just remove the reference and implement your own Point struct. That may be more intuitive to read than an int array.
Int32[,] is something different by the way. It's a two-dimensional array, not a pair of two int values. You'll be making things worse by using that.
You could use Tuple<int, int>, but I'd go for creating your own structure.
Some people have already suggested implementations here. To just wrap your two integers, I'd use this:
public class MyPoint
{
public int X;
public int Y;
}
Add all other features only if needed.
As @Glorin Oakenfoot said, you should implement your own Point class. Here's an example:
public class MyPoint // Give it a unique name to avoid collisions
{
public int X { get; set; }
public int Y { get; set; }
public MyPoint() {} // Default constructor allows you to use object initialization.
public MyPoint(int x, int y) { X = x; Y = y; }
}

Efficient implementation of flyweight pattern

Background
One of the most used data-structures in our application is a custom Point struct. Recently we have been running into memory issues, mostly caused by an excessive number of instances of this struct.
Many of these instances contain the same data. Sharing a single instance would significantly help to reduce memory usage. However, since we are using structs, instances cannot be shared. It is also not possible to change it to a class, because the struct semantics are important.
Our workaround for this is to have a struct containing a single reference to a backing class, which contains the actual data. These flyweight dataclasses are stored in and retrieved from a factory to ensure no duplicates exist.
A narrowed down version of the code looks something like this:
public struct PointD
{
//Factory
private static class PointDatabase
{
private static readonly Dictionary<PointData, PointData> _data = new Dictionary<PointData, PointData>();
public static PointData Get(double x, double y)
{
var key = new PointData(x, y);
if (!_data.ContainsKey(key))
_data.Add(key, key);
return _data[key];
}
}
//Flyweight data
private class PointData
{
private double pX;
private double pY;
public PointData(double x, double y)
{
pX = x;
pY = y;
}
public double X
{
get { return pX; }
}
public double Y
{
get { return pY; }
}
public override bool Equals(object obj)
{
var other = obj as PointData;
if (other == null)
return false;
return other.X == this.X && other.Y == this.Y;
}
public override int GetHashCode()
{
return X.GetHashCode() * Y.GetHashCode();
}
}
//Public struct
private PointData _data;
public PointD(double x, double y)
{
_data = PointDatabase.Get(x, y);
}
public double X
{
get { return _data == null ? 0 : _data.X; }
set { _data = PointDatabase.Get(value, Y); }
}
public double Y
{
get { return _data == null ? 0 : _data.Y; }
set { _data = PointDatabase.Get(X, value); }
}
}
This implementation ensures that the struct semantics are maintained, while ensuring only one instance of the same data is kept around.
(Please don't mention memory leaks or such, this is simplified example code)
The Problem
Although above approach works to lower our memory usage, the performance is horrendous. A project in our application can easily contain a million different points or more. As a result, the lookup of a PointData instance is very costly. And this lookup has to be done whenever a Point is manipulated, which, as you can probably guess, is what our application is all about. As a result, this approach is not suitable for us.
As an alternative, we could make two versions of the Point class: one with backing flyweight as above, and one containing its own data (with possible duplicates). All (short-lived) calculations could be done in the second class, while when storing the Point for longer durations they could be converted to the first, memory-efficient class. However, this means that all the users of the Point class have to be inspected and adjusted to this scheme, something which is not feasible for us.
What we are looking for is an approach which meets below criteria:
When there are multiple Points with the same data, the memory usage should be lower than having a different struct instance for each of these.
Performance should not be much worse than working directly on primitive data in the struct.
Struct semantics should be maintained.
The 'Point' interface should remain the same (i.e. classes that use 'Point' should not have to be changed).
Is there any way we can improve our approach towards these criteria? Or can anyone suggest a different approach we can attempt?
Rather than re-work an entire data structure and programming model, my go-to solution for performance and memory issues is to cache, pre-fetch and, most importantly, cull your data when it is not needed.
Think of it this way. On a graph, you cannot display a few million points at once because you run out of pixels (you should occlusion-cull these points). Similarly, in a table, there isn't enough vertical space on screen (you need data set truncation). Consider streaming data from your source file as you need it. If your source data structure is not appropriate for dynamic retrieval, consider an intermediate, temporary file format. This is one of the ways .Net's JITer works so quickly!

C#: How would you unit test GetHashCode?

Testing the Equals method is pretty much straightforward (as far as I know). But how on earth do you test the GetHashCode method?
Test that two distinct objects which are equal have the same hash code (for various values). Check that non-equal objects give different hash codes, varying one aspect/property at a time. While the hash codes don't have to be different, you'd be really unlucky to pick different values for properties which happen to give the same hash code unless you've got a bug.
Gallio/MbUnit v3.2 comes with convenient contract verifiers which are able to test your implementation of GetHashCode() and IEquatable<T>. More specifically, you may be interested in the EqualityContract and the HashCodeAcceptanceContract. For example, given this simple type:
public class Spot
{
private readonly int x;
private readonly int y;
public Spot(int x, int y)
{
this.x = x;
this.y = y;
}
public override int GetHashCode()
{
int h = -2128831035;
h = (h * 16777619) ^ x;
h = (h * 16777619) ^ y;
return h;
}
}
Then you declare your contract verifier like this:
[TestFixture]
public class SpotTest
{
[VerifyContract]
public readonly IContract HashCodeAcceptanceTests = new HashCodeAcceptanceContract<Spot>()
{
CollisionProbabilityLimit = CollisionProbability.VeryLow,
UniformDistributionQuality = UniformDistributionQuality.Excellent,
DistinctInstances = DataGenerators.Join(Enumerable.Range(0, 1000), Enumerable.Range(0, 1000)).Select(o => new Spot(o.First, o.Second))
};
}
It would be fairly similar to Equals(). You'd want to make sure two objects which were the "same" at least had the same hash code. That means if .Equals() returns true, the hash codes should be identical as well. As far as what the proper hashcode values are, that depends on how you're hashing.
From personal experience. Aside from obvious things like same objects giving you same hash codes, you need to create large enough array of unique objects and count unique hash codes among them. If unique hash codes make less than, say 50% of overall object count, then you are in trouble, as your hash function is not good.
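// Collect the hash code of every test object.
List<int> hashList = new List<int>(testObjectList.Count);
for (int i = 0; i < testObjectList.Count; i++)
{
    hashList.Add(testObjectList[i].GetHashCode());
}
// After sorting, equal hash codes are adjacent, so each change of value marks a new distinct code.
hashList.Sort();
int differentValues = 1; // the first element always counts as one distinct value
int curValue = hashList[0];
for (int i = 1; i < hashList.Count; i++)
{
    if (hashList[i] != curValue)
    {
        differentValues++;
        curValue = hashList[i];
    }
}
// Fail if fewer than half of the objects produced a unique hash code.
Assert.Greater(differentValues, hashList.Count/2);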
In addition to checking that object equality implies equality of hashcodes, and the distribution of hashes is fairly flat as suggested by Yann Trevin (if performance is a concern), you may also wish to consider what happens if you change a property of the object.
Suppose your object changes while it's in a dictionary/hashset. Do you want the Contains(object) to still be true? If so then your GetHashCode had better not depend on the mutable property that was changed.
I would pre-supply a known/expected hash and compare what the result of GetHashCode is.
You create separate instances with the same value and check that the GetHashCode for the instances returns the same value, and that repeated calls on the same instance returns the same value.
That is the only requirement for a hash code to work. To work well the hash codes should of course have a good distribution, but testing for that requires a lot of testing...
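The checks described in these answers can be sketched without any particular test framework. Cell and HashCodeChecks below are hypothetical names; in a real test suite the boolean results would be fed to the assertions of whichever framework is in use.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type under test, with value-based Equals and GetHashCode.
public sealed class Cell
{
    public readonly int X, Y;
    public Cell(int x, int y) { X = x; Y = y; }

    public override bool Equals(object obj)
    {
        var other = obj as Cell;
        return other != null && X == other.X && Y == other.Y;
    }

    public override int GetHashCode() { return unchecked(X * 1021 + Y); }
}

public static class HashCodeChecks
{
    // Equal instances must agree, and repeated calls on one instance must be stable.
    public static bool EqualObjectsShareHash()
    {
        var a = new Cell(3, 4);
        var b = new Cell(3, 4);
        return a.Equals(b)
            && a.GetHashCode() == b.GetHashCode()
            && a.GetHashCode() == a.GetHashCode();
    }

    // Fraction of distinct hash codes over a 100 x 100 grid of distinct cells.
    public static double DistinctHashRatio()
    {
        List<Cell> cells = (from x in Enumerable.Range(0, 100)
                            from y in Enumerable.Range(0, 100)
                            select new Cell(x, y)).ToList();
        int distinct = cells.Select(c => c.GetHashCode()).Distinct().Count();
        return (double)distinct / cells.Count;
    }
}
```

The first check covers the hard requirement; the second is only a quality heuristic, and the threshold to compare the ratio against (for example the 50% mentioned above) is a judgment call.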

Question about Dictionary<T,T>

I have a class which looks like this:
public class NumericalRange:IEquatable<NumericalRange>
{
public double LowerLimit;
public double UpperLimit;
public NumericalRange(double lower, double upper)
{
LowerLimit = lower;
UpperLimit = upper;
}
public bool DoesLieInRange(double n)
{
if (LowerLimit <= n && n <= UpperLimit)
return true;
else
return false;
}
#region IEquatable<NumericalRange> Members
public bool Equals(NumericalRange other)
{
if (Double.IsNaN(this.LowerLimit)&& Double.IsNaN(other.LowerLimit))
{
if (Double.IsNaN(this.UpperLimit) && Double.IsNaN(other.UpperLimit))
{
return true;
}
}
if (this.LowerLimit == other.LowerLimit && this.UpperLimit == other.UpperLimit)
return true;
return false;
}
#endregion
}
This class holds a numerical range of values. This class should also be able to hold a default range, where both LowerLimit and UpperLimit are equal to Double.NaN.
Now this class goes into a Dictionary.
The Dictionary works fine for 'non-NaN' numerical range values, but when the Key is {NaN,NaN} NumericalRange Object, then the dictionary throws a KeyNotFoundException.
What am I doing wrong? Is there any other interface that I have to implement?
Based on your comment, you haven't implemented GetHashCode. I'm amazed that the class works at all in a dictionary, unless you're always requesting the identical key that you put in. I would suggest an implementation of something like:
public override int GetHashCode()
{
int hash = 17;
hash = hash * 23 + UpperLimit.GetHashCode();
hash = hash * 23 + LowerLimit.GetHashCode();
return hash;
}
That assumes Double.GetHashCode() gives a consistent value for NaN. There are many values of NaN of course, and you may want to special case it to make sure they all give the same hash.
You should also override the Equals method inherited from Object:
public override bool Equals(Object other)
{
return other != null &&
other.GetType() == GetType() &&
Equals((NumericalRange) other);
}
Note that the type check can be made more efficient by using as if you seal your class. Otherwise you'll get interesting asymmetries between x.Equals(y) and y.Equals(x) if someone derives another class from yours. Equality becomes tricky with inheritance.
You should also make your fields private, exposing them only as properties. If this is going to be used as a key in a dictionary, I strongly recommend that you make them readonly, too. Changing the contents of a key when it's used in a dictionary is likely to lead to it being "unfindable" later.
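Putting these suggestions together, a sealed, immutable version of the class could look roughly like the sketch below. The HashOf helper is an assumption added here to force every NaN to the same hash code; the DoesLieInRange method from the question is omitted for brevity.

```csharp
using System;

public sealed class NumericalRange : IEquatable<NumericalRange>
{
    private readonly double lower;
    private readonly double upper;

    public NumericalRange(double lower, double upper)
    {
        this.lower = lower;
        this.upper = upper;
    }

    public double LowerLimit { get { return lower; } }
    public double UpperLimit { get { return upper; } }

    // double.Equals treats NaN as equal to NaN, unlike the == operator.
    public bool Equals(NumericalRange other)
    {
        return other != null && lower.Equals(other.lower) && upper.Equals(other.upper);
    }

    // The class is sealed, so "as" is sufficient for the type check.
    public override bool Equals(object obj) { return Equals(obj as NumericalRange); }

    // Hypothetical helper: map every NaN to one hash so it stays consistent with Equals.
    private static int HashOf(double v) { return double.IsNaN(v) ? 0 : v.GetHashCode(); }

    public override int GetHashCode()
    {
        unchecked
        {
            int hash = 17;
            hash = hash * 23 + HashOf(lower);
            hash = hash * 23 + HashOf(upper);
            return hash;
        }
    }
}
```

With this version, a {NaN, NaN} range constructed separately should find the entry stored under another {NaN, NaN} key.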
The default implementation of the GetHashCode method uses the reference of the object rather than the values in the object. You have to use the same instance of the object as you used to put the data in the dictionary for that to work.
An implementation of GetHashCode that works simply creates a code from the hash codes of its data members:
public override int GetHashCode() {
return LowerLimit.GetHashCode() ^ UpperLimit.GetHashCode();
}
(This is the same implementation that the Point structure uses.)
Any implementation of the method that always returns the same hash code for any given parameter values works when used in a Dictionary. Just returning the same hash code for all values actually also works, but then the performance of the Dictionary gets bad (looking up a key becomes an O(n) operation instead of an O(1) operation). To give the best performance, the method should distribute the hash codes evenly within the range.
If your data is strongly biased, the above implementation might not give the best performance. If you for example have a lot of ranges where the lower and upper limits are the same, they will all get the hash code zero. In that case something like this might work better:
public override int GetHashCode() {
return (LowerLimit.GetHashCode() * 251) ^ UpperLimit.GetHashCode();
}
You should consider making the class immutable, i.e. make its properties read-only and only set them in the constructor. If you change the properties of an object while it's in a Dictionary, its hash code will change and you will not be able to access the object any more.
