Substitute the GetHashCode() Method of System.Drawing.Point - c#

System.Drawing.Point has a really, really bad GetHashCode method if you intend to use it to describe 'pixels' in an Image/Bitmap: it is just an XOR of the X and Y coordinates.
So for an image of, say, 2000x2000 pixels, it has an absurd number of collisions: with both coordinates below 2048, X ^ Y can take only 2048 distinct values for four million points (every point on the diagonal, where X == Y, hashes to 0).
It's quite easy to create a decent GetHashCode method using unchecked multiplication, as some people already mentioned here.
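To put a number on the collision problem, here is a small sketch (PointHashDemo and its methods are made-up names) that counts how many distinct hash codes a 2000x2000 grid produces with the X ^ Y scheme described above, compared with one alternative built from unchecked multiplication. The multiplier 65599 is an arbitrary assumption: any constant larger than the largest Y keeps different points apart, as long as X times the multiplier does not overflow.

```csharp
using System;
using System.Collections.Generic;

public static class PointHashDemo
{
    // The scheme described in the question: plain XOR of the coordinates.
    public static int XorHash(int x, int y) => x ^ y;

    // Hypothetical alternative: scale X past the range of Y, then add Y.
    // unchecked lets large coordinates wrap around instead of throwing.
    public static int MulHash(int x, int y) => unchecked(x * 65599 + y);

    // Counts the distinct hash codes over every pixel of a width x height image.
    public static int CountDistinct(int width, int height, Func<int, int, int> hash)
    {
        var seen = new HashSet<int>();
        for (int x = 0; x < width; x++)
        {
            for (int y = 0; y < height; y++)
            {
                seen.Add(hash(x, y));
            }
        }
        return seen.Count;
    }
}
```

For 2000x2000 pixels the XOR scheme yields only 2048 distinct codes, while the multiplicative one gives each of the four million pixels its own code.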
But what can I do to use this improved GetHashCode method in a HashSet?
I know I could create my own class/struct MyPoint and implement it using this improved methods, but then I'd break all other pieces of code in my project that use a System.Drawing.Point.
Is it possible to "overwrite" the method from System.Drawing.Point using some sort of extension method or the like? Or to "tell" the HashSet to use another function instead of the GetHashCode?
Currently I'm using a SortedSet<System.Drawing.Point> with a custom IComparer<Point> to store my points. When I want to know if the set contains a Point I call BinarySearch. It's faster than HashSet<System.Drawing.Point>.Contains in a set with 10000 collisions, but it's not as fast as a HashSet with a good hash could be.

You can create your own class that implements IEqualityComparer<Point>, then give that class to the HashSet constructor.
Example:
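using System.Collections.Generic;
using System.Drawing;
public class MyPointEqualityComparer : IEqualityComparer<Point>
{
public bool Equals(Point p1, Point p2)
{
return p1 == p2; // defer to Point's existing operator==
}
public int GetHashCode(Point obj)
{
// Your favorite hashcode function goes here. This example choice scales X
// past the range of typical Y values before adding Y; unchecked lets it wrap
return unchecked(obj.X * 65599 + obj.Y);
}
}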
class Program
{
static void Main(string[] args)
{
// Create hashset with custom hashcode algorithm
HashSet<Point> myHashSet = new HashSet<Point>(new MyPointEqualityComparer());
// Same thing also works for dictionary
Dictionary<Point, string> myDictionary = new Dictionary<Point, string>(new MyPointEqualityComparer());
}
}

Related

Implementing IEnumerator<T> for Fixed Arrays

I need to implement a mutable polygon that behaves like a struct, in that it is copied by value and changes to the copy have no side effects on the original.
Consider my attempt at writing a struct for such a type:
public unsafe struct Polygon : IEnumerable<System.Drawing.PointF>
{
private int points;
private fixed float xPoints[64];
private fixed float yPoints[64];
public PointF this[int i]
{
get => new PointF(xPoints[i], yPoints[i]);
set
{
xPoints[i] = value.X;
yPoints[i] = value.Y;
}
}
public IEnumerator<PointF> GetEnumerator()
{
return new PolygonEnumerator(ref this);
}
}
I have a requirement that a Polygon must be copied by value so it is a struct.
(Rationale: Modifying a copy shouldn't have side effects on the original.)
I would also like it to implement IEnumerable<PointF>.
(Rationale: Being able to write foreach (PointF p in poly))
As far as I am aware, C# does not allow you to override the copy/assignment behaviour for value types. If that is possible then there's the "low hanging fruit" that would answer my question.
My approach to implementing the copy-by-value behaviour of Polygon is to use unsafe and fixed arrays to allow a polygon to store up to 64 points in the struct itself, which prevents the polygon from being indirectly modified through its copies.
I am running into a problem when I go to implement PolygonEnumerator : IEnumerator<PointF> though.
Another requirement (wishful thinking) is that the enumerator will return PointF values that match the Polygon's fixed arrays, even if those points are modified during iteration.
(Rationale: Iterating over arrays works like this, so this polygon should behave in line with the user's expectations.)
public class PolygonEnumerator : IEnumerator<PointF>
{
private int position = -1;
private ??? poly;
public PolygonEnumerator(ref Polygon p)
{
// I know I need the ref keyword to ensure that the Polygon
// passed into the constructor is not a copy
// However, the class can't have a struct reference as a field
poly = ???;
}
public PointF Current => poly[position];
// the rest of the IEnumerator implementation seems straightforward to me
}
What can I do to implement the PolygonEnumerator class according to my requirements?
It seems to me that I can't store a reference to the original polygon, so I have to copy its points into the enumerator itself, but that means changes to the original polygon can't be seen by the enumerator!
I am completely OK with an answer that says "It's impossible".
Maybe I've dug a hole for myself here while missing a useful language feature or conventional solution to the original problem.
Your Polygon type should not be a struct because (64 + 64) * sizeof(float) == 512 bytes. That means every value-copy operation will require copying 512 bytes, which is very inefficient (not least because of locality of reference, which strongly favours the use of objects that exist in a single location in memory).
I have a requirement that a Polygon must be copied by value so it is a struct.
(Rationale: Modifying a copy shouldn't have side effects on the original.)
Your "requirement" is wrong. Instead define an immutable class with an explicit copy operation - and/or use a mutable "builder" object for efficient construction of large objects.
I would also like it to implement IEnumerable<PointF>.
(Rationale: Being able to write foreach (PointF p in poly))
That's fine - but you hardly ever need to implement IEnumerator<T> directly yourself because C# can do it for you when using yield return (and the generated CIL is very optimized!).
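As a rough illustration of the yield return point, here is a sketch of a hypothetical class-based polygon (ArrayPolygon is an invented name, and it deliberately uses a class rather than the question's struct). The compiler generates the IEnumerator<PointF> implementation. Because the iterator holds a reference to the polygon and reads each point only when it reaches it, writes made during iteration to points not yet visited are observed, which is the array-like behaviour the question asks for.

```csharp
using System.Collections;
using System.Collections.Generic;
using System.Drawing;

// Hypothetical mutable polygon backed by ordinary arrays.
public sealed class ArrayPolygon : IEnumerable<PointF>
{
    private readonly float[] xs;
    private readonly float[] ys;

    public ArrayPolygon(params PointF[] points)
    {
        xs = new float[points.Length];
        ys = new float[points.Length];
        for (int i = 0; i < points.Length; i++)
        {
            xs[i] = points[i].X;
            ys[i] = points[i].Y;
        }
    }

    public int Count => xs.Length;

    public PointF this[int i]
    {
        get => new PointF(xs[i], ys[i]);
        set { xs[i] = value.X; ys[i] = value.Y; }
    }

    // The compiler turns this method into an IEnumerator<PointF> class.
    // this[i] is evaluated only when MoveNext() reaches index i.
    public IEnumerator<PointF> GetEnumerator()
    {
        for (int i = 0; i < Count; i++)
        {
            yield return this[i];
        }
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```

Note that a class gives up the copy-by-value requirement; that trade-off is exactly what this answer argues for.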
My approach to implementing the copy-by-value behaviour of Polygon is to use unsafe and fixed arrays to allow a polygon to store up to 64 points in the struct itself, which prevents the polygon from being indirectly modified through its copies.
This is not how C# should be written. unsafe should be avoided wherever possible (because it breaks the CLR's built-in guarantees and safeguards).
Another requirement (wishful thinking) is that the enumerator will return PointF values that match the Polygon's fixed arrays, even if those points are modified during iteration.
(Rationale: Iterating over arrays works like this, so this polygon should behave in line with the user's expectations.)
Who are your users/consumers in this case? If you're so concerned about not breaking users' expectations then you shouldn't use unsafe!
Consider this approach instead:
(Update: I just realised that the class Polygon I defined below is essentially a trivial wrapper around ImmutableList<T>, so you don't even need class Polygon; just use ImmutableList<Point> instead.)
using System;
using System.Collections;
using System.Collections.Generic;
using System.Collections.Immutable;
public struct Point
{
public Point( Single x, Single y )
{
this.X = x;
this.Y = y;
}
public Single X { get; }
public Single Y { get; }
// TODO: Implement IEquatable<Point>
}
public class Polygon : IEnumerable<Point>
{
private readonly ImmutableList<Point> points;
public Point this[int i] => this.points[i];
public Int32 Count => this.points.Count;
public Polygon()
{
// ImmutableList<T> has no public constructor; start from the shared empty list
this.points = ImmutableList<Point>.Empty;
}
private Polygon( ImmutableList<Point> points )
{
this.points = points;
}
public IEnumerator<Point> GetEnumerator()
{
// Alternative: return Enumerable.Range( 0, this.Count ).Select( i => this[i] ).GetEnumerator();
return this.points.GetEnumerator();
}
// IEnumerable<T> also requires the non-generic GetEnumerator
IEnumerator IEnumerable.GetEnumerator() => this.GetEnumerator();
public Polygon AddPoint( Single x, Single y ) => this.AddPoint( new Point( x, y ) );
public Polygon AddPoint( Point p )
{
// Add returns a new list; this polygon's own list is never modified
ImmutableList<Point> nextList = this.points.Add( p );
return new Polygon( points: nextList );
}
}

Which GetHashCode will dominate in case of IEqualityComparer

I am having the following situation
class Custom
{
public override int GetHashCode(){...calculation1}
}
public class MyComparer : IEqualityComparer<Custom>
{
public bool Equals(Custom cus1, Custom cus2)
{
if (cus1 == null || cus2 == null)
return false;
return cus1.GetHashCode() == cus2.GetHashCode();
}
public int GetHashCode(Custom cus1)
{
return ...calculation2;
}
}
static void Main()
{
List<Custom> mine1 = new List<Custom>(){....};
List<Custom> mine2 = new List<Custom>(){....};
MyComparer myComparer = new MyComparer();
List<Custom> result = mine1.Intersect(mine2, myComparer).ToList(); // needs using System.Linq
}
I just want to know which GetHashCode will be used by Intersect.
To answer your question, it will be GetHashCode from MyComparer.
But there is a very important reason why there is both a GetHashCode and an Equals method. GetHashCode() is an optimization: when items are compared, the hash codes are checked first, and only if the hash codes are the same is the Equals method used. That protects against different objects that happen to share a hash code (for two arbitrary values the chance is about one in 4 billion, but it still happens; I have seen it first-hand). In Equals() you should compare all the relevant fields of one object with those of the other. Comparing objects by hash code in Equals is wrong and defeats the whole purpose of the method.
Hope that clarifies.
Why didn't you test it yourself? You already have the code...
MyComparer.GetHashCode will be used in your case. You can see the code here: http://referencesource.microsoft.com/#System.Core/System/Linq/Enumerable.cs#f4105a494115b366
Custom.GetHashCode would be used if you didn't specify comparer at Intersect call at all.
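A quick way to check this yourself is to count the calls. The sketch below gives the question's Custom class hypothetical Id and Name fields (the real ones are elided in the question), compares those fields directly in Equals instead of comparing hash codes, and counts how often each GetHashCode runs during Intersect. IntersectDemo and the counters are invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's Custom class.
public class Custom
{
    public static int ObjectHashCalls; // counts calls to Custom.GetHashCode()

    public Custom(int id, string name) { Id = id; Name = name; }
    public int Id { get; }
    public string Name { get; }

    public override int GetHashCode()
    {
        ObjectHashCalls++;
        return Id;
    }

    public override bool Equals(object obj) =>
        obj is Custom other && Id == other.Id && Name == other.Name;
}

// Equals compares the fields themselves, as the first answer recommends.
public class MyComparer : IEqualityComparer<Custom>
{
    public int HashCalls; // counts calls to MyComparer.GetHashCode()

    public bool Equals(Custom a, Custom b)
    {
        if (ReferenceEquals(a, b)) return true;
        if (a is null || b is null) return false;
        return a.Id == b.Id && a.Name == b.Name;
    }

    public int GetHashCode(Custom c)
    {
        HashCalls++;
        return c is null ? 0 : HashCode.Combine(c.Id, c.Name);
    }
}

public static class IntersectDemo
{
    public static (List<Custom> Result, int ObjectHashCalls, int ComparerHashCalls) Run()
    {
        var first = new List<Custom> { new Custom(1, "a"), new Custom(2, "b") };
        var second = new List<Custom> { new Custom(2, "b"), new Custom(3, "c") };
        var comparer = new MyComparer();
        List<Custom> result = first.Intersect(second, comparer).ToList();
        return (result, Custom.ObjectHashCalls, comparer.HashCalls);
    }
}
```

Running it should show the comparer's counter increasing while Custom's own counter stays at zero.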
Generally, hash codes and GetHashCode implementations give a fast first check for comparisons, but beware of collisions. Because a hash code has a limited range (32 bits), it is quite common for two different values to produce the same hash code, which breaks any comparison that relies on the hash code alone.

Can I retrieve the stored value x in a hashset given an object y where x.Equals(y)

[TestFixture]
class HashSetExample
{
[Test]
public void eg()
{
var comparer = new OddEvenBag();
var hs = new HashSet<int>(comparer);
hs.Add(1);
Assert.IsTrue(hs.Contains(3));
Assert.IsFalse(hs.Contains(0));
// THIS LINE HERE
var containedValue = hs.First(x => comparer.Equals(x, 3)); // I want something faster than this
Assert.AreEqual(1, containedValue);
}
public class OddEvenBag : IEqualityComparer<int>
{
public bool Equals(int x, int y)
{
return x % 2 == y % 2;
}
public int GetHashCode(int obj)
{
return obj % 2;
}
}
}
As well as checking whether hs contains an odd number, I want to know which odd number it contains. Obviously I want a method that scales reasonably and does not simply iterate and search over the entire collection.
Another way to rephrase the question is, I want to replace the line below THIS LINE HERE with something efficient (say O(1), instead of O(n)).
Towards what end? I'm trying to intern a laaaaaaaarge number of immutable reference objects similar in size to a Point3D. Seems like using a HashSet<Foo> instead of a Dictionary<Foo,Foo> saves about 10% in memory. No, obviously this isn't a game changer but I figured it would not hurt to try it for a quick win. Apologies if this has offended anybody.
Edit: Link to similar/identical post provided by Balazs Tihanyi in comments, put here for emphasis.
The simple answer is no, you can't.
If you want to retrieve the stored object you will need to use a Dictionary<Foo, Foo> instead. There just isn't any suitable method in the HashSet API to do what you are asking for otherwise.
One optimization you could make, if you must use a set for this, is to first do a Contains check and only iterate over the set if Contains returns true. Still, you would almost certainly find that the extra overhead of a Dictionary is tiny (since essentially it's just another object reference per entry).
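A minimal sketch of the Dictionary<Foo, Foo> approach, assuming the question's OddEvenBag comparer (copied here so the block compiles on its own); Interner is a hypothetical name. Each lookup is a single hash probe rather than a scan of the whole collection. As an aside, newer runtimes (.NET Framework 4.7.2 and later, .NET Core 2.0 and later) add HashSet<T>.TryGetValue(T equalValue, out T actualValue), which returns the stored element directly; where that is available it makes the extra dictionary unnecessary.

```csharp
using System.Collections.Generic;

// Copied from the question: ints are equal when they have the same parity.
public class OddEvenBag : IEqualityComparer<int>
{
    public bool Equals(int x, int y) => x % 2 == y % 2;
    public int GetHashCode(int obj) => obj % 2;
}

// Hypothetical interning helper: each key maps to the first instance
// stored for its equivalence class.
public class Interner<T>
{
    private readonly Dictionary<T, T> stored;

    public Interner(IEqualityComparer<T> comparer) =>
        stored = new Dictionary<T, T>(comparer);

    // Returns the previously stored equal value, or stores and returns value.
    public T Intern(T value)
    {
        if (stored.TryGetValue(value, out T existing))
        {
            return existing;
        }
        stored.Add(value, value);
        return value;
    }

    // O(1) replacement for hs.First(x => comparer.Equals(x, probe)).
    public bool TryGetStored(T probe, out T existing) =>
        stored.TryGetValue(probe, out existing);
}
```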

Question about Dictionary<T,T>

I have a class which looks like this:
public class NumericalRange:IEquatable<NumericalRange>
{
public double LowerLimit;
public double UpperLimit;
public NumericalRange(double lower, double upper)
{
LowerLimit = lower;
UpperLimit = upper;
}
public bool DoesLieInRange(double n)
{
if (LowerLimit <= n && n <= UpperLimit)
return true;
else
return false;
}
#region IEquatable<NumericalRange> Members
public bool Equals(NumericalRange other)
{
if (Double.IsNaN(this.LowerLimit)&& Double.IsNaN(other.LowerLimit))
{
if (Double.IsNaN(this.UpperLimit) && Double.IsNaN(other.UpperLimit))
{
return true;
}
}
if (this.LowerLimit == other.LowerLimit && this.UpperLimit == other.UpperLimit)
return true;
return false;
}
#endregion
}
This class holds a numerical range of values. It should also be able to hold a default range, where both LowerLimit and UpperLimit are equal to Double.NaN.
Now this class is used as the key in a Dictionary.
The Dictionary works fine for 'non-NaN' numerical range values, but when the Key is {NaN,NaN} NumericalRange Object, then the dictionary throws a KeyNotFoundException.
What am I doing wrong? Is there any other interface that I have to implement?
Based on your comment, you haven't implemented GetHashCode. I'm amazed that the class works at all in a dictionary, unless you're always requesting the identical key that you put in. I would suggest an implementation of something like:
public override int GetHashCode()
{
int hash = 17;
hash = hash * 23 + UpperLimit.GetHashCode();
hash = hash * 23 + LowerLimit.GetHashCode();
return hash;
}
That assumes Double.GetHashCode() gives a consistent value for NaN. There are many values of NaN of course, and you may want to special case it to make sure they all give the same hash.
You should also override the Equals method inherited from Object:
public override bool Equals(Object other)
{
return other != null &&
other.GetType() == GetType() &&
Equals((NumericalRange) other);
}
Note that the type check can be made more efficient by using the as operator if you seal your class. Otherwise you'll get interesting asymmetries between x.Equals(y) and y.Equals(x) if someone derives another class from yours. Equality becomes tricky with inheritance.
You should also make your fields private, exposing them only as properties. If this is going to be used as a key in a dictionary, I strongly recommend that you make them readonly, too. Changing the contents of a key while it's used in a dictionary is likely to make it "unfindable" later.
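Pulling these suggestions together, a sketch of the class might look like the following: sealed, with read-only properties, overriding both Equals and GetHashCode, and mapping every NaN payload to a single hash code. The helper names HashOf and SameValue are invented here. The sketch also normalizes 0.0 and -0.0 explicitly, because == treats them as equal, so the hash stays consistent with Equals without relying on runtime details of Double.GetHashCode().

```csharp
using System;

public sealed class NumericalRange : IEquatable<NumericalRange>
{
    public NumericalRange(double lower, double upper)
    {
        LowerLimit = lower;
        UpperLimit = upper;
    }

    // Read-only, so a range can't change while it is a dictionary key.
    public double LowerLimit { get; }
    public double UpperLimit { get; }

    public bool DoesLieInRange(double n) => LowerLimit <= n && n <= UpperLimit;

    public bool Equals(NumericalRange other) =>
        other != null &&
        SameValue(LowerLimit, other.LowerLimit) &&
        SameValue(UpperLimit, other.UpperLimit);

    // The class is sealed, so "as" is enough for the type check.
    public override bool Equals(object obj) => Equals(obj as NumericalRange);

    public override int GetHashCode()
    {
        unchecked
        {
            int hash = 17;
            hash = hash * 23 + HashOf(UpperLimit);
            hash = hash * 23 + HashOf(LowerLimit);
            return hash;
        }
    }

    // Unlike ==, treats any NaN as equal to any other NaN.
    private static bool SameValue(double a, double b) =>
        a == b || (double.IsNaN(a) && double.IsNaN(b));

    // One hash for every NaN payload and one for both 0.0 and -0.0,
    // so values that SameValue considers equal always hash alike.
    private static int HashOf(double d) =>
        double.IsNaN(d) ? -1 : d == 0.0 ? 0 : d.GetHashCode();
}
```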
The default implementation of the GetHashCode method uses the reference of the object rather than the values in the object. You have to use the same instance of the object as you used to put the data in the dictionary for that to work.
An implementation of GetHashCode that works simply creates a code from the hash codes of its data members:
public override int GetHashCode() {
return LowerLimit.GetHashCode() ^ UpperLimit.GetHashCode();
}
(This is the same implementation that the Point structure uses.)
Any implementation of the method that always returns the same hash code for the same parameter values works when used in a Dictionary. Just returning the same hash code for all values also works, but then the performance of the Dictionary becomes bad (looking up a key becomes an O(n) operation instead of an O(1) operation). For the best performance, the method should distribute the hash codes evenly across the range.
If your data is strongly biased, the above implementation might not give the best performance. If you for example have a lot of ranges where the lower and upper limits are the same, they will all get the hash code zero. In that case something like this might work better:
public override int GetHashCode() {
return (LowerLimit.GetHashCode() * 251) ^ UpperLimit.GetHashCode();
}
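To see the bias in numbers, the sketch below (RangeHashDemo and its methods are invented names) hashes 1000 ranges whose lower and upper limits are equal, using both formulas from this answer written as static functions. With plain XOR every such range gets hash code 0; the multiplied variant does not collapse them onto a single value.

```csharp
using System.Collections.Generic;

public static class RangeHashDemo
{
    // First formula: XOR of the two hash codes.
    public static int XorHash(double lower, double upper) =>
        lower.GetHashCode() ^ upper.GetHashCode();

    // Second formula: scale the lower limit's hash before XOR-ing.
    // unchecked makes the wrap-around on overflow explicit.
    public static int ScaledHash(double lower, double upper) =>
        unchecked(lower.GetHashCode() * 251) ^ upper.GetHashCode();

    // Counts distinct hash codes for the ranges (i, i), i = 0 .. count-1.
    public static int DistinctForEqualLimits(int count, bool scaled)
    {
        var seen = new HashSet<int>();
        for (int i = 0; i < count; i++)
        {
            double d = i;
            seen.Add(scaled ? ScaledHash(d, d) : XorHash(d, d));
        }
        return seen.Count;
    }
}
```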
You should consider making the class immutable, i.e. make its properties read-only and set them only in the constructor. If you change the properties of an object while it's in a Dictionary, its hash code will change and you will not be able to access the object any more.

Units of measure in C# - almost

Inspired by Units of Measure in F#, and despite asserting (here) that you couldn't do it in C#, I had an idea the other day which I've been playing around with.
namespace UnitsOfMeasure
{
public interface IUnit { }
public static class Length
{
public interface ILength : IUnit { }
public class m : ILength { }
public class mm : ILength { }
public class ft : ILength { }
}
public class Mass
{
public interface IMass : IUnit { }
public class kg : IMass { }
public class g : IMass { }
public class lb : IMass { }
}
public class UnitDouble<T> where T : IUnit
{
public readonly double Value;
public UnitDouble(double value)
{
Value = value;
}
public static UnitDouble<T> operator +(UnitDouble<T> first, UnitDouble<T> second)
{
return new UnitDouble<T>(first.Value + second.Value);
}
//TODO: minus operator/equality
}
}
Example usage:
var a = new UnitDouble<Length.m>(3.1);
var b = new UnitDouble<Length.m>(4.9);
var d = new UnitDouble<Mass.kg>(3.4);
Console.WriteLine((a + b).Value);
//Console.WriteLine((a + d).Value); <-- Compiler says no
The next step is trying to implement conversions (snippet):
public interface IUnit { double toBase { get; } }
public static class Length
{
public interface ILength : IUnit { }
public class m : ILength { public double toBase { get { return 1.0;} } }
public class mm : ILength { public double toBase { get { return 0.001; } } } // 1 mm = 0.001 m, consistent with ft = 0.3048 m
public class ft : ILength { public double toBase { get { return 0.3048; } } }
public static UnitDouble<R> Convert<T, R>(UnitDouble<T> input) where T : ILength, new() where R : ILength, new()
{
double mult = (new T() as IUnit).toBase;
double div = (new R() as IUnit).toBase;
return new UnitDouble<R>(input.Value * mult / div);
}
}
(I would have liked to avoid instantiating objects by using static, but as we all know you can't declare a static method in an interface)
You can then do this:
var e = Length.Convert<Length.mm, Length.m>(c);
//var f = Length.Convert<Length.mm, Mass.kg>(d); <-- but not this
Obviously, there is a gaping hole in this, compared to F# Units of measure (I'll let you work it out).
Oh, the question is: what do you think of this? Is it worth using? Has someone else already done better?
UPDATE for people interested in this subject area, here is a link to a paper from 1997 discussing a different kind of solution (not specifically for C#)
You are missing dimensional analysis. For example (from the answer you linked to), in F# you can do this:
let g = 9.8<m/s^2>
and it will generate a new unit of acceleration, derived from meters and seconds (you can actually do the same thing in C++ using templates).
In C#, it is possible to do dimensional analysis at runtime, but it adds overhead and doesn't give you the benefit of compile-time checking. As far as I know there's no way to do full compile-time units in C#.
Whether it's worth doing depends on the application of course, but for many scientific applications, it's definitely a good idea. I don't know of any existing libraries for .NET, but they probably exist.
If you are interested in how to do it at runtime, the idea is that each value has a scalar value and integers representing the power of each basic unit.
using System;
class Unit
{
readonly double scalar;
readonly int kg;
readonly int m;
readonly int s;
// ... one exponent field per additional base unit
public Unit(double scalar, int kg, int m, int s)
{
this.scalar = scalar;
this.kg = kg;
this.m = m;
this.s = s;
}
// For addition/subtraction, exponents must match
public static Unit operator +(Unit first, Unit second)
{
if (UnitsAreCompatible(first, second))
{
return new Unit(
first.scalar + second.scalar,
first.kg,
first.m,
first.s
);
}
else
{
throw new ArgumentException("Units must match for addition");
}
}
// For multiplication, add the exponents
public static Unit operator *(Unit first, Unit second)
{
return new Unit(
first.scalar * second.scalar,
first.kg + second.kg,
first.m + second.m,
first.s + second.s
);
}
// For division, subtract the exponents
public static Unit operator /(Unit first, Unit second)
{
return new Unit(
first.scalar / second.scalar,
first.kg - second.kg,
first.m - second.m,
first.s - second.s
);
}
public static bool UnitsAreCompatible(Unit first, Unit second)
{
return
first.kg == second.kg &&
first.m == second.m &&
first.s == second.s;
}
}
If you don't allow the user to change the value of the units (a good idea anyways), you could add subclasses for common units:
class Speed : Unit
{
public Speed(double x) : base(x, 0, 1, -1) // m/s => m^1 * s^-1
{
}
}
class Acceleration : Unit
{
public Acceleration(double x) : base(x, 0, 1, -2) // m/s^2 => m^1 * s^-2
{
}
}
You could also define more specific operators on the derived types to avoid checking for compatible units on common types.
Using separate classes for different units of the same measure (e.g., cm, mm, and ft for Length) seems kind of weird. Based on the .NET Framework's DateTime and TimeSpan classes, I would expect something like this:
Length length = Length.FromMillimeters(n1);
decimal lengthInFeet = length.Feet;
Length length2 = length.AddFeet(n2);
Length length3 = length + Length.FromMeters(n3);
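One way to read that suggestion is a single struct per quantity with one internal representation, much like TimeSpan stores ticks. The sketch below stores meters as decimal to match the decimal in the snippet above; FromFeet, Millimeters and operator - are additions made up here, next to the names taken from the snippet.

```csharp
// A single struct for the quantity "length", modeled loosely on TimeSpan:
// one internal representation (meters) plus unit-specific accessors.
public readonly struct Length
{
    private const decimal MetersPerFoot = 0.3048m; // exact by definition

    private Length(decimal meters) => Meters = meters;

    public decimal Meters { get; }
    public decimal Millimeters => Meters * 1000m;
    public decimal Feet => Meters / MetersPerFoot;

    public static Length FromMeters(decimal meters) => new Length(meters);
    public static Length FromMillimeters(decimal millimeters) => new Length(millimeters / 1000m);
    public static Length FromFeet(decimal feet) => new Length(feet * MetersPerFoot);

    public Length AddFeet(decimal feet) => this + FromFeet(feet);

    public static Length operator +(Length a, Length b) => new Length(a.Meters + b.Meters);
    public static Length operator -(Length a, Length b) => new Length(a.Meters - b.Meters);
}
```

Since the unit lives in the method names rather than in the type, lengths built from different units can be mixed freely, just as TimeSpan.FromHours and TimeSpan.FromMinutes can.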
You could add extension methods on numeric types to generate measures. It'd feel a bit DSL-like:
var mass = 1.Kilogram();
var length = (1.2).Kilometres();
It's not really .NET convention and might not be the most discoverable feature, so perhaps you'd add them in a devoted namespace for people who like them, as well as offering more conventional construction methods.
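A minimal sketch of that DSL-like style. MassValue, DistanceValue and MeasureExtensions are hypothetical names, and the quantity types carry only enough to show the call syntax.

```csharp
// Minimal hypothetical quantity types.
public readonly struct MassValue
{
    public MassValue(double kilograms) => Kilograms = kilograms;
    public double Kilograms { get; }
}

public readonly struct DistanceValue
{
    public DistanceValue(double metres) => Metres = metres;
    public double Metres { get; }
}

// Extension methods on the numeric types. In real code this class would
// live in its own namespace, so only callers that opt in with a using see it.
public static class MeasureExtensions
{
    public static MassValue Kilogram(this int value) => new MassValue(value);
    public static DistanceValue Kilometres(this double value) => new DistanceValue(value * 1000.0);
}
```

With that in scope, var mass = 1.Kilogram(); and var length = (1.2).Kilometres(); compile as written above.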
I recently released Units.NET on GitHub and on NuGet.
It gives you all the common units and conversions. It is light-weight, unit tested and supports PCL.
Example conversions:
Length meter = Length.FromMeters(1);
double cm = meter.Centimeters; // 100
double yards = meter.Yards; // 1.09361
double feet = meter.Feet; // 3.28084
double inches = meter.Inches; // 39.3701
Now such a C# library exists:
http://www.codeproject.com/Articles/413750/Units-of-Measure-Validator-for-Csharp
It has almost the same features as F#'s unit compile time validation, but for C#.
The core is an MSBuild task, which parses the code looking for validations.
The unit information is stored in comments and attributes.
Here's my concern with creating units in C#/VB. Please correct me if you think I'm wrong. Most implementations I've read about seem to involve creating a structure that pieces together a value (int or double) with a unit. Then you try to define basic functions (+-*/,etc) for these structures that take into account unit conversions and consistency.
I find the idea very attractive, but every time I balk at what a huge step for a project this appears to be. It looks like an all-or-nothing deal. You probably wouldn't just change a few numbers into units; the whole point is that all data inside a project is appropriately labeled with a unit to avoid any ambiguity. This means saying goodbye to using ordinary doubles and ints, every variable is now defined as a "Unit" or "Length" or "Meters", etc. Do people really do this on a large scale? So even if you have a large array, every element should be marked with a unit. This will obviously have both size and performance ramifications.
Despite all the cleverness in trying to push the unit logic into the background, some cumbersome notation seems inevitable with C#. F# does some behind-the-scenes magic that better reduces the annoyance factor of the unit logic.
Also, how successfully can we make the compiler treat a unit just like an ordinary double when we so desire, without using CType or ".Value" or any additional notation? For example, with nullables the code knows to treat a double? just like a double (of course, if your double? is null you get an error).
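On that last question, an implicit conversion operator gets part of the way there. In this sketch (MetersValue is a hypothetical type) a length can be passed wherever a double is expected without .Value, while the conversion from double stays explicit.

```csharp
// Hypothetical length type that converts to double implicitly.
public readonly struct MetersValue
{
    public MetersValue(double value) => Value = value;
    public double Value { get; }

    // Lets a MetersValue be used wherever a double is expected, e.g. Math.Sqrt(m).
    public static implicit operator double(MetersValue m) => m.Value;

    // The reverse stays explicit, so a bare number never silently becomes a length.
    public static explicit operator MetersValue(double d) => new MetersValue(d);
}
```

The catch is that the implicit conversion also lets unit-less arithmetic through: with no operator + defined on MetersValue, m + m still compiles by converting both operands to double, and the result is a plain double that has lost its unit.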
Thanks for the idea. I have implemented units in C# in many different ways, and there always seems to be a catch. Now I can try one more time using the ideas discussed above. My goal is to be able to define new units based on existing ones, like
Unit lbf = 4.44822162*N;
Unit fps = feet/sec;
Unit hp = 550*lbf*fps;
and for the program to figure out the proper dimensions, scaling and symbol to use. In the end I need to build a basic algebra system that can convert things like (m/s)*(m*s)=m^2 and try to express the result based on existing units defined.
Another requirement is to be able to serialize the units, so that new units do not need to be coded but can simply be declared in an XML file like this:
<DefinedUnits>
<DirectUnits>
<!-- Base Units -->
<DirectUnit Symbol="kg" Scale="1" Dims="(1,0,0,0,0)" />
<DirectUnit Symbol="m" Scale="1" Dims="(0,1,0,0,0)" />
<DirectUnit Symbol="s" Scale="1" Dims="(0,0,1,0,0)" />
...
<!-- Derived Units -->
<DirectUnit Symbol="N" Scale="1" Dims="(1,1,-2,0,0)" />
<DirectUnit Symbol="R" Scale="1.8" Dims="(0,0,0,0,1)" />
...
</DirectUnits>
<IndirectUnits>
<!-- Composite Units -->
<IndirectUnit Symbol="m/s" Scale="1" Lhs="m" Op="Divide" Rhs="s"/>
<IndirectUnit Symbol="km/h" Scale="1" Lhs="km" Op="Divide" Rhs="hr"/>
...
<IndirectUnit Symbol="hp" Scale="550.0" Lhs="lbf" Op="Multiply" Rhs="fps"/>
</IndirectUnits>
</DefinedUnits>
There is JScience: http://jscience.org/, and here is a Groovy DSL for units: http://groovy.dzone.com/news/domain-specific-language-unit-. IIRC, C# has closures, so you should be able to cobble something together.
Why not use CodeDom to generate all possible permutations of the units automatically? I know it's not the best approach, but it will definitely work!
You could use QuantitySystem instead of implementing it on your own. It builds on F# and drastically improves unit handling in F#. It's the best implementation I have found so far and can be used in C# projects.
http://quantitysystem.codeplex.com
Is it worth using?
Yes. If I have "a number" in front of me, I want to know what it is, any time of the day. Besides, that's what we usually do: we organize data into meaningful entities (classes, structs, you name it). Doubles into coordinates, strings into names and addresses, etc. Why should units be any different?
Has someone else already done better?
Depends on how one defines "better". There are some libraries out there but I haven't tried them so I don't have an opinion. Besides it spoils the fun of trying it myself :)
Now about the implementation. I would like to start with the obvious: it's futile to try to replicate the [<Measure>] system of F# in C#. Why? Because once F# allows you to use / ^ (or anything else for that matter) directly on another type, the game is lost. Good luck doing that in C# on a struct or class. The level of metaprogramming required for such a task does not exist, and in my opinion it is not going to be added any time soon. That's why you lack the dimensional analysis that Matthew Crumley mentioned in his answer.
Let's take the example from fsharpforfunandprofit.com: you have Newtons defined as [<Measure>] type N = kg m/sec^2. Now you have the square function that the author created, which returns an N^2; that sounds "wrong", absurd and useless, unless you want to perform arithmetic operations where at some point during the evaluation you get something "meaningless" until you multiply it with some other unit and get a meaningful result. Or, even worse, you might want to use constants, for example the gas constant R, which is 8.31446261815324 J/(K mol). If you define the appropriate units, then F# is ready to consume the R constant. C# is not. You need to specify another type just for that, and still you won't be able to do every operation you want on that constant.
That doesn't mean that you shouldn't try. I did and I am quite happy with the results. I started SharpConvert around 3 years ago, after I got inspired by this very question. The trigger was this story: once I had to fix a nasty bug in the RADAR simulator that I develop: an aircraft was plunging into the earth instead of following the predefined glide path. That didn't make me happy, as you can guess, and after 2 hours of debugging I realized that somewhere in my calculations I was treating kilometers as nautical miles. Until that point I was like "oh well, I will just be 'careful'", which is naive at best for any non-trivial task.
In your code there are a couple of things I would do differently.
First I would turn UnitDouble<T> and IUnit implementations into structs. A unit is just that, a number and if you want them to be treated like numbers, a struct is a more appropriate approach.
Then I would avoid the new T() in the methods. It does not call the constructor directly: the compiler emits a call to Activator.CreateInstance<T>(), and for number crunching that is bad because it adds overhead. That depends on the application, though; for a simple unit converter it won't harm. In a time-critical context, avoid it like the plague. And don't get me wrong, I used it myself because I didn't know better, and when I ran some simple benchmarks the other day, such a call could double the execution time, at least in my case. More details in Dissecting the new() constraint in C#: a perfect example of a leaky abstraction.
I would also change Convert<T, R>() and make it a member function. I prefer writing
var c = new Unit<Length.mm>(123);
var e = c.Convert<Length.m>();
rather than
var e = Length.Convert<Length.mm, Length.m>(c);
Last but not least I would use specific unit "shells" for each physical quantity (length time etc) instead of the UnitDouble, as it will be easier to add physical quantity specific functions and operator overloads. It will also allow you to create a Speed<TLength, TTime> shell instead of another Unit<T1, T2> or even Unit<T1, T2, T3> class. So it would look like that:
using System;
// Assumed unit definitions (not shown in the original): ToSiFactor is the
// number of meters in one unit, i.e. the toBase of the question's approach
public interface ILength { double ToSiFactor { get; } }
public struct m : ILength { public double ToSiFactor => 1.0; }
public struct mm : ILength { public double ToSiFactor => 0.001; }
public struct Km : ILength { public double ToSiFactor => 1000.0; }
public readonly struct Length<T> where T : struct, ILength
{
private static readonly double SiFactor = new T().ToSiFactor;
public Length(double value)
{
if (value < 0) throw new ArgumentOutOfRangeException(nameof(value), "Length cannot be negative");
Value = value;
}
public double Value { get; }
public static Length<T> operator +(Length<T> first, Length<T> second)
{
return new Length<T>(first.Value + second.Value);
}
public static Length<T> operator -(Length<T> first, Length<T> second)
{
// I don't know any application where negative length makes sense,
// if it does feel free to remove Abs() and the exception in the constructor
return new Length<T>(System.Math.Abs(first.Value - second.Value));
}
// You can add more like
// public static Area<T> operator *(Length<T> x, Length<T> y)
// or, since C# operators take at most two operands,
// public static Volume<T> operator *(Area<T> x, Length<T> y)
// etc
public Length<R> To<R>() where R : struct, ILength
{
//notice how I got rid of the Activator invocations by moving them in a static field;
//double mult = new T().ToSiFactor;
//double div = new R().ToSiFactor;
return new Length<R>(Value * SiFactor / Length<R>.SiFactor);
}
}
Notice also that, in order to save us from the dreaded Activator call, I stored the result of new T().ToSiFactor in SiFactor. It might seem awkward at first, but as Length is generic, Length<mm> will have its own copy, Length<Km> its own, and so on and so forth. Please note that ToSiFactor is the toBase of your approach.
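The per-closed-type static field is easy to check in isolation. In this sketch (all names hypothetical) FactorCache<Millimeter> and FactorCache<Kilometer> each run their static initializer once and keep separate copies of Factor and of the initialization counter.

```csharp
// Hypothetical unit markers, mirroring ToSiFactor from the answer.
public interface IHasFactor { double Factor { get; } }
public struct Millimeter : IHasFactor { public double Factor => 0.001; }
public struct Kilometer : IHasFactor { public double Factor => 1000.0; }

public static class FactorCache<T> where T : struct, IHasFactor
{
    // Declared before Factor and without an initializer, so the
    // increment in Init() is not overwritten afterwards.
    public static int Initializations;

    // Evaluated once per closed type: FactorCache<Millimeter> and
    // FactorCache<Kilometer> each hold their own copy.
    public static readonly double Factor = Init();

    private static double Init()
    {
        Initializations++;
        return new T().Factor;
    }
}
```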
The problem that I see is that as long as you are in the realm of simple units and up to the first derivative of time, things are simple. If you try to do something more complex, then you can see the drawbacks of this approach. Typing
var accel = new Acceleration<m, s, s>(1.2);
will not be as clear and "smooth" as
let accel = 1.2<m/sec^2>
And regardless of the approach, you will have to specify every math operation you will need with hefty operator overloading, while in F# you have this for free, even if the results are not meaningful as I was writing at the beginning.
The last drawback (or advantage, depending on how you see it) of this design is that it can't be unit-agnostic. If there are cases where you need "just a Length", you can't have it. You need to know each time whether your Length is in millimeters, statute miles or feet. I took the opposite approach in SharpConvert: LengthUnit derives from UnitBase, and Meters, Kilometers, etc. derive from LengthUnit. That's also why I couldn't go down the struct path. That way you can have:
LengthUnit l1 = new Meters(12);
LengthUnit l2 = new Feet(15.4);
LengthUnit sum = l1 + l2;
sum will be in meters, but one shouldn't care as long as they only want to use it in the next operation. If they want to display it, they can call sum.To<Kilometers>() or whatever unit they need. To be honest, I don't know whether not "locking" the variable to a specific unit has any advantages. It might be worth investigating at some point.
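A rough sketch of that unit-agnostic design, for comparison. This is not SharpConvert's actual code: the class names follow the answer, but MetersPerUnit and InMeters are invented for illustration, and the To<...>() conversion is left out.

```csharp
// Hypothetical sketch of a unit-agnostic hierarchy; not SharpConvert's real API.
public abstract class LengthUnit
{
    protected LengthUnit(double value) => Value = value;

    // The magnitude expressed in this object's own unit.
    public double Value { get; }

    // How many meters one unit of the concrete type represents.
    protected abstract double MetersPerUnit { get; }

    public double InMeters => Value * MetersPerUnit;

    // Mixed-unit sums are normalized to meters; callers who only feed the
    // result into the next calculation never need to know that.
    public static LengthUnit operator +(LengthUnit a, LengthUnit b) =>
        new Meters(a.InMeters + b.InMeters);
}

public sealed class Meters : LengthUnit
{
    public Meters(double value) : base(value) { }
    protected override double MetersPerUnit => 1.0;
}

public sealed class Feet : LengthUnit
{
    public Feet(double value) : base(value) { }
    protected override double MetersPerUnit => 0.3048;
}
```

Using classes here is what makes the shared base type possible, at the cost of a heap allocation per value, which matches the answer's remark about not going down the struct path.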
I would like the compiler to help me as much as possible. So maybe you could have a TypedInt<T> where T carries the actual unit.
public struct TypedInt<T>
{
public int Value { get; }
public TypedInt(int value) => Value = value;
public static TypedInt<T> operator -(TypedInt<T> a, TypedInt<T> b) => new TypedInt<T>(a.Value - b.Value);
public static TypedInt<T> operator +(TypedInt<T> a, TypedInt<T> b) => new TypedInt<T>(a.Value + b.Value);
public static TypedInt<T> operator *(int a, TypedInt<T> b) => new TypedInt<T>(a * b.Value);
public static TypedInt<T> operator *(TypedInt<T> a, int b) => new TypedInt<T>(a.Value * b);
public static TypedInt<T> operator /(TypedInt<T> a, int b) => new TypedInt<T>(a.Value / b);
// todo: m² or m/s
// todo: more than just ints
// todo: other operations
public override string ToString() => $"{Value} {typeof(T).Name}";
}
You could have an extension method to set the type (or just use new):
public static class TypedInt
{
public static TypedInt<T> Of<T>(this int value) => new TypedInt<T>(value);
}
The actual units can be anything. That way, the system is extensible.
(There's multiple ways of handling conversions. What do you think is best?)
public class Mile
{
// todo: conversion from mile to/from meter
// maybe define an interface like ITypedConvertible<Meter>
// conversion probably needs reflection, but there may be
// a faster way
}
public class Second
{
}
This way, you can use:
var distance1 = 10.Of<Mile>();
var distance2 = 15.Of<Mile>();
var timespan1 = 4.Of<Second>();
Console.WriteLine(distance1 + distance2);
//Console.WriteLine(distance1 + 5); // this will be blocked by the compiler
//Console.WriteLine(distance1 + timespan1); // this will be blocked by the compiler
Console.WriteLine(3 * distance1);
Console.WriteLine(distance1 / 3);
//Console.WriteLine(distance1 / timespan1); // todo!
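For the distance1 / timespan1 todo, one possible approach (an assumption, not something the answer specifies) is a marker type for the quotient unit plus a generic helper method. C# operators cannot declare their own type parameters, so a generic operator / taking two differently-unitized operands is not expressible; a generic method is used instead. Per<TNum, TDen> and TypedIntMath are hypothetical names, and the block repeats a trimmed TypedInt<T> so it compiles on its own.

```csharp
// Trimmed copy of the answer's TypedInt<T>.
public readonly struct TypedInt<T>
{
    public TypedInt(int value) => Value = value;
    public int Value { get; }
}

// Hypothetical marker type meaning "TNum per TDen", e.g. Per<Mile, Second>.
public sealed class Per<TNum, TDen> { }

public class Mile { }
public class Second { }

public static class TypedIntMath
{
    // The result type records both units. Like the answer's TypedInt,
    // this uses integer division, so 10 / 4 yields 2.
    public static TypedInt<Per<TNum, TDen>> Divide<TNum, TDen>(TypedInt<TNum> a, TypedInt<TDen> b) =>
        new TypedInt<Per<TNum, TDen>>(a.Value / b.Value);
}
```

Because the result is typed as TypedInt<Per<Mile, Second>>, adding it to a TypedInt<Mile> is still rejected by the compiler.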
See Boo Ometa (which will be available for Boo 1.0):
Boo Ometa and Extensible Parsing
I really liked reading through this stack overflow question and its answers.
I have a pet project that I've tinkered with over the years, and have recently started re-writing it and have released it to the open source at https://github.com/MafuJosh/NGenericDimensions
It happens to be somewhat similar to many of the ideas expressed in the question and answers of this page.
It basically is about creating generic dimensions, with the unit of measure and the native datatype as the generic type placeholders.
For example:
Dim myLength1 as New Length(of Miles, Int16)(123)
With also some optional use of Extension Methods like:
Dim myLength2 = 123.miles
And
Dim myLength3 = myLength1 + myLength2
Dim myArea1 = myLength1 * myLength2
This would not compile:
Dim myValue = 123.miles + 234.kilograms
New units can be extended in your own libraries.
These datatypes are structures that contain only 1 internal member variable, making them lightweight.
Basically, the operator overloads are restricted to the "dimension" structures, so that every unit of measure doesn't need operator overloads.
Of course, a big downside is the longer declaration of the generics syntax that requires 3 datatypes. So if that is a problem for you, then this isn't your library.
The main purpose was to be able to decorate an interface with units in a compile-time checking fashion.
There is a lot that needs to be done to the library, but I wanted to post it in case it was the kind of thing someone was looking for.
