Efficient implementation of flyweight pattern - C#

Background
One of the most used data structures in our application is a custom Point struct. Recently we have been running into memory issues, mostly caused by an excessive number of instances of this struct.
Many of these instances contain the same data. Sharing a single instance would significantly reduce memory usage. However, since we are using structs, instances cannot be shared. It is also not possible to change it to a class, because the struct semantics are important.
Our workaround is to have a struct containing a single reference to a backing class, which holds the actual data. These flyweight data classes are stored in and retrieved from a factory to ensure no duplicates exist.
A narrowed-down version of the code looks something like this:
public struct PointD
{
    //Backing flyweight reference
    private PointData _data;

    //Factory
    private static class PointDatabase
    {
        private static readonly Dictionary<PointData, PointData> _data = new Dictionary<PointData, PointData>();

        public static PointData Get(double x, double y)
        {
            var key = new PointData(x, y);
            if (!_data.ContainsKey(key))
                _data.Add(key, key);
            return _data[key];
        }
    }

    //Flyweight data
    private class PointData
    {
        private double pX;
        private double pY;

        public PointData(double x, double y)
        {
            pX = x;
            pY = y;
        }

        public double X
        {
            get { return pX; }
        }

        public double Y
        {
            get { return pY; }
        }

        public override bool Equals(object obj)
        {
            var other = obj as PointData;
            if (other == null)
                return false;
            return other.X == this.X && other.Y == this.Y;
        }

        public override int GetHashCode()
        {
            return X.GetHashCode() * Y.GetHashCode();
        }
    }

    //Public struct
    public PointD(double x, double y)
    {
        _data = PointDatabase.Get(x, y);
    }

    public double X
    {
        get { return _data == null ? 0 : _data.X; }
        set { _data = PointDatabase.Get(value, Y); }
    }

    public double Y
    {
        get { return _data == null ? 0 : _data.Y; }
        set { _data = PointDatabase.Get(X, value); }
    }
}
This implementation maintains the struct semantics while ensuring that only one instance of the same data is kept around.
(Please don't mention memory leaks or such, this is simplified example code)
The Problem
Although the above approach works to lower our memory usage, the performance is horrendous. A project in our application can easily contain a million different points or more. As a result, the lookup of a PointData instance is very costly. And this lookup has to be done whenever a Point is manipulated, which, as you can probably guess, is what our application is all about. As a result, this approach is not suitable for us.
As an alternative, we could make two versions of the Point class: one with backing flyweight as above, and one containing its own data (with possible duplicates). All (short-lived) calculations could be done in the second class, while when storing the Point for longer durations they could be converted to the first, memory-efficient class. However, this means that all the users of the Point class have to be inspected and adjusted to this scheme, something which is not feasible for us.
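For illustration, a minimal sketch of that two-variant scheme (the WorkingPoint name and the ToShared helper are hypothetical):

public struct WorkingPoint
{
    //Plain struct that owns its data: cheap to create and mutate in tight loops
    public double X;
    public double Y;

    public WorkingPoint(double x, double y)
    {
        X = x;
        Y = y;
    }

    //Deduplicate through the flyweight factory only when storing long-term
    public PointD ToShared()
    {
        return new PointD(X, Y);
    }
}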
What we are looking for is an approach that meets the criteria below:
When there are multiple Points with the same data, the memory usage should be lower than having a different struct instance for each of these.
Performance should not be much worse than working directly on primitive data in the struct.
Struct semantics should be maintained.
The 'Point' interface should remain the same (i.e. classes that use 'Point' should not have to be changed).
Is there any way we can improve our approach towards these criteria? Or can anyone suggest a different approach we can attempt?

Rather than re-work an entire data structure and programming model, my go-to solution for performance and memory issues is to cache, pre-fetch and, most importantly, cull your data when it is not needed.
Think of it this way. On a graph, you cannot display a few million points at once because you run out of pixels (you should occlusion-cull these points). Similarly, in a table, there isn't enough vertical space on screen (you need data set truncation). Consider streaming data from your source file as you need it. If your source data structure is not appropriate for dynamic retrieval, consider an intermediate, temporary file format. This is one of the ways .NET's JITer works so quickly!
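For example, a minimal sketch of the culling idea (the Rect type and its Contains method are placeholders for whatever viewport abstraction the application has):

IEnumerable<PointD> VisiblePoints(IEnumerable<PointD> allPoints, Rect viewport)
{
    //Only materialize the points that can actually appear on screen
    foreach (var p in allPoints)
        if (viewport.Contains(p.X, p.Y))
            yield return p;
}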

Related

Implementing IEnumerator<T> for Fixed Arrays

I need to implement a mutable polygon that behaves like a struct, in that it is copied by value and changes to the copy have no side effects on the original.
Consider my attempt at writing a struct for such a type:
public unsafe struct Polygon : IEnumerable<System.Drawing.PointF>
{
    private int points;
    private fixed float xPoints[64];
    private fixed float yPoints[64];

    public PointF this[int i]
    {
        get => new PointF(xPoints[i], yPoints[i]);
        set
        {
            xPoints[i] = value.X;
            yPoints[i] = value.Y;
        }
    }

    public IEnumerator<PointF> GetEnumerator()
    {
        return new PolygonEnumerator(ref this);
    }

    // Needed for IEnumerable<T> to compile
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
I have a requirement that a Polygon must be copied by value so it is a struct.
(Rationale: Modifying a copy shouldn't have side effects on the original.)
I would also like it to implement IEnumerable<PointF>.
(Rationale: Being able to write for (PointF p in poly))
As far as I am aware, C# does not allow you to override the copy/assignment behaviour for value types. If that is possible then there's the "low hanging fruit" that would answer my question.
My approach to implementing the copy-by-value behaviour of Polygon is to use unsafe and fixed arrays to allow a polygon to store up to 64 points in the struct itself, which prevents the polygon from being indirectly modified through its copies.
I am running into a problem when I go to implement PolygonEnumerator : IEnumerator<PointF> though.
Another requirement (wishful thinking) is that the enumerator will return PointF values that match the Polygon's fixed arrays, even if those points are modified during iteration.
(Rationale: Iterating over arrays works like this, so this polygon should behave in line with the user's expectations.)
public class PolygonEnumerator : IEnumerator<PointF>
{
    private int position = -1;
    private ??? poly;

    public PolygonEnumerator(ref Polygon p)
    {
        // I know I need the ref keyword to ensure that the Polygon
        // passed into the constructor is not a copy
        // However, the class can't have a struct reference as a field
        poly = ???;
    }

    public PointF Current => poly[position];

    // the rest of the IEnumerator implementation seems straightforward to me
}
What can I do to implement the PolygonEnumerator class according to my requirements?
It seems to me that I can't store a reference to the original polygon, so I have to make a copy of its points into the enumerator itself; but that means changes to the original polygon can't be visited by the enumerator!
I am completely OK with an answer that says "It's impossible".
Maybe I've dug a hole for myself here while missing a useful language feature or conventional solution to the original problem.
Your Polygon type should not be a struct, because (64 + 64) * sizeof(float) == 512 bytes. That means every value-copy operation will require a copy of 512 bytes, which is very inefficient (not least because of locality-of-reference, which strongly favours the use of objects that exist in a single location in memory).
I have a requirement that a Polygon must be copied by value so it is a struct.
(Rationale: Modifying a copy shouldn't have side effects on the original.)
Your "requirement" is wrong. Instead define an immutable class with an explicit copy operation - and/or use a mutable "builder" object for efficient construction of large objects.
I would also like it to implement IEnumerable<PointF>.
(Rationale: Being able to write for (PointF p in poly))
That's fine - but you hardly ever need to implement IEnumerator<T> directly yourself because C# can do it for you when using yield return (and the generated CIL is very optimized!).
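For example, a minimal sketch of that approach, written against a hypothetical collection with a Count property and an indexer (note that iterators cannot contain unsafe code, so this would not work directly inside the fixed-buffer struct above):

public IEnumerator<PointF> GetEnumerator()
{
    // The compiler generates the IEnumerator<PointF> state machine for us
    for (int i = 0; i < this.Count; i++)
        yield return this[i];
}

IEnumerator IEnumerable.GetEnumerator() => this.GetEnumerator();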
My approach to implementing the copy-by-value behaviour of Polygon is to use unsafe and fixed arrays to allow a polygon to store up to 64 points in the struct itself, which prevents the polygon from being indirectly modified through its copies.
This is not how C# should be written. unsafe should be avoided wherever possible (because it breaks the CLR's built-in guarantees and safeguards).
Another requirement (wishful thinking) is that the enumerator will return PointF values that match the Polygon's fixed arrays, even if those points are modified during iteration.
(Rationale: Iterating over arrays works like this, so this polygon should behave in line with the user's expectations.)
Who are your users/consumers in this case? If you're so concerned about not breaking users' expectations then you shouldn't use unsafe!
Consider this approach instead:
(Update: I just realised that the class Polygon I defined below is essentially just a trivial wrapper around ImmutableList<T>, so you don't even need class Polygon; just use ImmutableList<Point> instead.)
// requires System.Collections.Immutable
public struct Point
{
    public Point( Single x, Single y )
    {
        this.X = x;
        this.Y = y;
    }

    public Single X { get; }
    public Single Y { get; }

    // TODO: Implement IEquatable<Point>
}

public class Polygon : IEnumerable<Point>
{
    private readonly ImmutableList<Point> points;

    public Point this[int i] => this.points[i];

    public Int32 Count => this.points.Count;

    public Polygon()
    {
        this.points = ImmutableList<Point>.Empty;
    }

    private Polygon( ImmutableList<Point> points )
    {
        this.points = points;
    }

    public IEnumerator<Point> GetEnumerator()
    {
        return this.points.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator() => this.GetEnumerator();

    public Polygon AddPoint( Single x, Single y ) => this.AddPoint( new Point( x, y ) );

    public Polygon AddPoint( Point p )
    {
        ImmutableList<Point> nextList = this.points.Add( p );
        return new Polygon( points: nextList );
    }
}

Allocate array in the same object as class

I have an array inside a class:
class MatchNode
{
    public short X;
    public short Y;
    public NodeVal[] ControlPoints;

    private MatchNode()
    {
        ControlPoints = new NodeVal[4];
    }
}
The NodeVal is:
struct NodeVal
{
    public readonly short X;
    public readonly short Y;

    public NodeVal(short x, short y)
    {
        X = x;
        Y = y;
    }
}
Now what if we wanted to take performance to the next level and avoid having a separate object for the array? Actually, it doesn't have to be an array. The only restriction is that the client code should be able to access NodeVal by index, like:
matchNode.ControlPoints[i]
OR
matchNode[i]
and of course the solution should be faster or as fast as array access since it's supposed to be an optimization.
EDIT: As Ryan suggested it seems I should explain more about the motivation:
The MatchNode class is used heavily in the project. Millions of them are created, and each is accessed hundreds of times, so having them as compact and concise as possible can lead to fewer cache misses and better overall performance.
Let's consider a 64-bit machine. In the current implementation, the ControlPoints reference takes 8 bytes in the class, and the array object itself costs at least 16 bytes of object overhead (for the method table and sync block) plus 16 bytes for the actual data. So we have at least 24 bytes of overhead beside 16 bytes of actual data.
These objects are used in bottlenecks of the project so it matters if we could optimize them more.
Of course, we could just have one super big array of NodeVal and save an index in each MatchNode that would locate the actual data, but again that would change every piece of client code that uses MatchNode, not to mention being a dirty, non-object-oriented solution.
It is okay to have a messy MatchNode that uses every kind of nasty trick like unsafe or static cache code. It is not okay to leak these optimizations out to the client code.
You're looking for indexers:
class MatchNode
{
    public short X;
    public short Y;
    private NodeVal[] myField;

    public NodeVal this[int i] { get { return myField[i]; } set { myField[i] = value; } }

    public MatchNode(int size) { this.myField = new NodeVal[size]; }
}
Now you can simply use this:
var m = new MatchNode(10);
m[0] = new NodeVal();
However, I doubt this will affect performance (at least in terms of speed) in any way, and you should identify the actual problems using a profiling tool (dotTrace, for instance). Furthermore, this approach still creates a private backing field, which will produce the same memory footprint.
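That said, if the goal is to actually eliminate the separate array object, one of the "nasty tricks" the question explicitly allows is to inline the four elements as plain fields behind the same indexer. A sketch, assuming the count is always exactly four:

class MatchNode
{
    public short X;
    public short Y;
    // The four control points inlined as fields: no array object, no extra header
    private NodeVal c0, c1, c2, c3;

    public NodeVal this[int i]
    {
        get
        {
            switch (i)
            {
                case 0: return c0;
                case 1: return c1;
                case 2: return c2;
                case 3: return c3;
                default: throw new ArgumentOutOfRangeException(nameof(i));
            }
        }
        set
        {
            switch (i)
            {
                case 0: c0 = value; break;
                case 1: c1 = value; break;
                case 2: c2 = value; break;
                case 3: c3 = value; break;
                default: throw new ArgumentOutOfRangeException(nameof(i));
            }
        }
    }
}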

Should I use Int32[,] or System.Drawing.Point when all I want is the x,y coordinates?

I am building an app that lets me control my Android devices from my PC. It's running great, so now I want to start cleaning up my code for release. I'm trying to clean up solution references that I don't need, so I took a look at the using System.Drawing; that I have for implementing the Point class. The thing is, I don't really need it if I switch to using a two-dimensional Int32 array.
So I could have: new Int32[,] {{200, 300}}; instead of new Point(200, 300); and get rid of the System.Drawing namespace altogether. The question is: does it really matter? Am I realistically introducing bloat in my app by keeping the System.Drawing namespace? Is Int32[,] meaningfully more lightweight?
Or, should I not use either and just keep track of the x,y coordinates in individual Int32 variables?
EDIT: I got rid of the original idea I wrote, Int32[200, 300], and replaced it with new Int32[,] {{200, 300}}; because, as @Martin Mulder pointed out, Int32[200, 300] "creates a two-dimensional array with 60000 integers, all of them are 0."
EDIT2: So I'm dumb. First of all, I was trying to fancify too much by using the multi-dimensional array. Utter, overboard silliness. Secondly, I took the advice to use a struct and it all worked flawlessly, so thank you to the first four answers; every one of them was correct. But, after all that, I couldn't end up removing the System.Drawing reference because I was working on a WinForms app and System.Drawing is used all over in the designer of the app! I suppose I could refactor it further, but I got the size down to 13KB so it's good enough. Thank you all!
Just create your own:
public struct Point : IEquatable<Point>
{
    private int _x;
    private int _y;

    public int X
    {
        get { return _x; }
        set { _x = value; }
    }

    public int Y
    {
        get { return _y; }
        set { _y = value; }
    }

    public Point(int x, int y)
    {
        _x = x;
        _y = y;
    }

    public bool Equals(Point other)
    {
        return X == other.X && Y == other.Y;
    }

    public override bool Equals(object other)
    {
        return other is Point && Equals((Point)other);
    }

    public override int GetHashCode()
    {
        return unchecked(X * 1021 + Y);
    }
}
Better yet, make it immutable (make the fields readonly and remove the setters), though if you'd depended on the mutability of the two options you consider in your question then that'll require more of a change to how you do things. But really, immutability is the way to go here.
What you are suggesting is very ill-advised:
new Point(200, 300) creates a new point with two integers: The X and Y property with values 200 and 300.
new Int32[200,300] creates a two-dimensional array with 60000 integers, all of them are 0.
(After your edit) new Int32[,] {{200, 300}} also creates a two-dimensional array, this time with 2 integers. To retrieve the first value (200), you can access it like this: array[0,0] and the second value (300) like array[0,1]. The second dimension is not required or needed or desired.
If you want to get rid of the reference to the library there are a few other suggestions:
new Int32[] {200, 300} creates an one-dimensional array of two integers with values 200 and 300. You can access them with array[0] and array[1].
As Ron Beyer suggested, you could use Tuple<int, int>.
Create your own Point struct (pointed out by Jon Hanna). It makes your application a bit larger, but you avoid the reference and prevent the System.Drawing library from being loaded into memory.
If I wanted to remove that reference, I would go for the last option, since it is clearer about what I am doing (a Point is more readable than an Int32 array or a Tuple). Solutions 2 and 3 are slightly faster than solution 1.
Nothing gets "embedded" in your application by just referencing a library. However, if the Point class really is all you need, you could just remove the reference and implement your own Point struct. That may be more intuitive to read than an int array.
Int32[,] is something different by the way. It's a two-dimensional array, not a pair of two int values. You'll be making things worse by using that.
You could use Tuple<int, int>, but I'd go for creating your own structure.
Some people have already suggested implementations here. If you just want to wrap your two integers, I'd use this:
public class MyPoint
{
    public int X;
    public int Y;
}
Add all other features only if needed.
As @Glorin Oakenfoot said, you should implement your own Point class. Here's an example:
public class MyPoint // Give it a unique name to avoid collisions
{
    public int X { get; set; }
    public int Y { get; set; }

    public MyPoint() {} // Default constructor allows you to use object initialization.
    public MyPoint(int x, int y) { X = x; Y = y; }
}
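With the default constructor in place, object initialization works as the comment notes:

var p = new MyPoint { X = 200, Y = 300 };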

Design pattern for subset view of larger dataset

I have a data structure consisting of thousands of medium-sized (hundreds of bytes) objects, which each represent a subset of a larger dataset. This isn't optimal for several reasons (complexity when analyzing large scopes, strain on the garbage collector, etc.).
Conceptually, you can imagine the objects representing for example meteorological data for a day, when the dataset as a whole is the data for a year (say). Trivial example:
class YearData
{
    private readonly DayData[] days = new DayData[365];

    public DayData GetDayData(int dayNumber)
    {
        return days[dayNumber];
    }
}

class DayData
{
    private readonly double[] temperatures = new double[24];

    public double GetTemperature(int hour)
    {
        return temperatures[hour];
    }

    public void SetTemperature(int hour, double temperature)
    {
        temperatures[hour] = temperature;
    }
}
In a refactoring effort I have tried to move the data to a single object representing the whole dataset, but to keep the rest of the code unchanged (and simple), I need the objects representing the subset/segment of data. Example:
class YearData
{
    private readonly double[] temperatures = new double[365*24];

    public DayData GetDayData(int day)
    {
        return new DayData(this, day);
    }

    internal double GetTemperature(int day, int hour)
    {
        return temperatures[day*24 + hour];
    }

    internal void SetTemperature(int day, int hour, double temperature)
    {
        temperatures[day*24 + hour] = temperature;
    }
}

class DayData // or struct?
{
    private readonly YearData yearData;
    private readonly int dayNumber;

    public DayData(YearData yearData, int dayNumber)
    {
        this.yearData = yearData;
        this.dayNumber = dayNumber;
    }

    public double GetTemperature(int hour)
    {
        return yearData.GetTemperature(dayNumber, hour);
    }

    public void SetTemperature(int hour, double temperature)
    {
        yearData.SetTemperature(dayNumber, hour, temperature);
    }
}
This way I can have a single huge and long lived object, and I can keep many small short lived objects for the analysis of the data. GC is happier and doing analysis directly on the whole dataset is now less complicated.
My questions are, first: does this pattern have a name? It seems like it should be a pretty common pattern.
Second (.NET specific): The segment object is very lightweight and immutable. Does that make it a good candidate for being a struct? Does it matter that one of the struct fields is a reference? Is it bad form to use a struct for a type that appears mutable but which in fact isn't?
Very interesting problem and approach!
I am not sure, but I think these patterns may be considered a Flyweight and an Adapter.
A Flyweight, from sourcemaking.com:
Use sharing to support large numbers of fine-grained objects efficiently.
The Motif GUI strategy of replacing heavy-weight widgets with light-weight gadgets.
The temperatures array is stored in YearData, which can be seen as a DayData factory.
And also an Adapter:
Convert the interface of a class into another interface clients expect. Adapter lets classes work together that couldn’t otherwise
because of incompatible interfaces.
Wrap an existing class with a new interface.
Impedance match an old component to a new system
So, by exposing DayData that way, you are providing the interface the client wants.
Also, I am not sure about DayData being a struct; you can check a good explanation about when and how to use structs in this answer.
Flyweight, definitely.
It allows you to pack the large data in an optimal way but still pretend that you have an extra object for each single data.
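As a sketch of the struct option the question raises: a small immutable struct holding a reference back to YearData is perfectly legal, and it removes the per-day allocation entirely (the readonly fields just document the immutability):

struct DayData
{
    private readonly YearData yearData;
    private readonly int dayNumber;

    public DayData(YearData yearData, int dayNumber)
    {
        this.yearData = yearData;
        this.dayNumber = dayNumber;
    }

    // The struct itself never changes; mutation goes through the shared YearData
    public double GetTemperature(int hour)
    {
        return yearData.GetTemperature(dayNumber, hour);
    }

    public void SetTemperature(int hour, double temperature)
    {
        yearData.SetTemperature(dayNumber, hour, temperature);
    }
}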

Units of measure in C# - almost

Inspired by Units of Measure in F#, and despite asserting (here) that you couldn't do it in C#, I had an idea the other day which I've been playing around with.
namespace UnitsOfMeasure
{
    public interface IUnit { }

    public static class Length
    {
        public interface ILength : IUnit { }
        public class m : ILength { }
        public class mm : ILength { }
        public class ft : ILength { }
    }

    public static class Mass
    {
        public interface IMass : IUnit { }
        public class kg : IMass { }
        public class g : IMass { }
        public class lb : IMass { }
    }

    public class UnitDouble<T> where T : IUnit
    {
        public readonly double Value;

        public UnitDouble(double value)
        {
            Value = value;
        }

        public static UnitDouble<T> operator +(UnitDouble<T> first, UnitDouble<T> second)
        {
            return new UnitDouble<T>(first.Value + second.Value);
        }

        //TODO: minus operator/equality
    }
}
Example usage:
var a = new UnitDouble<Length.m>(3.1);
var b = new UnitDouble<Length.m>(4.9);
var c = new UnitDouble<Mass.kg>(3.4);
Console.WriteLine((a + b).Value);
//Console.WriteLine((a + c).Value); <-- Compiler says no
The next step is trying to implement conversions (snippet):
public interface IUnit { double toBase { get; } }

public static class Length
{
    public interface ILength : IUnit { }
    public class m : ILength { public double toBase { get { return 1.0; } } }
    public class mm : ILength { public double toBase { get { return 0.001; } } }
    public class ft : ILength { public double toBase { get { return 0.3048; } } }

    public static UnitDouble<R> Convert<T, R>(UnitDouble<T> input) where T : ILength, new() where R : ILength, new()
    {
        double mult = (new T() as IUnit).toBase;
        double div = (new R() as IUnit).toBase;
        return new UnitDouble<R>(input.Value * mult / div);
    }
}
(I would have liked to avoid instantiating objects by using static, but as we all know you can't declare a static method in an interface)
You can then do this (given var d = new UnitDouble<Length.mm>(123);):
var e = Length.Convert<Length.mm, Length.m>(d);
var f = Length.Convert<Length.mm, Mass.kg>(d); <-- but not this
Obviously, there is a gaping hole in this, compared to F# Units of measure (I'll let you work it out).
Oh, the question is: what do you think of this? Is it worth using? Has someone else already done better?
UPDATE for people interested in this subject area, here is a link to a paper from 1997 discussing a different kind of solution (not specifically for C#)
You are missing dimensional analysis. For example (from the answer you linked to), in F# you can do this:
let g = 9.8<m/s^2>
and it will generate a new unit of acceleration, derived from meters and seconds (you can actually do the same thing in C++ using templates).
In C#, it is possible to do dimensional analysis at runtime, but it adds overhead and doesn't give you the benefit of compile-time checking. As far as I know there's no way to do full compile-time units in C#.
Whether it's worth doing depends on the application of course, but for many scientific applications, it's definitely a good idea. I don't know of any existing libraries for .NET, but they probably exist.
If you are interested in how to do it at runtime, the idea is that each value has a scalar value and integers representing the power of each basic unit.
class Unit
{
    double scalar;
    int kg;
    int m;
    int s;
    // ... for each basic unit

    public Unit(double scalar, int kg, int m, int s)
    {
        this.scalar = scalar;
        this.kg = kg;
        this.m = m;
        this.s = s;
        ...
    }

    // For addition/subtraction, exponents must match
    public static Unit operator +(Unit first, Unit second)
    {
        if (UnitsAreCompatible(first, second))
        {
            return new Unit(
                first.scalar + second.scalar,
                first.kg,
                first.m,
                first.s,
                ...
            );
        }
        else
        {
            throw new Exception("Units must match for addition");
        }
    }

    // For multiplication/division, add/subtract the exponents
    public static Unit operator *(Unit first, Unit second)
    {
        return new Unit(
            first.scalar * second.scalar,
            first.kg + second.kg,
            first.m + second.m,
            first.s + second.s,
            ...
        );
    }

    public static bool UnitsAreCompatible(Unit first, Unit second)
    {
        return
            first.kg == second.kg &&
            first.m == second.m &&
            first.s == second.s
            ...;
    }
}
If you don't allow the user to change the value of the units (a good idea anyway), you could add subclasses for common units:
class Speed : Unit
{
    public Speed(double x) : base(x, 0, 1, -1, ...) // m/s => m^1 * s^-1
    {
    }
}

class Acceleration : Unit
{
    public Acceleration(double x) : base(x, 0, 1, -2, ...) // m/s^2 => m^1 * s^-2
    {
    }
}
You could also define more specific operators on the derived types to avoid checking for compatible units on common types.
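For instance, a sketch of one such operator, assuming the scalar field is made protected so derived types can read it, and dropping the trailing "..." placeholders for brevity (Length and Time are defined here the same way as Speed above):

class Length : Unit
{
    public Length(double x) : base(x, 0, 1, 0) { } // m^1
}

class Time : Unit
{
    public Time(double x) : base(x, 0, 0, 1) { } // s^1
}

class Speed : Unit
{
    public Speed(double x) : base(x, 0, 1, -1) { } // m^1 * s^-1

    // No UnitsAreCompatible check needed: the static types guarantee the dimensions
    public static Speed operator /(Length distance, Time time)
    {
        return new Speed(distance.scalar / time.scalar);
    }
}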
Using separate classes for different units of the same measure (e.g., cm, mm, and ft for Length) seems kind of weird. Based on the .NET Framework's DateTime and TimeSpan classes, I would expect something like this:
Length length = Length.FromMillimeters(n1);
decimal lengthInFeet = length.Feet;
Length length2 = length.AddFeet(n2);
Length length3 = length + Length.FromMeters(n3);
You could add extension methods on numeric types to generate measures. It'd feel a bit DSL-like:
var mass = 1.Kilogram();
var length = (1.2).Kilometres();
It's not really .NET convention and might not be the most discoverable feature, so perhaps you'd add them in a devoted namespace for people who like them, as well as offering more conventional construction methods.
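A sketch of what those extension methods could look like, reusing the UnitDouble<T> type from the question (treating metres as the base unit for Kilometres is an assumption):

public static class MeasureExtensions
{
    public static UnitDouble<Mass.kg> Kilogram(this int value)
    {
        return new UnitDouble<Mass.kg>(value);
    }

    public static UnitDouble<Length.m> Kilometres(this double value)
    {
        return new UnitDouble<Length.m>(value * 1000.0); // store as the base unit, metres
    }
}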
I recently released Units.NET on GitHub and on NuGet.
It gives you all the common units and conversions. It is light-weight, unit tested and supports PCL.
Example conversions:
Length meter = Length.FromMeters(1);
double cm = meter.Centimeters; // 100
double yards = meter.Yards; // 1.09361
double feet = meter.Feet; // 3.28084
double inches = meter.Inches; // 39.3701
Now such a C# library exists:
http://www.codeproject.com/Articles/413750/Units-of-Measure-Validator-for-Csharp
It has almost the same features as F#'s compile-time unit validation, but for C#.
The core is an MSBuild task, which parses the code looking for validations.
The unit information is stored in comments and attributes.
Here's my concern with creating units in C#/VB. Please correct me if you think I'm wrong. Most implementations I've read about seem to involve creating a structure that pieces together a value (int or double) with a unit. Then you try to define basic functions (+-*/,etc) for these structures that take into account unit conversions and consistency.
I find the idea very attractive, but every time I balk at what a huge step for a project this appears to be. It looks like an all-or-nothing deal. You probably wouldn't just change a few numbers into units; the whole point is that all data inside a project is appropriately labeled with a unit to avoid any ambiguity. This means saying goodbye to using ordinary doubles and ints, every variable is now defined as a "Unit" or "Length" or "Meters", etc. Do people really do this on a large scale? So even if you have a large array, every element should be marked with a unit. This will obviously have both size and performance ramifications.
Despite all the cleverness in trying to push the unit logic into the background, some cumbersome notation seems inevitable with C#. F# does some behind-the-scenes magic that better reduces the annoyance factor of the unit logic.
Also, how successfully can we make the compiler treat a unit just like an ordinary double when we so desire, without using CType or ".Value" or any additional notation? As with nullables, the code knows to treat a double? just like a double (of course, if your double? is null then you get an error).
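As a sketch of that idea: one option is to give the question's UnitDouble<T> an implicit conversion to double, so a unit value can be consumed anywhere a plain double is expected; the trade-off is that the unit is silently discarded at that point:

public class UnitDouble<T> where T : IUnit
{
    public readonly double Value;

    public UnitDouble(double value)
    {
        Value = value;
    }

    // Lets the compiler treat a UnitDouble<T> like an ordinary double,
    // at the cost of dropping the unit without warning
    public static implicit operator double(UnitDouble<T> d)
    {
        return d.Value;
    }
}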
Thanks for the idea. I have implemented units in C# many different ways, and there always seems to be a catch. Now I can try one more time using the ideas discussed above. My goal is to be able to define new units based on existing ones, like:
Unit lbf = 4.44822162*N;
Unit fps = feet/sec;
Unit hp = 550*lbf*fps;
and for the program to figure out the proper dimensions, scaling and symbol to use. In the end, I need to build a basic algebra system that can convert things like (m/s)*(m*s)=m^2 and try to express the result based on the existing units defined.
Also, a requirement must be the ability to serialize the units in a way that new units do not need to be coded, but can just be declared in an XML file like this:
<DefinedUnits>
  <DirectUnits>
    <!-- Base Units -->
    <DirectUnit Symbol="kg" Scale="1" Dims="(1,0,0,0,0)" />
    <DirectUnit Symbol="m" Scale="1" Dims="(0,1,0,0,0)" />
    <DirectUnit Symbol="s" Scale="1" Dims="(0,0,1,0,0)" />
    ...
    <!-- Derived Units -->
    <DirectUnit Symbol="N" Scale="1" Dims="(1,1,-2,0,0)" />
    <DirectUnit Symbol="R" Scale="1.8" Dims="(0,0,0,0,1)" />
    ...
  </DirectUnits>
  <IndirectUnits>
    <!-- Composite Units -->
    <IndirectUnit Symbol="m/s" Scale="1" Lhs="m" Op="Divide" Rhs="s"/>
    <IndirectUnit Symbol="km/h" Scale="1" Lhs="km" Op="Divide" Rhs="hr"/>
    ...
    <IndirectUnit Symbol="hp" Scale="550.0" Lhs="lbf" Op="Multiply" Rhs="fps"/>
  </IndirectUnits>
</DefinedUnits>
There is JScience: http://jscience.org/, and here is a Groovy DSL for units: http://groovy.dzone.com/news/domain-specific-language-unit-. IIRC, C# has closures, so you should be able to cobble something up.
Why not use CodeDom to generate all possible permutations of the units automatically? I know it's not the best, but it will definitely work!
You could use QuantitySystem instead of implementing it on your own. It builds on F# and drastically improves unit handling in F#. It's the best implementation I've found so far and can be used in C# projects.
http://quantitysystem.codeplex.com
Is it worth using?
Yes. If I have "a number" in front of me, I want to know what that is. Any time of the day. Besides, that's what we usually do. We organize data into meaningful entities: classes, structs, you name it. Doubles into coordinates, strings into names and addresses, etc. Why should units be any different?
Has someone else already done better?
Depends on how one defines "better". There are some libraries out there, but I haven't tried them, so I don't have an opinion. Besides, it spoils the fun of trying it myself :)
Now about the implementation. I would like to start with the obvious: it's futile to try to replicate the [<Measure>] system of F# in C#. Why? Because once F# allows you to use / ^ (or anything else for that matter) directly on another type, the game is lost. Good luck doing that in C# on a struct or class. The level of metaprogramming required for such a task does not exist, and I'm afraid it is not going to be added any time soon, in my opinion. That's why you lack the dimensional analysis that Matthew Crumley mentioned in his answer.
Let's take the example from fsharpforfunandprofit.com: you have Newtons defined as [<Measure>] type N = kg m/sec^2. Now you have the square function the author created, which will return an N^2, which sounds "wrong", absurd and useless. Unless you want to perform arithmetic operations where, at some point during the evaluation process, you might get something "meaningless" until you multiply it with some other unit and get a meaningful result. Or even worse, you might want to use constants. For example, the gas constant R, which is 8.31446261815324 J/(K mol). If you define the appropriate units, then F# is ready to consume the R constant. C# is not. You need to specify another type just for that, and still you won't be able to do any operation you want on that constant.
That doesn't mean that you shouldn't try. I did, and I am quite happy with the results. I started SharpConvert around 3 years ago, after I got inspired by this very question. The trigger was this story: once I had to fix a nasty bug in the RADAR simulator that I develop: an aircraft was plunging into the earth instead of following the predefined glide path. That didn't make me happy, as you can guess, and after 2 hours of debugging I realized that somewhere in my calculations I was treating kilometers as nautical miles. Until that point I was like "oh well, I will just be 'careful'", which is at least naive for any non-trivial task.
In your code there would be a couple of things I would do different.
First, I would turn the UnitDouble<T> and IUnit implementations into structs. A unit is just that, a number, and if you want them to be treated like numbers, a struct is the more appropriate approach.
Then I would avoid the new T() in the methods. It does not invoke the constructor directly; it uses Activator.CreateInstance<T>(), which for number crunching is bad, as it adds overhead. That depends on the implementation, though: for a simple units-converter application it won't hurt. For time-critical contexts, avoid it like the plague. And don't get me wrong, I used it myself as I didn't know better, and I ran some simple benchmarks the other day: such a call might double the execution time, at least in my case. More details in Dissecting the new() constraint in C#: a perfect example of a leaky abstraction.
I would also change Convert<T, R>() and make it a member function. I prefer writing
var c = new Unit<Length.mm>(123);
var e = c.Convert<Length.m>();
rather than
var e = Length.Convert<Length.mm, Length.m>(c);
Last but not least, I would use specific unit "shells" for each physical quantity (length, time, etc.) instead of UnitDouble, as it will be easier to add physical-quantity-specific functions and operator overloads. It will also allow you to create a Speed<TLength, TTime> shell instead of another Unit<T1, T2> or even Unit<T1, T2, T3> class. So it would look like this:
public readonly struct Length<T> where T : struct, ILength
{
    private static readonly double SiFactor = new T().ToSiFactor;

    public Length(double value)
    {
        if (value < 0) throw new ArgumentException(nameof(value));
        Value = value;
    }

    public double Value { get; }

    public static Length<T> operator +(Length<T> first, Length<T> second)
    {
        return new Length<T>(first.Value + second.Value);
    }

    public static Length<T> operator -(Length<T> first, Length<T> second)
    {
        // I don't know any application where negative length makes sense,
        // if it does feel free to remove Abs() and the exception in the constructor
        return new Length<T>(System.Math.Abs(first.Value - second.Value));
    }

    // You can add more like
    // public static Area<T> operator *(Length<T> x, Length<T> y)
    // or
    // public static Volume<T> operator *(Length<T> x, Length<T> y, Length<T> z)
    // etc

    public Length<R> To<R>() where R : struct, ILength
    {
        // notice how I got rid of the Activator invocations by moving them in a static field;
        // double mult = new T().ToSiFactor;
        // double div = new R().ToSiFactor;
        return new Length<R>(Value * SiFactor / Length<R>.SiFactor);
    }
}
Notice also that, in order to save us from the dreaded Activator call, I stored the result of new T().ToSiFactor in SiFactor. It might seem awkward at first, but as Length is generic, Length<mm> will have its own copy, Length<Km> its own, and so on and so forth. Please note that ToSiFactor is the toBase of your approach.
The problem that I see is that as long as you are in the realm of simple units and up to the first derivative of time, things are simple. If you try to do something more complex, then you can see the drawbacks of this approach. Typing
var accel = new Acceleration<m, s, s>(1.2);
will not be as clear and "smooth" as
let accel = 1.2<m/sec^2>
And regardless of the approach, you will have to specify every math operation you will need with hefty operator overloading, while in F# you have this for free, even if the results are not meaningful as I was writing at the beginning.
The last drawback (or advantage, depending on how you see it) of this design is that it can't be unit agnostic. If there are cases where you need "just a Length", you can't have it. You need to know each time whether your Length is millimetres, statute miles or feet. I took the opposite approach in SharpConvert: LengthUnit derives from UnitBase, and Meters, Kilometers etc. derive from it. That's why I couldn't go down the struct path, by the way. That way you can have:
LengthUnit l1 = new Meters(12);
LengthUnit l2 = new Feet(15.4);
LengthUnit sum = l1 + l2;
sum will be meters, but one shouldn't care as long as they want to use it in the next operation. If they want to display it, then they can call sum.To<Kilometers>() or whatever unit. To be honest, I don't know if not "locking" the variable to a specific unit has any advantages. It might be worth investigating at some point.
I would like the compiler to help me as much as possible. So maybe you could have a TypedInt<T>, where T contains the actual unit.
public struct TypedInt<T>
{
    public int Value { get; }

    public TypedInt(int value) => Value = value;

    public static TypedInt<T> operator -(TypedInt<T> a, TypedInt<T> b) => new TypedInt<T>(a.Value - b.Value);
    public static TypedInt<T> operator +(TypedInt<T> a, TypedInt<T> b) => new TypedInt<T>(a.Value + b.Value);
    public static TypedInt<T> operator *(int a, TypedInt<T> b) => new TypedInt<T>(a * b.Value);
    public static TypedInt<T> operator *(TypedInt<T> a, int b) => new TypedInt<T>(a.Value * b);
    public static TypedInt<T> operator /(TypedInt<T> a, int b) => new TypedInt<T>(a.Value / b);

    // todo: m² or m/s
    // todo: more than just ints
    // todo: other operations

    public override string ToString() => $"{Value} {typeof(T).Name}";
}
You could have an extension method to set the type (or just use new):
public static class TypedInt
{
    public static TypedInt<T> Of<T>(this int value) => new TypedInt<T>(value);
}
The actual units can be anything. That way, the system is extensible.
(There are multiple ways of handling conversions. What do you think is best?)
public class Mile
{
    // todo: conversion from mile to/from meter
    // maybe define an interface like ITypedConvertible<Meter>
    // conversion probably needs reflection, but there may be
    // a faster way
};

public class Second
{
}
This way, you can use:
var distance1 = 10.Of<Mile>();
var distance2 = 15.Of<Mile>();
var timespan1 = 4.Of<Second>();
Console.WriteLine(distance1 + distance2);
//Console.WriteLine(distance1 + 5); // this will be blocked by the compiler
//Console.WriteLine(distance1 + timespan1); // this will be blocked by the compiler
Console.WriteLine(3 * distance1);
Console.WriteLine(distance1 / 3);
//Console.WriteLine(distance1 / timespan1); // todo!
See Boo Ometa (which will be available for Boo 1.0):
Boo Ometa and Extensible Parsing
I really liked reading through this stack overflow question and its answers.
I have a pet project that I've tinkered with over the years, and I have recently started re-writing it and have released it as open source at https://github.com/MafuJosh/NGenericDimensions
It happens to be somewhat similar to many of the ideas expressed in the question and answers of this page.
It basically is about creating generic dimensions, with the unit of measure and the native datatype as the generic type placeholders.
For example:
Dim myLength1 As New Length(Of Miles, Int16)(123)
With also some optional use of Extension Methods like:
Dim myLength2 = 123.miles
And
Dim myLength3 = myLength1 + myLength2
Dim myArea1 = myLength1 * myLength2
This would not compile:
Dim myValue = 123.miles + 234.kilograms
New units can be extended in your own libraries.
These datatypes are structures that contain only 1 internal member variable, making them lightweight.
Basically, the operator overloads are restricted to the "dimension" structures, so that every unit of measure doesn't need operator overloads.
Of course, a big downside is the longer declaration of the generics syntax that requires 3 datatypes. So if that is a problem for you, then this isn't your library.
The main purpose was to be able to decorate an interface with units in a compile-time checking fashion.
There is a lot that needs to be done to the library, but I wanted to post it in case it was the kind of thing someone was looking for.
