I'm learning C# and am trying to implement a versatile RandomVar class along with some methods for computing common statistics as practice. I'd like to be able to form an arbitrary joint-probability random variable from its components by instantiating a new variable of dimension N, where N is passed into the constructor. I'd like to implement a random variable X as two one-dimensional lists of doubles, and the random variable XY not as two lists of length n^2 but as a random variable of type double[][] that can still use all of the same methods (ExpectedValue, Covariance, etc.).
I'm having a lot of trouble implementing this. Other than the first naive approach (which had lots of copy and pasting), I've tried inheriting from a base RandomVar class into a JointRandomVar class -- still a lot of copy-pasting. Now I'm trying to declare the probabilities and outcomes arrays of class RandomVar as generics of type List. This produces a lot of problems, however, because I can't figure out how to write the methods in an adaptable way: the std_Dev method can't iterate the way it needs to in general, so I need some flexible way to define the method so that if the "dimension" of the random variable is 2, std_Dev will do a double loop or flatten the array before iterating.
I'd like some design help from more experienced programmers -- is declaring the probabilities/outcomes arrays as List the best way to pass a parameter like this?
Thank you very much for your assistance.
EDIT: Here is the version of the code that uses plain doubles, since the earlier version seemed to confuse people. I'd like all of these methods to work on arrays of doubles of any dimension, and I'd like to be able to instantiate the class with _values and _probs of any dimension.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace Chapter_3_GUI
{
class RandomVar
{
private double[] _values;
private double[] _probs;
private double _mean;
private double _stddev;
private int _length;
private double _evalue;
public RandomVar(double[] values, double[] probs)
{
_values = values;
_probs = probs;
_mean = meanCalc(_values);
_stddev = stddevCalc(_values, _mean);
_length = _values.Length;
_evalue = expectedVal(_probs, _values, _length);
}
public double[] Values
{
get { return _values; }
set { _values = value; }
}
public double Mean
{
get { return _mean; }
}
public double Stdev
{
get { return _stddev; }
}
public static double meanCalc(double[] var)
{
double mean = var.Sum();
return mean;
}
public static double stddevCalc(double[] var, double mean)
{
double[] varianceArr = new double[var.Length];
for (int i = 0; i < var.Length; i++) // was <=, which runs past the end of the array
varianceArr[i] = (var[i] - mean) * (var[i] - mean);
double variance = varianceArr.Sum();
double stddev = Math.Sqrt(variance);
return stddev;
}
public static double[][] multiplyProbs(RandomVar X, RandomVar Y, double[][] cprobMatrix)
{
// A jagged array must be allocated one row at a time.
double[][] probArr = new double[X._length][];
for (int i = 0; i < X._length; i++)
{
probArr[i] = new double[Y._length];
for (int j = 0; j < Y._length; j++)
{
probArr[i][j] = Y._probs[j] * cprobMatrix[i][j];
}
}
return probArr;
}
public static RandomVar multiplyVars(RandomVar X, RandomVar Y, Func<double, double, double> f)
{
// f combines one value of X with one value of Y, so it takes two arguments.
double[][] productArr = new double[X._length][];
for (int i = 0; i < X._length; i++)
{
productArr[i] = new double[Y._length];
for (int j = 0; j < Y._length; j++) // was "i=j++", which overwrote i
{
productArr[i][j] = f(X._values[i], Y._values[j]);
}
}
// Still open: cprobMatrix is not defined in this scope, and the constructor
// only accepts double[], so the next two lines do not compile as written.
double[][] probArr = multiplyProbs(X, Y, cprobMatrix);
RandomVar product = new RandomVar(productArr, probArr);
return product;
}
public static double expectedVal(double[] _probs, double[] _values, int _length)
{
double[] expectedArr = new double[_length];
for (int i = 0; i < expectedArr.Length; i++) // was <=, which runs past the end of the array
{
expectedArr[i] = _probs[i] * _values[i];
}
double evalue = expectedArr.Sum();
return evalue;
}
public static double covarianceCalc(RandomVar X, RandomVar Y, Func<double, double, double> f)
{
RandomVar VarXY = multiplyVars(X, Y, f);
double correlation = expectedVal(VarXY._probs, VarXY._values, VarXY._length);
double covariance = correlation - (X._mean * Y._mean);
return covariance;
}
}
}
Is the cardinality of each dimension going to be the same? Your comment about "treating it as one long array of size n^k" suggests it is. That is, n is the length of value/probability pairs in each dimension.
The other question I have is, what is the reasoning behind passing the values and probabilities in two different arrays? If it were me, I'd declare a struct that contains the pairs, e.g.:
struct ValueProbPair
{
public readonly double Value;
public readonly double Probability;
public ValueProbPair(double value, double probability)
{
Value = value;
Probability = probability;
}
}
Finally, as far as your specific question goes…well, it's not clear what the specific question is. You seem to have a broad question regarding a flexible way to implement this.
It seems to me that the biggest challenge here (i.e. the roadblock with the least intuitively obvious solution) is in the title of your question:
Passing the dimension of an Array as a Parameter
You can do this, i.e. create an appropriate array object, by using the Array.CreateInstance(Type, int[]) overload. IMHO, it will also work better if (but is not required that) you can consolidate the value/probability pairs into a single struct.
The other big caveat is that you won't get the benefit of compiler optimizations for accessing array elements. You'll have to use e.g. the GetValue() method, which will most likely prevent the compiler from accessing array elements directly (the optimization is theoretically possible, but seems unlikely to me).
So, for example, you could do something like:
Array Combine(ValueProbPair[] newDimension, Array previousDimensions)
{
int[] rankLengths = new int[previousDimensions.Rank + 1];
for (int j = 0; j < previousDimensions.Rank; j++)
{
rankLengths[j] = previousDimensions.GetLength(j);
}
rankLengths[previousDimensions.Rank] = newDimension.Length;
Array result = Array.CreateInstance(typeof(ValueProbPair), rankLengths);
// then fill in your matrix using GetValue and SetValue to
// access individual array elements...
// Finally, return the new multi-dimensional array:
return result;
}
The various array method overloads that access elements use params array parameters, so you can without too much difficulty write code that can handle matrices of arbitrary dimension. E.g.:
IEnumerable<double> GetAllValues(Array source)
{
int[] index = new int[source.Rank];
while (true)
{
yield return (double)source.GetValue(index);
int j = 0;
while (++index[j] == source.GetLength(j))
{
index[j] = 0;
if (++j == index.Length)
{
yield break;
}
}
}
}
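Building on the enumeration idea above, one way to keep a single set of statistics methods for any rank is to accept the values and probabilities as System.Array and walk both in the same order. The following is only a sketch under assumptions, not code from the question: the class name RankAgnosticStats and its ExpectedValue method are made up for illustration, and it only handles rectangular arrays (double[], double[,], double[,,], ...). A jagged double[][] would have to be flattened first, because its elements are themselves arrays.

```csharp
using System;
using System.Linq;

// Hypothetical helper: not part of the original RandomVar class.
public static class RankAgnosticStats
{
    // Works for double[], double[,], double[,,], ... as long as both
    // arrays have the same shape, so their elements line up pairwise.
    public static double ExpectedValue(Array values, Array probs)
    {
        // Array.Length is the total element count across all dimensions.
        if (values.Length != probs.Length)
            throw new ArgumentException("values and probs must have the same number of elements");

        // Cast<double>() enumerates every element of a rectangular array,
        // whatever its rank, in row-major order.
        return values.Cast<double>()
                     .Zip(probs.Cast<double>(), (v, p) => v * p)
                     .Sum();
    }
}
```

The same pattern should extend to variance or covariance, since each of those is also a sum over matching value/probability pairs.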
One last note: for dealing with value/probability, depending on your scenario you might actually find it makes more sense to do all this using dictionaries. There are different complications, but the basic building block would be Dictionary<double, object>, where the value is either double or another Dictionary<double, object>. Then if you are looking for e.g. a combined probability, you don't have to scan lists of values, but rather can just look them up directly as the key in the dictionary.
I need to call a function (Matrix.TransformPoints) which has an array of Point (PointF) as a parameter. Unfortunately I only want to apply that function to a slice of that array, and I cannot find a function which returns a slice using the original array as a backing.
It will always be a new array and as both types are structs, it'll copy the values of the Point. It would be nice if it would support Span<T>, as that would do the trick.
For testing, I've been writing this example. I've added a DoSpan function, because then I could verify that DoArray(Span.ToArray()) doesn't work while DoSpan(Span) does. Only when passing the original array to DoArray (or using DoSpan), it'll actually change the values from the original array.
// Same parameters as Matrix.TransformPoints
public static void DoArray(Point[] points)
{
for (int i = 0; i < points.Length; i++)
{
points[i].X++;
}
}
// Just to verify that with Spans it could work, but not Span.ToArray()
public static void DoSpan(Span<Point> points)
{
for (int i = 0; i < points.Length; i++)
{
points[i].X++;
}
}
public static void Main()
{
Point[] points = new Point[4];
DoArray(points);
PrintArray("Complete Array", points);
DoArray(points[2..4]);
PrintArray("Array-Slice", points);
DoSpan(points.AsSpan(2..4));
PrintArray("Span", points);
DoArray(points.AsSpan(2..4).ToArray());
PrintArray("Span.ToArray()", points);
}
private static void PrintArray(string msg, IEnumerable<Point> points)
{
Console.WriteLine(msg);
foreach (var p in points)
{
Console.WriteLine($"\t{p}");
}
}
Now if I were programming in C, I could just pass a pointer to the third element and the function wouldn't know that it is an array slice (of course I'd also need to pass a length/count). So I'm wondering whether C# has something similar.
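For an API that only accepts a whole array, one possible fallback is to copy the slice out, call the API on the copy, and copy the result back into the original. This is just a sketch and an assumption about what would be acceptable here: SliceHelper and ApplyToSlice are hypothetical names, the delegate stands in for the array-only call, and the two copies cost extra time and memory.

```csharp
using System;

public static class SliceHelper
{
    // Hypothetical helper: runs an array-only API on arr[start .. start+count)
    // by copying the slice out, calling the API, and copying the result back.
    public static void ApplyToSlice<T>(T[] arr, int start, int count, Action<T[]> arrayOnlyApi)
    {
        T[] temp = new T[count];
        Array.Copy(arr, start, temp, 0, count);   // slice -> temporary array
        arrayOnlyApi(temp);                        // the API mutates the copy
        Array.Copy(temp, 0, arr, start, count);   // write the results back
    }
}
```

With the DoArray method from the test program, the call would look like SliceHelper.ApplyToSlice(points, 2, 2, DoArray), after which the last two elements of points are changed.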
Maybe overriding isn't the correct term here.
I want to extend some of the System.Math class functions to work on double arrays.
What I'm currently doing is:
public double[] Sin(double[] d)
{
double[] result = new double[d.Length];
for(int i=0;i<result.Length;i++)
result[i] = Math.Sin(d[i]);
return result;
}
I do the same for many functions in Math (about 20), just replacing Sin with Cos, Round, ...
Is there a way to make this more elegant?
Please note that I'm building something to allow the user to evaluate expressions in runtime.
The user needs to be able to write "Cos(d)" for d double array and for all the functions, so solutions from the input side aren't really an option.
Thanks all
Not really, but you can shorten it with Array.ConvertAll:
double[] result = Array.ConvertAll(d, Math.Sin);
If the function name is in a string, you might be able to use a dictionary:
var dict = new Dictionary<string, Func<double[], double[]>> {
{ "Sin", a => Array.ConvertAll(a, Math.Sin) },
{ "Cos", a => Array.ConvertAll(a, Math.Cos) }
};
double[] d = { 1, 2 };
double[] result = dict["Sin"](d); // { 0.8414709848078965, 0.90929742682568171 }
You can create a more generic method to work with the arrays. Something like this:
internal static double[] Transform(double[] values, Func<double, double> transformation)
{
// Build a new array so the input is left untouched.
double[] result = new double[values.Length];
for(int i = 0; i < values.Length; i++)
result[i] = transformation(values[i]);
return result;
}
Now you can use more concrete methods like
internal static double[] Sin(double[] values)
{
return Transform(values, Math.Sin);
}
Or
internal static double[] Cos(double[] values)
{
return Transform(values, Math.Cos);
}
Then the usage will be like this:
var result = Sin(values);
You could use Extension methods:
public static class MathExtensions
{
public static double[] Sin(this double[] input)
{
return input.Select(Math.Sin).ToArray();
}
}
And then the call would be:
var f = d.Sin();
Note: This doesn't get the syntax you wanted, nor does it solve the issue of having to write one of these for each of the corresponding Math methods but I'll post it here as an answer to how to extend existing methods more elegantly.
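To avoid writing one extension per Math function, a single extension that takes the function as an argument is one option. This is a sketch, and the names ArrayMathExtensions and Map are made up for illustration. It still doesn't give the bare Cos(d) syntax asked for in the question.

```csharp
using System;

public static class ArrayMathExtensions
{
    // Hypothetical name: applies any double -> double function element-wise
    // and returns a new array, leaving the input untouched.
    public static double[] Map(this double[] input, Func<double, double> f)
    {
        return Array.ConvertAll(input, x => f(x));
    }
}
```

The call would then be var c = d.Map(Math.Cos); and the same method covers Math.Sin, Math.Sqrt and so on.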
I have a method that accepts an array (float or double), start and end index and then does some element manipulations for indexes in startIndex to endIndex range.
Basically it looks like this:
public void Update(float[] arr, int startIndex, int endIndex)
{
if (condition1)
{
//Do some array manipulation
}
else if (condition2)
{
//Do some array manipulation
}
else if (condition3)
{
if (subcondition1)
{
//Do some array manipulation
}
}
}
Method is longer than this, and involves setting some elements to 0 or 1, or normalizing the array.
The problem is that I need to pass both float[] and double[] arrays to it, and I don't want a duplicated copy of the code that accepts double[] instead.
Performance is also critical, so I don't want to create a new double[] array, cast float array to it, perform calcs, then update original array by casting back to floats.
Is there any solution to it that avoids duplicated code, but is also as fast as possible?
You have a few options. None of them match exactly what you want, but depending on what kind of operations you need you might get close.
The first is to use a generic method where the generic type is restricted, but the only operations you can do are limited:
public void Update<T>(T[] arr, int startIndex, int endIndex) where T : IComparable<T>
{
if (condition1)
{
//Do some array manipulation
}
else if (condition2)
{
//Do some array manipulation
}
else if (condition3)
{
if (subcondition1)
{
//Do some array manipulation
}
}
}
And the conditions and array manipulation in that function would be limited to expressions that use the following forms:
if (arr[Index].CompareTo(arr[OtherIndex])>0)
arr[Index] = arr[OtherIndex];
This is enough to do things like find the minimum, or maximum, or sort the items in the array. It can't do addition/subtraction/etc, so this couldn't, say, find the average. You can make up for this by creating your own overloaded delegates for any additional methods you need:
public void Update<T>(T[] arr, int startIndex, int endIndex, Func<T,T,T> Add) where T : IComparable<T>
{
//...
arr[Index] = Add(arr[OtherIndex], arr[ThirdIndex]);
}
You'd need another argument for each operation that you actually use, and I don't know how that will perform (that last part's gonna be a theme here: I haven't benchmarked any of this, but performance seems to be critical for this question).
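To make the delegate approach concrete, here is a minimal sketch that normalizes a range by its sum. It is an illustration only: GenericUpdate and Normalize are hypothetical names, the normalization rule is my assumption about what "normalizing the array" means, and like everything else here it has not been benchmarked.

```csharp
using System;

public static class GenericUpdate
{
    // Hypothetical example: divides every element in [startIndex, endIndex)
    // by the sum of that range. The arithmetic is supplied by the caller
    // because a plain generic T has no + or / operators.
    public static void Normalize<T>(T[] arr, int startIndex, int endIndex,
                                    Func<T, T, T> add, Func<T, T, T> divide)
    {
        if (endIndex <= startIndex) return; // empty range, nothing to do

        T sum = arr[startIndex];
        for (int i = startIndex + 1; i < endIndex; i++)
            sum = add(sum, arr[i]);

        for (int i = startIndex; i < endIndex; i++)
            arr[i] = divide(arr[i], sum);
    }
}
```

A float[] caller would write GenericUpdate.Normalize(f, 0, f.Length, (a, b) => a + b, (a, b) => a / b); and a double[] caller passes the same lambdas. Each element operation goes through a delegate call, which is where any performance cost would come from.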
Another option that came to mind is the dynamic type:
public void Update(dynamic arr, int startIndex, int endIndex) // dynamic, not dynamic[]: a float[] cannot be passed as dynamic[]
{
//...logic here
}
This should work, but for something called over and over like you claim I don't know what it would do to the performance.
You can combine this option with another answer (now deleted) to give back some type safety:
public void Update(float[] arr, int startIndex, int endIndex)
{
InternalUpdate(arr, startIndex, endIndex);
}
public void Update(double[] arr, int startIndex, int endIndex)
{
InternalUpdate(arr, startIndex, endIndex);
}
public void InternalUpdate(dynamic arr, int startIndex, int endIndex) // dynamic, not dynamic[]: a float[] cannot be passed as dynamic[]
{
//...logic here
}
One other idea is to cast all the floats to doubles:
public void Update(float[] arr, int startIndex, int endIndex)
{
Update( Array.ConvertAll(arr, x => (double)x), startIndex, endIndex);
}
public void Update(double[] arr, int startIndex, int endIndex)
{
//...logic here
}
Again, this will re-allocate the array, and so if that causes a performance issue we'll have to look elsewhere.
If (and only if) all else fails, and a profiler shows that this is a critical performance section of your code, you can just overload the method and implement the logic twice. It's not ideal from a code maintenance standpoint, but if the performance concern is well-established and documented, it can be worth the copy pasta headache. I included a sample comment to indicate how you might want to document this:
/******************
WARNING: Profiler tests conducted on 12/29/2014 showed that this is a critical
performance section of the code, and that separate double/float
implementations of this method produced a XX% speed increase.
If you need to change anything in here, be sure to change BOTH SETS,
and be sure to profile both before and after, to be sure you
don't introduce a new performance bottleneck. */
public void Update(float[] arr, int startIndex, int endIndex)
{
//...logic here
}
public void Update(double[] arr, int startIndex, int endIndex)
{
//...logic here
}
One final item to explore here, is that C# includes a generic ArraySegment<T> type, that you may find useful for this.
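As a rough idea of what that could look like, the sketch below takes an ArraySegment<double> in place of the (arr, startIndex, endIndex) triple. SegmentUpdate and Clear are hypothetical names and the body is only a placeholder for the real logic. Note that this tidies the signature but does not by itself remove the float/double duplication.

```csharp
using System;

public static class SegmentUpdate
{
    // Hypothetical example: sets every element of the segment to zero.
    // The segment carries the array, the start offset and the count together.
    public static void Clear(ArraySegment<double> segment)
    {
        double[] arr = segment.Array;
        int end = segment.Offset + segment.Count;
        for (int i = segment.Offset; i < end; i++)
            arr[i] = 0.0; // writes go straight to the underlying array, no copy
    }
}
```

A caller would pass new ArraySegment<double>(data, 1, 2) to touch only data[1] and data[2].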
Just an idea. I have no idea what the performance implications are, but this helped me to go to sleep :P
public void HardcoreWork(double[] arr){HardcoreWork(arr, null);}
public void HardcoreWork(float[] arr){HardcoreWork(null, arr);}
public struct DoubleFloatWrapper
{
private readonly double[] _arr1;
private readonly float[] _arr2;
private readonly bool _useFirstArr;
public double this[int index]
{
get {
return _useFirstArr ? _arr1[index] : _arr2[index];
}
}
public int Length
{
get {
return _useFirstArr ? _arr1.Length : _arr2.Length;
}
}
public DoubleFloatWrapper(double[] arr1, float[] arr2)
{
_arr1 = arr1;
_arr2 = arr2;
_useFirstArr = _arr1 != null;
}
}
private void HardcoreWork(double[] arr1, float[] arr2){
var doubleFloatArr = new DoubleFloatWrapper(arr1, arr2);
var len = doubleFloatArr.Length;
double sum = 0;
for(var i = 0; i < len; i++){
sum += doubleFloatArr[i];
}
}
Don't forget that if the amount of elements you have is ridiculously small, you can just use pooled memory, which will give you zero memory overhead.
ThreadLocal<double[]> _memoryPool = new ThreadLocal<double[]>(() => new double[100]);
private void HardcoreWork(double[] arr1, float[] arr2){
double[] array = arr1;
int arrayLength = arr1 != null ? arr1.Length : arr2.Length;
if(array == null)
{
array = _memoryPool.Value;
for(var i = 0; i < arr2.Length; i++)
array[i] = arr2[i];
}
for(var i = 0; i < 1000000; i++){
for(var k =0; k < arrayLength; k++){
var a = array[k] + 1;
}
}
}
What about implementing the method using generics? An abstract base class can be created for your core business logic:
abstract class MyClass<T>
{
public void Update(T[] arr, int startIndex, int endIndex)
{
if (condition1)
{
//Do some array manipulation, such as add operation:
T addOperationResult = Add(arr[0], arr[1]);
}
else if (condition2)
{
//Do some array manipulation
}
else if (condition3)
{
if (subcondition1)
{
//Do some array manipulation
}
}
}
protected abstract T Add(T x, T y);
}
Then implement per data type an inheriting class tuned to type-specific operations:
class FloatClass : MyClass<float>
{
protected override float Add(float x, float y)
{
return x + y;
}
}
class DoubleClass : MyClass<double>
{
protected override double Add(double x, double y)
{
return x + y;
}
}
John's comment about macros, although a completely inaccurate characterization of C++ templates, got me thinking about the preprocessor.
C#'s preprocessor is nowhere near as powerful as C's (which C++ inherits), but it still is able to handle everything you need except the duplication itself:
// A using alias must sit at file (or namespace) level, not inside the class body.
#if FOR_FLOAT
using Double = System.Single;
#endif
partial class MyClass
{
public void Update(Double[] arr, int startIndex, int endIndex)
{
// do whatever you want, using Double where you want the type to change, and
// either System.Double or double where you don't
}
}
Now, you need to include two copies of the file in your project, one of which has an extra
#define FOR_FLOAT
line at the top. (Should be fairly easy to automate adding that)
Unfortunately, the /define compiler option applies to the entire assembly, not per-file, so you can't use a hardlink to include the file twice and have the symbol only defined for one. However, if you can tolerate the two implementations being in different assemblies, you can include the same source file into both, using the project options to define FOR_FLOAT in one of them.
I still advocate using templates in C++/CLI.
Most code isn't so performance-critical that taking the time to convert from float to double and back causes a problem:
public void Update(float[] arr, int startIndex, int endIndex)
{
double[] darr = new double[arr.Length];
for(int i=startIndex; i<endIndex; i++)
darr[i] = (double) arr[i];
Update(darr, startIndex, endIndex);
for(int j=startIndex; j<endIndex; j++)
arr[j] = (float) darr[j]; // narrowing from double to float needs an explicit cast
}
Here's a thought experiment. Imagine that instead of copying, you duplicated the code of the double[] version to make a float[] version. Imagine that you optimized the float[] version as much as necessary.
Your question is then: does the copying really take that long? Consider that instead of maintaining two versions of the code, you could spend your time improving the performance of the double[] version.
Even if you had been able to use generics for this, it's possible that the double[] version would want to use different code from the float[] version in order to optimize performance.
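For readers on .NET 7 or later, the generic math interfaces in System.Numerics make a single generic version expressible without delegates. The sketch below assumes that runtime is available. GenericMathUpdate and Normalize are hypothetical names, and the normalization rule is only a stand-in for the real array manipulation. Whether it matches hand-written float[] and double[] versions in speed would still need to be measured.

```csharp
using System;
using System.Numerics;

public static class GenericMathUpdate
{
    // Requires .NET 7 or later (generic math). Hypothetical example:
    // divides every element in [startIndex, endIndex) by the range's sum.
    public static void Normalize<T>(T[] arr, int startIndex, int endIndex)
        where T : INumber<T>
    {
        T sum = T.Zero;
        for (int i = startIndex; i < endIndex; i++)
            sum += arr[i];

        if (sum == T.Zero) return; // nothing sensible to normalize by

        for (int i = startIndex; i < endIndex; i++)
            arr[i] /= sum;
    }
}
```

The same method accepts both float[] and double[] with no casts at the call site.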
Let's say I have this function:
int Calculate(double[] array) {}
And in my main I have this array:
double[,] myArray = new double[3,3];
How can I call Calculate(...)?
I tried this, but it doesn't compile:
double[] mySingleArray = myArray[0];
What I want to avoid is an unnecessary loop (for).
I declared a regular array, but if a jagged array or any other type of array works better, that's fine with me.
I use c# 3.5
First, let's declare your Calculate() method like this:
int Calculate(IEnumerable<double> doubles)
Don't worry, you can still pass an array to that code. You might also need IList<double>, but 9 times out of 10 the IEnumerable is good enough. The main thing is that this will let us use the yield keyword to slice up your array in an efficient way:
public static IEnumerable<T> Slice<T>(this T[,] values)
{
return Slice(values, 0, 0);
}
public static IEnumerable<T> Slice<T>(this T[,] values, int index)
{
return Slice(values, 0, index);
}
public static IEnumerable<T> Slice<T>(this T[,] values, int dimension, int index)
{
int[] point = new int[values.Rank];
point[dimension] = index;
dimension = 1 - dimension;// only works for rank == 2
// Iterate over the other dimension; GetLength gives its item count
// (GetUpperBound would return the last index and skip the final item).
int length = values.GetLength(dimension);
for (int i = 0; i < length; i++)
{
point[dimension] = i;
yield return (T)values.GetValue(point);
}
}
It still needs some work because it only works with rank 2 arrays, but it should be fine for the example you posted.
Now you can call your calculate function like this:
Calculate(myArray.Slice(0));
Note that due to the way IEnumerable and the yield statement work the for loop in the code I posted is essentially free. It won't run until you actually iterate the items in your Calculate method, and even there runs in a "just-in-time" fashion so that the whole algorithm remains O(n).
It gets even more interesting when you share what your Calculate method is doing. You might be able to express it as a simple Aggregate + lambda expression. For example, let's say your calculate method returned the number of items > 5:
myArray.Slice(0).Count(x => x > 5);
Or say it summed all the items:
myArray.Slice().Sum();
A jagged array works the way you want:
double[][] jaggedArray = new double[100][];
for (int i = 0; i < jaggedArray.Length; ++i)
jaggedArray[i] = new double[100];
myFunction(jaggedArray[0]);
You can have different sizes for each array in this way.
A jagged array would let you split out the first array!
The Slice() method given above will get you a single row from your array, which seems to match the sample given in your question.
However, if you want a one dimensional array that contains all the elements in the rectangular array, you can use something like this, which is also O(n).
public static T[] Flatten<T>(this T[,] array)
where T : struct
{
int size = Marshal.SizeOf(array[0, 0]);
int totalSize = Buffer.ByteLength(array);
T[] result = new T[totalSize / size];
Buffer.BlockCopy(array, 0, result, 0, totalSize);
return result;
}
What is the best algorithm to take array like below:
A {0,1,2,3}
I expected to order it like array below:
B {3,1,0,2}
Any ideas?
So if you have two arrays and they hold the same data just in different order then just do this:
A = B
I suspect that is not your situation so I think we need more info.
What you need to do is determine the ordering of B and then apply that ordering to A. One way to accomplish this is to undo the ordering of B and keep track of what happens along the way. Then you can do the reverse to A.
Here's some sketchy C# (sorry, I haven't actually run this)...
Take a copy of B:
List<int> B2 = new List<int>(B);
Now sort it, using a sort function that records the swaps:
List<KeyValuePair<int,int>> swaps = new List<KeyValuePair<int,int>>();
B2.Sort( delegate( int x, int y ) {
if( x<y ) return -1;
if( x==y ) return 0;
// x and y must be transposed, so assume they will be:
swaps.Add( new KeyValuePair<int,int>(x,y) );
return 1;
});
Now apply the swaps, in reverse order, to A:
swaps.Reverse();
foreach( KeyValuePair<int,int> x in swaps )
{
int t = A[x.Key];
A[x.Key] = A[x.Value];
A[x.Value] = t;
}
Depending how the built-in sort algorithm works, you might need to roll your own. Something nondestructive like a merge sort should give you the correct results.
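A different way to apply B's ordering to A, without recording swaps, is to look up each item's position in B and sort A by that position. This is a swapped-in technique rather than the approach sketched above, and the names Reorder and ToMatch are hypothetical. It assumes B has no duplicate items.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Reorder
{
    // Hypothetical helper: returns the items of a arranged in the order
    // in which they appear in b. Assumes b has no duplicates; items of a
    // that are missing from b are placed at the end.
    public static T[] ToMatch<T>(IEnumerable<T> a, IList<T> b)
    {
        var position = new Dictionary<T, int>();
        for (int i = 0; i < b.Count; i++)
            position[b[i]] = i;

        // OrderBy is a stable sort, so ties keep their original order.
        return a.OrderBy(x => position.TryGetValue(x, out int p) ? p : int.MaxValue)
                .ToArray();
    }
}
```

For the arrays in the question, Reorder.ToMatch(A, B) with A = {0,1,2,3} and B = {3,1,0,2} gives {3,1,0,2}.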
Here's my implementation of the comparer (uses LINQ, but can be easily adapted to older .net versions). You can use it for any sorting algorithms such as Array.Sort, Enumerable.OrderBy, List.Sort, etc.
var data = new[] { 1, 2, 3, 4, 5 };
var customOrder = new[] { 2, 1 };
Array.Sort(data, new CustomOrderComparer<int>(customOrder));
foreach (var v in data)
Console.Write("{0},", v);
The result is 2,1,3,4,5, - any items not listed in the customOrder are placed at the end in the default for the given type (unless a fallback comparator is given)
public class CustomOrderComparer<TValue> : IComparer<TValue>
{
private readonly IComparer<TValue> _fallbackComparer;
private const int UseDictionaryWhenBigger = 64; // todo - adjust
private readonly IList<TValue> _customOrder;
private readonly Dictionary<TValue, uint> _customOrderDict;
public CustomOrderComparer(IList<TValue> customOrder, IComparer<TValue> fallbackComparer = null)
{
if (customOrder == null) throw new ArgumentNullException("customOrder");
_fallbackComparer = fallbackComparer ?? Comparer<TValue>.Default;
if (UseDictionaryWhenBigger < customOrder.Count)
{
_customOrderDict = new Dictionary<TValue, uint>(customOrder.Count);
for (int i = 0; i < customOrder.Count; i++)
_customOrderDict.Add(customOrder[i], (uint) i);
}
else
_customOrder = customOrder;
}
#region IComparer<TValue> Members
public int Compare(TValue x, TValue y)
{
uint indX, indY;
if (_customOrderDict != null)
{
if (!_customOrderDict.TryGetValue(x, out indX)) indX = uint.MaxValue;
if (!_customOrderDict.TryGetValue(y, out indY)) indY = uint.MaxValue;
}
else
{
// (uint)-1 == uint.MaxValue
indX = (uint) _customOrder.IndexOf(x);
indY = (uint) _customOrder.IndexOf(y);
}
if (indX == uint.MaxValue && indY == uint.MaxValue)
return _fallbackComparer.Compare(x, y);
return indX.CompareTo(indY);
}
#endregion
}
In the example you gave (an array of numbers), there would be no point in re-ordering A, since you could just use B.
So, presumably these are arrays of objects which you want ordered by one of their properties.
Then, you will need a way to look up items in A based on the property in question (like a hashtable). Then you can iterate B (which is in the desired sequence), and operate on the corresponding element in A.
Both arrays contain the same values (or nearly so), but I need to force them to be in the same order. For example, in array A the value "3045" is in index position 4 and in array B it is in index position 1. I want to reorder B so that the index positions of like values are the same as in A.
If they are nearly the same then here is some pseudo code:
Make an ArrayList
Copy the contents of the smaller array to the arraylist
for each item I in the larger array
Find I in the ArrayList
Append I to a new array
Remove I from the arraylist
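The pseudocode above could be translated to C# roughly as follows. This is one reading of it, not a definitive version: I'm assuming an item is only appended when it is actually found in the list, and the names AlignArrays and InOrderOf are hypothetical.

```csharp
using System.Collections.Generic;

public static class AlignArrays
{
    // Hypothetical translation of the pseudocode: walks the larger array in
    // order and keeps each item that is still available in the smaller one.
    public static List<T> InOrderOf<T>(T[] larger, T[] smaller)
    {
        var remaining = new List<T>(smaller);
        var result = new List<T>();
        foreach (T item in larger)
        {
            // Remove returns true only if the item was found (and removed).
            if (remaining.Remove(item))
                result.Add(item);
        }
        return result;
    }
}
```

Items that appear only in the smaller array are left over in remaining and never reach the result, so they would need separate handling if they matter.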
Could the issue be resolved using a Dictionary so the elements have a relationship that isn't predicated on sort order at all?