Calculate some statistics with Math.Net - c#

I have some results that are stored in a multidimensional array:
double[,] results;
Each column is a time series of prices for a specific variable (e.g. "house", "car", "electricity"). I would like to calculate some statistics for each variable in order to summarize the results in a more compact form.
For example, I was looking at the percentile function in Math.Net.
I would like to calculate the 90th percentile of the prices for each column (so for each variable).
I am trying the following, since the function doesn't work on a multidimensional array (so I cannot pass results[,] as the argument to the percentile function):
for (int i = 0, i <= results.GetLength(2), i++)
{
myList.Add(MathNet.Numerics.Statistics.Statistics.Percentile(results[,i], 90));
}
So I want to loop through the columns of my results[,] and calculate the 90th percentile, adding the result to a list.
But this doesn't work because of wrong syntax in results[, i]. Unfortunately, there is no other (clearer) error message.
Can you help me understand where the problem is and if there's a better way to calculate a percentile by column?

Percentile is an extension method with the following signature:
public static double Percentile(this IEnumerable<double> data, int p)
So you can use Linq to transform your 2d array into an appropriate sequence to pass to Percentile.
However, results.GetLength(2) will throw an exception because the dimension argument of GetLength() is zero-based. You probably meant results.GetLength(1). Assuming that's what you meant, you can do:
var query = Enumerable.Range(0, results.GetLength(1))
.Select(iCol => Enumerable.Range(0, results.GetLength(0))
.Select(iRow => results[iRow, iCol])
.Percentile(90));
You can have Linq make the list for you,
var myList= query.ToList();
or add it to a pre-existing list:
myList.AddRange(query);
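If you prefer not to use LINQ for the column extraction, a plain loop does the same job. The following is only a minimal sketch: ColumnHelper and GetColumn are hypothetical names, not part of Math.NET. It copies one column of the 2d array into a one-dimensional array, which can then be passed to any method that accepts an IEnumerable<double>, such as Percentile.

```csharp
using System;

public static class ColumnHelper
{
    // Hypothetical helper (not part of Math.NET): copies column `col`
    // of a rectangular array into a new one-dimensional array.
    public static double[] GetColumn(double[,] data, int col)
    {
        if (data == null)
            throw new ArgumentNullException(nameof(data));

        int rows = data.GetLength(0);            // dimension 0 = rows
        if (col < 0 || col >= data.GetLength(1)) // dimension 1 = columns
            throw new ArgumentOutOfRangeException(nameof(col));

        var column = new double[rows];
        for (int row = 0; row < rows; row++)
            column[row] = data[row, col];
        return column;
    }
}
```

With this in place, the loop from the question would iterate i from 0 up to (but not including) results.GetLength(1) and call Percentile on ColumnHelper.GetColumn(results, i).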
update
To filter NaN values use double.IsNaN:
var query = Enumerable.Range(0, results.GetLength(1))
.Select(iCol => Enumerable.Range(0, results.GetLength(0))
.Select(iRow => results[iRow, iCol])
.Where(d => !double.IsNaN(d))
.Percentile(90));
update
If one extracts a couple of array extensions:
public static class ArrayExtensions
{
public static IEnumerable<IEnumerable<T>> Columns<T>(this T[,] array)
{
if (array == null)
throw new ArgumentNullException();
return Enumerable.Range(0, array.GetLength(1))
.Select(iCol => Enumerable.Range(0, array.GetLength(0))
.Select(iRow => array[iRow, iCol]));
}
public static IEnumerable<IEnumerable<T>> Rows<T>(this T[,] array)
{
if (array == null)
throw new ArgumentNullException();
return Enumerable.Range(0, array.GetLength(0))
.Select(iRow => Enumerable.Range(0, array.GetLength(1))
.Select(iCol => array[iRow, iCol]));
}
}
Then the query becomes:
var query = results.Columns().Select(col => col.Where(d => !double.IsNaN(d)).Percentile(90));
which seems much clearer.

Related

C# LINQ sum array properties

So I have a class with an array of values, and a list of those classes.
And I want to return the sum (or any other operation) of all the items in the list, also as an array.
E.g. the sum of {1,2,3}, {10,20,30} & {100,200,300} would be {111,222,333}
So, the resulting array's 1st element will be the sum of all the 1st elements in the input arrays, the 2nd element will be the sum of all the 2nd elements in the input arrays, etc.
I can do it with:
public class Item
{
internal int[] Values = new int[3];
}
public class Items : List<Item>
{
internal int[] Values
{
get
{
int[] retVal = new int[3];
for (int x = 0; x < retVal.Length; x++)
{
retVal[x] = this.Sum(i => i.Values[x]);
}
return retVal;
}
}
}
But I feel that this should be achievable as a single line using LINQ. Is it?
Yes, this can be done in a single line of LINQ, using Enumerable.Range, Max, Select and Sum.
Notice I've also included a simple condition to save you from an IndexOutOfRangeException should one of the arrays be a different length than the others.
internal int[] ValuesLinq
{
get
{
return Enumerable
.Range(0, this.Max(i => i.Values.Length))
.Select(ind => this.Sum(item => item.Values.Length > ind ? item.Values[ind] : 0))
.ToArray();
}
}
You can see a live demo on Rextester
You can try to group the items within the arrays by their indexes (so we sum all the 1st items, all the 2nd items, etc.):
int[] retVal = myList
.SelectMany(item => item.Values
.Select((value, index) => new {value, index}))
.GroupBy(item => item.index, item => item.value)
.Select(group => group.Sum())
.ToArray();
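To check the grouping approach against the numbers from the question, here is a small self-contained sketch. It works on plain int[] inputs rather than the Item class, and the names IndexSumSketch and SumByIndex are made up for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class IndexSumSketch
{
    // Hypothetical helper: sums the arrays element-wise by position.
    // Arrays of different lengths are tolerated; a missing element
    // simply contributes nothing to the sum at that index.
    public static int[] SumByIndex(IEnumerable<int[]> arrays)
    {
        return arrays
            // Pair every value with its position inside its own array.
            .SelectMany(a => a.Select((value, index) => new { value, index }))
            // Collect all values that share the same position.
            .GroupBy(x => x.index, x => x.value)
            .Select(g => g.Sum())
            .ToArray();
    }
}
```

For the inputs {1,2,3}, {10,20,30} and {100,200,300} this returns {111,222,333}. The result is ordered by index because GroupBy yields groups in the order their keys first appear.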

List<IEnumerator>.All(e => e.MoveNext()) doesn't move my enumerators on

I'm trying to track down a bug in our code. I've boiled it down to the snippet below. In the example below I have a grid of ints (a list of rows), but I want to find the indexes of the columns that have a 1. The implementation of this is to create an enumerator for each row and step through each column in turn by keeping the enumerators in step.
class Program
{
static void Main(string[] args)
{
var ints = new List<List<int>> {
new List<int> {0, 0, 1}, // This row has a 1 at index 2
new List<int> {0, 1, 0}, // This row has a 1 at index 1
new List<int> {0, 0, 1} // This row also has a 1 at index 2
};
var result = IndexesWhereThereIsOneInTheColumn(ints);
Console.WriteLine(string.Join(", ", result)); // Expected: "1, 2"
Console.ReadKey();
}
private static IEnumerable<int> IndexesWhereThereIsOneInTheColumn(
IEnumerable<List<int>> myIntsGrid)
{
var enumerators = myIntsGrid.Select(c => c.GetEnumerator()).ToList();
short i = 0;
while (enumerators.All(e => e.MoveNext())) {
if (enumerators.Any(e => e.Current == 1))
yield return i;
i++;
if (i > 1000)
throw new Exception("You have gone too far!!!");
}
}
}
However I have noticed that MoveNext() is not remembered each time around the while loop. MoveNext() always returns true, and Current is always 0. Is this a purposeful feature of Linq to make it more side effect free?
I noticed that this works:
private static IEnumerable<int> IndexesWhereThereIsOneInTheColumn(
IEnumerable<List<int>> myIntsGrid)
{
var enumerators = myIntsGrid.Select(c =>
c.ToArray().GetEnumerator()).ToList(); // added ToArray()
short i = 0;
while (enumerators.All(e => e.MoveNext())) {
if (enumerators.Any(e => (int)e.Current == 1)) // added cast to int
yield return i;
i++;
}
}
So is this just a problem with List?
It is because the enumerator of List<T> is a struct whereas the enumerator of Array is a class.
So when you call Enumerable.All with the struct, a copy of the enumerator is made and passed as a parameter to the Func, since structs are copied by value. So e.MoveNext is called on the copy, not the original.
Try this:
Console.WriteLine(new List<int>().GetEnumerator().GetType().IsValueType);
Console.WriteLine(new int[]{}.GetEnumerator().GetType().IsValueType);
It prints:
True
False
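The effect of the copy can be reproduced in isolation. The sketch below is illustrative only (StructEnumeratorDemo and its method names are made up): it counts how many times All(e => e.MoveNext()) succeeds, once with the struct enumerators stored directly and once with them boxed behind IEnumerator<int>. A cap of maxRounds stops the struct version from looping forever.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class StructEnumeratorDemo
{
    // Stores List<int>.Enumerator values (structs). Each lambda call
    // receives a copy, so the stored enumerators never advance and
    // All(...) keeps returning true until the cap is reached.
    public static int RoundsWithStructEnumerators(List<List<int>> rows, int maxRounds)
    {
        var enumerators = rows.Select(r => r.GetEnumerator()).ToList();
        int rounds = 0;
        while (rounds < maxRounds && enumerators.All(e => e.MoveNext()))
            rounds++;
        return rounds;
    }

    // Casting to the interface boxes each enumerator once. The list then
    // holds references, so MoveNext mutates the same object every time.
    public static int RoundsWithBoxedEnumerators(List<List<int>> rows, int maxRounds)
    {
        var enumerators = rows.Select(r => (IEnumerator<int>)r.GetEnumerator()).ToList();
        int rounds = 0;
        while (rounds < maxRounds && enumerators.All(e => e.MoveNext()))
            rounds++;
        return rounds;
    }
}
```

With three rows of three elements each and maxRounds set to 10, the struct version runs all 10 rounds, while the boxed version stops after 3.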
As Sriram Sakthivel's answer says, the issue is due to the lack of boxing and to the list enumerator implementation happening to be a struct rather than a reference type. Usually, one would not expect value-type behavior from an enumerator, as most are either exposed through the IEnumerator/IEnumerator<T> interfaces or are reference types themselves. A quick way to get around this is to change this line
var enumerators = myIntsGrid.Select(c => c.GetEnumerator()).ToList();
to
var enumerators
= myIntsGrid.Select(c => (IEnumerator) c.GetEnumerator()).ToList();
instead.
The above code will construct a list of already boxed enumerators, which will be treated as reference type instances, because of the interface cast. From that moment on, they should behave as you expect them to in your later code.
If you need a generic enumerator (to avoid casts when later using the enumerator.Current property), you can cast to the appropriate generic IEnumerator<T> interface:
c => (IEnumerator<int>) c.GetEnumerator()
or even better
c => c.GetEnumerator() as IEnumerator<int>
The as keyword is said to perform better than direct casts, and in the case of a loop it could bring a performance benefit. As per Flater's request from the comments: just be careful that as returns null if the cast fails, whereas a direct cast throws an exception. In the OP's case, it is guaranteed that the enumerator implements IEnumerator<int>, so it is safe to go for an as cast.
Alternatively, you could do it with a lambda extension
var ids = Enumerable.Range(0,ints.Max (row => row.Count)).
Where(col => ints.Any(row => (row.Count>col)? row[col] == (1) : false));
or
var ids = Enumerable.Range(0,ints.Max (row=> row.Count)).
Where(col => ints.Any (row => row.ElementAtOrDefault(col) == 1));
Here's a simple implementation using loops and yield:
private static IEnumerable<int> IndexesWhereThereIsOneInTheColumn(
IEnumerable<List<int>> myIntsGrid)
{
for (int i=0; myIntsGrid.Max(l=>l.Count) > i;i++)
{
foreach(var row in myIntsGrid)
{
if (row.Count > i && row[i] == 1)
{
yield return i;
break;
}
}
}
}
Alternatively, use this inside the for loop:
if (myIntsGrid.Any(row => row.Count > i && row[i] == 1)) yield return i;
Just for fun, here's a neat LINQ query that won't cause hard-to-trace side effects in your code:
IEnumerable<int> IndexesWhereThereIsOneInTheColumn(IEnumerable<IEnumerable<int>> myIntsGrid)
{
return myIntsGrid
// Collapse the rows into a single row of the maximum value of all rows
.Aggregate((acc, x) => acc.Zip(x, Math.Max))
// Enumerate the row
.Select((Value,Index) => new { Value, Index })
.Where(x => x.Value == 1)
.Select(x => x.Index);
}
Why can't you just get those indexes like this:
var result = ints.Select (i => i.IndexOf(1)).Distinct().OrderBy(i => i);
Seems to be much easier...

Count numbers in a List

In C# I have a List which contains numbers in string format. What is the best way to count all these numbers? For example, to say that I have the number ten three times.
I mean in unix awk you can say something like
tempArray["5"] +=1
it is similar to a KeyValuePair but it is readonly.
Any fast and smart way?
Very easy with LINQ :
var occurrenciesByNumber = list.GroupBy(x => x)
.ToDictionary(x => x.Key, x => x.Count());
Of course, since your numbers are represented as strings, this code distinguishes between, for instance, "001" and "1", even though conceptually they are the same number.
To count numbers that have the same value, you could do for example:
var occurrenciesByNumber = list.GroupBy(x => int.Parse(x))
.ToDictionary(x => x.Key, x => x.Count());
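The difference between the two groupings is easiest to see side by side. This is a minimal sketch with made-up names (CountingSketch, CountAsStrings, CountAsNumbers), and it assumes every string in the list is a valid integer, since int.Parse throws otherwise.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class CountingSketch
{
    // Groups by the raw string, so "001" and "1" are different keys.
    public static Dictionary<string, int> CountAsStrings(IEnumerable<string> items)
    {
        return items.GroupBy(x => x)
                    .ToDictionary(g => g.Key, g => g.Count());
    }

    // Groups by the parsed value, so "001" and "1" share the key 1.
    // Assumption: every item parses as an int.
    public static Dictionary<int, int> CountAsNumbers(IEnumerable<string> items)
    {
        return items.GroupBy(x => int.Parse(x))
                    .ToDictionary(g => g.Key, g => g.Count());
    }
}
```

For the list "10", "001", "10", "1", "10", the first method reports three keys ("10" three times, "001" once, "1" once), while the second reports two (10 three times, 1 twice).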
(As noted in digEmAll's answer, I'm assuming you don't really care that they're numbers - everything here assumes that you wanted to treat them as strings.)
The simplest way to do this is to use LINQ:
var dictionary = values.GroupBy(x => x)
.ToDictionary(group => group.Key, group => group.Count());
You could build the dictionary yourself, like this:
var map = new Dictionary<string, int>();
foreach (string number in list)
{
int count;
// You'd normally want to check the return value, but in this case you
// don't care.
map.TryGetValue(number, out count);
map[number] = count + 1;
}
... but I prefer the conciseness of the LINQ approach :) It will be a bit less efficient, mind you - if that's a problem, I'd personally probably create a generic "counting" extension method:
public static Dictionary<T, int> GroupCount<T>(this IEnumerable<T> source)
{
if (source == null)
{
throw new ArgumentNullException("source");
}
var map = new Dictionary<T, int>();
foreach (T value in source)
{
int count;
map.TryGetValue(value, out count);
map[value] = count + 1;
}
return map;
}
(You might want another overload accepting an IEqualityComparer<T>.) Having written this once, you can reuse it any time you need to get the counts for items:
var counts = list.GroupCount();

compare multiple arraylist lengths to find longest one

I have 6 array lists and I would like to know which one is the longest without using a bunch of IF STATEMENTS.
"if arraylist.count > anotherlist.count Then...." <- Anyway to do this other than this?
Examples in VB.net or C#.Net (4.0) would be helpful.
arraylist1.count
arraylist2.count
arraylist3.count
arraylist4.count
arraylist5.count
arraylist6.count
DIM longest As integer = .... 'the longest arraylist should be stored in this variable.
Thanks
Is 1 if statement acceptable?
public ArrayList FindLongest(params ArrayList[] lists)
{
var longest = lists[0];
for(var i=1;i<lists.Length;i++)
{
if(lists[i].Count > longest.Count)
longest = lists[i];
}
return longest;
}
You could use Linq:
public static ArrayList FindLongest(params ArrayList[] lists)
{
return lists == null
? null
: lists.OrderByDescending(x => x.Count).FirstOrDefault();
}
If you just want the length of the longest list, it's even simpler:
public static int FindLongestLength(params ArrayList[] lists)
{
return lists == null
? -1 // here you could also return (int?)null,
// all you need to do is adjusting the return type
: lists.Max(x => x.Count);
}
If you store everything in a List of Lists like for example
List<List<int>> f = new List<List<int>>();
Then a LINQ like
List<int> myLongest = f.OrderBy(x => x.Count).Last();
will yield the list with the largest number of items. Of course, you will have to handle the case where there is a tie for the longest list.
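One way to handle a tie is to return every list that has the maximum count instead of picking a single one. The sketch below uses ArrayList, as in the question; LongestSketch and FindAllLongest are hypothetical names, and returning an empty result for null or empty input is just one possible choice.

```csharp
using System.Collections;
using System.Collections.Generic;
using System.Linq;

public static class LongestSketch
{
    // Returns all lists whose Count equals the maximum Count.
    // With no tie, the result contains exactly one list.
    public static List<ArrayList> FindAllLongest(params ArrayList[] lists)
    {
        if (lists == null || lists.Length == 0)
            return new List<ArrayList>();

        int max = lists.Max(l => l.Count);
        return lists.Where(l => l.Count == max).ToList();
    }
}
```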
SortedList sl=new SortedList();
foreach (ArrayList al in YouArrayLists)
{
int c=al.Count;
if (!sl.ContainsKey(c)) sl.Add(c,al);
}
ArrayList LongestList=(ArrayList)sl.GetByIndex(sl.Count-1);
If you just want the length of the longest ArrayList:
public int FindLongest(params ArrayList[] lists)
{
return lists.Max(item => item.Count);
}
Or if you don't want to write a function and just want to in-line the code, then:
int longestLength = (new ArrayList[] { arraylist1, arraylist2, arraylist3,
arraylist4, arraylist5, arraylist6 }).Max(item => item.Count);

Using lambda expressions to get a subset where array elements are equal

I have an interesting problem, and I can't seem to figure out the lambda expression to make this work.
I have the following code:
List<string[]> list = GetSomeData(); // Returns large number of string[]'s
List<string[]> list2 = GetSomeData2(); // similar data, but smaller subset
List<string[]> newList = list.FindAll(predicate(string[] line){
return (???);
});
I want to return only those records in list in which element 0 of each string[] is equal to one of the element 0's in list2.
list contains data like this:
"000", "Data", "more data", "etc..."
list2 contains data like this:
"000", "different data", "even more different data"
Fundamentally, i could write this code like this:
List<string[]> newList = new List<string[]>();
foreach(var e in list)
{
foreach(var e2 in list2)
{
if (e[0] == e2[0])
newList.Add(e);
}
}
return newList;
But, i'm trying to use generics and lambda's more, so i'm looking for a nice clean solution. This one is frustrating me though.. maybe a Find inside of a Find?
EDIT:
Marc's answer below led me to experiment with a variation that looks like this:
var z = list.Where(x => list2.Select(y => y[0]).Contains(x[0])).ToList();
I'm not sure how efficient this is, but it works and is sufficiently succinct. Does anyone else have any suggestions?
You could join? I'd use two steps myself, though:
var keys = new HashSet<string>(list2.Select(x => x[0]));
var data = list.Where(x => keys.Contains(x[0]));
If you only have .NET 2.0, then either install LINQBridge and use the above (or similar with a Dictionary<> if LINQBridge doesn't include HashSet<>), or perhaps use nested Find:
var data = list.FindAll(arr => list2.Find(arr2 => arr2[0] == arr[0]) != null);
Note, though, that the Find approach is O(n*m), whereas the HashSet<> approach is O(n+m)...
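For reference, here is the HashSet<> approach wrapped in a self-contained method. It is a sketch only: KeyFilterSketch and FilterByFirstElement are illustrative names, and it assumes every string[] has at least one element.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class KeyFilterSketch
{
    // Keeps the entries of `list` whose element 0 also occurs as
    // element 0 of some entry in `list2`.
    public static List<string[]> FilterByFirstElement(
        List<string[]> list, List<string[]> list2)
    {
        // One pass over list2 builds the key set: O(m).
        var keys = new HashSet<string>(list2.Select(x => x[0]));
        // One pass over list with constant-time lookups: O(n).
        return list.Where(x => keys.Contains(x[0])).ToList();
    }
}
```

Unlike the nested foreach from the question, this adds each matching entry of list only once, even if list2 contains the same key several times.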
You could use the Intersect extension method in System.Linq, but you would need to provide an IEqualityComparer to do the work.
static void Main(string[] args)
{
List<string[]> data1 = new List<string[]>();
List<string[]> data2 = new List<string[]>();
var result = data1.Intersect(data2, new Comparer());
}
class Comparer : IEqualityComparer<string[]>
{
#region IEqualityComparer<string[]> Members
bool IEqualityComparer<string[]>.Equals(string[] x, string[] y)
{
return x[0] == y[0];
}
int IEqualityComparer<string[]>.GetHashCode(string[] obj)
{
// The hash must agree with Equals, which compares only element 0.
return obj[0].GetHashCode();
}
#endregion
}
Intersect may work for you.
Intersect finds all the items that are in both lists.
Ok re-read the question. Intersect doesn't take the order into account.
I have written a slightly more complex linq expression that will return a list of items that are in the same position (index) with the same value.
List<String> list1 = new List<String>() {"000","33", "22", "11", "111"};
List<String> list2 = new List<String>() {"000", "22", "33", "11"};
List<String> subList = list1.Select ((value, index) => new { Value = value, Index = index})
.Where(w => list2.Skip(w.Index).FirstOrDefault() == w.Value )
.Select (s => s.Value).ToList();
Result: {"000", "11"}
Explanation of the query:
Select a set of values and position of that value.
Filter that set where the item in the same position in the second list has the same value.
Select just the value (not the index as well).
Note I used:
list2.Skip(w.Index).FirstOrDefault()
//instead of
list2[w.Index]
So that it will handle lists of different lengths.
If you know the lists will be the same length, or that list1 will always be shorter, then list2[w.Index] would probably be a bit faster.
