I would like to calculate the correlation matrix using LINQ, in a single expression. How can I do that (if it is possible)?
Assume I already have an array of size N called volatilities, and Returns is a jagged array with N arrays, all of the same size.
I am also using:
using stats = MathNet.Numerics.Statistics.ArrayStatistics;
and this is the code that I want to make in LINQ:
double[,] correlation_matrix = new double[N,N];
for (int i=0; i<N;i++){
for (int j = i + 1; j < N; j++){
correlation_matrix [i,j]= stats.Covariance(Returns[i], Returns[j]) / (volatilities[i] * volatilities[j]); // stores it to check values
}
}
thanks!
If you let yourself have an array of arrays, you can do
var correlation_matrix =
    Returns.Select((r_i, i) =>
        // Skip(i + 1) keeps only the later rows; k restarts at 0, so the original index is i + 1 + k
        Returns.Skip(i + 1).Select((r_j, k) =>
            stats.Covariance(r_i, r_j) / (volatilities[i] * volatilities[i + 1 + k])
        ).ToArray()
    ).ToArray();
If you want to use ranges (per your comment), you can do
var N = Returns.Length;
var correlation_matrix =
Enumerable.Range(0, N).Select(i =>
Enumerable.Range(i + 1, N - i - 1).Select(j =>
stats.Covariance(Returns[i], Returns[j]) / (volatilities[i] * volatilities[j])
).ToArray()
).ToArray();
That's not to say you should do this. The loop version is both more readable and more performant.
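For comparison, here is a self-contained sketch of the loop version that fills a full N x N matrix. The class and method names are hypothetical, the hand-written Covariance helper is only a stand-in for stats.Covariance (assumed here to be the unbiased sample covariance), and volatility is assumed to mean the sample standard deviation of each series.

```csharp
using System;
using System.Linq;

static class CorrelationSketch
{
    // Hypothetical stand-in for stats.Covariance: unbiased sample covariance (N - 1 divisor).
    static double Covariance(double[] x, double[] y)
    {
        double meanX = x.Average(), meanY = y.Average();
        double sum = 0;
        for (int k = 0; k < x.Length; k++)
            sum += (x[k] - meanX) * (y[k] - meanY);
        return sum / (x.Length - 1);
    }

    public static double[,] CorrelationMatrix(double[][] returns)
    {
        int n = returns.Length;
        // Assumption: volatility = sample standard deviation of each return series.
        double[] vol = returns.Select(r => Math.Sqrt(Covariance(r, r))).ToArray();
        var corr = new double[n, n];
        for (int i = 0; i < n; i++)
        {
            corr[i, i] = 1.0; // a series is perfectly correlated with itself
            for (int j = i + 1; j < n; j++)
            {
                double c = Covariance(returns[i], returns[j]) / (vol[i] * vol[j]);
                corr[i, j] = c;
                corr[j, i] = c; // mirror into the lower triangle
            }
        }
        return corr;
    }
}
```

Unlike the code in the question, this sketch also fills the diagonal and the lower triangle, so the result can be used as a complete symmetric matrix.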
Per the OP's request, here is an Enumerable.Aggregate version with a 2D array as the result:
var correlation_matrix =
Enumerable.Range(0, N).Select(i =>
Enumerable.Range(i + 1, N - i - 1).Select(j =>
new {
i, j, // capture location of the result
Data = i + j } // compute whatever you need here
)
)
.SelectMany(r => r) // flatten into single list
.Aggregate(
new double[N,N],
(result, item) => {
result[item.i, item.j] = item.Data; // use pos captured earlier
return result; // to match signature required by Aggregate
});
Side note: this is essentially an exercise in using LINQ, not something you should use in real code:
The code has to capture each position in an anonymous object, causing a lot of unnecessary allocations.
I think this version is significantly harder to read than the regular for version.
Example:
Say I have
var arr = new int[][] {
    new[] { 1, 9, 4 },
    new[] { 2, 4, 4 },
    new[] { 3, 0, 5 }
};
and say I want the indices of 3. So I want a method that does the equivalent of
Tuple<int,int> indices;
for(int i = 0; i < arr.Length; ++i)
for(int j = 0; j < arr[i].Length; ++j)
if(arr[i][j] == 3)
return Tuple.Create(i, j);
ideally without having to write any extension methods and ideally in a way that is compact and efficient.
You can do this in LINQ in a "compact" manner - but as the comments suggest, a regular loop will trounce this for efficiency:
var indexes = arr.Select((a, x) => a.Select((v, y) => new { X = x, Y = y })
.Where(z => arr[z.X][z.Y] == 3)).SelectMany(x => x);
Even using LINQ, you still have to traverse the entire collection and build up the indexes (into an anonymous type here) and determine if they meet your criteria (and then flatten the result using SelectMany).
Also note that this returns all instances; to get only the first occurrence, add a .First() on the end.
I'd strongly recommend a static helper or extension method in this case:
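A minimal sketch of what such a static helper could look like. The class name, the method name and the choice to return null when nothing matches are all assumptions made for illustration, not part of the original answer.

```csharp
using System;

static class JaggedArraySearch
{
    // Hypothetical helper: returns (row, column) of the first match, or null if there is none.
    public static Tuple<int, int> IndexOf(int[][] arr, int value)
    {
        for (int i = 0; i < arr.Length; i++)
            for (int j = 0; j < arr[i].Length; j++)
                if (arr[i][j] == value)
                    return Tuple.Create(i, j); // stops at the first hit, no extra allocations
        return null;
    }
}
```

With the array from the question, JaggedArraySearch.IndexOf(arr, 3) returns the tuple (2, 0), and it stops scanning as soon as the value is found.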
Requirements:
Create a list of n sequential numbers starting at a.
Exclude number x.
This is the best I have right now, the problem being that it creates n + 1 numbers if x is not within the range.
var numbers = Enumerable
.Range(a, numberOfDataRowsToAdd + 1)
.Where(i => i != TechnicalHeaderRowIndex);
Example 1 should produce 0,1,2,3,4,5,6,7,8,9.
var a = 0;
var n = 10;
var x = 11;
Example 2 should produce 0,1,2,3,4,5,7,8,9,10.
var a = 0;
var n = 10;
var x = 6;
Here is a Fiddle that demonstrates Mark's answer.
How about
Enumerable.Range(a, n + 1)
.Where(i => i != x)
.Take(n);
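To make it easier to check this one-liner against both examples from the question, it can be wrapped in a small helper. The class and method names below are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

static class SkipOneSketch
{
    // Hypothetical wrapper around the Range/Where/Take one-liner.
    public static IEnumerable<int> SequenceWithout(int a, int n, int x)
    {
        return Enumerable.Range(a, n + 1)   // one spare value in case x gets filtered out
                         .Where(i => i != x)
                         .Take(n);          // trims the spare when x was outside the range
    }
}
```

SequenceWithout(0, 10, 11) yields 0 through 9, and SequenceWithout(0, 10, 6) yields 0 through 5 followed by 7 through 10. Because Take(n) caps the length, the result has n items whether or not x falls inside the range.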
My example, how it can be done without LINQ and extra loop iterations:
public static IEnumerable<int> GenerateNumbers(int a, int n, int x)
{
for (var i = 0; i < n; i++)
{
if (a == x)
{
i--;
a++;
continue;
}
yield return a++;
}
}
But if you don't want to create a new method for this purpose, Mark Sowul's or Jakub Lortz's answers are better.
The problem can be described as
Get n + 1 sequential numbers starting from a
If x is in the range, remove x, otherwise remove the maximum number from the list
Translated to C#
int numberToExclude = x < a ? a + n : Math.Min(a + n, x); // x below the range also removes the maximum
var numbers = Enumerable.Range(a, n + 1).Where(i => i != numberToExclude);
It makes sense to generate only necessary values instead of generating n + 1 values and then remove x:
Enumerable.Range(a, n).Select(i => x < a || i < x ? i : i + 1); // shift only when x is at or after a
Example 1: 0,1,2,3,4,5,6,7,8,9.
Example 2: 0,1,2,3,4,5,7,8,9,10.
You can drop the last element if your enumerable's count is bigger than numberOfDataRowsToAdd.
Extension method:
public static IEnumerable<T> DropLast<T>(this IEnumerable<T> enumerable)
{
return enumerable.Take(enumerable.Count()-1);
}
Usage:
var numbers = Enumerable
.Range(a, numberOfDataRowsToAdd + 1)
.Where(i => i != TechnicalHeaderRowIndex);
if(numbers.Count() > numberOfDataRowsToAdd)
numbers = numbers.DropLast();
I don't see what really is the challenge - Linq shortest or fastest or just working. How about the natural (which should also be the fastest Linq based)
var numbers = a <= x && x < a + n ?
Enumerable.Range(a, x - a).Concat(Enumerable.Range(x + 1, a - x + n)) :
Enumerable.Range(a, n);
Assuming I have a list of numbers, which could be any amount, realistically over 15.
I want to separate that list of numbers into three groups depending on their size, small, medium, and large for instance.
What is the best way of achieving this?
I've written out the below, is it necessary to make my own function as per below, or is there anything existing that I can utilise in .NET?
public static List<int> OrderByThree (List<int> list)
{
list.Sort();
int n = list.Count();
int small = n / 3;
int medium = (2 * n) / 3;
int large = n;
// depending if the number is lower/higher than s/m/l,
// chuck into group via series of if statements
return list;
}
Example
Say I have a list of numbers, 1-15 for instance: I want 1-5 in small, 6-10 in medium and 11-15 in large. However, I won't know how many numbers there are up front, so I was hoping to divide by list.Count in my own function.
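Since you have the list sorted already, you can use some LINQ to get the results. I'm assuming right-closed intervals here. Note that the thresholds n / 3 and (2 * n) / 3 are compared against the values themselves, which matches the 1-15 example but not arbitrary numbers.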
list.Sort();
int n = list.Count();
var smallGroup = list.TakeWhile(x => (x <= n / 3)).ToList();
var middleGroup = list.Skip(smallGroup.Count).TakeWhile(x => (x <= (2 * n) / 3)).ToList();
var largeGroup = list.Skip(smallGroup.Count + middleGroup.Count).ToList();
EDIT
As Steve Padmore commented, you probably will want to return a list of lists (List<List<int>>) from your method, rather than just List<int>.
return new List<List<int>> { smallGroup, middleGroup, largeGroup };
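If the intent is to split by position in the sorted list rather than by value (so that the groups have roughly equal sizes whatever the numbers are), Skip and Take can be used instead of TakeWhile. This is a sketch under that assumption; the class and method names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

static class ThirdsSketch
{
    // Hypothetical helper: splits the sorted values into three groups by position.
    public static List<List<int>> SplitIntoThirds(List<int> list)
    {
        var sorted = list.OrderBy(x => x).ToList(); // leaves the caller's list untouched
        int n = sorted.Count;
        int small = n / 3;          // end index (exclusive) of the first group
        int medium = (2 * n) / 3;   // end index (exclusive) of the second group
        return new List<List<int>>
        {
            sorted.Take(small).ToList(),
            sorted.Skip(small).Take(medium - small).ToList(),
            sorted.Skip(medium).ToList()
        };
    }
}
```

For the numbers 1 to 15 this gives 1-5, 6-10 and 11-15, and for { 100, 5, 40, 7, 60, 90 } it gives { 5, 7 }, { 40, 60 } and { 90, 100 }. When the count is not divisible by three, the integer division decides which groups get the extra elements.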
This would be a simple way of doing it:
var result = list.GroupBy (x =>
{
if(x <= small) return 1;
if(x <= medium) return 2;
return 3;
});
Or:
var result = list.GroupBy (x => x <= small ? 1 : x <= medium ? 2 : 3);
(This does not require the list to be sorted)
I knew the function would involve a lot of data processing, but I didn't think it would end up taking minutes to run.
The function in question is fed a jagged 2D array made up of Paragraphs > Sentences. It is built from a text file supplied by the user, so it can be massive. The function compares every sentence with every other sentence and saves a score for each pair: the number of words they have in common.
This takes forever and I honestly didn't think it would.
My main test text is only 181 sentences long, but this translates to 32.7 thousand values scored in a 2D array.
This matrix of values is then used to calculate and select the most "relevant" sentences from each paragraph, and other things.
The 181-sentence text takes 1 min 15 seconds to process, and a text of only 70 sentences takes 35 seconds. These figures are based on the number of sentences, not words, but they give you an idea. I dread to think how long it would take on an actual document.
The function in question:
protected void Intersection2DArray()
{
mainSentenceCoord = -1;
for (int x1 = 0; x1 < results.Length; x1++)
{
for (int x2 = 0; x2 < results[x1].Length; x2++)
{
var mainSentencesWords = wordSplit(results[x1][x2]);
secondarySentenceCoord = -1;
mainSentenceCoord++;
for (int y1 = 0; y1 < results.Length; y1++)
{
for (int y2 = 0; y2 < results[y1].Length; y2++)
{
var secondarySentencesWords = wordSplit(results[y1][y2]);
int commonElements = mainSentencesWords.Intersect(secondarySentencesWords).ToList().Count();
secondarySentenceCoord++;
intersectionArray[mainSentenceCoord, secondarySentenceCoord] = commonElements;
}
}
}
}
}
The wordSplit function:
protected List<String> wordSplit(string sentence)
{
var symbols = "£$€#&%+-.";
var punctuationsChars = Enumerable.Range(char.MinValue, char.MaxValue - char.MinValue)
.Select(i => (char)i)
.Where(c => char.IsPunctuation(c))
.Except(symbols)
.ToArray();
var words = sentence.Split(punctuationsChars)
.SelectMany(x => x.Split())
.Where(x => !(x.Length == 1 && symbols.Contains(x[0])))
.Distinct()
.ToList();
return words;
}
I initially wanted to do this split with a single Regex, but couldn't figure it out; that might make it faster.
This loops through to compare each sentence against every other one, and it is the best I could come up with. I'm fine with doing a total overhaul if it will drastically increase speed.
Edit: Using Moby Disk's suggestion, here's my new, near-instant code:
The word split function, which is now called once and returns a List of Lists:
public List<List<string>> createWordList()
{
List<List<string>> wordList = new List<List<string>>();
var symbols = "£$€#&%+-.";
var punctuationsChars = Enumerable.Range(char.MinValue, char.MaxValue - char.MinValue)
.Select(i => (char)i)
.Where(c => char.IsPunctuation(c))
.Except(symbols)
.ToArray();
for (int x1 = 0; x1 < results.Length; x1++)
{
for (int x2 = 0; x2 < results[x1].Length; x2++)
{
var words = results[x1][x2].Split(punctuationsChars)
.SelectMany(x => x.Split())
.Where(x => !(x.Length == 1 && symbols.Contains(x[0])))
.Distinct()
.ToList();
wordList.Add(words);
}
}
return wordList;
}
And the now super slim Intersection Function
protected void intersectionMatrix()
{
List<List<string>> wordList = createWordList();
mainSentenceCoord = -1;
for (var x = 0; x < wordList.Count; x++)
{
secondarySentenceCoord = -1;
mainSentenceCoord++;
for (var y = 0; y < wordList.Count; y++)
{
secondarySentenceCoord++;
intersectionArray[mainSentenceCoord, secondarySentenceCoord] = wordList[x].Intersect(wordList[y]).Count();
}
}
}
See update at the end:
There is some "low-hanging fruit" here that could speed it up a lot without changing the algorithm itself:
1. wordSplit() recalculates "punctuationsChars" each time it is called. Do that once up front.
2. You are calling wordSplit() on the same sentence N^2 times instead of N times, since it is in both the outer (x1,x2) loop and the inner (y1,y2) loop. You only need to split 181 sentences, not 181^2 sentences. So instead, loop through results, call wordSplit once on each sentence, and store that result. If that takes up too much memory, look into memoization (http://en.wikipedia.org/wiki/Memoization), although I think you should be okay since it will only result in about 1 more copy of the text.
3. You don't need the ToList() after the Intersect(). That creates a list you don't use.
I'm confused as to the values of mainSentenceCoord and secondarySentenceCoord. What are the dimensions of the resulting intersectionArray?
OMG! #1 is it! That sped this up by a factor of 80x. Look at this line:
Enumerable.Range(char.MinValue, char.MaxValue - char.MinValue)
char.MaxValue is 65535. So if N=181, you are looping roughly 181 x 181 x 65535 times! I just ran the profiler to confirm it: 98.4% of the CPU time is spent in the ToArray() calls in wordSplit.
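The first two suggestions above can be sketched as follows: the punctuation array is built once in a static field, and each sentence is split once before the pairwise comparison. The class name and the IntersectionMatrix signature are hypothetical; the splitting logic itself is copied from the question.

```csharp
using System.Collections.Generic;
using System.Linq;

static class WordSplitSketch
{
    const string Symbols = "£$€#&%+-.";

    // Built once when the class is first used, instead of on every wordSplit call.
    static readonly char[] PunctuationChars =
        Enumerable.Range(char.MinValue, char.MaxValue - char.MinValue)
                  .Select(i => (char)i)
                  .Where(c => char.IsPunctuation(c))
                  .Except(Symbols)
                  .ToArray();

    public static List<string> WordSplit(string sentence)
    {
        return sentence.Split(PunctuationChars)
                       .SelectMany(x => x.Split())
                       .Where(x => !(x.Length == 1 && Symbols.Contains(x[0])))
                       .Distinct()
                       .ToList();
    }

    public static int[,] IntersectionMatrix(IList<string> sentences)
    {
        // Split each sentence once (N splits) instead of once per pair (N^2 splits).
        var words = sentences.Select(s => WordSplit(s)).ToList();
        var scores = new int[words.Count, words.Count];
        for (int x = 0; x < words.Count; x++)
            for (int y = 0; y < words.Count; y++)
                scores[x, y] = words[x].Intersect(words[y]).Count(); // no ToList() needed
        return scores;
    }
}
```

For the two sentences "a b c" and "b c d", the off-diagonal scores are 2 (the shared words b and c) and the diagonal entries are 3.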
I'd like to create an int[] filled with the values [0,1,2....X] (so X + 1 elements).
e.g.
public int[] CreateAA(int X){}
int[] AA = CreateAA(9) => [0,1,2,3,4,5,6,7,8,9]
Is there an easy method for this, or do I have to loop and initialise each value?
You can use the functionality that Enumerable provides.
int[] arr = Enumerable.Range(0, X+1).ToArray();
This creates an IEnumerable<int> sequence for you, and .ToArray() turns it into the int array you need.
So for X=9 in your case, it generates the array [0,1,2,3,4,5,6,7,8,9] (as you need).
Using Enumerable.Range(0, 10).ToArray() is very concise, but if you want to create a very large array, the ToArray extension method may have to collect the numbers into a buffer that is reallocated multiple times. On each reallocation the contents of the buffer are copied to the new, larger buffer. The classic .NET Framework implementation doubles the size of the buffer on each reallocation (starting from four items); newer runtimes may special-case Range and allocate the array only once.
So if you want to avoid multiple reallocations of the buffer you need to create the array in advance:
int[] aa = new int[10];
for (var i = 0; i < aa.Length; i += 1)
aa[i] = i;
This is the most efficient way of initializing the array.
However, if you need an array of say 100,000,000 consecutive numbers then you should look at a design where you don't have to keep all the numbers in an array to avoid the impact of the memory requirement. IEnumerable<int> is very useful for this purpose because you don't have to allocate the entire sequence but can produce it while you iterate and that is exactly what Enumerable.Range does. So avoiding the array of consecutive numbers in the first place may be even better than thinking about how to create it.
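As a small illustration of that last point, the sketch below consumes a range lazily instead of storing it. The class name, the method name and the choice of summing multiples are assumptions picked only to have something to compute.

```csharp
using System.Linq;

static class LazyRangeSketch
{
    // Sums the multiples of `step` in [0, count) without ever allocating an int[count].
    public static long SumOfMultiples(int count, int step)
    {
        return Enumerable.Range(0, count)       // values are produced one at a time
                         .Where(i => i % step == 0)
                         .Sum(i => (long)i);    // long avoids overflow for large ranges
    }
}
```

LazyRangeSketch.SumOfMultiples(10, 3) returns 18 (0 + 3 + 6 + 9). The same call with a count of 100,000,000 still only holds one number in memory at a time.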
Why make a function when one is already there?
For this specific example, use
int[] AA = Enumerable.Range(0, 10).ToArray();
where 0 is the starting value and 10 (X + 1) is the length of array
So a general one applicable to all
int[] AA = Enumerable.Range(0, X + 1).ToArray();
with function and loop:
static int[] f(int X)
{
int[] a = new int[X+1];
for(int i = 0; i < a.Length; i++)
a[i] = i;
return a;
}
To initialize try this
int x = 10;
int[] AA = Enumerable.Range(0, x).ToArray(); // 0, 1, ..., x - 1
For completeness, here is a function that creates an array.
I made it a bit more versatile by having parameters for the min and max value, i.e. CreateArray(0, 9) returns {0,1,2,3,4,5,6,7,8,9}.
static int[] CreateArray(int min, int max) {
int[] a = new int[max - min + 1];
for (int i = 0; i < a.Length; i++) {
a[i] = min + i;
}
return a;
}