Hi, I need to find the biggest dense region in a List of values, based on a given range.
example:
var radius = 5; // term edited
var list = new List<int>{0,1,2,3,4,5,12,15,16,22,23,24,26,27,28,29};
//the following dense regions exist in the list above
var region1 = new List<int> { 0, 1, 2, 3, 4, 5 }; // found 6 times (around 0, 1, 2, 3, 4, 5)
var region2 = new List<int> { 12, 15, 16}; // found 3 times (around 12, 15, 16)
var region3 = new List<int> { 22, 23, 24, 26, 27}; // found once (around 22)
var region4 = new List<int> { 22, 23, 24, 26, 27, 28}; // found once (around 23)
var region5 = new List<int> { 22, 23, 24, 26, 27, 28, 29 }; // found 3 times (around 24, 26, 27)
var region6 = new List<int> { 23, 24, 26, 27, 28, 29 }; // found once (around 28)
var region7 = new List<int> { 24, 26, 27, 28, 29 }; // found once (around 29)
//var result{22,23,24,26,27,28,29}
The solution doesn't really need to be fast, because the maximum number of values is 21.
Is there a way to achieve this with fluent syntax?
I only know how to get the closest value:
int closest = list.Aggregate((x,y) => Math.Abs(x-number) < Math.Abs(y-number) ? x : y);
and how to get values between 2 numbers
var between = list.Where(value=> min < value && value < max);
Edit
Additional information:
OK, "range" is maybe the wrong term; "radius" would be a better word.
I define the dense region as the largest set of values lying between currentValue - radius and currentValue + radius, taken over every value in the list.
A rather cryptic (but short) way would be:
int w = 5; // window size
var list = new List<int> { 0, 1, 2, 3, 4, 5, 12, 15, 16, 22,
23, 24, 26, 27, 28, 29 };
var result = list.Select(x => list.Where(y => y >= x - w && y <= x + w))
.Aggregate((a, b) => (a.Count() > b.Count()) ? a : b);
Console.WriteLine(string.Join(",", result.ToArray()));
Prints
22,23,24,26,27,28,29
This code consists of 3 steps:
For a given x the snippet list.Where(y => y >= x - w && y <= x + w) gives all elements from the list that are in the cluster around x.
list.Select(x => ...) computes that cluster for every element of the list.
.Aggregate((a, b) => (a.Count() > b.Count()) ? a : b) takes the cluster of maximum size.
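The three steps can also be wrapped in a small helper. This is only a sketch and not part of the answer above: the names `DenseRegions.Largest` and `radius` are made up for illustration, and each cluster is materialized with `ToList()` so that it is not re-enumerated every time its size is compared.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DenseRegions
{
    // Hypothetical helper: returns the largest cluster of values that lie
    // within `radius` of some element of `values`.
    public static List<int> Largest(IReadOnlyCollection<int> values, int radius)
    {
        if (values == null) throw new ArgumentNullException(nameof(values));
        if (values.Count == 0) return new List<int>();

        return values
            // Steps 1 and 2: build the cluster around every element x.
            .Select(x => values.Where(y => y >= x - radius && y <= x + radius).ToList())
            // Step 3: keep the bigger cluster; on a tie the later one wins,
            // which matches the comparison used in the snippet above.
            .Aggregate((a, b) => a.Count > b.Count ? a : b);
    }
}
```

Calling `DenseRegions.Largest(list, 5)` with the list from the question should give the same seven values as the one-liner.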
Initial DataFrame in Pandas
Let's suppose we have the following in Python with pandas:
import pandas as pd
df = pd.DataFrame({
"Col1": [10, 20, 15, 30, 45],
"Col2": [13, 23, 18, 33, 48],
"Col3": [17, 27, 22, 37, 52] },
index=pd.date_range("2020-01-01", "2020-01-05"))
df
Here's what we get in Jupyter:
Shifting columns
Now let's shift Col1 by 2 and store it in Col4.
We'll also store df['Col1'] / df['Col1'].shift(2) in Col5:
df_2 = df.copy(deep=True)
df_2['Col4'] = df['Col1'].shift(2)
df_2['Col5'] = df['Col1'] / df['Col1'].shift(2)
df_2
The result:
C# version
Now let's set up a similar DataFrame in C#:
#r "nuget:Microsoft.Data.Analysis"
using Microsoft.Data.Analysis;
var df = new DataFrame(
new PrimitiveDataFrameColumn<DateTime>("DateTime",
Enumerable.Range(0, 5).Select(i => new DateTime(2020, 1, 1).Add(new TimeSpan(i, 0, 0, 0)))),
new PrimitiveDataFrameColumn<int>("Col1", new []{ 10, 20, 15, 30, 45 }),
new PrimitiveDataFrameColumn<int>("Col2", new []{ 13, 23, 18, 33, 48 }),
new PrimitiveDataFrameColumn<int>("Col3", new []{ 17, 27, 22, 37, 52 })
);
df
The result in .NET Interactive:
Question
What's a good way to perform the equivalent column shifts as demonstrated in the Pandas version?
Notes
The above example is from the documentation for pandas.DataFrame.shift:
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.shift.html
Update
It does indeed look like there isn't currently a built-in shift in Microsoft.Data.Analysis. I've posted an issue for this here:
https://github.com/dotnet/machinelearning/issues/6008
Helper functions
Perform a column shift.
PrimitiveDataFrameColumn<double> ShiftIntColumn(PrimitiveDataFrameColumn<int> col, int n, string name)
{
return
new PrimitiveDataFrameColumn<double>(
name,
Enumerable.Repeat((double?) null, n)
.Concat(col.Select(item => (double?) item))
.Take(col.Count()));
}
Carry out division, taking care of null values in divisor.
PrimitiveDataFrameColumn<double> DivAlt3(PrimitiveDataFrameColumn<int> a, PrimitiveDataFrameColumn<double> b, string name)
{
return
new PrimitiveDataFrameColumn<double>(name, a.Zip(b, (x, y) => y == null ? null : x / y));
}
Then the following:
var df = new DataFrame(
new PrimitiveDataFrameColumn<DateTime>("DateTime",
Enumerable.Range(0, 5).Select(i =>
new DateTime(2020, 1, 1).Add(new TimeSpan(i, 0, 0, 0)))),
new PrimitiveDataFrameColumn<int>("Col1", new []{ 10, 20, 15, 30, 45 }),
new PrimitiveDataFrameColumn<int>("Col2", new []{ 13, 23, 18, 33, 48 }),
new PrimitiveDataFrameColumn<int>("Col3", new []{ 17, 27, 22, 37, 52 })
);
df.Columns.Add(ShiftIntColumn((PrimitiveDataFrameColumn<int>)df["Col1"], 2, "Col4"));
df.Columns.Add(DivAlt3((PrimitiveDataFrameColumn<int>) df["Col1"], (PrimitiveDataFrameColumn<double>) df["Col4"], "Col5"));
results in:
Complete notebook
See the following notebook for a full demonstration of the above:
https://github.com/dharmatech/dataframe-shift-example-cs/blob/003/dataframe-shift-example-cs.ipynb
Notes
It would be great if Microsoft.Data.Analysis came with column shift functionality.
It would also be great if column division handled nulls natively.
Would love to see other perhaps more idiomatic approaches to this.
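To see the shift-and-divide logic of the helper functions in isolation, here is a sketch that works on plain arrays and does not touch Microsoft.Data.Analysis at all. The class and method names (`ShiftSketch`, `Shift`, `Divide`) are hypothetical, and missing values are represented as `null` where pandas would show NaN.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ShiftSketch
{
    // Hypothetical helper: shifts the values down by n positions (n >= 0),
    // padding the front with nulls and keeping the original length.
    public static double?[] Shift(IReadOnlyList<int> values, int n)
    {
        if (n < 0) throw new ArgumentOutOfRangeException(nameof(n));
        return Enumerable.Repeat((double?)null, n)
            .Concat(values.Select(v => (double?)v))
            .Take(values.Count)
            .ToArray();
    }

    // Hypothetical helper: element-wise a / b, giving null wherever the divisor is null.
    public static double?[] Divide(IReadOnlyList<int> a, IReadOnlyList<double?> b)
    {
        return a.Zip(b, (x, y) => y.HasValue ? x / y.Value : (double?)null).ToArray();
    }
}
```

With Col1 = 10, 20, 15, 30, 45, `Shift(col1, 2)` gives null, null, 10, 20, 15 and `Divide(col1, shifted)` gives null, null, 1.5, 1.5, 3.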
I have a list of numbers following an arithmetic sequence with a gap in between. For example [0,2,4,6,8,10,12,14,16,18,20,22,24,26,146,148,150,152,154,156,158,160,162,164,166,168,170,172,174,176,178,180]
I want to select the numbers till the gap is reached. In this example [0,2,4,6,8,10,12,14,16,18,20,22,24,26].
I have written this and I was wondering if this can be simplified using LINQ
double commonDifference = 2;
List<double> oldList = new List<double>() {
0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26,
146, 148, 150, 152, 154, 156, 158, 160, 162, 164, 166, 168, 170, 172, 174, 176, 178, 180
};
List<double> newList = new List<double>();
for (int i = 0; i < oldList.Count - 1; i++)
{
if (oldList[i + 1] - oldList[i] == commonDifference)
{
newList.Add(oldList[i]);
}
else
{
newList.Add(oldList[i]);
break;
}
}
Check this Fiddle:
var arr = new [] {0,2,4,6,8,10,12,14,16,18,20,22,24,26,146,148,150,152,154,156,158,160,162,164,166,168,170,172,174,176,178,180};
var res = arr.TakeWhile(x => x <= arr.Aggregate((p,q) => p != q - 2 ? p : q)) ;
res.Dump();
You can try looping while getting difference from the first items; let's generalize the routine:
private static IEnumerable<double> UpToGap(IEnumerable<double> data) {
if (null == data)
throw new ArgumentNullException(nameof(data));
double difference = 0;
double prior = 0;
long count = 0;
// Math.Abs: when working with floating point (double)
// we should compare with tolerance, say 1e-8
foreach (double item in data) // double, not int: an int loop variable would truncate the values
if (count < 2 || Math.Abs(item - prior - difference) < 1e-8) {
yield return item;
count += 1;
difference = item - prior;
prior = item;
}
else
break;
}
Now you can exploit it:
using System.Linq;
...
List<double> newList = UpToGap(oldList).ToList();
Be careful: although LINQ queries can look fancy, they sometimes use more than one iterator. This is a case where a simple foreach would be just as readable and maintainable, and also slightly more performant, since you iterate through the list only once.
For example, I liked the approach of #presi:
var newList = oldList.Zip(oldList.Skip(1), (previous, item) => new {previous, item}).Where(x => x.previous - x.item == commonDifference).Select(s => s.item).ToList();
But as you can see, oldList is iterated in the .Zip method, and again when it calls oldList.Skip(1)
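For reference, a pairwise variant that actually stops at the first gap could look like the sketch below. It is not the quoted code: it assumes `TakeWhile` instead of `Where`, compares `next - current` against the common difference, and uses a hypothetical helper name. It still walks the source with two enumerators, which is the cost described above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GapSketch
{
    // Hypothetical helper: returns the leading run of `source` in which
    // consecutive elements differ by exactly `commonDifference`.
    public static List<double> UpToGapWithZip(IReadOnlyList<double> source, double commonDifference)
    {
        if (source.Count == 0) return new List<double>();

        var rest = source
            .Zip(source.Skip(1), (current, next) => new { current, next })
            // Stop at the first pair whose difference is not the expected one.
            .TakeWhile(p => p.next - p.current == commonDifference)
            .Select(p => p.next);

        // The first element has no predecessor, so it is prepended explicitly.
        return new[] { source[0] }.Concat(rest).ToList();
    }
}
```

Because every pair contributes its second element, the last item before the gap is included, and a list without any gap is returned in full.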
You can always stick it into a function or extension method yourself. Here's another such example:
public static void Main()
{
double commonDifference = 2;
List<double> oldList = new List<double>()
{0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 146, 148, 150, 152, 154, 156, 158, 160, 162, 164, 166, 168, 170, 172, 174, 176, 178, 180};
List<double> newList = new List<double>();
foreach (var v in UpToGap(oldList, commonDifference))
{
Console.Write(v + " ");
}
Console.WriteLine("");
Console.WriteLine("> Done");
}
public static IEnumerable<double> UpToGap(IList<double> oldList, double diff) // Note this could be an extension method aswell
{
for (int i = 0; i < oldList.Count - 1; i++)
{
if (oldList[i] + diff == oldList[i + 1])
yield return oldList[i]; // Return items in an iterator
else
{
yield return oldList[i]; // Return the last item, before the gap
yield break; // Stop the iterator and return
}
}
}
0 2 4 6 8 10 12 14 16 18 20 22 24 26
> Done
https://dotnetfiddle.net/AEzXO3
This uses iterators with the yield keyword, which LINQ uses extensively, so it's good to know about.
https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/concepts/iterators
This also shows what deferred execution means, which can be either an advantage or a disadvantage (depending on what you want, of course) when using LINQ.
https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/concepts/linq/deferred-execution-example
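A tiny sketch can make deferred execution visible. The `DeferredSketch` class and its `Calls` counter are invented for this illustration; the counter only changes while something is actually enumerating the iterator.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredSketch
{
    // Counts how many items the iterator body has produced so far.
    public static int Calls;

    public static IEnumerable<int> Numbers()
    {
        for (int i = 1; i <= 3; i++)
        {
            Calls++;          // runs only while someone is enumerating
            yield return i;
        }
    }
}
```

Building a query such as `DeferredSketch.Numbers().Select(x => x * 2)` leaves `Calls` at 0. Calling `ToList()` on it runs the iterator and raises `Calls` to 3, and enumerating the same query again runs the iterator body again, which is the disadvantage side of deferred execution.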
You can implement your own extension method and move from one element to the next more efficiently by using the enumerator directly.
public static IEnumerable<int> TakeUntilFirstGap(this IEnumerable<int> values)
{
using (var e = values.GetEnumerator())
{
if (e.MoveNext() == false)
{
yield break;
}
var first = e.Current;
yield return first; // a single element has no gap, so it is always returned
if (e.MoveNext() == false)
{
yield break;
}
var previous = e.Current;
yield return previous;
var originalGap = previous - first;
while (e.MoveNext())
{
if (e.Current - previous != originalGap)
{
yield break;
}
yield return e.Current;
previous = e.Current;
}
}
}
Usage
var withoutGap = values.TakeUntilFirstGap();
List<double> oldList = new List<double>() { 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 146, 148, 150, 152, 154, 156, 158, 160, 162, 164, 166, 168, 170, 172, 174, 176, 178, 180 };
var newList = oldList
.Select((item, index) => (item, index))
.Where(item => item.index + 1 < oldList.Count)
.TakeWhile<(double current, int idx)>(item => item.current - oldList[item.idx + 1] == commonDifference)
.Select(item => item.current)
.ToList();
Explanation:
1. The first Select projects each element into a tuple of the item (double) and its index (int).
2. Without the break condition you would need to use Where, but because of the early exit you should use TakeWhile.
2.1. Here you need to name the ValueTuple elements (double current, int idx) in the generic type parameter of TakeWhile. If you tried to deconstruct the tuple inside TakeWhile's predicate, it would not compile.
3. The second Select is needed to drop the index from the result.
UPDATE #1: I've added a new Where clause to my proposed solution because, as juharr has pointed out, it could throw an ArgumentOutOfRangeException by indexing past the end of the list.
UPDATE #2: The original author of the question has fixed a bug in his/her algorithm. The newest code to reflect those changes:
List<double> oldList = new List<double>() { 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 146, 148, 150, 152, 154, 156, 158, 160, 162, 164, 166, 168, 170, 172, 174, 176, 178, 180 };
var newList = oldList
.Select((item, index) => (item, index))
.Where(item => item.index + 1 < oldList.Count)
.TakeWhile<(double current, int idx)>(item => oldList[item.idx + 1] - item.current == commonDifference)
.Select(item => item.current)
.ToList();
The main thing is to find out what differentiates the 1st group of numbers from the 2nd. Once we know this, it is easy to construct a query. As far as I can see, a) any member of the 1st group is smaller than any member of the 2nd group by at least 100; and b) neighbouring numbers within a group differ by 2. So let's "convert" these conditions into LINQ form:
var firstGroup = oldList
.Where(
e => oldList.Any(e2 => e2 - e > 100)
&& oldList.Any(e2 => Math.Abs(e2 - e) == 2))
.ToList();
Suppose I have the following array (my sequences are all sorted in ascending order, and contain positive integers)
var tabSequence = new[] { 1, 2, 3, 7, 8, 9, 12, 15, 16, 17, 22, 23, 32 };
I wrote some code using LINQ and a loop to find the missing numbers, like this:
List<Int32> lstSearch = new List<int>();
var lstGroup = tabSequence
.Select((val, ind) => new { val, group = val - ind })
.GroupBy(v => v.group, v => v.val)
.Select(group => new{ GroupNumber = group.Key, Min = group.Min(), Max = group.Max() }).ToList();
for (int number = 0; number < lstGroup.Count; number++)
{
if (number < lstGroup.Count-1)
{
for (int missingNumber = lstGroup[number].Max+1; missingNumber < lstGroup[number+1].Min; missingNumber++)
lstSearch.Add(missingNumber);
}
}
var tabSequence2 = lstSearch.ToArray();
// Same result as var tabSequence2 = new[] {4, 5, 6, 10, 11, 13, 14, 18, 19, 20, 21, 24, 25, 26, 27, 28, 29, 30, 31 };
This code works, but I'd like to know if there is a better way to do the same thing using only LINQ.
Maybe I am just not understanding the problem. Your code seems very complicated, you could make this a lot simpler:
int[] tabSequence = new[] { 1, 2, 3, 7, 8, 9, 12, 15, 16, 17, 22, 23, 32 };
var results = Enumerable.Range(1, tabSequence.Max()).Except(tabSequence);
//results is: 4, 5, 6, 10, 11, 13, 14, 18, 19, 20, 21, 24, 25, 26, 27, 28, 29, 30, 31
I made a fiddle here
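If the sequence does not start at 1, the same idea can be anchored at the smallest value instead. The sketch below is an assumption-laden variant, not the code from the fiddle: the names `MissingSketch.Missing` are hypothetical, and it assumes the span between the smallest and largest value fits comfortably in an int.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MissingSketch
{
    // Hypothetical helper: every integer between the smallest and the largest
    // value of `sequence` that does not occur in `sequence`.
    public static List<int> Missing(IReadOnlyCollection<int> sequence)
    {
        if (sequence.Count == 0) return new List<int>();

        int min = sequence.Min();
        int max = sequence.Max();

        // Range(start, count): count is the number of candidates, not the end value.
        return Enumerable.Range(min, max - min + 1).Except(sequence).ToList();
    }
}
```

For the array in the question this gives the same numbers as before, and for an input such as 7, 8, 9, 12 it gives 10, 11 rather than also listing 1 to 6.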
You can use Enumerable.Aggregate to your advantage. The overload I chose takes an accumulator seed (an empty List<IEnumerable<int>>) and then iterates over each item in your array.
On the first iteration I set lastNr, which is defined before the Aggregate call, to the first number we iterate over. On each following iteration we compare the current nr against this lastNr.
If we are in sequence, we just advance lastNr.
If not, we generate the missing numbers between lastNr and the current nr via Enumerable.Range(start, count) and add them to our accumulator list. Then we set lastNr to nr and continue.
public static List<IEnumerable<int>> GetMissingSeq(int[] seq)
{
var lastNr = int.MinValue;
var missing = seq.Aggregate(
new List<IEnumerable<int>>(),
(acc, nr) =>
{
if (lastNr == int.MinValue || lastNr == nr - 1)
{
lastNr = nr; // first ever or in sequence
return acc; // nothing to do
}
// not in sequence, add the missing numbers to our accumulator list
acc.Add(Enumerable.Range(lastNr + 1, nr - lastNr - 1));
lastNr = nr; // that's the new lastNr to compare against in the next iteration
return acc;
}
);
return missing;
}
Tested by:
public static void Main(string[] args)
{
var tabSequence = new[] { 1, 2, 3, 7, 8, 9, 12, 15, 16, 17, 22, 23, 32 };
var lastNr = int.MinValue;
var missing = tabSequence.Aggregate(
new List<IEnumerable<int>>(),
(acc, nr) =>
{
if (lastNr == int.MinValue || lastNr == nr - 1)
{
lastNr = nr; // first ever or in sequence
return acc; // nothing to do
}
acc.Add(Enumerable.Range(lastNr + 1, nr - lastNr - 1));
lastNr = nr; // new lastNr to compare against in the next iteration
return acc;
}
);
Console.WriteLine(string.Join(", ", tabSequence));
foreach (var inner in GetMissingSeq(tabSequence))
Console.WriteLine(string.Join(", ", inner));
Console.ReadLine();
}
Output:
1, 2, 3, 7, 8, 9, 12, 15, 16, 17, 22, 23, 32 // original followed by missing sequences
4, 5, 6
10, 11
13, 14
18, 19, 20, 21
24, 25, 26, 27, 28, 29, 30, 31
If you are not interested in the subsequences you can use GetMissingSeq(tabSequence).SelectMany(i => i) to flatten them into one IEnumerable.
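The same flattening idea also works without a captured variable, by pairing each element with its successor. This is only a sketch of an alternative, with hypothetical names, and it assumes the input is sorted in ascending order as stated in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PairwiseGaps
{
    // Hypothetical helper: walks consecutive pairs of a sorted sequence and
    // emits the integers that are missing between each pair.
    public static IEnumerable<int> Missing(IReadOnlyList<int> sorted)
    {
        return sorted
            .Zip(sorted.Skip(1), (prev, next) => new { prev, next })
            // Range needs a non-negative count; Math.Max guards against duplicates.
            .SelectMany(p => Enumerable.Range(p.prev + 1, Math.Max(0, p.next - p.prev - 1)));
    }
}
```

Neighbouring values produce an empty range, so only real gaps contribute to the flattened result.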
This may be a bit difficult to describe, but I am building some statistical models and I need help with Linq syntax... My analogy is not exactly what I'm doing but it is the simplest description I can come up with...
I have an "ordered" list of inputs... Let's assume that I have an indeterminate list of Int's within a range... This list could contain anywhere from 1 to several million entities...
List<int> = [1, 3, 15, 16, 4, 27, 65, 2, 99, 3, 16, 21, 72, 1, 5, 7, 2, 8... ] (range 1 - 100).
What I am looking to extrapolate is "all" the ranges that contain a specific "search" set (or sub list) of entities. Each sub-list must contain "all" the entities from within the original input list, that is to maintain the inner erroneous data, and must not change the order... for instance, if I wanted to search for "all" ranges that contain [1, 2, 3, 4] from the list above I should get
[1, 3, 15, 16, 4, 27, 65, 2] - the first sub-list that contains the union of the search list.
[3, 15, 16, 4, 27, 65, 2, 99, 3, 16, 21, 72, 1] - next list...
[4, 27, 65, 2, 99, 3, 16, 21, 72, 1] - next list... etc...
The real critical piece of information is the "starting and ending indices" of each list... This data will need to be stored for use in a Neural Network as vector data... With this data the NN could simply use the index object to do the critical data calculations...
After some evaluation I realized that obviously each list will start and end with a search entity. This led me to start with this...
var indicies = lc.IntListData
.Select((v, i) => new { value = v, index = i })
.Where(n => search.Contains(n.value))
.ToList();
This reduced my list extensively, from looking at lists of millions of values to looking at thousands of anonymous types of value and index... Now, what I believe I need, is to find from "indicies", the first "n" anonymous types until I have at least one of each "value" in the list... No? Then simply use the min and max of the index values as necessary...
Any help with the Linq syntax to accomplish this would be most helpful.
LINQ version (with helper extension method):
If you're willing to accept a helper function to create tuples from your list of potential matches, you can get the LINQ down to this:
var subranges = matches
.Tuples(search.Length)
.Where(t => t.Select(n => n.Value).Distinct().Count() == search.Length)
.Select(t => new { StartIndex=t.Min(n => n.Index), EndIndex=t.Max(n => n.Index) })
.Select(r => list.Skip(r.StartIndex).Take(r.EndIndex-r.StartIndex+1))
;
The Tuples method is a variation of the extension method from this answer: https://stackoverflow.com/a/577612/209103.
public static IEnumerable<IEnumerable<T>> Tuples<T>(this IEnumerable<T> sequence, int nTuple)
{
if(nTuple <= 0) throw new ArgumentOutOfRangeException("nTuple");
for(int i = 0; i <= sequence.Count() - nTuple; i++)
for (int j = i+nTuple; j <= sequence.Count(); j++) // <= so that runs ending at the last element are included
yield return sequence.Skip(i).Take(j-i);
}
Note that Tuples is O(n^2) and the entire solution is O(n^3), so it will not perform well for larger data sets.
Non LINQ versions below
Rough first draft of a non-LINQ version:
var list = new [] { 1, 3, 15, 16, 4, 27, 65, 2, 99, 3, 16, 21, 72, 1, 5, 7, 2, 8 };
var search = new [] { 1, 2, 3, 4 };
var group = new List<int>();
var start = -1;
for (var i=0; i < list.Length; i++) {
if (search.Contains(list[i])) {
if (!group.Any()) {
start = i;
}
if (!group.Contains(list[i])) {
group.Add(list[i]);
}
if (group.Count == search.Length) {
Console.WriteLine(start+" - "+i);
group.Clear();
i = start; // the loop's i++ then resumes the scan at start + 1
start = -1;
}
}
}
This uses brute force, but could be optimized with your method of finding matching indices first.
Update - optimized by only considering relevant indices
var list = new [] { 1, 3, 15, 16, 4, 27, 65, 2, 1, 99, 3, 16, 21, 72, 1, 5, 7, 4, 2, 8 };
var search = new [] { 1, 2, 3, 4 };
var matches = list
.Select((v,i) => new { Value=v, Index=i })
.Where(n => search.Contains(n.Value))
.ToList()
;
for (var start=0; start < matches.Count(); start++) {
var group = new List<int>();
group.Add(matches[start].Value);
foreach (var match in matches.Skip(start)) {
if (!group.Contains(match.Value)) {
group.Add(match.Value);
}
if (group.Count == search.Length) {
Console.WriteLine(matches[start].Index+" - "+match.Index);
break;
}
}
}
Both approaches are non-LINQ. I'm not sure how to go about turning this into a LINQ expression. It's clearly a grouping exercise (.GroupBy), but I can't see what expression to group on.
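One way to phrase the optimized loop as a query, without GroupBy, is to treat every match as a candidate start and look for the first later match that completes the search set. The following is a sketch under that assumption, with hypothetical names (`CoverSketch.Ranges`); it recounts distinct values for every candidate end, so it is cubic in the number of matches and only meant to show the shape of the query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CoverSketch
{
    // Hypothetical helper: for each matching element, the index range that
    // starts there and ends at the first element completing the search set.
    public static List<(int Start, int End)> Ranges(IReadOnlyList<int> list, IReadOnlyCollection<int> search)
    {
        var wanted = new HashSet<int>(search);

        // Same pre-filtering as above: keep only relevant values with their indices.
        var matches = list
            .Select((v, i) => new { Value = v, Index = i })
            .Where(m => wanted.Contains(m.Value))
            .ToList();

        return Enumerable.Range(0, matches.Count)
            .Select(start => new
            {
                Start = matches[start].Index,
                // Index of the first match that completes the set, or null if none does.
                End = Enumerable.Range(start, matches.Count - start)
                    .Where(end => matches
                        .Skip(start).Take(end - start + 1)
                        .Select(m => m.Value).Distinct().Count() == wanted.Count)
                    .Select(end => (int?)matches[end].Index)
                    .FirstOrDefault()
            })
            .Where(r => r.End.HasValue)
            .Select(r => (Start: r.Start, End: r.End.Value))
            .ToList();
    }
}
```

For the first sample list (the one without the extra 1 and 4) and the search set 1, 2, 3, 4, this yields the index pairs 0 to 7, 1 to 13 and 4 to 13, which correspond to the three sub-lists given in the question.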
For example
List contains integer values 34, 78, 20, 10, 17, 99, 101, 24, 50, 13
and the value to put is 11 at position 1, 4 and 5
Position is the index value which starts from 0
so the final result is => 34, 11, 78, 20, 10, 11, 17, 11, 99, 101, 24, 50, 13
My current code is as follows:
List<int> list_iNumbers = new List<int>();
list_iNumbers.Add(34);
list_iNumbers.Add(78);
list_iNumbers.Add(20);
list_iNumbers.Add(10);
list_iNumbers.Add(17);
list_iNumbers.Add(99);
list_iNumbers.Add(101);
list_iNumbers.Add(24);
list_iNumbers.Add(50);
list_iNumbers.Add(13);
List<int> list_iPosition = new List<int>();
list_iPosition.Add(1);
list_iPosition.Add(4);
list_iPosition.Add(5);
int iValueToInsert = 11;
Now, how do I insert at these positions and get the correct result?
Use the Insert(index, element) method instead of Add. Something like this:
foreach(var pos in list_iPosition.OrderByDescending(x => x))
list_iNumbers.Insert(pos, iValueToInsert);
You have to insert starting from the highest index; otherwise each insertion shifts the positions that come after it. That's why I used OrderByDescending first.
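The effect of the ordering is easy to check with a small helper that works on a copy of the list. This is a sketch with hypothetical names (`InsertSketch.InsertAt`), assuming every position is a valid index into the original list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class InsertSketch
{
    // Hypothetical helper: returns a copy of `source` with `value` inserted
    // in front of each of the given original positions.
    public static List<int> InsertAt(IEnumerable<int> source, IEnumerable<int> positions, int value)
    {
        var result = source.ToList();

        // Highest position first, so the lower positions still to be
        // processed are not shifted by earlier insertions.
        foreach (var pos in positions.OrderByDescending(p => p))
            result.Insert(pos, value);

        return result;
    }
}
```

With the numbers from the question and positions 1, 4 and 5, this produces 34, 11, 78, 20, 10, 11, 17, 11, 99, 101, 24, 50, 13. Inserting in ascending order without any adjustment would instead start with 34, 11, 78, 20, 11, 11, 10, because the first insertion moves everything after it by one.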
Non Linq Solution:
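list_iPosition.Sort(); // ascending order
for (int i = 0; i < list_iPosition.Count; i++)
{
// each earlier insertion has shifted the later positions by one, hence + i
list_iNumbers.Insert(list_iPosition[i] + i, iValueToInsert);
}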