Equivalent of `shift` from pandas - c#

Initial DataFrame in Pandas
Let's suppose we have the following in Python with pandas:
import pandas as pd
df = pd.DataFrame({
"Col1": [10, 20, 15, 30, 45],
"Col2": [13, 23, 18, 33, 48],
"Col3": [17, 27, 22, 37, 52] },
index=pd.date_range("2020-01-01", "2020-01-05"))
df
Here's what we get in Jupyter:
Shifting columns
Now let's shift Col1 by 2 and store it in Col4.
We'll also store df['Col1'] / df['Col1'].shift(2) in Col5:
df_2 = df.copy(deep=True)
df_2['Col4'] = df['Col1'].shift(2)
df_2['Col5'] = df['Col1'] / df['Col1'].shift(2)
df_2
The result:
C# version
Now let's setup a similar DataFrame in C#:
#r "nuget:Microsoft.Data.Analysis"
using Microsoft.Data.Analysis;
var df = new DataFrame(
new PrimitiveDataFrameColumn<DateTime>("DateTime",
Enumerable.Range(0, 5).Select(i => new DateTime(2020, 1, 1).Add(new TimeSpan(i, 0, 0, 0)))),
new PrimitiveDataFrameColumn<int>("Col1", new []{ 10, 20, 15, 30, 45 }),
new PrimitiveDataFrameColumn<int>("Col2", new []{ 13, 23, 18, 33, 48 }),
new PrimitiveDataFrameColumn<int>("Col3", new []{ 17, 27, 22, 37, 52 })
);
df
The result in .NET Interactive:
Question
What's a good way to perform the equivalent column shifts as demonstrated in the Pandas version?
Notes
The above example is from the documentation for pandas.DataFrame.shift:
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.shift.html
Update
It does indeed look like there isn't currently a built-in shift in Microsoft.Data.Analysis. I've posted an issue for this here:
https://github.com/dotnet/machinelearning/issues/6008

Helper functions
Perform a column shift.
PrimitiveDataFrameColumn<double> ShiftIntColumn(PrimitiveDataFrameColumn<int> col, int n, string name)
{
return
new PrimitiveDataFrameColumn<double>(
name,
Enumerable.Repeat((double?) null, n)
.Concat(col.Select(item => (double?) item))
.Take(col.Count()));
}
Carry out division, taking care of null values in divisor.
PrimitiveDataFrameColumn<double> DivAlt3(PrimitiveDataFrameColumn<int> a, PrimitiveDataFrameColumn<double> b, string name)
{
return
new PrimitiveDataFrameColumn<double>(name, a.Zip(b, (x, y) => y == null ? null : x / y));
}
Then the following:
var df = new DataFrame(
new PrimitiveDataFrameColumn<DateTime>("DateTime",
Enumerable.Range(0, 5).Select(i =>
new DateTime(2020, 1, 1).Add(new TimeSpan(i, 0, 0, 0)))),
new PrimitiveDataFrameColumn<int>("Col1", new []{ 10, 20, 15, 30, 45 }),
new PrimitiveDataFrameColumn<int>("Col2", new []{ 13, 23, 18, 33, 48 }),
new PrimitiveDataFrameColumn<int>("Col3", new []{ 17, 27, 22, 37, 52 })
);
df.Columns.Add(ShiftIntColumn((PrimitiveDataFrameColumn<int>)df["Col1"], 2, "Col4"));
df.Columns.Add(DivAlt3((PrimitiveDataFrameColumn<int>) df["Col1"], (PrimitiveDataFrameColumn<double>) df["Col4"], "Col5"));
results in:
Complete notebook
See the following notebook for a full demonstration of the above:
https://github.com/dharmatech/dataframe-shift-example-cs/blob/003/dataframe-shift-example-cs.ipynb
Notes
It would be great if Microsoft.Data.Analysis came with column shift functionality.
It would also be great if column division handled nulls natively.
Would love to see other perhaps more idiomatic approaches to this.

Related

Search missing numbers in sequence

Suppose I have the following array (my sequences are all sorted in ascending order, and contain positive integers)
var tabSequence = new[] { 1, 2, 3, 7, 8, 9, 12, 15, 16, 17, 22, 23, 32 };
I made a code using LINQ and a loop to search missing numbers like that :
List<Int32> lstSearch = new List<int>();
var lstGroup = tabSequence
.Select((val, ind) => new { val, group = val - ind })
.GroupBy(v => v.group, v => v.val)
.Select(group => new{ GroupNumber = group.Key, Min = group.Min(), Max = group.Max() }).ToList();
for (int number = 0; number < lstGroup.Count; number++)
{
if (number < lstGroup.Count-1)
{
for (int missingNumber = lstGroup[number].Max+1; missingNumber < lstGroup[number+1].Min; missingNumber++)
lstSearch.Add(missingNumber);
}
}
var tabSequence2 = lstSearch.ToArray();
// Same result as var tabSequence2 = new[] {4, 5, 6, 10, 11, 13, 14, 18, 19, 20, 21, 24, 25, 26, 27, 28, 29, 30, 31 };
This code works but i'd like to know if there a better way to do the same thing only with linq.
Maybe I am just not understanding the problem. Your code seems very complicated, you could make this a lot simpler:
int[] tabSequence = new[] { 1, 2, 3, 7, 8, 9, 12, 15, 16, 17, 22, 23, 32 };
var results = Enumerable.Range(1, tabSequence.Max()).Except(tabSequence);
//results is: 4, 5, 6, 10, 11, 13, 14, 18, 19, 20, 21, 24, 25, 26, 27, 28, 29, 30, 31
I made a fiddle here
You can use IEnumerable.Aggregate to your advantage. The overload I choose uses an accumulator seed (empty List<IEnumerable<int>>) and proceeds to iterate over each item in your array.
The first time I set an lastNR defined before using the aggregate to the firsst number we iterate over. We compare the nexts iterations actual nr against this lastNr.
If we are in sequence we just increment the lastNr.
If not, we generate the missing numbers via Enumerable.Range(a,count) between lastNr
and the actual nr and add them to our accumulator-List. Then we set the lastNr to nr to continue.
public static List<IEnumerable<int>> GetMissingSeq(int[] seq)
{
var lastNr = int.MinValue;
var missing = seq.Aggregate(
new List<IEnumerable<int>>(),
(acc, nr) =>
{
if (lastNr == int.MinValue || lastNr == nr - 1)
{
lastNr = nr; // first ever or in sequence
return acc; // noting to do
}
// not in sequence, add the missing into our ac'umulator list
acc.Add(Enumerable.Range(lastNr + 1, nr - lastNr - 1));
lastNr = nr; //thats the new lastNR to compare against in the next iteration
return acc;
}
);
return missing;
}
Tested by:
public static void Main(string[] args)
{
var tabSequence = new[] { 1, 2, 3, 7, 8, 9, 12, 15, 16, 17, 22, 23, 32 };
var lastNr = int.MinValue;
var missing = tabSequence.Aggregate(
new List<IEnumerable<int>>(),
(acc, nr) =>
{
if (lastNr == int.MinValue || lastNr == nr - 1)
{
lastNr = nr; // first ever or in sequence
return acc; // noting to do
}
acc.Add(Enumerable.Range(lastNr + 1, nr - lastNr - 1));
return acc;
}
);
Console.WriteLine(string.Join(", ", tabSequence));
foreach (var inner in GetMissingSeq(tabSequence))
Console.WriteLine(string.Join(", ", inner));
Console.ReadLine();
}
Output:
1, 2, 3, 7, 8, 9, 12, 15, 16, 17, 22, 23, 32 // original followed by missing sequences
4, 5, 6
10, 11
13, 14
18, 19, 20, 21
24, 25, 26, 27, 28, 29, 30, 31
If you are not interested in the subsequences you can use GetMissingSeq(tabSequence).SelectMany(i => i) to flatten them into one IEnumerable.

LINQ Solution for Multiple Resolves

I have an array of MyClass, which can be simplified as follow:
public class MyClass {
public int Id;
public string Origin;
public int Points;
public DateTime RequestTime;
public MyClass(int id, string origin, int points, DateTime requestTime) {
Id = id;
Origin = origin;
Points = points;
RequestTime = requestTime;
}
}
Now, in the Array, without any errors from the user side or throughout the input process, there cannot be MyClass instance with identical Id and Origin.
However, if there be any, I should resolve it. And here are the resolving rules:
Firstly by Points - that is, to take one among the duplicates which has the highest Points
But if the Points are the same, I have to further resolve it by using RequestTime - the latest will be taken.
And if, there is no difference in RequestTime, then I can take one of the duplicates arbitrarily.
Here is the sample data input I have:
MyClass[] myarr = new MyClass[] {
new MyClass(1, "Ware House 1", 5, new DateTime(2016, 1, 26, 14, 0, 0)), //[0]
new MyClass(1, "Ware House 1", 7, new DateTime(2016, 1, 26, 14, 0, 0)), //[1] //higher points
new MyClass(1, "Ware House 2", 7, new DateTime(2016, 1, 26, 14, 0, 0)), //[2]
new MyClass(1, "Ware House 2", 7, new DateTime(2016, 1, 26, 14, 1, 0)), //[3] //later time
new MyClass(1, "Ware House 2", 7, new DateTime(2016, 1, 26, 14, 0, 0)), //[4]
new MyClass(2, "Ware House 2", 7, new DateTime(2016, 1, 26, 14, 0, 0)), //[5] //higher points
new MyClass(2, "Ware House 2", 5, new DateTime(2016, 1, 26, 14, 1, 0)), //[6] //later time but less points
new MyClass(3, "Ware House 1", 6, new DateTime(2016, 1, 26, 14, 0, 0)), //[7] //triplet, pick any
new MyClass(3, "Ware House 1", 6, new DateTime(2016, 1, 26, 14, 0, 0)), //[8] //triplet, pick any
new MyClass(3, "Ware House 1", 6, new DateTime(2016, 1, 26, 14, 0, 0)) //[9] //triplet, pick any
};
The final result should be [1], [3], [5], + any of [7]/[8]/[9]
I want to implement LINQ solution for it, but stuck. I do not know how make query it at once.
Any idea?
Group by {Id, Origin} and take the first one from each group when ordered by Points and RequestTime:
var query = from x in myarr
group x by new {x.Id, x.Origin}
into g
select (
from z in g
orderby z.Points descending, z.RequestTime descending
select z).First();
In method syntax, this is:
var query =
myarr.GroupBy(x => new {x.Id, x.Origin})
.Select(g => g.OrderByDescending(z => z.Points)
.ThenByDescending(z => z.RequestTime)
.First());
Try following:
myarr.OrderBy(m=> m.Points).ToList();
or
myarr.OrderBy(m=> m.Points).Orderby(m=> m.RequestTime);

Get a specific value from a specific array when having two dropdown lists

I have 5 arrays which represents 1 city each. Each position in the array represents the distance to another city (all arrays shares the same position for each specific city). And I have two dropdown lists from where the user is supposed to select two cities to calculate the distance between them.
It's set up like this:
// City0, City1, City2, City3, City4
int[] distanceFromCity0 = { 0, 16, 39, 9, 24 };
int[] distanceFromCity1 = { 16, 0, 36, 32, 54 };
int[] distanceFromCity2 = { 39, 36, 0, 37, 55 };
int[] distanceFromCity3 = { 9, 32, 37, 0, 21 };
int[] distanceFromCity4 = { 24, 54, 55, 21, 0 };
int cityOne = Convert.ToInt16(DropDownList1.SelectedValue);
int cityTwo = Convert.ToInt16(DropDownList2.SelectedValue);
And within the dropdown lists each city has the corresponding ID (city0 = 0, city1 = 1 etc)
I have tried a few different ways, but none of them really works.
So basically, how do I "connect" DropDownList1 to one of the arrays depending on the choice, and then connecting DropDownList2 to one of the positions in the selected array (from DropDownList1 selection) and print it out to Label1?
Is it easier with a 2D array?
This probably looks easy for you, but I'm a noob in C#.
One way would be to combine distanceFromCity0 ... distanceFromCity4 into a single 2D array and use the two cities as indexes to the distance value:
int[][] distanceBetweenCities = {
new[]{ 0, 16, 39, 9, 24 },
new[]{ 16, 0, 36, 32, 54 },
new[]{ 39, 36, 0, 37, 55 },
new[]{ 9, 32, 37, 0, 21 },
new[]{ 24, 54, 55, 21, 0 }
};
int cityOne = Convert.ToInt32(DropDownList1.SelectedValue);
int cityTwo = Convert.ToInt32(DropDownList2.SelectedValue);
var distance = distanceBetweenCities[cityOne][cityTwo];
Yes, using two-dimensional array is very easy. You can regard it like a matrix. Some code like below:
int[,] distanceMatrix = new int[5, 5] { { 0, 16, 39, 9, 24 },
{ 16, 0, 36, 32, 54 },
{ 39, 36, 0, 37, 55 },
{ 9, 32, 37, 0, 21 },
{ 24, 54, 55, 21, 0 }
};
int cityOne = Convert.ToInt32(DropDownList1.SelectedValue);
int cityTwo = Convert.ToInt32(DropDownList2.SelectedValue);
var distance = distanceMatrix[cityOne, cityTwo]; //the distance between cityOne and cityTwo;

How to find a dense region in a List<int>

hi i need to find the biggest dense region in a List of values based on a given range
example:
var radius =5; //Therm edited
var list = new List<int>{0,1,2,3,4,5,12,15,16,22,23,24,26,27,28,29};
//the following dense regions exist in the list above
var region1 = new List<int> { 0, 1, 2, 3, 4, 5 }; // exist 6 times (0, 1, 2, 3, 4, 5)
var region2 = new List<int> { 12, 15, 16}; // exist 3 times (12, 15, 16)
var region3 = new List<int> { 22, 23, 24, 26, 27}; // exist 1 times (22)
var region4 = new List<int> { 22, 23, 24, 26, 27, 28}; // exist 1 times (23)
var region5 = new List<int> { 22, 23, 24, 26, 27, 28, 29 }; // exist 3 times (24, 26, 27)
var region6 = new List<int> { 23, 24, 26, 27, 28, 29 }; // exist 1 times (28)
var region7 = new List<int> { 24, 26, 27, 28, 29 }; // exist 1 times (29)
//var result{22,23,24,26,27,28,29}
the solution doesn't really need to be fast because the max number of values is 21
is there an way to use fluent to achieve this?
i only know how to get the closest value
int closest = list.Aggregate((x,y) => Math.Abs(x-number) < Math.Abs(y-number) ? x : y);
and how to get values between 2 numbers
var between = list.Where(value=> min < value && value < max);
Edit
additional information's
Ok range is maybe the wrong therm radius would be a better word.
I define the dense region as the largest count of all values between currenvalue-range and currenvalue + range we get the dense region
A rather cryptic (but short) way would be:
int w = 5; // window size
var list = new List<int> { 0, 1, 2, 3, 4, 5, 12, 15, 16, 22,
23, 24, 26, 27, 28, 29 };
var result = list.Select(x => list.Where(y => y >= x - w && y <= x + w))
.Aggregate((a, b) => (a.Count() > b.Count()) ? a : b);
Console.WriteLine(string.Join(",", result.ToArray()));
Prints
22,23,24,26,27,28,29
This code consists of 3 steps:
For a given x the snippet list.Where(y => y >= x - w && y <= x + w) gives all elements from the list that are in the cluster around x.
list.Select(x => ...) computes that cluster for every element of the list.
.Aggregate((a, b) => (a.Count() > b.Count()) ? a : b) takes the cluster of maximum size.

How to add a value to the various position of a int list?

For example
List contains integer values 34, 78, 20, 10, 17, 99, 101, 24, 50, 13
and the value to put is 11 at position 1, 4 and 5
Position is the index value which starts from 0
so the final result is => 34, 11, 78, 20, 10, 11, 17, 11, 99, 101, 24, 50, 13
My current code is as follows:
List<int> list_iNumbers = new List<int>();
list_iNumbers.Add(34);
list_iNumbers.Add(78);
list_iNumbers.Add(20);
list_iNumbers.Add(10);
list_iNumbers.Add(17);
list_iNumbers.Add(99);
list_iNumbers.Add(101);
list_iNumbers.Add(24);
list_iNumbers.Add(50);
list_iNumbers.Add(13);
List<int> list_iPosition = new List<int>();
list_iPosition.Add(1);
list_iPosition.Add(4);
list_iPosition.Add(5);
int iValueToInsert = 11;
Now How to insert at these positions and get the correct result?
Use Insert(index, element) method instead of Add. Something like that:
foreach(var pos in list_iPosition.OrderByDescending(x => x))
list_iNumbers.Insert(pos, iValueToInsert);
You have to do it from the last index, to make it right. That's why I used OrderByDescending first.
Non Linq Solution:
For(int i = 0; i<count_of_numbers_to_insert; i++)
{
list_iNumbers.Insert(pos+i, valueToInsert);
}

Categories