Finding consecutive rows to group based on arbitrary - c#

I need to group a list (really a DataTable, but a list for simplicity) based on 3 columns, where the first column is really a representation of a fiscal year in integer form. I need to group on all rows where:
the current fiscal year has the second column (Service) equal to 1 and
the following fiscal year is equal to 1, and any consecutive fiscal years have 1 as the Service
The end result group will then have merged all consecutive fiscal years that have 1 as the Service, by taking the first year the group started to the last year it ended while summing the Service and Earnings.
var list = new List<(int FiscalYear, double Service, double Earnings)>
{
    (20002001, 1, 100.00),
    (20012002, .5, 100.00),
    (20022003, 1.0, 100.00),
    (20042005, 1.0, 50.00),
    (20052006, 1.0, 50.00)
};
Should produce the results:
20002001, 1 , 100.00
20012002, 1 , 100.00
20022006, 3 , 200.00
This is not a legible example to work off of but maybe it will shed light on what I am attempting to work towards:
var test = from r in list
where r.Item("FiscalYear") + 10001 = list.SkipWhile(r2 => !r.Equals(current)).Skip(1).FirstOrDefault(r3 => r3.Item("FiscalYear")) &&
r.Item("Service") = 1 &&
list.SkipWhile(r2 => !r.Equals(current)).Skip(1).FirstOrDefault(r => r.Item("Service") = 1D)
Select New { FiscalYear = $"{r.Item("FiscalYear") % 1000}{I have NO IDEA HOW TO DETERMINE THIS PART}", Service = list.Sum(r => r.Item("FiscalYear"), Earnings = list.Sum(r => r.Item("Earnings"))
Assuming the get next would work from http://www.herlitz.nu/2011/12/01/getting-the-previous-and-next-record-from-list-using-linq/. But it does not work in my scenario.
I have considered a group by which I could do except I would miss out on the correct counts to group the Fiscal Year by.

Using the GroupAdjacentBy function described at https://stackoverflow.com/a/4682163/6137718 :
public static class LinqExtensions
{
    public static IEnumerable<IEnumerable<T>> GroupAdjacentBy<T>(
        this IEnumerable<T> source, Func<T, T, bool> predicate)
    {
        using (var e = source.GetEnumerator())
        {
            if (e.MoveNext())
            {
                var list = new List<T> { e.Current };
                var pred = e.Current;
                while (e.MoveNext())
                {
                    if (predicate(pred, e.Current))
                    {
                        list.Add(e.Current);
                    }
                    else
                    {
                        yield return list;
                        list = new List<T> { e.Current };
                    }
                    pred = e.Current;
                }
                yield return list;
            }
        }
    }
}
The result you want can be obtained like this. Note that I use string for the year, but this can be changed to use int if needed. Using the following class structure:
class Entry
{
    public Entry(string year, double service, double earnings)
    {
        this.Year = year;
        this.Service = service;
        this.Earnings = earnings;
    }
    public string Year;
    public double Service;
    public double Earnings;
}
You can get the result you desire by doing something like this:
var result = list.GroupAdjacentBy((x, y) => x.Service == 1 && y.Service == 1)
    .Select(g => new Entry(
        g.First().Year.Substring(0, 4) + g.Last().Year.Substring(4, 4),
        g.Sum(e => e.Service),
        g.Sum(e => e.Earnings)));
An example of my code can be found at https://dotnetfiddle.net/RqmYa9 .
I'm unsure as to why, in your example result, the second entry has 1 for Service instead of 0.5. If you wanted all Services to be at least one, you can do a ternary in the query when you select the sum of Service.
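If you would rather keep the fiscal year as an int, a minimal sketch could look like the one below. It assumes the year is always an 8-digit number of the form YYYYYYYY (start year followed by end year), so the start year is recovered with integer division and the end year with a modulo. The class name FiscalYearMerge and the Merge method are made up for this illustration, and the grouping helper is repeated only so the snippet compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FiscalYearMerge
{
    // Same adjacent-grouping idea as GroupAdjacentBy above, repeated so the sketch is self-contained.
    public static IEnumerable<List<T>> GroupAdjacentBy<T>(
        this IEnumerable<T> source, Func<T, T, bool> predicate)
    {
        using (var e = source.GetEnumerator())
        {
            if (!e.MoveNext()) yield break;
            var list = new List<T> { e.Current };
            var prev = e.Current;
            while (e.MoveNext())
            {
                if (predicate(prev, e.Current))
                {
                    list.Add(e.Current);
                }
                else
                {
                    yield return list;
                    list = new List<T> { e.Current };
                }
                prev = e.Current;
            }
            yield return list;
        }
    }

    // Hypothetical helper: merges adjacent rows whose Service is 1.
    public static List<(int FiscalYear, double Service, double Earnings)> Merge(
        IEnumerable<(int FiscalYear, double Service, double Earnings)> rows)
    {
        return rows
            .GroupAdjacentBy((x, y) => x.Service == 1 && y.Service == 1)
            .Select(g => (
                // Assumption: 8-digit year, e.g. 20022003 and 20052006 become 20022006.
                FiscalYear: g.First().FiscalYear / 10000 * 10000 + g.Last().FiscalYear % 10000,
                Service: g.Sum(r => r.Service),
                Earnings: g.Sum(r => r.Earnings)))
            .ToList();
    }

    public static void Main()
    {
        var rows = new List<(int FiscalYear, double Service, double Earnings)>
        {
            (20002001, 1, 100.00),
            (20012002, .5, 100.00),
            (20022003, 1.0, 100.00),
            (20042005, 1.0, 50.00),
            (20052006, 1.0, 50.00)
        };
        foreach (var r in Merge(rows))
            Console.WriteLine($"{r.FiscalYear}, {r.Service}, {r.Earnings}");
    }
}
```

With the sample rows from the question this produces three groups, the last one being fiscal year 20022006 with Service 3 and Earnings 200. The second row keeps its Service of 0.5, as discussed above.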

Related

How to sum periods between dates in linq c# till a specific period

I have a datatable that contains three columns. I need to find the date on which each employee ID reached, or will reach, two years, by subtracting Date1 from Date2 and summing the differences with a LINQ query.
If Date2 is null, that means the ID is still working today.
ID    Date1        Date2
100   10/01/2016   09/01/2017
100   20/09/2017   25/05/2019
101   05/07/2018
I need output like below:
ID    two_years
100   19/09/2018
101   04/07/2020
Using my Scan extension which is based on the APL Scan operator (like Aggregate, only it returns the intermediate results):
public static class IEnumerableExt {
    // TRes seedFn(T FirstValue)
    // TRes combineFn(TRes PrevResult, T CurValue)
    public static IEnumerable<TRes> Scan<T, TRes>(this IEnumerable<T> src, Func<T, TRes> seedFn, Func<TRes, T, TRes> combineFn) {
        using (var srce = src.GetEnumerator()) {
            if (srce.MoveNext()) {
                var prev = seedFn(srce.Current);
                while (srce.MoveNext()) {
                    yield return prev;
                    prev = combineFn(prev, srce.Current);
                }
                yield return prev;
            }
        }
    }
}
Then assuming by two years, you mean 2 * 365 days, and assuming you count the beginning and ending dates of each period as part of the total, this LINQ will find the answer:
var twoYears = 2 * 365; // two years expressed in days, per the assumption above
var ans = src.Select(s => new { s.ID, s.Date1, s.Date2, Diff = (s.Date2.HasValue ? s.Date2.Value-s.Date1 : DateTime.Now.Date-s.Date1).TotalDays+1 })
    .GroupBy(s => s.ID)
    .Select(sg => new { ID = sg.Key, sg = sg.Scan(s => new { s.Date1, s.Date2, s.Diff, DiffAccum = s.Diff }, (res, s) => new { s.Date1, s.Date2, s.Diff, DiffAccum = res.DiffAccum + s.Diff }) })
    .Select(IDsg => new { IDsg.ID, two_year_base = IDsg.sg.FirstOrDefault(s => s.DiffAccum > twoYears) ?? IDsg.sg.Last() })
    .Select(s => new { s.ID, two_years = s.two_year_base.Date1.AddDays(twoYears-(s.two_year_base.DiffAccum - s.two_year_base.Diff)).Date });
If your original data is not sorted in Date1 order, or ID+Date1 order, you will need to add OrderBy to sort by Date1.
Explanation:
First we compute the days worked(?) represented by each row, using today if we don't have an ending Date2. Then we group by ID and work on each group. For each ID, we compute the running sum of days worked. Next we find the Date1 of the period in which the two-year mark falls, using the running sum (DiffAccum), and compute the two-year date from that Date1 and the remaining time needed in that period.
If it is possible a particular ID could have a lot of periods, you could use another variation of Scan, ScanUntil which short circuits evaluation based on a predicate.
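ScanUntil itself is not shown here, so the following is only a guess at what such a variation might look like, not the actual extension. It assumes the same seedFn/combineFn shape as Scan plus a hypothetical stopFn predicate, and it stops right after yielding the first accumulated result for which stopFn returns true.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ScanUntilSketch
{
    // Hypothetical variant of Scan: yields intermediate results and stops
    // after the first result that satisfies stopFn (that result is included).
    public static IEnumerable<TRes> ScanUntil<T, TRes>(
        this IEnumerable<T> src,
        Func<T, TRes> seedFn,
        Func<TRes, T, TRes> combineFn,
        Func<TRes, bool> stopFn)
    {
        using (var e = src.GetEnumerator())
        {
            if (!e.MoveNext()) yield break;
            var prev = seedFn(e.Current);
            yield return prev;
            // Only pull the next source element if the last result did not satisfy stopFn.
            while (!stopFn(prev) && e.MoveNext())
            {
                prev = combineFn(prev, e.Current);
                yield return prev;
            }
        }
    }

    public static void Main()
    {
        // Day counts per period; the third period is never read because
        // the running sum already exceeds 730 after the second one.
        var days = new[] { 366.0, 613.0, 100.0 };
        var sums = days.ScanUntil(d => d, (acc, d) => acc + d, acc => acc > 730);
        Console.WriteLine(string.Join(", ", sums));
    }
}
```

In the query above, the FirstOrDefault call already stops pulling from the lazy Scan once it finds a match, so a variant like this mostly matters when the scanned sequence is materialized or reused elsewhere.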

Filter c# list using timestamp, Take first record of each 5 seconds

I need to filter records based on their timings: I want the first record in each range of 5 seconds.
Example :
Input data :
data timings
1452 10:00:11
1455 10:00:11
1252 10:00:13
1952 10:00:15
1454 10:00:17
1451 10:00:19
1425 10:00:20
1425 10:00:21
1459 10:00:23
1422 10:00:24
Expected output
1452 10:00:11
1454 10:00:17
1459 10:00:23
I have tried to group the data based on timings like below
listSpacialRecords=listSpacialRecords.GroupBy(x => x.timings).Select(x => x.FirstOrDefault()).ToList();
But using this I can only filter records that have exactly the same time.
I hope someone can help me resolve this.
The list contains a huge amount of data, so is there any way to do this other than looping through the list?
This works for me:
var results =
    source
        .Skip(1)
        .Aggregate(
            source.Take(1).ToList(),
            (a, x) =>
            {
                if (x.timings.Subtract(a.Last().timings).TotalSeconds >= 5.0)
                {
                    a.Add(x);
                }
                return a;
            });
I get your desired output.
This should do (assuming listSpacialRecords is in order)
var result = new List<DateTime>();
var distance = TimeSpan.FromSeconds(5);
var pivot = default(DateTime);
foreach (var record in listSpacialRecords)
{
    if (record.timings > pivot)
    {
        result.Add(record.timings); // or "yield return record.timings;" if you need deferred execution
        pivot = record.timings + distance;
    }
}
If not, the easiest, though maybe not the most efficient, way would be to change the foreach a little bit:
foreach (var record in listSpacialRecords.OrderBy(r => r.timings))
Doing this using only LINQ is possible, but won't benefit readability.
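For reference, here is a self-contained sketch of the same loop written as an iterator, run against the sample data from the question. The tuple shape (Data, Timings), the class name FiveSecondFilter and the method FirstPerWindow are assumptions made for the example. Note that it uses >= for the comparison, so a record exactly 5 seconds after the last kept one starts a new window; switch to > if you want the stricter behaviour of the loop above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FiveSecondFilter
{
    // Keeps the first record, then every record that is at least `window`
    // later than the previously kept record. Sorts by time first.
    public static IEnumerable<(int Data, TimeSpan Timings)> FirstPerWindow(
        IEnumerable<(int Data, TimeSpan Timings)> records, TimeSpan window)
    {
        TimeSpan? nextAllowed = null;
        foreach (var record in records.OrderBy(r => r.Timings))
        {
            if (nextAllowed == null || record.Timings >= nextAllowed.Value)
            {
                yield return record;
                nextAllowed = record.Timings + window;
            }
        }
    }

    // Sample data copied from the question.
    public static List<(int Data, TimeSpan Timings)> Sample() =>
        new List<(int Data, TimeSpan Timings)>
        {
            (1452, new TimeSpan(10, 0, 11)), (1455, new TimeSpan(10, 0, 11)),
            (1252, new TimeSpan(10, 0, 13)), (1952, new TimeSpan(10, 0, 15)),
            (1454, new TimeSpan(10, 0, 17)), (1451, new TimeSpan(10, 0, 19)),
            (1425, new TimeSpan(10, 0, 20)), (1425, new TimeSpan(10, 0, 21)),
            (1459, new TimeSpan(10, 0, 23)), (1422, new TimeSpan(10, 0, 24))
        };

    public static void Main()
    {
        foreach (var r in FirstPerWindow(Sample(), TimeSpan.FromSeconds(5)))
            Console.WriteLine($"{r.Data} {r.Timings}");
    }
}
```

For the sample data this keeps the records 1452, 1454 and 1459, which matches the expected output in the question. It is still a loop under the hood, but it makes a single pass after sorting and is lazily evaluated.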
Assuming your class looks something like this:
public class DataNode
{
public int Data { get; set; }
public TimeSpan Timings { get; set; }
}
I wrote an extension method:
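public static IEnumerable<DataNode> TimeFilter(this IEnumerable<DataNode> list, int timeDifference)
{
    DataNode LastFound = null;
    // Ascending order, so each item is compared against the last item that was kept.
    foreach (var item in list.OrderBy(p => p.Timings))
    {
        // Always keep the first item: comparing against a null LastFound would be false,
        // and nothing would ever be returned.
        if (LastFound == null || item.Timings > LastFound.Timings.Add(TimeSpan.FromSeconds(timeDifference)))
        {
            LastFound = item;
            yield return item;
        }
    }
}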
This can then be used like this
var list = new List<DataNode>();
var result = list.TimeFilter(5);
Something like this approach may work, using the % Operator (Modulo)
Assumptions
The list is in order
You don't care if it skips missing seconds
There is always a first element
And this is only within a 24 hour period
Note : Totally untested
var seconds = listSpacialRecords
    .First() // get the first element
    .timings
    .TimeOfDay // convert it to TimeSpan
    .TotalSeconds; // get the total seconds of the day
var result = listSpacialRecords
    .Where(x => (x.timings
        .TimeOfDay
        .TotalSeconds - seconds) % 5 == 0)
    // get the difference and mod 5
    .ToList();

finding sequential patterns of objects in a list with particular properties

I have a class like this:
public class TestResults
{
    public String TestName {get;set;}
    public Int32 StudentID {get;set;}
    public Decimal Score {get;set;}
    public DateTime TestTaken {get;set;}
}
So some objects might look like this:
tests.Add(new TestResults { TestName = "Big Important Test", StudentID = 17, Score = 0.75M, TestTaken = DateTime.Parse("1/1/2015") });
tests.Add(new TestResults { TestName = "Big Important Test", StudentID = 12, Score = 0.89M, TestTaken = DateTime.Parse("1/1/2015") });
tests.Add(new TestResults { TestName = "Sneaky Pop Quiz in Chemistry", StudentID = 17, Score = 0.97M, TestTaken = DateTime.Parse("2/1/2015") });
tests.Add(new TestResults { TestName = "Sneaky Pop Quiz in Chemistry", StudentID = 17, Score = 0.97M, TestTaken = DateTime.Parse("2/1/2015") });
What I'm trying to determine is something like "For every student, show me students with large jumps in their scores?" I asked a similar question a while back in the dba.stackexchange.com world and have used the LEAD function, but now I'd like to move the logic into C#.
So a concrete question I'd want to code for would be (as an example):
Show me students who've jumped from the 60 and 70 percent range to the
90 range.
I know I can write a rat's nest of loops and branching logic, but was wondering if there are any more elegant and more comprehensive ways of identifying sequences of patterns in LINQ / C# land.
I've heard people talk about F#, but have no practical experience with that. Additionally, I think the "pattern matching" I'm talking about is a bit more involved than some of the simple string-pattern-matching I keep running across.
You could use LINQ to get the answer. Here is an example of a way you could do it:
var scores = tests.GroupBy(t => t.StudentID)
.Select(g => new { StudentID = g.Key, Min = g.Min(i => i.Score), Max = g.Max(i => i.Score) })
.Where(s => s.Max - s.Min > .20M);
foreach(var score in scores)
Console.WriteLine("Student: {0} Jump: {1}", score.StudentID, score.Max - score.Min);
The LINQ statement first groups by StudentID. Next it projects the StudentID and the Min and Max scores from each group to a new anonymous type. Finally, it applies a where condition that only returns items with a "large jump in score". I define a "large jump in score" as the difference between the max score and the min score being greater than .20.
Note: this code will work even when a student has more than 2 scores in the list.
UPDATE:
Since you have updated your post I understand your question better. Here is an updated answer:
var scores = tests.GroupBy(t => t.StudentID)
.Select(g => new { StudentID = g.Key, Min = g.OrderBy(i => i.Score).First(), Max = g.OrderByDescending(i => i.Score).First() })
.Where(s => (s.Min.Score >= .60M & s.Min.Score < .80M) & s.Max.Score >= .90M & s.Min.TestTaken < s.Max.TestTaken);
foreach(var score in scores)
Console.WriteLine("Student: {0} Jump: {1}", score.StudentID, score.Max.Score - score.Min.Score);
This uses a similar approach, but instead of recording the min and max scores in the anonymous type, I record the TestResults instances having the min score and the max score. In the where clause we check that the TestResults having the min score is in the 60-80 range. We check that the TestResults having the max score is in the 90+ range. Finally, we check that the min score occurred on a date before the max one occurred.
You can do something like this:
const decimal differenceLimit = 0.05M;
var studentIdsWithJump = tests.GroupBy (g => g.StudentID)
.Where(g => g.OrderBy(c => c.Score)
.GroupAdjacentBy((first, second) =>
first.Score + differenceLimit < second.Score
).Count() > 1
)
.Select(g => g.Key);
With the helper method taken from here:
public static class LinqExtensions
{
    public static IEnumerable<IEnumerable<T>> GroupAdjacentBy<T>(this IEnumerable<T> source, Func<T, T, bool> predicate)
    {
        using (var e = source.GetEnumerator())
        {
            if (e.MoveNext())
            {
                var list = new List<T> { e.Current };
                var pred = e.Current;
                while (e.MoveNext())
                {
                    if (predicate(pred, e.Current))
                    {
                        list.Add(e.Current);
                    }
                    else
                    {
                        yield return list;
                        list = new List<T> { e.Current };
                    }
                    pred = e.Current;
                }
                yield return list;
            }
        }
    }
}
This gives you the jumps for all ranges. If you want to narrow it down, you could add a further .Where() for scores > 60, and adjust the differenceLimit accordingly
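Since the question mentions SQL's LEAD, here is a rough C# equivalent as a sketch: order each student's tests by date and pair every test with the one that follows it using Zip. This is a different technique from the answers above, and the names TestResult, JumpFinder and StudentsWithJump are made up for the example. The sample dates assume "1/1/2015" and "2/1/2015" mean 1 January and 1 February 2015, and the thresholds encode the "60 and 70 percent range to the 90 range" example from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class TestResult
{
    public string TestName { get; set; }
    public int StudentID { get; set; }
    public decimal Score { get; set; }
    public DateTime TestTaken { get; set; }
}

public static class JumpFinder
{
    // Returns the IDs of students who have two consecutive tests (by date)
    // where the earlier score is in [0.60, 0.80) and the later one is >= 0.90.
    public static List<int> StudentsWithJump(IEnumerable<TestResult> tests)
    {
        return tests
            .GroupBy(t => t.StudentID)
            .Where(g =>
            {
                var ordered = g.OrderBy(t => t.TestTaken).ToList();
                // Zip pairs each test with its successor, much like LEAD does in SQL.
                return ordered
                    .Zip(ordered.Skip(1), (prev, next) => new { prev, next })
                    .Any(p => p.prev.Score >= 0.60M && p.prev.Score < 0.80M
                              && p.next.Score >= 0.90M);
            })
            .Select(g => g.Key)
            .ToList();
    }

    // Sample data modelled on the question (date interpretation is an assumption).
    public static List<TestResult> Sample() => new List<TestResult>
    {
        new TestResult { TestName = "Big Important Test", StudentID = 17, Score = 0.75M, TestTaken = new DateTime(2015, 1, 1) },
        new TestResult { TestName = "Big Important Test", StudentID = 12, Score = 0.89M, TestTaken = new DateTime(2015, 1, 1) },
        new TestResult { TestName = "Sneaky Pop Quiz in Chemistry", StudentID = 17, Score = 0.97M, TestTaken = new DateTime(2015, 2, 1) }
    };

    public static void Main()
    {
        foreach (var id in StudentsWithJump(Sample()))
            Console.WriteLine("Student with jump: " + id);
    }
}
```

With this sample only student 17 is reported. Because the pairs are built from date order rather than score order, a drop in score is never mistaken for a jump.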

How to calculate a running total using linq

I have a linq query result as shown in the image. In the final query (not shown) I am grouping by Year by LeaveType. However I want to calculate a running total for the leaveCarriedOver per type over years. That is, sick LeaveCarriedOver in 2010 becomes "opening" balance for sick leave in 2011 plus the one for 2011.
I have done another query on the shown result list that looks like:
var leaveDetails1 = (from l in leaveDetails
select new
{
l.Year,
l.LeaveType,
l.LeaveTaken,
l.LeaveAllocation,
l.LeaveCarriedOver,
RunningTotal = leaveDetails.Where(x => x.LeaveType == l.LeaveType).Sum(x => x.LeaveCarriedOver)
});
where leaveDetails is the result from the image.
The resulting RunningTotal is not cumulative as expected. How can I achieve my initial goal? Open to any ideas - my last option will be to do it in javascript in the front-end. Thanks in advance
The simple implementation is to get the list of possible totals first, then get the sum from the details for each of these categories.
Getting the distinct list of Year and LeaveType is a group-by followed by selecting the first element of each group. The grouping key is a Tuple<int, string>, where the int is the year and the string is the LeaveType.
var distinctList = leaveDetails1.GroupBy(data => new Tuple<int, string>(data.Year, data.LeaveType)).Select(data => data.FirstOrDefault()).ToList();
Then we want a total for each of these elements, so select over that list to return the identifying values (Year and LeaveType) plus the total, which gives a Tuple<int, string, int>.
var totals = distinctList.Select(data => new Tuple<int, string, int>(data.Year, data.LeaveType, leaveDetails1.Where(detail => detail.Year == data.Year && detail.LeaveType == data.LeaveType).Sum(detail => detail.LeaveCarriedOver))).ToList();
Reading the line above, you can see that it takes the distinct items we want to list, creates a new object, stores the Year and LeaveType for reference, and then sets the last int to the Sum of the filtered details for that Year and LeaveType.
If I completely understand what you are trying to do then I don't think I would rely on the built in LINQ operators exclusively. I think (emphasis on think) that any combination of the built in LINQ operators is going to solve this problem in O(n^2) run-time.
If I were going to implement this in LINQ then I would create an extension method for IEnumerable that is similar to the Scan function in reactive extensions (or find a library out there that has already implemented it):
public static class EnumerableExtensions
{
    public static IEnumerable<TAccumulate> Scan<TSource, TAccumulate>(
        this IEnumerable<TSource> source,
        TAccumulate seed,
        Func<TAccumulate, TSource, TAccumulate> accumulator)
    {
        // Validation omitted for clarity.
        foreach(TSource value in source)
        {
            seed = accumulator.Invoke(seed, value);
            yield return seed;
        }
    }
}
Then this should do it around O(n log n) (because of the order by operations):
leaveDetails
.OrderBy(x => x.LeaveType)
.ThenBy(x => x.Year)
.Scan(new {
Year = 0,
LeaveType = "Seed",
LeaveTaken = 0,
LeaveAllocation = 0.0,
LeaveCarriedOver = 0.0,
RunningTotal = 0.0
},
(acc, x) => new {
x.Year,
x.LeaveType,
x.LeaveTaken,
x.LeaveAllocation,
x.LeaveCarriedOver,
RunningTotal = x.LeaveCarriedOver + (acc.LeaveType != x.LeaveType ? 0 : acc.RunningTotal)
});
You don't say, but I assume the data is coming from a database; if that is the case then you could get leaveDetails back already sorted and skip the sorting here. That would get you down to O(n).
If you don't want to create an extension method (or go find one) then this will achieve the same thing (just in an uglier way).
var temp = new
{
Year = 0,
LeaveType = "Who Cares",
LeaveTaken = 3,
LeaveAllocation = 0.0,
LeaveCarriedOver = 0.0,
RunningTotal = 0.0
};
var runningTotals = (new[] { temp }).ToList();
runningTotals.RemoveAt(0);
foreach(var l in leaveDetails.OrderBy(x => x.LeaveType).ThenBy(x => x.Year))
{
var s = runningTotals.LastOrDefault();
runningTotals.Add(new
{
l.Year,
l.LeaveType,
l.LeaveTaken,
l.LeaveAllocation,
l.LeaveCarriedOver,
RunningTotal = l.LeaveCarriedOver + (s == null || s.LeaveType != l.LeaveType ? 0 : s.RunningTotal)
});
}
This should also be O(n log n) or O(n) if you can pre-sort leaveDetails.
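To make the idea concrete, here is a small self-contained sketch of a per-LeaveType running total using a plain iterator. The tuple shape, the decimal type for the carried-over amount, the sample numbers, and the names RunningTotals and WithRunningTotal are all assumptions for illustration, since the real shape of leaveDetails is only visible in the image.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RunningTotals
{
    // For each leave type, walks the rows in year order and carries the sum forward.
    // The total is reset whenever a new leave type starts.
    public static IEnumerable<(int Year, string LeaveType, decimal CarriedOver, decimal RunningTotal)>
        WithRunningTotal(IEnumerable<(int Year, string LeaveType, decimal CarriedOver)> rows)
    {
        foreach (var group in rows.GroupBy(r => r.LeaveType))
        {
            decimal total = 0;
            foreach (var r in group.OrderBy(r => r.Year))
            {
                total += r.CarriedOver;
                yield return (r.Year, r.LeaveType, r.CarriedOver, total);
            }
        }
    }

    // Made-up sample rows, not taken from the question.
    public static List<(int Year, string LeaveType, decimal CarriedOver)> Sample() =>
        new List<(int Year, string LeaveType, decimal CarriedOver)>
        {
            (2010, "Sick", 2m),
            (2011, "Sick", 3m),
            (2010, "Annual", 5m),
            (2011, "Annual", 1m),
            (2012, "Sick", 1m)
        };

    public static void Main()
    {
        foreach (var r in WithRunningTotal(Sample()))
            Console.WriteLine($"{r.Year} {r.LeaveType} {r.CarriedOver} {r.RunningTotal}");
    }
}
```

For the made-up rows, the Sick totals come out as 2, 5, 6 and the Annual totals as 5, 6. Each row is visited once after grouping and sorting, so the cost is dominated by the sort, as with the approaches above.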
If I understand the question you want something like
decimal RunningTotal = 0;
var results = leaveDetails
.GroupBy(r=>r.LeaveType)
.Select(r=> new
{
Dummy = RunningTotal = 0 ,
results = r.OrderBy(o=>o.Year)
.Select(l => new
{
l.Year,
l.LeaveType ,
l.LeaveAllocation,
l.LeaveCarriedOver,
RunningTotal = (RunningTotal = RunningTotal + l.LeaveCarriedOver )
})
})
.SelectMany(a=>a.results).ToList();
This is basically using the Select<TSource, TResult> overload to calculate the running balance, but first grouped by LeaveType so we can reset the RunningTotal for every LeaveType, and then ungrouped at the end.
You have to use the window function SUM here, which is not supported by EF Core or by earlier versions of EF. So just write the SQL and run it via Dapper:
SELECT
    l.Year,
    l.LeaveType,
    l.LeaveTaken,
    l.LeaveAllocation,
    l.LeaveCarriedOver,
    SUM(l.LeaveCarriedOver) OVER (PARTITION BY l.LeaveType ORDER BY l.Year) AS RunningTotal
FROM leaveDetails l
Or, if you are using EF Core, use package linq2db.EntityFrameworkCore
var leaveDetails1 = from l in leaveDetails
                    select new
                    {
                        l.Year,
                        l.LeaveType,
                        l.LeaveTaken,
                        l.LeaveAllocation,
                        l.LeaveCarriedOver,
                        RunningTotal = Sql.Ext.Sum(l.LeaveCarriedOver).Over().PartitionBy(l.LeaveType).OrderBy(l.Year).ToValue()
                    };
// switch to alternative LINQ translator
leaveDetails1 = leaveDetails1.ToLinqToDB();

LINQ: Collapsing a series of strings into a set of "ranges"

I have an array of strings similar to this (shown on separate lines to illustrate the pattern):
{ "aa002","aa003","aa004","aa005","aa006","aa007", // note that aa008 is missing
  "aa009",
  "ba023","ba024","ba025",
  "bb025",
  "ca002","ca003",
  "cb004",
  ...}
...and the goal is to collapse those strings into this comma-separated string of "ranges":
"aa002-aa007,aa009,ba023-ba025,bb025,ca002-ca003,cb004, ... "
I want to collapse them so I can construct a URL. There are hundreds of elements, but I can still convey all the information if I collapse them this way - putting them all into a URL "longhand" (it has to be a GET, not a POST) isn't feasible.
I've had the idea to separate them into groups using the first two characters as the key - but does anyone have any clever ideas for collapsing those sequences (without gaps) into ranges? I'm struggling with it, and everything I've come up with looks like spaghetti.
So the first thing that you need to do is parse the strings. It's important to have the alphabetic prefix and the integer value separately.
Next you want to group the items on the prefix.
For each of the items in that group, you want to order them by number, and then group items while the previous value's number is one less than the current item's number. (Or, put another way, while the previous item plus one is equal to the current item.)
Once you've grouped all of those items you want to project that group out to a value based on that range's prefix, as well as the first and last number. No other information from these groups is needed.
We then flatten the list of strings for each group into just a regular list of strings, since once we're all done there is no need to separate out ranges from different groups. This is done using SelectMany.
When that's all said and done, that, translated into code, is this:
public static IEnumerable<string> Foo(IEnumerable<string> data)
{
    return data.Select(item => new
    {
        Prefix = item.Substring(0, 2),
        Number = int.Parse(item.Substring(2))
    })
    .GroupBy(item => item.Prefix)
    .SelectMany(group => group.OrderBy(item => item.Number)
        .GroupWhile((prev, current) =>
            prev.Number + 1 == current.Number)
        .Select(range =>
            RangeAsString(group.Key,
                range.First().Number,
                range.Last().Number)));
}
The GroupWhile method can be implemented like so:
public static IEnumerable<IEnumerable<T>> GroupWhile<T>(
    this IEnumerable<T> source, Func<T, T, bool> predicate)
{
    using (var iterator = source.GetEnumerator())
    {
        if (!iterator.MoveNext())
            yield break;
        List<T> list = new List<T>() { iterator.Current };
        T previous = iterator.Current;
        while (iterator.MoveNext())
        {
            if (!predicate(previous, iterator.Current))
            {
                yield return list;
                list = new List<T>();
            }
            list.Add(iterator.Current);
            previous = iterator.Current;
        }
        yield return list;
    }
}
And then the simple helper method to convert each range into a string:
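private static string RangeAsString(string prefix, int start, int end)
{
    // "D3" restores the zero padding lost by int.Parse
    // (assumes three-digit numbers, as in the question's data).
    if (start == end)
        return prefix + start.ToString("D3");
    else
        return string.Format("{0}{1:D3}-{0}{2:D3}", prefix, start, end);
}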
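As a cross-check, the same collapse can be sketched with a plain index-based loop that works on the original strings, so the zero padding never has to be rebuilt. This is a different technique from the GroupWhile version, and the names RangeCollapser, Collapse and IsNext are made up for the example. It assumes a two-character prefix and fixed-width, zero-padded numbers, so that an ordinal string sort puts the items in numeric order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RangeCollapser
{
    // Collapses runs of consecutive items (same prefix, number + 1) into "first-last".
    public static string Collapse(IEnumerable<string> items)
    {
        // Assumption: fixed-width numbers, so ordinal order equals numeric order.
        var sorted = items.OrderBy(s => s, StringComparer.Ordinal).ToList();
        var ranges = new List<string>();
        int i = 0;
        while (i < sorted.Count)
        {
            int j = i;
            // Extend the run while the next item directly follows the current one.
            while (j + 1 < sorted.Count && IsNext(sorted[j], sorted[j + 1]))
                j++;
            ranges.Add(i == j ? sorted[i] : sorted[i] + "-" + sorted[j]);
            i = j + 1;
        }
        return string.Join(",", ranges);
    }

    private static bool IsNext(string a, string b) =>
        a.Substring(0, 2) == b.Substring(0, 2) &&
        int.Parse(a.Substring(2)) + 1 == int.Parse(b.Substring(2));

    public static void Main()
    {
        var data = new[]
        {
            "aa002", "aa003", "aa004", "aa005", "aa006", "aa007",
            "aa009",
            "ba023", "ba024", "ba025",
            "bb025",
            "ca002", "ca003",
            "cb004"
        };
        Console.WriteLine(Collapse(data));
    }
}
```

For the listed items this prints aa002-aa007,aa009,ba023-ba025,bb025,ca002-ca003,cb004, which is the format asked for in the question.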
Here's a LINQ version without the need to add new extension methods:
var data2 = data.Skip(1).Zip(data, (d1, d0) => new
{
value = d1,
jump = d1.Substring(0, 2) == d0.Substring(0, 2)
? int.Parse(d1.Substring(2)) - int.Parse(d0.Substring(2))
: -1,
});
var agg = new { f = data.First(), t = data.First(), };
var query2 =
data2
.Aggregate(new [] { agg }.ToList(), (a, x) =>
{
var last = a.Last();
if (x.jump == 1)
{
a.RemoveAt(a.Count() - 1);
a.Add(new { f = last.f, t = x.value, });
}
else
{
a.Add(new { f = x.value, t = x.value, });
}
return a;
});
var query3 =
from q in query2
select (q.f) + (q.f == q.t ? "" : "-" + q.t);
I get these results:
