Assuming I have the following
var list = new []{
new { Price = 1000, IsFirst = true},
new { Price = 1100, IsFirst = false},
new { Price = 450, IsFirst = true},
new { Price = 300, IsFirst = false}
};
and I want to generate the following output:
Price  IsFirst  First  Second  Final
------------------------------------
1000   True     1000   0       1000
1100   False    0      1100    -100
450    True     450    0       350
300    False    0      300     50
Is it possible to have some sort of aggregate function processed up to the current row? I'd like to do all of this in pure LINQ, but as of now I have no choice other than manually iterating the list and summing the column conditionally.
var result = list.Select(x => new
{
Price = x.Price,
IsFirst = x.IsFirst,
First = x.IsFirst ? x.Price : 0,
Second = !x.IsFirst ? x.Price : 0,
Final = 0 // ???
}).ToList();
int sum = 0;
for (int i = 0; i < result.Count; i++)
{
    sum += (result[i].IsFirst ? result[i].Price : -result[i].Price);
    // would update Final here, but anonymous types are immutable
}
The easiest way to do this is to use the Scan method from the Microsoft Rx team's Interactive Extensions (use NuGet and look for Ix-Main).
var query =
list
.Scan(new
{
Price = 0,
IsFirst = true,
First = 0,
Second = 0,
Final = 0
}, (a, x) => new
{
Price = x.Price,
IsFirst = x.IsFirst,
First = x.IsFirst ? x.Price : 0,
Second = !x.IsFirst ? x.Price : 0,
Final = a.Final + (x.IsFirst ? x.Price : - x.Price)
});
This gives:
Price  IsFirst  First  Second  Final
------------------------------------
1000   True     1000   0       1000
1100   False    0      1100    -100
450    True     450    0       350
300    False    0      300     50
However, you can do it with the built-in Aggregate operator like this:
var query =
list
.Aggregate(new []
{
new
{
Price = 0,
IsFirst = true,
First = 0,
Second = 0,
Final = 0
}
}.ToList(), (a, x) =>
{
a.Add(new
{
Price = x.Price,
IsFirst = x.IsFirst,
First = x.IsFirst ? x.Price : 0,
Second = !x.IsFirst ? x.Price : 0,
Final = a.Last().Final + (x.IsFirst ? x.Price : - x.Price)
});
return a;
})
.Skip(1);
You get the same result.
What you want is called a running total.
As far as I know, there is no built-in method to do this in LINQ, but various people have written extension methods to do that. One that I quickly found online is this one:
Rollup Extension Method: Create Running Totals using LINQ to Objects
Based on this, it should be fairly easy to turn your code into an extension method that
adds the price when it encounters an item with IsFirst = true,
subtracts it otherwise,
and yields the running value after each item, as sketched below.
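For reference, here is a minimal sketch of what such an extension method could look like; the method and type names here are my own, not from the linked article:
public static class RunningTotalExtensions
{
    // Yields the accumulated value after each element, similar to Ix's Scan
    // but without the extra dependency.
    public static IEnumerable<TAcc> RunningAggregate<TSource, TAcc>(
        this IEnumerable<TSource> source, TAcc seed, Func<TAcc, TSource, TAcc> step)
    {
        TAcc acc = seed;
        foreach (TSource item in source)
        {
            acc = step(acc, item);
            yield return acc;
        }
    }
}
With that in place, the Final column from the question becomes:
var finals = list.RunningAggregate(0, (sum, x) => sum + (x.IsFirst ? x.Price : -x.Price));
// 1000, -100, 350, 50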
You can do something like this:
int final = 0;
var result = list.Select(x => new
{
Price = x.Price,
IsFirst = x.IsFirst,
First = x.IsFirst ? x.Price : 0,
Second = !x.IsFirst ? x.Price : 0,
Final = x.IsFirst ? final+=x.Price : final-=x.Price
}).ToList();
But you would need to define an integer (final) outside of the LINQ expression to keep track of the running total.
I have two lists of uint type called firstReadOfMachineTotals and secondReadOfMachineTotals.
I'm completely new to C# programming.
I would like to compare both lists and, if any value in the second list is higher than the corresponding value in the first list, calculate by how much.
For instance...
firstReadOfMachineTotals = 10, 20, 4000, 554
secondReadOfMachineTotals = 10, 40, 4000, 554
I want to return '20' (based on the second item being 20 more than the equivalent in the first list).
Thanks
PS. There will never be more than one number different in the second list.
You can use a combination of Zip and Where:
var firstReadOfMachineTotals = new[]{ 10, 20, 4000, 554 };
var secondReadOfMachineTotals = new[]{ 10, 40, 4000, 554};
var result = firstReadOfMachineTotals.Zip(secondReadOfMachineTotals, (a,b) => b > a ? b - a : 0)
.Where(x => x > 0)
.OrderByDescending(x => x)
.FirstOrDefault();
Console.WriteLine(result); // output = 20
This method will default to 0 when all values are the same. If you instead want control over this default, you could also do:
var firstReadOfMachineTotals = new[]{ 10, 20, 4000, 554 };
var secondReadOfMachineTotals = new[]{ 10, 40, 4000, 554};
var result = firstReadOfMachineTotals.Zip(secondReadOfMachineTotals, (a,b) => b > a ? b - a : 0)
.Where(x => x>0)
.DefaultIfEmpty(int.MinValue) // or whatever default you desire
.Max();
Console.WriteLine(result); // output = 20
You can index into the lists, take the difference of the elements at each index, and then sum the differences to retrieve the result.
int result = Enumerable.Range(0, Math.Min(list1.Count, list2.Count))
.Select(i => list2[i] - list1[i] <= 0 ? 0 : list2[i] - list1[i]).Sum();
Use Zip:
var result = firstReadOfMachineTotals.Zip(secondReadOfMachineTotals,
(f, s) => s > f ? s - f : 0).Where(f => f > 0).DefaultIfEmpty(-1).Max();
The simplest solution would be to sort both arrays
Array.Sort(array1);
Array.Sort(array2);
and then compare the elements at each index and take action
for (int i = 0; i < array1.Length; i++)
{
    if (array1[i] < array2[i])
    {
        // Specify your action.
    }
}
Another way to do this is the following:
int difference = arr1
.Zip(arr2, (a, b) => (int?)Math.Max(b - a, 0))
.SingleOrDefault(d => d != 0) ?? 0;
It returns the difference if there is an element in the second collection which is larger than its corresponding element from the first collection.
If there isn't any, it returns zero.
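To make the behaviour concrete, here is the same snippet applied to the question's sample data:
var arr1 = new[] { 10, 20, 4000, 554 };
var arr2 = new[] { 10, 40, 4000, 554 };
int difference = arr1
    .Zip(arr2, (a, b) => (int?)Math.Max(b - a, 0))
    .SingleOrDefault(d => d != 0) ?? 0;
Console.WriteLine(difference); // 20 (and 0 for identical lists)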
Information to read:
LINQ Zip
LINQ SingleOrDefault
?? Operator
Nullable Types (int?)
It's my first question, so if it's not quite clear, you can ask for extra information. Keep in mind that English is not my native language :).
I was wondering if there's an elegant way to handle the following specification.
I think LINQ could be a possibility, but I haven't got enough experience with the technology to get it to work :).
Note this is not a homework assignment; it's just a way to get a new angle on solving these kinds of problems.
I've tried with the Aggregate function; maybe an Action could help.
I want to keep track of:
the maximum number of times a value occurs consecutively in an array.
Per value, it should display the maximum number of times that value occurred consecutively.
for example:
we have an array of 6 elements, each either 0 or 1
0, 0, 0, 1, 1, 0 result: 3 times 0, 2 times 1
0, 0, 1, 1, 1, 0 result: 2 times 0, 3 times 1
0, 1, 0, 1, 1, 0 result: 1 time 0, 2 times 1
0, 0, 1, 1, 0, 0 result: 2 times 0, 2 times 1
Thanks in advance
I don't think LINQ is a good way out here; a simple method will do:
// Disclaimer: Dictionary can't have a null key, so source must not contain nulls
private static Dictionary<T, int> ConsequentCount<T>(IEnumerable<T> source) {
if (null == source)
throw new ArgumentNullException("source");
Dictionary<T, int> result = new Dictionary<T, int>();
int count = -1;
T last = default(T);
foreach (T item in source) {
count = count < 0 || !object.Equals(last, item) ? 1 : count + 1;
last = item;
int v;
if (!result.TryGetValue(last, out v))
result.Add(last, count);
else if (v < count)
result[item] = count;
}
return result;
}
Tests:
int[][] source = new int[][] {
new[] { 0, 0, 0, 1, 1, 0 },
new[] { 0, 0, 1, 1, 1, 0 },
new[] { 0, 1, 0, 1, 1, 0 },
new[] { 0, 0, 1, 1, 0, 0 }, };
string report = string.Join(Environment.NewLine, source
.Select(array => $"{string.Join(" , ", array)} result : " +
string.Join(", ",
ConsequentCount(array)
.OrderBy(pair => pair.Key)
.Select(pair => $"{pair.Value} times {pair.Key}"))));
Console.Write(report);
Outcome:
0 , 0 , 0 , 1 , 1 , 0 result : 3 times 0, 2 times 1
0 , 0 , 1 , 1 , 1 , 0 result : 2 times 0, 3 times 1
0 , 1 , 0 , 1 , 1 , 0 result : 1 times 0, 2 times 1
0 , 0 , 1 , 1 , 0 , 0 result : 2 times 0, 2 times 1
You can write your own method for grouping consecutive values:
public static class Extension
{
public static IEnumerable<IEnumerable<int>> GroupConsecutive(this IEnumerable<int> list)
{
var group = new List<int>();
foreach (var i in list)
{
if (group.Count == 0 || i - group[group.Count - 1] == 0)
group.Add(i);
else
{
yield return group;
group = new List<int> {i};
}
}
yield return group;
}
}
And then use it like this:
var groups = new[] { 0, 0, 1, 1, 0, 0 }.GroupConsecutive();
var maxGroupped = groups.GroupBy(i => i.First()).Select(i => new
{
i.Key,
Count = i.Max(j => j.Count())
});
foreach (var g in maxGroupped)
Console.WriteLine(g.Count + " times " + g.Key);
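For the sample array { 0, 0, 1, 1, 0, 0 } this prints:
2 times 0
2 times 1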
Here is a lazy inefficient way (seems like linear complexity to me):
int[] arr = { 0, 0, 0, 1, 1, 0 };
string str = string.Concat(arr); // "000110"
int max0 = str.Split('1').Max(s => s.Length); // 3
int max1 = str.Split('0').Max(s => s.Length); // 2
and here is the efficient O(n) version:
int[] arr = { 0, 1, 1, 0, 0, 0 };
int i1 = 0, i = 1;
int[] max = { 0, 0 };
for (; i < arr.Length; i++)
{
if (arr[i] != arr[i1])
{
if (i - i1 > max[arr[i1]]) max[arr[i1]] = i - i1;
i1 = i;
}
}
if (i - i1 > max[arr[i1]]) max[arr[i1]] = i - i1;
Debug.Print(max[0] + "," + max[1]); // "3,2"
LINQ will be a bit ugly, but it is possible; your choice of Aggregate is the way to go, but it won't be a one-liner in any case.
Something like this will work:
static void Main(string[] args)
{
var list = new List<int>() { 0, 0, 1, 1, 1, 0, 0 };
var result = list.Aggregate(new
{
Last = (int?)null,
Counts = new Dictionary<int, int>(),
Max = new Dictionary<int, int>()
}, (context, current) =>
{
int count;
if (!context.Counts.TryGetValue(current, out count))
count = 1;
if (context.Last == current)
count += 1;
int lastMax;
context.Max.TryGetValue(current, out lastMax);
context.Max[current] = Math.Max(lastMax, count);
if (context.Last != current)
count = 1;
context.Counts[current] = count;
return new { Last = (int?)current, context.Counts, context.Max };
});
Console.WriteLine(string.Join(",", list) + " Result: ");
var output = string.Join(", ", result.Max.Select(x => string.Format("{0} times {1}", x.Value, x.Key)));
Console.WriteLine(output);
Console.ReadKey();
}
Like others said, performance-wise LINQ might not be the right tool for the job.
My LINQ-only version would be:
from array in arrayOfArrays
let result = new {
    // counts the leading run of zeroes
    Zeroes = array.TakeWhile(x => x == 0).Count(),
    // counts the first run of ones that follows the leading zeroes
    Ones = array.SkipWhile(x => x == 0).TakeWhile(x => x == 1).Count()
}
select $"{String.Join(", ", array)} result : {result.Zeroes} times 0, {result.Ones} times 1"
I'm not sure whether LINQ to Objects will be smart enough to optimize the query internally; we ARE iterating over the array multiple times within the query. So, as said, there may be a performance hit if you execute this over a lot of arrays. It would be interesting if anyone cared to benchmark this against the non-LINQ solutions.
First of all, thanks to everyone who put the time and effort into answering the question.
I've chosen Dmitry Bychenko's as the accepted answer; he was the first to provide one, and it was an elegant answer.
Matthew also deserves credit because he showed me how the Aggregate function works with conditionals.
Last but not least, Victor's answer was the simplest one. I enhanced it to work with generics.
void Main()
{
var groups = new[] { 0, 0, 1, 1, 0, 30, 1, 1, 1, 1, 1, 22, 22, 15, 15, 0, 0, 0, 0, 0, 0 }.GroupConsecutive();
groups.Dump();
var maxGroupped = groups.GroupBy(i => i.First()).Select(i => new
{
i.Key,
Count = i.Max(j => j.Count())
});
foreach (var g in maxGroupped)
Console.WriteLine(g.Count + " times " + g.Key);
}
public static class Extension
{
public static IEnumerable<IEnumerable<T>> GroupConsecutive<T>(this IEnumerable<T> list)
{
var group = new List<T>();
foreach (var value in list)
{
if (group.Count == 0 || value.Equals(group[group.Count-1]))
group.Add(value);
else
{
yield return group;
group = new List<T> {value};
}
}
yield return group;
    }
}
I realize my title probably isn't very clear so here's an example:
I have a list of objects with two properties, A and B.
public class Item
{
public int A { get; set; }
public int B { get; set; }
}
var list = new List<Item>
{
new Item() { A = 0, B = 0 },
new Item() { A = 0, B = 1 },
new Item() { A = 1, B = 0 },
new Item() { A = 2, B = 0 },
new Item() { A = 2, B = 1 },
new Item() { A = 2, B = 2 },
new Item() { A = 3, B = 0 },
new Item() { A = 3, B = 1 },
};
Using LINQ, what's the most elegant way to collapse all the A = 2 items into the first A = 2 item and return along with all the other items? This would be the expected result.
var list = new List<Item>
{
new Item() { A = 0, B = 0 },
new Item() { A = 0, B = 1 },
new Item() { A = 1, B = 0 },
new Item() { A = 2, B = 0 },
new Item() { A = 3, B = 0 },
new Item() { A = 3, B = 1 },
};
I'm not a LINQ expert and already have a "manual" solution but I really like the expressiveness of LINQ and was curious to see if it could be done better.
How about:
var collapsed = list.GroupBy(i => i.A)
.SelectMany(g => g.Key == 2 ? g.Take(1) : g);
The idea is to first group them by A and then select those again (flattening it with .SelectMany) but in the case of the Key being the one we want to collapse, we just take the first entry with Take(1).
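As a quick check, printing collapsed for the sample list yields the expected six items:
foreach (var item in collapsed)
    Console.WriteLine($"A = {item.A}, B = {item.B}");
// A = 0, B = 0
// A = 0, B = 1
// A = 1, B = 0
// A = 2, B = 0
// A = 3, B = 0
// A = 3, B = 1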
One way you can accomplish this is with GroupBy. Group the items by A, and use a SelectMany to project each group into a flat list again. In the SelectMany, check if A is 2 and if so Take(1), otherwise return all results for that group. We're using Take instead of First because the result has to be IEnumerable.
var grouped = list.GroupBy(g => g.A);
var collapsed = grouped.SelectMany(g =>
{
if (g.Key == 2)
{
return g.Take(1);
}
return g;
});
One possible solution (if you insist on LINQ):
int a = 2;
var output = list.GroupBy(o => o.A == a ? a.ToString() : Guid.NewGuid().ToString())
.Select(g => g.First())
.ToList();
All items with A == 2 are grouped under the single key "2", while every other item gets a unique key (a new GUID), so each of them forms its own one-item group. Then we take the first item from each group.
Yet another way:
var newlist = list.Where (l => l.A != 2 ).ToList();
newlist.Add( list.First (l => l.A == 2) );
An alternative to the other answers based on GroupBy is Aggregate:
// Aggregate lets iterate a sequence and accumulate a result (the first arg)
var list2 = list.Aggregate(new List<Item>(), (result, next) => {
// This will add the item in the source sequence either
// if A != 2 or, if it's A == 2, it will check that there's no A == 2
// already in the resulting sequence!
if(next.A != 2 || !result.Any(item => item.A == 2)) result.Add(next);
return result;
});
What about this:
list.RemoveAll(l => l.A == 2 && l != list.FirstOrDefault(i => i.A == 2));
If you would like a more efficient way, it would be:
var first = list.FirstOrDefault(i => i.A == 2);
list.RemoveAll(l => l.A == 2 && l != first);
I'm learning RavenDB by using it in a project and trying to do stuff. I don't have any background in SQL/relational databases, which is why I find map/reduce and document databases easier to use.
I am attempting to make one static index that creates an object holding the counts of 4 conditional fields, instead of making 4 static indexes and combining the results after 4 database queries.
Here is the static index:
public class Client_ProductDeploymentSummary : AbstractIndexCreationTask<Product, ClientProductDeploymentResult>
{
public Client_ProductDeploymentSummary()
{
Map = products =>
from product in products
select new {
product.ClientName,
NumberProducts = 1,
NumberProductsWithCondition = 0,
NumberProductsWithoutCondition = 0,
NumberProductsConditionTestInconclusive = 0
};
Map = products =>
from product in products
where product.TestResults.Condition == true
select new
{
product.ClientName,
NumberProducts = 0,
NumberProductsWithCondition = 1,
NumberProductsWithoutCondition = 0,
NumberProductsConditionTestInconclusive = 0
};
Map = products =>
from product in products
where product.TestResults.Condition == false
select new
{
product.ClientName,
NumberProducts = 0,
NumberProductsWithCondition = 0,
NumberProductsWithoutCondition = 1,
NumberProductsConditionTestInconclusive = 0
};
Map = products =>
from product in products
where product.TestResults.Condition == null
select new
{
product.ClientName,
NumberProducts = 0,
NumberProductsWithCondition = 0,
NumberProductsWithoutCondition = 0,
NumberProductsConditionTestInconclusive = 1
};
Reduce = results =>
from result in results
group result by result.ClientName
into g
select new ClientProductDeploymentResult()
{
ClientName = g.Key,
NumberProducts = g.Sum(x => x.NumberProducts),
NumberProductsWithCondition = g.Sum(x => x.NumberProductsWithCondition),
NumberProductsWithoutCondition = g.Sum(x => x.NumberProductsWithoutCondition),
NumberProductsConditionTestInconclusive = g.Sum(x => x.NumberProductsConditionTestInconclusive)
};
}
}
I added the 4 variables to each select new statement to make the index compile and deploy in my unit test. I can't seem to use the AddMap(...) function as I've seen in some examples (I realize I'm just overwriting the Map variable). There are not that many clients, in the 10s or 100s, but possibly many products, definitely in the 1000s per client.
Is there a concise way to express the intent of this index? Or is one map/reduce per field, combining the results in caller code, the better way to go?
MultiMap indexes have a different base class. You would inherit from AbstractMultiMapIndexCreationTask to create a multimap index.
However, what you describe here is not suited for multimap. You use multimap when the data is coming from different source documents, not when the conditions are different. What you need is a single map statement that has your conditional logic inline.
Map = products =>
from product in products
select new {
product.ClientName,
NumberProducts = 1,
NumberProductsWithCondition = product.TestResults.Condition == true ? 1 : 0,
NumberProductsWithoutCondition = product.TestResults.Condition == false ? 1 : 0,
NumberProductsConditionTestInconclusive = product.TestResults.Condition == null ? 1 : 0
};
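For reference, if the products really did live in several document types, a multimap index would be shaped roughly like the sketch below. This is illustrative only; SomeOtherProduct is a hypothetical second source type, and the Reduce is the one from the question:
public class Client_ProductDeploymentSummary_MultiMapSketch : AbstractMultiMapIndexCreationTask<ClientProductDeploymentResult>
{
    public Client_ProductDeploymentSummary_MultiMapSketch()
    {
        // Each AddMap contributes entries from a different document type;
        // all maps must project the same set of fields.
        AddMap<Product>(products =>
            from product in products
            select new
            {
                product.ClientName,
                NumberProducts = 1,
                NumberProductsWithCondition = product.TestResults.Condition == true ? 1 : 0,
                NumberProductsWithoutCondition = product.TestResults.Condition == false ? 1 : 0,
                NumberProductsConditionTestInconclusive = product.TestResults.Condition == null ? 1 : 0
            });
        AddMap<SomeOtherProduct>(others =>
            from other in others
            select new
            {
                other.ClientName,
                NumberProducts = 1,
                NumberProductsWithCondition = 0,
                NumberProductsWithoutCondition = 0,
                NumberProductsConditionTestInconclusive = 0
            });
        Reduce = results =>
            from result in results
            group result by result.ClientName into g
            select new ClientProductDeploymentResult
            {
                ClientName = g.Key,
                NumberProducts = g.Sum(x => x.NumberProducts),
                NumberProductsWithCondition = g.Sum(x => x.NumberProductsWithCondition),
                NumberProductsWithoutCondition = g.Sum(x => x.NumberProductsWithoutCondition),
                NumberProductsConditionTestInconclusive = g.Sum(x => x.NumberProductsConditionTestInconclusive)
            };
    }
}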
I was looking for a fast and efficient way to merge items in an array. This is my scenario: the collection is sorted by From. Adjacent elements don't necessarily differ by 1, that is, there can be gaps between one range's To and the next range's From, but ranges never overlap.
var list = new List<Range>();
list.Add(new Range() { From = 0, To = 1, Category = "AB" });
list.Add(new Range() { From = 2, To = 3, Category = "AB" });
list.Add(new Range() { From = 4, To = 5, Category = "AB" });
list.Add(new Range() { From = 6, To = 8, Category = "CD" });
list.Add(new Range() { From = 9, To = 11, Category = "AB" }); // 12 is missing, this is ok
list.Add(new Range() { From = 13, To = 15, Category = "AB" });
I would like the above collection to be merged in such a way that the first three elements (this number can vary: at least 2 elements, and as many as satisfy the condition) become one element. Elements with different categories cannot be merged.
new Range() { From = 0, To = 5, Category = "AB" };
So that the resulting collection would have 4 elements total.
0 - 5 AB
6 - 8 CD
9 - 11 AB // no merging here, 12 is missing
13 - 15 AB
I have a very large collection with over 2,000,000 items and I would like to do this as efficiently as possible.
Here's a generic, reusable solution rather than an ad hoc, specific solution.
(Updated based on comments)
IEnumerable<T> Merge<T>(this IEnumerable<T> coll,
    Func<T, T, bool> canBeMerged, Func<T, T, T> mergeItems)
{
    using (IEnumerator<T> iter = coll.GetEnumerator())
    {
        if (iter.MoveNext())
        {
            T lhs = iter.Current;
            while (iter.MoveNext())
            {
                T rhs = iter.Current;
                if (canBeMerged(lhs, rhs))
                    lhs = mergeItems(lhs, rhs);
                else
                {
                    yield return lhs;
                    lhs = rhs;
                }
            }
            yield return lhs;
        }
    }
}
You will have to provide methods to determine whether two items can be merged, and to merge them.
These really should be part of your Range class, so it would be called like this:
list.Merge((l, r) => l.IsFollowedBy(r), (l, r) => l.CombineWith(r));
If you don't have these methods, then you would have to call it like:
list.Merge((l, r) => l.Category == r.Category && l.To + 1 == r.From,
           (l, r) => new Range { From = l.From, To = r.To, Category = l.Category });
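For illustration, those members could be shaped like this on the Range class (a sketch matching the lambdas above, not code from the original answer):
public class Range
{
    public int From { get; set; }
    public int To { get; set; }
    public string Category { get; set; }

    // True when 'other' starts exactly where this range ends, in the same category.
    public bool IsFollowedBy(Range other)
    {
        return Category == other.Category && To + 1 == other.From;
    }

    // Produces a single range covering both this range and 'other'.
    public Range CombineWith(Range other)
    {
        return new Range { From = From, To = other.To, Category = Category };
    }
}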
Well, from the statement of the problem I think it is obvious that you cannot avoid iterating through the original collection of 2 million items:
var output = new List<Range>();
var currentFrom = list[0].From;
var currentTo = list[0].To;
var currentCategory = list[0].Category;
for (int i = 1; i < list.Count; i++)
{
var item = list[i];
if (item.Category == currentCategory && item.From == currentTo + 1)
currentTo = item.To;
else
{
output.Add(new Range { From = currentFrom, To = currentTo,
Category = currentCategory });
currentFrom = item.From;
currentTo = item.To;
currentCategory = item.Category;
}
}
output.Add(new Range { From = currentFrom, To = currentTo,
Category = currentCategory });
I’d be interested to see if there is a solution more optimised for performance.
Edit: I assumed that the input list is sorted. If it is not, I recommend sorting it first instead of trying to fiddle this into the algorithm. Sorting is only O(n log n), but if you tried to fiddle it in, you easily get O(n²), which is worse.
list.Sort((a, b) => a.From < b.From ? -1 : a.From > b.From ? 1 : 0);
As an aside, I wrote this solution because you asked for one that is performance-optimised. To this end, I didn’t make it generic, I didn’t use delegates, I didn’t use Linq extension methods, and I used local variables of primitive types and tried to avoid accessing object fields as much as possible.
Here's another one:
IEnumerable<Range> Merge(IEnumerable<Range> input)
{
input = input.OrderBy(r => r.Category).ThenBy(r => r.From).ThenBy(r => r.To).ToArray();
var ignored = new HashSet<Range>();
foreach (Range r1 in input)
{
if (ignored.Contains(r1))
continue;
Range tmp = r1;
foreach (Range r2 in input)
{
if (tmp == r2 || ignored.Contains(r2))
continue;
Range merged;
if (TryMerge(tmp, r2, out merged))
{
tmp = merged;
ignored.Add(r1);
ignored.Add(r2);
}
}
yield return tmp;
}
}
bool TryMerge(Range r1, Range r2, out Range merged)
{
merged = null;
if (r1.Category != r2.Category)
return false;
if (r1.To + 1 < r2.From || r2.To + 1 < r1.From)
return false;
merged = new Range
{
From = Math.Min(r1.From, r2.From),
To = Math.Max(r1.To, r2.To),
Category = r1.Category
};
return true;
}
You could use it directly:
var mergedList = Merge(list);
But that would be very inefficient if you have many items, as the complexity is O(n²). However, since only items in the same category can be merged, you can group them by category, merge each group, and then flatten the result:
var mergedList = list.GroupBy(r => r.Category)
.Select(g => Merge(g))
.SelectMany(g => g);
Assuming that the list is sorted -and- the ranges are non overlapping, as you have stated in the question, this will run in O(n) time:
var flattenedRanges = new List<Range>{new Range(list.First())};
foreach (var range in list.Skip(1))
{
if (flattenedRanges.Last().To + 1 == range.From && flattenedRanges.Last().Category == range.Category)
flattenedRanges.Last().To = range.To;
else
flattenedRanges.Add(new Range(range));
}
This is assuming you have a copy-constructor for Range
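A possible shape for that copy-constructor (keep a parameterless constructor too, for the object-initializer syntax used elsewhere):
public Range(Range other)
{
    From = other.From;
    To = other.To;
    Category = other.Category;
}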
EDIT:
Here's an in-place algorithm:
for (int i = 1; i < list.Count(); i++)
{
if (list[i].From == list[i - 1].To+1 && list[i-1].Category == list[i].Category)
{
list[i - 1].To = list[i].To;
list.RemoveAt(i--);
}
}
EDIT:
Added the category check and fixed the in-place version.