Linq group by Chunks [duplicate] - c#

Let's take a class called Cls:
public class Cls
{
public int SequenceNumber { get; set; }
public int Value { get; set; }
}
Now, let's populate a collection with the following elements:
Sequence Number   Value
===============   =====
1                 9
2                 9
3                 15
4                 15
5                 15
6                 30
7                 9
What I need to do is enumerate the elements in sequence-number order and check whether the next element has the same value. If it does, the elements are merged into one range, so the desired output is as follows:
Sequence Number From   Sequence Number To   Value
====================   ==================   =====
1                      2                    9
3                      5                    15
6                      6                    30
7                      7                    9
How can I perform this operation using LINQ query?

You can use a modified version of LINQ's GroupBy that groups items only while they are adjacent. Then it's as easy as:
var result = classes
.GroupAdjacent(c => c.Value)
.Select(g => new {
SequenceNumFrom = g.Min(c => c.SequenceNumber),
SequenceNumTo = g.Max(c => c.SequenceNumber),
Value = g.Key
});
foreach (var x in result)
Console.WriteLine("SequenceNumFrom:{0} SequenceNumTo:{1} Value:{2}", x.SequenceNumFrom, x.SequenceNumTo, x.Value);
Result:
SequenceNumFrom:1 SequenceNumTo:2 Value:9
SequenceNumFrom:3 SequenceNumTo:5 Value:15
SequenceNumFrom:6 SequenceNumTo:6 Value:30
SequenceNumFrom:7 SequenceNumTo:7 Value:9
This is the extension method to group adjacent items:
public static class EnumerableExtensions
{
public static IEnumerable<IGrouping<TKey, TSource>> GroupAdjacent<TSource, TKey>(
this IEnumerable<TSource> source,
Func<TSource, TKey> keySelector)
{
TKey last = default(TKey);
bool haveLast = false;
List<TSource> list = new List<TSource>();
foreach (TSource s in source)
{
TKey k = keySelector(s);
if (haveLast)
{
// Use the default comparer so that null keys do not throw.
if (!EqualityComparer<TKey>.Default.Equals(k, last))
{
yield return new GroupOfAdjacent<TSource, TKey>(list, last);
list = new List<TSource>();
list.Add(s);
last = k;
}
else
{
list.Add(s);
last = k;
}
}
else
{
list.Add(s);
last = k;
haveLast = true;
}
}
if (haveLast)
yield return new GroupOfAdjacent<TSource, TKey>(list, last);
}
}
and the class used:
public class GroupOfAdjacent<TSource, TKey> : IEnumerable<TSource>, IGrouping<TKey, TSource>
{
public TKey Key { get; set; }
private List<TSource> GroupList { get; set; }
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
return ((System.Collections.Generic.IEnumerable<TSource>)this).GetEnumerator();
}
System.Collections.Generic.IEnumerator<TSource> System.Collections.Generic.IEnumerable<TSource>.GetEnumerator()
{
foreach (var s in GroupList)
yield return s;
}
public GroupOfAdjacent(List<TSource> source, TKey key)
{
GroupList = source;
Key = key;
}
}

You can use this linq query
var values = (new[] { 9, 9, 15, 15, 15, 30, 9 }).Select((x, i) => new { x, i });
var query = from v in values
let firstNonValue = values.Where(v2 => v2.i >= v.i && v2.x != v.x).FirstOrDefault()
let grouping = firstNonValue == null ? int.MaxValue : firstNonValue.i
group v by grouping into v
select new
{
From = v.Min(y => y.i) + 1,
To = v.Max(y => y.i) + 1,
Value = v.Min(y => y.x)
};

MoreLinq provides this functionality out of the box.
It's called GroupAdjacent and is implemented as an extension method on IEnumerable<T>. Its documentation describes it as follows:
Groups the adjacent elements of a sequence according to a specified key selector function.
enumerable.GroupAdjacent(e => e.Key)
There is even a NuGet "source" package that contains only that method, if you don't want to pull in an additional binary NuGet package.
The method returns an IEnumerable<IGrouping<TKey, TValue>>, so its output can be processed in the same way as the output of GroupBy.

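You can do it like this (Run is assumed to be a small helper class with settable From, To, and Value properties):
var all = new [] {
new Cls { SequenceNumber = 1, Value = 9 }
, new Cls { SequenceNumber = 2, Value = 9 }
, new Cls { SequenceNumber = 3, Value = 15 }
, new Cls { SequenceNumber = 4, Value = 15 }
, new Cls { SequenceNumber = 5, Value = 15 }
, new Cls { SequenceNumber = 6, Value = 30 }
, new Cls { SequenceNumber = 7, Value = 9 }
};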
var f = all.First();
var res = all.Skip(1).Aggregate(
new List<Run> {new Run {From = f.SequenceNumber, To = f.SequenceNumber, Value = f.Value} }
, (p, v) => {
if (v.Value == p.Last().Value) {
p.Last().To = v.SequenceNumber;
} else {
p.Add(new Run {From = v.SequenceNumber, To = v.SequenceNumber, Value = v.Value});
}
return p;
});
foreach (var r in res) {
Console.WriteLine("{0} - {1} : {2}", r.From, r.To, r.Value);
}
The idea is to use Aggregate creatively: starting with a list consisting of a single Run, examine the content of the list we've got so far at each stage of aggregation (the if statement in the lambda). Depending on the last value, either continue the old run, or start a new one.

I was able to accomplish it by creating a custom extension method.
static class Extensions {
internal static IEnumerable<Tuple<int, int, int>> GroupAdj(this IEnumerable<Cls> enumerable) {
Cls start = null;
Cls end = null;
foreach (Cls cls in enumerable) {
if (start == null) {
start = cls;
end = cls;
continue;
}
if (start.Value == cls.Value) {
end = cls;
continue;
}
yield return Tuple.Create(start.SequenceNumber, end.SequenceNumber, start.Value);
start = cls;
end = cls;
}
// Emit the final run, unless the input was empty.
if (start != null) {
yield return Tuple.Create(start.SequenceNumber, end.SequenceNumber, start.Value);
}
}
}
Here's how to use it:
static void Main() {
List<Cls> items = new List<Cls> {
new Cls { SequenceNumber = 1, Value = 9 },
new Cls { SequenceNumber = 2, Value = 9 },
new Cls { SequenceNumber = 3, Value = 15 },
new Cls { SequenceNumber = 4, Value = 15 },
new Cls { SequenceNumber = 5, Value = 15 },
new Cls { SequenceNumber = 6, Value = 30 },
new Cls { SequenceNumber = 7, Value = 9 }
};
Console.WriteLine("From To Value");
Console.WriteLine("===== ===== =====");
foreach (var item in items.OrderBy(i => i.SequenceNumber).GroupAdj()) {
Console.WriteLine("{0,-5} {1,-5} {2,-5}", item.Item1, item.Item2, item.Item3);
}
}
And the expected output:
From To Value
===== ===== =====
1 2 9
3 5 15
6 6 30
7 7 9

Here is an implementation without any helper methods:
var grp = 0;
var results =
from i
in
input.Zip(
input.Skip(1).Concat(new [] {input.Last ()}),
(n1, n2) => Tuple.Create(
n1, (n2.Value == n1.Value) ? grp : grp++
)
)
group i by i.Item2 into gp
select new {SequenceNumFrom = gp.Min(x => x.Item1.SequenceNumber),SequenceNumTo = gp.Max(x => x.Item1.SequenceNumber), Value = gp.Min(x => x.Item1.Value)};
The idea is:
Keep track of your own grouping indicator, grp.
Join each item of the collection to the next item in the collection (via Skip(1) and Zip).
If the Values match, they are in the same group; otherwise, increment grp to signal the start of the next group.

Untested dark magic follows. The imperative version seems like it would be easier in this case.
IEnumerable<Cls> data = ...;
var query = data
.GroupBy(x => x.Value)
.Select(g => new
{
Value = g.Key,
Sequences = g
.OrderBy(x => x.SequenceNumber)
.Select((x,i) => new
{
x.SequenceNumber,
OffsetSequenceNumber = x.SequenceNumber - i
})
.GroupBy(x => x.OffsetSequenceNumber)
.Select(run => run
.Select(x => x.SequenceNumber)
.OrderBy(x => x)
.ToList())
.ToList()
})
.SelectMany(x => x.Sequences
.Select(s => new { First = s.First(), Last = s.Last(), x.Value }))
.OrderBy(x => x.First)
.ToList();
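For comparison, here is a rough sketch of what the imperative version mentioned above could look like. It is not part of the original answer: CollapseRuns is a hypothetical helper name, the items are modeled as (SequenceNumber, Value) tuples rather than Cls instances, and it is written as a C# 9 top-level program. It assumes the input is already ordered by sequence number.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: merges adjacent items with equal Value into (From, To, Value) runs.
List<(int From, int To, int Value)> CollapseRuns(
    IEnumerable<(int SequenceNumber, int Value)> items)
{
    var result = new List<(int From, int To, int Value)>();
    foreach (var item in items)
    {
        int lastIndex = result.Count - 1;
        if (lastIndex >= 0 && result[lastIndex].Value == item.Value)
        {
            // Same value as the current run: extend its upper bound.
            result[lastIndex] = (result[lastIndex].From, item.SequenceNumber, item.Value);
        }
        else
        {
            // Different value (or first item): start a new run.
            result.Add((item.SequenceNumber, item.SequenceNumber, item.Value));
        }
    }
    return result;
}

var data = new[] { (1, 9), (2, 9), (3, 15), (4, 15), (5, 15), (6, 30), (7, 9) };
var collapsed = CollapseRuns(data);
foreach (var r in collapsed)
    Console.WriteLine($"{r.From} {r.To} {r.Value}");
// Prints:
// 1 2 9
// 3 5 15
// 6 6 30
// 7 7 9
```

Unlike the query above, this sketch does not require the sequence numbers to be consecutive integers. It only compares neighbouring values.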

Let me propose another option, which lazily yields both the sequence of groups and the elements inside each group.
Implementation:
public static class EnumerableExtensions
{
public static IEnumerable<IGrouping<TKey, TSource>> GroupAdjacent<TSource, TKey>(
this IEnumerable<TSource> source,
Func<TSource, TKey> keySelector,
IEqualityComparer<TKey>? comparer = null)
{
var comparerOrDefault = comparer ?? EqualityComparer<TKey>.Default;
using var iterator = new Iterator<TSource>(source.GetEnumerator());
iterator.MoveNext();
while (iterator.HasCurrent)
{
var key = keySelector(iterator.Current);
var elements = YieldAdjacentElements(iterator, key, keySelector, comparerOrDefault);
yield return new Grouping<TKey, TSource>(key, elements);
while (iterator.HasCurrentWithKey(key, keySelector, comparerOrDefault))
{
iterator.MoveNext();
}
}
}
static IEnumerable<TSource> YieldAdjacentElements<TKey, TSource>(
Iterator<TSource> iterator,
TKey key,
Func<TSource, TKey> keySelector,
IEqualityComparer<TKey> comparer)
{
while (iterator.HasCurrentWithKey(key, keySelector, comparer))
{
yield return iterator.Current;
iterator.MoveNext();
}
}
private static bool HasCurrentWithKey<TKey, TSource>(
this Iterator<TSource> iterator,
TKey key,
Func<TSource, TKey> keySelector,
IEqualityComparer<TKey> comparer) =>
iterator.HasCurrent && comparer.Equals(keySelector(iterator.Current), key);
private sealed class Grouping<TKey, TElement> : IGrouping<TKey, TElement>
{
public Grouping(TKey key, IEnumerable<TElement> elements)
{
Key = key;
Elements = elements;
}
public TKey Key { get; }
public IEnumerable<TElement> Elements { get; }
public IEnumerator<TElement> GetEnumerator() => Elements.GetEnumerator();
IEnumerator IEnumerable.GetEnumerator() => Elements.GetEnumerator();
}
private sealed class Iterator<T> : IDisposable
{
private readonly IEnumerator<T> _enumerator;
public Iterator(IEnumerator<T> enumerator)
{
_enumerator = enumerator;
}
public bool HasCurrent { get; private set; }
public T Current => _enumerator.Current;
public void MoveNext()
{
HasCurrent = _enumerator.MoveNext();
}
public void Dispose()
{
_enumerator.Dispose();
}
}
}
Note that this level of laziness is impossible with the regular GroupBy operation, since it has to read the whole collection before yielding the first group.
In my case, migrating from GroupBy to GroupAdjacent, combined with lazy handling of the whole pipeline, resolved memory consumption issues for large sequences.
In general, GroupAdjacent can be used as a lazy and more efficient alternative to GroupBy, provided that the keys in the input collection are sorted (or at least not fragmented) and that all operations in the pipeline are lazy. Because every group reads from one shared enumerator, each group must be consumed before moving on to the next one.
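The difference in laziness can be observed with a small experiment. The sketch below is an illustration only: Source and AdjacentRuns are hypothetical names, AdjacentRuns is a deliberately simplified stand-in (it buffers one run at a time and is not the GroupAdjacent implementation above), and the code is written as a C# 9 top-level program. It counts how many elements are pulled from the source before the first group becomes available.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int pulled = 0;

// Source that records how many elements the consumer has requested so far.
IEnumerable<int> Source()
{
    foreach (var x in new[] { 9, 9, 15, 15, 15, 30, 9 })
    {
        pulled++;
        yield return x;
    }
}

// Hypothetical minimal adjacent grouping: buffers a single run at a time.
IEnumerable<List<int>> AdjacentRuns(IEnumerable<int> source)
{
    List<int> run = null;
    foreach (var x in source)
    {
        if (run != null && run[0] != x)
        {
            yield return run;   // the run ended, hand it to the consumer
            run = null;
        }
        if (run == null) run = new List<int>();
        run.Add(x);
    }
    if (run != null) yield return run;
}

var firstGroup = Source().GroupBy(x => x).First();
int pulledByGroupBy = pulled;    // 7: the whole source was read

pulled = 0;
var firstRun = AdjacentRuns(Source()).First();
int pulledByAdjacent = pulled;   // 3: the first run plus the element that ended it

Console.WriteLine($"GroupBy pulled {pulledByGroupBy}, adjacent grouping pulled {pulledByAdjacent}");
```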

Related

How to find consecutive same values items as a Linq group

var schedules = new List<Item>{
new Item { Id=1, Name = "S" },
new Item { Id=2, Name = "P" },
new Item { Id=3, Name = "X" },
new Item { Id=4, Name = "X" },
new Item { Id=5, Name = "P" },
new Item { Id=6, Name = "P" },
new Item { Id=7, Name = "P" },
new Item { Id=8, Name = "S" }
};
I want to select same values and same orders in a new list like this:
var groupedAndSelectedList = new List<List<Item>>{
new List<Item> {
new Item { Id=3, Name = "X" },
new Item { Id=4, Name = "X" },
},
new List<Item> {
new Item { Id=5, Name = "P" },
new Item { Id=6, Name = "P" },
new Item { Id=7, Name = "P" },
}
}
If an item stands alone, like new Item { Id=3, Name = "A" }, I do not need it.
GroupBy selects all X or all P elements in the list, but I only want items that stand directly before or after another item with the same name.
Is this possible using linq?
What you're looking for here is a GroupWhile<T> method.
Credit to user L.B for the solution. Go give his original answer an UpDoot
https://stackoverflow.com/a/20469961/30155
var schedules = new List<Item>{
new Item { Id=1, Name = "S" },
new Item { Id=2, Name = "P" },
new Item { Id=3, Name = "X" },
new Item { Id=4, Name = "X" },
new Item { Id=5, Name = "P" },
new Item { Id=6, Name = "P" },
new Item { Id=7, Name = "P" },
new Item { Id=8, Name = "S" }
};
var results = schedules
.GroupWhile((preceding, next) => preceding.Name == next.Name)
//Group items, while the next is equal to the preceding one
.Where(s => s.Count() > 1)
//Only include results where the generated sublist have more than 1 element.
.ToList();
foreach (var sublist in results)
{
foreach (Item i in sublist)
{
Console.WriteLine($"{i.Name} - {i.Id}");
}
Console.WriteLine("");
}
Console.ReadLine();
You can add the implementation as an Extension Method to all IEnumerable<T> like so.
public static class Extensions
{
public static IEnumerable<IEnumerable<T>> GroupWhile<T>(this IEnumerable<T> seq, Func<T, T, bool> condition)
{
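using (var e = seq.GetEnumerator())
{
// An empty source yields no groups instead of throwing, and the source is enumerated only once.
if (!e.MoveNext()) yield break;
T prev = e.Current;
List<T> list = new List<T>() { prev };
while (e.MoveNext())
{
T item = e.Current;
if (!condition(prev, item))
{
yield return list;
list = new List<T>();
}
list.Add(item);
prev = item;
}
yield return list;
}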
}
}
You can do it by maintaining the count of items that you found so far. This helps you find consecutive items, because the value of count(name) - index is invariant for them:
IDictionary<string,int> count = new Dictionary<string,int>();
var groups = schedules
.Select((s, i) => new {
Item = s
, Index = i
})
.GroupBy(p => {
var name = p.Item.Name;
int current;
if (!count.TryGetValue(name, out current)) {
current = 0;
count.Add(name, current);
}
count[name] = current + 1;
return new { Name = name, Order = current - p.Index };
})
.Select(g => g.ToList())
.Where(g => g.Count > 1)
.ToList();
This produces the desired output for your example:
{ Item = Id=3 Name=X, Index = 2 }
{ Item = Id=4 Name=X, Index = 3 }
-----
{ Item = Id=5 Name=P, Index = 4 }
{ Item = Id=6 Name=P, Index = 5 }
{ Item = Id=7 Name=P, Index = 6 }
Note: If Order = current - p.Index expression looks a little like "magic", consider removing the final Select and Where clauses, and enumerating group keys.
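To see why the key works, the sketch below computes the Order component for the example names by hand. It is only an illustration written as a C# 9 top-level program: the variable names (names, seen, keys) are made up here, and only the Name values from the question are used.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var names = new[] { "S", "P", "X", "X", "P", "P", "P", "S" };
var seen = new Dictionary<string, int>();            // occurrences of each name so far
var keys = new List<(string Name, int Order)>();

for (int index = 0; index < names.Length; index++)
{
    seen.TryGetValue(names[index], out int current); // current is 0 when the name is new
    seen[names[index]] = current + 1;
    keys.Add((names[index], current - index));
}

foreach (var k in keys)
    Console.WriteLine($"{k.Name}: {k.Order}");
// Prints:
// S: 0
// P: -1
// X: -2
// X: -2
// P: -3
// P: -3
// P: -3
// S: -6
```

Within a run, the count and the index both grow by one per item, so their difference stays constant. Any gap between two runs of the same name advances the index but not the count, which is why the isolated P (-1) and the later run of P (-3) end up with different keys.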
@dasblinkenlight has provided an answer that just uses LINQ. Any answer using purely existing LINQ methods may be ugly, may perform poorly, and may not be highly reusable. (This is not a criticism of that answer. It's a criticism of LINQ.)
@eoin-campbell has provided an answer that uses a custom LINQ method. However, I think it can be improved upon to more closely match the capabilities of the existing LINQ GroupBy function, such as custom comparers (for when you need to do things like case-insensitive comparison of the keys). This Partition method below looks and feels like the GroupBy function but meets the requirement for consecutive items.
You can use this method to meet your goal by doing the following. Notice that it looks exactly like how you would write this if you didn't have the consecutivity requirement, but it's using Partition instead of GroupBy.
var partitionsWithMoreThan1 = schedules.Partition(o => o.Name)
.Where(p => p.Count() > 1)
.Select(p => p.ToList())
.ToList();
Here's the method:
static class EnumerableExtensions
{
/// <summary>
/// Partitions the elements of a sequence into smaller collections according to a specified
/// key selector function, optionally comparing the keys by using a specified comparer.
/// Unlike GroupBy, this method does not produce a single collection for each key value.
/// Instead, this method produces a collection for each consecutive set of matching keys.
/// </summary>
/// <typeparam name="TSource">The type of the elements of <paramref name="source"/>.</typeparam>
/// <typeparam name="TKey">The type of the key returned by <paramref name="keySelector"/>.</typeparam>
/// <param name="source">An <see cref="IEnumerable{T}"/> whose elements to partition.</param>
/// <param name="keySelector">A function to extract the key for each element.</param>
/// <param name="comparer">An <see cref="IEqualityComparer{T}"/> to compare keys.</param>
/// <returns>
/// An <b>IEnumerable{IGrouping{TKey, TSource}}</b> in C#
/// or <b>IEnumerable(Of IGrouping(Of TKey, TSource))</b> in Visual Basic
/// where each <see cref="IGrouping{TKey,TElement}"/> object contains a collection of objects and a key.
/// </returns>
public static IEnumerable<IGrouping<TKey, TSource>> Partition<TKey, TSource>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector, IEqualityComparer<TKey> comparer = null)
{
if (comparer == null)
comparer = EqualityComparer<TKey>.Default;
using (var enumerator = source.GetEnumerator())
{
if (enumerator.MoveNext())
{
var item = enumerator.Current;
var partitionKey = keySelector(item);
var itemsInPartition = new List<TSource> {item};
var lastPartitionKey = partitionKey;
while (enumerator.MoveNext())
{
item = enumerator.Current;
partitionKey = keySelector(item);
if (comparer.Equals(partitionKey, lastPartitionKey))
{
itemsInPartition.Add(item);
}
else
{
yield return new Grouping<TKey, TSource>(lastPartitionKey, itemsInPartition);
itemsInPartition = new List<TSource> {item};
lastPartitionKey = partitionKey;
}
}
yield return new Grouping<TKey, TSource>(lastPartitionKey, itemsInPartition);
}
}
}
// it's a shame there's no ready-made public implementation that will do this
private class Grouping<TKey, TSource> : IGrouping<TKey, TSource>
{
public Grouping(TKey key, List<TSource> items)
{
_items = items;
Key = key;
}
public TKey Key { get; }
public IEnumerator<TSource> GetEnumerator()
{
return _items.GetEnumerator();
}
IEnumerator IEnumerable.GetEnumerator()
{
return _items.GetEnumerator();
}
private readonly List<TSource> _items;
}
}
Based on the comment clarifications (the question is really unclear now), I think this is what is needed.
It uses an extension method, GroupByRuns, that groups runs of keys together. GroupByRuns is based on GroupByWhile, which groups by testing consecutive items. GroupByWhile in turn is based on ScanPair, a variation of my APL-inspired Scan operator that is like Aggregate but returns intermediate results, and uses a ValueTuple (Key, Value) to pair keys with values along the way.
public static IEnumerable<IGrouping<int, TRes>> GroupByRuns<T, TKey, TRes>(this IEnumerable<T> src, Func<T,TKey> keySelector, Func<T,TRes> resultSelector, IEqualityComparer<TKey> cmp = null) {
cmp = cmp ?? EqualityComparer<TKey>.Default;
return src.GroupByWhile((prev, cur) => cmp.Equals(keySelector(prev), keySelector(cur)), resultSelector);
}
public static IEnumerable<IGrouping<int, T>> GroupByRuns<T, TKey>(this IEnumerable<T> src, Func<T,TKey> keySelector) => src.GroupByRuns(keySelector, e => e);
public static IEnumerable<IGrouping<int, T>> GroupByRuns<T>(this IEnumerable<T> src) => src.GroupByRuns(e => e, e => e);
public static IEnumerable<IGrouping<int, TRes>> GroupByWhile<T, TRes>(this IEnumerable<T> src, Func<T,T,bool> testFn, Func<T,TRes> resultFn) =>
src.ScanPair(1, (kvp, cur) => testFn(kvp.Value, cur) ? kvp.Key : kvp.Key + 1)
.GroupBy(kvp => kvp.Key, kvp => resultFn(kvp.Value));
public static IEnumerable<(TKey Key, T Value)> ScanPair<T, TKey>(this IEnumerable<T> src, TKey seedKey, Func<(TKey Key, T Value),T,TKey> combineFn) {
using (var srce = src.GetEnumerator()) {
if (srce.MoveNext()) {
var prevkv = (seedKey, srce.Current);
while (srce.MoveNext()) {
yield return prevkv;
prevkv = (combineFn(prevkv, srce.Current), srce.Current);
}
yield return prevkv;
}
}
}
I realize this is a lot of extension code, but by using the general ScanPair base, you can build other specialized grouping methods, such as GroupBySequential.
Now you just GroupByRuns of Name and select the runs with more than one member, then convert each run to a List and the whole thing to a List:
var ans = schedules.GroupByRuns(s => s.Name)
.Where(sg => sg.Count() > 1)
.Select(sg => sg.ToList())
.ToList();
NOTE: For @Aominè, who had an interesting take on optimizing Count() > 1 using Take(2).Count(), and @MichaelGunter, who suggested Skip(1).Any(): after GroupBy, the sub-groups (internal type Grouping) each implement IList, and the Count() method just gets the count directly from the Grouping.count field.
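For contrast, the suggested optimizations do matter when the sequence is produced lazily and does not implement ICollection<T>. The sketch below, a C# 9 top-level program with made-up names (Numbers, pulled), counts how many elements each check pulls from such a sequence. It illustrates the general LINQ behaviour and is not a measurement of the GroupByRuns code above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int pulled = 0;

// Lazily produced sequence that counts how many elements were requested.
IEnumerable<int> Numbers()
{
    for (int i = 0; i < 1000; i++)
    {
        pulled++;
        yield return i;
    }
}

bool viaCount = Numbers().Count() > 1;
int pulledByCount = pulled;       // 1000: Count() walks the whole iterator

pulled = 0;
bool viaSkipAny = Numbers().Skip(1).Any();
int pulledBySkipAny = pulled;     // 2: stops as soon as a second element exists

// A List<T> implements ICollection<T>, so Count() can read its Count property directly.
int listCount = Enumerable.Range(0, 1000).ToList().Count();

Console.WriteLine($"Count(): {pulledByCount} pulled, Skip(1).Any(): {pulledBySkipAny} pulled");
```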

LINQ: Group by index and value [duplicate]

Let's say I have a list of strings with the following values:
["a","a","b","a","a","a","c","c"]
I want to execute a LINQ query that will split it into 4 groups:
Group 1: ["a","a"]
Group 2: ["b"]
Group 3: ["a","a","a"]
Group 4: ["c","c"]
Basically, I want to create 2 different groups for the value "a" because they do not come from the same "index sequence".
Does anyone have a LINQ solution for this?
You just need a grouping key other than the array items themselves:
var x = new string[] { "a", "a", "a", "b", "a", "a", "c" };
int groupId = -1;
var result = x.Select((s, i) => new
{
value = s,
groupId = (i > 0 && x[i - 1] == s) ? groupId : ++groupId
}).GroupBy(u => u.groupId); // group on the projected value, not on the captured counter
foreach (var item in result)
{
Console.WriteLine(item.Key);
foreach (var inner in item)
{
Console.WriteLine(" => " + inner.value);
}
}
Calculate the "index sequence" first, then do your group.
private class IndexedData
{
public int Sequence;
public string Text;
}
string[] data = { "a", "a", "b" /* ... */ };
// Calculate "index sequence" for each data element.
List<IndexedData> indexes = new List<IndexedData>();
foreach (string s in data)
{
IndexedData last = indexes.LastOrDefault() ?? new IndexedData();
indexes.Add(new IndexedData
{
Text = s,
Sequence = (last.Text == s
? last.Sequence
: last.Sequence + 1)
});
}
// Group by "index sequence"
var grouped = indexes.GroupBy(i => i.Sequence)
.Select(g => g.Select(i => i.Text));
This is a naive foreach implementation where the whole dataset ends up in memory (probably not an issue for you, since you use GroupBy anyway):
public static IEnumerable<List<string>> Split(IEnumerable<string> values)
{
var result = new List<List<string>>();
foreach (var value in values)
{
var currentGroup = result.LastOrDefault();
if (currentGroup?.FirstOrDefault()?.Equals(value) == true)
{
currentGroup.Add(value);
}
else
{
result.Add(new List<string> { value });
}
}
return result;
}
Here is a slightly more complicated implementation that uses foreach and a yield return state machine, so only the current group is kept in memory. This is probably how it would be implemented at the framework level:
EDIT: This is apparently also how MoreLINQ does it.
public static IEnumerable<List<string>> Split(IEnumerable<string> values)
{
var currentValue = default(string);
var group = (List<string>)null;
foreach (var value in values)
{
if (group == null)
{
currentValue = value;
group = new List<string> { value };
}
else if (string.Equals(currentValue, value)) // static Equals also handles null strings
{
group.Add(value);
}
else
{
yield return group;
currentValue = value;
group = new List<string> { value };
}
}
if (group != null)
{
yield return group;
}
}
And this is a joke version using LINQ only, it is basically the same as the first one but is slightly harder to understand (especially since Aggregate is not the most frequently used LINQ method):
public static IEnumerable<List<string>> Split(IEnumerable<string> values)
{
return values.Aggregate(
new List<List<string>>(),
(lists, str) =>
{
var currentGroup = lists.LastOrDefault();
if (currentGroup?.FirstOrDefault()?.Equals(str) == true)
{
currentGroup.Add(str);
}
else
{
lists.Add(new List<string> { str });
}
return lists;
},
lists => lists);
}
Using an extension method based on the APL scan operator, that is like Aggregate but returns intermediate results paired with source values:
public static IEnumerable<KeyValuePair<TKey, T>> ScanPair<T, TKey>(this IEnumerable<T> src, TKey seedKey, Func<KeyValuePair<TKey, T>, T, TKey> combine) {
using (var srce = src.GetEnumerator()) {
if (srce.MoveNext()) {
var prevkv = new KeyValuePair<TKey, T>(seedKey, srce.Current);
while (srce.MoveNext()) {
yield return prevkv;
prevkv = new KeyValuePair<TKey, T>(combine(prevkv, srce.Current), srce.Current);
}
yield return prevkv;
}
}
}
You can create extension methods for grouping by consistent runs:
public static IEnumerable<IGrouping<int, TResult>> GroupByRuns<TElement, TKey, TResult>(this IEnumerable<TElement> src, Func<TElement, TKey> key, Func<TElement, TResult> result, IEqualityComparer<TKey> cmp = null) {
cmp = cmp ?? EqualityComparer<TKey>.Default;
return src.ScanPair(0,
(kvp, cur) => cmp.Equals(key(kvp.Value), key(cur)) ? kvp.Key : kvp.Key + 1)
.GroupBy(kvp => kvp.Key, kvp => result(kvp.Value));
}
public static IEnumerable<IGrouping<int, TElement>> GroupByRuns<TElement, TKey>(this IEnumerable<TElement> src, Func<TElement, TKey> key) => src.GroupByRuns(key, e => e);
public static IEnumerable<IGrouping<int, TElement>> GroupByRuns<TElement>(this IEnumerable<TElement> src) => src.GroupByRuns(e => e, e => e);
public static IEnumerable<IEnumerable<TResult>> Runs<TElement, TKey, TResult>(this IEnumerable<TElement> src, Func<TElement, TKey> key, Func<TElement, TResult> result, IEqualityComparer<TKey> cmp = null) =>
src.GroupByRuns(key, result, cmp).Select(g => g.Select(s => s));
public static IEnumerable<IEnumerable<TElement>> Runs<TElement, TKey>(this IEnumerable<TElement> src, Func<TElement, TKey> key) => src.Runs(key, e => e);
public static IEnumerable<IEnumerable<TElement>> Runs<TElement>(this IEnumerable<TElement> src) => src.Runs(e => e, e => e);
And using the simplest version, you can get either an IEnumerable<IGrouping<int, T>>:
var ans1 = src.GroupByRuns();
Or a version that dumps the IGrouping (and its Key) for an IEnumerable:
var ans2 = src.Runs();

How can I route Observable values to different Subscribers?

This is all just pseudo code...
Ok here is my scenario, I have an incoming data stream that gets parsed into packets.
I have an IObservable<Packets> Packets
Each packet has a Packet ID, i.e. 1, 2, 3, 4
I want to create observables that only receive a specific ID.
so I do:
Packets.Where(p=>p.Id == 1)
for example... that gives me an IObservable<Packets> that only gives me packets of Id 1.
I may have several of these:
Packets.Where(p=>p.Id == 2)
Packets.Where(p=>p.Id == 3)
Packets.Where(p=>p.Id == 4)
Packets.Where(p=>p.Id == 5)
This essentially works, but the more Ids I want to select the more processing is required, i.e. the p=>p.Id will be run for every single Id, even after a destination Observable has been found.
How can I do the routing so that it is more efficient, something analogous:
Dictionary listeners;
listeners.GetValue(packet.Id).OnDataReceived(packet)
so that as soon as an id is picked up by one of my IObservables, then none of the others get to see it?
Updates
Added an extension based on Lee Campbell's groupby suggestion:
public static class IObservableExtensions
{
class RouteTable<TKey, TSource>
{
public static readonly ConditionalWeakTable<IObservable<TSource>, IObservable<IGroupedObservable<TKey, TSource>>> s_routes = new ConditionalWeakTable<IObservable<TSource>, IObservable<IGroupedObservable<TKey, TSource>>>();
}
public static IObservable<TSource> Route<TKey, TSource>(this IObservable<TSource> source, Func<TSource, TKey> selector, TKey id)
{
var grouped = RouteTable<TKey, TSource>.s_routes.GetValue(source, s => s.GroupBy(p => selector(p)).Replay().RefCount());
return grouped.Where(e => e.Key.Equals(id)).SelectMany(e => e);
}
}
It would be used like this:
Subject<Packet> packetSubject = new Subject<Packet>();
var packets = packetSubject.AsObservable();
packets.Route((p) => p.Id, 5).Subscribe((p) =>
{
Console.WriteLine("5");
});
packets.Route((p) => p.Id, 4).Subscribe((p) =>
{
Console.WriteLine("4");
});
packets.Route((p) => p.Id, 3).Subscribe((p) =>
{
Console.WriteLine("3");
});
packetSubject.OnNext(new Packet() { Id = 1 });
packetSubject.OnNext(new Packet() { Id = 2 });
packetSubject.OnNext(new Packet() { Id = 3 });
packetSubject.OnNext(new Packet() { Id = 4 });
packetSubject.OnNext(new Packet() { Id = 5 });
packetSubject.OnNext(new Packet() { Id = 4 });
packetSubject.OnNext(new Packet() { Id = 3 });
output is:
3, 4, 5, 4, 3
It only checks the Id for every group when it sees a new packet id.
Here's an operator that I wrote quite some time ago, but I think it does what you're after. I still think that a simple .Where is probably better - even with multiple subscribers.
Nevertheless, I wanted a .ToLookup for observables that operates like the same operator for enumerables.
It isn't memory efficient, but it implements IDisposable so that it can be cleaned up afterwards. It also isn't thread-safe so a little hardening might be required.
Here it is:
public static class ObservableEx
{
public static IObservableLookup<K, V> ToLookup<T, K, V>(this IObservable<T> source, Func<T, K> keySelector, Func<T, V> valueSelector, IScheduler scheduler)
{
return new ObservableLookup<T, K, V>(source, keySelector, valueSelector, scheduler);
}
internal class ObservableLookup<T, K, V> : IDisposable, IObservableLookup<K, V>
{
private IDisposable _subscription = null;
private readonly Dictionary<K, ReplaySubject<V>> _lookups = new Dictionary<K, ReplaySubject<V>>();
internal ObservableLookup(IObservable<T> source, Func<T, K> keySelector, Func<T, V> valueSelector, IScheduler scheduler)
{
_subscription = source.ObserveOn(scheduler).Subscribe(
t => this.GetReplaySubject(keySelector(t)).OnNext(valueSelector(t)),
ex => _lookups.Values.ForEach(rs => rs.OnError(ex)),
() => _lookups.Values.ForEach(rs => rs.OnCompleted()));
}
public void Dispose()
{
if (_subscription != null)
{
_subscription.Dispose();
_subscription = null;
_lookups.Values.ForEach(rs => rs.Dispose());
_lookups.Clear();
}
}
private ReplaySubject<V> GetReplaySubject(K key)
{
if (!_lookups.ContainsKey(key))
{
_lookups.Add(key, new ReplaySubject<V>());
}
return _lookups[key];
}
public IObservable<V> this[K key]
{
get
{
if (_subscription == null) throw new ObjectDisposedException("ObservableLookup");
return this.GetReplaySubject(key).AsObservable();
}
}
}
}
public interface IObservableLookup<K, V> : IDisposable
{
IObservable<V> this[K key] { get; }
}
You would use it like this:
IObservable<Packets> Packets = ...
IObservableLookup<int, Packets> lookup = Packets.ToLookup(p => p.Id, p => p, Scheduler.Default);
lookup[1].Subscribe(p => { });
lookup[2].Subscribe(p => { });
// etc
The nice thing with this is that you can subscribe to values by key before a value with that key has been produced by the source observable.
Don't forget to call lookup.Dispose() when done to clean up the resources.
I would suggest looking at GroupBy and then checking if there is a performance pay off. I assume there is, but is it significant?
Packets.GroupBy(p=>p.Id)
Example code with tests on how to use GroupBy as a type of router
var scheduler = new TestScheduler();
var source = scheduler.CreateColdObservable(
ReactiveTest.OnNext(100, 1),
ReactiveTest.OnNext(200, 2),
ReactiveTest.OnNext(300, 3),
ReactiveTest.OnNext(400, 4),
ReactiveTest.OnNext(500, 5),
ReactiveTest.OnNext(600, 6),
ReactiveTest.OnNext(700, 7),
ReactiveTest.OnNext(800, 8),
ReactiveTest.OnNext(900, 9),
ReactiveTest.OnNext(1000, 10),
ReactiveTest.OnNext(1100, 11)
);
var router = source.GroupBy(i=>i%4)
.Publish()
.RefCount();
var zerosObserver = scheduler.CreateObserver<int>();
router.Where(grp=>grp.Key == 0)
.Take(1)
.SelectMany(grp=>grp)
.Subscribe(zerosObserver);
var onesObserver = scheduler.CreateObserver<int>();
router.Where(grp => grp.Key == 1)
.Take(1)
.SelectMany(grp => grp)
.Subscribe(onesObserver);
var twosObserver = scheduler.CreateObserver<int>();
router.Where(grp => grp.Key == 2)
.Take(1)
.SelectMany(grp => grp)
.Subscribe(twosObserver);
var threesObserver = scheduler.CreateObserver<int>();
router.Where(grp => grp.Key == 3)
.Take(1)
.SelectMany(grp => grp)
.Subscribe(threesObserver);
scheduler.Start();
ReactiveAssert.AreElementsEqual(new[] { ReactiveTest.OnNext(400, 4), ReactiveTest.OnNext(800, 8)}, zerosObserver.Messages);
ReactiveAssert.AreElementsEqual(new[] { ReactiveTest.OnNext(100, 1), ReactiveTest.OnNext(500, 5), ReactiveTest.OnNext(900, 9)}, onesObserver.Messages);
ReactiveAssert.AreElementsEqual(new[] { ReactiveTest.OnNext(200, 2), ReactiveTest.OnNext(600, 6), ReactiveTest.OnNext(1000, 10) }, twosObserver.Messages);
ReactiveAssert.AreElementsEqual(new[] { ReactiveTest.OnNext(300, 3), ReactiveTest.OnNext(700, 7), ReactiveTest.OnNext(1100, 11)}, threesObserver.Messages);
You can use GroupBy to split the data. I would suggest you set up all subscriptions first and then activate your source. Doing so would result in one huge nested GroupBy query, but it is also possible to multicast your groups and subscribe to them individually. I wrote a small helper utility to do so below.
Because you still might want to add new routes after the source has been activated (done through Connect), we use Replay to replay the groups. Replay is also a multicast operator, so we won't need Publish to multicast.
public sealed class RouteData<TKey, TSource>
{
private IConnectableObservable<IGroupedObservable<TKey, TSource>> myRoutes;
public RouteData(IObservable<TSource> source, Func<TSource, TKey> keySelector)
{
this.myRoutes = source.GroupBy(keySelector).Replay();
}
public IDisposable Connect()
{
return this.myRoutes.Connect();
}
public IObservable<TSource> Get(TKey id)
{
return myRoutes.FirstAsync(e => e.Key.Equals(id)).Merge();
}
}
public static class myExtension
{
public static RouteData<TKey, TSource> RouteData<TKey, TSource>(this IObservable<TSource> source, Func<TSource, TKey> keySelector)
{
return new RouteData<TKey, TSource>(source, keySelector);
}
}
Example usage:
public class myPackage
{
public int Id;
public myPackage(int id)
{
this.Id = id;
}
}
class program
{
static void Main()
{
var source = new[] { 0, 1, 2, 3, 4, 5, 4, 3 }.ToObservable().Select(i => new myPackage(i));
var routes = source.RouteData(e => e.Id);
var subscription = new CompositeDisposable(
routes.Get(5).Subscribe(Console.WriteLine),
routes.Get(4).Subscribe(Console.WriteLine),
routes.Get(3).Subscribe(Console.WriteLine),
routes.Connect());
Console.ReadLine();
}
}
You may want to consider writing a custom IObserver that does your bidding. I've included an example below.
void Main()
{
var source = Observable.Range(1, 10);
var switcher = new Switch<int, int>(i => i % 3);
switcher[0] = Observer.Create<int>(val => Console.WriteLine($"{val} Divisible by three"));
source.Subscribe(switcher);
}
class Switch<TKey,TValue> : IObserver<TValue>
{
private readonly IDictionary<TKey, IObserver<TValue>> cases;
private readonly Func<TValue,TKey> idExtractor;
public IObserver<TValue> this[TKey decision]
{
get
{
return cases[decision];
}
set
{
cases[decision] = value;
}
}
public Switch(Func<TValue,TKey> idExtractor)
{
this.cases = new Dictionary<TKey, IObserver<TValue>>();
this.idExtractor = idExtractor;
}
public void OnNext(TValue next)
{
IObserver<TValue> nextCase;
if (cases.TryGetValue(idExtractor(next), out nextCase))
{
nextCase.OnNext(next);
}
}
public void OnError(Exception e)
{
foreach (var successor in cases.Values)
{
successor.OnError(e);
}
}
public void OnCompleted()
{
foreach (var successor in cases.Values)
{
successor.OnCompleted();
}
}
}
You would obviously need to implement idExtractor to extract the ids from your packet.

Merge elements in list by property

Context
I have a list of time intervals. The interval type is HistoMesure.
Each HistoMesure is defined by a Debut (begin) property, a Fin (end) property, and a Commentaires (a short note) property.
My list is built so that:
All HistoMesure items are exclusive, meaning they cannot overlap each other.
The list is sorted by Debut, that is, by the beginning of the interval.
Edit: All HistoMesure items are contiguous in this configuration.
Question
I want to merge (turn two small intervals into one big interval) all adjacent HistoMesure items that have the same Commentaires. Currently I do it this way:
//sortedHistos type is List<HistoMesure>
int i = 0;
while (i < sortedHistos.Count - 1)
{
if (sortedHistos[i].Commentaires == sortedHistos[i + 1].Commentaires)
{
sortedHistos[i].Fin = sortedHistos[i + 1].Fin;
sortedHistos.RemoveAt(i + 1);
}
else
{
++i;
}
}
But I feel there must be a more elegant way to do this, maybe with LINQ. Do you have any suggestions?
Your solution works fine, I would keep it.
Don't try too hard to use LINQ if it doesn't match your requirements. LINQ is great for writing queries (that is the Q in LINQ), but not so great for modifying existing lists.
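If mutating the list is a concern, a plain loop can also build a new list in a single pass; the RemoveAt version shifts elements on every removal, so it can degrade to quadratic time on long lists. What follows is only a sketch: the HistoMesure shape with DateTime bounds, and the names HistoMerger and MergeAdjacent, are assumptions made for illustration, not taken from the question.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical shape of HistoMesure; the DateTime bounds are an assumption.
public class HistoMesure
{
    public DateTime Debut { get; set; }
    public DateTime Fin { get; set; }
    public string Commentaires { get; set; }
}

public static class HistoMerger
{
    // Returns a new merged list; the input list and its items are left untouched.
    public static List<HistoMesure> MergeAdjacent(IEnumerable<HistoMesure> sortedHistos)
    {
        var merged = new List<HistoMesure>();
        foreach (var h in sortedHistos)
        {
            var last = merged.Count > 0 ? merged[merged.Count - 1] : null;
            if (last != null && last.Commentaires == h.Commentaires)
            {
                // Same note as the previous interval: extend it.
                last.Fin = h.Fin;
            }
            else
            {
                // Different note: start a fresh interval (a copy of the input item).
                merged.Add(new HistoMesure { Debut = h.Debut, Fin = h.Fin, Commentaires = h.Commentaires });
            }
        }
        return merged;
    }
}

public static class MergeDemo
{
    public static void Main()
    {
        var day = new DateTime(2020, 1, 1);
        var input = new List<HistoMesure>
        {
            new HistoMesure { Debut = day, Fin = day.AddHours(1), Commentaires = "A" },
            new HistoMesure { Debut = day.AddHours(1), Fin = day.AddHours(2), Commentaires = "A" },
            new HistoMesure { Debut = day.AddHours(2), Fin = day.AddHours(3), Commentaires = "B" },
        };
        var merged = HistoMerger.MergeAdjacent(input);
        Console.WriteLine(merged.Count); // prints 2: the two "A" intervals were merged
    }
}
```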
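This code will produce overlapping merged intervals, because GroupBy collects all items with the same key whether they are adjacent or not. For example, if you have intervals A, B, C where A and C have the same Commentaires, the result will be AC, B: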
var result = from h in sortedHistos
group h by h.Commentaires into g
select new HistoMesure {
Debut = g.First().Debut, // valid because the entries are sorted
Fin = g.Last().Fin,
Commentaires = g.Key
};
You can use Min and Max if intervals are not sorted.
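To make the overlap concrete, here is a minimal sketch. The Interval class with integer bounds is a hypothetical stand-in for HistoMesure (the real property types are not shown in the question), and GroupByDemo and MergeWithGroupBy are names made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for HistoMesure; integer bounds are an assumption.
public class Interval
{
    public int Debut { get; set; }
    public int Fin { get; set; }
    public string Commentaires { get; set; }
}

public static class GroupByDemo
{
    // Merges with a plain GroupBy, in the same way as the query above.
    public static List<Interval> MergeWithGroupBy(IEnumerable<Interval> sorted)
    {
        return sorted
            .GroupBy(h => h.Commentaires)
            .Select(g => new Interval
            {
                Debut = g.First().Debut,
                Fin = g.Last().Fin,
                Commentaires = g.Key
            })
            .ToList();
    }

    public static void Main()
    {
        var sorted = new List<Interval>
        {
            new Interval { Debut = 0, Fin = 1, Commentaires = "A" },
            new Interval { Debut = 1, Fin = 2, Commentaires = "B" },
            new Interval { Debut = 2, Fin = 3, Commentaires = "A" },
        };

        // The "A" result spans 0-3 and therefore overlaps the "B" interval 1-2.
        foreach (var m in MergeWithGroupBy(sorted))
            Console.WriteLine("{0}-{1} {2}", m.Debut, m.Fin, m.Commentaires);
    }
}
```

Only two intervals come back, and the first one swallows the range of the second. That is the behaviour an adjacency-aware grouping has to avoid.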
UPDATE: There is no built-in LINQ operator that creates adjacent groups, but you can always write one. Here is an IEnumerable<T> extension (argument checks omitted):
public static IEnumerable<IGrouping<TKey, TElement>> GroupAdjacent<TKey, TElement>(
this IEnumerable<TElement> source, Func<TElement, TKey> keySelector)
{
using (var iterator = source.GetEnumerator())
{
if(!iterator.MoveNext())
{
yield break;
}
else
{
// EqualityComparer works for any key type; Comparer would throw for keys that are not IComparable.
var comparer = EqualityComparer<TKey>.Default;
var group = new Grouping<TKey, TElement>(keySelector(iterator.Current));
group.Add(iterator.Current);
while(iterator.MoveNext())
{
TKey key = keySelector(iterator.Current);
if (!comparer.Equals(key, group.Key))
{
yield return group;
group = new Grouping<TKey, TElement>(key);
}
group.Add(iterator.Current);
}
if (group.Any())
yield return group;
}
}
}
This extension creates groups of adjacent elements that have the same key value. Unfortunately all implementations of IGrouping in .NET are internal, so you need your own:
public class Grouping<TKey, TElement> : IGrouping<TKey, TElement>
{
private List<TElement> elements = new List<TElement>();
public Grouping(TKey key)
{
Key = key;
}
public TKey Key { get; private set; }
public IEnumerator<TElement> GetEnumerator()
{
return elements.GetEnumerator();
}
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
public void Add(TElement element)
{
elements.Add(element);
}
}
And now your code will look like:
var result = sortedHistos.GroupAdjacent(h => h.Commentaires)
.Select(g => new HistoMesure {
Debut = g.Min(h => h.Debut),
Fin = g.Max(h => h.Fin),
Commentaires = g.Key
});
Using Linq and borrowing from this article to group by adjacent values, this should work:
Your query:
var filteredHistos = sortedHistos
.GroupAdjacent(h => h.Commentaires)
.Select(g => new HistoMesure
{
Debut = g.First().Debut,
Fin = g.Last().Fin,
Commentaires = g.Key
});
And copying from the article, the rest of the code to group by:
public class GroupOfAdjacent<TSource, TKey> : IEnumerable<TSource>, IGrouping<TKey, TSource>
{
public TKey Key { get; set; }
private List<TSource> GroupList { get; set; }
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
return ((System.Collections.Generic.IEnumerable<TSource>)this).GetEnumerator();
}
System.Collections.Generic.IEnumerator<TSource> System.Collections.Generic.IEnumerable<TSource>.GetEnumerator()
{
foreach (var s in GroupList)
yield return s;
}
public GroupOfAdjacent(List<TSource> source, TKey key)
{
GroupList = source;
Key = key;
}
}
public static class LocalExtensions
{
public static IEnumerable<IGrouping<TKey, TSource>> GroupAdjacent<TSource, TKey>(
this IEnumerable<TSource> source,
Func<TSource, TKey> keySelector)
{
TKey last = default(TKey);
bool haveLast = false;
List<TSource> list = new List<TSource>();
foreach (TSource s in source)
{
TKey k = keySelector(s);
if (haveLast)
{
if (!EqualityComparer<TKey>.Default.Equals(k, last)) // null-safe, unlike k.Equals(last)
{
yield return new GroupOfAdjacent<TSource, TKey>(list, last);
list = new List<TSource>();
list.Add(s);
last = k;
}
else
{
list.Add(s);
last = k;
}
}
else
{
list.Add(s);
last = k;
haveLast = true;
}
}
if (haveLast)
yield return new GroupOfAdjacent<TSource, TKey>(list, last);
}
}
If I understood you correctly, you need something like this:
var mergedMesures = mesures
.GroupBy(_ => _.Commentaires)
.Select(_ => new HistoMesure
{
Debut = _.Min(item => item.Debut),
Fin = _.Max(item => item.Fin),
Commentaires = _.Key
});

how to count continuous values in a list with linq

I've a list like this:
var query = Enumerable.Range(0, 999).Select((n, index) =>
{
if (index <= 333 || index >=777)
return 0;
else if (index <= 666)
return 1;
else
return 2;
});
So, can I find how many consecutive indexes hold the same value? For example:
query[0]=query[1]=query[2]=query[3]... = 0, query[334] = 1, query[777]=query[778]... = 0.
The first 334 indexes have 0, so the first answer is 333. Also, the last 223 indexes have 0, so the second answer is 223.
How can I find these and their indexes?
Thanks in advance.
You can create an extension method that groups consecutive items by some key:
public static IEnumerable<IGrouping<TKey, T>> GroupConsecutive<T, TKey>(
this IEnumerable<T> source, Func<T, TKey> keySelector)
{
using (var iterator = source.GetEnumerator())
{
if (!iterator.MoveNext())
yield break;
else
{
List<T> list = new List<T>();
// EqualityComparer works for any key type; Comparer would throw for keys that are not IComparable.
var comparer = EqualityComparer<TKey>.Default;
list.Add(iterator.Current);
TKey groupKey = keySelector(iterator.Current);
while (iterator.MoveNext())
{
var key = keySelector(iterator.Current);
if (comparer.Equals(groupKey, key))
{
list.Add(iterator.Current);
continue;
}
yield return new Group<TKey, T>(groupKey, list);
list = new List<T> { iterator.Current };
groupKey = key;
}
if (list.Any())
yield return new Group<TKey, T>(groupKey, list);
}
}
}
Of course you could return IEnumerable<IList<T>>, but that would differ a little from the concept of a group, which is what you want here, because you also want to know which value was used to group each sequence of items. Unfortunately there is no public implementation of the IGrouping<TKey, TElement> interface, so we have to create our own:
public class Group<TKey, TElement> : IGrouping<TKey, TElement>
{
private TKey _key;
private IEnumerable<TElement> _group;
public Group(TKey key, IEnumerable<TElement> group)
{
_key = key;
_group = group;
}
public TKey Key
{
get { return _key; }
}
public IEnumerator<TElement> GetEnumerator()
{
return _group.GetEnumerator();
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
}
Now usage is very simple:
var groups = query.GroupConsecutive(i => i) // produces groups
.Select(g => new { g.Key, Count = g.Count() }); // projection
Result:
[
{ Key: 0, Count: 334 },
{ Key: 1, Count: 333 },
{ Key: 2, Count: 110 },
{ Key: 0, Count: 222 }
]
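The question also asks for the indexes of each run. One possible way to get them, sketched here independently of the Group class above, is a single pass that records where each run starts. The names Run, Runs and RunsDemo are made up for this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical result type: one run of equal consecutive values.
public class Run<T>
{
    public T Value { get; set; }
    public int Start { get; set; }   // index of the first element in the run
    public int Count { get; set; }   // number of elements in the run
}

public static class RunExtensions
{
    // Single pass; yields each run of equal consecutive values with its start index.
    public static IEnumerable<Run<T>> Runs<T>(this IEnumerable<T> source)
    {
        var comparer = EqualityComparer<T>.Default;
        Run<T> current = null;
        int index = 0;
        foreach (var item in source)
        {
            if (current != null && comparer.Equals(current.Value, item))
            {
                current.Count++;
            }
            else
            {
                // Value changed: emit the finished run and start a new one here.
                if (current != null)
                    yield return current;
                current = new Run<T> { Value = item, Start = index, Count = 1 };
            }
            index++;
        }
        if (current != null)
            yield return current;
    }
}

public static class RunsDemo
{
    public static void Main()
    {
        // Same sequence as in the question.
        var query = Enumerable.Range(0, 999)
            .Select(i => (i <= 333 || i >= 777) ? 0 : (i <= 666 ? 1 : 2));

        foreach (var r in query.Runs())
            Console.WriteLine("Value {0}: starts at {1}, length {2}", r.Value, r.Start, r.Count);
    }
}
```

For the sequence in the question this reports runs starting at indexes 0, 334, 667 and 777, with the same lengths as the counts shown above.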
Using the GroupConsecutive extension method from here you can just get the counts of each group:
query.GroupConsecutive((n1, n2) => n1 == n2)
.Select(g => new {Number = g.Key, Count = g.Count()})
public static IEnumerable<int> GetContiguousCounts<T>(this IEnumerable<T> l, IEqualityComparer<T> cmp)
{
var last = default(T);
var count = 0;
foreach (var e in l)
{
if (count > 0 && !cmp.Equals(e, last))
{
yield return count;
count = 0;
}
count++;
last = e;
}
if (count > 0)
yield return count;
}
public static IEnumerable<int> GetContiguousCounts<T>(this IEnumerable<T> l)
{
return GetContiguousCounts(l, EqualityComparer<T>.Default);
}
static void Main(string[] args)
{
var a = new[] { 1, 2, 2, 3, 3, 3 };
var b = a.GetContiguousCounts();
foreach (var x in b)
Console.WriteLine(x);
}
For the simple test case, it outputs 1, 2, 3. For your case 334, 333, 110, 222 (the last value is not 223 as you asked in your question, because you only have 999 elements, not 1000).
erm, how about this, most efficient implementation I can think of.
IEnumerable<KeyValuePair<T, int>> RepeatCounter<T>(
IEnumerable<T> source,
IEqualityComparer<T> comparer = null)
{
// Dispose the enumerator when iteration ends.
using (var e = source.GetEnumerator())
{
if (!e.MoveNext())
{
yield break;
}
comparer = comparer ?? EqualityComparer<T>.Default;
var last = e.Current;
var count = 1;
while (e.MoveNext())
{
if (comparer.Equals(last, e.Current))
{
count++;
continue;
}
yield return new KeyValuePair<T, int>(last, count);
last = e.Current;
count = 1;
}
yield return new KeyValuePair<T, int>(last, count);
}
}
It enumerates the sequence exactly once and only allocates variables when necessary.
