Summing the previous values in an IEnumerable - c#

I have a sequence of numbers:
var seq = new List<int> { 1, 3, 12, 19, 33 };
and I want to transform that into a new sequence where the number is added to the preceding numbers to create a new sequence:
{ 1, 3, 12, 19, 33 } --> {1, 4, 16, 35, 68 }
I came up with the following, but I dislike the state variable 'count'. I also dislike the fact that I'm using the values Enumerable without acting on it.
int count = 1;
var summed = values.Select(_ => values.Take(count++).Sum());
How else could it be done?

This is a common pattern in functional programming which in F# is called scan. It's like C#'s Enumerable.Aggregate and F#'s fold except that it yields the intermediate results of the accumulator along with the final result. We can implement scan in C# nicely with an extension method:
public static IEnumerable<U> Scan<T, U>(this IEnumerable<T> input, Func<U, T, U> next, U state) {
yield return state;
foreach(var item in input) {
state = next(state, item);
yield return state;
}
}
And then use it as follows:
var seq = new List<int> { 1, 3, 12, 19, 33 };
var transformed = seq.Scan(((state, item) => state + item), 0).Skip(1);

"Pure" LINQ:
var result = seq.Select((a, i) => seq.Take(i + 1).Sum());
One more "pure" LINQ O(n):
var res = Enumerable.Range(0, seq.Count)
.Select(a => a == 0 ? seq[a] : seq[a] += seq[a - 1]);
One more LINQ, with state maintenance:
var tmp = 0;
var result = seq.Select(a => { tmp += a; return tmp; });
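One caveat with the state-maintaining version, sketched here under the assumption that the query gets enumerated more than once: Select is deferred, so the captured variable keeps accumulating across enumerations. The class and method names below are illustrative only.

```csharp
using System.Collections.Generic;
using System.Linq;

static class DeferredStateDemo
{
    // Enumerates the same stateful query twice and returns both results.
    public static (List<int> First, List<int> Second) Run()
    {
        var seq = new List<int> { 1, 3, 12, 19, 33 };
        var tmp = 0;
        var running = seq.Select(a => { tmp += a; return tmp; });

        var first = running.ToList();   // 1, 4, 16, 35, 68
        var second = running.ToList();  // 69, 72, 84, 103, 136 (tmp was never reset)
        return (first, second);
    }
}
```

Materializing the query once (ToList or ToArray) right after building it avoids the surprise.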

Stephen Swensen's answer is great, scan is exactly what you need. There is another version of scan though that doesn't require a seed, which would be slightly more appropriate for your exact problem.
This version requires that your output element type is the same as your input element type, which it is in your case, and gives the advantage of not needing you to pass in a 0 and then Skip the first (0) result.
You can implement this version of scan in C# as follows:
public static IEnumerable<T> Scan<T>(this IEnumerable<T> Input, Func<T, T, T> Accumulator)
{
using (IEnumerator<T> enumerator = Input.GetEnumerator())
{
if (!enumerator.MoveNext())
yield break;
T state = enumerator.Current;
yield return state;
while (enumerator.MoveNext())
{
state = Accumulator(state, enumerator.Current);
yield return state;
}
}
}
And then use it as follows:
IEnumerable<int> seq = new List<int> { 1, 3, 12, 19, 33 };
IEnumerable<int> transformed = seq.Scan((state, item) => state + item);

var seq = new List<int> { 1, 3, 12, 19, 33 };
var summed = new List<int>();
seq.ForEach(i => summed.Add(i + summed.LastOrDefault()));

Just to offer another alternative, albeit not really LINQ, you could write a yield-based function to do the aggregation:
public static IEnumerable<int> SumSoFar(this IEnumerable<int> values)
{
int sumSoFar = 0;
foreach (int value in values)
{
sumSoFar += value;
yield return sumSoFar;
}
}
Like BrokenGlass's, this makes only a single pass over the data, although unlike his it returns an iterator, not a list.
(Annoyingly you can't easily make this generic on the numeric type in the list.)
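On newer runtimes this restriction can probably be lifted. The following is a minimal sketch, assuming .NET 7 or later with C# 11, where the generic math interface System.Numerics.INumber<T> is available; the name RunningSum is made up for this example.

```csharp
using System.Collections.Generic;
using System.Numerics;

static class RunningSumExtensions
{
    // Generic running total; requires .NET 7+ (generic math).
    public static IEnumerable<T> RunningSum<T>(this IEnumerable<T> values)
        where T : INumber<T>
    {
        T sumSoFar = T.Zero;
        foreach (T value in values)
        {
            sumSoFar += value;
            yield return sumSoFar;
        }
    }
}
```

With this in scope, new[] { 1.5, 2.5 }.RunningSum() should yield 1.5 and 4.0, and the same method works for int, long, decimal and so on.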

To use Linq and only iterate over the list once you could use a custom aggregator:
class Aggregator
{
public List<int> List { get; set; }
public int Sum { get; set; }
}
..
var seq = new List<int> { 1, 3, 12, 19, 33 };
var aggregator = new Aggregator{ List = new List<int>(), Sum = 0 };
var aggregatorResult = seq.Aggregate(aggregator, (a, number) => { a.Sum += number; a.List.Add(a.Sum); return a; });
var result = aggregatorResult.List;

var seq = new List<int> { 1, 3, 12, 19, 33 };
for (int i = 1; i < seq.Count; i++)
{
seq[i] += seq[i-1];
}

All the answers already posted work fine. I would just like to add two things:
MoreLinq already has a Scan method, along with a lot of other very useful functions that are not native to LINQ. It's a great library to know for this kind of application, so I wanted to mention it.
Before seeing the answers above, I wrote my own Scan:
public static IEnumerable<T> Scan<T>(IEnumerable<T> input, Func<T, T, T> accumulator)
{
if (!input.Any())
yield break;
T state = input.First();
yield return state;
foreach (var value in input.Skip(1))
{
state = accumulator(state, value);
yield return state;
}
}
It works well, but is very probably less efficient than @Nathan Phillips' version because it enumerates the collection multiple times (Any, First and Skip each start a new enumeration). Using the enumerator directly probably does it better.

Related

Cache LINQ query - IEnumerable.Skip()

Consider the following code
IEnumerable<Item> remainingItems = Items;
var results = new List<List<Item>>();
var counter = 0;
while (remainingItems.Any())
{
var result = new List<Item>();
result.AddRange(remainingItems.TakeWhile(x => somePredicate(x, counter)));
results.Add(result);
remainingItems = remainingItems.Skip(result.Count);
counter++;
}
If it's not clear what's happening: I'm taking an IEnumerable and iterating through it until a predicate fails, putting all those items into one pile, and then continuing to iterate through the remaining items until the next predicate fails, and putting all of those in a pile. Rinse, wash, repeat.
Now the bit I want to focus on here is IEnumerable.Skip().
Since it uses deferred execution, I have to go through all the elements I've already skipped on each loop.
I could use ToList() to force it to evaluate, but then it needs to iterate through all the remaining items to do so, which is just as bad.
So what I really need is an IEnumerable which does the skipping eagerly and stores the last point we were up to, so that it can continue from there. So I need some function like:
IEnumerable.SkipAndCache(n) which allows me to access an IEnumerator starting at the nth item.
Any ideas?
You can use MoreLinq for that. There is an experimental function called Memoize which lazily caches the sequence. So the code will look like this:
while (remainingItems.Any())
{
var result = new List<Item>();
result.AddRange(remainingItems.TakeWhile(x => somePredicate(x, counter)));
results.Add(result);
remainingItems = remainingItems.Skip(result.Count).Memoize();
counter++;
}
Here the result will not be materialized because it is still lazy evaluation:
remainingItems = remainingItems.Skip(result.Count).Memoize();
Here the remainingItems sequence will be evaluated and cached (the iterator will not go through all the elements like in ToList):
remainingItems.Any()
And here the cache will be used:
result.AddRange(remainingItems.TakeWhile(x => somePredicate(x, counter)));
To use this method you need to add:
using MoreLinq.Experimental;
Since we are skipping through the result set in order, why not use a plain for loop and advance the index yourself, like this:
for (int i = 0; i < result.Count; i++) {
// do some business logic that consumes X items
i = i + X;
}
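The index-based idea can be sketched more concretely. This is only an illustration, assuming the items are available as an indexable list (IReadOnlyList<T>) rather than a pure IEnumerable; IndexedPiles and Split are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

static class IndexedPiles
{
    // Builds the piles from the question without ever re-skipping items.
    public static List<List<T>> Split<T>(IReadOnlyList<T> items, Func<T, int, bool> predicate)
    {
        var results = new List<List<T>>();
        int i = 0;        // position of the next unread item
        int counter = 0;  // pile number, passed to the predicate as in the question
        while (i < items.Count)
        {
            var pile = new List<T>();
            while (i < items.Count && predicate(items[i], counter))
            {
                pile.Add(items[i]);
                i++;
            }
            results.Add(pile);
            counter++;
        }
        return results;
    }
}
```

Like the loop in the question, this does not terminate if the predicate never accepts the current item for any counter value, and it adds an empty pile whenever the first unread item fails the predicate.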
Yield might be useful, if I'm understanding your question correctly
public static IEnumerable<IEnumerable<T>> Test<T>(IEnumerable<T> source)
{
var items = new List<T>();
foreach (T item in source)
{
items.Add(item);
if (!SomePredicate(item))
{
yield return items;
items = new List<T>();
}
}
// if you want any remaining items to go into their own IEnumerable, even if there's no more fails
if (items.Count > 0)
{
yield return items;
}
}
Just as an example, I made my fail condition item % 10 == 0 and passed in values 0 to 1000 to the above method. I get 101 IEnumerables containing 0 in the first, and the rest containing 1 to 10, 11 to 20, etc.
You could write a simple extension method to help with this:
public static IEnumerable<IEnumerable<T>> PartitionBy<T>(this IEnumerable<T> sequence, Func<T, int, bool> predicate)
{
var block = new List<T>();
int index = 0;
foreach (var item in sequence)
{
if (predicate(item, index++))
{
block.Add(item);
}
else if (block.Count > 0)
{
yield return block.ToList(); // Return a copy so the caller can't change our local list.
block.Clear();
}
}
if (block.Count > 0)
yield return block; // No need for a copy since we've finished using our local list.
}
(As an extension method, you need to put that in a static class.)
Then you can use it to partition data like so. For this example, we will partition a list of ints into partitions where the list element's value is equal to its index:
static void Main()
{ // 0 1 2 3 4 5 6 7 8 9
var ints = new List<int> {0, 1, 0, 3, 4, 5, 0, 0, 8, 9};
var result = ints.PartitionBy(((item, index) => item == index)); // Items where value == index.
foreach (var seq in result)
Console.WriteLine(string.Join(", ", seq));
// Output is:
// 0, 1
// 3, 4, 5
// 8, 9
}
Note that this implementation skips over elements that do not match the predicate.
Here's an alternative, more complicated implementation that doesn't make a copy of the data:
class Indexer
{
public int Index;
public bool Finished;
}
public static IEnumerable<IEnumerable<T>> PartitionBy<T>(this IEnumerable<T> sequence, Func<T, int, bool> predicate)
{
var iter = sequence.GetEnumerator();
var indexer = new Indexer();
while (!indexer.Finished)
{
yield return nextBlock(iter, predicate, indexer);
}
}
static IEnumerable<T> nextBlock<T>(IEnumerator<T> iter, Func<T, int, bool> predicate, Indexer indexer)
{
int index = indexer.Index;
bool any = false;
while (true)
{
if (!iter.MoveNext())
{
indexer.Finished = true;
yield break;
}
if (predicate(iter.Current, index++))
{
any = true;
yield return iter.Current;
}
else
{
indexer.Index = index;
if (any)
yield break;
}
}
}

Split a List<int> into groups of consecutive numbers [closed]

I have a sorted List<int> like { 1, 2, 3, 4, 6, 7, 9 }
I want to split that into some groups -- every group has consecutive number like this: { {1, 2, 3, 4}, {6, 7}, {9} }
I know I can use for loop to traverse the list, and compare between the current value and previous value, then decide whether append to last group or create a new group. But I want to find a "pretty" way to do it. Maybe use LINQ?
Edit:
I found a python code from project more-itertools:
def consecutive_groups(iterable, ordering=lambda x: x):
    for k, g in groupby(
        enumerate(iterable), key=lambda x: x[0] - ordering(x[1])
    ):
        yield map(itemgetter(1), g)
Here is an extension method taken from http://bugsquash.blogspot.com/2010/01/grouping-consecutive-integers-in-c.html
public static IEnumerable<IEnumerable<int>> GroupConsecutive(this IEnumerable<int> list) {
var group = new List<int>();
foreach (var i in list) {
if (group.Count == 0 || i - group[group.Count - 1] <= 1)
group.Add(i);
else {
yield return group;
group = new List<int> {i};
}
}
yield return group;
}
You can use it like this:
var numbers = new[] { 1, 2, 3, 4, 6, 7, 9 };
var groups = numbers.GroupConsecutive();
Once C# 7 is released, this can be made even more efficient with the use of Span to avoid creating new lists.
This updated version does it without allocating any lists.
public static class EnumerableExtensions
{
public static IEnumerable<IEnumerable<int>> GroupConsecutive(this IEnumerable<int> list)
{
if (list.Any())
{
var count = 1;
var startNumber = list.First();
int last = startNumber;
foreach (var i in list.Skip(1))
{
if (i < last)
{
throw new ArgumentException($"List is not sorted.", nameof(list));
}
if (i - last == 1)
count += 1;
else
{
yield return Enumerable.Range(startNumber, count);
startNumber = i;
count = 1;
}
last = i;
}
yield return Enumerable.Range(startNumber, count);
}
}
}
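The Span idea mentioned earlier is awkward to combine with an iterator method, so the following sketch swaps in ArraySegment<int> instead: each group is a view over the original array rather than a copy. It assumes the input is already an ascending int[]; the class name is hypothetical.

```csharp
using System;
using System.Collections.Generic;

static class ConsecutiveSegments
{
    // Assumes 'sorted' is ascending; each group is a window over it, not a copy.
    public static IEnumerable<ArraySegment<int>> GroupConsecutive(int[] sorted)
    {
        int start = 0;
        for (int i = 1; i <= sorted.Length; i++)
        {
            // Close the current group at the end of the array or at a gap.
            if (i == sorted.Length || sorted[i] - sorted[i - 1] != 1)
            {
                yield return new ArraySegment<int>(sorted, start, i - start);
                start = i;
            }
        }
    }
}
```

Because the segments share storage with the source array, later changes to the array show through in the groups.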
Here is my suggestion for an extension method using iterators:
public static IEnumerable<IEnumerable<int>> GroupConsecutive(this IEnumerable<int> src) {
var more = false; // compiler can't figure out more is assigned before use
IEnumerable<int> ConsecutiveSequence(IEnumerator<int> csi) {
int prevCurrent;
do
yield return (prevCurrent = csi.Current);
while ((more = csi.MoveNext()) && csi.Current-prevCurrent == 1);
}
var si = src.GetEnumerator();
if (si.MoveNext()) {
do
// have to process to compute outside level
yield return ConsecutiveSequence(si).ToList();
while (more);
}
}
I must say the Python algorithm is very impressive, here is a C# implementation of it:
public static IEnumerable<IEnumerable<int>> GroupConsecutive(this IEnumerable<int> iterable, Func<int,int> ordering = null) {
ordering = ordering ?? (n => n);
foreach (var tg in iterable
.Select((e, i) => (e, i))
.GroupBy(t => t.i - ordering(t.e)))
yield return tg.Select(t => t.e);
}
Here is a C# one-line implementation of the Python algorithm:
public static IEnumerable<IEnumerable<int>> GroupConsecutive(this IEnumerable<int> iterable, Func<int,int> ordering = null) =>
iterable
.Select((e, i) => (e, i))
.GroupBy(
t => t.i - (ordering ?? (n => n))(t.e),
(k,tg) => tg.Select(t => t.e));
NOTE: C# 8 with nullable annotation context enabled should use Func<int,int>? in both Python methods. You could also use ??= to assign ordering.
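To see why grouping by index minus value works, it may help to look at the keys themselves. This is a small illustration, assuming sorted input without duplicates; GroupKeyDemo is a made-up name.

```csharp
using System.Collections.Generic;
using System.Linq;

static class GroupKeyDemo
{
    // Returns the grouping key (index minus value) for each element.
    public static List<int> Keys(IEnumerable<int> sortedDistinct) =>
        sortedDistinct.Select((e, i) => i - e).ToList();
}
```

For { 1, 2, 3, 4, 6, 7, 9 } the keys are -1, -1, -1, -1, -2, -2, -3: inside a consecutive run the index and the value both grow by one, so their difference stays constant, and every gap lowers it. With unsorted or duplicated input the keys no longer line up with runs.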
The correct implementation of @Bradley Uffner's and @NetMage's non-allocating iterator method looks like this:
public static IEnumerable<IEnumerable<int>> GroupConsecutive(this IEnumerable<int> source)
{
using (var e = source.GetEnumerator())
{
for (bool more = e.MoveNext(); more; )
{
int first = e.Current, last = first, next;
while ((more = e.MoveNext()) && (next = e.Current) > last && next - last == 1)
last = next;
yield return Enumerable.Range(first, last - first + 1);
}
}
}
It works correctly even for unordered input, iterates the source sequence only once, and correctly handles all corner cases and integer over/underflow. The only case where it fails is a consecutive range with more than int.MaxValue elements.
But looking at your follow up question, probably the following implementation will better suit your needs:
public static IEnumerable<(int First, int Last)> ConsecutiveRanges(this IEnumerable<int> source)
{
using (var e = source.GetEnumerator())
{
for (bool more = e.MoveNext(); more;)
{
int first = e.Current, last = first, next;
while ((more = e.MoveNext()) && (next = e.Current) > last && next - last == 1)
last = next;
yield return (first, last);
}
}
}
Try the following code:
public static IEnumerable<IEnumerable<int>> GroupConsecutive(this IEnumerable<int> source)
{
if (!source.Any()) { yield break;}
var prev = source.First();
var grouped = new List<int>(){ prev };
source = source.Skip(1);
while (source.Any())
{
var current = source.First();
if (current - prev != 1)
{
yield return grouped;
grouped = new List<int>();
}
grouped.Add(current);
source = source.Skip(1);
prev = current;
}
yield return grouped;
}
var numbers = new[] { 1, 2, 3, 4, 6, 7, 9 };
var result = numbers.GroupConsecutive();
Output
1,2,3,4
6,7
9

Use Linq to break a list by special values?

I'm trying to use Linq to convert IEnumerable<int> to IEnumerable<List<int>> - the input stream will be separated by special value 0.
IEnumerable<List<int>> Parse(IEnumerable<int> l)
{
l.Select(x => {
.....; //?
return new List<int>();
});
}
var l = new List<int> {0,1,3,5,0,3,4,0,1,4,0};
Parse(l) // returns {{1,3,5}, {3, 4}, {1,4}}
How to implement it using Linq instead of imperative looping?
Or is Linq not good for this requirement because the logic depends on the order of the input stream?
A simple loop would be a good option.
Alternatives:
Enumerable.Aggregate and start a new list on 0
Write your own extension, similar to "Create batches in linq" or "Use LINQ to group a sequence of numbers with no gaps"
Aggregate sample:
var result = list.Aggregate(new List<List<int>>(),
(sum,current) => {
if(current == 0)
sum.Add(new List<int>());
else
sum.Last().Add(current);
return sum;
});
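Note: this is only a sample of the approach; it works for very friendly input like {0,1,2,0,3,4}. Input that does not start with 0 makes sum.Last() throw because no list exists yet, and a trailing 0 leaves an empty list at the end.
One could even aggregate into immutable lists, but that would look insane with basic .NET types.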
Here's an answer that lazily enumerates the source enumerable, but eagerly enumerates the contents of each returned list between zeroes. It properly throws upon null input or upon being given a list that does not start with a zero (though allowing an empty list through--that's really an implementation detail you have to decide on). It does not return an extra and empty list at the end like at least one other answer's possible suggestions does.
public static IEnumerable<List<int>> Parse(this IEnumerable<int> source, int splitValue = 0) {
if (source == null) {
throw new ArgumentNullException(nameof (source));
}
using (var enumerator = source.GetEnumerator()) {
if (!enumerator.MoveNext()) {
return Enumerable.Empty<List<int>>();
}
if (enumerator.Current != splitValue) {
throw new ArgumentException($"Source enumerable must begin with a {splitValue}.", nameof(source)); // message first, then parameter name
}
return ParseImpl(enumerator, splitValue);
}
}
private static IEnumerable<List<int>> ParseImpl(IEnumerator<int> enumerator, int splitValue) {
var list = new List<int>();
while (enumerator.MoveNext()) {
if (enumerator.Current == splitValue) {
yield return list;
list = new List<int>();
}
else {
list.Add(enumerator.Current);
}
}
if (list.Any()) {
yield return list;
}
}
This could easily be adapted to be generic instead of int, just change Parse to Parse<T>, change int to T everywhere, and use a.Equals(b) or !a.Equals(b) instead of a == b or a != b.
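A generic variant could look like the sketch below. Note that it is not a line-by-line port of the code above: it deliberately accepts input that does not start with the separator and drops empty groups, and it compares through an IEqualityComparer<T> so that null elements are safe. SplitOn is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

static class SplitExtensions
{
    public static IEnumerable<List<T>> SplitOn<T>(
        this IEnumerable<T> source, T splitValue, IEqualityComparer<T> comparer = null)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        return Impl(source, splitValue, comparer ?? EqualityComparer<T>.Default);
    }

    // Separate iterator so the null check above runs eagerly.
    private static IEnumerable<List<T>> Impl<T>(
        IEnumerable<T> source, T splitValue, IEqualityComparer<T> comparer)
    {
        var list = new List<T>();
        foreach (var item in source)
        {
            if (!comparer.Equals(item, splitValue))
            {
                list.Add(item);
            }
            else if (list.Count > 0)
            {
                yield return list;   // separator closes a non-empty group
                list = new List<T>();
            }
        }
        if (list.Count > 0) yield return list;
    }
}
```

For the question's input, new List<int> { 0, 1, 3, 5, 0, 3, 4, 0, 1, 4, 0 }.SplitOn(0) gives {1, 3, 5}, {3, 4}, {1, 4}.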
You could create an extension method like this:
public static IEnumerable<IEnumerable<T>> SplitBy<T>(this IEnumerable<T> source, T value)
{
using (var e = source.GetEnumerator())
{
if (e.MoveNext())
{
var list = new List<T> { };
//In case the source doesn't start with 0
if (!e.Current.Equals(value))
{
list.Add(e.Current);
}
while (e.MoveNext())
{
if ( !e.Current.Equals(value))
{
list.Add(e.Current);
}
else
{
yield return list;
list = new List<T> { };
}
}
//In case the source doesn't end with 0
if (list.Count>0)
{
yield return list;
}
}
}
}
Then, you can do the following:
var l = new List<int> { 0, 1, 3, 5, 0, 3, 4, 0, 1, 4, 0 };
var result = l.SplitBy(0);
You could use GroupBy with a counter.
var list = new List<int> {0,1,3,5,0,3,4,0,1,4,0};
int counter = 0;
var result = list.GroupBy(x => x==0 ? counter++ : counter)
.Select(g => g.TakeWhile(x => x!=0).ToList())
.Where(l => l.Any());
Edited to fix possibility of zeroes within numbers
Here is a semi-LINQ solution:
var l = new List<int> {0,1,3,5,0,3,4,0,1,4,0};
string
.Join(",", l.Select(x => x == 0 ? "|" : x.ToString()))
.Split(new[] { '|' }, StringSplitOptions.RemoveEmptyEntries)
.Select(x => x.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries));
This is probably not preferable to using a loop due to performance and other reasons, but it should work.

how to get the min and max of a multi dim array (with one dim is specified) in c#

I have a multi-dimensional array in C# defined as follows:
double[,,] myArray=new double[10000,10000,3];
I want to find the maximum value of this array when the last dimension is fixed, for example at 0. Something such as this:
double m1=myArray[?,?,0].Max();
How can I calculate it using Linq or other methods?
If you'd like to get the max across some subset of the array you can do this:
double m1 =
(from x in Enumerable.Range(0, myArray.GetLength(0))
from y in Enumerable.Range(0, myArray.GetLength(1))
select myArray[x, y, 0])
.Max();
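If both the minimum and the maximum of the slice are needed, a plain nested loop gets them in one pass. This is a minimal sketch rather than a tuned implementation; SliceStats and MinMaxOfLastIndex are names invented for the example.

```csharp
using System;

static class SliceStats
{
    // Scans array[x, y, k] for all x and y, tracking min and max together.
    public static (double Min, double Max) MinMaxOfLastIndex(double[,,] array, int k)
    {
        double min = double.PositiveInfinity;
        double max = double.NegativeInfinity;
        for (int x = 0; x < array.GetLength(0); x++)
        {
            for (int y = 0; y < array.GetLength(1); y++)
            {
                double v = array[x, y, k];
                if (v < min) min = v;
                if (v > max) max = v;
            }
        }
        return (min, max);
    }
}
```

An empty slice returns the two infinities unchanged, and NaN values are ignored because every comparison with NaN is false.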
If you'd like to get the max across all elements in the array you can just do this
double m1 = myArray.Cast<double>().Max();
However, you can get a significant performance boost by implementing your own extension method like this:
public static IEnumerable<T> Flatten<T>(this T[,,] arry) {
foreach(T x in arry) yield return x;
}
myArray.Flatten().Max();
EDIT 2
Note, this extension works equally well for the hideous but valid case of a non zero based array,
var nonZeroBasedArray = Array.CreateInstance(
typeof(double),
new[] { 4, 4, 3 },
new[] { -2, -2, 0 });
Note that the first two dimensions range from -2 to 1 inclusive (yikes.) This test code illustrates that the Flatten extension still works.
var count = 0;
foreach (var element in nonZeroBasedArray.Flatten<double>(null, null, 0))
{
Console.Write(string.Join(", ", element.Key));
Console.WriteLine(": {0}", element.Value);
count++; // count the elements so the total printed below is meaningful
}
Console.WriteLine("Count: {0}", count);
Console.ReadKey();
EDIT
So, using the extension defined below you can do
var myArray = new double[10000,10000,3];
var ordered = myArray.Flatten<double>(null, null, 0).OrderBy(p => p.Value);
var minZ0 = ordered.First(); // OrderBy sorts ascending, so the smallest value comes first
var maxZ0 = ordered.Last();
The element type is a KeyValuePair<IEnumerable<int>, T> so the Key allows you to back reference to the original array.
OK, here is a generic extension, initially inspired by p.s.w.g's answer.
If you start with Eric Lippert's inspirational CartesianProduct<T> extension,
public static IEnumerable<IEnumerable<T>> CartesianProduct<T>(
this IEnumerable<IEnumerable<T>> sequences)
{
IEnumerable<IEnumerable<T>> emptyProduct = new[] { Enumerable.Empty<T>() };
return sequences.Aggregate(
emptyProduct,
(accumulator, sequence) =>
from accseq in accumulator
from item in sequence
select accseq.Concat(new[]
{
item
}));
}
Then you make a function to generate the bound sets of a multi dimensional array that allows you to specify fixed values for some dimensions.
private static IEnumerable<IEnumerable<int>> GetBoundSequences(
Array array,
int?[] definedBounds)
{
for (var rank = 0; rank < array.Rank; rank++)
{
var defined = definedBounds.ElementAtOrDefault(rank);
if (defined.HasValue)
{
yield return new[] { defined.Value };
}
else
{
var min = array.GetLowerBound(rank);
yield return Enumerable.Range(
min,
(array.GetUpperBound(rank) - min) + 1);
}
}
}
you can use both to make a flexible Flatten<T> extension, that works with arrays of any rank.
public static IEnumerable<KeyValuePair<IEnumerable<int>, T>> Flatten<T>(
this Array array,
params int?[] definedBounds)
{
var coordSets = GetBoundSequences(array, definedBounds).CartesianProduct();
foreach (var coordSet in coordSets)
{
var coords = coordSet.ToArray();
var value = (T)array.GetValue(coords);
yield return new KeyValuePair<IEnumerable<int>, T>(
coords,
value);
}
}
Once you have this, you can do something like
var myArray = new double[10000,10000,3];
var maxZ0 = myArray.Flatten<double>(null, null, 0).Max(p => p.Value);
This is good because it lazily iterates and converts only the elements specified.
Try this
double[,,] myArray = new double[10000, 10000, 3];
double max = myArray.Cast<double>().Max();

Detecting sequence of at least 3 sequential numbers from a given list

I have a list of numbers e.g. 21,4,7,9,12,22,17,8,2,20,23
I want to be able to pick out sequences of sequential numbers (minimum 3 items in length), so from the example above it would be 7,8,9 and 20,21,22,23.
I have played around with a few ugly sprawling functions but I am wondering if there is a neat LINQ-ish way to do it.
Any suggestions?
UPDATE:
Many thanks for all the responses, much appreciated. I am currently having a play with them all to see which would best integrate into our project.
It strikes me that the first thing you should do is order the list. Then it's just a matter of walking through it, remembering the length of your current sequence and detecting when it's ended. To be honest, I suspect that a simple foreach loop is going to be the simplest way of doing that - I can't immediately think of any wonderfully neat LINQ-like ways of doing it. You could certainly do it in an iterator block if you really wanted to, but bear in mind that ordering the list to start with means you've got a reasonably "up-front" cost anyway. So my solution would look something like this:
var ordered = list.OrderBy(x => x);
int count = 0;
int firstItem = 0; // Irrelevant to start with
foreach (int x in ordered)
{
// First value in the ordered list: start of a sequence
if (count == 0)
{
firstItem = x;
count = 1;
}
// Skip duplicate values
else if (x == firstItem + count - 1)
{
// No need to do anything
}
// New value contributes to sequence
else if (x == firstItem + count)
{
count++;
}
// End of one sequence, start of another
else
{
if (count >= 3)
{
Console.WriteLine("Found sequence of length {0} starting at {1}",
count, firstItem);
}
count = 1;
firstItem = x;
}
}
if (count >= 3)
{
Console.WriteLine("Found sequence of length {0} starting at {1}",
count, firstItem);
}
EDIT: Okay, I've just thought of a rather more LINQ-ish way of doing things. I don't have the time to fully implement it now, but:
Order the sequence
Use something like SelectWithPrevious (probably better named SelectConsecutive) to get consecutive pairs of elements
Use the overload of Select which includes the index to get tuples of (index, current, previous)
Filter out any items where (current = previous + 1) to get anywhere that counts as the start of a sequence (special-case index=0)
Use SelectWithPrevious on the result to get the length of the sequence between two starting points (subtract one index from the previous)
Filter out any sequence with length less than 3
I suspect you need to concat int.MinValue on the ordered sequence, to guarantee the final item is used properly.
EDIT: Okay, I've implemented this. It's about the LINQiest way I can think of to do this... I used null values as "sentinel" values to force start and end sequences - see comments for more details.
Overall, I wouldn't recommend this solution. It's hard to get your head round, and although I'm reasonably confident it's correct, it took me a while thinking of possible off-by-one errors etc. It's an interesting voyage into what you can do with LINQ... and also what you probably shouldn't.
Oh, and note that I've pushed the "minimum length of 3" part up to the caller - when you have a sequence of tuples like this, it's cleaner to filter it out separately, IMO.
using System;
using System.Collections.Generic;
using System.Linq;
static class Extensions
{
public static IEnumerable<TResult> SelectConsecutive<TSource, TResult>
(this IEnumerable<TSource> source,
Func<TSource, TSource, TResult> selector)
{
using (IEnumerator<TSource> iterator = source.GetEnumerator())
{
if (!iterator.MoveNext())
{
yield break;
}
TSource prev = iterator.Current;
while (iterator.MoveNext())
{
TSource current = iterator.Current;
yield return selector(prev, current);
prev = current;
}
}
}
}
class Test
{
static void Main()
{
var list = new List<int> { 21,4,7,9,12,22,17,8,2,20,23 };
foreach (var sequence in FindSequences(list).Where(x => x.Item1 >= 3))
{
Console.WriteLine("Found sequence of length {0} starting at {1}",
sequence.Item1, sequence.Item2);
}
}
private static readonly int?[] End = { null };
// Each tuple in the returned sequence is (length, first element)
public static IEnumerable<Tuple<int, int>> FindSequences
(IEnumerable<int> input)
{
// Use null values at the start and end of the ordered sequence
// so that the first pair always starts a new sequence starting
// with the lowest actual element, and the final pair always
// starts a new one starting with null. That "sequence at the end"
// is used to compute the length of the *real* final element.
return End.Concat(input.OrderBy(x => x)
.Select(x => (int?) x))
.Concat(End)
// Work out consecutive pairs of items
.SelectConsecutive((x, y) => Tuple.Create(x, y))
// Remove duplicates
.Where(z => z.Item1 != z.Item2)
// Keep the index so we can tell sequence length
.Select((z, index) => new { z, index })
// Find sequence starting points
.Where(both => both.z.Item2 != both.z.Item1 + 1)
.SelectConsecutive((start1, start2) =>
Tuple.Create(start2.index - start1.index,
start1.z.Item2.Value));
}
}
Jon Skeet's / Timwi's solutions are the way to go.
For fun, here's a LINQ query that does the job (very inefficiently):
var sequences = input.Distinct()
.GroupBy(num => Enumerable.Range(num, int.MaxValue - num + 1)
.TakeWhile(input.Contains)
.Last()) //use the last member of the consecutive sequence as the key
.Where(seq => seq.Count() >= 3)
.Select(seq => seq.OrderBy(num => num)); // not necessary unless ordering is desirable inside each sequence.
The query's performance can be improved slightly by loading the input into a HashSet (to improve Contains), but that will still not produce a solution that is anywhere close to efficient.
The only bug I am aware of is the possibility of an arithmetic overflow if the sequence contains negative numbers of large magnitude (we cannot represent the count parameter for Range). This would be easy to fix with a custom static IEnumerable<int> To(this int start, int end) extension-method. If anyone can think of any other simple technique of dodging the overflow, please let me know.
EDIT:
Here's a slightly more verbose (but equally inefficient) variant without the overflow issue.
var sequences = input.GroupBy(num => input.Where(candidate => candidate >= num)
.OrderBy(candidate => candidate)
.TakeWhile((candidate, index) => candidate == num + index)
.Last())
.Where(seq => seq.Count() >= 3)
.Select(seq => seq.OrderBy(num => num));
I think my solution is more elegant and simple, and therefore easier to verify as correct:
/// <summary>Returns a collection containing all consecutive sequences of
/// integers in the input collection.</summary>
/// <param name="input">The collection of integers in which to find
/// consecutive sequences.</param>
/// <param name="minLength">Minimum length that a sequence should have
/// to be returned.</param>
static IEnumerable<IEnumerable<int>> ConsecutiveSequences(
IEnumerable<int> input, int minLength = 1)
{
var results = new List<List<int>>();
foreach (var i in input.OrderBy(x => x))
{
var existing = results.FirstOrDefault(lst => lst.Last() + 1 == i);
if (existing == null)
results.Add(new List<int> { i });
else
existing.Add(i);
}
return minLength <= 1 ? results :
results.Where(lst => lst.Count >= minLength);
}
Benefits over the other solutions:
It can find sequences that overlap.
It’s properly reusable and documented.
I have not found any bugs ;-)
Here is how to solve the problem in a "LINQish" way:
int[] arr = new int[]{ 21, 4, 7, 9, 12, 22, 17, 8, 2, 20, 23 };
IOrderedEnumerable<int> sorted = arr.OrderBy(x => x);
int cnt = sorted.Count();
int[] sortedArr = sorted.ToArray();
IEnumerable<int> selected = sortedArr.Where((x, idx) =>
idx <= cnt - 3 && sortedArr[idx + 1] == x + 1 && sortedArr[idx + 2] == x + 2);
IEnumerable<int> result = selected.SelectMany(x => new int[] { x, x + 1, x + 2 }).Distinct();
Console.WriteLine(string.Join(",", result.Select(x=>x.ToString()).ToArray()));
Due to the array copying and reconstruction, this solution - of course - is not as efficient as the traditional solution with loops.
Not 100% Linq but here's a generic variant:
static IEnumerable<IEnumerable<TItem>> GetSequences<TItem>(
int minSequenceLength,
Func<TItem, TItem, bool> areSequential,
IEnumerable<TItem> items)
where TItem : IComparable<TItem>
{
items = items
.OrderBy(n => n)
.Distinct().ToArray();
var lastSelected = default(TItem);
var sequences =
from startItem in items
where startItem.Equals(items.First())
|| startItem.CompareTo(lastSelected) > 0
let sequence =
from item in items
where item.Equals(startItem) || areSequential(lastSelected, item)
select (lastSelected = item)
where sequence.Count() >= minSequenceLength
select sequence;
return sequences;
}
static void UsageInt()
{
var sequences = GetSequences(
3,
(a, b) => a + 1 == b,
new[] { 21, 4, 7, 9, 12, 22, 17, 8, 2, 20, 23 });
foreach (var sequence in sequences)
Console.WriteLine(string.Join(", ", sequence.ToArray()));
}
static void UsageChar()
{
var list = new List<char>(
"abcdefghijklmnopqrstuvwxyz".ToCharArray());
var sequences = GetSequences(
3,
(a, b) => (list.IndexOf(a) + 1 == list.IndexOf(b)),
"PleaseBeGentleWithMe".ToLower().ToCharArray());
foreach (var sequence in sequences)
Console.WriteLine(string.Join(", ", sequence.ToArray()));
}
Here's my shot at it:
public static class SequenceDetector
{
public static IEnumerable<IEnumerable<T>> DetectSequenceWhere<T>(this IEnumerable<T> sequence, Func<T, T, bool> inSequenceSelector)
{
List<T> subsequence = null;
// We can only have a sequence with 2 or more items
T last = sequence.FirstOrDefault();
foreach (var item in sequence.Skip(1))
{
if (inSequenceSelector(last, item))
{
// These form part of a sequence
if (subsequence == null)
{
subsequence = new List<T>();
subsequence.Add(last);
}
subsequence.Add(item);
}
else if (subsequence != null)
{
// We have a previous seq to return
yield return subsequence;
subsequence = null;
}
last = item;
}
if (subsequence != null)
{
// Return any trailing seq
yield return subsequence;
}
}
}
public class test
{
public static void run()
{
var list = new List<int> { 21, 4, 7, 9, 12, 22, 17, 8, 2, 20, 23 };
foreach (var subsequence in list
.OrderBy(i => i)
.Distinct()
.DetectSequenceWhere((first, second) => first + 1 == second)
.Where(seq => seq.Count() >= 3))
{
Console.WriteLine("Found subsequence {0}",
string.Join(", ", subsequence.Select(i => i.ToString()).ToArray()));
}
}
}
This returns the specific items that form the sub-sequences and permits any type of item and any definition of criteria so long as it can be determined by comparing adjacent items.
What about sorting the array and then creating another array that holds the difference between each element and the previous one?
sortedArray = 8, 9, 10, 21, 22, 23, 24, 27, 30, 31, 32
diffArray = 1, 1, 11, 1, 1, 1, 3, 3, 1, 1
Now iterate through the difference array. If the difference equals 1, increase a counter, sequenceLength, by 1. If the difference is > 1, check sequenceLength: if it is >= 2, you have a sequence of at least 3 consecutive elements. Then reset sequenceLength to 0 and continue looping over the difference array. Repeat the check once after the loop ends, so that a run at the end of the array (30, 31, 32 above) is not missed.
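A minimal sketch of that idea in C#, assuming the input contains no duplicates (a duplicate produces a difference of 0 and would split a run). The names DiffRuns and FindRuns are made up for illustration, and the sketch returns each run as a (first value, length) pair instead of printing it:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DiffRuns
{
    // Returns (first value, length) for every run of at least minLength
    // consecutive integers. Assumes the input has no duplicates.
    public static List<Tuple<int, int>> FindRuns(IEnumerable<int> input, int minLength)
    {
        int[] sorted = input.OrderBy(x => x).ToArray();
        var runs = new List<Tuple<int, int>>();
        if (sorted.Length == 0) return runs;

        // diffs[i] = sorted[i + 1] - sorted[i]
        int[] diffs = sorted.Skip(1).Zip(sorted, (next, prev) => next - prev).ToArray();

        int sequenceLength = 0; // number of consecutive differences equal to 1
        int runStart = 0;       // index in sorted where the current run begins
        for (int i = 0; i < diffs.Length; i++)
        {
            if (diffs[i] == 1)
            {
                sequenceLength++;
                continue;
            }
            // k differences of 1 mean k + 1 elements in the run
            if (sequenceLength + 1 >= minLength)
                runs.Add(Tuple.Create(sorted[runStart], sequenceLength + 1));
            sequenceLength = 0;
            runStart = i + 1;
        }
        // The last run is not followed by a larger difference, so check it here.
        if (sequenceLength + 1 >= minLength)
            runs.Add(Tuple.Create(sorted[runStart], sequenceLength + 1));
        return runs;
    }
}
```

For the sorted array shown above, FindRuns(..., 3) yields (8, 3), (21, 4) and (30, 3).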
Here is a solution I knocked up in F#. It should be fairly easy to translate into a C# LINQ query, since fold is pretty much equivalent to the LINQ Aggregate operator.
#light
let nums = [21;4;7;9;12;22;17;8;2;20;23]
let scanFunc (mainSeqLength, mainCounter, lastNum:int, subSequenceCounter:int, subSequence:'a list, foundSequences:'a list list) (num:'a) =
(mainSeqLength, mainCounter + 1,
num,
(if num <> lastNum + 1 then 1 else subSequenceCounter+1),
(if num <> lastNum + 1 then [num] else subSequence@[num]),
if subSequenceCounter >= 3 then
if mainSeqLength = mainCounter+1
then foundSequences @ [subSequence@[num]]
elif num <> lastNum + 1
then foundSequences @ [subSequence]
else foundSequences
else foundSequences)
let subSequences = nums |> Seq.sort |> Seq.fold scanFunc (nums |> Seq.length, 0, 0, 0, [], []) |> fun (_,_,_,_,_,results) -> results
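As a rough illustration of that translation, here is a sketch that uses Enumerable.Aggregate where the F# uses Seq.fold. It is not a line-by-line port of the code above: the accumulator is simplified to the list of runs built so far, duplicates are removed first, and the names AggregateRuns and Find are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AggregateRuns
{
    public static List<List<int>> Find(IEnumerable<int> nums, int minLength)
    {
        return nums
            .Distinct()
            .OrderBy(n => n)
            // The accumulator is the list of runs built so far; each step
            // either extends the last run or starts a new one.
            .Aggregate(new List<List<int>>(), (runs, num) =>
            {
                List<int> current = runs.LastOrDefault();
                if (current != null && current[current.Count - 1] + 1 == num)
                    current.Add(num);
                else
                    runs.Add(new List<int> { num });
                return runs;
            })
            // Keep only the runs that are long enough.
            .Where(run => run.Count >= minLength)
            .ToList();
    }
}
```

Mutating the accumulator inside the lambda is not in the spirit of a pure fold, but it avoids copying the list on every step.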
Linq isn't the solution for everything; sometimes you're better off with a simple loop. Here's a solution with just a bit of Linq to order the original sequence and filter the results:
void Main()
{
var numbers = new[] { 21,4,7,9,12,22,17,8,2,20,23 };
var sequences =
GetSequences(numbers, (prev, curr) => curr == prev + 1)
.Where(s => s.Count() >= 3);
sequences.Dump();
}
public static IEnumerable<IEnumerable<T>> GetSequences<T>(
IEnumerable<T> source,
Func<T, T, bool> areConsecutive)
{
bool first = true;
T prev = default(T);
List<T> seq = new List<T>();
foreach (var i in source.OrderBy(i => i))
{
if (!first && !areConsecutive(prev, i))
{
yield return seq.ToArray();
seq.Clear();
}
first = false;
seq.Add(i);
prev = i;
}
if (seq.Any())
yield return seq.ToArray();
}
I thought of the same thing as Jon: to represent a range of consecutive integers all you really need are two measly integers! So I'd start there:
struct Range : IEnumerable<int>
{
readonly int _start;
readonly int _count;
public Range(int start, int count)
{
_start = start;
_count = count;
}
public int Start
{
get { return _start; }
}
public int Count
{
get { return _count; }
}
public int End
{
get { return _start + _count - 1; }
}
public IEnumerator<int> GetEnumerator()
{
for (int i = 0; i < _count; ++i)
{
yield return _start + i;
}
}
// Heck, why not?
public static Range operator +(Range x, int y)
{
return new Range(x.Start, x.Count + y);
}
// skipping the explicit IEnumerable.GetEnumerator implementation
}
From there, you can write a static method to return a bunch of these Range values corresponding to the consecutive numbers of your sequence.
static IEnumerable<Range> FindRanges(IEnumerable<int> source, int minCount)
{
// throw exceptions on invalid arguments, maybe...
var ordered = source.OrderBy(x => x);
Range r = default(Range);
foreach (int value in ordered)
{
// In "real" code I would've overridden the Equals method
// and overloaded the == operator to write something like
// if (r == Range.Empty) here... but this works well enough
// for now, since the only time r.Count will be 0 is on the
// first item.
if (r.Count == 0)
{
r = new Range(value, 1);
continue;
}
if (value == r.End)
{
// skip duplicates
continue;
}
else if (value == r.End + 1)
{
// "append" consecutive values to the range
r += 1;
}
else
{
// return what we've got so far
if (r.Count >= minCount)
{
yield return r;
}
// start over
r = new Range(value, 1);
}
}
// return whatever we ended up with
if (r.Count >= minCount)
{
yield return r;
}
}
Demo:
int[] numbers = new[] { 21, 4, 7, 9, 12, 22, 17, 8, 2, 20, 23 };
foreach (Range r in FindRanges(numbers, 3))
{
// Using .NET 3.5 here, don't have the much nicer string.Join overloads.
Console.WriteLine(string.Join(", ", r.Select(x => x.ToString()).ToArray()));
}
Output:
7, 8, 9
20, 21, 22, 23
Here's my LINQ-y take on the problem:
static IEnumerable<IEnumerable<int>>
ConsecutiveSequences(this IEnumerable<int> input, int minLength = 3)
{
int order = 0;
var inorder = new SortedSet<int>(input);
return from item in new[] { new { order = 0, val = inorder.First() } }
.Concat(
inorder.Zip(inorder.Skip(1), (x, val) =>
new { order = x + 1 == val ? order : ++order, val }))
group item.val by item.order into list
where list.Count() >= minLength
select list;
}
Uses no explicit loops, but should still be O(n lg n).
Uses SortedSet instead of .OrderBy().Distinct().
Pairs each element with its successor via list.Zip(list.Skip(1)).
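The Zip/Skip(1) pairing used in the query above can be seen in isolation. A small sketch, with illustrative class and method names: zipping a sequence with itself shifted by one yields each element alongside its successor, and one pair fewer than there are elements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PairwiseDemo
{
    // Pairs each element with the one that follows it.
    // Zip stops at the shorter sequence, so n elements give n - 1 pairs.
    public static List<Tuple<int, int>> Pairwise(IEnumerable<int> source)
    {
        return source
            .Zip(source.Skip(1), (prev, next) => Tuple.Create(prev, next))
            .ToList();
    }
}
```

For the sorted input 2, 4, 7, 8, 9 this gives (2, 4), (4, 7), (7, 8), (8, 9); a pair whose second item equals the first plus one marks two consecutive values.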
Here's a solution using a Dictionary instead of a sort...
It adds the items to a Dictionary and then, for each value, steps upward and downward from that value to find the longest sequence that contains it.
It is not strictly LINQ, though it does make use of some LINQ functions, and I think it is more readable than a pure LINQ solution.
static void Main(string[] args)
{
var items = new[] { -1, 0, 1, 21, -2, 4, 7, 9, 12, 22, 17, 8, 2, 20, 23 };
IEnumerable<IEnumerable<int>> sequences = FindSequences(items, 3);
foreach (var sequence in sequences)
{ //print results to console
Console.Out.WriteLine(sequence.Select(num => num.ToString()).Aggregate((a, b) => a + "," + b));
}
Console.ReadLine();
}
private static IEnumerable<IEnumerable<int>> FindSequences(IEnumerable<int> items, int minSequenceLength)
{
//Convert item list to dictionary
var itemDict = new Dictionary<int, int>();
foreach (int val in items)
{
itemDict[val] = val;
}
var allSequences = new List<List<int>>();
//for each val in items, find longest sequence including that value
foreach (var item in items)
{
var sequence = FindLongestSequenceIncludingValue(itemDict, item);
allSequences.Add(sequence);
//remove items from dict to prevent duplicate sequences
sequence.ForEach(i => itemDict.Remove(i));
}
//return only sequences with at least minSequenceLength items
return allSequences.Where(sequence => sequence.Count >= minSequenceLength).ToList();
}
//Find sequence around start param value
private static List<int> FindLongestSequenceIncludingValue(Dictionary<int, int> itemDict, int value)
{
var result = new List<int>();
//check if num exists in dictionary
if (!itemDict.ContainsKey(value))
return result;
//initialize sequence list
result.Add(value);
//find values greater than starting value
//and add to end of sequence
var indexUp = value + 1;
while (itemDict.ContainsKey(indexUp))
{
result.Add(itemDict[indexUp]);
indexUp++;
}
//find values lower than starting value
//and add to start of sequence
var indexDown = value - 1;
while (itemDict.ContainsKey(indexDown))
{
result.Insert(0, itemDict[indexDown]);
indexDown--;
}
return result;
}
