Split a collection into `n` parts with LINQ? [duplicate] - c#
This question already has answers here:
Split List into Sublists with LINQ
(34 answers)
Closed 1 year ago.
Is there a nice way to split a collection into n parts with LINQ?
Not necessarily evenly of course.
That is, I want to divide the collection into sub-collections, each containing a subset of the elements, where the last sub-collection can be ragged.
A pure LINQ solution, and the simplest one, is shown below.
static class LinqExtensions
{
public static IEnumerable<IEnumerable<T>> Split<T>(this IEnumerable<T> list, int parts)
{
int i = 0;
var splits = from item in list
group item by i++ % parts into part
select part.AsEnumerable();
return splits;
}
}
EDIT: Okay, it looks like I misread the question. I read it as "pieces of length n" rather than "n pieces". Doh! Considering deleting answer...
(Original answer)
I don't believe there's a built-in way of partitioning, although I intend to write one in my set of additions to LINQ to Objects. Marc Gravell has an implementation here although I would probably modify it to return a read-only view:
public static IEnumerable<IEnumerable<T>> Partition<T>
(this IEnumerable<T> source, int size)
{
T[] array = null;
int count = 0;
foreach (T item in source)
{
if (array == null)
{
array = new T[size];
}
array[count] = item;
count++;
if (count == size)
{
yield return new ReadOnlyCollection<T>(array);
array = null;
count = 0;
}
}
if (array != null)
{
Array.Resize(ref array, count);
yield return new ReadOnlyCollection<T>(array);
}
}
static class LinqExtensions
{
public static IEnumerable<IEnumerable<T>> Split<T>(this IEnumerable<T> list, int parts)
{
return list.Select((item, index) => new {index, item})
.GroupBy(x => x.index % parts)
.Select(x => x.Select(y => y.item));
}
}
Ok, I'll throw my hat in the ring. The advantages of my algorithm:
No expensive multiplication, division, or modulus operators
All operations are O(1) (see note below)
Works for IEnumerable<> source (no Count property needed)
Simple
The code:
public static IEnumerable<IEnumerable<T>>
Section<T>(this IEnumerable<T> source, int length)
{
if (length <= 0)
throw new ArgumentOutOfRangeException("length");
var section = new List<T>(length);
foreach (var item in source)
{
section.Add(item);
if (section.Count == length)
{
yield return section.AsReadOnly();
section = new List<T>(length);
}
}
if (section.Count > 0)
yield return section.AsReadOnly();
}
As pointed out in the comments below, this approach doesn't actually address the original question which asked for a fixed number of sections of approximately equal length. That said, you can still use my approach to solve the original question by calling it this way:
myEnum.Section(myEnum.Count() / number_of_sections + 1)
When used in this manner, the approach is no longer O(1) as the Count() operation is O(N).
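Note that `Count() / number_of_sections + 1` overshoots when the count divides evenly: 10 items in 5 sections gives a length of 3 and therefore only 4 sections. A minimal sketch using ceiling division instead; `SectionInto` is a hypothetical helper name, and `Section` is repeated only so the block stands on its own. Even with ceiling division you can get fewer sections than requested (10 items with 6 requested yields 5 sections of 2).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SectionExtensions
{
    // Fixed-length sectioning, as in the Section method above.
    public static IEnumerable<IEnumerable<T>> Section<T>(this IEnumerable<T> source, int length)
    {
        if (length <= 0)
            throw new ArgumentOutOfRangeException(nameof(length));
        var section = new List<T>(length);
        foreach (var item in source)
        {
            section.Add(item);
            if (section.Count == length)
            {
                yield return section.AsReadOnly();
                section = new List<T>(length);
            }
        }
        if (section.Count > 0)
            yield return section.AsReadOnly();
    }

    // Hypothetical helper: derives the section length by ceiling division.
    public static IEnumerable<IEnumerable<T>> SectionInto<T>(this ICollection<T> source, int numberOfSections)
    {
        if (numberOfSections <= 0)
            throw new ArgumentOutOfRangeException(nameof(numberOfSections));
        // Ceiling of Count / numberOfSections, but at least 1 so an empty source is accepted.
        int length = Math.Max(1, (source.Count + numberOfSections - 1) / numberOfSections);
        return source.Section(length);
    }
}

public static class SectionProgram
{
    public static void Main()
    {
        // 10 items, 5 sections requested: the length is 2, so five sections come back.
        foreach (var section in Enumerable.Range(0, 10).ToList().SectionInto(5))
            Console.WriteLine(string.Join(",", section));
    }
}
```

Taking an `ICollection<T>` keeps the count lookup cheap; with a plain `IEnumerable<T>` the extra `Count()` pass described above would still be needed.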
This is the same as the accepted answer, but in a much simpler form:
public static IEnumerable<IEnumerable<T>> Split<T>(this IEnumerable<T> items,
int numOfParts)
{
int i = 0;
return items.GroupBy(x => i++ % numOfParts);
}
The above method splits an IEnumerable<T> into N number of chunks of equal sizes or close to equal sizes.
public static IEnumerable<IEnumerable<T>> Partition<T>(this IEnumerable<T> items,
int partitionSize)
{
int i = 0;
return items.GroupBy(x => i++ / partitionSize).ToArray();
}
The above method splits an IEnumerable<T> into chunks of desired fixed size with total number of chunks being unimportant - which is not what the question is about.
The problem with the Split method, besides being slower, is that it scrambles the output: items are dealt out round-robin, so each group holds every N-th item, and you don't get the chunks in the original order.
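To make the ordering effect concrete, here is a small sketch that runs the modulus-based Split on the numbers 0 to 6 with three parts. The class names are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RoundRobinDemo
{
    // Modulus-based Split as shown above: item number i lands in group i % numOfParts.
    public static IEnumerable<IEnumerable<T>> Split<T>(this IEnumerable<T> items, int numOfParts)
    {
        int i = 0;
        return items.GroupBy(x => i++ % numOfParts);
    }
}

public static class RoundRobinProgram
{
    public static void Main()
    {
        foreach (var part in Enumerable.Range(0, 7).Split(3))
            Console.WriteLine(string.Join(",", part));
        // prints:
        // 0,3,6
        // 1,4
        // 2,5
    }
}
```

An order-preserving split of the same input would instead produce contiguous runs such as 0,1,2 then 3,4 then 5,6.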
Almost every answer here either doesn't preserve order, or is about partitioning rather than splitting, or is plainly wrong. Try this, which is faster and preserves order but is a little more verbose:
public static IEnumerable<IEnumerable<T>> Split<T>(this ICollection<T> items,
int numberOfChunks)
{
if (numberOfChunks <= 0 || numberOfChunks > items.Count)
throw new ArgumentOutOfRangeException("numberOfChunks");
int sizePerPacket = items.Count / numberOfChunks;
int extra = items.Count % numberOfChunks;
for (int i = 0; i < numberOfChunks - extra; i++)
yield return items.Skip(i * sizePerPacket).Take(sizePerPacket);
int alreadyReturnedCount = (numberOfChunks - extra) * sizePerPacket;
// The remaining 'extra' chunks each take exactly one more item than the earlier ones.
int toReturnCount = sizePerPacket + 1;
for (int i = 0; i < extra; i++)
yield return items.Skip(alreadyReturnedCount + i * toReturnCount).Take(toReturnCount);
}
The equivalent method for a Partition operation here
I have been using the Partition function I posted earlier quite often. The only bad thing about it was that it wasn't completely streaming. This is not a problem if you work with few elements in your sequence. I needed a new solution when I started working with 100,000+ elements in my sequence.
The following solution is a lot more complex (and more code!), but it is very efficient.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Collections;
namespace LuvDaSun.Linq
{
public static class EnumerableExtensions
{
public static IEnumerable<IEnumerable<T>> Partition<T>(this IEnumerable<T> enumerable, int partitionSize)
{
/*
return enumerable
.Select((item, index) => new { Item = item, Index = index, })
.GroupBy(item => item.Index / partitionSize)
.Select(group => group.Select(item => item.Item) )
;
*/
return new PartitioningEnumerable<T>(enumerable, partitionSize);
}
}
class PartitioningEnumerable<T> : IEnumerable<IEnumerable<T>>
{
IEnumerable<T> _enumerable;
int _partitionSize;
public PartitioningEnumerable(IEnumerable<T> enumerable, int partitionSize)
{
_enumerable = enumerable;
_partitionSize = partitionSize;
}
public IEnumerator<IEnumerable<T>> GetEnumerator()
{
return new PartitioningEnumerator<T>(_enumerable.GetEnumerator(), _partitionSize);
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
}
class PartitioningEnumerator<T> : IEnumerator<IEnumerable<T>>
{
IEnumerator<T> _enumerator;
int _partitionSize;
public PartitioningEnumerator(IEnumerator<T> enumerator, int partitionSize)
{
_enumerator = enumerator;
_partitionSize = partitionSize;
}
public void Dispose()
{
_enumerator.Dispose();
}
IEnumerable<T> _current;
public IEnumerable<T> Current
{
get { return _current; }
}
object IEnumerator.Current
{
get { return _current; }
}
public void Reset()
{
_current = null;
_enumerator.Reset();
}
public bool MoveNext()
{
bool result;
if (_enumerator.MoveNext())
{
_current = new PartitionEnumerable<T>(_enumerator, _partitionSize);
result = true;
}
else
{
_current = null;
result = false;
}
return result;
}
}
class PartitionEnumerable<T> : IEnumerable<T>
{
IEnumerator<T> _enumerator;
int _partitionSize;
public PartitionEnumerable(IEnumerator<T> enumerator, int partitionSize)
{
_enumerator = enumerator;
_partitionSize = partitionSize;
}
public IEnumerator<T> GetEnumerator()
{
return new PartitionEnumerator<T>(_enumerator, _partitionSize);
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
}
class PartitionEnumerator<T> : IEnumerator<T>
{
IEnumerator<T> _enumerator;
int _partitionSize;
int _count;
public PartitionEnumerator(IEnumerator<T> enumerator, int partitionSize)
{
_enumerator = enumerator;
_partitionSize = partitionSize;
}
public void Dispose()
{
}
public T Current
{
get { return _enumerator.Current; }
}
object IEnumerator.Current
{
get { return _enumerator.Current; }
}
public void Reset()
{
if (_count > 0) throw new InvalidOperationException();
}
public bool MoveNext()
{
bool result;
if (_count < _partitionSize)
{
if (_count > 0)
{
result = _enumerator.MoveNext();
}
else
{
result = true;
}
_count++;
}
else
{
result = false;
}
return result;
}
}
}
Enjoy!
Interesting thread. To get a streaming version of Split/Partition, one can use enumerators and yield sequences from the enumerator using extension methods. Converting imperative code to functional code using yield is a very powerful technique indeed.
First an enumerator extension that turns a count of elements into a lazy sequence:
public static IEnumerable<T> TakeFromCurrent<T>(this IEnumerator<T> enumerator, int count)
{
while (count > 0)
{
yield return enumerator.Current;
if (--count > 0 && !enumerator.MoveNext()) yield break;
}
}
And then an enumerable extension that partitions a sequence:
public static IEnumerable<IEnumerable<T>> Partition<T>(this IEnumerable<T> seq, int partitionSize)
{
var enumerator = seq.GetEnumerator();
while (enumerator.MoveNext())
{
yield return enumerator.TakeFromCurrent(partitionSize);
}
}
The end result is a highly efficient, streaming and lazy implementation that relies on very simple code.
Enjoy!
I use this:
public static IEnumerable<IEnumerable<T>> Partition<T>(this IEnumerable<T> instance, int partitionSize)
{
return instance
.Select((value, index) => new { Index = index, Value = value })
.GroupBy(i => i.Index / partitionSize)
.Select(i => i.Select(i2 => i2.Value));
}
As of .NET 6 you can use Enumerable.Chunk<TSource>(IEnumerable<TSource>, Int32).
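A short sketch of `Chunk` in use, assuming .NET 6 or later. `Chunk` takes a chunk size, not a number of parts, so the hypothetical `SplitInto` helper below (not part of the framework) first derives the size by ceiling division. It can return fewer parts than requested when the count does not work out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ChunkDemo
{
    // Hypothetical helper: at most `parts` contiguous arrays, built on Enumerable.Chunk.
    public static IEnumerable<T[]> SplitInto<T>(this ICollection<T> source, int parts)
    {
        if (parts <= 0)
            throw new ArgumentOutOfRangeException(nameof(parts));
        if (source.Count == 0)
            return Enumerable.Empty<T[]>();
        int size = (source.Count + parts - 1) / parts; // ceiling division
        return source.Chunk(size);
    }
}

public static class ChunkProgram
{
    public static void Main()
    {
        int[] numbers = Enumerable.Range(1, 10).ToArray();

        // Fixed chunk size of 4: the last chunk holds the two leftover items.
        foreach (int[] chunk in numbers.Chunk(4))
            Console.WriteLine(string.Join(",", chunk));

        // Three parts requested: the chunk size works out to 4 again.
        foreach (int[] part in numbers.SplitInto(3))
            Console.WriteLine(string.Join(",", part));
    }
}
```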
This is memory efficient, defers execution as much as possible (per batch), and operates in linear time, O(n).
public static IEnumerable<IEnumerable<T>> InBatchesOf<T>(this IEnumerable<T> items, int batchSize)
{
List<T> batch = new List<T>(batchSize);
foreach (var item in items)
{
batch.Add(item);
if (batch.Count >= batchSize)
{
yield return batch;
batch = new List<T>();
}
}
if (batch.Count != 0)
{
//can't be batch size or would've yielded above
batch.TrimExcess();
yield return batch;
}
}
There are lots of great answers for this question (and its cousins). I needed this myself and had created a solution that is designed to be efficient and error tolerant in a scenario where the source collection can be treated as a list. It does not use any lazy iteration so it may not be suitable for collections of unknown size that may apply memory pressure.
static public IList<T[]> GetChunks<T>(this IEnumerable<T> source, int batchsize)
{
IList<T[]> result = null;
if (source != null && batchsize > 0)
{
var list = source as List<T> ?? source.ToList();
if (list.Count > 0)
{
result = new List<T[]>();
for (var index = 0; index < list.Count; index += batchsize)
{
var rangesize = Math.Min(batchsize, list.Count - index);
result.Add(list.GetRange(index, rangesize).ToArray());
}
}
}
return result ?? Enumerable.Empty<T[]>().ToList();
}
static public void TestGetChunks()
{
var ids = Enumerable.Range(1, 163).Select(i => i.ToString());
foreach (var chunk in ids.GetChunks(20))
{
Console.WriteLine("[{0}]", String.Join(",", chunk));
}
}
I have seen a few answers across this family of questions that use GetRange and Math.Min. But I believe that overall this is a more complete solution in terms of error checking and efficiency.
protected List<List<int>> MySplit(int MaxNumber, int Divider)
{
List<List<int>> lst = new List<List<int>>();
int ListCount = 0;
int d = MaxNumber / Divider;
lst.Add(new List<int>());
for (int i = 1; i <= MaxNumber; i++)
{
lst[ListCount].Add(i);
if (i != 0 && i % d == 0)
{
ListCount++;
d += MaxNumber / Divider;
lst.Add(new List<int>());
}
}
return lst;
}
Great answers. For my scenario I tested the accepted answer, and it seems it does not keep order. There is also a great answer by Nawfal that does keep order.
But in my scenario I wanted to spread the remainder in a normalized way;
all the answers I saw put the remainder either at the beginning or at the end.
My answer also spreads the remainder more evenly across the result.
static class Program
{
static void Main(string[] args)
{
var input = new List<String>();
for (int k = 0; k < 18; ++k)
{
input.Add(k.ToString());
}
var result = splitListIntoSmallerLists(input, 15);
int i = 0;
foreach(var resul in result){
Console.WriteLine("------Segment:" + i.ToString() + "--------");
foreach(var res in resul){
Console.WriteLine(res);
}
i++;
}
Console.ReadLine();
}
private static List<List<T>> splitListIntoSmallerLists<T>(List<T> i_bigList,int i_numberOfSmallerLists)
{
if (i_numberOfSmallerLists <= 0)
throw new ArgumentOutOfRangeException("i_numberOfSmallerLists", "Illegal value of numberOfSmallLists");
int normalizedSpreadRemainderCounter = 0;
int normalizedSpreadNumber = 0;
//e.g. 7 / 5 > 0 ==> output size is 5; 2 / 5 == 0 ==> output size is 2 (the remainder)
int minimumNumberOfPartsInEachSmallerList = i_bigList.Count / i_numberOfSmallerLists;
int remainder = i_bigList.Count % i_numberOfSmallerLists;
int outputSize = minimumNumberOfPartsInEachSmallerList > 0 ? i_numberOfSmallerLists : remainder;
//In case remainder > 0 we want to spread the remainder equally between the others
if (remainder > 0)
{
if (minimumNumberOfPartsInEachSmallerList > 0)
{
normalizedSpreadNumber = (int)Math.Floor((double)i_numberOfSmallerLists / remainder);
}
else
{
normalizedSpreadNumber = 1;
}
}
List<List<T>> retVal = new List<List<T>>(outputSize);
int inputIndex = 0;
for (int i = 0; i < outputSize; ++i)
{
retVal.Add(new List<T>());
if (minimumNumberOfPartsInEachSmallerList > 0)
{
retVal[i].AddRange(i_bigList.GetRange(inputIndex, minimumNumberOfPartsInEachSmallerList));
inputIndex += minimumNumberOfPartsInEachSmallerList;
}
//If we have remainder take one from it, if our counter is equal to normalizedSpreadNumber.
if (remainder > 0)
{
if (normalizedSpreadRemainderCounter == normalizedSpreadNumber-1)
{
retVal[i].Add(i_bigList[inputIndex]);
remainder--;
inputIndex++;
normalizedSpreadRemainderCounter=0;
}
else
{
normalizedSpreadRemainderCounter++;
}
}
}
return retVal;
}
}
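An alternative way to spread the remainder, sketched here with a different technique than the counter used above: let part k cover the indices from k * count / n up to (but not including) (k + 1) * count / n. Integer division then places the larger parts at roughly regular intervals. `SplitEvenly` is a hypothetical name, and unlike the code above this sketch always returns exactly n lists, some of them empty when there are fewer items than parts.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EvenSplit
{
    // Hypothetical helper: contiguous parts whose sizes differ by at most one.
    public static List<List<T>> SplitEvenly<T>(this List<T> source, int parts)
    {
        if (parts <= 0)
            throw new ArgumentOutOfRangeException(nameof(parts));
        var result = new List<List<T>>(parts);
        for (int k = 0; k < parts; k++)
        {
            // long arithmetic keeps k * Count from overflowing on large lists
            int start = (int)((long)k * source.Count / parts);
            int end = (int)((long)(k + 1) * source.Count / parts);
            result.Add(source.GetRange(start, end - start));
        }
        return result;
    }
}

public static class EvenSplitProgram
{
    public static void Main()
    {
        var input = Enumerable.Range(0, 18).Select(k => k.ToString()).ToList();
        int i = 0;
        foreach (var segment in input.SplitEvenly(15))
            Console.WriteLine("Segment " + i++ + ": " + string.Join(",", segment));
    }
}
```

For 18 items in 15 parts the two-item segments land at positions 4, 9 and 14, the same placement the counter-based code above produces for that input.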
If order in these parts is not very important you can try this:
int[] array = new int[] { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
int n = 3;
var result =
array.Select((value, index) => new { Value = value, Index = index }).GroupBy(i => i.Index % n, i => i.Value);
// or
var result2 =
from i in array.Select((value, index) => new { Value = value, Index = index })
group i.Value by i.Index % n into g
select g;
However, these can't be cast to IEnumerable<IEnumerable<int>> for some reason...
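For what it's worth, on C# 4 / .NET 4 and later the conversion does compile: `IEnumerable<out T>` is covariant, and `IGrouping<TKey, TElement>` implements `IEnumerable<TElement>`. A brief sketch, assuming such a compiler (the method name `SplitRoundRobin` is illustrative):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CovarianceDemo
{
    public static IEnumerable<IEnumerable<int>> SplitRoundRobin(int[] array, int n)
    {
        // The query yields IEnumerable<IGrouping<int, int>>; interface covariance
        // lets it be returned as IEnumerable<IEnumerable<int>> without a cast.
        return array
            .Select((value, index) => new { Value = value, Index = index })
            .GroupBy(i => i.Index % n, i => i.Value);
    }

    public static void Main()
    {
        int[] array = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
        foreach (var group in SplitRoundRobin(array, 3))
            Console.WriteLine(string.Join(",", group));
        // prints:
        // 0,3,6,9
        // 1,4,7,10
        // 2,5,8
    }
}
```

On older compilers without variance support, an explicit `.Select(g => g.AsEnumerable())` does the same job.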
This is my code, nice and short.
<Extension()> Public Function Chunk(Of T)(ByVal this As IList(Of T), ByVal size As Integer) As List(Of List(Of T))
Dim result As New List(Of List(Of T))
For i = 0 To CInt(Math.Ceiling(this.Count / size)) - 1
result.Add(New List(Of T)(this.GetRange(i * size, Math.Min(size, this.Count - (i * size)))))
Next
Return result
End Function
This is my way: list the items and start a new row after every few columns.
int repat_count=4;
arrItems.ForEach((x, i) => {
if (i % repat_count == 0)
row = tbo.NewElement(el_tr, cls_min_height);
var td = row.NewElement(el_td);
td.innerHTML = x.Name;
});
I was looking for a split like the one for strings, where the whole list is split according to some rule, not only the first part. This is my solution:
List<int> sequence = new List<int>();
for (int i = 0; i < 2000; i++)
{
sequence.Add(i);
}
int splitIndex = 900;
List<List<int>> splitted = new List<List<int>>();
while (sequence.Count != 0)
{
splitted.Add(sequence.Take(splitIndex).ToList() );
sequence.RemoveRange(0, Math.Min(splitIndex, sequence.Count));
}
Here is a little tweak for the number of items instead of the number of parts:
public static class MiscExctensions
{
public static IEnumerable<IEnumerable<T>> Split<T>(this IEnumerable<T> list, int nbItems)
{
return (
list
.Select((o, n) => new { o, n })
.GroupBy(g => (int)(g.n / nbItems))
.Select(g => g.Select(x => x.o))
);
}
}
The code below returns the given number of chunks and keeps the data in its original order.
static IEnumerable<IEnumerable<T>> SplitSequentially<T>(int chunkParts, List<T> inputList)
{
List<int> Splits = split(inputList.Count, chunkParts);
var skipNumber = 0;
List<List<T>> list = new List<List<T>>();
foreach (var count in Splits)
{
var internalList = inputList.Skip(skipNumber).Take(count).ToList();
list.Add(internalList);
skipNumber += count;
}
return list;
}
static List<int> split(int x, int n)
{
List<int> list = new List<int>();
if (x % n == 0)
{
for (int i = 0; i < n; i++)
list.Add(x / n);
}
else
{
// upto n-(x % n) the values
// will be x / n
// after that the values
// will be x / n + 1
int zp = n - (x % n);
int pp = x / n;
for (int i = 0; i < n; i++)
{
if (i >= zp)
list.Add((pp + 1));
else
list.Add(pp);
}
}
return list;
}
int[] items = new int[] { 0,1,2,3,4,5,6,7,8,9, 10 };
int itemIndex = 0;
int groupSize = 2;
int nextGroup = groupSize;
var seqItems = from aItem in items
group aItem by
(itemIndex++ < nextGroup)
?
nextGroup / groupSize
:
(nextGroup += groupSize) / groupSize
into itemGroup
select itemGroup.AsEnumerable();
Just came across this thread, and most of the solutions here involve adding items to collections, effectively materialising each page before returning it. This is bad for two reasons - firstly if your pages are large there's a memory overhead to filling the page, secondly there are iterators which invalidate previous records when you advance to the next one (for example if you wrap a DataReader within an enumerator method).
This solution uses two nested enumerator methods to avoid any need to cache items into temporary collections. Since the outer and inner iterators are traversing the same enumerable, they necessarily share the same enumerator, so it's important not to advance the outer one until you're done with processing the current page. That said, if you decide not to iterate all the way through the current page, when you move to the next page this solution will iterate forward to the page boundary automatically.
using System.Collections.Generic;
public static class EnumerableExtensions
{
/// <summary>
/// Partitions an enumerable into individual pages of a specified size, still scanning the source enumerable just once
/// </summary>
/// <typeparam name="T">The element type</typeparam>
/// <param name="enumerable">The source enumerable</param>
/// <param name="pageSize">The number of elements to return in each page</param>
/// <returns></returns>
public static IEnumerable<IEnumerable<T>> Partition<T>(this IEnumerable<T> enumerable, int pageSize)
{
var enumerator = enumerable.GetEnumerator();
while (enumerator.MoveNext())
{
var indexWithinPage = new IntByRef { Value = 0 };
yield return SubPartition(enumerator, pageSize, indexWithinPage);
// Skip any items the caller left unread, so the enumerator rests on the last item of this page
for (indexWithinPage.Value++; indexWithinPage.Value < pageSize; indexWithinPage.Value++)
{
if (!enumerator.MoveNext())
{
yield break;
}
}
}
}
private static IEnumerable<T> SubPartition<T>(IEnumerator<T> enumerator, int pageSize, IntByRef index)
{
for (; index.Value < pageSize; index.Value++)
{
yield return enumerator.Current;
// Advance only while this page still needs items; otherwise the first item of the next page would be skipped
if (index.Value + 1 < pageSize && !enumerator.MoveNext())
{
yield break;
}
}
}
private class IntByRef
{
public int Value { get; set; }
}
}
Related
Index out of bounds of array but only sometimes [duplicate]
Suppose I had a string: string str = "1111222233334444"; How can I break this string into chunks of some size? e.g., breaking this into sizes of 4 would return strings: "1111" "2222" "3333" "4444"
static IEnumerable<string> Split(string str, int chunkSize)
{
    return Enumerable.Range(0, str.Length / chunkSize)
        .Select(i => str.Substring(i * chunkSize, chunkSize));
}
Please note that additional code might be required to gracefully handle edge cases (null or empty input string, chunkSize == 0, input string length not divisible by chunkSize, etc.). The original question doesn't specify any requirements for these edge cases and in real life the requirements might vary so they are out of scope of this answer.
In a combination of dove's and Konstatin's answers:
static IEnumerable<string> WholeChunks(string str, int chunkSize)
{
    for (int i = 0; i < str.Length; i += chunkSize)
        yield return str.Substring(i, chunkSize);
}
This will work for all strings that can be split into a whole number of chunks, and will throw an exception otherwise. If you want to support strings of any length you could use the following code:
static IEnumerable<string> ChunksUpto(string str, int maxChunkSize)
{
    for (int i = 0; i < str.Length; i += maxChunkSize)
        yield return str.Substring(i, Math.Min(maxChunkSize, str.Length - i));
}
However, the OP explicitly stated he does not need this; it's somewhat longer, harder to read, and slightly slower. In the spirit of KISS and YAGNI, I'd go with the first option: it's probably the most efficient implementation possible, and it's very short, readable, and, importantly, throws an exception for nonconforming input.
Why not loops? Here's something that would do it quite well:
string str = "111122223333444455";
int chunkSize = 4;
int stringLength = str.Length;
for (int i = 0; i < stringLength; i += chunkSize)
{
    if (i + chunkSize > stringLength) chunkSize = stringLength - i;
    Console.WriteLine(str.Substring(i, chunkSize));
}
Console.ReadLine();
I don't know how you'd want to deal with the case where the string length is not a multiple of 4. I'm not saying your idea is not possible, just wondering about the motivation for it when a simple for loop does the job very well. Obviously the above could be cleaned up and even turned into an extension method. Or, as mentioned in the comments, if you know the length is a multiple of 4:
str = "1111222233334444";
for (int i = 0; i < stringLength; i += chunkSize)
{
    Console.WriteLine(str.Substring(i, chunkSize));
}
This is based on @dove's solution but implemented as an extension method.
Benefits:
Extension method
Covers corner cases
Splits string with any chars: numbers, letters, other symbols
Code
public static class EnumerableEx
{
    public static IEnumerable<string> SplitBy(this string str, int chunkLength)
    {
        if (String.IsNullOrEmpty(str)) throw new ArgumentException();
        if (chunkLength < 1) throw new ArgumentException();
        for (int i = 0; i < str.Length; i += chunkLength)
        {
            if (chunkLength + i > str.Length)
                chunkLength = str.Length - i;
            yield return str.Substring(i, chunkLength);
        }
    }
}
Usage
var result = "bobjoecat".SplitBy(3); // bob, joe, cat
Unit tests removed for brevity (see previous revision)
Using regular expressions and LINQ:
List<string> groups = (from Match m in Regex.Matches(str, @"\d{4}") select m.Value).ToList();
I find this to be more readable, but it's just a personal opinion. It can also be a one-liner : ).
How's this for a one-liner?
List<string> result = new List<string>(Regex.Split(target, @"(?<=\G.{4})", RegexOptions.Singleline));
With this regex it doesn't matter if the last chunk is less than four characters, because it only ever looks at the characters behind it. I'm sure this isn't the most efficient solution, but I just had to toss it out there.
Starting with .NET 6, we can also use the Chunk method: var result = str .Chunk(4) .Select(x => new string(x)) .ToList();
I recently had to write something that accomplishes this at work, so I thought I would post my solution to this problem. As an added bonus, the functionality of this solution provides a way to split the string in the opposite direction and it does correctly handle unicode characters as previously mentioned by Marvin Pinto above. So, here it is: using System; using Extensions; namespace TestCSharp { class Program { static void Main(string[] args) { string asciiStr = "This is a string."; string unicodeStr = "これは文字列です。"; string[] array1 = asciiStr.Split(4); string[] array2 = asciiStr.Split(-4); string[] array3 = asciiStr.Split(7); string[] array4 = asciiStr.Split(-7); string[] array5 = unicodeStr.Split(5); string[] array6 = unicodeStr.Split(-5); } } } namespace Extensions { public static class StringExtensions { /// <summary>Returns a string array that contains the substrings in this string that are seperated a given fixed length.</summary> /// <param name="s">This string object.</param> /// <param name="length">Size of each substring. /// <para>CASE: length > 0 , RESULT: String is split from left to right.</para> /// <para>CASE: length == 0 , RESULT: String is returned as the only entry in the array.</para> /// <para>CASE: length < 0 , RESULT: String is split from right to left.</para> /// </param> /// <returns>String array that has been split into substrings of equal length.</returns> /// <example> /// <code> /// string s = "1234567890"; /// string[] a = s.Split(4); // a == { "1234", "5678", "90" } /// </code> /// </example> public static string[] Split(this string s, int length) { System.Globalization.StringInfo str = new System.Globalization.StringInfo(s); int lengthAbs = Math.Abs(length); if (str == null || str.LengthInTextElements == 0 || lengthAbs == 0 || str.LengthInTextElements <= lengthAbs) return new string[] { str.ToString() }; string[] array = new string[(str.LengthInTextElements % lengthAbs == 0 ? 
str.LengthInTextElements / lengthAbs: (str.LengthInTextElements / lengthAbs) + 1)]; if (length > 0) for (int iStr = 0, iArray = 0; iStr < str.LengthInTextElements && iArray < array.Length; iStr += lengthAbs, iArray++) array[iArray] = str.SubstringByTextElements(iStr, (str.LengthInTextElements - iStr < lengthAbs ? str.LengthInTextElements - iStr : lengthAbs)); else // if (length < 0) for (int iStr = str.LengthInTextElements - 1, iArray = array.Length - 1; iStr >= 0 && iArray >= 0; iStr -= lengthAbs, iArray--) array[iArray] = str.SubstringByTextElements((iStr - lengthAbs < 0 ? 0 : iStr - lengthAbs + 1), (iStr - lengthAbs < 0 ? iStr + 1 : lengthAbs)); return array; } } } Also, here is an image link to the results of running this code: http://i.imgur.com/16Iih.png
It's not pretty and it's not fast, but it works, it's a one-liner and it's LINQy: List<string> a = text.Select((c, i) => new { Char = c, Index = i }).GroupBy(o => o.Index / 4).Select(g => new String(g.Select(o => o.Char).ToArray())).ToList();
This should be much faster and more efficient than using LINQ or other approaches used here. public static IEnumerable<string> Splice(this string s, int spliceLength) { if (s == null) throw new ArgumentNullException("s"); if (spliceLength < 1) throw new ArgumentOutOfRangeException("spliceLength"); if (s.Length == 0) yield break; var start = 0; for (var end = spliceLength; end < s.Length; end += spliceLength) { yield return s.Substring(start, spliceLength); start = end; } yield return s.Substring(start); }
You can use morelinq by Jon Skeet. Use Batch like: string str = "1111222233334444"; int chunkSize = 4; var chunks = str.Batch(chunkSize).Select(r => new String(r.ToArray())); This will return 4 chunks for the string "1111222233334444". If the string length is less than or equal to the chunk size Batch will return the string as the only element of IEnumerable<string> For output: foreach (var chunk in chunks) { Console.WriteLine(chunk); } and it will give: 1111 2222 3333 4444
Personally I prefer my solution :-) It handles:
String lengths that are a multiple of the chunk size.
String lengths that are NOT a multiple of the chunk size.
String lengths that are smaller than the chunk size.
NULL and empty strings (throws an exception).
Chunk sizes smaller than 1 (throws an exception).
It is implemented as an extension method, and it calculates the number of chunks it is going to generate beforehand. It checks the last chunk because, when the text length is not a multiple of the chunk size, that chunk needs to be shorter. Clean, short, easy to understand... and works!
public static string[] Split(this string value, int chunkSize)
{
    if (string.IsNullOrEmpty(value)) throw new ArgumentException("The string cannot be null or empty.");
    if (chunkSize < 1) throw new ArgumentException("The chunk size should be equal or greater than one.");
    int remainder;
    int divResult = Math.DivRem(value.Length, chunkSize, out remainder);
    int numberOfChunks = remainder > 0 ? divResult + 1 : divResult;
    var result = new string[numberOfChunks];
    int i = 0;
    while (i < numberOfChunks - 1)
    {
        result[i] = value.Substring(i * chunkSize, chunkSize);
        i++;
    }
    int lastChunkSize = remainder > 0 ? remainder : chunkSize;
    result[i] = value.Substring(i * chunkSize, lastChunkSize);
    return result;
}
Simple and short:
// this means match a space or not a space (anything) up to 4 characters
var lines = Regex.Matches(str, @"[\s\S]{0,4}").Cast<Match>().Select(x => x.Value);
I know question is years old, but here is a Rx implementation. It handles the length % chunkSize != 0 problem out of the box: public static IEnumerable<string> Chunkify(this string input, int size) { if(size < 1) throw new ArgumentException("size must be greater than 0"); return input.ToCharArray() .ToObservable() .Buffer(size) .Select(x => new string(x.ToArray())) .ToEnumerable(); }
public static IEnumerable<IEnumerable<T>> SplitEvery<T>(this IEnumerable<T> values, int n) { var ls = values.Take(n); var rs = values.Skip(n); return ls.Any() ? Cons(ls, SplitEvery(rs, n)) : Enumerable.Empty<IEnumerable<T>>(); } public static IEnumerable<T> Cons<T>(T x, IEnumerable<T> xs) { yield return x; foreach (var xi in xs) yield return xi; }
Best , Easiest and Generic Answer :). string originalString = "1111222233334444"; List<string> test = new List<string>(); int chunkSize = 4; // change 4 with the size of strings you want. for (int i = 0; i < originalString.Length; i = i + chunkSize) { if (originalString.Length - i >= chunkSize) test.Add(originalString.Substring(i, chunkSize)); else test.Add(originalString.Substring(i,((originalString.Length - i)))); }
static IEnumerable<string> Split(string str, int chunkSize)
{
    IEnumerable<string> retVal = Enumerable.Range(0, str.Length / chunkSize)
        .Select(i => str.Substring(i * chunkSize, chunkSize));
    // Append the shorter final chunk when the length is not a multiple of chunkSize
    if (str.Length % chunkSize > 0)
        retVal = retVal.Append(str.Substring(str.Length / chunkSize * chunkSize, str.Length % chunkSize));
    return retVal;
}
It correctly handles input string length not divisible by chunkSize. Please note that additional code might be required to gracefully handle edge cases (null or empty input string, chunkSize == 0).
static IEnumerable<string> Split(string str, double chunkSize) { return Enumerable.Range(0, (int) Math.Ceiling(str.Length/chunkSize)) .Select(i => new string(str .Skip(i * (int)chunkSize) .Take((int)chunkSize) .ToArray())); } and another approach: using System; using System.Collections.Generic; using System.Linq; public class Program { public static void Main() { var x = "Hello World"; foreach(var i in x.ChunkString(2)) Console.WriteLine(i); } } public static class Ext{ public static IEnumerable<string> ChunkString(this string val, int chunkSize){ return val.Select((x,i) => new {Index = i, Value = x}) .GroupBy(x => x.Index/chunkSize, x => x.Value) .Select(x => string.Join("",x)); } }
Six years later o_O Just because public static IEnumerable<string> Split(this string str, int chunkSize, bool remainingInFront) { var count = (int) Math.Ceiling(str.Length/(double) chunkSize); Func<int, int> start = index => remainingInFront ? str.Length - (count - index)*chunkSize : index*chunkSize; Func<int, int> end = index => Math.Min(str.Length - Math.Max(start(index), 0), Math.Min(start(index) + chunkSize - Math.Max(start(index), 0), chunkSize)); return Enumerable.Range(0, count).Select(i => str.Substring(Math.Max(start(i), 0),end(i))); } or private static Func<bool, int, int, int, int, int> start = (remainingInFront, length, count, index, size) => remainingInFront ? length - (count - index) * size : index * size; private static Func<bool, int, int, int, int, int, int> end = (remainingInFront, length, count, index, size, start) => Math.Min(length - Math.Max(start, 0), Math.Min(start + size - Math.Max(start, 0), size)); public static IEnumerable<string> Split(this string str, int chunkSize, bool remainingInFront) { var count = (int)Math.Ceiling(str.Length / (double)chunkSize); return Enumerable.Range(0, count).Select(i => str.Substring( Math.Max(start(remainingInFront, str.Length, count, i, chunkSize), 0), end(remainingInFront, str.Length, count, i, chunkSize, start(remainingInFront, str.Length, count, i, chunkSize)) )); } AFAIK all edge cases are handled. Console.WriteLine(string.Join(" ", "abc".Split(2, false))); // ab c Console.WriteLine(string.Join(" ", "abc".Split(2, true))); // a bc Console.WriteLine(string.Join(" ", "a".Split(2, true))); // a Console.WriteLine(string.Join(" ", "a".Split(2, false))); // a
List<string> SplitString(int chunk, string input) { List<string> list = new List<string>(); int cycles = input.Length / chunk; if (input.Length % chunk != 0) cycles++; for (int i = 0; i < cycles; i++) { try { list.Add(input.Substring(i * chunk, chunk)); } catch { list.Add(input.Substring(i * chunk)); } } return list; }
I took this to another level. Chunking is an easy one-liner, but in my case I needed whole words as well. Figured I would post it, just in case someone else needs something similar. static IEnumerable<string> Split(string orgString, int chunkSize, bool wholeWords = true) { if (wholeWords) { List<string> result = new List<string>(); StringBuilder sb = new StringBuilder(); if (orgString.Length > chunkSize) { string[] newSplit = orgString.Split(' '); foreach (string str in newSplit) { if (sb.Length != 0) sb.Append(" "); if (sb.Length + str.Length > chunkSize) { result.Add(sb.ToString()); sb.Clear(); } sb.Append(str); } result.Add(sb.ToString()); } else result.Add(orgString); return result; } else return new List<string>(Regex.Split(orgString, @"(?<=\G.{" + chunkSize + "})", RegexOptions.Singleline)); } Results based on below comment: string msg = "336699AABBCCDDEEFF"; foreach (string newMsg in Split(msg, 2, false)) { Console.WriteLine($">>{newMsg}<<"); } Console.ReadKey(); Results: >>33<< >>66<< >>99<< >>AA<< >>BB<< >>CC<< >>DD<< >>EE<< >>FF<< >><< Note the empty last entry: the pattern also matches at the very end of the string. Another way to pull it, skipping that last entry: List<string> splitData = (List<string>)Split(msg, 2, false); for (int i = 0; i < splitData.Count - 1; i++) { Console.WriteLine($">>{splitData[i]}<<"); } Console.ReadKey(); New Results: >>33<< >>66<< >>99<< >>AA<< >>BB<< >>CC<< >>DD<< >>EE<< >>FF<<
An important tip if the string being chunked needs to support all Unicode characters: if the string may contain international characters like 𠀋, split it using the System.Globalization.StringInfo class. Using StringInfo, you can split the string based on the number of text elements. string internationalString = "𠀋"; The above string has a Length of 2, because the String.Length property returns the number of Char objects in this instance, not the number of Unicode characters.
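As a rough illustration of the StringInfo tip above, here is a minimal sketch that chunks by text elements instead of by Char. The names `TextElementChunker` and `ChunkByTextElements` are made up for this example; only `StringInfo.GetTextElementEnumerator` and `TextElementEnumerator.GetTextElement` come from the framework.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

public static class TextElementChunker
{
    // Hypothetical helper (not from the answers above): yields chunks of
    // chunkSize text elements rather than chunkSize UTF-16 code units.
    public static IEnumerable<string> ChunkByTextElements(string value, int chunkSize)
    {
        if (value == null) throw new ArgumentNullException(nameof(value));
        if (chunkSize < 1) throw new ArgumentOutOfRangeException(nameof(chunkSize));
        return Iterate(value, chunkSize);
    }

    private static IEnumerable<string> Iterate(string value, int chunkSize)
    {
        var current = new StringBuilder();
        int count = 0;
        TextElementEnumerator elements = StringInfo.GetTextElementEnumerator(value);
        while (elements.MoveNext())
        {
            // GetTextElement returns the whole element, e.g. both halves
            // of a surrogate pair, as one string.
            current.Append(elements.GetTextElement());
            if (++count == chunkSize)
            {
                yield return current.ToString();
                current.Clear();
                count = 0;
            }
        }
        // Emit the ragged final chunk, if any.
        if (count > 0) yield return current.ToString();
    }
}
```

With this sketch a surrogate pair such as 𠀋 is never cut in half, at the cost of chunks whose `Length` can exceed `chunkSize`.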
Changed slightly to also return the last part when its size is not equal to chunkSize: public static IEnumerable<string> Split(this string str, int chunkSize) { var splits = new List<string>(); if (str.Length == 0) { return splits; } if (str.Length < chunkSize) { chunkSize = str.Length; } splits.AddRange(Enumerable.Range(0, str.Length / chunkSize).Select(i => str.Substring(i * chunkSize, chunkSize))); if (str.Length % chunkSize > 0) { splits.Add(str.Substring((str.Length / chunkSize) * chunkSize)); } return splits; }
I think this is a straightforward answer: public static IEnumerable<string> Split(this string str, int chunkSize) { if(string.IsNullOrEmpty(str) || chunkSize<1) throw new ArgumentException("String can not be null or empty and chunk size should be greater than zero."); var chunkCount = str.Length / chunkSize + (str.Length % chunkSize != 0 ? 1 : 0); for (var i = 0; i < chunkCount; i++) { var startIndex = i * chunkSize; if (startIndex + chunkSize >= str.Length) yield return str.Substring(startIndex); else yield return str.Substring(startIndex, chunkSize); } } And it covers edge cases.
static List<string> GetChunks(string value, int chunkLength) { var res = new List<string>(); int count = (value.Length / chunkLength) + (value.Length % chunkLength > 0 ? 1 : 0); Enumerable.Range(0, count).ToList().ForEach(f => res.Add(value.Skip(f * chunkLength).Take(chunkLength).Select(z => z.ToString()).Aggregate((a,b) => a+b))); return res; }
Here's my 2 cents: IEnumerable<string> Split(string str, int chunkSize) { while (!string.IsNullOrEmpty(str)) { var chunk = str.Take(chunkSize).ToArray(); str = str.Substring(chunk.Length); yield return new string(chunk); } }//Split
I've built slightly on João's solution. What I've done differently is that in my method you can specify whether you want the remaining characters returned or truncated when the end of the string does not fill a whole chunk. I think it's pretty flexible and the code is fairly straightforward: using System; using System.Linq; using System.Text.RegularExpressions; namespace SplitFunction { class Program { static void Main(string[] args) { string text = "hello, how are you doing today?"; string[] chunks = SplitIntoChunks(text, 3,false); if (chunks != null) { chunks.ToList().ForEach(e => Console.WriteLine(e)); } Console.ReadKey(); } private static string[] SplitIntoChunks(string text, int chunkSize, bool truncateRemaining) { string chunk = chunkSize.ToString(); string pattern = truncateRemaining ? ".{" + chunk + "}" : ".{1," + chunk + "}"; string[] chunks = null; if (chunkSize > 0 && !String.IsNullOrEmpty(text)) chunks = (from Match m in Regex.Matches(text,pattern)select m.Value).ToArray(); return chunks; } } }
public static List<string> SplitByMaxLength(this string str) { List<string> splitString = new List<string>(); for (int index = 0; index < str.Length; index += MaxLength) { splitString.Add(str.Substring(index, Math.Min(MaxLength, str.Length - index))); } return splitString; }
I can't remember who gave me this, but it works great. I speed tested a number of ways to break Enumerable types into groups. The usage would just be like this... List<string> Divided = Source3.Chunk(24).Select(Piece => string.Concat<char>(Piece)).ToList(); The extention code would look like this... #region Chunk Logic private class ChunkedEnumerable<T> : IEnumerable<T> { class ChildEnumerator : IEnumerator<T> { ChunkedEnumerable<T> parent; int position; bool done = false; T current; public ChildEnumerator(ChunkedEnumerable<T> parent) { this.parent = parent; position = -1; parent.wrapper.AddRef(); } public T Current { get { if (position == -1 || done) { throw new InvalidOperationException(); } return current; } } public void Dispose() { if (!done) { done = true; parent.wrapper.RemoveRef(); } } object System.Collections.IEnumerator.Current { get { return Current; } } public bool MoveNext() { position++; if (position + 1 > parent.chunkSize) { done = true; } if (!done) { done = !parent.wrapper.Get(position + parent.start, out current); } return !done; } public void Reset() { // per http://msdn.microsoft.com/en-us/library/system.collections.ienumerator.reset.aspx throw new NotSupportedException(); } } EnumeratorWrapper<T> wrapper; int chunkSize; int start; public ChunkedEnumerable(EnumeratorWrapper<T> wrapper, int chunkSize, int start) { this.wrapper = wrapper; this.chunkSize = chunkSize; this.start = start; } public IEnumerator<T> GetEnumerator() { return new ChildEnumerator(this); } System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator() { return GetEnumerator(); } } private class EnumeratorWrapper<T> { public EnumeratorWrapper(IEnumerable<T> source) { SourceEumerable = source; } IEnumerable<T> SourceEumerable { get; set; } Enumeration currentEnumeration; class Enumeration { public IEnumerator<T> Source { get; set; } public int Position { get; set; } public bool AtEnd { get; set; } } public bool Get(int pos, out T item) { if (currentEnumeration 
!= null && currentEnumeration.Position > pos) { currentEnumeration.Source.Dispose(); currentEnumeration = null; } if (currentEnumeration == null) { currentEnumeration = new Enumeration { Position = -1, Source = SourceEumerable.GetEnumerator(), AtEnd = false }; } item = default(T); if (currentEnumeration.AtEnd) { return false; } while (currentEnumeration.Position < pos) { currentEnumeration.AtEnd = !currentEnumeration.Source.MoveNext(); currentEnumeration.Position++; if (currentEnumeration.AtEnd) { return false; } } item = currentEnumeration.Source.Current; return true; } int refs = 0; // needed for dispose semantics public void AddRef() { refs++; } public void RemoveRef() { refs--; if (refs == 0 && currentEnumeration != null) { var copy = currentEnumeration; currentEnumeration = null; copy.Source.Dispose(); } } } /// <summary>Speed Checked. Works Great!</summary> public static IEnumerable<IEnumerable<T>> Chunk<T>(this IEnumerable<T> source, int chunksize) { if (chunksize < 1) throw new InvalidOperationException(); var wrapper = new EnumeratorWrapper<T>(source); int currentPos = 0; T ignore; try { wrapper.AddRef(); while (wrapper.Get(currentPos, out ignore)) { yield return new ChunkedEnumerable<T>(wrapper, chunksize, currentPos); currentPos += chunksize; } } finally { wrapper.RemoveRef(); } } #endregion
class StringHelper { static void Main(string[] args) { string str = "Hi my name is vikas bansal and my email id is bansal.vks#gmail.com"; int offSet = 10; List<string> chunks = chunkMyStr(str, offSet); Console.Read(); } static List<string> chunkMyStr(string str, int offSet) { List<string> resultChunks = new List<string>(); for (int i = 0; i < str.Length; i += offSet) { string temp = str.Substring(i, (str.Length - i) > offSet ? offSet : (str.Length - i)); Console.WriteLine(temp); resultChunks.Add(temp); } return resultChunks; } }
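For readers on .NET 6 or later, the framework ships its own `Enumerable.Chunk`, which covers the same ground as most of the hand-written versions above. A minimal sketch follows; the wrapper names `BuiltInChunkExample` and `ChunkString` are only for illustration, and note that a custom `Chunk` extension with the same signature, like the one in an earlier answer, could make the call ambiguous if both are in scope.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BuiltInChunkExample
{
    // Hypothetical wrapper around the built-in Enumerable.Chunk (.NET 6+).
    public static List<string> ChunkString(string value, int chunkSize)
    {
        // Chunk yields char[] arrays; the last one may be shorter than chunkSize.
        return value.Chunk(chunkSize)
                    .Select(chars => new string(chars))
                    .ToList();
    }
}
```

Like the loop versions above, this counts Char values, so it can still split a surrogate pair.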
c# - Binary Search Algorithm for Custom Class String List
Basically I need help adapting my Binary Search algorithm to work with my string list as seen below. Note, I have to use a hand-written Binary Search algorithm, with no use of built-in C# functions like .BinarySearch. I will now show you how the list is formatted and the list itself: // This class formats the list, might be useful to know public class Demo { public string Col; public string S1; public string S2; public string S3; public override string ToString() { return string.Format("Col: {0}, S1: {1}, S2: {2}, S3: {3}", Col, S1, S2, S3); } } // The list itself var list = new List<Demo> { new Demo {Col = "Blue", S1 ="88", S2 ="Yes"}, new Demo {Col = "Green", S1 ="43", S2 ="Yes"}, new Demo {Col = "Red", S1 ="216", S2 ="No"}, new Demo {Col = "Yellow", S1 ="100", S2 ="No"} }; The list is already sorted into alphabetical order of the 'Col' string values, hence why Blue is first and Yellow is last. The 'Col' is the part of the list that needs to be searched. Below I have inserted my current Binary Search that can search int arrays. public static int BinarySearch_R(int key, int[] array, int low, int high) { if (low > high) return -1; int mid = (low + high) / 2; if (key == array[mid]) { return mid; } if (key < array[mid]) { return BinarySearch_R(key, array, low, mid - 1); } else { return BinarySearch_R(key, array, mid + 1, high); } } I need help adapting my Binary Search algorithm to work for the list above. If you guys have any questions, or need to see more of my code, just ask.
Concrete answer: Adapting your method for the specific case is quite easy. Let's first update your existing method to use a more general method (IComparable<T>.CompareTo) for comparing rather than the int operators: public static int BinarySearch_R(int key, int[] array, int low, int high) { if (low > high) return -1; int mid = (low + high) / 2; int compare = key.CompareTo(array[mid]); if (compare == 0) { return mid; } if (compare < 0) { return BinarySearch_R(key, array, low, mid - 1); } else { return BinarySearch_R(key, array, mid + 1, high); } } Then all you need is to copy/paste the above method, replace int key with string key, int[] array with List<Demo> array and array[mid] with array[mid].Col: public static int BinarySearch_R(string key, List<Demo> array, int low, int high) { if (low > high) return -1; int mid = (low + high) / 2; int compare = key.CompareTo(array[mid].Col); if (compare == 0) { return mid; } if (compare < 0) { return BinarySearch_R(key, array, low, mid - 1); } else { return BinarySearch_R(key, array, mid + 1, high); } } Extended answer: While you can do the above, it will require you to do the same for any other property/class that needs such a capability. A much better approach would be to generalize the code.
For instance, int[] and List<Demo> can be generalized as IReadOnlyList<T>, int/string key as TKey key, Demo.Col as Func<T, TKey>, CompareTo as IComparer<TKey>.Compare, so the final generic method could be like this: public static class MyAlgorithms { public static int BinarySearch<T, TKey>(this IReadOnlyList<T> source, Func<T, TKey> keySelector, TKey key, IComparer<TKey> keyComparer = null) { return source.BinarySearch(0, source.Count, keySelector, key, keyComparer); } public static int BinarySearch<T, TKey>(this IReadOnlyList<T> source, int start, int count, Func<T, TKey> keySelector, TKey key, IComparer<TKey> keyComparer = null) { // Argument validations skipped if (keyComparer == null) keyComparer = Comparer<TKey>.Default; int lo = start, hi = start + count - 1; while (lo <= hi) { int mid = lo + (hi - lo) / 2; int compare = keyComparer.Compare(key, keySelector(source[mid])); if (compare < 0) hi = mid - 1; else if (compare > 0) lo = mid + 1; else return mid; } return -1; } } Now you can use that single method for any data structure. For instance, searching your List<Demo> by Col would be like this: int index = list.BinarySearch(e => e.Col, "Red");
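One detail worth keeping in mind with the string versions above: `string.CompareTo` is culture-sensitive, so the list has to be sorted under the same comparison that the search uses. The sketch below assumes the list is sorted ordinally (as the Blue/Green/Red/Yellow sample happens to be) and uses `string.CompareOrdinal`; the class name `OrdinalSearch` and method name `BinarySearchByCol` are invented for this example, and `Demo` is trimmed to the fields it needs.

```csharp
using System;
using System.Collections.Generic;

public class Demo
{
    public string Col;
    public string S1;
    public string S2;
}

public static class OrdinalSearch
{
    // Hypothetical iterative variant: assumes items is sorted by Col
    // using ordinal comparison. Returns the index, or -1 if not found.
    public static int BinarySearchByCol(List<Demo> items, string key)
    {
        int lo = 0, hi = items.Count - 1;
        while (lo <= hi)
        {
            int mid = lo + (hi - lo) / 2; // avoids overflow of (lo + hi)
            int compare = string.CompareOrdinal(key, items[mid].Col);
            if (compare == 0) return mid;
            if (compare < 0) hi = mid - 1;
            else lo = mid + 1;
        }
        return -1;
    }
}
```

If the list is sorted with a culture-aware comparer instead, the same comparer should be passed to the search (for example through the `keyComparer` parameter of the generic method above).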
I've only done the most basic things in C#, so this might be completely useless. I had an assignment for a CS 2 class that sounds somewhat similar to what you want, but we used Java. So I'm going to assume you want your list of items sorted by some keyword ("Blue", "Green", etc.). I used a LinkedList but it doesn't matter. class Node { String keyword; LinkedList<String> records = new LinkedList<>(); Node left; Node right; public Node(String keyword, LinkedList<String> records) { this.keyword = keyword; this.records = records; } } Now, the only real difference I can see between a BST sorted by a string and one sorted by numbers is that you need some kind of comparison method to tell whether one word comes before or after another alphabetically. So here's how I did the insert function: /** * insert node * @param keyword compare it to other strings */ public void insert(String keyword, LinkedList<String> records) { //create a new Node Node n = new Node(keyword, records); int result; Node current = root; Node parent = null; //cont. until NULL while (current != null) { result = current.keyword.compareTo(n.keyword); if (result == 0) return; else if (result > 0) { parent = current; current = current.left; } else if (result < 0) { parent = current; current = current.right; } } if (parent == null) root = n; else { result = parent.keyword.compareTo(n.keyword); if (result > 0) parent.left = n; else if (result < 0) parent.right = n; } } The method "compareTo(...)" returns a positive number if the string comes later in the alphabet, 0 if the strings are the same, and a negative number if it comes earlier. So if I'm at all close to understanding what you're asking, I would get the C# version of this method and implement the BST as you normally would.
Just make the class implement IComparable<Demo> and write a custom CompareTo() method. Standard methods like Sort will work automatically once the class implements the interface. public class Demo : IComparable<Demo> { public string Color; public int value; public Boolean truth; public int CompareTo(Demo other) { int results = 0; if (this.Color == other.Color) { if (this.value == other.value) { results = this.truth.CompareTo(other.truth); } else { results = this.value.CompareTo(other.value); } } else { results = this.Color.CompareTo(other.Color); } return results; } }
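To show how the comparable approach plugs into the standard library, here is a small sketch using the generic `IComparable<T>` interface. `ColorEntry` and `SortExample` are hypothetical names for this illustration, not part of the question's code, and the ordinal comparison is an assumption about the ordering wanted.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical class for illustration: ordered by Color, then by Value.
public class ColorEntry : IComparable<ColorEntry>
{
    public string Color;
    public int Value;

    public int CompareTo(ColorEntry other)
    {
        if (other == null) return 1; // by convention, any instance sorts after null
        int byColor = string.CompareOrdinal(Color, other.Color);
        return byColor != 0 ? byColor : Value.CompareTo(other.Value);
    }
}

public static class SortExample
{
    public static List<ColorEntry> Sorted(IEnumerable<ColorEntry> entries)
    {
        var list = new List<ColorEntry>(entries);
        list.Sort(); // picks up IComparable<ColorEntry> through Comparer<T>.Default
        return list;
    }
}
```

Note that this only makes the built-in `Sort` (and `List<T>.BinarySearch`) usable; the question itself rules out the built-in search, so a hand-written search would still call `CompareTo` directly.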
Build up a string of permutations in an array
I have a sanctions API that I need to call, passing in a string of values. These values are constructed as follows: string searchString = string.Join(" ", myList.ToArray()); // remove any numbers and return complete words MatchCollection strMatch = Regex.Matches(searchString, @"[^\W\d]+"); var values = strMatch.Cast<Group>().Select(g => g.Value).ToArray(); var combinations = values.Permutations(); Now that I have the array I need, I call the Permutations method below: public static IEnumerable<IEnumerable<T>> Permutations<T>(this IEnumerable<T> source) { if (source == null) throw new ArgumentException("source"); return permutations(source.ToArray()); } The permutations method is: private static IEnumerable<IEnumerable<T>> permutations<T>(IEnumerable<T> source) { var c = source.Count(); if (c == 1) yield return source; else for (int i = 0; i < c; i++) foreach (var p in permutations(source.Take(i).Concat(source.Skip(i + 1)))) yield return source.Skip(i).Take(1).Concat(p); } With an example list of 7 items {one,two,three,four,five,six,seven} this code returns numerous lists of 7 elements in length. What I need to create is the following: First iteration: return result = one. Second iteration: return result = one + ' ' + two, and so on. I got the above example code from a post on SO, so I don't know how to change it properly to get what I need.
So do I get right that not only you want all permutations of the 7 items, but also any subsets of them enumerated (something like all combinations)? I guess the simplest way to get that behaviour would be adding some sort of length-parameter to the permutations method: private static IEnumerable<IEnumerable<T>> permutations<T>(IEnumerable<T> source, int length) { var c = source.Count(); if (length == 1 || c == 1) foreach(var x in source) yield return new T[] { x }; else for (int i = 0; i < c; i++) foreach (var p in permutations(source.Take(i).Concat(source.Skip(i + 1)), length - 1)) yield return source.Skip(i).Take(1).Concat(p); } and then calling this method with parameters from 1 to n: public static IEnumerable<IEnumerable<T>> Permutations<T>(this IEnumerable<T> source) { if (source == null) throw new ArgumentException("source"); var src = source.ToArray(); for (int i = 1; i <= src.Length; i++) foreach (var result in permutations(src, i)) yield return result; } Hope I didn't make any typos...
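If the question is read literally ("one", then "one two", and so on), permutations may not be needed at all and growing prefixes would do. A minimal sketch under that assumption; `PrefixExtensions` and `CumulativePrefixes` are names made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PrefixExtensions
{
    // Hypothetical helper: yields "one", "one two", "one two three", ...
    public static IEnumerable<string> CumulativePrefixes(this IEnumerable<string> words)
    {
        if (words == null) throw new ArgumentNullException(nameof(words));
        var items = words.ToArray(); // materialize once so Take(n) stays cheap
        return Enumerable.Range(1, items.Length)
                         .Select(n => string.Join(" ", items.Take(n)));
    }
}
```

For n words this produces n strings, whereas the length-limited permutations above grow factorially, which matters if each result triggers an API call.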
What does ParallelQuery's Count count?
I'm testing a self-written element generator (ICollection<string>) and comparing the calculated count to the actual count to get an idea of whether there's an error in my algorithm. As this generator can generate lots of elements on demand, I'm looking into Partitioner<string>, and I have implemented a basic one which seems to produce valid enumerators that together give the same number of strings as calculated. Now I want to test how this behaves when run in parallel (again first testing for the correct count): MyGenerator generator = new MyGenerator(); MyPartitioner partitioner = new MyPartitioner(generator); int isCount = partitioner.AsParallel().Count(); int shouldCount = generator.Count; bool same = isCount == shouldCount; // false I don't get why this count is not equal! What is the ParallelQuery<string> doing? generator.Count() == generator.Count // true partitioner.GetPartitions(xyz).Select(enumerator => { int count = 0; while (enumerator.MoveNext()) { count++; } return count; }).Sum() == generator.Count // true So, I'm currently not seeing an error in my code. Next I tried to manually count that ParallelQuery<string>: int count = 0; partitioner.AsParallel().ForAll(e => Interlocked.Increment(ref count)); count == generator.Count // true Summed up: everything else counts my enumerable correctly, and ParallelQuery.ForAll enumerates exactly generator.Count elements. But what does ParallelQuery.Count() do? If the correct count is about 10k, ParallelQuery sees 40k.
internal sealed class PartialWordEnumerator : IEnumerator<string> { private object sync = new object(); private readonly IEnumerable<char> characters; private readonly char[] limit; private char[] buffer; private IEnumerator<char>[] enumerators; private int position = 0; internal PartialWordEnumerator(IEnumerable<char> characters, char[] state, char[] limit) { this.characters = new List<char>(characters); this.buffer = (char[])state.Clone(); if (limit != null) { this.limit = (char[])limit.Clone(); } this.enumerators = new IEnumerator<char>[this.buffer.Length]; for (int i = 0; i < this.buffer.Length; i++) { this.enumerators[i] = SkipTo(state[i]); } } private IEnumerator<char> SkipTo(char c) { IEnumerator<char> first = this.characters.GetEnumerator(); IEnumerator<char> second = this.characters.GetEnumerator(); while (second.MoveNext()) { if (second.Current == c) { return first; } first.MoveNext(); } throw new InvalidOperationException(); } private bool ReachedLimit { get { if (this.limit == null) { return false; } for (int i = 0; i < this.buffer.Length; i++) { if (this.buffer[i] != this.limit[i]) { return false; } } return true; } } public string Current { get { if (this.buffer == null) { throw new ObjectDisposedException(typeof(PartialWordEnumerator).FullName); } return new string(this.buffer); } } object IEnumerator.Current { get { return this.Current; } } public bool MoveNext() { lock (this.sync) { if (this.position == this.buffer.Length) { this.position--; } if (this.position == -1) { return false; } IEnumerator<char> enumerator = this.enumerators[this.position]; if (enumerator.MoveNext()) { this.buffer[this.position] = enumerator.Current; this.position++; if (this.position == this.buffer.Length) { return !this.ReachedLimit; } else { return this.MoveNext(); } } else { this.enumerators[this.position] = this.characters.GetEnumerator(); this.position--; return this.MoveNext(); } } } public void Dispose() { this.position = -1; this.buffer = null; } public void 
Reset() { throw new NotSupportedException(); } } public override IList<IEnumerator<string>> GetPartitions(int partitionCount) { IEnumerator<string>[] enumerators = new IEnumerator<string>[partitionCount]; List<char> characters = new List<char>(this.generator.Characters); int length = this.generator.Length; int characterCount = this.generator.Characters.Count; int steps = Math.Min(characterCount, partitionCount); int skip = characterCount / steps; for (int i = 0; i < steps; i++) { char c = characters[i * skip]; char[] state = new string(c, length).ToCharArray(); char[] limit = null; if ((i + 1) * skip < characterCount) { c = characters[(i + 1) * skip]; limit = new string(c, length).ToCharArray(); } if (i == steps - 1) { limit = null; } enumerators[i] = new PartialWordEnumerator(characters, state, limit); } for (int i = steps; i < partitionCount; i++) { enumerators[i] = Enumerable.Empty<string>().GetEnumerator(); } return enumerators; }
EDIT: I believe I have found the solution. According to the documentation on IEnumerator.MoveNext (emphasis mine): If MoveNext passes the end of the collection, the enumerator is positioned after the last element in the collection and MoveNext returns false. When the enumerator is at this position, subsequent calls to MoveNext also return false until Reset is called. According to the following logic: private bool ReachedLimit { get { if (this.limit == null) { return false; } for (int i = 0; i < this.buffer.Length; i++) { if (this.buffer[i] != this.limit[i]) { return false; } } return true; } } The call to MoveNext() will return false only one time - when the buffer is exactly equal to the limit. Once you have passed the limit, ReachedLimit becomes false again, making return !this.ReachedLimit return true, so the enumerator will continue past the limit all the way until it runs out of characters to enumerate. Apparently, in the implementation of ParallelQuery.Count(), MoveNext() is called multiple times when it has reached the end, and since it starts to return true again, the enumerator happily continues returning more elements (this is not the case in your custom code that walks the enumerator manually, and apparently also is not the case for the ForAll call, so they "accidentally" return the correct results). The simplest fix for this is to remember the return value from MoveNext() once it becomes false: private bool _canMoveNext = true; public bool MoveNext() { if (!_canMoveNext) return false; ... if (this.position == this.buffer.Length) { if (this.ReachedLimit) _canMoveNext = false; ... } Now once it begins returning false, it will return false for every future call, and this returns the correct result from AsParallel().Count(). Hope this helps! The documentation on Partitioner notes (emphasis mine): The static methods on Partitioner are all thread-safe and may be used concurrently from multiple threads.
However, while a created partitioner is in use, the underlying data source should not be modified, whether from the same thread that is using a partitioner or from a separate thread. From what I can understand of the code you have given, it would seem that ParallelQuery.Count() is most likely to have thread-safety issues because it may possibly be iterating multiple enumerators at the same time, whereas all the other solutions would require the enumerators to be run synchronized. Without seeing the code you are using for MyGenerator and MyPartitioner is it difficult to determine if thread-safety issues could be the culprit. To demonstrate, I have written a simple enumerator that returns the first hundred numbers as strings. Also, I have a partitioner, that distributes the elements in the underlying enumerator over a collection of numPartitions separate lists. Using all the methods you described above on our 12-core server (when I output numPartitions, it uses 12 by default on this machine), I get the expected result of 100 (this is LINQPad-ready code): void Main() { var partitioner = new SimplePartitioner(GetEnumerator()); GetEnumerator().Count().Dump(); partitioner.GetPartitions(10).Select(enumerator => { int count = 0; while (enumerator.MoveNext()) { count++; } return count; }).Sum().Dump(); var theCount = 0; partitioner.AsParallel().ForAll(e => Interlocked.Increment(ref theCount)); theCount.Dump(); partitioner.AsParallel().Count().Dump(); } // Define other methods and classes here public IEnumerable<string> GetEnumerator() { for (var i = 1; i <= 100; i++) yield return i.ToString(); } public class SimplePartitioner : Partitioner<string> { private IEnumerable<string> input; public SimplePartitioner(IEnumerable<string> input) { this.input = input; } public override IList<IEnumerator<string>> GetPartitions(int numPartitions) { var list = new List<string>[numPartitions]; for (var i = 0; i < numPartitions; i++) list[i] = new List<string>(); var index = 0; foreach (var 
s in input) list[(index = (index + 1) % numPartitions)].Add(s); IList<IEnumerator<string>> result = new List<IEnumerator<string>>(); foreach (var l in list) result.Add(l.GetEnumerator()); return result; } } Output: 100 100 100 100 This clearly works. Without more information it is impossible to tell you what is not working in your particular implementation.
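The "remember that MoveNext has returned false" fix from the EDIT above can also be sketched as a generic wrapper, which is handy for checking whether an enumerator violates the contract. `LatchingEnumerator<T>` and `FlakyEnumerator` are hypothetical names for this illustration; `FlakyEnumerator` deliberately misbehaves by returning false once and then true again. The wrapper is not thread-safe and assumes one consumer per enumerator.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical wrapper: once the inner enumerator reports the end,
// every later MoveNext call keeps returning false.
public sealed class LatchingEnumerator<T> : IEnumerator<T>
{
    private readonly IEnumerator<T> inner;
    private bool finished;

    public LatchingEnumerator(IEnumerator<T> inner)
    {
        if (inner == null) throw new ArgumentNullException(nameof(inner));
        this.inner = inner;
    }

    public T Current { get { return inner.Current; } }
    object IEnumerator.Current { get { return Current; } }

    public bool MoveNext()
    {
        if (finished) return false;
        if (inner.MoveNext()) return true;
        finished = true; // latch: never ask the inner enumerator again
        return false;
    }

    public void Reset() { throw new NotSupportedException(); }
    public void Dispose() { inner.Dispose(); }
}

// Deliberately broken enumerator for demonstration: the second call to
// MoveNext returns false, but later calls return true again.
public sealed class FlakyEnumerator : IEnumerator<int>
{
    private int calls;
    public int Current { get { return calls; } }
    object IEnumerator.Current { get { return Current; } }
    public bool MoveNext() { calls++; return calls != 2; }
    public void Reset() { throw new NotSupportedException(); }
    public void Dispose() { }
}
```

Wrapping each partition's enumerator this way inside `GetPartitions` would be one way to test the diagnosis without touching the enumerator's own logic.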
Obtain the Sum of an IEnumerable collection in only one LINQ expression
Let's suppose I have an infinite generator A(). What I want is to obtain the sum of all the numbers returned by A such that the sum does not exceed a value N, in only one LINQ expression. I'm wondering if there is an extension method that will help me with that? The classic way would be: int sum = 0; foreach (int x in A()) { sum += x; if (sum > N) { break; } } return sum; but I've been thinking about how to do it in only one expression without success...
Using standard idiomatic LINQ, this would be impossible. The semantics you would need is a combination of Aggregate() and TakeWhile(). Otherwise, you'd need to have side-effects which is a no-no in LINQ. Here's an example of one way to do it with side-effects: var temp = 0; var sum = A().TakeWhile(i => { var res = !(temp > N); temp += i; return res; }).Sum();
If A is an infinite generator then there's no way of doing this in a single statement using only the built-in LINQ methods. To do it cleanly, in a single statement, with no side-effects, you'd probably need to use some sort of Scan method to compute the prefix sum of the input sequence. Then you just need the first element greater than N. Easy! int sum = A().Scan((s, x) => s + x).First(s => s > N); // ... public static class EnumerableExtensions { public static IEnumerable<T> Scan<T>( this IEnumerable<T> source, Func<T, T, T> func) { if (source == null) throw new ArgumentNullException("source"); if (func == null) throw new ArgumentNullException("func"); using (var e = source.GetEnumerator()) { if (e.MoveNext()) { T accumulator = e.Current; yield return accumulator; while (e.MoveNext()) { accumulator = func(accumulator, e.Current); yield return accumulator; } } } } }
Sure there is a way to do this with a single LINQ-expression. The simplest I could come up with and still have some generality and elegance is: public static int SumWhile(this IEnumerable<int> collection, Func<int, bool> condition) { int sum = 0; foreach (int i in collection) { sum += i; if (!condition(sum)) break; } return sum; } which can be called like: int sum = A().SumWhile(i => i <= N); Yeah, just a single LINQ-expression! Have fun with it
Probably the closest to your initial idea: int sum = 0; int limit = 500; A().TakeWhile(i => (sum += i) < limit).Count(); // Now the variable named sum contains the smallest running sum that is >= limit. Count() isn't used for its return value but to force the actual enumeration.
Let's see if I have the requirements correct. A() is an infinite generator. By definition, then, it generates values (in this case integers) forever. You want to look for all of the values that are less than N, and add them together. Linq isn't the issue. You won't be done adding until A() is finished generating... and that never happens. BTW, the code you posted doesn't add up all of the values less than N... it adds up values until the running sum exceeds N, and then it quits looking. Is that what you meant?
I believe the horrific code below satisfies your requirements. :-) using System; using System.Collections.Generic; using System.Diagnostics; using System.Linq; namespace ConsoleApplication12 { public class Program { public static void Main(string[] args) { const int N=100; int sum; try { sum=A().Aggregate((self, next) => { if(self+next<=N) return self+next; else throw new ResultException(self); }); } catch(ResultException re) { sum=re.Value; } Debug.Print("Sum="+sum); } private class ResultException : Exception { public readonly int Value; public ResultException(int value) { Value=value; } } private static IEnumerable<int> A() { var i=0; while(true) { yield return i++; } } } }
int sum = A().Where(x => x < N).Sum();