Faster n-ary Cartesian product in C#

I have searched around for a way to find the Cartesian product of multiple lists, and I have used the popular answer that uses Aggregate + SelectMany. The trouble is that my example runs very slowly: I have 4 lists with 3K entries each, and I need to enumerate every possible combination.
Does anyone know a faster way in C#?
I made a fiddle here, which currently runs out of memory.
Here is the code from the fiddle:
public static void Main()
{
    var sources = new[]
    {
        Enumerable.Range(1, 3000),
        Enumerable.Range(1, 3000),
        Enumerable.Range(1, 3000),
        Enumerable.Range(1, 3000),
    };
    var sw = new System.Diagnostics.Stopwatch();
    sw.Start();
    Console.Write("linq way...");
    foreach (var l in NCartesian(sources))
    {
        // just enumerate
    }
    Console.WriteLine("{0}ms", sw.ElapsedMilliseconds);
}
public static IEnumerable<IEnumerable<T>> NCartesian<T>(
    IEnumerable<IEnumerable<T>> sequences)
{
    if (sequences == null)
    {
        return null;
    }
    IEnumerable<IEnumerable<T>> emptyProduct = new[] { Enumerable.Empty<T>() };
    return sequences.Aggregate(
        emptyProduct,
        (accumulator, sequence) => accumulator.SelectMany(
            accseq => sequence,
            (accseq, item) => accseq.Concat(new[] { item })));
}

I made one that uses less memory than the above, but it is still slow:
public static IEnumerable<IEnumerable<T>> NCartesian<T>(
    IEnumerable<IEnumerable<T>> sequences)
{
    if (sequences == null)
    {
        throw new ArgumentNullException(nameof(sequences));
    }
    var enumerators = new List<IEnumerator<T>>();
    foreach (IEnumerator<T> enumerator in sequences
        .Select(s => s.GetEnumerator()))
    {
        enumerator.MoveNext(); // move to the first position
        enumerators.Add(enumerator);
    }
    bool done = false;
    while (!done)
    {
        IList<T> result = enumerators.Select(e => e.Current).ToList();
        yield return result;
        for (int idx = enumerators.Count - 1; idx >= 0; idx--)
        {
            bool hasNext = enumerators[idx].MoveNext();
            if (hasNext)
            {
                break;
            }
            if (idx == 0)
            {
                // the first enumerator is done
                done = true;
                break;
            }
            // Note: Reset() is not supported by every enumerator (compiler-generated
            // iterators throw NotSupportedException), so this relies on sources whose
            // enumerators implement it.
            enumerators[idx].Reset();
            enumerators[idx].MoveNext();
        }
    }
}
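For what it's worth, the same odometer idea can be expressed with buffered arrays and integer indices instead of live enumerators, which avoids both the per-combination LINQ allocations and the Reset() call. The sketch below (NCartesianIndexed is just an illustrative name) assumes the sources fit in memory and reuses a single result array per step, so copy it if you need to keep a combination around:
public static IEnumerable<T[]> NCartesianIndexed<T>(IEnumerable<IEnumerable<T>> sequences)
{
    // Buffer each source once so it can be indexed repeatedly.
    var sources = sequences.Select(s => s.ToArray()).ToArray();
    if (sources.Length == 0 || sources.Any(s => s.Length == 0))
        yield break;

    var indices = new int[sources.Length]; // the "odometer"
    var current = new T[sources.Length];   // reused for every combination

    while (true)
    {
        for (int i = 0; i < sources.Length; i++)
            current[i] = sources[i][indices[i]];
        yield return current;

        // Advance the odometer from the rightmost position.
        int pos = sources.Length - 1;
        while (pos >= 0 && ++indices[pos] == sources[pos].Length)
        {
            indices[pos] = 0;
            pos--;
        }
        if (pos < 0)
            yield break; // every position wrapped around, so we're done
    }
}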

Related

Use Linq to break a list by special values?

I'm trying to use Linq to convert IEnumerable<int> to IEnumerable<List<int>> - the input stream will be separated by special value 0.
IEnumerable<List<int>> Parse(IEnumerable<int> l)
{
l.Select(x => {
.....; //?
return new List<int>();
});
}
var l = new List<int> {0,1,3,5,0,3,4,0,1,4,0};
Parse(l) // returns {{1,3,5}, {3, 4}, {1,4}}
How to implement it using Linq instead of imperative looping?
Or is Linq not good for this requirement because the logic depends on the order of the input stream?
A simple loop would be a good option.
Alternatives:
Enumerable.Aggregate, starting a new list on each 0
Write your own extension similar to Create batches in linq or Use LINQ to group a sequence of numbers with no gaps
Aggregate sample
var result = list.Aggregate(new List<List<int>>(),
    (sum, current) => {
        if (current == 0)
            sum.Add(new List<int>());
        else
            sum.Last().Add(current);
        return sum;
    });
Note: this is only a sample of the approach, and it works only for very friendly input like {0,1,2,0,3,4} (it assumes the input starts with the split value, and it will keep a trailing empty group if the input ends with one).
One could even aggregate into immutable lists, but that would look insane with basic .NET types.
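For less friendly input (a leading non-zero value, consecutive zeroes, or a trailing zero), a slightly more defensive variant of the same Aggregate idea could look like this; a sketch, not from the original answer:
var result = list
    .Aggregate(new List<List<int>>(),
        (sum, current) => {
            if (current == 0)
            {
                // start a new group only if the previous one has content
                if (sum.Count == 0 || sum.Last().Count > 0)
                    sum.Add(new List<int>());
            }
            else
            {
                if (sum.Count == 0)
                    sum.Add(new List<int>()); // input did not start with 0
                sum.Last().Add(current);
            }
            return sum;
        })
    .Where(g => g.Count > 0) // drop the empty group a trailing 0 leaves behind
    .ToList();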
Here's an answer that lazily enumerates the source enumerable but eagerly enumerates the contents of each returned list between zeroes. It properly throws on null input or on being given a list that does not start with a zero (though it lets an empty list through; that's really an implementation detail you have to decide on). It does not return an extra, empty list at the end like at least one other answer's suggestion does.
public static IEnumerable<List<int>> Parse(this IEnumerable<int> source, int splitValue = 0) {
    if (source == null) {
        throw new ArgumentNullException(nameof (source));
    }
    var enumerator = source.GetEnumerator();
    if (!enumerator.MoveNext()) {
        enumerator.Dispose();
        return Enumerable.Empty<List<int>>();
    }
    if (enumerator.Current != splitValue) {
        enumerator.Dispose();
        throw new ArgumentException($"Source enumerable must begin with a {splitValue}.", nameof (source));
    }
    return ParseImpl(enumerator, splitValue);
}
private static IEnumerable<List<int>> ParseImpl(IEnumerator<int> enumerator, int splitValue) {
    using (enumerator) {
        var list = new List<int>();
        while (enumerator.MoveNext()) {
            if (enumerator.Current == splitValue) {
                yield return list;
                list = new List<int>();
            }
            else {
                list.Add(enumerator.Current);
            }
        }
        if (list.Any()) {
            yield return list;
        }
    }
}
This could easily be adapted to be generic instead of int: just change Parse to Parse<T>, change int to T everywhere, and use a.Equals(b) or !a.Equals(b) instead of a == b or a != b.
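For illustration, a generic sketch along those lines, simplified so that it skips empty segments instead of validating that the input starts with the split value (and using EqualityComparer<T>.Default rather than a.Equals(b), so null values don't throw):
public static IEnumerable<List<T>> Parse<T>(this IEnumerable<T> source, T splitValue)
{
    if (source == null) {
        throw new ArgumentNullException(nameof (source));
    }
    var comparer = EqualityComparer<T>.Default;
    var list = new List<T>();
    foreach (var item in source) {
        if (comparer.Equals(item, splitValue)) {
            if (list.Any()) {
                yield return list;
            }
            list = new List<T>();
        }
        else {
            list.Add(item);
        }
    }
    if (list.Any()) {
        yield return list;
    }
}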
You could create an extension method like this:
public static IEnumerable<IEnumerable<T>> SplitBy<T>(this IEnumerable<T> source, T value)
{
using (var e = source.GetEnumerator())
{
if (e.MoveNext())
{
var list = new List<T> { };
//In case the source doesn't start with 0
if (!e.Current.Equals(value))
{
list.Add(e.Current);
}
while (e.MoveNext())
{
if ( !e.Current.Equals(value))
{
list.Add(e.Current);
}
else
{
yield return list;
list = new List<T> { };
}
}
//In case the source doesn't end with 0
if (list.Count>0)
{
yield return list;
}
}
}
}
Then, you can do the following:
var l = new List<int> { 0, 1, 3, 5, 0, 3, 4, 0, 1, 4, 0 };
var result = l.SplitBy(0);
You could use GroupBy with a counter.
var list = new List<int> {0,1,3,5,0,3,4,0,1,4,0};
int counter = 0;
var result = list.GroupBy(x => x==0 ? counter++ : counter)
.Select(g => g.TakeWhile(x => x!=0).ToList())
.Where(l => l.Any());
Edited to fix possibility of zeroes within numbers
Here is a semi-LINQ solution:
var l = new List<int> {0,1,3,5,0,3,4,0,1,4,0};
string
.Join(",", l.Select(x => x == 0 ? "|" : x.ToString()))
.Split(new[] { '|' }, StringSplitOptions.RemoveEmptyEntries)
.Select(x => x.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries));
This is probably not preferable to using a loop due to performance and other reasons, but it should work.

How to select last value from each run of similar items?

I have a list. I'd like to take the last value from each run of similar elements.
What do I mean? Let me give a simple example. Given the list of words
['golf', 'hip', 'hop', 'hotel', 'grass', 'world', 'wee']
And the similarity function 'starting with the same letter', the function would return the shorter list
['golf', 'hotel', 'grass', 'wee']
Why? The original list has a 1-run of G words, a 3-run of H words, a 1-run of G words, and a 2-run of W words. The function returns the last word from each run.
How can I do this?
Hypothetical C# syntax (in reality I'm working with customer objects but I wanted to share something you could run and test yourself)
> var words = new List<string>{"golf", "hip", "hop", "hotel", "grass", "world", "wee"};
> words.LastDistinct(x => x[0])
["golf", "hotel", "grass", "wee"]
Edit: I tried .GroupBy(x => x[0]).Select(g => g.Last()) but that gives ['grass', 'hotel', 'wee'], which is not what I want. Read the example carefully.
Edit. Another example.
['apples', 'armies', 'black', 'beer', 'bastion', 'cat', 'cart', 'able', 'art', 'bark']
Here there are 5 runs (a run of A's, a run of B's, a run of C's, a new run of A's, a new run of B's). The last word from each run would be:
['armies', 'bastion', 'cart', 'art', 'bark']
The important thing to understand is that each run is independent. Don't mix up the run of A's at the start with the run of A's near the end.
There's nothing too complicated with just doing it the old-fashioned way:
Func<string, object> groupingFunction = s => s.Substring(0, 1);
IEnumerable<string> input = new List<string>() {"golf", "hip", "..." };
var output = new List<string>();
if (!input.Any())
{
return output;
}
var lastItem = input.First();
var lastKey = groupingFunction(lastItem);
foreach (var currentItem in input.Skip(1))
{
var currentKey = groupingFunction(currentItem);
if (!currentKey.Equals(lastKey))
{
output.Add(lastItem);
}
lastKey = currentKey;
lastItem = currentItem;
}
output.Add(lastItem);
You could also turn this into a generic extension method as Tim Schmelter has done; I have already taken a couple steps to generalize the code on purpose (using object as the key type and IEnumerable<T> as the input type).
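A generic version of that loop might look like the sketch below (LastOfEachRun and keySelector are illustrative names, and EqualityComparer<TKey>.Default stands in for the Equals call so null keys don't throw):
public static IEnumerable<T> LastOfEachRun<T, TKey>(
    this IEnumerable<T> source, Func<T, TKey> keySelector)
{
    using (var e = source.GetEnumerator())
    {
        if (!e.MoveNext())
        {
            yield break;
        }
        var lastItem = e.Current;
        var lastKey = keySelector(lastItem);
        while (e.MoveNext())
        {
            var currentKey = keySelector(e.Current);
            if (!EqualityComparer<TKey>.Default.Equals(currentKey, lastKey))
            {
                yield return lastItem; // the run just ended; emit its last member
            }
            lastKey = currentKey;
            lastItem = e.Current;
        }
        yield return lastItem; // last member of the final run
    }
}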
You could use this extension that can group by adjacent/consecutive elements:
public static IEnumerable<IGrouping<TKey, TSource>> GroupAdjacent<TSource, TKey>(
this IEnumerable<TSource> source,
Func<TSource, TKey> keySelector)
{
TKey last = default(TKey);
bool haveLast = false;
List<TSource> list = new List<TSource>();
foreach (TSource s in source)
{
TKey k = keySelector(s);
if (haveLast)
{
if (!k.Equals(last))
{
yield return new GroupOfAdjacent<TSource, TKey>(list, last);
list = new List<TSource>();
list.Add(s);
last = k;
}
else
{
list.Add(s);
last = k;
}
}
else
{
list.Add(s);
last = k;
haveLast = true;
}
}
if (haveLast)
yield return new GroupOfAdjacent<TSource, TKey>(list, last);
}
public class GroupOfAdjacent<TSource, TKey> : IEnumerable<TSource>, IGrouping<TKey, TSource>
{
public TKey Key { get; set; }
private List<TSource> GroupList { get; set; }
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
return ((System.Collections.Generic.IEnumerable<TSource>)this).GetEnumerator();
}
System.Collections.Generic.IEnumerator<TSource> System.Collections.Generic.IEnumerable<TSource>.GetEnumerator()
{
foreach (var s in GroupList)
yield return s;
}
public GroupOfAdjacent(List<TSource> source, TKey key)
{
GroupList = source;
Key = key;
}
}
Then it's easy:
var words = new List<string>{"golf", "hip", "hop", "hotel", "grass", "world", "wee"};
IEnumerable<string> lastWordOfConsecutiveFirstCharGroups = words
.GroupAdjacent(str => str[0])
.Select(g => g.Last());
Output:
string.Join(",", lastWordOfConsecutiveFirstCharGroups); // golf,hotel,grass,wee
Your other sample:
words=new List<string>{"apples", "armies", "black", "beer", "bastion", "cat", "cart", "able", "art", "bark"};
lastWordOfConsecutiveFirstCharGroups = words
.GroupAdjacent(str => str[0])
.Select(g => g.Last());
Output:
string.Join(",", lastWordOfConsecutiveFirstCharGroups); // armies,bastion,cart,art,bark
Demonstration
Try this algorithm:
var words = new List<string> { "golf", "hip", "hop", "hotel", "grass", "world", "wee" };
var newList = new List<string>();
int i = 0;
while (i < words.Count)
{
    if (i == words.Count - 1 || words[i][0] != words[i + 1][0])
    {
        newList.Add(words[i]);
        i++;
    }
    else
    {
        var j = i;
        while (j < words.Count - 1 && words[j][0] == words[j + 1][0])
        {
            j++;
        }
        newList.Add(words[j]);
        i = j + 1;
    }
}
You can use the following extension method to split your sequence into groups (i.e. sub-sequences) by some condition:
public static IEnumerable<IEnumerable<T>> Split<T, TKey>(
this IEnumerable<T> source, Func<T, TKey> keySelector)
{
var group = new List<T>();
using (var iterator = source.GetEnumerator())
{
if (!iterator.MoveNext())
yield break;
else
{
TKey currentKey = keySelector(iterator.Current);
var keyComparer = Comparer<TKey>.Default;
group.Add(iterator.Current);
while (iterator.MoveNext())
{
var key = keySelector(iterator.Current);
if (keyComparer.Compare(currentKey, key) != 0)
{
yield return group;
currentKey = key;
group = new List<T>();
}
group.Add(iterator.Current);
}
}
}
if (group.Any())
yield return group;
}
And getting your expected results looks like:
string[] words = { "golf", "hip", "hop", "hotel", "grass", "world", "wee" };
var result = words.Split(w => w[0])
.Select(g => g.Last());
Result:
golf
hotel
grass
wee
Because your input is a List<>, I think this should work for you with acceptable performance, and it's very concise:
var result = words.Where((x, i) => i == words.Count - 1 ||
words[i][0] != words[i + 1][0]);
You can append ToList() on the result to get a List<string> if you want.
I went with
/// <summary>
/// Given a list, return the last value from each run of similar items.
/// </summary>
public static IEnumerable<T> WithoutDuplicates<T>(this IEnumerable<T> source, Func<T, T, bool> similar)
{
Contract.Requires(source != null);
Contract.Requires(similar != null);
Contract.Ensures(Contract.Result<IEnumerable<T>>().Count() <= source.Count(), "Result should be at most as long as original list");
T last = default(T);
bool first = true;
foreach (var item in source)
{
if (!first && !similar(item, last))
yield return last;
last = item;
first = false;
}
if (!first)
yield return last;
}
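For reference, a quick usage sketch with the word list from the question:
var words = new List<string> { "golf", "hip", "hop", "hotel", "grass", "world", "wee" };
var lastOfEachRun = words.WithoutDuplicates((a, b) => a[0] == b[0]);
// golf, hotel, grass, wee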

How to flatten tree via LINQ?

So I have a simple tree:
class MyNode
{
public MyNode Parent;
public IEnumerable<MyNode> Elements;
int group = 1;
}
I have an IEnumerable<MyNode>. I want to get a list of all MyNode objects (including the inner node objects in Elements) as one flat list where group == 1. How can I do such a thing via LINQ?
You can flatten a tree like this:
IEnumerable<MyNode> Flatten(IEnumerable<MyNode> e) =>
    e.SelectMany(c => Flatten(c.Elements)).Concat(e);
You can then filter by group using Where(...).
To earn some "points for style", convert Flatten to an extension function in a static class.
public static IEnumerable<MyNode> Flatten(this IEnumerable<MyNode> e) =>
e.SelectMany(c => c.Elements.Flatten()).Concat(e);
To earn more points for "even better style", convert Flatten to a generic extension method that takes a tree and a function that produces descendants from a node:
public static IEnumerable<T> Flatten<T>(
this IEnumerable<T> e
, Func<T,IEnumerable<T>> f
) => e.SelectMany(c => f(c).Flatten(f)).Concat(e);
Call this function like this:
IEnumerable<MyNode> tree = ....
var res = tree.Flatten(node => node.Elements);
If you would prefer flattening in pre-order rather than in post-order, switch around the sides of the Concat(...).
The problem with the accepted answer is that it is inefficient if the tree is deep. If the tree is very deep then it blows the stack. You can solve the problem by using an explicit stack:
public static IEnumerable<MyNode> Traverse(this MyNode root)
{
var stack = new Stack<MyNode>();
stack.Push(root);
while(stack.Count > 0)
{
var current = stack.Pop();
yield return current;
foreach(var child in current.Elements)
stack.Push(child);
}
}
Assuming n nodes in a tree of height h and a branching factor considerably less than n, this method is O(1) in stack space, O(h) in heap space and O(n) in time. The other algorithm given is O(h) in stack, O(1) in heap and O(nh) in time. If the branching factor is small compared to n then h is between O(lg n) and O(n), which illustrates that the naïve algorithm can use a dangerous amount of stack and a large amount of time if h is close to n.
Now that we have a traversal, your query is straightforward:
root.Traverse().Where(item=>item.group == 1);
Just for completeness, here is the combination of the answers from dasblinkenlight and Eric Lippert. Unit tested and everything. :-)
public static IEnumerable<T> Flatten<T>(
this IEnumerable<T> items,
Func<T, IEnumerable<T>> getChildren)
{
var stack = new Stack<T>();
foreach(var item in items)
stack.Push(item);
while(stack.Count > 0)
{
var current = stack.Pop();
yield return current;
var children = getChildren(current);
if (children == null) continue;
foreach (var child in children)
stack.Push(child);
}
}
Update:
For people interested in the level of nesting (depth): one of the good things about the explicit enumerator-stack implementation is that at any moment (and in particular when yielding an element) stack.Count represents the current processing depth. So, taking this into account and utilizing C# 7.0 value tuples, we can simply change the method declaration as follows:
public static IEnumerable<(T Item, int Level)> ExpandWithLevel<T>(
this IEnumerable<T> source, Func<T, IEnumerable<T>> elementSelector)
and yield statement:
yield return (item, stack.Count);
Then we can implement the original method by applying simple Select on the above:
public static IEnumerable<T> Expand<T>(
this IEnumerable<T> source, Func<T, IEnumerable<T>> elementSelector) =>
source.ExpandWithLevel(elementSelector).Select(e => e.Item);
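Assembled, the level-aware variant is just the original implementation below with those two changes applied; a sketch:
public static IEnumerable<(T Item, int Level)> ExpandWithLevel<T>(
    this IEnumerable<T> source, Func<T, IEnumerable<T>> elementSelector)
{
    var stack = new Stack<IEnumerator<T>>();
    var e = source.GetEnumerator();
    try
    {
        while (true)
        {
            while (e.MoveNext())
            {
                var item = e.Current;
                // stack.Count is the nesting depth of the current enumerator
                yield return (item, stack.Count);
                var elements = elementSelector(item);
                if (elements == null) continue;
                stack.Push(e);
                e = elements.GetEnumerator();
            }
            if (stack.Count == 0) break;
            e.Dispose();
            e = stack.Pop();
        }
    }
    finally
    {
        e.Dispose();
        while (stack.Count != 0) stack.Pop().Dispose();
    }
}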
Original:
Surprisingly no one (even Eric) showed the "natural" iterative port of a recursive pre-order DFT, so here it is:
public static IEnumerable<T> Expand<T>(
this IEnumerable<T> source, Func<T, IEnumerable<T>> elementSelector)
{
var stack = new Stack<IEnumerator<T>>();
var e = source.GetEnumerator();
try
{
while (true)
{
while (e.MoveNext())
{
var item = e.Current;
yield return item;
var elements = elementSelector(item);
if (elements == null) continue;
stack.Push(e);
e = elements.GetEnumerator();
}
if (stack.Count == 0) break;
e.Dispose();
e = stack.Pop();
}
}
finally
{
e.Dispose();
while (stack.Count != 0) stack.Pop().Dispose();
}
}
I found some small issues with the answers given here:
What if the initial list of items is null?
What if there is a null value in the list of children?
I built on the previous answers and came up with the following:
public static class IEnumerableExtensions
{
public static IEnumerable<T> Flatten<T>(
this IEnumerable<T> items,
Func<T, IEnumerable<T>> getChildren)
{
if (items == null)
yield break;
var stack = new Stack<T>(items);
while (stack.Count > 0)
{
var current = stack.Pop();
yield return current;
if (current == null) continue;
var children = getChildren(current);
if (children == null) continue;
foreach (var child in children)
stack.Push(child);
}
}
}
And the unit tests:
[TestClass]
public class IEnumerableExtensionsTests
{
[TestMethod]
public void NullList()
{
IEnumerable<Test> items = null;
var flattened = items.Flatten(i => i.Children);
Assert.AreEqual(0, flattened.Count());
}
[TestMethod]
public void EmptyList()
{
var items = new Test[0];
var flattened = items.Flatten(i => i.Children);
Assert.AreEqual(0, flattened.Count());
}
[TestMethod]
public void OneItem()
{
var items = new[] { new Test() };
var flattened = items.Flatten(i => i.Children);
Assert.AreEqual(1, flattened.Count());
}
[TestMethod]
public void OneItemWithChild()
{
var items = new[] { new Test { Id = 1, Children = new[] { new Test { Id = 2 } } } };
var flattened = items.Flatten(i => i.Children);
Assert.AreEqual(2, flattened.Count());
Assert.IsTrue(flattened.Any(i => i.Id == 1));
Assert.IsTrue(flattened.Any(i => i.Id == 2));
}
[TestMethod]
public void OneItemWithNullChild()
{
var items = new[] { new Test { Id = 1, Children = new Test[] { null } } };
var flattened = items.Flatten(i => i.Children);
Assert.AreEqual(2, flattened.Count());
Assert.IsTrue(flattened.Any(i => i.Id == 1));
Assert.IsTrue(flattened.Any(i => i == null));
}
class Test
{
public int Id { get; set; }
public IEnumerable<Test> Children { get; set; }
}
}
Most of the answers presented here produce depth-first or zig-zag sequences. For example, starting with the tree below:
       1               2
      / \             / \
     /   \           /   \
    /     \         /     \
  11      12      21      22
 /   \   /   \   /   \   /   \
111 112 121 122 211 212 221 222
Sergey Kalinichenko's answer produces this flattened sequence:
111, 112, 121, 122, 11, 12, 211, 212, 221, 222, 21, 22, 1, 2
Konamiman's answer (that generalizes Eric Lippert's answer) produces this flattened sequence:
2, 22, 222, 221, 21, 212, 211, 1, 12, 122, 121, 11, 112, 111
Ivan Stoev's answer produces this flattened sequence:
1, 11, 111, 112, 12, 121, 122, 2, 21, 211, 212, 22, 221, 222
If you are interested in a breadth-first sequence like this:
1, 2, 11, 12, 21, 22, 111, 112, 121, 122, 211, 212, 221, 222
...then this is the solution for you:
public static IEnumerable<T> Flatten<T>(this IEnumerable<T> source,
Func<T, IEnumerable<T>> childrenSelector)
{
var queue = new Queue<T>(source);
while (queue.Count > 0)
{
var current = queue.Dequeue();
yield return current;
var children = childrenSelector(current);
if (children == null) continue;
foreach (var child in children) queue.Enqueue(child);
}
}
The difference in the implementation is basically using a Queue instead of a Stack. No actual sorting is taking place.
Caution: this implementation is far from optimal regarding memory-efficiency, since a large percentage of the total number of elements will end up being stored in the internal queue during the enumeration. Stack-based tree-traversals are much more efficient regarding memory usage than Queue-based implementations.
In case anyone else finds this, but also needs to know the level after they've flattened the tree, this expands on Konamiman's combination of dasblinkenlight and Eric Lippert's solutions:
public static IEnumerable<Tuple<T, int>> FlattenWithLevel<T>(
this IEnumerable<T> items,
Func<T, IEnumerable<T>> getChilds)
{
var stack = new Stack<Tuple<T, int>>();
foreach (var item in items)
stack.Push(new Tuple<T, int>(item, 1));
while (stack.Count > 0)
{
var current = stack.Pop();
yield return current;
foreach (var child in getChilds(current.Item1))
stack.Push(new Tuple<T, int>(child, current.Item2 + 1));
}
}
A really different option is to use a proper OO design, e.g. ask the MyNode itself to return all of its nodes, flattened.
Like this:
class MyNode
{
public MyNode Parent;
public IEnumerable<MyNode> Elements;
int group = 1;
public IEnumerable<MyNode> GetAllNodes()
{
if (Elements == null)
{
return new[] { this };
}
return new[] { this }.Concat(Elements.SelectMany(e => e.GetAllNodes()));
}
}
Now you could ask the top level MyNode to get all the nodes.
var flatten = topNode.GetAllNodes();
If you can't edit the class, then this isn't an option. But otherwise, I think it can be preferable to a separate (recursive) LINQ method.
It still uses LINQ, so I think this answer is applicable here ;)
Combining Dave's and Ivan Stoev's answers, in case you need the level of nesting and the list flattened "in order", not reversed as in the answer given by Konamiman:
public static class HierarchicalEnumerableUtils
{
private static IEnumerable<Tuple<T, int>> ToLeveled<T>(this IEnumerable<T> source, int level)
{
if (source == null)
{
return null;
}
else
{
return source.Select(item => new Tuple<T, int>(item, level));
}
}
public static IEnumerable<Tuple<T, int>> FlattenWithLevel<T>(this IEnumerable<T> source, Func<T, IEnumerable<T>> elementSelector)
{
var stack = new Stack<IEnumerator<Tuple<T, int>>>();
var leveledSource = source.ToLeveled(0);
var e = leveledSource.GetEnumerator();
try
{
while (true)
{
while (e.MoveNext())
{
var item = e.Current;
yield return item;
var elements = elementSelector(item.Item1).ToLeveled(item.Item2 + 1);
if (elements == null) continue;
stack.Push(e);
e = elements.GetEnumerator();
}
if (stack.Count == 0) break;
e.Dispose();
e = stack.Pop();
}
}
finally
{
e.Dispose();
while (stack.Count != 0) stack.Pop().Dispose();
}
}
}
Here is a ready-to-use implementation using a Queue that returns the flattened tree in "me first, then my children" order.
public static IEnumerable<T> Flatten<T>(this IEnumerable<T> items,
Func<T,IEnumerable<T>> getChildren)
{
if (items == null)
yield break;
var queue = new Queue<T>();
foreach (var item in items) {
if (item == null)
continue;
queue.Enqueue(item);
while (queue.Count > 0) {
var current = queue.Dequeue();
yield return current;
if (current == null)
continue;
var children = getChildren(current);
if (children == null)
continue;
foreach (var child in children)
queue.Enqueue(child);
}
}
}
void Main()
{
var allNodes = GetTreeNodes().Flatten(x => x.Elements);
allNodes.Dump();
}
public static class ExtensionMethods
{
public static IEnumerable<T> Flatten<T>(this IEnumerable<T> source, Func<T, IEnumerable<T>> childrenSelector = null)
{
if (source == null)
{
return new List<T>();
}
var list = source;
if (childrenSelector != null)
{
foreach (var item in source)
{
list = list.Concat(childrenSelector(item).Flatten(childrenSelector));
}
}
return list;
}
}
IEnumerable<MyNode> GetTreeNodes() {
return new[] {
new MyNode { Elements = new[] { new MyNode() }},
new MyNode { Elements = new[] { new MyNode(), new MyNode(), new MyNode() }}
};
}
class MyNode
{
public MyNode Parent;
public IEnumerable<MyNode> Elements;
int group = 1;
}
Building on Konamiman's answer, and the comment that the ordering is unexpected, here's a version with an explicit sort param:
public static IEnumerable<T> TraverseAndFlatten<T, V>(this IEnumerable<T> items, Func<T, IEnumerable<T>> nested, Func<T, V> orderBy)
{
var stack = new Stack<T>();
foreach (var item in items.OrderBy(orderBy))
stack.Push(item);
while (stack.Count > 0)
{
var current = stack.Pop();
yield return current;
var children = nested(current);
if (children == null) continue;
foreach (var child in children.OrderBy(orderBy))
stack.Push(child);
}
}
And a sample usage:
var flattened = doc.TraverseAndFlatten(x => x.DependentDocuments, y => y.Document.DocDated).ToList();
Below is Ivan Stoev's code with the additional feature of telling the index of every object in the path. E.g. searching for "Item_120" in:
Item_0--Item_00
        Item_01
Item_1--Item_10
        Item_11
        Item_12--Item_120
would return the item and an int array [1,2,0]. Obviously, nesting level is also available, as length of the array.
public static IEnumerable<(T, int[])> Expand<T>(this IEnumerable<T> source, Func<T, IEnumerable<T>> getChildren) {
var stack = new Stack<IEnumerator<T>>();
var e = source.GetEnumerator();
List<int> indexes = new List<int>() { -1 };
try {
while (true) {
while (e.MoveNext()) {
var item = e.Current;
indexes[stack.Count]++;
yield return (item, indexes.Take(stack.Count + 1).ToArray());
var elements = getChildren(item);
if (elements == null) continue;
stack.Push(e);
e = elements.GetEnumerator();
if (indexes.Count == stack.Count)
indexes.Add(-1);
}
if (stack.Count == 0) break;
e.Dispose();
indexes[stack.Count] = -1;
e = stack.Pop();
}
} finally {
e.Dispose();
while (stack.Count != 0) stack.Pop().Dispose();
}
}
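A quick usage sketch matching the Item_0/Item_1 example above (the Item class and the tree here are illustrative assumptions, not part of the original post):
class Item
{
    public string Name;
    public List<Item> Children = new List<Item>();
    public override string ToString() => Name;
}

// Build the tree from the example, then look up "Item_120".
var roots = new List<Item> {
    new Item { Name = "Item_0", Children = {
        new Item { Name = "Item_00" },
        new Item { Name = "Item_01" } } },
    new Item { Name = "Item_1", Children = {
        new Item { Name = "Item_10" },
        new Item { Name = "Item_11" },
        new Item { Name = "Item_12", Children = {
            new Item { Name = "Item_120" } } } } },
};

var (found, path) = roots.Expand(i => i.Children)
                         .First(t => t.Item1.Name == "Item_120");
Console.WriteLine($"{found}: [{string.Join(",", path)}]"); // Item_120: [1,2,0]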
Every once in a while I try to scratch at this problem and devise my own solution that supports arbitrarily deep structures (no recursion), performs breadth first traversal, and doesn't abuse too many LINQ queries or preemptively execute recursion on the children. After digging around in the .NET source and trying many solutions, I've finally come up with this solution. It ended up being very close to Ivan Stoev's answer (which I only saw just now); however, mine doesn't utilize infinite loops or have unusual code flow.
public static IEnumerable<T> Traverse<T>(
this IEnumerable<T> source,
Func<T, IEnumerable<T>> fnRecurse)
{
if (source != null)
{
Stack<IEnumerator<T>> enumerators = new Stack<IEnumerator<T>>();
try
{
enumerators.Push(source.GetEnumerator());
while (enumerators.Count > 0)
{
var top = enumerators.Peek();
while (top.MoveNext())
{
yield return top.Current;
var children = fnRecurse(top.Current);
if (children != null)
{
top = children.GetEnumerator();
enumerators.Push(top);
}
}
enumerators.Pop().Dispose();
}
}
finally
{
while (enumerators.Count > 0)
enumerators.Pop().Dispose();
}
}
}
A working example can be found here.
Based on a previous answer, here is a pre-order Flatten:
public static IEnumerable<T> Flatten<T>(
this IEnumerable<T> e
, Func<T, IEnumerable<T>> f
) => e.Concat(e.SelectMany(c => f(c).Flatten(f)));

Is there a better way to use Lambda with groups of N?

I have the method Process(IEnumerable<Record> records) which can take UP TO but NO MORE THAN 3 records at a time. I have hundreds of records, so I need to pass in groups. I do this:
var _Records = Enumerable.Range(1, 16).ToArray();
for (int i = 0; i < int.MaxValue; i += 3)
{
var _ShortList = _Records.Skip(i).Take(3);
if (!_ShortList.Any())
break;
Process(_ShortList);
}
// TODO: finish
It works, but... is there a better way?
You can use MoreLinq's Batch:
var result=Enumerable.Range(1, 16).Batch(3);
or
var arrayOfArrays = Enumerable.Range(1, 16).Batch(3).Select(x => x.ToArray()).ToArray();
And here is the source if you want to take a look at it.
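For what it's worth, on .NET 6 or later the built-in Enumerable.Chunk does the same batching without an extra dependency:
foreach (var batch in Enumerable.Range(1, 16).Chunk(3))
{
    Process(batch); // each batch is an array of at most 3 items
}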
You may use this extension method:
public static IEnumerable<IEnumerable<T>> Split<T>(this IEnumerable<T> source, int chunkSize)
{
return source
.Select((value, i) => new { Index = i, Value = value })
.GroupBy(item => item.Index / chunkSize)
.Select(chunk => chunk.Select(item => item.Value));
}
It splits a source collection of items into several chunks of the given size.
So your code will look like this:
foreach (var chunk in Enumerable.Range(1, 16).Split(3))
{
Process(chunk);
}
Here's another LINQ-y way to do it:
var batchSize = 3;
Enumerable.Range(0, (_Records.Length - 1)/batchSize + 1)
.ToList()
.ForEach(i => Process(_Records.Skip(i * batchSize).Take(batchSize)));
In case you need "pagination" multiple times in your solution, you may consider using an extension method.
I hacked one together in LINQPad; it will do the trick.
public static class MyExtensions {
public static IEnumerable<IEnumerable<T>> Paginate<T>(this IEnumerable<T> source, int pageSize) {
T[] buffer = new T[pageSize];
int index = 0;
foreach (var item in source) {
buffer[index++] = item;
if (index >= pageSize) {
yield return buffer.Take(pageSize).ToArray(); // copy, because the buffer is reused
index = 0;
}
}
if (index > 0) {
yield return buffer.Take(index).ToArray();
}
}
}
Basically, it pre-fills a buffer of size pageSize and yields it just when it's full. If there are < pageSize elements left, we yield them as well. So,
Enumerable.Range(1, 10).Paginate(3).Dump(); // Dump is a LINQPad extension
will yield
{{1, 2, 3}, {4, 5, 6}, {7, 8, 9}, {10}}
You can create your own extension method:
static class Extensions {
public static IEnumerable<IEnumerable<T>> ToBlocks<T>(this IEnumerable<T> source, int blockSize) {
var count = 0;
T[] block = null;
foreach (var item in source) {
if (block == null)
block = new T[blockSize];
block[count++] = item;
if (count == blockSize) {
yield return block;
block = null;
count = 0;
}
}
if (count > 0)
yield return block.Take(count);
}
}
public static void ChunkProcess<T>(IEnumerable<T> source, int size, Action<IEnumerable<T>> action)
{
var chunk = source.Take(size);
while (chunk.Any())
{
action(chunk);
source = source.Skip(size);
chunk = source.Take(size);
}
}
and your code would be
ChunkProcess(_Records, 3, Process);
var _Records = Enumerable.Range(1, 16).ToArray();
int index = 0;
foreach (var group in _Records.GroupBy(element => index++ / 3))
Process(group);
NOTE: The code above is short and relatively efficient, but is still not as efficient as it can be (it will essentially build a hashtable behind the scenes). A slightly more cumbersome, but faster way would be:
var _Records = Enumerable.Range(1, 16).ToArray();
var buff = new int[3];
int index = 0;
foreach (var element in _Records) {
if (index == buff.Length) {
Process(buff);
index = 0;
}
buff[index++] = element;
}
if (index > 0)
Process(buff.Take(index));
Or, pack it to a more reusable form:
public static class EnumerableEx {
public static void Paginate<T>(this IEnumerable<T> elements, int page_size, Action<IEnumerable<T>> process_page) {
var buff = new T[page_size];
int index = 0;
foreach (var element in elements) {
if (index == buff.Length) {
process_page(buff);
index = 0;
}
buff[index++] = element;
}
if (index > 0)
process_page(buff.Take(index));
}
}
// ...
var _Records = Enumerable.Range(1, 16).ToArray();
_Records.Paginate(3, Process);
This extension method works properly:
public static class EnumerableExtentions
{
public static IEnumerable<IEnumerable<T>> Chunks<T>(this IEnumerable<T> items, int size)
{
return
items.Select((member, index) => new { Index = index, Value = member })
.GroupBy(item => (int)item.Index / size)
.Select(chunk => chunk.Select(item => item.Value));
}
}
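And a usage sketch with the records from the question:
var _Records = Enumerable.Range(1, 16).ToArray();
foreach (var chunk in _Records.Chunks(3))
{
    Process(chunk); // never receives more than 3 records
}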

Parallel.Foreach + yield return?

I want to process something using a parallel loop like this:
public void FillLogs(IEnumerable<IComputer> computers)
{
Parallel.ForEach(computers, cpt=>
{
cpt.Logs = cpt.GetRawLogs().ToList();
});
}
OK, it works fine. But what do I do if I want the FillLogs method to return an IEnumerable?
public IEnumerable<IComputer> FillLogs(IEnumerable<IComputer> computers)
{
Parallel.ForEach(computers, cpt=>
{
cpt.Logs = cpt.GetRawLogs().ToList();
yield return cpt // KO, don't work
});
}
EDIT
It seems that this is not possible... but I use something like this:
public IEnumerable<IComputer> FillLogs(IEnumerable<IComputer> computers)
{
return computers.AsParallel().Select(cpt => cpt);
}
But where do I put the cpt.Logs = cpt.GetRawLogs().ToList(); instruction?
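Presumably the assignment would go inside the Select, something like this (just a sketch):
public IEnumerable<IComputer> FillLogs(IEnumerable<IComputer> computers)
{
    return computers.AsParallel().Select(cpt =>
    {
        cpt.Logs = cpt.GetRawLogs().ToList();
        return cpt;
    });
}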
Short version - no, that isn't possible via an iterator block; the longer version probably involves synchronized queue/dequeue between the caller's iterator thread (doing the dequeue) and the parallel workers (doing the enqueue); but as a side note - logs are usually IO-bound, and parallelising things that are IO-bound often doesn't work very well.
If the caller is going to take some time to consume each, then there may be some merit to an approach that only processes one log at a time, but can do that while the caller is consuming the previous log; i.e. it begins a Task for the next item before the yield, and waits for completion after the yield... but that is again, pretty complex. As a simplified example:
static void Main()
{
foreach(string s in Get())
{
Console.WriteLine(s);
}
}
static IEnumerable<string> Get() {
var source = new[] {1, 2, 3, 4, 5};
Task<string> outstandingItem = null;
Func<object, string> transform = x => ProcessItem((int) x);
foreach(var item in source)
{
var tmp = outstandingItem;
// note: passed in as "state", not captured, so not a foreach/capture bug
outstandingItem = new Task<string>(transform, item);
outstandingItem.Start();
if (tmp != null) yield return tmp.Result;
}
if (outstandingItem != null) yield return outstandingItem.Result;
}
static string ProcessItem(int i)
{
return i.ToString();
}
I don't want to be offensive, but maybe there is a lack of understanding. Parallel.ForEach means that the TPL will run the foreach over several threads, according to the available hardware. But that means that it must be possible to do that work in parallel! yield return gives you the opportunity to get some values out of a list (or whatever) and hand them back one by one as they are needed. It avoids the need to first find all the items matching the condition and only then iterate over them. That is indeed a performance advantage, but it can't be done in parallel.
Although the question is old, I've managed to do something just for fun.
class Program
{
static void Main(string[] args)
{
foreach (var message in GetMessages())
{
Console.WriteLine(message);
}
}
// Parallel yield
private static IEnumerable<string> GetMessages()
{
int total = 0;
bool completed = false;
var batches = Enumerable.Range(1, 100).Select(i => new Computer() { Id = i });
var qu = new ConcurrentQueue<Computer>();
Task.Run(() =>
{
try
{
Parallel.ForEach(batches,
() => 0,
(item, loop, subtotal) =>
{
Thread.Sleep(1000);
qu.Enqueue(item);
return subtotal + 1;
},
result => Interlocked.Add(ref total, result));
}
finally
{
completed = true;
}
});
int current = 0;
while (current < total || !completed)
{
SpinWait.SpinUntil(() => current < total || completed);
if (current == total) yield break;
current++;
qu.TryDequeue(out Computer computer);
yield return $"Completed {computer.Id}";
}
}
}
public class Computer
{
public int Id { get; set; }
}
Compared to Koray's answer this one really uses all the CPU cores.
You can use the following extension method
public static class ParallelExtensions
{
public static IEnumerable<T1> OrderedParallel<T, T1>(this IEnumerable<T> list, Func<T, T1> action)
{
var unorderedResult = new ConcurrentBag<(long, T1)>();
Parallel.ForEach(list, (o, state, i) =>
{
unorderedResult.Add((i, action.Invoke(o)));
});
var ordered = unorderedResult.OrderBy(o => o.Item1);
return ordered.Select(o => o.Item2);
}
}
Use it like this:
public IEnumerable<IComputer> FillLogs(IEnumerable<IComputer> computers)
{
return computers.OrderedParallel(cpt =>
{
cpt.Logs = cpt.GetRawLogs().ToList();
return cpt;
});
}
Hope this will save you some time.
How about
Queue<string> qu = new Queue<string>();
bool finished = false;
Task.Factory.StartNew(() =>
{
Parallel.ForEach(get_list(), (item) =>
{
string itemToReturn = heavyWorkOnItem(item);
lock (qu)
qu.Enqueue(itemToReturn );
});
finished = true;
});
while (!finished)
{
lock (qu)
while (qu.Count > 0)
yield return qu.Dequeue();
//maybe a thread sleep here?
}
Edit:
I think this is better:
public static IEnumerable<TOutput> ParallelYieldReturn<TSource, TOutput>(this IEnumerable<TSource> source, Func<TSource, TOutput> func)
{
ConcurrentQueue<TOutput> qu = new ConcurrentQueue<TOutput>();
bool finished = false;
AutoResetEvent re = new AutoResetEvent(false);
Task.Factory.StartNew(() =>
{
Parallel.ForEach(source, (item) =>
{
qu.Enqueue(func(item));
re.Set();
});
finished = true;
re.Set();
});
while (!finished)
{
re.WaitOne();
while (qu.Count > 0)
{
TOutput res;
if (qu.TryDequeue(out res))
yield return res;
}
}
}
Edit2: I agree with the short No answer. This code is useless; you cannot break the yield loop.
