Permutation algorithm Optimization - c#

I have this permutation code working perfectly, but it does not generate the output fast enough. I need help optimizing the code to run faster; it is important that the result remains the same. I have seen other algorithms, but they don't take into consideration the output length and same-character repetition, which are all valid output. If I can have this converted into a for loop with 28 alphanumeric characters, that would be awesome. Below is the current code I am looking to optimize.
namespace CSharpPermutations
{
public interface IPermutable<T>
{
ISet<T> GetRange();
}
public class Digits : IPermutable<int>
{
public ISet<int> GetRange()
{
ISet<int> set = new HashSet<int>();
for (int i = 0; i < 10; ++i)
set.Add(i);
return set;
}
}
public class AlphaNumeric : IPermutable<char>
{
public ISet<char> GetRange()
{
ISet<char> set = new HashSet<char>();
set.Add('0');
set.Add('1');
set.Add('2');
set.Add('3');
set.Add('4');
set.Add('5');
set.Add('6');
set.Add('7');
set.Add('8');
set.Add('9');
set.Add('a');
set.Add('b');
return set;
}
}
public class PermutationGenerator<T,P> : IEnumerable<string>
where P : IPermutable<T>, new()
{
public PermutationGenerator(int number)
{
this.number = number;
this.range = new P().GetRange();
}
public IEnumerator<string> GetEnumerator()
{
foreach (var item in Permutations(0,0))
{
yield return item.ToString();
}
}
IEnumerator IEnumerable.GetEnumerator()
{
foreach (var item in Permutations(0,0))
{
yield return item;
}
}
private IEnumerable<StringBuilder> Permutations(int n, int k)
{
if (n == number)
yield return new StringBuilder();
foreach (var element in range.Skip(k))
{
foreach (var result in Permutations(n + 1, k + 1))
{
yield return new StringBuilder().Append(element).Append(result);
}
}
}
private int number;
private ISet<T> range;
}
class MainClass
{
public static void Main(string[] args)
{
foreach (var element in new PermutationGenerator<char, AlphaNumeric>(2))
{
Console.WriteLine(element);
}
}
}
}
Thanks for your effort in advance.

What you're outputting there is the cartesian product of two sets; the first set is the characters "0123456789ab" and the second set is the characters "123456789ab".
Eric Lippert wrote a well-known article demonstrating how to use LINQ to solve this.
We can apply this to your problem like so:
using System;
using System.Collections.Generic;
using System.Linq;
namespace Demo;
static class Program
{
static void Main(string[] args)
{
char[][] source = new char[2][];
source[0] = "0123456789ab".ToCharArray();
source[1] = "123456789ab".ToCharArray();
foreach (var perm in Combine(source))
{
Console.WriteLine(string.Concat(perm));
}
}
public static IEnumerable<IEnumerable<T>> Combine<T>(IEnumerable<IEnumerable<T>> sequences)
{
IEnumerable<IEnumerable<T>> emptyProduct = new[] { Enumerable.Empty<T>() };
return sequences.Aggregate(
emptyProduct,
(accumulator, sequence) =>
from accseq in accumulator
from item in sequence
select accseq.Concat(new[] { item }));
}
}
You can extend this to 28 characters by modifying the source data:
source[0] = "0123456789abcdefghijklmnopqr".ToCharArray();
source[1] = "123456789abcdefghijklmnopqr".ToCharArray();
If you want to know how this works, read Eric Lippert's excellent article, which I linked above.

Consider
foreach (var result in Permutations(n + 1, k + 1))
{
yield return new StringBuilder().Append(element).Append(result);
}
Permutations is a recursive function implemented as an iterator. Each time MoveNext() is called on the outermost iterator to advance one step of the loop, it calls MoveNext() on the nested iterator, which in turn calls it on the next one, and so on. Every result therefore costs N calls to MoveNext(), plus N new StringBuilder allocations, N calls to Append() and so on. This is quite inefficient.
I also cannot see that StringBuilder gives any advantage here. It is a benefit if you concatenate many strings, but as far as I can see you only add two strings together.
The first thing you should do is add code to measure the performance, or even better, use a profiler. That way you can tell whether any change actually improves the situation or not.
The second change I would try is rewriting the recursion as an iterative implementation. This probably means that you need to keep track of an explicit stack of the numbers to process. Or, if this is too difficult, stop using iterator blocks and let the recursive method take a list that it adds results to.
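As a rough illustration of the iterative rewrite suggested above, here is a minimal sketch that walks the positions like an odometer instead of recursing. The names OdometerSketch, AlphabetsLikeOriginal and Enumerate are made up for this example, and it assumes that the question's HashSet enumerates its characters in insertion order, so that position i draws from the characters left after skipping the first i. Separately, the question's base case has no yield break after its yield return, so the method keeps recursing past the requested length without producing anything; that wasted work is likely a large part of the cost.

```csharp
using System;
using System.Collections.Generic;

public static class OdometerSketch
{
    // Hypothetical helper: position i draws from the characters that remain
    // after skipping the first i, mirroring range.Skip(k) in the question.
    public static string[] AlphabetsLikeOriginal(string chars, int length)
    {
        var alphabets = new string[length];
        for (int i = 0; i < length; i++)
            alphabets[i] = i < chars.Length ? chars.Substring(i) : string.Empty;
        return alphabets;
    }

    // Yields every string whose i-th character comes from alphabets[i],
    // in the order nested for loops would produce, without recursion.
    public static IEnumerable<string> Enumerate(string[] alphabets)
    {
        int n = alphabets.Length;
        foreach (var alphabet in alphabets)
            if (alphabet.Length == 0) yield break; // an empty position means no results

        var indices = new int[n];  // current index into each alphabet
        var buffer = new char[n];  // reused for every result
        for (int i = 0; i < n; i++) buffer[i] = alphabets[i][0];

        while (true)
        {
            yield return new string(buffer);

            // Advance the rightmost position; on overflow reset it and carry left.
            int pos = n - 1;
            while (pos >= 0 && ++indices[pos] == alphabets[pos].Length)
            {
                indices[pos] = 0;
                buffer[pos] = alphabets[pos][0];
                pos--;
            }
            if (pos < 0) yield break; // every position overflowed: done
            buffer[pos] = alphabets[pos][indices[pos]];
        }
    }
}
```

Calling OdometerSketch.Enumerate(OdometerSketch.AlphabetsLikeOriginal("0123456789ab", 2)) should give the same 132 two-character strings as the question's generator under the ordering assumption above. Measure before and after, as suggested, rather than taking the speed-up for granted.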

Related

Faster way to find first occurence of String in list

I have a method that finds the first occurrences of words in a list.
wordSet is the set of words that I need to check.
The list is a representation of a text, so the words appear in the same order as in the text.
So if pwWords has elements such as {This,is,good,boy,and,this,girl,is,bad}
and wordSet has {this,is}, the method should add true only for the first two elements.
My question is: is there any faster way to do this?
Because if pwWords has over a million elements and wordSet over 10,000, it works pretty slowly.
public List<bool> getFirstOccurances(List<string> pwWords)
{
var firstOccurance = new List<bool>();
var wordSet = new List<String>(WordsWithFDictionary.Keys);
foreach (var pwWord in pwWords)
{
if (wordSet.Contains(pwWord))
{
firstOccurance.Add(true);
wordSet.Remove(pwWord);
}
else
{
firstOccurance.Add(false);
}
}
return firstOccurance;
}
Another approach is using HashSet for wordSet
public List<bool> getFirstOccurances(List<string> pwWords)
{
var wordSet = new HashSet<string>(WordsWithFDictionary.Keys);
return pwWords.Select(word => wordSet.Contains(word)).ToList();
}
HashSet.Contains is an O(1) operation, whereas List.Contains loops over the items until a match is found, which is O(n).
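Note that Contains alone marks every occurrence of a word, while the question asks for true only on the first one. A minimal sketch of one way to keep that behaviour is to rely on HashSet.Remove, which returns true only when the item was actually present. The names FirstOccurrenceSketch and MarkFirstOccurrences are hypothetical, and the comparer parameter is an assumption added so that "This" can match "this" as in the question's example.

```csharp
using System;
using System.Collections.Generic;

public static class FirstOccurrenceSketch
{
    // Returns one flag per word: true only the first time a wanted word is seen.
    public static List<bool> MarkFirstOccurrences(
        IEnumerable<string> words,
        IEnumerable<string> wanted,
        IEqualityComparer<string> comparer)
    {
        // Words still waiting for their first occurrence.
        var remaining = new HashSet<string>(wanted, comparer);
        var flags = new List<bool>();
        foreach (var word in words)
        {
            // Remove succeeds (returns true) only once per wanted word.
            flags.Add(remaining.Remove(word));
        }
        return flags;
    }
}
```

Because this variant mutates the set while scanning, it is not a candidate for parallel execution as written, and the set has to be rebuilt for each text.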
For better performance you can create wordSet only once if this is possible.
public class FirstOccurances
{
private HashSet<string> _wordSet;
public FirstOccurances(IEnumerable<string> wordKeys)
{
_wordSet = new HashSet<string>(wordKeys);
}
public List<bool> GetFor(List<string> words)
{
return words.Select(word => _wordSet.Contains(word)).ToList();
}
}
Then use it
var occurrences = new FirstOccurances(WordsWithFDictionary.Keys);
// Now you can effectively search for occurrences multiple times
var result = occurrences.GetFor(pwWords);
var anotherResult = occurrences.GetFor(anotherPwWords);
Because each item of pwWords can be checked independently, and if the order of the items is not important, you can try Parallel LINQ:
public List<bool> GetFor(List<string> words)
{
return words.AsParallel().Select(word => _wordSet.Contains(word)).ToList();
}
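If the flags do have to line up with the input positions, PLINQ's AsOrdered can be added to the query. The following is only a sketch under that assumption; OrderedParallelSketch is a made-up name, and the set is passed in as a parameter to keep the example self-contained. It relies on the set not being modified while the query runs.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class OrderedParallelSketch
{
    // Same lookup as above, but AsOrdered keeps results in source order.
    public static List<bool> GetFor(IEnumerable<string> words, HashSet<string> wordSet)
    {
        return words
            .AsParallel()
            .AsOrdered() // preserve the position of each word in the output
            .Select(word => wordSet.Contains(word))
            .ToList();
    }
}
```

Ordering has a cost of its own, so it is worth measuring whether the parallel version still beats the sequential one for your data.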

How to roll my own Jagged Foreach?

My application frequently iterates over jagged arrays. Rather than explicit nested loops all over the place, I'm trying to implement foreach-like functionality where I can pass a lambda.
My latest attempt looks like this:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace JaggedTest
{
static class Global
{
// Does not work
public static void ForEachJagged<T>(Array A, System.Linq.Expressions.Expression<Action<T>> F)
{
foreach (var Item in A)
{
if (Item is Array)
{
ForEachJagged<T>((Array)Item, F);
}
else
{
System.Linq.Expressions.InvocationExpression Invo =
System.Linq.Expressions.Expression.Invoke(F,
System.Linq.Expressions.Expression.Constant(Item));
Console.WriteLine(Invo.ToString());
// How to execute "Invo" ?
}
}
}
}
class Program
{
static void Main(string[] args)
{
int[][] Foo = new int[3][] {
new int[] {1}
,new int[] {2,3,4,5,6}
,new int[] {7,8,9}
};
Global.ForEachJagged<int>(Foo, X => Console.Write(X.ToString() + " "));
}
}
}
This program produces the expected "ToString" debug output for the lambda expression, but I'm stuck trying to actually execute this expression. What is the proper way to execute a lambda expression passed as a function parameter?
-- EDIT --
Working jagged foreach based on Slava Utesinov's feedback:
public static void ForEachJagged<T>(Array A, Action<T> F)
{
foreach (var Item in A)
{
if (Item is Array)
{
ForEachJagged<T>((Array)Item, F);
}
else
{
F((T)Item);
}
}
}
What about the Compile method, which turns the expression into a delegate you can invoke:
foreach (var Item in A)
{
if (Item is Array)
ForEachJagged<T>((Array)Item, F);
else
F.Compile()((T)Item);
}
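One thing to watch with this approach is that Compile() runs again for every element. A minimal sketch of a variant that compiles the expression once and then recurses with the resulting delegate is shown below; JaggedSketch and Visit are names invented for this example.

```csharp
using System;
using System.Linq.Expressions;

public static class JaggedSketch
{
    // Compiles the expression a single time, then walks the jagged array.
    public static void ForEachJagged<T>(Array array, Expression<Action<T>> expression)
    {
        Action<T> action = expression.Compile();
        Visit(array, action);
    }

    // Recurses into nested arrays and invokes the delegate on leaf items.
    private static void Visit<T>(Array array, Action<T> action)
    {
        foreach (var item in array)
        {
            if (item is Array nested)
                Visit(nested, action);
            else
                action((T)item);
        }
    }
}
```

If nothing needs to inspect the expression tree, taking Action<T> directly, as in the question's edit, avoids the compilation step altogether.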
If you're really hung up on it being just one foreach loop, you can do something like the following:
foreach (var item in arr.SelectMany(a => a))
{
Console.Write(item.ToString() + " ");
}
But really, I would just recommend that you do the nesting. All the effort spent trying to get the code to look pretty is going to A) make the code only look prettier to you, B) make the code run appreciably if not significantly slower, or C) both.
If it's the indentation that hangs you up, you can always make it look like this:
foreach (var nest in arr)
foreach (var item in nest)
{
Console.Write(item.ToString() + " ");
}

C# Struct instance behavior changes when captured in lambda

I've got a workaround for this issue, but I'm trying to figure out why it works. Basically, I'm looping through a list of structs using foreach. If I include a LINQ statement that references the current struct before I call a method of the struct, the method is unable to modify the members of the struct. This happens regardless of whether the LINQ statement is even called. I was able to work around this by assigning the value I was looking for to a variable and using that in the LINQ, but I would like to know what is causing this. Here's an example I created.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace WeirdnessExample
{
public struct RawData
{
private int id;
public int ID
{
get{ return id;}
set { id = value; }
}
public void AssignID(int newID)
{
id = newID;
}
}
public class ProcessedData
{
public int ID { get; set; }
}
class Program
{
static void Main(string[] args)
{
List<ProcessedData> processedRecords = new List<ProcessedData>();
processedRecords.Add(new ProcessedData()
{
ID = 1
});
List<RawData> rawRecords = new List<RawData>();
rawRecords.Add(new RawData()
{
ID = 2
});
int i = 0;
foreach (RawData rawRec in rawRecords)
{
int id = rawRec.ID;
if (i < 0 || i > 20)
{
List<ProcessedData> matchingRecs = processedRecords.FindAll(mr => mr.ID == rawRec.ID);
}
Console.Write(String.Format("With LINQ: ID Before Assignment = {0}, ", rawRec.ID)); //2
rawRec.AssignID(id + 8);
Console.WriteLine(String.Format("ID After Assignment = {0}", rawRec.ID)); //2
i++;
}
rawRecords = new List<RawData>();
rawRecords.Add(new RawData()
{
ID = 2
});
i = 0;
foreach (RawData rawRec in rawRecords)
{
int id = rawRec.ID;
if (i < 0)
{
List<ProcessedData> matchingRecs = processedRecords.FindAll(mr => mr.ID == id);
}
Console.Write(String.Format("With LINQ: ID Before Assignment = {0}, ", rawRec.ID)); //2
rawRec.AssignID(id + 8);
Console.WriteLine(String.Format("ID After Assignment = {0}", rawRec.ID)); //10
i++;
}
Console.ReadLine();
}
}
}
Okay, I've managed to reproduce this with a rather simpler test program, as shown below, and I now understand it. Admittedly understanding it doesn't make me feel any less nauseous, but hey... Explanation after code.
using System;
using System.Collections.Generic;
struct MutableStruct
{
public int Value { get; set; }
public void AssignValue(int newValue)
{
Value = newValue;
}
}
class Test
{
static void Main()
{
var list = new List<MutableStruct>()
{
new MutableStruct { Value = 10 }
};
Console.WriteLine("Without loop variable capture");
foreach (MutableStruct item in list)
{
Console.WriteLine("Before: {0}", item.Value); // 10
item.AssignValue(30);
Console.WriteLine("After: {0}", item.Value); // 30
}
// Reset...
list[0] = new MutableStruct { Value = 10 };
Console.WriteLine("With loop variable capture");
foreach (MutableStruct item in list)
{
Action capture = () => Console.WriteLine(item.Value);
Console.WriteLine("Before: {0}", item.Value); // 10
item.AssignValue(30);
Console.WriteLine("After: {0}", item.Value); // Still 10!
}
}
}
The difference between the two loops is that in the second one, the loop variable is captured by a lambda expression. The second loop is effectively turned into something like this:
// Nested class, would actually have an unspeakable name
class CaptureHelper
{
public MutableStruct item;
public void Execute()
{
Console.WriteLine(item.Value);
}
}
...
// Second loop in main method
foreach (MutableStruct item in list)
{
CaptureHelper helper = new CaptureHelper();
helper.item = item;
Action capture = helper.Execute;
MutableStruct tmp = helper.item;
Console.WriteLine("Before: {0}", tmp.Value);
tmp = helper.item;
tmp.AssignValue(30);
tmp = helper.item;
Console.WriteLine("After: {0}", tmp.Value);
}
Now of course each time we copy the variable out of helper we get a fresh copy of the struct. This should normally be fine - the iteration variable is read-only, so we'd expect it not to change. However, you have a method which changes the contents of the struct, causing the unexpected behaviour.
Note that if you tried to change the property, you'd get a compile-time error:
Test.cs(37,13): error CS1654: Cannot modify members of 'item' because it is a
'foreach iteration variable'
Lessons:
Mutable structs are evil
Structs which are mutated by methods are doubly evil
Mutating a struct via a method call on an iteration variable which has been captured is triply evil to the extent of breakage
It's not 100% clear to me whether the C# compiler is behaving as per the spec here. I suspect it is. Even if it's not, I wouldn't want to suggest the team should put any effort into fixing it. Code like this is just begging to be broken in subtle ways.
OK. We definitely have an issue here, but I suspect that the issue is not with closures per se but with the foreach implementation instead.
The C# 4.0 specification states (8.8.4 The foreach statement) that "the iteration variable corresponds to a read-only local variable with a scope that extends over the embedded statement". That's why we can't change the loop variable or increment its property (as Jon already stated):
struct Mutable
{
public int X {get; set;}
public void ChangeX(int x) { X = x; }
}
var mutables = new List<Mutable>{new Mutable{ X = 1 }};
foreach(var item in mutables)
{
// Illegal!
item = new Mutable();
// Illegal as well!
item.X++;
}
In this regard, read-only loop variables behave almost exactly the same as any readonly field (in terms of accessing the variable outside of the constructor):
We can't change a readonly field outside of the constructor
We can't change a property of a readonly field of value type
We treat readonly fields as values, which leads to using a temporary copy every time we access a readonly field of value type
class MutableReadonly
{
public readonly Mutable M = new Mutable {X = 1};
}
// Somewhere in the code
var mr = new MutableReadonly();
// Illegal!
mr.M = new Mutable();
// Illegal as well!
mr.M.X++;
// Legal, but leads to undesired behavior
// because mr.M.X remains unchanged!
mr.M.ChangeX(10);
There are plenty of issues related to mutable value types, and one of them relates to the last behavior: changing a readonly struct via a mutator method (like ChangeX) leads to obscure behavior because we modify a copy, not the readonly object itself:
mr.M.ChangeX(10);
Is equivalent to:
var tmp = mr.M;
tmp.ChangeX(10);
If the loop variable is treated by the C# compiler as a read-only local variable, then it seems reasonable to expect the same behavior for it as for read-only fields.
Right now, the loop variable in a simple loop (without any closures) behaves almost the same as a read-only field, except that it is not copied for every access. But if the code changes and a closure comes into play, the loop variable starts behaving like a pure read-only variable:
var mutables = new List<Mutable> { new Mutable { X = 1 } };
foreach (var m in mutables)
{
Console.WriteLine("Before change: {0}", m.X); // X = 1
// We'll change loop variable directly without temporary variable
m.ChangeX(10);
Console.WriteLine("After change: {0}", m.X); // X = 10
}
foreach (var m in mutables)
{
// We start treating m as a pure read-only variable!
Action a = () => Console.WriteLine(m.X);
Console.WriteLine("Before change: {0}", m.X); // X = 1
// We'll change a COPY instead of a m variable!
m.ChangeX(10);
Console.WriteLine("After change: {0}", m.X); // X = 1
}
Unfortunately, I can't find strict rules for how read-only local variables should behave, but it's clear that the behavior differs depending on the loop body: we don't copy to locals for every access in a simple loop, but we DO if the loop body closes over the loop variable.
We all know that closing over the loop variable is considered harmful and that the loop implementation was changed in C# 5.0. A simple way to fix that old issue in the pre-C# 5.0 era was to introduce a local variable, but interestingly, introducing a local variable in our case changes the behavior as well:
foreach (var mLoop in mutables)
{
// Introducing local variable!
var m = mLoop;
// We're capturing local variable instead of loop variable
Action a = () => Console.WriteLine(m.X);
Console.WriteLine("Before change: {0}", m.X); // X = 1
// We'll roll back this behavior and will change
// value type directly in the closure without making a copy!
m.ChangeX(10); // X = 10 !!
Console.WriteLine("After change: {0}", m.X); // X = 10
}
Actually, this means that C# 5.0 has a very subtle breaking change, because no one will introduce a local variable any more (and even tools like ReSharper stop warning about it in VS2012 because it's not an issue).
I'm OK with both behaviors but inconsistency seems strange.
I suspect this has to do with how lambda expressions are evaluated. See this question and its answer for more details.
Question:
When using lambda expressions or anonymous methods in C#, we have to be wary of the access to modified closure pitfall. For example:
foreach (var s in strings)
{
query = query.Where(i => i.Prop == s); // access to modified closure
}
Due to the modified closure, the above code will cause all of the Where clauses on the query to be based on the final value of s.
Answer:
This is one of the worst "gotchas" in C#, and we are going to take the breaking change to fix it. In C# 5 the foreach loop variable will be logically inside the body of the loop, and therefore closures will get a fresh copy every time.
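The usual workaround from before C# 5, copying the loop variable into a local declared inside the loop body, can be sketched as follows. CaptureSketch and BuildGetters are hypothetical names used only for illustration; the copy is redundant for foreach on C# 5 and later but harmless.

```csharp
using System;
using System.Collections.Generic;

public static class CaptureSketch
{
    // Builds one delegate per string; each delegate returns "its" string.
    public static List<Func<string>> BuildGetters(IEnumerable<string> strings)
    {
        var getters = new List<Func<string>>();
        foreach (var s in strings)
        {
            // A fresh variable per iteration, so each closure captures its own value.
            var copy = s;
            getters.Add(() => copy);
        }
        return getters;
    }
}
```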
Just to complement Sergey's post, I want to add the following example with a manual closure, which demonstrates the compiler's behavior. Of course, the compiler might use any other implementation that satisfies the read-only requirement for a variable captured within a foreach statement.
static void Main()
{
var list = new List<MutableStruct>()
{
new MutableStruct { Value = 10 }
};
foreach (MutableStruct item in list)
{
var c = new Closure(item);
Console.WriteLine(c.Item.Value);
Console.WriteLine("Before: {0}", c.Item.Value); // 10
c.Item.AssignValue(30);
Console.WriteLine("After: {0}", c.Item.Value); // Still 10!
}
}
class Closure
{
public Closure(MutableStruct item){
Item = item;
}
//readonly modifier is mandatory
public readonly MutableStruct Item;
public void Foo()
{
Console.WriteLine(Item.Value);
}
}
This might solve your issue. It swaps out foreach for a for and makes the struct immutable.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace WeirdnessExample
{
public struct RawData
{
private readonly int id;
public int ID
{
get{ return id;}
}
public RawData(int newID)
{
id = newID;
}
}
public class ProcessedData
{
private readonly int id;
public int ID
{
get{ return id;}
}
public ProcessedData(int newID)
{
id = newID;
}
}
class Program
{
static void Main(string[] args)
{
List<ProcessedData> processedRecords = new List<ProcessedData>();
processedRecords.Add(new ProcessedData(1));
List<RawData> rawRecords = new List<RawData>();
rawRecords.Add(new RawData(2));
for (int i = 0; i < rawRecords.Count; i++)
{
RawData rawRec = rawRecords[i];
int id = rawRec.ID;
if (i < 0 || i > 20)
{
RawData rawRec2 = rawRec;
List<ProcessedData> matchingRecs = processedRecords.FindAll(mr => mr.ID == rawRec2.ID);
}
Console.Write(String.Format("With LINQ: ID Before Assignment = {0}, ", rawRec.ID)); //2
rawRec = new RawData(rawRec.ID + 8);
Console.WriteLine(String.Format("ID After Assignment = {0}", rawRec.ID)); //10
}
rawRecords = new List<RawData>();
rawRecords.Add(new RawData(2));
for (int i = 0; i < rawRecords.Count; i++)
{
RawData rawRec = rawRecords[i];
int id = rawRec.ID;
if (i < 0)
{
List<ProcessedData> matchingRecs = processedRecords.FindAll(mr => mr.ID == id);
}
Console.Write(String.Format("With LINQ: ID Before Assignment = {0}, ", rawRec.ID)); //2
rawRec = new RawData(rawRec.ID + 8);
Console.WriteLine(String.Format("ID After Assignment = {0}", rawRec.ID)); //10
}
Console.ReadLine();
}
}
}

CLR: Multi Param Aggregate, Argument not in Final Output?

Why is my delimiter not appearing in the final output? It's initialized to be a comma, but I only get ~5 white spaces between each attribute using:
SELECT [article_id]
, dbo.GROUP_CONCAT(0, t.tag_name, ',') AS col
FROM [AdventureWorks].[dbo].[ARTICLE_TAG_XREF] atx
JOIN [AdventureWorks].[dbo].[TAGS] t ON t.tag_id = atx.tag_id
GROUP BY article_id
The bit for DISTINCT works fine, but it operates within the Accumulate scope...
Output:
article_id | col
-------------------------------------------------
1 | a a b c
Update: The excess space between values is because the column is defined as NCHAR(10), so 10 characters would appear in the output. Silly mistake on my part...
Solution
With Martin Smith's help about working with the Write(BinaryWriter w) method, this update works for me:
public void Write(BinaryWriter w)
{
w.Write(list.Count);
for (int i = 0; i < list.Count; i++ )
{
if (i < list.Count - 1)
{
w.Write(list[i].ToString() + delimiter);
}
else
{
w.Write(list[i].ToString());
}
}
}
The Question:
Why does the above solve my problem? And why wouldn't it let me use more than one w.write call inside the FOR loop?
C# Code:
using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.Xml.Serialization;
using System.Xml;
using System.IO;
using System.Collections;
using System.Text;
[Serializable]
[SqlUserDefinedAggregate(Format.UserDefined, MaxByteSize = 8000)]
public struct GROUP_CONCAT : IBinarySerialize
{
ArrayList list;
string delimiter;
public void Init()
{
list = new ArrayList();
delimiter = ",";
}
public void Accumulate(SqlBoolean isDistinct, SqlString Value, SqlString separator)
{
delimiter = (separator.IsNull) ? "," : separator.Value ;
if (!Value.IsNull)
{
if (isDistinct)
{
if (!list.Contains(Value.Value))
{
list.Add(Value.Value);
}
}
else
{
list.Add(Value.Value);
}
}
}
public void Merge(GROUP_CONCAT Group)
{
list.AddRange(Group.list);
}
public SqlString Terminate()
{
string[] strings = new string[list.Count];
for (int i = 0; i < list.Count; i++)
{
strings[i] = list[i].ToString();
}
return new SqlString(string.Join(delimiter, strings));
}
#region IBinarySerialize Members
public void Read(BinaryReader r)
{
int itemCount = r.ReadInt32();
list = new ArrayList(itemCount);
for (int i = 0; i < itemCount; i++)
{
this.list.Add(r.ReadString());
}
}
public void Write(BinaryWriter w)
{
w.Write(list.Count);
foreach (string s in list)
{
w.Write(s);
}
}
#endregion
}
The problem here is that you do not serialize delimiter. Add:
w.Write(delimiter);
as a first line in your Write method and
delimiter = r.ReadString();
as a first line in your Read method.
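Put together, a symmetric Read/Write pair could look like the sketch below. To keep it runnable outside SQL Server, it uses a plain class named ConcatState (a hypothetical stand-in for the aggregate struct, without the SqlUserDefinedAggregate attribute or IBinarySerialize) and a List<string> instead of ArrayList. The point it illustrates is that Read must consume exactly what Write produced, in the same order.

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical stand-in for the aggregate's state, for illustration only.
public class ConcatState
{
    public string Delimiter = ",";
    public List<string> Items = new List<string>();

    public void Write(BinaryWriter w)
    {
        w.Write(Delimiter);    // 1. delimiter
        w.Write(Items.Count);  // 2. item count
        foreach (var s in Items)
            w.Write(s);        // 3. each item
    }

    public void Read(BinaryReader r)
    {
        Delimiter = r.ReadString();       // 1. delimiter
        int count = r.ReadInt32();        // 2. item count
        Items = new List<string>(count);
        for (int i = 0; i < count; i++)
            Items.Add(r.ReadString());    // 3. each item
    }
}
```

Round-tripping an instance through a MemoryStream is a quick way to check that the two methods agree before deploying the assembly.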
Regarding your questions to suggested work-around:
Why does the above solve my problem?
It does not. It merely worked with your test scenario.
And why wouldn't it let me use more than one w.write call inside the FOR loop?
Write method needs to be compatible with Read method. If you write two strings and read only one then it is not going to work. The idea here is that your object may be removed from the memory and then loaded. This is what Write and Read are supposed to do. In your case - this indeed was happening and you were not able to keep the object value.
The answer given by @agsamek is correct but not complete. The query processor may instantiate multiple aggregators, e.g. for parallel computations, and the one that will eventually hold all data after successive calls of Merge() may be assigned an empty recordset, i.e. its Accumulate() method may never be called:
var concat1 = new GROUP_CONCAT();
concat1.Init();
results = getPartialResults(1); // no records returned
foreach (var result in results)
concat1.Accumulate(result[0], delimiter); // never called
...
var concat2 = new GROUP_CONCAT();
concat2.Init();
results = getPartialResults(2);
foreach (var result in results)
concat2.Accumulate(result[0], delimiter);
...
concat1.Merge(concat2);
...
result = concat1.Terminate();
In this scenario, concat1's private field delimiter used in Terminate() remains what it is by default in Init() but not what you pass in SQL. Luckily or not, your test SQL uses the same delimiter value as in Init(), so you can't reveal the difference.
I'm not sure if this is a bug or if it has been fixed in later versions (I stumbled on it in SQL Server 2008 R2). My workaround was to make use of the other group that is passed in Merge():
public void Merge(GROUP_CONCAT Group)
{
if (Group.list.Count != 0) // Group's Accumulate() has been called at least once
{
if (list.Count == 0) // this Accumulate() has not been called
delimiter = Group.delimiter;
list.AddRange(Group.list);
}
}
P.S. I would use StringBuilder instead of ArrayList.

Parallel iteration in C#?

Is there a way to do foreach style iteration over parallel enumerables in C#? For subscriptable lists, I know one could use a regular for loop iterating an int over the index range, but I really prefer foreach to for for a number of reasons.
Bonus points if it works in C# 2.0
.NET 4's BlockingCollection makes this pretty easy. Create a BlockingCollection, return its .GetConsumingEnumerable() in the enumerable method. Then the foreach simply adds to the blocking collection.
E.g.
private BlockingCollection<T> m_data = new BlockingCollection<T>();
public IEnumerable<T> GetData( IEnumerable<IEnumerable<T>> sources )
{
Task.Factory.StartNew( () => ParallelGetData( sources ) );
return m_data.GetConsumingEnumerable();
}
private void ParallelGetData( IEnumerable<IEnumerable<T>> sources )
{
foreach( var source in sources )
{
foreach( var item in source )
{
m_data.Add( item );
}
}
//Adding complete, the enumeration can stop now
m_data.CompleteAdding();
}
Hope this helps.
BTW I posted a blog about this last night
Andre
Short answer, no. foreach works on only one enumerable at a time.
However, if you combine your parallel enumerables into a single one, you can foreach over the combined. I am not aware of any easy, built in method of doing this, but the following should work (though I have not tested it):
public IEnumerable<TSource[]> Combine<TSource>(params object[] sources)
{
foreach(var o in sources)
{
// Choose your own exception
if(!(o is IEnumerable<TSource>)) throw new Exception();
}
var enums =
sources.Select(s => ((IEnumerable<TSource>)s).GetEnumerator())
.ToArray();
while(enums.All(e => e.MoveNext()))
{
yield return enums.Select(e => e.Current).ToArray();
}
}
Then you can foreach over the returned enumerable:
foreach(var v in Combine(en1, en2, en3))
{
// Remembering that v is an array of the type contained in en1,
// en2 and en3.
}
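For the common case of exactly two sequences, .NET 4 and later also ship Enumerable.Zip, which walks both in lockstep and stops at the shorter one. A minimal sketch, with ZipSketch and Pair as made-up names and the string formatting chosen only for illustration:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ZipSketch
{
    // Pairs each name with the age at the same position.
    public static List<string> Pair(IEnumerable<string> names, IEnumerable<int> ages)
    {
        return names.Zip(ages, (name, age) => name + ":" + age).ToList();
    }
}
```

The result can then be consumed with an ordinary foreach. Zip is not available in C# 2.0 or .NET 3.5, so the hand-written Combine above is still needed there.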
Zooba's answer is good, but you might also want to look at the answers to "How to iterate over two arrays at once".
I wrote an implementation of EachParallel() from the .NET4 Parallel library. It is compatible with .NET 3.5: Parallel ForEach Loop in C# 3.5
Usage:
string[] names = { "cartman", "stan", "kenny", "kyle" };
names.EachParallel(name =>
{
try
{
Console.WriteLine(name);
}
catch { /* handle exception */ }
});
Implementation:
/// <summary>
/// Enumerates through each item in a list in parallel
/// </summary>
public static void EachParallel<T>(this IEnumerable<T> list, Action<T> action)
{
// enumerate the list so it can't change during execution
list = list.ToArray();
var count = list.Count();
if (count == 0)
{
return;
}
else if (count == 1)
{
// if there's only one element, just execute it
action(list.First());
}
else
{
// Launch each method in its own thread
const int MaxHandles = 64;
for (var offset = 0; offset * MaxHandles < count; offset++)
{
// break up the list into 64-item chunks because of a limitation
// in WaitHandle
var chunk = list.Skip(offset * MaxHandles).Take(MaxHandles);
// Initialize the reset events to keep track of completed threads
var resetEvents = new ManualResetEvent[chunk.Count()];
// spawn a thread for each item in the chunk
int i = 0;
foreach (var item in chunk)
{
resetEvents[i] = new ManualResetEvent(false);
ThreadPool.QueueUserWorkItem(new WaitCallback((object data) =>
{
int methodIndex = (int)((object[])data)[0];
// Execute the method and pass in the enumerated item
action((T)((object[])data)[1]);
// Tell the calling thread that we're done
resetEvents[methodIndex].Set();
}), new object[] { i, item });
i++;
}
// Wait for all threads to execute
WaitHandle.WaitAll(resetEvents);
}
}
}
If you want to stick to the basics - I rewrote the currently accepted answer in a simpler way:
public static IEnumerable<TSource[]> Combine<TSource> (this IEnumerable<IEnumerable<TSource>> sources)
{
var enums = sources
.Select (s => s.GetEnumerator ())
.ToArray ();
while (enums.All (e => e.MoveNext ())) {
yield return enums.Select (e => e.Current).ToArray ();
}
}
public static IEnumerable<TSource[]> Combine<TSource> (params IEnumerable<TSource>[] sources)
{
return sources.Combine ();
}
Would this work for you?
public static class Parallel
{
public static void ForEach<T>(IEnumerable<T>[] sources,
Action<T> action)
{
foreach (var enumerable in sources)
{
ThreadPool.QueueUserWorkItem(source => {
foreach (var item in (IEnumerable<T>)source)
action(item);
}, enumerable);
}
}
}
// sample usage:
static void Main()
{
string[] s1 = { "1", "2", "3" };
string[] s2 = { "4", "5", "6" };
IEnumerable<string>[] sources = { s1, s2 };
Parallel.ForEach(sources, s => Console.WriteLine(s));
Thread.Sleep(0); // allow background threads to work
}
For C# 2.0, you need to convert the lambda expressions above to delegates.
Note: This utility method uses background threads. You may want to modify it to use foreground threads, and probably you'll want to wait till all threads finish. If you do that, I suggest you create sources.Length - 1 threads, and use the current executing thread for the last (or first) source.
(I wish I could include waiting for threads to finish in my code, but I'm sorry that I don't know how to do that yet. I guess you should use a WaitHandle or Thread.Join().)
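A minimal sketch of the Thread.Join() idea follows. It is not the code from the answer above: it starts one dedicated thread per source rather than using the thread pool or the current thread, and JoinSketch and ForEachSource are names invented for this example. It uses lambdas, so for C# 2.0 they would need converting to anonymous delegates as noted earlier.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class JoinSketch
{
    // Runs the action over each source on its own thread and
    // returns only after every thread has finished.
    public static void ForEachSource<T>(IEnumerable<T>[] sources, Action<T> action)
    {
        var threads = new List<Thread>();
        foreach (var source in sources)
        {
            var local = source; // per-iteration copy, safe to capture on older compilers
            var thread = new Thread(() =>
            {
                foreach (var item in local)
                    action(item);
            });
            thread.Start();
            threads.Add(thread);
        }

        // Block until all worker threads are done.
        foreach (var thread in threads)
            thread.Join();
    }
}
```

With this in place the Thread.Sleep(0) in the sample usage is no longer needed. The action still has to be thread-safe, since it is called from several threads at once.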
