Parallel.For loop for spell checking using NHunspell and c# - c#

I have a list of string and if I run spell checking using NHunspell in sequential manner then everything works fine; but if I use Parallel.For loop against the List the application stops working in the middle( some address violation error )
public static bool IsSpellingRight(string inputword, byte[] frDic, byte[] frAff, byte[] enDic, byte[] enAff)
{
if (inputword.Length != 0)
{
bool correct;
if (IsEnglish(inputword))
{
using (var hunspell = new Hunspell(enAff, enDic))
{
correct = hunspell.Spell(inputword);
}
}
else
{
using (var hunspell = new Hunspell(frAff, frDic))
{
correct = hunspell.Spell(inputword);
}
}
return correct ;
}
return false;
}
Edit:
var tokenSource = new CancellationTokenSource();
CancellationToken ct = tokenSource.Token;
var poptions = new ParallelOptions();
// Keep one core/CPU free...
poptions.MaxDegreeOfParallelism = Environment.ProcessorCount - 1;
Task task = Task.Factory.StartNew(delegate
{
Parallel.For(0, total, poptions, i =>
{
if (words[i] != "")
{
_totalWords++;
if (IsSpellingRight(words[i],dictFileBytes,
affFileBytes,dictFileBytesE,affFileBytesE))
{
// do something
}
else
{
BeginInvoke((Action) (() =>
{
//do something on UI thread
}));
}
}
});
}, tokenSource.Token);
task.ContinueWith((t) => BeginInvoke((Action) (() =>
{
MessaageBox.Show("Done");
})));

I think you should forget about your parallel loop implement the things right.
Are you aware of the fact that this code loads and constructs the dictionary:
using (var hunspell = new Hunspell(enAff, enDic))
{
correct = hunspell.Spell(inputword);
}
Your are loading and construction the dictionary over and over again with your code. This is awfully slow! Load Your dictionary once and check all words, then dispose it. And don't do this in parallel because Hunspell objects are not thread safe.
Pseodocode:
Hunspell hunspell = null;
try
{
hunspell = new Hunspell(enAff, enDic)
for( ... )
{
hunspell.Spell(inputword[i]);
}
}
}
finally
{
if( hunspell != null ) hunspell.Dispose();
}
If you need to check words massive in parallel consider to read this article:
http://www.codeproject.com/Articles/43769/Spell-Check-Hyphenation-and-Thesaurus-for-NET-with

Ok, now I can see a potential problem. In line
_totalWords++;
you're incrementing a value, that (I suppose) is declared somewhere outside the loop. Use locking mechanism.
edit:
Also, you could use Interlocked.Increment(ref val);, which would be faster, than simple locking.
edit2:
Here's how the locking I described in comment should look like for the problem you encounter:
static object Locker = new object(); //anywhere in the class
//then in your method
if (inputword.Length != 0)
{
bool correct;
bool isEnglish;
lock(Locker) {isEnglish = IsEnglish(inputword);}
if(isEnglish)
{
//..do your stuff
}
//rest of your function
}

Related

Asynchronous Task, video buffering

I am trying to understand Tasks in C# but still having some problems. I am trying to create an application containing video. The main purpose is to read the video from a file (I am using Emgu.CV) and send it via TCP/IP for process in a board and then back in a stream (real-time) way. Firstly, I did it in serial. So, reading a Bitmap, sending-receiving from board, and plotting. But reading the bitmaps and plotting them takes too much time. I would like to have a Transmit, Receive FIFO Buffers that save the video frames, and a different task that does the job of sending receiving each frame. So I would like to do it in parallel. I thought I should create 3 Tasks:
tasks.Add(Task.Run(() => Video_load(video_path)));
tasks.Add(Task.Run(() => Video_Send_Recv(video_path)));
tasks.Add(Task.Run(() => VideoDisp_hw(32)));
Which I would like to run "parallel". What type of object should I use? A concurrent queue? BufferBlock? or just a list?
Thanks for the advices! I would like to ask something. I am trying to create a simple console program with 2 TPL blocks. 1 Block would be Transform block (taking a message i.e. "start" ) and loading data to a List and another block would be ActionBlock (just reading the data from the list and printing them). Here is the code below:
namespace TPL_Dataflow
{
class Program
{
static void Main(string[] args)
{
Console.WriteLine("Hello World!");
Random randn = new Random();
var loadData = new TransformBlock<string, List<int>>(async sample_string =>
{
List<int> input_data = new List<int>();
int cnt = 0;
if (sample_string == "start")
{
Console.WriteLine("Inside loadData");
while (cnt < 16)
{
input_data.Add(randn.Next(1, 255));
await Task.Delay(1500);
Console.WriteLine("Cnt");
cnt++;
}
}
else
{
Console.WriteLine("Not started yet");
}
return input_data;
});
var PrintData = new ActionBlock<List<int>>(async input_data =>
{
while(input_data.Count > 0)
{
Console.WriteLine("output Data = " + input_data.First());
await Task.Delay(1000);
input_data.RemoveAt(0);
}
});
var linkOptions = new DataflowLinkOptions { PropagateCompletion = true };
loadData.LinkTo(PrintData, input_data => input_data.Count() >0 );
//loadData.LinkTo(PrintData, linkOptions);
loadData.SendAsync("start");
loadData.Complete();
PrintData.Completion.Wait();
}
}
}
But it seems to work in serial way.. What am I doing wrong? I tried to do the while loops async. I would like to do the 2 things in parallel. When data available from the List then plotted.
You could use a TransformManyBlock<string, int> as the producer block, and an ActionBlock<int> as the consumer block. The TransformManyBlock would be instantiated with the constructor that accepts a Func<string, IEnumerable<int>> delegate, and passed an iterator method (the Produce method in the example below) that yields values one by one:
Random random = new Random();
var producer = new TransformManyBlock<string, int>(Produce);
IEnumerable<int> Produce(string message)
{
if (message == "start")
{
int cnt = 0;
while (cnt < 16)
{
int value;
lock (random) value = random.Next(1, 255);
Console.WriteLine($"Producing #{value}");
yield return value;
Thread.Sleep(1500);
cnt++;
}
}
else
{
yield break;
}
}
var consumer = new ActionBlock<int>(async value =>
{
Console.WriteLine($"Received: {value}");
await Task.Delay(1000);
});
producer.LinkTo(consumer, new() { PropagateCompletion = true });
producer.Post("start");
producer.Complete();
consumer.Completion.Wait();
Unfortunately the producer has to block the worker thread during the idle period between yielding each value (Thread.Sleep(1500);), because the TransformManyBlock currently does not have a constructor that accepts a Func<string, IAsyncEnumerable<int>>. This will be probably fixed in the next release of the TPL Dataflow library. You could track this GitHub issue, to be informed about when this feature will be released.
Alternative solution: Instead of linking explicitly the producer and the consumer, you could keep them unlinked, and send manually the values produced by the producer to the consumer. In this case both blocks would be ActionBlocks:
Random random = new Random();
var consumer = new ActionBlock<int>(async value =>
{
Console.WriteLine($"Received: {value}");
await Task.Delay(1000);
});
var producer = new ActionBlock<string>(async message =>
{
if (message == "start")
{
int cnt = 0;
while (cnt < 16)
{
int value;
lock (random) value = random.Next(1, 255);
Console.WriteLine($"Producing #{value}");
var accepted = await consumer.SendAsync(value);
if (!accepted) break; // The consumer has failed
await Task.Delay(1500);
cnt++;
}
}
});
PropagateCompletion(producer, consumer);
producer.Post("start");
producer.Complete();
consumer.Completion.Wait();
async void PropagateCompletion(IDataflowBlock source, IDataflowBlock target)
{
try { await source.Completion.ConfigureAwait(false); } catch { }
var ex = source.Completion.IsFaulted ? source.Completion.Exception : null;
if (ex != null) target.Fault(ex); else target.Complete();
}
The main difficulty with this approach is how to propagate the completion of the producer to the consumer, so that eventually both blocks are completed. Obviously you can't use the new DataflowLinkOptions { PropagateCompletion = true } configuration, since the blocks are not linked explicitly. You also can't Complete manually the consumer, because in this case it would stop prematurely accepting values from the producer. The solution to this problem is the PropagateCompletion method shown in the above example.

Make using statement usable for multiple disposable objects

I have a bunch of text files in a folder, and all of them should have identical headers. In other words the first 100 lines of all files should be identical. So I wrote a function to check this condition:
private static bool CheckHeaders(string folderPath, int headersCount)
{
var enumerators = Directory.EnumerateFiles(folderPath)
.Select(f => File.ReadLines(f).GetEnumerator())
.ToArray();
//using (enumerators)
//{
for (int i = 0; i < headersCount; i++)
{
foreach (var e in enumerators)
{
if (!e.MoveNext()) return false;
}
var values = enumerators.Select(e => e.Current);
if (values.Distinct().Count() > 1) return false;
}
return true;
//}
}
The reason I am using enumerators is memory efficiency. Instead of loading all file contents in memory I enumerate the files concurrently line-by-line until a mismatch is found, or all headers have been examined.
My problem is evident by the commented lines of code. I would like to utilize a using block to safely dispose all the enumerators, but unfortunately using (enumerators) doesn't compile. Apparently using can handle only a single disposable object. I know that I can dispose the enumerators manually, by wrapping the whole thing in a try-finally block, and running the disposing logic in a loop inside finally, but is seems awkward. Is there any mechanism I could employ to make the using statement a viable option in this case?
Update
I just realized that my function has a serious flaw. The construction of the enumerators is not robust. A locked file can cause an exception, while some enumerators have already been created. These enumerators will not be disposed. This is something I want to fix. I am thinking about something like this:
var enumerators = Directory.EnumerateFiles(folderPath)
.ToDisposables(f => File.ReadLines(f).GetEnumerator());
The extension method ToDisposables should ensure that in case of an exception no disposables are left undisposed.
You can create a disposable-wrapper over your enumerators:
class DisposableEnumerable : IDisposable
{
private IEnumerable<IDisposable> items;
public event UnhandledExceptionEventHandler DisposalFailed;
public DisposableEnumerable(IEnumerable<IDisposable> items) => this.items = items;
public void Dispose()
{
foreach (var item in items)
{
try
{
item.Dispose();
}
catch (Exception e)
{
var tmp = DisposalFailed;
tmp?.Invoke(this, new UnhandledExceptionEventArgs(e, false));
}
}
}
}
and use it with the lowest impact to your code:
private static bool CheckHeaders(string folderPath, int headersCount)
{
var enumerators = Directory.EnumerateFiles(folderPath)
.Select(f => File.ReadLines(f).GetEnumerator())
.ToArray();
using (var disposable = new DisposableEnumerable(enumerators))
{
for (int i = 0; i < headersCount; i++)
{
foreach (var e in enumerators)
{
if (!e.MoveNext()) return false;
}
var values = enumerators.Select(e => e.Current);
if (values.Distinct().Count() > 1) return false;
}
return true;
}
}
The thing is you have to dispose those objects separately one by one anyway. But it's up to you where to encapsulate that logic. And the code I've suggested has no manual try-finally,)
To the second part of the question. If I get you right this should be sufficient:
static class DisposableHelper
{
public static IEnumerable<TResult> ToDisposable<TSource, TResult>(this IEnumerable<TSource> source,
Func<TSource, TResult> selector) where TResult : IDisposable
{
var exceptions = new List<Exception>();
var result = new List<TResult>();
foreach (var i in source)
{
try { result.Add(selector(i)); }
catch (Exception e) { exceptions.Add(e); }
}
if (exceptions.Count == 0)
return result;
foreach (var i in result)
{
try { i.Dispose(); }
catch (Exception e) { exceptions.Add(e); }
}
throw new AggregateException(exceptions);
}
}
Usage:
private static bool CheckHeaders(string folderPath, int headersCount)
{
var enumerators = Directory.EnumerateFiles(folderPath)
.ToDisposable(f => File.ReadLines(f).GetEnumerator())
.ToArray();
using (new DisposableEnumerable(enumerators))
{
for (int i = 0; i < headersCount; i++)
{
foreach (var e in enumerators)
{
if (!e.MoveNext()) return false;
}
var values = enumerators.Select(e => e.Current);
if (values.Distinct().Count() > 1) return false;
}
return true;
}
}
and
try
{
CheckHeaders(folderPath, headersCount);
}
catch(AggregateException e)
{
// Prompt to fix errors and try again
}
I'm going to suggest an approach that uses recursive calls to Zip to allow parallel enumeration of a normal IEnumerable<string> without the need to resort to using IEnumerator<string>.
bool Zipper(IEnumerable<IEnumerable<string>> sources, int take)
{
IEnumerable<string> ZipperImpl(IEnumerable<IEnumerable<string>> ss)
=> (!ss.Skip(1).Any())
? ss.First().Take(take)
: ss.First().Take(take).Zip(
ZipperImpl(ss.Skip(1)),
(x, y) => (x == null || y == null || x != y) ? null : x);
var matching_lines = ZipperImpl(sources).TakeWhile(x => x != null).ToArray();
return matching_lines.Length == take;
}
Now build up your enumerables:
IEnumerable<string>[] enumerables =
Directory
.EnumerateFiles(folderPath)
.Select(f => File.ReadLines(f))
.ToArray();
Now it's simple to call:
bool headers_match = Zipper(enumerables, 100);
Here's a trace of running this code against three files with more than 4 lines:
Ben Petering at 5:28 PM ACST
Ben Petering at 5:28 PM ACST
Ben Petering at 5:28 PM ACST
From a call 2019-05-23, James mentioned he’d like the ability to edit the current shipping price rules (eg in shipping_rules.xml) via the admin.
From a call 2019-05-23, James mentioned he’d like the ability to edit the current shipping price rules (eg in shipping_rules.xml) via the admin.
From a call 2019-05-23, James mentioned he’d like the ability to edit the current shipping price rules (eg in shipping_rules.xml) via the admin.
He also mentioned he’d like to be able to set different shipping price rules for a given time window, e.g. Jan 1 to Jan 30.
He also mentioned he’d like to be able to set different shipping price rules for a given time window, e.g. Jan 1 to Jan 30.
He also mentioned he’d like to be able to set different shipping price rules for a given time window, e.g. Jan 1 to Jan 30.
These storyishes should be considered when choosing the appropriate module to use.
These storyishes should be considered when choosing the appropriate module to use.X
These storyishes should be considered when choosing the appropriate module to use.
Note that the enumerations stop when they encountered a mismatch header in the 4th line on the second file. All enumerations then stopped.
Creating an IDisposable wrapper as #Alex suggested is correct. It needs just a logic to dispose already opened files if some of them is locked and probably some logic for error states. Maybe something like this (error state logic is very simple):
public class HeaderChecker : IDisposable
{
private readonly string _folderPath;
private readonly int _headersCount;
private string _lockedFile;
private readonly List<IEnumerator<string>> _files = new List<IEnumerator<string>>();
public HeaderChecker(string folderPath, int headersCount)
{
_folderPath = folderPath;
_headersCount = headersCount;
}
public string LockedFile => _lockedFile;
public bool CheckFiles()
{
_lockedFile = null;
if (!TryOpenFiles())
{
return false;
}
if (_files.Count == 0)
{
return true; // Not sure what to return here.
}
for (int i = 0; i < _headersCount; i++)
{
if (!_files[0].MoveNext()) return false;
string currentLine = _files[0].Current;
for (int fileIndex = 1; fileIndex < _files.Count; fileIndex++)
{
if (!_files[fileIndex].MoveNext()) return false;
if (_files[fileIndex].Current != currentLine) return false;
}
}
return true;
}
private bool TryOpenFiles()
{
bool result = true;
foreach (string file in Directory.EnumerateFiles(_folderPath))
{
try
{
_files.Add(File.ReadLines(file).GetEnumerator());
}
catch
{
_lockedFile = file;
result = false;
break;
}
}
if (!result)
{
DisposeCore(); // Close already opened files.
}
return result;
}
private void DisposeCore()
{
foreach (var item in _files)
{
try
{
item.Dispose();
}
catch
{
}
}
_files.Clear();
}
public void Dispose()
{
DisposeCore();
}
}
// Usage
using (var checker = new HeaderChecker(folderPath, headersCount))
{
if (!checker.CheckFiles())
{
if (checker.LockedFile is null)
{
// Error while opening files.
}
else
{
// Headers do not match.
}
}
}
I also removed .Select() and .Distinct() when checking the lines. The first just iterates over the enumerators array - the same as foreach above it, so you are enumerating this array twice. Then creates a new list of lines and .Distinct() enumerates over it.

waitall for multiple handles on sta thread is not supported [duplicate]

This question already has answers here:
WaitAll for multiple handles on a STA thread is not supported
(5 answers)
Closed 9 years ago.
Hi all i have this exception when i run my app.
i work on .net 3.5 so i cannot use Task
waitall for multiple handles on sta thread is not supported
this is the code :-
private void ThreadPopFunction(ContactList SelectedContactList, List<User> AllSelectedUsers)
{
int NodeCount = 0;
AllSelectedUsers.EachParallel(user =>
{
NodeCount++;
if (user != null)
{
if (user.OCSEnable)
{
string messageExciption = string.Empty;
if (!string.IsNullOrEmpty(user.SipURI))
{
//Lync.Lync.Lync lync = new Lync.Lync.Lync(AdObjects.Pools);
List<Pool> myPools = AdObjects.Pools;
if (new Lync.Lync.Lync(myPools).Populate(user, SelectedContactList, out messageExciption))
{
}
}
}
}
});
}
and this is my extension method i use to work with multithreading
public static void EachParallel<T>(this IEnumerable<T> list, Action<T> action)
{
// enumerate the list so it can't change during execution
// TODO: why is this happening?
list = list.ToArray();
var count = list.Count();
if (count == 0)
{
return;
}
else if (count == 1)
{
// if there's only one element, just execute it
action(list.First());
}
else
{
// Launch each method in it's own thread
const int MaxHandles = 64;
for (var offset = 0; offset <= count/MaxHandles; offset++)
{
// break up the list into 64-item chunks because of a limitiation in WaitHandle
var chunk = list.Skip(offset*MaxHandles).Take(MaxHandles);
// Initialize the reset events to keep track of completed threads
var resetEvents = new ManualResetEvent[chunk.Count()];
// spawn a thread for each item in the chunk
int i = 0;
foreach (var item in chunk)
{
resetEvents[i] = new ManualResetEvent(false);
ThreadPool.QueueUserWorkItem(new WaitCallback((object data) =>
{
int methodIndex =
(int) ((object[]) data)[0];
// Execute the method and pass in the enumerated item
action((T) ((object[]) data)[1]);
// Tell the calling thread that we're done
resetEvents[methodIndex].Set();
}), new object[] {i, item});
i++;
}
// Wait for all threads to execute
WaitHandle.WaitAll(resetEvents);
}
}
}
if you can help me, i'll appreciate your support
OK, as you're using .Net 3.5, you can't use the TPL introduced with .Net 4.0.
STA thread or not, in your case there is a way more simple/efficient approach than WaitAll. You could simply have a counter and a unique WaitHandle. Here's some code (can't test it right now, but it should be fine):
// No MaxHandle limitation ;)
for (var offset = 0; offset <= count; offset++)
{
// Initialize the reset event
var resetEvent = new ManualResetEvent();
// Queue action in thread pool for each item in the list
int counter = count;
foreach (var item in list)
{
ThreadPool.QueueUserWorkItem(new WaitCallback((object data) =>
{
int methodIndex =
(int) ((object[]) data)[0];
// Execute the method and pass in the enumerated item
action((T) ((object[]) data)[1]);
// Decrements counter atomically
Interlocked.Decrement(ref counter);
// If we're at 0, then last action was executed
if (Interlocked.Read(ref counter) == 0)
{
resetEvent.Set();
}
}), new object[] {i, item});
}
// Wait for the single WaitHandle
// which is only set when the last action executed
resetEvent.WaitOne();
}
Also FYI, ThreadPool.QueueUserWorkItem doesn't spawn a thread each time it's called (I'm saying that because of the comment "spawn a thread for each item in the chunk"). It uses a pool of thread, so it mostly reuses existing threads.
For those like me, who need to use the examples.
ken2k's solution is great and it works but with a few corrections (he said he didn't test it). Here is ken2k's working example (worked for me):
// No MaxHandle limitation ;)
for (var offset = 0; offset <= count; offset++)
{
// Initialize the reset event
var resetEvent = new ManualResetEvent(false);
// Queue action in thread pool for each item in the list
long counter = count;
// use a thread for each item in the chunk
int i = 0;
foreach (var item in list)
{
ThreadPool.QueueUserWorkItem(new WaitCallback((object data) =>
{
int methodIndex =
(int) ((object[]) data)[0];
// Execute the method and pass in the enumerated item
action((T) ((object[]) data)[1]);
// Decrements counter atomically
Interlocked.Decrement(ref counter);
// If we're at 0, then last action was executed
if (Interlocked.Read(ref counter) == 0)
{
resetEvent.Set();
}
}), new object[] {i, item});
}
// Wait for the single WaitHandle
// which is only set when the last action executed
resetEvent.WaitOne();
}
Actually there is a way to use (at least a good part) of the TPL in .net 3.5. There is a backport that was done for the Rx-Project.
You can find it here: http://www.nuget.org/packages/TaskParallelLibrary
Maybe this will help.

Parallel.Foreach + yield return?

I want to process something using parallel loop like this :
public void FillLogs(IEnumerable<IComputer> computers)
{
Parallel.ForEach(computers, cpt=>
{
cpt.Logs = cpt.GetRawLogs().ToList();
});
}
Ok, it works fine. But How to do if I want the FillLogs method return an IEnumerable ?
public IEnumerable<IComputer> FillLogs(IEnumerable<IComputer> computers)
{
Parallel.ForEach(computers, cpt=>
{
cpt.Logs = cpt.GetRawLogs().ToList();
yield return cpt // KO, don't work
});
}
EDIT
It seems not to be possible... but I use something like this :
public IEnumerable<IComputer> FillLogs(IEnumerable<IComputer> computers)
{
return computers.AsParallel().Select(cpt => cpt);
}
But where I put the cpt.Logs = cpt.GetRawLogs().ToList(); instruction
Short version - no, that isn't possible via an iterator block; the longer version probably involves synchronized queue/dequeue between the caller's iterator thread (doing the dequeue) and the parallel workers (doing the enqueue); but as a side note - logs are usually IO-bound, and parallelising things that are IO-bound often doesn't work very well.
If the caller is going to take some time to consume each, then there may be some merit to an approach that only processes one log at a time, but can do that while the caller is consuming the previous log; i.e. it begins a Task for the next item before the yield, and waits for completion after the yield... but that is again, pretty complex. As a simplified example:
static void Main()
{
foreach(string s in Get())
{
Console.WriteLine(s);
}
}
static IEnumerable<string> Get() {
var source = new[] {1, 2, 3, 4, 5};
Task<string> outstandingItem = null;
Func<object, string> transform = x => ProcessItem((int) x);
foreach(var item in source)
{
var tmp = outstandingItem;
// note: passed in as "state", not captured, so not a foreach/capture bug
outstandingItem = new Task<string>(transform, item);
outstandingItem.Start();
if (tmp != null) yield return tmp.Result;
}
if (outstandingItem != null) yield return outstandingItem.Result;
}
static string ProcessItem(int i)
{
return i.ToString();
}
I don't want to be offensive, but maybe there is a lack of understanding. Parallel.ForEach means that the TPL will run the foreach according to the available hardware in several threads. But that means, that ii is possible to do that work in parallel! yield return gives you the opportunity to get some values out of a list (or what-so-ever) and give them back one-by-one as they are needed. It prevents of the need to first find all items matching the condition and then iterate over them. That is indeed a performance advantage, but can't be done in parallel.
Although the question is old I've managed to do something just for fun.
class Program
{
static void Main(string[] args)
{
foreach (var message in GetMessages())
{
Console.WriteLine(message);
}
}
// Parallel yield
private static IEnumerable<string> GetMessages()
{
int total = 0;
bool completed = false;
var batches = Enumerable.Range(1, 100).Select(i => new Computer() { Id = i });
var qu = new ConcurrentQueue<Computer>();
Task.Run(() =>
{
try
{
Parallel.ForEach(batches,
() => 0,
(item, loop, subtotal) =>
{
Thread.Sleep(1000);
qu.Enqueue(item);
return subtotal + 1;
},
result => Interlocked.Add(ref total, result));
}
finally
{
completed = true;
}
});
int current = 0;
while (current < total || !completed)
{
SpinWait.SpinUntil(() => current < total || completed);
if (current == total) yield break;
current++;
qu.TryDequeue(out Computer computer);
yield return $"Completed {computer.Id}";
}
}
}
public class Computer
{
public int Id { get; set; }
}
Compared to Koray's answer this one really uses all the CPU cores.
You can use the following extension method
public static class ParallelExtensions
{
public static IEnumerable<T1> OrderedParallel<T, T1>(this IEnumerable<T> list, Func<T, T1> action)
{
var unorderedResult = new ConcurrentBag<(long, T1)>();
Parallel.ForEach(list, (o, state, i) =>
{
unorderedResult.Add((i, action.Invoke(o)));
});
var ordered = unorderedResult.OrderBy(o => o.Item1);
return ordered.Select(o => o.Item2);
}
}
use like:
public void FillLogs(IEnumerable<IComputer> computers)
{
cpt.Logs = computers.OrderedParallel(o => o.GetRawLogs()).ToList();
}
Hope this will save you some time.
How about
Queue<string> qu = new Queue<string>();
bool finished = false;
Task.Factory.StartNew(() =>
{
Parallel.ForEach(get_list(), (item) =>
{
string itemToReturn = heavyWorkOnItem(item);
lock (qu)
qu.Enqueue(itemToReturn );
});
finished = true;
});
while (!finished)
{
lock (qu)
while (qu.Count > 0)
yield return qu.Dequeue();
//maybe a thread sleep here?
}
Edit:
I think this is better:
public static IEnumerable<TOutput> ParallelYieldReturn<TSource, TOutput>(this IEnumerable<TSource> source, Func<TSource, TOutput> func)
{
ConcurrentQueue<TOutput> qu = new ConcurrentQueue<TOutput>();
bool finished = false;
AutoResetEvent re = new AutoResetEvent(false);
Task.Factory.StartNew(() =>
{
Parallel.ForEach(source, (item) =>
{
qu.Enqueue(func(item));
re.Set();
});
finished = true;
re.Set();
});
while (!finished)
{
re.WaitOne();
while (qu.Count > 0)
{
TOutput res;
if (qu.TryDequeue(out res))
yield return res;
}
}
}
Edit2: I agree with the short No answer. This code is useless; you cannot break the yield loop.

ThreadPool QueueUserWorkItem with list

I would like to use the QueueUserWorkItem from the ThreadPool. When I use the following code everything works well.
private int ThreadCountSemaphore = 0;
private void (...) {
var reportingDataList = new List<LBReportingData>();
ThreadCountSemaphore = reportingDataList.Count;
using (var autoResetEvent = new AutoResetEvent(false)) {
ThreadPool.QueueUserWorkItem((o) => this.FillReportingData(settings, reportingDataList[0], autoResetEvent));
ThreadPool.QueueUserWorkItem((o) => this.FillReportingData(settings, reportingDataList[1], autoResetEvent));
ThreadPool.QueueUserWorkItem((o) => this.FillReportingData(settings, reportingDataList[2], autoResetEvent));
}
}
private void FillReportingData(...) {
if (Interlocked.Decrement(ref this.ThreadCountSemaphore) == 0) {
waitHandle.Set();
}
}
But when I use a list instead the single method calls, then my program crash without an exception.
private void (...) {
var reportingDataList = new List<LBReportingData>();
ThreadCountSemaphore = reportingDataList.Count;
using (var autoResetEvent = new AutoResetEvent(false)) {
ThreadPool.QueueUserWorkItem((o) => this.FillReportingData(settings, reportingDataList[i], autoResetEvent));
}
}
What do i wrong? What should I change?
Update
Sorry, I've made a fault in the code. I use .NET 2.0 with VS2010.
Here's the complete code:
private int ThreadCountSemaphore = 0;
private IList<LBReportingData> LoadReportsForBatch() {
var reportingDataList = new List<LBReportingData>();
var settings = OnNeedEntitySettings();
if (settings.Settings.ReportDefinition != null) {
var definitionList = new List<ReportDefinitionen> { ReportDefinitionen.OrgStatus, ReportDefinitionen.Mittelwerte, ReportDefinitionen.Verteilungsstatistik };
using (var autoResetEvent = new AutoResetEvent(false)) {
foreach (var reportDefinition in definitionList) {
foreach (DataRow row in settings.Settings.ReportDefinition.Select("AuswertungsTyp = " + (int)reportDefinition)) {
reportingDataList.Add(new LBReportingData { SourceData = row, ReportType = reportDefinition });
}
}
ThreadCountSemaphore = reportingDataList.Count;
foreach(var reportingDataItem in reportingDataList) {
ThreadPool.QueueUserWorkItem((o) => this.FillReportingData(settings, reportingDataItem, autoResetEvent));
}
autoResetEvent.WaitOne();
}
}
return reportingDataList;
}
private void FillReportingData(IEntitySettings<DSLBUReportDefinition> settings, LBReportingData reportingData, AutoResetEvent waitHandle){
DoSomeWork();
if (Interlocked.Decrement(ref this.ThreadCountSemaphore) == 0) {
waitHandle.Set();
}
}
Thanks
You are disposing the WaitHandle immediately after queueing the work items. There is race between the call to Dispose in the main thread and Set in the worker thread. There may be other problems, but it is difficult to guess because the code is incomplete.
Here is how the pattern is suppose to work.
using (var finished = new CountdownEvent(1))
{
foreach (var item in reportingDataList)
{
var captured = item;
finished.AddCount();
ThreadPool.QueueUserWorkItem(
(state) =>
{
try
{
DoSomeWork(captured); // FillReportingData?
}
finally
{
finished.Signal();
}
}, null);
}
finished.Signal();
finished.Wait();
}
The code uses the CountdownEvent class. It is available in .NET 4.0 or as part of the Reactive Extensions download.
As Hans pointed out, it is not clear where "i" is coming from. But also I can see your disposing block going out and disposed because you are not using WaitOne on it (or you have not copied that part of code).
Also I would prefer to use WaitAll and not using interlocked.

Categories