C# Multi-Threaded Tree traversal

I am trying to write a C# system that traverses a tree structure using multiple threads. Another way to look at this is that the consumer of the BlockingCollection is also the producer.
The problem I am having is telling when everything is finished.
The test I really need is whether all the threads are blocked on the TryTake.
If they are, then everything has finished, but I cannot find a way to test for this or wrap it with anything that would help achieve this.
The code below is a very simple example of what I have so far, but there is a condition in which it can fail. Suppose the first thread has just passed test.TryTake(out v, -1), pulling the last item from the collection, but has not yet executed s.Release(). If the second thread now evaluates if (s.CurrentCount == 0 && test.Count == 0), the test could return true and incorrectly start finishing things up.
But then the first thread would continue on and try to add more to the collection.
If I could make the lines:
if (!test.TryTake(out v, -1))
break;
s.Release();
atomic then I believe this code would work. (Which is obviously not possible.)
But I cannot figure out how to fix this flaw.
class Program
{
private static BlockingCollection<int> test;
static void Main(string[] args)
{
test = new BlockingCollection<int>();
WorkClass.s = new SemaphoreSlim(2);
WorkClass w0 = new WorkClass("A");
WorkClass w1 = new WorkClass("B");
Thread t0 = new Thread(w0.WorkFunction);
Thread t1 = new Thread(w1.WorkFunction);
test.Add(10);
t0.Start();
t1.Start();
t0.Join();
t1.Join();
Console.WriteLine("Done");
Console.ReadLine();
}
class WorkClass
{
public static SemaphoreSlim s;
private readonly string _name;
public WorkClass(string name)
{
_name = name;
}
public void WorkFunction(object t)
{
while (true)
{
int v;
s.Wait();
if (s.CurrentCount == 0 && test.Count == 0)
test.CompleteAdding();
if (!test.TryTake(out v, -1))
break;
s.Release();
Console.WriteLine(_name + " = " + v);
Thread.Sleep(5);
for (int i = 0; i < v; i++)
test.Add(i);
}
Console.WriteLine("Done " + _name);
}
}
}

This can be parallelized using task parallelism. Every node in the tree is considered to be a task which may spawn sub-tasks. See Dynamic Task Parallelism for a more detailed description.
For a binary tree with 5 levels that writes each node to the console and waits for 5 milliseconds, as in your example, the ParallelWalk method could look as follows:
class Program
{
internal class TreeNode
{
internal TreeNode(int level)
{
Level = level;
}
internal int Level { get; }
}
static void Main(string[] args)
{
ParallelWalk(new TreeNode(0));
Console.Read();
}
static void ParallelWalk(TreeNode node)
{
if (node == null) return;
Console.WriteLine(node.Level);
Thread.Sleep(5);
if(node.Level > 4) return;
int nextLevel = node.Level + 1;
var t1 = Task.Factory.StartNew(
() => ParallelWalk(new TreeNode(nextLevel)));
var t2 = Task.Factory.StartNew(
() => ParallelWalk(new TreeNode(nextLevel)));
Task.WaitAll(t1, t2);
}
}
The central lines are the ones where the tasks t1 and t2 are spawned.
With this decomposition into tasks, the scheduling is done by the Task Parallel Library and you no longer have to manage a shared set of nodes.
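If the shared BlockingCollection from the question has to stay, one possible way to detect completion is to count outstanding items instead of counting idle threads. The following is only a sketch under that assumption, not the approach of the answer above: PendingCountTraversal, Run, pending and processed are hypothetical names. The counter is incremented before every Add and decremented once an item's children have all been added, so it can only reach zero when nothing is queued and nothing is in flight.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

public static class PendingCountTraversal
{
    // Processes 'root' and every item it spawns; returns how many items were processed.
    public static int Run(int root, int workerCount)
    {
        var queue = new BlockingCollection<int>();
        int pending = 1;    // items added but not yet fully processed (the root counts)
        int processed = 0;
        queue.Add(root);

        var threads = new Thread[workerCount];
        for (int t = 0; t < workerCount; t++)
        {
            threads[t] = new Thread(() =>
            {
                // Blocks while the queue is empty; ends after CompleteAdding.
                foreach (int v in queue.GetConsumingEnumerable())
                {
                    Interlocked.Increment(ref processed);
                    for (int i = 0; i < v; i++)
                    {
                        // Count the child before it becomes visible to other threads.
                        Interlocked.Increment(ref pending);
                        queue.Add(i);
                    }
                    // This item is done; if it was the last outstanding one, finish up.
                    if (Interlocked.Decrement(ref pending) == 0)
                        queue.CompleteAdding();
                }
            });
            threads[t].Start();
        }
        foreach (Thread thread in threads)
            thread.Join();
        return processed;
    }
}
```

Because a thread only decrements the counter after it has added all of its own children, no thread can still be about to call Add when CompleteAdding runs.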

Related

C# infinite task loop using Task<> class + cancellation

I'm trying to make a small class for multithreading in my WinForms projects.
I tried Threads (problems with the UI) and BackgroundWorker (something went wrong with the UI too, so I'm leaving it for now :)), and now I'm trying to do it with the Task class. But I can't understand how to make an infinite loop and a cancelling method (in the class) for all running tasks.
The examples I found are all meant to be used within one method.
So, here is the structure and code of the currently working part (Worker.cs and the methods used in the WinForms code).
Worker.cs
class Worker
{
public static int threadCount { get; set; }
public void doWork(ParameterizedThreadStart method)
{
Task[] tasks = Enumerable.Range(0, 4).Select(i => Task.Factory.StartNew(() => method(i))).ToArray();
}
}
Usage in
Form1.cs
private void Start_btn_Click(object sender, EventArgs e)
{
Worker.threadCount = 1; // not actually used yet; the number of tasks is temporarily hard-coded in the class
Worker worker = new Worker();
worker.doWork(Job);
string logString_1 = string.Format("Starting {0} threads...", Worker.threadCount);
log(logString_1);
}
public static int j = 0;
private void Job(object sender)
{
Worker worker = new Worker();
Random r = new Random();
log("Thread "+Thread.CurrentThread.ManagedThreadId +" is working...");
for (int i = 0; i < 5; i++)
{
j++;
log("J==" + j);
if (j == 50)
{
//worker.Stop();
log("STOP");
}
}
Thread.Sleep(r.Next(500, 1000));
}
So, as an example, it runs 4 threads, they execute, and I get J==20 in my log, which is OK.
My question is how to implement an infinite loop for the tasks created by the Worker.doWork() method,
and also how to make a .Stop() method for the Worker class (which should just stop all tasks when called). As I understand it, these are related questions, so I put them in one.
I tried some solutions, but all of them are based on CancellationToken, and I have to create this object inside the Worker.doWork() method only, so I can't use the same token to create a Worker.Stop() method.
Can someone help? The number of threads I have to use in this software ranges from about 5 to 200.
The J computation is just an example of a simple condition used to stop the software's work (stopping the tasks/threads).
In reality, the stop conditions are mostly things like a Queue<> being finished or a List<> being empty.
Finally, I got it working.
class Worker
{
public static int threadCount { get; set; }
Task[] tasks;
//ex data
public static string exception;
static CancellationTokenSource wtoken = new CancellationTokenSource();
CancellationToken cancellationToken = wtoken.Token;
public void doWork(ParameterizedThreadStart method)
{
try
{
tasks = Enumerable.Range(0, 4).Select(i => Task.Factory.StartNew(() =>
{
while (!cancellationToken.IsCancellationRequested)
{
method(i);
}
}, cancellationToken)).ToArray();
}
catch (Exception ex) { exception = ex.Message; }
}
public void HardStop()
{
try
{
using (wtoken)
{
wtoken.Cancel();
}
wtoken = null;
tasks = null;
}
catch (Exception ex) { exception = ex.Message; }
}
}
But if I use cancellationToken.ThrowIfCancellationRequested(); to quit,
I get an error:
when the Job() method reaches J == 50 and the worker.HardStop() function is called, the program window crashes and I get the exception "OperationCanceledException was unhandled by user code"
on this line:
cancellationToken.ThrowIfCancellationRequested();
So, what's wrong? I have already put it in try{} catch(){}.
As I understood it, only some boolean properties of the Task should change (Task.IsCanceled == false, Task.IsFaulted == true) on wtoken.Cancel();
I'd avoid all of the mucking around with tasks and use Microsoft's Reactive Framework (NuGet "Rx-Main") for this.
Here's how:
var r = new Random();
var query =
Observable
.Range(0, 4, Scheduler.Default)
.Select(i =>
Observable
.Generate(0, x => true, x => x, x => x,
x => TimeSpan.FromMilliseconds(r.Next(500, 1000)),
Scheduler.Default)
.Select(x => i))
.Merge();
var subscription =
query
.Subscribe(i => method(i));
And when you want to cancel the calls to method just do this:
subscription.Dispose();
I've tested this and it works like a treat.
If I wrap this up in your worker class then it looks like this:
class Worker
{
private Random _r = new Random();
private IDisposable _subscription = null;
public void doWork(ParameterizedThreadStart method)
{
_subscription =
Observable
.Range(0, 4, Scheduler.Default)
.Select(n =>
Observable
.Generate(
0, x => true, x => x, x => x,
x => TimeSpan.FromMilliseconds(_r.Next(500, 1000)),
Scheduler.Default)
.Select(x => n))
.Merge()
.Subscribe(i => method(i));
}
public void HardStop()
{
_subscription.Dispose();
}
}
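For readers who want to stay with tasks, the cancellation question can also be approached by letting the worker own the CancellationTokenSource and observe the cancellation when it waits for the tasks. This is a minimal sketch, not the code of either post: CancellableWorker, DoWork and Stop are hypothetical names. It relies on the fact that an exception thrown inside a task delegate is not caught by a try/catch around Task.Factory.StartNew; it surfaces as an AggregateException when the task is waited on.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public class CancellableWorker
{
    private readonly CancellationTokenSource _cts = new CancellationTokenSource();
    private Task[] _tasks = new Task[0];

    // Starts 'taskCount' tasks that call 'method' in a loop until Stop is called.
    public void DoWork(Action<int> method, int taskCount)
    {
        CancellationToken token = _cts.Token;
        _tasks = Enumerable.Range(0, taskCount)
            .Select(i => Task.Factory.StartNew(() =>
            {
                while (true)
                {
                    // Throws OperationCanceledException once Cancel has been called.
                    token.ThrowIfCancellationRequested();
                    method(i);
                }
            }, token, TaskCreationOptions.LongRunning, TaskScheduler.Default))
            .ToArray();
    }

    // Requests cancellation and waits until every task has ended.
    public void Stop()
    {
        _cts.Cancel();
        try
        {
            Task.WaitAll(_tasks);
        }
        catch (AggregateException ex)
        {
            // Cancelled tasks are expected here; anything else is rethrown.
            ex.Handle(e => e is OperationCanceledException);
        }
    }
}
```

Stop blocks until the tasks have finished, so in this sketch it is meant to be called from outside the worker's own tasks (for example from a button handler), not from inside the method being run.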

What is the best scenario for one fast producer multiple slow consumers?

I'm looking for the best scenario to implement one producer multiple consumer multithreaded application.
Currently I'm using one queue for shared buffer but it's much slower than the case of one producer one consumer.
I'm planning to do it like this:
Queue<item>[] buffs = new Queue<item>[N];
object[] _locks = new object[N];
static void Produce()
{
int curIndex = 0;
while(true)
{
// Produce item;
lock(_locks[curIndex])
{
buffs[curIndex].Enqueue(curItem);
Monitor.Pulse(_locks[curIndex]);
}
curIndex = (curIndex+1)%N;
}
}
static void Consume(int myIndex)
{
item curItem;
while(true)
{
lock(_locks[myIndex])
{
while(buffs[myIndex].Count == 0)
Monitor.Wait(_locks[myIndex]);
curItem = buffs[myIndex].Dequeue();
}
// Consume item;
}
}
static void main()
{
int N = 100;
Thread[] consumers = new Thread[N];
for(int i = 0; i < N; i++)
{
consumers[i] = new Thread(Consume);
consumers[i].Start(i);
}
Thread producer = new Thread(Produce);
producer.Start();
}
Use a BlockingCollection
BlockingCollection<item> _buffer = new BlockingCollection<item>();
static void Produce()
{
while(true)
{
// Produce item;
_buffer.Add(curItem);
}
// eventually stop producing
_buffer.CompleteAdding();
}
static void Consume(int myIndex)
{
foreach (var curItem in _buffer.GetConsumingEnumerable())
{
// Consume item;
}
}
static void main()
{
int N = 100;
Thread[] consumers = new Thread[N];
for(int i = 0; i < N; i++)
{
consumers[i] = new Thread(Consume);
consumers[i].Start(i);
}
Thread producer = new Thread(Produce);
producer.Start();
}
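If you don't want to specify the number of threads up front, you can use Parallel.ForEach instead. (GetConsumingPartitioner is not part of the base class library; as far as I know it is an extension method from the Parallel Extensions Extras samples.)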
static void Consume(item curItem)
{
// consume item
}
void Main()
{
Thread producer = new Thread(Produce);
producer.Start();
Parallel.ForEach(_buffer.GetConsumingPartitioner(), Consume);
}
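The BlockingCollection approach above can be put together into a small runnable program. This is only a sketch with made-up specifics: ProducerConsumerDemo and Run are hypothetical names, the items are plain ints, and the bounded capacity of 100 is an arbitrary choice that makes a fast producer block instead of growing the queue without limit.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

public static class ProducerConsumerDemo
{
    // Produces 'itemCount' items on the calling thread and returns how many were consumed.
    public static int Run(int itemCount, int consumerCount)
    {
        using (var buffer = new BlockingCollection<int>(boundedCapacity: 100))
        {
            int consumed = 0;
            var consumers = new Thread[consumerCount];
            for (int c = 0; c < consumerCount; c++)
            {
                consumers[c] = new Thread(() =>
                {
                    // Each item is handed to exactly one consumer.
                    foreach (int item in buffer.GetConsumingEnumerable())
                        Interlocked.Increment(ref consumed);
                });
                consumers[c].Start();
            }

            for (int i = 0; i < itemCount; i++)
                buffer.Add(i);          // blocks while the buffer is full
            buffer.CompleteAdding();    // lets the consumers' loops end

            foreach (Thread consumer in consumers)
                consumer.Join();
            return consumed;
        }
    }
}
```

Calling ProducerConsumerDemo.Run(1000, 4) returns once all four consumers have drained the buffer.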
Using more threads won't help. It may even reduce performance. I suggest you try to use ThreadPool where every work item is one item created by the producer. However, that doesn't guarantee the produced items to be consumed in the order they were produced.
Another way could be to reduce the number of consumers to 4, for example and modify the way they work as follows:
The producer adds the new work to the queue. There's only one global queue for all worker threads. It then sets a flag to indicate there is new work like this:
ManualResetEvent workPresent = new ManualResetEvent(false);
Queue<item> workQueue = new Queue<item>();
static void Produce()
{
while(true)
{
// Produce item;
lock(workQueue)
{
workQueue.Enqueue(newItem);
workPresent.Set();
}
}
}
The consumers wait for work to be added to the queue. Only one consumer will get to do its job. It then takes all the work from the queue and resets the flag. The producer will not be able to add new work until that is done.
static void Consume()
{
while(true)
{
if (workPresent.WaitOne())
{
workPresent.Reset();
Queue<item> localWorkQueue = new Queue<item>();
lock(workQueue)
{
while (workQueue.Count > 0)
localWorkQueue.Enqueue(workQueue.Dequeue());
}
// Handle items in local work queue
...
}
}
}
The outcome of this, however, is a bit unpredictable. It could be that one thread does all the work and the others do nothing.
I don't see why you have to use multiple queues. Just reduce the amount of locking. Here is a sample where you can have a large number of consumers and they all wait for new work.
public class MyWorkGenerator
{
ConcurrentQueue<object> _queuedItems = new ConcurrentQueue<object>();
private object _lock = new object();
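public void Produce()
{
while (true)
{
_queuedItems.Enqueue(new object());
// Monitor.Pulse throws SynchronizationLockException unless the lock is held.
lock (_lock)
{
Monitor.Pulse(_lock);
}
}
}
public object Consume(TimeSpan maxWaitTime)
{
// Monitor.Wait must likewise be called while holding the lock.
lock (_lock)
{
if (!Monitor.Wait(_lock, maxWaitTime))
return null;
}
object workItem;
if (_queuedItems.TryDequeue(out workItem))
{
return workItem;
}
return null;
}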
}
Do note that Pulse() will only trigger one consumer at a time.
Example usage:
static void main()
{
var generator = new MyWorkGenerator();
var consumers = new Thread[20];
for (int i = 0; i < consumers.Length; i++)
{
consumers[i] = new Thread(DoWork);
consumers[i].Start(generator);
}
generator.Produce();
}
public static void DoWork(object state)
{
var generator = (MyWorkGenerator) state;
var workItem = generator.Consume(TimeSpan.FromHours(1));
while (workItem != null)
{
// do work
workItem = generator.Consume(TimeSpan.FromHours(1));
}
}
Note that the actual queue is hidden in the producer, as it is, in my opinion, an implementation detail. The consumers don't really have to know how the work items are generated.

loop and thread are working parallel

My case is:
a loop and a thread are running in parallel. I want to pause the loop until the thread has finished its work; once the thread's state is Stopped, I want the loop to continue.
for (int pp = 0; pp < LstIop.Count; pp++)
{
oCntrlImageDisplay = new CntrlImageDisplay();
oCntrlImageEdit = new CntrlImageEdit();
axAcroPDF1 = new AxAcroPDFLib.AxAcroPDF();
int pages = ConvertFileIntoBinary(LstIop[pp].Input, oCntrlImageEdit);
oCntrlImageDisplay.ImgDisplay.Image = LstIop[pp].Output;
oCntrlImageDisplay.ImgEditor.Image = oCntrlImageDisplay.ImgDisplay.Image;
if (t1 == null || t1.ThreadState.ToString() == "Stopped")
{
t1 = new Thread(() => convert(pages, LstIop[pp].Input, LstIop[pp].Output, LstIop[pp].Temp));
t1.SetApartmentState(ApartmentState.STA);
t1.IsBackground = true;
CheckForIllegalCrossThreadCalls = false;
t1.Start();
}
}
As the others have said, there is no point in threading here, but if you're hell-bent on it, do it asynchronously: just use .Invoke, or .BeginInvoke followed by .EndInvoke.
Example:
delegate void T2();
static void Main(string[] args)
{
T2 Thread = new T2(Work);
while (true)
{
IAsyncResult result = Thread.BeginInvoke(null, null);
//OTHER WORK TO DO
Thread.EndInvoke(result);
}
}
static void Work()
{
//WORK TO DO
}
Using delegates is nice because you can specify return data and pass parameters:
delegate double T2(byte[] array,string text, int num);
static void Main(string[] args)
{
T2 Thread = new T2(Work);
while (true)
{
IAsyncResult result = Thread.BeginInvoke(new byte[0], "text", 0, null, null); // placeholder arguments, then callback and state
//OTHER WORK TO DO
double Returned = Thread.EndInvoke(result);
}
}
static double Work(byte[] array, string text, int num)
{
// WORK TO DO
return(3.4);
}
To wait for the thread to finish executing, call:
t1.Join();
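As a minimal sketch of how Join fits into a loop like the one in the question: each iteration starts a thread and then waits for it before moving on. SequentialThreads, Run and the doubling "work" are hypothetical stand-ins for the question's convert call. The loop variable is copied into a local so that the thread does not depend on the shared counter.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class SequentialThreads
{
    // Runs one worker thread per iteration and waits for it before continuing.
    public static List<int> Run(int count)
    {
        var results = new List<int>();
        for (int pp = 0; pp < count; pp++)
        {
            int index = pp; // private copy, so the lambda does not capture the loop counter
            var t1 = new Thread(() => results.Add(index * 2));
            t1.IsBackground = true;
            t1.Start();
            t1.Join(); // the loop does not advance until the thread has finished
        }
        return results;
    }
}
```

Since the loop waits after every Start, the work still runs one item at a time; the thread only helps if the loop has something else to do between Start and Join.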

Parallel.Foreach + yield return?

I want to process something using parallel loop like this :
public void FillLogs(IEnumerable<IComputer> computers)
{
Parallel.ForEach(computers, cpt=>
{
cpt.Logs = cpt.GetRawLogs().ToList();
});
}
OK, it works fine. But how do I do it if I want the FillLogs method to return an IEnumerable?
public IEnumerable<IComputer> FillLogs(IEnumerable<IComputer> computers)
{
Parallel.ForEach(computers, cpt=>
{
cpt.Logs = cpt.GetRawLogs().ToList();
yield return cpt // KO, don't work
});
}
EDIT
It seems not to be possible... but I can use something like this:
public IEnumerable<IComputer> FillLogs(IEnumerable<IComputer> computers)
{
return computers.AsParallel().Select(cpt => cpt);
}
But where do I put the cpt.Logs = cpt.GetRawLogs().ToList(); instruction?
Short version - no, that isn't possible via an iterator block; the longer version probably involves synchronized queue/dequeue between the caller's iterator thread (doing the dequeue) and the parallel workers (doing the enqueue); but as a side note - logs are usually IO-bound, and parallelising things that are IO-bound often doesn't work very well.
If the caller is going to take some time to consume each, then there may be some merit to an approach that only processes one log at a time, but can do that while the caller is consuming the previous log; i.e. it begins a Task for the next item before the yield, and waits for completion after the yield... but that is again, pretty complex. As a simplified example:
static void Main()
{
foreach(string s in Get())
{
Console.WriteLine(s);
}
}
static IEnumerable<string> Get() {
var source = new[] {1, 2, 3, 4, 5};
Task<string> outstandingItem = null;
Func<object, string> transform = x => ProcessItem((int) x);
foreach(var item in source)
{
var tmp = outstandingItem;
// note: passed in as "state", not captured, so not a foreach/capture bug
outstandingItem = new Task<string>(transform, item);
outstandingItem.Start();
if (tmp != null) yield return tmp.Result;
}
if (outstandingItem != null) yield return outstandingItem.Result;
}
static string ProcessItem(int i)
{
return i.ToString();
}
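The "longer version" mentioned above, a synchronized queue between the parallel workers and the caller's iterator, can be sketched with a BlockingCollection. This is an illustration under assumptions rather than the answer's own code: ParallelStreaming and SelectAsCompleted are hypothetical names, and results come back in completion order, not source order.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ParallelStreaming
{
    // Transforms items in parallel and yields each result as soon as it is ready.
    public static IEnumerable<TOut> SelectAsCompleted<TIn, TOut>(
        IEnumerable<TIn> source, Func<TIn, TOut> transform)
    {
        var results = new BlockingCollection<TOut>();

        // The workers enqueue on background threads...
        Task producer = Task.Run(() =>
        {
            try
            {
                Parallel.ForEach(source, item => results.Add(transform(item)));
            }
            finally
            {
                // Always release the consumer, even if a transform throws.
                results.CompleteAdding();
            }
        });

        // ...while the caller's thread dequeues inside the iterator.
        foreach (TOut result in results.GetConsumingEnumerable())
            yield return result;

        producer.Wait(); // surfaces any exception thrown by the workers
    }
}
```

If the caller stops enumerating early, the background work in this sketch simply keeps running to the end; stopping it would need a cancellation token on top.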
I don't want to be offensive, but maybe there is a lack of understanding here. Parallel.ForEach means that the TPL will run the foreach on several threads, according to the available hardware. But that requires that it is possible to do the work in parallel! yield return gives you the opportunity to take values out of a list (or whatever) and hand them back one by one as they are needed. It avoids having to first find all items matching the condition and then iterate over them. That is indeed a performance advantage, but it can't be done in parallel.
Although the question is old I've managed to do something just for fun.
class Program
{
static void Main(string[] args)
{
foreach (var message in GetMessages())
{
Console.WriteLine(message);
}
}
// Parallel yield
private static IEnumerable<string> GetMessages()
{
int total = 0;
bool completed = false;
var batches = Enumerable.Range(1, 100).Select(i => new Computer() { Id = i });
var qu = new ConcurrentQueue<Computer>();
Task.Run(() =>
{
try
{
Parallel.ForEach(batches,
() => 0,
(item, loop, subtotal) =>
{
Thread.Sleep(1000);
qu.Enqueue(item);
return subtotal + 1;
},
result => Interlocked.Add(ref total, result));
}
finally
{
completed = true;
}
});
int current = 0;
while (current < total || !completed)
{
SpinWait.SpinUntil(() => current < total || completed);
if (current == total) yield break;
current++;
qu.TryDequeue(out Computer computer);
yield return $"Completed {computer.Id}";
}
}
}
public class Computer
{
public int Id { get; set; }
}
Compared to Koray's answer this one really uses all the CPU cores.
You can use the following extension method
public static class ParallelExtensions
{
public static IEnumerable<T1> OrderedParallel<T, T1>(this IEnumerable<T> list, Func<T, T1> action)
{
var unorderedResult = new ConcurrentBag<(long, T1)>();
Parallel.ForEach(list, (o, state, i) =>
{
unorderedResult.Add((i, action.Invoke(o)));
});
var ordered = unorderedResult.OrderBy(o => o.Item1);
return ordered.Select(o => o.Item2);
}
}
Use it like this:
public void FillLogs(IEnumerable<IComputer> computers)
{
// One list of raw logs per computer, in the same order as the input.
var logs = computers.OrderedParallel(o => o.GetRawLogs().ToList()).ToList();
}
Hope this will save you some time.
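PLINQ offers something close to the extension method above out of the box: AsParallel().AsOrdered() runs the Select body in parallel while keeping the results in source order. The sketch below is an assumption-laden illustration, not the poster's code: DemoComputer is a hypothetical stand-in for the question's IComputer, and PlinqFill is a made-up class name. It also shows where the cpt.Logs assignment from the question can go, namely inside a statement lambda.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's IComputer.
public class DemoComputer
{
    public int Id { get; set; }
    public List<string> Logs { get; set; }

    public IEnumerable<string> GetRawLogs()
    {
        yield return "log from computer " + Id;
    }
}

public static class PlinqFill
{
    // Fills Logs in parallel; results are yielded in the original order.
    public static IEnumerable<DemoComputer> FillLogs(IEnumerable<DemoComputer> computers)
    {
        return computers
            .AsParallel()
            .AsOrdered()
            .Select(cpt =>
            {
                cpt.Logs = cpt.GetRawLogs().ToList();
                return cpt;
            });
    }
}
```

The query is lazy, so no logs are read until the caller starts enumerating the result.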
How about
Queue<string> qu = new Queue<string>();
bool finished = false;
Task.Factory.StartNew(() =>
{
Parallel.ForEach(get_list(), (item) =>
{
string itemToReturn = heavyWorkOnItem(item);
lock (qu)
qu.Enqueue(itemToReturn );
});
finished = true;
});
while (!finished)
{
lock (qu)
while (qu.Count > 0)
yield return qu.Dequeue();
//maybe a thread sleep here?
}
Edit:
I think this is better:
public static IEnumerable<TOutput> ParallelYieldReturn<TSource, TOutput>(this IEnumerable<TSource> source, Func<TSource, TOutput> func)
{
ConcurrentQueue<TOutput> qu = new ConcurrentQueue<TOutput>();
bool finished = false;
AutoResetEvent re = new AutoResetEvent(false);
Task.Factory.StartNew(() =>
{
Parallel.ForEach(source, (item) =>
{
qu.Enqueue(func(item));
re.Set();
});
finished = true;
re.Set();
});
while (!finished)
{
re.WaitOne();
while (qu.Count > 0)
{
TOutput res;
if (qu.TryDequeue(out res))
yield return res;
}
}
}
Edit2: I agree with the short No answer. This code is useless; you cannot break the yield loop.

Thread lock clarification

I am a beginner in programming.
When I execute the code below, which locks the operation:
class ThreadSafe
{
static List<string> list = new List<string>();
static object obj=new object();
static void Main()
{
new Thread(AddItems).Start();
new Thread(AddItems).Start();
foreach (string str in list)
{
Console.WriteLine(str);
}
Console.WriteLine("Count=" + list.Count.ToString());
Console.ReadKey(true);
}
static void AddItems()
{
lock (obj)
{
for (int i = 1; i < 10; i++)
list.Add("Item " + i.ToString());
}
}
}
I still receive an InvalidOperationException. How should the code be changed?
The issue is that your threads are altering the list while it is trying to be read.
class ThreadSafe
{
static List<string> list = new List<string>();
static object obj=new object();
static void Main()
{
var t1 = new Thread(AddItems);
var t2 = new Thread(AddItems);
t1.Start();
t2.Start();
t1.Join();
t2.Join();
foreach (string str in list)
{
Console.WriteLine(str);
}
Console.WriteLine("Count=" + list.Count.ToString());
Console.ReadKey(true);
}
static void AddItems()
{
for (int i = 1; i < 10; i++)
lock (obj)
{
list.Add("Item " + i.ToString());
}
}
}
The difference being that this code waits for both threads to complete before showing the results.
I also moved the lock around the specific instruction that needs to be locked, so that both threads can run concurrently.
You're enumerating over a collection with foreach (string str in list) while modifying it in AddItems(). For this code to work properly you'll either have to Thread.Join() both threads (so that both finish adding items to the list; I'm not sure, however, whether Add is thread-safe; I bet it's not, so you'll have to account for that by locking on SyncRoot) or use a ReaderWriterLock to logically separate these operations.
You are looping through the result list before the two AddItems threads have finished populating the list. So, the foreach complains that the list was updated while it was looping through that list.
Something like this should help:
System.Threading.Thread.Sleep(0); // Let the other threads get started on the list.
lock(obj)
{
foreach (string str in list)
{
Console.WriteLine(str);
}
}
Watch out though! This doesn't guarantee that the second thread will finish its job before you have read through the list provided by the first thread (assuming the first thread grabs the lock first).
You will need some other mechanism (like John Gietzen's solution) to know when both threads have finished, before reading the results.
Use the debugger. :)
You receive the InvalidOperationException on the foreach.
What happens, is that the foreach is executed while your threads are still running.
Therefore, you're iterating over your list, while items are being added to the list. So, the contents of the list are changing, and therefore, the foreach throws an exception.
You can avoid this problem by calling 'Join'.
static void Main()
{
Thread t1 = new Thread (AddItems);
Thread t2 =new Thread (AddItems);
t1.Start ();
t2.Start ();
t1.Join ();
t2.Join ();
foreach( string str in list )
{
Console.WriteLine (str);
}
Console.WriteLine ("Count=" + list.Count.ToString ());
Console.ReadKey (true);
}
I have changed the code, and in this version the lock appears to do nothing.
I expected add2 not to show up until add1 had finished, but the output of add1 and add2 was just mingled together.
using System;
using System.Threading;
public static class Example
{
public static void Main()
{
int data= 0;
Thread t1 = new Thread(()=> add1(ref data));
Thread t2 = new Thread(() => add2(ref data));
t1.Start();
t2.Start();
}
static void add1(ref int x)
{
object lockthis = new object();
lock (lockthis)
{
for (int i = 0; i < 30; i++)
{
x += 1;
Console.WriteLine("add1 " + x);
}
}
}
static void add2(ref int x)
{
object lockthis = new object();
lock (lockthis)
{
for (int i = 0; i < 30; i++)
{
x += 3;
Console.WriteLine("add2 " + x);
}
}
}
}
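In the code above, add1 and add2 each create their own lockthis object, so the two lock statements never contend with each other, which would explain the interleaved output. A lock only excludes threads that lock on the same object. A minimal sketch with one shared lock object follows; SharedLockExample, Gate and the log list are hypothetical names introduced for the illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class SharedLockExample
{
    // One lock object shared by every caller.
    private static readonly object Gate = new object();

    private static void Add(string name, int step, List<string> log, ref int x)
    {
        lock (Gate)
        {
            // Only one thread at a time can be inside this block.
            for (int i = 0; i < 30; i++)
            {
                x += step;
                log.Add(name);
            }
        }
    }

    // Returns the names in the order in which the additions happened.
    public static List<string> Run()
    {
        var log = new List<string>();
        int data = 0;
        var t1 = new Thread(() => Add("add1", 1, log, ref data));
        var t2 = new Thread(() => Add("add2", 3, log, ref data));
        t1.Start();
        t2.Start();
        t1.Join();
        t2.Join();
        return log;
    }
}
```

With the shared Gate, whichever thread enters first completes all 30 additions before the other starts, so the returned list contains 30 entries of one name followed by 30 of the other.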
