I will first provide the pseudocode and describe it below:
public void RunUntilEmpty(List<Job> jobs)
{
    while (jobs.Any()) // the list "jobs" will be modified during the execution
    {
        List<Job> childJobs = new List<Job>();
        Parallel.ForEach(jobs, job => // this will be done in parallel
        {
            List<Job> newJobs = job.Do(); // after a job is done, it may return new jobs to do
            lock (childJobs)
                childJobs.AddRange(newJobs); // I would like to add those jobs to the "pool"
        });
        jobs = childJobs;
    }
}
As you can see, I am performing an unusual kind of foreach. The source collection (jobs) can grow during execution, and this behaviour cannot be determined in advance. When the method Do() is called on an object (here, job), it may return new jobs to perform and thus extend the source (jobs).
I could call this method (RunUntilEmpty) recursively, but unfortunately the call stack could become very deep and is likely to overflow.
Could you please tell me how to achieve this? Is there a way of doing this kind of thing in C#?
If I understand correctly, you basically start out with some collection of Job objects, each representing some task which can itself create one or more new Job objects as a result of performing its task.
Your updated code example looks like it will basically accomplish this. But note that, as commenter CommuSoft points out, it won't make the most efficient use of your CPU cores. Because you only update the list of jobs after each group of jobs has completed, newly generated jobs cannot run until all of the previously generated jobs have completed.
A better implementation would use a single queue of jobs, continually retrieving new Job objects for execution as old ones complete.
I agree that TPL Dataflow may be a useful way to implement this. However, depending on your needs, you might find it simple enough to just queue the tasks directly to the thread pool and use CountdownEvent to track the progress of the work so that your RunUntilEmpty() method knows when to return.
Without a good, minimal, complete code example, it's impossible to provide an answer that includes a similarly complete code example. But hopefully the below snippet illustrates the basic idea well enough:
public void RunUntilEmpty(List<Job> jobs)
{
    // Start at 1 so the counter cannot reach zero while the initial jobs are still being queued
    using (CountdownEvent countdown = new CountdownEvent(1))
    {
        QueueJobs(jobs, countdown);
        countdown.Signal();
        countdown.Wait();
    }
}

private static void QueueJobs(List<Job> jobs, CountdownEvent countdown)
{
    foreach (Job job in jobs)
    {
        countdown.AddCount(1);
        Task.Run(() =>
        {
            try
            {
                // after a job is done, it may return new jobs to do
                QueueJobs(job.Do(), countdown);
            }
            finally
            {
                // signal even if Do() throws, so that Wait() cannot hang
                countdown.Signal();
            }
        });
    }
}
The basic idea is to queue a new task for each Job object, incrementing the counter of the CountdownEvent for each task that is queued. The tasks themselves do three things:
Run the Do() method,
Queue any new tasks, using the QueueJobs() method so that the CountdownEvent object's counter is incremented accordingly, and
Signal the CountdownEvent, decrementing its counter for the current task
The main RunUntilEmpty() signals the CountdownEvent to account for the single count it contributed to the object's counter when it created it, and then waits for the counter to reach zero.
Note that the calls to QueueJobs() are not recursive. The QueueJobs() method is not called by itself, but rather by the anonymous method declared within it, which is itself also not called by QueueJobs(). So there is no stack-overflow issue here.
The key feature in the above is that tasks are continuously queued as they become known, i.e. as they are returned by the previously-executed Do() method calls. Thus, the available CPU cores are kept busy by the thread pool, at least to the extent that any completed Do() method has in fact returned any new Job object to run. This addresses the main problem with the version of the code you've included in your question.
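To see the pattern above run end to end, here is a minimal self-contained sketch. The Job class is purely hypothetical (the question does not show one): each job at depth d returns two child jobs until depth 0 is reached, and a static counter records how many jobs ran. JobRunner is likewise just an illustrative name for a class holding the two methods.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical job type: a job at depth d spawns two children at depth d - 1.
public sealed class Job
{
    public static int Executed; // total number of jobs run, updated atomically
    private readonly int _depth;

    public Job(int depth) { _depth = depth; }

    public List<Job> Do()
    {
        Interlocked.Increment(ref Executed);
        var children = new List<Job>();
        if (_depth > 0)
        {
            children.Add(new Job(_depth - 1));
            children.Add(new Job(_depth - 1));
        }
        return children;
    }
}

public static class JobRunner
{
    public static void RunUntilEmpty(List<Job> jobs)
    {
        // The initial count of 1 belongs to this method and is released after queuing.
        using (var countdown = new CountdownEvent(1))
        {
            QueueJobs(jobs, countdown);
            countdown.Signal();
            countdown.Wait(); // returns once every queued job has signaled
        }
    }

    private static void QueueJobs(List<Job> jobs, CountdownEvent countdown)
    {
        foreach (Job job in jobs)
        {
            countdown.AddCount(1); // one count per queued task
            Task.Run(() =>
            {
                try { QueueJobs(job.Do(), countdown); } // children are queued before this task signals
                finally { countdown.Signal(); }
            });
        }
    }
}
```

Calling JobRunner.RunUntilEmpty(new List<Job> { new Job(3) }) should run 15 jobs in total (1 + 2 + 4 + 8), and child jobs start as soon as their parent's Do() returns rather than waiting for a whole generation to finish.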
So I am still really fresh with using task schedulers and threading in general, but am having trouble with a custom task scheduler. Unfortunately, I don't have all my code but will try to be as detailed as possible.
So my task scheduler is based on this limited concurrency task scheduler. Now, I want my tasks to have priority, so I also have a separate class that creates and stores some concurrent queues. For that I am using this simple priority queue example.
So how I have it set up is that the constructor for my task scheduler takes another parameter for the number of queues I want:
public LimitedConcurrencyLevelTaskScheduler(int maxDegreeOfParallelism, int numberOfQueues)
The constructor then creates a new instance of the priority queue class with the number of queues I want. The only methods I really use from the priority queue class are TryAdd and TryTake.
Now, the changes I made to the task scheduler were mainly so that I could use the queues from my priority queue class instead of using the list in the example. The first change I had to make was to create an entirely new method for enqueuing tasks. I did this since there is no overload for QueueTask that takes more than one parameter and I wanted to create a key-value pair that held a priority level and the task.
public void EnqueueTask(int Priority, Task task)
{
    // Add the task to the list of tasks to be processed. If there aren't enough
    // delegates currently queued or running to process tasks, schedule another.
    var item = new KeyValuePair<int, Task>(Priority, task);
    lock (PriorityQueueClass._queues) // _queues are the concurrent queues that were created when I
                                      // initialized my priority queue class.
    {
        PriorityQueueClass.TryAdd(item);
        if (_delegatesQueuedOrRunning < _maxDegreeOfParallelism)
        {
            ++_delegatesQueuedOrRunning;
            NotifyThreadPoolOfPendingWork();
        }
    }
}
The only other method I modify is NotifyThreadPoolOfPendingWork(). The only real change in this method is that I use TryTake from my priority queue class (which will loop through queues in priority order) to grab an item and pass it into the TryExecuteTask method:
base.TryExecuteTask(item.Value); // item is a key-value pair, so the Value would be the task
Here's where I am having trouble. In a separate project, I have a main program (I properly set my references for my task scheduler) where I create some tasks, create my scheduler, and attempt to schedule my tasks.
public static void Main()
{
    // creates a new scheduler with a maximum degree of parallelism of 2 and 2 queues
    LimitedConcurrencyLevelTaskScheduler lcts = new LimitedConcurrencyLevelTaskScheduler(2, 2);
    // Create a few simple tasks
    Task t1 = new Task(Action); // the actions are just methods that print some text to the output window.
    Task t2 = new Task(Action); // I make the actions in the main method; assume they are in here.
    ...
    lcts.EnqueueTask(0, t1); // 0 is the highest priority queue
    lcts.EnqueueTask(1, t2); // 1 is a lower priority queue
}
When I run it, it looks like it goes through all the code, but when it reaches base.TryExecuteTask(item.Value), reached from the EnqueueTask method, it throws the exception "execute task may not be called for a task which was previously queued to a different taskscheduler".
Now, I'm not sure if this is even a correct way to start the scheduler as I have usually seen people use the task factory directly or use task.Start(). I am calling my EnqueueTask method directly because I'm not seeing a better way to assign priority at the moment. But if there is a fault in my thinking, or if there is a better way to merge the task scheduler and priority queue class, then I am all ears. Thank you.
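For comparison, here is a minimal sketch of the two usual ways of handing a task to a custom scheduler mentioned above: task.Start(scheduler) and a TaskFactory bound to the scheduler. One plausible reading of the exception is that the tasks in Main were never started on the scheduler, so the framework does not consider them queued to it when TryExecuteTask runs. SingleThreadTaskScheduler and SchedulerDemo are hypothetical names invented for this illustration; they are not the scheduler from the question and they do not implement priorities.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical minimal scheduler: runs every queued task on one dedicated thread.
public sealed class SingleThreadTaskScheduler : TaskScheduler, IDisposable
{
    private readonly BlockingCollection<Task> _tasks = new BlockingCollection<Task>();

    public SingleThreadTaskScheduler()
    {
        var worker = new Thread(() =>
        {
            // Each task here arrived through QueueTask, so it belongs to this scheduler.
            foreach (Task task in _tasks.GetConsumingEnumerable())
                TryExecuteTask(task);
        });
        worker.IsBackground = true;
        worker.Start();
    }

    // Called by the framework when a task is started on this scheduler.
    protected override void QueueTask(Task task) { _tasks.Add(task); }

    // This sketch never runs tasks inline on the calling thread.
    protected override bool TryExecuteTaskInline(Task task, bool taskWasPreviouslyQueued) { return false; }

    protected override IEnumerable<Task> GetScheduledTasks() { return _tasks.ToArray(); }

    // Lets the worker thread leave its loop once the queue is drained.
    public void Dispose() { _tasks.CompleteAdding(); }
}

public static class SchedulerDemo
{
    // Starts two tasks on the custom scheduler and returns how many of them ran.
    public static int Run()
    {
        int counter = 0;
        using (var scheduler = new SingleThreadTaskScheduler())
        {
            // Option 1: create the task, then start it on a specific scheduler.
            var t1 = new Task(() => Interlocked.Increment(ref counter));
            t1.Start(scheduler);

            // Option 2: use a TaskFactory bound to the scheduler.
            Task t2 = new TaskFactory(scheduler).StartNew(() => Interlocked.Increment(ref counter));

            Task.WaitAll(t1, t2);
        }
        return counter;
    }
}
```

In both cases the framework calls QueueTask on the scheduler, which is the point where a priority could be looked up, for example from a value the task was created with. How best to carry the priority is a design choice this sketch leaves open.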
UPDATE: Need to stick with .NET 4.6.1
I have a problem with thread pool efficiency. I'm not sure I understand the whole concept. I did a lot of reading before asking this question, and I know that a thread pool is a good solution if you have a lot of small, relatively quick functions AND, more importantly, non-blocking tasks. Using lock is very bad in a thread pool.
And here is my question: how do you return values from thread pool functions? If you have functions to run, they probably produce some results, right? It's good to store those results somewhere. Where?
I'm running ca. 200k very quick functions in a thread pool. I store the results in a List. Of course I have to do:
lock (lockobj)
{
    myList.Add(result);
}
So, is this the right way? I mean, if your functions return SOMETHING, you have to store the results in some kind of collection. It has to be a blocking collection. So I started thinking: "Blocking is very bad in a thread pool, but you have to do it at least once, at the end of every function."
How to store/return results from functions running in threadpool?
Thanks!
JB
EDIT: By "function" I mean...
ThreadPool.QueueUserWorkItem(state =>
{
    Result r = function(); // previously named "Task"
    lock (lockobj)
    {
        allResults.Add(r);
    }
});
If you don't want to block the ThreadPool threads use a lock-free approach. ConcurrentQueue is currently lock-free (as of .NET 4.6.2) when you enqueue items.
So simply do this:
public static ConcurrentQueue<Result> AllResults { get; } = new ConcurrentQueue<Result>();

ThreadPool.QueueUserWorkItem(state =>
{
    Result r = function();
    AllResults.Enqueue(r);
});
This will guarantee you don't block ThreadPool threads.
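As a fuller illustration of the same idea, the sketch below queues a number of work items, stores each result in a ConcurrentQueue, and uses a CountdownEvent so the caller knows when all results are in. ResultCollector and the squaring "function" are stand-ins invented for this example.

```csharp
using System.Collections.Concurrent;
using System.Threading;

public static class ResultCollector
{
    // Runs 'count' work items on the thread pool and returns their results.
    public static ConcurrentQueue<int> Collect(int count)
    {
        var results = new ConcurrentQueue<int>();
        using (var done = new CountdownEvent(count))
        {
            for (int i = 0; i < count; i++)
            {
                int input = i; // copy, because the loop variable is shared across iterations
                ThreadPool.QueueUserWorkItem(state =>
                {
                    results.Enqueue(input * input); // stand-in for function()
                    done.Signal();                  // mark this work item as finished
                });
            }
            done.Wait(); // only the caller blocks here, not the pool threads
        }
        return results;
    }
}
```

The results arrive in completion order, not submission order, so include the input in the result type if you need to match them up afterwards.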
Any kind of collection that is thread-safe/synchronized will do. There are plenty in the .NET Framework.
You can also use volatile variables to share data between multiple threads, but this is usually considered bad practice.
Another approach is to schedule those operations as tasks that produce results. By default they run on the thread pool, and you can get the return values by awaiting the tasks or by checking the Result of each Task that is returned.
Finally, you can write your own code to synchronize access to certain regions of code, variables and so on, using things like lock, semaphores or mutexes.
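The task-based approach mentioned above could look roughly like this. It is a sketch only: TaskResults is a hypothetical name, and squaring a number stands in for the real function.

```csharp
using System.Linq;
using System.Threading.Tasks;

public static class TaskResults
{
    // Starts one task per input and gathers the results without an explicit lock.
    public static async Task<int[]> ComputeAllAsync(int count)
    {
        Task<int>[] tasks = Enumerable.Range(0, count)
            .Select(i => Task.Run(() => i * i)) // each task returns its own result
            .ToArray();

        // WhenAll returns the results in the same order as the tasks array.
        return await Task.WhenAll(tasks);
    }
}
```

Each task carries its own return value, so no shared collection is needed while the work is running. A caller that cannot use await can block with ComputeAllAsync(5).GetAwaiter().GetResult(), which yields 0, 1, 4, 9, 16.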
I have created a list of Task, like this:
public void A()
{
}

public void B()
{
}

public void C()
{
}

public void Ex()
{
    Task.WaitAll(Task.Factory.StartNew(A), Task.Factory.StartNew(B), Task.Factory.StartNew(C));
    var p = true;
}
Now my question is: will all the tasks in the list execute one by one, or will they execute in parallel?
p=true
"p" is set when all tasks are done or before they are done?
For the first question:
Will those tasks execute one by one or asynchronously.
(here, I imagine you meant concurrently, which is not exactly the same)
StartNew runs your task on the current TaskScheduler. By default that means it uses the ThreadPool and, if threads are available in the pool, the tasks run in parallel. If all threads in the pool are busy, the pool may throttle execution to avoid overwhelming the CPU, so the tasks may not all run at the same time: there are no guarantees.
This is a simplified explanation; the TaskScheduler documentation describes the scheduling strategy in more detail.
As a side note, the documentation for StartNew mentions a subtle difference between StartNew(Action) and Run(Action). They are not exactly equivalent, contrary to what other answers state.
Starting with the .NET Framework 4.5, you can use the Task.Run(Action) method as a quick way to call StartNew(Action) with default parameters. Note, however, that there is a difference in behavior between the two methods regarding : Task.Run(Action) by default does not allow child tasks started with the TaskCreationOptions.AttachedToParent option to attach to the current Task instance, whereas StartNew(Action) does.
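The difference described in the quoted documentation can be observed with a small experiment. The sketch below is only an illustration: AttachDemo and its method are hypothetical names, and the one-second timeout is an arbitrary choice that assumes the thread pool is not starved.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class AttachDemo
{
    // Returns true if the parent task stayed incomplete while its child was still running.
    public static bool ParentWaitsForChild(bool useStartNew)
    {
        using (var release = new ManualResetEventSlim(false))
        {
            Task child = null;

            // Typed as Action so that Task.Run(Action) is chosen, not Task.Run(Func<Task>).
            Action body = () =>
            {
                child = Task.Factory.StartNew(
                    () => release.Wait(), // the child blocks until released
                    TaskCreationOptions.AttachedToParent);
            };

            Task parent = useStartNew ? Task.Factory.StartNew(body) : Task.Run(body);

            // Give the parent time to finish while the child is still blocked.
            bool finishedEarly = parent.Wait(TimeSpan.FromSeconds(1));

            release.Set(); // let the child finish
            parent.Wait();
            child.Wait();
            return !finishedEarly;
        }
    }
}
```

With useStartNew set to true the child attaches, so the parent is not complete until the child is. With false, Task.Run denies the attachment and the parent completes as soon as its own delegate returns.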
For the second question
"p" is set when all tasks are done or before they are done?
The short answer is: after all tasks are done, because Task.WaitAll blocks until every task has completed.
However, you should consider using another approach, as this one blocks your thread and leaves it waiting idly. The alternative is to give control back to the caller if you can, so that the thread is freed and can be used for other work. This is especially true if the thread on which this code is running is part of a ThreadPool.
Therefore, you should prefer using WhenAll(). It returns a Task, which can be awaited or on which ContinueWith can be called.
Example:
var tasks = new Task[] {Task.Factory.StartNew(A), Task.Factory.StartNew(B), Task.Factory.StartNew(C)};
await Task.WhenAll(tasks);
First:
When you create a task with its constructor, you need to call the Start method on it; otherwise it does nothing.
new Task(() => { /* Something */ }).Start();
If you create tasks by calling the constructor and then Start, or by using the TaskFactory or Task.Run, then by default a ThreadPool thread is used to run the task, and thus the tasks are executed in parallel.
The Task.WaitAll method blocks the calling method until all tasks passed to it have finished executing.
So the boolean variable is set after all tasks are done.
I need to build a process that listens over WCF for new tasks (asynchronously).
Every task gets enqueued (somehow).
What is the best way (in terms of logic and performance) to loop over the queue and dequeue items?
I thought about:
while (true)
{
    queue.Dequeue();
}
I assume that there are better ways to do that.
Thanks
Have a look at the System.Collections.Concurrent namespace. It contains a thread-safe queue implementation, ConcurrentQueue, although I suspect that your needs would be better served by BlockingCollection.
BlockingCollection is essentially a thread-safe collection designed for producer-consumer scenarios. In your case, WCF calls act as producers that add to the collection, while the worker thread acts as the consumer that takes queued tasks from the collection. By using a single consumer (and a single collection), you can ensure the order of execution. If that's not important, you may be able to use multiple consumer threads. (There are also static AddToAny and TakeFromAny methods that allow you to use multiple collections (multiple queues) if you need that.)
The advantage over the while(true) approach is that it avoids a tight loop that just consumes CPU cycles. Apart from being thread-safe, it also solves the problem of synchronization between the queuing and dequeuing threads.
EDIT:
BlockingCollection is really very simple to use. See the simple example below: AddTask will be invoked from, say, your WCF methods to queue up tasks, while StartConsumer will be called during service start-up.
public class MyTask { ... }

private BlockingCollection<MyTask> _tasks = new BlockingCollection<MyTask>();

private void AddTask(MyTask task)
{
    _tasks.Add(task);
}

private void StartConsumer()
{
    // I have used the task API, but you can just as well launch a new thread instead
    Task.Factory.StartNew(() =>
    {
        // Blocks while the collection is empty and ends once CompleteAdding
        // has been called and the remaining items have been consumed
        foreach (var task in _tasks.GetConsumingEnumerable())
        {
            ProcessTask(task);
        }
    });
}
When stopping the service, one needs to invoke _tasks.CompleteAdding() so that the consumer thread exits its loop.
Find more examples on MSDN:
http://msdn.microsoft.com/en-us/library/dd997306.aspx
http://msdn.microsoft.com/en-us/library/dd460690.aspx
http://msdn.microsoft.com/en-us/library/dd460684.aspx
Instead of the infinite loop, I would use events to synchronize the queue. Whenever the WCF call is made, add an element to the queue and send out an "AnElementHasBeenAddedEvent".
The thread executing the queued tasks listens for that event and, whenever it receives it, empties the queue.
Make sure there is only one thread that does this job!
Advantages over the while(true) concept: You do not have a thread that constantly loops through the endless loop and thus eats resources. You only do as much work as needed.
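A minimal sketch of this event-driven idea, assuming an AutoResetEvent plays the role of the "element added" event and a single dedicated worker thread drains the queue. SignaledQueue is a hypothetical name, and tasks are represented as plain Action delegates for brevity.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

public sealed class SignaledQueue : IDisposable
{
    private readonly ConcurrentQueue<Action> _queue = new ConcurrentQueue<Action>();
    private readonly AutoResetEvent _itemAdded = new AutoResetEvent(false);
    private readonly Thread _worker;
    private volatile bool _stopping;

    public SignaledQueue()
    {
        _worker = new Thread(Consume) { IsBackground = true };
        _worker.Start();
    }

    // Called by producers (for example the WCF methods).
    public void Enqueue(Action work)
    {
        _queue.Enqueue(work);
        _itemAdded.Set(); // wake the single consumer thread
    }

    private void Consume()
    {
        while (true)
        {
            _itemAdded.WaitOne(); // sleeps here instead of spinning

            // Read the flag before draining so that items queued before Dispose still run.
            bool stopping = _stopping;

            Action work;
            while (_queue.TryDequeue(out work))
                work();

            if (stopping) return;
        }
    }

    // Processes whatever is still queued, then stops the worker thread.
    public void Dispose()
    {
        _stopping = true;
        _itemAdded.Set();
        _worker.Join();
        _itemAdded.Dispose();
    }
}
```

Because several Set calls can collapse into one wake-up, the consumer drains the whole queue each time it is signaled rather than taking a single item.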
I have a table of schedule items, they may be scheduled for the same time. I'm wondering how to have them all execute at the correct time when:
The problem I see is that executing one scheduled item (like a scheduled Twitter post) requires an API request, which may take a second or two, possibly longer. If I execute them sequentially and too many items are scheduled for the same time, some of them could end up being executed after their scheduled time.
How would I go about building this "scheduling" system that avoids these problems? Any tips, advice?
Thanks!
I would use a Windows service to accomplish this. Each of the items should then be run asynchronously, for example using the BackgroundWorker class. This allows all of the scheduled items to be launched rapidly, so they don't collide and don't depend on the previous one finishing before they start.
You might want to consider Quartz.NET. Gives you much flexibility in terms of scheduling and task execution.
Unless you take advantage of the asynchronous APIs that exist for all IO operations, your only approach is to use many threads. Consider the .NET ThreadPool, as it can increase the number of threads when too many work items are queued. There is a limit here, though, as the ThreadPool spins up extra threads relatively slowly. Under sustained overload, your system will groan. Like I said, the best way to approach this is with asynchronous IO.
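To make the asynchronous IO suggestion concrete, here is a small sketch. PostAsync is a hypothetical stand-in for the real API request: Task.Delay simulates the network latency without holding a thread, and DueItemRunner is an invented name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class DueItemRunner
{
    // Stand-in for an API call; no thread is blocked while the delay is pending.
    private static async Task<string> PostAsync(string item)
    {
        await Task.Delay(200);
        return "posted:" + item;
    }

    // Starts every due item immediately and completes when all of them have finished.
    public static Task<string[]> RunDueItemsAsync(IEnumerable<string> dueItems)
    {
        return Task.WhenAll(dueItems.Select(item => PostAsync(item)));
    }
}
```

Because all requests are in flight at once, fifty due items should take roughly as long as the slowest single request rather than fifty times as long, and no extra threads are needed while the requests are pending.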
You can put the tasks in threads when you want them to run:
public abstract class MyTask
{
    public abstract void DoWork();
}

// ...

public void SomeTaskStarter()
{
    MyTask task = SomeFactoryMethodToCreateATaskInstance();
    new Thread(new ThreadStart(task.DoWork)).Start();
}
MyTask is an abstract class that represents a task to do and it defines a method, DoWork() that will do what you want. SomeFactoryMethodToCreateATaskInstance() will construct a concrete instance of a task and all you need to do is write DoWork() to do what you need to do:
public class Twitterer : MyTask
{
    private string _tweet;

    public Twitterer(string tweet)
    {
        _tweet = tweet;
    }

    public override void DoWork()
    {
        TwitterApi api = new TwitterApi(); // whatever
        api.PostTweet(_tweet);
    }
}
You will most assuredly want some kind of action on task completion. Whatever you do, the task completion routine should be thread-safe, and should probably be called via BeginInvoke()/EndInvoke() if you need to do any UI-ish work.
SomeTaskStarter() is best called from a windows service, and will most likely contain an argument with information about what task should be started, etc.