I've got an application which has two main tasks: encoding and processing video.
These tasks are independent.
I would like to run each task with a configurable number of threads.
For a single task I usually use ThreadPool and SetMaxThreads. But now I've got two tasks and would like a separate thread pool, with its own configurable number of threads, for each task.
However, ThreadPool is a static class. So how can I implement my strategy (an easily configurable number of threads for each task)?
Thanks
You will probably want your own thread pool. If you are using .NET 4.0 then it is actually fairly easy to roll your own if you use the BlockingCollection class.
public class CustomThreadPool
{
private BlockingCollection<Action> m_WorkItems = new BlockingCollection<Action>();
public CustomThreadPool(int numberOfThreads)
{
for (int i = 0; i < numberOfThreads; i++)
{
var thread = new Thread(
() =>
{
while (true)
{
Action action = m_WorkItems.Take();
action();
}
});
thread.IsBackground = true;
thread.Start();
}
}
public void QueueUserWorkItem(Action action)
{
m_WorkItems.Add(action);
}
}
That is really all there is to it. You would create a CustomThreadPool for each actual pool you want to control. I posted the minimum amount of code to get a crude thread pool going. Naturally, you might want to tweak and expand this implementation to suit your specific need.
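As one possible way to expand it, here is a minimal sketch that adds a shutdown path and then creates two independently sized pools, one per task. The names FixedThreadPool and TwoPoolsDemo are hypothetical, and swallowing exceptions in the worker loop is an assumption made only to keep the workers alive.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

// Hypothetical variant of the pool above that can be drained and stopped.
public sealed class FixedThreadPool : IDisposable
{
    private readonly BlockingCollection<Action> _workItems = new BlockingCollection<Action>();
    private readonly List<Thread> _threads = new List<Thread>();

    public FixedThreadPool(string name, int numberOfThreads)
    {
        for (int i = 0; i < numberOfThreads; i++)
        {
            var thread = new Thread(() =>
            {
                // Blocks while the queue is empty; the loop ends once
                // CompleteAdding has been called and the queue is drained.
                foreach (Action action in _workItems.GetConsumingEnumerable())
                {
                    try { action(); }
                    catch (Exception ex) { Console.Error.WriteLine(ex); } // keep the worker alive
                }
            });
            thread.Name = name + "-" + i;
            thread.IsBackground = true;
            thread.Start();
            _threads.Add(thread);
        }
    }

    public void QueueUserWorkItem(Action action)
    {
        _workItems.Add(action);
    }

    // Stops accepting work, then waits for the queued work to finish.
    public void Dispose()
    {
        _workItems.CompleteAdding();
        foreach (Thread thread in _threads) thread.Join();
        _workItems.Dispose();
    }
}

public static class TwoPoolsDemo
{
    // Queues `count` items on each pool and returns how many items ran.
    public static int Run(int encodingThreads, int processingThreads, int count)
    {
        int completed = 0;
        using (var encodingPool = new FixedThreadPool("encode", encodingThreads))
        using (var processingPool = new FixedThreadPool("process", processingThreads))
        {
            for (int i = 0; i < count; i++)
            {
                encodingPool.QueueUserWorkItem(() => Interlocked.Increment(ref completed));
                processingPool.QueueUserWorkItem(() => Interlocked.Increment(ref completed));
            }
        } // Dispose drains both queues before control reaches the return
        return completed;
    }

    public static void Main()
    {
        Console.WriteLine("Completed items: " + Run(2, 4, 10));
    }
}
```

Each pool owns its own queue and threads, so the thread count of one task can be changed without affecting the other.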
In the Microsoft documentation you can read about SemaphoreSlim:
"Represents a lightweight alternative to Semaphore that limits the number of threads that can access a resource or pool of resources concurrently."
https://learn.microsoft.com/en-us/dotnet/api/system.threading.semaphoreslim?view=net-5.0
In my understanding a Task is different from Thread. Task is higher level than Thread. Different tasks can run on the same thread. Or a task can be continued on another thread than it was started on.
(Compare: "server-side applications in .NET using asynchrony will use very few threads without limiting themselves to that. If everything really can be served by a single thread, it may well be - if you never have more than one thing to do in terms of physical processing, then that's fine."
from in C# how to run method async in the same thread)
IMO, if you put this information together, the conclusion is that you can't limit the number of Tasks running in parallel by using a SemaphoreSlim, but...
there are other texts that give exactly this kind of advice (How to limit the amount of concurrent async I/O operations?, see "You can definitely do this…"), and
when I execute this code on my machine, it seems it IS possible. If I work with different values for _MaxDegreeOfParallelism and different ranges of numbers, _RunningTasksCount never exceeds the limit given by _MaxDegreeOfParallelism.
Can somebody provide me some information to clarify?
class Program
{
static void Main(string[] args)
{
Console.WriteLine("Hello World!");
IRunner runner = new RunnerSemaphore();
runner.Run();
Console.WriteLine("Hit any key to close...");
Console.ReadLine();
}
}
public class RunnerSemaphore : IRunner
{
private readonly SemaphoreSlim _ConcurrencySemaphore;
private List<int> _Numbers;
private int _MaxDegreeOfParallelism = 3;
private object _RunningTasksLock = new object();
private int _RunningTasksCount = 0;
public RunnerSemaphore()
{
_ConcurrencySemaphore = new SemaphoreSlim(_MaxDegreeOfParallelism);
_Numbers = Enumerable.Range(1, 100).ToList();
}
public void Run()
{
RunAsync().Wait();
}
private async Task RunAsync()
{
List<Task> allTasks = new List<Task>();
foreach (int number in _Numbers)
{
var task = Task.Run
(async () =>
{
await _ConcurrencySemaphore.WaitAsync();
bool isFast = number != 1;
int delay = isFast ? 200 : 10000;
Console.WriteLine($"Start Work {number}\tManagedThreadId {Thread.CurrentThread.ManagedThreadId}\tRunning {IncreaseTaskCount()} tasks");
await Task.Delay(delay).ConfigureAwait(false);
Console.WriteLine($"End Work {number}\tManagedThreadId {Thread.CurrentThread.ManagedThreadId}\tRunning {DecreaseTaskCount()} tasks");
})
.ContinueWith((t) =>
{
_ConcurrencySemaphore.Release();
});
allTasks.Add(task);
}
await Task.WhenAll(allTasks.ToArray());
}
private int IncreaseTaskCount()
{
int taskCount;
lock (_RunningTasksLock)
{
taskCount = ++ _RunningTasksCount;
}
return taskCount;
}
private int DecreaseTaskCount()
{
int taskCount;
lock (_RunningTasksLock)
{
taskCount = -- _RunningTasksCount;
}
return taskCount;
}
}
Represents a lightweight alternative to Semaphore that limits the number of threads that can access a resource or pool of resources concurrently.
Well, that was a perfectly fine description when SemaphoreSlim was first introduced - it was just a lightweight Semaphore. Since that time, it has gotten new methods (i.e., WaitAsync) that enable it to act like an asynchronous synchronization primitive.
In my understanding a Task is different from Thread. Task is higher level than Thread. Different tasks can run on the same thread. Or a task can be continued on another thread than it was started on.
This is true for what I call "Delegate Tasks". There's also a completely different kind of Task that I call "Promise Tasks". Promise tasks are similar to promises (or "futures") in other languages (e.g., JavaScript), and they just represent the completion of some event. Promise tasks do not "run" anywhere; they just complete based on some future event (usually via a callback).
async methods always return promise tasks. The code in an asynchronous method is not actually run as part of the task; the task itself only represents the completion of the async method. I recommend my async intro for more information about async and how the code portions are scheduled.
if you put this information together, the conclusion is that you can’t limit the number of Tasks running in parallel with the use of a semaphore slim
This is personal preference, but I try to be very careful about terminology, precisely to avoid problems like this question. Delegate tasks may run in parallel, e.g., Parallel. Promise tasks do not "run", and they don't run in "parallel", but you can have multiple concurrent promise tasks that are all in progress. And SemaphoreSlim's WaitAsync is a perfect match for limiting that kind of concurrency.
You may wish to read about Stephen Toub's AsyncSemaphore (and other articles in that series). It's not the same implementation as SemaphoreSlim, but behaves essentially the same as far as promise tasks are concerned.
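To make the point about WaitAsync concrete, here is a minimal sketch of limiting concurrent promise tasks. ThrottleDemo, RunAsync and the counters are hypothetical names, and Task.Delay stands in for real asynchronous I/O. Unlike the code in the question, the semaphore is released in a finally block rather than in a ContinueWith continuation.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottleDemo
{
    // Starts `itemCount` simulated async operations, lets at most `limit`
    // of them be in progress at a time, and returns the highest number
    // of operations that were observed in progress together.
    public static async Task<int> RunAsync(int itemCount, int limit)
    {
        var semaphore = new SemaphoreSlim(limit);
        object gate = new object();
        int inProgress = 0;
        int maxObserved = 0;

        async Task WorkAsync(int number)
        {
            // Asynchronous wait: no thread is blocked while the limit is reached.
            await semaphore.WaitAsync();
            try
            {
                lock (gate)
                {
                    inProgress++;
                    maxObserved = Math.Max(maxObserved, inProgress);
                }
                await Task.Delay(20); // stand-in for real asynchronous I/O
                lock (gate) { inProgress--; }
            }
            finally
            {
                semaphore.Release(); // released even if the work throws
            }
        }

        await Task.WhenAll(Enumerable.Range(1, itemCount).Select(n => WorkAsync(n)));
        return maxObserved;
    }

    public static void Main()
    {
        int max = RunAsync(20, 3).GetAwaiter().GetResult();
        Console.WriteLine("Max operations in progress: " + max);
    }
}
```

No Task.Run is needed here: the operations are promise tasks, so the semaphore limits how many are in progress, not how many threads exist.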
New to threading and tasks here :)
So, I wrote a simple threading program that creates a few threads and runs them asynchronously then waits for them to finish.
I then changed it to a Task. The code does exactly the same thing and the only change is I change a couple of statements.
So, two questions really:
In the below code, what is the difference?
I'm struggling to figure out async/await. How would I integrate it into the code below? Or, given that all the examples seem to have one async method calling another, is this a bad example of using Task to do background work?
Thanks.
namespace ConsoleApp1
{
class Program
{
static void Main(string[] args)
{
ThreadSample();
TaskSample();
}
private static void ThreadSample()
{
Random r = new Random();
MyThreadTest[] myThreads = new MyThreadTest[4];
Thread[] threads = new Thread[4];
for (int i = 0; i < 4; i++)
{
myThreads[i] = new MyThreadTest($"T{i}", r.Next(1, 500));
threads[i] = new Thread(new ThreadStart(myThreads[i].ThreadSample));
threads[i].Start();
}
for (int i = 0; i < 4; i++)
{
threads[i].Join();
}
System.Console.WriteLine("Finished");
System.Console.ReadKey();
}
private static void TaskSample()
{
Random r = new Random();
MyThreadTest[] myTasks = new MyThreadTest[4];
Task[] tasks = new Task[4];
for (int i = 0; i < 4; i++)
{
myTasks[i] = new MyThreadTest($"T{i}", r.Next(1, 500));
tasks[i] = new Task(new Action(myTasks[i].ThreadSample));
tasks[i].Start();
}
for (int i = 0; i < 4; i++)
{
tasks[i].Wait();
}
System.Console.WriteLine("Finished");
System.Console.ReadKey();
}
}
class MyThreadTest
{
private string name;
private int interval;
public MyThreadTest(string name, int interval)
{
this.name = name;
this.interval = interval;
Console.WriteLine($"Thread created: {name},{interval}");
}
public void ThreadSample()
{
for (int i = 0; i < 5; i++)
{
Thread.Sleep(interval);
Console.WriteLine($"{name} At {i} on thread {Thread.CurrentThread.ManagedThreadId}");
}
}
public void TaskSample()
{
for (int i = 0; i < 5; i++)
{
Thread.Sleep(interval);
Console.WriteLine($"{name} At {i} on thread {Thread.CurrentThread.ManagedThreadId}");
}
}
}
}
The Task Parallel Library (TPL) is an abstraction, and you shouldn't try to compare Tasks directly with threads. The Task object represents the abstract concept of an asynchronous task - a piece of code that should execute asynchronously and which will either complete, fault (throw an exception) or be canceled. The abstraction means you can write and use such tasks without worrying too much about exactly how they're executed asynchronously. There are lots of useful things like ContinueWith() you can use to compose, sequence and otherwise manage tasks.
Threads are a much lower level concrete system facility that can be used to run code asynchronously, but without all the niceties you get from the Task Parallel Library (TPL). If you want to sequence tasks or anything like that, you have to code it yourself.
In your example code, you're not actually directly creating any threads. Instead, the Actions you've written are being executed by the system thread pool. Of course, this can be changed. The TPL abstraction layer provides the TaskScheduler class which you can extend - if you have some special way of running code asynchronously, you can write a TaskScheduler to use TPL with it.
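As a small illustration of swapping the scheduler, the sketch below does not write a custom TaskScheduler. Instead it assumes the built-in ConcurrentExclusiveSchedulerPair (available since .NET 4.5) is acceptable, and uses its ConcurrentScheduler to cap how many delegate tasks run at once. SchedulerDemo and Run are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class SchedulerDemo
{
    // Starts `taskCount` blocking work items on a scheduler capped at
    // `maxConcurrency` and returns the highest concurrency observed.
    public static int Run(int taskCount, int maxConcurrency)
    {
        // Wraps the default (thread pool) scheduler with a concurrency limit.
        var pair = new ConcurrentExclusiveSchedulerPair(TaskScheduler.Default, maxConcurrency);
        var factory = new TaskFactory(pair.ConcurrentScheduler);

        object gate = new object();
        int running = 0;
        int maxObserved = 0;

        Task[] tasks = Enumerable.Range(0, taskCount)
            .Select(i => factory.StartNew(() =>
            {
                lock (gate) { running++; maxObserved = Math.Max(maxObserved, running); }
                Thread.Sleep(20); // stand-in for CPU-bound work
                lock (gate) { running--; }
            }))
            .ToArray();

        Task.WaitAll(tasks);
        return maxObserved;
    }

    public static void Main()
    {
        Console.WriteLine("Max tasks running at once: " + Run(12, 2));
    }
}
```

The tasks still execute on thread pool threads; only the scheduling policy in front of the pool changes.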
async/await is compiler sugar. The compiler rewrites an async method into a state machine: the method is split into chunks at each await, those chunks execute sequentially, and the method as a whole is represented by a single Task. One caution: by default, await captures the current SynchronizationContext and resumes on that context. So if you're doing this in WPF or Windows Forms, your continuation code after an await isn't running on a thread pool thread at all; it's running on the UI thread. You can disable this by calling ConfigureAwait(false). async/await is particularly useful for asynchronous programming in UI environments where synchronization to a main thread is important.
In the below code, what is the difference?
The difference is big. A Task is a unit of work that runs on a thread from the thread pool, and the pool sizes itself based on the estimated amount of work to be done. If another Task is queued while the pool holds idle but still-alive threads, the pool reuses one of them instead of spinning up a new thread (which is very costly). Multiple tasks can end up using the same thread (not simultaneously, obviously).
Task-based parallelism in a nutshell: tasks are jobs, and the ThreadPool provisions the resources to complete those jobs. The consequence is smarter, more elastic thread and resource utilization, especially in general-purpose programs that target a variety of execution environments and resource availability, for example VMs in the cloud.
I'm struggling to figure out async/await.
await implies a dependency of one task on another. If you don't have that in your case, other than waiting for all of them to complete, what you are doing is pretty much enough.
If you need it, you can achieve that with the TPL too, for example via ContinueWith.
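For comparison, here is a rough sketch of how the sample from the question might look with async/await. AsyncSample, WorkAsync and RunAllAsync are hypothetical names; the sketch assumes the Thread.Sleep call only simulates waiting, so it is replaced by an awaited Task.Delay, which does not block a thread.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class AsyncSample
{
    // Async counterpart of MyThreadTest.ThreadSample: awaits a timer
    // instead of blocking a thread with Thread.Sleep.
    private static async Task<int> WorkAsync(string name, int interval)
    {
        int steps = 0;
        for (int i = 0; i < 5; i++)
        {
            await Task.Delay(interval);
            Console.WriteLine($"{name} At {i} on thread {Environment.CurrentManagedThreadId}");
            steps++;
        }
        return steps;
    }

    // Starts all workers, then asynchronously waits for every one of them.
    public static async Task<int> RunAllAsync(int workers)
    {
        var random = new Random();
        Task<int>[] tasks = Enumerable.Range(0, workers)
            .Select(i => WorkAsync($"T{i}", random.Next(1, 50)))
            .ToArray();

        int[] results = await Task.WhenAll(tasks); // replaces the Wait() loop
        return results.Sum();
    }

    public static void Main()
    {
        int total = RunAllAsync(4).GetAwaiter().GetResult();
        Console.WriteLine($"Finished after {total} steps");
    }
}
```

The thread id printed by one worker may change between iterations, because each continuation after an await can resume on a different thread pool thread.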
I have a .NET 4.0 ASP.NET project which requires some threading work I've never really messed with before and I've been looking at this for days and I'm still clueless =/
Basically I want something like when you take a ticket at the deli and wait your turn before they get back to you. I'll try and relate this and see if it makes any sense...
function starts ---> gets to section where it needs to "take a ticket" (I assume queue some type of item in a blockingcollection) and waits until other "tickets" (a.k.a other instances of the same function) are completed before it gives the function the OK to resume (blocking collection gets to the item in the queue) ---> finish function.
I don't need/want to do any work in the queue, I just want the function to statically wait its turn among other instances of the function. Does that make sense? Is that possible?
Please provide code if possible as I've seen tons of examples but none of them make sense/don't do what I want.
If you want to have the timer solution, I'd enqueue all operations into a BlockingCollection and have a dedicated thread dequeue them. This thread would wait 5s and then push the dequeued item onto the thread pool. This dedicated thread should do this in an infinite loop. Dequeue, wait, push.
What I actually recommend however, is that you use the SemaphoreSlim class to throttle the number of concurrent requests to this fragile web service. Probably you should pick a number between 1 and 5 or so as the allowed amount of concurrency.
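A minimal sketch of that SemaphoreSlim throttle, using only the synchronous Wait and Release calls so that it also fits .NET 4.0. DeliCounter and Run are hypothetical names, and Thread.Sleep stands in for the call to the fragile service. Note that, as far as I know, SemaphoreSlim does not guarantee that waiting threads are admitted in FIFO order.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class DeliCounter
{
    // Lets `customers` threads compete for a section that admits at most
    // `maxConcurrency` of them at once; returns the highest number of
    // threads observed inside the section together.
    public static int Run(int customers, int maxConcurrency)
    {
        var gate = new SemaphoreSlim(maxConcurrency);
        object sync = new object();
        int inside = 0;
        int maxObserved = 0;
        var threads = new List<Thread>();

        for (int i = 1; i <= customers; i++)
        {
            int customer = i; // copy so each lambda captures its own value
            var thread = new Thread(() =>
            {
                gate.Wait(); // "take a ticket" and block until admitted
                try
                {
                    lock (sync) { inside++; maxObserved = Math.Max(maxObserved, inside); }
                    Console.WriteLine("Customer {0}'s turn", customer);
                    Thread.Sleep(20); // stand-in for the call to the fragile service
                    lock (sync) { inside--; }
                }
                finally
                {
                    gate.Release(); // let the next waiting caller in
                }
            });
            threads.Add(thread);
            thread.Start();
        }

        foreach (Thread thread in threads) thread.Join();
        return maxObserved;
    }

    public static void Main()
    {
        Console.WriteLine("Max customers served at once: " + Run(5, 1));
    }
}
```

With maxConcurrency set to 1 the section is fully serialized; raising it allows a few requests through at the same time.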
Alright so after researching document after document and playing with numerous rewrites of code I finally figured out I wasn't using the AutoResetEvent right and how to use a blocking collection on a dedicated thread. So here was the final solution using an AutoResetEvent with a BlockingCollection. This solution below might not show the same results 100% of the time (just because I believe it has to do with thread timing of when something was entered into the blocking collection) but the end result is that it does exactly what I want.
class Program
{
static void Main(string[] args)
{
TaskProcessor tp = new TaskProcessor();
Thread t1 = new Thread(new ParameterizedThreadStart(tp.SubmitRequest));
t1.Start(1);
Thread t2 = new Thread(new ParameterizedThreadStart(tp.SubmitRequest));
t2.Start(2);
Thread t3 = new Thread(new ParameterizedThreadStart(tp.SubmitRequest));
t3.Start(3);
}
}
class TaskProcessor
{
private AutoResetEvent _Ticket;
public TaskProcessor()
{
_Ticket = new AutoResetEvent(false);
}
public void SubmitRequest(object i)
{
TicketingQueue dt = new TicketingQueue();
Console.WriteLine("Grab ticket for customer {0}", (int)i);
// Queue the shared event, then block until the dispatcher thread signals it.
// All requests share one event, so which waiting request wakes first is not guaranteed.
dt.GrabTicket(_Ticket);
_Ticket.WaitOne();
Console.WriteLine("Customer {0}'s turn", (int)i);
}
}
public class TicketingQueue
{
private static BlockingCollection<AutoResetEvent> tickets = new BlockingCollection<AutoResetEvent>();
static TicketingQueue()
{
var thread = new Thread(
() =>
{
while (true)
{
AutoResetEvent e = tickets.Take();
e.Set();
Thread.Sleep(1000);
}
});
thread.Start();
}
public void GrabTicket(AutoResetEvent e)
{
tickets.Add(e);
}
}
I'm trying to implement an algorithm that should run in parallel using threads or tasks. The difficulty is that I want the threads/tasks to share their best results from time to time with all other threads.
The basic idea is this:
//Accessible from each thread
IProducerConsumerCollection<MyObject> _bestObjects;
//Executed in each thread
DoSomeWork(int n){
MyObject localObject;
for(var i = 0; i < n; i++){
//Do some calculations and store results in localObject
if((i/n)%0.5 == 0)
{
//store localObject in _bestObjects
//wait until each thread has stored its result in _bestObjects
//get the best result from _bestObjects and go on
}
}
}
How can this be achieved using System.Threading or System.Threading.Tasks and is it true that tasks should not be used for long running operations?
Update: Clarification
My problem is not having a thread-safe collection, but making the threads stop, publish their result, wait until all other threads have published their results too, and then go on again. All threads will run simultaneously.
Cutting a long story short:
What's better for long-running operations? Task or Thread or anything else?
How can the threads/tasks communicate to inform each other about the state of all the others, assuming that the number of threads is set at runtime (depending on available cores)?
Best Regards
Jay
Look at the following example.
public class Worker
{
public SharedData state;
public void Work(SharedData someData)
{
this.state = someData;
while (true) ;
}
}
public class SharedData {
X myX;
public getX() { ... }
public setX(anX) { ... }
}
public class Sharing
{
public static void Main()
{
SharedData data = new SharedData();
Worker work1 = new Worker(data);
Worker work2 = new Worker(data);
Thread thread = new Thread(new ThreadStart(work1.Work));
thread.Start();
Thread thread2 = new Thread(new ThreadStart(work2.Work));
thread2.Start();
}
}
bomslang's response is not accurate. You cannot instantiate a new thread with ThreadStart and pass in the Work method, which requires a parameter, as in the example above. ParameterizedThreadStart would be more suitable. The sample code for the Main method would look more like this:
public class Sharing
{
public static void Main()
{
SharedData data = new SharedData();
Worker work1 = new Worker();
Worker work2 = new Worker();
// Work must accept a single object parameter to match ParameterizedThreadStart.
Thread thread = new Thread(new ParameterizedThreadStart(work1.Work));
thread.Start(data);
Thread thread2 = new Thread(new ParameterizedThreadStart(work2.Work));
thread2.Start(data);
}
}
Note that 'work' is being passed into the ParameterizedThreadStart as the method for the new thread to execute, and the data required to pass in to the 'work' method is being passed in the call to start. The data must be passed as an object, so the work method will need to cast it back to the appropriate datatype as well. Lastly, there is also another approach to passing in data to a new thread via the use of anonymous methods.
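Both ways of passing data mentioned above can be sketched side by side. The types below are simplified, hypothetical stand-ins for the ones in the answers: SharedData holds a single counter, Work takes an object and casts it back, and WorkTyped is an extra method added only to show the anonymous method (lambda) variant.

```csharp
using System;
using System.Threading;

public class SharedData
{
    private int _x;
    public int GetX() { return Volatile.Read(ref _x); }
    public void Increment() { Interlocked.Increment(ref _x); }
}

public class Worker
{
    // Matches ParameterizedThreadStart: one parameter of type object,
    // which is cast back to the real type inside the method.
    public void Work(object state)
    {
        var data = (SharedData)state;
        data.Increment();
    }

    // Strongly typed variant, callable from a lambda without any cast.
    public void WorkTyped(SharedData data)
    {
        data.Increment();
    }
}

public static class Sharing
{
    // Runs one thread per approach against the same SharedData instance
    // and returns the final counter value.
    public static int Run()
    {
        var data = new SharedData();
        var worker = new Worker();

        // Approach 1: ParameterizedThreadStart, data handed over in Start.
        var thread1 = new Thread(new ParameterizedThreadStart(worker.Work));
        thread1.Start(data);

        // Approach 2: a lambda captures `data`, so no object cast is needed.
        var thread2 = new Thread(() => worker.WorkTyped(data));
        thread2.Start();

        thread1.Join();
        thread2.Join();
        return data.GetX();
    }

    public static void Main()
    {
        Console.WriteLine("Counter after both threads: " + Run());
    }
}
```

The lambda variant keeps the method signature strongly typed, at the cost of the compiler generating a closure object for the captured variables.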
What is the most recommended .NET custom threadpool that can have separate instances i.e more than one threadpool per application?
I need an unlimited queue size (building a crawler), and need to run a separate threadpool in parallel for each site I am crawling.
Edit :
I need to mine these sites for information as fast as possible, using a separate threadpool for each site would give me the ability to control the number of threads working on each site at any given time. (no more than 2-3)
Thanks
Roey
I believe Smart Thread Pool can do this. Its ThreadPool class is instantiable, so you should be able to create and manage separate site-specific instances as you require.
Ami Bar wrote an excellent Smart Thread Pool that can be instantiated.
Take a look here
Ask Jon Skeet: http://www.yoda.arachsys.com/csharp/miscutil/
Parallel extensions for .Net (TPL) should actually work much better if you want a large number of parallel running tasks.
A BlockingCollection can be used as a queue for the threads.
Here is an implementation of it.
Updated at 2018-04-23:
public class WorkerPool<T> : IDisposable
{
BlockingCollection<T> queue = new BlockingCollection<T>();
List<Task> taskList;
private CancellationTokenSource cancellationToken;
int maxWorkers;
private bool wasShutDown;
int waitingUnits;
public WorkerPool(CancellationTokenSource cancellationToken, int maxWorkers)
{
this.cancellationToken = cancellationToken;
this.maxWorkers = maxWorkers;
this.taskList = new List<Task>();
}
public void enqueue(T value)
{
queue.Add(value);
// Interlocked keeps the counter correct when it is updated from several threads.
Interlocked.Increment(ref waitingUnits);
}
//call to signal that there are no more items
public void CompleteAdding()
{
queue.CompleteAdding();
}
//create the workers and start them running
public void startWorkers(Action<T> worker)
{
for (int i = 0; i < maxWorkers; i++)
{
taskList.Add(new Task(() =>
{
string myname = "worker " + Guid.NewGuid().ToString();
try
{
while (!cancellationToken.IsCancellationRequested)
{
var value = queue.Take();
// Several workers decrement concurrently, so a plain waitingUnits-- could lose updates.
Interlocked.Decrement(ref waitingUnits);
worker(value);
}
}
catch (Exception ex) when (ex is InvalidOperationException) //throw when collection is closed with CompleteAdding method. No pretty way to do this.
{
//do nothing
}
}));
}
foreach (var task in taskList)
{
task.Start();
}
}
//wait for all workers to finish their jobs
public void await()
{
while (waitingUnits >0 || !queue.IsAddingCompleted)
Thread.Sleep(100);
shutdown();
}
private void shutdown()
{
wasShutDown = true;
Task.WaitAll(taskList.ToArray());
}
//in case something bad happens, dismiss all pending work
public void Dispose()
{
if (!wasShutDown)
{
queue.CompleteAdding();
shutdown();
}
}
}
Then use like this:
WorkerPool<int> workerPool = new WorkerPool<int>(new CancellationTokenSource(), 5);
workerPool.startWorkers(value =>
{
log.Debug(value);
});
//enqueue all the work
for (int i = 0; i < 100; i++)
{
workerPool.enqueue(i);
}
//Signal no more work
workerPool.CompleteAdding();
//wait all pending work to finish
workerPool.await();
You can have as many pools as you like simply by creating new WorkerPool objects.
This free nuget library here: CodeFluentRuntimeClient has a CustomThreadPool class that you can reuse. It's very configurable, you can change pool threads priority, number, COM apartment state, even name (for debugging), and also culture.
Another approach is to use a Dataflow pipeline. I added this later answer because I find Dataflow a much better approach for this kind of problem, the problem of having several thread pools. It provides a more flexible and structured approach and can easily scale vertically.
You can break your code into one or more blocks, link them with Dataflow, and then let the Dataflow engine allocate threads according to CPU and memory availability.
I suggest breaking it into 3 blocks: one for preparing the query to the site page, one to access the site page, and the last one to analyse the data.
This way the slow block (get) may have more threads allocated to compensate.
Here is how the Dataflow setup would look:
var linkOptions = new DataflowLinkOptions { PropagateCompletion = true };
prepareBlock.LinkTo(getBlock, linkOptions);
getBlock.LinkTo(analiseBlock, linkOptions);
Data will flow from prepareBlock to getBlock and then to analiseBlock.
The messages passed between blocks can be of any class; the output type of one block just has to be the same as the input type of the next. See the full example on Dataflow Pipeline
Using the Dataflow would be something like this:
while ...{
...
prepareBlock.Post(...); //to send data to the pipeline
}
prepareBlock.Complete(); //when done
analiseBlock.Completion.Wait(cancellationTokenSource.Token); //to wait for all queues to empty or cancel
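Putting the fragments above together, a self-contained sketch of the three-block pipeline might look like the following. The block names come from the answer, but the block bodies, the example.invalid URLs and the MaxDegreeOfParallelism value of 3 are assumptions for illustration only. On .NET Framework the System.Threading.Tasks.Dataflow NuGet package is required.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks.Dataflow;

public static class PipelineDemo
{
    // Pushes `pageCount` ids through prepare -> get -> analyse and
    // returns how many results reached the end of the pipeline.
    public static int Run(int pageCount)
    {
        var linkOptions = new DataflowLinkOptions { PropagateCompletion = true };

        // Block 1: prepare the query (placeholder URL, not a real site).
        var prepareBlock = new TransformBlock<int, string>(
            id => "https://example.invalid/page/" + id);

        // Block 2: the slow step, allowed to process up to 3 messages at once.
        // A real crawler would download the page here; this only fakes a body.
        var getBlock = new TransformBlock<string, string>(
            url => "<html>" + url + "</html>",
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 3 });

        // Block 3: analyse the data (here: just record the body length).
        var results = new ConcurrentBag<int>();
        var analiseBlock = new ActionBlock<string>(html => results.Add(html.Length));

        prepareBlock.LinkTo(getBlock, linkOptions);
        getBlock.LinkTo(analiseBlock, linkOptions);

        for (int i = 0; i < pageCount; i++)
        {
            prepareBlock.Post(i); // send data into the pipeline
        }

        prepareBlock.Complete();        // no more input
        analiseBlock.Completion.Wait(); // completion propagates through the links
        return results.Count;
    }

    public static void Main()
    {
        Console.WriteLine("Pages analysed: " + Run(10));
    }
}
```

Because PropagateCompletion is set on both links, completing prepareBlock is enough to shut the whole pipeline down once all queued messages have been processed.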