I have a function that needs to process items 3 at a time, and if the total time taken is less than x seconds, the thread should sleep for the remaining seconds before proceeding further.
So I'm doing the following:
private void ProcessItems()
{
for (int i = 0, n = items.Count; i < n; i++)
{
Stopwatch stopwatch = new Stopwatch();
stopwatch.Start();
batch.Add(items[i]);
if (batch.Count == 3 || i >= items.Count - 3)
{
List<Task> tasks = new List<Task>(3);
foreach (Item item in batch)
tasks.Add(Task.Factory.StartNew(() => ProcessItem(item)));
Task.WaitAll(tasks.ToArray());
batch.Clear();
}
stopwatch.Stop();
int elapsed = (int)stopwatch.ElapsedMilliseconds;
int delay = (3000) - elapsed;
if (delay > 0)
Thread.Sleep(delay);
}
}
The ProcessItem function makes a webrequest and processes the response (callback). This is the function that takes a small amount of time.
However, if I understand tasks correctly, a thread can have multiple tasks. Therefore, if I sleep the thread, other tasks can be affected.
Is there a more efficient way to achieve the above, and can tasks be used within Parallel.ForEach?
Tasks run on automatically managed threads. There is nothing intrinsically wrong with blocking a thread. It is just a little wasteful.
Here is how I would implement this very cleanly:
MyItem[] items = ...;
foreach(MyItem[] itemsChunk in items.AsChunked(3)) {
Parallel.ForEach(itemsChunk, item => Process(item));
//here you can insert a delay
}
This wastes not a single thread and is trivially simple. Parallel.ForEach uses the current thread to process work items as well, so it does not sit idle. You can add your delay logic as well. Implementing AsChunked is left as an exercise for the reader... This function is supposed to split a list into chunks of the given size (3). The good thing about such a helper function is that it untangles the batching logic from the important parts.
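For reference, a minimal sketch of what such a chunking helper could look like. AsChunked is not a framework method; the class name and the exact behaviour (a shorter final chunk when the count is not a multiple of the size) are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class ChunkingExtensions
{
    // Hypothetical helper matching the AsChunked call above: yields consecutive
    // arrays of at most `size` elements; the last array may be shorter.
    public static IEnumerable<T[]> AsChunked<T>(this IEnumerable<T> source, int size)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));

        var chunk = new List<T>(size);
        foreach (T item in source)
        {
            chunk.Add(item);
            if (chunk.Count == size)
            {
                yield return chunk.ToArray(); // full chunk
                chunk.Clear();
            }
        }
        if (chunk.Count > 0)
            yield return chunk.ToArray(); // trailing partial chunk
    }
}
```

On .NET 6 or later, the built-in Enumerable.Chunk does the same job, so a hand-written helper is only needed on older frameworks.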
Use Task.Delay instead:
static async Task DoSomeProcess()
{
await Task.Delay(3000);
}
You are right: Thread.Sleep blocks the thread it runs on, so that thread cannot run any other task until the sleep ends.
Yes, you can pair the async/await pattern with Parallel.
Your ProcessItems method can be very easily transformed into async version ProcessItemsAsync (I didn't validate the "batching" logic):
private async Task ProcessItemsAsync()
{
for (int i = 0, n = items.Count; i < n; i++)
{
Stopwatch stopwatch = new Stopwatch();
stopwatch.Start();
batch.Add(items[i]);
if (batch.Count == 3 || i >= items.Count - 3)
{
List<Task> tasks = new List<Task>(3);
foreach (Item item in batch)
tasks.Add(Task.Run(() => ProcessItem(item)));
await Task.WhenAll(tasks.ToArray());
batch.Clear();
}
stopwatch.Stop();
int elapsed = (int)stopwatch.ElapsedMilliseconds;
int delay = (3000) - elapsed;
if (delay > 0)
await Task.Delay(delay);
}
}
The only benefit would be that you don't block the ProcessItems thread with Task.WaitAll() and Thread.Sleep(), as @usr pointed out in his answer. Whether to take this approach or the Parallel.ForEach one probably depends on the running environment of your code. Async/await won't make your code run faster, but it will improve its scalability for server-side execution, because it may take fewer threads to run, so more clients can be served.
Note also that now ProcessItemsAsync is itself an async task, so to keep the flow of the code which calls it unchanged, you'd need to call it like this:
ProcessItemsAsync().Wait();
Which is a blocking call on its own and may kill the advantage of async we just gained. Whether you can completely eliminate blocks like this in your app or not, largely depends on the rest of the app's workflow.
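As a rough sketch of how the batching and the delay could be separated in an async version: BatchRunner, ProcessInBatchesAsync and the work delegate are hypothetical names, and the sketch assumes each item's work can be exposed as a Task (for example by wrapping ProcessItem in Task.Run).

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

public static class BatchRunner
{
    // Runs `work` for the items in batches of `batchSize`. Every batch gets at
    // least `minBatchTime`; the remainder is awaited, so no thread is blocked.
    public static async Task ProcessInBatchesAsync<T>(
        IReadOnlyList<T> items, int batchSize, TimeSpan minBatchTime, Func<T, Task> work)
    {
        for (int start = 0; start < items.Count; start += batchSize)
        {
            var stopwatch = Stopwatch.StartNew();

            // Start every item of this batch, then wait for all of them.
            List<Task> tasks = items.Skip(start).Take(batchSize).Select(work).ToList();
            await Task.WhenAll(tasks);

            // Wait out whatever is left of the minimum batch time.
            TimeSpan remaining = minBatchTime - stopwatch.Elapsed;
            if (remaining > TimeSpan.Zero)
                await Task.Delay(remaining);
        }
    }
}
```

A call could look like await BatchRunner.ProcessInBatchesAsync(items, 3, TimeSpan.FromSeconds(3), item => Task.Run(() => ProcessItem(item))). Note that the stopwatch here times a whole batch rather than each item.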
Edit: As per the discussion in the comments, I was overestimating how much having many threads would help. I have gone back to Parallel.ForEach with a reasonable MaxDegreeOfParallelism and just have to wait it out.
I have a 2D array data structure, and perform work on slices of the data. There will only ever be around 1000 threads required to work on all the data simultaneously. Basically there are around 1000 "days" worth of data for all ~7000 data points, and I would like to process the data for each day in a new thread in parallel.
My issue is that doing work in the child threads dramatically slows the time in which the main thread starts them. If I have no work being done in the child threads, the main thread starts them all basically instantly. In my example below, with just a bit of work, it takes ~65ms to start all the threads. In my real use case, the worker threads will take around 5-10 seconds to compute all what they need, but I would like them all to start instantly otherwise, I am basically running the work in sequence. I do not understand why their work is slowing down the main thread from starting them.
How the data is set up shouldn't matter (I hope). The way it's set up might look weird; I was just simulating exactly how I receive the data. What's important is that if you comment out the foreach loop in the DoThreadWork method, the time it takes to start the threads is waaay lower.
I have the for (var i = 0; i < 4; i++) loop just to run the simulation multiple times to see 4 sets of timing results to make sure that it wasn't just slow the first time.
Here is a code snippet to simulate my real code:
public static void Main(string[] args)
{
var fakeData = Enumerable
.Range(0, 7000)
.Select(_ => Enumerable.Range(0, 400).ToArray())
.ToArray();
const int offset = 100;
var dataIndices = Enumerable
.Range(offset, 290)
.ToArray();
for (var i = 0; i < 4; i++)
{
var s = Stopwatch.StartNew();
var threads = dataIndices
.Select(n =>
{
var thread = new Thread(() =>
{
foreach (var fake in fakeData)
{
var sliced = new ArraySegment<int>(fake, n - offset, n - (n - offset));
DoThreadWork(sliced);
}
});
return thread;
})
.ToList();
foreach (var thread in threads)
{
thread.Start();
}
Console.WriteLine($"Before Join: {s.Elapsed.Milliseconds}");
foreach (var thread in threads)
{
thread.Join();
}
Console.WriteLine($"After Join: {s.Elapsed.Milliseconds}");
}
}
private static void DoThreadWork(ArraySegment<int> fakeData)
{
// Commenting out this foreach loop will dramatically increase the speed
// in which all the threads start
var a = 0;
foreach (var fake in fakeData)
{
// Simulate thread work
a += fake;
}
}
Use the thread/task pool and limit the thread/task count to 2*(CPU cores) at most. Creating more threads doesn't magically get more work done, because you need hardware "threads" to run them (1 per core on non-SMT CPUs, 2 per core with Intel HT or AMD's SMT implementation). Running hundreds to thousands of threads that are not passively awaiting asynchronous callbacks (i.e. I/O) is far less efficient, because the CPU thrashes on context switches for no reason.
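A minimal sketch of that advice applied to the per-day slices, assuming the same data layout as in the question (an array of rows, one window of `offset` values per day). DayProcessor and SumPerDay are hypothetical names, and the summation only stands in for the real work.

```csharp
using System;
using System.Threading.Tasks;

public static class DayProcessor
{
    // Sums one "day" window of every row, spreading the days over at most
    // Environment.ProcessorCount workers instead of one thread per day.
    public static long[] SumPerDay(int[][] data, int[] dayIndices, int offset)
    {
        var totals = new long[dayIndices.Length];
        var options = new ParallelOptions
        {
            MaxDegreeOfParallelism = Environment.ProcessorCount
        };

        Parallel.For(0, dayIndices.Length, options, d =>
        {
            int n = dayIndices[d];
            long sum = 0;
            foreach (int[] row in data)
            {
                // Same window as new ArraySegment<int>(row, n - offset, offset).
                for (int k = n - offset; k < n; k++)
                    sum += row[k];
            }
            totals[d] = sum; // each iteration writes only its own slot
        });

        return totals;
    }
}
```

The days are still processed in parallel, but the number of workers is tied to the hardware, so the cost of starting and switching between hundreds of threads goes away.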
Here I have the following piece of code:
var tasks = new List<Task>();
var stopwatch = new Stopwatch();
for (var i = 0; i < 100; i++)
{
var person = new Person { Id = i };
list.Add(person);
}
stopwatch.Start();
foreach (var item in list)
{
var task = Task.Run(async () =>
{
await Task.Delay(1000);
Console.WriteLine("Hi");
});
tasks.Add(task);
}
await Task.WhenAll(tasks);
stopwatch.Stop();
I assumed the stopwatch would show about 100 seconds.
But it shows 1.1092223 seconds.
I think I'm missing something. Can you help explain why?
I assume your confusion comes from the await keyword in await Task.Delay(1000);
But that await only applies inside the task's delegate. Inside the loop, the next iteration starts immediately after Task.Run returns. So all tasks are started in close succession and then run in parallel (as far as the system has free threads at hand, of course). The system decides how, when, and in which order they are executed.
In the end in this line:
await Task.WhenAll(tasks);
you actually wait for the slowest of them (or the one started as last).
To fulfill your expectation, your code should actually look like this:
public async Task RunAsPseudoParallel()
{
List<Person> list = new List<Person>();
var stopwatch = new Stopwatch();
for (var i = 0; i < 100; i++)
{
var person = new Person { Id = i };
list.Add(person);
}
stopwatch.Start();
foreach (var item in list)
{
await Task.Run(async () =>
{
await Task.Delay(1000);
Console.WriteLine("Hi");
});
}
stopwatch.Stop();
}
Disclaimer: But this code is quite nonsensical, because it uses async functionality to implement a synchronous process. In this scenario you can simply leave out the Task.Run call and use a simple Thread.Sleep(1000).
Delays are always approximate.
You are limited by when the task scheduler chooses to run the delegate you pass to Task.Run. It may be executing other tasks and be unwilling to start up more threads. Or, it may launch a new thread -- which while not slow is also not free and costs time too.
You are limited by when the task scheduler chooses to resume your code after the delay completes.
You're also limited by the OS scheduler, which may be allocating CPU time to other processes/threads and end up delaying the thread that would execute your code.
Because you are launching multiple tasks, you are seeing all of these per-task variables compound into an even larger delay.
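The difference between starting all delays at once and awaiting them one by one can be measured with a small sketch like the following. DelayTiming and its two methods are hypothetical names, and the exact timings will vary from run to run for the reasons listed above.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

public static class DelayTiming
{
    // Starts `count` delays at once and waits for all of them:
    // the total is roughly one delay, plus scheduling overhead.
    public static async Task<TimeSpan> ConcurrentAsync(int count, int delayMs)
    {
        var stopwatch = Stopwatch.StartNew();
        await Task.WhenAll(Enumerable.Range(0, count).Select(_ => Task.Delay(delayMs)));
        return stopwatch.Elapsed;
    }

    // Awaits each delay before starting the next:
    // the total is roughly count * delay.
    public static async Task<TimeSpan> SequentialAsync(int count, int delayMs)
    {
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < count; i++)
            await Task.Delay(delayMs);
        return stopwatch.Elapsed;
    }
}
```

With 100 delays of 1000 ms, the first method corresponds to the code in the question (about one second) and the second to the expected 100 seconds.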
We have a processor that will receive a queue of elements, and for every element, it will run some actions that need to be guaranteed to be executed in a sequential manner. Each action to execute on an element is a Promise Task (http://blog.stephencleary.com/2014/04/a-tour-of-task-part-0-overview.html). The processing of each element in the queue doesn't need to wait for completion of the previous one.
The signature of the actions can be assumed to be something like this:
Task MyAwaitableMethod(int delay)
The way I'm seeing, the problem can be reduced to executing a loop, and inside the loop, executing sequential operations, and each iteration shouldn't block. I'm looking at 2 approaches:
1.
for (var i = 0; i < Iterations; i++)
{
Task.Run(async () => {
await MyAwaitableMethod(DelayInMilliseconds);
await MyAwaitableMethod(DelayInMilliseconds);
});
}
2.
for (var i = 0; i < Iterations; i++)
{
MyAwaitableMethod(DelayInMilliseconds).ContinueWith(
antecedent => MyAwaitableMethod(DelayInMilliseconds));
}
I was assuming, given that the actions are Promises, that approach #2 would create fewer threads than Task.Run, which I'd assume would create more. But in the tests I've run, the number of threads created when executing a high number of iterations tends to be the same for both, and does not depend on the number of iterations.
Are both methods entirely equivalent? Or do you have better suggestions?
EDIT (rephrasing the question)
Are both methods equivalent in terms of the number of threads both require?
Thanks
Why not use Task.WhenAll()?
var tasks = new List<Task>();
for (var i = 0; i < Iterations; i++)
{
Task t = MyAwaitableMethod(DelayInMilliseconds);
tasks.Add(t);
}
await Task.WhenAll(tasks);
Part of the beauty of async-await is to write sequential asynchronous code.
If it was not to be asynchronous, you would write:
for (var i = 0; i < Iterations; i++)
{
MyAwaitableMethod(DelayInMilliseconds);
MyAwaitableMethod(DelayInMilliseconds);
}
If you want it to be asynchronous, just write:
for (var i = 0; i < Iterations; i++)
{
await MyAwaitableMethod(DelayInMilliseconds);
await MyAwaitableMethod(DelayInMilliseconds);
}
The code you posted does not satisfy your requirement to process each item only after the previous one, because you are not awaiting the task returned by Task.Run.
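If the requirement is read the way the question states it (the actions for one element run in order, while different elements do not wait for each other), one possible sketch is the following. ElementProcessor, ProcessAllAsync and ProcessElementAsync are hypothetical names, and the actions are assumed to be promise-style tasks like MyAwaitableMethod.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class ElementProcessor
{
    // Starts the action chain for every element without waiting for the
    // previous element, and completes when all chains have finished.
    public static Task ProcessAllAsync<T>(IEnumerable<T> elements, params Func<T, Task>[] actions)
    {
        List<Task> running = elements
            .Select(element => ProcessElementAsync(element, actions))
            .ToList();
        return Task.WhenAll(running);
    }

    // Runs the actions for a single element strictly one after another.
    private static async Task ProcessElementAsync<T>(T element, Func<T, Task>[] actions)
    {
        foreach (Func<T, Task> action in actions)
            await action(element);
    }
}
```

No Task.Run is involved here: while an action is pending, no thread is held for it, and the continuations run on whichever thread the awaiter resumes on.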
I've got a method which takes IWorkItem, starts work on it and returns related task. The method has to look like this because of external library used.
public Task WorkOn(IWorkItem workItem)
{
//...start asynchronous operation, return task
}
I want to do this work on multiple work items. I don't know how many of them will be there - maybe 1, maybe 10 000.
The WorkOn method has internal pooling and may involve waiting if too many parallel executions are reached (like in SemaphoreSlim.Wait):
public Task WorkOn(IWorkItem workItem)
{
_semaphoreSlim.Wait();
}
My current solution is:
public void Do(params IWorkItem[] workItems)
{
var tasks = new Task[workItems.Length];
for (var i = 0; i < workItems.Length; i++)
{
tasks[i] = WorkOn(workItems[i]);
}
Task.WaitAll(tasks);
}
Question: can I somehow use Parallel.ForEach in this case, to avoid creating 10000 tasks that then just wait because of WorkOn's throttling?
That actually is not that easy. You can use Parallel.ForEach to throttle the number of tasks that are spawned. But I am unsure how that will perform/behave in your situation.
As a general rule of thumb I usually try to avoid mixing Task and Parallel.
Surely you can do something like this:
public void Do(params IWorkItem[] workItems)
{
Parallel.ForEach(workItems, (workItem) => WorkOn(workItem).Wait());
}
Under "normal" conditions this should limit your concurrency nicely.
You could also go full async-await and add some limiting to your concurrency with some tricks. But you have to do the concurrency limiting yourself in that case.
const int ConcurrencyLimit = 8;
public async Task Do(params IWorkItem[] workItems)
{
var cursor = 0;
var currentlyProcessing = new List<Task>(ConcurrencyLimit);
while (cursor < workItems.Length)
{
while (currentlyProcessing.Count < ConcurrencyLimit && cursor < workItems.Length)
{
currentlyProcessing.Add(WorkOn(workItems[cursor]));
cursor++;
}
Task finished = await Task.WhenAny(currentlyProcessing);
currentlyProcessing.Remove(finished);
}
await Task.WhenAll(currentlyProcessing);
}
As I said... a lot more complicated. But it will limit the concurrency to any value you apply as well. In addition it properly uses the async-await pattern. If you don't want non-blocking multi threading you can easily wrap this function into another function and do a blocking .Wait on the task returned by this function.
The key to this implementation is the Task.WhenAny function. It returns a task that completes as soon as any task in the supplied list has finished; awaiting it yields that finished task.
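An alternative way to do the concurrency limiting yourself is a SemaphoreSlim with WaitAsync. This is a sketch under assumptions, not the answer's method: Throttled and ForEachAsync are hypothetical names, and the work delegate stands in for WorkOn.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class Throttled
{
    // Runs `work` for every item with at most `limit` operations in flight.
    public static async Task ForEachAsync<T>(IEnumerable<T> items, int limit, Func<T, Task> work)
    {
        using (var gate = new SemaphoreSlim(limit))
        {
            var tasks = new List<Task>();
            foreach (T item in items)
            {
                await gate.WaitAsync(); // waits for a free slot without blocking a thread
                tasks.Add(RunAsync(item, work, gate));
            }
            await Task.WhenAll(tasks);
        }
    }

    // Runs one item and frees its slot even if the work throws.
    private static async Task RunAsync<T>(T item, Func<T, Task> work, SemaphoreSlim gate)
    {
        try { await work(item); }
        finally { gate.Release(); }
    }
}
```

One Task object per item is still created over the lifetime of the call, but an item is only started once a slot is free, so at most `limit` operations run at any time.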
I'm making my first steps in parallel programming. I rewrote CalculateSlots to CalculateSlotsAsync. It seems to work fine (3 times faster).
My questions are: Is it written correctly?
Do I need to use the newer async/await pattern, and if yes, how?
private void CalculateSlots(bool isCalculateAllSlots)
{
foreach (IndicatorSlot indicatorSlot in strategy.Slot)
{
if (isCalculateAllSlots || !indicatorSlot.IsCalculated)
CalculateStrategySlot(indicatorSlot.SlotNumber);
}
}
private void CalculateSlotsAsync(bool isCalculateAllSlots)
{
var tasks = new List<Task>();
foreach (IIndicatorSlot indicatorSlot in strategy.Slot)
{
if (isCalculateAllSlots || !indicatorSlot.IsCalculated)
{
IIndicatorSlot slot = indicatorSlot;
Task task = Task.Factory.StartNew(() => CalculateStrategySlot(slot.SlotNumber));
tasks.Add(task);
}
}
Task.WaitAll(tasks.ToArray());
}
Test on i7-3630QM @ 2.40 GHz
// Executed for 96 sec.
for (int i = 0; i < 1000; i++)
CalculateSlots(true);
// Executed for 34 sec.
for (int i = 0; i < 1000; i++)
CalculateSlotsAsync(true);
For data-parallel operations, you can often simplify your implementation by using PLINQ:
strategy.Slot.AsParallel()
.Where(slot => isCalculateAllSlots || !slot.IsCalculated)
.ForAll(slot => CalculateStrategySlot(slot.SlotNumber));
However, in your case, each item takes a relatively long time to compute, so I would recommend leaving them as tasks but marking them as LongRunning (which typically has the effect of executing them on a dedicated thread, rather than the thread pool).
Task task = Task.Factory.StartNew(() => CalculateStrategySlot(slot.SlotNumber),
TaskCreationOptions.LongRunning);
Reply: Task.WaitAll causes the calling thread – in your case, the UI thread – to block until all specified tasks have completed. (The behaviour is similar for the PLINQ ForAll.)
In order for your UI to remain responsive, you need to switch from a blocking approach to an asynchronous one. For example, suppose you have:
Task.WaitAll(tasks.ToArray());
UpdateUI(strategy.Slot); // must be called on UI thread
You can replace this with:
Task.Factory.ContinueWhenAll(tasks.ToArray(), completedTasks =>
{
// callback on UI thread
UpdateUI(strategy.Slot);
},
CancellationToken.None,
TaskContinuationOptions.None,
TaskScheduler.FromCurrentSynchronizationContext());
In practice, you'll also need to learn how to use CancellationToken to allow the user to discard the operation before it completes.
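As a hedged sketch of how cancellation could be combined with an awaitable version of the method: SlotCalculator and CalculateAsync are hypothetical names, the calculate delegate stands in for CalculateStrategySlot, and the token is only checked before each slot starts, not during a running calculation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class SlotCalculator
{
    // Calculates every slot on the thread pool and completes when all are done.
    // Slots that have not started yet are skipped once `token` is cancelled.
    public static async Task CalculateAsync(
        IEnumerable<int> slotNumbers, Action<int> calculate, CancellationToken token)
    {
        List<Task> tasks = slotNumbers
            .Select(number => Task.Run(() =>
            {
                token.ThrowIfCancellationRequested(); // do not start cancelled work
                calculate(number);
            }, token))
            .ToList();

        // Awaiting (instead of Task.WaitAll) leaves the calling thread free.
        await Task.WhenAll(tasks);
    }
}
```

When this is awaited from a UI event handler, the code after the await normally resumes on the UI thread, so a call such as UpdateUI(strategy.Slot) can follow it directly. Cancelling the token surfaces as an OperationCanceledException at the await.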