I'm trying to write my own scheduler; the rationale behind it is that all the submitted actions will be executed in order, according to a delay. For example, if at time 0 I schedule action A with delay 5 and at time 1 I schedule action B with delay 2, then B should be executed first at time 3 and A should be executed second, at time 5.
Basically, what I am trying to do is something like:
public class MyScheduler
{
Task _task = new Task(() => { });
public MyScheduler()
{
_task.Start();
}
public void Schedule(Action action, long delay)
{
Task.Delay(TimeSpan.FromTicks(delay)).ContinueWith(_ =>
{
lock (_task) {
_task = _task.ContinueWith(task => action());
}
});
}
}
A relevant test for this code would be:
var waiter = new Waiter(3);
int _count = 0;
var mysched = new MyScheduler();
mysched.Schedule(() => { _count++; waiter.Signal(); });
mysched.Schedule(() => { Task.Delay(100).Wait(); _count *= 3; waiter.Signal(); });
mysched.Schedule(() => { _count++; waiter.Signal(); });
waiter.Await();
Assert.AreEqual(4, _count);
In the above code, Waiter is a class with an internal variable initialized in the constructor; the Signal method decrements that internal variable and the Await method loops (and sleeps 10 ms on each iteration) until the internal variable is less than or equal to zero.
The aim of the test is to show that the scheduled actions have been performed in order.
Most of the time this is true and the test passes, but on a few occasions the resulting value of _count is 2 instead of 4. I have spent a lot of time trying to work out why this happens, without success, and my lack of experience with C# is not helping either.
Does anyone have any suggestions?
For one thing, _count is not synchronized for access from different threads.
I recommend that you not use ContinueWith at all; it is a very low-level method and is very easy to get the details wrong (for example, the default scheduler is TaskScheduler.Current, which is almost never what you want). Your general logic code should use await instead of ContinueWith.
Regarding the scheduler, these days it is almost impossible to make a good use case for developing your own. There are better ones available that are developed by geniuses and extremely well-tested. Consider Reactive Extensions: they provide several schedulers, and they all support scheduling.
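To illustrate the advice about preferring await over ContinueWith, here is a minimal sketch rather than a drop-in replacement. The class name AwaitBasedScheduler is hypothetical, and unlike the question's method, Schedule here returns a Task and takes a TimeSpan. It assumes that running at most one action at a time is enough; SemaphoreSlim does not guarantee FIFO ordering, so actions whose delays expire at almost the same moment may still run in either order.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch: delay with await, then run actions one at a time.
public class AwaitBasedScheduler
{
    // One permit, so at most one scheduled action executes at any moment.
    private readonly SemaphoreSlim _gate = new SemaphoreSlim(1, 1);

    // Returns a Task so callers (and tests) can await completion
    // instead of polling a counter.
    public async Task Schedule(Action action, TimeSpan delay)
    {
        await Task.Delay(delay).ConfigureAwait(false);
        await _gate.WaitAsync().ConfigureAwait(false);
        try
        {
            action();
        }
        finally
        {
            _gate.Release();
        }
    }
}
```

Because each call returns a Task, a test can wait on Task.WhenAll of the returned tasks instead of using a polling Waiter, and the semaphore also keeps the actions from touching shared state such as _count concurrently.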
In my code I have a method such as:
void PerformWork(List<Item> items)
{
HostingEnvironment.QueueBackgroundWorkItem(async cancellationToken =>
{
foreach (var item in items)
{
await itemHandler.PerformIndividualWork(item);
}
});
}
Where Item is just a known model and itemHandler just does some work based off of the model (the ItemHandler class is defined in a separately maintained code base as nuget pkg I'd rather not modify).
The purpose of this code is to have work done for a list of items in the background but synchronously.
As part of the work, I would like to create a unit test to verify that when this method is called, the items are handled synchronously. I'm pretty sure the issue can be simplified down to this:
await MyTask(1);
await MyTask(2);
Assert.IsTrue(/* MyTask with arg 1 was completed before MyTask with arg 2 */);
The first part of this code I can easily unit test is that the sequence is maintained. For example, using NSubstitute I can check method call order on the library code:
Received.InOrder(() =>
{
itemHandler.PerformIndividualWork(Arg.Is<Item>(arg => arg.Name == "First item"));
itemHandler.PerformIndividualWork(Arg.Is<Item>(arg => arg.Name == "Second item"));
itemHandler.PerformIndividualWork(Arg.Is<Item>(arg => arg.Name == "Third item"));
});
But I'm not quite sure how to ensure that they aren't run in parallel. I've had several ideas which seem bad like mocking the library to have an artificial delay when PerformIndividualWork is called and then either checking a time elapsed on the whole background task being queued or checking the timestamps of the itemHandler received calls for a minimum time between the calls. For instance, if I have PerformIndividualWork mocked to delay 500 milliseconds and I'm expecting three items, then I could check elapsed time:
stopwatch.Start();
// I have an interface instead of directly calling HostingEnvironment, so I can access the task being queued here
backgroundTask.Invoke(...);
stopwatch.Stop();
Assert.IsTrue(stopwatch.ElapsedMilliseconds > 1500);
But that doesn't feel right and could lead to false positives. Perhaps the solution lies in modifying the code itself; however, I can't think of a way of meaningfully changing it to make this sort of unit test (testing that tasks are run in order) possible. We'll definitely have system/integration testing to ensure the issue caused by asynchronous performance of the individual items doesn't happen, but I would like to cover it at this level as well.
Not sure if this is a good idea, but one approach could be to use an itemHandler that will detect when items are handled in parallel. Here is a quick and dirty example:
public class AssertSynchronousItemHandler : IItemHandler
{
private volatile int concurrentWork = 0;
public List<Item> Items = new List<Item>();
public Task PerformIndividualWork(Item item) =>
Task.Run(() => {
var result = Interlocked.Increment(ref concurrentWork);
if (result != 1) {
throw new Exception($"Expected 1 work item running at a time, but got {result}");
}
Items.Add(item);
var after = Interlocked.Decrement(ref concurrentWork);
if (after != 0) {
throw new Exception($"Expected 0 work items running once this item finished, but got {after}");
}
});
}
There are probably big problems with this, but the basic idea is to check how many items are already being handled when we enter the method, then decrement the counter and check there are still no other items being handled. With threading stuff I think it is very hard to make guarantees about things from tests alone, but with enough items processed this can give us a little confidence that it is working as expected:
[Fact]
public void Sample() {
var handler = new AssertSynchronousItemHandler();
var subject = new Subject(handler);
var input = Enumerable.Range(0, 100).Select(x => new Item(x.ToString())).ToList();
subject.PerformWork(input);
// With the code from the question we don't have a way of detecting
// when `PerformWork` finishes. If we can't change this we need to make
// sure we wait "long enough". Yes this is yuck. :)
Thread.Sleep(1000);
Assert.Equal(input, handler.Items);
}
If I modify PerformWork to do things in parallel I get the test failing:
public void PerformWork2(List<Item> items) {
Task.WhenAll(
items.Select(item => itemHandler.PerformIndividualWork(item))
).Wait(2000);
}
// ---- System.Exception : Expected 1 work item running at a time, but got 4
That said, if it is very important to run synchronously and it is not apparent from glancing at the implementation with async/await then maybe it is worth using a more obviously synchronous design, like a queue serviced by only one thread, so that you're guaranteed synchronous execution by design and people won't inadvertently change it to async during refactoring (i.e. it is deliberately synchronous and documented that way).
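A queue serviced by a single thread could look roughly like the sketch below. The class name SingleThreadWorkQueue is made up for illustration, and the sketch assumes the queued work is synchronous and does not throw (an unhandled exception would end the worker thread).

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical sketch: every queued action runs on one dedicated thread,
// so two actions can never overlap and they run in the order they were added.
public sealed class SingleThreadWorkQueue : IDisposable
{
    private readonly BlockingCollection<Action> _queue = new BlockingCollection<Action>();
    private readonly Thread _worker;

    public SingleThreadWorkQueue()
    {
        _worker = new Thread(() =>
        {
            // Blocks while the queue is empty; ends after CompleteAdding()
            // once the remaining items have been drained.
            foreach (var work in _queue.GetConsumingEnumerable())
            {
                work();
            }
        }) { IsBackground = true };
        _worker.Start();
    }

    public void Enqueue(Action work) => _queue.Add(work);

    // Stops accepting new work and waits for the queued work to finish.
    public void Dispose()
    {
        _queue.CompleteAdding();
        _worker.Join();
        _queue.Dispose();
    }
}
```

With this shape a test no longer needs Thread.Sleep: disposing the queue returns only after everything queued has run.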
My C# application stops responding for a long time; when I break in the debugger, it stops in a function.
foreach (var item in list)
{
xmldiff.Compare(item, secondary, output);
...
}
I guess the running time of this function is long or it hangs. Anyway, I want to wait for a certain time (e.g. 5 seconds) for the execution of this function, and if it exceeds this time, I skip it and go to the next item in the loop. How can I do it? I found some similar questions, but they are mostly about processes or asynchronous methods.
You can do it the brutal way: spin up a thread to do the work, join it with timeout, then abort it, if the join didn't work.
Example:
var worker = new Thread( () => { xmldiff.Compare(item, secondary, output); } );
worker.Start();
if (!worker.Join( TimeSpan.FromSeconds( 1 ) ))
worker.Abort();
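But be warned: aborting threads is not considered nice and can make your app unstable, and on .NET Core and .NET 5 or later Thread.Abort is not supported at all (it throws PlatformNotSupportedException). If at all possible, try to modify Compare to accept a CancellationToken to cancel the comparison.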
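If Compare can be changed, cooperative cancellation might look like the sketch below. SlowCompare is a hypothetical stand-in for the real comparison (whose signature is not known here), and the sketch assumes the work has points where it can check the token.

```csharp
using System;
using System.Threading;

static class CooperativeCancelDemo
{
    // Hypothetical stand-in for a long-running Compare that checks the token
    // between units of work.
    static int SlowCompare(int steps, CancellationToken token)
    {
        for (int i = 0; i < steps; i++)
        {
            token.ThrowIfCancellationRequested();
            Thread.Sleep(10); // simulated unit of work
        }
        return steps;
    }

    // Returns false (with result set to -1) when the timeout elapsed first.
    public static bool TryCompare(int steps, TimeSpan timeout, out int result)
    {
        // The token source cancels itself once the timeout has passed.
        using (var cts = new CancellationTokenSource(timeout))
        {
            try
            {
                result = SlowCompare(steps, cts.Token);
                return true;
            }
            catch (OperationCanceledException)
            {
                result = -1;
                return false;
            }
        }
    }
}
```

No extra thread is needed in this version; the loop over the items simply moves on when TryCompare returns false. It only helps if the slow code actually checks the token often enough.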
I would avoid directly using threads and use Microsoft's Reactive Extensions (NuGet "Rx-Main") to abstract away the management of the threads.
I don't know the exact signature of xmldiff.Compare(item, secondary, output) but if I assume it produces an integer then I could do this with Rx:
var query =
from item in list.ToObservable()
from result in
Observable
.Start(() => xmldiff.Compare(item, secondary, output))
.Timeout(TimeSpan.FromSeconds(5.0), Observable.Return(-1))
select new { item, result };
var subscription =
query
.Subscribe(x =>
{
/* do something with `x.item` and/or `x.result` */
});
This automatically iterates through each item and starts a background computation of xmldiff.Compare, but only allows each computation to take as much as 5.0 seconds before returning a default value of -1.
The subscription variable is an IDisposable, so if you want to abort the entire query before it completes just call .Dispose().
I skip it and go to the next item in the loop
By "skip it", do you mean "leave it there" or "cancel it"? The two scenarios are quite different, but for both I suggest you use Task.
//generate 10 example tasks
var tasks = Enumerable
.Range(0, 10)
.Select(n => new Task(() => DoSomething(n)))
.ToList();
var maxExecutionTime = TimeSpan.FromSeconds(5);
foreach (var task in tasks)
{
// new Task(...) only creates the task; start it before waiting on it
task.Start();
if (task.Wait(maxExecutionTime))
{
//the task is finished in time
}
else
{
// the task is over time
// just leave it there
// the loop continues
// if you want to cancel it, see
// http://stackoverflow.com/questions/4783865/how-do-i-abort-cancel-tpl-tasks
}
}
One thing to improve is "do you really need to run your tasks one by one?" If they are independent you can run them in parallel.
I've got a method which takes IWorkItem, starts work on it and returns related task. The method has to look like this because of external library used.
public Task WorkOn(IWorkItem workItem)
{
//...start asynchronous operation, return task
}
I want to do this work on multiple work items. I don't know how many of them will be there - maybe 1, maybe 10 000.
The WorkOn method has internal pooling and may block if too many parallel executions are in flight (like SemaphoreSlim.Wait):
public Task WorkOn(IWorkItem workItem)
{
_semaphoreSlim.Wait();
}
My current solution is:
public void Do(params IWorkItem[] workItems)
{
var tasks = new Task[workItems.Length];
for (var i = 0; i < workItems.Length; i++)
{
tasks[i] = WorkOn(workItems[i]);
}
Task.WaitAll(tasks);
}
Question: can I somehow use Parallel.ForEach in this case, to avoid creating 10000 tasks that then just wait because of WorkOn's throttling?
That actually is not that easy. You can use Parallel.ForEach to throttle the amount of tasks that are spawned. But I am unsure how that will perform/behave in your condition.
As a general rule of thumb I usually try to avoid mixing Task and Parallel.
Surely you can do something like this:
public void Do(params IWorkItem[] workItems)
{
Parallel.ForEach(workItems, (workItem) => WorkOn(workItem).Wait());
}
Under "normal" conditions this should limit your concurrency nicely.
You could also go full async-await and add some limiting to your concurrency with some tricks. But you have to do the concurrency limiting yourself in that case.
const int ConcurrencyLimit = 8;
public async Task Do(params IWorkItem[] workItems)
{
var cursor = 0;
var currentlyProcessing = new List<Task>(ConcurrencyLimit);
while (cursor < workItems.Length)
{
while (currentlyProcessing.Count < ConcurrencyLimit && cursor < workItems.Length)
{
currentlyProcessing.Add(WorkOn(workItems[cursor]));
cursor++;
}
Task finished = await Task.WhenAny(currentlyProcessing);
currentlyProcessing.Remove(finished);
}
await Task.WhenAll(currentlyProcessing);
}
As I said... a lot more complicated. But it will limit the concurrency to any value you apply as well. In addition it properly uses the async-await pattern. If you don't want non-blocking multi threading you can easily wrap this function into another function and do a blocking .Wait on the task returned by this function.
The key to this implementation is the Task.WhenAny function. It returns one finished task from the supplied list of tasks (wrapped in another task so that it can be awaited).
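A different way to do the limiting yourself, not the approach shown above, is to gate each item with a SemaphoreSlim. This is only a sketch: Throttled.ForEachAsync is a hypothetical helper, and it still allocates one (cheap, mostly idle) Task per item, so it limits how many WorkOn calls are in flight rather than how many tasks exist.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class Throttled
{
    // Hypothetical helper: runs `work` for every item, with at most `limit`
    // invocations in flight at any time.
    public static async Task ForEachAsync<T>(IEnumerable<T> items, int limit, Func<T, Task> work)
    {
        using (var gate = new SemaphoreSlim(limit, limit))
        {
            var tasks = items.Select(async item =>
            {
                // Asynchronously wait for a free slot; no thread is blocked.
                await gate.WaitAsync().ConfigureAwait(false);
                try
                {
                    await work(item).ConfigureAwait(false);
                }
                finally
                {
                    gate.Release();
                }
            }).ToList();

            // Completes only after every per-item task has finished.
            await Task.WhenAll(tasks).ConfigureAwait(false);
        }
    }
}
```

Usage would be along the lines of await Throttled.ForEachAsync(workItems, 8, WorkOn), where the limit of 8 is an arbitrary choice.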
I'm talking about a single-threaded (not TaskEx for Windows Phone; OK, even a basic Task is designed to be async, which makes the question senseless) and synchronous (no async/await) pure Task.
Can it be useful in some cases (I have a quite common app which pulls data from the server, deserializes it and shows the results), or is Task just a foundation for await TaskEx.Run()?
EDIT1: i mean, how this
void Foo()
{
DoSmth();
}
void Main()
{
int a = 1;
Foo();
int b = 1;
}
would differ from
void Main()
{
int a = 1;
Task.Run( () => DoSmth() );
int b = 1;
}
Calling Foo(); is also kinda a "promise that next code would be called after Foo() is done".
EDIT2: I just ran in wp7 app
Debug.WriteLine("OnLoaded {0} ", Thread.CurrentThread.ManagedThreadId);
Task.Factory.StartNew(() =>
{
Thread.Sleep(5000);
Debug.WriteLine("Run Id: {0}", Thread.CurrentThread.ManagedThreadId);
});
Debug.WriteLine("Done");
Got the output:
OnLoaded 1
Done
Run Id: 4
So, is Task.Factory.StartNew() the same as TaskEx.Run() ?
EDIT3: so, here is a short summary (as Task.Factory.StartNew() is the same as TaskEx.Run()):
Thread.Sleep(5000); // UI is frozen for 5 seconds
int a = 1; // this is called after 5 seconds
TaskEx.Run(() =>
{
Thread.Sleep(5000);
int a = 1; // this is called after 5 seconds
});
int b = 2; // UI is not frozen, this is called instantly
await TaskEx.Run(() => // UI is not frozen, but...
{
Thread.Sleep(5000);
int a = 1; // this is called after 5 seconds
});
int b = 2; // this is called when the task is done
A Task is just a way to represent something that will complete in the future. This is most commonly an asynchronous operation or something running in a background thread (via Task.Run/TaskEx.Run).
A "synchronous pure Task" really doesn't make sense - the entire purpose of a Task is to represent something that is not synchronous.
Can it be useful in some cases (I have a quite common app which pulls data from the server, deserializes it and shows the results),
In this case, since the data is pulled from a server, it is by its nature a good candidate for an asynchronous operation. This would make it a perfect candidate for Task (or Task<T>).
In response to your edit:
In the first version, everything is just run sequentially.
The second version, using Task.Run, actually causes DoSmth() to execute on a background thread. The returned Task can be used with await to asynchronously wait for it to complete, if you want to do so. This means that DoSmth() will potentially run at the same time as the assignment to b (and subsequent operations).
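As a small sketch of that second version with the await added: here DoSmth is a hypothetical stand-in that returns a value (the question's version returns nothing), and MainAsync is a made-up name.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class TaskRunDemo
{
    // Hypothetical stand-in for the question's DoSmth, returning a value.
    static int DoSmth()
    {
        Thread.Sleep(100);
        return 42;
    }

    public static async Task<int> MainAsync()
    {
        // Starts DoSmth on a thread-pool thread and returns immediately.
        Task<int> background = Task.Run(() => DoSmth());

        // Runs right away, possibly while DoSmth is still executing.
        int b = 1;

        // Execution continues past this line only once DoSmth has finished.
        int result = await background;
        return result + b;
    }
}
```

Without the await (or some other use of the returned Task), nothing in the calling method ever observes when DoSmth finishes or whether it threw.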
Is there any chance that multiple BackgroundWorkers perform better than Tasks on processes that run for 5 seconds? I remember reading in a book that a Task is designed for short-running processes.
The reason I ask is this:
I have a process that takes 5 seconds to complete, and there are 4000 processes to complete. At first I did:
for (int i=0; i<4000; i++) {
Task.Factory.StartNew(action);
}
and this performed poorly (after the first minute, 3-4 tasks were completed, and the console application had 35 threads). Maybe this was stupid, but I thought that the thread pool would handle this kind of situation (it would put all actions in a queue, and when a thread became free, it would take an action and execute it).
The second step was to manually create Environment.ProcessorCount background workers and place all the actions in a ConcurrentQueue. So the code would look something like this:
var workers = new List<BackgroundWorker>();
//initialize workers
workers.ForEach((bk) =>
{
bk.DoWork += (s, e) =>
{
while (toDoActions.Count > 0)
{
Action a;
if (toDoActions.TryDequeue(out a))
{
a();
}
}
};
bk.RunWorkerAsync();
});
This performed way better. It performed much better than the tasks even when I had 30 background workers (as many as there were tasks running in the first case).
LE:
I start the Tasks like this:
public static Task IndexFile(string file)
{
Action<object> indexAction = new Action<object>((f) =>
{
Index((string)f);
});
return Task.Factory.StartNew(indexAction, file);
}
And the Index method is this one:
private static void Index(string file)
{
AudioDetectionServiceReference.AudioDetectionServiceClient client = new AudioDetectionServiceReference.AudioDetectionServiceClient();
client.IndexCompleted += (s, e) =>
{
if (e.Error != null)
{
if (FileError != null)
{
FileError(client,
new FileIndexErrorEventArgs((string)e.UserState, e.Error));
}
}
else
{
if (FileIndexed != null)
{
FileIndexed(client, new FileIndexedEventArgs((string)e.UserState));
}
}
};
using (IAudio proxy = new BassProxy())
{
List<int> max = new List<int>();
if (proxy.ReadFFTData(file, out max))
{
while (max.Count > 0 && max.First() == 0)
{
max.RemoveAt(0);
}
while (max.Count > 0 && max.Last() == 0)
{
max.RemoveAt(max.Count - 1);
}
client.IndexAsync(max.ToArray(), file, file);
}
else
{
throw new CouldNotIndexException(file, "The audio proxy did not return any data for this file.");
}
}
}
This method reads some data from an mp3 file, using the Bass.net library. Then that data is sent to a WCF service, using the async method.
The IndexFile(string file) method, which creates the tasks, is called 4000 times in a for loop.
Those two events, FileIndexed and FileError, are not handled, so they are never raised.
The reason the performance with Tasks was so poor is that you created too many small tasks (4000). Remember that the CPU needs to schedule the tasks as well, so creating lots of short-lived tasks causes extra workload for the CPU. More information can be found in the second paragraph of the TPL documentation:
"Starting with the .NET Framework 4, the TPL is the preferred way to write multithreaded and parallel code. However, not all code is suitable for parallelization; for example, if a loop performs only a small amount of work on each iteration, or it doesn't run for many iterations, then the overhead of parallelization can cause the code to run more slowly."
When you used the background workers, you limited the number of live threads to ProcessorCount, which removed a lot of scheduling overhead.
Given that you have a strictly defined list of things to do, I'd use the Parallel class (either For or ForEach depending on what suits you better). Furthermore you can pass a configuration parameter to any of these methods to control how many tasks are actually performed at the same time:
System.Threading.Tasks.Parallel.For(0, 20000, new ParallelOptions() { MaxDegreeOfParallelism = 5 }, i =>
{
//do something
});
The above code will perform 20000 operations, but will NOT perform more than 5 operations at the same time.
I SUSPECT the reason the background workers did better for you was because you had them created and instantiated at the start, while in your sample Task code it seems you're creating a new Task object for every operation.
Alternatively, did you think about using a fixed number of Task objects instantiated at the start and then performing a similar action with a ConcurrentQueue like you did with the background workers? That should also prove to be quite efficient.
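That alternative could be sketched roughly as follows. FixedWorkers.RunAll is a hypothetical helper, and the sketch assumes the queue is fully populated before it is called, since each worker exits as soon as it finds the queue empty.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

static class FixedWorkers
{
    // Hypothetical helper: starts a fixed number of long-running tasks that
    // drain a shared queue, then blocks until the queue is empty.
    public static void RunAll(ConcurrentQueue<Action> toDo, int workerCount)
    {
        var workers = Enumerable.Range(0, workerCount)
            .Select(_ => Task.Factory.StartNew(() =>
            {
                Action a;
                // TryDequeue is thread-safe, so each action runs exactly once.
                while (toDo.TryDequeue(out a))
                {
                    a();
                }
            }, TaskCreationOptions.LongRunning)) // hint: these tasks run for a long time
            .ToArray();

        Task.WaitAll(workers);
    }
}
```

A call such as FixedWorkers.RunAll(toDoActions, Environment.ProcessorCount) would mirror the background-worker version from the question.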
Have you considered using the thread pool?
http://msdn.microsoft.com/en-us/library/system.threading.threadpool.aspx
If your performance is slower when using threads, it can only be due to threading overhead (allocating and destroying individual threads).