How to clean up hanging tasks in the C# Task API? - c#

I have a simple function as the following:
static Task<A> Peirce<A, B>(Func<Func<A, Task<B>>, Task<A>> a)
{
var aa = new TaskCompletionSource<A>();
var tt = new Task<A>(() =>
a(b =>
{
aa.SetResult(b);
return new TaskCompletionSource<B>().Task;
}).Result
);
tt.Start();
return Task.WhenAny(aa.Task, tt).Result;
}
The idea is simple: any implementation of a must return a Task<A> to me. To do so, it may or may not use its parameter (of type Func<A, Task<B>>). If it does, our callback is called and sets the result of aa, so aa.Task completes. Otherwise, the result of a does not depend on its parameter, so we simply return its value. In either situation, aa.Task or the result of a will complete, so it should never block unless a does not use its parameter and blocks, or the task returned by a blocks.
The above code works, for example
static void Main(string[] args)
{
Func<Func<int, Task<int>>, Task<int>> t = a =>
{
return Task.FromResult(a(20).Result + 10);
};
Console.WriteLine(Peirce(t).Result); // output 20
t = a => Task.FromResult(10);
Console.WriteLine(Peirce(t).Result); // output 10
}
The problem here is that the two tasks aa.Task and tt must be cleaned up once the result of WhenAny has been determined; otherwise I am afraid hanging tasks will leak. I do not know how to do this. Can anyone suggest something? Or is this actually not a problem, and C# will do it for me?
P.S. The name Peirce came from the famous "Peirce's Law"(((A->B)->A)->A) in propositional logic.
UPDATE: the point is not to "dispose" the tasks but rather to stop them from running. I have tested this: when I put the "main" logic in a loop of 1000 iterations, it runs slowly (about 1 iteration per second) and creates a lot of threads, so it is a problem that needs solving.

A Task is a managed object. Unless you are introducing unmanaged resources, you shouldn't worry about a Task leaking resources. Let the GC clean it up and let the finalizer take care of the WaitHandle.
EDIT:
If you want to cancel tasks, consider using cooperative cancellation in the form of a CancellationTokenSource. You can pass this token to any tasks via the overload, and inside of each task, you may have some code as follows:
while (someCondition)
{
if (cancelToken.IsCancellationRequested)
break;
}
That way your tasks can gracefully clean up without throwing an exception. However, you can propagate an OperationCanceledException if you call cancelToken.ThrowIfCancellationRequested(). So the idea in your case would be that whatever finishes first can issue the cancellation to the other tasks so that they aren't hung up doing work.
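A minimal sketch of that idea, assuming two workers that share one token. The names Worker and CancelDemo are illustrative and not from the question, and the workers rely on Task.Delay observing the token rather than polling IsCancellationRequested:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class CancelDemo
{
    // Hypothetical worker: runs a number of short steps and stops early on cancellation.
    static async Task<string> Worker(string name, int steps, CancellationToken token)
    {
        for (int i = 0; i < steps; i++)
        {
            // Task.Delay observes the token and throws an OperationCanceledException
            // (a TaskCanceledException) once cancellation is requested.
            await Task.Delay(50, token);
        }
        return name;
    }

    static async Task Main()
    {
        var cts = new CancellationTokenSource();
        Task<string> fast = Worker("fast", 2, cts.Token);
        Task<string> slow = Worker("slow", 200, cts.Token);

        // Whichever task finishes first wins; the loser is told to stop.
        Task<string> winner = await Task.WhenAny(fast, slow);
        cts.Cancel();

        Console.WriteLine(await winner);

        try
        {
            await slow;
        }
        catch (OperationCanceledException)
        {
            Console.WriteLine("slow was cancelled");
        }
    }
}
```

Cancellation stays cooperative here: the losing task only stops because its own code passes the token to Task.Delay.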

Thanks to @Bryan Crosby's answer, I can now implement the function as follows:
private class CanceledTaskCache<A>
{
public static Task<A> Instance;
}
private static Task<A> GetCanceledTask<A>()
{
if (CanceledTaskCache<A>.Instance == null)
{
var aa = new TaskCompletionSource<A>();
aa.SetCanceled();
CanceledTaskCache<A>.Instance = aa.Task;
}
return CanceledTaskCache<A>.Instance;
}
static Task<A> Peirce<A, B>(Func<Func<A, Task<B>>, Task<A>> a)
{
var aa = new TaskCompletionSource<A>();
Func<A, Task<B>> cb = b =>
{
aa.SetResult(b);
return GetCanceledTask<B>();
};
return Task.WhenAny(aa.Task, a(cb)).Unwrap();
}
and it works pretty well:
static void Main(string[] args)
{
for (int i = 0; i < 1000; ++i)
{
Func<Func<int, Task<String>>, Task<int>> t =
async a => (await a(20)).Length + 10;
Console.WriteLine(Peirce(t).Result); // output 20
t = async a => 10;
Console.WriteLine(Peirce(t).Result); // output 10
}
}
Now it is fast and does not consume too many resources. It can be even faster (about 70 times on my machine) if you do not use the async/await keywords:
static void Main(string[] args)
{
for (int i = 0; i < 10000; ++i)
{
Func<Func<int, Task<String>>, Task<int>> t =
a => a(20).ContinueWith(ta =>
ta.IsCanceled ? GetCanceledTask<int>() :
Task.FromResult(ta.Result.Length + 10)).Unwrap();
Console.WriteLine(Peirce(t).Result); // output 20
t = a => Task.FromResult(10);
Console.WriteLine(Peirce(t).Result); // output 10
}
}
The issue here is that even if you can detect the return value of a(20), there is no way to cancel the async block other than throwing an OperationCanceledException, and that prevents WhenAny from being optimized.
UPDATE: optimised code and compared async/await and native Task API.
UPDATE: If I can write the following code it will be ideal:
static Task<A> Peirce<A, B>(Func<Func<A, Task<B>>, Task<A>> a)
{
var aa = new TaskCompletionSource<A>();
return await? a(async b => {
aa.SetResult(b);
await break;
}) : await aa.Task;
}
Here, await? a : b has the value of a's result if a succeeds, and the value b if a is cancelled (as with a ? b : c, a's result should have the same type as b).
await break would cancel the current async block.

As Stephen Toub of MS Parallel Programming Team says: "No. Don't bother disposing of your tasks."
tldr: In most cases, disposing of a task does nothing, and when the task actually has allocated unmanaged resources, its finalizer will release them when the task object is collected.

Related

Enumerable foreach extend

I created an extension method on IEnumerable to execute an action quickly. In this method I loop over a list, and if an object executes the action within a certain timeout, I return.
Now I want to make the output generic, because the method's output will differ. Any advice on what to do?
The IEnumerable holds processes, and this works like load balancing: if the first one does not respond, the second one should. I want to return the output of the input Action.
public static class EnumerableExtensions
{
public static void ForEach<T>(this IEnumerable<T> source, Action action, int timeOut)
{
foreach (T element in source)
{
lock (source)
{
// Loop for all connections and get the fastest responsive proxy
foreach (var mxAccessProxy in source)
{
try
{
// check for the health
Task executionTask = Task.Run(action);
if (executionTask.Wait(timeOut))
{
return ;
}
}
catch
{
//ignore
}
}
}
}
}
}
This code is called like this:
_proxies.ForEach(certainaction, timeOut);
this will enhance the performance and code readability
No, it definitely won't :) Moreover, you bring some more problems with this code like redundant locking or exception swallowing, but don't actually execute code in parallel.
It seems like you want to get the fastest possible call for your Action using some sort of proxy objects. You need to run the Tasks asynchronously, not one after another with .Wait().
Something like this could be helpful for you:
public static class TaskExtensions
{
public static TReturn ParallelSelectReturnFastest<TPoolObject, TReturn>(this TPoolObject[] pool,
Func<TPoolObject, CancellationToken, TReturn> func,
int? timeout = null)
{
var ctx = new CancellationTokenSource();
// for every object in pool schedule a task
Task<TReturn>[] tasks = pool
.Select(poolObject =>
{
ctx.Token.ThrowIfCancellationRequested();
return Task.Factory.StartNew(() => func(poolObject, ctx.Token), ctx.Token);
})
.ToArray();
// not sure if Cast is actually needed,
// just to get rid of co-variant array conversion
int firstCompletedIndex = timeout.HasValue
? Task.WaitAny(tasks.Cast<Task>().ToArray(), timeout.Value, ctx.Token)
: Task.WaitAny(tasks.Cast<Task>().ToArray(), ctx.Token);
// we need to cancel token to avoid unnecessary work to be done
ctx.Cancel();
if (firstCompletedIndex == -1) // no objects in pool managed to complete action in time
throw new NotImplementedException(); // custom exception goes here
return tasks[firstCompletedIndex].Result;
}
}
Now, you can use this extension method to call a specific action on any pool of objects and get the first executed result:
var pool = new[] { 1, 2, 3, 4, 5 };
var result = pool.ParallelSelectReturnFastest((x, token) => {
Thread.Sleep(x * 200);
token.ThrowIfCancellationRequested();
Console.WriteLine("calculate");
return x * x;
}, 1000); // the timeout must exceed the fastest task's 200 ms, otherwise WaitAny returns -1
Console.WriteLine(result);
It outputs:
calculate
1
This is because the first task completes its work in 200 ms and its result is returned, while all other tasks are cancelled through the cancellation token.
In your case it will be something like:
var actionResponse = proxiesList.ParallelSelectReturnFastest((proxy, token) => {
token.ThrowIfCancellationRequested();
return proxy.SomeAction();
});
Some things to mention:
Make sure that your actions are safe to run more than once. You can't rely on how many of them will actually get as far as executing your action. If the action is CreateItem, you can end up with many items being created through different proxies.
It cannot guarantee that all of these actions run in parallel, because it is up to the TPL to choose the optimal number of running tasks.
I have implemented this in the old-fashioned TPL way, because your original question used it. If possible, switch to async/await; in that case your Func will return tasks and you need to use await Task.WhenAny(tasks) instead of Task.WaitAny().
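A rough sketch of what the async/await variant could look like. The names SelectFastestAsync and AsyncPoolExtensions are hypothetical, and using TimeoutException for the timeout case is an assumption standing in for the custom exception mentioned above. Like WaitAny, it returns whichever task completes first, even if that task faulted:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncPoolExtensions
{
    // Hypothetical async counterpart of ParallelSelectReturnFastest.
    public static async Task<TReturn> SelectFastestAsync<TPoolObject, TReturn>(
        this IEnumerable<TPoolObject> pool,
        Func<TPoolObject, CancellationToken, Task<TReturn>> func,
        TimeSpan timeout)
    {
        var cts = new CancellationTokenSource();

        // Start one task per pool object; nothing blocks here.
        List<Task<TReturn>> work = pool.Select(p => func(p, cts.Token)).ToList();
        Task timeoutTask = Task.Delay(timeout, cts.Token);

        // Race the work against the timeout.
        var candidates = new List<Task>(work) { timeoutTask };
        Task first = await Task.WhenAny(candidates);

        // Cancel the token so the remaining tasks can stop early.
        cts.Cancel();

        if (first == timeoutTask)
            throw new TimeoutException("No pool object completed the action in time.");

        return await (Task<TReturn>)first;
    }
}

public static class Program
{
    public static async Task Main()
    {
        var pool = new[] { 1, 2, 3 };
        int result = await pool.SelectFastestAsync(async (x, token) =>
        {
            await Task.Delay(x * 200, token); // simulated proxy call
            return x * x;
        }, TimeSpan.FromSeconds(2));
        Console.WriteLine(result);
    }
}
```

No thread is blocked while waiting, which is the main difference from the WaitAny version.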

Task.ContinueWith seems to be called more often than there are actual tasks

First, this is from something much bigger and yes, I could completely avoid this using await under normal/other circumstances. For anyone interested, I'll explain below.
To track how many tasks I still have left before my program continues, I've built the following:
A counter:
private static int counter;
Some method:
public static void Test()
{
List<Task> tasks = new List<Task>();
for (int i = 0; i < 10000; i++)
{
TaskCompletionSource<object> tcs = new TaskCompletionSource<object>();
var task = DoTaskWork();
task.ContinueWith(t => // After DoTaskWork
{
// [...] Use t's Result
counter--; // Decrease counter
tcs.SetResult(null); // Finish the task that the UI or whatever is waiting for
});
tasks.Add(tcs.Task); // Store tasks to wait for
}
Task.WaitAll(tasks.ToArray()); // Wait for all tasks that actually only finish in the ContinueWith
Console.WriteLine(counter);
}
My super heavy work to do:
private static Task DoTaskWork()
{
counter++; // Increase counter
return Task.Delay(500);
}
Now, interestingly I do not receive the number 0 at the end when looking at counter. Instead, the number varies with each execution. Why is this? I tried various tests, but cannot find the reason for the irregularity. With the TaskCompletionSource I believed this to be reliable. Thanks.
Now, for anyone that is interested in why I do this:
I need to create loads of tasks without starting them. For this I need to use the Task constructor (one of its rare use cases). Its disadvantage to Task.Run() is that it cannot handle anything with await and that it needs a return type from the Task to properly run (hence the null as result). Therefore, I need a way around that. Other ideas welcome...
Well. I am stupid. Just 5 minutes in, I realize that.
I just did the same while locking a helper object before changing the counter in any way and now it works...
private static int counter;
private static object locker = new object();
// [...]
task.ContinueWith(t =>
{
lock(locker)
counter--;
tcs.SetResult(null);
});
// [...]
private static Task DoTaskWork()
{
lock (locker)
counter++;
return Task.Delay(500);
}
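As an alternative to the lock, the Interlocked class can update the counter atomically. This is only a sketch under the same assumptions as the code above (a static int counter and a DoTaskWork that returns Task.Delay); the class name CounterDemo is illustrative:

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

static class CounterDemo
{
    private static int counter;

    private static Task DoTaskWork()
    {
        Interlocked.Increment(ref counter); // atomic replacement for counter++
        return Task.Delay(50);
    }

    static void Main()
    {
        Task[] tasks = Enumerable.Range(0, 10000)
            .Select(_ => DoTaskWork().ContinueWith(t =>
            {
                // Continuations run on thread-pool threads, so the decrement must be atomic.
                Interlocked.Decrement(ref counter);
            }))
            .ToArray();

        Task.WaitAll(tasks);
        Console.WriteLine(counter); // prints 0
    }
}
```

Waiting on the tasks returned by ContinueWith also removes the need for a separate TaskCompletionSource in this simplified case.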
I need to create loads of tasks without starting them ... Therefore, I need a way around that. Other ideas welcome...
So, if I read it correctly, you want to build a list of tasks without actually running them on creation. You could do that by building a list of Func<Task> objects that you invoke when required:
async Task Main()
{
// Create list of work to do later
var tasks = new List<Func<Task>>();
// Schedule some work
tasks.Add(() => DoTaskWork(1));
tasks.Add(() => DoTaskWork(2));
// Wait for user input before doing work to demonstrate they are not started right away
Console.ReadLine();
// Execute and wait for the completion of the work to be done
await Task.WhenAll(tasks.Select(t => t.Invoke()));
Console.WriteLine("Ready");
}
public async Task DoTaskWork(int taskNr)
{
await Task.Delay(100);
Console.WriteLine(taskNr);
}
This will work, even if you use Task.Run like this:
public Task DoTaskWork(int taskNr)
{
return Task.Run(() =>
{
Thread.Sleep(100); Console.WriteLine(taskNr);
});
}
If this is not what you want, can you elaborate more on the tasks you want to create?

Async Producer/Consumer

I have an instance of a class that is accessed from several threads. This class takes these calls and adds a tuple to a database. I need this to be done serially because, due to some DB constraints, parallel threads could leave the database inconsistent.
As I am new to parallelism and concurrency in C#, I did this:
private BlockingCollection<Task> _tasks = new BlockingCollection<Task>();
public void AddDData(string info)
{
Task t = new Task(() => { InsertDataIntoBase(info); });
_tasks.Add(t);
}
private void InsertWorker()
{
Task.Factory.StartNew(() =>
{
while (!_tasks.IsCompleted)
{
Task t;
if (_tasks.TryTake(out t))
{
t.Start();
t.Wait();
}
}
});
}
The AddDData is the one who is called by multiple threads and InsertDataIntoBase is a very simple insert that should take few milliseconds.
The problem is that, for some reason that my lack of knowledge doesn't allow me to figure out, sometimes a task is called twice! It always goes like this:
T1
T2
T3
T1 <- PK error.
T4
...
Did I understand .Take() completely wrong, am I missing something, or is my producer/consumer implementation really bad?
Best Regards,
Rafael
UPDATE:
As suggested, I made a quick sandbox test implementation with this architecture and as I was suspecting, it does not guarantee that a task will not be fired before the previous one finishes.
So the question remains: how to properly queue tasks and fire them sequentially?
UPDATE 2:
I simplified the code:
private BlockingCollection<Data> _tasks = new BlockingCollection<Data>();
public void AddDData(Data info)
{
_tasks.Add(info);
}
private void InsertWorker()
{
Task.Factory.StartNew(() =>
{
while (!_tasks.IsCompleted)
{
Data info;
if (_tasks.TryTake(out info))
{
InsertIntoDB(info);
}
}
});
}
Note that I got rid of the Tasks, as I'm relying on the synchronous InsertIntoDB call (since it is inside a loop), but still no luck... The generation is fine and I'm absolutely sure that only unique instances go into the queue. But no matter what I try, sometimes the same object is used twice.
I think this should work:
private static BlockingCollection<string> _itemsToProcess = new BlockingCollection<string>();
static void Main(string[] args)
{
InsertWorker();
GenerateItems(10, 1000);
_itemsToProcess.CompleteAdding();
}
private static void InsertWorker()
{
Task.Factory.StartNew(() =>
{
while (!_itemsToProcess.IsCompleted)
{
string t;
if (_itemsToProcess.TryTake(out t))
{
// Do whatever needs doing here
// Order should be guaranteed since BlockingCollection
// uses a ConcurrentQueue as a backing store by default.
// http://msdn.microsoft.com/en-us/library/dd287184.aspx#remarksToggle
Console.WriteLine(t);
}
}
});
}
private static void GenerateItems(int count, int maxDelayInMs)
{
Random r = new Random();
string[] items = new string[count];
for (int i = 0; i < count; i++)
{
items[i] = i.ToString();
}
// Simulate many threads adding items to the collection
items
.AsParallel()
.WithDegreeOfParallelism(4)
.WithExecutionMode(ParallelExecutionMode.ForceParallelism)
.Select((x) =>
{
Thread.Sleep(r.Next(maxDelayInMs));
_itemsToProcess.Add(x);
return x;
}).ToList();
}
This does mean that the consumer is single threaded, but allows for multiple producer threads.
From your comment
"I simplified the code shown here, as the data is not a string"
I assume that the info parameter passed into AddDData is a mutable reference type. Make sure that the caller is not using the same info instance for multiple calls, since that reference is captured in the Task lambda.
Based on the trace that you provided the only logical possibility is that you have called InsertWorker twice (or more). There are thus two background threads waiting for items to appear in the collection and occasionally they both manage to grab an item and begin executing it.
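For reference, a sketch of a single consumer that blocks on GetConsumingEnumerable instead of polling TryTake in a loop. The class name QueueDemo is illustrative, and Console.WriteLine stands in for the database insert:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

static class QueueDemo
{
    private static readonly BlockingCollection<string> _items = new BlockingCollection<string>();

    static void Main()
    {
        // Start exactly one consumer; a second one would let two inserts overlap.
        Task consumer = Task.Factory.StartNew(() =>
        {
            // Blocks while the collection is empty and ends once CompleteAdding()
            // has been called and every remaining item has been taken.
            foreach (string item in _items.GetConsumingEnumerable())
            {
                Console.WriteLine(item); // stand-in for InsertDataIntoBase(item)
            }
        }, TaskCreationOptions.LongRunning);

        // Several threads produce items concurrently.
        Parallel.For(0, 10, i => _items.Add(i.ToString()));

        _items.CompleteAdding();
        consumer.Wait();
    }
}
```

Each item is handed to the loop body exactly once, so a duplicate insert would have to come from a second consumer or from the same instance being added twice.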

Why this order of events in a lambda expression?

This is the code that I'm running:
[TestMethod]
public async Task HelloTest()
{
List<int> hello = new List<int>();
//await Task.WhenAll(Enumerable.Range(0, 2).Select(async x => await Say(hello)));
await Say(hello);
await Say(hello);
}
private static async Task Say(List<int> hello)
{
await Task.Delay(100);
var rep = new Random().Next();
hello.Add(rep);
}
Why is it that running this code, as it is, works as intended and results in two random numbers, but that using the commented code instead always results in two of the exact same number?
So you have several issues here.
First, why are you seeing the same value twice. That's the easy one. When you create a Random instance it is seeded with the current time, but the precision of the current time it uses is rather low. If you get two new Random instances within say 16 milliseconds or so (which is a really long time for a computer) you'll see the same values out of them. That's what's happening for you.
Normally the fix for that is just to share a single Random instance, but the problem there is that your random instances aren't being accessed from the same thread (potentially, assuming you don't have a SynchronizationContext specified), and Random isn't thread safe. You can use something like this to get your random numbers instead:
public static class MyRandom
{
private static object key = new object();
private static Random random = new Random();
public static int Next()
{
lock (key)
{
return random.Next();
}
}
//TODO add other methods for other `Random` methods as needed
}
Use that and it will resolve the immediate issue.
The other problem that you have, although it doesn't seem to be biting you currently, is that you're modifying your List from two different tasks, possibly being executed on different threads. You shouldn't do that. It's bad enough practice to have methods like this in a single-threaded environment (as you're relying on side effects to do your work), but in a multithreaded environment this is very problematic, for more than just conceptual reasons. Instead you should have each task return a value, and then pull all of those values into a collection on the caller's side, like so:
public async Task HelloTest()
{
var data = await Task.WhenAll(Say(), Say());
}
private static async Task<int> Say()
{
await Task.Delay(100);
return MyRandom.Next();
}
As to why the two Say calls are run in parallel, rather than sequentially, that has to do with the fact that in your second code snippet you aren't actually waiting for one task to complete before starting the next.
The method that you pass to Select is the method to spin up the task, and it won't block until that task is done before starting the next. The code that you have here:
await Task.WhenAll(Enumerable.Range(0, 2).Select(async x => await Say(hello)));
Is no different than simply having:
await Task.WhenAll(Enumerable.Range(0, 2).Select(x => Say(hello)));
Having an async method that does nothing but await one method call is really no different from just having that one method call. What's happening here is that Select calls Say, which starts the task, then continues on, which starts the next task, and then WhenAll waits (asynchronously) for both tasks to finish before continuing.
Note: This is an answer to the original question (the question has since been changed).
They operate identically. I ran the below Console application and got the results 0 1 for both versions:
class Program
{
static int m_nextNumber = 0;
static void Main(string[] args)
{
var t1 = Version1();
m_nextNumber = 0;
var t2 = Version2();
Task.WaitAll(t1, t2);
Console.ReadKey();
}
static async Task Version1()
{
List<int> hello = new List<int>();
await Say(hello);
await Say(hello);
PrintHello(hello);
}
static async Task Version2()
{
List<int> hello = new List<int>();
await Task.WhenAll(Enumerable.Range(0, 2).Select(async x => await Say(hello)));
PrintHello(hello);
}
static void PrintHello(List<int> hello)
{
foreach (var i in hello)
Console.WriteLine(i);
}
static int GotANumber()
{
return m_nextNumber++;
}
static async Task Say(List<int> hello)
{
var rep = GotANumber();
hello.Add(rep);
}
}

What is the best way to defer code execution?

I have many methods calling each other that each have to do certain tasks, some of them asynchronous, and that all operate on a DOM (so only one thread may access the DOM at any time).
For example:
object A() {
/*...A() code 1...*/
var res = B();
/*...A() code 2 that uses res...*/
}
object B() {
/*...B code 1...*/
var res1 = C();
/*...B code 2 that uses res1...*/
var res2 = C();
/*...B code 3 that uses res2...*/
}
object C() {
/*...C code 1...*/
if (rnd.NextDouble() < 0.3) { // unpredictable condition
startAsyncStuff();
/*...C code 2 that uses async result above...*/
}
if (rnd.NextDouble() < 0.7) { // unpredictable condition
startOtherAsyncStuff();
/*...C code 3 that might use any/both async results above...*/
}
}
Now let's say I have a method that wants to execute method A() 1000 times as fast as possible (the async methods can run in separate threads, but all other code must access the DOM only one at a time). Ideally, when the async calls are reached, code execution for A(), B() and C() is paused so that A() can be called again.
There are 2 ways I can think of to do this. One is with yield, by changing all the methods to iterators I can pause and resume execution:
struct DeferResult {
public object Result;
public bool Deferred;
}
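IEnumerator<DeferResult> A() {
/*...A() code 1...*/
var b = B();
// Forward every deferral from B(); the last item B() yields carries its result
while (b.MoveNext() && b.Current.Deferred) yield return b.Current;
/*...A() code 2 that uses b.Current.Result...*/
}
IEnumerator<DeferResult> B() {
/*...B code 1...*/
var c1 = C();
while (c1.MoveNext() && c1.Current.Deferred) yield return c1.Current;
/*...B code 2 that uses c1.Current.Result...*/
var c2 = C();
while (c2.MoveNext() && c2.Current.Deferred) yield return c2.Current;
/*...B code 3 that uses c2.Current.Result...*/
}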
IEnumerator<DeferResult> C() {
/*...C code 1...*/
if (rnd.NextDouble() < 0.3) { // unpredictable condition
startAsyncStuff();
yield return new DeferResult { Deferred = true };
/*...C code 2 that uses async result above...*/
}
if (rnd.NextDouble() < 0.7) { // unpredictable condition
startOtherAsyncStuff();
yield return new DeferResult { Deferred = true };
/*...C code 3 that might use any/both async results above...*/
}
yield return new DeferResult { Result = someResult() };
}
void Main() {
var deferredMethods = new List<IEnumerator<DeferResult>>();
for (int i = 0; i < 1000; i++) {
var en = A(); // A() already returns an IEnumerator<DeferResult>
if (en.MoveNext())
if (en.Current.Deferred)
deferredMethods.Add(en);
}
// then use events from the async methods so when any is done continue
// running its enumerator to execute the code until the next async
// operation, or until finished
// once all 1000 iterations are complete call an AllDone() method.
}
This method has quite some overhead from the iterators, and is a bit more code intensive, however it all runs on one thread so I don't need to synchronize the DOM access.
Another way would be to use threads (1000 simultaneous threads are a bad idea, so I'd implement some kind of thread pooling), but this requires synchronizing DOM access, which is costly.
Are there any other methods I can use to defer code execution under these conditions? What would be the recommended way to do this?
As Karl has suggested, does this need to be multi-threaded? I might go for a multi-threaded solution if:
DOM accesses are random but not frequent.
All other code in A, B, and C is substantial in terms of time (compared to the DOM access code).
All other code in A, B, and C can be executed in a thread-safe way without any locking; if it depends on some shared state, you have to synchronize access to that as well.
In such a case, I would consider using a thread pool to launch A multiple times while synchronizing access to the DOM. The cost of DOM synchronization can be reduced with thread-safe caching, although that depends on the kind of DOM access.
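A minimal sketch of that approach, assuming the DOM can be modelled as a plain list guarded by a single lock. The names DomDemo, dom and domLock are illustrative, and the real A() would do far more work outside the lock:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

static class DomDemo
{
    private static readonly object domLock = new object();
    private static readonly List<string> dom = new List<string>(); // stand-in for the DOM

    static int A(int i)
    {
        // Thread-safe work with no shared state can run in parallel.
        int computed = i * i;

        // Only one thread at a time may touch the DOM.
        lock (domLock)
        {
            dom.Add(computed.ToString());
        }
        return computed;
    }

    static void Main()
    {
        // Parallel.For schedules the 1000 calls on thread-pool threads.
        Parallel.For(0, 1000, i => { A(i); });
        Console.WriteLine(dom.Count); // prints 1000
    }
}
```

This only pays off when the time spent outside the lock dominates; if most of A() is DOM access, the threads simply queue up on domLock.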
