Under what conditions does .NET decide to cancel a PLINQ task? - c#

I've had a look around for some documentation, and all the documentation appears to suggest the following...
PLINQ tasks do not have a default timeout
PLINQ tasks can deadlock, and .NET/TPL will never cancel them for you to release the deadlock
However, in my application this is not the case. I cannot reproduce the problem in a minimal console app, but I will show the closest reproduction I have tried, followed by the actual PLINQ query that is being cancelled before completion. No exception of any type appears other than the task cancellation exceptions, which all suggest that the task was asked to cancel directly (no other exception occurred anywhere). There is no cancellation code anywhere in my application, so it can only be .NET deciding to cancel it for me?
I am aware that the examples below hammer out HttpClient instances; that is not the cause, as the console example shows.
Attempt at reproduction; this code never cancels, despite the epic running time:
var j = 0;
var ints = new List<int>();
for (int i = 0; i < 5000; i++)
{
    ints.Add(i);
}

ints.AsParallel()
    .WithExecutionMode(ParallelExecutionMode.ForceParallelism)
    .WithDegreeOfParallelism(8)
    .ForAll(n =>
    {
        int count = 0;
        while (count < 100 && j == 0)
        {
            var httpClient = new HttpClient();
            var response = httpClient.GetStringAsync("https://hostname/").GetAwaiter().GetResult();
            count++;
            Thread.Sleep(1000);
        }
    });
But this code usually gets a couple of minutes in before stalling. I'm not sure whether it stalls first and .NET then notices and cancels it (which would violate point 2), or whether .NET just cancels it because it took too long overall (which would be point 1). usernames contains 5000 elements in the running code, hence the console test with 5000 elements. Papi just wraps HttpClient and uses SendAsync to send an HttpRequestMessage; I can't see SendAsync being the cause, though.
importSuccess = usernames.AsParallel()
    .WithExecutionMode(ParallelExecutionMode.ForceParallelism)
    .WithDegreeOfParallelism(8)
    .All(u =>
    {
        var apiClone = new Papi(api.Path);
        apiClone.SessionId = api.SessionId;
        var userDetails = String.Format("stuff {0}", u);
        var importResponse = apiClone.Post().WithString(userDetails, "application/json").Send(apiClone.SessionId).GetAwaiter().GetResult();
        if (importResponse.IsSuccessStatusCode)
        {
            var body = importResponse.Content.ReadAsStringAsync().GetAwaiter().GetResult().ToLower();
            if (body == "true")
            {
                return true;
            }
        }
        return false;
    });
Again, the above PLINQ query throws a TaskCanceledException after a couple of minutes; no other exception is observed. Has anyone ever had a PLINQ query be cancelled without writing the cancellation themselves, or does anyone know why this might happen?
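For reference, this is the only way I know of to request cancellation of a PLINQ query explicitly. The sketch below is not my application code, just an illustration of where a cancellation token would normally have to be wired in via WithCancellation (the work inside ForAll is a placeholder):
using System;
using System.Linq;
using System.Threading;

class PlinqCancellationSketch
{
    static void Main()
    {
        var ints = Enumerable.Range(0, 5000).ToList();
        using var cts = new CancellationTokenSource();

        try
        {
            ints.AsParallel()
                .WithCancellation(cts.Token)    // cancellation has to be wired in explicitly
                .WithDegreeOfParallelism(8)
                .ForAll(n => Thread.Sleep(10)); // placeholder for the real work
        }
        catch (OperationCanceledException)
        {
            // Only reached if something holding cts called Cancel().
            Console.WriteLine("Query was cancelled via the supplied token.");
        }
    }
}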

Related

What is wrong with my Code (SendPingAsync)

I'm writing a C# ping application.
I started with a synchronous ping method, but I figured out that pinging several servers with one click takes more and more time.
So I decided to try the asynchronous method.
Can someone help me out?
public async Task<string> CustomPing(string ip, int amountOfPackets, int sizeOfPackets)
{
    // timeout per ping
    int Timeout = 2000;

    // packet-size logic
    string packet = "";
    for (int j = 0; j < sizeOfPackets; j++)
    {
        packet += "b";
    }
    byte[] buffer = Encoding.ASCII.GetBytes(packet);

    // accumulated round-trip time
    long ms = 0;

    // main loop
    using (Ping ping = new Ping())
    {
        for (int i = 0; i < amountOfPackets; i++)
        {
            PingReply reply = await ping.SendPingAsync(ip, Timeout, buffer);
            ms += reply.RoundtripTime;
        }
    }

    return (ms / amountOfPackets + " ms");
}
I defined a "Server"-Class (Ip or host, City, Country).
Then I create a "server"-List:
List<Server> ServerList = new List<Server>()
{
    new Server("www.google.de", "Some City,", "Some Country")
};
Then I loop through this list and I try to call the method like this:
foreach (var server in ServerList)
ListBox.Items.Add("The average response time of your custom server is: " + server.CustomPing(server.IP, amountOfPackets, sizeOfPackets));
Unfortunately, this is much more competitive than the synchronous method, and at the point where my method should return the value, it returns
System.Threading.Tasks.Task`1[System.String]
Since you have an async method, it will return the Task when it is called like this:
Task<string> task = server.CustomPing(server.IP, amountOfPackets, sizeOfPackets);
When you add it directly to your ListBox while concatenating it with a string, it will use the ToString method, which by default prints the full class name of the object. This explains your output:
System.Threading.Tasks.Task`1[System.String]
The [System.String] part actually tells you the result type of the task. This is what you want, and to get it you need to await it, like this:
foreach (var server in ServerList)
ListBox.Items.Add("The average response time of your custom server is: " + await server.CustomPing(server.IP, amountOfPackets, sizeOfPackets));
1) this has to be done in another async method, and
2) it will mess up all the parallelism you are aiming for, because it waits for each method call to finish before starting the next.
What you can do instead is start all the tasks one after the other, collect the returned tasks, and wait for all of them to finish. Preferably you would do this in an async method such as a click handler:
private async void Button1_Click(object sender, EventArgs e)
{
    // start all pings without awaiting them individually
    Task<string>[] allTasks = ServerList.Select(server => server.CustomPing(server.IP, amountOfPackets, sizeOfPackets)).ToArray();

    // WhenAll waits for all tasks to finish and returns the result of each call
    string[] results = await Task.WhenAll(allTasks);

    // now you can execute your loop and display the results:
    foreach (var result in results)
    {
        ListBox.Items.Add(result);
    }
}
The class System.Threading.Tasks.Task<TResult> is a helper class for multitasking. While it resides in the Threading namespace, it works for thread-less multitasking just as well. Indeed, if you see a function return a Task, you can usually use it for any form of multitasking. Tasks are very agnostic about how they are used. You can even run one synchronously, if you do not mind the small extra overhead of a Task that does not do a lot.
Task helps with some of the most important rules/conventions of multitasking:
Do not accidentally swallow exceptions. Thread-based multitasking is notoriously good at doing just that.
Do not use the result after a cancellation.
It does that by throwing exceptions in your face (usually an AggregateException) if you try to access the Result property when convention tells us you should not.
It also has all those other useful properties for multitasking.
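A minimal sketch of that behaviour (illustration only, not from the code above): reading Result on a faulted task rethrows the stored failure wrapped in an AggregateException instead of silently swallowing it.
using System;
using System.Threading.Tasks;

class ResultThrowsSketch
{
    static void Main()
    {
        // A Task<string> that faults instead of producing a value.
        Task<string> faulted = Task.Run<string>(() => throw new InvalidOperationException("boom"));

        try
        {
            // Accessing Result blocks until the task finishes, then rethrows
            // the stored exception wrapped in an AggregateException.
            string value = faulted.Result;
        }
        catch (AggregateException ex)
        {
            Console.WriteLine(ex.InnerException?.Message); // prints "boom"
        }
    }
}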

Running groups of groups of Tasks in a For loop

I have a set of 100 Tasks that need to run, in any order. Putting them all into a single Task.WhenAll() tends to overload the back end, which I do not control.
I'd like to run n tasks at a time and, after each group completes, run the next set. I wrote the code below, but the "Running..." lines are all printed to the console after the tasks have run, making me think all the Tasks are being run at once.
How can I force the system to really "wait" for each group of Tasks?
// Run some X at a time
int howManytoRunAtATimeSoWeDontOverload = 4;
for (int i = 0; i < tasks.Count; i++)
{
    var startIndex = howManytoRunAtATimeSoWeDontOverload * i;
    Console.WriteLine($"Running {startIndex} to {startIndex + howManytoRunAtATimeSoWeDontOverload}");
    var toDo = tasks.Skip(startIndex).Take(howManytoRunAtATimeSoWeDontOverload).ToArray();
    if (toDo.Length == 0) break;
    await Task.WhenAll(toDo);
}
Screen output: (screenshot not reproduced)
There are a lot of ways to do this, but I would probably use a library or framework that provides a higher-level abstraction, such as TPL Dataflow: https://learn.microsoft.com/en-us/dotnet/standard/parallel-programming/dataflow-task-parallel-library (if you're using .NET Core there's a newer library).
This makes it a lot easier than building your own buffering mechanism. Below is a very simple example, but you can configure it differently and do a lot more with this library. In the example below I don't batch the tasks, but I make sure no more than 10 are processed at the same time; a self-contained sketch follows after the snippet.
var buffer = new ActionBlock<Task>(
    async t =>
    {
        await t;
    },
    new ExecutionDataflowBlockOptions { BoundedCapacity = 10, MaxDegreeOfParallelism = 1 });

foreach (var t in tasks)
{
    // DummyFunctionAsync stands in for whatever starts the real asynchronous work.
    // SendAsync waits while the bounded buffer is full, which throttles how many
    // started tasks are outstanding at once.
    await buffer.SendAsync(DummyFunctionAsync(t));
}

buffer.Complete();
await buffer.Completion;
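For completeness, here is a self-contained version of the same idea; DummyFunctionAsync and workItems are hypothetical placeholder names, not part of the original code. Roughly 10 started tasks are outstanding at any moment, because SendAsync only completes when the bounded buffer has room.
using System;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow; // System.Threading.Tasks.Dataflow NuGet package

class DataflowThrottleSketch
{
    static async Task Main()
    {
        var workItems = Enumerable.Range(0, 100);

        // The block awaits each task handed to it, one at a time,
        // and refuses to buffer more than 10 at once.
        var buffer = new ActionBlock<Task>(
            async t => await t,
            new ExecutionDataflowBlockOptions { BoundedCapacity = 10, MaxDegreeOfParallelism = 1 });

        foreach (var item in workItems)
        {
            // SendAsync completes only when there is room in the buffer,
            // which throttles how many DummyFunctionAsync calls are in flight.
            await buffer.SendAsync(DummyFunctionAsync(item));
        }

        buffer.Complete();
        await buffer.Completion;
    }

    // Hypothetical stand-in for the real asynchronous operation.
    static async Task DummyFunctionAsync(int item)
    {
        await Task.Delay(100);
        Console.WriteLine($"Finished item {item}");
    }
}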

ParallelProcess blocks indefinitely for Task.WaitAll randomly , processes fine most of the time

I have a process that runs periodically, every 5 minutes. It picks up some records from the database and calls code to Register: 25 tasks are started in parallel, the process waits for all of them to complete, and then it continues with the remaining records, 25 tasks at a time. It runs fine most of the time, but after a few weeks, or sometimes even a few days, it will just hang. It is most likely hung at the WaitAll where one of the tasks has not completed.
We have not been able to reproduce this locally or even in our test environment under load. It is a WCF application, so the Register method is a service call, and if a particular Register task is not returning we are not sure why: there is no database error or deadlock detected, and all queries that execute in that task have a timeout of 30 seconds. When it hangs and we kill and restart the service, the same record then executes without any issues.
==== Snippet of code that executes the 25 tasks at a time ====
public void RegisterApplications()
{
    throttleCount = 25;
    IList<OnlineApplication> onlineApplications = VDService.GetOnlineApplicationsPendingRegistration();
    countTotal = onlineApplications.Count;

    foreach (OnlineApplication onlineApplication in onlineApplications)
    {
        object parameters = new StartRegistrationParameters()
        {
            Application = onlineApplication,
            Context = Context.CurrentOperationContext
        };

        Task task = Task.Factory.StartNew(StartRegisterApplication, parameters);
        registrationTasks.Add(task);
        onlineApplicationTasks.Add(task.Id, (long)onlineApplication.OnlineApplicationId);
        counterTasksForThrottling++;

        if (counterTasksForThrottling >= throttleCount)
        {
            // wait for the current batch of 25 before starting the next
            countProcessed += CountTasksCompleted(registrationTasks, messageLog, onlineApplicationTasks);
            counterTasksForThrottling = 0;
            // initialize again.
            registrationTasks.RemoveAll(t => t.Id > 0);
        }
    }

    countProcessed += CountTasksCompleted(registrationTasks, messageLog, onlineApplicationTasks);
}
==== CountTasksCompleted method ====
private int CountTasksCompleted(List<Task> registrationTasks, StringBuilder messageLog, Dictionary<int, long> onlineApplicationTasks)
{
    int countTasksCompleted = 0;
    try
    {
        if (messageLog == null)
        {
            messageLog = new StringBuilder();
        }
        if (registrationTasks.Count > 0)
        {
            Task.WaitAll(registrationTasks.ToArray());
        }
    }
    // catching this exception as it is thrown after all the tasks have completed,
    // when any one of the tasks within threw an error.
    // the task statuses afterwards show us which tasks completed and which faulted.
    catch (AggregateException ex)
    {
        Logger.Log(ex);
    }
    finally
    {
        if (registrationTasks != null && registrationTasks.Count > 0)
        {
            countTasksCompleted = registrationTasks.Where(task => task.Status == TaskStatus.RanToCompletion).Count();
            var failedTasks = registrationTasks.Where(task => task.Status != TaskStatus.RanToCompletion).ToList();
            foreach (Task failedTask in failedTasks)
            {
                messageLog.AppendFormat("Id {0},", onlineApplicationTasks[failedTask.Id]);
            }
        }
    }
    return countTasksCompleted;
}
That sounds very much like a race condition, and those are a pain to debug. If that really is the case, there is little we can do other than point out any potential causes of race conditions we can spot; there is likely more in the code we cannot see.
Your description does make it sound like this might be a deadlock. While you say you detected none with the database access, if your code, or any code you call, takes any form of lock, deadlocks can still happen.
As a start, add some more logging to the mix. For every distinct step, add an "I am at step X" entry to the log. This will help you nail down the exact cause; one way to surface which tasks are stuck is sketched below.
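As a rough illustration of that idea (this is a sketch, not the original code), the unbounded WaitAll could be replaced by a timed wait that records which applications are still pending, reusing the onlineApplicationTasks map from the question; the 10-minute limit is an arbitrary value:
// Diagnostic sketch: bound the wait and log what is still outstanding
// instead of blocking forever.
bool allDone = Task.WaitAll(registrationTasks.ToArray(), TimeSpan.FromMinutes(10));
if (!allDone)
{
    // Record which applications have not finished so the hang can be pinned to specific tasks.
    foreach (var pending in registrationTasks.Where(t => !t.IsCompleted))
    {
        messageLog.AppendFormat("Still waiting on Id {0} (status {1}); ",
            onlineApplicationTasks[pending.Id], pending.Status);
    }
}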

Waiting asynchronously for some tasks to complete (Task.WhenSome)

I am writing a service that combines data from various internet sources and generates a response on the fly. Speed is more important than completeness, so I would like to generate my response as soon as some (not all) of the internet sources have responded. Typically my service creates 10 concurrent web requests and should stop waiting and start processing after 5 of them have completed. Neither the .NET Framework nor any of the third-party libraries I am aware of offers this functionality, so I'll probably have to write it myself. The method I am trying to implement has the following signature:
public static Task<TResult[]> WhenSome<TResult>(int atLeast, params Task<TResult>[] tasks)
{
// TODO
}
Contrary to how Task.WhenAny works, exceptions should be swallowed, provided that the required number of results has been acquired. If, however, after completion of all the tasks there are not enough gathered results, then an AggregateException should be thrown, propagating all the exceptions.
Usage example:
var tasks = new Task<int>[]
{
    Task.Delay(100).ContinueWith<int>(_ => throw new ApplicationException("Oops!")),
    Task.Delay(200).ContinueWith(_ => 10),
    Task.Delay(Timeout.Infinite).ContinueWith(_ => 0,
        new CancellationTokenSource(300).Token),
    Task.Delay(400).ContinueWith(_ => 20),
    Task.Delay(500).ContinueWith(_ => 30),
};
var results = await WhenSome(2, tasks);
Console.WriteLine($"Results: {String.Join(", ", results)}");
Expected output:
Results: 10, 20
In this example the last task returning the value 30 should be ignored (not even awaited), because we have already acquired the number of results we want (2 results). The faulted and cancelled tasks should also be ignored, for the same reason.
This is some clunky code which I think achieves your requirements. It may be a starting point.
It may also be a bad way of handling tasks and/or not threadsafe, and/or just a terrible idea. But I expect if so someone will point that out.
async Task<TResult[]> WhenSome<TResult>(int atLeast, List<Task<TResult>> tasks)
{
    List<Task<TResult>> completedTasks = new List<Task<TResult>>();
    int completed = 0;
    List<Exception> exceptions = new List<Exception>();

    while (completed < atLeast && tasks.Any())
    {
        var completedTask = await Task.WhenAny(tasks);
        tasks.Remove(completedTask);

        if (completedTask.IsCanceled)
        {
            continue;
        }
        if (completedTask.IsFaulted)
        {
            exceptions.Add(completedTask.Exception);
            continue;
        }

        completed++;
        completedTasks.Add(completedTask);
    }

    if (completed >= atLeast)
    {
        return completedTasks.Select(t => t.Result).ToArray();
    }
    throw new AggregateException(exceptions).Flatten();
}
I am adding one more solution to this problem, not because stuartd's answer is not sufficient, but just for the sake of variety. This implementation uses the Unwrap technique in order to return a Task that contains all the exceptions, in exactly the same way that the built-in Task.WhenAll method propagates all the exceptions.
public static Task<TResult[]> WhenSome<TResult>(int atLeast, params Task<TResult>[] tasks)
{
    if (tasks == null) throw new ArgumentNullException(nameof(tasks));
    if (atLeast < 1 || atLeast > tasks.Length)
        throw new ArgumentOutOfRangeException(nameof(atLeast));

    var cts = new CancellationTokenSource();
    int successfulCount = 0;

    var continuationAction = new Action<Task<TResult>>(task =>
    {
        if (task.IsCompletedSuccessfully)
            if (Interlocked.Increment(ref successfulCount) == atLeast) cts.Cancel();
    });

    var continuations = tasks.Select(task => task.ContinueWith(continuationAction,
        cts.Token, TaskContinuationOptions.ExecuteSynchronously, TaskScheduler.Default));

    return Task.WhenAll(continuations).ContinueWith(_ =>
    {
        cts.Dispose();
        if (successfulCount >= atLeast) // Success
            return Task.WhenAll(tasks.Where(task => task.IsCompletedSuccessfully));
        else
            return Task.WhenAll(tasks); // Failure
    }, TaskScheduler.Default).Unwrap();
}
The continuations do not propagate the results or the exceptions of the tasks. They are cancelable continuations, and they are canceled en masse when the specified number of successful tasks has been reached.
Note: this implementation might propagate more than atLeast results. If you want exactly that number of results, you can chain a .Take(atLeast) after the .Where LINQ operator, as sketched below.
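That is, the success branch of the final ContinueWith would become something like this:
if (successfulCount >= atLeast) // Success: cap the output at exactly atLeast results
    return Task.WhenAll(tasks.Where(task => task.IsCompletedSuccessfully).Take(atLeast));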

Should BlockingCollection.TryTake(object,TimeSpan) return immediately on new data?

I am trying to ascertain why performance on my blocking collection appears slow. A simple version of my code is illustrated in the question further below.
My question here is whether BlockingCollection.TryTake(out T, TimeSpan) returns immediately when new data arrives.
TimeSpan gridNextTS = new TimeSpan(0, 0, 60);
if (trPipe.TryTake(out tr, gridNextTS) == false)
From my testing it appears that data is NOT returned immediately. Does that seem like the desired behaviour, or am I using it incorrectly?
Details of the code are in my previous question:
Consumer/Producer with BlockingCollection appears slow
A concise benchmark shows that BlockingCollection<T> does, in fact, perform the handover pretty swiftly regardless of the timeout value supplied to TryTake.
public async Task BlockingCollectionPerformance()
{
    using (var collection = new BlockingCollection<int>())
    {
        var consumer = Task.Run(() =>
        {
            var i = 0;
            while (collection.TryTake(out i, TimeSpan.FromSeconds(2)))
            {
                Debug.Print(i.ToString());
            }
        });

        var producer = Task.Run(() =>
        {
            try
            {
                for (var i = 0; i < 10; i++)
                {
                    collection.Add(i);
                }
            }
            finally
            {
                collection.CompleteAdding();
            }
        });

        await Task.WhenAll(producer, consumer);
    }
}
The above completes in ~3 ms on my box.
To be more specific, though, TryTake returns quickly whenever an item is added to the collection (in which case TryTake returns true), or when you call CompleteAdding on the blocking collection (in which case there is no point in waiting out the timeout, and TryTake returns false). It is possible to shoot yourself in the foot and keep the consumer blocked longer than necessary if you never call CompleteAdding: in that case TryTake has to wait out the full timeout before returning false, as the sketch below illustrates.
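A small sketch of that foot-gun (illustration only): with nothing ever added and no CompleteAdding call, TryTake can only give up once the full timeout has elapsed.
using System;
using System.Collections.Concurrent;
using System.Diagnostics;

class TryTakeTimeoutSketch
{
    static void Main()
    {
        using (var collection = new BlockingCollection<int>())
        {
            var sw = Stopwatch.StartNew();

            // Nothing is added and CompleteAdding is never called, so the only way
            // out of TryTake is for the two-second timeout to elapse.
            bool got = collection.TryTake(out int item, TimeSpan.FromSeconds(2));

            Console.WriteLine($"TryTake returned {got} after {sw.ElapsedMilliseconds} ms"); // False, ~2000 ms
        }
    }
}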
