Tasks are running sequentially for some reason - c#

Case 1: I have a console test app and libraries. The test app calls async methods on classes in those libraries that are meant to run in parallel. Example code:
for (int i = 0; i < 100; i++)
{
    var myTask = RetrieveRecordSet<TestClass3>();
}
This works as expected in the console app, meaning all 100 Tasks are queued at the same time and run in parallel in the background, as shown by their console output.
Case 2: Same code, just in a WPF app instead of a console app. Now for some reason, the Tasks run sequentially.
Case 3: I also tried the following modification to the WPF app to no avail:
for (int i = 0; i < 100; i++)
{
    var myTask = Task.Factory.StartNew(() => RetrieveRecordSet<TestClass3>());
}
Case 4: I then tried the following, but it blocks the UI and is still sequential:
Parallel.For(0, 100, a => RetrieveRecordSet<TestClass3>());
Is there a way to get the same non-blocking parallel behavior that I get from Case 1 in a WPF app?

Try the following:
await Task.Run(() =>
{
    Parallel.For(0, 100, i => RetrieveRecordSet<TestClass3>());
});

Parallel.For is indeed a blocking operation. If you want the Parallel.For loop to run on a thread other than the UI thread:
new Thread(() => Parallel.For(0, 100, index => RetrieveRecordSet<TestClass3>())).Start();
Or
new Task(() => Parallel.For(0, 100, index => RetrieveRecordSet<TestClass3>())).Start();

Thanks for the answers folks. In the end, the answer is much more nefarious, and I'm still not sure what the problem is but I found a workaround. The underlying code is making WCF service calls. Using Task Parallel library to make a bunch of parallel WCF calls the first time a channel is used, will serialize those calls. I serendipitously discovered that if you 'prime' the channel with a single call first, await the response, and THEN slam it with a bunch of parallel WCF calls, you then get full parallelism. Is there a less hacky perhaps proper way to prime a WCF channel as such? Is this a bug in WCF or TPL?
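For reference, this serialization matches a documented WCF behavior: a proxy that is opened implicitly by its first call queues every concurrent call until that open completes. The less hacky fix is to open the channel explicitly before issuing parallel calls. A minimal sketch, assuming a hypothetical ClientBase-derived proxy named MyServiceClient with a RetrieveRecordSet operation:
// MyServiceClient and RetrieveRecordSet are hypothetical stand-ins for the generated proxy.
var client = new MyServiceClient();

// Opening explicitly avoids the implicit "auto-open" performed by the first call,
// which serializes all calls issued while the channel is still opening.
((System.ServiceModel.ICommunicationObject)client).Open();

// The parallel calls are no longer funneled through the auto-open gate.
var tasks = Enumerable.Range(0, 100)
    .Select(_ => Task.Run(() => client.RetrieveRecordSet()))
    .ToArray();
Task.WaitAll(tasks);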

Related

How to ensure parallel tasks dequeue unique entries from ConcurrentQueue<T>?

Hi, I have a ConcurrentQueue that is loaded with files from a database. These files are to be processed by parallel Tasks that dequeue them. However, after some time I run into issues where Tasks dequeue the same file at the same time (which leads to "used by another process" errors on the file). I also get more Tasks than are supposed to be allocated; I have even seen 8 Tasks running at once, which should not be happening. The active task limit is 5.
Rough code:
private void ParseQueuedTDXFiles()
{
    while (_signalParseQueuedFilesEvent.WaitOne())
    {
        Task.Run(() => SetParsersTask());
    }
}
The _signalParseQueuedFilesEvent is set on a timer in a Windows Service.
The above function then calls SetParsersTask. This is why I use a ConcurrentDictionary to track how many active Tasks there are and make sure they stay below _ActiveTasksLimit:
private void SetParsersTask()
{
    if (_ConcurrentqueuedTdxFilesToParse.Count > 0)
    {
        if (_activeParserTasksDict.Count < _ActiveTasksLimit) // ConcurrentDictionary used to control how many Tasks should run
        {
            int parserCountToStart = _ActiveTasksLimit - _activeParserTasksDict.Count;
            Parallel.For(0, parserCountToStart, parserToStart =>
            {
                lock (_concurrentQueueLock)
                    Task.Run(() => PrepTdxParser());
            });
        }
    }
}
Which then calls this function which dequeues the Concurrent Queue:
private void PrepTdxParser()
{
    TdxFileToProcessData fileToProcess;
    lock (_concurrentQueueLock)
        _ConcurrentqueuedTdxFilesToParse.TryDequeue(out fileToProcess);
    if (!string.IsNullOrEmpty(fileToProcess.TdxFileName))
    {
        LaunchTDXParser(fileToProcess);
    }
}
I even put a lock on _ConcurrentqueuedTdxFilesToParse even though I know it doesn't need one. All to make sure that I never run into a situation where two Tasks are dequeuing the same file.
This function is where I add and remove Tasks as well as launch the file parser for the dequeued file:
private void LaunchTDXParser(TdxFileToProcessData fileToProcess)
{
    string fileName = fileToProcess.TdxFileName;
    Task startParserTask = new Task(() => ConfigureAndStartProcess(fileName));
    _activeParserTasksDict.TryAdd(fileName, startParserTask);
    startParserTask.Start();
    Task.WaitAll(startParserTask);
    _activeParserTasksDict.TryRemove(fileName, out Task taskToBeRemoved);
}
Can you guys help me understand why I am getting the same file dequeued in two different Tasks? And why I am getting more Tasks than the _ActiveTasksLimit?
There are a number of red flags in this¹ code:
Using a WaitHandle. This tool is too primitive. I've never seen a problem solved with WaitHandles that couldn't be solved in a simpler way without them.
Launching Task.Run tasks in a fire-and-forget fashion.
Launching a Parallel.For loop without configuring the MaxDegreeOfParallelism. This practically guarantees that the ThreadPool will get saturated.
Protecting a queue (_queuedTdxFilesToParse) with a lock (_concurrentQueueLock) only partially. If the queue is a Queue<T>, you must protect it on each and every operation, otherwise the behavior of the program is undefined. If the queue is a ConcurrentQueue<T>, there is no need to protect it because it is thread-safe by itself.
Calling Task.Factory.StartNew and Task.Start without configuring the scheduler argument.
So I am not surprised that your code is not working as expected. I can't point to a specific error that needs to be fixed. For me the whole approach is dubious and needs to be reworked/scrapped. Some concepts and tools that you might want to research before attempting to rewrite this code:
The producer-consumer pattern.
The BlockingCollection<T> class.
The TPL Dataflow library.
Optionally you could consider familiarizing yourself with asynchronous programming. It can help reduce the number of threads that your program uses while running, resulting in a more efficient and scalable program. Two powerful asynchronous tools are the Channel<T> class and the Parallel.ForEachAsync API (available from .NET 6 and later).
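To make the producer-consumer suggestion concrete, here is a minimal sketch using BlockingCollection<T> with a fixed number of consumers (all names are hypothetical). The collection does all the synchronization, so no locks, WaitHandles or task-counting dictionaries are needed:
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class FileProcessor
{
    private readonly BlockingCollection<string> _files = new BlockingCollection<string>();

    public Task[] StartConsumers(int count)
    {
        var consumers = new Task[count];
        for (int i = 0; i < count; i++)
        {
            consumers[i] = Task.Run(() =>
            {
                // Blocks until an item is available; the loop ends after CompleteAdding().
                foreach (string file in _files.GetConsumingEnumerable())
                    ProcessFile(file);
            });
        }
        return consumers;
    }

    public void Enqueue(string file) => _files.Add(file);

    public void Shutdown() => _files.CompleteAdding();

    private void ProcessFile(string file) => Console.WriteLine("Processing " + file);
}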
¹ This answer was intended for a related question that is now deleted.
So I fixed my problem. The solution was first to not add more parallelism than needed. I was trying to create a situation where private void SetParsersTask() would not be held up by Tasks that still needed to finish processing a file. So I foolishly threw in Parallel.For in addition to Task.Start, which is already parallel. I fixed this by generating fire-and-forget Tasks in a normal for loop instead of Parallel.For:
private void SetParsersTask()
{
    if (_queuedTdxFilesToParse.Count > 0)
    {
        if (_activeParserTasksDict.Count < _tdxParsersInstanceCount)
        {
            int parserCountToStart = _tdxParsersInstanceCount - _activeParserTasksDict.Count;
            _queuedTdxFilesToParse = new ConcurrentQueue<TdxFileToProcessData>(_queuedTdxFilesToParse.Distinct());
            for (int i = 0; i < parserCountToStart; i++)
            {
                Task.Run(() => PrepTdxParser());
            }
        }
    }
}
After that I was still getting the occasional duplicate file, so I moved the queue loading to another long-running thread. For that thread I use an AutoResetEvent so that the queue is populated only once at any instant, as opposed to potentially another task loading it with duplicate files. It could be that both my enqueue and dequeue were responsible, and now it's addressed:
var _loadQueueTask = Task.Factory.StartNew(() => LoadQueue(), TaskCreationOptions.LongRunning);

private void LoadQueue()
{
    while (_loadConcurrentQueueEvent.WaitOne())
    {
        if (_queuedTdxFilesToParse.Count < _tdxParsersInstanceCount)
        {
            int numFilesToGet = _tdxParsersInstanceCount - _activeParserTasksDict.Count;
            var filesToAdd = ServiceDBHelper.GetTdxFilesToEnqueueForProcessingFromDB(numFilesToGet);
            foreach (var fileToProc in filesToAdd)
            {
                ServiceDBHelper.UpdateTdxFileToProcessStatusAndUpdateDateTime(fileToProc.TdxFileName, 1, DateTime.Now);
                _queuedTdxFilesToParse.Enqueue(fileToProc);
            }
        }
    }
}
Thanks to Theo for pointing me to additional tools and making me look closer at my parallel loops.

Tasks running synchronously in console application

I have a console application which is doing multiple API requests over HTTPS.
When running single-threaded it can do a maximum of about 8 API requests per second.
Server which is receiving API calls has lots of free resources, so it should be able to handle many more than 8 / sec.
Also when I run multiple instances of the application, each instance is still able to do 8 requests / sec.
I tried following code to parallelize the requests, but it still runs synchronously:
var taskList = new List<Task<string>>();
for (int i = 0; i < 10000; i++)
{
    string threadNumber = i.ToString();
    Task<string> task = Task<string>.Factory.StartNew(() => apiRequest(requestData));
    taskList.Add(task);
}
foreach (var task in taskList)
{
    Console.WriteLine(task.Result);
}
What am I doing wrong here?
EDIT:
My mistake was iterating over the tasks and getting task.Result; that was blocking the main thread, making me think that the work was running synchronously.
Code which I ended up using instead of foreach(var task in taskList):
while (taskList.Count > 0)
{
    Task.WaitAny(taskList.ToArray());
    // Gets tasks in RanToCompletion or Faulted state
    var finishedTasks = GetFinishedTasks(taskList);
    foreach (Task<string> finishedTask in finishedTasks)
    {
        Console.WriteLine(finishedTask.Result);
        taskList.Remove(finishedTask);
    }
}
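For reference, the same drain loop can be written with Task.WhenAny inside an async method, which drops the need for the GetFinishedTasks helper (a sketch, not the poster's code):
while (taskList.Count > 0)
{
    // WhenAny returns the first task to complete without blocking the thread.
    Task<string> finished = await Task.WhenAny(taskList);
    taskList.Remove(finished);
    Console.WriteLine(finished.Result); // already completed, so Result no longer blocks
}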
There could be a couple of things going on.
First, the .NET ServicePoint class allows a maximum of 2 concurrent connections per host by default. See this Stack Overflow question/answer.
Second, your server might theoretically be able to handle more than 8/sec, but there could be resource constraints or other issues preventing that on the server side. I have run into issues with API calls which theoretically should be able to handle much more than they do, but for whatever reason were designed or implemented improperly.
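If the two-connection default is the bottleneck, it can be raised once at startup (a sketch; this applies to .NET Framework's ServicePointManager, and HttpClient on modern .NET does not impose the same default):
using System.Net;

// Allow up to 100 concurrent connections per host; set before any requests are made.
ServicePointManager.DefaultConnectionLimit = 100;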
#theMayer is kinda-sorta correct. It's possible that your call to apiRequest is what's blocking and making the whole expression seem synchronous...
However... you're iterating over each task and calling task.Result, which will block until the task completes in order to print it to the screen. So, for example, all tasks except the first could be complete, but you won't print them until the first one completes, and you will continue printing them in order.
On a slightly different note, you could rewrite this little more succinctly like so:
var screenLock = new object();
var results = Enumerable.Range(1, 10000)
    .AsParallel()
    .Select(i => {
        // I wouldn't actually use this printing, but it should help you understand your example a bit better
        lock (screenLock) {
            Console.WriteLine("Task " + i);
        }
        return apiRequest(requestData);
    });
Without the printing, it looks like this:
var results = Enumerable.Range(1, 10000)
    .AsParallel()
    .Select(i => apiRequest(requestData));
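One caveat worth adding: by default PLINQ uses at most as many workers as there are processors, which is a low ceiling for I/O-bound API calls. The degree of parallelism can be raised explicitly (a sketch):
var results = Enumerable.Range(1, 10000)
    .AsParallel()
    .WithDegreeOfParallelism(64) // default is the processor count, too low for I/O-bound work
    .Select(i => apiRequest(requestData));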

How to iteratively call a method without waiting for a response in C# .NET 4.5.1?

First off, I apologize for the terrible wording of that question... here's the scenario:
I built a Web API method that receives a ProductID and then uploads that product's images to Amazon S3. This part is working just fine.
I am now trying to get a console app running that will grab a range of ProductIDs and loop through them, calling the API method without waiting for the results...
Can anyone point me in the right direction?
I suppose another caveat would be to not eat up all the resources on the machine running the console app...so maybe a thread cap?
UPDATE (This still seems to be synchronous):
class Program
{
    async static void DoUpload(int itemid)
    {
        Console.WriteLine("Starting #:" + itemid);
        Thread.Sleep(2000); // Simulates long call to API
        Console.WriteLine("Finishing #:" + itemid);
    }

    static void Main(string[] args)
    {
        for (int i = 0; i < 20; i++)
        {
            DoUpload(i);
        }
    }
}
There are a couple easy ways to do this.
I'd recommend using the Parallel class. It makes the most optimized use of your environment's threads/cores. For your example, you'd simply do something like this:
var status = Parallel.For(0, 20, DoUpload);
while (!status.IsCompleted)
{
    // Do something while you wait
}
The other method would be to use Tasks, and send each call as a separate task. Be careful with this approach though because you could overload the system with pending tasks if you have too many iterations.
List<Task> tasks = new List<Task>();
for (int i = 0; i < 20; i++)
{
    int itemId = i; // copy the loop variable so each task captures its own value
    var task = Task.Run(() => DoUpload(itemId));
    tasks.Add(task);
}
// Wait for completion of all tasks
Task.WaitAll(tasks.ToArray());
I do not recommend using Parallel.For. It does not give satisfactory control of parallelism (you probably don't want to hammer away with hundreds of requests that will start to time out), and it requires unnecessary context switching.
Threads/cores aren't the limiting factor in the case of HTTP requests.
In the example change
Thread.Sleep(2000)
to
await Task.Delay(2000)
and when using real web api calls
await httpClient.PostAsync(...)
also remember to wait in Main
Console.ReadLine() // or something more sophisticated
otherwise the program will terminate before the calls have been made.
Then to control the level of parallelism I think the easiest solution is to use a Semaphore to count the number of outstanding calls, waiting in the main loop for the semaphore to be signaled again before issuing new requests.
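A minimal sketch of that idea using SemaphoreSlim (DoUploadAsync is a hypothetical async version of DoUpload):
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

static async Task UploadAllAsync(int count, int maxConcurrency)
{
    var semaphore = new SemaphoreSlim(maxConcurrency);
    var tasks = new List<Task>();
    for (int i = 0; i < count; i++)
    {
        await semaphore.WaitAsync(); // wait for a free slot before starting the next call
        int itemId = i;              // copy the loop variable for the closure
        tasks.Add(Task.Run(async () =>
        {
            try { await DoUploadAsync(itemId); }
            finally { semaphore.Release(); } // free the slot even if the call fails
        }));
    }
    await Task.WhenAll(tasks); // wait for the remaining calls to finish
}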

Manual threads vs Parallel.ForEach in task scheduler

I have a Windows Service that processes tasks created by users. This Service runs on a server with 4 cores. The tasks mostly involve heavy database work (generating a report for example). The server also has a few other services running so I don't want to spin up too many threads (let's say a maximum of 4).
If I use a BlockingCollection<MyCustomTask>, is it a better idea to create 4 Thread objects and use these to consume from the BlockingCollection<MyCustomTask> or should I use Parallel.Foreach to accomplish this?
I'm looking at the ParallelExtensionsExtras which contains a StaTaskScheduler which uses the former, like so (slightly modified the code for clarity):
var threads = Enumerable.Range(0, numberOfThreads).Select(i =>
{
    var thread = new Thread(() =>
    {
        // Continually get the next task and try to execute it.
        // This will continue until the scheduler is disposed and no more tasks remain.
        foreach (var t in _tasks.GetConsumingEnumerable())
        {
            TryExecuteTask(t);
        }
    });
    thread.IsBackground = true;
    thread.SetApartmentState(ApartmentState.STA);
    return thread;
}).ToList();

// Start all of the threads
threads.ForEach(t => t.Start());
However, there's also a BlockingCollectionPartitioner in the same ParallelExtensionsExtras which would enable the use of Parallel.Foreach on a BlockingCollection<Task>, like so:
var blockingCollection = new BlockingCollection<MyCustomTask>();
Parallel.ForEach(blockingCollection.GetConsumingEnumerable(), task =>
{
    task.DoSomething();
});
It's my understanding that the latter leverages the ThreadPool. Would using Parallel.ForEach have any benefits in this case?
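For completeness, the Parallel.ForEach variant is usually written with the ParallelExtensionsExtras partitioner the question mentions and an explicit cap, since the default chunk partitioner buffers items and an uncapped loop keeps requesting threads while GetConsumingEnumerable blocks (a sketch):
var options = new ParallelOptions { MaxDegreeOfParallelism = 4 };

// GetConsumingPartitioner comes from ParallelExtensionsExtras and hands items
// to the loop one at a time instead of buffering them in chunks.
Parallel.ForEach(blockingCollection.GetConsumingPartitioner(), options, task =>
{
    task.DoSomething();
});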
This answer is relevant only if the Task class in your code has nothing to do with System.Threading.Tasks.Task.
As a simple rule, use Parallel.ForEach to run tasks that will end eventually, like executing some work in parallel with some other work.
Use threads when they run a routine for the whole life of the application.
So it looks like in your case you should use the thread-based approach.

Tracking progress of a multi-step Task

I am working on a simple server that exposes webservices to clients. Some of the requests may take a long time to complete, and are logically broken into multiple steps. For such requests, it is required to report progress during execution. In addition, a new request may be initiated before a previous one completes, and it is required that both execute concurrently (barring some system-specific limitations).
I was thinking of having the server return a TaskId to its clients, and having the clients track the progress of the requests using the TaskId. I think this is a good approach, and I am left with the issue of how tasks are managed.
Never having used the TPL, I was thinking it would be a good way to approach this problem. Indeed, it allows me to run multiple tasks concurrently without having to manually manage threads. I can even create multi-step tasks relatively easily using ContinueWith.
I can't come up with a good way of tracking a task's progress, though. I realize that when my requests consist of a single "step", then the step has to cooperatively report its state. This is something I would prefer to avoid at this point. However, when a request consists of multiple steps, I would like to know which step is currently executing and report progress accordingly. The only way I could come up with is extremely tiresome:
Task<int> firstTask = new Task<int>(() => { DoFirstStep(); return 42; });
firstTask.
    ContinueWith<int>(task => { UpdateProgress("50%"); return task.Result; }).
    ContinueWith<string>(task => { DoSecondStep(task.Result); return "blah"; }).
    ContinueWith<string>(task => { UpdateProgress("100%"); return task.Result; });
And even this is not perfect since I would like the Task to store its own progress, instead of having UpdateProgress update some known location. Plus it has the obvious downside of having to change a lot of places when adding a new step (since now the progress is 33%, 66%, 100% instead of 50%, 100%).
Does anyone have a good solution?
Thanks!
This isn't really a scenario that the Task Parallel Library fully supports.
You might consider an approach where you feed progress updates to a queue and read them on another Task:
static void Main(string[] args)
{
    Example();
}

static BlockingCollection<Tuple<int, int, string>> _progressMessages =
    new BlockingCollection<Tuple<int, int, string>>();

public static void Example()
{
    List<Task<int>> tasks = new List<Task<int>>();
    for (int i = 0; i < 10; i++)
        tasks.Add(Task.Factory.StartNew((object state) =>
        {
            int id = (int)state;
            DoFirstStep(id);
            _progressMessages.Add(new Tuple<int, int, string>(id, 1, "10.0%"));
            DoSecondStep(id);
            _progressMessages.Add(new Tuple<int, int, string>(id, 2, "50.0%"));
            // ...
            return 1;
        },
        (object)i));

    Task logger = Task.Factory.StartNew(() =>
    {
        foreach (var m in _progressMessages.GetConsumingEnumerable())
            Console.WriteLine("Task {0}: Step {1}, progress {2}.",
                m.Item1, m.Item2, m.Item3);
    });

    Task.WaitAll(tasks.ToArray());      // wait for the workers to finish
    _progressMessages.CompleteAdding(); // lets the logger's consuming loop complete
    logger.Wait();

    Console.ReadLine();
}

private static void DoFirstStep(int id)
{
    Console.WriteLine("{0}: First step", id);
}

private static void DoSecondStep(int id)
{
    Console.WriteLine("{0}: Second step", id);
}
This sample doesn't show cancellation, error handling or account for your requirement that your task may be long running. Long running tasks place special requirements on the scheduler. More discussion of this can be found at http://parallelpatterns.codeplex.com/, download the book draft and look at Chapter 3.
This is simply an approach for using the Task Parallel Library in a scenario like this. The TPL may well not be the best approach here.
If your web services are running inside ASP.NET (or a similar web application server) then you should also consider the likely impact of using threads from the thread pool to execute tasks, rather than service web requests:
How does Task Parallel Library scale on a terminal server or in a web application?
I don't think the solution you are looking for will involve the Task API. Or at least, not directly. It doesn't support the notion of percentage complete, and the Task/ContinueWith functions would need to participate in that logic because the data is only available at that level (only the final invocation of ContinueWith is in any position to know the percentage complete, and even then, doing so algorithmically will be a guess at best because it certainly doesn't know if one task is going to take a lot longer than the other). I suggest you create your own API to do this, possibly leveraging the Task API to do the actual work.
This might help: http://blog.stephencleary.com/2010/06/reporting-progress-from-tasks.html. In addition to reporting progress, this solution also enables updating form controls without getting the Cross-thread operation not valid exception.
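The pattern from that link boils down to IProgress<T>/Progress<T> (available from .NET 4.5): Progress<T> captures the SynchronizationContext it is created on, so the callback runs back on the UI thread and can touch controls safely. A minimal sketch inside an async UI event handler (progressBar and the step methods are hypothetical):
// Create on the UI thread so the callback is posted back to it.
var progress = new Progress<int>(percent => progressBar.Value = percent);

await Task.Run(() =>
{
    DoFirstStep();
    ((IProgress<int>)progress).Report(50);  // Report is explicitly implemented
    DoSecondStep();
    ((IProgress<int>)progress).Report(100);
});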
