Tasks running synchronously in console application - c#

I have a console application which is doing multiple API requests over HTTPS.
When running in a single thread it can do a maximum of about 8 API requests/second.
The server receiving the API calls has lots of free resources, so it should be able to handle many more than 8/sec.
Also, when I run multiple instances of the application, each instance is still able to do 8 requests/sec.
I tried the following code to parallelize the requests, but it still runs synchronously:
var taskList = new List<Task<string>>();
for (int i = 0; i < 10000; i++)
{
    string threadNumber = i.ToString();
    Task<string> task = Task<string>.Factory.StartNew(() => apiRequest(requestData));
    taskList.Add(task);
}
foreach (var task in taskList)
{
    Console.WriteLine(task.Result);
}
What am I doing wrong here?
EDIT:
My mistake was iterating over the tasks and reading task.Result; that blocked the main thread and made me think the tasks were running synchronously.
Here is the code I ended up using instead of foreach (var task in taskList):
while (taskList.Count > 0)
{
    Task.WaitAny(taskList.ToArray());
    // Gets tasks in RanToCompletion or Faulted state
    var finishedTasks = GetFinishedTasks(taskList);
    foreach (Task<string> finishedTask in finishedTasks)
    {
        Console.WriteLine(finishedTask.Result);
        taskList.Remove(finishedTask);
    }
}
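For reference, the same drain loop can be written with Task.WhenAny (a sketch, assuming an async calling context), which avoids the separate GetFinishedTasks helper:
while (taskList.Count > 0)
{
    // WhenAny returns the first task to complete, without blocking a thread
    Task<string> finished = await Task.WhenAny(taskList);
    taskList.Remove(finished);
    Console.WriteLine(finished.Result);
}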

There could be a couple of things going on.
First, the .NET ServicePoint class allows a maximum of 2 connections per host by default. See this Stack Overflow question/answer.
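If that limit is the bottleneck, raising it is a one-line change at startup, before any requests are made (a sketch using the classic System.Net API):
using System.Net;

// The default for console apps is 2 concurrent connections per host
ServicePointManager.DefaultConnectionLimit = 100;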
Second, your server might theoretically be able to handle more than 8/sec, but there could be resource constraints or other issues preventing that on the server side. I have run into APIs that theoretically should handle much more load than they do, but for whatever reason were designed or implemented improperly.

@theMayer is kinda-sorta correct. It's possible that your call to apiRequest is what's blocking and making the whole expression seem synchronous.
However, you're also iterating over each task and calling task.Result, which blocks until that task completes so the result can be printed. So, for example, all tasks except the first could be complete, but you won't print anything until the first one completes, and you will continue printing in order.
On a slightly different note, you could rewrite this a little more succinctly, like so:
var screenLock = new object();
var results = Enumerable.Range(1, 10000)
    .AsParallel()
    .Select(i =>
    {
        // I wouldn't actually use this printing, but it should help you understand your example a bit better
        lock (screenLock)
        {
            Console.WriteLine($"Task {i}");
        }
        return apiRequest(requestData);
    });
Without the printing, it looks like this:
var results = Enumerable.Range(1, 10000)
    .AsParallel()
    .Select(i => apiRequest(requestData));
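One caveat worth adding: PLINQ queries are lazy, so nothing actually executes until results is enumerated, for example:
// Force execution of the parallel query and collect the responses
var responses = results.ToList();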

Related

How to ensure parallel tasks dequeue unique entries from ConcurrentQueue<T>?

Hi, I have a ConcurrentQueue that is loaded with files from a database. These files are to be processed by parallel tasks that dequeue them. However, after some time I start getting tasks that dequeue the same file at the same time (which leads to "used by another process" errors on the file), and I also get more tasks than are supposed to be allocated. I have even seen 8 tasks running at once, which should not be happening; the active-task limit is 5.
Rough code:
private void ParseQueuedTDXFiles()
{
    while (_signalParseQueuedFilesEvent.WaitOne())
    {
        Task.Run(() => SetParsersTask());
    }
}
The _signalParseQueuedFilesEvent is set on a timer in a Windows Service.
The above function then calls SetParsersTask, which uses a ConcurrentDictionary to track how many active tasks there are and to make sure they stay below _ActiveTasksLimit:
private void SetParsersTask()
{
    if (_ConcurrentqueuedTdxFilesToParse.Count > 0)
    {
        if (_activeParserTasksDict.Count < _ActiveTasksLimit) // concurrent dictionary used to control how many tasks should run
        {
            int parserCountToStart = _ActiveTasksLimit - _activeParserTasksDict.Count;
            Parallel.For(0, parserCountToStart, parserToStart =>
            {
                lock (_concurrentQueueLock)
                    Task.Run(() => PrepTdxParser());
            });
        }
    }
}
That in turn calls this function, which dequeues from the ConcurrentQueue:
private void PrepTdxParser()
{
    TdxFileToProcessData fileToProcess;
    lock (_concurrentQueueLock)
        _ConcurrentqueuedTdxFilesToParse.TryDequeue(out fileToProcess);
    if (!string.IsNullOrEmpty(fileToProcess.TdxFileName))
    {
        LaunchTDXParser(fileToProcess);
    }
}
I even put a lock on _ConcurrentqueuedTdxFilesToParse, even though I know it doesn't need one, all to make sure that I never run into a situation where two tasks dequeue the same file.
This function is where I add and remove Tasks as well as launch the file parser for the dequeued file:
private void LaunchTDXParser(TdxFileToProcessData fileToProcess)
{
    string fileName = fileToProcess.TdxFileName;
    Task startParserTask = new Task(() => ConfigureAndStartProcess(fileName));
    _activeParserTasksDict.TryAdd(fileName, startParserTask);
    startParserTask.Start();
    Task.WaitAll(startParserTask);
    _activeParserTasksDict.TryRemove(fileName, out Task taskToBeRemoved);
}
Can you guys help me understand why the same file is being dequeued in two different tasks, and why I am getting more tasks than _ActiveTasksLimit?
There are a number of red flags in this¹ code:
Using a WaitHandle. This tool is too primitive. I've never seen a problem solved with WaitHandles that couldn't be solved in a simpler way without them.
Launching Task.Run tasks in a fire-and-forget fashion.
Launching a Parallel.For loop without configuring the MaxDegreeOfParallelism. This practically guarantees that the ThreadPool will get saturated.
Protecting a queue (_queuedTdxFilesToParse) with a lock (_concurrentQueueLock) only partially. If the queue is a Queue<T>, you must protect it on each and every operation; otherwise the behavior of the program is undefined. If the queue is a ConcurrentQueue<T>, there is no need to protect it, because it is thread-safe by itself.
Calling Task.Factory.StartNew and Task.Start without configuring the scheduler argument.
So I am not surprised that your code is not working as expected. I can't point to a specific error that needs to be fixed; for me the whole approach is dubious and needs to be reworked/scrapped. Some concepts and tools that you might want to research before attempting to rewrite this code:
The producer-consumer pattern.
The BlockingCollection<T> class.
The TPL Dataflow library.
Optionally, you could consider familiarizing yourself with asynchronous programming. It can help reduce the number of threads that your program uses while running, resulting in a more efficient and scalable program. Two powerful asynchronous tools are the Channel<T> class and the Parallel.ForEachAsync API (available from .NET 6 onward).
¹ This answer was intended for a related question that is now deleted.
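For illustration, here is a minimal producer-consumer sketch with BlockingCollection<T> (generic names, not the asker's actual types): one producer enqueues work and a fixed number of consumers drain it, with no manual locks or task counting.
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class ProducerConsumerSketch
{
    static void Main()
    {
        using (var queue = new BlockingCollection<string>(boundedCapacity: 100))
        {
            const int consumerCount = 5; // plays the role of _ActiveTasksLimit

            Task producer = Task.Run(() =>
            {
                for (int i = 0; i < 20; i++)
                    queue.Add($"file-{i}.tdx"); // stand-in for loading from the DB
                queue.CompleteAdding();          // tell consumers no more work is coming
            });

            var consumers = new Task[consumerCount];
            for (int c = 0; c < consumerCount; c++)
            {
                consumers[c] = Task.Run(() =>
                {
                    // GetConsumingEnumerable hands each item to exactly one
                    // consumer, so duplicate processing is impossible
                    foreach (string file in queue.GetConsumingEnumerable())
                        Console.WriteLine($"Processing {file} on task {Task.CurrentId}");
                });
            }

            Task.WaitAll(consumers);
            producer.Wait();
        }
    }
}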
So I fixed my problem. The solution was first to not add more parallelism than needs be. I was trying to create a situation where private void SetParsersTask() would not be held up by tasks that still needed to finish processing a file. So I foolishly threw in Parallel.For on top of Task.Run, which already runs in parallel. I fixed this by generating fire-and-forget tasks in a normal for loop instead of Parallel.For:
private void SetParsersTask()
{
    if (_queuedTdxFilesToParse.Count > 0)
    {
        if (_activeParserTasksDict.Count < _tdxParsersInstanceCount)
        {
            int parserCountToStart = _tdxParsersInstanceCount - _activeParserTasksDict.Count;
            _queuedTdxFilesToParse = new ConcurrentQueue<TdxFileToProcessData>(_queuedTdxFilesToParse.Distinct());
            for (int i = 0; i < parserCountToStart; i++)
            {
                Task.Run(() => PrepTdxParser());
            }
        }
    }
}
After that I was still getting occasional duplicate files, so I moved the queue loading to another long-running thread. That thread uses an AutoResetEvent so that the queue is populated only once at any instant, as opposed to another task potentially loading it with duplicate files. It could be that both my enqueue and my dequeue were responsible, and now both are addressed:
var _loadQueueTask = Task.Factory.StartNew(() => LoadQueue(), TaskCreationOptions.LongRunning);

private void LoadQueue()
{
    while (_loadConcurrentQueueEvent.WaitOne())
    {
        if (_queuedTdxFilesToParse.Count < _tdxParsersInstanceCount)
        {
            int numFilesToGet = _tdxParsersInstanceCount - _activeParserTasksDict.Count;
            var filesToAdd = ServiceDBHelper.GetTdxFilesToEnqueueForProcessingFromDB(numFilesToGet);
            foreach (var fileToProc in filesToAdd)
            {
                ServiceDBHelper.UpdateTdxFileToProcessStatusAndUpdateDateTime(fileToProc.TdxFileName, 1, DateTime.Now);
                _queuedTdxFilesToParse.Enqueue(fileToProc);
            }
        }
    }
}
Thanks to Theo for pointing me to additional tools and making me look closer at my parallel loops.

Multi-threading potentially long-running operations

I am writing a Windows Service. I have a 'backlog' (in SQL) of records that have to be processed by the service. The backlog might also be empty. Processing a record is a potentially very long-running operation (3+ minutes).
I have a class with a method in it which goes to SQL, chooses a record, and processes it, if there are any records to process. Then the method exits, and that's it. Note: I can't know in advance which records will be processed - the class method decides this as part of its logic.
I want to achieve parallel processing. I want to have X workers (where X is optimal for the host PC) at any time. While the backlog is empty, those workers finish their jobs and exit pretty quickly (~50-100ms, maybe). I want any 'freed' worker to start over again (i.e. re-run).
I have done some reading, and I gather that ThreadPool is not a good option for long-running operations. The .NET 4.0+ parallel library is not a good option either, as I don't want to wait for all workers to finish and I don't want to predefine/declare the tasks in advance.
In layman's terms: I want X workers that query the data source for items; when one finds an item it operates on it, while the rest continue to look for newly pushed items in the backlog.
What would be the best approach? Do I have to manage the threads entirely by myself, i.e. first determine the optimum number of threads (perhaps by checking Environment.ProcessorCount), then start X threads, monitor IsAlive on each thread, and restart it when it exits? This seems awfully unprofessional.
Any suggestions?
You can start one task per core and, as tasks finish, start new ones. You can set numOfThreads from ProcessorCount or to a specific number:
var tasks = new List<Task>();
int numOfThreads = System.Environment.ProcessorCount;
// int numOfThreads = X;
for (int i = 0; i < numOfThreads; i++)
    tasks.Add(Task.Factory.StartNew(() => { /* work */ }));

while (tasks.Count > 0) // wait for tasks to finish
{
    int index = Task.WaitAny(tasks.ToArray());
    tasks.RemoveAt(index);
    if (HasMoreWork()) // i.e. there is incomplete work left
        tasks.Add(Task.Factory.StartNew(() => { /* work */ }));
}
or
var options = new ParallelOptions();
options.MaxDegreeOfParallelism = System.Environment.ProcessorCount;
Parallel.For(0, N, options, i => { /* long-running computation */ });
or
You can implement the producer-consumer pattern with BlockingCollection.
This topic is taught excellently by Dr. Joe Hummel in his Pluralsight course "Async and parallel programming: Application design".
Consider using ActionBlock<T> from the TPL Dataflow library. It can be configured to process multiple messages concurrently, using all available CPU cores.
ActionBlock<QueueItem> _processor;
Task _completionTask;
bool _done;

async Task ReadQueueAsync(int pollingInterval)
{
    while (!_done)
    {
        // Get a list of items to process from SQL database
        var list = ...;

        // Schedule the work
        foreach (var item in list)
        {
            _processor.Post(item);
        }

        // Give SQL server time to re-fill the queue
        await Task.Delay(pollingInterval);
    }

    // Signal the processor that we are done
    _processor.Complete();
}

void ProcessItem(QueueItem item)
{
    // Do your work here
}

void Setup()
{
    // Configure the action block to process items concurrently
    // using all available CPU cores
    _processor = new ActionBlock<QueueItem>(new Action<QueueItem>(ProcessItem),
        new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded });
    _done = false;
    var queueReaderTask = ReadQueueAsync(QUEUE_POLL_INTERVAL);
    _completionTask = Task.WhenAll(queueReaderTask, _processor.Completion);
}

void Complete()
{
    _done = true;
    _completionTask.Wait();
}
Per MaxDegreeOfParallelism's documentation: "Generally, you do not need to modify this setting. However, you may choose to set it explicitly in advanced usage scenarios such as these:
When you know that a particular algorithm you're using won't scale beyond a certain number of cores. You can set the property to avoid wasting cycles on additional cores.
When you're running multiple algorithms concurrently and want to manually define how much of the system each algorithm can utilize. You can set a MaxDegreeOfParallelism value for each.
When the thread pool's heuristics is unable to determine the right number of threads to use and could end up injecting too many threads. For example, in long-running loop body iterations, the thread pool might not be able to tell the difference between reasonable progress or livelock or deadlock, and might not be able to reclaim threads that were added to improve performance. In this case, you can set the property to ensure that you don't use more than a reasonable number of threads."
If you do not have an advanced usage scenario like the 3 cases above, you may want to hand your list of items or tasks to be run to the Task Parallel Library and let the framework handle the processor count.
List<InfoObject> infoList = GetInfo();
ConcurrentQueue<ResultObject> output = new ConcurrentQueue<ResultObject>();
await Task.Run(() =>
{
    Parallel.ForEach(infoList, item =>
    {
        ResultObject result = ProcessInfo(item);
        output.Enqueue(result);
    });
});
foreach (var resultObj in output)
{
    ReportOnResultObject(resultObj);
}
OR
List<InfoObject> infoList = GetInfo();
List<Task<ResultObject>> tasks = new List<Task<ResultObject>>();
foreach (var item in infoList)
{
    tasks.Add(Task.Run(() => ProcessInfo(item)));
}
var results = await Task.WhenAll(tasks);
foreach (var resultObj in results)
{
    ReportOnResultObject(resultObj);
}
H/T to IAmTimCorey tutorials:
https://www.youtube.com/watch?v=2moh18sh5p4
https://www.youtube.com/watch?v=ZTKGRJy5P2M

Why is an additional async operation making my code faster than when the operation is not taking place at all?

I'm working on an SMS-based game (Value Added Service), in which a question must be sent to each subscriber on a daily basis. There are over 500,000 subscribers, and therefore performance is a key factor. Since each subscriber can be in a different state of the competition, with different variables, the database must be queried separately for each subscriber before sending a text message. To achieve the best performance I'm using the .NET Task Parallel Library (TPL) to spawn parallel threadpool threads and do as many async operations as possible in each thread to finally send texts ASAP.
Before describing the actual problem, there is some more information necessary to give about the code.
At first there was no async operation in the code. I just scheduled some 500,000 tasks with the default task scheduler onto the ThreadPool, and each task would work through the routines, blocking on all EF (Entity Framework) queries and sequentially finishing its job. It was good, but not fast enough. Then I changed all EF queries to async. The outcome was superb in speed, but there were so many deadlocks and timeouts in SQL Server that about a third of the subscribers never received a text! After trying different solutions, I decided not to do too many async database operations while I have over 500,000 tasks running on a 24-core server (with at least 24 concurrent threadpool threads)!
I rolled back all the changes (the async ones) except for one web service call in each task, which remained async.
Now the weird case:
In my code, I have a boolean variable named "isCrossSellActive". When the variable is set, some more DB operations take place and an async web service call happens, on which the task awaits. When this variable is false, none of these operations happen, including the async web service call. Awkwardly, when the variable is set, the code runs much faster than when it's not! It seems like for some reason the awaited async code (the cooperative thread) is making the code faster.
Here is the code:
public async Task AutoSendMessages(...)
{
    // Get list of subscriptions plus some initialization
    LimitedConcurrencyLevelTaskScheduler lcts = new LimitedConcurrencyLevelTaskScheduler(numberOfThreads);
    TaskFactory taskFactory = new TaskFactory(lcts);
    List<Task> tasks = new List<Task>();
    //....
    foreach (var sub in subscriptions)
    {
        AutoSendData data = new AutoSendData
        {
            ServiceId = serviceId,
            MSISDN = sub.subscriber,
            IsCrossSellActive = bolCrossSellHeader
        };
        tasks.Add(await taskFactory.StartNew(async (x) =>
        {
            await SendQuestion(x);
        }, data));
    }
    GC.Collect();
    try
    {
        Task.WaitAll(tasks.ToArray());
    }
    catch (AggregateException ae)
    {
        ae.Handle((ex) =>
        {
            _logRepo.LogException(1, "", ex);
            return true;
        });
    }
    await _autoSendRepo.SetAutoSendingStatusEnd(statusId);
}
public async Task SendQuestion(object data)
{
    // extract variables from input parameter
    try
    {
        if (isCrossSellActive)
        {
            int pieceCount = subscriptionRepo.GetSubscriberCarPieces(curSubscription.service, curSubscription.subscriber).Count(c => c.isConfirmed);
            foreach (var rule in csRules)
            {
                if (rule.Applies)
                {
                    if (await HttpClientHelper.GetJsonAsync<bool>(url, rule.TargetServiceBaseAddress))
                    {
                        int noOfAddedPieces = SomeCalculations();
                        if (noOfAddedPieces > 0)
                        {
                            crossSellRepo.SetPromissedPieces(curSubscription.subscriber, curSubscription.service,
                                rule.TargetShortCode, noOfAddedPieces, 0, rule.ExpirationLimitDays);
                        }
                    }
                }
            }
        }
        // The rest of the code. (Some db CRUD)
        await SmsClient.SendSoapMessage(subscriber, smsBody);
    }
    catch (Exception ex) { /*...*/ }
}
OK, thanks to @usr and the clue he gave me, the problem is finally solved!
His comment drew my attention to the awaited taskFactory.StartNew(...) line, which sequentially adds new tasks to the "tasks" list, which is then awaited by Task.WaitAll(tasks).
At first I removed the await keyword before taskFactory.StartNew(), and it led the code into a horrible state of malfunction! I then put the await keyword back before taskFactory.StartNew(), debugged the code using breakpoints, and amazingly saw that the tasks ran one after another, sequentially, before the first task reached the first await inside the SendQuestion routine. When the isCrossSellActive flag was set, despite the extra work each task had to do, the first await keyword was reached earlier, thus enabling the next scheduled task to run. But when it was not set, the only await keyword was on the last line of the routine, so each task was most likely to run sequentially to the end.
usr's suggestion to remove the await keyword in the for loop seemed to be correct, but the problem was that Task.WaitAll() would then wait on the wrong list - the outer Task<Task> wrappers instead of the inner tasks. I finally used Task.Run instead of TaskFactory.StartNew and everything changed. Now the service is working well. The final code inside the for loop is:
tasks.Add(Task.Run(async () =>
{
    await SendQuestion(data);
}));
and the problem was solved.
Thank you all.
P.S. Read this article on Task.Run and why TaskFactory.StartNew is dangerous: http://blog.stephencleary.com/2013/08/startnew-is-dangerous.html
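To make the pitfall concrete, here is a minimal sketch (mine, not from the linked article) of why StartNew with an async lambda misbehaves; it can be run inside any async method:
// Task.Factory.StartNew with an async lambda returns the *outer* task,
// which completes as soon as the lambda hits its first await:
Task<Task> outer = Task.Factory.StartNew(async () => await Task.Delay(1000));
await outer;          // returns almost immediately - the delay is still running!
await outer.Unwrap(); // waits for the inner task, i.e. the full delay

// Task.Run understands async lambdas and unwraps automatically:
Task inner = Task.Run(async () => await Task.Delay(1000));
await inner;          // waits the full second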
It's extremely hard to tell unless you add some profiling to see which code is taking longer now.
Without seeing more numbers, my best guess is that the SMS service doesn't like you sending too many requests in a short time and chokes. When you add the extra DB calls, the extra delay makes the SMS service work better.
A few other small details:
await Task.WhenAll is usually a bit better than Task.WaitAll. WaitAll means the thread will sit around waiting, making a deadlock slightly more likely.
Instead of:
tasks.Add(await taskFactory.StartNew(async (x) =>
{
    await SendQuestion(x);
}, data));
You should be able to do
tasks.Add(SendQuestion(data));
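Since SendQuestion is already an async method returning a Task, calling it directly hands you the task to collect; the StartNew wrapper only adds an extra layer of Task<Task>.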

How to iteratively call a method without waiting for a response from the method in C# .NET 4.5.1?

First off, I apologize for the terrible wording of that question... here's the scenario:
I built a Web API method that receives a ProductID and then uploads that product's images to Amazon S3. This part is working just fine.
I am now trying to get a console app running that will grab a range of ProductIDs and loop through them, calling the API method without waiting for the results.
Can anyone point me in the right direction?
I suppose another caveat would be to not eat up all the resources on the machine running the console app... so maybe a thread cap?
UPDATE (this still seems to be synchronous):
class Program
{
    async static void DoUpload(int itemid)
    {
        Console.WriteLine("Starting #:" + itemid);
        Thread.Sleep(2000); // simulates a long call to the API
        Console.WriteLine("Finishing #:" + itemid);
    }

    static void Main(string[] args)
    {
        for (int i = 0; i < 20; i++)
        {
            DoUpload(i);
        }
    }
}
There are a couple of easy ways to do this.
I'd recommend using Parallel. It makes the most optimized use of your environment's threads/cores. For your example, you'd simply do something like this:
var status = Parallel.For(0, 20, DoUpload);
while (!status.IsCompleted)
{
    // Do something while you wait
}
The other method would be to use tasks, and send each call as a separate task. Be careful with this approach, though, because you could overload the system with pending tasks if you have too many iterations.
List<Task> tasks = new List<Task>();
for (int i = 0; i < 20; i++)
{
    int id = i; // capture a copy so each task sees its own value
    var task = Task.Run(() => DoUpload(id));
    tasks.Add(task);
}
// wait for completion of all tasks
Task.WaitAll(tasks.ToArray());
I do not recommend using Parallel.For. It does not give satisfactory control over parallelism (you probably don't want to hammer away with hundreds of requests, which will start to time out), and it requires unnecessary context switching.
Threads/cores aren't the limiting factor in the case of HTTP requests.
In the example, change
Thread.Sleep(2000)
to
await Task.Delay(2000)
and when using real Web API calls,
await httpClient.PostAsync(...)
Also remember to wait in Main:
Console.ReadLine() // or something more sophisticated
otherwise the program will terminate before the calls have been made.
Then, to control the level of parallelism, I think the easiest solution is to use a semaphore to count the number of outstanding calls, waiting in the main loop for the semaphore to be signaled again before issuing new requests.
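A minimal sketch of that approach using SemaphoreSlim and an async rewrite of DoUpload (the names mirror the question's example, but the throttling details are mine):
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    static readonly SemaphoreSlim _throttle = new SemaphoreSlim(10); // max 10 calls in flight

    static async Task DoUploadAsync(int itemId)
    {
        await _throttle.WaitAsync(); // wait for a free slot
        try
        {
            Console.WriteLine("Starting #:" + itemId);
            await Task.Delay(2000); // stand-in for the real API call
            Console.WriteLine("Finishing #:" + itemId);
        }
        finally
        {
            _throttle.Release(); // free the slot for the next call
        }
    }

    static void Main(string[] args)
    {
        var tasks = new List<Task>();
        for (int i = 0; i < 20; i++)
            tasks.Add(DoUploadAsync(i));
        Task.WhenAll(tasks).GetAwaiter().GetResult(); // keep Main alive until all calls complete
    }
}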

Tasks are running sequentially for some reason

Case 1: I have a console test app and libraries. The test app calls async methods on classes in those libraries that are meant to run in parallel. Example code:
for (int i = 0; i < 100; i++)
{
    var myTask = RetrieveRecordSet<TestClass3>();
}
This works as expected in the console app, meaning all 100 tasks are queued at the same time and perform in parallel in the background, as proven by their console output.
Case 2: Same code, just in a WPF app instead of a console app. Now for some reason, the Tasks run sequentially.
Case 3: I also tried the following modification to the WPF app to no avail:
for (int i = 0; i < 100; i++)
{
    var myTask = Task.Factory.StartNew(() => RetrieveRecordSet<TestClass3>());
}
Case 4: I then tried the following, but it blocks the UI and is still sequential:
Parallel.For(0, 100, a => RetrieveRecordSet<TestClass3>());
Is there a way to get the same non-blocking parallel behavior that I get from Case 1 in a WPF app?
Try the following:
await Task.Run(() =>
{
    Parallel.For(0, 100, i => RetrieveRecordSet<TestClass3>());
});
Parallel.For is indeed a blocking operation. If you want the Parallel.For loop to run on a thread separate from the UI:
new Thread(() => Parallel.For(0, 100, index => RetrieveRecordSet<TestClass3>())).Start();
Or
new Task(() => Parallel.For(0, 100, index => RetrieveRecordSet<TestClass3>())).Start();
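For what it's worth, the second option is more commonly written with Task.Run these days, which avoids the separate create-then-start step:
Task.Run(() => Parallel.For(0, 100, index => RetrieveRecordSet<TestClass3>()));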
Thanks for the answers, folks. In the end, the answer is much more nefarious, and I'm still not sure what the problem is, but I found a workaround. The underlying code is making WCF service calls. Using the Task Parallel Library to make a bunch of parallel WCF calls the first time a channel is used will serialize those calls. I serendipitously discovered that if you 'prime' the channel with a single call first, await the response, and THEN slam it with a bunch of parallel WCF calls, you get full parallelism. Is there a less hacky, perhaps proper, way to prime a WCF channel like this? Is this a bug in WCF or TPL?
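For reference, the priming workaround looks roughly like this (a sketch; MyServiceClient, GetStatusAsync, and RetrieveRecordSetAsync are hypothetical stand-ins for the real WCF client and its operations):
var client = new MyServiceClient(); // hypothetical WCF client

// Prime the channel: one awaited call forces the channel/session setup
await client.GetStatusAsync();

// Subsequent parallel calls are no longer serialized behind channel setup
var tasks = Enumerable.Range(0, 100)
    .Select(i => client.RetrieveRecordSetAsync(i))
    .ToArray();
await Task.WhenAll(tasks);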
