I am using a background service now to process a time consuming operation and same time showing progress in UI
But now looking for some performance improvements in terms of time [since at present background worker took more time to complete the processings]
The current solution of Do_Work in background worker is like
foreach(string _entry in ArrayList){
//process the _entry(communicate to a service and store response in Db) and took almost 2-3 seconds for the completion
}
ArrayList contains almost 25000 records . So almost 25000*3 = 75000 seconds time is required to do the processing now
The new solution i am thinking of is like starting 50 Threads of 500 items each at same time and waits for the completion of all Threads something like this
int failed = 0;
var tasks = new List<Task>();
for (int i = 1; i < 50; i++)
{
tasks.Add(Task.Run(() =>
{
try
{
//Process 500 items from Array .(communicate to a service and store response in Db)
}
catch
{
Interlocked.Increment(ref failed);
throw;
}
}));
}
Task t = Task.WhenAll(tasks); //Runs 50 Threads
try
{
await t;
}
catch (AggregateException exc)
{
string _error="";
foreach (Exception ed in t.Exception.InnerExceptions)
{
_error+=ed.Message;
}
MessageBox.Show(_error);
}
if (t.Status == TaskStatus.RanToCompletion)
MessageBox.Show("All ping attempts succeeded.");
else if (t.Status == TaskStatus.Faulted)
MessageBox.Show("{0} ping attempts failed", failed.ToString());
Will this helps to reduce the processing time? Or some better approaches?
I tried with a small sample size of 10 and debugged , but i cant see much difference (Something is wrong in my choice of WhenAll ?)
You need to handle few records at a time.
Process n records with n different tasks. Let's say n = 5. So 5 different tasks t1,t2,t3,t4 and t5. Once anyone is completed processing data one row they should start processing t (1+n), t (2+n), t(3+n),t(4+n), t(5+n) row and so on until all records are processed.
Store all those processed values into a dictionary or list to identify which one belongs to which record.
It's like recursive function.
I have used this kind of approach in past and it really improves the performance a lot. You can fine-tune the value of n based on your PC configuration
I would suggest to try with BlockingCollection
Related
I have a Windows service that reads data from the database and processes this data using multiple REST API calls.
Originally, this service ran on a timer where it would read unprocessed data from the database and process it using multiple threads limited using SemaphoreSlim. This worked well except that the database read had to wait for all processing to finish before reading again.
ServicePointManager.DefaultConnectionLimit = 10;
Original that works:
// Runs every 5 seconds on a timer
private void ProcessTimer_Elapsed(object sender, ElapsedEventArgs e)
{
var hasLock = false;
try
{
Monitor.TryEnter(timerLock, ref hasLock);
if (hasLock)
{
ProcessNewData();
}
else
{
log.Info("Failed to acquire lock for timer."); // This happens all of the time
}
}
finally
{
if (hasLock)
{
Monitor.Exit(timerLock);
}
}
}
public void ProcessNewData()
{
var unproceesedItems = GetDatabaseItems();
if (unproceesedItems.Count > 0)
{
var downloadTasks = new Task[unproceesedItems.Count];
var maxThreads = new SemaphoreSlim(semaphoreSlimMinMax, semaphoreSlimMinMax); // semaphoreSlimMinMax = 10 is max threads
for (var i = 0; i < unproceesedItems .Count; i++)
{
maxThreads.Wait();
var iClosure = i;
downloadTasks[i] =
Task.Run(async () =>
{
try
{
await ProcessItemsAsync(unproceesedItems[iClosure]);
}
catch (Exception ex)
{
// handle exception
}
finally
{
maxThreads.Release();
}
});
}
Task.WaitAll(downloadTasks);
}
}
To improve efficiency, I rewrite the service to run GetDatabaseItems in a separate thread from the rest so that there is a ConcurrentDictionary of unprocessed items between them that GetDatabaseItems fills and ProcessNewData empties.
The problem is that while 10 unprocessed items are send to ProcessItemsAsync, they are processed two at a time instead of all 10.
The code inside of ProcessItemsAsync calls var response = await client.SendAsync(request); where the delay occurs. All 10 threads make it to this code but come out of it two at a time. None of this code changed between the old version and the new.
Here is the code in the new version that did change:
public void Start()
{
ServicePointManager.DefaultConnectionLimit = maxSimultaneousThreads; // 10
// Start getting unprocessed data
getUnprocessedDataTimer.Interval = getUnprocessedDataInterval; // 5 seconds
getUnprocessedDataTimer.Elapsed += GetUnprocessedData; // writes data into a ConcurrentDictionary
getUnprocessedDataTimer.Start();
cancellationTokenSource = new CancellationTokenSource();
// Create a new thread to process data
Task.Factory.StartNew(() =>
{
try
{
ProcessNewData(cancellationTokenSource.Token);
}
catch (Exception ex)
{
// error handling
}
}, TaskCreationOptions.LongRunning
);
}
private void ProcessNewData(CancellationToken token)
{
// Check if task has been canceled.
while (!token.IsCancellationRequested)
{
if (unprocessedDictionary.Count > 0)
{
try
{
var throttler = new SemaphoreSlim(maxSimultaneousThreads, maxSimultaneousThreads); // maxSimultaneousThreads = 10
var tasks = unprocessedDictionary.Select(async item =>
{
await throttler.WaitAsync(token);
try
{
if (unprocessedDictionary.TryRemove(item.Key, out var item))
{
await ProcessItemsAsync(item);
}
}
catch (Exception ex)
{
// handle error
}
finally
{
throttler.Release();
}
});
Task.WhenAll(tasks);
}
catch (OperationCanceledException)
{
break;
}
}
Thread.Sleep(1000);
}
}
Environment
.NET Framework 4.7.1
Windows Server 2016
Visual Studio 2019
Attempts to fix:
I tried the following with the same bad result (two await client.SendAsync(request) completing at a time):
Set Max threads and ServicePointManager.DefaultConnectionLimit to 30
Manually create threads using Thread.Start()
Replace async/await pattern with sync HttpClient calls
Call data processing using Task.Run(async () => and Task.WaitAll(downloadTasks);
Replace the new long-running thread for ProcessNewData with a timer
What I want is to run GetUnprocessedData and ProcessNewData concurrently with an HttpClient connection limit of 10 (set in config) so that 10 requests are processed at the same time.
Note: the issue is similar to HttpClient.GetAsync executes only 2 requests at a time? but the DefaultConnectionLimit is increased and the service runs on a Windows Server. It also creates more than 2 connections when original code runs.
Update
I went back to the original project to make sure it still worked, it did. I added a new timer to perform some unrelated operations and the httpClient issue came back. I removed the timer, everything worked. I added a new thread to do parallel processing, the problem came back.
This is not a direct answer to your question, but a suggestion for simplifying your service that could make the debugging of any problem easier. My suggestion is to implement the producer-consumer pattern using an iterator for producing the unprocessed items, and a parallel loop for consuming them. Ideally the parallel loop would have async delegates, but since you are targeting the .NET Framework you don't have access to the .NET 6 method Parallel.ForEachAsync. So I will suggest the slightly wasteful approach of using a synchronous parallel loop that blocks threads. You could use either the Parallel.ForEach method, or the PLINQ like in the example below:
private IEnumerable<Item> Iterator(CancellationToken token)
{
while (true)
{
Task delayTask = Task.Delay(5000, token);
foreach (Item item in GetDatabaseItems()) yield return item;
delayTask.GetAwaiter().GetResult();
}
}
public void Start()
{
//...
ThreadPool.SetMinThreads(degreeOfParallelism, Environment.ProcessorCount);
new Thread(() =>
{
try
{
Partitioner
.Create(Iterator(token), EnumerablePartitionerOptions.NoBuffering)
.AsParallel()
.WithDegreeOfParallelism(degreeOfParallelism)
.WithCancellation(token)
.ForAll(item => ProcessItemAsync(item).GetAwaiter().GetResult());
}
catch (OperationCanceledException) { } // Ignore
}).Start();
}
Online demo.
The Iterator fetches unprocessed items from the database in batches, and yields them one by one. The database won't be hit more frequently than once every 5 seconds.
The PLINQ query is going to fetch a new item from the Iterator each time it has a worker available, according to the WithDegreeOfParallelism policy. The setting EnumerablePartitionerOptions.NoBuffering ensures that it won't try to fetch more items in advance.
The ThreadPool.SetMinThreads is used in order to boost the availability of ThreadPool threads, since the PLINQ is going to use lots of them. Without it the ThreadPool will not be able to satisfy the demand immediately, although it will gradually inject more threads and eventually will catch up. But since you already know how many threads you'll need, you can configure the ThreadPool from the start.
In case you dislike the idea of blocking threads, you can find a simple substitute of the Parallel.ForEachAsync here, based on the TPL Dataflow library. It requires installing a NuGet package.
The issue turned out to be the place where ServicePointManager.DefaultConnectionLimit is set.
In the version where HttpClient was only doing two requests at a time, ServicePointManager.DefaultConnectionLimit was being set before the threads were being created but after the HttpClient was initialized.
Once I moved it into the constructor before the HttpClient is initialized, everything started working.
Thank you very much to #Theodor Zoulias for the help.
TLDR; Set ServicePointManager.DefaultConnectionLimit before initializing the HttpClient.
Edit: As per the discussion in the comments, I was overestimating how much many threads would help, and have gone back to Parallell.ForEach with a reasonable MaxDegreeOfParallelism, and just have to wait it out.
I have a 2D array data structure, and perform work on slices of the data. There will only ever be around 1000 threads required to work on all the data simultaneously. Basically there are around 1000 "days" worth of data for all ~7000 data points, and I would like to process the data for each day in a new thread in parallel.
My issue is that doing work in the child threads dramatically slows the time in which the main thread starts them. If I have no work being done in the child threads, the main thread starts them all basically instantly. In my example below, with just a bit of work, it takes ~65ms to start all the threads. In my real use case, the worker threads will take around 5-10 seconds to compute all what they need, but I would like them all to start instantly otherwise, I am basically running the work in sequence. I do not understand why their work is slowing down the main thread from starting them.
How the data is setup shouldn't matter (I hope). The way it's setupmight look weird I was just simulating exactly how I receive the data. What's important is that if you comment out the foreach loop in the DoThreadWork method, the time it takes to start the threads is waaay lower.
I have the for (var i = 0; i < 4; i++) loop just to run the simulation multiple times to see 4 sets of timing results to make sure that it wasn't just slow the first time.
Here is a code snippet to simulate my real code:
public static void Main(string[] args)
{
var fakeData = Enumerable
.Range(0, 7000)
.Select(_ => Enumerable.Range(0, 400).ToArray())
.ToArray();
const int offset = 100;
var dataIndices = Enumerable
.Range(offset, 290)
.ToArray();
for (var i = 0; i < 4; i++)
{
var s = Stopwatch.StartNew();
var threads = dataIndices
.Select(n =>
{
var thread = new Thread(() =>
{
foreach (var fake in fakeData)
{
var sliced = new ArraySegment<int>(fake, n - offset, n - (n - offset));
DoThreadWork(sliced);
}
});
return thread;
})
.ToList();
foreach (var thread in threads)
{
thread.Start();
}
Console.WriteLine($"Before Join: {s.Elapsed.Milliseconds}");
foreach (var thread in threads)
{
thread.Join();
}
Console.WriteLine($"After Join: {s.Elapsed.Milliseconds}");
}
}
private static void DoThreadWork(ArraySegment<int> fakeData)
{
// Commenting out this foreach loop will dramatically increase the speed
// in which all the threads start
var a = 0;
foreach (var fake in fakeData)
{
// Simulate thread work
a += fake;
}
}
Use the thread/task pool and limit thread/task count to 2*(CPU Cores) at most. Creating more threads doesn't magically make more work get done as you need hardware "threads" to run them (1 per CPU core for non-SMT CPU's, 2 per core for Intel HT, AMD's SMT implementation). Executing hundreds to thousands of threads that don't have to passively await asynchronous callbacks (i.e. I/O) makes running the threads far less efficient due to thrashing the CPU with context switches for no reason.
i have this part of code
bool hasData = true;
using (Context context = new Context())
{
using (SemaphoreSlim concurrencySemaphore = new SemaphoreSlim(MAX_THREADS))
{
while (hasData)
{
Message message = context.Database.SqlQuery<Message>($#"
select top(1) * from Message where
( Status = {(int)MessageStatusEnum.Pending} ) or
( Status = {(int)MessageStatusEnum.Paused } and ResumeOn < GETUTCDATE() )
").FirstOrDefault();
if message == null)
{
hasData = false;
}
else
{
concurrencySemaphore.Wait();
tasks.Add(Task.Factory.StartNew(() =>
{
Process(message);
concurrencySemaphore.Release();
}, this.CancellationToken));
}
}
Task.WaitAll(tasks.ToArray());
}
}
And my Process function is something like this
private void Process(Message message)
{
System.Threading.Thread.Sleep(10000);
}
Now, if i have only 1 item that i want to Process then the total execution time is 10sec and the execution time per item(1 item) is 10 sec.
Well if i have 10 items for example, then the execution per item is increasing to 15-20sec.
I tried to change the value of MAX_THREADS but always if i have more than 10 items in my queue and start the parallel execution then the time of execution per item is about 15sec.
What i am missing?
Not unexpected. Parallel Slowdown is a very real problem. On the one end we have pleasingly parallel operations. On other end inherently serial stuff like calcuating the Fibonacci sequence. And there is a lot of area in between.
Multitasking in general and Multithreading in particular is not a magic bullet that makes everything faster. I like to say: Multitasking has to pick it's problems carefully. If you throw more tasks at a problem that is not designed for it, you end up with code that is:
more complex
more memory demanding
and most importanlty slower then the simple, sequential operation
At best you just added all that Task Management overhead to the operation. At worst you run into deadlocks and resource competition. And your example operation of Sleeping the Thread is not something that will benefit from Multitasking anyway. Please show us the actuall operation you want to speed up via Multitasking/-Threading. So we can tell you if there is any chance that can work out.
I have an application which process images extracted from database. I need to process various images in parallel, and because of that i'm using .NET TPL with Tasks.
My application is a Winforms app in C#. I just have the option to choose how many processes will start and a Start button. The way that i'm doing right now is this:
private Action getBusinessAction(int numProcess) {
return () =>
{
try {
while (true) {
(new BusinessClass()).doSomeProcess(numProcess);
tokenCancelacion.ThrowIfCancellationRequested();
}
}
catch (OperationCanceledException ex) {
Console.WriteLine("Cancelled process");
}
};
}
...
for (int cnt = 0; cnt < NUM_OF_MAX_PROCESS; cnt++) {
Task.Factory.StartNew(getBusinessAction(cnt + 1), tokenCancelacion);
}
In this case, if i choose 8 as the number of processes, it starts 8 tasks which are running until application is closed. But i think that a better approach will be to start a number of tasks which calls doSomeProcess method, and then finish. But when a task finishes, i would like to start the same task or start a new instance of the task that does the same, in order to having always that number of processes running in parallel. Is there a way in TPL to achieve this?
This sounds like a good fit for TPL Dataflow's ActionBlock. You create a single block at the start. Set how many items it should process concurrently (i.e. 8) and post the items into it:
var block = new ActionBlock<Image>(
image => ProcessImage(image),
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 8});
foreach (var image in GetImages())
{
block.Post(image);
}
This will take care of creating up to 8 tasks when needed and when they aren't anymore the number would go down.
When you are done you should signal the block for completion and wait for it to complete:
block.Complete();
await block.Completion;
Is there any change that a multiple Background Workers perform better than Tasks on 5 second running processes? I remember reading in a book that a Task is designed for short running processes.
The reasong I ask is this:
I have a process that takes 5 seconds to complete, and there are 4000 processes to complete. At first I did:
for (int i=0; i<4000; i++) {
Task.Factory.StartNewTask(action);
}
and this had a poor performance (after the first minute, 3-4 tasks where completed, and the console application had 35 threads). Maybe this was stupid, but I thought that the thread pool will handle this kind of situation (it will put all actions in a queue, and when a thread is free, it will take an action and execute it).
The second step now was to do manually Environment.ProcessorCount background workers, and all the actions to be placed in a ConcurentQueue. So the code would look something like this:
var workers = new List<BackgroundWorker>();
//initialize workers
workers.ForEach((bk) =>
{
bk.DoWork += (s, e) =>
{
while (toDoActions.Count > 0)
{
Action a;
if (toDoActions.TryDequeue(out a))
{
a();
}
}
}
bk.RunWorkerAsync();
});
This performed way better. It performed much better then the tasks even when I had 30 background workers (as much tasks as in the first case).
LE:
I start the Tasks like this:
public static Task IndexFile(string file)
{
Action<object> indexAction = new Action<object>((f) =>
{
Index((string)f);
});
return Task.Factory.StartNew(indexAction, file);
}
And the Index method is this one:
private static void Index(string file)
{
AudioDetectionServiceReference.AudioDetectionServiceClient client = new AudioDetectionServiceReference.AudioDetectionServiceClient();
client.IndexCompleted += (s, e) =>
{
if (e.Error != null)
{
if (FileError != null)
{
FileError(client,
new FileIndexErrorEventArgs((string)e.UserState, e.Error));
}
}
else
{
if (FileIndexed != null)
{
FileIndexed(client, new FileIndexedEventArgs((string)e.UserState));
}
}
};
using (IAudio proxy = new BassProxy())
{
List<int> max = new List<int>();
if (proxy.ReadFFTData(file, out max))
{
while (max.Count > 0 && max.First() == 0)
{
max.RemoveAt(0);
}
while (max.Count > 0 && max.Last() == 0)
{
max.RemoveAt(max.Count - 1);
}
client.IndexAsync(max.ToArray(), file, file);
}
else
{
throw new CouldNotIndexException(file, "The audio proxy did not return any data for this file.");
}
}
}
This methods reads from an mp3 file some data, using the Bass.net library. Then that data is sent to a WCF service, using the async method.
The IndexFile(string file) method, which creates tasks is called for 4000 times in a for loop.
Those two events, FileIndexed and FileError are not handled, so they are never thrown.
The reason why the performance for Tasks was so poor was because you mounted too many small tasks (4000). Remember the CPU needs to schedule the tasks as well, so mounting a lots of short-lived tasks causes extra work load for CPU. More information can be found in the second paragraph of TPL:
Starting with the .NET Framework 4, the TPL is the preferred way to
write multithreaded and parallel code. However, not all code is
suitable for parallelization; for example, if a loop performs only a
small amount of work on each iteration, or it doesn't run for many
iterations, then the overhead of parallelization can cause the code to
run more slowly.
When you used the background workers, you limited the number of possible alive threads to the ProcessCount. Which reduced a lot of scheduling overhead.
Given that you have a strictly defined list of things to do, I'd use the Parallel class (either For or ForEach depending on what suits you better). Furthermore you can pass a configuration parameter to any of these methods to control how many tasks are actually performed at the same time:
System.Threading.Tasks.Parallel.For(0, 20000, new ParallelOptions() { MaxDegreeOfParallelism = 5 }, i =>
{
//do something
});
The above code will perform 20000 operations, but will NOT perform more than 5 operations at the same time.
I SUSPECT the reason the background workers did better for you was because you had them created and instantiated at the start, while in your sample Task code it seems you're creating a new Task object for every operation.
Alternatively, did you think about using a fixed number of Task objects instantiated at the start and then performing a similar action with a ConcurrentQueue like you did with the background workers? That should also prove to be quite efficient.
Have you considered using threadpool?
http://msdn.microsoft.com/en-us/library/system.threading.threadpool.aspx
If your performance is slower when using threads, it can only be due to threading overhead (allocating and destroying individual threads).