In a web application, we provide paginated search panels for various database tables. Users can currently select individual rows and, via the UI, execute some operation on each selected instance.
For example, a panel of document records offers the ability to delete documents. A user may check 15 checkboxes representing 15 document identifiers and choose Options > Delete. This works just fine.
I wish to offer users an option to execute some operation on all rows matching the query used to display the data in the panel.
We may have 5,000 documents matching some search criteria, and wish to allow a user to delete all 5,000. (I understand this example is a bit contrived; let's ignore the 'wisdom' of allowing users to delete documents in bulk!)
Execution of a method for thousands of rows is a long-running operation, so I will queue the operation instead. Consider this an equivalent of Gmail's ability to apply a filter to all email conversations matching some search criteria.
I need to execute a query that will return an unknown number of rows, and for each row, insert a row into a queue (in the code below, the queue is represented by ImportFileQueue).
I coded it as follows:
using (var reader = await source.InvokeDataReaderAsync(operation, parameters))
{
    Parallel.ForEach<IDictionary<string, object>>(reader.Enumerate(), async properties =>
    {
        try
        {
            var instance = new ImportFileQueueObject(User)
            {
                // application tier calculation here; cannot do in SQL
            };
            await instance.SaveAsync();
        }
        catch (System.Exception ex)
        {
            // omitted for brevity
        }
    });
}
When running this in a unit test that wraps the call in a Transaction, I receive a System.Data.SqlClient.SqlException: "Transaction context in use by another session."
This is easily resolved by either:
Changing the database call from async to sync, or
Removing the Parallel.ForEach and iterating through the reader serially.
I opted for the former:
using (var reader = await source.InvokeDataReaderAsync(operation, parameters))
{
    Parallel.ForEach<IDictionary<string, object>>(reader.Enumerate(), properties =>
    {
        try
        {
            var instance = new ImportFileQueueObject(User)
            {
                // Omitted for brevity
            };
            instance.Save();
        }
        catch (System.Exception ex)
        {
            // omitted for brevity
        }
    });
}
My thought process is, in typical use cases:
the outer reader will often have thousands of rows
the instance.Save() call is "lightweight", inserting a single row into the db
Two questions:
Is there a reasonable way to use async/await inside the Parallel.ForEach, where the inner code is using SqlConnection (avoiding the TransactionContext error)?
If not, given my expected typical use case, is my choice to leverage the TPL and forfeit async/await for the single-row saves reasonable?
The answer suggested in What is the reason of “Transaction context in use by another session” says:
Avoid multi-threaded data operations if it's possible (no matter
loading or saving). E.g. save SELECT/UPDATE/ etc... requests in a
single queue and serve them with a single-thread worker;
but I'm trying to minimize total execution time, and figured Parallel.ForEach was more likely to achieve that.
It's almost always a bad idea to open a transaction and then wait for I/O while holding it open. You'll get much better performance (and fewer deadlocks) by buffering the data first. If there's more total data than you can easily buffer in memory, buffer it into chunks of a thousand or so rows at a time. Put each of those in a separate transaction if possible.
Whenever you open a transaction, any locks taken remain open until it is committed (and locks get taken whether you want them or not when you're inserting data). Those locks cause other updates or reads without WITH(NOLOCK) to sit and wait until the transaction is committed. In a high-performance system, if you're doing I/O while those locks are held, it's pretty much guaranteed to cause problems as other callers start an operation and then sit and wait while this operation does I/O outside the transaction.
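For illustration, a rough sketch of that buffering approach, reusing ImportFileQueueObject and SaveAsync from the question (the chunk size, the use of TransactionScope, and reading everything into a list first are illustrative assumptions, not part of the original code):

    // Buffer the rows first, so no transaction is open while reading from the source.
    var buffered = new List<IDictionary<string, object>>();
    using (var reader = await source.InvokeDataReaderAsync(operation, parameters))
    {
        foreach (var properties in reader.Enumerate())
            buffered.Add(properties);
    }

    // Insert in chunks, each chunk in its own short-lived transaction.
    foreach (var chunk in buffered.Chunk(1000)) // Enumerable.Chunk is .NET 6+; otherwise split manually
    {
        using (var scope = new TransactionScope(TransactionScopeAsyncFlowOption.Enabled))
        {
            foreach (var properties in chunk)
            {
                var instance = new ImportFileQueueObject(User)
                {
                    // application tier calculation here; cannot do in SQL
                };
                await instance.SaveAsync();
            }
            scope.Complete();
        }
    }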
Related
I need to read 1M rows from an IDataReader, and write n text files simultaneously. Each of those files will be a different subset of the available columns; all n text files will be 1M lines long when complete.
Current plan is one TransformManyBlock to iterate the IDataReader, linked to a BroadcastBlock, linked to n BufferBlock/ActionBlock pairs.
What I'm trying to avoid is having my ActionBlock delegate perform a using (StreamWriter x...) { x.WriteLine(); } that would open and close every output file a million times over.
My current thought is, in lieu of an ActionBlock, to write a custom class that implements ITargetBlock<>. Is there a simpler approach?
EDIT 1: The discussion is of value for my current problem, but the answers so far got hyper-focused on file system behavior. For the benefit of future searchers, the thrust of the question was how to build some kind of setup/teardown outside the ActionBlock delegate. This would apply to any kind of disposable that you would ordinarily wrap in a using-block.
EDIT 2: Per @Panagiotis Kanavos, the executive summary of the solution is to set up the object before defining the block, then tear it down in the block's Completion.ContinueWith.
Writing to a file one line at a time is expensive in itself even when you don't have to open the stream each time. Keeping a file stream open has other issues too, as file streams are always buffered, from the FileStream level all the way down to the file system driver, for performance reasons. You'd have to flush the stream periodically to ensure the data was written to disk.
To really improve performance you'd have to batch the records, eg with a BatchBlock. Once you do that, the cost of opening the stream becomes negligible.
The lines should be generated at the last possible moment too, to avoid generating temporary strings that will need to be garbage collected. At n*1M records, the memory and CPU overhead of those allocations and garbage collections would be severe.
Logging libraries batch log entries before writing to avoid this performance hit.
You can try something like this:
var batchBlock = new BatchBlock<Record>(1000);
var writerBlock = new ActionBlock<Record[]>(records =>
{
    //Create or open a file for appending
    using var writer = new StreamWriter(ThePath, true);
    foreach (var record in records)
    {
        writer.WriteLine("{0} = {1} :{2}", record.Prop1, record.Prop5, record.Prop2);
    }
});
batchBlock.LinkTo(writerBlock, options);
or, using asynchronous methods
var batchBlock = new BatchBlock<Record>(1000);
var writerBlock = new ActionBlock<Record[]>(async records =>
{
    //Create or open a file for appending
    await using var writer = new StreamWriter(ThePath, true);
    foreach (var record in records)
    {
        // WriteLineAsync has no composite-format overload, so format the line first
        await writer.WriteLineAsync(string.Format("{0} = {1} :{2}", record.Prop1, record.Prop5, record.Prop2));
    }
});
batchBlock.LinkTo(writerBlock, options);
You can adjust the batch size and the StreamWriter's buffer size for optimum performance.
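Note that the options variable passed to LinkTo isn't defined in the snippets; presumably it is a DataflowLinkOptions instance with completion propagation enabled, something like:

    // presumed definition of the options used with LinkTo above
    var options = new DataflowLinkOptions { PropagateCompletion = true };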
Creating an actual "Block" that writes to a stream
A custom block can be created using the technique shown in the Custom Dataflow block walkthrough - instead of creating an actual custom block, create something that returns whatever is needed for LinkTo to work, in this case an ITargetBlock<T>:
ITargetBlock<Record> FileExporter(string path)
{
    var writer = new StreamWriter(path, true);
    var block = new ActionBlock<Record>(async record =>
    {
        await writer.WriteLineAsync(string.Format("{0} = {1} :{2}", record.Prop1, record.Prop5, record.Prop2));
    });
    //Close the stream when the block completes
    block.Completion.ContinueWith(_ => writer.Close());
    return block;
}
...
var exporter1 = FileExporter(path1);
previous.LinkTo(exporter1, options);
The "trick" here is that the stream is created outside the block and remains active until the block completes. It's not garbage-collected because it's used by other code. When the block completes, we need to explicitly close it, no matter what happened. block.Completion.ContinueWith(_=>write.Close()); will close the stream whether the block completed gracefully or not.
This is the same code used in the Walkthrough, to close the output BufferBlock:
target.Completion.ContinueWith(delegate
{
    if (queue.Count > 0 && queue.Count < windowSize)
        source.Post(queue.ToArray());
    source.Complete();
});
Streams are buffered by default, so calling WriteLine doesn't mean the data will actually be written to disk. This means we don't know when the data will actually be written to the file. If the application crashes, some data may be lost.
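If that matters in your scenario, the writer can be flushed explicitly, for example:

    writer.Flush();          // flush the buffered lines, e.g. after each batch
    // or
    writer.AutoFlush = true; // flush after every write (noticeably slower)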
Memory, IO and overheads
When working with 1M rows over a significant period of time, things add up. One could use e.g. File.AppendAllLinesAsync to write batches of lines at once, but that would result in the allocation of 1M temporary strings. At each iteration, the runtime would have to use at least as much RAM for those temporary strings as the batch itself. RAM usage would start ballooning to hundreds of MBs, then GBs, before the GC fired, freezing the threads.
With 1M rows and lots of data it's hard to debug and track data in the pipeline. If something goes wrong, things can crash very quickly. Imagine for example 1M messages stuck in one block because one message got blocked.
It's important (for sanity and performance reasons) to keep individual components in the pipeline as simple as possible.
Often when using TPL Dataflow, I make custom classes so I can have private member variables and private methods used by the blocks in my pipeline. Instead of implementing ITargetBlock or ISourceBlock, I just keep whatever blocks I need inside my custom class and expose an ITargetBlock and/or an ISourceBlock as public properties, so that other classes can use the source and target blocks to link things together, as sketched below.
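A minimal sketch of that pattern, reusing the Record type and formatting from the snippets above (the class name, property name, and file handling are illustrative):

    // Requires the System.Threading.Tasks.Dataflow package
    public class RecordFileExporter
    {
        private readonly StreamWriter _writer;
        private readonly ActionBlock<Record> _block;

        public RecordFileExporter(string path)
        {
            _writer = new StreamWriter(path, true);
            _block = new ActionBlock<Record>(record =>
                _writer.WriteLine("{0} = {1} :{2}", record.Prop1, record.Prop5, record.Prop2));

            // Tear down the disposable when the block completes, gracefully or not
            _block.Completion.ContinueWith(_ => _writer.Dispose());
        }

        // Other classes link to this property without knowing about the StreamWriter
        public ITargetBlock<Record> Target => _block;
    }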
I have a requirement to fetch blob files from Azure storage, read through them, get the data, process it, and store it into a database. The amount of data fetched from each blob is high, around 40K records per file. There are 70 such files in a folder.
This is how I designed it:
I use Parallel.ForEach on the list of blob files with max parallelism 4.
In each iteration, I open a blob stream (OpenRead method), read through it and fill a datatable. When the datatable reaches 10,000 rows, I call SqlBulkCopy and insert the data into the database.
In one blob folder there are 70 files.
Parallel.ForEach {
    // Stream blob file
    // Create a datatable
    foreach item in file
    {
        AddToDatatable
        if (datatable > 5000)
        {
            BulkCopy to DB.
            Clear datatable
        }
    }
    // Dispose datatable
}
One observation: when I increase the parallel count, the time taken to process one file increases. Is it because I'm opening multiple blob streams in parallel? Also, more parallelism means more data is held in memory at a time.
I would like to know 2 things:
I would like to try a different design where I keep a single datatable and fill it from the parallel foreach. Then, when it reaches 10K records, I store it in the DB and clear it. I don't know how to implement this.
Whether there's a better approach to process the files faster.
Your current approach is quite logical. It is not optimal though, because each parallel workflow is composed of heterogeneous jobs that are not coordinated with the other workflows. For example, it is entirely possible that at a given moment all four parallel workflows are fetching data from Azure, at another moment all four are constructing datatables from raw data, and at another moment all four are waiting for a response from the database.
All these heterogeneous jobs have different characteristics. For example, the interaction with the database may not be parallelizable, and sending 4 concurrent SqlBulkCopy commands to the database may actually be slower than sending them one after the other. On the other hand, creating datatables in memory is probably highly parallelizable, and fetching data from Azure may benefit only slightly from parallelism (because the bottleneck could be the speed of your internet connection, not the speed of the Azure servers). It is quite certain though that you could achieve a performance boost of 2x-3x just by making sure that at any given moment all heterogeneous jobs are in progress. This is called task-parallelism, in contrast to the simpler data-parallelism (your current setup).
To achieve task-parallelism you need to create a pipeline, where the data are flowing from the one processing block to the next, until they reach the final block. In your case you probably need 3 blocks:
Download files from Azure and split them to raw records.
Parse the records and push the parsed data to datatables.
Send the datatables to the database for storage.
Sending single records from the first block to the second block may not be optimal, because the parallelism has overhead, and the more granular the workload the more overhead it creates. So ideally you would need to chunkify the workload, and batch the records to arrays before sending them to the next block. All this can be implemented with a great tool that is designed for exactly this kind of job, the TPL Dataflow library. It has blocks for transforming, batching, unbatching and whatnot. It is also very flexible and feature-rich regarding the options it offers. But since it has some learning curve, I have something more familiar to suggest as the infrastructure for the pipeline: the PLINQ library.
Any time you add the AsParallel operator to a query, a new processing block is started. To force the data to flow to the next block as fast as possible, the WithMergeOptions(ParallelMergeOptions.NotBuffered) operator is needed. For controlling the degree of parallelism there is the WithDegreeOfParallelism, and to keep the elements in their original order there is the AsOrdered. Let's combine all these in a single extension method for convenience, to avoid repeating them over and over again:
public static ParallelQuery<TSource> BeginPipelineBlock<TSource>(
    this IEnumerable<TSource> source, int degreeOfParallelism)
{
    return Partitioner
        .Create(source, EnumerablePartitionerOptions.NoBuffering)
        .AsParallel()
        .AsOrdered()
        .WithDegreeOfParallelism(degreeOfParallelism)
        .WithMergeOptions(ParallelMergeOptions.NotBuffered);
}
The Partitioner is configured with NoBuffering to ensure that PLINQ enumerates the source in its natural order, one item at a time. Without it, PLINQ utilizes some fancy partitioning strategies that are not suitable for this usage.
Now your pipeline can be constructed fluently like this:
files
    .BeginPipelineBlock(degreeOfParallelism: 2)
    .SelectMany(file => DownloadFileRecords(file))
    .Chunk(1000)
    .BeginPipelineBlock(degreeOfParallelism: 3)
    .Select(batch => CreateDataTable(batch))
    .BeginPipelineBlock(degreeOfParallelism: 1)
    .ForAll(dataTable => SaveDataTable(dataTable));
Chunk is a LINQ operator that splits the elements of a sequence into chunks:
public static IEnumerable<TSource[]> Chunk<TSource>(
    this IEnumerable<TSource> source, int size);
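On runtimes older than .NET 6, where Enumerable.Chunk isn't available, a simple implementation along these lines should work:

    public static IEnumerable<TSource[]> Chunk<TSource>(
        this IEnumerable<TSource> source, int size)
    {
        var buffer = new List<TSource>(size);
        foreach (TSource item in source)
        {
            buffer.Add(item);
            if (buffer.Count == size)
            {
                yield return buffer.ToArray();
                buffer.Clear();
            }
        }
        if (buffer.Count > 0) yield return buffer.ToArray();
    }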
Important: If you use the above technique to build the pipeline, you should avoid configuring two consecutive blocks with degreeOfParallelism: 1. This is because of how PLINQ works. This library does not depend only on background threads, but it also uses the current thread as a worker thread. So if two (or more) consecutive pipeline blocks are configured with degreeOfParallelism: 1, they will all attempt to execute their workload in the current thread, blocking each other, and defeating the whole purpose of task-parallelism.
This shows that this library is not intended to be used as a pipeline infrastructure, and using it as such imposes some limitations. So if it makes sense for your pipeline to have consecutive blocks with degreeOfParallelism: 1, PLINQ is not a viable option, and you should look for alternatives, like the aforementioned TPL Dataflow library.
Update: It is actually possible to link consecutive blocks having degreeOfParallelism: 1, without squeezing them into a single thread, by offloading the enumeration of the source to another thread. This way each block will run on a different thread. Below is an implementation of an OffloadEnumeration method, based on a BlockingCollection<T>:
/// <summary>
/// Offloads the enumeration of the source sequence to the ThreadPool.
/// </summary>
public static IEnumerable<T> OffloadEnumeration<T>(
    this IEnumerable<T> source, int boundedCapacity = 1)
{
    ArgumentNullException.ThrowIfNull(source);
    using BlockingCollection<T> buffer = new(boundedCapacity);
    Task reader = Task.Run(() =>
    {
        try { foreach (T item in source) buffer.Add(item); }
        catch (InvalidOperationException) when (buffer.IsAddingCompleted) { } // Ignore
        finally { buffer.CompleteAdding(); }
    });
    try
    {
        foreach (T item in buffer.GetConsumingEnumerable()) yield return item;
        reader.GetAwaiter().GetResult(); // Propagate possible source error
    }
    finally
    {
        // Prevent fire-and-forget
        if (!reader.IsCompleted) { buffer.CompleteAdding(); Task.WaitAny(reader); }
    }
}
This method should be invoked at the beginning of each block:
public static ParallelQuery<TSource> BeginPipelineBlock<TSource>(
    this IEnumerable<TSource> source, int degreeOfParallelism)
{
    source = OffloadEnumeration(source);
    return Partitioner
        .Create(source, EnumerablePartitionerOptions.NoBuffering)
        .AsParallel()
        .AsOrdered()
        .WithDegreeOfParallelism(degreeOfParallelism)
        .WithMergeOptions(ParallelMergeOptions.NotBuffered);
}
This is really only useful when the previous block has degreeOfParallelism: 1, but calling it always shouldn't add much overhead (assuming that the workload of each block is fairly chunky).
Note: Another drawback of using the PLINQ library as a processing pipeline is that its backpressure policy is not configurable. This means that if the consumer of the ParallelQuery<T> enumerates it very slowly, the pipeline will keep fetching data from the source IEnumerable<T> and processing it in parallel, until its internal buffer reaches an arbitrary size. This threshold is experimentally found to be somewhere between 1,000 and 50,000 elements, depending on whether the AsOrdered operator is present. I can't see any public API that allows configuring the size of this buffer, which can be a big problem if each piece of data occupies a large chunk of memory.
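For completeness, here is roughly what the three blocks described at the beginning of this answer could look like with the TPL Dataflow library. The block types are real, but the batch size, the degrees of parallelism and the Record type are illustrative, and DownloadFileRecords/CreateDataTable/SaveDataTable are the same hypothetical helpers used in the PLINQ example above:

    // Requires the System.Threading.Tasks.Dataflow package
    var downloadBlock = new TransformManyBlock<string, Record>(
        file => DownloadFileRecords(file),
        new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 2 });

    var batchBlock = new BatchBlock<Record>(1000);

    var createTableBlock = new TransformBlock<Record[], DataTable>(
        batch => CreateDataTable(batch),
        new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 3 });

    var saveBlock = new ActionBlock<DataTable>(
        dataTable => SaveDataTable(dataTable)); // MaxDegreeOfParallelism defaults to 1

    var linkOptions = new DataflowLinkOptions { PropagateCompletion = true };
    downloadBlock.LinkTo(batchBlock, linkOptions);
    batchBlock.LinkTo(createTableBlock, linkOptions);
    createTableBlock.LinkTo(saveBlock, linkOptions);

    foreach (var file in files) downloadBlock.Post(file);
    downloadBlock.Complete();
    await saveBlock.Completion;

Unlike PLINQ, Dataflow also lets you bound each block's input queue (via ExecutionDataflowBlockOptions.BoundedCapacity), which addresses the backpressure concern mentioned above.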
I have written the following piece of code:
public void BulkUpdateItems(List<Items> items)
{
    var bulk = new BulkOperations();
    using (var trans = new TransactionScope())
    {
        using (SqlConnection conn = new SqlConnection(@"connstring"))
        {
            bulk.Setup()
                .ForCollection(items)
                .WithTable("Items")
                .AddColumn(x => x.QuantitySold)
                .BulkUpdate()
                .MatchTargetOn(x => x.ItemID)
                .Commit(conn);
        }
        trans.Complete();
    }
}
I'm using the SqlBulkTools library... The problem is that when I run this procedure from multiple threads at a time, I run into deadlocks...
The error states that a certain process ID was deadlocked, or something like that...
Is there an alternative way to perform a bulk update of one table from multiple threads efficiently?
Can someone help me out?
I don't know much about that API but a quick read suggests a few things you could try. I would try them in the order listed.
Use a smaller batch size, and/or set the batch timeout higher. This will let each thread take turns.
Use a temporary table. This will allow the threads to work independently.
Set the options to use a table lock. If you lock the whole table, different threads won't be able to lock different rows, so you shouldn't get any deadlocks.
The deadlock message is coming from SQL Server - it means that one of your connections is waiting on a resource locked by another, and that second connection is waiting on a resource held by the first.
If you are trying to update the same table, you are likely running into a simple SQL locking issue that has nothing really to do with C#. You need to think more thoroughly about the implications of doing a bulk update on multiple threads; it's probably (depending on the percentage of the table you are updating) better to do this on a single connection and use a queue-style mechanism to de-conflict the individual calls, as sketched below.
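A minimal sketch of serializing the calls, reusing the BulkUpdateItems method from the question (a SemaphoreSlim gate is just one simple way to do this; a dedicated worker draining a queue of pending batches is the fuller version of the idea):

    // One gate shared by all callers, so only one bulk update runs at a time.
    private static readonly SemaphoreSlim BulkUpdateGate = new SemaphoreSlim(1, 1);

    public void BulkUpdateItemsSerialized(List<Items> items)
    {
        BulkUpdateGate.Wait();
        try
        {
            BulkUpdateItems(items); // the original method from the question
        }
        finally
        {
            BulkUpdateGate.Release();
        }
    }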
Try
private static readonly object _syncLock = new object(); // any shared object will do

lock (_syncLock)
{
    // ....
}
What this does is: while one thread is executing the code inside the curly braces, other threads will wait until the first one finishes. That way, only one thread executes the block at a time (note that lock only serializes threads within a single process).
I have a job which does the following 2 tasks:
Read up to 300 unique customerIds from the database into a List.
Then call a stored procedure for each customerId, which executes queries inside the SP, creates an XML (up to 10 KB) and stores the XML in a database table.
So, in this case there should be 300 records in the table.
On average, the SP takes around 3 seconds to process each customer through its XML creation, so it takes a total of 15 minutes to process all 300 customers. The problem is that in the future it may be even more time-consuming.
I don't want to go with the bulk-insert option by moving the XML-creation logic into the application. With bulk-insert, I won't be able to know which customerId's data was the problem if XML creation failed. So I want to call the SP per customer.
To process all customers in parallel, I created 4 dedicated threads, each processing a collection of unique customerIds; all 4 threads together processed all 300 customers in 5 minutes, which is what I was expecting.
However, I want to use the ThreadPool rather than creating my own threads.
I want to have 2 types of threads here: one type to process and create XML for each customer, and another to work on the customers whose XML has already been created. This second type of thread will call an SP that updates a flag on the customer table based on the customer's XML being available.
So what is the best way to process 300 customers quickly in parallel, while also updating the customer table in parallel or on a separate thread?
Are dedicated threads still a good option here, or Parallel.ForEach, or await Task.WhenAll?
I know Parallel.ForEach will block the calling thread, which I want to use for updating the customer table.
You have to choose among several options for implementation. First of all, choose the scheme you are using. You can implement your algorithm in coroutine fashion: whenever the thread needs data that takes a long time to prepare, it yields execution with the await construct.
// It can be run inside the `async void` event handler from your UI.
// As it is async, the UI thread wouldn't be blocked
async Task SaveAll()
{
    for (int i = 0; i < 100; ++i)
    {
        // somehow get a started task for saving the (i) customer on this thread
        await SaveAsync(i);
    }
}

// This method is our first coroutine, which first starts fetching the data
// and after that saves the result in the database
async Task SaveAsync(int customerId)
{
    // at this point we yield the work to some other method to be run,
    // as at this moment we do not do anything else
    var customerData = await FetchCustomer(customerId);

    // at this moment we start saving the data asynchronously
    // and yield the execution another time
    var result = await SaveCustomer(customerData);

    // at this line we can update the UI with the result
}
FetchCustomer and SaveCustomer can use the TPL (they can be replaced with anonymous methods, but I don't like this approach). Task.Run will execute the code on the default ThreadPool, so the UI thread wouldn't be blocked (more about this method in Stephen Cleary's blog):
async Task<CustomerData> FetchCustomer(int customerId)
{
    return await Task.Run(() => DataRepository.GetCustomerById(customerId));
}

// ? here is a placeholder for your result type
async Task<?> SaveCustomer(CustomerData customer)
{
    return await Task.Run(() => DataRepository.SaveCustomer(customer));
}
I also suggest you examine these articles from that blog:
StartNew is Dangerous
Async and Await
Don't Block on Async Code
Async/Await - Best Practices in Asynchronous Programming
Another option is to use the TPL Dataflow extension, very similar to this answer:
Nesting await in Parallel foreach
I suggest you examine the contents of the linked post and decide for yourself which approach you will implement.
I would try to solve your task completely within SQL. This will greatly reduce server roundtrips.
I think you can create an enumeration of tasks, each one dealing with one record, and then call Task.WhenAll() to run them all, as sketched below.
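A minimal sketch of that idea, assuming a hypothetical ProcessCustomerAsync method that wraps the stored-procedure call for one customerId:

    async Task ProcessAllCustomersAsync(IEnumerable<int> customerIds)
    {
        // Start one task per customer; each task calls the SP for a single customerId.
        IEnumerable<Task> tasks = customerIds.Select(id => ProcessCustomerAsync(id));

        // Wait for all of them to complete.
        await Task.WhenAll(tasks);
    }

With ~300 stored-procedure calls you may want to throttle how many run concurrently (for example with a SemaphoreSlim), since the database may not cope well with 300 simultaneous calls.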
I am working in .NET 4.0 and my code is supposed to do this:
I have a Web API exposed to users. In it I have a collection of objects, basically a ConcurrentBag containing some objects. I have to iterate over each object in this collection and then insert/update its data in the database. The count of objects can be high (200-300). On top of that, there can be multiple concurrent users using my API.
The insertion/update is very slow because a connection is made to the database for each record, which makes this process very slow. Unfortunately I can't change the logic for this.
To improve performance I am using Parallel.ForEach instead of a routine foreach, as each iteration is independent. Also, I am creating a separate task for each insertion into the db.
Here is my code:
var tasks = new List<Task>(allRecordings.Count); // Creating a task list
Parallel.ForEach(allRecordings, recording =>
{
    var recordingItem = recording;
    // Lines of code
    //
    if (/* some conditions */)
    {
        var task = Task.Factory.StartNew(
            () => SaveRecordingDetailsToDb(ref recordingItem, device.Locale));
        recording.Title = recordingItem.Title;
        recording.ProgramId = recordingItem.ProgramId;
        recording.SeriesId = recordingItem.SeriesId;
        tasks.Add(task); // Adding task to list
    }
});
Task.WaitAll(tasks.ToArray()); // Waiting for all tasks to complete before going back to the main function
Can a memory leak occur in the above block when there are multiple concurrent requests consuming this same API?
Also, will using Parallel.ForEach be better here than a normal foreach?
The TPL (Task Parallel Library) was designed specifically for compute-bound operations, i.e. operations which can be completed in parallel (like calculations on different CPU cores). In your case, you write into the database, so basically you write to a file system, i.e. this is an IO operation. IO operations cannot be executed in parallel in the sense of pure parallelism. If you run several IO operations simultaneously, they will simply interrupt each other and thus take more time to complete compared with running them one by one. Of course, the database server should somehow handle this situation, but it will not be significantly faster than sending requests to it one by one; more likely, it will be even slower.
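Under that reasoning, a one-by-one asynchronous version would look something like the sketch below. It assumes a hypothetical SaveRecordingDetailsToDbAsync counterpart of the question's save method, and it assumes you can use async/await at all (on .NET 4.0 this requires the Microsoft.Bcl.Async package); whether it actually beats the parallel version depends on the database and the connection pool:

    async Task SaveAllRecordingsAsync(IEnumerable<Recording> allRecordings, string locale)
    {
        foreach (var recording in allRecordings)
        {
            // One request at a time; the calling thread is released while the DB call is in flight.
            await SaveRecordingDetailsToDbAsync(recording, locale);
        }
    }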