Limit number of Threads without waiting for them [duplicate] - c#

I have a foreach loop that sends data to a gRPC API. I want the loop to send multiple requests at the same time, but limit the number of concurrent requests to e.g. 10. My current code is the following:
foreach (var element in elements)
{
    var x = new Thread(() =>
        SendOverGrpc(element));
    x.Start();
}
But with that code, the software "immediately" sends all requests. How can I limit the number of requests to e.g. 10? As soon as one of my 10 requests is finished, I want to send the next one.

The easiest way is Parallel.ForEach, e.g.:
Parallel.ForEach(elements, new ParallelOptions() { MaxDegreeOfParallelism = 10 }, element =>
{
    SendOverGrpc(element);
});
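A minimal, self-contained sketch of the same idea follows. It assumes a hypothetical stand-in for SendOverGrpc that only sleeps and records how many calls overlap, so the limit can be observed. Note that Parallel.ForEach blocks the thread that calls it until every element is processed; wrapping it in Task.Run, as below, is one way to keep the caller free.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledSend
{
    static readonly object Gate = new object();
    static int _running, _peak;

    // Hypothetical stand-in for SendOverGrpc: sleeps and tracks overlap.
    static void SendOverGrpc(int element)
    {
        lock (Gate)
        {
            _running++;
            if (_running > _peak) _peak = _running;
        }
        Thread.Sleep(20); // simulate the network call
        lock (Gate) { _running--; }
    }

    // Sends `count` elements with at most `limit` in flight; returns the peak overlap.
    public static int Run(int count, int limit)
    {
        lock (Gate) { _running = 0; _peak = 0; }
        var options = new ParallelOptions { MaxDegreeOfParallelism = limit };

        // Task.Run moves the blocking Parallel.ForEach off the calling thread.
        Task work = Task.Run(() =>
            Parallel.ForEach(Enumerable.Range(0, count), options, i => SendOverGrpc(i)));

        work.Wait(); // only here so the demo can report the peak
        lock (Gate) { return _peak; }
    }
}
```

Calling ThrottledSend.Run(40, 10) should report a peak of at most 10. In real code you would keep the Task and await it later instead of calling Wait().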

Related

How to avoid Throughput exceed error in dynamodb [duplicate]

I am trying to run multiple delete operations on a DynamoDB table. Because DynamoDB limits a batch to 25 items, I cannot delete more than 25 items per batch. I have a list of deleteWriteOperation batches (25 items each) and I am trying to run the batches in parallel. Any suggestions on how to avoid this error, or how to add a delay so that DynamoDB can autoscale while the tasks wait?
Here is my code:
// batches is a list of lists, each holding one batch of 25 DeleteWriteOperation items
var opts = new ParallelOptions { MaxDegreeOfParallelism = Convert.ToInt32(Math.Ceiling((Environment.ProcessorCount * 0.75) * 1.0)) }; // limiting number of concurrent threads
try
{
    Parallel.ForEach(
        batches,
        opts,
        async batch =>
        {
            await processDelete(batch, clientId);
        });
}
catch (Exception ex)
{
    _logger.LogDebug(ex);
}
Here is the error that I received using the above code:
Amazon.DynamoDBv2.AmazonDynamoDBException: 'Throughput exceeds the current capacity for one or more global secondary indexes. DynamoDB is automatically scaling your index so please try again shortly.'
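One thing worth noting about the code above: Parallel.ForEach does not await async lambdas, so MaxDegreeOfParallelism does not actually limit how many deletes are in flight. A possible alternative, sketched below, throttles the async calls with a SemaphoreSlim and retries a failed batch after an exponentially growing delay. The helper name RunAsync, its parameters, and the 100 ms base delay are all assumptions for illustration. The work delegate stands in for processDelete, and real code should catch only the throttling exception rather than Exception.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledBatches
{
    // Runs `work` for every item with at most `maxConcurrency` in flight,
    // retrying a failed item up to `maxAttempts` times with exponential backoff.
    public static async Task RunAsync<T>(
        IEnumerable<T> items, int maxConcurrency, int maxAttempts, Func<T, Task> work)
    {
        using var gate = new SemaphoreSlim(maxConcurrency);

        async Task RunOneAsync(T item)
        {
            await gate.WaitAsync();
            try
            {
                for (int attempt = 1; ; attempt++)
                {
                    try
                    {
                        await work(item);
                        return;
                    }
                    // Assumption: real code would filter on the throttling exception type.
                    catch (Exception) when (attempt < maxAttempts)
                    {
                        // Wait 100 ms, 200 ms, 400 ms, ... before the next attempt.
                        double delayMs = 100 * Math.Pow(2, attempt - 1);
                        await Task.Delay(TimeSpan.FromMilliseconds(delayMs));
                    }
                }
            }
            finally
            {
                gate.Release();
            }
        }

        await Task.WhenAll(items.Select(item => RunOneAsync(item)));
    }
}
```

A call might look like await ThrottledBatches.RunAsync(batches, 4, 5, batch => processDelete(batch, clientId)). The slot is deliberately held during the backoff delay, so a throttled batch also reduces the overall request rate while the table scales.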

C# Multithreading and pooling

Hello fellow developers,
I have a question about implementing multi-threading on my .NET (Framework 4.0) Windows Service.
Basically, what the service should be doing is the following:
Scans the filesystem (a specific directory) to see if there are files to process
If there are files that need to be processed, it should be using a thread pooling mechanism to issue threads up to a predetermined amount.
Each thread will perform an upload operation of a single file
As soon as one thread completes, the filesystem is scanned again to see if there are other files to process (I want to avoid having two threads perform the operation on the same file)
I am struggling to find a way that will allow me to do just that last step.
Right now, I have a function that runs in the main thread and retrieves the maximum number of concurrent threads:
int maximumNumberOfConcurrentThreads = getMaxThreads(databaseConnection);
Then, still in the main thread, I have a function that scans the directory and returns a list with the files to process
List<FileToUploadInfo> filesToUpload = getFilesToUploadFromFS(directory);
After this, I call the following function:
generateThreads(maximumNumberOfConcurrentThreads, filesToUpload);
Each thread should be calling the below function (returns void):
uploadFile(fileToUpload, databaseConnection, currentThread);
Right now, the way the program is structured, if maximum number of threads is set, say, to 5, I am grabbing 5 elements from the list and uploading them.
As soon as all 5 are done, I grab 5 more and do the same until I don't have any left, as per code below.
for (int index = 0; index < filesToUpload.Count; index = index + maximumNumberOfConcurrentThreads) {
    try {
        Parallel.For(0, maximumNumberOfConcurrentThreads, iteration => {
            if (index + iteration < filesToUpload.Count) {
                uploadFile(filesToUpload[index + iteration], databaseConnection, iteration);
            }
        });
    }
    catch (System.ArgumentOutOfRangeException outOfRange) {
        debug("Exception in Parallel.For [" + outOfRange.Message + "]");
    }
}
However, if 4 files are small and the upload of each one takes 5 seconds, while the remaining one is big and takes 30 minutes, I will have, after the 4 files have been uploaded, only one file uploading, and I need to wait for it to finish before starting to upload others in the list.
After finishing uploading all the files in the list, my service goes to sleep, and then, when it wakes up again, it scans the file system again.
What is the strategy that best fits my needs? Is it advisable to go this route or will it create concurrency nightmares? I need to avoid uploading any file twice.
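One possible approach, sketched here under assumptions, is to put the scanned files into a shared ConcurrentQueue and start a fixed number of workers that each pull the next file as soon as they finish the previous one. A slow upload then occupies only one worker, and because TryDequeue hands each entry to exactly one worker, no file is uploaded twice within a scan. The class and method names are hypothetical, file names are plain strings instead of FileToUploadInfo, and the uploadFile delegate stands in for the real upload function. The calls used here exist in .NET Framework 4.0.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class UploadWorkers
{
    // Starts `maxThreads` workers that drain a shared queue of files.
    public static void UploadAll(
        IEnumerable<string> files, int maxThreads, Action<string, int> uploadFile)
    {
        var queue = new ConcurrentQueue<string>(files);
        var workers = new Task[maxThreads];

        for (int i = 0; i < maxThreads; i++)
        {
            int workerId = i; // copy the loop variable for the closure
            workers[i] = Task.Factory.StartNew(() =>
            {
                string file;
                // Each successful TryDequeue gives this worker exclusive ownership of one file.
                while (queue.TryDequeue(out file))
                {
                    uploadFile(file, workerId);
                }
            }, TaskCreationOptions.LongRunning);
        }

        Task.WaitAll(workers); // returns once the queue is empty and all uploads finished
    }
}
```

After UploadAll returns, the service can sleep and rescan the directory as before. Rescanning while uploads are still running would need extra bookkeeping (for example a set of in-progress file names), which this sketch does not cover.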

Why is this eating memory?

I wrote an application whose purpose is to read logs from a large table (90 million) and process them into easily understandable stats, how many, how long etc.
The first run took 7.5 hours and only had to process 27 of the 90 million. I would like to speed this up. So I am trying to run the queries in parallel. But when I run the below code, within a couple minutes I crash with an Out of Memory exception.
Environments:
Sync
Test : 26 Applications, 15 million logs, 5 million retrieved, < 20mb, takes 20 seconds
Production: 56 Applications, 90 million logs, 27 million retrieved, < 30mb, takes 7.5 hours
Async
Test : 26 Applications, 15 million logs, 5 million retrieved, < 20mb, takes 3 seconds
Production: 56 Applications, 90 million logs, 27 million retrieved, Memory Exception
public void Run()
{
    List<Application> apps;
    //Query for apps
    using (var ctx = new MyContext())
    {
        apps = ctx.Applications.Where(x => x.Type == "TypeIWant").ToList();
    }
    var tasks = new Task[apps.Count];
    for (int i = 0; i < apps.Count; i++)
    {
        var app = apps[i];
        tasks[i] = Task.Run(() => Process(app));
    }
    //try catch
    Task.WaitAll(tasks);
}
public void Process(Application app)
{
    //Query for logs for time period
    using (var ctx = new MyContext())
    {
        var logs = ctx.Logs.Where(l => l.Id == app.Id).AsNoTracking();
        foreach (var log in logs)
        {
            Interlocked.Increment(ref _totalLogsRead);
            var l = log;
            Task.Run(() => ProcessLog(l, app.Id));
        }
    }
}
Is it ill advised to create 56 contexts?
Do I need to dispose and re-create contexts after a certain number of logs retrieved?
Perhaps I'm misunderstanding how the IQueryable is working? <-- My Guess
My understanding is that it retrieves logs as needed. Does that mean the foreach loop behaves like a yield? Or is my issue that 56 'threads' call the database at once and I end up storing 27 million logs in memory?
Side question
The results don't really scale together. Based on the Test environment results, I would expect Production to take only a few minutes. I assume the increase is directly related to the number of records in the table.
With 27 million rows, the problem is one of stream processing, not parallel execution. You need to approach the problem as you would with SQL Server's SSIS or any other ETL tool: each processing step is a transformation that processes its input and sends its output to the next step.
Parallel processing is achieved by using a separate thread to run each step. Some steps could also use multiple threads to process multiple inputs, up to a limit. Setting limits on each step's thread count and input buffer ensures you can achieve maximum throughput without flooding your machine with waiting tasks.
.NET's TPL Dataflow addresses exactly this scenario. It provides blocks to transform inputs to outputs (TransformBlock), split collections into individual messages (TransformManyBlock), execute actions without transformations (ActionBlock), combine data in batches (BatchBlock), etc.
You can also specify the maximum degree of parallelism for each step so that, e.g., only 1 log query executes at a time while 10 tasks process logs.
In your case, you could:
1. Start with a TransformManyBlock that receives an application type and returns a list of app IDs.
2. A TransformBlock reads the logs for a specific ID and sends them downstream.
3. An ActionBlock processes the batch.
Step #3 could be broken into many other steps. E.g., if you don't need to process all app log entries together, you can use a step to process individual entries. Or you could first group them by date.
Another option is to create a custom block that reads data from the database using a DbDataReader and posts each entry to the next step immediately, instead of waiting for all rows to return. This would allow you to process each entry as it arrives.
If each app log contains many entries, this could be a huge memory and time saver.
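The pipeline described above could look roughly like the following sketch. It is not the answerer's code: GetAppIds and ReadLogs are hypothetical stand-ins that return in-memory data instead of querying the database, log entries are plain strings, and the capacity numbers are arbitrary. It uses a TransformManyBlock for the log-reading step so that entries flow downstream one by one. The types come from System.Threading.Tasks.Dataflow, which may need to be added as a NuGet package depending on the target framework.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class LogPipeline
{
    // Hypothetical stand-ins for the database queries in the question.
    static IEnumerable<int> GetAppIds(string appType) => Enumerable.Range(1, 5);
    static IEnumerable<string> ReadLogs(int appId) =>
        Enumerable.Range(1, 100).Select(i => "app " + appId + " log " + i);

    // Pushes one application type through the pipeline; returns the number of logs processed.
    public static async Task<int> RunAsync(string appType)
    {
        int processed = 0;
        var link = new DataflowLinkOptions { PropagateCompletion = true };

        // Step 1: application type -> individual app IDs.
        var getApps = new TransformManyBlock<string, int>(type => GetAppIds(type));

        // Step 2: app ID -> individual log entries, one query at a time.
        var readLogs = new TransformManyBlock<int, string>(
            id => ReadLogs(id),
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 1, BoundedCapacity = 2 });

        // Step 3: process entries with up to 10 concurrent tasks and a bounded input queue.
        var processLog = new ActionBlock<string>(
            log => { Interlocked.Increment(ref processed); },
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 10, BoundedCapacity = 1000 });

        getApps.LinkTo(readLogs, link);
        readLogs.LinkTo(processLog, link);

        await getApps.SendAsync(appType);
        getApps.Complete();            // no more input; completion flows down the links
        await processLog.Completion;   // wait until every entry has been processed
        return processed;
    }
}
```

Unlike the Task.Run call inside the question's foreach loop, the bounded ActionBlock refuses new entries while its queue is full, so work is not queued faster than it can be processed.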

Windows Phone, Multiple HTTP request parallel, how many?

In my Windows Phone 8 app, I'm fetching a list of items from a web API. After that, I loop over all items and get the details for each item.
Right now my code is something like this:
List<Plane> planes = await planeService.getPlanes(); // Get all planes from web api
foreach (Plane plane in planes)
{
    var details = await planeService.getDetails(plane.id); // Get one plane details from web api
    drawDetails(details);
}
How can I improve this to make multiple requests in parallel, and what is a reasonable number of requests to run in parallel? The planes list can be anything from 0 to 100 objects, typically max 20.
How can I improve this to make multiple request in parallel?
You can do the parallel processing like below (untested). It uses SemaphoreSlim to throttle getDetails requests.
async Task ProcessPlanes()
{
    const int MAX_REQUESTS = 50;
    List<Plane> planes = await planeService.getPlanes(); // Get all planes from web api
    var semaphore = new SemaphoreSlim(MAX_REQUESTS);
    Func<string, Task<Details>> getDetailsAsync = async (id) =>
    {
        await semaphore.WaitAsync();
        try
        {
            var details = await planeService.getDetails(id);
            drawDetails(details);
            return details;
        }
        finally
        {
            semaphore.Release();
        }
    };
    var tasks = planes.Select((plane) =>
        getDetailsAsync(plane.id));
    await Task.WhenAll(tasks.ToArray());
}
what is a reasonable number of requests to run in parallel? The planes list can be anything from 0 to 100 objects, typically max 20.
It largely depends on the server, and I don't think there's an ultimate answer to this. For example, check this question:
A reasonable number of simultaneous, asynchronous ajax requests
As far as the WP8 client goes, I believe it can spawn 100 parallel requests without a problem.
I don't know what the limit is for network connections, but there will be one.
If there wasn't, the only problem would be the amount of memory used to keep that many requests alive.
So, assuming the underlying operating system will handle throttling properly, I would do something like this:
List<Plane> planes = await planeService.getPlanes();
var allDetails = await Task.WhenAll(from plane in planes
                                    select planeService.getDetails(plane.id));
foreach (var details in allDetails)
{
    drawDetails(details);
}
NOTE: You should follow common naming conventions to help others understand your code. Asynchronous methods should be suffixed with Async and, in C#, method names are PascalCase by convention.
You should check ServicePoint, which provides connection management for HTTP connections. The default maximum number of concurrent connections allowed by a ServicePoint object is 2. If you need to increase it, set the ServicePointManager.DefaultConnectionLimit property to the value you need; the MSDN documentation for that property includes a sample.

Boosting performance on async web calls

Background: I must call a web service 1500 times, and each call takes roughly 1.3 seconds to complete. (I have no control over this 3rd party API.) Total time = 1500 * 1.3 = 1950 seconds / 60 = roughly 32 minutes.
I came up with what I thought was a good solution; however, it did not pan out that well.
I changed the calls to async web calls, thinking this would dramatically improve my results. It did not.
Example Code:
Pre-Optimizations:
foreach (var elmKeyDataElementNamed in findResponse.Keys)
{
    var getRequest = new ElementMasterGetRequest
    {
        Key = new elmFullKey
        {
            CmpCode = CodaServiceSettings.CompanyCode,
            Code = elmKeyDataElementNamed.Code,
            Level = filterLevel
        }
    };
    ElementMasterGetResponse getResponse;
    _elementMasterServiceClient.Get(new MasterOptions(), getRequest, out getResponse);
    elementList.Add(new CodaElement { Element = getResponse.Element, SearchCode = filterCode });
}
With Optimizations:
var tasks = findResponse.Keys.Select(elmKeyDataElementNamed => new ElementMasterGetRequest
{
    Key = new elmFullKey
    {
        CmpCode = CodaServiceSettings.CompanyCode,
        Code = elmKeyDataElementNamed.Code,
        Level = filterLevel
    }
}).Select(getRequest => _elementMasterServiceClient.GetAsync(new MasterOptions(), getRequest)).ToList();
Task.WaitAll(tasks.ToArray());
elementList.AddRange(tasks.Select(p => new CodaElement
{
    Element = p.Result.GetResponse.Element,
    SearchCode = filterCode
}));
Smaller Sampling Example:
To test this easily, I took a smaller sample of 40 records. It took 60 seconds with no optimizations and 50 seconds with them. I would have thought it would be closer to 30 seconds or better.
I used Wireshark to watch the transactions come through and realized the async version was not sending as fast as I assumed it would.
[Screenshot: async requests captured]
[Screenshot: normal requests, no optimization]
You can see that the async version pushes a few requests very fast and then drops off.
Also note that between requests 10 and 11 it took nearly 3 seconds.
Is the overhead of creating threads for the tasks so high that it takes seconds?
Note: The tasks I am referring to are the 4.5 TAP task library.
Why wouldn't the requests come faster than that?
I was told the Apache web server I was hitting could hold 200 max threads, so I don't see an issue there.
Am I not thinking about this clearly?
When calling web services, is there little advantage to async requests?
Do I have a code mistake?
Any ideas would be great.
After many days of searching I found this post that solved my problem:
Trying to run multiple HTTP requests in parallel, but being limited by Windows (registry)
The requests were not hitting the server more quickly because of my client-side code; it had nothing to do with the server. By default, .NET allows only 2 concurrent connections per host.
see here: http://msdn.microsoft.com/en-us/library/system.net.servicepointmanager.defaultconnectionlimit.aspx
I simply added this line of code and then all request came through in milliseconds.
System.Net.ServicePointManager.DefaultConnectionLimit = 50;
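To see why the connection limit matters so much, here is a rough back-of-the-envelope estimate. It assumes an idealized model in which calls run in full waves of at most maxConnections and every call takes the same time. The function name and the model are assumptions for illustration, not measurements.

```csharp
using System;

public static class ThroughputEstimate
{
    // Idealized wall-clock estimate: calls run in waves of `maxConnections`,
    // and each wave takes `secondsPerCall`.
    public static double EstimateSeconds(int calls, double secondsPerCall, int maxConnections)
    {
        int waves = (int)Math.Ceiling(calls / (double)maxConnections);
        return waves * secondsPerCall;
    }
}
```

Under this model, 1500 calls at 1.3 seconds each take roughly 1950 seconds with one connection, roughly 975 seconds with the default limit of 2, and roughly 39 seconds with a limit of 50. Real numbers will differ with server load and latency.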
