Azure Blob Storage DownloadToStreamAsync hangs during network change - c#

I've been having an issue with the Microsoft.WindowsAzure.Storage v9.3.3 and Microsoft.Azure.Storage.Blob v11.1.0 NuGet libraries, specifically when downloading a large file. If you change your network during the "DownloadToStreamAsync" method, the call hangs. My code, which processes a lot of files, hangs occasionally, and I've been trying to narrow it down. I think a network change might be a reliable way of triggering some failure in the Azure Blob Storage libraries.
More info about the issue:
When I unplug my network cable my computer switches to WiFi but the request never resumes
If I start the download on WiFi and then plug in my network cable the same error occurs
The “ServerTimeout” property never fails the request or behaves as described in the documentation
The “MaximumExecutionTime” property does fail the request, but we don’t want to limit ourselves to a fixed time period, especially because we’re dealing with large files
The following code fails 100% of the time if the network is changed during the call.
static void Main(string[] args)
{
try
{
CloudStorageAccount.TryParse("<Connection String>", out var storageAccount);
var cloudBlobClient = storageAccount.CreateCloudBlobClient();
var container = cloudBlobClient.GetContainerReference("<Container Reference>");
var blobRef = container.GetBlockBlobReference("Large Text.txt");
Stream memoryStream = new MemoryStream();
BlobRequestOptions optionsWithRetryPolicy = new BlobRequestOptions() { ServerTimeout = TimeSpan.FromSeconds(5), RetryPolicy = new LinearRetry(TimeSpan.FromSeconds(20), 4) };
blobRef.DownloadToStreamAsync(memoryStream, null, optionsWithRetryPolicy, null).GetAwaiter().GetResult();
Console.WriteLine("Completed");
}
catch (Exception ex)
{
Console.WriteLine($"Exception: {ex.Message}");
}
finally
{
Console.WriteLine("Finished");
}
}
I've found this open issue on the Azure Storage GitHub repository, but it seems inactive.
Is there any other approach I could take to reliably and efficiently download a blob or something I'm missing when using this package?

Thanks to Mohit for the suggestion.
Create a Task to check the stream length in the background
If the stream hasn't increased in a set period of time, cancel the DownloadToStreamAsync
DISCLAIMER: I haven't written tests around this code or worked out how to make it performant, since you couldn't have a wait like this for every file you process. I might need to cancel the background task once the download completes; I don't know yet, I just wanted to get it working first. I don't consider it production ready.
// Create download cancellation token
var downloadCancellationTokenSource = new CancellationTokenSource();
var downloadCancellationToken = downloadCancellationTokenSource.Token;
var completedChecking = false;
// A background task to confirm the download is still progressing
Task.Run(() =>
{
// Allow the download to start
Task.Delay(TimeSpan.FromSeconds(2)).GetAwaiter().GetResult();
long currentStreamLength = 0;
var currentRetryCount = 0;
var availableRetryCount = 5;
// Keep the checking going during the duration of the Download
while (!completedChecking)
{
Console.WriteLine("Checking");
if (currentRetryCount == availableRetryCount)
{
Console.WriteLine($"RETRY WAS {availableRetryCount} - FAILING TASK");
downloadCancellationTokenSource.Cancel();
completedChecking = true;
break; // stop checking once the download has been cancelled
}
if (currentStreamLength == memoryStream.Length)
{
currentRetryCount++;
Console.WriteLine($"Length has not increased. Incremented Count: {currentRetryCount}");
Task.Delay(TimeSpan.FromSeconds(10)).GetAwaiter().GetResult();
}
else
{
currentStreamLength = memoryStream.Length;
Console.WriteLine($"Download in progress: {currentStreamLength}");
currentRetryCount = 0;
Task.Delay(TimeSpan.FromSeconds(1)).GetAwaiter().GetResult();
}
}
});
Console.WriteLine("Starting Download");
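try
{
blobRef.DownloadToStreamAsync(memoryStream, downloadCancellationToken).GetAwaiter().GetResult();
Console.WriteLine("Completed Download");
}
finally
{
// Stop the background check whether the download completed, failed or was cancelled
completedChecking = true;
}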
Console.WriteLine("Completed");
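A variant of the same idea can be sketched without reading a MemoryStream's Length from another thread: wrap the target stream so every write records a timestamp, and let a watchdog cancel the operation when no bytes have arrived for a while. This is a minimal sketch, not tested against the Azure SDK. ProgressTrackingStream and StallWatchdog are hypothetical names, and it assumes the download writes through Write/WriteAsync and honours the CancellationToken it is given (if the hung call ignores cancellation, the final await will still hang).

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: forwards writes to an inner stream and records when the last write happened.
public sealed class ProgressTrackingStream : Stream
{
    private readonly Stream _inner;
    private long _lastWriteTicks = DateTime.UtcNow.Ticks;

    public ProgressTrackingStream(Stream inner) { _inner = inner; }

    // Safe to read from the watchdog while the download is writing.
    public DateTime LastWriteUtc => new DateTime(Interlocked.Read(ref _lastWriteTicks), DateTimeKind.Utc);

    private void Touch() => Interlocked.Exchange(ref _lastWriteTicks, DateTime.UtcNow.Ticks);

    public override void Write(byte[] buffer, int offset, int count)
    {
        _inner.Write(buffer, offset, count);
        Touch();
    }

    public override async Task WriteAsync(byte[] buffer, int offset, int count, CancellationToken cancellationToken)
    {
        await _inner.WriteAsync(buffer, offset, count, cancellationToken).ConfigureAwait(false);
        Touch();
    }

    public override bool CanRead => _inner.CanRead;
    public override bool CanSeek => _inner.CanSeek;
    public override bool CanWrite => _inner.CanWrite;
    public override long Length => _inner.Length;
    public override long Position { get => _inner.Position; set => _inner.Position = value; }
    public override void Flush() => _inner.Flush();
    public override int Read(byte[] buffer, int offset, int count) => _inner.Read(buffer, offset, count);
    public override long Seek(long offset, SeekOrigin origin) => _inner.Seek(offset, origin);
    public override void SetLength(long value) => _inner.SetLength(value);
}

// Hypothetical helper: runs an operation and cancels it if the target stream stops receiving data.
public static class StallWatchdog
{
    public static async Task RunAsync(
        Func<CancellationToken, Task> operation,
        ProgressTrackingStream target,
        TimeSpan stallTimeout,
        TimeSpan pollInterval)
    {
        using (var cts = new CancellationTokenSource())
        {
            Task work = operation(cts.Token);
            while (!work.IsCompleted)
            {
                // Wake up when the work finishes or when it is time to check progress again.
                await Task.WhenAny(work, Task.Delay(pollInterval)).ConfigureAwait(false);
                if (!work.IsCompleted && DateTime.UtcNow - target.LastWriteUtc > stallTimeout)
                {
                    cts.Cancel();
                    break;
                }
            }
            // Rethrows the operation's own exception, or OperationCanceledException after a stall.
            await work.ConfigureAwait(false);
        }
    }
}
```

With the blob client this would be called as something like `StallWatchdog.RunAsync(ct => blobRef.DownloadToStreamAsync(tracked, ct), tracked, TimeSpan.FromSeconds(50), TimeSpan.FromSeconds(1))`, where `tracked` wraps the MemoryStream. Because the stall clock starts when the wrapper is created, it also covers a download that never starts.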

Related

HttpClient.SendAsync processes two requests at a time when the limit is higher

I have a Windows service that reads data from the database and processes this data using multiple REST API calls.
Originally, this service ran on a timer where it would read unprocessed data from the database and process it using multiple threads limited using SemaphoreSlim. This worked well except that the database read had to wait for all processing to finish before reading again.
ServicePointManager.DefaultConnectionLimit = 10;
Original version that works:
// Runs every 5 seconds on a timer
private void ProcessTimer_Elapsed(object sender, ElapsedEventArgs e)
{
var hasLock = false;
try
{
Monitor.TryEnter(timerLock, ref hasLock);
if (hasLock)
{
ProcessNewData();
}
else
{
log.Info("Failed to acquire lock for timer."); // This happens all of the time
}
}
finally
{
if (hasLock)
{
Monitor.Exit(timerLock);
}
}
}
public void ProcessNewData()
{
var unprocessedItems = GetDatabaseItems();
if (unprocessedItems.Count > 0)
{
var downloadTasks = new Task[unprocessedItems.Count];
var maxThreads = new SemaphoreSlim(semaphoreSlimMinMax, semaphoreSlimMinMax); // semaphoreSlimMinMax = 10 is max threads
for (var i = 0; i < unprocessedItems.Count; i++)
{
maxThreads.Wait();
var iClosure = i;
downloadTasks[i] =
Task.Run(async () =>
{
try
{
await ProcessItemsAsync(unprocessedItems[iClosure]);
}
catch (Exception ex)
{
// handle exception
}
finally
{
maxThreads.Release();
}
});
}
Task.WaitAll(downloadTasks);
}
}
To improve efficiency, I rewrote the service to run GetDatabaseItems on a separate thread from the rest, so that a ConcurrentDictionary of unprocessed items sits between them: GetDatabaseItems fills it and ProcessNewData empties it.
The problem is that while 10 unprocessed items are sent to ProcessItemsAsync, they are processed two at a time instead of all 10 at once.
The code inside of ProcessItemsAsync calls var response = await client.SendAsync(request); where the delay occurs. All 10 threads make it to this code but come out of it two at a time. None of this code changed between the old version and the new.
Here is the code in the new version that did change:
public void Start()
{
ServicePointManager.DefaultConnectionLimit = maxSimultaneousThreads; // 10
// Start getting unprocessed data
getUnprocessedDataTimer.Interval = getUnprocessedDataInterval; // 5 seconds
getUnprocessedDataTimer.Elapsed += GetUnprocessedData; // writes data into a ConcurrentDictionary
getUnprocessedDataTimer.Start();
cancellationTokenSource = new CancellationTokenSource();
// Create a new thread to process data
Task.Factory.StartNew(() =>
{
try
{
ProcessNewData(cancellationTokenSource.Token);
}
catch (Exception ex)
{
// error handling
}
}, TaskCreationOptions.LongRunning
);
}
private void ProcessNewData(CancellationToken token)
{
// Check if task has been canceled.
while (!token.IsCancellationRequested)
{
if (unprocessedDictionary.Count > 0)
{
try
{
var throttler = new SemaphoreSlim(maxSimultaneousThreads, maxSimultaneousThreads); // maxSimultaneousThreads = 10
var tasks = unprocessedDictionary.Select(async item =>
{
await throttler.WaitAsync(token);
try
{
if (unprocessedDictionary.TryRemove(item.Key, out var value))
{
await ProcessItemsAsync(value);
}
}
catch (Exception ex)
{
// handle error
}
finally
{
throttler.Release();
}
});
Task.WhenAll(tasks).GetAwaiter().GetResult(); // wait for this batch before polling the dictionary again
}
catch (OperationCanceledException)
{
break;
}
}
Thread.Sleep(1000);
}
}
Environment
.NET Framework 4.7.1
Windows Server 2016
Visual Studio 2019
Attempts to fix:
I tried the following with the same bad result (two await client.SendAsync(request) completing at a time):
Set Max threads and ServicePointManager.DefaultConnectionLimit to 30
Manually create threads using Thread.Start()
Replace async/await pattern with sync HttpClient calls
Call data processing using Task.Run(async () => and Task.WaitAll(downloadTasks);
Replace the new long-running thread for ProcessNewData with a timer
What I want is to run GetUnprocessedData and ProcessNewData concurrently with an HttpClient connection limit of 10 (set in config) so that 10 requests are processed at the same time.
Note: the issue is similar to "HttpClient.GetAsync executes only 2 requests at a time?", but here DefaultConnectionLimit has been increased and the service runs on Windows Server. The original code also opens more than 2 connections.
Update
I went back to the original project to make sure it still worked; it did. I added a new timer to perform some unrelated operations, and the HttpClient issue came back. I removed the timer and everything worked. I added a new thread to do parallel processing, and the problem came back.
This is not a direct answer to your question, but a suggestion for simplifying your service that could make debugging any problem easier. My suggestion is to implement the producer-consumer pattern using an iterator for producing the unprocessed items, and a parallel loop for consuming them. Ideally the parallel loop would have async delegates, but since you are targeting the .NET Framework you don't have access to the .NET 6 method Parallel.ForEachAsync. So I will suggest the slightly wasteful approach of using a synchronous parallel loop that blocks threads. You could use either the Parallel.ForEach method or PLINQ, as in the example below:
private IEnumerable<Item> Iterator(CancellationToken token)
{
while (true)
{
Task delayTask = Task.Delay(5000, token);
foreach (Item item in GetDatabaseItems()) yield return item;
delayTask.GetAwaiter().GetResult();
}
}
public void Start()
{
//...
ThreadPool.SetMinThreads(degreeOfParallelism, Environment.ProcessorCount);
new Thread(() =>
{
try
{
Partitioner
.Create(Iterator(token), EnumerablePartitionerOptions.NoBuffering)
.AsParallel()
.WithDegreeOfParallelism(degreeOfParallelism)
.WithCancellation(token)
.ForAll(item => ProcessItemAsync(item).GetAwaiter().GetResult());
}
catch (OperationCanceledException) { } // Ignore
}).Start();
}
The Iterator fetches unprocessed items from the database in batches, and yields them one by one. The database won't be hit more frequently than once every 5 seconds.
The PLINQ query is going to fetch a new item from the Iterator each time it has a worker available, according to the WithDegreeOfParallelism policy. The setting EnumerablePartitionerOptions.NoBuffering ensures that it won't try to fetch more items in advance.
ThreadPool.SetMinThreads is used to boost the availability of ThreadPool threads, since PLINQ is going to use lots of them. Without it, the ThreadPool will not be able to satisfy the demand immediately, although it will gradually inject more threads and eventually catch up. But since you already know how many threads you'll need, you can configure the ThreadPool from the start.
In case you dislike the idea of blocking threads, you can find a simple substitute for Parallel.ForEachAsync here, based on the TPL Dataflow library. It requires installing the System.Threading.Tasks.Dataflow NuGet package.
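If adding a package is not an option, a throttled asynchronous loop can also be sketched with SemaphoreSlim alone. This is a swapped-in technique, not the Dataflow-based substitute mentioned above, and ForEachThrottledAsync is a hypothetical name. Unlike Parallel.ForEachAsync, it does not stop launching new items when one item fails; failures surface when the loop ends.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncLoops
{
    // Hypothetical helper: runs `body` for each item with at most `maxParallel` bodies in flight.
    // The next item is pulled from `source` only after a slot frees up, so a lazy iterator
    // (like the Iterator above) is not read ahead of the workers.
    public static async Task ForEachThrottledAsync<T>(
        IEnumerable<T> source,
        int maxParallel,
        Func<T, CancellationToken, Task> body,
        CancellationToken token)
    {
        var throttler = new SemaphoreSlim(maxParallel, maxParallel);
        var running = new List<Task>();

        using (IEnumerator<T> items = source.GetEnumerator())
        {
            while (true)
            {
                // Throws OperationCanceledException on cancellation; in-flight bodies are not awaited then.
                await throttler.WaitAsync(token).ConfigureAwait(false);
                if (!items.MoveNext())
                {
                    throttler.Release();
                    break;
                }
                running.Add(RunOneAsync(items.Current));
                // Drop bodies that already succeeded so an endless source doesn't grow the list.
                running.RemoveAll(t => t.Status == TaskStatus.RanToCompletion);
            }
        }

        // Surfaces exceptions from any body that failed.
        await Task.WhenAll(running).ConfigureAwait(false);

        async Task RunOneAsync(T item)
        {
            try
            {
                await body(item, token).ConfigureAwait(false);
            }
            finally
            {
                throttler.Release();
            }
        }
    }
}
```

In the service above this would replace the PLINQ query with something like `AsyncLoops.ForEachThrottledAsync(Iterator(token), degreeOfParallelism, (item, ct) => ProcessItemAsync(item), token)`, so no thread is blocked while a request is in flight.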
The issue turned out to be the place where ServicePointManager.DefaultConnectionLimit is set.
In the version where HttpClient was only doing two requests at a time, ServicePointManager.DefaultConnectionLimit was being set before the threads were being created but after the HttpClient was initialized.
Once I moved it into the constructor before the HttpClient is initialized, everything started working.
Thank you very much to @Theodor Zoulias for the help.
TL;DR: Set ServicePointManager.DefaultConnectionLimit before initializing the HttpClient.
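A small helper can keep that ordering in one place. This is a minimal sketch for .NET Framework, where HttpClient's default handler goes through ServicePointManager. HttpClientSetup and CreateClient are hypothetical names, and raising the limit on the host's ServicePoint is an extra safeguard, not something the fix above needed. On .NET Core 2.1 and later, HttpClient uses SocketsHttpHandler and ignores ServicePointManager; the corresponding setting there is SocketsHttpHandler.MaxConnectionsPerServer.

```csharp
using System;
using System.Net;
using System.Net.Http;

public static class HttpClientSetup
{
    // Hypothetical helper: raise the connection limit first, then create the client.
    public static HttpClient CreateClient(Uri baseAddress, int connectionLimit)
    {
        // Applies to ServicePoint objects created from now on.
        ServicePointManager.DefaultConnectionLimit = connectionLimit;

        // A ServicePoint for this host created earlier in the process keeps its old limit,
        // so raise that one explicitly as well.
        ServicePoint servicePoint = ServicePointManager.FindServicePoint(baseAddress);
        servicePoint.ConnectionLimit = connectionLimit;

        return new HttpClient { BaseAddress = baseAddress };
    }
}
```

Calling this once from the service constructor, before any timer or thread can issue a request, matches the order that fixed the problem.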

How to correctly close the EventHubReceiver when working with Azure IoT in C#?

I am writing an application that should be able to read and display IoT data. The basic functionality works with this code (I removed some checks etc. to keep the code shorter):
public void Run()
{
_eventHubClient = EventHubClient.CreateFromConnectionString(ConnectionString, "messages/events");
var partitions = _eventHubClient.GetRuntimeInformation().PartitionIds;
cts = new CancellationTokenSource();
var tasks = partitions.Select(partition => ReceiveMessagesFromDeviceAsync(partition, cts.Token));
Task.WaitAll(tasks.ToArray());
}
public void Cancel()
{
cts.Cancel();
}
private async Task ReceiveMessagesFromDeviceAsync(string partition, CancellationToken cancellationToken)
{
var eventHubReceiver = _eventHubClient.GetDefaultConsumerGroup().CreateReceiver(partition, DateTime.UtcNow);
while (true)
{
if (cancellationToken.IsCancellationRequested)
{
break;
}
var eventData = await eventHubReceiver.ReceiveAsync(new TimeSpan(0,0,1));
if (eventData == null)
{
continue; // ReceiveAsync can return null when nothing arrives within the wait time
}
var data = Encoding.UTF8.GetString(eventData.GetBytes());
Console.WriteLine("Message received at {2}. Partition: {0} Data: '{1}'", partition, data, eventData.EnqueuedTimeUtc);
}
}
My problem is that I need to be able to stop and restart the connection. Everything works until I start it for the 6th time; then I get a "QuotaExceededException": "Exceeded the maximum number of allowed receivers per partition in a consumer group which is 5". I have googled the exception and I understand the problem; what I don't know is how to correctly close the previous receivers after I close a connection, so that I can open it again later. I have tried calling
eventHubReceiver.Close()
in the Cancel() method but it didn't seem to help.
I would be very grateful for any hints on how to solve this, thanks.
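Because eventHubReceiver is a local variable inside ReceiveMessagesFromDeviceAsync, closing it from Cancel() would require keeping a reference to every receiver. One common alternative is to close each receiver in a finally block in the method that opened it, so cancellation and exceptions both go through the same cleanup path. The sketch below is hedged: IPartitionReceiver is a hypothetical stand-in for EventHubReceiver (check the real Close/CloseAsync members of the SDK in use), and FakeReceiver is a test double that counts open receivers so the pattern can be tried without Azure.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical abstraction over a partition receiver (stand-in for EventHubReceiver).
public interface IPartitionReceiver
{
    Task<string> ReceiveAsync(TimeSpan waitTime); // null when nothing arrived in time
    Task CloseAsync();
}

public static class PartitionPump
{
    // Receives until cancelled and always closes the receiver it opened.
    public static async Task ReceiveUntilCancelledAsync(
        Func<IPartitionReceiver> openReceiver, Action<string> onMessage, CancellationToken token)
    {
        IPartitionReceiver receiver = openReceiver();
        try
        {
            // The token is checked between receives, so stopping can take up to one wait time.
            while (!token.IsCancellationRequested)
            {
                string data = await receiver.ReceiveAsync(TimeSpan.FromSeconds(1)).ConfigureAwait(false);
                if (data != null)
                {
                    onMessage(data);
                }
            }
        }
        finally
        {
            // Runs on cancellation and on exceptions, freeing the receiver slot in the consumer group.
            await receiver.CloseAsync().ConfigureAwait(false);
        }
    }
}

// Test double: tracks how many receivers are currently open.
public sealed class FakeReceiver : IPartitionReceiver
{
    public static int OpenCount;

    public FakeReceiver() { Interlocked.Increment(ref OpenCount); }

    public async Task<string> ReceiveAsync(TimeSpan waitTime)
    {
        await Task.Delay(10).ConfigureAwait(false);
        return "tick";
    }

    public Task CloseAsync()
    {
        Interlocked.Decrement(ref OpenCount);
        return Task.CompletedTask;
    }
}
```

Starting and stopping the fake pump repeatedly leaves no receiver open, which is the property the 5-receivers-per-partition quota depends on.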

Multiple connections with TcpClient, second connection always hangs/does nothing

So I have a TcpListener in a console app that is listening on port 9096. I want it to be able to handle multiple connections (simultaneous or not). I also do not want to use threads; I want to use async/await. I also need to be able to gracefully close the app during certain events, being careful not to lose any data, so I need a cancellation token. I have the code mostly working, but there are two issues.
First, when the app starts listening and I send it data, everything works correctly as long as the sender keeps using the same initial connection. Once a new connection (or socket, I guess? I'm not clear on the terminology) is established, the app does not process the new data.
Second, when the terminate signal is given and the token is cancelled, the app does not close. I am not getting any exceptions, and I cannot figure out what I am doing wrong.
I have looked all over and cannot find an example of a TcpListener that uses async/await with a cancellation token. I also cannot find an example I have been able to get working that correctly processes multiple connections without using threads or other complicated designs. I want the design as simple as possible, with as little code as possible, while still meeting my requirements. If using threads is the only way to do it I will, but I am so close to getting it right that I feel like I am just missing a little thing.
I am at my wits' end trying to figure this out and have exhausted all my ideas.
EDIT: I moved the AcceptTcpClientAsync into the loop as suggested below and it did not change anything. The app functions the same as before.
Program.cs
class Program
{
private static List<Task> _listeners = new List<Task>();
private static readonly CancellationTokenSource cancelSource = new CancellationTokenSource();
static void Main(string[] args)
{
Console.TreatControlCAsInput = false;
Console.CancelKeyPress += (o, e) => {
Console.WriteLine("Shutting down.");
cancelSource.Cancel();
};
Console.WriteLine("Started, press ctrl + c to terminate.");
_listeners.Add(Listen(cancelSource.Token));
cancelSource.Token.WaitHandle.WaitOne();
Task.WaitAll(_listeners.ToArray(), cancelSource.Token);
}
}
Listen
public static async Task Listen(CancellationToken token){
var listener = new TcpListener(IPAddress.Parse("0.0.0.0"), 9096);
listener.Start();
Console.WriteLine("Listening on port 9096");
while (!token.IsCancellationRequested) {
// Also tried putting AcceptTcpClientAsync here.
await Task.Run(async () => {
var client = await listener.AcceptTcpClientAsync();
using (var stream = client.GetStream())
using (var streamReader = new StreamReader(stream, Encoding.UTF8))
using (var streamWriter = new StreamWriter(stream, Encoding.UTF8)) {
while (!token.IsCancellationRequested) {
// DO WORK WITH DATA RECEIVED
var data = await streamReader.ReadLineAsync();
await streamWriter.WriteLineAsync("Request received.");
}
}
});
}
Console.WriteLine("Stopped Accepting Requests.");
listener.Server.Close();
listener.Stop();
}
This is actually working the way you designed it; however, you have only built it to receive one connection. I am not going to write a full socket implementation for you (as this can get fairly in-depth). But as for your main problem, you need to put AcceptTcpClientAsync in the loop, otherwise you won't get any more connections:
var cancellation = new CancellationTokenSource();
...
var listener = new TcpListener(...);
listener.Start();
try
{
while (!token.IsCancellationRequested)
{
var client = await listener.AcceptTcpClientAsync();
...
}
}
finally
{
listener.Stop();
}
// somewhere in another thread
cancellation.Cancel();
Update
I tried that and no behavior changed. It still does not pick up any connection after the first.
await ...
while (!token.IsCancellationRequested) {
// DO WORK WITH DATA RECEIVED
AcceptTcpClientAsync will never get called again because you are awaiting the task that handles the first client, and that task loops until cancellation is requested. AcceptTcpClientAsync is what accepts clients; if you can't call it, you don't get any more clients.
You cannot block here, which is what you are doing. Please see some socket server examples to get a better idea of how to write a listener.
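Putting both points together (accept inside the loop, and don't await the per-client work before accepting again), a listener that serves several clients and shuts down on cancellation could be sketched as below. EchoServer, ListenAsync and HandleClientAsync are hypothetical names, and the sketch assumes line-based messages. It calls Stop() to unblock a pending accept because AcceptTcpClientAsync has no CancellationToken overload before .NET 6, closes each client on cancellation so a blocked read returns, and sets AutoFlush so replies actually leave the StreamWriter's buffer.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Net.Sockets;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

public static class EchoServer
{
    // Accepts clients until cancelled. Each client gets its own task, and the loop goes
    // straight back to AcceptTcpClientAsync instead of awaiting that task.
    public static async Task ListenAsync(TcpListener listener, CancellationToken token)
    {
        var clientTasks = new List<Task>();
        // Stopping the listener is what makes a pending AcceptTcpClientAsync give up.
        using (token.Register(() => listener.Stop()))
        {
            try
            {
                while (!token.IsCancellationRequested)
                {
                    TcpClient client = await listener.AcceptTcpClientAsync().ConfigureAwait(false);
                    clientTasks.Add(HandleClientAsync(client, token)); // deliberately not awaited here
                }
            }
            catch (InvalidOperationException) when (token.IsCancellationRequested) { } // includes ObjectDisposedException
            catch (SocketException) when (token.IsCancellationRequested) { }
        }
        // Wait for the connections that are still open (they are closed by cancellation).
        await Task.WhenAll(clientTasks).ConfigureAwait(false);
    }

    private static async Task HandleClientAsync(TcpClient client, CancellationToken token)
    {
        using (client)
        using (token.Register(() => client.Close())) // unblocks a pending read on shutdown
        {
            try
            {
                NetworkStream stream = client.GetStream();
                var reader = new StreamReader(stream, Encoding.UTF8);
                var writer = new StreamWriter(stream, new UTF8Encoding(false)) { AutoFlush = true };
                string line;
                // ReadLineAsync returns null when the client closes its end of the connection.
                while ((line = await reader.ReadLineAsync().ConfigureAwait(false)) != null)
                {
                    await writer.WriteLineAsync("Request received: " + line).ConfigureAwait(false);
                }
            }
            catch (IOException) when (token.IsCancellationRequested) { }
            catch (InvalidOperationException) when (token.IsCancellationRequested) { }
        }
    }
}
```

In the question's Program, `Listen(cancelSource.Token)` would become `EchoServer.ListenAsync(listener, cancelSource.Token)` after `listener.Start()`, and Main would wait on that task rather than passing the already-cancelled token to Task.WaitAll.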

Azure Service Bus as WebJob throws error "The lock supplied is invalid"

Currently I'm trying to implement my Service Bus queue processing as a WebJob. Processing each message takes about 5-30 seconds. When I'm not getting many messages at the same time, it runs fine without any exceptions. Otherwise I get this error: The lock supplied is invalid. Either the lock expired, or the message has already been removed from the queue.
I've read about timing settings that should help avoid this error, but they didn't help (I'm still getting the error) and I don't know why it happens. Maybe someone has run into a similar problem and solved it with a different solution than mine (I changed MaxAutoRenewDuration to 5 minutes).
Maybe something is wrong with my WebJob implementation?
Here's my code:
static void Main(string[] args)
{
MainAsync().GetAwaiter().GetResult();
}
static async Task MainAsync()
{
JobHostConfiguration config = new JobHostConfiguration();
config.Tracing.ConsoleLevel = System.Diagnostics.TraceLevel.Error;
queueClient = new QueueClient(ServiceBusConnectionString, QueueName);
RegisterOnMessageHandlerAndReceiveMessages();
// Apply development settings before the JobHost is created from the configuration
if (config.IsDevelopment)
{
config.UseDevelopmentSettings();
}
JobHost host = new JobHost(config);
host.RunAndBlock();
}
static void RegisterOnMessageHandlerAndReceiveMessages()
{
var messageHandlerOptions = new MessageHandlerOptions(ExceptionReceivedHandler)
{
MaxConcurrentCalls = 1,
MaxAutoRenewDuration = TimeSpan.FromMinutes(5),
AutoComplete = false
};
queueClient.RegisterMessageHandler(ProcessMessagesAsync, messageHandlerOptions);
}
static async Task ProcessMessagesAsync(Message message, CancellationToken token)
{
var watch = System.Diagnostics.Stopwatch.StartNew();
Console.WriteLine("----------------------------------------------------");
try
{
await Task.Delay(15000); // simulates the average time of the actions I perform, without blocking a thread
watch.Stop();
var elapsedMs = watch.ElapsedMilliseconds;
var results = true;
}
catch (Exception ex)
{
}
Console.WriteLine("----------------------------------------------------");
await queueClient.CompleteAsync(message.SystemProperties.LockToken);
}
MessageHandlerOptions has an ExceptionReceivedHandler callback you could use to get more details about the failure.
Losing the lock can happen, especially if the client fails to communicate back to the server in time, or if there are intermittent failures that the Azure Service Bus client retries on its own, which takes time. The default LockDuration is 60 seconds, so your sample code should have worked. It could be that you're experiencing connectivity issues that are retried by the client, and by the time the retries succeed the lock has expired. Another possibility is clock skew between your local machine and the server, which speeds up lock expiration; syncing the clock would eliminate that.
Note that MaxAutoRenewDuration is not as effective as LockDuration. It's better to set LockDuration to the maximum (5 minutes) than to rely on MaxAutoRenewDuration.
In case this code is not what you've used to repro the issue, please share the details.

Handle multiple threads, one out one in, in a timed loop

I need to process a large number of files overnight, with a defined start and end time to avoid disrupting users. I've been investigating but there are so many ways of handling threading now that I'm not sure which way to go. The files come into an Exchange inbox as attachments.
My current attempt, based on some examples from here and a bit of experimentation, is:
while (DateTime.Now < dtEndTime.Value)
{
var finished = new CountdownEvent(1);
for (int i = 0; i < numThreads; i++)
{
object state = offset;
finished.AddCount();
ThreadPool.QueueUserWorkItem(delegate
{
try
{
StartProcessing(state);
}
finally
{
finished.Signal();
}
});
offset += numberOfFilesPerPoll;
}
finished.Signal();
finished.Wait();
}
It's running in a WinForms app at the moment for ease, but the core processing is in a DLL, so I can spawn the class I need from a Windows service or from a console app running under a scheduler, whichever is easiest. I do have a Windows Service set up with a Timer object that kicks off the processing at a time set in the config file.
So my question is: in the above code, I initialise a bunch of threads (currently 10), then wait for them all to finish. My ideal would be a fixed number of threads, where as soon as one finishes I start another, and then when I reach the end time I just wait for all threads to complete.
The reason for this is that the files I'm processing are variable sizes - some might take seconds to process and some might take hours, so I don't want the whole application to wait while one thread completes if I can have it ticking along in the background.
(Edit) As it stands, each thread instantiates a class and passes it an offset. The class then gets the next x e-mails from the inbox, starting at the offset (using the Exchange Web Services paging functionality). As each file is processed, it's moved to a separate folder. From some of the replies so far, I'm wondering whether I should actually grab the e-mails in the outer loop and spawn threads as needed.
To complicate matters, I currently have a backlog of e-mails that I'm trying to work through. Once the backlog has been cleared, it's likely that the nightly run will have a significantly lower load.
On average there are around 1000 files to process each night.
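The fixed-size pool described above, where each worker starts the next file as soon as it finishes one and no new work is taken after the end time, can be sketched with a shared ConcurrentQueue and a few long-running tasks. This is a minimal sketch under assumptions: the files are known up front as strings, and TimedWorkerPool, RunUntil and processFile are hypothetical names. A file already in progress at the end time is allowed to finish.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class TimedWorkerPool
{
    // Starts `workerCount` workers; each keeps taking the next file from the shared queue until
    // the queue is empty or `endTime` has passed, so a slow file only ties up one worker.
    // Returns the number of files processed.
    public static int RunUntil(IEnumerable<string> files, int workerCount, DateTime endTime, Action<string> processFile)
    {
        var queue = new ConcurrentQueue<string>(files);
        int processed = 0;

        Task[] workers = Enumerable.Range(0, workerCount)
            .Select(_ => Task.Factory.StartNew(() =>
            {
                // The clock is checked before taking new work, not while a file is being processed.
                while (DateTime.Now < endTime && queue.TryDequeue(out string file))
                {
                    processFile(file);
                    Interlocked.Increment(ref processed);
                }
            }, TaskCreationOptions.LongRunning))
            .ToArray();

        // Wait for the files that were already in progress when time ran out.
        Task.WaitAll(workers);
        return processed;
    }
}
```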
Update
I've rewritten large chunks of my code so that I can use Parallel.ForEach, and I've come up against a thread-safety issue. The calling code now looks like this:
public bool StartProcessing()
{
FindItemsResults<Item> emails = GetEmails();
var source = new CancellationTokenSource(TimeSpan.FromHours(10));
// Process files in parallel, with a maximum thread count.
var opts = new ParallelOptions { MaxDegreeOfParallelism = 8, CancellationToken = source.Token };
try
{
Parallel.ForEach(emails, opts, processAttachment);
}
catch (OperationCanceledException)
{
Console.WriteLine("Loop was cancelled.");
}
catch (Exception err)
{
WriteToLogFile(err.Message + "\r\n");
WriteToLogFile(err.StackTrace + "\r\n");
}
return true;
}
So far so good (excuse the temporary error handling). I now have a new issue: the properties of the "Item" object, which is an e-mail, are not thread-safe. For example, when I start processing an e-mail, I move it to a "processing" folder so that another process can't grab it, but it turns out that several threads might be trying to process the same e-mail at once. How do I guarantee that this doesn't happen? I know I need to add a lock; should it go in the ForEach or in the processAttachment method?
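One way to stop two workers in the same process from picking up the same e-mail, without an explicit lock, is to have each worker claim the item's identifier in a shared ConcurrentDictionary before touching it: TryAdd succeeds for exactly one caller per key. This is a hedged sketch. ClaimRegistry and ProcessOnce are hypothetical names, the string ids stand in for whatever unique id the Exchange item exposes, and it does not protect against a second process (the move to a "processing" folder is still needed for that).

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: remembers which item ids have already been claimed by a worker.
public sealed class ClaimRegistry
{
    private readonly ConcurrentDictionary<string, byte> _claimed =
        new ConcurrentDictionary<string, byte>();

    // TryAdd succeeds for exactly one caller per id, even when many threads race on the same id.
    public bool TryClaim(string id) => _claimed.TryAdd(id, 0);

    // Lets an id be claimed again, for example after a failed attempt that should be retried.
    public void Release(string id) => _claimed.TryRemove(id, out _);

    // Example usage: each distinct id is processed once even if the source repeats it.
    public static int ProcessOnce(IEnumerable<string> ids, Action<string> process)
    {
        var registry = new ClaimRegistry();
        int processed = 0;
        Parallel.ForEach(ids, new ParallelOptions { MaxDegreeOfParallelism = 8 }, id =>
        {
            if (!registry.TryClaim(id))
            {
                return; // another worker already has this item
            }
            process(id);
            Interlocked.Increment(ref processed);
        });
        return processed;
    }
}
```

The check lives at the top of the per-item delegate (processAttachment in the code above), so the ForEach itself stays unchanged.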
Use the TPL:
Parallel.ForEach( EnumerateFiles(),
new ParallelOptions { MaxDegreeOfParallelism = 10 },
file => ProcessFile( file ) );
Make EnumerateFiles stop enumerating when your end time is reached, trivially like this:
IEnumerable<string> EnumerateFiles()
{
foreach (var file in Directory.EnumerateFiles(".", "*.txt")) // "." = current directory; use your input folder
if (DateTime.Now < _endTime)
yield return file;
else
yield break;
}
You can use a combination of Parallel.ForEach() along with a cancellation token source which will cancel the operation after a set time:
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
namespace Demo
{
static class Program
{
static Random rng = new Random();
static void Main()
{
// Simulate having a list of files.
var fileList = Enumerable.Range(1, 100000).Select(i => i.ToString());
// For demo purposes, cancel after a few seconds.
var source = new CancellationTokenSource(TimeSpan.FromSeconds(10));
// Process files in parallel, with a maximum thread count.
var opts = new ParallelOptions {MaxDegreeOfParallelism = 8, CancellationToken = source.Token};
try
{
Parallel.ForEach(fileList, opts, processFile);
}
catch (OperationCanceledException)
{
Console.WriteLine("Loop was cancelled.");
}
}
static void processFile(string file)
{
Console.WriteLine("Processing file: " + file);
// Simulate taking a varying amount of time per file.
int delay;
lock (rng)
{
delay = rng.Next(200, 2000);
}
Thread.Sleep(delay);
Console.WriteLine("Processed file: " + file);
}
}
}
As an alternative to using a cancellation token, you can write a method that returns IEnumerable<string> which returns the list of filenames, and stop returning them when time is up, for example:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
namespace Demo
{
static class Program
{
static Random rng = new Random();
static void Main()
{
// Process files in parallel, with a maximum thread count.
var opts = new ParallelOptions {MaxDegreeOfParallelism = 8};
Parallel.ForEach(fileList(), opts, processFile);
}
static IEnumerable<string> fileList()
{
// Simulate having a list of files.
var fileList = Enumerable.Range(1, 100000).Select(x => x.ToString()).ToArray();
// Simulate finishing after a few seconds.
DateTime endTime = DateTime.Now + TimeSpan.FromSeconds(10);
int i = 0;
// Stop when time is up or when every file has been returned
while (i < fileList.Length && DateTime.Now <= endTime)
yield return fileList[i++];
}
static void processFile(string file)
{
Console.WriteLine("Processing file: " + file);
// Simulate taking a varying amount of time per file.
int delay;
lock (rng)
{
delay = rng.Next(200, 2000);
}
Thread.Sleep(delay);
Console.WriteLine("Processed file: " + file);
}
}
}
Note that you don't need the try/catch with this approach.
You should consider using Microsoft's Reactive Framework. It lets you use LINQ queries to express multithreaded asynchronous processing in a very simple way.
Something like this:
var query =
from file in filesToProcess.ToObservable()
where DateTime.Now < stopTime
from result in Observable.Start(() => StartProcessing(file))
select new { file, result };
var subscription =
query.Subscribe(x =>
{
/* handle result */
});
Truly, that's all the code you need if StartProcessing is already defined.
Just NuGet "Rx-Main" (newer versions of Rx are published as the "System.Reactive" package).
Oh, and to stop processing at any time just call subscription.Dispose().
This was a truly fascinating task, and it took me a while to get the code to a level I was happy with.
I ended up with a combination of the above.
The first thing worth noting is that I added the following lines to my web service call. The operation timeout I was experiencing, which I thought was caused by exceeding some limit set on the endpoint, was actually due to a limit set by Microsoft way back in .NET 2.0 (by default, a client app allows only two concurrent connections per host):
ServicePointManager.DefaultConnectionLimit = int.MaxValue;
ServicePointManager.Expect100Continue = false;
For more information, see "What to set ServicePointManager.DefaultConnectionLimit to".
As soon as I added those lines of code, my processing increased from 10/minute to around 100/minute.
But I still wasn't happy with the looping, and partitioning etc. My service moved onto a physical server to minimise CPU contention, and I wanted to allow the operating system to dictate how fast it ran, rather than my code throttling it.
After some research, this is what I ended up with - arguably not the most elegant code I've written, but it's extremely fast and reliable.
List<XElement> elements = new List<XElement>();
while (XMLDoc.ReadToFollowing("ElementName"))
{
using (XmlReader r = XMLDoc.ReadSubtree())
{
r.Read();
XElement node = XElement.Load(r);
//do some processing of the node here...
elements.Add(node);
}
}
//And now pass the list of elements through PLinQ to the actual web service call, allowing the OS/framework to handle the parallelism
int failCount=0; //the method call below sets this per request; we log and continue
failCount = elements.AsParallel()
.Sum(element => IntegrationClass.DoRequest(element.ToString()));
It ended up fiendishly simple and lightning fast.
I hope this helps someone else trying to do the same thing!
