Stop hanging synchronous method - c#

There is a method HTTP_actions.put_import() in XenAPI, which is synchronous and supports cancellation via a delegate.
I have the following method:
private void UploadImage(.., Func<bool> isTaskCancelled)
{
try
{
HTTP_actions.put_import(
cancellingDelegate: () => isTaskCancelled(),
...);
}
catch (HTTP.CancelledException exception)
{
}
}
In some cases HTTP_actions.put_import hangs and stops reacting to isTaskCancelled(), and then the whole application hangs as well.
I could run this method in a separate thread and kill it forcefully once I receive the cancellation signal. But the method doesn't always hang, and normally I want to cancel it gracefully; I only want to kill it myself when it is really hanging.
What is the best way to handle such a situation?

I wrote a blog post about the approach below: http://pranayamr.blogspot.in/2017/12/abortcancel-task.html
I spent the last two hours trying a number of solutions and came up with the working one below. Please try it out.
class Program
{
// Holds the thread running the request, so that it can be
// aborted if the request takes too long
static Thread threadToCancel = null;
static async Task<string> DoWork(CancellationToken token)
{
var tcs = new TaskCompletionSource<string>();
// Enable this block for your real use case
//await Task.Factory.StartNew(() =>
//{
// //Capture the thread
// threadToCancel = Thread.CurrentThread;
// HTTP_actions.put_import(...);
//});
//tcs.SetResult("Completed");
//return tcs.Task.Result;
// The block below is only used for testing; comment it out for real use
await Task.Factory.StartNew(() =>
{
//Capture the thread
threadToCancel = Thread.CurrentThread;
//Simulate work (usually from 3rd party code)
for (int i = 0; i < 100000; i++)
{
Console.WriteLine($"value {i}");
}
Console.WriteLine("Task finished!");
});
tcs.SetResult("Completed");
return tcs.Task.Result;
}
public static void Main()
{
var source = new CancellationTokenSource();
CancellationToken token = source.Token;
DoWork(token);
Task.Factory.StartNew(()=>
{
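while(true)
{
if (token.IsCancellationRequested && threadToCancel!=null)
{
threadToCancel.Abort();
Console.WriteLine("Thread aborted");
break; // stop watching once the thread has been aborted
}
Thread.Sleep(100); // avoid a busy-wait that keeps a CPU core spinning
}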
});
// Replace 1000 with the number of milliseconds after which the thread
// calling your long-running method should be aborted
source.CancelAfter(1000);
Console.ReadLine();
}
}

Here is my final implementation (based on Pranay Rana's answer).
public class XenImageUploader : IDisposable
{
public static XenImageUploader Create(Session session, IComponentLogger parentComponentLogger)
{
var logger = new ComponentLogger(parentComponentLogger, typeof(XenImageUploader));
var taskHandler = new XenTaskHandler(
taskReference: session.RegisterNewTask(UploadTaskName, logger),
currentSession: session);
return new XenImageUploader(session, taskHandler, logger);
}
private XenImageUploader(Session session, XenTaskHandler xenTaskHandler, IComponentLogger logger)
{
_session = session;
_xenTaskHandler = xenTaskHandler;
_logger = logger;
_imageUploadingHasFinishedEvent = new AutoResetEvent(initialState: false);
_xenApiUploadCancellationReactionTime = new TimeSpan();
}
public Maybe<string> Upload(
string imageFilePath,
XenStorage destinationStorage,
ProgressToken progressToken,
JobCancellationToken cancellationToken)
{
_logger.WriteDebug("Image uploading has started.");
var imageUploadingThread = new Thread(() =>
UploadImageOfVirtualMachine(
imageFilePath: imageFilePath,
storageReference: destinationStorage.GetReference(),
isTaskCancelled: () => cancellationToken.IsCancellationRequested));
imageUploadingThread.Start();
using (new Timer(
callback: _ => WatchForImageUploadingState(imageUploadingThread, progressToken, cancellationToken),
state: null,
dueTime: TimeSpan.Zero,
period: TaskStatusUpdateTime))
{
_imageUploadingHasFinishedEvent.WaitOne(MaxTimeToUploadSvm);
}
cancellationToken.PerformCancellationIfRequested();
return _xenTaskHandler.TaskIsSucceded
? new Maybe<string>(((string) _xenTaskHandler.Result).GetOpaqueReferenceFromResult())
: new Maybe<string>();
}
public void Dispose()
{
_imageUploadingHasFinishedEvent.Dispose();
}
private void UploadImageOfVirtualMachine(string imageFilePath, XenRef<SR> storageReference, Func<bool> isTaskCancelled)
{
try
{
_logger.WriteDebug("Uploading thread has started.");
HTTP_actions.put_import(
progressDelegate: progress => { },
cancellingDelegate: () => isTaskCancelled(),
timeout_ms: -1,
hostname: new Uri(_session.Url).Host,
proxy: null,
path: imageFilePath,
task_id: _xenTaskHandler.TaskReference,
session_id: _session.uuid,
restore: false,
force: false,
sr_id: storageReference);
_xenTaskHandler.WaitCompletion();
_logger.WriteDebug("Uploading thread has finished.");
}
catch (HTTP.CancelledException exception)
{
_logger.WriteInfo("Image uploading has been cancelled.");
_logger.WriteInfo(exception.ToDetailedString());
}
_imageUploadingHasFinishedEvent.Set();
}
private void WatchForImageUploadingState(Thread imageUploadingThread, ProgressToken progressToken, JobCancellationToken cancellationToken)
{
progressToken.Progress = _xenTaskHandler.Progress;
if (!cancellationToken.IsCancellationRequested)
{
return;
}
_xenApiUploadCancellationReactionTime += TaskStatusUpdateTime;
if (_xenApiUploadCancellationReactionTime >= TimeForXenApiToReactOnCancel)
{
_logger.WriteWarning($"XenApi didn't cancel for {_xenApiUploadCancellationReactionTime}.");
if (imageUploadingThread.IsAlive)
{
try
{
_logger.WriteWarning("Trying to forcefully abort uploading thread.");
imageUploadingThread.Abort();
}
catch (Exception exception)
{
_logger.WriteError(exception.ToDetailedString());
}
}
_imageUploadingHasFinishedEvent.Set();
}
}
private const string UploadTaskName = "Xen image uploading";
private static readonly TimeSpan TaskStatusUpdateTime = TimeSpan.FromSeconds(1);
private static readonly TimeSpan TimeForXenApiToReactOnCancel = TimeSpan.FromSeconds(10);
private static readonly TimeSpan MaxTimeToUploadSvm = TimeSpan.FromMinutes(20);
private readonly Session _session;
private readonly XenTaskHandler _xenTaskHandler;
private readonly IComponentLogger _logger;
private readonly AutoResetEvent _imageUploadingHasFinishedEvent;
private TimeSpan _xenApiUploadCancellationReactionTime;
}

HTTP_actions.put_import
calls
HTTP_actions.put
calls
HTTP.put
calls
HTTP.CopyStream
The delegate is passed to CopyStream, which checks that it is not null (that is, one was passed) and that it does not return true. However, this check only happens in the while condition, so the blocking operation is most likely the Read on the stream. It could also occur in progressDelegate if one is used.
To get around this, put the call to HTTP_actions.put_import() inside a task or background thread, and then separately check for cancellation or for the task/thread returning.
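The idea of running the call on a background thread and watching it separately can be sketched as follows. This is a minimal sketch under stated assumptions, not XenAPI code: GuardedCall and TryRun are hypothetical names, and blockingCall stands in for a wrapper around HTTP_actions.put_import. Because Thread.Abort throws PlatformNotSupportedException on .NET Core and .NET 5+, the sketch abandons a hung thread instead of aborting it.

```csharp
using System;
using System.Threading;

public static class GuardedCall
{
    // Hypothetical helper. Returns true if blockingCall returned (on its own
    // or after honouring cancellation), false if it was still running when
    // the grace period after cancellation expired and was abandoned.
    public static bool TryRun(
        Action<Func<bool>> blockingCall,
        CancellationToken token,
        TimeSpan gracePeriod)
    {
        Exception failure = null;
        var worker = new Thread(() =>
        {
            try
            {
                // The call receives a cancelling delegate it may poll.
                blockingCall(() => token.IsCancellationRequested);
            }
            catch (Exception ex)
            {
                failure = ex; // keep an exception from tearing down the process
            }
        });
        worker.IsBackground = true; // an abandoned thread must not block process exit
        worker.Start();

        bool finished;
        while (!(finished = worker.Join(TimeSpan.FromMilliseconds(50))))
        {
            if (token.IsCancellationRequested)
            {
                // Graceful phase: give the call some time to react to cancellation.
                finished = worker.Join(gracePeriod);
                break;
            }
        }

        if (finished && failure != null)
            throw new AggregateException(failure);
        return finished;
    }
}
```

A caller would pass a lambda that forwards the supplied delegate as cancellingDelegate, and treat a false result as "the call is hanging". The abandoned thread keeps running in the background, so this trades a leaked thread for a responsive application.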
Interestingly enough, a quick glance at the CopyStream code revealed a bug to me. If the function that works out whether the process has been cancelled returns different values from one call to the next, the loop can exit without a CancelledException being thrown. The result of the cancelling delegate call should be stored in a local variable.
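To illustrate the suggested fix, here is a minimal sketch of a copy loop that evaluates the cancelling delegate once per iteration and acts on the stored result. It is a hypothetical reconstruction for illustration only, not the actual XenAPI source; StreamCopy, Copy and the nested CancelledException are assumed names.

```csharp
using System;
using System.IO;

public static class StreamCopy
{
    // Stand-in for HTTP.CancelledException (assumption, for illustration).
    public sealed class CancelledException : Exception { }

    // Copies source to destination and returns the number of bytes copied.
    public static long Copy(Stream source, Stream destination, Func<bool> cancellingDelegate)
    {
        var buffer = new byte[4096];
        long total = 0;
        while (true)
        {
            // Call the delegate exactly once per iteration and keep the answer,
            // so the decision to stop and the decision to throw cannot disagree.
            bool cancelled = cancellingDelegate != null && cancellingDelegate();
            if (cancelled)
                throw new CancelledException();

            int read = source.Read(buffer, 0, buffer.Length); // may still block here
            if (read == 0)
                return total;

            destination.Write(buffer, 0, read);
            total += read;
        }
    }
}
```

Note that this only fixes the inconsistent check. A Read that never returns still blocks the loop, which is why the call needs to be watched from outside as well.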

Related

Alternatives for Monitor (Wait, PulseAll) in Async Tasks in C#

I implemented Task synchronization using Monitor in C#.
However, I have read Monitor should not be used in asynchronous operation.
In the code below, how do I replace the Monitor methods Wait and PulseAll with a construct that works with Task (asynchronous operations)?
I have read that the SemaphoreSlim.WaitAsync and Release methods can help.
But how do they fit into the sample below, where multiple tasks need to wait on a lock object and releasing the lock wakes up all waiting tasks?
private bool m_condition = false;
private readonly Object m_lock = new Object();
private async Task<bool> SyncInteralWithPoolingAsync(
SyncDatabase db,
List<EntryUpdateInfo> updateList)
{
List<Task> activeTasks = new List<Task>();
int addedTasks = 0;
int removedTasks = 0;
foreach (EntryUpdateInfo entryUpdateInfo in updateList)
{
Monitor.Enter(m_lock);
//If 5 tasks are waiting in ProcessEntryAsync method
if(m_count >= 5)
{
//Do some batch processing to obtain values to set for adapterEntry.AdapterEntryId in ProcessEntryAsync
//.......
//.......
m_condition = true;
Monitor.PulseAll(m_lock); // Wakes all waiters AFTER lock is released
}
Monitor.Exit(m_lock);
removedTasks += activeTasks.RemoveAll(t => t.IsCompleted);
Task processingTask = Task.Run(
async () =>
{
await this.ProcessEntryAsync(
entryUpdateInfo,
db)
.ContinueWith(this.ProcessEntryCompleteAsync)
.ConfigureAwait(false);
});
activeTasks.Add(processingTask);
addedTasks++;
}
}
private async Task<bool> ProcessEntryAsync(EntryUpdateInfo updateInfo, SyncDatabase db)
{
SyncEntryAdapterData adapterEntry =
updateInfo.Entry.AdapterEntries.FirstOrDefault(e => e.AdapterId == this.Config.Id);
if (adapterEntry == null)
{
adapterEntry = new SyncEntryAdapterData()
{
SyncEntry = updateInfo.Entry,
AdapterId = this.Config.Id
};
updateInfo.Entry.AdapterEntries.Add(adapterEntry);
}
m_condition = false;
Monitor.Enter(m_lock);
while (!m_condition)
{
m_count++;
Monitor.Wait(m_lock);
}
m_count--;
adapterEntry.AdapterEntryId = .... //Set value obtained from batch processing
Monitor.Exit(m_lock);
}
private void ProcessEntryCompleteAsync(Task<bool> task, object context)
{
EntryProcessingContext ctx = (EntryProcessingContext)context;
try
{
string message;
if (task.IsCanceled)
{
Logger.Warning("Processing was cancelled");
message = "The change was cancelled during processing";
}
else if (task.Exception != null)
{
Exception ex = task.Exception;
Logger.Warning("Processing failed with {0}: {1}", ex.GetType().FullName, ex.Message);
message = "An error occurred while synchronizing the change.";
}
else
{
message = "The change was successfully synchronized";
if (task.Result)
{
//Processing
//...
//...
}
}
}
catch (Exception e)
{
Logger.Info(
"Caught an exception while completing entry processing. " + e);
}
finally
{
}
}
Thanks
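One possible direction for the question above is to replace Monitor.Wait/PulseAll with a TaskCompletionSource that all waiters await and that is swapped out on every pulse. The following is a minimal sketch under stated assumptions, not a drop-in replacement: AsyncPulse is a hypothetical class name, and unlike Monitor it provides no mutual exclusion, so shared state such as m_count would still need its own protection (for example a SemaphoreSlim used as an async lock).

```csharp
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: an awaitable "wait until the next pulse" signal.
public sealed class AsyncPulse
{
    private TaskCompletionSource<bool> _tcs = NewSource();

    private static TaskCompletionSource<bool> NewSource()
    {
        // Run continuations asynchronously so PulseAll does not execute
        // the waiters' code inline on the pulsing thread.
        return new TaskCompletionSource<bool>(
            TaskCreationOptions.RunContinuationsAsynchronously);
    }

    // Rough counterpart of Monitor.Wait: the returned task completes
    // on the next call to PulseAll.
    public Task WaitAsync()
    {
        return Volatile.Read(ref _tcs).Task;
    }

    // Rough counterpart of Monitor.PulseAll: releases every current waiter
    // and installs a fresh source for waiters that arrive afterwards.
    public void PulseAll()
    {
        TaskCompletionSource<bool> previous = Interlocked.Exchange(ref _tcs, NewSource());
        previous.TrySetResult(true);
    }
}
```

In the code from the question, ProcessEntryAsync would use await pulse.WaitAsync() in place of Monitor.Wait(m_lock), and the loop in SyncInteralWithPoolingAsync would call pulse.PulseAll() after the batch processing. A task that calls WaitAsync after a pulse waits for the next one, so the existing m_condition check remains necessary.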

What could be the reason behind the failure of the task?

I have a class named TableStorageController.cs
public class TableStorageController
{
private static Dictionary<string, BlockingCollection<CloudModelDetail>> s_dictionary = new Dictionary<string, BlockingCollection<CloudModelDetail>>();
private static StorageAccount s_azureStorageAccount;
private readonly CloudModelDetail _cloudModelDetail;
private static CancellationTokenSource s_cancellationTokenSource = new CancellationTokenSource();
private static int s_retailerId;
/// <summary>
/// Static task to log transactions to Azure Table Storage after every 5 minutes.
/// </summary>
static TableStorageController()
{
try
{
Task.Run(async () =>
{
while (true)
{
await Task.Delay(300000, s_cancellationTokenSource.Token);
foreach (var retaileridkey in s_dictionary.Keys)
{
var batchOperation = new TableBatchOperation();
while (s_dictionary[retaileridkey].Count != 0 && batchOperation.Count < 101)
{
batchOperation.InsertOrMerge(s_dictionary[retaileridkey].Take());
}
if (batchOperation?.Count != 0)
await s_azureStorageAccount.VerifyCloudTable.ExecuteBatchAsync(batchOperation);
}
}
});
}
catch (Exception ex)
{
s_log.Fatal("Azure task run failed", ex);
}
}
}
This task is meant to run every 5 minutes and log whatever items are present in the dictionary to Azure Table Storage.
When I run my code locally, I can see it being triggered every 5 minutes.
But after deployment to another environment (Production), it somehow fails.
Can anyone point out what I am missing?
Note: I have never got the "Azure task run failed" exception.
Try putting the exception handling inside the Task.Run block. You do not await the call to Task.Run, so exceptions will go unnoticed. And since await Task.Delay is already non-blocking, I do not see why you need the extra Task. Try this:
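static TableStorageController()
{
// A static constructor cannot be async, so it only starts the loop.
RunLoggingLoop();
}
private static async void RunLoggingLoop()
{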
try
{
while (true)
{
await Task.Delay(TimeSpan.FromMinutes(5), s_cancellationTokenSource.Token);
foreach (var retaileridkey in s_dictionary.Keys)
{
var batchOperation = new TableBatchOperation();
while (s_dictionary[retaileridkey].Count != 0 && batchOperation.Count < 100) // a table batch holds at most 100 operations
{
batchOperation.InsertOrMerge(s_dictionary[retaileridkey].Take());
}
if (batchOperation?.Count != 0)
await s_azureStorageAccount.VerifyCloudTable.ExecuteBatchAsync(batchOperation);
}
}
}
catch (Exception ex)
{
s_log.Fatal("Azure task run failed", ex);
}
}
By the way, you could also use a timer instead of a Task.Delay with a CancellationToken
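To illustrate why the catch block in the question never logs anything, here is a minimal sketch that mirrors its shape: Task.Run is started inside try/catch but is not awaited there. FireAndForgetDemo, Fail and Start are hypothetical names, and the thrown exception merely simulates a storage failure.

```csharp
using System;
using System.Threading.Tasks;

public static class FireAndForgetDemo
{
    // Becomes true only if the catch block around Task.Run executes.
    public static bool OuterCatchRan;

    private static void Fail()
    {
        throw new InvalidOperationException("simulated storage failure");
    }

    // Same shape as the static constructor in the question: the task is
    // started inside try/catch, but nothing in the try block awaits it.
    public static Task Start()
    {
        try
        {
            return Task.Run(() => Fail());
        }
        catch (Exception)
        {
            // Not reached for failures inside the task: they are stored
            // on the task and surface only where the task is observed.
            OuterCatchRan = true;
            return Task.FromResult(0);
        }
    }
}
```

Calling Start() returns a task that ends up faulted, while OuterCatchRan stays false. The exception becomes visible only to code that awaits the task or calls Wait() on it, which is why the try/catch has to live inside the delegate.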

BrokeredMessage disposed after accessing from different thread

This might be a duplicate of this question but that's confused with talk about batching database updates and still has no proper answer.
In a simple example using Azure Service Bus queues, I can't access a BrokeredMessage after it's been placed on a queue; it's always disposed if I read the queue from another thread.
Sample code:
class Program {
private static string _serviceBusConnectionString = "XXX";
private static BlockingCollection<BrokeredMessage> _incomingMessages = new BlockingCollection<BrokeredMessage>();
private static CancellationTokenSource _cancelToken = new CancellationTokenSource();
private static QueueClient _client;
static void Main(string[] args) {
// Set up a few listeners on different threads
Task.Run(async () => {
while (!_cancelToken.IsCancellationRequested) {
var msg = _incomingMessages.Take(_cancelToken.Token);
if (msg != null) {
try {
await msg.CompleteAsync();
Console.WriteLine($"Completed Message Id: {msg.MessageId}");
} catch (ObjectDisposedException) {
Console.WriteLine("Message was disposed!?");
}
}
}
});
// Now set up our service bus reader
_client = GetQueueClient("test");
_client.OnMessageAsync(async (message) => {
await Task.Run(() => _incomingMessages.Add(message));
},
new OnMessageOptions() {
AutoComplete = false
});
// Now start sending
Task.Run(async () => {
int sent = 0;
while (!_cancelToken.IsCancellationRequested) {
var msg = new BrokeredMessage();
await _client.SendAsync(msg);
Console.WriteLine($"Sent {++sent}");
await Task.Delay(1000);
}
});
Console.ReadKey();
_cancelToken.Cancel();
}
private static QueueClient GetQueueClient(string queueName) {
var namespaceManager = NamespaceManager.CreateFromConnectionString(_serviceBusConnectionString);
if (!namespaceManager.QueueExists(queueName)) {
var settings = new QueueDescription(queueName);
settings.MaxDeliveryCount = 10;
settings.LockDuration = TimeSpan.FromSeconds(5);
settings.EnableExpress = true;
settings.EnablePartitioning = true;
namespaceManager.CreateQueue(settings);
}
var factory = MessagingFactory.CreateFromConnectionString(_serviceBusConnectionString);
factory.RetryPolicy = new RetryExponential(minBackoff: TimeSpan.FromSeconds(0.1), maxBackoff: TimeSpan.FromSeconds(30), maxRetryCount: 100);
var queueClient = factory.CreateQueueClient(queueName);
return queueClient;
}
}
I've tried playing around with settings but can't get this to work. Any ideas?
Answering my own question with the response from Serkant Karaca @ Microsoft here:
Very basic rule and I am not sure if this is documented. The received message needs to be processed in the callback function's life time. In your case, messages will be disposed when async callback completes, this is why your complete attempts are failing with ObjectDisposedException in another thread.
I don't really see how queuing messages for further processing helps on the throughput. This will add more burden to client for sure. Try processing the message in the async callback, that should be performant enough.
Bugger.

How to get handle of an awaitable Task?

I am a newbie in Tasks and still learning this topic so be gentle with me (I think I have some fundamental mess-ups with my below code...)
Please look at the below exercise which will help me describe my question:
I have a simple "MyService" class which has a "Do_CPU_Intensive_Job" method called by the "Run" method. My goal is to be able to run several instances of the "Do_CPU_Intensive_Job" method (which itself runs on a different thread than the UI, as it's CPU-bound), sometimes synchronously and sometimes asynchronously.
In other words, assuming I have 2 instances of MyService, sometimes I want these 2 methods running together and sometimes not.
class MyService
{
private bool async;
private string name;
private CancellationTokenSource tokenSource;
private CancellationToken token;
private bool isRunning = false;
private Task myTask = null;
public MyService(string name, bool async)
{
this.name = name;
this.async = async;
}
public string Name { get { return name; } }
public bool IsRunning { get { return isRunning; } }
public async Task Run ()
{
isRunning = true;
tokenSource = new CancellationTokenSource();
token = tokenSource.Token;
if (async)
myTask = Do_CPU_Intensive_Job();
else
await Do_CPU_Intensive_Job(); // I cannot do myTask = await Do_CPU_Intensive_Job(); so how can the "Stop" method wait for it??
}
public async Task Stop ()
{
tokenSource.Cancel();
if (myTask != null)
await myTask;
isRunning = false;
}
private async Task Do_CPU_Intensive_Job ()
{
Console.WriteLine("doing some heavy job for Task " + name);
int i = 0;
while (!token.IsCancellationRequested)
{
Console.WriteLine("Task: " + name + " - " + i);
await Task.Delay(1000);
i++;
}
Console.WriteLine("Task " + name + " not yet completed! I need to do some cleanups");
await Task.Delay(2000); //simulating cleanups
Console.WriteLine("Task " + name + " - CPU intensive and cleanups done!");
}
}
So, I have the GUI below, which works well, but only if the 2 instances are running asynchronously. "Works well" means that when stopping the tasks, it stops nicely by running the entire "Do_CPU_Intensive_Job" method, hence the last message is the one from the GUI ("Both tasks are completed...now doing some other stuff"). So far so good.
public partial class Form1 : Form
{
List<MyService> list = null;
MyService ms1 = null;
MyService ms2 = null;
public Form1()
{
InitializeComponent();
list = new List<MyService>();
ms1 = new MyService("task 1", true);
ms2 = new MyService("task 2", true);
list.Add(ms1);
list.Add(ms2);
}
private async void button1_Click(object sender, EventArgs e)
{
foreach (MyService item in list)
await item.Run();
}
private async void button2_Click(object sender, EventArgs e)
{
foreach (MyService item in list)
{
if (item.IsRunning)
{
await item.Stop();
Console.WriteLine("Done stopping Task: " + item.Name);
}
}
//now ready to do some other stuff
Console.WriteLine("Both tasks are completed...now doing some other stuff");
}
}
The problem starts when the 2 instances are not running simultaneously. In that case, I get the "Both tasks are completed...now doing some other stuff" message from the GUI before "Do_CPU_Intensive_Job" has really completed...
ms1 = new MyService("task 1", false);
ms2 = new MyService("task 2", false);
This does not happen when both tasks are running together, because I have the handle (myTask) when running asynchronously, which I don't have when running synchronously.
await Do_CPU_Intensive_Job(); // I cannot do myTask = await Do_CPU_Intensive_Job(); so how can the "Stop" method wait for it??
Thanks, all
I spent some time hammering out the code to a point that I think it is doing what is expected.
The first problem I found is that you can't just pass the cancellation token into your method; you need to relate it to the task(s) that are to be cancelled. Unfortunately, I could not find a way to do this directly on an async method, but have a look at the MyService class below to see how I was able to do it.
class MyService
{
private bool async;
private string name;
private CancellationTokenSource tokenSource;
private bool isRunning = false;
private Task myTask = null;
public MyService(string name, bool async)
{
this.name = name;
this.async = async;
}
public string Name { get { return name; } }
public bool IsRunning { get { return isRunning; } }
public async Task Run()
{
isRunning = true;
tokenSource = new CancellationTokenSource();
myTask = Task.Run(() => Do_CPU_Intensive_Job(tokenSource.Token), tokenSource.Token);
if (!async)
await myTask;
}
public async Task Stop()
{
tokenSource.Cancel();
if (myTask != null)
await myTask;
isRunning = false;
}
private void Do_CPU_Intensive_Job(CancellationToken token)
{
Console.WriteLine("doing some heavy job for Task " + name);
int i = 0;
while (!token.IsCancellationRequested)
{
Console.WriteLine("Task: " + name + " - " + i);
Thread.Sleep(1000);
i++;
}
Console.WriteLine("Task " + name + " not yet completed! I need to do some cleanups");
Thread.Sleep(1000);
Console.WriteLine("Task " + name + " - CPU intensive and cleanups done!");
}
}
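The Run method now uses Task.Run to call Do_CPU_Intensive_Job, and note that I pass the token both to the work method and to the Task.Run call. The latter associates the token with the task (the task will not start if the token is already cancelled, and it ends up in the Canceled state if the delegate throws an OperationCanceledException for that token); the former is what allows the method to watch for the cancellation request.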
The final piece is how we call Run on the service instances. When a method awaits a task that has not completed yet, it returns control to its caller, and the remainder of the method becomes a continuation that will not run until the awaited task completes.
I was using a unit test rather than a button to work on the code, but the premise should be the same. Here is how I was able to run the tasks in synchronous mode and still be able to call Stop on them.
var service1 = new MyService("task 1", false);
var service2 = new MyService("task 2", false);
service1.Run(); //Execution immediately moves to next line
service2.Run(); // Same here
await service1.Stop(); //Execution will halt here until task one has fully stopped so task 2 actually continues running
await service2.Stop();

How to cancel Task await after a timeout period

I am using this method to instantiate a web browser programmatically, navigate to a url and return a result when the document has completed.
How would I be able to stop the Task and have GetFinalUrl() return null if the document takes more than 5 seconds to load?
I have seen many examples using a TaskFactory but I haven't been able to apply it to this code.
private Uri GetFinalUrl(PortalMerchant portalMerchant)
{
SetBrowserFeatureControl();
Uri finalUri = null;
if (string.IsNullOrEmpty(portalMerchant.Url))
{
return null;
}
Uri trackingUrl = new Uri(portalMerchant.Url);
var task = MessageLoopWorker.Run(DoWorkAsync, trackingUrl);
task.Wait();
if (!String.IsNullOrEmpty(task.Result.ToString()))
{
return new Uri(task.Result.ToString());
}
else
{
throw new Exception("Parsing Failed");
}
}
// by Noseratio - http://stackoverflow.com/users/1768303/noseratio
static async Task<object> DoWorkAsync(object[] args)
{
_threadCount++;
Console.WriteLine("Thread count:" + _threadCount);
Uri retVal = null;
var wb = new WebBrowser();
wb.ScriptErrorsSuppressed = true;
TaskCompletionSource<bool> tcs = null;
WebBrowserDocumentCompletedEventHandler documentCompletedHandler = (s, e) => tcs.TrySetResult(true);
foreach (var url in args)
{
tcs = new TaskCompletionSource<bool>();
wb.DocumentCompleted += documentCompletedHandler;
try
{
wb.Navigate(url.ToString());
await tcs.Task;
}
finally
{
wb.DocumentCompleted -= documentCompletedHandler;
}
retVal = wb.Url;
wb.Dispose();
return retVal;
}
return null;
}
public static class MessageLoopWorker
{
#region Public static methods
public static async Task<object> Run(Func<object[], Task<object>> worker, params object[] args)
{
var tcs = new TaskCompletionSource<object>();
var thread = new Thread(() =>
{
EventHandler idleHandler = null;
idleHandler = async (s, e) =>
{
// handle Application.Idle just once
Application.Idle -= idleHandler;
// return to the message loop
await Task.Yield();
// and continue asynchronously
// propagate the result or exception
try
{
var result = await worker(args);
tcs.SetResult(result);
}
catch (Exception ex)
{
tcs.SetException(ex);
}
// signal to exit the message loop
// Application.Run will exit at this point
Application.ExitThread();
};
// handle Application.Idle just once
// to make sure we're inside the message loop
// and SynchronizationContext has been correctly installed
Application.Idle += idleHandler;
Application.Run();
});
// set STA model for the new thread
thread.SetApartmentState(ApartmentState.STA);
// start the thread and await for the task
thread.Start();
try
{
return await tcs.Task;
}
finally
{
thread.Join();
}
}
#endregion
}
Updated: the latest version of the WebBrowser-based console web scraper can be found on Github.
Updated: Adding a pool of WebBrowser objects for multiple parallel downloads.
Do you have an example of how to do this in a console app by any chance? Also I don't think webBrowser can be a class variable because I am running the whole thing in a parallel for each, iterating thousands of URLs
Below is an implementation of a more or less generic WebBrowser-based web scraper, which works as a console application. It's a consolidation of some of my previous WebBrowser-related efforts, including the code referenced in the question:
Capturing an image of the web page with opacity
Loading a page with dynamic AJAX content
Creating an STA message loop thread for WebBrowser
Loading a set of URLs, one after another
Printing a set of URLs with WebBrowser
Web page UI automation
A few points:
The reusable MessageLoopApartment class is used to start and run a WinForms STA thread with its own message pump. It can be used from a console application, as below. This class exposes a TPL task scheduler (FromCurrentSynchronizationContext) and a set of Task.Factory.StartNew wrappers that use this task scheduler.
This makes async/await a great tool for running WebBrowser navigation tasks on that separate STA thread. This way, a WebBrowser object gets created, navigated and destroyed on that thread. That said, MessageLoopApartment is not tied to WebBrowser specifically.
It's important to enable HTML5 rendering using Browser Feature Control, as otherwise the WebBrowser object runs in IE7 emulation mode by default. That's what SetFeatureBrowserEmulation does below.
It may not always be possible to determine with 100% certainty when a web page has finished rendering. Some pages are quite complex and use continuous AJAX updates. Yet we can get quite close by handling the DocumentCompleted event first, then polling the page's current HTML snapshot for changes and checking the WebBrowser.IsBusy property. That's what NavigateAsync does below.
Time-out logic is layered on top of the above, in case the page rendering never ends (note CancellationTokenSource and CreateLinkedTokenSource).
using Microsoft.Win32;
using System;
using System.Threading;
using System.Threading.Tasks;
using System.Windows.Forms;
namespace Console_22239357
{
class Program
{
// by Noseratio - https://stackoverflow.com/a/22262976/1768303
// main logic
static async Task ScrapeSitesAsync(string[] urls, CancellationToken token)
{
using (var apartment = new MessageLoopApartment())
{
// create WebBrowser inside MessageLoopApartment
var webBrowser = apartment.Invoke(() => new WebBrowser());
try
{
foreach (var url in urls)
{
Console.WriteLine("URL:\n" + url);
// cancel in 30s or when the main token is signalled
var navigationCts = CancellationTokenSource.CreateLinkedTokenSource(token);
navigationCts.CancelAfter((int)TimeSpan.FromSeconds(30).TotalMilliseconds);
var navigationToken = navigationCts.Token;
// run the navigation task inside MessageLoopApartment
string html = await apartment.Run(() =>
webBrowser.NavigateAsync(url, navigationToken), navigationToken);
Console.WriteLine("HTML:\n" + html);
}
}
finally
{
// dispose of WebBrowser inside MessageLoopApartment
apartment.Invoke(() => webBrowser.Dispose());
}
}
}
// entry point
static void Main(string[] args)
{
try
{
WebBrowserExt.SetFeatureBrowserEmulation(); // enable HTML5
var cts = new CancellationTokenSource((int)TimeSpan.FromMinutes(3).TotalMilliseconds);
var task = ScrapeSitesAsync(
new[] { "http://example.com", "http://example.org", "http://example.net" },
cts.Token);
task.Wait();
Console.WriteLine("Press Enter to exit...");
Console.ReadLine();
}
catch (Exception ex)
{
while (ex is AggregateException && ex.InnerException != null)
ex = ex.InnerException;
Console.WriteLine(ex.Message);
Environment.Exit(-1);
}
}
}
/// <summary>
/// WebBrowserExt - WebBrowser extensions
/// by Noseratio - https://stackoverflow.com/a/22262976/1768303
/// </summary>
public static class WebBrowserExt
{
const int POLL_DELAY = 500;
// navigate and download
public static async Task<string> NavigateAsync(this WebBrowser webBrowser, string url, CancellationToken token)
{
// navigate and await DocumentCompleted
var tcs = new TaskCompletionSource<bool>();
WebBrowserDocumentCompletedEventHandler handler = (s, arg) =>
tcs.TrySetResult(true);
using (token.Register(() => tcs.TrySetCanceled(), useSynchronizationContext: true))
{
webBrowser.DocumentCompleted += handler;
try
{
webBrowser.Navigate(url);
await tcs.Task; // wait for DocumentCompleted
}
finally
{
webBrowser.DocumentCompleted -= handler;
}
}
// get the root element
var documentElement = webBrowser.Document.GetElementsByTagName("html")[0];
// poll the current HTML for changes asynchronously
var html = documentElement.OuterHtml;
while (true)
{
// wait asynchronously, this will throw if cancellation requested
await Task.Delay(POLL_DELAY, token);
// continue polling if the WebBrowser is still busy
if (webBrowser.IsBusy)
continue;
var htmlNow = documentElement.OuterHtml;
if (html == htmlNow)
break; // no changes detected, end the poll loop
html = htmlNow;
}
// consider the page fully rendered
token.ThrowIfCancellationRequested();
return html;
}
// enable HTML5 (assuming we're running IE10+)
// more info: https://stackoverflow.com/a/18333982/1768303
public static void SetFeatureBrowserEmulation()
{
if (System.ComponentModel.LicenseManager.UsageMode != System.ComponentModel.LicenseUsageMode.Runtime)
return;
var appName = System.IO.Path.GetFileName(System.Diagnostics.Process.GetCurrentProcess().MainModule.FileName);
Registry.SetValue(@"HKEY_CURRENT_USER\Software\Microsoft\Internet Explorer\Main\FeatureControl\FEATURE_BROWSER_EMULATION",
appName, 10000, RegistryValueKind.DWord);
}
}
/// <summary>
/// MessageLoopApartment
/// STA thread with message pump for serial execution of tasks
/// by Noseratio - https://stackoverflow.com/a/22262976/1768303
/// </summary>
public class MessageLoopApartment : IDisposable
{
Thread _thread; // the STA thread
TaskScheduler _taskScheduler; // the STA thread's task scheduler
public TaskScheduler TaskScheduler { get { return _taskScheduler; } }
/// <summary>MessageLoopApartment constructor</summary>
public MessageLoopApartment()
{
var tcs = new TaskCompletionSource<TaskScheduler>();
// start an STA thread and gets a task scheduler
_thread = new Thread(startArg =>
{
EventHandler idleHandler = null;
idleHandler = (s, e) =>
{
// handle Application.Idle just once
Application.Idle -= idleHandler;
// return the task scheduler
tcs.SetResult(TaskScheduler.FromCurrentSynchronizationContext());
};
// handle Application.Idle just once
// to make sure we're inside the message loop
// and SynchronizationContext has been correctly installed
Application.Idle += idleHandler;
Application.Run();
});
_thread.SetApartmentState(ApartmentState.STA);
_thread.IsBackground = true;
_thread.Start();
_taskScheduler = tcs.Task.Result;
}
/// <summary>shutdown the STA thread</summary>
public void Dispose()
{
if (_taskScheduler != null)
{
var taskScheduler = _taskScheduler;
_taskScheduler = null;
// execute Application.ExitThread() on the STA thread
Task.Factory.StartNew(
() => Application.ExitThread(),
CancellationToken.None,
TaskCreationOptions.None,
taskScheduler).Wait();
_thread.Join();
_thread = null;
}
}
/// <summary>Task.Factory.StartNew wrappers</summary>
public void Invoke(Action action)
{
Task.Factory.StartNew(action,
CancellationToken.None, TaskCreationOptions.None, _taskScheduler).Wait();
}
public TResult Invoke<TResult>(Func<TResult> action)
{
return Task.Factory.StartNew(action,
CancellationToken.None, TaskCreationOptions.None, _taskScheduler).Result;
}
public Task Run(Action action, CancellationToken token)
{
return Task.Factory.StartNew(action, token, TaskCreationOptions.None, _taskScheduler);
}
public Task<TResult> Run<TResult>(Func<TResult> action, CancellationToken token)
{
return Task.Factory.StartNew(action, token, TaskCreationOptions.None, _taskScheduler);
}
public Task Run(Func<Task> action, CancellationToken token)
{
return Task.Factory.StartNew(action, token, TaskCreationOptions.None, _taskScheduler).Unwrap();
}
public Task<TResult> Run<TResult>(Func<Task<TResult>> action, CancellationToken token)
{
return Task.Factory.StartNew(action, token, TaskCreationOptions.None, _taskScheduler).Unwrap();
}
}
}
I suspect running a processing loop on another thread will not work out well, since WebBrowser is a UI component that hosts an ActiveX control.
When you're writing TAP over EAP wrappers, I recommend using extension methods to keep the code clean:
public static Task<string> NavigateAsync(this WebBrowser @this, string url)
{
var tcs = new TaskCompletionSource<string>();
WebBrowserDocumentCompletedEventHandler subscription = null;
subscription = (_, args) =>
{
@this.DocumentCompleted -= subscription;
tcs.TrySetResult(args.Url.ToString());
};
@this.DocumentCompleted += subscription;
@this.Navigate(url);
return tcs.Task;
}
Now your code can easily apply a timeout:
async Task<string> GetUrlAsync(string url)
{
using (var wb = new WebBrowser())
{
var navigate = wb.NavigateAsync(url);
var timeout = Task.Delay(TimeSpan.FromSeconds(5));
var completed = await Task.WhenAny(navigate, timeout);
if (completed == navigate)
return await navigate;
return null;
}
}
which can be consumed as such:
private async Task<Uri> GetFinalUrlAsync(PortalMerchant portalMerchant)
{
SetBrowserFeatureControl();
if (string.IsNullOrEmpty(portalMerchant.Url))
return null;
var result = await GetUrlAsync(portalMerchant.Url);
if (!String.IsNullOrEmpty(result))
return new Uri(result);
throw new Exception("Parsing Failed");
}
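The Task.WhenAny timeout check used in GetUrlAsync can also be factored into a reusable extension method. This is only a sketch and not part of the original answer: TaskTimeoutExtensions and WithTimeoutOrDefault are hypothetical names, and the fallback value returned on timeout is an assumption about what the caller wants.

```csharp
using System;
using System.Threading.Tasks;

public static class TaskTimeoutExtensions
{
    // Hypothetical helper: returns the task's result, or `fallback`
    // if `timeout` elapses first. The original task is NOT cancelled;
    // it is simply no longer awaited.
    public static async Task<T> WithTimeoutOrDefault<T>(
        this Task<T> task, TimeSpan timeout, T fallback = default(T))
    {
        var delay = Task.Delay(timeout);
        var completed = await Task.WhenAny(task, delay);
        if (completed == task)
            return await task; // re-awaiting propagates any exception
        return fallback;
    }
}
```

With such a helper, the body of GetUrlAsync could shrink to `return await wb.NavigateAsync(url).WithTimeoutOrDefault(TimeSpan.FromSeconds(5));`. A navigation that times out keeps running until the WebBrowser is disposed.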
I'm trying to combine Noseratio's solution with the advice from Stephen Cleary.
Here is my updated code, which adds Noseratio's AJAX polling tip to Stephen's code.
First part: the Task NavigateAsync advised by Stephen
public static Task<string> NavigateAsync(this WebBrowser @this, string url)
{
var tcs = new TaskCompletionSource<string>();
WebBrowserDocumentCompletedEventHandler subscription = null;
subscription = (_, args) =>
{
@this.DocumentCompleted -= subscription;
tcs.TrySetResult(args.Url.ToString());
};
@this.DocumentCompleted += subscription;
@this.Navigate(url);
return tcs.Task;
}
Second part: a new Task NavAjaxAsync to run the tip for AJAX (based on Noseratio's code)
public static async Task<string> NavAjaxAsync(this WebBrowser @this)
{
// get the root element
var documentElement = @this.Document.GetElementsByTagName("html")[0];
// poll the current HTML for changes asynchronously
var html = documentElement.OuterHtml;
while (true)
{
// wait asynchronously
// (POLL_DELAY is assumed to be a constant defined elsewhere, in milliseconds)
await Task.Delay(POLL_DELAY);
// continue polling if the WebBrowser is still busy
if (@this.IsBusy)
continue;
var htmlNow = documentElement.OuterHtml;
if (html == htmlNow)
break; // no changes detected, end the poll loop
html = htmlNow;
}
return @this.Document.Url.ToString();
}
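The polling logic above can also be separated from the WebBrowser so that it can be exercised on its own. The following is a minimal sketch under stated assumptions: Polling, UntilStableAsync, and the snapshot/isBusy delegates are hypothetical names introduced for illustration, and the CancellationToken parameter is an addition that lets a caller stop a loop that never stabilizes.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class Polling
{
    // Hypothetical helper: polls snapshot() until two consecutive
    // reads taken while not busy return the same value.
    public static async Task<string> UntilStableAsync(
        Func<string> snapshot, Func<bool> isBusy,
        TimeSpan delay, CancellationToken token)
    {
        var previous = snapshot();
        while (true)
        {
            // wait asynchronously; throws if the token is cancelled
            await Task.Delay(delay, token);
            // keep polling while the source reports it is busy
            if (isBusy())
                continue;
            var current = snapshot();
            if (current == previous)
                return current; // no change detected, stop polling
            previous = current;
        }
    }
}
```

For the WebBrowser case, snapshot would read documentElement.OuterHtml and isBusy would read the browser's IsBusy property; both must be called on the browser's own UI thread.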
Third part: a new Task NavAndAjaxAsync to get the navigation and the AJAX
public static async Task<string> NavAndAjaxAsync(this WebBrowser @this, string url)
{
await @this.NavigateAsync(url);
// return the final URL so that callers can await a string result
return await @this.NavAjaxAsync();
}
Fourth and last part: the updated Task GetUrlAsync from Stephen with Noseratio's code for AJAX
async Task<string> GetUrlAsync(string url)
{
using (var wb = new WebBrowser())
{
var navigate = wb.NavAndAjaxAsync(url);
var timeout = Task.Delay(TimeSpan.FromSeconds(5));
var completed = await Task.WhenAny(navigate, timeout);
if (completed == navigate)
return await navigate;
return null;
}
}
I'd like to know if this is the right approach.
