I have a method that returns an IEnumerable of my business object. In this method I parse the content of a large text file into the business object model. There is no threading involved in it.
In my ViewModel (WPF) I need to store and display the results of the method.
The store is an ObservableCollection.
Here is the observable code:
private void OpenFile(string file)
{
_parser = new IhvParser();
App.Messenger.NotifyColleagues(Actions.ReportContentInfo, new Model.StatusInfoDisplayDTO { Information = "Lade Daten...", Interval = 0 });
_ihvDataList.Clear();
var obs = _parser.ParseDataObservable(file)
.ToObservable(NewThreadScheduler.Default)
.ObserveOnDispatcher()
.Subscribe<Ihv>(AddIhvToList, ReportError, ReportComplete);
}
private void ReportComplete()
{
App.Messenger.NotifyColleagues(Actions.ReportContentInfo, new Model.StatusInfoDisplayDTO { Information = "Daten fertig geladen.", Interval = 3000 });
RaisePropertyChanged(() => IhvDataList);
}
private void ReportError(Exception ex)
{
MessageBox.Show("...");
}
private void AddIhvToList(Ihv ihv)
{
_ihvDataList.Add(ihv);
}
And this is the parser code:
public IEnumerable<Model.Ihv> ParseDataObservable(string file)
{
using (StreamReader reader = new StreamReader(file))
{
var head = reader.ReadLine(); // first line is the header
if (!head.Contains("BayBAS") || !head.Contains("2.3.0"))
{
_logger.ErrorFormat("Die Datei {0} liegt nicht im BayBAS-Format 2.3.0 vor.");
}
else
{
while (!reader.EndOfStream)
{
var line = reader.ReadLine();
if (line.Length != 1415)
{
_logger.ErrorFormat("Die Datei {0} liegt nicht im BayBAS-Format 2.3.0 vor.");
break;
}
var tempIhvItem = Model.Ihv.Parse(line);
yield return tempIhvItem;
}
}
}
}
Why don't I get the results asynchronously? Before I see the results in my DataGrid, all items are parsed and delivered.
Can anybody help?
Andreas
Are you sure this isn't happening asynchronously? Are you assuming this based on what you perceive in the UI, or have you set breakpoints and determined that this is, in fact, the case?
Note that WPF's Dispatcher uses a priority queue, and DispatcherScheduler schedules items with Normal priority, which trumps the priority levels used for input, layout, and rendering. If the results come in quickly enough, then the UI may not get updated until after the last result has been processed: the dispatcher might be too busy processing results to perform layout and rendering of the UI.
You could try overriding the behavior of the DispatcherScheduler to schedule at a custom priority like so:
public class PriorityDispatcherScheduler : DispatcherScheduler
{
private readonly DispatcherPriority _priority;
public PriorityDispatcherScheduler(DispatcherPriority priority)
: this(priority, Dispatcher.CurrentDispatcher) {}
public PriorityDispatcherScheduler(DispatcherPriority priority, Dispatcher dispatcher)
: base(dispatcher)
{
_priority = priority;
}
public override IDisposable Schedule<TState>(TState state, Func<IScheduler, TState, IDisposable> action)
{
if (action == null)
throw new ArgumentNullException("action");
var d = new SingleAssignmentDisposable();
this.Dispatcher.BeginInvoke(
_priority,
(Action)(() =>
{
if (d.IsDisposed)
return;
d.Disposable = action(this, state);
}));
return d;
}
}
And then modify your observable sequence by replacing ObserveOnDispatcher() with ObserveOn(new PriorityDispatcherScheduler(p)), where p is an appropriate priority level (e.g., Background).
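For example, the Subscribe pipeline from the question would then look like this (a sketch; Background is just one reasonable choice of priority):
var obs = _parser.ParseDataObservable(file)
    .ToObservable(NewThreadScheduler.Default)
    .ObserveOn(new PriorityDispatcherScheduler(DispatcherPriority.Background))
    .Subscribe<Ihv>(AddIhvToList, ReportError, ReportComplete);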
Also, this looks highly suspect: ToObservable(NewThreadScheduler.Default). I believe this will cause a new thread to be created every time a result comes in, for the sole purpose of passing it to the dispatcher, after which the new thread will terminate. This is almost certainly not what you intended. I assume you simply wanted the file processed on a separate thread; as written, your code would literally end up creating 1,000 short-lived threads if your IEnumerable yields 1,000 items, none of which would actually be doing the work of reading the file.
Lastly, is OpenFile() being invoked on the dispatcher thread? If so, I believe what is going to happen is as follows:
Dispatcher (on UI thread) will call Subscribe(), which will process the chain of observable operators all the way back to ParseDataObservable(file).
Dispatcher will iterate through your IEnumerable sequence, firing each result into the observable sequence created by ToObservable().
Each result passed into the observable sequence will be scheduled for delivery on the dispatcher (the very same dispatcher that is currently running).
If this is the case, then the entire file will be read before any of the results get passed to AddIhvToList(), because the dispatcher is tied up reading the file and won't get around to processing the results in its queue until it has finished. If this is what is happening, you can try altering your code as follows:
var obs = _parser.ParseDataObservable(file)
.ToObservable()
.SubscribeOn(/*NewThread*/Scheduler.Default)
.ObserveOnDispatcher() // consider using PriorityDispatcherScheduler
.Subscribe<Ihv>(AddIhvToList, ReportError, ReportComplete);
Injecting SubscribeOn() should ensure that the iteration of your IEnumerable (i.e., the reading of the file) occurs on a separate thread. Scheduler.Default should suffice here, but you could use a NewThreadScheduler if you really need to (you probably don't). The dispatcher thread will return from Subscribe() after everything has been set up, freeing it up to continue processing its queue, i.e., passing the results to AddIhvToList() as they come in. This should give you the asynchronous behavior you desire.
Related
I know that asynchronous programming has seen a lot of changes over the years. I'm somewhat embarrassed that I let myself get this rusty at just 34 years old, but I'm counting on StackOverflow to bring me up to speed.
What I am trying to do is manage a queue of "work" on a separate thread, but in such a way that only one item is processed at a time. I want to post work on this thread and it doesn't need to pass anything back to the caller. Of course I could simply spin up a new Thread object and have it loop over a shared Queue object, using sleeps, interrupts, wait handles, etc. But I know things have gotten better since then. We have BlockingCollection, Task, async/await, not to mention NuGet packages that probably abstract a lot of that.
I know that "What's the best..." questions are generally frowned upon so I'll rephrase it by saying "What is the currently recommended..." way to accomplish something like this using built-in .NET mechanisms preferably. But if a third party NuGet package simplifies things a bunch, it's just as well.
I considered a TaskScheduler instance with a fixed maximum concurrency of 1, but it seems there is probably a much less clunky way to achieve that by now.
Background
Specifically, what I am trying to do in this case is queue an IP geolocation task during a web request. The same IP might wind up getting queued for geolocation multiple times, but the task will know how to detect that and skip out early if it's already been resolved. But the request handler is just going to throw these () => LocateAddress(context.Request.UserHostAddress) calls into a queue and let the LocateAddress method handle duplicate work detection. The geolocation API I am using doesn't like to be bombarded with requests, which is why I want to limit the work to a single concurrent task at a time. However, it would be nice if the approach could easily scale to more concurrent tasks with a simple parameter change.
To create an asynchronous single-degree-of-parallelism queue of work you can simply create a SemaphoreSlim, initialized to one, and then have the enqueuing method await the acquisition of that semaphore before starting the requested work.
public class TaskQueue
{
private SemaphoreSlim semaphore;
public TaskQueue()
{
semaphore = new SemaphoreSlim(1);
}
public async Task<T> Enqueue<T>(Func<Task<T>> taskGenerator)
{
await semaphore.WaitAsync();
try
{
return await taskGenerator();
}
finally
{
semaphore.Release();
}
}
public async Task Enqueue(Func<Task> taskGenerator)
{
await semaphore.WaitAsync();
try
{
await taskGenerator();
}
finally
{
semaphore.Release();
}
}
}
Of course, to have a fixed degree of parallelism other than one, simply initialize the semaphore to some other number.
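A usage sketch against the question's scenario (the Task.Run wrapper is an assumption, since LocateAddress is synchronous in the question):
private readonly TaskQueue _queue = new TaskQueue();
// In the request handler: queued work items run strictly one at a time.
public Task HandleRequestAsync(HttpContext context)
{
    return _queue.Enqueue(() => Task.Run(() => LocateAddress(context.Request.UserHostAddress)));
}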
Your best option as I see it is using TPL Dataflow's ActionBlock:
var actionBlock = new ActionBlock<string>(address =>
{
if (!IsDuplicate(address))
{
LocateAddress(address);
}
});
actionBlock.Post(context.Request.UserHostAddress);
TPL Dataflow is a robust, thread-safe, async-ready, and very configurable actor-based framework (available as a NuGet package).
Here's a simple example for a more complicated case. Let's assume you want to:
Enable concurrency (limited to the available cores).
Limit the queue size (so you won't run out of memory).
Have both LocateAddress and the queue insertion be async.
Cancel everything after an hour.
var actionBlock = new ActionBlock<string>(async address =>
{
if (!IsDuplicate(address))
{
await LocateAddressAsync(address);
}
}, new ExecutionDataflowBlockOptions
{
BoundedCapacity = 10000,
MaxDegreeOfParallelism = Environment.ProcessorCount,
CancellationToken = new CancellationTokenSource(TimeSpan.FromHours(1)).Token
});
await actionBlock.SendAsync(context.Request.UserHostAddress);
Actually you don't need to run tasks on one thread; you need them to run serially (one after another) and FIFO. TPL doesn't have a class for that, but here is my very lightweight, non-blocking implementation with tests: https://github.com/Gentlee/SerialQueue
Servy's implementation is also benchmarked there; the tests show it is twice as slow as mine, and it doesn't guarantee FIFO.
Example:
private readonly SerialQueue queue = new SerialQueue();
async Task SomeAsyncMethod()
{
var result = await queue.Enqueue(DoSomething);
}
Use BlockingCollection<Action> to create a producer/consumer pattern with one consumer (only one thing running at a time like you want) and one or many producers.
First define a shared queue somewhere:
BlockingCollection<Action> queue = new BlockingCollection<Action>();
In your consumer Thread or Task you take from it:
// This will block until there's an item available
Action itemToRun = queue.Take();
Then from any number of producers on other threads, simply add to the queue:
queue.Add(() => LocateAddress(context.Request.UserHostAddress));
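A minimal consumer sketch tying those fragments together (hosting it in Task.Run is my assumption):
Task.Run(() =>
{
    // GetConsumingEnumerable blocks while the queue is empty and
    // completes if CompleteAdding() is ever called on the collection.
    foreach (var itemToRun in queue.GetConsumingEnumerable())
    {
        itemToRun(); // items run one at a time, in FIFO order
    }
});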
I'm posting a different solution here. To be honest, I'm not sure whether this is a good solution.
I'm used to using BlockingCollection to implement a producer/consumer pattern, with a dedicated thread consuming those items. That's fine if data is always coming in and the consumer thread won't sit there doing nothing.
I encountered a scenario where one of the applications wanted to send emails on a different thread, but the total number of emails was not that big.
My initial solution was to have a dedicated consumer thread (created by Task.Run()), but a lot of the time it just sits there and does nothing.
Old solution:
private readonly BlockingCollection<EmailData> _Emails =
new BlockingCollection<EmailData>(new ConcurrentQueue<EmailData>());
// producer can add data here
public void Add(EmailData emailData)
{
_Emails.Add(emailData);
}
public void Run()
{
// create a consumer thread
Task.Run(() =>
{
foreach (var emailData in _Emails.GetConsumingEnumerable())
{
SendEmail(emailData);
}
});
}
// sending email implementation
private void SendEmail(EmailData emailData)
{
throw new NotImplementedException();
}
As you can see, if there are not enough emails to be sent (and that is my case), the consumer thread spends most of its time sitting there doing nothing at all.
I changed my implementation to:
// create an empty task
private Task _SendEmailTask = Task.Run(() => {});
// caller will dispatch the email to here
// continuewith will use a thread pool thread (different to
// _SendEmailTask thread) to send this email
private void Add(EmailData emailData)
{
_SendEmailTask = _SendEmailTask.ContinueWith((t) =>
{
SendEmail(emailData);
});
}
// actual implementation
private void SendEmail(EmailData emailData)
{
throw new NotImplementedException();
}
It's no longer a producer/consumer pattern, but it won't have a thread sitting there doing nothing; instead, every time there is an email to send, it will use a thread pool thread to do it.
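One hedged caveat about this pattern: if Add can be called from several threads at once, the read-modify-write of _SendEmailTask can race and drop a link in the chain. A small lock keeps the chain intact:
private readonly object _chainLock = new object();
private void Add(EmailData emailData)
{
    lock (_chainLock)
    {
        // Append to the chain atomically so no continuation is lost.
        _SendEmailTask = _SendEmailTask.ContinueWith(t => SendEmail(emailData));
    }
}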
My lib. It can:
Run queued items in random order
Handle multiple queues
Run prioritized items first
Re-queue items
Raise an event when all queues have completed
Cancel a running item, or cancel an item waiting to run
Dispatch events to the UI thread
public interface IQueue
{
bool IsPrioritize { get; }
bool ReQueue { get; }
/// <summary>
/// Don't use async
/// </summary>
/// <returns></returns>
Task DoWork();
bool CheckEquals(IQueue queue);
void Cancel();
}
public delegate void QueueComplete<T>(T queue) where T : IQueue;
public delegate void RunComplete();
public class TaskQueue<T> where T : IQueue
{
readonly List<T> Queues = new List<T>();
readonly List<T> Runnings = new List<T>();
[Browsable(false), DefaultValue((string)null)]
public Dispatcher Dispatcher { get; set; }
public event RunComplete OnRunComplete;
public event QueueComplete<T> OnQueueComplete;
int _MaxRun = 1;
public int MaxRun
{
get { return _MaxRun; }
set
{
bool flag = value > _MaxRun;
_MaxRun = value;
if (flag && Queues.Count != 0) RunNewQueue();
}
}
public int RunningCount
{
get { return Runnings.Count; }
}
public int QueueCount
{
get { return Queues.Count; }
}
public bool RunRandom { get; set; } = false;
// callers must lock Queues before calling this
void StartQueue(T queue)
{
if (null != queue)
{
Queues.Remove(queue);
lock (Runnings) Runnings.Add(queue);
queue.DoWork().ContinueWith(ContinueTaskResult, queue);
}
}
void RunNewQueue()
{
lock (Queues)//Prioritize
{
foreach (var q in Queues.Where(x => x.IsPrioritize)) StartQueue(q);
}
if (Runnings.Count >= MaxRun) return;//other
else if (Queues.Count == 0)
{
if (Runnings.Count == 0 && OnRunComplete != null)
{
if (Dispatcher != null && !Dispatcher.CheckAccess()) Dispatcher.Invoke(OnRunComplete);
else OnRunComplete.Invoke();//on completed
}
else return;
}
else
{
lock (Queues)
{
T queue;
if (RunRandom) queue = Queues.OrderBy(x => Guid.NewGuid()).FirstOrDefault();
else queue = Queues.FirstOrDefault();
StartQueue(queue);
}
if (Queues.Count > 0 && Runnings.Count < MaxRun) RunNewQueue();
}
}
void ContinueTaskResult(Task Result, object queue_obj) => QueueCompleted((T)queue_obj);
void QueueCompleted(T queue)
{
lock (Runnings) Runnings.Remove(queue);
if (queue.ReQueue) lock (Queues) Queues.Add(queue);
if (OnQueueComplete != null)
{
if (Dispatcher != null && !Dispatcher.CheckAccess()) Dispatcher.Invoke(OnQueueComplete, queue);
else OnQueueComplete.Invoke(queue);
}
RunNewQueue();
}
public void Add(T queue)
{
if (null == queue) throw new ArgumentNullException(nameof(queue));
lock (Queues) Queues.Add(queue);
RunNewQueue();
}
public void Cancel(T queue)
{
if (null == queue) throw new ArgumentNullException(nameof(queue));
lock (Queues) Queues.RemoveAll(o => o.CheckEquals(queue));
lock (Runnings) Runnings.ForEach(o => { if (o.CheckEquals(queue)) o.Cancel(); });
}
public void Reset(T queue)
{
if (null == queue) throw new ArgumentNullException(nameof(queue));
Cancel(queue);
Add(queue);
}
public void ShutDown()
{
MaxRun = 0;
lock (Queues) Queues.Clear();
lock (Runnings) Runnings.ForEach(o => o.Cancel());
}
}
I know this thread is old, but it seems all the present solutions are extremely onerous. The simplest way I could find uses the LINQ Aggregate function to create a daisy-chained list of tasks.
var arr = new int[] { 1, 2, 3, 4, 5};
var queue = arr.Aggregate(Task.CompletedTask,
(prev, item) => prev.ContinueWith(antecedent => PerformWorkHere(item)));
The idea is to get your data into an IEnumerable (I'm using an int array), and then reduce that enumerable to a chain of tasks, starting with a default completed task.
I have a MessagesManager thread to which different threads may send messages; this MessagesManager thread is then responsible for publishing those messages inside SendMessageToTcpIP() (the start point of the MessagesManager thread).
class MessagesManager : IMessageNotifier
{
//private
private readonly AutoResetEvent _waitTillMessageQueueEmptyARE = new AutoResetEvent(false);
private ConcurrentQueue<string> MessagesQueue = new ConcurrentQueue<string>();
public void PublishMessage(string Message)
{
MessagesQueue.Enqueue(Message);
_waitTillMessageQueueEmptyARE.Set();
}
public void SendMessageToTcpIP()
{
//keep waiting till a new message comes
while (MessagesQueue.Count() == 0)
{
_waitTillMessageQueueEmptyARE.WaitOne();
}
//Copy the Concurrent Queue into a local queue - keep dequeuing the item once it is inserts into the local Queue
Queue<string> localMessagesQueue = new Queue<string>();
while (!MessagesQueue.IsEmpty)
{
string message;
bool isRemoved = MessagesQueue.TryDequeue(out message);
if (isRemoved)
localMessagesQueue.Enqueue(message);
}
//Use the Local Queue for further processing
while (localMessagesQueue.Count() != 0)
{
TcpIpMessageSenderClient.ConnectAndSendMessage(localMessagesQueue.Dequeue().PadRight(80, ' '));
Thread.Sleep(2000);
}
}
}
The different threads (3-4) send their messages by calling PublishMessage(string Message) (using the same MessagesManager object). Once a message comes in, I push it into a concurrent queue and notify SendMessageToTcpIP() by calling _waitTillMessageQueueEmptyARE.Set(). Inside SendMessageToTcpIP(), I copy the messages from the concurrent queue into a local queue and then publish them one by one.
QUESTIONS: Is it thread safe to do enqueuing and dequeuing in this way? Could there be some strange effects due to it?
While this is probably thread-safe, there are built-in classes in .NET to help with the "many publishers, one consumer" pattern, like BlockingCollection. You can rewrite your class like this:
class MessagesManager : IDisposable {
// note that your ConcurrentQueue is still in play, passed to constructor
private readonly BlockingCollection<string> MessagesQueue = new BlockingCollection<string>(new ConcurrentQueue<string>());
public MessagesManager() {
// start consumer thread here
new Thread(SendLoop) {
IsBackground = true
}.Start();
}
public void PublishMessage(string Message) {
// no need to notify here, will be done for you
MessagesQueue.Add(Message);
}
private void SendLoop() {
// this blocks until new items are available
foreach (var item in MessagesQueue.GetConsumingEnumerable()) {
// ensure that you handle exceptions here, or the whole thing will break on an exception
TcpIpMessageSenderClient.ConnectAndSendMessage(item.PadRight(80, ' '));
Thread.Sleep(2000); // only if you are sure this is required
}
}
public void Dispose() {
// this will "complete" GetConsumingEnumerable, so your thread will complete
MessagesQueue.CompleteAdding();
MessagesQueue.Dispose();
}
}
.NET already provides ActionBlock<T>, which allows posting messages to a buffer and processing them asynchronously. By default, only one message is processed at a time.
Your code could be rewritten as:
//In an initialization function
ActionBlock<string> _hmiAgent = new ActionBlock<string>(async msg =>
{
    TcpIpMessageSenderClient.ConnectAndSendMessage(msg.PadRight(80, ' '));
    await Task.Delay(2000);
});
//In some other thread ...
foreach ( ....)
{
_hmiAgent.Post(someMessage);
}
// When the application closes
_hmiAgent.Complete();
await _hmiAgent.Completion;
ActionBlock offers many benefits - you can specify a limit to the number of items it can accept in a buffer and specify that multiple messages can be processed in parallel. You can also combine multiple blocks in a processing pipeline. In a desktop application, a message can be posted to a pipeline in response to an event, get processed by separate blocks and results posted to a final block that updates the UI.
Padding, for example, could be performed by an intermediary TransformBlock<TIn, TOut>. This transformation is trivial and the cost of using the block is greater than that of the method, but it's just an illustration:
//In an initialization function
TransformBlock<string, string> _hmiAgent = new TransformBlock<string, string>(
    msg => msg.PadRight(80, ' '));
ActionBlock<string> _tcpBlock = new ActionBlock<string>(async msg =>
{
    TcpIpMessageSenderClient.ConnectAndSendMessage(msg);
    await Task.Delay(2000);
});
var linkOptions = new DataflowLinkOptions { PropagateCompletion = true };
_hmiAgent.LinkTo(_tcpBlock, linkOptions);
The posting code doesn't change at all:
_hmiAgent.Post(someMessage);
When the application terminates, we need to wait for the _tcpBlock to complete:
_hmiAgent.Complete();
await _tcpBlock.Completion;
Visual Studio 2015+ itself uses TPL Dataflow for such scenarios
Bar Arnon provides a better example in TPL Dataflow Is The Best Library You're Not Using, which shows how both synchronous and asynchronous methods can be used in a block.
The code is thread-safe since both ConcurrentQueue and AutoResetEvent are thread-safe. Your strings are only ever read and never written to, so this code is thread-safe.
However, you have to make sure you call SendMessageToTcpIP in some sort of loop.
Otherwise, you have a dangerous race condition - some messages may get lost:
while (!MessagesQueue.IsEmpty)
{
string message;
bool isRemoved = MessagesQueue.TryDequeue(out message);
if (isRemoved)
localMessagesQueue.Enqueue(message);
}
//<<--- what happens if another thread enqueues a message here?
while (localMessagesQueue.Count() != 0)
{
TcpIpMessageSenderClient.ConnectAndSendMessage(localMessagesQueue.Dequeue().PadRight(80, ' '));
Thread.Sleep(2000);
}
Other than that, AutoResetEvent is an extremely heavy object. It uses a kernel object to synchronize threads, so every call is a system call, which may be costly. Consider using a user-mode synchronization object (doesn't .NET provide some sort of condition variable?).
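As a hedged aside to that parenthetical: Monitor.Wait/Pulse is the closest thing .NET has to a user-mode condition variable, so the kernel event could be replaced with something like the following sketch (not a drop-in replacement for every AutoResetEvent use):
private readonly object _gate = new object();
private bool _signaled;
public void Signal()
{
    lock (_gate)
    {
        _signaled = true;
        Monitor.Pulse(_gate); // wake one waiter
    }
}
public void Wait()
{
    lock (_gate)
    {
        while (!_signaled)
            Monitor.Wait(_gate); // releases the lock while waiting
        _signaled = false; // auto-reset behavior
    }
}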
This is a refactored code snippet showing how I would implement this functionality:
class MessagesManager {
private readonly AutoResetEvent messagesAvailableSignal = new AutoResetEvent(false);
private readonly ConcurrentQueue<string> messageQueue = new ConcurrentQueue<string>();
public void PublishMessage(string Message) {
messageQueue.Enqueue(Message);
messagesAvailableSignal.Set();
}
public void SendMessageToTcpIP() {
while (true) {
messagesAvailableSignal.WaitOne();
while (!messageQueue.IsEmpty) {
string message;
if (messageQueue.TryDequeue(out message)) {
TcpIpMessageSenderClient.ConnectAndSendMessage(message.PadRight(80, ' '));
}
}
}
}
}
Points to note here:
This drains the queue completely: if there is at least one message, it will process all of them
The 2000 ms Thread.Sleep is removed
I am working on a legacy application (WinForms app) where I need to improve performance.
In this application, we are using the MVP pattern, and the Shell uses reflection to find which presenter it needs to call in order to satisfy the user request. So there is a function which does the following tasks...
Find the appropriate presenter
Iterate through all its methods to find the default method reference
Prepare an array for the method's parameter inputs
Call the default method on the presenter
Return the presenter reference
Here is some code...
public object FindPresenter(Type pType, string action, Dictionary<string, object> paramDictonary, string callerName = null)
{
if (pType == null)
{
throw new ArgumentNullException("pType");
}
var presenterTypeName = pType.Name;
var presenter = _presenterFactory.Create(pType);
if (presenter == null)
{
    throw new SomeException(string.Format("Unable to resolve presenter"));
}
presenter.CallerName = callerName;
// Check each interface for the named method
MethodInfo method = null;
foreach (var i in presenter.GetType().GetInterfaces())
{
method = i.GetMethod(action, BindingFlags.FlattenHierarchy | BindingFlags.Public | BindingFlags.Instance);
if (method != null) break;
}
if (method == null)
{
throw new SomeException(string.Format("No action method found"));
}
// Match up parameters
var par = method.GetParameters();
object[] results = null;
if (paramDictonary != null)
{
if (par.Length != paramDictonary.Count)
throw new ArgumentException(
"Parameter mis-match");
results = (from d in paramDictonary
join p in par on d.Key equals p.Name
orderby p.Position
select d.Value).ToArray();
}
// Attach release action
presenter.ReleaseAction = () => _presenterFactory.Release(presenter);
// Invoke target method
method.Invoke(presenter, results);
return presenter;
}
This method takes around 15-20 seconds to complete and freezes the UI. I want to refactor this method with some async processing so the UI does not freeze during it. As I need to return the presenter reference, I thought of using the Wait() or Join() methods, but they will again lock the UI.
Please note, I am using .NET 4.0.
Unless you have millions of presenter types to search through, which is highly doubtful, and unless your average presenter has millions of parameters, which is also highly doubtful, there is nothing in the code that I see above that should take 15 seconds to execute.
So, the entirety of the delay is not in the code that you are showing us, but in one of the functions it invokes.
That could be in the very suspicious looking _presenterFactory.Create(pType); or in the implementation of paramDictionary if by any chance it happens to be a roll-your-own dictionary instead of a standard hash dictionary, or in the invocation of method.Invoke(presenter, results); itself.
Most likely in the last.
So, first of all, profile your code to find what the real culprit is.
Then, restructure your code so that the lengthy process happens on a separate worker thread. This may require that you pull considerable parts of your application out of the GUI thread and into that worker thread. Nobody said GUI programming was easy.
If the culprit is method.Invoke(), then this looks like something that you may be able to move to another thread relatively easily: Start that thread right before returning, and make sure everything that happens with each one of those presenter objects is thread safe.
But of course, if these presenters are trying to actually render stuff in the GUI, then you will have to go and refactor all of them too, to separate their computationally expensive logic from the rendering logic, because WinForms can only be invoked from within its own thread.
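A minimal sketch of that move on .NET 4.0 (no async/await), where UsePresenter is a hypothetical callback that consumes the presenter on the UI thread:
// Run the reflection-heavy invocation on a worker thread; marshal the result
// back via the WinForms synchronization context.
Task.Factory.StartNew(() => method.Invoke(presenter, results))
    .ContinueWith(t => UsePresenter(presenter),
                  TaskScheduler.FromCurrentSynchronizationContext());
Note this means FindPresenter can no longer hand back a fully initialized presenter synchronously; the caller receives it in the continuation instead.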
The easiest approach is to use Application.DoEvents() in your foreach loop to keep the UI responsive during long-running processes.
You may refer to MSDN system.windows.forms.application.doevents
But before using it you must also read Keeping your UI Responsive and the Dangers of Application.DoEvents
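For illustration only, and with the caveats from those links in mind (re-entrancy in particular), the call would sit inside the long-running loop, e.g.:
foreach (var i in presenter.GetType().GetInterfaces())
{
    Application.DoEvents(); // pump pending UI messages between iterations
    // ... existing reflection work ...
}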
Well, based on your comment: "My question is how can I refector this by putting some tasks in background which do not lock the UI."
Try this:
class Program
{
static void Main(string[] args)
{
// Note: Task.Run requires .NET 4.5; on .NET 4.0 use Task.Factory.StartNew instead.
Task t1 = Task.Run(() => FindPresenter(typeof(Program), "boo"))
.ContinueWith(x => UsePresenter(x.Result));
while (true)
{
Thread.Sleep(200);
Console.WriteLine("I am the ui thread. Press a key to exit.");
if ( Console.KeyAvailable)
break;
}
}
static object FindPresenter(Type type, string action)
{
// Your FindPresenter logic here
Thread.Sleep(1000);
return (object)"I'm a presenter";
}
static void UsePresenter(object presenter)
{
Console.WriteLine(presenter.ToString());
Console.WriteLine("Done using presenter.");
}
}
Sorry, I am not a WinForms guy... In WPF we have the Dispatcher for thread-affinity issues. You might try this:
System.Windows.Threading.Dispatcher and WinForms?
I am educating myself on Parallel.Invoke, and parallel processing in general, for use in a current project. I need a push in the right direction to understand how you can dynamically/intelligently allocate more parallel 'threads' as required.
As an example, say you are parsing large log files. This involves reading from a file, some sort of parsing of the returned lines, and finally writing to a database.
So to me this is a typical problem that can benefit from parallel processing.
As a simple first pass, the following code implements this:
Parallel.Invoke(
()=> readFileLinesToBuffer(),
()=> parseFileLinesFromBuffer(),
()=> updateResultsToDatabase()
);
Behind the scenes
readFileLinesToBuffer() reads each line and stores it to a buffer.
parseFileLinesFromBuffer() comes along, consumes lines from that buffer, and, let's say, puts them on another buffer so that updateResultsToDatabase() can come along and consume this buffer.
So the code shown assumes that each of the three steps uses the same amount of time/resources, but let's say parseFileLinesFromBuffer() is a long-running process, so instead of running just one of these methods you want to run two in parallel.
How can you have the code intelligently decide to do this based on any bottlenecks it might perceive?
Conceptually I can see how some approach of monitoring the buffer sizes might work, spawning a new 'thread' to consume the buffer at an increased rate, for example... but I figure this type of issue has been considered in putting together the TPL library.
Some sample code would be great, but I really just need a clue as to what concepts I should investigate next. It looks like maybe System.Threading.Tasks.TaskScheduler holds the key?
Have you tried the Reactive Extensions?
http://msdn.microsoft.com/en-us/data/gg577609.aspx
Rx is a new technology from Microsoft; its focus, as stated on the official site:
The Reactive Extensions (Rx)... ...is a library to compose
asynchronous and event-based programs using observable collections and
LINQ-style query operators.
You can download it as a Nuget package
https://nuget.org/packages/Rx-Main/1.0.11226
Since I am currently learning Rx, I wanted to take this example and just write code for it. The code I ended up with is not actually executed in parallel, but it is completely asynchronous and guarantees the source lines are processed in order.
Perhaps this is not the best implementation, but like I said, I am learning Rx (making it thread-safe would be a good improvement).
This is a DTO that I am using to return data from the background threads
class MyItem
{
public string Line { get; set; }
public int CurrentThread { get; set; }
}
These are the basic methods doing the real work. I am simulating the work time with a simple Thread.Sleep, and I am returning the thread used to execute each method via Thread.CurrentThread.ManagedThreadId. Note that ProcessLine sleeps for 4 seconds; it is the most time-consuming operation.
private IEnumerable<MyItem> ReadLinesFromFile(string fileName)
{
var source = from e in Enumerable.Range(1, 10)
let v = e.ToString()
select v;
foreach (var item in source)
{
Thread.Sleep(1000);
yield return new MyItem { CurrentThread = Thread.CurrentThread.ManagedThreadId, Line = item };
}
}
private MyItem UpdateResultToDatabase(string processedLine)
{
Thread.Sleep(700);
return new MyItem { Line = "s" + processedLine, CurrentThread = Thread.CurrentThread.ManagedThreadId };
}
private MyItem ProcessLine(string line)
{
Thread.Sleep(4000);
return new MyItem { Line = "p" + line, CurrentThread = Thread.CurrentThread.ManagedThreadId };
}
I am using the following method just to update the UI:
private void DisplayResults(MyItem myItem, Color color, string message)
{
this.listView1.Items.Add(
new ListViewItem(
new[]
{
message,
myItem.Line ,
myItem.CurrentThread.ToString(),
Thread.CurrentThread.ManagedThreadId.ToString()
}
)
{
ForeColor = color
}
);
}
And finally, this is the method that calls the Rx API:
private void PlayWithRx()
{
// we init the observable with the lines read from the file
var source = this.ReadLinesFromFile("some file").ToObservable(Scheduler.TaskPool);
source.ObserveOn(this).Subscribe(x =>
{
// for each line read, we update the UI
this.DisplayResults(x, Color.Red, "Read");
// for each line read, we subscribe the line to the ProcessLine method
var process = Observable.Start(() => this.ProcessLine(x.Line), Scheduler.TaskPool)
.ObserveOn(this).Subscribe(c =>
{
// for each line processed, we update the UI
this.DisplayResults(c, Color.Blue, "Processed");
// for each line processed we subscribe to the final process the UpdateResultToDatabase method
// finally, we update the UI when the line processed has been saved to the database
var persist = Observable.Start(() => this.UpdateResultToDatabase(c.Line), Scheduler.TaskPool)
.ObserveOn(this).Subscribe(z => this.DisplayResults(z, Color.Black, "Saved"));
});
});
}
This process runs entirely in the background.
In an async/await world, you'd have something like:
public async Task ProcessFileAsync(string filename)
{
var lines = await ReadLinesFromFileAsync(filename);
var parsed = await ParseLinesAsync(lines);
await UpdateDatabaseAsync(parsed);
}
Then a caller could just do var tasks = filenames.Select(ProcessFileAsync).ToArray(); and whatever fits the context (WaitAll, WhenAll, etc.).
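A sketch of that caller side (filenames is assumed to be an IEnumerable<string>):
public async Task ProcessAllAsync(IEnumerable<string> filenames)
{
    var tasks = filenames.Select(ProcessFileAsync).ToArray();
    await Task.WhenAll(tasks); // or Task.WaitAll(tasks) in synchronous code
}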
Use a couple of BlockingCollections. Here is an example.
The idea is that you create a producer that puts data into the collection
while (true) {
var data = ReadData();
blockingCollection1.Add(data);
}
Then you create any number of consumers that reads from the collection
while (true) {
var data = blockingCollection1.Take();
var processedData = ProcessData(data);
blockingCollection2.Add(processedData);
}
and so on
You can also let the TPL handle the number of consumers by using Parallel.ForEach:
Parallel.ForEach(blockingCollection1.GetConsumingPartitioner(),
data => {
var processedData = ProcessData(data);
blockingCollection2.Add(processedData);
});
(Note that you need to use GetConsumingPartitioner, not GetConsumingEnumerable; see here.)
Why will Parallel.ForEach not finish executing a series of tasks until MoveNext returns false?
I have a tool that monitors a combination of MSMQ and Service Broker queues for incoming messages. When a message is found, it hands that message off to the appropriate executor.
I wrapped the check for messages in an IEnumerable, so that I could hand the Parallel.ForEach method the IEnumerable plus a delegate to run. The application is designed to run continuously, with IEnumerator.MoveNext processing in a loop until it's able to get work, and IEnumerator.Current giving it the next item.
Since MoveNext will never return false until I set the CancelToken, this should continue to process forever. Instead, what I'm seeing is that once Parallel.ForEach has picked up all the messages and MoveNext is no longer returning true, no more tasks are processed. It seems like the MoveNext thread is the only thread given any work while the framework waits for it to return, and the other threads (including waiting and scheduled threads) do not do any work.
Is there a way to tell the Parallel to keep working while it waits for a response from the MoveNext?
If not, is there another way to structure the MoveNext to get what I want? (having it return true and then the Current returning a null object spawns a lot of bogus Tasks)
Bonus Question: Is there a way to limit how many messages the Parallel pulls off at once? It seems to pull off and schedule a lot of messages at once (the MaxDegreeOfParallelism only seems to limit how much work it does at once, it doesn't stop it from pulling off a lot of messages to be scheduled)
Here is the IEnumerator for what I've written (w/o some extraneous code):
public class DataAccessEnumerator : IEnumerator<TransportMessage>
{
public TransportMessage Current
{ get { return _currentMessage; } }
public bool MoveNext()
{
while (_cancelToken.IsCancellationRequested == false)
{
TransportMessage current;
foreach (var task in _tasks)
{
if (task.QueueType.ToUpper() == "MSMQ")
current = _msmq.Get(task.Name);
else
current = _serviceBroker.Get(task.Name);
if (current != null)
{
_currentMessage = current;
return true;
}
}
WaitHandle.WaitAny(new [] {_cancelToken.WaitHandle}, 500);
}
return false;
}
public DataAccessEnumerator(IDataAccess<TransportMessage> serviceBroker, IDataAccess<TransportMessage> msmq, IList<JobTask> tasks, CancellationToken cancelToken)
{
_serviceBroker = serviceBroker;
_msmq = msmq;
_tasks = tasks;
_cancelToken = cancelToken;
}
private readonly IDataAccess<TransportMessage> _serviceBroker;
private readonly IDataAccess<TransportMessage> _msmq;
private readonly IList<JobTask> _tasks;
private readonly CancellationToken _cancelToken;
private TransportMessage _currentMessage;
}
Here is the Parallel.ForEach call where _queueAccess is the IEnumerable that holds the above IEnumerator and RunJob processes a TransportMessage that is returned from that IEnumerator:
var parallelOptions = new ParallelOptions
{
CancellationToken = _cancelTokenSource.Token,
MaxDegreeOfParallelism = 8
};
Parallel.ForEach(_queueAccess, parallelOptions, x => RunJob(x));
It sounds to me like Parallel.ForEach isn't really a good match for what you want to do. I suggest you use BlockingCollection<T> to create a producer/consumer queue instead - create a bunch of threads/tasks to service the blocking collection, and add work items to it as and when they arrive.
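A minimal sketch of that setup, reusing RunJob and the cancellation token from the question (the consumer count of 8 mirrors the MaxDegreeOfParallelism above and is otherwise arbitrary):
var work = new BlockingCollection<TransportMessage>();
// Consumers: a fixed set of long-running tasks servicing the collection.
var consumers = Enumerable.Range(0, 8)
    .Select(_ => Task.Factory.StartNew(() =>
    {
        foreach (var message in work.GetConsumingEnumerable(_cancelTokenSource.Token))
            RunJob(message);
    }, _cancelTokenSource.Token, TaskCreationOptions.LongRunning, TaskScheduler.Default))
    .ToArray();
// Producers: poll MSMQ/Service Broker and call work.Add(message) as items arrive;
// call work.CompleteAdding() on shutdown so the consumers drain and exit.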
Your problem might be to do with the Partitioner being used.
In your case, the TPL will choose the Chunk Partitioner, which will take multiple items from the enumerable before passing them on to be processed. The number of items taken in each chunk will increase with time.
When your MoveNext method blocks, the TPL is left waiting for the next item and won't process the items that it has already taken.
You have a couple of options to fix this:
1) Write a Partitioner that always returns individual items. Not as tricky as it sounds (see the sketch at the end of this answer).
2) Use the TPL instead of Parallel.ForEach:
foreach ( var item in _queueAccess )
{
var capturedItem = item;
Task.Factory.StartNew( () => RunJob( capturedItem ) );
}
The second solution changes the behaviour a bit. The foreach loop will complete when all the Tasks have been created, not when they have finished. If this is a problem for you, you can add a CountdownEvent:
var ce = new CountdownEvent( 1 );
foreach ( var item in _queueAccess )
{
ce.AddCount();
var capturedItem = item;
Task.Factory.StartNew( () => { RunJob( capturedItem ); ce.Signal(); } );
}
ce.Signal();
ce.Wait();
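For option 1, a hedged shortcut: on .NET 4.5+ the built-in NoBuffering partitioner already hands items out one at a time, so no custom Partitioner is needed:
// Makes Parallel.ForEach take one item at a time instead of growing chunks.
var partitioner = Partitioner.Create(_queueAccess, EnumerablePartitionerOptions.NoBuffering);
Parallel.ForEach(partitioner, parallelOptions, x => RunJob(x));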
I haven't gone to the effort to make sure of this, but the impression I'd received from discussions of Parallel.ForEach was that it would pull all the items out of the enumerable, then make appropriate decisions about how to divide them across threads. Based on your problem, that seems correct.
So, to keep most of your current code, you should probably pull the blocking code out of the iterator and place it into a loop around the call to Parallel.ForEach (which uses the iterator).
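A hedged sketch of that restructuring; FetchAvailableMessages is a hypothetical helper that drains whatever is currently available from the MSMQ/Service Broker accessors without blocking:
while (!_cancelTokenSource.IsCancellationRequested)
{
    var batch = FetchAvailableMessages(); // hypothetical non-blocking drain
    if (batch.Count == 0)
    {
        // Nothing available: wait briefly (or until cancellation) and poll again.
        _cancelTokenSource.Token.WaitHandle.WaitOne(500);
        continue;
    }
    // The enumerable handed to Parallel.ForEach is now finite, so the call
    // completes and the outer loop goes back to polling.
    Parallel.ForEach(batch, parallelOptions, x => RunJob(x));
}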