My C# application stops responding for a long time; when I break into the debugger, it is stopped inside this function:
foreach (var item in list)
{
    xmldiff.Compare(item, secondary, output);
    ...
}
I guess this function either takes a long time to run or hangs. Either way, I want to wait a certain time (e.g. 5 seconds) for it to execute, and if it exceeds this time, skip it and go to the next item in the loop. How can I do that? I found some similar questions, but they are mostly about processes or asynchronous methods.
You can do it the brutal way: spin up a thread to do the work, join it with a timeout, and abort it if the join times out.
Example:
var worker = new Thread(() => xmldiff.Compare(item, secondary, output));
worker.Start();
if (!worker.Join(TimeSpan.FromSeconds(5)))
    worker.Abort();
But be warned - aborting threads is not considered nice and can make your app unstable. If at all possible try to modify Compare to accept a CancellationToken to cancel the comparison.
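Cooperative cancellation would look something like this (a sketch only; the token-taking overload of Compare is hypothetical and would require changing or wrapping the library):

using (var cts = new CancellationTokenSource(TimeSpan.FromSeconds(5)))
{
    try
    {
        // Hypothetical overload: Compare would have to check the token
        // periodically and throw when cancellation is requested.
        xmldiff.Compare(item, secondary, output, cts.Token);
    }
    catch (OperationCanceledException)
    {
        // Timed out: skip this item and move on to the next one.
    }
}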
I would avoid directly using threads and use Microsoft's Reactive Extensions (NuGet "Rx-Main") to abstract away the management of the threads.
I don't know the exact signature of xmldiff.Compare(item, secondary, output) but if I assume it produces an integer then I could do this with Rx:
var query =
    from item in list.ToObservable()
    from result in
        Observable
            .Start(() => xmldiff.Compare(item, secondary, output))
            .Timeout(TimeSpan.FromSeconds(5.0), Observable.Return(-1))
    select new { item, result };

var subscription =
    query
        .Subscribe(x =>
        {
            /* do something with `x.item` and/or `x.result` */
        });
This automatically iterates through each item and starts a background computation of xmldiff.Compare, but allows each computation at most 5.0 seconds before falling back to a default value of -1.
The subscription variable is an IDisposable, so if you want to abort the entire query before it completes just call .Dispose().
I skip it and go to the next item in the loop
By "skip it", do you mean "leave it there" or "cancel it"? The two scenarios are quite different. But for both two I suggest you use Task.
// generate 10 example tasks (not started yet)
var tasks = Enumerable
    .Range(0, 10)
    .Select(n => new Task(() => DoSomething(n)))
    .ToList();

var maxExecutionTime = TimeSpan.FromSeconds(5);

foreach (var task in tasks)
{
    task.Start(); // run the tasks one by one
    if (task.Wait(maxExecutionTime))
    {
        // the task finished in time
    }
    else
    {
        // the task is over time;
        // just leave it there and let the loop continue.
        // If you want to cancel it instead, see
        // http://stackoverflow.com/questions/4783865/how-do-i-abort-cancel-tpl-tasks
    }
}
One thing to improve: do you really need to run your tasks one by one? If they are independent, you can run them in parallel, as sketched below.
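For instance (a sketch, inside an async method, reusing the hypothetical DoSomething from above): start everything at once and give each task its own deadline with Task.WhenAny:

var timeout = TimeSpan.FromSeconds(5);
var tasks = Enumerable
    .Range(0, 10)
    .Select(n => Task.Run(() => DoSomething(n)))
    .ToList();

// Give each task at most `timeout` from this point; Task.WhenAny
// completes with whichever of the pair finishes first.
var waits = tasks.Select(async task =>
{
    var first = await Task.WhenAny(task, Task.Delay(timeout));
    return first == task; // true if the work beat its deadline
});

bool[] finishedInTime = await Task.WhenAll(waits);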
Related
In my code I have a method such as:
void PerformWork(List<Item> items)
{
    HostingEnvironment.QueueBackgroundWorkItem(async cancellationToken =>
    {
        foreach (var item in items)
        {
            await itemHandler.PerformIndividualWork(item);
        }
    });
}
Where Item is just a known model and itemHandler just does some work based on the model (the ItemHandler class is defined in a separately maintained code base, shipped as a NuGet package I'd rather not modify).
The purpose of this code is to have the work for a list of items done in the background but synchronously (one item at a time, in order).
As part of the work, I would like to create a unit test to verify that when this method is called, the items are handled synchronously. I'm pretty sure the issue can be simplified down to this:
await MyTask(1);
await MyTask(2);
Assert.IsTrue(/* MyTask with arg 1 was completed before MyTask with arg 2 */);
The first part of this code I can easily unit test is that the sequence is maintained. For example, using NSubstitute I can check method call order on the library code:
Received.InOrder(() =>
{
itemHandler.PerformIndividualWork(Arg.Is<Item>(arg => arg.Name == "First item"));
itemHandler.PerformIndividualWork(Arg.Is<Item>(arg => arg.Name == "Second item"));
itemHandler.PerformIndividualWork(Arg.Is<Item>(arg => arg.Name == "Third item"));
});
But I'm not quite sure how to ensure that they aren't run in parallel. I've had several ideas that seem bad, like mocking the library to add an artificial delay when PerformIndividualWork is called, and then either checking the elapsed time of the whole queued background task or checking the timestamps of the itemHandler received calls for a minimum gap between calls. For instance, if I mock PerformIndividualWork to delay 500 milliseconds and I'm expecting three items, then I could check the elapsed time:
stopwatch.Start();
// I have an interface instead of directly calling HostingEnvironment, so I can access the task being queued here
backgroundTask.Invoke(...);
stopwatch.Stop();
Assert.IsTrue(stopwatch.ElapsedMilliseconds > 1500);
But that doesn't feel right and could lead to false positives. Perhaps the solution lies in modifying the code itself; however, I can't think of a way of meaningfully changing it to make this sort of unit test (testing that tasks run in order) possible. We'll definitely have system/integration testing to ensure the issues caused by handling the individual items asynchronously don't happen, but I would like to have test coverage at this level as well.
Not sure if this is a good idea, but one approach could be to use an itemHandler that will detect when items are handled in parallel. Here is a quick and dirty example:
public class AssertSynchronousItemHandler : IItemHandler
{
private volatile int concurrentWork = 0;
public List<Item> Items = new List<Item>();
public Task PerformIndividualWork(Item item) =>
Task.Run(() => {
var result = Interlocked.Increment(ref concurrentWork);
if (result != 1) {
throw new Exception($"Expected 1 work item running at a time, but got {result}");
}
Items.Add(item);
var after = Interlocked.Decrement(ref concurrentWork);
if (after != 0) {
throw new Exception($"Expected 0 work items running once this item finished, but got {after}");
}
});
}
There are probably big problems with this, but the basic idea is to check how many items are already being handled when we enter the method, then decrement the counter on the way out and check that no other items started in the meantime. With threading it is very hard to make guarantees from tests alone, but with enough items processed this can give us some confidence that it is working as expected:
[Fact]
public void Sample() {
var handler = new AssertSynchronousItemHandler();
var subject = new Subject(handler);
var input = Enumerable.Range(0, 100).Select(x => new Item(x.ToString())).ToList();
subject.PerformWork(input);
// With the code from the question we don't have a way of detecting
// when `PerformWork` finishes. If we can't change this we need to make
// sure we wait "long enough". Yes this is yuck. :)
Thread.Sleep(1000);
Assert.Equal(input, handler.Items);
}
If I modify PerformWork to do things in parallel I get the test failing:
public void PerformWork2(List<Item> items) {
Task.WhenAll(
items.Select(item => itemHandler.PerformIndividualWork(item))
).Wait(2000);
}
// ---- System.Exception : Expected 1 work item running at a time, but got 4
That said, if running synchronously is very important and it is not apparent from glancing at the async/await implementation, then maybe it is worth using a more obviously synchronous design, such as a queue serviced by only one thread, so that synchronous execution is guaranteed by design and people won't inadvertently make it concurrent during refactoring (i.e. it is deliberately synchronous and documented that way).
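A minimal sketch of such a design, using System.Threading.Channels (names follow the question; these would be members of the class that owns PerformWork):

// Deliberately synchronous by design: a single consumer services the
// channel, so items can only ever be handled one at a time, in order.
private readonly Channel<Item> _work = Channel.CreateUnbounded<Item>();

public void PerformWork(List<Item> items)
{
    foreach (var item in items)
        _work.Writer.TryWrite(item);
}

// Started exactly once (e.g. at application startup): the only reader.
private async Task ConsumeAsync(CancellationToken cancellationToken)
{
    await foreach (var item in _work.Reader.ReadAllAsync(cancellationToken))
        await itemHandler.PerformIndividualWork(item);
}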
I am using IBackgroundJobManager with Hangfire integration.
Use case:
I am processing a single uploaded file. After the file is saved, I would like to start two separate Abp.BackgroundJobs sequentially. Only after the first job completes, should the second job start.
Here is my code:
var measureJob1 = await _backgroundJobManager.EnqueueAsync<FileProcessBackgroundJob, FileProcessJobArgsDto>(
    new FileProcessJobArgsDto
    {
        Id = Id,
        User = user,
    })
    .ContinueWith<AnalyticsBackgroundJob<AnalyticsJobArgsDto>>("measureJob", x => x);
Problem:
I cannot figure out the syntax for what I need when using .ContinueWith<???>(???).
This seems to be an XY Problem.
How to start the second background job after the first completes? (Problem X)
There is no support for guaranteeing the sequential and conditional execution of jobs.
That said, background jobs are automatically retried, which allows for Way 3 below.
So, you can effectively achieve that by one of these ways:
Combine the two jobs into one job.
Enqueue the second job inside the first job (see the sketch after this list).
Enqueue both jobs (you don't need to use ContinueWith, await is much neater). In the second job, check if the first job created what it needed to. Otherwise, throw an exception and rely on retry.
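A sketch of Way 2 (assuming ABP's BackgroundJob<TArgs> base class and constructor injection; ProcessFile is a hypothetical stand-in for the first job's real work):

public class FileProcessBackgroundJob : BackgroundJob<FileProcessJobArgsDto>, ITransientDependency
{
    private readonly IBackgroundJobManager _backgroundJobManager;

    public FileProcessBackgroundJob(IBackgroundJobManager backgroundJobManager)
    {
        _backgroundJobManager = backgroundJobManager;
    }

    public override void Execute(FileProcessJobArgsDto args)
    {
        ProcessFile(args); // hypothetical: the first job's actual work

        // Only reached if ProcessFile succeeded, so the second job is
        // enqueued strictly after the first job's work has completed.
        _backgroundJobManager.EnqueueAsync<AnalyticsBackgroundJob, AnalyticsJobArgsDto>(
            new AnalyticsJobArgsDto("measureJob")).GetAwaiter().GetResult();
    }
}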
What is the syntax for what I need when using ContinueWith? (Solution Y)
There is no syntax for that, other than a less desirable variant of Way 3.
Task.ContinueWith is a C# construct that runs after the EnqueueAsync task, not after the actual background job.
The syntax for Way 3 with ContinueWith would be something like:
var measureJobId = await _backgroundJobManager.EnqueueAsync<FileProcessBackgroundJob, FileProcessJobArgsDto>(
    new FileProcessJobArgsDto
    {
        Id = Id,
        User = user,
    })
    .ContinueWith(task => _backgroundJobManager.EnqueueAsync<AnalyticsBackgroundJob, AnalyticsJobArgsDto>(
        new AnalyticsJobArgsDto("measureJob")))
    .Unwrap();
Compare that to:
var processJobId = await _backgroundJobManager.EnqueueAsync<FileProcessBackgroundJob, FileProcessJobArgsDto>(
    new FileProcessJobArgsDto
    {
        Id = Id,
        User = user,
    });

var measureJobId = await _backgroundJobManager.EnqueueAsync<AnalyticsBackgroundJob, AnalyticsJobArgsDto>(
    new AnalyticsJobArgsDto("measureJob"));
I have a C# requirement for individually processing a 'great many' (perhaps > 100,000) records. Running this process sequentially is proving to be very slow with each record taking a good second or so to complete (with a timeout error set at 5 seconds).
I would like to try running these tasks asynchronously by using a set number of worker 'threads' (I use the term 'thread' here cautiously as I am not sure if I should be looking at a thread, or a task or something else).
I have looked at the ThreadPool, but I can't imagine it could queue the volume of requests required. My ideal pseudo code would look something like this...
public void ProcessRecords() {
    SetMaxNumberOfThreads(20);
    MyRecord rec;
    while ((rec = GetNextRecord()) != null) {
        var task = WaitForNextAvailableThreadFromPool(ProcessRecord(rec));
        task.Start();
    }
}
I will also need a mechanism by which the processing method can report back to the parent/calling class.
Can anyone point me in the right direction with perhaps some example code?
A possible simple solution would be to use a TPL Dataflow block, which is a higher-level abstraction over the TPL with configuration for degree of parallelism and so forth. You simply create the block (an ActionBlock in this case), Post everything to it, wait asynchronously for completion, and TPL Dataflow handles all the rest for you:
var block = new ActionBlock<MyRecord>(
rec => ProcessRecord(rec),
new ExecutionDataflowBlockOptions{MaxDegreeOfParallelism = 20});
MyRecord rec;
while ((rec = GetNextRecord()) != null)
{
block.Post(rec);
}
block.Complete();
await block.Completion;
Another benefit is that the block starts working as soon as the first record arrives and not only when all the records have been received.
If you need to report back on each record you can use a TransformBlock to do the actual processing and link an ActionBlock to it that does the updates:
var transform = new TransformBlock<MyRecord, Report>(rec =>
{
    ProcessRecord(rec);
    return GenerateReport(rec);
}, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 20 });

var reporter = new ActionBlock<Report>(report =>
{
    RaiseEvent(report); // Or any other mechanism...
});

transform.LinkTo(reporter, new DataflowLinkOptions { PropagateCompletion = true });

MyRecord rec;
while ((rec = GetNextRecord()) != null)
{
    transform.Post(rec);
}

transform.Complete();
await transform.Completion;
Have you thought about using parallel processing with Actions?
i.e., create a method to process a single record, add each record's processing as an Action to a list, and then run the list through Parallel.ForEach.
Dim list As New List(Of Action)
list.Add(New Action(Sub() MyMethod(myParameter)))
Parallel.ForEach(list, Sub(t) t.Invoke())
This is in vb.net, but I think you get the gist.
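For reference, the same shape in C# (a direct translation; MyMethod and myParameter are from the snippet above):

var list = new List<Action>();
list.Add(() => MyMethod(myParameter));
// ... add one Action per record ...
Parallel.ForEach(list, action => action());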
I have to create concurrent software which creates several Tasks, and every Task could generate another Task (which could in turn generate another, and so on).
I need the call to the method which launches the tasks to be blocking: no return BEFORE all tasks and subtasks are completed.
I know there is the TaskCreationOptions.AttachedToParent option, but I think it will not fit:
The server will have something like 8 cores at least, and each task will create 2-3 subtasks, so if I set the AttachedToParent option, I have the impression that the second subtask will not start before the three subtasks of the first one end. So I would have limited multitasking here.
So with this process tree (say A spawns B and C, and B spawns E, F, and G):
I have the impression that if I set the AttachedToParent option every time I launch a task, B will not end before E, F, and G are finished, so C will not start before B finishes, and I will have only 3 active threads instead of the 8 I could have.
If I don't set the AttachedToParent option, A will finish very quickly and return.
So how can I ensure that my 8 cores are always fully used if I don't set this option?
TaskCreationOptions.AttachedToParent does not prevent the other subtasks from starting; rather, it prevents the parent task itself from completing. So when E, F, and G are started with AttachedToParent, B is not flagged as finished until all three are finished. So it should do just what you want.
The source (in the accepted answer).
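A quick sketch of that behaviour (Work is a hypothetical placeholder for the real task bodies):

// C still starts immediately alongside B; AttachedToParent only means
// B does not count as complete until E, F and G have finished.
var b = Task.Factory.StartNew(() =>
{
    Task.Factory.StartNew(() => Work("E"), TaskCreationOptions.AttachedToParent);
    Task.Factory.StartNew(() => Work("F"), TaskCreationOptions.AttachedToParent);
    Task.Factory.StartNew(() => Work("G"), TaskCreationOptions.AttachedToParent);
});
var c = Task.Factory.StartNew(() => Work("C")); // runs concurrently with B
b.Wait(); // returns only after E, F and G have all finished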
As Me.Name mentioned, AttachedToParent doesn't behave according to your impressions. I think it's a fine option in this case.
But if you don't want to use that for whatever reason, you can wait for all the child tasks to finish with Task.WaitAll(), although that means you have to keep all of them in a collection.
Task.WaitAll() blocks the current thread until all the Tasks are finished. If you don't want that and you are on .Net 4.5, you can use Task.WhenAll(), which will return a single Task that will finish when all of the given Tasks finish.
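For example (DoWork is hypothetical):

var children = new List<Task>
{
    Task.Run(() => DoWork("B")),
    Task.Run(() => DoWork("C")),
    Task.Run(() => DoWork("D")),
};

Task.WaitAll(children.ToArray()); // blocks the current thread
// or, on .NET 4.5 inside an async method:
await Task.WhenAll(children);     // frees the thread while waiting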
You could use TaskFactory creation options as in this example:
Task parent = new Task(() => {
var cts = new CancellationTokenSource();
var tf = new TaskFactory<Int32>(cts.Token,
TaskCreationOptions.AttachedToParent,
TaskContinuationOptions.ExecuteSynchronously,
TaskScheduler.Default);
// This tasks creates and starts 3 child tasks
var childTasks = new[] {
tf.StartNew(() => Sum(cts.Token, 10000)),
tf.StartNew(() => Sum(cts.Token, 20000)),
tf.StartNew(() => Sum(cts.Token, Int32.MaxValue)) // Too big, throws Overflow
};
// If any of the child tasks throw, cancel the rest of them
for (Int32 task = 0; task < childTasks.Length; task++)
childTasks[task].ContinueWith(
t => cts.Cancel(), TaskContinuationOptions.OnlyOnFaulted);
// When all children are done, get the maximum value returned from the
// non-faulting/canceled tasks. Then pass the maximum value to another
// task which displays the maximum result
tf.ContinueWhenAll(
childTasks,
completedTasks => completedTasks.Where(
t => !t.IsFaulted && !t.IsCanceled).Max(t => t.Result), CancellationToken.None)
.ContinueWith(t => Console.WriteLine("The maximum is: " + t.Result),
TaskContinuationOptions.ExecuteSynchronously);
});
// When the children are done, show any unhandled exceptions too
parent.ContinueWith(p => {
// I put all this text in a StringBuilder and call Console.WriteLine just once
// because this task could execute concurrently with the task above & I don't
// want the tasks' output interspersed
StringBuilder sb = new StringBuilder(
"The following exception(s) occurred:" + Environment.NewLine);
foreach (var e in p.Exception.Flatten().InnerExceptions)
sb.AppendLine(" "+ e.GetType().ToString());
Console.WriteLine(sb.ToString());
}, TaskContinuationOptions.OnlyOnFaulted);
// Start the parent Task so it can start its children
parent.Start();
Why will Parallel.ForEach not finish executing a series of tasks until MoveNext returns false?
I have a tool that monitors a combination of MSMQ and Service Broker queues for incoming messages. When a message is found, it hands that message off to the appropriate executor.
I wrapped the check for messages in an IEnumerable, so that I could hand the Parallel.ForEach method the IEnumerable plus a delegate to run. The application is designed to run continuously w/ the IEnumerator.MoveNext processing in a loop until it's able to get work, then the IEnumerator.Current giving it the next item.
Since MoveNext will never return false until I cancel via the CancellationToken, this should continue to process forever. Instead, what I'm seeing is that once Parallel.ForEach has picked up all the messages and MoveNext is no longer returning true, no more tasks are processed. It seems the MoveNext thread is the only thread given any work while the loop waits for it to return, and the other threads (including waiting and scheduled threads) do not do any work.
Is there a way to tell the Parallel to keep working while it waits for a response from the MoveNext?
If not, is there another way to structure the MoveNext to get what I want? (having it return true and then the Current returning a null object spawns a lot of bogus Tasks)
Bonus Question: Is there a way to limit how many messages the Parallel pulls off at once? It seems to pull off and schedule a lot of messages at once (the MaxDegreeOfParallelism only seems to limit how much work it does at once, it doesn't stop it from pulling off a lot of messages to be scheduled)
Here is the IEnumerator for what I've written (w/o some extraneous code):
public class DataAccessEnumerator : IEnumerator<TransportMessage>
{
public TransportMessage Current
{ get { return _currentMessage; } }
public bool MoveNext()
{
while (_cancelToken.IsCancellationRequested == false)
{
TransportMessage current;
foreach (var task in _tasks)
{
if (task.QueueType.ToUpper() == "MSMQ")
current = _msmq.Get(task.Name);
else
current = _serviceBroker.Get(task.Name);
if (current != null)
{
_currentMessage = current;
return true;
}
}
WaitHandle.WaitAny(new [] {_cancelToken.WaitHandle}, 500);
}
return false;
}
public DataAccessEnumerator(IDataAccess<TransportMessage> serviceBroker, IDataAccess<TransportMessage> msmq, IList<JobTask> tasks, CancellationToken cancelToken)
{
_serviceBroker = serviceBroker;
_msmq = msmq;
_tasks = tasks;
_cancelToken = cancelToken;
}
private readonly IDataAccess<TransportMessage> _serviceBroker;
private readonly IDataAccess<TransportMessage> _msmq;
private readonly IList<JobTask> _tasks;
private readonly CancellationToken _cancelToken;
private TransportMessage _currentMessage;
}
Here is the Parallel.ForEach call where _queueAccess is the IEnumerable that holds the above IEnumerator and RunJob processes a TransportMessage that is returned from that IEnumerator:
var parallelOptions = new ParallelOptions
{
CancellationToken = _cancelTokenSource.Token,
MaxDegreeOfParallelism = 8
};
Parallel.ForEach(_queueAccess, parallelOptions, x => RunJob(x));
It sounds to me like Parallel.ForEach isn't really a good match for what you want to do. I suggest you use BlockingCollection<T> to create a producer/consumer queue instead - create a bunch of threads/tasks to service the blocking collection, and add work items to it as and when they arrive.
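A rough sketch of that shape, reusing names from the question (PollQueues is a hypothetical blocking call built from the polling logic currently inside MoveNext):

// Bounded so the producer can't run away from the consumers.
var queue = new BlockingCollection<TransportMessage>(boundedCapacity: 100);

// Consumers: a fixed pool of workers servicing the collection.
// GetConsumingEnumerable blocks until an item arrives or adding completes.
var workers = Enumerable.Range(0, 8)
    .Select(_ => Task.Run(() =>
    {
        foreach (var msg in queue.GetConsumingEnumerable(_cancelTokenSource.Token))
            RunJob(msg);
    }))
    .ToArray();

// Producer: poll MSMQ / Service Broker and hand each message to the workers.
TransportMessage message;
while ((message = PollQueues()) != null) // hypothetical
    queue.Add(message);

queue.CompleteAdding(); // lets GetConsumingEnumerable drain and exit
Task.WaitAll(workers);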
Your problem might be to do with the Partitioner being used.
In your case, the TPL will choose the chunk partitioner, which takes multiple items from the enumerable before passing them on to be processed. The number of items taken in each chunk will increase with time.
When your MoveNext method blocks, the TPL is left waiting for the next item and won't process the items that it has already taken.
You have a couple of options to fix this:
1) Write a Partitioner that always returns individual items. Not as tricky as it sounds (see the sketch at the end of this answer).
2) Use the TPL instead of Parallel.ForEach:
foreach ( var item in _queueAccess )
{
var capturedItem = item;
Task.Factory.StartNew( () => RunJob( capturedItem ) );
}
The second solution changes the behaviour a bit. The foreach loop will complete when all the Tasks have been created, not when they have finished. If this is a problem for you, you can add a CountdownEvent:
var ce = new CountdownEvent( 1 );
foreach ( var item in _queueAccess )
{
ce.AddCount();
var capturedItem = item;
Task.Factory.StartNew( () => { RunJob( capturedItem ); ce.Signal(); } );
}
ce.Signal();
ce.Wait();
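Regarding option 1: on .NET 4.5 you may not even need to write a Partitioner yourself, because Partitioner.Create has a NoBuffering option that hands out one item at a time (a sketch, reusing the question's names):

// NoBuffering: workers pull exactly one item at a time, so a blocking
// MoveNext stalls only the puller, never items already chunked behind it.
var oneAtATime = Partitioner.Create(_queueAccess, EnumerablePartitionerOptions.NoBuffering);
Parallel.ForEach(oneAtATime, parallelOptions, x => RunJob(x));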
I haven't gone to the effort of verifying this, but the impression I'd received from discussions of Parallel.ForEach was that it pulls the items out of the enumerable and then makes appropriate decisions about how to divide them across threads. Based on your problem, that seems correct.
So, to keep most of your current code, you should probably pull the blocking code out of the iterator and place it into a loop around the call to Parallel.ForEach (which uses the iterator).
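Something like this (a sketch; WaitForNextBatch is a hypothetical blocking call built from the polling logic currently inside MoveNext):

while (!_cancelTokenSource.IsCancellationRequested)
{
    // Block here, outside the iterator, until work is available.
    List<TransportMessage> batch = WaitForNextBatch();
    if (batch.Count == 0)
        continue;

    // Hand Parallel.ForEach only items that already exist, so no
    // worker ever blocks inside MoveNext.
    Parallel.ForEach(batch, parallelOptions, x => RunJob(x));
}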