Track progress when using TPL's Parallel.ForEach


What is the best way to track progress in the following code?
long total = Products.LongCount();
long current = 0;
double Progress = 0.0;
Parallel.ForEach(Products, product =>
{
try
{
var price = GetPrice(SystemAccount, product);
SavePrice(product,price);
}
finally
{
Interlocked.Decrement(ref this.current);
}});
I want to update the Progress variable from 0.0 to 1.0 (current/total), but I don't want to use anything that would have an adverse effect on the parallelism.

Jon's solution is good. If you need simple synchronization like this, your first attempt should almost always use lock. But if you measure that the locking slows things down too much, you should think about using something like Interlocked.
In this case, I would use Interlocked.Increment to increment the current count, and change Progress into a property:
private long total;
private long current;
public double Progress
{
get
{
if (total == 0)
return 0;
return (double)current / total;
}
}
…
this.total = Products.LongCount();
this.current = 0;
Parallel.ForEach(Products, product =>
{
try
{
var price = GetPrice(SystemAccount, product);
SavePrice(product, price);
}
finally
{
Interlocked.Increment(ref this.current);
}
});
Also, you might want to consider what to do with exceptions. I'm not sure that iterations that ended with an exception should be counted as done.
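A minimal, self-contained sketch of this approach follows. The ProgressTracker class and the simulated work are illustrative assumptions, not part of the original code. The getter uses Interlocked.Read so that the 64-bit counter is also read atomically on 32-bit platforms.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, named for illustration only.
public sealed class ProgressTracker
{
    private readonly long total;
    private long current;

    public ProgressTracker(long total) { this.total = total; }

    // Called once per finished item; safe to call from any thread.
    public void ReportOne() { Interlocked.Increment(ref current); }

    // Fraction completed, from 0.0 to 1.0.
    public double Progress
    {
        get
        {
            if (total == 0) return 0;
            return (double)Interlocked.Read(ref current) / total;
        }
    }
}

public static class ProgressDemo
{
    public static double Run(int itemCount)
    {
        var tracker = new ProgressTracker(itemCount);
        Parallel.For(0, itemCount, i =>
        {
            try
            {
                Thread.SpinWait(1000); // stand-in for GetPrice/SavePrice
            }
            finally
            {
                tracker.ReportOne();
            }
        });
        return tracker.Progress; // every item has been counted by now
    }

    public static void Main()
    {
        Console.WriteLine(Run(1000));
    }
}
```

A UI timer or another thread can poll Progress while the loop is running; the workers never block each other to update it.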

Since you are just doing a few quick calculations, ensure atomicity by locking on an appropriate object:
long total = Products.LongCount();
long current = 0;
double Progress = 0.0;
var lockTarget = new object();
Parallel.ForEach(Products, product =>
{
try
{
var price = GetPrice(SystemAccount, product);
SavePrice(product,price);
}
finally
{
lock (lockTarget) {
Progress = (double)++current / total; // cast first: long / long would truncate to 0
}
}});

A solution without using any blocking in the body:
long total = Products.LongCount();
BlockingCollection<MyState> states = new BlockingCollection<MyState>();
Parallel.ForEach(Products, () =>
{
MyState myState = new MyState();
states.Add(myState);
return myState;
},
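(product, loopState, index, myState) =>
{
try
{
var price = GetPrice(SystemAccount, product);
SavePrice(product,price);
}
finally
{
// each worker increments only its own state object
myState.value++;
}
// C# does not allow a return statement inside a finally block
return myState;
},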
i => { }
);
Then, to access the current progress:
(float)states.Sum(state => state.value) / total

Related

How to convert Tuple to Async Task

I'm building a small application and I need help, because I can't work out where the problem is.
I haven't been using C# for long and I am learning little by little; this is just a hobby for me.
I have the following method returning a Tuple, which works correctly:
private Tuple<int, int, int, int> CheckStatus()
{
int out = 0;
int stage = 0;
int retired = 0;
int stop = 0;
for (int i = 0; i < Dgv.Rows.Count; i++)
{
if (Dgv.Rows[i].Cells["Start"].Value != null)
{
out = out + 1;
}
if (Dgv.Rows[i].Cells["Start"].Value != null && Dgv.Rows[i].Cells["Finnish"].Value == null)
{
stage = stage + 1;
}
if (Dgv.Rows[i].Cells["Start"].Value != null && Dgv.Rows[i].Cells["Finnish"].Value != null)
{
stop = stop + 1;
}
}
retired = GetRetirements();
stage = stage - retired;
return new Tuple<int, int, int,int>(out, stage, retired, stop);
}
I want to make it asynchronous so that it can await GetRetirements, which is now an asynchronous method. I changed the code to this, but I have problems:
private async Task<Tuple<int, int, int, int>> CheckStatus()
{
int out = 0;
int stage = 0;
int retired = 0;
int stop = 0;
for (int i = 0; i < Dgv.Rows.Count; i++)
{
if (Dgv.Rows[i].Cells["Start"].Value != null)
{
out = out + 1;
}
if (Dgv.Rows[i].Cells["Start"].Value != null && Dgv.Rows[i].Cells["Finnish"].Value == null)
{
stage = stage + 1;
}
if (Dgv.Rows[i].Cells["Start"].Value != null && Dgv.Rows[i].Cells["Finnish"].Value != null)
{
stop = stop + 1;
}
}
retired = await GetRetirements();
stage = stage - retired;
return new Tuple<int, int, int,int>(out, stage, retired, stop);
}
But the compiler tells me that it cannot find any of the items (Item1, Item2, Item3, Item4). I do not know where the problem is.
private void GetCheckStatus()
{
LblOut.Text = CheckStatus().Item1.ToString();
LblStage.Text = CheckStatus().Item2.ToString();
LblRetired.Text = CheckStatus().Item3.ToString();
LblStop.Text = CheckStatus().Item4.ToString();
}
Am I doing something wrong? It's the first time I have worked with Tuple, and I really don't know what could be wrong.
Thank you very much.
Best regards,

CheckStatus is now an async function. To get the result you need to await and you likely only want to invoke the function once. Note how async has also been added to GetCheckStatus and will flow all the way up to an async void event handler, e.g. a button click.
private async Task GetCheckStatus()
{
var status = await CheckStatus();
LblOut.Text = status.Item1.ToString();
LblStage.Text = status.Item2.ToString();
LblRetired.Text = status.Item3.ToString();
LblStop.Text = status.Item4.ToString();
}

You changed CheckStatus() to return a Task<>. You should probably await that and use the result as before.
You could also handle it in different ways, depending on your UI framework. But it comes down to "this method is now async, handle it that way."

You've made the inner call async but the outer call is not waiting for it. Try something like:
private async Task GetCheckStatus()
{
var result = await CheckStatus();
LblOut.Text = result.Item1.ToString();
LblStage.Text = result.Item2.ToString();
LblRetired.Text = result.Item3.ToString();
LblStop.Text = result.Item4.ToString();
}

The cause is that you forgot to await the result of CheckStatus() before accessing it.
It is conventional to end the names of async methods with Async. This reminds callers that they are using async-await and that they should await the return value before accessing the result.
This also has the advantage that you can offer both the normal version and the async version:
async Task<int> GetRetirementsAsync(){...}
async Task<Tuple<int, int, int, int>> CheckStatusAsync()
{
...
int retired = await GetRetirementsAsync();
return new Tuple...
}
async Task GetCheckStatusAsync()
{
var tuple = await CheckStatusAsync();
// process output:
LblOut.Text = tuple.Item1.ToString();
LblStage.Text = tuple.Item2.ToString();
LblRetired.Text = tuple.Item3.ToString();
LblStop.Text = tuple.Item4.ToString();
}
Possible performance improvement
The reason to use GetRetirementsAsync instead of the non-async GetRetirements is that you expect that somewhere deep inside, the call has to wait idly for results from another process, such as a database query, a file read, or a fetch from the internet.
Instead of waiting idly, you can use async-await to do other things until you really need the results.
You do this by starting the task without awaiting it. The thread won't wait idly for the database, but continues processing your statements until you need the result and await the task.
private async Task<Tuple<int, int, int, int>> CheckStatus()
{
// Get the retirements, do not await yet.
Task<int> taskGetRetirements = GetRetirementsAsync();
// instead of waiting idly, your thread is free to do the following:
int out = 0;
int stage = 0;
int retired = 0;
int stop = 0;
for (int i = 0; i < Dgv.Rows.Count; i++)
{
...
}
// now you need the retirements; await for the task to finish
retired = await taskGetRetirements; // retired is already declared above
stage = stage - retired;
return new Tuple<int, int, int,int>(out, stage, retired, stop);
}
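The start-then-await pattern above can be tried in isolation with the sketch below. GetRetirementsAsync here is a hypothetical stand-in that simulates a slow call with Task.Delay, and the row-counting loop is simplified to an int array, so the names and values are assumptions rather than the asker's real code.

```csharp
using System;
using System.Threading.Tasks;

public static class StatusDemo
{
    // Hypothetical stand-in for the asynchronous GetRetirementsAsync call.
    private static async Task<int> GetRetirementsAsync()
    {
        await Task.Delay(100); // simulates waiting on a database or file
        return 2;
    }

    public static async Task<Tuple<int, int>> CheckStatusAsync(int[] rows)
    {
        // Start the slow operation, but do not await it yet.
        Task<int> retirementsTask = GetRetirementsAsync();

        // This loop runs while the delay above is still pending.
        int stage = 0;
        foreach (int row in rows)
        {
            if (row > 0) stage++;
        }

        // Now the result is needed, so await it.
        int retired = await retirementsTask;
        return Tuple.Create(stage - retired, retired);
    }

    public static void Main()
    {
        // Blocking here is only acceptable because this is a console entry point.
        Tuple<int, int> status =
            CheckStatusAsync(new[] { 1, 0, 3, 4 }).GetAwaiter().GetResult();
        Console.WriteLine("{0} {1}", status.Item1, status.Item2);
    }
}
```

The gain only materializes when the awaited method is genuinely asynchronous; if it does all of its work synchronously before returning a task, starting it early changes nothing.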

Try a speculative, concurrent, lock free, atomic update until abort condition matches

Please let me know if you see any performance improvements, bugs, or anything you'd change and why.
public static bool TrySpeculativeUpdate(ref int field, out int result,
Func<int, int> update, Func<int, bool> shouldAbort)
{
SpinWait spinWait = new SpinWait();
while (true)
{
int snapshot = field;
if (shouldAbort(field))
{
result = 0;
return false;
}
else
{
int calc = update(snapshot);
if (Interlocked.CompareExchange(ref field, calc, snapshot) == snapshot)
{
result = calc;
return true;
}
}
spinWait.SpinOnce();
}
}
Can be used like this
private bool TryIncreaseCapacity(out int newCapacity)
{
return TrySpeculativeUpdate(ref _currentCapacity, out newCapacity,
(currentCapacity) => currentCapacity + 1,
(currentCapacity) => currentCapacity == _maxCapacity);
}
if (this.TryIncreaseCapacity(out newCapacity))
{
...
}
else
{
...
}
What I'm really trying to accomplish is the fastest thread safe version of Interlocked.Increment that will stop incrementing at a max value and I will have some way of detecting it stopped incrementing.
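For the specific goal of a capped increment, a narrower sketch is shown below. It is one possible shape, not a benchmarked answer: the CappedCounter name and its signature are assumptions. Unlike the generic version above, it tests the snapshot instead of reading the field a second time, so the abort decision and the CompareExchange always refer to the same value.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CappedCounter
{
    // Increments 'field' unless it has already reached 'max'.
    // Returns false, leaving 'field' unchanged, once the cap is hit.
    public static bool TryIncrement(ref int field, int max, out int result)
    {
        while (true)
        {
            int snapshot = Volatile.Read(ref field);
            if (snapshot >= max)
            {
                result = snapshot;
                return false;
            }
            int next = snapshot + 1;
            // Publish only if no other thread changed the field meanwhile.
            if (Interlocked.CompareExchange(ref field, next, snapshot) == snapshot)
            {
                result = next;
                return true;
            }
        }
    }

    // Hammers the counter from many threads and counts successful increments.
    public static int CountSuccesses(int attempts, int max)
    {
        int capacity = 0;
        int successes = 0;
        Parallel.For(0, attempts, i =>
        {
            int ignored;
            if (TryIncrement(ref capacity, max, out ignored))
            {
                Interlocked.Increment(ref successes);
            }
        });
        return successes;
    }

    public static void Main()
    {
        Console.WriteLine(CountSuccesses(1000, 10));
    }
}
```

Whether a SpinWait between retries helps depends on contention and should be measured; this sketch simply retries immediately.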

Use Task.Run instead of Delegate.BeginInvoke

I have recently upgraded my projects to ASP.NET 4.5 and I have been waiting a long time to use 4.5's asynchronous capabilities. After reading the documentation I'm not sure whether I can improve my code at all.
I want to execute a task asynchronously and then forget about it. The way that I'm currently doing this is by creating delegates and then using BeginInvoke.
Here's one of the filters in my project, which creates an audit in our database every time a user accesses a resource that must be audited:
public override void OnActionExecuting(ActionExecutingContext filterContext)
{
var request = filterContext.HttpContext.Request;
var id = WebSecurity.CurrentUserId;
var invoker = new MethodInvoker(delegate
{
var audit = new Audit
{
Id = Guid.NewGuid(),
IPAddress = request.UserHostAddress,
UserId = id,
Resource = request.RawUrl,
Timestamp = DateTime.UtcNow
};
var database = (new NinjectBinder()).Kernel.Get<IDatabaseWorker>();
database.Audits.InsertOrUpdate(audit);
database.Save();
});
invoker.BeginInvoke(StopAsynchronousMethod, invoker);
base.OnActionExecuting(filterContext);
}
But in order to finish this asynchronous task, I need to always define a callback, which looks like this:
public void StopAsynchronousMethod(IAsyncResult result)
{
var state = (MethodInvoker)result.AsyncState;
try
{
state.EndInvoke(result);
}
catch (Exception e)
{
var username = WebSecurity.CurrentUserName;
Debugging.DispatchExceptionEmail(e, username);
}
}
I would rather not use the callback at all due to the fact that I do not need a result from the task that I am invoking asynchronously.
How can I improve this code with Task.Run() (or async and await)?

If I understood your requirements correctly, you want to kick off a task and then forget about it. When the task completes, and if an exception occurred, you want to log it.
I'd use Task.Run to create a task, followed by ContinueWith to attach a continuation task. This continuation task will log any exception that was thrown from the parent task. Also, use TaskContinuationOptions.OnlyOnFaulted to make sure the continuation only runs if an exception occurred.
Task.Run(() => {
var audit = new Audit
{
Id = Guid.NewGuid(),
IPAddress = request.UserHostAddress,
UserId = id,
Resource = request.RawUrl,
Timestamp = DateTime.UtcNow
};
var database = (new NinjectBinder()).Kernel.Get<IDatabaseWorker>();
database.Audits.InsertOrUpdate(audit);
database.Save();
}).ContinueWith(task => {
task.Exception.Handle(ex => {
var username = WebSecurity.CurrentUserName;
Debugging.DispatchExceptionEmail(ex, username);
return true; // Handle expects a bool: true marks the exception as handled
});
}, TaskContinuationOptions.OnlyOnFaulted);
As a side-note, background tasks and fire-and-forget scenarios in ASP.NET are highly discouraged. See The Dangers of Implementing Recurring Background Tasks In ASP.NET
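The continuation behaviour can be checked outside ASP.NET with a small console sketch. SaveAudit and the captured message are hypothetical stand-ins for the database call and the e-mail dispatch.

```csharp
using System;
using System.Threading.Tasks;

public static class FaultLoggingDemo
{
    // Hypothetical stand-in for the audit save; it always fails here.
    private static void SaveAudit()
    {
        throw new InvalidOperationException("audit failed");
    }

    public static string RunAndCapture()
    {
        string logged = null;
        Task continuation = Task.Run(() => SaveAudit())
            .ContinueWith(task =>
            {
                // task.Exception is an AggregateException wrapping the failure.
                task.Exception.Handle(ex =>
                {
                    logged = ex.Message; // stand-in for sending the e-mail
                    return true;         // mark as handled
                });
            }, TaskContinuationOptions.OnlyOnFaulted);

        // Waiting is for the demo only; fire-and-forget code would not wait.
        continuation.Wait();
        return logged;
    }

    public static void Main()
    {
        Console.WriteLine(RunAndCapture());
    }
}
```

If the first task succeeds, an OnlyOnFaulted continuation is canceled instead of run, so calling Wait on it in that case throws; real fire-and-forget code never waits on it.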

It may sound a bit out of scope, but if you just want to forget about it after you launch it, why not use the ThreadPool directly?
Something like:
ThreadPool.QueueUserWorkItem(
x =>
{
try
{
// Do something
...
}
catch (Exception e)
{
// Log something
...
}
});
I had to do some performance benchmarking for different async call methods and I found that (not surprisingly) ThreadPool works much better, but also that, actually, BeginInvoke is not that bad (I am on .NET 4.5). That's what I found out with the code at the end of the post. I did not find something like this online, so I took the time to check it myself. Each call is not exactly equal, but it is more or less functionally equivalent in terms of what it does:
ThreadPool: 70.80ms
Task: 90.88ms
BeginInvoke: 121.88ms
Thread: 4657.52ms
public class Program
{
public delegate void ThisDoesSomething();
// Perform a very simple operation to see the overhead of
// different async calls types.
public static void Main(string[] args)
{
const int repetitions = 25;
const int calls = 1000;
var results = new List<Tuple<string, double>>();
Console.WriteLine(
"{0} parallel calls, {1} repetitions for better statistics\n",
calls,
repetitions);
// Threads
Console.Write("Running Threads");
results.Add(new Tuple<string, double>("Threads", RunOnThreads(repetitions, calls)));
Console.WriteLine();
// BeginInvoke
Console.Write("Running BeginInvoke");
results.Add(new Tuple<string, double>("BeginInvoke", RunOnBeginInvoke(repetitions, calls)));
Console.WriteLine();
// Tasks
Console.Write("Running Tasks");
results.Add(new Tuple<string, double>("Tasks", RunOnTasks(repetitions, calls)));
Console.WriteLine();
// Thread Pool
Console.Write("Running Thread pool");
results.Add(new Tuple<string, double>("ThreadPool", RunOnThreadPool(repetitions, calls)));
Console.WriteLine();
Console.WriteLine();
// Show results
results = results.OrderBy(rs => rs.Item2).ToList();
foreach (var result in results)
{
Console.WriteLine(
"{0}: Done in {1}ms avg",
result.Item1,
(result.Item2 / repetitions).ToString("0.00"));
}
Console.WriteLine("Press a key to exit");
Console.ReadKey();
}
/// <summary>
/// The do stuff.
/// </summary>
public static void DoStuff()
{
Console.Write("*");
}
public static double RunOnThreads(int repetitions, int calls)
{
var totalMs = 0.0;
for (var j = 0; j < repetitions; j++)
{
Console.Write(".");
var toProcess = calls;
var stopwatch = new Stopwatch();
var resetEvent = new ManualResetEvent(false);
var threadList = new List<Thread>();
for (var i = 0; i < calls; i++)
{
threadList.Add(new Thread(() =>
{
// Do something
DoStuff();
// Safely decrement the counter
if (Interlocked.Decrement(ref toProcess) == 0)
{
resetEvent.Set();
}
}));
}
stopwatch.Start();
foreach (var thread in threadList)
{
thread.Start();
}
resetEvent.WaitOne();
stopwatch.Stop();
totalMs += stopwatch.ElapsedMilliseconds;
}
return totalMs;
}
public static double RunOnThreadPool(int repetitions, int calls)
{
var totalMs = 0.0;
for (var j = 0; j < repetitions; j++)
{
Console.Write(".");
var toProcess = calls;
var resetEvent = new ManualResetEvent(false);
var stopwatch = new Stopwatch();
var list = new List<int>();
for (var i = 0; i < calls; i++)
{
list.Add(i);
}
stopwatch.Start();
for (var i = 0; i < calls; i++)
{
ThreadPool.QueueUserWorkItem(
x =>
{
// Do something
DoStuff();
// Safely decrement the counter
if (Interlocked.Decrement(ref toProcess) == 0)
{
resetEvent.Set();
}
},
list[i]);
}
resetEvent.WaitOne();
stopwatch.Stop();
totalMs += stopwatch.ElapsedMilliseconds;
}
return totalMs;
}
public static double RunOnBeginInvoke(int repetitions, int calls)
{
var totalMs = 0.0;
for (var j = 0; j < repetitions; j++)
{
Console.Write(".");
var beginInvokeStopwatch = new Stopwatch();
var delegateList = new List<ThisDoesSomething>();
var resultsList = new List<IAsyncResult>();
for (var i = 0; i < calls; i++)
{
delegateList.Add(DoStuff);
}
beginInvokeStopwatch.Start();
foreach (var delegateToCall in delegateList)
{
resultsList.Add(delegateToCall.BeginInvoke(null, null));
}
// We lose a bit of accuracy, but if the loop is big enough,
// it should not really matter
while (resultsList.Any(rs => !rs.IsCompleted))
{
Thread.Sleep(10);
}
beginInvokeStopwatch.Stop();
totalMs += beginInvokeStopwatch.ElapsedMilliseconds;
}
return totalMs;
}
public static double RunOnTasks(int repetitions, int calls)
{
var totalMs = 0.0;
for (var j = 0; j < repetitions; j++)
{
Console.Write(".");
var resultsList = new List<Task>();
var stopwatch = new Stopwatch();
stopwatch.Start();
for (var i = 0; i < calls; i++)
{
resultsList.Add(Task.Factory.StartNew(DoStuff));
}
// We lose a bit of accuracy, but if the loop is big enough,
// it should not really matter
while (resultsList.Any(task => !task.IsCompleted))
{
Thread.Sleep(10);
}
stopwatch.Stop();
totalMs += stopwatch.ElapsedMilliseconds;
}
return totalMs;
}
}

"Here's one of the filters in my project, which creates an audit in our database every time a user accesses a resource that must be audited"
Auditing is certainly not something I would call "fire and forget". Remember, on ASP.NET, "fire and forget" means "I don't care whether this code actually executes or not". So, if your desired semantics are that audits may occasionally be missing, then (and only then) you can use fire and forget for your audits.
If you want to ensure your audits are all correct, then either wait for the audit save to complete before sending the response, or queue the audit information to reliable storage (e.g., Azure queue or MSMQ) and have an independent backend (e.g., Azure worker role or Win32 service) process the audits in that queue.
But if you want to live dangerously (accepting that occasionally audits may be missing), you can mitigate the problems by registering the work with the ASP.NET runtime. Using the BackgroundTaskManager from my blog:
public override void OnActionExecuting(ActionExecutingContext filterContext)
{
var request = filterContext.HttpContext.Request;
var id = WebSecurity.CurrentUserId;
BackgroundTaskManager.Run(() =>
{
try
{
var audit = new Audit
{
Id = Guid.NewGuid(),
IPAddress = request.UserHostAddress,
UserId = id,
Resource = request.RawUrl,
Timestamp = DateTime.UtcNow
};
var database = (new NinjectBinder()).Kernel.Get<IDatabaseWorker>();
database.Audits.InsertOrUpdate(audit);
database.Save();
}
catch (Exception e)
{
var username = WebSecurity.CurrentUserName;
Debugging.DispatchExceptionEmail(e, username);
}
});
base.OnActionExecuting(filterContext);
}

Parallel.Foreach + yield return?

I want to process something using a parallel loop like this:
public void FillLogs(IEnumerable<IComputer> computers)
{
Parallel.ForEach(computers, cpt=>
{
cpt.Logs = cpt.GetRawLogs().ToList();
});
}
OK, it works fine. But what should I do if I want the FillLogs method to return an IEnumerable?
public IEnumerable<IComputer> FillLogs(IEnumerable<IComputer> computers)
{
Parallel.ForEach(computers, cpt=>
{
cpt.Logs = cpt.GetRawLogs().ToList();
yield return cpt // KO, don't work
});
}
EDIT
It seems not to be possible... but I could use something like this:
public IEnumerable<IComputer> FillLogs(IEnumerable<IComputer> computers)
{
return computers.AsParallel().Select(cpt => cpt);
}
But where do I put the cpt.Logs = cpt.GetRawLogs().ToList(); instruction?

Short version - no, that isn't possible via an iterator block; the longer version probably involves synchronized queue/dequeue between the caller's iterator thread (doing the dequeue) and the parallel workers (doing the enqueue); but as a side note - logs are usually IO-bound, and parallelising things that are IO-bound often doesn't work very well.
If the caller is going to take some time to consume each, then there may be some merit to an approach that only processes one log at a time, but can do that while the caller is consuming the previous log; i.e. it begins a Task for the next item before the yield, and waits for completion after the yield... but that is again, pretty complex. As a simplified example:
static void Main()
{
foreach(string s in Get())
{
Console.WriteLine(s);
}
}
static IEnumerable<string> Get() {
var source = new[] {1, 2, 3, 4, 5};
Task<string> outstandingItem = null;
Func<object, string> transform = x => ProcessItem((int) x);
foreach(var item in source)
{
var tmp = outstandingItem;
// note: passed in as "state", not captured, so not a foreach/capture bug
outstandingItem = new Task<string>(transform, item);
outstandingItem.Start();
if (tmp != null) yield return tmp.Result;
}
if (outstandingItem != null) yield return outstandingItem.Result;
}
static string ProcessItem(int i)
{
return i.ToString();
}

I don't want to be offensive, but maybe there is a lack of understanding here. Parallel.ForEach means that the TPL will run the loop body on several threads, according to the available hardware. But that means that it has to be possible to do that work in parallel! yield return gives you the opportunity to take values out of a list (or whatever) and hand them back one by one as they are needed. It avoids the need to first find all items matching the condition and then iterate over them. That is indeed a performance advantage, but it can't be done in parallel.

Although the question is old I've managed to do something just for fun.
class Program
{
static void Main(string[] args)
{
foreach (var message in GetMessages())
{
Console.WriteLine(message);
}
}
// Parallel yield
private static IEnumerable<string> GetMessages()
{
int total = 0;
bool completed = false;
var batches = Enumerable.Range(1, 100).Select(i => new Computer() { Id = i });
var qu = new ConcurrentQueue<Computer>();
Task.Run(() =>
{
try
{
Parallel.ForEach(batches,
() => 0,
(item, loop, subtotal) =>
{
Thread.Sleep(1000);
qu.Enqueue(item);
return subtotal + 1;
},
result => Interlocked.Add(ref total, result));
}
finally
{
completed = true;
}
});
int current = 0;
while (current < total || !completed)
{
SpinWait.SpinUntil(() => current < total || completed);
if (current == total) yield break;
current++;
qu.TryDequeue(out Computer computer);
yield return $"Completed {computer.Id}";
}
}
}
public class Computer
{
public int Id { get; set; }
}
Compared to Koray's answer this one really uses all the CPU cores.

You can use the following extension method
public static class ParallelExtensions
{
public static IEnumerable<T1> OrderedParallel<T, T1>(this IEnumerable<T> list, Func<T, T1> action)
{
var unorderedResult = new ConcurrentBag<(long, T1)>();
Parallel.ForEach(list, (o, state, i) =>
{
unorderedResult.Add((i, action.Invoke(o)));
});
var ordered = unorderedResult.OrderBy(o => o.Item1);
return ordered.Select(o => o.Item2);
}
}
use like:
public void FillLogs(IEnumerable<IComputer> computers)
{
cpt.Logs = computers.OrderedParallel(o => o.GetRawLogs()).ToList();
}
Hope this will save you some time.
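PLINQ offers a built-in way to get the same "parallel work, source order" result: AsParallel followed by AsOrdered. The sketch below uses a hypothetical Transform method and string formatting in place of GetRawLogs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PlinqOrdered
{
    // Maps each id in parallel while preserving the order of the source.
    public static List<string> Transform(IEnumerable<int> ids)
    {
        return ids.AsParallel()
                  .AsOrdered()               // keep results in source order
                  .Select(id => "log-" + id) // stand-in for the real work
                  .ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(",", Transform(Enumerable.Range(1, 5))));
    }
}
```

Ordering adds some buffering overhead, so AsOrdered is worth dropping when the consumer does not care about order.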

How about
Queue<string> qu = new Queue<string>();
bool finished = false;
Task.Factory.StartNew(() =>
{
Parallel.ForEach(get_list(), (item) =>
{
string itemToReturn = heavyWorkOnItem(item);
lock (qu)
qu.Enqueue(itemToReturn );
});
finished = true;
});
while (!finished)
{
lock (qu)
while (qu.Count > 0)
yield return qu.Dequeue();
//maybe a thread sleep here?
}
Edit:
I think this is better:
public static IEnumerable<TOutput> ParallelYieldReturn<TSource, TOutput>(this IEnumerable<TSource> source, Func<TSource, TOutput> func)
{
ConcurrentQueue<TOutput> qu = new ConcurrentQueue<TOutput>();
bool finished = false;
AutoResetEvent re = new AutoResetEvent(false);
Task.Factory.StartNew(() =>
{
Parallel.ForEach(source, (item) =>
{
qu.Enqueue(func(item));
re.Set();
});
finished = true;
re.Set();
});
while (!finished)
{
re.WaitOne();
while (qu.Count > 0)
{
TOutput res;
if (qu.TryDequeue(out res))
yield return res;
}
}
}
Edit2: I agree with the short No answer. This code is useless; you cannot break the yield loop.

C# 2.0 Design Question - Creating sublists from a larger list

I am looking for a good design/algorithm/pattern for the following:
I have a large list of TODO tasks. Each one of them has an estimated duration. I want to break the larger list into smaller sublists, each sublist containing a max of 4 hours of work.
My current algorithm is something like this:
while( index < list.Count )
{
List<string> subList = CreateSublist( ref index );
SaveSubList(subList);
}
Passing the index in as a ref feels awkward and not OOD. I am really consuming the TODO list somewhat like a stream, so I'm wondering if there's something similar I could do, but I'm somewhat of a C# newbie. I am also currently limited to C# 2.0. Any quick pointers on a good design here?

You can stuff everything in one method:
List<List<TodoTask>> GetTodoTasks(IEnumerable<TodoTask> tasks, int timeWindow)
{
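List<List<TodoTask>> allTasks = new List<List<TodoTask>>();
// named currentTasks so it does not clash with the tasks parameter
List<TodoTask> currentTasks = new List<TodoTask>();
int duration = 0;
foreach(TodoTask task in tasks)
{
// close the sublist before this task would push it past the window
if(duration + task.Duration > timeWindow && currentTasks.Count > 0)
{
allTasks.Add(currentTasks);
duration = 0;
currentTasks = new List<TodoTask>();
}
currentTasks.Add(task);
duration += task.Duration;
}
allTasks.Add(currentTasks);
return allTasks;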
}
Or, using iterators:
IEnumerable<List<TodoTask>> GetTodoTasks(IEnumerable<TodoTask> tasks, int timeWindow)
{
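// named currentTasks so it does not clash with the tasks parameter
List<TodoTask> currentTasks = new List<TodoTask>();
int duration = 0;
foreach(TodoTask task in tasks)
{
// close the sublist before this task would push it past the window
if(duration + task.Duration > timeWindow && currentTasks.Count > 0)
{
yield return currentTasks;
duration = 0;
currentTasks = new List<TodoTask>();
}
currentTasks.Add(task);
duration += task.Duration;
}
yield return currentTasks;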
}
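To see how an iterator-based split behaves, here is a runnable sketch that works on plain int durations instead of TodoTask objects. DurationChunker and Split are hypothetical names. This variant closes a sublist before a task would push it past the window, and it lets a single task longer than the window form a sublist of its own.

```csharp
using System;
using System.Collections.Generic;

public static class DurationChunker
{
    // Yields consecutive groups whose summed duration stays within timeWindow.
    public static IEnumerable<List<int>> Split(IEnumerable<int> durations, int timeWindow)
    {
        List<int> current = new List<int>();
        int total = 0;
        foreach (int d in durations)
        {
            // Close the group if adding this item would exceed the window.
            if (total + d > timeWindow && current.Count > 0)
            {
                yield return current;
                current = new List<int>();
                total = 0;
            }
            current.Add(d);
            total += d;
        }
        if (current.Count > 0) yield return current;
    }

    public static void Main()
    {
        foreach (List<int> chunk in Split(new[] { 2, 1, 3, 1, 4, 2 }, 4))
        {
            Console.WriteLine(string.Join("+", chunk));
        }
    }
}
```

With a window of 4, the durations 2, 1, 3, 1, 4, 2 split into four groups: 2+1, 3+1, 4 and 2. Because the method is an iterator, each group is produced only when the caller asks for it, which matches the stream-like consumption described in the question.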

This should do the job:
public static List<List<Task>> SplitTaskList(List<Task> tasks)
{
List<List<Task>> subLists = new List<List<Task>>();
List<Task> curList = new List<Task>();
int curDuration = 0; // Measured in hours.
foreach (Task item in tasks)
{
curDuration += item.Duration;
if (curDuration > 4 && curList.Count > 0)
{
subLists.Add(curList);
curList = new List<Task>();
curDuration = item.Duration; // the new sublist starts with this item
}
curList.Add(item);
}
subLists.Add(curList);
return subLists;
}
LINQ would probably simplify things, but since you are using C# 2.0 (and likely also .NET 2.0 I would presume), this would seem like the most straightforward solution.

I would suggest to encapsulate this into a class.
SubListBuilder<WorkItem> slb = new SubListBuilder<WorkItem>(
workItems, sublist => sublist.Sum(item => item.Duration) <= 4);
This nicely allows you to supply a predicate that controls how the sublists are built. Then you can just get your results.
while (slb.HasMoreSubLists)
{
SaveList(slb.GetNextSubList());
}
Or maybe this way.
foreach (var subList in slb.GetSubLists())
{
SaveList(subList);
}

Here's my solution:
class Task
{
public string Name { get; set; }
public int Duration { get; set; }
}
class TaskList : List<Task>
{
public int Duration { get; set; }
public void Add(Task task, int duration)
{
this.Add(task);
Duration += duration;
}
}
private static IList<TaskList> SplitTaskList(IList<Task> tasks, int topDuration)
{
IList<TaskList> subLists = new List<TaskList>();
foreach (var task in tasks)
{
subLists = DistributeTask(subLists, task, topDuration);
}
return subLists;
}
private static IList<TaskList> DistributeTask(IList<TaskList> subLists, Task task, int topDuration)
{
if (task.Duration > topDuration)
throw new ArgumentOutOfRangeException("task too long");
if (subLists.Count == 0)
subLists.Add(new TaskList());
foreach (var subList in subLists)
{
if (task.Duration + subList.Duration <= topDuration)
{
subList.Add(task, task.Duration);
return subLists;
}
}
TaskList newList = new TaskList();
newList.Add(task, task.Duration);
subLists.Add(newList);
return subLists;
}
Notice that this is not the optimal solution... that would go to a whole new level :)
Also, this solution will distribute the items a little better than Noldorin's and Anton's solutions. You might end up with fewer lists.
