Are Parallel.Invoke and Parallel.ForEach essentially the same thing? - c#

And by "same thing" I mean do these two operations basically do the same work, and it just boils down to which one is more convenient to call based on what you have to work with? (i.e. a list of delegates or a list of things to iterate over)? I've been searching MSDN, StackOverflow, and various random articles but I have yet to find a clear answer for this.
EDIT: I should have been clearer; I am asking if the two methods do the same thing because if they do not, I would like to understand which would be more efficient.
Example: I have a list of 500 key values. Currently I use a foreach loop that iterates through the list (serially) and performs work for each item. If I want to take advantage of multiple cores, should I simply use Parallel.ForEach instead?
Let's say for arguments's sake that I had an array of 500 delegates for those 500 tasks - would the net effect be any different calling Parallel.Invoke and giving it a list of 500 delegates?

Parallel.ForEach goes through the list of elements and can perform some task on the elements of the array.
eg.
Parallel.ForEach(val, (array) => Sum(array));
Parallel.Invoke can invoke many functions in parallel.
eg.
Parallel.Invoke(
() => doSum(array),
() => doAvg(array),
() => doMedian(array));
As from the example above, you can see that they are different in functionality. ForEach iterates through a List of elements and performs one task on each element in parallel, while Invoke can perform many tasks in parallel on a single element.

Parallel.Invoke and Parallel.ForEach (when used to execute Actions) function the same, although yes one specifically wants the collection to be an Array. Consider the following sample:
List<Action> actionsList = new List<Action>
{
() => Console.WriteLine("0"),
() => Console.WriteLine("1"),
() => Console.WriteLine("2"),
() => Console.WriteLine("3"),
() => Console.WriteLine("4"),
() => Console.WriteLine("5"),
() => Console.WriteLine("6"),
() => Console.WriteLine("7"),
() => Console.WriteLine("8"),
() => Console.WriteLine("9"),
};
Parallel.ForEach<Action>(actionsList, ( o => o() ));
Console.WriteLine();
Action[] actionsArray = new Action[]
{
() => Console.WriteLine("0"),
() => Console.WriteLine("1"),
() => Console.WriteLine("2"),
() => Console.WriteLine("3"),
() => Console.WriteLine("4"),
() => Console.WriteLine("5"),
() => Console.WriteLine("6"),
() => Console.WriteLine("7"),
() => Console.WriteLine("8"),
() => Console.WriteLine("9"),
};
Parallel.Invoke(actionsArray);
Console.ReadKey();
This code produces this output on one Run. It's output is generally in a different order every time.
0 5 1 6 2 7 3 8 4 9
0 1 2 4 5 6 7 8 9 3

Surprisingly, no, they are not the same thing. Their fundamental difference is on how they behave in case of exceptions:
The Parallel.ForEach (as well as the Parallel.For and the upcoming Parallel.ForEachAsync) fails fast. After an exception has occurred, it does not start any new parallel work, and will return as soon as all the currently running delegates complete.
The Parallel.Invoke invokes invariably all the actions, regardless of whether some (or all) of them failed.
For demonstrating this behavior, lets run in parallel 1,000 actions, with one every three actions failing:
int c = 0;
Action[] actions = Enumerable.Range(1, 1000).Select(n => new Action(() =>
{
Interlocked.Increment(ref c);
if (n % 3 == 0) throw new ApplicationException();
})).ToArray();
try { c = 0; Parallel.For(0, actions.Length, i => actions[i]()); }
catch (AggregateException aex)
{ Console.WriteLine($"Parallel.For, Exceptions: {aex.InnerExceptions.Count}/{c}"); }
try { c = 0; Parallel.ForEach(actions, action => action()); }
catch (AggregateException aex)
{ Console.WriteLine($"Parallel.ForEach, Exceptions: {aex.InnerExceptions.Count}/{c}"); }
try { c = 0; Parallel.Invoke(actions); }
catch (AggregateException aex)
{ Console.WriteLine($"Parallel.Invoke, Exceptions: {aex.InnerExceptions.Count}/{c}"); }
Output (in my PC, .NET 5, Release build):
Parallel.For, Exceptions: 5/12
Parallel.ForEach, Exceptions: 5/11
Parallel.Invoke, Exceptions: 333/1000
Try it on Fiddle.

I'm trying to find a good way of phrasing it; but they are not the same thing.
The reason is, Invoke works on an Array of Action and ForEach works on a List (specifically an IEnumerable) of Action; Lists are significantly different to Arrays in mechanics although they expose the same sort of basic behaviour.
I can't say what the difference actually entails because I don't know, so please don't accept this an an answer (unless you really want too!) but I hope it jogs someones memory as to the mechanisms.
+1 for a good question too.
Edit; It just occurred to me that there is another answer too; Invoke can work on a dynamic list of Actions; but Foreach can work with a Generic IEnumerable of Actions and gives you the ability to use conditional logic, Action by Action; so you could test a condition before saying Action.Invoke() in each Foreach iteration.

Related

Offloading long blocking calls to separate threads using System.Observable

I want to execute a blocking method into separate threads. To illustrate the situation, I created this example:
int LongBlockingCall(int n)
{
Thread.Sleep(1000);
return n + 1;
}
var observable = Observable
.Range(0, 10)
.SelectMany(n =>
Observable.Defer(() => Observable.Return(LongBlockingCall(n), NewThreadScheduler.Default)));
var results = await observable.ToList();
I expected this to run in ~1 second, but it takes ~10 seconds instead.
A new thread should be spawned for each call, because I'm specifying NewThreadScheduler.Default, right?
What am I doing wrong?
You have to replace this line:
Observable.Return(LongBlockingCall(n), NewThreadScheduler.Default)
...with this:
Observable.Start(() => LongBlockingCall(n), NewThreadScheduler.Default)
The Observable.Return just returns a value. It's not aware how this value is produced. In order to invoke an action on a specific scheduler, you need the Observable.Start method.

Reactive Extensions SelectMany with large objects

I have this little piece of code that simulates a flow that uses large objects (that huge byte[]). For each item in the sequence, an async method is invoked to get some result. The problem? As it is, it throws OutOfMemoryException.
Code compatible with LINQPad (C# Program):
void Main()
{
var selectMany = Enumerable.Range(1, 100)
.Select(i => new LargeObject(i))
.ToObservable()
.SelectMany(o => Observable.FromAsync(() => DoSomethingAsync(o)));
selectMany
.Subscribe(r => Console.WriteLine(r));
}
private static async Task<int> DoSomethingAsync(LargeObject lo)
{
await Task.Delay(10000);
return lo.Id;
}
internal class LargeObject
{
public int Id { get; }
public LargeObject(int id)
{
this.Id = id;
}
public byte[] Data { get; } = new byte[10000000];
}
It seems that it creates all the objects at the same time. How can I do it the right way?
The underlying idea is to invoke DoSomethingAsync in order to get some result for each object, so that's why I use SelectMany. To simplify, I just have introduced a Task.Delay, but in real life it is a service that can process some items concurrently, so I want to introduce some concurrency mechanism to get advantage of it.
Please, notice that, theoretically, processing a little number of items at time shouldn't fill the memory. In fact, we only need each "large object" to get the results of the DoSomethingAsync method. After that point, the large object isn't used anymore.
I feel like i'm repeating myself. Similar to your last question and my last answer, what you need to do is limit the number of bigObjects™ to be created concurrent.
To do so, you need to combine object creation and processing and put it on the same thread pool. Now the problem is, we use async methods to allow threads to do other things while our async method run. Since your slow network call is async, your (fast) object creation code will keep creating large objects too fast.
Instead, we can use Rx to keep count of the number of concurrent Observables running by combine the object creation with the async call and use .Merge(maxConcurrent) to limit concurrency.
As a bonus, we can also set a minimal time for queries to execute. Just Zip with something that takes a minimal delay.
static void Main()
{
var selectMany = Enumerable.Range(1, 100)
.ToObservable()
.Select(i => Observable.Defer(() => Observable.Return(new LargeObject(i)))
.SelectMany(o => Observable.FromAsync(() => DoSomethingAsync(o)))
.Zip(Observable.Timer(TimeSpan.FromMilliseconds(400)), (el, _) => el)
).Merge(4);
selectMany
.Subscribe(r => Console.WriteLine(r));
Console.ReadLine();
}
private static async Task<int> DoSomethingAsync(LargeObject lo)
{
await Task.Delay(10000);
return lo.Id;
}
internal class LargeObject
{
public int Id { get; }
public LargeObject(int id)
{
this.Id = id;
Console.WriteLine(id + "!");
}
public byte[] Data { get; } = new byte[10000000];
}
It seems that it creates all the objects at the same time.
Yes, because you are creating them all at once.
If I simplify your code I can show you why:
void Main()
{
var selectMany =
Enumerable
.Range(1, 5)
.Do(x => Console.WriteLine($"{x}!"))
.ToObservable()
.SelectMany(i => Observable.FromAsync(() => DoSomethingAsync(i)));
selectMany
.Subscribe(r => Console.WriteLine(r));
}
private static async Task<int> DoSomethingAsync(int i)
{
await Task.Delay(1);
return i;
}
Running this produces:
1!
2!
3!
4!
5!
4
3
5
2
1
Because of the Observable.FromAsync you are allowing the source to run to completion before any of the results return. In other words you are quickly building all of the large objects, but slowly processing them.
You should allow Rx to run synchronously, but on the default scheduler so that your main thread is not blocked. The code will then run without any memory issues and your program will remain responsive on the main thread.
Here's the code for this:
var selectMany =
Observable
.Range(1, 100, Scheduler.Default)
.Select(i => new LargeObject(i))
.Select(o => DoSomethingAsync(o))
.Select(t => t.Result);
(I've effectively replaced Enumerable.Range(1, 100).ToObservable() with Observable.Range(1, 100) as that will also help with some issues.)
I've tried testing other options, but so far anything that allows DoSomethingAsync to run asynchronously runs into the out of memory error.
ConcatMap supports this out of the box. I know this operator is not available in .net, but you can make the same using Concat operator which defers subscribing to each inner source until the previous one completes.
You can introduce a time interval delay this way:
var source = Enumerable.Range(1, 100)
.ToObservable()
.Zip(Observable.Interval(TimeSpan.FromSeconds(1)), (i, ts) => i)
.Select(i => new LargeObject(i))
.SelectMany(o => Observable.FromAsync(() => DoSomethingAsync(o)));
So instead of pulling all 100 integers at once, immediately converting them to the LargeObject then calling DoSomethingAsync on all 100, it drips the integers out one-by-one spaced out one second each.
This is what a TPL+Rx solution would look like. Needless to say it is less elegant than Rx alone, or TPL alone. However, I don't think this problem is well suited for Rx:
void Main()
{
var source = Observable.Range(1, 100);
const int MaxParallelism = 5;
var transformBlock = new TransformBlock<int, int>(async i => await DoSomethingAsync(new LargeObject(i)),
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = MaxParallelism });
source.Subscribe(transformBlock.AsObserver());
var selectMany = transformBlock.AsObservable();
selectMany
.Subscribe(r => Console.WriteLine(r));
}

Counting Non-Faulted Tasks causes re-execution of each task

I am saving a bunch of items to my database using async saves
var tasks = items.Select(item =>
{
var clone = item.MakeCopy();
clone.Id = Guid.NewGuid();
return dbAccess.SaveAsync(clone);
});
await Task.WhenAll(tasks);
I need to verify how many times SaveAsync was successful (It throws and exception if something goes wrong). I am using IsFaulted flag to examine the tasks:
var successCount = tasks.Count(t => !t.IsFaulted);
Collection items consists of 3 elements so SaveAsync should have been called three times but it is executed 6 times. Upon closer examination I noticed that counting non-faulted tasks with c.Count(...) causes each of the task to re-run.
I suspect it has something to do with deferred LINQ execution but I am not sure why exactly and how to fix this.
Any suggestion why I observe this behavior and what would be the optimal pattern to avoid this artifact?
It happens because of multiple enumeration of your Select query.
In order to fix it, force enumeration by calling ToList() method. Then it will work correctly.
var tasks = items.Select(item =>
{
var clone = item.MakeCopy();
clone.Id = Guid.NewGuid();
return dbAccess.SaveAsync(clone);
})
.ToList();
Also you may take a look at these more detailed answers:
https://stackoverflow.com/a/8240935/3872935
https://stackoverflow.com/a/20129161/3872935.

Search on TextChanged with Reactive Extensions

I was trying to implement instant search on a database table with 10000+ records.
The search starts when the text inside the search text box changes, when the search box becomes empty I want to call a different method that loads all the data.
Also if the user changes the search string while results for another search are being loaded, then the loading of the those results should stop in favor of the new search.
I implemented it like the following code, but I was wondering if there is a better or cleaner way to do it using Rx (Reactive Extension) operators, I feel that creating a second observable inside the subscribe method of the first observable is more imperative than declarative, and the same for that if statement.
var searchStream = Observable.FromEventPattern(s => txtSearch.TextChanged += s, s => txtSearch.TextChanged -= s)
.Throttle(TimeSpan.FromMilliseconds(300))
.Select(evt =>
{
var txtbox = evt.Sender as TextBox;
return txtbox.Text;
}
);
searchStream
.DistinctUntilChanged()
.ObserveOn(SynchronizationContext.Current)
.Subscribe(searchTerm =>
{
this.parties.Clear();
this.partyBindingSource.ResetBindings(false);
long partyCount;
var foundParties = string.IsNullOrEmpty(searchTerm) ? partyRepository.GetAll(out partyCount) : partyRepository.SearchByNameAndNotes(searchTerm);
foundParties
.ToObservable(Scheduler.Default)
.TakeUntil(searchStream)
.Buffer(500)
.ObserveOn(SynchronizationContext.Current)
.Subscribe(searchResults =>
{
this.parties.AddRange(searchResults);
this.partyBindingSource.ResetBindings(false);
}
, innerEx =>
{
}
, () => { }
);
}
, ex =>
{
}
, () =>
{
}
);
The SearchByNameAndNotes method just returns an IEnumerable<Party> using SQLite by reading data from a data reader.
I think you want something like this. EDIT: From your comments, I see you have a synchronous repository API - I'll leave the asynchronous version in, and add a synchronous version afterwards. Notes inline:
Asynchronous Repository Version
An asynchronous repository interface could be something like this:
public interface IPartyRepository
{
Task<IEnumerable<Party>> GetAllAsync(out long partyCount);
Task<IEnumerable<Party>> SearchByNameAndNotesAsync(string searchTerm);
}
Then I refactor the query as:
var searchStream = Observable.FromEventPattern(
s => txtSearch.TextChanged += s,
s => txtSearch.TextChanged -= s)
.Select(evt => txtSearch.Text) // better to select on the UI thread
.Throttle(TimeSpan.FromMilliseconds(300))
.DistinctUntilChanged()
// placement of this is important to avoid races updating the UI
.ObserveOn(SynchronizationContext.Current)
.Do(_ =>
{
// I like to use Do to make in-stream side-effects explicit
this.parties.Clear();
this.partyBindingSource.ResetBindings(false);
})
// This is "the money" part of the answer:
// Don't subscribe, just project the search term
// into the query...
.Select(searchTerm =>
{
long partyCount;
var foundParties = string.IsNullOrEmpty(searchTerm)
? partyRepository.GetAllAsync(out partyCount)
: partyRepository.SearchByNameAndNotesAsync(searchTerm);
// I assume the intention of the Buffer was to load
// the data into the UI in batches. If so, you can use Buffer from nuget
// package Ix-Main like this to get IEnumerable<T> batched up
// without splitting it up into unit sized pieces first
return foundParties
// this ToObs gets us into the monad
// and returns IObservable<IEnumerable<Party>>
.ToObservable()
// the ToObs here gets us into the monad from
// the IEnum<IList<Party>> returned by Buffer
// and the SelectMany flattens so the output
// is IObservable<IList<Party>>
.SelectMany(x => x.Buffer(500).ToObservable())
// placement of this is again important to avoid races updating the UI
// erroneously putting it after the Switch is a very common bug
.ObserveOn(SynchronizationContext.Current);
})
// At this point we have IObservable<IObservable<IList<Party>>
// Switch flattens and returns the most recent inner IObservable,
// cancelling any previous pending set of batched results
// superceded due to a textbox change
// i.e. the previous inner IObservable<...> if it was incomplete
// - it's the equivalent of your TakeUntil, but a bit neater
.Switch()
.Subscribe(searchResults =>
{
this.parties.AddRange(searchResults);
this.partyBindingSource.ResetBindings(false);
},
ex => { },
() => { });
Synchronous Repository Version
An synchronous repository interface could be something like this:
public interface IPartyRepository
{
IEnumerable<Party> GetAll(out long partyCount);
IEnumerable<Party> SearchByNameAndNotes(string searchTerm);
}
Personally, I don't recommend a repository interface be synchronous like this. Why? It is typically going to do IO, so you will wastefully block a thread.
You might say the client could call from a background thread, or you could wrap their call in a task - but this is not the right way to go I think.
The client doesn't "know" you are going to block; it's not expressed in the contract
It should be the repository that handles the asynchronous aspect of the implementation - after all, how this is best achieved will only be known best by the repository implementer.
Anyway, accepting the above, one way to implement is like this (of course it's mostly similar to the async version so I've only annotated the differences):
var searchStream = Observable.FromEventPattern(
s => txtSearch.TextChanged += s,
s => txtSearch.TextChanged -= s)
.Select(evt => txtSearch.Text)
.Throttle(TimeSpan.FromMilliseconds(300))
.DistinctUntilChanged()
.ObserveOn(SynchronizationContext.Current)
.Do(_ =>
{
this.parties.Clear();
this.partyBindingSource.ResetBindings(false);
})
.Select(searchTerm =>
// Here we wrap the synchronous repository into an
// async call. Note it's simply not enough to call
// ToObservable(Scheduler.Default) on the enumerable
// because this can actually still block up to the point that the
// first result is yielded. Doing as we have here,
// we guarantee the UI stays responsive
Observable.Start(() =>
{
long partyCount;
var foundParties = string.IsNullOrEmpty(searchTerm)
? partyRepository.GetAll(out partyCount)
: partyRepository.SearchByNameAndNotes(searchTerm);
return foundParties;
}) // Note you can supply a scheduler, default is Scheduler.Default
.SelectMany(x => x.Buffer(500).ToObservable())
.ObserveOn(SynchronizationContext.Current))
.Switch()
.Subscribe(searchResults =>
{
this.parties.AddRange(searchResults);
this.partyBindingSource.ResetBindings(false);
},
ex => { },
() => { });

Parallel Linq - return first result that comes back

I'm using PLINQ to run a function that tests serial ports to determine if they're a GPS device.
Some serial ports immediately are found to be a valid GPS. In this case, I want the first one to complete the test to be the one returned. I don't want to wait for the rest of the results.
Can I do this with PLINQ, or do I have to schedule a batch of tasks and wait for one to return?
PLINQ is probably not going to suffice here. While you can use .First, in .NET 4, this will cause it to run sequentially, which defeats the purpose. (Note that this will be improved in .NET 4.5.)
The TPL, however, is most likely the right answer here. You can create a Task<Location> for each serial port, and then use Task.WaitAny to wait on the first successful operation.
This provides a simple way to schedule a bunch of "tasks" and then just use the first result.
I have been thinking about this on and off for the past couple days and I can't find a built in PLINQ way to do this in C# 4.0. The accepted answer to this question of using FirstOrDefault does not return a value until the full PLINQ query is complete and still returns the (ordered) first result. The following extreme example shows the behavior:
var cts = new CancellationTokenSource();
var rnd = new ThreadLocal<Random>(() => new Random());
var q = Enumerable.Range(0, 11).Select(x => x).AsParallel()
.WithCancellation(cts.Token).WithMergeOptions( ParallelMergeOptions.NotBuffered).WithDegreeOfParallelism(10).AsUnordered()
.Where(i => i % 2 == 0 )
.Select( i =>
{
if( i == 0 )
Thread.Sleep(3000);
else
Thread.Sleep(rnd.Value.Next(50, 100));
return string.Format("dat {0}", i).Dump();
});
cts.CancelAfter(5000);
// waits until all results are in, then returns first
q.FirstOrDefault().Dump("result");
I don't see a built-in way to immediately get the first available result, but I was able to come up with two workarounds.
The first creates Tasks to do the work and returns the Task, resulting in a quickly completed PLINQ query. The resulting tasks can be passed to WaitAny to get the first result as soon as it is available:
var cts = new CancellationTokenSource();
var rnd = new ThreadLocal<Random>(() => new Random());
var q = Enumerable.Range(0, 11).Select(x => x).AsParallel()
.WithCancellation(cts.Token).WithMergeOptions( ParallelMergeOptions.NotBuffered).WithDegreeOfParallelism(10).AsUnordered()
.Where(i => i % 2 == 0 )
.Select( i =>
{
return Task.Factory.StartNew(() =>
{
if( i == 0 )
Thread.Sleep(3000);
else
Thread.Sleep(rnd.Value.Next(50, 100));
return string.Format("dat {0}", i).Dump();
});
});
cts.CancelAfter(5000);
// returns as soon as the tasks are created
var ts = q.ToArray();
// wait till the first task finishes
var idx = Task.WaitAny( ts );
ts[idx].Result.Dump("res");
This is probably a terrible way to do it. Since the actual work of the PLINQ query is just a very fast Task.Factory.StartNew, it's pointless to use PLINQ at all. A simple .Select( i => Task.Factory.StartNew( ... on the IEnumerable is cleaner and probably faster.
The second workaround uses a queue (BlockingCollection) and just inserts results into this queue once they are computed:
var cts = new CancellationTokenSource();
var rnd = new ThreadLocal<Random>(() => new Random());
var q = Enumerable.Range(0, 11).Select(x => x).AsParallel()
.WithCancellation(cts.Token).WithMergeOptions( ParallelMergeOptions.NotBuffered).WithDegreeOfParallelism(10).AsUnordered()
.Where(i => i % 2 == 0 )
.Select( i =>
{
if( i == 0 )
Thread.Sleep(3000);
else
Thread.Sleep(rnd.Value.Next(50, 100));
return string.Format("dat {0}", i).Dump();
});
cts.CancelAfter(5000);
var qu = new BlockingCollection<string>();
// ForAll blocks until PLINQ query is complete
Task.Factory.StartNew(() => q.ForAll( x => qu.Add(x) ));
// get first result asap
qu.Take().Dump("result");
With this method, the work is done using PLINQ, and the BlockingCollecion's Take() will return the first result as soon as it is inserted by the PLINQ query.
While this produces the desired result, I am not sure it has any advantage over just using the simpler Tasks + WaitAny
Upon further review, you can apparently just use FirstOrDefault to solve this. PLINQ will not preserve ordering by default, and with an unbuffered query, will return immediately.
http://msdn.microsoft.com/en-us/library/dd460677.aspx
To accomplish this entirely with PLINQ in .NET 4.0:
SerialPorts. // Your IEnumerable of serial ports
AsParallel().AsUnordered(). // Run as an unordered parallel query
Where(IsGps). // Matching the predicate IsGps (Func<SerialPort, bool>)
Take(1). // Taking the first match
FirstOrDefault(); // And unwrap it from the IEnumerable (or null if none are found
The key is to not use an ordered evaluation like First or FirstOrDefault until you have specified that you only care to find one.

Categories