I am having a difficult time understanding the TPL and I cannot find many clear articles on it. Most seem to use simplistic examples with lambda expressions.
I have a C# function
int[] PlayGames(int[][] boardToSearch, int numGamesToPlay) {…}
I want to make this threadable using the .NET 4.6 TPL in C#. I want to launch up to 8 of these functions at once, wait until they all finish, capture the results and move on.
I can’t seem to get the types right and it’s not working as expected.
Here’s what I’ve got so far:
Task<int[]> PlayGames(int[][] boardToSearch, int numGamesToPlay) {…code that takes a long time…}
private int FindBestMove(int[][] boardToSearch, int numGamesToPlay)
{
…
var taskList = new List<Task>();
taskList.Add(Task.Factory.StartNew(() => { return PlayGames(boardToSearch, numGamesToPlay); }));
taskList.Add(Task.Factory.StartNew(() => { return PlayGames(boardToSearch, numGamesToPlay); }));
// Tests
Task a = taskList.First();
var a1 = a.Result; // NOT ALLOWED!? Viewable in debugger, cannot access it.
var b = Task.FromResult(a);
var b1 = b.Result; // Works but cannot access child member Result. Debugger sees it, I can’t!?
Task.WaitAll(taskList.ToArray());
…
}
Here are my questions
How do I remove the lambda expression () => { return PlayGames(boardToSearch, numGamesToPlay); }? I want to use Func() somehow but I cannot for the life of me figure out how to say “Task.Factory.StartNew<int[]>(Func(PlayGames(boardToSearch, numGamesToPlay)))”.
Why do I need to use StartNew()? When I do taskList.Add(PlayGames(int[][] boardToSearch, int numGamesToPlay)), it does it synchronously!? What is the correct syntax to add a task to a list in this manner? Do I need to declare a Func of some sorts that I pass to something like new Task(Func(PlayGames))?
When you look at variable a after executing the line Task a = taskList.First(), it clearly shows a member called Result in the debug pane. And if I expand that result it contains the right data! But if I click on add watch to the Result member, I get an error! And if I do a.Result, the compiler gives me the same error!?? So the debugger says it’s there but I cannot view it on its own!?? I can browse it from the a object but not directly. I included a screenshot of this so others could see.
What is the cleanest way to do this with .NET 4.6 while staying away from lambda expressions. I want to see all the types and the declarations.
Attached is a screenshot of my debugger so you can see what I mean with .Result
Let's start from the top:
1] Func is just a delegate type that is part of the .NET Framework libraries.
So when you pass () => { return PlayGames(boardToSearch, numGamesToPlay); }, you create an anonymous method of type Func<int[]> (assuming PlayGames returns int[], as in your original function). If you assign this lambda expression to a variable, you can check its type.
If you don't want to use a lambda, you can write an ordinary method and pass it to the task: Task.Factory.StartNew(YourMethodWhichReturnsIntArray).
2] Calling StartNew() creates a new Task and starts executing it. That's it.
taskList.Add(PlayGames(int[][] boardToSearch, int numGamesToPlay)) just puts the Task returned by PlayGames into taskList. The call to PlayGames itself runs on the calling thread, so if the method does not start a Task internally, nothing runs in parallel and you will have to start one yourself afterwards. Adding a Task to the list is always a synchronous operation; only the execution of a started Task is asynchronous. Which syntax is correct depends on the complexity and the implementation. Instead of Task.Factory.StartNew() you can use Task.Run(), which does the same thing here with a shorter syntax. It is not necessary to declare a Func before passing it to the Task.
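3] I believe this is because a is declared as the non-generic Task, which has no Result property; only Task<TResult> has one. The debugger shows the runtime type of the object, so you can browse Result there, but the compiler and a watch on a.Result use the declared type. That's why you get an error.
So declare the variable and the list with the generic type (for example List<Task<int[]>>), and a.Result will compile.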
4] What is the cleanest way to do this with .NET 4.6 while staying away from lambda expressions. I want to see all the types and the declarations.
As I said above, you don't have to use a lambda. You can create a method with the corresponding signature, but then you will have some difficulty passing parameters to it (this can still be solved). A lambda is the easiest way to pass a function to a Task because it can capture the parameters from the scope in which it is created.
What is the cleanest way? Again, it depends; everyone has their own preferred way to run new tasks. So I think that this (see below):
// your function
int[] PlayGames(int[][] boardToSearch, int numGamesToPlay) {…}
private int YourMethodToRun8Tasks(int[][] boardToSearch, int numGamesToPlay)
{
...
var taskList = new List<Task<int[]>>();
// create and run 8 tasks
for(var i = 0; i < 8; i++)
{
// it will capture the parameters and use them in all 8 tasks
taskList.Add(Task.Run<int[]>(() => PlayGames(boardToSearch, numGamesToPlay)));
}
// do something else
...
// wait for all tasks
Task.WaitAll(taskList.ToArray());
// do something else
}
might be considered the cleanest way in some cases.
I hope it will help you.
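As a sketch of the "no lambda" route mentioned in points 1 and 4: Task.Factory.StartNew has an overload that takes a Func<object, TResult> plus a state object, so the arguments can travel in a small holder class instead of being captured. PlayGamesArgs, PlayGamesFromState and the trivial PlayGames body below are hypothetical stand-ins, not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical holder for the arguments that a lambda would otherwise capture.
sealed class PlayGamesArgs
{
    public int[][] Board;
    public int NumGames;
}

static class NoLambdaDemo
{
    // Stand-in for the long-running function from the question.
    static int[] PlayGames(int[][] boardToSearch, int numGamesToPlay)
    {
        return new[] { boardToSearch.Length, numGamesToPlay };
    }

    // Named method whose signature matches Func<object, int[]>.
    static int[] PlayGamesFromState(object state)
    {
        var args = (PlayGamesArgs)state;
        return PlayGames(args.Board, args.NumGames);
    }

    public static List<int[]> RunAll(int[][] board, int numGames, int taskCount)
    {
        // Explicit delegate type, created from a method group.
        Func<object, int[]> work = PlayGamesFromState;

        // Generic task type, so that Result is available at compile time.
        var taskList = new List<Task<int[]>>();
        for (int i = 0; i < taskCount; i++)
        {
            var args = new PlayGamesArgs { Board = board, NumGames = numGames };
            Task<int[]> task = Task.Factory.StartNew<int[]>(work, args);
            taskList.Add(task);
        }

        // Block until every task has finished, then collect the results.
        Task.WaitAll(taskList.ToArray());
        var results = new List<int[]>();
        foreach (Task<int[]> task in taskList)
        {
            results.Add(task.Result);
        }
        return results;
    }
}
```

Calling NoLambdaDemo.RunAll(board, numGames, 8) would start eight tasks and return their eight int[] results. The extra holder class is the price of avoiding the closure that a lambda would create for you.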
Is there maybe something like a "when" statement in C#?
The reason I want this is because an if statement only checks once if a certain property is true at a particular time, but I want it to wait until the property is true.
Anybody know something I can do in C# that would be similar to a "when" statement?
What you want is SpinWait
e.g. SpinWait.SpinUntil(() => condition);
It will sit there until it either times out (with your specified timeout) or the condition is met.
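A minimal sketch of the timeout variant, assuming the condition is a flag set by another thread. The ready field and the background task are hypothetical, only there to make the example self-contained. SpinWait.SpinUntil returns true if the condition was met and false if the timeout elapsed first.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class SpinUntilDemo
{
    // Hypothetical flag that some other thread sets; volatile so the
    // spinning thread always sees the latest value.
    static volatile bool ready;

    public static bool WaitForFlag()
    {
        ready = false;

        // Simulate another thread making the condition true a little later.
        Task.Run(() =>
        {
            Thread.Sleep(50);
            ready = true;
        });

        // Spin until the condition holds or five seconds have passed.
        return SpinWait.SpinUntil(() => ready, TimeSpan.FromSeconds(5));
    }
}
```

Spinning keeps the waiting thread busy, so this fits short waits better than long ones.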
There is no when control statement, but there are two options which might meet your needs:
You can use a while(predicate){} loop to keep looping until a condition is met. The predicate can be any expression which returns true/false - as long as the condition is true, it will loop. If you just want to wait without consuming too much CPU, you can Sleep() the thread within the loop:
while(name == "Sebastian")
{
// Code to execute
System.Threading.Thread.Sleep(1000);
}
If your property is a numeric range, you could use a for loop, but that doesn't sound like what you want.
If you want to deal with an asynchronous world, then you should perhaps look at the Rx.NET library. Let's look at a simple example: suppose you want to read strings from the console and, when the user enters the word "hello", print "world" in response. This can be implemented as follows:
var inputLines = new Subject<string>();
inputLines.Subscribe(info =>
{
if (info == "hello")
Console.Out.WriteLine("world");
});
while (true)
{
var line = Console.In.ReadLine();
inputLines.OnNext(line);
}
So there is an explicit "when" action, which we pass to the Subscribe(...) function.
In this simple example Rx.NET is obviously unnecessary and you shouldn't use it. But in more complex scenarios it is a very helpful library. With Reactive Extensions you separate the logic of your application from the main event loop, where you may want to do other work unrelated to the application logic. The library is also very flexible because it is dynamic: you can subscribe to and unsubscribe from different events at any time at run time.
You may notice that there is another way to solve my example in the event-based paradigm. We can simply use built-in events like this:
public static event EventHandler<string> InputEvent;
public void Run()
{
InputEvent += (sender, line) => {
if (line == "hello")
Console.WriteLine("world");
};
while (true) {
var line = Console.In.ReadLine();
InputEvent?.Invoke(this, line);
}
}
And this is a fair point: sometimes you can replace Reactive Extensions with simple events, because the two are closely related. But when you need to build a complex pipeline from many event sources with many tightly coupled actions, Reactive Extensions let you build that pipeline in a very declarative way.
You can use 'async' and 'await' to wait until a certain 'Task' is complete. 'await' is somewhat similar to the 'when' statement you need, except that it pauses the current method until the awaited 'Task' has finished with any result, not just when an expression becomes 'true'. See also TaskCompletionSource.
What are you trying to achieve here? Are you running a synchronous process or are you waiting for something asynchronous to happen?
If you're synchronous then while is probably the correct solution:
var result = 0;
while(result != 6)
{
result = RollADie();
Console.WriteLine($"I rolled a {result}");
}
Console.WriteLine("At last, a six!");
But - if you're waiting for something asynchronous to happen then a different solution is called for. An asynchronous scenario is where you want your code to 'hang around and wait, doing nothing' until the condition is fulfilled.
In that case the modern C# solution is asynchronous programming using the async and await keywords, along with the Task class (and its generic cousin Task<TResult>). That's probably a bit deep to go into here, but here's a pretty good primer.
What's important is that you don't use a solution based on while in order to deal with asynchronous processes. You'll just send the CPU spinning in circles, chasing its own tail so to speak, when really you want to say "now stop working on this until X happens". Also, avoid any solution based on while combined with Thread.Sleep for related reasons.
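To make the "stop working on this until X happens" idea concrete, here is a small sketch built on TaskCompletionSource. The Door class, its members and the simulated opener are hypothetical; the point is only that the waiting code awaits a Task that is completed at the moment the condition becomes true, with no polling loop.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical object whose "opened" state we want to wait for.
sealed class Door
{
    private readonly TaskCompletionSource<bool> opened =
        new TaskCompletionSource<bool>();

    // Task that completes once the door has been opened.
    public Task WhenOpened
    {
        get { return opened.Task; }
    }

    // Whoever changes the state also signals the waiters.
    public void Open()
    {
        opened.TrySetResult(true);
    }
}

static class WhenDemo
{
    public static async Task<string> RunAsync()
    {
        var door = new Door();

        // Simulate something else opening the door a little later.
        Task opener = Task.Run(async () =>
        {
            await Task.Delay(100);
            door.Open();
        });

        // No loop and no Sleep: execution resumes here when Open() is called.
        await door.WhenOpened;
        await opener;
        return "door is open";
    }
}
```

The design choice is that the code which makes the condition true is responsible for signalling it, rather than the waiting code repeatedly checking.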
I am programming a game using the Unity Engine and I am currently running into the following problem:
I have a method that returns its result asynchronously through the parameter of a callback function. Pretty straightforward, it looks like this:
public void CalculateSomething( - PARAMETERS - , Action<float> callback)
I have to call this method in a loop for different parameters.
foreach(float f in manyFloats){
CalculateSomething(f, myCallback);
}
void myCallback(float f){
...compare this result value to the other values?...
}
Now I would like to compare the resulting floats, that come with the different callbacks. Let's just say I want to find the highest value among those floats. How do I do that?
I had the idea to store the results in an array field and just compare after it is fully filled, but I don't know how to check if all callbacks are done already.
Ideally I'd like to avoid polluting my class with a field like this, but it would be alright if there is no other way.
The CalculateSomething function is part of a library, so I can't change it. Everything else is variable.
Here is the deal.
You have the right idea: create the array, store the values, and compare them once all callbacks are done. The problem is that you don't know when all the callbacks have returned. But you do know how many callbacks to expect, from the count of your original manyFloats collection. So keep a counter, increment it every time a callback returns, and do the comparison once it equals the count of manyFloats:
int count = 0;
void myCallback(float f)
{
// ... usual stuff (for example, store f in the results array)
// count this callback first, then check whether it was the last one
count++;
if(count == manyFloats.Count)
{
// all callbacks have returned: do the comparison
}
}
Rather than using a callback based model you should use a Task based model. Have CalculateSomething return a Task<float> instead of having a callback. This allows you to use the TPL to compose these Task objects by writing code like:
var highestResult = (await Task.WhenAll(manyFloats.Select(CalculateSomething))).Max();
If you can't edit the method itself, then create your own wrapper that transforms the method into a task based version.
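One way such a wrapper could look, sketched with TaskCompletionSource. The CalculateSomething body below is a stand-in with a single float parameter, since the real library method and its parameter list are not shown in the question; CalculateSomethingAsync and HighestAsync are hypothetical names. Whether async/await is available depends on the Unity and C# version in use.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

static class CallbackToTask
{
    // Stand-in for the library method: reports its result via a callback.
    static void CalculateSomething(float input, Action<float> callback)
    {
        Task.Run(() => callback(input * 2f));
    }

    // Wrapper: turns the callback-based call into a Task<float>.
    static Task<float> CalculateSomethingAsync(float input)
    {
        var tcs = new TaskCompletionSource<float>();
        CalculateSomething(input, result => tcs.TrySetResult(result));
        return tcs.Task;
    }

    // Starts all calculations, waits for every callback, returns the maximum.
    public static async Task<float> HighestAsync(IEnumerable<float> manyFloats)
    {
        float[] results = await Task.WhenAll(
            manyFloats.Select(f => CalculateSomethingAsync(f)));
        return results.Max();
    }
}
```

With this in place no counter or result field is needed in the calling class; Task.WhenAll completes only after every callback has fired. Note that Max throws on an empty sequence, so an empty manyFloats would need separate handling.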
So, if we have no control over CalculateSomething function, we still can store max value and make comparison in our callback. Something like this:
void callbackFunction( float numb){
if (numb > maxNumb) //maxNumb is global
maxNumb = numb;
}
Then you can use your foreach loop to go through your array. Just keep in mind that you need to declare the maxNumb variable at class level and initialize it to the minimum possible value.
Even if you are not doing a min/max comparison but something more complicated, you can still do it in the callback function. There is no need for arrays: anything you can compute in a single pass through an array, you can also compute in a callback function.
In the following tutorial: http://www.albahari.com/threading/
They say that the following code :
for (int i = 0; i < 10; i++)
new Thread (() => Console.Write (i)).Start();
is non deterministic and can produce the following answer :
0223557799
I thought that when one uses lambda expressions the compiler creates some kind of anonymous class that captures the variables that are in use by creating members like them in the capturing class.
But i is a value type, so I thought that it would be copied by value.
Where is my mistake?
It would be very helpful if the answer explained how the closure works, how it holds a "pointer" to a specific int, and what code is generated in this specific case.
The key point here is that closures close over variables, not over values. As such, the value of a given variable at the time you close over it is irrelevant. What matters is the value of that variable at the time the anonymous method is invoked.
How this happens is easy enough to see when you see what the compiler transforms the closure into. It'll create something morally similar to this:
public class ClosureClass1
{
public int i;
public void AnonymousMethod1()
{
Console.WriteLine(i);
}
}
static void Main(string[] args)
{
ClosureClass1 closure1 = new ClosureClass1();
for (closure1.i = 0; closure1.i < 10; closure1.i++)
new Thread(closure1.AnonymousMethod1).Start();
}
So here we can see a bit more clearly what's going on. There is one copy of the variable, and that variable has now been promoted to a field of a new class, instead of being a local variable. Anywhere that would have modified the local variable now modifies the field of this instance. We can now see why your code prints what it does. After starting the new thread, but before it can actually execute, the for loop in the main thread is going back and incrementing the variable in the closure. The variable that hasn't yet been read by the closure.
To produce the desired result what you need to do is make sure that, instead of having every iteration of the loop closing over a single variable, they need to each have a variable that they close over:
for (int i = 0; i < 10; i++)
{
int copy = i;
new Thread(() => Console.WriteLine(copy)).Start();
}
Now the copy variable is never changed after it is closed over, and our program will print out 0-9 (although in an arbitrary order, because threads can be scheduled however the OS wants).
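The same effect can be reproduced without threads, which removes the timing element and makes the difference deterministic. In this sketch (the ClosureDemo class and its method names are made up for illustration) the delegates are only invoked after the loop has finished, so the ones that share i all read its final value.

```csharp
using System;
using System.Collections.Generic;

static class ClosureDemo
{
    // Every delegate closes over the same loop variable i.
    public static int[] SharedVariable()
    {
        var readers = new List<Func<int>>();
        for (int i = 0; i < 3; i++)
        {
            readers.Add(() => i);
        }
        return InvokeAll(readers); // i is 3 by now, so this yields 3, 3, 3
    }

    // Every delegate closes over its own copy variable.
    public static int[] CopyPerIteration()
    {
        var readers = new List<Func<int>>();
        for (int i = 0; i < 3; i++)
        {
            int copy = i;
            readers.Add(() => copy);
        }
        return InvokeAll(readers); // yields 0, 1, 2
    }

    static int[] InvokeAll(List<Func<int>> readers)
    {
        var values = new int[readers.Count];
        for (int n = 0; n < readers.Count; n++)
        {
            values[n] = readers[n]();
        }
        return values;
    }
}
```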
As Albahari states, although the arguments passed are value types, each thread captures the memory location of the variable, which leads to unexpected results.
This happens because the loop has already changed the value stored in i before the thread has had time to start.
To avoid that, you should use a temporary variable as Albahari suggests, or only capture a variable when you know it is not going to change.
i in Console.Write(i) is evaluated at the moment that statement executes, which is only after the thread has been created, started, and reached that code. By then the loop has advanced a few times, so i can have any value. Closures, unlike regular functions, can see the local variables of the function in which they are defined (which makes them useful, and also an easy way to shoot oneself in the foot).
I'm testing performance differences using various lambda expression syntaxes. If I have a simple method:
public IEnumerable<Item> GetItems(int point)
{
return this.items.Where(i => i.IsApplicableFor(point));
}
then there's some variable lifting going on here related to point parameter because it's a free variable from lambda's perspective. If I would call this method a million times, would it be better to keep it as it is or change it in any way to improve its performance?
What options do I have and which ones are actually feasible? As I understand it, I have to get rid of free variables so the compiler won't have to create a closure class and instantiate it on every call to this method. This instantiation usually takes a significant amount of time compared to non-closure versions.
The thing is I would like to come up with some sort of lambda writing guidelines that would generally work, because it seems I'm wasting some time every time I write a heavily hit lambda expression. I have to manually test it to make sure it will work, because I don't know what rules to follow.
Alternative method & example console application code
I've also written a different version of the same method that doesn't need any variable lifting (at least I think it doesn't, but you guys who understand this let me know if that's the case):
public IEnumerable<Item> GetItems(int point)
{
Func<int, Func<Item, bool>> buildPredicate = p => i => i.IsApplicableFor(p);
return this.items.Where(buildPredicate(point));
}
Check out Gist here. Just create a console application and copy the whole code into Program.cs file inside namespace block. You will see that the second example is much much slower even though it doesn't use free variables.
A contradictory example
The reason why I would like to construct some lambda best usage guidelines is that I've met this problem before and to my surprise that one turned out to be working faster when a predicate builder lambda expression was used.
Now explain that then. I'm completely lost here because it may as well turn out I won't be using lambdas at all when I know I have some heavy use method in my code. But I would like to avoid such situation and get to the bottom of it all.
Edit
Your suggestions don't seem to work
I've tried implementing a custom lookup class that internally works similar to what compiler does with a free variable lambda. But instead of having a closure class I've implemented instance members that simulate a similar scenario. This is the code:
private int Point { get; set; }
private bool IsItemValid(Item item)
{
return item.IsApplicableFor(this.Point);
}
public IEnumerable<TItem> GetItems(int point)
{
this.Point = point;
return this.items.Where(this.IsItemValid);
}
Interestingly enough this works just as slow as the slow version. I don't know why, but it seems to do nothing else than the fast one. It reuses the same functionality because these additional members are part of the same object instance. Anyway. I'm now extremely confused!
I've updated Gist source with this latest addition, so you can test for yourself.
What makes you think that the second version doesn't require any variable lifting? You're defining the Func with a Lambda expression, and that's going to require the same bits of compiler trickery that the first version requires.
Furthermore, you're creating a Func that returns a Func, which bends my brain a little bit and will almost certainly require re-evaluation with each call.
I would suggest that you compile this in release mode and then use ILDASM to examine the generated IL. That should give you some insight into what code is generated.
Another test that you should run, which will give you more insight, is to make the predicate call a separate function that uses a variable at class scope. Something like:
private DateTime dayToCompare;
private bool LocalIsDayWithinRange(TItem i)
{
return i.IsDayWithinRange(dayToCompare);
}
public override IEnumerable<TItem> GetDayData(DateTime day)
{
dayToCompare = day;
return this.items.Where(i => LocalIsDayWithinRange(i));
}
That will tell you if hoisting the day variable is actually costing you anything.
Yes, this requires more code and I wouldn't suggest that you use it. As you pointed out in your response to a previous answer that suggested something similar, this creates what amounts to a closure using local variables. The point is that either you or the compiler has to do something like this in order to make things work. Beyond writing the pure iterative solution, there is no magic you can perform that will prevent the compiler from having to do this.
My point here is that "creating the closure" in my case is a simple variable assignment. If this is significantly faster than your version with the Lambda expression, then you know that there is some inefficiency in the code that the compiler creates for the closure.
I'm not sure where you're getting your information about having to eliminate the free variables, and the cost of the closure. Can you give me some references?
Your second method runs 8 times slower than the first for me. As @DanBryant says in the comments, this is to do with constructing and calling the delegate inside the method, not with variable lifting.
Your question is confusing, as it reads to me as if you expected the second sample to be faster than the first. I also read it as saying that the first is somehow unacceptably slow due to 'variable lifting'. The second sample still has a free variable (point) but it adds additional overhead; I don't understand why you'd think it removes the free variable.
As the code you have posted confirms, the first sample above (using a simple inline predicate) performs just 10% slower than a simple foreach loop - from your code:
foreach (TItem item in this.items)
{
if (item.IsDayWithinRange(day))
{
yield return item;
}
}
So, in summary:
The foreach loop is the simplest approach and is "best case".
The inline predicate is slightly slower, due to some additional overhead.
Constructing and calling a Func that returns Func within each iteration is significantly slower than either.
I don't think any of this is surprising. The 'guideline' is to use an inline predicate - if it performs poorly, simplify by moving to a straight loop.
I profiled your benchmark for you and determined many things:
First of all, it spends half its time on the line return this.GetDayData(day).ToList(); calling ToList. If you remove that and instead manually iterate over the results, you can measure the relative differences between the methods.
Second, because IterationCount = 1000000 and RangeCount = 1, you are timing the initialization of the different methods rather than the amount of time it takes to execute them. This means your execution profile is dominated by creating the iterators, escaping variable records, and delegates, plus the hundreds of subsequent gen0 garbage collections that result from creating all that garbage.
Third, the "slow" method is really slow on x86, but about as fast as the "fast" method on x64. I believe this is due to how the different JITters create delegates. If you discount the delegate creation from the results, the "fast" and "slow" methods are identical in speed.
Fourth, if you actually invoke the iterators a significant number of times (on my computer, targeting x64, with RangeCount = 8), "slow" is actually faster than "foreach" and "fast" is faster than all of them.
In conclusion, the "lifting" aspect is negligible. Testing on my laptop shows that capturing a variable like you do requires an extra 10ns every time the lambda gets created (not every time it is invoked), and that includes the extra GC overhead. Furthermore, while creating the iterator in your "foreach" method is somewhat faster than creating the lambdas, actually invoking that iterator is slower than invoking the lambdas.
If the few extra nanoseconds required to create delegates is too much for your application, consider caching them. If you require parameters to those delegates (i.e. closures), consider creating your own closure classes such that you can create them once and then just change the properties when you need to reuse their delegates. Here's an example:
public class SuperFastLinqRangeLookup<TItem> : RangeLookupBase<TItem>
where TItem : RangeItem
{
public SuperFastLinqRangeLookup(DateTime start, DateTime end, IEnumerable<TItem> items)
: base(start, end, items)
{
// create delegate only once
predicate = i => i.IsDayWithinRange(day);
}
DateTime day;
Func<TItem, bool> predicate;
public override IEnumerable<TItem> GetDayData(DateTime day)
{
this.day = day; // set captured day to correct value
return this.items.Where(predicate);
}
}
When a LINQ expression that uses deferred execution executes within the same scope that encloses the free variables it references, the compiler should detect that and not create a closure over the lambda, because it's not needed.
The way to verify that would be by testing it using something like this:
public class Test
{
public static void ExecuteLambdaInScope()
{
// here, the lambda executes only within the scope
// of the referenced variable 'add'
var items = Enumerable.Range(0, 100000).ToArray();
int add = 10; // free variable referenced from lambda
Func<int,int> f = x => x + add;
// measure how long this takes:
var array = items.Select( f ).ToArray();
}
static Func<int,int> GetExpression()
{
int add = 10;
return x => x + add; // this needs a closure
}
static void ExecuteLambdaOutOfScope()
{
// here, the lambda executes outside the scope
// of the referenced variable 'add'
Func<int,int> f = GetExpression();
var items = Enumerable.Range(0, 100000).ToArray();
// measure how long this takes:
var array = items.Select( f ).ToArray();
}
}
I'm curious to know whether a Lambda (when used as delegate) will create a new instance every time it is invoked, or whether the compiler will figure out a way to instantiate the delegate only once and pass in that instance.
More specifically, I'm wanting to create an API for an XNA game that I can use a lambda to pass in a custom call back. Since this will be called in the Update method (which is called many times per second) it would be pretty bad if it newed up an instance everytime to pass in the delegate.
InputManager.GamePads.ButtonPressed(Buttons.A, s => s.MoveToScreen<NextScreen>());
Yes, it will cache them when it can:
using System;
class Program {
static void Main(string[] args) {
var i1 = test(10);
var i2 = test(20);
System.Console.WriteLine(object.ReferenceEquals(i1, i2));
}
static Func<int, int> test(int x) {
Func<int, int> inc = y => y + 1;
Console.WriteLine(inc(x));
return inc;
}
}
It creates a static field, and if it's null, populates it with a new delegate, otherwise returns the existing delegate.
Outputs 11, 21, True.
I was interested by your question because I had just assumed that this kind of thing would always generate a new object and hence to be avoided in code which is called frequently.
I do something similar, so I thought I would use ildasm to find out exactly what is going on behind the scenes. In my case it turned out that a new object was being created each time the delegate was called. I won't post my code because it is fairly complex and not easy to understand out of context. This conflicts with the answer provided by MichaelGG, I suspect because his example makes use of static functions. I would suggest you try it for yourself before designing everything one way and later finding out that you have a problem. ildasm is the way to go (http://msdn.microsoft.com/en-us/library/f7dy01k1.aspx); look out for any "newobj" lines, which you want to avoid.
It is also worth using CLR Profiler to find out whether your lambda functions are allocating memory (https://github.com/MicrosoftArchive/clrprofiler). It says it's for framework 2.0 but it also works for 3.5, and it's the latest version that is available.
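The two answers above can be reconciled with a small experiment, sketched here under the assumption that the difference comes from captured variables. A lambda that captures nothing can be cached by the compiler, which is an implementation detail rather than a language guarantee, while a lambda that captures a local or parameter needs a fresh closure object and delegate each time it is created. The class and method names are made up for illustration.

```csharp
using System;

static class DelegateCachingDemo
{
    // Captures nothing: the compiler is free to reuse one delegate instance.
    static Func<int, int> NonCapturing()
    {
        return y => y + 1;
    }

    // Captures the parameter add: a new closure and delegate per call.
    static Func<int, int> Capturing(int add)
    {
        return y => y + add;
    }

    public static bool NonCapturingIsReused()
    {
        return object.ReferenceEquals(NonCapturing(), NonCapturing());
    }

    public static bool CapturingIsReused()
    {
        return object.ReferenceEquals(Capturing(1), Capturing(1));
    }
}
```

For an Update method that runs many times per second, this suggests keeping the callback lambda free of captured locals, or creating the delegate once and storing it in a field, if allocations turn out to matter in profiling.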