Tasks with LinkedList nodes - c#

I have multiple tasks and each starts at a given LinkedListNode. Something like this:
first = linkedList.First;
int counter = 0;
while (iterator != null) {
counter++;
if (counter == threshold) {
Task.Factory.StartNew(() => run(first, iterator));
counter = 0;
first = iterator.Next;
}
iterator = iterator.Next;
}
The idea is that I want to run through a LinkedList and not convert it to an array because of memory requirements. So, I figured I'd just pass in the start and the end and iterate over that.
When my tasks actually start, it seems like the parameters are where ever they left off in the loop. Is there some way I can form a closure over the variables so that the task starts with the correct nodes from the LinkedList?
Or, maybe a better way of accomplishing this goal with a LinkedList?

Consider using Parallel.ForEach on your linked list instead. This looks like it will save you a lot of trouble.

Due to variable closure, you should declare the variable that is captured in your lambda expression within the body of your loop. Otherwise, the task's reads of the first variable would suffer a race condition with the main thread's subsequent updates to it.
first = linkedList.First;
while (iterator != null)
{
// ...
var current = first;
Task.Factory.StartNew(() => run(current));
// ...
}

You can pass state to your Task through the appropriate StartNew method which takes an object parameter. A closure is automatically performed over the data passed in this way to the Task.

Related

C# Passing array of string element into a Task.Run

Trying to pass an element of an array of strings into a function which is being called in a Task.Run. Anyone know what is the error here?
The code here doesn't work, it behaves as if ProcessElem never gets called.
string[] arr = message.Split(new string[] {"\n"}, StringSplitOptions.None);
for (int i = 0; i < arr.Length; i++) {
if(arr[i] != "") {
var t = Task.Run(() => this.ProcessElem(arr[i]));
}
}
However the code below works
string[] arr = message.Split(new string[] {"\n"}, StringSplitOptions.None);
for (int i = 0; i < arr.Length; i++) {
if(arr[i] != "") {
var tmp = arr[i];
var t = Task.Run(() => this.ProcessElem(tmp));
}
}
I'm very new to how C# does things, but it seems like both patterns are unsafe because the function that calls Task.Run() might return before the ProcessElem function executes, and if the strings are pass by reference then they will be destroyed before ProcessElem is called.
If this is the case, what would be the best way to pass the string into ProcessElem?
Also, why doesn't the first version actually "call" ProcessElem? I have a print statement at the top of ProcessElem and it only gets printed # the second version.
Welcome to captured variables.
Task.Run(() => this.ProcessElem(arr[i]))
This essentially means:
Take my lambda action: () => this.ProcessElem(arr[i])
Run it after you've found/created a thread to do so. i.e. some time later.
However, there's only one variable involved, i, and that's defined outside your lambda action's scope, it's not being copied, the same variable is just being captured and referenced.
By the time that thread gets around to executing, the value of i has most likely changed. Usually, the loop finishes before the threads perform their work.
That means that by that time, i equals arr.Length and all threads try to access arr[arr.length] which obviously results in an IndexOutOfRangeException.
When you do var tmp = arr[i];, you are creating a fresh variable per loop iteration, copying the loop variable and capturing that copy in your lambda, which is why it works.
The source of your problem is how the actual "coroutines" work in C#
i is not passed as the current value but rather as ref i which means that your Action always will receive the current i value when it gets executed.
Chances are, you run this code and the Tasks are not executed in parallel. That means, the specific task executed gets the current value of i which, in most simple cases, will be as provided as exit condition: arr.Length + 1
to proof:
for (int i = 0; i < arr.Length; i++)
{
if (arr[i] != "")
{
var j = i;
var t = Task.Run(() => ProcessElem(arr[j]));
tasklist.Add(t);
}
}
will work perfectly fine (unless you have some problems in your ProcessElem method :P)
in regards of string-destruction, unless you got some object that implements IDisposable, you should be fine with passing it into some lambda.
It will exist, until the actual lambda got deleted (as it will retain some reference to the object eg. in this case arr)
Your problem is an age old issue, its how lamdas work, and its very well documented.
However, assuming you are just creating and awaiting a bunch of tasks, then save your self code, hassle, and task creation and just use TPL Parallel.For or AsParallel
Parallel.For(0, arr.Length, (i) => ProcessElem(arr[i]));
Or
arr.AsParallel().ForAll(ProcessElem);
Or if you really don't want empty strings
arr.Where(x => !string.IsNullOrEmpty(x))
.AsParallel()
.ForAll(ProcessElem);

How to get next value when a common variable is increment in a parallel foreach?

I have a common variable that is updated inside a foreach, it is a counter, so I would like to get the next counter to be assigned inside a parallel foreach.
I have something like that:
int myCommonCounter = 5;
Parallel.Foreach(myList, (iterator, state) =>
{
if(iterator.MyProperty == X)
{
iterator.Position = miCommonCounter;
miCommonCounter = miCommonCunter + 1;
}
});
I want to avoid gaps in property iterator.Position and avoid duplicates too.
I have seen an example that does the sum of partial results, for example in this documentation, but it is not really the same case that I need.
So I would like to know if there are any away to update a counter to avoid gaps and duplicates when I updated inside the parallel foreach.
Thanks.
The overload of Parallel foreach that has 3 parameters for lambda argument has counter already.
You should use that instead of writing manual counter. It does not have duplicates. It wont have gaps either unless you stop loop for some reason.
Parallel.ForEach(source.Where(x => /*condition*/), (x, state, counter) =>
{
iterator.Position = (int) counter;
});
Note that counter is type of long so you have to cast it to int if Position is of int type

Multithreading in C# and ConcurrentDictionary: Is the following usage correct?

I have such a scenario at hand (using C#): I need to use a parallel "foreach" on a list of objects: Each object in this list is working like a data source, which is generating series of binary vector patterns (like "0010100110"). As each vector pattern is generated, I need to update the occurrence count of the current vector pattern on a shared ConcurrentDictionary. This ConcurrentDictionary acts like a histogram of specific binary patterns among ALL data sources. In a pseudo-code it should work like this:
ConcurrentDictionary<BinaryPattern,int> concDict = new ConcurrentDictionary<BinaryPattern,int>();
Parallel.Foreach(var dataSource in listOfDataSources)
{
for(int i=0;i<dataSource.OperationCount;i++)
{
BinaryPattern pattern = dataSource.GeneratePattern(i);
//Add the pattern to concDict if it does not exist,
//or increment the current value of it, in a thread-safe fashion among all
//dataSource objects in parallel steps.
}
}
I have read about TryAdd() and TryUpdate() methods of ConcurrentDictionary class in the documentation but I am not sure that I have clearly understood them. TryAdd() obtains an access to the Dictionary for the current thread and looks for the existence of a specific key, a binary pattern in this case, and then if it does not exist, it creates its entry, sets its value to 1 as it is the first occurence of this pattern. TryUpdate() gains acces to the dictionary for the current thread, looks whether the entry with the specified key has its current value equal to a "known" value, if it is so, updates it. By the way, TryGetValue() checks whether a key exits in the dictionary and returns the current value, if it does.
Now I think of the following usage and wonder if it is a correct implementation of a thread-safe population of the ConcurrentDictionary:
ConcurrentDictionary<BinaryPattern,int> concDict = new ConcurrentDictionary<BinaryPattern,int>();
Parallel.Foreach(var dataSource in listOfDataSources)
{
for(int i=0;i<dataSource.OperationCount;i++)
{
BinaryPattern pattern = dataSource.GeneratePattern(i);
while(true)
{
//Look whether the pattern is in dictionary currently,
//if it is, get its current value.
int currOccurenceOfPattern;
bool isPatternInDict = concDict.TryGetValue(pattern,out currOccurenceOfPattern);
//Not in dict, try to add.
if(!isPatternInDict)
{
//If the pattern is not added in the meanwhile, add it to the dict.
//If added, then exit from the while loop.
//If not added, then skip this step and try updating again.
if(TryAdd(pattern,1))
break;
}
//The pattern is already in the dictionary.
//Try to increment its current occurrence value instead.
else
{
//If the pattern's occurence value is not incremented by another thread
//in the meanwhile, update it. If this succeeds, then exit from the loop.
//If TryUpdate fails, then we see that the value has been updated
//by another thread in the meanwhile, we need to try our chances in the next
//step of the while loop.
int newValue = currOccurenceOfPattern + 1;
if(TryUpdate(pattern,newValue,currOccurenceOfPattern))
break;
}
}
}
}
I tried to firmly summarize my logic in the above code snippet in the comments. From what I gather from the documentation, a thread-safe updating scheme can be coded in this fashion, given the atomic "TryXXX()" methods of the ConcurrentDictionary. Is this a correct approach to the problem? How can this be improved or corrected if it is not?
You can use AddOrUpdate method that encapsulates either add or update logic as single thread-safe operation:
ConcurrentDictionary<BinaryPattern,int> concDict = new ConcurrentDictionary<BinaryPattern,int>();
Parallel.Foreach(listOfDataSources, dataSource =>
{
for(int i=0;i<dataSource.OperationCount;i++)
{
BinaryPattern pattern = dataSource.GeneratePattern(i);
concDict.AddOrUpdate(
pattern,
_ => 1, // if pattern doesn't exist - add with value "1"
(_, previous) => previous + 1 // if pattern exists - increment existing value
);
}
});
Please note that AddOrUpdateoperation is not atomic, not sure if it's your requirement but if you need to know the exact iteration when a value was added to the dictionary you can keep your code (or extract it to kind of extension method)
You might also want to go through this article
I don't know what BinaryPattern is here, but I would probably address this in a different way. Instead of copying value types around, inserting things into dictionaries, etc.. like this, I would probably be more inclined if performance was critical to simply place your instance counter in BinaryPattern. Then use InterlockedIncrement() to increment the counter whenever the pattern was found.
Unless there is a reason to separate the count from the pattern, in which case the ConccurentDictionary is probably a good choice.
First, the question is a little confusing because it's not clear what you mean by Parallel.Foreach. I would naively expect this to be System.Threading.Tasks.Parallel.ForEach(), but that's not usable with the syntax you show here.
That said, assuming you actually mean something like Parallel.ForEach(listOfDataSources, dataSource => { ... } )…
Personally, unless you have some specific need to show intermediate results, I would not bother with ConcurrentDictionary here. Instead, I would let each concurrent operation generate its own dictionary of counts, and then merge the results at the end. Something like this:
var results = listOfDataSources.Select(dataSource =>
Tuple.Create(dataSource, new Dictionary<BinaryPattern, int>())).ToList();
Parallel.ForEach(results, result =>
{
for(int i = 0; i < result.Item1.OperationCount; i++)
{
BinaryPattern pattern = result.Item1.GeneratePattern(i);
int count;
result.Item2.TryGetValue(pattern, out count);
result.Item2[pattern] = count + 1;
}
});
var finalResult = new Dictionary<BinaryPattern, int>();
foreach (result in results)
{
foreach (var kvp in result.Item2)
{
int count;
finalResult.TryGetValue(kvp.Key, out count);
finalResult[kvp.Key] = count + kvp.Value;
}
}
This approach would avoid contention between the worker threads (at least where the counts are concerned), potentially improving efficiency. The final aggregation operation should be very fast and can easily be handled in the single, original thread.

Does for loop count elements added into itself during the loop?

My question is, that when I loop through a list with for loop, and add elements to it during this, does it count the elements added while looping?
Simple code example:
for (int i = 0; i < listOfIds.Count(); i++) // Does loop counts the items added below?
{
foreach (var post in this.Collection)
{
if (post.ResponsePostID == listOfIds.ElementAt(i))
{
listOfIds.Add(post.PostId); // I add new item to list in here
}
}
}
I hope my explanation is good enough for you to understand what my question is.
Yes, it usually does. But changing a collection at the same time you're iterating over it can lead to weird behavior and hard-to-find bugs. It isn't recommended at all.
If you want this loop run only for preAdded item count then do this
int nLstCount = listOfIds.Count();
for (int i = 0; i < nLstCount ; i++)
{
foreach (var post in this.Collection)
{
if (post.ResponsePostID == listOfIds.ElementAt(i))
{
listOfIds.Add(post.PostId);
}
}
}
Yes it surely will.The inner foreach loop will execute and add the elements the outer collection and thus will increament the count of the elements.
listOfIds.Count=2 //iteration 1
listOfIds.Add(//element)
when it come to the for loop again
listOfIds.Count=3 //iteration 2
As a slightly abridged explanation of the for loop. You're essentially defining the following:
for (initializer; condition; iterator)
body
Your initializer will will establish your initial conditions, and will only happen once (effectively outside the loop).
Your condition will be evaluated every time to determine whether your loop should run again, or simply exit.
Your iterator defines an action that will occur after each iteration in your loop.
So in your case, your loop will reevaluate listOfIds.Count(); each time, to decide if it should execute; that may or may not be your desired behaviour.
As Dennis points out, you can let yourself get into a bit of a mess (youy loop may run infinitely) if you aren't careful.
A much more detailed/better written explanation can be found on msdn: http://msdn.microsoft.com/en-us/library/ch45axte.aspx

Linq FirstOrDefault evaluates predicate each iteration?

If I had a statement such as:
var item = Core.Collections.Items.FirstOrDefault(itm => itm.UserID == bytereader.readInt());
Does this code read an integer from my stream each iteration, or does it read the integer once, store it, then use its value throughout the lookup?
Consider this code:
static void Main(string[] args)
{
new[] { 1, 2, 3, 4 }.FirstOrDefault(j => j == Get());
Console.ReadLine();
}
static int i = 5;
static int Get()
{
Console.WriteLine("GET:" + i);
return i--;
}
It shows, that it will call the method the number of times it needs to meet the first element matching the condition. The output will be:
GET:5
GET:4
GET:3
I don't know without checking but would expect it to read it each time.
But this is very easily remedied with the following version of your code.
byte val = bytereader.readInt();
var item = Core.Collections.Items.FirstOrDefault(itm => itm.UserID == val);
Myself, I would automatically take this approach anyway just to remove any doubt. Might be a good habit to form as there is no reason to read it for each item.
It's actually quite obvious that the call is performed for each item - FirstOrDefault() takes an delegate as argument. This fact is a bit obscured by using a lambda method but in the end the method only sees a delegate that it can call for each item to check the predicate. In order to evaluate the right hand side only once some magic mechanism would have to understand and rewrite the method and (sometimes sadly) there is no real magic inside compilers and runtimes.

Categories