I have a List<byte[]> and I'd like to deserialize each byte[] into a Foo. The list is ordered, and I'd like to write a parallel loop in which the resulting List<Foo> contains all the Foo in the same order as the original byte[]. The list is large enough to make a parallel operation worthwhile. Is there a built-in way to accomplish this?
If not, any ideas how to achieve a speedup over running this all synchronously?
Thanks
From the info you've given, I understand you want an output array of Foo with size equal to the input list of byte arrays? Is this correct?
If so, yes, the operation is simple. Don't bother with locking or synchronized constructs; these will erode all the speed-up that parallelization gives you.
Instead, if you obey this simple rule any algorithm can be parallelized without locking or synchronization:
For each input element X[i] processed, you may read from any input element X[j], but only write to output element Y[i]
Look up Scatter/Gather; this type of operation is called a gather, as only one output element is written to.
If you can use the above principle, then you want to create your output array Foo[] up front, and use Parallel.For, not ForEach, on the input array.
E.g.
List<byte[]> inputList = new List<byte[]>();
Foo[] outputArray = new Foo[inputList.Count];

Parallel.For(0, inputList.Count, index =>
{
    // Do the long-running operation on the input item,
    // writing to only a single output item
    outputArray[index] = DoOperation(inputList[index]);
});

// Note: Parallel.For blocks until all iterations have completed,
// so no wait handle or counter is needed here.

// Optional conversion back to a list if you want one
var outputList = outputArray.ToList();
You can use a thread-safe dictionary with the index as an int key to store the result for each Foo, so at the end you will have all the data ordered in the dictionary.
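For illustration, a minimal sketch of that approach; input stands for the original List<byte[]>, and Deserialize is a hypothetical byte[]-to-Foo helper:
var results = new ConcurrentDictionary<int, Foo>();
Parallel.For(0, input.Count, i =>
{
    // The int key records the original position, so the order can be recovered
    results[i] = Deserialize(input[i]); // Deserialize is a hypothetical helper
});
// Read back in index order:
List<Foo> ordered = Enumerable.Range(0, input.Count).Select(i => results[i]).ToList();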
I am trying to split a collection into equal-sized batches. Below is the code.
public static List<List<T>> SplitIntoBatches<T>(List<T> collection, int size)
{
var chunks = new List<List<T>>();
var count = 0;
var temp = new List<T>();
foreach (var element in collection)
{
if (count++ == size)
{
chunks.Add(temp);
temp = new List<T>();
count = 1;
}
temp.Add(element);
}
chunks.Add(temp);
return chunks;
}
Can we do it using Parallel.ForEach() for better performance, as we have around 1 million items in the list?
Thanks!
If performance is the concern, my thoughts (in increasing order of impact):
right-sizing the lists when you create them would save a lot of work, i.e. figure out the output batch sizes before you start copying, e.g. temp = new List<T>(thisChunkSize)
working with arrays would be more effective than working with lists - new T[thisChunkSize]
especially if you use BlockCopy (or CopyTo, which uses that internally) rather than copying individual elements one by one
once you've calculated the offsets for each of the chunks, the individual block-copies could probably be executed in parallel, but I wouldn't assume it will be faster - memory bandwidth will be the limiting factor at that point
but the ultimate fix is: don't copy the data at all, and instead just create ranges over the existing data. For example, if using arrays, ArraySegment<T> would help; if you're open to using newer .NET features, this is a perfect fit for Memory<T>/Span<T>. Creating memory/span ranges over an existing array is essentially free and instant, i.e. take a T[] and return List<Memory<T>> or similar (see the sketch below).
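A minimal sketch of that zero-copy idea, assuming the data is already in an array (the method name and signature are just for illustration):
public static List<Memory<T>> SplitIntoBatches<T>(T[] data, int size)
{
    var chunks = new List<Memory<T>>((data.Length + size - 1) / size);
    for (int offset = 0; offset < data.Length; offset += size)
    {
        int count = Math.Min(size, data.Length - offset);
        chunks.Add(new Memory<T>(data, offset, count)); // a view over the array, no copying
    }
    return chunks;
}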
Even if you can't switch to ArraySegment<T> / Memory<T> etc, returning something like that could still be used - i.e. List<ListSegment<T>> where ListSegment<T> is something like:
readonly struct ListSegment<T> { // like ArraySegment<T>, but for List<T>
    public List<T> List { get; }
    public int Offset { get; }
    public int Count { get; }
    public ListSegment(List<T> list, int offset, int count)
        => (List, Offset, Count) = (list, offset, count);
}
and have your code work with ListSegment<T> by processing the Offset and Count appropriately.
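For example, a hypothetical helper that builds the segments without copying any elements:
public static List<ListSegment<T>> SplitIntoSegments<T>(List<T> source, int size)
{
    var segments = new List<ListSegment<T>>();
    for (int offset = 0; offset < source.Count; offset += size)
    {
        // Each segment records a position only; the source list is never copied
        segments.Add(new ListSegment<T>(source, offset, Math.Min(size, source.Count - offset)));
    }
    return segments;
}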
I am still getting the hang of TPL Dataflow, so bear with me.
My application requires that a queue be processed in parallel while retaining its order. This led me to the Dataflow library and what I am trying to do. I am wondering if there is a way to link one ActionBlock to another, with the second taking a value from the first to operate on.
Pseudo-example:
var block1 = new ActionBlock<ByteBuffer>(buffer => {
    // code generating a hash of the byte buffer to pass to the next block
    ulong hash = generateHash(buffer);
    // this is what I would like to pass to the next ActionBlock
    var tup = Tuple.Create(buffer, hash);
}, dataFlowOpts);
var block2 = new ActionBlock<Tuple<ByteBuffer, ulong>>(tup => {
    /* code to act on the buffer and hash */
}, dataFlowOpts);
block1.LinkTo(block2); // Is there something like this that would use the correct params?
Is what I'm trying to do possible? Does this even make sense? The reason I have these separated into two ActionBlocks is that I would like to reuse block2 in another code path (hashing the contents of a different ByteBuffer differently).
Maybe there is a better way? Really, I am just trying to hash objects concurrently as they come in, while retaining FIFO order, as it's way too slow to hash them synchronously.
Just use the TransformBlock:
var block1 = new TransformBlock<ByteBuffer, Tuple<ByteBuffer, ulong>>(buffer => {
    // code generating a hash of the byte buffer to pass to the next block
    ulong hash = generateHash(buffer);
    // this is what gets passed to the next block
    return Tuple.Create(buffer, hash);
}, dataFlowOpts);
var block2 = new ActionBlock<Tuple<ByteBuffer, ulong>>(tup => {
    /* code to act on the buffer and hash */
}, dataFlowOpts);
block1.LinkTo(block2); // the TransformBlock's output now feeds the ActionBlock's input
Or, probably a better option: use a DTO class with two properties, buffer and hash, as Tuple isn't very readable. Also, consider the newer C# named tuples feature.
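For instance, a sketch of the same pipeline using named tuples (C# 7+); the PropagateCompletion link option is an extra here, so that completing block1 also completes block2:
var block1 = new TransformBlock<ByteBuffer, (ByteBuffer Buffer, ulong Hash)>(
    buffer => (buffer, generateHash(buffer)), dataFlowOpts);

var block2 = new ActionBlock<(ByteBuffer Buffer, ulong Hash)>(tup => {
    /* use tup.Buffer and tup.Hash */
}, dataFlowOpts);

block1.LinkTo(block2, new DataflowLinkOptions { PropagateCompletion = true });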
I have the following scenario at hand (using C#): I need to use a parallel "foreach" on a list of objects. Each object in this list works like a data source, generating a series of binary vector patterns (like "0010100110"). As each vector pattern is generated, I need to update the occurrence count of that pattern in a shared ConcurrentDictionary. This ConcurrentDictionary acts like a histogram of specific binary patterns across ALL data sources. In pseudo-code it should work like this:
ConcurrentDictionary<BinaryPattern,int> concDict = new ConcurrentDictionary<BinaryPattern,int>();
Parallel.Foreach(var dataSource in listOfDataSources)
{
for(int i=0;i<dataSource.OperationCount;i++)
{
BinaryPattern pattern = dataSource.GeneratePattern(i);
//Add the pattern to concDict if it does not exist,
//or increment the current value of it, in a thread-safe fashion among all
//dataSource objects in parallel steps.
}
}
I have read about the TryAdd() and TryUpdate() methods of the ConcurrentDictionary class in the documentation, but I am not sure that I have clearly understood them. TryAdd() obtains access to the dictionary for the current thread and looks for the existence of a specific key, a binary pattern in this case; if it does not exist, it creates its entry and sets its value to 1, as it is the first occurrence of this pattern. TryUpdate() gains access to the dictionary for the current thread and checks whether the entry with the specified key has its current value equal to a "known" value; if so, it updates it. By the way, TryGetValue() checks whether a key exists in the dictionary and returns the current value if it does.
Now I think of the following usage and wonder if it is a correct implementation of a thread-safe population of the ConcurrentDictionary:
ConcurrentDictionary<BinaryPattern,int> concDict = new ConcurrentDictionary<BinaryPattern,int>();
Parallel.Foreach(var dataSource in listOfDataSources)
{
for(int i=0;i<dataSource.OperationCount;i++)
{
BinaryPattern pattern = dataSource.GeneratePattern(i);
while(true)
{
//Look whether the pattern is in dictionary currently,
//if it is, get its current value.
int currOccurrenceOfPattern;
bool isPatternInDict = concDict.TryGetValue(pattern, out currOccurrenceOfPattern);
//Not in dict, try to add.
if(!isPatternInDict)
{
//If the pattern is not added in the meanwhile, add it to the dict.
//If added, then exit from the while loop.
//If not added, then skip this step and try updating again.
if(concDict.TryAdd(pattern, 1))
break;
}
//The pattern is already in the dictionary.
//Try to increment its current occurrence value instead.
else
{
//If the pattern's occurrence value is not incremented by another thread
//in the meanwhile, update it. If this succeeds, then exit from the loop.
//If TryUpdate fails, then we see that the value has been updated
//by another thread in the meanwhile, we need to try our chances in the next
//step of the while loop.
int newValue = currOccurrenceOfPattern + 1;
if(concDict.TryUpdate(pattern, newValue, currOccurrenceOfPattern))
break;
}
}
}
}
I tried to summarize my logic in the comments in the code snippet above. From what I gather from the documentation, a thread-safe update scheme can be coded in this fashion, given the atomic "TryXXX()" methods of ConcurrentDictionary. Is this a correct approach to the problem? How can it be improved or corrected if it is not?
You can use the AddOrUpdate method, which encapsulates the add-or-update logic as a single thread-safe operation:
ConcurrentDictionary<BinaryPattern,int> concDict = new ConcurrentDictionary<BinaryPattern,int>();
Parallel.ForEach(listOfDataSources, dataSource =>
{
for(int i=0;i<dataSource.OperationCount;i++)
{
BinaryPattern pattern = dataSource.GeneratePattern(i);
concDict.AddOrUpdate(
pattern,
_ => 1, // if pattern doesn't exist - add with value "1"
(_, previous) => previous + 1 // if pattern exists - increment existing value
);
}
});
Please note that the AddOrUpdate operation is not fully atomic: the value factory delegates run outside the dictionary's internal locks and may be invoked more than once under contention. Not sure if it's your requirement, but if you need to know the exact iteration when a value was added to the dictionary, you can keep your code (or extract it into a kind of extension method).
I don't know what BinaryPattern is here, but I would probably address this in a different way. Instead of copying value types around, inserting things into dictionaries, etc., if performance was critical I would be more inclined to simply place the instance counter in BinaryPattern itself, and then use Interlocked.Increment() to bump the counter whenever the pattern is found.
Unless there is a reason to separate the count from the pattern, in which case the ConccurentDictionary is probably a good choice.
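Since BinaryPattern's definition isn't shown, here is only a hypothetical sketch of that idea:
class BinaryPattern
{
    private int _count; // occurrence counter stored on the pattern itself

    public int Count => _count;

    // Thread-safe increment; no dictionary update required
    public void IncrementCount() => Interlocked.Increment(ref _count);
}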
First, the question is a little confusing because it's not clear what you mean by Parallel.Foreach. I would naively expect this to be System.Threading.Tasks.Parallel.ForEach(), but that's not usable with the syntax you show here.
That said, assuming you actually mean something like Parallel.ForEach(listOfDataSources, dataSource => { ... } )…
Personally, unless you have some specific need to show intermediate results, I would not bother with ConcurrentDictionary here. Instead, I would let each concurrent operation generate its own dictionary of counts, and then merge the results at the end. Something like this:
var results = listOfDataSources.Select(dataSource =>
Tuple.Create(dataSource, new Dictionary<BinaryPattern, int>())).ToList();
Parallel.ForEach(results, result =>
{
for(int i = 0; i < result.Item1.OperationCount; i++)
{
BinaryPattern pattern = result.Item1.GeneratePattern(i);
int count;
result.Item2.TryGetValue(pattern, out count);
result.Item2[pattern] = count + 1;
}
});
var finalResult = new Dictionary<BinaryPattern, int>();
foreach (var result in results)
{
foreach (var kvp in result.Item2)
{
int count;
finalResult.TryGetValue(kvp.Key, out count);
finalResult[kvp.Key] = count + kvp.Value;
}
}
This approach would avoid contention between the worker threads (at least where the counts are concerned), potentially improving efficiency. The final aggregation operation should be very fast and can easily be handled in the single, original thread.
I am trying to utilize the parallel for loop in .NET Framework 4.0. However, I noticed that I am missing some elements in the result set.
I have a snippet of code below. Both lhs.ListData and rhs.ListData are lists of nullable doubles.
int recordCount = lhs.ListData.Count > rhs.ListData.Count ? rhs.ListData.Count : lhs.ListData.Count;
List<double?> listResult = new List<double?>(recordCount);
var rangePartitioner = Partitioner.Create(0, recordCount);
Parallel.ForEach(rangePartitioner, range =>
{
for (int index = range.Item1; index < range.Item2; index++)
{
double? result = lhs.ListData[index] * rhs.ListData[index];
listResult.Add(result);
}
});
lhs.ListData has a length of 7964 and rhs.ListData has a length of 7962. When I perform the "*" operation, listResult contains only 7867 elements. There are null elements in both input lists.
I am not sure what is happening during the execution. Is there any reason why I am seeing fewer elements in the result set? Please advise...
The correct way to do this is to use LINQ's IEnumerable.AsParallel() extension. It does all of the partitioning for you, and PLINQ's operators are thread-safe. There is another LINQ extension called Zip that zips together two collections into one, based on a function that you give it. However, this isn't exactly what you need, as it only goes to the length of the shorter of the two lists, not the longer. It would probably be easiest to do this anyway, but first expand the shorter of the two lists to the length of the longer one by padding it with null at the end.
IEnumerable<double?> lhs, rhs; // Assume these are filled with your numbers.
double?[] result = lhs.AsParallel().AsOrdered()
    .Zip(rhs.AsParallel().AsOrdered(), (a, b) => a * b)
    .ToArray(); // AsOrdered preserves the original element order
Here's the MSDN page on Zip:
http://msdn.microsoft.com/en-us/library/dd267698%28VS.100%29.aspx
That's probably because the operations on a List<T> (e.g. Add) are not thread-safe - your results may vary. As a workaround you could use a lock, but that would very much reduce performance.
It looks like you just want each item in the result list to be the product of the items at the corresponding index in the two input lists; how about this instead, using PLINQ:
var listResult = lhs.AsParallel().AsOrdered()
                    .Zip(rhs.AsParallel().AsOrdered(), (a, b) => a * b)
                    .ToList(); // AsOrdered keeps the results in the original index order
Not sure why you chose parallelism here; I would benchmark whether it is even necessary - is this truly the bottleneck in your application?
You are using List<double?> to store the results, but its Add method is not thread-safe.
You can use an explicit index to store each result (instead of calling Add):
listResult[index] = result;
Note, though, that new List<double?>(recordCount) only sets the capacity, not the count, so assigning by index into the empty list would throw; use an array (new double?[recordCount]) or pre-fill the list before writing by index.
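A minimal sketch of that fix, using an array sized from the question's recordCount so each iteration writes to its own distinct slot:
double?[] resultArray = new double?[recordCount];
var rangePartitioner = Partitioner.Create(0, recordCount);

Parallel.ForEach(rangePartitioner, range =>
{
    for (int index = range.Item1; index < range.Item2; index++)
    {
        // Each index is written by exactly one thread, so no locking is needed
        resultArray[index] = lhs.ListData[index] * rhs.ListData[index];
    }
});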
I have a time-consuming static C# method for creating an array of doubles and have therefore parallelized the operation.
Since I am creating the array before entering the loop and not tampering with its reference after that, I am thinking that it should be sufficient to lock the array itself while updating it within the parallel loop.
Is it OK to lock on the array itself or would I potentially face some performance or deadlock problems with this approach? Is it better to create a separate lock variable to put the lock on?
Here is some sample code to illustrate:
static double[] CreateArray(int mn, int n)
{
var localLock = new object(); // Necessary?
var array = new double[mn];
Parallel.For(0, n, i =>
{
... lengthy operation ...
lock (array) // or is 'lock (localLock)' required?
{
UpdatePartOfArray(array);
}
});
return array;
}
Since the array here is a reference type, isn't reassigned during the operations, and isn't exposed elsewhere yet (where some other code could lock on it), yes, it can suffice as the lock object itself. However, if the updates are to different parts of the array, i.e.
array[i] = ... // i is separate
then there is no need to lock anything.
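To illustrate, a sketch of the lock-free variant, assuming each iteration owns a disjoint slice of the array; Compute is a hypothetical stand-in for the lengthy per-element work:
static double[] CreateArray(int mn, int n)
{
    var array = new double[mn];
    int sliceSize = mn / n; // assumes n divides mn evenly, for brevity

    Parallel.For(0, n, i =>
    {
        for (int j = i * sliceSize; j < (i + 1) * sliceSize; j++)
        {
            array[j] = Compute(i, j); // each slot is written by exactly one iteration
        }
    });
    return array;
}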