Parallel.ForEach with modulo partitioning - C#

When using Parallel.ForEach() for 100 items with 4 threads, it divides the list into 4 contiguous blocks (0-24, 25-49, 50-74, 75-99), which means that items 0, 25, 50 and 75 are processed in parallel.
Is it somehow possible to partition the items in a modulo way, so the lower indices are handled first? Like:
Thread 1: 0, 4, 8, ...
Thread 2: 1, 5, 9, ...
Thread 3: 2, 6, 10, ...
Thread 4: 3, 7, 11, ...

This partitioning method is known as Round Robin, or Striping. The primary challenge of using this with Parallel.ForEach() is that ForEach() requires partitioners to support dynamic partitions, which would not be possible with this type of partitioning as the number of partitions must be fixed prior to execution of the loop.
One way to achieve this type of partitioning is to create a custom class derived from System.Collections.Concurrent.Partitioner<TSource> and use the ParallelQuery.ForAll() method, which does not have the dynamic partitioning support requirement. For most applications this should be equivalent to using ForEach().
Below is an example of a custom Partitioner and a basic implementation. The Partitioner will generate the same number of partitions as the degree of parallelism.
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;

namespace RoundRobinPartitioning
{
    public class RoundRobinPartitioner<TSource> : Partitioner<TSource>
    {
        private readonly IList<TSource> _source;

        public RoundRobinPartitioner(IList<TSource> source)
        {
            _source = source;
        }

        public override bool SupportsDynamicPartitions { get { return false; } }

        public override IList<IEnumerator<TSource>> GetPartitions(int partitionCount)
        {
            var enumerators = new List<IEnumerator<TSource>>(partitionCount);
            for (int i = 0; i < partitionCount; i++)
            {
                enumerators.Add(GetEnumerator(i, partitionCount));
            }
            return enumerators;
        }

        // Each partition starts at its own offset and strides by the partition
        // count, yielding indices offset, offset + count, offset + 2 * count, ...
        private IEnumerator<TSource> GetEnumerator(
            int partition,
            int partitionCount)
        {
            int position = partition;
            TSource value;
            while (position < _source.Count)
            {
                value = _source[position];
                position += partitionCount;
                yield return value;
            }
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            var values = Enumerable.Range(0, 100).ToList();
            var partitioner = new RoundRobinPartitioner<int>(values);

            partitioner.AsParallel()
                .WithDegreeOfParallelism(4)
                .ForAll(value =>
                {
                    // Perform work here
                });
        }
    }
}

Related

Why can't I change an array element obtained from another class?

I created a class that implements a wrapper over a double[] array, but I cannot change an element of the wrapped array. These are the failing tests:
public void SetCorrectly()
public void IndexerDoesNotCopyArray()
The task reads as follows: write a class Indexer that is created as a wrapper over a double[] array and exposes a subarray of some length, starting at some element. Your solution must pass the tests contained in the project. As always, you must maintain the integrity of the data in Indexer.
Here is my code:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace Incapsulation.Weights
{
    public class Indexer
    {
        double[] array;
        int start;
        int length;

        public int Length
        {
            get { return length; }
        }

        public Indexer(double[] array, int start, int length)
        {
            if (start < 0 || start >= array.Length) throw new ArgumentException();
            this.start = start;
            if (length < start || length > array.Length) throw new ArgumentException();
            this.length = length;
            this.array = array.Skip(start).Take(length).ToArray();
        }

        public double this[int index]
        {
            get { return array[index]; }
            set { array[index] = value; }
        }
    }
}
These are the tests:
using NUnit.Framework;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace Incapsulation.Weights
{
    [TestFixture]
    public class Indexer_should
    {
        double[] array = new double[] { 1, 2, 3, 4 };

        [Test]
        public void HaveCorrectLength()
        {
            var indexer = new Indexer(array, 1, 2);
            Assert.AreEqual(2, indexer.Length);
        }

        [Test]
        public void GetCorrectly()
        {
            var indexer = new Indexer(array, 1, 2);
            Assert.AreEqual(2, indexer[0]);
            Assert.AreEqual(3, indexer[1]);
        }

        [Test]
        public void SetCorrectly()
        {
            var indexer = new Indexer(array, 1, 2);
            indexer[0] = 10;
            Assert.AreEqual(10, array[1]);
        }

        [Test]
        public void IndexerDoesNotCopyArray()
        {
            var indexer1 = new Indexer(array, 1, 2);
            var indexer2 = new Indexer(array, 0, 2);
            indexer1[0] = 100500;
            Assert.AreEqual(100500, indexer2[1]);
        }
    }
}
The problem you are facing here is that the last test requires your wrapper to provide access to the underlying array. In other words, no matter how many Indexers are created, they must all point to the same underlying array.
Your line this.array = array.Skip(start).Take(length).ToArray(); violates this requirement by creating a new Array instance. Because of this, a value changed through the first indexer is not reflected in the second one - they point to different memory areas.
To fix this, instead of creating a new array using LINQ, simply store the original array passed to the constructor. Your this[] indexer must then account for the offset by adding start to the index, and must check the out-of-bounds condition manually.
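A minimal sketch of the corrected class along those lines (the exact bounds checks and exception types are my choice):
public class Indexer
{
    readonly double[] array;  // the original array, shared by every Indexer over it
    readonly int start;
    readonly int length;

    public int Length
    {
        get { return length; }
    }

    public Indexer(double[] array, int start, int length)
    {
        if (start < 0 || start >= array.Length) throw new ArgumentException();
        if (length < 0 || start + length > array.Length) throw new ArgumentException();
        this.array = array;   // keep the reference - do not copy
        this.start = start;
        this.length = length;
    }

    public double this[int index]
    {
        get
        {
            if (index < 0 || index >= length) throw new IndexOutOfRangeException();
            return array[start + index];
        }
        set
        {
            if (index < 0 || index >= length) throw new IndexOutOfRangeException();
            array[start + index] = value;
        }
    }
}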
All LINQ extension methods create a new enumeration; they do not mutate or return the one the method is called on:
var newArray = array.Skip(...).ToArray();
ReferenceEquals(array, newArray); //returns false
Any change you might make in an element of newArray will not change anything whatsoever in array.
Your SetCorrectly test writes through the indexer and then checks array, so it will always fail. Your other test also fails because indexer1 and indexer2 reference two different arrays.
However, because LINQ is lazy, modifying array can be seen in the result of a LINQ extension method, depending on when you materialize the enumeration; this can happen:
var skipped = array.Skip(1);          // deferred execution
array[1] = 42;                        // some different value...
var newArray = skipped.ToArray();     // Skip is materialized here!
bool same = newArray[0] == array[1];  // true! newArray[0] corresponds to array[1]

Assign specific initial parameters from random values, among a pool of parameters

I am using a function to initialize values for a constructor.
In the constructor I call this function, and initialize parameters based on the specific type passed in the constructor.
public class myclass
{
    // class parameters of type int

    public myclass()
    {
        GenerateValues(1);
    }

    private void GenerateValues(int type_of_worker)
    {
        Random rnd = new Random();
        // generate random values for the params
    }
}
With this I can generate random values based on the type of worker passed to the constructor, so I can have a basic worker, a technician, a director, a manager and so on.
Each role has requirements though, so when I create a worker of type "manager", it has to have at least 2 parameters that are above 50 (min 0, max 99). If it is a director, it needs 3 parameters equal to or above 75.
How can I tell C# to create random values for X parameters, but if the type is manager, 2 of them have to be above 50, and if it is a director, 3 parameters have to be above 75?
For example, if I create a manager, I would use this in GenerateValues:
// main parameters - any value between 1-99 is fine
par1 = rnd.Next(1, 99);
par2 = rnd.Next(1, 99);
par3 = rnd.Next(1, 99);
par4 = rnd.Next(1, 99);
par5 = rnd.Next(1, 99);
// extra parameters - ANY 2 of these above 50 for manager
par6 = rnd.Next(1, 99);
par7 = rnd.Next(50, 99);
par8 = rnd.Next(50, 99);
par9 = rnd.Next(1, 99);
The problem here is that I don't want to specify which of the extra parameters are between 50 and 99; I would like any 2 of the extra parameters to be between 50 and 99.
I could assign values, then read them back and change them again, but that is pretty inefficient.
I would do it in a more OOP way, since it's C#. See the implementation below.
First I define an interface to establish a contract: from each worker type we should get integer parameters. I'm not sure what your parameters look like, so I used just an array of integers.
Then I add an abstract class to share the code for generating N random numbers from a given interval.
Then I just create concrete types implementing only the specific logic.
For simplicity I used LINQ helper functions.
The following rules are applied to get each set of random numbers:
Technician - 2 values from the interval [0, 99] and 5 values from [10, 50]. Notice that Random.Next's min parameter is inclusive whereas its max parameter is exclusive.
Manager - 2 values from the interval [50, 99] and 5 values from [10, 50], shuffled so that which 2 parameters are high is random.
Director - 3 values from the interval [75, 99] and 5 values from [10, 50], also shuffled.
using System;
using System.Collections.Generic;
using System.Linq;

public interface IWorker {
    int[] GetParameters();
}

abstract class Worker : IWorker {
    public abstract int[] GetParameters();

    // Note: creating a new Random for every call can repeat sequences when
    // called in quick succession; a single shared instance would be safer.
    protected IEnumerable<int> GenerateNValues(int quantity, int min, int max) {
        var theRandom = new Random();
        var theResult = new List<int>(quantity);
        for (int i = 0; i < quantity; i++)
            theResult.Add(theRandom.Next(min, max));
        return theResult;
    }

    protected IEnumerable<int> Shuffle(IEnumerable<int> array) {
        var theRandom = new Random();
        return array.OrderBy(x => theRandom.Next()).ToArray();
    }
}

class Technician : Worker {
    public override int[] GetParameters() {
        return GenerateNValues(2, 0, 100).Concat(GenerateNValues(5, 10, 51)).ToArray();
    }
}

class Manager : Worker {
    public override int[] GetParameters() {
        // shuffled so that which 2 parameters are above 50 is random
        return Shuffle(GenerateNValues(2, 50, 100).Concat(GenerateNValues(5, 10, 51))).ToArray();
    }
}

class Director : Worker {
    public override int[] GetParameters() {
        // shuffled so that which 3 parameters are at or above 75 is random
        return Shuffle(GenerateNValues(3, 75, 100).Concat(GenerateNValues(5, 10, 51))).ToArray();
    }
}

public class MyClass
{
    public MyClass(IWorker aWorker)
    {
        var theParameters = aWorker.GetParameters();
        // do stuff with parameters
    }
}
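Usage would then look something like this (my example, not part of the original answer):
var manager = new MyClass(new Manager());
var director = new MyClass(new Director());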

Using ImmutableSortedSet<T> for a thread safe cache

I have a method that takes a DateTime and returns the date marking the end of that quarter. Because of some complexity involving business days and holiday calendars, I want to cache the result to speed up subsequent calls. I'm using a SortedSet<DateTime> to maintain a cache of data, and I use the GetViewBetween method in order to do cache lookups as follows:
private static SortedSet<DateTime> quarterEndCache = new SortedSet<DateTime>();

public static DateTime GetNextQuarterEndDate(DateTime date)
{
    var oneDayLater = date.AddDays(1.0);
    var fiveMonthsLater = date.AddMonths(5);
    var range = quarterEndCache.GetViewBetween(oneDayLater, fiveMonthsLater);
    if (range.Count > 0)
    {
        return range.Min;
    }
    // Perform expensive calc here
}
Now I want to make my cache threadsafe. Rather than use a lock everywhere which would incur a performance hit on every lookup, I'm exploring the new ImmutableSortedSet<T> collection which would allow me to avoid locks entirely. The problem is that ImmutableSortedSet<T> doesn't have the method GetViewBetween. Is there any way to get similar functionality from the ImmutableSortedSet<T>?
[EDIT]
Servy has convinced me just using a lock with a normal SortedSet<T> is the easiest solution. I'll leave the question open though just because I'm interested to know whether the ImmutableSortedSet<T> can handle this scenario efficiently.
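For reference, a locked SortedSet<T> version along those lines might look like this (a sketch; ComputeQuarterEndDate is a hypothetical stand-in for the expensive calculation):
private static readonly SortedSet<DateTime> quarterEndCache = new SortedSet<DateTime>();
private static readonly object cacheLock = new object();

public static DateTime GetNextQuarterEndDate(DateTime date)
{
    var oneDayLater = date.AddDays(1.0);
    var fiveMonthsLater = date.AddMonths(5);
    lock (cacheLock)
    {
        var range = quarterEndCache.GetViewBetween(oneDayLater, fiveMonthsLater);
        if (range.Count > 0)
        {
            return range.Min;
        }
        DateTime result = ComputeQuarterEndDate(date); // the expensive calc
        quarterEndCache.Add(result);
        return result;
    }
}
Holding the lock across the expensive calculation keeps the sketch simple; if that is too coarse, the calculation can be moved outside the lock at the cost of occasionally computing a value twice.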
Let's divide the question into two parts:
How to get functionality similar to GetViewBetween with ImmutableSortedSet<T>? I'd suggest using the IndexOf method. In the snippet below, I created an extension method GetRangeBetween which should do the job.
How to implement lock-free, thread-safe updates with immutable data structures? Although this is not the original question, there are some skeptical comments regarding this issue.
The immutable collections library provides a method for exactly that purpose: ImmutableInterlocked.Update<T>(ref T location, Func<T, T> transformer) where T : class (namespace System.Collections.Immutable). The method internally relies on atomic compare/exchange operations. If you want to do this by hand, you'll find an alternative implementation below which should behave the same as ImmutableInterlocked.Update.
So here is the code:
using System;
using System.Collections.Generic;
using System.Collections.Immutable;
using System.Threading;

public static class ImmutableExtensions
{
    public static IEnumerable<T> GetRangeBetween<T>(
        this ImmutableSortedSet<T> set, T min, T max)
    {
        // IndexOf returns the bitwise complement of the insertion
        // point when the exact value is not present.
        int i = set.IndexOf(min);
        if (i < 0) i = ~i;
        while (i < set.Count)
        {
            T x = set[i++];
            if (set.KeyComparer.Compare(x, min) >= 0 &&
                set.KeyComparer.Compare(x, max) <= 0)
            {
                yield return x;
            }
            else
            {
                break;
            }
        }
    }

    public static void LockfreeUpdate<T>(ref T item, Func<T, T> fn)
        where T : class
    {
        T x, y;
        do
        {
            x = item;
            y = fn(x);
        } while (Interlocked.CompareExchange(ref item, y, x) != x);
    }
}
Usage:
private static volatile ImmutableSortedSet<DateTime> quarterEndCache =
    ImmutableSortedSet<DateTime>.Empty;

private static volatile int counter; // test/verification purpose only

public static DateTime GetNextQuarterEndDate(DateTime date)
{
    var oneDayLater = date.AddDays(1.0);
    var fiveMonthsLater = date.AddMonths(5);
    var range = quarterEndCache.GetRangeBetween(oneDayLater, fiveMonthsLater);
    if (range.Any())
    {
        return range.First();
    }

    // Perform expensive calc here
    // -> Meaningless dummy computation for verification purpose only
    long x = Interlocked.Increment(ref counter);
    DateTime test = DateTime.FromFileTime(x);

    ImmutableExtensions.LockfreeUpdate(
        ref quarterEndCache,
        c => c.Add(test));

    return test;
}
[TestMethod]
public void TestIt()
{
    var tasks = Enumerable
        .Range(0, 100000)
        .Select(x => Task.Factory.StartNew(
            () => GetNextQuarterEndDate(DateTime.Now)))
        .ToArray();
    Task.WaitAll(tasks);
    Assert.AreEqual(100000, counter);
}
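As a side note, with the System.Collections.Immutable package the hand-rolled LockfreeUpdate call above should be replaceable by the built-in helper:
ImmutableInterlocked.Update(ref quarterEndCache, c => c.Add(test));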

Buffering a LINQ query

FINAL EDIT:
I've chosen Timothy's answer, but if you want a cuter implementation that leverages the C# yield statement, check Eamon's answer: https://stackoverflow.com/a/19825659/145757
By default LINQ queries are lazily streamed.
ToArray/ToList give full buffering, but they're eager: on an infinite sequence they never complete.
Is there any way to combine both behaviors: streaming and buffering values on the fly as they are generated, so that subsequent iterations won't re-trigger generation of elements that have already been produced?
Here is a basic use-case:
static IEnumerable<int> Numbers
{
    get
    {
        int i = -1;
        while (true)
        {
            Console.WriteLine("Generating {0}.", i + 1);
            yield return ++i;
        }
    }
}

static void Main(string[] args)
{
    IEnumerable<int> evenNumbers = Numbers.Where(i => i % 2 == 0);
    foreach (int n in evenNumbers)
    {
        Console.WriteLine("Reading {0}.", n);
        if (n == 10) break;
    }
    Console.WriteLine("==========");
    foreach (int n in evenNumbers)
    {
        Console.WriteLine("Reading {0}.", n);
        if (n == 10) break;
    }
}
Here is the output:
Generating 0.
Reading 0.
Generating 1.
Generating 2.
Reading 2.
Generating 3.
Generating 4.
Reading 4.
Generating 5.
Generating 6.
Reading 6.
Generating 7.
Generating 8.
Reading 8.
Generating 9.
Generating 10.
Reading 10.
==========
Generating 0.
Reading 0.
Generating 1.
Generating 2.
Reading 2.
Generating 3.
Generating 4.
Reading 4.
Generating 5.
Generating 6.
Reading 6.
Generating 7.
Generating 8.
Reading 8.
Generating 9.
Generating 10.
Reading 10.
The generation code is triggered 22 times.
I'd like it to be triggered 11 times, the first time the enumerable is iterated.
Then the second iteration would benefit from the already generated values.
It would be something like:
IEnumerable<int> evenNumbers = Numbers.Where(i => i % 2 == 0).Buffer();
For those familiar with Rx it's a behavior similar to a ReplaySubject.
IEnumerable<T>.Buffer() extension method
using System;
using System.Collections;
using System.Collections.Generic;

public static class EnumerableExtensions
{
    public static BufferEnumerable<T> Buffer<T>(this IEnumerable<T> source)
    {
        return new BufferEnumerable<T>(source);
    }
}

public class BufferEnumerable<T> : IEnumerable<T>, IDisposable
{
    IEnumerator<T> source;
    List<T> buffer;

    public BufferEnumerable(IEnumerable<T> source)
    {
        this.source = source.GetEnumerator();
        this.buffer = new List<T>();
    }

    public IEnumerator<T> GetEnumerator()
    {
        return new BufferEnumerator<T>(source, buffer);
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }

    public void Dispose()
    {
        source.Dispose();
    }
}

public class BufferEnumerator<T> : IEnumerator<T>
{
    IEnumerator<T> source;
    List<T> buffer;
    int i = -1;

    public BufferEnumerator(IEnumerator<T> source, List<T> buffer)
    {
        this.source = source;
        this.buffer = buffer;
    }

    public T Current
    {
        get { return buffer[i]; }
    }

    object IEnumerator.Current
    {
        get { return Current; }
    }

    public bool MoveNext()
    {
        i++;
        if (i < buffer.Count)
            return true;
        if (!source.MoveNext())
            return false;
        buffer.Add(source.Current);
        return true;
    }

    public void Reset()
    {
        i = -1;
    }

    public void Dispose()
    {
        // the shared source enumerator is owned (and disposed) by BufferEnumerable
    }
}
Usage
using (var evenNumbers = Numbers.Where(i => i % 2 == 0).Buffer())
{
...
}
Comments
The key point here is that the IEnumerable<T> source given as input to the Buffer method only has GetEnumerator called once, regardless of how many times the result of Buffer is enumerated. All enumerators for the result of Buffer share the same source enumerator and internal list.
You can use the Microsoft.FSharp.Collections.LazyList<> type from the F# PowerPack (yes, from C# without F# installed - no problem!) for this. It's in the NuGet package FSPowerPack.Core.Community.
In particular, you want to call LazyListModule.ofSeq(...) which returns a LazyList<T> that implements IEnumerable<T> and is lazy and cached.
In your case, usage is just a matter of...
var evenNumbers = LazyListModule.ofSeq(Numbers.Where(i => i % 2 == 0));
var cachedEvenNumbers = LazyListModule.ofSeq(evenNumbers);
Though I personally prefer var in all such cases, note that this does mean the compile-time type will be more specific than just IEnumerable<> - not that this is likely to ever be a downside. Another advantage of the F# non-interface types is that they expose some efficient operations you can't do efficiently with plain IEnumerables, such as LazyListModule.skip.
I'm not sure whether LazyList is thread-safe, but I suspect it is.
Another alternative pointed out in the comments below (if you have F# installed) is SeqModule.Cache (namespace Microsoft.FSharp.Collections, in the GACed assembly FSharp.Core.dll), which has the same effective behavior. Like other .NET enumerables, Seq.cache doesn't have a tail (or efficient skip) operator you can chain.
Thread-safe: unlike other solutions to this question, Seq.cache is thread-safe in the sense that you can have multiple enumerators running in parallel (each individual enumerator is not thread-safe).
Performance: I did a quick benchmark, and the LazyList enumerable has at least 4 times more overhead than the SeqModule.Cache variant, which in turn has at least 3 times more overhead than the custom-implementation answers. So, while the F# variants work, they're not quite as fast. Note that 3-12 times slower still isn't very slow compared to an enumerable that does (say) I/O or any non-trivial computation, so this probably won't matter most of the time, but it's good to keep in mind.
TL;DR If you need an efficient, thread-safe cached enumerable, just use SeqModule.Cache.
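From C#, the call is a one-liner (my example; assumes a reference to FSharp.Core):
using Microsoft.FSharp.Collections;

var cachedEvenNumbers = SeqModule.Cache(Numbers.Where(i => i % 2 == 0));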
Building upon Eamon's answer above, here's another functional solution (no new types) that also works with simultaneous evaluation. This demonstrates that a general pattern (iteration with shared state) underlies this problem.
First we define a very general helper method, meant to allow us to simulate the missing feature of anonymous iterators in C#:
public static IEnumerable<T> Generate<T>(Func<Func<Tuple<T>>> generator)
{
    var tryGetNext = generator();
    while (true)
    {
        var result = tryGetNext();
        if (null == result)
        {
            yield break;
        }
        yield return result.Item1;
    }
}
Generate is like an aggregator with state. It accepts a factory function that sets up the per-enumeration state and returns a tryGetNext closure - the body that would have been an anonymous iterator with yield return in it, were that allowed in C#. The state created by the factory is per-enumeration, while more global state (shared between all enumerations) can be maintained by the caller of Generate, e.g. in closure variables, as we'll show below.
Now we can use this for the "buffered Enumerable" problem:
public static IEnumerable<T> Cached<T>(IEnumerable<T> enumerable)
{
    var cache = new List<T>();
    var enumerator = enumerable.GetEnumerator();
    return Generate<T>(() =>
    {
        int pos = -1; // per-enumeration position
        return () =>
        {
            pos += 1;
            if (pos < cache.Count)
            {
                return new Tuple<T>(cache[pos]);
            }
            if (enumerator.MoveNext())
            {
                cache.Add(enumerator.Current);
                return new Tuple<T>(enumerator.Current);
            }
            return null;
        };
    });
}
I hope this answer combines the brevity and clarity of sinelaw's answer and the support for multiple enumerations of Timothy's answer:
public static IEnumerable<T> Cached<T>(this IEnumerable<T> enumerable)
{
    return CachedImpl(enumerable.GetEnumerator(), new List<T>());
}

static IEnumerable<T> CachedImpl<T>(IEnumerator<T> source, List<T> buffer)
{
    int pos = 0;
    while (true)
    {
        if (pos == buffer.Count)
        {
            if (source.MoveNext())
                buffer.Add(source.Current);
            else
                yield break;
        }
        yield return buffer[pos++];
    }
}
The key idea is to use the yield return syntax to make for a short enumerable implementation, but you still need a state machine to decide whether you can take the next element from the buffer or whether you need to pull the underlying enumerator.
Limitations: this makes no attempt to be thread-safe, nor does it dispose the underlying enumerator (which, in general, is quite tricky to do, as the underlying uncached enumerator must remain undisposed as long as any cached enumerable might still be used).
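Usage matches the Buffer idea from the question (my example):
var evenNumbers = Numbers.Where(i => i % 2 == 0).Cached();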
As far as I know there is no built-in way to do this, which - now that you mention it - is slightly surprising (my guess is that, given the frequency with which one would want this, it was probably not considered worth the effort needed to analyse the code and make sure the generator gives the exact same sequence every time).
You can however implement it yourself. The easy way would be on the call-site, as
var evenNumbers = Numbers.Where(i => i % 2 == 0);
var startOfList = evenNumbers.Take(10).ToList();
// use startOfList instead of evenNumbers in the loop.
More generally and accurately, you could do it in the generator: create a List<int> cache and every time you generate a new number add it to the cache before you yield return it. Then when you loop through again, first serve up all the cached numbers. E.g.
List<int> cachedEvenNumbers = new List<int>();

IEnumerable<int> EvenNumbers
{
    get
    {
        int i = -1;
        foreach (int cached in cachedEvenNumbers)
        {
            i = cached;
            yield return cached;
        }
        // Note: this while loop now starts from the last cached value
        while (true)
        {
            Console.WriteLine("Generating {0}.", i + 1);
            ++i;
            cachedEvenNumbers.Add(i); // cache each new value before yielding it
            yield return i;
        }
    }
}
I guess if you think about this long enough you could come up with a general implementation of an IEnumerable<T>.Buffered() extension method - again, the requirement is that the enumeration doesn't change between calls, and the question is whether it is worth it.
Here's an incomplete yet compact 'functional' implementation (no new types defined).
The bug is that it does not allow simultaneous enumeration.
Original description:
The first function should have been an anonymous lambda inside the second, but C# does not allow yield in anonymous lambdas:
// put these in some extensions class
private static IEnumerable<T> EnumerateAndCache<T>(IEnumerator<T> enumerator, List<T> cache)
{
    while (enumerator.MoveNext())
    {
        var current = enumerator.Current;
        cache.Add(current);
        yield return current;
    }
}

public static IEnumerable<T> ToCachedEnumerable<T>(this IEnumerable<T> enumerable)
{
    var enumerator = enumerable.GetEnumerator();
    var cache = new List<T>();
    return cache.Concat(EnumerateAndCache(enumerator, cache));
}
Usage:
var enumerable = Numbers.ToCachedEnumerable();
Full credit to Eamon Nerbonne and sinelaw for their answers; here are just a couple of tweaks! First, release the enumerator when it has completed. Second, protect the underlying enumerator with a lock so the enumerable can safely be used on multiple threads.
// This is just the same as sinelaw's Generate but I didn't like the name
public static IEnumerable<T> AnonymousIterator<T>(Func<Func<Tuple<T>>> generator)
{
    var tryGetNext = generator();
    while (true)
    {
        var result = tryGetNext();
        if (null == result)
        {
            yield break;
        }
        yield return result.Item1;
    }
}

// Cached/Buffered/Replay behaviour
public static IEnumerable<T> Buffer<T>(this IEnumerable<T> self)
{
    // Rows are stored here when they've been fetched once
    var cache = new List<T>();

    // This counter is thread-safe in that it is incremented after the item has been added to the list,
    // hence it will never give a false positive. It may give a false negative, but that falls through
    // to the code which takes the lock so it's ok.
    var count = 0;

    // The enumerator is retained until it completes, then it is discarded.
    var enumerator = self.GetEnumerator();

    // This lock protects the enumerator only. The enumerable could be used on multiple threads
    // and the enumerator would then be shared among them, but enumerators are inherently not
    // thread-safe so a) we must protect that with a lock and b) we don't need to try and be
    // thread-safe in our own enumerator
    var lockObject = new object();

    return AnonymousIterator<T>(() =>
    {
        int pos = -1;
        return () =>
        {
            pos += 1;
            if (pos < count)
            {
                return new Tuple<T>(cache[pos]);
            }
            // Only take the lock when we need to
            lock (lockObject)
            {
                // The counter could have been updated between the check above and this one,
                // so now we have the lock we must check again
                if (pos < count)
                {
                    return new Tuple<T>(cache[pos]);
                }
                // Enumerator is set to null when it has completed
                if (enumerator != null)
                {
                    if (enumerator.MoveNext())
                    {
                        cache.Add(enumerator.Current);
                        count += 1;
                        return new Tuple<T>(enumerator.Current);
                    }
                    else
                    {
                        enumerator = null;
                    }
                }
            }
            return null;
        };
    });
}
I use the following extension method.
This way, the input is read at maximum speed, and the consumer processes at maximum speed.
public static IEnumerable<T> Buffer<T>(this IEnumerable<T> input)
{
    var blockingCollection = new BlockingCollection<T>();

    // read from the input on a background task
    Task.Factory.StartNew(() =>
    {
        foreach (var item in input)
        {
            blockingCollection.Add(item);
        }
        blockingCollection.CompleteAdding();
    });

    foreach (var item in blockingCollection.GetConsumingEnumerable())
    {
        yield return item;
    }
}
Example Usage
This example has a fast producer (find files), and a slow consumer (upload files).
long uploaded = 0;
long total = 0;

// Note: ForEach here is assumed to come from a helper/extension library;
// IEnumerable<T> has no built-in ForEach.
Directory
    .EnumerateFiles(inputFolder, "*.jpg", SearchOption.AllDirectories)
    .Select(filename =>
    {
        total++;
        return filename;
    })
    .Buffer()
    .ForEach(filename =>
    {
        // pretend to do something slow, like upload the file.
        Thread.Sleep(1000);
        uploaded++;
        Console.WriteLine($"Uploaded {uploaded:N0}/{total:N0}");
    });

ThreadLocal performance vs using parameters

I am currently implementing a runtime (i.e. a collection of functions) for a formulas language. Some formulas need a context to be passed to them and I created a class called EvaluationContext which contains all properties I need access to at runtime.
Using ThreadLocal<EvaluationContext> seems like a good option to make this context available to the runtime functions. The other option is to pass the context as a parameter to the functions that need it.
I prefer using ThreadLocal but I was wondering if there is any performance penalty as opposed to passing the evaluation context via method parameters.
I created the program below and it is faster to use parameters rather than the ThreadLocal field.
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

namespace TestThreadLocal
{
    internal class Program
    {
        public class EvaluationContext
        {
            public int A { get; set; }
            public int B { get; set; }
        }

        public static class FormulasRunTime
        {
            public static ThreadLocal<EvaluationContext> Context = new ThreadLocal<EvaluationContext>();

            public static int SomeFunction()
            {
                EvaluationContext ctx = Context.Value;
                return ctx.A + ctx.B;
            }

            public static int SomeFunction(EvaluationContext context)
            {
                return context.A + context.B;
            }
        }

        private static void Main(string[] args)
        {
            Stopwatch stopwatch = Stopwatch.StartNew();

            int N = 10000;
            Task<int>[] tasks = new Task<int>[N];
            int sum = 0;
            for (int i = 0; i < N; i++)
            {
                int x = i;
                tasks[i] = Task.Factory.StartNew(() =>
                {
                    //Console.WriteLine("Starting {0}, thread {1}", x, Thread.CurrentThread.ManagedThreadId);
                    FormulasRunTime.Context.Value = new EvaluationContext { A = 0, B = x };
                    return FormulasRunTime.SomeFunction();
                });
                sum += i;
            }
            Task.WaitAll(tasks);
            Console.WriteLine("Using ThreadLocal: It took {0} millisecs and the sum is {1}", stopwatch.ElapsedMilliseconds, tasks.Sum(t => t.Result));
            Console.WriteLine(sum);

            stopwatch = Stopwatch.StartNew();
            for (int i = 0; i < N; i++)
            {
                int x = i;
                tasks[i] = Task.Factory.StartNew(() =>
                {
                    return FormulasRunTime.SomeFunction(new EvaluationContext { A = 0, B = x });
                });
            }
            Task.WaitAll(tasks);
            Console.WriteLine("Using parameter: It took {0} millisecs and the sum is {1}", stopwatch.ElapsedMilliseconds, tasks.Sum(t => t.Result));

            Console.ReadKey();
        }
    }
}
Going on from costa's answer: if you try N as 10000000,
int N = 10000000;
you will see there is not much of a difference (around 107.4 vs 103.4 seconds).
As the value gets bigger, the difference becomes smaller.
So, if you do not mind about three seconds of extra time, I think it comes down to usability and taste.
PS: In the code, the int return types must be converted to long, otherwise the sums overflow.
I consider the ThreadLocal design to be dirty, yet creative. It is definitely going to be faster to use parameters but performance should not be your only concern. Parameters will be much clearer to understand. I recommend you go with parameters.
There will not be any performance impact, but with ThreadLocal you will not be able to do any parallel computations within a single evaluation (which can be quite useful, especially in the formulas domain).
If you definitely don't want to do that, you can go for ThreadLocal.
Otherwise I would suggest you look at the "state monad" pattern, which allows you to seamlessly pass your state (context) through your computations (formulas) without any explicit parameters.
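A very rough sketch of that idea in C# (my illustration, not a library API): each formula becomes a function from context to a (value, context) pair, and a Bind combinator chains formulas so the context is threaded through implicitly.
public delegate Tuple<TValue, EvaluationContext> Formula<TValue>(EvaluationContext ctx);

public static class FormulaExtensions
{
    public static Formula<TNext> Bind<TValue, TNext>(
        this Formula<TValue> formula, Func<TValue, Formula<TNext>> next)
    {
        return ctx =>
        {
            var result = formula(ctx);               // run the first formula
            return next(result.Item1)(result.Item2); // pass its value and the threaded context onward
        };
    }
}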
I think you'll find that in a head-to-head comparison, accessing a ThreadLocal<> takes substantially longer than accessing a parameter, but in the end it might not be a significant difference - it all depends on what else you're doing.
