Simple round robin (moving average) array in C#

As a diagnostic, I want to display the number of cycles per second in my app. (Think frames-per-second in a first-person-shooter.)
But I don't want to display the most recent value, or the average since launch. What I want to calculate is the mean of the last X values.
My question is, I suppose, about the best way to store these values. My first thought was to create a fixed size array, so each new value would push out the oldest. Is this the best way to do it? If so, how would I implement it?
EDIT:
Here's the class I wrote: RRQueue. It inherits Queue, but enforces the capacity and dequeues if necessary.
EDIT 2:
Pastebin is so passé. Now on a GitHub repo.

The easiest option for this is probably to use a Queue<T>, as this provides the first-in, first-out behavior you're after. Just Enqueue() your items, and when you have more than X items, Dequeue() the extra item(s).
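A minimal sketch of that approach, keeping a running sum so each update stays O(1) (MovingAverage is a hypothetical name, not the question's RRQueue; needs using System.Collections.Generic):
class MovingAverage
{
    private readonly Queue<double> _samples = new Queue<double>();
    private readonly int _capacity;
    private double _sum;

    public MovingAverage(int capacity) { _capacity = capacity; }

    public double Add(double value)
    {
        _samples.Enqueue(value);
        _sum += value;
        if (_samples.Count > _capacity)
            _sum -= _samples.Dequeue();   // push out the oldest value
        return _sum / _samples.Count;     // mean of the last N (or fewer) values
    }
}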

Possibly use a filter:
average = 0.9*average + 0.1*value
where 'value' is the most recent measurement
You can vary the 0.9 and 0.1 weights (as long as the two sum to 1).
This is not exactly a moving average, but it does filter out spikes, transients, etc., and it does not require an array for storage.
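A minimal sketch of that filter, assuming 0.1 as the weight for the newest sample (the class name is mine, not from the answer):
class SmoothedRate
{
    private double _average;
    private bool _primed;

    public double Update(double value)
    {
        if (!_primed) { _average = value; _primed = true; } // seed with the first sample
        else _average = 0.9 * _average + 0.1 * value;       // exponential smoothing
        return _average;
    }
}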

If you need the fastest implementation, then yes, a fixed-size array (with a separate count) would be fastest.
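A quick sketch of that idea, again keeping a running sum so each update is O(1) (names are mine, not from any answer here):
class FixedWindowAverage
{
    private readonly double[] _buffer;
    private int _index, _count;
    private double _sum;

    public FixedWindowAverage(int size) { _buffer = new double[size]; }

    public double Add(double value)
    {
        _sum += value - _buffer[_index];          // subtract the value being overwritten
        _buffer[_index] = value;
        _index = (_index + 1) % _buffer.Length;
        if (_count < _buffer.Length) _count++;
        return _sum / _count;                     // average over the filled portion only
    }
}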

You should take a look at the performance monitoring built into Windows :D.
MSDN
The API will feel a bit wonky if you haven't played with it before, but it's fast, powerful, extensible, and it makes quick work of getting usable results.
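As a rough, untested sketch of how publishing a custom rate counter with System.Diagnostics might look (the "MyApp" category and "Cycles/sec" counter names are made up, and creating a category typically needs admin rights):
// one-time setup: register a rate counter under a custom category
var counters = new CounterCreationDataCollection();
counters.Add(new CounterCreationData("Cycles/sec", "Application cycle rate",
    PerformanceCounterType.RateOfCountsPerSecond32));
if (!PerformanceCounterCategory.Exists("MyApp"))
    PerformanceCounterCategory.Create("MyApp", "MyApp counters",
        PerformanceCounterCategoryType.SingleInstance, counters);

// at runtime: increment once per cycle; PerfMon computes the per-second rate
var cyclesPerSec = new PerformanceCounter("MyApp", "Cycles/sec", false);
cyclesPerSec.Increment();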

my implementation:
class RoundRobinAverage
{
    int[] buffer;
    byte _size;
    byte _idx = 0;

    public RoundRobinAverage(byte size)
    {
        _size = size;
        buffer = new int[size];
    }

    public double Calc(int probeValue)
    {
        buffer[_idx++] = probeValue;
        if (_idx >= _size)
            _idx = 0;
        // cast before dividing, otherwise integer division truncates the result;
        // Sum() needs "using System.Linq;"
        return buffer.Sum() / (double)_size;
    }
}
usage:
private RoundRobinAverage avg = new RoundRobinAverage(10);
...
var average = avg.Calc(123);

Related

Cannot implicitly convert type 'double' to 'int'

I'm a beginner running through some easy enough challenges, and I can't seem to figure out this issue. Code is just a function for finding the biggest and smallest numbers in an array, and even if this isn't exactly an efficient way to do it, I have no idea where the code is getting an int from. Anyone able to shed light on that?
using System;
using System.Linq;

public class Program
{
    public static double[] FindMinMax(double[] values)
    {
        double small = values.Min();
        double big = values.Max();
        double result = new double[small, big];
        return result;
    }
}
I have no idea where the code is getting an int from
You are asking, here (new double[small, big]) for it to create a 2-dimensional (rectangular) array with dimensions (for example) 17.2 × 42.6 - it is those dimensions that it wants to be integers.
I think you mean to create a vector (single-dimension zero-based array) with the two values as the contents:
double[] result = new double[] {small, big};
Although I suspect a value-tuple would work more effectively, i.e.
public static (double Min, double Max) FindMinMax(double[] values)
{
    // ...
    return (small, big);
}
You might also want to look into whether it is optimal (it might not be) to write an explicit single loop that checks both min and max in each step, rather than making two passes; a sketch follows. But as long as the data isn't huge, it won't matter at all. And if it is huge, then you get into topics like sharding the array, performing parallel min/max on the different shards, and then aggregating the shard results. There may also be SIMD options here.
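A minimal single-pass sketch, assuming a non-empty input:
public static (double Min, double Max) FindMinMax(double[] values)
{
    double small = values[0], big = values[0];
    for (int i = 1; i < values.Length; i++)
    {
        if (values[i] < small) small = values[i];
        if (values[i] > big) big = values[i];
    }
    return (small, big);
}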

Algorithm speed-up using List<T>.Sort and IEnumerable

For my project, I first load an image from file, and put every pixel into a 2D pixels[,] array. Then, I want to examine each pixel and split them up into "bins" based on how they are colored, and then sort each bin. So I have a Bin object, which encapsulates a List<Pixel>, and I have a List<Bin> containing a (relatively small) number of bins.
My problem is that when I try to fill these bins from very large images (1920x1200 = 2.3 million pixels, eg), the algorithm I'm using is slower than I'd like, and I've traced the problem down to some C# language-specific features which seem to be taking way longer than I was expecting. I'd like some advice on how to better use C# to remove these bottlenecks.
After loading an image, I call a function called "fillBinsFromSource", which takes an enumerable list of pixels, finds which bin they belong in, and puts them there:
public void fillBinsFromSource(IEnumerable<Pixel> source)
{
    Stopwatch s = new Stopwatch();
    foreach (Pixel p in source)
    {
        s.Start();
        // algorithm removed for brevity
        // involves a binary search of all Bins and a List.Add call
        s.Stop();
    }
}
For large images, it's expected that my algorithm will take a while, but when I put a Stopwatch outside the function call, it turns out that it takes about twice as long as the time accrued on s, which means doing the enumeration using foreach is taking up half the time of this function (about 800 ms of the 1.6 seconds for a 1920x1200 image).
The reason I need to pass an enumerable list is that sometimes users will add only a small region of a picture, not an entire picture. The time-consuming call passes down several iterators, first from a list of images, then from each image in the list, like so (simplified):
class ImageList : IEnumerable<Pixel>
{
    private List<Image> imageList;

    public IEnumerator<Pixel> GetEnumerator()
    {
        foreach (Image i in imageList)
            foreach (Pixel p in i)
                yield return p;
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

class Image : IEnumerable<Pixel>
{
    private Pixel[,] pixels;            // all pixels in the image
    private List<Pixel> selectedPixels; // all pixels in the user's selection

    public IEnumerator<Pixel> GetEnumerator()
    {
        if (selectedPixels == null)
            for (int i = 0; i < pixels.GetLength(0); i++)
                for (int j = 0; j < pixels.GetLength(1); j++)
                    yield return pixels[i, j];
        else
            for (int i = 0; i < selectedPixels.Count; i++)
                yield return selectedPixels[i];
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}
Then finally I call this
ImageList list; // pretend it contains only 1 image, and it's large
fillBinsFromSource(list);
Question 1) Because of the need to enumerate over both the 2D array of pixels AND the selected region, depending on what the user has chosen, the enumeration is just really slow. How can I speed this up?
Then, after filling all these bins with Pixel objects, I sort them. I call List<Pixel>.Sort() and rely on IComparable, like so:
ImageList list; // pretend it contains only 1 image, and it's large
fillBinsFromSource(list);

foreach (Bin b in allBins)
    b.Sort(); // calls List<Pixel>.Sort()

class Pixel : IComparable
{
    // store both HSV and RGB
    float h, s, v;
    byte r, g, b;

    // we sort by HSV's value property
    public int CompareTo(object obj)
    {
        // this is much faster than calling CompareTo on a float
        Pixel rhs = obj as Pixel;
        if (v < rhs.v)
            return -1;
        else if (v > rhs.v)
            return 1;
        return 0;
    }
}
Question 2) Suppose allBins has 7 elements; sorting 7 separate lists which have a total of 2.3 million Pixels in them takes about 2 seconds. Sorting one list of 2.3 million random ints takes under 200 milliseconds. I can appreciate that there's some level of speed-up using primitive types, but is it really over 10x slower to use IComparable? Are there any speedups to be had here?
I apologize for the long question, if anyone has any advice for me, I'd appreciate it!
You really need to profile your code and see what is slow.
Most obvious:
Pixel does not implement the generic IComparable<Pixel>, so as a result the non-generic CompareTo has to do much more work (a minimal sketch of the fix follows this list).
Try making Pixel a value type. Most likely you'll see a drop in performance, but maybe not.
Consider passing 2D ranges (rectangles) and accessing Pixel objects directly by index instead of through iterators if you find that performance is below what you find acceptable. Iterators are nice, but not free.
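A minimal sketch of the first point, reusing the question's fields and manual comparison; only the interface changes:
class Pixel : IComparable<Pixel>
{
    float h, s, v;
    byte r, g, b;

    // the generic overload avoids the cast from object on every comparison
    public int CompareTo(Pixel other)
    {
        if (v < other.v) return -1;
        if (v > other.v) return 1;
        return 0;
    }
}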
All kinds of indirection, like a visitor pattern or virtual inheritance, are poison if you want raw-metal performance. Virtual calls, allocation, and unpredictable branching do a lot of damage to the kind of algorithm where there is a small, tight inner loop in which 99.99% of the time is spent.
Why? Because the CPU likes to execute many (dozens) of instructions in parallel. It can do that only if it can peek ahead in the instruction stream. The aforementioned things prevent that.
You really need to get the innermost loop right. Don't allocate there, don't call virtual functions (or interface methods or delegates).
Probably, your innermost loop should process a rectangle of a given image with a hard-coded kernel. Instead of implementing your processing function per-pixel, implement it per-rectangle.
In contrast, it doesn't matter how you provide the stream of images. Use LINQ there all you want. It is a low-volume operation, because there are millions of pixels per image.
Instead of using an iterator, or even building up an array / list of pixels to begin with, you could use the visitor pattern. Images, Image Lists and other objects representing arbitrary selections could all accept a visitor class with a single method VisitPixel and call that method for each pixel the object represents. The visitor class would then be responsible for binning all the pixels as they're visited.
This might eliminate the need to extract all of your pixels into a separate array. It might also eliminate the creation of iterators in favor of simple for loops.
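A rough sketch of that idea; IPixelVisitor and AcceptVisitor are hypothetical names, not part of the question's code:
interface IPixelVisitor
{
    void VisitPixel(Pixel p);
}

class Image
{
    private Pixel[,] pixels;
    private List<Pixel> selectedPixels;

    public void AcceptVisitor(IPixelVisitor visitor)
    {
        if (selectedPixels == null)
        {
            for (int i = 0; i < pixels.GetLength(0); i++)
                for (int j = 0; j < pixels.GetLength(1); j++)
                    visitor.VisitPixel(pixels[i, j]);   // no iterator allocation here
        }
        else
        {
            for (int i = 0; i < selectedPixels.Count; i++)
                visitor.VisitPixel(selectedPixels[i]);
        }
    }
}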
Alexei Levenkov has some good points regarding your second question. It might be even faster to use the Sort overload that takes an instance of IComparer<T>.

Prepend to a C# Array

Given a populated byte[] values in C#, I want to prepend the value (byte)0x00 to the array. I assume this will require making a new array and adding the contents of the old array. Speed is an important aspect of my application. What is the best way to do this?
-- EDIT --
The byte[] is used to store DSA (Digital Signature Algorithm) parameters. The operation will only need to be performed once per array, but speed is important because I am potentially performing this operation on many different byte[]s.
If you are only going to perform this operation once, then there aren't a whole lot of choices. The code provided in Monroe's answer should do just fine:
byte[] newValues = new byte[values.Length + 1];
newValues[0] = 0x00; // set the prepended value
Array.Copy(values, 0, newValues, 1, values.Length); // copy the old values
If, however, you're going to be performing this operation multiple times you have some more choices. There is a fundamental problem that prepending data to an array isn't an efficient operation, so you could choose to use an alternate data structure.
A LinkedList can efficiently prepend data, but it's less efficient in general for most tasks, as it involves a lot more memory allocation/deallocation and also loses memory locality, so it may not be a net win.
A double ended queue (known as a deque) would be a fantastic data structure for you. You can efficiently add to the start or the end, and efficiently access data anywhere in the structure (but you can't efficiently insert somewhere other than the start or end). The major problem here is that .NET doesn't provide an implementation of a deque. You'd need to find a 3rd party library with an implementation.
You can also save yourself a lot when copying by keeping track of "data that I need to prepend" (using a List/Queue/etc.) and then waiting to actually prepend the data as long as possible, so that you minimize the creation of new arrays as much as possible, as well as limiting the number of copies of existing elements.
You could also consider whether you could adjust the structure so that you're adding to the end, rather than the start (even if you know that you'll need to reverse it later). If you are appending a lot in a short space of time, it may be worth storing the data in a List (which can efficiently add to the end) and adding to the end. Depending on your needs, it may even be worth making a class that is a wrapper for a List and that hides the fact that it is reversed. You could make an indexer that maps i to Count - 1 - i, and so on, so that it appears, from the outside, as though your data is stored normally, even though the internal List actually holds the data backwards; a sketch of that follows.
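A minimal sketch of that wrapper idea; ReversedList is a hypothetical name:
class ReversedList<T>
{
    private readonly List<T> _items = new List<T>();

    public int Count { get { return _items.Count; } }

    // "prepend" by appending to the underlying list, which is O(1) amortized
    public void Prepend(T item) { _items.Add(item); }

    // map the public index onto the reversed internal storage
    public T this[int i] { get { return _items[_items.Count - 1 - i]; } }
}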
OK guys, let's take a look at the performance issue regarding this question.
This is not an answer, just a microbenchmark to see which option is more efficient.
So, let's set the scenario:
A byte array of 1,000,000 items, randomly populated
We need to prepend item 0x00
We have 3 options:
Manually creating and populating the new array
Manually creating the new array and using Array.Copy (#Monroe)
Creating a list, loading the array, inserting the item and converting the list to an array
Here's the code:
byte[] byteArray = new byte[1000000];
for (int i = 0; i < byteArray.Length; i++)
{
    byteArray[i] = Convert.ToByte(DateTime.Now.Second);
}

Stopwatch stopWatch = new Stopwatch();

//#1 Manually creating and populating a new array
stopWatch.Start();
byte[] extendedByteArray1 = new byte[byteArray.Length + 1];
extendedByteArray1[0] = 0x00;
for (int i = 0; i < byteArray.Length; i++)
{
    extendedByteArray1[i + 1] = byteArray[i];
}
stopWatch.Stop();
Console.WriteLine(string.Format("#1: {0} ms", stopWatch.ElapsedMilliseconds));
stopWatch.Reset();

//#2 Using a new array and Array.Copy
stopWatch.Start();
byte[] extendedByteArray2 = new byte[byteArray.Length + 1];
extendedByteArray2[0] = 0x00;
Array.Copy(byteArray, 0, extendedByteArray2, 1, byteArray.Length);
stopWatch.Stop();
Console.WriteLine(string.Format("#2: {0} ms", stopWatch.ElapsedMilliseconds));
stopWatch.Reset();

//#3 Using a List
stopWatch.Start();
List<byte> byteList = new List<byte>();
byteList.AddRange(byteArray);
byteList.Insert(0, 0x00);
byte[] extendedByteArray3 = byteList.ToArray();
stopWatch.Stop();
Console.WriteLine(string.Format("#3: {0} ms", stopWatch.ElapsedMilliseconds));
stopWatch.Reset();

Console.ReadLine();
And the results are:
#1: 9 ms
#2: 1 ms
#3: 6 ms
I've run it multiple times and I got different numbers, but the proportion is always the same: #2 is always the most efficient choice.
My conclusion: arrays are more efficient than Lists (although they provide less functionality), and somehow Array.Copy is really optimized (I would like to understand why, though).
Any feedback will be appreciated.
PS: this is not a swordfight post, we are at a Q&A site to learn and teach. And learn.
The easiest and cleanest way for .NET 4.7.1 and above is to use the side-effect-free Prepend().
Adds a value to the beginning of the sequence.
Example
// Creating an array of numbers
var numbers = new[] { 1, 2, 3 };
// Trying to prepend any value of the same type
var results = numbers.Prepend(0);
// output is 0, 1, 2, 3
Console.WriteLine(string.Join(", ", results ));
As you surmised, the fastest way to do this is to create a new array of length + 1 and copy all the old values over.
If you are going to be doing this many times, then I suggest using a List<byte> instead of byte[], as the cost of reallocating and copying while growing the underlying storage is amortized more effectively; in the usual case, the underlying vector in the List is grown by a factor of two each time an addition or insertion is made to the List that would exceed its current capacity.
...
byte[] newValues = new byte[values.Length + 1];
newValues[0] = 0x00; // set the prepended value
Array.Copy(values, 0, newValues, 1, values.Length); // copy the old values
When I need to append data frequently but also want O(1) random access to individual elements, I'll use an array that is over-allocated by some amount of padding for quickly adding new values. This means you need to store the actual content length in another variable, since array.Length will report the length plus the padding. A new value is appended by using one slot of the padding; no allocation and copy are necessary until you run out of padding. In effect, allocation is amortized over several append operations. There are speed/space trade-offs: if you have many of these data structures, you could have a fair amount of padding in use at any one time in the program.
This same technique can be used for prepending. Just as with appending, you can introduce an interface or abstraction between the users and the implementation: you can keep several slots of padding at the front, so that a new memory allocation is only necessary occasionally (a sketch of that follows). As some have suggested above, you can also implement a prepending interface on top of an appending data structure that reverses the indexes.
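A rough sketch of the front-padding idea for the byte[] case; PrependBuffer is a hypothetical name and the growth policy is arbitrary:
class PrependBuffer
{
    private byte[] _buffer;
    private int _start;            // index of the first live element

    public PrependBuffer(int padding)
    {
        _buffer = new byte[padding];
        _start = padding;          // all padding is free space at the front
    }

    public int Count { get { return _buffer.Length - _start; } }

    public byte this[int i] { get { return _buffer[_start + i]; } }

    public void Prepend(byte value)
    {
        if (_start == 0)
        {
            // out of padding: grow and shift the live data towards the back
            int count = Count;
            byte[] bigger = new byte[_buffer.Length * 2 + 1];
            int newStart = bigger.Length - count;
            Array.Copy(_buffer, _start, bigger, newStart, count);
            _buffer = bigger;
            _start = newStart;
        }
        _buffer[--_start] = value;
    }
}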
I'd package the data structure as an implementation of some generic collection interface, so that the interface appears fairly normal to the user (such as an array list or something).
(Also, if removal is supported, it's probably useful to clear elements as soon as they are removed to help reduce gc load.)
The main point is to consider the implementation and the interface separately, as decoupling them gives you the flexibility to choose varied implementations or to hide implementation details using a minimal interface.
There are many other data structures you could use, depending on their applicability to your domain: Ropes or Gap Buffers; see What is best data structure suitable to implement editor like notepad?; Tries do some useful things, too.
I know this is a VERY old post, but I actually like using lambdas. Sure, my code may NOT be the most efficient way, but it's readable and in one line. I use a combination of .Concat and ArraySegment.
string[] originalStringArray = new string[] { "1", "2", "3", "5", "6" };
int firstElementZero = 0;
int insertAtPositionZeroBased = 3;
string stringToPrepend = "0";
string stringToInsert = "FOUR"; // Deliberate !!!
originalStringArray = new string[] { stringToPrepend }
.Concat(originalStringArray).ToArray();
insertAtPositionZeroBased += 1; // BECAUSE we prepended !!
originalStringArray = new ArraySegment<string>(originalStringArray, firstElementZero, insertAtPositionZeroBased)
.Concat(new string[] { stringToInsert })
.Concat(new ArraySegment<string>(originalStringArray, insertAtPositionZeroBased, originalStringArray.Length - insertAtPositionZeroBased)).ToArray();
The best choice depends on what you're going to be doing with this collection later on down the line. If that's the only length-changing edit that will ever be made, then your best bet is to create a new array with one additional slot and use Array.Copy() to do the rest. No need to initialize the first value, since new C# arrays are always zeroed out:
byte[] PrependWithZero(byte[] input)
{
    var newArray = new byte[input.Length + 1];
    Array.Copy(input, 0, newArray, 1, input.Length);
    return newArray;
}
If there are going to be other length-changing edits that might happen, the most performant option might be to use a List<byte> all along, as long as the additions aren't always to the beginning. (If that's the case, even a linked list might not be an option that you can dismiss out of hand.):
var list = new List<byte>(input);
list.Insert(0, 0);
I am aware this is an over-four-year-old accepted post, but for those to whom this might be relevant, Buffer.BlockCopy would be faster.
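For byte arrays the byte count equals the element count, so a minimal sketch looks like this (the method name is mine):
byte[] PrependWithZeroBlockCopy(byte[] input)
{
    var newArray = new byte[input.Length + 1];
    // Buffer.BlockCopy counts in bytes, which for byte[] is the same as element count
    Buffer.BlockCopy(input, 0, newArray, 1, input.Length);
    return newArray;
}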

Fast Algorithm for computing percentiles to remove outliers

I have a program that needs to repeatedly compute the approximate percentile (order statistic) of a dataset in order to remove outliers before further processing. I'm currently doing so by sorting the array of values and picking the appropriate element; this is doable, but it's a noticable blip on the profiles despite being a fairly minor part of the program.
More info:
The data set contains on the order of up to 100,000 floating point numbers, and is assumed to be "reasonably" distributed - there are unlikely to be duplicates or huge spikes in density near particular values; and if for some odd reason the distribution is odd, it's OK for an approximation to be less accurate, since the data is probably messed up anyhow and further processing dubious. However, the data isn't necessarily uniformly or normally distributed; it's just very unlikely to be degenerate.
An approximate solution would be fine, but I do need to understand how the approximation introduces error to ensure it's valid.
Since the aim is to remove outliers, I'm computing two percentiles over the same data at all times: e.g. one at 95% and one at 5%.
The app is in C# with bits of heavy lifting in C++; pseudocode or a preexisting library in either would be fine.
An entirely different way of removing outliers would be fine too, as long as it's reasonable.
Update: It seems I'm looking for an approximate selection algorithm.
Although this is all done in a loop, the data is (slightly) different every time, so it's not easy to reuse a datastructure as was done for this question.
Implemented Solution
Using the wikipedia selection algorithm as suggested by Gronim reduced this part of the run-time by about a factor 20.
Since I couldn't find a C# implementation, here's what I came up with. It's faster even for small inputs than Array.Sort; and at 1000 elements it's 25 times faster.
public static double QuickSelect(double[] list, int k) {
    return QuickSelect(list, k, 0, list.Length);
}

public static double QuickSelect(double[] list, int k, int startI, int endI) {
    while (true) {
        // Assume startI <= k < endI
        int pivotI = (startI + endI) / 2; // arbitrary, but good if sorted
        int splitI = partition(list, startI, endI, pivotI);
        if (k < splitI)
            endI = splitI;
        else if (k > splitI)
            startI = splitI + 1;
        else // k == splitI
            return list[k];
    }
    // when this returns, all elements of list[i] <= list[k] iff i <= k
}

static int partition(double[] list, int startI, int endI, int pivotI) {
    // move the pivot value to the front
    double pivotValue = list[pivotI];
    list[pivotI] = list[startI];
    list[startI] = pivotValue;
    int storeI = startI + 1; // no need to store at the pivot item, it's good already
    // Invariant: startI < storeI <= endI
    while (storeI < endI && list[storeI] <= pivotValue) ++storeI; // fast if sorted
    // now storeI == endI || list[storeI] > pivotValue
    // so elem #storeI is either irrelevant or too large
    for (int i = storeI + 1; i < endI; ++i)
        if (list[i] <= pivotValue) {
            list.swap_elems(i, storeI);
            ++storeI;
        }
    int newPivotI = storeI - 1;
    list[startI] = list[newPivotI];
    list[newPivotI] = pivotValue;
    // now [startI, newPivotI] are <= pivotValue && list[newPivotI] == pivotValue
    return newPivotI;
}

// extension method; must be declared in a (non-nested) static class
static void swap_elems(this double[] list, int i, int j) {
    double tmp = list[i];
    list[i] = list[j];
    list[j] = tmp;
}
Thanks, Gronim, for pointing me in the right direction!
The histogram solution from Henrik will work. You can also use a selection algorithm to efficiently find the k largest or smallest elements in an array of n elements in O(n). To use this for the 95th percentile set k=0.05n and find the k largest elements.
Reference:
http://en.wikipedia.org/wiki/Selection_algorithm#Selecting_k_smallest_or_largest_elements
According to its creator, a SoftHeap can be used to "compute exact or approximate medians and percentiles optimally. It is also useful for approximate sorting..."
I used to identify outliers by calculating the standard deviation. Everything at a distance of more than 2 (or 3) times the standard deviation from the average is an outlier; 2 times covers about 95%.
Since you are already calculating the average, it's also very easy and fast to calculate the standard deviation.
You could also use only a subset of your data to calculate these numbers.
You could estimate your percentiles from just a part of your dataset, like the first few thousand points.
The Glivenko–Cantelli theorem ensures that this would be a fairly good estimate, if you can assume your data points to be independent.
Divide the interval between minimum and maximum of your data into (say) 1000 bins and calculate a histogram. Then build partial sums and see where they first exceed 5000 or 95000.
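A rough sketch of that histogram approach, assuming min < max and roughly 100,000 samples (needs using System and using System.Linq; the returned cutoffs are only as precise as the bin width):
static void ApproxPercentiles(double[] data, out double p05, out double p95, int binCount = 1000)
{
    double min = data.Min(), max = data.Max();
    double width = (max - min) / binCount;
    var counts = new int[binCount];
    foreach (double x in data)
    {
        int bin = Math.Min((int)((x - min) / width), binCount - 1); // clamp the max value
        counts[bin]++;
    }

    p05 = max;
    p95 = max;
    bool foundLow = false;
    long running = 0;
    for (int i = 0; i < binCount; i++)
    {
        running += counts[i];
        if (!foundLow && running >= 0.05 * data.Length)
        {
            p05 = min + (i + 1) * width;   // upper edge of the bin holding the 5th percentile
            foundLow = true;
        }
        if (running >= 0.95 * data.Length)
        {
            p95 = min + (i + 1) * width;   // upper edge of the bin holding the 95th percentile
            break;
        }
    }
}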
There are a couple basic approaches I can think of. First is to compute the range (by finding the highest and lowest values), project each element to a percentile ((x - min) / range) and throw out any that evaluate to lower than .05 or higher than .95.
The second is to compute the mean and standard deviation. A span of 2 standard deviations from the mean (in both directions) will enclose 95% of a normally-distributed sample space, meaning your outliers would be in the <2.5 and >97.5 percentiles. Calculating the mean of a series is linear, and so is the standard deviation (the square root of the mean of the squared differences between each element and the mean). Then subtract 2 sigmas from the mean, add 2 sigmas to the mean, and you've got your outlier limits.
Both of these will compute in roughly linear time; the first one requires two passes, the second one takes three (once you have your limits you still have to discard the outliers). Since this is a list-based operation, I do not think you will find anything with logarithmic or constant complexity; any further performance gains would require either optimizing the iteration and calculation, or introducing error by performing the calculations on a sub-sample (such as every third element).
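A minimal sketch of that mean ± 2-sigma cutoff (needs using System and using System.Linq; assumes the data is roughly normal, per the caveat above):
static double[] RemoveOutliers(double[] data, double sigmas)
{
    double mean = data.Average();
    double variance = data.Sum(x => (x - mean) * (x - mean)) / data.Length;
    double stdDev = Math.Sqrt(variance);
    double lo = mean - sigmas * stdDev;
    double hi = mean + sigmas * stdDev;
    return data.Where(x => x >= lo && x <= hi).ToArray();   // third pass discards the outliers
}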
A good general answer to your problem seems to be RANSAC.
Given a model, and some noisy data, the algorithm efficiently recovers the parameters of the model.
You will have to choose a simple model that can map your data. Anything smooth should be fine, say a mixture of a few Gaussians. RANSAC will fit the parameters of your model and estimate a set of inliers at the same time. Then throw away whatever doesn't fit the model properly.
You could filter out values beyond 2 or 3 standard deviations even if the data is not normally distributed; at least it will be done in a consistent manner, which should be important.
As you remove the outliers, the std dev will change, so you could do this in a loop until the change in std dev is minimal. Whether or not you want to do this depends upon why you are manipulating the data this way. Some statisticians have major reservations about removing outliers, but others remove them to show that the data is fairly normally distributed.
Not an expert, but my memory suggests:
to determine percentile points exactly you need to sort and count
taking a sample from the data and calculating the percentile values sounds like a good plan for decent approximation if you can get a good sample
if not, as suggested by Henrik, you can avoid the full sort if you do the buckets and count them
One set of data of 100k elements takes almost no time to sort, so I assume you have to do this repeatedly. If the data set is the same set just updated slightly, you're best off building a tree (O(N log N)) and then removing and adding new points as they come in (O(K log N) where K is the number of points changed). Otherwise, the kth largest element solution already mentioned gives you O(N) for each dataset.

Parallel Shuffle in C# 4?

As noticed in this question: Randomize a List<T> you can implement a shuffle method on a list; like one of the answers mentions:
using System.Security.Cryptography;
...
public static void Shuffle<T>(this IList<T> list)
{
    RNGCryptoServiceProvider provider = new RNGCryptoServiceProvider();
    int n = list.Count;
    while (n > 1)
    {
        byte[] box = new byte[1];
        do provider.GetBytes(box);
        while (!(box[0] < n * (Byte.MaxValue / n)));
        int k = (box[0] % n);
        n--;
        T value = list[k];
        list[k] = list[n];
        list[n] = value;
    }
}
Does anyone know if it's possible to "Parallel-ize" this using some of the new features in C# 4?
Just a curiosity.
Your question is ambiguous, but you can use the Parallel Framework to help do some operations in parallel. It depends on whether you want the getting of the bytes to happen in parallel before the shuffle, or whether you want to shuffle multiple lists at one time.
If it is the former, I would suggest that you first break your code into smaller functions, so you can do some analysis to see where the bottlenecks are, as, if the getting of the bytes is the bottleneck, then doing it in parallel may make a difference. By having tests in place you can test new functions and decide if it is worth the added complexity.
static RNGCryptoServiceProvider provider = new RNGCryptoServiceProvider();

private static byte[] GetBytes(int n)
{
    byte[] box = new byte[1];
    do provider.GetBytes(box);
    while (!(box[0] < n * (Byte.MaxValue / n)));
    return box;
}

private static IList<T> InnerLoop<T>(int n, IList<T> list)
{
    var box = GetBytes(n);
    int k = box[0] % n;
    // swap the chosen element with the last unshuffled element;
    // list[n - 1] mirrors the original code's post-decrement of n
    T value = list[k];
    list[k] = list[n - 1];
    list[n - 1] = value;
    return list;
}

public static void Shuffle<T>(this IList<T> list)
{
    int n = list.Count;
    while (n > 1)
    {
        list = InnerLoop(n, list);
        n--;
    }
}
This is a rough idea as to how to split your function, so you can replace the GetBytes function, but you may need to make some other changes to test it.
Getting some numbers is crucial to make certain that you are getting enough of a benefit to warrant adding complexity though.
You may want to move the lines in InnerLoop that deal with the list into a separate function, so you can see if that is slow and perhaps swap out some algorithms to improve it; but you need to have an idea how fast you need the entire shuffle operation to go.
But if you just want to shuffle multiple lists, then it will be easy; you may want to look at PLINQ for that.
UPDATE
The code above is meant just to show an example of how the method can be broken into smaller functions, not to be a working solution. If it is necessary to move the provider that I put into a static variable into a function and then pass it as a parameter, then that may need to be done. I didn't test the code, but my suggestion is based on getting profiling data and then looking at how to improve its performance, especially since I am not certain which way it was meant to be done in parallel. It may be necessary to just build up an array in order, in parallel, and then shuffle it, but first see what the time needed for each operation is, then see whether doing it in parallel is warranted.
There may be a need to use concurrent data structures also, if using multiple threads, in order to not pay a penalty by having to synchronize yourself, but, again, that may not be needed, depending on where the bottleneck is.
UPDATE:
Based on the answer to my comment, you may want to look at the various functions in the parallel library; this page may help: http://msdn.microsoft.com/en-us/library/dd537609.aspx.
You can create a Func version of your function and pass that in as a parameter. There are multiple ways to use this library to make this function parallel; since you already don't have any global variables, just lose the static modifier.
You will want to gather numbers as you add more threads, though, to see where you start to see a decrease in performance, as you won't see a 1:1 improvement; if you add 2 threads it won't go twice as fast, so just test and see where having more threads becomes a problem. Since your function is CPU-bound, you may want to have only one thread per core as a rough starting point.
Any in-place shuffle is not very well suited for parallelization. Especially since a shuffle requires a random component (over the range) so there is no way to localize parts of the problem.
Well, you could fairly easily parallelize the code which is generating the random numbers - which could be the bottleneck when using a cryptographically secure random number generator. You could then use that sequence of random numbers (which would need to have its order preserved) within a single thread performing the swaps.
One problem which has just occurred to me though, is that RNGCryptoServiceProvider isn't thread-safe (and neither is System.Random). You'd need as many random number generators as threads to make this work. Basically it becomes a bit ugly :(
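A rough sketch of that split, assuming C# 4's Parallel.For overload with thread-local state (ParallelShuffle is a made-up name; each thread gets its own RNGCryptoServiceProvider since it isn't documented as thread-safe, and the swaps are then applied in order on a single thread):
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Threading.Tasks;

public static void ParallelShuffle<T>(IList<T> list)
{
    int count = list.Count;
    var swapIndex = new int[count];               // swapIndex[n] will be in [0, n]

    // Generate the random swap targets in parallel, one RNG per thread.
    Parallel.For(1, count,
        () => new RNGCryptoServiceProvider(),
        (n, state, rng) =>
        {
            var box = new byte[4];
            uint bound = (uint)(n + 1);
            uint limit = uint.MaxValue - (uint.MaxValue % bound); // rejection threshold
            uint sample;
            do { rng.GetBytes(box); sample = BitConverter.ToUInt32(box, 0); }
            while (sample >= limit);              // reject to avoid modulo bias
            swapIndex[n] = (int)(sample % bound);
            return rng;
        },
        rng => rng.Dispose());

    // The swaps themselves have to be applied in order, on a single thread.
    for (int n = count - 1; n >= 1; n--)
    {
        int k = swapIndex[n];
        T tmp = list[k];
        list[k] = list[n];
        list[n] = tmp;
    }
}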
