AccessViolationException with sound buffer conversion - C#

I'm using an NAudio AsioOut object to pass data from the input buffer to my delayProc() function and then on to the output buffer.
delayProc() needs a float[] buffer, which I get via e.GetAsInterleavedSamples(). The problem is that I then need to convert it back to an array of channel buffers (IntPtr[]), and for that I'm using the AsioSampleConvertor class.
When I try to apply the effect I get an AccessViolationException inside the AsioSampleConvertor code, so I think the problem lies in the conversion from float[] to IntPtr[].
Here is some code:
OnAudioAvailable()
floatIn = new float[e.SamplesPerBuffer * e.InputBuffers.Length]; // *2
e.GetAsInterleavedSamples(floatIn);
floatOut = delayProc(floatIn, e.SamplesPerBuffer * e.InputBuffers.Length, 1.5f);
// conversion from float[] to IntPtr[L][R]
Outp = Marshal.AllocHGlobal(sizeof(float) * floatOut.Length);
Marshal.Copy(floatOut, 0, Outp, floatOut.Length);
NAudio.Wave.Asio.ASIOSampleConvertor.ConvertorFloatToInt2Channels(Outp, e.OutputBuffers, e.InputBuffers.Length, floatOut.Length);
delayProc()
private float[] delayProc(float[] sourceBuffer, int sampleCount, float delay)
{
    if (OldBuf == null)
    {
        OldBuf = new float[sampleCount];
    }
    float[] BufDly = new float[(int)(sampleCount * delay)];
    int delayLength = (int)(BufDly.Length - (BufDly.Length / delay));
    for (int j = sampleCount - delayLength; j < sampleCount; j++)
        for (int i = 0; i < delayLength; i++)
            BufDly[i] = OldBuf[j];
    for (int j = 0; j < sampleCount; j++)
        for (int i = delayLength; i < BufDly.Length; i++)
            BufDly[i] = sourceBuffer[j];
    for (int i = 0; i < sampleCount; i++)
        OldBuf[i] = sourceBuffer[i];
    return BufDly;
}
AsioSampleConvertor
public static void ConvertorFloatToInt2Channels(IntPtr inputInterleavedBuffer, IntPtr[] asioOutputBuffers, int nbChannels, int nbSamples)
{
    unsafe
    {
        float* inputSamples = (float*)inputInterleavedBuffer;
        int* leftSamples = (int*)asioOutputBuffers[0];
        int* rightSamples = (int*)asioOutputBuffers[1];
        for (int i = 0; i < nbSamples; i++)
        {
            *leftSamples++ = clampToInt(inputSamples[0]);
            *rightSamples++ = clampToInt(inputSamples[1]);
            inputSamples += 2;
        }
    }
}
clampToInt()
private static int clampToInt(double sampleValue)
{
    sampleValue = (sampleValue < -1.0) ? -1.0 : (sampleValue > 1.0) ? 1.0 : sampleValue;
    return (int)(sampleValue * 2147483647.0);
}
If you need some other code, just ask me.

When you call ConvertorFloatToInt2Channels you are passing in the total number of samples across all channels, then trying to read that many pairs of samples. So you are trying to read twice as many samples from your input buffer as are actually there. Using unsafe code you are trying to address well past the end of the allocated block, which results in the access violation you are getting.
Change the for loop in your ConvertorFloatToInt2Channels method to read:
for (int i = 0; i < nbSamples; i += 2)
This will stop your code from trying to read double the number of items actually present in the source memory block.
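For clarity, here is the whole method with that single change applied:
public static void ConvertorFloatToInt2Channels(IntPtr inputInterleavedBuffer, IntPtr[] asioOutputBuffers, int nbChannels, int nbSamples)
{
    unsafe
    {
        float* inputSamples = (float*)inputInterleavedBuffer;
        int* leftSamples = (int*)asioOutputBuffers[0];
        int* rightSamples = (int*)asioOutputBuffers[1];
        // nbSamples counts samples across both channels, so step by 2
        // (one left/right pair per iteration).
        for (int i = 0; i < nbSamples; i += 2)
        {
            *leftSamples++ = clampToInt(inputSamples[0]);
            *rightSamples++ = clampToInt(inputSamples[1]);
            inputSamples += 2;
        }
    }
}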
Incidentally, why are you messing around with allocating global memory and using unsafe code here? Why not process them as managed arrays? Processing the data itself isn't much slower, and you save on all the overheads of copying data to and from unmanaged memory.
Try this:
public static void FloatMonoToIntStereo(float[] samples, int[] leftChannel, int[] rightChannel)
{
    // samples holds interleaved L/R floats; write scaled 32-bit integer
    // samples into the two separate channel buffers.
    for (int i = 0, j = 0; i < samples.Length; i += 2, j++)
    {
        leftChannel[j] = (int)(samples[i] * Int32.MaxValue);
        rightChannel[j] = (int)(samples[i + 1] * Int32.MaxValue);
    }
}
On my machine that processes around 12 million samples per second, converting the samples to integer and splitting the channels. About half that speed if I allocate the buffers for every set of results. About half again when I write that to use unsafe code, AllocHGlobal etc.
Never assume that unsafe code is faster.
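If you do eventually need to hand the converted samples back to the unmanaged ASIO output buffers, you can keep all the processing managed and only copy at the boundary. A rough sketch (assuming leftChannel and rightChannel are int[] buffers holding one sample per frame, as produced above):
// Copy the managed channel buffers into the unmanaged ASIO output
// buffers; Marshal.Copy(int[], int, IntPtr, int) copies 4-byte
// elements, matching the 32-bit integer sample format.
Marshal.Copy(leftChannel, 0, e.OutputBuffers[0], leftChannel.Length);
Marshal.Copy(rightChannel, 0, e.OutputBuffers[1], rightChannel.Length);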


C# performance - pointer to span in a hot loop

I'm looking for a faster alternative to BitConverter, used inside a "hot loop":
// i_k_size = 8 bytes
while (fs.Read(ba_buf, 0, ba_buf.Length) > 0 && dcm_buf_read_ctr < i_buf_reads)
{
    Span<byte> sp_data = ba_buf.AsSpan();
    for (int i = 0; i < ba_buf.Length; i += i_k_size)
    {
        UInt64 k = BitConverter.ToUInt64(sp_data.Slice(i, i_k_size));
    }
}
My attempts to integrate a pointer with the conversion made performance worse. Can a pointer be used with a span to make it faster?
The benchmark below shows that a pointer to the array is about 2x faster. This is the code I want to use instead of BitConverter:
public static int l_1gb = 1073741824;

static unsafe void Main(string[] args)
{
    Random rnd = new Random();
    Stopwatch sw1 = new();
    sw1.Start();
    byte[] k = new byte[8];
    fixed (byte* a2rr = &k[0])
    {
        for (int i = 0; i < 1000000000; i++)
        {
            rnd.NextBytes(k);
            //UInt64 p1 = BitConverter.ToUInt64(k);
            //time: 10203.824
            //time: 10508.981
            //time: 10246.784
            //time: 10285.889

            //UInt64* uint64ptr = (UInt64*)a2rr;
            //x2 performance!
            UInt64 p2 = *(UInt64*)a2rr;
            //time: 4609.814
            //time: 4588.157
            //time: 4634.494
        }
    }
    Console.WriteLine($"time: {Math.Round(sw1.Elapsed.TotalMilliseconds, 3)}");
}
Assuming ba_buf is a byte[], a very easy and efficient way to run your loop is as such:
foreach (var value in MemoryMarshal.Cast<byte, ulong>(ba_buf))
{
    // work with value here
}
If you need to finesse the buffer (for example, to cut off parts of it), use AsSpan(start, count) on it first.
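Putting that together with the question's reading loop, a rough sketch might look like this (assuming ba_buf's length is a multiple of 8 and fs is the open FileStream):
// requires: using System.Runtime.InteropServices;
int bytesRead;
while ((bytesRead = fs.Read(ba_buf, 0, ba_buf.Length)) > 0)
{
    // Reinterpret the filled part of the byte buffer as ulongs; Cast
    // silently drops any trailing bytes that don't fill a whole ulong.
    foreach (ulong value in MemoryMarshal.Cast<byte, ulong>(ba_buf.AsSpan(0, bytesRead)))
    {
        // work with value here
    }
}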
You can optimise this quite a lot by initialising some spans outside the reading loop, then reading directly into a Span<byte> and accessing the data via a Span<ulong>, like so:
int buf_bytes = sizeof(ulong) * 1024; // Or whatever buffer size you need.
var ba_buf = new byte[buf_bytes];
var span_buf = ba_buf.AsSpan();
var data_span = MemoryMarshal.Cast<byte, ulong>(span_buf);
while (true)
{
    int count = fs.Read(span_buf) / sizeof(ulong);
    if (count == 0)
        break;
    for (int i = 0; i < count; i++)
    {
        // Do something with data_span[i]
        Console.WriteLine(data_span[i]); // Put your own processing here.
    }
}
This avoids memory allocation as much as possible. It terminates the reading loop when it runs out of data, and if the number of bytes returned is not a multiple of sizeof(ulong) it ignores the extra bytes.
It will always read all the available data, but if you want to terminate it earlier you can add code to do so.
As an example, consider this code which writes 2,000 ulong values to a file and then reads them back in using the code above:
using (var output = File.OpenWrite("x"))
{
    for (ulong i = 0; i < 2000; ++i)
    {
        output.Write(BitConverter.GetBytes(i));
    }
}

using var fs = File.OpenRead("x");
int buf_bytes = sizeof(ulong) * 1024; // Or whatever buffer size you need.
var ba_buf = new byte[buf_bytes];
var span_buf = ba_buf.AsSpan();
var data_span = MemoryMarshal.Cast<byte, ulong>(span_buf);
while (true)
{
    int count = fs.Read(span_buf) / sizeof(ulong);
    if (count == 0)
        break;
    for (int i = 0; i < count; i++)
    {
        // Do something with data_span[i]
        Console.WriteLine(data_span[i]); // Put your own processing here.
    }
}

C# managedCuda 2d array to GPU

I'm new to CUDA and trying to figure out how to pass a 2d array to the kernel.
I have the following working code for a 1-dimensional array:
class Program
{
    static void Main(string[] args)
    {
        int N = 10;
        int deviceID = 0;
        CudaContext ctx = new CudaContext(deviceID);
        CudaKernel kernel = ctx.LoadKernel(@"doubleIt.ptx", "DoubleIt");
        kernel.GridDimensions = (N + 255) / 256;
        kernel.BlockDimensions = Math.Min(N, 256);
        // Allocate input vector h_A in host memory
        float[] h_A = new float[N];
        // Initialize input vector h_A
        for (int i = 0; i < N; i++)
        {
            h_A[i] = i;
        }
        // Allocate vectors in device memory and copy vectors from host memory to device memory
        CudaDeviceVariable<float> d_A = h_A;
        CudaDeviceVariable<float> d_C = new CudaDeviceVariable<float>(N);
        // Invoke kernel
        kernel.Run(d_A.DevicePointer, d_C.DevicePointer, N);
        // Copy result from device memory to host memory
        float[] h_C = d_C;
        // h_C contains the result in host memory
    }
}
with the following kernel code:
__global__ void DoubleIt(const float* A, float* C, int N)
{
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < N)
        C[i] = A[i] * 2;
}
As I said, everything works fine, but I want to work with a 2d array as follows:
// Allocate input vectors h_A in host memory
int W = 10;
float[][] h_A = new float[N][];
// Initialize input vectors h_A
for (int i = 0; i < N; i++)
{
    h_A[i] = new float[W];
    for (int j = 0; j < W; j++)
    {
        h_A[i][j] = i * W + j;
    }
}
I need the entire 2nd dimension to be handled by a single thread, so kernel.BlockDimensions must stay one-dimensional and each kernel thread needs to get a 1d array with 10 elements.
So my bottom-line question is: how should I copy this 2d array to the device, and how do I use it in the kernel? (In the example above there should be a total of 10 threads.)
Short answer: you shouldn't do it...
Long answer: jagged arrays are difficult to handle in general. Instead of one contiguous segment of memory for your data, you have plenty of small ones lying scattered around your memory. What happens if you copy the data to the GPU? If you had one large contiguous segment, you would call the cudaMemcpy/CopyToDevice functions and copy the entire block at once. But just as you allocate jagged arrays in a for loop, you'd have to copy your data line by line into a CudaDeviceVariable<CUdeviceptr>, where each entry points to a CudaDeviceVariable<float>. In parallel you maintain a host-side array CudaDeviceVariable<float>[] that manages your CUdeviceptrs. Copying data is already quite slow in general; doing it this way is probably a real performance killer...
To conclude: if you can, use flattened arrays and index the entries with y * DimX + x. Even better, on the GPU side use pitched memory, where the allocation is done so that each line starts at a "good" address: the index then becomes y * Pitch + x (simplified). The 2D copy methods in CUDA are made for these pitched memory allocations, where each line gets some additional bytes added.
For completeness: C# also has true 2-dimensional arrays like float[,]. You could use these on the host side instead of flattened 1D arrays, but I wouldn't recommend it, as the ISO standard of .NET does not guarantee that the internal memory is actually contiguous, an assumption that managedCuda must make in order to use these arrays. The current .NET framework doesn't have any internal weirdness here, but who knows whether it will stay that way...
This would realize the jagged-array copy:
float[][] data_h;
CudaDeviceVariable<CUdeviceptr> data_d;
CUdeviceptr[] ptrsToData_h; // represents data_d on host side
CudaDeviceVariable<float>[] arrayOfarray_d; // array of CudaDeviceVariables to manage memory, source for pointers in ptrsToData_h

int sizeX = 512;
int sizeY = 256;

data_h = new float[sizeX][];
arrayOfarray_d = new CudaDeviceVariable<float>[sizeX];
data_d = new CudaDeviceVariable<CUdeviceptr>(sizeX);
ptrsToData_h = new CUdeviceptr[sizeX];
for (int x = 0; x < sizeX; x++)
{
    data_h[x] = new float[sizeY];
    arrayOfarray_d[x] = new CudaDeviceVariable<float>(sizeY);
    ptrsToData_h[x] = arrayOfarray_d[x].DevicePointer;
    // ToDo: init data on host...
}
// Copy the pointers once:
data_d.CopyToDevice(ptrsToData_h);
// Copy data:
for (int x = 0; x < sizeX; x++)
{
    arrayOfarray_d[x].CopyToDevice(data_h[x]);
}
// Call a kernel:
kernel.Run(data_d.DevicePointer /*, other parameters*/);
// kernel in *.cu file:
// __global__ void kernel(float** data_d, ...)
This is a sample for CudaPitchedDeviceVariable:
int dimX = 512;
int dimY = 512;
float[] array_host = new float[dimX * dimY];
CudaPitchedDeviceVariable<float> arrayPitched_d = new CudaPitchedDeviceVariable<float>(dimX, dimY);
for (int y = 0; y < dimY; y++)
{
    for (int x = 0; x < dimX; x++)
    {
        array_host[y * dimX + x] = x * y;
    }
}
arrayPitched_d.CopyToDevice(array_host);
kernel.Run(arrayPitched_d.DevicePointer, arrayPitched_d.Pitch, dimX, dimY);

// Corresponding kernel:
extern "C"
__global__ void kernel(float* data, size_t pitch, int dimX, int dimY)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= dimX || y >= dimY)
        return;
    // Pointer arithmetic: add y*pitch to a char* pointer, as pitch is given
    // in bytes, which gives the start of line y. Convert to float* and add x
    // to get the value at entry x of line y:
    float value = *(((float*)((char*)data + y * pitch)) + x);
    *(((float*)((char*)data + y * pitch)) + x) = value + 1;
    // Or simpler, if you don't like pointers:
    float* line = (float*)((char*)data + y * pitch);
    float value2 = line[x];
}
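For comparison, the plain flattened approach from the conclusion above, without pitch, would look roughly like this on the host side (a sketch reusing the kernel-launch pattern from the 1D example; the kernel would then index with y * dimX + x):
int dimX = 512;
int dimY = 256;
float[] h_flat = new float[dimX * dimY];
for (int y = 0; y < dimY; y++)
    for (int x = 0; x < dimX; x++)
        h_flat[y * dimX + x] = x * y;
// Implicit host-to-device copy, as in the 1D example:
CudaDeviceVariable<float> d_flat = h_flat;
kernel.Run(d_flat.DevicePointer, dimX, dimY);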

Accessing processed values from FFT

I am attempting to use the Lomont FFT to return complex numbers and build a spectrogram / spectral density chart in C#.
I am having trouble understanding how to get the result values back out of the class.
Here is the code I have put together so far, which appears to be working:
int read = 0;
Double[] data;
byte[] buffer = new byte[1024];
FileStream wave = new FileStream(args[0], FileMode.Open, FileAccess.Read);
read = wave.Read(buffer, 0, 44);
read = wave.Read(buffer, 0, 1024);
data = new Double[read];
for (int i = 0; i < read; i += 2)
{
    data[i] = BitConverter.ToInt16(buffer, i) / 32768.0;
    Console.WriteLine(data[i]);
}
LomontFFT LFFT = new LomontFFT();
LFFT.FFT(data, true);
What I am not clear on is how to return/access the values from the Lomont FFT implementation back in my application (console).
Being pretty new to C# development, I'm thinking I am perhaps missing something fundamental about how to retrieve processed values from an instance of the Lomont class, or perhaps I am even calling it incorrectly.
Console.WriteLine(LFFT.A); // Returns 0
Console.WriteLine(LFFT.B); // Returns 1
I have been searching for a code snippet or explanation of how to do this, but so far have come up with nothing that I understand or that explains this particular aspect of the issue I am facing. Any guidance would be greatly appreciated.
A subset of the results held in the data array from the code above can be found below; based on my current understanding, they appear to be valid:
0.00531005859375
0.0238037109375
0.041473388671875
0.0576171875
0.07183837890625
0.083465576171875
0.092193603515625
0.097625732421875
0.099639892578125
0.098114013671875
0.0931396484375
0.0848388671875
0.07354736328125
0.05963134765625
0.043609619140625
0.026031494140625
0.007476806640625
-0.011260986328125
-0.0296630859375
-0.047027587890625
-0.062713623046875
-0.076141357421875
-0.086883544921875
-0.09454345703125
-0.098785400390625
-0.0994873046875
-0.0966796875
-0.090362548828125
-0.080810546875
-0.06842041015625
-0.05352783203125
-0.036712646484375
-0.0185546875
What am I actually attempting to do? (For perspective.)
I am looking to load a wave file into a console application and produce a spectrogram/spectral density chart as a jpg/png image for further processing.
The wave files I am reading are mono.
UPDATE 1
I receive slightly different results depending on which FFT is used.
Using RealFFT
for (int i = 0; i < read; i += 2)
{
    data[i] = BitConverter.ToInt16(buffer, i) / 32768.0;
    //Console.WriteLine(data[i]);
}
LomontFFT LFFT = new LomontFFT();
LFFT.RealFFT(data, true);
for (int i = 0; i < buffer.Length / 2; i++)
{
    System.Console.WriteLine("{0}",
        Math.Sqrt(data[2 * i] * data[2 * i] + data[2 * i + 1] * data[2 * i + 1]));
}
Partial Result of RealFFT
0.314566983321381
0.625242818210924
0.30314888696868
0.118468857708093
0.0587697011760449
0.0369034115568654
0.0265842582236275
0.0207195964060356
0.0169601273233317
0.0143745438577886
0.012528799609089
0.0111831275153128
0.0102313284519146
0.00960198279358434
0.00920236001619566
Using FFT
for (int i = 0; i < read; i += 2)
{
    data[i] = BitConverter.ToInt16(buffer, i) / 32768.0;
    //Console.WriteLine(data[i]);
}
double[] bufferB = new double[2 * data.Length];
for (int i = 0; i < data.Length; i++)
{
    bufferB[2 * i] = data[i];
    bufferB[2 * i + 1] = 0;
}
LomontFFT LFFT = new LomontFFT();
LFFT.FFT(bufferB, true);
for (int i = 0; i < bufferB.Length / 2; i++)
{
    System.Console.WriteLine("{0}",
        Math.Sqrt(bufferB[2 * i] * bufferB[2 * i] + bufferB[2 * i + 1] * bufferB[2 * i + 1]));
}
Partial Result of FFT:
0.31456698332138
0.625242818210923
0.303148886968679
0.118468857708092
0.0587697011760447
0.0369034115568653
0.0265842582236274
0.0207195964060355
0.0169601273233317
0.0143745438577886
0.012528799609089
0.0111831275153127
0.0102313284519146
0.00960198279358439
0.00920236001619564
Looking at the LomontFFT.FFT documentation:
Compute the forward or inverse Fourier Transform of data, with
data containing complex valued data as alternating real and
imaginary parts. The length must be a power of 2. The data is
modified in place.
This tells us a few things. First the function is expecting complex-valued data whereas your data is real. A quick fix for this is to create another buffer of twice the size and setting all the imaginary parts to 0:
double[] buffer = new double[2 * data.Length];
for (int i = 0; i < data.Length; i++)
{
    buffer[2 * i] = data[i];
    buffer[2 * i + 1] = 0;
}
The documentation also tells us that the computation is done in place. That means that after the call to FFT returns, the input array is replaced with the computed result. You could thus print the spectrum with:
LomontFFT LFFT = new LomontFFT();
LFFT.FFT(buffer, true);
for (int i = 0; i < buffer.Length / 2; i++)
{
    System.Console.WriteLine("{0}",
        Math.Sqrt(buffer[2 * i] * buffer[2 * i] + buffer[2 * i + 1] * buffer[2 * i + 1]));
}
Note that since your input data is real-valued, you could also use LomontFFT.RealFFT. In that case, given a slightly different packing rule, you would obtain the FFT results using:
LomontFFT LFFT = new LomontFFT();
LFFT.RealFFT(data, true);
System.Console.WriteLine("{0}", Math.Abs(data[0]));
for (int i = 1; i < data.Length / 2; i++)
{
    System.Console.WriteLine("{0}",
        Math.Sqrt(data[2 * i] * data[2 * i] + data[2 * i + 1] * data[2 * i + 1]));
}
System.Console.WriteLine("{0}", Math.Abs(data[1]));
This would give you the non-redundant lower half of the spectrum (unlike LomontFFT.FFT, which provides the entire spectrum). Also, numerical differences on the order of double precision (around 1e-16 times the spectrum peak value) with respect to LomontFFT.FFT can be expected.
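To place those magnitudes on a frequency axis for the spectrogram you are after, bin i of an N-point transform corresponds to i * sampleRate / N Hz. A sketch (the 44100 Hz rate is an assumption; use the rate from your wave file's header):
double sampleRate = 44100.0; // assumed; read the real rate from the wave header
int n = data.Length;         // number of real input samples
for (int i = 1; i < n / 2; i++)
{
    double magnitude = Math.Sqrt(data[2 * i] * data[2 * i] + data[2 * i + 1] * data[2 * i + 1]);
    double frequency = i * sampleRate / n;
    System.Console.WriteLine("{0} Hz: {1}", frequency, magnitude);
}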

EmguCv: Reduce the grayscales

Is there a way to reduce the grayscales of a grayscale image in OpenCV?
Normally I have 256 gray values (0 to 255) for an
Image<Gray, byte> inputImage.
In my case I just need gray values from 0-10. Is there a good way to do that with OpenCV, especially in C#?
There's nothing built into OpenCV that does this sort of thing.
Nevertheless, you can write something yourself. Take a look at this C++ implementation and just translate it to C#:
void colorReduce(cv::Mat& image, int div = 64)
{
    int nl = image.rows;                    // number of lines
    int nc = image.cols * image.channels(); // number of elements per line
    for (int j = 0; j < nl; j++)
    {
        // get the address of row j
        uchar* data = image.ptr<uchar>(j);
        for (int i = 0; i < nc; i++)
        {
            // process each pixel
            data[i] = data[i] / div * div + div / 2;
        }
    }
}
Just send a grayscale Mat to this function and play with the div parameter.
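A possible C# translation for EmguCV might look like the sketch below (assuming the usual Emgu.CV/Emgu.CV.Structure usings, and that Image<Gray, byte>.Data exposes the pixels as byte[row, column, channel]):
static void ColorReduce(Image<Gray, byte> image, int div = 64)
{
    byte[,,] data = image.Data;
    for (int y = 0; y < image.Rows; y++)
    {
        for (int x = 0; x < image.Cols; x++)
        {
            // quantize each pixel to the middle of its bucket
            data[y, x, 0] = (byte)(data[y, x, 0] / div * div + div / 2);
        }
    }
}
For roughly ten gray levels, div = 26 buckets 0-255 into 10 ranges; and if you want the bucket indices 0-9 themselves rather than values re-spread over 0-255, drop the * div + div / 2 part.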

How to initialize integer array in C# [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
c# Leaner way of initializing int array
Basically I would like to know if there is more efficient code than that shown below:
private static int[] GetDefaultSeriesArray(int size, int value)
{
    int[] result = new int[size];
    for (int i = 0; i < size; i++)
    {
        result[i] = value;
    }
    return result;
}
where size can vary from 10 to 150000. For small arrays this is not an issue, but there should be a better way to do the above.
I am using VS2010 (.NET 4.0).
C#/CLR does not have a built-in way to initialize an array with non-default values.
Your code is as efficient as it can get if you measure in operations per item.
You can get potentially faster initialization if you initialize chunks of a huge array in parallel. This approach needs careful tuning due to the non-trivial cost of multithreaded operations.
Much better results can be obtained by analyzing your needs and potentially removing the whole initialization altogether. I.e., if the array normally contains a constant value, you could implement some sort of COW (copy-on-write) approach, where your object initially has no backing array and simply returns the constant value; then, on a write to an element, it creates the (potentially partial) backing array for the modified segment.
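A minimal sketch of that COW idea (a hypothetical wrapper type, not a library class):
class CowArray
{
    private readonly int defaultValue;
    private readonly int length;
    private int[] backing; // allocated lazily on first write

    public CowArray(int length, int defaultValue)
    {
        this.length = length;
        this.defaultValue = defaultValue;
    }

    public int this[int index]
    {
        get { return backing == null ? defaultValue : backing[index]; }
        set
        {
            if (backing == null)
            {
                // first write: materialize the backing array
                backing = new int[length];
                for (int i = 0; i < length; i++)
                    backing[i] = defaultValue;
            }
            backing[index] = value;
        }
    }
}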
Slower but more compact code (that is potentially easier to read) would be to use Enumerable.Repeat. Note that ToArray will cause a significant amount of memory to be allocated for large arrays (which may also end up with allocations on the LOH) - see High memory consumption with Enumerable.Range?.
var result = Enumerable.Repeat(value, size).ToArray();
One way you can improve speed is by utilizing Array.Copy. It works at a lower level, bulk-assigning larger sections of memory.
By batching the assignments you can end up copying the array from one section to itself.
On top of that, the batches themselves can be parallelized quite effectively.
Here is my initial cut at it. On my machine (which only has two cores), with a sample array of 10 million items, I was getting a 15% or so speedup. You'll need to play around with the batch size (try to stay in multiples of your page size to keep it efficient) to tune it to the size of items you have. For smaller arrays it ends up almost identical to your code, as it won't get past filling the first batch, but it won't be (noticeably) worse in those cases either.
private const int batchSize = 1048576;

private static int[] GetDefaultSeriesArray2(int size, int value)
{
    int[] result = new int[size];
    // fill the first batch normally
    int end = Math.Min(batchSize, size);
    for (int i = 0; i < end; i++)
    {
        result[i] = value;
    }
    int numBatches = size / batchSize;
    Parallel.For(1, numBatches, batch =>
    {
        Array.Copy(result, 0, result, batch * batchSize, batchSize);
    });
    // handle partial leftover batch
    for (int i = numBatches * batchSize; i < size; i++)
    {
        result[i] = value;
    }
    return result;
}
Another way to improve performance is with a pretty basic technique: loop unrolling.
I have written some code that initializes an array with 20 million items; this is done repeatedly 100 times and an average is calculated. Without unrolling the loop, it takes about 44 ms. With the loop unrolled by a factor of 10, the process finishes in about 23 ms.
private void Looper()
{
    int repeats = 100;
    ArrayList times = new ArrayList();
    for (int i = 0; i < repeats; i++)
        times.Add(Time());
    Console.WriteLine(GetAverage(times)); // 44
    times.Clear();
    for (int i = 0; i < repeats; i++)
        times.Add(TimeUnrolled());
    Console.WriteLine(GetAverage(times)); // 22
}
private float GetAverage(ArrayList times)
{
    long total = 0;
    foreach (var item in times)
    {
        total += (long)item;
    }
    return total / (float)times.Count; // cast to avoid integer division
}
private long Time()
{
    Stopwatch sw = new Stopwatch();
    int size = 20000000;
    int[] result = new int[size];
    sw.Start();
    for (int i = 0; i < size; i++)
    {
        result[i] = 5;
    }
    sw.Stop();
    Console.WriteLine(sw.ElapsedMilliseconds);
    return sw.ElapsedMilliseconds;
}

private long TimeUnrolled()
{
    Stopwatch sw = new Stopwatch();
    int size = 20000000;
    int[] result = new int[size];
    sw.Start();
    for (int i = 0; i < size; i += 10)
    {
        result[i] = 5;
        result[i + 1] = 5;
        result[i + 2] = 5;
        result[i + 3] = 5;
        result[i + 4] = 5;
        result[i + 5] = 5;
        result[i + 6] = 5;
        result[i + 7] = 5;
        result[i + 8] = 5;
        result[i + 9] = 5;
    }
    sw.Stop();
    Console.WriteLine(sw.ElapsedMilliseconds);
    return sw.ElapsedMilliseconds;
}
Enumerable.Repeat(value, size).ToArray();
Reading up, Enumerable.Repeat is 20 times slower than the OP's standard for loop, and the only thing I found that might improve its speed is
private static int[] GetDefaultSeriesArray(int size, int value)
{
    int[] result = new int[size];
    for (int i = 0; i < size; ++i)
    {
        result[i] = value;
    }
    return result;
}
NOTE: the i++ is changed to ++i. i++ copies i, increments i, and returns the original value; ++i just returns the incremented value.
As someone already mentioned, you can leverage parallel processing like this:
int[] result = new int[size];
Parallel.ForEach(result, x => x = value);
return result;
Sorry, I had no time to do performance testing on this (I don't have VS installed on this machine), but if you can do it and share the results, that would be great.
EDIT: As per comment, while I still think that in terms of performance they are equivalent, you can try the parallel for loop:
Parallel.For(0, size, i => result[i] = value);
