I'm looking for a faster alternative to BitConverter. The conversion runs inside a "hot loop":
// i_k_size = 8 bytes
while (fs.Read(ba_buf, 0, ba_buf.Length) > 0 && dcm_buf_read_ctr < i_buf_reads)
{
    Span<byte> sp_data = ba_buf.AsSpan();
    for (int i = 0; i < ba_buf.Length; i += i_k_size)
    {
        UInt64 k = BitConverter.ToUInt64(sp_data.Slice(i, i_k_size));
    }
}
My attempts to combine a pointer with the conversion made performance worse. Can a pointer be used to make it faster with a span?
Below is the benchmark: the pointer-to-array read is 2x faster.
Actually I want this code to be used instead of BitConverter:
public static int l_1gb = 1073741824;

static unsafe void Main(string[] args)
{
    Random rnd = new Random();
    Stopwatch sw1 = new();
    sw1.Start();
    byte[] k = new byte[8];
    fixed (byte* a2rr = &k[0])
    {
        for (int i = 0; i < 1000000000; i++)
        {
            rnd.NextBytes(k);
            //UInt64 p1 = BitConverter.ToUInt64(k);
            //time: 10203.824
            //time: 10508.981
            //time: 10246.784
            //time: 10285.889

            //UInt64* uint64ptr = (UInt64*)a2rr;
            //x2 performance !
            UInt64 p2 = *(UInt64*)a2rr;
            //time: 4609.814
            //time: 4588.157
            //time: 4634.494
        }
    }
    Console.WriteLine($"time: {Math.Round(sw1.Elapsed.TotalMilliseconds, 3)}");
}
Assuming ba_buf is a byte[], a very easy and efficient way to run your loop is like this:
foreach (var value in MemoryMarshal.Cast<byte, ulong>(ba_buf))
{
    // work with value here
}
If you need to finesse the buffer (for example, to cut off parts of it), use AsSpan(start, count) on it first.
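For example, a minimal sketch that trims the buffer to the bytes actually read before casting (bytesRead stands in for the return value of fs.Read, which is not shown in the snippet above):

// Only cast the whole ulongs inside the bytes read this pass:
int whole = bytesRead - bytesRead % sizeof(ulong);
foreach (var value in MemoryMarshal.Cast<byte, ulong>(ba_buf.AsSpan(0, whole)))
{
    // work with value here
}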
You can optimise this quite a lot by initialising the spans outside the reading loop, reading directly into a Span<byte>, and accessing the data via a Span<ulong>, like so:
int buf_bytes = sizeof(ulong) * 1024; // Or whatever buffer size you need.
var ba_buf = new byte[buf_bytes];
var span_buf = ba_buf.AsSpan();
var data_span = MemoryMarshal.Cast<byte, ulong>(span_buf);

while (true)
{
    int count = fs.Read(span_buf) / sizeof(ulong);

    if (count == 0)
        break;

    for (int i = 0; i < count; i++)
    {
        // Do something with data_span[i]
        Console.WriteLine(data_span[i]); // Put your own processing here.
    }
}
This avoids memory allocation as much as possible. It terminates the reading loop when it runs out of data, and if the number of bytes returned is not a multiple of sizeof(ulong) it ignores the extra bytes.
It will always read all the available data, but if you want to terminate it earlier you can add code to do so.
As an example, consider this code which writes 2,000 ulong values to a file and then reads them back in using the code above:
using (var output = File.OpenWrite("x"))
{
    for (ulong i = 0; i < 2000; ++i)
    {
        output.Write(BitConverter.GetBytes(i));
    }
}

using var fs = File.OpenRead("x");

int buf_bytes = sizeof(ulong) * 1024; // Or whatever buffer size you need.
var ba_buf = new byte[buf_bytes];
var span_buf = ba_buf.AsSpan();
var data_span = MemoryMarshal.Cast<byte, ulong>(span_buf);

while (true)
{
    int count = fs.Read(span_buf) / sizeof(ulong);

    if (count == 0)
        break;

    for (int i = 0; i < count; i++)
    {
        // Do something with data_span[i]
        Console.WriteLine(data_span[i]); // Put your own processing here.
    }
}
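Note that MemoryMarshal.Cast reinterprets bytes in the machine's native byte order. If the file format is defined as little-endian, System.Buffers.Binary.BinaryPrimitives makes the byte order explicit; a sketch along the lines of the loop above:

int bytesRead = fs.Read(span_buf);
for (int i = 0; i + sizeof(ulong) <= bytesRead; i += sizeof(ulong))
{
    // Reads a ulong as little-endian regardless of the host CPU.
    ulong value = BinaryPrimitives.ReadUInt64LittleEndian(span_buf.Slice(i));
    // Process value here.
}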
I have to create a utility that searches through 40 to 60 GiB of text files as quickly as possible.
Each file has around 50 MB of data that consists of log lines (about 630,000 lines per file).
A NoSQL document database is unfortunately not an option...
As of now I am using an Aho-Corasick algorithm for the search, which I stole from Tomas Petricek off of his blog. It works very well.
I process the files in Tasks. Each file is loaded into memory by simply calling File.ReadAllLines(path). The lines are then fed into the Aho-Corasick one by one, so each file causes around 600,000 calls to the algorithm (I need the line number in my results).
This takes a lot of time and requires a lot of memory and CPU.
I have very little expertise in this field as I usually work in image processing.
Can you guys recommend algorithms and approaches which could speed up the processing?
Below is a more detailed view of the Task creation and file loading, which is pretty standard. For more information on the Aho-Corasick, please visit the linked blog page above.
private KeyValuePair<string, StringSearchResult[]> FindInternal(
    IStringSearchAlgorithm algo,
    string file)
{
    List<StringSearchResult> result = new List<StringSearchResult>();
    string[] lines = File.ReadAllLines(file);
    for (int i = 0; i < lines.Length; i++)
    {
        var results = algo.FindAll(lines[i]);
        for (int j = 0; j < results.Length; j++)
        {
            results[j].Row = i;
        }
        // Collect the row-tagged results (the original ran FindAll a second
        // time in a separate loop and lost the row numbers).
        result.AddRange(results);
    }
    return new KeyValuePair<string, StringSearchResult[]>(
        file, result.ToArray());
}
public Dictionary<string, StringSearchResult[]> Find(
    params string[] search)
{
    IStringSearchAlgorithm algo = new StringSearch();
    algo.Keywords = search;
    Task<KeyValuePair<string, StringSearchResult[]>>[] findTasks
        = new Task<KeyValuePair<string, StringSearchResult[]>>[_files.Count];
    Parallel.For(0, _files.Count, i => {
        findTasks[i] = Task.Factory.StartNew(
            () => FindInternal(algo, _files[i])
        );
    });
    Task.WaitAll(findTasks);
    return findTasks.Select(t => t.Result)
        .ToDictionary(x => x.Key, x => x.Value);
}
EDIT
See section Initial Answer for the original Answer.
I further optimized my code by doing the following:
Added paging to prevent memory overflow / crashes due to the large amount of result data.
I offload the search results into local files as soon as they exceed a certain buffer size (64 KB in my case).
Offloading the results required me to convert my SearchData struct to binary and back (see the sketch after this list).
Splicing the array of files to be processed and running the slices in Tasks greatly increased performance (from 35 sec to 9 sec when processing about 25 GiB of search data).
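A minimal sketch of that binary round-trip (field names assumed from the SearchData usage further down; not the exact code I ran):

// Hypothetical writer/reader pair for offloading SearchData to a local file.
static void WriteSearchData(BinaryWriter w, SearchData sd)
{
    w.Write(sd.Match);  // matched string
    w.Write(sd.Index);  // index in the file
    w.Write(sd.Line);   // full line containing the match
}

static SearchData ReadSearchData(BinaryReader r)
{
    var sd = new SearchData();
    sd.Match = r.ReadString();
    sd.Index = r.ReadInt32();
    sd.Line = r.ReadString();
    return sd;
}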
Splicing / scaling the file array
The code below computes a scaled/normalized value from T_min and T_max.
This value can then be used to determine the size of each chunk of file paths.
private int ScalePartition(int T_min, int T_max)
{
    // Scale m to range.
    int m = T_max / 2;
    int t_min = 4;
    int t_max = Math.Max(T_max / 16, T_min);
    // Note: these are all ints, so with the call shown below
    // (ScalePartition(1, _files.Count)) the first factor truncates
    // to 0 and this effectively returns t_max.
    m = ((T_min - m) / (T_max - T_min)) * (t_max - t_min) + t_max;
    return m;
}
This code shows the implementation of the scaling and splicing.
// Get the size of each file-array portion.
int scale = ScalePartition(1, _files.Count);
// Iterator.
int n = 0;
// List containing the tasks.
List<Task<SearchData[]>> searchTasks = new List<Task<SearchData[]>>();
// Loop through the files.
while (n < _files.Count) {
    // Local copy of n: capturing n directly gives stale/shifting values
    // (and an AggregateException in my case), as n changes while the
    // tasks run.
    int num = n;
    // The number of items to take. This needs to be calculated, as there
    // might be an odd number of elements in the file array.
    int cnt = n + scale > _files.Count ? _files.Count - n : scale;
    // Run the Find(int, int, Regex[]) method and add it as a task.
    searchTasks.Add(Task.Run(() => Find(num, cnt, regexes)));
    // Increment the iterator by the number of files in one portion.
    n += scale;
}
Initial Answer
I had the best results so far after switching to MemoryMappedFile and moving from Aho-Corasick back to Regex (a demand was made that pattern matching is a must-have).
There are still parts that can be optimized or changed, and I'm sure this is not the fastest or best solution, but for now it's alright.
Here is the code which returns the results in 30 seconds for 25 GiB worth of data:
// GNU coreutil wc defined buffer size.
// Had best performance with this buffer size.
//
// Definition in wc.c:
// -------------------
// /* Size of atomic reads. */
// #define BUFFER_SIZE (16 * 1024)
//
private const int BUFFER_SIZE = 16 * 1024;
private KeyValuePair<string, SearchData[]> FindInternal(Regex[] rgx, string file)
{
    // Buffer for data segmentation.
    byte[] buffer = new byte[BUFFER_SIZE];
    // Get size of file.
    FileInfo fInfo = new FileInfo(file);
    long fSize = fInfo.Length;
    fInfo = null;
    // List of results.
    List<SearchData> results = new List<SearchData>();
    // Create MemoryMappedFile.
    string name = "mmf_" + Path.GetFileNameWithoutExtension(file);
    using (var mmf = MemoryMappedFile.CreateFromFile(
        file, FileMode.Open, name))
    {
        // Create read-only in-memory access to file data.
        using (var accessor = mmf.CreateViewStream(
            0, fSize,
            MemoryMappedFileAccess.Read))
        {
            // Store current position.
            int pos = (int)accessor.Position;
            // Bytes to read next; handle files smaller than the buffer
            // (the original ternary produced a negative count for those).
            int cnt = (int)Math.Min(fSize, BUFFER_SIZE);
            // Iterate through file until end of file is reached.
            while (accessor.Position < fSize)
            {
                // Read data into the buffer, remembering how much we got.
                int read = accessor.Read(buffer, 0, cnt);
                // Update position.
                pos = (int)accessor.Position;
                // Update next buffer size.
                cnt = (int)Math.Min(fSize - pos, BUFFER_SIZE);
                // Convert only the bytes actually read to a string for the
                // Regex search (decoding the whole buffer would include
                // stale bytes on the last, partial read).
                string s = Encoding.UTF8.GetString(buffer, 0, read);
                // Run regex against extracted data.
                foreach (Regex r in rgx) {
                    // Get matches.
                    MatchCollection matches = r.Matches(s);
                    // Create SearchData struct to reduce memory
                    // impact and only keep relevant data.
                    foreach (Match m in matches) {
                        SearchData sd = new SearchData();
                        // The actual matched string.
                        sd.Match = m.Value;
                        // The index in the file. pos already points past this
                        // buffer, so subtract what was just read to get the
                        // buffer's start offset.
                        sd.Index = m.Index + pos - read;
                        // Index to find beginning of line.
                        int nFirst = m.Index;
                        // Index to find end of line.
                        int nLast = m.Index;
                        // Go back until the end of the preceding line has been found.
                        while (s[nFirst] != '\n' && nFirst > 0) {
                            nFirst--;
                        }
                        // Skip the length of \r\n (new line).
                        // Change this to 1 if you work on a Unix system.
                        nFirst += 2;
                        // Go forward until the end of the current line has been found.
                        while (s[nLast] != '\n' && nLast < s.Length - 1) {
                            nLast++;
                        }
                        // Remove the length of \r\n (new line).
                        // Change this to 1 if you work on a Unix system.
                        nLast -= 2;
                        // Store the whole line in the SearchData struct.
                        sd.Line = s.Substring(nFirst, nLast - nFirst);
                        // Add result.
                        results.Add(sd);
                    }
                }
            }
        }
    }
    return new KeyValuePair<string, SearchData[]>(file, results.ToArray());
}
public List<KeyValuePair<string, SearchData[]>> Find(params string[] search)
{
    var results = new List<KeyValuePair<string, SearchData[]>>();
    // Prepare regex objects.
    Regex[] regexes = new Regex[search.Length];
    for (int i = 0; i < regexes.Length; i++) {
        regexes[i] = new Regex(search[i], RegexOptions.Compiled);
    }
    // Get all search results.
    // Creating the Regex once and passing it
    // to the sub-routine is best as the regex
    // engine adds a lot of overhead.
    foreach (var file in _files) {
        var data = FindInternal(regexes, file);
        results.Add(data);
    }
    return results;
}
I had a stupid idea yesterday where I thought it might work out to convert the file data to a bitmap and look for the input within the pixels, as pixel checking is quite fast.
Just for the giggles... here is the non-optimized test code for that stupid idea:
public struct SearchData
{
    public string Line;
    public string Search;
    public int Row;

    public SearchData(string l, string s, int r) {
        Line = l;
        Search = s;
        Row = r;
    }
}
internal static class FileToImage
{
    public static unsafe SearchData[] FindText(string search, Bitmap bmp)
    {
        byte[] buffer = Encoding.ASCII.GetBytes(search);
        BitmapData data = bmp.LockBits(
            new Rectangle(0, 0, bmp.Width, bmp.Height),
            ImageLockMode.ReadOnly, bmp.PixelFormat);
        List<SearchData> results = new List<SearchData>();
        int bpp = Bitmap.GetPixelFormatSize(bmp.PixelFormat) / 8;
        byte* ptFirst = (byte*)data.Scan0;
        byte firstHit = buffer[0];
        for (int y = 0; y < data.Height; y++) {
            byte* ptStride = ptFirst + (y * data.Stride);
            for (int x = 0; x < data.Stride; x++) {
                if (firstHit == ptStride[x]) {
                    if (buffer.Length < data.Stride - x) {
                        int ret = 0;
                        for (int n = 0, xx = x; n < buffer.Length; n++, xx++) {
                            if (ptStride[xx] != buffer[n]) {
                                break;
                            }
                            ret++;
                        }
                        if (ret == buffer.Length) {
                            int lineLength = 0;
                            for (int n = 0; n < data.Stride; n += bpp) {
                                if (ptStride[n+2] == 255 &&
                                    ptStride[n+1] == 255 &&
                                    ptStride[n+0] == 255)
                                {
                                    lineLength = n;
                                }
                            }
                            SearchData sd = new SearchData();
                            byte[] lineBytes = new byte[lineLength];
                            Marshal.Copy((IntPtr)ptStride, lineBytes, 0, lineLength);
                            sd.Search = search;
                            sd.Line = Encoding.ASCII.GetString(lineBytes);
                            sd.Row = y;
                            results.Add(sd);
                        }
                    }
                }
            }
        }
        // Unlock before returning (the original returned first, leaving the
        // bitmap locked and the trailing statements unreachable).
        bmp.UnlockBits(data);
        return results.ToArray();
    }
    private static unsafe Bitmap GetBitmapInternal(string[] lines, int startIndex, Bitmap bmp)
    {
        int bpp = Bitmap.GetPixelFormatSize(bmp.PixelFormat) / 8;
        BitmapData data = bmp.LockBits(
            new Rectangle(0, 0, bmp.Width, bmp.Height),
            ImageLockMode.ReadWrite,
            bmp.PixelFormat);
        int index = startIndex;
        byte* ptFirst = (byte*)data.Scan0;
        int maxHeight = bmp.Height;
        if (lines.Length - startIndex < maxHeight) {
            maxHeight = lines.Length - startIndex - 1;
        }
        for (int y = 0; y < maxHeight; y++) {
            byte* ptStride = ptFirst + (y * data.Stride);
            index++;
            int max = lines[index].Length;
            max += (max % bpp);
            lines[index] += new string('\0', max % bpp);
            max = lines[index].Length;
            for (int x = 0; x + 2 < max; x += bpp) {
                ptStride[x+0] = (byte)lines[index][x+0];
                ptStride[x+1] = (byte)lines[index][x+1];
                ptStride[x+2] = (byte)lines[index][x+2];
            }
            // A white pixel marks the end of the line.
            ptStride[max+2] = 255;
            ptStride[max+1] = 255;
            ptStride[max+0] = 255;
            // Zero out the remainder of the stride.
            for (int x = max + bpp; x < data.Stride; x += bpp) {
                ptStride[x+2] = 0;
                ptStride[x+1] = 0;
                ptStride[x+0] = 0;
            }
        }
        bmp.UnlockBits(data);
        return bmp;
    }
    public static unsafe Bitmap[] GetBitmap(string filePath)
    {
        int bpp = Bitmap.GetPixelFormatSize(PixelFormat.Format24bppRgb) / 8;
        var lines = System.IO.File.ReadAllLines(filePath);
        int y = 0x800; //lines.Length / 0x800;
        int x = lines.Max(l => l.Length) / bpp;
        int cnt = (int)Math.Ceiling((float)lines.Length / (float)y);
        Bitmap[] results = new Bitmap[cnt];
        for (int i = 0; i < results.Length; i++) {
            results[i] = new Bitmap(x, y, PixelFormat.Format24bppRgb);
            results[i] = GetBitmapInternal(lines, i * 0x800, results[i]);
        }
        return results;
    }
}
You can split the file into partitions and regex search each partition in parallel, then join the results. There are some sharp edges in the details, like handling values that span two partitions. Gigantor is a C# library I have created that does this very thing. Feel free to try it or have a look at the source code.
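As a sketch of the general idea (not Gigantor's actual API): overlapping adjacent partitions by the longest possible match keeps a boundary-spanning value whole in at least one partition.

// Yields (start, length) chunks that overlap by (maxMatchLength - 1) bytes.
static IEnumerable<(long Start, int Length)> Partition(
    long fileLength, int chunkSize, int maxMatchLength)
{
    for (long start = 0; start < fileLength; start += chunkSize)
    {
        long end = Math.Min(start + chunkSize + maxMatchLength - 1, fileLength);
        yield return (start, (int)(end - start));
    }
}

Matches found inside an overlap region then need deduplicating when the per-partition results are joined, e.g. by keeping a match only if it starts before the overlap.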
I would like to create an array of length X with the following 'intelligence':
if, for example, X = 6, then
myArray = [0,1,2,3,4,5]
For the moment, I do
int[] availableIndex = new int[DestructiblesCubes.Count];

for (var i = 0; i < availableIndex.Length; i++)
{
    availableIndex[i] = i;
}
But I'm curious: is there a better way, either faster to execute or shorter to write (fewest characters)?
Thanks :)
This is a short way to implement it, though not the best-performing solution:
Enumerable.Range(0, 10).ToArray()
MSDN description for Enumerable.Range
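Applied to the question's variable (assuming a using System.Linq; directive at the top of the file):

int[] availableIndex = Enumerable.Range(0, DestructiblesCubes.Count).ToArray();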
I think the fastest method uses an unsafe context together with a fixed pointer to the array, as demonstrated below:
/*const*/ int availableIndex_Length = 6;
int[] availableIndex = new int[availableIndex_Length];

unsafe {
    fixed (int* p = &availableIndex[0]) {
        for (int i = 0; i < availableIndex_Length; ++i) {
            *(p + i) = i;
        }
    }
}
This can be refactored to a method, optionally inlined:
[MethodImpl(MethodImplOptions.AggressiveInlining)]
static unsafe void FillRange(ref int[] array) {
    int length = array.Length;
    fixed (int* p = &array[0]) {
        for (int i = 0; i < length; ++i) {
            *(p + i) = i;
        }
    }
}

static void Main(string[] args) {
    // Example usage:
    int[] availableIndices = new int[6];
    FillRange(ref availableIndices);

    // Test if it worked:
    foreach (var availableIndex in availableIndices) {
        Console.WriteLine(availableIndex);
    }
    Console.ReadKey(true);
}
You could try this:
unsafe
{
    int[] availableIndex = new int[DestructiblesCubes.Count];
    int length = availableIndex.Length;
    int n = 0;
    fixed (int* p = availableIndex) {
        // p itself is read-only inside a fixed statement, so walk a copy.
        int* q = p;
        while (n < length) *q++ = n++;
    }
}
It may be faster, depending on the optimization stage of your compiler.
The only optimisation I can see to apply to your code as it stands is to count down to zero, but any increase in performance will be tiny:
int[] availableIndex = new int[DestructiblesCubes.Count];

for (var i = availableIndex.Length - 1; i >= 0; i--)
{
    availableIndex[i] = i;
}
Otherwise, especially if you're talking large arrays, one thing to try would be to create an array greater than your maximum envisioned value of DestructiblesCubes.Count, initialize that array as above once, then use Array.Copy when you want the smaller array.
I would be confident that no code we hand-roll will be faster than a single call to Array.Copy.
int[] availableIndex = new int[DestructiblesCubes.Count];
Array.Copy(LargeArray, availableIndex, availableIndex.Length);
Otherwise I can't think of anything that might be faster than the code you have.
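Spelled out end to end, the pre-initialized approach might look like this sketch (MAX_CUBES is a hypothetical upper bound you would choose yourself):

// One-time setup: an index table covering the largest count you expect.
const int MAX_CUBES = 100000;
static readonly int[] LargeArray = CreateIndexTable(MAX_CUBES);

static int[] CreateIndexTable(int max)
{
    var table = new int[max];
    for (int i = 0; i < max; i++) table[i] = i;
    return table;
}

// Per use: one bulk copy instead of a hand-rolled loop.
int[] availableIndex = new int[DestructiblesCubes.Count];
Array.Copy(LargeArray, availableIndex, availableIndex.Length);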
I'm using NAudio's AsioOut object to pass data from the input buffer to my delayProc() function and then to the output buffer.
delayProc() needs a float[] buffer, which is possible using e.GetAsInterleavedSamples(). The problem is I need to convert it back to the per-channel IntPtr[] the output expects; to do this I'm using the AsioSampleConvertor class.
When I try to apply the effect it throws an AccessViolationException in the AsioSampleConvertor class code.
So I think the problem is due to the conversion from float[] to IntPtr[]..
I give you some code:
OnAudioAvailable()
floatIn = new float[e.SamplesPerBuffer * e.InputBuffers.Length]; //*2
e.GetAsInterleavedSamples(floatIn);
floatOut = delayProc(floatIn, e.SamplesPerBuffer * e.InputBuffers.Length, 1.5f);

// Conversion from float[] to IntPtr[L][R].
Outp = Marshal.AllocHGlobal(sizeof(float) * floatOut.Length);
Marshal.Copy(floatOut, 0, Outp, floatOut.Length);
NAudio.Wave.Asio.ASIOSampleConvertor.ConvertorFloatToInt2Channels(Outp, e.OutputBuffers, e.InputBuffers.Length, floatOut.Length);
delayProc()
private float[] delayProc(float[] sourceBuffer, int sampleCount, float delay)
{
    if (OldBuf == null)
    {
        OldBuf = new float[sampleCount];
    }

    float[] BufDly = new float[(int)(sampleCount * delay)];
    int delayLength = (int)(BufDly.Length - (BufDly.Length / delay));

    for (int j = sampleCount - delayLength; j < sampleCount; j++)
        for (int i = 0; i < delayLength; i++)
            BufDly[i] = OldBuf[j];

    for (int j = 0; j < sampleCount; j++)
        for (int i = delayLength; i < BufDly.Length; i++)
            BufDly[i] = sourceBuffer[j];

    for (int i = 0; i < sampleCount; i++)
        OldBuf[i] = sourceBuffer[i];

    return BufDly;
}
AsioSampleConvertor
public static void ConvertorFloatToInt2Channels(IntPtr inputInterleavedBuffer, IntPtr[] asioOutputBuffers, int nbChannels, int nbSamples)
{
    unsafe
    {
        float* inputSamples = (float*)inputInterleavedBuffer;
        int* leftSamples = (int*)asioOutputBuffers[0];
        int* rightSamples = (int*)asioOutputBuffers[1];

        for (int i = 0; i < nbSamples; i++)
        {
            *leftSamples++ = clampToInt(inputSamples[0]);
            *rightSamples++ = clampToInt(inputSamples[1]);
            inputSamples += 2;
        }
    }
}
ClampToInt()
private static int clampToInt(double sampleValue)
{
    sampleValue = (sampleValue < -1.0) ? -1.0 : (sampleValue > 1.0) ? 1.0 : sampleValue;
    return (int)(sampleValue * 2147483647.0);
}
If you need some other code, just ask me.
When you call ConvertorFloatToInt2Channels you are passing in the total number of samples across all channels, then trying to read that many pairs of samples. So you are trying to read twice as many samples from your input buffer as are actually there. Using unsafe code you are trying to address well past the end of the allocated block, which results in the access violation you are getting.
Change the for loop in your ConvertorFloatToInt2Channels method to read:
for (int i = 0; i < nbSamples; i += 2)
This will stop your code from trying to read double the number of items actually present in the source memory block.
Incidentally, why are you messing around with allocating global memory and using unsafe code here? Why not process them as managed arrays? Processing the data itself isn't much slower, and you save on all the overheads of copying data to and from unmanaged memory.
Try this:
// Channel buffers are int[] here (the original snippet declared them
// float[], but the values written are converted ints).
public static void FloatMonoToIntStereo(float[] samples, int[] leftChannel, int[] rightChannel)
{
    for (int i = 0, j = 0; i < samples.Length; i += 2, j++)
    {
        leftChannel[j] = (int)(samples[i] * Int32.MaxValue);
        rightChannel[j] = (int)(samples[i + 1] * Int32.MaxValue);
    }
}
On my machine that processes around 12 million samples per second, converting the samples to integer and splitting the channels. About half that speed if I allocate the buffers for every set of results. About half again when I write that to use unsafe code, AllocHGlobal etc.
Never assume that unsafe code is faster.
Possible Duplicate:
c# Leaner way of initializing int array
Basically I would like to know if there is more efficient code than that shown below,
private static int[] GetDefaultSeriesArray(int size, int value)
{
    int[] result = new int[size];
    for (int i = 0; i < size; i++)
    {
        result[i] = value;
    }
    return result;
}
where size can vary from 10 to 150000. For small arrays it is not an issue, but there should be a better way to do the above.
I am using VS2010 (.NET 4.0).
C#/CLR does not have a built-in way to initialize an array with non-default values.
Your code is as efficient as it can get if you measure in operations per item.
You can get potentially faster initialization if you initialize chunks of the huge array in parallel. This approach will need careful tuning due to the non-trivial cost of multithreaded operations.
Much better results can be obtained by analyzing your needs and potentially removing the whole initialization altogether. I.e. if the array normally contains a constant value, you can implement some sort of COW (copy-on-write) approach, where your object initially has no backing array and simply returns the constant value; then, on a write to an element, it creates a (potentially partial) backing array for the modified segment. A sketch of the idea follows.
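A minimal sketch of that copy-on-write idea (my illustration, not a library type):

class CowConstantArray
{
    private readonly int _length;
    private readonly int _defaultValue;
    private int[] _backing; // stays null until the first write

    public CowConstantArray(int length, int defaultValue)
    {
        _length = length;
        _defaultValue = defaultValue;
    }

    public int this[int index]
    {
        get { return _backing == null ? _defaultValue : _backing[index]; }
        set
        {
            if (_backing == null)
            {
                // First write: materialize the backing array lazily.
                _backing = new int[_length];
                for (int i = 0; i < _length; i++) _backing[i] = _defaultValue;
            }
            _backing[index] = value;
        }
    }
}

Reads never touch array memory until something is written, which is the whole win when the array usually keeps its default contents.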
Slower but more compact code (that is potentially easier to read) would be to use Enumerable.Repeat. Note that ToArray will cause a significant amount of memory to be allocated for large arrays (which may also end up as allocations on the LOH) - see High memory consumption with Enumerable.Range?.
var result = Enumerable.Repeat(value, size).ToArray();
One way that you can improve speed is by utilizing Array.Copy. It's able to work at a lower level, bulk-assigning larger sections of memory.
By batching the assignments you can end up copying the array from one section to itself.
On top of that, the batches themselves can be quite effectively parallelized.
Here is my initial cut. On my machine (which only has two cores), with a sample array of 10 million items, I was getting a 15% or so speedup. You'll need to play around with the batch size (try to stay in multiples of your page size to keep it efficient) to tune it to the size of items that you have. For smaller arrays it'll end up almost identical to your code, as it won't get past filling up the first batch, but it also won't be (noticeably) worse in those cases either.
private const int batchSize = 1048576;

private static int[] GetDefaultSeriesArray2(int size, int value)
{
    int[] result = new int[size];

    //fill the first batch normally
    int end = Math.Min(batchSize, size);
    for (int i = 0; i < end; i++)
    {
        result[i] = value;
    }

    int numBatches = size / batchSize;

    Parallel.For(1, numBatches, batch =>
    {
        Array.Copy(result, 0, result, batch * batchSize, batchSize);
    });

    //handle partial leftover batch
    for (int i = numBatches * batchSize; i < size; i++)
    {
        result[i] = value;
    }

    return result;
}
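The self-copy idea can also be used serially by doubling the initialized region with each pass, so the copied blocks grow geometrically; a sketch (mine, not the measured code above):

static int[] FillByDoubling(int size, int value)
{
    int[] result = new int[size];
    if (size == 0) return result;

    result[0] = value;
    int filled = 1;
    while (filled < size)
    {
        // Copy what is already initialized onto the next uninitialized stretch.
        int toCopy = Math.Min(filled, size - filled);
        Array.Copy(result, 0, result, filled, toCopy);
        filled += toCopy;
    }
    return result;
}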
Another way to improve performance is with a pretty basic technique: loop unrolling.
I have written some code to initialize an array with 20 million items; this is done repeatedly 100 times and an average is calculated. Without unrolling the loop, this takes about 44 ms. With loop unrolling by a factor of 10, the process is finished in 23 ms.
private void Looper()
{
    int repeats = 100;

    ArrayList times = new ArrayList();

    for (int i = 0; i < repeats; i++)
        times.Add(Time());

    Console.WriteLine(GetAverage(times)); //44

    times.Clear();

    for (int i = 0; i < repeats; i++)
        times.Add(TimeUnrolled());

    Console.WriteLine(GetAverage(times)); //22
}

private float GetAverage(ArrayList times)
{
    long total = 0;
    foreach (var item in times)
    {
        total += (long)item;
    }
    // Cast before dividing to avoid integer truncation.
    return (float)total / times.Count;
}

private long Time()
{
    Stopwatch sw = new Stopwatch();
    int size = 20000000;
    int[] result = new int[size];

    sw.Start();
    for (int i = 0; i < size; i++)
    {
        result[i] = 5;
    }
    sw.Stop();

    Console.WriteLine(sw.ElapsedMilliseconds);
    return sw.ElapsedMilliseconds;
}

private long TimeUnrolled()
{
    Stopwatch sw = new Stopwatch();
    int size = 20000000;
    int[] result = new int[size];

    sw.Start();
    for (int i = 0; i < size; i += 10)
    {
        result[i] = 5;
        result[i + 1] = 5;
        result[i + 2] = 5;
        result[i + 3] = 5;
        result[i + 4] = 5;
        result[i + 5] = 5;
        result[i + 6] = 5;
        result[i + 7] = 5;
        result[i + 8] = 5;
        result[i + 9] = 5;
    }
    sw.Stop();

    Console.WriteLine(sw.ElapsedMilliseconds);
    return sw.ElapsedMilliseconds;
}
Enumerable.Repeat(value, size).ToArray();
Reading up on it, Enumerable.Repeat is some 20 times slower than the OP's standard for loop, and the only thing I found which might improve its speed is
private static int[] GetDefaultSeriesArray(int size, int value)
{
    int[] result = new int[size];
    for (int i = 0; i < size; ++i)
    {
        result[i] = value;
    }
    return result;
}
NOTE: the i++ is changed to ++i. i++ copies i, increments i, and returns the original value; ++i just returns the incremented value.
As someone already mentioned, you can leverage parallel processing like this:
int[] result = new int[size];
Parallel.ForEach(result, x => x = value);
return result;
Sorry I had no time to do performance testing on this (don't have VS installed on this machine) but if you can do it and share the results it would be great.
EDIT: As per the comment, while I still think that in terms of performance they are equivalent, the ForEach version above only assigns to the lambda's local copy of x and never actually writes to the array, so you should use the parallel for loop instead:
Parallel.For(0, size, i => result[i] = value);
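For work this cheap, the per-index delegate call tends to dominate; a range-partitioned sketch (using Partitioner from System.Collections.Concurrent, available since .NET 4) amortizes it:

Parallel.ForEach(Partitioner.Create(0, size), range =>
{
    // Each task fills a contiguous [from, to) slice with a plain loop.
    for (int i = range.Item1; i < range.Item2; i++)
        result[i] = value;
});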
How can I take 1 million substrings from a string with more than 3 million characters efficiently in C#? I have written a program which involves reading random DNA reads (substrings from random positions) of length 100 from a string with 3 million characters. There are 1 million such reads. Currently I run a while loop that runs 1 million times and reads a substring of 100 characters from the 3-million-character string. This is taking a long time. What can I do to complete this faster?
Here's my code. len is the length of the original string, 3 million in this case; it may be as low as 50, hence the check in the while loop.
while (i < 1000000 && len - 100 > 0) //len is 3000000
{
    int randomPos = _random.Next() % (len - ReadLength);
    readString += all.Substring(randomPos, ReadLength) + Environment.NewLine;
    i++;
}
Using a StringBuilder to assemble the string will get you a 600-times increase in processing speed, as it avoids repeated object creation every time you append to the string.
before loop (initialising capacity avoids recreating the backing array in StringBuilder):
StringBuilder sb = new StringBuilder(1000000 * ReadLength);
in loop:
sb.Append(all.Substring(randomPos, ReadLength) + Environment.NewLine);
after loop:
readString = sb.ToString();
Using a char array instead of a string to extract the values yields another 30% improvement, as you avoid the object creation incurred when calling Substring():
before loop:
char[] chars = all.ToCharArray();
in loop:
sb.Append(chars, randomPos, ReadLength);
sb.AppendLine();
Edit (final version which does not use StringBuilder and executes in 300ms):
char[] chars = all.ToCharArray();
var iterations = 1000000;
char[] results = new char[iterations * (ReadLength + 1)];
GetRandomStrings(len, iterations, ReadLength, chars, results, 0);
string s = new string(results);
private static void GetRandomStrings(int len, int iterations, int ReadLength, char[] chars, char[] result, int resultIndex)
{
    Random random = new Random();
    int i = 0, index = resultIndex;
    while (i < iterations && len - 100 > 0) //len is 3000000
    {
        var i1 = len - ReadLength;
        int randomPos = random.Next() % i1;
        Array.Copy(chars, randomPos, result, index, ReadLength);
        index += ReadLength;
        // One char is reserved per read for the separator; use '\n'
        // (Environment.NewLine[0] would be a lone '\r' on Windows).
        result[index] = '\n';
        index++;
        i++;
    }
}
I think better solutions will come, but .NET's StringBuilder is faster than string concatenation because it appends into a mutable buffer instead of allocating a new string for every append.
You can split the data into pieces and use the .NET Task Parallel Library for multithreading and parallelism.
Edit: Assign fixed values to a variable outside the loop to avoid recalculation:
int x = len - 100;
int y = len - ReadLength;
use
StringBuilder readString = new StringBuilder(ReadLength * numberOfSubStrings);
readString.AppendLine(all.Substring(randomPos, ReadLength));
For parallelism you should split your input into pieces, then run these operations on the pieces in separate threads, and combine the results (see the sketch below).
Important: as my previous experience showed, these operations ran faster with .NET v2.0 rather than v4.0, so you should change your project's target framework version; but you can't use the Task Parallel Library with .NET v2.0, so you would use multithreading the old-school way, like
Thread newThread ......
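A rough sketch of that split/combine idea with plain threads, reusing the question's all, len and ReadLength (written with lambdas for brevity; on .NET v2.0 you would use anonymous delegates):

int workers = Environment.ProcessorCount;
int perWorker = 1000000 / workers;  // reads per thread (remainder omitted for brevity)
var parts = new StringBuilder[workers];
var threads = new Thread[workers];

for (int w = 0; w < workers; w++)
{
    int id = w; // local copy for the closure
    parts[id] = new StringBuilder(perWorker * (ReadLength + 2));
    threads[id] = new Thread(() =>
    {
        var rnd = new Random(id); // one Random per thread; Random is not thread-safe
        for (int i = 0; i < perWorker; i++)
        {
            int pos = rnd.Next(len - ReadLength);
            parts[id].Append(all, pos, ReadLength).AppendLine();
        }
    });
    threads[id].Start();
}

foreach (var t in threads) t.Join();

// Combine the per-thread pieces into the final result.
var combined = new StringBuilder(workers * perWorker * (ReadLength + 2));
foreach (var p in parts) combined.Append(p);
string readString = combined.ToString();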
How long is a long time? It shouldn't be that long.
var file = new StreamReader(@"E:\Temp\temp.txt");
var s = file.ReadToEnd();

var r = new Random();
var sw = new Stopwatch();
sw.Start();

var range = Enumerable.Range(0, 1000000);
var results = range.Select(i => s.Substring(r.Next(s.Length - 100), 100)).ToList();

sw.Stop();
sw.ElapsedMilliseconds.Dump(); // Dump() is LINQPad's output helper.
s.Length.Dump();
So on my machine the results were 807ms and the string is 4,055,442 chars.
Edit: I just noticed that you want a string as a result, so my above solution just changes to...
var results = string.Join(Environment.NewLine,range.Select( i => s.Substring(r.Next(s.Length - 100),100)).ToArray());
And adds about 100ms, so still under a second in total.
Edit: I abandoned the idea to use memcpy, and I think the result is super great.
I've broken a 3M-character string into 30k strings of 100 characters each in 43 milliseconds.
private static unsafe string[] Scan(string hugeString, int subStringSize)
{
    var results = new string[hugeString.Length / subStringSize];
    var gcHandle = GCHandle.Alloc(hugeString, GCHandleType.Pinned);
    try
    {
        var currAddress = (char*)gcHandle.AddrOfPinnedObject();
        for (var i = 0; i < results.Length; i++)
        {
            results[i] = new string(currAddress, 0, subStringSize);
            currAddress += subStringSize;
        }
    }
    finally
    {
        // Unpin the string (the original never freed the handle).
        gcHandle.Free();
    }
    return results;
}
To use the method for the case shown in the question:
const int size = 3000000;
const int subSize = 100;

var stringBuilder = new StringBuilder(size);
var random = new Random();
for (var i = 0; i < size; i++)
{
    stringBuilder.Append((char)random.Next(30, 80));
}
var hugeString = stringBuilder.ToString();

var stopwatch = Stopwatch.StartNew();
for (int i = 0; i < 1000; i++)
{
    var strings = Scan(hugeString, subSize);
}
stopwatch.Stop();

Console.WriteLine(stopwatch.ElapsedMilliseconds / 1000); // 43