Intrinsics SIMD instruction to replace values


How can I replace byte values in a Vector128<byte>?
Assume the code below, which produces a resultvector with these values:
<0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0>
I would like to create a new vector where every "0" is replaced with "2"
and every "1" is replaced with "0", like this:
<2,2,2,2,0,0,0,0,2,2,2,2,2,2,2,2>
Is there an intrinsic for this, or some other way to achieve it?
Thank you!
//Create array
byte[] array = new byte[16];
for (int i = 0; i < 4; i++) { array[i] = 0; }
for (int i = 4; i < 8; i++) { array[i] = 1; }
for (int i = 8; i < 16; i++) { array[i] = 0; }
fixed (byte* ptr = array)
{
byte* pointarray = &*((byte*)(ptr + 0));
System.Runtime.Intrinsics.Vector128<byte> resultvector = System.Runtime.Intrinsics.X86.Avx.LoadVector128(&pointarray[0]);
//<0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0>
//resultvector
}

The instruction for that is pshufb, available in modern .NET as Avx2.Shuffle for 32-byte vectors and Ssse3.Shuffle for the 16-byte version. Both are really fast, with 1 cycle of latency on modern CPUs.
Pass your source data as the shuffle control mask argument, and pass a special value as the first argument, which holds the bytes being shuffled, something like this:
// Create AVX vector with all zeros except the first byte in each 16-byte lane which is 2
static Vector256<byte> makeShufflingVector()
{
Vector128<byte> res = Vector128<byte>.Zero;
res = Sse2.Insert( res.AsInt16(), 2, 0 ).AsByte();
return Vector256.Create( res, res );
}
See _mm_shuffle_epi8 section on page 18 of this article for details.
Update: if you don’t have SSSE3, you can do the same in SSE2, in 2 instructions instead of 1:
static Vector128<byte> replaceZeros( Vector128<byte> src )
{
src = Sse2.CompareEqual( src, Vector128<byte>.Zero );
return Sse2.And( src, Vector128.Create( (byte)2 ) );
}
By the way, there’s a performance problem in .NET that prevents the compiler from hoisting constant loads out of loops. If you are going to call that method in a loop and want to maximize performance, consider passing both constant vectors, the zero vector and the vector of 2s, as method parameters.
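The lookup-table idea described above can be sketched for the 16-byte case as follows. This is a minimal sketch, assuming the input contains only the values 0 and 1; the names ReplaceDemo and ReplaceValues are hypothetical, and the scalar fallback is an addition for machines without SSE2 or SSSE3.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

public static class ReplaceDemo
{
    // Lookup table for pshufb: index 0 maps to 2, every other index maps to 0.
    private static readonly Vector128<byte> Table =
        Vector128.Create((byte)2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0);

    // Assumption: every byte of src is either 0 or 1.
    public static Vector128<byte> ReplaceValues(Vector128<byte> src)
    {
        if (Ssse3.IsSupported)
        {
            // The source bytes act as indices into the table.
            return Ssse3.Shuffle(Table, src);
        }
        if (Sse2.IsSupported)
        {
            // 0xFF where src == 0, then mask that down to 2.
            Vector128<byte> isZero = Sse2.CompareEqual(src, Vector128<byte>.Zero);
            return Sse2.And(isZero, Vector128.Create((byte)2));
        }
        // Scalar fallback for platforms without x86 intrinsics.
        Vector128<byte> result = Vector128<byte>.Zero;
        for (int i = 0; i < Vector128<byte>.Count; i++)
        {
            if (src.GetElement(i) == 0)
            {
                result = result.WithElement(i, (byte)2);
            }
        }
        return result;
    }
}
```

Calling ReplaceDemo.ReplaceValues on the vector from the question should give 2 in every lane that held 0 and 0 in every lane that held 1. Keeping the table in a static readonly field is one way to avoid rebuilding it on each call.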

Related

C# performance - pointer to span in a hot loop

I'm looking for a faster alternative to BitConverter inside a "hot loop":
//i_k_size = 8 bytes
while (fs.Read(ba_buf, 0, ba_buf.Length) > 0 && dcm_buf_read_ctr < i_buf_reads)
{
Span<byte> sp_data = ba_buf.AsSpan();
for (int i = 0; i < ba_buf.Length; i += i_k_size)
{
UInt64 k = BitConverter.ToUInt64(sp_data.Slice(i, i_k_size));
}
}
My attempts to integrate a pointer into the conversion made performance worse. Can a pointer be used together with a span to make it faster?
Below is the benchmark: reading through a pointer to the array is 2x faster.
This is the code I actually want to use instead of BitConverter:
public static int l_1gb = 1073741824;
static unsafe void Main(string[] args)
{
Random rnd = new Random();
Stopwatch sw1 = new();
sw1.Start();
byte[] k = new byte[8];
fixed (byte* a2rr = &k[0])
{
for (int i = 0; i < 1000000000; i++)
{
rnd.NextBytes(k);
//UInt64 p1 = BitConverter.ToUInt64(k);
//time: 10203.824
//time: 10508.981
//time: 10246.784
//time: 10285.889
//UInt64* uint64ptr = (UInt64*)a2rr;
//x2 performance !
UInt64 p2 = *(UInt64*)a2rr;
//time: 4609.814
//time: 4588.157
//time: 4634.494
}
}
Console.WriteLine($"time: {Math.Round(sw1.Elapsed.TotalMilliseconds, 3)}");
}

Assuming ba_buf is a byte[], a very easy and efficient way to run your loop is as such:
foreach(var value in MemoryMarshal.Cast<byte, ulong>(ba_buf))
// work with value here
If you need to finesse the buffer (for example, to cut off parts of it), use AsSpan(start, count) on it first.
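As a small self-contained sketch of the cast combined with AsSpan(start, count): the helper below reinterprets a slice of a byte[] as ulong values and sums them. The names CastDemo and SumUInt64 are hypothetical, and the values are read in the machine's native byte order, the same as BitConverter.

```csharp
using System;
using System.Runtime.InteropServices;

public static class CastDemo
{
    // Sums the ulong values stored in buffer[start .. start + byteCount).
    // Trailing bytes that do not fill a whole ulong are dropped by Cast.
    public static ulong SumUInt64(byte[] buffer, int start, int byteCount)
    {
        Span<ulong> values =
            MemoryMarshal.Cast<byte, ulong>(buffer.AsSpan(start, byteCount));

        ulong sum = 0;
        foreach (ulong value in values)
        {
            sum += value; // wraps on overflow in the default unchecked context
        }
        return sum;
    }
}
```

No copy is made here: the Span<ulong> is a view over the same memory as the byte array.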

You can optimise this quite a lot by initialising some spans outside the reading loop and then read directly into a Span<byte> and access the data via a Span<ulong> like so:
int buf_bytes = sizeof(ulong) * 1024; // Or whatever buffer size you need.
var ba_buf = new byte[buf_bytes];
var span_buf = ba_buf.AsSpan();
var data_span = MemoryMarshal.Cast<byte, ulong>(span_buf);
while (true)
{
int count = fs.Read(span_buf) / sizeof(ulong);
if (count == 0)
break;
for (int i = 0; i < count; i++)
{
// Do something with data_span[i]
Console.WriteLine(data_span[i]); // Put your own processing here.
}
}
This avoids memory allocation as much as possible. It terminates the reading loop when it runs out of data, and if the number of bytes returned is not a multiple of sizeof(ulong) it ignores the extra bytes.
It will always read all the available data, but if you want to terminate it earlier you can add code to do so.
As an example, consider this code which writes 2,000 ulong values to a file and then reads them back in using the code above:
using (var output = File.OpenWrite("x"))
{
for (ulong i = 0; i < 2000; ++i)
{
output.Write(BitConverter.GetBytes(i));
}
}
using var fs = File.OpenRead("x");
int buf_bytes = sizeof(ulong) * 1024; // Or whatever buffer size you need.
var ba_buf = new byte[buf_bytes];
var span_buf = ba_buf.AsSpan();
var data_span = MemoryMarshal.Cast<byte, ulong>(span_buf);
while (true)
{
int count = fs.Read(span_buf) / sizeof(ulong);
if (count == 0)
break;
for (int i = 0; i < count; i++)
{
// Do something with data_span[i]
Console.WriteLine(data_span[i]); // Put your own processing here.
}
}

Breaking up underlying binary in Byte array into 10 or 12 bit words: C#

I'm parsing binary data from a file that comes in as a byte array. I'm trying to split the underlying binary of the array into 'words' (every 10 or 12 bits). I have a function that does this but it is pretty time consuming as I'm dealing with a lot of data. I have limited programming experience so I'm sure there's a better way to accomplish this.
private void separateWords(List<byte[]> minorFrames, int wordSize, int frameLength)
{
UInt16[] wordArray = new UInt16[frameLength];
foreach (byte[] array in minorFrames)
{
// Convert byte array to bit array
// Bits need to be reversed on a byte boundary
byte[] temp = new byte[array.Length];
for (int i = 0; i < array.Length; i++)
{
temp[i] = ReverseBits(array[i]);
}
BitArray binaryArray = new BitArray(temp);
for (int i = 0; i < (binaryArray.Length / wordSize); i++ )
{
UInt16 newWord = 0;
for (int j = 0; j < wordSize; j++)
{ // Converts every n bits to UInt16
if (binaryArray[j + (i*wordSize)])
newWord += Convert.ToUInt16(Math.Pow(2, ((wordSize-1)-j)));
}
wordArray[i]=newWord; // Populate formatted minor frame
}
words.Add(wordArray); // add populated minor frame to list
}
}
Ideally I'd like to operate directly on the byte array. The 'words' will be saved into UInt16's to keep the output size as small as possible.
My current thought is:
Shift first 10 bits into UInt16 variable
add variable to array of words
shift entire byte array over 10 bits
repeat
I'm having some trouble shifting bits into a UInt16 though, and unsure how to shift an entire array. Maybe there's a better way to approach this?

Math.Pow is very time consuming. Consider using Bitwise and shift operators (C# reference).
if (binaryArray[j + i * wordSize]) {
newWord |= (ushort)(1 << (wordSize - 1 - j));
}

After some feedback I've rewritten the loop to:
public List<UInt16[]> separateWords(List<byte[]> minorFrames, int wordSize, int frameLength)
{
List<UInt16[]> framedWords = new List<UInt16[]>();
UInt16 newWord = 0;
foreach (byte[] array in minorFrames)
{
int bitcount = 1;
int wordCount = 0;
BitArray binaryArray = new BitArray(array);
UInt16[] wordArray = new UInt16[frameLength];
for (int i = 1; i <= array.Length; i++)
{
for (int j = 1; j <= 8; j++)
{
newWord <<= 1; // Make room for next bit
newWord |= Convert.ToUInt16(binaryArray[(i * 8) - j]); // Adds next bit in array
if (bitcount % wordSize == 0) // Only if multiple of wordsize
{
wordArray[wordCount] = newWord; // Populate formatted minor frame
newWord = 0; // Reset for next word
wordCount++; // Advance index
}
bitcount++;
}
}
framedWords.Add(wordArray); // add populated minor frame to list
}
return framedWords;
}
This took the run time from 12 minutes to 2.5 minutes.
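The idea of operating directly on the byte array, without a BitArray, can be sketched with a small bit accumulator. This is only a sketch under the assumption that bits are consumed most-significant-bit first within each byte (which the ReverseBits step in the question suggests) and that wordSize is between 1 and 16; the names WordSplitter and ExtractWords are hypothetical.

```csharp
using System;

public static class WordSplitter
{
    // Splits data into consecutive wordSize-bit words, MSB first.
    // Leftover bits that do not fill a whole word are discarded.
    public static ushort[] ExtractWords(byte[] data, int wordSize)
    {
        if (wordSize < 1 || wordSize > 16)
            throw new ArgumentOutOfRangeException(nameof(wordSize));

        int wordCount = (int)((long)data.Length * 8 / wordSize);
        var words = new ushort[wordCount];
        uint mask = (1u << wordSize) - 1;

        uint acc = 0;   // holds the most recent bits; only the low 'bits' bits are pending
        int bits = 0;   // number of pending bits in acc
        int w = 0;      // next output index

        foreach (byte b in data)
        {
            acc = (acc << 8) | b;
            bits += 8;
            while (bits >= wordSize)
            {
                bits -= wordSize;
                words[w++] = (ushort)((acc >> bits) & mask);
            }
        }
        return words;
    }
}
```

For example, the bytes 0x12, 0x34, 0x56 with a word size of 12 yield the words 0x123 and 0x456. Each input byte is touched once, so the cost is linear in the input size rather than per bit.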

New image overlays previous bitmap

There are a number of posts about this, but I still can't figure it out. I am rather new at this, so please be forgiving.
I display an image, then grab a new image and try to display it. When the new image is displayed, it has remnants of the old image. I have tried Picture1.Image = null, to no avail.
Is it an issue with managed memory? I suspect that the code somehow copies the new image over the old one in a way that leaves some data from the previous image.
Here is the code to display the data in scaled1 (from this helpful earlier post):
Edit:
Code added showing processing of arrays that are plotted. The overlaying behavior stops if the arrays are cleared using the Array.Clear method. Perhaps when this is cleared up I can post a canonical snippet demonstrating the issue.
This resets the question as: Why do arrays need to be cleared when each value of the array is rewritten? How can the array retain information of previous values?
ushort[] frame = null;
byte[] scaled1 = null;
double[][] frameringSin;
double[][] frameringCos;
double[] sumsin;
double[] sumcos;
frame = new ushort[mImageWidth * mImageHeight];
scaled1 = new byte[mImageWidth * mImageHeight];
frameringSin = new double[RingSize][];
frameringCos = new double[RingSize][];
ringsin = new double[RingSize];
ringcos = new double[RingSize];
//Fill array with images
for(int ring=0; ring <nN; ++ring)
{
mCamera.GrabFrameReduced(framering[ring], reduced, out preset);
}
//Process images
for (int i = 0; i < nN; ++i)
{
Array.Clear(frameringSin[i], 0, frameringSin.Length);
Array.Clear(frameringCos[i], 0, frameringSin.Length);
}
Array.Clear(sumsin, 0, sumsin.Length);
Array.Clear(sumcos, 0, sumcos.Length);
for(int r=0;r<nN; ++r)
{
for (int i = 0; i < frame.Length; ++i)//upto 12 ms
{
frameringSin[r][i] = framering[r][i]* ringsin[r] / nN;
frameringCos[r][i] = framering[r][i] *ringcos[r] / nN;
}
}
for (int i = 0; i < sumsin.Length; ++i)//up to 25ms
{
for (int r = 0; r < nN; ++r)
{
sumsin[i] += frameringSin[r][i];
sumcos[i] += frameringCos[r][i];
}
}
for(int r=0 ; r<nN ;++r)
{
for (int i = 0; i < sumsin.Length; ++i)
{
A[i] = Math.Sqrt(sumsin[i] * sumsin[i] + sumcos[i] * sumcos[i]);
}
//extract scaling parameters
...
//Scale Image
for (i1 = 0; i1 < frame.Length; ++i1)
scaled1[i1] = (byte)((Math.Min(Math.Max(min1, frameA[i1]), max1) - min1) * scale1);
bmp1 = new Bitmap(mImageWidth,mImageHeight,System.Drawing.Imaging.PixelFormat.Format8bppIndexed);
var bdata1 = bmp1.LockBits(new Rectangle(new Point(0, 0), bmp1.Size), ImageLockMode.WriteOnly, bmp1.PixelFormat);
try
{
Marshal.Copy(scaled1, 0, bdata1.Scan0, scaled1.Length);
}
finally
{
bmp1.UnlockBits(bdata1);
}
Picture1.Image = bmp1;
Picture1.Refresh();

Actually, you're not replacing all values in the arrays - your for cycles are wrong. You want them to look like this:
for (i1 = 0; i1 < frame.Length; i1++)
scaled1[i1] = (byte)((Math.Min(Math.Max(min1, frameA[i1]), max1) - min1)
* scale1);
The difference (i++ vs ++i) is that your way, you're skipping the first and the last index. C# arrays start at 0, while you start at 1 (you increment the loop variable before you run the body for the first time).
Also, note that for performance reasons, it's very handy if you're going through the array like this:
for (var i = 0; i < array.Length; i++)
/* do work with array[i] */
The JIT compiler recognizes this and avoids bounds checks, because it knows there can never be an overflow. When you're doing a lot of work with arrays, this can give you a massive performance boost, even if you access multiple arrays through the same index (one of them will not have the checks, the others will - still saves a lot of work).
The default JIT isn't very smart about this (it has to be quite fast), so pretty much anything else will reintroduce the bounds check. If performance is a concern for you, you'd want to profile the code anyway, of course.
EDIT: Ah, my bad: i++ and ++i behave identically in the iterator section of a for loop, so your loops do visit every index. Anyway, I believe your problem isn't having to clear the frameringXXX arrays, but rather the sumsin and sumcos arrays. You're always adding to those with +=, so you'd be adding to the value that was already there rather than starting from zero again. So you need to reset those arrays to zeroes first (which is what Array.Clear does).
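The effect of accumulating with += into a reused array can be shown with a tiny sketch. The names AccumulateDemo and AddFrames are hypothetical and the arrays are far smaller than real image frames.

```csharp
using System;

public static class AccumulateDemo
{
    // Adds every frame into sum, element by element.
    // Without clearing first, values left over from a previous call remain in sum.
    public static void AddFrames(double[] sum, double[][] frames, bool clearFirst)
    {
        if (clearFirst)
        {
            Array.Clear(sum, 0, sum.Length);
        }
        for (int r = 0; r < frames.Length; r++)
        {
            for (int i = 0; i < sum.Length; i++)
            {
                sum[i] += frames[r][i];
            }
        }
    }
}
```

Calling AddFrames twice on the same sum array with clearFirst set to false doubles every total, which is the "remnant of the old image" effect; with clearFirst set to true each call starts from zero.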

Fast intersection of two sorted integer arrays

I need to find the intersection of two sorted integer arrays and do it very fast.
Right now, I am using the following code:
int i = 0, j = 0;
while (i < arr1.Count && j < arr2.Count)
{
if (arr1[i] < arr2[j])
{
i++;
}
else
{
if (arr2[j] < arr1[i])
{
j++;
}
else
{
intersect.Add(arr2[j]);
j++;
i++;
}
}
}
Unfortunately, it might take hours to do all the work.
How can I do it faster? I found this article where SIMD instructions are used. Is it possible to use SIMD in .NET?
What do you think about:
http://docs.go-mono.com/index.aspx?link=N:Mono.Simd Mono.SIMD
http://netasm.codeplex.com/ NetASM(inject asm code to managed)
and something like http://www.atrevido.net/blog/PermaLink.aspx?guid=ac03f447-d487-45a6-8119-dc4fa1e932e1
EDIT:
When I say thousands, I mean the following (in code):
for(var i=0;i<arrCollection1.Count-1;i++)
{
for(var j=i+1;j<arrCollection2.Count;j++)
{
Intersect(arrCollection1[i],arrCollection2[j])
}
}

UPDATE
The fastest I got was 200 ms with arrays of 10 million elements, using the unsafe version (the last piece of code).
The test I ran:
var arr1 = new int[10000000];
var arr2 = new int[10000000];
for (var i = 0; i < 10000000; i++)
{
arr1[i] = i;
arr2[i] = i * 2;
}
var sw = Stopwatch.StartNew();
var result = arr1.IntersectSorted(arr2);
sw.Stop();
Console.WriteLine(sw.Elapsed); // 00:00:00.1926156
Full Post:
I've tested various ways to do it and found this to be very good:
public static List<int> IntersectSorted(this int[] source, int[] target)
{
// Set initial capacity to a "full-intersection" size
// This prevents multiple re-allocations
var ints = new List<int>(Math.Min(source.Length, target.Length));
var i = 0;
var j = 0;
while (i < source.Length && j < target.Length)
{
// Compare only once and let compiler optimize the switch-case
switch (source[i].CompareTo(target[j]))
{
case -1:
i++;
// Saves us a JMP instruction
continue;
case 1:
j++;
// Saves us a JMP instruction
continue;
default:
ints.Add(source[i++]);
j++;
// Saves us a JMP instruction
continue;
}
}
// Free unused memory (sets capacity to actual count)
ints.TrimExcess();
return ints;
}
For further improvement you can remove the ints.TrimExcess();, which will also make a nice difference, but you should think if you're going to need that memory.
Also, if you know that you might break loops that use the intersections, and you don't have to have the results as an array/list, you should change the implementation to an iterator:
public static IEnumerable<int> IntersectSorted(this int[] source, int[] target)
{
var i = 0;
var j = 0;
while (i < source.Length && j < target.Length)
{
// Compare only once and let compiler optimize the switch-case
switch (source[i].CompareTo(target[j]))
{
case -1:
i++;
// Saves us a JMP instruction
continue;
case 1:
j++;
// Saves us a JMP instruction
continue;
default:
yield return source[i++];
j++;
// Saves us a JMP instruction
continue;
}
}
}
Another improvement is to use unsafe code:
public static unsafe List<int> IntersectSorted(this int[] source, int[] target)
{
var ints = new List<int>(Math.Min(source.Length, target.Length));
fixed (int* ptSrc = source)
{
var maxSrcAdr = ptSrc + source.Length;
fixed (int* ptTar = target)
{
var maxTarAdr = ptTar + target.Length;
var currSrc = ptSrc;
var currTar = ptTar;
while (currSrc < maxSrcAdr && currTar < maxTarAdr)
{
switch ((*currSrc).CompareTo(*currTar))
{
case -1:
currSrc++;
continue;
case 1:
currTar++;
continue;
default:
ints.Add(*currSrc);
currSrc++;
currTar++;
continue;
}
}
}
}
ints.TrimExcess();
return ints;
}
In summary, the biggest performance hit was in the if-else chain.
Turning it into a switch-case made a huge difference (about 2 times faster).

Have you tried something simple like this:
var a = Enumerable.Range(1, int.MaxValue/100).ToList();
var b = Enumerable.Range(50, int.MaxValue/100 - 50).ToList();
//var c = a.Intersect(b).ToList();
List<int> c = new List<int>();
var t1 = DateTime.Now;
foreach (var item in a)
{
if (b.BinarySearch(item) >= 0)
c.Add(item);
}
var t2 = DateTime.Now;
var tres = t2 - t1;
This piece of code takes one array of 21,474,836 elements and another with 21,474,786.
If I use var c = a.Intersect(b).ToList(); I get an OutOfMemoryException.
Using nested foreach loops, the product would be 461,167,507,485,096 iterations.
But with this simple code, the intersection took TotalSeconds = 7.3960529 (using one core).
I am still not happy, so I am trying to increase the performance by running this in parallel; as soon as I finish I will post it.

Yorye Nathan's last method, the "unsafe code" one, gave me the fastest intersection of two arrays. Unfortunately it was still too slow for me: I needed to intersect combinations of arrays, up to 2^32 combinations, which is quite a lot. With the following modifications the code became 2.6x faster. It requires some preprocessing up front, which you can surely do one way or another. I use only indexes instead of the actual objects, ids, or some other abstract comparison. For example, if you have to intersect big numbers like these:
Arr1: 103344, 234566, 789900, 1947890,
Arr2: 150034, 234566, 845465, 23849854
put everything into one array
Arr1: 103344, 234566, 789900, 1947890, 150034, 845465,23849854
and use, for intersection, the ordered indexes of the result array
Arr1Index: 0, 1, 2, 3
Arr2Index: 1, 4, 5, 6
Now we have smaller numbers, from which we can build some other useful arrays. After taking the method from Yorye, I expanded Arr2Index into what is conceptually a boolean array, in practice a byte array because of the memory size implications:
Arr2IndexCheck: 0, 1, 0, 0, 1, 1 ,1
This works more or less like a dictionary that tells me, for any index, whether the second array contains it.
As the next step I avoided memory allocation, which also took time: I pre-create the result array before calling the method, so nothing is instantiated while searching for combinations. Of course you then have to track the used length of this array separately, so you may need to store it somewhere.
Finally the code looks like this:
public static unsafe int IntersectSorted2(int[] arr1, byte[] arr2Check, int[] result)
{
int length;
fixed (int* pArr1 = arr1, pResult = result)
fixed (byte* pArr2Check = arr2Check)
{
int* maxArr1Adr = pArr1 + arr1.Length;
int* arr1Value = pArr1;
int* resultValue = pResult;
while (arr1Value < maxArr1Adr)
{
if (*(pArr2Check + *arr1Value) == 1)
{
*resultValue = *arr1Value;
resultValue++;
}
arr1Value++;
}
length = (int)(resultValue - pResult);
}
return length;
}
As you can see, the function returns the number of results written, so you can then do what you wish with the result array (resize it, keep it). The result array must be at least as long as the shorter of arr1 and arr2.
The big improvement is that I only iterate through the first array, which should ideally be the smaller of the two, so there are fewer iterations. Fewer iterations means fewer CPU cycles, right?
So here is a really fast intersection of two ordered arrays, in case you need really high performance ;).
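The preprocessing step that turns the second index array into a byte lookup table is not shown above. A minimal sketch of it, together with a safe (non-pointer) variant of the lookup loop, might look like this; the names LookupDemo, BuildCheck and Intersect are hypothetical, and universeSize (the number of distinct values across both arrays) is an assumed parameter.

```csharp
using System;

public static class LookupDemo
{
    // Builds the byte lookup table: check[v] == 1 when v occurs in indexes.
    // Assumption: every index is in the range [0, universeSize).
    public static byte[] BuildCheck(int[] indexes, int universeSize)
    {
        var check = new byte[universeSize];
        foreach (int index in indexes)
        {
            check[index] = 1;
        }
        return check;
    }

    // Writes every value of arr1 that the lookup table marks as present
    // into result and returns how many values were written.
    public static int Intersect(int[] arr1, byte[] arr2Check, int[] result)
    {
        int length = 0;
        foreach (int value in arr1)
        {
            if (arr2Check[value] == 1)
            {
                result[length++] = value;
            }
        }
        return length;
    }
}
```

With Arr1Index = 0, 1, 2, 3 and Arr2Index = 1, 4, 5, 6 from the example, BuildCheck produces 0, 1, 0, 0, 1, 1, 1 and Intersect returns a length of 1 with the single shared index 1.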

Are arrCollection1 and arrCollection2 collections of integer arrays? In that case you should get a notable improvement by starting the second loop from i+1 as opposed to 0.

C# had no built-in SIMD support when this was written; hardware intrinsics arrived later with System.Runtime.Intrinsics in .NET Core 3.0. Additionally, and I haven't yet figured out why, DLLs that use SSE aren't any faster when called from C# than the equivalent non-SSE functions. Also, all SIMD extensions that I know of don't work with branching anyway, i.e. your "if" statements.
If you're using .NET 4.0, you can use Parallel.For to gain speed if you have multiple cores. Otherwise, with .NET 3.5 or earlier, you can write a multithreaded version yourself.
Here is a method similar to yours:
IList<int> intersect(int[] arr1, int[] arr2)
{
IList<int> intersect = new List<int>();
int i = 0, j = 0;
while (i < arr1.Length && j < arr2.Length)
{
if (arr1[i] < arr2[j]) i++;
else if (arr2[j] < arr1[i]) j++;
else
{
int value = arr1[i];
intersect.Add(value);
while (i < arr1.Length && arr1[i] == value) i++; // prevent redundant entries
while (j < arr2.Length && arr2[j] == value) j++; // prevent redundant entries
}
}
return intersect;
}
This one also prevents any entry from appearing twice. With 2 sorted arrays both of size 10 million, it completed in about a second. The compiler is supposed to remove array bounds checks if you use array.Length in a for statement; I don't know whether that also works in a while statement.
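The Parallel.For suggestion above could be sketched like this for intersecting many pairs of arrays. It is only an illustration: the names ParallelIntersectDemo, IntersectSorted and IntersectAll are hypothetical, and it intersects every array in the first collection with every array in the second rather than reproducing the exact loop bounds from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ParallelIntersectDemo
{
    // Plain two-pointer intersection of two sorted arrays.
    public static List<int> IntersectSorted(int[] a, int[] b)
    {
        var result = new List<int>();
        int i = 0, j = 0;
        while (i < a.Length && j < b.Length)
        {
            if (a[i] < b[j]) i++;
            else if (b[j] < a[i]) j++;
            else { result.Add(a[i]); i++; j++; }
        }
        return result;
    }

    // results[i][j] holds the intersection of first[i] and second[j].
    // Each parallel iteration writes only its own row, so no locking is needed.
    public static List<int>[][] IntersectAll(int[][] first, int[][] second)
    {
        var results = new List<int>[first.Length][];
        Parallel.For(0, first.Length, i =>
        {
            var row = new List<int>[second.Length];
            for (int j = 0; j < second.Length; j++)
            {
                row[j] = IntersectSorted(first[i], second[j]);
            }
            results[i] = row;
        });
        return results;
    }
}
```

Parallelizing over the outer loop keeps each unit of work large, which usually matters more than parallelizing a single intersection.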

Storing sum of chunks of array through one pass

Let's say I have the array
1,2,3,4,5,6,7,8,9,10,11,12
and my chunk size is 4.
Then I want a method that outputs an array of ints, int[] a:
a[0] = 1
a[1] = 3
a[2] = 6
a[3] = 10
a[4] = 14
a[5] = 18
a[6] = 22
a[7] = 26
a[8] = 30
a[9] = 34
a[10] = 38
a[11] = 42
Note that each output value is the sum of the current item and the three items before it, because the chunk size is 4, so I sum the last 4 items.
I need the method to work without a nested loop:
for(int i=0; i<12; i++)
{
for(int k = i; k>=0 ;k--)
{
// do sumation
counter++;
if(counter==4)
break;
}
}
For example, I don't want something like that, in order to make the code more efficient.
Also, the chunk size may change, so I cannot do:
a[3] = a[0] + a[1] + a[2] + a[3]
edit
The reason I asked this question is that I need to implement a rolling checksum for my data structures class. I open a file for reading, which gives me a byte array, and then compute a hash function on parts of the file. Let's say the file is 100 bytes and I split it into chunks of 10 bytes. I compute a hash of each chunk, so I get 10 hashes. Then I need to compare those hashes with a second file that is similar. Let's say the second file has the same 100 bytes plus an additional 5, so it contains 105 bytes in total. Because those extra bytes may be in the middle of the file, running the same algorithm that I used on the first file is not going to work. I hope I have explained myself correctly. And because some files are large, it is not efficient to have a nested loop in my algorithm.
Also, real rolling hash functions are very complex. Most of them are in C++ and I have a hard time understanding them. That's why I want to create my own very simple hash function, just to demonstrate how a rolling checksum works.
Edit 2
int chunckSize = 4;
int[] a = new int[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 }; // the bytes of the file
int[] b = new int[a.Length]; // array where we will place the checksums
int[] sum = new int[a.Length]; // array needed to avoid nested loop
for (int i = 0; i < a.Length; i++)
{
int temp = 0;
if (i == 0)
{
temp = 1;
}
sum[i] += a[i] + sum[i-1+temp];
if (i < chunckSize)
{
b[i] = sum[i];
}
else
{
b[i] = sum[i] - sum[i - chunckSize];
}
}
The problem with this algorithm is that with large files the sum will at some point be larger than int.MaxValue, so it is not going to work.
But at least now it is more efficient. Getting rid of that nested loop helped a lot!
edit 3
Based on edit 2 I have worked this out. It does not work with large files, and the checksum algorithm is very bad, but at least I think it demonstrates the rolling hashing that I am trying to explain.
Part1(@"A:\fileA.txt");
Part2(@"A:\fileB.txt", null);
.....
// split the file in chuncks and return the checksums of the chuncks
private static UInt64[] Part1(string file)
{
UInt64[] hashes = new UInt64[(int)Math.Pow(2, 20)];
var stream = File.OpenRead(file);
int chunckSize = (int)Math.Pow(2, 22); // 10 => kilobite 20 => megabite 30 => gigabite etc..
byte[] buffer = new byte[chunckSize];
int bytesRead; // how many bytes where read
int counter = 0; // counter
while ( // while bytesRead > 0
(bytesRead =
(stream.Read(buffer, 0, buffer.Length)) // returns the number of bytes read or 0 if no bytes read
) > 0)
{
hashes[counter] = 0;
for (int i = 0; i < bytesRead; i++)
{
hashes[counter] = hashes[counter] + buffer[i]; // simple algorithm not realistic to perform check sum of file
}
counter++;
}// end while loop
return hashes;
}
// split the file in chuncks rolling it. In reallity this file will be on a different computer..
private static void Part2(string file, UInt64[] hash)
{
UInt64[] hashes = new UInt64[(int)Math.Pow(2, 20)];
var stream = File.OpenRead(file);
int chunckSize = (int)Math.Pow(2, 22); // chunks must be as big as in pervious method
byte[] buffer = new byte[chunckSize];
int bytesRead; // how many bytes where read
int counter = 0; // counter
UInt64[] sum = new UInt64[(int)Math.Pow(2, 20)];
while ( // while bytesRead > 0
(bytesRead =
(stream.Read(buffer, 0, buffer.Length)) // returns the number of bytes read or 0 if no bytes read
) > 0)
{
for (int i = 0; i < bytesRead; i++)
{
int temp = 0;
if (counter == 0)
temp = 1;
sum[counter] += (UInt64)buffer[i] + sum[counter - 1 + temp];
if (counter < chunckSize)
{
hashes[counter] = (UInt64)sum[counter];
}else
{
hashes[counter] = (UInt64)sum[counter] - (UInt64)sum[counter - chunckSize];
}
counter++;
}
}// end while loop
// mising to compare hashes arrays
}

Add an array r for the result, and initialize its first chunk members using a loop from 0 to chunk-1. Now observe that to get r[i+1] you can add a[i+1] to r[i], and subtract a[i-chunk+1]. Now you can do the rest of the items in a single non-nested loop:
for (int i = chunk - 1; i < N - 1; i++) {
r[i+1] = a[i+1] + r[i] - a[i-chunk+1];
}

You can get this down to a single for loop, though that may not be good enough. To do that, just note that c[i+1] = c[i]-a[i-k+1]+a[i+1]; where a is the original array, c is the chunky array, and k is the size of the chunks.
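The recurrence from the two answers above can be sketched as a single loop that keeps one running sum. The names WindowSumDemo and WindowSums are hypothetical, and the use of long for the sums is an assumption made here to sidestep the int overflow mentioned in the question.

```csharp
using System;

public static class WindowSumDemo
{
    // r[i] is the sum of the last 'chunk' items ending at index i
    // (fewer items for the first chunk - 1 positions).
    public static long[] WindowSums(int[] a, int chunk)
    {
        var r = new long[a.Length];
        long sum = 0;
        for (int i = 0; i < a.Length; i++)
        {
            sum += a[i];                 // item entering the window
            if (i >= chunk)
            {
                sum -= a[i - chunk];     // item leaving the window
            }
            r[i] = sum;
        }
        return r;
    }
}
```

For the input 1 through 12 with a chunk size of 4 this produces 1, 3, 6, 10, 14, 18, 22, 26, 30, 34, 38, 42, matching the expected output in the question.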

I understand that you want to compute a rolling hash function to hash every n-gram (where n is what you call the "chunk size"). Rolling hashing is sometimes called "recursive hashing". There is a wikipedia entry on the topic:
http://en.wikipedia.org/wiki/Rolling_hash
A common algorithm to solve this problem is Karp-Rabin. Here is some pseudo-code which you should be able to easily implement in C#:
B←37
s←empty First-In-First-Out (FIFO) structure (e.g., a linked-list)
x←0(L-bit integer)
z←0(L-bit integer)
for each character c do
append c to s
x ← (B x−B^n z + c ) mod 2^L
yield x
if length(s) = n then
remove oldest character y from s
z ← y
end if
end for
Note that because B^n is a constant, the main loop only does two multiplications, one subtraction and one addition. The "mod 2^L" operation can be done very fast (use a mask, or unsigned integers with L=32 or L=64, for example).
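Specifically, your C# code might look like this, where n is the "chunk" size (just set B = 37 and Btothen = 37 to the power n, computed with a loop; in C# the ^ operator is XOR, not exponentiation):
r[0] = a[0];
for (int i = 1; i < N; i++) {
r[i] = a[i] + B * r[i-1];
if (i >= n) r[i] -= Btothen * a[i-n]; // drop the item that left the window
}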
Karp-Rabin is not ideal however. I wrote a paper where better solutions are discussed:
Daniel Lemire and Owen Kaser, Recursive n-gram hashing is pairwise independent, at best, Computer Speech & Language 24 (4), pages 698-710, 2010.
http://arxiv.org/abs/0705.4676
I also published the source code (Java and C++, alas no C# but it should not be hard to go from Java to C#):
https://github.com/lemire/rollinghashjava
https://github.com/lemire/rollinghashcpp
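A self-contained sketch of the Karp-Rabin recurrence from the pseudo-code above, assuming L = 32 so that uint wraparound performs the "mod 2^L" step. The names RollingHashDemo, RollingHashes and DirectHash are hypothetical; DirectHash recomputes one window from scratch and exists only to check the rolling version.

```csharp
using System;

public static class RollingHashDemo
{
    private const uint B = 37;

    // hashes[i] is the hash of the window of up to n bytes ending at index i.
    public static uint[] RollingHashes(byte[] data, int n)
    {
        unchecked
        {
            uint bToN = 1;
            for (int k = 0; k < n; k++)
            {
                bToN *= B;               // B^n mod 2^32
            }

            var hashes = new uint[data.Length];
            uint x = 0;
            for (int i = 0; i < data.Length; i++)
            {
                x = B * x + data[i];     // bring in the new byte
                if (i >= n)
                {
                    x -= bToN * data[i - n]; // remove the byte that left the window
                }
                hashes[i] = x;
            }
            return hashes;
        }
    }

    // Hash of data[start .. start + length) computed from scratch.
    public static uint DirectHash(byte[] data, int start, int length)
    {
        unchecked
        {
            uint x = 0;
            for (int k = start; k < start + length; k++)
            {
                x = B * x + data[k];
            }
            return x;
        }
    }
}
```

For every i >= n - 1, RollingHashes(data, n)[i] should equal DirectHash(data, i - n + 1, n), while costing a constant amount of work per byte instead of n.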

How about storing off the last chunk_size values as you step through?
Allocate an array of size chunk_size, set them all to zero, and then set the element at i % chunk_size with your current element at each iteration of i, and then add up all the values?
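A sketch of this ring-buffer idea follows, with one deliberate change: instead of adding up all stored values at every step, it keeps a running sum and subtracts the value being overwritten, so no nested loop is needed. The names RingWindowDemo and WindowSums are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class RingWindowDemo
{
    // Streams through items and returns, for each one, the sum of the
    // last chunkSize items seen so far. Only chunkSize values are kept.
    public static List<long> WindowSums(IEnumerable<int> items, int chunkSize)
    {
        var ring = new int[chunkSize];   // starts as all zeros
        var sums = new List<long>();
        long sum = 0;
        int slot = 0;

        foreach (int item in items)
        {
            sum -= ring[slot];           // value that falls out of the window
            ring[slot] = item;
            sum += item;
            sums.Add(sum);

            slot++;
            if (slot == chunkSize)
            {
                slot = 0;                // wrap around
            }
        }
        return sums;
    }
}
```

This variant is useful when the input is a stream and the original array is not kept in memory, since only the last chunkSize values are stored.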

using System;
class Sample {
static void Main(){
int chunckSize = 4;
int[] a = new int[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 };
int[] b = new int[a.Length];
int sum = 0;
int d = chunckSize*(chunckSize-1)/2;
foreach(var i in a){
if(i < chunckSize){
sum += i;
b[i-1]=sum;
} else {
b[i-1]=chunckSize*i -d;
}
}
Console.WriteLine(String.Join(",", b));//1,3,6,10,14,18,22,26,30,34,38,42
}
}
