I'm trying to use the XNA microphone to capture audio and pass it to an API I have that analyses the data for display purposes. However, the API requires the audio data in an array of 16 bit integers. So my question is fairly straightforward: what's the most efficient way to convert the byte array into a short array?
private void _microphone_BufferReady(object sender, System.EventArgs e)
{
_microphone.GetData(_buffer);
short[] shorts;
//Convert and pass the 16 bit samples
ProcessData(shorts);
}
Cheers,
Dave
EDIT: This is what I have come up with, and it seems to work, but could it be done faster?
private short[] ConvertBytesToShorts(byte[] bytesBuffer)
{
//Shorts array should be half the size of the bytes buffer, as each short represents 2 bytes (16 bits)
short[] shorts = new short[bytesBuffer.Length / 2];
int currentStartIndex = 0;
for (int i = 0; i < shorts.Length; i++) // note: < Length, not < Length - 1, or the last sample is dropped
{
//Convert the 2 bytes at the currentStartIndex to a short
shorts[i] = BitConverter.ToInt16(bytesBuffer, currentStartIndex);
//increment by 2, ready to combine the next 2 bytes in the buffer
currentStartIndex += 2;
}
return shorts;
}
After reading your update, I can see you need to actually copy a byte array directly into a buffer of shorts, merging bytes. Here's the relevant section from the documentation:
The byte[] buffer format used as a parameter for the SoundEffect constructor, Microphone.GetData method, and DynamicSoundEffectInstance.SubmitBuffer method is PCM wave data. Additionally, the PCM format is interleaved and in little-endian.
Now, if for some weird reason your system has BitConverter.IsLittleEndian == false, then you will need to loop through your buffer, swapping bytes as you go, to convert from little-endian to big-endian. A sketch of that swap follows - though I am reasonably sure all the XNA systems are little-endian.
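A minimal, untested sketch of that swap (it operates in place on the raw PCM buffer from the question):

// Untested sketch: swap each 2-byte sample in place so little-endian PCM
// can be consumed correctly on a big-endian system.
if (!BitConverter.IsLittleEndian)
{
    for (int i = 0; i + 1 < _buffer.Length; i += 2)
    {
        byte tmp = _buffer[i];
        _buffer[i] = _buffer[i + 1];
        _buffer[i + 1] = tmp;
    }
}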
For your purposes, you can just copy the buffer directly using Marshal.Copy or Buffer.BlockCopy. Both will give you the performance of the platform's native memory copy operation, which will be extremely fast:
// Create this buffer once and reuse it! Don't recreate it each time!
short[] shorts = new short[_buffer.Length/2];
// Option one:
unsafe
{
fixed(short* pShorts = shorts)
Marshal.Copy(_buffer, 0, (IntPtr)pShorts, _buffer.Length);
}
// Option two:
Buffer.BlockCopy(_buffer, 0, shorts, 0, _buffer.Length);
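Putting it together, the handler might look something like this (a sketch; it assumes _shorts is a field so the allocation happens only once):

private short[] _shorts; // allocated once, sized to half the byte buffer

private void _microphone_BufferReady(object sender, System.EventArgs e)
{
    _microphone.GetData(_buffer);
    if (_shorts == null)
        _shorts = new short[_buffer.Length / 2];
    // Merge each little-endian byte pair into one 16-bit sample:
    Buffer.BlockCopy(_buffer, 0, _shorts, 0, _buffer.Length);
    ProcessData(_shorts);
}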
This is a performance question, so: measure it!
It is worth pointing out that for measuring performance in .NET you want to do a release build and run without the debugger attached (this allows the JIT to optimise).
Jodrell's answer is worth commenting on: Using AsParallel is interesting, but it is worth checking if the cost of spinning it up is worth it. (Speculation - measure it to confirm: converting byte to short should be extremely fast, so if your buffer data is coming from shared memory and not a per-core cache, most of your cost will probably be in data transfer not processing.)
Also, I am not sure that ToArray is appropriate. First of all, it may not be able to create the correct-sized array directly; having to resize the array as it builds will make it very slow. Additionally, it will always allocate the array - which is not slow in itself, but adds a GC cost that you almost certainly don't want.
Edit: Based on your updated question, the code in the rest of this answer is not directly usable, as the format of the data is different. And the technique itself (a loop, safe or unsafe) is not as fast as what you can use. See my other answer for details.
So you want to pre-allocate your array. Somewhere out in your code you want a buffer like this:
short[] shorts = new short[_buffer.Length];
And then simply copy from one buffer to the other:
for(int i = 0; i < _buffer.Length; ++i)
    shorts[i] = (short)_buffer[i];
This should be very fast, and the JIT should be clever enough to skip one if not both of the array bounds checks.
And here's how you can do it with unsafe code: (I haven't tested this code, but it should be about right)
unsafe
{
int length = _buffer.Length;
fixed(byte* pSrc = _buffer) fixed(short* pDst = shorts)
{
byte* ps = pSrc;
short* pd = pDst;
while(pd < pDst + length)
*(pd++) = (short)(*(ps++));
}
}
Now the unsafe version has the disadvantage of requiring /unsafe, and also it may actually be slower because it prevents the JIT from doing various optimisations. Once again: measure it.
(Also you can probably squeeze more performance if you try some permutations on the above examples. Measure it.)
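If you want a starting point for those measurements, a simple Stopwatch harness will do (ConvertSamples here is just a placeholder for whichever variant you are timing):

var sw = System.Diagnostics.Stopwatch.StartNew();
for (int i = 0; i < 10000; i++)
{
    ConvertSamples(_buffer, shorts); // placeholder for the variant under test
}
sw.Stop();
Console.WriteLine("{0} ms", sw.ElapsedMilliseconds);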
Finally: Are you sure you want the conversion to be (short)sample? Shouldn't it be something like ((short)sample-128)*256 to take it from unsigned to signed and extend it to the correct bit-width? Update: seems I was wrong on the format here, see my other answer
The best PLINQ I could come up with is here.
private short[] ConvertBytesToShorts(byte[] bytesBuffer)
{
//Shorts array should be half the size of the bytes buffer, as each short represents 2 bytes (16bits)
var odd = bytesBuffer.AsParallel().Where((b, i) => i % 2 != 0);
var even = bytesBuffer.AsParallel().Where((b, i) => i % 2 == 0);
return odd.Zip(even, (o, e) => (short)((o << 8) | e)).ToArray();
}
I'm dubious about the performance, but with enough data and processors, who knows.
If the conversion operation is wrong ((short)((o << 8) | e)) please change to suit.
Related
So I've been trying to add bits of a value to a MemoryStream but the issue is I have no idea how. I've seen that it's used for performance when it comes to networking.
I know I want a function that takes the bit value and how many bits it takes to store that value. So for instance, to store the value 3 I would need to allocate 2 bits 0000 0000 0000 0011. I would essentially pack the bits into a byte array and then add that byte array to the MemoryStream
var ms = new MemoryStream();
ms.WriteByte(1);
ms.WriteByte(1);
ms.WriteByte(1);
ms.WriteByte(1);
ms.WriteByte(1);
WriteBits(2, 3);
WriteBits(1, 1);
void WriteBits(int numbBits, int value)
{
/* Convert the "value" to a byte or bytes and add it to the MemoryStream */
}
How do I properly implement this?
Java Example
// bitMaskOut is assumed to be a precomputed mask table: bitMaskOut[n] == (1 << n) - 1
public void writeBits(int numBits, int value) {
int bytePos = bitPosition >> 3;
int bitOffset = 8 - (bitPosition & 7);
bitPosition += numBits;
for (; numBits > bitOffset; bitOffset = 8) {
buffer[bytePos] &= ~bitMaskOut[bitOffset]; // mask out the desired area
buffer[bytePos++] |= (value >> (numBits - bitOffset))
& bitMaskOut[bitOffset];
numBits -= bitOffset;
}
if (numBits == bitOffset) {
buffer[bytePos] &= ~bitMaskOut[bitOffset];
buffer[bytePos] |= value & bitMaskOut[bitOffset];
} else {
buffer[bytePos] &= ~(bitMaskOut[numBits] << (bitOffset - numBits));
buffer[bytePos] |= (value & bitMaskOut[numBits]) << (bitOffset - numBits);
}
}
So I've been trying to add bits of a value to a MemoryStream
You don't; MemoryStream only handles whole bytes.
So for instance, to store the value 3 I would need to allocate 2 bits
This would only be true if the range of values you want to store is [0, 3]. If you want the possibility of storing any larger value you need more bits.
How do I properly implement this?
You would need to implement your own bit stream. The Java example looks like it has a byte[] buffer and a bitPosition; you would need to implement those. The bit-fiddling code looks like it should work just about the same in C#. Once you have a byte[], it is trivial to write it out to whatever stream you want, and usually possible to send it directly over the network.
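A minimal sketch of such a bit stream in C# (untested; it packs bits most-significant-first like the Java example and uses a fixed-size buffer for brevity):

class BitWriter
{
    private readonly byte[] buffer = new byte[1024];
    private int bitPosition;

    public void WriteBits(int numBits, int value)
    {
        for (int i = numBits - 1; i >= 0; i--)
        {
            int bit = (value >> i) & 1;            // take bits most-significant first
            int bytePos = bitPosition >> 3;        // bitPosition / 8
            int bitOffset = 7 - (bitPosition & 7); // bit slot within the current byte
            buffer[bytePos] = (byte)((buffer[bytePos] & ~(1 << bitOffset)) | (bit << bitOffset));
            bitPosition++;
        }
    }

    public byte[] ToArray()
    {
        var result = new byte[(bitPosition + 7) / 8]; // round up to whole bytes
        System.Array.Copy(buffer, result, result.Length);
        return result;
    }
}

The byte[] from ToArray() can then be written to the MemoryStream in one go.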
I've seen that it's used for performance when it comes to networking
I think there is a significant misunderstanding here. While you could manually manipulate individual bits, in most cases it would just be a waste of (development) time.
In general, a better way to get good performance is to use existing, well-optimized and well-designed libraries. There are a variety of serialization libraries that convert objects to byte streams for you. An example would be protobuf (.net), which actually encodes numbers with a variable number of bytes.
If you still need smaller data, it is usually more efficient to use some form of compression. The old classic deflate usually gives a good compromise between size and performance, while algorithms like lz4 prioritize speed over compression ratio.
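For illustration, deflating a byte payload with the framework's built-in DeflateStream might look like this (a sketch):

using System.IO;
using System.IO.Compression;

static byte[] Compress(byte[] data)
{
    using (var output = new MemoryStream())
    {
        // The deflate stream must be closed before reading the result,
        // so ToArray() is called after the inner using block.
        using (var deflate = new DeflateStream(output, CompressionMode.Compress))
        {
            deflate.Write(data, 0, data.Length);
        }
        return output.ToArray();
    }
}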
I had exactly the same problem and wrote an entire BitStream library which can handle any reads and writes of an arbitrary number of bits to a MemoryStream (and any other stream, too). The library is open-source, MIT-licensed and fast (https://github.com/martinweihrauch/BitStream).
Writing bits to a MemoryStream.
These are the steps to write a value with a certain number of bits to a specific position in the MemoryStream:
Have a Stream available, e.g. a MemoryStream, to which you want to write.
Connect this Stream to a new BitStream:
using SharpBitStream;
uint[] testDataUnsigned = { 5, 62, 17, 50, 33 };
var ms = new MemoryStream();
var bs = new BitStream(ms);
Now, you can start writing to the BitStream like this:
foreach(var bits in testDataUnsigned)
{
bs.WriteUnsigned(6, (ulong)bits);
}
Writing can be done as above by providing only the bit length and the value, but you of course also have full control over exactly where the bits are written, like so:
bs.WriteUnsigned(3, 2, 4, 5);
// Overloaded signature of WriteUnsigned:
// public void WriteUnsigned(long offsetByteStream, int offsetBit, int bitLength, ulong value)
// For signed numbers (e. g. -17), use
// bs.WriteSigned(3, 2, 4, -5);
This means you can control that you write to the 4th byte (3, because counting starts at 0) in the underlying byte Stream, starting from the 3rd (=2) bit position of that byte, with a length of 4 bits and the value 5 (=0b0101).
Reading works similarly:
Just read the next 6 bits, wherever your byte and bit position is (e. g. for loops, etc):
ulong number = bs.ReadUnsigned(6);
// For Signed, use
// long number = bs.ReadSigned(6);
Read from a specific position; in this example, read 4 bits from the 3rd byte in the Stream (2 = 3rd position), starting with bit #0:
ulong number = bs.ReadUnsigned(2, 0, 4);
// For signed, use
// long number = bs.ReadSigned(2, 0, 4);
Note: The bit offset is always counting from 0 from the left-most position.
I have a byte[] byteArray, usually with byteArray.Length of 1-3.
I need to decompose the array into bits, take some of those bits (for example, bits 5-17), and convert them to an Int32.
I tried to do this
private static IEnumerable<bool> GetBitsStartingFromLSB(byte b)
{
for (int i = 0; i < 8; i++)
{
yield return (b % 2 != 0);
b = (byte)(b >> 1);
}
}
public static Int32 Bits2Int(ref byte[] source, int offset, int length)
{
List<bool> bools = source.SelectMany(GetBitsStartingFromLSB).ToList();
bools = bools.GetRange(offset, length);
bools.AddRange(Enumerable.Repeat(false, 32-length).ToList() );
int[] array = new int[1];
(new BitArray(bools.ToArray())).CopyTo(array, 0);
return array[0];
}
But this method is too slow, and I have to call it very often.
How can I do this more efficiently?
Thanks a lot! Now I do this:
public static byte[] GetPartOfByteArray( byte[] source, int offset, int length)
{
byte[] retBytes = new byte[length];
Buffer.BlockCopy(source, offset, retBytes, 0, length);
return retBytes;
}
public static Int32 Bits2Int(byte[] source, int offset, int length)
{
if (source.Length > 4)
{
source = GetPartOfByteArray(source, offset / 8, (source.Length - offset / 8 > 3 ? 4 : source.Length - offset / 8));
offset -= 8 * (offset / 8);
}
byte[] intBytes = new byte[4];
source.CopyTo(intBytes, 0);
Int32 full = BitConverter.ToInt32(intBytes, 0);
Int32 mask = (1 << length) - 1;
return (full >> offset) & mask;
}
And it works very fast!
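For example (illustrative values only - extracting 12 bits starting at bit offset 5):

byte[] data = { 0xAB, 0xCD, 0xEF };
int bits = Bits2Int(data, 5, 12); // (0x00EFCDAB >> 5) & 0xFFF on little-endian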
If you're after "fast", then ultimately you need to do this with bit logic, not LINQ etc. I'm not going to write polished code, but you'd need to (a rough sketch follows the list):
use your offset with / 8 and % 8 to find the starting byte and the bit-offset inside that byte
compose however many bytes you need - quite possibly up to 5 if you are after a 32-bit number (because of the possibility of an offset) - for example into a long, in whichever endianness (presumably big-endian?) you expect
use right-shift (>>) on the composed value to drop however-many bits you need to apply the bit-offset (i.e. value >>= offset % 8;)
mask out any bits you don't want; for example value &= ~(-1L << length); (the -1 gives you all-ones; the << length creates length zeros at the right hand edge, and the ~ swaps all zeros for ones and ones for zeros, so you now have length ones at the right hand edge)
if the value is signed, you'll need to think about how you want negatives to be handled, especially if you aren't always reading 32 bits
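A rough sketch of those steps (untested; it assumes bits are numbered from the most significant bit of the first byte, length <= 32, and a caller-validated range):

static int ExtractBits(byte[] source, int offset, int length)
{
    int byteIndex = offset / 8;  // starting byte
    int bitOffset = offset % 8;  // bit offset inside that byte

    // Compose up to 5 bytes into a long, big-endian.
    int bytesNeeded = (bitOffset + length + 7) / 8;
    long value = 0;
    for (int i = 0; i < bytesNeeded; i++)
    {
        value = (value << 8) | source[byteIndex + i];
    }

    // Right-shift away the trailing bits past our field, then mask.
    value >>= (bytesNeeded * 8) - bitOffset - length;
    value &= ~(-1L << length);
    return (int)value;
}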
First of all, you're asking for optimization. But the only things you've said are:
too slow
need to call it often
There's no information on:
how much slow is too slow? have you measured current code? have you estimated how fast you need it to be?
how often is "often"?
how large are the source byte arrays?
etc.
Optimization can be done in a multitude of ways. When asking for optimization, everything is important. For example, if source byte[] is 1 or 2 bytes long (yeah, may be ridiculous, but you didn't tell us), and if it rarely changes, then you could get very nice results by caching results. And so on.
So, no solutions from me, just a list of possible performance problems:
private static IEnumerable<bool> GetBitsStartingFromLSB(byte b) // A
{
for (int i = 0; i < 8; i++)
{
yield return (b % 2 != 0); // A
b = (byte)(b >> 1);
}
}
public static Int32 Bits2Int(ref byte[] source, int offset, int length)
{
List<bool> bools = source.SelectMany(GetBitsStartingFromLSB).ToList(); //A,B
bools = bools.GetRange(offset, length); //B
bools.AddRange(Enumerable.Repeat(false, 32-length).ToList() ); //C
int[] array = new int[1]; //D
(new BitArray(bools.ToArray())).CopyTo(array, 0); //D
return array[0]; //D
}
A: LINQ is fun, but not fast unless done carefully. For each input byte, it splits that byte into 8 bools and passes them around wrapped in a compiler-generated IEnumerable object *). Note that it all needs to be cleaned up later, too. Probably you'd get better performance simply returning a new bool[8], or even a BitArray(size=8).
*) conceptually. In fact yield-return is lazy, so it's not 8 value objects + 1 ref object, but just one enumerable that generates items. But then, you're doing .ToList() in (B), so my describing it that way isn't far from the truth.
A2: the 8 is hardcoded. Once you drop that pretty IEnumerable and change it to a constant-sized array-like thing, you can preallocate that array and pass it via parameter to GetBitsStartingFromLSB to further reduce the amount of temporary objects created and later thrown away. And since SelectMany visits items one-by-one without ever going back, that preallocated array can be reused.
B: converts the whole source array to a stream of bools, then converts that to a List, then discards the whole list except for a small offset-length range of it. Why convert to a list at all? It's just another pack of objects wasted, and the internal data is copied as well, since bool is a value type. You could have taken the range directly from the IEnumerable with .Skip(X).Take(Y).
C: padding a list of bools to have 32 items. AddRange/Repeat is fun, but Repeat has to return an IEnumerable - again, another object created and thrown away. You're padding the list with false. Drop the list idea and make it a bool[32], or a BitArray(32). They start as all-false automatically, since that's the default value of a bool. Iterate over the bits from range A+B and write them into that array by index. Those written will have their value; those unwritten will stay false. Job done, no objects wasted.
C2: connect preallocating 32-item array with A+A2. GetBitsStartingFromLSB doesn't need to return anything, it may get a buffer to be filled via parameter. And that buffer doesn't need to be 8-item buffer. You may pass the whole 32-item final array, and pass an offset so that function knows exactly where to write. Even less objects wasted.
D: finally, all that work to return the selected bits as an integer. A new temporary array is created and wasted, and a new BitArray is effectively created and wasted too. Note that earlier you were already doing a manual bit-shift conversion (int->bits) in GetBitsStartingFromLSB - why not just create a similar method that does some shifts and converts bits->int instead? You know the order of the bits, so you can reverse the process. No need for the array and BitArray; with some code wiggling you save on those allocations and data copies again.
I have no idea how much time/space/etc. that will save for you, but these are just a few points that stand out at first glance, without modifying your original idea for the code too much and without doing it all via math and bitshifts in one go. I see Marc Gravell has already written some hints too. If you have time to spare, I suggest you try mine first, one by one, and see how (and whether at all!) each change affects performance. Just to see. Then you'll probably scrap it all and try the new "do it all via math and bitshifts in one go" version with Marc's hints.
I am using a Microsoft example for interprocess communication. In the example there are two methods for reading/writing a string to/from a stream. The code sends the length of the string being streamed in the data. I need similar code, but need to make some modifications. An explanation of the highlighted lines would be helpful.
In WriteString(), they take the length of the byte array being written and divide it by 256. The opposite is done in ReadString(), but an explanation as to why 256 is used would be great. Then, it writes another byte by taking the length and ANDing it with 255. I don't understand the reasoning for this either. I'm thinking it shifts the value, but I don't really understand why this is needed. And then in ReadString() it does a += to the length by reading a byte. An explanation of this would be really helpful. I'm new to streaming and just want to understand exactly what is happening and why.
public string ReadString()
{
int len = 0;
// the next two lines
len = ioStream.ReadByte() * 256;
len += ioStream.ReadByte();
byte[] inBuffer = new byte[len];
ioStream.Read(inBuffer, 0, len);
return streamEncoding.GetString(inBuffer);
}
public int WriteString(string outString)
{
byte[] outBuffer = streamEncoding.GetBytes(outString);
int len = outBuffer.Length;
if (len > UInt16.MaxValue)
{
len = (int)UInt16.MaxValue;
}
// the next two lines
ioStream.WriteByte((byte)(len / 256));
ioStream.WriteByte((byte)(len & 255));
ioStream.Write(outBuffer, 0, len);
ioStream.Flush();
return outBuffer.Length + 2;
}
This code is bad, find a new tutorial.
The 256 stuff is there to convert the low 2 bytes of the length integer to bytes in order to serialize/deserialize them. This is not how it's normally done. Use BinaryReader/Writer or code not based on multiplication but on binary and and shift.
Dividing by 256 is equivalent to x >> 8 (and only works for positive integers). x & 255 is used to take the lowest byte; this could simply be (byte)x. Sometimes people write x % 256 for this, which is not idiomatic and has problems with signedness.
Good code would be new byte[] { (byte)(x >> 8), (byte)(x >> 0) } and x = bytes[0] << 8 | bytes[1]. Much simpler, faster and idiomatic. I like writing >> 0, which does nothing, for the sake of symmetry. It's optimized away. This might seem ridiculous for 16 bit ints, but with longer ints there are 4 or 8 components and having one of them slightly off seems like needless inconsistency.
ioStream.Read(inBuffer, 0, len);
is a bug because it assumes the read completes in one chunk. You need a loop - or, again, BinaryReader.
if (len > UInt16.MaxValue)
{
len = (int)UInt16.MaxValue;
}
This silently truncates any string longer than 65535 bytes - another trap. I'm going to use this opportunity to warn anyone who reads this: Microsoft .NET sample code is often of extremely poor quality. Read it with great skepticism.
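If you want a safer starting point, a length-prefixed round trip over BinaryWriter/BinaryReader might look like this (a sketch; note that BinaryWriter writes the prefix little-endian, so it is not wire-compatible with the original sample, and streamEncoding is the encoding field from that sample):

public int WriteString(Stream ioStream, string outString)
{
    byte[] outBuffer = streamEncoding.GetBytes(outString);
    var writer = new BinaryWriter(ioStream);
    writer.Write(checked((ushort)outBuffer.Length)); // throws on overflow instead of silently truncating
    writer.Write(outBuffer);
    writer.Flush();
    return outBuffer.Length + 2;
}

public string ReadString(Stream ioStream)
{
    var reader = new BinaryReader(ioStream);
    int len = reader.ReadUInt16();
    byte[] inBuffer = reader.ReadBytes(len); // loops internally until len bytes or end of stream
    return streamEncoding.GetString(inBuffer);
}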
I'm currently trying to get this C code converted into C#.
Since I'm not really familiar with C I'd really apprecheate your help!
static unsigned char byte_table[2080] = {0};
First off, a byte array gets declared but never filled, which I'm okay with.
BYTE* packet = //bytes come in here from a file
int unknownVal = 0;
int unknown_field0 = *(DWORD *)(packet + 0x08);
do
{
*((BYTE *)packet + i) ^= byte_table[(i + unknownVal) & 0x7FF];
++i;
}
while (i <= packet[0]);
But down here.. I really have no idea how to translate this into C#
BYTE = byte[] right?
DWORD = double?
but how can (packet + 0x08) be translated? How can I add a hex offset to a byte array? Oo
I'd be happy about anything that helps! :)
In C, initializing an array with {0} sets the entire memory area to zeroes, if I'm not mistaken.
That bottom loop can be rewritten in a simpler, C# friendly fashion.
byte[] packet = arrayofcharsfromfile;
int field = packet[8]+(packet[9]<<8)+(packet[10]<<16)+(packet[11]<<24); //Assuming 32 bit little endian integer
int unknownval = 0;
int i = 0;
do //Why waste the newline? I don't know. Conventions are silly!
{
packet[i] ^= byte_table[(i+unknownval) & 0x7FF];
} while( ++i <= packet[0] );
field is set by taking the four bytes including and following index 8 and generating a 32 bit int from them.
In C, you can cast pointers to other types, as is done in your provided snippet. What they're doing is taking an array of bytes (each one 1/4 the size of a DWORD) and adding 8 to the index which advances the pointer by 8 bytes (since each element is a byte wide) and then treating that pointer as a DWORD pointer. In simpler terms, they're turning the byte array in to a DWORD array, and then taking index 2, as 8/4=2.
You can simulate this behavior in a safe fashion by stringing the bytes together with bitshifting and addition, as I demonstrated above. It's not as efficient and isn't as pretty, but it accomplishes the same thing, and in a platform agnostic way too. Not all platforms are little endian.
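If you'd rather not string the bytes together by hand, BitConverter performs the same reinterpretation (on a little-endian platform this matches the snippet above):

int field = BitConverter.ToInt32(packet, 8); // reads the 4 bytes at offset 8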
I am trying to write a function to determine whether two equal-size bitmaps are identical or not. The function I have right now simply compares a pixel at a time in each bitmap, returning false at the first non-equal pixel.
While this works, and works well for small bitmaps, in production I'm going to be using this in a tight loop and on larger images, so I need a better way. Does anyone have any recommendations?
The language I'm using is C# by the way - and yes, I am already using the .LockBits method. =)
Edit: I've coded up implementations of some of the suggestions given, and here are the benchmarks. The setup: two identical (worst-case) bitmaps, 100x100 in size, with 10,000 iterations each. Here are the results:
CompareByInts (Marc Gravell) : 1107ms
CompareByMD5 (Skilldrick) : 4222ms
CompareByMask (GrayWizardX) : 949ms
In CompareByInts and CompareByMask I'm using pointers to access the memory directly; in the MD5 method I'm using Marshal.Copy to retrieve a byte array and pass that as an argument to MD5.ComputeHash. CompareByMask is only slightly faster, but given the context I think any improvement is useful.
Thanks everyone. =)
Edit 2: Forgot to turn optimizations on - doing that gives GrayWizardX's answer even more of a boost:
CompareByInts (Marc Gravell) : 944ms
CompareByMD5 (Skilldrick) : 4275ms
CompareByMask (GrayWizardX) : 630ms
CompareByMemCmp (Erik) : 105ms
Interesting that the MD5 method didn't improve at all.
Edit 3: Posted my answer (MemCmp) which blew the other methods out of the water. o.O
Edit 8-31-12: per Joey's comment below, be mindful of the format of the bitmaps you compare. They may contain padding on the strides that render the bitmaps unequal, despite being equivalent pixel-wise. See this question for more details.
Reading this answer to a question regarding comparing byte arrays has yielded a MUCH FASTER method: using P/Invoke and the memcmp API call in msvcrt. Here's the code:
[DllImport("msvcrt.dll")]
private static extern int memcmp(IntPtr b1, IntPtr b2, long count);
public static bool CompareMemCmp(Bitmap b1, Bitmap b2)
{
if ((b1 == null) != (b2 == null)) return false;
if (b1.Size != b2.Size) return false;
var bd1 = b1.LockBits(new Rectangle(new Point(0, 0), b1.Size), ImageLockMode.ReadOnly, PixelFormat.Format32bppArgb);
var bd2 = b2.LockBits(new Rectangle(new Point(0, 0), b2.Size), ImageLockMode.ReadOnly, PixelFormat.Format32bppArgb);
try
{
IntPtr bd1scan0 = bd1.Scan0;
IntPtr bd2scan0 = bd2.Scan0;
int stride = bd1.Stride;
int len = stride * b1.Height;
return memcmp(bd1scan0, bd2scan0, len) == 0;
}
finally
{
b1.UnlockBits(bd1);
b2.UnlockBits(bd2);
}
}
If you are trying to determine if they are 100% equal, you can XOR one against the other; if the result is zero, they are identical. Extending this with unsafe code, take 64 bits at a time as a long and do the math that way; any difference can cause an immediate fail.
If the images are not 100% identical (comparing png to jpeg), or if you are not looking for a 100% match then you have some more work ahead of you.
Good luck.
Well, you're using .LockBits, so presumably you're using unsafe code. Rather than treating each row origin (Scan0 + y * Stride) as a byte*, consider treating it as an int*; int arithmetic is pretty quick, and you only have to do 1/4 as much work. And for images in ARGB you might still be talking in pixels, making the math simple.
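A sketch of that idea (untested; it assumes both bitmaps are locked as 32bpp ARGB with equal stride, as in the memcmp answer above):

static unsafe bool CompareByInts(BitmapData bd1, BitmapData bd2, int width, int height)
{
    for (int y = 0; y < height; y++)
    {
        // Treat each row origin as int*: one compare per pixel instead of four.
        int* row1 = (int*)((byte*)bd1.Scan0 + y * bd1.Stride);
        int* row2 = (int*)((byte*)bd2.Scan0 + y * bd2.Stride);
        for (int x = 0; x < width; x++)
        {
            if (row1[x] != row2[x]) return false;
        }
    }
    return true;
}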
Could you take a hash of each and compare? Strictly speaking it would be probabilistic, but in practice the chance of a false match is negligible.
Thanks to Ram, here's a sample implementation of this technique.
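A minimal sketch of the hash-compare idea (not the linked implementation; it assumes the pixel bytes were already pulled out of each bitmap, e.g. with LockBits and Marshal.Copy as described in the edit above):

using System.Linq;
using System.Security.Cryptography;

static bool HashesMatch(byte[] pixelsA, byte[] pixelsB)
{
    using (var md5 = MD5.Create())
    {
        byte[] hashA = md5.ComputeHash(pixelsA);
        byte[] hashB = md5.ComputeHash(pixelsB);
        return hashA.SequenceEqual(hashB);
    }
}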
If the original problem is just to find the exact duplicates among two bitmaps, then just a bit level comparison will have to do. I don't know C# but in C I would use the following function:
int areEqual (long size, long *a, long *b)
{
long start = size / 2;
long i;
for (i = start; i != size; i++) { if (a[i] != b[i]) return 0; }
for (i = 0; i != start; i++) { if (a[i] != b[i]) return 0; }
return 1;
}
I would start looking in the middle because I suspect there is a much better chance of finding unequal bits near the middle of the image than at the beginning; of course, this really depends on the images you are deduping, and selecting a random place to start may be best.
If you are trying to find the exact duplicates among hundreds of images, then comparing all pairs of them is unnecessary. First compute the MD5 hash of each image and place it in a list of pairs (md5Hash, imageId); then sort the list by the md5Hash. Next, only do pairwise comparisons on the images that have the same md5Hash.
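For example, bucketing by hash first so only same-hash images get the expensive comparison (a sketch; ComputeMd5Hex stands in for whatever per-image hashing you use):

// Group images by hash; only groups with more than one member
// need pairwise pixel-level comparison.
var groups = images
    .Select(img => new { Image = img, Hash = ComputeMd5Hex(img) })
    .GroupBy(x => x.Hash, StringComparer.Ordinal)
    .Where(g => g.Count() > 1);

foreach (var group in groups)
{
    // compare the members of 'group' pairwise
}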
If these bitmaps are already on your graphics card then you can parallelize such a check by doing it on the graphics card using a language like CUDA or OpenCL.
I'll explain in terms of CUDA, since that's the one I know. Basically CUDA lets you write general purpose code to run in parallel across each node of your graphics card. You can access bitmaps that are in shared memory. Each invocation of the function is also given an index within the set of parallel runs. So, for a problem like this, you'd just run one of the above comparison functions for some subset of the bitmap - using parallelization to cover the entire bitmap. Then, just write a 1 to a certain memory location if the comparison fails (and write nothing if it succeeds).
If you don't already have the bitmaps on your graphics card, this probably isn't the way to go, since the costs for loading the two bitmaps on your card will easily eclipse the savings such parallelization will gain you.
Here's some (pretty bad) example code (it's been a little while since I programmed CUDA). There are better ways to access bitmaps that are already loaded as textures, but I didn't bother here.
// kernel to run on GPU, once per thread
__global__ void compare_bitmaps(long const * const A, long const * const B, char * const retValue, size_t const len)
{
// divide the work equally among the threads (each thread is in a block, each block is in a grid)
size_t const threads_per_block = blockDim.x * blockDim.y * blockDim.z;
size_t const len_to_compare = len / (gridDim.x * gridDim.y * gridDim.z * threads_per_block);
# define offset3(idx3,dim3) (idx3.x + dim3.x * (idx3.y + dim3.y * idx3.z))
size_t const start_offset = len_to_compare * (offset3(threadIdx,blockDim) + threads_per_block * offset3(blockIdx,gridDim));
size_t const stop_offset = start_offset + len_to_compare;
# undef offset3
size_t i;
for (i = start_offset; i < stop_offset; i++)
{
if (A[i] != B[i])
{
*retValue = 1;
break;
}
}
return;
}
If you can implement something like Duff's Device in your language, that might give you a significant speed boost over a simple loop. Usually it's used for copying data, but there's no reason it can't be used for comparing data instead.
Or, for that matter, you may just want to use some equivalent to memcmp().
You could try to add them to a database "blob" and then use the database engine to compare their binaries. This would only give you a yes or no answer as to whether the binary data is the same; it would be very easy to make two images that produce the same graphic but have different binary data.
You could also select a few random pixels and compare them, then, if they are the same, continue with more until you've checked all the pixels. This would only return a faster negative match, though; it would still take just as long to confirm a 100% positive match.
Based on the approach of comparing hashes instead of comparing every single pixel, this is what I use:
public static class Utils
{
public static byte[] ShaHash(this Image image)
{
var bytes = (byte[])new ImageConverter().ConvertTo(image, typeof(byte[]));
return (new SHA256Managed()).ComputeHash(bytes);
}
public static bool AreEqual(Image imageA, Image imageB)
{
if (imageA.Width != imageB.Width) return false;
if (imageA.Height != imageB.Height) return false;
var hashA = imageA.ShaHash();
var hashB = imageB.ShaHash();
return !hashA
.Where((nextByte, index) => nextByte != hashB[index])
.Any();
}
}
Usage is straightforward:
bool isMatch = Utils.AreEqual(bitmapOne, bitmapTwo);