How to fill byte array with junk? - c#

I am using this:
byte[] buffer = new byte[10240];
As I understand it, this initializes a 10 KB buffer filled with 0s.
What's the fastest way to fill this array (or re-initialize it) with junk data every time?
I need to use that array more than 5,000 times, filling it with different junk data each time, which is why I'm looking for a fast method. The array size will also change each time.

Answering 'the fastest way' is impossible without describing what the properties of your junk data have to be. Why isn't all zeroes valid junk data?
That said, this is a fast way to fill your array with meaningless numbers.
Random r = new Random();
r.NextBytes(buffer);
You might also look at implementing your own Linear congruential generator if Random isn't fast enough for you. They're simple to implement and fast, but won't give high quality random numbers. (It's unclear to me if you need those or not.)
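To make the LCG suggestion concrete, here is a minimal sketch. The class and method names are made up for illustration, and the multiplier/increment constants are the common Numerical Recipes values; the quality is deliberately low, since the goal is just fast junk:

```csharp
using System;

public static class LcgJunk
{
    // Minimal linear congruential generator: state = state * a + c (mod 2^32).
    // Fast and simple, but not suitable anywhere randomness quality matters.
    public static void FillWithJunk(byte[] buffer, uint seed)
    {
        uint state = seed;
        for (int i = 0; i < buffer.Length; i++)
        {
            state = unchecked(state * 1664525u + 1013904223u);
            buffer[i] = (byte)(state >> 24); // take the high byte; the low bits cycle quickly
        }
    }
}
```

The same seed reproduces the same bytes, which can also be handy for repeatable tests.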

If you are happy with the data being random, but being created from a random seed buffer, then you could do the following:
public class RandomBufferGenerator
{
    private readonly Random _random = new Random();
    private readonly byte[] _seedBuffer;

    public RandomBufferGenerator(int maxBufferSize)
    {
        _seedBuffer = new byte[maxBufferSize];
        _random.NextBytes(_seedBuffer);
    }

    public byte[] GenerateBufferFromSeed(int size)
    {
        int randomWindow = _random.Next(0, size);
        byte[] buffer = new byte[size];
        Buffer.BlockCopy(_seedBuffer, randomWindow, buffer, 0, size - randomWindow);
        Buffer.BlockCopy(_seedBuffer, 0, buffer, size - randomWindow, randomWindow);
        return buffer;
    }
}
I found it to be approximately 60-70 times faster than generating a random buffer from scratch each time.
START: From seed buffer.
00:00:00.009 END : From seed buffer. (Items = 5,000; Per Second = 500,776.20)
START: From scratch.
00:00:00.604 END : From scratch. (Items = 5,000; Per Second = 8,276.95)
Update
The general idea is to create a RandomBufferGenerator once, and then use this instance to generate random buffers, e.g.:
RandomBufferGenerator generator = new RandomBufferGenerator(MaxBufferSize);
byte[] randomBuffer1 = generator.GenerateBufferFromSeed(10 * 1024);
byte[] randomBuffer2 = generator.GenerateBufferFromSeed(5 * 1024);
...

Look at the System.Random.NextBytes() method

As another option to consider, Marshal.AllocHGlobal will allocate unmanaged memory. It doesn't zero out the memory; you get whatever happened to be there, so it's very fast. Of course, you now have to work with this memory using unsafe code (or the Marshal helpers), and if you need to pull it into managed space you are better off with Random.NextBytes.
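A hedged sketch of that approach, using the Marshal helpers rather than unsafe pointers. Note that a freshly allocated block may still happen to contain zeros; the runtime just doesn't guarantee clearing it:

```csharp
using System;
using System.Runtime.InteropServices;

IntPtr p = Marshal.AllocHGlobal(10240); // unmanaged, not zero-initialized
try
{
    // Work with the block via Marshal.ReadByte/WriteByte (or unsafe pointers).
    Marshal.WriteByte(p, 0, 0xFF);
    byte first = Marshal.ReadByte(p, 0);
}
finally
{
    Marshal.FreeHGlobal(p); // unmanaged memory is never garbage collected
}
```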

How junky does the data need to be? Do you mean random? If so, just use the Random class.

Related

Does converting between byte[] and MemoryStream cause overhead?

I want to know if there's overhead when converting between byte arrays and streams (specifically MemoryStream, via MemoryStream.ToArray() and the MemoryStream(byte[]) constructor). I assume it temporarily doubles memory usage.
For example, I read as a stream, convert to bytes, and then convert to stream again.
But getting rid of that byte conversion will require a bit of a rewrite. I don't want to waste time rewriting it if it doesn't make a difference.
So, yes, you are correct in assuming that ToArray duplicates the memory in the stream.
If you do not want to do this (for efficiency reasons), you could modify the bytes directly in the stream. Take a look at this:
// create some bytes: 0,1,2,3,4,5,6,7...
var originalBytes = Enumerable.Range(0, 256).Select(Convert.ToByte).ToArray();

// ms references the bytes array, it does not duplicate it.
// Note: publiclyVisible: true is required, otherwise GetBuffer throws
// (and TryGetBuffer returns false) when wrapping an existing array.
using (var ms = new MemoryStream(originalBytes, 0, originalBytes.Length, writable: true, publiclyVisible: true))
{
    // var duplicatedBytes = ms.ToArray(); // copy of originalBytes array

    // If you don't want to duplicate the bytes but want to
    // modify the buffer directly, you could do this:
    var bufRef = ms.GetBuffer();
    for (var i = 0; i < bufRef.Length; ++i)
    {
        bufRef[i] = Convert.ToByte(bufRef[i] ^ 0x55);
    }

    // or this:
    /*
    ms.TryGetBuffer(out var buf);
    for (var i = 0; i < buf.Count; ++i)
    {
        buf[i] = Convert.ToByte(buf[i] ^ 0x55);
    }
    */

    // or this:
    /*
    for (var i = 0; i < ms.Length; ++i)
    {
        ms.Position = i;
        var b = ms.ReadByte();
        ms.Position = i;
        ms.WriteByte(Convert.ToByte(b ^ 0x55));
    }
    */
}
// originalBytes will now be 85,84,87,86...
ETA:
Edited to add in Blindy's examples. Thanks! -- Totally forgot about GetBuffer and had no idea about TryGetBuffer
Does MemoryStream(byte[]) cause a memory copy?
No, it's a non-resizable stream, and as such no copy is necessary.
Does MemoryStream.ToArray() cause a memory copy?
Yes, by design it creates a copy of the active buffer. This is to cover the resizable case, where the buffer used by the stream is not the same buffer that was initially provided due to reallocations to increase/decrease its size.
Alternatives to MemoryStream.ToArray() that don't cause memory copy?
Sure, you have MemoryStream.TryGetBuffer(out ArraySegment&lt;byte&gt; buffer), which returns a segment pointing to the internal buffer, whether or not it's resizable. If it's non-resizable, it's a segment into your original array.
You also have MemoryStream.GetBuffer, which returns the entire internal buffer. Note that in the resizable case, this will be a lot larger than the actual used stream space, and you'll have to adjust for that in code.
And lastly, you don't always actually need a byte array, sometimes you just need to write it to another stream (a file, a socket, a compression stream, an Http response, etc). For this, you have MemoryStream.CopyTo[Async], which also doesn't perform any copies.
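A small sketch contrasting the copying and non-copying paths described above. TryGetBuffer succeeds here because the expandable MemoryStream() constructor exposes its buffer; the plain MemoryStream(byte[]) one does not:

```csharp
using System;
using System.IO;

var ms = new MemoryStream();
ms.Write(new byte[] { 1, 2, 3 }, 0, 3);

// Copy: ToArray allocates a fresh 3-byte array.
byte[] copy = ms.ToArray();

// No copy: a segment over the internal buffer, trimmed to the used length.
ms.TryGetBuffer(out ArraySegment<byte> seg); // seg.Count == 3

// No intermediate array: stream straight into another stream.
ms.Position = 0;
var dest = new MemoryStream();
ms.CopyTo(dest);
```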

What defines the capacity of a memory stream

I was calculating the size of an object (a List that is being populated) using the following code:
long myObjectSize = 0;
System.IO.MemoryStream memoryStreamObject = new System.IO.MemoryStream();
System.Runtime.Serialization.Formatters.Binary.BinaryFormatter binaryBuffer = new System.Runtime.Serialization.Formatters.Binary.BinaryFormatter();
binaryBuffer.Serialize(memoryStreamObject, myListObject);
myObjectSize = memoryStreamObject.Position;
At the initial point, the capacity of the memoryStreamObject was 1024.
Later (after adding more elements to the list) it was shown as 2048.
It seems to keep increasing as the stream content grows. So what is the purpose of Capacity in this scenario?
This is caused by the internal implementation of the MemoryStream. The Capacity property is the size of the internal buffer. This makes sense if the MemoryStream is created with a fixed-size buffer, but in your case the MemoryStream can grow, and the actual implementation doubles the size of the internal buffer whenever it becomes too small.
Code of MemoryStream
private bool EnsureCapacity(int value)
{
    if (value < 0)
    {
        throw new IOException(Environment.GetResourceString("IO.IO_StreamTooLong"));
    }
    if (value > this._capacity)
    {
        int num = value;
        if (num < 256)
        {
            num = 256;
        }
        if (num < this._capacity * 2)
        {
            num = this._capacity * 2;
        }
        if (this._capacity * 2 > 2147483591)
        {
            num = ((value > 2147483591) ? value : 2147483591);
        }
        this.Capacity = num;
        return true;
    }
    return false;
}
And somewhere in Write
int num = this._position + count;
// snip
if (num > this._capacity && this.EnsureCapacity(num))
The purpose of the capacity for a memory stream and for lists is that the underlying data structure is really an array, and arrays cannot be dynamically resized.
So to start with you use an array with a small(ish) size, but once you add enough data so that the array is no longer big enough you need to create a new array, copy over all the data from the old array to the new array, and then switch to using the new array from now on.
This create+copy takes time: the bigger the array, the longer it takes. If you resized the array to be just big enough each time, you would effectively pay this cost on every write to the memory stream or every element added to the list.
Instead you have a capacity, saying "you can use up to this value before having to resize", to reduce the number of create+copy cycles you have to perform.
For instance, if you were to write one byte at a time to this array without this capacity concept, every extra byte would mean one full create+copy cycle of the entire array. Instead, with the capacity shown in your question, you can write one byte at a time, 520 more times, before a create+copy cycle has to be performed.
So this is a performance optimization.
An additional bonus is that repeatedly allocating slightly larger memory blocks will eventually fragment memory so that you would risk getting "out of memory" exceptions, reducing the number of such allocations also helps to stave off this scenario.
A typical method to calculate this capacity is by just doubling it every time.
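The growth pattern is easy to observe directly. This sketch writes one byte at a time and checks only the documented Capacity property; the exact step sizes (256 minimum, then doubling) reflect the implementation shown above and could differ between runtimes:

```csharp
using System.IO;

var ms = new MemoryStream(); // Capacity starts at 0
for (int i = 0; i < 1024; i++)
    ms.WriteByte(0);
// Capacity stepped 0 -> 256 -> 512 -> 1024, so 1024 single-byte
// writes triggered only three buffer reallocations, not 1024.
```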

Creating a 'ChaosStream' out of a C# MemoryStream for reading random lengths of data?

Test review showed that MemoryStream always returns 'smooth' results, i.e. if we have 200 bytes being read from the MemoryStream into a 400-byte working buffer, it always returns exactly 200 bytes in exactly one call. If we read 4000 bytes into the 200-byte working buffer, it's always exactly 20 iterations of exactly 200 bytes each.
The problem is that the MemoryStream can (in real world, corner cases) represent slow stream sources (like network or file). So the read may not be so deterministically smooth. It's possible that every Read operation may return a non-deterministic number of bytes read (eg: 8, 1, 105, 20, 5, 80 ...)
So what's a good way of turning a vanilla MemoryStream into (for lack of a better word) a ChaosStream, where the number of bytes read is a random number between 1 and the read request count? (Note that 0 means end-of-stream.) The bytes themselves need to be the underlying bytes, just read with some randomness/jitter to them to expand test coverage.
You could always derive from MemoryStream to provide some randomness:
public class ChaosStream : MemoryStream
{
    private readonly Random random = new Random();

    // Create constructors as needed to match desired MemoryStream construction

    public override int Read(byte[] buffer, int offset, int count)
    {
        if (count <= 0)
            return 0; // keep zero-byte reads as zero-byte reads
        int newCount = random.Next(1, count + 1);
        return base.Read(buffer, offset, newCount);
    }
}
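A quick usage sketch, repeating the class for completeness. The byte[] constructor is hypothetical (the answer above leaves constructors to you) and simply forwards to the MemoryStream(byte[]) overload:

```csharp
using System;
using System.IO;

// Drain a 4000-byte ChaosStream through a 200-byte working buffer:
var data = new byte[4000];
var chaos = new ChaosStream(data);
var working = new byte[200];
int total = 0, read;
while ((read = chaos.Read(working, 0, working.Length)) > 0)
    total += read; // each call returns between 1 and 200 bytes

// total ends up at 4000, just delivered in unpredictable chunk sizes.

public class ChaosStream : MemoryStream
{
    private readonly Random random = new Random();

    // Hypothetical constructor wrapping the MemoryStream(byte[]) overload:
    public ChaosStream(byte[] buffer) : base(buffer) { }

    public override int Read(byte[] buffer, int offset, int count)
    {
        if (count <= 0)
            return 0;
        int newCount = random.Next(1, count + 1);
        return base.Read(buffer, offset, newCount);
    }
}
```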

Generate Running Hash (or Checksum) in C#?

Preface:
I am doing a data-import that has a verify-commit phase. The idea is that: the first phase allows taking data from various sources and then running various insert/update/validate operations on a database. The commit is rolled back but a "verification hash/checksum" is generated. The commit phase is the same, but, if the "verification hash/checksum" is the same then the operations will be committed. (The database will be running under the appropriate isolation levels.)
Restrictions:
Input reading and operations are forward-read-once only
Do not want to pre-create a stream (e.g. writing to MemoryStream not desirable) as there may be a lot of data. (It would work on our servers/load, but pretend memory is limited.)
Do not want to "create my own". (I am aware of available code like CRC-32 by Damien which I could use/modify but would prefer something "standard".)
And what I (think I am) looking for:
A way to generate a Hash (e.g. SHA1 or MD5?) or a Checksum (e.g. CRC32 but hopefully more) based on input + operations. (The input/operations could themselves be hashed to values more fitting to the checksum generation but it would be nice just to be able to "write to steam".)
So, the question is:
How to generate a Running Hash (or Checksum) in C#?
Also, while there are CRC32 implementations that can be modified for a Running operation, what about running SHAx or MD5 hashes?
Am I missing some sort of handy Stream approach than could be used as an adapter?
(Critiques are welcome, but please also answer the above as applicable. Also, I would prefer not to deal with threads. ;-)
You can call HashAlgorithm.TransformBlock multiple times, then call TransformFinalBlock; the Hash property will then give you the result over all blocks.
Chunk up your input (by reading some number of bytes from a stream) and call TransformBlock with each chunk.
EDIT (from the msdn example):
public static void PrintHashMultiBlock(byte[] input, int size)
{
    SHA256Managed sha = new SHA256Managed();
    int offset = 0;
    while (input.Length - offset >= size)
        offset += sha.TransformBlock(input, offset, size, input, offset);
    sha.TransformFinalBlock(input, offset, input.Length - offset);
    Console.WriteLine("MultiBlock {0:00}: {1}", size, BytesToStr(sha.Hash));
}
Sorry I don't have any example readily available, though for you, you're basically replacing input with your own chunk, then the size would be the number of bytes in that chunk. You will have to keep track of the offset yourself.
Hashes have a build and a finalization phase. You can shove arbitrary amounts of data in during the build phase. The data can be split up as you like. Finally, you finish the hash operation and get your hash.
You can use a writable CryptoStream to write your data. This is the easiest way.
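For instance, a sketch of the CryptoStream route, discarding the bytes into Stream.Null so nothing accumulates in memory while the hash builds up:

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

var md5 = MD5.Create();
using (var cs = new CryptoStream(Stream.Null, md5, CryptoStreamMode.Write))
{
    // Feed the data in arbitrary pieces as it arrives.
    cs.Write(new byte[] { 1, 2, 3 }, 0, 3);
    cs.Write(new byte[] { 4, 5 }, 0, 2);
} // disposing flushes the final block into the hash

byte[] runningHash = md5.Hash; // same as hashing { 1, 2, 3, 4, 5 } in one call
```

Disposing the CryptoStream does not dispose the hash object, so Hash is still readable afterwards.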
You can generate an MD5 hash using the MD5CryptoServiceProvider's ComputeHash method. It takes a stream as input.
Create a memory or file stream, write your hash inputs to that, and then call the ComputeHash method when you are done.
var myStream = new MemoryStream();
// Blah blah, write to the stream...
myStream.Position = 0;
using (var csp = new MD5CryptoServiceProvider())
{
    var myHash = csp.ComputeHash(myStream);
}
EDIT: One possibility to avoid building up massive streams is calling this over and over in a loop and XORing the results:
// Assuming we had this somewhere:
Byte[] myRunningHash = new Byte[16]; // MD5 hashes are 16 bytes
// Later on, from above:
for (var i = 0; i < 16; i++)
    myRunningHash[i] = (byte)(myRunningHash[i] ^ myHash[i]);
EDIT #2: Finally, building on usr's answer below: HashCore and HashFinal are protected, so from outside the class you reach them through TransformBlock and TransformFinalBlock:
using (var csp = new MD5CryptoServiceProvider())
{
    // My example here uses a foreach loop, but an
    // event-driven, stream-like approach is
    // probably more what you are doing here.
    foreach (byte[] someData in myDataThings)
        csp.TransformBlock(someData, 0, someData.Length, null, 0);
    csp.TransformFinalBlock(new byte[0], 0, 0);
    var myHash = csp.Hash;
}
This is the canonical way:
using System;
using System.Security.Cryptography;
using System.Text;
public void CreateHash(string sSourceData)
{
    byte[] sourceBytes;
    byte[] hashBytes;

    // create byte array from source data
    sourceBytes = ASCIIEncoding.ASCII.GetBytes(sSourceData);

    // calculate 16-byte hash code
    hashBytes = new MD5CryptoServiceProvider().ComputeHash(sourceBytes);

    string sOutput = ByteArrayToHexString(hashBytes);
}

static string ByteArrayToHexString(byte[] arrInput)
{
    // two hex characters per byte
    StringBuilder sOutput = new StringBuilder(arrInput.Length * 2);
    // note: looping to Length - 1, as originally posted, drops the last byte
    for (int i = 0; i < arrInput.Length; i++)
    {
        sOutput.Append(arrInput[i].ToString("X2"));
    }
    return sOutput.ToString();
}

Efficiently convert audio bytes - byte[] to short[]

I'm trying to use the XNA microphone to capture audio and pass it to an API I have that analyses the data for display purposes. However, the API requires the audio data in an array of 16-bit integers. So my question is fairly straightforward: what's the most efficient way to convert the byte array into a short array?
private void _microphone_BufferReady(object sender, System.EventArgs e)
{
_microphone.GetData(_buffer);
short[] shorts;
//Convert and pass the 16 bit samples
ProcessData(shorts);
}
Cheers,
Dave
EDIT: This is what I have come up with and seems to work, but could it be done faster?
private short[] ConvertBytesToShorts(byte[] bytesBuffer)
{
    // The shorts array should be half the size of the bytes buffer,
    // as each short represents 2 bytes (16 bits).
    short[] shorts = new short[bytesBuffer.Length / 2];
    int currentStartIndex = 0;
    // note: looping to shorts.Length - 1, as originally posted, skips the last sample
    for (int i = 0; i < shorts.Length; i++)
    {
        // Convert the 2 bytes at currentStartIndex to a short
        shorts[i] = BitConverter.ToInt16(bytesBuffer, currentStartIndex);
        // Increment by 2, ready to combine the next 2 bytes in the buffer
        currentStartIndex += 2;
    }
    return shorts;
}
After reading your update, I can see you need to actually copy a byte array directly into a buffer of shorts, merging bytes. Here's the relevant section from the documentation:
The byte[] buffer format used as a parameter for the SoundEffect constructor, Microphone.GetData method, and DynamicSoundEffectInstance.SubmitBuffer method is PCM wave data. Additionally, the PCM format is interleaved and in little-endian.
Now, if for some weird reason your system has BitConverter.IsLittleEndian == false, then you will need to loop through your buffer, swapping bytes as you go, to convert from little-endian to big-endian. I'll leave the code as an exercise - I am reasonably sure all the XNA systems are little-endian.
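For completeness, a hedged sketch of that exercise. The helper name is made up, and on little-endian systems (i.e. everything XNA ran on) you would never call it:

```csharp
using System;

// Swap each adjacent byte pair in place, converting the little-endian
// interleaved PCM samples for a big-endian host. Only needed when
// BitConverter.IsLittleEndian is false.
static void SwapBytePairsInPlace(byte[] buffer)
{
    for (int i = 0; i + 1 < buffer.Length; i += 2)
    {
        byte tmp = buffer[i];
        buffer[i] = buffer[i + 1];
        buffer[i + 1] = tmp;
    }
}

var pcm = new byte[] { 0x34, 0x12, 0x78, 0x56 };
SwapBytePairsInPlace(pcm);
// pcm is now { 0x12, 0x34, 0x56, 0x78 }
```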
For your purposes, you can just copy the buffer directly using Marshal.Copy or Buffer.BlockCopy. Both will give you the performance of the platform's native memory copy operation, which will be extremely fast:
// Create this buffer once and reuse it! Don't recreate it each time!
short[] shorts = new short[_buffer.Length / 2];

// Option one:
unsafe
{
    fixed (short* pShorts = shorts)
        Marshal.Copy(_buffer, 0, (IntPtr)pShorts, _buffer.Length);
}

// Option two:
Buffer.BlockCopy(_buffer, 0, shorts, 0, _buffer.Length);
This is a performance question, so: measure it!
It is worth pointing out that for measuring performance in .NET you want to do a release build and run without the debugger attached (this allows the JIT to optimise).
Jodrell's answer is worth commenting on: Using AsParallel is interesting, but it is worth checking if the cost of spinning it up is worth it. (Speculation - measure it to confirm: converting byte to short should be extremely fast, so if your buffer data is coming from shared memory and not a per-core cache, most of your cost will probably be in data transfer not processing.)
Also I am not sure that ToArray is appropriate. First of all, it may not be able to create the correct-sized array directly, having to resize the array as it builds it will make it very slow. Additionally it will always allocate the array - which is not slow itself, but adds a GC cost that you almost certainly don't want.
Edit: Based on your updated question, the code in the rest of this answer is not directly usable, as the format of the data is different. And the technique itself (a loop, safe or unsafe) is not as fast as what you can use. See my other answer for details.
So you want to pre-allocate your array. Somewhere out in your code you want a buffer like this:
short[] shorts = new short[_buffer.Length];
And then simply copy from one buffer to the other:
for (int i = 0; i < _buffer.Length; ++i)
    shorts[i] = (short)_buffer[i];
This should be very fast, and the JIT should be clever enough to skip one if not both of the array bounds checks.
And here's how you can do it with unsafe code: (I haven't tested this code, but it should be about right)
unsafe
{
    int length = _buffer.Length;
    fixed (byte* pSrc = _buffer)
    fixed (short* pDst = shorts)
    {
        byte* ps = pSrc;
        short* pd = pDst;
        // note: the originally posted condition (pd < pd + length) never terminates
        while (pd < pDst + length)
            *(pd++) = (short)(*(ps++));
    }
}
Now the unsafe version has the disadvantage of requiring /unsafe, and also it may actually be slower because it prevents the JIT from doing various optimisations. Once again: measure it.
(Also you can probably squeeze more performance if you try some permutations on the above examples. Measure it.)
Finally: Are you sure you want the conversion to be (short)sample? Shouldn't it be something like ((short)sample-128)*256 to take it from unsigned to signed and extend it to the correct bit-width? Update: seems I was wrong on the format here, see my other answer
The best PLINQ I could come up with is here:
private short[] ConvertBytesToShorts(byte[] bytesBuffer)
{
    // The shorts array should be half the size of the bytes buffer,
    // as each short represents 2 bytes (16 bits).
    // AsOrdered keeps the byte pairs aligned after the parallel filter.
    var odd = bytesBuffer.AsParallel().AsOrdered().Where((b, i) => i % 2 != 0);
    var even = bytesBuffer.AsParallel().AsOrdered().Where((b, i) => i % 2 == 0);
    return odd.Zip(even, (o, e) => (short)((o << 8) | e)).ToArray();
}
I'm dubious about the performance, but with enough data and processors, who knows.
If the conversion operation is wrong ((short)((o << 8) | e)), please change it to suit.
