I was calculating the size of an object (a List that is being populated) using the following code:
long myObjectSize = 0;
System.IO.MemoryStream memoryStreamObject = new System.IO.MemoryStream();
System.Runtime.Serialization.Formatters.Binary.BinaryFormatter binaryBuffer = new System.Runtime.Serialization.Formatters.Binary.BinaryFormatter();
binaryBuffer.Serialize(memoryStreamObject, myListObject);
myObjectSize = memoryStreamObject.Position;
At the initial point the capacity of the memoryStreamObject was 1024.
Later (after adding more elements to the list) it was shown as 2048.
And it seems to keep increasing as the stream content grows. So what is the purpose of Capacity in this scenario?
This is caused by the internal implementation of the MemoryStream. The Capacity property is the size of the internal buffer. This makes sense if the MemoryStream is created with a fixed-size buffer, but in your case the MemoryStream can grow, and the actual implementation doubles the size of the buffer whenever the buffer becomes too small.
Code of MemoryStream
private bool EnsureCapacity(int value)
{
    if (value < 0)
    {
        throw new IOException(Environment.GetResourceString("IO.IO_StreamTooLong"));
    }
    if (value > this._capacity)
    {
        int num = value;
        if (num < 256)
        {
            num = 256;
        }
        if (num < this._capacity * 2)
        {
            num = this._capacity * 2;
        }
        if (this._capacity * 2 > 2147483591)
        {
            num = ((value > 2147483591) ? value : 2147483591);
        }
        this.Capacity = num;
        return true;
    }
    return false;
}
And somewhere in Write
int num = this._position + count;
// snip
if (num > this._capacity && this.EnsureCapacity(num))
The purpose of the capacity for a memory stream and for lists is that the underlying data structure is really an array, and arrays cannot be dynamically resized.
So to start with you use an array with a small(ish) size, but once you add enough data so that the array is no longer big enough you need to create a new array, copy over all the data from the old array to the new array, and then switch to using the new array from now on.
This create+copy takes time; the bigger the array, the longer it takes. As such, if you resized the array to be just big enough every time, you would effectively do this every time you write to the memory stream or add a new element to the list.
Instead you have a capacity, saying "you can use up to this value before having to resize" to reduce the number of create+copy cycles you have to perform.
For instance, if you were to write one byte at a time to this array, and not have this capacity concept, every extra byte would mean one full cycle of create+copy of the entire array. Instead, with the last screenshot in your question, you can write one byte at a time, 520 more times before this create+copy cycle has to be performed.
So this is a performance optimization.
An additional bonus is that repeatedly allocating slightly larger memory blocks will eventually fragment memory, so that you would risk getting "out of memory" exceptions; reducing the number of such allocations also helps to stave off this scenario.
A typical method to calculate this capacity is by just doubling it every time.
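To see this in action, here is a small standalone illustration (my own sketch, not from the question) that writes one byte at a time to a growable MemoryStream and prints Capacity every time it changes:
// Small illustration (assumes using System; using System.IO;):
// Capacity roughly doubles while Length grows one byte at a time.
using (var ms = new MemoryStream())
{
    long lastCapacity = -1;
    for (int i = 0; i < 5000; i++)
    {
        ms.WriteByte(0);
        if (ms.Capacity != lastCapacity)
        {
            Console.WriteLine($"Length={ms.Length}, Capacity={ms.Capacity}");
            lastCapacity = ms.Capacity;
        }
    }
}
// Typically prints Capacity values 256, 512, 1024, 2048, 4096, ...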
Related
I have a byte[] byteArray, usually byteArray.Length = 1-3.
I need to decompose the array into bits, take some of those bits (for example, bits 5-17), and convert them to an Int32.
I tried to do this:
private static IEnumerable<bool> GetBitsStartingFromLSB(byte b)
{
    for (int i = 0; i < 8; i++)
    {
        yield return (b % 2 != 0);
        b = (byte)(b >> 1);
    }
}

public static Int32 Bits2Int(ref byte[] source, int offset, int length)
{
    List<bool> bools = source.SelectMany(GetBitsStartingFromLSB).ToList();
    bools = bools.GetRange(offset, length);
    bools.AddRange(Enumerable.Repeat(false, 32 - length).ToList());

    int[] array = new int[1];
    (new BitArray(bools.ToArray())).CopyTo(array, 0);
    return array[0];
}
But this method is too slow, and I have to call it very often.
How can I do this more efficiently?
Thanks a lot! Now I do this:
public static byte[] GetPartOfByteArray(byte[] source, int offset, int length)
{
    byte[] retBytes = new byte[length];
    Buffer.BlockCopy(source, offset, retBytes, 0, length);
    return retBytes;
}

public static Int32 Bits2Int(byte[] source, int offset, int length)
{
    if (source.Length > 4)
    {
        source = GetPartOfByteArray(source, offset / 8, (source.Length - offset / 8 > 3 ? 4 : source.Length - offset / 8));
        offset -= 8 * (offset / 8);
    }
    byte[] intBytes = new byte[4];
    source.CopyTo(intBytes, 0);
    Int32 full = BitConverter.ToInt32(intBytes, 0);
    Int32 mask = (1 << length) - 1;
    return (full >> offset) & mask;
}
And it works very fast!
If you're after "fast", then ultimately you need to do this with bit logic, not LINQ etc. I'm not going to write actual code, but you'd need to:
use your offset with / 8 and % 8 to find the starting byte and the bit-offset inside that byte
compose however many bytes you need - quite possibly up to 5 if you are after a 32-bit number (because of the possibility of an offset); for example into a long, in whichever endianness (presumably big-endian?) you expect
use right-shift (>>) on the composed value to drop however-many bits you need to apply the bit-offset (i.e. value >>= offset % 8;)
mask out any bits you don't want; for example value &= ~(-1L << length); (the -1 gives you all-ones; the << length creates length zeros at the right hand edge, and the ~ swaps all zeros for ones and ones for zeros, so you now have length ones at the right hand edge)
if the value is signed, you'll need to think about how you want negatives to be handled, especially if you aren't always reading 32 bits
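To make those steps concrete, here is a minimal sketch of my own (Marc deliberately left the real code as an exercise), assuming the same LSB-first, little-endian bit numbering that the code in the question uses; the name ExtractBits is just illustrative:
public static int ExtractBits(byte[] source, int offset, int length)
{
    int byteIndex = offset / 8;   // starting byte
    int bitOffset = offset % 8;   // bit offset inside that byte

    // Compose up to 5 bytes into a 64-bit value (little-endian order).
    long composed = 0;
    int bytesNeeded = (bitOffset + length + 7) / 8;
    for (int i = 0; i < bytesNeeded && byteIndex + i < source.Length; i++)
    {
        composed |= (long)source[byteIndex + i] << (8 * i);
    }

    composed >>= bitOffset;          // drop the bit offset
    composed &= ~(-1L << length);    // mask out any bits you don't want
    return (int)composed;
}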
First of all, you're asking for optimization. But the only things you've said are:
too slow
need to call it often
There's no information on:
how slow is too slow? Have you measured the current code? Have you estimated how fast you need it to be?
how often is "often"?
how large are the source byte arrays?
etc.
Optimization can be done in a multitude of ways. When asking for optimization, everything is important. For example, if the source byte[] is 1 or 2 bytes long (yeah, maybe ridiculous, but you didn't tell us), and if it rarely changes, then you could get very nice results by caching results. And so on.
So, no solutions from me, just a list of possible performance problems:
private static IEnumerable<bool> GetBitsStartingFromLSB(byte b) // A
{
    for (int i = 0; i < 8; i++)
    {
        yield return (b % 2 != 0); // A
        b = (byte)(b >> 1);
    }
}

public static Int32 Bits2Int(ref byte[] source, int offset, int length)
{
    List<bool> bools = source.SelectMany(GetBitsStartingFromLSB).ToList(); // A, B
    bools = bools.GetRange(offset, length); // B
    bools.AddRange(Enumerable.Repeat(false, 32 - length).ToList()); // C

    int[] array = new int[1]; // D
    (new BitArray(bools.ToArray())).CopyTo(array, 0); // D
    return array[0]; // D
}
A: LINQ is fun, but not fast unless done carefully. For each input byte, it takes 1 byte, splits it into 8 bools, and passes them around wrapped in a compiler-generated IEnumerable object *). Note that it all needs to be cleaned up later, too. You'd probably get better performance simply returning a new bool[8] or even a BitArray(size=8).
*) conceptually. In fact yield-return is lazy, so it's not 8 value objects + 1 ref object, but just 1 enumerable that generates items. But then, you're doing .ToList() in (B), so writing it that way isn't far from the truth.
A2: the 8 is hardcoded. Once you drop that pretty IEnumerable and change it to a constant-sized array-like thing, you can preallocate that array and pass it via parameter to GetBitsStartingFromLSB to further reduce the amount of temporary objects created and later thrown away. And since SelectMany visits items one-by-one without ever going back, that preallocated array can be reused.
B: Converts the whole source array to a stream of bits (bools), then converts that to a List. Then it discards the whole list except for a small offset-length range. Why convert to a list at all? It's just another pack of objects wasted, and the internal data is copied as well, since bool is a value type. You could have taken the range directly from the IEnumerable with .Skip(X).Take(Y).
C: padding a list of bools to have 32 items. AddRange/Repeat is fun, but Repeat has to return an IEnumerable - again another object that is created and thrown away. You're padding the list with false. Drop the list idea, make it a bool[32], or a BitArray(32). They start out as all false automatically; that's the default value of a bool. Iterate over the bits from the 'range' in A+B and write them into that array by index. Those written will have their value, those unwritten will stay false. Job done, no objects wasted.
C2: connect preallocating 32-item array with A+A2. GetBitsStartingFromLSB doesn't need to return anything, it may get a buffer to be filled via parameter. And that buffer doesn't need to be 8-item buffer. You may pass the whole 32-item final array, and pass an offset so that function knows exactly where to write. Even less objects wasted.
D: finally, all that work to return the selected bits as an integer. A new temporary array is created & wasted, and a new BitArray is effectively created & wasted too. Note that you were already doing a manual bit-shift conversion int->bits in GetBitsStartingFromLSB, so why not just write a similar method that does some shifts and converts bits->int instead? You already know the order of the bits. No need for the array & BitArray, just some code wiggling, and you save on the allocations and data copying again.
I have no idea how much time/space/etc. that will save for you, but those are just a few points that stand out at first glance, without modifying your original idea for the code too much and without doing it all via math & bit-shifts in one go. I see Marc Gravell has already written some hints for you too. If you have time to spare, I suggest you try mine first, one by one, and see how (and if at all!) each change affects performance - just to see. Then you'll probably scrap it all and try the new "do it all via math & bit-shifts in one go" version with Marc's hints.
I want to know if there's overhead when converting between byte arrays and streams (specifically MemoryStreams, when using MemoryStream.ToArray() and the MemoryStream(byte[]) constructor). I assume it temporarily doubles memory usage.
For example, I read as a stream, convert to bytes, and then convert to stream again.
But getting rid of that byte conversion will require a bit of a rewrite. I don't want to waste time rewriting it if it doesn't make a difference.
So, yes - you are correct in assuming that ToArray duplicates the memory in the stream.
If you do not want to do this (for efficiency reasons), you could modify the bytes directly in the stream. Take a look at this:
// create some bytes: 0,1,2,3,4,5,6,7...
var originalBytes = Enumerable.Range(0, 256).Select(Convert.ToByte).ToArray();
using (var ms = new MemoryStream(originalBytes)) // ms is referencing bytes array, not duplicating it
{
    // var duplicatedBytes = ms.ToArray(); // copy of originalBytes array

    // If you don't want to duplicate the bytes but want to
    // modify the buffer directly, you could do this:
    var bufRef = ms.GetBuffer();
    for (var i = 0; i < bufRef.Length; ++i)
    {
        bufRef[i] = Convert.ToByte(bufRef[i] ^ 0x55);
    }

    // or this:
    /*
    ms.TryGetBuffer(out var buf);
    for (var i = 0; i < buf.Count; ++i)
    {
        buf[i] = Convert.ToByte(buf[i] ^ 0x55);
    }
    */

    // or this:
    /*
    for (var i = 0; i < ms.Length; ++i)
    {
        ms.Position = i;
        var b = ms.ReadByte();
        ms.Position = i;
        ms.WriteByte(Convert.ToByte(b ^ 0x55));
    }
    */
}
// originalBytes will now be 85,84,87,86...
ETA:
Edited to add in Blindy's examples. Thanks! -- Totally forgot about GetBuffer and had no idea about TryGetBuffer
Does MemoryStream(byte[]) cause a memory copy?
No, it's a non-resizable stream, and as such no copy is necessary.
Does MemoryStream.ToArray() cause a memory copy?
Yes, by design it creates a copy of the active buffer. This is to cover the resizable case, where the buffer used by the stream is not the same buffer that was initially provided due to reallocations to increase/decrease its size.
Alternatives to MemoryStream.ToArray() that don't cause memory copy?
Sure, you have MemoryStream.TryGetBuffer(out ArraySegment<byte> buffer), which returns a segment pointing to the internal buffer, whether or not it's resizable. If it's non-resizable, it's a segment into your original array.
You also have MemoryStream.GetBuffer, which returns the entire internal buffer. Note that in the resizable case, this will be a lot larger than the actual used stream space, and you'll have to adjust for that in code.
And lastly, you don't always actually need a byte array, sometimes you just need to write it to another stream (a file, a socket, a compression stream, an Http response, etc). For this, you have MemoryStream.CopyTo[Async], which also doesn't perform any copies.
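To illustrate that last point, here is a minimal sketch (my own example; the file name is made up) of writing a MemoryStream's contents to another stream without ever calling ToArray():
using (var ms = new MemoryStream())
using (var file = File.Create("output.bin")) // hypothetical destination
{
    // ... write data into ms here ...
    ms.Position = 0;    // rewind before copying
    ms.CopyTo(file);    // streams the contents, no intermediate byte[]
}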
I need to work with large lists of floats, but I am hitting memory limits on x86 systems. I do not know the final length, so I need to use an expandable type. On x64 systems, I can use <gcAllowVeryLargeObjects>.
My current data type:
List<RawData> param1 = new List<RawData>();
List<RawData> param2 = new List<RawData>();
List<RawData> param3 = new List<RawData>();
public class RawData
{
    public string name;
    public List<float> data;
}
The length of the paramN lists is low (currently 50 or lower), but data can be 10m+. When the length is 50, I am hitting memory limits (OutOfMemoryException) at just above 1m data points, and when the length is 25, I hit the limit at just above 2m data points. (If my calculations are right, that is exactly 200MB, plus the size of name, plus overhead). What can I use to increase this limit?
Edit: I tried using List<List<float>> with a max inner list size of 1 << 17 (131072), which increased the limit somewhat, but still not as far as I want.
Edit2: I tried reducing the chunk size in the List<List<float>> to 8192, and I got OOM at ~2.3m elements, with Task Manager reading ~1.4GB for the process. It looks like I need to reduce memory usage between the data source and the storage, or trigger GC more often. I was able to gather 10m data points in an x64 process on a PC with 4GB RAM; IIRC the process never went over 3GB.
Edit3: I condensed my code down to just the parts that handle the data. http://pastebin.com/maYckk84
Edit4: I had a look in dotMemory, and found that my data structure does take up ~1GB with the settings I was testing (50ch * 3 params * 2m events = 300,000,000 float elements). I guess I will need to limit it on x86, or figure out how to write the data to disk in this format as I receive it.
First of all, on x86 systems the memory limit is 2GB, not 200MB. I presume your problem is much trickier than that: you have aggressive LOH (large object heap) fragmentation.
The CLR uses different heaps for small and large objects. An object is large if its size exceeds 85,000 bytes. The LOH is a very fractious thing: it is not eager to return unused memory to the OS, and it is very poor at defragmentation.
.NET's List is an implementation of the ArrayList data structure: it stores elements in an array, which has a fixed size; when the array is filled, a new array with double the size is created. That continuous growth of the array with your amount of data is a "starvation" scenario for the LOH.
So, you have to use a tailor-made data structure to suit your needs, e.g. a list of chunks, with each chunk small enough not to land in the LOH. Here is a small prototype:
public class ChunkedList
{
    private readonly List<float[]> _chunks = new List<float[]>();
    private const int ChunkSize = 8000;
    private int _count = 0;

    public void Add(float item)
    {
        int chunk = _count / ChunkSize;
        int ind = _count % ChunkSize;
        if (ind == 0)
        {
            _chunks.Add(new float[ChunkSize]);
        }
        _chunks[chunk][ind] = item;
        _count++;
    }

    public float this[int index]
    {
        get
        {
            if (index < 0 || index >= _count) throw new IndexOutOfRangeException();
            int chunk = index / ChunkSize;
            int ind = index % ChunkSize;
            return _chunks[chunk][ind];
        }
        set
        {
            if (index < 0 || index >= _count) throw new IndexOutOfRangeException();
            int chunk = index / ChunkSize;
            int ind = index % ChunkSize;
            _chunks[chunk][ind] = value;
        }
    }

    // other code you require
}
With ChunkSize = 8000, every chunk will take only 32,000 bytes, so it will not get into the LOH. _chunks will get into the LOH only when there are about 16,000 chunks in the collection, which is more than 128 million elements in the collection (about 500 MB).
UPD: I've performed some stress tests on the sample above. The OS is x64, the solution platform is x86, and ChunkSize is 20000.
First:
var list = new ChunkedList();
for (int i = 0; ; i++)
{
    list.Add(0.1f);
}
OutOfMemoryException is raised at ~324,000,000 elements
Second:
public class RawData
{
    public string Name;
    public ChunkedList Data = new ChunkedList();
}

var list = new List<RawData>();
for (int i = 0; ; i++)
{
    var raw = new RawData { Name = "Test" + i };
    for (int j = 0; j < 20 * 1000 * 1000; j++)
    {
        raw.Data.Add(0.1f);
    }
    list.Add(raw);
}
OutOfMemoryException is raised at i=17, j~12,000,000. 17 RawData instances were successfully created, with 20 million data points each - about 352 million data points in total.
I'm trying to use the XNA microphone to capture audio and pass it to an API I have that analyses the data for display purposes. However, the API requires the audio data as an array of 16-bit integers. So my question is fairly straightforward: what's the most efficient way to convert the byte array into a short array?
private void _microphone_BufferReady(object sender, System.EventArgs e)
{
    _microphone.GetData(_buffer);

    short[] shorts;
    // Convert and pass the 16-bit samples
    ProcessData(shorts);
}
Cheers,
Dave
EDIT: This is what I have come up with and it seems to work, but could it be done faster?
private short[] ConvertBytesToShorts(byte[] bytesBuffer)
{
    // The shorts array should be half the size of the bytes buffer, as each short represents 2 bytes (16 bits)
    short[] shorts = new short[bytesBuffer.Length / 2];

    int currentStartIndex = 0;
    for (int i = 0; i < shorts.Length; i++)
    {
        // Convert the 2 bytes at currentStartIndex to a short
        shorts[i] = BitConverter.ToInt16(bytesBuffer, currentStartIndex);
        // Increment by 2, ready to combine the next 2 bytes in the buffer
        currentStartIndex += 2;
    }
    return shorts;
}
After reading your update, I can see you need to actually copy a byte array directly into a buffer of shorts, merging bytes. Here's the relevant section from the documentation:
The byte[] buffer format used as a parameter for the SoundEffect constructor, Microphone.GetData method, and DynamicSoundEffectInstance.SubmitBuffer method is PCM wave data. Additionally, the PCM format is interleaved and in little-endian.
Now, if for some weird reason your system has BitConverter.IsLittleEndian == false, then you will need to loop through your buffer, swapping bytes as you go, to convert from little-endian to big-endian. I'll leave the code as an exercise - I am reasonably sure all the XNA systems are little-endian.
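For reference, a rough sketch of what that exercise might look like (my own guess at it, untested, and only relevant on a big-endian platform):
if (!BitConverter.IsLittleEndian)
{
    // Swap each pair of bytes in place: little-endian <-> big-endian 16-bit samples
    for (int i = 0; i < _buffer.Length - 1; i += 2)
    {
        byte tmp = _buffer[i];
        _buffer[i] = _buffer[i + 1];
        _buffer[i + 1] = tmp;
    }
}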
For your purposes, you can just copy the buffer directly using Marshal.Copy or Buffer.BlockCopy. Both will give you the performance of the platform's native memory copy operation, which will be extremely fast:
// Create this buffer once and reuse it! Don't recreate it each time!
short[] shorts = new short[_buffer.Length/2];
// Option one:
unsafe
{
    fixed (short* pShorts = shorts)
        Marshal.Copy(_buffer, 0, (IntPtr)pShorts, _buffer.Length);
}
// Option two:
Buffer.BlockCopy(_buffer, 0, shorts, 0, _buffer.Length);
This is a performance question, so: measure it!
It is worth pointing out that for measuring performance in .NET you want to do a release build and run without the debugger attached (this allows the JIT to optimise).
Jodrell's answer is worth commenting on: Using AsParallel is interesting, but it is worth checking if the cost of spinning it up is worth it. (Speculation - measure it to confirm: converting byte to short should be extremely fast, so if your buffer data is coming from shared memory and not a per-core cache, most of your cost will probably be in data transfer not processing.)
Also I am not sure that ToArray is appropriate. First of all, it may not be able to create the correctly-sized array directly; having to resize the array as it builds will make it very slow. Additionally, it will always allocate the array - which is not slow in itself, but adds a GC cost that you almost certainly don't want.
Edit: Based on your updated question, the code in the rest of this answer is not directly usable, as the format of the data is different. And the technique itself (a loop, safe or unsafe) is not as fast as what you can use. See my other answer for details.
So you want to pre-allocate your array. Somewhere out in your code you want a buffer like this:
short[] shorts = new short[_buffer.Length];
And then simply copy from one buffer to the other:
for (int i = 0; i < _buffer.Length; ++i)
    shorts[i] = (short)_buffer[i];
This should be very fast, and the JIT should be clever enough to skip one if not both of the array bounds checks.
And here's how you can do it with unsafe code: (I haven't tested this code, but it should be about right)
unsafe
{
    int length = _buffer.Length;
    fixed (byte* pSrc = _buffer) fixed (short* pDst = shorts)
    {
        byte* ps = pSrc;
        short* pd = pDst;
        while (pd < pDst + length)
            *(pd++) = (short)(*(ps++));
    }
}
Now the unsafe version has the disadvantage of requiring /unsafe, and also it may actually be slower because it prevents the JIT from doing various optimisations. Once again: measure it.
(Also you can probably squeeze more performance if you try some permutations on the above examples. Measure it.)
Finally: Are you sure you want the conversion to be (short)sample? Shouldn't it be something like ((short)sample-128)*256 to take it from unsigned to signed and extend it to the correct bit-width? Update: seems I was wrong on the format here, see my other answer
The best PLINQ I could come up with is here.
private short[] ConvertBytesToShorts(byte[] bytesBuffer)
{
    // The shorts array should be half the size of the bytes buffer, as each short represents 2 bytes (16 bits)
    var odd = bytesBuffer.AsParallel().Where((b, i) => i % 2 != 0);
    var even = bytesBuffer.AsParallel().Where((b, i) => i % 2 == 0);
    return odd.Zip(even, (o, e) => (short)((o << 8) | e)).ToArray();
}
I'm dubious about the performance, but with enough data and processors, who knows.
If the conversion operation is wrong ((short)((o << 8) | e)) please change to suit.
I am using this:
byte[] buffer = new byte[10240];
As I understand it, this initializes a 10 KB buffer array filled with zeros.
What's the fastest way to fill this array (or initialize it) with junk data every time?
I need to use that array more than 5000 times and fill it with different junk data every time, which is why I am looking for a fast method to do it. The array size will also have to change every time.
Answering 'the fastest way' is impossible without knowing what the properties of your junk data have to be. Why isn't all zeroes valid junk data?
That said, this is a fast way to fill your array with meaningless numbers.
Random r = new Random();
r.NextBytes(buffer);
You might also look at implementing your own Linear congruential generator if Random isn't fast enough for you. They're simple to implement and fast, but won't give high quality random numbers. (It's unclear to me if you need those or not.)
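For illustration, a minimal LCG sketch (my own; the constants are the common Numerical Recipes ones, and the class name is made up):
class Lcg
{
    private uint _state;

    public Lcg(uint seed) { _state = seed; }

    public void NextBytes(byte[] buffer)
    {
        for (int i = 0; i < buffer.Length; i++)
        {
            _state = _state * 1664525u + 1013904223u; // one LCG step
            buffer[i] = (byte)(_state >> 24);         // take the high byte
        }
    }
}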
If you are happy with the data being random, but being created from a random seed buffer, then you could do the following:
public class RandomBufferGenerator
{
    private readonly Random _random = new Random();
    private readonly byte[] _seedBuffer;

    public RandomBufferGenerator(int maxBufferSize)
    {
        _seedBuffer = new byte[maxBufferSize];
        _random.NextBytes(_seedBuffer);
    }

    public byte[] GenerateBufferFromSeed(int size)
    {
        int randomWindow = _random.Next(0, size);

        byte[] buffer = new byte[size];

        Buffer.BlockCopy(_seedBuffer, randomWindow, buffer, 0, size - randomWindow);
        Buffer.BlockCopy(_seedBuffer, 0, buffer, size - randomWindow, randomWindow);

        return buffer;
    }
}
I found it to be approximately 60-70 times faster than generating a random buffer from scratch each time.
START: From seed buffer.
00:00:00.009 END : From seed buffer. (Items = 5,000; Per Second = 500,776.20)
START: From scratch.
00:00:00.604 END : From scratch. (Items = 5,000; Per Second = 8,276.95)
Update
The general idea is to create a RandomBufferGenerator once, and then use this instance to generate random buffers, e.g.:
RandomBufferGenerator generator = new RandomBufferGenerator(MaxBufferSize);
byte[] randomBuffer1 = generator.GenerateBufferFromSeed(10 * 1024);
byte[] randomBuffer2 = generator.GenerateBufferFromSeed(5 * 1024);
...
Look at the System.Random.NextBytes() method
As another option to consider, Marshal.AllocHGlobal will allocate unmanaged memory. It doesn't zero out the memory - you get whatever happened to be there - so it's very fast. Of course, you now have to work with this memory using unsafe code, and if you need to pull it into the managed space you are better off with Random.NextBytes.
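A minimal sketch of that option (assuming using System.Runtime.InteropServices; the sizes are just examples):
IntPtr unmanaged = Marshal.AllocHGlobal(10240); // uninitialized, so "junk" by default
try
{
    // Only if the data is later needed on the managed side:
    byte[] managed = new byte[10240];
    Marshal.Copy(unmanaged, managed, 0, managed.Length);
}
finally
{
    Marshal.FreeHGlobal(unmanaged); // always release unmanaged memory
}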
How junky does the data need to be? Do you mean random? If so, just use the Random class.