C# Explanation of stream string example

C# Explanation of stream string example - c#

I am using a Microsoft example for interprocess communication. In the example there are two methods for reading/writing a string to/from a stream. The code sends the length of the string being streamed in the data. I need similar code, but need to make some modifications. An explanation of the highlighted lines would be helpful.
In WriteString(), they taken the length of the byte array being written and divide it by 256. The opposite is done in ReadString(), but an explanation as to why 256 is used would be great. Then, it writes another byte by taking the length and & it with 255. I don't understand the reasoning for this either. I'm thinking it shifts the value, but I don't really understand why this is needed. And then in ReadString() it does a += to the length by reading a byte. An explanation of this would be really helpful. I'm new to streaming and just want to understand exactly what is happening and why.
public string ReadString()
{
int len = 0;
// the next two lines
len = ioStream.ReadByte() * 256;
len += ioStream.ReadByte();
byte[] inBuffer = new byte[len];
ioStream.Read(inBuffer, 0, len);
return streamEncoding.GetString(inBuffer);
}
public int WriteString(string outString)
{
byte[] outBuffer = streamEncoding.GetBytes(outString);
int len = outBuffer.Length;
if (len > UInt16.MaxValue)
{
len = (int)UInt16.MaxValue;
}
// the next to lines
ioStream.WriteByte((byte)(len / 256));
ioStream.WriteByte((byte)(len & 255));
ioStream.Write(outBuffer, 0, len);
ioStream.Flush();
return outBuffer.Length + 2;
}

This code is bad, find a new tutorial.
The 256 stuff is there to convert the low 2 bytes of the length integer to bytes in order to serialize/deserialize them. This is not how it's normally done. Use BinaryReader/Writer or code not based on multiplication but on binary and and shift.
Dividing by 256 is equivalent to x >> 8. Also this only works with positive integers. x & 255 is used to take the lowest byte. This could simply be (byte)x. Sometimes people write x % 256 for this which not idiomatic and has problems with signedness.
Good code would be new byte[] { (byte)(x >> 8), (byte)(x >> 0) } and x = bytes[1] << 8 | bytes[0]. Much simpler, faster and idiomatic. I like writing >> 0 which does nothing for the sake of symmetry. It's optimized away. This might seem ridiculous for 16 bit ints but with longer ints there are 4 or 8 components and having one of them slightly off seems like needless inconsistency.
ioStream.Read(inBuffer, 0, len);
is a bug because it assumes the read completes in one chunk. Need a loop or again BinaryReader.
if (len > UInt16.MaxValue)
{
len = (int)UInt16.MaxValue;
}
I'm going to use this opportunity to warn anyone who reads this: Microsoft .NET sample code often is of extremely poor quality. Read it with great skepticism.

Related

How do I add bits to a MemoryStream

So I've been trying to add bits of a value to a MemoryStream but the issue is I have no idea how. I've seen that it's used for performance when it comes to networking.
I know I want a function that takes the bit value and how many bits it takes to store that value. So for instance, to store the value 3 I would need to allocate 2 bits 0000 0000 0000 0011. I would essentially pack the bits into a byte array and then add that byte array to the MemoryStream
var ms = new MemoryStream();
ms.WriteByte(1);
ms.WriteByte(1);
ms.WriteByte(1);
ms.WriteByte(1);
ms.WriteByte(1);
WriteBits(2, 3);
WriteBits(1, 1);
void WriteBits(int numbBits, int value)
{
/* Convert the "value" to a byte or bytes and add it to the MemoryStream */
}
How do I properly implement this?
Java Example
public void writeBits(int numBits, int value) {
int bytePos = bitPosition >> 3;
int bitOffset = 8 - (bitPosition & 7);
bitPosition += numBits;
for (; numBits > bitOffset; bitOffset = 8) {
buffer[bytePos] &= ~bitMaskOut[bitOffset]; // mask out the desired area
buffer[bytePos++] |= (value >> (numBits - bitOffset))
& bitMaskOut[bitOffset];
numBits -= bitOffset;
}
if (numBits == bitOffset) {
buffer[bytePos] &= ~bitMaskOut[bitOffset];
buffer[bytePos] |= value & bitMaskOut[bitOffset];
} else {
buffer[bytePos] &= ~(bitMaskOut[numBits] << (bitOffset - numBits));
buffer[bytePos] |= (value & bitMaskOut[numBits]) << (bitOffset - numBits);
}
}

So I've been trying to add bits of a value to a MemoryStream
You don't, MemoryStream only handles bytes.
So for instance, to store the value 3 I would need to allocate 2 bits
This would only be true if the range of values you want to store is [0, 3]. If you want the possibility of storing any larger value you need more bits.
How do I properly implement this?
you would need to implement your own bit-stream. The java example looks like it has a byte[] buffer, and a bitPosition. You would need to implement this. The bif-fiddeling code looks like it should work just about the same in c#. Once you have a byte[] it is trivial to write this out to whatever stream you want, and usually possible to send directly over the network.
I've seen that it's used for performance when it comes to networking
I think there is a significant misunderstanding here. While you could manually manipulate individual bits, in most cases it would just be a waste of (development) time.
In general, a better way to get good performance is to use existing, well optimized and designed libraries. And there are a variety of serialization libraries that converts objects to byte-streams for you. An example would be protobuf (.net), this actually encodes numbers with a variable number of bytes.
If you still need smaller data it is usually more efficient to use some form of compression. The old classic deflate usually gives good compromise between size and performance, while algorithms like lz4 prioritizes speed over compression ratio.

I had exactly the same problem and wrote an entire BitStream library which can handle any reads and writes of an arbitrary number of bits to a MemoryStream (and any other stream, too). The library is open-source, MIT-licensed and fast (https://github.com/martinweihrauch/BitStream).
Writing bits to a MemoryStream.
These are the steps to write a value to a certain number of bits to a specific position in the MemoryStream:
Have a Stream available, e. g. a MemoryStream(), to which you want to write.
Connect this Stream to a new Bitstream
using SharpBitStream;
uint[] testDataUnsigned = { 5, 62, 17, 50, 33 };
var ms = new MemoryStream();
var bs = new BitStream(ms);
Now, you can start writing to the BitStream like this:
foreach(var bits in testDataUnsigned)
{
bs.WriteUnsigned(6, (ulong)bits);
}
Writing can be done as above by only providing the bitlength and the value, but you of course also have full controll of exactly where to write the bits like so:
bs.WriteUnsigned(3, 2, 4, 5);
// Overloaded signature of WriteUnsigned:
// public void WriteUnsigned(long offsetByteStream, int offsetBit, int bitLength, ulong value)
// For signed numbers (e. g. -17), use
// bs.WriteSigned(3, 2, 4, -5);
This means, you can control that you write to the 4th byte (3, because starting at 0) in the underlying byte Stream,
starting from the the 3rd (=2) position of the byte with a length of 6 bits and the value 5 (=0b0101);
Reading works similarly:
Just read the next 6 bits, wherever your byte and bit position is (e. g. for loops, etc):
ulong number = bs.ReadUnsigned(6);
// For Signed, use
// long number = bs.ReadSigned(6);
Read a specific position, in this example read 4 bits from 3rd byte in Stream (2= 3rd position), starting with bit #0:
ulong number = bs.ReadUnsigned(2, 0, 4);
// For signed, use
// long number = bs.ReadSigned(2, 0, 4);
Note: The bit offset is always counting from 0 from the left-most position.

c# - fastest method get integer from part of bits

I have byte[] byteArray, usually byteArray.Length = 1-3
I need decompose an array into bits, take some bits (for example, 5-17), and convert these bits to Int32.
I tried to do this
private static IEnumerable<bool> GetBitsStartingFromLSB(byte b)
{
for (int i = 0; i < 8; i++)
{
yield return (b % 2 != 0);
b = (byte)(b >> 1);
}
}
public static Int32 Bits2Int(ref byte[] source, int offset, int length)
{
List<bool> bools = source.SelectMany(GetBitsStartingFromLSB).ToList();
bools = bools.GetRange(offset, length);
bools.AddRange(Enumerable.Repeat(false, 32-length).ToList() );
int[] array = new int[1];
(new BitArray(bools.ToArray())).CopyTo(array, 0);
return array[0];
}
But this method is too slow, and I have to call it very often.
How can I do this more efficiently?
Thanx a lot! Now i do this:
public static byte[] GetPartOfByteArray( byte[] source, int offset, int length)
{
byte[] retBytes = new byte[length];
Buffer.BlockCopy(source, offset, retBytes, 0, length);
return retBytes;
}
public static Int32 Bits2Int(byte[] source, int offset, int length)
{
if (source.Length > 4)
{
source = GetPartOfByteArray(source, offset / 8, (source.Length - offset / 8 > 3 ? 4 : source.Length - offset / 8));
offset -= 8 * (offset / 8);
}
byte[] intBytes = new byte[4];
source.CopyTo(intBytes, 0);
Int32 full = BitConverter.ToInt32(intBytes);
Int32 mask = (1 << length) - 1;
return (full >> offset) & mask;
}
And it works very fast!

If you're after "fast", then ultimately you need to do this with bit logic, not LINQ etc. I'm not going to write actual code, but you'd need to:
use your offset with / 8 and % 8 to find the starting byte and the bit-offset inside that byte
compose however many bytes you need - quite possibly up to 5 if you are after a 32-bit number (because of the possibility of an offset)
; for example into a long, in whichever endianness (presumably big-endian?) you expect
use right-shift (>>) on the composed value to drop however-many bits you need to apply the bit-offset (i.e. value >>= offset % 8;)
mask out any bits you don't want; for example value &= ~(-1L << length); (the -1 gives you all-ones; the << length creates length zeros at the right hand edge, and the ~ swaps all zeros for ones and ones for zeros, so you now have length ones at the right hand edge)
if the value is signed, you'll need to think about how you want negatives to be handled, especially if you aren't always reading 32 bits

First of all, you're asking for optimization. But the only things you've said are:
too slow
need to call it often
There's no information on:
how much slow is too slow? have you measured current code? have you estimated how fast you need it to be?
how often is "often"?
how large are the source byt arrays?
etc.
Optimization can be done in a multitude of ways. When asking for optimization, everything is important. For example, if source byte[] is 1 or 2 bytes long (yeah, may be ridiculous, but you didn't tell us), and if it rarely changes, then you could get very nice results by caching results. And so on.
So, no solutions from me, just a list of possible performance problems:
private static IEnumerable<bool> GetBitsStartingFromLSB(byte b) // A
{
for (int i = 0; i < 8; i++)
{
yield return (b % 2 != 0); // A
b = (byte)(b >> 1);
}
}
public static Int32 Bits2Int(ref byte[] source, int offset, int length)
{
List<bool> bools = source.SelectMany(GetBitsStartingFromLSB).ToList(); //A,B
bools = bools.GetRange(offset, length); //B
bools.AddRange(Enumerable.Repeat(false, 32-length).ToList() ); //C
int[] array = new int[1]; //D
(new BitArray(bools.ToArray())).CopyTo(array, 0); //D
return array[0]; //D
}
A: LINQ is fun, but not fast unless done carefully. For each input byte, it takes 1 byte, splits that in 8 bools, passing them around wrapped it in a compiler-generated IEnumerable object *). Note that it all needs to be cleaned up later, too. Probably you'd get a better performance simply returning a new bool[8] or even BitArray(size=8).
*) conceptually. In fact yield-return is lazy, so it's not 8valueobj+1refobj, but just 1 enumerable that generates items. But then, you're doing .ToList() in (B), so me writing this in that way isn't that far from truth.
A2: the 8 is hardcoded. Once you drop that pretty IEnumerable and change it to a constant-sized array-like thing, you can preallocate that array and pass it via parameter to GetBitsStartingFromLSB to further reduce the amount of temporary objects created and later thrown away. And since SelectMany visits items one-by-one without ever going back, that preallocated array can be reused.
B: Converts whole Source array to stream of bytes, converts it to List. Then discards that whole list except for a small offset-length range of that list. Why covert to list at all? It's just another pack of objects wasted, and internal data is copied as well, since bool is a valuetype. You could have taken the range directly from IEnumerable by .Skip(X).Take(Y)
C: padding a list of bools to have 32 items. AddRange/Repeat is fun, but Repeat has to return an IEnumerable. It's again another object that is created and throw away. You're padding the list with false. Drop the list idea, make it an bool[32]. Or BitArray(32). They start with false automatically. That's the default value of a bool. Iterate over the those bits from 'range' A+B and write them into that array by index. Those written will have their value, those unwritten will stay false. Job done, no objects wasted.
C2: connect preallocating 32-item array with A+A2. GetBitsStartingFromLSB doesn't need to return anything, it may get a buffer to be filled via parameter. And that buffer doesn't need to be 8-item buffer. You may pass the whole 32-item final array, and pass an offset so that function knows exactly where to write. Even less objects wasted.
D: finally, all that work to return selected bits as an integer. new temporary array is created&wasted, new BitArray is effectively created&wasted too. Note that earlier you were already doing manual bit-shift conversion int->bits in GetBitsStartingFromLSB, why not just create a similar method that will do some shifts and do bits->int instead? If you knew the order of the bits, now you know them as well. No need for array&BitArray, some code wiggling, and you save on that allocations and data copying again.
I have no idea how much time/space/etc will that save for you, but that's just a few points that stand out at first glance, without modifying your original idea for the code too much, without doing-it-all via math&bitshifts in one go, etc. I've seen MarcGravell already wrote you some hints too. If you have time to spare, I suggest you try first mine, one by one, and see how (and if at all !) each change affects performance. Just to see. Then you'll probably scrap it all and try again new "do-it-all via math&bitshifts in one go" version with Marc's hints.

Defining a bit[] array in C#

currently im working on a solution for a prime-number calculator/checker. The algorythm is already working and verry efficient (0,359 seconds for the first 9012330 primes). Here is a part of the upper region where everything is declared:
const uint anz = 50000000;
uint a = 3, b = 4, c = 3, d = 13, e = 12, f = 13, g = 28, h = 32;
bool[,] prim = new bool[8, anz / 10];
uint max = 3 * (uint)(anz / (Math.Log(anz) - 1.08366));
uint[] p = new uint[max];
Now I wanted to go to the next level and use ulong's instead of uint's to cover a larger area (you can see that already), where i tapped into my problem: the bool-array.
Like everybody should know, bool's have the length of a byte what takes a lot of memory when creating the array... So I'm searching for a more resource-friendly way to do that.
My first idea was a bit-array -> not byte! <- to save the bool's, but haven't figured out how to do that by now. So if someone ever did something like this, I would appreciate any kind of tips and solutions. Thanks in advance :)

You can use BitArray collection:
http://msdn.microsoft.com/en-us/library/system.collections.bitarray(v=vs.110).aspx
MSDN Description:
Manages a compact array of bit values, which are represented as Booleans, where true indicates that the bit is on (1) and false indicates the bit is off (0).

You can (and should) use well tested and well known libraries.
But if you're looking to learn something (as it seems to be the case) you can do it yourself.
Another reason you may want to use a custom bit array is to use the hard drive to store the array, which comes in handy when calculating primes. To do this you'd need to further split addr, for example lowest 3 bits for the mask, next 28 bits for 256MB of in-memory storage, and from there on - a file name for a buffer file.
Yet another reason for custom bit array is to compress the memory use when specifically searching for primes. After all more than half of your bits will be 'false' because the numbers corresponding to them would be even, so in fact you can both speed up your calculation AND reduce memory requirements if you don't even store the even bits. You can do that by changing the way addr is interpreted. Further more you can also exclude numbers divisible by 3 (only 2 out of every 6 numbers has a chance of being prime) thus reducing memory requirements by 60% compared to plain bit array.
Notice the use of shift and logical operators to make the code a bit more efficient.
byte mask = (byte)(1 << (int)(addr & 7)); for example can be written as
byte mask = (byte)(1 << (int)(addr % 8));
and addr >> 3 can be written as addr / 8
Testing shift/logical operators vs division shows 2.6s vs 4.8s in favor of shift/logical for 200000000 operations.
Here's the code:
void Main()
{
var barr = new BitArray(10);
barr[4] = true;
Console.WriteLine("Is it "+barr[4]);
Console.WriteLine("Is it Not "+barr[5]);
}
public class BitArray{
private readonly byte[] _buffer;
public bool this[long addr]{
get{
byte mask = (byte)(1 << (int)(addr & 7));
byte val = _buffer[(int)(addr >> 3)];
bool bit = (val & mask) == mask;
return bit;
}
set{
byte mask = (byte) ((value ? 1:0) << (int)(addr & 7));
int offs = (int)addr >> 3;
_buffer[offs] = (byte)(_buffer[offs] | mask);
}
}
public BitArray(long size){
_buffer = new byte[size/8 + 1]; // define a byte buffer sized to hold 8 bools per byte. The spare +1 is to avoid dealing with rounding.
}
}

Efficiently convert audio bytes - byte[] to short[]

I'm trying to use the XNA microphone to capture audio and pass it to an API I have that analyses the data for display purposes. However, the API requires the audio data in an array of 16 bit integers. So my question is fairly straight forward; what's the most efficient way to convert the byte array into a short array?
private void _microphone_BufferReady(object sender, System.EventArgs e)
{
_microphone.GetData(_buffer);
short[] shorts;
//Convert and pass the 16 bit samples
ProcessData(shorts);
}
Cheers,
Dave
EDIT: This is what I have come up with and seems to work, but could it be done faster?
private short[] ConvertBytesToShorts(byte[] bytesBuffer)
{
//Shorts array should be half the size of the bytes buffer, as each short represents 2 bytes (16bits)
short[] shorts = new short[bytesBuffer.Length / 2];
int currentStartIndex = 0;
for (int i = 0; i < shorts.Length - 1; i++)
{
//Convert the 2 bytes at the currentStartIndex to a short
shorts[i] = BitConverter.ToInt16(bytesBuffer, currentStartIndex);
//increment by 2, ready to combine the next 2 bytes in the buffer
currentStartIndex += 2;
}
return shorts;
}

After reading your update, I can see you need to actually copy a byte array directly into a buffer of shorts, merging bytes. Here's the relevant section from the documentation:
The byte[] buffer format used as a parameter for the SoundEffect constructor, Microphone.GetData method, and DynamicSoundEffectInstance.SubmitBuffer method is PCM wave data. Additionally, the PCM format is interleaved and in little-endian.
Now, if for some weird reason your system has BitConverter.IsLittleEndian == false, then you will need to loop through your buffer, swapping bytes as you go, to convert from little-endian to big-endian. I'll leave the code as an exercise - I am reasonably sure all the XNA systems are little-endian.
For your purposes, you can just copy the buffer directly using Marshal.Copy or Buffer.BlockCopy. Both will give you the performance of the platform's native memory copy operation, which will be extremely fast:
// Create this buffer once and reuse it! Don't recreate it each time!
short[] shorts = new short[_buffer.Length/2];
// Option one:
unsafe
{
fixed(short* pShorts = shorts)
Marshal.Copy(_buffer, 0, (IntPtr)pShorts, _buffer.Length);
}
// Option two:
Buffer.BlockCopy(_buffer, 0, shorts, 0, _buffer.Length);

This is a performance question, so: measure it!
It is worth pointing out that for measuring performance in .NET you want to do a release build and run without the debugger attached (this allows the JIT to optimise).
Jodrell's answer is worth commenting on: Using AsParallel is interesting, but it is worth checking if the cost of spinning it up is worth it. (Speculation - measure it to confirm: converting byte to short should be extremely fast, so if your buffer data is coming from shared memory and not a per-core cache, most of your cost will probably be in data transfer not processing.)
Also I am not sure that ToArray is appropriate. First of all, it may not be able to create the correct-sized array directly, having to resize the array as it builds it will make it very slow. Additionally it will always allocate the array - which is not slow itself, but adds a GC cost that you almost certainly don't want.
Edit: Based on your updated question, the code in the rest of this answer is not directly usable, as the format of the data is different. And the technique itself (a loop, safe or unsafe) is not as fast as what you can use. See my other answer for details.
So you want to pre-allocate your array. Somewhere out in your code you want a buffer like this:
short[] shorts = new short[_buffer.Length];
And then simply copy from one buffer to the other:
for(int i = 0; i < _buffer.Length; ++i)
result[i] = ((short)buffer[i]);
This should be very fast, and the JIT should be clever enough to skip one if not both of the array bounds checks.
And here's how you can do it with unsafe code: (I haven't tested this code, but it should be about right)
unsafe
{
int length = _buffer.Length;
fixed(byte* pSrc = _buffer) fixed(short* pDst = shorts)
{
byte* ps = pSrc;
short* pd = pDst;
while(pd < pd + length)
*(pd++) = (short)(*(ps++));
}
}
Now the unsafe version has the disadvantage of requiring /unsafe, and also it may actually be slower because it prevents the JIT from doing various optimisations. Once again: measure it.
(Also you can probably squeeze more performance if you try some permutations on the above examples. Measure it.)
Finally: Are you sure you want the conversion to be (short)sample? Shouldn't it be something like ((short)sample-128)*256 to take it from unsigned to signed and extend it to the correct bit-width? Update: seems I was wrong on the format here, see my other answer

The pest PLINQ I could come up with is here.
private short[] ConvertBytesToShorts(byte[] bytesBuffer)
{
//Shorts array should be half the size of the bytes buffer, as each short represents 2 bytes (16bits)
var odd = buffer.AsParallel().Where((b, i) => i % 2 != 0);
var even = buffer.AsParallell().Where((b, i) => i % 2 == 0);
return odd.Zip(even, (o, e) => {
return (short)((o << 8) | e);
}.ToArray();
}
I'm dubios about the performance but with enough data and processors who knows.
If the conversion operation is wrong ((short)((o << 8) | e)) please change to suit.

How to produce 8 bytes from 4 bytes with a reproducible operation?

I've 4 bytes of data and need an 8 bytes array for a security operation. I should produce these 8 bytes form the 4 bytes byte array and this should be reproducible.
I was thinking of using exact byte array and adding 4 extra bytes and fill them with AND, OR, XOR... of the initial array in a known sequence. I'm not sure if it's a good idea. I just need an 8 byte array from this 4 bytes and the operation should be reproducible (same 8 bytes with same given 4 bytes). Please give an example in C#

Why not just pad the existing 4 bytes with another 4 bytes of zeroes? Or repeat the original 4 bytes. For example:
static byte[] Pad(byte[] input)
{
// Alternatively use Array.Resize
byte[] output = new byte[input.Length + 4];
Buffer.BlockCopy(input, 0, output, 0, input.Length);
return output;
}
static byte[] Repeat(byte[] input)
{
byte[] output = new byte[input.Length * 2];
Buffer.BlockCopy(input, 0, output, 0, input.Length);
Buffer.BlockCopy(input, 0, output, input.Length, input.Length);
return output;
}
Both of these fulfil your original criteria, I believe... but I suspect you're looking for something else. If that's the case, you need to be explicit about what you need.
EDIT: As I've said in the comments, you're basically not adding any real security here - padding will make that clearer, IMO. On the other hand, if you do want some security-through-obscurity, you could find a random number generator that allows seeding, and use that as a starting point. For example:
// Don't use this - see below. Just the concept...
int seed = BitConverter.ToInt32(input, 0); // TODO: Cope with endianness
Random rng = new Random(seed);
byte[] output = new byte[8];
Buffer.BlockCopy(input, 0, output, 0, 4);
for (int i = 4; i < 8; i++) {
output[i] = (byte) rng.Next(256);
}
Now, the reason I've got the comment above is that you probably need an algorithm which is guaranteed not to change between versions of .NET. Find code to something like the Mersenne Twister, for exmaple.

There are multiple methods of doing padding for block ciphers.
This Wikipedia article covers some of the more accepted solutions: http://en.wikipedia.org/wiki/Padding_(cryptography)#Padding_methods
Barring any other considerations, I would use PKCS#7 padding.

How about
bytes.Concat(bytes)

I would be very careful. If a security operation requires 64 bits worth of data it is probably because it requires that much data. If you create your 64 bits from 32 bits with a known reproducible formula you will still only have 32 bits worth of data.
If the security is not affected by the data you have you can just fill the the remaining four bytes with ones or zeros. But you should really try to get 8 bytes of "real" data.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.