How do I add bits to a MemoryStream - c#

So I've been trying to add bits of a value to a MemoryStream, but the issue is I have no idea how. I've seen that it's used for performance when it comes to networking.
I know I want a function that takes the bit value and how many bits it takes to store that value. For instance, to store the value 3 I would need to allocate 2 bits (0000 0000 0000 0011). I would essentially pack the bits into a byte array and then add that byte array to the MemoryStream:
var ms = new MemoryStream();
ms.WriteByte(1);
ms.WriteByte(1);
ms.WriteByte(1);
ms.WriteByte(1);
ms.WriteByte(1);
WriteBits(2, 3);
WriteBits(1, 1);
void WriteBits(int numBits, int value)
{
/* Convert the "value" to a byte or bytes and add it to the MemoryStream */
}
How do I properly implement this?
Java Example
public void writeBits(int numBits, int value) {
    int bytePos = bitPosition >> 3;
    int bitOffset = 8 - (bitPosition & 7);
    bitPosition += numBits;
    for (; numBits > bitOffset; bitOffset = 8) {
        buffer[bytePos] &= ~bitMaskOut[bitOffset]; // mask out the desired area
        buffer[bytePos++] |= (value >> (numBits - bitOffset)) & bitMaskOut[bitOffset];
        numBits -= bitOffset;
    }
    if (numBits == bitOffset) {
        buffer[bytePos] &= ~bitMaskOut[bitOffset];
        buffer[bytePos] |= value & bitMaskOut[bitOffset];
    } else {
        buffer[bytePos] &= ~(bitMaskOut[numBits] << (bitOffset - numBits));
        buffer[bytePos] |= (value & bitMaskOut[numBits]) << (bitOffset - numBits);
    }
}

So I've been trying to add bits of a value to a MemoryStream
You don't; MemoryStream only handles whole bytes.
So for instance, to store the value 3 I would need to allocate 2 bits
This would only be true if the range of values you want to store is [0, 3]. If you want the possibility of storing any larger value, you need more bits.
How do I properly implement this?
You would need to implement your own bit stream. The Java example keeps a byte[] buffer and a bitPosition; you would need to implement the same. The bit-fiddling code should work just about the same in C#. Once you have a byte[], it is trivial to write it out to whatever stream you want, and it is usually possible to send it directly over the network. A rough C# port is sketched below.
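This is only a sketch of such a port, not tested code; the fixed 1024-byte buffer and the masks table (standing in for the Java bitMaskOut) are assumptions:
using System.IO;

class BitWriter
{
    private readonly byte[] buffer = new byte[1024]; // assumed capacity; grow as needed
    private int bitPosition;
    private static readonly int[] masks = new int[33];

    static BitWriter()
    {
        // masks[n] has the lowest n bits set, mirroring the Java bitMaskOut table
        for (int i = 0; i <= 32; i++)
            masks[i] = (int)((1L << i) - 1);
    }

    public void WriteBits(int numBits, int value)
    {
        int bytePos = bitPosition >> 3;        // the byte we are writing into
        int bitOffset = 8 - (bitPosition & 7); // free bits left in that byte
        bitPosition += numBits;

        for (; numBits > bitOffset; bitOffset = 8)
        {
            buffer[bytePos] &= (byte)~masks[bitOffset]; // clear the target area
            buffer[bytePos++] |= (byte)((value >> (numBits - bitOffset)) & masks[bitOffset]);
            numBits -= bitOffset;
        }
        if (numBits == bitOffset)
        {
            buffer[bytePos] &= (byte)~masks[bitOffset];
            buffer[bytePos] |= (byte)(value & masks[bitOffset]);
        }
        else
        {
            buffer[bytePos] &= (byte)~(masks[numBits] << (bitOffset - numBits));
            buffer[bytePos] |= (byte)((value & masks[numBits]) << (bitOffset - numBits));
        }
    }

    // Copy the used bytes to any Stream, e.g. a MemoryStream
    public void CopyTo(Stream s) => s.Write(buffer, 0, (bitPosition + 7) >> 3);
}
With that in place, WriteBits(2, 3) packs the value 3 into 2 bits, and CopyTo(ms) dumps the packed bytes into the MemoryStream.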
I've seen that it's used for performance when it comes to networking
I think there is a significant misunderstanding here. While you could manually manipulate individual bits, in most cases it would just be a waste of (development) time.
In general, a better way to get good performance is to use existing, well optimized and well designed libraries. And there are a variety of serialization libraries that convert objects to byte streams for you. An example would be protobuf (.NET), which actually encodes numbers with a variable number of bytes.
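As a sketch of what using such a library looks like (this assumes the protobuf-net NuGet package; the Packet type is made up for the example):
using System.IO;
using ProtoBuf;

[ProtoContract]
class Packet
{
    [ProtoMember(1)] public int Id { get; set; }
    [ProtoMember(2)] public int Value { get; set; }
}

class Demo
{
    static void Main()
    {
        var ms = new MemoryStream();
        // Small integers are varint-encoded, so this serializes to just a few bytes
        Serializer.Serialize(ms, new Packet { Id = 1, Value = 3 });
        byte[] payload = ms.ToArray();
    }
}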
If you still need smaller data, it is usually more efficient to use some form of compression. The old classic deflate usually gives a good compromise between size and performance, while algorithms like LZ4 prioritize speed over compression ratio.
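For illustration, a minimal deflate compression helper using the framework's built-in DeflateStream (CompressionLevel.Fastest is just one reasonable choice here):
using System.IO;
using System.IO.Compression;

static byte[] Compress(byte[] data)
{
    using var ms = new MemoryStream();
    // The DeflateStream must be closed before reading ms, hence the inner using
    using (var deflate = new DeflateStream(ms, CompressionLevel.Fastest))
        deflate.Write(data, 0, data.Length);
    return ms.ToArray();
}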

I had exactly the same problem and wrote an entire BitStream library which can handle any reads and writes of an arbitrary number of bits to a MemoryStream (and any other stream, too). The library is open-source, MIT-licensed and fast (https://github.com/martinweihrauch/BitStream).
Writing bits to a MemoryStream
These are the steps to write a value with a given bit length to a specific position in the MemoryStream:
Have a Stream available, e. g. a MemoryStream(), to which you want to write.
Connect this Stream to a new BitStream:
using SharpBitStream;
uint[] testDataUnsigned = { 5, 62, 17, 50, 33 };
var ms = new MemoryStream();
var bs = new BitStream(ms);
Now, you can start writing to the BitStream like this:
foreach(var bits in testDataUnsigned)
{
bs.WriteUnsigned(6, (ulong)bits);
}
Writing can be done as above by only providing the bit length and the value, but you of course also have full control of exactly where to write the bits, like so:
bs.WriteUnsigned(3, 2, 4, 5);
// Overloaded signature of WriteUnsigned:
// public void WriteUnsigned(long offsetByteStream, int offsetBit, int bitLength, ulong value)
// For signed numbers (e. g. -17), use
// bs.WriteSigned(3, 2, 4, -5);
This means you can control that you write to the 4th byte (3, because counting starts at 0) in the underlying byte Stream,
starting from the 3rd (=2) bit position of that byte, with a length of 4 bits and the value 5 (=0b0101).
Reading works similarly:
Just read the next 6 bits, wherever your byte and bit position is (e. g. for loops, etc):
ulong number = bs.ReadUnsigned(6);
// For Signed, use
// long number = bs.ReadSigned(6);
Read from a specific position; in this example, read 4 bits from the 3rd byte in the Stream (2 = 3rd byte, counting from 0), starting with bit #0:
ulong number = bs.ReadUnsigned(2, 0, 4);
// For signed, use
// long number = bs.ReadSigned(2, 0, 4);
Note: The bit offset is always counted from 0, starting at the left-most position.

Related

Why is the last byte of a WasapiLoopbackCapture sample always 3B,3C,3D,3E,BC,BC or BE

I am writing a program to display audio samples on a time/amplitude graph.
I want to use the real-time audio playing on the computer and am using WasapiLoopbackCapture from the CSCore library, but the data shown on my graph always seems erroneous.
When I try to read the audio stream buffer with a bit depth of 32, the last byte of the array written to debug in
Debug.WriteLine(BitConverter.ToString(_sample));
is always 3B,3C,3D,3E,BC,BC or BE
I also found that reading the buffer as 32-bit (capture.WaveFormat.BitsPerSample returns 32) shows completely erroneous amplitudes on the graph, whereas manually setting a bit depth
of 16 and reading the byte array with ReadInt16BigEndian represents each sample somewhat accurately.
capture = new WasapiLoopbackCapture();
_bitDepth = capture.WaveFormat.BitsPerSample;
_byteDepth = _bitDepth / 8;
Debug.WriteLine(_bitDepth);
capture.DataAvailable += (s, a) =>
{
//for loop where i steps through each sample (each sample is multiple bytes)
for(int i = 0; i < a.Buffer.Length; i += _byteDepth)
{
//creates new byte array with 4 bytes / 32 bits
var _sample = new byte[_byteDepth];
//copies 1 sample from the buffer into the _sample byte array
Buffer.BlockCopy(a.Buffer, i, _sample,0,_byteDepth);
Debug.WriteLine(BitConverter.ToString(_sample));
//reads the byte array to an int
var _intSample = BinaryPrimitives.ReadInt32BigEndian(_sample);
currentData.Add(_intSample);
}
};
WASAPI uses 32 bit floating point (float) as the sample format, not a 32 bit integer. That also explains the recurring last byte: in a little-endian float the final byte holds the sign bit and most of the exponent, and for sample amplitudes between roughly -1.0 and 1.0 that byte stays in a narrow band (0x3B-0x3E for positive values, 0xBC-0xBE for negative ones).
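A minimal sketch of reading the buffer as floats instead (property names follow the snippet above; currentData would need to be a list of floats):
capture.DataAvailable += (s, a) =>
{
    // Each 4-byte group is one little-endian IEEE 754 float, roughly in [-1.0, 1.0]
    for (int i = 0; i + 4 <= a.Buffer.Length; i += 4)
    {
        float sample = BitConverter.ToSingle(a.Buffer, i);
        currentData.Add(sample);
    }
};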

c# - fastest method get integer from part of bits

I have byte[] byteArray, usually byteArray.Length = 1-3
I need to decompose the array into bits, take some of them (for example, bits 5-17), and convert those bits to an Int32.
I tried to do this
private static IEnumerable<bool> GetBitsStartingFromLSB(byte b)
{
for (int i = 0; i < 8; i++)
{
yield return (b % 2 != 0);
b = (byte)(b >> 1);
}
}
public static Int32 Bits2Int(ref byte[] source, int offset, int length)
{
List<bool> bools = source.SelectMany(GetBitsStartingFromLSB).ToList();
bools = bools.GetRange(offset, length);
bools.AddRange(Enumerable.Repeat(false, 32-length).ToList() );
int[] array = new int[1];
(new BitArray(bools.ToArray())).CopyTo(array, 0);
return array[0];
}
But this method is too slow, and I have to call it very often.
How can I do this more efficiently?
Thanks a lot! Now I do this:
public static byte[] GetPartOfByteArray(byte[] source, int offset, int length)
{
    byte[] retBytes = new byte[length];
    Buffer.BlockCopy(source, offset, retBytes, 0, length);
    return retBytes;
}
public static Int32 Bits2Int(byte[] source, int offset, int length)
{
    if (source.Length > 4)
    {
        // Cut out the (up to) 4 bytes that contain the requested bits,
        // then rebase the bit offset to the start of that slice
        source = GetPartOfByteArray(source, offset / 8, (source.Length - offset / 8 > 3 ? 4 : source.Length - offset / 8));
        offset -= 8 * (offset / 8);
    }
    // Pad to exactly 4 bytes and reinterpret as a little-endian Int32
    byte[] intBytes = new byte[4];
    source.CopyTo(intBytes, 0);
    Int32 full = BitConverter.ToInt32(intBytes);
    // Shift the wanted bits down to position 0 and mask off the rest
    Int32 mask = (1 << length) - 1;
    return (full >> offset) & mask;
}
And it works very fast!
If you're after "fast", then ultimately you need to do this with bit logic, not LINQ etc. I'm not going to write actual code, but you'd need to (a rough sketch follows the list):
use your offset with / 8 and % 8 to find the starting byte and the bit offset inside that byte
compose however many bytes you need - quite possibly up to 5 if you are after a 32-bit number (because of the possibility of an offset) - for example into a long, in whichever endianness (presumably big-endian?) you expect
use right-shift (>>) on the composed value to drop however-many bits you need to apply the bit offset (i.e. value >>= offset % 8;)
mask out any bits you don't want; for example value &= ~(-1L << length); (the -1 gives you all-ones; the << length creates length zeros at the right hand edge, and the ~ swaps all zeros for ones and ones for zeros, so you now have length ones at the right hand edge)
if the value is signed, you'll need to think about how you want negatives to be handled, especially if you aren't always reading 32 bits
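Not actual code from the answer above, but one possible rendering of those steps (using LSB-first bit numbering and little-endian composition, to match the code in the question):
static int Bits2Int(byte[] source, int offset, int length)
{
    int byteIndex = offset / 8;   // starting byte
    int bitOffset = offset % 8;   // bit offset inside that byte

    // Compose up to 5 bytes, little-endian, into a long
    long value = 0;
    int bytesNeeded = (bitOffset + length + 7) / 8;
    for (int i = 0; i < bytesNeeded && byteIndex + i < source.Length; i++)
        value |= (long)source[byteIndex + i] << (8 * i);

    value >>= bitOffset;          // drop the bit offset
    value &= ~(-1L << length);    // keep only `length` ones at the right edge
    return (int)value;
}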
First of all, you're asking for optimization. But the only things you've said are:
too slow
need to call it often
There's no information on:
how slow is too slow? Have you measured the current code? Have you estimated how fast you need it to be?
how often is "often"?
how large are the source byte arrays?
etc.
Optimization can be done in a multitude of ways. When asking for optimization, everything is important. For example, if source byte[] is 1 or 2 bytes long (yeah, may be ridiculous, but you didn't tell us), and if it rarely changes, then you could get very nice results by caching results. And so on.
So, no solutions from me, just a list of possible performance problems:
private static IEnumerable<bool> GetBitsStartingFromLSB(byte b) // A
{
for (int i = 0; i < 8; i++)
{
yield return (b % 2 != 0); // A
b = (byte)(b >> 1);
}
}
public static Int32 Bits2Int(ref byte[] source, int offset, int length)
{
List<bool> bools = source.SelectMany(GetBitsStartingFromLSB).ToList(); //A,B
bools = bools.GetRange(offset, length); //B
bools.AddRange(Enumerable.Repeat(false, 32-length).ToList() ); //C
int[] array = new int[1]; //D
(new BitArray(bools.ToArray())).CopyTo(array, 0); //D
return array[0]; //D
}
A: LINQ is fun, but not fast unless done carefully. For each input byte, it takes 1 byte, splits it into 8 bools, and passes them around wrapped in a compiler-generated IEnumerable object *). Note that it all needs to be cleaned up later, too. Probably you'd get better performance simply returning a new bool[8] or even a BitArray(size=8).
*) conceptually. In fact yield-return is lazy, so it's not 8 value objects + 1 ref object, but just 1 enumerable that generates items. But then, you're doing .ToList() in (B), so me writing it that way isn't far from the truth.
A2: the 8 is hardcoded. Once you drop that pretty IEnumerable and change it to a constant-sized array-like thing, you can preallocate that array and pass it via parameter to GetBitsStartingFromLSB to further reduce the number of temporary objects created and later thrown away. And since SelectMany visits items one by one without ever going back, that preallocated array can be reused.
B: converts the whole source array to a stream of bools, then converts that to a List. Then discards the whole list except for a small offset-length range of it. Why convert to a list at all? It's just another pack of objects wasted, and the internal data is copied as well, since bool is a value type. You could have taken the range directly from the IEnumerable with .Skip(X).Take(Y).
C: padding a list of bools to 32 items. AddRange/Repeat is fun, but Repeat has to return an IEnumerable - again an object that is created and thrown away. And you're padding the list with false. Drop the list idea, make it a bool[32]. Or a BitArray(32). They start as all-false automatically; that's the default value of a bool. Iterate over the bits from the A+B 'range' and write them into that array by index. The written ones get their values, the unwritten ones stay false. Job done, no objects wasted.
C2: combine the preallocated 32-item array with A+A2. GetBitsStartingFromLSB doesn't need to return anything; it can get a buffer to fill via parameter. And that buffer doesn't need to be an 8-item buffer. You can pass the whole 32-item final array, plus an offset so the function knows exactly where to write. Even fewer objects wasted.
D: finally, all that work to return the selected bits as an integer. A new temporary array is created & wasted, and a new BitArray is effectively created & wasted too. Note that you already do a manual shift-based int->bits conversion in GetBitsStartingFromLSB - why not write a similar method that does bits->int with a few shifts instead? No need for the array & BitArray, some code wiggling, and you save on the allocations and data copying again.
I have no idea how much time/space/etc that will save for you, but these are just a few points that stand out at first glance, without modifying your original idea for the code too much and without doing it all via math & bitshifts in one go. I see Marc Gravell already wrote you some hints too. If you have time to spare, I suggest you try mine first, one by one, and see how (and if at all!) each change affects performance. Just to see. Then you'll probably scrap it all and try a new "do it all via math & bitshifts in one go" version with Marc's hints.

C# Explanation of stream string example

I am using a Microsoft example for interprocess communication. In the example there are two methods for reading/writing a string to/from a stream. The code sends the length of the string being streamed in the data. I need similar code, but need to make some modifications. An explanation of the highlighted lines would be helpful.
In WriteString(), they take the length of the byte array being written and divide it by 256. The opposite is done in ReadString(), but an explanation as to why 256 is used would be great. Then it writes another byte by taking the length and ANDing it with 255. I don't understand the reasoning for this either; I'm thinking it shifts the value, but I don't really understand why this is needed. And then in ReadString() it does a += to the length by reading a byte. An explanation of this would be really helpful. I'm new to streaming and just want to understand exactly what is happening and why.
public string ReadString()
{
int len = 0;
// the next two lines
len = ioStream.ReadByte() * 256;
len += ioStream.ReadByte();
byte[] inBuffer = new byte[len];
ioStream.Read(inBuffer, 0, len);
return streamEncoding.GetString(inBuffer);
}
public int WriteString(string outString)
{
byte[] outBuffer = streamEncoding.GetBytes(outString);
int len = outBuffer.Length;
if (len > UInt16.MaxValue)
{
len = (int)UInt16.MaxValue;
}
// the next two lines
ioStream.WriteByte((byte)(len / 256));
ioStream.WriteByte((byte)(len & 255));
ioStream.Write(outBuffer, 0, len);
ioStream.Flush();
return outBuffer.Length + 2;
}
This code is bad, find a new tutorial.
The 256 stuff is there to convert the low 2 bytes of the length integer to bytes in order to serialize/deserialize them. This is not how it's normally done. Use BinaryReader/Writer, or code based not on multiplication but on binary AND and shifts.
Dividing by 256 is equivalent to x >> 8. Also, this only works with positive integers. x & 255 is used to take the lowest byte; this could simply be (byte)x. Sometimes people write x % 256 for this, which is not idiomatic and has problems with signedness.
Good code would be new byte[] { (byte)(x >> 8), (byte)(x >> 0) } and x = bytes[0] << 8 | bytes[1]. Much simpler, faster and idiomatic. I like writing >> 0, which does nothing, for the sake of symmetry. It's optimized away. This might seem ridiculous for 16 bit ints, but with longer ints there are 4 or 8 components and having one of them slightly off seems like needless inconsistency.
ioStream.Read(inBuffer, 0, len);
is a bug because it assumes the read completes in one chunk. Need a loop or again BinaryReader.
if (len > UInt16.MaxValue)
{
len = (int)UInt16.MaxValue;
}
Note that this snippet silently truncates any string longer than 65535 bytes instead of throwing, so the receiver would get corrupted data with no error. I'm going to use this opportunity to warn anyone who reads this: Microsoft .NET sample code is often of extremely poor quality. Read it with great skepticism.
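For illustration, here is the same wire format with both problems addressed - the truncation replaced by an exception and the read done in a loop. ioStream and streamEncoding are the fields from the question; this is a sketch, not tested drop-in code:
public int WriteString(string outString)
{
    byte[] outBuffer = streamEncoding.GetBytes(outString);
    ushort len = checked((ushort)outBuffer.Length); // throws instead of silently truncating
    ioStream.WriteByte((byte)(len >> 8));           // high byte of the length prefix
    ioStream.WriteByte((byte)len);                  // low byte
    ioStream.Write(outBuffer, 0, len);
    ioStream.Flush();
    return len + 2;
}

public string ReadString()
{
    int hi = ioStream.ReadByte();
    int lo = ioStream.ReadByte();
    if (hi < 0 || lo < 0) throw new EndOfStreamException();
    int len = (hi << 8) | lo;

    byte[] inBuffer = new byte[len];
    int read = 0;
    while (read < len) // Stream.Read may return fewer bytes than requested
    {
        int n = ioStream.Read(inBuffer, read, len - read);
        if (n == 0) throw new EndOfStreamException();
        read += n;
    }
    return streamEncoding.GetString(inBuffer);
}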

Defining a bit[] array in C#

Currently I'm working on a solution for a prime-number calculator/checker. The algorithm is already working and very efficient (0.359 seconds for the first 9012330 primes). Here is a part of the upper region where everything is declared:
const uint anz = 50000000;
uint a = 3, b = 4, c = 3, d = 13, e = 12, f = 13, g = 28, h = 32;
bool[,] prim = new bool[8, anz / 10];
uint max = 3 * (uint)(anz / (Math.Log(anz) - 1.08366));
uint[] p = new uint[max];
Now I wanted to go to the next level and use ulongs instead of uints to cover a larger range (you can see that already), which is where I ran into my problem: the bool array.
As everybody should know, a bool takes up a whole byte, which costs a lot of memory when creating the array... So I'm searching for a more resource-friendly way to do that.
My first idea was a bit array -> not byte! <- to store the bools, but I haven't figured out how to do that yet. So if someone has ever done something like this, I would appreciate any kind of tips and solutions. Thanks in advance :)
You can use the BitArray collection:
http://msdn.microsoft.com/en-us/library/system.collections.bitarray(v=vs.110).aspx
MSDN Description:
Manages a compact array of bit values, which are represented as Booleans, where true indicates that the bit is on (1) and false indicates the bit is off (0).
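A minimal usage sketch (the array size mirrors the anz constant from the question):
using System.Collections;

var composite = new BitArray(50_000_000); // 50 million flags in about 6 MB, all false initially
composite[4] = true;                      // mark 4 as composite
bool sevenIsPrime = !composite[7];        // true - bit 7 was never set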
You can (and should) use well tested and well known libraries.
But if you're looking to learn something (as it seems to be the case) you can do it yourself.
Another reason you may want to use a custom bit array is to use the hard drive to store the array, which comes in handy when calculating primes. To do this you'd need to further split addr, for example lowest 3 bits for the mask, next 28 bits for 256MB of in-memory storage, and from there on - a file name for a buffer file.
Yet another reason for a custom bit array is to compress memory use when specifically searching for primes. After all, more than half of your bits will be 'false' because the numbers corresponding to them are even, so in fact you can both speed up your calculation AND reduce memory requirements if you don't even store the even bits. You can do that by changing the way addr is interpreted. Furthermore, you can also exclude numbers divisible by 3 (only 2 out of every 6 numbers have a chance of being prime), reducing memory requirements by 60% compared to a plain bit array.
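As a sketch of the "don't store the evens" idea (just the index mapping; the surrounding sieve code is up to you):
// Bit i stands for the odd number 2*i + 3, so even numbers are never stored
// and memory use is halved compared to one bit per integer
static long BitIndexOf(long oddNumber) => (oddNumber - 3) / 2;
static long NumberAt(long bitIndex) => 2 * bitIndex + 3;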
Notice the use of shift and logical operators to make the code a bit more efficient. For example,
byte mask = (byte)(1 << (int)(addr & 7));
can be written as
byte mask = (byte)(1 << (int)(addr % 8));
and addr >> 3 can be written as addr / 8.
Testing shift/logical operators vs division shows 2.6s vs 4.8s in favor of shift/logical for 200000000 operations.
Here's the code:
void Main()
{
    var barr = new BitArray(10);
    barr[4] = true;
    Console.WriteLine("Is it " + barr[4]);
    Console.WriteLine("Is it Not " + barr[5]);
}

public class BitArray
{
    private readonly byte[] _buffer;

    public bool this[long addr]
    {
        get
        {
            byte mask = (byte)(1 << (int)(addr & 7));   // the bit inside the byte
            byte val = _buffer[(int)(addr >> 3)];       // the byte holding that bit
            return (val & mask) == mask;
        }
        set
        {
            byte mask = (byte)(1 << (int)(addr & 7));
            int offs = (int)(addr >> 3);                // shift before casting so large addresses work
            if (value)
                _buffer[offs] |= mask;                  // set the bit
            else
                _buffer[offs] &= (byte)~mask;           // clear the bit
        }
    }

    public BitArray(long size)
    {
        _buffer = new byte[size / 8 + 1]; // 8 bools per byte; the spare +1 avoids dealing with rounding
    }
}

How to produce 8 bytes from 4 bytes with a reproducible operation?

I've got 4 bytes of data and need an 8-byte array for a security operation. I should produce these 8 bytes from the 4-byte array, and this should be reproducible.
I was thinking of using the exact byte array and adding 4 extra bytes filled with the AND, OR, XOR... of the initial array in a known sequence. I'm not sure if it's a good idea. I just need an 8-byte array from these 4 bytes, and the operation should be reproducible (the same 8 bytes for the same given 4 bytes). Please give an example in C#.
Why not just pad the existing 4 bytes with another 4 bytes of zeroes? Or repeat the original 4 bytes. For example:
static byte[] Pad(byte[] input)
{
// Alternatively use Array.Resize
byte[] output = new byte[input.Length + 4];
Buffer.BlockCopy(input, 0, output, 0, input.Length);
return output;
}
static byte[] Repeat(byte[] input)
{
byte[] output = new byte[input.Length * 2];
Buffer.BlockCopy(input, 0, output, 0, input.Length);
Buffer.BlockCopy(input, 0, output, input.Length, input.Length);
return output;
}
Both of these fulfil your original criteria, I believe... but I suspect you're looking for something else. If that's the case, you need to be explicit about what you need.
EDIT: As I've said in the comments, you're basically not adding any real security here - padding will make that clearer, IMO. On the other hand, if you do want some security-through-obscurity, you could find a random number generator that allows seeding, and use that as a starting point. For example:
// Don't use this - see below. Just the concept...
int seed = BitConverter.ToInt32(input, 0); // TODO: Cope with endianness
Random rng = new Random(seed);
byte[] output = new byte[8];
Buffer.BlockCopy(input, 0, output, 0, 4);
for (int i = 4; i < 8; i++) {
output[i] = (byte) rng.Next(256);
}
Now, the reason I've got the comment above is that you probably need an algorithm which is guaranteed not to change between versions of .NET. Find code for something like the Mersenne Twister, for example.
There are multiple methods of doing padding for block ciphers.
This Wikipedia article covers some of the more accepted solutions: http://en.wikipedia.org/wiki/Padding_(cryptography)#Padding_methods
Barring any other considerations, I would use PKCS#7 padding.
How about
bytes.Concat(bytes).ToArray()
(Concat on its own returns a lazy IEnumerable<byte>; ToArray materializes it back into a byte[].)
I would be very careful. If a security operation requires 64 bits worth of data it is probably because it requires that much data. If you create your 64 bits from 32 bits with a known reproducible formula you will still only have 32 bits worth of data.
If the security is not affected by the data you have, you can just fill the remaining four bytes with ones or zeros. But you should really try to get 8 bytes of "real" data.
