OK so I have converted a file into a binary format using BinaryWriter. The format is:
number of ints, followed by the ints.
So the code will be something like:
readLineOfNumbers() {
count = read();
int[] a = read(count ints);
return a;
}
Do I use a BinaryReader? The closest thing I can see there is to read everything into a byte[], but then how do I make that an int array? This all has to be done very efficiently as well. I need buffering and so on.
If you use BinaryWriter to create file it makes sense to read it using BinaryReader
Something like:
private static int[] ReadArray(BinaryReader reader)
{
int count = reader.ReadInt32();
int[] data = new int[count];
for (int i = 0; i < count; i++)
{
data[i] = reader.ReadInt32();
}
return data;
}
I don't know of anything within BinaryReader which will read an array of integers, I'm afraid. If you read into a byte array you could then use Buffer.BlockCopy to copy those bytes into an int[], which is probably the fastest form of conversion - although it relies on the endianness of your processor being appropriate for your data.
Have you tried just looping round, calling BinaryReader.ReadInt32() as many times as you need to, and letting the file system do the buffering? You could always add a BufferedStream with a large buffer into the mix if you thought that would help.
int[] original = { 1, 2, 3, 4 }, copy;
byte[] bytes;
using (var ms = new MemoryStream())
{
using (var writer = new BinaryWriter(ms))
{
writer.Write(original.Length);
for (int i = 0; i < original.Length; i++)
writer.Write(original[i]);
}
bytes = ms.ToArray();
}
using (var ms = new MemoryStream(bytes))
using (var reader = new BinaryReader(ms))
{
int len = reader.ReadInt32();
copy = new int[len];
for (int i = 0; i < len; i++)
{
copy[i] = reader.ReadInt32();
}
}
Although personally I'd just read from the stream w/o BinaryReader.
Actually, strictly speaking, if it was me I would use my own serializer, and just:
[ProtoContract]
public class Foo {
[ProtoMember(1, Options = MemberSerializationOptions.Packed)]
public int[] Bar {get;set;}
}
since this will have known endianness, handle buffering, and will use variable-length encoding to help reduce bloat if most of the numbers aren't enormous.
Related
I have an object array containing array of a different type that is not known at compile time, but turns out to be int[], double[] etc.
I want to save these arrays to disk and I don't really need to process their contents online, so I looking for a way to cast the object[] to a byte[] that I then can write to disk.
How can I achieve this?
You may use binary serialization and deserialization for Serializable types.
using System.Runtime.Serialization.Formatters.Binary;
BinaryFormatter binary = BinaryFormatter();
using (FileStream fs = File.Create(file))
{
bs.Serialize(fs, objectArray);
}
Edit: If all these elements of an array are simple types then use BitConverter.
object[] arr = { 10.20, 1, 1.2f, 1.4, 10L, 12 };
using (MemoryStream ms = new MemoryStream())
{
foreach (dynamic t in arr)
{
byte[] bytes = BitConverter.GetBytes(t);
ms.Write(bytes, 0, bytes.Length);
}
}
You could do it the old fashioned way.
static void Main()
{
object [] arrayToConvert = new object[] {1.0,10.0,3.0,4.0, 1.0, 12313.2342};
if (arrayToConvert.Length > 0) {
byte [] dataAsBytes;
unsafe {
if (arrayToConvert[0] is int) {
dataAsBytes = new byte[sizeof(int) * arrayToConvert.Length];
fixed (byte * dataP = &dataAsBytes[0])
// CLR Arrays are always aligned
for(int i = 0; i < arrayToConvert.Length; ++i)
*((int*)dataP + i) = (int)arrayToConvert[i];
} else if (arrayToConvert[0] is double) {
dataAsBytes = new byte[sizeof(double) * arrayToConvert.Length];
fixed (byte * dataP = &dataAsBytes[0]) {
// CLR Arrays are always aligned
for(int i = 0; i < arrayToConvert.Length; ++i) {
double current = (double)arrayToConvert[i];
*((long*)dataP + i) = *(long*)¤t;
}
}
} else {
throw new ArgumentException("Wrong array type.");
}
}
Console.WriteLine(dataAsBytes);
}
}
However, I would recommend that you revisit your design. You should probably be using generics, rather than object arrays.
From here:
List<object> list = ...
byte[] obj = (byte[])list.ToArray(typeof(byte));
or if your list is complex type:
list.CopyTo(obj);
Is there an elegant to emulate the StreamReader.ReadToEnd method with BinaryReader? Perhaps to put all the bytes into a byte array?
I do this:
read1.ReadBytes((int)read1.BaseStream.Length);
...but there must be a better way.
Original Answer (Read Update Below!)
Simply do:
byte[] allData = read1.ReadBytes(int.MaxValue);
The documentation says that it will read all bytes until the end of the stream is reached.
Update
Although this seems elegant, and the documentation seems to indicate that this would work, the actual implementation (checked in .NET 2, 3.5, and 4) allocates a full-size byte array for the data, which will probably cause an OutOfMemoryException on a 32-bit system.
Therefore, I would say that actually there isn't an elegant way.
Instead, I would recommend the following variation of #iano's answer. This variant doesn't rely on .NET 4:
Create an extension method for BinaryReader (or Stream, the code is the same for either).
public static byte[] ReadAllBytes(this BinaryReader reader)
{
const int bufferSize = 4096;
using (var ms = new MemoryStream())
{
byte[] buffer = new byte[bufferSize];
int count;
while ((count = reader.Read(buffer, 0, buffer.Length)) != 0)
ms.Write(buffer, 0, count);
return ms.ToArray();
}
}
There is not an easy way to do this with BinaryReader. If you don't know the count you need to read ahead of time, a better bet is to use MemoryStream:
public byte[] ReadAllBytes(Stream stream)
{
using (var ms = new MemoryStream())
{
stream.CopyTo(ms);
return ms.ToArray();
}
}
To avoid the additional copy when calling ToArray(), you could instead return the Position and buffer, via GetBuffer().
To copy the content of a stream to another, I've solved reading "some" bytes until the end of the file is reached:
private const int READ_BUFFER_SIZE = 1024;
using (BinaryReader reader = new BinaryReader(responseStream))
{
using (BinaryWriter writer = new BinaryWriter(File.Open(localPath, FileMode.Create)))
{
int byteRead = 0;
do
{
byte[] buffer = reader.ReadBytes(READ_BUFFER_SIZE);
byteRead = buffer.Length;
writer.Write(buffer);
byteTransfered += byteRead;
} while (byteRead == READ_BUFFER_SIZE);
}
}
Had the same problem.
First, get the file's size using FileInfo.Length.
Next, create a byte array and set its value to BinaryReader.ReadBytes(FileInfo.Length).
e.g.
var size = new FileInfo(yourImagePath).Length;
byte[] allBytes = yourReader.ReadBytes(System.Convert.ToInt32(size));
Another approach to this problem is to use C# extension methods:
public static class StreamHelpers
{
public static byte[] ReadAllBytes(this BinaryReader reader)
{
// Pre .Net version 4.0
const int bufferSize = 4096;
using (var ms = new MemoryStream())
{
byte[] buffer = new byte[bufferSize];
int count;
while ((count = reader.Read(buffer, 0, buffer.Length)) != 0)
ms.Write(buffer, 0, count);
return ms.ToArray();
}
// .Net 4.0 or Newer
using (var ms = new MemoryStream())
{
stream.CopyTo(ms);
return ms.ToArray();
}
}
}
Using this approach will allow for both reusable as well as readable code.
I use this, which utilizes the underlying BaseStream property to give you the length info you need. It keeps things nice and simple.
Below are three extension methods on BinaryReader:
The first reads from wherever the stream's current position is to the end
The second reads the entire stream in one go
The third utilizes the Range type to specify the subset of data you are interested in.
public static class BinaryReaderExtensions {
public static byte[] ReadBytesToEnd(this BinaryReader binaryReader) {
var length = binaryReader.BaseStream.Length - binaryReader.BaseStream.Position;
return binaryReader.ReadBytes((int)length);
}
public static byte[] ReadAllBytes(this BinaryReader binaryReader) {
binaryReader.BaseStream.Position = 0;
return binaryReader.ReadBytes((int)binaryReader.BaseStream.Length);
}
public static byte[] ReadBytes(this BinaryReader binaryReader, Range range) {
var (offset, length) = range.GetOffsetAndLength((int)binaryReader.BaseStream.Length);
binaryReader.BaseStream.Position = offset;
return binaryReader.ReadBytes(length);
}
}
Using them is then trivial and clear...
// 1 - Reads everything in as a byte array
var rawBytes = myBinaryReader.ReadAllBytes();
// 2 - Reads a string, then reads the remaining data as a byte array
var someString = myBinaryReader.ReadString();
var rawBytes = myBinaryReader.ReadBytesToEnd();
// 3 - Uses a range to read the last 44 bytes
var rawBytes = myBinaryReader.ReadBytes(^44..);
This might be a simple one, but I can't seem to find an easy way to do it. I need to save an array of 84 uint's into an SQL database's BINARY field. So I'm using the following lines in my C# ASP.NET project:
//This is what I have
uint[] uintArray;
//I need to convert from uint[] to byte[]
byte[] byteArray = ???
cmd.Parameters.Add("#myBindaryData", SqlDbType.Binary).Value = byteArray;
So how do you convert from uint[] to byte[]?
How about:
byte[] byteArray = uintArray.SelectMany(BitConverter.GetBytes).ToArray();
This'll do what you want, in little-endian format...
You can use System.Buffer.BlockCopy to do this:
byte[] byteArray = new byte[uintArray.Length * 4];
Buffer.BlockCopy(uintArray, 0, byteArray, 0, uintArray.Length * 4];
http://msdn.microsoft.com/en-us/library/system.buffer.blockcopy.aspx
This will be much more efficient than using a for loop or some similar construct. It directly copies the bytes from the first array to the second.
To convert back just do the same thing in reverse.
There is no built-in conversion function to do this. Because of the way arrays work, a whole new array will need to be allocated and its values filled-in. You will probably just have to write that yourself. You can use the System.BitConverter.GetBytes(uint) function to do some of the work, and then copy the resulting values into the final byte[].
Here's a function that will do the conversion in little-endian format:
private static byte[] ConvertUInt32ArrayToByteArray(uint[] value)
{
const int bytesPerUInt32 = 4;
byte[] result = new byte[value.Length * bytesPerUInt32];
for (int index = 0; index < value.Length; index++)
{
byte[] partialResult = System.BitConverter.GetBytes(value[index]);
for (int indexTwo = 0; indexTwo < partialResult.Length; indexTwo++)
result[index * bytesPerUInt32 + indexTwo] = partialResult[indexTwo];
}
return result;
}
byte[] byteArray = Array.ConvertAll<uint, byte>(
uintArray,
new Converter<uint, byte>(
delegate(uint u) { return (byte)u; }
));
Heed advice from #liho1eye, make sure your uints really fit into bytes, otherwise you're losing data.
If you need all the bits from each uint, you're gonna to have to make an appropriately sized byte[] and copy each uint into the four bytes it represents.
Something like this ought to work:
uint[] uintArray;
//I need to convert from uint[] to byte[]
byte[] byteArray = new byte[uintArray.Length * sizeof(uint)];
for (int i = 0; i < uintArray.Length; i++)
{
byte[] barray = System.BitConverter.GetBytes(uintArray[i]);
for (int j = 0; j < barray.Length; j++)
{
byteArray[i * sizeof(uint) + j] = barray[j];
}
}
cmd.Parameters.Add("#myBindaryData", SqlDbType.Binary).Value = byteArray;
How to write bits (not bytes) to a file with c#, .net? I'm preety stuck with it.
Edit: i'm looking for a different way that just writing every 8 bits as a byte
The smallest amount of data you can write at one time is a byte.
If you need to write individual bit-values. (Like for instance a binary format that requires a 1 bit flag, a 3 bit integer and a 4 bit integer); you would need to buffer the individual values in memory and write to the file when you have a whole byte to write. (For performance, it makes sense to buffer more and write larger chunks to the file).
Accumulate the bits in a buffer (a single byte can qualify as a "buffer")
When adding a bit, left-shift the buffer and put the new bit in the lowest position using OR
Once the buffer is full, append it to the file
I've made something like this to emulate a BitsWriter.
private BitArray bitBuffer = new BitArray(new byte[65536]);
private int bitCount = 0;
// Write one int. In my code, this is a byte
public void write(int b)
{
BitArray bA = new BitArray((byte)b);
int[] pattern = new int[8];
writeBitArray(bA);
}
// Write one bit. In my code, this is a binary value, and the amount of times
public void write(int b, int len)
{
int[] pattern = new int[len];
BitArray bA = new BitArray(len);
for (int i = 0; i < len; i++)
{
bA.Set(i, (b == 1));
}
writeBitArray(bA);
}
private void writeBitArray(BitArray bA)
{
for (int i = 0; i < bA.Length; i++)
{
bitBuffer.Set(bitCount + i, bA[i]);
bitCount++;
}
if (bitCount % 8 == 0)
{
BitArray bitBufferWithLength = new BitArray(new byte[bitCount / 8]);
byte[] res = new byte[bitBuffer.Count / 8];
for (int i = 0; i < bitCount; i++)
{
bitBufferWithLength.Set(i, (bitBuffer[i]));
}
bitBuffer.CopyTo(res, 0);
bitCount = 0;
base.BaseStream.Write(res, 0, res.Length);
}
}
You will have to use bitshifts or binary arithmetic, as you can only write one byte at a time, not individual bits.
I would like to get a byte[] from a float[] as quickly as possible, without looping through the whole array (via a cast, probably). Unsafe code is fine. Thanks!
I am looking for a byte array 4 time longer than the float array (the dimension of the byte array will be 4 times that of the float array, since each float is composed of 4 bytes). I'll pass this to a BinaryWriter.
EDIT:
To those critics screaming "premature optimization":
I have benchmarked this using ANTS profiler before I optimized. There was a significant speed increase because the file has a write-through cache and the float array is exactly sized to match the sector size on the disk. The binary writer wraps a file handle created with pinvoke'd win32 API. The optimization occurs since this lessens the number of function calls.
And, with regard to memory, this application creates massive caches which use plenty of memory. I can allocate the byte buffer once and re-use it many times--the double memory usage in this particular instance amounts to a roundoff error in the overall memory consumption of the app.
So I guess the lesson here is not to make premature assumptions ;)
There is a dirty fast (not unsafe code) way of doing this:
[StructLayout(LayoutKind.Explicit)]
struct BytetoDoubleConverter
{
[FieldOffset(0)]
public Byte[] Bytes;
[FieldOffset(0)]
public Double[] Doubles;
}
//...
static Double Sum(byte[] data)
{
BytetoDoubleConverter convert = new BytetoDoubleConverter { Bytes = data };
Double result = 0;
for (int i = 0; i < convert.Doubles.Length / sizeof(Double); i++)
{
result += convert.Doubles[i];
}
return result;
}
This will work, but I'm not sure of the support on Mono or newer versions of the CLR. The only strange thing is that the array.Length is the bytes length. This can be explained because it looks at the array length stored with the array, and because this array was a byte array that length will still be in byte length. The indexer does think about the Double being eight bytes large so no calculation is necessary there.
I've looked for it some more, and it's actually described on MSDN, How to: Create a C/C++ Union by Using Attributes (C# and Visual Basic), so chances are this will be supported in future versions. I am not sure about Mono though.
Premature optimization is the root of all evil! #Vlad's suggestion to iterate over each float is a much more reasonable answer than switching to a byte[]. Take the following table of runtimes for increasing numbers of elements (average of 50 runs):
Elements BinaryWriter(float) BinaryWriter(byte[])
-----------------------------------------------------------
10 8.72ms 8.76ms
100 8.94ms 8.82ms
1000 10.32ms 9.06ms
10000 32.56ms 10.34ms
100000 213.28ms 739.90ms
1000000 1955.92ms 10668.56ms
There is little difference between the two for small numbers of elements. Once you get into the huge number of elements range, the time spent copying from the float[] to the byte[] far outweighs the benefits.
So go with what is simple:
float[] data = new float[...];
foreach(float value in data)
{
writer.Write(value);
}
There is a way which avoids memory copying and iteration.
You can use a really ugly hack to temporary change your array to another type using (unsafe) memory manipulation.
I tested this hack in both 32 & 64 bit OS, so it should be portable.
The source + sample usage is maintained at https://gist.github.com/1050703 , but for your convenience I'll paste it here as well:
public static unsafe class FastArraySerializer
{
[StructLayout(LayoutKind.Explicit)]
private struct Union
{
[FieldOffset(0)] public byte[] bytes;
[FieldOffset(0)] public float[] floats;
}
[StructLayout(LayoutKind.Sequential, Pack = 1)]
private struct ArrayHeader
{
public UIntPtr type;
public UIntPtr length;
}
private static readonly UIntPtr BYTE_ARRAY_TYPE;
private static readonly UIntPtr FLOAT_ARRAY_TYPE;
static FastArraySerializer()
{
fixed (void* pBytes = new byte[1])
fixed (void* pFloats = new float[1])
{
BYTE_ARRAY_TYPE = getHeader(pBytes)->type;
FLOAT_ARRAY_TYPE = getHeader(pFloats)->type;
}
}
public static void AsByteArray(this float[] floats, Action<byte[]> action)
{
if (floats.handleNullOrEmptyArray(action))
return;
var union = new Union {floats = floats};
union.floats.toByteArray();
try
{
action(union.bytes);
}
finally
{
union.bytes.toFloatArray();
}
}
public static void AsFloatArray(this byte[] bytes, Action<float[]> action)
{
if (bytes.handleNullOrEmptyArray(action))
return;
var union = new Union {bytes = bytes};
union.bytes.toFloatArray();
try
{
action(union.floats);
}
finally
{
union.floats.toByteArray();
}
}
public static bool handleNullOrEmptyArray<TSrc,TDst>(this TSrc[] array, Action<TDst[]> action)
{
if (array == null)
{
action(null);
return true;
}
if (array.Length == 0)
{
action(new TDst[0]);
return true;
}
return false;
}
private static ArrayHeader* getHeader(void* pBytes)
{
return (ArrayHeader*)pBytes - 1;
}
private static void toFloatArray(this byte[] bytes)
{
fixed (void* pArray = bytes)
{
var pHeader = getHeader(pArray);
pHeader->type = FLOAT_ARRAY_TYPE;
pHeader->length = (UIntPtr)(bytes.Length / sizeof(float));
}
}
private static void toByteArray(this float[] floats)
{
fixed(void* pArray = floats)
{
var pHeader = getHeader(pArray);
pHeader->type = BYTE_ARRAY_TYPE;
pHeader->length = (UIntPtr)(floats.Length * sizeof(float));
}
}
}
And the usage is:
var floats = new float[] {0, 1, 0, 1};
floats.AsByteArray(bytes =>
{
foreach (var b in bytes)
{
Console.WriteLine(b);
}
});
If you do not want any conversion to happen, I would suggest Buffer.BlockCopy().
public static void BlockCopy(
Array src,
int srcOffset,
Array dst,
int dstOffset,
int count
)
For example:
float[] floatArray = new float[1000];
byte[] byteArray = new byte[floatArray.Length * 4];
Buffer.BlockCopy(floatArray, 0, byteArray, 0, byteArray.Length);
You're better-off letting the BinaryWriter do this for you. There's going to be iteration over your entire set of data regardless of which method you use, so there's no point in playing with bytes.
Although you can obtain a byte* pointer using unsafe and fixed, you cannot convert the byte* to byte[] in order for the writer to accept it as a parameter without performing data copy. Which you do not want to do as it will double your memory footprint and add an extra iteration over the inevitable iteration that needs to be performed in order to output the data to disk.
Instead, you are still better off iterating over the array of floats and writing each float to the writer individually, using the Write(double) method. It will still be fast because of buffering inside the writer. See sixlettervariables's numbers.
Using the new Span<> in .Net Core 2.1 or later...
byte[] byteArray2 = MemoryMarshal.Cast<float, byte>(floatArray).ToArray();
Or, if Span can be used instead, then a direct reinterpret cast can be done: (very fast - zero copying)
Span<byte> byteArray3 = MemoryMarshal.Cast<float, byte>(floatArray);
// with span we can get a byte, set a byte, iterate, and more.
byte someByte = byteSpan[2];
byteSpan[2] = 33;
I did some crude benchmarks. The time taken for each is in the comments. [release/no debugger/x64]
float[] floatArray = new float[100];
for (int i = 0; i < 100; i++) floatArray[i] = i * 7.7777f;
Stopwatch start = Stopwatch.StartNew();
for (int j = 0; j < 100; j++)
{
start.Restart();
for (int k = 0; k < 1000; k++)
{
Span<byte> byteSpan = MemoryMarshal.Cast<float, byte>(floatArray);
}
long timeTaken1 = start.ElapsedTicks; ////// 0 ticks //////
start.Restart();
for (int k = 0; k < 1000; k++)
{
byte[] byteArray2 = MemoryMarshal.Cast<float, byte>(floatArray).ToArray();
}
long timeTaken2 = start.ElapsedTicks; ////// 26 ticks //////
start.Restart();
for (int k = 0; k < 1000; k++)
{
byte[] byteArray = new byte[sizeof(float) * floatArray.Length];
for (int i = 0; i < floatArray.Length; i++)
BitConverter.GetBytes(floatArray[i]).CopyTo(byteArray, i * sizeof(float));
}
long timeTaken3 = start.ElapsedTicks; ////// 1310 ticks //////
start.Restart();
for (int k = 0; k < 1000; k++)
{
byte[] byteArray = new byte[sizeof(float) * floatArray.Length];
Buffer.BlockCopy(floatArray, 0, byteArray, 0, byteArray.Length);
}
long timeTaken4 = start.ElapsedTicks; ////// 33 ticks //////
start.Restart();
for (int k = 0; k < 1000; k++)
{
byte[] byteArray = new byte[sizeof(float) * floatArray.Length];
MemoryStream memStream = new MemoryStream();
BinaryWriter writer = new BinaryWriter(memStream);
foreach (float value in floatArray)
writer.Write(value);
writer.Close();
}
long timeTaken5 = start.ElapsedTicks; ////// 1080 ticks //////
Console.WriteLine($"{timeTaken1/10,6} {timeTaken2 / 10,6} {timeTaken3 / 10,6} {timeTaken4 / 10,6} {timeTaken5 / 10,6} ");
}
We have a class called LudicrousSpeedSerialization and it contains the following unsafe method:
static public byte[] ConvertFloatsToBytes(float[] data)
{
int n = data.Length;
byte[] ret = new byte[n * sizeof(float)];
if (n == 0) return ret;
unsafe
{
fixed (byte* pByteArray = &ret[0])
{
float* pFloatArray = (float*)pByteArray;
for (int i = 0; i < n; i++)
{
pFloatArray[i] = data[i];
}
}
}
return ret;
}
Although it basically does do a for loop behind the scenes, it does do the job in one line
byte[] byteArray = floatArray.Select(
f=>System.BitConverter.GetBytes(f)).Aggregate(
(bytes, f) => {List<byte> temp = bytes.ToList(); temp.AddRange(f); return temp.ToArray(); });