Equivalent of StringBuilder for byte arrays - c#

This is a simple one, and one that I thought would have been answered. I did try to find an answer on here, but didn't come up with anything - so apologies if there is something I have missed.
Anyway, is there an equivalent of StringBuilder but for byte arrays?
I'm not bothered about all the different overloads of Append() - but I'd like to see Append(byte) and Append(byte[]).
Is there anything around or is it roll-your-own time?

Would MemoryStream work for you? The interface might not be quite as convenient, but it offers a simple way to append bytes, and when you are done you can get the contents as a byte[] by calling ToArray().
A more StringBuilder-like interface could probably be achieved by making an extension method.
Update
Extension methods could look like this:
public static class MemoryStreamExtensions
{
public static void Append(this MemoryStream stream, byte value)
{
stream.Append(new[] { value });
}
public static void Append(this MemoryStream stream, byte[] values)
{
stream.Write(values, 0, values.Length);
}
}
Usage:
MemoryStream stream = new MemoryStream();
stream.Append(67);
stream.Append(new byte[] { 68, 69 });
byte[] data = stream.ToArray(); // gets an array with bytes 67, 68 and 69

The MemoryStream approach is good, but if you want StringBuilder-like behavior, add a BinaryWriter. BinaryWriter provides all the Write overloads you could want.
MemoryStream stream = new MemoryStream();
BinaryWriter writer = new BinaryWriter(stream);
writer.Write((byte)67);
writer.Write(new byte[] { 68, 69 });
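To get the accumulated bytes back out, the snippet above still needs a flush and a ToArray() call. A minimal sketch of the full round trip follows; the class name ByteBuilderDemo and the extra ushort write are illustrative assumptions, not part of the answer:

```csharp
using System.IO;

public static class ByteBuilderDemo
{
    // Builds a byte[] incrementally, much like StringBuilder builds a string.
    public static byte[] Build()
    {
        using (var stream = new MemoryStream())
        using (var writer = new BinaryWriter(stream))
        {
            writer.Write((byte)67);               // single byte
            writer.Write(new byte[] { 68, 69 });  // raw byte array, no length prefix
            writer.Write((ushort)0x0102);         // multi-byte values are written little-endian
            writer.Flush();                       // push anything buffered into the stream
            return stream.ToArray();              // copy of exactly the bytes written
        }
    }
}
```

Calling ByteBuilderDemo.Build() should yield the five bytes 67, 68, 69, 2, 1.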

Probably List<byte>:
var byteList = new List<byte>();
byteList.Add(42);
byteList.AddRange(new byte[] { 1, 2, 3 });

Use List<byte>. Then, when you want it as an array, you can call ToArray().

using System;
using System.IO;
public static class MemoryStreams
{
public static MemoryStream Append(
this MemoryStream stream
, byte value
, out bool done)
{
try
{
stream.WriteByte(value);
done = true;
}
catch { done = false; }
return stream;
}
public static MemoryStream Append(
this MemoryStream stream
, byte[] value
, out bool done
, uint offset = 0
, Nullable<uint> count = null)
{
try
{
var rLength = (uint)((value == null) ? 0 : value.Length);
var rOffset = unchecked((int)(offset & 0x7FFFFFFF));
var rCount = unchecked((int)((count ?? rLength) & 0x7FFFFFFF));
stream.Write(value, rOffset, rCount);
done = true;
}
catch { done = false; }
return stream;
}
}

Look at this project at codeproject.
It's just one class that implements the solution with a MemoryStream as suggested in other answers.

Related

Fastest way to save/load enum[] (byte) to a file

The array has duplicate elements and their order is important (must be kept). I have to save/load hundreds of these files constantly and each file may hold an array up to 100,000 elements.
The code below is an example of what I'm currently doing to save/load the files. Since IO is slow, I got a significant speed improvement by casting the enums to byte before serialization (reducing the file size by 10x). I'm not sure I should be using BinaryFormatter though.
I'm still looking for improvements as everything should be as quick as possible, is there a better alternative to what I'm currently doing? How would you do it?
enum DogBreed : byte { Bulldog, Poodle, Beagle, Rottweiler, Chihuahua }
DogBreed[] myDogs = { DogBreed.Beagle, DogBreed.Poodle, DogBreed.Beagle, DogBreed.Bulldog };
public void Save(string path)
{
BinaryFormatter formatter = new BinaryFormatter();
FileStream stream = new FileStream(path, FileMode.Create);
byte[] myDogsInByte = Array.ConvertAll(myDogs, new Converter<DogBreed, byte>(DogBreedToByte));
formatter.Serialize(stream, myDogsInByte);
stream.Close();
}
public bool Load(string path)
{
if (!File.Exists(path))
{
return false;
}
BinaryFormatter formatter = new BinaryFormatter();
FileStream stream = new FileStream(path, FileMode.Open);
byte[] myDogsInByte = formatter.Deserialize(stream) as byte[];
myDogs = Array.ConvertAll(myDogsInByte, new Converter<byte, DogBreed>(ByteToDogBreed));
stream.Close();
return true;
}
private byte DogBreedToByte(DogBreed db)
{
return (byte)db;
}
private DogBreed ByteToDogBreed(byte bt)
{
return (DogBreed)bt;
}
EDIT: New code based on Jeremy's suggestion. The code is working; I'll try to test its performance and post the results here as soon as I can.
enum DogBreed : byte { Bulldog, Poodle, Beagle, Rottweiler, Chihuahua }
DogBreed[] myDogs = { DogBreed.Beagle, DogBreed.Poodle, DogBreed.Beagle, DogBreed.Bulldog };
public void Save(string path)
{
byte[] myDogsInByte = new byte[myDogs.Length];
Array.Copy(myDogs,myDogsInByte,myDogs.Length);
File.WriteAllBytes(path, myDogsInByte);
}
public bool Load(string path)
{
if (!File.Exists(path))
{
return false;
}
byte[] myDogsInByte = File.ReadAllBytes(path);
myDogs = (DogBreed[])(object)myDogsInByte;
return true;
}
While the C# compiler will complain if you attempt to directly assign a byte[] to an enum array, the runtime doesn't care.
var bytes = File.ReadAllBytes(path);
myDogs = (DogBreed[])(object)bytes;
The VS debugger will show that myDogs is really a byte array, but accessing an element from the array works just fine.
Update:
ArgumentException: Object must be an array of primitives.
So File.WriteAllBytes() doesn't like being tricked with an enum[]. You should be able to use Array.Copy to quickly duplicate the enum values into a byte[].
var buffer = new byte[myDogs.Length];
Array.Copy(myDogs, buffer, myDogs.Length);
File.WriteAllBytes(path, buffer);
Of course that's not a free operation, but it should be fairly fast even for large arrays.

What is the best way of reading/writing structured binary data in C#

Like in C we could use structure pointers to read or write structured binary data like file headers etc, is there a similar way to do this in C#?
Using BinaryReader and BinaryWriter over a MemoryStream tends to be the best way in my opinion.
Parsing binary data:
byte[] buf = f // some data from somewhere
using (var ms = new MemoryStream(buf, false)) { // Read-only
var br = new BinaryReader(ms);
UInt32 len = br.ReadUInt32();
// ...
}
Generating binary data:
byte[] result;
using (var ms = new MemoryStream()) { // Expandable
var bw = new BinaryWriter(ms);
UInt32 len = 0x1337;
bw.Write(len);
// ...
result = ms.ToArray(); // Copy only the bytes written; GetBuffer() would also return unused capacity.
}
They allow you to read and write all of the primitive types you'd need for most file headers, such as (U)Int16, (U)Int32, (U)Int64, Single and Double, as well as byte, char and arrays of those. There is direct support for strings, but only if:
The string is prefixed with the length, encoded as an integer seven bits at a time.
This only seems useful to me if you wrote the string with BinaryWriter in the first place. Other layouts are easy enough to handle, however. Say your string is prefixed by a DWord length, followed by that many ASCII characters:
int len = (int)br.ReadUInt32();
string s = Encoding.ASCII.GetString(br.ReadBytes(len));
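The writing side of that DWord-prefixed layout is symmetric. Here is a minimal sketch of both directions; the class and method names (LengthPrefixedAscii, Write, Read, RoundTrip) are made up for illustration and are not part of any library:

```csharp
using System.IO;
using System.Text;

public static class LengthPrefixedAscii
{
    // Writes a 4-byte unsigned length followed by the ASCII bytes of the string.
    public static void Write(BinaryWriter writer, string value)
    {
        byte[] bytes = Encoding.ASCII.GetBytes(value);
        writer.Write((uint)bytes.Length);
        writer.Write(bytes);
    }

    // Reads the 4-byte length, then exactly that many bytes, and decodes them as ASCII.
    public static string Read(BinaryReader reader)
    {
        int length = checked((int)reader.ReadUInt32());
        byte[] bytes = reader.ReadBytes(length);
        if (bytes.Length != length)
        {
            // ReadBytes returns fewer bytes when the stream ends early.
            throw new EndOfStreamException("String payload was truncated.");
        }
        return Encoding.ASCII.GetString(bytes);
    }

    // Writes a string into memory and reads it straight back.
    public static string RoundTrip(string value)
    {
        using (var ms = new MemoryStream())
        {
            var writer = new BinaryWriter(ms);
            Write(writer, value);
            writer.Flush();
            ms.Position = 0; // rewind before reading
            return Read(new BinaryReader(ms));
        }
    }
}
```

LengthPrefixedAscii.RoundTrip("HEADER") should return the same string it was given.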
Note that I do not have the BinaryReader and BinaryWriter objects wrapped in a using() block. This is because, although they are IDisposable, all their Dispose() does is call Dispose() on the underlying stream (in these examples, the MemoryStream).
Since BinaryReader and BinaryWriter are just a set of Read()/Write() wrappers around the underlying stream, I don't see why they're IDisposable anyway. It's just confusing when you try to do The Right Thing and call Dispose() on all your IDisposables, and suddenly your stream is disposed.
To read arbitrarily-structured data (a struct) from a binary file, you first need this:
public static T ToStructure<T>(byte[] data)
{
unsafe
{
fixed (byte* p = &data[0])
{
return (T)Marshal.PtrToStructure(new IntPtr(p), typeof(T));
}
}
}
You can then:
public static T Read<T>(BinaryReader reader) where T: new()
{
T instance = new T();
return ToStructure<T>(reader.ReadBytes(Marshal.SizeOf(instance)));
}
To write, convert the struct object to a byte array:
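public static byte[] ToByteArray(object obj)
{
int len = Marshal.SizeOf(obj);
byte[] arr = new byte[len];
IntPtr ptr = Marshal.AllocHGlobal(len);
try
{
// fDeleteOld must be false: the freshly allocated block holds no previous structure to destroy.
Marshal.StructureToPtr(obj, ptr, false);
Marshal.Copy(ptr, arr, 0, len);
}
finally
{
// Release the unmanaged block even if marshaling throws.
Marshal.FreeHGlobal(ptr);
}
return arr;
}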
...and then just write the resulting byte array to a file using a BinaryWriter.
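For reference, the same round trip can be done without an unsafe block by pinning the managed array with GCHandle. This is a sketch under assumptions: the FileHeader struct and the StructBytes helper are hypothetical names, and the explicit Pack = 1 layout is one choice among several.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical 8-byte header; Pack = 1 prevents padding between fields.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct FileHeader
{
    public uint Magic;
    public ushort Version;
    public ushort Flags;
}

public static class StructBytes
{
    // Marshals a struct into a new byte array of exactly Marshal.SizeOf bytes.
    public static byte[] ToBytes<T>(T value) where T : struct
    {
        byte[] bytes = new byte[Marshal.SizeOf(typeof(T))];
        GCHandle handle = GCHandle.Alloc(bytes, GCHandleType.Pinned);
        try
        {
            // false: the target memory does not contain an old structure to free.
            Marshal.StructureToPtr(value, handle.AddrOfPinnedObject(), false);
        }
        finally
        {
            handle.Free();
        }
        return bytes;
    }

    // Marshals the leading bytes of the array back into a struct.
    public static T FromBytes<T>(byte[] bytes) where T : struct
    {
        if (bytes.Length < Marshal.SizeOf(typeof(T)))
        {
            throw new ArgumentException("Buffer is smaller than the structure.");
        }
        GCHandle handle = GCHandle.Alloc(bytes, GCHandleType.Pinned);
        try
        {
            return (T)Marshal.PtrToStructure(handle.AddrOfPinnedObject(), typeof(T));
        }
        finally
        {
            handle.Free();
        }
    }
}
```

The field values survive StructBytes.FromBytes<FileHeader>(StructBytes.ToBytes(header)), and the byte order in the array is whatever the host platform uses, so a file format with a fixed endianness still needs explicit handling.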
Here is a simple example showing how to read and write data in binary format to and from a file.
using System;
using System.IO;
namespace myFileRead
{
class Program
{
static void Main(string[] args)
{
// Let's create new data file.
string myFileName = @"C:\Integers.dat";
//check if already exists
if (File.Exists(myFileName))
{
Console.WriteLine(myFileName + " already exists in the selected directory.");
return;
}
FileStream fs = new FileStream(myFileName, FileMode.CreateNew);
// Instantiate a BinaryWriter to write data
BinaryWriter bw = new BinaryWriter(fs);
// write some data with bw
for (int i = 0; i < 100; i++)
{
bw.Write((int)i);
}
bw.Close();
fs.Close();
// Instantiate a reader to read content from file
fs = new FileStream(myFileName, FileMode.Open, FileAccess.Read);
BinaryReader br = new BinaryReader(fs);
// Read data from the file
for (int i = 0; i < 100; i++)
{
//read data as Int32
Console.WriteLine(br.ReadInt32());
}
//close the file
br.Close();
fs.Close();
}
}
}

Why does StringWriter.ToString return `System.Byte[]` and not the data?

UnZipFile method writes the data from inputStream to outputWriter.
Why does sr.ToString() return System.Byte[] and not the data?
using (var sr = new StringWriter())
{
UnZipFile(response.GetResponseStream(), sr);
var content = sr.ToString();
}
public static void UnZipFile(Stream inputStream, TextWriter outputWriter)
{
using (var zipStream = new ZipInputStream(inputStream))
{
ZipEntry currentEntry;
if ((currentEntry = zipStream.GetNextEntry()) != null)
{
var size = 2048;
var data = new byte[size];
while (true)
{
size = zipStream.Read(data, 0, size);
if (size > 0)
{
outputWriter.Write(data);
}
else
{
break;
}
}
}
}
}
The problem is on the line:
outputWriter.Write(data);
StringWriter.Write has no overload expecting a byte[]. Therefore, Write(Object) is called instead. And according to MSDN:
Writes the text representation of an object to the text string or stream by calling the ToString method on that object.
Calling ToString on a byte array returns the type name, System.Byte[], which explains how you get that string in your StringWriter.
The reason is simple:
data is of type byte[]. There is no overload for byte[] on StringWriter, so it uses the overload for object. That overload calls ToString() on the byte array, which simply returns the type name.
Your code is equivalent to this:
outputWriter.Write(data.ToString());
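The overload resolution is easy to reproduce in isolation. The sketch below uses a hypothetical helper class, StringWriterDemo, and assumes the payload is UTF-8 text for the decoded variant:

```csharp
using System.IO;
using System.Text;

public static class StringWriterDemo
{
    // Passing a byte[] binds to Write(object), which writes data.ToString().
    public static string WriteRawArray(byte[] data)
    {
        using (var writer = new StringWriter())
        {
            writer.Write(data);
            return writer.ToString();
        }
    }

    // Decoding the bytes first binds to Write(string) and writes the actual text.
    public static string WriteDecoded(byte[] data, int count)
    {
        using (var writer = new StringWriter())
        {
            writer.Write(Encoding.UTF8.GetString(data, 0, count));
            return writer.ToString();
        }
    }
}
```

With the bytes 104, 105 ("hi"), WriteRawArray produces the type name while WriteDecoded produces the text. The count argument matters in a read loop, where only part of the buffer holds fresh data.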
theateist,
Looking at the other answers here, I have to agree: ToString() returns System.Byte[] because that is what you are putting into the writer. Anything written to the StringWriter as an object has its own ToString method called first (i.e. byte[].ToString() returns "System.Byte[]"). The whole idea of a StringWriter is to write into a string buffer (a StringBuilder), so if your file were large enough (bigger than 2048 bytes), your output would be "System.Byte[]System.Byte[]" and so on. Try decompressing into a memory stream and then reading from that stream instead; it may give you a better understanding of what you are looking at. (Code not tested, just an example.)
using (Stream ms = new MemoryStream())
{
UnZipFile(response.GetResponseStream(), ms);
string content;
ms.Position = 0;
using(StreamReader s = new StreamReader(ms))
{
content = s.ReadToEnd();
}
}
public static void UnZipFile(Stream inputStream, Stream outputWriter)
{
using (var zipStream = new ZipInputStream(inputStream))
{
ZipEntry currentEntry;
if ((currentEntry = zipStream.GetNextEntry()) != null)
{
int size = 2048;
byte[] data = new byte[size];
while (true)
{
size = zipStream.Read(data, 0, data.Length);
if (size > 0)
{
// Write only the bytes read in this pass, not the whole buffer.
outputWriter.Write(data, 0, size);
}
else
{
break;
}
}
}
}
}
Another idea would be to use an encoding to get the string:
public string UnZipFile(Stream inputStream)
{
string tmp = null; // stays null if the archive has no entries
using (ZipInputStream zipStream = new ZipInputStream(inputStream))
{
ZipEntry currentEntry;
if ((currentEntry = zipStream.GetNextEntry()) != null)
{
using (MemoryStream ms = new MemoryStream())
{
byte[] data = new byte[2048];
int size;
while ((size = zipStream.Read(data, 0, data.Length)) > 0)
{
// Write only the bytes read in this pass.
ms.Write(data, 0, size);
}
// Decode once, after all bytes have been collected.
tmp = Encoding.Default.GetString(ms.ToArray());
}
}
}
return tmp;
}
Or as one last idea, you could actually change your original code to have
outputWriter.Write(Encoding.Default.GetString(data, 0, size));
Instead of
outputWriter.Write(data);
By the way, please avoid the var keyword in posts. Maybe it's just my pet peeve, but I find code less readable when the variable types are implicit.
StringWriter.Write:MSDN
StringWriter.ToString:MSDN

Gzip uncompress from string error, The magic number in GZip header is not correct

I am trying to replicate the php function gzuncompress in C#
So far I have part of the following code working; see the comments and code below.
I think the tricky bit is the conversion between byte[] and string.
How can I fix this, and what did I miss?
I am using .Net 3.5 environment
var plaintext = Console.ReadLine();
Console.WriteLine("string to byte[] then to string");
byte[] buff = Encoding.UTF8.GetBytes(plaintext);
var compress = GZip.GZipCompress(buff);
//Uncompress working below
try
{
var unpressFromByte = GZip.GZipUncompress(compress);
Console.WriteLine("uncompress successful by uncompress byte[]");
}catch
{
Console.WriteLine("uncompress failed by uncompress byte[]");
}
var compressString = Encoding.UTF8.GetString(compress);
Console.WriteLine(compressString);
var compressBuff = Encoding.UTF8.GetBytes(compressString);
Console.WriteLine(Encoding.UTF8.GetString(compressBuff));
//Uncompress not working below by using string
//The magic number in GZip header is not correct
try
{
var uncompressFromString = GZip.GZipUncompress(compressBuff);
Console.WriteLine("uncompress successful by uncompress string");
}
catch
{
Console.WriteLine("uncompress failed by uncompress string");
}
code for class Gzip
public static class GZip
{
public static byte[] GZipUncompress(byte[] data)
{
using (var input = new MemoryStream(data))
using (var gzip = new GZipStream(input, CompressionMode.Decompress))
using (var output = new MemoryStream())
{
gzip.CopyTo(output);
return output.ToArray();
}
}
public static byte[] GZipCompress(byte[] data)
{
using (var input = new MemoryStream(data))
using (var output = new MemoryStream())
{
using (var gzip = new GZipStream(output, CompressionMode.Compress, true))
{
input.CopyTo(gzip);
}
return output.ToArray();
}
}
public static long CopyTo(this Stream source, Stream destination)
{
var buffer = new byte[2048];
int bytesRead;
long totalBytes = 0;
while ((bytesRead = source.Read(buffer, 0, buffer.Length)) > 0)
{
destination.Write(buffer, 0, bytesRead);
totalBytes += bytesRead;
}
return totalBytes;
}
}
This is inappropriate:
var compressString = Encoding.UTF8.GetString(compress);
compress isn't a UTF-8-encoded piece of text. You should treat it as arbitrary binary data - which isn't appropriate to pass into Encoding.GetString. If you really need to convert arbitrary binary data into text, use Convert.ToBase64String (and then reverse with Convert.FromBase64String):
var compressString = Convert.ToBase64String(compress);
Console.WriteLine(compressString);
var compressBuff = Convert.FromBase64String(compressString);
That may or may not match what PHP does, but it's a safe way of representing arbitrary binary data as text, unlike treating the binary data as if it were valid UTF-8-encoded text.
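The difference is easy to observe on the first few bytes of a gzip stream. In the sketch below, the helper class BinaryAsText is a hypothetical name; the sample bytes 0x1F 0x8B are the gzip magic number that the error message in the question refers to:

```csharp
using System;
using System.Linq;
using System.Text;

public static class BinaryAsText
{
    // Decodes arbitrary bytes as UTF-8 text and encodes them again.
    public static bool SurvivesUtf8(byte[] data)
    {
        byte[] roundTripped = Encoding.UTF8.GetBytes(Encoding.UTF8.GetString(data));
        return roundTripped.SequenceEqual(data);
    }

    // Encodes arbitrary bytes as Base64 text and decodes them again.
    public static bool SurvivesBase64(byte[] data)
    {
        byte[] roundTripped = Convert.FromBase64String(Convert.ToBase64String(data));
        return roundTripped.SequenceEqual(data);
    }
}
```

For the header bytes { 0x1F, 0x8B, 0x08, 0x00 }, SurvivesUtf8 returns false because 0x8B is not valid UTF-8 on its own and is replaced during decoding, which is how the magic number gets corrupted. SurvivesBase64 returns true.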
I am trying to replicate the php function gzuncompress in C#
Then use GZipStream or DeflateStream classes which are built into the .NET framework for this purpose.

Convert an array of different value types to a byte array

This is what I've come up with so far, but it doesn't seem very optimal, any ideas on better approaches?
public void ToBytes(object[] data, byte[] buffer)
{
byte[] obytes;
int offset = 0;
foreach (object obj in data)
{
if (obj is string)
obytes = System.Text.Encoding.UTF8.GetBytes(((string)obj));
else if (obj is bool)
obytes = BitConverter.GetBytes((bool)obj);
else if (obj is char)
obytes = BitConverter.GetBytes((char)obj);
// And so on for each valuetype
Buffer.BlockCopy(obytes, 0, buffer, offset, obytes.Length);
offset += obytes.Length;
}
}
Well, you could have a map like this:
private static readonly Dictionary<Type, Func<object, byte[]>> Converters =
new Dictionary<Type, Func<object, byte[]>>()
{
{ typeof(string), o => Encoding.UTF8.GetBytes((string) o) },
{ typeof(bool), o => BitConverter.GetBytes((bool) o) },
{ typeof(char), o => BitConverter.GetBytes((char) o) },
...
};
public static void ToBytes(object[] data, byte[] buffer)
{
int offset = 0;
foreach (object obj in data)
{
if (obj == null)
{
// Or do whatever you want
throw new ArgumentException("Unable to convert null values");
}
Func<object, byte[]> converter;
if (!Converters.TryGetValue(obj.GetType(), out converter))
{
throw new ArgumentException("No converter for " + obj.GetType());
}
byte[] obytes = converter(obj);
Buffer.BlockCopy(obytes, 0, buffer, offset, obytes.Length);
offset += obytes.Length;
}
}
You're still specifying the converter for each type, but it's a lot more compact than the if/else form.
There are various other ways of constructing the dictionary, btw. You could do it like this:
private static readonly Dictionary<Type, Func<object, byte[]>> Converters =
new Dictionary<Type, Func<object, byte[]>>();
static WhateverYourTypeIsCalled()
{
AddConverter<string>(Encoding.UTF8.GetBytes);
AddConverter<bool>(BitConverter.GetBytes);
AddConverter<char>(BitConverter.GetBytes);
}
static void AddConverter<T>(Func<T, byte[]> converter)
{
Converters.Add(typeof(T), x => converter((T) x));
}
I see another answer has suggested binary serialization. I'm personally not keen on "opaque" serialization schemes like that. I like to know exactly what's going to be in the data in a way that means I can port it to other platforms.
I would point out, however, that your current scheme doesn't give any sort of delimiter - if you have two strings, you'd have no idea where one stopped and the other started, for example. You also don't store the type information - that may be okay, but it may not be. The variable length issue is usually more important. You might consider using a length-prefix scheme, like the one in BinaryWriter. Indeed, BinaryWriter may well be a simpler solution in general. You'd probably want to still have a map of delegates, but make them actions taking a BinaryWriter and a value. You could then build the map by reflection, or just a hardcoded list of calls.
Then you'd just initialize a BinaryWriter wrapping a MemoryStream, write each value to it appropriately, then call ToArray on the MemoryStream to get the results.
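A minimal sketch of that BinaryWriter-based variant might look like the following. The names BinaryPacker and Writers are invented for illustration, and only four types are registered; strings get BinaryWriter's own 7-bit-encoded length prefix.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class BinaryPacker
{
    // Maps each supported runtime type to an action that writes it with BinaryWriter.
    private static readonly Dictionary<Type, Action<BinaryWriter, object>> Writers =
        new Dictionary<Type, Action<BinaryWriter, object>>
        {
            { typeof(string), (w, o) => w.Write((string)o) }, // length-prefixed UTF-8
            { typeof(bool), (w, o) => w.Write((bool)o) },
            { typeof(char), (w, o) => w.Write((char)o) },
            { typeof(int), (w, o) => w.Write((int)o) },
        };

    public static byte[] ToBytes(object[] data)
    {
        using (var stream = new MemoryStream())
        using (var writer = new BinaryWriter(stream))
        {
            foreach (object obj in data)
            {
                if (obj == null)
                {
                    throw new ArgumentException("Unable to convert null values");
                }
                Action<BinaryWriter, object> write;
                if (!Writers.TryGetValue(obj.GetType(), out write))
                {
                    throw new ArgumentException("No writer for " + obj.GetType());
                }
                write(writer, obj);
            }
            writer.Flush();
            return stream.ToArray(); // exactly the bytes written, no spare capacity
        }
    }
}
```

BinaryPacker.ToBytes(new object[] { "ab", true, 7 }) should produce eight bytes: a one-byte length of 2, the two characters, one byte for the bool, and four little-endian bytes for the int. Because the caller no longer supplies a fixed buffer, there is no risk of overrunning it.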
Probably, you should consider using BinaryFormatter instead:
var formatter = new BinaryFormatter();
var stream = new MemoryStream();
formatter.Serialize(stream, obj);
byte[] result = stream.ToArray();
Beside that, there are some pretty good serialization frameworks like Google Protocol Buffers if you want to avoid reinventing the wheel.
You can use a StreamWriter to write to a memory stream and use its buffer:
{
byte[] result;
using (MemoryStream stream = new MemoryStream())
{
StreamWriter writer = new StreamWriter(stream);
writer.WriteLine("test");
writer.WriteLine(12);
writer.WriteLine(true);
writer.Flush();
result = stream.ToArray(); // GetBuffer() would include unused trailing capacity
}
using(MemoryStream stream=new MemoryStream(result))
{
StreamReader reader = new StreamReader(stream);
while(! reader.EndOfStream)
Console.WriteLine(reader.ReadLine());
}
}
