I am parsing a byte array containing values of different types stored in a fixed format. For example, the first 4 bytes could be an int containing the size of an array -
let's say an array of doubles, so the next 8 bytes represent a double, the first element of that array, and so on. In theory it could contain values of other types, but let's say we can only have
bool, int, uint, short, ushort, long, ulong, float, double and arrays of each of these. Simple approach:
public class FixedFormatParser
{
private byte[] _Contents;
private int _CurrentPos = 0;
public FixedFormatParser(byte[] contents)
{
_Contents = contents;
}
bool ReadBool()
{
bool res = BitConverter.ToBoolean(_Contents, _CurrentPos);
_CurrentPos += sizeof(bool);
return res;
}
int ReadInt()
{
int res = BitConverter.ToInt32(_Contents, _CurrentPos);
_CurrentPos += sizeof(int);
return res;
}
// etc. for uint, short, ushort, long, ulong, float, double
int[] ReadIntArray()
{
int size = ReadInt();
if (size == 0)
return null;
int[] res = new int[size];
for (int i = 0; i < size; i++)
res[i] = ReadInt();
return res;
}
// etc. for bool, uint, short, ushort, long, ulong, float, double
}
I can obviously write 18 methods to cover each case, but it seems like there should be a way to generalize this:
bool val = Read<bool>();
long[] arr = ReadArray<long>(); // or ReadArray(Read<long>);
Obviously I don't mean writing 2 wrappers on top of the 18 methods to allow for this syntax. The syntax is not important; the code duplication is the issue. Another consideration is performance. Ideally there would not be any (or much of) a performance hit. Thanks.
Update:
Regarding the other questions that are supposedly duplicates: I disagree, as none of them addressed the particular generalization I am after, but one came pretty close. The first answer in
C# Reading Byte Array
described wrapping BinaryReader. That covers 9 of the 18 methods, so half of the problem is addressed. I would still need to write all of the various array reads.
public class FixedFormatParser2 : BinaryReader
{
public FixedFormatParser2(byte[] input) : base(new MemoryStream(input))
{
}
public override string ReadString()
{
    // (placeholder: string handling isn't part of this format yet)
    throw new NotSupportedException();
}
public double[] ReadDoubleArray()
{
int size = ReadInt32();
if (size == 0)
return null;
double[] res = new double[size];
for (int i = 0; i < size; i++)
res[i] = ReadDouble();
return res;
}
}
How do I not write a separate ReadXXXArray for each of the types?
Nearest I got to it:
public void WriteCountedArray(dynamic[] input)
{
if (input == null || input.Length == 0)
Write((int)0);
else
{
Write(input.Length);
foreach (dynamic val in input)
Write(val);
}
}
This compiles, but calling it is cumbersome and expensive:
using (FixedFormatWriter writer = new FixedFormatWriter())
{
    double[] array = new double[3];
    // ... assign values
    writer.WriteCountedArray(array.Select(x => (dynamic)x).ToArray());
}
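For reference, a sketch of the shape I'm aiming for, assuming a runtime where System.Runtime.InteropServices.MemoryMarshal is available (.NET Core 2.1+ or the System.Memory package); MemoryMarshal.Read<T> reinterprets the bytes at a position as any unmanaged T, so two methods could cover all 18 cases:

using System;
using System.Runtime.InteropServices;

public class GenericFixedFormatParser
{
    private readonly byte[] _contents;
    private int _currentPos;

    public GenericFixedFormatParser(byte[] contents)
    {
        _contents = contents;
    }

    // One method covers bool, int, uint, short, ushort, long, ulong, float and double.
    public T Read<T>() where T : unmanaged
    {
        T res = MemoryMarshal.Read<T>(_contents.AsSpan(_currentPos));
        _currentPos += System.Runtime.CompilerServices.Unsafe.SizeOf<T>();
        return res;
    }

    // One method covers all nine array variants.
    public T[] ReadArray<T>() where T : unmanaged
    {
        int size = Read<int>();
        if (size == 0)
            return null;
        var res = new T[size];
        for (int i = 0; i < size; i++)
            res[i] = Read<T>();
        return res;
    }
}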
I like doing it like this:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Data;
using System.Xml;
using System.Xml.Serialization;
using System.IO;
namespace ConsoleApplication50
{
class Program
{
static void Main(string[] args)
{
new Format();
}
}
public class Format
{
public enum TYPES
{
INT,
INT16,
LONG
}
public static List<Format> format = new List<Format>() {
new Format() { name = "AccountNumber", _type = TYPES.INT ,numberOfBytes = 4},
new Format() { name = "Age", _type = TYPES.INT16 ,numberOfBytes = 2},
new Format() { name = "AccountNumber", _type = TYPES.LONG ,numberOfBytes = 8}
};
public Dictionary<string, object> dict = new Dictionary<string, object>();
public string name { get; set; }
public TYPES _type { get; set; }
public int numberOfBytes { get; set; }
public Format() { }
public Format(byte[] contents)
{
MemoryStream stream = new MemoryStream(contents);
BinaryReader reader = new BinaryReader(stream);
foreach (Format item in format)
{
switch (item._type)
{
case TYPES.INT16 :
dict.Add(item.name, reader.ReadInt16());
break;
case TYPES.INT:
dict.Add(item.name, reader.ReadInt32());
break;
case TYPES.LONG:
dict.Add(item.name, reader.ReadInt64());
break;
}
}
}
}
}
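Hypothetical usage of the table-driven parser above, assuming a 14-byte input laid out per the format list (4 + 2 + 8 bytes):

byte[] contents = new byte[14]; // int + short + long, per the format list
Format parsed = new Format(contents);
int accountNumber = (int)parsed.dict["AccountNumber"];
short age = (short)parsed.dict["Age"];
long balance = (long)parsed.dict["Balance"];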
Related
I am getting the "Fody/Alea.CUDA: clrobj(cGPU) does not have llvm" build error in code where I try to pass an array of structs to an NVIDIA kernel using the Alea library. Here is a simplified version of my code. I removed the output-gathering functionality to keep the code simple; I just need to be able to send the array of structs to the GPU for the moment.
using Alea.CUDA;
using Alea.CUDA.Utilities;
using Alea.CUDA.IL;
namespace GPUProgramming
{
public class cGPU
{
public int Slice;
public float FloatValue;
}
[AOTCompile(AOTOnly = true)]
public class TestModule : ILGPUModule
{
public TestModule(GPUModuleTarget target) : base(target)
{
}
const int blockSize = 64;
[Kernel]
public void Kernel2(deviceptr<cGPU> Data, int n)
{
var start = blockIdx.x * blockDim.x + threadIdx.x;
int ind = threadIdx.x;
var sharedSlice = __shared__.Array<int>(64);
var sharedFloatValue = __shared__.Array<float>(64);
if (ind < n && start < n)
{
sharedSlice[ind] = Data[start].Slice;
sharedFloatValue[ind] = Data[start].FloatValue;
Intrinsic.__syncthreads();
}
}
public void Test2(deviceptr<cGPU> Data, int n, int NumOfBlocks)
{
var GridDim = new dim3(NumOfBlocks, 1);
var BlockDim = new dim3(64, 1);
try
{
var lp = new LaunchParam(GridDim, BlockDim);
GPULaunch(Kernel2, lp, Data, n);
}
catch (CUDAInterop.CUDAException x)
{
var code = x.Data0;
Console.WriteLine("ErrorCode = {0}", code);
}
}
public void Test2(cGPU[] Data)
{
int NumOfBlocks = Common.divup(Data.Length, blockSize);
using (var d_Slice = GPUWorker.Malloc(Data))
{
try
{
Test2(d_Slice.Ptr, Data.Length, NumOfBlocks);
}
catch (CUDAInterop.CUDAException x)
{
var code = x.Data0;
Console.WriteLine("ErrorCode = {0}", x.Data0);
}
}
}
}
}
Your data is a class, which is a reference type. Try using a struct. Reference types don't fit the GPU well, since they require allocating small objects on the heap.
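In other words, a minimal sketch of the suggested change - only the declaration differs, the kernel code stays the same:

// A struct is a value type, so a cGPU[] stores its fields inline and can be
// copied to the device as one contiguous block.
public struct cGPU
{
    public int Slice;
    public float FloatValue;
}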
Does anyone know of a C# equivalent of this occasional PHP task:
Convert dot syntax like "this.that.other" to multi-dimensional array in PHP
That is, to convert a list of strings like level1.level2.level3 = item into a dictionary or multidimensional array?
I'd assume the dictionary would need to hold items of type object, and I'd later cast them to Dictionary<string, string> or to a string if it's a final item.
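For instance, a hand-built sketch of the structure I have in mind (the casts back out are the ugly part):

var root = new Dictionary<string, object>
{
    ["level1"] = new Dictionary<string, object>
    {
        ["level2"] = new Dictionary<string, object>
        {
            ["level3"] = "item" // a final item is stored as a plain string
        }
    }
};
// Reading it back requires a cast at each level:
var level1 = (Dictionary<string, object>)root["level1"];
var level2 = (Dictionary<string, object>)level1["level2"];
string item = (string)level2["level3"]; // "item"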
Does code like this work?
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
string input = "level1.level2.level3 = item";
string pattern = "^(?'keys'[^=]+)=(?'value'.*)";
Match match = Regex.Match(input, pattern);
string value = match.Groups["value"].Value.Trim();
string[] keys = match.Groups["keys"].Value.Trim().Split(new char[] {'.'}, StringSplitOptions.RemoveEmptyEntries);
Dictionary<string, List<string>> dict = new Dictionary<string, List<string>>();
foreach (string key in keys)
{
if (dict.ContainsKey(key))
{
dict[key].Add(value);
}
else
{
dict.Add(key, new List<string>() { value });
}
}
}
}
}
I guess you could do it like this, though (I'm sure more optimization could be done):
using System;
using System.Collections.Generic;
public class Program
{
public class IndexedTree {
private readonly IDictionary<string, IndexedTree> _me;
private readonly string _splitKey = ".";
public IndexedTree this[string key] {
get {
return _me[key];
}
}
public object Value { get; set; }
public void Add(string dottedItem) {
if ( string.IsNullOrWhiteSpace( dottedItem ) ) {
throw new ArgumentException("dottedItem cannot be empty");
}
int index;
if ( (index = dottedItem.IndexOf( _splitKey ) ) < 0 ) {
throw new ArgumentException("dottedItem didn't contain " + _splitKey);
}
string key = dottedItem.Substring(0, index), rest = dottedItem.Substring(index + 1);
IndexedTree child;
if (_me.ContainsKey(key)) {
child = _me[key];
} else {
child = new IndexedTree( _splitKey );
_me.Add(key, child);
}
if (rest.IndexOf(_splitKey) >= 0) {
child.Add(rest);
} else {
// maybe it can be checked if there is already a value set here or not
// in case there is a warning or error might be more appropriate
child.Value = rest;
}
}
public IndexedTree(string splitKey) {
_splitKey = splitKey;
_me = new Dictionary<string, IndexedTree>();
}
}
public static void Main()
{
IndexedTree tree = new IndexedTree(".");
tree.Add("Level1.Level2.Level3.Item");
tree.Add("Level1.Level2.Value");
Console.WriteLine(tree["Level1"]["Level2"].Value);
Console.WriteLine(tree["Level1"]["Level2"]["Level3"].Value);
}
}
You can see the result here:
https://dotnetfiddle.net/EGagoz
Code:
while ((linevalue = filereader.ReadLine()) != null)
{
items.Add(linevalue);
}
filereader.Close();
items.Sort();
//To display the content of array (sorted)
IEnumerator myEnumerator = items.GetEnumerator();
while (myEnumerator.MoveNext())
{
Console.WriteLine(myEnumerator.Current);
}
The program above displays all the values. How do I extract only the dates and sort them in ascending order?
I am not allowed to use LINQ, exceptions, threading or any other such features. I have to stick with file streams, get my data out of the text file, and sort and store it, so that I can retrieve it, view it, edit it, and search for any particular date to see the date-of-joining records for that date. I can't figure it out and am struggling.
Basically, don't try to work with the file as lines of text; separate that away, so that you have one piece of code which parses that text into typed records, and then process those upstream, where you only need to deal with typed data.
For example (here I'm assuming that the file is tab-delimited, but you could easily change it to be column-indexed instead); look at how little work my Main method needs to do to work with the data:
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;
static class Program
{
static void Main()
{
foreach (var item in ReadFile("my.txt").OrderBy(x => x.Joined))
{
Console.WriteLine(item.Names);
}
}
static readonly char[] tab = { '\t' };
class Foo
{
public string Names { get; set; }
public int Age { get; set; }
public string Designation { get; set; }
public DateTime Joined { get; set; }
}
static IEnumerable<Foo> ReadFile(string path)
{
using (var reader = File.OpenText(path))
{
// skip the first line (headers), or exit
if (reader.ReadLine() == null) yield break;
// read each line
string line;
var culture = CultureInfo.InvariantCulture;
while ((line = reader.ReadLine()) != null)
{
var parts = line.Split(tab);
yield return new Foo
{
Names = parts[0],
Age = int.Parse(parts[1], culture),
Designation = parts[2],
Joined = DateTime.Parse(parts[3], culture)
};
}
}
}
}
And here's a version (not quite as elegant, but working) that works on .NET 2.0 (and probably on .NET 1.1) using only ISO-1 language features. Personally, I think it would be silly to use .NET 1.1, and if you are using .NET 2.0, then List<T> would be vastly preferable to ArrayList. But this is the "worst case":
using System;
using System.Collections;
using System.Globalization;
using System.IO;
class Program
{
static void Main()
{
ArrayList items = ReadFile("my.txt");
items.Sort(FooByDateComparer.Default);
foreach (Foo item in items)
{
Console.WriteLine(item.Names);
}
}
class FooByDateComparer : IComparer
{
public static readonly FooByDateComparer Default
= new FooByDateComparer();
private FooByDateComparer() { }
public int Compare(object x, object y)
{
return ((Foo)x).Joined.CompareTo(((Foo)y).Joined);
}
}
static readonly char[] tab = { '\t' };
class Foo
{
private string names, designation;
private int age;
private DateTime joined;
public string Names { get { return names; } set { names = value; } }
public int Age { get { return age; } set { age = value; } }
public string Designation { get { return designation; } set { designation = value; } }
public DateTime Joined { get { return joined; } set { joined = value; } }
}
static ArrayList ReadFile(string path)
{
ArrayList items = new ArrayList();
using (StreamReader reader = File.OpenText(path))
{
// skip the first line (headers), or exit
if (reader.ReadLine() == null) return items;
// read each line
string line;
CultureInfo culture = CultureInfo.InvariantCulture;
while ((line = reader.ReadLine()) != null)
{
string[] parts = line.Split(tab);
Foo foo = new Foo();
foo.Names = parts[0];
foo.Age = int.Parse(parts[1], culture);
foo.Designation = parts[2];
foo.Joined = DateTime.Parse(parts[3], culture);
items.Add(foo);
}
}
return items;
}
}
I'm not sure why you'd want to retrieve just the dates. You'd probably be better off reading your data into tuples first, something like
List<Tuple<string, int, string, DateTime>> items.
Then you can sort them by Item4, which is the date.
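For example (a sketch; reading the file into the tuples is elided):

var items = new List<Tuple<string, int, string, DateTime>>();
// ... populate one tuple per line of the file ...
// Sort in place, ascending by the DateTime component:
items.Sort((a, b) => a.Item4.CompareTo(b.Item4));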
You can use LINQ: split each line on tabs to retrieve only the date, then order the items through a conversion to DateTime.
while ((linevalue = filereader.ReadLine()) != null)
{
items.Add(linevalue.Split('\t').Last());
}
filereader.Close();
// OrderBy returns a new ordered sequence; it does not sort the list in place
foreach (var item in items.OrderBy(i => DateTime.Parse(i)))
{
    Console.WriteLine(item);
}
Get the desired values into an array of dates from the file, then sort with a comparer:
// IComparer<DateTime> (from System.Collections.Generic) gives the strongly
// typed Compare signature used below; the non-generic IComparer would not compile
public class DateComparer : IComparer<DateTime> {
    public int Compare(DateTime x, DateTime y) {
        if (x.Date > y.Date)
            return 1;
        if (x.Date < y.Date)
            return -1;
        return 0;
    }
}
list.Sort(new DateComparer());
So I am developing an append-only, 64-bit-ish List and Dictionary, and I've run into a memory error. I figured I would at some point, but not at 64 MB. I find that somewhat unexpected, and am curious if someone could explain to me why it's running into an issue at 64 MB.
My test for my new List class is simply an attempt to create and load 8 GB worth of bools into the List. I figured they'd suck up only ~1 bit each, so I'd get some good metrics / precision for testing my program.
Here is the output from VS:
- this {OrganicCodeDesigner.DynamicList64Tail<bool>} OrganicCodeDesigner.DynamicList64Tail<bool>
Count 536870912 ulong
- data Count = 536870912 System.Collections.Generic.List<bool>
- base {"Exception of type 'System.OutOfMemoryException' was thrown."} System.SystemException {System.OutOfMemoryException}
- base {"Exception of type 'System.OutOfMemoryException' was thrown."} System.Exception {System.OutOfMemoryException}
+ Data {System.Collections.ListDictionaryInternal} System.Collections.IDictionary {System.Collections.ListDictionaryInternal}
HelpLink null string
+ InnerException null System.Exception
Message "Exception of type 'System.OutOfMemoryException' was thrown." string
Source "mscorlib" string
StackTrace " at System.Collections.Generic.Mscorlib_CollectionDebugView`1.get_Items()" string
+ TargetSite {T[] get_Items()} System.Reflection.MethodBase {System.Reflection.RuntimeMethodInfo}
+ Static members
+ Non-Public members
- Raw View
Capacity 536870912 int
Count 536870912 int
- Static members
+ Non-Public members
- Non-Public members
+ _items {bool[536870912]} bool[]
_size 536870912 int
_syncRoot null object
_version 536870912 int
System.Collections.Generic.ICollection<T>.IsReadOnly false bool
System.Collections.ICollection.IsSynchronized false bool
System.Collections.ICollection.SyncRoot {object} object
System.Collections.IList.IsFixedSize false bool
System.Collections.IList.IsReadOnly false bool
item false bool
- Type variables
T bool bool
And here are the classes I am currently working on:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace OrganicCodeDesigner
{
public class DynamicList64Tail<T> : iList64<T>
{
private List<T> data;
public DynamicList64Tail()
{
data = new List<T>();
}
public void Add(T item)
{
data.Add(item);
}
public void Clear()
{
data.Clear();
}
public bool Contains(T item)
{
return data.Contains(item);
}
public ulong? IndexOf(T item)
{
if(this.data.Contains(item)) {
return (ulong)data.IndexOf(item);
}
return null;
}
public T this[ulong index]
{
get
{
return data[(int)(index)];
}
set
{
data[(int)(index)] = value;
}
}
public ulong Count
{
get { return (ulong)data.Count; }
}
}
}
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Collections;
namespace OrganicCodeDesigner
{
// #todo: Create IList64, with 64-bit longs in mind.
// #todo: Create BigIntegerList, which may supersede this one.
public class DynamicList64<T> : iList64<T>
{
private List<iList64<T>> data;
private ulong count = 0;
private ulong depth = 0;
public DynamicList64()
{
data = new List<iList64<T>>() { new DynamicList64Tail<T>()};
count = 0;
}
public DynamicList64(ulong depth)
{
this.depth = depth;
if (depth == 0)
{
data = new List<iList64<T>>() { new DynamicList64Tail<T>() };
}
else
{
depth -= 1;
data = new List<iList64<T>>() { new DynamicList64<T>(depth) };
}
}
internal DynamicList64(List<iList64<T>> data, ulong depth)
{
this.data = data;
this.depth = depth;
this.count = Int32.MaxValue;
}
public void Add(T item)
{
if (data.Count >= Int32.MaxValue)
{
//#todo: Do switch operation, whereby this {depth, List l} becomes this {depth + 1, List.Add(List l), count = 1}, and the new object becomes {depth, List l, count = max}
DynamicList64<T> newDynamicList64 = new DynamicList64<T>(this.data, this.depth);
this.data = new List<iList64<T>>() { newDynamicList64 };
this.count = 0;
this.depth += 1;
}
if(data[data.Count-1].Count >= Int32.MaxValue) {
if (depth == 0)
{
data.Add(new DynamicList64Tail<T>());
}
else
{
data.Add(new DynamicList64<T>(depth - 1));
}
}
data[data.Count - 1].Add(item);
count++;
}
public void Clear()
{
data.Clear();
data = new List<iList64<T>>() { new DynamicList64Tail<T>() };
count = 0;
depth = 0;
}
public bool Contains(T item)
{
foreach(iList64<T> l in data) {
if(l.Contains(item)) {
return true;
}
}
return false;
}
public ulong? IndexOf(T item)
{
for (int i = 0; i < data.Count; i++ )
{
if (data[i].Contains(item))
{
return (ulong)(((ulong)i * (ulong)(Int32.MaxValue)) + data[i].IndexOf(item).Value);
}
}
return null;
}
public T this[ulong index]
{
get
{
return data[(int)(index / Int32.MaxValue)][index % Int32.MaxValue];
}
set
{
data[(int)(index / Int32.MaxValue)][index % Int32.MaxValue] = value;
}
}
public ulong Count
{
get { return this.count; }
}
}
}
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace OrganicCodeDesigner
{
public interface iList64<T>
{
void Add(T item);
void Clear();
bool Contains(T item);
ulong? IndexOf(T item);
T this[ulong index] { get; set;}
ulong Count { get; }
}
}
And the test program's code:
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;
using OrganicCodeDesigner;
namespace OrganicCodeDesignerListDictionaryTest
{
public partial class MainForm : Form
{
public MainForm()
{
InitializeComponent();
}
private void Button_TestList_Click(object sender, EventArgs e)
{
DynamicList64<bool> newList = new DynamicList64<bool>();
newList.Add(true);
newList.Add(false);
bool b = true;
for (ulong i = 0; i < 68719476736; i++)
{
b = !b;
newList.Add(b);
//if(i%4096==0) {
//TextBox_Output.Text += "List now contains " + i + "\r";
//}
}
TextBox_Output.Text += "Successfully added all the bits.\r";
}
private void Button_TestDictionary_Click(object sender, EventArgs e)
{
}
}
}
Perhaps you can spot the error?
Perhaps you can spot the error?
I think the error is here:
I figured they'd suck up only ~1 bit each, so I'd get some good metrics / precision for testing my program.
A bool takes one byte, not one bit - so you've drastically underestimated the size of your list. You're actually running into an error with 512MB of bools. As Reed Copsey notes (he's editing a little faster than me), I suspect the list is trying to increase its size by allocating an array 2x its current size [i.e. a 1GB array], and that this is running into some .NET limitations.
This is probably a good time to start implementing your splitting logic.
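A quick way to confirm the one-byte claim (sizeof(bool) is legal in safe code for the built-in types):

Console.WriteLine(sizeof(bool)); // prints 1 (byte), so 536,870,912 bools = 512MB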
There are limits to the size of an array in .NET. Even if you are running on a 64-bit platform and set gcAllowVeryLargeObjects (in .NET 4.5), you are still limited to 2,146,435,071 items max in a single dimension of the array.
(In pre-4.5, you are limited to 2GB for a single object, no matter how many entries it contains.)
That being said, a bool is represented by one byte, not one bit, so this will be quite a bit larger than you're expecting. Even so, you still only have 536,870,912 items in your list when this fails, so theoretically, on a 64-bit system with enough memory, the next allocation for growing the list should still be within the limits. However, this requires the program to successfully allocate a single, contiguous chunk of memory large enough for the requested data (which will be 2x the size of the last chunk).
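For reference, a sketch of the app.config entry for the .NET 4.5 option mentioned above (it only helps on 64-bit, and the per-dimension item limit still applies):

<configuration>
  <runtime>
    <!-- allow single objects (e.g. arrays) larger than 2GB -->
    <gcAllowVeryLargeObjects enabled="true" />
  </runtime>
</configuration>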
I'm looking for the fastest way to serialize and deserialize .NET objects. Here is what I have so far:
public class TD
{
public List<CT> CTs { get; set; }
public List<TE> TEs { get; set; }
public string Code { get; set; }
public string Message { get; set; }
public DateTime StartDate { get; set; }
public DateTime EndDate { get; set; }
public static string Serialize(List<TD> tData)
{
var serializer = new XmlSerializer(typeof(List<TD>));
TextWriter writer = new StringWriter();
serializer.Serialize(writer, tData);
return writer.ToString();
}
public static List<TD> Deserialize(string tData)
{
var serializer = new XmlSerializer(typeof(List<TD>));
TextReader reader = new StringReader(tData);
return (List<TD>)serializer.Deserialize(reader);
}
}
Here's your model (with invented CT and TE) using protobuf-net (yet retaining the ability to use XmlSerializer, which can be useful - in particular for migration); I humbly submit (with lots of evidence if you need it) that this is the fastest (or certainly one of the fastest) general-purpose serializers in .NET.
If you need strings, just base-64 encode the binary.
[XmlType]
public class CT {
[XmlElement(Order = 1)]
public int Foo { get; set; }
}
[XmlType]
public class TE {
[XmlElement(Order = 1)]
public int Bar { get; set; }
}
[XmlType]
public class TD {
[XmlElement(Order=1)]
public List<CT> CTs { get; set; }
[XmlElement(Order=2)]
public List<TE> TEs { get; set; }
[XmlElement(Order = 3)]
public string Code { get; set; }
[XmlElement(Order = 4)]
public string Message { get; set; }
[XmlElement(Order = 5)]
public DateTime StartDate { get; set; }
[XmlElement(Order = 6)]
public DateTime EndDate { get; set; }
public static byte[] Serialize(List<TD> tData) {
using (var ms = new MemoryStream()) {
ProtoBuf.Serializer.Serialize(ms, tData);
return ms.ToArray();
}
}
public static List<TD> Deserialize(byte[] tData) {
using (var ms = new MemoryStream(tData)) {
return ProtoBuf.Serializer.Deserialize<List<TD>>(ms);
}
}
}
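Per the base-64 remark above, getting a string in and out is one call each way:

string text = Convert.ToBase64String(TD.Serialize(tData));
List<TD> roundTripped = TD.Deserialize(Convert.FromBase64String(text));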
A comprehensive comparison between different formats, made by me, is in this post:
https://medium.com/@maximn/serialization-performance-comparison-xml-binary-json-p-ad737545d227
[A sample benchmark chart from the post appeared here.]
Having an interest in this, I decided to test the suggested methods with the closest "apples to apples" test I could. I wrote a Console app, with the following code:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Runtime.Serialization.Formatters.Binary;
using System.Text;
using System.Threading.Tasks;
namespace SerializationTests
{
class Program
{
static void Main(string[] args)
{
var count = 100000;
var rnd = new Random(DateTime.UtcNow.GetHashCode());
Console.WriteLine("Generating {0} arrays of data...", count);
var arrays = new List<int[]>();
for (int i = 0; i < count; i++)
{
var elements = rnd.Next(1, 100);
var array = new int[elements];
for (int j = 0; j < elements; j++)
{
array[j] = rnd.Next();
}
arrays.Add(array);
}
Console.WriteLine("Test data generated.");
var stopWatch = new Stopwatch();
Console.WriteLine("Testing BinarySerializer...");
var binarySerializer = new BinarySerializer();
var binarySerialized = new List<byte[]>();
var binaryDeserialized = new List<int[]>();
stopWatch.Reset();
stopWatch.Start();
foreach (var array in arrays)
{
binarySerialized.Add(binarySerializer.Serialize(array));
}
stopWatch.Stop();
Console.WriteLine("BinaryFormatter: Serializing took {0}ms.", stopWatch.Elapsed.TotalMilliseconds);
stopWatch.Reset();
stopWatch.Start();
foreach (var serialized in binarySerialized)
{
binaryDeserialized.Add(binarySerializer.Deserialize<int[]>(serialized));
}
stopWatch.Stop();
Console.WriteLine("BinaryFormatter: Deserializing took {0}ms.", stopWatch.Elapsed.TotalMilliseconds);
Console.WriteLine();
Console.WriteLine("Testing ProtoBuf serializer...");
var protobufSerializer = new ProtoBufSerializer();
var protobufSerialized = new List<byte[]>();
var protobufDeserialized = new List<int[]>();
stopWatch.Reset();
stopWatch.Start();
foreach (var array in arrays)
{
protobufSerialized.Add(protobufSerializer.Serialize(array));
}
stopWatch.Stop();
Console.WriteLine("ProtoBuf: Serializing took {0}ms.", stopWatch.Elapsed.TotalMilliseconds);
stopWatch.Reset();
stopWatch.Start();
foreach (var serialized in protobufSerialized)
{
protobufDeserialized.Add(protobufSerializer.Deserialize<int[]>(serialized));
}
stopWatch.Stop();
Console.WriteLine("ProtoBuf: Deserializing took {0}ms.", stopWatch.Elapsed.TotalMilliseconds);
Console.WriteLine();
Console.WriteLine("Testing NetSerializer serializer...");
var netSerializerSerializer = new ProtoBufSerializer(); // note: this instantiates the ProtoBuf wrapper, not NetSerializer - one of the bugs fixed in a later answer
var netSerializerSerialized = new List<byte[]>();
var netSerializerDeserialized = new List<int[]>();
stopWatch.Reset();
stopWatch.Start();
foreach (var array in arrays)
{
netSerializerSerialized.Add(netSerializerSerializer.Serialize(array));
}
stopWatch.Stop();
Console.WriteLine("NetSerializer: Serializing took {0}ms.", stopWatch.Elapsed.TotalMilliseconds);
stopWatch.Reset();
stopWatch.Start();
foreach (var serialized in netSerializerSerialized)
{
netSerializerDeserialized.Add(netSerializerSerializer.Deserialize<int[]>(serialized));
}
stopWatch.Stop();
Console.WriteLine("NetSerializer: Deserializing took {0}ms.", stopWatch.Elapsed.TotalMilliseconds);
Console.WriteLine("Press any key to end.");
Console.ReadKey();
}
public class BinarySerializer
{
private static readonly BinaryFormatter Formatter = new BinaryFormatter();
public byte[] Serialize(object toSerialize)
{
using (var stream = new MemoryStream())
{
Formatter.Serialize(stream, toSerialize);
return stream.ToArray();
}
}
public T Deserialize<T>(byte[] serialized)
{
using (var stream = new MemoryStream(serialized))
{
var result = (T)Formatter.Deserialize(stream);
return result;
}
}
}
public class ProtoBufSerializer
{
public byte[] Serialize(object toSerialize)
{
using (var stream = new MemoryStream())
{
ProtoBuf.Serializer.Serialize(stream, toSerialize);
return stream.ToArray();
}
}
public T Deserialize<T>(byte[] serialized)
{
using (var stream = new MemoryStream(serialized))
{
var result = ProtoBuf.Serializer.Deserialize<T>(stream);
return result;
}
}
}
public class NetSerializer
{
// note: this field is an instance of this wrapper class itself, not the
// NetSerializer library type, so Serialize/Deserialize below recurse forever -
// another of the bugs fixed in the later answer
private static readonly NetSerializer Serializer = new NetSerializer();
public byte[] Serialize(object toSerialize)
{
return Serializer.Serialize(toSerialize);
}
public T Deserialize<T>(byte[] serialized)
{
return Serializer.Deserialize<T>(serialized);
}
}
}
}
The results surprised me; they were consistent when run multiple times:
Generating 100000 arrays of data...
Test data generated.
Testing BinarySerializer...
BinaryFormatter: Serializing took 336.8392ms.
BinaryFormatter: Deserializing took 208.7527ms.
Testing ProtoBuf serializer...
ProtoBuf: Serializing took 2284.3827ms.
ProtoBuf: Deserializing took 2201.8072ms.
Testing NetSerializer serializer...
NetSerializer: Serializing took 2139.5424ms.
NetSerializer: Deserializing took 2113.7296ms.
Press any key to end.
Collecting these results, I decided to see if ProtoBuf or NetSerializer performed better with larger objects. I changed the collection count to 10,000 objects, but increased the size of the arrays to 1-10,000 instead of 1-100. The results seemed even more definitive:
Generating 10000 arrays of data...
Test data generated.
Testing BinarySerializer...
BinaryFormatter: Serializing took 285.8356ms.
BinaryFormatter: Deserializing took 206.0906ms.
Testing ProtoBuf serializer...
ProtoBuf: Serializing took 10693.3848ms.
ProtoBuf: Deserializing took 5988.5993ms.
Testing NetSerializer serializer...
NetSerializer: Serializing took 9017.5785ms.
NetSerializer: Deserializing took 5978.7203ms.
Press any key to end.
My conclusion, therefore, is: there may be cases that ProtoBuf and NetSerializer are well suited to, but in terms of raw performance for at least relatively simple objects... BinaryFormatter is significantly more performant, by at least an order of magnitude.
YMMV.
Protobuf is very very fast.
See http://code.google.com/p/protobuf-net/wiki/Performance for in depth information concerning the performance of this system, and an implementation.
Yet another serializer out there that claims to be super fast is NetSerializer.
The data given on their site shows performance of 2x - 4x over protobuf. I have not tried this myself, but if you are evaluating various options, try this as well.
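Note that NetSerializer requires the root types to be registered when the serializer is constructed (the corrected benchmark further down does exactly this); a minimal sketch:

// NetSerializer needs the concrete types up front:
var serializer = new NetSerializer.Serializer(new Type[] { typeof(int), typeof(int[]) });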
The binary serializer included with .NET should be faster than the XmlSerializer. Or try another serializer for protobuf, JSON, ...
But for some of them you need to add attributes, or some other way to add metadata. For example, ProtoBuf uses numeric property IDs internally, and the mapping needs to be somehow preserved by a different mechanism. Versioning isn't trivial with any serializer.
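For example, a sketch of protobuf-net's native attributes (an alternative to reusing the XmlType/XmlElement Order mapping shown earlier; the numeric IDs are what get written to the wire, so they must stay stable across versions):

[ProtoContract]
public class TD
{
    [ProtoMember(1)]
    public string Code { get; set; }
    [ProtoMember(2)]
    public string Message { get; set; }
}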
I removed the bugs in the above code and got the results below. Also, given that NetSerializer requires you to register the types you are serializing, I am unsure what kind of compatibility or performance difference that could potentially make.
Generating 100000 arrays of data...
Test data generated.
Testing BinarySerializer...
BinaryFormatter: Serializing took 508.9773ms.
BinaryFormatter: Deserializing took 371.8499ms.
Testing ProtoBuf serializer...
ProtoBuf: Serializing took 3280.9185ms.
ProtoBuf: Deserializing took 3190.7899ms.
Testing NetSerializer serializer...
NetSerializer: Serializing took 427.1241ms.
NetSerializer: Deserializing took 78.954ms.
Press any key to end.
Modified Code
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Runtime.Serialization.Formatters.Binary;
using System.Text;
using System.Threading.Tasks;
namespace SerializationTests
{
class Program
{
static void Main(string[] args)
{
var count = 100000;
var rnd = new Random((int)DateTime.UtcNow.Ticks & 0xFF);
Console.WriteLine("Generating {0} arrays of data...", count);
var arrays = new List<int[]>();
for (int i = 0; i < count; i++)
{
var elements = rnd.Next(1, 100);
var array = new int[elements];
for (int j = 0; j < elements; j++)
{
array[j] = rnd.Next();
}
arrays.Add(array);
}
Console.WriteLine("Test data generated.");
var stopWatch = new Stopwatch();
Console.WriteLine("Testing BinarySerializer...");
var binarySerializer = new BinarySerializer();
var binarySerialized = new List<byte[]>();
var binaryDeserialized = new List<int[]>();
stopWatch.Reset();
stopWatch.Start();
foreach (var array in arrays)
{
binarySerialized.Add(binarySerializer.Serialize(array));
}
stopWatch.Stop();
Console.WriteLine("BinaryFormatter: Serializing took {0}ms.", stopWatch.Elapsed.TotalMilliseconds);
stopWatch.Reset();
stopWatch.Start();
foreach (var serialized in binarySerialized)
{
binaryDeserialized.Add(binarySerializer.Deserialize<int[]>(serialized));
}
stopWatch.Stop();
Console.WriteLine("BinaryFormatter: Deserializing took {0}ms.", stopWatch.Elapsed.TotalMilliseconds);
Console.WriteLine();
Console.WriteLine("Testing ProtoBuf serializer...");
var protobufSerializer = new ProtoBufSerializer();
var protobufSerialized = new List<byte[]>();
var protobufDeserialized = new List<int[]>();
stopWatch.Reset();
stopWatch.Start();
foreach (var array in arrays)
{
protobufSerialized.Add(protobufSerializer.Serialize(array));
}
stopWatch.Stop();
Console.WriteLine("ProtoBuf: Serializing took {0}ms.", stopWatch.Elapsed.TotalMilliseconds);
stopWatch.Reset();
stopWatch.Start();
foreach (var serialized in protobufSerialized)
{
protobufDeserialized.Add(protobufSerializer.Deserialize<int[]>(serialized));
}
stopWatch.Stop();
Console.WriteLine("ProtoBuf: Deserializing took {0}ms.", stopWatch.Elapsed.TotalMilliseconds);
Console.WriteLine();
Console.WriteLine("Testing NetSerializer serializer...");
var netSerializerSerialized = new List<byte[]>();
var netSerializerDeserialized = new List<int[]>();
stopWatch.Reset();
stopWatch.Start();
var netSerializerSerializer = new NS(); // note: type registration happens here, inside the timed region
foreach (var array in arrays)
{
netSerializerSerialized.Add(netSerializerSerializer.Serialize(array));
}
stopWatch.Stop();
Console.WriteLine("NetSerializer: Serializing took {0}ms.", stopWatch.Elapsed.TotalMilliseconds);
stopWatch.Reset();
stopWatch.Start();
foreach (var serialized in netSerializerSerialized)
{
netSerializerDeserialized.Add(netSerializerSerializer.Deserialize<int[]>(serialized));
}
stopWatch.Stop();
Console.WriteLine("NetSerializer: Deserializing took {0}ms.", stopWatch.Elapsed.TotalMilliseconds);
Console.WriteLine("Press any key to end.");
Console.ReadKey();
}
public class BinarySerializer
{
private static readonly BinaryFormatter Formatter = new BinaryFormatter();
public byte[] Serialize(object toSerialize)
{
using (var stream = new MemoryStream())
{
Formatter.Serialize(stream, toSerialize);
return stream.ToArray();
}
}
public T Deserialize<T>(byte[] serialized)
{
using (var stream = new MemoryStream(serialized))
{
var result = (T)Formatter.Deserialize(stream);
return result;
}
}
}
public class ProtoBufSerializer
{
public byte[] Serialize(object toSerialize)
{
using (var stream = new MemoryStream())
{
ProtoBuf.Serializer.Serialize(stream, toSerialize);
return stream.ToArray();
}
}
public T Deserialize<T>(byte[] serialized)
{
using (var stream = new MemoryStream(serialized))
{
var result = ProtoBuf.Serializer.Deserialize<T>(stream);
return result;
}
}
}
public class NS
{
NetSerializer.Serializer Serializer = new NetSerializer.Serializer(new Type[] { typeof(int), typeof(int[]) });
public byte[] Serialize(object toSerialize)
{
using (var stream = new MemoryStream())
{
Serializer.Serialize(stream, toSerialize);
return stream.ToArray();
}
}
public T Deserialize<T>(byte[] serialized)
{
using (var stream = new MemoryStream(serialized))
{
Serializer.Deserialize(stream, out var result);
return (T)result;
}
}
}
}
}
You can try the Salar.Bois serializer, which has decent performance. Its focus is on payload size, but it also offers good performance.
There are benchmarks on the GitHub page if you wish to see and compare the results yourself.
https://github.com/salarcode/Bois
I took the liberty of feeding your classes into the CGbR generator. Because it is at an early stage it doesn't support DateTime yet, so I simply replaced it with long. The generated serialization code looks like this:
public int Size
{
get
{
var size = 24;
// Add size for collections and strings
size += Cts == null ? 0 : Cts.Count * 4;
size += Tes == null ? 0 : Tes.Count * 4;
size += Code == null ? 0 : Code.Length;
size += Message == null ? 0 : Message.Length;
return size;
}
}
public byte[] ToBytes(byte[] bytes, ref int index)
{
if (index + Size > bytes.Length)
throw new ArgumentOutOfRangeException("index", "Object does not fit in array");
// Convert Cts
// Two bytes length information for each dimension
GeneratorByteConverter.Include((ushort)(Cts == null ? 0 : Cts.Count), bytes, ref index);
if (Cts != null)
{
for(var i = 0; i < Cts.Count; i++)
{
var value = Cts[i];
value.ToBytes(bytes, ref index);
}
}
// Convert Tes
// Two bytes length information for each dimension
GeneratorByteConverter.Include((ushort)(Tes == null ? 0 : Tes.Count), bytes, ref index);
if (Tes != null)
{
for(var i = 0; i < Tes.Count; i++)
{
var value = Tes[i];
value.ToBytes(bytes, ref index);
}
}
// Convert Code
GeneratorByteConverter.Include(Code, bytes, ref index);
// Convert Message
GeneratorByteConverter.Include(Message, bytes, ref index);
// Convert StartDate
GeneratorByteConverter.Include(StartDate.ToBinary(), bytes, ref index);
// Convert EndDate
GeneratorByteConverter.Include(EndDate.ToBinary(), bytes, ref index);
return bytes;
}
public Td FromBytes(byte[] bytes, ref int index)
{
// Read Cts
var ctsLength = GeneratorByteConverter.ToUInt16(bytes, ref index);
var tempCts = new List<Ct>(ctsLength);
for (var i = 0; i < ctsLength; i++)
{
var value = new Ct().FromBytes(bytes, ref index);
tempCts.Add(value);
}
Cts = tempCts;
// Read Tes
var tesLength = GeneratorByteConverter.ToUInt16(bytes, ref index);
var tempTes = new List<Te>(tesLength);
for (var i = 0; i < tesLength; i++)
{
var value = new Te().FromBytes(bytes, ref index);
tempTes.Add(value);
}
Tes = tempTes;
// Read Code
Code = GeneratorByteConverter.GetString(bytes, ref index);
// Read Message
Message = GeneratorByteConverter.GetString(bytes, ref index);
// Read StartDate
StartDate = DateTime.FromBinary(GeneratorByteConverter.ToInt64(bytes, ref index));
// Read EndDate
EndDate = DateTime.FromBinary(GeneratorByteConverter.ToInt64(bytes, ref index));
return this;
}
I created a list of sample objects like this:
var objects = new List<Td>();
for (int i = 0; i < 1000; i++)
{
var obj = new Td
{
Message = "Hello my friend",
Code = "Some code that can be put here",
StartDate = DateTime.Now.AddDays(-7),
EndDate = DateTime.Now.AddDays(2),
Cts = new List<Ct>(),
Tes = new List<Te>()
};
for (int j = 0; j < 10; j++)
{
obj.Cts.Add(new Ct { Foo = i * j });
obj.Tes.Add(new Te { Bar = i + j });
}
objects.Add(obj);
}
Results on my machine in Release build:
var watch = new Stopwatch();
watch.Start();
var bytes = BinarySerializer.SerializeMany(objects);
watch.Stop();
Size: 149000 bytes
Time: 2.059ms (down from 3.13ms; see the edits below)
Edit: Starting with CGbR 0.4.3 the binary serializer supports DateTime. Unfortunately the DateTime.ToBinary method is insanely slow; I will replace it with something faster soon.
Edit 2: When using UTC DateTimes by invoking ToUniversalTime(), the performance is restored and clocks in at 1.669ms.
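i.e., in the sample-data loop above, the only change is to the two date assignments:

var obj = new Td
{
    // UTC DateTimes avoid the slow local-time path in DateTime.ToBinary:
    StartDate = DateTime.Now.AddDays(-7).ToUniversalTime(),
    EndDate = DateTime.Now.AddDays(2).ToUniversalTime()
};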