I'm downloading some files asynchronously into a large byte array, and I have a callback that fires periodically whenever some data is added to that array. If I want to give developers the ability to use the last chunk of data that was added to the array, then... well, how would I do that? In C++ I could give them a pointer into the middle of the array, and perhaps tell them the number of bytes that were added in the last operation so they at least know which chunk they should be looking at... I don't really want to give them a second copy of that data; that's just wasteful.
I'm just wondering whether people might want to process this data before the file has finished downloading. Would anyone actually want to do that? Or is it a useless feature anyway? I already have a callback for when the buffer (entire byte array) is full, and then they can dump the whole thing without worrying about start and end points...
.NET has a struct that does exactly what you want:
System.ArraySegment<T>.
In any case, it's easy to implement yourself too - just write a constructor that takes a base array, an offset, and a length, then implement an indexer that offsets indexes behind the scenes, so your ArraySegment can be used seamlessly in place of an array.
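For illustration, a minimal sketch of such a hand-rolled segment type (the names are placeholders, and bounds checking is omitted for brevity):

public struct ByteSegment
{
    private readonly byte[] array;
    private readonly int offset;
    private readonly int length;

    public ByteSegment(byte[] array, int offset, int length)
    {
        this.array = array;
        this.offset = offset;
        this.length = length;
    }

    public int Length { get { return length; } }

    // The indexer translates segment-relative indexes to the base array,
    // so callers can treat the segment like a small array of its own.
    public byte this[int index]
    {
        get { return array[offset + index]; }
        set { array[offset + index] = value; }
    }
}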
You can't give them a pointer into the array, but you could give them the array along with the start index and length of the new data.
But I have to wonder what someone would use this for. Is this a known need, or are you just guessing that someone might want it someday? And if so, is there any reason why you couldn't wait to add the capability until someone actually needs it?
Whether this is needed or not depends on whether you can afford to accumulate all the data from a file before processing it, or whether you need to provide a streaming mode where you process each chunk as it arrives. This depends on two things: how much data there is (you probably would not want to accumulate a multi-gigabyte file), and how long it takes the file to completely arrive (if you are getting the data over a slow link you might not want your client to wait till it had all arrived). So it is a reasonable feature to add, depending on how the library is to be used. Streaming mode is usually a desirable attribute, so I would vote for implementing the feature.

However, the idea of putting the data into an array seems wrong, because it fundamentally implies a non-streaming design, and because it requires an additional copy. What you could do instead is to keep each chunk of arriving data as a discrete piece. These could be stored in a container for which adding at the end and removing from the front is efficient.
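As a sketch of that idea, assuming the download callback hands over one newly arrived chunk at a time (a Queue<byte[]> gives O(1) adds at the end and removes from the front; all names here are illustrative):

private readonly Queue<byte[]> pending = new Queue<byte[]>();

// Called by the downloader whenever a chunk arrives.
void OnChunkArrived(byte[] chunk)
{
    pending.Enqueue(chunk); // keep the chunk as a discrete piece
}

// Called by the consumer to process whatever has arrived so far.
void ProcessAvailable()
{
    while (pending.Count > 0)
    {
        byte[] chunk = pending.Dequeue(); // consume from the front
        // ... process chunk ...
    }
}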
Copying a chunk of a byte array may seem "wasteful," but then again, object-oriented languages like C# tend to be a little more wasteful than procedural languages anyway. A few extra CPU cycles and a little extra memory consumption can greatly reduce complexity and increase flexibility in the development process. In fact, copying the bytes to a new location in memory sounds to me like good design, as opposed to the pointer approach, which would give other classes access to private data.
But if you do want to use pointers, C# does support them. Here is a decent-looking tutorial. The author is correct when he states, "...pointers are only really needed in C# where execution speed is highly important."
I agree with the OP: sometimes you just plain need to pay some attention to efficiency. I don't think the example of providing an API is the best, because that certainly calls for leaning toward safety and simplicity over efficiency.
However, a simple example is when processing large numbers of huge binary files that have zillions of records in them, such as when writing a parser. Without using a mechanism such as System.ArraySegment, the parser becomes a big memory hog and is greatly slowed down by creating a zillion new data elements, copying all the memory over, and fragmenting the heck out of the heap. It's a very real performance issue. I write these kinds of parsers all the time for telecommunications work, which generates millions of records per day in each of several categories, from each of many switches, with variable-length binary structures that need to be parsed into databases.
Using the System.ArraySegment mechanism versus creating new structure copies for each record tremendously speeds up the parsing, and greatly reduces the peak memory consumption of the parser. These are very real advantages because the servers run multiple parsers, run them frequently, and speed and memory conservation = very real cost savings in not having to have so many processors dedicated to the parsing.
System.ArraySegment is very easy to use. Here's a simple example of a basic way to track the individual records in a typical big binary file full of records with a fixed-length header and a variable-length record size (obvious exception handling deleted):
public struct MyRecord
{
    public ArraySegment<byte> header;
    public ArraySegment<byte> data;
}

public class Parser
{
    const int HEADER_SIZE = 10;
    const int HDR_OFS_REC_TYPE = 0;
    const int HDR_OFS_REC_LEN = 4;

    byte[] m_fileData;
    List<MyRecord> records = new List<MyRecord>();

    bool Parse(FileStream fs)
    {
        int fileLen = (int)fs.Length;
        m_fileData = new byte[fileLen];
        fs.Read(m_fileData, 0, fileLen);
        fs.Close();
        fs.Dispose();

        int offset = 0;
        while (offset + HEADER_SIZE < fileLen)
        {
            int recType = (int)m_fileData[offset + HDR_OFS_REC_TYPE];
            switch (recType) { /* puke if not a recognized type */ }

            // big-endian 16-bit length field in the header
            int varDataLen = ((int)m_fileData[offset + HDR_OFS_REC_LEN]) * 256
                + (int)m_fileData[offset + HDR_OFS_REC_LEN + 1];
            if (offset + HEADER_SIZE + varDataLen > fileLen) { /* puke as file has odd bytes at end */ }

            MyRecord rec = new MyRecord();
            rec.header = new ArraySegment<byte>(m_fileData, offset, HEADER_SIZE);
            rec.data = new ArraySegment<byte>(m_fileData, offset + HEADER_SIZE, varDataLen);
            records.Add(rec);

            offset += HEADER_SIZE + varDataLen;
        }
        return true;
    }
}
The above example gives you a list with an ArraySegment for each record in the file, while leaving all the actual data in place in one big array per file. The only overhead is the two ArraySegments in the MyRecord struct per record. When processing the records, each segment's Array, Offset, and Count properties let you operate on the elements of each record as if it were its own byte[], without actually copying anything.
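For instance, a short sketch of reading one record's type byte through its segment; note that segment indexes are relative to Offset, since the Array property returns the whole underlying file buffer:

// Read the record-type byte of one parsed record without copying.
MyRecord rec = records[0];
byte recType = rec.header.Array[rec.header.Offset + HDR_OFS_REC_TYPE];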
I think you shouldn't bother.
Why on earth would anyone want to use it?
That sounds like you want an event.
public class ArrayChangedEventArgs : EventArgs
{
    public ArrayChangedEventArgs(byte[] array, int start, int length)
    {
        Array = array;
        Start = start;
        Length = length;
    }

    public byte[] Array { get; private set; }
    public int Start { get; private set; }
    public int Length { get; private set; }
}
// ...
// and in your class:
public event EventHandler<ArrayChangedEventArgs> ArrayChanged;
protected virtual void OnArrayChanged(ArrayChangedEventArgs e)
{
    // Using a temporary variable avoids a common potential multithreading issue
    // where the multicast delegate changes midstream.
    // Best practice is to grab a copy first, then test for null.
    EventHandler<ArrayChangedEventArgs> handler = ArrayChanged;
    if (handler != null)
    {
        handler(this, e);
    }
}
// finally, your code that downloads a chunk just needs to call OnArrayChanged()
// with the appropriate args
Clients hook into the event and get called when things change. This is what most client code in .NET expects to have in an API ("call me when something happens"). They can hook into the code with something as simple as:
yourDownloader.ArrayChanged += (sender, e) =>
    Console.WriteLine(String.Format("Just downloaded {0} byte{1} at position {2}.",
        e.Length, e.Length == 1 ? "" : "s", e.Start));
I am writing a .NET application running on Windows Server 2016 that does an HTTP GET on a bunch of pieces of a large file. This dramatically speeds up the download process, since you can download them in parallel. Unfortunately, once they are downloaded, it takes a fairly long time to piece them all back together.
There are between 2,000 and 4,000 files that need to be combined. The server this will run on has PLENTY of memory, close to 800GB. I thought it would make sense to use MemoryStreams to store the downloaded pieces until they can be sequentially written to disk, BUT I am only able to consume about 2.5GB of memory before I get a System.OutOfMemoryException. The server has hundreds of GB available, and I can't figure out how to use them.
MemoryStreams are built around byte arrays. Arrays cannot be larger than 2GB currently.
The current implementation of System.Array uses Int32 for all its internal counters etc, so the theoretical maximum number of elements is Int32.MaxValue.
There's also a 2GB max-size-per-object limit imposed by the Microsoft CLR.
As you try to put the content in a single MemoryStream the underlying array gets too large, hence the exception.
Try to store the pieces separately, and write them directly to the FileStream (or whatever you use) when ready, without first trying to concatenate them all into 1 object.
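A minimal sketch of that approach, assuming pieces holds the downloaded chunks already sorted into file order (the names are illustrative):

// Write each downloaded piece straight to disk, in order, without ever
// concatenating them into one giant in-memory buffer.
using (var output = new FileStream("combined.bin", FileMode.Create))
{
    foreach (byte[] piece in pieces)
    {
        output.Write(piece, 0, piece.Length);
    }
}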
According to the source code of the MemoryStream class, you will not be able to store more than 2 GB of data in one instance of this class.
The reason for this is that the maximum length of the stream is set to Int32.MaxValue, and the maximum index of an array is set to 0x7FFFFFC7, which is 2,147,483,591 in decimal (= 2 GB).
Snippet MemoryStream
private const int MemStreamMaxLength = Int32.MaxValue;
Snippet array
// We impose limits on maximum array length in each dimension to allow efficient
// implementation of advanced range check elimination in future.
// Keep in sync with vm\gcscan.cpp and HashHelpers.MaxPrimeArrayLength.
// The constants are defined in this method: inline SIZE_T MaxArrayLength(SIZE_T componentSize) from gcscan
// We have different max sizes for arrays with elements of size 1 for backwards compatibility
internal const int MaxArrayLength = 0X7FEFFFFF;
internal const int MaxByteArrayLength = 0x7FFFFFC7;
The question More than 2GB of managed memory was discussed long ago on a Microsoft forum, and has a reference to a blog article about BigArray, a way of getting around the 2GB array size limit.
Update
I suggest using the following code, which should be able to allocate more than 4 GB on an x64 build but will fail below 4 GB on an x86 build:
private static void Main(string[] args)
{
    List<byte[]> data = new List<byte[]>();
    Random random = new Random();

    while (true)
    {
        try
        {
            var tmpArray = new byte[1024 * 1024];
            random.NextBytes(tmpArray);
            data.Add(tmpArray);
            Console.WriteLine($"{data.Count} MB allocated");
        }
        catch
        {
            Console.WriteLine("Further allocation failed.");
            break; // stop once the runtime refuses to give us more memory
        }
    }
}
As has already been pointed out, the main problem here is the nature of MemoryStream being backed by a byte[], which has fixed upper size.
The option of using an alternative Stream implementation has been noted. Another alternative is to look into "pipelines", the new IO API (System.IO.Pipelines). A pipeline is based around discontiguous memory, which means it isn't required to use a single contiguous buffer; the pipelines library will allocate multiple slabs as needed, which your code can process. I have written extensively on this topic; part 1 is here. Part 3 probably has the most code focus.
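Not the full treatment from those articles, but a rough sketch of the shape of the System.IO.Pipelines API, assuming a Pipe sits between the download code and the processing code:

using System;
using System.IO.Pipelines;
using System.Threading.Tasks;

// Producer side: ask the pipe for pooled memory, fill it, then flush.
static async Task WriteChunkAsync(PipeWriter writer, byte[] chunk)
{
    Memory<byte> memory = writer.GetMemory(chunk.Length);
    chunk.CopyTo(memory);
    writer.Advance(chunk.Length);
    await writer.FlushAsync();
}

// Consumer side: process whatever discontiguous segments have arrived.
static async Task ReadOnceAsync(PipeReader reader)
{
    ReadResult result = await reader.ReadAsync();
    foreach (ReadOnlyMemory<byte> segment in result.Buffer)
    {
        // process segment.Span here
    }
    reader.AdvanceTo(result.Buffer.End); // mark everything as consumed
}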
Just to confirm that I understand your question: you're downloading a single very large file in multiple parallel chunks, and you know how big the final file is? If you don't, then this gets a bit more complicated, but it can still be done.
The best option is probably to use a MemoryMappedFile (MMF). What you'll do is to create the destination file via MMF. Each thread will create a view accessor to that file and write to it in parallel. At the end, close the MMF. This essentially gives you the behavior that you wanted with MemoryStreams but Windows backs the file by disk. One of the benefits to this approach is that Windows manages storing the data to disk in the background (flushing) so you don't have to, and should result in excellent performance.
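A sketch of that approach, assuming fixed-size pieces, a known final size, and a hypothetical DownloadPiece helper:

using System.IO;
using System.IO.MemoryMappedFiles;
using System.Threading.Tasks;

static void CombinePieces(long pieceSize, int pieceCount)
{
    // Create the destination file as a memory-mapped file, sized up front.
    using (var mmf = MemoryMappedFile.CreateFromFile(
        "destination.bin", FileMode.Create, null, pieceSize * pieceCount))
    {
        Parallel.For(0, pieceCount, i =>
        {
            byte[] piece = DownloadPiece(i); // hypothetical download call
            // Each thread writes through its own view of its slice.
            using (var view = mmf.CreateViewAccessor(i * pieceSize, piece.Length))
            {
                view.WriteArray(0, piece, 0, piece.Length);
            }
        });
    } // disposing the MMF lets Windows flush everything to disk
}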
I am not very experienced in C#, but have lots of experience from other languages.
I am doing a project in C# where I have to read and modify large files.
For this I have coded a buffering scheme where I keep chunks of data in memory and swap them to disk when I need to read more. I always eliminate the [0] element from the array by moving the following elements back one position.
public struct TBBuffer
{
    public long offset;
    public short[] data;
    public GCHandle dataHandle;
}

// tb is a TBBuffer[]; the data[] is initialized to 4096.
If I use a small sample file, where everything fits in the buffers allocated, everything works as intended.
Whenever I need to free up some memory for more data I do:
int bufIdx, bufNo;
for (bufIdx = 0; bufIdx < tb.buffer.Length - 1; bufIdx++)
{
    tb.buffer[bufIdx] = tb.buffer[bufIdx + 1];
}
bufNo = tb.Length - 1;
I have determined that the above code is the source of the problem, but I am unable to find out why that is so.
So my question is: considering the TBBuffer struct and its contents, does anybody have a clue why this is not working as expected? And is there a more efficient way to do this?
Are you looking for Array.Resize?
Array.Resize(ref tb.buffer, newSize);
Just state your intention to .NET and let it do the work (efficiently) for you.
In your code you use tb.Length; I think it should be tb.buffer.Length:
int bufIdx, bufNo;
for (bufIdx = 0; bufIdx < tb.buffer.Length - 1; bufIdx++)
{
    tb.buffer[bufIdx] = tb.buffer[bufIdx + 1];
}
bufNo = tb.buffer.Length - 1;
I solved my problem... Rather embarrassing, but the problem was that the data[] of the very last buffer would point at the same array as the next-to-last after I did my for() move stunt. After adding a tb.buffer[bufNo].data = new short[4096]; statement, everything is back to normal. Thank you for your time and all the constructive feedback. I will be looking into memory-mapped files to see whether they will be a better option.
Given a populated byte[] values in C#, I want to prepend the value (byte)0x00 to the array. I assume this will require making a new array and adding the contents of the old array. Speed is an important aspect of my application. What is the best way to do this?
-- EDIT --
The byte[] is used to store DSA (Digital Signature Algorithm) parameters. The operation will only need to be performed once per array, but speed is important because I am potentially performing this operation on many different byte[]s.
If you are only going to perform this operation once, then there aren't a whole lot of choices. The code provided in Monroe's answer should do just fine.
byte[] newValues = new byte[values.Length + 1];
newValues[0] = 0x00; // set the prepended value
Array.Copy(values, 0, newValues, 1, values.Length); // copy the old values
If, however, you're going to be performing this operation multiple times you have some more choices. There is a fundamental problem that prepending data to an array isn't an efficient operation, so you could choose to use an alternate data structure.
A LinkedList can efficiently prepend data, but it's less efficient in general for most tasks, as it involves a lot more memory allocation and deallocation and also loses memory locality, so it may not be a net win.
A double ended queue (known as a deque) would be a fantastic data structure for you. You can efficiently add to the start or the end, and efficiently access data anywhere in the structure (but you can't efficiently insert somewhere other than the start or end). The major problem here is that .NET doesn't provide an implementation of a deque. You'd need to find a 3rd party library with an implementation.
You can also save yourself a lot when copying by keeping track of "data that I need to prepend" (using a List/Queue/etc.) and then waiting to actually prepend the data as long as possible, so that you minimize the creation of new arrays as much as possible, as well as limiting the number of copies of existing elements.
You could also consider whether you could adjust the structure so that you're adding to the end rather than the start (even if you know that you'll need to reverse it later). If you are appending a lot in a short space of time, it may be worth storing the data in a List (which can efficiently add to the end) and adding to the end. Depending on your needs, it may even be worth making a class that is a wrapper for a List and that hides the fact that it is reversed. You could make an indexer that maps i to Count - 1 - i, and so on, so that from the outside it appears as though your data is stored normally, even though the internal List actually holds it backwards.
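A rough sketch of that wrapper idea (illustrative, not a complete collection):

using System;
using System.Collections.Generic;

// Presents a prepend-friendly view over a List<T> that actually appends:
// logical index 0 maps to the last physical element.
public class ReversedList<T>
{
    private readonly List<T> inner = new List<T>();

    public int Count { get { return inner.Count; } }

    public T this[int i]
    {
        get { return inner[inner.Count - 1 - i]; }
        set { inner[inner.Count - 1 - i] = value; }
    }

    // O(1) amortized, because it is really an append on the inner list.
    public void Prepend(T item) { inner.Add(item); }

    public T[] ToArray()
    {
        T[] result = inner.ToArray();
        Array.Reverse(result); // materialize in logical order
        return result;
    }
}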
OK guys, let's take a look at the performance issue regarding this question.
This is not an answer, just a microbenchmark to see which option is more efficient.
So, let's set the scenario:
A byte array of 1,000,000 items, randomly populated
We need to prepend item 0x00
We have 3 options:
Manually creating and populating the new array
Manually creating the new array and using Array.Copy (#Monroe)
Creating a list, loading the array, inserting the item and converting the list to an array
Here's the code:
byte[] byteArray = new byte[1000000];
for (int i = 0; i < byteArray.Length; i++)
{
    byteArray[i] = Convert.ToByte(DateTime.Now.Second);
}

Stopwatch stopWatch = new Stopwatch();

// #1 Manually creating and populating a new array
stopWatch.Start();
byte[] extendedByteArray1 = new byte[byteArray.Length + 1];
extendedByteArray1[0] = 0x00;
for (int i = 0; i < byteArray.Length; i++)
{
    extendedByteArray1[i + 1] = byteArray[i];
}
stopWatch.Stop();
Console.WriteLine(string.Format("#1: {0} ms", stopWatch.ElapsedMilliseconds));
stopWatch.Reset();

// #2 Using a new array and Array.Copy
stopWatch.Start();
byte[] extendedByteArray2 = new byte[byteArray.Length + 1];
extendedByteArray2[0] = 0x00;
Array.Copy(byteArray, 0, extendedByteArray2, 1, byteArray.Length);
stopWatch.Stop();
Console.WriteLine(string.Format("#2: {0} ms", stopWatch.ElapsedMilliseconds));
stopWatch.Reset();

// #3 Using a List
stopWatch.Start();
List<byte> byteList = new List<byte>();
byteList.AddRange(byteArray);
byteList.Insert(0, 0x00);
byte[] extendedByteArray3 = byteList.ToArray();
stopWatch.Stop();
Console.WriteLine(string.Format("#3: {0} ms", stopWatch.ElapsedMilliseconds));
stopWatch.Reset();

Console.ReadLine();
And the results are:
#1: 9 ms
#2: 1 ms
#3: 6 ms
I've run it multiple times and I got different numbers, but the proportion is always the same: #2 is always the most efficient choice.
My conclusion: arrays are more efficient than Lists (although they provide less functionality), and somehow Array.Copy is really optimized (I would like to understand why, though).
Any feedback will be appreciated.
Best regards.
PS: this is not a swordfight post, we are at a Q&A site to learn and teach. And learn.
The easiest and cleanest way for .NET 4.7.1 and above is to use the side-effect-free Prepend().
Adds a value to the beginning of the sequence.
Example
// Creating an array of numbers
var numbers = new[] { 1, 2, 3 };
// Trying to prepend any value of the same type
var results = numbers.Prepend(0);
// output is 0, 1, 2, 3
Console.WriteLine(string.Join(", ", results));
As you surmised, the fastest way to do this is to create new array of length + 1 and copy all the old values over.
If you are going to be doing this many times, then I suggest using a List<byte> instead of byte[], as the cost of reallocating and copying while growing the underlying storage is amortized more effectively; in the usual case, the underlying vector in the List is grown by a factor of two each time an addition or insertion is made to the List that would exceed its current capacity.
...
byte[] newValues = new byte[values.Length + 1];
newValues[0] = 0x00; // set the prepended value
Array.Copy(values, 0, newValues, 1, values.Length); // copy the old values
When I need to append data frequently but also want O(1) random access to individual elements, I'll use an array that is over-allocated by some amount of padding for quickly adding new values. This means you need to store the actual content length in another variable, as array.Length will indicate the length plus the padding. A new value gets appended by using one slot of the padding; no allocation and copy are necessary until you run out of padding. In effect, allocation is amortized over several append operations. There are speed/space trade-offs: if you have many of these data structures, you could have a fair amount of padding in use at any one time in the program.
The same technique can be used for prepending. Just as with appending, you can introduce an interface or abstraction between the users and the implementation: you can have several slots of padding so that new memory allocation is only necessary occasionally. As some have suggested above, you can also implement a prepending interface with an appending data structure that reverses the indexes.
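A minimal sketch of the over-allocated append buffer described above (essentially what List<T> does internally; a prepend variant would keep its padding at the front instead):

public class PaddedBuffer
{
    private byte[] data = new byte[16]; // capacity includes the padding
    private int count;                  // actual content length

    public int Count { get { return count; } }

    public byte this[int i] { get { return data[i]; } } // O(1) access

    public void Append(byte value)
    {
        if (count == data.Length)
        {
            // Out of padding: grow (here by doubling) and copy once.
            byte[] bigger = new byte[data.Length * 2];
            Array.Copy(data, bigger, count);
            data = bigger;
        }
        data[count++] = value; // the common case is just a slot write
    }
}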
I'd package the data structure as an implementation of some generic collection interface, so that the interface appears fairly normal to the user (such as an array list or something).
(Also, if removal is supported, it's probably useful to clear elements as soon as they are removed to help reduce gc load.)
The main point is to consider the implementation and the interface separately, as decoupling them gives you the flexibility to choose varied implementations or to hide implementation details using a minimal interface.
There are many other data structures you could use, depending on their applicability to your domain. Ropes or gap buffers (see What is best data structure suitable to implement editor like notepad?); tries do some useful things, too.
I know this is a VERY old post, but I actually like using lambdas. Sure, my code may NOT be the most efficient way, but it's readable and in one line. I use a combination of .Concat and ArraySegment.
string[] originalStringArray = new string[] { "1", "2", "3", "5", "6" };
int firstElementZero = 0;
int insertAtPositionZeroBased = 3;
string stringToPrepend = "0";
string stringToInsert = "FOUR"; // Deliberate !!!
originalStringArray = new string[] { stringToPrepend }
    .Concat(originalStringArray).ToArray();
insertAtPositionZeroBased += 1; // BECAUSE we prepended !!
originalStringArray = new ArraySegment<string>(originalStringArray, firstElementZero, insertAtPositionZeroBased)
    .Concat(new string[] { stringToInsert })
    .Concat(new ArraySegment<string>(originalStringArray, insertAtPositionZeroBased, originalStringArray.Length - insertAtPositionZeroBased))
    .ToArray();
The best choice depends on what you're going to be doing with this collection later on down the line. If that's the only length-changing edit that will ever be made, then your best bet is to create a new array with one additional slot and use Array.Copy() to do the rest. No need to initialize the first value, since new C# arrays are always zeroed out:
byte[] PrependWithZero(byte[] input)
{
    var newArray = new byte[input.Length + 1];
    Array.Copy(input, 0, newArray, 1, input.Length);
    return newArray;
}
If there are going to be other length-changing edits that might happen, the most performant option might be to use a List<byte> all along, as long as the additions aren't always to the beginning. (If that's the case, even a linked list might not be an option that you can dismiss out of hand.):
var list = new List<byte>(input);
list.Insert(0, 0);
I am aware this is an over-4-year-old accepted post, but for those to whom it might be relevant, Buffer.BlockCopy would be faster.
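For byte arrays it is a drop-in replacement for the Array.Copy call in the accepted answer (note that Buffer.BlockCopy counts in bytes, which for a byte[] is the same as the element count):

byte[] newValues = new byte[values.Length + 1];
newValues[0] = 0x00; // set the prepended value
Buffer.BlockCopy(values, 0, newValues, 1, values.Length); // byte-level copy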
I'm writing an app that will create thousands of small objects and store them recursively in arrays. By "recursively" I mean that each instance of K will have an array of K instances, which will each have an array of K instances, and so on. This array plus one int field are the only properties, plus some methods. I found that memory usage grows very fast even for a small amount of data (about 1MB), and when the data I'm processing is about 10MB I get an "OutOfMemoryException", not to mention when it's bigger (I have 4GB of RAM) :). So what do you suggest I do? I figured that if I created a separate class V to process those objects, so that instances of K would have only an array of K's plus one integer field, and made K a struct rather than a class, it should optimize things a bit - no garbage collection and stuff... But it's a bit of a challenge, so I'd rather ask you whether it's a good idea before I start a total rewrite :).
EDIT:
Ok, some abstract code
public void Add(string word)
{
    int i;
    string shorterWord;

    if (word.Length > 0)
    {
        i = //something, it's really irrelevant

        if (t[i] == null)
        {
            t[i] = new MyClass();
        }

        shorterWord = word.Substring(1);

        // end of word
        if (shorterWord.Length == 0)
        {
            t[i].WordEnd = END;
        }

        // saving the word letter by letter
        t[i].Add(shorterWord);
    }
}
When researching deeper into this I had the following assumptions (they may be inexact; I'm getting old for a programmer). A class has extra memory consumption because a reference is required to address it; storing the reference needs an Int32-sized pointer on a 32-bit compile. A class is also always allocated on the heap (I can't remember if C++ has other possibilities; I would venture yes).
The short answer, found in the article below: an object has a 12-byte basic footprint, plus 4 possibly unused bytes depending on your class (no doubt something to do with padding).
http://www.codeproject.com/Articles/231120/Reducing-memory-footprint-and-object-instance-size
Another issue you'll run into is that arrays also have an overhead. One possibility would be to manage your own offsets into a larger array or arrays, which in turn gets closer to something a more efficient language would be better suited for.
I'm not sure if there are libraries that provide storage for small objects in an efficient manner. There probably are.
My take on it: use structs, manage your own offset into a large array, and use proper packing instructions if it serves you (although I suspect this comes at a runtime cost of a few extra instructions each time you address unevenly packed data):
[StructLayout(LayoutKind.Sequential, Pack = 1)]
Your stack is blowing up.
Do it iteratively instead of recursively.
You're not blowing the system stack up, you're blowing the call stack up; 10K function calls will blow it out of the water.
You need proper tail recursion, which is just an iterative hack.
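A minimal sketch of the iterative shape, assuming each MyClass node exposes the child array t and the WordEnd/END members that the posted Add method implies:

public void Add(string word)
{
    MyClass node = this;
    while (word.Length > 0)
    {
        int i = 0; // placeholder for the same index computation as before
        if (node.t[i] == null)
        {
            node.t[i] = new MyClass();
        }
        word = word.Substring(1);
        if (word.Length == 0)
        {
            node.t[i].WordEnd = END;
        }
        node = node.t[i]; // descend one level instead of recursing
    }
}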
Make sure you have enough memory in your system, over 100MB+ etc.; it really depends on your setup. A linked list of recursive objects is what you are looking at. If you keep recursing, it is going to hit the memory limit and an OutOfMemoryException will be thrown. Make sure you keep track of the memory usage in any program. Nothing is unlimited, especially memory. If memory is limited, save it to disk.
It looks like there is infinite recursion in your code, and that is why out of memory is thrown. Check the code: there should be a start and an end to the recursion. Otherwise it will go over 10 terabytes of memory at some point.
You can use a better data structure,
i.e. each letter can be a byte (a→0, b→1, ...). Each word fragment can be indexed as well, especially substrings. You should get away with significantly less memory (though with a performance penalty).
Just list your recursive algorithm (with sanitized variable names). If you are doing a BFS-type traversal and keeping all objects in memory, you will run out of memory. In that case, for example, replace it with DFS.
Edit 1:
You can speed up the algorithm by estimating how many items you will generate, then allocating that much memory at once. As the algorithm progresses, fill up the allocated memory. This reduces fragmentation, reallocation, and copy-on-full-array operations.
Nonetheless, after you are done operating on these generated words, you should delete them from your data structure so they can be GC-ed and you don't run out of memory.
This is for small payloads.
I am looking to achieve 1,000,000,000 per 100ms.
The standard BinaryFormatter is very slow. The DataContractSerializer is slower than BinaryFormatter.
Protocol buffers (http://code.google.com/p/protobuf-net/) seem slower than the BinaryFormatter for small objects!
Are there any more serialization mechanisms I should be looking at, either hardcore coding or open source projects?
EDIT:
I am serializing in memory and then transmitting the payload over TCP on an async socket. The payloads are generated in memory and are small double arrays (10 to 500 points) with a ulong identifier.
Your performance requirement restricts the available serializers to zero. A custom BinaryWriter and BinaryReader would be the fastest you could get.
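As an illustration, a sketch of what that could look like for the payloads described in the question (a ulong identifier plus a small double array); this is not a tuned implementation:

using System.IO;

// Serialize one payload: identifier, count prefix, then the raw doubles.
static void WritePayload(BinaryWriter writer, ulong id, double[] points)
{
    writer.Write(id);
    writer.Write(points.Length);
    for (int i = 0; i < points.Length; i++)
        writer.Write(points[i]);
}

// Deserialize one payload written by WritePayload.
static double[] ReadPayload(BinaryReader reader, out ulong id)
{
    id = reader.ReadUInt64();
    int count = reader.ReadInt32();
    double[] points = new double[count];
    for (int i = 0; i < count; i++)
        points[i] = reader.ReadDouble();
    return points;
}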
I'd have expected Protobuf-net to be faster even for small objects... but you may want to try my Protocol Buffer port as well. I haven't used Marc's port for a while - mine was faster when I last benchmarked, but I'm aware that he's gone through a complete rewrite since then :)
I doubt that you'll achieve serializing a billion items in 100ms whatever you do though... I think that's simply an unreasonable expectation, especially if this is writing to disk. (Obviously if you're simply overwriting the same bit of memory repeatedly you'll get a lot better performance than serializing to disk, but I doubt that's really what you're trying to do.)
If you can give us more context, we may be able to help more. Are you able to spread the load out over multiple machines, for example? (Multiple cores serializing to the same IO device is unlikely to help, as I wouldn't expect this to be a CPU-bound operation if it's writing to a disk or the network.)
EDIT: Suppose each object is 10 doubles (8 bytes each) with a ulong identifier (8 bytes). That's 88 bytes per object at minimum. So you're trying to serialize on the order of 88GB in 100ms. I really don't think that's achievable, whatever you use.
I'm running my Protocol Buffers benchmarks now (they give bytes serialized per second) but I highly doubt they'll give you what you want.
You claim small items are slower than BinaryFormatter, but every time I've measured it I've found the exact opposite, for example:
Performance Tests of Serializations used by WCF Bindings
I conclude, especially with the v2 code, that this may well be your fastest option. If you can post your specific benchmark scenario I'll happily help see what is "up"... If you can't post it here, if you want to email it to me directly (see profile) that would be OK too. I don't know if your stated timings are possible under any scheme, but I'm very sure I can get you a lot faster than whatever you are seeing.
With the v2 code, the CompileInPlace gives the fastest result - it allows some IL tricks that it can't use if compiling to a physical dll.
The only reason to serialize objects is to make them compatible with a generic transport medium. Network, disk, etc. The perf of the serializer never matters because the transport medium is always so much slower than the raw perf of a CPU core. Easily by two orders of magnitude or more.
Which is also the reason that attributes are an acceptable trade-off. They are also I/O bound, their initialization data has to be read from the assembly metadata. Which requires a disk read for the first time.
So, if you are setting perf requirements, you need to focus 99% on the capability of the transport medium. A billion 'payloads' in 100 milliseconds requires very beefy hardware. Assume a payload is 16 bytes, you'll need to move 160 gigabytes in a second. This is quite beyond even the memory bus bandwidth inside the machine. DDR RAM moves at about 5 gigabytes per second. A one gigabit Ethernet NIC moves at 125 megabytes per second, burst. A commodity hard drive moves at 65 megabytes per second, assuming no seeking.
Your goal is not realistic with current hardware capabilities.
You could write custom serialization by implementing ISerializable on your data structures. In any case, you will probably face some "impedance" from the hardware itself when serializing with these requirements.
Protobuf is really quick, but it has its limitations: http://code.google.com/p/protobuf-net/wiki/Performance
In my experience, Marc's Protocol Buffers implementation is very good. I haven't used Jon's. However, you should be trying to use techniques to minimise the data and not serialise the whole lot.
I would have a look at the following:

1. If the messages are small, you should look at what entropy you have. You may have fields that can be partially or completely de-duplicated. If the communication is between two parties only, you may get benefits from building a dictionary at both ends.

2. You are using TCP, which has enough overhead without a payload on top. You should minimise this by batching your messages into larger bundles and/or look at UDP instead. Batching itself, when combined with #1, may get you closer to your requirement when you average your total communication out.

3. Is the full data width of double required, or is it there for convenience? If the extra bits are not used, this will be a chance for optimisation when converting to a binary stream.

Generally, generic serialisation is great when you have multiple messages you have to handle over a single interface, or when you don't know the full implementation details. In this case it would probably be better to build your own serialisation methods to convert a single message structure directly to byte arrays. Since you know the full implementation on both sides, direct conversion won't be a problem. It would also ensure that you can inline the code and prevent box/unboxing as much as possible.
This is the FASTEST approach I'm aware of. It does have its drawbacks. Like a rocket, you wouldn't want it on your car, but it has its place. You need to set up your structs and have that same struct on both ends of your pipe. The struct also needs to be a fixed size, or it gets more complicated than this example.
Here is the perf I get on my machine (i7 920, 12gb ram) Release mode, without debugger attached. It uses 100% cpu during the test, so this test is CPU bound.
Finished in 3421ms, Processed 52.15 GB
For data write rate of 15.25 GB/s
Round trip passed
.. and the code...
class Program
{
    unsafe static void Main(string[] args)
    {
        int arraySize = 100;
        int iterations = 10000000;
        ms[] msa = new ms[arraySize];
        for (int i = 0; i < arraySize; i++)
        {
            msa[i].d1 = i + .1d;
            msa[i].d2 = i + .2d;
            msa[i].d3 = i + .3d;
            msa[i].d4 = i + .4d;
            msa[i].d5 = i + .5d;
            msa[i].d6 = i + .6d;
            msa[i].d7 = i + .7d;
        }

        int sizeOfms = Marshal.SizeOf(typeof(ms));
        byte[] bytes = new byte[arraySize * sizeOfms];

        TestPerf(arraySize, iterations, msa, sizeOfms, bytes);

        // let's round trip it.
        var msa2 = new ms[arraySize]; // array of structs we want to push the bytes into
        var handle2 = GCHandle.Alloc(msa2, GCHandleType.Pinned); // get a handle to that array
        Marshal.Copy(bytes, 0, handle2.AddrOfPinnedObject(), bytes.Length); // do the copy
        handle2.Free(); // clean up the handle

        // assert that we didn't lose any data.
        var passed = true;
        for (int i = 0; i < arraySize; i++)
        {
            if (msa[i].d1 != msa2[i].d1
                || msa[i].d2 != msa2[i].d2
                || msa[i].d3 != msa2[i].d3
                || msa[i].d4 != msa2[i].d4
                || msa[i].d5 != msa2[i].d5
                || msa[i].d6 != msa2[i].d6
                || msa[i].d7 != msa2[i].d7)
            {
                passed = false;
                break;
            }
        }
        Console.WriteLine("Round trip {0}", passed ? "passed" : "failed");
    }

    unsafe private static void TestPerf(int arraySize, int iterations, ms[] msa, int sizeOfms, byte[] bytes)
    {
        // start benchmark.
        var sw = Stopwatch.StartNew();
        // this cheats a little bit and reuses the same buffer
        // for each thread, which would not work IRL
        var plr = Parallel.For(0, iterations / 1000, i => // just to be nice to the task pool, chunk tasks into 1000s
        {
            for (int j = 0; j < 1000; j++)
            {
                // get a handle to the struct[] we want to copy from
                var handle = GCHandle.Alloc(msa, GCHandleType.Pinned);
                Marshal.Copy(handle.AddrOfPinnedObject(), bytes, 0, bytes.Length); // copy from it
                handle.Free(); // clean up the handle
                // Here you would want to write to some buffer or something :)
            }
        });
        // Stop benchmark
        sw.Stop();
        var size = arraySize * sizeOfms * (double)iterations / 1024 / 1024 / 1024d; // convert from bytes to GB
        Console.WriteLine("Finished in {0}ms, Processed {1:N} GB", sw.ElapsedMilliseconds, size);
        Console.WriteLine("For data write rate of {0:N} GB/s", size / (sw.ElapsedMilliseconds / 1000d));
    }
}

[StructLayout(LayoutKind.Explicit, Size = 56, Pack = 1)]
struct ms
{
    [FieldOffset(0)]
    public double d1;
    [FieldOffset(8)]
    public double d2;
    [FieldOffset(16)]
    public double d3;
    [FieldOffset(24)]
    public double d4;
    [FieldOffset(32)]
    public double d5;
    [FieldOffset(40)]
    public double d6;
    [FieldOffset(48)]
    public double d7;
}
If you don't want to take the time to implement a comprehensive explicit serialization/deserialization mechanism, try this: http://james.newtonking.com/json/help/html/JsonNetVsDotNetSerializers.htm ...
In my usage with large objects (1GB+ when serialized to disk), I find that the file generated by the Newtonsoft library is 4.5 times smaller and takes one-sixth as long to process as the one produced by the BinaryFormatter.