Best practice/way to serialize an object using C#

I'm working on collaborative drawing (socket programming) in which I have to send and receive 100 to 1000 instances of the C# Point class, so I wanted to know what the best way is to send and receive these points. I have two options: one is List<Point> and the other is Point[], using either BinaryFormatter or JSON. But I have read that JSON is meant for sending small amounts of data, and I don't know whether it will work with C# Windows applications or not.
Thanks for any help.

There are so many ways to serialize your data.
If you want to transfer a lot of objects, don't use JSON: it's a text format, and every unnecessary character is a wasted byte. If your object takes, say, 20 bytes but its textual representation takes 100 bytes (e.g. because of the repeated field names), then it's a bad choice for large collections, especially over a network.
Unless you need the serialized output to be human-readable, of course. That is, I believe, the main reason to use JSON.
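As a rough illustration of that overhead, here is a minimal sketch comparing serialized sizes (assuming System.Text.Json and a simple two-double Point type defined just for this test; the exact numbers will vary with the serializer and the type):

using System;
using System.IO;
using System.Linq;
using System.Text;
using System.Text.Json;

struct Point
{
    public double X { get; set; }
    public double Y { get; set; }
}

class SizeComparison
{
    static void Main()
    {
        var points = Enumerable.Range(0, 1000)
                               .Select(i => new Point { X = i, Y = i })
                               .ToArray();

        // JSON repeats the property names and punctuation for every element.
        string json = JsonSerializer.Serialize(points);
        int jsonBytes = Encoding.UTF8.GetByteCount(json);

        // Raw binary is two doubles (16 bytes) per point, with no names at all.
        int binaryBytes;
        using (var ms = new MemoryStream())
        using (var writer = new BinaryWriter(ms))
        {
            foreach (var p in points)
            {
                writer.Write(p.X);
                writer.Write(p.Y);
            }
            binaryBytes = (int)ms.Length;
        }

        Console.WriteLine($"JSON: {jsonBytes} bytes, raw binary: {binaryBytes} bytes");
    }
}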
Binary serialization is a different matter entirely. There are many serializers out there: BinaryFormatter, protobuf-net by Marc Gravell, Migrant (which I co-author), and many, many others.
The choice is hard and domain-specific. Do you need object graphs (shared references) preserved? Do you have many different object types or large collections? Different libraries will give you different results.
As your data set is not very big (I assume we're talking about a small class/struct Point), I'd focus on readability. You don't want to contort your design just to make it easily serializable, and you seldom want to write wrappers (because they have to be maintained).
Use datatypes that are meaningful in your context.
Do you need random access to your Points? Then probably you need an array. Do you need to create a resizable collection and iterate over it? List might be better.
I ran a simple test using 1,000,000 System.Windows.Point instances, with both BinaryFormatter and Migrant. There was no real difference whether I used Point[] or List<Point>.
The choice is yours.
Here is a snippet of the test I ran. The code is not very polished, but you can change it to List<Point> with no effort. If you need a simple framework to serialize data, I can recommend Migrant. Note the one-liner usage: result = Serializer.DeepClone(source); ;-)
using System;
using Antmicro.Migrant;
using System.Diagnostics;
using System.Runtime.Serialization.Formatters.Binary;
using System.IO;
using System.Windows;

namespace test15
{
    class MainClass
    {
        private static void EnsureEqual(Point[] source, Point[] result)
        {
            for (var i = 0; i < result.Length; ++i) {
                if (source[i] != result[i]) {
                    throw new Exception();
                }
            }
        }

        public static void Main(string[] args)
        {
            var source = new Point[1000000];
            Point[] result, binResult;
            var timer = new Stopwatch();
            for (var i = 0; i < source.Length; ++i) {
                source[i] = new Point(i, i);
            }

            // Migrant
            timer.Start();
            result = Serializer.DeepClone(source);
            timer.Stop();
            EnsureEqual(source, result);
            Console.WriteLine("Migrant time: {0}", timer.Elapsed);
            timer.Reset();

            // Binary formatter
            var binaryForm = new BinaryFormatter();
            using (var ms = new MemoryStream()) {
                timer.Start();
                binaryForm.Serialize(ms, source);
                ms.Position = 0;
                binResult = binaryForm.Deserialize(ms) as Point[];
                timer.Stop();
            }
            Console.WriteLine("Binary formatter time: {0}", timer.Elapsed);
            EnsureEqual(source, binResult);
        }
    }
}

Related

Is Marshal.Copy too processor-intensive in this situation?

I am working on a realtime simulation model. The models are written in unmanaged code, but they are controlled by C# managed code, called the ExecutiveManager. An ExecutiveManager runs multiple models at a time and controls their timing (for example, if a model has a "framerate" of 20 frames per second, the executive tells the model when to start its next frame).
We are seeing a consistently high load on the CPU when running the simulation; it can get to 100% and stay there on a machine that should be more than adequate. I have used a processor profiler to determine where the issues are, and it pointed me to two methods: WriteMemoryRegion and ReadMemoryRegion. The ExecutiveManager makes the calls to these methods. Models have shared memory regions, and the ExecutiveManager reads and writes these regions using these methods. Both read and write make calls to Marshal.Copy, and my gut tells me that's where the issue is, but I don't want to trust my gut! We are going to do further testing to narrow things down, but I wanted to do a quick sanity check on Marshal.Copy. WriteMemoryRegion and ReadMemoryRegion are called each frame, and furthermore they're called for each model in the ExecutiveManager, and each model typically has 6 shared regions. So for 10 models, each with 6 regions, running at 20 frames per second and calling both WriteMemoryRegion and ReadMemoryRegion, that's 2400 calls of Marshal.Copy per second. Is this unreasonable, or could my problem lie elsewhere?
public async Task ReadMemoryRegion(MemoryRegionDefinition g) {
    if (!cache.ContainsKey(g.Name)) {
        cache.Add(g.Name, mmff.CreateOrOpen(g.Name, g.Size));
    }
    var mmf = cache[g.Name];
    using (var stream = mmf.CreateViewStream())
    using (var reader = brf.Create(stream)) {
        var buffer = reader.ReadBytes(g.Size);
        await WriteIcBuffer(g, buffer).ConfigureAwait(false);
    }
}

private Task WriteIcBuffer(MemoryRegionDefinition g, byte[] buffer) {
    Marshal.Copy(buffer, 0, new IntPtr(g.BaseAddress), buffer.Length);
    return Task.FromResult(0);
}

public async Task WriteMemoryRegion(MemoryRegionDefinition g) {
    if (!cache.ContainsKey(g.Name)) {
        if (g.Size > 0) {
            cache.Add(g.Name, mmff.CreateOrOpen(g.Name, g.Size));
        } else if (g.Size == 0) {
            throw new EmptyGlobalException(
                $"Global {g.Name} not created as it does not contain any variables.");
        } else {
            throw new NegativeSizeGlobalException(
                $"Global {g.Name} not created as it has a negative size.");
        }
    }
    var mmf = cache[g.Name];
    using (var stream = mmf.CreateViewStream())
    using (var writer = bwf.Create(stream)) {
        var buffer = await ReadIcBuffer(g);
        writer.Write(buffer);
    }
}

private Task<byte[]> ReadIcBuffer(MemoryRegionDefinition g) {
    var buffer = new byte[g.Size];
    Marshal.Copy(new IntPtr(g.BaseAddress), buffer, 0, g.Size);
    return Task.FromResult(buffer);
}
I need to come up with a solution so that my processor isn't catching on fire. I'm very green in this area so all ideas are welcome. Again, I'm not sure Marshal.Copy is the issue, but it seems possible. Please let me know if you see other issues that could contribute to the processor problem.
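As a quick way to sanity-check Marshal.Copy itself, one could time it in isolation. A minimal sketch, where the 64 KB region size is an assumption (substitute your real sizes) and the 2400 figure comes from the arithmetic above:

using System;
using System.Diagnostics;
using System.Runtime.InteropServices;

class MarshalCopySanityCheck
{
    static void Main()
    {
        const int regionSize = 64 * 1024; // assumed region size
        const int callsPerSecond = 2400;  // figure from the question

        var managedBuffer = new byte[regionSize];
        IntPtr unmanagedBlock = Marshal.AllocHGlobal(regionSize);
        try
        {
            var timer = Stopwatch.StartNew();
            for (int i = 0; i < callsPerSecond; i++)
            {
                // One second's worth of managed -> unmanaged region copies.
                Marshal.Copy(managedBuffer, 0, unmanagedBlock, managedBuffer.Length);
            }
            timer.Stop();
            Console.WriteLine($"{callsPerSecond} copies of {regionSize} bytes took {timer.ElapsedMilliseconds} ms");
        }
        finally
        {
            Marshal.FreeHGlobal(unmanagedBlock);
        }
    }
}

If that number is a small fraction of a second, the copies themselves are probably not what is saturating the CPU.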

Perform operations while streamreader is open or copy stream locally, close the stream and then perform operations?

Which of the following approaches is better? That is, is it better to copy the data locally, close the stream, and then perform whatever operations are needed on the data, or to perform the operations while the stream is open? Assume that the input from the stream is huge.
First method:
public static int calculateSum(string filePath)
{
    int sum = 0;
    var list = new List<int>();
    using (StreamReader sr = new StreamReader(filePath))
    {
        while (!sr.EndOfStream)
        {
            list.Add(int.Parse(sr.ReadLine()));
        }
    }
    foreach (int item in list)
        sum += item;
    return sum;
}
Second method:
public static int calculateSum(string filePath)
{
    int sum = 0;
    using (StreamReader sr = new StreamReader(filePath))
    {
        while (!sr.EndOfStream)
        {
            sum += int.Parse(sr.ReadLine());
        }
    }
    return sum;
}
If the file is modified often, then read the data in and then work with it. If it is not accessed often, then you are fine to read the file one line at a time and work with each line separately.
In general, if you can do it in a single pass, then do it in a single pass. You indicate that the input is huge, so it might not all fit into memory. If that's the case, then your first option isn't even possible.
Of course, there are exceptions to every rule of thumb. But you don't indicate that there's anything special about the file or the access pattern (other processes wanting to access it, for example) that prevents you from keeping it open longer than absolutely necessary to copy the data.
I don't know if your example is a real-world scenario or if you're just using the sum thing as a placeholder for more complex processing. In any case, if you're processing a file line-by-line, you can save yourself a lot of trouble by using File.ReadLines:
int sum = 0;
foreach (var line in File.ReadLines(filePath))
{
    sum += int.Parse(line);
}
This does not read the entire file into memory at once. Rather, it uses an enumerator to present one line at a time, and only reads as much as it must to maintain a relatively small (probably four kilobyte) buffer.
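If the per-line work really is just a sum, the same streaming idea can be written as a one-liner with LINQ:

// requires: using System.IO; using System.Linq;
int sum = File.ReadLines(filePath).Sum(line => int.Parse(line));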

Should I try to avoid the "new" keyword in ultra-low-latency software?

I'm writing HFT trading software. I care about every single microsecond. It is currently written in C#, but I will migrate to C++ soon.
Consider code like this:
// Original
class Foo {
    ....

    // method is called from one thread only so no need to be thread-safe
    public void FrequentlyCalledMethod() {
        var actions = new List<Action>();
        for (int i = 0; i < 10; i++) {
            actions.Add(new Action(....));
        }
        // use actions, synchronous
        executor.Execute(actions);
        // now actions can be deleted
    }
}
I guess that ultra-low-latency software should not use the "new" keyword too much, so I moved actions to be a field:
// Version 1
class Foo {
    ....

    private List<Action> actions = new List<Action>();

    // method is called from one thread only so no need to be thread-safe
    public void FrequentlyCalledMethod() {
        actions.Clear();
        for (int i = 0; i < 10; i++) {
            actions.Add(new Action { type = ActionType.AddOrder, price = 100 + i });
        }
        // use actions, synchronous
        executor.Execute(actions);
        // now actions can be deleted
    }
}
And probably I should try to avoid the "new" keyword altogether? I could use a "pool" of pre-allocated objects:
// Version 2
class Foo {
    ....

    private List<Action> actions = new List<Action>();
    private Action[] actionPool = new Action[10];

    // method is called from one thread only so no need to be thread-safe
    public void FrequentlyCalledMethod() {
        actions.Clear();
        for (int i = 0; i < 10; i++) {
            var action = actionPool[i];
            action.type = ActionType.AddOrder;
            action.price = 100 + i;
            actions.Add(action);
        }
        // use actions, synchronous
        executor.Execute(actions);
        // now actions can be deleted
    }
}
How far should I go?
How important is it to avoid new?
Will I gain anything by using pre-allocated objects that I only need to configure (set type and price in the example above)?
Please note that this is ultra-low latency, so let's assume that performance is preferred over readability, maintainability, etc.
In C++ you don't need new to create an object that has limited scope.
void FrequentlyCalledMethod()
{
    std::vector<Action> actions;
    actions.reserve( 10 );
    for (int i = 0; i < 10; i++)
    {
        actions.push_back( Action(....) );
    }
    // use actions, synchronous
    executor.Execute(actions);
    // now actions can be deleted
}
If Action is a base class and the actual types you have are of a derived class, you will need a pointer or smart pointer and new here. But no need if Action is a concrete type and all the elements will be of this type, and if this type is default-constructible, copyable and assignable.
In general though, it is highly unlikely that your performance benefits will come from not using new. It is just good practice here in C++ to use local function scope when that is the scope of your object. This is because in C++ you have to take more care of resource management, and that is done with a technique known as "RAII" - which essentially means taking care of how a resource will be deleted (through a destructor of an object) at the point of allocation.
High performance is more likely to come about through:
proper use of algorithms
proper parallel-processing and synchronisation techniques
effective caching and lazy evaluation.
As much as I detest HFT, I'm going to tell you how to get maximum performance out of each thread on a given piece of iron.
Here's an explanation of an example where a program as originally written was made 730 times faster.
You do it in stages. At each stage, you find something that takes a good percentage of time, and you fix it.
The keyword is find, as opposed to guess.
Too many people just eyeball the code and fix what they think will help, and often (but not always) it does help, somewhat.
That's guesswork.
To get real speedup, you need to find all the problems, not just the few you can guess.
If your program is doing a lot of new, then chances are that at some point that will be one of the things you need to fix.
But it's not the only thing.
Here's the theory behind it.
For high-performance trading engines at good HFT shops, avoiding new/malloc in C++ code is a basic requirement.
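As a side note on the C# version of the question: if the Action type can be made a struct with value-type fields (a hypothetical simplification here; the real Action presumably carries more state), then a pre-sized array of structs gives zero heap allocations per call, because the structs live inline in the array:

// Hypothetical simplified versions of the types from the question.
enum ActionType { AddOrder }

struct OrderAction
{
    public ActionType Type;
    public double Price;
}

class Foo
{
    // Allocated once; the structs live inline in the array, so the
    // frequently called method allocates nothing and gives the GC no work.
    private readonly OrderAction[] actions = new OrderAction[10];

    public void FrequentlyCalledMethod()
    {
        for (int i = 0; i < actions.Length; i++)
        {
            actions[i].Type = ActionType.AddOrder;
            actions[i].Price = 100 + i;
        }
        // executor.Execute(actions);  // pass the same array every call
    }
}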

How does IEnumerable differ from IObservable under the hood?

I'm curious as to how IEnumerable differs from IObservable under the hood. I understand the pull and push patterns respectively, but how does C#, in terms of memory etc., notify subscribers (for IObservable) that they should receive the next bit of data to process? How does the observed instance know it's had a change in data to push to its subscribers?
My question comes from a test I was performing reading in lines from a file. The file was about 6Mb in total.
Standard Time Taken: 4.7s, lines: 36587
Rx Time Taken: 0.68s, lines: 36587
How is Rx able to massively improve a normal iteration over each of the lines in the file?
private static void ReadStandardFile()
{
    var timer = Stopwatch.StartNew();
    var linesProcessed = 0;
    foreach (var l in ReadLines(new FileStream(_filePath, FileMode.Open)))
    {
        var s = l.Split(',');
        linesProcessed++;
    }
    timer.Stop();
    _log.DebugFormat("Standard Time Taken: {0}s, lines: {1}",
        timer.Elapsed.ToString(), linesProcessed);
}

private static void ReadRxFile()
{
    var timer = Stopwatch.StartNew();
    var linesProcessed = 0;
    var query = ReadLines(new FileStream(_filePath, FileMode.Open)).ToObservable();
    using (query.Subscribe((line) =>
    {
        var s = line.Split(',');
        linesProcessed++;
    }));
    timer.Stop();
    _log.DebugFormat("Rx Time Taken: {0}s, lines: {1}",
        timer.Elapsed.ToString(), linesProcessed);
}

private static IEnumerable<string> ReadLines(Stream stream)
{
    using (StreamReader reader = new StreamReader(stream))
    {
        while (!reader.EndOfStream)
            yield return reader.ReadLine();
    }
}
My hunch is that the behavior you're seeing reflects the OS caching the file. I would imagine that if you reversed the order of the calls you would see a similar difference in speed, just swapped.
You could improve this benchmark by performing a few warm-up runs or by copying the input file to a temp file using File.Copy prior to testing each one. This way the file would not be "hot" and you would get a fair comparison.
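A minimal sketch of that kind of check, reusing the two methods from the question (each already logs its own timing):

// Warm-up pass so both timed runs see the file in the same (cached) state.
ReadStandardFile();
ReadRxFile();

// Now time each approach again, then repeat with the order swapped.
ReadStandardFile();
ReadRxFile();

ReadRxFile();
ReadStandardFile();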
I'd suspect that you're seeing some kind of internal optimization of the CLR. It probably caches the content of the file in memory between the two calls so that ToObservable can pull the content much faster...
Edit: Oh, the good colleague with the crazy nickname, eeh ... #sixlettervariables, was faster, and he's probably right: it's the OS doing the optimizing rather than the CLR.

Yield multiple IEnumerables

I have a piece of code that does calculations on assets. There are many millions of those, so I want to compute everything in streams. My current 'pipeline' looks like this:
I have a query that is executed as a DataReader.
Then my Asset class has a constructor that accepts an IDataReader;
public Asset(IDataReader rdr) {
    // logic that initializes fields
}
and a method that converts the IDataReader to an IEnumerable<Asset>
public static IEnumerable<Asset> ToAssets(IDataReader rdr) {
    // make sure the reader is in the right format
    CheckReaderFormat(rdr);
    // project the reader into IEnumerable<Asset>
    while (rdr.Read()) yield return new Asset(rdr);
}
That then gets passed into a function that does the actual calculations and projects it into an IEnumerable<Answer>.
That then gets a wrapper that exposes the Answers as an IDataReader, which in turn gets passed to an OracleBulkCopy, and the stream is written to the DB.
So far it works like a charm. Because of this setup I can swap the DataReader for an IEnumerable that reads from a file, or have the results written to a file, etc., all depending on how I string the classes/functions together.
Now: there are several things I can compute; for instance, besides the normal Answer I could have a DebugAnswer class that also outputs some intermediate numbers for debugging. So what I would like to do is project the IEnumerable into several output streams so I can put 'listeners' on them; that way I won't have to go over the data multiple times. How can I do that? Kind of like having several events and only firing certain code if there's a listener attached.
Also, sometimes I write to the DB but also to a zipfile, just to keep a backup of the results. So then I would like to have two 'listeners' on the IEnumerable: one that projects it as an IDataReader and another that writes straight to the file.
How do I output multiple output streams, and how can I put multiple listeners on one output stream? What lets me compose streams of data like that?
edit
Here is some pseudocode of what I would like to do:
foreach (Asset asset in Assets) {
    if (DebugListener != null) {
        // compute
        DebugAnswer da = new DebugAnswer { result = 100 };
        yield da to DebugListener; // so instead of yield return, yield to that stream
    }
    if (AnswerListener != null) {
        // compute basic stuff
        Answer a = new Answer { bla = 200 };
        yield a to AnswerListener;
    }
}
Thanks in advance,
Gert-Jan
What you're describing sounds sort of like what the Reactive framework provides via the IObservable interface, but I don't know for sure whether it allows multiple subscribers to a single subscription stream.
Update
If you take a look at the documentation for IObservable, it has a pretty good example of how to do the sort of thing you're doing, with multiple subscribers to a single object.
Your example rewritten using Rx:
// The stream of assets
IObservable<Asset> assets = ...

// The stream of each asset projected to a DebugAnswer
IObservable<DebugAnswer> debugAnswers = from asset in assets
                                        select new DebugAnswer { result = 100 };

// Subscribe the DebugListener to receive the debugAnswers
debugAnswers.Subscribe(DebugListener);

// The stream of each asset projected to an Answer
IObservable<Answer> answers = from asset in assets
                              select new Answer { bla = 200 };

// Subscribe the AnswerListener to receive the answers
answers.Subscribe(AnswerListener);
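One caveat: an observable built from an enumerable (for example via ToObservable) is cold, so each Subscribe would re-run the underlying query. To really make a single pass over the assets with both listeners attached, you can publish the stream; a rough sketch (with System.Reactive.Linq), assuming the same assets, DebugListener and AnswerListener as above:

// Publish makes the asset stream connectable, so both subscriptions
// share a single pass over the underlying data.
var shared = assets.Publish();

shared.Select(asset => new DebugAnswer { result = 100 }).Subscribe(DebugListener);
shared.Select(asset => new Answer { bla = 200 }).Subscribe(AnswerListener);

// Nothing is pulled from the source until Connect is called.
shared.Connect();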
This is exactly the job for Reactive Extensions (the IObservable/IObserver interfaces became part of .NET in 4.0; Rx itself is available as a library, including for 3.5).
You don't need multiple "listeners"; you just need pipeline components that aren't destructive or even necessarily transforming.
IEnumerable<T> PassThroughEnumerable<T>(IEnumerable<T> source, Action<T> action) {
    foreach (T t in source) {
        action(t);
        yield return t;
    }
}
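For example, the write-to-DB-plus-backup-to-zip case from the question could then stay a single pass, with the backup as a pass-through step (answers, backupWriter and WriteToDatabase are placeholders for your existing pieces):

// One pass over the answers: each item is handed to the backup action
// on its way to the existing IDataReader wrapper / OracleBulkCopy path.
var tapped = PassThroughEnumerable(answers, a => backupWriter.Write(a));
WriteToDatabase(tapped);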
Or, as you're processing in the pipeline, just raise some events to be consumed. You can make them async if you want:
IEnumerable<Asset> ToAssets(IDataReader rdr) {
    CheckReaderFormat(rdr);
    var h = this.DebugAsset;
    while (rdr.Read()) {
        var a = new Asset(rdr);
        if (h != null) h(this, a);
        yield return a;
    }
}

public event EventHandler<Asset> DebugAsset;
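Usage would then look something like this (AssetPipeline, Calculate and LogDebug are placeholder names for wherever ToAssets, your calculation step and your debug output live):

var pipeline = new AssetPipeline();

// Fires once per asset as it streams past; costs nothing if nobody subscribes.
pipeline.DebugAsset += (sender, asset) => LogDebug(asset);

foreach (var answer in Calculate(pipeline.ToAssets(reader)))
{
    // hand each answer to the bulk copy / zip writer as before
}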
If I got you right, it should be possible to replace or decorate the wrapper. The WrapperDecorator may forward calls to the normal OracleBulkCopy (or whatever you're using) and add some custom debug code.
Does that help you?
Matthias
