XML Serialization - Efficient? - C#

Hey everybody. I'm creating a catalog app where users add/download information on cars. This could result in hundreds, possibly thousands, of cars and their data (make, model, year, image, etc.). Seeing as WP7 has no database, I'm using XML. My question is: would it be efficient to store every object in a list and then serialize that entire list? When the user loads the app, the entire list is deserialized and every object is instantiated. Is there a better way of doing this? Thanks.
PS - I've come across DataContractSerializer, but I'm not sure if I should use it, since it seems to be WCF-related (and I'm not using WCF in my app).

Just do it and see. Unless every aspect of this is totally new to you, it should take less time to prototype and test something like this than it would take to have a discussion about it on SO - especially since the end result of the SO discussion will probably be someone telling you to prototype and test it.
If it's too slow, then you can look at alternatives - using a different kind of serialization method, partially deserializing the objects at startup to get the UI up and running and then continuing the deserialization in the background, or whatever.
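If you do go with whole-list serialization, DataContractSerializer works fine outside WCF. A minimal sketch under assumed names (the Car class and file paths are invented; on WP7 you'd open the streams from IsolatedStorageFile rather than File):

using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization;

[DataContract]
public class Car
{
    [DataMember] public string Make { get; set; }
    [DataMember] public string Model { get; set; }
    [DataMember] public int Year { get; set; }
}

public static class CarStore
{
    // Serializing the whole list in one call is fine for a few thousand small objects.
    public static void Save(List<Car> cars, string path)
    {
        var serializer = new DataContractSerializer(typeof(List<Car>));
        using (var stream = File.Create(path))
            serializer.WriteObject(stream, cars);
    }

    public static List<Car> Load(string path)
    {
        var serializer = new DataContractSerializer(typeof(List<Car>));
        using (var stream = File.OpenRead(path))
            return (List<Car>)serializer.ReadObject(stream);
    }
}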

Related

Best way to handle large amount of permanent data

I'm developing a PC app in Visual Studio where I'm showing the status of hundreds of sensors that are connected via WiFi. The thing is that I need to hold on to the sensor data even after I close the app, so I'm considering some form of permanent storage. These are the options I've considered:
1) My Sensor object is relatively compact with only a few properties. I could serialize all the objects before closing the app and load them every time the app starts anew.
2) I could throw all the properties (which are mostly strings and doubles) into a simple text file and create a custom protocol for storage and retrieval.
3) I could integrate a database with my app. Someone told me this is the best way to go about it, but I'm a bit hesitant seeing as I'm not familiar with DBs.
Which method would yield the best results in terms of resource usage and speed? Or is there some other, better way to go about this?
The first thing you need to do is understand your problem. For example, when the program is running, do you need to have everything in memory at the same time, or do you work with your sensors one at a time?
What is a "large amount of data"? For example, to me that will never be less than a million records (or a billion in some cases).
Once you know that, you shouldn't be scared of using something just because you are not familiar with it. Otherwise you are not looking for the best solution to your problem; you are just hacking around it in a way that feels comfortable.
That said, you have several ways of doing this. Like you said, you can serialize the data, store it as JSON, and a few other alternatives, but if we are talking about a "large amount of data that we want to persist", I would always call for the use of a database (the name says a lot). If you don't need to have everything in memory at the same time, then I believe that this is your best option.
I personally don't like them (again, a personal choice), but one way of not learning much SQL while still working with your objects is to use an ORM like NHibernate (you will also need to learn how to use it so you don't make things slower).
If you need to have everything loaded at the same time (most often that is not the case, so be sure of this), you need to know what you want to keep and serialize it. If you want that data to be readable by another tool or organized in a given way, consider a data format like XML or JSON.
You can also use a memory-mapped file. The file is permanent and keeps its data between program runs, so you just keep your data structs in the mmap-ed area and you're done.
MSDN manual here:
https://msdn.microsoft.com/en-us/library/windows/desktop/aa366556%28v=vs.85%29.aspx
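A rough C# sketch using the managed wrapper (System.IO.MemoryMappedFiles, .NET 4+); the file name and record layout here are invented for illustration:

using System;
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Runtime.InteropServices;

struct SensorRecord
{
    public int Id;
    public double Value;
}

class MmapDemo
{
    static void Main()
    {
        const int count = 1000;
        int recordSize = Marshal.SizeOf(typeof(SensorRecord));

        // The file persists between runs; records live directly in the mapped view.
        using (var mmf = MemoryMappedFile.CreateFromFile(
            "sensors.dat", FileMode.OpenOrCreate, "sensors", count * recordSize))
        using (var view = mmf.CreateViewAccessor())
        {
            var rec = new SensorRecord { Id = 42, Value = 3.14 };
            view.Write(42 * recordSize, ref rec);   // store record #42

            SensorRecord back;
            view.Read(42 * recordSize, out back);   // read it back
            Console.WriteLine("{0}: {1}", back.Id, back.Value);
        }
    }
}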
Since you need to load all the data once at the start of the program, the database case seems doubtful; a DB pays off when you need to load small bits of data many times. So the first two options seem preferable. I would advise hiding the specific solution behind an interface; then you can change it later.
Standard .NET serialization of the sensor array is probably simpler, and it will be easier to extend.
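As a sketch of that interface idea (all names here are hypothetical):

using System.Collections.Generic;

public class Sensor
{
    public string Name { get; set; }
    public double Value { get; set; }
}

// Callers depend only on this interface, so the backing store
// (serialized file, mmap, database) can be swapped later.
public interface ISensorStore
{
    void Save(IList<Sensor> sensors);
    IList<Sensor> Load();
}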

Can a DataContractSerializer be set up to ignore errors in a file rather than just fail entirely?

I'm using DataContractSerializer to save a large number of different classes that make up tree structures to XML files. I'm in the initial stages of writing this software, so at this point all the different components are changing around quite a bit, yet every time I make a change to a class I end up breaking my program's ability to open previously saved files.
My tree structures will still be functional if components are missing. Is there some way to tell DataContractSerializer to skip over data it has a problem deserializing and continue on, rather than just quitting at the first problem it hits?
I know one answer would be to write my own serialization class, but I'd rather not spend the time to do that. I was hoping to still be able to take advantage of DataContractSerializer, but without it being an all-or-nothing situation.
I think what you're looking for is IExtensibleDataObject. This way, any unexpected elements get read into a name-value dictionary maintained internally, and can even be serialized back later. See the following resources for help.
Blog post -- WCF Extensibility – Other Serialization Extensions
Forward-Compatible Data Contracts
Data Contract Versioning
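In code it is just a matter of implementing the interface on your data contracts; a minimal sketch (TreeNode is a made-up example class):

using System.Runtime.Serialization;

[DataContract]
public class TreeNode : IExtensibleDataObject
{
    [DataMember]
    public string Name { get; set; }

    // DataContractSerializer stashes any unrecognized elements here during
    // deserialization and writes them back out on serialization.
    public ExtensionDataObject ExtensionData { get; set; }
}

Note that this covers unknown/extra elements; since [DataMember] fields are optional by default, missing members also deserialize without error unless you set IsRequired = true.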

Undo Redo in WPF/C# in an already functional application

I have done some research already as to how I can achieve the title of this question. The app I am working on has been under development for a couple of years or so (slow progress though, you all know how it is in the real world). It is now a requirement for me to put in Undo/Redo multiple level functionality. It's a bit late to say "you should have thought about this before you started" ... well, we did think about it - and we did nothing about it and now here it is. From searching around SO (and external links) I can see that the two most common methods appear to be ...
Command Pattern
Memento Pattern
The command pattern looks like it would be a hell of a lot of work, and I can only imagine it throwing up thousands of bugs in the process too, so I don't really fancy that one.
The Memento pattern is actually a lot like what I had in my head for this. I was thinking that if there were some way to quickly take a snapshot of the object model currently in memory, then I would be able to store it somewhere (maybe also in memory, maybe in a file). It seems like a great idea; the only problem I can see is how it will integrate with what we have already written.
You see, the app as we have it draws images in a big panel (potentially hundreds) and then allows the user to manipulate them either via the UI or via a custom-built properties grid. The entire app is linked up with a big observer pattern: the second anything changes, events are fired and everything that needs to update does. This is nice, but I can't help thinking that if a user is entering text into a text field on the properties grid, there will be a bit of delay before the UI catches up (since every time the user presses a key, a new snapshot will be added to the undo list). So my questions to you are:
Do you know of any good alternatives to the Memento pattern that might work?
Do you think the Memento pattern will fit in here, or will it slow the app down too much?
If the Memento pattern is the way to go, what is the most efficient way to make a snapshot of the object model (I was thinking of serialising it or something)?
Should the snapshots be stored in memory, or is it possible to put them into files?
If you have got this far, thank you kindly for reading. Any input you have will be valuable and very much appreciated.
Well, here are my thoughts on this problem.
1- You need multi-level undo/redo functionality, so you need to store the user actions performed, which can be kept in a stack.
2- Your second problem is identifying what has been changed by an operation; doing that through the Memento pattern is, I think, quite a challenge, since Memento is all about storing the initial object state in memory. Alternatively, you need to store what is changed by each operation so that you can use this information to undo the operations.
The Command pattern is designed for undo/redo functionality, and I would say that, late as it is, it is worthwhile to implement the design that has been used for years and works for most applications.
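For reference, the core of a command-based undo/redo is small; a minimal sketch (all names are illustrative):

using System.Collections.Generic;

// Each user action knows how to apply and reverse itself.
public interface IUndoableCommand
{
    void Execute();
    void Undo();
}

public class UndoManager
{
    private readonly Stack<IUndoableCommand> _undo = new Stack<IUndoableCommand>();
    private readonly Stack<IUndoableCommand> _redo = new Stack<IUndoableCommand>();

    public void Do(IUndoableCommand command)
    {
        command.Execute();
        _undo.Push(command);
        _redo.Clear(); // a new action invalidates the redo history
    }

    public void Undo()
    {
        if (_undo.Count == 0) return;
        IUndoableCommand command = _undo.Pop();
        command.Undo();
        _redo.Push(command);
    }

    public void Redo()
    {
        if (_redo.Count == 0) return;
        IUndoableCommand command = _redo.Pop();
        command.Execute();
        _undo.Push(command);
    }
}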
If performance allows it, you could serialize your domain before each action. A few hundred objects is not much if the objects aren't big themselves.
Since your object graph is probably non-trivial (i.e. uses inheritance, cycles, ...), the built-in XmlSerializer and JSON serializers are out of the question. Json.NET supports these, but performs lossy conversions on some types (local DateTimes, numbers, ...), so it's problematic too.
I think the protobuf serializers need either some form of schema (a .proto file) or decoration of all properties with attributes mapping their names to numbers, so that might not be optimal either.
BinaryFormatter can serialize most things; you just need to decorate all classes with the [Serializable] attribute. But I haven't used it much myself, so there might be pitfalls I'm not aware of, perhaps related to singletons or events.
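A sketch of that snapshot idea with BinaryFormatter, assuming every class in the graph is marked [Serializable] (events and delegates typically need [field: NonSerialized]):

using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

// Memento-by-serialization: snapshot the whole document graph before each
// action, restore it on undo.
public static class Snapshot
{
    public static byte[] Take(object document)
    {
        var formatter = new BinaryFormatter();
        using (var stream = new MemoryStream())
        {
            formatter.Serialize(stream, document);
            return stream.ToArray();
        }
    }

    public static T Restore<T>(byte[] snapshot)
    {
        var formatter = new BinaryFormatter();
        using (var stream = new MemoryStream(snapshot))
        {
            return (T)formatter.Deserialize(stream);
        }
    }
}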
The critical things for undo/redo are
knowing what state you need to save and restore
knowing when you need to save the state
Adding undo/redo after the fact is always a painful thing to do - (I know this comment is of no use to you now, but it's always best to design support into the application framework before you start, as it helps people use undo-friendly patterns throughout development).
Possibly the simplest approach will be a memento-based one:
Locate all the data that makes up your "document". Can you unify this data in some way so that it forms a coherent whole? Usually, if you can serialise your document structure to a file, the logic you need is in the serialisation system, so that gives you a way in. The downside to using this directly is that you will usually have to serialise everything, so your undo will be huge and slow. If possible, refactor the code so that:
(a) there is a common serialisation interface used throughout the application (so any and every part of your data can be saved/restored using a generic call);
(b) every sub-system is encapsulated so that modifications to the data have to go through a common interface (rather than lots of people modifying member variables directly, they should all call an API provided by the object to request that it makes changes to itself); and
(c) every sub-portion of the data keeps a "version number", incremented on every alteration made through the interface in (b).
This approach means you can now scan your entire document and use the version numbers to find just the parts of it that have changed since you last looked, and then serialise the minimal amount to save and restore the changed state.
Provide a mechanism whereby a single undo step can be recorded. This means allowing multiple systems to make changes to the data structure and then, when everything has been updated, triggering an undo recording. Working out when to do this may be tricky, but it can usually be accomplished by scanning your document for changes (see above) in your message loop, once your UI has finished processing each input event.
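A tiny sketch of points (b) and (c) above: mutations go through the object's own API and bump a version counter that the undo recorder can scan for (all names invented):

public class DocumentPart
{
    public int Version { get; private set; }

    private string _caption;
    public string Caption
    {
        get { return _caption; }
    }

    // All changes go through this method (point b)...
    public void SetCaption(string value)
    {
        if (value == _caption) return;
        _caption = value;
        Version++; // ...and bump the version so a scan finds changed parts (point c)
    }
}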
Beyond that, I'd advise going for a command based approach, because there are many benefits to it besides undo/redo.
You may find the Monitored Undo Framework to be useful. http://muf.codeplex.com/
It uses something similar to the memento pattern: it monitors for changes as they happen and allows you to put delegates on the undo stack that will reverse/redo the change.
I considered an approach that would serialize/deserialize the document but was concerned about the overhead. Instead, I monitor for changes in the model (or view model) on a property-by-property basis. Then, as needed, I use the MUF library to "batch" related changes so that they undo/redo as a unit of change.
The fact that you have your UI setup to react to changes in the underlying model is good. It sounds like you could inject the undo / redo logic there and the changes would bubble up to the UI.
I don't think that you'd see much lag or performance degradation. I have a similar application, with a diagram that we render based on the data in the model. We've had good results with this so far.
You can find more info and documentation on the codeplex site at http://muf.codeplex.com/. The library is also available via NuGet, with support for .NET 3.5, 4.0, SL4 and WP7.

Looking for the most painless non-RDBMS storage method in C#

I'm writing a simple program that will run entirely client-side. (Desktop programming? Do people still do that?) I need a simple way to store trivial amounts of data in a structured form, but I really don't see any need to use a database system. What's more, some of the data needs to be serialized and passed around to different users, like some kind of "file" or perhaps a "document". (Has anyone ever done that before?)
So, I've looked at using .Net DataSets, LINQ, direct XML manipulation, and they all seem like they would get the job done, but I would like to know before I dive into any of them if there's one method that is generally regarded as easier to code than others. As I said, the amount of data to be stored is trivial, even if one hundred people all used the same machine we're not talking about more than 10 MB, so performance is not as large a concern as is codeability/maintainability. Thank you all in advance!
Sounds like Linq-to-XML is a good option for this.
Link 1
Link 2
Tons of info out there on this.
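For example, a few lines cover create, save, load, and query (the element names here are made up):

using System.Linq;
using System.Xml.Linq;

class LinqToXmlDemo
{
    static void Main()
    {
        // Build and save a small document - this is the "file" users pass around.
        var doc = new XDocument(
            new XElement("items",
                new XElement("item",
                    new XAttribute("name", "Widget"),
                    new XAttribute("price", 9.99))));
        doc.Save("data.xml");

        // Load it back and query it with ordinary LINQ.
        var loaded = XDocument.Load("data.xml");
        var cheap = from i in loaded.Root.Elements("item")
                    where (double)i.Attribute("price") < 10
                    select (string)i.Attribute("name");
    }
}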
Without knowing anything else about your app, the .Net DataSets would likely be your easiest option because WriteXml and ReadXml already exist.
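A minimal sketch of that round trip (the table layout is invented):

using System.Data;

class DataSetDemo
{
    static void Main()
    {
        var table = new DataTable("Item");
        table.Columns.Add("Name", typeof(string));
        table.Columns.Add("Price", typeof(double));
        table.Rows.Add("Widget", 9.99);

        var ds = new DataSet("Catalog");
        ds.Tables.Add(table);
        ds.WriteXml("catalog.xml", XmlWriteMode.WriteSchema); // schema keeps the column types

        var reloaded = new DataSet();
        reloaded.ReadXml("catalog.xml");
    }
}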
Any serialization API should do fine here. I would recommend something that is contract-based (not BinaryFormatter, which is type-based), as that will keep it usable over time (as your assembly changes).
So I would build a basic object model (DTO) and use any of:
XmlSerializer
DataContractSerializer
protobuf-net (you all knew it was coming...)
OO, simple, and easy. And easy to use for passing fragments of the data around (either between users or to a central server).
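For instance, with XmlSerializer and a plain DTO (the Document/Fragment names are invented; the same shape works with the other two options):

using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

public class Fragment
{
    public string Author { get; set; }
    public string Text { get; set; }
}

public class Document
{
    public List<Fragment> Fragments { get; set; }
}

public static class DocumentFile
{
    public static void Save(Document doc, string path)
    {
        var serializer = new XmlSerializer(typeof(Document));
        using (var stream = File.Create(path))
            serializer.Serialize(stream, doc);
    }

    public static Document Load(string path)
    {
        var serializer = new XmlSerializer(typeof(Document));
        using (var stream = File.OpenRead(path))
            return (Document)serializer.Deserialize(stream);
    }
}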
I would choose an embedded database. Using something like SQLite doesn't seem like overkill to me. You could even try its C# port (http://code.google.com/p/csharp-sqlite/).
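A rough sketch of what that looks like through an ADO.NET SQLite provider (shown here with System.Data.SQLite naming; the csharp-sqlite port exposes a similar API):

using System.Data.SQLite;

class SqliteDemo
{
    static void Main()
    {
        // The database is a single local file - no server to install.
        using (var conn = new SQLiteConnection("Data Source=app.db"))
        {
            conn.Open();
            using (var cmd = conn.CreateCommand())
            {
                cmd.CommandText =
                    "CREATE TABLE IF NOT EXISTS docs (id INTEGER PRIMARY KEY, body TEXT)";
                cmd.ExecuteNonQuery();

                cmd.CommandText = "INSERT INTO docs (body) VALUES (@body)";
                cmd.Parameters.AddWithValue("@body", "hello");
                cmd.ExecuteNonQuery();
            }
        }
    }
}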

How should I test a method that populates a list from a DataReader?

So I'm working on some legacy code that's heavy on the manual database operations. I'm trying to maintain some semblance of quality here, so I'm going TDD as much as possible.
The code I'm working on needs to populate, let's say, a List<Foo> from a DataReader that returns all the fields required for a functioning Foo. However, if I want to verify that the code in fact returns one list item per database row, I end up writing test code that looks something like this:
Expect.Call(reader.Read()).Return(true);
Expect.Call(reader["foo_id"]).Return((long) 1);
// ....
Expect.Call(reader.Read()).Return(true);
Expect.Call(reader["foo_id"]).Return((long) 2);
// ....
Expect.Call(reader.Read()).Return(false);
Which is rather tedious and rather easily broken, too.
How should I be approaching this issue so that the result won't be a huge mess of brittle tests?
Btw I'm currently using Rhino.Mocks for this, but I can change it if the result is convincing enough. Just as long as the alternative isn't TypeMock, because their EULA was a bit too scary for my tastes last I checked.
Edit: I'm also currently limited to C# 2.
To make this less tedious, you will need to encapsulate/refactor the mapping between the DataReader and the object you hold in the list. There are quite a few steps to encapsulating that logic. If that is the road you want to take, I can post code for you; I am just not sure how practical it would be to post it here on StackOverflow, but I can give it a shot and keep it concise and to the point. Otherwise, you are stuck with the tedious task of repeating each expectation on the index accessor for the reader. The encapsulation process will also get rid of the magic strings and make them more reusable across your tests.
Also, I am not sure at this point how much you want to invest in making the existing code more testable, since this is legacy code that wasn't built with testing in mind.
I thought about posting some code and then I remembered about JP Boodhoo's Nothin But .NET course. He has a sample project that he is sharing that was created during one of his classes. The project is hosted on Google Code and it is a nice resource. I am sure it has some nice tips for you to use and give you ideas on how to refactor the mapping. The whole project was built with TDD.
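To give a flavor of the encapsulation, one option is to pull the per-row mapping into its own class, so the Read() loop is written once and the mapping can be tested against a simple fake IDataRecord instead of scripting the reader call by call (FooMapper is a hypothetical name; this sticks to C# 2 syntax):

using System.Collections.Generic;
using System.Data;

public class Foo
{
    public long Id;
}

public class FooMapper
{
    // Maps a single row; testable against any fake IDataRecord.
    public Foo Map(IDataRecord record)
    {
        Foo foo = new Foo();
        foo.Id = (long)record["foo_id"];
        // ... map the remaining columns ...
        return foo;
    }

    // The Read() loop lives in exactly one place.
    public List<Foo> MapAll(IDataReader reader)
    {
        List<Foo> result = new List<Foo>();
        while (reader.Read())
            result.Add(Map(reader));
        return result;
    }
}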
You can put the Foo instances in a list and compare the objects with what you read:
Foo[] arrFoos = new Foo[] { /* ... */ };               // what you expect
List<Foo> expectedFoos = new List<Foo>(arrFoos);       // make a list from the hardcoded array of expected Foos
List<Foo> readerResult = ReadEntireList(reader);       // read everything from the reader into a List<Foo>
CollectionAssert.AreEqual(expectedFoos, readerResult); // compare the two lists (NUnit-style assert)
Kokos,
A couple of things are wrong there. First, doing it that way means I have to construct the Foos first and then feed their values to the mock reader, which does nothing to reduce the amount of code I'm writing. Second, if the values pass through the reader, the Foos won't be the same Foos (reference equality). They might be equal, but even that assumes too much of the Foo class, which I don't dare touch at this point.
Just to clarify: do you want to test that your call into SQL Server returned some data, or that, given some data, you could map it back into the model?
If you want to test that your call into SQL returned some data, check out my answer found here
#Toran: What I'm testing is the programmatic mapping from data returned from the database to the quote-unquote domain model. Hence I want to mock out the database connection. For the other kind of test, I'd go for all-out integration testing.
#Dale: I guess you nailed it pretty well there, and I was afraid that might be the case. If you've got pointers to any articles or suchlike where someone has done the dirty job and decomposed it into more easily digestible steps, I'd appreciate it. Code samples wouldn't hurt either. I do have a clue on how to approach that problem, but before I actually dare do that, I'm going to need to get other things done, and if testing that will require tedious mocking, then that's what I'll do.
