Best way to handle large amounts of permanent data - C#

I'm developing a PC app in Visual Studio where I'm showing the status of hundreds of sensors that are connected via WiFi. The thing is that I need to hold on to the sensor data even after I close the app, so I'm considering some form of permanent storage. These are the options I've considered:
1) My Sensor object is relatively compact with only a few properties. I could serialize all the objects before closing the app and load them every time the app starts anew.
2) I could throw all the properties (which are mostly strings and doubles) into a simple text file and create a custom protocol for storage and retrieval.
3) I could integrate a database with my app. Someone told me this is the best way to go about it, but I'm a bit hesitant seeing as I'm not familiar with DBs.
Which method would yield the best results in terms of resource usage and speed? Or is there some other, better way to go about this?

The first thing you need is to understand your problem. For example, when the program is running, do you need to have everything in memory at the same time, or do you work with your sensors one at a time?
What is a "large amount of data"? For example, to me that will never be less than a million (or a billion in some cases).
Once you know that, you shouldn't be scared of using something just because you are not familiar with it. Otherwise you are not looking for the best solution to your problem; you are just hacking around it in a way you feel comfortable with.
That being said, you have several ways of doing this. Like you said, you can serialize the data (using JSON to store it, among a few other alternatives), but if we are talking about a "large amount of data that we want to persist", I would always call for the use of a database (the name says a lot). If you don't need to have everything in memory at the same time, then I believe this is your best option.
I personally don't like them (again, a personal choice), but one way of mostly avoiding SQL while still using your objects is an ORM like NHibernate (you will also need to learn how to use it, so you don't make things slower).
If you need to have everything loaded at the same time (most often that is not the case, so be sure of this), you need to know what you want to keep and serialize it. If you want that data to be readable by another tool or organized in a given way, consider a data format like XML or JSON.
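For illustration, a minimal sketch of the serialization option using System.Text.Json; the Sensor shape and file name here are my assumptions, not the poster's actual types:

    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Text.Json;

    // Hypothetical sensor type; the real one would carry whatever
    // properties the application actually tracks.
    public class Sensor
    {
        public string Name { get; set; }
        public double LastReading { get; set; }
        public DateTime LastSeen { get; set; }
    }

    public static class SensorStorage
    {
        private const string FilePath = "sensors.json"; // assumed location

        // Serialize the whole list on shutdown.
        public static void Save(List<Sensor> sensors) =>
            File.WriteAllText(FilePath, JsonSerializer.Serialize(sensors));

        // Load it back on startup; empty list on first run.
        public static List<Sensor> Load() =>
            File.Exists(FilePath)
                ? JsonSerializer.Deserialize<List<Sensor>>(File.ReadAllText(FilePath))
                : new List<Sensor>();
    }

For hundreds of small objects this is a couple of milliseconds of work at startup and shutdown, which is usually negligible.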

Also, you can use a memory-mapped file.
The file is permanent and keeps the data between program runs.
So you just keep your data structures in the mapped area, and that's it.
MSDN manual here:
https://msdn.microsoft.com/en-us/library/windows/desktop/aa366556%28v=vs.85%29.aspx
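A rough C# sketch of the same idea using System.IO.MemoryMappedFiles; the record layout, sensor count, and file name are assumptions for illustration:

    using System;
    using System.IO;
    using System.IO.MemoryMappedFiles;
    using System.Runtime.InteropServices;

    // Must be a blittable value type to live in the mapped region
    // (strings would need a fixed-size encoding; this sketch keeps it simple).
    struct SensorRecord
    {
        public int Id;
        public double Value;
    }

    class Demo
    {
        static void Main()
        {
            const int count = 1000;                      // hypothetical sensor count
            int recordSize = Marshal.SizeOf<SensorRecord>();

            // Backed by a real file, so the data survives program restarts.
            using var mmf = MemoryMappedFile.CreateFromFile(
                "sensors.dat", FileMode.OpenOrCreate, null, count * recordSize);
            using var accessor = mmf.CreateViewAccessor();

            // Write record 42 in place, then read it back.
            var rec = new SensorRecord { Id = 42, Value = 3.14 };
            accessor.Write(42 * recordSize, ref rec);

            accessor.Read(42 * recordSize, out SensorRecord readBack);
            Console.WriteLine($"{readBack.Id}: {readBack.Value}");
        }
    }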

Since you need to load all the data once at the start of the program, the database option seems doubtful. A DB is necessary when you need to load a little data many times.
So the first two options seem preferable. I would advise hiding the specific solution behind an interface; then you'll be able to change it later.
Standard .NET serialization of the sensor array is probably simpler, and it will be easier to extend.
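For example, a small interface along those lines, reusing the types from the JSON sketch above (the names are mine, purely for illustration):

    using System.Collections.Generic;

    // Hypothetical abstraction; the app only talks to this interface,
    // so the backing store can change without touching the rest of the code.
    public interface ISensorStore
    {
        List<Sensor> Load();
        void Save(List<Sensor> sensors);
    }

    // First implementation: serialize to a file (see the JSON sketch above).
    // Later a DatabaseSensorStore could be added without changing callers.
    public class FileSensorStore : ISensorStore
    {
        public List<Sensor> Load() => SensorStorage.Load();
        public void Save(List<Sensor> sensors) => SensorStorage.Save(sensors);
    }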

Related

How to properly save application data for later use

OK, so I am working on a C# Windows Forms application and it uses different types of structures that hold data and display it to the user. I want to use a SaveFileDialog to allow the user to save the information (i.e., configuration and state). The only way I can think to do this is to make a routine that goes through the structures and writes the corresponding elements to a text file. Upon loading, this routine would be used to load the data back.
This is of course a dumb way to do it, I'll admit. Anything I've done in school was only writing to text files. Are there other ways to make some formatted file to save to and load from?
I've been looking at serialization to save objects to files. I am not too sure how all this works though. Help!
To save your application settings, I think these links will help you:
http://msdn.microsoft.com/en-us/library/aa730869%28VS.80%29.aspx
http://www.thescarms.com/dotnet/AppSettings.aspx
and
How to use settings in Visual C#
My 'Old School' way of doing this has always been to save settings during the program execution to a database (providing that you take the time to ensure you're not hammering the database with updates / inserts).
If my application needs to be more efficient AND I need to be able to recall the saved settings easily, I serialize to XML using System.Xml.Serialization (from memory). XML serialization is human-readable, which is helpful (but not the most efficient in terms of processing time).
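As a rough sketch of that XML route (the Settings class here is hypothetical, not the poster's actual types):

    using System.IO;
    using System.Xml.Serialization;

    // Hypothetical settings type; XmlSerializer needs public properties
    // and a parameterless constructor.
    public class AppSettings
    {
        public string UserName { get; set; }
        public int WindowWidth { get; set; }
    }

    public static class SettingsStore
    {
        private static readonly XmlSerializer Serializer =
            new XmlSerializer(typeof(AppSettings));

        public static void Save(AppSettings settings, string path)
        {
            using (var stream = File.Create(path))
                Serializer.Serialize(stream, settings);
        }

        public static AppSettings Load(string path)
        {
            using (var stream = File.OpenRead(path))
                return (AppSettings)Serializer.Deserialize(stream);
        }
    }

The path passed in could come straight from the SaveFileDialog/OpenFileDialog mentioned in the question.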
If I need even more efficiency, I go the whole way and serialize to binary.
I'd suggest reading/understanding http://msdn.microsoft.com/en-us/library/Vstudio/ms233843.aspx in its entirety before coming back here. I'd say once you read this you'll be far better equipped to decide which way you want to take your application.
In my experience there aren't that many DUMB ways to solve problems; however, there is almost always a better way to solve them, given enough time and research.

Undo Redo in WPF/C# in an already functional application

I have done some research already as to how I can achieve the title of this question. The app I am working on has been under development for a couple of years or so (slow progress though, you all know how it is in the real world). It is now a requirement for me to put in Undo/Redo multiple level functionality. It's a bit late to say "you should have thought about this before you started" ... well, we did think about it - and we did nothing about it and now here it is. From searching around SO (and external links) I can see that the two most common methods appear to be ...
Command Pattern
Memento Pattern
The command pattern looks like it would be a hell of a lot of work, and I can only imagine it throwing up thousands of bugs in the process too, so I don't really fancy that one.
The Memento pattern is actually a lot like what I had in my head for this. I was thinking that if there was some way to quickly take a snapshot of the object model currently in memory, then I would be able to store it somewhere (maybe also in memory, maybe in a file). It seems like a great idea; the only problem I can see is how it will integrate with what we have already written. You see, the app as we have it draws images in a big panel (potentially hundreds) and then allows the user to manipulate them either via the UI or via a custom-built properties grid. The entire app is linked up with a big observer pattern. The second anything changes, events are fired and everything that needs to update does. This is nice, but I can't help thinking that if a user is entering text into a text field on the properties grid there will be a bit of delay before the UI catches up (since every time the user presses a key, a new snapshot will be added to the undo list). So my questions to you are:
Do you know of any good alternatives to the Memento pattern that might work?
Do you think the Memento pattern will fit in here, or will it slow the app down too much?
If the Memento pattern is the way to go, what is the most efficient way to make a snapshot of the object model (I was thinking of serialising it or something)?
Should the snapshots be stored in memory or is it possible to put them into files?
If you have got this far, thank you kindly for reading. Any input you have will be valuable and very much appreciated.
Well, here are my thoughts on this problem.
1- You need multi-level undo/redo functionality, so you need to store the user actions performed, which can be kept in a stack.
2- Your second problem is how to identify what has been changed by an operation; I think that with the Memento pattern this is quite a challenge. Memento is all about storing the initial object state in memory.
Alternatively, you need to store what is changed by an operation, so that you can use this information to undo the operations.
The Command pattern is designed for undo/redo functionality, and I would say that it's late, but it's worthwhile to implement a design which has been used for years and works for most applications.
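A minimal sketch of the command-pattern idea (all names here are invented for illustration):

    using System.Collections.Generic;

    // Every user action implements this; Undo() must exactly reverse Execute().
    public interface IUndoableCommand
    {
        void Execute();
        void Undo();
    }

    public class UndoManager
    {
        private readonly Stack<IUndoableCommand> _undo = new Stack<IUndoableCommand>();
        private readonly Stack<IUndoableCommand> _redo = new Stack<IUndoableCommand>();

        public void Do(IUndoableCommand command)
        {
            command.Execute();
            _undo.Push(command);
            _redo.Clear(); // a new action invalidates the redo history
        }

        public void Undo()
        {
            if (_undo.Count == 0) return;
            var command = _undo.Pop();
            command.Undo();
            _redo.Push(command);
        }

        public void Redo()
        {
            if (_redo.Count == 0) return;
            var command = _redo.Pop();
            command.Execute();
            _undo.Push(command);
        }
    }

The cost is that every mutation in the existing app has to be routed through a command object, which is exactly the retrofitting work the question is worried about.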
If performance allows it you could serialize your domain before each action. A few hundred objects is not much if the objects aren't big themselves.
Since your object graph is probably non-trivial (i.e. uses inheritance, cycles, ...), the integrated XmlSerializer and JsonSerializers are out of the question. Json.NET supports these, but does some lossy conversions on some types (local DateTimes, numbers, ...), so it's bad too.
I think the protobuf serializers need either some form of DTD (.proto file) or decoration of all properties with attributes mapping their names to numbers, so it might not be optimal.
BinaryFormatter can serialize most things; you just need to decorate all classes with the [Serializable] attribute. But I haven't used it myself, so there might be pitfalls I'm not aware of, perhaps related to singletons or events.
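If BinaryFormatter does work out for your graph, a snapshot-per-action memento could look roughly like this (a sketch; the type and member names are invented):

    using System.Collections.Generic;
    using System.IO;
    using System.Runtime.Serialization.Formatters.Binary;

    public class SnapshotUndoStack<TModel>
    {
        private readonly Stack<byte[]> _snapshots = new Stack<byte[]>();
        private readonly BinaryFormatter _formatter = new BinaryFormatter();

        // Call before each user action; deep-copies the whole object graph,
        // assuming everything reachable from the root is [Serializable].
        public void TakeSnapshot(TModel model)
        {
            using (var stream = new MemoryStream())
            {
                _formatter.Serialize(stream, model);
                _snapshots.Push(stream.ToArray());
            }
        }

        // Restores (and pops) the most recent snapshot.
        public TModel Undo()
        {
            using (var stream = new MemoryStream(_snapshots.Pop()))
                return (TModel)_formatter.Deserialize(stream);
        }
    }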
The critical things for undo/redo are
knowing what state you need to save and restore
knowing when you need to save the state
Adding undo/redo after the fact is always a painful thing to do. (I know this comment is of no use to you now, but it's always best to design support into the application framework before you start, as it helps people use undo-friendly patterns throughout development.)
Possibly the simplest approach will be a memento-based one:
Locate all the data that makes up your "document". Can you unify this data in some way so that it forms a coherent whole? Usually, if you can serialise your document structure to a file, the logic you need is in the serialisation system, so that gives you a way in. The downside to using this directly is that you will usually have to serialise everything, so your undo will be huge and slow. If possible, refactor the code so that (a) there is a common serialisation interface used throughout the application (so any and every part of your data can be saved/restored using a generic call), (b) every sub-system is encapsulated so that modifications to the data have to go through a common interface (rather than lots of people modifying member variables directly, they should all call an API provided by the object to request that it makes changes to itself), and (c) every sub-portion of the data keeps a "version number" which is incremented every time an alteration is made through the interface in (b). This approach means you can now scan your entire document and use the version numbers to find just the parts of it that have changed since you last looked, and then serialise the minimal amount to save and restore the changed state.
Provide a mechanism whereby a single undo step can be recorded. This means allowing multiple systems to make changes to the data structure, and then, when everything has been updated, triggering an undo recording. Working out when to do this may be tricky, but it can usually be accomplished by scanning your document for changes (see above) in your message loop, when your UI has finished processing each input event.
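A rough sketch of the version-number idea from (c), with invented names:

    // Each sub-portion of the document tracks its own version; every mutation
    // made through the object's own API bumps it.
    public interface IVersioned
    {
        int Version { get; }
    }

    public class DocumentNode : IVersioned
    {
        private string _label;

        public int Version { get; private set; }

        public string Label
        {
            get { return _label; }
        }

        // All changes go through methods like this, never raw field writes,
        // so the undo recorder can compare stored versions against current
        // ones to find exactly which nodes changed.
        public void SetLabel(string value)
        {
            if (_label == value) return;
            _label = value;
            Version++;
        }
    }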
Beyond that, I'd advise going for a command based approach, because there are many benefits to it besides undo/redo.
You may find the Monitored Undo Framework to be useful. http://muf.codeplex.com/
It uses something similar to the memento pattern, by monitoring for changes as they happen and allows you to put delegates on the undo stack that will reverse / redo the change.
I considered an approach that would serialize/deserialize the document but was concerned about the overhead. Instead, I monitor for changes in the model (or view model) on a property-by-property basis. Then, as needed, I use the MUF library to "batch" related changes so that they undo/redo as a unit of change.
The fact that you have your UI setup to react to changes in the underlying model is good. It sounds like you could inject the undo / redo logic there and the changes would bubble up to the UI.
I don't think that you'd see much lag or performance degradation. I have a similar application, with a diagram that we render based on the data in the model. We've had good results with this so far.
You can find more info and documentation on the codeplex site at http://muf.codeplex.com/. The library is also available via NuGet, with support for .NET 3.5, 4.0, SL4 and WP7.

Looking for the most painless non-RDBMS storage method in C#

I'm writing a simple program that will run entirely client-side. (Desktop programming? Do people still do that?) I need a simple way to store trivial amounts of data in a structured form, but I really don't see any need to use a database system. What's more, some of the data needs to be serialized and passed around to different users, like some kind of "file" or perhaps a "document". (Has anyone ever done that before?)
So, I've looked at using .Net DataSets, LINQ, direct XML manipulation, and they all seem like they would get the job done, but I would like to know before I dive into any of them if there's one method that is generally regarded as easier to code than others. As I said, the amount of data to be stored is trivial, even if one hundred people all used the same machine we're not talking about more than 10 MB, so performance is not as large a concern as is codeability/maintainability. Thank you all in advance!
Sounds like Linq-to-XML is a good option for this.
Tons of info out there on this.
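For instance, a small XDocument round trip (the document shape here is made up):

    using System;
    using System.Linq;
    using System.Xml.Linq;

    class LinqToXmlDemo
    {
        static void Main()
        {
            // Build a small structured document in code...
            var doc = new XDocument(
                new XElement("people",
                    new XElement("person",
                        new XAttribute("id", 1),
                        new XElement("name", "Gina"))));

            doc.Save("people.xml");               // persist to disk
            var loaded = XDocument.Load("people.xml");

            // ...and query it back with LINQ.
            var names = from p in loaded.Descendants("person")
                        select (string)p.Element("name");
            Console.WriteLine(string.Join(", ", names));
        }
    }

The saved file doubles as the shareable "document" the question asks about.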
Without knowing anything else about your app, the .Net DataSets would likely be your easiest option because WriteXml and ReadXml already exist.
Any serialization API should do fine here. I would recommend something that is contract-based (not BinaryFormatter, which is type-based), as that will keep it usable over time (as your assembly changes).
So I would build a basic object model (DTO) and use any of:
XmlSerializer
DataContractSerializer
protobuf-net (you all knew it was coming...)
OO, simple, and easy. And easy to use for passing fragments of the data (either between users or to a central server).
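As a sketch of the contract-based route using DataContractSerializer (the DTO shape here is invented):

    using System.IO;
    using System.Runtime.Serialization;

    // Contract-based: the serialized names come from the attributes,
    // so refactoring the class later won't break old files.
    [DataContract]
    public class PersonDto
    {
        [DataMember(Order = 1)] public int Id { get; set; }
        [DataMember(Order = 2)] public string Name { get; set; }
    }

    public static class DtoStore
    {
        private static readonly DataContractSerializer Serializer =
            new DataContractSerializer(typeof(PersonDto));

        public static void Save(PersonDto dto, string path)
        {
            using (var stream = File.Create(path))
                Serializer.WriteObject(stream, dto);
        }

        public static PersonDto Load(string path)
        {
            using (var stream = File.OpenRead(path))
                return (PersonDto)Serializer.ReadObject(stream);
        }
    }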
I would choose an embedded database. Using something like SQLite doesn't seem like overkill to me. You may even try its C# port (http://code.google.com/p/csharp-sqlite/).
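A sketch of the embedded-database route, assuming the Microsoft.Data.Sqlite package rather than the port linked above (whose API may differ):

    using Microsoft.Data.Sqlite;

    class SqliteDemo
    {
        static void Main()
        {
            // The database is just a local file; there is no server to install.
            using var connection = new SqliteConnection("Data Source=app.db");
            connection.Open();

            var create = connection.CreateCommand();
            create.CommandText =
                "CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, text TEXT)";
            create.ExecuteNonQuery();

            var insert = connection.CreateCommand();
            insert.CommandText = "INSERT INTO notes (text) VALUES ($text)";
            insert.Parameters.AddWithValue("$text", "hello");
            insert.ExecuteNonQuery();
        }
    }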

Should I store localization content in the application state

I am developing my first multilingual C# site and everything is going OK, except for one crucial aspect. I'm not 100% sure what the best option is for storing strings (typically single words) that will be translated by code in my code-behind pages.
On the front end of the site I am going to use ASP.NET resource files for the wording on the pages. That part is fine. However, this site will make XML calls, and the XML responses are only ever in English. I have been given an Excel sheet with all the words that will be returned by the XML, broken into the different languages, but I'm not sure how best to store/access this information. There are roughly 80 words x 7 languages.
I am thinking about creating a dictionary object for each language, created by my global.asax file at application start and kept in memory. The plus side of doing this is that each dictionary object will only have to be created once (until IIS restarts) and can be accessed by any user without needing to be rebuilt, but the downside is that I have 7 dictionary objects constantly stored in memory. The server is a Win 2008 64-bit with 4 GB of RAM, so should I even be concerned about the memory taken up by this method?
What do you guys think would be the best way to store/retrieve different language words that would be used by all users?
Thanks for your input.
Rich
From what you say, you are looking at 560 words which need to differ based on locale. This is a drop in the ocean. The resource file method you have contemplated is fit for purpose, and I would recommend using it. Resource files integrate with controls, so you will be making the most of them.
If it did trouble you, you could put them in a sliding cache (with a sliding expiration of 20 minutes, for example), but I do not see anything wrong with your chosen solution.
OMO
Cheers,
Andrew
P.S. Have a read through this to see how you can find and bind values in different resource files to controls and literals and use them programmatically:
http://msdn.microsoft.com/en-us/magazine/cc163566.aspx
As long as you are aware of the impact of doing so, then yes, storing this data in memory would be fine (as long as you have enough memory to do so). Once you know what is appropriate for the current user, tossing it into memory is fine. You might look at something like MemCached Win32 or Velocity, though, to offload the storage to another app server. Use this even in your local application for the time being; that way, when it is time to push this to another server or grow your app, you have a clear separation of concerns defined at your caching layer. Keep in mind that the more languages you support, the more stuff you are storing in memory, so keep an eye on the amount of data held on your lone app server, as this could become overwhelming in time. Also, make sure that the keys you are using are specific to the language. Otherwise you might find that you are serving a menu in German to an English user.
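A sketch of the dictionary-per-language idea with language-specific lookups (all names and sample data invented):

    using System.Collections.Generic;

    public static class Translations
    {
        // Outer key: language code; inner key: the English word from the XML.
        // ~80 words x 7 languages is a few kilobytes, so holding it in
        // memory for the lifetime of the app is not a concern.
        private static readonly Dictionary<string, Dictionary<string, string>> ByLanguage =
            new Dictionary<string, Dictionary<string, string>>
            {
                ["de"] = new Dictionary<string, string> { ["Menu"] = "Menü" },
                ["fr"] = new Dictionary<string, string> { ["Menu"] = "Menu" },
                // ...populated from the Excel/XML data at application start
            };

        public static string Translate(string language, string english)
        {
            // The language is part of the lookup, so a German menu can never
            // be served to an English user by accident.
            return ByLanguage.TryGetValue(language, out var words)
                && words.TryGetValue(english, out var translated)
                    ? translated
                    : english; // fall back to the English word
        }
    }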

C# Is holding data in a delimited string bad practice

Is it bad practice to have a string like
"name=Gina;postion= HouseMatriarch;id=1234" to hold state data in an application.I know that I could just as well have a struct , class or hashtable to hold this info.
Is it acceptable practice to hold delimited key/value pairs in a database field– just for use in where the data type is not known at design time.
Thanks...I am just trying to insure that I maintain good design practices
Yes, holding your data in a string like "name=Gina;position=HouseMatriarch;id=1234" is very bad practice. This kind of data should be stored in structs or objects, because it is hard to access, validate, and process the data in a string. It will take you much more time to write the code to parse the string to get at your data than just using the language constructs of C#.
I would also advise against storing key/value pairs in database fields if the other option is just adding columns for those fields. If you don't know the type of your data at design time, you are probably not doing the design right. How will you be able to build an application when you don't know what data types your fields will have to hold? Or perhaps you should elaborate on the context of the application to make the intent clearer. It is not all black and white :-)
Well, if your application only passes this data between other systems, I don't see any problems in treating it as a string. However, if you need to use the data, I would definitely introduce the necessary types to handle that.
I think you will find your application easier to maintain if you make a struct or class to hold the data and then add a custom property to return (and set) the string you have been using. This property would take the fields and format them in the string that you are already using, and do the reverse (take the string and fill the fields). This way you maintain maximum compatibility with your old algorithms.
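A sketch of that compatibility shim, using the field names from the example string (note it still inherits the escaping problem raised in the next answer):

    public class PersonState
    {
        public string Name { get; set; }
        public string Position { get; set; }
        public int Id { get; set; }

        // Keeps the legacy "key=value;..." format available for old code paths.
        public string LegacyString
        {
            get { return $"name={Name};position={Position};id={Id}"; }
            set
            {
                foreach (var pair in value.Split(';'))
                {
                    var parts = pair.Split(new[] { '=' }, 2);
                    if (parts.Length != 2) continue;
                    switch (parts[0])
                    {
                        case "name": Name = parts[1]; break;
                        case "position": Position = parts[1]; break;
                        case "id": Id = int.Parse(parts[1]); break;
                    }
                }
            }
        }
    }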
Well, one immediate problem with that approach is embedded escape characters. Given your example, what would happen if the user entered their name as follows:
Pet;er
or
Pe=;ter
or
pe;Name=Yeoi;
I am not sure what state data it is you are trying to hold, and without any context it's hard to make valid suggestions. Perhaps a first step would be to replace this with key/value pairs; at least that negates the problem mentioned above and means you don't have to parse strings regularly.
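For illustration, here is how a naive split falls apart on the first example above:

    // "Pet;er" contains the record delimiter, so the naive parse
    // splits the name in half and produces a bogus "er" token.
    var raw = "name=Pet;er;position=HouseMatriarch;id=1234";
    string[] pairs = raw.Split(';');
    // pairs: ["name=Pet", "er", "position=HouseMatriarch", "id=1234"]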
I try not to keep data in any string-based formats. But I have encountered several situations in which it was not possible to know in advance what the structure of the data would be (e.g. it was possible for the customer/end-user to dynamically add fields).
In contrast to your approach, we decided to store the data in XML; in your case this would be something like this:
<user id="1234">
  <name>Gina</name>
  <position>HouseMatriarch</position>
</user>
This gives you the following advantages:
The classes to work with the data (read/write) are already available in the framework (e.g. XmlDocument or XML serialization)
you can easily exchange the data with other systems (if/when required)
You can store the data in a file
you can store the data in a database column (xml data type). You can even query that column when using SQL Server (although I'd try to avoid storing data in XML, that has to be queried)
using XML allows you to add additional fields to your data at any time
Update: I'm not sure why my answer was downvoted so much - maybe it is because of the bad example. Therefore I'd like to make it clear: I would not use XML for properties such as the ID/primary key of a user, or for standard properties like "name", "email", etc. But for "extended/dynamic" properties (as described above) I still think this is an easy and elegant solution.
If you want to store structured data in a string I think you should use a standard notation such as JSON.
It's bad practice because of the amount of effort you have to go to in order to construct the strings and parse them later. There are other, more robust ways of serialising data for passing between systems.
For core business data, suitably designed classes will be far simpler to maintain, and with all the properties strongly typed, you'll know early on when you mis-type a property name.
As for key-value pairs, I'd say they're sometimes Ok, sometimes not. If there are a lot of possible values, but not a lot of actually owned values, then it can be perfectly all right to use KVPs. Sebastian Dietz's alternative of having a separate column for each field would result in a lot of empty fields in that case. It would also mean extra work altering the table every time you needed a new one.
None of the answers has mentioned normalization yet, so I thought I would. When database fields are involved, one of the key principles of normalization is that each field in a table only represents one thing. Delimited fields violate that principle.
One of the guys at Red Gate Software posted this article along those lines that you may find useful.
Well, it just means that it is less searchable and indexable than a hashtable would be. It would also require a bit of processing to get into a state where it could be easily used by other bits of code. For example, a bit of code that queries the id in that data would be something horrible such as:
if(dataStringThing.Substring(26, 4) == SomeIdInStringFormat)
So yes, in most cases it is bad practice. However, in other cases this might be a default format that you need to retain the data in, or performance might mean that you should only parse it as and when required. So it may not always be a bad thing.
If you have reasons to keep it in that format, I would suggest that it might be best to transform it into a class that separates the fields, but also to create a ToString() implementation on that class that restores the original format if you need it. If the performance of this is a concern, modify the object to parse the source into the fields only the first time those fields are accessed.
To reiterate: nothing in isolation is necessarily bad practice. Bad practices are context-dependent.
It (hopefully) obviously shouldn't be a normal choice. But there are cases where it's useful or necessary.
I can't think of any cases that wouldn't involve it being part of a communications protocol with some external service (e.g. a database connection string), so you're probably stuck with the format.
If you have a choice in the format (perhaps you are writing both sides of a system which can only communicate using strings), then at least choose something structured and well known. Examples of such have been given elsewhere, but the prime ones are naturally going to be XML or JSON. CSV or some other delimited format may be useful in very simple cases (such as the database connection string) - but pay special attention to escaping delimiter characters, as the "Bobby Tables" joke already referenced in another comment nicely illustrates (google for him if you are not familiar with that one).
Your mention of a database suggests that this may be where the focus is. Are you trying to serialise application objects? (there are other ways of doing that). As another poster said, this may be a sign of a design that needs rethinking. But if you do need to store unknown datatypes in a DB, then XML may be an appropriate choice - especially if your DB supports XML fields. It's a bit of a minefield, though, so make sure you are familiar with how they work first.
I think it is not that bad when you are using a StringList for managing your string.
Especially when the structure of, e.g., a configuration string (or configuration database field) must be flexible.
But normally you should not do this, because of these disadvantages.
It all depends on what you're trying to accomplish.
If you need a hierarchical format of data or lots of fields that preserve data types, then no... a parsed string is a bad idea.
However, if you just need to transmit a string across a service and byte-conservation is important, then a Tag-Data pair may be exactly what you need.
If you do use a parsed string, it's important to be able to get at the data inside and quickly manage it. If you want an example TDP class, I posted one today to my website:
http://www.jerryandcheryl.net/jspot/2009/01/tag-data-pairs.html
I hope that helps.
I suggest considering these usage factors.
If you are processing the data within your own code, then you can use whatever data structures you wish. However, you may have issues developing your own implementation of a complex data structure, so consider using a pre-built one instead. Many come with whatever programming platform you may be using, while many more are documented in various books, articles, and discussions both printed and online. If you properly isolate your work from others, then you can safely do whatever you want.
On the other hand, if you need to share that data with others, then most careful consideration should be given. If you must share the data with an API, or via a storage mechanism (database, file, etc.), or via some transport (sockets, HTTP, etc.), then you should be thinking of others first and foremost. If you wish success and respect from your efforts, then you need to pay attention to standards and conventions and cost. Thankfully, practically any such use that you can imagine has been done before, so you can leverage others' efforts.
In a database, consider how others (and yourself) will be inserting, updating, deleting, and selecting the data. For example, using XML in a database makes all these steps unnecessarily hard and expensive compared to the alternatives. Pay attention to database normalization--learn it if you are not familiar already.
If you are dealing with text, pay attention to character encodings and make them explicit.
If there is an existing standard or convention for what you are doing, honor it. If there is a compelling reason to deviate, then accept the burden of justifying it, explaining it, and making it easy for others to accommodate your choices.
If you control both sides of a communication/transport medium, feel free to optimize. If you don't, err on the side of interoperability. Remember that a primary difference between the two scenarios is the level of self-description embedded with the data: interoperability has lots, optimization drops it based on shared assumptions. Text-rich data is more understandable, but binary is faster.
Think about your audience.
