Sometimes, when I save to XML, I end up with a completely empty XML file.
I can't reproduce the issue on demand yet. It is just occasional. Are there steps that one can take to assist the user in this regard?
At the moment I do this:
public bool SavePublisherData()
{
    bool bSaved = false;
    try
    {
        XmlSerializer x = new XmlSerializer(_PublisherData.GetType());

        using (StreamWriter writer = new StreamWriter(_strPathXML))
        {
            _PublisherData.BuildPublisherListFromDictionary();
            x.Serialize(writer, _PublisherData);
            bSaved = true;
        }
    }
    catch
    {
    }

    return bSaved;
}
The reason I have not put anything in the catch block is because this code is part of a C# DLL and I am calling it from an MFC project. I have read that you can't (or shouldn't) pass exceptions through from one environment to another. Thus, when an exception happens in my DLL I don't really know how I can sensibly feed that information to the user so they can see it. That is a side issue.
But this is how I save it. So, what steps can one take to try and prevent complete data loss?
Thank you.
Update
I have looked at the KB article that the link in the comments refers to and it states:
Use the following XmlSerializer class constructors. These class constructors cache the assemblies.
This is also restated in the article itself, linked in the comments:
What is the solution?
The default constructors XmlSerializer(type) and XmlSerializer(type, defaultNameSpace) caches the dynamic assembly so if you use those constructors only one copy of the dynamic assembly needs to be created.
Seems pretty smart… why not do this in all constructors? Hmm… interesting idea, wonder why they didn’t think of that one:) Ok, the other constructors are used for special cases, and the assumption would be that you wouldn’t create a ton of the same XmlSerializers using those special cases, which would mean that we would cache a lot of items we later didn’t need and use up a lot of extra space. Sometimes you have to do what is good for the majority of the people.
So what do you do if you need to use one of the other constructors? My suggestion would be to cache the XmlSerializer if you need to use it often. Then it would only be created once.
My code uses one of these default constructors as you can see:
XmlSerializer(_PublisherData.GetType());
So I don't think I need to worry about this XmlSerializerFactory in this instance.
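As an aside (this is my illustration, not something from the original post): one common way to guard against ending up with a completely empty file when serialization fails mid-write is to serialize to a temporary file first and only swap it in once the write has finished. A minimal sketch, reusing the _PublisherData and _strPathXML fields from the code above:
public bool SavePublisherData()
{
    try
    {
        string tempPath = _strPathXML + ".tmp";
        XmlSerializer x = new XmlSerializer(_PublisherData.GetType());

        // Serialize to a temporary file so a failure cannot truncate the real one.
        using (StreamWriter writer = new StreamWriter(tempPath))
        {
            _PublisherData.BuildPublisherListFromDictionary();
            x.Serialize(writer, _PublisherData);
        }

        // Only replace the existing file once the new one is fully written.
        if (File.Exists(_strPathXML))
            File.Replace(tempPath, _strPathXML, _strPathXML + ".bak");
        else
            File.Move(tempPath, _strPathXML);

        return true;
    }
    catch
    {
        return false; // reporting the error across the DLL boundary is the separate issue noted above
    }
}
With this pattern the worst case after a crash is an intact old file plus a half-written .tmp file, rather than an empty XML file.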
I'm using protobuf to serialize large objects to binary files to be deserialized and used again at a later date. However, I'm having issues when I'm deserializing some of the larger files. The files are roughly ~2.3 GB in size and when I try to deserialize them I get several exceptions thrown (in the following order):
Sub-message not read correctly
Invalid wire-type; this usually means you have over-written a file without truncating or setting the length; see Using Protobuf-net, I suddenly got an exception about an unknown wire-type
Unexpected end-group in source data; this usually means the source data is corrupt
I've looked at the question referenced in the second exception, but that doesn't seem to cover the problem I'm having.
I'm using Microsoft's HPC pack to generate these files (they take a while) so the serialization looks like this:
using (var consoleStream = Console.OpenStandardOutput())
{
    Serializer.Serialize(consoleStream, dto);
}
And I'm reading the files in as follows:
private static T Deserialize<T>(string file)
{
    using (var fs = File.OpenRead(file))
    {
        return Serializer.Deserialize<T>(fs);
    }
}
The files hold two different types: one is about 1 GB in size, the other about 2.3 GB. The smaller files all work; the larger files do not. Any ideas what could be going wrong here? I realise I've not given a lot of detail; I can give more as requested.
Here I need to refer to a recent discussion on the protobuf list:
Protobuf uses int to represent sizes so the largest size it can possibly support is <2G. We don't have any plan to change int to size_t in the code. Users should avoid using overly large messages.
I'm guessing that the cause of the failure inside protobuf-net is basically the same. I can probably change protobuf-net to support larger files, but I have to advise that this is not recommended, because it looks like no other implementation is going to work well with such huge data.
The fix is probably just a case of changing a lot of int to long in the reader/writer layer. But: what is the layout of your data? If there is an outer object that is basically a list of the actual objects, there is probably a sneaky way of doing this using an incremental reader (basically, spoofing the repeated support directly).
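To sketch that incremental idea (my illustration, not code from the answer): if the outer object is essentially a list, each element can be written as its own length-prefixed message and streamed back one at a time, so no single message ever approaches the 2 GB limit. Assuming a hypothetical element type MyItem:
// Write each element as a separate length-prefixed message (field 1),
// instead of one enormous outer message.
static void SerializeItems(Stream destination, IEnumerable<MyItem> items)
{
    foreach (var item in items)
        Serializer.SerializeWithLengthPrefix(destination, item, PrefixStyle.Base128, 1);
}

// Stream the items back lazily; protobuf-net reads one message at a time.
static IEnumerable<MyItem> DeserializeItems(Stream source)
{
    return Serializer.DeserializeItems<MyItem>(source, PrefixStyle.Base128, 1);
}
Written with PrefixStyle.Base128 and field number 1, the stream is also wire-compatible with a repeated field on a wrapper message, which is essentially the "spoofing the repeated support" trick mentioned above.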
I'm making a roguelike game in XNA with procedurally generated levels.
It takes about a second to generate a whole new level, but about 4 seconds to serialize it and about 8 seconds to deserialize it with my current methods. Also, the files are massive (about 10 MB, depending on how big the level is).
I serialize like this:
private void SerializeLevel()
{
    string name = Globals.CurrentLevel.LvLSaveString;

    using (Stream stream = new FileStream("SAVES\\" + name + ".lvl", FileMode.Create, FileAccess.Write, FileShare.None))
    {
        formatter.Serialize(stream, Globals.CurrentLevel);
        stream.Close();
    }
}
My game engine architecture is basically a load of nested Lists which might go..
Level\Room\Interior\Interiorthing\sprite
This hierarchy is important to maintain for the game/performance. For instance usually only things in the current room are considered for updates and draws.
I want to try something like the raw binary serialization shown in this post to improve serialization/deserialization performance.
I can just save the ints, floats and bools which correspond to all the positions/configurations of things and reinstantiate everything when I load a level (which only takes a second).
My question is how do I use this Raw Binary serializer while also maintaining which object is which, what type it is and which nested list it is in.
In the cited example, the OP is just serializing a huge list of ints, with every third one taken as the start of a new coordinate.
I could have a new stream for each different type of thing in each room, but that would result in loads of different files (I think). Is there a way to segregate the raw binary stream with some kind of hierarchy, i.e. split it up into different sections pertaining to different rooms and different lists of things?
UPDATE
OK, one thing that was throwing me off was that, in the question I reference, the OP refers to "manual serialization" as "raw binary serialization", which I couldn't find any info on.
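For reference, "manual serialization" here just means writing the primitive fields out yourself with something like BinaryWriter. The hierarchy can be preserved by writing a count before each nested list and reading everything back in the same order. A rough sketch with hypothetical Level/Room/Thing/ThingFactory placeholders (not the actual game classes):
static void SaveLevel(BinaryWriter w, Level level)
{
    w.Write(level.Rooms.Count);                 // how many rooms follow
    foreach (var room in level.Rooms)
    {
        w.Write(room.Things.Count);             // how many things in this room
        foreach (var thing in room.Things)
        {
            w.Write(thing.TypeId);              // enough to reinstantiate the right class
            w.Write(thing.X);
            w.Write(thing.Y);
        }
    }
}

static Level LoadLevel(BinaryReader r)
{
    var level = new Level();
    int roomCount = r.ReadInt32();
    for (int i = 0; i < roomCount; i++)
    {
        var room = new Room();
        int thingCount = r.ReadInt32();
        for (int j = 0; j < thingCount; j++)
        {
            int typeId = r.ReadInt32();
            float x = r.ReadSingle();
            float y = r.ReadSingle();
            room.Things.Add(ThingFactory.Create(typeId, x, y)); // hypothetical factory
        }
        level.Rooms.Add(room);
    }
    return level;
}
The counts are what "segregate" the stream: each room's section implicitly ends when its count of things has been read.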
If you want to serialize each member of Globals independently, and manually update the member values on deserialization, you need to know which member you are currently processing. I can suggest the following:
Process items in the same order. The code in your example puts binary data in the stream that is nearly impossible to extract unless you deserialize members in the order they were serialized. This becomes maintenance hell as new items are added and is not a good solution for code clarity, maintainability, or backwards compatibility.
Use a dictionary. As per the comments, Globals appears to be a static class, so it itself is not serializable. When serializing, put all members of the Globals class in a dictionary and serialize that. On deserialization you will know that you have a dictionary (not a random mess of objects), and you can then restore the Globals members from it.
Use a custom class. Create a class with all settings (a better approach). Use a single static instance of the class to access the settings. You can serialize and deserialize that class.
Settings. The second approach gets closer to a concept already built into .NET: Settings. Take a look at it; it seems the Globals class is in fact a custom variant of a settings configuration.
So I finally got my listview content to serialize and write to a file so I can restore my apps state across different sessions. Now I'm wondering if there is a way I can incrementally serialize and save my data. Currently, I call this method in the SaveState method of my mainpage:
private async void writeToFile()
{
    var f = await Windows.Storage.ApplicationData.Current.LocalFolder.CreateFileAsync(
        "data.txt", CreationCollisionOption.ReplaceExisting);

    using (var st = await f.OpenStreamForWriteAsync())
    {
        var s = new DataContractSerializer(typeof(ObservableCollection<Item>),
            new Type[] { typeof(Item) });
        s.WriteObject(st, convoStrings);
    }
}
What I think would be more ideal is to write the data out to storage as it is generated, so I don't have to serialize my entire list in the small suspend time frame. But I don't know if it is possible to incrementally serialize my collection, and if it is, how I would do it.
Note that my data doesn't change after it is generated, so I don't have to worry about anything other than appending new data to the end of my currently serialized list.
It depends on your definition of when to save the data to the hard drive. Maybe you want to save the new collection state when a new collection item is added or removed? Or when the content of an item changes?
The main problem with saving everything to the hard drive just in time is that it may be dog slow. If you're using an async programming model it isn't directly a problem, since your app won't hang while the I/O runs.
I think it may be a better idea to save the collection, say, every minute AND when the user closes the application. This will only work if you're dealing with a limited amount of data, since you only have about 3 seconds to perform all the I/O work.
As you can see, there is no perfect solution. It really depends on your requirements and the size of the data. Without further information, that's all I can tell you for sure.
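To sketch the append-as-you-go idea from the question (my illustration, not part of the original answer): each new Item can be written as one line of JSON when it is generated, and the collection rebuilt by reading the lines back. The method names here are hypothetical, and it swaps DataContractSerializer for DataContractJsonSerializer so each item fits neatly on a single line:
private static readonly DataContractJsonSerializer itemSerializer =
    new DataContractJsonSerializer(typeof(Item));

// Append one item to the log file as a single line of JSON.
private async Task AppendItemAsync(Item item)
{
    var file = await ApplicationData.Current.LocalFolder
        .CreateFileAsync("data.txt", CreationCollisionOption.OpenIfExists);

    string json;
    using (var ms = new MemoryStream())
    {
        itemSerializer.WriteObject(ms, item);
        json = Encoding.UTF8.GetString(ms.ToArray(), 0, (int)ms.Length);
    }

    await FileIO.AppendLinesAsync(file, new[] { json });
}

// Rebuild the collection by reading one JSON line per item.
private async Task<ObservableCollection<Item>> LoadItemsAsync()
{
    var result = new ObservableCollection<Item>();
    var file = await ApplicationData.Current.LocalFolder
        .CreateFileAsync("data.txt", CreationCollisionOption.OpenIfExists);

    foreach (var line in await FileIO.ReadLinesAsync(file))
    {
        using (var ms = new MemoryStream(Encoding.UTF8.GetBytes(line)))
            result.Add((Item)itemSerializer.ReadObject(ms));
    }
    return result;
}
This only works cleanly because, as the question states, items never change after they are generated, so appending is enough and nothing ever needs to be rewritten.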
For an open source project I am looking for a good, simple implementation of a Dictionary that is backed by a file. Meaning, if an application crashes or restarts the dictionary will keep its state. I would like it to update the underlying file every time the dictionary is touched. (Add a value or remove a value). A FileWatcher is not required but it could be useful.
class PersistentDictionary<T, V> : IDictionary<T, V>
{
    public PersistentDictionary(string filename)
    {
    }
}
Requirements:
Open Source, with no dependency on native code (no sqlite)
Ideally a very short and simple implementation
When setting or clearing a value it should not re-write the entire underlying file, instead it should seek to the position in the file and update the value.
Similar Questions
Persistent Binary Tree / Hash table in .Net
Disk backed dictionary/cache for c#
PersistentDictionary<Key,Value>
bplustreedotnet
The bplusdotnet package is a library of cross compatible data structure implementations in C#, java, and Python which are useful for applications which need to store and retrieve persistent information. The bplusdotnet data structures make it easy to store string keys associated with values permanently.
ESENT Managed Interface
Not 100% managed code, but worth mentioning since the unmanaged library itself is already part of every Windows XP/2003/Vista/7 box (a usage sketch follows this list).
ESENT is an embeddable database storage engine (ISAM) which is part of Windows. It provides reliable, transacted, concurrent, high-performance data storage with row-level locking, write-ahead logging and snapshot isolation. This is a managed wrapper for the ESENT Win32 API.
Akavache
Akavache is an asynchronous, persistent key-value cache created for writing native desktop and mobile applications in C#. Think of it like memcached for desktop apps.
The C5 Generic Collection Library
C5 provides functionality and data structures not provided by the standard .Net System.Collections.Generic namespace, such as persistent tree data structures, heap based priority queues, hash indexed array lists and linked lists, and events on collection changes.
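For the ESENT option mentioned above: as far as I recall, the ManagedEsent project ships a ready-made PersistentDictionary<TKey, TValue> (in the Microsoft.Isam.Esent.Collections.Generic namespace) that matches the requested shape quite closely. Treat the exact API details below as an assumption to verify against the project's documentation:
using Microsoft.Isam.Esent.Collections.Generic; // from the ManagedEsent project

// The constructor takes a directory; the dictionary persists itself there.
var dict = new PersistentDictionary<string, string>("DataDirectory");

dict["answer"] = "42";       // written through to disk
dict.Remove("obsoleteKey");  // removals are persisted too

// Values survive process restarts: this prints True on the next run as well.
Console.WriteLine(dict.ContainsKey("answer"));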
Let me analyze this:
Retrieve information by key
Persistent storage
Do not want to write back the whole file when 1 value changes
Should survive crashes
I think you want a database.
Edit: I think you are searching for the wrong thing. Search for a database that fits your requirements. And change some of your requirements, because I think it will be difficult to meet them all.
One way is to use the Extensible Storage Engine built into Windows to store your stuff. It's a native Windows database engine that supports indexing, transactions, etc.
I was working on porting EHCache to .NET. Take a look at the project
http://sourceforge.net/projects/thecache/
Persistent caching is core functionality that is already implemented. All main Unit Tests are passing. I got a bit stuck on distributed caching, but you do not need that part.
Sounds cool, but how will you get around changes to the stored value itself (if it is a reference type)? If it's immutable then all is well, but if not you're kinda stuffed :-)
If you're not dealing with immutable values, I would suspect a better approach would be to handle persistence at the value level and to just rebuild the dictionary as necessary.
(edited to add a clarification)
I think your issue is likely to be that last point:
When setting or clearing a value it should not re-write the entire underlying file, instead it should seek to the position in the file and update the value.
This is exactly what a DB does - you're basically describing a simple file based table structure.
We can illustrate the problem by looking at strings.
Strings in memory are flexible things - you don't need to know the length of a string in C# when you declare its type.
In data storage strings and everything else are fixed sizes. Your saved dictionary on disk is just a collection of bytes, in order.
If you replace a value in the middle it either has to be exactly the same size or you will have to rewrite every byte that comes after it.
This is why most databases restrict text and blob fields to fixed sizes. Newer features like varchar(max)/varbinary(max) in SQL Server 2005+ are actually clever simplifications: the row only stores a pointer to the real data.
You can't use the fixed sizes with your example because it's generic - you don't know what type you're going to be storing so you can't pad the values out to a maximum size.
You could do:
class PersistentDictionary<T, V> : Dictionary<T, V>
    where V : struct
...as value types don't vary in storage size, although you would have to be careful with your implementation to save the right amount of storage for each type.
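To illustrate that point (my sketch, not from the answer): with a fixed record size you can compute where a value lives and overwrite just those bytes in place, which is exactly the "seek and update" requirement. Here the size comes from Marshal.SizeOf, assuming V is a simple blittable struct:
// Overwrite record number 'index' in place; this only works because every
// record occupies exactly the same number of bytes.
static void WriteRecord<V>(FileStream fs, long index, V value) where V : struct
{
    int size = Marshal.SizeOf(typeof(V));        // fixed record size for this V
    byte[] buffer = new byte[size];

    IntPtr ptr = Marshal.AllocHGlobal(size);
    try
    {
        Marshal.StructureToPtr(value, ptr, false);
        Marshal.Copy(ptr, buffer, 0, size);      // struct -> raw bytes
    }
    finally
    {
        Marshal.FreeHGlobal(ptr);
    }

    fs.Seek(index * size, SeekOrigin.Begin);     // jump straight to the record
    fs.Write(buffer, 0, size);                   // rewrite only those bytes
}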
However, your model wouldn't be very performant. If you look at how SQL Server and Oracle deal with table changes, they don't change the values like this. Instead they flag the old record as a ghost and add a new record with the new value. Old ghosted records are cleaned up later, when the DB is less busy.
I think you're trying to reinvent the wheel:
If you're dealing with large amounts of data then you really need to check out using a full-blown DB. MySQL or SQLite are both good, but you're not going to find a good, simple, open-source and lightweight implementation.
If you aren't dealing with loads of data then I'd go for whole file serialisation, and there are already plenty of good suggestions here on how to do that.
I wrote an implementation myself based on a very similar (I think identical) requirement I had on another project a while ago. One thing I realised when I did it was that most of the time you'll be doing writes; reads are rare, happening only when the program crashes or is closed. So the idea is to make the writes as fast as possible. What I did was make a very simple class which just writes a log of all the operations (additions and deletions) on the dictionary as they occur. After a while you get a lot of repetition between keys, so once the object detects a certain amount of repetition it clears the log and rewrites it so that each key and its value appear only once.
Unfortunately, you can't subclass Dictionary because you can't override anything in it. This is my simple implementation; I haven't tested it, sorry, but I thought you might want the idea. Feel free to use it and change it as much as you like.
class PersistentDictManager {
    const int SaveAllThreshold = 1000;

    public PersistentDictManager(string logpath) {
        this.LogPath = logpath;
        this.mydictionary = new Dictionary<string, string>();
        this.LoadData();
    }

    public string LogPath { get; private set; }

    public string this[string key] {
        get { return this.mydictionary[key]; }
        set {
            string existingvalue;
            if(!this.mydictionary.TryGetValue(key, out existingvalue)) { existingvalue = null; }
            if(string.Equals(value, existingvalue)) { return; }
            this.mydictionary[key] = value; // update the backing dictionary (not this[key], which would recurse)
            // store in log
            if(existingvalue != null) { // was an update (not a create)
                if(this.IncrementSaveAll()) { return; } // because we're going to repeat a key in the log
            }
            this.LogStore(key, value);
        }
    }

    public void Remove(string key) {
        if(!this.mydictionary.Remove(key)) { return; }
        if(this.IncrementSaveAll()) { return; } // because we're going to repeat a key in the log
        this.LogDelete(key);
    }

    private void CreateWriter() {
        if(this.writer == null) {
            // append to the existing log; FileMode.Open would overwrite earlier entries
            this.writer = new BinaryWriter(File.Open(this.LogPath, FileMode.Append));
        }
    }

    private bool IncrementSaveAll() {
        ++this.saveallcount;
        if(this.saveallcount >= PersistentDictManager.SaveAllThreshold) {
            this.SaveAllData();
            return true;
        }
        else { return false; }
    }

    private void LoadData() {
        try {
            using(BinaryReader reader = new BinaryReader(File.Open(LogPath, FileMode.Open))) {
                // replay the log from the start until the end of the file
                while(reader.BaseStream.Position < reader.BaseStream.Length) {
                    string key = reader.ReadString();
                    bool isdeleted = reader.ReadBoolean();
                    if(isdeleted) { this.mydictionary.Remove(key); }
                    else {
                        string value = reader.ReadString();
                        this.mydictionary[key] = value;
                    }
                }
            }
        }
        catch(FileNotFoundException) { }
    }

    private void LogDelete(string key) {
        this.CreateWriter();
        this.writer.Write(key);
        this.writer.Write(true); // yes, key was deleted
        this.writer.Flush();     // flush so the log survives a crash
    }

    private void LogStore(string key, string value) {
        this.CreateWriter();
        this.writer.Write(key);
        this.writer.Write(false); // no, key was not deleted
        this.writer.Write(value);
        this.writer.Flush();      // flush so the log survives a crash
    }

    private void SaveAllData() {
        if(this.writer != null) {
            this.writer.Close();
            this.writer = null;
        }

        using(BinaryWriter writer = new BinaryWriter(File.Open(this.LogPath, FileMode.Create))) {
            foreach(KeyValuePair<string, string> kv in this.mydictionary) {
                writer.Write(kv.Key);
                writer.Write(false); // is not deleted flag
                writer.Write(kv.Value);
            }
        }

        this.saveallcount = 0; // start counting repeats again from the freshly rewritten log
    }

    private readonly Dictionary<string, string> mydictionary;
    private int saveallcount = 0;
    private BinaryWriter writer = null;
}
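For clarity, usage would look something like this (again untested, just matching the class above):
var manager = new PersistentDictManager("dict.log");

manager["user.name"] = "alice";   // appended to the log immediately
manager["user.name"] = "bob";     // an update: counted towards the rewrite threshold
manager.Remove("user.name");      // a deletion is also just a log entry

// On the next start, new PersistentDictManager("dict.log") replays the log
// and the in-memory dictionary ends up in the same state.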
Check this blog out:
http://ayende.com/Blog/archive/2009/01/17/rhino.dht-ndash-persistent-amp-distributed-storage.aspx
Looks to be exactly what you are looking for.
Just use serialization. Look at the BinaryFormatter class.
I don't know of anything to solve your problem. It will need to be a fixed size structure, so that you can meet the requirements of being able to rewrite records without rewriting the entire file.
This means normal strings are out.
Like Douglas said, you need to know the fixed size of your types (both T and V). Also, variable-length instances in the object graph referenced by any of those instances are out.
Still, implementing a dictionary backed by a file is quite simple and you can use the BinaryWriter class to write the types to disk, after inheriting or encapsulating the Dictionary<TKey, TValue> class.
Consider a memory mapped file. I'm not sure if there is direct support in .NET, but you could pinvoke the Win32 calls.
I haven't actually used it, but this project apparently provides an mmap()-like implementation in C#
Mmap
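For reference, .NET 4.0 and later added direct support via System.IO.MemoryMappedFiles, so P/Invoke is no longer needed for this. A minimal sketch of mapping a file and rewriting a value in place:
using System.IO.MemoryMappedFiles;

// Map a file into memory and rewrite a few bytes in place.
using (var mmf = MemoryMappedFile.CreateFromFile("store.bin", FileMode.OpenOrCreate, null, 1024))
using (var accessor = mmf.CreateViewAccessor())
{
    accessor.Write(128, 42L);               // write a long at byte offset 128
    long readBack = accessor.ReadInt64(128); // read it straight back
}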
I'd recommend SQL Server Express or other database.
It's free.
It integrates very well with C#, including LINQ.
It's faster than a homemade solution.
It's more reliable than a homemade solution.
It's way more powerful than a simple disk-based data structure, so it'll be easy to do more in the future.
SQL is an industry standard, so other developers will understand your program more easily, and you'll have a skill that is useful in the future.
I am not much of a programmer, but wouldn't creating a really simple XML format to store your data do the trick?
<dico>
  <dicEntry index="x">
    <key>MyKey</key>
    <val type="string">My val</val>
  </dicEntry>
  ...
</dico>
From there, you load the XML file into a DOM and fill up your dictionary as you like:
XmlDocument xdocDico = new XmlDocument();
string sXMLfile;

public void loadDico(string sXMLfile /*, other args... */)
{
    xdocDico.Load(sXMLfile);
    // Gather whatever you need and load it into your dico
}

public void flushDicInXML(string sXMLfile, Dictionary<string, string> dicWhatever)
{
    // Dump the dic in the XML doc & save
}

public void updateXMLDOM(string index, string key, string value)
{
    // Update a specific value of the XML DOM based on index or key
}
Then whenever you want, you can update the DOM and save it on disk.
xdocDico.Save(sXMLfile);
If you can afford to keep the DOM in memory performance-wise, it's pretty easy to deal with. Depending on your requirements, you may not even need the dictionary at all.
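For completeness, filling the dictionary from that layout could look roughly like this (my sketch of the idea above, using the same xdocDico and sXMLfile names):
var dico = new Dictionary<string, string>();
var xdocDico = new XmlDocument();
xdocDico.Load(sXMLfile);

// One <dicEntry> per key/value pair, as in the layout above.
foreach (XmlNode entry in xdocDico.SelectNodes("/dico/dicEntry"))
{
    string key = entry["key"].InnerText;
    string val = entry["val"].InnerText;
    dico[key] = val;
}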