Serializing a Dictionary to disk? - c#

We have a Hashtable (specifically the C# Dictionary class) that holds several thousands/millions of (Key,Value) pairs for near O(1) search hits/misses.
We'd like to be able to flush this data structure to disk (serialize it) and load it again later (deserialize) such that the internal hashtable of the Dictionary is preserved.
What we do right now:
Load from Disk => List<KVEntity>. (KVEntity is serializable. We use Avro to serialize - can drop Avro if needed)
Read every KVEntity from array => dictionary. This regenerates the dictionary/hashtable internal state.
< System operates, Dictionary can grow/shrink/values change etc >
When saving, read from the dictionary into array (via myKVDict.Values.SelectMany(x => x) into a new List<KVEntity>)
We serialize the array (List<KVEntity>) to disk to save the raw data
Notice that during our save/restore we lose the internal hashtable/dictionary state and have to rebuild it each time.
We'd like to directly serialize to/from the Dictionary (including its internal "live" state) instead of going through an intermediate array just for the disk I/O. How can we do that?
Some pseudo code:
// The actual "node" that has information. Both myKey and myValue have actual data work storing
public class KVEntity
{
public string myKey {get;set;}
public DataClass myValue {get;set;}
}
// unit of disk IO/serialization
public List<KVEntity> myKVList {get;set;}
// unit of run time processing. The string key is KVEntity.myKey
public Dictionary<string,KVEntity> myKVDict {get;set;}

Storing the internal state of the Dictionary instance would be bad practice - a key tenet of OOP is encapsulation: that internal implementation details are deliberately hidden from the consumer.
Furthermore, the mapping algorithm used by Dictionary might change across different versions of the .NET Framework, especially given that CIL assemblies are designed to be forward-compatible (i.e. a program written against .NET 2.0 will generally work against .NET 4.5).
Finally, there are no real performance gains from serialising the internal state of the dictionary. It is much better to use a well-defined file format with a focus on maintainability than on speed. Besides, if the dictionary contains "several thousands" of entries then it should load from disk in under 15ms by my reckoning (assuming you have an efficient on-disk format). Also, a data structure optimised for RAM will not necessarily work well on disk, where sequential reads/writes are better.
Your post is very adamant about working with the internal state of the dictionary, but your existing approach seems fine (albeit it could do with some optimisations). If you reveal more details, we can help you make it faster.
Optimisations
The main issue I see with your existing implementation is the conversion to/from arrays and lists, which is unnecessary given that Dictionary is directly enumerable.
I would do something like this:
Dictionary<String,TFoo> dict = ... // where TFoo : new() && implements arbitrary Serialize(BinaryWriter) and Deserialize(BinaryReader) methods

using(FileStream fs = File.OpenWrite("filename.dat"))
using(BinaryWriter wtr = new BinaryWriter(fs, Encoding.UTF8)) {
    wtr.Write( dict.Count );
    foreach(String key in dict.Keys) {
        wtr.Write( key );
        wtr.Write('\0');
        dict[key].Serialize( wtr );
        wtr.Write('\0'); // assuming NULL characters can work as record delimiters for safety
    }
}
Assuming that your TFoo's Serialize method is fast, I really don't think you'll get any faster than this approach.
Implementing the deserializer is an exercise for the reader, but it should be trivial. Note how I stored the size of the dictionary in the file, so the returned dictionary can be created with the correct capacity, thus avoiding the re-balancing problem that @spender describes in his comment.
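For illustration, a minimal sketch of the matching deserializer (same assumptions as above: the hypothetical TFoo has a parameterless constructor and a Deserialize(BinaryReader) method) could look like this:

Dictionary<String,TFoo> dict;

using(FileStream fs = File.OpenRead("filename.dat"))
using(BinaryReader rdr = new BinaryReader(fs, Encoding.UTF8)) {
    int count = rdr.ReadInt32();
    dict = new Dictionary<String,TFoo>(count); // pre-size so no resize/rehash happens while loading
    for(int i = 0; i < count; i++) {
        String key = rdr.ReadString();
        rdr.ReadChar();                        // consume the '\0' written after the key
        TFoo value = new TFoo();
        value.Deserialize(rdr);
        rdr.ReadChar();                        // consume the '\0' record delimiter
        dict.Add(key, value);
    }
}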

So we're going to stick with our existing strategy, given Dai's reasoning and the fact that we have C# and Java compatibility to maintain (the extra internal-state bits of the C# Dictionary would be dropped on the Java side anyway, which would load only the node data as it does right now).
For later readers still interested in this, I found a very good response here that somewhat answers the question posed. A critical difference is that this answer is for B+ trees, not dictionaries, although in practical applications the two data structures perform quite similarly. B+ tree performance is closer to dictionaries than to regular trees (binary, red-black, AVL, etc.). Specifically, dictionaries deliver near O(1) performance (but no "select from a range" abilities), while B+ trees are O(log_b(X)) where the base b is usually large, which makes them very performant compared to regular trees where b = 2. I'm copy-pasting it here for completeness, but all credit goes to csharptest.net for the B+ tree code, tests, benchmarks and writeups.
For completeness I'm going to add my own implementation here.
Introduction - http://csharptest.net/?page_id=563
Benchmarks - http://csharptest.net/?p=586
Online Help - http://help.csharptest.net/
Source Code - http://code.google.com/p/csharptest-net/
Downloads - http://code.google.com/p/csharptest-net/downloads
NuGet Package - http://nuget.org/List/Packages/CSharpTest.Net.BPlusTree

Related

.NET small class with many instances optimization scenario?

I have millions of instances of class Data, and I am seeking optimization advice.
Is there a way to optimize it in any way - for example, save memory by serializing it somehow, even though that would hurt retrieval speed, which is also important? Maybe by turning the class into a struct - but the class seems pretty large for a struct.
Queries can touch hundreds of millions of these objects at a time. They sit in a list and are queried by DateTime. The results are aggregated in different ways, and many calculations can be applied.
[Serializable]
[DataContract]
public abstract class BaseData { }

[Serializable]
public class Data : BaseData {
    public byte member1;
    public int member2;
    public long member3;
    public double member4;
    public DateTime member5;
}
Unfortunately, while you did specify that you want to "optimize", you did not specify what the exact problem is you mean to tackle. So I cannot really give you more than general advice.
Serialization will not help you. Your Data objects are already stored as bytes in memory. Nor will turning it into a struct help. The difference between a struct and a class lies in their addressing and referencing behaviour, not in their memory footprint.
The only way I can think of to reduce the memory footprint of a collection with "hundreds of millions" of these objects would be to serialize and compress the entire thing. But that is not feasible. You would always have to decompress the entire thing before accessing it, which would shoot your performance to hell AND actually almost double the memory consumption on access (compressed and decompressed data both lying in memory at that point).
The best general advice I can give you is not to try to optimize this scenario yourself, but to use specialized software for that. By specialized software, I mean an (in-memory) database. One example of a database you can use in-memory, and for which mature .NET providers are readily available, is SQLite.
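The original answer predates it, but as a rough illustration (not from the answer itself), a minimal in-memory SQLite sketch using the Microsoft.Data.Sqlite package could look like this; System.Data.SQLite exposes a very similar API:

using Microsoft.Data.Sqlite;

class InMemoryDbDemo
{
    static void Main()
    {
        // "Data Source=:memory:" keeps the whole database in RAM for the lifetime of the connection.
        using (var connection = new SqliteConnection("Data Source=:memory:"))
        {
            connection.Open();

            var create = connection.CreateCommand();
            create.CommandText = "CREATE TABLE Data (Id INTEGER PRIMARY KEY, Stamp TEXT, Value REAL)";
            create.ExecuteNonQuery();

            var insert = connection.CreateCommand();
            insert.CommandText = "INSERT INTO Data (Stamp, Value) VALUES ($stamp, $value)";
            insert.Parameters.AddWithValue("$stamp", "2015-01-01T00:00:00");
            insert.Parameters.AddWithValue("$value", 42.0);
            insert.ExecuteNonQuery();
        }
    }
}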
I assume, as you seem to imply, that you have a class with many members, have a large number of instances, and need to keep them all in memory at the same time to perform calculations.
I ran a few tests to see if you could actually get different sizes for the classes you described.
I used this simple method for finding the in-memory size of an object:
private static void MeasureMemory()
{
    int size = 10000000;
    object[] array = new object[size];
    long before = GC.GetTotalMemory(true);
    for (int i = 0; i < size; i++)
    {
        array[i] = new Data();
    }
    long after = GC.GetTotalMemory(true);
    double diff = after - before;
    Console.WriteLine("Total bytes: " + diff);
    Console.WriteLine("Bytes per object: " + diff / size);
}
It may be primitive, but I find that it works fine for situations like this.
As expected, almost nothing you can do to that class (turning it into a struct, removing the inheritance, or removing the attributes) influences the memory used by a single instance. As far as memory usage goes, they are all equivalent. However, do try fiddling with your actual classes and running them through the given code.
The only way you could actually reduce the memory footprint of an instance would be to use smaller types for your data (int instead of long, for example). If you have a large number of booleans, you could group them into a byte or an integer and use simple property wrappers to work with them (a boolean on its own takes a byte of memory). These may be insignificant things in most situations, but for a hundred million objects, removing a boolean could make a difference of a hundred MB of memory. Also, be aware that the platform target you choose for your application can have an impact on the memory footprint of an object (x64 builds take up more memory than x86 ones).
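To illustrate the boolean-packing idea (not part of the original answer; the flag names are made up), a property wrapper over a single byte of flags might look like this:

// Packs up to eight booleans into a single byte via bit flags.
public class PackedFlags
{
    private byte flags;

    public bool IsActive
    {
        get { return (flags & 0x01) != 0; }
        set { flags = value ? (byte)(flags | 0x01) : (byte)(flags & ~0x01); }
    }

    public bool IsDirty
    {
        get { return (flags & 0x02) != 0; }
        set { flags = value ? (byte)(flags | 0x02) : (byte)(flags & ~0x02); }
    }

    // ...six more flags fit in the remaining bits.
}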
Serializing the data is very unlikely to help. An in-memory database has its upsides, especially if you are doing complex queries. However, it is unlikely to actually reduce the memory usage of your data. Unfortunately, there just aren't many ways to reduce the footprint of basic data types. At some point, you will just have to move to a file-based database.
However, here are some ideas. Please be aware that they are hacky, highly conditional, decrease computation performance and will make the code harder to maintain.
It is often the case in large data structures that objects in different states have only some of their properties filled, with the others set to null or a default value. If you can identify such groups of properties, perhaps you could move them to a sub-class and hold one reference that can be null, instead of having several properties take up space. Then you only instantiate the sub-class once it is needed. You could write property wrappers that hide this from the rest of the code. Bear in mind that the worst case scenario here would have you keeping all the properties in memory, plus several object headers and pointers.
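A rough sketch of that idea (the member names are illustrative, not from the original answer): the rarely-populated members move into a nested class that is only allocated on first write.

public class Record
{
    private RareData rare; // stays null in the common case, costing one reference instead of several fields

    public string Comment
    {
        get { return rare == null ? null : rare.Comment; }
        set { EnsureRare().Comment = value; }
    }

    public double OptionalScore
    {
        get { return rare == null ? 0.0 : rare.OptionalScore; }
        set { EnsureRare().OptionalScore = value; }
    }

    private RareData EnsureRare()
    {
        return rare ?? (rare = new RareData());
    }

    private class RareData
    {
        public string Comment;
        public double OptionalScore;
    }
}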
You could perhaps turn members that are likely to take a default value into binary representations, and then pack them into a byte array. You would know which byte positions represent which data member, and could write properties that could read them. Position the properties that are most likely to have a default value at the end of the byte array (a few longs that are often 0 for example). Then, when creating the object, adjust the byte array size to exclude the properties that have the default value, starting from the end of the list, until you hit the first member that has a non-default value. When the outside code requests a property, you can check if the byte array is large enough to hold that property, and if not, return the default value. This way, you could potentially save some space. Best case, you will have a null pointer to a byte array instead of several data members. Worst case, you will have full byte arrays taking as much space as the original data, plus some overhead for the array. The usefulness depends on the actual data, and assumes that you do relatively few writes, as the re-computation of the array will be expensive.
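A hedged sketch of that trailing-default trick for two longs that are usually zero (names and layout are illustrative; writes are simplified to a constructor to keep it short):

using System;

public class CompactRecord
{
    private readonly byte[] packed; // null, or shorter than 16 bytes, when trailing members are zero

    public long MemberA { get { return ReadInt64(0); } }
    public long MemberB { get { return ReadInt64(8); } }

    public CompactRecord(long memberA, long memberB)
    {
        // Lay both members out little-endian, then drop trailing zero bytes.
        byte[] full = new byte[16];
        for (int i = 0; i < 8; i++) full[i] = (byte)(memberA >> (8 * i));
        for (int i = 0; i < 8; i++) full[8 + i] = (byte)(memberB >> (8 * i));
        int length = 16;
        while (length > 0 && full[length - 1] == 0) length--;
        if (length > 0)
        {
            packed = new byte[length];
            Array.Copy(full, packed, length);
        }
    }

    private long ReadInt64(int offset)
    {
        // Bytes beyond the stored length are treated as zero (the default value).
        long result = 0;
        if (packed == null) return 0;
        for (int i = 0; i < 8 && offset + i < packed.Length; i++)
            result |= (long)packed[offset + i] << (8 * i);
        return result;
    }
}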
Hope any of this helps :)

.net Serialization: How to use raw binary writer while maintaining which thing is which

I'm making a roguelike game in XNA with procedurally generated levels.
It takes about a second to generate a whole new level, but about 4 seconds to serialize one and about 8 seconds to deserialize one with my current methods. The files are also massive (about 10 MB, depending on how big the level is).
I serialize like this.
private void SerializeLevel()
{
    string name = Globals.CurrentLevel.LvLSaveString;
    using (Stream stream = new FileStream("SAVES\\" + name + ".lvl", FileMode.Create, FileAccess.Write, FileShare.None))
    {
        formatter.Serialize(stream, Globals.CurrentLevel);
        stream.Close();
    }
}
My game engine architecture is basically a load of nested Lists which might go..
Level\Room\Interior\Interiorthing\sprite
This hierarchy is important to maintain for the game/performance. For instance usually only things in the current room are considered for updates and draws.
I want to try something like the Raw Binary formatter shown in this post to improve serialization/deserialization performance
I can just save the ints and floats and bools which correspond to all the positions of/configurations of things and reinstantiate everything when I load a level (which only takes a second)
My question is how do I use this Raw Binary serializer while also maintaining which object is which, what type it is and which nested list it is in.
In the example cited, the OP is just serializing a huge list of ints, and every third one is taken as the start of a new coordinate.
I could have a new stream for each different type of thing in each room, but that would result in loads of different files (I think). Is there a way to segregate the raw binary stream with some kind of hierarchy, i.e. split it up into different sections pertaining to different rooms and different lists of things?
UPDATE
Ok, one thing that was throwing me off was that in the question I reference, the OP refers to "manual serialization" as "raw binary serialization", which I couldn't find any info on.
If you want to serialize each member of Globals independently, and upon deserialization manually update the member values, you need to know which member you are currently processing during deserialization. I can suggest the following:
Process items in the same order. The code in your example will put binary data into the stream that is nearly impossible to extract unless you deserialize members in exactly the order they were serialized. This becomes maintenance hell when new items are added, and it is not a good solution in terms of code clarity, maintainability or backwards compatibility.
Use a dictionary. As per the comments, Globals appears to be a static class, so it itself is not serializable. When serializing, put all members of the Globals class into a dictionary and serialize that. Upon deserialization you will know that you have a dictionary (not a random mess of objects), and can then restore the Globals members from it.
Use a custom class. Create a class with all the settings (a better approach). Use a single static instance of the class to access the settings. You can serialize and deserialize that class (a sketch follows this list).
Settings. The custom-class approach gets close to a concept already built into .NET: Settings. Take a look at it; it seems that the Globals class is in fact a custom variant of a settings configuration.
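As an illustration of the custom-class route (all names here are hypothetical, not from the question), such a class can write its members with a BinaryWriter in a fixed order and prefix each nested list with its count, so the Level\Room hierarchy can be rebuilt on load without a separate file per type:

using System.Collections.Generic;
using System.IO;

public class LevelSave
{
    public int Seed;
    public float PlayerX, PlayerY;
    public List<RoomSave> Rooms = new List<RoomSave>();

    public void Serialize(BinaryWriter w)
    {
        w.Write(Seed);
        w.Write(PlayerX);
        w.Write(PlayerY);
        w.Write(Rooms.Count);              // count prefix tells the reader how many rooms follow
        foreach (RoomSave room in Rooms)
            room.Serialize(w);
    }

    public static LevelSave Deserialize(BinaryReader r)
    {
        var level = new LevelSave
        {
            Seed = r.ReadInt32(),           // members are read back in exactly the order they were written
            PlayerX = r.ReadSingle(),
            PlayerY = r.ReadSingle()
        };
        int roomCount = r.ReadInt32();
        for (int i = 0; i < roomCount; i++)
            level.Rooms.Add(RoomSave.Deserialize(r));
        return level;
    }
}

public class RoomSave
{
    public int TileSetId;

    public void Serialize(BinaryWriter w) { w.Write(TileSetId); }

    public static RoomSave Deserialize(BinaryReader r)
    {
        return new RoomSave { TileSetId = r.ReadInt32() };
    }
}

Reading members back in the same fixed order is what makes this work; the count prefixes are what let nested lists (rooms, things per room) be reconstructed.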

In memory representation of large data

Currently, I am working on a project where I need to bring GBs of data onto the client machine to do some work; the task needs the whole data set, as it does some analysis on the data and helps in the decision-making process.
So the question is: what are the best practices and suitable approaches for managing that much data in memory without hampering the performance of the client machine and the application?
Note: at application load time we can spend time bringing data from the database to the client machine; that's totally acceptable in our case. But once the data is loaded into the application at start-up, performance is very important.
This is a little hard to answer without a problem statement, i.e. what problems you are currently facing, but the following is just some thoughts, based on some recent experiences we had in a similar scenario. It is, however, a lot of work to change to this type of model - so it also depends how much you can invest trying to "fix" it, and I can make no promise that "your problems" are the same as "our problems", if you see what I mean. So don't get cross if the following approach doesn't work for you!
Loading that much data into memory is always going to have some impact, however, I think I see what you are doing...
When loading that much data naively, you are going to have many (millions?) of objects and a similar-or-greater number of references. You're obviously going to want to be using x64, so the references will add up - but in terms of performance the biggest problem is going to be garbage collection. You have a lot of objects that can never be collected, but the GC knows that you're using a ton of memory and is going to try anyway, periodically. This is something I looked at in more detail here; the "spikes" in that post's graph are all GC killing performance.
For this scenario (a huge amount of data loaded, never released), we switched to using structs, i.e. loading the data into:
struct Foo {
    private readonly int id;
    private readonly double value;
    public Foo(int id, double value) {
        this.id = id;
        this.value = value;
    }
    public int Id { get { return id; } }
    public double Value { get { return value; } }
}
and stored those directly in arrays (not lists):
Foo[] foos = ...
The significance of that is that because some of these structs are quite big, we didn't want them being copied lots of times on the stack; but with an array you can do:
private void SomeMethod(ref Foo foo) {
    if(foo.Value == ...) { blah blah blah }
}

// call ^^^
int index = 17;
SomeMethod(ref foos[index]);
Note that we've passed the object directly - it was never copied; foo.Value is actually looking directly inside the array. The tricky bit starts when you need relationships between objects. You can't store a reference to a Foo here, since it is a struct living inside the array. What you can do, though, is store its index into the array. For example:
struct Customer {
    ... // more not shown
    public int FooIndex { get { return fooIndex; } }
}
Not quite as convenient as customer.Foo, but the following works nicely:
Foo foo = foos[customer.FooIndex];
// or, when passing to a method, SomeMethod(ref foos[customer.FooIndex]);
Key points:
we're now using half the size for "references" (an int is 4 bytes; a reference on x64 is 8 bytes)
we don't have several-million object headers in memory
we don't have a huge object graph for GC to look at; only a small number of arrays that GC can look at incredibly quickly
but it is a little less convenient to work with, and needs some initial processing when loading
additional notes:
strings are a killer; if you have millions of strings, then that is problematic; at a minimum, if you have strings that are repeated, make sure you do some custom interning (not string.Intern, that would be bad) to ensure you only have one instance of each repeated value, rather than 800,000 strings with the same contents (a sketch of such an interner follows these notes)
if you have repeated data of finite length, rather than sub-lists/arrays, you might consider a fixed array; this requires unsafe code, but avoids another myriad of objects and references
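A minimal sketch of such a custom interner (illustrative, not from the original answer); unlike string.Intern, the pool here is an ordinary object that can be discarded once loading is finished:

using System.Collections.Generic;

class LocalInterner
{
    // One canonical instance per distinct value; repeated strings share it.
    private readonly Dictionary<string, string> pool = new Dictionary<string, string>();

    public string Intern(string value)
    {
        if (value == null) return null;
        string existing;
        if (pool.TryGetValue(value, out existing)) return existing;
        pool.Add(value, value);
        return value;
    }
}

During loading, every string read from the wire goes through Intern() before being stored, so 800,000 identical values collapse to a single instance.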
As an additional footnote, with that volume of data, you should think very seriously about your serialization protocols, i.e. how you're sending the data down the wire. I would strongly suggest staying far away from things like XmlSerializer, DataContractSerializer or BinaryFormatter. If you want pointers on this subject, let me know.

Google Protocol Buffers Serialization hangs writing 1GB+ data

I am serializing a large data set using protocol buffer serialization. When my data set contains 400000 custom objects of combined size around 1 GB, serialization returns in 3~4 seconds. But when my data set contains 450000 objects of combined size around 1.2 GB, serialization call never returns and CPU is constantly consumed.
I am using .NET port of Protocol Buffers.
Looking at the new comments, this appears to be (as the OP notes) MemoryStream capacity limited. A slight annoyance in the protobuf spec is that since sub-message lengths are variable and must prefix the sub-message, it is often necessary to buffer portions until the length is known. This is fine for most reasonable graphs, but if there is an exceptionally large graph (except for the "root object has millions of direct children" scenario, which doesn't suffer) it can end up doing quite a bit in-memory.
If you aren't tied to a particular layout (perhaps due to .proto interop with an existing client), then a simple fix is as follows: on child (sub-object) properties (including lists / arrays of sub-objects), tell it to use "group" serialization. This is not the default layout, but it says "instead of using a length-prefix, use a start/end pair of tokens". The downside of this is that if your deserialization code doesn't know about a particular object, it takes longer to skip the field, as it can't just say "seek forwards 231413 bytes" - it instead has to walk the tokens to know when the object is finished. In most cases this isn't an issue at all, since your deserialization code fully expects that data.
To do this:
[ProtoMember(1, DataFormat = DataFormat.Group)]
public SomeType SomeChild { get; set; }
....
[ProtoMember(4, DataFormat = DataFormat.Group)]
public List<SomeOtherType> SomeChildren { get { return someChildren; } }
The deserialization in protobuf-net is very forgiving by default (there is an optional strict mode), and it will happily deserialize groups in place of length-prefix and length-prefix in place of groups (meaning: any data you have already stored somewhere should work fine).
1.2G of memory is dangerously close to the managed memory limit for 32 bit .Net processes. My guess is the serialization triggers an OutOfMemoryException and all hell breaks loose.
You should try to use several smaller serializations rather than a gigantic one, or move to a 64bit process.
Cheers,
Florian

Looking for a simple standalone persistent dictionary implementation in C# [closed]

For an open source project I am looking for a good, simple implementation of a Dictionary that is backed by a file. Meaning, if the application crashes or restarts, the dictionary keeps its state. I would like it to update the underlying file every time the dictionary is touched (a value added or removed). A FileWatcher is not required but it could be useful.
class PersistentDictionary<T,V> : IDictionary<T,V>
{
    public PersistentDictionary(string filename)
    {
    }
}
Requirements:
Open Source, with no dependency on native code (no sqlite)
Ideally a very short and simple implementation
When setting or clearing a value it should not re-write the entire underlying file, instead it should seek to the position in the file and update the value.
Similar Questions
Persistent Binary Tree / Hash table in .Net
Disk backed dictionary/cache for c#
PersistentDictionary<Key,Value>
bplustreedotnet
The bplusdotnet package is a library of cross-compatible data structure implementations in C#, Java, and Python which are useful for applications that need to store and retrieve persistent information. The bplusdotnet data structures make it easy to store string keys associated with values permanently.
ESENT Managed Interface
Not 100% managed code, but worth mentioning, as the unmanaged library itself is already part of every Windows XP/2003/Vista/7 box.
ESENT is an embeddable database storage engine (ISAM) which is part of Windows. It provides reliable, transacted, concurrent, high-performance data storage with row-level locking, write-ahead logging and snapshot isolation. This is a managed wrapper for the ESENT Win32 API.
Akavache
Akavache is an asynchronous, persistent key-value cache created for writing native desktop and mobile applications in C#. Think of it like memcached for desktop apps.
The C5 Generic Collection Library
C5 provides functionality and data structures not provided by the standard .Net System.Collections.Generic namespace, such as persistent tree data structures, heap based priority queues, hash indexed array lists and linked lists, and events on collection changes.
Let me analyze this:
Retrieve information by key
Persistent storage
Do not want to write back the whole file when 1 value changes
Should survive crashes
I think you want a database.
Edit: I think you are searching for the wrong thing. Search for a database that fits your requirements. And change some of your requirements, because I think it will be difficult to meet them all.
One way is to use the Extensible Storage Engine built into Windows to store your stuff. It's a native Windows database engine that supports indexing, transactions, etc.
I was working on porting EHCache to .NET. Take a look at the project
http://sourceforge.net/projects/thecache/
Persistent caching is core functionality that is already implemented. All main Unit Tests are passing. I got a bit stuck on distributed caching, but you do not need that part.
Sounds cool, but how will you get around changes to the stored value itself (if it is a reference type)? If it's immutable then all is well, but if not you're kinda stuffed :-)
If you're not dealing with immutable values, I would suspect a better approach would be to handle persistence at the value level and to just rebuild the dictionary as necessary.
(edited to add a clarification)
I think your issue is likely to be that last point:
When setting or clearing a value it should not re-write the entire underlying file, instead it should seek to the position in the file and update the value.
This is exactly what a DB does - you're basically describing a simple file based table structure.
We can illustrate the problem by looking at strings.
Strings in memory are flexible things - you don't need to know the length of a string in C# when you declare its type.
In data storage, strings and everything else have fixed sizes. Your saved dictionary on disk is just a collection of bytes, in order.
If you replace a value in the middle it either has to be exactly the same size or you will have to rewrite every byte that comes after it.
This is why most databases restrict text and blob fields to fixed sizes. Newer features like varchar(max)/varbinary(max) in SQL Server 2005+ are actually clever simplifications: the row only stores a pointer to the real data.
You can't use the fixed sizes with your example because it's generic - you don't know what type you're going to be storing so you can't pad the values out to a maximum size.
You could do:
class PersistentDictionary<T,V> : Dictionary<T,V>
    where V : struct
...as value types don't vary in storage size, although you would have to be careful with your implementation to save the right amount of storage for each type.
However your model wouldn't be very performant - if you look at how SQL server and Oracle deal with table changes they don't change the values like this. Instead they flag the old record as a ghost, and add a new record with the new value. Old ghosted records are cleaned up later when the DB is less busy.
I think you're trying to reinvent the wheel:
If you're dealing with large amounts of data then you really need to check out using a full-blown DB. MySQL and SQLite are both good, but you're not going to find a good, simple, open-source and lightweight implementation of exactly what you're asking for.
If you aren't dealing with loads of data then I'd go for whole file serialisation, and there are already plenty of good suggestions here on how to do that.
I wrote up an implementation myself based on a very similar (I think identical) requirement I had on another project a while ago. When I did it, one thing I realised was that most of the time you'll be doing writes; you only do a read rarely - when the program restarts after a crash or after being closed. So the idea is to make the writes as fast as possible. What I did was make a very simple class which just writes a log of all the operations (additions and deletions) on the dictionary as they occur. After a while you get a lot of repetition between keys, so once the object detects a certain amount of repetition, it clears the log and rewrites it so that each key and its value appears only once.
Unfortunately, you can't usefully subclass Dictionary because you can't override anything in it. This is my simple implementation; I haven't tested it, sorry, but I thought you might want the idea. Feel free to use it and change it as much as you like.
class PersistentDictManager {
    const int SaveAllThreshold = 1000;

    public PersistentDictManager(string logpath) {
        this.LogPath = logpath;
        this.mydictionary = new Dictionary<string, string>();
        this.LoadData();
    }

    public string LogPath { get; private set; }

    public string this[string key] {
        get { return this.mydictionary[key]; }
        set {
            string existingvalue;
            if(!this.mydictionary.TryGetValue(key, out existingvalue)) { existingvalue = null; }
            if(string.Equals(value, existingvalue)) { return; }
            this.mydictionary[key] = value; // update the in-memory dictionary (not this[key], which would recurse)
            // store in log
            if(existingvalue != null) { // was an update (not a create)
                if(this.IncrementSaveAll()) { return; } // because we're going to repeat a key in the log
            }
            this.LogStore(key, value);
        }
    }

    public void Remove(string key) {
        if(!this.mydictionary.Remove(key)) { return; }
        if(this.IncrementSaveAll()) { return; } // because we're going to repeat a key in the log
        this.LogDelete(key);
    }

    private void CreateWriter() {
        if(this.writer == null) {
            // Append so existing log entries are kept; the file is created if it does not exist.
            this.writer = new BinaryWriter(File.Open(this.LogPath, FileMode.Append));
        }
    }

    private bool IncrementSaveAll() {
        ++this.saveallcount;
        if(this.saveallcount >= PersistentDictManager.SaveAllThreshold) {
            this.SaveAllData();
            return true;
        }
        else { return false; }
    }

    private void LoadData() {
        try {
            using(BinaryReader reader = new BinaryReader(File.Open(LogPath, FileMode.Open))) {
                while(reader.PeekChar() != -1) {
                    string key = reader.ReadString();
                    bool isdeleted = reader.ReadBoolean();
                    if(isdeleted) { this.mydictionary.Remove(key); }
                    else {
                        string value = reader.ReadString();
                        this.mydictionary[key] = value;
                    }
                }
            }
        }
        catch(FileNotFoundException) { }
    }

    private void LogDelete(string key) {
        this.CreateWriter();
        this.writer.Write(key);
        this.writer.Write(true); // yes, key was deleted
        this.writer.Flush();     // push the record to disk so it survives a crash
    }

    private void LogStore(string key, string value) {
        this.CreateWriter();
        this.writer.Write(key);
        this.writer.Write(false); // no, key was not deleted
        this.writer.Write(value);
        this.writer.Flush();      // push the record to disk so it survives a crash
    }

    private void SaveAllData() {
        if(this.writer != null) {
            this.writer.Close();
            this.writer = null;
        }
        using(BinaryWriter writer = new BinaryWriter(File.Open(this.LogPath, FileMode.Create))) {
            foreach(KeyValuePair<string, string> kv in this.mydictionary) {
                writer.Write(kv.Key);
                writer.Write(false); // is-not-deleted flag
                writer.Write(kv.Value);
            }
        }
        this.saveallcount = 0; // start counting repetition again from the freshly compacted log
    }

    private readonly Dictionary<string, string> mydictionary;
    private int saveallcount = 0;
    private BinaryWriter writer = null;
}
Check this blog out:
http://ayende.com/Blog/archive/2009/01/17/rhino.dht-ndash-persistent-amp-distributed-storage.aspx
Looks to be exactly what you are looking for.
Just use serialization. Look at the BinaryFormatter class.
I don't know of anything to solve your problem. It will need to be a fixed size structure, so that you can meet the requirements of being able to rewrite records without rewriting the entire file.
This means normal strings are out.
Like Douglas said, you need to know the fixed size of your types (both T and V). Also, variable-length instances in the object graph referenced by any of those instances are out.
Still, implementing a dictionary backed by a file is quite simple and you can use the BinaryWriter class to write the types to disk, after inheriting or encapsulating the Dictionary<TKey, TValue> class.
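To make the fixed-size point concrete, here is a rough sketch (illustrative only, not from the answer above) of updating one record in place: when every record occupies the same number of bytes, the writer can seek straight to record i and overwrite it without touching the rest of the file.

using System.IO;

static class FixedRecordFile
{
    // One record = a 4-byte int key followed by an 8-byte double value.
    const int RecordSize = sizeof(int) + sizeof(double);

    public static void WriteRecord(FileStream fs, int index, int key, double value)
    {
        fs.Seek((long)index * RecordSize, SeekOrigin.Begin); // jump directly to the record's slot
        var writer = new BinaryWriter(fs);                   // the caller owns and disposes the stream
        writer.Write(key);
        writer.Write(value);
        writer.Flush();
    }
}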
Consider a memory mapped file. I'm not sure if there is direct support in .NET, but you could pinvoke the Win32 calls.
I haven't actually used it, but this project apparently provides an mmap()-like implementation in C#
Mmap
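For what it's worth, .NET 4.0 and later do have direct managed support via System.IO.MemoryMappedFiles; a minimal sketch (file name, map name and offsets are illustrative):

using System.IO;
using System.IO.MemoryMappedFiles;

class MmapDemo
{
    static void Main()
    {
        // Map a 1 KB file into memory; reads and writes go through the view accessor.
        using (var mmf = MemoryMappedFile.CreateFromFile("data.bin", FileMode.OpenOrCreate, "dataMap", 1024))
        using (var accessor = mmf.CreateViewAccessor())
        {
            accessor.Write(0, 42L);                // write a long at offset 0
            long readBack = accessor.ReadInt64(0); // read it back from the mapped view
        }
    }
}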
I'd recommend SQL Server Express or other database.
It's free.
It integrates very well with C#, including LINQ.
It's faster than a homemade solution.
It's more reliable than a homemade solution.
It's way more powerful than a simple disk-based data structure, so it'll be easy to do more in the future.
SQL is an industry standard, so other developers will understand your program more easily, and you'll have a skill that is useful in the future.
I am not much of a programmer, but wouldn't creating a really simple XML format to store your data do the trick?
<dico>
  <dicEntry index="x">
    <key>MyKey</key>
    <val type="string">My val</val>
  </dicEntry>
  ...
</dico>
From there, you load the XML file DOM and fill up your dictionary as you like,
XmlDocument xdocDico = new XmlDocument();
string sXMLfile;

public void loadDico(string sXMLfile /*, other args... */)
{
    xdocDico.Load(sXMLfile);
    // Gather whatever you need and load it into your dico
}

public void flushDicInXML(string sXMLfile, Dictionary<string, string> dicWhatever)
{
    // Dump the dic into the XML doc & save
}

public void updateXMLDOM(string index, string key, string value)
{
    // Update a specific value of the XML DOM based on index or key
}
Then whenever you want, you can update the DOM and save it on disk.
xdocDico.Save(sXMLfile);
If you can afford to keep the DOM in memory performance-wise, it's pretty easy to deal with. Depending on your requirements, you may not even need the dictionary at all.
