Looking for a simple standalone persistent dictionary implementation in C# [closed]

For an open source project I am looking for a good, simple implementation of a Dictionary that is backed by a file. Meaning, if the application crashes or restarts, the dictionary will keep its state. I would like it to update the underlying file every time the dictionary is touched (a value is added or removed). A FileWatcher is not required, but it could be useful.
class PersistentDictionary<T,V> : IDictionary<T,V>
{
    public PersistentDictionary(string filename)
    {
    }
}
Requirements:
Open Source, with no dependency on native code (no sqlite)
Ideally a very short and simple implementation
When setting or clearing a value it should not re-write the entire underlying file, instead it should seek to the position in the file and update the value.
Similar Questions
Persistent Binary Tree / Hash table in .Net
Disk backed dictionary/cache for c#
PersistentDictionary<Key,Value>

bplustreedotnet
The bplusdotnet package is a library of cross compatible data structure implementations in C#, java, and Python which are useful for applications which need to store and retrieve persistent information. The bplusdotnet data structures make it easy to store string keys associated with values permanently.
ESENT Managed Interface
Not 100% managed code, but it's worth mentioning, as the unmanaged library itself is already part of every Windows XP/2003/Vista/7 box:
ESENT is an embeddable database storage engine (ISAM) which is part of Windows. It provides reliable, transacted, concurrent, high-performance data storage with row-level locking, write-ahead logging and snapshot isolation. This is a managed wrapper for the ESENT Win32 API.
Akavache
Akavache is an asynchronous, persistent key-value cache created for writing native desktop and mobile applications in C#. Think of it like memcached for desktop apps.
The C5 Generic Collection Library
C5 provides functionality and data structures not provided by the standard .Net System.Collections.Generic namespace, such as persistent tree data structures, heap based priority queues, hash indexed array lists and linked lists, and events on collection changes.

Let me analyze this:
Retrieve information by key
Persistent storage
Do not want to write back the whole file when 1 value changes
Should survive crashes
I think you want a database.
Edit: I think you are searching for the wrong thing. Search for a database that fits your requirements, and consider relaxing some of those requirements, because I think it will be difficult to meet them all.

One way is to use the Extensible Storage Engine built into Windows to store your stuff. It's a native Windows database engine that supports indexing, transactions, etc.

I was working on porting EHCache to .NET. Take a look at the project
http://sourceforge.net/projects/thecache/
Persistent caching is core functionality that is already implemented. All main Unit Tests are passing. I got a bit stuck on distributed caching, but you do not need that part.

Sounds cool, but how will you get around changes to the stored value (if it was a reference type) itself? If it's immutable then all is well, but if not you're kinda stuffed :-)
If you're not dealing with immutable values, I would suspect a better approach would be to handle persistence at the value level and to just rebuild the dictionary as necessary.
(edited to add a clarification)

I think your issue is likely to be that last point:
When setting or clearing a value it should not re-write the entire underlying file, instead it should seek to the position in the file and update the value.
This is exactly what a DB does - you're basically describing a simple file based table structure.
We can illustrate the problem by looking at strings.
Strings in memory are flexible things - you don't need to know the length of a string in C# when you declare its type.
In data storage strings and everything else are fixed sizes. Your saved dictionary on disk is just a collection of bytes, in order.
If you replace a value in the middle it either has to be exactly the same size or you will have to rewrite every byte that comes after it.
This is why most databases restrict text and blob fields to fixed sizes. Newer features like varchar(max)/varbinary(max) in SQL Server 2005+ are actually clever simplifications: the row only stores a pointer to the real data.
You can't use the fixed sizes with your example because it's generic - you don't know what type you're going to be storing so you can't pad the values out to a maximum size.
You could do:
class PersistentDictionary<T,V> : Dictionary<T,V>
    where V : struct
...as value types don't vary in storage size, although you would have to be careful with your implementation to save the right amount of storage for each type.
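To make that idea concrete, here is a minimal sketch of an in-place update with fixed-size records (illustrative only - it assumes the key maps directly to a record index, the value is a fixed 8-byte double, and the file has already been allocated):
using System.IO;

static void UpdateRecord(string path, int recordIndex, double newValue)
{
    const int RecordSize = sizeof(double); // every record must be exactly the same size
    using (var fs = new FileStream(path, FileMode.Open, FileAccess.Write))
    using (var writer = new BinaryWriter(fs))
    {
        fs.Seek((long)recordIndex * RecordSize, SeekOrigin.Begin); // jump straight to the slot
        writer.Write(newValue);                                    // overwrite the value in place
    }
}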
However, your model wouldn't be very performant - if you look at how SQL Server and Oracle deal with table changes, they don't change the values like this. Instead they flag the old record as a ghost and add a new record with the new value. Old ghosted records are cleaned up later when the DB is less busy.
I think you're trying to reinvent the wheel:
If you're dealing with large amounts of data then you really need to check out using a full-blown DB. MySQL or SQLite are both good, but you're not going to find something that is good, simple, open source and lightweight all at once.
If you aren't dealing with loads of data then I'd go for whole file serialisation, and there are already plenty of good suggestions here on how to do that.

I wrote up an implementation myself based on a very similar (I think identical) requirement I had on another project a while ago. When I did it, one thing I realised was that most of the time you'll be doing writes; you only rarely do a read, when the program restarts after a crash or a normal shutdown. So the idea is to make the writes as fast as possible. What I did was make a very simple class which would just write a log of all the operations (additions and deletions) to the dictionary as things occurred. So after a while you get a lot of repetition between keys. Because of that, once the object detects a certain amount of repetition, it'll clear the log and rewrite it so each key and its value only appears once.
Unfortunately, you can't subclass Dictionary because you can't override anything in it. This is my simple implementation; I haven't tested it, sorry, but I thought you might want the idea. Feel free to use it and change it as much as you like.
using System;
using System.Collections.Generic;
using System.IO;

class PersistentDictManager {
    const int SaveAllThreshold = 1000;

    public PersistentDictManager(string logpath) {
        this.LogPath = logpath;
        this.mydictionary = new Dictionary<string, string>();
        this.LoadData();
    }

    public string LogPath { get; private set; }

    public string this[string key] {
        get { return this.mydictionary[key]; }
        set {
            string existingvalue;
            if(!this.mydictionary.TryGetValue(key, out existingvalue)) { existingvalue = null; }
            if(string.Equals(value, existingvalue)) { return; }
            this.mydictionary[key] = value;
            // store in log
            if(existingvalue != null) { // was an update (not a create)
                if(this.IncrementSaveAll()) { return; } // because we're going to repeat a key in the log
            }
            this.LogStore(key, value);
        }
    }

    public void Remove(string key) {
        if(!this.mydictionary.Remove(key)) { return; }
        if(this.IncrementSaveAll()) { return; } // because we're going to repeat a key in the log
        this.LogDelete(key);
    }

    private void CreateWriter() {
        if(this.writer == null) {
            // append to the existing log rather than overwriting it from the start
            this.writer = new BinaryWriter(File.Open(this.LogPath, FileMode.Append));
        }
    }

    private bool IncrementSaveAll() {
        ++this.saveallcount;
        if(this.saveallcount >= PersistentDictManager.SaveAllThreshold) {
            this.SaveAllData();
            return true;
        }
        else { return false; }
    }

    private void LoadData() {
        try {
            using(BinaryReader reader = new BinaryReader(File.Open(LogPath, FileMode.Open))) {
                while(reader.PeekChar() != -1) {
                    string key = reader.ReadString();
                    bool isdeleted = reader.ReadBoolean();
                    if(isdeleted) { this.mydictionary.Remove(key); }
                    else {
                        string value = reader.ReadString();
                        this.mydictionary[key] = value;
                    }
                }
            }
        }
        catch(FileNotFoundException) { }
    }

    private void LogDelete(string key) {
        this.CreateWriter();
        this.writer.Write(key);
        this.writer.Write(true); // yes, key was deleted
        this.writer.Flush();     // flush so the entry survives a crash
    }

    private void LogStore(string key, string value) {
        this.CreateWriter();
        this.writer.Write(key);
        this.writer.Write(false); // no, key was not deleted
        this.writer.Write(value);
        this.writer.Flush();      // flush so the entry survives a crash
    }

    private void SaveAllData() {
        if(this.writer != null) {
            this.writer.Close();
            this.writer = null;
        }
        using(BinaryWriter writer = new BinaryWriter(File.Open(this.LogPath, FileMode.Create))) {
            foreach(KeyValuePair<string, string> kv in this.mydictionary) {
                writer.Write(kv.Key);
                writer.Write(false); // is-not-deleted flag
                writer.Write(kv.Value);
            }
        }
        this.saveallcount = 0; // the log has just been compacted, so reset the repetition counter
    }

    private readonly Dictionary<string, string> mydictionary;
    private int saveallcount = 0;
    private BinaryWriter writer = null;
}

Check this blog out:
http://ayende.com/Blog/archive/2009/01/17/rhino.dht-ndash-persistent-amp-distributed-storage.aspx
Looks to be exactly what you are looking for.

Just use serialization. Look at the BinaryFormatter class.
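For example, a minimal whole-file sketch for a string dictionary (note that this rewrites the entire file on every save, so it does not satisfy the "seek and update in place" requirement):
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

public static void Save(Dictionary<string, string> dict, string path)
{
    using (var stream = File.Create(path))
    {
        new BinaryFormatter().Serialize(stream, dict); // Dictionary<TKey,TValue> is serializable
    }
}

public static Dictionary<string, string> Load(string path)
{
    using (var stream = File.OpenRead(path))
    {
        return (Dictionary<string, string>)new BinaryFormatter().Deserialize(stream);
    }
}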

I don't know of anything that solves your problem. It will need to be a fixed-size structure so that you can meet the requirement of rewriting records without rewriting the entire file.
This means normal strings are out.

Like Douglas said, you need to know the fixed size of your types (both T and V). Also, variable-length instances in the object graph referenced by any of those instances are out.
Still, implementing a dictionary backed by a file is quite simple, and you can use the BinaryWriter class to write the types to disk after inheriting from or encapsulating the Dictionary<TKey, TValue> class.

Consider a memory-mapped file. I'm not sure if there is direct support in .NET, but you could P/Invoke the Win32 calls.
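(For later readers: .NET 4.0 and above do have direct support via System.IO.MemoryMappedFiles, so no P/Invoke is needed. A minimal sketch, assuming a small file whose size does not exceed the stated capacity:)
using System.IO;
using System.IO.MemoryMappedFiles;

static void WriteSlot(string path, long offset, double value)
{
    using (var mmf = MemoryMappedFile.CreateFromFile(path, FileMode.OpenOrCreate, null, 4096))
    using (var accessor = mmf.CreateViewAccessor())
    {
        accessor.Write(offset, value); // updates the backing file without rewriting it
    }
}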

I haven't actually used it, but this project apparently provides an mmap()-like implementation in C#
Mmap

I'd recommend SQL Server Express or other database.
It's free.
It integrates very well with C#, including LINQ.
It's faster than a homemade solution.
It's more reliable than a homemade solution.
It's way more powerful than a simple disk-based data structure, so it'll be easy to do more in the future.
SQL is an industry standard, so other developers will understand your program more easily, and you'll have a skill that is useful in the future.

I am not much of a programmer, but wouldn't creating a really simple XML format to store your data do the trick?
<dico>
  <dicEntry index="x">
    <key>MyKey</key>
    <val type="string">My val</val>
  </dicEntry>
  ...
</dico>
From there, you load the XML file into a DOM and fill up your dictionary as you like:
XmlDocument xdocDico = new XmlDocument();
string sXMLfile;

public void loadDico(string sXMLfile /*, other args... */)
{
    xdocDico.Load(sXMLfile);
    // Gather whatever you need and load it into your dico
}

public void flushDicInXML(string sXMLfile, Dictionary<string, string> dicWhatever)
{
    // Dump the dic in the XML doc & save
}

public void updateXMLDOM(string index, string key, string value)
{
    // Update a specific value of the XML DOM based on index or key
}
Then whenever you want, you can update the DOM and save it to disk:
xdocDico.Save(sXMLfile);
If you can afford to keep the DOM in memory performance-wise, it's pretty easy to deal with. Depending on your requirements, you may not even need the dictionary at all.

Related

PostScript memory management implementation for local and global

In PostScript you have VM to store the values of composite objects.
They can be stored in local or global VM depending on the VM allocation mode of the interpreter.
I'm working on an interpreter in C# (a language a bit similar to Java), and I can't figure out how to represent local and global VM.
Let's say I have an object:
public class StringObj : Composite {
    public string Data { get; set; }
}
The Data property (the value of StringObj) is stored either in local or global VM. But how could this be represented in C# (or Java)?
C# already has its own memory management (stack/heap/...), but these are internals of the language and the .NET framework, which I can't control.
Do I need to create my own memory structure? If so, how would/could that be represented?
Or would it be ok to just store a bool property on each Composite object to know if it is local or global, something like this:
public class StringObj : Composite {
    public string Data { get; set; }
    public bool IsGlobal { get; set; }
}
Update:
Maybe if I know how the "save" operator works, I might better understand how to implement the memory management.
What exactly does the "save" operator save?
"creates a snapshot of the current state of virtual memory"
From reading the restore operator I think it stores this:
The array packing mode (packing)
VM allocation mode (boolean)
object output format (?)
user interpreter parameters (?)
saves a copy of the current graphics state on the graphics state stack
What else does it save? The phrase "current state of virtual memory" is not very well defined.
Should I also check every object on the stack to verify whether it is composite or not and save the value of the object on the stack? Or are stacks/dictionaries untouched? Or...?
This isn't really a question anyone else can answer for you, if you are intent on writing your own PostScript interpreter.
You will need to fully understand the memory management of PostScript objects, and their lifetime, and design your own memory management around that. I think it very unlikely that you can get away without designing your own memory structure(s), I've certainly never seen a PostScript interpreter which didn't.
Again I can't answer a vague question like "how would/could that be represented?", that's much too general. There are many ways you could design a PostScript memory manager, the choice is entirely yours. If memory management interests you then presumably you will have a preferred approach, if it doesn't then stick with something simple, just make sure it covers all the basics.
By the way, is there a reason you are writing your own interpreter, other than personal satisfaction? The general consensus is that it's more than 5 man-years of work to implement a PostScript interpreter (potentially less if you use a pre-existing graphics library for rendering, assuming you have a PostScript-compatible one). That seems like a lot of work for a 30+ year old language that is not in wide usage.
Regarding save:
save saves, well, everything.... Just like the sentence you quoted, a save is a mark that you can later return to, and encapsulates everything in the PostScript VM.
I think you've missed the fact that I can make changes to a composite object, and those changes are subject to save and restore.
Try this:
%!
/mydict <</Test (this is a string)>> def
save
mydict /Test (This is not a string) put
mydict /Test get == flush
restore
mydict /Test get == flush
Notice that the content of the dictionary reverts to its pre-save value after the restore.
One interpreter I knew of used save as, essentially, a 'high water mark', combined with a copy-on-write architecture. If you performed an operation on an existing composite object which was below a save mark, then the object was copied, and the alteration made to the copy. Then a restore simply freed everything back to the last save mark. However, details of implementation are not specified; provided the interpreter behaves correctly you can do anything you like.
Note that objects in global VM are not subject to save and restore. You also need to be careful about the stack contents when doing a restore: if objects on the stack would be discarded by the restore, that triggers an invalidrestore error.
Note 2: the job server loop's save and restore will affect global VM.
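To make the copy-on-write idea concrete, here is a very rough C# sketch (my own illustration, with a hypothetical LocalVm store keyed by object id; a real interpreter would be far more involved):
using System.Collections.Generic;

class LocalVm
{
    // each save level records the pre-save value of every object touched below that mark
    private readonly Stack<Dictionary<int, string>> saveMarks = new Stack<Dictionary<int, string>>();
    private readonly Dictionary<int, string> live = new Dictionary<int, string>();

    public void Save()
    {
        saveMarks.Push(new Dictionary<int, string>());
    }

    public void Put(int objectId, string newValue)
    {
        // copy-on-write: the first time an existing object is changed below the
        // current save mark, remember its old value so Restore can roll it back
        if (saveMarks.Count > 0 && live.ContainsKey(objectId) && !saveMarks.Peek().ContainsKey(objectId))
        {
            saveMarks.Peek()[objectId] = live[objectId];
        }
        live[objectId] = newValue;
    }

    public string Get(int objectId)
    {
        return live[objectId];
    }

    public void Restore()
    {
        // roll every touched object back to its saved value
        // (objects created after the save are ignored in this simplification)
        foreach (KeyValuePair<int, string> saved in saveMarks.Pop())
        {
            live[saved.Key] = saved.Value;
        }
    }
}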

Update only the changes that are different

I have an Entity Set employee_table. I'm getting the data from an Excel sheet, which I have loaded into memory, and the user clicks Save to save the changes in the DB. That all works fine for inserting records the first time; no issue there.
But how can I update only the changes that were made? Meaning, let's say I have 10 rows and 5 columns, and out of the 10 rows say the 7th was modified, and out of the 5 columns say the 3rd was modified; I need to update only those changes and keep the existing values of the other columns.
I can do it by checking if (myexistingItem.Name != dbItem.Name) { //update } but that's very tedious and not efficient, and I'm sure there is a better way to handle it.
Here is what I have so far.
var excelData = SessionWrapper.GetSession_Model().DataModel.OrderBy(x => x.LocalName).ToList();
var dbData = context.employee_master.OrderBy(x => x.localname).ToList();
var dbEntity = new employee_master();

if (dbData.Count > 0)
{
    //update
    foreach (var dbItem in dbData)
    {
        foreach (var xlData in excelData)
        {
            if (dbItem.customer == xlData.Customer)
            {
                dbEntity.customer = xlData.Customer;
            }
            //...do check rest of the props....
            db.Entry(dbEntity).State = EntityState.Modified;
            db.employee_master.Add(dbEntity);
        }
    }
    //save
    db.SaveChanges();
}
else
{
    //insert
}
You can make this checking more generic with reflection.
Using this answer to get the value by property name.
public static object GetPropValue(object src, string propName)
{
    return src.GetType().GetProperty(propName).GetValue(src, null);
}
Using this answer to set the value by property name.
public static void SetPropertyValue(object obj, string propName, object value)
{
    obj.GetType().GetProperty(propName).SetValue(obj, value, null);
}
And this answer to list all properties.
public static void CopyIfDifferent(Object target, Object source)
{
    foreach (var prop in target.GetType().GetProperties())
    {
        var targetValue = GetPropValue(target, prop.Name);
        var sourceValue = GetPropValue(source, prop.Name);
        if (!object.Equals(targetValue, sourceValue)) // null-safe comparison
        {
            SetPropertyValue(target, prop.Name, sourceValue);
        }
    }
}
Note: If you need to exclude some properties, you can implement that very easily by passing a list of properties to the method and checking in the if whether each one should be excluded or not.
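For example, a hypothetical call site inside the matching branch of the loop (this assumes the database entity and the in-memory excel object expose the same property names):
CopyIfDifferent(dbItem, xlData); // copies across only the property values that actually differ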
Update:
I am updating this answer to provide a little more context as to why I suggested not going with a hand-rolled reflection-based solution for now; I also want to clarify that there is nothing wrong with such a solution per se, once you have identified that it fits the bill.
First of all, I assume from the code that this is a work in progress and therefore not complete. In that case, I feel that the code doesn't need more complexity before it's done and a hand-rolled reflection-based approach is more code for you to write, test, debug and maintain.
For example, right now you seem to have a situation where there is a simple 1:1 simple copy from the data in the excel to the data in the employee_master object. So in that case reflection seems like a no-brainer, because it saves you loads of boring manual property assignments.
But what happens when HR (or whoever uses this app) come back to you with the requirement: If Field X is blank on the Excel sheet, then copy the value "Blank" to the target field, unless it's Friday, in which case copy the value "N.A".
Now a generalised solution has to accommodate custom business logic and could start to get burdensome. I have been in this situation, and unless you are very careful it tends to turn into a mess in the long run.
I just wanted to point this out - and recommend at least looking at Automapper, because this already provides one very proven way to solve your issue.
In terms of efficiency concerns, they are only mentioned because the question mentioned them, and I wanted to point out that there are greater inefficiencies at play in the code as posted as opposed to the inefficiency of manually typing 40+ property assignments, or indeed the concern of only updating changed fields.
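For illustration, a minimal AutoMapper sketch (the ExcelRow type name is hypothetical here, and the exact configuration API varies a little between AutoMapper versions):
using AutoMapper;

// one-time configuration: map the excel row type onto the entity type
var config = new MapperConfiguration(cfg => cfg.CreateMap<ExcelRow, employee_master>());
var mapper = config.CreateMapper();

// then, for each matched pair, copy matching properties onto the tracked entity
mapper.Map(xlData, dbEntity);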
Why not rewrite the loop:
foreach (var xlData in excelData)
{
    //find existing record in database data:
    var existing = dbData.FirstOrDefault(d => d.customer == xlData.Customer);
    if (existing != null)
    {
        //it already exists in database, update it
        //see below for notes on this.
    }
    else
    {
        //it doesn't exist, create employee_master and insert it to context
        //or perform validation to see if the insert can be done, etc.
    }
    //and commit:
    context.SaveChanges();
}
This lets you avoid the initial if(dbData.Count>0) because you will always insert any row from the excel sheet that doesn't have a matching entry in dbData, so you don't need a separate block of code for first-time insertion.
It's also more efficient than the current loop because right now you are iterating every object in dbData for every object in xlData; that means if you have 1,000 items in each you have a million iterations...
Notes on the update process and efficiency in general
(Note: I know the question wasn't directly about efficiency, but since you mentioned it in the context of copying properties, I just wanted to give some food for thought)
Unless you are building a system that has to do this operation for multiple entities, I'd caution against adding more complexity to your code by building a reflection-based property copier.
If you consider the amount of properties you have to copy (i.e the number of foo.x = bar.x type statements) , and then consider the code required to have a robust, fully tested and provably efficient reflection-based property copier (i.e with built-in cache so you don't have to constantly re-reflect type properties, a mechanism to allow you to specify exceptions, handling for edge cases where for whatever unknown reason you discover that for random column X the value "null" is to be treated a little differently in some cases, etc), you may well find that the former is actually significantly less work :)
Bear in mind that even the fastest reflection-based solution will always still be slower than a good old fashioned foo.x = bar.x assignment.
By all means if you have to do this operation for 10 or 20 separate entities, consider the general case, otherwise my advice would be, manually write the property copy assignments, get it right and then think about generalising - or look at Automapper, for example.
In terms of only updating field that have changed - I am not sure you even need to. If the record exists in the database, and the user has just presented a copy of that record which they claim to be the "correct" version of that object, then just copy all the values they gave and save them to the database.
The reason I say this is because in all likelihood, the efficiency of only sending for example 4 modified fields versus 25 or whatever, pales into insignificance next to the overhead of the actual round-trip to the database itself; I'd be surprised if you were able to observe a meaningful performance increase in these kinds of operations by not sending all columns - unless of course all columns are NVARCHAR(MAX) or something :)
If concurrency is an issue (i.e. other users might be modifying the same data) then include a ROWVERSION type column in the database table, map it in Entity Framework, and handle the concurrency issues if and when they arise.
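A hedged sketch of that last point with EF6 (it assumes the table's rowversion column is already mapped in the model, e.g. via the [Timestamp] attribute):
using System.Data.Entity;
using System.Data.Entity.Infrastructure;

static void SaveWithConcurrencyCheck(DbContext db)
{
    try
    {
        db.SaveChanges();
    }
    catch (DbUpdateConcurrencyException)
    {
        // another user changed the same row since it was loaded;
        // reload the entity and merge, or surface the conflict to the user
    }
}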

Serializing a Dictionary to disk?

We have a Hashtable (specifically the C# Dictionary class) that holds several thousands/millions of (Key,Value) pairs for near O(1) search hits/misses.
We'd like to be able to flush this data structure to disk (serialize it) and load it again later (deserialize) such that the internal hashtable of the Dictionary is preserved.
What we do right now:
Load from Disk => List<KVEntity>. (KVEntity is serializable. We use Avro to serialize - can drop Avro if needed)
Read every KVEntity from array => dictionary. This regenerates the dictionary/hashtable internal state.
< System operates, Dictionary can grow/shrink/values change etc >
When saving, read from the dictionary into array (via myKVDict.Values.SelectMany(x => x) into a new List<KVEntity>)
We serialize the array (List<KVEntity>) to disk to save the raw data
Notice that during our save/restore we lose the internal hashtable/dictionary state and have to rebuild it each time.
We'd like to serialize to/from the Dictionary directly (including its internal "live" state) instead of using an intermediate array just for the disk I/O. How can we do that?
Some pseudo code:
// The actual "node" that has information. Both myKey and myValue have actual data worth storing
public class KVEntity
{
    public string myKey { get; set; }
    public DataClass myValue { get; set; }
}

// unit of disk IO/serialization
public List<KVEntity> myKVList { get; set; }

// unit of run time processing. The string key is KVEntity.myKey
public Dictionary<string, KVEntity> myKVDict { get; set; }
Storing the internal state of the Dictionary instance would be bad practice - a key tenet of OOP is encapsulation: that internal implementation details are deliberately hidden from the consumer.
Furthermore, the mapping algorithm used by Dictionary might change across different versions of the .NET Framework, especially given that CIL assemblies are designed to be forward-compatible (i.e. a program written against .NET 2.0 will generally work against .NET 4.5).
Finally, there are no real performance gains from serialising the internal state of the dictionary. It is much better to use a well-defined file format with a focus on maintainability than speed. Besides, if the dictionary contains "several thousands" of entries then it should load from disk in under 15ms by my reckoning (assuming you have an efficient on-disk format). Also, a data structure optimised for RAM will not necessarily work well on disk, where sequential reads/writes are better.
Your post is very adamant about working with the internal state of the dictionary, but your existing approach seems fine (although it could do with some optimisations). If you reveal more details we can help you make it faster.
Optimisations
The main issue I see with your existing implementation is the conversion to/from arrays and lists, which is unnecessary given that Dictionary is directly enumerable.
I would do something like this:
Dictionary<String, TFoo> dict = ... // where TFoo : new() and implements arbitrary Serialize(BinaryWriter) and Deserialize(BinaryReader) methods

using(FileStream fs = File.OpenWrite("filename.dat"))
using(BinaryWriter wtr = new BinaryWriter(fs, Encoding.UTF8)) {
    wtr.Write( dict.Count );
    foreach(String key in dict.Keys) {
        wtr.Write( key );
        wtr.Write('\0');
        dict[key].Serialize( wtr );
        wtr.Write('\0'); // assuming NULL characters can work as record delimiters for safety.
    }
}
Assuming that your TFoo's Serialize method is fast, I really don't think you'll get any faster speeds than this approach.
Implementing a de-serializer is an exercise for the reader, but should be trivial. Note how I stored the size of the dictionary to the file, so the returned dictionary can be set with the correct size when it's created, thus avoiding the re-balancing problem that #spender describes in his comment.
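For what it's worth, a matching deserializer might look something like this (IBinarySerializable is just a stand-in interface for the Serialize/Deserialize pair assumed above):
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public interface IBinarySerializable
{
    void Serialize(BinaryWriter writer);
    void Deserialize(BinaryReader reader);
}

static Dictionary<String, TFoo> Load<TFoo>(string path) where TFoo : IBinarySerializable, new()
{
    using (FileStream fs = File.OpenRead(path))
    using (BinaryReader rdr = new BinaryReader(fs, Encoding.UTF8))
    {
        int count = rdr.ReadInt32();
        var dict = new Dictionary<String, TFoo>(count); // pre-sized to avoid re-balancing
        for (int i = 0; i < count; i++)
        {
            String key = rdr.ReadString();
            rdr.ReadChar();             // skip the '\0' written after the key
            TFoo value = new TFoo();
            value.Deserialize(rdr);
            rdr.ReadChar();             // skip the trailing '\0' record delimiter
            dict[key] = value;
        }
        return dict;
    }
}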
So we're going to stick with our existing strategy, given Dai's reasoning and the fact that we have C# and Java compatibility to maintain (which means the extra tree-state bits of the C# Dictionary would be dropped on the Java side anyway, since it loads only the node data as it does right now).
For later readers still interested in this, I found a very good response here that somewhat answers the question posed. A critical difference is that that answer is for B+ trees, not Dictionaries, although in practical applications the two data structures are very similar in performance; B+ tree performance is closer to Dictionaries than to regular trees (binary, red-black, AVL, etc.). Specifically, Dictionaries deliver near O(1) performance (but no "select from a range" abilities) while B+ trees are O(log_b(X)), where the base b is usually large, which makes them very performant compared to regular trees where b = 2. I'm copy-pasting it here for completeness, but all credit goes to csharptest.net for the B+ tree code, tests, benchmarks and writeup(s).
For completeness I'm going to add my own implementation here.
Introduction - http://csharptest.net/?page_id=563
Benchmarks - http://csharptest.net/?p=586
Online Help - http://help.csharptest.net/
Source Code - http://code.google.com/p/csharptest-net/
Downloads - http://code.google.com/p/csharptest-net/downloads
NuGet Package - http://nuget.org/List/Packages/CSharpTest.Net.BPlusTree

In memory representation of large data

Currently, I am working on a project where I need to bring GBs of data onto the client machine to do some task, and the task needs the whole data set, as it does some analysis on the data and helps in the decision-making process.
So the question is: what are the best practices and suitable approaches for managing that much data in memory without hampering the performance of the client machine and the application?
Note: at application load time we can spend time bringing the data from the database to the client machine; that's totally acceptable in our case. But once the data is loaded into the application at start-up, performance is very important.
This is a little hard to answer without a problem statement, i.e. what problems you are currently facing, but the following is just some thoughts, based on some recent experiences we had in a similar scenario. It is, however, a lot of work to change to this type of model - so it also depends how much you can invest trying to "fix" it, and I can make no promise that "your problems" are the same as "our problems", if you see what I mean. So don't get cross if the following approach doesn't work for you!
Loading that much data into memory is always going to have some impact, however, I think I see what you are doing...
When loading that much data naively, you are going to have many (millions?) of objects and a similar-or-greater number of references. You're obviously going to want to be using x64, so the references will add up - but in terms of performance the biggest problem is going to be garbage collection. You have a lot of objects that can never be collected, but the GC is going to know that you're using a ton of memory, and is going to try anyway periodically. This is something I looked at in more detail here; in the graph from that post, the "spikes" are all GC killing performance.
For this scenario (a huge amount of data loaded, never released), we switched to using structs, i.e. loading the data into:
struct Foo {
    private readonly int id;
    private readonly double value;
    public Foo(int id, double value) {
        this.id = id;
        this.value = value;
    }
    public int Id { get { return id; } }
    public double Value { get { return value; } }
}
and stored those directly in arrays (not lists):
Foo[] foos = ...
The significance of that is that because some of these structs are quite big, we didn't want them being copied lots of times on the stack, but with an array you can do:
private void SomeMethod(ref Foo foo) {
    if(foo.Value == ...) { blah blah blah }
}
// call ^^^
int index = 17;
SomeMethod(ref foos[index]);
Note that we've passed the object directly - it was never copied; foo.Value is actually looking directly inside the array. The tricky bit starts when you need relationships between objects. You can't store a reference here, as Foo is a struct; what you can do, though, is store the index (into the array). For example:
struct Customer {
    ... more not shown
    public int FooIndex { get { return fooIndex; } }
}
Not quite as convenient as customer.Foo, but the following works nicely:
Foo foo = foos[customer.FooIndex];
// or, when passing to a method, SomeMethod(ref foos[customer.FooIndex]);
Key points:
we're now using half the size for "references" (an int is 4 bytes; a reference on x64 is 8 bytes)
we don't have several-million object headers in memory
we don't have a huge object graph for GC to look at; only a small number of arrays that GC can look at incredibly quickly
but it is a little less convenient to work with, and needs some initial processing when loading
additional notes:
strings are a killer; if you have millions of strings, then that is problematic; at a minimum, if you have strings that are repeated, make sure you do some custom interning (not string.Intern, that would be bad) to ensure you only have one instance of each repeated value rather than 800,000 strings with the same contents - see the sketch after these notes
if you have repeated data of finite length, rather than sub-lists/arrays, you might consider a fixed array; this requires unsafe code, but avoids another myriad of objects and references
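The interning sketch mentioned above - a hypothetical helper, just to show the idea of keeping one shared instance per distinct string value:
using System.Collections.Generic;

sealed class StringPool
{
    private readonly Dictionary<string, string> pool = new Dictionary<string, string>();

    public string Intern(string value)
    {
        if (value == null) return null;
        string existing;
        if (pool.TryGetValue(value, out existing)) return existing; // reuse the stored instance
        pool.Add(value, value);                                     // first time we've seen this value
        return value;
    }
}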
As an additional footnote, with that volume of data, you should think very seriously about your serialization protocols, i.e. how you're sending the data down the wire. I would strongly suggest staying far away from things like XmlSerializer, DataContractSerializer or BinaryFormatter. If you want pointers on this subject, let me know.

Persist List<int> through App Shutdowns

Short Version
I have a list of ints that I need to figure out how to persist through application shutdown. Not forever, but, you get the idea, I can't have the list disappear before it is dealt with. The method for dealing with it will remove entries from the list.
What are my options? XML?
Background
We have a WinForms app that uses local SQL Express DBs that participate in merge replication with a central server. This will be difficult to explain, but we also have (kind of) an I-Series 400 server that a small portion of data gets written to as well. For various reasons the I-Series is not available through replication, and as such all "writes" to it need to be done while it is available.
My first thought to solve this was to simply have a List object that stored the PKs that needed to be updated. Then, after a successful sync, I would have a method that checks that list and calls UpdateISeries() once for each PK in the list. I am pretty sure this would work, except in a case where the app is shut down inappropriately or loses power, etc. So, does anyone have better ideas on how to solve this? An XML file maybe, though I have never done that. I worry about actually creating a table in SQL Express because of replication... maybe unfounded, but...
For reference, UpdateISeries(int PersonID) is an existing method in a DLL that is used internally. Rewriting it, as a potential solution to this issue, really isn't viable at this time.
Sounds like you need to serialize and deserialize some objects.
See these .NET topics to find out more.
From the linked page:
Serialization is the process of converting the state of an object into a form that can be persisted or transported. The complement of serialization is deserialization, which converts a stream into an object. Together, these processes allow data to be easily stored and transferred.
If it is not important for the on-disk format to be human readable, and you want it to be as small as possible, look at binary serialization.
Using the serialization mechanism is probably the way to go. Here is an example using the BinaryFormatter.
public void Serialize(List<int> list, string filePath)
{
    using (Stream stream = File.OpenWrite(filePath))
    {
        var formatter = new BinaryFormatter();
        formatter.Serialize(stream, list);
    }
}

public List<int> Deserialize(string filePath)
{
    using (Stream stream = File.OpenRead(filePath))
    {
        var formatter = new BinaryFormatter();
        return (List<int>)formatter.Deserialize(stream);
    }
}
If you already have and interact with a SQL database, use that, to get simpler code with fewer dependencies. Replication can be configured to ignore additional tables (even if you have to place them in another schema). This way, you can avoid a number of potential data corruption problems.
