Multi-sorted data structure - C#

I'm trying to implement A* and I ran into a problem. I have a set where I need to find the minimum value of a given function, but I also need to be able to check whether a given cell is in that set. In order to do this efficiently, I need the set to be sorted both by position and by value.

It doesn't seem too difficult to write such a data structure: I just need one set sorted by position and one by value, and have each refer to the other. There are two problems with this. First, in order to do it well, the structures need to be able to refer to parts of each other; there's no point in searching through a tree in log time if I can just point to the particular element. To do that, I'd pretty much need to rewrite trees from scratch. Second, it doesn't seem like the sort of thing I should be writing; data structures are supposed to be part of the libraries.

What's the name of the sort of data structure I need, and where can I find a C# library for it?

There is no need for the two data structures to interact at all. Just have two data structures side by side. Make sure that when you add/remove an item you add/remove it from both. You can then fetch the minimum value of either collection based on which property you're interested in.
The only real reason to create a new data structure would be to ensure that adding/removing items was kept in sync between the two collections. There would be no need to manipulate the actual trees explicitly.
Such a custom type would look something like this (other operations omitted; they all just delegate to first and/or second).
public class SetPair<T>
{
    private SortedSet<T> first;
    private SortedSet<T> second;

    public SetPair(IComparer<T> firstComparer, IComparer<T> secondComparer)
    {
        first = new SortedSet<T>(firstComparer ?? Comparer<T>.Default);
        second = new SortedSet<T>(secondComparer ?? Comparer<T>.Default);
    }

    public T FirstMin { get { return first.Min; } }
    public T SecondMin { get { return second.Min; } }

    public bool Add(T item)
    {
        // Use & rather than && so a failed add to `first` cannot
        // short-circuit and leave the two sets out of sync.
        return first.Add(item) & second.Add(item);
    }

    public bool Remove(T item)
    {
        return first.Remove(item) & second.Remove(item);
    }
}
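For the A* open set described in the question, usage might look like this (a minimal sketch; Cell and its two comparers are hypothetical stand-ins for whatever your grid uses):
class Cell
{
    public int X, Y;   // grid position
    public double F;   // f-score for A*
}

class ByPosition : IComparer<Cell>
{
    public int Compare(Cell a, Cell b)
    {
        int c = a.X.CompareTo(b.X);
        return c != 0 ? c : a.Y.CompareTo(b.Y);
    }
}

class ByScore : IComparer<Cell>
{
    public int Compare(Cell a, Cell b)
    {
        int c = a.F.CompareTo(b.F);
        // Tie-break on position so distinct cells with equal scores both survive.
        return c != 0 ? c : new ByPosition().Compare(a, b);
    }
}

// var open = new SetPair<Cell>(new ByPosition(), new ByScore());
// open.Add(new Cell { X = 0, Y = 1, F = 3.5 });
// Cell best = open.SecondMin;  // cheapest cell by f-score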

Related

Ordered collection based on DateTime

I'm looking for a way to implement a collection which guarantees order based on a DateTime value. My initial idea was to use a SortedSet<T> with a custom IComparer<T>, like this:
internal class DateTimeComparer : IComparer<MyType> {
    public int Compare(MyType x, MyType y) {
        return x.DateTimeProp.CompareTo(y.DateTimeProp);
    }
}
var sortedSet = new SortedSet<MyType>(new DateTimeComparer());
However, this doesn't work: the set seems to override/replace items which have the exact same timestamp. To validate this assumption, I created two collections: a simple list which was sorted by the DateTime property after it was populated, and one based on the SortedSet<T>. The set-based collection had several entries missing, which happened to be the ones with the exact same timestamp.
What other options are there to implement such a collection?
An efficient way to maintain a sorted collection of items is to use a binary search tree. You could of course build your own, but the SortedSet<T> class is implemented using a red-black binary search tree, so it seems smarter to reuse that class, which is exactly what you are trying to do.
The ordering of items in SortedSet<T> is controlled by comparing pairs of items via the IComparer<T>.Compare method. If this method returns 0, the two items are considered equal and only one of them will be stored in the set. In your case, with DateTimeComparer, the problem is that only a single MyType instance with a specific DateTimeProp value can be stored in the set.
To solve this problem, you have to make sure that distinct MyType instances never compare as equal via the DateTimeComparer.Compare method. You can modify your code to achieve this:
// ObjectIDGenerator lives in System.Runtime.Serialization.
class DateTimeComparer : IComparer<MyType> {
    readonly ObjectIDGenerator idGenerator = new ObjectIDGenerator();

    public int Compare(MyType x, MyType y) {
        if (x.DateTimeProp != y.DateTimeProp)
            return x.DateTimeProp.CompareTo(y.DateTimeProp);
        // Same timestamp: fall back to a unique per-instance ID so that
        // distinct instances never compare as equal.
        bool firstTime;
        var xId = idGenerator.GetId(x, out firstTime);
        var yId = idGenerator.GetId(y, out firstTime);
        return xId.CompareTo(yId);
    }
}
If the two instances have different values of DateTimeProp then they should be ordered according to these. This is handled by the initial if statement.
If the two values have the same DateTimeProp value, they need to be ordered by some other criteria. You can use other properties of MyType, but there might be cases where those properties are also equal, and it is important that the method never returns 0 unless x and y refer to the same instance (i.e. ReferenceEquals(x, y) is true).
To handle this you can use an ObjectIDGenerator, which assigns unique 64-bit IDs to distinct instances. These can then be compared to provide an ordering.
Note that the ordering of items with the same DateTimeProp value will be arbitrary but consistent. To control this ordering you can compare other properties of MyType first, but eventually you have to fall back to the generated ID when all properties of two distinct instances are the same.
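A quick sanity check of the fix (a sketch; it assumes MyType exposes a settable DateTimeProp, per the question):
var set = new SortedSet<MyType>(new DateTimeComparer());
var now = DateTime.UtcNow;
set.Add(new MyType { DateTimeProp = now });
set.Add(new MyType { DateTimeProp = now });  // distinct instance, same timestamp
Console.WriteLine(set.Count);  // 2 with the ID fallback; 1 with the original comparer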

Membase server and Enyim -- integrating LINQ and/or collections

I just installed Membase and the Enyim client for .NET, and came across an article that mentions this technique for integrating LINQ:
public static IEnumerable<T> CachedQuery<T>(
    this IQueryable<T> query, MembaseClient cache, string key) where T : class
{
    object result;
    if (cache.TryGet(key, out result))
    {
        return (IEnumerable<T>)result;
    }
    else
    {
        IEnumerable<T> items = query.ToList();
        cache.Store(StoreMode.Set, key, items);
        return items;
    }
}
It checks whether the required data is in the cache first and, if not, caches it and then returns it.
Currently I am using a Dictionary<String, List<T>> in my application and want to replace this with a membase/memcached type approach.
What about a similar pattern for adding items to a List<T>, or using LINQ operators on a cached list? It seems to me that it could be a bad idea to store an entire List<T> in cache under a single key and have to retrieve it, add to it, and then re-set it each time you want to add an element. Or is this an acceptable practice?
public bool Add(T item)
{
    object list;
    if (cache.TryGet(this.Key, out list))
    {
        var _list = list as List<T>;
        _list.Add(item);
        return cache.Store(StoreMode.Set, this.Key, _list);
    }
    else
    {
        var _list = new List<T>(new T[] { item });
        return cache.Store(StoreMode.Set, this.Key, _list);
    }
}
How are collections usually handled in a caching situation like this? Are hashing algorithms usually used instead, or some sort of key-prefixing system to identify 'Lists' of type T within the key-value store of the cache?
It depends on several factors:
Is this supposed to be scalable? Is this list user-specific, and can you be certain that "Add" won't be called twice at the same time for the same list? Race conditions are a risk.
I did implement such a thing where I stored a generic list in membase, but it's user-specific, so I can be pretty certain that there will be no race condition.
You should also consider the volume of the serialized list, which may be large. In my case the lists were pretty small.
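If the list is not user-specific, one common mitigation for the read-modify-write race is optimistic concurrency with check-and-set. A rough sketch; it assumes your Enyim/Membase client version exposes GetWithCas and Cas (verify against the API you actually have):
public static bool AddWithCas<T>(MembaseClient cache, string key, T item)
{
    // Retry until no other writer touched the key between our read and write.
    for (int attempt = 0; attempt < 5; attempt++)
    {
        var current = cache.GetWithCas<List<T>>(key);
        var list = current.Result ?? new List<T>();
        list.Add(item);

        // The write succeeds only if the stored version still matches current.Cas.
        // If the key does not exist yet, a plain Store may be needed instead.
        if (cache.Cas(StoreMode.Set, key, list, current.Cas).Result)
            return true;
    }
    return false;  // gave up under contention
}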
Not sure if it helps, but I implemented a very basic iterable list with random access over membase (via double indirection). Random access is done via a composite key (composed of several fields).
You need to:
Have a key that holds the list's length.
Have the ability to build the composite key (e.g. one or more fields from your object).
Have the value that you'd like to save (e.g. another field).
E.g.:
list_length = 3
prefix1_0-> prefix2_[field1.value][field2.value][field3.value] -> field4.value
prefix1_1-> prefix2_[field1.value][field2.value][field3.value] -> field4.value
prefix1_2-> prefix2_[field1.value][field2.value][field3.value] -> field4.value
To perform serial access, you iterate over the keys with "prefix1". To perform random access, you use the keys with "prefix2" and the fields that compose the key.
I hope it's clear enough.
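A minimal C# sketch of that double-indirection scheme (the record type, field names, and prefixes are made up for illustration; error handling and concurrency are ignored):
public class MyRecord
{
    public string Field1, Field2, Field3, Field4;
}

// prefix1_<index>       -> composite key ("prefix2_<field1><field2><field3>")
// prefix2_<f1><f2><f3>  -> the stored value (field4)
public static void Append(MembaseClient cache, MyRecord r)
{
    int length = (int)(cache.Get("list_length") ?? 0);

    string compositeKey = "prefix2_" + r.Field1 + r.Field2 + r.Field3;
    cache.Store(StoreMode.Set, "prefix1_" + length, compositeKey);  // index -> key
    cache.Store(StoreMode.Set, compositeKey, r.Field4);             // key -> value
    cache.Store(StoreMode.Set, "list_length", length + 1);
}

public static object GetByIndex(MembaseClient cache, int i)
{
    var compositeKey = (string)cache.Get("prefix1_" + i);  // first hop
    return cache.Get(compositeKey);                        // second hop
}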

capture changes to properties of an object

I have multiple business objects in my application (C#, Winforms, WinXP). When the user executes some action on the UI, each of these objects are modified and updated by different parts of the application. After each modification, I need to first check what has changed and then log these changes made to the object. The purpose of logging this is to create a comprehensive tracking of activity going on in the application.
Many of these objects contain lists of other objects, and this nesting can be several levels deep. The two main requirements for any solution would be:
capture changes as accurately as possible
keep the performance cost to a minimum.
Example of a business object:
public class MainClass1
{
    public MainClass1()
    {
        detailCollection1 = new ClassDetailCollection1();
        detailCollection2 = new ClassDetailCollection2();
    }

    private Int64 id;
    public Int64 ID
    {
        get { return id; }
        set { id = value; }
    }

    private DateTime timeStamp;
    public DateTime TimeStamp
    {
        get { return timeStamp; }
        set { timeStamp = value; }
    }

    private string category = string.Empty;
    public string Category
    {
        get { return category; }
        set { category = value; }
    }

    private string action = string.Empty;
    public string Action
    {
        get { return action; }
        set { action = value; }
    }

    private ClassDetailCollection1 detailCollection1;
    public ClassDetailCollection1 DetailCollection1
    {
        get { return detailCollection1; }
    }

    private ClassDetailCollection2 detailCollection2;
    public ClassDetailCollection2 DetailCollection2
    {
        get { return detailCollection2; }
    }

    // more collections here
}

public class ClassDetailCollection1
{
    private List<DetailType1> detailType1Collection;
    public List<DetailType1> DetailType1Collection
    {
        get { return detailType1Collection; }
    }

    private List<DetailType2> detailType2Collection;
    public List<DetailType2> DetailType2Collection
    {
        get { return detailType2Collection; }
    }
}

public class ClassDetailCollection2
{
    private List<DetailType3> detailType3Collection;
    public List<DetailType3> DetailType3Collection
    {
        get { return detailType3Collection; }
    }

    private List<DetailType4> detailType4Collection;
    public List<DetailType4> DetailType4Collection
    {
        get { return detailType4Collection; }
    }
}

// more other types like MainClass1 above...
I can assume that I will have access to the old values and new values of the object.
In that case, I can think of two ways to do this without being told explicitly what has changed.

1. Use reflection to iterate through all properties of the object and compare them with the corresponding properties of the older object, logging any properties that have changed. This approach seems more flexible, in that I would not have to worry about new properties being added to any of the objects, but it also seems performance-heavy.

2. Log changes in the setters of all the properties of all the objects. Other than the fact that this will require me to change a lot of code, it seems more brute-force: it will be maintenance-heavy and inflexible if someone updates any of the object types. But it may also be performance-light, since I will not need to check what changed and can log exactly which properties were changed.

Suggestions for any better approaches and/or improvements to the above approaches are welcome.
I developed a system like this a few years ago. The idea was to track changes to an object and store those changes in a database, like version control for objects.
The best approach is called Aspect-Oriented Programming, or AOP. You inject "advice" into the setters and getters (actually all method execution, getters and setters are just special methods) allowing you to "intercept" actions taken on the objects. Look into Spring.NET or PostSharp for .NET AOP solutions.
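If pulling in an AOP framework is more than you want, the same interception idea can be hand-rolled by routing every setter through one helper; this is a minimal sketch (ChangeLogger is a placeholder for whatever audit sink you use, not any framework's API):
public abstract class ChangeTrackedObject
{
    // Placeholder sink; wire this up to your logging infrastructure.
    public static Action<string, object, object> ChangeLogger =
        (property, oldValue, newValue) =>
            Console.WriteLine("{0}: {1} -> {2}", property, oldValue, newValue);

    // Every setter delegates here, so the interception lives in one place.
    protected void SetProperty<T>(ref T field, T value, string propertyName)
    {
        if (!EqualityComparer<T>.Default.Equals(field, value))
        {
            ChangeLogger(propertyName, field, value);  // field still holds the old value
            field = value;
        }
    }
}

public class TrackedCategoryHolder : ChangeTrackedObject
{
    private string category = string.Empty;
    public string Category
    {
        get { return category; }
        set { SetProperty(ref category, value, "Category"); }
    }
}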
I may not be able to give you a good answer, but I will tell you that in the overwhelming majority of cases, option 1 is NOT a good answer. We're dealing with a very similar reflective "graph-walker" in our project; it seemed like a good idea at the time, but it is a nightmare, for the following reasons:
You know the object changed, but without a high level of knowledge in the reflective "change handling" class about the workings of objects above it, you may not know why. If that information is important to you, you have to give it to the change handler, most likely through a field or property on the domain object, requiring changes to your domain and imparting knowledge to the domain about the business logic.
Changes can affect multiple objects, but logs for changes at every level may not be desired; for instance, the client may not want to see a change to a Borrower's outstanding loan count in the log when a new Loan is approved, but they do want to see changes due to consolidations. Managing rules about logging in these cases requires change handling classes to know about more of the structure than just one object, which can very quickly make a change-handling object VERY big, and VERY brittle.
The requirements of your graph walker are probably more than you know; if your object graph includes backreferences or cross-references, the walker must know where it's been, and the simplest comprehensive way to do that is to keep a list of objects it's processed, and check the current object against those it's handled before processing it (making anti-backtracking an N^2 operation). It must also not consider changes to objects in the graph that will not be persisted when you persist the top level (references that are not "cascaded"). NHibernate gives you the ability to plug into its own graph-walker and abide by the cascade rules in your mappings, which helps, but if you're using a roll-your-own DAL, or you DO want to log changes to objects that NHibernate won't cascade to, you're going to have to set this all up yourself.
A piece of logic in a handler may make a change that requires an update to a "parent" object (updating a calculated field, perhaps). Now, you have to go back and re-evaluate the changed object if the change is of interest to another piece of the change handling logic.
If you have logic that requires creation and persistence of a new object, you must do one of two things; attach the new object to the graph somewhere (where it may or may not be picked up by the walker), or persist the new object in its own transaction (if you're using an ORM, the object CANNOT reference an object from the other graph with a "cascade" setting that will cause it to be saved first).
Finally, being highly reflective in both walking the graph and finding the "handlers" for a particular object, passing a complex tree into such a framework is a guaranteed speed bump in your application.
I think you'll save yourself a lot of headaches if you skip the "change handler" reflective pattern, and include the creation of audit logs or any pre-persistence logic in the "unit of work" you're performing up at the business layer, through a set of "audit loggers". This allows the logic making the changes to employ an algorithm selection pattern such as Command or Strategy to tell your audit framework exactly what kind of change is happening, so it can pick the logger that will produce the required logging messages.
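As a sketch of that audit-logger idea (all of the types here are illustrative; the point is that the business logic names the change, and a matching logger renders it):
public class LoanApproved { public int LoanId; public int BorrowerId; }

public interface IAuditLogger
{
    bool CanHandle(object change);
    void Log(object change);
}

public class LoanApprovalLogger : IAuditLogger
{
    public bool CanHandle(object change) { return change is LoanApproved; }
    public void Log(object change)
    {
        var c = (LoanApproved)change;
        Console.WriteLine("Loan {0} approved for borrower {1}", c.LoanId, c.BorrowerId);
    }
}

// The unit of work records what it did; the trail picks the logger that
// knows how to describe that kind of change.
public class AuditTrail
{
    private readonly List<IAuditLogger> loggers = new List<IAuditLogger>();
    public void Register(IAuditLogger logger) { loggers.Add(logger); }
    public void Record(object change)
    {
        foreach (var logger in loggers)
            if (logger.CanHandle(change))
                logger.Log(change);
    }
}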
See how Adempiere did its change log: http://wiki.adempiere.net/Change_Log

Returning multiple values

I have a function that identifies coordinates on a page, and I am returning them as a
Dictionary<int, Collection<Rectangle>> GetDocumentCoordinates(int DocumentId)
However, later I need information about each page - if it was validated, what is the page resolution, color/bw, etc. I could create another function and run through pretty much the same result set as the previous function and get that information.
Dictionary<int, PageInfo> GetDocumentAttributes(int DocumentId)
Another alternative would be to add a ref parameter so I can get these values back.
Dictionary<int, Collection<Rectangle>> GetCoordinates(int DocumentId, ref Dictionary<int, PageInfo> PageAttributes)
Yet another alternative is to create an encompassing class that contains the Dictionary and the page information:
class DocumentInfo
{
    Dictionary<int, Collection<Rectangle>> Coordinates { get; set; }
    Dictionary<int, PageInfo> PageAttributes { get; set; }
}
and then define:
DocumentInfo GetDocumentInfo(int DocumentId);
I'm leaning towards the last option, but your insights are very much appreciated.
The last option is definitely the best. I've found that, when taking or returning complex data with multiple meanings, creating a complex type to encapsulate this data is the best practice for a number of reasons.
First, your return data probably will change as your design changes. Encapsulating this data in an object allows you to alter what it carries and how your methods operate on this data without altering the interfaces of your objects. Obviously, your data object shouldn't implement an interface; at most, have a base class with the minimum interface and then pass references to the base around.
Second, you may find your data gets complex to the point where you will need to perform validation on it. Rather than have this validation in all the methods of your classes where you act upon this data, you can easily wrap this up in the data class. Single responsibility, etc.
It seems like you need a lot of data out. The last option should be fine, and is extensible; if you wanted (to simplify the Dictionary<,> usage), you could encapsulate things a bit more, but the fact that C# doesn't directly support named indexed properties means you'd need a few classes, unless you just wrap with methods like:
class DocumentInfo {
    Dictionary<int, Collection<Rectangle>> rectangles = ...
    public Collection<Rectangle> GetRectangles(int index) {
        return rectangles[index]; // might want to clone to
                                  // protect against mutation
    }
    Dictionary<int, PageInfo> pages = ...
    public PageInfo GetPageInfo(int index) {
        return pages[index];
    }
}
I'm not quite clear what the int is, so I can't say whether this is sensible (so I've just left it alone).
Also, with the ref alternative, you probably wouldn't need ref; it would be sufficient to use out.
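For reference, the out-based shape would look like this (just a sketch, reusing the names from the question):
Dictionary<int, Collection<Rectangle>> GetCoordinates(
    int documentId, out Dictionary<int, PageInfo> pageAttributes);

// Caller:
// Dictionary<int, PageInfo> attributes;
// var coordinates = GetCoordinates(documentId, out attributes);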

What is a good design when trying to build objects from a list of key value pairs?

So if I have a method that parses a text file and returns a list of lists of key-value pairs, and I want to create objects from the kvps returned (each list of kvps represents a different object), what would be the best method?
The first method that pops into mind is pretty simple, just keep a list of keywords:
private const string NAME = "name";
private const string PREFIX = "prefix";
and check the keys I get against the constants defined above. This is a fairly core piece of the project I'm working on, though, so I want to do it well; does anyone have any more robust suggestions (not saying there's anything inherently un-robust about the above method - I'm just asking around)?
Edit:
More details have been asked for. I'm working on a little game in my spare time, and I am building up the game world with configuration files. There are four: one defines all creatures, another defines all areas (and their locations in a map), another all objects, and a final one defines various configuration options and things that don't fit elsewhere. With the first three configuration files, I will be creating objects based on the content of the files. It will be quite text-heavy, so there will be a lot of strings: things like names, plurals, prefixes - that sort of thing. The configuration values are all like so:
-
key: value
key: value
-
key: value
key: value
-
Where the '-' line denotes a new section/object.
Take a deep look at the XmlSerializer. Even if you are constrained to not use XML on-disk, you might want to copy some of its features. This could then look like this:
public class DataObject {
    [Column("name")]
    public string Name { get; set; }

    [Column("prefix")]
    public string Prefix { get; set; }
}
Be careful though to include some kind of format version in your files, or you will be in hell's kitchen come the next format change.
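A sketch of what "copying some of its features" could look like: a hypothetical ColumnAttribute plus a reflection-based binder that fills one object from one parsed section (both types are illustrative, not library APIs):
[AttributeUsage(AttributeTargets.Property)]
public class ColumnAttribute : Attribute
{
    public string Name { get; private set; }
    public ColumnAttribute(string name) { Name = name; }
}

public static class SectionBinder
{
    // Fill a new T from one section's key/value pairs using the [Column] mappings.
    public static T Bind<T>(IEnumerable<KeyValuePair<string, string>> section) where T : new()
    {
        var obj = new T();
        foreach (var prop in typeof(T).GetProperties())
        {
            var column = (ColumnAttribute)Attribute.GetCustomAttribute(prop, typeof(ColumnAttribute));
            if (column == null) continue;
            foreach (var pair in section)
                if (pair.Key == column.Name)
                    prop.SetValue(obj, Convert.ChangeType(pair.Value, prop.PropertyType), null);
        }
        return obj;
    }
}

// Usage: var dataObject = SectionBinder.Bind<DataObject>(sectionPairs);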
Making a lot of unwarranted assumptions, I think that the best approach would be to create a Factory that will receive the list of key value pairs and return the proper object or throw an exception if it's invalid (or create a dummy object, or whatever is better in the particular case).
class Factory {
    public static IConfigurationObject Create(List<string> keyValuePair) {
        switch (keyValuePair[0]) {
            case "x":
                return new x(keyValuePair[1]);
            /* etc. */
            default:
                throw new ArgumentException("Wrong parameter in the file");
        }
    }
}
The strongest assumption here is that all your objects can be treated partly like the same (ie, they implement the same interface (IConfigurationObject in the example) or belong to the same inheritance tree).
If they don't, then it depends on your program flow and what you are doing with them. But nonetheless, they should :)
EDIT: Given your explanation, you could have one Factory per file type; the switch in it would be the authoritative source on the allowed types per file type, and they probably share something in common. Reflection is possible, but it's riskier because it's less obvious and self-documenting than this approach.
What do you need the objects for? The way you describe it, you'll use them as some kind of (key-wise) restricted map anyway. If you do not need some kind of inheritance, I'd simply wrap a map-like structure in an object like this:
[java-inspired pseudo-code:]
class RestrictedKVDataStore {
    const ALLOWED_KEYS = new Collection('name', 'prefix');
    Map data = new Map();

    void put(String key, Object value) {
        if (ALLOWED_KEYS.contains(key))
            data.put(key, value)
    }

    Object get(String key) {
        return data.get(key);
    }
}
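A rough C# rendering of the same idea (names kept from the pseudo-code; the allowed-key set is whatever your format defines):
class RestrictedKVDataStore
{
    private static readonly HashSet<string> AllowedKeys =
        new HashSet<string> { "name", "prefix" };

    private readonly Dictionary<string, object> data = new Dictionary<string, object>();

    public void Put(string key, object value)
    {
        if (AllowedKeys.Contains(key))
            data[key] = value;
    }

    public object Get(string key)
    {
        object value;
        return data.TryGetValue(key, out value) ? value : null;
    }
}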
You could create an interface that matched the column names, and then use the Reflection.Emit API to create a type at runtime that gave access to the data in the fields.
EDIT:
Scratch that; this still applies, but I think what you're doing is reading a configuration file and parsing it into this:
List<List<KeyValuePair<String, String>>> itemConfig =
    new List<List<KeyValuePair<String, String>>>();
In this case, we can still use a reflection factory to instantiate the objects; I'd just pass in the nested inner list to it, instead of passing each individual key/value pair.
OLD POST:
Here is a clever little way to do this using reflection:
The basic idea:
Use a common base class for each Object class.
Put all of these classes in their own assembly.
Put this factory in that assembly too.
Pass in the KeyValuePair that you read from your config, and in return it finds the class that matches KV.Key and instantiates it with KV.Value
public class KeyValueToObjectFactory
{
    private readonly Dictionary<string, Type> _kvTypes = new Dictionary<string, Type>();

    public KeyValueToObjectFactory()
    {
        // Preload the types into a dictionary so we can look them up later.
        // Obviously, you want to reuse the factory to minimize overhead, so don't
        // do something stupid like instantiate a new factory in a loop.
        foreach (Type type in typeof(KeyValueToObjectFactory).Assembly.GetTypes())
        {
            if (type.IsSubclassOf(typeof(KVObjectBase)))
            {
                _kvTypes[type.Name.ToLower()] = type;
            }
        }
    }

    public KVObjectBase CreateObjectFromKV(KeyValuePair<string, string> kv)
    {
        // Look up under the lower-cased key, matching how the types were stored.
        string kvName = kv.Key.ToLower();

        // If the type information is in our dictionary, instantiate that class.
        Type kvType;
        if (_kvTypes.TryGetValue(kvName, out kvType))
        {
            return (KVObjectBase)Activator.CreateInstance(kvType, kv.Value);
        }

        throw new ArgumentException("Unrecognized KV pair");
    }
}
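Usage would look something like this (KVObjectBase and the concrete subclass are assumptions about your assembly, per the list above):
public abstract class KVObjectBase { }

public class Name : KVObjectBase
{
    public string Value;
    public Name(string value) { Value = value; }
}

// var factory = new KeyValueToObjectFactory();
// var obj = factory.CreateObjectFromKV(new KeyValuePair<string, string>("name", "Hero"));
// obj is a Name instance constructed with "Hero".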
#David:
I already have the parser (and most of these files will be hand-written, so I decided against XML). But that looks like a really nice way of doing it; I'll have to check it out. Excellent point about versioning, too.
#Argelbargel:
That looks good too. :')
"...This is a fairly core piece of the project I'm working on though..."
Is it really?
It's tempting to just abstract it and provide a basic implementation with the intention of refactoring later on.
Then you can get on with what matters: the game.
Just a thought
Is it really?
Yes; I have thought this out. Far be it from me to do more work than necessary. :')
