Adding side notes to each property in MongoDB document

I have a collection with a collection of documents. Each document has around 20 different properties with different data types (e.g. Int, Double, String).
I am searching for an efficient way or the appropriate way to add side notes to each property.
My thought (I am using C# to model the document structure) is, for each property, instead of:
public int PageRank {get; set; }
to use:
public Dictionary<int, string> PageRank {get; set;}
This means that each item in the document is a collection of both the value and the string for the side note.
The side notes will be seen at the front-end by the user.
Any better implementation?

Idan, for performance reasons, you should consider your use case from the MongoDB point of view -- not from the object-oriented language point of view. The way it ends up looking in C# is an afterthought -- it's the DB performance that counts. So, if the side notes are mostly not needed when querying your documents, it will probably be better to place them into a separate collection, thus reducing the size of each document and enabling MongoDB to read more of them into the available memory. If the user does need to look at the side notes, you would fetch them with a separate query. You know your usage scenario better, so it's up to you to decide how to do this, but it's these kinds of design decisions that you need to concern yourself with -- and the C# code will be shaped according to your schema.
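A minimal sketch of the split described above: the main document keeps plain values, and a second document type, intended for a separate collection, carries the notes keyed by property name. The class names `Page` and `PageNotes` and the `Url` property are illustrative assumptions, and no MongoDB driver calls are shown.

```csharp
using System;
using System.Collections.Generic;

// The main document stays small; notes are looked up only on demand.
var page = new Page { Id = Guid.NewGuid(), PageRank = 7, Url = "http://example.com" };
var notes = new PageNotes { PageId = page.Id };
notes.Notes[nameof(Page.PageRank)] = "Measured last week";

// Only touch the notes when the user asks for them.
if (notes.Notes.TryGetValue(nameof(Page.PageRank), out var note))
    Console.WriteLine($"PageRank = {page.PageRank} ({note})");

// Main document: plain, strongly typed values (hypothetical shape).
public class Page
{
    public Guid Id { get; set; }
    public int PageRank { get; set; }
    public string Url { get; set; }
}

// Side notes live in their own document, meant for a separate collection.
public class PageNotes
{
    public Guid PageId { get; set; }   // refers back to Page.Id

    // Property name -> note text shown in the front end.
    public Dictionary<string, string> Notes { get; set; } = new Dictionary<string, string>();
}
```

Keying the notes by property name keeps `PageRank` an `int`, so queries and sorting on the value are unaffected by the notes.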

Related

Dynamic form with no real OOP or objects?

I am tackling a large refactor of a project. I had asked this question to confirm the direction I should go in, and I think I got the answer that I wanted, which is not to throw away years' worth of code. So now begins the challenge of refactoring the code. I've been reading Martin Fowler's and Michael Feathers' books, and they have a lot of insight, but I am looking for advice on the ultimate goal of where I want the application to be.
To reiterate the application a little: it's a dynamic forms system, with lots of validation logic and data logic between the fields. The main record that gets inserted is the set of form fields that is on the page. Another part of it is 'Actions' that you can do for a person. These 'Actions' can differ client by client, and there are hundreds of 'Actions'. There is also talk that we can somehow make an engine that can eventually take on other similar areas, where a 'person' can be something else (such as a student or an employee). So I want to build something very decoupled. We have one codebase, but different DBs for different clients. The set of form fields on the page is dynamic, but the DB is not - the fields are translated into specific DB tables via stored procs. So the generic set of fields is sent to the stored proc, and the stored proc then decides what to do with the fields (figures out which table each needs to go to). These tables are in fact pretty static: they are not really dynamic, and there is a certain structure to them.
What I'm struggling with specifically is how to set up a good way to do the dynamic form control page. It seems the majority of the logic will be in code on the UI/aspx.cs page, because it's loading controls onto the webpage. Is there some way I can do this in a streamlined fashion, so the aspx.cs page isn't 5000 lines long? I have a 'FORM' object, and one of its properties is 'FIELDS'. This object is loaded up in the business layer and the data layer, but on the front end it has to loop through the FIELDS and output the controls onto the page. Some way to control the placement would be useful, too - not sure how to get that into this model....
Also, from another point of view - how can I 'really' get this into an object-oriented structure? Technically, they can create forms of anything, and those form fields can represent any object. For example, today they can create a set of form fields that represents a 'person' - tomorrow they can create a set of form fields that represents a 'furniture'. How can I possibly translate this to a person or a furniture object (or should I even be trying to?). And I don't really have control over the form fields, because they can create whatever....
Any thought process would be really helpful - thanks!

How can I possibly translate this to a person or a furniture object
(or should I even be trying to?)
If I understand you correctly, you probably shouldn't try to convert these fields to specific objects since the nature of your application is so dynamic. If the stored procedures are capable of figuring out which combination of fields belongs to which tables, then great.
If you can change the DB schema, I would suggest coming up with something much more dynamic. Rather than have a single table for each type of dynamic object, I would create the following schema:
Object {
ID
Name
... (clientID, etc.) ...
}
Property {
ID
ObjectID
Name
DBType (int, string, object-id, etc.)
FormType ( textbox, checkbox, etc.)
[FormValidationRegex] <== optional, could be used by field controls
Value
}
If you can't change the database schema, you can still apply the following to the old system using the stored procedures and fixed tables:
When you read in a specific object from the database, you can loop through its properties, get the form type of each, and simply add the appropriate generic form field to the page:
foreach (Property p in Object.Properties)
{
    switch (p.FormType)
    {
        case FormType.CheckBox:
            PageForm.AddField(new CheckboxFormField(p.Name, p.Value));
            break;
        case FormType.Email:
            PageForm.AddField(new EmailFormField(p.Name, p.Value));
            break;
        // ... one case per remaining form type ...
    }
}
Of course, I threw in a PageForm object, as well as CheckboxFormField and EmailFormField objects. The PageForm object could simply be a placeholder, and the CheckboxFormField and EmailFormField could be UserControls or ServerControls.
I would not recommend trying to control placement. Just list off each field one by one vertically. This is becoming more and more popular anyway, even with static forms whose layout can be controlled completely. Most signup forms, for example, follow this convention.
I hope that helps. If I understood your question wrong, or if you'd like further explanations, let me know.
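If the switch grows to dozens of form types, one option is a lookup table of factories keyed by form type. This is only a sketch: `IFormField`, `Render` and the shapes of `Property`, `CheckboxFormField` and `EmailFormField` are assumptions made so the example is self-contained, not the real controls.

```csharp
using System;
using System.Collections.Generic;

// Map each form type to a factory, so a new field type is one new entry.
var factories = new Dictionary<FormType, Func<Property, IFormField>>
{
    [FormType.CheckBox] = p => new CheckboxFormField(p.Name, p.Value),
    [FormType.Email]    = p => new EmailFormField(p.Name, p.Value),
};

var properties = new List<Property>
{
    new Property { Name = "Subscribed", FormType = FormType.CheckBox, Value = "true" },
    new Property { Name = "Contact", FormType = FormType.Email, Value = "a@example.com" },
};

// The page code no longer needs to know the concrete field classes.
foreach (var prop in properties)
{
    IFormField field = factories[prop.FormType](prop);
    Console.WriteLine(field.Render());
}

public enum FormType { CheckBox, Email }

public class Property
{
    public string Name { get; set; }
    public FormType FormType { get; set; }
    public string Value { get; set; }
}

// Hypothetical common interface for all generic form fields.
public interface IFormField { string Render(); }

public class CheckboxFormField : IFormField
{
    private readonly string name, value;
    public CheckboxFormField(string name, string value) { this.name = name; this.value = value; }
    public string Render() => $"[checkbox] {name} = {value}";
}

public class EmailFormField : IFormField
{
    private readonly string name, value;
    public EmailFormField(string name, string value) { this.name = name; this.value = value; }
    public string Render() => $"[email] {name} = {value}";
}
```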

Not sure I understand the question. But there are two tools suitable for writing generic code: generics and reflection, typically in combination.
I don't think I really understand what you're trying to do, but a method using reflection to identify all the properties of an object might look like this:
using System;
using System.Reflection;
(...)
public void VisitProperties(object subject)
{
    Type subjectType = subject.GetType();
    foreach (PropertyInfo info in subjectType.GetProperties())
    {
        // Read the property's current value from the instance.
        object value = info.GetValue(subject, null);
        Console.WriteLine("The name of the property is " + info.Name);
        // A property may hold null, so don't call ToString() on it directly.
        Console.WriteLine("The value is " + (value ?? "(null)"));
    }
}
You can also check out an entry on my blog where I discuss using attributes on objects in conjunction with reflection. It's actually discussing how this can be utilized to write generic UI. Not exactly what you want, but at least the same principles could be used.
http://codepatrol.wordpress.com/2011/08/19/129/
This means that you could create your own custom attributes, or use those that already exist within the .NET framework, to describe your types. Attributes could be used to specify validation rules, field labels, and even field placement.
public class Person
{
    [FieldLabel("First name")]
    [ValidationRules(Rules.NotEmpty | Rules.OnlyCharacters)]
    [FormColumn(1)]
    [FormRow(1)]
    public string FirstName { get; set; }
    [FieldLabel("Last name")]
    [ValidationRules(Rules.NotEmpty | Rules.OnlyCharacters)]
    [FormColumn(2)]
    [FormRow(1)]
    public string LastName { get; set; }
}
Then you'd use the method described in my blog to identify these attributes and take the appropriate action - e.g. placing them in the proper row, giving the correct label, and so forth. I won't propose how to solve these things, but at least reflection is a great and simple tool to get descriptive information about an unknown type.
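As a rough illustration of reading such attributes back with reflection, the sketch below defines only a `FieldLabelAttribute` (its shape is an assumption; the blog post's real attribute classes are not shown here) and falls back to the property name when no label is present.

```csharp
using System;
using System.Reflection;

foreach (PropertyInfo info in typeof(Person).GetProperties())
{
    // GetCustomAttribute<T>() returns null when the attribute is absent.
    var label = info.GetCustomAttribute<FieldLabelAttribute>();
    string text = label != null ? label.Text : info.Name;
    Console.WriteLine($"{info.Name} -> label \"{text}\"");
}

// Hypothetical attribute carrying the label shown next to a form field.
[AttributeUsage(AttributeTargets.Property)]
public sealed class FieldLabelAttribute : Attribute
{
    public string Text { get; }
    public FieldLabelAttribute(string text) { Text = text; }
}

public class Person
{
    [FieldLabel("First name")]
    public string FirstName { get; set; }

    [FieldLabel("Last name")]
    public string LastName { get; set; }

    // No label attribute: the loop above falls back to the property name.
    public int Age { get; set; }
}
```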

I found xml invaluable for this same situation. You can build an object graph in your code to represent the form easily enough. This object graph can again be loaded/saved from a db easily.
You can turn your object graph into XML and use XSLT to generate the HTML for display. You then also have the benefit of customising this transform for different clients, versions, etc. I also store the XML in the database for performance and to give me a publish function.
You need some specific code to deal with the incoming data, as you're going to be accessing the raw request post. You need to validate the incoming data against what you think was shown. That stops people spoofing or meddling with your forms.
I hope that all makes sense.

MongoDb and self referencing objects

I am just starting to learn about mongo db and was wondering if I am doing something wrong....I have two objects:
public class Part
{
    public Guid Id;
    public IList<Material> Materials;
}
public class Material
{
    public Guid MaterialId;
    public Material ParentMaterial;
    public IList<Material> ChildMaterials;
    public string Name;
}
When I try to save this particular object graph I receive a stack overflow error because of the circular reference. My question is, is there a way around this? In WCF I am able to add the "IsReference" attribute on the datacontract to true and it serializes just fine.

What driver are you using?
In NoRM you can create a DbReference like so
public DbReference<Material> ParentMaterial;
Mongodb-csharp does not offer strongly typed DbReferences, but you can still use them.
public DBRef ParentMaterial;
You can follow the reference with Database.FollowReference(ParentMaterial).
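A driver-independent way to break the cycle, shown here only as a sketch, is to store the IDs of related materials instead of the objects themselves, so the serializer never walks from child to parent and back. The property names `ParentMaterialId` and `ChildMaterialIds` are assumptions, and the in-memory dictionary stands in for a query by ID.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var root = new Material { MaterialId = Guid.NewGuid(), Name = "Steel" };
var child = new Material
{
    MaterialId = Guid.NewGuid(),
    Name = "Stainless steel",
    ParentMaterialId = root.MaterialId,
};
root.ChildMaterialIds.Add(child.MaterialId);

// Resolve references through a lookup; in practice this would be a query by ID.
var byId = new[] { root, child }.ToDictionary(m => m.MaterialId);
Material parent = byId[child.ParentMaterialId.Value];
Console.WriteLine($"{child.Name} -> parent {parent.Name}");

public class Material
{
    public Guid MaterialId { get; set; }
    public string Name { get; set; }

    // IDs instead of object references: no circular object graph to serialize.
    public Guid? ParentMaterialId { get; set; }
    public List<Guid> ChildMaterialIds { get; set; } = new List<Guid>();
}
```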

Just for future reference, things like references between objects which are not embedded within a sub-document structure are handled extremely well by a NoSQL ODB, which is generally designed to deal with transparent relations in arbitrarily complex object models.
If you are familiar with Hibernate, imagine that without any mapping file AT ALL and orders of magnitude faster performance because there is no runtime JOIN behind the scenes, all relations are resolved with the speed of a b-tree lookup.
Here is a video from Versant (disclosure - I work for them), so you can see how it works.
This is a little boring in the beginning, but shows every single step to take a Java application and make it persistent in an ODB... then make it fault tolerant, distributed, do some parallel queries, optimize cache load, etc...
If you want to skip to the cool part, jump about 20 minutes in and you will avoid the building of the application and just see how easy it is to dynamically evolve schema and add distribution and fault tolerance to any existing application.

If you want to store object graphs with relationships between them requiring multiple 'joins' to get to the answer you are probably better off with a SQL-style database. The document-centric approach of MongoDB and others would probably structure this rather differently.
Take a look at MongoDB nested sets which suggests some ways to represent data like this.

I was able to accomplish exactly what I needed by using a modified driver from NoRM mongodb.

c#: Using Assemblies (via Reflection) as a (meta)data store

SOME CONTEXT
One of my projects requires carrying around some "metadata" (yes, I hate using that word).
What the metadata specifically consists of is not important, only that it's more complex than a simple "table" or "list" - you could think of it as a mini-database of information
Currently I have this metadata stored in an XML file and have an XSD that defines the schema.
I want to package this metadata with my project, currently that means keeping the XML file as a resource
However, I have been looking for a more strongly-typed alternative. I am considering moving it from an XML file to C# code - so instead of using XML apis to traverse my metadata, relying on .NET code via reflection on types
Besides the strong(er) typing, some useful characteristics I see in using an assembly for this are: (1) I can refactor the "schema" to some extent with tools like Resharper, (2) the metadata can come with code, (3) I don't have to rely on any external DB component.
THE QUESTIONS
If you have tried something like this, I am curious about what you learned.
Was your experience positive?
What did you learn?
What problems in this approach did you uncover?
What are some considerations I should take into account?
Would you do this again?
NOTES
Am not asking for how to use Reflection - no help is needed there
Am fundamentally asking about your experiences and design considerations
UPDATE: INFORMATION ABOUT THE METADATA
Because people are asking I'll try describing the metadata a bit more. I'm trying to abstract a bit - so this will seem a bit artificial.
There are three entities in the model:
A set of "groups" - each group has a unique name and several properties (usually int values that represent ID numbers of some kind)
Each "group" contains 1 or more "widgets" (never more than 50) - each widget has properties like name (there are multiple names), IDs, and various boolean properties.
Each widget contains one or more "scenarios". Each "scenario" is documentation - a URL to a description of how to use the widget.
Typically I need to run these kinds of "queries"
Get the names of all the widgets
Get the names of all groups that contain at least one widget where BoolProp1=true
Given the ID of a widget, get the group that contains that widget
How I was thinking about modelling the entities in the assembly
There are 3 classes: Group, Widget, Documentation
There are 25 Groups so I will have 25 Group classes - so "FooGroup" will derive from Group, same pattern follows for widgets and documentation
Each class will have attributes to account for names, ids, etc.

I have used and extended Metadata for a large part of my projects, many of them related to describing components, relationships among them, mappings, etc.
(Major categories of extensive attribute use include O/R mappers, dependency injection frameworks, and serialization descriptions - especially XML Serialization.)
Well, I'm going to ask you to describe a little bit more about the nature of the data you want to embed as a resource. Attributes are naturally good for data that describes your types and type members, but each use of an attribute should be simple and short. Attributes (I think) should be very cohesive and largely independent of each other.
One of the solutions that I want to point you at, is the "XML Serialization" approach. You can keep your current XMLs, and put them into your assemblies as Embedded Resource (which is what you've probably done already) and read the whole XML at once into a strongly-typed hierarchy of objects.
XML Serialization is very very simple to use, much simpler than the typical XML API or even LINQ2XML, in my opinion. It uses Attributes to map class properties to XML elements and XML attributes. Once you've loaded the XML into the objects, you have everything you want in the memory as "typed" data.
Based on what I understand from your description, I think you have a lot of data to be placed on a single class. This means a large and (in my opinion) ugly attribute code above the class. (Unless you can distribute your data among members making each of them small and independent, which is nice.)
I have had many positive experiences using XML Serialization for large amounts of data. You can arrange data as you want, you get type safety, you get IntelliSense (if you give your XSD to Visual Studio), and you also get half of the refactoring. ReSharper (or any other refactoring tool that I know of) doesn't recognize XML Serialization, so when you refactor your typed classes, it doesn't change the XML itself, but it does change all usages of the data in code.
If you give me more details on what your data is, I might be able to add something to my answer.
For XML Serialization samples, just Google "XML Serialization" or look it up in MSDN.
UPDATE
I strongly recommend NOT using classes for representing instances of your data. Even using a class merely to encapsulate a piece of data goes against its logical definition.
I guess your best bet would be XML Serialization, provided that you already have your data in XML. You get all the benefits you want, with less code. And you can perform any query on the XML Serializable objects using LINQ2Objects.
A part of your code can look like the following:
using System.Xml.Serialization;

[XmlRoot]
public class MyMetadata
{
    [XmlElement]
    public Group[] Groups { get; set; }
}
public class Group
{
    [XmlAttribute]
    public string Name { get; set; }
    [XmlAttribute]
    public int SomeNumber { get; set; }
    [XmlElement]
    public Widget[] Widgets { get; set; }
}
public class Widget
{
    ...
}
You should call new XmlSerializer(typeof(MyMetadata)) to create a serializer, and call its Deserialize method giving it the stream of your XML, and you get a filled instance of MyMetadata class.
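To make the deserialize-then-query flow concrete, here is a small self-contained sketch. The XML layout, the explicit element names (`Group`, `Widget`) and the `BoolProp1` attribute are assumptions chosen to mirror the question's description; a real schema would follow the existing XSD.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Xml.Serialization;

// Stand-in for the embedded XML resource.
const string xml = @"<MyMetadata>
  <Group Name=""FooGroup"" SomeNumber=""1"">
    <Widget Name=""A"" BoolProp1=""true"" />
    <Widget Name=""B"" BoolProp1=""false"" />
  </Group>
  <Group Name=""BarGroup"" SomeNumber=""2"">
    <Widget Name=""C"" BoolProp1=""false"" />
  </Group>
</MyMetadata>";

var serializer = new XmlSerializer(typeof(MyMetadata));
MyMetadata metadata;
using (var reader = new StringReader(xml))
{
    metadata = (MyMetadata)serializer.Deserialize(reader);
}

// LINQ to Objects over the typed hierarchy.
var widgetNames = metadata.Groups.SelectMany(g => g.Widgets).Select(w => w.Name);
var flaggedGroups = metadata.Groups
    .Where(g => g.Widgets.Any(w => w.BoolProp1))
    .Select(g => g.Name);

Console.WriteLine("Widgets: " + string.Join(", ", widgetNames));
Console.WriteLine("Groups with BoolProp1: " + string.Join(", ", flaggedGroups));

[XmlRoot("MyMetadata")]
public class MyMetadata
{
    [XmlElement("Group")]
    public Group[] Groups { get; set; }
}

public class Group
{
    [XmlAttribute] public string Name { get; set; }
    [XmlAttribute] public int SomeNumber { get; set; }
    [XmlElement("Widget")] public Widget[] Widgets { get; set; }
}

public class Widget
{
    [XmlAttribute] public string Name { get; set; }
    [XmlAttribute] public bool BoolProp1 { get; set; }
}
```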

It's not clear from your description but it sounds like you have assembly-level metadata that you want to be able to access (as opposed to type-level). You could have a single class in each assembly that implements a common interface, then use reflection to hunt down that class and instantiate it. Then you can hard-code the metadata within.
The problems of course are the benefits that you lose from the XML -- namely that you can't modify the metadata without a new build. But if you're going this direction you probably have already taken that into account.
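The "common interface plus reflection" idea could look roughly like the sketch below. `IMetadataProvider`, `BuiltInMetadata` and the `GroupNames` member are hypothetical names; the example scans the executing assembly, whereas a real project would scan whichever assemblies carry metadata.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hunt down the first concrete class that implements the interface.
Type providerType = Assembly.GetExecutingAssembly()
    .GetTypes()
    .First(t => typeof(IMetadataProvider).IsAssignableFrom(t) && t.IsClass && !t.IsAbstract);

// Requires a public parameterless constructor on the provider.
var provider = (IMetadataProvider)Activator.CreateInstance(providerType);
Console.WriteLine(string.Join(", ", provider.GroupNames));

// Hypothetical contract every metadata-carrying assembly implements once.
public interface IMetadataProvider
{
    string[] GroupNames { get; }
}

// The metadata itself is hard-coded, so changing it requires a new build.
public class BuiltInMetadata : IMetadataProvider
{
    public string[] GroupNames => new[] { "FooGroup", "BarGroup" };
}
```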

How to handle multiple object types when creating a new Type

Been tasked to write some asset tracking software...
Want to try to do this the right way. So I thought that a lot of assets had common fields.
For instance, a computer has a model and a manufacturer which a mobile phone also has.
I would want to store computers, monitors, mobile phones, etc. So I thought the common stuff can be taken into account using an abstract base class. The other properties that do not relate to one another would be stored in the actual class itself.
For instance,
public abstract class Asset {
    public string Manufacturer { get; set; }
    //more common fields
}
public class Computer : Asset {
    public string OS { get; set; }
    //more fields pertinent to a PC, but inherit those public properties of Asset base
}
public class Phone : Asset {
    //etc etc
}
But I have 2 concerns:
1) If I have a web form asking someone to add an asset, I want to give them, say, a radio button selection of the type of asset they are creating. Something to the effect of:
What are you creating
[]computer
[]phone
[]monitor
[OK] [CANCEL]
And they would select one, but I don't want to end up with code like this:
pseudocode:
select case(RadioButtonControl.Text)
{
case "Computer": Computer c = new Computer(args);
break;
case "Phone": Phone p = new Phone(args);
break;
....
}
This could get ugly....
Problem 2) I want to store this information in one database table with a TypeID field that way when an Insert into the database is done this value becomes the typeid of the row (distinguishes whether it is a computer, a monitor, a phone, etc). Should this typeid field be declared inside the base abstract class as some sort of enum?
Thanks

My advice is to avoid this general design altogether. Don't use inheritance at all. Object orientation works well when different types of objects have different behavior. For asset tracking, none of the objects really has any behavior at all -- you're storing relatively "dumb" data, none of which does (or should) really do anything at all.
Right now, you seem to be approaching this as an object oriented program with a database as a backing store (so to speak). I'd reverse that: it's a database with a front-end that is (or at least might be) object oriented.
Then again, unless you have some really specific and unusual needs in your asset tracking, chances are that you shouldn't do this at all. There are literally dozens of perfectly reasonable asset tracking packages already on the market. Unless your needs really are pretty unusual, reinventing this particular wheel won't accomplish much.
Edit: I don't intend to advise against using OOP within the application itself at all. Quite the contrary, MVC (for example) works quite well, and I'd almost certainly use it for almost any kind of task like this.
Where I'd avoid OOP would be in the design of the data being stored. Here, you benefit far more from using something like an SQL-based database via something like OLE DB, ODBC, or JDBC.
Using a semi-standard component for this will give you things like scalability and incremental backup nearly automatically, and is likely to make future requirements (e.g. integration with other systems) considerably easier, as you'll have a standardized, well understood layer for access to the data.
Edit2: As far as when to use (or not use) inheritance, one hint (though I'll admit it's no more than that) is to look at behaviors, and whether the hierarchy you're considering really reflects behaviors that are important to your program. In some cases, the data you work with are relatively "active" in the program -- i.e. the behavior of the program itself revolves around the behavior of the data. In such a case, it makes sense (or at least can make sense) to have a relatively tight relationship between the data and the code.
In other cases, however, the behavior of the code is relatively unaffected by the data. I would posit that asset tracking is such a case. To the asset tracking program, it doesn't make much (if any) real difference whether the current item is a telephone, or a radio, or a car. There are a few (usually much broader) classes you might want to take into account -- at least for quite a few businesses, it matters whether assets are considered "real estate", "equipment", "office supplies", etc. These classifications lead to differences in things like how the asset has to be tracked, taxes that have to be paid on it, and so on.
At the same time, two items that fall under office supplies (e.g. paper clips and staples) don't have significantly different behaviors -- each has a description, cost, location, etc. Depending on what you're trying to accomplish, each might have things like a trigger when the quantity falls below a certain level, to let somebody know that it's time to re-order.
One way to summarize that might be to think in terms of whether the program can reasonably work with data for which it wasn't really designed. For asset tracking, there's virtually no chance that you can (or would want to) create a class for every kind of object somebody might decide to track. You need to plan from the beginning on the fact that it's going to be used for all kinds of data you didn't explicitly account for in the original design. Chances are that for the majority of items, you need to design your code to be able to just pass data through, without knowing (or caring) much about most of the content.
Modeling the data in your code makes sense primarily when/if the program really needs to know about the exact properties of the data, and can't reasonably function without it.
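One way to read this advice in code, offered as a sketch rather than a prescribed design: a single `Asset` type carries the common fields plus a type ID and a bag of type-specific attributes, so new asset kinds need rows in a lookup table rather than new classes. All names and sample values here (`Attributes`, the manufacturers, the type IDs) are assumptions.

```csharp
using System;
using System.Collections.Generic;

var laptop = new Asset { TypeId = 1, Manufacturer = "Contoso", Model = "X1" };
laptop.Attributes["OS"] = "Linux";

var phone = new Asset { TypeId = 2, Manufacturer = "Fabrikam", Model = "P9" };
phone.Attributes["Carrier"] = "ExampleTel";

// The program passes data through without knowing what kind of asset it is.
foreach (var asset in new[] { laptop, phone })
{
    Console.WriteLine($"{asset.Manufacturer} {asset.Model} (type {asset.TypeId})");
    foreach (var pair in asset.Attributes)
        Console.WriteLine($"  {pair.Key}: {pair.Value}");
}

public class Asset
{
    // Refers to a row in a hypothetical asset-type lookup table.
    public int TypeId { get; set; }

    // Fields shared by every asset.
    public string Manufacturer { get; set; }
    public string Model { get; set; }

    // Type-specific values stored as name/value pairs instead of subclasses.
    public Dictionary<string, string> Attributes { get; } = new Dictionary<string, string>();
}
```

With this shape the radio-button selection simply supplies the `TypeId`, and no switch over concrete classes is needed.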

What is the most appropriate design for an object whose database key is made up of multiple columns?

Suppose I have a table in my database that is made up of the following columns, 3 of which uniquely identify the row:
CREATE TABLE [dbo].[Lines]
(
[Attr1] [nvarchar](10) NOT NULL,
[Attr2] [nvarchar](10) NOT NULL,
[Attr3] [nvarchar](10) NOT NULL,
PRIMARY KEY (Attr1, Attr2, Attr3)
)
Now, I have an object in my application that represents one of those lines. It has three properties on it that correspond to the three Attr columns in the database.
public class Line
{
    public Line(string attr1, string attr2, string attr3)
    {
        this.Attr1 = attr1;
        this.Attr2 = attr2;
        this.Attr3 = attr3;
    }
    public string Attr1 { get; private set; }
    public string Attr2 { get; private set; }
    public string Attr3 { get; private set; }
}
There's a second object in the application that stores a collection of these line objects.
Here's the question: What is the most appropriate design when referencing an individual line in this collection (from a caller's perspective)? Should the caller be responsible for tracking the index of the line he's changing and then just use that index to modify a line directly in the collection? Or...should there be method(s) on the object that says something to the effect of:
public Line GetLine(string attr1, string attr2, string attr3)
{
    // return the line from the collection
}
public void UpdateLine(Line line)
{
    // update the line in the collection
}
We're having a debate on our team, because some of us think that it makes more sense to reference a line using its internal index in the collection, and others think there's no reason to introduce another internal key when we can already uniquely identify a line based on the three attributes.
Thoughts?

Your object model should be designed so that it makes sense to an object consumer. It should not be tied to the data model to the greatest extent practical.
It sounds like it is more intuitive for the object consumer to think in terms of the three attributes. If there are no performance concerns that speak to the contrary, I would let the object consumer work with those attributes and not concern him with the internal workings of data storage (i.e. not require them to know or care about an internal index).

I think the base question you are encountering is how much control the user of your API should have over your data, and what exactly you expose. This varies wildly depending on what you want to do, and either can be appropriate.
The question is who is responsible for the information you wish to update. From what you have posted, it appears that the Line object is responsible for the information, and thus I would advocate a syntax such as Collection.GetLine(attr1, attr2, attr3).UpdateX(newX) and so forth.
However, it may be that the collection actually has a greater responsibility for that information, in which case Collection.UpdateX(line, newX) would make more sense (alternatively, replace the 'line' arg with 'attr1, attr2, attr3').
Thirdly, it is possible, though unlikely (and rarely the best design IMHO), that the API user is most responsible for the information, in which case the approach you mentioned, where the user tracks Line indices and modifies the information directly, would apply.

You do not want the calling object to "track the index of the line he's changing" - ever. This makes your design way too interdependent, pushes object-level implementation decisions off onto the users of the object, makes testing more difficult, and can result in difficult to diagnose bugs when you accidentally update one object (due to key duplications) when you meant to update another.
Go back to OO discipline: the Line object that you are returning from the GetLine method should be acting like a real, first class "thing."
The complication, of course, comes if you change one of the fields in the line object that is used as part of your index. If you change one of these fields, you won't be able to find the original in the database when you go to do your update. Well, that is what data hiding in objects is all about, no?
Here is my suggestion, have three untouchable fields in the object that correspond to its state in the database ("originalAttr1", "originalAttr2", "originalAttr3"). Also, have three properties ("attr1", "attr2", "attr3") that start out with the same values as the originals but that are Settable. Your Getters and Setters will work on the attr properties only. When you "Update" (or perform other actions that go back to the underlying source), use the originalAttrX values as your keys (along with uniqueness checks, etc.).
This might seem like a bit of work but it is nothing compared to the mess that you'll get into if you push all of these implementation decisions off on the consumer of the object! Then you'll have all of the various consumers trying to (redundantly) apply the correct logic in a consistent manner - along with many more paths to test.
One more thing: this kind of stuff is done all the time in data access libraries and so is a quite common coding pattern.
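A compact sketch of the "original values as keys" pattern described above. `KeyChanged` and `AcceptChanges` are hypothetical helper names, and the actual database update is left out.

```csharp
using System;

var line = new Line("A", "B", "C");
line.Attr2 = "B2";   // the consumer edits a key field freely

// The update would locate the row by the original key, then write the new values.
Console.WriteLine($"Find row by: {line.OriginalAttr1}/{line.OriginalAttr2}/{line.OriginalAttr3}");
Console.WriteLine($"Write values: {line.Attr1}/{line.Attr2}/{line.Attr3}");
line.AcceptChanges();

public class Line
{
    public Line(string attr1, string attr2, string attr3)
    {
        OriginalAttr1 = Attr1 = attr1;
        OriginalAttr2 = Attr2 = attr2;
        OriginalAttr3 = Attr3 = attr3;
    }

    // Key values as last read from (or written to) the database.
    public string OriginalAttr1 { get; private set; }
    public string OriginalAttr2 { get; private set; }
    public string OriginalAttr3 { get; private set; }

    // Editable values exposed to consumers.
    public string Attr1 { get; set; }
    public string Attr2 { get; set; }
    public string Attr3 { get; set; }

    public bool KeyChanged =>
        Attr1 != OriginalAttr1 || Attr2 != OriginalAttr2 || Attr3 != OriginalAttr3;

    // Call after a successful update so the new values become the key.
    public void AcceptChanges()
    {
        OriginalAttr1 = Attr1;
        OriginalAttr2 = Attr2;
        OriginalAttr3 = Attr3;
    }
}
```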

What is the most appropriate design
when referencing an individual line in
this collection (from a caller's
perspective)?
If the caller is 'thinking' in terms of the three attributes, I would consider adding an indexer to your collection class that's keyed on the three attributes, something like:
public Line this[string attr1, string attr2, string attr3] {
get {
// code to find the appropriate line...
}
}
Indexers are the go-to spot for "How Do I Fetch Data From This Collection" and, IMO, are the most intuitive accessor to any collection.
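One possible body for such an indexer, as a sketch: the collection keeps a dictionary keyed by a tuple of the three attributes, so callers never see an internal index. `LineCollection` and its `Add` method are assumed names.

```csharp
using System;
using System.Collections.Generic;

var lines = new LineCollection();
lines.Add(new Line("A", "B", "C"));

// Callers look a line up by the same three values that identify it in the table.
Line found = lines["A", "B", "C"];
Console.WriteLine($"{found.Attr1}/{found.Attr2}/{found.Attr3}");

public class Line
{
    public Line(string attr1, string attr2, string attr3)
    {
        Attr1 = attr1;
        Attr2 = attr2;
        Attr3 = attr3;
    }

    public string Attr1 { get; private set; }
    public string Attr2 { get; private set; }
    public string Attr3 { get; private set; }
}

public class LineCollection
{
    // Value tuples compare by value, so they work directly as a composite key.
    private readonly Dictionary<(string, string, string), Line> byKey =
        new Dictionary<(string, string, string), Line>();

    public void Add(Line line) => byKey.Add((line.Attr1, line.Attr2, line.Attr3), line);

    // Throws KeyNotFoundException when no line has the given key.
    public Line this[string attr1, string attr2, string attr3]
    {
        get { return byKey[(attr1, attr2, attr3)]; }
    }
}
```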

I always prefer to use a single ID column even if there is a composite key that can be used. I would just add an identity column to the table and use that for lookups instead. It would also be faster, because a query on a single int column performs better than one on a key spanning three text columns.
Having a user maintain some sort of line index to look up a line doesn't seem very good to me. So if I had to pick between the two options you posed though, I would use the composite key.

If the client is retrieving the Line object using three string values, then that's what you pass to the getter method. From that point on, everything necessary to update the object in the database (such as a unique row ID) should be hidden within the Line object itself.
That way all the gory details are hidden from the client, which protects the client from damaging them, and also protects the client from any future changes you might make to the DB access within the Line object.
