MongoDB relationships for objects - c#

Please excuse my English; I'm still trying to master it.
I've started to learn MongoDB (coming from a C# background) and I like the idea behind it, but I have some issues with the examples on the internet.
Take the popular blog post/comments example. A Post has zero or more Comments associated with it. I create a Post object and add a few Comment objects to the IList in Post. That's fine.
Do I add that to just a "Posts" collection in MongoDB, or should I have two collections: blog.posts and blog.posts.comments?
I have a fairly complicated object model. The easiest way to think of it is as a banking system (ours is mining). I have highlighted tables with square brackets.
[Users] have one or many [Accounts], which have one or many [Transactions], each of which has one and only one [Type]. [Transactions] can have one or more [Tags] assigned to them. [Users] create their own [Tags], unique to that user account, and we sometimes need to offer reporting by those tags (e.g. for May, tag drilling-expense was $123456.78).
For indexing, I would have thought separating them would be good, but I'm worried that this thinking is a bad habit from my old RDBMS days.
In a way, it's like the blog example. I'm not sure if I should have one [Account] collection and persist all information there, or have an intermediate step that splits it up into separate collections.
A related question: when you persist back and forth, do you usually get back everything associated with that record, even if it isn't required, or do you limit it?

It depends.
It depends on how many of each of these type of objects you expect to have. Can you fit them all into a single MongoDB document for a given User? Probably not.
It depends on the relationships: is User-Account a one-to-many or a many-to-many relationship? If it's one-to-many and the number of Accounts is small, you might choose to put them in an IList on a User document.
You can still model relationships in MongoDB with separate collections BUT there are no joins in the database so you have to do that in code. Loading a User and then loading their Accounts might be just fine from a performance perspective.
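To make "do the join in code" concrete, here is a minimal sketch in Python. The two "collections" are plain in-memory lists, and the find/find_one helpers are hypothetical stand-ins for database queries, not a MongoDB driver API. Loading a User and their Accounts is two separate lookups stitched together by the application.

```python
# Hypothetical stand-ins for two MongoDB collections: accounts reference
# their owner through a user_id field.
users = [
    {"_id": 1, "name": "alice"},
    {"_id": 2, "name": "bob"},
]
accounts = [
    {"_id": 10, "user_id": 1, "name": "Drilling"},
    {"_id": 11, "user_id": 1, "name": "Haulage"},
    {"_id": 12, "user_id": 2, "name": "Survey"},
]


def matches(doc, criteria):
    """True when every field in criteria has the same value in doc."""
    return all(doc.get(field) == value for field, value in criteria.items())


def find(collection, **criteria):
    """Hypothetical query helper: all matching documents."""
    return [doc for doc in collection if matches(doc, criteria)]


def find_one(collection, **criteria):
    """Hypothetical query helper: first matching document, or None."""
    return next((doc for doc in collection if matches(doc, criteria)), None)


def load_user_with_accounts(user_id):
    user = find_one(users, _id=user_id)          # lookup 1: the user
    if user is None:
        return None
    owned = find(accounts, user_id=user_id)      # lookup 2: their accounts
    return {**user, "accounts": owned}           # the "join" happens here, in code


alice = load_user_with_accounts(1)
print([a["name"] for a in alice["accounts"]])  # ['Drilling', 'Haulage']
```

With a real database each helper call would be a round trip, so this costs two queries per user; that is usually fine for a single user but worth watching inside loops.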
You can index INTO arrays on documents. Don't think of an Index as just being an index on a simple field on a document (like SQL). You can use, say, a Tag collection on a document and index into the tags. (See http://www.mongodb.org/display/DOCS/Indexes#Indexes-Arrays)
When you retrieve or write data you can do a partial read and a partial write of any document. (see http://www.mongodb.org/display/DOCS/Retrieving+a+Subset+of+Fields)
And finally, when you can't see how to get what you want using collections and indexes, you might be able to achieve it using map-reduce. For example, to find all the tags currently in use, sorted by frequency of use, you would map each document, emitting the tags used in it, and then reduce that set to get the result you want. You might then store the result of that map-reduce permanently and only update it when you need to.
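The tag-frequency example can be sketched in plain Python to show the shape of the two phases. This is an illustration of the map-reduce idea, not MongoDB's own mapReduce command, and the transaction documents are made up. The map step emits (tag, 1) for each tag in a document, and the reduce step sums the values emitted for each tag.

```python
from collections import defaultdict

# Hypothetical transaction documents, each with an embedded array of tags.
transactions = [
    {"amount": 100.0, "tags": ["drilling-expense", "q2"]},
    {"amount": 250.0, "tags": ["drilling-expense"]},
    {"amount": 40.0, "tags": ["fuel"]},
    {"amount": 10.0, "tags": []},
]


def map_phase(doc):
    # Emit (tag, 1) once for every tag used in the document.
    for tag in doc["tags"]:
        yield tag, 1


def reduce_phase(key, values):
    # Combine all values emitted for one key into a single count.
    return sum(values)


def map_reduce(docs):
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_phase(doc):
            grouped[key].append(value)
    counts = {key: reduce_phase(key, values) for key, values in grouped.items()}
    # Sort by frequency (descending), breaking ties alphabetically.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


print(map_reduce(transactions))
# [('drilling-expense', 2), ('fuel', 1), ('q2', 1)]
```

Emitting the amount instead of 1 would give per-tag totals in the same way. The sorted result is what you would store and refresh only when needed.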
One further concern: You mention calculating totals by tag. If you want accounting-quality transactional consistency, MongoDB might not be the right choice for you. "Eventual-consistency" is the name of the game for NoSQL data stores and they generally aren't a good fit for financial transactions. For example, it doesn't matter if one user sees a blog post with 3 comments while another sees 4 because they hit different replica copies that aren't in sync yet, but for a financial report, that kind of consistency does matter - your report might not add up!

LINQ - Select all in parent-child hierarchy

I was wondering if there is a neat way to do this that DOESN'T use any kind of while loop or similar, preferably one that would run against LINQ to Entities as a single SQL round-trip, and also against LINQ to Objects.
I have an entity - Forum - that has a parent-child relationship going on. That is, a Forum may (or in the case of the top level, may not) have a ParentForum, and may have many ChildForums. A Forum then contains many Posts.
What I'm after here is a way to get all the Posts from a tree of Forums, i.e. the Forum in question and all its children, grandchildren, etc. I don't know in advance how many sub-levels the Forum in question may have.
(Note - I know this example isn't necessarily a valuable use case, but the Forum object model is familiar to most people, and so it serves as a generic and accessible premise rather than my actual domain model.)
One possible way would be if your actual data tables were stored using a left/right tree, also known as the nested set model (example here: http://www.sitepoint.com/hierarchical-data-database-2/; note that the example is in MySQL/PHP, but it's trivial to implement). Using this, you can find all forums that fall within a parent's left/right values, and given that, you can retrieve all posts whose forum IDs are IN that set of forum IDs.
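A minimal sketch of the left/right (nested set) approach, using Python's built-in sqlite3 module in place of SQL Server. The table and column names (forum, post, lft, rgt) are assumptions for illustration. Each forum's lft/rgt pair brackets the values of all its descendants, so a single BETWEEN join finds the whole subtree with no recursion or loop.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE forum (id INTEGER PRIMARY KEY, name TEXT, lft INTEGER, rgt INTEGER);
CREATE TABLE post  (id INTEGER PRIMARY KEY, forum_id INTEGER REFERENCES forum(id), title TEXT);

-- General(1..10) contains Cars(2..7) and Boats(8..9); Cars contains Electric(3..4)
-- and Classic(5..6). Other(11..12) is a separate top-level forum.
INSERT INTO forum VALUES (1, 'General', 1, 10), (2, 'Cars', 2, 7), (3, 'Electric', 3, 4),
                         (4, 'Classic', 5, 6), (5, 'Boats', 8, 9), (6, 'Other', 11, 12);
INSERT INTO post VALUES (1, 1, 'Welcome'), (2, 3, 'Range anxiety'),
                        (3, 4, 'Restoring a Beetle'), (4, 5, 'Sailing'), (5, 6, 'Off-topic');
""")


def posts_in_subtree(forum_id):
    """Titles of all posts in the forum and all of its descendants, in one query."""
    rows = conn.execute("""
        SELECT p.title FROM post p
        WHERE p.forum_id IN (
            SELECT child.id FROM forum child
            JOIN forum parent ON child.lft BETWEEN parent.lft AND parent.rgt
            WHERE parent.id = ?)
        ORDER BY p.id""", (forum_id,))
    return [title for (title,) in rows]


print(posts_in_subtree(2))  # ['Range anxiety', 'Restoring a Beetle']
```

The trade-off is that inserting or moving a forum means renumbering the lft/rgt values of the nodes after it, so this model favors trees that are read far more often than they change.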
I'm sure you might get a few proper answers regarding the Linq queries. I'm posting this as an advisory when it comes to the SQL side of things.
I had a similar issue with a virtual filesystem in SQL. I needed to be able to query files in folders recursively - with folders, of course, having a recursive parent-child relationship. I also needed it to be fast, and I certainly didn't want to be dropping back to client-side processing.
For performance I ended up writing stored procedures and inline functions - unfortunately much too complicated to post here (and I might get the sack for sharing company code!). The key, however, was to learn how to work with Recursive CTEs http://msdn.microsoft.com/en-us/library/ms186243.aspx. It took me a few days to nail it but the performance is incredible (they are very easy to get wrong though - so pay attention to the query plans).
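The author's code isn't available, but a minimal recursive CTE for the forum question looks roughly like the following. It uses Python's sqlite3 module so it runs as-is, and the table names are assumptions; in T-SQL you would write WITH subtree AS (...) without the RECURSIVE keyword. The anchor member selects the starting forum, and the recursive member repeatedly adds the children of rows already found, so one statement walks a tree of any depth.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE forum (id INTEGER PRIMARY KEY, parent_id INTEGER REFERENCES forum(id), name TEXT);
CREATE TABLE post  (id INTEGER PRIMARY KEY, forum_id INTEGER REFERENCES forum(id), title TEXT);
INSERT INTO forum VALUES (1, NULL, 'General'), (2, 1, 'Cars'), (3, 2, 'Electric'),
                         (4, 2, 'Classic'), (5, 1, 'Boats'), (6, NULL, 'Other');
INSERT INTO post VALUES (1, 1, 'Welcome'), (2, 3, 'Range anxiety'),
                        (3, 4, 'Restoring a Beetle'), (4, 5, 'Sailing'), (5, 6, 'Off-topic');
""")

SUBTREE_POSTS = """
WITH RECURSIVE subtree(id) AS (
    SELECT id FROM forum WHERE id = ?        -- anchor member: the starting forum
    UNION ALL
    SELECT f.id FROM forum f
    JOIN subtree s ON f.parent_id = s.id     -- recursive member: children of rows found so far
)
SELECT p.title FROM post p
WHERE p.forum_id IN (SELECT id FROM subtree)
ORDER BY p.id
"""


def posts_in_subtree(forum_id):
    return [title for (title,) in conn.execute(SUBTREE_POSTS, (forum_id,))]


print(posts_in_subtree(2))  # ['Range anxiety', 'Restoring a Beetle']
```

Unlike a left/right tree, this keeps the plain parent column, so inserting or moving a forum touches only one row. The cost is paid at query time, which is why the query plan matters.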

How to deal with large objects?

I have 5 types of objects: place info (14 properties), owner company info (5 properties), pictures, ratings (stores multiple vote results), and comments.
All five objects are combined into one object (Place), which will hold all the properties and information about the place: its info, pictures, comments, etc.
What I'm trying to achieve is a page that displays the Place object and all its properties. Another issue: if I want to display the owner companies' profiles, I'll have an object for each owner company (but I'll add a sixth property, which is a list of all the places they own).
I've been practicing for a while, but I have no real experience with implementation and performance, and I sense that this is too much!
What do you think ?
You have to examine the use case scenarios for your solution. Do you need to always show all of the data, or are you starting off with displaying only a portion of it? Are users likely to expand any collapsed items as part of regular usage or is this information only used in less common usages?
Depending on your answers it may be best to fetch and populate the entire page with all of the data at once, or it may be the case that only some data is needed to render the initial screen and the rest can be fetched on-demand.
In most cases the best solution is likely to involve fetching only the required data and updating the page dynamically using Ajax queries as needed.
As for optimizing data access, you need to strike a balance between the number of database requests and the complexity of each individual request. Because of network latency it is often important to fetch as much as possible using as few queries as possible, even if this means you'll sometimes be fetching data that you do not always need. But if you include too much data in a single query, then computing all the joins may also be costly. It is quite rare to see a solution in which it is better to first fetch all root objects and then for every element go fetch some additional objects associated with that element. As such, design your solution to fetch all data at once, but include only what you really need and try to keep the number of involved tables to a minimum.
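The pattern warned against above (fetch the roots, then go back for each root's children) is often called the "N+1 queries" problem. Here is a minimal sketch with Python's sqlite3 module and made-up place/comment tables. The run helper is hypothetical and only counts round trips.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE place   (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE comment (id INTEGER PRIMARY KEY, place_id INTEGER REFERENCES place(id), body TEXT);
INSERT INTO place VALUES (1, 'Cafe'), (2, 'Museum'), (3, 'Park');
INSERT INTO comment VALUES (1, 1, 'Good coffee'), (2, 1, 'Slow service'), (3, 3, 'Shady');
""")

query_count = 0


def run(sql, params=()):
    """Execute a query, counting each call as one round trip."""
    global query_count
    query_count += 1
    return conn.execute(sql, params).fetchall()


def load_n_plus_one():
    # 1 query for the roots, then 1 more per root: N + 1 round trips.
    result = {}
    for place_id, name in run("SELECT id, name FROM place ORDER BY id"):
        rows = run("SELECT body FROM comment WHERE place_id = ? ORDER BY id", (place_id,))
        result[name] = [body for (body,) in rows]
    return result


def load_single_query():
    # One LEFT JOIN fetches everything in a single round trip.
    result = {}
    rows = run("""SELECT p.name, c.body FROM place p
                  LEFT JOIN comment c ON c.place_id = p.id
                  ORDER BY p.id, c.id""")
    for name, body in rows:
        comments = result.setdefault(name, [])
        if body is not None:  # a place with no comments yields one NULL row
            comments.append(body)
    return result


query_count = 0
a = load_n_plus_one()
n_plus_one_queries = query_count

query_count = 0
b = load_single_query()
single_queries = query_count
```

With three places the first approach issues four queries and the second issues one. The price of the join is that each place's columns are repeated on every comment row, which is the "too much data in a single query" side of the balance.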
You really have 3 issues to deal with, and they are often split into the data access layer (DAL), business logic layer (BLL) and user interface (UI).
Your objects obviously belong in the BLL and if you're considering performance then you need to consider how your objects will be created and how they interface to the DAL. I have many objects with 50-200 properties so 14 properties is really no issue.
The UI side of it is separate, and if you're concerned about the performance of displaying a lot of information on a single page, consider tabbed content, grids, etc.
Tackle it one thing at a time and see where your problems lie.

DB design when data is unknown about an entity?

I'm wondering if the following DB schema would have repercussions later. Let's say I'm writing a place entity. I'm not certain what properties of place will be stored in the DB. I'm thinking of making two tables: one to hold the required (or common) info, and one to hold additional info.
Table 1 - Place
PK PlaceId
Name
Lat
Lng
etc... (all the common fields)
Table 2 - PlaceData
PK DataId
PK FieldName
PK FK PlaceId
FieldData
Usage Scenario
I want certain visitors to have the capability of entering custom fields about a place. For example, a restaurant is a place that may have the following fields: HasParking, HasDriveThru, RequiresReservation, etc... but a car dealer is also a place, and those fields wouldn't make sense for a car dealer.
I want to support any type of place, from a single table (well, 2nd table has custom fields), because I don't know the number of types of places that will eventually be added to my site.
Overall goal
On my ASP.NET MVC (C#/Razor) site, where I display a place, it will show the attributes as an unordered list populated by: SELECT * FROM PlaceData WHERE PlaceId = #0.
This way, I wouldn't need to show empty field names on the view, or do a string.IsNullOrWhiteSpace() check for each and every field, which I would be forced to do if every attribute were a column on the table.
I'm assuming this scenario is quite common, but are there better ways to do it? Particularly from a performance perspective? What are the major drawbacks of this schema?
Your idea is referred to as an Entity-Attribute-Value (EAV) table and is generally bad news in an RDBMS. RDBMSes are geared toward highly structured data.
The overall options are:
Model the database further in the RDBMS. This is most likely the right path if someone is holding back specs from you.
Stick with the RDBMS, using XML columns for the data whose structure is variable. This makes the most sense if a relatively small portion of your data storage schema is semi- or un-structured. Speaking from an MS SQL Server perspective, this data can be indexed, and you can check that your data complies with an XML schema definition.
Move to a non-relational DB such as MongoDB, Cassandra, CouchDB, etc. This is what a lot of social sites (and, I suspect, blog sites) run on. It is also within reason to use a combination of RDBMS and non-relational stores if that's what your needs call for.
EAV gets to be a mess because you're creating a database within a database and lose all of the benefits an RDBMS can provide (foreign keys, data type enforcement, etc.), and the SQL code needed to reconstruct your objects goes from lasagna to fettuccine to spaghetti in the blink of an eye.
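To see where the reconstruction SQL grows, here is a minimal sketch of the PlaceData idea in SQLite via Python's sqlite3 module. It simplifies the question's key to (place_id, field_name), and the field names are illustrative. Listing one place's attributes is easy, but getting column-shaped results back needs one conditional aggregate per attribute, and nothing stops 'true' and 'yes' from meaning the same thing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE place (place_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE place_data (
    place_id   INTEGER NOT NULL REFERENCES place(place_id),
    field_name TEXT    NOT NULL,
    field_data TEXT,                  -- every value is text: no type enforcement
    PRIMARY KEY (place_id, field_name)
);
INSERT INTO place VALUES (1, 'Burger Barn'), (2, 'Auto Mart');
INSERT INTO place_data VALUES
    (1, 'HasParking', 'true'), (1, 'HasDriveThru', 'true'),
    (2, 'HasParking', 'yes'),         -- nothing stops inconsistent encodings
    (2, 'Brands', 'Ford;Kia');
""")

# The question's use case (list one place's attributes) is simple:
attributes = conn.execute(
    "SELECT field_name, field_data FROM place_data WHERE place_id = ? ORDER BY field_name",
    (1,)).fetchall()

# Rebuilding a row needs one conditional aggregate per attribute,
# and every new attribute means editing this query.
PIVOT = """
SELECT p.name,
       MAX(CASE WHEN d.field_name = 'HasParking'   THEN d.field_data END) AS has_parking,
       MAX(CASE WHEN d.field_name = 'HasDriveThru' THEN d.field_data END) AS has_drive_thru
FROM place p
LEFT JOIN place_data d ON d.place_id = p.place_id
GROUP BY p.place_id, p.name
ORDER BY p.place_id
"""
rows = conn.execute(PIVOT).fetchall()
```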
Given the information that's been added to the question, it would seem a good fit to create a PlaceDetails column of type XML in the Place table. You could also split that column into another table with a 1:1 relationship if performance requirements dictate it.
The upside to doing it that way is that you can retrieve the data using very simple SQL code, even using the xml data type's methods for searching the data. But that approach also allows you to do the more complex presentation-oriented data parsing in C#, which is better suited to that purpose than T-SQL is.
If you want your application to be able to create its own custom fields, this is a fine model. The Mantis Bugtracker uses this as well to allow Admins to add custom fields to their tickets.
If, however, it is always going to be the programmer who creates the fields, I must agree with pst that this is more of a premature optimization.
You can add new columns to the database at any time (always keeping third normal form in mind), so you should go with what you want and only create a second table if needed or if such columns break any of the normal forms.

Customizable database

What would be the best database/technique to use if I'd like to create a database that can "add", "remove" and "edit" tables and columns?
I'd like it to be scalable and fast.
Should I use one table with five columns for this (Id, Table, Column, Type, Value)? Are there any good articles about this, or are there other solutions?
Maybe three tables: One that holds the tables, one that holds the columns and one for the values?
Maybe someone already has created a db for this purpose?
My only requirement is that I'm using .NET (I guess the database doesn't have to be on Windows, but I would prefer that).
Since you are aware (from the comments on the question) of the pitfalls of the "inner platform effect", I'll just note that this is a very common requirement, in particular for storing custom user-defined columns, and indeed most teams have needed it. Having tried various approaches, the one I have found most successful is to keep the extra data in-line with the record. In particular, this makes it simple to obtain the data without extra steps like a second complex query on an external table, and it means that all the values share things like timestamp/rowversion for concurrency.
In particular, I've found a CustomValues column (for example text or binary; typically json / xml, but could be more exotic) a very effective way to work, acting as a property-bag for the additional data. And you don't have to parse it (or indeed, SELECT it) until you know you need the extra data.
All you then need is a way to tie named keys to expected types, but you need that metadata anyway.
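A minimal sketch of the in-line property bag, assuming JSON in a SQLite TEXT column via Python's sqlite3 and json modules. The column name custom_values and the CUSTOM_FIELD_TYPES metadata are hypothetical. The metadata maps each named key to its expected type and is checked before anything is stored; the bag is only parsed when a caller asks for it.

```python
import json
import sqlite3

# Hypothetical metadata tying each named key to its expected type.
CUSTOM_FIELD_TYPES = {"HasParking": bool, "Seats": int, "Cuisine": str}


def validate(values):
    """Reject unknown keys and wrongly typed values before they are stored."""
    for key, value in values.items():
        expected = CUSTOM_FIELD_TYPES.get(key)
        if expected is None:
            raise KeyError(f"unknown custom field: {key}")
        # bool is a subclass of int in Python, so rule it out for int fields.
        if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
            raise TypeError(f"{key} must be {expected.__name__}")
    return values


conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE place (
    place_id      INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    custom_values TEXT               -- portable JSON property bag
)""")


def save_place(place_id, name, custom):
    payload = json.dumps(validate(custom), sort_keys=True)
    conn.execute("INSERT INTO place VALUES (?, ?, ?)", (place_id, name, payload))


def load_custom_values(place_id):
    # Only this query selects and parses the bag; other queries can skip the column.
    (payload,) = conn.execute(
        "SELECT custom_values FROM place WHERE place_id = ?", (place_id,)).fetchone()
    return json.loads(payload) if payload else {}


save_place(1, "Burger Barn", {"HasParking": True, "Seats": 40})
print(load_custom_values(1))  # {'HasParking': True, 'Seats': 40}
```

JSON keeps the data portable, in line with the advice below against platform-bespoke serialization.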
I will, however, stress the importance of making the data portable; don't (for example) store any specific platform-bespoke serialization (for example, BinaryFormatter for .NET) - things like xml / json are fine.
Finally, your RDBMS may also work with this column; for example, SQL Server has the xml data type that allows you to run specific queries and other operations on xml data. You must make your own decision whether that is a help or a hindrance ;p
If you also need to add tables, I wonder if you are truly using the RDBMS as an RDBMS; at that point I would consider switching from an RDBMS to a document database such as CouchDB or RavenDB.

SQL Server for C# Programmers [closed]

I'm a pretty good C# programmer who needs to learn SQL Server. What's the best way for me to learn SQL Server/Database development?
Note: I'm a total newb when it comes to DB's and SQL.
SQL is about set theory, or more correctly, relational algebra. Read a brief primer on that. And learn to think in sets, not in procedures.
On the practical side, there are four fundamental operations,
selects, which show some projection of the data in one or more tables,
deletes, which remove some subset of a table's rows,
inserts, which add rows to a table,
updates, which (possibly) change data in a table
(By subset, I mean any subset, including the empty set, and not necessarily a proper subset.)
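The four operations can be tried directly in an in-memory SQLite database from Python, used here as a stand-in for SQL Server; the person table is made up. The delete deliberately matches no rows, illustrating that the affected subset may be empty.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")

# insert: add rows to a table
conn.executemany("INSERT INTO person (name, city) VALUES (?, ?)",
                 [("Ann", "Atlanta"), ("Bob", "Boston"), ("Cy", "Atlanta")])

# update: (possibly) change data in a subset of rows
conn.execute("UPDATE person SET city = 'Austin' WHERE name = 'Bob'")

# delete: remove a subset of rows; here the subset is empty
deleted = conn.execute("DELETE FROM person WHERE city = 'Denver'").rowcount

# select: a projection of the table's data, restricted by a predicate
atlanta = conn.execute(
    "SELECT name FROM person WHERE city = 'Atlanta' ORDER BY name").fetchall()
print(atlanta)  # [('Ann',), ('Cy',)]
```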
Anywhere I can write a column name in DML (except as the target of an update), I can write an expression that uses column names, functions, or constants.
select 1, 2, 3 from table will return the result set "1 2 3", once for each row in the table. If the column named create_date is of type date, and the function month returns a month number given a date, select month(create_date) from table will show me the month number for each create_date.
A where clause is a predicate that restricts the rows selected, deleted, or updated to those rows for which the predicate is true. A where clause can be composed of an arbitrary number of predicates connected by the logical operators and, or, and not. Just like the column list in a select, I can use column names, functions, and constants in my where clause. What result set do you think is returned from select * from table where 1 = 1;?
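These examples can be checked against SQLite from Python. SQLite has no month() function, so CAST(strftime('%m', ...) AS INTEGER) stands in for it here; the event table is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event (id INTEGER PRIMARY KEY, create_date TEXT)")
conn.executemany("INSERT INTO event (create_date) VALUES (?)",
                 [("2011-05-03",), ("2011-07-19",), ("2011-05-28",)])

# Constants in the select list: one "1 2 3" row per row in the table.
constants = conn.execute("SELECT 1, 2, 3 FROM event").fetchall()

# A function applied to a column: the month number of each create_date.
months = conn.execute(
    "SELECT CAST(strftime('%m', create_date) AS INTEGER) FROM event ORDER BY id").fetchall()

# A predicate that is always true restricts nothing: every row comes back.
all_rows = conn.execute("SELECT * FROM event WHERE 1 = 1").fetchall()

# Predicates combined with and / not.
may_after_first = conn.execute("""
    SELECT id FROM event
    WHERE NOT (CAST(strftime('%m', create_date) AS INTEGER) = 7) AND id > 1""").fetchall()
```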
In a query, tables are related by joins, in which some datum or key in one is related by an operator to a datum or key in another table. The relational operator is often equality, but can in fact be any binary operator or even a function.
Tables are related, as I mentioned above, by keys; a row in a table may relate to zero, one, or many rows in another table; this is referred to as the cardinality of the relation. Relations may be one-to-one, one-to-many, many-to-many. There are standard ways of representing each relation. Before you look up the standard ways to do this, think about how you'd represent each one, what the minimum requirements of each kind is. You'll see that a many-to-many relation can in fact also model one-to-many and one-to-one; ask yourself why, given that, all relations are not many-to-many.
E. F. Codd, among others, pioneered the idea of normal form in relational databases. There are commonly held to be five or six normal forms, but the most important summary of normal form is simple: every entity that your database models should be represented by one row and one row only, every attribute should depend on the row's key, and every row should model an entity or a relationship. Read a primer on normal form, and understand why you can get data inconsistencies if your database isn't normalized.
In all this, try to understand why I like to say "if you lie to the database, it will lie to you". By this I don't mean bad data, I mean bad design. E.g., if you model a one-to-many relation as many-to-many, what "lies" can be recorded? What "lies" can happen if your tables aren't normalized?
A view, in practical terms, is a select query given a name and stored in the database. If I often join table student to table major through the many-to-many relation student_major, maybe I can write a view that selects the columns of interest from that join, and use the view instead of always rewriting the join.
Practical tips: first, write a view. Whatever you're doing, it'll be simpler and clearer if you write a view for every calculation or sub-calculation you do. Write a view that encapsulates each join, and a view that encapsulates each transformation. Almost anything you want to do can be done in a view.
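A minimal sketch of the student/major example in SQLite via Python: the junction table student_major represents the many-to-many relation, and a view (named student_major_v here, an assumption) encapsulates the join so later queries don't repeat it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE major   (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
-- Junction table: one row per (student, major) pair models the many-to-many relation.
CREATE TABLE student_major (
    student_id INTEGER NOT NULL REFERENCES student(id),
    major_id   INTEGER NOT NULL REFERENCES major(id),
    PRIMARY KEY (student_id, major_id)
);
-- The view names the join once so later queries can reuse it.
CREATE VIEW student_major_v AS
SELECT s.id AS student_id, s.name, m.title AS major
FROM student s
JOIN student_major sm ON sm.student_id = s.id
JOIN major m          ON m.id = sm.major_id;

INSERT INTO student VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO major   VALUES (1, 'Math'), (2, 'History');
INSERT INTO student_major VALUES (1, 1), (1, 2), (2, 1);
""")

math_students = conn.execute(
    "SELECT name FROM student_major_v WHERE major = 'Math' ORDER BY name").fetchall()
ann_majors = conn.execute(
    "SELECT major FROM student_major_v WHERE name = 'Ann' ORDER BY major").fetchall()
```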
Decomposing a query into views serves the same ends as functional decomposition serves in procedural code: it allows you to concentrate on doing one thing well, makes it more easily tested, and allows you to compose more complex functionality out of simpler operations. Here's an example where I use views to transform a table into forms that more easily allow me to apply successive transformations, in order to get to a goal.
Don't conflate data. Each table ought to unambiguously model one thing (one kind of entity) and only one thing; each column should express one and only one attribute of that thing. Different kinds of entities belong in different tables.
Metadata is your friend. Your database platform will provide some metadata; what it doesn't provide you should add. Since metadata is data, all the rules for modeling data apply. You can get, for example, the names of all objects in your database from the system table sysobjects; syscolumns contains all the columns. To find all the columns in one table, you'd join sysobjects and syscolumns on id, and add a where clause restricting the result set to a particular table name: where sysobjects.name = 'mytable'.
Experiment. Sit down at a database and ask yourself, "How can I represent people with hair colors and professions and residences? What tables and relations are implied in modeling that?" Then model that, as tables.
Then ask yourself, "How can I show all blonde doctors who reside in Atlanta", and write the query that does that. Piece it together by writing views that show you all blondes, all doctors, and all people who reside in Atlanta.
You'll find that in asking "how can I find that", you'll expose deficiencies in your model, and you'll find that you want or even need to change the way your model works. Make the changes, see how they make your queries easier or harder to write.
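One possible answer to the exercise, sketched in SQLite via Python. All table, view and column names are assumptions, and storing hair color and city as plain columns is a simplification that the exercise invites you to challenge. Each question ("all blondes", "all doctors", "all Atlanta residents") gets its own small view, and the final query composes them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person     (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                         hair_color TEXT, city TEXT);
CREATE TABLE profession (id INTEGER PRIMARY KEY, title TEXT NOT NULL UNIQUE);
CREATE TABLE person_profession (          -- a person may have several professions
    person_id     INTEGER NOT NULL REFERENCES person(id),
    profession_id INTEGER NOT NULL REFERENCES profession(id),
    PRIMARY KEY (person_id, profession_id)
);

-- One small view per question, then compose them.
CREATE VIEW blondes AS SELECT id FROM person WHERE hair_color = 'blonde';
CREATE VIEW doctors AS
    SELECT pp.person_id AS id FROM person_profession pp
    JOIN profession pr ON pr.id = pp.profession_id WHERE pr.title = 'doctor';
CREATE VIEW atlantans AS SELECT id FROM person WHERE city = 'Atlanta';

INSERT INTO person VALUES (1, 'Ann', 'blonde', 'Atlanta'), (2, 'Bea', 'blonde', 'Boston'),
                          (3, 'Cal', 'brown',  'Atlanta'), (4, 'Dee', 'blonde', 'Atlanta');
INSERT INTO profession VALUES (1, 'doctor'), (2, 'pilot');
INSERT INTO person_profession VALUES (1, 1), (2, 1), (3, 1), (4, 2);
""")

blonde_doctors_in_atlanta = conn.execute("""
    SELECT p.name FROM person p
    WHERE p.id IN (SELECT id FROM blondes)
      AND p.id IN (SELECT id FROM doctors)
      AND p.id IN (SELECT id FROM atlantans)
    ORDER BY p.name
""").fetchall()
```

Trying to add "people who have lived in several cities" to this model is a good way to find its first deficiency.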
I love Joe Celko books from novice to advanced. I also think virtual labs are great.
An easy way to learn SQL syntax?
Use Microsoft Access. Use the Northwind sample database, open Access up in Query view and run some queries.
Creating a Simple Query
Start with SELECT * FROM and work your way up to more complicated examples.
One of the best resources is http://www.sqlservercentral.com/, which has tons of articles.
Another good resource is http://www.trainingspot.com/VideoLibrary/Default.aspx
And here is a list of books my DBA suggested I read for learning SQL
The Best Damn Exchange, SQL and IIS Book Period (also on Google Books)
Beginning SQL Server 2008 for Developers (also on Google Books)
Here are the three books I strongly recommend you read in order.
Beginning SQL Server 2005 Programming
Professional SQL Server 2005 Programming
The Guru's Guide to Transact-SQL
W3Schools has a nice tutorial with a try-it-yourself setup. But other than installing an Express edition and doing a bunch of trial runs with the demo databases, I'd say no book will teach you better.
I would say your very best bet is to sign up for a DB class at a local college. You can usually find an evening class. You will start with simple Database concepts like what is a database, and what are tables.
The instructor will usually give you a project as homework about halfway through the class, where you will design and implement a simple database for something like a video store. You will have interaction with other students who are at your same level and will be interested in discussing the technical details from a new DB guy standpoint. And you will have an experienced instructor you can ask questions of and get timely interaction from, who won't be snarky like us internet posters :)
Get it from the horse's mouth: http://www.asp.net/learn/videos/default.aspx?tabid=63#sql
These days most of the universities have their courses online. Try to research some good professors and learn the fundamentals. Their assignments are also useful.
Off the top of my head, I can think of MIT OpenCourseWare (OCW).
This depends on what you will need to do. If you just need to access databases, you should have a look at the various access strategies (DataReader, DataSet, LINQ to SQL, Entity Framework, NHibernate) and pick a solution.
If you need to develop databases, get a good book on that topic. Get familiar with the theoretical stuff: relational algebra, keys, referential integrity and normalization. Then have a look at SQL, and finally you may take a closer look at ACID transactions, locking, concurrency control, indexes, and all the technical details that make a database server work.
I would suggest reading the Wikipedia articles (maybe the 100 most important ones) to get the big picture, and then approaching the details where required. But this will probably be no replacement for a good book if you want to become a good database developer.
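As a first taste of ACID transactions, here is a minimal sketch with Python's sqlite3 module; the account table and transfer function are hypothetical. Using the connection as a context manager commits on success and rolls back if an exception escapes. The CHECK constraint makes an overdraft fail after the credit has already been applied, so the rollback is what keeps both balances consistent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, "
             "balance INTEGER NOT NULL CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(src, dst, amount):
    """Move money atomically: both updates commit together or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:  # CHECK constraint violated: overdraft
        return False


def balances():
    return dict(conn.execute("SELECT id, balance FROM account"))


ok = transfer(1, 2, 30)         # succeeds: balances become {1: 70, 2: 80}
overdraft = transfer(1, 2, 500) # fails; the credit to account 2 is rolled back too
```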
I tend to like books because I can read them anywhere, I can go at my own pace, and I can get eBook copies (when using Apress). I also happen to learn more efficiently this way, as I already know most of the concepts, like database types (int, bool, guid, etc.); you will know those as well. So, essentially, I would recommend the Apress series of books, which are very comprehensive IMO. And you can generally find them used for very cheap on Amazon. Here is one tailored to you:
http://www.amazon.com/Beginning-SQL-Server-2008-Developers/dp/1590599586/ref=sr_1_1?ie=UTF8&s=books&qid=1239758026&sr=1-1
When you sign up to Microsoft Books Newsletters (From Microsoft Press) they actually give you (free) an ebook called Introducing SQL Server 2008.
http://csna01.libredigital.com/?urss1q2we6
