Given that Lucene is a robust document-based search engine, could it be used as an Object Database for simple applications (e.g., CMS-style applications)? And if so, what do you see as the benefits and limitations?
I understand the role of the RDBMS (and use them on a daily basis) but wanted to explore other technologies/ideas.
For example say my domain entities are like:
[Serializable]
public class Employee
{
    public string FirstName { get; set; }
    public string Surname { get; set; }
}
Could I use reflection and store the property values of the Employee object as fields in a Lucene document, plus store a binary-serialized version of the Employee object in another field of the same Lucene document?
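To make that concrete, here is the kind of thing I have in mind (a rough sketch against the Lucene.NET 3.x API; the builder class and field names are just illustrative):

using System.IO;
using System.Runtime.Serialization.Formatters.Binary;
using Lucene.Net.Documents;

public static class EmployeeDocumentBuilder
{
    public static Document Build(Employee employee)
    {
        var doc = new Document();

        // Index each string property as a searchable field, via reflection.
        foreach (var prop in typeof(Employee).GetProperties())
        {
            var value = prop.GetValue(employee, null) as string;
            if (value != null)
                doc.Add(new Field(prop.Name, value, Field.Store.YES, Field.Index.ANALYZED));
        }

        // Store the binary-serialized object alongside the indexed fields.
        using (var stream = new MemoryStream())
        {
            new BinaryFormatter().Serialize(stream, employee);
            doc.Add(new Field("__blob", stream.ToArray(), Field.Store.YES));
        }
        return doc;
    }
}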
No. Trying to use Lucene as an effective OODB (Object Oriented Database) is going to be like trying to fit a square peg into a round hole. They're really two completely different beasts.
Lucene is good at building a text index of a set of documents... not at storing objects (in a programming sense). Maybe you misunderstand what an Object Oriented Database is. You can check out the definition at Wikipedia:
Object Databases
Object Oriented Databases have their place. If you truly have an application that would benefit from an OODB, I would suggest checking out something like InterSystems Caché.
I'm currently transitioning from an RDBMS to a NoSQL solution, more specifically MongoDB. Consider the following tables in my database (the original solution is much more complex, but I include this so you have an idea):
User (PK_ID_User, FirstName, LastName, ...);
UserProfile (PK_ID_UserProfile, ProfileName, FK_ID_User, ...);
The keys in this table are GUIDs, however they are custom generated. For example:
User GUIDs will be of the following structure: US022d717e507f40a6b9551f11ebf2fcb4 (so, a US prefix plus random hex digits),
while UserProfile GUIDs will be of this format: UP0025f5804a30483b9b769c5707b02af6 (so, a UP prefix plus random hex digits).
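For reference, generating such an ID in C# is a one-liner; presumably something along these lines:

// Prefix plus a GUID rendered as 32 hex digits without dashes,
// e.g. US022d717e507f40a6b9551f11ebf2fcb4
string userGuid = "US" + Guid.NewGuid().ToString("N");
string userProfileGuid = "UP" + Guid.NewGuid().ToString("N");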
Now, suppose I want to convert this RDBMS data model to NoSQL MongoDB. For my application (which uses the C# driver), it is very important that all document properties in MongoDB keep the same names. This also applies to the ID fields: the names PK_ID_User and PK_ID_UserProfile, including the GUID values, have to stay the same.
Now, MongoDB uses a standard, uniquely indexed property _id for storing IDs. The name of this _id field can of course not be changed, even though I really need my application to preserve the column/property names.
So I came up with the following document structures for my Users and User Profiles. Bear in mind that, for this case, I chose to use referenced data modeling over embeds for various reasons I won't explain here:
User-document
{
    _id: ObjectId, // indexed
    PK_ID_User: custom GUID, // indexed, as it needs to be unique
    FirstName: string,
    ...
}
UserProfile-document
{
    _id: ObjectId, // indexed
    PK_ID_UserProfile: custom GUID (as explained above), // indexed, as it needs to be unique
    ...
}
And here's the C# class:
public class User
{
    [BsonConstructor]
    public User() { }

    [BsonId] // the _id field
    [BsonRepresentation(BsonType.ObjectId)]
    public string Id { get; set; }

    [BsonElement("PK_ID_User")]
    public string PK_ID_User { get; set; }

    // Other mapped properties
}
The reason I chose this modelling strategy is the following: the current project consists of a web service, using an ORM and an RDBMS, and a client side that more or less maps the database objects to client-side view objects. So it's really necessary to preserve the names of the IDs/PKs as much as possible. I decided it would be best to let MongoDB use the ObjectIds internally (for CRUD operations), as they don't cause performance overhead, and to use the custom GUIDs so they stay compatible with the rest of my code. This way, minimal changes have to be made, MongoDB is happy and I am happy, since externally I can keep querying results by my GUID PKs, which will always be unique. As my PK GUIDs are stored in MongoDB as unique strings, I don't think I have to worry about GUID overhead on the server side: the GUIDs are created by my C# application.
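For illustration, the unique index on the GUID field and a typical lookup would look roughly like this with the 2.x C# driver (the collection name is invented):

// using MongoDB.Driver; database is an IMongoDatabase
var users = database.GetCollection<User>("Users");

// Enforce uniqueness of the custom GUID alongside the default _id index.
users.Indexes.CreateOne(new CreateIndexModel<User>(
    Builders<User>.IndexKeys.Ascending(u => u.PK_ID_User),
    new CreateIndexOptions { Unique = true }));

// External lookups keep using the GUID PK, as the rest of the code expects.
var user = users.Find(u => u.PK_ID_User == "US022d717e507f40a6b9551f11ebf2fcb4")
                .FirstOrDefault();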
However, I have my doubts about performance: I now always have a minimum of 2 indexes per document/collection, and I have no idea how costly that is in terms of performance.
Is there a better approach for my problem, or should I stick to my current solution?
Kind regards.
I now always have a minimum of 2 indexes per document/collection, and I have no idea how costly that is in terms of performance.
Indexes cost performance for inserts and updates, and you posted no info on the frequency of write operations or your setup. It'll be impossible to give a definite answer without a measurement.
Then again, given that you're building a web application, I'd say the sheer network delay to your clients will be several orders of magnitude higher than the difference between, say, 1, 2 or 3 indexes, since all these operations mostly hit RAM.
What's costly is writing to disk, not restructuring the B-tree in memory. Of course, having more and more indexes increases the likelihood that an insert leads to a costly restructuring of an index tree that has to hit the disk, but that also depends on the structure of the keys themselves.
If anything, I'd worry about the poor cache coherence and time-locality of GUIDs: if your data were very time-local (like logs), then a GUID might hurt (high jitter at the beginning of the string), because updates would be more likely to rearrange entire sub-trees, and a typical time-range query would grab items scattered throughout the tree. But since this appears to be about users and user profiles, such a query would probably not make much sense.
We're using L2S and we have a class like this:
public class HumanContainer
{
    public List<IHuman> Humans { get; set; }
    public string SomeOtherProperty { get; set; }
}
Our database has tables like this:
HumanContainer
- Geek
We've only had one type of Human so far (Geek). And when we send/retrieve HumanContainers to/from the DB, we know to treat them as Geeks. Now that we need a second Human (Athlete), we have a choice to make for how to implement this.
One option is to create another table (Athlete) in the DB:
HumanContainer
- Geek
- Athlete
For every new concrete Human like this, we'll need to loop through HumanContainer.Humans, detect the type, add it to the appropriate EntitySet<>, then save.
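That loop might look something like this (the Geeks/Athletes table properties on the DataContext are hypothetical names):

foreach (var human in container.Humans)
{
    // Detect the concrete type and route it to the matching table.
    if (human is Geek)
        dataContext.Geeks.InsertOnSubmit((Geek)human);
    else if (human is Athlete)
        dataContext.Athletes.InsertOnSubmit((Athlete)human);
}
dataContext.SubmitChanges();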
Another option is to have only one table for all Humans:
HumanContainer
- Humans
If we do that, then we'll need something like an XML column where we serialize the Human into its specific type and store it in that column. Then we'll need to deserialize that column when retrieving the data.
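The serialization step could be sketched like this, assuming the concrete Human types are XML-serializable:

using System.IO;
using System.Xml.Serialization;

public static class HumanXmlSerializer
{
    public static string ToXml(IHuman human)
    {
        // Serialize with the runtime type so the concrete Human (Geek, Athlete, ...)
        // round-trips; the type name has to be stored as well, so the reader knows
        // which type to deserialize.
        var serializer = new XmlSerializer(human.GetType());
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, human);
            return writer.ToString();
        }
    }
}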
Is one of the approaches recommended? I'm curious to know how people have been handling this situation. Is there a third approach that I haven't listed here?
What it sounds like you are trying to do is represent inheritance in a relational database. Guy Burstein has a pair of great articles on this subject: How To: Model Inheritance in Databases and Linq to SQL Inheritance.
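For the single-table flavour, Linq to SQL models inheritance with a discriminator column; a sketch of what that could look like (table, column, and code values invented; note that L2S wants a common base class rather than an interface here):

using System.Data.Linq.Mapping;

[Table(Name = "Humans")]
[InheritanceMapping(Code = "G", Type = typeof(Geek), IsDefault = true)]
[InheritanceMapping(Code = "A", Type = typeof(Athlete))]
public class Human
{
    [Column(IsPrimaryKey = true)]
    public int Id { get; set; }

    // L2S reads this column to decide which subclass to materialize.
    [Column(IsDiscriminator = true)]
    public string HumanType { get; set; }
}

public class Geek : Human { /* geek-specific columns */ }
public class Athlete : Human { /* athlete-specific columns */ }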
If I understand your question correctly, it is possible that you will have different types of humans in the future. You can try one of the following solutions.
Solution 1:
As you mentioned, create only one table in the database, Humans, serialize the Human into its specific type, store it in that column, and deserialize the column when retrieving the data. This solution seems good because if we need any new Human type in the future, we don't need to change the database design, and it is easy to manage.
The disadvantage of this solution is that if the application requires only Geek-type Humans, we first need to retrieve the column data and deserialize it before we can find the Geek Humans.
Solution 2:
Create two tables in the database:
1) HumanType: stores the type of human (Geek, Athlete, or any other type).
2) Human: stores the Human information, with a foreign key to HumanType.
The advantage of this solution is that you can easily query based on your requirements (e.g., only Geek-type Humans can be fetched directly from the table), and if any new Human type comes along, only one new row in the HumanType table is required.
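Sketched as Linq to SQL entities (all table, column, and property names invented for illustration):

using System.Data.Linq.Mapping;

[Table(Name = "HumanType")]
public class HumanType
{
    [Column(IsPrimaryKey = true)] public int Id { get; set; }
    [Column] public string Name { get; set; } // "Geek", "Athlete", ...
}

[Table(Name = "Human")]
public class Human
{
    [Column(IsPrimaryKey = true)] public int Id { get; set; }
    [Column] public int HumanTypeId { get; set; } // FK to HumanType.Id
    // human-specific columns...
}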
I’ve done some Googling but I have yet to find a solution, or even a definitive answer to my problem.
The problem is simple. I want to dynamically create a table per instance of a dynamically named/created object. Each table would then contain records that are specific to the object. I am aware that this is essentially an anti-pattern, but these tables could theoretically become quite large, so having all of the data in one table could lead to performance issues.
A more concrete example:
I have a base class/interface ACCOUNT which contains a collection of transactions. For each company that uses my software I create a new concrete version of the class, BOBS_SUB_SHOP_ACCOUNT or SAMS_GARAGE_ACCOUNT, etc. So the identifying value for the class is the class name, not a field within the class.
I am using C# and Fluent nHibernate.
So my questions are:
Does this make sense, or do I need to clarify more? (Or am I trying to do something I REALLY shouldn't?)
Does this pattern have a name?
Does NHibernate support this?
Do you know of any documentation on the pattern I could read?
Edit: I thought about this a bit more and I realized that I don't REALLY need dynamic objects. All I need is a way to tie objects with some identifier to a table through NHibernate. For example:
//begin - just a brain dump
public class Account
{
    public virtual string AccountName { get; set; }
    public virtual IList Stuff { get; set; }
}

... somewhere else in code ...

//gets mapped to a table BobsGarageAccount (or something similar)
var BobsGarage = new Account { AccountName = "BobsGarage" };

//gets mapped to a table StevesSubShop (or something similar)
var StevesSubShop = new Account { AccountName = "StevesSubShop" };

//end
That should suffice for what I need, assuming NHibernate would allow it. I am trying to avoid a situation where one giant table gets the heck beaten out of it if high volume occurs on the account tables. If all accounts were in one table... it could be ugly.
Thank you in advance.
Rather than creating a class on the fly, I would recommend a dynamic object. If you implement the right interfaces (one example is here, and in any case you can get there by inheriting from DynamicObject), you can write
dynamic bobsSubShopAccount = new DynamicAccount("BOBS_SUB_SHOP_ACCOUNT");
Console.WriteLine("Balance = {0}", bobsSubShopAccount.Balance);
in your client code. If you use the DLR to implement DynamicAccount, all these calls get intercepted and passed to your class at runtime. So, you could have the method
public override bool TryGetMember(GetMemberBinder binder, out object result)
{
    // binder.Name is the member the client code asked for, e.g. "Balance".
    if (DatabaseConnection.TryGetField(binder.Name, out result))
        return true;

    // Log the database failure here
    result = null;
    return false; // The attempt to get the member fails at runtime
}
to read the data from the database using the name of the member requested by client code.
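Pulled together, a self-contained shell of such a class might look like this (a dictionary stands in for the real database access):

using System.Collections.Generic;
using System.Dynamic;

public class DynamicAccount : DynamicObject
{
    // Stand-in for the real data access, keyed by field name.
    private readonly Dictionary<string, object> _fields;

    public DynamicAccount(string accountName)
    {
        // A real implementation would load the row for accountName here.
        _fields = new Dictionary<string, object> { { "Balance", 42.0m } };
    }

    public override bool TryGetMember(GetMemberBinder binder, out object result)
    {
        // Invoked for every member access on the dynamic reference.
        return _fields.TryGetValue(binder.Name, out result);
    }
}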
I haven't used NHibernate, so I can't comment with any authority on how NHibernate will play with dynamic objects.
Those classes seem awfully smelly to me, and they attempt to solve what amounts to a storage-layer issue, not a domain issue. Sharding is essentially the term you are looking for.
If you are truly worried about performance of the DB, and your loads will be so large, perhaps you might look at partitioning the table instead? Your domain objects could easily handle creating the partition key, and you don't have to do crazy voodoo with NHibernate. This will also make it easier to avoid nutty domain-level things in case you change your persistence mechanism later. You can create collection filters in your maps, or map read-only objects to a view. The latter option would be a bit smelly in the domain, though.
If you absolutely insist on doing some voodoo you might want to look at NHibernate.Shards, it was intended for easy database sharding. I can't say what the current dev state and compatibility is, but it's an option.
I'm not well versed in domain-driven design and I've recently started creating a domain model for a project. I still haven't decided on an ORM (though I will likely go with NHibernate) and I am currently trying to ensure that my Value Objects really are just that.
I have a few VOs that have almost no behavior other than to encapsulate "like" terms, for instance:
public class Referral {
    public Case Case { get; set; } // a reference to the aggregate root
    public ReferralType ReferralType { get; set; } // this is an enum
    public string ReferralTypeOther { get; set; }
} // etc, etc.
This particular class has a reference to "Case" which is two levels up, so if say I were going to access a Referral I could go: case.social.referral (Case, Social and Referral are all classes, there is a single Social inside a Case and there is a single Referral inside a Social). Now that I am looking at it as I type it, I don't think I need a Case in the Referral since it will be accessible through the Social entity, correct?
Now, there is no doubt in my mind that this is something that should be a VO, and the method I plan to use to persist it to the database is to either have NHibernate assign it a surrogate identifier (which I am still not too clear on; could anyone please elaborate on whether the surrogate identifier requires that I already have an Id in my VO, or whether it can operate without one?) and/or a protected Id property that would not be exposed outside the Referral class (for the sole purpose of persisting to the DB).
Now on to my title question: should a VO have a collection (in my case a List) inside it? I can only think of this as a one-to-many relationship in the database, but since there is no identity it didn't seem adequate to make the class an entity. Below is the code:
public class LivingSituation {
    private IList<AdultAtHome> AdultsAtHome { get; set; }
    public ResidingWith CurrentlyResidingWith { get; set; } // this is an enum
} // etc, etc.
This class currently doesn't have an Id, and the AdultsAtHome class just has intrinsic types (string, int). So I am not sure whether this should be an entity, or whether it can remain a VO and I just need to configure my ORM to use a 1:m relationship with its own table and a private/protected Id field so that the ORM can persist it to the DB.
Also, should I go with normalized tables for each of my classes, or not? I think I would only need to use a table per class when there is a possibility of having multiple instances of the class assigned to an entity or value object, and/or there is the possibility of having collections (1:m relationships) with some of those objects. I have no problem with using a single table for certain value objects that have intrinsic types, but with nested types I think it would be advantageous to use normalized tables. Any suggestions on this as well?
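For the simple case, I picture something like Fluent NHibernate's component mapping, which folds a value object's properties into the owning entity's table (assuming, for illustration, an Address value object on my Case entity):

// using FluentNHibernate.Mapping;
public class CaseMap : ClassMap<Case>
{
    public CaseMap()
    {
        Id(x => x.Id);

        // Address is a value object: no table and no id of its own;
        // its properties become columns on the Case table itself.
        Component(x => x.Address, part =>
        {
            part.Map(a => a.Street);
            part.Map(a => a.City);
        });
    }
}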
Sorry for being so verbose with the multiple questions:
1) Do I need a surrogate identifier (with say NHibernate) for my value objects?
2) If #1 is yes, then does this need to be private/protected so that my value object "remains" a value object in concept?
3) Can a value object have other value objects (in say, a List) or would that constitute an entity? (I think the answer to this is no, but I'd prefer to be sure before I proceed further.)
4) Do I need a reference to the aggregate root from a value object that is a few levels down from the aggregate root? (I don't think I do, this is likely an oversight on my part when writing the model, anyone agree?)
5) Is it OK to use normalized tables for certain things (like nested types and/or types with collections as properties which would need their own tables anyway for the 1:m relationship) while having the ORM do the mapping for the simpler value objects to the same table that belongs to my entity?
Thanks again.
Take a look at the answers to related questions here and here
1) Yes - If you're storing VOs in their own table
2) If you can use a private/protected ID property, then great. Alternatively, you might use explicit interfaces to 'hide' the ID property (sketched after this list).
But, reading into your question, are you suggesting that developers who see an ID property will automatically assume the object is an entity? If so, they need (re)training.
3) Yes it can, but with the following restrictions:
It should be quite rare
It should only reference other VOs
Also, consider this: VOs shouldn't stick around. Would it be easy/efficient to re-create the entire VO every time it's needed? If not, make it an Entity.
4) Depends on how you want to implement your Aggregate Locking. If you want to use Ayende's solution, the answer is yes. Otherwise, you would need a mechanism to traverse the object graph back to the Aggregate Root.
5) Yes. Don't forget that DDD is Persistence Ignorant (in an ideal world!).
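To illustrate points 1 and 2: Fluent NHibernate can map a non-public ID through its Reveal helper, and an explicit interface can hide the property where a contract demands one (a sketch; the mapping details and IHasId are illustrative):

// using FluentNHibernate; using FluentNHibernate.Mapping;
public class Referral
{
    // Surrogate key for persistence only; never exposed publicly.
    protected virtual int Id { get; set; }

    public virtual ReferralType ReferralType { get; set; }
    public virtual string ReferralTypeOther { get; set; }
}

public class ReferralMap : ClassMap<Referral>
{
    public ReferralMap()
    {
        // Reveal lets the mapping reach the protected property by name.
        Id(Reveal.Member<Referral>("Id")).GeneratedBy.Native();
        Map(x => x.ReferralType);
        Map(x => x.ReferralTypeOther);
    }
}

// The explicit-interface alternative: the ID is only reachable via a cast,
// e.g. ((IHasId)referral).Id, so casual callers never see it.
public interface IHasId { int Id { get; } }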
However...
I believe Referral should be an Entity. Imagine these conversations:
Conversation 1:
Tom: "Hey Joe! Can you give me David Jone's referral?"
Joe: "Which one?"
Tom: "Sorry, I mean Referral No.123"
Conversation 2:
Tom: "Hey Joe! Can you give me David Jone's referral?"
Joe: "Which one?"
Tom: "I don't care - just give me any"
Conversation 1 suggests that Referral is an Entity, whereas conversation 2 suggests it's a VO.
One more thing: does Referral.ReferralType change during its lifetime (if so, there's another hint that it should be an Entity)? If it doesn't change, consider using polymorphism and letting NH handle it.
Hope that helps!
Obviously I am wrong in saying that DDD is similar to EAV/CR in usefulness, but the only difference I see so far is physical tables built for each entity, with lots of joins, rather than three tables and lots of joins.
This must be due to my lack of DDD understanding. How do you physically store these objects to the database without significant joins and complication when importing data? I know you can simply create objects that feed into your repository, but it's difficult to train tools like Microsoft SQL Server Integration Services to use your custom C# objects and framework. Maybe that should be my question: how do you use your DDD ASP.NET C# framework with Microsoft SQL Server Integration Services and Reporting Services? LOL.
In an EAV/CR database, we can set up a single Person table with different classes depending on the type of person: Vendor, Customer, Buyer, Representative, Company, Janitor, etc. Three tables, a few joins, and attributes are always strings, with validation before we insert, just like ModelValidation in MVC where the object accepts any value but won't persist until it's valid.
In a standard relational model, we used to create a table for each type of entity, mixing in redundant data types like City.
Using Domain Driven Design, we use objects to represent each type of entity, nested objects for each type of ValueObject, and more nested objects still as necessary. In my understanding, this results in a table for each kind of entity and a table for each kind of information set (value object). With all these tables, I see a lot of joins. We also end up creating a physical table for each new contact type. Obviously there is a better way, so I must be incorrect in how I persist objects to a database.
My Vendor looks like this:
public class Vendor {
    public int vendorID { get; set; }
    public Address vAddress { get; set; }
    public Representative vRep { get; set; }
    public Buyer vBuyer { get; set; }
}
My Buyer:
public class Buyer {
    public int buyerID { get; set; }
    public Address bAddress { get; set; }
    public Email bEmail { get; set; }
    public Phone bPhone { get; set; }
    public Phone bFax { get; set; }
}
Do we really reference things like Vendor.vBuyer.bPhone.pAreaCode? I would think we would reference and store Vendor.BuyerPhoneNumber, and build the objects almost like aliases to these parts: Vendor.Address1, Vendor.Address2, Vendor.BuyerPhoneNumber ... etc.
The real answer is to match your SQL normalization strategy to your objects. If you have lots of duplicate addresses and you need to associate them together, then normalize the data to a separate table, thus creating the need for the value object.
You could serialize your objects to xml and save it to an xml column in you Sql Server. After all, you are trying to represent a hierarchical data structure, and that's where xml excels.
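A rough sketch of writing such a document into a SQL Server xml column with plain ADO.NET (the table, column, and SerializeToXml helper are invented; connection is an open SqlConnection):

using (var command = connection.CreateCommand())
{
    command.CommandText = "UPDATE Vendors SET Data = @data WHERE VendorID = @id";

    // SqlDbType.Xml stores the serialized object graph in the xml column,
    // where it remains queryable server-side with XQuery.
    command.Parameters.Add("@data", System.Data.SqlDbType.Xml).Value =
        SerializeToXml(vendor); // e.g. via XmlSerializer
    command.Parameters.AddWithValue("@id", vendor.vendorID);
    command.ExecuteNonQuery();
}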
Domain-driven design proponents often recommend keeping the data model as close to the object model as possible, but it isn't an ironclad rule.
You can still use an EAV/CR database design if you create mappings in your object-relational mapping layer to transform (project) your data into your objects.
Deciding how to design your objects and provide access to child values is really a separate question that you have to address on a case-by-case basis. Vendor.BuyerPhoneNumber or Vendor.vBuyer.bPhone.pAreaCode? The answer always depends, because it's rooted in your specific requirements.
One of the best ways to store Domain Objects is actually a document database. It works beautifully because the transactional boundary of the document matches perfectly the consistency boundary of the Aggregate Root. You don't have to worry about JOINs, or eager/lazy loading issues. That's not strictly necessary, though, if you apply CQRS (which I've written about below).
The downside is often with querying. If you wish to query the persisted data behind your Domain Objects directly, you can get into knots. However, that is a complexity CQRS aims to solve for you, by having different parts of your application do the queries than the parts loading/validating/storing Domain Objects.
You might have a complex "Command" implementation that loads Domain Objects, invokes the behaviour on them (remembering that Domain Objects must have behaviour and encapsulate their data, or else risk becoming "anaemic"), before saving them and optionally even publishing events about what happened.
You might then use those events to update some other "read store", though you don't have to. The point is you have a totally different vertical slice implemented in your application that doesn't have to bother with that complex object model/ORM business, and instead goes straight to the data, loads exactly what it needs and returns it for displaying to the user.
CQRS is not hard, and it is not complex. All it does is instruct you to have separate code (sketched below) for:
Handling commands (which mutate state and therefore need the business rules/invariants involved).
Executing queries (which do not mutate state and therefore don't need all the complex business rules/invariants involved, and so can "just" go and get the data in an efficient way).
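A minimal sketch of that separation (all names invented; the command side loads the full Domain Object and runs its behaviour, while the query side goes straight to the data):

using System;
using System.Data;

// ----- Command side: mutates state, so the Domain Object and its rules load -----
public class RenameAccount { public int AccountId; public string NewName; }

public class Account
{
    public int Id { get; private set; }
    public string Name { get; private set; }

    public void Rename(string newName)
    {
        // Business rules/invariants are enforced inside the Domain Object.
        if (string.IsNullOrEmpty(newName)) throw new ArgumentException("A name is required.");
        Name = newName;
    }
}

public interface IAccountRepository
{
    Account Get(int id);
    void Save(Account account);
}

public class RenameAccountHandler
{
    private readonly IAccountRepository _repository;
    public RenameAccountHandler(IAccountRepository repository) { _repository = repository; }

    public void Handle(RenameAccount command)
    {
        var account = _repository.Get(command.AccountId);
        account.Rename(command.NewName);
        _repository.Save(account);
        // Optionally publish an "AccountRenamed" event here.
    }
}

// ----- Query side: no domain model, no ORM; fetch exactly what the screen needs -----
public class AccountSummaryQuery
{
    private readonly IDbConnection _connection;
    public AccountSummaryQuery(IDbConnection connection) { _connection = connection; }

    public string Execute(int id)
    {
        using (var command = _connection.CreateCommand())
        {
            command.CommandText = "SELECT Name FROM AccountSummaries WHERE Id = @id";
            var p = command.CreateParameter();
            p.ParameterName = "@id";
            p.Value = id;
            command.Parameters.Add(p);
            return (string)command.ExecuteScalar();
        }
    }
}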