Graph database for .NET, does it help our case?

Short description of our application: We analyze .NET assemblies and detect dependencies between them (e.g. method calls). We save those dependencies in an MSSQL Server database. Starting from a class or method in code, we can now find all direct and indirect dependencies and work out which code may break if we change the interface or implementation.
Although we make good use of indices (which slowed down our import, but that runs overnight anyway), we still have performance issues. Because we import many versions of the same assembly, we have a large amount of data and queries take a few seconds, which is just not fast enough (< 1.5s is the target).
As dependencies are a graph-like structure, we're wondering whether switching from MSSQL to a NoSQL graph database may help. This would take some time, so we're hoping for some external input first.
If yes, you can of course also post a recommended .NET graph database :-)

Call me an old fogey, but I would be quite careful making such a technology switch - as this SO question shows, the technology choice is fairly limited, and I think you run the risk of turning your project into a "Neo4j" project rather than a "dependency management" project. If you've really hit the buffers, that's worth considering, but it doesn't sound like you should be there with the data volumes you're discussing.
The first thing I'd consider is looking at the "nested set" model - this specifically solves the performance problem when retrieving all children for a given node.
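
To make the nested set idea concrete, here is a minimal sketch. It is written in Python with the standard-library sqlite3 module purely to keep it self-contained; the table, column names and sample data are hypothetical, and the same query shape should carry over to MSSQL. One caveat: nested sets encode a tree, so a dependency graph in which a method has several callers would need extra work (for example duplicating nodes) before it fits this model.

```python
import sqlite3

# Hypothetical toy hierarchy. Each node stores the left/right bounds
# assigned by a depth-first walk (the "nested set" encoding).
ROWS = [
    # name,      lft, rgt
    ("Assembly",   1, 10),
    ("ClassA",     2,  7),
    ("MethodA1",   3,  4),
    ("MethodA2",   5,  6),
    ("ClassB",     8,  9),
]


def build_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE node (name TEXT PRIMARY KEY, lft INTEGER, rgt INTEGER)"
    )
    conn.executemany("INSERT INTO node VALUES (?, ?, ?)", ROWS)
    return conn


def descendants(conn, name):
    """Return every node below `name` with one range comparison, no recursion."""
    sql = """
        SELECT child.name
        FROM node AS parent
        JOIN node AS child
          ON child.lft > parent.lft AND child.rgt < parent.rgt
        WHERE parent.name = ?
        ORDER BY child.lft
    """
    return [row[0] for row in conn.execute(sql, (name,))]


conn = build_db()
print(descendants(conn, "ClassA"))  # ['MethodA1', 'MethodA2']
```

The price for the cheap read is paid on writes: inserting or moving a node means renumbering the bounds of other rows, which may be acceptable when the import runs overnight anyway.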

Related

Migrating Custom .NET ORM to Entity Framework / Dapper

I've inherited a project that is massive in scale and a bit of a labyrinth. The traffic is substantial enough to want to optimize the data access, so I've begun converting some WCF services that are invoked by JavaScript to Web API.
Unfortunately, the database's primary keys (not auto-incrementing) are also managed by a custom ORM, which queries a MySQL function that returns the next set of IDs to be used. The ORM then caches them and serves them up to the application. The database itself is an ever-growing 2 TB of data, which would make downtime significant.
I had planned on using Dapper, as I've enjoyed its ease and performance in the past, but weaning this database off the custom ORM seems daunting and prone to error.
Questions:
Does anyone have any advice for tackling a large scale project?
Should I focus more on migrating the database into an entirely new data structure? (it needs significant normalization, too!)

My humble opinion:
A rule of thumb when you deal with legacy code is: if something works, keep it that way. Make a change or an improvement only when necessary. The main reasons are:
A redesign on its own adds almost no business value to the system.
The system, good or bad, works. No matter how careful you are, you can always break something with a structural change.
Besides, a lot depends on the company's plans and its reasons for the change (adding a feature, fixing a bug, improving the design or optimizing resource usage). In my experience, time and budget are very important, and although we always want to redesign (or in some cases, rewrite from scratch), what matters most is the business objectives and the added value.
In your case, maybe it's not necessary to replace the whole ORM. Since the IDs are only cached by it, a better approach would be to modify the PKs by adding the identity property (AUTO_INCREMENT in MySQL) to them, with the proper starting value on each table. After that you can delete the part of the code that gets the next IDs.
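
For readers unfamiliar with the scheme described in the question, the part that would be deleted looks roughly like the sketch below: a client-side allocator that reserves a block of IDs and hands them out from a cache. This is only an illustration in Python; `IdBlockAllocator`, `FakeSequence` and the `reserve` callback are hypothetical names, and the real ORM and MySQL function may behave differently.

```python
import threading


class IdBlockAllocator:
    """Hands out IDs from a cached block and refills it when it runs dry.

    `fetch_block(size)` is a hypothetical stand-in for the database function
    from the question; it must return the first ID of a freshly reserved block.
    """

    def __init__(self, fetch_block, block_size=100):
        self._fetch_block = fetch_block
        self._block_size = block_size
        self._next = 0
        self._end = 0  # exclusive upper bound of the current block
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            if self._next >= self._end:
                start = self._fetch_block(self._block_size)
                self._next, self._end = start, start + self._block_size
            value = self._next
            self._next += 1
            return value


class FakeSequence:
    """In-memory fake of the database-side counter, for demonstration only."""

    def __init__(self, start):
        self.value = start
        self.calls = 0

    def reserve(self, size):
        self.calls += 1
        start = self.value
        self.value += size
        return start


seq = FakeSequence(start=1000)
alloc = IdBlockAllocator(seq.reserve, block_size=3)
ids = [alloc.next_id() for _ in range(7)]
print(ids, seq.calls)  # seven consecutive IDs, fetched in three round trips
```

With database-generated keys, none of this state lives in the application any more, which is why the starting value on each table has to be set above the highest ID already handed out.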
In some cases, an unnormalized database has its reasons. I've seen cases in which data is copied into tables to avoid a join that would hurt performance. I'm talking about millions of records...
Reasons to change the ORM: maybe it's inefficient, or it does not release unmanaged resources (in which case a better approach is to implement the IDisposable interface). If it works, maybe a better approach is to introduce a new ORM only when you need to build new functionality. If the project needs refactoring for optimization purposes, apply the change at the bottlenecks, not across the entire code.
There is a lot of discussion about the topic. Good resources are "Working Effectively with Legacy Code" by Michael Feathers and "Getting Started With DDD When Surrounded By Legacy Systems" by Eric Evans.
Greetings!

We have migrated VB6 code to C# in .NET

The code was migrated using a third-party tool. Whatever the tool couldn't do was done by the .NET developers, so that all compile issues were fixed. My question is: for such migration activities, do we not bother running unit tests for the functions?
Secondly, could anyone suggest whether we should use some tool in VSTS 10 to create a UML model of this code, to minimize the risk of issues that the client might find? How cumbersome is it?
Are there any other suggestions for how quality migrated code can be delivered, in light of the fact that the functionality of the original VB6 application is unknown to us?

"for such migration activities, do we not bother running unit tests for the functions."
I wouldn't trust freshly translated code (mechanical or otherwise) at all. Absolutely it needs testing.
"the functionality of the original VB6 application is unknown to us."
That will make regression testing quite... challenging. If you don't know how it is meant to behave, how do you know when you've finished it?
Of course, you could decide not to unit test the translated code, then you won't know how the new code works either - not sure that "unknown = unknown" counts as a "pass", though.

In my experience, the vast majority of applications provide a great deal of "unknown" functionality. After all, the reason we write software is to help us manage information in ways that immeasurably exceed our abilities as mere mortals. Over time, the size and complexity of our software grows, and grows, and grows until it contains a vast amount of "unknown" functionality. The unknown functionality was probably known and verified as "correct" at one time, and it was captured in detail by the source code. However, as time passes no one fully remembers/knows what all the functionality is or even why it is "correct". The full functionality is only "remembered/known" by the source code; teams "test what they change" and the rest is assumed correct unless a problem shows up. This is particularly true of systems that have been extended and changed by many people over many years. Of course this creates risk, and we can do better: processes like TDD and tools to automate unit testing are helping, but for many older systems lack of system understanding and incomplete testing are facts of life. The technical idealist in me does not like this, but the business realist in me accepts it.
All that said, this presents a major problem for migration teams. In theory these teams are "changing everything". In a VB6-to-.NET migration, "test what we changed" means test it all. Ouch. Also, the functional requirements for a migration often are "just make it do what it does now, but on the new platform." Not very useful when people do not know/remember everything the system does, let alone how to verify that it does it correctly. I am working with several customers that have huge VB6 apps containing 100s of thousands of LOC organized into hundreds of forms and classes and several thousand methods, properties, and event handlers. I am sure these apps contain 10s of thousands of function points. I like to ask migration teams how long it would take them to find the error if I went into the VB6 and "broke" one little thing somewhere. I rarely get an answer...
This is why I advocate using a tool-assisted rewrite methodology. One of the most critical inputs to this process is the production-tested source code. We assume this code is "correct" since you or your customers are running their business on it. The source code is an extremely detailed, formal, and complete answer to the question: what does the system do? In our approach, the migration team iteratively customizes, calibrates, and verifies the automatic, systematic translation and re-engineering of the VB6 source to a complete .NET source. We translate, test, tune, and repeat; each time improving the quality of the translation in terms of functional correctness and conformance to .NET coding standards. Verifying and refining what the tool does is central to the methodology.
In order to verify code quality, we use code reviews and "side-by-side" testing. Code reviews are done by inspecting the .NET code by eye and with other tools such as the .NET compiler, FxCop, NDepend, etc. We also do a lot of comparing of successive generations of the translated code, using a product like Beyond Compare, to verify that each translation tuning change has the desired effect and no undesired side effect. Side-by-side testing is just what it sounds like: the general idea is to run the legacy and .NET apps in side-by-side test environments and make sure their results and behaviors match. There are at least a couple of challenges here:
what do you do when you "run the app"; and
how do you make sure the results and behaviors match?
The first question is typically answered in terms of test data, use cases and automated unit tests; the second question is answered in terms of looking at the application UI, and the results (data, web pages, reports) from both systems and comparing (aka approval-based testing). Of course testing tools can go a long way to increase the efficiency. A large-scale migration is a very good time to have a discussion about starting to use testing tools.
If you are planning to migrate a large complex codebase, you need to plan to be very smart about testing. If done properly, the tool-assisted approach delivers production ready code very efficiently, and this will free up resources to produce QC artifacts and improve QC processes that will endure long after the migration.
Disclaimer: I work for Great Migrations.

From the tone of your question it sounds like you know the answer! I would say anything other than a complete set of regression tests would be a recipe for disaster! Ideally, you would want to run the same set of tests against both the old and new versions, although it sounds like you might not be able to do that...
My honest answer - make sure you've got plenty of support/maintenance developers ready to work round the clock fixing support issues!

Is it foolish of me not to use NHibernate for my project?

I am working on a .NET web application that uses a SQL Server database with approximately 20 to 30 tables.
Most tables will be represented in the .NET solution as a class.
I have written my own data access layer to read the objects from, and write them to the database.
The whole thing consists of just a few classes and very few lines of code, and uses generics and reflection to work out what SQL and parameters to use.
Now, such a thing could be done using NHibernate (or a similar framework), and some co-workers claim that it is foolish of me not to use it.
My main argument for not using it is that I want maximum control over my application and want to know exactly what everything does and how everything works, even if that costs me more development time.
I also don't like the fact that I have to map my database in XML files (my own solution lets me map it in the entity class files).
So, what I would like to hear from you is: is it really stupid not to use NHibernate in this situation?
Am I really being ignorant, or is it not such a strange idea to use my own solution?
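
For context, a reflection-driven data access layer of the kind described above can indeed be very small. The following is a minimal sketch only, in Python rather than C#, with a hypothetical `Customer` entity and the assumption that the class name doubles as the table name and the field names as column names; the asker's actual implementation is not shown in the question.

```python
import sqlite3
from dataclasses import astuple, dataclass, fields


@dataclass
class Customer:  # hypothetical entity; the mapping lives in the class itself
    id: int
    name: str
    city: str


def insert_sql(entity_type):
    """Derive a parameterized INSERT from the entity's declared fields."""
    cols = [f.name for f in fields(entity_type)]
    placeholders = ", ".join("?" for _ in cols)
    return (
        f"INSERT INTO {entity_type.__name__} "
        f"({', '.join(cols)}) VALUES ({placeholders})"
    )


def save(conn, entity):
    conn.execute(insert_sql(type(entity)), astuple(entity))


def load_all(conn, entity_type):
    cols = ", ".join(f.name for f in fields(entity_type))
    rows = conn.execute(f"SELECT {cols} FROM {entity_type.__name__}")
    return [entity_type(*row) for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
save(conn, Customer(1, "Ada", "London"))
print(insert_sql(Customer))
print(load_all(conn, Customer))
```

What a sketch like this leaves out (relationships, change tracking, caching, lazy loading, schema drift) is roughly the list of things an established ORM has already had to solve.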

I think these days there really isn't any reason to roll your own persistence framework, since there are so many good choices out there. You don't have to use NHibernate (though it is a good choice), but I would seriously consider using something that is well tested and established in the industry, as it will tend to perform better and have fewer bugs than something you write yourself.

It probably is foolish to write your own classes instead of using NHibernate, but it's less foolish to continue using your own classes, given that you've already written them. Maybe.

I won't call you foolish because I've done exactly the same thing in the past. Then I started using NHibernate and wondered why the hell I rolled my own. It's good, give it a go.

You have several possibilities that are probably better than reinventing the wheel. Let me name the two most likely choices:
1. Use Entity Framework for your DAL+DAO. This will make the classes you've already written obsolete, since EF will create its own, and you'll get up to date with the latest language capabilities and technologies.
2. Use Fluent NHibernate so you don't have to work with XML mappings. This way you'll keep the business layer object classes you've written and avoid tedious NHibernate XML files. It's all C#.
Your way of thinking is good. You want control. That's fine. But using your own DAL is a bit foolish these days, because you are basically reinventing the wheel, plus you'll have untested/buggy code that will take considerable time to develop+test+debug.
If I were you, I'd go with option #2, since I've done option #1 and I know I had to customize lots of things to make EF work as it should. EF will be ready with V2.

People tend to use frameworks that are already written because, well, they're already written (and tested).
But there IS merit to rolling your own. Only you and your colleagues can make assumptions about your domain. A generic framework like NHibernate cannot make many assumptions, because that wouldn't make it very universal.
When you roll your own, you can bake these assumptions into your framework, to make a more streamlined, natural API. That said, if you were starting over I would have suggested taking an existing framework and wrapping it to better suit your needs. But since you already have something and it works for you, I'm not sure that I would suggest swapping it out for something else.

It depends on what they mean by "foolish."
If by "foolish" they mean you shouldn't have written your persistence layer in the first place, they're probably right, but that's crying over spilled milk.
If by "foolish" they mean you should rewrite all your existing code to use another framework (like NHibernate) when it's already working with yours, they're probably wrong (although there's something to be said for # of bugs in NHibernate vs likely # of bugs in yours).
If by "foolish" they mean the entire team knows NHibernate cold, and it's already used in the rest of your code, so by using your framework you're making it harder on the team, they're absolutely right, and you should probably refactor the code in NHibernate as soon as possible, before any more code gets locked in to your framework.
If by "foolish" they mean no one there really knows NHibernate, they just like it, then... nobody wins. They're being fussy, you implemented a framework you didn't have to... let's call it a tie.
All of that said, everyone should write a persistence framework or three. Those probably shouldn't end up in anything that ships, but it's a good exercise. The only mistake you made was tying code the team had to maintain into your good exercise.

There are many good persistence tools out there that are well tested and have proven performance (NHibernate, LINQ to Entities, LLBLGen Pro). If my needs were very different from what the existing persistence frameworks offer, then I would roll my own. I would want to take advantage of the testing and optimizations of an existing tool if at all possible, however.
That being said, I might also roll my own if I wanted to have the experience of building my own ORM tool and was willing to live with the downsides (not as well tested or optimized as tools that have been around for years, speed to market).

Making your own solution, especially when it seems to work fine and be as simple as you say, is neither ignorant nor strange. There are lots of situations where it's better to do that than to add a dependency on a separate project like NHibernate.
That said, there are of course also a lot of situations where the complete opposite is true. :)
It really depends on your project and team. If you are developing an enterprise application that will eventually be supported by someone else, sticking to industry standards might be a good idea even if it means a bit more work up front.

All of the answers here are great, but I am really surprised that nobody has mentioned Castle ActiveRecord, it sounds very similar to what your framework does and really simplifies the interface to NHibernate. It's one of the patterns that made Ruby on Rails so popular after all!
Ayende Rahien (one of the principal NH developers) gave a GREAT presentation on ActiveRecord at Oredev a few years ago which I highly recommend: http://www.viddler.com/explore/oredev/videos/89

I think that it is a matter of balance of control. You say that you want control and you don't want mappings. If this control comes at the cost that there is an increased development and maintenance cost and that it takes longer to produce working code, then it is a problem.
I personally don't see a problem in rolling a framework as long as it simplifies a repetitive task and makes development more productive and code more stable due to less room for interpretation. We have rolled our own framework, that includes a persistence/data access implementation. Our reasons for doing it, though, were specific. In this case, it was to work within a DDD environment that was much closer to what Evans describes than what most off the shelf products were providing.
I think the difference is, though, that we understood that there was an upfront cost and that it would eventually balance itself out through savings in development time in the future. Of course, if you are writing code that you manually have to manage connections, map data, etc., you are probably going down the wrong path. At the very least, you could be using something like Enterprise Library to help you manage the tedium of connectivity and command construction. But, I also think, that if you have no reuse - nothing that is a "framework" type of implementation that you can abstract and apply to other projects, then you are creating a maintenance nightmare and time sink that you will be the sole owner of.

We were also using our own data access layer and entity classes. We also had a code generator that generated all these classes for us. But now we are using Entity Framework and we are more than happy.
Simple advice: start learning NHibernate or whatever you prefer, and start using it in your next project.

Entity Spaces (http://www.entityspaces.net/Portal/Default.aspx) is also a good tool.

I ended up using Fluent NHibernate for the job.
All my entity classes were generated with ActiveRecordGenerator (http://code.google.com/p/active-record-gen/)

What is the general view on using Staging tables? We use them a lot for Order imports from External vendors

We have been having a lot of internal debate regarding staging tables. Some view staging tables as archaic and argue that they will prevent us from building re-usable services, etc. It is also being said that they will keep the business from growing and handling expanded business channels.
I am not necessarily for or against either option, but I do know that having the staged data has been a life saver on many occasions and has made it really easy to re-import orders we have had issues with.
Just wanted to see what others thought about staging data and what other methods are being used to handle scenarios similar to ours (Taking orders from external partners, Amazon, etc and importing them into our ERP system).
Thanks,
S

Some places I've worked I've used staging tables, others I've used other techniques.
Each one has its own advantages and disadvantages.
That said, don't worry about it.
If some data feed comes along that requires some method other than what you are doing, then you'll come up with a new solution.
Change is driven by requirements.
(Personally, when someone comes to me and says "We have to change to X because what we do now is inefficient and bad and witches will come and eat our children", they have this image in their minds that on Tuesday, we will have an opportunity to triple our client base but only if we do this new thing, but if we don't get cracking on it now, then we'll miss the opportunity because none of those potential clients is willing to wait even a minute and they'll all demand the exact same thing and we can build exactly what they want right now even though we have no idea what they want HURRY HURRY HURRY AND DON'T BREAK ANYTHING. Which, of course, isn't how anything works. A single client (or whatever) comes along and says "Hey, we want your services, can you accept our XML?" to which the response is always "Sure thing", and then you get tasked with it and can make intelligent decisions, and plan things out. As opposed to the "chicken with its head cut off" methodology preferred by people who like technical words but hate knowing anything technical.)

There is no reason for a debate - you have a working system. Anyone who thinks their "re-usable services" theory can do it better should put up or shut up.
Let them build a test implementation on your development servers for a common high volume scenario, and compare it to the current system - including criteria for recovery and re-import after a failure.
I hear this all the time where I work as well (usually from managers who just read an article about SOA and XML) and in situations dealing with large amounts of data - bulk imports into staging tables can handle a much higher data volume than any type of web service.

Serializing vs Database

I believe that the best way to save your application state is in a traditional relational database, whose table structure most of the time pretty much represents the data model of our system plus metadata.
However, other guys on my team think that today it's best to simply serialize the entire object graph to a binary or XML file.
Needless to say (but I'll still say it), World War 3 is going on between us, and I would like to hear your opinion about this issue.
Personally I hate serialization because:
1. The saved data is tied to your development platform (C# in my case). No other platforms like Java or C++ can use this data.
2. The entire object graph (including the whole inheritance chain) is saved, not only the data we need.
3. Changing the data model might cause severe backward compatibility issues when trying to load old states.
4. Sharing parts of the data between applications is problematic.
I would like to hear your opinion about that.

You didn't say what kind of data it is -- much depends on your performance, simultaneity, installation, security, and availability/centralization requirements.
If this data is very large (e.g. many instances of the objects in question), a database can help performance via its indexing capabilities. Otherwise it probably hurts performance, or is indistinguishable.
If your app is being run by multiple users simultaneously, and they may want to write this data, a database helps because you can rely on transactions to ensure data integrity. With file-based persistence you have to handle that yourself. If the data is single-user or single-instance, a database is very likely overkill.
If your app has its own soup-to-nuts installation, using a database places an additional burden on the user, who must set up and maintain (apply patches etc.) the database server. If the database can be guaranteed to be available and is handled by someone else, this is less of an issue.
What are the security requirements for the data? If the data is centralized, with multiple users (either simultaneous or sequential), you may need to manage security and permissions on the data. Without seeing the data it's hard to say whether it would be easier to manage with file-based persistence or a database.
If the data is local-only, many of the above questions about the data have answers pointing toward file-based persistence. If you need centralized access, the answers generally point toward a database.
My guess is that you probably don't need a database, based solely on the fact that you're asking about it mainly from a programming-convenience perspective and not a data-requirements perspective. Serialization, especially in .NET, is highly customizable and can be easily tailored to persist only the essential pieces you need. There are well-known best practices for versioning this data as well, so I'm not sure there's an advantage on the database side from that perspective.
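
As a small illustration of both claims (persisting only the essential fields, and tolerating older formats), here is a sketch of versioned serialization. It uses Python and JSON rather than the .NET serializers, and the `Settings` class, its fields and the "version 1 stored a single size" history are invented for the example.

```python
import json

CURRENT_VERSION = 2


class Settings:
    """Hypothetical application state; `_cache` is runtime-only, never persisted."""

    def __init__(self, title, width, height):
        self.title = title
        self.width = width
        self.height = height
        self._cache = {}

    def to_json(self):
        # Persist only the essential fields, plus an explicit format version.
        return json.dumps({
            "version": CURRENT_VERSION,
            "title": self.title,
            "width": self.width,
            "height": self.height,
        })

    @classmethod
    def from_json(cls, text):
        data = json.loads(text)
        version = data.get("version", 1)  # files written before versioning
        if version == 1:
            # Assumed history: version 1 stored one "size" for both dimensions.
            data["width"] = data["height"] = data.pop("size")
        elif version != CURRENT_VERSION:
            raise ValueError(f"unsupported state version: {version}")
        return cls(data["title"], data["width"], data["height"])


old_state = '{"title": "draft", "size": 300}'
restored = Settings.from_json(old_state)
round_trip = Settings.from_json(restored.to_json())
print(restored.width, restored.height, round_trip.title)  # 300 300 draft
```

The explicit version field is the important part; each format change then adds one upgrade branch instead of breaking every previously saved file.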
About cross-platform concerns: If you do not know for certain that cross-platform functionality will be required in the future, do not build for it now. It's almost certainly easier overall to solve that problem when the time comes (migration etc.) than to constrain your development now. More often than not, YAGNI.
About sharing data between parts of the application: That should be architected into the application itself, e.g. into the classes that access the data. Don't overload the persistence mechanism to also be a data conduit between parts of the application; if you overload it that way, you're turning the persisted state into a cross-object contract instead of properly treating it as an extension of the private state of the object.

It depends on what you want to serialize, of course. In some cases serialization is ridiculously easy.
(I once wrote a kind of timeline program in Java, where you could draw, drag around and resize objects. When you were done you could save it to a file (like myTimeline.til). At that moment hundreds of objects were saved: their position on the canvas, their size, their colors, their inner texts, their special effects,... You could then of course open myTimeline.til and carry on working. All this took only a few lines of code (I just made all classes and their dependencies serializable) and less than 5 minutes of coding time. I was astonished myself! (It was the first time I had ever used serialization.) While working on a timeline you could also 'save as' for different versions, and the 'til' files were very easy to back up and mail. I think in my particular case it would have been a bit silly to use a database. But that's of course for document-like structures only, like Word, to name one.)
My first point, then: there are certainly several scenarios in which a database wouldn't be the best solution. Serialization was not invented by developers just because they were bored.
1. Not true if you use XML serialization or SOAP.
2. Not quite relevant anymore.
3. Only if you are not careful; there are plenty of 'best practices' for that.
4. Only if you want it to be problematic; see 1.
Of course, besides speed of implementation, serialization has other important advantages, like not needing a database at all in some cases!

See this Stack Overflow posting for a commentary on the applicability of XML vs. the applicability of a database management system. It discusses an issue that's quite similar to the subject of the debate in your team.

You have some good points. I pretty much agree with you, but I'll play the devil's advocate.
1. Well, you could always write a converter in C# to extract the data later if needed.
2. That's a weak point, because disk space is cheap and the amount of extra bytes we'll use costs far less than the time we'll waste trying to get this all to work your way.
3. That's the way of the world. Burn the bridges and require upgrades. Convert the data, or make a tool to do that, and then no longer support the old version's way of doing it.
4. Not if the C# program hands off the data to the other applications. Other applications should not be accessing the data that belongs to this application directly, should they?

For transfer and offline storage, serialization is fine; but for active use, some kind of database is far preferable.
Typically (as you say), without a database, you need to deserialize the entire stream to perform any query, which makes it hard to scale. Add the inherent issues with threading etc, and you're asking for pain.
Some of your other pain points about serialization aren't all true - as long as you pick wisely. Obviously, BinaryFormatter is a bad choice for portability and versioning, but "protocol buffers" (Google's serialization format) has versions for Java, C++, C#, and a lot of others, and is designed to be version tolerant.

Just make sure you have a component that handles saving/loading state with a clean interface to the rest of your application. Then whatever choice you make for persistence can easily be revisited later.
Serializing an object graph to a file might be a good quick and dirty initial solution that is very quick to implement.
But if you start to run into issues that make a database a better choice you can plug in a new version with little or no impact on the rest of the application.
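
A minimal sketch of such a component, assuming the state can be expressed as a plain dictionary: the application talks to a small `save`/`load` interface, and a file-based store can later be swapped for a database-backed one. The names `StateStore`, `JsonFileStore`, `SqliteStore` and `run_app` are hypothetical, and Python is used here only for brevity; the same shape works as a C# interface with two implementations.

```python
import json
import os
import sqlite3
import tempfile
from typing import Protocol


class StateStore(Protocol):
    def save(self, state: dict) -> None: ...
    def load(self) -> dict: ...


class JsonFileStore:
    """Quick first version: the whole state goes into one file."""

    def __init__(self, path):
        self.path = path

    def save(self, state):
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump(state, f)

    def load(self):
        with open(self.path, encoding="utf-8") as f:
            return json.load(f)


class SqliteStore:
    """Later replacement: same interface, backed by a database table."""

    def __init__(self, path):
        self.conn = sqlite3.connect(path)
        with self.conn:
            self.conn.execute(
                "CREATE TABLE IF NOT EXISTS state (key TEXT PRIMARY KEY, value TEXT)")

    def save(self, state):
        with self.conn:
            self.conn.execute("DELETE FROM state")
            self.conn.executemany(
                "INSERT INTO state VALUES (?, ?)",
                [(k, json.dumps(v)) for k, v in state.items()])

    def load(self):
        rows = self.conn.execute("SELECT key, value FROM state")
        return {k: json.loads(v) for k, v in rows}


def run_app(store: StateStore):
    """Application code sees only save/load, never the storage details."""
    store.save({"zoom": 1.5, "open_files": ["a.txt", "b.txt"]})
    return store.load()


tmp_dir = tempfile.mkdtemp()
from_file = run_app(JsonFileStore(os.path.join(tmp_dir, "state.json")))
from_db = run_app(SqliteStore(":memory:"))
print(from_file == from_db)  # True
```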

Yes, probably true. The downside is that you must retrieve the whole object, which is like retrieving all rows from a table, and if it's big that hurts. But if it isn't so big, as with my hobby projects, then maybe serialization is a perfect match?
