How to share business concepts across different programming languages? - c#

We develop a distributed system built from components implemented in different programming languages (C++, C# and Python) and communicating one with another across a network.
All the components in the system operate with the same business concepts and communicate one with another also in terms of these concepts.
As a result, we struggle heavily with the following two challenges:
Keeping the representation of our business concepts in these three languages in sync
Serialization / deserialization of our business concepts across these languages
A naive solution for this problem would be just to define the same data structures (and the serialization code) three times (for C++, C# and Python).
Unfortunately, this solution has serious drawbacks:
It creates a lot of “code duplication”
It requires a huge amount of cross-language integration tests to keep everything in sync
Another solution we considered is based on frameworks like Protocol Buffers or Thrift. These frameworks have an interface definition language (IDL) in which the business concepts are defined once, and the representation of these concepts in C++, C# and Python (together with the serialization logic) is then auto-generated by the framework.
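For illustration, one of these definitions might look roughly like this in Protocol Buffers' IDL (a hypothetical message, just to show the shape of the approach):

    // order.proto - the business concept is defined once here;
    // the framework then generates the C++, C# and Python classes
    // together with their serialization code.
    syntax = "proto3";

    message Order {
      string id = 1;
      double amount = 2;
      repeated string line_items = 3;
    }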
While this solution doesn’t have the above problems, it has another drawback: the code generated by these frameworks couples together the data structures representing the underlying business concepts and the code needed to serialize/deserialize those data structures.
We feel that this pollutes our code base – any code in our system that uses these auto-generated classes is now “familiar” with this serialization/deserialization logic (a serious abstraction leak).
We can work around it by wrapping the auto-generated code with our own classes / interfaces, but this brings back the drawbacks of the naive solution.
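(For concreteness, such a wrapper might look roughly like the sketch below; OrderMessage and its members stand in for whatever class the framework actually generates, so the names are hypothetical.)

    // Domain class used by the rest of the system; it knows nothing about the wire format.
    public class Order
    {
        public string Id { get; set; }
        public double Amount { get; set; }
    }

    // The only place that touches the generated OrderMessage type and its serialization.
    public static class OrderSerializer
    {
        public static byte[] Serialize(Order order)
        {
            // OrderMessage is the hypothetical auto-generated class
            var message = new OrderMessage { Id = order.Id, Amount = order.Amount };
            return message.ToByteArray();                      // generated serialization call
        }

        public static Order Deserialize(byte[] data)
        {
            var message = OrderMessage.Parser.ParseFrom(data); // generated parsing call
            return new Order { Id = message.Id, Amount = message.Amount };
        }
    }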
Can anyone recommend a solution that gets around the described problems?

Lev, you may want to look at ICE. It provides an object-oriented IDL with mappings to all the languages you use (C++, Python, .NET (all .NET languages, not just C#, as far as I understand)). Although ICE is a middleware framework, you don't have to follow all its policies.
Specifically in your situation you may want to define the interfaces of your components in ICE IDL and maintain them as part of the code. You can then generate code as part of your build routine and work from there. Or you can use more of the power that ICE gives you.
ICE supports C++ STL data structures and it supports inheritance, so it should give you a sufficiently powerful formalism to build your system gradually over time with a good degree of maintainability.

Well, once upon a time MS tried to solve this with IDL. Well, actually it tried to solve a bit more than defining data structures, but, anyway, that's all in the past because no one in their right mind would go the COM route these days.
One option to look at is SWIG, which is supposed to be able to port data structures as well as actual invocation across languages. I haven't done this myself, but there's a chance it won't couple the serialization and data structures as tightly as protobufs.
However, you should really consider whether the aforementioned coupling is such a bad thing after all. What would be the ideal solution for you? Supposedly it's something that does two things: it generates compatible data structures across multiple languages based on one definition, and it also provides the serialization code to stitch them together - but in a separate abstraction layer. The idea being that if one day you decide to use a different serialization method you could just swap out that layer without having to redefine all your data structures. So consider that - how realistic is it really to expect to someday switch out only the serialization code without touching the interfaces at all? In most cases the serialization format is the most permanent design choice, since you usually have issues with backwards compatibility, etc. - so how much are you willing to pay right now in development cost in order to be able to theoretically pull that off in the future?
Now let's assume for a second that such a tool exists which separates data structure generation from serialization. And let's say that after 2 years you decide you need a completely different serialization method. Unless this tool also supports pluggable serialization formats, you would need to develop that layer anyway in order to stitch your existing structures to the new serialization solution - and that's about as much work as just choosing a new package altogether. So the only real viable solution that would answer your requirements is something that not only supports data type definition and code generation across all your languages, and is not only serialization-agnostic, but also has a ready-made implementation of that future serialization format you would want to switch to - because if it's only agnostic to the serialization format, you'd still have the task of implementing it on your own - in all languages - which isn't really less work than redefining some data structures.
So my point is that there's a reason serialization and data type definition so often go together - it's simply the most common use case. I would take a long look at what exactly you wish to achieve with the abstraction level you require, think of how much work developing such a solution would entail, and decide if it's worth it. I'm certain there are tools that do this, btw - just probably the expensive proprietary kind that cost $10k per license - and the same argument applies there in my opinion - it's probably just over-engineering.

All the components in the system operate with the same business concepts and communicate one with another also in terms of these concepts.
If I understand you correctly, you have split your system into different parts communicating through well-defined interfaces. But your interfaces share data structures you call "business concepts" (hard to understand without seeing an example), and since those interfaces have to be built for all three of your languages, you have problems keeping them in sync.
When keeping interfaces in sync becomes a problem, it seems likely that your interfaces are too broad. There are different possible reasons for that, with different solutions.
Possible reason 1: you overgeneralized your interface concept. If that's the case, redesign here: throw the generalization overboard and create interfaces which are only as broad as they have to be.
Possible reason 2: the parts written in different languages are not dealing with separate business cases; you may have a "horizontal" partition between them, but not a vertical one. If that's the case, you cannot avoid the broadness of your interfaces.
Code generation may be the right approach here if reason 2 is your problem. If existing code generators don't satisfy your needs, why don't you just write your own? Define the interfaces, for example, as classes in C#, introduce some meta attributes, and use reflection in your code generator to extract the information again when generating the corresponding C++, Python and also the "real-to-be-used" C# code. If you need different variants with or without serialization, generate them too. A working generator should not be more effort than a couple of days (YMMV depending on your requirements).
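A minimal sketch of that attribute/reflection approach (all names here are hypothetical, and only a C++ emitter is shown) could look like this:

    using System;
    using System.Linq;
    using System.Reflection;
    using System.Text;

    [AttributeUsage(AttributeTargets.Class)]
    public sealed class BusinessConceptAttribute : Attribute { }

    [BusinessConcept]
    public class Order
    {
        public string Id { get; set; }
        public double Amount { get; set; }
    }

    public static class CppGenerator
    {
        // Emits a C++ struct for every [BusinessConcept] class in the assembly.
        public static string Generate(Assembly assembly)
        {
            var sb = new StringBuilder();
            var concepts = assembly.GetTypes()
                .Where(t => t.GetCustomAttribute<BusinessConceptAttribute>() != null);
            foreach (var type in concepts)
            {
                sb.AppendLine($"struct {type.Name} {{");
                foreach (var prop in type.GetProperties())
                    sb.AppendLine($"    {MapType(prop.PropertyType)} {prop.Name};");
                sb.AppendLine("};");
            }
            return sb.ToString();
        }

        private static string MapType(Type t) =>
            t == typeof(string) ? "std::string" :
            t == typeof(double) ? "double" :
            t == typeof(int)    ? "int" : t.Name;
    }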

I agree with Tristan Reid (wrapping the business logic).
Actually, some months ago I faced the same problem, and then I incidentally discovered the book "The Art of Unix Programming" (freely available online). What grabbed my attention was the philosophy of separating policy from mechanism (i.e. interfaces from engines). Modern programming environments such as the .NET platform try to integrate everything under a single domain. In those days I was asked to develop a web application that had to satisfy the following requirements:
It had to be easily adapted to future trends of User Interfaces without having to change the core algorithms.
It had to be accessible by means of different interfaces: web, command line and desktop GUI.
It had to run on Windows and Linux.
I opted to develop the mechanism (engines) completely in C/C++, using native OS libraries (POSIX or WinAPI) and good open source libraries (postgresql, xml, etc...). I developed the engine modules as command-line programs and eventually implemented 2 interfaces: web (with a PHP+jQuery framework) and desktop (.NET framework). Both interfaces had nothing to do with the mechanisms: they simply launched the core module executables by calling functions such as CreateProcess() on Windows or fork() on UNIX, and used pipes to monitor their processes.
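In C#, the interface side of that arrangement could look roughly like this (the executable name is hypothetical):

    using System.Diagnostics;

    var psi = new ProcessStartInfo
    {
        FileName = "engine-module",     // hypothetical core executable built in C/C++
        Arguments = "--task export",
        RedirectStandardOutput = true,  // stdout becomes a pipe the interface can read
        UseShellExecute = false
    };

    using var engine = Process.Start(psi);
    string result = engine.StandardOutput.ReadToEnd();
    engine.WaitForExit();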
I'm not saying the UNIX programming philosophy is good for all purposes, but I have been applying it since then with good results, and maybe it will work for you too. Choose a language for implementing the mechanism and then use another that makes interface design easy.

You can wrap your business logic as a web service and call it from all three languages - just a single implementation.
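For example, with WCF the single implementation could sit behind a service contract roughly like this (hypothetical names); the C++ and Python components would then call it over standard SOAP/HTTP:

    using System.ServiceModel;

    [ServiceContract]
    public interface IOrderService
    {
        [OperationContract]
        decimal GetOrderTotal(string orderId);   // one implementation, three callers
    }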

You could model these data structures using a UML modeling tool (Enterprise Architect comes to mind, as it can generate code for all 3) and then generate code for each language directly from the model.
Though I would look closely at a previous comment about using XSD.

I would accomplish that by using some kind of meta-information about your domain entities (either XML or a DSL, depending on complexity) and then go for code generation for each language. That would reduce (manual) code duplication.

Related

Best practice for writing a custom portion of code?

I found thinking of a title a little tricky, but basically my question is: how do you keep a system as generic (right term?) as possible when you need to write customer-special code?
My scenario is that we have a standard system that lives in the trunk of Subversion. Sometimes customers want a change that does not conform to that standard system, so our usual practice is to branch off the code and develop on that branch, but then it gets messy when trunk work needs to be included in a customer special, or when a feature a customer wants needs to be in both the branch and the trunk.
Is there a better way or standard to deal with such requests because it seems everything coded is very specific?
It's certainly a difficult scenario.
One option might be to have configuration to stipulate whether a feature is turned on or not - so the feature goes into the main branch but you only set the config to true for the customer(s) that require it. It's still a bit messy but I think it's cleaner than having multiple branches per customer.
In more complex scenarios you can develop new functionality as a kind of 'plug-in' architecture, where you have different implementations implementing the same interface and some type of factory decides which one to load.
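A minimal sketch of that idea (the interface, the implementations and the config key are all hypothetical):

    using System.Configuration;

    public interface IInvoiceFormatter
    {
        string Format(decimal total);
    }

    public class StandardInvoiceFormatter : IInvoiceFormatter
    {
        public string Format(decimal total) => $"Total: {total:C}";
    }

    public class CustomerSpecialInvoiceFormatter : IInvoiceFormatter
    {
        public string Format(decimal total) => $"*** Total due: {total:C} ***";
    }

    public static class InvoiceFormatterFactory
    {
        // The factory decides which implementation to load based on configuration,
        // so the customer-special behaviour lives in the main branch but is only
        // switched on for the customers that need it.
        public static IInvoiceFormatter Create()
        {
            var useSpecial = ConfigurationManager.AppSettings["UseCustomerSpecialInvoice"];
            return useSpecial == "true"
                ? new CustomerSpecialInvoiceFormatter()
                : new StandardInvoiceFormatter();
        }
    }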
I don't think there is any magic bullet for maintaining a generic code base which is subject to differing customer requests.
In terms of improving code re-usability, there are usually several parts:
Language Features
Being able to interchange smaller parts of an application is a core language design concern. Many languages are designed with methods of maximizing re-usability. Here are some examples (some are C# specific):
Interfaces
Generics
Co/Contra-variance
Dependency Injection
Design patterns (specifically, ones related to structure or creation)
They will allow you to reuse your codebase more easily but, on their own, aren't a silver bullet.
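As a small illustration of how a couple of those features combine (hypothetical types), a generic interface plus constructor injection lets the same class be reused against any data store:

    using System.Collections.Generic;
    using System.Linq;

    public interface IRepository<T>
    {
        IEnumerable<T> GetAll();
    }

    public class Customer
    {
        public string Name { get; set; }
    }

    public class CustomerReport
    {
        private readonly IRepository<Customer> _customers;

        // The concrete repository (SQL, file, in-memory test double...) is injected,
        // so the report never depends on a specific data source.
        public CustomerReport(IRepository<Customer> customers)
        {
            _customers = customers;
        }

        public int Count() => _customers.GetAll().Count();
    }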
Project Design
Even if you exploit every feature available in C#, an insufficiently thought-out project will probably be difficult to interchange. I'd recommend that any larger solution be split into logical, independent subunits. Again, an example of a solution structure might be:
MyProgram // Wires up components and subunits.
MyProgram.Core // Contains a core 'kernel' for your application
MyProgram.IO // Contains generic interfaces and implementations for performing IO
MyProgram.UI // Contains UI code that displays and interacts with your core. The core should not depend on UI elements.
MyProgram.Models // Contains structure of databases, data models other components may act on
MyProgram.Models.Serialization // Converts your models for IO etc. Maybe you want MyProgram.IO to be generic enough to reuse.
Helpers // Generic helpers that you tend to use in multiple applications. You will want these to be in a standalone repository so that you can keep the classes validated.
Versioning
Ultimately, you may not be able to handle -every- problem with language features and good project design. If you have an awkward client request, sometimes you might want to pull (not branch) the components you -can- reuse and branch the ones you need to edit.

How to properly separate concerns in my architecture without designing a spacecraft?

In my last question I posted some sample code on how I was trying to achieve separation of concerns. I received some ok advice, but I still just don't "get it" and can't figure out how to design my app to properly separate concerns without designing the next space shuttle.
The site I am working on (slowly converting from old ASP section by section) is moderately sized with several different sections including a store (with ~100 orders per day) and gets a decent amount of traffic (~300k uniques/month). I am the primary developer and there might be at most 2-3 devs that will also work on the system.
With this in mind, I am not sure I need full enterprise-level architecture (correct me if I am wrong), but since I will be working on this code for the next few years, I want it to perform well and also be easy to extend as needed. I am learning C# and trying to incorporate best practices from the beginning. The old ASP site was a spaghetti mess and I want to avoid that this time around.
My current stab at doing this ended up being a bunch of DTOs with services that validate and make calls to a DAL layer to persist. It was not intentional, but I think the way it is set up now is a perfect anemic domain model. I have been trying to combat this by turning my BLL into domain objects and only using the DTOs to transfer data between the DAL and BO, but it is just not working. I also had all my DTOs/BLLs split up according to the database tables / functionality (e.g. for a YouTube-style app I have separate DTO/BLL/DAL classes for segments, videos, files, comments, etc).
From what I have been reading, I need to be at least using repositories and probably interfaces as well. This is great, but I am unsure how to move forward. Please help!
From what I can see you have four points that need addressing:
(1) "With this in mind, I am not sure I need full enterprise level architecture"
Let's deal with the high-level fluff first. It depends on what you mean by "full enterprise level architecture", but the short answer is "yes": you need to address many aspects of the system (and it will depend on the context of the system as to what the main ones are). If nothing else, the key ones would be Change and Supportability. You need to structure the application in a way that supports changes in the future (logical and physical separation of concerns (Dependency Injection is great for the latter), modular design, etc).
(2) "How to properly separate concerns in my architecture without designing a spacecraft?"
I like this approach (it's an article I wrote that distilled everything I had learnt up to that point) - but here's the gist:
Looking at this you'll have a minimum of six assemblies - and that's not huge. If you can break your system down (separate concerns) into these large buckets it should go a long way to giving what you need.
(3) Detail
Separating concerns into different layers and classes is great, but you need to go further than that if you want to be able to deal with change effectively. Dependency Injection (DI) is a key tool to use here. When I learnt DI it was a hand-rolled affair (as shown in the previous link), but there are lots of frameworks (etc) for it now. If you're new to DI (and you work in .NET) the article will step you through the basics.
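A hand-rolled version of the idea, stripped to the bone (all names hypothetical):

    public interface IOrderRepository
    {
        void Save(string orderId);
    }

    public class SqlOrderRepository : IOrderRepository
    {
        public void Save(string orderId) { /* persist to the database */ }
    }

    public class OrderService
    {
        private readonly IOrderRepository _repository;

        // The service depends only on the abstraction...
        public OrderService(IOrderRepository repository)
        {
            _repository = repository;
        }

        public void PlaceOrder(string orderId) => _repository.Save(orderId);
    }

    public static class CompositionRoot
    {
        // ...and the composition root decides which implementation to wire in.
        // Swap SqlOrderRepository for a fake in tests without touching OrderService.
        public static OrderService CreateOrderService() =>
            new OrderService(new SqlOrderRepository());
    }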
(4) How to move forward
Get a simple vertical slice (UI all the way to the DB) working using DI, etc. As you do this you'll also be building the bones of the framework (sub-systems and major plumbing) that your system will use.
Having got that working, start on a second slice; it's at this point that you should uncover any places where you're inadvertently not reusing things you should be - this is the time to change those before you build slices 3, 4 and 5, before there's too much rework.
Updates for Comments:
Do you think I should completely drop Web Forms and take up MVC from scratch, or just go with what I know for now?
I have no idea, but for the answer to be 'yes' you'd need to be able to answer the following questions with 'yes':
We have the required skills and experience to use and support MVC.
We have time to make the change (there is clear benefit in making this change).
We know MVC is better suited for our needs.
Making this change does not put successful delivery at risk.
...do I need to move to projects and set up each of these layers as a separate project?
Yes. Projects map 1-to-1 with assemblies, so to get the benefits of loose coupling you'll definitely want to separate things that way, and be careful how you set references.
when you refer to POCOs, are you meaning just DTOs or rich domain objects?
DTO, not Rich Domain Object. BUT people seem to use the terms POCO and DTO interchangeably when, strictly speaking, they aren't the same - if you're from the Martin Fowler school of thought. In his view a DTO would be a bunch of POCOs (or other objects(?)) parcelled together for sending "across the wire", so that you only make one call to some external system and not lots of calls.
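A small sketch of that distinction (hypothetical types):

    using System.Collections.Generic;

    public class Customer        // POCO: plain state, no framework baggage
    {
        public int Id { get; set; }
        public string Name { get; set; }
    }

    public class Order           // POCO
    {
        public int Id { get; set; }
        public decimal Total { get; set; }
    }

    // DTO in the Fowler sense: several POCOs parcelled together so the client
    // gets everything it needs in a single call across the wire.
    public class CustomerPageDto
    {
        public Customer Customer { get; set; }
        public List<Order> RecentOrders { get; set; }
    }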
Everyone says I should not expose my data structures to my UI, but I say why not?
Managing dependencies. What you don't want is for your UI to reference the physical data structure, because as soon as that changes (and it will) you'll be (to use the technical term) screwed. This is the whole point of layering. What you want is for the UI to depend on abstractions - not implementations. In the 5-Layer Architecture the POCOs are safe to use for that because they are an abstract / logical definition of 'some thing' (a business concept), and so they should only change if there is a business reason - in that sense they are fairly stable and safer to depend on.
If you are in the process of rewriting your eCommerce site, you should at least consider replacing it with a standard package.
There are many more such packages available today. So although the decision to build the original site may have been correct, it is possible that building a custom app is no longer the correct decision.
There are several eCommerce platforms listed here: Good e-commerce platform for Java or .NET
It should cost much less than the wages of 2-3 developers.

C# OAuth 2.0 Library - How to Implement the Domain Model

OAuth 2.0 specs are getting more and more stable (http://tools.ietf.org/html/draft-ietf-oauth-v2) and I will be implementing a C# OAuth 2.0 library for an internal project. I would like to hear opinions on how to implement a clear domain model for the library. Major points of concern would be:
How to name the classes: should every (or most) keyword in the spec describing a concrete concept be made into a separate class?
How to name the namespaces: should every major topic discussed in the specification be made into a separate namespace (authentication, server, client, security, etc.)?
How should the server and client resources be modeled (as properties inside classes, or as inner classes)?
And many more I'll be listing as they come up...
Anybody with real experience in creating libs out of specifications (like the numerous IETF specs) would be of monumental help. It would also be very helpful to point out libs with excellent spec implementations, which can act as a guide.
Edit: Checked out the DotNetOAuth CTP, but obviously it doesn't provide a clean model to take inspiration from.
You are probably on the right track. In general, names for classes and attributes should largely follow the spec, and you should include links to the specification within the XML documentation. By matching the names, a person familiar with the standard can more easily understand what the code is doing.
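For example (a hypothetical class; the doc comment points back at the draft):

    /// <summary>
    /// Represents the access token issued by the authorization server.
    /// See http://tools.ietf.org/html/draft-ietf-oauth-v2 (the "Access Token" sections).
    /// </summary>
    public class AccessToken
    {
        public string Token { get; set; }
        public int ExpiresIn { get; set; }       // lifetime in seconds, as in the spec
        public string RefreshToken { get; set; }
    }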
I would heavily recommend including unit tests for the complete project. Not only will this help you maintain the integrity of each build, but they will expose areas that are not as usable as they should be. For example, if you find yourself having to use a convoluted mess of classes and methods to simply request authentication for something, then you need to refactor it to be easier on the consumer of the library.
Basically, your priorities should be in this order:
Working code
Easy to Use
Documentation
Matching the Specification
Other than this, you have freedom to implement it to your personal preference. You may notice that there are some domains that have tons of different libraries that accomplish the same thing in different ways. This is a good thing, because different people like different things. Some people will want a library that mirrors a specification, while others will want to use one with good documentation that might be difficult to use. Others just want something that will work with a few lines of code and stay out of their way. It largely depends on your beliefs on the matter. You can't please them all, but just choose a path and run with it.
That said, I would recommend against excessive namespacing. It's much easier for people to pull in a single MyOpenAuth namespace rather than include 3 different namespaces. Use them where it seems logical, but in general, the concept of Open Authentication can be considered its own subject domain (under the umbrella of a single namespace). But, it's up to you.

Why do C# and Java require everything to be in a class?

It seemed like this question should have been asked before, but searching found nothing.
I've always wondered what's the point of making us put every bit of code inside a class or interface. I seem to remember that there were some advantages to requiring a main() function like C, but nothing for classes. Languages like Python are, in a way, even more object oriented than Java since they don't have primitives, but you can put code wherever you want.
Is this some sort of "misinterpretation" of OOP? After all, you can write procedural code like you would in C and put it inside a class, but it won't be object oriented.
I think the goal of requiring that everything be enclosed in classes is to minimize the number of concepts you need to deal with in the language. In C# or Java, you only need to understand the object model (which is fairly complex, though): you only have classes with members, and instances of classes (objects).
I think this is a very important goal that most languages try to follow in one way or another. If C# had some global code (for example, to allow interactive evaluation and specification of the startup code without a Main method), you'd have one additional concept to learn (top-level code). The choice made by C#/Java is of course just one way to get that simplicity.
Of course, it is a question whether this is the right choice. For example:
In functional languages, programs are structured using types (type declarations) and expressions. The body of the program is simply an expression that is evaluated, which is a lot simpler than a class with a Main method, and it also enables interactive scripting (as in Python).
In Erlang (and similar languages), a program is structured as concurrently executing processes, with one main process that starts other processes. This is a dramatically different approach, but it makes good sense for some types of applications.
In general, every language has some way of looking at the world and modelling it and uses this point of view when looking at everything. This works well in some scenarios, but I think that none of the models is fully universal. That may be a reason why languages that mix multiple paradigms are quite popular today.
As a side note, I think the use of a Main method is a somewhat arguable choice (probably inherited from the C/C++ languages). A cleaner object-oriented solution would arguably be to start the program by creating an instance of some Main class.
C# was not designed for "programming in the small". Rather, it was designed for component-oriented programming. That is, for programming scenarios where teams of people are developing interdependent software components that are going to be released in multiple versions over time.
The emphasis on programming-in-the-large and away from programming-in-the-small means that sometimes there is a whole lot of 'ceremony' around small programs. Using this, class that, main blah blah blah, all to write 'hello world'.
The property "a one line program is one line long" would be nice to have in C#. We're considering allowing code outside of classes in small programs as a possible feature in a hypothetical future version of C#; if you have constructive opinions pro or con on such a feature, feel free to send them to me via the contact link on my blog.
I think the idea with Java was that a top-level class would represent a single unit of code that would be compiled to a separate .class file. The idea was that these discrete, self-contained units of code could then be easily shared and combined into many projects (just as a carpenter can combine basic parts like nuts, bolts, pieces of wood, etc., to make a variety of items). The class was seen as the smallest, most basic atomic unit, and thus everything should be a part of a class to make it easier to assemble larger programs from these parts.
One can argue that object-oriented programming's promise of easily composable code didn't work out very well, but back when Java was being designed, the goal of OOP was to create little units (classes) that could easily be combined to make unique programs.
I imagine C# had some of the same goals in mind.

Comparing C# and Java

I learned Java in college, and then I was hired by a C# shop and have used that ever since. I spent my first week realizing that the two languages were almost identical, and the next two months figuring out the little differences. For the most part, I was noticing the things that Java had that C# doesn't, and thus was mostly frustrated. (Example: enum types which are full-fledged classes, not just integers with a fresh coat of paint.) I have since come to appreciate the C# world, but I can't say I knew Java well enough to really contrast the two, so I'm curious to get a community cross-section.
What are the relative merits and weaknesses of C# and Java? This includes everything from language structure to available IDEs and server software.
Comparing and contrasting the languages can be quite difficult, as in many ways it is the associated libraries that you use with each language that best showcase the advantages of one over the other.
So I'll try to list out as many things I can remember or that have already been posted and note who I think has the advantage:
GUI development (thick or thin). C# combined with .NET is currently the better choice.
Automated data source binding. C# has a strong lead with LINQ, and a wealth of 3rd party libraries also gives it the edge
SQL connections. Java
Auto-boxing. Both languages provide it, but C# properties provide a better design with regard to setters and getters
Annotations/Attributes. C# attributes are a stronger and clearer implementation
Memory management - Java VM in all the testing I have done is far superior to CLR
Garbage collection - Java is another clear winner here. Unmanaged code with the C#/.NET framework makes this a nightmare, especially when working with GUIs.
Generics - I believe the two languages are basically tied here... I've seen good points showing either side being better. My gut feeling is that Java is better, but I have nothing logical to base it on. Also, I've used C# generics a lot and Java generics only a few times...
Enumerations. Java all the way, C# implementation is borked as far as I'm concerned.
XML - Toss-up here. The XML and serialization capabilities you get with .NET natively beat what you get with Eclipse/Java out of the box. But there are lots of libraries for both products to help with XML... I've tried a few and was never really happy with any of them. I've stuck with native C# XML combined with some custom libraries I made on my own, and I'm used to it, so it's hard to give this a fair comparison at this point...
IDE - Eclipse is better than Visual Studio for non-GUI work. So Java wins for non-GUI and Visual Studio wins for GUI...
Those are all the items I can think of for the moment... I'm sure you could literally pick hundreds of items to compare and contrast the two. Hopefully this list is a cross-section of the more commonly used features...
One difference is that C# can work with Windows better. The downside of this is that it doesn't work well with anything but Windows (except maybe with Mono, which I haven't tried).
Another thing to keep in mind, you may also want to compare their respective VMs.
Comparing the CLR and Java VM will give you another way to differentiate between the two.
For example, if doing heavy multithreading, the Java VM has a stronger memory model than the CLR (.NET's equivalent).
C# has a better GUI with WPF, something that Java has traditionally been poor at.
C# has LINQ which is quite good.
Otherwise the 2 are practically the same - how do you think they created such a large class library so quickly when .NET first came out? Things have changed slightly since then, but fundamentally, C# could be called MS-Java.
Don't take this as anything more than an opinion, but personally I can't stand Java's GUI. It's just close enough to Windows but not quite, so it gets into an uncanny valley area where it's just really upsetting to me.
C# (and other .Net languages, I suppose) allow me to make programs that perfectly blend into Windows, and that makes me happy.
Of course, it's moot if we're not talking about developing a desktop application...
Java:
Enums in Java kick so much ass, it's not even funny.
Java supports generic variance
C#:
C# is no longer limited to Windows (Mono).
The lack of the keyword internal in Java is rather disappointing.
You said:
enum types which are full-fledged classes, not just integers with a fresh coat of paint
Have you actually looked at the output? If you compile an application with enums in it and then read the CIL, you'll see that an enum is actually a sealed class deriving from System.Enum.
Tools such as Red Gate's (formerly Lutz Roeder's) Reflector will disassemble it as close to the original C# as possible, so it may not be easily visible what is actually happening under the hood.
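For example, compiling a trivial enum and inspecting the IL shows something along these lines (a simplified sketch of what the disassembly expresses, not exact decompiler output):

    // C# source:
    public enum OrderStatus { Pending, Shipped }

    // Roughly what the IL describes:
    // public sealed class OrderStatus : System.Enum
    // {
    //     public int value__;                              // the underlying integer field
    //     public static literal OrderStatus Pending = 0;   // one literal field per member
    //     public static literal OrderStatus Shipped = 1;
    // }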
As Elizabeth Barrett Browning said: How do I love thee? Let me count the ways.
Please excuse the qualitative (vs. quantitative) aspect of this post.
Comparing these 2 languages (and their associated run-times) is very difficult. Comparisons can be at many levels and focus on many different aspects (such as GUI development mentioned in earlier posts). Preference between them is often personal and not just technical.
C# was originally based on Java (and the CLR on the JRE) but, IMHO, has, in general, gone beyond Java in its features, expressiveness and possibly utility. Being controlled by one company (vs. a committee), C# can move forward faster than Java can. The differences ebb and flow across releases with Java often playing catch up (such as the recent addition of lambdas to Java which C# has had for a long time). Neither language is a super-set of the other in all aspects as both have features (and foibles) the other lacks.
A detailed side-by-side comparison would likely take several hundred pages. But my net take is that for most modern business-related programming tasks they are similar in power and utility. The most critical difference is probably portability: Java runs on nearly all popular platforms, while C# runs mostly only on Windows-based platforms (ignoring Mono, which has not been widely successful). Java, because of its portability, arguably has a larger developer community and thus more third-party library and framework support.
If you feel the need to select between them, your best criteria is your platform of interest. If all your work will run only on Windows systems, IMHO, C#/CLR, with its richer language and its ability to directly interact with Windows' native APIs, is a clear winner. If you need cross system portability then Java/JRE is a clear winner.
PS. If you need more portable job skills, then IMHO Java is also a winner.
