I looked at a lot of event sourcing tutorials, and all of them use simple demos to focus on the tutorial's topic (event sourcing).
That's fine until you hit something in a real-world application that is not covered in one of these tutorials :)
I hit something like this.
I have two databases, one event-store and one projection-store (Read models)
All aggregates have a GUID Id, which was 100% fine until now.
Now I created a new JobAggregate and a Job Projection.
And my company requires it to have a unique, incremental int64 Job Id.
Now I'm looking stupid :)
An additional issue is that a job is created multiple times per second!
That means the method to get the next number has to be really safe.
In the past (without ES) I had a table, defined the PK as auto-increment int64, saved the Job, and the DB did the job of giving me the next number. Done.
But how can I do this within my Aggregate or command handler?
Normally the projection job is created by the event handler, but that's too late in the process, because the aggregate should already have the int64. (For replaying the aggregate on an empty DB and keeping the same Aggregate Id -> Job Id relation.)
How should I solve this issue?
Kind regards
In the past (without ES) I had a table, defined the PK as auto-increment int64, saved the Job, and the DB did the job of giving me the next number. Done.
There's one important thing to notice in this sequence, which is that the generation of the unique identifier and the persistence of the data into the book of record both share a single transaction.
When you separate those ideas, you are fundamentally looking at two transactions -- one that consumes the id, so that no other aggregate tries to share it, and another to write that id into the store.
The best answer is to arrange that both parts are part of the same transaction -- for example, if you were using a relational database as your event store, then you could create an entry in your "aggregate_id to long" table in the same transaction as the events are saved.
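If the event store is relational, a minimal sketch of that idea might look like the following (C#; the JobNumbers and Events tables, their columns, and the payload parameter are illustrative assumptions, not from the original post):
// Sketch only (C#, System.Data.SqlClient): reserve the next int64 job number and
// append the JobCreated event in ONE transaction, so a number can never be handed
// out without its event. Table and column names are hypothetical.
long CreateJob(SqlConnection connection, Guid aggregateId, byte[] jobCreatedPayload)
{
    using (var tx = connection.BeginTransaction())
    {
        long jobNumber;
        using (var reserve = new SqlCommand(
            "INSERT INTO JobNumbers (AggregateId) OUTPUT INSERTED.JobNumber VALUES (@aggregateId)",
            connection, tx))
        {
            reserve.Parameters.AddWithValue("@aggregateId", aggregateId);
            // JobNumber is assumed to be a bigint IDENTITY column.
            jobNumber = (long)reserve.ExecuteScalar();
        }

        using (var append = new SqlCommand(
            "INSERT INTO Events (StreamId, Version, Payload) VALUES (@streamId, 0, @payload)",
            connection, tx))
        {
            append.Parameters.AddWithValue("@streamId", aggregateId);
            append.Parameters.AddWithValue("@payload", jobCreatedPayload);
            append.ExecuteNonQuery();
        }

        tx.Commit(); // the reservation and the event become visible together, or not at all
        return jobNumber;
    }
}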
Another possibility is to treat the "create" of the aggregate as a Prepare followed by a Created; with an event handler that responds to the prepare event by reserving the long identifier post facto, and then sends a new command to the aggregate to assign the long identifier to it. So all of the consumers of Created see the aggregate with the long assigned to it.
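Sketched as messages, that two-step creation might look roughly like this (C#; all of the type names are invented for illustration, not taken from the question):
// Hypothetical message types for the Prepare/Created split described above.
public sealed record PrepareJob(Guid JobId, string Description);       // command: start creation
public sealed record JobPrepared(Guid JobId, string Description);      // event: aggregate exists, no number yet
public sealed record AssignJobNumber(Guid JobId, long JobNumber);      // command: sent by the reservation handler
public sealed record JobCreated(Guid JobId, long JobNumber, string Description); // event consumed by everyone else

public interface IJobNumberReservations { long ReserveNext(); }        // hypothetical reservation store
public interface ICommandSender { void Send(object command); }         // hypothetical command bus

// Hypothetical handler that reacts to JobPrepared, reserves the long id,
// and routes it back to the aggregate, which then emits JobCreated.
public sealed class JobNumberReservationHandler
{
    private readonly IJobNumberReservations _reservations;
    private readonly ICommandSender _commands;

    public JobNumberReservationHandler(IJobNumberReservations reservations, ICommandSender commands)
    {
        _reservations = reservations;
        _commands = commands;
    }

    public void Handle(JobPrepared @event)
    {
        long number = _reservations.ReserveNext();                  // consumes the identifier
        _commands.Send(new AssignJobNumber(@event.JobId, number));  // aggregate emits JobCreated with the number
    }
}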
It's worth noting that you are assigning what is effectively a random long to each aggregate you are creating, so you better dig in to understand what benefit the company thinks it is getting from this -- if they have expectations that the identifiers are going to provide ordering guarantees, or completeness guarantees, then you had best understand that going in.
There's nothing particularly wrong with reserving the long first; depending on how frequently the save of the aggregate fails, you may end up with gaps. For the most part, you should expect to be able to maintain a small failure rate (ie - you check to ensure that you expect the command to succeed before you actually run it).
In a real sense, the generation of unique identifiers falls under the umbrella of set validation; we usually "cheat" with UUIDs by abandoning any pretense of ordering and pretending that the risk of collision is zero. Relational databases are great for set validation; event stores maybe not so much. If you need unique sequential identifiers controlled by the model, then your "set of assigned identifiers" needs to be within an aggregate.
The key phrase to follow is "cost to the business" -- make sure you understand why the long identifiers are valuable.
Here's how I'd approach it.
I agree with the idea of an Id generator which is the "business Id" but not the "technical Id".
Here the core is to have an application-level JobService that deals with all the infrastructure services to orchestrate what is to be done.
Controllers (like web controller or command-lines) will directly consume the JobService of the application level to control/command the state change.
It's in PHP-like pseudocode, but here we talk about the architecture and processes, not the syntax. Adapt it to C# syntax and the thing is the same.
Application level
class MyNiceWebController
{
    public function createNewJob( string $jobDescription, xxxx $otherData, ApplicationJobService $jobService )
    {
        $projectedJob = $jobService->createNewJobAndProjectIt( $jobDescription, $otherData );
        $this->doWhateverYouWantWithYourAlreadyExistingJobLikeForExample301RedirectToDisplayIt( $projectedJob );
    }
}
class MyNiceCommandLineCommand
{
    private $jobService;

    public function __construct( ApplicationJobService $jobService )
    {
        $this->jobService = $jobService;
    }

    public function createNewJob()
    {
        $jobDescription = // Get it from the command line parameters
        $otherData = // Get it from the command line parameters
        $projectedJob = $this->jobService->createNewJobAndProjectIt( $jobDescription, $otherData );
        // print, echo, console->output... confirmation with Id or print the full object... whatever with ( $projectedJob );
    }
}
class ApplicationJobService
{
    // In the application level because it just serves first-level requests
    // from controllers, commands, etc. but does not add "domain" logic.

    private $application;
    private $jobIdGenerator;
    private $jobEventFactory;
    private $jobEventStore;
    private $jobProjector;

    public function __construct( Application $application, JobBusinessIdGeneratorService $jobIdGenerator, JobEventFactory $jobEventFactory, JobEventStoreService $jobEventStore, JobProjectorService $jobProjector )
    {
        $this->application = $application; // I like to track which application execution is responsible for all domain effects; I can then trace IPs, cookies, etc. by crossing data from another data lake.
        $this->jobIdGenerator = $jobIdGenerator;
        $this->jobEventFactory = $jobEventFactory;
        $this->jobEventStore = $jobEventStore;
        $this->jobProjector = $jobProjector;
    }

    public function createNewJobAndProjectIt( string $jobDescription, xxxx $otherData ) : Job
    {
        $applicationExecutionId = $this->application->getExecutionId();
        $businessId = $this->jobIdGenerator->getNextJobId();
        $jobCreatedEvent = $this->jobEventFactory->createNewJobCreatedEvent( $applicationExecutionId, $businessId, $jobDescription, $otherData );
        $this->jobEventStore->storeEvent( $jobCreatedEvent ); // Throws an exception if it fails, so the projector will not be invoked if the event was not stored.
        $entityId = $jobCreatedEvent->getId();
        $projectedJob = $this->jobProjector->project( $entityId );
        return $projectedJob;
    }
}
Note: if projecting is too expensive for synchronous projection, just return the Id:
        // ...
        $entityId = $jobCreatedEvent->getId();
        $this->jobProjector->enqueueProjection( $entityId );
        return $entityId;
    }
}
Infrastructure level (common to various applications)
class JobBusinessIdGenerator implements DomainLevelJobBusinessIdGeneratorInterface
{
    // In infrastructure because it accesses persistence layers.
    // In the constructor, get persistence objects and so on... database, files, whatever.

    public function getNextJobId() : int
    {
        $this->lockGlobalCounterMaybeAtDatabaseLevel();
        $current = $this->persistence->getCurrentJobCounter();
        $next = $current + 1;
        $this->persistence->setCurrentJobCounter( $next );
        $this->unlockGlobalCounterMaybeAtDatabaseLevel();
        return $next;
    }
}
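Since the original question is in C#, a rough equivalent of this generator might look like the sketch below. It assumes SQL Server and a single-row JobCounter table, both of which are my own illustration; the row lock taken by the atomic UPDATE plays the role of the explicit lock/unlock pair above.
// Sketch of JobBusinessIdGenerator in C# (System.Data.SqlClient).
public sealed class JobBusinessIdGenerator
{
    private readonly string _connectionString;

    public JobBusinessIdGenerator(string connectionString)
    {
        _connectionString = connectionString;
    }

    public long GetNextJobId()
    {
        using (var connection = new SqlConnection(_connectionString))
        {
            connection.Open();
            using (var command = new SqlCommand(
                "UPDATE JobCounter SET Current = Current + 1 OUTPUT INSERTED.Current",
                connection))
            {
                // The increment is persisted before the value is returned,
                // so the same id can never be handed out twice.
                return (long)command.ExecuteScalar();
            }
        }
    }
}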
Domain Level
class JobEventFactory
{
    // It's in this factory that we create the entity Id.

    private $idGenerator;

    public function __construct( EntityIdGenerator $idGenerator )
    {
        $this->idGenerator = $idGenerator;
    }

    public function createNewJobCreatedEvent( Id $applicationExecutionId, int $businessId, string $jobDescription, xxxx $otherData ) : JobCreatedEvent
    {
        $eventId = $this->idGenerator->createNewId();
        $entityId = $this->idGenerator->createNewId();
        // The only place where we allow "new" is in the factories. No other place should ever do a "new".
        $event = new JobCreatedEvent( $eventId, $entityId, $applicationExecutionId, $businessId, $jobDescription, $otherData );
        return $event;
    }
}
If you do not like the factory creating the entityId (it could seem ugly to some eyes), just pass it in as a parameter with a specific type, and pass the responsibility for creating a fresh one (never reusing an existing one) to some other intermediate service (never the application service).
Nevertheless, if you do so, be careful about what happens if a "silly" service creates two JobCreatedEvents with the same entity Id. That would really be ugly. In the end, creation only occurs once, and the Id is created at the very core of the "creation of the event of JobCreatedEvent" (redundant redundancy intended). Your choice anyway.
Other classes...
class JobCreatedEvent;
class JobEventStoreService;
class JobProjectorService;
Things that do not matter in this post
We could discuss at length whether the projectors should be at the infrastructure level, global to the multiple applications calling them... or even in the domain (as I need "at least" one way to read the model), or whether they belong more to the application (maybe the same model can be read in 4 different ways by 4 different applications, each with their own projectors)...
We could discuss at length where the side-effects are triggered, whether implicitly in the event store or at the application level (I've not called any side-effects processor == event listener). I think of side-effects as being in the application layer, as they depend on infrastructure...
But all this... is not the topic of this question.
I don't care about all those things for this "post". Of course they are not negligible topics and you will have your own strategy for them, and you have to design all this very carefully. But here the question was where to create the auto-incremental Id coming from a business requirement, and doing all those projectors (sometimes called calculators) and side-effects (sometimes called reactors) in a "clean-code" way here would blur the focus of this answer. You get the idea.
Things that I care about in this post
What I care is that:
If the experts want an "autonumeric", then it's a "domain requirement" and therefore it's a property at the same level of definition as "description" or "other data".
The fact that they want this property does not conflict with the fact that all entities have an "internal id" in whatever format the coder chooses, be it a UUID, a SHA-1 or whatever.
If you need sequential ids for that property, you need a "supplier of values", AKA the JobBusinessIdGeneratorService, which has nothing to do with the "entity Id" itself.
That Id generator is responsible for ensuring that once the number has been incremented, it is synchronously persisted before it is returned to the client, so it is impossible to return the same id twice after a failure.
Drawbacks
There's a sequence-leak you'll have to deal with:
If the Id generator points to 4007, the next call to getNextJobId() will increment it to 4008, persist the pointer to "current = 4008" and then return.
If for some reason the creation and persistence fails, then the next call will give 4009. We then will have a sequence of [ 4006, 4007, 4009, 4010 ], with 4008 missing.
That's because, from the generator's point of view, 4008 was "actually used", and it, as a generator, does not know what you did with it, the same way as if you had a dummy loop that extracted 100 numbers.
Never compensate with a ->rollback() in a catch of a try/catch block, because that can create concurrency problems: if you get 4008 and another process gets 4009, and then the first process fails, the rollback will break the sequence. Just assume that on failure the Id was simply "consumed" and do not blame the generator. Blame whoever failed.
I hope it helps!
@SharpNoizy, very simple.
Create your own Id generator. Say an alphanumeric string, for example "DB3U8DD12X", which gives you billions of possibilities. Now, what you want to do is generate these ids in sequential order by giving each character an ordered value...
0 - 0
1 - 1
2 - 2
.....
10 - A
11 - B
Get the idea? So, what you do next is to create your function that will increment each index of your "D74ERT3E4" string using that matrix.
So, "R43E4D", "R43E4E", "R43E4F", "R43E4G"... get the idea?
Then, when your application loads, you look at the database and find the latest Id generated. Then you load into memory the next 50,000 combinations (in case you want super speed) and create a static class/method that is going to give you that value back.
Aggregate.Id = IdentityGenerator.Next();
this way you have control over the generation of your IDs because that's the only class that has that power.
I like this approach because it's more "readable" when using it in your web API, for example. GUIDs are hard (and tedious) to read, remember, etc.
GET api/job/DF73 is way better to remember than api/job/XXXX-XXXX-XXXXX-XXXX-XXXX
Does that make sense?
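A minimal sketch of that idea in C#, assuming a base-36 alphabet and a hypothetical seeding call for the last issued value (the pre-loading of 50,000 combinations is left out for brevity):
// Sketch: sequential alphanumeric ids ("...X", "...Y", "...Z", then carry to the next column).
public static class IdentityGenerator
{
    private const string Alphabet = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    private static long _current; // seeded from the database at startup (hypothetical)

    public static void Seed(long lastIssuedValue) => _current = lastIssuedValue;

    public static string Next()
    {
        // Interlocked keeps the increment safe if several threads create jobs at once.
        long value = System.Threading.Interlocked.Increment(ref _current);
        return Encode(value);
    }

    private static string Encode(long value)
    {
        var chars = new System.Collections.Generic.Stack<char>();
        do
        {
            chars.Push(Alphabet[(int)(value % Alphabet.Length)]);
            value /= Alphabet.Length;
        } while (value > 0);
        return new string(chars.ToArray());
    }
}
Usage would then be exactly the line from the answer: Aggregate.Id = IdentityGenerator.Next();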
Related
Given the invariant that a child collection cannot exceed x number of items, how can the domain guarantee such an invariant is enforced in a concurrent/web environment? Let's look at a (classic) example:
We have a Manager with Employees. The (hypothetical) invariant states that a Manager cannot have more than seven direct reports (Employees). We might implement this (naively) like so:
public class Manager {
    // Let us assume that the employee list is mapped (somehow) from a persistence layer
    public IList<Employee> employees { get; private set; }

    public Manager(...) {
        ...
    }

    public void AddEmployee(Employee employee) {
        if (employees.Count() < 7) {
            employees.Add(employee);
        } else {
            throw new OverworkedManagerException();
        }
    }
}
Until recently, I had considered this approach to be good enough. However, it seems there is an edge-case that makes it possible for the database to store more than seven employees and thus break the invariant. Consider this series of events:
Person A goes to edit Manager in UI
(6 employees in memory, 6 employees in database)
Person B goes to edit Manager in UI
(6 employees in memory, 6 employees in database)
Person B adds Employee and saves changes
(7 employees in memory, 7 employees in database)
Person A adds Employee and saves changes
(7 employees in memory, 8 employees in database)
When the domain object is once again pulled from the database, the Manager constructor may (or may not) reinforce the Employee count invariant on the collection, but either way we now have a discrepancy between our data and what our invariant expects. How do we prevent this situation from happening? How do we recover from this cleanly?
Consider this series of events:
Person A goes to edit Manager in UI
(6 employees in memory, 6 employees in database)
Person B goes to edit Manager in UI
(6 employees in memory, 6 employees in database)
Person B adds Employee and saves changes
(7 employees in memory, 7 employees in database)
Person A adds Employee and saves changes
(7 employees in memory, 8 employees in database)
The simplest approach is to implement the database writes as a compare and swap operation. All writes are working with a stale copy of the aggregate (after all, we're looking at the aggregate in memory, but the book of record is the durable copy on disk). The key idea is that when we actually perform the write, we are also checking that the stale copy we were working with is still the live copy in the book of record.
(For instance, in an event sourced system, you don't append to the stream, but append to a specific position in the stream -- ie, where you expect the tail pointer to be. So in a race, only one write gets to commit to the tail position; the other fails on a concurrency conflict and starts over.)
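For example, an event-store append with an expected version might be shaped like this (sketch; the IEventStore interface, the ConcurrencyException, and the Version/UncommittedEvents members on the aggregate are all assumptions for illustration, not part of the question's code):
// Sketch of compare-and-swap on an event stream: the write succeeds only if the
// stream is still at the version we read when we loaded the aggregate.
public interface IEventStore
{
    // Assumed to throw a ConcurrencyException if the stream's current version != expectedVersion.
    void Append(string streamId, long expectedVersion, IReadOnlyList<object> events);
}

void Save(Manager manager, IEventStore store)
{
    // manager.Version is the version we loaded; if another writer has moved the stream
    // past it, this append fails and the command is retried against fresh state.
    store.Append($"manager-{manager.Id}", manager.Version, manager.UncommittedEvents);
}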
The analog to this in a web environment might be to use an eTag, and verify that the etag is still valid when you perform the write. The winner gets a successful response, the loser gets a 412 Precondition Failed.
An improvement on this is to use a better model for your domain. Udi Dahan wrote:
A microsecond difference in timing shouldn’t make a difference to core business behaviors
Specifically, if your model ends up in a different state just because commands A and B happen to be processed in a different order, your model probably doesn't match your business very well.
The analog in your example would be that both commands should succeed, but the second of the two should also set a flag that notes that the aggregate is currently out of compliance. This approach prevents idiocies when an addEmployee command and a removeEmployee command happen to get ordered the wrong way around in the transport layer.
The (hypothetical) invariant states that a Manager cannot have more than seven direct reports
A thing to be wary of -- even in hypothetical examples -- is whether or not the database is the book of record. The database seldom gets veto power over the real world. If the real world is the book of record, you probably shouldn't be rejecting changes.
How do we prevent this situation from happening?
You implement this behavior in your Repository implementation: when you load the Aggregate, you also keep track of the Aggregate's version. The version can be implemented as a unique key constraint on the Aggregate's Id and an integer sequence number. Every Aggregate has its own sequence number (initially every Aggregate has sequence number 0). Before the Repository tries to persist the Aggregate, it increments the sequence number; if a concurrent persist has occurred, the database behind the Repository will throw a "unique key constraint violated" kind of exception and the persist will not occur.
Then (if you have designed the Aggregate as a pure, non-side-effect object, as you should do in DDD!), you can transparently retry the command execution, re-running all of the Aggregate's domain code and thus re-checking the invariants. Please note that the operation must be retried only if a "unique constraint violation" infrastructure exception occurs, not when the Aggregate throws a domain exception.
How do we recover from this cleanly?
You could retry the command execution until no "unique constraint violation" is thrown.
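In C#, that retry loop might look something like this (sketch; UniqueConstraintViolationException stands in for whatever exception your persistence layer actually throws):
// Sketch: re-run the whole command (reload aggregate, re-check invariants, persist)
// until the optimistic-concurrency check stops failing.
public TResult ExecuteWithRetry<TResult>(Func<TResult> commandExecution, int maxAttempts = 5)
{
    for (int attempt = 1; ; attempt++)
    {
        try
        {
            return commandExecution(); // loads the aggregate, runs domain code, persists
        }
        catch (UniqueConstraintViolationException) when (attempt < maxAttempts)
        {
            // Only infrastructure-level concurrency conflicts are retried;
            // domain exceptions thrown by the aggregate are allowed to escape.
        }
    }
}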
I've implemented this retrying in PHP here: https://github.com/xprt64/cqrs-es/blob/master/src/Gica/Cqrs/Command/CommandDispatcher/ConcurrentProofFunctionCaller.php
This is not so much a DDD problem as a persistence layer problem. There are multiple ways to look at this.
From a traditional ACID/strong consistency perspective
You need to have a look at your particular database's available concurrency and isolation strategies, possibly reflected in your ORM capabilities. Some of them will allow you to detect such conflicts and throw an exception as Person A saves their changes at step 4.
As I said in my comment, in a typical web application that uses the Unit of Work pattern (via an ORM or otherwise), this shouldn't happen quite as often as you seem to imply though. Entities don't stay in memory tracked by the UoW all along steps 1. to 4., they are reloaded at steps 3. and 4. Transactions 3 and 4 would have to be concurrent for the problem to occur.
Weaker, lock-free consistency
You have a few options here.
Last-one-wins, where the 7 employees from Person A will erase those from Person B. This can be viable in certain business contexts. You can do it by persisting the change as an employees = <new list> instead of employees.Add.
Relying on version numbers, as @VoiceOfUnreason described.
Eventual consistency with compensation, where something else in the application checks the invariant (employees.Count() < 7) after the fact, outside of Person A and B's transactions. A compensating action has to be taken if a violation of the rule is detected, like rolling back the last operation and notifying Person A that the manager would have been overworked.
I have a class which is basically a pipeline. It processes messages and then deletes them in batches. In order to do this the ProcessMessage() method doesn't directly delete messages; it adds them to a private Observable<IMessage>(). I then have another public method which watches that observable and deletes the messages en masse.
That results in code similar to:
public void CreateDeletionObservable(int interval = 30, int messageCount = 10)
{
    this.processedMessages.Buffer(TimeSpan.FromSeconds(interval), messageCount).Subscribe(observer =>
    {
        client.Value.DeleteMessages(observer.ToList());
    });
}
The problem is that my unit test doesn't have a value for processedMessages. I can't provide a moq'd value as it's private. I don't need to test what values are in processedMessages; I just need for them to exist in order to test that method's behavior. Specifically I need to test that my observable will continue running if an exception is thrown (that logic isn't in the code yet). As I see it I have a few options:
1) Refactor my class to use a single monster observable chain with a single entry point and a few exits (success, error, retry, etc.). This would avoid the use of private properties to pass collections around between public methods. However, that chain would be extremely difficult to parse much less unit test. I don't believe that making my code less readable and testable is a viable option.
2) Modify my CreateDeletionObservable method to accept a test list of Messages:
public void CreateDeletionObservable(int interval = 30, int messageCount = 10, IObservable<IMessage> processedMessages = null)
That would allow me to supply stubbed data for the method to use, but it's a horrible code smell. A variation on this is to inject that Observable at the constructor level, but that's no better. Possibly worse.
3) Make processedMessages public.
4) Don't test this functionality.
I don't like any of these options, but I'm leaning towards 2; injecting a list for testing purposes. Is there an option I'm missing here?
Your senses serve you well. I think in this case you can fall back on guidance I find useful: "Test your boundaries" (Udi Dahan, but I can't find the reference).
It seems that you can input messages (via an observable sequence) and that, as a side effect, you will eventually delete these messages from the client. So it seems that your test should read something like
"Given an EventProcessor, When 10 Messages are Processed, Then the Events are deleted from the client"
"Given an EventProcessor, When 5 Messages are Processed in 30s, Then the Events are deleted from the client"
So instead of testing this small part of the pipe that somehow knows about this.processedMessages (where did that instance come from?), test the chain. But this doesn't mean you need to create a massive unusable chain. Just create enough of the chain to make it testable.
Providing more of the code base would also help, e.g. where does this.processedMessages & client.Value come from? This is probably key and at a guess applying a more functional approach might help?
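As an illustration of "create enough of the chain to make it testable", one option is to let the pipeline take its source observable and client as constructor dependencies, so a test can drive it with a Subject and a fake client. This is a sketch under my own assumptions: the IMessage/IMessageClient/DeletionPipeline names are invented, and the test relies on the count trigger of Buffer so it doesn't have to wait for the time window.
using System;
using System.Collections.Generic;
using System.Reactive.Linq;
using System.Reactive.Subjects;

public interface IMessage { }
public interface IMessageClient { void DeleteMessages(IList<IMessage> batch); }

// Hypothetical seam: the pipeline is built from an injected source and client.
public sealed class DeletionPipeline
{
    public DeletionPipeline(IObservable<IMessage> processedMessages, IMessageClient client,
                            int interval = 30, int messageCount = 10)
    {
        processedMessages
            .Buffer(TimeSpan.FromSeconds(interval), messageCount)
            .Where(batch => batch.Count > 0)   // skip the empty buffers the time window emits
            .Subscribe(batch => client.DeleteMessages(batch));
    }
}

// In a test: push messages through a Subject and assert the boundary side effect.
// var source = new Subject<IMessage>();
// var client = new FakeMessageClient();      // hypothetical fake that records DeleteMessages calls
// var pipeline = new DeletionPipeline(source, client, interval: 30, messageCount: 2);
// source.OnNext(messageA);
// source.OnNext(messageB);                   // count trigger fires the buffer immediately
// Assert.Single(client.DeletedBatches);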
I know the title is a little too broad, but I'd like to know how to avoid (if possible) this piece of code I've just coded on a solution of ours.
The problem started when this code resulted in not enough log information:
...
var users = [someRemotingProxy].GetUsers([someCriteria]);
try
{
    var user = users.Single();
}
catch (InvalidOperationException)
{
    logger.WarnFormat("Either there are no users corresponding to the search or there are multiple users matching the same criteria.");
    return;
}
...
We have a business logic in a module of ours that needs there to be a single 'User' that matches some criteria. It turned out that, when problems started showing up, this little 'inconclusive' information was not enough for us to properly know what happened, so I coded this method:
private User GetMappedUser([searchCriteria])
{
    var users = [remotingProxy]
        .GetUsers([searchCriteria])
        .ToList();

    switch (users.Count())
    {
        case 0:
            log.Warn("No user exists with [searchCriteria]");
            return null;
        case 1:
            return users.Single();
        default:
            log.WarnFormat("{0} users [{1}] have been found",
                users.Count(),
                String.Join(", ", users));
            return null;
    }
}
And then called it from the main code like this:
...
var user = GetMappedUser([searchCriteria]);
if (user == null) return;
...
The first odd thing I see there is the switch statement over the .Count() on the list. This seems very strange at first, but somehow ended up being the cleaner solution. I tried to avoid exceptions here because these conditions are quite normal, and I've heard that it is bad to try and use exceptions to control program flow instead of reporting actual errors. The code was throwing the InvalidOperationException from Single before, so this was more of a refactor on that end.
Is there another approach to this seemingly simple problem? It seems to be kind of a Single Responsibility Principle violation, with the logs in between the code and all that, but I fail to see a decent or elegant way out of it. It's even worse in our case because the same steps are repeated twice, once for the 'User' and then for the 'Device', like this:
Get unique user
Get unique device of unique user
For both operations, it is important to us to know exactly what happened, what users/devices were returned in case it was not unique, things like that.
@AntP hit upon the answer I like best. I think the reason you are struggling is that you actually have two problems here. The first is that the code seems to have too much responsibility. Apply this simple test: give this method a simple name that describes everything it does. If your name includes the word "and", it's doing too much. When I apply that test, I might name it "GetUsersByCriteriaAndValidateOnlyOneUserMatches()." So it is doing two things. Split it up into a lookup function that doesn't care how many users are returned, and a separate function that evaluates your business rule regarding "I can handle only one user returned".
You still have your original problem, though, and that is the switch statement seems awkward here. The strategy pattern comes to mind when looking at a switch statement, although pragmatically I'd consider it overkill in this case.
If you want to explore it, though, think about creating a base "UserSearchResponseHandler" class, and three sub classes: NoUsersReturned; MultipleUsersReturned; and OneUserReturned. It would have a factory method that would accept a list of Users and return a UserSearchResponseHandler based on the count of users (encapsulating the logic of the switch inside the factory.) Each handler method would do the right thing: log something appropriate then return null, or return a single user.
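A rough outline of that shape (sketch only; the class names follow the ones suggested above, and `log` is the same logger used in the question's code):
// Sketch of the factory + handler idea: the factory interprets the business rule
// (how many users came back), and each handler deals with one outcome.
public abstract class UserSearchResponseHandler
{
    public abstract User Handle();

    public static UserSearchResponseHandler For(IList<User> users)
    {
        switch (users.Count)
        {
            case 0: return new NoUsersReturned();
            case 1: return new OneUserReturned(users[0]);
            default: return new MultipleUsersReturned(users);
        }
    }
}

public sealed class NoUsersReturned : UserSearchResponseHandler
{
    public override User Handle()
    {
        log.Warn("No user exists with [searchCriteria]");
        return null;
    }
}

public sealed class OneUserReturned : UserSearchResponseHandler
{
    private readonly User _user;
    public OneUserReturned(User user) { _user = user; }
    public override User Handle() { return _user; }
}

public sealed class MultipleUsersReturned : UserSearchResponseHandler
{
    private readonly IList<User> _users;
    public MultipleUsersReturned(IList<User> users) { _users = users; }
    public override User Handle()
    {
        log.WarnFormat("{0} users [{1}] have been found", _users.Count, String.Join(", ", _users));
        return null;
    }
}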
The main advantage of the Strategy pattern comes when you have multiple needs for the data it identifies. If you had switch statements buried all over your code that all depended on the count of users found by a search, then it would be very appropriate. The factory can also encapsulate substantially more complex rules, such as "user.count must = 1 AND the user[0].level must = 42 AND it must be a Tuesday in September". You can also get really fancy with a factory and use a registry, allowing for dynamic changes to the logic. Finally, the factory nicely separates the "interpreting" of the business rule from the "handling" of the rule.
But in your case, probably not so much. I'm guessing you likely have only the one occurrence of this rule, it seems pretty static, and it's already appropriately located near the point where you acquired the information it's validating. While I'd still recommend splitting out the search from the response parser, I'd probably just use the switch.
A different way to consider it would be with some Goldilocks tests. If it's truly an error condition, you could even throw:
if (users.Count() < 1)
{
    throw new TooFewUsersReturnedError();
}
if (users.Count() > 1)
{
    throw new TooManyUsersReturnedError();
}
return users[0]; // just right
How about something like this?
public class UserResult
{
    public string Warning { get; set; }
    public IEnumerable<User> Result { get; set; }
}

public UserResult GetMappedUsers(/* params */) { }

public void Whatever()
{
    var users = GetMappedUsers(/* params */);
    if (!String.IsNullOrEmpty(users.Warning))
        log.Warn(users.Warning);
}
Switch to a List<string> of warnings if required. This treats your GetMappedUsers method more like a service that returns some data and some metadata about the result, which allows you to delegate your logging to the caller - where it belongs - so your data access code can get on with just doing its job.
Although, to be honest, in this scenario I would prefer simply to return a list of user IDs from GetMappedUsers and then use users.Count to evaluate your "cases" in the caller and log as appropriate.
So I have a House class that has a method House.buy(Person p), causing the person to buy the house. I want to know if it's possible for the Person to buy the House, so I also have a method House.tryBuy(Person p) that returns whether the Person can buy the house. I have an enum BuyState with values like OK, NotEnoughMoney, and AlreadyOwned. There are a few different conditions to be satisfied, and the client would like to know which failed. But what if multiple conditions fail? I could have a hierarchy, like: if the House is already owned and the Person doesn't have enough money, return BuyStates.AlreadyOwned. But this only lets me tell the client one thing.
I could have N separate conditions and an enum with N*N values, like ConditionA_AND_ConditionB_AND_ConditionC, but that makes no sense at all for several reasons. I know there are bit-fields, with a bit for each condition, but they just seem too low-level, annoying to implement, and not scalable. So I need a way to return a list of values from an enum. So how about a class like this:
class C<type_of_enum> {
    private List<type_of_enum> values;
    //etc etc
}
Is this the "best" possible design?
(keeping this question about Java AND C# to keep answers valid)
In Java, the most natural way to do this is with EnumSet. An example of constructing one:
return EnumSet.of(BuyConditions.NotEnoughMoney, BuyConditions.AlreadyOwned);
In C# I would go for a flags enumeration.
Check Designing Flags Enumerations.
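A minimal sketch of that in C# (the member values must be distinct powers of two so they can be combined; the enum name here just mirrors the question):
// Sketch: a flags enum lets one return value carry several failed conditions.
[Flags]
public enum BuyConditions
{
    Ok             = 0,
    NotEnoughMoney = 1 << 0,
    AlreadyOwned   = 1 << 1,
    // further conditions: 1 << 2, 1 << 3, ...
}

// Combining and testing:
// var result = BuyConditions.NotEnoughMoney | BuyConditions.AlreadyOwned;
// bool tooPoor = result.HasFlag(BuyConditions.NotEnoughMoney);   // true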
Is this the "best" possible design?
This seems like a bizarre design all around. When modeling real-world things in software, it pays dividends to make the model reflect reality; such models are easier to understand, maintain, and expand.
First off, a house is not something that buys a person. A person is something that buys a house. The "buy" method should be on "person", not on "house".
Second, a house is not something which determines whether a house can be bought. The owner of the house is the entity which determines whether it can be bought. (Why is there an "already owned" error condition? Of course the house is already owned. Someone owns it.)
Third, you might have to consider what happens in a world where multiple buyers might be attempting to buy the house all at once. In reality, the seller collects various offers and makes counteroffers, sales may be contingent upon other events, and so on. Should all of those things be present in the model? If so, where? Probably in the state of the object representing the owner, since the owner is the thing being negotiated with.
Fourth, in reality, house purchasing transactions usually involve a trusted third party to do escrow, and various other parties such as the seller's and buyer's lenders who might be providing funds or holding liens. Are those parties reflected in this model?
Fifth, if your intention is to add to your model "reasons why you cannot buy this house", then what you are describing is a policy system. In that case, represent policies as first-class objects in your system, so that they can be manipulated like any other objects. Owners have policies about under what conditions they will sell. Banks have policies about the conditions under which they will lend. And so on.
In this model, your problem becomes "ask the policy resolution engine if the buyer meets all the necessary conditions imposed by every relevant agency's policy trees in order to purchase a given house". In other words, the question "can X buy Y?" is not for either X or Y to figure out; it's a matter for a policy resolution engine to work out, and THAT thing is what gives you back a list of the policies X has failed to fulfill in order to purchase Y from Z.
Make sense?
Yes, that does seem like the best design. You want to return a list (or set) of reasons, so it's natural to return it as a set.
I think returning a list of reasons not to buy is great; it's very expressive of what you're trying to do. A set would probably be more appropriate, but only slightly so.
It sounds fine to me. You want to return a list of the conditions that failed, and you're returning a list of the conditions that failed.
You could use a callback:
class House {
    public void buy(Result result) {
        if (true)
            result.ok(this);
        else
            result.error(this, EnumSet.of(Status.NOT_ENOUGH_MONEY, Status.ALREADY_OWNED));
    }
}

enum Status {
    NOT_ENOUGH_MONEY,
    ALREADY_OWNED
}

interface Result {
    public void ok(House house);
    public void error(House house, EnumSet<Status> status);
}
Instead of returning a list, you can pass a list pointer to the function, to be filled up with reasons in case error conditions arise. The function itself can return 0 to indicate success and 1 for failure, in which case you can check the content of the list.
This will make it faster to know if the function call was successful or not, assuming most of the time it will be success.
We have a lot of code that passes about “Ids” of data rows; these are mostly ints or guids. I could make this code safer by creating a different struct for the id of each database table. Then the type checker will help to find cases when the wrong ID is passed.
E.g. the Person table has a column called PersonId, and we have code like:
DeletePerson(int personId)
DeleteCar(int carId)
Would it be better to have:
struct PersonId
{
    private int id;
    // GetHashCode etc....
}

DeletePerson(PersonId personId)
DeleteCar(CarId carId)
Has anyone got real-life experience
of doing this?
Is it worth the overhead?
Or is it more pain than it is worth?
(It would also make it easier to change the data type of the primary key in the database; that is why I thought of this idea in the first place.)
Please don't say "use an ORM" or some other big change to the system design, as I know an ORM would be a better option, but that is not within my power at present. However, I can make minor changes like the above to the module I am working on at present.
Update:
Note this is not a web application, and the Ids are kept in memory and passed around via WCF, so there is no conversion to/from strings at the edge. There is no reason the WCF interface can't use the PersonId type, etc. The PersonId type could even be used in the WPF/WinForms UI code.
The only inherently "untyped" bit of the system is the database.
This seems to be down to the cost/benefit of spending time writing code that the compiler can check better, or spending the time writing more unit tests. I am coming down more on the side of spending the time on testing, as I would like to see at least some unit tests in the code base.
It's hard to see how it could be worth it: I recommend doing it only as a last resort and only if people are actually mixing identifiers during development or reporting difficulty keeping them straight.
In web applications in particular it won't even offer the safety you're hoping for: typically you'll be converting strings into integers anyway. There are just too many cases where you'll find yourself writing silly code like this:
int personId;
if (Int32.TryParse(Request["personId"], out personId)) {
    this.person = this.PersonRepository.Get(new PersonId(personId));
}
Dealing with complex state in memory certainly improves the case for strongly-typed IDs, but I think Arthur's idea is even better: to avoid confusion, demand an entity instance instead of an identifier. In some situations, performance and memory considerations could make that impractical, but even those should be rare enough that code review would be just as effective without the negative side-effects (quite the reverse!).
I've worked on a system that did this, and it didn't really provide any value. We didn't have ambiguities like the ones you're describing, and in terms of future-proofing, it made it slightly harder to implement new features without any payoff. (No ID's data type changed in two years, at any rate - it could certainly happen at some point, but as far as I know, the return on investment for that is currently negative.)
I wouldn't make a special id for this. This is mostly a testing issue. You can test the code and make sure it does what it is supposed to.
You can create a standard way of doing things in your system that helps future maintenance (similar to what you mention) by passing in the whole object to be manipulated. Of course, if you named your parameter (int personID) and had documentation, then any non-malicious programmer should be able to use the code effectively when calling that method. Passing a whole object will do the type matching that you are looking for, and that should be enough of a standardized way.
I just see having a special structure made to guard against this as adding more work for little benefit. Even if you did this, someone could come along and find a convenient way to make a 'helper' method and bypass whatever structure you put in place anyway so it really isn't a guarantee.
You can just opt for GUIDs, like you suggested yourself. Then, you won't have to worry about passing a person ID of "42" to DeleteCar() and accidentally delete the car with ID of 42. GUIDs are unique; if you pass a person GUID to DeleteCar in your code because of a programming typo, that GUID will not be a PK of any car in the database.
You could create a simple Id class which can help differentiate in code between the two:
public class Id<T>
{
    private int RawValue
    {
        get;
        set;
    }

    public Id(int value)
    {
        this.RawValue = value;
    }

    public static explicit operator int (Id<T> id) { return id.RawValue; }

    // this cast is optional and can be excluded for further strictness
    public static implicit operator Id<T> (int value) { return new Id<T>(value); }
}
Used like so:
class SomeClass
{
    public Id<Person> PersonId { get; set; }
    public Id<Car> CarId { get; set; }
}
Assuming your values would only be retrieved from the database, unless you explicitly cast the value to an integer, it is not possible to use the two in each other's place.
I don't see much value in custom checking in this case. You might want to beef up your testing suite to check that two things are happening:
Your data access code always works as you expect (i.e., you aren't loading inconsistent Key information into your classes and getting misuse because of that).
That your "round trip" code is working as expected (i.e., that loading a record, making a change and saving it back isn't somehow corrupting your business logic objects).
Having a data access (and business logic) layer you can trust is crucial to being able to address the bigger-picture problems you will encounter attempting to implement the actual business requirements. If your data layer is unreliable, you will be spending a lot of effort tracking (or worse, working around) problems at that level that surface when you put load on the subsystem.
If instead your data access code is robust in the face of incorrect usage (what your test suite should be proving to you) then you can relax a bit on the higher levels and trust they will throw exceptions (or however you are dealing with it) when abused.
The reason you hear people suggesting an ORM is that many of these issues are dealt with in a reliable way by such tools. If your implementation is far enough along that such a switch would be painful, just keep in mind that your low-level data access layer needs to be as robust as a good ORM if you really want to be able to trust (and thus, to a certain extent, forget about) your data access.
Instead of custom validation, your testing suite could inject code (via dependency injection) that does robust tests of your Keys (hitting the database to verify each change) as the tests run and that injects production code that omits or restricts such tests for performance reasons. Your data layer will throw errors on failed keys (if you have your foreign keys set up correctly there) so you should also be able to handle those exceptions.
My gut says this just isn't worth the hassle. My first question to you would be whether you actually have found bugs where the wrong int was being passed (a Car ID instead of a Person ID in your example). If so, it is probably more of a case of worse overall architecture in that your Domain objects have too much coupling, and are passing too many arguments around in method parameters rather than acting on internal variables.