How to define a complex MongoDb document Id? - c#

I have a situation where I want to store the number of messages a particular user posts in a particular chat on a particular day.
To make the Id unique, I thought I should combine these ids, which gives a string of about 20 characters consisting of ChatId + DDMMYY + UserId:
public class UserContributions
{
    [BsonId]
    public string ChatIdDateUserId { get; set; }
    public int Count { get; set; }
    // the rest
}
But I guess an ID of this length is not good for performance. Is this how I should build a complex ID?
Thanks.

The length should not be too much of a problem. Furthermore, you can have a compound _id
{
_id:{
ChatID: "someId",
Date: ISODate("2017-10-30T00:00:00.000Z"),
UserID: "someUID"
}
}
Some notes first: Do NOT use strings to denote a date. First of all, BSON dates (ISODate) are stored as a 64-bit signed integer counting milliseconds since the Unix epoch. Storing the date as 6 characters saves a little space, but you lose all capabilities for date operations in aggregations as well as normal date comparisons. Bad idea.
Second, your model is prone to collision. The same user at the same date could only post one message to a specific chat. The second message of the day would have exactly the same values and hence would violate the constraint of uniqueness. So you actually have to use the full ISODate, down to the millisecond.
And even then, there is a small chance of collision (say the date is generated on two application servers whose clocks are slightly off). There is a reason why ObjectId contains an additional counter.
Here is how I would model it
{
_id: new ObjectId(),
ChatId: "someChat",
UserId: "someUser"
}
Reason: the ObjectId contains a timestamp by which you can query (I do not know C# well enough to give an example, hence I will make this a wiki answer to give others the opportunity to do so), eliminates unneeded complexity and with indices on both ChatId and UserId it is fast enough.
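Since the answer leaves the C# side open, here is a minimal sketch of how a time boundary can be turned into an ObjectId-shaped value without any driver dependency. It relies only on the documented ObjectId layout (the first 4 bytes are the creation time in seconds since the Unix epoch, big-endian). The class and method names are made up for illustration.

```csharp
using System;

public static class ObjectIdBoundary
{
    // Returns the 24-character hex form of the smallest ObjectId whose
    // embedded timestamp equals the given instant (second precision).
    // The remaining 8 bytes are zero-filled so the value sorts before
    // every real ObjectId created in that second.
    public static string LowerBoundHex(DateTime utc)
    {
        var instant = new DateTimeOffset(DateTime.SpecifyKind(utc, DateTimeKind.Utc));
        long seconds = instant.ToUnixTimeSeconds();
        if (seconds < 0 || seconds > uint.MaxValue)
            throw new ArgumentOutOfRangeException(nameof(utc), "Outside the range an ObjectId timestamp can hold.");
        return ((uint)seconds).ToString("x8") + new string('0', 16);
    }
}
```

Assuming the MongoDB C# driver is in use, the resulting string could be passed to ObjectId.Parse and used as the bound of a range filter on _id together with filters on ChatId and UserId. Going the other way, the driver's ObjectId exposes a CreationTime property.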

Related

Generating IDs for Items in an Inventory System

So I am making an inventory system as a school project and I want to generate unique IDs for the items, so that each ID identifies one particular item.
Is it possible to generate an ID which is based on certain fields such as the item name, the price, the expiration.
Is there some existing library I can use for this?
EDIT: It is okay for my system to have duplicate ids because it will mean that the item already exists in the system and does not need to be added again.
Is it possible to generate an ID which is based on certain fields such as the item name, the price, the expiration.
Yes: a hash.
For example:
// Requires: using System; using System.Globalization; using System.Security.Cryptography; using System.Text;
// It's very important to use `InvariantCulture` and "o" to ensure consistent formatting on all computers
String infoText = item.Name + item.Price.ToString( "C2", CultureInfo.InvariantCulture ) + item.Expiration.ToString( "o" );
Byte[] infoBytes = Encoding.UTF8.GetBytes( infoText );
// SHA256.Create() returns the platform's implementation; SHA256Cng only exists on Windows/.NET Framework
using( SHA256 sha = SHA256.Create() )
{
    Byte[] hash = sha.ComputeHash( infoBytes );
    String hashText = Convert.ToBase64String( hash );
    Console.WriteLine( "{0} = {1}", infoText, hashText );
}
A hash (also known as a "digest") will always be the same for the same input, and for a cryptographic hash such as SHA-256 it will, for all practical purposes, be different for different input.
So if you have an SKU containing $10 worth of apples that expires on 2019-10-09, and you feed that in to the code above, then it will generate a unique code you can use without needing to store the mapping between that SKU and that unique code (also known as "content-based addressing").
...so if you come across another $10 worth of apples that also expire on 2019-10-09, then it will have the same unique code, even though it's a different object, and you didn't need to memorize that unique code you generated earlier.
But if you come across $10 worth of pears that expire on 2019-10-09, or $20 worth of apples that expire on the same day, they'll have a different code.
You could create the ID by concatenating all of the information together as a big string or even hashing that string.
However, this can still give you duplicates if the same combination of information occurs several times. There doesn't seem to be any need to use the existing information to generate the ID. If you base it on existing data, there will always be a chance of duplicates, unless there is some sort of constraint requiring one of the fields to be unique. But if a field is unique, you could just use it as the ID in the first place.
If you are storing this data in a database, I would suggest creating an ID field, making it the primary key and giving it an identity. This will automatically increment the ID so it will always be unique.
You could also just generate a GUID by using Guid.NewGuid() and use that as the ID.
Is it possible to generate an ID which is based on certain fields such as the item name, the price, the expiration.
Joining these three strings, or even hashing them, still leaves a chance of duplicates.
A simple question needs a simple solution: why not use a running number? Since you have an inventory system, the first item in it would be ID#1 (or you can start with any number you like).
But if you insist on using information from certain fields, I would suggest combining all of the following:
combination of fields information
timestamp
user id(person who perform the insert)
maybe your favorite colors
hashing all information above

Event Source, Anti-Corruption layer design with NEventStore

I have two legacy enterprise application that have a few similar features. I need to build a system that responds to data change events from those systems, processes that data and exposes the combined results through an API in multiple formats. I would like to use an Event Source/DDD style architecture but I'm not sure if it makes sense. Given the simplified model below, how could I design the system?
Note - The following has been edited to clarify the question:
Both systems have Products that have different prices depending on the date. When the price changes, each system can emit an event, PriceUpdated, that contains the identifier of the legacy system, that system's primary key, a date and the new price. The product ID is unique within a system, but may not be unique across both systems, so a system ID will also be included.
PriceUpdated {
int systemId,
int productId,
Date date,
Float amount
}
Within the bounded context of my new system there would be a service that receives this event and needs to look up my aggregate by systemId, productId, and date and then emit a command to update the corresponding price in the aggregate. My aggregate would be defined as:
class ProductPriceAggregate
{
    Guid Id;
    int systemId;
    int productId;
    Date date;
    Float amount;

    // Creation: copy the identifying fields from the command
    void Apply(CreateProductPriceCommand e)
    {
        Id = e.Id;
        systemId = e.systemId;
        productId = e.productId;
        date = e.date;
        RaiseEvent(new ProductPriceCreatedEvent(this));
    }

    // Price change: only the amount is updated
    void Apply(UpdateProductPriceCommand e)
    {
        amount = e.amount;
        RaiseEvent(new ProductPriceUpdatedEvent(this));
    }
}
If I am using NEventStore, which identifies streams by a GUID, then each aggregateId will be represented by a GUID. My service would need to query for the GUID using the systemId, productId and date to emit a command with the correct ID.
The service might look like this:
class PriceUpdateService : ISubscribeTo<PriceUpdated>
{
    void Handle(PriceUpdated e)
    {
        var aggregateId = RetrieveId(e.systemId, e.productId, e.date);
        if (aggregateId == null)
            Raise(new CreateProductPriceCommand(e));
        else
            Raise(new UpdateProductPriceCommand(aggregateId.Value, e.amount));
    }

    // Returns null when no aggregate exists yet for this key
    Guid? RetrieveId(int systemId, int productId, DateTime date)
    {
        // ???
    }
}
The question is: what is the best way to look up the aggregate's ID? The legacy systems emitting the PriceUpdated event will have no knowledge of this new system. Could I use a read model that is updated in response to ProductPriceCreatedEvent and contains enough information to query for the ID? Would I need another aggregate whose responsibility is to index ProductPrices? As VoiceOfUnreason posted in an answer, I could use a repeatable convention for generating the ID from systemId, productId and date. Is this the recommended option from a DDD perspective?
Do you control your own IDs?
PriceUpdated {
int systemId,
int productId,
Date date,
Float amount
}
An alternative to trying to lookup an aggregateId is to calculate what that aggregateId must be. The basic idea is that the different points that need to find an aggregate from this event share an instance of a Domain Service that encapsulates the calculation.
The signature looks like the query you wrote in your question
// Just a query, we aren't changing state anywhere.
var aggregateId = idGenerator.getId(e.systemId, e.productId, e.date);
Any given implementation takes its own salt and the arguments you pass to it, and generates a hash that is the common id used everywhere to map this combination of arguments to an aggregate.
You can, of course, produce identifiers for different aggregates using the same event data by passing in an idGenerator with a different salt.
For the particular case where your IDs are UUID, you can use a Name-Based UUID.
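As an illustration of the name-based UUID idea, here is a sketch of a version 5 (SHA-1) UUID generator following RFC 4122. .NET has no built-in API for this, so the code below is hand-rolled. The class name, the PriceNamespace GUID (which plays the role of the salt) and the "systemId/productId/date" name format are all assumptions made for the example.

```csharp
using System;
using System.Globalization;
using System.Security.Cryptography;
using System.Text;

public static class DeterministicId
{
    // Hypothetical namespace GUID acting as the salt; any fixed GUID works.
    public static readonly Guid PriceNamespace = new Guid("6f1a2b3c-4d5e-4f60-8a7b-9c0d1e2f3a4b");

    // RFC 4122 version 5: SHA-1 over (namespace bytes + name bytes).
    public static Guid Create(Guid namespaceId, string name)
    {
        byte[] ns = namespaceId.ToByteArray();
        SwapByteOrder(ns); // .NET layout -> network (big-endian) order
        byte[] nameBytes = Encoding.UTF8.GetBytes(name);

        byte[] input = new byte[ns.Length + nameBytes.Length];
        Buffer.BlockCopy(ns, 0, input, 0, ns.Length);
        Buffer.BlockCopy(nameBytes, 0, input, ns.Length, nameBytes.Length);

        byte[] hash;
        using (SHA1 sha1 = SHA1.Create())
        {
            hash = sha1.ComputeHash(input);
        }

        byte[] result = new byte[16];
        Array.Copy(hash, result, 16);
        result[6] = (byte)((result[6] & 0x0F) | 0x50); // version 5
        result[8] = (byte)((result[8] & 0x3F) | 0x80); // RFC 4122 variant
        SwapByteOrder(result); // network order -> .NET layout
        return new Guid(result);
    }

    // Same inputs always give the same aggregate id, so no lookup is needed.
    public static Guid ForPrice(int systemId, int productId, DateTime date)
    {
        string name = string.Format(CultureInfo.InvariantCulture,
            "{0}/{1}/{2:yyyy-MM-dd}", systemId, productId, date);
        return Create(PriceNamespace, name);
    }

    // Guid.ToByteArray() stores the first three fields little-endian.
    private static void SwapByteOrder(byte[] g)
    {
        Array.Reverse(g, 0, 4);
        Array.Reverse(g, 4, 2);
        Array.Reverse(g, 6, 2);
    }
}
```

With something like this, the service could call DeterministicId.ForPrice(e.systemId, e.productId, e.date) in place of a lookup, and a different namespace GUID would yield ids for a different kind of aggregate from the same event data.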

Timestamp data from DB2 is not accurate when using EntityFramework

I have data in IBM DB2 and what I am trying to do is to get that data using EntityFramework.
I have one column that has TIMESTAMP values in this format: dd.MM.yyyy hh:mm:ss.ffffff and it is primary key in that table.
When I am getting data from DB2 database time part of timestamp is lost.
On the other side I have entity that has string property for that Id:
public string Id { get; set; }
This is a method that returns a specific merchant. Only one is expected, because the timestamps (i.e. the Ids) are unique. But when the data gets pulled and the time part is lost, I get 9 merchants with the same Id (and, of course, an exception).
MerchantEntity IMerchantRepository.GetMerchantByRegistrationNumber(string companyRegistrationNumber)
{
var db = merchantDataContextProvider.GetMerchantContext();
var merchant = db.Merchants.Include(x => x.MerchantProperty).
SingleOrDefault(x => x.MerchantProperty.CompanyRegistrationNumber == companyRegistrationNumber);
return merchant;
}
I have tried to use the DateTime type for Id; the only difference was that it cut off the last 3 digits instead of the whole time part.
Has anyone had the same or a similar problem? How do I get accurate data when using DB2?
Timestamps as a primary key are not a great idea. If you do, however, want to use time as the basis for creating a merchant ID, you could do this:
var newId = string.Format("{0:D19}", DateTime.Now.Ticks);
which will give you a 19-digit number based on the current time. The problem, of course, is that if the clock on the PC resets, or you run the program on a different computer, in a different time zone, etc., this strategy will fail.
If you need something that uniquely identifies a row in a table, use GUIDs. The chance of two different processes generating the same one is very close to zero.

Generating a unique random number

I am entering the student id into the DB as a random number:
int num = r.Next(1000);
Session["number"] = "SN" + (" ") + num.ToString();
But is there any chance of getting a duplicate number? How can I avoid this?
EDIT: I have an identity column, and the student id is separate from that ID. I am going to enter a random student id into the DB from the UI.
It is a very common task to have a column in a DB that is merely an integer unique ID. So much so that every database I've ever worked with has a specific column type, function, etc. for dealing with it. It will vary based on whatever specific database you use, but you should figure out what that is and use it.
You need a value that is unique, not random. The two are different. Random numbers repeat; they aren't unique. Unique numbers also aren't necessarily random. For example, if you just increment numbers up from 0, they will be unique, but that's not in any way random.
You could use a GUID, which would be unique, but it would be 128 bits. That's pretty big. Most databases will just have a counter that they increment every time you add an item, so 32 bits is usually enough. This will save you a lot of space. Incrementing a counter is also quicker than calculating a GUID's new value. For DB operations that tend to involve adding lots of items, that could matter.
As Jodrell mentions in the comments, you should also consider the size of the index if you use a GUID or other large field. Storing and maintaining that index will be much more expensive (in both time and space) with a column that needs that many more bits.
If you try to do something yourself there's a good chance you'll do it wrong. Either your algorithm won't be entirely unique, it will have race conditions due to improper synchronization, it will be less performant because of excessive synchronization, it will be significantly larger because that's what it took to reduce the risk of collisions, etc. At the end of the day the database will have access to tools that you don't; let it take care of it so you don't need to worry about what you could mess up.
Sure, there is a very good chance that you will get a duplicate number. Next(1000) just gives you a number from 0 to 999, and there is no guarantee that it is not a number that Next has already returned in the past.
If you are trying to work with unique values, look into using Guids instead of integers, or use a constantly increasing integer value instead of a random number. Here is the reference page on Guid:
http://msdn.microsoft.com/en-us/library/system.guid.aspx
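To put a number on "a very good chance", the probability of at least one duplicate can be computed directly (this is the birthday problem). The sketch below assumes every draw is uniform over the range, which is what Random.Next(1000) aims for. The class and method names are made up for the example.

```csharp
using System;

public static class CollisionOdds
{
    // Probability that at least two of `draws` uniform random values
    // taken from `range` possible values are equal.
    public static double ProbabilityOfDuplicate(int draws, int range)
    {
        if (draws > range) return 1.0; // pigeonhole: a repeat is certain

        double allDistinct = 1.0;
        for (int i = 0; i < draws; i++)
        {
            // The (i+1)-th draw must avoid the i values already taken.
            allDistinct *= (double)(range - i) / range;
        }
        return 1.0 - allDistinct;
    }
}
```

Calling CollisionOdds.ProbabilityOfDuplicate(38, 1000) already gives a value above one half, so with a range of 1000 a duplicate becomes more likely than not well before the table reaches even 40 students.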
You can use Guids instead of a random int; for practical purposes they are always going to be unique.
There is no way to guarantee an int is unique unless you check every one that already exists, and even then, as the comments say, you are guaranteed duplicates once you pass 1000 IDs.
EDIT:
I think Guids are best here because of the question. First, indexing the table is not going to take long at all: it is assumed that there are going to be fewer than 1000 students because of the range of the int, and 128 bits is fine in a table with fewer than 1000 rows.
Guids are a good thing to learn, even though they are not always the most efficient way.
Creating a unique Guid in C# has the benefit that you can keep using and displaying that id, as in the question, without another trip to the DB to figure out which unique id was assigned to the student.
Yes, you will get duplicates. If you want a truly unique item, you will need to use Guid. If you still want to use numbers, then you will need to keep track of the numbers you have already used, similar to identity column in database.
Yes, you will certainly get duplicates. You could use a GUID instead:
Guid g = Guid.NewGuid();
GUIDs are theoretically "Globally Unique".
You can try to generate id using Guid:
Session["number"] = "SN" + (" ") + Guid.NewGuid().ToString();
It will greatly decrease the chance of getting a duplicate id.
If you are using random numbers then no there is no way of avoiding it. There will always be a chance of a collision.
I think what you are probably looking for is an Identity column, or whatever the equivalent is for your database server.
In LINQ to SQL it is possible to declare such a column like this:
[Column ( IsPrimaryKey = true, IsDbGenerated = true )]
public int ID { get; set; }
I don't know if it helps you in ASP, but maybe it is a good hint...
Yes there is a chance of course.
Quick solution:
Check if it is a duplicate number first and try again until it is no longer a duplicate number.
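The check-and-retry idea can be sketched as follows. The in-memory HashSet is only a stand-in for "numbers already in the database", and the class name is made up. This version is not thread-safe, and with a real database a unique constraint on the column would still be needed to catch two requests picking the same number at the same time.

```csharp
using System;
using System.Collections.Generic;

public sealed class StudentNumberGenerator
{
    private readonly Random _random = new Random();
    // Stand-in for a lookup against the numbers already stored in the DB.
    private readonly HashSet<int> _used = new HashSet<int>();
    private readonly int _range;

    public StudentNumberGenerator(int range)
    {
        _range = range;
    }

    public string Next()
    {
        if (_used.Count >= _range)
            throw new InvalidOperationException("All numbers in the range are taken.");

        int candidate;
        do
        {
            candidate = _random.Next(_range);
        } while (!_used.Add(candidate)); // Add returns false for a duplicate, so retry

        return "SN " + candidate;
    }
}
```

Note that the retries get slower as the range fills up, which is one more reason the other answers point to an identity column or a Guid.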

The limit of Int32 for Identity Column

This is just a consideration for a site I am creating and for other big sites out there.
I am using an identity column to store the ID in some of my tables, and I have classes whose Id is declared as Int32 to hold the ID value retrieved from the database.
My worry is that as the site grows bigger, some tables that grow very quickly, e.g. QuestionComments, might exceed the Int32 limit in the future. So I changed my class to use long.
public class Question
{
public long QuestionID { get; set; }
...
}
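// Converting the database value to the .NET type
Question q = new Question();
// Convert.ToInt64 matches the long property; Convert.ToInt32 would throw once the value exceeds Int32.MaxValue
q.QuestionID = Convert.ToInt64(myDataRow["QuestionID"]);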
How true is my assumption? Would using a UniqueIdentifier be better? Are there other ways to address this?
Can you imagine billions of records being stored for any one entity? If so, you could switch to BigInt, which is otherwise known as Int64. Of course, once you start seeing many millions of records you need to start thinking about data partitioning and archiving to avoid serious performance issues. If you have an infrastructure team, you may want to let them know that you expect heavy usage and will need a serious maintenance plan. If you are the infrastructure team, then you had better hit the books!
If you really think it's reasonable that you'll have over 2 billion comments, then use BIGINT in SQL Server (Int64 in .Net). This requires 8 bytes of storage instead of 4 bytes for INT, however you can offset this for the first 2+billion values if you use data compression.
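For reference, the "2 billion" figure is Int32.MaxValue, and the conversion on the .NET side matters as much as the column type. A small sketch (the helper class and its method names are made up) of what happens at the boundary:

```csharp
using System;

public static class IdentityRange
{
    // True when a BIGINT value would still fit in a 32-bit int.
    public static bool FitsInInt32(long id)
    {
        return id >= int.MinValue && id <= int.MaxValue;
    }

    // Reads an id the way a data-access layer might, keeping the full 64 bits.
    public static long ReadId(object dbValue)
    {
        return Convert.ToInt64(dbValue);
    }

    // Convert.ToInt32 throws OverflowException for out-of-range values
    // instead of silently wrapping around.
    public static bool ConvertToInt32Overflows(long id)
    {
        try
        {
            Convert.ToInt32(id);
            return false;
        }
        catch (OverflowException)
        {
            return true;
        }
    }
}
```

So if the column is widened to BIGINT, every place that reads the id has to move from Convert.ToInt32 to Convert.ToInt64 as well, otherwise the first id above int.MaxValue fails at runtime.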
