Disconnected LINQ Updates: rowversion vs. datetime with trigger? - c#

We're using LINQ to SQL and WCF for a new middle tier, and we're using Data Transfer Objects for passing over the wire rather than using the actual LINQ classes. I'm going to be using one or the other of the methods outlined here - Linq Table Attach() based on timestamp or row version - in order to ensure that updates work correctly and that concurrency is handled correctly.
To save you folks some reading time, essentially you can either use a timestamp/rowversion column in your table or have a datetime column with a default and an update trigger - either way it gets you a column that gets a newly generated value each time an insert or update occurs, and that column is the one used by LINQ to check for concurrency.
My question is - which one is better? We already have datetime columns for "UpdatedWhen" in many of our tables (but not all - don't ask), but would be adding in the defaults and the triggers, or we could just add the rowversion (we'd have to use the timestamp syntax for now, since we're still supporting SQL2005 for a while) to each table - either way we're modifying the DB in order to make it work, so I'd like to know whether there's a performance difference or any other important difference to note between these two alternatives. I've tried searching the web and here on SO, but no luck so far. Thanks.
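For readers unfamiliar with the check being discussed, here is a minimal sketch of the idea in Python with SQLite. It is not the SQL that LINQ to SQL emits. The `customer` table, the integer `version` column (standing in for rowversion or a trigger-maintained datetime), and the `update_name` helper are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer ("
    "id INTEGER PRIMARY KEY, name TEXT, version INTEGER NOT NULL DEFAULT 1)"
)
conn.execute("INSERT INTO customer (id, name) VALUES (1, 'foo')")


def update_name(conn, customer_id, new_name, expected_version):
    """Update the row only if its version still matches what the client read.

    Returns True on success, False if another writer changed the row first.
    """
    cur = conn.execute(
        "UPDATE customer SET name = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_name, customer_id, expected_version),
    )
    return cur.rowcount == 1


# Two clients both read version 1, then try to write.
first = update_name(conn, 1, "bar", expected_version=1)   # matches, row updated
second = update_name(conn, 1, "baz", expected_version=1)  # stale version, no row updated
```

Whichever column type is chosen, the important property is the same: the value must change on every update, so that a stale copy no longer matches.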

I had to make a similar decision recently.
I tried the rowversion solution first.
The disadvantages that I found:
It is inconvenient to use in LINQ to SQL. I mapped the field to byte[], and the code does not look clean when you compare byte arrays.
Theoretically rowversion can roll over and start from 0 again, so a row with a higher rowversion is not necessarily the more recently modified row.
Rowversion is updated on any row update, which was not desirable in my case: I needed changes to some columns not to affect the row version. A trigger allows any level of flexibility.
As a result I used a datetime2 column with a default constraint and an update trigger that sets the value to sysutcdatetime().
This type has an accuracy of 100 nanoseconds (precision of 7 digits: 23:59:59.9999999).
Although it is possible, I have never seen the same value generated twice. In my case duplicates would not hurt anyway. If it were important to me, I would add a unique constraint and see whether it ever fails.
I used sysutcdatetime() because this value is not affected by daylight saving time.

I would lean towards using a timestamp column for concurrency checks. One: triggers would have some impact on performance. Two: with a datetime column you'll be limiting yourself to the precision of the datetime type in SQL and C#.
MSDN:
datetime values are rounded to increments of .000, .003, or .007 seconds...
You may want to look at SO: Is there any difference between DateTime in c# and DateTime in SQL server? and MSDN: datetime (Transact-SQL) for more info.

Related

Why is first insert winning over second one in Cassandra?

There are two data centers with 3 nodes each. I'm doing two simple inserts (very fast back to back) to the same table with a consistency level of local quorum. The table has one partitioning key and no clustering columns.
Sometimes the first insert wins over the second one. The data produced by the first insert statement is what gets saved in the database even though I do an insert right after that.
C# Code
var statement = new SimpleStatement("INSERT INTO customer (id, name) VALUES (1, 'foo')");
statement.SetConsistencyLevel(ConsistencyLevel.LocalQuorum);
session.Execute(statement);
Set the timestamp on the client. In most newer drivers this is done automatically to better preserve ordering. However, with older drivers or Cassandra versions before 2.1 it's not supported and needs to be in the query. I don't know which driver or version you are using, but you can also put it in the CQL. It's supported at the protocol level though, so the driver should have a better mechanism.
Something like: var statement = "INSERT INTO customer (id,name) VALUES (1, 'foo') USING TIMESTAMP {microsecond timestamp}";
The best approach is to use a monotonic timestamp so that each call is always higher than the last (i.e. use current milliseconds and add a counter). I don't know C# well enough to tell you how best to approach that. Look at https://docs.datastax.com/en/developer/csharp-driver/3.3/features/query-timestamps/#using-a-timestamp-generator
If you don't set a timestamp on the mutation, the coordinator will assign one after it parses the query. Since networks and netty queues can do funny things, order is not a sure thing, especially as the requests end up on different nodes that may have some clock drift.
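To illustrate the monotonic timestamp idea, here is a rough sketch in Python rather than C#. It is not the driver's built-in generator. The class name `MonotonicTimestampGenerator` is made up, and it only guarantees ordering within a single process.

```python
import threading
import time


class MonotonicTimestampGenerator:
    """Hands out microsecond timestamps that strictly increase within this process."""

    def __init__(self):
        self._lock = threading.Lock()
        self._last = 0

    def next(self) -> int:
        now = time.time_ns() // 1_000  # wall clock in microseconds
        with self._lock:
            # If the clock has not advanced (or went backwards), bump by one.
            self._last = max(now, self._last + 1)
            return self._last


generator = MonotonicTimestampGenerator()

# Two back-to-back writes get distinct, ordered timestamps.
first_cql = (
    "INSERT INTO customer (id, name) VALUES (1, 'foo') "
    f"USING TIMESTAMP {generator.next()}"
)
second_cql = (
    "INSERT INTO customer (id, name) VALUES (1, 'bar') "
    f"USING TIMESTAMP {generator.next()}"
)
```

With several client machines writing to the same row, this does not help by itself: their clocks can still disagree with each other.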

EF: Is it OK to use default value for the created date property?

I want to set the CreatedDate column to the current datetime whenever a new row is inserted, so I made it like this:
public DateTime CreateDate { get; set; } = DateTime.Now;
I tested it and it works just fine. Is this a practical approach? I saw many articles achieve the same with so much code and using fluent API, no one mentioned this simple method?
That depends on what you want to accomplish.
I saw many articles achieve the same with so much code and using fluent API,
Most of those articles will be about adding default values at the database level. You're not doing that. If you insert a row into your table using plain SQL, and don't specify a value for CreatedDate, you'll get an error.
With what you're doing, CreatedDate always needs to be specified in SQL when inserting. But Entity Framework will always specify it in the SQL when inserting, and completely ignore any default value set at the database level.
So if that's what you want -- the default value only gets applied when creating objects through C# -- then what you're doing is totally fine. It may also be written as setting the value from inside the class's constructor.
@Alex Kozlowski raises a good point in the comments though, which is that DateTime.Now may not be the value you expect to be inserted. It depends on which system is running the code. The time zone may be different from your server's, or the clock may be out of sync.
Yes. Auto-property initializers were added in C# 6, so this works from C# 6 onwards only. That's why you won't find it in many older articles.

Concurrency issue

We are creating a client-server application using WPF/C# with SQL. We generate a unique number by checking the DB for the last maximum number, incrementing that value by 1, and storing the result in the DB. Meanwhile another user may be working on the same screen and also creating unique numbers; in some cases the unique numbers get duplicated and an exception is thrown.
We found this is a concurrency issue.
Indeed, fetching a number out, adding one, and hoping it still isn't in use is a thread-race and a race between multiple clients - and should be avoided.
Options:
use an IDENTITY column in the database, and let the database generate the value itself during INSERT; the database server knows how to do this safely and reliably
if that isn't possible, you might want to delay this code until you are ready to INSERT so it is all part of a single database operation - and even then, if it isn't in a "serializable transaction" (with key-range read locks, etc), then you would have to loop on "get the max, increment, try to insert but note that we might have lost a race, so only insert if the value doesn't exist - which it might; repeat from start if unsuccessful"
alternatively, you could create the new record when you first need the number (even though the rest of the data isn't available), noting that you might still need the "loop until successful" approach
Frankly, the IDENTITY column approach is the simplest.
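If IDENTITY really is not an option, the "get the max, increment, try to insert, repeat if we lost the race" loop could look roughly like this. It is a sketch in Python with SQLite, not production code. The `ticket` table, its columns, and the `insert_with_next_number` helper are invented for illustration. It relies on a UNIQUE constraint to detect the lost race.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ticket (number INTEGER NOT NULL UNIQUE, payload TEXT)")


def insert_with_next_number(conn, payload, max_attempts=10):
    """Allocate MAX(number) + 1 and insert; retry if another writer took it first."""
    for _ in range(max_attempts):
        (current_max,) = conn.execute(
            "SELECT COALESCE(MAX(number), 0) FROM ticket"
        ).fetchone()
        candidate = current_max + 1
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO ticket (number, payload) VALUES (?, ?)",
                    (candidate, payload),
                )
            return candidate
        except sqlite3.IntegrityError:
            # Someone else inserted the same number in between: start over.
            continue
    raise RuntimeError("could not allocate a unique number")


numbers = [insert_with_next_number(conn, f"order {i}") for i in range(3)]
```

The UNIQUE constraint is what makes this safe; without it the duplicate would be stored silently instead of triggering a retry.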
Finally, we followed the Singleton pattern with a lock to resolve this issue.
Thanks.

Best way to handle Date Ordering to reach better performance?

I have large tables that contain DateTime columns storing the exact time of some events and actions in my application. In some cases, users can enter the date of an event.
I want to validate event sequences, find events by the time they happened, and similar things.
If I order events by DateTime, it is time-consuming on large data. If I order by Id, there's no guarantee that the users' data entry is ordered; also, users are not responsible for determining the sequence (they just enter the date and time). I would prefer to order by a numeric field instead of DateTime.
What do you suggest?
Ordering by DateTime columns should not be slow, even on large data, provided your database has that column indexed.
I would, personally, do the ordering directly on the DateTime (with an index on the db), but make sure that your LINQ queries limit the results to the appropriate date window.
Keep in mind that a datetime is typically stored as an integer count of ticks since a fixed point in time (the epoch varies by system: Unix time counts from Jan. 1st 1970, while .NET's DateTime counts from Jan. 1st 0001). Comparing two datetimes is therefore essentially an integer comparison; it doesn't need to compare year, month, day, etc. That is, unless you're storing the date as a string and not as a datetime.
My guess is that your database is internally storing the data sorted by ID, and that the ID is also indexed, which is why that's so quick. Your problem isn't sorting on a datetime; it's simply sorting on a non-ID column. As Reed suggested, you probably just need to index the column. It's also possible that you're doing something, somewhere, in a way that you shouldn't. It's hard to say what that might be without seeing the code, the DB configuration, etc.
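As a small illustration of the indexing advice, here is a sketch in Python with SQLite. The `events` table, the `idx_events_happened_at` index, and the sample rows are all made up. It filters to a date window, orders by the date column, and asks the engine for its query plan so you can check that the index is used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, happened_at TEXT NOT NULL)"
)
conn.execute("CREATE INDEX idx_events_happened_at ON events (happened_at)")

# Rows entered out of chronological order, as users might do.
conn.executemany(
    "INSERT INTO events (happened_at) VALUES (?)",
    [
        ("2024-01-03T08:00:00",),
        ("2024-01-02T09:30:00",),
        ("2024-01-01T12:00:00",),
        ("2024-01-02T07:15:00",),
    ],
)

query = (
    "SELECT id, happened_at FROM events "
    "WHERE happened_at >= ? AND happened_at < ? "
    "ORDER BY happened_at"
)
window = ("2024-01-02T00:00:00", "2024-01-03T00:00:00")

# The last column of each EXPLAIN QUERY PLAN row describes the access path.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query, window)]
rows = conn.execute(query, window).fetchall()
```

Other databases expose the same information through their own tools (for example, execution plans in SQL Server), and the exact plan text differs between engines.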

How to generate a transaction number?

I was thinking of formatting it like this
TYYYYMMDDNNNNNNNNNNX
(1 character + 19 digits)
Where
T is type
YYYY is year
MM is month
DD is day
N is sequential number
X is check digit
The problem is, how do I generate the sequential number? My primary key is not an auto-increment integer value; if it were, I would use that.
EDIT: Can I have the sequential number reset itself after 1 day (24 hours)?
P201012080000000001X <-- first transaction of 2010/12/08
P201012080000000002X <-- second transaction of 2010/12/08
P201012090000000001X <-- first transaction of 2010/12/09
(X is the check digit)
The question is meaningless without a context. Others have commented on your question. Please answer the comments. What is the "transaction number" for; where is it used; what is the "transaction" that you need an external identifier for.
Identity or auto-increment columns may have some use internally, but they are quite useless outside the database.
If we had the full schema, knowing which components are PKs that will not change, etc, we could provide a more meaningful answer.
At first glance, without the info requested, I see no point in recording date in the "transaction" (the date is already stored in the transaction row)
You seem to have the formula for your transaction number, the only question you really have is how to generate a sequence number that resets each day.
You can consider the following options:
Use a database sequence and a scheduled job that resets it.
Use a sequence from outside the database (for instance, a file or memory structure).
With the proper isolation level, you should be able to include the (SELECT (MAX(Seq) + 1) FROM Table WHERE DateCol = CURRENT_DATE) as a value expression in your INSERT statement.
Also note that there's probably no real reason to actually store the transaction number in the database as it's easy to derive it from the information it encodes. All you need to store is the sequential number.
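To show how the number can be derived from the stored pieces rather than stored itself, here is a sketch in Python. The question does not say which check digit scheme is used, so the Luhn algorithm here is purely an assumption. The helper names `luhn_check_digit` and `build_transaction_number` are hypothetical.

```python
from datetime import date


def luhn_check_digit(payload: str) -> int:
    """Compute a Luhn check digit for a string of decimal digits."""
    total = 0
    for index, char in enumerate(reversed(payload)):
        digit = int(char)
        if index % 2 == 0:  # double every second digit, starting from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return (10 - total % 10) % 10


def build_transaction_number(type_code: str, day: date, sequence: int) -> str:
    """Format T + YYYYMMDD + 10-digit sequence + check digit."""
    if len(type_code) != 1:
        raise ValueError("type code must be a single character")
    if not 0 < sequence < 10**10:
        raise ValueError("sequence must fit in 10 digits")
    digits = f"{day:%Y%m%d}{sequence:010d}"
    return f"{type_code}{digits}{luhn_check_digit(digits)}"


# Only the type, date and per-day sequence need to be stored.
number = build_transaction_number("P", date(2010, 12, 8), 1)
```

The per-day sequence itself still has to come from one of the options listed above; this only covers the formatting step.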
You can track the auto-incs separately.
Or, as you get ready to add a new transaction, first poll the DB for the newest transaction, break that apart to find the number, and increase it.
Or add an auto-inc field, but don't use it as a key.
You can use a UUID generator so that you don't have to worry about a sequence, and collisions between transactions become extremely unlikely.
eg :
in java :
java.util.UUID.randomUUID()
05f4c168-083a-4107-84ef-10346fad6f58
5fb202f1-5d2a-4d59-bbeb-5bcabd513520
31836df6-d4ee-457b-a47a-d491d5960530
3aaaa3c2-c1a0-4978-9ca8-be1c7a0798cf
in php :
echo uniqid()
4d00fe31232b6
4d00fe4eeefc2
4d00fe575c262
There is a UUID generator in nearly all languages.
A primary key that big is a very, very bad idea. You will waste huge amounts of table space unnecessarily and make your table very slow to query and manage. Make your primary key a small, simple incrementing int and store the transaction date in a separate field. When necessary in a query you can select a transaction number for that day with:
SELECT ROW_NUMBER() OVER (PARTITION BY TxnDate ORDER BY TxnID), TxnDate, ...
Please read this regarding good primary key selection criteria. http://www.sqlskills.com/BLOGS/KIMBERLY/category/Indexes.aspx
