I have the following scenario:
A user can create orders having a given amount (e.g.: 500$)
There is a limit for the total orders' amount that can be added for a single day (e.g.: max 2000$/day)
At the moment, when creating a new order, this requirement is implemented as follows:
var newOrder = /* logic for creating the new order */;
var orders = _ordersRepository.GetAllBy(userId, date); // get the orders from the db
var totalAmount = orders.Sum(o => o.Amount);
if(totalAmount < MaximumAmount) {
newOrder.IsApproved = true;
}
else {
newOrder.IsApproved = false;
}
_ordersRepository.Add(newOrder);
_ordersRepository.SaveChanges(); // insert into the db
The problem with this approach is that it doesn't handle concurrent insertions properly in scenarios like:
Maximum orders limit: 2000$
In a single second, a user sends 10 requests for creating 10 new orders of 500$ each
The requests are handled concurrently and, because of the short timeframe, the currently implemented check is executed before the new orders are saved in the database and therefore allows creating all of them. In the end, the user ends up exceeding the maximum limit.
How could I solve this issue, ideally without having to call SaveChanges multiple times? I'm using Entity Framework Core 5 and SQL Server.
Thank you for the suggestions from the comments!
I've tried using transactions with the IsolationLevel set to Serializable, but then I've realized this would lock too many tables (the example from the question is a dummy one, the actual implementation is more complex).
I agree that sometimes it might be easier to have this kind of logic in the database, but adding a stored procedure for this will kind of break the current consistency and most probably leave the door open for other stored procedures. I'm not saying stored procedures are bad, just that in my current situation, even if it is a bit harder/more complex to achieve this without them, I believe it's worth it for consistency reasons.
The solution I've ended up with
I ended up splitting the flow into 2 steps as follows:
// step 1
var newOrder = /* logic for creating the new order */;
_ordersRepository.Add(newOrder);
_ordersRepository.SaveChanges(); // insert into the db
// step 2
var orders = _ordersRepository.GetAllBy(userId, date); // get the orders from the db
var totalAmount = orders.Sum(o => o.Amount);
if(totalAmount < MaximumAmount) {
newOrder.IsApproved = true;
}
_ordersRepository.Update(newOrder);
_ordersRepository.SaveChanges(); // update the new order
Step 1 just creates the new order and inserts it into the database, the IsApproved flag being left to the default which is false.
Step 2 performs the daily limit validation and, if the check passes, sets the IsApproved flag to true.
I know it's not an actual solution, but a workaround. Locking a table might have too big a performance impact, especially if the given table is used by multiple app features. With this solution, even if there is an issue in Step 2, the order is left with IsApproved=false, so it won't have any impact and the user can either try again later, or somebody from support can handle it.
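Why the two-step flow cannot over-approve can be checked with a small in-memory sketch. It is not the real repository code: SketchOrder, TwoStepSketch and the lock that stands in for the database are all made up for illustration. Each simulated request inserts first and only then sums the day's orders, so whichever approved order ran its check last has already seen every other approved order in the total.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for the order entity.
public sealed class SketchOrder
{
    public decimal Amount;
    public bool IsApproved; // defaults to false, as in step 1
}

public static class TwoStepSketch
{
    public const decimal MaximumAmount = 2000m;

    // Simulates `requests` concurrent calls that each create an order of `amount`
    // and returns the total amount that ended up approved.
    public static decimal Run(int requests, decimal amount)
    {
        var table = new List<SketchOrder>(); // stands in for the orders table
        var gate = new object();             // stands in for the database executing one statement at a time

        Parallel.For(0, requests, _ =>
        {
            var order = new SketchOrder { Amount = amount };

            // Step 1: insert the order, not yet approved.
            lock (gate) { table.Add(order); }

            // Step 2: read the day's total, which now includes this order.
            decimal total;
            lock (gate) { total = table.Sum(o => o.Amount); }

            // Same comparison as in the snippet above.
            if (total < MaximumAmount) order.IsApproved = true;
        });

        return table.Where(o => o.IsApproved).Sum(o => o.Amount);
    }
}
```

Calling TwoStepSketch.Run(10, 500m) returns a different value from run to run, depending on how the requests interleave, but it stays below MaximumAmount. The cost is that some orders that would have fit can be left unapproved when many requests arrive at once.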
Related
I have a method that needs to "claim" a payment number to ensure it is available at a later time. I cannot just get a new payment number when ready to commit to the database, as the number is added to a signed token, and then the payment number is taken from the signed token later on when committing to the database to allow the token to be linked to the payment afterwards.
Payment numbers are sequential and the current method used in existing code is:
Create a Payment
Get the last payment number from the database
Increment the payment number
Use this payment number for the Payment
Update the database with the incremented payment number
In my service I am trying to prevent the following race-condition:
My service reads the payment number (eg. 100)
Another service uses and updates the payment number (now 101)
My service increments the number locally (to 101) and updates the database (still 101)
This would produce two payments with a payment number of 100.
Here is my implementation so far, in my Transaction class:
private DbSet<PaymentIdentifier> paymentIdentifier;
//...
private int ClaimNextPaymentNumber()
{
int nextPaymentNumber = -1;
using(var dbTransaction = db.Database.BeginTransaction())
{
int lastPaymentNumber = paymentIdentifier.ElementAt(0).Identifier;
nextPaymentNumber = lastPaymentNumber + 1;
paymentIdentifier.ElementAt(0).Identifier = nextPaymentNumber;
db.SaveChanges();
dbTransaction.Commit();
}
return nextPaymentNumber;
}
The PaymentIdentifier table has a single row and a single column "Identifier" (hence the .ElementAt(0)). I am unable to change the database structure as there is lots of legacy code relying on it that is very brittle.
Will having the code wrapped in a transaction (as I have done) protect against the race condition, or is there some Entity Framework / PostgreSQL idiosyncrasies I need to deal with to protect the identifier from being read whilst performing the transaction?
Thank you!
(As a side point, I believe lots of legacy code in the other software connecting to the database simply ignores the race condition and relies on it being "very fast")
It helps you with the race condition only if all code, including legacy code, uses this method. If there is still code that continues to increment on the client side without a transaction, you'll get the same problem. Just swap 'My service' and 'Another service' in your description.
1. Another service reads the payment number (eg. 100) **without** transaction
2. My service uses and updates the payment number (now 101) **with** transaction
3. Another service increments the number locally (to 101) and updates the database (still 101) **without** transaction
Note that you can replace your code with simpler one by executing this query without explicit transaction.
update PaymentIdentifier set Identifier = Identifier + 1 returning Identifier;
But again, it will not solve your concurrency problem until you replace all places where the Identifier is incremented. If you can change that, you would better use SEQUENCE or Generators that will safely provide you with incremental Ids.
A transaction does not automatically lock your table. A transaction just ensures that multiple changes to the database are applied all together or not at all (see the A (atomic) in ACID). What you want is for only one session at a time to read the value, add one and update it, and only after that is done should the next session be allowed to do the same.
So you now have different possibilities:
Use a sequence. You can get the next value with, for example, SELECT nextval('mysequencename'). If two sessions try to get a value at the same time, they will get two different values.
If you have more complex needs and want to store every "token" with additional data in a table, so that every token is a row with additional columns, you could use table locking. With this you can restrict access so that only one session is allowed to access the table at a time. But make sure you hold locks for as short a time as possible, because this will become your performance bottleneck.
The database prevents the race condition by throwing a concurrency violation error in this case. So, I looked at how this is handled in the legacy code (following the suggestion by @sergey-l) and it uses a simple retry mechanism. So, I did the same:
private int ClaimNextPaymentNumber()
{
DbContextTransaction dbTransaction;
bool failed;
int paymentNumber = -1;
do
{
failed = false;
using(dbTransaction = db.Database.BeginTransaction())
{
try
{
paymentNumber = TryToClaimNextPaymentNumber();
}
catch(DbUpdateConcurrencyException ex)
{
failed = true;
ResetForClaimPaymentNumberRetry(ex);
}
dbTransaction.Commit();
concurrencyExceptionRetryCount = 0;
}
}
while(failed);
return paymentNumber;
}
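The retry idea can be illustrated without a database. In the sketch below, which is an analogy and not the code used against PostgreSQL, Interlocked.CompareExchange stands in for an update that only succeeds when the stored identifier still has the value that was read; a failed exchange plays the role of the concurrency exception and simply triggers another attempt. PaymentNumberSketch and its members are hypothetical names.

```csharp
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class PaymentNumberSketch
{
    // Stands in for the single-row, single-column PaymentIdentifier table.
    private static int identifier = 100;

    public static int ClaimNext()
    {
        while (true)
        {
            int last = Volatile.Read(ref identifier); // read the last payment number
            int next = last + 1;                      // increment locally

            // Write back only if nobody changed the value since it was read.
            // If another claimant got there first, loop and retry.
            if (Interlocked.CompareExchange(ref identifier, next, last) == last)
                return next;
        }
    }

    // Claims `count` numbers from parallel callers and returns everything claimed.
    public static int[] ClaimMany(int count)
    {
        var claimed = new ConcurrentBag<int>();
        Parallel.For(0, count, _ => claimed.Add(ClaimNext()));
        return claimed.ToArray();
    }
}
```

Every number returned by ClaimMany is distinct, because a claimant whose read went stale never writes. As in the thread above, this only holds when every writer goes through the same conditional update.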
What would be the most effective way to open/use a SQL Server connection if we're reading rows to be deleted in batches?
foreach(IEnumerable<Log> logsPage in LogsPages)
{
foreach(Log logEntry in logsPage)
{
// 1. get associated filenames
// 2. delete row
// 3. try delete each file
}
}
Log page size is about 5000 rows
Files associated with the log entries may vary in size. I don't think they are larger than say 500 Mb.
We use Dapper
Should we let Dapper open connections on each step of the foreach loop? I suppose SQL Server connection pooling takes place here?
Or should we open an explicit connection per batch?
If you're performing multiple database operations in a tight loop, it would usually be preferable to open the connection for the duration of all the operations. Returning the connection to the pool can be beneficial in contested systems where there can be an indeterminate interval before the next database operation, but if you're doing lots of sequential operations, constantly fetching and returning connections from the pool (and executing sp_reset_connection, which happens behind the scenes) adds overhead for no good reason.
So to be explicit, I'd have the Open[Async]() here above the first foreach.
Note: for batching, you might find that there are ways of doing this with fewer round-trips, in particular making use of the IN re-writing in Dapper based on the ids. Since you mention SQL-Server, this can be combined with setting a SqlMapper.Settings.InListStringSplitCount to something positive (5, 10, etc are reasonable choices; note that this is a global setting); for example, for a simple scenario:
connection.Execute("delete from Foo where Id in @ids",
new { ids = rows.Select(x => x.Id) });
is much more efficient than:
foreach (var row in rows)
{
connection.Execute("delete from Foo where Id = @id",
new { id = row.Id });
}
Without InListStringSplitCount, the first version will be re-written as something like:
delete from Foo where Id in (@ids0, @ids1, @ids2, ..., @idsN)
With InListStringSplitCount, the first version will be re-written as something like:
delete from Foo where Id in (select cast([value] as int) from string_split(@ids,','))
which allows the exact same query to be used many times, which is good for query-plan re-use.
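One practical detail when the page size is around 5000 rows: without InListStringSplitCount, every id becomes its own parameter, and SQL Server accepts at most 2100 parameters per command. A minimal sketch of a helper that cuts the ids into smaller batches is shown below; IdBatching, Batches and the batch size of 2000 are illustrative choices, not part of Dapper.

```csharp
using System;
using System.Collections.Generic;

public static class IdBatching
{
    // Splits `source` into consecutive lists of at most `size` items.
    public static IEnumerable<List<T>> Batches<T>(IEnumerable<T> source, int size)
    {
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));

        var batch = new List<T>(size);
        foreach (var item in source)
        {
            batch.Add(item);
            if (batch.Count == size)
            {
                yield return batch;
                batch = new List<T>(size); // start a fresh list for the next batch
            }
        }

        // Emit the final, possibly shorter, batch.
        if (batch.Count > 0) yield return batch;
    }
}

// Possible use with the delete shown above (assumes an open connection and System.Linq):
// foreach (var batch in IdBatching.Batches(rows.Select(x => x.Id), 2000))
//     connection.Execute("delete from Foo where Id in @ids", new { ids = batch });
```

With 5000 ids and a batch size of 2000 this yields three batches, so three round-trips instead of 5000 single-row deletes.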
"Cassandra: The Definitive Guide, 2nd Edition" says:
Cassandra’s batches are a good fit for use cases such as making
multiple updates to a single partition, or keeping multiple tables in
sync. A good example is making modifications to denormalized tables
that store the same data for different access patterns.
The last statement above applies to the following attempt, where all the Save... are insert statements for different tables
var bLogged = new BatchStatement();
var now = DateTimeOffset.UtcNow;
var uuidNow = TimeUuid.NewId(now);
bLogged.Add(SaveMods.Bind(id, uuidNow, data1)); // 1
bLogged.Add(SaveMoreMods.Bind(id, uuidNow, data2)); // 2
bLogged.Add(SaveActivity.Bind(now.ToString("yyyy-MM-dd"), id, now)); // 3
await GetSession().ExecuteAsync(bLogged);
We'll focus on statements 1 and 2 (the 3rd one is just to signify there's one more statement in the batch).
Statement 1 writes to table1 partitioned by id with uuidNow being a clustering key desc.
Statement 2 writes to table2 partitioned by id only, so it's the tip of the table1 for the same id.
More times than I'd like the two tables get out of sync in the sense that table2 does not have the tip of the table1. It would be one or two mods behind within a few milliseconds.
While looking for resolution most on the web advise against all batches, which prompted my solution eliminating all mismatches:
await Task.WhenAll(
GetSession().ExecuteAsync(SaveMods.Bind(id, uuidNow, data1)),
GetSession().ExecuteAsync(SaveMoreMods.Bind(id, uuidNow, data2)),
GetSession().ExecuteAsync(SaveActivity.Bind(now.ToString("yyyy-MM-dd"), id, now))
);
The question is: what are batches good for, just the first statement in the quote? In that case how do I ensure modifications to different tables are in sync?
Using higher consistency (ie quorum) on reads/writes may help but there is always a possibility for inconsistencies between the table/partitions.
Batch statements will try to ensure that all the mutations in the batch will all happen or not. It does not guarantee that all the mutations will occur in an instant (no isolation, you can do a read where first mutation has been applied but others haven't). Also, batch statements will not provide a consistent view of all the data across all the nodes. For linearizable consistency you should consider using paxos (lightweight transactions) for conditional updates and trying to limit things that require the linearizability into a single partition.
I can't believe it is so hard to get someone to show me a simple working example. It leads me to believe that everyone can only talk like they know how to do it but in reality they don't.
I shortened the post down to only what I want the example to do. Maybe the post was getting too long and scared people away.
To get this bounty I am looking for a WORKING EXAMPLE that I can copy in VS 2010 and run.
What the example needs to do.
Show what datatype should be in my domain for version as a timestamp in mssql 2008
Show nhibernate automatically throwing the "StaleObjectException"
Show me working examples of these 3 scenarios
Scenario 1
User A comes to the site and edits Row1. User B comes(note he can see Row1) and clicks to edit Row1, UserB should be denied from editing the row until User A is finished.
Scenario 2
User A comes to the site and edits Row1. User B comes 30mins later and clicks to edit Row1. User B should be able to edit this row and save. This is because User A took too long to edit the row and lost his right to edit.
Scenario 3
User A comes back from being away. He clicks the update row button and he should be greeted with StaleObjectException.
I am using asp.net mvc and fluent nhibernate. Looking for the example to be done in these.
What I tried
I tried to build my own but I can't get it to throw the StaleObjectException, nor can I get the version number to increment. I tried opening 2 separate browsers and loaded up the index page. Both browsers showed the same version number.
public class Default1Controller : Controller
{
//
// GET: /Default1/
public ActionResult Index()
{
var sessionFactory = CreateSessionFactory();
using (var session = sessionFactory.OpenSession())
{
using (var transaction = session.BeginTransaction())
{
var firstRecord = session.Query<TableA>().FirstOrDefault();
transaction.Commit();
return View(firstRecord);
}
}
}
public ActionResult Save()
{
var sessionFactory = CreateSessionFactory();
using (var session = sessionFactory.OpenSession())
{
using (var transaction = session.BeginTransaction())
{
var firstRecord = session.Query<TableA>().FirstOrDefault();
firstRecord.Name = "test2";
transaction.Commit();
return View();
}
}
}
private static ISessionFactory CreateSessionFactory()
{
return Fluently.Configure()
.Database(MsSqlConfiguration.MsSql2008
.ConnectionString(c => c.FromConnectionStringWithKey("Test")))
.Mappings(m => m.FluentMappings.AddFromAssemblyOf<TableA>())
// .ExposeConfiguration(BuidSchema)
.BuildSessionFactory();
}
private static void BuidSchema(NHibernate.Cfg.Configuration config)
{
new NHibernate.Tool.hbm2ddl.SchemaExport(config).Create(false, true);
}
}
public class TableA
{
public virtual Guid Id { get; set; }
public virtual string Name { get; set; }
// Not sure what data type this should be for timestamp.
// To eliminate changing too much started with int version
// but want in the end timestamp.
public virtual int Version { get; set; }
}
public class TableAMapping : ClassMap<TableA>
{
public TableAMapping()
{
Id(x => x.Id);
Map(x => x.Name);
Version(x => x.Version);
}
}
Will nhibernate stop the row from being retrieved?
No. Locks are only placed for the extent of a transaction, which in a web application ends when the request ends. Also, the default type of transaction isolation mode is Read committed which means that read locks are released as soon as the select statement terminates. If you are reading and making edits in the same request and transaction, you could place a read and write lock on the row at hand which would prevent other transactions from writing to or reading from that row. However, this type of concurrency control doesn't work well in a web application.
Or would the User B be able to still see the row but if he tried to save it would crash?
This would happen if optimistic concurrency was being used. In NHibernate, optimistic concurrency works by adding a version field. Save/update commands are issued with the version upon which the update was based. If that differs from the version in the database table, no rows are updated and NHibernate will throw.
What happens if User A say cancels and does not edit. Do I have to
release the lock myself or is there a timeout can be set to release
the lock?
No, the lock is released at the end of the request.
Overall, your best bet is to opt for optimistic concurrency with version fields managed by NHibernate.
How does it look in code? Do I setup in my fluent nhibernate to
generate a timestamp(not sure if I would timespan datatype).
I would suggest using a version column. If you're using FluentNhibernate with auto mappings, then if you make a column called Version of type int/long it will use that to version by default, alternatively you can use the Version() method in the mapping to do so (it's similar for timestamp).
So now I generated somehow the timestamp and the user is editing a
row(through a gui). Should I be storing the timestamp in memory or
something? Then when the user submits call from memory the timestamp
and id of the row and check?
When the user starts editing a row, you retrieve it and store the current version (the value of the version property). I would recommend putting the current version in a hidden field in the form. When the user saves his changes, you can either do a manual check against the version in the database (check that it's the same as the version in the hidden field), or you can set the version property to the value from the hidden field (if you are using databinding, you could do this automatically). If you set the version property, then when you try to save the entity, NHibernate will check that the version you're saving matches the version in the database, and throws an exception if it doesn't.
NHibernate will issue an update query something like:
UPDATE xyz
SET <changed columns>,
Version = 16
WHERE Id = 1234 AND Version = 15
(assuming your version was 15) - in the process it will also increment the version field
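The version check in that UPDATE can be mimicked in memory to make the behaviour concrete. This is only a sketch of the idea, not NHibernate's implementation: VersionedRow, OptimisticSketch and StaleRowException are hypothetical names, with StaleRowException playing the part of the exception NHibernate throws when no row matches.

```csharp
using System;

// Hypothetical stand-in for a row with a version column.
public sealed class VersionedRow
{
    public string Name = "original";
    public int Version = 15;
}

public sealed class StaleRowException : Exception
{
    public StaleRowException(string message) : base(message) { }
}

public static class OptimisticSketch
{
    private static readonly object Gate = new object();

    // Mimics: UPDATE ... SET Name = newName, Version = expected + 1
    //         WHERE Id = ... AND Version = expected
    public static void Update(VersionedRow row, string newName, int expectedVersion)
    {
        lock (Gate) // the database applies the UPDATE atomically
        {
            // Zero rows would match the WHERE clause: someone saved in between.
            if (row.Version != expectedVersion)
                throw new StaleRowException("Row was changed by someone else.");

            row.Name = newName;
            row.Version = expectedVersion + 1;
        }
    }
}
```

If two users both load version 15, the first call to Update with expectedVersion 15 succeeds and moves the row to version 16; the second call with the same expected version throws, which is the point where the application would show a "someone else changed this row" message.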
If so that means the business logic is keeping track of the "row
locking" but in theory someone still could just go Where(x => x.Id ==
id) and grab that row and update at will.
If someone else updates the row via NHibernate, it will increment the version automatically, so when your user tries to save it with the wrong version you will get an exception which you need to decide how to handle (ie. try show some merge screen, or tell the user to try again with the new data)
What happens when the row gets updated? Do you set null to the timestamp?
It updates the version or timestamp (timestamp will get updated to the current time) automatically
What happens if the user never actually finishes updating and leaves. How does the row
every become unlocked again?
The row is not locked per se, it is instead using optimistic concurrency, where you assume that no-one will change the same row at the same time, and if someone does, then you need to retry the update.
Is there still a race condition what happens or is this next to
impossible in happening? I am just concerned 2 ppl try to get edit the
same row and both of them see it in their gui for editing but one is
actually going to get denied in the end because they lost the race
condition.
If 2 people try to edit the same row at the same time, one of them will lose if you're using optimistic concurrency. The benefit is that they will KNOW that there was a conflict, as opposed to either losing their changes and thinking that it updated, or overwriting someone else's changes without knowing about it.
So I did something like this
var test = session.Query.Where(x => x.Id ==
id).FirstOrDefault(); // send to user for editing. Has versioning on
it. user edits and send back the data 30mins later.
Codes does
test.Id = vm.Id; test.ColumnA = vm.ColumnA; test.Version = vm.Version;
session.Update(test); session.Commit(); So the above will work right?
The above will throw an exception if someone else has gone in and changed the row. That's the point of it, so you know that a concurrency issue has arisen. Typically you'd show the user a message saying "Someone else has changed this row" with the new row there and possibly their changes also so the user has to select which changes win.
but if I do this
test.Id = vm.Id;
test.ColumnA = vm.ColumnA;
session.Update(test);
session.Commit(); it would not commit right?
Correct as long as you haven't reloaded test (ie. you did test = new Xyz(), not test = session.Load() ) because the Timestamp on the row wouldn't match
If someone else updates the row via NHibernate, it will increment the
version automatically, so when your user tries to save it with the
wrong version you will get an exception which you need to decide how
to handle (ie. try show some merge screen, or tell the user to try
again with the new data)
Can I make it so when the record is grabbed this checked. I want to
keep it simple at first that only one person can edit at a time. The
other guy won't even be able to access the record to edit while
something is editing it.
That's not optimistic concurrency. As a simple answer you could add a CheckOutDate property which you set when someone starts editing it, and set it to null when they finish. Then when they start to edit, or when you show them the rows to edit you could exclude all rows where that CheckOutDate is newer than say the last 10 minutes (then you wouldn't need a scheduled task to reset it periodically)
The row is not locked per se, it is instead using optimistic
concurrency, where you assume that no-one will change the same row at
the same time, and if someone does, then you need to retry the update.
I am not sure what your saying does this mean I can do
session.query.Where(x => x.id == id).FirstOrDefault(); all day
long and it will keep getting me the record(thought it would keep
incrementing the version).
The query will NOT increment the version, only an update to it will increment the version.
I don't know that much about nHibernate itself, but if you are prepared to create some stored procs on the database it can >sort of< be done.
You will need one extra data column and two fields in your object model to store information against each row:
A 'hash' of all the field values (using SQL Server CHECKSUM 2008 and later or HASHBYTES for earlier editions) other than the hash field itself and the EditTimestamp field. This could be persisted to the table using INSERT/UPDATE triggers if needs be.
An 'edit-timestamp' of type datetime.
Change your procedures to do the following:
The 'select' procedure should include a where clause similar to 'edit-timestamp < (Now - 30 minutes)' and should update the 'edit-timestamp' to the current time. Run the select with appropriate locking BEFORE updating the row; I'm thinking of a stored procedure with hold locking (HOLDLOCK). Use a persistent date/time rather than something like GETDATE().
Example (using fixed values):
BEGIN TRAN
DECLARE @now DATETIME
SET @now = '2012-09-28 14:00:00'
SELECT *, @now AS NewEditTimestamp, CHECKSUM(ID, [Description]) AS RowChecksum
FROM TestLocks
WITH (HOLDLOCK, ROWLOCK)
WHERE ID = 3 AND EditTimestamp < DATEADD(mi, -30, @now)
/* Do all your stuff here while the record is locked */
UPDATE TestLocks
SET EditTimestamp = @now
WHERE ID = 3 AND EditTimestamp < DATEADD(mi, -30, @now)
COMMIT TRAN
If you get a row back from this procedure then you 'have' the 'lock', otherwise, no rows will be returned and there's nothing to edit.
The 'update' procedure should add a where clause similar to 'hash = previously returned hash'
Example (using fixed values):
BEGIN TRAN
DECLARE @RowChecksum INT
SET @RowChecksum = -845335138
UPDATE TestLocks
SET [Description] = 'New Description'
WHERE ID = 3 AND CHECKSUM(ID, [Description]) = @RowChecksum
SELECT @@ROWCOUNT AS RowsUpdated
COMMIT TRAN
So in your scenarios:
User A edits a row. When you return this record from the database, the 'edit-timestamp' has been updated to the current time and you have a row so you know you can edit. User B would not get a row because the timestamp is still too recent.
User B edits the row after 30 minutes. They get a row back because the timestamp has passed more than 30 minutes ago. The hash of the fields will be the same as for user A 30 minutes ago as no updates have been written.
Now user B updates. The previously retrieved hash still matches the hash of the fields in the row, so the update statement succeeds, and we return the row-count to show that the row was updated. User A however, tries to update next. Because the value of the description field has changed, the hashvalue has changed, and so nothing is updated by the UPDATE statement. We get a result of 'zero rows updated' so we know that either the row has since been changed or the row was deleted.
There are probably some issues regarding scalability with all these locks going on and the above code could be optimised (might get problems with clocks going forward/back for example, use UTC), but I wrote these examples just to explain how it could work.
Outside of that I can't see how you can do this without utilising database level row-locking within the select transaction. It might be that you can request those locks via nHibernate, but that's beyond my knowledge of nHibernate I'm afraid.
Have you looked at the ISaveOrUpdateEventListener interface?
public class SaveListener : NHibernate.Event.ISaveOrUpdateEventListener
{
public void OnSaveOrUpdate(NHibernate.Event.SaveOrUpdateEvent e)
{
NHibernate.Persister.Entity.IEntityPersister p = e.Session.GetEntityPersister(null, e.Entity);
if (p.IsVersioned)
{
//TODO: check types etc...
MyEntity m = (MyEntity) e.Entity;
DateTime oldversion = (DateTime) p.GetVersion(m, e.Session.EntityMode);
DateTime currversion = (DateTime) p.GetCurrentVersion(m.ID, e.Session);
if (oldversion < currversion.AddMinutes(-30))
throw new StaleObjectStateException("MyEntity", m.ID);
}
}
}
Then in your Configuration, register it.
private static void Configure(NHibernate.Cfg.Configuration cfg)
{
cfg.EventListeners.SaveOrUpdateEventListeners = new NHibernate.Event.ISaveOrUpdateEventListener[] {new SaveListener()};
}
public static ISessionFactory CreateSessionFactory()
{
return Fluently.Configure().Database(...)
.Mappings(...)
.ExposeConfiguration(Configure)
.BuildSessionFactory();
}
And version the Properties you want to version in your Mapping class.
public class MyEntityMap: ClassMap<MyEntity>
{
public MyEntityMap()
{
Table("MyTable");
Id(x => x.ID);
Version(x => x.Timestamp);
Map(x => x.PropA);
Map(x => x.PropB);
}
}
The short answer to your question is you can't/shouldn't do this in a simple web application with nhibernates optimistic (version) and pessimistic (row locks) locking. The fact that your transactions are only as long as a request are your limiting factor.
What you CAN do is create another table and entity class, and mappings that manages these "locks". At the lowest level you need an Id of the object being edited and the Id of the user performing the editing, and a datetime of when the lock was acquired. I would make the Id of the object being edited the primary key since you want it to be exclusive...
When a user clicks on a row to edit, you can try to acquire a lock (create a new record in that table with the ids and current datetime). If the lock already exists for another user, then it will fail because you are trying to violate a primary key constraint.
If a lock is acquired, when the user clicks save you need to check that they still have a valid "lock" before performing the actual save. Then, perform the actual save and remove the lock record.
I would also recommend a background service/process that sweeps these locks periodically and removes the ones that have expired or are older than your time limit.
This is my prescribed way of dealing with "locks" in a web environment. Good luck!
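A minimal in-memory sketch of such a lock table follows, covering the three scenarios from the question. It rests on some assumptions: EditLocks and its methods are hypothetical names, a dictionary stands in for the lock table, the caller passes in the current time, and expiry is checked when a lock is requested instead of by a background sweep, which is a simplification of the approach described above.

```csharp
using System;
using System.Collections.Generic;

public sealed class EditLocks
{
    // Key: id of the row being edited. Value: (user holding the lock, time acquired).
    private readonly Dictionary<Guid, KeyValuePair<string, DateTime>> held =
        new Dictionary<Guid, KeyValuePair<string, DateTime>>();
    private readonly object gate = new object();
    private readonly TimeSpan timeout;

    public EditLocks(TimeSpan timeout) { this.timeout = timeout; }

    // Returns false while another user holds an unexpired lock on the row.
    public bool TryAcquire(Guid rowId, string user, DateTime nowUtc)
    {
        lock (gate)
        {
            KeyValuePair<string, DateTime> current;
            bool takenByOther = held.TryGetValue(rowId, out current)
                && current.Key != user
                && nowUtc - current.Value < timeout;
            if (takenByOther) return false;

            held[rowId] = new KeyValuePair<string, DateTime>(user, nowUtc);
            return true;
        }
    }

    // To be checked right before saving: is the lock still this user's and unexpired?
    public bool StillHolds(Guid rowId, string user, DateTime nowUtc)
    {
        lock (gate)
        {
            KeyValuePair<string, DateTime> current;
            return held.TryGetValue(rowId, out current)
                && current.Key == user
                && nowUtc - current.Value < timeout;
        }
    }

    // Removes the lock after a successful save or a cancel.
    public void Release(Guid rowId, string user)
    {
        lock (gate)
        {
            KeyValuePair<string, DateTime> current;
            if (held.TryGetValue(rowId, out current) && current.Key == user)
                held.Remove(rowId);
        }
    }
}
```

With a 30 minute timeout, user B is refused while user A's lock is fresh (scenario 1), can take the lock over once it has expired (scenario 2), and user A's later StillHolds check fails, at which point the application can refuse the save (scenario 3).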
Yes, it is possible to lock a row with nhibernate but if I understand well, your scenario is in a web context and then it is not the best practice.
The best practice is to use optimistic locking with automatic versioning as you mentioned.
Locking a row when the page opens and releasing it when the page unloads will quickly lead to rows that stay locked (JavaScript issues, pages not closed properly...).
Optimistic locking will make NHibernate throw an exception when flushing a transaction that contains objects modified by another session.
If you want true concurrent modification of the same information, you could think about a system that merges many users' input into the same document, but that is a system of its own, not something managed by the ORM.
You will have to choose a way to deal with session in a web environment.
http://nhibernate.info/doc/nh/en/index.html#transactions-optimistic
The only approach that is consistent with high concurrency and high
scalability is optimistic concurrency control with versioning.
NHibernate provides for three possible approaches to writing
application code that uses optimistic concurrency.
Hey you can try these sites
http://thesenilecoder.blogspot.ca/2012/02/nhibernate-samples-row-versioning-with.html
http://stackingcode.com/blog/2010/12/09/optimistic-concurrency-and-nhibernate
History
I have a list of "records" (3,500) which I save to XML and compress on exit of the program. Since:
the number of the records increases
only around 50 records need to be updated on exit
saving takes about 3 seconds
I needed another solution -- embedded database. I chose SQL CE because it works with VS without any problems and the license is OK for me (I compared it to Firebird, SQLite, EffiProz, db4o and BerkeleyDB).
The data
The record structure: 11 fields, 2 of them make up the primary key (nvarchar + byte). The other fields are bytes, datetimes, doubles and ints.
I don't use any relations, joins, indices (except for primary key), triggers, views, and so on. It is flat Dictionary actually -- pairs of Key+Value. I modify some of them, and then I have to update them in database. From time to time I add some new "records" and I need to store (insert) them. That's all.
LINQ approach
I have blank database (file), so I make 3500 inserts in a loop (one by one). I don't even check if the record already exists because db is blank.
Execution time? 4 minutes, 52 seconds. I fainted (mind you: XML + compress = 3 seconds).
SQL CE raw approach
I googled a bit, and despite such claims as here:
LINQ to SQL (CE) speed versus SqlCe
stating that it is the fault of SQL CE itself, I gave it a try.
The same loop but this time inserts are made with SqlCeResultSet (DirectTable mode, see: Bulk Insert In SQL Server CE) and SqlCeUpdatableRecord.
The outcome? Do you sit comfortably? Well... 0.3 second (yes, fraction of the second!).
The problem
LINQ is very readable, and raw operations are quite the contrary. I could write a mapper which translates all column indexes to meaningful names, but it seems like reinventing the wheel -- after all it is already done in... LINQ.
So maybe there is a way to tell LINQ to speed things up? QUESTION -- how to do it?
The code
LINQ
foreach (var entry in dict.Entries.Where(it => it.AlteredByLearning))
{
PrimLibrary.Database.Progress record = null;
record = new PrimLibrary.Database.Progress();
record.Text = entry.Text;
record.Direction = (byte)entry.dir;
db.Progress.InsertOnSubmit(record);
record.Status = (byte)entry.LastLearningInfo.status.Value;
// ... and so on
db.SubmitChanges();
}
Raw operations
SqlCeCommand cmd = conn.CreateCommand();
cmd.CommandText = "Progress";
cmd.CommandType = System.Data.CommandType.TableDirect;
SqlCeResultSet rs = cmd.ExecuteResultSet(ResultSetOptions.Updatable);
foreach (var entry in dict.Entries.Where(it => it.AlteredByLearning))
{
SqlCeUpdatableRecord record = null;
record = rs.CreateRecord();
int col = 0;
record.SetString(col++, entry.Text);
record.SetByte(col++,(byte)entry.dir);
record.SetByte(col++,(byte)entry.LastLearningInfo.status.Value);
// ... and so on
rs.Insert(record);
}
Do more work per transaction.
Commits are generally very expensive operations for a typical relational database as the database must wait for disk flushes to ensure data is not lost (ACID guarantees and all that). Conventional HDD disk IO without specialty controllers is very slow in this sort of operation: the data must be flushed to the physical disk -- perhaps only 30-60 commits can occur a second with an IO sync between!
See the SQLite FAQ: INSERT is really slow - I can only do few dozen INSERTs per second. Ignoring the different database engine, this is the exact same issue.
Normally, LINQ2SQL creates a new implicit transaction inside SubmitChanges. To avoid this implicit transaction/commit (commits are expensive operations) either:
Call SubmitChanges less (say, once outside the loop) or;
Setup an explicit transaction scope (see TransactionScope).
One example of using a larger transaction context is:
using (var ts = new TransactionScope()) {
// LINQ2SQL will automatically enlist in the transaction scope.
// SubmitChanges now will NOT create a new transaction/commit each time.
DoImportStuffThatRunsWithinASingleTransaction();
// Important: Make sure to COMMIT the transaction.
// (The transaction used for SubmitChanges is committed to the DB.)
// This is when the disk sync actually has to happen,
// but it only happens once, not 3500 times!
ts.Complete();
}
However, the semantics of an approach using a single transaction or a single call to SubmitChanges are different than that of the code above calling SubmitChanges 3500 times and creating 3500 different implicit transactions. In particular, the size of the atomic operations (with respect to the database) is different and may not be suitable for all tasks.
For LINQ2SQL updates, changing the optimistic concurrency model (disabling it or just using a timestamp field, for instance) may result in small performance improvements. The biggest improvement, however, will come from reducing the number of commits that must be performed.
Happy coding.
I'm not positive on this, but it seems like the db.SubmitChanges() call should be made outside of the loop. Maybe that would speed things up?