session.StartTransaction();
// all three writes below are meant to commit or fail together as one unit
await mongo.Collection1.UpdateOneAsync(session, filter1, update1);
await mongo.Collection2.BulkWriteAsync(session, updatesToDifferentDocs);
await mongo.Collection3.UpdateOneAsync(session, filter2, update2);
await session.CommitTransactionAsync();
The above code runs concurrently on multiple threads, and the final update to Collection3 has a high chance of hitting the same document from several threads at once. I wanted the writes across the 3 collections to be atomic, which is why I put them in one session; that is what I thought a session was essentially for, but I'm not familiar with the details of its inner workings.
Without knowing much about MongoDB's built-in features, it's pretty obvious why this is giving me a write conflict: I simply can't write to the same document in Collection3 from multiple threads at the same time.
However, I tried Googling a bit, and it seems that MongoDB >= 3.2 uses the WiredTiger storage engine by default, which has document-level locks that the developer doesn't have to manage. I've also read that it automatically retries the transaction if the document was initially locked.
I don't really know whether I'm using the session incorrectly here or whether I just have to implement some kind of lock/semaphore/queue system manually. Another option would be to manually check for the write conflict and re-attempt the entire session, but it feels like I'm reinventing the wheel if Mongo is already supposed to have concurrency support.
Should have updated this thread earlier, but nonetheless, here is what I ended up doing to solve my problem. While MongoDB does have automatic retries for transactions along with some locking mechanisms, I couldn't find a clean way to leverage this for my specific problematic session. I kept getting write conflicts even though I thought I'd acquired locks on all the colliding documents at the start of each session.
Since I had to maintain atomicity for a session that reads and writes across multiple collections, not just multiple documents, I thought it was cleaner to simply wrap the whole session in custom retry logic. I followed the example at the bottom of the page here and used a timeout that I thought was reasonable for my use case.
I went with timeout-based retry logic because I knew most of the collisions would be tightly packed together in time. For less predictable collisions, some queue-based mechanism might be better.
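For reference, a minimal sketch of that kind of timeout-based retry wrapper. The helper name, the deadline handling, the 50 ms back-off and the TransientTransactionError label check are my own illustrative choices, not the exact code from that example:

using System;
using System.Threading.Tasks;
using MongoDB.Driver;

static class TransactionRetry
{
    // Retries the whole transactional session until it commits or the time budget runs out.
    public static async Task RunWithRetryAsync(IMongoClient client,
        Func<IClientSessionHandle, Task> work, TimeSpan timeout)
    {
        var deadline = DateTime.UtcNow + timeout;
        while (true)
        {
            using (var session = await client.StartSessionAsync())
            {
                session.StartTransaction();
                try
                {
                    await work(session);                     // e.g. the three updates shown above
                    await session.CommitTransactionAsync();
                    return;
                }
                catch (MongoException ex) when (ex.HasErrorLabel("TransientTransactionError")
                                                && DateTime.UtcNow < deadline)
                {
                    // Write conflicts carry this label; disposing the session aborts the failed
                    // transaction, and we back off briefly before retrying the whole session.
                    await Task.Delay(50);
                }
            }
        }
    }
}

Recent C# drivers also expose session.WithTransactionAsync, which wraps similar retry behaviour for you.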
The automatic retry behavior is different for transactions in MongoDB: by default, a transactional write operation waits 5 ms and aborts the transaction if the lock cannot be acquired.
You could try increasing that 5 ms timeout by running the following admin command on your MongoDB instance:
db.adminCommand( { setParameter: 1, maxTransactionLockRequestTimeoutMillis: 100 } )
More details about that can be found here.
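If you would rather set it from application code than from the shell, the same command can presumably be issued through the C# driver against the admin database. A sketch, assuming client is your IMongoClient:

using MongoDB.Bson;
using MongoDB.Driver;

var adminDb = client.GetDatabase("admin");
var setParameter = new BsonDocument
{
    { "setParameter", 1 },
    { "maxTransactionLockRequestTimeoutMillis", 100 }   // server default is 5 ms
};
await adminDb.RunCommandAsync<BsonDocument>(setParameter);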
Alternatively, you could manually implement retry logic for failed transactions.
Related
I have an API that is used to add/update records in the DB.
At the start of such a request, I try to get data from the DB by some identifiers from the request and then do some updates.
If there are a few concurrent requests to my API, duplicates may be created.
So I am thinking about making each request wait until the previous one has finished.
For this I found a solution: use new SemaphoreSlim(1, 1) to allow only one request at a time.
But now I am wondering whether it is a good solution, because if one request can take up to a minute of processing, will the other requests stay alive until the SemaphoreSlim allows them to be processed?
For sure that is partly a matter of configuration, but the allowed request lifetime is always just some approximate number in the settings and may be capped by additional threshold settings.
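For context, the approach described above looks roughly like the sketch below (the handler class, method and Gate field are illustrative names). WaitAsync queues callers asynchronously rather than blocking threads, but with a one-minute handler the queued requests can still run into client or server request timeouts:

using System.Threading;
using System.Threading.Tasks;

public class RecordsEndpoint   // hypothetical request handler
{
    // One gate for the whole application: only one request does the read+update at a time.
    private static readonly SemaphoreSlim Gate = new SemaphoreSlim(1, 1);

    public async Task HandleRequestAsync()
    {
        await Gate.WaitAsync();      // later callers queue here until the gate is released
        try
        {
            // look up existing records by the request's identifiers, then insert or update
        }
        finally
        {
            Gate.Release();
        }
    }
}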
The canonical way to do this is to use database transactions.
For example, SQL Server's transaction isolation level "serializable" ensures that even if transactions are concurrent, the effect will be as if they had been executed one after another. This will give you the best of both worlds: Your requests can be processed in parallel, and the database engine ensures that locks and serialization happen if, and only if, it's required to avoid transactional inconsistency.
Conveniently, "serializable" is the default isolation level used by TransactionScope. Thus, if your DB library provider supports it, wrapping your code in a TransactionScope block might be all you need.
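A minimal sketch of that wrapping, assuming your data access enlists in ambient transactions (the async-flow option is only needed if you await inside the scope; the body is a placeholder):

using System.Transactions;

using (var scope = new TransactionScope(
           TransactionScopeOption.Required,
           new TransactionOptions { IsolationLevel = IsolationLevel.Serializable },
           TransactionScopeAsyncFlowOption.Enabled))
{
    // query by identifiers, decide insert vs. update, save changes...
    scope.Complete();   // without this, the transaction rolls back when the scope is disposed
}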
So the question is long but pretty self-explanatory. I have an app that runs on multiple servers and uses parallel looping to handle objects coming out of a MongoDB collection. Since MongoDB doesn't let me make reads exclusive, I cannot stop multiple processes and/or servers from grabbing the same document from the collection and duplicating work.
The program is such that the app waits for information to appear, does some work to figure out what to do with it, then deletes it once it's done. What I hope to achieve is to keep documents from being accessed at the same time; since a document is eventually deleted once it has been read, this would speed up my throughput a bit overall by reducing the number of duplicates and letting the apps grab things that aren't already being worked on.
I don't think pessimistic locking is quite what I'm looking for, but maybe I've misunderstood the concept. Also, if alternative setups are being used to solve the same problem, I would love to hear what might be being used.
Thanks!
What I hope to achieve is to keep documents from being accessed at the same time
The simplest way to achieve this is by introducing a dispatch-process architecture: add a dedicated process that just watches for changes and then delegates or dispatches the tasks out to multiple workers.
The process could utilise MongoDB ChangeStreams to access real-time data changes on a single collection, a database, or an entire deployment. Once it receives a change event/document, it just sends it to a worker for processing.
This should also reduce the number of workers trying to access the same task and then needing logic to back off.
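A rough sketch of such a dispatcher using the C# driver's change stream API (the tasksCollection and the workQueue hand-off are assumptions):

using MongoDB.Driver;

// Watch a single collection and hand each newly inserted document to exactly one worker.
var options = new ChangeStreamOptions { FullDocument = ChangeStreamFullDocumentOption.UpdateLookup };
using (var cursor = await tasksCollection.WatchAsync(options))
{
    await cursor.ForEachAsync(async change =>
    {
        if (change.OperationType == ChangeStreamOperationType.Insert)
        {
            await workQueue.EnqueueAsync(change.FullDocument);   // hypothetical worker queue
        }
    });
}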
We have a card order process which can take a few seconds to run. Data in this process is read using LockMode.UPGRADE in NHibernate.
A second (webhook) process, which runs with LockMode.NONE, is occasionally triggered before the first order process has completed, creating a race condition, and appears to be using the original row data. It is not blocked until the first process is complete, so it is getting old data.
Our database is not running with NOWAIT or any of the SNAPSHOT / READ COMMITTED SNAPSHOT settings.
My question is: can LockMode.NONE somehow ignore the UPGRADE lock and read the old data (from a cache, perhaps)?
Thanks.
Upgrade locks are for preventing another transaction from acquiring the same data for upgrade too.
A transaction running in read committed mode without an additional lock will still be able to read that data. You need an exclusive lock to block read-committed reads (provided they are not using the snapshot isolation level). Read more on Technet.
As far as I know, NHibernate does not provide a way to issue exclusive locks on entity reads. (You will get them by actually updating the entities in the transaction.) You may work around this by using CreateSQLQuery to issue your locks directly in the DB, or have your webhook process acquire an upgrade lock too.
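As a rough illustration of the second workaround, with a hypothetical Order entity and a SQL Server backend:

// Make the webhook read take the same upgrade lock the order process uses,
// so it blocks until that transaction finishes instead of reading stale rows.
using (var tx = session.BeginTransaction())
{
    var order = session.Get<Order>(orderId, LockMode.Upgrade);   // SELECT ... WITH (UPDLOCK, ROWLOCK) on SQL Server
    // ... webhook processing against the now up-to-date row ...
    tx.Commit();
}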
But this path will surely lead to deadlocks or lock contention.
More generally, avoiding explicit locking by adapting your code patterns is preferable. I have just written an example here; see the "no explicit lock" solution.
As for your webhook process, is there really any difference between these two situations?
1. It obtains its data, which did not have any update lock on it, but before it processes the data, the data gets updated by the order process.
2. It obtains its data, including some rows which had an update lock on them.
Generally you will still have the same trouble to solve, and avoiding 2 will not resolve 1. Here too, it is the way the process works with its data that needs to be adapted to support concurrency. (Hint: using long-running transactions to solve 1 will surely lead to lock contention too.)
From the name FindOneAndUpdate() I understand this is an atomic operation.
What if I want to find 10 items (Limit(10)) and Update them all alike?
For example set a state field to "in progress"?
Is that atomically achievable with MongoDB? Is there maybe some built-in functionality in the C# driver? I don't want to implement 2PC myself if it is avoidable :-)
I have other consumers asking for documents as well, so I want to avoid double processing, although it is not critical to my business case.
The motivation NOT to use FindOneAndUpdate() 10 times is purely network-related (less chatter and better performance). I do not have a requirement for transaction-like behavior.
Neither the database nor the business case is under my control, but I was told to expect many documents going in and out rather quickly.
In MongoDB, operations are only considered atomic on a per-document basis. That is, if you update multiple fields in a document with a single update statement, you will get all of the updates or none of them should they be queried while the update is happening. That means an update that affects more than one document will not be an atomic operation across all documents, but will be atomic within the document.
Since it sounds like you are mainly concerned with being more efficient in sending commands to the server, rather than with whether the operation is atomic server-side, you can use BulkWriteAsync(), which takes an IEnumerable<WriteModel<TDocument>> of updates to perform on the server.
This allows you to build a list of updates and execute them against the server in one operation. Care must be taken during this process to properly handle failed writes. Take a look at the MongoDB docs on this over here.
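A sketch of what that could look like for the "claim 10 documents" case. WorkItem (a POCO with Id and State), the state values, and the collection variable are assumptions; the re-check of State in each update filter only reduces double-claiming, and the batch as a whole is still not atomic:

using System.Linq;
using MongoDB.Driver;

// Find up to 10 candidate ids, then mark them "in progress" in a single bulk call.
var newFilter = Builders<WorkItem>.Filter.Eq(x => x.State, "new");
var ids = await collection.Find(newFilter).Limit(10).Project(x => x.Id).ToListAsync();

var updates = ids
    .Select(id => (WriteModel<WorkItem>)new UpdateOneModel<WorkItem>(
        Builders<WorkItem>.Filter.Eq(x => x.Id, id) & newFilter,           // skip docs another consumer already claimed
        Builders<WorkItem>.Update.Set(x => x.State, "in progress")))
    .ToList();

if (updates.Count > 0)
{
    var result = await collection.BulkWriteAsync(updates);
    // result.ModifiedCount is how many documents this consumer actually won.
}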
Firstly, let me give a brief description of the scenario. I'm writing a simple game where pretty much all of the work is done on the server side with a thin client for players to access it. A player logs in or creates an account and can then interact with the game by moving around a grid. When they enter a cell, they should be informed of other players in that cell and similarly, other players in that cell will be informed of that player entering it. There are lots of other interactions and actions that can take place but it's not worth going in to detail on them as it's just more of the same. When a player logs out then back in or if the server goes down and comes back up, all of the game state should persist, although if the server crashes, it doesn't matter if I lose 10 minutes or so of changes.
I've decided to use NHibernate and a SQLite database, so I've been reading up a lot on NHibernate, following tutorials and writing some sample applications, and am thoroughly confused as to how I should go about this!
The question I have is: what's the best way to manage my sessions? Just from the small amount that I do understand, all these possibilities jump out at me:
Have a single session that's always opened that all clients use
Have a single session for each client that connects and periodically flush it
Open a session every time I have to use any of the persisted entities and close it as soon as the update, insert, delete or query is complete
Have a session for each client, but keep it disconnected and only reconnect it when I need to use it
Same as above, but keep it connected and only disconnect it after a certain period of inactivity
Keep the entities detached and only attach them every 10 minutes, say, to commit the changes
What kind of strategy should I use to get decent performance given that there could be many updates, inserts, deletes and queries per second from possibly hundreds of clients all at once, and they all have to be consistent with each other?
Another smaller question: how should I use transactions in an efficient manner? Is it fine for every single change to be in its own transaction, or is that going to perform badly when I have hundreds of clients all trying to alter cells in the grid? Should I try to figure out how to bulk together similar updates and place them within a single transaction, or is that going to be too complicated? Do I even need transactions for most of it?
I would use a session per request to the server, and one transaction per session. I wouldn't optimize for performance before the app is mature.
Answers to the solutions you listed:
Have a single session that's always opened that all clients use: You will have performance issues here because the session is not thread safe and you will have to lock all calls to the session.
Have a single session for each client that connects and periodically flush it: You will have performance issues here because all data used by the client will be cached. You will also see problems with stale data from the cache.
Open a session every time I have to use any of the persisted entities and close it as soon as the update, insert, delete or query is complete: You won't have any performance problems here. A disadvantage is the possibility of concurrency or data-corruption problems, because related SQL statements are not executed in the same transaction.
Have a session for each client, but keep it disconnected and only reconnect it when I need to use it: NHibernate already has built-in connection management, and that is already very optimized.
Same as above, but keep it connected and only disconnect it after a certain period of inactivity: Will cause problems because the number of SQL connections is limited, which will also limit the number of users of your application.
Keep the entities detached and only attach them every 10 minutes, say, to commit the changes: Will cause problems because of stale data in the detached entities. You will have to track changes yourself, which makes you end up with a piece of code that looks like the session itself.
It would be useless to go into more detail now, because I would just be repeating the manuals/tutorials/books. When you use a session per request, you probably won't have problems in 99% of the application you describe (and maybe none at all). A session is a lightweight, non-thread-safe class that is meant to live for a very short time. When you want to know exactly how session/connection/caching/transaction management works, I recommend reading a manual first and then asking more detailed questions about the parts that remain unclear.
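To make that concrete, a minimal sketch of session-per-request with one transaction per session (sessionFactory is the application-wide ISessionFactory; Cell, cellId and player are illustrative names from the grid scenario):

// One short-lived session and one transaction per client request.
using (var session = sessionFactory.OpenSession())
using (var tx = session.BeginTransaction())
{
    var cell = session.Get<Cell>(cellId);
    cell.Occupants.Add(player);   // changes to loaded entities are flushed when the transaction commits
    tx.Commit();
}   // the session is disposed here; the next request opens a fresh one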
Read the 'ISessionFactory' section on this page of the NHibernate documentation. ISessions are meant to be single-threaded (i.e., not thread-safe), which probably means that you shouldn't be sharing one across users. The ISessionFactory should be created once by your application, and ISessions should be created for each unit of work. Remember that creating an ISession does not necessarily result in opening a database connection; that depends on how your SessionFactory's connection pooling strategy is configured.
You may also want to look at Hibernate's Documentation on Session and Transaction.
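A sketch of the "create the ISessionFactory once" part; the helper class and the Lazy<T> wrapper are just one illustrative way to get a thread-safe singleton:

using System;
using NHibernate;
using NHibernate.Cfg;

public static class NHibernateHelper
{
    // Built once, on first use, from hibernate.cfg.xml; reused for the application's lifetime.
    private static readonly Lazy<ISessionFactory> Factory =
        new Lazy<ISessionFactory>(() => new Configuration().Configure().BuildSessionFactory());

    // Cheap to call: opening an ISession does not necessarily open a database connection.
    public static ISession OpenSession() => Factory.Value.OpenSession();
}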
I would aim to keep everything in memory, and either journal changes or take periodic offline snapshots.
Have a read through NHibernate Best Practices with ASP.NET; there are some very good tips in there for a start. As mentioned already, be very careful with an ISession, as it is NOT thread-safe, so just keep that in mind.
If you require something a little more complex, take a look at the NHibernate.Burrow contrib project. It states something like "the real power Burrow provides is that a Burrow conversation can span over multiple http requests".