I have an API that is used to add or update records in a DB.
At the start of such a request I try to get data from the DB by some identifiers from the request and then do some updates.
If there are several concurrent requests to my API, duplicates may be created.
So I am thinking about "wait till the previous request is finished".
For this I found a solution: use new SemaphoreSlim(1,1) to allow only one request at a time.
But now I am wondering if it is a good solution. Because one request may take up to 1 minute of processing, will the other requests stay alive until the SemaphoreSlim allows them to be processed?
For sure that is related to configuration, but it is always some approximate number in settings, and it may be limited by some additional threshold settings.
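Roughly what I have in mind is something like this (a minimal sketch; the handler shape and the lookup helper are placeholders):

private static readonly SemaphoreSlim _gate = new SemaphoreSlim(1, 1);

[HttpPost]
public async Task<IActionResult> Upsert(RecordDto dto)
{
    await _gate.WaitAsync();            // only one request proceeds at a time
    try
    {
        var existing = await _db.FindByIdentifiersAsync(dto);   // hypothetical lookup
        // ... insert or update depending on what was found ...
        return Ok();
    }
    finally
    {
        _gate.Release();                // let the next waiting request in
    }
}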
The canonical way to do this is to use database transactions.
For example, SQL Server's transaction isolation level "serializable" ensures that even if transactions are concurrent, the effect will be as if they had been executed one after another. This will give you the best of both worlds: Your requests can be processed in parallel, and the database engine ensures that locks and serialization happen if, and only if, it's required to avoid transactional inconsistency.
Conveniently, "serializable" is the default isolation level used by TransactionScope. Thus, if your DB library provider supports it, wrapping your code in a TransactionScope block might be all you need.
I have an API that executes a query that takes 1 minute to process. If someone makes a GET request to this API, I will execute the query and save the results in Redis.
New requests to this API will use the cached data from Redis, avoiding doing this 1 minute query again.
My problem is: at 8 AM, my cache is dropped because new data is available in the database. The first API request will execute the minute-long query. The second request will also execute the same minute-long query, since the first one hasn't finished yet and Redis is empty.
In the end, I have thousands of queries running, the database can't handle all of them, and no query can finish because the database stops working.
Is there a known pattern to handle this?
What I'm doing to handle this is to set a flag "isQueryRunning" (made thread-safe by a lock) to allow just one thread to execute at a time, leaving the others waiting, but I would like to know if there are other known strategies.
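This is roughly what I have today (simplified sketch; the cache helper and query names are made up):

private static readonly object _queryLock = new object();

public Report GetReport()
{
    var cached = _cache.Get("report");          // hypothetical Redis wrapper
    if (cached != null) return cached;

    lock (_queryLock)                            // only one thread runs the query
    {
        cached = _cache.Get("report");           // re-check after acquiring the lock
        if (cached != null) return cached;

        var fresh = RunOneMinuteQuery();          // the expensive DB query
        _cache.Set("report", fresh);
        return fresh;
    }
}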
There are several strategies. The one you mentioned is valid, if somewhat basic, because it won't work well behind a load balancer, as your lock is not distributed.
A common way around this is for state to be stored in a persistent store. In your case, this state flag could be stored in Redis itself. That gets you over the non-distributed lock problem.
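A hedged sketch of such a flag in Redis, using StackExchange.Redis (the key names and the query/serialization helpers are assumptions):

IDatabase db = redis.GetDatabase();

// Equivalent to SET report:rebuild-lock 1 NX EX 120 - only the first caller gets true.
bool iAmTheRebuilder = db.StringSet(
    "report:rebuild-lock", "1",
    expiry: TimeSpan.FromMinutes(2),     // safety expiry in case the rebuilder dies
    when: When.NotExists);

if (iAmTheRebuilder)
{
    var fresh = RunOneMinuteQuery();                 // the expensive query
    db.StringSet("report:data", Serialize(fresh));
    db.KeyDelete("report:rebuild-lock");
}
else
{
    // Someone else is rebuilding: serve stale data or return 202 (see below).
}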
However, this ties up the server because you're waiting on request threads. In REST it is common for an API to simply check the state and either
return stale data (a different cached copy still available while the cache is being rebuilt) or
return a 202 ACCEPTED HTTP status with a LOCATION header that has a URI that points to the new data. A client can then poll that location. This means of course you have to code that other endpoint, which will continue to return 202 until the data is available, and then either
return 200 with the data, or
return 301 or 307 (redirects back to the original URI)
The first is very simple if stale data is an OK thing. You can simply do a "swap" in the cache (very quick) when the new data is available. (Btw, this swap is probably better than simply dropping the data altogether before replacing it).
The second is, of course, more complex, but it scales well and avoids stale data as much as possible. More than just a location can be returned. You may return info such as a possible ready-time for the data (e.g. 1 minute), a value indicating how much of the data has been retrieved (e.g. a percentage), or other status. See here for example.
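A rough ASP.NET Core sketch of that second option (the routes, the IReportCache abstraction, and the progress field are assumptions, not a prescribed API):

[ApiController]
[Route("reports")]
public class ReportsController : ControllerBase
{
    private readonly IReportCache _cache;    // hypothetical cache/status store
    public ReportsController(IReportCache cache) => _cache = cache;

    [HttpGet("latest")]
    public IActionResult GetLatest()
    {
        if (_cache.TryGetReady(out var report))
            return Ok(report);                                   // 200 with the data

        _cache.EnsureRebuildStarted();                           // kick off the 1-minute query once
        return Accepted("/reports/latest/status",                // 202 + Location to poll
                        new { retryAfterSeconds = 60 });
    }

    [HttpGet("latest/status")]
    public IActionResult GetStatus()
    {
        if (_cache.TryGetReady(out _))
            return RedirectPreserveMethod("/reports/latest");    // 307 back to the data
        return Accepted("/reports/latest/status", new { progress = _cache.Progress });
    }
}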
session.StartTransaction();
await mongo.Collection1.UpdateOneAsync(session, filter1, update1);
await mongo.Collection2.BulkWriteAsync(session, updatesToDifferentDocs);
await mongo.Collection3.UpdateOneAsync(session, filter2, update2);
await session.CommitTransactionAsync();
The above code is running concurrently on multiple threads. The final update to Collection3 has a high chance of writing to the same document from multiple threads. I wanted the transactions across the 3 collections to be atomic, which is why I put them in one session; that is what I thought a session is essentially used for, but I'm not familiar with the details of its inner workings.
Without knowing much about the built-in features of Mongo, it's pretty obvious why this is giving me a write conflict: I simply can't write to the same document in Collection3 at the same time from multiple threads.
However, I tried Googling a bit, and it seems like Mongo >= 3.2 uses the WiredTiger storage engine by default, which has document-level locks that don't need to be managed by the developer. I've read that it automatically retries the transaction if the document was initially locked.
I don't really know if I'm using session incorrectly here, or I just have to manually implement some kind of lock/semaphore/queue system. Another option would be to manually check for write conflict and re-attempt the entire session. But it feels like I'm just reinventing the wheel here if Mongo is already supposed to have concurrency support.
Should have updated this thread earlier, but nonetheless, here is what I ended up doing to solve my problem. While MongoDB does have automatic retries for transactions along with some locking mechanisms, I couldn't find a clean way to leverage this for my specific problematic session. I kept getting write conflicts even though I thought I'd acquired locks on all the colliding documents at the start of each session.
Since I have to maintain atomicity for a session that reads and writes across multiple collections, not just documents, I thought it was cleaner to simply wrap it in custom retry logic. I followed the example at the bottom of the page here and used a timeout that I thought was reasonable for my use case.
I decided to use a timeout-based retry logic because I knew most of the collisions would be tightly packed together temporally. For less predictable collisions, maybe some queue-based mechanism would be better.
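A hedged sketch of what that wrapper looked like (the 30-second deadline, the backoff, and the mongo.Client property are illustrative, not my exact production values):

var deadline = DateTime.UtcNow + TimeSpan.FromSeconds(30);

while (true)
{
    using var session = await mongo.Client.StartSessionAsync();
    session.StartTransaction();
    try
    {
        await mongo.Collection1.UpdateOneAsync(session, filter1, update1);
        await mongo.Collection2.BulkWriteAsync(session, updatesToDifferentDocs);
        await mongo.Collection3.UpdateOneAsync(session, filter2, update2);
        await session.CommitTransactionAsync();
        break;                                                   // success
    }
    catch (MongoException ex) when (ex.HasErrorLabel("TransientTransactionError"))
    {
        // Disposing the session at the end of this iteration aborts the uncommitted transaction.
        if (DateTime.UtcNow > deadline) throw;                   // give up after the timeout
        await Task.Delay(TimeSpan.FromMilliseconds(50));         // brief backoff, then retry
    }
}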
The automatic retry behavior is different for transactions in MongoDB. By default, a transactional write operation will wait for 5 ms and abort the transaction if a lock cannot be acquired.
You could try increasing the 5 ms timeout by running the following admin command on your MongoDB instance:
db.adminCommand( { setParameter: 1, maxTransactionLockRequestTimeoutMillis: 100 } )
More details about that can be found here.
Alternatively, you could manually implement some retry logic for failed transactions.
So in WCF to flow transactions from client to server you must have your
[OperationBehavior(TransactionScopeRequired = true)]
On your instance methods and
[TransactionFlow(TransactionFlowOption.Allowed)]
On your service interfaces. And everything works. However, I find it concerning that the server allocates a TX even if the client isn't flowing one up. It seems wasteful.
I understand .NET transactions can be lightweight. Am I overreacting? Should I just trust in .NET and let it allocate a needless local transaction? I'm worried it's unnecessary bulk, and even more worried it may get promoted to MSDTC involvement.
EDIT 1:
The operation at hand which makes this clumsy is:
1. insert on table A
2. insert on table B
3. read on table A
4. insert on table C
Operation 3, the read, MUST be marked up as above with TransactionScopeRequired = true. Otherwise, since the TX is not flowed, the read times out. I find this a little weird, brute-forcing a TX to exist for a read-only operation. It implies I'll have to mark most of the WCF calls in the system with TransactionScopeRequired = true.
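For reference, this is roughly how the client flows the transaction over those four calls (the proxy and method names are made up):

using (var scope = new TransactionScope())
{
    proxy.InsertIntoA(itemA);
    proxy.InsertIntoB(itemB);
    var row = proxy.ReadFromA(key);   // this is the call that times out unless the server
                                      // operation is marked TransactionScopeRequired = true
    proxy.InsertIntoC(row);
    scope.Complete();
}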
A transaction is a tiny .NET in-memory data structure. It is nothing. What's expensive are the resource enlistments. That said, you are going to have at least one such enlistment.
Transactions usually help with database throughput, especially with writes.
You probably want your method to execute under a transaction anyway because you want effects to be atomic and reads to be consistent. It doesn't matter whether the client requests a tran or not.
and even more worried it may get promoted to MSDTC involvement
That's a valid concern. That said, distributed transactions are best avoided because they are slow and they do not work at all with some HA strategies like mirroring and Availability Groups.
I have a data entry ASP.NET application. During one complete data entry, many transactions occur. I would like to keep track of all those transactions so that if the user wants to abandon the data entry, all the transactions of which I have been keeping a record can be rolled back.
I am on SQL Server 2008, .NET Framework 4.0, and I am using C#.
This is always a tough lesson to learn for people that are new to web development. But here it is:
Each round trip web request is a separate, stand-alone thread of execution
That means, simply put, each time you submit a page request (click a button, navigate to a new page, even refresh a page) then it can run on a different thread than the previous one. What's more, even if you do get the same thread twice, several other web requests may have been processed by the thread in the time between your two requests.
This makes it effectively impossible to span simple transactions across more than one web request.
Here's another concept that you should keep in mind:
Transactions are intended for batch operations, not interactive operations.
What this means is that transactions are meant to be short-lived, and to encompass several operations executing sequentially (or simultaneously) in which all operations are atomic, and intended to either all complete, or all fail. Transactions are not typically designed to be long-lived (meaning waiting for a user to decide on various actions interactively).
Web apps are not desktop apps. They don't function like them. You have to change your thinking when you do web apps. And the biggest lesson to learn, each request is a stand-alone unit of execution.
Now, above, I said "simple transactions", also known as lightweight or local transactions. There's also what's known as a Distributed Transaction, and to use those requires a Distributed Transaction Coordinator. MSDTC is pretty commonly used. However, DTs perform much more slowly than LWTs. Also, they require that the infrastructure be set up to use a DTC.
It's possible to span a transaction over web requests using a DTC. This is done by "enlisting" in a Distributed Transaction and then somehow sharing this transaction identifier between requests. But this is a lot of work to set up and deal with, and it has a lot of error-prone situations. It's not something you want to do if you have other options.
In general, you're better off adding the data to a temporary table or tables, and then when the final save is done, transfer that data to the permanent tables. Another option is to maintain some state (such as using ViewState or Session) to keep track of the changes.
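A minimal sketch of the staging-table idea (the table, column, and session-key names are purely illustrative):

// Each entry step inserts into OrdersStaging; the final save moves everything
// to the permanent table in one short transaction.
using (var conn = new SqlConnection(connectionString))
{
    conn.Open();
    using (var tx = conn.BeginTransaction())
    {
        var sql = @"INSERT INTO Orders (CustomerId, Amount)
                    SELECT CustomerId, Amount FROM OrdersStaging WHERE EntrySessionId = @session;
                    DELETE FROM OrdersStaging WHERE EntrySessionId = @session;";
        using (var cmd = new SqlCommand(sql, conn, tx))
        {
            cmd.Parameters.AddWithValue("@session", entrySessionId);
            cmd.ExecuteNonQuery();
        }
        tx.Commit();   // the final save is the only real DB transaction
    }
}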
Another popular approach is to perform the operations client-side using JavaScript and then submit all the changes to the server when you are done. This is difficult to implement if you need to navigate to different pages, however.
From your question, it appears that the transactions are complete when the user exercises the option to roll them back. In such cases, I doubt if the DBMS's transaction rollback semantics would be available. So, I would provide such semantics at the application layer as follows:
Any atomic operation that can be performed on the database should be encapsulated in a Command object. Each command will implement the undo method that would revert the action performed by its execute method.
Each transaction would contain a list of commands that were run as part of it. The transaction is persisted as is for further operations in future.
The user would be provided with a way to view these transactions that can be potentially rolled back. Upon selection of a transaction by user to roll it back, the list of commands corresponding to such a transaction are retrieved and the undo method is called on all those command objects.
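A hedged sketch of the command-with-undo idea (the interface, repository, and record names are illustrative):

public interface IUndoableCommand
{
    void Execute();
    void Undo();                                // reverts exactly what Execute did
}

public class InsertRecordCommand : IUndoableCommand
{
    private readonly IRecordRepository _repo;   // hypothetical data-access helper
    private readonly Record _record;
    private int _insertedId;

    public InsertRecordCommand(IRecordRepository repo, Record record)
    {
        _repo = repo;
        _record = record;
    }

    public void Execute() => _insertedId = _repo.Insert(_record);
    public void Undo() => _repo.Delete(_insertedId);
}

public class RecordedTransaction
{
    private readonly List<IUndoableCommand> _commands = new List<IUndoableCommand>();

    public void Run(IUndoableCommand cmd)
    {
        cmd.Execute();
        _commands.Add(cmd);                     // remember it so it can be undone later
    }

    public void Rollback()
    {
        for (int i = _commands.Count - 1; i >= 0; i--)
            _commands[i].Undo();                // undo in reverse order
    }
}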
HTH.
You can also store them in a temporary table and move those records to your original table at a later stage.
If you are just managing transactions during a single save operation, use TransactionScope. But it doesn't sound like that is the case.
If the user may wish to abandon n number of previous save operations, it suggests that an item may exist in draft form. There might be one working draft or many. Subsequently, there must be a way to promote a draft to a final version, either implicitly or explicitly. Think of how an email program saves a draft. It doesn't actually send your message, you may abandon it at any time, and you may recall it at a later time. When you send the message, you have "committed the transaction".
You might also add a user interface to rollback to a specific version.
This will be a fair amount of work, but if you are willing to save and manage multiple copies of the same item it can be accomplished.
You may save a copy of the same data in the same schema using a status flag to indicate that it is a draft, or you might store the data in an intermediate format in separate table(s). I would prefer the first approach in that it allows the same structures to be used.
A while ago, I wrote an application used by multiple users to handle trades creation.
I haven't done development for some time now, and I can't remember how I managed the concurrency between the users. Thus, I'm seeking some advice in terms of design.
The original application had the following characteristics:
One heavy client per user.
A single database.
Access to the database for each user to insert/update/delete trades.
A grid in the application reflecting the trades table. The grid is updated each time someone changes a deal.
I am using WPF.
Here's what I'm wondering:
Am I correct in thinking that I shouldn't care about the connection to the database for each application? Considering that there is a singleton in each, I would expect one connection per client with no issue.
How can I prevent conflicts from concurrent access? I guess I should lock when modifying the data, but I don't remember how to.
How do I set up the grid to automatically update whenever my database is updated (by another user, for example)?
Thank you in advance for your help!
Consider leveraging connection pooling to reduce the number of connections. See: http://msdn.microsoft.com/en-us/library/8xx3tyca.aspx
Lock as late as possible and release as soon as possible to maximize concurrency. You can use TransactionScope (see: http://msdn.microsoft.com/en-us/library/system.transactions.transactionscope.aspx and http://blogs.msdn.com/b/dbrowne/archive/2010/05/21/using-new-transactionscope-considered-harmful.aspx) if you have multiple DB actions that need to go together to maintain consistency, or just handle them in a DB stored procedure. Keep your queries simple. Follow these tips to understand how locking works and how to reduce resource contention and deadlocks: http://www.devx.com/gethelpon/10MinuteSolution/16488
I am not sure about other databases, but for SQL Server you can use SqlDependency; see http://msdn.microsoft.com/en-us/library/a52dhwx7(v=vs.80).aspx
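A hedged sketch of SqlDependency usage (the connection string, query, and refresh callback are placeholders; note the query must list columns explicitly and use two-part table names):

SqlDependency.Start(connectionString);   // call once per application, e.g. at startup

void SubscribeToTradeChanges()
{
    var conn = new SqlConnection(connectionString);
    var cmd = new SqlCommand("SELECT TradeId, Price FROM dbo.Trades", conn);

    var dependency = new SqlDependency(cmd);
    dependency.OnChange += (sender, e) =>
    {
        // A notification fires only once: re-query, refresh the grid, and re-subscribe.
        RefreshGridAndResubscribe();
    };

    conn.Open();
    using (var reader = cmd.ExecuteReader())
    {
        // Load the initial grid data here.
    }
    conn.Close();
}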
Concurrency is usually granted by the DBMS using locks. Locks are a type of semaphore that grants exclusive access to a certain resource and allows other accesses to be restricted or queued (only restricted in the case you use uncommitted reads).
The number of connections itself does not pose a problem while you are not reaching heights where you might touch on the max_connections setting of your DBMS. Otherwise, you might get a problem connecting to it for maintenance purposes or for shutting it down.
DBMSes usually use a concept of either table locks (MyISAM) or row locks (InnoDB, most other DBMSes). The type of lock determines the granularity of the lock. Table locks can be very fast but are usually considered inferior to row-level locks.
Row-level locks occur inside a transaction (implicit or explicit). When manually starting a transaction, you begin your transaction scope. Until you manually close the transaction scope, all changes you make will be attributed to this exact transaction. The changes you make will also obey the ACID paradigm.
Transaction scope and how to use it is a topic far too long for this platform, if you want, I can post some links that carry more information on this topic.
For the automatic updates, most databases support some kind of trigger mechanism, which is code that is run on specific actions on the database (for instance the creation of a new record or the change of a record). You could put your code inside this trigger. However, you should only inform a receiving application of the changes, not actually perform the changes from the trigger, even if the language might make it possible. Remember that the action which triggered the code is suspended until your trigger code finishes. This means that a lean trigger is best, if it is needed at all.