I'm facing the following situation:
A system I'm working on has a few different parts (services and ASP.NET) with separate responsibilities. These parts share two resources: an MSSQL database and files on a Windows file system.
Currently all these parts access these resources individually. I think this is causing unpredictability and inconsistency.
I'm thinking of introducing a service that regulates access to these resources. I'm not sure if this is an accepted design principle.
The general question is:
What kind of solution should I be looking at and what should I keep in mind when designing this?
Specific questions:
Is this just a Data Access Layer?
Is it bad to introduce a SPOF like this?
Can you recommend any reading material aimed at this kind of solution? (especially if there's specific material for C#)
Edit, prompted by a great question from allen-smithee:
The database is currently accessed through embedded queries. They are separated into a class, but that class is different for every service, so it's not a shared library.
1/ A Data Access Layer simply encapsulates the data logic; what you need is concurrency control to keep your data model consistent across the independent services.
2/ Depending on how you implement concurrency, it can be a single point of failure, but I don't think there is anything wrong with that - "plan for failure" is a great design mantra. You can build in redundancy and fail-over mechanisms, or you can distribute your concurrency control across your services.
3/ The way you choose to implement concurrency will depend on how your application functions and what your users expect. To give some specific scenarios:
Scenario A
When a service begins an update start a transaction and take out one or more row-level locks for the records involved. If any other service tries to edit the record at the same time either block or return an error such as 'this record is currently locked'. Note that all locks have to be taken before reading and kept for the duration of the update to ensure consistency with other writes.
Pros - Fairly straightforward to implement for small data models. MSSQL supports plenty of locking scenarios and even custom application locks that you can use to group resources.
Cons - If your transaction needs to access multiple tables/rows and different services or functions access overlapping tables you can easily get into all sorts of deadlock problems.
MSSQL generally prefers pessimistic locking and can escalate locks from row to page and table level, which means read and write locks may behave in ways you wouldn't initially expect. You may need to spend a considerable amount of time debugging these interactions in SQL Server Profiler and be prepared to make changes to your data model to work around these issues.
Scenario B
Each table row has an incremental version number. A service reads the data it needs and performs a series of updates; then, within a transaction, it locks the row and checks the current row version against the one it read. If the version numbers do not match, it rolls back the transaction, cancelling the update. The service may then attempt the operation again, starting by re-reading the data.
Pros - Readers are not blocked and the lock is held only very briefly while the service tries to commit the update. MSSQL has built-in support for this concurrency method in the form of 'Row Versioning' with the 'Snapshot Isolation' level. If conflicts are rare this method can be extremely responsive - perfect for real-time applications.
Cons - This method may require significant changes to your data model and the service behaviour.
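Scenario B can be sketched the same way (Python and sqlite3 again, with a hypothetical `product` table). The version check and the write happen in a single conditional UPDATE, so the database holds its lock only for that one statement; a row count of zero means another service committed first. In SQL Server a `rowversion` column could play the role of the hand-maintained `version` column here.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, version INTEGER NOT NULL)"
)
conn.execute("INSERT INTO product VALUES (1, 'widget', 1)")


def read(product_id):
    """Read the row together with the version it was read at."""
    return conn.execute(
        "SELECT name, version FROM product WHERE id = ?", (product_id,)
    ).fetchone()


def try_update(product_id, new_name, read_version):
    """Write only if nobody changed the row since we read it."""
    cur = conn.execute(
        "UPDATE product SET name = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_name, product_id, read_version),
    )
    return cur.rowcount == 1  # 0 rows means someone else got there first


# Two services read the same version...
_, v_a = read(1)
_, v_b = read(1)

first = try_update(1, "widget-A", v_a)   # succeeds, version becomes 2
second = try_update(1, "widget-B", v_b)  # stale version 1: rejected

# The loser re-reads and retries.
if not second:
    _, v_b = read(1)
    second = try_update(1, "widget-B", v_b)
```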
Scenario C
A single data service is responsible for all data access. Other services request data from and submit updates to this service. The service is responsible for reading and writing to the database and filesystem, and performs some level of data integrity checking and resolves data conflicts.
Pros - Encapsulates data integrity and control in one module, simplifying other services. Allows you to implement caching, locking, etc. at the application level, providing finer-grained control.
Cons - Significant changes to existing architecture required. Resolving data conflicts can require a significant amount of code if you choose to resolve at the field level. Services will need to be able to handle a rejected update when resolution is not possible.
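A rough sketch of Scenario C: a single object that is the only code allowed to touch the database, serialising writes and rejecting updates based on stale data. In a real system it would be a separate service (e.g. behind WCF or HTTP) rather than an in-process Python class; `DataService`, `RejectedUpdate` and the `doc` table are hypothetical names, and `threading.Lock` only stands in for whatever coordination that service would use.

```python
import sqlite3
import threading


class RejectedUpdate(Exception):
    """Raised when the data service cannot reconcile an update."""


class DataService:
    """Hypothetical single gateway: the only code that touches the DB."""

    def __init__(self):
        self._conn = sqlite3.connect(":memory:", check_same_thread=False)
        self._conn.execute(
            "CREATE TABLE doc (id INTEGER PRIMARY KEY, body TEXT, version INTEGER)"
        )
        self._lock = threading.Lock()  # serialises all access
        self._cache = {}               # application-level read cache

    def get(self, doc_id):
        with self._lock:
            if doc_id not in self._cache:
                self._cache[doc_id] = self._conn.execute(
                    "SELECT body, version FROM doc WHERE id = ?", (doc_id,)
                ).fetchone()
            return self._cache[doc_id]

    def put(self, doc_id, body, based_on_version):
        with self._lock:
            row = self._conn.execute(
                "SELECT version FROM doc WHERE id = ?", (doc_id,)
            ).fetchone()
            current = row[0] if row else 0
            if current != based_on_version:
                # A real service might try a field-level merge here first.
                raise RejectedUpdate(f"doc {doc_id} changed since v{based_on_version}")
            self._conn.execute(
                "INSERT OR REPLACE INTO doc (id, body, version) VALUES (?, ?, ?)",
                (doc_id, body, current + 1),
            )
            self._conn.commit()
            self._cache[doc_id] = (body, current + 1)


svc = DataService()
svc.put(1, "draft", 0)
body, v = svc.get(1)
svc.put(1, "final", v)
try:
    svc.put(1, "stale", v)  # based on an outdated version
    rejected = False
except RejectedUpdate:
    rejected = True
```

Callers have to handle `RejectedUpdate`, which is the "rejected update" case listed under the cons.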
Those are the major scenarios I can think of off the top of my head, but there are plenty more. Generally, all concurrency control for data revolves around locking while performing an action (pessimistic locking), performing an action and then checking for a conflict (optimistic locking via versioning), or performing an action and then merging conflicts (conflict resolution).
Thinking about your specific data model and how it is updated will guide which mix of these techniques you use. Searching for any of the terms above will give you plenty to read, and there are a lot of TechNet articles that specifically address these issues in an MSSQL context. Take heart - I've seen good programmers get this stuff wrong. It really is a challenging problem, but it is solvable if you work through it methodically.
Related
I need an ORM that is suitable for a stateful application. I'm going to keep entities in memory between requests in a low-latency, real-time game server with persistent client connections. There is only one server instance connected to the database, so no data can be changed from "outside" and the server can rely on its cache.
When a user logs in to the server remotely, their whole profile is loaded into server memory. Several higher-level services are also created for each user to operate on profile data and provide functionality. They can also have internal fields (state) to store temporary data. When a user wants to change his signature, he asks the corresponding service to do so. The service tracks how frequently the user changes his signature and allows it only once per ten minutes (for example); such a short interval is not tracked in the database, it is temporary state. This change should be stored to the database by executing only one query: UPDATE users SET signature = ... WHERE user_id = .... When a user logs off, the profile is unloaded from server memory after minutes or hours of inactivity. The database here is only storage. This is what I call stateful.
Some entities are considered "static data" and loaded only once at application start. Those can be referenced from other "dynamic" entities. Loading "dynamic" entity should not require reloading referenced "static data" entity.
Update/Insert/Delete should set/insert/delete only changed properties/entities, even for "detached" entities.
Write operations should not load data from the database (perform a SELECT) beforehand to detect changes. (The state can be tracked in a dynamically generated inheritor.) I have the state locally, so there is no point in loading anything. I want to keep tracking changes even outside of a connection scope and "upload" the changes when I want.
While operations are performed, references to persisted objects should not change.
DBConnection-per-user is not going to work: we expect thousands of users online.
Entities from "static data" can be assigned to "dynamic" entity properties (which represent foreign keys), and Update should handle this correctly.
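The change-tracking requirement above can be illustrated with a small hand-rolled base class that records which attributes were assigned after loading and then emits an UPDATE for just those columns, without a preliminary SELECT. This is a Python sketch of the idea only (`Tracked`, `User` and `flush` are hypothetical names); it is not the API of NHibernate or Entity Framework, whose proxies work differently.

```python
import sqlite3


class Tracked:
    """Base class that remembers which attributes changed after loading.

    Hypothetical stand-in for a "dynamically generated inheritor".
    """

    table = None  # set by subclasses
    key = None    # primary-key attribute name

    def __init__(self, **fields):
        object.__setattr__(self, "_dirty", set())
        for name, value in fields.items():
            object.__setattr__(self, name, value)  # loading is not a change

    def __setattr__(self, name, value):
        # Mark dirty only if the value actually differs.
        if getattr(self, name, None) != value:
            self._dirty.add(name)
        object.__setattr__(self, name, value)

    def flush(self, conn):
        """Write only the changed columns; no SELECT is issued first."""
        if not self._dirty:
            return None
        cols = sorted(self._dirty)
        # Column names come from attribute names in code, never user input.
        sql = "UPDATE {} SET {} WHERE {} = ?".format(
            self.table, ", ".join(f"{c} = ?" for c in cols), self.key
        )
        params = [getattr(self, c) for c in cols] + [getattr(self, self.key)]
        with conn:
            conn.execute(sql, params)
        self._dirty.clear()
        return sql


class User(Tracked):
    table = "users"
    key = "user_id"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT, signature TEXT)")
conn.execute("INSERT INTO users VALUES (7, 'alice', 'hi')")
conn.commit()

# Kept in memory between requests; no reload needed before writing.
user = User(user_id=7, name="alice", signature="hi")
user.name = "alice"     # same value: not marked dirty
user.signature = "gg"   # changed
sql = user.flush(conn)  # UPDATE users SET signature = ? WHERE user_id = ?
```

Because the object itself is never replaced, references held by the per-user services stay valid across flushes.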
Currently I'm using NHibernate, even though it is designed for stateless applications. It supports reattaching to a session, but that looks like very uncommon usage, requires me to rely on undocumented behavior, and doesn't solve everything.
I'm not sure about Entity Framework - can I use it that way? Or can you suggest another ORM?
If the server recreates (or, worse, reloads) user objects every time a user hits a button, it will eat CPU very fast. Scaling CPU vertically is expensive and has little effect. By contrast, if you run out of RAM you can just go and buy more, which is like horizontal scaling but easier to code. If you think another approach should be used here, I'm ready to discuss it.
Yes, you can use EF for this kind of application. Keep in mind that under heavy load you will get occasional database errors, and it is typically faster to recover from errors when your application tracks changes rather than EF. By the way, you can use NHibernate this way too.
I have used Hibernate in a stateful desktop application with extremely long sessions: the session starts when the application launches and remains open for as long as the application is running. I had no problems with that. I make absolutely no use of attaching, detaching, reattaching, etc. I know it is not standard practice, but that does not mean it is not doable or that it has pitfalls. (Edit: but of course read the discussion below for possible pitfalls suggested by others.)
I have even implemented my own change notification mechanism on top of that (a separate thread polls the DB directly, bypassing Hibernate), so it is even possible to have external agents modify the database while Hibernate is running and have your application notice those changes.
If you already have lots of things working with Hibernate, it would probably not be a good idea to abandon them and rewrite everything unless you are sure that Hibernate absolutely won't do what you want to accomplish.
I know several topics on the subject have been discussed, because I have been reading a lot to try to resolve my issue, but somehow they happen to not fulfill my needs (maybe for the lack of detail). Anyway, if you think some specific 'topic' might be useful, please link it.
I'm developing a desktop application with WPF (and MVVM) and I'm using NHibernate. After researching possible ways to manage my session, I have decided to use the session-per-form approach. This way, I think I can fully use NHibernate features like lazy loading, caching and so on.
As I'm working with a database, I don't want to freeze my UI while I'm loading or saving my entities, so I thought I should use a dedicated thread (in each form, which I think simplifies the development) to handle the database interaction. The problem, though, is how I should 'reuse' the thread (supposing I have a session associated with that thread) to make my 'database calls'.
I don't think I can use the TPL, because there's no guarantee that two tasks will run on the same thread (it's not even guaranteed that they will run on a thread other than the caller's).
I would prefer session-per-form, although I have seen similar discussions that end up using session-per-conversation or something like that. But if you think session-per-conversation would be better, please tell me (and ideally explain why).
Threads don't provide a way to directly run more than one method, so I think I would have to 'listen' for requests, but I'm still unsure if I really have to do this and how I would 'use' the session (and save it) only inside the thread.
EDIT:
Maybe I'm having this problem because I'm confusing thread-safety with something else.
When the NHibernate documentation says that ISession instances are not thread-safe, does it mean that I will (or could) get into trouble if two threads attempt to use one at the same time? In my case, if I use the TPL, different threads could use the same session, but I would never perform more than one operation on the same session at the same time. Would I get into trouble in that situation?
If I may make a suggestion: desktop applications are poorly suited to interacting with the database directly. The communication is often not encrypted, and it's really easy for someone with even the slightest know-how to grab the database password, connect with a SQL client, start messing with records, and corrupt your database.
It would be better to create a web service with authentication that stands between the desktop application and the database as you could create credentials for each person and every transaction would be forcibly subjected to your various business rules.
This would also take care of your threading issue as you would be able to create HTTP connections on another thread with little to no trouble concerning session management. A cookie value is likely all that would be required and RestSharp makes this fairly trivial.
I have a data entry ASP.NET application. During one complete data entry, many transactions occur. I would like to keep track of all those transactions so that if the user wants to abandon the data entry, all the transactions I have been tracking can be rolled back.
I'm using SQL Server 2008, .NET Framework 4.0, and C#.
This is always a tough lesson to learn for people that are new to web development. But here it is:
Each round trip web request is a separate, stand-alone thread of execution
That means, simply put, that each time you submit a page request (click a button, navigate to a new page, even refresh a page), it can run on a different thread than the previous one. What's more, even if you do get the same thread twice, several other web requests may have been processed by that thread between your two requests.
This makes it effectively impossible to span simple transactions across more than one web request.
Here's another concept that you should keep in mind:
Transactions are intended for batch operations, not interactive operations.
What this means is that transactions are meant to be short-lived, and to encompass several operations executing sequentially (or simultaneously) in which all operations are atomic, and intended to either all complete, or all fail. Transactions are not typically designed to be long-lived (meaning waiting for a user to decide on various actions interactively).
Web apps are not desktop apps, and they don't function like them. You have to change your thinking when you write web apps, and the biggest lesson to learn is that each request is a stand-alone unit of execution.
Now, above I said "simple transactions", also known as lightweight or local transactions. There's also what's known as a distributed transaction, which requires a Distributed Transaction Coordinator (MSDTC is commonly used). However, distributed transactions perform much more slowly than lightweight ones, and they require the infrastructure to be set up to use a DTC.
It's possible to span a transaction over several web requests using a DTC. This is done by enlisting in a distributed transaction and then somehow sharing the transaction identifier between requests. But this is a lot of work to set up and deal with, and it has many error-prone situations. It's not something you want to do if you have other options.
In general, you're better off adding the data to a temporary table or tables, and then when the final save is done, transfer that data to the permanent tables. Another option is to maintain some state (such as using ViewState or Session) to keep track of the changes.
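The temporary-table approach can be sketched like this (Python/sqlite3; the `entry` and `entry_staging` tables and the `form_id` key, which could be e.g. the ASP.NET session ID, are hypothetical). Each request commits a short transaction into the staging table; only the final save moves the rows into the permanent table, inside a single transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entry (id INTEGER PRIMARY KEY, form_id TEXT, field TEXT, value TEXT);
    -- Hypothetical staging table: same columns, rows parked per data entry.
    CREATE TABLE entry_staging (id INTEGER PRIMARY KEY, form_id TEXT, field TEXT, value TEXT);
""")


def save_step(form_id, field, value):
    """Each web request writes to staging only; nothing permanent yet."""
    with conn:  # short transaction for this one request
        conn.execute(
            "INSERT INTO entry_staging (form_id, field, value) VALUES (?, ?, ?)",
            (form_id, field, value),
        )


def final_save(form_id):
    """Move everything for this form to the permanent table atomically."""
    with conn:
        conn.execute(
            "INSERT INTO entry (form_id, field, value) "
            "SELECT form_id, field, value FROM entry_staging WHERE form_id = ?",
            (form_id,),
        )
        conn.execute("DELETE FROM entry_staging WHERE form_id = ?", (form_id,))


def abandon(form_id):
    """The user gives up: just drop the staged rows."""
    with conn:
        conn.execute("DELETE FROM entry_staging WHERE form_id = ?", (form_id,))


save_step("f1", "name", "Ann")
save_step("f1", "city", "Oslo")
save_step("f2", "name", "Bob")
abandon("f2")
final_save("f1")
```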
One popular way of doing this is to perform operations client-side using JavaScript and then submitting all the changes to the server when you are done. This is difficult to implement if you need to navigate to different pages, however.
From your question, it appears that the transactions are already complete by the time the user chooses to roll them back. In that case I doubt the DBMS's transaction rollback semantics would still be available, so I would provide such semantics at the application layer as follows:
Any atomic operation that can be performed on the database should be encapsulated in a Command object. Each command will implement the undo method that would revert the action performed by its execute method.
Each transaction contains the list of commands that were run as part of it. The transaction is persisted as-is for later operations.
The user is given a way to view the transactions that can potentially be rolled back. When the user selects a transaction to roll back, the list of commands belonging to it is retrieved and the undo method is called on each of those command objects, in reverse order of execution.
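A minimal sketch of the command-based undo described above, in Python with sqlite3 (`InsertCommand`, `UpdateCommand`, `AppTransaction` and the `item` table are hypothetical names). Each command records what it needs to revert itself, and rolling back walks the list in reverse. Persisting the command list, which the answer also calls for, is left out for brevity.

```python
import sqlite3


class InsertCommand:
    """One database operation that knows how to revert itself."""

    def __init__(self, row_id, value):
        self.row_id, self.value = row_id, value

    def execute(self, conn):
        with conn:
            conn.execute("INSERT INTO item (id, value) VALUES (?, ?)",
                         (self.row_id, self.value))

    def undo(self, conn):
        with conn:
            conn.execute("DELETE FROM item WHERE id = ?", (self.row_id,))


class UpdateCommand:
    def __init__(self, row_id, new_value):
        self.row_id, self.new_value = row_id, new_value
        self.old_value = None  # remembered at execute time for undo

    def execute(self, conn):
        with conn:
            (self.old_value,) = conn.execute(
                "SELECT value FROM item WHERE id = ?", (self.row_id,)
            ).fetchone()
            conn.execute("UPDATE item SET value = ? WHERE id = ?",
                         (self.new_value, self.row_id))

    def undo(self, conn):
        with conn:
            conn.execute("UPDATE item SET value = ? WHERE id = ?",
                         (self.old_value, self.row_id))


class AppTransaction:
    """Application-level 'transaction': the commands run so far."""

    def __init__(self):
        self.commands = []

    def run(self, conn, command):
        command.execute(conn)
        self.commands.append(command)

    def rollback(self, conn):
        # Undo in reverse order so later changes are reverted first.
        for command in reversed(self.commands):
            command.undo(conn)
        self.commands.clear()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, value TEXT)")
tx = AppTransaction()
tx.run(conn, InsertCommand(1, "a"))
tx.run(conn, UpdateCommand(1, "b"))
tx.run(conn, InsertCommand(2, "c"))
before = conn.execute("SELECT value FROM item WHERE id = 1").fetchone()[0]
tx.rollback(conn)
```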
HTH.
You can also store them in a temporary table and move those records to your original table at a later stage.
If you are just managing transactions during a single save operation, use TransactionScope. But it doesn't sound like that is the case.
If the user may wish to abandon any number of previous save operations, it suggests that an item may exist in draft form. There might be one working draft or many. Consequently, there must be a way to promote a draft to a final version, either implicitly or explicitly. Think of how an email program saves a draft. It doesn't actually send your message, you may abandon it at any time, and you may recall it at a later time. When you send the message, you have "committed the transaction".
You might also add a user interface to rollback to a specific version.
This will be a fair amount of work, but if you are willing to save and manage multiple copies of the same item it can be accomplished.
You may save a copy of the same data in the same schema, using a status flag to indicate that it is a draft, or you might store the data in an intermediate format in separate table(s). I would prefer the first approach because it allows the same structures to be used.
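The status-flag variant might look like this (Python/sqlite3; the `message` table and helper functions are hypothetical). Every save adds a draft row, abandoning deletes the drafts, and committing promotes the newest draft to final. For simplicity this sketch discards older drafts on commit, so it does not support rolling back to a specific version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE message (id INTEGER PRIMARY KEY, doc_id INTEGER, body TEXT, "
    "status TEXT NOT NULL CHECK (status IN ('draft', 'final')))"
)


def save_draft(doc_id, body):
    """Every save adds a new draft row; the final version is untouched."""
    with conn:
        conn.execute(
            "INSERT INTO message (doc_id, body, status) VALUES (?, ?, 'draft')",
            (doc_id, body),
        )


def abandon(doc_id):
    with conn:
        conn.execute(
            "DELETE FROM message WHERE doc_id = ? AND status = 'draft'", (doc_id,)
        )


def commit_draft(doc_id):
    """'Send the email': the newest draft replaces the final version."""
    with conn:
        conn.execute(
            "DELETE FROM message WHERE doc_id = ? AND status = 'final'", (doc_id,)
        )
        conn.execute(
            "UPDATE message SET status = 'final' WHERE id = "
            "(SELECT MAX(id) FROM message WHERE doc_id = ? AND status = 'draft')",
            (doc_id,),
        )
        # Older drafts are discarded in this simplified sketch.
        conn.execute(
            "DELETE FROM message WHERE doc_id = ? AND status = 'draft'", (doc_id,)
        )


save_draft(1, "v1")
save_draft(1, "v2")
commit_draft(1)
save_draft(1, "v3")
abandon(1)
```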
A while ago, I wrote an application used by multiple users to handle trades creation.
I haven't done development for some time now, and I can't remember how I managed the concurrency between the users. Thus, I'm seeking some advice in terms of design.
The original application had the following characteristics:
One heavy client per user.
A single database.
Access to the database for each user to insert/update/delete trades.
A grid in the application reflecting the trades table, updated each time someone changes a deal.
I am using WPF.
Here's what I'm wondering:
Am I correct in thinking that I shouldn't care about the connection to the database for each application? Considering that there is a singleton in each, I would expect one connection per client with no issue.
How do I handle concurrent access? I guess I should lock when modifying the data, but I don't remember how.
How do I set up the grid to automatically update whenever my database is updated (by another user, for example)?
Thank you in advance for your help!
Consider leveraging connection pooling to reduce the number of connections. See: http://msdn.microsoft.com/en-us/library/8xx3tyca.aspx
Lock as late as possible and release as soon as possible to maximize concurrency. If you have multiple DB actions that need to go together for consistency, you can use TransactionScope (see: http://msdn.microsoft.com/en-us/library/system.transactions.transactionscope.aspx and http://blogs.msdn.com/b/dbrowne/archive/2010/05/21/using-new-transactionscope-considered-harmful.aspx), or just handle them in a stored procedure. Keep your queries simple. These tips explain how locking works and how to reduce resource contention and deadlocks: http://www.devx.com/gethelpon/10MinuteSolution/16488
I'm not sure about other databases, but for SQL Server you can use SqlDependency; see http://msdn.microsoft.com/en-us/library/a52dhwx7(v=vs.80).aspx
Concurrency control is usually provided by the DBMS using locks. Locks are a kind of semaphore that grant exclusive access to a certain resource and cause other accesses to be restricted or queued (only restricted when you use uncommitted reads).
The number of connections itself does not pose a problem as long as you stay well below your DBMS's max_connections setting. Otherwise, you might have trouble connecting to it for maintenance or to shut it down.
DBMSes usually use either table locks (MyISAM) or row locks (InnoDB and most other DBMSes). The type of lock determines its granularity. Table locks can be very fast but are usually considered inferior to row-level locks.
Row-level locks occur inside a transaction (implicit or explicit). When you start a transaction manually, you begin your transaction scope. Until you close it, all changes you make are attributed to that transaction, and they obey the ACID properties.
Transaction scope and how to use it is a topic far too long for this platform. If you want, I can post some links with more information on it.
For the automatic updates, most databases support some kind of trigger mechanism: code that runs on specific actions in the database (for instance, the creation or modification of a record). You could put your code inside such a trigger. However, the trigger should only inform a receiving application of the changes, not actually "do" the work itself, even if the language makes that possible. Remember that the action that fired the trigger is suspended until your trigger code finishes, so a lean trigger is best, if one is needed at all.
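A sketch of the "lean trigger" idea, using SQLite triggers from Python as a stand-in (the `trade` and `change_log` tables and `ChangePoller` are hypothetical). The triggers only append the changed row's ID to a log table; each client polls that log for entries newer than the last one it saw and refreshes those rows in its grid. Deletes are not logged in this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE trade (id INTEGER PRIMARY KEY, qty INTEGER);
    -- The triggers stay lean: they only record *that* a row changed.
    CREATE TABLE change_log (seq INTEGER PRIMARY KEY AUTOINCREMENT, trade_id INTEGER);
    CREATE TRIGGER trade_ins AFTER INSERT ON trade
        BEGIN INSERT INTO change_log (trade_id) VALUES (NEW.id); END;
    CREATE TRIGGER trade_upd AFTER UPDATE ON trade
        BEGIN INSERT INTO change_log (trade_id) VALUES (NEW.id); END;
""")


class ChangePoller:
    """Client side: periodically ask 'what changed since seq N?'."""

    def __init__(self, conn):
        self.conn = conn
        self.last_seq = 0

    def poll(self):
        rows = self.conn.execute(
            "SELECT seq, trade_id FROM change_log WHERE seq > ? ORDER BY seq",
            (self.last_seq,),
        ).fetchall()
        if rows:
            self.last_seq = rows[-1][0]
        return sorted({trade_id for _, trade_id in rows})  # rows to refresh


poller = ChangePoller(conn)
with conn:
    conn.execute("INSERT INTO trade VALUES (1, 10)")
    conn.execute("INSERT INTO trade VALUES (2, 5)")
first = poller.poll()
with conn:
    conn.execute("UPDATE trade SET qty = 20 WHERE id = 2")
second = poller.poll()
third = poller.poll()
```

In the WPF client the poll would run on a timer off the UI thread; with SQL Server, SqlDependency can push the notification instead of polling.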
I need to design a real-time product stock management engine (C# & WCF), but I don't know how to handle concurrent access and data integrity.
Here are some of the features the engine should handle:
Stock Incoming products
Order preparation
Move products from one place to another
...
Should I use MSMQ to ensure a correct stock count (messages processed in order by polling the queue), or should I use application-level thread locking?
Note that my application has to be real-time: the preparer has to know at any moment how many products are in stock. If products are missing at picking, he can send a "request" to an operator.
Use a SQL database. They are already designed with data integrity, concurrency and data storage in mind.
You should probably use a SQL database, as Lee says. If you use a transaction to, e.g., store an order and decrease the available product counts (both in the same transaction), the database guarantees atomicity. You probably also want some kind of concurrency mechanism (like a row version) to prevent lost updates (the 1st process reads, the 2nd process updates the same value, then the 1st process updates too, overwriting the previous update based on outdated values).
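The order-plus-stock transaction could look like this (Python/sqlite3; the `stock` and `order_line` tables and `place_order` are hypothetical). Instead of a separate row-version column, this sketch uses a conditional UPDATE that checks the available quantity in the same statement that decrements it, which also rules out the lost-update case; if the check fails, the exception rolls back the order insert as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stock (product_id INTEGER PRIMARY KEY, qty INTEGER NOT NULL CHECK (qty >= 0));
    CREATE TABLE order_line (id INTEGER PRIMARY KEY, product_id INTEGER, qty INTEGER);
    INSERT INTO stock VALUES (1, 5);
""")


class OutOfStock(Exception):
    pass


def place_order(product_id, qty):
    """Store the order line and decrement stock in one transaction."""
    with conn:  # commits on success, rolls back on any exception
        conn.execute(
            "INSERT INTO order_line (product_id, qty) VALUES (?, ?)",
            (product_id, qty),
        )
        # Check and write in one statement, so no stale read can
        # overwrite a concurrent change.
        cur = conn.execute(
            "UPDATE stock SET qty = qty - ? WHERE product_id = ? AND qty >= ?",
            (qty, product_id, qty),
        )
        if cur.rowcount != 1:
            raise OutOfStock(product_id)  # also undoes the INSERT above


place_order(1, 3)
try:
    place_order(1, 3)  # only 2 left
    ok = True
except OutOfStock:
    ok = False
```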
Well, the scenario you describe is generally one where you have to use a queue rather than persistent storage to meet the throughput needs. Searching the net, you can find many case studies where people have employed queuing systems to increase the throughput of their systems. SQL Server simply cannot scale to those levels.
In the special case where you need to make your queue persistent, very specific methods are used to mitigate the resulting performance cost. For example, Apache ActiveMQ has its own dedicated file storage system, which performs much better than simply using MySQL for backend persistence. MSMQ probably offers a similar option, but I'm not sure.
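To illustrate the queue-based design, here is an in-process Python sketch in which every stock change is a message and a single consumer thread applies them strictly in order. It is not MSMQ or ActiveMQ: the `StockEngine` class is hypothetical and nothing is persisted, so messages are lost if the process dies. The point is only that with one consumer, the stock counts need no locks.

```python
import queue
import threading


class StockEngine:
    """Hypothetical in-process engine: one consumer applies every change."""

    def __init__(self):
        self.levels = {}    # product -> quantity; mutated only by the consumer
        self.rejected = []  # picks that could not be served
        self._inbox = queue.Queue()
        self._worker = threading.Thread(target=self._consume, daemon=True)
        self._worker.start()

    def post(self, kind, product, qty):
        """Safe to call from any thread; just enqueues a message."""
        self._inbox.put((kind, product, qty))

    def stop(self):
        self._inbox.put(None)
        self._worker.join()

    def _consume(self):
        while True:
            msg = self._inbox.get()
            if msg is None:
                return
            kind, product, qty = msg
            current = self.levels.get(product, 0)
            if kind == "receive":
                self.levels[product] = current + qty
            elif kind == "pick" and current >= qty:
                self.levels[product] = current - qty
            else:
                # Not enough stock (or unknown message): e.g. notify an operator.
                self.rejected.append(msg)


engine = StockEngine()
engine.post("receive", "bolt", 10)
pickers = [
    threading.Thread(target=engine.post, args=("pick", "bolt", 3))
    for _ in range(4)
]
for t in pickers:
    t.start()
for t in pickers:
    t.join()
engine.stop()
```

Four concurrent picks of 3 against 10 in stock leave 1, and exactly one pick is rejected, because the consumer sees the messages one at a time.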