I'm considering using a CLR trigger instead of a traditional T-SQL one because I need some logic that is already implemented in C#. I'm aware that SQL Server supports CLR integration, and in my case it seems like a solution that's worth a shot.
However, the operations I want to perform can be somewhat slow. Not slow enough to rule out using them in triggered actions completely, but probably noticeably slow when inserting hundreds of thousands of records. The slowest part could benefit greatly from caching; I expect very few cache misses and thousands of cache hits. This all leads to a question: can CLR triggers have any state? And, more importantly, what is the life cycle of that state?
I suppose I could use static fields of the trigger class to hold some state, but I have no idea when they get initialized (when the server starts? At transaction start? Not specified?). I'm not sure it's a safe route, so I'm asking what the common practices for keeping state in CLR triggers are (if any).
To avoid confusion: I need to cache CLR objects, not the results of SQL queries, so this is not about how good SQL Server itself is at caching; I want to cache data that doesn't belong to the database. Also, I'm not considering CLR because I can't do string manipulation and bounds checking in T-SQL. I need to execute logic that is implemented in a CLR class library with a lot of dependencies. Whether I should use triggers in this case is another question that has almost nothing to do with this one.
Many thanks in advance.
PS: I'd appreciate any comments and insights on the topic, even ones that don't answer my question directly, but please don't make it all about "triggers are evil and shouldn't ever be used" and "CLR integration is slow and a major compatibility pain". Also, I know this may scream "premature optimization" to some, but at the moment I just want to know what my optimization options are going in, since I'm new to CLR integration in SQL Server. I won't optimize unless profiling results suggest it, but I don't want to implement the whole thing only to realize it's too slow and there is nothing I can do about it.
I use SQL Server 2008 and .NET 3.5.
While it is possible to use static class fields in the SQLCLR Trigger class to cache values, there are several things you need to be very cautious about:
How much data do you plan on caching? You don't want to take up too much memory that SQL Server should instead be using for queries.
There is a single AppDomain per Database per Assembly Owner (i.e. AUTHORIZATION on the Assembly). This means that the code in any particular Assembly is shared across all SQL Server Sessions (i.e. SPIDs). If the data is just lookup data that won't change based on which process is interacting with the static field, then this is fine. But if the data is different per process, then this will produce "odd" behavior unless you associate a value such as the current TransactionID with the process.
If the data is per process, assuming you find a way to differentiate each particular SPID / SESSION, how are you going to clean up the old data? It will exist in memory until explicitly removed or the AppDomain is unloaded. This is not a problem for common lookup data that is meant to be shared with everyone as that type of data doesn't increase with each new process. But per-process data will continually increase unless cleared out.
AppDomains can be unloaded at any time and for a variety of reasons (memory pressure, drop/recreate of the Assembly, security change related to the Assembly, security change related to the DB, running DBCC FREESYSTEMCACHE('ALL'), etc). If the data being cached can cause different outcomes between sequential processes if one process relies upon data cached by a prior process, then this cannot be guaranteed to work. If the cache being dropped between processes results in nothing more than the need to reload the cache, then it should be fine.
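A minimal sketch of such a cache, assuming the cached objects are shared lookup data that can be rebuilt at any moment. LookupCache, TriggerCaches and Normalize are hypothetical; Normalize stands in for the expensive call into your own class library. It guards a plain Dictionary with a lock (.NET 3.5 has no concurrent collections), caps its size so it cannot grow without bound, and treats an AppDomain unload as nothing worse than a cold cache. As far as I know, SQL Server refuses SAFE and EXTERNAL_ACCESS assemblies that store to non-readonly static fields, which is why the shared instance sits in a static readonly field; check the permission-set restrictions for your setup before relying on this.

```csharp
using System;
using System.Collections.Generic;

// Shared, read-mostly cache for data that can be rebuilt at any time.
// Every session firing the trigger runs in the same AppDomain, so all
// dictionary access goes through one lock.
public sealed class LookupCache<TKey, TValue>
{
    private readonly object _sync = new object();
    private readonly Dictionary<TKey, TValue> _items = new Dictionary<TKey, TValue>();
    private readonly Func<TKey, TValue> _factory;
    private readonly int _maxItems;

    public LookupCache(Func<TKey, TValue> factory, int maxItems)
    {
        if (factory == null) throw new ArgumentNullException("factory");
        _factory = factory;
        _maxItems = maxItems;
    }

    public TValue Get(TKey key)
    {
        TValue value;
        lock (_sync)
        {
            if (_items.TryGetValue(key, out value))
                return value; // cache hit
        }

        // Cache miss: build outside the lock so a slow factory does not block
        // other sessions. Two sessions may build the same key concurrently;
        // the first one stored wins, which is fine for deterministic data.
        value = _factory(key);

        lock (_sync)
        {
            TValue existing;
            if (_items.TryGetValue(key, out existing))
                return existing;
            // Crude size cap: start over instead of growing without bound,
            // so the cache does not compete with SQL Server for memory.
            if (_items.Count >= _maxItems)
                _items.Clear();
            _items[key] = value;
            return value;
        }
    }

    public int Count
    {
        get { lock (_sync) { return _items.Count; } }
    }
}

public static class TriggerCaches
{
    // readonly: only the cache contents change, never the field itself.
    public static readonly LookupCache<string, string> Normalized =
        new LookupCache<string, string>(Normalize, 10000);

    // Hypothetical stand-in for the expensive call into your class library.
    private static string Normalize(string input)
    {
        return input.Trim().ToUpperInvariant();
    }
}
```

The trigger method would then simply call TriggerCaches.Normalized.Get(...) for each row it processes.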
Other notes (but nothing to be cautious about):
AppDomains are loaded when the first method in an Assembly is called and there is no currently running AppDomain for the Database that the Assembly exists in and the User that owns that Assembly (its AUTHORIZATION).
AppDomains will remain loaded until SQL Server unloads them for one of the reasons noted above, but none of those scenarios will necessarily occur. In other words, an AppDomain can remain loaded for a very long time (e.g. until a server / service restart).
Each Assembly is loaded the first time a method inside of it is referenced.
To make use of the loading event, you can place code in the class's static constructor. Just be aware that there is no SqlContext available there, so you can't open any SqlConnection in a static constructor that uses the in-process Context Connection (i.e. Context Connection = true).
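A sketch of using the static constructor as that load hook, under the constraint above that no context connection is available there. ReferenceData and its sample values are hypothetical and hard-coded, standing in for data obtained from somewhere other than the context connection.

```csharp
using System;
using System.Collections.Generic;

// The static constructor acts as the "AppDomain loaded" hook: it runs once per
// AppDomain, the first time the class is touched, and runs again after SQL Server
// unloads and reloads the AppDomain. No SqlContext exists at this point, so the
// data must come from somewhere other than the context connection.
public static class ReferenceData
{
    private static readonly Dictionary<string, decimal> Rates;
    public static readonly DateTime LoadedAtUtc;

    static ReferenceData()
    {
        Rates = new Dictionary<string, decimal>(StringComparer.OrdinalIgnoreCase);
        Rates.Add("EUR", 1.00m); // hypothetical sample values
        Rates.Add("USD", 1.10m);
        LoadedAtUtc = DateTime.UtcNow;
    }

    // The dictionary is never modified after the static constructor finishes,
    // so concurrent reads from many sessions are safe without a lock.
    public static bool TryGetRate(string code, out decimal rate)
    {
        return Rates.TryGetValue(code, out rate);
    }
}
```

One caveat: if a static constructor throws, the type stays unusable (TypeInitializationException) for the rest of that AppDomain's life, so it pays to keep this code simple.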
I need an ORM that is suitable for a stateful application. I'm going to keep entities in memory between requests in a low-latency, real-time game server with persistent client connections. Only one server instance is connected to the database, so no data can be changed from "outside" and the server can rely on its cache.
When a user remotely logs in to the server, their whole profile is loaded into server memory. Several higher-level services are also created for each user to operate on profile data and provide functionality. They can also have internal fields (state) to store temporary data. When a user wants to change their signature, they ask the corresponding service to do so. The service tracks how frequently the user changes their signature and allows it only once per ten minutes (for example); such a short interval is not tracked in the DB, it is temporary state. This change should be stored to the DB by executing only one query: UPDATE users SET signature = ... WHERE user_id = .... When a user logs off, their data is unloaded from server memory after minutes/hours of inactivity. The DB here is only storage. This is what I call stateful.
Some entities are considered "static data" and are loaded only once at application start. These can be referenced from other "dynamic" entities. Loading a "dynamic" entity should not require reloading the "static data" entities it references.
Update/Insert/Delete should set/insert/delete only changed properties/entities, even for "detached" entities.
Write operations should not load data from the database (perform a SELECT) beforehand to detect changes. (State can be tracked in a dynamically generated subclass.) I have the state locally, so there is no point in loading anything. I want to keep tracking changes even outside of a connection scope and "upload" the changes when I choose.
Performing operations should not change the references of persisted objects.
A DB connection per user is not going to work; thousands of users are expected online.
Entities from "static data" can be assigned to "dynamic" entity properties (which represent foreign keys), and Update should handle this correctly.
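The "track changes locally, upload when I want" requirement can be sketched by hand to show what the ORM would have to do. UserProfile, BuildUpdate and AcceptChanges are hypothetical names; an ORM would typically generate this bookkeeping (for example in a proxy subclass). The entity records which properties actually changed, and BuildUpdate produces a parameterized UPDATE for only those columns without issuing a SELECT first.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hand-written change tracking (hypothetical); an ORM would generate the equivalent.
public class UserProfile
{
    private readonly HashSet<string> _dirty = new HashSet<string>();
    private string _name;
    private string _signature;

    public UserProfile(int id, string name, string signature)
    {
        Id = id;
        _name = name;           // initial values come from the DB, so they are not dirty
        _signature = signature;
    }

    public int Id { get; private set; }

    public string Name
    {
        get { return _name; }
        set { if (_name != value) { _name = value; _dirty.Add("name"); } }
    }

    public string Signature
    {
        get { return _signature; }
        set { if (_signature != value) { _signature = value; _dirty.Add("signature"); } }
    }

    // Builds a parameterized UPDATE for the changed columns only; no SELECT is needed
    // because the current values are already in memory. Returns null if nothing changed.
    public string BuildUpdate(IDictionary<string, object> parameters)
    {
        if (_dirty.Count == 0) return null;
        var assignments = new List<string>();
        foreach (string column in _dirty.OrderBy(c => c, StringComparer.Ordinal))
        {
            assignments.Add(column + " = @" + column);
            parameters["@" + column] = column == "name" ? _name : _signature;
        }
        parameters["@user_id"] = Id;
        return "UPDATE users SET " + string.Join(", ", assignments.ToArray()) +
               " WHERE user_id = @user_id";
    }

    // Call only after the UPDATE has been committed, so a failed write keeps the changes.
    public void AcceptChanges()
    {
        _dirty.Clear();
    }
}
```

Keeping AcceptChanges separate from BuildUpdate means a failed write leaves the changes pending for the next attempt, which also matters for the "DB errors under heavy load" point raised in the answer.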
Currently I'm using NHibernate, even though it's designed for stateless applications. It supports reattaching to a session, but that looks like very uncommon usage, requires me to rely on undocumented behavior, and doesn't solve everything.
I'm not sure about Entity Framework - can I use it that way? Or can you suggest another ORM?
If the server recreates (or, worse, reloads) user objects every time a user hits a button, it will eat CPU very fast. Scaling CPU vertically is expensive and has little effect. By contrast, if you run out of RAM you can just go and buy more; it's like horizontal scaling but easier to code. If you think another approach should be used here, I'm ready to discuss it.
Yes, you can use EF for this kind of application. Keep in mind that under heavy load you will get DB errors from time to time, and it's typically faster to recover from errors when your application tracks changes rather than EF. By the way, you can use NHibernate this way too.
I have used hibernate in a stateful desktop application with extremely long sessions: the session starts when the application launches, and remains open for as long as the application is running. I had no problems with that. I make absolutely no use of attaching, detaching, reattaching, etc. I know it is not standard practice, but that does not mean it is not doable, or that there are any pitfalls. (Edit: but of course read the discussion below for possible pitfalls suggested by others.)
I have even implemented my own change notification mechanism on top of that (a separate thread polling the DB directly, bypassing Hibernate), so it is even possible to have external agents modify the database while Hibernate is running and have your application take notice of these changes.
If you have lots and lots of stuff already working with hibernate, it would probably not be a good idea to abandon what you already have and rewrite it unless you are sure that hibernate absolutely won't do what you want to accomplish.
I'm facing the following situation:
A system I'm working on has a few different parts (services and ASP.NET) with separate responsibilities. These parts share two resources: an MSSQL DB and files on a Windows file system.
Currently all these parts access these resources individually. I think this is causing unpredictability and inconsistency.
I'm thinking of introducing a service that regulates access to these resources. I'm not sure if this is an accepted design principle.
The general question is:
What kind of solution should I be looking at and what should I keep in mind when designing this?
Specific questions:
Is this just a Data Access Layer?
Is it bad to introduce a SPOF like this?
Can you recommend any reading material aimed at this kind of solution? (especially if there's specific material for C#)
edit because of a great question by allen-smithee:
The database is currently accessed through embedded queries. They are separated into a class, but these classes are different for every service, so it's not a shared library.
1/ A Data Access Layer simply encapsulates the data logic; what you need is concurrency control to ensure consistency of your data model across the independent services.
2/ Depending how you implement concurrency it can be a single point of failure but I don't think there is anything wrong with that - "plan for failure" is a great design mantra. You can build in redundancy and fail-over mechanisms, or you can distribute your concurrency control across your services.
3/ The way you choose to implement concurrency will depend on how your application functions and what your users expect. To give some specific scenarios:
Scenario A
When a service begins an update start a transaction and take out one or more row-level locks for the records involved. If any other service tries to edit the record at the same time either block or return an error such as 'this record is currently locked'. Note that all locks have to be taken before reading and kept for the duration of the update to ensure consistency with other writes.
Pros - Fairly straightforward to implement for small data models. MSSQL supports plenty of locking scenarios and even custom application locks that you can use to group resources.
Cons - If your transaction needs to access multiple tables/rows and different services or functions access overlapping tables you can easily get into all sorts of deadlock problems.
MSSQL generally prefers pessimistic locking and can escalate locks from row to page and table level, which means read and write locks may behave in ways you wouldn't initially expect. You may need to spend a considerable amount of time debugging these interactions in SQL Server Profiler and be prepared to make changes to your data model to work around these issues.
Scenario B
Each table row has an incremental version number. A service reads the data it needs and performs a series of updates; then, within the transaction, it takes a lock and checks the current row version against the one it read. If the version numbers do not match, it rolls back the transaction, cancelling the update. The service may then attempt the operation again, starting by re-reading the data.
Pros - Readers are not blocked and the lock is held only very briefly while the service tries to commit the update. MSSQL has built-in support for this concurrency method in the form of 'Row Versioning' with the 'Snapshot Isolation' level. If conflicts are rare this method can be extremely responsive - perfect for real-time applications.
Cons - This method may require significant changes to your data model and the service behaviour.
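The version check at the heart of Scenario B can be sketched without a database. In this self-contained simulation, VersionedStore is a hypothetical stand-in for a table with a version column: TryUpdate mirrors UPDATE ... SET Version = Version + 1 WHERE Id = @id AND Version = @expected, where zero affected rows means another service committed first. OptimisticUpdater then re-reads and retries, as the scenario describes.

```csharp
using System;
using System.Collections.Generic;

// Immutable snapshot of one row, so handing it out is safe.
public sealed class VersionedRow
{
    public VersionedRow(string value, long version) { Value = value; Version = version; }
    public string Value { get; private set; }
    public long Version { get; private set; }
}

// Hypothetical in-memory stand-in for a table with a version column.
public sealed class VersionedStore
{
    private readonly object _sync = new object();
    private readonly Dictionary<int, VersionedRow> _rows = new Dictionary<int, VersionedRow>();

    public void Insert(int id, string value)
    {
        lock (_sync) { _rows.Add(id, new VersionedRow(value, 1)); }
    }

    public VersionedRow Read(int id)
    {
        lock (_sync) { return _rows[id]; }
    }

    // Conditional write: succeeds only if nobody changed the row since it was read.
    public bool TryUpdate(int id, long expectedVersion, string newValue)
    {
        lock (_sync)
        {
            if (_rows[id].Version != expectedVersion) return false; // "0 rows affected"
            _rows[id] = new VersionedRow(newValue, expectedVersion + 1);
            return true;
        }
    }
}

public static class OptimisticUpdater
{
    // Read, compute, try to commit; on conflict, start again from a fresh read.
    public static bool Update(VersionedStore store, int id, Func<string, string> change, int maxAttempts)
    {
        for (int attempt = 0; attempt < maxAttempts; attempt++)
        {
            VersionedRow snapshot = store.Read(id);
            string updated = change(snapshot.Value);
            if (store.TryUpdate(id, snapshot.Version, updated)) return true;
        }
        return false; // give up and let the caller report the conflict
    }
}
```

Readers never wait on writers here; the only serialized step is the short conditional write, which is where the "extremely responsive when conflicts are rare" property comes from.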
Scenario C
A single data service is responsible for all data access. Other services request data from and submit updates to this service. The service is responsible for reading and writing to the database and filesystem, and performs some level of data integrity checking and resolves data conflicts.
Pros - Encapsulates data integrity and control in one module, simplifying other services. Allows you to implement caching, locking, etc. at the application level, providing finer-grained control.
Cons - Significant changes to existing architecture required. Resolving data conflicts can require a significant amount of code if you choose to resolve at the field level. Services will need to be able to handle a rejected update when resolution is not possible.
Those are the major scenarios I can think of off the top of my head, but there are plenty more. Generally, all concurrency control for data revolves around locking while performing an action (pessimistic locking), performing an action and then checking for a conflict (optimistic locking via versioning), or performing an action and then merging conflicts (conflict resolution).
Thinking about your specific data model and how the model is updated will guide which mix of these techniques you use. Searching for any of the terms above will give you plenty to read, and there are a lot of TechNet articles that specifically address these issues in an MSSQL context. Take heart: I've seen good programmers get this stuff wrong. It really is a challenging problem, but it is solvable if you work through it methodically.
In my current setup I have a dedicated AppFabric server. Most of the objects stored there are reference objects, which means most operations are 'Get' operations. Therefore I've considered using LocalCache.
Unfortunately, recently I experienced problems with the availability of the cache server resulting from various network issues. The application server continues to work directly with the DB in these cases thanks to a provider I've written. However, it has a very large impact on performance as expected.
I want to be able to use some kind of a local cache for the highly referenced objects, even when the cache server is down. For this purpose I've considered using the MemoryCache of .Net 4. I don't really care about the objects being stale and I rely on a timeout eviction policy, therefore I don't worry about synchronization between the application servers.
I'd like to hear what you think about this solution:
- Are there any other points I should consider?
- Is there a better solution to provide fast access for highly referenced objects even when the cache server is down?
AppFabric's LocalCache is a client cache, local and in-proc to the client application, which stores references to frequently used data so the application does not need to deserialize the same object again. However, since LocalCache works together with the cache server, it will not work if the cache server is down.
One possible solution to your problem is, as you mentioned, an independent client cache, so that even if the cache server goes down, the client cache is still available.
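A minimal sketch of such an independent client cache, assuming entries may be stale for up to a fixed timeout (as the question accepts). FallbackReader is hypothetical, and the remote, database and clock delegates stand in for the AppFabric client, the existing DB provider and DateTime.UtcNow. .NET 4's MemoryCache with an absolute expiration could replace the hand-rolled dictionary; the catch block is deliberately broad for the sketch, and real code should catch the cache client's specific exception type.

```csharp
using System;
using System.Collections.Generic;

// Read path: distributed cache -> local copy -> database.
public sealed class FallbackReader<T> where T : class
{
    private struct Entry
    {
        public T Value;
        public DateTime ExpiresUtc;
    }

    private readonly object _sync = new object();
    private readonly Dictionary<string, Entry> _local = new Dictionary<string, Entry>();
    private readonly Func<string, T> _remote;   // hypothetical: distributed cache lookup
    private readonly Func<string, T> _database; // hypothetical: existing DB provider
    private readonly TimeSpan _ttl;
    private readonly Func<DateTime> _utcNow;     // injected clock, e.g. () => DateTime.UtcNow

    public FallbackReader(Func<string, T> remote, Func<string, T> database,
                          TimeSpan ttl, Func<DateTime> utcNow)
    {
        _remote = remote;
        _database = database;
        _ttl = ttl;
        _utcNow = utcNow;
    }

    // Returns a shared instance: callers must treat it as read-only.
    public T Get(string key)
    {
        DateTime now = _utcNow();
        lock (_sync)
        {
            Entry entry;
            if (_local.TryGetValue(key, out entry) && entry.ExpiresUtc > now)
                return entry.Value; // possibly stale, but within the accepted timeout
        }

        T value = null;
        try
        {
            value = _remote(key);
        }
        catch (Exception)
        {
            // Cache server unreachable: fall through to the database.
        }
        if (value == null)
            value = _database(key);

        lock (_sync)
        {
            Entry fresh = new Entry();
            fresh.Value = value;
            fresh.ExpiresUtc = now + _ttl;
            _local[key] = fresh;
        }
        return value;
    }
}
```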
When relying on an in-proc cache, keep in mind that in-process caches store references to the cached objects. If your application modifies an object after getting it from the cache, it will be modified in the cache as well. Also, if multiple threads may end up modifying the same cached item, you will need thread synchronization for such objects.
However, even with an independent client cache, your application may end up hitting the database frequently, since data in the client cache of one application server is not accessible to other servers.
A better solution might be replicated cache servers, where each server holds all cached data. This not only improves get performance for reference data but also eliminates the single point of failure you ran into.
If AppFabric is not a hard requirement for your application, you may look into NCache for better scalability and high availability.
Did you consider AppFabric's local cache feature? Or is it not suitable for you?
In my client-server architecture I have a few API functions whose usage needs to be limited.
Server is written in .net C# and it is running on IIS.
Until now I didn't need to perform any synchronization. The code was written so that even if a client sent the same request multiple times (e.g. a "create something" request), one call would succeed and all the others would fail (because of the server code + DB structure).
What is the best way to enforce such limits? For example, I want no more than one call of the API method foo() per user per minute.
I thought about a SynchronizationTable with just one column, unique_text; before computing a foo() call I'd write something like foo{userId}{date}{HH:mm} to this table. If the insert succeeds, I know there wasn't a foo call from that user in the current minute.
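A sketch of the key format described above, assuming one row per method, user and UTC minute. MinuteKeys and InMemoryKeyTable are hypothetical; the HashSet only simulates a table with a UNIQUE index (HashSet.Add returns false on a duplicate, the way a second INSERT of the same key would fail with a duplicate-key error). Note that this is a fixed window: calls at 10:05:59 and 10:06:00 fall into different minutes, so both are allowed.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class MinuteKeys
{
    // One key per method, user and minute. UTC and the invariant culture make
    // every server produce the same key for the same moment.
    public static string Build(string method, Guid userId, DateTime utcNow)
    {
        return method + ":" + userId.ToString("N") + ":" +
               utcNow.ToString("yyyyMMddHHmm", CultureInfo.InvariantCulture);
    }
}

// Hypothetical stand-in for SynchronizationTable with a UNIQUE index on unique_text.
public sealed class InMemoryKeyTable
{
    private readonly object _sync = new object();
    private readonly HashSet<string> _keys = new HashSet<string>();

    // false plays the role of the duplicate-key error: the limit was already hit.
    public bool TryInsert(string key)
    {
        lock (_sync) { return _keys.Add(key); }
    }
}
```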
I think there is a much better way, probably in server code, without using the DB. Of course, there could be thousands of users calling foo.
To clarify what I need: I think it could be some light DictionaryMutex.
For example:
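private static readonly DictionaryMutex FooLock = new DictionaryMutex(); // imagined per-key mutex
FooLock.Lock(User.GUID); // wait until no other foo() call for this user is running ("lock" is a C# keyword, so the method needs another name)
try
{
    // ... body of foo() ...
}
finally
{
    FooLock.Unlock(User.GUID); // always release, even if foo() throws
}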
EDIT:
Solution in which one user cannot call foo twice at the same time is also sufficient for me. By "at the same time" I mean that server started to handle second call before returning result for first call.
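A minimal sketch of the relaxed requirement from the edit (reject a second foo() for the same user while the first is still running), assuming a single IIS worker process. UserCallGuard, TryEnter, Exit and TryRun are hypothetical names. Unlike the imagined DictionaryMutex, it never blocks: a concurrent second call is simply refused. The state lives only in this process, so it is lost on recycle and not shared between servers.

```csharp
using System;
using System.Collections.Generic;

public sealed class UserCallGuard
{
    private readonly object _sync = new object();
    private readonly HashSet<Guid> _inFlight = new HashSet<Guid>();

    // Returns false instead of blocking if the user already has a call in progress.
    public bool TryEnter(Guid userId)
    {
        lock (_sync) { return _inFlight.Add(userId); }
    }

    public void Exit(Guid userId)
    {
        lock (_sync) { _inFlight.Remove(userId); }
    }

    // Runs the action only if no other call for this user is running;
    // the slot is always released, even if the action throws.
    public bool TryRun(Guid userId, Action action)
    {
        if (!TryEnter(userId)) return false;
        try
        {
            action();
            return true;
        }
        finally
        {
            Exit(userId);
        }
    }
}
```

In foo(), the whole body would go inside a static guard's TryRun(User.GUID, ...), returning an error to the client when it yields false.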
Note, that keeping this state in memory in an IIS worker process opens the possibility to lose all this data at any instant in time. Worker processes can restart for any number of reasons.
Also, you probably want to have two web servers for high availability. Keeping the state inside of worker processes makes the application no longer clustering-ready. This is often a no-go.
Web apps really should be stateless. Many reasons for that. If you can help it, don't manage your own data structures like suggested in the question and comments.
Depending on how big the call volume is, I'd consider these options:
SQL Server. Your queries are extremely simple and easy to optimize for. Expect thousands of such queries per second per CPU core. This can bear a lot of load. You can use SQL Server Express for free.
A specialized store like Redis. Stack Overflow is using Redis as a persistent, clustering-enabled cache. A good idea.
A distributed cache, like Microsoft Velocity. Or others.
This storage problem is rather easy because it fits a key/value store model well. And the data is near worthless so you don't even need to backup.
I think you're overestimating how costly this rate limitation will be. Your web-service is probably doing a lot more costly things than a single UPDATE by primary key to a simple table.
I am about to develop a Windows service in C#. This service needs to keep track of events in the system and write some data to files from time to time. These ongoing events form a certain state, so I'll keep the state in memory and update it as events arrive. I don't want to over-complicate things, so I don't want the state to be persisted on disk, but I'm wondering if I could somehow make it persistent in memory, so that if the service crashes (and is automatically restarted by Windows) it could pick up where it left off and carry on (possibly losing some events, not a big deal).
I was thinking along the line of creating a "shared" memory area, thus letting Windows manage it, and using it only in the service - but I'm not sure that object will persist after the service dies.
Any ideas?
EDIT: I'm not looking for an overkill solution. The data is somewhat important, so I'd like to keep it waiting in memory until the service is restarted, but it's not too important. It's more of a nice-to-have feature if I can persist the data easily, without working with files, external third-party processes and so on. My ideal solution would be a simple built-in feature (in .NET or in Windows) that provides some in-memory persistence, just to recover from a crash.
You can use the Caching Application Block from the Microsoft Enterprise Library with a persistent backing store.
It is configurable, and you can use various backing stores such as a database or isolated storage.
I know you said that you don't want to over-complicate things by persisting to disk, but persisting to shared memory, or any of the other solutions listed here, is definitely going to be much more complicated. The reason so many applications use databases or file storage is that it's the simplest solution.
I would recommend you keep all the state in a single object or object hierarchy, serialize this object to XML and write it to a file. It really doesn't get much simpler than that.
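A minimal sketch of that approach, with ServiceState and StateFile as hypothetical names and fields. XmlSerializer needs a public type with a public parameterless constructor. Writing to a temporary file first and then swapping it in means a crash in the middle of a save leaves the previous snapshot intact.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// All service state in one serializable object (hypothetical fields).
public class ServiceState
{
    public DateTime LastEventUtc { get; set; }
    public List<string> PendingItems { get; set; }

    public ServiceState()
    {
        PendingItems = new List<string>();
    }
}

public static class StateFile
{
    private static readonly XmlSerializer Serializer = new XmlSerializer(typeof(ServiceState));

    public static void Save(ServiceState state, string path)
    {
        string temp = path + ".tmp";
        using (FileStream stream = File.Create(temp))
        {
            Serializer.Serialize(stream, state);
        }
        // Swap the finished snapshot in; the old file is never half-written.
        if (File.Exists(path))
            File.Replace(temp, path, null);
        else
            File.Move(temp, path);
    }

    // Returns a fresh state when no snapshot exists yet (first start).
    public static ServiceState Load(string path)
    {
        if (!File.Exists(path)) return new ServiceState();
        using (FileStream stream = File.OpenRead(path))
        {
            return (ServiceState)Serializer.Deserialize(stream);
        }
    }
}
```

The service would call Load in OnStart and Save whenever the state changes (or on a timer), which matches the "possibly losing some events" tolerance in the question.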
You could use Memcached, or Redis (which also persists its data on disk, but handles that automatically).
http://code.google.com/p/redis/
You could also take a look at this question:
Memcached with Windows and .NET
I don't see why it'd be harder to persist to disk.
Using db4o, you can persist the instances you are already working with.
How about using isolated storage and persisting the object into memory that way?
Even if, for instance, you keep the data in the shared memory of some other networked PC, how would you "guarantee" that the networked PC won't hang/restart/halt/etc.? In that case your service would lose the persisted data anyway.
I would suggest storing the data on the same disk, and chances are that's where you'll end up anyway.
Note that, because memory (RAM) is volatile, you cannot reload data that was there before a system restart unless you use some mechanism to store it on disk and reload it from there.
--EDIT--
In that case, how about using MSMQ? So you can push everything over the queue, and even if your service gets a restart, it would look for the items in the queue and continue onwards.