RESTful Webservice with embedded IronPython: engine & scope questions


I have a RESTful C# web service (using OpenRasta) that I want to run IronPython scripts that talk to a CouchDB.
One thing I could use some clarification on is: how often do I need a new instance of the Python engine and the scope? One each per application? Per session? Per request?
I currently have a static engine at the application level along with a dictionary of compiled scripts; then, per request, I create a new scope and execute the code within that scope...
Is that correct? Thread safe? And as performant as it could be?
EDIT (regarding the bounty): Please also answer the question I posed in reply to Jeff: will a static instance of the engine cause sequential requests from different clients to wait in line to execute? If so, I will probably need everything on a per-request basis.

A ScriptRuntime/ScriptEngine per application and a Scope per request is exactly how it should be done. Runtimes/Engine are thread-safe and Scopes are not.
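The layout described in the question and confirmed above (one shared engine, a cache of compiled scripts, one scope per request) could be sketched roughly as follows. This is a minimal sketch, assuming the IronPython hosting package is referenced; the class name ScriptHost, the method RunForRequest, and the variable names x and result are hypothetical and only serve the illustration.

```csharp
using System.Collections.Concurrent;
using IronPython.Hosting;
using Microsoft.Scripting;
using Microsoft.Scripting.Hosting;

public static class ScriptHost
{
    // One engine for the whole application, shared by all requests.
    private static readonly ScriptEngine Engine = Python.CreateEngine();

    // Scripts are compiled once and the compiled form is reused.
    private static readonly ConcurrentDictionary<string, CompiledCode> Cache =
        new ConcurrentDictionary<string, CompiledCode>();

    // Hypothetical per-request entry point: runs a script that reads "x"
    // and is expected to assign "result".
    public static int RunForRequest(string scriptText, int input)
    {
        CompiledCode code = Cache.GetOrAdd(scriptText, text =>
            Engine.CreateScriptSourceFromString(text, SourceCodeKind.Statements)
                  .Compile());

        // A fresh scope per request keeps each request's variables private.
        ScriptScope scope = Engine.CreateScope();
        scope.SetVariable("x", input);
        code.Execute(scope);
        return scope.GetVariable<int>("result");
    }
}
```

Because every call gets its own scope, two requests running the same compiled script do not see each other's variables; only the engine and the cache are shared.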

Per request is the way to go unless all of your code is thread safe. You may get better performance using per application (per session implies you have the notion of "sessions" between your client and server); however, the implication there is that all of your code in the "application" is thread safe.
So per-request is what you should use unless you know your code to be thread safe.
Note also that per application will be faster only if, in order to make things thread safe, you are not blocking threads in any way.
To a certain extent, if the business layer/data layer are extremely "heavy" (take a lot of time to instantiate), then some performance benefit may be gained.

Related

C# async/await in backend service / webservice useful?

Is async/await useful in a backend / webservice scenario?
Given the case there is only one thread for all requests / work. If this thread awaits a task it is not blocked but it also has no other work to do so it just idles. (It can't accept another request because the current execution is waiting for the task to resolve).
Given the case there is one thread per request / "work item". The Thread still idles because the other request is handled by another thread.
The only case I can imagine is doing two async operations at the same time, like reading a file and sending an HTTP request. But this sounds like a rare case. I should read the file first and then post the content, not post something I haven't even read.

Given the case there is one thread per request / "work item". The Thread still idles because the other request is handled by another thread.
That's closer to reality but the server doesn't just keep adding threads ad infinitum - at some point it'll let requests queue if there's not a thread free to handle the request. And that's where freeing up a thread that's got no other work to usefully do at the moment starts winning.
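To make the point about freeing threads concrete, here is a small sketch in which Task.Delay stands in for a slow external call (database, HTTP, file). The names AsyncDemo, HandleRequestAsync and RunManyAsync are made up for the example, and the 200 ms delay is arbitrary.

```csharp
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

public static class AsyncDemo
{
    // Stand-in for a request that waits on an external resource.
    public static async Task<int> HandleRequestAsync(int id)
    {
        await Task.Delay(200); // no thread is held while the delay is pending
        return id;
    }

    // Starts all requests at once and returns the elapsed milliseconds
    // once every one of them has completed.
    public static async Task<long> RunManyAsync(int count)
    {
        var watch = Stopwatch.StartNew();
        await Task.WhenAll(
            Enumerable.Range(0, count).Select(id => HandleRequestAsync(id)));
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }
}
```

Running a hundred of these simulated requests together should take roughly as long as one of them rather than a hundred times as long, because a waiting request does not occupy a thread.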

It's hard to read your question without feeling that you misunderstand how webservers work and how async/await & threads work. To make it simple, just think of it like this: async/await is almost always good to use when you query an external resource (e.g. database, web service/API, system file, etc). If you follow this simple rule, you don't need to think too deeply about each situation.
However, when you read and learn more on these subjects and gain good experience, deep thinking becomes essential in each case because there are always exceptions to any rule, so there are scenarios where the overhead of using async/await and threads may outweigh their benefits. For example, Microsoft decided not to use it for the logger in ASP.NET Core and there is even a comment about it in the source code.
In your case, the webserver uses many more threads than you seem to think and for many more reasons than you seem to think. Also, when a thread is idling waiting for something, it cannot do anything else. What async/await does is untie the thread from the current awaited task so the thread can go back to the pool and do something else. When the awaited task is finished, a thread (it can be a different thread) is pulled out of the pool to continue the job. You seem to understand this to some degree, but perhaps you just don't know what other things a thread in a webserver can do. Believe me, there is a lot to do.
Finally, remember that threads are generic workers, they can do anything. Webservers may have specialized threads for different tasks, but they fall into two or three categories. Threads can still do anything within their category. Webservers can even move threads to different categories when required. All of that is done for you so you don't need to think about it in most cases and you can just focus on freeing the threads so the webserver can do its job.

Given the case there is only one thread for all requests / work.
I'd argue that this is a very abstruse case. Even before multi-core servers became standard, ASP.NET used 50+ threads per core.
If this thread awaits a task it is not blocked but it also has no other work to do so it just idles.
No, it goes back into the pool to handle other requests. MOST web services will love handling as many requests as possible with as few resources as possible. Servers handling only one client are a rare edge case. Extremely rare. Most web services will handle as many requests as the plethora of clients throw at them.

Semaphore scope and behavior

I learned about semaphores from an earlier question I had today, and I am still scratching my head here. There seems to be no discussion about scope beyond global and local, where global is defined as the entire operating system.
If I have an application made from several assemblies, and each assembly has several classes, and each class has a private static semaphore object, with different "queue" lengths, if I start queuing different tasks up in my application thread pool in different places, how does that work? How do the threads behave around each other? All the examples I see include one or two classes in one assembly, and I'm not getting a clear picture on how this works.
I use thread pooling all over my app. It parallelizes data (sending customized emails to various people, generating customized reports en masse, collecting data from various web services, etc) while leaving my interface responsive, which is a wonderful thing.
One of my web service sources limits me to five concurrent connections, and I could not figure out how to limit the web requests to 5 active threads while still allowing the rest of the application to utilize other threads as necessary. So, I turned to SO, and asked how to do it. The proposed answer was use Semaphores.
Until that point, I did not know a thing about semaphores, so I researched it. It does indeed seem to limit the number of threads executing a specific method, but it does not make sense how this communicates properly with the thread pool manager. If I implemented a semaphore on my web request functionality, and I get a backlog of threads waiting to perform web service calls, how does the thread pool know (can it know?) to issue more threads for other processes? The scope of the semaphore is private; it shouldn't see the object.
Further, is that what the semaphore is supposed to do? Can I likewise limit other groups of tasks by having them share a common semaphore? Is this a gross bastardization of the intent of a semaphore, or exactly what it's meant to do? There's so much information out there on it, but in simplified, abstract form, and I couldn't find an article describing when and how it is appropriate to use these things.
So how does a private static semaphore communicate with the thread pool so the thread pool knows whether or not to spawn another worker thread? Does it? Will I be creating more problems than solutions by doing this? What sort of behavior can I expect my thread pool to exhibit with a backlog of web requests? Will it spawn new threads for the web requests until it's "full", reducing the thread availability for other methods? Can I make it not do that?

The scope constraints (if you can call them that!) exist because semaphores are OS kernel synchronization primitives that can be used unnamed, for inter-thread comms, or named, for inter-process (inter-assembly) comms. The language cannot restrict the scope of the named variant.
There is a huge pile of information on semaphores in general on Google. For the .NET-wrapped ones, see MSDN.
Inter-process signalling and communication via named semaphores is certainly possible in general. How you might do it in the managed environment is another matter. In unmanaged code, it usually involves other comms elements, like shared memory areas and/or memory-mapped files. You should probably not go there.
Be careful about trying to constrain thread pools in different assemblies by making the tasks signal/wait on named semaphores. By all means try it, if you think it may solve some problem with your app, but there is at least the possibility that the pool thread counts, running pool threads, pool threads blocked on semaphores, pool threads blocked inside tasks on IO etc. may become unstable.
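For the in-process case from the question (at most five concurrent calls to one web service), a private static semaphore could look something like the sketch below. It swaps in SemaphoreSlim with WaitAsync rather than the kernel Semaphore discussed above, since an awaited wait does not park a pool thread while a blocking wait does; the semaphore itself never talks to the thread pool. ThrottledClient, CallServiceAsync and the Peak counter are hypothetical names, and Task.Delay stands in for the real request.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledClient
{
    // Private and static: shared by every caller in this process only.
    private static readonly SemaphoreSlim Gate = new SemaphoreSlim(5, 5);

    private static int _active; // calls currently inside the gate
    private static int _peak;   // highest value _active has reached

    public static int Peak => Volatile.Read(ref _peak);

    // Hypothetical stand-in for the rate-limited web service call.
    public static async Task CallServiceAsync()
    {
        await Gate.WaitAsync(); // waits without blocking a pool thread
        int now = Interlocked.Increment(ref _active);
        try
        {
            // Record the highest concurrency observed (lock-free max).
            int seen;
            while (now > (seen = Volatile.Read(ref _peak)) &&
                   Interlocked.CompareExchange(ref _peak, now, seen) != seen)
            {
            }

            await Task.Delay(50); // simulated network latency
        }
        finally
        {
            Interlocked.Decrement(ref _active);
            Gate.Release();
        }
    }
}
```

Other task groups are unaffected unless they wait on the same semaphore object, so a second group can be limited independently by giving it its own instance.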

You seem to be assuming that the only solution is to partition the builtin .NET thread pool. How about using separate custom thread pools for your task groups? See this link for Jon Skeet's sample code.

When to use System.Threading.ThreadPool and when one of the many custom thread pools?

I'm working on creating an async handler for ASP.NET that will execute a slow stored procedure. I think I understand that to gain additional throughput on a mixed load of slow and fast pages, the slow page needs to be executed on a thread pool separate from the one the ASP.NET uses, otherwise the async pattern will cause double the number of scarce threads to be used (correct me if I'm wrong).
So I have found System.Threading.ThreadPool - it looks like it should do the trick, but...
Various tutorials on the net use something else instead, such as this one which uses this custom pool, the one in Jon Skeet's MiscUtil, and the custom thread pool referenced in this tutorial about async patterns.
System.Threading.ThreadPool has existed since 1.1 - why do people routinely feel the need to write a brand new one? Should I avoid using System.Threading.ThreadPool?
I'm a rank beginner when it comes to threading, so go easy on the undefined jargon.
UPDATE. The stored procedure to be executed will not necessarily be MS-SQL and will not necessarily be able to use a built-in async method such as BeginExecuteNonQuery().

Here's what I found on the topic: why you shouldn't use the ThreadPool in ASP.NET, http://madskristensen.net/post/Done28099t-use-the-ThreadPool-in-ASPNET.aspx. It's quite old, but I don't think it has changed that much. Correct me if I'm wrong.
Using the System.Threading.ThreadPool or a custom delegate and calling its BeginInvoke offer a quick way to fire off worker threads for your application. But unfortunately, they hurt the overall performance of your application since they consume threads from the same pool used by ASP.NET to handle HTTP requests.
Using custom threads with the aid of System.Threading.Thread class should solve the problem as the threads created are not part of your application's pool.
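As a small illustration of the last point, the sketch below starts work on a thread created with System.Threading.Thread and reports whether that thread belongs to the shared pool. DedicatedWorker and RanOnPoolThread are invented names, and the Join call is there only so the demonstration can return a result; real background work would not wait.

```csharp
using System;
using System.Threading;

public static class DedicatedWorker
{
    // Runs the work on a thread created just for it and reports whether
    // that thread came from the shared thread pool.
    public static bool RanOnPoolThread(Action work)
    {
        bool onPool = true;
        var thread = new Thread(() =>
        {
            onPool = Thread.CurrentThread.IsThreadPoolThread;
            work();
        });
        thread.IsBackground = true; // do not keep the process alive
        thread.Start();
        thread.Join();              // demo only: wait so the flag can be read
        return onPool;
    }
}
```

A thread created this way costs more to start than borrowing a pooled one, which is the trade-off the question is weighing.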

Singleton pattern in web applications

I'm using a singleton pattern for the datacontext in my web application so that I don't have to instantiate it every time. However, I'm not sure how web applications work: does IIS open a thread for every user connected? If so, what would happen if my singleton is not thread safe? Also, is it OK to use a singleton pattern for the datacontext? Thanks.

I'm using a singleton pattern for the datacontext in my web application
"Singleton" can mean many different things in this context. Is it single-instance per request? Per session? Per thread? Per AppDomain (static instance)? The implications of all of these are drastically different.
A "singleton" per request (stored in the HttpContext) is fine. A singleton per session is discouraged, but can be made to work. A singleton per thread may appear to work but is likely to result in unexpected and difficult-to-debug behaviour. A singleton per Application or AppDomain is a disaster waiting to happen.
so that I don't have to instantiate it every time
Creating a DataContext is very, very cheap. The metadata is globally cached, and connections aren't created until you actually execute a query. There is no reason to try to optimize away the construction of a DataContext instance.
however I'm not sure how web applications work, does IIS open a thread for every user connected?
IIS uses a different thread for every request, but a single request may use multiple threads, and the threads are taken from the Thread Pool, which means that ultimately the same user will have requests on many different threads, and conversely, different users will share the same thread over multiple requests and an extended period of time. That is why I mention above that you cannot rely on a Thread-Local Singleton.
if so, what would happen if my singleton is not thread safe?
Very bad things. Anything that you cache globally in an ASP.NET application either needs to be made thread safe or needs to be locked while it is in use.
Also, is it OK to use a singleton pattern for the datacontext? Thanks.
A DataContext is not thread-safe, and in this case, even if you lock the DataContext while it is in use (which is already a poor idea), you can still run into cross-thread/cross-request race conditions. Don't do this.
DataContext instances should be confined to the scope of a single method when possible, using the using clause. The next best thing is to store them in the HttpContext. If you must, you can store one in the Session, but there are many things you need to be aware of (see this question I answered recently on the ObjectContext - almost all of the same principles apply to a DataContext).
But above all, do not create "global" singleton instances of a DataContext in an ASP.NET application. You will deeply regret it later.

Many people keep the DataContext around for the duration of the request by keeping it in HttpContext.Current.Items. Thereby it is also private to the request.
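The per-request storage idea can be sketched as a small helper. HttpContext.Items is an IDictionary, so the helper below takes any IDictionary; in a web application the caller would pass HttpContext.Current.Items. PerRequest and GetOrCreate are hypothetical names, not part of ASP.NET.

```csharp
using System;
using System.Collections;

public static class PerRequest
{
    // Returns the instance stored under 'key', creating it on first use.
    // In a web app, 'items' would be HttpContext.Current.Items, which
    // lives only for the duration of the current request.
    public static T GetOrCreate<T>(IDictionary items, string key, Func<T> factory)
        where T : class
    {
        var existing = items[key] as T;
        if (existing == null)
        {
            existing = factory();
            items[key] = existing;
        }
        return existing;
    }
}
```

Within one request every caller gets the same instance, while a different request (a different items dictionary) gets its own. Disposing the stored context at the end of the request is left out of this sketch.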
Have a look at this blogpost by Steve Sanderson, and the UnitOfWork pattern.

Static variables are visible to all users of the app domain, not per session. Once created, the variable will sit in memory for the lifetime of the app domain, even if there are no active references to the object.
So if you have some sort of stateful information in a web app that shouldn't be visible to other users, it should absolutely not be static. Store that sort of information in the users session instead, or convert your static var to something like this:
public static Data SomeData
{
    get
    {
        // Create the per-session instance on first access.
        if (HttpContext.Current.Session["SomeData"] == null)
            HttpContext.Current.Session["SomeData"] = new Data();
        return (Data)HttpContext.Current.Session["SomeData"];
    }
}
It looks like a static variable, but it's session-specific, so the data gets garbage collected when the session dies and it's totally invisible to other users. Thread safety is not guaranteed.
Additionally, if you have stateful information in a static variable, you need some sort of synchronization to modify it, otherwise you'll have a nightmare of race conditions to untangle.

@ryudice the web server creates a new thread for each request. I think the best approach is to have a datacontext bound to each request, meaning that you should create a new datacontext every time you serve a request. A good way of achieving this is by using a DI tool, such as StructureMap. These kinds of tools allow you to set up the lifecycle of the instances you configure, so for example in your case you would configure your XDataContext class to be HttpContext scoped.
Regards.

Here are Microsoft's examples on how to do multi-tier with LINQ-To-SQL.
http://code.msdn.microsoft.com/multitierlinqtosql

ASP.NET Threading: should I use the pool for DB and Emails actions?

I’m looking for the best way of using threads considering scalability and performance.
In my site I have two scenarios that need threading:
UI trigger: for example the user clicks a button, the server should read data from the DB and send some emails. Those actions take time and I don’t want the user request getting delayed. This scenario happens very frequently.
Background service: when the app starts, it triggers a thread that runs every 10 minutes, reads from the DB and sends emails.
The solutions I found:
A. Use thread pool - BeginInvoke:
This is what I use today for both scenarios.
It works fine, but it uses the same threads that serve the pages, so I think I may run into scalability issues, can this become a problem?
B. No use of the pool – ThreadStart:
I know starting a new thread takes more resources than using a thread pool.
Can this approach work better for my scenarios?
What is the best way to reuse the opened threads?
C. Custom thread pool:
Because my scenarios occur frequently, maybe the best way is to start a new thread pool?
Thanks.

I would personally put this into a different service. Make your UI action write to the database, and have a separate service which either polls the database or reacts to a trigger, and sends the emails at that point.
By separating it into a different service, you don't need to worry about AppDomain recycling etc - and you can put it on an entire different server if and when you want to. I think it'll give you a more flexible solution.

I do this kind of thing by calling a webservice, which then calls a method using a delegate asynchronously. The original webservice call returns a Guid to allow tracking of the processing.

For the first scenario, use ASP.NET Asynchronous Pages. Async Pages are a very good choice when it comes to scalability, because during async execution the HTTP request thread is released and can be re-used.
I agree with Jon Skeet, that for second scenario you should use separate service - windows service is a good choice here.

Out of your three solutions, don't use BeginInvoke. As you said, it will have a negative impact on scalability.
Between the other two, if the tasks are truly background and the user isn't waiting for a response, then a single, permanent thread should do the job. A thread pool makes more sense when you have multiple tasks that should be executing in parallel.
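One way the "single, permanent thread" option might look is a dedicated thread draining a queue that request threads write to. This is only a sketch: BackgroundMailer is an invented class, the send delegate stands in for the real DB-and-email work, and nothing here survives a crash or an AppPool recycle.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

public sealed class BackgroundMailer : IDisposable
{
    private readonly BlockingCollection<string> _queue =
        new BlockingCollection<string>();
    private readonly Thread _worker;
    private readonly Action<string> _send;

    public BackgroundMailer(Action<string> send)
    {
        _send = send;
        // One long-lived thread that is not taken from the thread pool.
        _worker = new Thread(Consume) { IsBackground = true };
        _worker.Start();
    }

    // Called from request threads; returns immediately.
    public void Enqueue(string recipient) => _queue.Add(recipient);

    private void Consume()
    {
        // Blocks while the queue is empty; ends after CompleteAdding.
        foreach (string recipient in _queue.GetConsumingEnumerable())
            _send(recipient);
    }

    // Stops accepting work and waits for the queued items to drain.
    public void Dispose()
    {
        _queue.CompleteAdding();
        _worker.Join();
        _queue.Dispose();
    }
}
```

Items are handled one at a time in the order they were queued, which keeps the worker simple at the cost of no parallelism.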
However, keep in mind that web servers sometimes crash, AppPools recycle, etc. So if any of the queued work needs to be reliably executed, then moving it out of process is probably a better idea (such as into a Windows Service). One way of doing that, which preserves the order of requests and maintains persistence, is to use Service Broker. You write the request to a Service Broker queue from your web tier (with an async request), and then read those messages from a service running on the same machine or a different one. You can also scale nicely that way by simply adding more instances of the service (or more threads in it).
In case it helps, I walk through using both a background thread and Service Broker in detail in my book, including code examples: Ultra-Fast ASP.NET.
