Service call works in main thread, but crashes when multithreaded - c#

My company has an application that keeps track of information related to websites hosted on various machines. A central server runs a Windows service that gets a list of sites to check and then queries a service running on each target site to get a response that is used to update the local data.
My task has been to apply multithreading to this process to reduce the time it takes to run through all the sites (almost 3000 sites, which take about 8 hours to run sequentially). The service completes successfully when it's not multithreaded, but the moment I spread the work across multiple threads (testing with 3 right now, plus a watcher thread) there's a bizarre crash that seems to originate from the call to the remote services that are supposed to provide the data. It's a SOAP/XML call.
When run on the test server, the service just gives up and doesn't complete its task, but doesn't stop running. When run through the debugger (Visual Studio 2010) the whole thing just stops: I'll run it, and seconds later debugging ends, but not because it completed. It does not throw an exception or give me any kind of message. With breakpoints I can walk through to the point where it just stops, and event logging leads me to the same spot: the line of code that tries to get a response from the web service on the other sites. And again, it only does that when multithreaded.
I found some information suggesting that the number of concurrent connections per host is limited and defaults to 2. The proposed solution is to add the following setting to app.config, but that hasn't solved the problem...
<system.net>
<connectionManagement>
<add address="*" maxconnection="20"/>
</connectionManagement>
</system.net>
I still think it might be related to the number of allowed connections, but I haven't been able to find much information about it online. Is there something straightforward I'm missing? Any help would be much appreciated.
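For reference, the same limit can also be raised in code, and the work can be spread over a fixed number of worker threads that each catch and record their own failures, so an exception in one site check ends up in a result list instead of silently taking the worker (and, since .NET 2.0, the whole process) down. A minimal sketch, assuming a hypothetical `checkSite` delegate that stands in for the real SOAP call; `SiteChecker` and `CheckAll` are illustrative names. `ServicePointManager.DefaultConnectionLimit` is the programmatic counterpart of the `maxconnection` setting on .NET Framework, where it governs the `HttpWebRequest` stack that SOAP proxies use; it is marked obsolete on recent .NET versions.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Net;
using System.Threading;

public static class SiteChecker
{
    // Runs checkSite for every site on a fixed number of worker threads.
    // Failures are collected per site instead of killing the worker.
    public static IDictionary<string, Exception> CheckAll(
        IEnumerable<string> sites, Action<string> checkSite, int workerCount)
    {
        // Programmatic equivalent of <add address="*" maxconnection="..."/>.
        // Normally done once at service startup, before the first request.
        ServicePointManager.DefaultConnectionLimit = Math.Max(workerCount, 2);

        var queue = new ConcurrentQueue<string>(sites);
        var failures = new ConcurrentDictionary<string, Exception>();
        var workers = new List<Thread>();

        for (int i = 0; i < workerCount; i++)
        {
            var worker = new Thread(() =>
            {
                while (queue.TryDequeue(out var site))
                {
                    try
                    {
                        checkSite(site); // hypothetical stand-in for the SOAP call
                    }
                    catch (Exception ex)
                    {
                        // Record and continue; an unhandled exception on this
                        // thread would otherwise terminate the process.
                        failures[site] = ex;
                    }
                }
            });
            worker.Start();
            workers.Add(worker);
        }

        foreach (var worker in workers)
            worker.Join(); // wait until every worker has drained the queue

        return failures;
    }
}
```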

No crash, however bizarre, escapes the stack dump. Go through that dump and see whether it points to an obvious function.
Are you using a third-party tool or some other component for the actual service call? If so, check the documentation (or contact the people who wrote it) to confirm that their components are thread safe. If they are not, you have a large task ahead. :) (I have worked with database libraries that were not thread safe, so trust me, it is not uncommon to find a few global static variables thrown around.)
Lastly, if you are 100% sure that this is due to multiple threads, put a lock in your worker thread. Initially, make it cover the entire main while loop. Theoretically it should not crash then, because even though the code is multithreaded, you have serialized the execution.
The next step is to narrow the scope of the lock. Say there are three functions in the main while loop, f1(), f2() and f3(): lock only f2() and f3() and leave f1() unlocked. If things still work, the problem is somewhere in f2() or f3(), and you keep narrowing from there.
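The lock-narrowing procedure can be sketched as follows, assuming hypothetical steps `F1`, `F2` and `F3` that stand in for the real work in the loop (the counters exist only so the sketch does something observable). One lock object shared by all worker threads serializes just the steps currently under suspicion; moving the `lock` block from step to step bisects the problem.

```csharp
using System;
using System.Threading;

public class LockBisectionWorker
{
    // One lock shared by all worker threads; whatever runs inside it is serialized.
    private static readonly object SuspectLock = new object();

    private int _f1Calls, _f2Calls, _f3Calls;
    public int F2Calls => Volatile.Read(ref _f2Calls);

    // Hypothetical stand-ins for the three steps of the main loop.
    private void F1() { Interlocked.Increment(ref _f1Calls); }
    private void F2() { Interlocked.Increment(ref _f2Calls); }
    private void F3() { Interlocked.Increment(ref _f3Calls); }

    public void RunIteration()
    {
        F1(); // left unlocked: still runs concurrently on all threads

        lock (SuspectLock)
        {
            // Serialized: if the crash disappears, the problem is in F2 or F3.
            // Next round, move F3 out of the lock and try again.
            F2();
            F3();
        }
    }
}
```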
I hope you get the idea of what I am suggesting.
I know this is like the blind men and the elephant, but it is the best you can do if your code uses a lot of external components that are not adequately documented.

Related

Why would I bother properly terminating all threads in a multi-thread process?

The company I work for uses Visual Studio to develop its website and all of its features, and there is also a separate site that's been developed for testing the site. This 'testing' site can run individual test cases against the website, and must be run for each possible case.
Everything is written in VB.NET, and each time the program is run a single thread is created to run the test. However, at the 'end' of the test the thread still seems to linger. The stop button in Visual Studio must be clicked manually to terminate the application, and a process icon lingers in the taskbar long after the application has closed.
It appears to me that the program is not correctly terminating all the threads it starts during the tests, but I'm not sure whether this issue is worth bringing up in the office, so I ask the following question...
What is the purpose of properly closing an application and all threads running on it, and what are the consequences, if any, of not doing so?
Well it's probably a small problem now, but it's not a good practice, IMHO. Imagine what would happen if the same code was now being executed by a continuous integration server, for instance, TeamCity (or Jenkins, or...), and the unit tests are being run continuously and automatically, by said build server.
What would happen to the build status when those threads fail to shut down cleanly? We often face this problem due to bad design decisions in threading, or due to simple (and possibly idiotic) mistakes in our unit testing code. The net effect, though, is a hung build process.
I've seen CI servers hang for almost half a day before someone (mercifully) killed the build process. Essentially, this indicates a problem in our code that may or may not become a huge issue. If this was server-side code, there is potential for this code to lead to a pretty bad situation. My advice would be to dig out your introspection toolkits (memory profiling, perf profiling, etc) and see what exactly is going on, and resolve it.
We had a similar problem with an application that is called to index SPA pages on our application server. It was throwing an exception in some cases, and the threads were not closing. The biggest downside is that they consume the server's memory.
Another downside is that, because it runs as a web application, it consumes available ports and stops working when it runs out of them.
The code should be modified to shut the thread down cleanly when it finishes or hits an exception, and of course to report any exceptions.
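One way to let a worker thread finish cleanly, sketched here in C# (the question's code is VB.NET, but the same `System.Threading` types are available there): the thread checks a `CancellationToken`, records any exception instead of dying silently, and the caller waits for it with a bounded `Join`. Setting `IsBackground = true` additionally means the thread cannot keep the process alive on its own. `TestRunner` and `DoOneStep` are hypothetical names standing in for the real test code.

```csharp
using System;
using System.Threading;

public sealed class TestRunner
{
    private readonly CancellationTokenSource _cts = new CancellationTokenSource();
    private Thread _thread;

    public Exception Failure { get; private set; }
    public int StepsCompleted { get; private set; }

    public void Start(int steps)
    {
        _thread = new Thread(() => Run(steps, _cts.Token))
        {
            // A background thread cannot keep the process alive after Main returns.
            IsBackground = true
        };
        _thread.Start();
    }

    private void Run(int steps, CancellationToken token)
    {
        try
        {
            for (int i = 0; i < steps && !token.IsCancellationRequested; i++)
            {
                DoOneStep(i);
                StepsCompleted++;
            }
        }
        catch (Exception ex)
        {
            Failure = ex; // report instead of letting the thread die silently
        }
    }

    // Hypothetical placeholder for one unit of test work.
    private static void DoOneStep(int i) => Thread.Sleep(1);

    // Ask the thread to stop, then wait (bounded) for it to exit.
    public bool Stop(TimeSpan timeout)
    {
        _cts.Cancel();
        return _thread.Join(timeout);
    }
}
```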

IIS7 stops working after 5 requests

Here is my problem:
I have just been brought onto a massive ASP.NET C# project and I've been charged with fixing some performance issues (not my area of expertise). More specifically, after 5-7 redirects/AJAX calls the web server stops responding and the whole page (and eventually the browser) freezes.
I don't think this is a coding issue: I've set breakpoints in a few pages (the Page_Load method), and after the 5 requests execution does not even reach them.
I don't believe this is related to this issue, as I've increased the browser's maximum connections per server setting and got the same behavior. Furthermore, after these 5 requests in one browser (IE), the application stops working in Firefox as well.
This is not a resource issue as the w3wp.exe process never exceeds 500MB memory.
One thing I've noticed when using Fiddler and other tools to monitor the requests is that the server takes a very long time when loading image files (png, jpg). I don't know if this is relevant.
I've enabled failed request tracing on the server, and the only thing I've noticed is that some requests fail with a 401 error even though Anonymous Authentication is enabled.
Here is the exact message
MODULE_SET_RESPONSE_ERROR_STATUS
ModuleName ManagedPipelineHandler
Notification 128
HttpStatus 401
HttpReason Unauthorized
HttpSubStatus 0
ErrorCode 0
ConfigExceptionInfo
Notification EXECUTE_REQUEST_HANDLER
ErrorCode The operation completed successfully. (0x0)
This message is sometimes thrown with ModuleName: ScriptModule
I have already wasted 2 days on this thing and I'm running out of ideas so any suggestions would be appreciated.
Like any large generic problem, your best bet in diagnosing the issue is to figure out how to break it down into smaller parts, how to form hypotheses about it, and how to validate or invalidate those hypotheses. My first inclination would be to hypothesize that the server-side processes in this particular case are taking a long time, causing your client requests to block and making the whole thing seem frozen.
From there, I would attempt to replicate the long-running server-side processes by creating isolated client-side tests. If the URLs are HTTP GETs, I would test the same URLs individually; if they are HTTP POSTs, I'd create an isolated test form, if feasible, to see what happens with each request. If a long-running server-side process is found, then you have a starting point.
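A minimal sketch of such an isolated test, assuming the requests can be replayed one at a time: each call is timed on its own with `Stopwatch`, so a single slow server-side process stands out instead of hiding behind the others. `RequestTimer`, `FindSlow` and the request delegates are hypothetical; in practice each delegate would issue exactly one GET or POST (for example with `HttpClient`).

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class RequestTimer
{
    // Runs each named request on its own, one after another, and returns
    // the names of those that took longer than the threshold.
    public static List<string> FindSlow(
        IDictionary<string, Action> requests, TimeSpan threshold)
    {
        var slow = new List<string>();
        foreach (var pair in requests)
        {
            var watch = Stopwatch.StartNew();
            pair.Value(); // issue exactly one request, nothing else in flight
            watch.Stop();

            Console.WriteLine($"{pair.Key}: {watch.ElapsedMilliseconds} ms");
            if (watch.Elapsed > threshold)
                slow.Add(pair.Key);
        }
        return slow;
    }
}
```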
If there are no long-running server-side processes, then it may be JavaScript/client-side coding issues that need to be looked into. But when you're working on a large, unfamiliar project, your best bet is definitely to figure out how to break the issue down into smaller components that can be tested individually.
I solved the issue finally. Here is what I did:
Experimented with IIS settings and App_Pool recycling and noticed that there is nothing wrong with the way it handles requests that actually reach it.
I focused on the Http.sys module and noticed that in the log files there were a lot of Timer_ConnectionIdle and Client_Reset errors.
After some more experimentation and a lot of Google searches, I accidentally found this answer and it solved my issue. As the answer suggests the problem was caused by the AVG antivirus installed and incorrectly configured on the server.
Thanks for all the help and suggestions.
If it's AJAX calls that are causing your browser to freeze, make sure they are not synchronous (blocking) AJAX calls.
Just appending to Shan's answer, which is a good one.
First off, there is obviously a code issue as this is by no means 'normal' behavior for IIS.
That said, you must isolate it, as Shan indicated. For example, given that the server itself no longer accepts connections, we can pretty well eliminate JavaScript as the source of the problem and relegate it to being just a symptom.
Typically when a worker process spins into space like this it is due to either an infinite loop or an issue where multiple threads are trying to lock the same resource. I bet if you let it run long enough IIS itself will timeout, kill and restart the process.
With that in mind you want to look for any type of multithreaded garbage (which I highly recommend you don't do in a web server) or for anything that indicates a tight infinite loop. A loop is going to become apparent if you execute the requests individually. A multi-threaded issue will only show up if you happen to get a collision.
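If lock contention is the suspect, one diagnostic technique (not something the answer above prescribes) is to replace a plain `lock` with `Monitor.TryEnter` and a timeout, so that a thread which would otherwise wait forever reports the problem and gives up. A minimal sketch; `GuardedResource`, `TryUse` and the timeout values are hypothetical.

```csharp
using System;
using System.Threading;

public sealed class GuardedResource
{
    private readonly object _gate = new object();

    // Like lock (_gate) { action(); } but gives up after the timeout,
    // which makes a hung or deadlocked lock holder visible instead of silent.
    public bool TryUse(Action action, TimeSpan timeout)
    {
        bool taken = false;
        try
        {
            Monitor.TryEnter(_gate, timeout, ref taken);
            if (!taken)
            {
                Console.Error.WriteLine(
                    $"Thread {Thread.CurrentThread.ManagedThreadId} waited {timeout} for the lock");
                return false;
            }
            action();
            return true;
        }
        finally
        {
            if (taken) Monitor.Exit(_gate); // release only if we actually acquired it
        }
    }
}
```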
Run various performance counters on the web server. Also, once it locks up, let it sit that way for a while. Once IIS performs its own reset of the worker process, look for indicators in the event log.

Unity Not Being Disposed Causing Server Lockup?

We have a server farm of about 40 servers that we roll code to every couple of weeks. One thing we noticed when we roll the code live: after we deploy the assemblies, perform an IIS reset and put the server back in the BIG-IP (F5) pool, as soon as it receives traffic the server locks up for about 10 minutes, and clients just spin until they eventually time out.
Looking at perfmon, we can see a dramatic spike in the number of finallys and the number of pinned objects, which led me to investigate memory issues.
So one thing I started looking into is our Unity IoC configuration. In Global.asax.cs we register about 15 interfaces, most of which use the ContainerControlledLifetimeManager to manage their lifetime. Normally there is never a problem with the code except in this ten-minute window, so my first thought was a memory or resource management issue.
Does anyone know if you have to explicitly Dispose() of your Unity Container or is this handled by Unity automagically somehow? I noticed today that there was no Dispose wiring in place for Application_End so my thought was maybe when the servers are brought back on after the IIS reset there is a Unity or object resource issue until the GC comes around and frees the memory (the ten minutes it takes to come up).
Any help is appreciated!
Performing an iisreset will kill the currently running w3wp.exe process, so it's unlikely that not properly disposing of unity objects in Application_End would cause performance issues on startup. It is possible that the old web process doesn't properly release file system or other resources the new web process depends upon, but I think you'd see file access or some other errors if that were the case.
Since you're performing an iisreset, I would look closely at the code that runs when the application starts for the first time. Maybe some components take a lot of time to start up (e.g., a singleton class that downloads and caches a bunch of data from the database) and cause the slowdown, possibly only when combined with the stress of handling all of the waiting HTTP requests. Also keep in mind that ASP.NET incurs a lot of overhead as it compiles the application the first time it is used. Since your web application is behind a load balancer, you may want to come up with a way to "prime" the application on each individual web server before you add that server back to the load balancer, which could be accomplished by simply loading a page locally on that server. Priming the application lets the web app initialize itself without having to handle any outside requests, which should improve the startup time.
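A minimal sketch of that priming step, assuming the deployment script can run a small program on each web server before re-enabling it in the load balancer. It calls a probe until it succeeds or a deadline passes; `WarmUp`, `Prime` and the probe delegate are hypothetical, and the probe can be anything from a plain GET of a local page to a call that exercises the expensive start-up path.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class WarmUp
{
    // Calls probe() until it returns true or the deadline passes.
    // Returns true if the application answered in time.
    public static bool Prime(Func<bool> probe, TimeSpan deadline, TimeSpan retryDelay)
    {
        var clock = Stopwatch.StartNew();
        while (clock.Elapsed < deadline)
        {
            try
            {
                // The first successful call pays the compile/startup cost,
                // so outside traffic does not have to.
                if (probe())
                    return true;
            }
            catch (Exception ex)
            {
                Console.Error.WriteLine($"Warm-up probe failed: {ex.Message}");
            }
            Thread.Sleep(retryDelay);
        }
        return false; // keep the server out of the pool and investigate
    }
}
```

With an `HttpClient`, the probe might be `() => client.GetAsync("http://localhost/").Result.IsSuccessStatusCode`, where the URL is a placeholder for whichever local page triggers the start-up work; only when `Prime` returns true would the script re-enable the server.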
Long story short, I would investigate startup issues and see what I could tune there before I focused on shutdown issues.

Best Practices of fault toleration and reliability for scheduled tasks or services

I have been working on many applications which run as windows service or scheduled tasks.
Now I want to make sure that these applications are fault tolerant and reliable. For example, I have a service that runs every hour. If the service crashes while it is running, I'd like the application to run again for the same period (there are several things involved, including transactional data processing) to avoid data loss. Moreover, I'd like the program to report the error with details. My goal is to avoid data loss and to avoid falling behind on runs.
I have built a class library that a user can import into a project. The library is supposed to keep information about the running instance of the program, i.e., the program reads and writes information such as its running interval and running status. This data is stored in a database.
I was curious, if there are some best practices to make the scheduled tasks/ windows services fault tolerant and reliable.
Edit: I am talking about independent tasks or services that run on different servers. My goal is to make sure that each service keeps running, reports any failures and recovers from them.
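One common way to get the "run again for the same period" behaviour is to persist the end of the last period that completed successfully and, on every start, process each period between that checkpoint and now in order, advancing the checkpoint only after a period has committed. A minimal sketch; `CatchUp`, `PendingPeriods`, `Run` and the two delegates are hypothetical, and storing the checkpoint (for instance in the database table the library already uses) is assumed rather than shown.

```csharp
using System;
using System.Collections.Generic;

public static class CatchUp
{
    // Start of every whole period that began at or after the checkpoint
    // and has fully elapsed by 'now', oldest first.
    public static List<DateTime> PendingPeriods(DateTime checkpoint, DateTime now, TimeSpan period)
    {
        var pending = new List<DateTime>();
        for (var start = checkpoint; start + period <= now; start += period)
            pending.Add(start);
        return pending;
    }

    // Processes pending periods in order. The checkpoint only moves after a
    // period succeeds, so a crash mid-period means it is retried on the next run.
    public static DateTime Run(DateTime checkpoint, DateTime now, TimeSpan period,
        Action<DateTime, DateTime> processPeriod, Action<DateTime> saveCheckpoint)
    {
        foreach (var start in PendingPeriods(checkpoint, now, period))
        {
            processPeriod(start, start + period); // hypothetical transactional work
            checkpoint = start + period;
            saveCheckpoint(checkpoint);           // hypothetical persistence
        }
        return checkpoint;
    }
}
```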
I'm interested in what other people have to say, but I'll give you a few points that I've stumbled across:
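Make an event handler for unhandled exceptions. This way you can clean up resources, write to a log file, email an administrator, or do anything else you need before the process goes down. (The handler cannot prevent the crash itself: since .NET 2.0, an unhandled exception still terminates the process after the event fires.)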
AppDomain.CurrentDomain.UnhandledException += new UnhandledExceptionEventHandler(AppUnhandledExceptionEventHandler);
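The handler named in that line might look like the following minimal sketch. `CrashLogger`, `Describe` and the temp-folder log path are hypothetical; a real service could just as well write to the event log or send an email from the same place.

```csharp
using System;
using System.IO;

public static class CrashLogger
{
    // Hypothetical log location; adjust for the real service.
    public static string LogPath = Path.Combine(Path.GetTempPath(), "service-crash.log");

    public static void Register()
    {
        AppDomain.CurrentDomain.UnhandledException +=
            new UnhandledExceptionEventHandler(AppUnhandledExceptionEventHandler);
    }

    private static void AppUnhandledExceptionEventHandler(object sender, UnhandledExceptionEventArgs e)
    {
        // Last chance to record what happened; the runtime still terminates the process afterwards.
        File.AppendAllText(LogPath, Describe(e.ExceptionObject, e.IsTerminating) + Environment.NewLine);
    }

    // ExceptionObject is typed as object because non-CLS exceptions are possible.
    public static string Describe(object exceptionObject, bool isTerminating)
    {
        string details = exceptionObject is Exception ex ? ex.ToString() : Convert.ToString(exceptionObject);
        return $"{DateTime.UtcNow:o} terminating={isTerminating} {details}";
    }
}
```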
Override any ServiceBase event handlers you need in the main part of your application. OnStart and OnStop are crucial, but there are many others you can use. http://msdn.microsoft.com/en-us/library/system.serviceprocess.servicebase%28v=VS.71%29.aspx
Beware of timers. Windows Forms timers won't work correctly in a service; use System.Threading.Timer or System.Timers.Timer instead. See "Best Timer for using in a Windows service".
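A minimal sketch with `System.Threading.Timer`: the timer is armed one-shot and re-armed only after the work finishes, so a run that takes longer than the interval never overlaps the next one, and an exception inside the callback is caught and logged (one escaping a `System.Threading.Timer` callback would terminate the process). `ScheduledJob` and the work delegate are hypothetical; for the hourly service in the question the interval would be one hour.

```csharp
using System;
using System.Threading;

public sealed class ScheduledJob : IDisposable
{
    private readonly Action _work;
    private readonly TimeSpan _interval;
    private readonly Timer _timer;
    private int _runs;

    public int Runs => Volatile.Read(ref _runs);

    public ScheduledJob(Action work, TimeSpan interval)
    {
        _work = work;
        _interval = interval;
        // Create the timer disarmed, then arm it, so OnTick never sees a null _timer.
        _timer = new Timer(OnTick, null, Timeout.InfiniteTimeSpan, Timeout.InfiniteTimeSpan);
        _timer.Change(_interval, Timeout.InfiniteTimeSpan); // one-shot; re-armed in OnTick
    }

    private void OnTick(object state)
    {
        try
        {
            _work();
            Interlocked.Increment(ref _runs);
        }
        catch (Exception ex)
        {
            Console.Error.WriteLine($"Scheduled run failed: {ex}");
        }
        finally
        {
            // Re-arm only after this run has finished, so runs never overlap.
            try { _timer.Change(_interval, Timeout.InfiniteTimeSpan); }
            catch (ObjectDisposedException) { /* job was disposed meanwhile */ }
        }
    }

    public void Dispose() => _timer.Dispose();
}
```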
If you are updating on a thread, make sure you use a lock() or monitor in key sections to make sure everything is threadsafe.
Be careful not to use anything user-specific, as a service runs without a specific user's context. I noticed that some of my SQL connection strings using Windows authentication no longer worked, etc. I have also heard of people having trouble with mapped drives.
Never make a service with a UI. In fact, Vista and 7 make it nearly impossible anyway. A service shouldn't require user interaction; the most you can do is send a message with a Win32 function. MSDN says that making interactive services is bad practice. http://msdn.microsoft.com/en-us/library/ms683502%28VS.85%29.aspx
For debugging purposes, it is way cool to make a service run as a console application until you get it doing what you want it to. Awesome tutorial: http://mycomponent.blogspot.com/2009/04/create-debug-install-windows-service-in.html
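The usual shape of that trick, sketched under assumptions: the service's start and stop logic lives in a class of its own (`ServiceWorker` below is hypothetical), `ServiceBase.OnStart`/`OnStop` simply call it, and `Main` calls it directly when a `--console` argument is passed or the process is running interactively. The `ServiceBase` side is omitted so the sketch compiles without a `System.ServiceProcess` reference.

```csharp
using System;
using System.Linq;

// Hypothetical: all real work lives here, independent of how the process is hosted.
public sealed class ServiceWorker
{
    public bool Running { get; private set; }
    public void Start() { Running = true;  /* start timers, threads, ... */ }
    public void Stop()  { Running = false; /* signal and join threads */ }
}

public static class ConsoleHost
{
    // True when the process should run as a console app instead of a service.
    public static bool ShouldRunAsConsole(string[] args) =>
        args.Contains("--console", StringComparer.OrdinalIgnoreCase) || Environment.UserInteractive;

    // Same Start/Stop calls the service's OnStart/OnStop would make.
    public static void RunInConsole(ServiceWorker worker, Func<string> readLine)
    {
        worker.Start();
        Console.WriteLine("Running in console mode; press Enter to stop.");
        readLine(); // normally Console.ReadLine
        worker.Stop();
    }
}
```

In `Main`, the service path would call `ServiceBase.Run(...)` and the console path `ConsoleHost.RunInConsole(new ServiceWorker(), Console.ReadLine)`, so both hosts exercise exactly the same worker code.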
Anyway, I hope that helps a little, but these are just a couple of things I found by poking around on my own.
Something obvious: don't run all your tasks at the same time. Try to schedule them so that only one task is using an expensive resource at any time (if possible). For example, if you need to send out newsletters and some specific notifications, schedule them at different times. If two tasks need to clean up something in the database, let them run one after the other.
Also schedule tasks to run outside of normal business hours - at night obviously.
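When tasks cannot be reliably scheduled apart, a complementary approach is to make them take turns on the shared resource. A minimal in-process sketch with `SemaphoreSlim(1, 1)`, assuming the tasks run in the same process; `ExpensiveResource` and `UseAsync` are hypothetical names, and tasks in separate processes or on separate servers would need a cross-process mechanism instead (a named `Mutex` or a database lock, for example).

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ExpensiveResource
{
    // At most one task at a time may use the resource (e.g. a heavy database cleanup).
    private static readonly SemaphoreSlim Gate = new SemaphoreSlim(1, 1);
    private static int _inUse;

    // Highest number of simultaneous users observed; for demonstration only.
    public static int MaxObservedConcurrency { get; private set; }

    public static async Task UseAsync(Func<Task> work)
    {
        await Gate.WaitAsync(); // wait for our turn instead of competing
        try
        {
            int now = Interlocked.Increment(ref _inUse);
            if (now > MaxObservedConcurrency)
                MaxObservedConcurrency = now; // only ever written while holding the gate
            await work();
        }
        finally
        {
            Interlocked.Decrement(ref _inUse);
            Gate.Release();
        }
    }
}
```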

IIS hosted web service method call randomly dies

We have an IIS hosted web method which is randomly dying on us about 10% of the time. In trying to debug this we've added Log.Debug() messages in front of every real code line and it appears to be dying on random lines.
Has anyone seen this or have an idea on how to debug this?
[Additional Details]
We've spent a lot of time looking at it and have discovered the following...
We have a separate self-hosted WCF service that accesses the same database and lives on the same machine. When it is under heavy load, the web method croaks every time. If it's not under load, things usually work fine (but not 100% of the time).
High CPU doesn't seem to be part of the problem. We ran a small app that created a high cpu load and the web service did not die.
The web service dies when we either new up an XmlSerializer (without sgen precompilation) or have NHibernate create a SessionFactory. The only things these two have in common are that they 1) seem like things people commonly do and 2) seem like they would be fairly intensive.
We've added a Global.asax to try to capture Application_End and Application_Error but neither event gets fired. This to me implies that we're not dealing with a normal application pool resetting?
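Both objects mentioned are expensive to create: without a pre-generated (sgen) assembly, an `XmlSerializer` generates serialization code for its type when constructed, and an NHibernate `ISessionFactory` is intended to be built once and shared by the whole application. A common mitigation is to create such objects once and reuse them. A minimal sketch for the serializer case; `SerializerCache` and `Site` are hypothetical names, and the same create-once idea (for example with `Lazy<T>`) would apply to the session factory.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Xml.Serialization;

public class Site // hypothetical payload type
{
    public string Name { get; set; }
}

public static class SerializerCache
{
    private static readonly ConcurrentDictionary<Type, XmlSerializer> Cache =
        new ConcurrentDictionary<Type, XmlSerializer>();

    // Builds the serializer for a type once; later calls reuse it.
    // Under a race GetOrAdd may run the factory twice, but only one instance is kept.
    public static XmlSerializer For(Type type) =>
        Cache.GetOrAdd(type, t => new XmlSerializer(t));

    public static string ToXml<T>(T value)
    {
        using (var writer = new StringWriter())
        {
            For(typeof(T)).Serialize(writer, value);
            return writer.ToString();
        }
    }
}
```

The framework itself caches generated assemblies only for the `XmlSerializer(Type)` and `XmlSerializer(Type, String)` constructors; other overloads generate a new assembly on every call, which makes explicit caching matter most for them.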
Sounds like it might be a threading issue. You are using informative debug messages -- you should try to reproduce the issue while running the debugger and breaking on all exceptions. Make sure you check all the windows logs for information on why the app pool crashed.
Per comment: it's hard to say, but many things can cause a thread to appear to "just die." Memory issues: are you doing any interop? Improper marshaling: are you touching data on another thread? But I will play the probabilities and ask whether you're sure you're handling and logging any exception that might be happening. Are you sure you are not gobbling up an exception somewhere down low without reporting it? Is this a permissions issue? Are you running in partial trust or under a low-privilege user account?
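A "gobbled" exception usually looks like an empty `catch { }` somewhere low in the call stack. A minimal sketch of the alternative: log the full exception with some context, then rethrow with `throw;`, which preserves the original stack trace. `Guarded` and its in-memory `Log` are hypothetical stand-ins for whatever logger the service already uses (for example the `Log.Debug()` calls mentioned in the question).

```csharp
using System;
using System.Collections.Generic;

public static class Guarded
{
    // Hypothetical log sink; replace with the real logger.
    public static readonly List<string> Log = new List<string>();

    public static T Run<T>(string operation, Func<T> body)
    {
        try
        {
            return body();
        }
        catch (Exception ex)
        {
            // Record everything: type, message, stack trace and inner exceptions.
            lock (Log) Log.Add($"{operation} failed: {ex}");
            // 'throw;' keeps the original stack trace; 'throw ex;' would reset it.
            throw;
        }
    }
}
```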
Figured it out.. two problems really..
We added Global.asax but it didn't get copied over which explains why we weren't seeing any messages. We fixed this and found out that...
Our WCF log was being written to the bin directory of the IIS web service, and ASP.NET restarts the application domain whenever the contents of bin change. In retrospect this is kind of silly, since the WS is an old-school web service; the WCF stuff is in the same directory for some reason unknown to us, because the person who originally set things up is gone.
Lesson learned.. Somewhere there is a message that explains everything.. you just have to find it.
