I am working on a multi-instance application. Is there any way in C# to find out how many instances are currently running? I used a piece of code to count the processes with my application's file name, but that is not a good way:
string fileName = Process.GetCurrentProcess().MainModule.FileName;
int count = 0;
foreach (Process p in Process.GetProcesses())
{
    try
    {
        if (p.MainModule.FileName == fileName)
        {
            count++;
        }
    }
    catch { } // some processes deny access to MainModule
}
MessageBox.Show("Total Instances Running are " + count);
Can it be done with a semaphore, or some counter that is incremented by one when a new instance is created and decremented by one when an instance closes?
A Semaphore helps you count down, blocking you when you get to 0. You could use a global semaphore, but you'll have to initialize it at a high enough value and count down from there.
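A sketch of that counting-semaphore approach, assuming a reasonable upper bound on instances (the MaxInstances constant, the Global\MyAppInstances name, and the InstanceCounter wrapper are all illustrative, not an established API):

```csharp
using System;
using System.Threading;

// Hypothetical wrapper: each instance claims one slot from a named
// semaphore at startup and returns it on exit.
class InstanceCounter : IDisposable
{
    const int MaxInstances = 100;   // assumed upper bound on instances
    readonly Semaphore sem;

    public InstanceCounter()
    {
        // The "Global\" prefix makes the semaphore visible machine-wide.
        sem = new Semaphore(MaxInstances, MaxInstances, @"Global\MyAppInstances");
        sem.WaitOne();              // claim a slot for this instance
    }

    public int RunningInstances()
    {
        // Release() returns the count *before* the release, which is
        // MaxInstances minus the number of claimed slots. Re-acquire
        // immediately; note the tiny race window between the two calls.
        int before = sem.Release();
        sem.WaitOne();
        return MaxInstances - before;
    }

    public void Dispose()
    {
        sem.Release();              // give the slot back on clean exit
        sem.Dispose();
    }
}
```

One caveat: Windows releases abandoned mutexes but has no such notion for semaphores, so an instance that crashes without releasing leaks a slot until reboot. That is one reason the process-enumeration approach, ugly as it is, can be more robust.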
I think all in all, your own solution is probably the cleanest.
Why don't you use shared memory? Of course you have to protect it with a mutex visible to all processes, but in this way you can store commonly used data among all processes.
I suppose you have to P/Invoke a platform routine in order to create the shared memory, but it is really straightforward.
Managing multiple processes is always a problem. The OS puts up a big wall between them that makes just about anything hard and expensive. The code you are using is no exception; iterating the running processes is expensive. Always consider using threads first, and look at AppDomains in a .NET program, a feature expressly invented to provide the kind of isolation a process can provide but without the cost of crossing the interop barrier.
If you are committed to a multi-process solution then you almost always need a separate process that acts as an arbiter, responsible for ensuring the worker processes get started and for doing something meaningful when one of them dies on an unhandled exception. That is itself very hard to deal with, since there is no good way to get any information about the reason it died, and you have lost an enormous amount of state. The typical outcome is a malfunction of the entire app, unless you give these processes very simple jobs that you could easily do without.
Such an arbiter process has no problem counting instances cheaply: it can use the Process.Exited event to maintain a counter, an event rather than having to poll.
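A minimal sketch of such an arbiter, assuming a workerPath to the worker executable (the Arbiter class and Launch method are illustrative names):

```csharp
using System;
using System.Diagnostics;
using System.Threading;

class Arbiter
{
    static int running;   // live worker count, maintained by events rather than polling

    public static void Launch(string workerPath)
    {
        var worker = new Process
        {
            StartInfo = new ProcessStartInfo(workerPath) { UseShellExecute = false },
            EnableRaisingEvents = true   // required, or Exited never fires
        };
        worker.Exited += (sender, e) =>
        {
            Interlocked.Decrement(ref running);
            // A real arbiter would also inspect worker.ExitCode here and
            // decide whether the worker needs to be restarted.
            worker.Dispose();
        };

        Interlocked.Increment(ref running);
        try
        {
            worker.Start();
        }
        catch
        {
            Interlocked.Decrement(ref running);
            worker.Dispose();
            throw;
        }
    }

    public static int RunningWorkers => Volatile.Read(ref running);
}
```
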
Related
I am writing a heavy web scraper in C#. I want it to be fast and reliable.
Parallel.ForEach and Parallel.For are way too slow for this.
For the input I am using a list of URLs. I want to have up to 300 threads working at the exact same time (my cpu and net connection can handle this). What would be the best way to do this? Would using tasks work better for this?
Sometimes the threads end for no apparent reason and some of the results don't get saved. I want a more reliable way of doing this. Any ideas?
I want to have a more solid queue type of scraping.
What I came up with (not all code but the important parts):
List<string> words = // read text file
int total = words.Count;
int maxThreads = 300;
while (true)
{
    if (activeThreads < maxThreads)
    {
        current++;
        Thread thread = new Thread(() => CrawlWebsite(words[current]));
        thread.Start();
    }
}

public static void CrawlWebsite(string word)
{
    activeThreads++;
    // scraping part
    activeThreads--;
}
Consider using System.Threading.ThreadPool. It could be a little faster for a scenario with this many threads, and you don't need to manage activeThreads yourself. Instead you can use ThreadPool.SetMaxThreads() and SetMinThreads(), and the ThreadPool manages the number of parallel threads for you.
BTW, there is missing synchronization of the shared variables in your example (activeThreads and current). One way to synchronize access is using "lock" - see http://msdn.microsoft.com/en-us/library/c5kehkcz.aspx
Also the method run on the thread - CrawlWebsite() - should handle ThreadAbortException - see http://msdn.microsoft.com/en-us/library/system.threading.threadabortexception.aspx.
I was recently working on a very similar problem, and I don't think that using any high number of threads will make it faster. The slowest thing is usually downloading the data; a huge number of threads does not make that faster, because mostly they are waiting on network connections, data transfer, etc. So I ended up with two queues. One is handled by a small number of threads that just send asynchronous download requests (10-15 requests at a time). The responses are stored in another queue, which goes into another thread pool that takes care of parsing and data processing (the number of threads here depends on your CPU and processing algorithm).
I also save all downloaded data to a database. Any time I want to implement parsing of some new information from the web, I don't need to redownload the content, but only parse the cached pages from the DB (this saves a lot of time).
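That two-queue pipeline might be sketched like this (the Parse placeholder and the capacity numbers are assumptions; a real version would also persist the raw pages to the database as described):

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

class ScraperPipeline
{
    // Stage 1: a small cap on in-flight downloads (the I/O-bound part).
    // Stage 2: one parser per core draining the response queue (the CPU-bound part).
    public static async Task RunAsync(IEnumerable<string> urls, int maxDownloads = 15)
    {
        var responses = new BlockingCollection<string>(boundedCapacity: 100);
        var gate = new SemaphoreSlim(maxDownloads);

        using (var http = new HttpClient())
        {
            Task[] parsers = Enumerable.Range(0, Environment.ProcessorCount)
                .Select(_ => Task.Run(() =>
                {
                    foreach (string html in responses.GetConsumingEnumerable())
                        Parse(html);   // placeholder for the real parsing/DB work
                }))
                .ToArray();

            Task[] downloads = urls.Select(async url =>
            {
                await gate.WaitAsync();
                try { responses.Add(await http.GetStringAsync(url)); }
                catch (HttpRequestException) { /* log, maybe requeue for retry */ }
                finally { gate.Release(); }
            }).ToArray();

            await Task.WhenAll(downloads);
            responses.CompleteAdding();   // lets the parsers drain and exit
            await Task.WhenAll(parsers);
        }
    }

    static void Parse(string html) { /* extract data, save to DB */ }
}
```

Unlike 300 dedicated threads, this keeps roughly maxDownloads sockets busy and ProcessorCount cores parsing, with no shared counters to synchronize by hand.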
I'm working in .NET 4 in C#. I have LauncherProgram.exe that will create multiple instances of WorkerProgram.exe, have them do some work on the arguments supplied when the process is created, and then LauncherProgram.exe will launch a new set of WorkerProgram.exe instances to do some different work.
Each WorkerProgram.exe is launched with some parameters that tell it what to work on, and there can be one or more WorkerProgram.exe launched at the same time. The WorkerProgram.exe reads the supplied parameters, performs some initialization, and then is ready to do the work.
What I'm trying to figure out is how to make each set of WorkerProgram.exe launched at the same time "tell" or "signal" or "I can't figure out the proper term" the LauncherProgram.exe that EACH process has completed the initialization step and is ready to begin. I want to synchronize the start of the "do your work" in the WorkerProgram.exe instances launched in a set.
I'm setting up my LauncherProgram.exe to work something like this (ignoring types for now):
while (there are sets of work to do)
{
    for each set of work
    {
        for each group of data in the set
            create and launch a WorkerProgram.exe for that group's data

        wait for all created WorkerProgram.exe to indicate init is complete
        send signal to start processing
    }
}
I actually have a small test program where I use named events to signal multiple spawned processes to START something at the same time.
(Hopefully all the above makes sense)
I just can't figure out the "wait for N processes to tell me their initialization is ready" bit.
I've searched for "process synchronization" and "rendezvous" and looked at using named events and named semaphores. I can find a bunch of things about threads, but less about separate processes.
LauncherProgram.exe creates the WorkerProgram.exe processes using the System.Diagnostics.Process class, if that helps.
If you can give me better terms to help narrow my search, or point me to a design pattern or mechanism, or a library or class that helps, I'd be very appreciative.
Thanks.
You can use the System.Threading.Mutex class for interprocess communication. See http://msdn.microsoft.com/en-us/library/system.threading.mutex(v=vs.110).aspx. It is probably easiest to name each Mutex, giving the process id of WorkerProgram.exe or some other distinguishing characteristic as the name.
You can use some interprocess communication mechanism, but the simple way to do it is to write to temp files - for instance, each WorkerProgram writes DONE to its own file, and the Launcher reads periodically until all of them have written DONE - or even create a FileMapping in Windows to share memory between processes with file backing.
Other ways to do it include remote procedure calls, sockets, and simple file mappings.
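Using named events, the rendezvous might look like this sketch (the event names and the convention of passing them on the command line are assumptions; note that WaitHandle.WaitAll is limited to 64 handles per call):

```csharp
using System;
using System.Diagnostics;
using System.Threading;

class Launcher
{
    static void RunSet(string workerPath, int workerCount)
    {
        // One manual-reset "go" event shared by the whole set.
        var go = new EventWaitHandle(false, EventResetMode.ManualReset,
                                     @"Global\Workers.Go");
        var ready = new WaitHandle[workerCount];

        for (int i = 0; i < workerCount; i++)
        {
            // One named "ready" event per worker; the worker is told both names.
            string readyName = @"Global\Workers.Ready." + i;
            ready[i] = new EventWaitHandle(false, EventResetMode.ManualReset, readyName);
            Process.Start(workerPath, readyName + @" Global\Workers.Go");
        }

        WaitHandle.WaitAll(ready);   // rendezvous: every worker has finished init
        go.Set();                    // release the whole set at once
        // For the next set, use a fresh "go" event name (or reset this one
        // only after every worker has observed the signal).
    }
}

// Inside WorkerProgram.exe, after initialization
// (args[0] = its ready-event name, args[1] = the go-event name):
//     EventWaitHandle.OpenExisting(args[0]).Set();       // "init complete"
//     EventWaitHandle.OpenExisting(args[1]).WaitOne();   // block until "start"
```
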
Does a destructor get called if the app crashes? If it's an unhandled exception I'm guessing it does, but what about more serious errors, or something like a user killing the application process?
And a few more potentially dumb questions:
what happens to all the objects in an app when the app exits and all finalizers have been executed - do the objects get garbage collected or are they somehow all "unloaded" with the process or appdomain?
is the garbage collector part of each application (runs in the same process) or is it independent?
I would encourage you to try this for yourself. For example:
using System;

class Program {
    static void Main(string[] args) {
        var t = new Test();
        throw new Exception("kaboom");
    }
}

class Test {
    ~Test() { Console.WriteLine("finalizer called"); }
}
Run this at the command prompt so you can see the last gasp. First with the throw statement commented out.
Like any unhandled exception in Windows, the default exception filter that Windows provides invokes the Windows Error Reporting dialog, displayed by WerFault.exe. If you click "Close program", WerFault will use TerminateProcess() to kill the program. That's a quick end, there is no opportunity to run the finalizer thread, as would happen when a program exits normally.
Windows then takes care of cleaning up the shrapnel. It automatically closes any operating system handles your program might have opened but didn't get a chance to close in the finalizer. Files are the trickier problem here; their buffers don't get flushed, and you'll easily end up with a partially written file on disk.
If the application is killed, it almost certainly loses control immediately, and there is no chance for it to call the destructor.
I don't even know C#, but based on my experiences with other programming languages I would guess: If an app crashes, that means there's something seriously wrong with it. Incorrect memory handling etc. It would be strange for any programming language to try to execute destructors/deallocators/finalizers/... in such a case. Things would probably just go more wrong ;)
Update: (forgot to try to answer your other questions) again, not C#-specific, but typically there is no guarantee that destructors/deallocators/finalizers/... actually get called. The reason for this is that when a process quits it is much easier and more efficient to simply "zap" the memory block used for the process than to run its destructors etc. to clean up the memory.
I'm not sure how to answer your last question without going into too much technical detail. There are several ways in which garbage collectors can be designed and made to run, the easiest is that garbage collection stops the current process and continues it when it's done, although it is also possible (but more difficult) to have garbage collectors which run concurrently with processes whose memory they are collecting.
You may want to read up on garbage collection theory to better understand all of this. There's actually a whole site about just this topic: www.memorymanagement.org.
I have a C# application which launches another executable using Process.Start().
99% of the time this call works perfectly fine. After the application has run for quite some time though, Process.Start() will fail with the error message:
Insufficient system resources exist to complete the requested service
Initially I thought this must have been due to a memory leak in my program - I've profiled it fairly extensively and it doesn't appear there's a leak - the memory footprint will still be reasonable even when this message failed.
Immediately after a failure like this, if I print some of the system statistics it appears that I have over 600MB of RAM free, plenty of space on disk, and the CPU usage is effectively at 0%.
Is there some other system resource I haven't thought of? Am I running into a memory limit within the .NET VM?
Edit2:
I opened up the application in SysInternals Process Explorer and it looks like I'm leaking Handles left and right:
Handles Used: 11,950,352 (!)
GDI Handles: 26
USER Handles: 22
What's strange here is that the Win32 side of handles seem very reasonable, but somehow my raw handle count has exploded waaaaay out of control. Any ideas what could cause a Handle leak like this? I was originally convinced it was Process.Start() but that would be USER handles, wouldn't it?
Edit:
Here's an example of how I'm creating the process:
var pInfo = new ProcessStartInfo(path, ClientStartArguments)
{
    UseShellExecute = false,
    WorkingDirectory = workingDirectory
};
ClientProcess = Process.Start(pInfo);
Here's an example of how I kill the same process (later in the program after I have interacted with the process):
Process[] clientProcesses = Process.GetProcessesByName(ClientProcessName);
if (clientProcesses.Length > 0)
{
    foreach (var clientProcess in clientProcesses.Where(
        clientProcess => clientProcess.HasExited == false))
    {
        clientProcess.Kill();
    }
}
The problem here is with retained process handles. As we can see from your later edits you are keeping a reference to the Process object returned by Process.Start(). As mentioned in the documentation of Process:
Like many Windows resources, a process is also identified by its handle, which might not be unique on the computer. A handle is the generic term for an identifier of a resource. The operating system persists the process handle, which is accessed through the Handle property of the Process component, even when the process has exited. Thus, you can get the process's administrative information, such as the ExitCode (usually either zero for success or a nonzero error code) and the ExitTime. Handles are an extremely valuable resource, so leaking handles is more virulent than leaking memory.
I especially like the use of the word virulent. You need to dispose and release the reference to Process.
Also check out this excellent question and its corresponding answer: Not enough memory or not enough handles?
Since the Process class implements IDisposable, it is good practice to properly dispose of it when you are done. In this case, it will prevent handle leaks.
using (var p = new Process())
{
    p.StartInfo = new ProcessStartInfo(@"C:\windows\notepad.exe");
    p.Start();
    p.WaitForExit();
}
If you are calling Process.Kill() and the process has already exited, you will get an InvalidOperationException.
That's not an uncommon problem to have with little programs like this. The problem is that you are using a large amount of system resources but very little memory. You don't put enough pressure on the garbage collected heap so the collector never runs. So finalizable objects, the wrappers for system handles like Process and Thread, never get finalized.
Simply disposing the Process object after the process has exited will go a long way toward solving the problem. But it might not solve it completely; any threads that the Process class uses, or that you use yourself, consume 5 operating system handles each. The Thread class doesn't have a Dispose() method. It should have, but it doesn't, since it is next to impossible to call it correctly.
The solution is triggering a garbage collection yourself. Count the number of times you start a process. Every, say, hundredth time, call GC.Collect(). Keep an eye on the Handle count with Taskmgr.exe (use View + Select Columns to add it). Fine-tune the GC.Collect calls so that it doesn't increase beyond, say, 500.
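Put together, the fix might look like this sketch (the every-100th threshold is the arbitrary figure from above; WorkerLauncher and StartWorker are illustrative names):

```csharp
using System;
using System.Diagnostics;

static class WorkerLauncher
{
    static int started;

    public static void StartWorker(string path)
    {
        // Disposing promptly releases the process handle instead of
        // waiting on a finalizer that may never run.
        using (var p = Process.Start(new ProcessStartInfo(path) { UseShellExecute = false }))
        {
            p.WaitForExit();
        }

        // The program allocates little memory, so the GC rarely runs on
        // its own; force an occasional collection so other finalizable
        // wrappers (threads, wait handles) get cleaned up too.
        if (++started % 100 == 0)
        {
            GC.Collect();
            GC.WaitForPendingFinalizers();
        }
    }
}
```
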
I am writing a class that I know needs a lock, because it must be thread safe. But as I am doing Test-Driven Development, I know that I can't write a line of code before creating a test for it, and I find that very difficult to do here, since the tests will get very complex in the end. What do you usually do in those cases? Is there any tool to help with that?
This question is .NET specific
Someone asked for the code:
public class StackQueue
{
    private Stack<WebRequestInfo> stack = new Stack<WebRequestInfo>();
    private Queue<WebRequestInfo> queue = new Queue<WebRequestInfo>();

    public int Count
    {
        get
        {
            return this.queue.Count + this.stack.Count;
        }
    }

    public void Enqueue(WebRequestInfo requestInfo)
    {
        this.queue.Enqueue(requestInfo);
    }

    public void Push(WebRequestInfo requestInfo)
    {
        this.stack.Push(requestInfo);
    }

    private WebRequestInfo Next()
    {
        if (stack.Count > 0)
        {
            return stack.Pop();
        }
        else if (queue.Count > 0)
        {
            return queue.Dequeue();
        }
        return null;
    }
}
Well, you can usually use things like ManualResetEvent to get a few threads into an expected problem state before releasing the gate... but that only covers a small subset of threading issues.
For the bigger problem of threading bugs, there is CHESS (in progress) - maybe an option in the future.
You shouldn't really test for thread safety in a unit test. You should probably have a separate set of stress tests for thread-safety.
Okay, now that you've posted the code:
public int Count
{
    get
    {
        return this.queue.Count + this.stack.Count;
    }
}
This is a great example of where you're going to have trouble writing a unit test that will expose the threading issues in your code. This code potentially needs synchronization, because the values of this.queue.Count and this.stack.Count can change in the middle of the calculation of the total, so it can return a value that's not "correct".
HOWEVER - Given the rest of the class definition, nothing actually depends on Count giving a consistent result, so does it actually matter if it's "wrong"? There's no way to know that without knowing how other classes in your program use this one. That makes testing for threading issues an integration test, rather than a unit test.
When writing multi-threaded code, you must use your brain even more than the usual. You must reason logically about every single line of code, whether it is thread safe or not. It's like proving the correctness of a mathematical formula - you can not prove things like "N + 1 > N for all N" by just giving examples of values of N with which the formula is true. Similarly, proving that a class is thread-safe is not possible by writing test cases that try to expose problems with it. With a test it's only possible to prove that there is a fault, but not that there are no faults.
The best thing that you can do is to minimize the need for multi-threaded code. Preferably the application should have no multi-threaded code (for example by relying on thread-safe libraries and suitable design patterns), or it should be restricted to a very small area. Your StackQueue class looks simple enough that you can make it thread-safe with a little thinking.
Assuming that the Stack and Queue implementations are thread-safe (I don't know .NET's libraries), you just need to make Next() thread-safe. Count is already thread-safe as it is, because no client can use the value returned from it safely without using client-based locking - state dependencies between methods would otherwise break the code.
Next() is not thread-safe, because it has state dependencies between methods. If threads T1 and T2 call stack.Count at the same time and it returns 1, one of them will get the value with stack.Pop(), but the other will call stack.Pop() on an empty stack (which throws InvalidOperationException). You will need a stack and queue with non-blocking versions of Pop() and Dequeue() (ones that return null when empty). Then the code would be thread-safe when written like this:
private WebRequestInfo Next()
{
    WebRequestInfo next = stack.PopOrNull();
    if (next == null)
    {
        next = queue.DequeueOrNull();
    }
    return next;
}
TDD is a tool -- and a good one -- but sometimes you have problems that are not well-solved using a particular tool. I would suggest that, if developing the tests is overly complex, you should use TDD to develop the expected functionality, but perhaps rely on code inspection to assure your self that the locking code that you add is appropriate and will allow your class to be thread-safe.
One potential solution would be to develop the code that would go inside the lock and put it in its own method. Then you might be able to fake this method to test your locking code. In your fake code you could simply establish a wait to ensure that the second thread accessing the code must have waited on the lock until the first thread completes. Without knowing exactly what your code does, I can't be more specific than that.
Multithreading can result in problems so complex that it is next to impossible to write unit tests for them. You might manage to write a unit test that has a 100% failure rate when executed against some code, but after you make it pass, it's quite likely that there are still race conditions and similar problems in the code.
The problem with threading problems is that they crop up randomly. Even if you get a unit test to pass, it doesn't necessarily mean that the code works. So in this case TDD gives a false sense of security and might even be considered a bad thing.
And it is also worth remembering that if a class is thread safe, you can use it from several threads without problems - but if a class is not thread safe, it doesn't immediately follow that you can't use it from several threads without problems. It could still be thread safe in practice; no one just wants to take responsibility for calling it thread safe. And if it's thread safe in practice, it's impossible to write a unit test which fails because of multi-threading. (Of course, most non-thread-safe classes really aren't thread safe and will happily fail.)
Just to be clear, not all classes need locks to be thread safe.
If your tests end up being overly complex, it may be a symptom of your classes or methods being too complicated, too tightly coupled, or taking on too much responsibility. Try to follow the single responsibility principle.
Would you mind posting more concrete information about your class?
Especially with multi-core systems, you can usually test for threading issues, just not deterministically.
The way I usually do it is spin up multiple threads that blast through the code in question and count the number of unexpected results. Those threads run for some short period of time (usually 2-3 seconds), then use Interlocked.CompareExchange to add their results, and exit normally. My test that spun them up then calls .Join on each, then checks to see whether the number of errors was 0.
Sure, it's not foolproof, but with a multi-core CPU, it usually does the job well enough to demonstrate to my coworkers that there's a problem that needs to be addressed with the object (assuming the intended use calls for multi-threaded access).
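As a sketch of that style of test, here is one that hammers a plain Stack&lt;int&gt;, which has the same check-then-act race between Count and Pop as the StackQueue above; the error tally uses Interlocked so the tally itself can't race:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

class StressTest
{
    static readonly Stack<int> stack = new Stack<int>();
    static int errors;

    static void Main()
    {
        for (int i = 0; i < 100000; i++) stack.Push(i);

        var workers = new Thread[8];
        for (int i = 0; i < workers.Length; i++)
        {
            workers[i] = new Thread(() =>
            {
                try
                {
                    // Check-then-act race: Count can change between the
                    // test and the Pop, just like in StackQueue.Next().
                    while (stack.Count > 0)
                        stack.Pop();
                }
                catch (InvalidOperationException)
                {
                    Interlocked.Increment(ref errors);  // popped an emptied stack
                }
            });
            workers[i].Start();
        }
        foreach (Thread t in workers) t.Join();

        // Nondeterministic: errors is usually nonzero on a multi-core box,
        // but a clean run proves nothing.
        Console.WriteLine(errors == 0
            ? "no failures this run"
            : errors + " thread(s) hit the race");
    }
}
```

Note that unsynchronized use of Stack&lt;T&gt; can also corrupt the collection silently rather than throw, which is exactly why a green run proves nothing.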
I agree with the other posters that multithreaded code should be avoided, or at least confined to small portions of your application. However, I still wanted some way to test those small portions. I'm working on a .NET port of the MultithreadedTC Java library. My port is called Ticking Test, and the source code is published on Google Code.
MultithreadedTC lets you write a test class with several methods marked with a TestThread attribute. Each thread can wait for a certain tick count to arrive, or assert what it thinks the current tick count should be. When all current threads are blocked, a coordinator thread advances the tick count and wakes up any threads that are waiting for the next tick count. If you're interested, check out the MultithreadedTC overview for examples. MultithreadedTC was written by some of the same people that wrote FindBugs.
I've used my port successfully on a small project. The major missing feature is that I have no way to track newly created threads during the test.
Most unit tests operate sequentially and not concurrently, so they will not expose concurrency problems.
Code inspection by programmers with concurrency expertise is your best bet.
Those evil concurrency problems often won't show up until you have enough product in the field to generate some statistically relevant trends. They are incredibly hard to find sometimes, so it is often best to avoid having to write the code in the first place. Use pre-existing thread-safe libraries and well-established design patterns if possible.
You find your "invariant" - something that must remain true irrespective of the number and interleaving of client threads. This is the hard part.
Then write a test that spawns a number of threads, exercises the method and asserts that the invariant still holds. This will fail - because there exists no code to make it thread-safe. Red
Add code to make it thread-safe and make the stress test pass. Green. Refactor as needed.
For more details, refer to the GOOS book for their chapters towards the end related to multi-threaded code.