I'm embarking on a C# project where certain tasks will be divvied up and run on multiple threads. I'm looking for as many good tools as possible to help make sure I'm running as efficiently as possible. So I want to watch for things like resource (CPU) usage, blocking, deadlocks, threads that are waiting for work, and so on. I want to be able to compare different approaches to see what works best under different conditions.
I'm also looking to learn what Perfmon counters are more or less useful for trying to optimize and compare threading models.
I also can't afford to purchase anything too expensive, so the free-er the better.
This is a C# project on .NET 3.5, with VS 2008 (though I could use the VS 2010 beta if it offered more help for threading, which I've heard it does).
Thanks.
EDIT: I'm definitely looking for Perfmon recommendations, as well as any other tool that I can also use when I want to monitor the app in a production environment. So, debugging tools are needed, but I also want tools for non-debug environments. Thx.
FURTHER EDIT: Here are a few useful links I've found since I asked the question:
Tools And Techniques to Identify Concurrency Issues (MSDN Magazine Article)
PerfMon - Your debugging buddy (important counters for .NET debugging, including threading related counters)
Indeed Visual Studio 2010 is a good option: http://www.danielmoth.com/Blog/2009/05/parallel-tasks-new-visual-studio-2010.html
One "trick" that helps me when debugging threads is to remember to set each thread's Name property, as it helps a lot during debugging. If the thread's Name property is not assigned, it has a null value and the debugging window will show <No name>, which isn't very helpful.
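A minimal sketch of the tip (the thread body and the name are illustrative); note that Name can only be assigned once per thread:

```csharp
using System;
using System.Threading;

class ThreadNamingDemo
{
    public static string StartNamedWorker()
    {
        string observedName = null;
        var worker = new Thread(() =>
        {
            // Inside the thread, CurrentThread.Name reflects the name assigned below,
            // and it's what the debugger's Threads window displays.
            observedName = Thread.CurrentThread.Name;
        });
        worker.Name = "OrderProcessingWorker"; // can only be set once; null until then
        worker.Start();
        worker.Join();
        return observedName;
    }

    static void Main()
    {
        Console.WriteLine(StartNamedWorker()); // prints "OrderProcessingWorker"
    }
}
```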
I've had the best results by creating a logger class and instrumenting my code so that I can catch when threads start and stop and measure elapsed time for their internal processes. You can also use the various .Net libs for capturing memory load as mentioned here: How to get memory available or used in C#.
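A rough sketch of such an instrumentation logger (all names are illustrative; an in-memory queue stands in for the queue/database/file sinks described below):

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;

// Each thread logs start/stop markers and elapsed time for a named operation.
class ThreadLogger
{
    private static readonly Queue<string> Entries = new Queue<string>();
    private static readonly object Gate = new object();

    public static void Measure(string operation, Action work)
    {
        Log(operation + " start");
        var sw = Stopwatch.StartNew();
        work();
        sw.Stop();
        Log(string.Format("{0} end ({1} ms)", operation, sw.ElapsedMilliseconds));
    }

    private static void Log(string message)
    {
        lock (Gate) // a queue->database sink on its own thread would replace this
        {
            Entries.Enqueue(string.Format("[thread {0}] {1}",
                Thread.CurrentThread.ManagedThreadId, message));
        }
    }

    public static int EntryCount
    {
        get { lock (Gate) { return Entries.Count; } }
    }
}
```

Wrapping each task's body in `ThreadLogger.Measure("parse", ...)` then gives start/stop/elapsed records per thread that can be replayed or queried later.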
I've logged to queues, databases (System.Data.SQLite is great for this), and files, and even a combination: queue->database with the logger on a separate thread. This has been particularly helpful for multi-threaded Windows services because I can monitor the logs while it's running and even control the logging verbosity through a separate control table via a small Windows app. Even the sysadmins find it easy to use.
It's not that much extra work and you always have it once you deploy.
Another thing I would suggest: expose some of your classes as WMI instances. This way you can tweak settings and call functions without restarting the application, and you can see the effects immediately. See this article.
I know this is a C# question, but still: in C# you can use any CLR library, including those written in F#. In F# there are lots of cases where concurrency becomes very easy due to its functional nature (no side effects). Maybe writing some parts in F# would pay off.
Related
Is there a way/system to debug/monitor code without stopping execution?
In industrial automation control programming (PLC/PAC/DCS) it is possible to connect the debugger while the program is running, and see in the code editor the value of variables and expressions, without setting breakpoints or tracepoints.
As an example, let's have a F# multithreaded application, where code is executed in a continuous loop or triggered by timers. Is there a way to attach a debugger like Visual studio Debugger and see the values of variables and expressions (in the code editor or in a watch pane) WITHOUT interrupting the execution?
It doesn't matter if it's not synchronous, it's acceptable if the debugger/monitor does not capture all the code scans.
I am tasked to create a high-level controller for a process plant, and I would like to use C# or F#, or even C++ with a managed or native application, instead of a PAC system. But being forced to interrupt execution to debug is a huge disadvantage in this kind of application.
UPDATE
First of all thanks to all for their answer.
Based on those answers, though, I realized that I probably need to reformulate my question as follows:
Is anyone aware of any library/framework/package/extension that allows working with a native or managed application on Windows or Linux (C#, F# or C++) in exactly the same way as a PAC development platform, specifically:
1) Put the dev platform in "status" mode, where it automatically shows the runtime values of variables and expressions present in the code excerpt currently visible, without interrupting execution?
2) Create watch windows that show the runtime values of variables and expressions, again without interrupting execution?
Also, what I am looking for is something that (like any PAC platform) offers these features OUT OF THE BOX, without requiring any change in the application code (like adding log instructions).
Thank you in advance
UPDATE 2
It looks like there is something (see http://vsdevaids.webs.com/); does anyone know whether they are still available somewhere?
UPDATE 3
For those interested, I managed to download the last available release of VSDevAids. I installed it and it seems to work, but it's pointless without a licence, and I couldn't find information on how to reach the author.
http://www.mediafire.com/file/vvdk2e0g6091r4h/VSDevAidsInstaller.msi
If somebody has better luck, please let me know.
This is a normal requirement - needing instrumentation/diagnostic data from a production system. It's not really a debugger, and it's usually one of the first things you should establish in your system design.
Not knowing your system at all, it's hard to say what you need, but generally these fall into two categories:
human-readable trace - something like log4net is what I would recommend
machine-readable counters, etc. Say "number of widget shavings in the last pass", ... This one is harder to generalize; you could layer it onto log4net too, or invent your own pipe.
With regards to your edited question, I can almost guarantee you that what you are looking for does not exist. Consequence-free debugging/monitoring of even moderate usefulness for production code with no prior effort? I'd have heard of it. Consider that both C++ and C# are extremely cross-platform. There are a few caveats:
There are almost certainly C++ compilers built for very specific hardware that do what you require. This hardware is likely to have very limited capabilities, and the compilers are likely to otherwise be inferior to their larger counterparts, such as gcc, clang, MSVC, to name a few.
Compile-time instrumentation can do what you require, although it affects speed and memory usage, and even stability, in my experience.
There ARE also frameworks that do what you require, but not without affecting your code. For example, if you are using WPF as your UI, it's possible to monitor anything directly related to the UI of your application. But...that's hardly a better solution than log4net.
Lastly, there are tools that can monitor EVERY system call your application makes for both Windows (procmon.exe/"Process Monitor" from SysInternals) and Linux (strace). There's very little you can't find out using these. That said, the ease of use is hardly what you're looking for, and strictly internal variables are still not going to be visible. Still might be something to consider if you know you'll be making system calls with the variables you're interested in and can set up adequate filtering.
Also, you should reconsider your "No impact on the code" requirement. There are .NET frameworks that can allow you to monitor an entire class merely by making a single function call during construction, or by deriving from a class in the framework. Many modern UIs are predicated on the UIs being able to be notified of any change to the data they are monitoring. Extensive effort has gone into making this as powerful and easy as possible. But it does require you to at least consider it when writing your code.
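As an illustration of the "single function call or base class" idea (the class and property here are hypothetical), the standard .NET opt-in is INotifyPropertyChanged: once a class raises change notifications, any observer can watch its values at runtime without a debugger attached.

```csharp
using System;
using System.ComponentModel;

// A class opts into monitoring simply by implementing INotifyPropertyChanged.
class PlantSetpoint : INotifyPropertyChanged
{
    public event PropertyChangedEventHandler PropertyChanged;
    private double _target;

    public double Target
    {
        get { return _target; }
        set
        {
            _target = value;
            var handler = PropertyChanged;
            if (handler != null)
                handler(this, new PropertyChangedEventArgs("Target"));
        }
    }
}

class MonitorDemo
{
    static void Main()
    {
        var setpoint = new PlantSetpoint();
        // The "watch window" here is just an event subscription.
        setpoint.PropertyChanged += (s, e) =>
            Console.WriteLine("changed: " + e.PropertyName);
        setpoint.Target = 42.0; // prints "changed: Target"
    }
}
```

It is exactly the kind of small, up-front accommodation in the code that the "no impact" requirement rules out.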
Many years ago (think 8-bit 6502/6809 days) you could buy (or usually rent - I seem to remember a figure of £40K to purchase one in the late 80s) a processor simulator that would let you replace the processor in your design with a pin-compatible device that had a flying lead to the simulator box. This would allow things like capturing the instructions/data leading up to a processor interrupt, or some other way of stopping the processor (even a "push button to stop code" was possible). You could even step backwards, allowing you to see why an instruction or branch happened.
In these days of multi-core, nm-technology, I doubt there is such a thing.
I have been searching for this kind of feature for quite a long time, with no luck, unfortunately. Submitting the question to the StackOverflow community was sort of a "last resort", so now I'm ready to conclude that it doesn't exist.
VSDevAids (as @zzxyz pointed out) is not a solution, as it requires significant support from the application itself.
Pod CPU emulators (mentioned by @Neil), aka in-circuit emulators (ICE), and their evolutions are designed to thoroughly test the interaction between firmware and hardware; they are not so useful in high-level programming (especially managed code like .NET).
Thanks for all contributions.
We have a very high performance multitasking, near real-time C# application. This performance was achieved primarily by implementing cooperative multitasking in-house with a home grown scheduler. This is often called micro-threads. In this system all the tasks communicate with other tasks via queues.
The specific problem that we have seems to only be solvable via first class continuations which C# does not support.
Specifically, the problem arises in two cases dealing with queues. A particular task may perform some work before placing an item on a queue. What if the queue is full?
Conversely, a different task may do some work and then need to take an item off of a queue. What if that queue is empty?
We have solved this in 90% of the cases by linking queues to tasks to avoid tasks getting invoked if any of their outbound queues are full or inbound queue is empty.
Furthermore, certain tasks were converted into state machines so they can handle a full/empty queue and continue without waiting.
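A simplified sketch of what such a state-machine conversion might look like (not the actual scheduler; names are illustrative). When its outbound queue is full, Step() returns false instead of blocking, and the saved state means the work is not redone when the scheduler invokes the task again:

```csharp
using System;
using System.Collections.Generic;

class ProducerTask
{
    private enum State { Produce, Enqueue }
    private State _state = State.Produce;
    private int _pending; // item produced but not yet enqueued
    private int _next;

    public bool Step(Queue<int> outbound, int capacity)
    {
        switch (_state)
        {
            case State.Produce:
                _pending = _next++;      // do the (possibly expensive) work once
                _state = State.Enqueue;
                goto case State.Enqueue;
            case State.Enqueue:
                if (outbound.Count >= capacity)
                    return false;        // queue full: yield to the scheduler, keep _pending
                outbound.Enqueue(_pending);
                _state = State.Produce;
                return true;
        }
        return false;
    }
}
```

The cost, as the question notes, is that every task touching a queue has to be contorted into this shape, which is exactly what first-class continuations would avoid.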
The real problem arises in a few edge cases where it is impractical to do either of those solutions. The idea in that scenario would be to save the stack state at that point and switch to a different task, so that it can do the work, and subsequently retry the waiting task whenever it is able to continue.
In the past, we attempted to have the waiting task call back into the scheduler (recursively) to allow the other tasks to run and later retry the waiting task. However, that led to too many "deadlock" situations.
There was an example somewhere of a custom CLR host that makes .NET threads actually operate as "fibers", which essentially allows switching stack state between threads. But now I can't seem to find any sample code for that. Plus, it seems it would take significant complexity to get right.
Does anyone have any other creative ideas how to switch between tasks efficiently and avoid the above problems?
Are there any other CLR hosts that offer this, commercial or otherwise? Is there any add-on native library that can offer some form of continuations for C#?
There is the C# 5 CTP, which performs a continuation-passing-style transformation over methods declared with the new async keyword, and continuation-passing based calls when using the await keyword.
This is not actually a new CLR feature but rather a set of directives for the compiler to perform the CPS transformation over your code and a handful of library routines for manipulating and scheduling continuations. Activation records for async methods are placed on the heap instead of the stack, so they're not tied to a specific thread.
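A minimal sketch of what the transformation buys you (method names invented for illustration). The compiler rewrites the async method into a state machine: everything after the await becomes a continuation scheduled when the task completes, so no thread sits blocked in the meantime.

```csharp
using System;
using System.Threading.Tasks;

class AwaitDemo
{
    public static async Task<int> ComputeAsync()
    {
        int partial = await Task.Run(() => 20 + 20);
        // Execution resumes here as a continuation, possibly on another thread;
        // this method's activation record lives on the heap, not a thread's stack.
        return partial + 2;
    }

    static void Main()
    {
        Console.WriteLine(ComputeAsync().Result); // 42
    }
}
```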
Nope, not going to work. C# (and even IL) is too complex a language to perform such transformations (CPS) in a general way. The best you can get is what C# 5 will offer. That said, you will probably not be able to break/resume within higher-order loops/iterations, which is really what you want from general-purpose reifiable continuations.
Fiber mode was removed from v2 of the CLR because of issues under stress, see:
Fiber mode is gone...
Fibers and the CLR
Question to the CLR experts : fiber mode support in hosting
To my knowledge fiber support has not yet been re-added, although from reading the above articles it may be added again (however, the fact that nothing has been mentioned on the topic for 6-7 years makes me believe it's unlikely).
FYI, fiber support was intended as a way for existing applications that use fibers (such as SQL Server) to host the CLR in a way that allows them to maximise performance, not as a method to allow .NET applications to create hundreds of threads - in short, fibers are not a magic-bullet solution to your problem. However, if you have an application that uses fibers and wishes to host the CLR, then the managed hosting APIs do provide the means for the CLR to "work nicely" with your application. A good source of information on this would be the managed hosting API documentation, or looking into how SQL Server hosts the CLR, on which there are several highly informative articles around.
Also take a quick read of Threads, fibers, stacks and address space.
Actually, we decided on a direction to go with this. We're using the Observer pattern with message passing. We built a home-grown library to handle all communication between "Agents", which are similar to an Erlang process. Later we will consider using AppDomains to even better separate Agents from each other. Design ideas were borrowed from the Erlang programming language, which has extremely reliable multi-core and distributed processing.
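A toy sketch of the agent idea (not the actual home-grown library; BlockingCollection, from .NET 4, stands in for the mailbox): each agent owns a private mailbox drained by a single dedicated thread, so its internal state is never touched concurrently, Erlang-style.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

class Agent
{
    private readonly BlockingCollection<int> _mailbox = new BlockingCollection<int>();
    private readonly Thread _loop;

    public int Sum { get; private set; }

    public Agent()
    {
        _loop = new Thread(() =>
        {
            foreach (int message in _mailbox.GetConsumingEnumerable())
                Sum += message; // only this thread ever mutates the agent's state
        });
        _loop.Start();
    }

    public void Post(int message)
    {
        _mailbox.Add(message); // any thread may send; the mailbox serializes delivery
    }

    public void Shutdown()
    {
        _mailbox.CompleteAdding(); // drains remaining messages, then the loop exits
        _loop.Join();
    }
}

class AgentDemo
{
    static void Main()
    {
        var agent = new Agent();
        agent.Post(1); agent.Post(2); agent.Post(3);
        agent.Shutdown();
        Console.WriteLine(agent.Sum); // 6
    }
}
```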
The solution to your problem is to use lock-free algorithms, allowing for system-wide progress of at least one task. Traditionally this requires CPU-dependent inline assembler to get atomic CAS (compare-and-swap). Wikipedia has an article, and there are patterns described in the book by Douglas Schmidt called "Pattern-Oriented Software Architecture: Patterns for Concurrent and Networked Objects". It is not immediately clear to me how you would do that under the .NET framework.
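For what it's worth, under .NET no inline assembler is needed: Interlocked.CompareExchange exposes atomic CAS directly. A minimal lock-free counter as a sketch of the retry-loop pattern:

```csharp
using System;
using System.Threading;

class LockFreeCounter
{
    private int _value;

    public int Value { get { return _value; } }

    public void Increment()
    {
        int seen, desired;
        do
        {
            seen = _value;       // snapshot the current value
            desired = seen + 1;
            // Publish only if no other thread changed _value since the snapshot;
            // if the CAS fails, some other thread made progress, so just retry.
        } while (Interlocked.CompareExchange(ref _value, desired, seen) != seen);
    }
}

class CasDemo
{
    static void Main()
    {
        var counter = new LockFreeCounter();
        var threads = new Thread[4];
        for (int i = 0; i < threads.Length; i++)
        {
            threads[i] = new Thread(() =>
            {
                for (int n = 0; n < 1000; n++) counter.Increment();
            });
            threads[i].Start();
        }
        foreach (var t in threads) t.Join();
        Console.WriteLine(counter.Value); // 4000, with no locks taken
    }
}
```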
Another way of solving your problem is to use the publish-subscribe pattern, or possibly thread pools.
Hope this was helpful.
I'm going to design an application (C# or VB.NET) using the .NET Framework that will run for a very long time. It may be restarted every year, or even less often...
Is there anything (using special design patterns or so) which I must care about in designing "Long time running applications in .NET"?
Is .NET even a good platform for this kind of application, or should I use another platform such as J2SE?
(It's not a web application.)
I actually would say that using .NET is well-suited to long running applications. Managed code, in general, tends to do fairly well in this type of scenario, as a compacting GC helps prevent issues that can arise due to memory fragmentation over time.
That being said, it's difficult to give much guidance, as there's very little information in the question itself. The "every year or more" run times is not enough information to say that a particular framework or language choice would benefit - any language can work, as the issues that arise from long running applications tend to be more design issues, and less framework/language/toolset/etc.
I've written some .NET-based applications which run as services and stay continually running for very long times, and never had any issues with the application (at least none related to the technology itself).
I'd worry less about keeping an app running and more about what happens when it inevitably stops - and make no mistake, it WILL stop.
There are many factors that can go wrong; a crash, server fault, network failure or someone simply stopping the app. The true work will be resuming the application's tasks after it restarts.
.NET's garbage collector is very good, so as long as you don't have any non-obvious memory leaks, you should be OK. "Non-obvious" includes not releasing event handlers when you're truly done with them, using lambda expressions for event handlers in other classes, and that sort of thing.
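A small sketch of the event-handler variety of leak (types invented for illustration): a long-lived publisher's delegate list keeps short-lived subscribers reachable until they unsubscribe, so "done with it" must include detaching the handler.

```csharp
using System;

class Publisher
{
    public event EventHandler Tick;

    public void Raise()
    {
        var handler = Tick;
        if (handler != null) handler(this, EventArgs.Empty);
    }

    public int SubscriberCount
    {
        get { return Tick == null ? 0 : Tick.GetInvocationList().Length; }
    }
}

class Subscriber
{
    private readonly Publisher _publisher;

    public Subscriber(Publisher publisher)
    {
        _publisher = publisher;
        _publisher.Tick += OnTick; // the publisher now holds a reference back to us
    }

    private void OnTick(object sender, EventArgs e) { /* react */ }

    // Without this, the subscriber stays reachable (and uncollectable)
    // for the publisher's entire lifetime.
    public void Detach()
    {
        _publisher.Tick -= OnTick;
    }
}
```

In a service that runs for a year, a publisher that outlives thousands of forgotten subscribers is exactly the slow, non-obvious leak to watch for.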
Be sure that you're catching and logging all unhandled exceptions. If it does die, you'll want to know why.
Also, take a look at the application restart support in Windows 7. This can restart your app in case it does fail. Although it's written for unmanaged code, it's accessible for .net in the Windows 7 API code pack.
If I have an existing solution containing multiple c# projects, are there any static analysis tools that can help me determine which areas of code are the most often used?
I'd like to use this information in order to determine which areas should have their test coverage ramped up first.
I've looked at some static analysis tools already, but they mostly seem to focus on things like complexity, coding conventions, code duplication etc.
Alternatively, if there aren't any analysis tools available that can do this, do you have any advice about how to determine which code I ought to focus on testing first?
Thanks!
EDIT: Just to clarify, what I'm looking for isn't code coverage. I'd like a rundown of which parts of my application are most often used, so that I can in turn focus on improving the coverage in those areas. I'm trying to avoid just writing tests for areas that don't yet have any, as they may be edge cases that aren't often executed.
Even static analysis tools which do try to figure out what happens at run-time do not usually try to estimate how often a piece of code is executed. The subject is hard enough as it is!
But dynamic analysis tools (e.g. profiling tools that either rely on transparent instrumentation of the code or use sampling) can tell you, after one or several "typical" executions (you provide the entries that you judge typical), how often this or that function was executed.
See Profiling (computer programming) on Wikipedia.
If I understood the question right, you are looking for a profiler. Give EQATEC Profiler a try. It's free.
It's originally intended to profile an application before shipping (to detect bottlenecks by measuring execution time of methods, etc.), so I'm not sure if it's suitable for an application in a production environment. At the very least it changes your code for profiling purposes, and that might be unwanted. You should check this out.
Code coverage appears to be what you want.
NCover is a popular code coverage tool for .NET, if you can afford it.
What you're asking for is simply impossible to do accurately. The number of times something is executed can and usually will depend on the data that's entered at run-time. The best you can hope for from a static analysis tool is:
a direct answer when it can be statically determined
an O(N)-style analysis otherwise
Even the latter would be quite difficult to get right overall. For example, it would need to know the complexity of essentially every function in the (huge and ever-expanding) .NET library. Some of those are hard to even characterize.
Just for example, how long does it take to allocate a block of memory? Well, it's usually nearly constant time - but it's always possible that an allocation can trigger a garbage collection cycle, in which case the time taken will be (roughly) proportional to the number of objects still in use that were allocated since the last GC cycle...
"Profiler" is what you're looking for; which you choose is up to you.
I've used HP's Diagnostic Server to do this, although it'd cost money. It'll tell me what methods are called how many times, and the average and worst-case time spent in them.
As an important safety tip, running a profiler will slow down the execution of your code; it's not ideal for a long-term installation into a production environment.
If you want to see just what is used:
SD C# Test Coverage Tool
If you want to see how often it is used:
SD C# Profiler Tool
If coverage is not what you're looking for, then you could use two things:
As suggested, a profiler, and run representative test scenarios.
Performance counters, which lead to a more permanent solution. These are quite difficult to implement, but they are helpful for diagnosing performance deltas by analyzing the counter reports. One way to implement them would be to wrap boundaries and manage the counters from those wrappers. Be aware that it's much easier to integrate this into a new project than into an existing one.
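One possible shape for such a boundary wrapper, sketched here with in-process counters rather than real PerfMon plumbing (names are illustrative; a production version could push these values into System.Diagnostics.PerformanceCounter instances for PerfMon to read):

```csharp
using System;
using System.Collections.Concurrent;
using System.Diagnostics;

// Each wrapped call bumps a hit count and accumulates elapsed time per boundary.
static class BoundaryCounters
{
    private static readonly ConcurrentDictionary<string, long> Hits =
        new ConcurrentDictionary<string, long>();
    private static readonly ConcurrentDictionary<string, long> TotalMs =
        new ConcurrentDictionary<string, long>();

    public static T Wrap<T>(string boundary, Func<T> call)
    {
        var sw = Stopwatch.StartNew();
        try
        {
            return call();
        }
        finally
        {
            sw.Stop();
            Hits.AddOrUpdate(boundary, 1, (_, n) => n + 1);
            TotalMs.AddOrUpdate(boundary, sw.ElapsedMilliseconds,
                (_, t) => t + sw.ElapsedMilliseconds);
        }
    }

    public static long HitCount(string boundary)
    {
        long n;
        return Hits.TryGetValue(boundary, out n) ? n : 0;
    }
}
```

Calls at an application boundary then become, for example, `BoundaryCounters.Wrap("db-read", () => repository.Load(id))`, and the accumulated report shows which areas are actually exercised most.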
I have many unused computers at home. What would be the easiest way for me to utilize them to parallelize my C# program with little or no code changes?
The task I'm trying to do involves looping through lots of english sentences, the dataset can be easily broken into smaller chunks, processed in different machines concurrently.
… with little or no code changes?
Difficult. Basically, look into WCF as a way to communicate between various instances of the program across the network. Depending on the algorithm, the structure might have to be changed drastically, or not at all. In any case, you have to find a way to separate the problem into parts that act independently of each other. Then you have to devise a way of distributing these parts between different instances, and of collecting the resulting data.
PLinq offers a great way to parallelize your program without big changes but this only works on one process, across different threads, and then only if the algorithm lends itself to parallelization. In general, some manual refactoring is necessary.
That's probably not possible.
How to parallelize a program depends entirely on what your program does and how it is written, and usually requires extensive code changes and increases the complexity of your program many fold.
The usual way to easily increase concurrency in a program is to take a task that is repeated many times and write a function that splits that task into chunks and sends them to different cores to process.
The answer depends on the nature of the work your application will be doing. Different types of work have different possible parallelization solutions. For some types there is no possible/feasible way to parallelize.
The easiest scenario I can think of is for an application which work can easily be broken in discrete job chunks. If this is the case, then you simply design your application to work on a single job chunk. Provide your application with the ability to accept new jobs and deliver the finished jobs. Then, build a job scheduler on top of it. This scheduler can be part of the same application (configure one machine to be the scheduler and the rest as clients), or a separate application.
There are other things to consider: How will communication among the machines occur (files? network connections?)? Does the application need to report, or be queried about, the percentage of the job completed? Is there a need to be able to force the application to stop processing the current job? Etc.
If you need a more detailed answer, edit your question and include details about the application, the problem the application solves, the expected number of jobs, etc. Then the community will come up with more specific answers.
Dryad (Microsoft's variation of MapReduce) addresses exactly this problem (parallelize .net programs across multiple PCs). It's in research stage right now.
Too bad there are no CTPs yet :-(
You need to run your application on a distributed system, google for distributed computation windows or for grid computing c#.
Is each sentence processed independently, or are they somehow combined? If your processing operates on a single sentence at a time, you don't need to change your code at all. Just execute the same code on each of your machines and divide the data (your list of sentences) between them. You can do this either by installing a portion of the data on each machine, or by sharing the database and assigning a different chunk to each machine.
If you want to change your code slightly to facilitate parallelism, share the entire database and have the code "mark" each sentence as it's processed, then look for the next unmarked sentence to process. This will give you a gentle introduction to the concept of thread safety -- techniques that ensure one processor doesn't adversely interfere with another.
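A toy in-memory version of the "mark and claim" idea (class and data invented for illustration). The lock is the thread-safety piece: it guarantees two workers never claim the same sentence. With a shared database, a transaction (or an UPDATE ... WHERE unclaimed) would play the same role.

```csharp
using System;

class SentenceQueue
{
    private readonly string[] _sentences;
    private readonly bool[] _claimed;
    private readonly object _gate = new object();

    public SentenceQueue(string[] sentences)
    {
        _sentences = sentences;
        _claimed = new bool[sentences.Length];
    }

    // Returns the next unmarked sentence and marks it, or null when none remain.
    // Any worker thread (or machine, via a shared store) can call this safely.
    public string ClaimNext()
    {
        lock (_gate)
        {
            for (int i = 0; i < _sentences.Length; i++)
            {
                if (!_claimed[i])
                {
                    _claimed[i] = true;
                    return _sentences[i];
                }
            }
            return null;
        }
    }
}
```

Each worker simply loops on `ClaimNext()` until it returns null, processing whatever sentence it receives.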
As always, the more details you can provide about your specific application, the better the SO community can tailor our answers to your purpose.
Good luck -- this sounds like an interesting project!
Before I would invest in parallelizing your program, why not just try breaking the datasets down into pieces and manually run your program on each computer and collate the outputs by hand. If that works, then try automating it with scripts and write a program to collate the outputs.
There are several software solutions that allow you to use commodity based hardware. One is Appistry. I work at Appistry and we have done numerous solutions to run C# applications across hundreds of machines.
A few useful links:
http://www.appistry.com/resource-library/index.html
You can download the product for free here:
http://www.appistry.com/developers/
Hope this helps
-Brett
You might want to look at Flow-Based Programming - it has a Java and a C# implementation. Most approaches to this problem involve trying to take a conventional single-threaded program and figure out which parts can run in parallel. FBP takes a different approach: the application is designed from the start in terms of multiple "black-box" components running asynchronously (think of a manufacturing assembly line). Since a conventional single-threaded program acts like a single component in the FBP environment, it is very easy to extend an existing application. In fact, pieces of an existing app can often be broken off and turned into separate components, provided they can run asynchronously with the rest of the app (i.e. are not subroutines). Someone called this "turning an iceberg into ice cubes".