I'm writing a calculation-intensive program in C# using the TPL. Preliminary benchmarking shows a good reduction in computation time on processors with more cores/threads.
However, there is a limit to how many threads are available on a single CPU (I think even the best Xeons money can buy currently have about 16).
I've been reading that render farms, built as a 'grid' of multiple inexpensive CPUs each in its own machine, are a good way to increase the overall core count, but I have no idea how to go about implementing one. Is it implemented at the OS level with Microsoft server technology (and if so, how?), or do I also need to modify the C# code itself?
Any help or links to existing information would be greatly appreciated.
If you want to do this at scale (100s of nodes) then developing your own system is hard. You have to handle nodes becoming unavailable, data replication to each node, tracking job progress, and more. It's a long list. You also need to consider the sort of communication you're going to require between your nodes. Remember that the cost of sending a message (data) from one thread to another is tiny compared to the cost of sending it to another machine across a network (even a fast one). You may have to completely rewrite your multithreaded application to run well on a distributed system, even to the point of using a completely different algorithm.
Hadoop
Microsoft had plans to commercialize Dryad as LINQ to HPC, but this project was sidelined a while back (I worked on this project before I left Microsoft). I believe you can still get the final "public preview", but it's unsupported. The SQL team opted to work with the Hadoop/Hortonworks people on getting a Windows/Azure/.NET friendly Hadoop distribution off the ground. As far as I know, the only thing they have delivered is HDInsight, a Hadoop service running in Azure.
There is now a Microsoft .NET SDK For Hadoop which will allow you to manage a cluster and submit jobs etc. It does not seem to allow you to write code that executes on the Hadoop nodes. You can however use the Hadoop streaming API. This is fairly low level but is language agnostic so you can pretty much use it to integrate map reduce code written in any language with Hadoop. More details on this can be found in this blog post.
Hadoop for .NET Developers
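To make the streaming approach concrete, here is a minimal sketch of a word-count mapper and reducer. It relies only on the streaming contract: records arrive on stdin as lines, are emitted on stdout as tab-separated key/value pairs, and reach the reducer sorted by key. It is written in Python for brevity, and the same shape works for a C# console application. The function names and the "map"/"reduce" command-line argument are my own conventions, not part of Hadoop.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' record per word."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reduce_lines(lines):
    """Reducer: input arrives sorted by key, so equal keys are adjacent
    and can be summed in a single pass."""
    records = (line.rstrip("\n").split("\t", 1) for line in lines)
    for key, group in groupby(records, key=lambda rec: rec[0]):
        total = sum(int(value) for _, value in group)
        yield f"{key}\t{total}"


# Hypothetical convention: run as "script.py map" or "script.py reduce".
if __name__ == "__main__" and sys.argv[1:] in (["map"], ["reduce"]):
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    for record in step(sys.stdin):
        print(record)
```

You can try the pair locally without a cluster by piping the mapper output through a sort into the reducer, which mimics what the framework does between the two phases.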
If you want to do this at a smaller scale (10s of nodes) then I would look for something like MPI.NET. It looks like this project has been abandoned, but something similar is probably what you want.
You might look into something like Dryad - http://research.microsoft.com/en-us/projects/dryadlinq/default.aspx
On the other hand, it might be a bit too much for your situation, but the ideas in Dryad could be simplified to fit your needs.
You might also look into making your own TaskScheduler, which could handle the distribution of threads to agents running on other boxes, but you would have to implement a simple socket client/server communication to get and push the data.
Another, slightly odd suggestion, which might be okay for investigating things, is the following.
Let the master of the calculation cut the problem into as many pieces as there are available client computers.
Write the parameters to kick off the calculation for each client to a file on a network share visible to all.
Let the clients look for files dedicated to them and kick off the calculation for their piece when a file appears. The output is written back to a result file.
The server will sit and listen for all clients completing their jobs.
The files could be replaced with a database, low-level sockets, REST services, Web Services etc. depending on your needs.
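A minimal sketch of the file-based variant, in Python for brevity. All names (`job_N.json`, `result_N.json`, the function names) are made up for illustration. The write-then-rename trick is there so a client never reads a half-written file. Rename is atomic on a local filesystem, but whether that holds on your network share is an assumption you would need to check.

```python
import json
from pathlib import Path


def split_work(items, n_clients):
    """Cut the problem into n_clients roughly equal slices."""
    return [items[i::n_clients] for i in range(n_clients)]


def write_atomically(path, payload):
    # Write under a temporary name, then rename into place.
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(payload))
    tmp.replace(path)


def publish_jobs(shared_dir, items, n_clients):
    """Master: write one parameter file per client."""
    for client_id, chunk in enumerate(split_work(items, n_clients)):
        write_atomically(shared_dir / f"job_{client_id}.json", chunk)


def run_client_once(shared_dir, client_id, compute):
    """Client: if its job file exists, compute and write the result file."""
    job = shared_dir / f"job_{client_id}.json"
    if not job.exists():
        return False
    params = json.loads(job.read_text())
    write_atomically(shared_dir / f"result_{client_id}.json", compute(params))
    job.unlink()
    return True


def collect_results(shared_dir, n_clients):
    """Master: return all results once every client is done, else None."""
    paths = [shared_dir / f"result_{i}.json" for i in range(n_clients)]
    if not all(p.exists() for p in paths):
        return None
    return [json.loads(p.read_text()) for p in paths]
```

In practice each client would call `run_client_once` in a loop with a short sleep, and the master would poll `collect_results` the same way. This sketch has no handling for a client that dies mid-job, which is exactly the kind of thing that makes larger systems hard.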
Say I created a WinForms app and distributed it to anonymous users, and I want a way to get statistics on how often users open the app. One way I can think of is requesting a (lightweight) webpage on app startup and then analyzing how many times the webpage is opened.
Any other ways to get the statistics?
The best way is to use one of the existing tools and integrate it into your application for usage statistics and analytics, for example EQATEC, as mentioned before.
You also have PreEmptive Analytics: http://www.preemptive.com/products/runtime-intelligence/overview
This is one of the most commonly used tools, especially across the whole ALM process, and it takes minimal effort to adopt.
You also need to know exactly what kind of statistics you are monitoring here: the number of times your app is run on a particular version of the software? Particular screens being used within your app? Memory/CPU usage? Etc.
You also have Trackerbird:
http://www.trackerbird.com/
There are many ways.
The initial idea of having your application "phone home" to count its usage isn't that bad, but it does not have to be an entire web page.
You could just post some data to a webservice.
If it can't connect to your web service, you could store the information locally and send it next time the app starts with a valid web connection.
If you do this asynchronously, it should be hardly noticeable when you start the app.
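A rough sketch of that "post, and queue locally if offline" idea, in Python for brevity. The endpoint URL, the queue file and the function names are all hypothetical. In a .NET app the same shape would be an HTTP POST on a background thread plus a small local file.

```python
import json
import threading
import urllib.request
from pathlib import Path

ENDPOINT = "https://example.invalid/usage"  # hypothetical URL


def post_json(event):
    """Send one event; raises OSError (e.g. URLError) when the network is down."""
    request = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=5):
        pass


def report_startup(event, queue_file, send=post_json):
    """Send previously queued events plus the new one; keep whatever fails."""
    pending = json.loads(queue_file.read_text()) if queue_file.exists() else []
    pending.append(event)
    sent = 0
    for item in pending:
        try:
            send(item)
        except OSError:
            break  # offline: keep this and all later events for next time
        sent += 1
    queue_file.write_text(json.dumps(pending[sent:]))
    return sent


def report_startup_async(event, queue_file, send=post_json):
    """Run the report on a background thread so startup is not delayed."""
    worker = threading.Thread(
        target=report_startup, args=(event, queue_file, send), daemon=True
    )
    worker.start()
    return worker
```

The `send` parameter is only there so the logic can be exercised without a real server. Because the thread is a daemon, an app that exits immediately may lose the in-flight event, so you might prefer to flush on shutdown as well.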
You can have your own tracking service, which the application calls every time it starts. Alternatively, you can use a third-party application or service that provides this kind of functionality. Telerik EQATEC Analytics is one such option.
I just implemented a WCF service and I am currently looking at service monitoring options. Our server team, which currently hosts only Java services, wants us to have instances running all the time so that an instance can gather data during its lifetime, and they said they will call one of our operations with webmon to get statistical information. But we are using per-call instancing and I don't think that will work under this architecture.
I am wondering if there is a way to get statistics on how an operation in the service performed over a certain amount of time, and to provide another operation for webmon that returns an integer value describing its performance in that period. webmon then decides whether to alert the admin or not.
I was considering parsing log files to get statistics, but that might be an expensive operation if done every 15 minutes.
If not, what are my options for detailed automatic health monitoring of WCF applications?
My company very recently agreed to open-source (under the GPL License) the tool that we use internally to monitor our live web services and for producing availability and response time reports. It's called ServiceMon and it may meet your needs.
It runs on Windows as a standalone application and works by following a simple script of operations that dictate the services to be monitored. For example, to check a web page contains a particular value, in a similar manner to webmon, you'd use this line:
http-get "http://www.google.com" must-contain "I'm Feeling Lucky"
The frequency at which it executes the script operations can be easily configured, as can the order in which it processes them.
In addition to monitoring web pages and web services we use ServiceMon to track availability statistics of each service and to produce response time statistics.
ServiceMon is written using a plugin architecture so you can use .NET to add new types of monitoring operations. So, for example, if your web service uses funky authentication you can fairly easily plug this in to the utility.
Full documentation and download instructions here
I hope you find it useful and I'd love to hear your thoughts
Disclaimer: I developed ServiceMon so I may be a little bit biased :)
I am building a client-server based solution; client being a desktop application and the server being a web application.
Basically, I need to monitor the performance and resource utilization of the client, which is a .NET 2.0 based Windows Desktop application.
The most important thing I need to monitor is the network resources the client uses, i.e. what is the size of the data that flows out from the client to the server and what is the size of the data that the client downloads from the server.
Apart from this, general performance monitoring would help too.
Please guide.
Edit: A few people have suggested using perfmon, but aren't the values shown in perfmon system-wide? I need these network based stats for a single application only...bytes being sent and received by a single desktop application.
The standard tool for network monitoring is Wireshark.
It allows you to filter the network traffic very flexibly.
This could be quite an overkill for your application though.
If you are using pure .NET, I would suggest that you add performance logging to your networking classes on the server side. If you are using .NET library classes, derive your own classes from them and add statistics when sending and receiving data.
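To illustrate adding statistics around the send and receive calls, here is a small sketch in Python. It uses a wrapper (composition) rather than inheritance, which is a swap on my part to keep the example short. The class and attribute names are invented for illustration.

```python
import socket


class CountingSocket:
    """Wraps a connected socket and tallies bytes and calls in each direction."""

    def __init__(self, sock):
        self._sock = sock
        self.bytes_sent = 0
        self.bytes_received = 0
        self.send_calls = 0
        self.recv_calls = 0

    def sendall(self, data):
        self._sock.sendall(data)
        # Count only after the send succeeded.
        self.bytes_sent += len(data)
        self.send_calls += 1

    def recv(self, bufsize):
        data = self._sock.recv(bufsize)
        self.bytes_received += len(data)
        self.recv_calls += 1
        return data

    def close(self):
        self._sock.close()
```

These counters measure application payload only. Protocol overhead (TCP/IP headers, TLS, retransmissions) is not included, so a packet capture will show somewhat larger numbers.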
You need to split your monitoring in two parts:
How the system interacts with the server (number of calls performed)
Amount of network traffic (size of exchanged data for any call)
The first part is (in my experience) often neglected although it is very important, because acquiring a new connection is often much more expensive than the data traffic itself.
You do not tell us anything about the kind of connection you're using (low-level TCP/IP calls, web services, WCF or something else) but my suggestion is:
Find a way to determine how many times your application calls the server
Find how much any single call costs in terms of data exchanged
How to monitor these values depends a lot on the technology involved. For some it is very simple (if, for example, you're using a web service, setting up Fiddler to monitor the calls and examining the monitoring results is very simple); for others you need to work with a low-level traffic analyzer like Wireshark or MS Network Monitor and learn how to filter traffic according to the IP address of the server, the ports used and other parameters.
If you clarify your application architecture I can try to be more specific.
Regards
Massimo
You can also use Task Manager to do this. Go to the processes tab, then View->"select columns". Check "I/O read bytes" and "I/O write bytes". Then find your program in the processes list and you can observe the cumulative values.
Take a look at this article: http://www.codeproject.com/KB/IP/apptraffwatcher.aspx
You may be able to tear apart the source code and grab what you need to measure download/upload for your application's process ID.
It looks like he uses this library to get information about the amount of traffic: http://www.codeproject.com/KB/IP/trafficwatcher.aspx
I tried perfmon and I was unable to watch our network traffic either. But I was able to in the Performance Explorer in Visual Studio 2005 Team Suite.
If you have the Team edition of Visual Studio you can set up either Sampling or Instrumentation on your desktop application. Then go into the options of the session and select Events -> Windows Kernel Trace -> Network. Run your application and let Visual Studio log the data. Then save the report. (I love Microsoft for this "feature".) Go to the command prompt, navigate to C:\Program Files\Microsoft Visual Studio 8\Team Tools\Performance Tools and run "vsperfreport /CALLTRACE (filename).vsp". This will produce a CSV file containing all network packets sent/received, with size, port etc., for the desktop application.
I know this was a long-winded solution, but I just tried it on my .NET 2.0 application and it captured all of our communication with Oracle Identity Manager and Oracle Database.
It is not clear from your post whether you are using HTTP requests. You indicated that the server is a web application, which implies (perhaps incorrectly) to me that you might be using the HTTP protocol to send/receive data between server and client.
If so, one tool that might be of use is Fiddler. This tool will monitor all HTTP traffic in and out of your workstation and it can (I believe) watch specific sessions and applications. The nice part is that you can see individual requests and see the statistics for these requests, including bytes in/out, round trip times, and other useful bits of information.
If you are not HTTP based, then this tool won't help.
I'm surprised nobody has suggested SysInternals (now Microsoft) Process Explorer (technet.microsoft.com/en-us/sysinternals/bb896653.aspx). If you right click on the executable in question and left click properties it will bring up a dialog box. Then you switch to the performance tab and you can monitor I/O of the executable. The Performance Graph tab will show CPU usage and I/O bytes history graphed over time. It's a cool and free tool.
You want to look at perfmon (otherwise called Performance Monitor in admin tools off the start menu).
Open it in its default graph view, add a counter, select network interface, then bytes per second (or a similar counter), click ok and you're done.
You can experiment with the other networking counters as there are many, one of them will do exactly what you want. You can also save the perfmon logs to a file and view them afterwards - you'll see the graph in its entirety and you can "zoom in" on sections. Alternatively, you can save log-style files with just raw numbers.
Here's a quick guide to perfmon as an admin tool; once you understand that, the rest comes easily.
In Vista you can't add individual counters any more, you add the entire set of counters grouped under an object - so for my example, you'd add the Network Interface object, then you'd see all the individual counters on the graph after you click ok.
If you want this built into your client codebase, and not using an external tool, you can use Performance Counters to get access to this and most other things reported by the Performance Monitor, Task Manager, etc.
You should check out ACE Analyst for this use case - think of it as a superintelligent layer on top of Wireshark packet captures. You need to look at the packets to understand the true nature of the application's behavior as it runs across the network.
I've got a backup application here which connects to various webservices and downloads/uploads files from ftp or http servers. What is the easiest way to limit the bandwidth usage of my application?
I need to do that because the application once installed and running will be slowing down internet access for all office people, which eventually will get me into hell. So I'd like to implement a speed-limit which is active during the work-hours and gets disabled at night.
What you are looking for is called bandwidth throttling. Here is a good example of how this is done; also review the comments to see how it is done from the client side.
You may also want to take a look at this example, which puts things together in a real application.
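For a feel of what throttling looks like in code, here is a minimal sketch (not the approach from the linked examples). It copies a stream in chunks and sleeps after each chunk so the average rate stays under a cap, with the cap applied only during work hours. The 08:00 to 18:00 Monday to Friday window, the 64 KB/s limit and all the names are assumptions for illustration. It is in Python for brevity, and the same loop works around a .NET Stream.

```python
import time
from datetime import datetime, time as clock


def current_limit(now, work_start=clock(8, 0), work_end=clock(18, 0),
                  limit=64 * 1024):
    """Bytes/second cap during work hours, or None (unlimited) otherwise."""
    in_work_hours = now.weekday() < 5 and work_start <= now.time() < work_end
    return limit if in_work_hours else None


def throttled_copy(source, dest,
                   limit_for=lambda: current_limit(datetime.now()),
                   chunk_size=8 * 1024, sleep=time.sleep,
                   clock_fn=time.monotonic):
    """Copy source to dest, pausing so each chunk takes at least
    len(chunk) / limit seconds while a limit is active."""
    total = 0
    while True:
        started = clock_fn()
        chunk = source.read(chunk_size)
        if not chunk:
            return total
        dest.write(chunk)
        total += len(chunk)
        limit = limit_for()  # re-checked per chunk, so it lifts at night
        if limit:
            budget = len(chunk) / limit
            elapsed = clock_fn() - started
            if elapsed < budget:
                sleep(budget - elapsed)
```

The `sleep` and `clock_fn` parameters exist only so the pacing can be checked without waiting in real time. A smaller `chunk_size` gives smoother traffic at the cost of more wake-ups.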