Distributing image processing to multiple Windows servers, file locking Q

Distributing image processing to multiple Windows servers, file locking Q - c#

I need to process large image files into smaller image files. I would like to distribute the work to many "slave" servers, rather than tasking my main server with this. I am using Windows Server 2005/2008, C#, and ASP.NET. I have a lot of web application development experience but have not developed distributed systems. I had a notion that this could be designed as follows:
1) Files would be placed in a shared network drive
2) Slave servers would periodically poll the drive for new content
3) Slave servers would rename newly found files to something like UNPROCESSED_appIDXXXX_jidXXXXX_photoidXXXXX.tif and begin processing that file.
4) Other slave servers would avoid trying to process files that are in process by examining file name, i.e. if something has been named "UNPROCESSED" they will not attempt to process.
I am wondering a few things:
1) Will there be issues with two slave servers trying to "grab" and rename the file at once, or will Windows Server automatically lock the file?
2) What do you think the best mechanism for notification of new content for processing should be? One simple idea is to write a basic aspx page on each slave system and have it running on a timer. A better idea might be to write a Windows Service that utilizes SystemFileWatcher and have it running on each slave system. A third idea is to have a central server somehow dispatch instructions to a given slave server to attempt a processing job, but I do not know of ways of invoking that kind of communication beyond a very hack-ish approach of having the master server pass a message via HTTP.
I'd much appreciate any guidance you have to offer.
Cheers,
-KF

If you don't want to go all the way with a compute cluster type solution. You should consider having a job manager running somewhere that will parcel out the work. That way, when a server becomes available to do work, it asks the job manager for a new bit of work to do. It can then tell the job manager that it's finished and the job manager can inform your "client" when the work on the whole job is complete. That way, it's easy to register work and know it's complete and the job manager can parcel out the work without the worry of race conditions on file renames. :)

Related

Register certain events on client machine and notify to another C#

Please don't get confuse yourself with the title of this question, I don't know what is the exact technical term of what I want to accomplish :). My requirement may be little strange and I already implemented it but I need some best practice/method to do it properly.
Here is my situation.
I am developing a client system monitoring windows application (Tracking software in client side and monitoring software in my system). I have many systems connected to a LAN and I have a monitoring system. If any certain actions happen on client system, I will get notified. I cannot use any databases in my network so what I am doing is, Since my system is also connected to LAN I shared one folder in my system. Whenever some actions happens in client system, Tracking software will create a file containing event to the shared folder in my system. The monitoring software uses a timer which will continuously check for any new files in the shared folder on a certain interval(15 Minutes). If any file found, monitoring system will know some event has happened and will show the event.
But the problem I will get notified only after 15 minutes. Also is I don't think this is the best way. There may be some good and best methods. Is there any way like registering event directly to my Monitoring application from client machine?
Please NOTE: I cannot use any Database for this purpose.
Any suggestions will be appreciated.

Take a look at SignalR - it provides real time notification and can be used exactly as you describe.
You would not require a database (but remember if your server isn't running you will miss events - this may or may not be acceptable).

Take a look at FileSystemWatcher. This will monitor directories and raise events. IME, it works well, but can fail with large amounts of traffic.

This sounds like a perfect candidate for MSMQ (MS Message Queue) and Triggers.
Create an MSMQ that all your Tracking Softwares can write to. Then have an MSMQ trigger (perhaps connecting to a front-end through WCF/named pipes) to display an alert in your Monitoring Software

You may want to use WCF Framework.
Here is two links that can help you:
wcf-tutorial-events-and-callbacks
wcf-tutorial-basic-interprocess-communication

Windows Service Vs Simple Program

Let me give a back ground for everybody before I go to my problem. My company hosts website for many clients, my company also contracts some of the work to another company.
So when we first set up a website with all the informations to our clients, we pass that information to the other company we contracted and three of us have the same data. Problem is once the site is up and running, our clients will change some data and when ever they do that we should be able to update our contracted company.
The way we transfer data to the contracted company is by using a web service (httppost, xml data). Now my question is what it the best way to write a program which sends updated data to the contracted company everytime our clients change some data.
1) Write a windows service having a timer inside my code where every 30min or so connects to the database and find all changes and send it to the contracted company
2) Write the same code as #1 (with out the timer in it) but this time make it a simple program and let windows scheduler wake it every 30min
3) Any other suggestion you may have
Techenologies available for me are VS 2008, SQLServer 2005

Scheduled task is the way to go. Jon wrote up a good summary of why services are not well suited for this sort of thing: http://weblogs.asp.net/jgalloway/archive/2005/10/24/428303.aspx

A service is easy to create and install and is more "professional" feeling so why not go that way? Using a non-service EXE would also work of course and would be slightly easier to get running (permissions, etc.) but I think the difference in setup between the two is nearly negligible.

One possible solution would be to add a timestamp column to your data tables.
Once this is done, you can have one entry in each table that has the last collected time by your contracted company. They can pull all records since that last time and update their records accordingly.

A Windows Service is more self contained, and you can easily configure it to start up automatically when the OS is starting up. You might also need to create additional configuration options, as well as some way to trigger the synchronization immediately.
It will also give you more room to grow your functionality for the service in the future.
A standalone app should be easier to develop though, however you are reliant on the windows scheduler to execute the task always. My experience has been that it is easier to mess up things with the windows scheduler and have it not run, for example in cases where you reboot the OS but no user has logged in.
If you want a more professional approach go with the service, even though it might mean a little bit more work.

A windows service makes more sense in this case. Think about what happens after your server is restarted:
With a Windows Application you need to have someone restart the application, or manually copy a shortcut to the startup folder to make sure the application gets launched
OR,
With a Windows Service you set it to start automatically and forget about it. When the machine reboots your service starts up and continues processing.
One more consideration, what happens when there is an error? A Windows application would likely show an error dialog and wait for input before continuing; whereas a service would log the error in the event log and carry on.

Determining programmatically whether an ASP.NET/C# Website is down

I am having an ASP.NET/C# Web application hosted in IIS6. My requirement is to send a mail whenever the Website is down without using any third party tool. How can I accomplish this job programmatically (of course using C#)? Thanks in advance!!!!!

You will need a PC that is as independent as possible form the WebServer. Ideally on the other side of the world.
Then run a little program with a Timer and check every X minutes. Do a simple grab with WebClient. If it fails, send the mail.
For improved reliability, run more instances of the monitoring program at different locations.

Define "down". There are many reasons why a website might not be accessible or only partially working. Ultimately, it's really what the end user is seeing that's most important. A tool that is running outside of the website's network infrastructure that periodically queries the website's key pages and checks important factors such as the HTTP status code, the response time, the size of the page and even possibly checks that important chunks of HTML are present would achieve this.
Attempting to determine why the site is not responsing is an even more complex task that would involve checking for the presence of the IIS application pool, etc.
This is not a trivial tool to create so I would recommend using an off-the shelf solution if possible.

High availability & scalability for C#

I've got a C# service that currently runs single-instance on a PC. I'd like to split this component so that it runs on multiple PCs. Each PC should be assigned a certain part of the work. If one PC fails, its work should be moved to a backup machine.
Data synchronization can be done by the DB, so that should not be much of an issue. My current idea is to use some kind of load balancer that splits and sends the incoming requests to the array of PCs and makes sure the work is actually processed.
How would I implement such a functionality? I'm not sure if I'm asking the right question. If my understanding of how this goal should be achieved is wrong, please give me a hint.
Edit:
I wonder if the idea given above (load balancer splitswork packages to PCs and checks for result) is feasible at all. If there is some kind of already implemented solution so this seemingly common problem, I'd love to use that solution.
Availability is a critical requirement.

I'd recommend looking at a Pull model of load-sharing, rather than a Push model. When pushing work, the coordinating server(s)/load-balancer must be aware of all the servers that are currently running in your system so that it knows where to forward requests; this must either be set in config or dynamically set (such as in the Publisher-Subscriber model), then constantly checked to detect if any servers have gone offline. Whilst it's entirely feasible, it can complicate the scaling-out of your application.
With a Pull architecture, you have a central work queue (hosted in MSMQ, Sql Server Service Broker or similar) and each processing service pulls work off that queue. Expose a WCF service to accept external requests and place work onto the queue, safe in the knowledge that some server will do the work, even though you don't know exactly which one. This has the added benefits that each server monitors it's own workload and picks up work as-and-when it is ready, and you can easily add or remove servers to/from this model without any change in config.
This architecture is supported by NServiceBus and the communication between Windows Azure Web & Worker roles.

From what you said each PC will require a full copy of your service -
Each PC should be assigned a certain
part of the work. If one PC fails, its
work should be moved to a backup
machine
Otherwise you won't be able to move its work to another PC.
I would be tempted to have a central server which farms out work to individual PCs. This means that you would need some form of communication between each machine and and keep a record back on the central server of what work has been assigned where.
You'll also need each machine to measure it's cpu loading and reject work if it is too busy.
A multi-threaded approach to the service would make good use of those multiple processor cores that are ubiquitoius nowadays.

How about using a server and multi-threading your processing? Or even multi-threading on a PC as you can get many cores on a standard desktop now.
This obviously doesn't deal with the machine going down, but could give you much more performance for less investment.

you can check windows clustering, and you have to handle set of issues that depends on the behaviour of the service (you can put more details about the service itself so I can answer)

This depends on how you wanted to split your workload, this usually done by
Splitting the same workload by multiple services
Means same service being installed on
different servers and will do the
same job. Assume your service is reading huge data from the db servers and processing them to produce huge client specific datafiles and finally this datafile is been sent to the clients. In this approach all your services installed in diff servers will do the same work but they split the work to increaese the performance.
Splitting the part of the workload by multiple services
In this approach each service will be assigned to the indivitual jobs and works on different goals. in above example one serivce is responsible for reading data from db and generating huge data files and another service is configured only to read the data file and send it to clients.
I have implemented the 2nd approach in one of my work. Because this let me isolate and debug the errors in case of any failures.

The usual approach for load balancer is to split service requests evenly between all service instances.
For each work item (request) you can store relative information in database. Then each service should also have at least one background thread checking database for abandoned work items.

I would suggest that you publish your service through WCF (Windows Communication Foundation).
Then implement a "central" client application which can keep track of available providers of your service and dish out work. The central app will act as scheduler and load balancer of the tasks to be performed.
Check out Juwal Lövy's book on WCF ("Programming WCF Services") for a good introduction on this topic.

You can have a look at NGrid : http://ngrid.sourceforge.net/
or Alchemi : http://www.gridbus.org/~alchemi/index.html
both are grid computing framework with load balancers that will get you started in no time.
Cheers,
Florian

Monitoring (network) resource utlization and performance of a windows application

I am building a client-server based solution; client being a desktop application and the server being a web application.
Basically, I need to monitor the performance and resource utilization of the client, which is a .NET 2.0 based Windows Desktop application.
The most important thing I need to monitor is the network resources the client uses, i.e. what is the size of the data that flows out from the client to the server and what is the size of the data that the client downloads from the server.
Apart from this, general performance monitoring would help too.
Please guide.
Edit: A few people have suggested using perfmon, but aren't the values shown in perfmon system-wide? I need these network based stats for a single application only...bytes being sent and received by a single desktop application.

The standard tool for network monitoring is Wireshark.
It allows you to filter the network traffic very flexiblely.
This could be quite an overkill for your application though.
If you are using pure .NET, I would suggest that you add performance logging into your networking classes on the server side- if you are using .Net library classes, then inheritate from them your own classes which add statistics when sending and receiving data.

You need to split your monitoring in two parts:
How the system interacts with the server (number of calls performed)
Amount of network traffic (size of exchanged data for any call)
The first part is (in my experience) often negleted while it has a lot of importance, because acquiring a new connection is often much more expensive that data traffic in itself.
You do not tell us anything about the king of connection you're using (low level tcpip calls, web services, WCF or what else) but my suggestion is:
Find a way to determine how many time your application calls the server
Find how much any single call is costing in term of data exchanged
How to monitor these values depends a lot from the technology involved, for some is very simple (if, for example, you're using a web service, setting up Fiddler to monitor the calls and examining an monitoring results is very simple), for other you need to work using a low level traffic analyzer like Wireshark or MS Network Monitor and learn how to filter traffic according to IP address of the server, ports used and other parameters.
If you clarify your application architecture I can try to be more specific.
Regards
Massimo

You can also use Task Manager to do this. Go to the processes tab, then View->"select columns". Check "I/O read bytes" and "I/O write bytes". Then find your program in the processes list and you can observe the cumulative values.

Take a look at this article: http://www.codeproject.com/KB/IP/apptraffwatcher.aspx
You may be able to tear apart the source code, and grab what you need to meassure download/upload for your application's process ID.
It looks like he uses this library to get information about the amount of traffic: http://www.codeproject.com/KB/IP/trafficwatcher.aspx

I tried the perfmon and I was unable to watch our network traffic either. But I was able to in the Performance Explorer in Visual Studio 2005 Team suite.
If you have Team edition Visual Studio you can set up either Sampling/Instrumentation on your desktop application. Then go into options of the session. select Events -> Windows Kernel Trace -> Network. Run your application and let the Visual studio log the data. Then save the report. (I love Microsoft for this "feature") go to the command prompt, navigate to C:\Program Files\Microsoft Visual Studio 8\Team Tools\Performance Tools and run "vsperfreport /CALLTRACE (filename).vsp" This will produce a csv file containing all network packets sent/recieved/size/port etc by the desktop application.
I know this was a long winded solution but I just tried it on my .Net 2.0 application and it captured all of our communication with Oracle Identity Manager and Oracle Database.

It is not clear by your post if you are using HTTP requests. You indicated that the server is a web application, which implies (perhaps incorrectly) to me that you might be using the HTTP protocol to send/receive data from server to client.
If so, one tool that might be of use is Fiddler. This tool will monitor all HTTP traffic in and out of your workstation and it can (I believe) watch specific sessions and applications. The nice part is that you can see individual requests and see the statistics for these requests, including bytes in/out, round trip times, and other useful bits of information.
If you are not HTTP based, then this tool won't help.

I'm surprised nobody has suggested SysInternals (now Microsoft) Process Explorer (technet.microsoft.com/en-us/sysinternals/bb896653.aspx). If you right click on the executable in question and left click properties it will bring up a dialog box. Then you switch to the performance tab and you can monitor I/O of the executable. The Performance Graph tab will show CPU usage and I/O bytes history graphed over time. It's a cool and free tool.

You want to look at perfmon (otherwise called Performance Monitor in admin tools off the start menu).
Open it in its default graph view, add a counter, select network interface, then bytes per second (or a similar counter), click ok and you're done.
You can experiment with the other networking counters as there are many, one of them will do exactly what you want. You can also save the perfmon logs to a file and view them afterwards - you'll see the graph in its entirety and you can "zoom in" on sections. Alternatively, you can save log-style files with just raw numbers.
Here's a quick guide through perfmon as an admin tool, once you understand that, the rest comes easily.
In Vista you can't add individual counters any more, you add the entire set of counters grouped under an object - so for my example, you'd add the Network Interface object, then you'd see all the individual counters on the graph after you click ok.

If you want this built into your client codebase, and not using an external tool, you can use Performance Counters to get access to this and most other things reported by the Performance Monitor, Task Manager, etc.

You should check out ACE Analyst for this use case - think of it as a superintelligent layer on top of Wireshark packet captures. You need to look at the packets to understand the true nature of the application behavior as runs across the network.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.