Rename file extensions in sequential order - C#

Windows Service - C# - VS2010
I have multiple instances of a FileWatcher Service. Each one looks for a different extension in the directory. I have a separate Router service that monitors the directory for zip files and renames the extensions to one of the values that the services look at.
Example:
Directory in question (all FileWatcher Services monitor this directory) contains the following files:
a.zip, b.zip, c.zip
FileWatcher1 looks for extensions of *.000, FileWatcher2 looks for extensions of *.001, FileWatcher3 looks for extensions of *.002
The Router will see the .zip files and change their extensions, but it should keep them in sequence in order to delegate the same amount of work to each FileWatcher.
Also, if two zip files are dropped, it would change a.zip -> a.000 and b.zip -> b.001, but if five minutes go by and another batch of zip files is dropped, it should know to rename the next file to *.002.
I have everything working fine, but now I need to implement the sequential part in the Router and am not sure of the best way to implement it (currently the Router changes every extension to *.000, so only one FileWatcher gets the work). I know this might be considered a cheap way of doing things, but it's all we really need at the moment. Any help would be appreciated.

Maybe a different way of looking at it: have you thought about having a single watcher and then using a thread pool? The reason I am suggesting this is that you will have to start looking at the sizes and complexities of the files to distribute the work adequately. You might push more work to .000 because it's next in line while it is still busy processing a large amount of data from the first job, whereas .001 could be free because it was processing a small file.
If you really want to get around the problem of the next extension in line, why not just keep a static variable with the next extension number? I am not 100% sure whether the Router's FileSystemWatcher will run multiple threads when it sees new files one after the other, but I don't think so. If that does happen, you will need some thread-safety code around access to the static variable.

Can the Router just keep a counter and do a mod 3 (or N, where N is the number of watchers) operation for every new file?
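Combining that counter with the static-variable suggestion above, a minimal sketch might look like this (the class and method names are illustrative, and Interlocked covers the thread-safety concern):

    // Minimal sketch: round-robin extension assignment in the Router.
    using System.IO;
    using System.Threading;

    class Router
    {
        private const int WatcherCount = 3;  // one per FileWatcher service
        private static int _counter = -1;    // shared across Router threads

        public static void RenameNext(string zipPath)
        {
            // Interlocked keeps this safe even if the FileSystemWatcher
            // raises events on multiple threads.
            int n = Interlocked.Increment(ref _counter) % WatcherCount;
            string target = Path.ChangeExtension(zipPath, n.ToString("000"));
            File.Move(zipPath, target);  // a.zip -> a.000, b.zip -> b.001, ...
        }
    }

Note that the counter resets when the service restarts; if the distribution has to survive restarts, persist the last value somewhere.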

Related

Is there a solution to save to one file from multiple apps at the same time in C#?

I have a program that runs on our internal network, so several copies of the program run simultaneously, all saving to and restoring from one file.
How do I do this?
If the file is an .xls file, it gives an error.
How can my program write to a file at the same time as the other copies?
If I'm understanding your question correctly, I don't think you can write from several sources to the same file (see race condition: file system), but I believe you can simulate this behavior by having several files (one for each instance of the application) and then periodically updating your main file with the gathered data.
Writing at the same time is a bad idea: that is a race condition. Use something that is ACID (e.g. a database) to avoid such problems. If it has to be a file, use some mutual exclusion mechanism, such as semaphores (see semaphores in .NET).
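If you do go the file route, here is a minimal sketch using a named Mutex for cross-process exclusion (the mutex name is illustrative; every instance of the program must use the same name):

    using System;
    using System.IO;
    using System.Threading;

    class SharedFileWriter
    {
        public static void AppendLine(string path, string line)
        {
            // A named mutex is visible to every process on the machine.
            using (var mutex = new Mutex(false, @"Global\MySharedFile"))
            {
                mutex.WaitOne();  // block until no other process holds it
                try
                {
                    File.AppendAllText(path, line + Environment.NewLine);
                }
                finally
                {
                    mutex.ReleaseMutex();  // always release, even on error
                }
            }
        }
    }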

Multiple files in log4net with same logger

I'm currently working on a chat API, and I receive multiple requests at the same time from different sessions, so it's almost impossible to track each conversation separately, because it mixes with all the logs from the other conversations.
So I want to create a separate file for each session (conversation) dynamically, with the sessionId as the filename, but if I create multiple loggers my application just freezes, because I can have more than 100 sessions simultaneously.
I have also tried changing the file path (programmatically) for each request, with its id in it, but that also freezes the application after 1-2 hours.
Is there any solution for this problem?
Is there any solution for this problem?
If these conversation files are so important, consider other options than logging. A database might be appropriate.
Another solution might be to parse the log files and split them into conversation files in a separate (logical?) process, perhaps later, after the session has ended. This way the program doesn't need to keep track of many files at the same time, and parsing can be done faster and more efficiently.
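A minimal sketch of that split-after-the-fact idea, assuming each line starts with the sessionId in square brackets (adjust the parsing to your actual log4net conversion pattern):

    using System.IO;
    using System.Linq;

    class LogSplitter
    {
        public static void Split(string logPath, string outDir)
        {
            Directory.CreateDirectory(outDir);
            // Expecting lines like "[session-42] message..."
            var bySession = File.ReadLines(logPath)
                .GroupBy(line =>
                {
                    int close = line.IndexOf(']');
                    return (line.StartsWith("[") && close > 1)
                        ? line.Substring(1, close - 1)
                        : "unknown";
                });

            foreach (var group in bySession)
                File.AppendAllLines(Path.Combine(outDir, group.Key + ".log"), group);
        }
    }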

Best file mutex in .NET 3.5

I want to use a mutex on files, so that no process touches certain files before others stop using them. How can I do that in .NET 3.5? Here are some details:
I have a service which checks every so often whether there are any files/directories in a certain folder, and if there are, the service does something with them.
My other process is responsible for moving files (and directories) into that folder, and everything works just fine.
But I'm worried there can be a situation where my copying process copies the files into the folder and at the same time (in the same millisecond) my service checks whether there are any files and does something with them (but not with all of them, because it checked during the copying).
So my idea is to put some mutex in there (maybe one extra file could be used as a mutex?), so the service won't check anything until the copying is done.
How can I achieve something like that in possibly easy way?
Thanks for any help.
The canonical way to achieve this is the filename:
Process A copies the files to e.g. "somefile.ext.noprocess" (this is non-atomic)
Process B ignores all files with the ".noprocess" suffix
After Process A has finished copying, it renames the file to "somefile.ext"
Next time Process B checks, it sees the file and starts processing.
If you have more than one file that must be processed together (or none at all), you need to adapt this scheme with an additional transaction file containing the file names for the transaction: only if this file exists and has the correct name should Process B read it and process the files mentioned in it.
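A minimal sketch of the basic rename handshake (local paths, illustrative names):

    using System.IO;

    // "Process A": deliver a file so the watcher never sees it half-copied.
    class Deliverer
    {
        public static void Deliver(string sourceFile, string targetDir)
        {
            string name = Path.GetFileName(sourceFile);
            string temp = Path.Combine(targetDir, name + ".noprocess");

            File.Copy(sourceFile, temp);                    // slow, non-atomic copy
            File.Move(temp, Path.Combine(targetDir, name)); // rename: appears at once
        }
    }
    // "Process B" simply skips any "*.noprocess" file when scanning the folder.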
Your problem really is not one of mutual exclusion, but of atomicity. Copying multiple files is not an atomic operation, so it is possible to observe the files in a half-copied state, which you'd like to prevent.
To solve your problem, you could hinge your entire operation on a single atomic file system operation, for example renaming (or moving) of a folder. That way no one can observe an intermediate state. You can do it as follows:
Copy the files to a folder outside the monitored folder, but on the same drive.
When the copying operation is complete, move the folder inside the monitored folder. To any outside process, all the files would appear at once, and it would have no chance to see only part of the files.
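A minimal sketch of the folder-move idea (illustrative names):

    using System.IO;

    class BatchDeliverer
    {
        public static void Publish(string stagingDir, string monitoredDir)
        {
            // stagingDir must be on the same drive as monitoredDir;
            // across volumes Directory.Move throws instead of renaming.
            string target = Path.Combine(monitoredDir, Path.GetFileName(stagingDir));
            Directory.Move(stagingDir, target); // one rename: all files appear at once
        }
    }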

directory monitoring

What is the best way for me to check for new files added to a directory? I don't think the FileSystemWatcher would be suitable, as this is not an always-on service but a method that runs when my program starts up.
There are over 20,000 files in the folder structure I am monitoring. At present I am checking each file individually to see whether its path is in my database table, but this is taking around ten minutes and I would like to speed it up if possible.
I can store the date the folder was last checked - is it easy to get all files with created date > last checked date?
Anyone got any ideas?
Thanks
Mark
Your approach is the only feasible one (a FileSystemWatcher lets you see changes as they happen; it cannot check on startup).
Find out what takes so long. 20,000 checks should not take ten minutes - one at most. Your program is written inefficiently somewhere. How do you test it?
Hint: do not ask the database per file. Get a list of all files on disk into memory and a list of all files in the database, then check in memory. 20,000 SQL statements to the database are too slow; this way you need ONE query to get the list.
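A minimal sketch of that in-memory diff (how you load the known paths from the database is up to your data access layer):

    using System.Collections.Generic;
    using System.IO;

    class FolderScanner
    {
        // knownPaths: loaded from the database with a single query.
        public static List<string> FindNewFiles(string root, HashSet<string> knownPaths)
        {
            var newFiles = new List<string>();
            foreach (string path in Directory.GetFiles(root, "*",
                                                       SearchOption.AllDirectories))
                if (!knownPaths.Contains(path))  // in-memory check, no SQL round trip
                    newFiles.Add(path);
            return newFiles;
        }
    }

If path casing varies, construct the HashSet with StringComparer.OrdinalIgnoreCase.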
10 minutes seems awfully long for 20,000 files. How are you going about doing the comparison? Your suggestion doesn't account for deleted files either. If you want to remove those from the database, you will have to do a full comparison.
Perhaps the problem is the database round trips. You can retrieve a known file list from the database in large chunks (or all at once), sorted alphabetically. Sort the local file list as well and walk the two lists, processing missing or new entries as you go along.
FileSystemWatcher is not reliable, so even if you could use a service, it would not necessarily work for you.
The two options I can see are:
Keep a list of files you know about and keep comparing to this list. This will allow you to see if files were added, deleted etc. Keep this list in memory, instead of querying the database for each file.
As you suggest, store a timestamp and compare to that.
You can store somewhere the last timestamp at which a file was created; it is simple and can work for you.
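A minimal sketch of the timestamp approach (one caveat: a file moved into the folder keeps its original creation time, so this can miss files):

    using System;
    using System.IO;
    using System.Linq;

    class NewFileFinder
    {
        public static string[] FilesCreatedSince(string root, DateTime lastCheckedUtc)
        {
            // Persist DateTime.UtcNow as the new "last checked" value
            // after a successful scan.
            return new DirectoryInfo(root)
                .GetFiles("*", SearchOption.AllDirectories)
                .Where(f => f.CreationTimeUtc > lastCheckedUtc)
                .Select(f => f.FullName)
                .ToArray();
        }
    }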
Can you write a service that runs on that machine? The service could then use FileSystemWatcher.
Having a FileSystemWatcher service like Kevin Jones suggests is probably the most pragmatic answer, but there are some other options.
You can watch the directory with inotify if you mount it with Samba on a Linux box. That of course assumes you don't mind fragmenting your platform, but that's what inotify is there for.
And then, more correctly but with correspondingly less chance of getting a go-ahead: if you're monitoring a directory with 20K files in it, it is probably time to evolve your system architecture. Not knowing much more about your application, it sounds like a message queue might be worth looking at.

Best way to use SFTP folder as concurrent work queue

I am writing a c# windows service which will be polling an SFTP folder for new files (one file = one job) and processing them. Multiple instances of the service may be running at the same time, so it is important that they do not step on each other.
I realize that an SFTP folder does not make an ideal queue, but that's what I have to work with. What do I need to do to either use this SFTP folder as a concurrent message queue, or safely represent it in a way that can be used concurrently?
Seems like your biggest problem would be dealing with multiple instances of the program stepping on each other and processing the same files.
The way I've handled this in the past is to have the program grab the first file and immediately rename it from say 'filename.txt' to 'filename.txt.processing'. The processes would be set up to ignore any file ending in '.processing' so that they don't step on each other. I don't think a file rename is perfectly atomic, but I've never had any problems with it.
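The question doesn't name an SFTP library, but assuming SSH.NET (Renci.SshNet), a minimal sketch of that claim-by-rename idea: the rename is a single SFTP operation, so only one instance should be able to win it.

    using Renci.SshNet;
    using Renci.SshNet.Common;

    class SftpWorker
    {
        // Try to claim a file by renaming it; the losing instance gets an error.
        public static bool TryClaim(SftpClient sftp, string remotePath)
        {
            try
            {
                sftp.RenameFile(remotePath, remotePath + ".processing");
                return true;   // we own the file now
            }
            catch (SshException)
            {
                return false;  // another instance claimed it first; skip it
            }
        }
    }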
"Multiple instances of the service may be running at the same time"
On the same machine, or different ones?
Not sure if moving a file in Windows is an atomic operation.
If it is, then when a service chooses to work on a file, it should attempt to move the file to another folder.
If the move operation is successful, then it is safe to work on the file.
You could also leverage a database to keep track of which files are being processed, have been processed, or are awaiting processing.
This adds the complication of updating the table with new files.
