We have a Windows server, and at some point text files will be delivered via SFTP into a folder. I don't have more information, but maybe this is enough. Now I need to write a function that moves these files into another folder. That shouldn't be too hard, I thought... but then I realized I'm able to move a file before the transfer is finished. So I've been searching for solutions and I'm really confused.
My idea is to check the file and the processes around it: if the file isn't finished yet, there is a copy process holding it, and I can detect that. To keep it simple, I just try to lock the file, and if no other process has it open, then the file is ready to move?
using (File.Open("myFile", FileMode.Open, FileAccess.Read, FileShare.None))
{ /*rdy!*/ }
But now I see people suggesting checksum tests, or watching the file size and treating the file as ready once the size stops changing. Isn't that a bit complicated? Please tell me my solution could work too... I can't test this with any server-to-server SFTP setup. I only know that it works when I copy a file to another folder via Explorer. Does it work the same way over an SFTP transfer? Any ideas? Thank you
File-size checks are dangerous: what if the upload is suspended and later resumed? How much time should pass before you accept the current file size as final? => Not a good solution.
I'd go for the locking. However, this only works if the process that writes the file also opens it in a way that locks it exclusively. If the process doesn't do that, you'll be stuck with your problem again.
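As a minimal sketch of that probe (the IsFileReady name and the IOException handling are mine, not from the question), you can wrap the exclusive open in a try/catch and treat a failed open as "still being written":

using System.IO;

static bool IsFileReady(string path)
{
    try
    {
        // FileShare.None fails while any other process still has the file open.
        using (File.Open(path, FileMode.Open, FileAccess.Read, FileShare.None))
        {
            return true;
        }
    }
    catch (IOException)
    {
        // Still locked by the uploader (or otherwise in use).
        return false;
    }
}

A caller would typically retry this on a timer until it returns true, then move the file.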
Another solution would be to upload the files with temporary names, like ".sftptmp". And to have the uploader rename it after it is done. That way you can be sure the file has been uploaded - just ignore all files that end with ".sftptmp". This, however, assumes that you actually have control over the process of uploading files.
Another option is to have the sender put a control file after the data file. For example, put uploadfile-20220714.txt, then put uploadfile-20220714.ctl. The control file can contain file information such as the name and size of the data file. This option requires the sender to modify their process, but it shouldn't require too much effort.
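A rough sketch of the receiving side under that convention (the GetCompletedUploads name and the .txt/.ctl pairing follow the example above; both are illustrative):

using System.Collections.Generic;
using System.IO;

// Treat a data file as complete only once its matching .ctl file exists.
static IEnumerable<string> GetCompletedUploads(string folder)
{
    foreach (string ctl in Directory.GetFiles(folder, "*.ctl"))
    {
        string dataFile = Path.ChangeExtension(ctl, ".txt"); // assumed naming convention
        if (File.Exists(dataFile))
            yield return dataFile;
    }
}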
I have a server where Word files are stored. I want the user to be able to click a link, open the Word file from the server, edit it, and then save it back to the server.
So far I've figured out that I can't do this directly; I have to save the file locally, edit it, and then upload it again.
Is there a better way to do this? If not, how can I wait until the file is saved and then automatically upload it again?
I think your best bet is probably to look into some of the controls out there that you can buy to do this. Not necessarily cheap, but it depends on how much you need this functionality:
https://www.syncfusion.com/aspnet-mvc-ui-controls/word-processor
Regarding detecting a file change locally... you can use a FileSystemWatcher to do this on a folder, though this isn't something that will work inherently from a website. You would need to implement and distribute a Windows service that your users would install, and they would need to download the file they were changing into the folder(s) that your FileSystemWatcher was watching...
https://learn.microsoft.com/en-us/dotnet/api/system.io.filesystemwatcher?view=netframework-4.8
https://learn.microsoft.com/en-us/dotnet/framework/windows-services/walkthrough-creating-a-windows-service-application-in-the-component-designer
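Here's a minimal FileSystemWatcher sketch of the kind of thing that service would do; the folder path, filter, and console output are placeholders:

using System;
using System.IO;

class WatcherDemo
{
    static void Main()
    {
        // Watch a local folder for saves to .docx files (path is a placeholder).
        using (var watcher = new FileSystemWatcher(@"C:\WatchedDocs", "*.docx"))
        {
            watcher.NotifyFilter = NotifyFilters.LastWrite | NotifyFilters.FileName;
            watcher.Changed += (s, e) =>
            {
                // Word fires several Changed events per save, so a real service
                // would debounce here before triggering the re-upload.
                Console.WriteLine("Changed: " + e.FullPath);
            };
            watcher.EnableRaisingEvents = true;
            Console.ReadLine(); // keep the watcher alive
        }
    }
}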
Hope this helps.
I want to provide my users with the ability to post a log file onto my server rather than requiring them to find it and email it to me.
I am using this library http://ftplib.codeplex.com/ to open an FTP session on the server and send the file. There is a bit of renaming involved, but that is it.
Unfortunately, the log file to be sent is actually open, so I got a 'file is being used by another process' exception. This makes sense when I think about it, insofar as the log is open while my app is running. I closed it, but of course uploading is a long process. I put the upload code into a background thread so the user can continue. However, the log cannot be re-opened until the upload is complete, and in the meantime there could be some event that should be written to the log.
So I am looking for a way to copy the log and then upload it. What would be the best way to do that? The log is a binary file BTW.
If you don't own the code that has the log file open (i.e., it's another app or a closed-source dll), you can try doing a File.Copy(<log>, <tempdest>) and send that file, deleting it when you're done. Note that this only works if the process holding the log open allowed shared reads; if it opened the file without read sharing, the copy will fail with the same error.
If you do own the code that is accessing the file in the first place, you want to open it with an explicit ShareMode, i.e.
File.Open(path, FileMode.Open, FileAccess.ReadWrite, FileShare.Read)
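Putting those two together, a rough sketch of snapshotting the live log and uploading the copy in the background; UploadViaFtp stands in for the ftplib call and is not a real API:

using System.IO;
using System.Threading.Tasks;

// Snapshot the live log, then upload the copy on a background thread
// so the log itself can be re-opened immediately.
static void UploadLogSnapshot(string logPath)
{
    string temp = Path.GetTempFileName();
    File.Copy(logPath, temp, true); // works if the writer allows shared reads
    Task.Run(() =>
    {
        try { UploadViaFtp(temp); }
        finally { File.Delete(temp); }
    });
}

static void UploadViaFtp(string path)
{
    // placeholder for the ftplib upload + rename logic
}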
I'm making a little app in C#/.NET that watches for the creation of a file, and when it is created it reads its content, parses it, and writes the result to another file.
Everything has been working fine so far. But here's the problem: there's another process watching for this file as well. My process only READS the file, while the second one reads it and then DELETES it.
My application does its job, but while it is reading the file the other process can't read it and crashes completely (it's not mine and I don't have the sources to fix it).
My application runs very fast: it opens the file for only a very short time, reads the content into a variable so it can close the file as quickly as possible, and then parses the content from the variable.
I clearly don't know how, but I'd like to be able to read the file while letting the other process read it at the same time, without any hiccups. Is that possible? I suspect there will still be a problem with the fact that the file gets deleted once the other app is done parsing it...
Any suggestions or ideas?
Thanks very much!
You can open the file as follows to ensure you don't lock it from other processes:
using (FileStream fs = File.Open(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
// do your stuff
}
But if the other process is trying to open it in exclusive mode, it won't help and it will still crash. There's no way to deal with that other than fixing the code for the other process.
KISS: Can you have the file created in a location which the first program isn't looking at, but your software is, and when you are done processing it, move it to the location where the first program is looking?
Otherwise:
You are going to have contention since it's going to be a race to see which process actually "notices" the file first and begins working.
I'm assuming you also don't have any control over the process creating the file?
In that case you might look at PsSuspend or PauseSp - if you can control the other process by suspending it until you are ready for it (done with the file) then that might be viable. Not sure how robust this would be.
There's also still the potential race condition of "noticing" the file and performing an action (whatever it is) - keeping the other process paused perpetually until you want it to run (or killing it and starting it) is the only completely deterministic way to achieve what you want within the constraints.
If you are using an NTFS drive (which is very likely), then you can create a hard-link to the file. Essentially, this duplicates the file without actually creating a duplicate. You can read the file with the hard-link. The other process can delete the file, which will only remove their link to the file. This will leave the file in place for you to read. When your program is done reading the file, it can delete the hard-link, and the file system will see that both links have been deleted, and it will delete the file itself.
This can be done from the command line with
fsutil hardlink create <NewFileName> <ExistingFileName>
Or you can P/Invoke the CreateHardLink function in the Windows API.
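A minimal P/Invoke sketch of that call (a straight mapping of the Win32 signature; the HardLink wrapper class is mine):

using System;
using System.ComponentModel;
using System.Runtime.InteropServices;

static class HardLink
{
    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    static extern bool CreateHardLink(string lpFileName, string lpExistingFileName, IntPtr lpSecurityAttributes);

    // Creates linkPath as a second name for existingPath (same NTFS volume only).
    public static void Create(string linkPath, string existingPath)
    {
        if (!CreateHardLink(linkPath, existingPath, IntPtr.Zero))
            throw new Win32Exception(Marshal.GetLastWin32Error());
    }
}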
Could you create an empty zero-byte marker file with the same name as the original but with the extension ".reading"? Then, once the first process is done reading the file, rename ".reading" to ".done"; the second process can watch for ".done" files and delete the original file, since the ".done" marker and the original have the same name and differ only in extension.
#Prashant's response gave me the inspiration for this, and it's very similar, but I believe it will solve your problem.
If the other process must match a certain filename pattern:
Rename the file to something that won't match first (a very cheap/fast operation)
Rename it back when finished
If it matches every file in a given folder:
Move it to another folder (also a very cheap operation in most filesystems)
Move it back when finished.
If the other process has already locked your file (even for read), your rename or move will fail, and you can handle that gracefully. If not, you should be safe.
There is still a race condition possibility, of course, but this should be much safer than what you are doing.
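Here's a sketch of the rename variant; the ReadWithoutContention name and the ".hold" suffix are illustrative:

using System.IO;

// Move the file out of the other process's match pattern, read it,
// then rename it back.
static string ReadWithoutContention(string path)
{
    string hidden = path + ".hold";
    File.Move(path, hidden); // throws if the other process already has it locked
    try
    {
        return File.ReadAllText(hidden);
    }
    finally
    {
        File.Move(hidden, path); // put it back for the other process
    }
}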
I want to replace existing files on an IIS website with updated versions. Say these files are large pdf documents, which can be accessed via hyperlinks. The site is up 24x7, so I'm concerned about locking issues when a file is being updated at exactly the same time that someone is trying to read the file.
The files are updated using C# code run on the server.
I can think of two options for opening the file for writing.
Option 1) Open the file for writing, using FileShare.Read :
using (FileStream stream = new FileStream(path, FileMode.Create, FileAccess.Write, FileShare.Read))
While this file is open, and a user requests the same file for reading in a web browser via a hyperlink, the document opens up as a blank page.
Option 2) Open the file for writing using FileShare.None :
using (FileStream stream = new FileStream(path, FileMode.Create, FileAccess.Write, FileShare.None))
While this file is open, and a user requests the same file for reading in a web browser via a hyperlink, the browser shows an error. In IE 8, you get HTTP 500, "The website cannot display the page", and in Firefox 3.5, you get : "The process cannot access the file because it is being used by another process."
The browser behaviour kind of makes sense and seems reasonable. I guess it's highly unlikely that a user will attempt to read a file at exactly the same time you are updating it. It would be nice if the file update were somehow atomic, like updating a database with SQL wrapped in a transaction.
I'm wondering if you guys worry about this sort of thing, and prefer either of the above options, or even have other options of your own for updating files.
How about copying the new version of the file under a different name and then doing a File.Move(), setting the overwrite argument to true? While you're writing it you won't interfere with the web server, and moving the file(s) will be quick.
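A sketch of that pattern, with an assumed staging name and method name (note the overwrite overload of File.Move only exists from .NET Core 3.0 onward; File.Replace is the closest equivalent on .NET Framework):

using System.IO;

// Write the new version off to the side, then swap it in at the end.
static void PublishUpdate(string newVersionPath, string livePath)
{
    string staging = livePath + ".new";       // illustrative staging name
    File.Copy(newVersionPath, staging, true); // the slow part, off to the side
    File.Move(staging, livePath, true);       // the quick swap into place
}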
I usually don't worry about this kind of problem, but you could work around it this way:
Create a link to an ASPX page which downloads the referenced file, like "Download.aspx?Name=Document1.pdf"
In that page, before downloading the file, look for a folder named "Updated"
If you find it, get your file from it
If not, get it from the "Current" folder
To update your files:
Create a folder named "Updating"
Copy your new files into it
Rename it to "Updated" (so new downloads use it as source)
Update your "Current" folder
Delete your "Updated" folder
One option is to build an extra layer between the hyperlink and the file. Instead of linking directly to the file have the hyperlink point to another page (or similar resource). This resource/page can then determine which is the latest file that needs to be sent to the browser and then send it down to the browser. This way the link will always be the same, but the file can change.
This is a threading issue at heart, and those can be tricky to solve in a clean way. Conceptually, you want to synchronize read and write access to the file from multiple threads.
To achieve this, I would store the file outside of IIS' website root so that IIS does not serve it directly. I would create a separate HttpHandler to serve it instead. The HttpHandler would lock on an object that the write code would also lock on. You should use a ReaderWriterLockSlim so that you can have multiple concurrent reads (EnterReadLock) while also ensuring only a single write thread can execute (EnterWriteLock) and that reads are blocked while writing.
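A minimal sketch of that locking arrangement; the PdfStore class and method names are illustrative, not a real API:

using System.IO;
using System.Threading;

// One lock shared by the serving HttpHandler (readers) and the update code
// (the single writer).
static class PdfStore
{
    static readonly ReaderWriterLockSlim FileLock = new ReaderWriterLockSlim();

    public static byte[] ReadForDownload(string path)
    {
        FileLock.EnterReadLock();            // many readers may hold this at once
        try { return File.ReadAllBytes(path); }
        finally { FileLock.ExitReadLock(); }
    }

    public static void WriteUpdate(string path, byte[] content)
    {
        FileLock.EnterWriteLock();           // blocks until all readers have left
        try { File.WriteAllBytes(path, content); }
        finally { FileLock.ExitWriteLock(); }
    }
}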
Having a similar issue, I developed my own solution (basically having multiple versions of a file, serving them through an ASHX handler).
Please see my CodeProject article discussing the solution (and possible caveats).
We have a C# Windows service polling a folder waiting for an FTP’ed file to be posted in. To avoid using the file when it is still being written to we attempt to get a lock on the file first, however, there seems to be occasions where we are getting a lock on the file after the FTP’ed file is created but before the file is written to, so we end up opening an empty file.
Is there any reliable way to tell when the FTP transfer is complete?
You could change the filename before upload, then rename it after the transfer is done. That way the file will look like it doesn't exist until it's finished.
A practice I've seen is to transfer two files: the actual file, and then a second one we can call a ".done" file. The idea is that as soon as you see the ".done" file, you know the first file should be complete.
Other options include watching the file for modifications and waiting for a certain amount of time with no modifications. Of course, this is not foolproof.
Edit
Kyle makes a good point that adding a checksum to the .done file, and/or indicating the size of the first file, is good protection against fringe cases.
I'm always a big fan of the .filepart protocol, so that no matter what transfer protocol you use (ftp, ssh, rsync, etc.) you have the same understanding.
This isn't a direct answer to your question, but instead of searching for an FTP-only solution, a more generic solution could serve you better in the long run.
(.filepart: rename the file test.txt to test.txt.filepart, then when the transfer is done, rename it back to test.txt)
What about using a folder watcher to index the contents? If a file's size does not change within, say, 5 minutes, you can pretty much guarantee the upload has finished.
The timeout could be tied to the timeout of your FTP server too.
http://www.codeproject.com/KB/files/MonitorFolderActivity.aspx
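A sketch of that size-stability check (the helper name and the polling loop are mine; the interval and timeout are whatever you choose):

using System;
using System.IO;
using System.Threading;

// Poll the file size until it stays unchanged for one full interval.
static bool WaitForStableSize(string path, TimeSpan interval, TimeSpan timeout)
{
    long lastSize = -1;
    DateTime deadline = DateTime.UtcNow + timeout;
    while (DateTime.UtcNow < deadline)
    {
        long size = new FileInfo(path).Length;
        if (size == lastSize)
            return true;         // no growth over one full interval
        lastSize = size;
        Thread.Sleep(interval);
    }
    return false;                // still changing (or stalled) when we gave up
}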
I've always used a checksum file: you send a checksum file that records the file size and the checksum. You know the file has uploaded correctly when the checksum recorded in that file matches the checksum you compute from the file on disk.
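A sketch of that verification, assuming the sidecar file holds a hex-encoded SHA-256 (the actual format is whatever you agree on with the sender):

using System;
using System.IO;
using System.Security.Cryptography;

// Compare the hash recorded in the sidecar file with the hash of the
// data file on disk.
static bool ChecksumMatches(string dataPath, string checksumPath)
{
    string expected = File.ReadAllText(checksumPath).Trim();
    using (var sha = SHA256.Create())
    using (var stream = File.OpenRead(dataPath))
    {
        string actual = BitConverter.ToString(sha.ComputeHash(stream)).Replace("-", "");
        return string.Equals(expected, actual, StringComparison.OrdinalIgnoreCase);
    }
}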
The method I've used in the past is a mix of some of the other replies here.
i.e. FTP a file using a different extension to the one expected (e.g. FILENAME.part), then rename it with the proper extension as the last step of uploading.
On the server, use a FileSystemWatcher to look for new files with the correct extension.
The FSW will not see the file until it's renamed, and the renaming operation is atomic so the file will be complete and available the moment it's been renamed.
Renaming or moving files of course relies on you having control over the uploading process.
If you do not have any control over how the files are uploaded, you will be stuck with using the FSW to know a file is being uploaded, then monitoring its size - when it's unchanged for a long period of time, you may be able to assume it's complete.
Rather than polling, you might want to have a look at System.IO.FileSystemWatcher.