Get recently uploaded file from FTP - C#

Is there any way in C# to detect the most recently uploaded file on an FTP server?
Whenever a new file is uploaded to the FTP server, a trigger should be raised indicating that this is the new file that was added.
I partially achieved this using FtpWebRequest and WinSCP (checking for files whose last-modified date is within the last 5 minutes), but there is a use case that fails here.
Let's say a file was last modified on 01/01/2018 and I upload it to the FTP server today; going by its last-modified date, it won't be processed.
Is there any way by which I can check which file was uploaded recently?

You can only use the information that the FTP server provides you with, and it won't tell you what files were added. If you cannot use the file modification time, you are out of luck. Except maybe if the server provides a file creation (not modification) timestamp, but I do not know of any major FTP server that does.
So all you can do is remember a list of files on the server and compare the current list against a previous one to find which files were added.
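A minimal sketch of that approach, comparing the current directory listing against a listing remembered from the previous run; the FTP URL, credentials, and the local cache file name are placeholders:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net;

string listingCache = "last-listing.txt";              // remembers the previous listing locally

var request = (FtpWebRequest)WebRequest.Create("ftp://example.com/incoming/");
request.Method = WebRequestMethods.Ftp.ListDirectory;  // NLST: file names only
request.Credentials = new NetworkCredential("user", "password");

List<string> current;
using (var response = (FtpWebResponse)request.GetResponse())
using (var reader = new StreamReader(response.GetResponseStream()))
{
    current = reader.ReadToEnd()
        .Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries)
        .ToList();
}

var previous = File.Exists(listingCache)
    ? new HashSet<string>(File.ReadAllLines(listingCache))
    : new HashSet<string>();

foreach (string added in current.Where(name => !previous.Contains(name)))
    Console.WriteLine("New file: " + added);            // raise your "new file" trigger here

File.WriteAllLines(listingCache, current);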

Related

SSH.NET: Is it possible to upload files using SFTP and preserve the file dates from source files?

Currently, I am using the Renci SSH.NET library to upload files to a Unix server using SFTP. One thing that I don't like is that after uploading files, the creation and modification dates are altered to the time when the upload took place.
I would like to preserve the original file dates from the source files; is that possible?
The SSH.NET library won't do it for you automatically. You have to code it.
There are SftpClient.SetLastWriteTime and SftpClient.SetLastWriteTimeUtc methods. But they are actually not implemented yet.
You can code it like this instead:
// Read the remote file's attributes, copy the source file's last-write time into them,
// and write them back to the server.
SftpFileAttributes fileAttributes = client.GetAttributes(targetFile);
fileAttributes.LastWriteTime = File.GetLastWriteTime(sourceFile);
client.SetAttributes(targetFile, fileAttributes);
Though due to the lack of a UTC API in SftpFileAttributes, you might have problems setting the timestamp correctly if the client and the server are not in the same timezone.
For more details, see my answer to:
Modified date time changes when moving a file from Windows to UNIX server using SSH.NET
Or use another SFTP library capable of preserving the timestamp automatically, ideally with UTC support.
For example, WinSCP .NET assembly does it automatically. Just use the Session.PutFiles method:
session.PutFiles(sourceFile, targetFile).Check();
(I'm the author of WinSCP)
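For completeness, a minimal sketch of the WinSCP session setup around that call; the host name, credentials, and fingerprint below are placeholders:
using WinSCP;

var sessionOptions = new SessionOptions
{
    Protocol = Protocol.Sftp,
    HostName = "example.com",
    UserName = "user",
    Password = "password",
    SshHostKeyFingerprint = "ssh-rsa 2048 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
};

using (var session = new Session())
{
    session.Open(sessionOptions);
    // PutFiles preserves the source timestamps by default (TransferOptions.PreserveTimestamp).
    session.PutFiles(sourceFile, targetFile).Check();
}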

Can you pre-compress data files to be inserted into a zip file at a later time to improve performance?

As part of our installer build, we have to zip thousands of large data files into about ten or twenty 'packages', with a few hundred (or even thousands of) files in each, which are all dependent on being kept with the other files in the package. (They are versioned together, if you will.)
Then during the actual install, the user selects which packages they want included on their system. This also lets them download updates to the packages from our site as one large, versioned file rather than asking them to download thousands of individual ones which could also lead to them being out of sync with others in the same package.
Since these are data files, some of them change regularly during the design and coding stages, meaning we then have to re-compress all files in that particular zip package, even if only one file has changed. This makes the packaging step of our installer build take well over an hour each time, with most of that going to re-compressing things that we haven't touched.
We've looked into leaving the zip packages alone, then replacing specific files inside them, but inserting and removing large files from the middle of a zip doesn't give us that much of a performance boost. (A little, but not enough that it's worth it.)
I'm wondering if it's possible to pre-process files down into a cached raw 'compressed state' that matches how it would be written to the zip package, but only the data itself, not the zip header info, etc.
My thinking is if that is possible, during our build step, we would first look for any data file that doesn't have a compressed cache associated with it, and if not, we would compress that file and write the result to the cache.
Next we would simply append all of the caches together in a file stream, adding any appropriate zip header needed for the files.
This would mean we are still recreating the entire zip during each build, but we are only recompressing data that has changed. The rest would just be written as-is which is very fast since it is a straight write-to-disk. And if a data file changes, its cache is destroyed, so next build-pass it would be recreated.
However, I'm not sure such a thing is possible. Is it, and if so, is there any documentation to show how one would go about attempting this?
Yes, that's possible. The most straightforward approach would be to zip each file individually into its own associated zip archive with one entry. When any file is modified, you replace its associated zip file to keep all of those up to date. Then you can write a simple program to take a set of those single entry zip files and merge them into a single zip file. You will need to refer to the documentation in the PKZip appnote. Take a look at that.
Now that you've read the appnote, what you need to do is use the local header, data, and central header from each individual zip file, write the local header and data as is sequentially to the new zip file, and save the central header and the offsets of the local headers in the new file. Then at the end of the new file save the current offset, write a new central directory using the central headers you saved, updating the offsets appropriately, and ending with a new end of central directory record with the offset of the start of the central directory.
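For the first step described above (zipping each file individually into its own single-entry archive and refreshing that cache when the file changes), a minimal sketch using System.IO.Compression might look like this; the file paths are placeholders:
using System.IO;
using System.IO.Compression;

string dataFile = @"data\levels.bin";
string cacheZip = @"cache\levels.bin.zip";

// Rebuild the single-entry cache zip only when the data file is newer than its cache.
if (!File.Exists(cacheZip) ||
    File.GetLastWriteTimeUtc(dataFile) > File.GetLastWriteTimeUtc(cacheZip))
{
    File.Delete(cacheZip);                     // no-op if the cache does not exist yet
    using (var archive = ZipFile.Open(cacheZip, ZipArchiveMode.Create))
    {
        archive.CreateEntryFromFile(dataFile, Path.GetFileName(dataFile),
                                    CompressionLevel.Optimal);
    }
}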
Update:
I decided this was a useful enough thing to write. You can get it here.
You could zip each file beforehand, and then "zip" them together with no compression at the end to quickly aggregate them into a distributable package. It won't be as efficient as compressing all the data at once, but it should be faster to make modifications.
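A minimal sketch of that aggregation step, assuming every data file has already been compressed into its own .zip in a cache folder; the paths are placeholders:
using System.IO;
using System.IO.Compression;

string cacheDir = "cache";
string packagePath = "package.zip";

using (var package = ZipFile.Open(packagePath, ZipArchiveMode.Create))
{
    foreach (string cachedZip in Directory.GetFiles(cacheDir, "*.zip"))
    {
        // Store the already-compressed zips as-is instead of recompressing them.
        package.CreateEntryFromFile(cachedZip, Path.GetFileName(cachedZip),
                                    CompressionLevel.NoCompression);
    }
}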
I cannot seem to locate an actual exe that implements this type of functionality. It appears that most existing tools I've tried that have the ability to merge/update will reprocess (compress) the data stream, as you have already stated you saw.
However, it seems what you describe can be done if you or someone wants to write it. If you take a look at this link for the ZIP file format specification, you can get an overview of the structure you would have to parse out and process. It looks like you can pretty quickly go from file to file, gathering up and discarding the files of interest, then merging in your new/updated files. You would still need to rebuild a new central directory (refer to section 4.3.6 of the above linked document) within your new destination archive.
After a little more digging, the DotNetZip Library forum has a message asking about the same type of functionality which also gives a description just like I described above. It also links to this document which seems to indicate that support for that may be added to the DotNetZip library for you to further experiment with.

access a zipped file without unzipping?

My program produces a log of info every hour that the system is running, containing various data like access times, data transfers, and any faults/warnings experienced. Unfortunately, these log files can be anywhere from 10,000 KB to 25,000 KB in size, so I've begun zipping them individually once they're at least 24 hours old; this way my system has only 24 unzipped log files at any one time.
The issue I need to resolve is that part of this software is a 'Diagnostics' window, where the user can load log files from a selected date range (based on the files' creation time) and view their contents in an easy-to-read format. I understand that in order for the files to show up in the search there must be an exception allowing .zip files to be checked as well, but I cannot access any of the files' data to see if said .zip files fall into the date range.
My question is: is there a way for me to access the zipped files' information (and, by extension, their contents) without having to unzip the files, do the search, and re-zip the files? That seems like too much work to unzip one hundred or more files if only 1 or 2 fall in the date range.
You should add a timestamp to the filename of each zipped file.
In general, when you zip a file you're putting the actual data of the file into a format that is unreadable. Most zipping algorithms (keep in mind that there are very many) work on a very bit-hacky level, which is why you really need to unzip the files to get your original data out. (There's no such thing as a free lunch.)
Luckily though, a file is not just a file! Because you're totally right, having to read a file to do things with it would be terrible! Imagine having to search a file system if you had to read each file to figure out where in the directory it was.
There are a number of ways to access the metadata associated with your file depending on what exact system you're on. For instance, in unix-style machines using the command ls -l will get you the last edited information.
That said, log files usually have names that start with a timestamp for this exact reason. If you want to keep your filenames pretty though, going through the last-edited date is probably the way to go.
A good zip library (e.g. SharpZipLib) ought to allow you to iterate over the files contained in the archive without extracting them. This will allow you to query the associated file dates. For example, using the aforementioned SharpZipLib, you would just need to inspect the DateTime property of the ZipEntry objects contained in the archive.
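A minimal sketch of that, assuming SharpZipLib is referenced; the archive path is a placeholder:
using System;
using ICSharpCode.SharpZipLib.Zip;

var zip = new ZipFile(@"logs\2018-01-01.zip");
try
{
    foreach (ZipEntry entry in zip)
    {
        if (!entry.IsFile)
            continue;
        // Compare entry.DateTime against the user's selected date range here.
        Console.WriteLine(entry.Name + ": " + entry.DateTime);
    }
}
finally
{
    zip.Close();
}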

how to find the timestamp of an online pdf file using c#?

I am writing an application that would download and replace a PDF file only if its timestamp is newer than that of the already existing one.
I know it's possible to read the timestamp of a file on a local computer via the code line below:
MessageBox.Show(File.GetCreationTime("C:\\test.pdf").ToString());
Is it possible to read the timestamp of a file that is online without downloading it?
Unless the directory containing the file on the site is configured to show raw file listings, there's no way to get a timestamp for a file via HTTP. Even with raw listings you'd need to parse the HTML yourself to get at the timestamp.
If you had FTP access to the files then you could do this. If just using the basic FTP capabilities built into the .NET Framework, you'd still need to parse the directory listing to get at the date. However, there are third-party FTP libraries that fill in the gaps, such as editFTPnet, where you get an FTPFile class.
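That said, if the FTP server happens to support the MDTM command, the framework's FtpWebRequest can ask for a single file's timestamp directly; a minimal sketch, where the URL and credentials are placeholders:
using System;
using System.Net;

var request = (FtpWebRequest)WebRequest.Create("ftp://example.com/files/test.pdf");
request.Method = WebRequestMethods.Ftp.GetDateTimestamp;
request.Credentials = new NetworkCredential("user", "password");

using (var response = (FtpWebResponse)request.GetResponse())
{
    Console.WriteLine(response.LastModified);   // timestamp reported by the server
}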
Updated:
Per comment:
If I were to set up a simple HTML file with the dates and filenames written manually, I could simply read that to find out which files have actually been updated and download just the required files. Is that a feasible solution?
That would be one approach. Or, if you have scripting available (ASP.NET, ASP, PHP, Perl, etc.), you could automate this and have the script get the timestamps of the file(s) and render them for you. Or you could write a very simple web service that returns a JSON or XML blob containing the timestamps for the files, which would be less hassle to parse than some HTML.
It's only possible if the web server explicitly serves that data to you. The creation date for a file is part of the file system. However, when you're downloading something over HTTP it's not part of a file system at that point.
HTTP doesn't have a concept of "files" in the way people generally think. Instead, what would otherwise be a "file" is transferred as response data with a response header that gives information about the data. The header can specify the type of the data (such as a PDF "file") and even specify a default name to use if the client decides to save the data as a file on the client's local file system.
However, even when saving that, it's a new file on the client's local file system. It has no knowledge of the original file which produced the data that was served by the web server.
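As an illustration of those response headers: when the server does choose to send a Last-Modified header, a HEAD request can read it without downloading the body. A minimal sketch, where the URL is a placeholder and the header may simply be absent:
using System;
using System.Net.Http;

using (var client = new HttpClient())
{
    var head = new HttpRequestMessage(HttpMethod.Head, "http://example.com/test.pdf");
    using (var response = client.SendAsync(head).Result)
    {
        Console.WriteLine(response.Content.Headers.LastModified);   // may be null
    }
}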

How can I tell if a file on an FTP is identical to a local file with out actually downloading the file?

I'm writing a simple program that is used to synchronize files to an FTP server. I want to be able to check if the local version of a file is different from the remote version, so I can tell if the file(s) need to be transferred. I could check the file size, but that's not 100% reliable because obviously it's possible for two files to be the same size but contain different data. The date/time the files were modified is also not reliable, as the user's computer date could be set wrong.
Is there some other way to tell if a local file and a file on an FTP are identical?
There isn't a generic way. If the ftp site includes a checksum file, you can download that (which will be a lot quicker since a checksum is quite small) and then see if the checksums match. But of course, this relies on the owner of the ftp site creating a checksum file and keeping it up to date.
Other than that, you are S.O.L.
If the server is plain-old FTP, you can't do any better than checking the size and timestamps.
FTP has no mechanism for giving you the hashes/checksums of files, so you would need to do something like keeping a special "listing file" that has all the file names and hashes, or doing a separate request via HTTP, or some other protocol.
Ideally, you should not be using FTP anyway, it's really an obsolete protocol. If you have control of the system, you could use rsync or something like it.
Use a checksum. You generate the md5 (or sha1, sha2 etc) hash of both files, and if the files are identical, then the hashes will be identical.
IETF tried to achieve this by adding new FTP commands such as MD5 and MMD5.
http://www.faqs.org/rfcs/ftp-rfcs.html
However, not all FTP vendors support them. So you must check whether the target FTP server your application will work against supports MD5/MMD5. If not, you can fall back to the workarounds mentioned above.
Couldn't you use a FileSystemWatcher and just have the client remember what changed?
http://msdn.microsoft.com/en-us/library/system.io.filesystemwatcher.aspx
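A minimal sketch of that idea, watching a local folder and remembering which files changed so only those are uploaded on the next sync; the folder path is a placeholder:
using System;
using System.IO;

var watcher = new FileSystemWatcher(@"C:\local\sync-folder")
{
    IncludeSubdirectories = true
};
watcher.Created += (s, e) => Console.WriteLine("Created: " + e.FullPath);
watcher.Changed += (s, e) => Console.WriteLine("Changed: " + e.FullPath);
watcher.EnableRaisingEvents = true;

Console.ReadLine();   // keep the process alive while watching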
Whenever your client uploads files to the FTP server, map each file to its hash and store it locally on the client computer (or store it anywhere you can access later; the format doesn't matter, it can be an XML file or plain text, as long as you can retrieve the key/value pairs). Then when you upload files again, just check the local files against the hash table you created; if a file is different, upload it. This way you don't have to rely on the server to maintain a checksum file and you don't have to have a process running to monitor FileSystemWatcher events.
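A minimal sketch of that hash-tracking idea; the cache file name, the local folder, and the UploadToFtp call are placeholders:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

string cachePath = "uploaded-hashes.txt";

// Compute a file's MD5 hash as a hex string.
Func<string, string> hashFile = path =>
{
    using (var md5 = MD5.Create())
    using (var stream = File.OpenRead(path))
        return BitConverter.ToString(md5.ComputeHash(stream)).Replace("-", "");
};

// Load the previously stored "name=hash" lines (any format works, this is just one option).
var known = File.Exists(cachePath)
    ? File.ReadAllLines(cachePath)
        .Select(line => line.Split('='))
        .ToDictionary(parts => parts[0], parts => parts[1])
    : new Dictionary<string, string>();

foreach (string file in Directory.GetFiles(@"C:\local\sync-folder"))
{
    string name = Path.GetFileName(file);
    string hash = hashFile(file);
    string oldHash;
    if (!known.TryGetValue(name, out oldHash) || oldHash != hash)
    {
        // UploadToFtp(file);                  // hypothetical upload call
        known[name] = hash;
    }
}

File.WriteAllLines(cachePath, known.Select(kv => kv.Key + "=" + kv.Value));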
