How to find the timestamp of an online PDF file using C#?

I am writing an application that will download and replace a PDF file only if its timestamp is newer than that of the already existing one.
I know it's possible to read the timestamp of a file on a local computer via the line below:
MessageBox.Show(File.GetCreationTime("C:\\test.pdf").ToString());
Is it possible to read the timestamp of a file that is online without downloading it?

Unless the directory containing the file on the site is configured to show raw file listings, there's no way to get a timestamp for a file via HTTP. Even with raw listings you'd need to parse the HTML yourself to get at the timestamp.
If you had FTP access to the files then you could do this. Using just the basic FTP capabilities built into the .NET Framework, you'd still need to parse the directory listing to get at the date. However, there are third-party FTP libraries that fill in the gaps, such as editFTPnet, which gives you an FTPFile class.
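If you do go the FTP route, the framework's FtpWebRequest can also ask the server for a single file's modification time directly (it issues the MDTM command under the hood). A minimal sketch, assuming the server supports MDTM and with placeholder host, credentials and path:

using System;
using System.Net;

class FtpTimestampCheck
{
    static void Main()
    {
        // Placeholder URI and credentials -- replace with your own.
        var request = (FtpWebRequest)WebRequest.Create("ftp://example.com/files/test.pdf");
        request.Method = WebRequestMethods.Ftp.GetDateTimestamp; // issues MDTM
        request.Credentials = new NetworkCredential("user", "password");

        using (var response = (FtpWebResponse)request.GetResponse())
        {
            // LastModified holds the remote file's modification time
            // when the server supports the MDTM command.
            Console.WriteLine(response.LastModified);
        }
    }
}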
Updated:
Per comment:
If I were to set up a simple HTML file with the dates and filenames written manually, I could simply read that to find out which files have actually been updated and download just the required files. Is that a feasible solution?
That would be one approach. Or, if you have scripting available (ASP.NET, ASP, PHP, Perl, etc.), you could automate this and have the script get the timestamps of the file(s) and render them for you. Or you could write a very simple web service that returns a JSON or XML blob containing the timestamps for the files, which would be less hassle to parse than some HTML.
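Purely as an illustration (the URL, the JSON shape and the Json.NET dependency below are assumptions, not part of the question), the client side of that web-service idea could look something like this:

using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using Newtonsoft.Json; // Json.NET, assumed to be available

class TimestampCheck
{
    static void Main()
    {
        // Hypothetical service returning e.g. {"test.pdf":"2018-06-01T10:00:00Z", ...}
        var client = new WebClient();
        string json = client.DownloadString("http://example.com/timestamps.json"); // placeholder URL
        var remote = JsonConvert.DeserializeObject<Dictionary<string, DateTime>>(json);

        string localPath = "C:\\test.pdf";
        DateTime localTime = File.GetLastWriteTimeUtc(localPath);

        if (!File.Exists(localPath) || remote["test.pdf"].ToUniversalTime() > localTime)
        {
            // Remote copy is newer (or missing locally) -- download and replace it.
            client.DownloadFile("http://example.com/test.pdf", localPath);
        }
    }
}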

It's only possible if the web server explicitly serves that data to you. The creation date of a file is part of the file system, but when you're downloading something over HTTP, the data isn't part of a file system at that point.
HTTP doesn't have a concept of "files" in the way people generally think. Instead, what would otherwise be a "file" is transferred as response data with a response header that gives information about the data. The header can specify the type of the data (such as a PDF "file") and even specify a default name to use if the client decides to save the data as a file on the client's local file system.
However, even when saving that, it's a new file on the client's local file system. It has no knowledge of the original file which produced the data that was served by the web server.
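That said, many web servers do volunteer a Last-Modified response header for static content; whether it is present (and accurate) is entirely up to the server. A HEAD request is a cheap way to find out without downloading the body; a minimal sketch with a placeholder URL:

using System;
using System.Net;

class LastModifiedProbe
{
    static void Main()
    {
        // Placeholder URL -- replace with the PDF you care about.
        var request = (HttpWebRequest)WebRequest.Create("http://example.com/test.pdf");
        request.Method = "HEAD"; // headers only, no body is transferred

        using (var response = (HttpWebResponse)request.GetResponse())
        {
            // Null if the server chose not to send the header.
            string lastModified = response.Headers["Last-Modified"];
            Console.WriteLine(lastModified ?? "Server did not provide a Last-Modified header");
        }
    }
}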

Related

Generating XML Files & Returning Zipped In A Folder Within Docker

I am currently creating an API (ASP.NET Core 2.2) which takes in a JSON input, generates XML files based on the JSON, adds them to a folder structure, and then returns this folder structure zipped as a response. I am currently struggling to imagine how I am going to store the files once generated within Docker, and then how to add them to a zip file in a specific folder format to return to the requester.
I created a similar app (without using APIs) in WinForms, meaning I could just add folders and files to a temporary location on the client's machine and then zip it all up locally, but I'm wondering whether this is possible in Docker too (or if there is a better way altogether?). Once the zip file has been returned to the requester, there is no requirement to keep it stored anywhere.
I have done some research into using file/memory streams but haven't come across anything particularly useful.
Any resources or recommendations would be greatly appreciated!
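One direction that is often suggested for this kind of API (a sketch only, using placeholder type and entry names, and assuming an in-memory ZipArchive is acceptable for your payload sizes) is to build the zip entirely in a MemoryStream and return it as a FileResult, so nothing is ever written to the container's disk:

using System.IO;
using System.IO.Compression;
using Microsoft.AspNetCore.Mvc;

[ApiController]
[Route("api/[controller]")]
public class ExportController : ControllerBase
{
    [HttpPost]
    public IActionResult Post([FromBody] ExportRequest input) // ExportRequest stands in for your JSON input model
    {
        var zipStream = new MemoryStream();
        using (var archive = new ZipArchive(zipStream, ZipArchiveMode.Create, leaveOpen: true))
        {
            // Entry names containing slashes become folders inside the zip.
            var entry = archive.CreateEntry("folder1/file1.xml");
            using (var writer = new StreamWriter(entry.Open()))
            {
                writer.Write("<root>generated from the JSON input</root>"); // your XML generation goes here
            }
        }

        zipStream.Position = 0;
        return File(zipStream, "application/zip", "export.zip");
    }
}

public class ExportRequest { /* placeholder for the real JSON shape */ }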

Without first parsing, convert Excel file to Json string

Okay, this is different from other posts I'm seeing. I'm not trying to open an Excel file and parse the contents into a JSON object. I'm trying to take the file and convert it to a stream object of some sort, or a byte[], and then convert that to JSON so I can use it as an input parameter to a POST method of a Web API.
Here is the full scenario.
I have clients that will use an internal-only website to select one or more Excel files. The workstations the users work on may or may not have Excel installed, thus, all of my Excel processing has to be done on the server. Once the Excel files are processed, they are combined into a System.Data.DataTable and the values are aggregated into one master report. This aggregated report needs to be returned to the client system so it can be saved.
I currently have this site working just fine in ASP.NET using C#. However, I need the "guts" of the website to be a WebAPI so that automation programs I have can make calls directly to the WebAPI and accomplish the same task that the internal-only website does. This will allow all processing for this sort of task to run through one code base (right now, there are about 4 versions of this task and they all behave differently, providing differing output).
The way I thought to do this was, from the client, to convert the Excel files to an array of System.IO.MemoryStream objects, then serialize the full array as a Json.NET stream and upload that to the web server, where it would be deserialized back into an array of MemoryStream. Once that is done, I can iterate the array and process each Excel file from its MemoryStream.
My problem is that I can't figure out how to convert the MemoryStream[] into JSON and then deserialize it on the server.
Rather than trying to pass the Excel file around as JSON, let the user upload the file to the server and then process it from there.
In the JSON, rather than including the content of the file, put a link to the file.
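As an illustrative sketch of that suggestion (assuming ASP.NET Core and placeholder names; classic ASP.NET Web API would use MultipartFormDataStreamProvider instead), the client POSTs the Excel files as multipart/form-data and the server receives them as IFormFile instances:

using System.Collections.Generic;
using System.Data;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;

[ApiController]
[Route("api/[controller]")]
public class ReportsController : ControllerBase
{
    [HttpPost]
    public IActionResult Upload(List<IFormFile> files) // bound from the multipart form field named "files"
    {
        var combined = new DataTable();

        foreach (var file in files)
        {
            using (var stream = file.OpenReadStream())
            {
                // Parse the workbook with whatever server-side Excel library you use
                // (EPPlus, ClosedXML, etc.) and merge the rows into `combined`.
            }
        }

        // Build and return the aggregated report in whatever format the callers expect.
        return Ok();
    }
}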

Get recently uploaded file from FTP

Is there any way in C# to get the most recently uploaded file?
Whenever a new file is uploaded to the FTP server, a trigger should be raised indicating that this is the new file that was added.
I achieved this to a degree using FtpWebRequest and WinSCP (checking for new files whose last-modified date is within the last 5 minutes), but there is a use case where this fails.
Let's say a file was last modified on 01/01/2018 and I upload it to the FTP server today; going by its last-modified date it won't be processed.
Is there any way to check which file was uploaded recently?
You can only use the information that the FTP server provides you with, and it won't tell you what files were added. If you cannot use the file modification time, you are out of luck, except maybe if the server provides a file creation (not modification) timestamp; but I do not know of any major FTP server that does.
So all you can do is remember a list of files on the server and compare the current list against a previous one to find which files were added.
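A rough sketch of that listing-comparison approach using the built-in FtpWebRequest (host, credentials and the snapshot file location are placeholders):

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net;

class NewFileDetector
{
    static void Main()
    {
        const string snapshotPath = "known-files.txt"; // file names seen on a previous run

        var known = File.Exists(snapshotPath)
            ? new HashSet<string>(File.ReadAllLines(snapshotPath))
            : new HashSet<string>();

        var request = (FtpWebRequest)WebRequest.Create("ftp://example.com/upload/"); // placeholder
        request.Method = WebRequestMethods.Ftp.ListDirectory; // names only (NLST)
        request.Credentials = new NetworkCredential("user", "password");

        var current = new List<string>();
        using (var response = (FtpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
                current.Add(line);
        }

        // Anything in the current listing that was not in the previous one is "new".
        foreach (var name in current.Where(n => !known.Contains(n)))
            Console.WriteLine("New file: " + name);

        File.WriteAllLines(snapshotPath, current);
    }
}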

Temporary files without saving to HDD

Here is my case:
I'm using ABCPDF to generate an HTML document from a .DOCX file that I need to show on the web.
When you export to HTML from ABCPDF you get an HTML file and a folder with support files (.css, .js, .png).
Now, these HTML files may contain quite sensitive data, so immediately after generating the files I move them into a password-protected .zip file (from which I fetch them later).
The problem is that this leaves the files unencrypted on the HDD for a few seconds, and even longer if I'm (for some reason) unable to delete them at once.
I'd like suggestions for another way of doing this. I've looked into a RAM drive, but I'm not happy with installing such drivers on my servers (and the RAM drive would still be accessible from the OS).
Part of the problem here may be that ABCPDF can only export HTML as files (since it is multiple files) and not as a stream.
Any ideas?
I'm using .NET 4.6.x and C#.
Since all of your files except the .html are anonymous, you can use the suggested approach of writing the HTML to a stream; only the other support files will be stored on the file system.
http://www.websupergoo.com/helppdfnet/source/5-abcpdf/doc/1-methods/save.htm
When saving to a Stream the format can be indicated using a Doc.SaveOptions.FileExtension property such as ".htm" or ".xps". For HTML you must provide a sensible value for the Doc.SaveOptions.Folder property.
http://www.websupergoo.com/helppdfnet/source/5-abcpdf/xsaveoptions/2-properties/folder.htm
This property specifies the folder where to store additional data such as images and fonts. It is only used when exporting documents to HTML. It is ignored otherwise.
For a start, try using a simple MemoryStream to hold the sensitive data. If you get large files or high traffic, open an encrypted stream to a file on your system.
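Going by the documentation quoted above (so treat this as an unverified sketch against that API; the namespace and folder path are assumptions), saving the HTML itself to a MemoryStream while the support files go to a folder would look roughly like this:

using System.IO;
using WebSupergoo.ABCpdf10; // version-specific namespace; adjust to the ABCPDF version you have installed

class HtmlToStreamExample
{
    static MemoryStream ExportHtml(Doc doc)
    {
        var html = new MemoryStream();

        doc.SaveOptions.FileExtension = ".htm";      // tells Save() which format to write
        doc.SaveOptions.Folder = @"C:\temp\support"; // support files (images, css, fonts) still go here

        doc.Save(html); // the sensitive HTML itself never touches the disk
        return html;
    }
}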

How can I tell if a file on an FTP is identical to a local file without actually downloading the file?

I'm writing a simple program that is used to synchronize files to an FTP server. I want to be able to check whether the local version of a file is different from the remote version, so I can tell if the file(s) need to be transferred. I could check the file size, but that's not 100% reliable because it's obviously possible for two files to be the same size but contain different data. The date/time the files were modified is also not reliable, as the user's computer date could be set wrong.
Is there some other way to tell if a local file and a file on an FTP server are identical?
There isn't a generic way. If the FTP site includes a checksum file, you can download that (which will be a lot quicker since a checksum is quite small) and then see if the checksums match. But of course, this relies on the owner of the FTP site creating a checksum file and keeping it up to date.
Other than that, you are S.O.L.
If the server is plain-old FTP, you can't do any better than checking the size and timestamps.
FTP has no mechanism for giving you the hashes/checksums of files, so you would need to do something like keeping a special "listing file" that has all the file names and hashes, or doing a separate request via HTTP, or some other protocol.
Ideally, you should not be using FTP anyway; it's really an obsolete protocol. If you have control of the system, you could use rsync or something like it.
Use a checksum. You generate the MD5 (or SHA-1, SHA-2, etc.) hash of both files, and if the files are identical, the hashes will be identical.
IETF tried to achieve this by adding new FTP commands such as MD5 and MMD5.
http://www.faqs.org/rfcs/ftp-rfcs.html
However, not all FTP vendors support them, so you must check whether the target FTP server your application will work against supports MD5/MMD5. If not, you can pick one of the workarounds mentioned above.
Couldn't you use a FileSystemWatcher and just have the client remember what changed?
http://msdn.microsoft.com/en-us/library/system.io.filesystemwatcher.aspx
Whenever your client uploads files to the FTP server, map each file to its hash and store it locally on the client computer (or store it anywhere you can access later; the format doesn't matter, it can be an XML file or plain text, as long as you can retrieve the key/value pairs). Then when you upload files again, just check the local files against the hash table you created; if a file's hash is different, upload the file. This way you don't have to rely on the server to maintain a checksum file, and you don't have to have a process running to monitor FileSystemWatcher events.
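A minimal sketch of that bookkeeping (the folder, file names and storage format are all placeholders; the actual upload call is left to whatever FTP client you use):

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

class UploadTracker
{
    const string HashStore = "uploaded-hashes.txt"; // one "name|hash" pair per line; any format works

    static string Md5Of(string path)
    {
        using (var md5 = MD5.Create())
        using (var stream = File.OpenRead(path))
            return BitConverter.ToString(md5.ComputeHash(stream)).Replace("-", "");
    }

    static void Main()
    {
        var known = File.Exists(HashStore)
            ? File.ReadAllLines(HashStore)
                  .Select(l => l.Split('|'))
                  .ToDictionary(p => p[0], p => p[1])
            : new Dictionary<string, string>();

        foreach (var path in Directory.GetFiles("C:\\to-sync")) // placeholder local folder
        {
            string name = Path.GetFileName(path);
            string hash = Md5Of(path);

            string oldHash;
            if (!known.TryGetValue(name, out oldHash) || oldHash != hash)
            {
                // New or changed -- upload it with your FTP client of choice, then remember the hash.
                known[name] = hash;
            }
        }

        File.WriteAllLines(HashStore, known.Select(kv => kv.Key + "|" + kv.Value));
    }
}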
