In a project I am doing I want to give users the option of 'securely' deleting a file - as in, overwriting it with random bits or 0's. Is there an easy-ish way of doing this in C#.NET? And how effective would it be?
You could invoke Sysinternals SDelete to do this for you. It uses the defragmentation API to handle all those tricky edge cases:
Using the defragmentation API, SDelete can determine precisely which clusters on a disk are occupied by data belonging to compressed, sparse and encrypted files.
If you want to repackage that logic in a more convenient form, the API is described here.
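If shelling out to SDelete from C# is acceptable, here is a minimal sketch. It assumes sdelete.exe is on the PATH; the -p (passes) and -accepteula flags are taken from the SDelete documentation and should be verified against the version you ship:

```csharp
using System;
using System.Diagnostics;

static class SecureDeleteHelper
{
    // Hypothetical helper: runs SDelete against a single file.
    public static void SecureDeleteWithSDelete(string path, int passes = 3)
    {
        var psi = new ProcessStartInfo
        {
            FileName = "sdelete.exe",                            // assumed to be on PATH
            Arguments = $"-accepteula -p {passes} \"{path}\"",   // flags per SDelete docs
            UseShellExecute = false,
            CreateNoWindow = true
        };

        using var process = Process.Start(psi)
            ?? throw new InvalidOperationException("Failed to start sdelete.exe");
        process.WaitForExit();

        if (process.ExitCode != 0)
            throw new InvalidOperationException($"SDelete exited with code {process.ExitCode}");
    }
}
```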
You can't securely delete a file on a journaling filesystem. The only non-journaling filesystem still in heavy use is FAT32. On any other system, the only way to securely delete is to shred the entire hard drive.
EDIT
The reason secure delete doesn't work is that the data used to overwrite a file might not be stored in the same location as the data it is overwriting.
It seems Microsoft does provide a secure delete tool, but it does not appear to be something that you can use as a drop-in replacement.
The only good way to prevent deleted-file recovery, short of shredding the disk, would be to encrypt the file before it is written to disk.
It wouldn't be secure at all. Instead you may wish to look at alternative solutions like encryption.
One solution would be to encrypt the contents of the data file. A new key would be used each time the file is updated. When you want to "securely delete" the data simply "lose" the encryption key and delete the file. The file will still be on the disk physically but without the encryption key recovery would be impossible.
Here is a more detailed explanation of why "secure" overwriting of files is poor security:
Without a low-level tool (outside of the .NET runtime) you have no access to the physical disk location. Take a FileStream on NTFS: when you "open a file for write access", you have no guarantee that the "updated" copy (in this case the random-bits version) will be stored in the same place, thus overwriting the original file. In fact, most of the time this is what happens:
1) File x.dat is stored starting at cluster 8493489
2) You open file x.dat for write access. What is returned to you by the OS is merely a pointer to the file stream, abstracted not just by the OS but by the underlying file system and device drivers (hardware RAID, for example) and sometimes the physical disk itself (SSDs). You update the contents of the file with random 1s and 0s and close the FileStream.
3) The OS may (and likely will) write the new file to another cluster (say cluster 4384939). It will then merely update the MFT to indicate that file x is now stored at 4384939.
To the end user it looks like only one copy of the file exists, and it now has random data in it; however, the original data still exists on the disk.
Instead you should consider encrypting the contents of the file with a different key each time the file is saved. When the user wants the file "deleted", delete the key and the file. The physical file may remain, but without the encryption key recovery would be impossible.
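A minimal crypto-shredding sketch of that idea. The key here is stored in a plain file purely for illustration; in a real application you would protect it with DPAPI, a key vault, or similar:

```csharp
using System.IO;
using System.Security.Cryptography;

// The data file is always written encrypted with a fresh key; "secure delete"
// just destroys the key, after which the ciphertext on disk is useless.
class CryptoShredStore
{
    public static void SaveEncrypted(string dataPath, string keyPath, byte[] plaintext)
    {
        using var aes = Aes.Create();            // fresh random key + IV on every save
        File.WriteAllBytes(keyPath, aes.Key);    // illustrative only; protect this in practice

        using var fs = File.Create(dataPath);
        fs.Write(aes.IV, 0, aes.IV.Length);      // prepend the IV so the file can be decrypted later
        using var crypto = new CryptoStream(fs, aes.CreateEncryptor(), CryptoStreamMode.Write);
        crypto.Write(plaintext, 0, plaintext.Length);
    }

    public static void SecureDelete(string dataPath, string keyPath)
    {
        File.Delete(keyPath);    // losing the key is what makes the data unrecoverable
        File.Delete(dataPath);   // deleting the ciphertext is just housekeeping
    }
}
```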
Gutmann erasing implementation
I'd first try simply opening the file and overwriting its contents as I would normally do it; pretty trivial in C# (a minimal sketch follows below). However, I don't know how secure that would be. For one thing, I'm quite certain it would not work on flash drives and SSDs that use sophisticated algorithms to provide wear leveling. I don't know what would work there; perhaps it would need to be done at the driver level, perhaps it would be impossible altogether. On normal drives I just don't know what Windows would do. Perhaps it would retain old data as well.
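For reference, the "trivial" overwrite might look like the sketch below (RandomNumberGenerator.Fill requires .NET Core 2.1 or later); all the caveats above about SSDs, wear leveling and the filesystem writing elsewhere still apply:

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

static class NaiveOverwrite
{
    // Naive in-place overwrite: fill the file with random bytes, flush, then delete.
    public static void OverwriteAndDelete(string path)
    {
        long remaining = new FileInfo(path).Length;
        var buffer = new byte[64 * 1024];

        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Write, FileShare.None))
        {
            while (remaining > 0)
            {
                RandomNumberGenerator.Fill(buffer);
                int count = (int)Math.Min(buffer.Length, remaining);
                fs.Write(buffer, 0, count);
                remaining -= count;
            }
            fs.Flush(true);   // request a flush to the physical device
        }

        File.Delete(path);
    }
}
```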
Related
I would like to take a serialized file and save it to the resources folder in my project.
My reason for doing this (maybe there's a better way) is that I have a published exe (a single executable file) for the program, and when it creates a serialized file I don't want it to save it to the desktop. I need to somehow save it to my exe without going outside of it.
Any advice on how I could do this?
It's very ugly... but you could use an "alternate data stream" on an NTFS system.
http://ntfs.com/ntfs-multiple.htm
https://learn.microsoft.com/en-us/sysinternals/downloads/streams
How to read and modify NTFS Alternate Data Streams using .NET
https://blogs.msmvps.com/bsonnino/2016/11/24/alternate-data-streams-in-c/
https://oddvar.moe/2018/04/11/putting-data-in-alternate-data-streams-and-how-to-execute-it-part-2/
https://blog.foldersecurityviewer.com/ntfs-alternate-data-streams-the-good-and-the-bad/
https://www.irongeek.com/i.php?page=security/altds
You'll probably have security scanners stopping you from doing it.
In addition, if you copy the file from an NTFS volume to, say, FAT, the alternate data streams are lost.
Also some backup software may not backup ADS properly.
https://wiki.sep.de/wiki/index.php/Support_for_NTFS_alternate_data_streams_(ADS)_for_Windows
https://www.2brightsparks.com/resources/articles/ntfs-alternate-data-stream-ads.html
https://community.osr.com/discussion/89308/alternate-data-streams-and-backups
https://social.technet.microsoft.com/Forums/Azure/en-US/007d5442-1cd8-4293-b717-b8fa72606189/ntfs-data-streams-broken-by-design-on-file-copy?forum=winserverfiles
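For what it's worth, here is a minimal C# sketch of writing and reading an alternate data stream. The assumption is that modern .NET (Core 3.x / .NET 5+) lets FileStream open "file:streamname" paths directly; classic .NET Framework rejects colons in paths, in which case you would have to P/Invoke CreateFileW instead:

```csharp
using System.IO;
using System.Text;

// Hypothetical stream name; the data rides along with the host file on NTFS only.
class AdsExample
{
    public static void WriteStream(string filePath, string streamName, string text)
    {
        using var fs = new FileStream($"{filePath}:{streamName}",
                                      FileMode.Create, FileAccess.Write);
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        fs.Write(bytes, 0, bytes.Length);
    }

    public static string ReadStream(string filePath, string streamName)
    {
        using var fs = new FileStream($"{filePath}:{streamName}",
                                      FileMode.Open, FileAccess.Read);
        using var reader = new StreamReader(fs, Encoding.UTF8);
        return reader.ReadToEnd();
    }
}
```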
I need to be able to uniquely identify a file.
The file would be uploaded to a web application and could be uploaded from anywhere. The file could be mailed to someone, renamed, edited and then uploaded from a different machine altogether, and I need a way of identifying that this was the file that was originally uploaded. I need a reliable way of identifying a file across different file systems.
I cannot use the filename as the identifier as the file could be renamed. I still need to be able to uniquely identify the file even though it was renamed.
I cannot use the hash of the file, as the hash would change if the file was edited.
I understand Linux has an inode number property and Windows has the IndexNumber. I can use NtQueryInformationFile to get the IndexNumber. The IndexNumber was the same when the file was edited and when the file was renamed, but the IndexNumber was different when the file was moved from one folder to another.
From all the reading I have done, it seems like the IndexNumber is not reliable for all documents. I almost have a feeling that there is no unique identifier for a file that would be constant across different folders and machines and that would remain unchanged when edited, renamed, etc. But here I am, Stack Overflow. Any help is appreciated.
Edit: Here is the business problem I am trying to solve. A user uploads a file to our web application and then inputs a bunch of metadata for the file, similar to adding tags to the file. We keep the file in blob storage, but the user still has his local copy, which he mails to another user. He may edit and rename the file before mailing it. When the other user uploads the file to our web application, is there any way we can identify that this was the original file, so as to pre-populate the metadata that the original user had entered?
The simple answer to your question is that it can't be done.
Let me summarize what you're trying to accomplish.
If I have a file on my office computer, and upload that to your web application, you want to store that into your system as a new file. Then, if I copy the file from my office computer to my home computer, edit the file contents, rename the file, and then upload it into your web application, you want to identify that this is the same file as the one I previously uploaded.
It can't be done.
Not with a 100% guarantee that you can identify this.
When you are uploading files to a web application, what is sent is this:
The name of the file
The length of the file
The contents of the file
Things such as alternate data streams (NTFS), from the other answer here, or inodes and similar identifiers, from the comments, are not sent. Your web application will not see them, nor would these things carry over "across multiple computers".
So bottom line, this is impossible.
Your options are:
Let the user uniquely pick the file they want to overwrite, meaning that the user could pick unrelated files and thus be "wrong"
Work out a reasonable chance that you identified the right file, accepting the chance that you identified incorrectly
Embed a unique ID into the file itself; however, since the file contents can be edited (and the ID can be changed), this is not guaranteed
... other options that don't have a 100% guarantee of being right
The first option is of course the easiest.
The second option could use systems such as what git does when it tracks renames, but even this will fail depending on how much the file was edited between the uploads. Git fails in this respect too, except that "failure" here simply means it doesn't show you the full history of a file; it doesn't break down and become unusable.
The third option might work if the file is edited by a program similar to Word, Excel, Photoshop, etc. You could embed the ID and just make sure that program doesn't change it. It would probably have a higher, and acceptable, chance of being right, but the ID might still be edited out.
So you will have to decide what would be acceptable to you, but you cannot create a system in which you are guaranteed to identify the file, even if it was renamed and the contents changed. Because at that point you have no guarantee that the user is simply trying to upload a different file altogether.
On Windows with the NTFS file system you could use alternate data streams (NTFS streams):
http://ntfs.com/ntfs-multiple.htm
stream: A sequence of bytes written to a file on the target file system. Every file stored on a volume that uses the file system contains at least one stream, which is normally used to store the primary contents of the file. Additional streams within the file can be used to store file attributes, application parameters, or other information specific to that file. Every file has a default data stream, which is unnamed by default. That data stream, and any other data stream associated with a file, can optionally be named.
https://learn.microsoft.com/en-us/openspecs/windows_protocols/ms-fscc/8ac44452-328c-4d7b-a784-d72afd19bd9f
There is not a lot of official documentation, however. But you can inject a GUID there to be able to track the file.
One limitation of this solution is that it only works on the NTFS filesystem; when the file is copied to, e.g., a FAT file system, the information is lost.
You need to access native Win32 APIs, however. Check, for example, this SO answer or the blog below (a sketch follows the links):
https://stackoverflow.com/a/605167/4122889
Or this random blog:
https://blogs.msmvps.com/bsonnino/2016/11/24/alternate-data-streams-in-c/
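To make the GUID idea concrete, here is a hedged sketch that goes through CreateFileW so it also works on .NET Framework. The stream name ":FileId" is an arbitrary choice, and as noted above the tag only survives on NTFS:

```csharp
using System;
using System.ComponentModel;
using System.IO;
using System.Runtime.InteropServices;
using System.Text;
using Microsoft.Win32.SafeHandles;

static class NtfsFileTag
{
    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    private static extern SafeFileHandle CreateFileW(
        string lpFileName, uint dwDesiredAccess, uint dwShareMode,
        IntPtr lpSecurityAttributes, uint dwCreationDisposition,
        uint dwFlagsAndAttributes, IntPtr hTemplateFile);

    private const uint GENERIC_READ = 0x80000000;
    private const uint GENERIC_WRITE = 0x40000000;
    private const uint CREATE_ALWAYS = 2;
    private const uint OPEN_EXISTING = 3;

    // Writes a GUID into the ":FileId" alternate data stream of the file.
    public static void WriteTag(string filePath, Guid id)
    {
        using var handle = CreateFileW($"{filePath}:FileId", GENERIC_WRITE, 0,
            IntPtr.Zero, CREATE_ALWAYS, 0, IntPtr.Zero);
        if (handle.IsInvalid)
            throw new Win32Exception(Marshal.GetLastWin32Error());

        using var fs = new FileStream(handle, FileAccess.Write);
        byte[] bytes = Encoding.ASCII.GetBytes(id.ToString("N"));
        fs.Write(bytes, 0, bytes.Length);
    }

    // Returns the stored GUID, or null if the file has no tag.
    public static Guid? ReadTag(string filePath)
    {
        using var handle = CreateFileW($"{filePath}:FileId", GENERIC_READ, 0,
            IntPtr.Zero, OPEN_EXISTING, 0, IntPtr.Zero);
        if (handle.IsInvalid)
            return null;

        using var reader = new StreamReader(new FileStream(handle, FileAccess.Read));
        return Guid.TryParse(reader.ReadToEnd(), out var id) ? id : (Guid?)null;
    }
}
```

Keep in mind the caveat from the other answer: a browser upload only sends the default data stream, so the tag helps with local copies but will not survive a round trip through the web application.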
As part of our installer build, we have to zip thousands of large data files into about ten or twenty 'packages' with a few hundred (or even thousands) of files in each, all of which depend on being kept with the other files in the package. (They are versioned together, if you will.)
Then during the actual install, the user selects which packages they want included on their system. This also lets them download updates to the packages from our site as one large, versioned file rather than asking them to download thousands of individual ones which could also lead to them being out of sync with others in the same package.
Since these are data files, some of them change regularly during the design and coding stages, meaning we then have to re-compress all files in that particular zip package, even if only one file has changed. This makes the packaging step of our installer build take well over an hour each time, with most of that going to re-compressing things that we haven't touched.
We've looked into leaving the zip packages alone and then replacing specific files inside them, but inserting and removing large files from the middle of a zip doesn't give us that much of a performance boost. (A little, but not enough that it's worth it.)
I'm wondering if it's possible to pre-process files down into a cached raw 'compressed state' that matches how they would be written to the zip package, but containing only the data itself, not the zip header info, etc.
My thinking is that, if that is possible, during our build step we would first look for any data file that doesn't have a compressed cache associated with it, compress that file, and write the result to the cache.
Next we would simply append all of the caches together in a file stream, adding any appropriate zip header needed for the files.
This would mean we are still recreating the entire zip during each build, but we are only recompressing data that has changed. The rest would just be written as-is which is very fast since it is a straight write-to-disk. And if a data file changes, its cache is destroyed, so next build-pass it would be recreated.
However, I'm not sure such a thing is possible. Is it, and if so, is there any documentation to show how one would go about attempting this?
Yes, that's possible. The most straightforward approach would be to zip each file individually into its own associated zip archive with one entry. When any file is modified, you replace its associated zip file to keep all of those up to date. Then you can write a simple program to take a set of those single entry zip files and merge them into a single zip file. You will need to refer to the documentation in the PKZip appnote. Take a look at that.
Now that you've read the appnote, what you need to do is take the local header, data, and central header from each individual zip file, write the local header and data as-is sequentially to the new zip file, and save the central header and the offsets of the local headers in the new file. Then, at the end of the new file, save the current offset, write a new central directory using the central headers you saved (updating the offsets appropriately), and end with a new end-of-central-directory record containing the offset of the start of the central directory.
Update:
I decided this was a useful enough thing to write. You can get it here.
You could zip each file beforehand, and then "zip" them together with no compression at the end to quickly aggregate them into a distributable package (see the sketch below). It won't be as efficient as compressing all the data at once, but should be faster to make modifications.
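A sketch of that approach with System.IO.Compression (the ".zcache" file paths and naming are made up for the example). Note the result is a zip-of-zips, so the installer has to unwrap two layers; a true single-layer merge needs the appnote-style approach from the other answer:

```csharp
using System.IO;
using System.IO.Compression;

static class PackageBuilder
{
    // Compress one data file into its cached single-entry zip, but only if the
    // cache is missing or older than the data file.
    public static void UpdateCache(string dataFile, string cacheFile)
    {
        if (File.Exists(cacheFile) &&
            File.GetLastWriteTimeUtc(cacheFile) >= File.GetLastWriteTimeUtc(dataFile))
            return;

        File.Delete(cacheFile);   // ZipArchiveMode.Create expects a fresh file
        using var zip = ZipFile.Open(cacheFile, ZipArchiveMode.Create);
        zip.CreateEntryFromFile(dataFile, Path.GetFileName(dataFile),
                                CompressionLevel.Optimal);
    }

    // Aggregate the caches into one package; storing them uncompressed makes
    // this step essentially a straight copy to disk.
    public static void BuildPackage(string packageFile, string[] cacheFiles)
    {
        File.Delete(packageFile);
        using var package = ZipFile.Open(packageFile, ZipArchiveMode.Create);
        foreach (var cache in cacheFiles)
            package.CreateEntryFromFile(cache, Path.GetFileName(cache),
                                        CompressionLevel.NoCompression);
    }
}
```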
I cannot seem to locate an actual exe that implements this type of functionality. It appears that most existing tools I've tried that have the ability to merge/update will reprocess (compress) the data stream, as you have already stated you saw.
However, it seems what you describe can be done if you or someone else wants to write it. If you take a look at this link to the ZIP file format specification, you can get an overview of the structure you would have to parse out and process. It looks like you can pretty quickly go from file to file, gathering up the entries of interest and discarding the rest, then merging in your new/updated files. You would still need to rebuild a new central directory (refer to section 4.3.6 of the above linked document) within your new destination archive.
After a little more digging, the DotNetZip Library forum has a message asking about the same type of functionality which also gives a description just like I described above. It also links to this document which seems to indicate that support for that may be added to the DotNetZip library for you to further experiment with.
I want to save some small pieces of information that all change every second. But the problem is where to save them.
I tried saving them in the application settings and in an XML file. But when the application quits unexpectedly, all the data gets corrupted because it isn't saved. The same issue occurs when the electricity goes out, which is a common problem in my country; this also corrupts the stored information.
I am considering saving them in a database, but it's a very small amount of information and I don't think a whole database is warranted for it.
Use SQLite as a local storage database. You can save the data using transactions; using transactions helps you avoid the data-corruption problem.
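A minimal sketch using the Microsoft.Data.Sqlite NuGet package (System.Data.SQLite works much the same way); the table and column names are made up for the example:

```csharp
using Microsoft.Data.Sqlite;

static class AppState
{
    // Writes both values inside one transaction, so a crash or power loss
    // leaves either the old or the new values on disk, never a torn write.
    public static void Save(double value1, double value2)
    {
        using var connection = new SqliteConnection("Data Source=appstate.db");
        connection.Open();

        using var transaction = connection.BeginTransaction();

        var create = connection.CreateCommand();
        create.Transaction = transaction;
        create.CommandText =
            "CREATE TABLE IF NOT EXISTS state (name TEXT PRIMARY KEY, value REAL)";
        create.ExecuteNonQuery();

        var upsert = connection.CreateCommand();
        upsert.Transaction = transaction;
        upsert.CommandText =
            "INSERT OR REPLACE INTO state (name, value) VALUES ('value1', $v1), ('value2', $v2)";
        upsert.Parameters.AddWithValue("$v1", value1);
        upsert.Parameters.AddWithValue("$v2", value2);
        upsert.ExecuteNonQuery();

        transaction.Commit();   // SQLite's journal makes the commit atomic
    }
}
```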
You can also use SQL Express. It is integrated into the VS environment, so it's very easy to set up in your project. It is simply a file-based version of SQL Server you can add to your project and then design through Visual Studio. Step-by-step instructions are here: http://msdn.microsoft.com/en-us/library/ms233763(v=vs.80).aspx
Since you mention in a comment that you're only storing 2 variables, a database is probably overkill. Instead, create a class that will round-robin write to two separate files, always picking the older of the two files to write to, as long as the newer one is valid. That way, if one of the files gets corrupted, you can revert to the older "version" based on the file-modified timestamp. You can pick your favorite file format and verify when loading that the file is not corrupt. Your read/write logic could look like the following (a sketch follows the list):
Reads:
Read the newer file.
If corrupt, read the older file.
Writes:
Read the newer file. If corrupt, overwrite it. This is to make sure you don't overwrite the old one when the new one was already corrupt, ending with two corrupt files.
If not corrupt, overwrite the older file. This makes sure if you corrupt the older file, you still have the newer file's values.
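Here is a sketch of that scheme, under the assumption that a 32-byte SHA-256 checksum prefix is used to detect a torn or partial write (file names and format are arbitrary; requires C# 8 and a recent .NET):

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;

class TwoFileStore
{
    private readonly string _fileA, _fileB;

    public TwoFileStore(string fileA = "state.a", string fileB = "state.b")
    {
        _fileA = fileA;
        _fileB = fileB;
    }

    public void Write(string payload)
    {
        // Overwrite the newer file if it is already corrupt, otherwise the older one.
        string target = IsValid(Newer()) ? Older() : Newer();

        byte[] data = Encoding.UTF8.GetBytes(payload);
        byte[] hash = Hash(data);

        using var fs = File.Create(target);
        fs.Write(hash, 0, hash.Length);   // 32-byte checksum header
        fs.Write(data, 0, data.Length);
        fs.Flush(true);                   // push the bytes to the physical device
    }

    public string Read()
    {
        // Prefer the newer file, fall back to the older one if it is corrupt.
        foreach (var file in new[] { Newer(), Older() })
            if (TryRead(file, out var payload))
                return payload;
        return null;   // both missing or corrupt
    }

    private string Newer() =>
        File.GetLastWriteTimeUtc(_fileA) >= File.GetLastWriteTimeUtc(_fileB) ? _fileA : _fileB;

    private string Older() => Newer() == _fileA ? _fileB : _fileA;

    private bool IsValid(string file) => TryRead(file, out _);

    private static bool TryRead(string file, out string payload)
    {
        payload = null;
        if (!File.Exists(file)) return false;

        byte[] bytes = File.ReadAllBytes(file);
        if (bytes.Length < 32) return false;

        byte[] data = bytes[32..];
        if (!Hash(data).AsSpan().SequenceEqual(bytes.AsSpan(0, 32))) return false;

        payload = Encoding.UTF8.GetString(data);
        return true;
    }

    private static byte[] Hash(byte[] data)
    {
        using var sha = SHA256.Create();
        return sha.ComputeHash(data);
    }
}
```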
I'm writing a simple program that is used to synchronize files to an FTP. I want to be able to check if the local version of a file is different from the remote version, so I can tell if the file(s) need to be transfered. I could check the file size, but that's not 100% reliable because obviously it's possible for two files to be the same size but contain different data. The date/time the files were modified is also not reliable as the user's computer date could be set wrong.
Is there some other way to tell if a local file and a file on an FTP are identical?
There isn't a generic way. If the ftp site includes a checksum file, you can download that (which will be a lot quicker since a checksum is quite small) and then see if the checksums match. But of course, this relies on the owner of the ftp site creating a checksum file and keeping it up to date.
Other than that, you are S.O.L.
If the server is plain-old FTP, you can't do any better than checking the size and timestamps.
FTP has no mechanism for giving you the hashes/checksums of files, so you would need to do something like keeping a special "listing file" that has all the file names and hashes, or doing a separate request via HTTP, or some other protocol.
Ideally, you should not be using FTP anyway, it's really an obsolete protocol. If you have control of the system, you could use rsync or something like it.
Use a checksum. You generate the md5 (or sha1, sha2 etc) hash of both files, and if the files are identical, then the hashes will be identical.
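Hashing the local file is straightforward; the harder part, as the other answers note, is obtaining the corresponding hash for the remote file over plain FTP (a checksum file, a side channel over HTTP, etc.). A minimal local sketch:

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

static class FileHasher
{
    // Streams the file through SHA-256 and returns a hex string for easy comparison.
    public static string HashFile(string path)
    {
        using var sha = SHA256.Create();
        using var stream = File.OpenRead(path);
        return BitConverter.ToString(sha.ComputeHash(stream)).Replace("-", "");
    }
}
```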
IETF tried to achieve this by adding new FTP commands such as MD5 and MMD5.
http://www.faqs.org/rfcs/ftp-rfcs.html
However, not all FTP servers support them, so you must check the target FTP server your application will work against to see if it supports MD5/MMD5. If not, you can pick one of the workarounds mentioned above.
Couldn't you use a FileSystemWatcher and just have the client remember what changed?
http://msdn.microsoft.com/en-us/library/system.io.filesystemwatcher.aspx
Whenever your client uploads files to the FTP server, map each file to its hash and store it locally on the client computer (or store it anywhere you can access later; the format doesn't matter, it can be an XML file or plain text, as long as you can retrieve the key/value pairs). Then when you upload files again, just check the local files against the hash table you created; if a file is different, upload it. This way you don't have to rely on the server to maintain a checksum file and you don't have to have a process running to monitor FileSystemWatcher events.
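A sketch of that manifest idea. The JSON file name and the use of System.Text.Json are arbitrary choices (any key/value persistence would do, as noted above):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;
using System.Text.Json;

// Remembers the hash of each file as it was last uploaded and reports which
// files have changed since then.
class UploadManifest
{
    private readonly string _manifestPath;
    private readonly Dictionary<string, string> _hashes;

    public UploadManifest(string manifestPath = "upload-manifest.json")
    {
        _manifestPath = manifestPath;
        _hashes = File.Exists(manifestPath)
            ? JsonSerializer.Deserialize<Dictionary<string, string>>(File.ReadAllText(manifestPath))
              ?? new Dictionary<string, string>()
            : new Dictionary<string, string>();
    }

    public bool NeedsUpload(string filePath) =>
        !_hashes.TryGetValue(filePath, out var previous) || previous != HashFile(filePath);

    public void MarkUploaded(string filePath)
    {
        _hashes[filePath] = HashFile(filePath);
        File.WriteAllText(_manifestPath, JsonSerializer.Serialize(_hashes));
    }

    private static string HashFile(string path)
    {
        using var md5 = MD5.Create();
        using var stream = File.OpenRead(path);
        return BitConverter.ToString(md5.ComputeHash(stream)).Replace("-", "");
    }
}
```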