How can I determine when a file was most recently renamed? - c#

I have a program that compares files in two folders. I want to detect if a file has been renamed, determine the newest file (most recently renamed), and update the name on the old file to match.
To accomplish this, I would check to see if the newest file is bit by bit identical to the old one, and if it is, simply rename the old file to match the new one.
The problem is, I have nothing to key on to tell me which file was most recently renamed.
I would love some property like FileInfo.LastModified, but for files that have been renamed.
I've already looked at solutions like FileSystemWatcher, and that is not really what I'm looking for. I would like to be able to run my synchronizer whenever I want, without having to worry about some dedicated process tracking a folder's state.
Any ideas?

A: At least on NTFS, you can attach alternate data streams to a file.
On your first sync, you can just attach a GUID in an ADS to the source files to tag them.
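A minimal sketch of that tagging, assuming an NTFS volume and a .NET (Core)/5+ runtime whose File APIs pass the path:streamName syntax straight through to the OS (older .NET Framework rejects the colon, so a P/Invoke to CreateFile would be needed there); the stream name "syncId" is just an example:

```csharp
using System;
using System.IO;

// Tag a file with a GUID kept in an NTFS alternate data stream.
static class SyncTag
{
    const string StreamName = "syncId";   // example stream name

    public static void Tag(string path, Guid id) =>
        File.WriteAllText(path + ":" + StreamName, id.ToString("N"));

    public static Guid? ReadTag(string path)
    {
        try
        {
            return Guid.Parse(File.ReadAllText(path + ":" + StreamName));
        }
        catch (FileNotFoundException)
        {
            return null;   // the file was never tagged
        }
    }
}
```

On the first sync you would Tag() both sides with the same GUID; on later runs, files that share a GUID but differ in name are rename candidates.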
B: If you don't have write access to the source, store hashes of the files you synced in your target repository. When the source changes, you only have to hash the source files and only compare bit-by-bit if the hashes collide. Depending on the quality and speed of your hash function, this will save you a lot of time.
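For example, a hedged sketch of the hash-first comparison (SHA-256 here, but any reasonably strong hash will do; the paths are placeholders):

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hash-first comparison: if the hashes differ, the files differ; if they match,
// confirm with a full byte-by-byte comparison before renaming anything.
Console.WriteLine(HashFile(@"C:\folderA\report.docx") == HashFile(@"C:\folderB\report.docx")
    ? "contents probably match (confirm byte-by-byte before renaming)"
    : "contents differ");

static string HashFile(string path)
{
    using var sha = SHA256.Create();
    using var stream = File.OpenRead(path);
    return BitConverter.ToString(sha.ComputeHash(stream));
}
```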

If you are running on an NTFS drive, you can enable the change journal, which you can then query for things like rename events. However, you need to be an admin to enable it in the first place, and it will use disk space. Unfortunately I don't know of any specific C# implementations for reading the journal.

You could possibly create a config file that holds a list of all expected names within the folder, and then, if a file in the folder is not a member of the expected list of names, determine that the file has then been renamed. This would, however, add another layer of work considering you'd have to change the list every time you wish to add a new file to the folder.
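A rough sketch of that idea, with expected.txt as an assumed name for the config file and the folder path as a placeholder:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

var folder = @"C:\sync\source";                       // placeholder
var configPath = Path.Combine(folder, "expected.txt");

// Load the previously recorded names (empty set on the first run).
var expected = File.Exists(configPath)
    ? new HashSet<string>(File.ReadAllLines(configPath), StringComparer.OrdinalIgnoreCase)
    : new HashSet<string>(StringComparer.OrdinalIgnoreCase);

// Anything present now but not on the list is a rename (or add) candidate.
var current = Directory.GetFiles(folder)
    .Select(Path.GetFileName)
    .Where(name => name != "expected.txt")
    .ToList();

foreach (var name in current.Where(n => !expected.Contains(n)))
    Console.WriteLine($"Not on the expected list: {name}");

// After handling the changes, rewrite the list so new files become "expected".
File.WriteAllLines(configPath, current);
```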

Filesystems generally do not track this.
Since you seem to be on Windows, you can use GetFileInformationByHandle(). (Sorry, I don't know the C# equivalent.) You can use the "file index" fields in the struct returned to see if files have the same index as something you've seen before. Keep in mind that hardlinks will also have the same index.
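For what it's worth, a P/Invoke sketch of reading that file index from C# (the struct mirrors BY_HANDLE_FILE_INFORMATION from the Windows headers; the helper name is made up):

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;

static class FileIdentity
{
    [StructLayout(LayoutKind.Sequential)]
    struct BY_HANDLE_FILE_INFORMATION
    {
        public uint FileAttributes;
        public System.Runtime.InteropServices.ComTypes.FILETIME CreationTime;
        public System.Runtime.InteropServices.ComTypes.FILETIME LastAccessTime;
        public System.Runtime.InteropServices.ComTypes.FILETIME LastWriteTime;
        public uint VolumeSerialNumber;
        public uint FileSizeHigh;
        public uint FileSizeLow;
        public uint NumberOfLinks;
        public uint FileIndexHigh;
        public uint FileIndexLow;
    }

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool GetFileInformationByHandle(SafeFileHandle file, out BY_HANDLE_FILE_INFORMATION info);

    // Volume serial number + file index together identify a file on one machine,
    // and they survive renames (hard links to the same file share the same index).
    public static (uint Volume, ulong Index) GetId(string path)
    {
        using var stream = File.Open(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite);
        if (!GetFileInformationByHandle(stream.SafeFileHandle, out var info))
            throw new System.ComponentModel.Win32Exception(Marshal.GetLastWin32Error());
        return (info.VolumeSerialNumber, ((ulong)info.FileIndexHigh << 32) | info.FileIndexLow);
    }
}
```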
Alternatively you could hash file contents somehow.
I don't know precisely what you're trying to do, so I can't tell you whether either of these points makes sense. It could be that the most reasonable answer is, "no, you can't do that."

I would compute a CRC (e.g. CRC example) of (all?) the files in the 2 directories, storing the last update time together with the CRC value, file name, etc. After that, iterate through the lists finding matches by CRC and then use the date values to decide what to do.
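A hedged sketch of that approach, using MD5 as the checksum since the base framework has no built-in CRC32 (any checksum routine could be substituted; the directory paths are placeholders):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

// Index both directories by a content checksum, then compare timestamps
// for files whose contents match.
static Dictionary<string, List<FileInfo>> IndexByChecksum(string dir)
{
    using var md5 = MD5.Create();
    var index = new Dictionary<string, List<FileInfo>>();
    foreach (var file in new DirectoryInfo(dir).GetFiles())
    {
        using var stream = file.OpenRead();
        var key = BitConverter.ToString(md5.ComputeHash(stream));
        if (!index.TryGetValue(key, out var list)) index[key] = list = new List<FileInfo>();
        list.Add(file);
    }
    return index;
}

var left  = IndexByChecksum(@"C:\folderA");
var right = IndexByChecksum(@"C:\folderB");

foreach (var (checksum, filesA) in left)
{
    if (!right.TryGetValue(checksum, out var filesB)) continue;
    // Same content on both sides: use the dates (or names) to decide which
    // side carries the newer name and rename the other to match.
    var newest = filesA.Concat(filesB).OrderByDescending(f => f.LastWriteTimeUtc).First();
    Console.WriteLine($"Match {checksum}: newest candidate is {newest.FullName}");
}
```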

Related

What would be the most effective data structure for storing and comparing directories in C#?

So I am trying to develop an application in C# right now (for practice), a simple file synchronization desktop program where the user can choose a folder to monitor, then whenever a change occurs in said directory, it is copied to another directory.
I'm still in school and just finished my data structures course, so I'm still a bit new to this. But what I was thinking is that the best solution would be a tree, right? Then I could use breadth-first search to compare, and if a node doesn't match I would copy the node from the original tree to the duplicate tree. However, that seems like it might be inefficient, because I would be searching the entire tree every time.
Possibly considering a linked list too. I really don't know where to go with this. What I've got accomplished so far is the directory monitoring, so I can save to a log file every time something is changed. So that's good. But I feel like this is the toughest part. Can anyone offer any guidance?
Use a hash table (e.g., Dictionary<string, FileInfo>). One of the properties of a FileInfo is the absolute path to the file: use that as the key.
Hash table lookups are cheap (and fast).
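For example, a small sketch of the path-keyed dictionary (the folder paths and the naive path mapping are placeholders):

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Index the monitored tree by absolute path; lookups are then O(1) on average.
var index = new Dictionary<string, FileInfo>(StringComparer.OrdinalIgnoreCase);
foreach (var path in Directory.EnumerateFiles(@"C:\watched", "*", SearchOption.AllDirectories))
    index[path] = new FileInfo(path);

// While scanning the copy, map each path back to the original and look it up.
foreach (var copy in Directory.EnumerateFiles(@"C:\mirror", "*", SearchOption.AllDirectories))
{
    var original = Path.Combine(@"C:\watched", Path.GetRelativePath(@"C:\mirror", copy));
    if (!index.TryGetValue(original, out var info))
        Console.WriteLine($"{copy} no longer exists in the watched folder");
}
```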

File.Delete or File.Encrypt to wipe files?

Is it possible to use either File.Delete or File.Encrypt to shred files? Or do both functions not overwrite the actual content on disk?
And if they do, does this also work with the wear leveling of SSDs and similar techniques used by other storage? Or is there another function that I should use instead?
I'm trying to improve an open source project which currently stores credentials in plaintext within a file. For reasons I don't fully understand, the credentials are always written to that file (I don't know why Ansible does this, and for now I don't want to touch that part of the code; there may be a valid reason why it is that way), so the best I can do is delete that file afterwards. So is using File.Delete or File.Encrypt the right approach to purge that information from the disk?
Edit: If it is only possible using native API and pinvoke, I'm also fine with that. I'm not limited to only .net, but to C#.
Edit2: To provide some context: The plaintext credentials are saved by the ansible internals as they are passed as a variable for the modules that get executed on the target windows host. This file is responsible for retrieving the variables again: https://github.com/ansible/ansible/blob/devel/lib/ansible/module_utils/powershell/Ansible.ModuleUtils.Legacy.psm1#L287
https://github.com/ansible/ansible/blob/devel/lib/ansible/module_utils/csharp/Ansible.Basic.cs#L373
There's a possibility that File.Encrypt would do more to help shred data than File.Delete (which definitely does nothing in that regard), but it won't be a reliable approach.
There's a lot going on at both the Operating System and Hardware level that's a couple of abstraction layers separated from the .NET code. For example, your file system may randomly decide to move the location where it's storing your file physically on the disk, so overwriting the place where you currently think the file is might not actually remove traces from where the file was stored previously. Even if you succeed in overwriting the right parts of the file, there's often residual signal on the disk itself that could be picked up by someone with the right equipment. Some file systems don't truly overwrite anything: they just add information every time a change happens, so you can always find out what the disk's contents were at any given point in time.
So if you legitimately cannot prevent a file getting saved, any attempt to truly erase it is going to be imperfect. If you're willing to accept imperfection and only want to mitigate the potential for problems somewhat, you can use a strategy like the ones you've found to try to overwrite the file with garbage data several times and hope for the best.
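If you do go that route, a hedged sketch of a best-effort overwrite-then-delete helper might look like this (the helper name, path, and pass count are placeholders, and all the caveats above still apply):

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Best-effort shredding: overwrite the file contents in place a few times, then delete it.
// The file system and the hardware may still keep copies elsewhere, so treat this
// as mitigation, not a guarantee.
BestEffortShred(@"C:\temp\credentials.json", passes: 3);

static void BestEffortShred(string path, int passes)
{
    var length = new FileInfo(path).Length;
    var noise = new byte[4096];

    using (var rng = RandomNumberGenerator.Create())
    using (var stream = new FileStream(path, FileMode.Open, FileAccess.Write, FileShare.None))
    {
        for (var pass = 0; pass < passes; pass++)
        {
            stream.Position = 0;
            for (long written = 0; written < length; written += noise.Length)
            {
                rng.GetBytes(noise);
                stream.Write(noise, 0, (int)Math.Min(noise.Length, length - written));
            }
            stream.Flush(true);   // push the data out of .NET's buffers to the OS/disk
        }
    }

    File.Delete(path);
}
```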
But I wouldn't be too quick to give up on solving the problem at its source. For example, Ansible's docs mention:
A great alternative to the password lookup plugin, if you don’t need to generate random passwords on a per-host basis, would be to use Vault in playbooks. Read the documentation there and consider using it first, it will be more desirable for most applications.

c# Create a unique name with a GUID

I am creating a back up solution. I doubt there is anything new in what I'm trying to achieve.
Before copying the file I want to take a backup of the destination file in case anything becomes corrupt. This means renaming the file.
I have to be careful when renaming in case a file with that name already exists; simply adding a 01 to the end is not safe.
My question, based upon not finding the answer elsewhere, is: would adding a GUID to the file name work? So, if my file was called file01.txt, I would rename it to file01.txtGUID (where GUID is the generated GUID), then perform my backup of that file (at this instant having 2 backups), and then, after ensuring the file has copied (by comparing its length to the source), delete the file with the GUID in the name.
I know the GUID is not 100% guaranteed to be unique but would this suffice?
Just get a GUID, then ask the destination OS if name+GUID exists. If it does, then pick a new GUID and try again. You are going to delete the name+GUID file anyway, so who cares if you can't pick a unique filename on the first try.
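A small sketch of that loop (the naming scheme and path are placeholders):

```csharp
using System;
using System.IO;

// Pick a GUID-based name, check for an existing file, retry if needed.
// In practice the first candidate will essentially always be free.
Console.WriteLine(MakeBackupName(@"C:\backup\file01.txt"));

static string MakeBackupName(string originalPath)
{
    string candidate;
    do
    {
        candidate = originalPath + "." + Guid.NewGuid().ToString("N") + ".bak";
    }
    while (File.Exists(candidate));
    return candidate;
}

// Typical flow: move the existing destination aside under the GUID name, copy the
// new file in, verify the copy (e.g. by length), then delete the GUID-named backup.
```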
I think you might be focusing on the wrong problems given the risk and impact.
What if you don't have disk space to make two backups on the destination system?
What if the filename + path is too long for the destination OS to handle?
What if someone else modifies the file in the period of time between when you get the name and when you perform an operation on the file?
Writing defensive code is about thinking about risks, but don't drive yourself crazy focusing on unlikely or nearly impossible scenarios.
Why don't you just use GetTempFileName()? That's what it's for:
http://msdn.microsoft.com/en-us/library/system.io.path.gettempfilename.aspx
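For example (destinationFile is a placeholder; Path.GetRandomFileName is an alternative if the backup has to live next to the original rather than in %TEMP%):

```csharp
using System;
using System.IO;

// Path.GetTempFileName() creates a zero-byte file with a unique name in %TEMP%
// and returns its path, so the uniqueness problem is handled by the OS.
var destinationFile = @"C:\data\file01.txt";   // placeholder

var safetyCopy = Path.GetTempFileName();
File.Copy(destinationFile, safetyCopy, overwrite: true);   // keep a backup before overwriting
// ... copy the new version over destinationFile and verify it ...
File.Delete(safetyCopy);                                    // discard the backup afterwards

// If the backup must sit next to the original instead of in %TEMP%,
// Path.GetRandomFileName() just produces a name without creating a file:
var siblingBackup = destinationFile + "." + Path.GetRandomFileName();
Console.WriteLine(siblingBackup);
```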
Yes, this would suffice. Nothing is impossible via quantum mechanics, and in theory in a million years you might be able to reproduce a GUID by chance, but since you're also including the name of the file, a collision is even less likely. You could of course also add the file size in bytes, or a hash of the file, but remember that on Windows the length of a path is limited.
Guid.NewGuid()
is your friend.
It is globally unique, unique in the universe for all practical purposes. The post you are citing is a joke.

How to project any folder changes to a new folder and leave original untouched?

In my program I am calling methods that do lots of changes to a content of a folder, including:
deleting files/folders,
changing files/folders,
adding files/folders,
adding/deleting symbolic links/junctions.
That is no problem so far. But I came up with the idea of optionally projecting the final state of the folder (after all the operations are done) to another folder, so that the original folder remains untouched.
Just copying the folder before applying the operations is not appropriate, because the operations might delete large chunks of data that would have been copied unnecessarily. And so it came to my mind that a professional programmer would certainly not approach it this way.
Ideally I would write something like this (pseudo code):
originalFolder.Delete(lots of files).Add(Some other stuff, maybe change some permissions etc).ProjectTo(newFolder)
Is there some kind of design pattern or other way I could achieve something like this? Maybe some virtual file system I can do stuff on before materializing it into a separate folder?
I know how to write extension methods and I have already written lots of trivial ones, but I really need to be put on the right path on how to achieve something like this.
If the adding and deleting is done through YOUR APIs, then you can modify the list of files in memory without touching the physical files, and when you are done, apply the changes while copying to the final folder.
Of course, that assumes you don't need to read the new structure back through the filesystem before committing; the whole intermediate state would live entirely within your application.
If this were on Linux, I would have suggested another solution: use hard links to link the files into several folders, so you can do whatever you want with the first folder without touching the second. NTFS does in fact support hard links as well (within a single volume), though there is no managed API for creating them.
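For completeness, a P/Invoke sketch for creating a hard link on Windows (the wrapper class name is just for illustration):

```csharp
using System;
using System.ComponentModel;
using System.Runtime.InteropServices;

// NTFS hard links work within one volume; CreateHardLink is the Win32 API for them.
static class HardLink
{
    [DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Unicode)]
    static extern bool CreateHardLink(string newFileName, string existingFileName, IntPtr securityAttributes);

    public static void Create(string newPath, string existingPath)
    {
        if (!CreateHardLink(newPath, existingPath, IntPtr.Zero))
            throw new Win32Exception(Marshal.GetLastWin32Error());
    }
}
```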
If all you want is to delay changes to the original folder until you are certain that you want to commit them, then a Unit of Work pattern might do the trick. Store all operations that are to be applied to the folder in a container, and then commit them sequentially.
This sounds a bit dangerous though, since changes to the original folder before changes are committed easily can mess things up. In that case you would have to implement some sort of concurrency check to be as certain as possible that all operations will succeed.
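A minimal sketch of that unit-of-work / projection idea, assuming only plain files need handling (the class and method names are invented; links, permissions, and concurrency checks are left out):

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Record the intended changes, then materialize the final state into a new folder
// without first copying data that would only be deleted again.
class FolderProjection
{
    private readonly HashSet<string> _deletes = new(StringComparer.OrdinalIgnoreCase);
    private readonly Dictionary<string, byte[]> _adds = new(StringComparer.OrdinalIgnoreCase);

    public FolderProjection Delete(string relativePath) { _deletes.Add(relativePath); return this; }
    public FolderProjection Add(string relativePath, byte[] content) { _adds[relativePath] = content; return this; }

    public void ProjectTo(string originalRoot, string targetRoot)
    {
        // Copy only the files that survive the queued deletions.
        foreach (var file in Directory.EnumerateFiles(originalRoot, "*", SearchOption.AllDirectories))
        {
            var relative = Path.GetRelativePath(originalRoot, file);
            if (_deletes.Contains(relative)) continue;

            var destination = Path.Combine(targetRoot, relative);
            Directory.CreateDirectory(Path.GetDirectoryName(destination)!);
            File.Copy(file, destination, overwrite: true);
        }

        // Then materialize the additions.
        foreach (var (relative, content) in _adds)
        {
            var destination = Path.Combine(targetRoot, relative);
            Directory.CreateDirectory(Path.GetDirectoryName(destination)!);
            File.WriteAllBytes(destination, content);
        }
    }
}

// Usage, mirroring the pseudo code from the question:
// new FolderProjection().Delete(@"big\unneeded.bin").Add(@"notes\readme.txt", bytes).ProjectTo(source, target);
```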

What is faster Renaming files, Changing an attribute in them or Moving them between folders

I'm developing a file system manager module, and wondering what will be a more efficient approach.
This will be on a Windows machine with NTFS.
The module will need to notify a different module about new files created in a specific directory, and also maintain some kind of state for these files so that already-processed files can be deleted, and in case of failure the unprocessed files will be processed again.
I thought of either moving files between directories as their state changes, or renaming files according to their state or changing the files attributes as a sign of their state.
I'm wondering what would be the most efficient approach, considering the possibility of a large quantity of files being created over a short time span.
I can't fully answer your question, but I can give some general hints. Most important of all, the answer to your question may largely depend on the underlying file system (NTFS, FAT32, etc.).
Renaming or moving a file on the same partition generally means that directory entries are changed. The actual file contents need not be touched. Once you move a file to a different partition or hard disk drive, the actual file contents must be copied, too, which takes far more time.
That all being said, I would generally assume a rename to be slightly quicker than moving a file to another directory (on the same partition), since only one directory is affected instead of two. I'm also not quite sure what you mean by changing a file "attribute" -- however, if you're talking about e.g. setting the "archive" flag of a file, or making the file "read-only", that might again be slightly faster than a rename, if the directory entry can be changed in-place instead of being replaced with a new one of a different size.
Again: Do take my assumptions with caution, since this all depends on the particular file system. (For example, hiding a file on a UNIX file system usually means renaming it -- prefixing the name with a . --, but the same is not true for typical DOS/Windows file systems.)
Tested with 1000 files; the tests were run twice in order to ensure no caching was in place (EDITED: a nasty bug was fixed, sorry):
Renaming took: 1498.8166
ApplyAttribute took: 340.5407
Transfer took: 2527.6837
Transfer took: 3933.4944
ApplyAttribute took: 419.635
Renaming took: 1384.0079
Go with attributes.
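For reference, a rough sketch of how such a comparison can be timed with Stopwatch (the paths are placeholders, and each directory is assumed to hold its own copy of the 1000 test files so the three operations don't interfere with each other):

```csharp
using System;
using System.Diagnostics;
using System.IO;

Time("Renaming", () =>
{
    foreach (var f in Directory.GetFiles(@"C:\bench\rename"))
        File.Move(f, f + ".processed");
});

Time("ApplyAttribute", () =>
{
    foreach (var f in Directory.GetFiles(@"C:\bench\attribute"))
        File.SetAttributes(f, FileAttributes.Archive);
});

Time("Transfer", () =>
{
    foreach (var f in Directory.GetFiles(@"C:\bench\move"))
        File.Move(f, Path.Combine(@"C:\bench\processed", Path.GetFileName(f)));
});

static void Time(string label, Action action)
{
    var watch = Stopwatch.StartNew();
    action();
    watch.Stop();
    Console.WriteLine($"{label} took: {watch.Elapsed.TotalMilliseconds}");
}
```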
Why do you want to store this information directly in the filesystem? I would recommend using a SQL database to keep track of the files. That way, you avoid modifying the filesystem, it's probably going to be faster, and you can easily keep more information about the files if you need it.
Also, having one folder with a large number of files might be slow by itself, so you might consider spreading the files across more folders, if that makes sense for you.
