I am creating a backup solution. I doubt there is anything new in what I'm trying to achieve.
Before copying the file I want to take a backup of the destination file in case anything becomes corrupt. This means renaming the file.
I have to be careful when renaming in case a file with the new name already exists; simply adding a 01 to the end is not safe.
My question is, since I could not find the answer elsewhere: would adding a GUID to the file name work? So, if my file was called file01.txt, I would rename it to file01.txtGUID (where GUID is the generated GUID). I could then perform my backup of that file (at this instant having two backups) and then, after ensuring the file has copied (by comparing its length to the source), delete the file with the GUID in the name.
I know a GUID is not 100% guaranteed to be unique, but would this suffice?
Just get a GUID, then ask the destination OS if name+GUID exists. If it does, then pick a new GUID and try again. You are going to delete the name+GUID file anyway, so who cares if you can't pick a unique filename on the first try.
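A minimal sketch of that loop in C#, assuming the backup name is simply the original name with a GUID appended (BuildBackupPath is a hypothetical helper, not an existing API):

    using System;
    using System.IO;

    // Keep generating GUID-suffixed names until one is free; the loop will
    // almost always exit on the first iteration.
    static string BuildBackupPath(string destinationFile)
    {
        string candidate;
        do
        {
            candidate = destinationFile + Guid.NewGuid().ToString("N");
        } while (File.Exists(candidate));
        return candidate;
    }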
I think you might be focusing on the wrong problems given the risk and impact.
What if you don't have disk space to make two backups on the destination system?
What if the filename + path is too long for the destination OS to handle?
What if someone else modifies the file in the period of time between when you get the name and when you try to perform an operation on the file?
Writing defensive code is about thinking about risks, but don't drive yourself crazy focusing on unlikely or nearly impossible scenarios.
Why don't you just use GetTempFileName()? That's what it's for:
http://msdn.microsoft.com/en-us/library/system.io.path.gettempfilename.aspx
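One caveat: Path.GetTempFileName() creates a uniquely named, zero-byte file in the system temp directory rather than next to the destination file, so it only fits if a temp-directory backup is acceptable. A rough sketch (destinationFile stands in for the file being backed up):

    using System.IO;

    string tempBackup = Path.GetTempFileName();    // unique file in %TEMP%, already created
    File.Copy(destinationFile, tempBackup, true);  // overwrite the zero-byte placeholder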
Yes, this would suffice. Nothing is impossible according to quantum mechanics, and in theory, given a million years, you might reproduce a GUID by chance; but since you're also adding the name of the file, a collision is even less likely. You could of course also add the file size in bytes, or a hash of the file, but remember that on Windows the length of a path is not unlimited.
Guid.NewGuid()
is your friend.
It is globally unique, unique in the universe. The post you are citing is a joke.
Is it possible to use either File.Delete or File.Encrypt to shred files? Or do both functions not overwrite the actual content on disk?
And if they do, does this also work with the wear leveling of SSDs and similar techniques of other storage? Or is there another function that I should use instead?
I'm trying to improve an open source project which currently stores credentials in plaintext within a file. For reasons unknown to me they are always written to that file (I don't know why Ansible does this, and for now I don't want to touch that part of the code; there may be some valid reason why it works that way), and I can just delete that file afterwards. So is using File.Delete or File.Encrypt the right approach to purge that information from the disk?
Edit: If it is only possible using native API and pinvoke, I'm also fine with that. I'm not limited to only .net, but to C#.
Edit2: To provide some context: The plaintext credentials are saved by the ansible internals as they are passed as a variable for the modules that get executed on the target windows host. This file is responsible for retrieving the variables again: https://github.com/ansible/ansible/blob/devel/lib/ansible/module_utils/powershell/Ansible.ModuleUtils.Legacy.psm1#L287
https://github.com/ansible/ansible/blob/devel/lib/ansible/module_utils/csharp/Ansible.Basic.cs#L373
There's a possibility that File.Encrypt would do more to help shred data than File.Delete (which definitely does nothing in that regard), but it won't be a reliable approach.
There's a lot going on at both the Operating System and Hardware level that's a couple of abstraction layers separated from the .NET code. For example, your file system may randomly decide to move the location where it's storing your file physically on the disk, so overwriting the place where you currently think the file is might not actually remove traces from where the file was stored previously. Even if you succeed in overwriting the right parts of the file, there's often residual signal on the disk itself that could be picked up by someone with the right equipment. Some file systems don't truly overwrite anything: they just add information every time a change happens, so you can always find out what the disk's contents were at any given point in time.
So if you legitimately cannot prevent a file getting saved, any attempt to truly erase it is going to be imperfect. If you're willing to accept imperfection and only want to mitigate the potential for problems somewhat, you can use a strategy like the ones you've found to try to overwrite the file with garbage data several times and hope for the best.
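For illustration, a best-effort sketch of that overwrite-then-delete strategy; as stressed above, this is not a reliable shred, since relocation, journaling, and wear leveling can all leave copies this code never touches:

    using System;
    using System.IO;
    using System.Security.Cryptography;

    static void OverwriteAndDelete(string path, int passes = 3)
    {
        long length = new FileInfo(path).Length;
        var noise = new byte[4096];
        using (var rng = RandomNumberGenerator.Create())
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Write, FileShare.None))
        {
            for (int pass = 0; pass < passes; pass++)
            {
                fs.Position = 0;
                long remaining = length;
                while (remaining > 0)
                {
                    rng.GetBytes(noise);                  // fresh random garbage each pass
                    int chunk = (int)Math.Min(noise.Length, remaining);
                    fs.Write(noise, 0, chunk);
                    remaining -= chunk;
                }
                fs.Flush(true);                           // push past the OS cache to disk
            }
        }
        File.Delete(path);
    }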
But I wouldn't be too quick to give up on solving the problem at its source. For example, Ansible's docs mention:
A great alternative to the password lookup plugin, if you don’t need to generate random passwords on a per-host basis, would be to use Vault in playbooks. Read the documentation there and consider using it first, it will be more desirable for most applications.
I'm developing a file system manager module, and I am wondering which approach would be more efficient.
This will be on a Windows machine with NTFS.
The module will need to notify a different module about new files created in a specific directory, and also maintain some kind of state for these files, so that already-processed files can be deleted and, in case of failure, the unprocessed files will be processed again.
I thought of either moving files between directories as their state changes, or renaming files according to their state or changing the files attributes as a sign of their state.
I'm wondering what would be the most efficient approach, considering the possibility of a large quantity of files being created over a short time span.
I can't fully answer your question, but I can give some general hints. Most important of all, the answer to your question may largely depend on the underlying file system (NTFS, FAT32, etc.).
Renaming or moving a file on the same partition generally means that directory entries are changed. The actual file contents need not be touched. Once you move a file to a different partition or hard disk drive, the actual file contents must be copied, too, which takes far more time.
That all being said, I would generally assume a rename to be slightly quicker than moving a file to another directory (on the same partition), since only one directory is affected instead of two. I'm also not quite sure what you mean by changing a file "attribute" -- however, if you're talking about e.g. setting the "archive" flag of a file, or making the file "read-only", that might again be slightly faster than a rename, if the directory entry can be changed in-place instead of being replaced with a new one of a different size.
Again: Do take my assumptions with caution, since this all depends on the particular file system. (For example, hiding a file on a UNIX file system usually means renaming it -- prefixing the name with a . --, but the same is not true for typical DOS/Windows file systems.)
    Renaming took: 1498.8166
    ApplyAttribute took: 340.5407
    Transfer took: 2527.6837

    Transfer took: 3933.4944
    ApplyAttribute took: 419.635
    Renaming took: 1384.0079

Tested with 1000 files. The tests were run twice to ensure no caching was in place.
EDITED: nasty bug was fixed, sorry.
Go with attributes.
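A minimal sketch of the attribute approach, arbitrarily repurposing the Archive flag as a "processed" marker (any flag your environment doesn't otherwise use would do):

    using System.IO;

    static void MarkProcessed(string path) =>
        File.SetAttributes(path, File.GetAttributes(path) | FileAttributes.Archive);

    static bool IsProcessed(string path) =>
        (File.GetAttributes(path) & FileAttributes.Archive) != 0;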
Why do you want to store this information directly in the filesystem? I would recommend using a SQL database to keep track of the files. That way, you avoid modifying the filesystem, it's probably going to be faster, and you can easily have more information about the files if you need them.
Also, having one folder with a large number of files might be slow by itself, so you might consider spreading the files across more folders, if that makes sense for you.
I know that pretty much every programming language has a method to check the existence of a file or directory.
However, in my case, a file is made which stores program settings. If it does not exist (i.e. !File.Exists, or Directory.Count == 0 where Directory is the containing directory of the file), then the program prompts for some settings to be filled in.
However, is this sort of code reliable? For example, there may be no files in the directory, in which case the details are requested; otherwise there may be files of other types which are irrelevant, or a tampered file of the correct format. Would it be better to check for the specific file itself? Would it also be better to check the variables which make up the file, as this is faster?
What is the best general way of doing this? Check a file, if the folder is there? If the folder is there and empty? Check the strings which are written to the file?
EDIT: A school of thought from a colleague was that I should check at the variable level because it would be closer to the problem (identify issues like incorrect encryption, corruption, locales, etc).
Thanks
I'd just check for the existence and validity of the specific file. If you encounter a corrupted file, get rid of it and make a new one.
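As a sketch of that flow (Settings, TryParseSettings, and PromptUserForSettings are hypothetical placeholders for your own types and logic):

    using System.IO;

    static Settings LoadOrPrompt(string settingsPath)
    {
        // Check for the specific file, then validate its contents.
        if (File.Exists(settingsPath) &&
            TryParseSettings(File.ReadAllText(settingsPath), out Settings settings))
            return settings;

        // Missing or corrupted: discard any remnant and start fresh.
        if (File.Exists(settingsPath))
            File.Delete(settingsPath);
        return PromptUserForSettings();  // re-prompts the user and re-creates the file
    }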
For basic development, it is purely a matter of choice. In cases where the existence of the file is crucial to the stability of the application, checking the file directly is the safest route.
I have used GetShortPathName frequently with no problem. However, now I'm having a problem.
In the past I have done, for example, @"C:\LongFolderName\LongFolderName\"
Now I'm using a UNC path like this: @"\\MyServerName\TheLongFolderName"
But it does not get shortened. It stays the same.
I have tried @"\\?\MyServerName\TheLongFolderName"
But that returns "".
I have read the GetShortPathName Function documentation, but it did not help.
What am I missing?
Thanks!
I doubt very much that GetShortPathName will work on network names, because the short names wouldn't be unique anymore, and who would manage the mappings?
On a file system, the short path name is guaranteed unique on the whole file system, and it is created when the file with the long name is created or renamed. You cannot ensure this across a network.
But even on a local file system it is not guaranteed that a given file has a short file name; this may depend on system settings.
The share name is too long - it must be under 11 characters for compatibility with GetShortPathName. I think you may be warned of this when creating long share names in some instances.
http://groups.google.co.uk/group/microsoft.public.win32.programmer.kernel/msg/88454e39076262ab
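If it helps to see why the call comes back empty, here is a sketch of invoking GetShortPathName directly with error checking; a zero return value means failure, and Marshal.GetLastWin32Error() tells you the reason:

    using System.ComponentModel;
    using System.Runtime.InteropServices;
    using System.Text;

    class ShortPath
    {
        [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
        static extern uint GetShortPathName(string longPath, StringBuilder shortPath, uint bufferSize);

        public static string TryGetShortPath(string longPath)
        {
            var buffer = new StringBuilder(260);
            uint length = GetShortPathName(longPath, buffer, (uint)buffer.Capacity);
            if (length == 0)
                throw new Win32Exception(Marshal.GetLastWin32Error());
            return buffer.ToString();
        }
    }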
I have a program that compares files in two folders. I want to detect if a file has been renamed, determine the newest file (most recently renamed), and update the name on the old file to match.
To accomplish this, I would check to see if the newest file is bit by bit identical to the old one, and if it is, simply rename the old file to match the new one.
The problem is, I have nothing to key on to tell me which file was most recently renamed.
I would love some property like FileInfo.LastModified, but for files that have been renamed.
I've already looked at solutions like FileSystemWatcher, and that is not really what I'm looking for. I would like to be able to run my synchronizer whenever I want, without having to worry about some dedicated process tracking a folder's state.
Any ideas?
A: At least on NTFS, you can attach alternate data streams to a file.
On your first sync, you can just attach a GUID in an ADS to the source files to tag them.
B: If you don't have write access to the source, store hashes of the files you synced in your target repository. When the source changes, you only have to hash the source files and only compare bit-by-bit if the hashes collide. Depending on the quality and speed of your hash function, this will save you a lot of time.
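A sketch of the hashing half of option B (SHA-256 here; any strong hash works, and matching hashes should still be confirmed bit-by-bit as described):

    using System;
    using System.IO;
    using System.Security.Cryptography;

    static string HashFile(string path)
    {
        using (var sha = SHA256.Create())
        using (var stream = File.OpenRead(path))
            return Convert.ToBase64String(sha.ComputeHash(stream));
    }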
If you are running on an NTFS drive you can enable the change journal which you can then query for things like rename events. However you need to be an admin to enable it to begin with and it will use disk space. Unfortunately I don't know of any specific C# implementations of reading the journal.
You could possibly create a config file that holds a list of all expected names within the folder; then, if a file in the folder is not a member of the expected list of names, you can conclude that the file has been renamed. This would, however, add another layer of work, considering you'd have to change the list every time you wish to add a new file to the folder.
Filesystems generally do not track this.
Since you seem to be on Windows, you can use GetFileInformationByHandle(). (Sorry, I don't know the C# equivalent.) You can use the "file index" fields in the struct returned to see if files have the same index as something you've seen before. Keep in mind that hardlinks will also have the same index.
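Since the C# equivalent was left open above, here is a hedged P/Invoke sketch of reading the file index; the struct layout follows the documented Win32 BY_HANDLE_FILE_INFORMATION, and the index (together with the volume serial number) survives renames:

    using System;
    using System.IO;
    using System.Runtime.InteropServices;
    using Microsoft.Win32.SafeHandles;

    class FileIdentity
    {
        [StructLayout(LayoutKind.Sequential)]
        struct BY_HANDLE_FILE_INFORMATION
        {
            public uint FileAttributes;
            public System.Runtime.InteropServices.ComTypes.FILETIME CreationTime;
            public System.Runtime.InteropServices.ComTypes.FILETIME LastAccessTime;
            public System.Runtime.InteropServices.ComTypes.FILETIME LastWriteTime;
            public uint VolumeSerialNumber;
            public uint FileSizeHigh;
            public uint FileSizeLow;
            public uint NumberOfLinks;
            public uint FileIndexHigh;
            public uint FileIndexLow;
        }

        [DllImport("kernel32.dll", SetLastError = true)]
        static extern bool GetFileInformationByHandle(SafeFileHandle file, out BY_HANDLE_FILE_INFORMATION info);

        public static ulong GetFileIndex(string path)
        {
            using (var fs = File.Open(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
            {
                if (!GetFileInformationByHandle(fs.SafeFileHandle, out var info))
                    throw new IOException("GetFileInformationByHandle failed");
                return ((ulong)info.FileIndexHigh << 32) | info.FileIndexLow;
            }
        }
    }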
Alternatively you could hash file contents somehow.
I don't know precisely what you're trying to do, so I can't tell you whether either of these points makes sense. It could be that the most reasonable answer is, "no, you can't do that."
I would make a CRC (e.g. CRC example) of (all?) the files in the 2 directories, storing the last update time with the CRC value, file name, etc. After that, iterate through the lists, finding matches by the CRC, and then use the date values to decide what to do.
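A small sketch of that index (ComputeCrc is a hypothetical helper standing in for whichever CRC implementation you pick):

    using System.Collections.Generic;
    using System.IO;

    // Map checksum -> file for one directory; files from the other directory
    // can then be looked up by checksum, and FileInfo.LastWriteTime supplies
    // the date used to decide which name wins.
    static Dictionary<uint, FileInfo> IndexByCrc(string directory)
    {
        var index = new Dictionary<uint, FileInfo>();
        foreach (var file in new DirectoryInfo(directory).GetFiles())
            index[ComputeCrc(file.FullName)] = file;
        return index;
    }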