My naive solution to maintain atomicity is to open streams on both files and exchange the contents via a temporary file. However, I understand that File.Move is much more efficient when both files exist on the same drive because no data is actually copied.
Unfortunately, C#'s File.Move requires that the destination file not exist, so it is impossible to use for an atomic exchange of two files.
Is there a way to ensure neither file will be touched during the exchange and still gain the efficiency of renaming files that exist on the same drive?
Preferably, I'm looking for a solution in C#, though I'm not against using P/Invoke if there is a lower-level way to achieve this. My understanding is that OSX can achieve this via exchangedata() and Linux via renameat2(). Is there anything similar for Windows?
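For context, the closest native Windows call I'm aware of is ReplaceFile, which atomically replaces one file with another (optionally backing up the original) but is not a true two-way exchange. The classic userspace workaround is three renames through a temporary name: not atomic (a crash between renames can strand the temp file), but each same-volume File.Move is a cheap rename with no data copied. A minimal sketch (the temp-name scheme is arbitrary):

```csharp
using System;
using System.IO;

// Swap two files on the same volume via three renames.
// NOT atomic: a crash between the moves can leave the temp file behind,
// but each same-volume File.Move is just a rename, so no data is copied.
static void SwapFiles(string a, string b)
{
    string temp = a + ".swap-" + Guid.NewGuid().ToString("N");
    File.Move(a, temp);   // a -> temp
    File.Move(b, a);      // b -> a
    File.Move(temp, b);   // temp -> b
}

string dir = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
Directory.CreateDirectory(dir);
string fileA = Path.Combine(dir, "a.txt");
string fileB = Path.Combine(dir, "b.txt");
File.WriteAllText(fileA, "first");
File.WriteAllText(fileB, "second");

SwapFiles(fileA, fileB);

Console.WriteLine(File.ReadAllText(fileA)); // "second"
Console.WriteLine(File.ReadAllText(fileB)); // "first"
```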
Related
Is it possible to use either File.Delete or File.Encrypt to shred files? Or do both functions leave the actual content on disk untouched?
And if they do overwrite it, does this also hold up against the wear leveling of SSDs and similar techniques used by other storage media? Or is there another function that I should use instead?
I'm trying to improve an open source project which currently stores credentials in plaintext within a file. For reasons I don't yet understand, they are always written to that file (I don't know why Ansible does this; there may be a valid reason, and for now I don't want to touch that part of the code), and I can just delete that file afterwards. So is using File.Delete or File.Encrypt the right approach to purge that information from the disk?
Edit: If this is only possible using a native API and P/Invoke, I'm fine with that too. I'm not limited to .NET, but I am limited to C#.
Edit2: To provide some context: The plaintext credentials are saved by the ansible internals as they are passed as a variable for the modules that get executed on the target windows host. This file is responsible for retrieving the variables again: https://github.com/ansible/ansible/blob/devel/lib/ansible/module_utils/powershell/Ansible.ModuleUtils.Legacy.psm1#L287
https://github.com/ansible/ansible/blob/devel/lib/ansible/module_utils/csharp/Ansible.Basic.cs#L373
There's a possibility that File.Encrypt would do more to help shred data than File.Delete (which definitely does nothing in that regard), but it won't be a reliable approach.
There's a lot going on at both the Operating System and Hardware level that's a couple of abstraction layers separated from the .NET code. For example, your file system may randomly decide to move the location where it's storing your file physically on the disk, so overwriting the place where you currently think the file is might not actually remove traces from where the file was stored previously. Even if you succeed in overwriting the right parts of the file, there's often residual signal on the disk itself that could be picked up by someone with the right equipment. Some file systems don't truly overwrite anything: they just add information every time a change happens, so you can always find out what the disk's contents were at any given point in time.
So if you legitimately cannot prevent a file getting saved, any attempt to truly erase it is going to be imperfect. If you're willing to accept imperfection and only want to mitigate the potential for problems somewhat, you can use a strategy like the ones you've found to try to overwrite the file with garbage data several times and hope for the best.
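As a best-effort mitigation only (none of the caveats above go away, and this does nothing against wear leveling or copy-on-write filesystems), an overwrite-then-delete helper might look like this; the pass count of 3 is arbitrary:

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Best-effort shredding: overwrite the file's current bytes with random
// data a few times, flush past OS buffers, then delete. Does NOT defeat
// wear leveling, shadow copies, journaling, or relocated clusters.
static void BestEffortShred(string path, int passes = 3)
{
    long length = new FileInfo(path).Length;
    byte[] garbage = new byte[length];
    using (var stream = new FileStream(path, FileMode.Open, FileAccess.Write,
                                       FileShare.None, 4096,
                                       FileOptions.WriteThrough))
    {
        for (int pass = 0; pass < passes; pass++)
        {
            RandomNumberGenerator.Fill(garbage);
            stream.Position = 0;
            stream.Write(garbage, 0, garbage.Length);
            stream.Flush(true); // flushToDisk: push past OS buffers
        }
    }
    File.Delete(path);
}

string path = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
File.WriteAllText(path, "secret-credentials");
BestEffortShred(path);
Console.WriteLine(File.Exists(path)); // False
```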
But I wouldn't be too quick to give up on solving the problem at its source. For example, Ansible's docs mention:
A great alternative to the password lookup plugin, if you don’t need to generate random passwords on a per-host basis, would be to use Vault in playbooks. Read the documentation there and consider using it first, it will be more desirable for most applications.
I'm writing a file system using Dokan. What I want to achieve is allowing users to access files from multiple sources as if they were all in a local folder, i.e. a file can be available locally, in a remote location, or in memory.
Initially I was creating placeholders that describe where the actual file is available (like the Windows 8.1 OneDrive client). When the user accesses a file, I read the placeholder first. Knowing the real location of that file, I then read the real one and send the data back to the user application.
After about an hour of coding I found this idea seriously flawed. If the real location of a file is on the Internet, this works. But if the file is available locally, I have to ask my hard drive for two files (the placeholder and the real file). And if the file is available in memory (users do this to improve performance), I still need to access the hard drive, which makes caching the file in RAM pointless.
So... I guess I have to write my own file table, like the NTFS MFT. Well, the concept of a file table is straightforward. But I'm not sure if I can write one that's as efficient as NTFS. Then I started considering a Database. But I'm also not sure if this is a good idea...
What should I do?
Thanks!
PS. I only have very basic knowledge of File Systems.
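To make the placeholder-vs-table trade-off concrete: the lookup the question describes can live entirely in memory (loaded once at mount time), so resolving a path never touches the disk. A toy sketch of such an index; the source kinds, paths, and URL are made up:

```csharp
using System;
using System.Collections.Generic;

// A toy in-memory file table mapping virtual paths to where the real bytes
// live: ("disk", backing path), ("remote", url), or ("memory", cached bytes).
// In a real Dokan filesystem this index would be loaded from a persistent
// store at mount time and consulted on every callback, so resolving a path
// requires no placeholder files and no disk I/O at all.
var table = new Dictionary<string, (string Kind, string Location, byte[] Bytes)>(
    StringComparer.OrdinalIgnoreCase)
{
    ["\\docs\\report.txt"] = ("disk",   @"C:\store\abc123.bin", Array.Empty<byte>()),
    ["\\docs\\remote.txt"] = ("remote", "https://example.com/f/42", Array.Empty<byte>()),
    ["\\hot\\cache.txt"]   = ("memory", "", new byte[] { 1, 2, 3 }),
};

var entry = table["\\hot\\cache.txt"];
if (entry.Kind == "memory")
    Console.WriteLine(entry.Bytes.Length); // 3, served without any disk access
```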
I've run into a bit of a stupid problem today:
In my project I have to use a library (that I can't replace). The trouble is that I'm using MemoryStream instead of frequently saving to the HDD (because there are many files, and they are small in size, so it's perfect for MemoryStream). The problem is that the library API is built around filesystem access, and one of its functions accepts only a direct path to a file.
How can I still pass a string (path) to the method, which internally creates a new FileStream, without actually touching the hard drive?
For example "\MEMORY\myfile.bin"?
Well - that's tough.
Basically, you have three possible solutions:
You can use a reflector (a decompiler such as .NET Reflector) to modify the given library.
You can inspect the appropriate method and then, with some reflection magic, modify the object at runtime (strongly discouraged).
You can play around with system calls and APIs - going down to low-level ring-0 assembly to patch kernel32.dll so that it redirects I/O requests from your path to memory (maybe that's possible without ring-0 access - I am not sure).
Obviously, the most recommended option is to use a reflector to modify the given library. Otherwise, I can't see a solution for you.
In response to the first comment, you can:
use a RAMDrive (a program which allocates a chunk of system memory and presents it as a disk partition)
If the file must exist on the disk (and only disk paths are accepted), then the main option is a virtual filesystem, which lets you expose custom data as a filesystem. Several options exist, such as the now-dead Dokan, our Solid File System OS Edition and Callback File System (see the description of our Virtual Storage product line), and maybe Pismo File Mount would work (I've never looked at it closely).
It all depends on how the library is constructed.
If it's a 100% managed library that uses a FileStream, you are probably stuck.
If it takes the provided filename and calls the native Win32 CreateFile function, it's possible to give it something other than a file, such as a named pipe.
To test quickly whether that's possible, pass @"\\.\pipe\random_name" to the method: if it responds by saying explicitly that it can't open pipes or filenames beginning with \\.\, well, sorry. On the other hand, if it says it can't find the file, you have a chance to make it work.
You can then create a NamedPipeServerStream and use the same name for your library method call prepended with \\.\pipe\.
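A minimal sketch of that pipe trick, with a stand-in client playing the role of the library (the pipe name "demo_pipe" is made up; on Windows the library would open it via CreateFile as @"\\.\pipe\demo_pipe"):

```csharp
using System;
using System.IO.Pipes;
using System.Threading.Tasks;

// Serve an in-memory payload through a named pipe, so a consumer that only
// understands "file paths" can read it without the data touching the disk.
byte[] payload = { 10, 20, 30 };

using var server = new NamedPipeServerStream("demo_pipe", PipeDirection.Out);
var serve = Task.Run(() =>
{
    server.WaitForConnection();                // block until the "library" opens the path
    server.Write(payload, 0, payload.Length);  // hand over the in-memory bytes
    server.Flush();
});

// Stand-in for the library: open the pipe by name and read the "file".
using var client = new NamedPipeClientStream(".", "demo_pipe", PipeDirection.In);
client.Connect(5000);
byte[] buffer = new byte[payload.Length];
int read = 0;
while (read < buffer.Length)
{
    int n = client.Read(buffer, read, buffer.Length - read);
    if (n == 0) break;
    read += n;
}
await serve;

Console.WriteLine(read); // 3
```

One caveat: pipes are sequential streams, so this only works if the library reads the file front to back and doesn't seek.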
You can't "represent" it as a file, but you could "convert" it to a file using a StreamWriter class.
I have a site which is akin to SVN, but without the version control. Users can upload files to and download them from Projects, where each Project has a directory (with subdirectories and files) on the server. What I'd like to do is attach further information to files, like who uploaded a file, how many times it's been downloaded, and so on. Is there a way to do this via FileInfo, or should I store this in a table that associates the metadata with an absolute path or something? That way sounds dodgy and error-prone :\
It is possible to append data to arbitrary files with NTFS (the default Windows filesystem, which I'm assuming you're using). You'd use alternate data streams. Microsoft uses this for extended metadata like author and summary information in Office documents.
Really, though, the database approach is reasonable, widely used, and much less error-prone, in my opinion. It's not really a good idea to be modifying the original file unless you're actually changing its content.
As Michael Petrotta points out, alternate data streams are a nifty idea. Here's a C# tutorial with code. Really though, a database is the way to go. SQL Compact and SQLite are fairly low-impact and straightforward to use.
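To make the database option less "dodgy and error-prone", the usual trick is to key rows by a normalized, project-relative path rather than an absolute one, so the rows survive the project tree being moved. A sketch with a dictionary standing in for the table (the field names and paths here are made up):

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Metadata keyed by a normalized, project-relative path rather than an
// absolute path, so entries survive the project tree being relocated.
var metadata = new Dictionary<string, (string UploadedBy, int Downloads)>(
    StringComparer.OrdinalIgnoreCase);

// Normalize to forward slashes relative to the project root so the same
// file always produces the same key regardless of how it was referenced.
static string Key(string projectRoot, string absolutePath) =>
    Path.GetRelativePath(projectRoot, absolutePath).Replace('\\', '/');

string root = Path.Combine(Path.GetTempPath(), "demo-project"); // hypothetical layout
string file = Path.Combine(root, "src", "main.cs");

metadata[Key(root, file)] = ("alice", 0);

// Record a download by bumping the counter.
var row = metadata[Key(root, file)];
metadata[Key(root, file)] = (row.UploadedBy, row.Downloads + 1);

Console.WriteLine(metadata["src/main.cs"].Downloads); // 1
```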
I'm creating a PDA app and I need to upload/download a lot of small files, and my idea is to gather them in an uncompressed zip file.
The question is: is it a good idea to read those files from the zip without separating them? How can I do so? Or is it better to unzip them? Since the files are not compressed, my simple mind suggests that reading them from the zip may be more or less as efficient as reading them directly from the file system...
Thanks for your time!
Since there are two different open-source libraries (SharpZipLib and the DotNetZip Library) that handle writing and extracting files from a zip file, why worry about doing it yourself?
ewww - don't use J#.
The DotNetZip library, as of v1.7, runs on the .NET Compact Framework 2.0 and above. It can handle reading or writing compressed or uncompressed files within a ZIP archive. The source distribution includes a CF example app. It's really simple.
Sounds as if you want to use the archive to group your files.
From the point of view of reading the files, it makes very little difference whether the files are handled one way or the other. You would need to implement the ability to read zip files, though. Even if you use a library, as James Curran suggested, it means additional work, which can mean additional sources of error.
From the point of view of uploading the files, it makes more sense: the uploader can gather all the files needed and has to take care of only a single upload. This reduces overhead as well as error handling (if one upload fails, do you have to delete all the files of this group that were already uploaded?).
As for the efficiency of reading them from the archive vs. reading them directly from disk: the difference should be minimal. You (or your zip library) need to parse the zip directory structure once, which is pretty straightforward. The rest is reading part of a file into memory vs. reading a whole file into memory.
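For completeness, modern .NET also ships a built-in System.IO.Compression.ZipArchive (it wasn't available on the Compact Framework the question targets). Reading an entry straight out of an uncompressed (stored) archive, without ever extracting it, looks like this:

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Build an uncompressed ("stored") zip in memory, then read one entry back
// directly from the archive without extracting anything to disk.
var zipBytes = new MemoryStream();
using (var zip = new ZipArchive(zipBytes, ZipArchiveMode.Create, leaveOpen: true))
{
    var entry = zip.CreateEntry("data/config.txt", CompressionLevel.NoCompression);
    using var writer = new StreamWriter(entry.Open());
    writer.Write("hello from inside the zip");
}

zipBytes.Position = 0;
using var readZip = new ZipArchive(zipBytes, ZipArchiveMode.Read);
using var reader = new StreamReader(readZip.GetEntry("data/config.txt")!.Open());
string text = reader.ReadToEnd();
Console.WriteLine(text); // "hello from inside the zip"
```

Since the entries are stored rather than deflated, reading one is essentially a seek into the archive plus a plain copy, which matches the "more or less as efficient" intuition above.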