Performance concerns when writing my own file system - c#

I'm writing a file system using Dokan. What I want to achieve is to let users access files from multiple sources as if they were all in one local folder, i.e. a file can be available locally, at a remote location, or in memory.
Initially I was creating placeholders that describe where the actual file is available (like the Windows 8.1 OneDrive). When the user accesses a file, I read the placeholder first. Knowing the real location of the file, I read the real one and send the data back to the user application.
After about an hour of coding I found this idea seriously flawed. If the real location of a file is on the Internet, this works. But if the file is available locally, I actually have to ask my hard drive for two files (the placeholder and the real file). Worse, if the file is available in memory (users do this to improve performance), I still have to hit the hard drive for the placeholder, which defeats the point of caching the file in RAM.
So... I guess I have to write my own file table, like the NTFS MFT. The concept of a file table is straightforward, but I'm not sure I can write one that's as efficient as NTFS's. Then I started considering a database, but I'm not sure that's a good idea either...
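To make it concrete, here is a rough sketch of the kind of in-memory file table I have in mind (all type and member names are just illustrative):

```csharp
using System;
using System.Collections.Generic;

// Where the real bytes of a file currently live.
enum FileLocation { Local, Remote, Memory }

// One entry per virtual file. The table itself is the single source
// of truth, so no placeholder file ever has to be read from disk.
class FileEntry
{
    public FileLocation Location;
    public string LocalPath;    // used when Location == Local
    public string RemoteUrl;    // used when Location == Remote
    public byte[] CachedBytes;  // used when Location == Memory
    public long Size;
}

class FileTable
{
    // Keyed by the virtual path the user sees through Dokan.
    private readonly Dictionary<string, FileEntry> _entries =
        new Dictionary<string, FileEntry>(StringComparer.OrdinalIgnoreCase);

    public bool TryGet(string virtualPath, out FileEntry entry) =>
        _entries.TryGetValue(virtualPath, out entry);

    public void Add(string virtualPath, FileEntry entry) =>
        _entries[virtualPath] = entry;
}
```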
What should I do?
Thanks!
PS. I only have very basic knowledge of File Systems.

Related

File.Delete or File.Encrypt to wipe files?

Is it possible to use either File.Delete or File.Encrypt to shred files? Or do both functions not overwrite the actual content on disk?
And if they do, does this also work with the wear leveling of SSDs and similar techniques of other storage devices? Or is there another function that I should use instead?
I'm trying to improve an open source project that currently stores credentials in plaintext within a file. For reasons I don't fully understand, they are always written to that file (I don't know why Ansible does this; there may be some valid reason, and for now I don't want to touch that part of the code), but I can delete that file afterwards. So is using File.Delete or File.Encrypt the right approach to purge that information from the disk?
Edit: If it is only possible using native API and pinvoke, I'm also fine with that. I'm not limited to only .net, but to C#.
Edit2: To provide some context: The plaintext credentials are saved by the ansible internals as they are passed as a variable for the modules that get executed on the target windows host. This file is responsible for retrieving the variables again: https://github.com/ansible/ansible/blob/devel/lib/ansible/module_utils/powershell/Ansible.ModuleUtils.Legacy.psm1#L287
https://github.com/ansible/ansible/blob/devel/lib/ansible/module_utils/csharp/Ansible.Basic.cs#L373
There's a possibility that File.Encrypt would do more to help shred data than File.Delete (which definitely does nothing in that regard), but it won't be a reliable approach.
There's a lot going on at both the Operating System and Hardware level that's a couple of abstraction layers separated from the .NET code. For example, your file system may randomly decide to move the location where it's storing your file physically on the disk, so overwriting the place where you currently think the file is might not actually remove traces from where the file was stored previously. Even if you succeed in overwriting the right parts of the file, there's often residual signal on the disk itself that could be picked up by someone with the right equipment. Some file systems don't truly overwrite anything: they just add information every time a change happens, so you can always find out what the disk's contents were at any given point in time.
So if you legitimately cannot prevent a file getting saved, any attempt to truly erase it is going to be imperfect. If you're willing to accept imperfection and only want to mitigate the potential for problems somewhat, you can use a strategy like the ones you've found to try to overwrite the file with garbage data several times and hope for the best.
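For completeness, a minimal sketch of such a best-effort multi-pass overwrite might look like this (with all the caveats above: the file system may already have relocated or journaled the data, so treat it as mitigation, not erasure):

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

static class BestEffortShredder
{
    // Overwrites the file's current clusters with random data a few
    // times, then deletes it. This cannot defeat wear leveling,
    // copy-on-write file systems, shadow copies, or journaling.
    public static void Shred(string path, int passes = 3)
    {
        long length = new FileInfo(path).Length;
        var buffer = new byte[64 * 1024];

        using (var stream = new FileStream(path, FileMode.Open,
                   FileAccess.Write, FileShare.None, buffer.Length,
                   FileOptions.WriteThrough))
        using (var rng = RandomNumberGenerator.Create())
        {
            for (int pass = 0; pass < passes; pass++)
            {
                stream.Position = 0;
                long remaining = length;
                while (remaining > 0)
                {
                    rng.GetBytes(buffer);
                    int chunk = (int)Math.Min(buffer.Length, remaining);
                    stream.Write(buffer, 0, chunk);
                    remaining -= chunk;
                }
                stream.Flush(true); // ask the OS to push the data to disk
            }
        }
        File.Delete(path);
    }
}
```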
But I wouldn't be too quick to give up on solving the problem at its source. For example, Ansible's docs mention:
A great alternative to the password lookup plugin, if you don’t need to generate random passwords on a per-host basis, would be to use Vault in playbooks. Read the documentation there and consider using it first, it will be more desirable for most applications.

.NET Windows API Check if folder accessed

I am searching for a function that allows me to put a dialog window (with a password query) before the folder is accessed. Is there such a function? Also, it would be great if this protection applied before any program, even Windows Explorer or cmd.exe, is allowed to access those files. Is that possible?
I don't want to use something like IOContainer, password-protected ZIPs or any other approach that's too slow, because I guess 20 GB in one file is a bit overkill and it would take ages to decrypt that file. Is there maybe a VFS solution for C# that supports password protection and can be used as a normal filesystem or folder on the disk?
Thanks!
There are two options. The simpler one is to map a virtual file system from a container file. Our product, SolFS (OS edition), does exactly what you are asking in the second part of your question: it provides a container with optional encryption, exposed as a virtual drive so that access to the contents is transparent. Decryption in such systems is done in pages, so a 20 GB file won't be decrypted in its entirety, as you fear.
Another option is to employ a filesystem filter driver, which will intercept requests for directory opening and ask the user for a password. This approach is possible (we even have a product for this, called CallbackFilter), but it has two drawbacks: first, the driver can be removed, leaving the data unprotected; second, if you ask the user for a password in a callback while the OS is waiting for access to the directory, you can end up with a deadlock or a timeout while the user is thinking.
With these two limitations in mind, something like SolFS is the preferred and recommended approach.
PS: and we have free non-commercial licenses as well.

How to create self-expiring files

I need to have controlled usage over STL files, using desktop software (I shall develop the software). It can be in either C++ or C#.
Here STL files refers to STereoLithography files, used for 3D printing.
Controlled usage refers to usage specified by the distributor. So it can be 1 day, 2 hours, or whatever the distributor deems fit. The files shall self-expire after the user has received them.
Any ideas shall be appreciated.
I looked into the STL standard definition and it looks like it might be hard to embed license data inside. A few options that come to my mind are:
a) Create your own format as a superset of STL, including some embedded license data. You would have to restrict usage of "clear" STL files, because a user could extract the data portion of your file and save it as a plain STL file.
b) Create your own format with your own structure, including the license (see the sketch after this answer). It'll make extracting the data harder than in point a).
c) Make the program download the data from your server - the license check will then be on your side. Make sure that no data is saved to the hard drive, because otherwise the user can again extract the data and save the file somewhere else.
d) (Preferred) Do not implement any security measures (a determined cracker will defeat them eventually, because at some point you have to store unencrypted STL data on disk or in memory, where it can be accessed). Instead, license your files correctly.
Remember, there is no security measure that cannot be broken. It's a lot more valuable for your customers that you spend time developing new features than implementing security measures, which will annoy legitimate users and eventually be ignored by unfair ones anyway.
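That said, for anyone who still wants to try option b), here is a minimal sketch of a container with an embedded expiry (the format, magic bytes, and names are all invented for the example; a plain expiry field like this is trivial to edit, so a real implementation would at least sign or encrypt the payload):

```csharp
using System;
using System.IO;
using System.Linq;

// A toy container: magic header, expiry timestamp, then raw STL bytes.
static class LicensedStlFile
{
    private static readonly byte[] Magic = { (byte)'L', (byte)'S', (byte)'T', (byte)'L' };

    public static void Write(string path, DateTime expiresUtc, byte[] stlData)
    {
        using var writer = new BinaryWriter(File.Create(path));
        writer.Write(Magic);
        writer.Write(expiresUtc.Ticks);
        writer.Write(stlData.Length);
        writer.Write(stlData);
    }

    public static byte[] Read(string path)
    {
        using var reader = new BinaryReader(File.OpenRead(path));
        var magic = reader.ReadBytes(4);
        if (!magic.SequenceEqual(Magic))
            throw new InvalidDataException("Not a licensed STL container.");

        var expiresUtc = new DateTime(reader.ReadInt64(), DateTimeKind.Utc);
        if (DateTime.UtcNow > expiresUtc)
            throw new InvalidOperationException("This file has expired.");

        int length = reader.ReadInt32();
        return reader.ReadBytes(length);
    }
}
```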
Files do not expire on their own (unless we're talking about faulty storage media) and access to them needs to be restricted by software or a combination of software and hardware.
If you plan to make the STL files openly available at any single point (e.g. when the user tries to open them in the viewer or editor), their content cannot be hidden or prevented from copying.
And even if you bundle them with a program that would extract them from itself or obtain them from your website when the editor starts and delete them when it exits (automatically), the editor may still be able to save a copy as a different file (it may even save a temporary/backup copy automatically).
One way to protect those files from copying is to make them available only within a program of yours and never outside of it, which may render the files totally useless if your program doesn't let the user determine whether they're good ("1 day, 2 hours, or whatever" suggests some sort of trial version). But even then they may still be extracted from it at run time by skillful hackers.
If the OS supports DRM for arbitrary files and in ways of interest to you, you might be able to use the OS DRM functionality to control file copying and lifetime. Unfortunately, I don't have practical knowledge of this, so I can't point you toward such a solution.
Another option is to distribute the files in the open, but embed into them some kind of watermarks, unique for each user/license and able to survive a certain amount of editing. This won't solve every problem, but if a copy starts circulating online, you will be able to tell who "leaked" it and go after them.
At any rate, all protection can be circumvented given enough time and skill. Just because you can't break it doesn't mean someone else won't be able to.

Editing large binary files

I'm busy with a little project which has a lot of data like images, text files and other things, and I'm trying to pack it all into one big file or a few big files so the program folder doesn't look messy.
But the problem is how I can edit these files. I've thought about the file structure, and it's going to be something like this:
[DWORD] Number of files
[DWORD]FileId
[STRING]FileName
[DWORD]FileSize
[DWORD]FileIndex
[BYTES]All the files
So the first part is to quickly get a list of all the files, and the FileIndex is the position in the binary file, so I can set the pointer to, for example, 300 and read the file.
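Reading that directory back in C# would look roughly like this (a sketch assuming one {FileId, FileName, FileSize, FileIndex} record per file and BinaryWriter-style length-prefixed strings):

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

class PackDirectory
{
    public record Entry(uint Id, string Name, uint Size, uint Index);

    public static List<Entry> Read(string packPath)
    {
        using var reader = new BinaryReader(File.OpenRead(packPath), Encoding.UTF8);

        uint count = reader.ReadUInt32();       // [DWORD] Number of files
        var entries = new List<Entry>();
        for (uint i = 0; i < count; i++)
        {
            uint id = reader.ReadUInt32();      // [DWORD] FileId
            string name = reader.ReadString();  // [STRING] FileName (length-prefixed)
            uint size = reader.ReadUInt32();    // [DWORD] FileSize
            uint index = reader.ReadUInt32();   // [DWORD] FileIndex: offset of the data
            entries.Add(new Entry(id, name, size, index));
        }
        return entries;
    }
}
```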
But if I want to create a patch and edit it, I would have to read all the bytes after the file I'm editing and write them all back, which could take ages with a couple of files.
The binary file could be a few hundred MBs once all the files are inserted.
So how do other programs do this? Games, for example, use these big files and also patch a lot. Is there some kind of trick to insert extra bytes more quickly?
There is no "trick" to inserting bytes in the middle of a file.
Usually solutions involve adding files to the end of the file, then switching their position in the index. Then you run into the problem of having to defragment the file. You can break files into large chunks which can mitigate some of the defragmentation woes, but then the files are not contiguous.
If you are dealing with non-static data, I would not recommend doing this unless you absolutely have to. I've seen absolutely brilliant software engineers take a considerable amount of time to write a reasonable implementation of this.
Using SQLite as a virtual file system can be a viable solution to this. But then again, so is putting the data files in another folder so it doesn't look "messy".
If at all possible, I'd probably package the data up into a zip file. This will not only clean up your directory, but (especially for the text files you mention) throw in some compression essentially for free. There are also, of course, quite a few existing tools and libraries for creating, examining, modifying, etc., a zip file.
Using zlib (for one example), most of the work is handled for you (e.g., as demonstrated in minizip).
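In C#, System.IO.Compression gives you the same convenience without native dependencies. A sketch of replacing one entry in a package (the entry name is just for the example):

```csharp
using System.IO;
using System.IO.Compression;

class PackagePatcher
{
    // Replaces (or adds) a single entry inside the package without
    // touching anything else in the program's directory.
    public static void ReplaceEntry(string packagePath, string entryName, byte[] newContent)
    {
        using var archive = ZipFile.Open(packagePath, ZipArchiveMode.Update);

        archive.GetEntry(entryName)?.Delete(); // remove the old version, if any

        var entry = archive.CreateEntry(entryName);
        using var stream = entry.Open();
        stream.Write(newContent, 0, newContent.Length);
    }
}
```

Note that Update mode rewrites the archive when it's disposed, so this is about convenience rather than speed for very large packages.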
The trick is to make patches by overwriting the data. Otherwise, there are systems available to manage large volumes of data, for example databases.
You can create a database file that will accompany your program and hold all your data there instead of in files. You can even embed the database engine in your application, with SQLite for example, or use external DBs like SQL Server, Oracle, or MySQL.
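For example, with the Microsoft.Data.Sqlite package a minimal blob store might look like this (table and column names invented for the sketch):

```csharp
using Microsoft.Data.Sqlite;

class BlobStore
{
    private readonly string _connectionString;

    public BlobStore(string dbPath) =>
        _connectionString = $"Data Source={dbPath}";

    public void Save(string name, byte[] data)
    {
        using var connection = new SqliteConnection(_connectionString);
        connection.Open();

        using var command = connection.CreateCommand();
        // Upsert: replacing one file never shifts the others around,
        // which is exactly the patching problem from the question.
        command.CommandText =
            @"CREATE TABLE IF NOT EXISTS files (name TEXT PRIMARY KEY, data BLOB);
              INSERT INTO files (name, data) VALUES ($name, $data)
              ON CONFLICT(name) DO UPDATE SET data = excluded.data;";
        command.Parameters.AddWithValue("$name", name);
        command.Parameters.AddWithValue("$data", data);
        command.ExecuteNonQuery();
    }

    public byte[] Load(string name)
    {
        using var connection = new SqliteConnection(_connectionString);
        connection.Open();

        using var command = connection.CreateCommand();
        command.CommandText = "SELECT data FROM files WHERE name = $name";
        command.Parameters.AddWithValue("$name", name);
        return (byte[])command.ExecuteScalar();
    }
}
```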
What you're describing is basically implementing your own file system. It's a tricky and very difficult task to make that effective.
You could treat the packing and editing program sort of like a custom memory allocator:
Use a minimum block size - when you add a file, use enough whole blocks to fit it. This automatically gives the files some room to grow without affecting the others.
When a file gets too big for its current allocation, move it to the end of the package.
Mark the freed blocks as free, and keep the offset of the head of the free list in the package header. When adding other files, first check whether there is a free block big enough for them.
When extending files past their current block, check whether the following block is on the free list.
If the free list gets too long (too much fragmentation), consolidate the package: move each file forward to start in the first free block. This will have to rewrite the whole file, but it would happen rarely.
Alternatively, instead of the simple directory you have, use something like a FAT: for each file, store a list of chunks and sizes. When you extend a file past its current allocation, add another chunk with the remainder. Defragment occasionally as needed.
Both of these would add a little overhead to the package, but leaving gaps is really the only alternative to rewriting the whole thing on every insert.
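A sketch of the bookkeeping behind the FAT-like variant (all structures invented for illustration; a real package would persist this index in its header and also write the actual bytes):

```csharp
using System.Collections.Generic;

// One contiguous run of bytes inside the package file.
struct Chunk
{
    public long Offset;   // where the run starts in the package
    public long Length;   // how many bytes of the file live here
}

class PackedFileEntry
{
    public string Name;
    // A file's data is the concatenation of its chunks, so it can
    // grow by appending a new chunk instead of shifting everything.
    public List<Chunk> Chunks = new List<Chunk>();
}

class ChunkedPackage
{
    private readonly List<PackedFileEntry> _files = new List<PackedFileEntry>();
    private long _endOfData; // next free offset at the tail of the package

    public void Grow(PackedFileEntry file, long extraBytes)
    {
        // Growing never moves existing data: claim a new run at the
        // end of the package and link it into the file's chunk list.
        file.Chunks.Add(new Chunk { Offset = _endOfData, Length = extraBytes });
        _endOfData += extraBytes;
    }
}
```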
There is no way to insert bytes into a file other than the one you described. This is independent of the programming language. It's just how file systems work...
You can overwrite parts of the file, but only as long as you respect the byte count.
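For example, a same-size patch in C# is just a seek and a write (the offset is whatever your FileIndex says):

```csharp
using System.IO;

class InPlacePatch
{
    // Overwrites bytes at a fixed offset. The replacement must be
    // exactly as long as the region it replaces, or all later data
    // (and every FileIndex pointing at it) would be corrupted.
    public static void Apply(string packPath, long offset, byte[] replacement)
    {
        using var stream = new FileStream(packPath, FileMode.Open, FileAccess.Write);
        stream.Seek(offset, SeekOrigin.Begin);
        stream.Write(replacement, 0, replacement.Length);
    }
}
```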
Have you thought about using a .zip file? I keep seeing formats out there where multiple files are stored as one, and the underlying file is really a zip file. The nice thing about this is that the zip library handles the low-level bit-tracking stuff for you.
A couple examples that come to mind:
A Word .docx file is really a zip (rename one to .zip, and you can open it -- it has whole folders in it)
The .xap file that Silverlight packages use is another one.
You can use managed shared memory, backed by a memory-mapped file. You still have to have sufficient address space for the whole file, but you don't need to copy the whole file into memory. You can use most standard facilities with a shared-memory allocator, though you may quickly find that specifying a custom allocator everywhere is a chore. The good news is that you don't need to implement it all yourself: you can take Boost.Interprocess, which already has all the necessary facilities for both Unix and Windows.
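Since the question is C#: the closest managed equivalent is System.IO.MemoryMappedFiles, which likewise lets you patch a region without streaming the whole file. A minimal sketch:

```csharp
using System.IO.MemoryMappedFiles;

class MappedPatch
{
    // Maps the package and writes the patch directly into its pages;
    // the OS pager loads only the touched region, not the whole file.
    public static void Apply(string packPath, long offset, byte[] replacement)
    {
        using var mmf = MemoryMappedFile.CreateFromFile(packPath);
        using var accessor = mmf.CreateViewAccessor(offset, replacement.Length);
        accessor.WriteArray(0, replacement, 0, replacement.Length);
    }
}
```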

Attaching arbitrary data to DirectoryInfo/FileInfo?

I have a site which is akin to SVN, but without the version control. Users can upload and download to Projects, where each Project has a directory (with subdirs and files) on the server. What I'd like to do is attach further information to files, like who uploaded them, how many times they've been downloaded, and so on. Is there a way to do this with FileInfo, or should I store it in a table that associates it with an absolute path or something? That way sounds dodgy and error-prone :\
It is possible to append data to arbitrary files with NTFS (the default Windows filesystem, which I'm assuming you're using). You'd use alternate data streams. Microsoft uses this for extended metadata like author and summary information in Office documents.
Really, though, the database approach is reasonable, widely used, and much less error-prone, in my opinion. It's not really a good idea to be modifying the original file unless you're actually changing its content.
As Michael Petrotta points out, alternate data streams are a nifty idea. Here's a C# tutorial with code. Really though, a database is the way to go. SQL Compact and SQLite are fairly low-impact and straightforward to use.
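If you do want to experiment with alternate data streams, a stream is addressed as filename:streamname. Modern .NET (Core and later) accepts that syntax directly on Windows; on .NET Framework you would need to P/Invoke CreateFile. A sketch (the stream name is invented for the example):

```csharp
using System.IO;

class FileMetadata
{
    // Writes metadata into an NTFS alternate data stream attached to
    // the file; the main content is untouched. Works on NTFS only,
    // and the stream is lost if the file is copied to FAT/exFAT,
    // zipped, or uploaded somewhere.
    public static void SetUploader(string path, string uploader) =>
        File.WriteAllText(path + ":uploader", uploader);

    public static string GetUploader(string path) =>
        File.ReadAllText(path + ":uploader");
}
```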
