Attaching arbitrary data to DirectoryInfo/FileInfo? - c#

I have a site which is akin to SVN, but without the version control. Users can upload to and download from Projects, where each Project has a directory (with subdirs and files) on the server. What I'd like to do is attach further information to files, like who uploaded them, how many times they've been downloaded, and so on. Is there a way to do this with FileInfo, or should I store it in a table that associates the data with an absolute path or something? That way sounds dodgy and error-prone :\

It is possible to attach arbitrary data to files with NTFS (the default Windows filesystem, which I'm assuming you're using). You'd use alternate data streams. Microsoft uses this for extended metadata like author and summary information in Office documents.
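For what it's worth, here's a minimal sketch from C#. The path and stream name are made up, and this assumes an NTFS volume plus a modern .NET runtime, whose relaxed path validation accepts the colon syntax; .NET Framework's FileStream rejects it, so there you'd have to P/Invoke CreateFile instead:

using System;
using System.IO;

// Sketch: stash the uploader's name in an alternate data stream
// attached to an uploaded file. Requires NTFS.
string path = @"C:\Projects\demo\report.docx";            // made-up path
File.WriteAllText(path + ":uploader", @"DOMAIN\alice");   // hidden stream
string uploader = File.ReadAllText(path + ":uploader");
Console.WriteLine(uploader);  // the main file's content and size are unchanged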
Really, though, the database approach is reasonable, widely used, and much less error-prone, in my opinion. It's not really a good idea to be modifying the original file unless you're actually changing its content.

As Michael Petrotta points out, alternate data streams are a nifty idea. Here's a C# tutorial with code. Really though, a database is the way to go. SQL Compact and SQLite are fairly low-impact and straightforward to use.
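If you do go the table route, keying on a path relative to the project root (rather than an absolute path) removes most of the dodginess, since moving a project folder then invalidates nothing. A rough sketch with Microsoft.Data.Sqlite (the table and column names are my own invention):

using Microsoft.Data.Sqlite;

using var conn = new SqliteConnection("Data Source=filemeta.db");
conn.Open();

var create = conn.CreateCommand();
create.CommandText =
    @"CREATE TABLE IF NOT EXISTS FileMeta (
        ProjectId     INTEGER NOT NULL,
        RelativePath  TEXT    NOT NULL,   -- relative to the project root
        UploadedBy    TEXT,
        Downloads     INTEGER DEFAULT 0,
        PRIMARY KEY (ProjectId, RelativePath))";
create.ExecuteNonQuery();

// Bump the download counter for one file.
var bump = conn.CreateCommand();
bump.CommandText =
    "UPDATE FileMeta SET Downloads = Downloads + 1 " +
    "WHERE ProjectId = $proj AND RelativePath = $path";
bump.Parameters.AddWithValue("$proj", 42);
bump.Parameters.AddWithValue("$path", "docs/spec.txt");
bump.ExecuteNonQuery();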

Related

Performance concerns when writing my own file system

I'm writing a file system using Dokan. What I want to achieve is to allow users to access files from multiple sources as if they were all in a local folder, i.e. a file can be available locally, at a remote location, or in memory.
Initially I was creating placeholders that describe where the actual file is available (like the Windows 8.1 OneDrive placeholders). When the user accesses a file, I read the placeholder first; knowing the real location of the file, I then read the real one and send the data back to the user's application.
After about an hour of coding I found this idea seriously flawed. If the real location of a file is on the Internet, this works. But if the file is available locally, I have to ask the hard drive for two files (the placeholder and the real file). And if the file is available in memory (users do this to improve performance), I still have to access the hard drive, which makes caching the file in RAM pointless.
So... I guess I have to write my own file table, like the NTFS MFT. The concept of a file table is straightforward, but I'm not sure I can write one that's as efficient as NTFS. Then I started considering a database, but I'm not sure that's a good idea either...
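To make that concrete, the sort of entry I imagine keeping in such a table looks something like this (all names are made up):

using System;
using System.Collections.Generic;

enum Source { LocalDisk, Remote, InMemory }

// One record per virtual file; the whole table lives in memory and is
// persisted separately, so resolving a path never touches the disk.
class FileEntry
{
    public Source Source;        // where the real bytes live
    public string LocalPath;     // used when Source == LocalDisk
    public Uri RemoteUri;        // used when Source == Remote
    public byte[] CachedBytes;   // used when Source == InMemory
    public long Length;
}

// Path lookup becomes a dictionary hit instead of a placeholder-file read.
var table = new Dictionary<string, FileEntry>(StringComparer.OrdinalIgnoreCase);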
What should I do?
Thanks!
PS. I only have very basic knowledge of File Systems.

Editing large binary files

I'm busy with a little project which has a lot of data, like images, text files and other things, and I'm trying to pack it all up into one big file (or multiple big files) so the program folder doesn't look messy.
But the problem is how to edit these files. I've thought about the file structure, and it's going to be something like this:
[DWORD] Number of files
[DWORD] FileId
[STRING] FileName
[DWORD] FileSize
[DWORD] FileIndex
[BYTES] All the files
So the first part is to quickly get a list of all the files, and FileIndex is the position in the binary file, so that I can set the pointer to, for example, 300 and read the file.
But if I want to create a patch and edit a file, I would have to read all the bytes after the file I'm editing and copy them all back, which could take ages with a couple of files.
The binary file could be a few hundred MB once all the files are inserted.
So how do other programs do this? Games, for example, use these big files and also patch a lot. Is there some kind of trick to insert extra bytes more quickly?
There is no "trick" to inserting bytes in the middle of a file.
Usually, solutions involve adding files to the end of the archive and then switching their position in the index. Then you run into the problem of having to defragment the archive. You can break files into large chunks, which mitigates some of the defragmentation woes, but then the files are not contiguous.
If you are dealing with non-static data, I would not recommend doing this unless you absolutely have to. I've seen absolutely brilliant software engineers take a considerable amount of time to write a reasonable implementation of this.
Using SQLite as a virtual file system can be a viable solution to this. But then again, so is putting the data files in another folder so it doesn't look "messy".
If at all possible, I'd probably package the data up into a zip file. This will not only clean up your directory, but (especially for the text files you mention) throw in some compression essentially for free. There are also, of course, quite a few existing tools and libraries for creating, examining, modifying, etc., a zip file.
Using zlib (for one example), most of the work is handled for you (e.g., as demonstrated in minizip).
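In C#, the built-in System.IO.Compression types cover the same ground. Here's a sketch of patching one entry inside a package; the file names are invented, and note that Update mode may buffer entries in memory, so it suits modest archives better than multi-gigabyte ones:

using System.IO;
using System.IO.Compression;

// Replace one entry inside the package without touching the rest.
using (var archive = ZipFile.Open("data.pak", ZipArchiveMode.Update))
{
    archive.GetEntry("textures/logo.png")?.Delete();   // drop the old version
    var entry = archive.CreateEntry("textures/logo.png");
    using (var stream = entry.Open())
    {
        byte[] patched = File.ReadAllBytes("logo_patched.png");
        stream.Write(patched, 0, patched.Length);
    }
}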
The trick is to make patches by overwriting data in place. Otherwise, there are systems designed to manage large volumes of data, for example databases.
You can create a database file to accompany your program and hold all your data there instead of in loose files. You can even embed the database engine in your application, with SQLite, for example, or use an external DBMS like SQL Server, Oracle, or MySQL.
What you're describing is basically implementing your own file system. It's a tricky and very difficult task to make that efficient.
You could treat the packing and editing program sort of like a custom memory allocator:
Use a minimum block size. When you add a file, use enough whole blocks to fit it. This automatically gives files some room to grow without affecting the others.
When a file gets too big for its current allocation, move it to the end of the package.
Mark the free blocks as free, and keep the offset to the head of the free list in the package header. When adding other files, first check whether there is a free block big enough for them.
When extending files past their current block, check to see if the following block is on the free list.
If the free list gets too long (too much fragmentation), consolidate the package: move each file forward to start in the first free block. This has to rewrite the whole file, but it would happen rarely.
Alternatively, instead of the simple directory you have, use something like a FAT: for each file, store a list of chunks and sizes. When you extend a file past its current allocation, add another chunk with the remainder. Defragment occasionally as needed.
Both of these would add a little overhead to the package, but leaving gaps is really the only alternative to rewriting the whole thing on every insert.
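Here's a sketch of the free-list bookkeeping described above; the block size, names, and first-fit policy are arbitrary choices, not a finished design:

using System.Collections.Generic;

record struct Extent(long Offset, int Blocks);

class Package
{
    const int BlockSize = 4096;
    public List<Extent> FreeList = new();
    public long EndOfPackage;          // current end of the package file

    // Reuse the first big-enough free extent, otherwise grow at the end.
    public long Allocate(long fileLength)
    {
        int needed = (int)((fileLength + BlockSize - 1) / BlockSize);
        for (int i = 0; i < FreeList.Count; i++)
        {
            if (FreeList[i].Blocks < needed) continue;
            long offset = FreeList[i].Offset;
            // Keep whatever is left of the extent on the free list.
            FreeList[i] = new Extent(offset + (long)needed * BlockSize,
                                     FreeList[i].Blocks - needed);
            if (FreeList[i].Blocks == 0) FreeList.RemoveAt(i);
            return offset;
        }
        long tail = EndOfPackage;
        EndOfPackage += (long)needed * BlockSize;
        return tail;
    }
}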
There is no way to insert bytes into a file other than the one you described. This is independent of the programming language; it's just how file systems work...
You can overwrite parts of the file, but only as long as you respect the byte count.
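For example, an in-place patch in C# works only if the replacement is exactly the same length as what it overwrites (the path and offset below are made up):

using System.IO;

byte[] patch = File.ReadAllBytes("patched_chunk.bin");
using (var fs = new FileStream("data.pak", FileMode.Open, FileAccess.Write))
{
    fs.Seek(300, SeekOrigin.Begin);    // the FileIndex recorded for this file
    fs.Write(patch, 0, patch.Length);  // same byte count as the original
}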
Have you thought about using a .zip file? I keep seeing formats out there where multiple files are stored as one, and the underlying file is really a zip file. The nice thing about this is that the zip library handles the low-level bit-tracking stuff for you.
A couple examples that come to mind:
A Word .docx file is really a zip (rename one to .zip, and you can open it -- it has whole folders in it)
The .xap file that Silverlight packages use is another one.
You can use managed shared memory, backed by a memory-mapped file. You still have to have sufficient address space for the whole file, but you don't need to copy the whole file into memory. You can use most standard facilities with a shared-memory allocator, though you may quickly find that specifying a custom allocator everywhere is a chore. The good news is that you don't need to implement it all yourself: you can take Boost.Interprocess, which already has all the necessary facilities for both Unix and Windows.
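Boost.Interprocess is a C++ library; the closest built-in C# facility is System.IO.MemoryMappedFiles, which gives you a window into the package without copying all of it into RAM. A small sketch (names and sizes invented):

using System.IO.MemoryMappedFiles;

// Map only the directory region of the package, not the whole file.
using (var mmf = MemoryMappedFile.CreateFromFile("data.pak"))
using (var view = mmf.CreateViewAccessor(0, 4096))   // a window, not the file
{
    uint fileCount = view.ReadUInt32(0);   // the [DWORD] "Number of files"
    view.Write(0, fileCount + 1);          // headers can be patched in place
}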

How do you suggest I approach this unique problem?

I have a website where I allow businesses to register the products they sell, one product at a time. A consumer can then go online, search for a product, and receive a list of all the shops where it's currently sold.
Although businesses can upload one product at a time, I want to allow them to mass-upload the things they offer.
I was thinking of using an Excel spreadsheet: have them download a template, fill it in, and upload the completed sheet.
Others have suggested telling them to create a CSV file, but that is counter-intuitive in my honest opinion. Most likely a secretary will be creating the product sheets, and she won't have a clue what a CSV is.
What is the best way to approach this?
Well, it partly depends on the businesses. If they are medium or large businesses, they'd probably rather submit the data via a web service anyway; then they don't have to get a human involved at all after the initial development. They can write an application that periodically pulls information from their (inevitable) database of products and posts it to your web service.
If you're talking about very small companies without their own IT departments, that's less feasible, and either Excel or CSV would be a better approach. (As Caladain says, it's pretty simple to export to CSV... but you should try from a number of different spreadsheet programs as they may well have different subtleties in their export format. Things like text encoding will be important as well.)
But here's a novel idea... how about you ask some sample companies what they would like you to do? Presumably you have some companies in mind already - if you don't, it's potentially going to be pretty hard to make sure you're really building the right thing.
Find out how they already store their product list, and how they'd want to upload it to you. Then consider how difficult that would be, and possibly go back to them with something which is almost as easy for them, but a lot easier for you to implement, etc.
While I personally don't like Excel very much, it seems to be the best accepted format to do such things (involving a manual process).
My experience is that CSV breaks easily; for instance, it uses the regional settings to determine the separator, which can cause incompatibilities on either the client or the server side. Also, many people just save the file in whatever Excel format is offered, because they don't know the difference.
Creating the files can be pretty easily done with some XSLT (e.g. create XMLSS format files, which are "XML Spreadsheet 2003" format).
You may also want to have a look at the Excel Data Reader on Codeplex for parsing the files.
Reading in an Excel file is actually pretty easy with ODBC. Tutorial on it.
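A sketch of the ODBC route: the exact driver name depends on what's installed on the server (this one ships with the classic Jet/Office drivers), [Sheet1$] is the worksheet name, and the column names below are invented template headers:

using System;
using System.Data.Odbc;

string connStr =
    @"Driver={Microsoft Excel Driver (*.xls)};DBQ=C:\uploads\products.xls";
using (var conn = new OdbcConnection(connStr))
using (var cmd = new OdbcCommand("SELECT * FROM [Sheet1$]", conn))
{
    conn.Open();
    using (var reader = cmd.ExecuteReader())
    {
        // Column names come from the header row of the template.
        while (reader.Read())
            Console.WriteLine("{0}: {1}", reader["ProductName"], reader["Price"]);
    }
}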

Access a settings/preferences file on a server

My application has historically stored per-user settings in an ini file on the same file server as the data it consumes, so that the settings roam when a user logs on from multiple computers. To do this we had a file that looked like:
[domain\username1]
value1=foo
value2=bar
[domain\username2]
value1=foo
value2=baz
For this release we're trying to migrate away from ini files, due to limitations in the Win32 ini read/write functions, without having to write a custom ini file parser.
I've looked at app.config and user settings files, and neither appears suitable: the former needs to be in the same folder as the executable, and the latter doesn't provide any means to create new values at runtime.
Is there a built in option I'm missing, or is my best path to write a preferences class of my own and use the framework's XML serialization to write it out?
I have found that the fastest way here is to just create an XML file that does what you want, then use XSD.exe to generate a class and serialize the data. It is fast, takes only a few lines of code, and works quite well.
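A minimal version of that approach, with a hand-written class standing in for the XSD.exe-generated one (the field names just mirror the ini keys above, and the path layout is made up):

using System.IO;
using System.Xml.Serialization;

public class UserSettings
{
    public string Value1;
    public string Value2;
}

public static class SettingsStore
{
    static readonly XmlSerializer Serializer = new XmlSerializer(typeof(UserSettings));

    // One XML file per user on the file server,
    // e.g. \\server\share\settings\username.xml (hypothetical layout).
    public static void Save(string path, UserSettings settings)
    {
        using (var stream = File.Create(path))
            Serializer.Serialize(stream, settings);
    }

    public static UserSettings Load(string path)
    {
        using (var stream = File.OpenRead(path))
            return (UserSettings)Serializer.Deserialize(stream);
    }
}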
Have you checked out or heard of Nini, a third-party ini handler? I found it quite easy to use and simple for reading from and writing to ini files.
For your purposes it would mean very few changes, and it's easier to use.
The conversion from ini to another format needs to be weighed up: the code impact, the ease of programming, and so on. (Nitpicking aside, changing the code to use XML may be easy, but it is limiting in that you cannot easily write to it.) What the benefit would be in ripping out the ini code and replacing it with XML is a question you have to decide.
There may well be knock-on effects, such as having to change and adapt the code. But for the foreseeable future: sure, ini is a bit outdated and old, but it is still in use. I cannot see Microsoft dropping the ini API support, as it is very much alive behind the scenes for driver installation; think of the inf files used to specify where drivers go and how they are installed. It is here to stay, as driver manufacturers have adopted it as the de facto standard way of distributing drivers.
Hope this helps,
Best regards,
Tom.

How do I store a rating in a song?

I want to be able to store information about a song that has been opened using my application. I would like the user to be able to give the song a rating, and have this rating loaded every time the user opens that file using my application.
I also need to know whether I should store the ratings in a database or in an XML file.
C# ID3 Library is a .NET class library for editing ID3 tags (v1 through v2.4). I would store the rating directly in the comments section of the mp3, since ID3v1 does not have many of the storage features that ID3v2 does. If you want to store additional information for each mp3, what about placing a unique identifier on the mp3 and then using that to do a database lookup?
I would be cautious about adding custom tags to mp3s, as it is an easy way to ruin a large library. Also, I have gone down this road before, and while I enjoyed the programming knowledge that came out of it, trying something like the iTunes SDK or Last.fm might be a better route.
I would use a single-file, zero-config database. SQL Server Compact in your case.
I don't think XML is a good idea. XML shines in data interchange and in storing very small amounts of information. In this case a user may rate thousands of tracks (I personally have, in online radios that allow ratings), and you may have lots of other information to store about each track.
Export and import using XML export procedures if you have to. Don't use it as your main datastore.
I would store it in the file itself, as that is easier to keep with the mp3. If all you're doing is storing ratings, would you consider setting the ID3 rating field instead?
For this type of very simple storage I don't think it really matters all that much. The pros of XML are that it's very easy to deploy and it's editable outside of your app. The con is that it's editable outside your application (could be good, could be bad, depends on your situation).
Maybe another option (just because you can ;-) is an OODBMS. Check out DB4Objects; it's seriously addictive and very, very cool.
As mentioned earlier, it is better to store such information in the media file itself. My suggestion is to use the TagLib# library for this (the best media metadata lib I could find). Very powerful and easy to use.
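For instance, a sketch (the path is made up; whether you piggy-back on the comment field, as suggested above, or use ID3v2's dedicated popularimeter rating frame is up to you):

using System;

// Requires the TagLib# (taglib-sharp) package.
var file = TagLib.File.Create(@"C:\music\song.mp3");
file.Tag.Comment = "rating=4";   // ride the comment field
file.Save();

// When the user opens the song again:
var reopened = TagLib.File.Create(@"C:\music\song.mp3");
Console.WriteLine(reopened.Tag.Comment);   // rating=4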
I would store the ratings in an XML file: that way it's easy to edit from the outside, easy to read in .NET, and you don't have to worry about shipping a database with your application for something this simple.
Something like this might work for you:
<Songs>
  <Song Title="{SongTitle}">
    <Path>{Song path}</Path>
    <Rating>3</Rating>
  </Song>
</Songs>
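Reading that back in .NET is then a few lines of LINQ to XML (the element names match the sketch above; "ratings.xml" is a made-up file name):

using System;
using System.Xml.Linq;

var doc = XDocument.Load("ratings.xml");
foreach (var song in doc.Root.Elements("Song"))
{
    string title = (string)song.Attribute("Title");
    int rating = (int)song.Element("Rating");
    Console.WriteLine($"{title}: {rating}");
}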
If the song format supports suitable metadata (e.g. MP3), then follow Kevin's advice of using the metadata. This is by far the best way of doing it, and it is what the metadata is intended for.
If not, then it really depends on your application. If you want to share the rating information, especially over a web service, then I would go for XML: it would be trivial to supply your XML listings as one big feed, for example.
XML (or most other text formats) also have the advantage that they can be easily edited by a human in a text editor.
A database would have the advantage if you had a more closed system, wanted speed and fast indexing, and/or had other tables to store as well (e.g. data about albums and bands).
