Creating a partial (or bounded) FileStream in C#

I have a FileStream that consists of several files concatenated into one, and I have a list of the lengths of the files, so I can easily calculate the position and length of each one. What I want to create is an Open method that takes a file index and returns a stream containing only that file. Currently I've implemented this using a MemoryStream, but that forces me to copy the whole contained file (not the container, just that one file) into memory, and I don't want to do that.

So, what I would like to do is create a class that implements Stream, takes another stream plus an offset and a length parameter, and is then readable and seekable, except that Seek(0) should take you to the offset within the underlying stream. Like an adapter class. I was wondering if this is possible, or even a good idea, or if anyone has a better way to solve this problem. I realize that if I do it the way I just described I need to make sure that access to the underlying stream is synchronized, and that each open partial stream holds a private variable telling it where in the stream it currently is, but that should be doable, right? Has anyone done anything like this before? Or is there a simple .NET class I can just use? Any help would be appreciated.
Oh, and sorry for the bad English; I forgot to install my browser in English, so the spellchecker tells me everything is wrong.

If you're using .NET 4.0, you could use memory-mapped files. They do pretty much what you've described: you can map a "view" of a large file, specified by an offset and a length, into memory, and access just that part of the file using a Stream.
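For example, a single entry of the container could be exposed like this (a minimal sketch: the container path, offset and length come from your own index, and in real code the MemoryMappedFile itself must also be disposed when the view is no longer needed):

```csharp
using System.IO;
using System.IO.MemoryMappedFiles;

class ContainerReader
{
    static Stream OpenEntry(string containerPath, long offset, long length)
    {
        // Map the container file; nothing is copied into memory up front.
        var mmf = MemoryMappedFile.CreateFromFile(containerPath, FileMode.Open);

        // A view stream covering just one entry: position 0 of the returned
        // stream corresponds to `offset` in the container, and reads are
        // paged in on demand by the OS.
        return mmf.CreateViewStream(offset, length, MemoryMappedFileAccess.Read);
    }
}
```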
Otherwise, I think your approach sounds good. Just watch out for corner cases involving reading or writing beyond the boundaries of the intended file!
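The adapter described in the question could be sketched roughly like this. "SubStream" is a made-up name, the wrapper is read-only, and it assumes the underlying stream is seekable; each read re-seeks the base stream under a lock so several SubStreams can share it:

```csharp
using System;
using System.IO;

// A read-only, seekable window over a region of another stream.
public class SubStream : Stream
{
    private readonly Stream _base;
    private readonly long _offset;
    private readonly long _length;
    private long _position; // position relative to the start of the window

    public SubStream(Stream baseStream, long offset, long length)
    {
        _base = baseStream;
        _offset = offset;
        _length = length;
    }

    public override bool CanRead => true;
    public override bool CanSeek => true;
    public override bool CanWrite => false;
    public override long Length => _length;

    public override long Position
    {
        get { return _position; }
        set { _position = value; }
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        // Clamp the read so it never runs past the end of the window.
        long remaining = _length - _position;
        if (remaining <= 0) return 0;
        if (count > remaining) count = (int)remaining;

        lock (_base) // every SubStream re-seeks, so serialize access
        {
            _base.Seek(_offset + _position, SeekOrigin.Begin);
            int read = _base.Read(buffer, offset, count);
            _position += read;
            return read;
        }
    }

    public override long Seek(long offset, SeekOrigin origin)
    {
        switch (origin)
        {
            case SeekOrigin.Begin:   _position = offset; break;
            case SeekOrigin.Current: _position += offset; break;
            case SeekOrigin.End:     _position = _length + offset; break;
        }
        return _position; // Seek(0) lands on _offset in the base stream
    }

    public override void Flush() { }
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}
```

Note the corner case mentioned above is handled in Read by clamping `count` to the bytes remaining in the window.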

Related

How to resize a file, "trimming" its beginning?

I am implementing a file-based queue of serialized objects, using C#.
Push() will serialize an object as binary and append it to the end of the file.
Pop() should deserialize an object from the beginning of the file (this part I got working). Then, the deserialized part should be removed from the file, making the next object to be "first".
From the standpoint of the file system, that would just mean moving the "beginning of the file" pointer a few bytes further along on disk. The question is how to implement this in C#. Is it at all possible?
The easiest approach that I can see:
1) Stream out (like a log: just dump records into the file). Note: you'd need some delimiters and a consistent format for your 'file', based on what your data is.
2) Later, stream in: just read the file from the start, in one go, and process it without removing anything. That works fine as FIFO (first in, first out).
So, my suggestion: don't try to optimize by removing or skipping records; rather, regroup and use more files.
3) If you're worried about the scale of things, just 'partition' the data into small enough files, e.g. 100 or 1,000 records each (it depends; do some calculations).
You may need to make some sort of 'virtualizer' here, which maps the files and keeps track of your 'database' as it spans multiple files. The simplest is to just use the file system and check file times etc., or add some basic code to improve on that.
However, I think you may have problems if you have to ensure 'transactions', i.e. if things fail you need to keep track of where the file left off, retrace, etc.
That might be an issue, but you know best whether you really need that (how critical it is). You can always work per file, with smaller files: if processing fails, roll back and do the file again (or log the problem); if it succeeds, delete the file and carry on like that.
This is a very 'hand-made' approach, but it should get you going with a simple and not too demanding solution, like the one you're describing, or something along those lines.
I should probably add: you could also save yourself some trouble and use a portable database or something similar for this. The above was purely based on the idea of hand-coding the simplest solution (we could probably come up with something smarter, but it being late, this is what I have :).
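The "partition into small files" idea could be hand-coded roughly like this (all names here are hypothetical, and for simplicity each record gets its own file, so Pop never has to trim anything):

```csharp
using System;
using System.IO;
using System.Linq;

// A crude file-backed FIFO queue: one file per record, named so that
// lexicographic order equals insertion order. Deleting a whole file
// replaces "trimming the beginning" of one big file.
public class FileQueue
{
    private readonly string _dir;
    private long _counter; // tie-breaker in case two pushes share a tick

    public FileQueue(string dir)
    {
        _dir = dir;
        Directory.CreateDirectory(dir);
    }

    public void Push(byte[] record)
    {
        string name = Path.Combine(_dir,
            DateTime.UtcNow.Ticks.ToString("D19") + "_" +
            (_counter++).ToString("D10") + ".rec");
        File.WriteAllBytes(name, record);
    }

    public byte[] Pop()
    {
        // The smallest name is the oldest record.
        string oldest = Directory.GetFiles(_dir, "*.rec")
                                 .OrderBy(f => f)
                                 .FirstOrDefault();
        if (oldest == null) return null;

        byte[] data = File.ReadAllBytes(oldest);
        // "Rollback" is trivial: if processing fails, simply don't delete.
        File.Delete(oldest);
        return data;
    }
}
```

Batching several records per file (with length prefixes) would cut down on file-system overhead, at the cost of tracking a read offset within the current segment.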
Files don't work that way. You can trim off the end, but not the beginning; in order to mutate a file to remove content at the beginning you need to rewrite the entire file.
I expect you'll want to find some other way to solve your problem, because a linear file is totally inappropriate for representing a FIFO queue.

Search in a file and write the matched content to another file

I have a large txt file and want to search through it and output certain strings, for example, let's say two lines are:
oNetwork.MapNetworkDrive "Q:", xyz & "\one\two\three\four"
oNetwork.MapNetworkDrive "G:", zzz
From this I'd like to copy and output the Q:, G:, and the "\one\two\three\four" to another file.
What's the most efficient way of doing this?
There is ultimately only one way to read a text file. You're going to have to go line-by-line and parse the entire file to pick out the pieces you care about.
Your best bet is to read the file using a StreamReader (File.OpenText is a good way to get one). From there, just keep calling ReadLine and picking out the bits you care about.
The main way to increase efficiency is to make sure you only have to parse the file once. Save everything you care about, and only what you care about. As much as you can, act on the information in the file right away then throw it away - the less you have to store, the better. Do not use File.ReadAllText since it will read the entirety of the file into memory all at once.
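Put together, that might look like the sketch below. The file names and the exact regular expression are assumptions; adjust the pattern to whatever really delimits the pieces you need:

```csharp
using System.IO;
using System.Text.RegularExpressions;

class Extractor
{
    static void Main()
    {
        // Captures the drive letter (e.g. Q:) and, if present, the quoted
        // UNC-style path that follows on the same line.
        var pattern = new Regex(@"MapNetworkDrive\s+""([A-Z]:)""(?:.*?""(\\[^""]*)"")?");

        using (var reader = File.OpenText("input.txt"))
        using (var writer = File.CreateText("output.txt"))
        {
            string line;
            // One pass over the file, one line in memory at a time.
            while ((line = reader.ReadLine()) != null)
            {
                var m = pattern.Match(line);
                if (!m.Success) continue;

                writer.WriteLine(m.Groups[1].Value);     // e.g. Q: or G:
                if (m.Groups[2].Success)
                    writer.WriteLine(m.Groups[2].Value); // e.g. \one\two\three\four
            }
        }
    }
}
```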

represent Memory Stream as a physical file

I've ran into a bit of a stupid problem today:
In my project I have to use a library (that I can't replace). The problem is that I'm using MemoryStream instead of frequently saving to the HDD (because there are many files, and they are small in size, so it's perfect for MemoryStream). The trouble is that the library API is built around filesystem access, and one of its functions accepts only a direct path to a file.
How can I still pass a string (path) to the method so that it opens a new FileStream without actually touching the hard drive?
For example "\MEMORY\myfile.bin"?
Well, that's tough.
Basically, you have three possible solutions:
You can use a reflector to modify the library given.
You can inspect the appropriate method, and then, using some reflection magic, you might be able to modify the object at runtime (very much not recommended).
You can play around with system calls and the API: by going into low-level ring0 code, modify kernel32.dll to redirect I/O queries from your path to memory (maybe that's possible without ring0 access; I'm not sure).
Obviously, the most recommended option is to use a reflector to modify the given library; otherwise, I can't see a solution for you.
In response to the first comment, you can:
use a RAMDrive (a program which allocates a small chunk of system memory and exposes it as a partition)
If the file must exist on the disk (and only disk paths are accepted), then the main option is a virtual filesystem which lets you expose custom data as a filesystem. There exist several options, such as now-dead Dokan, our Solid File System OS Edition and Callback File System (see description of our Virtual Storage product line) and maybe Pismo File Mount would work (never looked at it closely).
It all depends on how the library is constructed.
If it's a 100% managed library that uses a FileStream, you are probably stuck.
If it takes the provided filename and calls the native WIN32 CreateFile function, it's possible to give it something other than a file, such as a named pipe.
To test quickly whether it's possible, pass @"\\.\pipe\random_name" to the method: if it responds by saying explicitly that it can't open pipes and filenames beginning with \\.\, well, sorry. On the other hand, if it says it can't find the file, you have a chance of making it work.
You can then create a NamedPipeServerStream and pass your library method the same name prepended with \\.\pipe\.
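A rough sketch of the server side (here a pipe client stands in for the library opening the path; "my_data" is an arbitrary pipe name, so the library would be handed \\.\pipe\my_data):

```csharp
using System.IO.Pipes;
using System.Text;
using System.Threading.Tasks;

class PipeDemo
{
    static void Main()
    {
        var server = new NamedPipeServerStream("my_data", PipeDirection.Out);

        // Feed the in-memory data to whoever opens the "file".
        var feeder = Task.Run(() =>
        {
            server.WaitForConnection(); // blocks until the path is opened
            byte[] data = Encoding.UTF8.GetBytes("contents of the in-memory file");
            server.Write(data, 0, data.Length);
            server.Dispose();
        });

        // In real use this would be: library.Process(@"\\.\pipe\my_data");
        // here a client plays that role so the example is self-contained.
        using (var client = new NamedPipeClientStream(".", "my_data", PipeDirection.In))
        using (var reader = new System.IO.StreamReader(client))
        {
            client.Connect();
            System.Console.WriteLine(reader.ReadToEnd());
        }

        feeder.Wait();
    }
}
```

This only works if the library hands the path straight to CreateFile; any up-front File.Exists check on its side will defeat it.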
You can't "represent" it as a file, but you could "convert" it to a file using a StreamWriter class.

Reducing String Size By Zipping And Storing In Object

Our application at work basically has to create over a million objects each night to run a numerical simulation involving some weather observations that were recorded during the day.
Each object contains a few string properties (and one very large XML property, about 2 MB). Because of the size of the large XML property we don't load it up, and instead prefer to go to the database when we need access to this XML blob (which we do for each object).
I was wondering if it makes sense to somehow retrieve the XML data (which is 2 MB), compress it in memory and store it in the object; this prevents us having to do a database query for each object when we come to process it.
I would much rather zip the data, store it in the object and at processing time, unzip and process
Is it possible to zip a string in process and how can I do this without creating millions of MemoryStreams / zip streams for each object?
I would think that compression is not a good idea - it adds quite an overhead to processing, which already appears to be quite intensive.
Perhaps a light-weight format would be better - JSON or a binary serialized object representing the data.
Without more detail, it is difficult to give a definite answer, or better options.
Well, there is DotNetZip which has a simple API so you can do something like this:
byte[] compressedProperty;

public string MyProperty
{
    get { return DeflateStream.UncompressString(compressedProperty); }
    set { compressedProperty = DeflateStream.CompressString(value); }
}
Not sure if it will work out performance wise for you though.
Update:
I only know the GZipStream and the DeflateStream class. Neither of them exposes a string interface. Even DotNetZip uses a stream under the hood when you call the functions above; it's just wrapped in a nice interface (which you could do with the System.IO.Compression classes on your own). Not sure what your problem is with streams.
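Such a wrapper over the built-in classes might look like this; the streams only exist briefly inside the helpers, so the calling code never touches them ("StringCompressor" is a made-up name):

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

// String-in, string-out helpers over System.IO.Compression.GZipStream.
static class StringCompressor
{
    public static byte[] Compress(string text)
    {
        var output = new MemoryStream();
        using (var gzip = new GZipStream(output, CompressionMode.Compress))
        {
            byte[] raw = Encoding.UTF8.GetBytes(text);
            gzip.Write(raw, 0, raw.Length);
        } // disposing the GZipStream flushes the remaining compressed bytes
        return output.ToArray();
    }

    public static string Decompress(byte[] compressed)
    {
        using (var gzip = new GZipStream(new MemoryStream(compressed),
                                         CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```

The per-call MemoryStreams are short-lived and cheap for the garbage collector; whether the CPU cost of compressing a million 2 MB blobs beats the database round-trips is something only a measurement can settle.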
If you really want to avoid streams then you probably have to roll your own compression. Here is a guy who rolled a simple Huffman encoder to encode strings in F#. I don't know how well it works, but if you want to avoid third-party libs and streams then you could give it a crack.

BinaryFormatter in C# a good way to read files?

I want to read a binary file which was created outside of my program. One obvious way in C# to read a binary file is to define class representing the file and then use a BinaryReader and read from the file via the Read* methods and assign the return values to the class properties.
What I don't like about the approach is that I manually have to write code that reads the file, even though the defined structure represents how the file is stored. I also have to keep the order correct when I read.
After looking around a bit I came across the BinaryFormatter, which can automatically serialize and deserialize objects in binary format. One great advantage would be that I could read and also write the file without creating additional code. However, I wonder if this approach is good for files created by other programs, and not just for serialized .NET objects. Take for example a graphics format file like BMP. Would it be a good idea to read the file with a BinaryFormatter, or is it better to read and write manually via BinaryReader and BinaryWriter? Or are there any other approaches which suit this better? I'm not looking for concrete examples, just advice on the best way to implement this.
You'd have to be very VERY lucky to find an external file format that happened to map perfectly to the format the BinaryFormatter puts out. The BinaryFormatter obviously adds information on the types/things you're serializing, as well as the data itself, whereas a "normal" binary file format will generally be "these bytes are this, then these bytes are this".
When I've done this in the past (reading SWF headers springs to mind recently) I've always just used a file stream and processed/mapped it manually.
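For the BMP example from the question, manual mapping with BinaryReader looks like this; the field order and widths come from the BMP format itself, and the file name is a placeholder:

```csharp
using System;
using System.IO;

class BmpHeader
{
    static void Main()
    {
        using (var reader = new BinaryReader(File.OpenRead("image.bmp")))
        {
            // Fields must be read in exactly the order the format defines.
            ushort magic = reader.ReadUInt16();        // 0x4D42 = "BM"
            if (magic != 0x4D42)
                throw new InvalidDataException("Not a BMP file");

            uint fileSize        = reader.ReadUInt32();
            reader.ReadUInt32();                        // two reserved 16-bit fields
            uint pixelDataOffset = reader.ReadUInt32();

            uint dibHeaderSize   = reader.ReadUInt32(); // start of the DIB header
            int width            = reader.ReadInt32();
            int height           = reader.ReadInt32();

            Console.WriteLine($"{width}x{height}, {fileSize} bytes, pixels at {pixelDataOffset}");
        }
    }
}
```

This is exactly the hand-written mapping the question wanted to avoid, but for externally defined formats there is no metadata in the file for BinaryFormatter to work from, so byte-by-byte reading is the realistic option.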
