Background:
I'm trying to write a simple SoapExtension class to log inbound/outbound SOAP messages from an asmx web service. Following this article on MSDN, I have been able to get things working. However, I'd really like to understand why and how it works rather than just copying and pasting code.
The question:
What I'm struggling to grasp specifically is the handling of the I/O streams in the example. All the other articles I've read on the web handle the streams in an identical way: first getting a reference to the original stream, creating an in-memory "working" stream, and then swapping the contents as necessary.
First question: what is meant by "stream chaining" in this context? My understanding of streams is that writing to any stream will automatically write to the 'inner' streams in a pipeline. If that's the case, why is it necessary to manually copy contents from one stream to another?
Second question: in the example's Copy method they create a StreamReader and StreamWriter each time without disposing them. Is this not putting extra pressure on the GC? It doesn't seem like something you'd want on a high-traffic web service. I tried wrapping both in using statements, but disposing the reader/writer also closed the stream, which led to more serious errors. .NET 4 has the new Stream.CopyTo(Stream) methods, but what would be a better approach for .NET 3.5?
Well, by chaining streams you can have several streams that each do something different, applied in a chained sequence. For instance, you can have one stream that compresses the data and then another stream that encrypts the data (or the opposite if we are moving in the other direction).
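For instance, here's a minimal sketch of that idea using two real BCL streams; nothing here is specific to SoapExtension, it just shows writes flowing through a chain:

using System.IO;
using System.IO.Compression;
using System.Text;

// Writing to the outermost stream pushes data through the whole chain:
// GZipStream compresses the bytes, and the compressed bytes land in the file.
using (FileStream file = File.Create("data.bin.gz"))
using (GZipStream gzip = new GZipStream(file, CompressionMode.Compress))
{
    byte[] payload = Encoding.UTF8.GetBytes("hello");
    gzip.Write(payload, 0, payload.Length);
}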
As for ChainStream itself, well... There are lots of things to say about this one. I really recommend this article called Inside of Chainstream, which is extremely in-depth and also covers most of the questions you have.
The chaining is done in the framework. You get the original stream and return the stream where you put your modified result. The framework will chain this new stream into any other extensions.
It is implemented this way because the chaining works "backwards". Normally you add new functionality on top of streams but in this case you want to deal with the information fed into the original stream.
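Concretely, the usual pattern inside a SoapExtension looks something like this sketch (the field names are illustrative; the shape follows the MSDN sample):

private Stream oldStream;
private Stream newStream;

// The framework hands you the previous stream in the chain; you return the
// stream you want it (and any later extensions) to write into instead.
public override Stream ChainStream(Stream stream)
{
    oldStream = stream;
    newStream = new MemoryStream();
    return newStream;
}

That is also why the manual Copy is needed: your MemoryStream is a brand new buffer, not a wrapper around the original stream, so nothing flows between the two until you copy it yourself.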
Calling Close on a stream is the same as calling Dispose.
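As for the copy question on .NET 3.5: a plain byte-buffer loop does the same job as 4.0's Stream.CopyTo, avoids creating a StreamReader/StreamWriter pair entirely, and closes neither stream. A minimal sketch:

// Copies everything from 'source' to 'destination' without wrapping either
// stream, so neither one gets closed or disposed by the copy itself.
static void Copy(Stream source, Stream destination)
{
    byte[] buffer = new byte[4096]; // the buffer size is an arbitrary choice
    int read;
    while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
        destination.Write(buffer, 0, read);
}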
I am working in C#, and my program currently opens a file stream at launch and writes data to it in CSV format about every second. I am wondering: would it be more efficient to store this data in an ArrayList and write it all at once at the end, or to keep the file stream open and just write the data every second?
If the amount of data is "reasonably manageable" in memory, then write the data at the end.
If this is continuous, I wonder if an option could be to use something like NLog to write your CSV (create a specific log format), as it manages writes pretty efficiently. You would also need to set it to raise exceptions if there was an error.
You should consider using a BufferedStream instead. Write to the stream and allow the framework to flush to file as necessary. Just make sure to flush the stream before closing it.
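A rough sketch of that suggestion (the file name and buffer size are illustrative):

using (FileStream file = new FileStream("log.csv", FileMode.Append, FileAccess.Write))
using (BufferedStream buffered = new BufferedStream(file, 64 * 1024))
using (StreamWriter writer = new StreamWriter(buffered))
{
    writer.WriteLine("timestamp,value"); // cheap: this goes to the buffer, not the disk
    writer.Flush(); // flush whenever the data must actually reach the disk
}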
From what I learned in Operating Systems, writing to a file is a lot more expensive than writing to memory. However, your stream is most likely going to be cached, which means that under the hood all that file writing you are doing is actually happening in memory. The operating system handles the actual writing to the file asynchronously when the time is right. Depending on your application, there may be no need to worry about such micro-optimizations.
You can read more about why most languages take this approach under the hood here: https://unix.stackexchange.com/questions/224415/whats-the-philosophy-behind-delaying-writing-data-to-disk
This kind of depends on your specific case. If you're writing data about once per second it seems likely that you're not going to see much of an impact from writing directly.
In general writing to a FileStream in small pieces is quite performant because the .NET Framework and the OS handle buffering for you. You won't see the file itself being updated until the buffer fills up or you explicitly flush the stream.
Buffering in memory isn't a terrible idea for smallish data and shortish periods. Of course if your program throws an exception or someone kills it before it writes to the disk then you lose all that information, which is probably not your favourite thing.
If you're worried about performance then use a logging thread. Post objects to it through a ConcurrentQueue<> or similar and have it do all the writes on a separate thread. Obviously threaded logging is more complex. It's not something I'd advise unless you really, really need the extra performance.
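A minimal sketch of that idea, assuming .NET 4's BlockingCollection (which wraps a ConcurrentQueue by default; the file name is illustrative):

using System.Collections.Concurrent;
using System.IO;
using System.Threading;

BlockingCollection<string> queue = new BlockingCollection<string>();
Thread writerThread = new Thread(() =>
{
    using (StreamWriter writer = new StreamWriter("log.csv", true))
    {
        // All disk I/O happens on this one thread.
        foreach (string line in queue.GetConsumingEnumerable())
            writer.WriteLine(line);
    }
});
writerThread.Start();

// Producers just post lines; this call never touches the disk:
queue.Add("2018-01-01T00:00:01,42");

// On shutdown: stop accepting lines and wait for the writer to drain.
queue.CompleteAdding();
writerThread.Join();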
For quick-and-dirty logging I generally just use File.AppendAllText() or File.AppendAllLines() to push the data out. It takes a bit longer, but it's pretty reliable. And I can read the output while the program is still running, which is often useful.
I'm developing a multi-segment file downloader. To accomplish this I currently create as many temporary files on disk as there are segments (their number is fixed for the duration of the download). At the end I just create a new file f and copy all the segments' contents into f.
I was wondering if there's a better way to accomplish this. My ideal would be to create f at its full size initially and then have the different threads write directly onto their own portion. There need not be any kind of interaction between them. We can assume any of them will start at its own starting point in the file and then fill in information sequentially until its task is over.
I've heard about Memory-Mapped files (http://msdn.microsoft.com/en-us/library/dd997372(v=vs.110).aspx) and I'm wondering if they are the solution to my problem or not.
Thanks
Using the memory-mapped API is absolutely doable, and it will probably perform quite well; of course, some testing would be recommended.
If you want to look for a possible alternative implementation, I have the following suggestion.
Create a static stack data structure, where the download threads can push each file segment as soon as it's downloaded.
Have a separate thread listen for push notifications on the stack. Pop the file segments off the stack and save each one into the target file in a single-threaded way.
By following the above pattern, you have separated the download of file segments and the saving into a regular file, by putting a stack container in between.
Depending on the implementation of the stack handling, you will be able to implement this with very little thread locking, which will maximise performance.
The pro of this is that you have 100% control over what is going on and a solution that might be more portable (if that should ever be a concern).
The stack decoupling pattern can also be implemented fairly generically and might even be reused in the future.
The implementation of this is not that complex, and probably on par with the implementation needed around the memory-mapped API.
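A rough sketch of the decoupling, assuming .NET 4's BlockingCollection in place of a hand-rolled stack (the Segment type is illustrative):

using System.Collections.Concurrent;
using System.IO;

class Segment { public long Offset; public byte[] Data; }

static void WriteSegments(BlockingCollection<Segment> segments, string path)
{
    using (FileStream file = new FileStream(path, FileMode.OpenOrCreate, FileAccess.Write))
    {
        // Single-threaded writer: the download threads call segments.Add(...)
        // as each piece completes, and CompleteAdding() when all are done.
        foreach (Segment s in segments.GetConsumingEnumerable())
        {
            file.Seek(s.Offset, SeekOrigin.Begin);
            file.Write(s.Data, 0, s.Data.Length);
        }
    }
}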
Have fun...
/Anders
The answers posted so far are, of course, addressing your question, but you should also consider the fact that multi-threaded I/O writes will most likely NOT give you any performance gains.
The reason for multi-threading downloads is obvious and has dramatic results. When you try to combine the files, though, remember that you are having multiple threads manipulate a mechanical head on a conventional hard drive. In the case of SSDs you may see better performance.
A single thread writing sequentially can easily saturate an HDD's write capacity, and sequential writing IS by definition the fastest way to write to conventional disks.
If you believe otherwise, I would be interested to know why. I would rather concentrate on tweaking the write performance of a single thread by playing around with buffer sizes, etc.
Yes, it is possible, but the one precaution you need to take is to ensure that no two threads write to the same location in the file; otherwise the file content will be incorrect.
// Each downloading thread opens its own stream on the same target file;
// FileShare.Write lets the other segment writers have it open at the same time.
FileStream writeStream = new FileStream(destinationPath, FileMode.OpenOrCreate, FileAccess.Write, FileShare.Write);
writeStream.Position = startPositionOfSegments; // REMEMBER: this calculation is important
// A simple write: read from your source, then write at the segment's offset
writeStream.Write(ReadBytes, 0, bytesReadFromInputStream);
After each Write we used writeStream.Flush(); so that the buffered data gets written to the file, but you can change this according to your requirements.
Since you already have working code that downloads the file segments in parallel, the only change you need to make is to open the file stream as posted above, and instead of creating many segment files locally, just open a stream to the single target file.
The startPositionOfSegments value is very important; calculate it carefully so that no two segments overwrite each other's downloaded bytes at the same location in the file, otherwise the result will be incorrect.
The above procedure works perfectly fine at our end, but it can be a problem if your segment sizes are too small (we faced this too, but it was fixed after increasing the segment size). If you face any exceptions, you can also synchronize only the Write part.
I am building an ASP.NET web application that creates PowerPoint presentations on the fly. I have the basics working but it creates actual physical files on the hard disk. That doesn't seem like a good idea for a large multi-user web application. It seems like it would be better if the application created the presentations in memory and then streamed them back to the user. Instead of manipulating files should I be working with the MemoryStream class? I am not exactly sure I understand the difference between working with Files and working with Streams. Are they sort of interchangeable? Can anyone point me to a good resource for doing file type operations in memory instead of on disk? I hope I have described this well enough.
Corey
You are trying to make a decision that you think impacts the performance of your application based on a "doesn't seem like a good idea" measurement, which is barely scientific. It would be better to implement both and compare, but first you should list your concerns about each implementation.
Here are some ideas to start:
There is really not much difference between temporary files and in-memory streams. Both will have their content in physical memory if they are small enough, and both will hit the disk if there is memory pressure. Consider using delete-on-close temporary files if cleaning files up is your main concern (see the sketch after this list).
The OS already does a very good job of managing large files with caching; a pure in-memory solution would need to at least match that.
MemoryStream is not the best implementation for reasonably large streams due to its "all data is in a single byte array" contract (see my answer at https://stackoverflow.com/a/10424137/477420).
Managing multiple large in-memory streams (i.e. for multiple users) is "fun" on the x86 platform, and less of a concern on x64.
Some APIs simply don't provide a way to work with Stream-based classes and require a physical file.
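Here is the delete-on-close sketch mentioned above. The OS removes the file when the last handle is closed, so cleanup happens even if the application dies:

// Path.GetTempFileName() creates the file, so FileMode.Open is correct here.
FileStream temp = new FileStream(Path.GetTempFileName(),
    FileMode.Open, FileAccess.ReadWrite, FileShare.None,
    4096, FileOptions.DeleteOnClose);
// Use 'temp' like any other stream; the file disappears when it is closed.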
Files and streams are similar, yes. Both essentially stream a byte array...one from memory, one from the hard drive. If the API you are using allows you to generate a stream, then you can easily do that and serve it out to the user using the Response object.
The following code will take a PowerPoint memory object (you'll need to modify it for your own API, but you can get the general idea), save it to a MemoryStream, then set the proper headers and write the stream to the Response (which will then let the user save the file to their local computer):
// Generate the presentation with your API (these names follow the original
// example and will differ for your own library)
SaveFormat format = SaveFormat.PowerPoint2007;
Slideshow show = PowerPointWriter.Generate(report, format);
// Save into memory instead of onto disk
MemoryStream ms = new MemoryStream();
show.Save(ms, format);
// Send the bytes back to the browser as a file download
Response.Clear();
Response.Buffer = true;
Response.ContentType = "application/vnd.ms-powerpoint";
Response.AddHeader("Content-Disposition", "attachment; filename=\"Slideshow.ppt\"");
Response.BinaryWrite(ms.ToArray());
Response.End();
Yes, I would recommend the MemoryStream. Typically any time you access a file, you are doing so with a stream. There are many kinds of streams (e.g. network streams, file streams, and memory streams) and they all implement the same basic interface. If you are already creating the file in a file stream, instead of something like a string or byte array, then it should require very little coding changes to switch to a MemoryStream.
Basically, a stream is simply a way of working with large amounts of data where you don't have to, or can't, load all the data into memory at once. So, rather than reading or writing the entire set of data into a giant array or something, you open a stream, which gives you the equivalent of a cursor. You can move your current position to any spot in the stream and read or write from that point.
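A tiny sketch of that cursor idea:

using (MemoryStream ms = new MemoryStream(new byte[] { 10, 20, 30, 40, 50 }))
{
    ms.Position = 2;        // move the cursor to the third byte
    int b = ms.ReadByte();  // reads 30; Position advances to 3
}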
A decompression API that I am using has the following API:
Decode(Stream inStream, Stream outStream)
I'd like to create a wrapper around this API, such that I can create my own Stream class which offers up the decoded data.
Stream decodedStream = new BlaDecodeStream(inStream);
So that I can then use this stream as a parameter to the XmlReader constructor, in the same way one might use System.IO.Compression.GZipStream. As far as I can tell, the only other option is to set outStream to a MemoryStream or a FileStream and go in two hops. The files I am dealing with are enormous, so neither of these options is particularly attractive.
Before I go reinventing the wheel, is there any prior art that I might be able to draw from, or something in the BCL I might have missed? The CircularStream implementation here would go some of the way to helping, but I'm really looking for something similar that would block (as opposed to over/underrun) when the Stream's internal buffer is 'empty' when reading from it and block when the internal buffer is full when writing to it.
In this way it could serve as the outStream parameter and simultaneously (i.e. from another thread) be read from by the XmlReader.
I asked about a blocking stream reader a while ago. I implemented one of the suggestions and it works fine.
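For reference, here is a minimal sketch of a blocking stream along those lines. It assumes .NET 4's BlockingCollection; BlockingPipeStream and its members are illustrative, not BCL types. The decoding thread passes it as outStream and calls CompleteWriting() when Decode returns, while the XmlReader consumes it from another thread:

using System;
using System.Collections.Concurrent;
using System.IO;

// Write() blocks when the bounded buffer is full; Read() blocks while it is
// empty, and reports end-of-stream once CompleteWriting() has been called.
public class BlockingPipeStream : Stream
{
    private readonly BlockingCollection<byte[]> chunks = new BlockingCollection<byte[]>(16);
    private byte[] current;
    private int offset;

    public override void Write(byte[] buffer, int start, int count)
    {
        byte[] chunk = new byte[count];
        Buffer.BlockCopy(buffer, start, chunk, 0, count);
        chunks.Add(chunk); // blocks while 16 chunks are already queued
    }

    public void CompleteWriting() { chunks.CompleteAdding(); }

    public override int Read(byte[] buffer, int start, int count)
    {
        while (current == null || offset == current.Length)
        {
            if (!chunks.TryTake(out current, -1)) return 0; // writer finished, buffer drained
            offset = 0;
        }
        int n = Math.Min(count, current.Length - offset);
        Buffer.BlockCopy(current, offset, buffer, start, n);
        offset += n;
        return n;
    }

    public override bool CanRead { get { return true; } }
    public override bool CanWrite { get { return true; } }
    public override bool CanSeek { get { return false; } }
    public override void Flush() { }
    public override long Length { get { throw new NotSupportedException(); } }
    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }
    public override long Seek(long seekOffset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
}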
I'm currently writing a little toy assembler in C# (working through The Elements of Computing Systems book; really good book, by the way).
The assembler takes an input file path and removes junk lines (comments, etc.).
The file is then passed to a parser and finally to another module that creates the binary code.
This isn't too complicated, but I'd like to avoid writing a temporary file out to the filesystem every time one object has finished its processing of the input file.
I'd like to just pass the stream onto the next object.
I originally thought each class involved in the parsing/junk removal would implement IDisposable, but I think this means I can't pass the stream on to the next object for processing (the stream would be closed, unless I keep it all in one using statement?).
I think I'm missing something here, is there a simple way to pass streams between objects cleanly, or do I need a different approach?
Thanks in advance for any help!
In general, it is the responsibility of the consumer to properly dispose of a Disposable object. As such, if you pass off a Stream to another object, you shouldn't Dispose it - that would be the responsibility of the consumer.
So in the clear-cut scenarios, either you hold a reference to a Disposable object, in which case you should ensure that it is properly disposed; or you pass the reference to someone else and forget about it.
Then what about the cases where you need to hold a reference yourself, but still pass it along? In these cases, pass a copy of the disposable resource; this will allow you and the consumer to manage the lifetimes of the two instances independently of each other. However, if you get into this situation, you should reconsider your design, as I would call that a code smell.
If something else is using the stream after the assembler is done with it, the assembler shouldn't "own" the stream. The caller should either create a stream for the assembler (and subsequent modules) to use, or the assembler should return a new stream which it is then the caller's responsibility to close.
It would be instructive to see some more details on what your program's architecture looks like and what methods we are discussing here.
The way I did these projects in TECS is:
read each line in the file
trim the whitespace at the beginning and end of the line
if the line is blank or if it starts with // then go to the next line
otherwise, store the line in an array (in C#, I actually use a List<string> object)
Once I've gone through all of the lines, I can close my file stream and safely do my work on the array of lines.
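In C#, that pass looks roughly like this (a sketch; inputPath is illustrative):

List<string> lines = new List<string>();
using (StreamReader reader = new StreamReader(inputPath))
{
    string raw;
    while ((raw = reader.ReadLine()) != null)
    {
        string line = raw.Trim();                      // strip surrounding whitespace
        if (line.Length == 0 || line.StartsWith("//"))
            continue;                                  // skip blanks and comments
        lines.Add(line);
    }
} // the file stream is closed here; parsing works on the in-memory list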
Overall, I agree with the previous comments. However, if your model doesn't fit that, you could do what Microsoft did with XmlWriter: it accepts an XmlWriterSettings parameter when you create it, and one of the settings properties (CloseOutput) describes whether the writer should close the underlying stream when the writer is disposed.
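For example, with XmlWriterSettings.CloseOutput:

XmlWriterSettings settings = new XmlWriterSettings();
settings.CloseOutput = false; // disposing the writer leaves the underlying stream open
using (XmlWriter writer = XmlWriter.Create(stream, settings))
{
    // write the XML; 'stream' survives the writer's disposal and can be passed on
}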