.NET streams, passing streams between objects, best practices (C#)

I'm currently writing a little toy assembler in C# (working through The Elements of Computing Systems book; really good book, by the way).
The assembler takes an input file path and removes junk lines (comments, etc.).
The file is then passed to a parser, and finally to another module that creates the binary code.
This isn't too complicated, but I'd rather not have to write a temporary file to the filesystem every time one object has finished its processing of the input file.
I'd like to just pass the stream on to the next object.
I originally thought each class involved in the parsing/junk removal would implement IDisposable, but I think that means I can't pass the stream on to the next object for processing (the stream would be closed - unless I keep it all in one using statement?).
I think I'm missing something here, is there a simple way to pass streams between objects cleanly, or do I need a different approach?
Thanks in advance for any help!

In general, it is the responsibility of the consumer to properly dispose of a Disposable object. As such, if you pass off a Stream to another object, you shouldn't Dispose it - that would be the responsibility of the consumer.
So in the clear-cut scenarios, either you hold a reference to a Disposable object, in which case you should ensure that it is properly disposed; or you pass the reference to someone else and forget about it.
Then what about the cases where you need to hold a reference yourself, but still pass it along? In these cases, pass a copy of the Disposable resource - this will allow you and the consumer to manage the lifetimes of the two instances independently of each other. However, if you get into this situation, you should reconsider your design, as I would call that a code smell.
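A minimal sketch of that hand-off (the method and type names are illustrative, not from the question; uses System.IO, System.Linq and System.Text):

// Producer: creates the stream but never disposes it; ownership passes to the caller
static Stream StripJunk(string inputPath)
{
    var kept = File.ReadAllLines(inputPath)
                   .Select(line => line.Trim())
                   .Where(line => line.Length > 0 && !line.StartsWith("//"));
    return new MemoryStream(Encoding.UTF8.GetBytes(string.Join(Environment.NewLine, kept.ToArray())));
}

// Consumer: holds the reference, so it is the one that disposes
using (Stream cleaned = StripJunk("Add.asm"))
{
    // hand 'cleaned' to the parser here
}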

If something else is using the stream after the assembler is done with it, the assembler shouldn't "own" the stream. The caller should either create a stream for the assembler (and subsequent modules) to use, or the assembler should return a new stream which it is then the caller's responsibility to close.
It would be instructive to see some more details on what your program's architecture looks like and what methods we are discussing here.

The way I did these projects in TECS is:
read each line in the file
trim the whitespace at the beginning and end of the line
if the line is blank or if it starts with // then go to the next line
otherwise, store the line in an array (in C#, I actually use a List<string> object)
Once I've gone through all of the lines, I can close my file stream and safely do my work on the array of lines.
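In C#, that reading loop might look something like this (a rough sketch; Add.asm is just a placeholder file name):

using System.Collections.Generic;
using System.IO;

List<string> lines = new List<string>();
using (StreamReader reader = new StreamReader("Add.asm"))
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        line = line.Trim();                            // trim leading/trailing whitespace
        if (line.Length == 0 || line.StartsWith("//")) // skip blanks and comments
            continue;
        lines.Add(line);
    }
} // the file stream is closed here; work on 'lines' afterwards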

Overall, I agree with the previous comments. However, if your model doesn't fit that, you could do what Microsoft did with XmlWriter: it accepts an XmlWriterSettings parameter when you create it, and one of the properties of the settings object describes whether the writer should close the underlying stream when the writer is disposed.
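For example, with XmlWriterSettings.CloseOutput (in System.Xml):

XmlWriterSettings settings = new XmlWriterSettings();
settings.CloseOutput = false; // leave the underlying stream open when the writer is disposed

using (XmlWriter writer = XmlWriter.Create(stream, settings))
{
    // write the XML here
} // 'stream' is still open at this point and can be passed along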

Related

When should you discard the in and/or out buffers of a SerialPort?

I am having some small, esoteric issues with my usage of the System.IO.Ports.SerialPort class in my C# application. I have written a thread-safe class on top of the SerialPort class and am having issues with Reads and Writes that occur close together in time.
For instance, I might Write a command and then Read a response. This behavior is within a lock statement; once the lock releases, another Write might occur, and then another Write/Read pair. On my second Read I sometimes receive an empty String.
Now that is just to outline my issue while trying to ask an answerable question. I am simply looking for information about when you should use DiscardOutBuffer and DiscardInBuffer. Are there any conventions on when to use these functions? Also, are the buffers ever discarded automatically in the underlying SerialPort implementation?
I have to assume that my issues lie with the buffers: either the in buffer is discarded too early, or it is not cleared in time.
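For what it's worth, one common convention is to call DiscardInBuffer immediately before issuing a command, so stale bytes from a late response to an earlier command can't be mistaken for the new response. A sketch of the idea (portLock and command are assumed names; this is not verified against the code in question):

lock (portLock)
{
    port.DiscardInBuffer();            // drop any stale, late-arriving bytes
    port.WriteLine(command);
    string response = port.ReadLine(); // the response now can only belong to 'command'
}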

Having multiple simultaneous writers (no readers) to a single file. Is it possible to accomplish this in a performant way in .NET?

I'm developing a multiple-segment file downloader. To accomplish this task I'm currently creating as many temporary files on disk as I have segments (their number is fixed for the duration of the download). In the end I just create a new file f and copy all the segments' contents into f.
I was wondering if there isn't a better way to accomplish this. My ideal would be to create f at its full size up front and then have the different threads write directly to their own portions of it. There need not be any kind of interaction between them; we can assume each will start at its own starting point in the file and fill it sequentially until its task is over.
I've heard about Memory-Mapped files (http://msdn.microsoft.com/en-us/library/dd997372(v=vs.110).aspx) and I'm wondering if they are the solution to my problem or not.
Thanks
Using the memory-mapped API is absolutely doable and it will probably perform quite well - of course, some testing would be recommended.
If you want to look for a possible alternative implementation, I have the following suggestion.
Create a static stack data structure, where the download threads can push each file segment as soon as it's downloaded.
Have a separate thread listen for push notifications on the stack. Pop the file segments off the stack and save each segment into the target file in a single-threaded way.
By following the above pattern, you have separated the download of file segments and the saving into a regular file, by putting a stack container in between.
Depending on the implementation of the stack handling, you will be able to implement this with very little thread locking, which will maximise performance.
The pros of this are that you have 100% control over what is going on, and a solution that might be more portable (if that should ever be a concern).
The stack decoupling pattern can also be implemented fairly generically, and might even be reused in the future.
The implementation is not that complex, and probably on par with what would be needed around the memory-mapped API.
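A rough sketch of that decoupling, assuming .NET 4's BlockingCollection backed by a ConcurrentStack (Segment is an assumed helper type, not something from the question):

using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

class Segment
{
    public long Offset;
    public byte[] Data;
}

var segments = new BlockingCollection<Segment>(new ConcurrentStack<Segment>());

// Download threads push segments as they complete:
//   segments.Add(new Segment { Offset = ..., Data = ... });
// and the last producer calls segments.CompleteAdding().

// A single writer thread drains the stack and does all the file I/O:
Task writer = Task.Factory.StartNew(() =>
{
    using (var file = new FileStream("target.bin", FileMode.OpenOrCreate, FileAccess.Write))
    {
        foreach (Segment s in segments.GetConsumingEnumerable())
        {
            file.Position = s.Offset;             // seek to the segment's place in the file
            file.Write(s.Data, 0, s.Data.Length);
        }
    }
});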
Have fun...
/Anders
The answers posted so far are, of course, addressing your question, but you should also consider the fact that multi-threaded I/O writes will most likely NOT give you any performance gains.
The reason for multi-threading downloads is obvious and has dramatic results. When you try to combine the files, though, remember that you are having multiple threads manipulate a mechanical head on a conventional hard drive. In the case of SSDs you may see better performance.
A single thread writing SEQUENTIALLY can already saturate an HDD's write capacity, and sequential access IS by definition the fastest way to write to conventional disks.
If you believe otherwise, I would be interested to know why. I would rather concentrate on tweaking the write performance of a single thread by playing around with buffer sizes, etc.
Yes, it is possible, but the one precaution you need to take is to ensure that no two threads write to the same location in the file; otherwise the file content will be incorrect.
// One stream per thread over the same file; FileShare.Write lets the other threads open it too
FileStream writeStream = new FileStream(destinationPath, FileMode.OpenOrCreate, FileAccess.Write, FileShare.Write);
writeStream.Position = startPositionOfSegments; // REMEMBER: this calculation is important
// A simple write: read from your source, then write at the segment's offset
writeStream.Write(readBytes, 0, bytesReadFromInputStream);
After each Write we call writeStream.Flush() so that buffered data gets written to the file, but you can change that according to your requirements.
Since you already have working code that downloads the file segments in parallel, the only change you need to make is to open the file stream as posted above, and instead of creating many segment files locally, just open a stream to the single target file.
The startPositionOfSegments value is very important; calculate it carefully so that no two segments write their downloaded bytes to the same location in the file, otherwise the result will be incorrect.
The above procedure works perfectly fine at our end, but it can be a problem if your segment sizes are too small (we faced that too, but it was fixed after increasing the segment size). If you hit any exceptions, you can also synchronize just the Write part.

Behavior of disposed object

I have a class that implements the IDisposable interface, and I don't know what behavior I should implement. Should an ObjectDisposedException be thrown for every method call on the class after Dispose has been called, or should it only be thrown by specific members, such as those that access the disposed resources?
I tested the Bitmap object (just an example):
Bitmap b = new Bitmap(100, 100);
b.Dispose(); // if I remove this line, the console displays: Format32bppArgb
Console.WriteLine(b.PixelFormat);
Console.ReadKey();
And the console displays: DontCare
So no exception was thrown; the Bitmap object allows the PixelFormat property to be used after I called Dispose. Should I follow this behavior?
My philosophy on this and many other issues is "do what makes sense".
In some cases, it may be very reasonable for certain class members to be used after a class has released its resources. Indeed, some scenarios may require such use. For example, if an object manages asynchronous transactions over a network connection, one might ask it to shut down and then, after it has done so, ask it how many transactions had been processed, whether any had been left dangling, etc. The final values of such statistics could not be known until after shutdown is complete, and there is conceptually nothing wrong with asking an object to shut down and then asking it for historical information relating to things it has already done.
While one might argue that Close should shut down the connection while allowing the use of properties that report historical information, while Dispose should shut things down and disallow the use of such properties, I regard such a distinction as unhelpful. Among other things, one may wish for the connection to release all resources associated with it (something Close might refrain from doing, in the interest of allowing a "reopen" request). Further, in cases where there's no other difference in behavior between Close and Dispose, I don't see any need to require two separate methods purely so Dispose can invalidate the statistical data.
In some sense, many IDisposable objects may be viewed as having two parts: an entity which interacts with outside resources, and an entity which interacts with managed code and may have limited functionality by itself. While the "separation of concerns" principle would suggest that the two parts should be separate objects (and indeed, there are times when such a split can be helpful), in many cases client code is going to want to hold a single reference which can serve both purposes. That reference is going to have to implement IDisposable, but disposal shouldn't destroy the managed-code side of things.
As an example, consider the WinForms Font class. That class encapsulates two things: (1) a collection of information about a font (typeface, size, style, etc.), and (2) a GDI font handle. When a Font is Disposed, it can no longer be used for drawing text, but it does not forget the typeface, style, etc. Given a Disposed font, it is possible to construct a new font using that information from an old one. Unfortunately, most of the properties that would allow such information to be read out are explicitly invalidated by Dispose, which means that in many cases if one wants to produce a font which is similar to an existing-but-disposed Font but has some changes, one must construct a new font with information copied from the old one, construct another new font based upon that one, and then Dispose the first new font that was created. It might have been helpful to have a FontDescription class which held information related to typestyle, etc. so that in cases where one wanted to hold a description of a font but didn't need a GDI handle, the font description could be stored in a non-disposable class, but that's not how the classes were designed.
"...or should it only throw an exception in specific methods, like data access to disposed resources?"
That's automatic: a class that has a finalizer should throw in a case like this. After all, the method is going to access an operating system object that's no longer alive, and that is going to produce an error. That error is better reported with a clear ObjectDisposedException than with a mystifying one produced from an operating system error code.
The Bitmap example you gave is a very sad one, but not uncommon for the GDI+ classes; in general they have very poor error handling. Don't let it set an example for you.
The key phrase in the previous paragraph was "class that has a finalizer". Your class should not have a finalizer, so whether you throw yourself, instead of leaving it up to the methods of the disposable class you encapsulate, is debatable. In general you should avoid it; it tends to clutter your code for little real benefit. But feel free to do so if you wrap a crummy class like Bitmap that returns bad data.
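If you do decide to throw yourself, the conventional guard looks something like this (a sketch of the standard pattern):

private bool disposed;

public void Dispose()
{
    // release the encapsulated disposable resources here
    disposed = true;
}

private void ThrowIfDisposed()
{
    if (disposed)
        throw new ObjectDisposedException(GetType().FullName);
}

public void DoWork()
{
    ThrowIfDisposed(); // guard every member that needs the live resources
    // ...
}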
After calling Dispose, setting the reference to null is the approach I generally follow. Then you do not need to throw any exception yourself, since a NullReferenceException will be thrown, which seems to me the proper way.
When an object is null, it does not really matter whether it is null because it was disposed, null because it was never initialized, or null because it was explicitly set to null. The consumers should only know that it is null, not the underlying reason for its being null.

C# extending SoapExtension

Background:
I'm trying to write a simple SoapExtension class to log inbound/outbound SOAP messages from an asmx web service. Following this article on MSDN, I have been able to get things working. However, I'd really like to understand why/how it's working rather than just copying and pasting code.
The question:
What I'm struggling to grasp specifically is the handling of the IO streams in the example. All the other articles I've read on the web handle the streams in an identical way: first getting a reference to the original stream, creating an in-memory "working" stream, and then swapping the contents as necessary.
My first question is: what is meant by "stream chaining" in this context? My understanding of streams is that writing to any stream will automatically write to the 'inner' streams in a pipeline. If that's the case, why is it necessary to manually copy contents from one stream to another?
My second question is: in the example's Copy method, they create a StreamReader and a StreamWriter each time without disposing them - is this not putting extra pressure on the GC? It doesn't seem like something you'd want on a high-traffic web service. I tried wrapping both in using statements, but disposing the reader/writer also closed the stream, which led to more serious errors. .NET 4 has the new Stream.CopyTo(Stream) methods, but what would be a better approach for .NET 3.5?
Well, by chaining streams you can basically have different streams that do different things, in a chained sequence. For instance, you can have one stream that compresses the data, and then another stream that encrypts the data (or the opposite if we are moving in the other direction).
As for ChainStream itself, well... There are lots of things to say about this one. I really recommend this article called Inside of Chainstream, which is extremely in-depth and also covers most of the questions you have.
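To make the chaining idea concrete, here is a generic compress-then-encrypt chain (a sketch using System.IO.Compression and System.Security.Cryptography, unrelated to SoapExtension itself):

using (Aes aes = Aes.Create())
using (FileStream file = File.Create("data.bin"))
using (CryptoStream encrypt = new CryptoStream(file, aes.CreateEncryptor(), CryptoStreamMode.Write))
using (GZipStream compress = new GZipStream(encrypt, CompressionMode.Compress))
{
    byte[] payload = new byte[] { 1, 2, 3 };
    // each byte is compressed, then encrypted, then written to disk
    compress.Write(payload, 0, payload.Length);
}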
The chaining is done in the framework. You get the original stream and return the stream where you put your modified result. The framework will chain this new stream into any other extensions.
It is implemented this way because the chaining works "backwards". Normally you add new functionality on top of streams but in this case you want to deal with the information fed into the original stream.
Calling Close on a stream is the same as calling Dispose.
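As for the .NET 3.5 part of the question: a plain buffered copy loop does what Stream.CopyTo does in .NET 4 and avoids the StreamReader/StreamWriter allocations entirely (a minimal sketch):

static void CopyStream(Stream input, Stream output)
{
    byte[] buffer = new byte[8192];
    int read;
    while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
        output.Write(buffer, 0, read);
}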

C# implementation of PushbackInputStream

I need a C# implementation of Java's PushbackInputStream. I have made my own very basic one, but I wondered if there was a well-tested and decently performing version already available somewhere. As it happens, I always push back the same bytes I read, so really it just needs to be able to reposition backwards, buffering up to a number of bytes I specify (like Java's BufferedInputStream with the mark and reset methods).
Update: I should add that I can't simply reposition the stream, as CanSeek may be false (e.g. when the input stream is a NetworkStream).
The problem with pushing data back into a stream is that any readers that sit on top of the stream may already have a local buffer of data. This makes this approach very brittle. Personally, I would try to avoid this scenario, and use data constructs where I either don't need to push back, or can use single-byte Peek etc.
You need to build a wrapper class that either functions as a stream but keeps a buffer of the last X bytes, so you can seek back at least a limited distance, or that isn't a stream at all, so you can indeed "push data back into the input stream".
Either way you're going to have to write something yourself.
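A minimal sketch of such a wrapper might look like this (illustrative only, not a tested implementation; only Read and Unread are fleshed out):

using System;
using System.Collections.Generic;
using System.IO;

public class PushbackStream : Stream
{
    private readonly Stream inner;
    private readonly Stack<byte> pushed = new Stack<byte>();

    public PushbackStream(Stream inner) { this.inner = inner; }

    // Push bytes back; they will be returned by the next Read, in their original order
    public void Unread(byte[] buffer, int offset, int count)
    {
        for (int i = offset + count - 1; i >= offset; i--)
            pushed.Push(buffer[i]);
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        int n = 0;
        while (n < count && pushed.Count > 0)  // serve pushed-back bytes first
            buffer[offset + n++] = pushed.Pop();
        if (n < count)                         // then fall through to the real stream
            n += inner.Read(buffer, offset + n, count - n);
        return n;
    }

    public override bool CanRead { get { return true; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return false; } }
    public override void Flush() { }
    public override long Length { get { throw new NotSupportedException(); } }
    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
    public override void Write(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
}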
Can't you just use a System.IO.Stream and seek backwards after reading from the current position?
stream.Seek(-1, System.IO.SeekOrigin.Current)
where -1 could be a variable indicating how far you want to go back.
So long as the stream indicates that it supports seeking (CanSeek is true), then
stream.Seek(-offset, System.IO.SeekOrigin.Current)
will be fine.
