.NET streams: returning vs. providing - C#

I have always wondered what the best practice for using a Stream class in C# .Net is. Is it better to provide a stream that has been written to, or be provided one?
For example:
public Stream DoStuff(...)
{
    var retStream = new MemoryStream();
    // Write to retStream
    return retStream;
}
as opposed to:
public void DoStuff(Stream myStream, ...)
{
    // Write to myStream directly
}
I have always used the former for the sake of lifecycle control, but I have a feeling that it is a poor way of "streaming" with Streams, for lack of a better word.

I would prefer "the second way" (operate on a provided stream) since it has a few distinct advantages:
You can have polymorphism (assuming as evidenced by your signature you can do your operations on any type of Stream provided).
It's easily abstracted into a Stream extension method now or later.
You clearly divide responsibilities. This method should not care about how to construct a stream, only about how to apply a certain operation to it.
Also, if you're returning a new stream (option 1), it would feel a bit strange that you would have to Seek again first in order to be able to read from it (unless you do that in the method itself, which is suboptimal once more since it might not always be required - the stream might not be read from afterwards in all cases). Having to Seek after passing an already existing stream to a method that clearly writes to the stream does not seem so awkward.
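To make the extension-method and Seek points concrete, here is a minimal sketch. The names WriteMessage and RoundTrip are made up for illustration and are not from the question; the method only assumes that the supplied stream is writable.

```csharp
using System;
using System.IO;
using System.Text;

public static class StreamExtensions
{
    // Hypothetical operation: writes a UTF-8 encoded message to whatever
    // stream the caller supplies. It never creates or closes the stream.
    public static void WriteMessage(this Stream destination, string message)
    {
        if (destination == null) throw new ArgumentNullException(nameof(destination));
        byte[] payload = Encoding.UTF8.GetBytes(message);
        destination.Write(payload, 0, payload.Length);
    }
}

public static class StreamExtensionsDemo
{
    // The caller owns the stream, so the caller decides whether to rewind it.
    public static string RoundTrip(string message)
    {
        using (var buffer = new MemoryStream())
        {
            buffer.WriteMessage(message);
            // The position now sits just past the written bytes,
            // so rewind before reading the data back.
            buffer.Seek(0, SeekOrigin.Begin);
            using (var reader = new StreamReader(buffer, Encoding.UTF8))
            {
                return reader.ReadToEnd();
            }
        }
    }
}
```

A caller that only wants the bytes on disk would pass a FileStream instead and never Seek at all.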

I see the benefit of Streams is that you don't need to know what you're streaming to.
In the second example, your code could be writing to memory, it could be writing directly to file, or to some network buffer. From the function's perspective, the actual output destination can be decided by the caller.
For this reason, I would prefer the second option.
The first function is just writing to memory. In my opinion, it would be clearer if it returned the actual memory buffer rather than a stream. The caller can then wrap it in a MemoryStream if they wish.
public byte[] DoStuff(...)
{
    var retStream = new MemoryStream();
    // Write to retStream
    return retStream.ToArray();
}

100% the second one. You don't want to make assumptions about what kind of stream they want. Do they want to stream to the network or to disk? Do they want it to be buffered? Leave these up to them.
They may also want to reuse the stream to avoid creating new buffers over and over. Or they may want to stream multiple things end-to-end on the same stream.
If they provide the stream, they have control over its type as well as its lifetime. Otherwise, you might as well just return something like a string or array. The stream isn't really giving you any benefit over these.
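As a rough illustration of the "multiple things end-to-end on the same stream" point, here is a sketch in which the caller owns one stream and hands it to a writer twice. ReportWriter and its WriteLine method are hypothetical names, and the MemoryStream could just as well be a FileStream or NetworkStream.

```csharp
using System.IO;
using System.Text;

public static class ReportWriter
{
    // Hypothetical writer: appends one line to the caller's stream.
    public static void WriteLine(Stream destination, string line)
    {
        // leaveOpen: true means disposing the StreamWriter flushes it
        // but does not close the stream the caller still owns.
        using (var writer = new StreamWriter(destination, new UTF8Encoding(false), 1024, leaveOpen: true))
        {
            writer.WriteLine(line);
        }
    }
}

public static class ReportWriterDemo
{
    // The caller controls the stream's type and lifetime and reuses it
    // for two consecutive writes.
    public static string WriteTwoReports()
    {
        using (var shared = new MemoryStream())
        {
            ReportWriter.WriteLine(shared, "first");
            ReportWriter.WriteLine(shared, "second");
            return Encoding.UTF8.GetString(shared.ToArray());
        }
    }
}
```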

Related

Efficiently write ReadOnlySequence to Stream

Reading data using a PipeReader returns a ReadResult containing the requested data as a ReadOnlySequence<byte>. Currently I am using this (simplified) snippet to write the data fetched from the reader to my target stream:
var data = (await pipeReader.ReadAsync(cancellationToken)).Buffer;
// lots of parsing, advancing, etc.
var position = data.GetPosition(0);
while (data.TryGet(ref position, out var memory))
{
    await body.WriteAsync(memory);
}
This seems to be a lot of code for such a basic task, which I would usually expect to be a one-liner in .NET. Analyzing the overloads provided by Stream I fail to see how this functionality can be achieved with less code.
Is there some kind of extension method I am missing?
Looks like what you are looking for is planned for .NET 7:
Stream wrappers for more types. Developers have asked for the ability to create a Stream around the contents of a ReadOnlyMemory, a ReadOnlySequence...
Not the greatest answer, but at least you can stop worrying that you are missing something obvious.
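Until something like that ships, the loop can be hidden behind a small extension method of your own. The sketch below is not a framework API; WriteSequenceAsync is a name invented here. It relies on ReadOnlySequence<byte> being enumerable as ReadOnlyMemory<byte> segments and on the Stream.WriteAsync overload that accepts a ReadOnlyMemory<byte>.

```csharp
using System;
using System.Buffers;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class ReadOnlySequenceStreamExtensions
{
    // Hypothetical helper: writes every segment of the sequence to the stream, in order.
    public static async ValueTask WriteSequenceAsync(
        this Stream destination,
        ReadOnlySequence<byte> source,
        CancellationToken cancellationToken = default)
    {
        foreach (ReadOnlyMemory<byte> segment in source)
        {
            await destination.WriteAsync(segment, cancellationToken).ConfigureAwait(false);
        }
    }
}
```

The call site then shrinks to something like await body.WriteSequenceAsync(data, cancellationToken);. With a PipeReader, make that call before AdvanceTo, since the buffer should not be used after the reader has been advanced.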

Why do most serializers use a stream instead of a byte array?

I am currently working on a socket server, and I was wondering
why serializers like
XmlSerializer
BinaryFormatter
Protobuf-net
DataContractSerializer
all require a Stream instead of a byte array?
It means you can stream to arbitrary destinations rather than just to memory.
If you want to write something to a file, why would you want to create a complete copy in memory first? In some cases that could cause you to use a lot of extra memory, possibly causing a failure.
If you want to create a byte array, just use a MemoryStream:
var memoryStream = new MemoryStream();
serializer.Write(foo, memoryStream); // Or whatever you're using
var bytes = memoryStream.ToArray();
So with an abstraction of "you use streams" you can easily work with memory - but if the abstraction is "you use a byte array" you are forced to work with memory even if you don't want to.
You can easily make a stream over a byte array, but a byte array is inherently size-constrained, whereas a stream is open-ended: it can be as big as you need. Some serialized output can be pretty enormous.
Edit: Also, if I need to implement some kind of serialization, I want to do it for the most basic abstraction, and avoid having to do it over multiple abstractions. Stream would be my choice, as there are stream implementations over lots of things: memory, disk, network and so forth. As an implementer, I get those for "free".
If you use a byte array or buffer, you are working temporarily in memory and you are limited in size.
A stream, by contrast, lets you store data on disk or send it to other computers over the internet, a serial port, etc. Streams often use buffers to optimize transmission speed.
So streaming is useful if you are dealing with a large file.
@JonSkeet's answer is the correct one, but as an addendum, if the issue you're having with making a temporary stream is "I don't like it because it's effort" then consider writing an extension method:
using System.IO;
using System.Xml.Serialization;

namespace Project.Extensions
{
    public static class XmlSerialiserExtensions
    {
        // Serialises obj into the caller-supplied, fixed-size buffer.
        public static void Serialise(this XmlSerializer serialiser, byte[] bytes, object obj)
        {
            using (var temp = new MemoryStream(bytes))
                serialiser.Serialize(temp, obj);
        }

        // Deserialises an object from the bytes in the buffer.
        public static object Deserialise(this XmlSerializer serialiser, byte[] bytes)
        {
            using (var temp = new MemoryStream(bytes))
                return serialiser.Deserialize(temp);
        }
    }
}
So you can go ahead and do
serialiser.Serialise(buffer, obj);
socket.Write(buffer);
Or
socket.Read(buffer);
var obj = serialiser.Deserialise(buffer);
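One caveat with the Serialise overload above: a MemoryStream wrapped around an existing array cannot grow, so the write fails if the serialized output is larger than the buffer. A variation that lets the stream size itself and returns the resulting array might look like the sketch below. The names SerializeToBytes and DeserializeFromBytes are made up here; they are not part of XmlSerializer.

```csharp
using System.IO;
using System.Xml.Serialization;

public static class XmlSerializerByteExtensions
{
    // Hypothetical helper: serializes into an expandable MemoryStream
    // and returns exactly the bytes that were written.
    public static byte[] SerializeToBytes(this XmlSerializer serializer, object value)
    {
        using (var temp = new MemoryStream())
        {
            serializer.Serialize(temp, value);
            return temp.ToArray();
        }
    }

    // Hypothetical helper: wraps the array in a read-only stream for deserialization.
    public static object DeserializeFromBytes(this XmlSerializer serializer, byte[] bytes)
    {
        using (var temp = new MemoryStream(bytes, writable: false))
        {
            return serializer.Deserialize(temp);
        }
    }
}
```

The trade-off is one extra allocation per call, in exchange for not having to guess the buffer size up front.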
Byte arrays were used more often when manipulating ASCII (i.e. 1-byte) strings of characters often in machine dependent applications, such as buffers. They lend themselves more to low-level applications, whereas "streams" is a more generalized way of dealing with data, which enables a wider range of applications. Also, streams are a more abstract way of looking at data, which allows considerations such as character type (UTF-8, UTF-16, ASCII, etc.) to be handled by code that is invisible to the user of the data stream.

Can I use multiple BinaryWriters on the same Stream?

Can I create a new BinaryWriter and write to a Stream while the stream is already being used by another BinaryWriter?
I need to write some data recursively, but I would like to avoid passing a BinaryWriter to a method as a parameter, as I need to pass a Stream instead. So each method that writes data to the stream may need to create its own BinaryWriter instance. But I don't know if this is right. For now, it works well on a FileStream, but I don't know if it could lead to unexpected results on users' machines.
I wrote a simple example of what I want to achieve. Is this use of the BinaryWriter wrong?
Example:
public Main()
{
    using (var ms = new MemoryStream())
    {
        // Write data on the stream.
        WriteData(ms);
    }
}

private void WriteData(Stream output)
{
    // Create and use a BinaryWriter to use only on this method.
    using (var bWriter = new BinaryWriter(output, Encoding.UTF8, true))
    {
        // Write some data using this BinaryWriter.
        bWriter.Write("example data string");
        // Send the stream to other method and write some more data there.
        WriteMoreData(output);
        // Write some more data using this BinaryWriter.
        bWriter.Write("another example data string");
    }
}

private void WriteMoreData(Stream output)
{
    // Create and use a BinaryWriter to use only on this method.
    using (var bWriter = new BinaryWriter(output, Encoding.Unicode, true))
    {
        // Write some data on this BinaryWriter.
        bWriter.Write("write even more example data here");
    }
}
Is this use of the BinaryWriter wrong?
It should work fine. BinaryWriter does no buffering itself, so each instance won't interfere with data written by other instances. You're passing true for the leaveOpen parameter, so when each instance is disposed, it won't close the underlying stream.
But "wrong" is to some degree in the eye of the beholder. I would say it's better to pass the BinaryWriter.
MemoryStream isn't buffered, but other types are. Each instance of BinaryWriter, when it's disposed, will flush the stream. This could be considered inefficient by some people, as it negates the benefit of the buffering, at least partially. Not an issue here, but may not be the best habit to get into.
In addition, each instance of BinaryWriter is going to create additional work for the garbage collector. If there's really only a few, that's probably not an issue. But if the real-world example involves a lot more calls, that could start to get noticeable, especially when the underlying stream is a MemoryStream (i.e. you're not dealing with some slow device I/O).
More to the point, I don't see any clear advantage to using multiple BinaryWriter instances on the same stream here. It seems like the natural, readable, easily-maintained thing to do would be to create a single BinaryWriter and reuse it until you're done writing.
Why do you want to avoid passing it as a parameter? You're already passing the Stream. Just pass the BinaryWriter instead. If you ever did need direct access to the underlying stream, it's always available via BinaryWriter.BaseStream.
Bottom line: I can't say there's anything clearly wrong per se with your proposal. But it's a deviation from normal conventions without (to me, anyway) a clear benefit. If you have a really good rationale for doing it this way, it should work. But I'd recommend against it.
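For comparison, here is roughly what the "just pass the BinaryWriter" version could look like. This is a sketch, not a drop-in replacement: SingleWriterExample and RoundTrip are names invented here, and using a single encoding (UTF-8) throughout is an assumption, since the original mixes UTF-8 and UTF-16 writers.

```csharp
using System.IO;
using System.Text;

public static class SingleWriterExample
{
    // One writer is created by the caller and passed down the call chain.
    public static void WriteData(BinaryWriter writer)
    {
        writer.Write("example data string");
        WriteMoreData(writer);
        writer.Write("another example data string");
    }

    private static void WriteMoreData(BinaryWriter writer)
    {
        writer.Write("write even more example data here");
    }

    // Writes the three strings, then reads them back to show the resulting layout.
    public static string[] RoundTrip()
    {
        using (var ms = new MemoryStream())
        {
            using (var writer = new BinaryWriter(ms, Encoding.UTF8, leaveOpen: true))
            {
                WriteData(writer);
            }

            ms.Position = 0;
            using (var reader = new BinaryReader(ms, Encoding.UTF8, leaveOpen: true))
            {
                return new[] { reader.ReadString(), reader.ReadString(), reader.ReadString() };
            }
        }
    }
}
```

With one writer there is also only one encoding to keep track of when the data is read back.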

c# MemoryStream vs Byte Array

I have a function which generates and returns a MemoryStream. After generation the size of the MemoryStream is fixed; I don't need to write to it anymore, only read from it, for example to write it to a MailAttachment or to a database.
What is the best way to hand the object around? MemoryStream or Byte Array? If I use MemoryStream I have to reset the position after read.
If you have to hold all the data in memory, then in many ways the choice is arbitrary. If you have existing code that operates on Stream, then MemoryStream may be more convenient, but if you return a byte[] you can always just wrap that in a new MemoryStream(blob) anyway.
It might also depend on how big it is and how long you are holding it for; MemoryStream can be oversized, which has advantages and disadvantages. Forcing it to a byte[] may be useful if you are holding the data for a while, since it will trim off any excess; however, if you are only keeping it briefly, it may be counter-productive, since it will force you to duplicate most (at an absolute minimum: half) of the data while you create the new copy.
So; it depends a lot on context, usage and intent. In most scenarios, "whichever works, and is clear and simple" may suffice. If the data is particularly large or held for a prolonged period, you may want to deliberately tweak it a bit.
One additional advantage of the byte[] approach: if needed, multiple threads can access it safely at once (as long as they are reading) - this is not true of MemoryStream. However, that may be a false advantage: most code won't need to access the byte[] from multiple threads.
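The "oversized" and "duplicate the data" points can be seen directly by comparing Length, Capacity, ToArray and TryGetBuffer. The sketch below uses a made-up helper (BufferHandOff.Inspect) and an arbitrary 100-byte payload; the exact capacity is an implementation detail, so it is only compared against the length.

```csharp
using System;
using System.IO;

public static class BufferHandOff
{
    // Hypothetical helper that reports how a MemoryStream's internal buffer
    // relates to the data actually written to it.
    public static (long Length, int Capacity, int CopyLength, int SegmentCount) Inspect()
    {
        using (var ms = new MemoryStream())
        {
            ms.Write(new byte[100], 0, 100);

            // ToArray allocates a new array trimmed to Length (a full copy of the data).
            byte[] copy = ms.ToArray();

            // TryGetBuffer exposes the internal, possibly oversized, buffer without copying.
            ms.TryGetBuffer(out ArraySegment<byte> segment);

            return (ms.Length, ms.Capacity, copy.Length, segment.Count);
        }
    }
}
```

Capacity is at least as large as Length and is usually larger, which is the excess that ToArray trims off at the cost of a copy.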
The MemoryStream class is designed for sequentially writing data to, and reading data from, a stream.
It keeps a current position, much like a file pointer. Random access is simulated by seeking, so a MemoryStream is not designed for accessing any item at any time.
A byte array allows random access to any element at any time, for as long as you hold a reference to it.
Like a byte[], a MemoryStream lives in memory (as the class name suggests). Because it is backed by a single byte[], its maximum size is about 2 GB.
Finally, use a byte[] if you need to access the data at an arbitrary index. Otherwise, MemoryStream is designed for working with something that requires a stream as input when all you have is in-memory data such as a string.
Use a byte[] because it's a fixed-size object, which makes memory allocation and cleanup easier, and it carries relatively little overhead, especially since you don't need the functions of the MemoryStream. Further, you want to get that stream disposed of ASAP so it can release any unmanaged resources it may be using.

Should I use exceptions in C# to enforce base class compatibility?

On one hand, I'm told that exceptions in C# are 'expensive', but on the other, I'm stuck on how to implement this.
My problem is this: I'm making a Stream derivative that wraps a NetworkStream. The problem I'm facing is with Read(byte[] buffer, int offset, int count). From the Stream docs for the function:
Returns:
... or zero (0) if the end of the stream has been reached.
The problem is that in the protocol I'm implementing, the remote side can send an "end of record" token or a "please respond" token. Obviously, if this happens at the start of the Read(), it causes problems: I need to return from the function, and I haven't read anything, so I need to return 0, which means the stream is finished, but it isn't. Is an EndOfRecordException or similar justified in this case? And if so, should it always be thrown when this token is encountered (at the start of the Read() call, making sure these tokens are always at the start by returning early) so that there is some sort of pattern to how these tokens should be handled?
Edit: For what it's worth, these tokens generally come through 3-10 times a second. At the most, I wouldn't expect more than 25 a second.
Exceptions aren't really all that expensive - but they also aren't necessarily the best way to manage expected/normal flow.
To me, it sounds like you aren't actually implementing a Stream - you are encapsulating a stream into a "reader". I might be inclined to write a protocol-specific reader class with suitable methods to detect the end of a record, or Try... methods to get data or return false.
It sounds like you shouldn't really be deriving from Stream if your class is concerned with records. Streams don't generally interpret their data at all - they're just a transport mechanism of data from one place to another.
There have been cases like ZipInputStream in Java which end up being very confusing when a single InputStream effectively has several streams within it, and you can skip between them. Such APIs have been awful to use in my experience. Providing a separate class to implement the "record splitting" which can provide a stream for the data within a record sounds cleaner to me. Then each stream can behave consistently with normal streams. No need for new exceptions.
However, I'm just guessing at your context based on the limited information available. If you could give more details of the bigger picture, that would help.
It's not such a big deal performance-wise, but still... Exceptions are intended for, well, exceptions. Situations that are "unusual". If that is the way the underlying stream behaves, then your stream should be able to handle it. If it can, it should handle it on its own. If not, you can have the user set some callback or something which will get called when you receive a "please respond" token.
I believe that Stream-derived class should deal only with streaming issues and adhere to Stream semantic contract. All higher-level logic (interpreting EOF and EOR tokens) should be placed in some other class.
Maybe you can create an enum that you return. This enum can contain items for EndOfRecord, EndOfStream, ReadOk or whatever you need.
The data that was actually read can be passed back through an out parameter.
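Combining the enum idea with the earlier suggestion of a separate reader class (rather than a Stream subclass) might look something like the sketch below. Everything protocol-specific here is an assumption made purely for illustration: the real token format isn't given in the question, so the sketch pretends the end-of-record token is a single byte (0x1E) that never occurs inside payload data. RecordReader and ReadStatus are invented names.

```csharp
using System;
using System.IO;

public enum ReadStatus { ReadOk, EndOfRecord, EndOfStream }

public sealed class RecordReader
{
    // Assumption for illustration only: a one-byte end-of-record token.
    private const byte EndOfRecordToken = 0x1E;

    private readonly Stream _inner;
    private bool _pendingEndOfRecord;

    public RecordReader(Stream inner)
    {
        _inner = inner ?? throw new ArgumentNullException(nameof(inner));
    }

    // Reads up to count payload bytes. A token is reported on its own call,
    // so neither a zero return value nor an exception is needed to signal it.
    public ReadStatus Read(byte[] buffer, int offset, int count, out int bytesRead)
    {
        bytesRead = 0;
        if (_pendingEndOfRecord)
        {
            _pendingEndOfRecord = false;
            return ReadStatus.EndOfRecord;
        }
        if (count == 0) return ReadStatus.ReadOk;

        while (bytesRead < count)
        {
            int next = _inner.ReadByte();
            if (next < 0) break; // underlying stream has ended
            if (next == EndOfRecordToken)
            {
                if (bytesRead == 0) return ReadStatus.EndOfRecord;
                _pendingEndOfRecord = true; // report the token on the next call
                break;
            }
            buffer[offset + bytesRead++] = (byte)next;
        }
        return bytesRead > 0 ? ReadStatus.ReadOk : ReadStatus.EndOfStream;
    }
}
```

Reading byte by byte keeps the sketch short; a real implementation over a NetworkStream would more likely read into an internal buffer and scan it for tokens.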
