Suppose I have this C# code:
using (MemoryStream stream = new MemoryStream())
{
    StreamWriter normalWriter = new StreamWriter(stream);
    BinaryWriter binaryWriter = new BinaryWriter(stream);
    foreach(...)
    {
        binaryWriter.Write(number);
        normalWriter.WriteLine(name); //<~~ easier to read afterward.
    }
    return stream.ToArray();
}
My questions are:
Do I need to call Flush inside the loop to preserve order?
Is returning MemoryStream.ToArray() legal? I'm using the using block as a convention, and I'm afraid it will mess things up.
Scratch the previous answer - I hadn't noticed that you were using two wrappers around the same stream. That feels somewhat risky to me.
Either way, I'd put the StreamWriter and BinaryWriter in their own using blocks.
Oh, and yes, it's legal to call ToArray() on the MemoryStream - the data is retained even after it's disposed.
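To make the ToArray() point concrete, here is a minimal sketch (the class and method names are just for illustration) that copies the contents out only after the using block has disposed the stream:

```csharp
using System;
using System.IO;

public static class ToArrayAfterDispose
{
    // Writes three bytes, lets the using block dispose the stream,
    // and only then copies the contents out.
    public static byte[] Build()
    {
        var stream = new MemoryStream();
        using (stream)
        {
            stream.Write(new byte[] { 1, 2, 3 }, 0, 3);
        }
        // The stream is disposed at this point, but ToArray() still works.
        return stream.ToArray();
    }

    public static void Main()
    {
        Console.WriteLine(BitConverter.ToString(Build())); // 01-02-03
    }
}
```

Reading or writing the disposed stream would throw; ToArray() is one of the members that is documented to keep working after the stream is closed.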
If you really want to use the two wrappers, I'd do it like this:
using (MemoryStream stream = new MemoryStream())
{
    using (StreamWriter normalWriter = new StreamWriter(stream))
    using (BinaryWriter binaryWriter = new BinaryWriter(stream))
    {
        foreach(...)
        {
            binaryWriter.Write(number);
            binaryWriter.Flush();
            normalWriter.WriteLine(name); //<~~ easier to read afterward.
            normalWriter.Flush();
        }
    }
    return stream.ToArray();
}
I have to say, I'm somewhat wary of using two wrappers around the same stream though. You'll have to keep flushing each of them after each operation to make sure you don't end up with odd data. You could set the StreamWriter's AutoFlush property to true to mitigate the situation, and I believe that BinaryWriter currently doesn't actually require flushing (i.e. it doesn't buffer any data) but relying on that feels risky.
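As a rough illustration of the "odd data" problem, the sketch below (names are hypothetical) writes a marker byte and the letter "A" for each of two items. With AutoFlush off, the text sits in the StreamWriter's buffer and lands after all the binary data; with AutoFlush on, the writes interleave as intended.

```csharp
using System;
using System.IO;
using System.Text;

public static class FlushOrderDemo
{
    // Writes one marker byte followed by the text "A" for each of two items.
    public static byte[] Write(bool autoFlush)
    {
        using (var stream = new MemoryStream())
        {
            // UTF8Encoding(false) means no byte order mark; "A" encodes to the single byte 0x41.
            var text = new StreamWriter(stream, new UTF8Encoding(false)) { AutoFlush = autoFlush };
            var binary = new BinaryWriter(stream);
            for (byte i = 1; i <= 2; i++)
            {
                binary.Write(i);
                binary.Flush();
                text.Write("A");
            }
            text.Flush(); // push out anything still buffered in the StreamWriter
            return stream.ToArray();
        }
    }

    public static void Main()
    {
        Console.WriteLine(BitConverter.ToString(Write(false))); // 01-02-41-41
        Console.WriteLine(BitConverter.ToString(Write(true)));  // 01-41-02-41
    }
}
```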
If you have to mix binary and text data, I'd use a BinaryWriter and explicitly write the bytes for the string, fetching it with Encoding.GetBytes(string).
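Something along these lines, as a sketch only: the length prefix before the string bytes is my own assumption about the framing (you need some way to know where the string ends when reading it back), and UTF-8 is an arbitrary choice of encoding.

```csharp
using System;
using System.IO;
using System.Text;

public static class SingleWriterDemo
{
    // Writes one record: the number, then the name as length-prefixed UTF-8 bytes.
    public static byte[] WriteRecord(int number, string name)
    {
        using (var stream = new MemoryStream())
        using (var writer = new BinaryWriter(stream))
        {
            byte[] nameBytes = Encoding.UTF8.GetBytes(name);
            writer.Write(number);           // 4 bytes, little-endian
            writer.Write(nameBytes.Length); // length prefix (assumed framing)
            writer.Write(nameBytes);        // the raw string bytes
            writer.Flush();
            return stream.ToArray();
        }
    }

    public static void Main()
    {
        Console.WriteLine(BitConverter.ToString(WriteRecord(7, "Bob")));
        // 07-00-00-00-03-00-00-00-42-6F-62
    }
}
```

With a single writer there is only one buffer involved, so the ordering question goes away.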
Update
Never mind this answer, I got confused with the writers...
No, the order will be preserved (update: maybe not). Flush is useful/needed in other situations, though I can't remember when.
I think so, using makes sure everything cleans up nicely.
Related
I am currently working on a socket server and I was wondering why serializers like
XmlSerializer
BinaryFormatter
Protobuf-net
DataContractSerializer
all require a Stream instead of a byte array?
It means you can stream to arbitrary destinations rather than just to memory.
If you want to write something to a file, why would you want to create a complete copy in memory first? In some cases that could cause you to use a lot of extra memory, possibly causing a failure.
If you want to create a byte array, just use a MemoryStream:
var memoryStream = new MemoryStream();
serializer.Write(foo, memoryStream); // Or whatever you're using
var bytes = memoryStream.ToArray();
So with an abstraction of "you use streams" you can easily work with memory - but if the abstraction is "you use a byte array" you are forced to work with memory even if you don't want to.
You can easily make a stream over a byte array, but a byte array is inherently size-constrained, whereas a stream is open-ended: as big as you need. Some serialized output can be pretty enormous.
Edit: Also, if I need to implement some kind of serialization, I want to do it for the most basic abstraction, and avoid having to do it over multiple abstractions. Stream would be my choice, as there are stream implementations over lots of things: memory, disk, network and so forth. As an implementer, I get those for "free".
If you use a byte array or buffer, you are working temporarily in memory and you are limited in size.
A stream, on the other hand, lets you store things on disk or send them to other computers over the internet, a serial port, and so on. Streams often use buffers to optimize transmission speed.
So streaming is useful if you are dealing with a large file.
@JonSkeet's answer is the correct one, but as an addendum, if the issue you're having with making a temporary stream is "I don't like it because it's effort" then consider writing an extension method:
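using System.IO;
using System.Xml.Serialization;

namespace Project.Extensions
{
    public static class XmlSerialiserExtensions
    {
        // Serialises obj into the caller's buffer. MemoryStream(byte[]) is fixed-size,
        // so the buffer must be large enough or the write throws NotSupportedException.
        public static void Serialise(this XmlSerializer serialiser, byte[] bytes, object obj)
        {
            using (var temp = new MemoryStream(bytes))
                serialiser.Serialize(temp, obj);
        }

        // Deserialises an object from the XML held in the buffer.
        public static object Deserialise(this XmlSerializer serialiser, byte[] bytes)
        {
            using (var temp = new MemoryStream(bytes))
                return serialiser.Deserialize(temp);
        }
    }
}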
So you can go ahead and do
serialiser.Serialise(buffer, obj);
socket.Write(buffer);
Or
socket.Read(buffer);
var obj = serialiser.Deserialise(buffer);
Byte arrays were used more often when manipulating ASCII (i.e. 1-byte) strings of characters often in machine dependent applications, such as buffers. They lend themselves more to low-level applications, whereas "streams" is a more generalized way of dealing with data, which enables a wider range of applications. Also, streams are a more abstract way of looking at data, which allows considerations such as character type (UTF-8, UTF-16, ASCII, etc.) to be handled by code that is invisible to the user of the data stream.
Can I create a new BinaryWriter and write to a Stream while the stream is already being used by another BinaryWriter?
I need to write some data recursively, but I would like to avoid passing a BinaryWriter to a method as a parameter, as I need to pass a Stream instead. So, each method that writes data to the stream may need to create its own BinaryWriter instance. But I don't know if this is right. For now, it works well on a FileStream, but I don't know if it could lead to unexpected results on users' machines.
I wrote a simple example of what I want to achieve. Is this use of the BinaryWriter wrong?
Example:
public Main()
{
    using (var ms = new MemoryStream())
    {
        // Write data on the stream.
        WriteData(ms);
    }
}
private void WriteData(Stream output)
{
    // Create and use a BinaryWriter to use only on this method.
    using (var bWriter = new BinaryWriter(output, Encoding.UTF8, true))
    {
        // Write some data using this BinaryWriter.
        bWriter.Write("example data string");
        // Send the stream to other method and write some more data there.
        WriteMoreData(output);
        // Write some more data using this BinaryWriter.
        bWriter.Write("another example data string");
    }
}
private void WriteMoreData(Stream output)
{
    // Create and use a BinaryWriter to use only on this method.
    using (var bWriter = new BinaryWriter(output, Encoding.Unicode, true))
    {
        // Write some data on this BinaryWriter.
        bWriter.Write("write even more example data here");
    }
}
Is this use of the BinaryWriter wrong?
It should work fine. BinaryWriter does no buffering itself, so each instance won't interfere with data written by other instances. You're passing true for the leaveOpen parameter, so when each instance is disposed, it won't close the underlying stream.
But "wrong" is to some degree in the eye of the beholder. I would say it's better to pass the BinaryWriter.
MemoryStream isn't buffered, but other types are. Each instance of BinaryWriter, when it's disposed, will flush the stream. This could be considered inefficient by some people, as it negates the benefit of the buffering, at least partially. Not an issue here, but may not be the best habit to get into.
In addition, each instance of BinaryWriter is going to create additional work for the garbage collector. If there's really only a few, that's probably not an issue. But if the real-world example involves a lot more calls, that could start to get noticeable, especially when the underlying stream is a MemoryStream (i.e. you're not dealing with some slow device I/O).
More to the point, I don't see any clear advantage to using multiple BinaryWriter instances on the same stream here. It seems like the natural, readable, easily-maintained thing to do would be to create a single BinaryWriter and reuse it until you're done writing.
Why do you want to avoid passing it as a parameter? You're already passing the Stream. Just pass the BinaryWriter instead. If you ever did need direct access to the underlying stream, it's always available via BinaryWriter.BaseStream.
Bottom line: I can't say there's anything clearly wrong per se with your proposal. But it's a deviation from normal conventions without (to me, anyway) a clear benefit. If you have a really good rationale for doing it this way, it should work. But I'd recommend against it.
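For comparison, here is a sketch of the single-writer version of the example from the question. The class name and the Run/Main scaffolding are mine; note also that with one writer everything is written in one encoding (UTF-8 here), whereas the original used Encoding.Unicode in the inner method.

```csharp
using System;
using System.IO;
using System.Text;

public static class SharedWriterDemo
{
    // Creates one BinaryWriter and hands it to every method that needs to write.
    public static byte[] Run()
    {
        using (var ms = new MemoryStream())
        {
            using (var writer = new BinaryWriter(ms, Encoding.UTF8, true))
            {
                WriteData(writer);
            }
            return ms.ToArray();
        }
    }

    private static void WriteData(BinaryWriter writer)
    {
        writer.Write("example data string");
        WriteMoreData(writer);
        writer.Write("another example data string");
    }

    private static void WriteMoreData(BinaryWriter writer)
    {
        writer.Write("write even more example data here");
        // The underlying stream is still reachable through the writer if needed.
        Console.WriteLine("Position so far: " + writer.BaseStream.Position);
    }

    public static void Main()
    {
        // Read the three strings back to check that the order was preserved.
        using (var reader = new BinaryReader(new MemoryStream(Run()), Encoding.UTF8))
        {
            Console.WriteLine(reader.ReadString());
            Console.WriteLine(reader.ReadString());
            Console.WriteLine(reader.ReadString());
        }
    }
}
```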
I have always wondered what the best practice for using a Stream class in C# .Net is. Is it better to provide a stream that has been written to, or be provided one?
i.e.:
public Stream DoStuff(...)
{
    var retStream = new MemoryStream();
    //Write to retStream
    return retStream;
}
as opposed to:
public void DoStuff(Stream myStream, ...)
{
    //write to myStream directly
}
I have always used the former example for the sake of lifecycle control, but I have a feeling that it is a poor way of "streaming" with streams, for lack of a better word.
I would prefer "the second way" (operate on a provided stream) since it has a few distinct advantages:
You can have polymorphism (assuming as evidenced by your signature you can do your operations on any type of Stream provided).
It's easily abstracted into a Stream extension method now or later.
You clearly divide responsibilities. This method should not care on how to construct a stream, only on how to apply a certain operation to it.
Also, if you're returning a new stream (option 1), it would feel a bit strange that you would have to Seek again first in order to be able to read from it (unless you do that in the method itself, which is suboptimal once more since it might not always be required - the stream might not be read from afterwards in all cases). Having to Seek after passing an already existing stream to a method that clearly writes to the stream does not seem so awkward.
I see the benefit of Streams is that you don't need to know what you're streaming to.
In the second example, your code could be writing to memory, it could be writing directly to file, or to some network buffer. From the function's perspective, the actual output destination can be decided by the caller.
For this reason, I would prefer the second option.
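A small sketch of what that buys you, with a made-up DoStuff that writes a message as UTF-8 bytes: the same method writes to memory or to a file, and only the caller knows which.

```csharp
using System;
using System.IO;
using System.Text;

public static class ProvidedStreamDemo
{
    // Hypothetical operation: writes the message to whatever stream the caller supplies.
    // The method neither creates nor disposes the stream; the caller owns its lifetime.
    public static void DoStuff(Stream output, string message)
    {
        byte[] payload = Encoding.UTF8.GetBytes(message);
        output.Write(payload, 0, payload.Length);
    }

    public static void Main()
    {
        // Caller decides: write to memory...
        using (var memory = new MemoryStream())
        {
            DoStuff(memory, "hello");
            Console.WriteLine(memory.Length); // 5
        }

        // ...or straight to a (temporary) file, with no change to DoStuff.
        string path = Path.GetTempFileName();
        using (var file = File.Create(path))
        {
            DoStuff(file, "hello");
        }
        Console.WriteLine(File.ReadAllText(path)); // hello
        File.Delete(path);
    }
}
```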
The first function is just writing to memory. In my opinion, it would be clearer if it returned not a stream but the actual memory buffer. The caller can then attach a MemoryStream if they wish.
public byte[] DoStuff(...)
{
    var retStream = new MemoryStream();
    //Write to retStream
    return retStream.ToArray();
}
100% the second one. You don't want to make assumptions about what kind of stream they want. Do they want to stream to the network or to disk? Do they want it to be buffered? Leave these up to them.
They may also want to reuse the stream to avoid creating new buffers over and over. Or they may want to stream multiple things end-to-end on the same stream.
If they provide the stream, they have control over its type as well as its lifetime. Otherwise, you might as well just return something like a string or array. The stream isn't really giving you any benefit over these.
I am working on a parser that watches a stream (probably from a NetworkStream). When certain data is seen on the source stream, a new MemoryStream is created and the relevant data from the source is written to it.
Then I pass the MemoryStream for parsing to another class method that parses the MemoryStream as a Stream. In this method a BinaryReader is created. When it goes to read the data there is none because the BinaryReader is actually at the end of the data.
The BinaryReader does not have a Position property or Seek method, so the underlying BaseStream position needs to be changed. Once the position is changed the stream can be parsed.
In this case we are not adding additional data, so there is no problem. But the thought occurs to me that if a similar situation arises and additional data is going to be written to the stream, this might not work, because the position value was changed behind the writer's back.
I’m a bit fuzzy on the implications here.
Do the writer and reader work with copies of the BaseStream and its position or the original object allowing for corruption?
Does this mean that I need to create the reader at the same time I create the writer so that both start at the same spot, then pass the reader to the method instead of the BaseStream? I think this might be a better practice.
Do the BinaryReader and BinaryWriter maintain their own position information? I'm thinking not, because the property isn't there. If not, can you use them concurrently in the same thread?
Update #1: Based on an answer and comments that have since been withdrawn I think I need to make my confusion a little clearer.
There is a BaseStream property in both BinaryWriter and BinaryReader. I thought it pointed to the stream object that was used to create the writer and reader. I’m starting to think it is just a worker object that is unique to both.
I don’t want to assume too much about the stream objects so that I remain open to multiple types of streams as a source.
Update #2: Now after running some test code I see that they are connected. When data is written it affects the position of the reader. I would have thought to be useful the reader would remain unaffected so that it could pick up where it left off, reading the next part of the stream.
I envisioned something like this:
A data event occurs. The event causes data to be written by the writer.
At some point the reader works on some of the data in the stream.
Another event occurs causing more data to be written.
It is appended to the data that the reader already is working on.
The reader finishes its work including new data.
But based on the way the position is shared between the reader and writer, this is not the way the stream is meant to be used.
Maybe my conception problem is because my BaseStream is a MemoryStream and the rules are different than it would be with a NetworkStream. I was trying to keep the implementation details of the stream's source out of the reading class.
I guess at this point I have the answer to the question. Now I need to find information on using streams to do the type of thing that I have in my head.
I found that, when working with a MemoryStream, the reader and writer both update the position in the base stream. Therefore you cannot read a stream that is still being written without juggling the position value (saving it and restoring it). Not recommended.
I reworked things so that I could write an entire transaction to a MemoryStream and pass it to another class, then create a new MemoryStream for the next transaction.
The base stream is not a derivation of the given stream, it is the actual stream.
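Here is a minimal sketch of both points, under the assumption of a seekable stream such as MemoryStream (the class and method names are made up). Main shows that the reader and writer expose the very same stream object, and ReadWhileAppending shows the save/seek/restore juggling needed to append while a read is in progress.

```csharp
using System;
using System.IO;

public static class SharedPositionDemo
{
    // Writes 1 and 2, reads 1 back, appends 3 while the read is in progress,
    // then finishes reading. Returns the values in the order they were read.
    public static int[] ReadWhileAppending()
    {
        using (var stream = new MemoryStream())
        {
            var writer = new BinaryWriter(stream);
            var reader = new BinaryReader(stream);

            writer.Write(1);
            writer.Write(2);                 // stream.Position is now 8

            stream.Position = 0;             // rewind so the reader starts at the beginning
            int first = reader.ReadInt32();  // stream.Position is now 4

            // Appending: save the read position, jump to the end, write, restore.
            long readPosition = stream.Position;
            stream.Seek(0, SeekOrigin.End);
            writer.Write(3);
            stream.Position = readPosition;

            int second = reader.ReadInt32();
            int third = reader.ReadInt32();
            return new[] { first, second, third };
        }
    }

    public static void Main()
    {
        var stream = new MemoryStream();
        var writer = new BinaryWriter(stream);
        var reader = new BinaryReader(stream);
        // Both wrappers hand back the same object, not a copy.
        Console.WriteLine(ReferenceEquals(writer.BaseStream, reader.BaseStream)); // True

        Console.WriteLine(string.Join(",", ReadWhileAppending())); // 1,2,3
    }
}
```

Without the save and restore around the append, the write of 3 would have landed at position 4 and overwritten the 2. None of this carries over to a non-seekable stream such as NetworkStream.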
What is the difference between instantiating a Stream object, such as MemoryStream and calling the memoryStream.Write() method to write to the stream, and instantiating a StreamWriter object with the stream and calling streamWriter.Write()?
Consider the following scenario:
You have a method that takes a Stream, writes a value, and returns it. The stream is read from later on, so the position must be reset. There are two possible ways of doing it (both seem to work).
// Instantiate a MemoryStream somewhere
// - Passed to the following two methods
MemoryStream memoryStream = new MemoryStream();
// Not using a StreamWriter
private static Stream WriteToStream(Stream stream, string value)
{
    stream.Write(Encoding.Default.GetBytes(value), 0, value.Length);
    stream.Flush();
    stream.Position = 0;
    return stream;
}
// Using a StreamWriter
private static Stream WriteToStreamWithWriter(Stream stream, string value)
{
    StreamWriter sw = new StreamWriter(stream);
    sw.Write(value, 0, value.Length);
    sw.Flush();
    stream.Position = 0;
    return stream;
}
This is partially a scope problem, as I don't want to close the stream after writing to it since it will be read from later. I also certainly don't want to dispose it either, because that will close my stream. The difference seems to be that not using a StreamWriter introduces a direct dependency on Encoding.Default, but I'm not sure that's a very big deal. What's the difference, if any?
With the StreamWriter you have higher level overloads that can write various types to the stream without you worrying about the details. For example your code
sw.Write(value, 0, value.Length);
Could actually just be
sw.Write(value);
Using the StreamWriter.Write(string) overload.
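Putting that together with the scope concern from the question, a possible version of the second method is sketched below. It assumes .NET 4.5 or later for the StreamWriter constructor that takes a leaveOpen flag; the buffer size of 1024 and the BOM-less UTF-8 encoding are arbitrary choices of mine.

```csharp
using System;
using System.IO;
using System.Text;

public static class WriterScopeDemo
{
    // Writes the value as text, then rewinds the stream for the caller to read.
    public static Stream WriteToStreamWithWriter(Stream stream, string value)
    {
        // leaveOpen: true means disposing the writer flushes it but does not close the stream.
        using (var sw = new StreamWriter(stream, new UTF8Encoding(false), 1024, true))
        {
            sw.Write(value);
        }
        stream.Position = 0;
        return stream;
    }

    public static void Main()
    {
        Stream stream = WriteToStreamWithWriter(new MemoryStream(), "hello");
        using (var reader = new StreamReader(stream))
        {
            Console.WriteLine(reader.ReadToEnd()); // hello
        }
    }
}
```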
One difference is that new StreamWriter(stream) by default uses UTF-8 encoding, so it will support Unicode data. Encoding.Default (at least on my machine) is a fixed-size code page (such as Windows-1250) and only supports ASCII and a limited set of national characters (256 different characters in total).
You really shouldn't do the following:
stream.Write(encoding.GetBytes(value), 0, value.Length);
It's just a coincidence that the encoding you use has a fixed size of 1 byte. (It wouldn't work with UTF-16, or with UTF-8 and non-ASCII data.) Instead, if you need to directly write to a stream, do:
byte[] byteData=encoding.GetBytes(value);
stream.Write(byteData, 0, byteData.Length);
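To see why value.Length is the wrong count, here is a small sketch (the helper name is made up) using a string with one non-ASCII character. The character count and the byte count only agree for single-byte encodings.

```csharp
using System;
using System.IO;
using System.Text;

public static class ByteCountDemo
{
    // Encodes the value and writes exactly as many bytes as the encoding produced.
    public static int WriteText(Stream stream, string value, Encoding encoding)
    {
        byte[] byteData = encoding.GetBytes(value);
        stream.Write(byteData, 0, byteData.Length);
        return byteData.Length;
    }

    public static void Main()
    {
        string value = "h\u00E9llo"; // "héllo": five characters

        Console.WriteLine(value.Length);                         // 5
        Console.WriteLine(Encoding.UTF8.GetByteCount(value));    // 6
        Console.WriteLine(Encoding.Unicode.GetByteCount(value)); // 10

        using (var stream = new MemoryStream())
        {
            WriteText(stream, value, Encoding.UTF8);
            Console.WriteLine(stream.Length);                    // 6
        }
    }
}
```

Passing value.Length as the count here would silently drop the last byte in UTF-8 and half the data in UTF-16.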
StreamWriter is not a Stream; it is a TextWriter that wraps a Stream for easier handling of text. It encodes the characters you give it and writes the resulting bytes to the underlying stream. This is why you need Encoding.Default.GetBytes(value) in the first example and not in the second.
In terms of byte[] arrays, nothing. StreamWriter does, however, introduce more useful methods for working with other types.