MemoryStream from HttpContent without copying

I'm trying to use System.Net.Http for POST requests. I'm OK with the HTTP response body being in memory, but I need to obtain a MemoryStream for it. One way to do that would be to call HttpContent.ReadAsByteArrayAsync() and wrap a MemoryStream around the result, but I think this would require the content to be copied into a separate byte array (since it returns Task<byte[]>).
If the response body is already in some internal buffer in HttpContent, is it possible to create a MemoryStream on top of that buffer, or to get a MemoryStream from HttpContent somehow, and avoid copying to a separate byte array?
There is also HttpContent.ReadAsStreamAsync(), but that returns a plain Stream, not a MemoryStream. Even though it is probably an instance of MemoryStream already, I suppose it is neither safe nor good practice to cast the returned stream to MemoryStream, since this is an implementation detail that could change?
Is there any other way of doing this, or do I have no choice but to copy into a byte[] first?
Thanks.

If you call LoadIntoBufferAsync first, ReadAsStreamAsync returns a read-only MemoryStream:
await req.Content.LoadIntoBufferAsync();
var stream = (MemoryStream) await req.Content.ReadAsStreamAsync();
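A more defensive variant of the cast above, as a sketch only: it buffers the content, uses the returned stream directly when it happens to be a MemoryStream, and falls back to copying otherwise. The names HttpContentExtensions and ToMemoryStreamAsync are hypothetical and not part of System.Net.Http.

```csharp
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

public static class HttpContentExtensions
{
    // Hypothetical helper: returns a MemoryStream over the content,
    // copying only if the buffered stream is not already a MemoryStream.
    public static async Task<MemoryStream> ToMemoryStreamAsync(this HttpContent content)
    {
        // Buffer the whole body in memory first.
        await content.LoadIntoBufferAsync();
        Stream stream = await content.ReadAsStreamAsync();

        // The concrete type is an implementation detail, so test instead of casting blindly.
        if (stream is MemoryStream buffered)
        {
            return buffered;
        }

        // Fallback: copy into an expandable MemoryStream and rewind it.
        var copy = new MemoryStream();
        await stream.CopyToAsync(copy);
        copy.Position = 0;
        return copy;
    }
}
```

With this, the call site becomes var stream = await req.Content.ToMemoryStreamAsync(); and keeps working even if the runtime stops returning a MemoryStream.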

Alternatively, after calling LoadIntoBufferAsync, CopyToAsync can be used to populate a MemoryStream that you create yourself:
await req.Content.LoadIntoBufferAsync();
// ContentLength is a long? and can be null, so fall back to 0 (the stream grows as needed).
var stream = new MemoryStream((int)(req.Content.Headers.ContentLength ?? 0));
await req.Content.CopyToAsync(stream);
stream.Position = 0; // rewind before reading
This implementation doesn't depend on side effects and is supported by the docs in all framework versions: https://learn.microsoft.com/en-us/dotnet/api/system.net.http.httpcontent.loadintobufferasync?view=netframework-4.6.2

Related

Why do most serializers use a stream instead of a byte array?

I am currently working on a socket server and I was wondering
Why do serializers like
XmlSerializer
BinaryFormatter
Protobuf-net
DataContractSerializer
all require a Stream instead of a byte array?

It means you can stream to arbitrary destinations rather than just to memory.
If you want to write something to a file, why would you want to create a complete copy in memory first? In some cases that could cause you to use a lot of extra memory, possibly causing a failure.
If you want to create a byte array, just use a MemoryStream:
var memoryStream = new MemoryStream();
serializer.Write(foo, memoryStream); // Or whatever you're using
var bytes = memoryStream.ToArray();
So with an abstraction of "you use streams" you can easily work with memory - but if the abstraction is "you use a byte array" you are forced to work with memory even if you don't want to.
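To make the point concrete, here is a small sketch in which one Stream-based method serves both an in-memory caller and a file caller. The Point type and the StreamTargetDemo class are made up for illustration; XmlSerializer is used only because it appears in the question.

```csharp
using System.IO;
using System.Xml.Serialization;

// Illustrative type, not from the original question.
public class Point
{
    public int X { get; set; }
    public int Y { get; set; }
}

public static class StreamTargetDemo
{
    // The serialization code only knows about Stream.
    public static void Save(Point p, Stream destination)
    {
        new XmlSerializer(typeof(Point)).Serialize(destination, p);
    }

    // Caller that wants bytes: wrap a MemoryStream.
    public static byte[] ToBytes(Point p)
    {
        using (var ms = new MemoryStream())
        {
            Save(p, ms);
            return ms.ToArray();
        }
    }

    // Caller that wants a file: no intermediate in-memory copy is needed.
    public static void ToFile(Point p, string path)
    {
        using (var fs = File.Create(path))
        {
            Save(p, fs);
        }
    }
}
```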

You can easily make a stream over a byte array, but a byte array is inherently size-constrained, whereas a stream is open-ended: as big as you need. Some serialized payloads can be enormous.
Edit: Also, if I need to implement some kind of serialization, I want to do it for the most basic abstraction, and avoid having to do it over multiple abstractions. Stream would be my choice, as there are stream implementations over lots of things: memory, disk, network and so forth. As an implementer, I get those for "free".

If you use a byte array or buffer, you are working temporarily in memory and you are limited in size.
A stream, on the other hand, lets you store things on disk or send them to other computers (over the internet, a serial port, etc.). Streams often use buffers to optimize transmission speed.
So streaming is useful when you are dealing with a large file.

@JonSkeet's answer is the correct one, but as an addendum: if your objection to making a temporary stream is "I don't like it because it's effort", then consider writing an extension method:
using System.IO;
using System.Xml.Serialization;

namespace Project.Extensions
{
    public static class XmlSerialiserExtensions
    {
        // Serialises obj into the caller's fixed-size buffer.
        public static void Serialise(this XmlSerializer serialiser, byte[] bytes, object obj)
        {
            using (var temp = new MemoryStream(bytes))
                serialiser.Serialize(temp, obj);
        }

        // Deserialises an object from the caller's buffer.
        public static object Deserialise(this XmlSerializer serialiser, byte[] bytes)
        {
            using (var temp = new MemoryStream(bytes))
                return serialiser.Deserialize(temp);
        }
    }
}
So you can go ahead and do
serialiser.Serialise(buffer, obj);
socket.Write(buffer);
Or
socket.Read(buffer);
var obj = serialiser.Deserialise(buffer);

Byte arrays were used more often when manipulating ASCII (i.e. 1-byte) strings of characters often in machine dependent applications, such as buffers. They lend themselves more to low-level applications, whereas "streams" is a more generalized way of dealing with data, which enables a wider range of applications. Also, streams are a more abstract way of looking at data, which allows considerations such as character type (UTF-8, UTF-16, ASCII, etc.) to be handled by code that is invisible to the user of the data stream.

Can I use multiple BinaryWriters on the same Stream?

Can I create a new BinaryWriter and write to a Stream while the stream is already being used by another BinaryWriter?
I need to write some data recursively, but I would like to avoid passing a BinaryWriter to a method as a parameter, as I need to pass a Stream instead. So each method that writes data to the stream may need to create its own BinaryWriter instance. But I don't know if this is right. For now, it works well on a FileStream, but I don't know if it could lead to unexpected results on users' machines.
I wrote a simple example of what I want to achieve. Is this use of the BinaryWriter wrong?
Example:
public Main()
{
    using (var ms = new MemoryStream())
    {
        // Write data on the stream.
        WriteData(ms);
    }
}

private void WriteData(Stream output)
{
    // Create and use a BinaryWriter to use only on this method.
    using (var bWriter = new BinaryWriter(output, Encoding.UTF8, true))
    {
        // Write some data using this BinaryWriter.
        bWriter.Write("example data string");
        // Send the stream to other method and write some more data there.
        WriteMoreData(output);
        // Write some more data using this BinaryWriter.
        bWriter.Write("another example data string");
    }
}

private void WriteMoreData(Stream output)
{
    // Create and use a BinaryWriter to use only on this method.
    using (var bWriter = new BinaryWriter(output, Encoding.Unicode, true))
    {
        // Write some data on this BinaryWriter.
        bWriter.Write("write even more example data here");
    }
}

Is this use of the BinaryWriter wrong?
It should work fine. BinaryWriter does no buffering itself, so each instance won't interfere with data written by other instances. You're passing true for the leaveOpen parameter, so when each instance is disposed, it won't close the underlying stream.
But "wrong" is to some degree in the eye of the beholder. I would say it's better to pass the BinaryWriter.
MemoryStream isn't buffered, but other types are. Each instance of BinaryWriter, when it's disposed, will flush the stream. This could be considered inefficient by some people, as it negates the benefit of the buffering, at least partially. Not an issue here, but may not be the best habit to get into.
In addition, each instance of BinaryWriter is going to create additional work for the garbage collector. If there's really only a few, that's probably not an issue. But if the real-world example involves a lot more calls, that could start to get noticeable, especially when the underlying stream is a MemoryStream (i.e. you're not dealing with some slow device I/O).
More to the point, I don't see any clear advantage to using multiple BinaryWriter instances on the same stream here. It seems like the natural, readable, easily-maintained thing to do would be to create a single BinaryWriter and reuse it until you're done writing.
Why do you want to avoid passing it as a parameter? You're already passing the Stream. Just pass the BinaryWriter instead. If you ever did need direct access to the underlying stream, it's always available via BinaryWriter.BaseStream.
Bottom line: I can't say there's anything clearly wrong per se with your proposal. But it's a deviation from normal conventions without (to me, anyway) a clear benefit. If you have a really good rationale for doing it this way, it should work. But I'd recommend against it.
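For comparison, a sketch of the single-writer approach recommended above. SingleWriterDemo is an illustrative name, and this version assumes one encoding (UTF-8) for all strings, unlike the question's code, which mixes UTF-8 and UTF-16.

```csharp
using System.IO;
using System.Text;

public static class SingleWriterDemo
{
    public static byte[] Run()
    {
        using (var ms = new MemoryStream())
        {
            // One writer for the whole operation; leaveOpen keeps ms usable afterwards.
            using (var writer = new BinaryWriter(ms, Encoding.UTF8, leaveOpen: true))
            {
                WriteData(writer);
            }
            return ms.ToArray();
        }
    }

    private static void WriteData(BinaryWriter writer)
    {
        writer.Write("example data string");
        // Pass the same writer down instead of the raw stream.
        WriteMoreData(writer);
        writer.Write("another example data string");
    }

    private static void WriteMoreData(BinaryWriter writer)
    {
        writer.Write("write even more example data here");
        // writer.BaseStream is available here if direct stream access is ever needed.
    }
}
```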

.Net streams: Returning vs Providing

I have always wondered what the best practice for using a Stream class in C# .Net is. Is it better to provide a stream that has been written to, or be provided one?
i.e.:
public Stream DoStuff(...)
{
    var retStream = new MemoryStream();
    //Write to retStream
    return retStream;
}
as opposed to:
public void DoStuff(Stream myStream, ...)
{
    //write to myStream directly
}
I have always used the former for the sake of lifecycle control, but I have a feeling that it is a poor way of "streaming" with Streams, for lack of a better word.

I would prefer "the second way" (operate on a provided stream) since it has a few distinct advantages:
You can have polymorphism (assuming as evidenced by your signature you can do your operations on any type of Stream provided).
It's easily abstracted into a Stream extension method now or later.
You clearly divide responsibilities. This method should not care on how to construct a stream, only on how to apply a certain operation to it.
Also, if you're returning a new stream (option 1), it would feel a bit strange that you would have to Seek again first in order to be able to read from it (unless you do that in the method itself, which is suboptimal once more since it might not always be required - the stream might not be read from afterwards in all cases). Having to Seek after passing an already existing stream to a method that clearly writes to the stream does not seem so awkward.

I see the benefit of Streams is that you don't need to know what you're streaming to.
In the second example, your code could be writing to memory, it could be writing directly to file, or to some network buffer. From the function's perspective, the actual output destination can be decided by the caller.
For this reason, I would prefer the second option.
The first function is just writing to memory. In my opinion, it would be clearer if it did not return a stream, but the actual memory buffer. The caller can then attach a Memory Stream if he/she wishes.
public byte[] DoStuff(...)
{
    var retStream = new MemoryStream();
    //Write to retStream
    return retStream.ToArray();
}

100% the second one. You don't want to make assumptions about what kind of stream they want. Do they want to stream to the network or to disk? Do they want it to be buffered? Leave these up to them.
They may also want to reuse the stream to avoid creating new buffers over and over. Or they may want to stream multiple things end-to-end on the same stream.
If they provide the stream, they have control over its type as well as its lifetime. Otherwise, you might as well just return something like a string or array. The stream isn't really giving you any benefit over these.

How to determine if MemoryStream is fixed size?

My method gets a MemoryStream as a parameter. How can I know whether this MemoryStream is expandable?
A MemoryStream can be created over an array using "new MemoryStream(byte[] buf)". This means the stream has a fixed size; you can't append data to it.
On the other hand, a stream can be created with no parameters using "new MemoryStream()". In this case you can append data to it.
Question: how can I know whether I can safely append data to the current stream, or whether I must create a new expandable stream and copy the data to it?

You can do that using reflection:
static bool IsExpandable(MemoryStream stream)
{
    return (bool)typeof(MemoryStream)
        .GetField("_expandable", System.Reflection.BindingFlags.Instance | System.Reflection.BindingFlags.NonPublic)
        .GetValue(stream);
}
I don't know if there's a cleaner/safer way to retrieve this information.

Strictly speaking it is not "fixed size" but "non-expandable", since it can still be truncated via SetLength. The best thing you can do is probably to always use an expandable stream. If you don't control that part of the code, you could wrap your attempt to expand the stream in a try/catch and, if it fails, copy the contents over to a writable, expandable stream and retry the write there.
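The try/catch idea could look roughly like the sketch below. It relies on MemoryStream.Write throwing NotSupportedException when the write would exceed the capacity of a non-expandable stream; MemoryStreamAppend and Append are hypothetical names.

```csharp
using System;
using System.IO;

public static class MemoryStreamAppend
{
    // Appends data to the end of the stream. Returns the same stream when it
    // could grow, or a new expandable stream holding old + new data otherwise.
    public static MemoryStream Append(MemoryStream stream, byte[] data)
    {
        if (stream.CanWrite)
        {
            long originalPosition = stream.Position;
            try
            {
                stream.Seek(0, SeekOrigin.End);
                stream.Write(data, 0, data.Length);
                return stream;
            }
            catch (NotSupportedException)
            {
                // Non-expandable and out of room: restore the position and fall through.
                stream.Position = originalPosition;
            }
        }

        // Copy the existing contents into a fresh, expandable stream.
        var expandable = new MemoryStream();
        byte[] existing = stream.ToArray();
        expandable.Write(existing, 0, existing.Length);
        expandable.Write(data, 0, data.Length);
        return expandable;
    }
}
```

The caller has to keep using the returned stream, since it may differ from the one passed in.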

What is the difference between calling Stream.Write and using a StreamWriter?

What is the difference between instantiating a Stream object, such as MemoryStream and calling the memoryStream.Write() method to write to the stream, and instantiating a StreamWriter object with the stream and calling streamWriter.Write()?
Consider the following scenario:
You have a method that takes a Stream, writes a value, and returns it. The stream is read from later on, so the position must be reset. There are two possible ways of doing it (both seem to work).
// Instantiate a MemoryStream somewhere
// - Passed to the following two methods
MemoryStream memoryStream = new MemoryStream();

// Not using a StreamWriter
private static Stream WriteToStream(Stream stream, string value)
{
    stream.Write(Encoding.Default.GetBytes(value), 0, value.Length);
    stream.Flush();
    stream.Position = 0;
    return stream;
}

// Using a StreamWriter
private static Stream WriteToStreamWithWriter(Stream stream, string value)
{
    StreamWriter sw = new StreamWriter(stream);
    sw.Write(value, 0, value.Length);
    sw.Flush();
    stream.Position = 0;
    return stream;
}
This is partially a scope problem, as I don't want to close the stream after writing to it since it will be read from later. I also certainly don't want to dispose it either, because that will close my stream. The difference seems to be that not using a StreamWriter introduces a direct dependency on Encoding.Default, but I'm not sure that's a very big deal. What's the difference, if any?

With the StreamWriter you have higher level overloads that can write various types to the stream without you worrying about the details. For example your code
sw.Write(value, 0, value.Length);
Could actually just be
sw.Write(value);
Using the StreamWriter.Write(string) overload.
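On the question's concern about not closing the stream: a sketch, assuming .NET Framework 4.5 or later, that uses the StreamWriter constructor with a leaveOpen argument so the writer can be disposed while the stream stays open. WriterLeaveOpenDemo is an illustrative name, and the buffer size of 1024 is an arbitrary choice.

```csharp
using System.IO;
using System.Text;

public static class WriterLeaveOpenDemo
{
    public static Stream WriteToStreamWithWriter(Stream stream, string value)
    {
        // UTF8Encoding(false) writes no byte order mark, matching new StreamWriter(stream).
        using (var sw = new StreamWriter(stream, new UTF8Encoding(false), 1024, leaveOpen: true))
        {
            sw.Write(value);
        } // Disposing the writer flushes it but leaves the stream open.

        stream.Position = 0; // rewind so the caller can read from the start
        return stream;
    }
}
```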

One difference is that new StreamWriter(stream) by default uses UTF-8 encoding, so it will support Unicode data. Encoding.Default (at least on my machine) is a fixed-size code page (such as Windows-1250) and only supports ASCII and a limited set of national characters (256 different characters in total).
You really shouldn't do the following:
stream.Write(encoding.GetBytes(value), 0, value.Length);
It's just a coincidence that the encoding you use has a fixed size of 1 byte. (It wouldn't work with UTF-16, or with UTF-8 and non-ASCII data.) Instead, if you need to directly write to a stream, do:
byte[] byteData=encoding.GetBytes(value);
stream.Write(byteData, 0, byteData.Length);
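A small sketch of why the character count is the wrong length to pass. The sample string is an assumption chosen for illustration: it has 5 characters, but "é" takes two bytes in UTF-8, and every character here takes two bytes in UTF-16.

```csharp
using System.IO;
using System.Text;

public static class EncodingLengthDemo
{
    // Writes the encoded bytes of value and returns how many bytes were written.
    public static int WriteString(Stream stream, string value, Encoding encoding)
    {
        byte[] byteData = encoding.GetBytes(value);
        // Use the byte count, not value.Length (which counts UTF-16 code units).
        stream.Write(byteData, 0, byteData.Length);
        return byteData.Length;
    }
}

// Usage:
//   string value = "h\u00E9llo";                                               // "héllo", value.Length == 5
//   EncodingLengthDemo.WriteString(new MemoryStream(), value, Encoding.UTF8);    // returns 6
//   EncodingLengthDemo.WriteString(new MemoryStream(), value, Encoding.Unicode); // returns 10
```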

StreamWriter is not a Stream; it derives from TextWriter and wraps a Stream to make writing text easier. It takes care of converting characters to bytes using its encoding, which is why you need Encoding.Default.GetBytes(value) in the first example and not in the second.

In terms of the bytes that end up in the stream, nothing (as long as the same encoding is used). StreamWriter does, however, add more convenient methods for writing other types.
