I would like to know of a situation where Read(char[], int, int) fails to return all the chars requested while ReadBlock() returns all chars as expected (say, when a StreamReader works with an instance of a FileStream object).
In practice, when using a StreamReader it is only likely to happen with a stream which may delay for some time - e.g. a network stream as people have mentioned here.
However, with a TextReader generally, you can expect it to happen at any time (including possibly with a future version of .NET in a case where it doesn't currently happen - that it doesn't happen with a StreamReader backed by a FileStream isn't documented, so there's no guarantee it won't happen in the future).
In particular, there are a lot of cases where it's easier for the implementer (with a knock-on effect of simpler, more reliable and probably more efficient code) not to return the requested amount: they can partially fulfil the call just by emptying the current buffer, or by passing the requested amount through to a backing source (a stream or another TextReader) before doing an operation that can only return a smaller number of characters.
Now, to answer the actual question, "When should StreamReader.ReadBlock() be used?" (or, more generally, TextReader.ReadBlock()), the following should be borne in mind:
Both Read() and ReadBlock() are guaranteed to return at least one character unless the entire source has been read. Neither will return 0 if there is pending content.
Calling ReadBlock() when Read() will do is wasteful, as it loops needlessly.
But on the other hand it's not that wasteful.
But on the third hand, the cases where Read() will return fewer characters than requested are often cases where another thread is busy obtaining the content that will fill the buffer for the next call, or where that content doesn't exist yet (e.g. user input, or a pending operation on another machine) - there's better overall concurrency to be found in processing the partial result and then calling Read() again when that's finished.
So: if you can do something useful with a partial result, call Read() and work on what you get. In particular, if you are looping through and working on the result of each Read(), do this rather than using ReadBlock().
A notable case is if you are building your own TextReader that is backed by another. There's no point calling ReadBlock() unless the algorithm really needs a certain number of characters to work - just return as much as you can from a call to Read() and let the calling code call ReadBlock() if it needs to.
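As a minimal sketch of that point (the class name and field below are mine, purely for illustration), such a wrapping reader can simply delegate each Read() call to the inner reader and hand back whatever it got:
using System.IO;

// Hypothetical example: a TextReader backed by another TextReader.
public class PassThroughReader : TextReader
{
    private readonly TextReader inner;

    public PassThroughReader(TextReader inner)
    {
        this.inner = inner;
    }

    public override int Peek()
    {
        return inner.Peek();
    }

    public override int Read()
    {
        return inner.Read();
    }

    public override int Read(char[] buffer, int index, int count)
    {
        // No looping here: return whatever the inner reader gives us, even if
        // that's fewer characters than requested. Callers that need an exact
        // count can use ReadBlock(), which loops over Read() for them.
        return inner.Read(buffer, index, count);
    }
}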
In particular, note that the following code:
char[] buffer = new char[4096];
int len = 0;
while ((len = tr.ReadBlock(buffer, 0, 4096)) != 0)
    DoSomething(buffer, 0, len);
can be rewritten as:
char[] buffer = new char[4096];
for (int len = tr.Read(buffer, 0, 4096); len != 0; len = tr.Read(buffer, 0, 4096))
    DoSomething(buffer, 0, len);
It may call DoSomething() with smaller sizes sometimes, but it can also have better concurrency if there's another thread involved in providing the data for the next call to Read().
However, the gain is not major in most cases. If you really need a certain number of characters, then do call ReadBlock(). Most importantly, in those cases where Read() would have the same result as ReadBlock(), the overhead of ReadBlock() checking that it has done so is very slight. Don't try to second-guess whether Read() is safe in a given case; if you need the guarantee of ReadBlock(), use ReadBlock().
It's a major issue on network streams, particularly if the streamed data is quite large and the server providing the stream sends it as chunked output (quite typical with HTTP). A call to Read(), i.e. the single-character read, gives -1 once the end of the stream is reached; a call to Read(char[], int, int) reads up to the number of characters asked for (or fewer) and returns the number of characters actually read, or zero once the end of the stream has been reached.
ReadBlock(), on the other hand, keeps reading until the requested number of characters is available or the stream ends, so you never run into this issue.
One other thing to note is that both forms read at most the requested number of characters and are not guaranteed to return that many if there aren't that many available.
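As a rough sketch (reader here is assumed to be a StreamReader over a network stream), the difference matters when you need a fixed-size chunk:
char[] record = new char[128];
// Read() may hand back as little as one character if data is still arriving;
// ReadBlock() keeps reading until it has 128 characters or the stream ends.
int got = reader.ReadBlock(record, 0, record.Length);
if (got < record.Length)
{
    // the stream ended before a full record was available
}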
I asked a question on a similar topic some time ago - Reading from a HttpResponseStream fails - when I had issues reading from HTTP streams.
Related
FileStream.Read() returns the number of bytes read, but... is there any situation, other than having reached the end of the file, in which it will read fewer bytes than the number requested and not throw an exception?
the documentation says:
The Read method returns zero only after reaching the end of the stream. Otherwise, Read always reads at least one byte from the stream before returning. If no data is available from the stream upon a call to Read, the method will block until at least one byte of data can be returned. An implementation is free to return fewer bytes than requested even if the end of the stream has not been reached.
But this doesn't quite explain in what situations data would be unavailable and cause the method to block until it can read again. I mean, shouldn't most situations where data is unavailable force an exception?
What are real situations where comparing the number of bytes read against the number of expected bytes could differ (assuming that we're already checking for end of file when we mention number of bytes expected)?
EDIT: A bit more information. The reason I'm asking this is that I've come across a bit of code where the developer pretty much did something like this:
bytesExpected = (remainingBytesInFile > 94208) ? 94208 : remainingBytesInFile;
while (bytesRead < bytesExpected)
{
    bytesRead += fileStream.Read(buffer, bytesRead, bytesExpected - bytesRead);
}
Now, I can't see any advantage to having this while loop at all; I'd expect it to throw an exception if it can't read the number of bytes expected (bearing in mind it's already taking into account that there are that many bytes left to read).
What would the reason one could possibly have for something like this? I'm sure I'm missing something
The documentation is for Stream.Read, from which FileStream is derived. Since FileStream is a stream, it should obey the stream contract. Not all streams do, but unless you have a very good reason, you should stick to that.
In a typical file stream, you'll only get a return value smaller than count when you reach the end of file (and it's a pretty simple way of checking for the end of file).
However, in a NetworkStream, for example, you keep reading in a loop until the method returns zero - signalling the end of stream. The same works for file streams - you know you're at the end of the file when Read returns zero.
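A minimal sketch of that loop (stream and Process are just illustrative names):
byte[] buffer = new byte[8192];
int read;
// The same loop works for FileStream and NetworkStream alike:
// Read returns 0 only once the end of the stream has been reached.
while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
{
    Process(buffer, read); // hypothetical handler for the bytes actually read
}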
Most importantly, FileStream isn't just for what you'd consider files - it's also for pseudo-files like standard input/output pipes and COM ports (try opening a file stream on PRN, for example). In that case, you're not reading a file with a fixed length, and the behaviour is the same as with NetworkStream.
Finally, don't forget that FileStream isn't sealed. It's perfectly fine for you to implement a virtualized file system, for example - and it's perfectly fine if your virtualized file system doesn't support seeking, or checking the length of file.
EDIT:
To address your edit, this is exactly how you're supposed to read any stream. Nothing wrong with it. If there's nothing else to read in a stream, the Read method will simply return 0, and you know the stream is over. The only thing is, it seems that he's trying to fill the buffer completely, one buffer at a time - this only makes sense if you explicitly need to partition the file into 94208-byte chunks and pass each byte[] on for further processing somewhere.
If that's not the case, you don't really need to fill the full buffer - you just keep reading (and probably writing on some other side) until Read returns 0. And indeed, by default, FileStream will always fill the whole buffer unless it's built around a pipe handle - but since that's a possibility, you shouldn't rely on the "real file" behaviour. So as long as you need those byte[] chunks for something non-stream-like (e.g. parsing messages), this is entirely fine. If you're only using the stream as an actual stream, and you're streaming the data somewhere else, there's no real point - you only need one while loop to read the file.
Your expectations would only apply to the case when the stream is reading data off a no-latency source. Other I/O sources can be slow, which is why the Read method might not always be able to return immediately. That doesn't mean that there is an error (so no exception), just that it has to wait for data to arrive.
Examples: network stream, file stream on slow disk, etc.
(UPDATE, HDD example) To give an example specific to files (since your case is FileStream, although Read is defined on Stream and so all implementations should fulfill the requirements): mechanical hard drives go to "sleep" when not active (especially on battery-powered devices, i.e. laptops). Spinning up can take a second or so. That is not an IOException, but your read would have to wait for a second before any data is read.
The simple answer is that with a FileStream it probably never happens.
However, keep in mind that the Read method is inherited from Stream, which serves as the base for many other streams like NetworkStream, and in that case you may not be able to read as many bytes as you requested simply because they haven't been received from the network yet.
So like the documentation says it all depends on the implementation of the specific type of stream - FileStream, NetworkStream, etc.
I am wondering why so many examples read streams into byte arrays in chunks rather than all at once... I know this is a soft question, but I am interested.
I understand a bit about hardware: filling buffers can be very size-dependent, and you wouldn't want to write to the buffer again until it has been flushed to wherever it needs to go, etc. But with the .NET platform (and other modern languages) I see examples of both. So when should you use which, or is the second an absolute no-no?
Here is the thing (code) I mean:
var buffer = new byte[4096];
while (true)
{
    var read = this.InputStream.Read(buffer, 0, buffer.Length);
    if (read == 0)
        break;
    OutputStream.Write(buffer, 0, read);
}
rather than:
var buffer = new byte[InputStream.Length];
var read = this.InputStream.Read(buffer, 0, buffer.Length);
OutputStream.Write(buffer, 0, read);
I believe both are legal? So why go through all the fuss of the while loop (in whatever form you decide to structure it)?
I am playing devils advocate here as I want to learn as much as I can :)
In the first case, all you need is 4kB of memory. In the second case, you need as much memory as the input stream data takes. If the input stream is 4GB, you need 4GB.
Do you think it would be good if a file copy operation required 4GB of RAM? What if you were to prepare a disk image that's 20GB?
There is also this thing with pipes. You don't often use them on Windows, but a similar case is often seen on other operating systems. The second case waits for all data to be read, and only then writes it to the output. However, sometimes it is advisable to write data as soon as possible - the first case will start writing to the output stream as soon as the first 4kB of input is read. Think of serving web pages: it is advisable for a web server to send data as soon as possible, so that the client's web browser will start rendering the headers and the first part of the content rather than waiting for the whole body.
However, if you know that the input stream won't be bigger than 4kB, then both cases are equivalent.
Sometimes InputStream.Length isn't valid for the source - e.g. a network transport - or the buffer may be huge, e.g. when reading from a huge file. IMO.
It protects you from the situation where your input stream is several gigabytes long.
You have no idea how much data Read might return. This could create major performance problems if you're reading a very large file.
If you have control over the input, and are sure the size is reasonable, then you can certainly read the whole array in at once. But be especially careful if the user can supply an arbitrary input.
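As one sketch of keeping memory bounded regardless of the input size, Stream.CopyTo does the chunked loop for you (the file names and the 81920-byte buffer size are only examples):
using (var input = File.OpenRead("input.bin"))   // example source
using (var output = File.Create("output.bin"))   // example destination
{
    // Copies in fixed-size chunks rather than loading the whole input into memory.
    input.CopyTo(output, 81920);
}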
We are using an application protocol which specifies the length indicator of the message in the first 4 bytes. Socket.Receive will return as much data as is in the protocol stack at the time, or block until data is available. This is why we have to continuously read from the socket until we receive the number of bytes in the length indicator. Socket.Receive will return 0 if the other side closed the connection. I understand all that.
Is there a minimum number of bytes that has to be read? The reason I ask is that, from the documentation, it seems entirely possible that the entire length indicator (4 bytes) might not be available when Socket.Receive returns. We would then have to keep trying. It would be more efficient to minimize the number of times we call Socket.Receive, because it has to copy things in and out of buffers. So is it safer to get a single byte at a time for the length indicator, is it safe to assume that 4 bytes will always be available, or should we keep trying to get 4 bytes using an offset variable?
The reason I think there may be some sort of default minimum level is that I came across a variable called ReceiveLowWater that I can set in the socket options. But this appears to only apply to BSD. See SO_RCVLOWAT on MSDN.
It isn't really that important but I am trying to write unit tests. I have already wrapped a standard .Net Socket behind an interface.
is it safe to assume that 4 bytes will always be available
NO. Never. What if someone is testing your protocol with, say, telnet and a keyboard? Or over a really slow or busy connection? You can receive one byte at a time, or a split "length indicator" over multiple Receive() calls. This isn't a unit-testing matter, it's a basic socket matter that causes problems in production, especially under stressful situations.
or should we keep trying to get 4 bytes using an offset variable?
Yes, you should. For your convenience, you can use the Socket.Receive() overload that allows you to specify the number of bytes to be read, so you won't read too much. But please note it can return fewer bytes than requested; that's what the offset parameter is for, so the next call can continue writing into the same buffer:
byte[] lenBuf = new byte[4];
int offset = 0;
while (offset < lenBuf.Length)
{
    int received = socket.Receive(lenBuf, offset, lenBuf.Length - offset, SocketFlags.None);
    if (received == 0)
    {
        // connection gracefully closed, do your thing to handle that
        break; // without this the loop would spin forever on a closed connection
    }
    offset += received;
}
// Here you're ready to parse lenBuf
The reason that I think there may be some sort of default minimum level is that I came across a variable called ReceiveLowWater that I can set in the socket options. But this appears to only apply to BSD.
That is correct, the "receive low water" flag is only included for backwards compatibility and does nothing apart from throwing errors, as per MSDN, search for SO_RCVLOWAT:
"This option is not supported by the Windows TCP/IP provider. If this option is used on Windows Vista and later, the getsockopt and setsockopt functions fail with WSAEINVAL. On earlier versions of Windows, these functions fail with WSAENOPROTOOPT."
So I guess you'll have to use the offset.
It's a shame, because it could enhance performance. However, as #cdleonard pointed out in a comment, the performance penalty from keeping an offset variable will be minimal, as you'll usually receive the four bytes at once.
No, there isn't a minimum buffer size, the length in the receive just needs to match the actual space.
If you send a length in four bytes before the message's actual data, the recipient needs to handle the cases where 1, 2, 3 or 4 bytes are returned, keep repeating the read until all four bytes are received, and then repeat the procedure to receive the actual data.
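A sketch of that pattern, factored into a helper (ReceiveExactly is my own name, not a framework method; it assumes a connected Socket, the System.Net.Sockets namespace, and - in the usage below - a little-endian length prefix):
// Keeps calling Receive until 'count' bytes have arrived; returns false
// if the other side closes the connection first.
static bool ReceiveExactly(Socket socket, byte[] buffer, int count)
{
    int offset = 0;
    while (offset < count)
    {
        int received = socket.Receive(buffer, offset, count - offset, SocketFlags.None);
        if (received == 0)
            return false; // connection closed by the peer
        offset += received;
    }
    return true;
}

// Usage: read the 4-byte length indicator, then the message body.
byte[] lenBuf = new byte[4];
if (ReceiveExactly(socket, lenBuf, 4))
{
    int messageLength = BitConverter.ToInt32(lenBuf, 0); // assumes a little-endian length
    byte[] message = new byte[messageLength];
    ReceiveExactly(socket, message, messageLength);
}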
Based on the advice of #Len-Holgate in this question, I'm asynchronously requesting 0-byte reads, and in the callback I accept the available bytes with synchronous reads, since I know the data is available and won't block. This seems so efficient and wonderful.
But then I add the option for SslStream, and the approach falls apart. The zero-byte read is fine, but the SslStream decrypts the bytes, leaving a zero byte-count in the TcpClient's buffer (appropriately so), and I cannot determine how many bytes are now in the SslStream available for reading.
Is there a simple trick around this?
Some code, just for context:
sslStream.BeginRead(this.zeroByteBuffer, 0, 0, DataAvailable, this);
And after EndRead() (which correctly returns 0), DataAvailable contains:
// by now this is 0, because sslStream has already consumed the bytes
available = myTcpClient.Available;
if (0 < available) // Never occurs
{
    // this part can be distractingly complicated, but
    // it's based on the available byte count
    sslStream.Read(...);
}
And due to the protocol, I need to evaluate the data byte by byte and decode variable-width Unicode and such. I don't want to have to read byte by byte asynchronously!
If I understood correctly, your messages are delimited by a certain character, and you are already using a StringBuilder to cover the case when a message is fragmented into multiple pieces.
You could consider ignoring the delimiter when reading data, appending any data to the StringBuilder as it becomes available, and then inspecting it for the delimiter character. When one is found, you can extract a single message using sb.ToString(0, delimiterIndex) and sb.Remove(0, delimiterIndex) until no delimiters remain.
This would also cover the case when two messages are received simultaneously.
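A rough sketch of that idea (the method name, HandleMessage and the '\n' delimiter are assumptions for illustration, not from the question):
using System.Text;

// receiveBuffer is a StringBuilder kept across reads; '\n' stands in for
// whatever delimiter the protocol actually uses.
static void ExtractMessages(StringBuilder receiveBuffer, string newlyRead)
{
    receiveBuffer.Append(newlyRead);
    int delimiterIndex;
    while ((delimiterIndex = receiveBuffer.ToString().IndexOf('\n')) >= 0)
    {
        string message = receiveBuffer.ToString(0, delimiterIndex);
        receiveBuffer.Remove(0, delimiterIndex + 1); // drop the delimiter as well
        HandleMessage(message); // hypothetical handler for a complete message
    }
}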
I came across a few posts that were claiming that streams are not a reliable data structure, meaning that read/write operations might not follow through in all cases.
So:
a) Is there any truth to this consensus?
b) If so, what are the cases in which read/write operations might fail?
This consensus on streams which I came across claims that you should loop through read/write operations until complete:
var bytesRead = 0;
var _packet = new byte[8192];
while ((bytesRead += file_reader.Read(_packet, bytesRead, _packet.Length - bytesRead)) < _packet.Length) ;
Well, it depends on what operation you're talking about, and on what layer you consider it a failure.
For instance, if you attempt to read past the end of a stream (i.e. read 1000 bytes from a file that only contains 100 bytes, or read 1000 bytes from a position that is closer than 1000 bytes to the end of the file), you will get fewer bytes back. The stream read methods return the number of bytes they actually managed to read, so you should check that value.
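For example, a minimal read loop (names are illustrative) that respects the return value and stops cleanly once the end of the stream is reached:
byte[] buffer = new byte[1000];
int total = 0;
int read;
// Keeps reading until the buffer is full or Read reports end of stream (0).
while (total < buffer.Length &&
       (read = stream.Read(buffer, total, buffer.Length - total)) > 0)
{
    total += read;
}
// 'total' now holds how many bytes were actually read - possibly fewer than 1000.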
As for write operations, writing to a file might fail if the disk is full, or other similar problems, but in case of write operations you'll get back an exception.
If you're writing to sockets or other network streams, there is no guarantee, even if the Write method returns without exceptions, that the other end is able to receive it; there are a ton of things that can go wrong along the way.
However, to alleviate your concerns, streams by themselves are not unreliable.
The medium they talk to, however, can be.
If you follow the documentation and catch and act accordingly when errors occur, I think you'll find streams are pretty much bullet-proof. A significant amount of code has been invested in making them reliable.
Can we see links to those who claim otherwise? Either you've misunderstood, or they are incorrect.