C# BinaryReader end of Stream - c#

I am wondering whether it is good practice to use EndOfStreamException to detect the end of a BinaryReader stream. I don't want to use the BaseStream.Length property or PeekChar, as proposed here:
C# checking for binary reader end of file
because I would have to load the stream into memory (maybe because it comes from a Zip file) and enable some flags. Instead, this is what I'm doing:
using (ZipArchive zipArchive = ZipFile.OpenRead(Filename))
using (BinaryReader fStream = new BinaryReader(zipArchive.Entries[0].Open()))
{
    while (true)
    {
        try
        {
            fStream.ReadInt32();
        }
        catch (EndOfStreamException ex)
        {
            Log.Debug("End of Binary Stream");
            break;
        }
    }
}

Your approach is fine. If you know you have a seekable stream, you can instead compare its length to the number of bytes read. Note that FileStream.Length does not load the whole stream into memory.
But catching EndOfStreamException works for arbitrary streams, including non-seekable ones.
And don't worry about the cost of using exceptions in this case: streams imply IO, and IO is orders of magnitude slower than exception handling.

I would argue that 'best practice' is to have the number of values known, for example by prefixing the stream with the number of values. This should allow you to write it like
var length = fStream.ReadInt32();
for (var i = 0; i < length - 1; i++)
{
    fStream.ReadInt32(); // Skip all values except last
}
return fStream.ReadInt32(); // Last value
First of all, this reduces the need for exception handling: if you reach the end of the stream before the last item, you know the stream was saved incorrectly and have a chance to handle that, instead of just returning the last available value. I also find it helpful to have as few exceptions as possible, so you can run your debugger with "break when thrown" enabled and have some confidence that thrown exceptions indicate actual problems. It can also allow you to save your values as part of some other data.
If you cannot change the input format you can still get the uncompressed length of the entry from ZipArchiveEntry.Length. Just divide by sizeof(int) to get the number of values.
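As a rough sketch of the ZipArchiveEntry.Length idea, assuming the first entry holds nothing but 4-byte integers: the class and method names (ZipInt32Reader, ReadLastValue) are made up for illustration and are not from the question.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper: reads the last Int32 of the first zip entry
// without relying on EndOfStreamException for normal control flow.
static class ZipInt32Reader
{
    public static int ReadLastValue(string zipPath)
    {
        using ZipArchive archive = ZipFile.OpenRead(zipPath);
        ZipArchiveEntry entry = archive.Entries[0];

        // Length is the uncompressed size in bytes, taken from the zip metadata.
        long count = entry.Length / sizeof(int);
        if (count == 0)
            throw new InvalidDataException("Entry contains no Int32 values.");

        using var reader = new BinaryReader(entry.Open());
        for (long i = 0; i < count - 1; i++)
            reader.ReadInt32(); // skip all values except the last

        // An EndOfStreamException here would now signal a truncated entry.
        return reader.ReadInt32();
    }
}
```

With this shape, an EndOfStreamException is left to mean "the data is shorter than the archive claims", which is a real error rather than the normal loop exit.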
In most cases I would also argue for using a serialization library to save data. This tends to make it much easier to change the format of the data in the future.

check your program or init value.

Related

Guidelines for designing a robust file format writer?

Suppose you want to write a .WAV file format writer like so:
using var stream = File.Create("test.wav"); // must be writable and seekable
using var writer = new WavWriter(stream, ... /* .WAV format parameters */);
// write the file
// writer.Dispose() does a few things:
// - writes user-added chunks
// - updates the file header (chunk size) so the file is valid
There is a conceptual problem in doing so:
the user can change the stream position and thereby break the writing process
You may suggest the following:
the writer should own the stream; this would work when writing to a file, but not to an arbitrary stream
the writer could own its own memory stream so it can write to streams too; fine, but there are memory concerns
I guess you get the point...
To me, the only viable thing would be to document that aspect, but I may have missed something, hence the question.
Question:
How can a file format writer write to a stream yet defend itself against possible changes to the stream's position?
My suggestion would be to keep an internal position field in the WavWriter. Each time you perform an operation, check that this field matches the position of the backing stream and throw an exception if it does not. Update the field at the end of each write operation.
Ideally you should also handle streams that do not support seeking, but it does not sound like your design would permit that anyway. It might be a good idea to check CanSeek in the constructor and throw if seeking is not supported. In general, it is a good idea to validate any arguments before use.
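A minimal sketch of that position check, assuming a seekable, writable stream. GuardedWriter is a made-up stand-in for the WavWriter in the question; it writes raw bytes only and knows nothing about the WAV format.

```csharp
using System;
using System.IO;

// Hypothetical stand-in for WavWriter: shows only the position guard.
sealed class GuardedWriter
{
    private readonly Stream _stream;
    private long _expectedPosition;

    public GuardedWriter(Stream stream)
    {
        // Validate arguments up front, as suggested above.
        if (stream is null) throw new ArgumentNullException(nameof(stream));
        if (!stream.CanWrite) throw new ArgumentException("Stream must be writable.", nameof(stream));
        if (!stream.CanSeek) throw new ArgumentException("Stream must be seekable.", nameof(stream));

        _stream = stream;
        _expectedPosition = stream.Position;
    }

    public void Write(byte[] data)
    {
        // Fail fast if someone else moved the stream since our last write.
        if (_stream.Position != _expectedPosition)
            throw new InvalidOperationException("Stream position was changed externally.");

        _stream.Write(data, 0, data.Length);
        _expectedPosition = _stream.Position;
    }
}
```

This only detects interference at the next write; it cannot detect someone who moves the stream, writes, and moves it back.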

FileStream.Read() - bytes read

FileStream.Read() returns the number of bytes read, but is there any situation, other than having reached the end of the file, in which it will read fewer bytes than requested and not throw an exception?
the documentation says:
The Read method returns zero only after reaching the end of the stream. Otherwise, Read always reads at least one byte from the stream before returning. If no data is available from the stream upon a call to Read, the method will block until at least one byte of data can be returned. An implementation is free to return fewer bytes than requested even if the end of the stream has not been reached.
But this doesn't quite explain in what situations data would be unavailable and cause the method to block until it can read again. I mean, shouldn't most situations where data is unavailable force an exception?
What are real situations where comparing the number of bytes read against the number of expected bytes could differ (assuming that we're already checking for end of file when we mention number of bytes expected)?
EDIT: A bit more information, reason why I'm asking this is because I've come across a bit of code where the developer pretty much did something like this:
bytesExpected = (remainingBytesInFile > 94208 ? 94208 : remainingBytesInFile);
while (bytesRead < bytesExpected)
{
    bytesRead += fileStream.Read(buffer, bytesRead, bytesExpected - bytesRead);
}
Now, I can't see any advantage to having this while at all, I'd expect it to throw an exception if it can't read the number of bytes expected (bearing in mind it's already taking into account that there are those many bytes left to read)
What would the reason one could possibly have for something like this? I'm sure I'm missing something
The documentation is for Stream.Read, from which FileStream is derived. Since FileStream is a stream, it should obey the stream contract. Not all streams do, but unless you have a very good reason, you should stick to that.
In a typical file stream, you'll only get a return value smaller than count when you reach the end of file (and it's a pretty simple way of checking for the end of file).
However, in a NetworkStream, for example, you keep reading in a loop until the method returns zero - signalling the end of stream. The same works for file streams - you know you're at the end of the file when Read returns zero.
Most importantly, FileStream isn't just for what you'd consider files - it's also for pseudo-files like standard input/output pipes and COM ports, for example (try opening a file stream on PRN, for example). In that case, you're not reading a file with a fixed length, and the behaviour is the same as with NetworkStream.
Finally, don't forget that FileStream isn't sealed. It's perfectly fine for you to implement a virtualized file system, for example - and it's perfectly fine if your virtualized file system doesn't support seeking, or checking the length of file.
EDIT:
To address your edit, this is exactly how you're supposed to read any stream. Nothing wrong with it. If there's nothing else to read in a stream, the Read method will simply return 0, and you know the stream is over. The only thing is, it seems that he tries to fill his buffer to full, one buffer at a time - this only makes sense if you explicitly need to partition the file by 94208 bytes, and pass that byte[] for further processing somewhere.
If that's not the case, you don't really need to fill the full buffer - you just keep reading (and probably writing on some other side) until Read returns 0. And indeed, by default, FileStream will always fill the whole buffer unless it's built around a pipe handle - but since that's a possibility, you shouldn't rely on the "real file" behaviour, so as long as you need those byte[] for something non-stream (e.g. parsing messages), this is entirely fine. If you're only using the stream as an actual stream, and you're streaming the data somewhere else, it doesn't have a point, really - you only need one while to read the file.
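For the case where a full buffer really is needed, the fill loop might be sketched as below. StreamHelpers.ReadUpTo is an illustrative name, not a framework method. Unlike the loop in the question, it stops when Read returns 0, so a file that turns out shorter than expected cannot cause an endless loop.

```csharp
using System;
using System.IO;

static class StreamHelpers
{
    // Tries to fill buffer[0..count). Returns the number of bytes actually
    // read, which is smaller than count only if the stream ended first.
    public static int ReadUpTo(Stream stream, byte[] buffer, int count)
    {
        int total = 0;
        while (total < count)
        {
            int n = stream.Read(buffer, total, count - total);
            if (n == 0)
                break; // end of stream: stop instead of spinning forever
            total += n;
        }
        return total;
    }
}
```

On .NET 7 and later, Stream.ReadExactly and Stream.ReadAtLeast cover the same ground, so a hand-written helper like this is mainly useful on older targets.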
Your expectations would only apply when the stream is reading data from a no-latency source. Other I/O sources can be slow, which is why the Read method might not always be able to return immediately. That doesn't mean that there is an error (so no exception), just that it has to wait for data to arrive.
Examples: network stream, file stream on slow disk, etc.
(UPDATE, HDD example) To give an example specific to files (since your case is FileStream, although Read is defined on Stream and so all implementations should fulfill the requirements): mechanical hard drives go to "sleep" when not active (especially on battery-powered devices such as laptops). Spinning up can take a second or so. That is not an IOException, but your read would have to wait for a second before any data is read.
The simple answer is that on a FileStream it probably never happens.
However, keep in mind that the Read method is inherited from Stream, which serves as the base for many other streams such as NetworkStream. In that case you may not be able to read as many bytes as you requested, simply because they haven't been received from the network yet.
So like the documentation says it all depends on the implementation of the specific type of stream - FileStream, NetworkStream, etc.

MemoryStream read/write and data length

I have a MemoryStream / BinaryWriter, which I use as follows:
memStram = new MemoryStream();
memStramWriter = new BinaryWriter(memStram);
memStramWriter.Write(byteArrayData);
Now, to read, I do the following:
byte[] data = new byte[this.BulkSize];
int readed = this.memStram.Read(data, 0, Math.Min(this.BulkSize,(int)memStram.Length));
My two questions are:
After I read, the position moves to currentPosition + readed. Will memStram.Length change?
I want to reset the stream (as if I had just created it). Can I do the following instead of calling Dispose and new again? If not, is there any faster way than Dispose and new?
memStram.Position = 0;
memStram.SetLength(0);
Thanks.
Joseph
No; why should Length (i.e. data size) change on read?
Yes; SetLength(0) is faster: there's no overhead with memory allocation and re-allocation in this case.
1: After I read, the position moves to currentPosition + readed. Will memStram.Length change?
Reading doesn't usually change the .Length - just the .Position; but strictly speaking, it is a bad idea even to look at the .Length and .Position when reading (and often: when writing), as that is not supported on all streams. Usually, you read until (one of, depending on the scenario):
until you have read an expected number of bytes, for example via some length-header that told you how much to expect
until you see a sentinel value (common in text protocols; not so common in binary protocols)
until the end of the stream (where Read returns a non-positive value)
I would also probably say: don't use BinaryWriter. There doesn't seem to be anything useful that it is adding over just using Stream.
2: I want to reset the stream (as if I had just created it). Can I do the following instead of calling Dispose and new again? If not, is there any faster way than Dispose and new?
Yes, SetLength(0) is fine for MemoryStream. It isn't necessarily fine in all cases (for example, it won't make much sense on a NetworkStream).
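A small sketch of the reset, assuming a resizable MemoryStream. MemoryStreamReuse.Reset is a made-up helper name.

```csharp
using System.IO;

static class MemoryStreamReuse
{
    // Empties the stream so it can be reused; the allocated buffer is kept,
    // so later writes do not have to grow it again from scratch.
    public static void Reset(MemoryStream stream)
    {
        stream.Position = 0;
        stream.SetLength(0);
    }
}
```

After writing four bytes and reading two of them, Position is 2 and Length is still 4. After Reset, both are 0 while Capacity is unchanged.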
No, the length should not change, and you can easily inspect that with a watch variable.
I would use the using statement, so the syntax is more elegant and clear and you will not forget to dispose of the stream later.

How to optimize "real-time" C# write-to-file & MATLAB read-from-file operation

I am trying to find a good method to write data from a NetworkStream (via C#) to a text file while "quasi-simultaneously" reading the newly written data from the text file into Matlab.
Basically, is there a good method or technique for coordinating write/read operations (from separate programs) such that a read operation does not block a write operation (and vice-versa) and the lag between successive write/reads is minimized?
Currently I am just writing (appending) data from the network stream to a text file via a WriteLine loop, and reading the data by looping Matlab's fscanf function which also marks the last element read and repositions the file-pointer to that spot.
Relevant portions of C# code:
(Note: The loop conditions I'm using are arbitrary, I'm just trying to see what works right now.)
NetworkStream network_stream = tcp_client.GetStream();
string path = @"C:\Matlab\serial_data.txt";
FileInfo file_info = new FileInfo(path);
using (StreamWriter writer = file_info.CreateText())
{
    string foo = "";
    writer.WriteLine(foo);
}
using (StreamWriter writer = File.AppendText(path))
{
    byte[] buffer = new byte[1];
    int maxlines = 100000;
    int lines = 0;
    while (lines <= maxlines)
    {
        network_stream.Read(buffer, 0, buffer.Length);
        byte byte2string = buffer[0];
        writer.WriteLine(byte2string);
        lines++;
    }
}
Relevant Matlab Code:
i=0;
while i<100;
a = fopen('serial_data.txt');
b = fscanf(a, '%g', [1000 1]);
fclose(a);
i=i+1;
end
When I look at the data read into Matlab, there are large stretches of zeros in between the actual data, and the most disconcerting part is that the number of consecutive data points read between these "false zero" stretches varies drastically.
I was thinking about trying to insert some delays (Thread.sleep and wait(timerObject)) into C# and Matlab, respectively, but even then, I don't feel confident that will guarantee I always obtain the data received over the network stream, which is imperative.
Any advice/suggestions would be greatly appreciated.
Looks like there's an issue with how fscanf is being used in the reader on the Matlab side.
The reader code looks like it's going to reread the entire file each time through the loop, because it's re-opening it on each pass through the loop. Is this intentional? If you want to track the end of a file, you probably want to keep the file handle open, and just keep checking to see if you can read further data from it with repeated fscanf calls on the same open filehandle.
Also, that fscanf call looks like it might always return a zero-padded 1000-element array, regardless of how large the file it read was. Maybe that's where your "false zeros" are coming from. How many there are would vary with how much data is actually in the file and how often the Matlab code read it between writes. Grab the second argout of fscanf to see how many elements it actually read.
[b,nRead] = fscanf(a, '%g', [1000 1]);
fprintf('Read %d numbers\n', nRead);
b = b(1:nRead);
Check the doc page for fscanf. In the "Output Arguments" section: "If the input contains fewer than sizeA elements, MATLAB® pads A with zeros."
And then you may want to look at this question: How can I do an atomic write/append in C#, or how do I get files opened with the FILE_APPEND_DATA flag?. Keeping the writes shorter than the output stream's buffer (like they are now) will make them atomic, and flushing after each write will make them visible to the reader in a timely manner.
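On the C# side, "flush after each write" might be sketched like this. SharedLogWriter and OpenForAppend are hypothetical names, and whether the reading program can open the file at the same time also depends on the sharing mode it requests, which this sketch does not control.

```csharp
using System.IO;

static class SharedLogWriter
{
    // Opens the file for appending while still allowing other processes
    // to open it for reading at the same time.
    public static StreamWriter OpenForAppend(string path)
    {
        var file = new FileStream(path, FileMode.Append, FileAccess.Write, FileShare.Read);

        // AutoFlush pushes every WriteLine through to the underlying stream,
        // so a reader sees each line soon after it is written.
        return new StreamWriter(file) { AutoFlush = true };
    }
}
```

Each WriteLine call then becomes one short write that is flushed immediately, at the cost of more system calls than a buffered writer would make.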

One time gives exception another time doesnt

I use multithreading to copy a file to another location. Each thread reads the bytes it needs from the stream into a byte array and then disposes of the stream. In this example I use 7 threads to copy a 3 GB file. The first thread gets its byte array, but in the second thread a 'System.OutOfMemoryException' occurs.
public void Begin()
{
    FileStream stream = new FileStream(pathToFile, FileMode.Open);
    stream.Position = (threNmb - 1) * 536870912;
    BinaryReader reader = new BinaryReader(stream);
    for (long i = 0; i < (length); i++)
    {
        source.Add(reader.ReadByte());//gives exception at i=134217728
    }
    reader.Dispose();
    reader.Close();
    stream.Dispose();
    stream.Close();
}
It looks like you're using a List<byte>. That's going to be a very inefficient way of copying data, and you're probably making it less efficient by using multiple threads. Additionally, if you're using a single list from multiple threads, your code is already broken - List<T> isn't thread-safe. Even if it were, you'd be mixing the data from the different threads, so you wouldn't be able to reassemble the original data. Oh, and by not using using statements, if an exception is thrown you're leaving file handles open. In other words, I'm advising you to completely abandon your current approach.
Instead, copying a file from one place to another (assuming you can't use File.Copy for some reason) should basically be a case of:
1. Open stream to read
2. Open stream to write
3. Allocate buffer (e.g. 32K)
4. Repeatedly:
   - Read from the "input" stream into the buffer, noting how much data the call actually read
   - Write the data you've just read into the output stream
   - Loop until your "Read" indicates the end of the input stream
5. Close both streams (with using statements)
There's no need to have everything in memory. Note that in .NET 4 this is made even easier with the Stream.CopyTo method, which does the third and fourth steps for you.
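The steps above could be sketched as follows. FileCopier is an illustrative name, and the 32K buffer size is just the example value from the list.

```csharp
using System.IO;

static class FileCopier
{
    public static void Copy(string sourcePath, string destinationPath)
    {
        // The using declarations close both streams even if an exception is thrown.
        using FileStream input = File.OpenRead(sourcePath);
        using FileStream output = File.Create(destinationPath);

        byte[] buffer = new byte[32 * 1024];
        int bytesRead;

        // Read returns 0 only at the end of the input stream.
        while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            // Write only the bytes this call actually read.
            output.Write(buffer, 0, bytesRead);
        }
    }
}
```

Replacing the buffer and the loop with input.CopyTo(output) gives the same result with less code. Either way, memory use stays at one buffer regardless of file size.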
