MemoryStream read/write and data length - c#

I have a MemoryStream / BinaryWriter, and I use them as follows:
memStram = new MemoryStream();
memStramWriter = new BinaryWriter(memStram);
memStramWriter.Write(byteArrayData);
Now, to read, I do the following:
byte[] data = new byte[this.BulkSize];
int readed = this.memStram.Read(data, 0, Math.Min(this.BulkSize, (int)memStram.Length));
My two questions are:
After I read, the position moves to currentPosition + readed. Does memStram.Length change?
I want to reset the stream (as if I had just created it). Can I do the following instead of using Dispose and new again? If not, is there a faster way than Dispose and new?
memStram.Position = 0;
memStram.SetLength(0);
Thanks.
Joseph

No; why should Length (i.e. data size) change on read?
Yes; SetLength(0) is faster: there's no overhead with memory allocation and re-allocation in this case.

1: After I read, the position moves to currentPosition + readed. Does memStram.Length change?
Reading doesn't usually change the .Length, just the .Position. Strictly speaking, though, it is a bad idea even to look at .Length and .Position when reading (and often when writing), as they are not supported on all streams. Usually, you read until one of the following, depending on the scenario:
you have read an expected number of bytes, for example via some length-header that told you how much to expect
you see a sentinel value (common in text protocols; not so common in binary protocols)
you reach the end of the stream (where Read returns a non-positive value)
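As a minimal sketch of the first case (reading an expected number of bytes), the loop below keeps calling Read until the requested count has arrived. The class name StreamReadSketch and the helper ReadExactly are hypothetical names chosen for illustration, not part of the original answer.

```csharp
using System;
using System.IO;

public static class StreamReadSketch
{
    // Hypothetical helper: reads exactly 'count' bytes, or throws if the stream ends first.
    public static byte[] ReadExactly(Stream source, int count)
    {
        var buffer = new byte[count];
        int total = 0;
        while (total < count)
        {
            // Read may return fewer bytes than requested; 0 means end of stream.
            int n = source.Read(buffer, total, count - total);
            if (n == 0)
                throw new EndOfStreamException("Stream ended after " + total + " of " + count + " bytes.");
            total += n;
        }
        return buffer;
    }
}
```

This relies only on the return value of Read, so it should behave the same on a MemoryStream, a FileStream, or a NetworkStream, without touching .Length or .Position.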
I would also probably say: don't use BinaryWriter. There doesn't seem to be anything useful that it is adding over just using Stream.
2: I want to reset the stream (as if I had just created it). Can I do the following instead of using Dispose and new again? If not, is there a faster way than Dispose and new?
Yes, SetLength(0) is fine for MemoryStream. It isn't necessarily fine in all cases (for example, it won't make much sense on a NetworkStream).
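Both answers can be checked with a small experiment. The sketch below (the class name MemoryStreamResetSketch is hypothetical) records Length and Position after a write, after a read, and after the reset from the question.

```csharp
using System;
using System.IO;

public static class MemoryStreamResetSketch
{
    // Returns { lengthAfterWrite, lengthAfterRead, positionAfterRead, lengthAfterReset, positionAfterReset }.
    public static long[] Demo()
    {
        var stream = new MemoryStream();
        stream.Write(new byte[] { 10, 20, 30, 40 }, 0, 4);
        long lengthAfterWrite = stream.Length;

        stream.Position = 0;                      // rewind, otherwise Read starts at the end
        var data = new byte[2];
        stream.Read(data, 0, data.Length);
        long lengthAfterRead = stream.Length;     // reading does not change Length
        long positionAfterRead = stream.Position; // only Position advances

        // Reset as in the question: the stream is empty again and can be reused.
        stream.Position = 0;
        stream.SetLength(0);

        return new[] { lengthAfterWrite, lengthAfterRead, positionAfterRead, stream.Length, stream.Position };
    }
}
```

Note that after writing, the position sits at the end of the data, so the stream has to be rewound before anything can be read back.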

No, the length should not change, and you can easily verify that with a watch variable.
I would use the using statement, so the syntax is more elegant and clear, and you will not forget to dispose the stream later.

Related

C# Reading, Modifying then writing binary data to file. Best convention?

I'm new to programming in general (My understanding of programming concepts is still growing.). So this question is about learning, so please provide enough info for me to learn but not so much that I can't, thank you.
(I would also like input on how to make the code reusable within the project.)
The goal of the project I'm working on consists of:
Read binary file.
I have known offsets I need to read to find a particular chunk of data from within this file.
First offset is first 4 bytes(Offset for end of my chunk).
Second offset is 16 bytes from end of file. I read for 4 bytes.(Gives size of chunk in hex).
Third offset is the 4 bytes following previous, read for 4 bytes(Offset for start of chunk in hex).
Locate parts in the chunk to modify by searching ASCII text as well as offsets.
Now I have the start offset, end offset and size of my chunk.
This should allow me to read bytes from file into a byte array and know the size of the array ahead of time.
(Questions: 1. Is knowing the size important? Other than verification. 2. Is reading part of a file into a byte array in order to change bytes and overwrite that part of the file the best method?)
So far I have managed to read the offsets from the file using BinaryReader on a MemoryStream. I then locate the chunk of data I need and read that into a byte array.
I'm stuck in several ways:
What are the best practices for binary Reading / Writing?
What's the best storage convention for the data that is read?
When I need to modify bytes, how do I go about that?
Should I be using FileStream?
Since you want to both read and write, it makes sense to use the FileStream class directly (using FileMode.Open and FileAccess.ReadWrite). See FileStream on MSDN for a good overall example.
You do need to know the number of bytes that you are going to be reading from the stream. See the FileStream.Read documentation.
Fundamentally, you have to read the bytes into memory at some point if you're going to use and later modify their contents. So you will have to make an in-memory copy (using the Read method is the right way to go if you're reading a variable-length chunk at a time).
As for best practices, always dispose your streams when you're done; e.g.:
using (var stream = File.Open(FILE_NAME, FileMode.Open, FileAccess.ReadWrite))
{
//Do work with the FileStream here.
}
If you're going to do a large amount of work, you should be doing the work asynchronously. (Let us know if that's the case.)
And, of course, check the FileStream.Read documentation and also the FileStream.Write documentation before using those methods.
Reading bytes is best done by pre-allocating an in-memory array of bytes with the length that you're going to read, then reading those bytes. The following will read the chunk of bytes that you're interested in, let you do work on it, and then replace the original contents (assuming the length of the chunk hasn't changed):
EDIT: I've added a helper method to do work on the chunk, per the comments on variable scope.
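using (var stream = File.Open(FILE_NAME, FileMode.Open, FileAccess.ReadWrite))
{
    var chunk = new byte[numOfBytesInChunk];
    var offsetOfChunkInFile = stream.Position; // It sounds like you've already calculated this.
    // Read may return fewer bytes than requested, so loop until the chunk is full.
    int total = 0;
    while (total < numOfBytesInChunk)
    {
        int n = stream.Read(chunk, total, numOfBytesInChunk - total);
        if (n == 0) throw new EndOfStreamException();
        total += n;
    }
    DoWorkOnChunk(ref chunk);
    // Go back to where the chunk started and overwrite it in place.
    stream.Seek(offsetOfChunkInFile, SeekOrigin.Begin);
    stream.Write(chunk, 0, numOfBytesInChunk);
}
private void DoWorkOnChunk(ref byte[] chunk)
{
    //TODO: Any mutation done here to the data in 'chunk' will be written out to the stream.
}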
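For the offset lookup described in the question, a rough sketch could look like the following. The layout (end offset in the first 4 bytes, then size and start offset 16 bytes before the end of the file) is taken from the question; the little-endian byte order, the unsigned 32-bit field type, and the names ChunkLocatorSketch and ReadChunkInfo are assumptions made for illustration.

```csharp
using System;
using System.IO;
using System.Text;

public static class ChunkLocatorSketch
{
    // Assumes little-endian 32-bit fields, which is what BinaryReader reads.
    public static void ReadChunkInfo(Stream file, out uint end, out uint size, out uint start)
    {
        // leaveOpen: true, so disposing the reader does not close the caller's stream.
        using (var reader = new BinaryReader(file, Encoding.UTF8, true))
        {
            file.Seek(0, SeekOrigin.Begin);
            end = reader.ReadUInt32();      // first 4 bytes: offset of the end of the chunk

            file.Seek(-16, SeekOrigin.End);
            size = reader.ReadUInt32();     // 16 bytes from the end: size of the chunk
            start = reader.ReadUInt32();    // the following 4 bytes: offset of the start of the chunk
        }
    }
}
```

Knowing the size up front is useful beyond verification: it lets you allocate the byte array once and detect a truncated file when fewer bytes arrive than expected.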

MemoryStream.WriteTo(Stream destinationStream) versus Stream.CopyTo(Stream destinationStream)

Which one is better : MemoryStream.WriteTo(Stream destinationStream) or Stream.CopyTo(Stream destinationStream)??
I am talking about the comparison of these two methods without Buffer as I am doing like this :
Stream str = File.OpenRead("SomeFile.file");
MemoryStream mstr = new MemoryStream(File.ReadAllBytes("SomeFile.file"));
using(var Ms = File.Create("NewFile.file", 8 * 1024))
{
str.CopyTo(Ms) or mstr.WriteTo(Ms);// Which one will be better??
}
Update
Here is what I want to Do :
Open File [ Say "X" Type File]
Parse the Contents
From here I get a Bunch of new Streams [ 3 ~ 4 Files ]
Parse One Stream
Extract Thousands of files [ The Stream is an Image File ]
Save the Other Streams To Files
Editing all the Files
Generate a New "X" Type File.
I have written every bit of code which is actually working correctly..
But Now I am optimizing the code to make the most efficient.
It is an historical accident that there are two ways to do the same thing. MemoryStream always had the WriteTo() method, Stream didn't acquire the CopyTo() method until .NET 4.
The MemoryStream.WriteTo() version looks like this:
public virtual void WriteTo(Stream stream)
{
// Exception throwing code elided...
stream.Write(this._buffer, this._origin, this._length - this._origin);
}
The Stream.CopyTo() implementation like this:
private void InternalCopyTo(Stream destination, int bufferSize)
{
int num;
byte[] buffer = new byte[bufferSize];
while ((num = this.Read(buffer, 0, buffer.Length)) != 0)
{
destination.Write(buffer, 0, num);
}
}
Stream.CopyTo() is more universal: it works for any stream. It also helps programmers who fumble copying data from, say, a NetworkStream; forgetting to pay attention to the return value from Read() was a very common bug. But it of course copies the bytes twice and allocates that temporary buffer. MemoryStream doesn't need it, since it can write directly from its own buffer, so you'd still prefer WriteTo(). You are not very likely to notice the difference, though.
MemoryStream.WriteTo: Writes the entire contents of this memory stream to another stream.
Stream.CopyTo: Reads the bytes from the current stream and writes them to the destination stream. Copying begins at the current position in the current stream.
You'll need to seek back to 0, to get the whole source stream copied.
So I think MemoryStream.WriteTo better option for this situation
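The position dependence is easy to see in a small experiment. In this sketch (the class name CopyVersusWriteSketch is hypothetical), the source stream has just been written to, so its position is at the end of the data.

```csharp
using System;
using System.IO;

public static class CopyVersusWriteSketch
{
    // Returns the number of bytes that arrived via { WriteTo, CopyTo without rewinding, CopyTo after rewinding }.
    public static long[] Demo()
    {
        var source = new MemoryStream();
        source.Write(new byte[] { 1, 2, 3, 4 }, 0, 4);   // Position is now 4, the end of the data

        var viaWriteTo = new MemoryStream();
        source.WriteTo(viaWriteTo);                      // ignores Position, writes the entire contents

        var viaCopyTo = new MemoryStream();
        source.CopyTo(viaCopyTo);                        // starts at Position 4, so nothing is copied

        source.Position = 0;                             // seek back to the start
        var viaCopyToRewound = new MemoryStream();
        source.CopyTo(viaCopyToRewound);                 // now copies everything

        return new[] { viaWriteTo.Length, viaCopyTo.Length, viaCopyToRewound.Length };
    }
}
```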
If you use Stream.CopyTo, you don't need to read all the bytes into memory to start with. However:
This code would be simpler if you just used File.Copy
If you are going to load all the data into memory, you can just use:
byte[] data = File.ReadAllBytes("input");
File.WriteAllBytes("output", data);
You should have a using statement for the input as well as the output stream
If you really need processing so can't use File.Copy, using Stream.CopyTo will cope with larger files than loading everything into memory. You may not need that, of course, or you may need to load the whole file into memory for other reasons.
If you have got a MemoryStream, I'd probably use MemoryStream.WriteTo rather than Stream.CopyTo, but it probably won't make much difference which you use, except that you need to make sure you're at the start of the stream when using CopyTo.
I think Hans Passant's claim of a bug in MemoryStream.WriteTo() is wrong; it does not "ignore the return value of Write()". Stream.Write() returns void, which implies to me that the entire count bytes are written, which implies that Stream.Write() will block as necessary to complete the operation to, e.g., a NetworkStream, or throw if it ultimately fails.
That is indeed different from the write() system call in *nix, and its many emulations in libc and so forth, which can return a "short write". I suspect Hans leaped to the conclusion that Stream.Write() followed that, which I would have expected, too, but apparently it does not.
It is conceivable that Stream.Write() could perform a "short write", without returning any indication of that, requiring the caller to check that the Position property of the Stream has actually been advanced by count. That would be a very error-prone API, and I doubt that it does that, but I have not thoroughly tested it. (Testing it would be a bit tricky: I think you would need to hook up a TCP NetworkStream with a reader on the other end that blocked forever, and write enough to fill up the wire buffers. Or something like that...)
The comments for Stream.Write() are not quite unambiguous:
Summary:
When overridden in a derived class, writes a sequence of bytes to the current
stream and advances the current position within this stream by the number
of bytes written.
Parameters: buffer:
An array of bytes. This method copies count bytes from buffer to the current stream.
Compare that to the Linux man page for write(2):
write() writes up to count bytes from the buffer pointed buf to the file referred to by the file descriptor fd.
Note the crucial "up to". That sentence is followed by explanation of some of the conditions under which a "short write" might occur, making it very explicit that it can occur.
This is really a critical issue: we need to know how Stream.Write() behaves, beyond all doubt.
The CopyTo method creates a buffer, populates it with data from the original stream, and then calls the Write method, passing the created buffer as a parameter. WriteTo uses the MemoryStream's internal buffer to write. That is the difference. Which is better is up to you to decide.
Creating a MemoryStream from a HttpInputStream in Vb.Net:
Dim filename As String = MyFile.PostedFile.FileName
Dim fileData As Byte() = Nothing
Using binaryReader = New BinaryReader(MyFile.PostedFile.InputStream)
binaryReader.BaseStream.Position = 0
fileData = binaryReader.ReadBytes(MyFile.PostedFile.ContentLength)
End Using
Dim memoryStream As MemoryStream = New MemoryStream(fileData)

Is there a way to mark the end of each protobuf-net record

I am saving a series of protobuf-net objects in a database cell as a Byte[] of length-prefixed protobuf-net objects:
// retrieve existing protobufs from database and convert to Byte[]
object q = sql_agent_cmd.ExecuteScalar();
Byte[] olderPbfs = (Byte[])q;
// serialize the new pbf to add into MemoryStream m
// now write p and the new pbf-net Byte[] into a memory stream and retrieve the sum
var s = new System.IO.MemoryStream();
s.Write(olderPbfs, 0, olderPbfs.Length);
s.Write(m.GetBuffer(), 0, m.ToArray().Length); // append new bytes at the end of old
Byte[] sumPbfs = s.ToArray();
// sumPbfs = old pbfs + new pbf. Insert sumPbfs into database
This works fine. My concern is what happens if there is slight db corruption. It will no longer be possible to know which byte is the length prefix, and the entire cell contents would have to be discarded. Wouldn't it be advisable to also use some kind of an end-of-pbf-object indicator (kind of like the \n or EOF indicators used in text files)? This way, even if a record gets corrupted, the other records would be recoverable.
If so, what is the recommended way to add end-of-record indicators at the end of each pbf?
Using protobuf-netv2 and C# on Visual Studio 2010.
Thanks
Manish
If you use a vanilla message via Serialize / Deserialize, then no: that isn't part of the specification (because the format is designed to be appendable).
If, however, you use SerializeWithLengthPrefix, it will dump the length at the start of the message; it will then know in advance how much data is expected. You deserialize with DeserializeWithLengthPrefix, and it will complain loudly if it doesn't have enough data. However! It will not complain if you have extra data, since this too is designed to be appendable.
Regarding Jon's reply: by default, the data stored by the *WithLengthPrefix methods is exactly identical to what Jon suggests; they pretend there is a wrapper object and behave accordingly. The differences are:
no wrapper object actually exists
the "withlengthprefix" methods explicitly stop after a single occurrence, rather than merging any later data into the same object (useful for, say, sending multiple discrete objects to a single file, or down a single socket)
The difference between the two "appendable"s here is that the first means "merge into a single object", whereas the second means "I expect multiple records".
Unrelated suggestion:
s.Write(m.GetBuffer(), 0, m.ToArray().Length);
should be:
s.Write(m.GetBuffer(), 0, (int)m.Length);
(no need to create an extra buffer)
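A short sketch of why the count has to come from Length: GetBuffer returns the stream's internal array, which is usually larger than the data written so far, while ToArray allocates a trimmed copy. The names GetBufferSketch and Append are hypothetical.

```csharp
using System;
using System.IO;

public static class GetBufferSketch
{
    // Appends the contents of 'newer' to 'older' without the extra copy that ToArray would make.
    // Assumes 'newer' was created with a constructor that exposes its buffer (e.g. new MemoryStream()).
    public static byte[] Append(byte[] older, MemoryStream newer)
    {
        var combined = new MemoryStream(older.Length + (int)newer.Length);
        combined.Write(older, 0, older.Length);
        // The internal array may be longer than the valid data, so limit the write to Length.
        combined.Write(newer.GetBuffer(), 0, (int)newer.Length);
        return combined.ToArray();
    }
}
```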
(Note: I don't know much about protobuf-net itself, but this is generally applicable to Protocol Buffer messages.)
Typically if you want to record multiple messages, it's worth just putting a "wrapper" message - make the "real" message a repeated field within that. Each "real" message will then be length prefixed by the natural wire format of Protocol Buffers.
This won't detect corruption of course - but to be honest, if the database ends up getting corrupted you've got bigger problems. You could potentially detect corruption, e.g. by keeping a hash along with each record... but you need to consider the possibility of the corruption occurring within the length prefix, or within the hash itself. Think about what you're really trying to achieve here - what scenarios you're trying to protect against, and what level of recovery you need.
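To make the "hash along with each record" idea concrete, here is a rough sketch using only the base class library. The frame layout (a 4-byte length, the payload, then a SHA-256 of the payload) and the names HashedRecordSketch, WriteRecord, and ReadRecord are assumptions for illustration; this is not protobuf-net's wire format, and the payload would be whatever bytes the serializer produced.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

public static class HashedRecordSketch
{
    // Hypothetical frame: [int32 length][payload][32-byte SHA-256 of payload].
    public static void WriteRecord(Stream destination, byte[] payload)
    {
        using (var sha = SHA256.Create())
        using (var writer = new BinaryWriter(destination, Encoding.UTF8, true)) // leave stream open
        {
            writer.Write(payload.Length);
            writer.Write(payload);
            writer.Write(sha.ComputeHash(payload));
        }
    }

    // Returns the payload, or null when the record is truncated or its hash does not match.
    public static byte[] ReadRecord(Stream source)
    {
        using (var sha = SHA256.Create())
        using (var reader = new BinaryReader(source, Encoding.UTF8, true))
        {
            int length = reader.ReadInt32();
            if (length < 0) return null;                 // corrupted length prefix
            byte[] payload = reader.ReadBytes(length);
            byte[] storedHash = reader.ReadBytes(32);
            if (payload.Length != length || storedHash.Length != 32) return null;
            return sha.ComputeHash(payload).SequenceEqual(storedHash) ? payload : null;
        }
    }
}
```

As the answer warns, this only detects damage inside a payload or hash. If the length prefix itself is corrupted, the reader loses track of where the next record starts, so later records are not recoverable with this scheme alone.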

Stream.Seek(0, SeekOrigin.Begin) or Position = 0

When you need to reset a stream to beginning (e.g. MemoryStream) is it best practice to use
stream.Seek(0, SeekOrigin.Begin);
or
stream.Position = 0;
I've seen both work fine, but wondered if one was more correct than the other?
Use Position when setting an absolute position and Seek when setting a relative position. Both are provided for convenience so you can choose one that fits the style and readability of your code. Accessing Position requires the stream be seekable so they're safely interchangeable.
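A small sketch of the two styles side by side (the class name RewindSketch is hypothetical): both rewind the stream to the same place, and Seek additionally returns the new position and handles relative moves.

```csharp
using System;
using System.IO;

public static class RewindSketch
{
    // Returns { positionAfterAssignment, valueReturnedBySeek, positionAfterSeek, positionAfterRelativeSeek }.
    public static long[] Demo()
    {
        var stream = new MemoryStream(new byte[] { 1, 2, 3, 4, 5 });

        stream.Position = 4;
        stream.Position = 0;                                      // absolute rewind via the property
        long afterAssignment = stream.Position;

        stream.Position = 4;
        long returned = stream.Seek(0, SeekOrigin.Begin);         // absolute rewind via Seek
        long afterSeek = stream.Position;

        long afterRelative = stream.Seek(2, SeekOrigin.Current);  // relative move: 0 + 2

        return new[] { afterAssignment, returned, afterSeek, afterRelative };
    }
}
```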
You can look at the source code for both methods to find out:
Position property
https://referencesource.microsoft.com/#mscorlib/system/io/memorystream.cs,320
Seek method
https://referencesource.microsoft.com/#mscorlib/system/io/memorystream.cs,482
The cost is almost identical (3 ifs and some arithmetic). However, this is only true for jumping to absolute offsets like Position = 0 and not relative offsets like Position += 0, in which case Seek seems slightly better.
However, keep in mind that we are talking about the performance of a handful of integer operations and if checks, which cannot even be measured accurately with benchmarking methods. As others have already pointed out, there is no significant or detectable difference.
If you are working with files (e.g. with the FileStream class), it seems Seek(0, SeekOrigin.Begin) is able to keep the internal buffer (when possible), while Position = 0 will always discard it.

precautions for reading from a memorystream in c#

I recently came across this web page http://www.yoda.arachsys.com/csharp/readbinary.html explaining what precautions to take when reading from a FileStream. The gist of it is that the following code doesn't always work:
// Bad code! Do not use!
FileStream fs = File.OpenRead(filename);
byte[] data = new byte[fs.Length];
fs.Read (data, 0, data.Length);
This is dangerous as the third argument for Read is a maximum of bytes to be read, and you should use Read's return value to check how much actually got read.
My question is should you take the same precautions when reading from a memorystream and under which circumstances might Read return before all bytes are read?
Well, I believe the current implementation of MemoryStream will always fill the buffer if it can - unless you've got some evil class derived from it. It's not guaranteed though, as far as I can see. The documentation even contains the warning:
An implementation is free to return fewer bytes than requested even if the end of the stream has not been reached.
Personally, I'd always code this defensively unless it makes things much easier. You never know when someone will change the type of stream and not notice what's happened.
Normally with a MemoryStream though, I want all the bytes at once: so I call MemoryStream.ToArray. This is guaranteed to work, and if someone changes the code to not use a MemoryStream, it will fail to compile as that member's only on MemoryStream. For general streams, I use a utility method which reads fully from a stream and returns a byte array.
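A utility method of that kind might look roughly like the sketch below. It is an illustrative version written for this page, not the answerer's actual helper; the names ReadFullySketch and ReadFully and the 8192-byte buffer size are assumptions.

```csharp
using System;
using System.IO;

public static class ReadFullySketch
{
    // Reads from the current position to the end of the stream and returns the bytes.
    public static byte[] ReadFully(Stream source)
    {
        using (var collected = new MemoryStream())
        {
            var buffer = new byte[8192];
            int n;
            // Trust only the return value of Read: it may be less than buffer.Length,
            // and 0 signals the end of the stream.
            while ((n = source.Read(buffer, 0, buffer.Length)) > 0)
            {
                collected.Write(buffer, 0, n);
            }
            return collected.ToArray();
        }
    }
}
```

Because it never consults Length, the same method should keep working if the source is later swapped for a stream type that does not support it.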
I can't think of any reason for a normal MemoryStream. Unmanaged might be a different story.
Anyway, the GetBuffer() and ToArray() methods are always handy. :)
Yes, you should always be aware of how many bytes were actually read from a stream when calling Read. The root cause can vary depending on the stream type, but essentially the return value will be less than the actual buffer size whenever you are trying to read beyond the end of the stream.
Here's what MSDN says about it:
...can be less than the number of bytes requested if that number of bytes are not currently available, or zero if the end of the stream is reached before any bytes are read.
and
An implementation is free to return fewer bytes than requested even if the end of the stream has not been reached.
Note the term "an implementation...".
