Memory-efficient way to read from a COM stream into a C# byte[]

My current approach is to read the COM stream into a C# MemoryStream and then call .ToArray(). However, I believe ToArray() creates a redundant copy of the data. Is there a better way, with reduced memory usage as the priority?
var memStream = new MemoryStream(10000);
var chunk = new byte[1000];
while (true)
{
    int bytesRead = comStream.read(ref chunk, chunk.Length);
    if (bytesRead == 0)
        break; // end of stream
    memStream.Write(chunk, 0, bytesRead);
}

// fairly sure this creates a duplicate copy of the data
var array = memStream.ToArray();
// does this also duplicate the data?
var array2 = memStream.GetBuffer();

If you know the length of the data before you start consuming it, then: you can allocate a simple byte[] and fill it in your read loop, incrementing an offset by the number of bytes read on each iteration (and decrementing the number of bytes you're still allowed to touch). This does depend on having a read overload / API that accepts either an offset or a pointer, though.
If that isn't an option: GetBuffer() is your best bet - it doesn't duplicate the data; rather, it hands you the current, possibly oversized, byte[]. Because it is oversized, you must consider it in combination with the current .Length, perhaps wrapping the length/data pair in an ArraySegment<byte>, or a Span<byte>/Memory<byte>.
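A minimal sketch of the GetBuffer() route, using a stand-alone MemoryStream (the COM-stream plumbing from the question is assumed to have filled it already):

```csharp
using System;
using System.IO;

class GetBufferDemo
{
    static void Main()
    {
        var memStream = new MemoryStream();
        memStream.Write(new byte[] { 1, 2, 3, 4, 5 }, 0, 5);

        // GetBuffer() hands back the internal, possibly oversized array without copying.
        // Only the first memStream.Length bytes are valid data, so pair the buffer
        // with that length via an ArraySegment<byte>.
        var segment = new ArraySegment<byte>(memStream.GetBuffer(), 0, (int)memStream.Length);

        Console.WriteLine(segment.Count); // 5
    }
}
```

One caveat: GetBuffer() throws if the MemoryStream was constructed over a caller-supplied array without the buffer being publicly visible; the parameterless constructor used here is safe.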
In the "the length is known" scenario, if you're happy to work with oversized buffers, you could also consider a leased array, via ArrayPool<byte>.Shared - rent one of at least that size, fill it, then constrain your segment/span to the populated part (and remember to return it to the pool when you're done).
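A sketch of the pooled-buffer variant (assumes .NET Core or the System.Buffers package; a MemoryStream stands in for the COM stream here):

```csharp
using System;
using System.Buffers;
using System.IO;

class ArrayPoolDemo
{
    static void Main()
    {
        const int knownLength = 10_000; // assumed: the data length is known up front

        byte[] rented = ArrayPool<byte>.Shared.Rent(knownLength); // may be larger than requested
        try
        {
            using var source = new MemoryStream(new byte[knownLength]); // stand-in for the COM stream
            int offset = 0;
            int read;
            // Fill the leased array, advancing the offset by the bytes read each time.
            while (offset < knownLength &&
                   (read = source.Read(rented, offset, knownLength - offset)) > 0)
            {
                offset += read;
            }

            // Constrain the view to the populated part of the oversized buffer.
            var data = new ReadOnlySpan<byte>(rented, 0, offset);
            Console.WriteLine(data.Length); // 10000
        }
        finally
        {
            ArrayPool<byte>.Shared.Return(rented); // remember to hand it back to the pool
        }
    }
}
```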

Related

Convert byte[] array to a short[] array with half the length

I have a byte[200] that is read from a file, representing a short[100] in little-endian format. This is how I read it:
using (FileStream fs = new FileStream(_path, FileMode.Open, FileAccess.Read))
{
    //fs.Seek(...)
    byte[] record = new byte[200];
    fs.Read(record, 0, record.Length);
    short[] target = new short[100];
    // magic operation that fills target array
}
I don't know what to put in "magic operation". I've read about BitConverter, but it doesn't seem to have a BitConverter.ToShort operation. Anyway, BitConverter seems to convert in a loop, whereas I would appreciate some way to "block copy" the whole array at once, if possible.
I think you're looking for Buffer.BlockCopy.
Buffer.BlockCopy(record, 0, target, 0, record.Length);
I believe that will preserve the endianness of the architecture you're on - so it may be inappropriate in some environments. You might want to abstract this into a method call which can check (once) whether or not it does what you want (e.g. by converting {0, 1} and seeing whether the result is {1} or {256}) and then either uses Buffer.BlockCopy or does it "manually" in a loop if necessary.
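The probe described above can be sketched like this (the little-endian record bytes are made-up sample values):

```csharp
using System;

class BlockCopyEndianDemo
{
    static void Main()
    {
        // Probe: the bytes {0, 1} become 256 on a little-endian machine, 1 on big-endian.
        byte[] probe = { 0, 1 };
        short[] probeTarget = new short[1];
        Buffer.BlockCopy(probe, 0, probeTarget, 0, 2);
        bool blockCopyIsLittleEndian = probeTarget[0] == 256;

        byte[] record = { 0x34, 0x12, 0x78, 0x56 }; // little-endian shorts 0x1234, 0x5678
        short[] target = new short[record.Length / 2];

        if (blockCopyIsLittleEndian)
        {
            // Fast path: the raw block copy matches the file's byte order.
            Buffer.BlockCopy(record, 0, target, 0, record.Length);
        }
        else
        {
            // Manual fallback: assemble each little-endian short explicitly.
            for (int i = 0; i < target.Length; i++)
                target[i] = (short)(record[2 * i] | (record[2 * i + 1] << 8));
        }

        Console.WriteLine(target[0] == 0x1234 && target[1] == 0x5678); // True
    }
}
```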

I am unable to keep the size of array fixed

I am having trouble keeping the size of an array fixed. I have to read data from a file. When the number of bytes read is less than 160, the array MyPCMBuf changes to the size of the bytes read.
byte[] MyPCMBuf;
MyPCMBuf = new byte[160];
BinaryReader reader = new BinaryReader(File.Open(fileNamein, FileMode.Open));
Array.Clear(MyPCMBuf, 0, 160);
MyPCMBuf = reader.ReadBytes(160); // if bytes read = 20, the size of MyPCMBuf becomes 20
What is going on and how to avoid it?
That's because ReadBytes returns a new byte array.
If you want to read bytes into an existing array, call Read.
That is:
int bytesRead = reader.Read(myBuffer, 0, 160);
You have no problem keeping the size fixed.
ReadBytes returns a new array. It's that simple. The old one never changes size.
If you want to use your buffer, use another method, for example:
public virtual int Read(
    byte[] buffer,
    int index,
    int count
)
on that class. Then you keep your array.
Don't use BinaryReader at all. Use the FileStream returned by Open and call its Read method. If you don't assign to MyPCMBuf its length cannot possibly change.
Unrelated: use Array.Clear(MyPCMBuf, 0, MyPCMBuf.Length); - less redundancy, less potential for errors. Use using. Don't initialize MyPCMBuf if you always overwrite it, and don't clear it redundantly. I see a lot of misunderstandings here; be more rigorous in your approach to programming. It appears you don't really understand all the language features and APIs you are using, and that's dangerous.
You are overwriting MyPCMBuf in line
MyPCMBuf = reader.ReadBytes(160);
Thus, line MyPCMBuf = new byte[160]; is irrelevant in your code.
You are not really doing what you think you are.
reader.ReadBytes(160) will create a new byte array of size at most 160. It's not storing values in the array you already had; that one becomes eligible for garbage collection.
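A small sketch of the difference, using an in-memory stream with only 20 bytes as a stand-in for the file:

```csharp
using System;
using System.IO;

class FixedBufferReadDemo
{
    static void Main()
    {
        byte[] myPCMBuf = new byte[160];
        using var reader = new BinaryReader(new MemoryStream(new byte[20]));

        // Read fills the existing array and reports how many bytes it wrote;
        // the array itself keeps its length.
        int bytesRead = reader.Read(myPCMBuf, 0, myPCMBuf.Length);

        Console.WriteLine($"{bytesRead} bytes read, buffer length {myPCMBuf.Length}");
        // 20 bytes read, buffer length 160
    }
}
```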

Get Length of Data Available in NetworkStream

I would like to be able to get the length of the data available from a TCP network stream in C#, to set the size of the buffer before reading from the network stream. There is a NetworkStream.Length property, but it isn't implemented, and I don't want to allocate an enormous buffer as it would take up too much space. The only way I thought of doing it would be to precede the data transfer with another one telling the size, but this seems a little messy. What would be the best way for me to go about doing this?
When accessing Streams, you usually read and write data in small chunks (e.g. a kilobyte or so), or use a method like CopyTo that does that for you.
This is an example using CopyTo to copy the contents of a stream to another stream and return it as a byte[] from a method, using an automatically-sized buffer.
using (MemoryStream ms = new MemoryStream())
{
    networkStream.CopyTo(ms);
    return ms.ToArray();
}
This is code that reads data in the same way, but more manually, which might be better for you to work with, depending on what you're doing with the data:
byte[] buffer = new byte[2048]; // read in chunks of 2KB
int bytesRead;
while ((bytesRead = networkStream.Read(buffer, 0, buffer.Length)) > 0)
{
    // do something with data in buffer, up to the size indicated by bytesRead
}
(the basis for these code snippets came from Most efficient way of reading data from a stream)
There is no inherent length of a network stream. You will either have to send the length of the data to follow from the other end or read all of the incoming data into a different stream where you can access the length information.
The thing is, you can't really be sure all the data has been read by the socket yet; more data might come in at any time. This is true even if you somehow do know how much data to expect, say if you have a packet header that contains the length: the whole packet might not have been received yet.
If you're reading arbitrary data (like a file perhaps) you should have a buffer of reasonable size (like 1k-10k or whatever you find to be optimal for your scenario) and then write the data to a file as its read from the stream.
var buffer = new byte[1000];
int readBytes;
using (var netstream = GetTheStreamSomhow())
using (var fileStream = GetFileStreamSomeHow())
{
    // Read until the remote side closes the connection (Read returns 0 at end of stream)
    while ((readBytes = netstream.Read(buffer, 0, buffer.Length)) > 0)
    {
        fileStream.Write(buffer, 0, readBytes); // write only the bytes actually read
    }
}
Or just use CopyTo like Tim suggested :) Just make sure that all the data has indeed been read, including data that hasn't gotten across the network yet.
You could send the length of the incoming data first.
For example:
You have data = new byte[16] that you want to send. So you first send the value 16, and define on the server that this length field is always 2 characters long (because "16" has two characters). Now you know that incomingLength = 16, and you can wait for data of length incomingLength.
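A fixed-size binary prefix is usually more robust than a character-count scheme. Here is a sketch over a MemoryStream standing in for the network stream (a real receiver must also cope with the prefix and payload arriving in pieces):

```csharp
using System;
using System.IO;

class LengthPrefixDemo
{
    static void Main()
    {
        byte[] payload = new byte[16];
        var wire = new MemoryStream(); // stand-in for the NetworkStream

        // Sender: a fixed 4-byte length prefix, then the data itself.
        var writer = new BinaryWriter(wire);
        writer.Write(payload.Length);
        writer.Write(payload);
        writer.Flush();

        // Receiver: read the prefix, then exactly that many bytes.
        wire.Position = 0;
        var reader = new BinaryReader(wire);
        int incomingLength = reader.ReadInt32();
        byte[] received = reader.ReadBytes(incomingLength);

        Console.WriteLine($"{incomingLength} / {received.Length}"); // 16 / 16
    }
}
```

BinaryWriter and BinaryReader use little-endian byte order, so both ends must agree on that.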

MemoryStream.WriteTo(Stream destinationStream) versus Stream.CopyTo(Stream destinationStream)

Which one is better : MemoryStream.WriteTo(Stream destinationStream) or Stream.CopyTo(Stream destinationStream)??
I am talking about the comparison of these two methods without Buffer as I am doing like this :
Stream str = File.Open("SomeFile.file", FileMode.Open);
MemoryStream mstr = new MemoryStream(File.ReadAllBytes("SomeFile.file"));
using (var ms = File.Create("NewFile.file", 8 * 1024))
{
    str.CopyTo(ms); // or mstr.WriteTo(ms); -- which one will be better?
}
Update
Here is what I want to do:
Open a file [say, an "X"-type file]
Parse the contents
From here I get a bunch of new streams [3~4 files]
Parse one stream
Extract thousands of files [the stream is an image file]
Save the other streams to files
Edit all the files
Generate a new "X"-type file
I have written every bit of the code and it works correctly, but now I am optimizing it to be as efficient as possible.
It is an historical accident that there are two ways to do the same thing. MemoryStream always had the WriteTo() method, Stream didn't acquire the CopyTo() method until .NET 4.
The MemoryStream.WriteTo() version looks like this:
public virtual void WriteTo(Stream stream)
{
    // Exception throwing code elided...
    stream.Write(this._buffer, this._origin, this._length - this._origin);
}
The Stream.CopyTo() implementation looks like this:
private void InternalCopyTo(Stream destination, int bufferSize)
{
    int num;
    byte[] buffer = new byte[bufferSize];
    while ((num = this.Read(buffer, 0, buffer.Length)) != 0)
    {
        destination.Write(buffer, 0, num);
    }
}
Stream.CopyTo() is more universal: it works for any stream, and it helps programmers who fumble copying data from, say, a NetworkStream - forgetting to pay attention to the return value from Read() was a very common bug. But it of course copies the bytes twice and allocates that temporary buffer; MemoryStream doesn't need it, since it can write directly from its own buffer. So you'd still prefer WriteTo(), although noticing the difference in practice isn't very likely.
MemoryStream.WriteTo: Writes the entire contents of this memory stream to another stream.
Stream.CopyTo: Reads the bytes from the current stream and writes them to the destination stream. Copying begins at the current position in the current stream.
You'll need to seek back to 0, to get the whole source stream copied.
So I think MemoryStream.WriteTo is the better option for this situation.
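The positional difference between the two methods is easy to demonstrate with a small sketch:

```csharp
using System;
using System.IO;

class WriteToVsCopyToDemo
{
    static void Main()
    {
        var source = new MemoryStream(new byte[] { 1, 2, 3, 4 });
        source.Position = 2; // pretend something already consumed part of the stream

        var viaCopyTo = new MemoryStream();
        source.CopyTo(viaCopyTo);   // copies only from the current position: bytes 3 and 4

        var viaWriteTo = new MemoryStream();
        source.WriteTo(viaWriteTo); // writes the entire contents, regardless of Position

        Console.WriteLine($"{viaCopyTo.Length} vs {viaWriteTo.Length}"); // 2 vs 4
    }
}
```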
If you use Stream.CopyTo, you don't need to read all the bytes into memory to start with. However:
This code would be simpler if you just used File.Copy
If you are going to load all the data into memory, you can just use:
byte[] data = File.ReadAllBytes("input");
File.WriteAllBytes("output", data);
You should have a using statement for the input as well as the output stream
If you really need processing so can't use File.Copy, using Stream.CopyTo will cope with larger files than loading everything into memory. You may not need that, of course, or you may need to load the whole file into memory for other reasons.
If you have got a MemoryStream, I'd probably use MemoryStream.WriteTo rather than Stream.CopyTo, but it probably won't make much difference which you use, except that you need to make sure you're at the start of the stream when using CopyTo.
I think Hans Passant's claim of a bug in MemoryStream.WriteTo() is wrong; it does not "ignore the return value of Write()". Stream.Write() returns void, which implies to me that the entire count bytes are written, which implies that Stream.Write() will block as necessary to complete the operation to, e.g., a NetworkStream, or throw if it ultimately fails.
That is indeed different from the write() system call in *nix, and its many emulations in libc and so forth, which can return a "short write". I suspect Hans leaped to the conclusion that Stream.Write() followed that convention, which I would have expected too, but apparently it does not.
It is conceivable that Stream.Write() could perform a "short write", without returning any indication of that, requiring the caller to check that the Position property of the Stream has actually been advanced by count. That would be a very error-prone API, and I doubt that it does that, but I have not thoroughly tested it. (Testing it would be a bit tricky: I think you would need to hook up a TCP NetworkStream with a reader on the other end that blocked forever, and write enough to fill up the wire buffers. Or something like that...)
The doc comments for Stream.Write() are not entirely unambiguous:
Summary: When overridden in a derived class, writes a sequence of bytes to the current stream and advances the current position within this stream by the number of bytes written.
Parameters: buffer: An array of bytes. This method copies count bytes from buffer to the current stream.
Compare that to the Linux man page for write(2):
write() writes up to count bytes from the buffer pointed buf to the file referred to by the file descriptor fd.
Note the crucial "up to". That sentence is followed by explanation of some of the conditions under which a "short write" might occur, making it very explicit that it can occur.
This is really a critical issue: we need to know how Stream.Write() behaves, beyond all doubt.
The CopyTo method creates a buffer, populates it with data from the original stream, and then calls the Write method, passing the created buffer as a parameter. WriteTo uses the MemoryStream's internal buffer to write. That is the difference; which is better is up to you to decide.
Creating a MemoryStream from an HttpInputStream in VB.NET:
Dim filename As String = MyFile.PostedFile.FileName
Dim fileData As Byte() = Nothing
Using binaryReader = New BinaryReader(MyFile.PostedFile.InputStream)
    binaryReader.BaseStream.Position = 0
    fileData = binaryReader.ReadBytes(MyFile.PostedFile.ContentLength)
End Using
Dim memoryStream As MemoryStream = New MemoryStream(fileData)

Most efficient way to compare a memorystream to a file C# .NET

I have a MemoryStream containing the bytes of a PNG-encoded image, and want to check if there is an exact duplicate of that image data in a directory on disk. The first obvious step is to only look for files that match the exact length, but after this I'd like to know what's the most efficient way to compare the memory against the files. I'm not very experienced working with streams.
I had a couple thoughts on the matter:
First, if I could get a hash code for the file, it would (presumably) be more efficient to compare hash codes rather than every byte of the image. Similarly, I could compare just some of the bytes of the image, giving a "close-enough" answer.
And then of course I could just compare the entire stream, but I don't know how quick that would be.
What's the best way to compare a MemoryStream to a file? Byte-by-byte in a for-loop?
Another solution:
private static bool CompareMemoryStreams(MemoryStream ms1, MemoryStream ms2)
{
    if (ms1.Length != ms2.Length)
        return false;

    ms1.Position = 0;
    ms2.Position = 0;

    var msArray1 = ms1.ToArray();
    var msArray2 = ms2.ToArray();

    return msArray1.SequenceEqual(msArray2);
}
Firstly, getting a hashcode of the two streams won't help on its own - to calculate hashcodes, you'd need to read the entire contents and perform some simple calculation while reading. If you compare the files byte-by-byte or using buffers, then you can stop earlier (as soon as you find the first two bytes/blocks that don't match).
However, this approach would make sense if you needed to compare the MemoryStream against multiple files, because then you'd need to loop through the MemoryStream just once (to calculate the hashcode) and then loop through all the files.
In any case, you'll have to write code to read the entire file. As you mentioned, this can be done either byte-by-byte or using buffers. Reading data into buffer is a good idea, because it may be more efficient operation when reading from HDD (e.g. reading 1kB buffer). Moreover, you could use asynchronous BeginRead method if you need to process multiple files in parallel.
Summary:
If you need to compare multiple files, use hashcodes.
To read/compare the content of a single file:
Read 1kB of data into a buffer from both streams.
See if there is a difference (if yes, quit).
Continue looping.
Implement the above steps asynchronously using BeginRead if you need to process multiple files in parallel.
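The buffered-comparison steps above can be sketched as a stream-to-stream check; a helper loop handles the fact that a single Read may legally return fewer bytes than requested:

```csharp
using System;
using System.IO;

class BufferedCompareDemo
{
    public static bool StreamsEqual(Stream a, Stream b)
    {
        if (a.Length != b.Length)
            return false;

        var bufA = new byte[1024];
        var bufB = new byte[1024];
        while (true)
        {
            int readA = Fill(a, bufA);
            int readB = Fill(b, bufB);
            if (readA != readB)
                return false;  // shouldn't happen when the lengths match
            if (readA == 0)
                return true;   // both streams exhausted without a mismatch
            if (!bufA.AsSpan(0, readA).SequenceEqual(bufB.AsSpan(0, readB)))
                return false;  // stop at the first differing block
        }
    }

    // Reads until the buffer is full or the stream ends.
    static int Fill(Stream s, byte[] buffer)
    {
        int total = 0, read;
        while (total < buffer.Length &&
               (read = s.Read(buffer, total, buffer.Length - total)) > 0)
            total += read;
        return total;
    }

    static void Main()
    {
        var x = new MemoryStream(new byte[] { 1, 2, 3 });
        var y = new MemoryStream(new byte[] { 1, 2, 3 });
        var z = new MemoryStream(new byte[] { 1, 2, 4 });

        Console.WriteLine(StreamsEqual(x, y)); // True
        x.Position = 0;
        Console.WriteLine(StreamsEqual(x, z)); // False
    }
}
```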
Firstly, getting hashcode of the two streams won't help - to calculate hashcodes, you'd need to read the entire contents and perform some simple calculation while reading.
I'm not sure if I misunderstood it or this simply isn't true. Here's an example of hash calculation using streams:
private static byte[] ComputeHash(Stream data)
{
    using HashAlgorithm algorithm = MD5.Create();
    byte[] bytes = algorithm.ComputeHash(data);
    data.Seek(0, SeekOrigin.Begin); // I'll use this trick so the caller won't end up with the stream in an unexpected position
    return bytes;
}
I've measured this code with BenchmarkDotNet, and it allocated 384 bytes on a 900 MB file. Needless to say how inefficient loading the whole file into memory would be in this case.
However, this is true
It's important to be aware of the (unlikely) possibility of hash collisions. Byte comparison would be necessary to avoid this issue.
So in case the hashes don't match, you have to perform additional checks in order to be sure that the files are 100% different. In such a case, the following is a great approach.
As you mentioned, this can be done either byte-by-byte or using buffers. Reading data into buffer is a good idea, because it may be more efficient operation when reading from HDD (e.g. reading 1kB buffer).
Recently I had to perform such checks, so I'll post the results of this exercise as two utility methods:
private bool AreStreamsEqual(Stream stream, Stream other)
{
    const int bufferSize = 2048;
    if (other.Length != stream.Length)
    {
        return false;
    }

    byte[] buffer = new byte[bufferSize];
    byte[] otherBuffer = new byte[bufferSize];
    int bytesRead;
    while ((bytesRead = stream.Read(buffer, 0, buffer.Length)) > 0)
    {
        int otherBytesRead = other.Read(otherBuffer, 0, bytesRead);
        // Compare only the portions actually read; the tail of the buffers
        // may hold stale bytes on the last chunk.
        if (otherBytesRead != bytesRead ||
            !otherBuffer.Take(otherBytesRead).SequenceEqual(buffer.Take(bytesRead)))
        {
            stream.Seek(0, SeekOrigin.Begin);
            other.Seek(0, SeekOrigin.Begin);
            return false;
        }
    }

    stream.Seek(0, SeekOrigin.Begin);
    other.Seek(0, SeekOrigin.Begin);
    return true;
}
private bool IsStreamEqualToByteArray(byte[] contents, Stream stream)
{
    const int bufferSize = 2048;
    var i = 0;
    if (contents.Length != stream.Length)
    {
        return false;
    }

    byte[] buffer = new byte[bufferSize];
    int bytesRead;
    while ((bytesRead = stream.Read(buffer, 0, buffer.Length)) > 0)
    {
        var contentsBuffer = contents
            .Skip(i * bufferSize)
            .Take(bytesRead)
            .ToArray();
        // Compare only the bytes actually read on this iteration
        if (!contentsBuffer.SequenceEqual(buffer.Take(bytesRead)))
        {
            stream.Seek(0, SeekOrigin.Begin);
            return false;
        }
        i++; // advance to the next chunk; without this the first chunk would be compared every time
    }

    stream.Seek(0, SeekOrigin.Begin);
    return true;
}
We've open sourced a library to deal with this at NeoSmart Technologies, because we've had to compare opaque Stream objects for bytewise equality one time too many. It's available on NuGet as StreamCompare and you can read about its advantages over existing approaches in the official release announcement.
Usage is very straightforward:
var stream1 = ...;
var stream2 = ...;
var scompare = new StreamCompare();
var areEqual = await scompare.AreEqualAsync(stream1, stream2);
It's written to abstract away as many of the gotchas and performance pitfalls as possible, and contains a number of optimizations to speed up comparisons (and to minimize memory usage). There's also a file comparison wrapper FileCompare included in the package, that can be used to compare two files by path.
StreamCompare is released under the MIT license and runs on .NET Standard 1.3 and above. NuGet packages for .NET Standard 1.3, .NET Standard 2.0, .NET Core 2.2, and .NET Core 3.0 are available. Full documentation is in the README file.
Using a Stream alone you won't get the result you expect: each and every file has a unique identity, such as its last-modified date and so on, so each file is different. (Note, though, that such metadata belongs to the file system, not to the stream's contents.)