C# serializing/deserializing with memory stream System.OutOfMemoryException

C# serializing/deserializing with memory stream System.OutOfMemoryException - c#

I am trying to serialize/de-serialize a stream of around 50MB xml data with the following code and I get System.OutOfMemoryException exception.
var formatter = new BinaryFormatter();
var stream = new MemoryStream();
using (stream)
{
formatter.Serialize(stream, source);
stream.Seek(0, SeekOrigin.Begin);
return (T) formatter.Deserialize(stream);
}
I debugged the code and it throws OutOfMemoryException exception on the formatter.Serialize(stream,source) line.
I did some search in it says that limit is 2GB. How can I debug to find out the reason or is there any efficent way of writing this code? Or any tool to watch the memory usage.
Thanks,

Dealing with 2GB of xml is never going to be efficient. However, to make it work, you could try writing to a FileStream instead of a MemoryStream, since a MemoryStream has a 2GB limit. Alternatively, you could write your own Stream implementation using multiple buffers rather than a single large buffer.
However, I strongly suggest that what you actually want to do here is some combination of:
use a different serialization format (xml is a poor choice for large data)
don't require it all in one blob
have less data

Related

C# Converting object to byte array

I'm making a chat system thing with tcp which requires to send things in byte arrays, but when I convert an image into a byte array, send it and then convert back it gives this error: 'End of Stream encountered before parsing was completed.'. With strings it works just fine.
public byte[] ObjectToByteArray(object obj)
{
BinaryFormatter formatter = new BinaryFormatter();
using (var stream = new MemoryStream())
{
formatter.Serialize(stream, obj);
return stream.ToArray();
}
}
public object ByteArrayToObject(byte[] bytes)
{
using (var stream = new MemoryStream())
{
var binForm = new BinaryFormatter();
stream.Write(bytes, 0, bytes.Length);
stream.Position = 0;
var obj = binForm.Deserialize(stream);
return obj;
}
}

There's two separate things here; firstly, and I cannot emphasize this enough; do not use BinaryFormatter. Ever. It will hurt you. Lots of serializers exist, and BinaryFormatter (and the cousin NetDataContractSerializer) is literally the absolute last you should use. I can expand on that if you like, or I can suggest alternatives if you like.
Now; as for the actual problem: I strongly suspect that it isn't what you think it is. I have a hunch, based on decades of working on network code, that the real problem here is "framing". By which I mean: TCP is a stream protocol, not a message/packet protocol. I strongly suspect that you have not correctly deframed the exact bytes that were sent. I can't say this for sure without seeing your socket code, but... as I say: it is an hunch based on lots of experience. To investigate this: note the length of the bytes you send, and note the length of the bytes you've received. I'm pretty sure you'll find they are different. If there's still doubt: get the base-64 or hex string of the sent payload and the received payload (Convert.ToBase64String, for example), and compare that string. I'm pretty sure they'll turn out to be different.
Ultimately, network code is hard; I could try and explain individual points, but "how to correctly send messages over a network" could fill a book. IMO, if you're not interested in specializing in writing network code for the next 5 years: use an existing tool that will do the job for you, for example gRPC. Lots and lots of other messaging RPC tools exist.

Unexpected MemoryStream behavior

I am reading and writing to the same MemoryStream.
Something like this(possible compilation mistakes):
MemoryStream stream = new MemoryStream();
stream.Write("1234",0,4);
stream.Position -= 4;
stream.Read(buffer,0,4);
Why do I HAVE to move Position? Why it is not separate to read and write?
Is there any other Stream that can be used?

Because that's how streams are supposed to work. You have one position, see it as a cursor, set at a point in the stream at which you can read or write. Reading and writing both advance this position.
If you're merely using a MemoryStream to exchange data between callers, as a pseudo IPC mechanism, then perhaps some better way exists to do so.

MemoryStream.WriteTo(Stream destinationStream) versus Stream.CopyTo(Stream destinationStream)

Which one is better : MemoryStream.WriteTo(Stream destinationStream) or Stream.CopyTo(Stream destinationStream)??
I am talking about the comparison of these two methods without Buffer as I am doing like this :
Stream str = File.Open("SomeFile.file");
MemoryStream mstr = new MemoryStream(File.ReadAllBytes("SomeFile.file"));
using(var Ms = File.Create("NewFile.file", 8 * 1024))
{
str.CopyTo(Ms) or mstr.WriteTo(Ms);// Which one will be better??
}
Update
Here is what I want to Do :
Open File [ Say "X" Type File]
Parse the Contents
From here I get a Bunch of new Streams [ 3 ~ 4 Files ]
Parse One Stream
Extract Thousands of files [ The Stream is an Image File ]
Save the Other Streams To Files
Editing all the Files
Generate a New "X" Type File.
I have written every bit of code which is actually working correctly..
But Now I am optimizing the code to make the most efficient.

It is an historical accident that there are two ways to do the same thing. MemoryStream always had the WriteTo() method, Stream didn't acquire the CopyTo() method until .NET 4.
The MemoryStream.WriteTo() version looks like this:
public virtual void WriteTo(Stream stream)
{
// Exception throwing code elided...
stream.Write(this._buffer, this._origin, this._length - this._origin);
}
The Stream.CopyTo() implementation like this:
private void InternalCopyTo(Stream destination, int bufferSize)
{
int num;
byte[] buffer = new byte[bufferSize];
while ((num = this.Read(buffer, 0, buffer.Length)) != 0)
{
destination.Write(buffer, 0, num);
}
}
Stream.CopyTo() is more universal, it works for any stream. And helps programmers that fumble copying data from, say, a NetworkStream. Forgetting to pay attention to the return value from Read() was a very common bug. But it of course copies the bytes twice and allocates that temporary buffer, MemoryStream doesn't need it since it can write directly from its own buffer. So you'd still prefer WriteTo(). Noticing the difference isn't very likely.

MemoryStream.WriteTo: Writes the entire contents of this memory stream to another stream.
Stream.CopyTo: Reads the bytes from the current stream and writes them to the destination stream. Copying begins at the current position in the current stream.
You'll need to seek back to 0, to get the whole source stream copied.
So I think MemoryStream.WriteTo better option for this situation

If you use Stream.CopyTo, you don't need to read all the bytes into memory to start with. However:
This code would be simpler if you just used File.Copy
If you are going to load all the data into memory, you can just use:
byte[] data = File.ReadAllBytes("input");
File.WriteAllBytes("output", data);
You should have a using statement for the input as well as the output stream
If you really need processing so can't use File.Copy, using Stream.CopyTo will cope with larger files than loading everything into memory. You may not need that, of course, or you may need to load the whole file into memory for other reasons.
If you have got a MemoryStream, I'd probably use MemoryStream.WriteTo rather than Stream.CopyTo, but it probably won't make much difference which you use, except that you need to make sure you're at the start of the stream when using CopyTo.

I think Hans Passant's claim of a bug in MemoryStream.WriteTo() is wrong; it does not "ignore the return value of Write()". Stream.Write() returns void, which implies to me that the entire count bytes are written, which implies that Stream.Write() will block as necessary to complete the operation to, e.g., a NetworkStream, or throw if it ultimately fails.
That is indeed different from the write() system call in ?nix, and its many emulations in libc and so forth, which can return a "short write". I suspect Hans leaped to the conclusion that Stream.Write() followed that, which I would have expected, too, but apparently it does not.
It is conceivable that Stream.Write() could perform a "short write", without returning any indication of that, requiring the caller to check that the Position property of the Stream has actually been advanced by count. That would be a very error-prone API, and I doubt that it does that, but I have not thoroughly tested it. (Testing it would be a bit tricky: I think you would need to hook up a TCP NetworkStream with a reader on the other end that blocked forever, and write enough to fill up the wire buffers. Or something like that...)
The comments for Stream.Write() are not quite unambiguous:
Summary:
When overridden in a derived class, writes a sequence of bytes to the current
stream and advances the current position within this stream by the number
of bytes written.
Parameters: buffer:
An array of bytes. This method copies count bytes from buffer to the current stream.
Compare that to the Linux man page for write(2):
write() writes up to count bytes from the buffer pointed buf to the file referred to by the file descriptor fd.
Note the crucial "up to". That sentence is followed by explanation of some of the conditions under which a "short write" might occur, making it very explicit that it can occur.
This is really a critical issue: we need to know how Stream.Write() behaves, beyond all doubt.

The CopyTo method creates a buffer, populates its with data from the original stream and then calls the Write method passing the created buffer as a parameter. The WriteTo uses the memoryStream's internal buffer to write. That is the difference. What is better - it is up to you to decide which method you prefer.

Creating a MemoryStream from a HttpInputStream in Vb.Net:
Dim filename As String = MyFile.PostedFile.FileName
Dim fileData As Byte() = Nothing
Using binaryReader = New BinaryReader(MyFile.PostedFile.InputStream)
binaryReader.BaseStream.Position = 0
fileData = binaryReader.ReadBytes(MyFile.PostedFile.ContentLength)
End Using
Dim memoryStream As MemoryStream = New MemoryStream(fileData)

One time gives exception another time doesnt

I use multi threading to copy file to another place.I from stream needed byte array and dispose stream.at this example i use 7 threads to copy 3gb file.1st thread can get byte array,but at 2nd thread occurs exception 'System.OutOfMemoryException'
public void Begin()
{
FileStream stream = new FileStream(pathToFile, FileMode.Open);
stream.Position = (threNmb - 1) * 536870912;
BinaryReader reader = new BinaryReader(stream);
for (long i = 0; i < (length); i++)
{
source.Add(reader.ReadByte());//gives exception at i=134217728
}
reader.Dispose();
reader.Close();
stream.Dispose();
stream.Close();
}

It looks like you're using a List<byte>. That's going to be a very inefficient way of copying data, and you're probably making it less efficient by using multiple threads. Additionally, if you're using a single list from multiple threads, your code is already broken - List<T> isn't thread-safe. Even if it were, you'd be mixing the data from the different threads, so you wouldn't be able to reassemble the original data. Oh, and by not using using statements, if an exception is thrown you're leaving file handles open. In other words, I'm advising you to completely abandon your current approach.
Instead, copying a file from one place to another (assuming you can't use File.Copy for some reason), should basically be a case of:
Open stream to read
Open stream to write
Allocate buffer (e.g. 32K)
Repeatedly:
Read from the "input" stream into the buffer, noting how much data the call actually read
Write the data you've just read into the output stream
Loop until your "Read" indicates the end of the input stream
Close both streams (with using statements)
There's no need to have everything in memory. Note that in .NET 4 this is made even easier with the Stream.CopyTo method. which does the third and fourth steps for you.

Is there a way to make this faster? MemoryStream vs FileStream

I am working with iTextSharp, and need to generate hundreds of thousands of RTF documents - the resulting files are between 5KB and 500KB.
I am listing 2 approaches below - the original approach wasn't necessarily slow, but I figured why write and retrieve to/from file to get the output string I need. I saw this other approach using MemoryStream, but it actually slowed things down. I essentially just need the outputted RTF content, so that I can run some filters on that RTF to clean up unnecessary formatting. The queries bringing back the data are very quick instant seeming . To generate a 1000 files (actually 2000 files are created in process) with original approach files takes about 15 minutes, the same with second approach takes about 25-30 minutes. The resulting files that I've run are averaging around 80KB.
Is there something wrong with the second approach? Seems like it should be faster than the first one, not slower.
Original approach:
RtfWriter2.GetInstance(doc, new FileStream(RTFFilePathName, FileMode.Create));
doc.Open();
//Add Tables and stuff here
doc.Close(); //It saves a file here to (RTFPathFileName)
StreamReader srRTF = new StreamReader(RTFFilePathName);
string rtfText = srRTF.ReadToEnd();
srRTF.Close();
//Do additional things with rtfText before writing to my final file
New approach, trying to speed it up but this is actually half as fast:
MemoryStream stream = new MemoryStream();
RtfWriter2.GetInstance(doc, stream);
doc.Open();
//Add Tables and stuff here
doc.Close();
string rtfText =
ASCIIEncoding.ASCII.GetString(stream.GetBuffer());
stream.Close();
//Do additional things with rtfText before writing to my final file
The second approach I am trying I found here:
iTextSharp - How to generate a RTF document in the ClipBoard instead of a file

How big your resulting stream is? MemoryStream performs a lot of memory copy operations while growing, so for large results it may take significantly longer to write data by small chunks compared with FileStream.
To verify if it is the problem set inital size of MemoryStream to some large value around resulting size and re-run the code.
To fix it you can pre-grow memory stream initially (if you know approximate output) or write your own stream that uses different scheme when growing. Also using temporary file might be good enough for your purposes as is.

Like Alexei said, its probably caused by fact, yo are creating MemoryStream every time, and every time it continously re-alocates memory as it grows. Try creating only 1 stream and reset it to begining before every write.
Also I think stream.GetBuffer() again returns new memory, so try using same StreamReader with your MemoryStream.
And it seems your code can be easily paralelised, so you can try run it using Paralel Extesions or using TreadPool.
And it seems little weird, you are writing your text as bytes in stream, then reading this stream as bytes and converting to text. Wouldnt it be possible to save your document directly as text?

A MemoryStream is not associated with a file, and has no concept of a filename. Basically, you can't do that.
You certainly can't cast between them; you can only cast upwards an downwards - not sideways; to visualise:
Stream
|
| |
FileStream MemoryStream
You can cast a MemoryStream to a Stream trivially, and a Stream to a MemoryStream via a type-check; but never a FileStream to a MemoryStream. That is like saying a dog is an animal, and an elephant is an animal, so we can cast a dog to an elephant.
You could subclass MemoryStream and add a Name property (that you supply a value for), but there would still be no commonality between a FileStream and a YourCustomMemoryStream, and FileStream doesn't implement a pre-existing interface to get a Name; so the caller would have to explicitly handle both separately, or use duck-typing (maybe via dynamic or reflection).
Another option (perhaps easier) might be: write your data to a temporary file; use a FileStream from there; then (later) delete the file.

I know this is old but there is a lot of misinformation in this thread.
It's all about buffer size. The internal buffers are significantly smaller with a memory stream vs a file stream. Smaller buffers cause more read\writes.
Just intilaize your memory stream with either a file stream or a byte array with a size of around 80k. Close the doc, set stream position to 0 and read to end the contents.
On a side note, get buffer will return the whole allocated buffer. So if you only wrote 1 byte and the buffer is 4k, you will have a lot of garbage in your string.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.