A helper method to turn a string into a zipped up text file:
public static System.Net.Mail.Attachment CreateZipAttachmentFromString(string content, string filename)
{
    using (MemoryStream memoryStream = new MemoryStream())
    {
        using (ZipArchive zipArchive = new ZipArchive(memoryStream, ZipArchiveMode.Update))
        {
            ZipArchiveEntry zipArchiveEntry = zipArchive.CreateEntry(filename);
            using (StreamWriter streamWriter = new StreamWriter(zipArchiveEntry.Open()))
            {
                streamWriter.Write(content);
            }
        }
        MemoryStream memoryStream2 = new MemoryStream(memoryStream.ToArray(), false);
        return new Attachment(memoryStream2, filename + ".zip", MediaTypeNames.Application.Zip);
    }
}
I was really hoping to avoid turning the first memory stream into an array, creating a second memory stream over that array just to read it, and passing that to the Attachment. My reasoning was: why copy X megabytes to another place in memory and point a new stream at the copy, when it's essentially what we started out with? It's the multi-megabyte equivalent of redundancy like if (myBool == true).
So I figured I would instead Seek back to the start of the first memory stream and let the Attachment read that directly, or construct a second MemoryStream over the first one's buffer, with the offset and length parameters set so it would know what to read.
Neither of these approaches works, because it seems ZipArchive only pushes data into the memory stream (at least in my case) when control falls out of the using block and the ZipArchive is disposed. Disposing it also disposes the MemoryStream, and nearly every member (other than ToArray() and GetBuffer()) then throws ObjectDisposedException.
Ultimately I can't seek the stream or get its length after the ZipArchive pumps data into it, and before it pumps the data in, the position and length are both zero, so those values are useless.
Is there a nice, optimal way, short of allocating my own oversized buffer (which then makes the MemoryStream non-expandable), to avoid burning roughly 2x the archive size in memory with this method?
Most well designed streams and stream-users in .NET have an additional boolean parameter that can be used to instruct them to leave the "base stream" (terrible name) open when disposing.
This is ZipArchive's constructor:
public ZipArchive(
    Stream stream,
    ZipArchiveMode mode,
    bool leaveOpen
)
There is no need for a second MemoryStream. You need to do two things:
Ensure that the MemoryStream is not disposed before its last point of use. This is harmless: disposing a MemoryStream does nothing useful, and for compatibility reasons it never will. The .NET Framework has a very high compatibility bar; the team often doesn't even dare to rename fields.
Seek to offset zero.
So remove the using around the MemoryStream and use the ctor for ZipArchive that allows you to leave the stream open.
Since the Attachment you are returning makes use of the MemoryStream, you can't dispose it before exiting the method. Again, this is harmless. The only negative point is that the code becomes less obvious.
There's an entirely different approach: you can write your own Stream class that produces the ZIP bytes on demand. That way there is no need to buffer the zipped bytes at all. This is much more work, of course, and it doesn't change the fact that the whole source string must sit in memory at once, so it's still not an O(1)-space solution.
public static System.Net.Mail.Attachment CreateZipAttachmentFromString(string content, string filename)
{
    MemoryStream memoryStream = new MemoryStream();
    using (ZipArchive zipArchive = new ZipArchive(memoryStream, ZipArchiveMode.Update, true))
    {
        ZipArchiveEntry zipArchiveEntry = zipArchive.CreateEntry(filename);
        using (StreamWriter streamWriter = new StreamWriter(zipArchiveEntry.Open()))
        {
            streamWriter.Write(content);
        }
    }
    memoryStream.Position = 0;
    return new Attachment(memoryStream, filename + ".zip", MediaTypeNames.Application.Zip);
}
I wrote some code in a console program and tested with files.
Now I want to port it to a BizTalk Pipeline Component that implements a specific interface. I wasn't aware that the .Write and .WriteLine methods differ so much between writing to a file and writing to a MemoryStream; I thought I could just swap my objects. MemoryStream has no .WriteLine method, and its .Write method requires additional offset and count parameters.
So now, what is the best way to change my tested code to write to the memory stream, given that I have a lot of .WriteLine statements? I could write to a StringBuilder first, but I think that would defeat the point of streaming (i.e. the whole document would be in memory at one time).
// This is how I used the streams in the Console program
//FileStream originalStream = File.Open(inFilename, FileMode.Open);
//StreamWriter streamToReturn = new StreamWriter(outFilename);
// This is how to get the input stream in the BizTalk Pipeline Component
System.IO.Stream originalStream = pInMsg.BodyPart.GetOriginalDataStream();
MemoryStream streamToReturn = new MemoryStream();
streamToReturn.WriteLine("<" + schemaStructure.rootElement + ">"); // does not compile: MemoryStream has no WriteLine method
There's a lot more code not shown here. Above is just to set the stage for what I did.
Wrap the MemoryStream in a StreamWriter, which gives you WriteLine.
MemoryStream streamToReturn = new MemoryStream();
var writer = new StreamWriter(streamToReturn);
writer.WriteLine("<" + schemaStructure.rootElement + ">");
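One caveat the snippet above glosses over (the details here are an assumption on my part, since the rest of the pipeline component isn't shown): the writer must be flushed and the MemoryStream rewound before handing it back to BizTalk, otherwise the next component will see an empty or partial stream. A minimal sketch:
MemoryStream streamToReturn = new MemoryStream();
// leaveOpen: true keeps the MemoryStream usable after the writer is disposed
using (var writer = new StreamWriter(streamToReturn, Encoding.UTF8, 1024, leaveOpen: true))
{
    writer.WriteLine("<" + schemaStructure.rootElement + ">");
    // ... the rest of the WriteLine calls ...
}
streamToReturn.Position = 0; // rewind so the next component reads from the start
// pInMsg.BodyPart.Data = streamToReturn; // typical assignment in a pipeline component (assumption)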
I want to write a String to a Stream (a MemoryStream in this case) and read the bytes one by one.
stringAsStream = new MemoryStream();
UnicodeEncoding uniEncoding = new UnicodeEncoding();
String message = "Message";
stringAsStream.Write(uniEncoding.GetBytes(message), 0, message.Length);
Console.WriteLine("This:\t\t" + (char)uniEncoding.GetBytes(message)[0]);
Console.WriteLine("Differs from:\t" + (char)stringAsStream.ReadByte());
The (undesired) result I get is:
This: M
Differs from: ?
It looks like it's not being read correctly, as the first char of "Message" is 'M', which works when getting the bytes from the UnicodeEncoding instance but not when reading them back from the stream.
What am I doing wrong?
The bigger picture: I have an algorithm which will work on the bytes of a Stream, I'd like to be as general as possible and work with any Stream. I'd like to convert an ASCII-String into a MemoryStream, or maybe use another method to be able to work on the String as a Stream. The algorithm in question will work on the bytes of the Stream.
After you write to the MemoryStream and before you read it back, you need to Seek back to the beginning of the MemoryStream so you're not reading from the end.
UPDATE
After seeing your update, I think there's a more reliable way to build the stream:
UnicodeEncoding uniEncoding = new UnicodeEncoding();
String message = "Message";

// You might not want to use the outer using statement that I have
// I wasn't sure how long you would need the MemoryStream object
using (MemoryStream ms = new MemoryStream())
{
    var sw = new StreamWriter(ms, uniEncoding);
    try
    {
        sw.Write(message);
        sw.Flush(); // otherwise you are risking an empty stream
        ms.Seek(0, SeekOrigin.Begin);

        // Test and work with the stream here.
        // If you need to start back at the beginning, be sure to Seek again.
    }
    finally
    {
        sw.Dispose();
    }
}
As you can see, this code uses a StreamWriter to write the entire string (with proper encoding) out to the MemoryStream. This takes the hassle out of ensuring the entire byte array for the string is written.
Update: I ran into the empty-stream issue several times. It's enough to call Flush right after you've finished writing.
Try this "one-liner" from Delta's Blog, String To MemoryStream (C#).
MemoryStream stringInMemoryStream =
new MemoryStream(ASCIIEncoding.Default.GetBytes("Your string here"));
The string will be loaded into the MemoryStream, and you can read from it. See Encoding.GetBytes(...), which has also been implemented for a few other encodings.
You're using message.Length, which returns the number of characters in the string, but you should be using the number of bytes. Use something like:
byte[] messageBytes = uniEncoding.GetBytes(message);
stringAsStream.Write(messageBytes, 0, messageBytes.Length);
You're then reading a single byte and expecting to get a character from it just by casting to char. UnicodeEncoding will use two bytes per character.
As Justin says you're also not seeking back to the beginning of the stream.
Basically I'm afraid pretty much everything is wrong here. Please give us the bigger picture and we can help you work out what you should really be doing. Using a StreamWriter to write and then a StreamReader to read is quite possibly what you want, but we can't really tell from just the brief bit of code you've shown.
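For completeness, here is a minimal corrected version of the snippet from the question, putting together the fixes above (writing the byte count rather than the character count, seeking back, and decoding two bytes per UTF-16 character instead of casting a single byte):
UnicodeEncoding uniEncoding = new UnicodeEncoding();
string message = "Message";
using (MemoryStream stringAsStream = new MemoryStream())
{
    byte[] messageBytes = uniEncoding.GetBytes(message);
    stringAsStream.Write(messageBytes, 0, messageBytes.Length); // write the byte count, not the char count
    stringAsStream.Seek(0, SeekOrigin.Begin); // rewind before reading

    byte[] firstCharBytes = new byte[2]; // UTF-16 stores each character in two bytes
    stringAsStream.Read(firstCharBytes, 0, 2);
    Console.WriteLine("This:\t\t" + message[0]);
    Console.WriteLine("Same as:\t" + uniEncoding.GetChars(firstCharBytes)[0]);
}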
I think it would be a lot more productive to use a TextWriter, in this case a StreamWriter, to write to the MemoryStream. After that, as others have said, you need to "rewind" the MemoryStream using something like stringAsStream.Position = 0L;.
stringAsStream = new MemoryStream();
// create a stream writer with UTF-16 (Unicode) encoding to write to the memory stream;
// leaveOpen: true keeps the MemoryStream usable after the writer is disposed
using (StreamWriter sWriter = new StreamWriter(stringAsStream, UnicodeEncoding.Unicode, 1024, leaveOpen: true))
{
    sWriter.Write("Lorem ipsum.");
}
stringAsStream.Position = 0L; // rewind
Note that:
StreamWriter defaults to using an instance of UTF8Encoding unless specified otherwise. This instance of UTF8Encoding is constructed without a byte order mark (BOM)
Also, you usually don't have to create a new UnicodeEncoding(), since Encoding already exposes static instances in convenient UTF-8, UTF-16, and UTF-32 flavors (Encoding.UTF8, Encoding.Unicode, Encoding.UTF32).
And then, finally (as others have said) you're trying to convert the bytes directly to chars, which they are not. If I had a memory stream and knew it was a string, I'd use a TextReader to get the string back from the bytes. It seems "dangerous" to me to mess around with the raw bytes.
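For example, continuing the snippet above (and assuming the stream was written with the same Unicode encoding and left open), the read side might look like this:
stringAsStream.Position = 0L; // make sure we're at the start before reading
using (StreamReader sReader = new StreamReader(stringAsStream, UnicodeEncoding.Unicode))
{
    string roundTripped = sReader.ReadToEnd();
    Console.WriteLine(roundTripped); // "Lorem ipsum."
}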
You need to reset the stream to the beginning:
stringAsStream.Seek(0, SeekOrigin.Begin);
Console.WriteLine("Differs from:\t" + (char)stringAsStream.ReadByte());
This can also be done by setting the Position property to 0:
stringAsStream.Position = 0
I'm saving an uploaded image using this code:
using (var fileStream = File.Create(savePath))
{
    stream.CopyTo(fileStream);
}
When the image is saved to its destination folder, it's empty, 0 kb. What could possibly be wrong here? I've checked stream.Length before copying and it's not empty.
There is nothing wrong with your code. The fact you say "I've checked the stream.Length before copying and its not empty" makes me wonder about the stream position before copying.
If you've already consumed the source stream once then although the stream isn't zero length, its position may be at the end of the stream - so there is nothing left to copy.
If the stream is seekable (which it will be for a MemoryStream or a FileStream and many others), try putting
stream.Position = 0
just before the copy. This resets the stream position to the beginning, meaning the whole stream will be copied by your code.
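Applied to the snippet from the question, that would look something like this (assuming stream is seekable):
stream.Position = 0; // rewind the source stream before copying
using (var fileStream = File.Create(savePath))
{
    stream.CopyTo(fileStream);
}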
I would recommend resetting the position of the source stream before calling CopyTo():
stream.Position = 0;
and make sure to Flush() (or dispose) the destination stream after the copy, to avoid ending up with an empty file:
fileStream.Flush();
This problem started for me after migrating my project from .NET Core 1 to 2.2.
I fixed this issue by setting the Position of my filestream to zero.
using (var fileStream = new FileStream(savePath, FileMode.Create))
{
    fileStream.Position = 0;
    await imageFile.CopyToAsync(fileStream);
}
I need to read a stream two times, from start to end.
But the following code throws an ObjectDisposedException ("Cannot access a closed file").
string fileToReadPath = @"<path here>";
using (FileStream fs = new FileStream(fileToReadPath, FileMode.Open))
{
    using (StreamReader reader = new StreamReader(fs))
    {
        string text = reader.ReadToEnd();
        Console.WriteLine(text);
    }
    fs.Seek(0, SeekOrigin.Begin); // ObjectDisposedException thrown.
    using (StreamReader reader = new StreamReader(fs))
    {
        string text = reader.ReadToEnd();
        Console.WriteLine(text);
    }
}
Why is this happening? What is really disposed? And why does disposing the StreamReader affect the associated stream in this way? Isn't it logical to expect that a seekable stream can be read several times, including by several StreamReaders?
This happens because the StreamReader takes over 'ownership' of the stream. In other words, it makes itself responsible for closing the source stream. As soon as your program calls Dispose or Close (by leaving the using statement scope, in your case), it disposes the source stream as well, i.e. it calls fs.Dispose(). So the file stream is dead after leaving the first using block. This is consistent behavior; all stream classes in .NET that wrap another stream behave this way.
There is one constructor for StreamReader that allows saying that it doesn't own the source stream. It is however not accessible from a .NET program, the constructor is internal.
In this particular case, you'd solve the problem by not using the using-statement for the StreamReader. That's however a fairly hairy implementation detail. There's surely a better solution available to you but the code is too synthetic to propose a real one.
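Note that since .NET 4.5 this particular limitation is gone: StreamReader has a public constructor overload with a leaveOpen parameter, so the reader can be told not to close the source stream. A minimal sketch of reading the file twice that way:
using (FileStream fs = new FileStream(fileToReadPath, FileMode.Open))
{
    using (StreamReader reader = new StreamReader(fs, Encoding.UTF8, true, 1024, leaveOpen: true))
    {
        Console.WriteLine(reader.ReadToEnd());
    }
    fs.Seek(0, SeekOrigin.Begin); // fs is still open, so this no longer throws
    using (StreamReader reader = new StreamReader(fs, Encoding.UTF8, true, 1024, leaveOpen: true))
    {
        Console.WriteLine(reader.ReadToEnd());
    }
}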
The purpose of Dispose() is to clean up resources when you're finished with the stream. The reason the reader impacts the stream is because the reader is just filtering the stream, and so disposing the reader has no meaning except in the context of it chaining the call to the source stream as well.
To fix your code, just use one reader the entire time:
using (FileStream fs = new FileStream(fileToReadPath, FileMode.Open))
using (StreamReader reader = new StreamReader(fs))
{
    string text = reader.ReadToEnd();
    Console.WriteLine(text);

    fs.Seek(0, SeekOrigin.Begin); // ObjectDisposedException not thrown now
    text = reader.ReadToEnd();
    Console.WriteLine(text);
}
Edited to address comments below:
In most situations, you do not need to access the underlying stream as you do in your code (fs.Seek). In those cases, the fact that StreamReader chains its call to the underlying stream allows you to economize on code by not using a using statement for the stream at all. For example, the code would look like:
using (StreamReader reader = new StreamReader(new FileStream(fileToReadPath, FileMode.Open)))
{
    ...
}
Using defines a scope, outside of which an object will be disposed, thus the ObjectDisposedException. You can't access the StreamReader's contents outside of this block.
I agree with your question. The biggest issue with this intentional side effect is when developers don't know about it and blindly follow the "best practice" of surrounding a StreamReader with a using. It can cause some really hard-to-track-down bugs when the stream is a long-lived object's property; the best (worst?) example I've seen is
using (var sr = new StreamReader(HttpContext.Current.Request.InputStream))
{
    body = sr.ReadToEnd();
}
The developer had no idea the InputStream is now hosed for any future place that expects it to be there.
Obviously, once you know the internals, you know to avoid the using and just read and reset the position. But I thought a core principle of API design was to avoid side effects, especially not destroying the data you are acting upon. Nothing about a class that is supposedly a "reader" suggests it should close the data source it reads when you are done "using" it. Disposing the reader should release any references to the Stream, not close the Stream itself. The only explanation I can think of is that, since the reader alters other internal state of the Stream (like the position of the seek pointer), the designers assumed that if you wrap a using around it, you intend to be done with everything. On the other hand, just as in your example, if you created the Stream yourself, the stream itself will be in a using; but if you are reading a Stream that was created outside of your immediate method, it is presumptuous of the code to close it.
What I do and tell our developers to do on Stream instances that the reading code doesn't explicitly create is...
// save position before reading
long position = theStream.Position;
theStream.Seek(0, SeekOrigin.Begin);
// DO NOT put this StreamReader in a using; StreamReader.Dispose() closes the underlying stream
StreamReader sr = new StreamReader(theStream);
string content = sr.ReadToEnd();
theStream.Seek(position, SeekOrigin.Begin);
(sorry I added this as an answer, wouldn't fit in a comment, I would love more discussion about this design decision of the framework)
Dispose() on the parent will Dispose() all owned streams. Unfortunately, streams don't have a Detach() method, so you have to work around it here.
It may feel wrong, but you can simply leave your StreamReader undisposed. That way the underlying stream won't be disposed, even when the StreamReader is collected.
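A sketch of that, reusing the names from the question:
// Deliberately no using/Dispose on the reader, so fs stays open.
// StreamReader has no finalizer, so the stream is not closed when the reader is collected.
StreamReader reader = new StreamReader(fs);
string text = reader.ReadToEnd();
fs.Seek(0, SeekOrigin.Begin); // still works; fs has not been disposed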
I found the following code on the web:
private byte[] StreamFile(string filename)
{
    FileStream fs = new FileStream(filename, FileMode.Open, FileAccess.Read);

    // Create a byte array of file stream length
    byte[] ImageData = new byte[fs.Length];

    // Read block of bytes from stream into the byte array
    fs.Read(ImageData, 0, System.Convert.ToInt32(fs.Length));

    // Close the File Stream
    fs.Close();

    return ImageData; // return the byte data
}
Is it reliable enough to use to convert a file to byte[] in c#, or is there a better way to do this?
byte[] bytes = System.IO.File.ReadAllBytes(filename);
That should do the trick. ReadAllBytes opens the file, reads its contents into a new byte array, then closes it. Here's the MSDN page for that method.
byte[] bytes = File.ReadAllBytes(filename)
or ...
var bytes = File.ReadAllBytes(filename)
Not to repeat what everyone has already said, but keep the following cheat sheet handy for file manipulation:
System.IO.File.ReadAllBytes(filename);
File.Exists(filename)
Path.Combine(folderName, resOfThePath);
Path.GetFullPath(path); // converts a relative path to absolute one
Path.GetExtension(path);
All these answers with .ReadAllBytes(). Another, similar (I won't say duplicate, since they were trying to refactor their code) question was asked on SO here: Best way to read a large file into a byte array in C#?
A comment was made on one of the posts regarding .ReadAllBytes():
File.ReadAllBytes throws OutOfMemoryException with big files (tested with 630 MB file and it failed) – juanjo.arana Mar 13 '13 at 1:31
A better approach, to me, would be something like this, with BinaryReader:
public static byte[] FileToByteArray(string fileName)
{
    byte[] fileData = null;

    using (FileStream fs = File.OpenRead(fileName))
    {
        var binaryReader = new BinaryReader(fs);
        fileData = binaryReader.ReadBytes((int)fs.Length);
    }
    return fileData;
}
But that's just me...
Of course, this all assumes you have the memory to handle the byte[] once it is read in, and I didn't put in the File.Exists check to ensure the file is there before proceeding, as you'd do that before calling this code.
It looks good enough as a generic version. You can modify it to meet your needs, if they're specific enough.
Also test for exceptions and error conditions, such as the file not existing or not being readable.
You can also do the following to save some space:
byte[] bytes = System.IO.File.ReadAllBytes(filename);
Others have noted that you can use the built-in File.ReadAllBytes. The built-in method is fine, but it's worth noting that the code you posted above is fragile for two reasons:
Stream is IDisposable - you should place the FileStream fs = new FileStream(filename, FileMode.Open, FileAccess.Read) initialization in a using statement to ensure the file is closed. Failure to do this may mean the stream remains open if a failure occurs, which will mean the file stays locked - and that can cause other problems later on.
fs.Read may read fewer bytes than you request. In general, the .Read method of a Stream instance will read at least one byte, but not necessarily all the bytes you ask for. You'll need to write a loop that retries reading until all bytes are read (see the sketch below). This page explains this in more detail.
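A sketch of such a read loop, applied to the method from the question (keeping the original names), might look like this:
private static byte[] StreamFile(string filename)
{
    using (FileStream fs = new FileStream(filename, FileMode.Open, FileAccess.Read))
    {
        byte[] imageData = new byte[fs.Length];
        int totalRead = 0;
        while (totalRead < imageData.Length)
        {
            // Read may return fewer bytes than requested; keep reading until the buffer is full
            int read = fs.Read(imageData, totalRead, imageData.Length - totalRead);
            if (read == 0)
                throw new EndOfStreamException("Reached end of file before reading the expected number of bytes.");
            totalRead += read;
        }
        return imageData;
    }
}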
string filePath = @"D:\MiUnidad\testFile.pdf";
byte[] bytes = await System.IO.File.ReadAllBytesAsync(filePath);