I'm working on a project where I need the ability to zip and unzip streams and byte arrays. I was running some unit tests that create the zip from a stream and then unzip it again, and the only way DotNetZip sees the result as a zip is if I run streamToZip.Seek(0, SeekOrigin.Begin) and streamToZip.Flush() first. If I don't, I get the error "Cannot read Block, No data" on the ZipFile.Read(stream).
I was wondering if anyone could explain why that is. I've seen a few articles on using Seek to set the relative read position, but none that really explain why it is required in this situation.
Here is my Code:
Zipping the Object:
public Stream ZipObject(Stream data)
{
    var output = new MemoryStream();
    using (var zip = new ZipFile())
    {
        zip.AddEntry(Name, data);
        zip.Save(output);
        FlushStream(output);
        ZippedItem = output;
    }
    return output;
}
Unzipping the Object:
public List<Stream> UnZipObject(Stream data)
{
    FlushStream(data); // This is what I had to add in to make it work
    using (var zip = ZipFile.Read(data))
    {
        foreach (var item in zip)
        {
            var newStream = new MemoryStream();
            item.Extract(newStream);
            UnZippedItems.Add(newStream);
        }
    }
    return UnZippedItems;
}
Flush method I had to add:
private static void FlushStream(Stream stream)
{
    stream.Seek(0, SeekOrigin.Begin);
    stream.Flush();
}
When you return output from ZipObject, that stream is at the end - you've just written the data. You need to "rewind" it so that the data can then be read. Imagine you had a video cassette, and had just recorded a program - you'd need to rewind it before you watched it, right? It's exactly the same here.
I would suggest doing this in ZipObject itself though - and I don't believe the Flush call is necessary. I'd personally use the Position property, too:
public Stream ZipObject(Stream data)
{
    var output = new MemoryStream();
    using (var zip = new ZipFile())
    {
        zip.AddEntry(Name, data);
        zip.Save(output);
    }
    output.Position = 0;
    return output;
}
When you write to a stream, the position changes. If you want to decompress the same stream object, you'll need to reset the position; otherwise you'll get an EndOfStreamException, because ZipFile.Read starts reading at the current stream.Position.
So
stream.Seek(0, SeekOrigin.Begin);
Or
stream.Position = 0;
would do the trick.
Off-topic, but surely useful:
public IEnumerable<Stream> UnZipObject(Stream data)
{
    using (var zip = ZipFile.Read(data))
    {
        foreach (var item in zip)
        {
            var newStream = new MemoryStream();
            item.Extract(newStream);
            newStream.Position = 0;
            yield return newStream;
        }
    }
}
This won't unzip all items into memory up front (despite the MemoryStream used in UnZipObject()); items are only extracted when the result is iterated, because they are yielded (the method returns an IEnumerable<Stream>). More info on yield: http://msdn.microsoft.com/en-us/library/vstudio/9k7k7cf0.aspx
Normally I wouldn't recommend returning data as a stream, because a stream is something like an iterator (using its .Position as the current position), so it isn't thread-safe by default. I'd rather return each memory stream's contents via ToArray().
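As a minimal sketch (using the same DotNetZip calls as above; the name UnZipObjectAsBytes is hypothetical), that ToArray() variant could look like this:
public IEnumerable<byte[]> UnZipObjectAsBytes(Stream data)
{
    using (var zip = ZipFile.Read(data))
    {
        foreach (var item in zip)
        {
            using (var buffer = new MemoryStream())
            {
                item.Extract(buffer);
                // ToArray() copies out the live bytes, so the caller gets an
                // independent byte[] with no shared Position state.
                yield return buffer.ToArray();
            }
        }
    }
}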
I have been messing with this for hours to no avail. I am trying to copy an Excel file, add a new sheet to it, put the file in a MemoryStream, and then return the stream.
Here is the code:
public Stream ProcessDocument()
{
    var resultStream = new MemoryStream();
    string sourcePath = "path\\to\\file";
    string destinationPath = "path\\to\\file";
    CopyFile(destinationPath, sourcePath);
    var copiedFile = SpreadsheetDocument.Open(destinationPath, true);
    var fileWithSheets = SpreadsheetDocument.Open("path\\to\\file", false);
    AddCopyOfSheet(fileWithSheets.WorkbookPart, copiedFile.WorkbookPart, "foo");
    using (var stream = new MemoryStream())
    {
        copiedFile.WorkbookPart.Workbook.Save(stream);
        stream.Position = 0;
        stream.CopyTo(resultStream);
    }
    return resultStream;
}
public void CopyFile(string outputFullFilePath, string inputFileFullPath)
{
    int bufferSize = 1024 * 1024;
    using (var fileStream = new FileStream(outputFullFilePath, FileMode.OpenOrCreate))
    using (var fs = new FileStream(inputFileFullPath, FileMode.Open, FileAccess.ReadWrite))
    {
        fileStream.SetLength(fs.Length);
        int bytesRead = -1;
        byte[] bytes = new byte[bufferSize];
        while ((bytesRead = fs.Read(bytes, 0, bufferSize)) > 0)
        {
            fileStream.Write(bytes, 0, bytesRead);
        }
    }
}
public static void AddCopyOfSheet(WorkbookPart sourceDocument, WorkbookPart destinationDocument, string sheetName)
{
    WorksheetPart sourceSheetPart = GetWorksheetPart(sourceDocument, sheetName);
    destinationDocument.AddPart(sourceSheetPart);
}
public static WorksheetPart GetWorksheetPart(WorkbookPart workbookPart, string sheetName)
{
    string id = workbookPart.Workbook.Descendants<Sheet>().First(x => x.Name.Value.Contains(sheetName)).Id;
    return (WorksheetPart)workbookPart.GetPartById(id);
}
The issue seems to arise from copiedFile.WorkbookPart.Workbook.Save(stream).
After this runs, I get an error saying there was an exception of type 'System.ObjectDisposedException'. The file copies fine, and adding the sheet also seems to be working.
Here's what I've tried:
Using .Save() without a stream parameter. It does nothing.
Using two different streams (hence the resultStream jank left in this code)
Going pure OpenXML and copying the WorkbookParts to a stream directly. Tested with a plain Excel file and it was fine, but it breaks the desired file because that file has some advanced formatting that does not seem to work well with OpenXML. I am open to refactoring if someone knows how I could work around this, though.
What I haven't tried:
Creating ANOTHER copy of the copy and using the SpreadsheetDocument.Create(stream, type) method. I have a feeling this would work but it seems like an awful and slow solution.
Updating OpenXML. I am currently on 2.5.
Any feedback or ideas are hugely appreciated. Thank you!
PS: My dev box is air-gapped, so I had to hand-write this code over. Sorry if there are any errors.
Turns out that copiedFile.WorkbookPart.Workbook.Save(stream); disposes of the stream by default. The workaround was to make a MemoryStream subclass that overrides its ability to be disposed, like so:
public class DisposeLockableMemoryStream : MemoryStream
{
    public DisposeLockableMemoryStream(bool allowDispose)
    {
        AllowDispose = allowDispose;
    }

    public bool AllowDispose { get; set; }

    protected override void Dispose(bool disposing)
    {
        if (!AllowDispose)
            return;
        base.Dispose(disposing);
    }
}
All you need to do is construct the stream with allowDispose set to false, then set stream.AllowDispose = true and dispose of it yourself once you're done.
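In other words, usage would look something like this (a sketch based on the class above and the Save call from my ProcessDocument code):
var stream = new DisposeLockableMemoryStream(allowDispose: false);
copiedFile.WorkbookPart.Workbook.Save(stream); // the internal Dispose() is now a no-op
byte[] bytes = stream.ToArray();               // the data is still accessible
stream.AllowDispose = true;                    // re-enable disposal...
stream.Dispose();                              // ...and clean up yourself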
Now, this didn't really fix my code, because it turns out that .Save() only writes changes made to the document, not the entire thing! Basically, this library is hot garbage and I regret signing up for this story to begin with.
For more information, see a post I made on r/csharp.
I have implemented a code block to convert a Stream into a byte array; the code snippet is shown below. Unfortunately, it throws an OutOfMemoryException while converting the MemoryStream to an array (return newDocument.ToArray();). Could someone please help me with this?
public byte[] MergeToBytes()
{
    using (var processor = new PdfDocumentProcessor())
    {
        AppendStreamsToDocumentProcessor(processor);
        using (var newDocument = new MemoryStream())
        {
            processor.SaveDocument(newDocument);
            return newDocument.ToArray();
        }
    }
}

public Stream MergeToStream()
{
    return new MemoryStream(MergeToBytes());
}
Firstly: how big is the document? If it is too big for the byte[] limit (a single array tops out at around 2 GB), you're going to have to use a different approach.
However, a MemoryStream is already backed by an (oversized) array; you can get this simply using newDocument.TryGetBuffer(out var buffer), and noting that you must restrict yourself to the portion of the .Array indicated by .Offset (usually, but not always, zero) and .Count (the number of bytes that should be considered "live"). Note that TryGetBuffer can return false, but not in the new MemoryStream() scenario.
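As a sketch (outputStream here is a hypothetical destination), that looks like:
if (newDocument.TryGetBuffer(out ArraySegment<byte> buffer))
{
    // Only the slice [Offset, Offset + Count) contains live data; the rest
    // of buffer.Array is unused spare capacity.
    outputStream.Write(buffer.Array, buffer.Offset, buffer.Count);
}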
It is also interesting that you're converting a MemoryStream to a byte[] and then back to a MemoryStream. An alternative here would have been simply to set the Position back to 0, i.e. rewind it. So:
public Stream MergeToStream()
{
    using var processor = new PdfDocumentProcessor();
    AppendStreamsToDocumentProcessor(processor);
    var newDocument = new MemoryStream();
    processor.SaveDocument(newDocument);
    newDocument.Position = 0;
    return newDocument;
}
var incomingStream = ...
var outgoingStream = ...
await incomingStream.CopyToAsync(outgoingStream);
The above code is simple enough and copies an incoming stream to the outgoing stream, both streams being chunked transfers coming/going over the internet.
Now, let's say I wanted to transform the stream with something like Func<Stream, Stream, Task>. How would I do that without reading all the data in?
Of course I could just do
var ms = new MemoryStream();
incomingStream.CopyTo(ms);
// do transform of streams and seek
ms.CopyTo(outgoingStream);
but that would read the whole thing into ms. Is there anything built in that lets me read from the incoming stream and write to a new stream that doesn't buffer everything up, but instead keeps a small internal buffer and doesn't read from the incoming stream before data is pulled off it again?
What I am trying to do is:
protected async Task XmlToJsonStream(Stream instream, Stream outStream)
{
    XmlReaderSettings readerSettings = new XmlReaderSettings();
    readerSettings.IgnoreWhitespace = false;
    readerSettings.Async = true; // required for ReadAsync below
    var reader = XmlReader.Create(instream, readerSettings);
    var jsonWriter = new JsonTextWriter(new StreamWriter(outStream));
    jsonWriter.WriteStartObject();
    while (await reader.ReadAsync())
    {
        jsonWriter.writeReader(reader);
    }
    jsonWriter.WriteEndObject();
    jsonWriter.Flush();
}
protected async Task XmlFilterStream(Stream instream, Stream outStream)
{
    XmlReaderSettings readerSettings = new XmlReaderSettings();
    readerSettings.IgnoreWhitespace = false;
    var reader = XmlReader.Create(instream, readerSettings);
    var writer = XmlWriter.Create(outStream, new XmlWriterSettings { Async = true, CloseOutput = false });
    while (reader.Read())
    {
        writer.writeReader(reader);
    }
}
but I don't know how to hook it up.
var incomingStream = ...
var outgoingStream = ...
var temp = ...
XmlFilterStream(incomingStream, temp);
XmlToJsonStream(temp, outgoingStream);
because if I use a MemoryStream as temp, wouldn't it just end up with everything stored in the stream at the end? I'm looking for a stream that throws the data away again once it has been read.
All of the above is just example code, missing some disposes and seeks of course, but I hope I managed to illustrate what I am going for: to be able, based on settings, to plug and play between just copying the stream, doing XML filtering, and optionally transforming it to JSON.
Streams are sequences of bytes, so a stream transformation would be something like Func<ArraySegment<byte>, ArraySegment<byte>>. You can then apply it in a streaming way:
async Task TransformAsync(this Stream source, Func<ArraySegment<byte>, ArraySegment<byte>> transform, Stream destination, int bufferSize = 1024)
{
    var buffer = new byte[bufferSize];
    while (true)
    {
        var bytesRead = await source.ReadAsync(buffer, 0, bufferSize);
        if (bytesRead == 0)
            return;
        var bytesToWrite = transform(new ArraySegment<byte>(buffer, 0, bytesRead));
        if (bytesToWrite.Count != 0)
            await destination.WriteAsync(bytesToWrite.Array, bytesToWrite.Offset, bytesToWrite.Count);
    }
}
It's a bit more complicated than that, but that's the general idea. It needs some logic to ensure WriteAsync writes all the bytes, and there is also usually a "flush" method required in addition to the transform method, called when the source stream finishes, so the transform algorithm has a last chance to return its final data to write to the output stream.
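A sketch of that flush-aware variant (the flush callback is a hypothetical addition here, not part of any library):
using System;
using System.IO;
using System.Threading.Tasks;

public static class StreamTransformExtensions
{
    public static async Task TransformAsync(
        this Stream source,
        Func<ArraySegment<byte>, ArraySegment<byte>> transform,
        Func<ArraySegment<byte>> flush, // hypothetical: emits any held-back bytes
        Stream destination,
        int bufferSize = 1024)
    {
        var buffer = new byte[bufferSize];
        int bytesRead;
        while ((bytesRead = await source.ReadAsync(buffer, 0, bufferSize)) > 0)
        {
            var chunk = transform(new ArraySegment<byte>(buffer, 0, bytesRead));
            if (chunk.Count != 0)
                await destination.WriteAsync(chunk.Array, chunk.Offset, chunk.Count);
        }
        // Source exhausted: give the transform a final chance to emit output
        // (e.g. the tail of an incomplete block it was buffering).
        var tail = flush();
        if (tail.Count != 0)
            await destination.WriteAsync(tail.Array, tail.Offset, tail.Count);
    }
}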
If you want streams of other things, like XML or JSON types, then you're probably better off going with Reactive Extensions.
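For instance, a toy sketch with the System.Reactive package (names are illustrative only): items become an IObservable<T> and a transformation is just a Select:
using System;
using System.Reactive.Linq;

class RxSketch
{
    static void Main()
    {
        Observable.Range(1, 3)
            .Select(i => $"<item>{i}</item>")             // stand-in for parsed XML fragments
            .Select(xml => xml.Replace("item", "value"))  // hypothetical per-item transform
            .Subscribe(Console.WriteLine);                // push results downstream as they arrive
    }
}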
I'm not sure I understand your question fully, but I think you're asking how you would operate on an input stream without loading it entirely into memory first.
In this case, you wouldn't want to do something like this:
var ms = new MemoryStream();
incomingStream.CopyTo(ms);
This does load the entire input stream incomingStream into memory -- into ms.
From what I can see, your XmlFilterStream method seems to be redundant, i.e. XmlToJsonStream does everything that XmlFilterStream does anyway.
Why not just have:
protected async Task XmlToJsonStream(Stream instream, Stream outStream)
{
    XmlReaderSettings readerSettings = new XmlReaderSettings();
    readerSettings.IgnoreWhitespace = false;
    readerSettings.Async = true; // required for ReadAsync below
    var reader = XmlReader.Create(instream, readerSettings);
    var jsonWriter = new JsonTextWriter(new StreamWriter(outStream));
    jsonWriter.WriteStartObject();
    while (await reader.ReadAsync())
    {
        jsonWriter.writeReader(reader);
    }
    jsonWriter.WriteEndObject();
    jsonWriter.Flush();
}
And call it like this:
var incomingStream = ...
var outgoingStream = ...
XmlToJsonStream(incomingStream, outgoingStream);
If the answer is that you have omitted some important details in XmlFilterStream then, without seeing those details, I would recommend that you just integrate those into the one XmlToJsonStream function.
I'm having an issue with copying data from a MemoryStream into a Stream inside a ZipArchive. The following is NOT working - it returns only 114 bytes:
async Task<byte[]> GetDataAsByteArray(IDataSource dataSource)
{
    using (var zipStream = new MemoryStream())
    {
        using (var archive = new ZipArchive(zipStream, ZipArchiveMode.Create, true))
        {
            var file = archive.CreateEntry("compressed.file");
            using (var targetStream = file.Open())
            {
                using (var sourceStream = new MemoryStream())
                {
                    await dataSource.LoadIntoStream(sourceStream);
                    sourceStream.CopyTo(targetStream);
                }
            }
        }
        var result = zipStream.ToArray();
        zipStream.Close();
        return result;
    }
}
However, using the implementation below for the "copy"-process, all 1103 bytes are written to the array/memory stream:
await targetStream.WriteAsync(sourceStream.ToArray(), 0, (int) sourceStream.Length);
I'm wondering why the CopyTo yields fewer bytes. I'm also uneasy about the cast to Int32 in the second implementation.
FYI: Comparing the byte arrays, it looks like only the header and footer of the zip file were written by the first implementation.
Stream.CopyTo() starts copying from the stream's current Position. Which probably isn't 0 after that LoadIntoStream() call. Since it is a MemoryStream, you can simply fix it like this:
await dataSource.LoadIntoStream(sourceStream);
sourceStream.Position = 0;
sourceStream.CopyTo(targetStream);
Set sourceStream.Position = 0 before copying it. The copy will copy from the current position to the end of the stream.
As others have said, the Position is probably no longer 0. You can't always set the Position back to 0, though, such as for network and compressed streams. You should check the stream.CanSeek property before doing any such operations; if it is false, copy the stream to a new MemoryStream first (which can be seeked), and then after each operation that changes the position, set the Position back to 0.
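A small sketch of that check (EnsureSeekable is a hypothetical helper name):
Stream EnsureSeekable(Stream stream)
{
    if (stream.CanSeek)
        return stream; // already rewindable
    // Non-seekable (network/compressed) stream: buffer it so it can be rewound.
    var buffered = new MemoryStream();
    stream.CopyTo(buffered);
    buffered.Position = 0;
    return buffered;
}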
Following the documentation, I'm having an extremely difficult time getting this to work. Using ZipFile, I want to create a zip in memory and then be able to update it. On each successive call to update the zip, it reports that it has 0 entries.
What am I doing wrong?
public void AddFile(MemoryStream zipStream, Stream file, string fileName)
{
    // First call to this: zipStream is just an empty stream
    var zip = ZipFile.Create(zipStream);
    zip.BeginUpdate();
    zip.Add(new ZipDataSource(file), fileName);
    zip.CommitUpdate();
    zip.IsStreamOwner = false;
    zip.Close();
    zipStream.Position = 0;
}
public Stream GetFile(Stream zipStream, string pathAndName)
{
    var zip = ZipFile.Create(zipStream);
    zip.IsStreamOwner = false;
    foreach (ZipEntry hi in zip) // count is 0
    {
    }
    var entry = zip.GetEntry(pathAndName);
    return entry == null ? null : zip.GetInputStream(entry);
}
The custom data source
public class ZipDataSource : IStaticDataSource
{
    private Stream _stream;

    public ZipDataSource(Stream stream)
    {
        _stream = stream;
    }

    public Stream GetSource()
    {
        _stream.Position = 0;
        return _stream;
    }
}
ZipFile.Create(zipStream) is not just a convenient static accessor, as you might think. Only use it the very first time you create a zip. When opening an existing zip, you need to use var zip = new ZipFile(zipStream) instead.
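Applied to the GetFile method above, that would look something like this sketch:
public Stream GetFile(Stream zipStream, string pathAndName)
{
    // Open the existing archive with the constructor, not Create().
    var zip = new ZipFile(zipStream);
    zip.IsStreamOwner = false;
    var entry = zip.GetEntry(pathAndName);
    return entry == null ? null : zip.GetInputStream(entry);
}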
I've personally had many issues in the past with this library and would suggest that anyone looking for a good zip library choose something other than ICSharpCode.SharpZipLib... The API just plain sucks.